Generative Pre-Training Transformer

GPT4D: Generative Pre-training Transformer with Next-Scale Spatio-temporal Token Prediction for 4D Human Action Recognition

GPT4D is an autoregressive generative framework that recasts 4D point cloud video understanding as next-token prediction. It combines long-range motion priors with local geometric detail and reports state-of-the-art results on human action recognition benchmarks.

Mar 5, 2026