Adaptive perception for efficient spatio-temporal language grounding in dynamic environments