Featured Research
The ultimate goal of our research is to build trustworthy, interactive, and human-centered autonomous embodied agents that can perceive, understand, and reason about the physical world; safely interact and collaborate with humans; and efficiently coordinate with other intelligent agents so that they can benefit society in daily life. To achieve this goal, we have been pursuing interdisciplinary research, unifying techniques and tools from robotics, trustworthy AI/ML, deep reinforcement learning, control theory, optimization, and computer vision.
While large language models (LLMs) and vision-language models (VLMs) have demonstrated remarkable proficiency in comprehending natural language and translating human instructions into detailed plans for straightforward robotic tasks, they struggle with long-horizon, complex tasks.
In particular, the challenges of sub-task identification and allocation become especially complicated in scenarios involving cooperative teams of heterogeneous robots.
To address these challenges, we have proposed a novel multi-agent task planning framework for long-horizon tasks that integrates the reasoning capabilities of LLMs with traditional heuristic search planning, achieving high success rates and efficiency while generalizing robustly across a variety of tasks.
This research not only contributes to the advancement of task planning for heterogeneous robotic teams but also lays the groundwork for future explorations in multi-agent collaboration.
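To make the general recipe concrete, here is a minimal sketch of the idea of coupling LLM-based task decomposition with a classical PDDL planner. It is an illustration under assumptions, not the LaMMA-P implementation: the llm callable, the prompt and JSON schema, and the use of the Fast Downward planner (with its default sas_plan output) are all stand-ins.

    # Sketch only: LLM decomposes a long-horizon instruction into sub-tasks
    # assigned to heterogeneous robots; each sub-task goal is then solved by
    # a heuristic-search PDDL planner. Names and prompt format are hypothetical.
    import json
    import subprocess
    from typing import Callable, Dict, List

    def decompose_and_allocate(instruction: str,
                               robot_skills: Dict[str, List[str]],
                               llm: Callable[[str], str]) -> List[dict]:
        """Ask an LLM to split an instruction into sub-tasks, each with an
        assigned robot and a PDDL goal expression."""
        prompt = (
            "Decompose the task into sub-tasks and assign each to a robot.\n"
            f"Robots and skills: {json.dumps(robot_skills)}\n"
            f"Task: {instruction}\n"
            'Reply as JSON: [{"robot": ..., "subtask": ..., "pddl_goal": ...}]'
        )
        return json.loads(llm(prompt))

    def plan_subtask(domain_file: str, problem_file: str) -> List[str]:
        """Solve one sub-task with a heuristic-search planner (Fast Downward
        is used here as a stand-in; any PDDL planner would do)."""
        subprocess.run(
            ["fast-downward.py", domain_file, problem_file,
             "--search", "astar(lmcut())"],
            check=True,
        )
        with open("sas_plan") as f:  # Fast Downward's default plan file
            return [line.strip() for line in f if not line.startswith(";")]

In practice, the allocation step also has to respect each robot's skills and the dependencies between sub-tasks; the sketch omits these checks for brevity.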
Related Publications:
1. LaMMA-P: Generalizable Multi-Agent Long-Horizon Task Allocation and Planning with LM-Driven PDDL Planner, ICRA 2025.
2. Preference VLM: Leveraging VLMs for Scalable Preference-Based Reinforcement Learning, under review.
3. Can Large Vision Language Models Read Maps Like a Human?, under review.
Mobile robots and autonomous vehicles rely heavily on onboard sensors to perceive and understand their surroundings and, in turn, to learn safe and efficient planning strategies. While deep learning-based perception methods demonstrate impressive performance on various tasks, including 2D/3D object detection, occupancy prediction, segmentation, and tracking, this reliance on a single viewpoint remains vulnerable to occlusions, impaired visibility, and long-range perception.
With the advancement of multi-agent communication, connected and automated vehicles (CAVs) can overcome the inherent limitations of a single autonomous vehicle, and mobile robots can collaboratively navigate complex environments.
Our research demonstrates enhanced situational awareness by sharing information between CAVs at both the perception and motion prediction stages.
The framework is designed to tolerate realistic V2X bandwidth limitations and transmission delays. Through extensive experiments and ablation studies on both simulated
and real-world V2V datasets, we demonstrate the effectiveness of our method in cooperative perception, tracking, and motion prediction.
This work advances multi-agent cooperation in robotics and autonomous driving systems.
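As a rough illustration of how bandwidth- and delay-tolerant feature sharing can look, the sketch below compresses each agent's BEV feature map before transmission and fuses only the messages that arrive within a latency budget. It is a simplified assumption-laden example, not the CMP/STAMP implementation: the channel sizes, compression ratio, max-fusion rule, and message format are all illustrative.

    # Sketch only: bandwidth-aware cooperative feature fusion between CAVs.
    import torch
    import torch.nn as nn

    class FeatureCodec(nn.Module):
        """1x1-conv channel compression/decompression to respect a V2X
        bandwidth budget (e.g., 256 -> 32 channels before transmission)."""
        def __init__(self, channels: int = 256, compressed: int = 32):
            super().__init__()
            self.encoder = nn.Conv2d(channels, compressed, kernel_size=1)
            self.decoder = nn.Conv2d(compressed, channels, kernel_size=1)

    def fuse_cooperative_features(ego_feat: torch.Tensor,
                                  messages: list,
                                  codec: FeatureCodec,
                                  max_delay_s: float = 0.1) -> torch.Tensor:
        """Fuse the ego BEV features with features received from other CAVs.
        Messages older than max_delay_s are dropped; the rest are decoded
        and merged with an element-wise max (a simple, order-invariant rule)."""
        fused = ego_feat
        for msg in messages:
            if msg["delay_s"] > max_delay_s:   # tolerate transmission delay
                continue                       # by skipping stale messages
            neighbor = codec.decoder(msg["compressed_feat"])
            # A full system would first warp the neighbor features into the
            # ego frame using the relative pose; omitted here for brevity.
            fused = torch.maximum(fused, neighbor)
        return fused

On the sending side, each agent would transmit codec.encoder(feat) (optionally quantized) together with its pose and timestamp, so the receiver can perform the spatial alignment and staleness check shown above.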
Related Publications:
1. CMP: Cooperative Motion Prediction with Multi-Agent Communication, IEEE Robotics and Automation Letters (RA-L), 2025.
2. STAMP: Scalable Task And Model-agnostic Collaborative Perception, ICLR 2025.
3. CoMamba: Real-time Cooperative Perception Unlocked with State Space Models, under review.