Breadcrumb

Featured Research

The ultimate goal of our research is to build trustworthy, interactive, and human-centered autonomous agents that can perceive, understand, and reason about the physical world; safely interact and collaborate with humans and other agents, and clearly explain their behaviors to build trust with humans so that they can benefit society in daily lives. To achieve this goal, we have been pursuing interdisciplinary research and unifying the techniques and tools from robotics, machine learning, reinforcement learning, explainable AI, control theory, optimization, and computer vision.
 

Explainable Relational Reasoning and Multi-Agent Interaction Modeling (Social & Physical)

graph
We investigate relational reasoning and interaction modeling in the context of the trajectory prediction task, which aims to generate accurate, diverse future trajectory hypotheses or state sequences based on historical observations. Our research introduced the first unified relational reasoning toolbox that systematically infers the underlying relations/interactions between entities at different scales (e.g., pairwise, group-wise) and different abstraction levels (e.g., multiplex) by learning dynamic latent interaction graphs and hypergraphs from observable states (e.g., positions) in an unsupervised manner. The learned latent graphs are explainable and generalizable, significantly improving the performance of downstream tasks, including prediction, sequential decision making, and control. We also proposed a physics-guided relational learning approach for physical dynamics modeling.
 
Related Publications:
6. Multi-Agent Dynamic Relational Reasoning for Social Robot Navigation, submitted to IEEE Transactions on Robotics (T-RO), under review.
 

Interaction-Aware Decision Making and Model-Based Control

How Will Self-Driving Cars Be Insured in the Future?Although autonomous navigation in simple, static environments has been well studied, it remains challenging for robots to navigate in highly dynamic, interactive scenarios (e.g., intersections, narrow corridors) where humans are involved. Robots must learn a safe and efficient behavior policy that can model the interactions, coordinate with surrounding static/dynamic entities, and generalize to out-of-distribution (OOD) situations. Our research introduced a novel interaction-aware decision making framework for autonomous vehicles based on reinforcement learning, which integrates human internal state inference, domain knowledge, trajectory prediction, and counterfactual reasoning systematically. We also investigate model-based control methods that leverage the learned pairwise and group-wise relations for social robot navigation around human crowds. Both methods achieve superior performance in the corresponding tasks in terms of a wide range of evaluation metrics and provide explainable, human-understandable intermediate representations to build both users’ and developers’ trust.

Related Publications:
6. Interactive Autonomous Navigation with Internal State Inference and Interactivity Estimation, submitted to IEEE Transactions on Robotics (T-RO), under review.
7. Multi-Agent Dynamic Relational Reasoning for Social Robot Navigation, submitted to IEEE Transactions on Robotics (T-RO), under review.
 

Vision and Language Models for Embodied Intelligence

We investigate foundation models and vision language models (VLMs) for robotics and autonomous systems to enhance their reasoning capability and reliability. For example, inferring the short-term and long-term intentions of traffic participants and understanding the contextual semantics of scenes are the keys to scene understanding and situational awareness of autonomous vehicles. Moreover, how to enable autonomous agents (e.g., self-driving cars) to explain their reasoning, prediction, and decision making processes to human users (e.g., drivers, passengers) in a human understandable form (e.g., natural language) to build humans’ trust remains largely underexplored. Therefore, we created the first multimodal dataset for a new risk object ranking and natural language explanation task in urban scenarios and a rich dataset for intention prediction in autonomous driving, establishing benchmarks for corresponding tasks. Meanwhile, my research introduced novel methods that achieve superior performance on these problems.

Improving Generalizability by Learning Context Relations

How to generalize the prediction to different scenarios is largely underexplored. In contrast to recent works that use the Cartesian coordinate system and global context images directly as input, we propose to leverage human prior knowledge including the comprehension of pairwise relations between agents and pairwise context information extracted by self-supervised learning approaches to attain an effective Frenet-based representation. We demonstrate that our approach achieves superior performance in terms of overall performance, zero-shot, and few-shot transferability across different traffic scenarios with diverse layouts.
 
Related Publications:
 

Continual / Lifelong Learning from Incremental Data

The current mainstream research focuses on how to achieve accurate prediction on one large dataset. However, whether the multi-agent trajectory prediction model can be trained with a sequence of datasets, i.e., continual learning settings, remains a question. Can the current prediction methods avoid catastrophic forgetting? Can we utilize the continual learning strategy in the multi-agent trajectory prediction application? Motivated by the generative replay methods in continual learning literature, we propose a multi-agent interaction behavior prediction framework with a graph neural network-based conditional generative memory system to mitigate catastrophic forgetting. To the best of our knowledge, this work is the first attempt to study the continual learning problem in multi-agent interaction behavior prediction problems. We empirically show that several approaches in literature indeed suffer from catastrophic forgetting, and our approach succeeds in maintaining a low prediction error when datasets come sequentially.

Related Publications:
 

Diverse Prediction and Generation with Deep Generative Models

The objective of generative models is to approximate the true data distribution, with which one can generate new samples similar to real data points with a proper variance. Generative models have been widely employed in representation learning and distribution approximation. We designed novel trajectory or human skeleton motion prediction methods based on deep generative models, which generate accurate and diverse prediction hypotheses. These methods can be broadly applied to time series prediction problems.

State Estimation with Learning-Based Models

We proposed a constrained mixture sequential Monte Carlo method that mitigates mode collapse in sequential Monte Carlo methods for tracking multiple targets and significantly improves tracking accuracy. Since prediction is a step in state estimation, we also proposed that the prior update in the state estimation framework can be implemented with any learning-based interaction-aware prediction model. The results in complex traffic scenarios show that using the prediction model outperforms purely physical models by a large margin due to the capability of relational reasoning. In particular, our method performs significantly better when handling missing or noisy sensor measurements.