Research Output

Publications

A selected and complete publication record spanning visual understanding, multimodal perception, robust learning, datasets, and embodied and real-world AI. Citation counts are best viewed on Google Scholar.

By Year

2026

  1. Rethinking Video Human-Object Interaction: Set Prediction over Time for Unified Detection and Anticipation. Y. Luo, D. Wen, K. Peng, R. Liu, J. Zheng, Y. Chen, J. Wei, R. Stiefelhagen. arXiv preprint arXiv:2604.10397, 2026. Paper
  2. IMPACT: A Dataset for Multi-Granularity Human Procedural Action Understanding in Industrial Assembly. D. Wen, Z. Zhong, D. Schneider, M. Zaremski, L. Kunzmann, Y. Shi, R. Liu, et al. arXiv preprint arXiv:2604.10409, 2026. Paper Code Project
  3. Go Beyond Earth: Understanding Human Actions and Scenes in Microgravity Environments. D. Wen, L. Qi, K. Peng, K. Yang, F. Teng, A. Luo, J. Fu, Y. Chen, R. Liu, Y. Shi, et al. International Conference on Learning Representations, 2026. Paper
  4. MICA: Multi-Agent Industrial Coordination Assistant. D. Wen, K. Peng, J. Zheng, Y. Chen, Y. Shi, J. Wei, R. Liu, K. Yang, R. Stiefelhagen. IEEE International Conference on Robotics and Automation, 2026. Paper Code
  5. Towards Multi-Source Domain Generalization for Sleep Staging with Noisy Labels. K. Wang, D. Wen, Y. Chen, R. Liu, J. Zheng, J. Wei, K. Yang, R. Stiefelhagen, et al. arXiv preprint arXiv:2604.10009, 2026. Paper
  6. ProOOD: Prototype-Guided Out-of-Distribution 3D Occupancy Prediction. Y. Zhang, M. Duan, K. Peng, Y. Wang, D. Wen, D. P. Paudel, L. Van Gool, et al. arXiv preprint arXiv:2604.01081, 2026. Paper
  7. Not an Obstacle for Dog, but a Hazard for Human: A Co-Ego Navigation System for Guide Dog Robots. R. Liu, J. Zhang, J. Zheng, Y. Chen, P. S. Lee, D. Wen, K. Peng, J. Zhang, et al. arXiv preprint arXiv:2603.20121, 2026. Paper
  8. InterEdit: Navigating Text-Guided Multi-Human 3D Motion Editing. Y. Yang, D. Wen, L. Qi, W. Kong, J. Zheng, R. Liu, Y. Chen, C. Wu, K. Yang, et al. arXiv preprint arXiv:2603.13082, 2026. Paper
  9. M2-Occ: Resilient 3D Semantic Occupancy Prediction for Autonomous Driving with Incomplete Camera Inputs. K. Lin, K. Peng, D. Wen, Y. Chen, R. Liu, K. Yang. arXiv preprint arXiv:2603.09737, 2026. Paper
  10. What if? Emulative Simulation with World Models for Situated Reasoning. R. Liu, Y. Chen, Y. Zhang, J. Zheng, K. Peng, C. Wu, C. Huang, D. Wen, et al. arXiv preprint arXiv:2603.06445, 2026. Paper
  11. Can we Trust Unreliable Voxels? Exploring 3D Semantic Occupancy Prediction under Label Noise. W. Li, K. Peng, D. Wen, J. Zheng, J. Wei, M. Duan, Y. Zhang, R. Fan, K. Yang. arXiv preprint arXiv:2603.06279, 2026. Paper
  12. SGR3 Model: Scene Graph Retrieval-Reasoning Model in 3D. Z. Wang, R. Liu, Y. Chen, J. Zheng, W. Fan, K. Peng, D. Wen, J. Wei, J. Zhang, et al. arXiv preprint arXiv:2603.04614, 2026. Paper
  13. Mitigating Label Noise using Prompt-Based Hyperbolic Meta-Learning in Open-Set Domain Generalization. K. Peng, D. Wen, M. S. Sarfraz, Y. Chen, J. Zheng, D. Schneider, K. Yang, J. Wu, et al. International Journal of Computer Vision 134(3): 99, 2026. Paper

2025

  1. RefAtomNet++: Advancing Referring Atomic Video Action Recognition using Semantic Retrieval based Multi-Trajectory Mamba. K. Peng, D. Wen, J. Fu, J. Wu, K. Yang, J. Zheng, R. Liu, Y. Chen, Y. Fu, et al. arXiv preprint arXiv:2510.16444, 2025. Paper
  2. EReLiFM: Evidential Reliability-Aware Residual Flow Meta-Learning for Open-Set Domain Generalization under Noisy Labels. K. Peng, D. Wen, K. Yang, J. Fu, Y. Chen, R. Liu, J. Wu, J. Zheng, M. S. Sarfraz, et al. arXiv preprint arXiv:2510.12687, 2025. Paper
  3. Exploring Single Domain Generalization of LiDAR-based Semantic Segmentation under Imperfect Labels. W. Kong, Z. Zeng, D. Wen, J. Wei, K. Peng, J. M. Goo, J. Boehm, et al. arXiv preprint arXiv:2510.09035, 2025. Paper
  4. Segment-to-Act: Label-Noise-Robust Action-Prompted Video Segmentation Towards Embodied Intelligence. W. Li, K. Peng, D. Wen, R. Liu, M. Duan, K. Luo, K. Yang. arXiv preprint arXiv:2509.16677, 2025. Paper
  5. Snap, Segment, Deploy: A Visual Data and Detection Pipeline for Wearable Industrial Assistants. D. Wen, J. Zheng, R. Liu, Y. Xu, K. Peng, R. Stiefelhagen. IEEE International Conference on Systems, Man, and Cybernetics, 2025; arXiv:2507.21072. Paper Code
  6. RoHOI: Robustness Benchmark for Human-Object Interaction Detection. D. Wen, K. Peng, K. Yang, Y. Chen, R. Liu, J. Zheng, A. Roitberg, D. P. Paudel, et al. arXiv preprint arXiv:2507.09111, 2025. Paper Code
  7. HopaDIFF: Holistic-Partial Aware Fourier Conditioned Diffusion for Referring Human Action Segmentation in Multi-Person Scenarios. K. Peng, J. Huang, X. Huang, D. Wen, J. Zheng, Y. Chen, K. Yang, J. Wu, et al. arXiv preprint arXiv:2506.09650, 2025. Paper
  8. Exploring Video-Based Driver Activity Recognition under Noisy Labels. L. Fan, D. Wen, K. Peng, K. Yang, J. Zhang, R. Liu, Y. Chen, J. Zheng, J. Wu, et al. IEEE International Conference on Systems, Man, and Cybernetics, 2025; arXiv:2504.11966. Paper
  9. VISO-Grasp: Vision-Language Informed Spatial Object-centric 6-DoF Active View Planning and Grasping in Clutter and Invisibility. Y. Shi, D. Wen, G. Chen, E. Welte, S. Liu, K. Peng, R. Stiefelhagen, R. Rayyes. IEEE/RSJ International Conference on Intelligent Robots and Systems, 2025. Paper
  10. Graph-based Document Structure Analysis. Y. Chen, R. Liu, J. Zheng, D. Wen, K. Peng, J. Zhang, R. Stiefelhagen. International Conference on Learning Representations, 2025; arXiv:2502.02501. Paper
  11. Grasp the Invisibility by Vision-Language guided Active View Planning. Y. Shi, D. Wen, E. Welte, K. Peng, R. Stiefelhagen, R. Rayyes. Preprint, 2025.

2024

  1. Referring Atomic Video Action Recognition. K. Peng, J. Fu, K. Yang, D. Wen, Y. Chen, R. Liu, J. Zheng, J. Zhang, et al. European Conference on Computer Vision, 166-185, 2024. Paper arXiv
  2. Advancing Open-Set Domain Generalization Using Evidential Bi-Level Hardest Domain Scheduler. K. Peng, D. Wen, K. Yang, A. Luo, Y. Chen, J. Fu, M. S. Sarfraz, A. Roitberg, et al. Advances in Neural Information Processing Systems 37, 85412-85440, 2024. Paper arXiv
  3. Skeleton-Based Human Action Recognition with Noisy Labels. Y. Xu, K. Peng, D. Wen, R. Liu, J. Zheng, Y. Chen, J. Zhang, A. Roitberg, et al. IEEE/RSJ International Conference on Intelligent Robots and Systems, 2024. Paper

2023

  1. FeatFSDA: Towards Few-Shot Domain Adaptation for Video-Based Activity Recognition. K. Peng, D. Wen, D. Schneider, J. Zhang, K. Yang, M. S. Sarfraz, R. Stiefelhagen, A. Roitberg. arXiv preprint arXiv:2305.08420, 2023. Paper