Research Output

Publications

A selected and complete publication record spanning visual understanding, multimodal perception, robust learning, datasets, and embodied and real-world AI. Citation counts are best viewed on Google Scholar.

By Year

2026

  1. Rethinking Video Human-Object Interaction: Set Prediction over Time for Unified Detection and Anticipation. Y. Luo, D. Wen, K. Peng, R. Liu, J. Zheng, Y. Chen, J. Wei, R. Stiefelhagen. arXiv preprint arXiv:2604.10397, 2026. Paper
  2. IMPACT: A Dataset for Multi-Granularity Human Procedural Action Understanding in Industrial Assembly. D. Wen, Z. Zhong, D. Schneider, M. Zaremski, L. Kunzmann, Y. Shi, R. Liu, et al. arXiv preprint arXiv:2604.10409, 2026. Paper Code Project
  3. Go Beyond Earth: Understanding Human Actions and Scenes in Microgravity Environments. D. Wen, L. Qi, K. Peng, K. Yang, F. Teng, A. Luo, J. Fu, Y. Chen, R. Liu, Y. Shi, et al. International Conference on Learning Representations, 2026. Paper
  4. MICA: Multi-Agent Industrial Coordination Assistant. D. Wen, K. Peng, J. Zheng, Y. Chen, Y. Shi, J. Wei, R. Liu, K. Yang, R. Stiefelhagen. IEEE International Conference on Robotics and Automation, 2026. Paper Code
  5. Towards Multi-Source Domain Generalization for Sleep Staging with Noisy Labels. K. Wang, D. Wen, Y. Chen, R. Liu, J. Zheng, J. Wei, K. Yang, R. Stiefelhagen, et al. arXiv preprint arXiv:2604.10009, 2026. Paper
  6. ProOOD: Prototype-Guided Out-of-Distribution 3D Occupancy Prediction. Y. Zhang, M. Duan, K. Peng, Y. Wang, D. Wen, D. P. Paudel, L. Van Gool, et al. arXiv preprint arXiv:2604.01081, 2026. Paper
  7. Not an Obstacle for Dog, but a Hazard for Human: A Co-Ego Navigation System for Guide Dog Robots. R. Liu, J. Zhang, J. Zheng, Y. Chen, P. S. Lee, D. Wen, K. Peng, J. Zhang, et al. arXiv preprint arXiv:2603.20121, 2026. Paper
  8. InterEdit: Navigating Text-Guided Multi-Human 3D Motion Editing. Y. Yang, D. Wen, L. Qi, W. Kong, J. Zheng, R. Liu, Y. Chen, C. Wu, K. Yang, et al. arXiv preprint arXiv:2603.13082, 2026. Paper
  9. M2-Occ: Resilient 3D Semantic Occupancy Prediction for Autonomous Driving with Incomplete Camera Inputs. K. Lin, K. Peng, D. Wen, Y. Chen, R. Liu, K. Yang. arXiv preprint arXiv:2603.09737, 2026. Paper
  10. What if? Emulative Simulation with World Models for Situated Reasoning. R. Liu, Y. Chen, Y. Zhang, J. Zheng, K. Peng, C. Wu, C. Huang, D. Wen, et al. arXiv preprint arXiv:2603.06445, 2026. Paper
  11. Can we Trust Unreliable Voxels? Exploring 3D Semantic Occupancy Prediction under Label Noise. W. Li, K. Peng, D. Wen, J. Zheng, J. Wei, M. Duan, Y. Zhang, R. Fan, K. Yang. arXiv preprint arXiv:2603.06279, 2026. Paper
  12. SGR3 Model: Scene Graph Retrieval-Reasoning Model in 3D. Z. Wang, R. Liu, Y. Chen, J. Zheng, W. Fan, K. Peng, D. Wen, J. Wei, J. Zhang, et al. arXiv preprint arXiv:2603.04614, 2026. Paper
  13. Mitigating Label Noise using Prompt-Based Hyperbolic Meta-Learning in Open-Set Domain Generalization. K. Peng, D. Wen, M. S. Sarfraz, Y. Chen, J. Zheng, D. Schneider, K. Yang, J. Wu, et al. International Journal of Computer Vision 134(3): 99, 2026. Paper

2025

  1. RefAtomNet++: Advancing Referring Atomic Video Action Recognition using Semantic Retrieval based Multi-Trajectory Mamba. K. Peng, D. Wen, J. Fu, J. Wu, K. Yang, J. Zheng, R. Liu, Y. Chen, Y. Fu, et al. arXiv preprint arXiv:2510.16444, 2025. Paper
  2. EReLiFM: Evidential Reliability-Aware Residual Flow Meta-Learning for Open-Set Domain Generalization under Noisy Labels. K. Peng, D. Wen, K. Yang, J. Fu, Y. Chen, R. Liu, J. Wu, J. Zheng, M. S. Sarfraz, et al. arXiv preprint arXiv:2510.12687, 2025. Paper
  3. Exploring Single Domain Generalization of LiDAR-based Semantic Segmentation under Imperfect Labels. W. Kong, Z. Zeng, D. Wen, J. Wei, K. Peng, J. M. Goo, J. Boehm, et al. arXiv preprint arXiv:2510.09035, 2025. Paper
  4. Segment-to-Act: Label-Noise-Robust Action-Prompted Video Segmentation Towards Embodied Intelligence. W. Li, K. Peng, D. Wen, R. Liu, M. Duan, K. Luo, K. Yang. arXiv preprint arXiv:2509.16677, 2025. Paper
  5. Snap, Segment, Deploy: A Visual Data and Detection Pipeline for Wearable Industrial Assistants. D. Wen, J. Zheng, R. Liu, Y. Xu, K. Peng, R. Stiefelhagen. IEEE International Conference on Systems, Man, and Cybernetics, 2025; arXiv:2507.21072. Paper Code
  6. RoHOI: Robustness Benchmark for Human-Object Interaction Detection. D. Wen, K. Peng, K. Yang, Y. Chen, R. Liu, J. Zheng, A. Roitberg, D. P. Paudel, et al. arXiv preprint arXiv:2507.09111, 2025. Paper Code
  7. HopaDIFF: Holistic-Partial Aware Fourier Conditioned Diffusion for Referring Human Action Segmentation in Multi-Person Scenarios. K. Peng, J. Huang, X. Huang, D. Wen, J. Zheng, Y. Chen, K. Yang, J. Wu, et al. arXiv preprint arXiv:2506.09650, 2025. Paper
  8. Exploring Video-Based Driver Activity Recognition under Noisy Labels. L. Fan, D. Wen, K. Peng, K. Yang, J. Zhang, R. Liu, Y. Chen, J. Zheng, J. Wu, et al. IEEE International Conference on Systems, Man, and Cybernetics, 2025; arXiv:2504.11966. Paper
  9. VISO-Grasp: Vision-Language Informed Spatial Object-centric 6-DoF Active View Planning and Grasping in Clutter and Invisibility. Y. Shi, D. Wen, G. Chen, E. Welte, S. Liu, K. Peng, R. Stiefelhagen, R. Rayyes. IEEE/RSJ International Conference on Intelligent Robots and Systems, 2025. Paper
  10. Graph-based Document Structure Analysis. Y. Chen, R. Liu, J. Zheng, D. Wen, K. Peng, J. Zhang, R. Stiefelhagen. International Conference on Learning Representations, 2025; arXiv:2502.02501. Paper
  11. Grasp the Invisibility by Vision-Language guided Active View Planning. Y. Shi, D. Wen, E. Welte, K. Peng, R. Stiefelhagen, R. Rayyes. Preprint, 2025.

2024

  1. Referring Atomic Video Action Recognition. K. Peng, J. Fu, K. Yang, D. Wen, Y. Chen, R. Liu, J. Zheng, J. Zhang, et al. European Conference on Computer Vision, 166-185, 2024. Paper arXiv
  2. Advancing Open-Set Domain Generalization Using Evidential Bi-Level Hardest Domain Scheduler. K. Peng, D. Wen, K. Yang, A. Luo, Y. Chen, J. Fu, M. S. Sarfraz, A. Roitberg, et al. Advances in Neural Information Processing Systems 37, 85412-85440, 2024. Paper arXiv
  3. Skeleton-Based Human Action Recognition with Noisy Labels. Y. Xu, K. Peng, D. Wen, R. Liu, J. Zheng, Y. Chen, J. Zhang, A. Roitberg, et al. IEEE/RSJ International Conference on Intelligent Robots and Systems, 2024. Paper

2023

  1. FeatFSDA: Towards Few-Shot Domain Adaptation for Video-Based Activity Recognition. K. Peng, D. Wen, D. Schneider, J. Zhang, K. Yang, M. S. Sarfraz, R. Stiefelhagen, A. Roitberg. arXiv preprint arXiv:2305.08420, 2023. Paper