A demo video of our proposed Multimodal Interactive Omni-Avatar (M.I.O) framework.
Most existing digital humans remain primarily imitative, reproducing surface patterns of behavior without genuine interactive intelligence: the ability to generate real-time, emotionally coherent responses with a consistent personality across voice, facial expression, body motion, and appearance.
We model digital humans as autonomous agents with personality-consistent expression, adaptive interaction, and self-evolution, and propose a cascading paradigm composed of five modules: Thinker, Talker, Facial Animator, Body Animator, and Renderer. The Thinker performs contextual reasoning and control, while the remaining modules generate coordinated speech, facial motion, body motion, and final visual appearance in an end-to-end controllable manner.
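To make the cascade concrete, below is a minimal Python sketch of how the five modules might be chained. The module interfaces (`thinker.reason`, `talker.synthesize`, etc.) and the intermediate representations are our illustrative assumptions, not the framework's actual API.

```python
from dataclasses import dataclass, field

# Hypothetical intermediate representations; the actual M.I.O
# interfaces are not specified in the text above.
@dataclass
class Plan:
    """Thinker output: what to say plus control signals for the avatar."""
    text: str
    emotion: str = "neutral"

@dataclass
class Speech:
    """Talker output: synthesized audio with timing for lip sync."""
    audio: bytes = b""
    phoneme_timings: list = field(default_factory=list)

def run_cascade(user_input, thinker, talker, facial_animator, body_animator, renderer):
    """Run the five-module cascade: every downstream stage conditions on the
    Thinker's plan so voice, face, and body stay emotionally coherent."""
    plan = thinker.reason(user_input)              # contextual reasoning and control
    speech = talker.synthesize(plan)               # speech conditioned on the plan
    face = facial_animator.animate(plan, speech)   # expression and lip motion
    body = body_animator.animate(plan, speech)     # gestures and body motion
    frames = renderer.render(face, body)           # final visual appearance
    return speech, frames
```

The cascading structure is what makes the pipeline end-to-end controllable: a single control signal from the Thinker propagates to every downstream generator.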
We further introduce a new benchmark for interactive intelligence that evaluates speech, expression, motion, visual style, and personality consistency. Together, these contributions move digital humans beyond superficial imitation toward truly intelligent interaction.
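As an illustration only, here is a sketch of how scores over the five evaluation axes might be aggregated; the axis names, score ranges, and equal weighting are our assumptions, not the benchmark's actual protocol.

```python
# Hypothetical scoring schema for the five benchmark axes; the real
# metrics and weighting are not specified in the text above.
BENCHMARK_AXES = ("speech", "expression", "motion", "visual_style", "personality_consistency")

def aggregate_score(scores: dict) -> float:
    """Equal-weight average over the five axes (scores assumed in [0, 1])."""
    missing = set(BENCHMARK_AXES) - set(scores)
    if missing:
        raise ValueError(f"missing axes: {missing}")
    return sum(scores[axis] for axis in BENCHMARK_AXES) / len(BENCHMARK_AXES)
```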
To be updated.