Publications

* Equal contribution. † Corresponding author.

Remote Sensing Image Interpretation

arXiv 2025

DescribeEarth: Describe Anything for Remote Sensing Images

Kaiyu Li*, Zixuan Jiang*, Xiangyong Cao†, Jiayu Wang, Yuchen Xiao, Deyu Meng, Zhi Wang

Resources: Code, Dataset, Benchmark

  • We introduce geo-spatial detailed localized captioning.
  • We build the first describe-anything model in remote sensing.
  • We release the related dataset and benchmark.

Media Coverage: 遥感与深度学习, 码科智能, CV炼丹术

Audio Intelligence

arXiv 2026

Towards Human-Like Interactive Speech Recognition With Agentic Correction and Semantic Evaluation

Zixuan Jiang*, Yanqiao Zhu*, Peng Wang*, Qinyuan Chen, Xinjian Zhao, Xipeng Qiu, Wupeng Wang, Zhifu Gao, Xiangang Li, Kai Yu, Xie Chen†

Resources: Project Page, Live Demo

  • We propose Interactive ASR, extending one-pass ASR into an interactive system with user feedback and semantic correction.
  • We propose Agentic ASR, an agent-based framework enabling interactive speech recognition.
  • We develop $S^2ER$, a semantic consistency metric, and ISS, a simulation framework, for evaluating Interactive ASR.