Kaiwen Zhou

I am Kaiwen Zhou, a final-year Ph.D. student at the University of California, Santa Cruz, where I am fortunate to be advised by Prof. Xin (Eric) Wang. My current research focuses on Responsible AI and AI agents, aiming to build Safe and Aligned AGI in the long run. Below is a list of research areas I’ve worked on (purple denotes first-author contributions):

LLM Alignment Training: SafeKey (EMNLP 2025)
Safety Evaluation and Red-teaming: R1 Safety Eval (IJCNLP-AACL 2025), Multimodal Situational Safety (ICLR 2025), SIRAJ
Responsible Embodied Agents: FedVLN (ECCV 2022), Navigation as the Attacker Wishes (NAACL 2024), Multimodal Situational Safety (ICLR 2025)
LLM and Agents: ESC (ICML 2023), JARVIS (NeSy 2025 Oral), EvoPresent
Multimodal Understanding & Reasoning: ViCor (ACL Findings 2024), Multipanel VQA (ACL 2024)

Before joining UCSC, I received my bachelor’s degree in statistics from Zhejiang University.

I am now on the job market! I would be happy to discuss any opportunities that may be a good fit.

News

Our SafeKey paper has been accepted to EMNLP 2025! (08/2025)
Gave an invited talk at Microsoft on safety reasoning! (06/2025)
Honored to receive the UCSC Dissertation-Year Fellowship to support my research on Trustworthy AI! (06/2025)
I will join Microsoft as a research intern this summer! (03/2025)
Our MSSBench paper has been accepted to ICLR 2025! (01/2025)
Two papers have been accepted to ACL 2024! (05/2024)
One paper has been accepted to NAACL 2024! (03/2024)
Our SlugJARVIS team won third place in the first-ever Amazon Alexa SimBot challenge! (06/2023)
Our ESC paper has been accepted to ICML 2023! (04/2023)
I will join Honda Research Institute as a research intern this spring and summer! (04/2023)
Our FedVLN paper has been accepted to ECCV 2022! (07/2022)
We are ranked No. 1 in the Alexa Prize SimBot Public Benchmark Challenge! (04/2022)
I will join Samsung AI Center as a research intern this summer! (04/2022)

Selected Publications

SIRAJ: Diverse and Efficient Red-Teaming for LLM Agents via Distilled Structured Reasoning
Kaiwen Zhou, Ahmed Elgohary, A S M Iftekhar, Amin Saied
NeurIPS 2025 Lock-LLM Workshop
[Paper]

Presenting a Paper is an Art: Self-Improvement Aesthetic Agents for Academic Presentations
Chengzhi Liu*, Yuzhe Yang*, Kaiwen Zhou, Zhen Zhang, Yue Fan, Yannan Xie, Peng Qi, Xin Eric Wang
arXiv 2025
[Paper] [Website] [Code] [Data]

SafeKey: Amplifying Aha-Moment Insights for Safety Reasoning
Kaiwen Zhou, Xuandong Zhao, Gaowen Liu, Jayanth Srinivasa, Aosong Feng, Dawn Song, Xin Eric Wang
EMNLP 2025
[Paper] [Website] [Code] [Models]

The Hidden Risks of Large Reasoning Models: A Safety Assessment of R1
Kaiwen Zhou, Chengzhi Liu, Xuandong Zhao, Shreedhar Jangam, Jayanth Srinivasa, Gaowen Liu, Dawn Song, Xin Eric Wang
IJCNLP-AACL 2025
[Paper] [Website]

Multimodal Situational Safety
Kaiwen Zhou*, Chengzhi Liu*, Xuandong Zhao, Anderson Compalas, Dawn Song, Xin Eric Wang
ICLR 2025
NeurIPS Workshop on RBFM 2024 Oral
[Paper] [Website] [Code] [Data]

Muffin or Chihuahua? Challenging Large Vision-Language Models with Multipanel VQA
Yue Fan, Jing Gu, Kaiwen Zhou, Qianqi Yan, Shan Jiang, Ching-Chen Kuo, Xinze Guan, Xin Eric Wang
ACL 2024
[Paper] [Website]

ViCor: Bridging Visual Understanding and Commonsense Reasoning with Large Language Models
Kaiwen Zhou, Kwonjoon Lee, Teruhisa Misu, Xin Eric Wang
Findings of ACL 2024
[Paper]

Navigation as the Attacker Wishes? Towards Building Byzantine-Robust Embodied Agents under Federated Learning
Yunchao Zhang, Zonglin Di, Kaiwen Zhou, Cihang Xie, Xin Eric Wang
NAACL 2024
[Paper]

ESC: Exploration with Soft Commonsense Constraints for Zero-shot Object Navigation
Kaiwen Zhou, Kaizhi Zheng, Connor Pryor, Yilin Shen, Hongxia Jin, Lise Getoor, Xin Eric Wang
ICML 2023
[Paper] [Website]

FedVLN: Privacy-preserving Federated Vision-and-Language Navigation
Kaiwen Zhou, Xin Eric Wang
ECCV 2022
[Paper] [Code]

JARVIS: A Neuro-Symbolic Commonsense Reasoning Framework for Conversational Embodied Agents
Kaizhi Zheng*, Kaiwen Zhou*, Jing Gu*, Yue Fan*, Zonglin Di*, Jialu Wang, Xuehai He, Xin Eric Wang
SoCal NLP 2022, NeSy 2025 Oral
Winner Model of the Alexa Prize SimBot Public Benchmark Challenge
[Paper]

Service

Reviewer
NeurIPS 2023, ICLR 2024, ICML 2024, ICLR 2025, ICLR 2026