Xuwang Yin
I build AI models that truly understand and that we can trust.
One model should be enough.
Generative AI and discriminative AI have traditionally been two separate worlds—different models, different training, different applications. But a model that truly understands should be able to both recognize and imagine. Inspired by Yann LeCun's vision of energy-based models, I build unified models that achieve state-of-the-art classification and generation—while also explaining their decisions—all trained with a single objective.
→ Scalable Energy-Based Models via Adversarial Training (arXiv)
Building on: ICLR 2020, ECCV 2022
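For intuition about the "single objective, both recognize and imagine" claim above, here is one common way a single classifier can double as an energy-based generative model (the joint energy-based model view of Grathwohl et al.). This is a generic, minimal sketch under that assumption, not necessarily the formulation used in the adversarial-training paper linked above; the network `f` and the toy batch are hypothetical.

```python
# Minimal sketch: one set of logits yields both a classifier and an energy.
# Illustrative only -- not the exact method of the linked paper.
import torch
import torch.nn.functional as F

def energy(logits: torch.Tensor) -> torch.Tensor:
    # E(x) = -logsumexp_y f(x)[y]; exp(-E(x)) is an unnormalized density over x,
    # so the same logits that classify also score how plausible an input is.
    return -torch.logsumexp(logits, dim=-1)

f = torch.nn.Sequential(torch.nn.Flatten(), torch.nn.Linear(28 * 28, 10))  # toy model
x = torch.randn(4, 1, 28, 28)              # toy batch of images

logits = f(x)
p_y_given_x = F.softmax(logits, dim=-1)    # discriminative view: p(y | x)
e_x = energy(logits)                       # generative view: lower energy = more plausible x
```

Under this view, one network trained with one objective supports both recognition (the softmax head) and generation or density scoring (sampling or ranking inputs by energy).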
Building trustworthy AI.
Previously at the Center for AI Safety, I worked on making LLMs transparent and controllable—understanding their internal representations, evaluating their robustness, and analyzing their emergent behaviors.
→ Utility Engineering (NeurIPS 2025)
→ HarmBench (ICML 2024)
→ Representation Engineering (arXiv)