Xuwang Yin
I build AI models that truly understand and that we can trust.
One model should be enough.
Generative AI and discriminative AI have traditionally been two separate worlds: different models, different training, different applications. But a model that truly understands should be able to both recognize and imagine. Inspired by Yann LeCun's vision of energy-based models, I build unified models in which classification is grounded in generation: decisions are anchored to the learned structure of the data, so the model can only classify what it can imagine, and it can explain its decisions with counterfactuals.
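A minimal sketch of the idea, using the textbook energy-based formulation (a generic energy function \(E_\theta(x, y)\) over inputs and labels; the notation is mine, not any single paper's training objective): one energy function supplies both the classifier and the generator.

\[
p_\theta(y \mid x) = \frac{\exp(-E_\theta(x, y))}{\sum_{y'} \exp(-E_\theta(x, y'))} \quad \text{(recognize)},
\qquad
p_\theta(x \mid y) \propto \exp(-E_\theta(x, y)) \quad \text{(imagine)}.
\]

Under this view, a confident label requires low energy, i.e. an input the model could plausibly generate, and a counterfactual can be read off by moving \(x\) downhill in energy toward another label.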
→ Scalable Energy-Based Models via Adversarial Training (ICLR 2026)
Building on: Detecting Adversarial Examples (ICLR 2020), AT-EBMs (ECCV 2022)
Safety through understanding.
Previously at the Center for AI Safety, I worked on making LLMs transparent and controllable—understanding their internal representations, evaluating their robustness, and analyzing their emergent behaviors.
→ Utility Engineering (NeurIPS 2025)
→ HarmBench (ICML 2024)
→ Representation Engineering (arXiv)