Xuwang Yin
Independent AI researcher. Energy-based models, adversarial robustness, AI safety—building towards AI that truly understands and that we can trust.
Unified discriminative-generative modeling
Generative AI and discriminative AI have traditionally been two separate worlds: different models, different training objectives, different applications. But a model that truly understands should be able to both recognize and imagine. Building on the energy-based learning framework, my research unifies the two in a single model, where classification is grounded in the model's ability to generate and decisions can be explained through counterfactual examples (a minimal sketch of the core idea follows the links below).
→ Scalable Energy-Based Models via Adversarial Training (ICLR 2026)
Building on: Detecting Adversarial Examples (ICLR 2020), AT-EBMs (ECCV 2022)
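As a rough illustration of the core idea (a minimal PyTorch sketch in the spirit of this line of work, not the exact method of the papers above; JointClassifierEBM and backbone are hypothetical names): the same set of logits supports both readings. Softmax over the logits gives the discriminative p(y|x), while E(x) = -logsumexp over the logits defines an unnormalized density over inputs, the standard energy-based reading of a classifier.

import torch
import torch.nn as nn

class JointClassifierEBM(nn.Module):
    # One network, two readings: classifier and energy-based model.
    def __init__(self, backbone: nn.Module):
        super().__init__()
        self.backbone = backbone  # any network mapping inputs to class logits

    def class_log_probs(self, x: torch.Tensor) -> torch.Tensor:
        # Discriminative reading: log p(y|x) via softmax over the logits.
        return torch.log_softmax(self.backbone(x), dim=-1)

    def energy(self, x: torch.Tensor) -> torch.Tensor:
        # Generative reading: E(x) = -logsumexp_y f(x)[y]; treating the
        # logits as unnormalized log p(x, y), low energy marks inputs
        # the model considers plausible.
        return -torch.logsumexp(self.backbone(x), dim=-1)

Generation and counterfactual explanation then reduce to gradient steps on the energy: starting from noise or a real input, follow -∇x E(x) (or the energy restricted to a target class) to move the input toward what the model deems plausible.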
AI safety and interpretability
Previously at the Center for AI Safety, I worked on making LLMs transparent and controllable—understanding their internal representations, evaluating their robustness, and analyzing their emergent behaviors.
→ Utility Engineering (NeurIPS 2025)
→ HarmBench (ICML 2024)
→ Representation Engineering (arXiv)