Shi Feng

Assistant professor at The George Washington University.
[email] [scholar] [cv] [twitter]
I work on AI safety, in particular improving human supervision / scalable oversight.
Crowdsourcing truth at scale has been a main driving force behind recent AI advances, but AI systems are quickly moving beyond the paradigm in which resource-constrained crowd workers can reliably provide supervision. My approach to this problem is to design truth-finding processes for and with AIs.
Recent research threads:

Talks

  • Sep 2024 Challenges in AI-Assisted AI Supervision [pdf]
  • May 2023 Evaluating AI: From Crowdsourcing Truths to Truth-finding Processes [pdf]
  • Jul 2022 NAACL Tutorial on Human Evaluations of Explanations [website]
  • Apr 2019 NLP Highlights Podcast on pathologies of neural models [spotify]

Papers

  • Managing Diffuse Risks in the Safe Deployment of Untrusted Large Language Models
    Jiaxin Wen, Vivek Hebbar, Caleb Larson, Aryan Bhatt, Ansh Radhakrishnan, Mrinank Sharma, Henry Sleight, Shi Feng, He He, Ethan Perez, Buck Shlegeris, Akbir Khan
    ICLR 2025 [arxiv]

  • Language Models Learn to Mislead Humans via RLHF
    Jiaxin Wen, Ruiqi Zhong, Akbir Khan, Ethan Perez, Jacob Steinhardt, Minlie Huang, Samuel R. Bowman, He He, Shi Feng
    ICLR 2025 [arxiv]

  • Reverse Question Answering: Can an LLM Write a Question so Hard (or Bad) that it Can’t Answer?
    Nishant Balepur, Feng Gu, Abhilasha Ravichander, Shi Feng, Jordan Lee Boyd-Graber, Rachel Rudinger
    NAACL 2025 [arxiv]

  • Spontaneous Reward Hacking in Iterative Self-Refinement
    Jane Pan, He He, Samuel R. Bowman, Shi Feng
    [arxiv]

  • LLM Evaluators Recognize and Favor Their Own Generations
    Arjun Panickssery, Samuel R. Bowman, Shi Feng
    NeurIPS 2024, oral [arxiv]

  • Large Language Models Help Humans Verify Truthfulness—Except When They Are Convincingly Wrong
    Chenglei Si, Navita Goyal, Sherry Wu, Chen Zhao, Shi Feng, Hal Daumé III, Jordan Boyd-Graber
    NAACL 2024 [arxiv]

  • KARL: Knowledge-Aware Retrieval and Representations aid Retention and Learning in Students
    Matt Shu, Nishant Balepur, Shi Feng, Jordan Boyd-Graber
    EMNLP 2024 [arxiv]

  • A SMART Mnemonic Sounds like "Glue Tonic": Mixing LLMs with Student Feedback to Make Mnemonic Learning Stick
    Nishant Balepur, Matt Shu, Alexander Hoyle, Alison Robey, Shi Feng, Seraphina Goldfarb-Tarrant, Jordan Boyd-Graber
    EMNLP 2024 [arxiv]

  • Measuring Inductive Biases of In-Context Learning with Underspecified Demonstrations
    Chenglei Si*, Dan Friedman*, Nitish Joshi, Shi Feng, Danqi Chen, He He
    ACL 2023 [acl]

  • Machine Explanations and Human Understanding
    Chacha Chen*, Shi Feng*, Amit Sharma, Chenhao Tan
    TMLR 2023, FAccT 2023, and best paper at HMCaT @ ICML 2022 [arxiv]

  • Learning Human-Compatible Representations for Case-Based Decision Support
    Han Liu, Yizhou Tian, Chacha Chen, Shi Feng, Yuxin Chen, Chenhao Tan
    ICLR 2023 [openreview]

  • Learning to Explain Selectively
    Shi Feng, Jordan Boyd-Graber
    EMNLP 2022 [acl] [pdf]

  • Active Example Selection for In-Context Learning
    Yiming Zhang, Shi Feng, Chenhao Tan
    EMNLP 2022 [acl] [pdf]

  • Calibrate Before Use: Improving Few-shot Performance of Language Models
    Tony Z. Zhao*, Eric Wallace*, Shi Feng, Dan Klein, Sameer Singh
    ICML 2021 [pmlr] [pdf]

  • Concealed Data Poisoning Attacks on NLP Models
    Eric Wallace*, Tony Z. Zhao*, Shi Feng, Sameer Singh
    NAACL 2021 [acl] [pdf] [blog]

  • Universal Adversarial Triggers for Attacking and Analyzing NLP
    Eric Wallace, Shi Feng, Nikhil Kandpal, Matt Gardner, Sameer Singh
    EMNLP 2019, oral [acl] [pdf] [blog]

  • Misleading Failures of Partial-input Baselines
    Shi Feng, Eric Wallace, Jordan Boyd-Graber
    ACL 2019, short paper [acl] [pdf]

  • Quizbowl: The Case for Incremental Question Answering
    Pedro Rodriguez, Shi Feng, Mohit Iyyer, He He, Jordan Boyd-Graber
    In submission [arxiv]

  • Understanding Impacts of High-Order Loss Approximations and Features in Deep Learning Interpretation
    Sahil Singla, Eric Wallace, Shi Feng, Soheil Feizi
    ICML 2019 [pmlr] [pdf]

  • Trick Me If You Can: Human-in-the-loop Generation of Adversarial Examples for Question Answering
    Eric Wallace, Pedro Rodriguez, Shi Feng, Jordan Boyd-Graber
    TACL 2019 [acl] [pdf]

  • What can AI do for me: Evaluating Machine Learning Interpretations in Cooperative Play
    Shi Feng, Jordan Boyd-Graber
    IUI 2019 [arxiv]

  • Pathologies of Neural Models Make Interpretation Difficult
    Shi Feng, Eric Wallace, Alvin Grissom II, Mohit Iyyer, Pedro Rodriguez, Jordan Boyd-Graber
    EMNLP 2018, oral [acl] [pdf] [talk]

  • Interpreting Neural Networks with Nearest Neighbors
    Eric Wallace*, Shi Feng*, Jordan Boyd-Graber
    BlackboxNLP @ EMNLP 2018 [acl] [pdf]

  • The UMD Neural Machine Translation Systems at WMT17 Bandit Learning Task
    Amr Sharaf, Shi Feng, Khanh Nguyen, Kianté Brantley, Hal Daumé III
    WMT @ EMNLP 2017 [acl] [pdf]

  • Improving Attention Modeling with Implicit Distortion and Fertility for Machine Translation
    Shi Feng, Shujie Liu, Nan Yang, Mu Li, Ming Zhou, Kenny Q. Zhu
    COLING 2016 [acl] [pdf]