
Shi Feng

Assistant professor at The George Washington University.
[email] [scholar] [cv] [twitter]
I work on AI safety, in particular improving human supervision / scalable oversight.
Crowdsourcing truth at scale has been a main driving force behind recent AI advances, but AIs are quickly evolving beyond the paradigm where resource-constrained crowd workers can reliably provide supervision. My approach to this problem is to design truth-finding processes for and with AIs.
Recent research threads:

Talks

  • Sep 2024 Challenges in AI-Assisted AI Supervision [pdf]
  • May 2023 Evaluating AI: From Crowdsourcing Truths to Truth-finding Processes [pdf]
  • Jul 2022 NAACL Tutorial on Human Evaluations of Explanations [website]
  • Apr 2019 NLP Highlights Podcast on pathologies of neural models [spotify]

Papers

  • Language Models Learn to Mislead Humans via RLHF
    Jiaxin Wen, Ruiqi Zhong, Akbir Khan, Ethan Perez, Jacob Steinhardt, Minlie Huang, Samuel R. Bowman, He He, Shi Feng
    [arxiv]

  • Spontaneous Reward Hacking in Iterative Self-Refinement
    Jane Pan, He He, Samuel R. Bowman, Shi Feng
    [arxiv]

  • LLM Evaluators Recognize and Favor Their Own Generations
    Arjun Panickssery, Samuel R. Bowman, Shi Feng
    NeurIPS 2024, oral [arxiv]

  • Large Language Models Help Humans Verify Truthfulness—Except When They Are Convincingly Wrong
    Chenglei Si, Navita Goyal, Sherry Wu, Chen Zhao, Shi Feng, Hal Daumé III, Jordan Boyd-Graber
    NAACL 2024 [arxiv]

  • KARL: Knowledge-Aware Retrieval and Representations aid Retention and Learning in Students
    Matt Shu, Nishant Balepur, Shi Feng, Jordan Boyd-Graber
    EMNLP 2024 [arxiv]

  • A SMART Mnemonic Sounds like "Glue Tonic": Mixing LLMs with Student Feedback to Make Mnemonic Learning Stick
    Nishant Balepur, Matt Shu, Alexander Hoyle, Alison Robey, Shi Feng, Seraphina Goldfarb-Tarrant, Jordan Boyd-Graber
    EMNLP 2024 [arxiv]

  • Measuring Inductive Biases of In-Context Learning with Underspecified Demonstrations
    Chenglei Si*, Dan Friedman*, Nitish Joshi, Shi Feng, Danqi Chen, He He
    ACL 2023 [acl]

  • Machine Explanations and Human Understanding
    Chacha Chen*, Shi Feng*, Amit Sharma, Chenhao Tan
    TMLR 2023, FAccT 2023, and best paper at HMCaT @ ICML 2022 [arxiv]

  • Learning Human-Compatible Representations for Case-Based Decision Support
    Han Liu, Yizhou Tian, Chacha Chen, Shi Feng, Yuxin Chen, Chenhao Tan
    ICLR 2023 [openreview]

  • Learning to Explain Selectively
    Shi Feng, Jordan Boyd-Graber
    EMNLP 2022 [acl] [pdf]

  • Active Example Selection for In-Context Learning
    Yiming Zhang, Shi Feng, Chenhao Tan
    EMNLP 2022 [acl] [pdf]

  • Calibrate Before Use: Improving Few-shot Performance of Language Models
    Tony Z. Zhao*, Eric Wallace*, Shi Feng, Dan Klein, Sameer Singh
    ICML 2021 [pmlr] [pdf]

  • Concealed Data Poisoning Attacks on NLP Models
    Eric Wallace*, Tony Z. Zhao*, Shi Feng, Sameer Singh
    NAACL 2021 [acl] [pdf] [blog]

  • Universal Adversarial Triggers for Attacking and Analyzing NLP
    Eric Wallace, Shi Feng, Nikhil Kandpal, Matt Gardner, Sameer Singh
    EMNLP 2019, oral [acl] [pdf] [blog]

  • Misleading Failures of Partial-input Baselines
    Shi Feng, Eric Wallace, Jordan Boyd-Graber
    ACL 2019, short paper [acl] [pdf]

  • Quizbowl: The Case for Incremental Question Answering
    Pedro Rodriguez, Shi Feng, Mohit Iyyer, He He, Jordan Boyd-Graber
    In submission [arxiv]

  • Understanding Impacts of High-Order Loss Approximations and Features in Deep Learning Interpretation
    Sahil Singla, Eric Wallace, Shi Feng, Soheil Feizi
    ICML 2019 [pmlr] [pdf]

  • Trick Me If You Can: Human-in-the-loop Generation of Adversarial Examples for Question Answering
    Eric Wallace, Pedro Rodriguez, Shi Feng, Jordan Boyd-Graber
    TACL 2019 [acl] [pdf]

  • What can AI do for me: Evaluating Machine Learning Interpretations in Cooperative Play
    Shi Feng, Jordan Boyd-Graber
    IUI 2019 [arxiv]

  • Pathologies of Neural Models Make Interpretation Difficult
    Shi Feng, Eric Wallace, Alvin Grissom II, Mohit Iyyer, Pedro Rodriguez, Jordan Boyd-Graber
    EMNLP 2018, oral [acl] [pdf] [talk]

  • Interpreting Neural Networks with Nearest Neighbors
    Eric Wallace*, Shi Feng*, Jordan Boyd-Graber
    BlackboxNLP @ EMNLP 2018 [acl] [pdf]

  • The UMD Neural Machine Translation Systems at WMT17 Bandit Learning Task
    Amr Sharaf, Shi Feng, Khanh Nguyen, Kianté Brantley, Hal Daumé III
    WMT @ EMNLP 2017 [acl] [pdf]

  • Improving Attention Modeling with Implicit Distortion and Fertility for Machine Translation
    Shi Feng, Shujie Liu, Nan Yang, Mu Li, Ming Zhou, Kenny Q. Zhu
    COLING 2016 [acl] [pdf]