Shi Feng
Assistant professor at The George Washington University.
[email]
[scholar]
[cv]
[twitter]
I work on AI safety, in particular on improving human supervision (scalable oversight).
Crowdsourcing truth at scale has been a major driving force behind recent AI advances, but AIs are quickly evolving beyond the paradigm where resource-constrained crowd workers can reliably provide supervision. My approach to this problem is to design truth-finding processes for and with AIs.
Recent research threads:
Talks
- Sep 2024 Challenges in AI-Assisted AI Supervision [pdf]
- May 2023 Evaluating AI: From Crowdsourcing Truths to Truth-finding Processes [pdf]
- Jul 2022 NAACL Tutorial on Human Evaluations of Explanations [website]
- Apr 2019 NLP Highlights Podcast on pathologies of neural models [spotify]
Papers
-
Managing Diffuse Risks in the Safe Deployment of Untrusted Large Language Models
Jiaxin Wen, Vivek Hebbar, Caleb Larson, Aryan Bhatt, Ansh Radhakrishnan, Mrinank Sharma, Henry Sleight, Shi Feng, He He, Ethan Perez, Buck Shlegeris, Akbir Khan
ICLR 2025
[arxiv]
-
Language Models Learn to Mislead Humans via RLHF
Jiaxin Wen, Ruiqi Zhong, Akbir Khan, Ethan Perez, Jacob Steinhardt, Minlie Huang, Samuel R. Bowman, He He, Shi Feng
ICLR 2025
[arxiv]
-
Reverse Question Answering: Can an LLM Write a Question so Hard (or Bad) that it Can’t Answer?
Nishant Balepur, Feng Gu, Abhilasha Ravichander, Shi Feng, Jordan Lee Boyd-Graber, Rachel Rudinger
NAACL 2025
[arxiv]
-
Spontaneous Reward Hacking in Iterative Self-Refinement
Jane Pan, He He, Samuel R. Bowman, Shi Feng
[arxiv]
-
LLM Evaluators Recognize and Favor Their Own Generations
Arjun Panickssery, Samuel R. Bowman, Shi Feng
NeurIPS 2024, oral
[arxiv]
-
Large Language Models Help Humans Verify Truthfulness—Except When They Are Convincingly Wrong
Chenglei Si, Navita Goyal, Sherry Wu, Chen Zhao, Shi Feng, Hal Daumé III, Jordan Boyd-Graber
NAACL 2024
[arxiv]
-
KARL: Knowledge-Aware Retrieval and Representations aid Retention and Learning in Students
Matt Shu, Nishant Balepur, Shi Feng, Jordan Boyd-Graber
EMNLP 2024
[arxiv]
-
A SMART Mnemonic Sounds like "Glue Tonic": Mixing LLMs with Student Feedback to Make Mnemonic Learning Stick
Nishant Balepur, Matt Shu, Alexander Hoyle, Alison Robey, Shi Feng, Seraphina Goldfarb-Tarrant, Jordan Boyd-Graber
EMNLP 2024
[arxiv]
-
Measuring Inductive Biases of In-Context Learning with Underspecified Demonstrations
Chenglei Si*, Dan Friedman*, Nitish Joshi, Shi Feng, Danqi Chen, He He
ACL 2023
[acl]
-
Machine Explanations and Human Understanding
Chacha Chen*, Shi Feng*, Amit Sharma, Chenhao Tan
TMLR 2023, FAccT 2023, and best paper at HMCaT @ ICML 2022
[arxiv]
-
Learning Human-Compatible Representations for Case-Based Decision Support
Han Liu, Yizhou Tian, Chacha Chen, Shi Feng, Yuxin Chen, Chenhao Tan
ICLR 2023
[openreview]
-
Learning to Explain Selectively
Shi Feng, Jordan Boyd-Graber
EMNLP 2022
[acl]
[pdf]
-
Active Example Selection for In-Context Learning
Yiming Zhang, Shi Feng, Chenhao Tan
EMNLP 2022
[acl]
[pdf]
-
Calibrate Before Use: Improving Few-shot Performance of Language Models
Tony Z. Zhao*, Eric Wallace*, Shi Feng, Dan Klein, Sameer Singh
ICML 2021
[pmlr]
[pdf]
-
Concealed Data Poisoning Attacks on NLP Models
Eric Wallace*, Tony Z. Zhao*, Shi Feng, Sameer Singh
NAACL 2021
[acl]
[pdf]
[blog]
-
Universal Adversarial Triggers for Attacking and Analyzing NLP
Eric Wallace, Shi Feng, Nikhil Kandpal, Matt Gardner, Sameer Singh
EMNLP 2019, oral
[acl]
[pdf]
[blog]
-
Misleading Failures of Partial-input Baselines
Shi Feng, Eric Wallace, Jordan Boyd-Graber
ACL 2019, short paper
[acl]
[pdf]
-
Quizbowl: The Case for Incremental Question Answering
Pedro Rodriguez, Shi Feng, Mohit Iyyer, He He, Jordan Boyd-Graber
In submission [arxiv]
-
Understanding Impacts of High-Order Loss Approximations and Features in Deep Learning Interpretation
Sahil Singla, Eric Wallace, Shi Feng, Soheil Feizi
ICML 2019
[pmlr]
[pdf]
-
Trick Me If You Can: Human-in-the-loop Generation of Adversarial Examples for Question Answering
Eric Wallace, Pedro Rodriguez, Shi Feng, Jordan Boyd-Graber
TACL 2019
[acl]
[pdf]
-
What can AI do for me: Evaluating Machine Learning Interpretations in Cooperative Play
Shi Feng, Jordan Boyd-Graber
IUI 2019
[arxiv]
-
Pathologies of Neural Models Make Interpretation Difficult
Shi Feng, Eric Wallace, Alvin Grissom II, Mohit Iyyer, Pedro Rodriguez, Jordan Boyd-Graber
EMNLP 2018, oral
[acl]
[pdf]
[talk]
-
Interpreting Neural Networks with Nearest Neighbors
Eric Wallace*, Shi Feng*, Jordan Boyd-Graber
BlackboxNLP @ EMNLP 2018
[acl]
[pdf]
-
The UMD Neural Machine Translation Systems at WMT17 Bandit Learning Task
Amr Sharaf, Shi Feng, Khanh Nguyen, Kianté Brantley, Hal Daumé III
WMT @ EMNLP 2017
[acl]
[pdf]
-
Improving Attention Modeling with Implicit Distortion and Fertility for Machine Translation
Shi Feng, Shujie Liu, Nan Yang, Mu Li, Ming Zhou, Kenny Q. Zhu
COLING 2016
[acl]
[pdf]