Shi Feng
Assistant professor at The George Washington University.
[email]
[scholar]
[cv]
[twitter]
I work on AI safety, in particular improving human supervision / scalable oversight.
Crowdsourcing truth at scale has been a main driving force behind recent AI advances, but AIs are quickly evolving beyond the paradigm where resource-constrained crowd workers can reliably provide supervision. My approach to this problem is to design truth-finding processes for and with AIs.
Recent research threads:
Talks
- Sep 2024 Challenges in AI-Assisted AI Supervision [pdf]
- May 2023 Evaluating AI: From Crowdsourcing Truths to Truth-finding Processes [pdf]
- Jul 2022 NAACL Tutorial on Human Evaluations of Explanations [website]
- Apr 2019 NLP Highlights Podcast on pathologies of neural models [spotify]
Papers
-
Language Models Learn to Mislead Humans via RLHF
Jiaxin Wen, Ruiqi Zhong, Akbir Khan, Ethan Perez, Jacob Steinhardt, Minlie Huang, Samuel R. Bowman, He He, Shi Feng
[arxiv]
-
Spontaneous Reward Hacking in Iterative Self-Refinement
Jane Pan, He He, Samuel R. Bowman, Shi Feng
[arxiv]
-
LLM Evaluators Recognize and Favor Their Own Generations
Arjun Panickssery, Samuel R. Bowman, Shi Feng
NeurIPS 2024, oral
[arxiv]
-
Large Language Models Help Humans Verify Truthfulness—Except When They Are Convincingly Wrong
Chenglei Si, Navita Goyal, Sherry Wu, Chen Zhao, Shi Feng, Hal Daumé III, Jordan Boyd-Graber
NAACL 2024
[arxiv]
-
KARL: Knowledge-Aware Retrieval and Representations aid Retention and Learning in Students
Matt Shu, Nishant Balepur, Shi Feng, Jordan Boyd-Graber
EMNLP 2024
[arxiv]
-
A SMART Mnemonic Sounds like "Glue Tonic": Mixing LLMs with Student Feedback to Make Mnemonic Learning Stick
Nishant Balepur, Matt Shu, Alexander Hoyle, Alison Robey, Shi Feng, Seraphina Goldfarb-Tarrant, Jordan Boyd-Graber
EMNLP 2024
[arxiv]
-
Measuring Inductive Biases of In-Context Learning with Underspecified Demonstrations
Chenglei Si*, Dan Friedman*, Nitish Joshi, Shi Feng, Danqi Chen, He He
ACL 2023
[acl]
-
Machine Explanations and Human Understanding
Chacha Chen*, Shi Feng*, Amit Sharma, Chenhao Tan
TMLR 2023, FAccT 2023, and best paper at HMCaT @ ICML 2022
[arxiv]
-
Learning Human-Compatible Representations for Case-Based Decision Support
Han Liu, Yizhou Tian, Chacha Chen, Shi Feng, Yuxin Chen, Chenhao Tan
ICLR 2023
[openreview]
-
Learning to Explain Selectively
Shi Feng, Jordan Boyd-Graber
EMNLP 2022
[acl]
[pdf]
-
Active Example Selection for In-Context Learning
Yiming Zhang, Shi Feng, Chenhao Tan
EMNLP 2022
[acl]
[pdf]
-
Calibrate Before Use: Improving Few-shot Performance of Language Models
Tony Z. Zhao*, Eric Wallace*, Shi Feng, Dan Klein, Sameer Singh
ICML 2021
[pmlr]
[pdf]
-
Concealed Data Poisoning Attacks on NLP Models
Eric Wallace*, Tony Z. Zhao*, Shi Feng, Sameer Singh
NAACL 2021
[acl]
[pdf]
[blog]
-
Universal Adversarial Triggers for Attacking and Analyzing NLP
Eric Wallace, Shi Feng, Nikhil Kandpal, Matt Gardner, Sameer Singh
EMNLP 2019, oral
[acl]
[pdf]
[blog]
-
Misleading Failures of Partial-input Baselines
Shi Feng, Eric Wallace, Jordan Boyd-Graber
ACL 2019, short paper
[acl]
[pdf]
-
Quizbowl: The Case for Incremental Question Answering
Pedro Rodriguez, Shi Feng, Mohit Iyyer, He He, Jordan Boyd-Graber
In submission
[arxiv]
-
Understanding Impacts of High-Order Loss Approximations and Features in Deep Learning Interpretation
Sahil Singla, Eric Wallace, Shi Feng, Soheil Feizi
ICML 2019
[pmlr]
[pdf]
-
Trick Me If You Can: Human-in-the-loop Generation of Adversarial Examples for Question Answering
Eric Wallace, Pedro Rodriguez, Shi Feng, Jordan Boyd-Graber
TACL 2019
[acl]
[pdf]
-
What can AI do for me: Evaluating Machine Learning Interpretations in Cooperative Play
Shi Feng, Jordan Boyd-Graber
IUI 2019
[arxiv]
-
Pathologies of Neural Models Make Interpretation Difficult
Shi Feng, Eric Wallace, Alvin Grissom II, Mohit Iyyer, Pedro Rodriguez, Jordan Boyd-Graber
EMNLP 2018, oral
[acl]
[pdf]
[talk]
-
Interpreting Neural Networks with Nearest Neighbors
Eric Wallace*, Shi Feng*, Jordan Boyd-Graber
BlackboxNLP @ EMNLP 2018
[acl]
[pdf]
-
The UMD Neural Machine Translation Systems at WMT17 Bandit Learning Task
Amr Sharaf, Shi Feng, Khanh Nguyen, Kianté Brantley, Hal Daumé III
WMT @ EMNLP 2017
[acl]
[pdf]
-
Improving Attention Modeling with Implicit Distortion and Fertility for Machine Translation
Shi Feng, Shujie Liu, Nan Yang, Mu Li, Ming Zhou, Kenny Q. Zhu
COLING 2016
[acl]
[pdf]