Efficient Game-Theoretic Planning With Prediction Heuristic for Socially-Compliant Autonomous Driving

Chenran Li, Tu Trinh, Letian Wang, Changliu Liu
IEEE Robotics and Automation Letters 7 (4), 10248-10255

1
2022
A StrongREJECT for Empty Jailbreaks

Alexandra Souly, Qingyuan Lu, Dillon Bowen, Tu Trinh
arXiv preprint arXiv:2402.10260

6
2024
Softmax Probabilities (Mostly) Predict Large Language Model Correctness on Multiple-Choice Q&A

Benjamin Plaut, Khanh Nguyen, Tu Trinh
arXiv preprint arXiv:2402.13213

1
2024
Refusal-Trained LLMs Are Easily Jailbroken As Browser Agents

Priyanshu Kumar, Elaine Lau, Saranya Vijayakumar, Tu Trinh

2024
Getting By Goal Misgeneralization With a Little Help From a Mentor

Tu Trinh, Mohamad H Danesh, Nguyen X Khanh, Benjamin Plaut
arXiv preprint arXiv:2410.21052

2024