Gradient Temporal-Difference Learning with Regularized Corrections

Adam White, Martha White, Sina Ghiassian, Andrew Patterson
International Conference on Machine Learning 1 3524-3534

3
2020
Towards integrated dialogue policy learning for multiple domains and intents using Hierarchical Deep Reinforcement Learning

Tulika Saha, Dhawal Gupta, Sriparna Saha, Pushpak Bhattacharyya
Expert Systems With Applications 162 113650

2020
A hierarchical approach for efficient multi-intent dialogue policy learning

Tulika Saha, Dhawal Gupta, Sriparna Saha, Pushpak Bhattacharyya
Multimedia Tools and Applications 1-26

3
2020
Bayesian Optimization Based Terrestrial Gait Tuning for a 12-DOF Alligator-Inspired Robot With Active Body Undulation

Krishna Agrawal, Kushagra Jain, Dhawal Gupta, Raunak Srivastav
ASME 2018 International Design Engineering Technical Conferences and Computers and Information in Engineering Conference

2018
Structural credit assignment in neural networks using reinforcement learning

Dhawal Gupta, Gabor Mihucz, Matthew Schlegel, James Kostas
Advances in Neural Information Processing Systems 34 30257-30270

2
2021
Exploring the impact of low-rank adaptation on the performance, efficiency, and regularization of RLHF

Simeng Sun, Dhawal Gupta, Mohit Iyyer
arXiv preprint arXiv:2309.09055

10
2023
Reinforcement learning based dialogue management strategy

Tulika Saha, Dhawal Gupta, Sriparna Saha, Pushpak Bhattacharyya
Neural Information Processing: 25th International Conference, ICONIP 2018, Siem Reap, Cambodia, December 13–16, 2018, Proceedings, Part III 25 359-372

9
2018
A unified dialogue management strategy for multi-intent dialogue conversations in multiple languages

Tulika Saha, Dhawal Gupta, Sriparna Saha, Pushpak Bhattacharyya
Transactions on Asian and Low-Resource Language Information Processing 20 (6) 1-22

4
2021
Coagent Networks: Generalized and Scaled

James E Kostas, Scott M Jordan, Yash Chandak, Georgios Theocharous
arXiv preprint arXiv:2305.09838

1
2023
Behavior Alignment via Reward Function Optimization

Dhawal Gupta, Yash Chandak, Scott Jordan, Philip S Thomas
Advances in Neural Information Processing Systems 36

3
2024
ICU-Sepsis: A Benchmark MDP Built from Real Medical Data

Kartik Choudhary, Dhawal Gupta, Philip S Thomas
arXiv preprint arXiv:2406.05646

2024
Emotion Aided Dialogue Act Classification for Task-Independent Conversations in a Multi-modal Framework

Tulika Saha, Dhawal Gupta, Sriparna Saha, Pushpak Bhattacharyya
Cognitive Computation 1-13

26
2020
From Past to Future: Rethinking Eligibility Traces

Dhawal Gupta, Scott M Jordan, Shreyas Chaudhari, Bo Liu
arXiv preprint arXiv:2312.12972

2023
Offline Reinforcement Learning for Mixture-of-Expert Dialogue Management

Dhawal Gupta, Yinlam Chow, Mohammad Ghavamzadeh, Craig Boutilier
arXiv preprint arXiv:2302.10850

2023
A Mixture-of-Expert Approach to RL-based Dialogue Management

Yinlam Chow, Aza Tulepbergenov, Ofir Nachum, MoonKyung Ryu
arXiv preprint arXiv:2206.00059

2
2022