1. Jyotsna Shastry, Shweta Agrawal. Learnable Reward Weighting in Multimodal RLHF: A Proximal Policy Optimization Framework for Safe and Helpful Dialogue Alignment. IJRASHT. 2026;3(2):191-197. doi:10.71143/216rys72