Jyotsna Shastry, & Shweta Agrawal. (2026). Learnable Reward Weighting in Multimodal RLHF: A Proximal Policy Optimization Framework for Safe and Helpful Dialogue Alignment. International Journal of Research and Review in Applied Science, Humanities, and Technology, 3(2), 191-197. https://doi.org/10.71143/216rys72