Jyotsna Shastry and Shweta Agrawal (2026) “Learnable Reward Weighting in Multimodal RLHF: A Proximal Policy Optimization Framework for Safe and Helpful Dialogue Alignment”, International Journal of Research and Review in Applied Science, Humanities, and Technology, 3(2), pp. 191–197. doi:10.71143/216rys72.