Jyotsna Shastry and Shweta Agrawal. “Learnable Reward Weighting in Multimodal RLHF: A Proximal Policy Optimization Framework for Safe and Helpful Dialogue Alignment.” International Journal of Research and Review in Applied Science, Humanities, and Technology, vol. 3, no. 2, June 2026, pp. 191-7, https://doi.org/10.71143/216rys72.