Jyotsna Shastry and Shweta Agrawal. 2026. “Learnable Reward Weighting in Multimodal RLHF: A Proximal Policy Optimization Framework for Safe and Helpful Dialogue Alignment.” International Journal of Research and Review in Applied Science, Humanities, and Technology 3 (2): 191–97. https://doi.org/10.71143/216rys72.