Repair-R1: Enhancing Automated Program Repair with Test-Driven Training
Researchers have introduced Repair-R1, a novel approach to Automated Program Repair (APR) that integrates test cases into the model's training phase and has the model generate tests before attempting a repair. Generating discriminative tests first helps the model locate defects and understand their underlying causes, leading to more effective repairs. Experiments show notable gains in both repair success rate and test generation success rate across multiple benchmarks.
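The test-before-repair idea can be illustrated with a toy reward signal. This sketch is illustrative only (the function names and reward shape are assumptions, not taken from the paper): a good generated test suite should fail on the buggy program but pass on the candidate fix, and the reward is high only when both conditions hold.

```python
def run_tests(func, tests):
    """Return the fraction of (args, expected) pairs the function satisfies."""
    passed = sum(1 for args, expected in tests if func(*args) == expected)
    return passed / len(tests)

# Toy defect: off-by-one when summing the integers 1..n.
def buggy_sum_to_n(n):
    return sum(range(n))        # bug: range(n) omits n itself

def fixed_sum_to_n(n):
    return sum(range(n + 1))    # candidate repair

# Model-generated tests should discriminate: fail on the bug, pass on the fix.
tests = [((3,), 6), ((5,), 15), ((0,), 0)]

fail_on_bug = 1.0 - run_tests(buggy_sum_to_n, tests)   # defect exposure
pass_on_fix = run_tests(fixed_sum_to_n, tests)         # repair validation
reward = fail_on_bug * pass_on_fix  # high only when tests expose the bug AND the fix passes
```

In this toy example the test `((0,), 0)` passes even on the buggy code, so the exposure term is 2/3 rather than 1, illustrating why rewarding only discriminative tests pushes the model toward inputs that actually localize the defect.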
Flow Matching Policy Gradients: Advancing Reinforcement Learning with Flow-Based Models
A new reinforcement learning algorithm, Flow Policy Optimization (FPO), has been proposed to bring flow matching into the policy gradient framework. FPO casts policy optimization as maximizing an advantage-weighted ratio computed from the conditional flow matching loss, sidestepping exact likelihood computation while preserving the generative capabilities of flow-based models. The approach has shown strong results in continuous control, capturing multimodal action distributions and outperforming Gaussian policies, particularly in under-conditioned settings.
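A minimal sketch of the advantage-weighted ratio idea, under simplifying assumptions: actions are scalar, the "flow policy" is a velocity field reduced to a single learnable scalar, and the conditional flow matching (CFM) loss uses a linear path from noise to action. The substitution of a likelihood ratio with exp(L_old - L_new) and the PPO-style clipping follow the description above; everything else here (names, the toy advantage) is illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy flow policy: the velocity field is a single scalar theta,
# v_theta(x_t, t) = theta. For a linear path x_t = (1 - t) * x0 + t * x1
# (x0 = noise sample, x1 = action), the CFM regression target is x1 - x0.

def cfm_loss(theta, actions, noise):
    """Per-sample conditional flow matching loss (squared velocity error)."""
    target = actions - noise            # target velocity along the linear path
    return (theta - target) ** 2

def fpo_surrogate(theta_new, theta_old, actions, noise, advantages, eps=0.2):
    """Clipped surrogate where the likelihood ratio is replaced by
    exp(L_cfm(theta_old) - L_cfm(theta_new)), estimated per sampled action."""
    ratio = np.exp(cfm_loss(theta_old, actions, noise)
                   - cfm_loss(theta_new, actions, noise))
    clipped = np.clip(ratio, 1 - eps, 1 + eps)
    return np.mean(np.minimum(ratio * advantages, clipped * advantages))

actions = rng.normal(1.0, 0.5, size=256)   # actions sampled under theta_old
noise = rng.normal(0.0, 1.0, size=256)     # flow source samples
advantages = actions - actions.mean()      # toy advantage: reward = action value

base = fpo_surrogate(1.0, 1.0, actions, noise, advantages)   # ratio == 1 everywhere
```

At `theta_new == theta_old` every ratio is exactly 1 and the surrogate reduces to the mean advantage, mirroring how a PPO-style objective behaves before any update; no density of the flow policy is ever evaluated, only the CFM loss.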