Tracking Reward Function Improvement with Proxy Human Preferences in ICPL
Table of Links
Abstract and Introduction
Related Work
Problem Definition
Method
Experiments
Conclusion and References
A. Appendix
A.1 Full Prompts and A.2 ICPL Details
A.3 Baseline Details
A.4 Environment Details
A.5 Proxy Human Preference
A.6 Human-in-the-Loop Preference

A.5 PROXY HUMAN PREFERENCE
A.5.1 ADDITIONAL RESULTS
Due to the high variance in LLM performance, we report …