Few-shot In-Context Preference Learning Using Large Language Models: Full Prompts and ICPL Details

Table of Links

  1. Abstract and Introduction
  2. Related Work
  3. Problem Definition
  4. Method
  5. Experiments
  6. Conclusion and References


A. Appendix

A.1 Full Prompts and A.2 ICPL Details

A.3 Baseline Details

A.4 Environment Details

A.5 Proxy Human Preference

A.6 Human-in-the-Loop Preference

A APPENDIX

For more information and videos, see https://sites.google.com/view/few-shot-icpl/home.

A.1 FULL PROMPTS


Prompt 1: Initial System Prompts of Synthesizing Reward Functions


Prompt 2: Feedback Prompts


Prompt 3: Prompts of Tips for Writing Reward Functions


Prompt 4: Prompts of Describing Differences

A.2 ICPL DETAILS

The full pseudocode of ICPL is listed in Algo. 2.

:::info
Authors:

(1) Chao Yu, Tsinghua University;

(2) Hong Lu, Tsinghua University;

(3) Jiaxuan Gao, Tsinghua University;

(4) Qixin Tan, Tsinghua University;

(5) Xinting Yang, Tsinghua University;

(6) Yu Wang, with equal advising from Tsinghua University;

(7) Yi Wu, with equal advising from Tsinghua University and the Shanghai Qi Zhi Institute;

(8) Eugene Vinitsky, with equal advising from New York University (zoeyuchao@gmail.com).

:::


:::info
This paper is available on arXiv under a CC 4.0 license.

:::
