GPT-4 Prompts for Computing Summarization and Dialogue Win Rates

A key component of our experimental setup is GPT-4 win rate judgments. In this section, we include the prompts used to generate win rates for the summarization and dialogue experiments. We use gpt-4-0314 for all our experiments. The order of the summaries or responses is randomized for every evaluation.
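Because the judge's verdict names a position (A or B) rather than a method, the random ordering has to be tracked so each verdict can be mapped back. The following is a minimal sketch of how that bookkeeping and the win-rate tally might look. The names `assign_positions`, `win_rate`, and the `judge` callable are hypothetical (the actual call to gpt-4-0314 is not shown), and dropping unparseable verdicts from the denominator is an assumption, not something stated in this section.

```python
import random
from typing import Callable, Iterable, Tuple


def assign_positions(ours: str, baseline: str, rng: random.Random) -> Tuple[str, str, str]:
    """Randomly place our output in slot A or B; return (text_a, text_b, our_slot)."""
    if rng.random() < 0.5:
        return ours, baseline, "A"
    return baseline, ours, "B"


def win_rate(
    pairs: Iterable[Tuple[str, str, str]],
    judge: Callable[[str, str, str], str],
    seed: int = 0,
) -> float:
    """Fraction of comparisons in which the judge prefers our output.

    `pairs` yields (context, ours, baseline) triples.
    `judge(context, text_a, text_b)` is a hypothetical stand-in for the
    GPT-4 call and is expected to return "A" or "B".
    """
    rng = random.Random(seed)
    wins = 0
    total = 0
    for context, ours, baseline in pairs:
        text_a, text_b, our_slot = assign_positions(ours, baseline, rng)
        verdict = judge(context, text_a, text_b)
        if verdict not in ("A", "B"):
            continue  # assumption: verdicts that cannot be read are skipped
        total += 1
        wins += int(verdict == our_slot)
    return wins / total if total else float("nan")
```

With a judge that has no position bias, the result does not depend on the seed; with a judge that always answers "A", the randomization pushes the measured win rate toward 0.5 rather than rewarding whichever method happened to be listed first.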

Summarization GPT-4 win rate prompt (S).

Which of the following summaries does a better job of summarizing the most important points in the given forum post?

Post:

Summary A:

Summary B:

FIRST provide a one-sentence comparison of the two summaries, explaining which you prefer and why. SECOND, on a new line, state only “A” or “B” to indicate your choice. Your response should use the format:

Comparison:

Preferred:
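As an illustration, prompt (S) can be treated as a template with three slots. The sketch below assumes the post and the two summaries are placed directly beneath their labels; the exact whitespace, the constant name `SUMMARIZATION_PROMPT_S`, and the helper `build_prompt` are assumptions made for the example, and straight quotes stand in for the typographic ones.

```python
# Prompt (S) as a format string; slot placement under each label is an assumption.
SUMMARIZATION_PROMPT_S = (
    "Which of the following summaries does a better job of summarizing the "
    "most important points in the given forum post?\n\n"
    "Post:\n{post}\n\n"
    "Summary A:\n{summary_a}\n\n"
    "Summary B:\n{summary_b}\n\n"
    "FIRST provide a one-sentence comparison of the two summaries, explaining "
    "which you prefer and why. SECOND, on a new line, state only \"A\" or "
    "\"B\" to indicate your choice. Your response should use the format:\n"
    "Comparison:\n"
    "Preferred:"
)


def build_prompt(post: str, summary_a: str, summary_b: str) -> str:
    """Fill the three slots of the summarization prompt."""
    # str.format only interprets braces in the template itself, so braces
    # inside the post or the summaries are inserted verbatim.
    return SUMMARIZATION_PROMPT_S.format(
        post=post, summary_a=summary_a, summary_b=summary_b
    )
```

The same pattern would apply to prompt (C) and to the dialogue prompt by swapping the template text and slot names.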

Summarization GPT-4 win rate prompt (C).

Which of the following summaries does a better job of summarizing the most important points in the given forum post, without including unimportant or irrelevant details? A good summary is both precise and concise.

Post:

Summary A:

Summary B:

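FIRST provide a one-sentence comparison of the two summaries, explaining which you prefer and why. SECOND, on a new line, state only “A” or “B” to indicate your choice. Your response should use the format:

Comparison:

Preferred: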

Dialogue GPT-4 win rate prompt.

For the following query to a chatbot, which response is more helpful?

Query:

Response A:

Response B:

FIRST provide a one-sentence comparison of the two responses and explain which you feel is more helpful. SECOND, on a new line, state only “A” or “B” to indicate which response is more helpful. Your response should use the format:

Comparison:

More helpful:
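Since each prompt asks for the verdict on its own labeled line, the choice can be read back with a small parser. This is a sketch under assumptions: the function name `parse_verdict` is hypothetical, and tolerating surrounding quotes and lowercase letters is a design choice for robustness, not something the prompts specify.

```python
import re
from typing import Optional

# Quote characters the judge might wrap around its answer (assumption).
_QUOTES = "\"'“”"


def parse_verdict(response: str, label: str = "More helpful") -> Optional[str]:
    """Return "A" or "B" from a line such as 'More helpful: B', else None.

    Use label="Preferred" for the two summarization prompts.
    """
    pattern = rf"^\s*{re.escape(label)}\s*:\s*[{_QUOTES}]?([AB])[{_QUOTES}]?\s*$"
    match = re.search(pattern, response, flags=re.MULTILINE | re.IGNORECASE)
    return match.group(1).upper() if match else None
```

For example, `parse_verdict("Comparison: B answers the question directly.\nMore helpful: B")` yields `"B"`, while a response with no well-formed verdict line yields `None` so the caller can decide how to handle it.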

:::info
This paper is available on arXiv under the CC BY-NC-ND 4.0 DEED license.

:::

