Month: August 2024
Show HN: Z80 Sans
Deriving the DPO Objective Under the Plackett-Luce Model
Authors: (1) Rafael Rafailov, Stanford University (equal contribution; more junior authors listed earlier); (2) Archit Sharma, Stanford University (equal contribution; more junior authors listed earlier); (3) Eric Mitchell, Stanford University (equal contribution; more junior authors listed earlier); (4) Stefano Ermon, CZ Biohub; (5) Christopher D. Manning, Stanford University; (6) Chelsea Finn, …
Deriving the DPO Objective Under the Bradley-Terry Model
Deriving the Optimum of the KL-Constrained Reward Maximization Objective
Behind the Scenes: The Team Behind DPO
GPT-4 vs. Humans: Validating AI Judgment in Language Model Training
Theoretical Analysis of Direct Preference Optimization
Bypassing the Reward Model: A New RLHF Paradigm
How AI Learns from Human Preferences
Simplifying AI Training: Direct Preference Optimization vs. Traditional RL
Writing a Rust compiler in C
Direct Preference Optimization: Your Language Model is Secretly a Reward Model
Martin Shkreli must surrender his Wu-Tang album copies
Former pharmaceutical executive Martin Shkreli must turn over his copies of Wu-Tang Clan’s Once Upon a Time in Shaolin album to comply with a preliminary injunction issued by federal Judge Pamela Chen in an ongoing lawsuit, ArtNet reported on Friday. PleasrDAO, an NFT collective and current owner of Shaolin …
Pi Pico 2 Extreme Teardown
This Week in Crypto Games: ‘Catizen’ Airdrop With HashKey, ‘Ragnarok’ Ronin Beta
Catch up on this week’s biggest crypto and NFT gaming news and find some weekend reads in our latest roundup.
MacOS X Malware Development
Telegram issues official statement on Pavel Durov detention
The Telegram team disputes reports that Durov had reason to avoid traveling within Europe.
Forgotten Runiverse Open Beta Billed as ‘Coming Out Party’ for Ethereum Franchise
The co-founder of Forgotten Runes believes the Ronin-based MMORPG will help push the project to become a “global franchise.”