This is what happens when you let AIs debate
Machine Learning Street Talk

Published Sep 27, 2024

Akbir Khan, AI researcher and winner of an ICML 2024 Best Paper award, discusses his work on AI alignment, debate techniques for eliciting truthful AI responses, and the future of artificial intelligence.

Key points discussed:
Using debate between language models to improve truthfulness in AI responses
Scalable oversight for supervising AI models beyond human-level intelligence
The relationship between intelligence and agency in AI systems
Challenges in AI safety and alignment
The potential for a Cambrian explosion in human-like intelligent systems

The discussion also explores broader topics:
The wisdom of crowds vs. expert knowledge in machine learning debates
Deceptive alignment and reward tampering in AI systems
Open-ended AI systems and their implications for development and safety
The space of possible minds and defining superintelligence
Cultural evolution and memetics in understanding intelligence

Akbir Khan:
https://x.com/akbirkhan
https://akbir.dev/

Show notes and transcript: https://www.dropbox.com/scl/fi/sjekiv...

TOC (* marks the best bits)
00:00:00 1. Intro: AI alignment and debate techniques for truthful responses *
00:05:00 2. Scalable oversight and hidden information settings
00:10:05 3. AI agency, intelligence, and progress *
00:15:00 4. Base models, RL training, and instrumental goals
00:25:11 5. Deceptive alignment and RL challenges in AI *
00:30:12 6. Open-ended AI systems and future directions
00:35:34 7. Deception, superintelligence, and the space of possible minds *
00:40:00 8. Cultural evolution, memetics, and intelligence measurement

References:
1. [00:00:40] Akbir Khan et al. ICML 2024 Best Paper: "Debating with More Persuasive LLMs Leads to More Truthful Answers"
https://arxiv.org/html/2402.06782v3

2. [00:03:28] Yann LeCun on machine learning debates
Yann LeCun - A Path Towards Autonomou...

3. [00:06:05] OpenAI's Superalignment team
https://openai.com/index/introducing-...

4. [00:08:10] Sam Bowman on scalable oversight in AI systems
https://arxiv.org/abs/2211.03540

5. [00:10:35] Sam Bowman on the sandwich protocol
https://www.alignmentforum.org/posts/...

6. [00:14:35] Janus' article on "Simulators" and LLMs
https://www.lesswrong.com/posts/vJFdj...

7. [00:16:35] Thomas Suddendorf's book "The Gap: The Science of What Separates Us from Other Animals"
https://www.amazon.in/GAP-Science-Sep...

8. [00:19:10] DeepMind on responsible AI
https://deepmind.google/about/respons...

9. [00:20:50] Technological singularity
https://en.wikipedia.org/wiki/Technol...

10. [00:21:30] Eliezer Yudkowsky on FOOM (Fast takeoff)
https://intelligence.org/files/AIFoom...

11. [00:21:45] Sammy Martin on recursive self-improvement in AI
https://www.alignmentforum.org/posts/...

12. [00:24:25] LessWrong community
https://www.lesswrong.com/

13. [00:24:35] Nora Belrose on AI alignment and deception
https://www.lesswrong.com/posts/YsFZF...

14. [00:25:35] Evan Hubinger on deceptive alignment in AI systems
https://www.lesswrong.com/posts/zthDP...

15. [00:26:50] Anthropic's article on reward tampering in language models
https://www.anthropic.com/research/re...

16. [00:32:35] Kenneth Stanley's work on open-endedness in AI
https://www.amazon.co.uk/Why-Greatnes...

17. [00:34:58] Ryan Greenblatt, Buck Shlegeris et al. on AI safety protocols
https://arxiv.org/pdf/2312.06942

18. [00:37:20] Aaron Sloman's concept of 'the space of possible minds'
https://www.cs.bham.ac.uk/research/pr...

19. [00:38:25] François Chollet on defining and measuring intelligence in AI
https://arxiv.org/abs/1911.01547

20. [00:42:30] Richard Dawkins on memetics
https://www.amazon.co.uk/Selfish-Gene...

21. [00:42:45] Jonathan Cook et al. on Artificial Generational Intelligence
https://arxiv.org/abs/2406.00392

22. [00:45:00] Peng on determinants of cryptocurrency pricing
https://www.emerald.com/insight/conte...
