This post's goal is to distill our takeaways from building a research team (somewhat) from scratch over the past four months. We give some context about our team and how it came about, then share the lessons we learned.
Since AI safety is becoming more and more entrepreneurial, we hope this is helpful for others trying to do the same.
1. The team
We're a new alignment research team within Arcadia Impact, based in London. We’re a team of 8, working closely with members of the UK AISI alignment team. We currently have three main projects:
Understanding model motivations. This currently looks like:
Trying to generate documents which fully describe a model's behaviour (given just its behaviour).
Producing an open analysis of alignment training techniques and ways this training could go wrong.
Doing scalable oversight for alignment. This includes validating debate protocols in practice and then trying to apply them to fuzzy alignment-relevant tasks.
Building pipelines for doing automated alignment research.
We're also hiring for two roles! More on this at the bottom.
2. Context about how the team came about
The rest of this post is written from the perspective of Andrew Draganov (research lead & current [...]
---
Outline:
(00:33) 1. The team
(01:29) 2. Context about how the team came about
(04:13) 3. Lessons learned
(04:25) 3.1. Hiring
(06:36) 3.2. Networking
(09:13) 3.3. Trying to build a good team culture
(11:17) Interested in working with us?
The original text contained 1 footnote which was omitted from this narration.