“Learnings from starting an AI safety research team” by draganover, Erin Robertson - LessWrong (30+ Karma)

This post's goal is to distill our takeaways from building a research team (somewhat) from scratch over the past four months. We give some context about our team and how it came about, then share some lessons learned.

Since AI safety is becoming more and more entrepreneurial, we hope this is helpful for others trying to do the same.

1. The team

We're a new alignment research team within Arcadia Impact, based in London. We’re a team of 8, working closely with members of the UK AISI alignment team. We currently have three main projects:

Understanding model motivations. This currently looks like:
1. Trying to generate documents which fully describe a model's behaviour (given just its behaviour).
2. Producing an open analysis of alignment training techniques and ways this training could go wrong.
Doing scalable oversight for alignment. This includes validating debate protocols in practice and then trying to apply them to fuzzy alignment-relevant tasks.
Building pipelines for doing automated alignment research.

We're also hiring for two roles! More on this at the bottom.

2. Context about how the team came about

The rest of this post is written from the perspective of Andrew Draganov (research lead & current [...]

---

Outline:

(00:33) 1. The team

(01:29) 2. Context about how the team came about

(04:13) 3. Lessons learned

(04:25) 3.1. Hiring

(06:36) 3.2. Networking

(09:13) 3.3. Trying to build a good team culture

(11:17) Interested in working with us?