Now

This page contains what I’m currently up to.

11/04/23

  • I recently joined the dangerous capability evaluations team at OpenAI.

15/10/22

  • Started a new project, dtchess, as part of SERI MATS under the supervision of Evan Hubinger. dtchess is a library I’m writing to train and open-source language models fine-tuned to play chess. The goal is to do mechanistic interpretability on these models and detect interesting properties – such as internal search or optimisation, in the case of chess. For additional information, check out the auditing games for high-level interpretability post on the Alignment Forum.

01/07/22

  • I moved to the SF Bay Area to participate in two research/engineering programmes: the Machine Learning for Alignment Theory Scholarship from the Stanford Existential Risk Initiative, and the ML for Alignment Bootcamp organised by Redwood Research.