Inviting Darwin into Antibody Language Models
Event Details
- Type
- Center for Studies in Physics and Biology Seminars
- Speaker(s)
- Frederick A. Matsen, Ph.D., professor, Fred Hutchinson Cancer Research Center
- Speaker bio(s)
- Antibodies are coded by nucleotide sequences that are generated by V(D)J recombination and evolve according to nucleotide mutation and selection processes. Existing antibody language models, however, focus exclusively on antibodies as strings of amino acids and are fit using the masked language modeling objective. In this talk, I will first show that fitting using this objective implicitly incorporates nucleotide-level processes as part of the protein language model, which degrades performance when predicting functional properties of antibodies. To address this limitation, we propose a new framework: a deep amino acid selection model (DASM) that predicts the selective effect of replacing every amino acid with every alternate amino acid. By fitting selection as a separate term from the mutation process, the DASM exclusively quantifies functional effects. This separation of concerns leads to substantially improved performance on standard functional benchmarks. Moreover, our model is an order of magnitude smaller and orders of magnitude faster to evaluate than existing approaches, as well as being readily interpretable. I will then describe some surprising conclusions about how natural selection works for antibodies: there is more to the story than framework vs. CDRs!
- Open to
- Public
- Phone
- (212) 327-8636
- Sponsor
- Melanie Lee
(212) 327-8636
leem@rockefeller.edu