Brian Naughton | Sat 08 March 2025 | biotech | biotech ai

I have written about protein binder design a few times now (the Adaptyv competition; a follow up). Corin Wagen recently wrote a great piece about protein–ligand binding. The purpose of this post is to review how well protein binder design is working today, and to point out some interesting differences in model performance that I do not understand.

Protein design

There are two major types of protein design:

  1. Design a sequence to perform some task: e.g., produce a sequence that improves upon some property of the protein
  2. Design a structure to perform some task: e.g., produce a protein structure that binds another protein

There is spillover between these two classes but I think it's useful to split them this way.

Sequence models

Sequence models include open-source models like the original ESM2, ProSST, SaProt, and semi-open or fully proprietary models from EvolutionaryScale (ESM3), OpenProtein (PoET-2), and Cradle Bio. The ProteinGym benchmark puts ProSST, PoET-2 and SaProt up near the top.

Many of the recent sequence-based models now also include structure information, represented as a parallel sequence, with one "structure token" per amino acid. This addition seems to improve performance quite a lot, allows sequence models to make use of the PDB, and — analogously to Vision Transformers — blurs the line between sequence and structure models.

SaProt uses a FoldSeek-derived alphabet to encode structural information

The most basic use-case for sequence models is probably improving the stability of a protein. You can take a protein sequence, make whatever edits your model deems high likelihood, and this should produce a sequence that retains the same fold, but is more "canonical", and so may have improved stability too.
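As a toy sketch of this recipe: score every single-site substitution with a masked language model and keep the ones the model prefers to wild type by some margin. Here `mlm_logprob` is a deterministic mock standing in for a real model such as ESM2, and the sequence is an arbitrary fragment:

```python
import zlib

AAS = "ACDEFGHIKLMNPQRSTVWY"

def mlm_logprob(seq: str, pos: int, aa: str) -> float:
    """Mock masked-LM score for residue `aa` at masked position `pos`.
    A real implementation would query, e.g., ESM2 with the position masked."""
    key = f"{seq[:pos]}|{seq[pos + 1:]}|{aa}"
    return (zlib.crc32(key.encode()) % 1000) / 1000.0

def propose_stabilizing_edits(seq: str, margin: float = 0.2):
    """Keep substitutions the model scores well above the wild-type residue."""
    edits = []
    for i, wt in enumerate(seq):
        wt_score = mlm_logprob(seq, i, wt)
        for aa in AAS:
            if aa != wt and mlm_logprob(seq, i, aa) > wt_score + margin:
                edits.append((i, wt, aa))
    return edits

edits = propose_stabilizing_edits("NSDSECPLSHDGYCL")
print(f"{len(edits)} candidate stabilizing edits")
```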

An elaboration of this experiment is to find some data, e.g., thermostability for a few thousand proteins, and fine-tune the original language model to be able to predict that property. SaProtHub makes this essentially push-button.

A further elaboration is doing active learning, where you propose edits using your model, generate empirical data for these edits (e.g., binding affinity), and go back and forth, hopefully improving performance each iteration. For example, EVOLVEpro, Nabla Bio's JAM (which also uses structure), and Prescient's Lab-in-the-loop. These systems can be complex, but can also be as simple as running regressions on the output of the sequence models.

EVOLVEpro's learning loop
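The loop above can be sketched end to end. Everything here is a stand-in: `assay` mocks a wet-lab measurement, `feature` mocks a model embedding, and the surrogate is a one-variable least-squares fit, as a minimal example of "running regressions on the output of the sequence models":

```python
import random

random.seed(0)
AAS = "ACDEFGHIKLMNPQRSTVWY"

def assay(seq: str) -> float:
    """Stand-in for a wet-lab measurement (e.g., binding affinity)."""
    return sum(1.0 for aa in seq if aa in "FWY") + random.gauss(0, 0.1)

def feature(seq: str) -> float:
    """One crude feature; in practice this would be a model embedding."""
    return sum(1.0 for aa in seq if aa in "FWYLIV")

def fit_line(xs, ys):
    """Ordinary least squares for y = a*x + b."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    a = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / \
        (sum((x - mx) ** 2 for x in xs) or 1.0)
    return a, my - a * mx

def mutate(seq: str) -> str:
    i = random.randrange(len(seq))
    return seq[:i] + random.choice(AAS) + seq[i + 1:]

seq = "NSDSECPLSHDGYCL"
tested = {seq: assay(seq)}
for _ in range(3):
    a, b = fit_line([feature(s) for s in tested], list(tested.values()))
    candidates = [mutate(seq) for _ in range(50)]
    best = max(candidates, key=lambda s: a * feature(s) + b)  # propose
    tested[best] = assay(best)                                # measure
    seq = max(tested, key=tested.get)                         # refit from the best

print("best measured:", round(max(tested.values()), 2))
```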

Sequence-based models are a natural fit to these kinds of problems, since you can easily edit the sequence but maintain the same fold and function. Profluent and other companies make use of this ability by producing patent-unencumbered sequences like OpenCRISPR.

This is especially enabling for the biosimilars industry. Many biologics patents protect the sequence by setting amino acid identity thresholds. For example, in the Herceptin/trastuzumab patent they protect any sequence >=85% identical to the heavy (SEQ ID NO: I) or light chain (SEQ ID NO: II).

Excerpt from the main trastuzumab patent
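Checking a design against an identity threshold like this is nearly a one-liner. The fragments below are made up for illustration (not the actual SEQ ID sequences), and real comparisons would use a proper alignment:

```python
def percent_identity(a: str, b: str) -> float:
    """Ungapped identity between two equal-length (pre-aligned) sequences.
    Real comparisons would first align (e.g., Needleman-Wunsch)."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to equal length")
    return 100.0 * sum(x == y for x, y in zip(a, b)) / len(a)

# Made-up antibody-like fragments, purely for illustration
wt  = "EVQLVESGGGLVQPGG"
var = "EVQLVESGGGLVQPGA"
print(percent_identity(wt, var))  # 93.75 -> still inside an 85% claim
```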

Patent attorneys will layer as many other protections on top of this as they can think of, but the sequence of the antibody is the primary IP. (Tangentially, it is insane how patents always give examples of numbers greater than X. Hopefully, the AIs that will soon be writing patents won't do this.)

For binder design, sequence models appear to have limits. Naively, since you do not know the positions of the atoms, you would assume binder design should be difficult unless you are aping known interaction motifs.

Diego del Alamo points out apparent limits in the performance of sequence models for antibody design

Structural models

Structural models include the original RFdiffusion and the recently released antibody variant RFantibody from the Baker lab, RSO from the ColabDesign team, BindCraft, EvoBind2, foldingdiff from Microsoft, and models from startups like Generate Biomedicines (Chroma), Chai Discovery, and Diffuse Bio. (Some of these tools are available on my biomodals repo).

Structural models are trained on both sequence data (e.g., UniRef) and structure data (PDB), but they deal in atom co-ordinates instead of amino acid strings. That difference means diffusion-style models dominate here over the discrete-token–focused transformers.

There are two major classes of structural models:

  • Diffusion models like RFdiffusion and RFantibody
  • AlphaFold2-based models like BindCraft, RSO, and EvoBind2

The success rates of RFdiffusion and RFantibody are not great. For some targets they achieve a >1% success rate (if we define success as finding a <1µM binder), but in other cases they nominate thousands of designs and find no strong binder.

An example from the RFantibody paper showing a low success rate
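Assuming each design succeeds independently with probability p (optimistic, since designs are correlated), the number you need to screen for a likely hit follows directly from the binomial distribution:

```python
import math

def designs_needed(p_success: float, confidence: float = 0.95) -> int:
    """How many independent designs to test to see at least one success
    with the given confidence: solve 1 - (1-p)^n >= confidence for n."""
    return math.ceil(math.log(1 - confidence) / math.log(1 - p_success))

for p in (0.01, 0.1, 0.5):
    print(f"p={p}: test {designs_needed(p)} designs")  # 299, 29, 5
```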

BindCraft and RSO are two similar methods that produce minibinders (small-ish non-antibody–based proteins) and rely on inverting AlphaFold2 to turn structure into sequence. EvoBind2 produces cyclic or linear peptides, and also relies heavily on an AlphaFold confidence metric (pLDDT) as part of its loss.

BindCraft (top) and EvoBind2 (bottom) have similar loss functions that rely on AF2's pLDDT and intermolecular contacts

Even though these AF2-based models work very well, one non-obvious catch is that you cannot take a binding pose and get AlphaFold2 to evaluate it. These models can generate binders, but not discriminate binders from non-binders. In the EvoBind2 paper, they found that "No in silico metric separates true from false binders", which means the problem is a bit more complex than just "ask AF2 if it looks good".

According to the AF2Rank paper, the AF2 model has a good model of the physics of protein folding, but may not find the global minimum. The MSAs' job is to help focus that search. This was surprising to me! The protein folding/binding problem is more of a search problem than I realized, which means more compute should straightforwardly improve performance by simply doing more searching. This is also evidenced by the AlphaFold 3 paper, where re-folding antibodies 1000 times led to improved prediction quality.

Excerpt from the AF2Rank paper (top), and a tweet from Sergey Ovchinnikov (bottom) explaining the primacy of sequence data in structure prediction
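The refold-many-times idea is just best-of-N sampling on a confidence metric. A sketch with a mock folding function (a real version would call AlphaFold with different seeds and rank by its confidence outputs):

```python
import random

def fold_once(seq: str, seed: int):
    """Stand-in for one stochastic folding run (e.g., a different AlphaFold
    seed); the mock ignores the sequence and returns a random confidence."""
    return f"structure-{seed}", random.Random(seed).random()

def best_of_n(seq: str, n: int):
    """Re-fold n times and keep the prediction with the best confidence."""
    return max((fold_once(seq, s) for s in range(n)), key=lambda r: r[1])

structure, conf = best_of_n("NSDSECPLSHDGYCL", 1000)
print(conf >= best_of_n("NSDSECPLSHDGYCL", 10)[1])  # True: more seeds never hurt the max
```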

RFdiffusion/RFantibody vs BindCraft/EvoBind2

The main comparison I wanted to make in this post is between RFdiffusion/RFantibody and BindCraft/EvoBind2.

These are all recently released, state-of-the-art models from top labs. However, the difference in claimed performance is pretty striking.

While the RFdiffusion and RFantibody papers caution that you may need to test hundreds or even thousands of proteins to find one good binder, the BindCraft and EvoBind2 papers appear to show very high success rates, perhaps even as high as 50%. (EvoBind2 only shows results for one ribonuclease target but BindCraft includes multiple).

Words of caution from the RFantibody github repo (top) and BindCraft's impressive results for 10 targets (bottom)

There is no true benchmark to reference here, but I think under reasonable assumptions, BindCraft (and arguably EvoBind2) achieve a >10X greater success rate than RFdiffusion or RFantibody. The Baker lab is the leading and best resourced lab in this domain, so what accounts for this large difference in performance? I can think of a few possibilities:

  • RoseTTAFold2 was not the best filter for RFantibody to use, and switching to AlphaFold3 would improve performance. This is plausible, but it is hard to believe that accounts for a 10X difference.
  • Antibodies are just harder than minibinders or cyclic peptides. Hypervariable regions are known to be difficult to fold, since they do not have the advantage of evolutionary conservation. However, RFdiffusion also produces minibinders, so this is not a satisfactory explanation.
  • BindCraft and EvoBind2 are testing on easier targets. There is likely some truth to this. Most (but not all) examples in the BindCraft paper are for proteins with known binders; EvoBind2 is only tested against a target with a known peptide binder. However, most of RFantibody's targets also have known antibodies in PDB.
  • Diffusion currently just does not work as well as AlphaFold-based methods. AlphaFold2 (and its descendants, AF3, Boltz, Chai-1, etc.) have learned enough physics to recognize binding, and by leaning on this ability heavily, and filtering carefully, you get much better performance.

What comes next?

RFdiffusion and RFantibody are arguably the first examples of successful de novo binder design and antibody design, and for that reason are important papers. BindCraft and EvoBind2 have proven they can produce one-shot nanomolar binders under certain circumstances, which is technically extremely impressive.

However, if we could get another 10X improvement in performance, then I think these tools would be used in every biotech and research lab. Some ideas for future directions:

  • More compute: One of the interesting things about BindCraft and EvoBind2 is how long they take to produce anything. In BindCraft's case, it generates a lot of candidates, but has a long list of criteria that must be met. One BindCraft run will screen hundreds or thousands of candidates and can easily cost $10+. Similarly, EvoBind2 can run for 5+ hours before producing anything, again easily costing $10+. This approach of throwing compute at the problem may be generally applicable, and may be analogous to the recently successful LLM reasoning approaches.
  • Combine diffusion and AlphaFold-based methods: I have no specific idea here, but since they are quite different approaches, maybe integrating some ideas from RFdiffusion into EvoBind2 or BindCraft could help.
  • Combine sequence models and structure models: There is already a lot of work happening here, both from the sequence side and structure side. In the simplest case, the output of a sequence model like ESM2 could be an independent contributor to the loss of a structure model. At the very least, this could help filter out structures that do not fold.
  • Neural Network Potentials: Neural Network Potentials are an exciting new tool for molecular dynamics (see Duignan, 2024 or Barnett, 2024). Achira just got funded to work on this, and has several of the pioneers of the field on board. Semi-open source models like orb-v2 from Orbital Materials are being actively developed too. The amount of compute required is prohibitive right now, but even a short trajectory could plausibly help with rank ordering binders, and would be independent of the AF2 metrics.

Tweet from Tim Duignan at Orbital Materials

Brian Naughton | Sat 07 September 2024 | biotech | biotech ai llm

Adaptyv is a newish startup that sells high-throughput protein assays. The major innovations are that (a) they tell you the price (a big innovation for biotech services!) and (b) you only have to upload protein sequences, and you get results in a couple of weeks.

A typical Adaptyv workflow might look like the following:

  • Design N protein binders for a target of interest (Adaptyv has 50-100 pre-specified targets)
  • Submit your binder sequences to Adaptyv
  • Adaptyv synthesizes DNA, then protein, using your sequences
  • In ~3 weeks you get affinity measurements for each design at a cost of $149 per data-point

This is an exciting development since it decouples "design" and "evaluation" in a way that enables computation-only startups to get one more step towards a drug (or sensor, or tool). There are plenty of steps after this one, but it's still great progress!

The Adaptyv binder design competition

A couple of months ago, Adaptyv launched a binder design competition, where the goal was to design an EGFR binder. There was quite a lot of excitement about the competition on Twitter, and about 100 people ended up entering. At around the same time, Leash Bio launched a small molecule competition on Kaggle, so there was something in the air.

PAE and iPAE

For this competition, Adaptyv ranked designs based on the "PAE interaction" (iPAE) of the binder with EGFR.

PAE (Predicted Aligned Error) "indicates the expected positional error at residue x if the predicted and actual structures are aligned on residue y". iPAE is the average PAE for residues in the binder vs target. In other words, how accurate do we expect the relative positioning of binder and target to be? This is a metric that David Baker's lab seems to use quite a bit, at least for thresholding binders worth screening. It is straightforward to calculate using the PAE outputs from AlphaFold.
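A minimal implementation of iPAE as described above: average the predicted aligned error over the two inter-chain blocks of the PAE matrix. This mirrors the common definition; the exact implementation in the referenced repo may differ in details:

```python
def ipae(pae, binder_len):
    """Mean PAE over the inter-chain block of a (binder+target) complex.
    `pae` is the full NxN predicted-aligned-error matrix from AlphaFold,
    with the binder occupying the first `binder_len` rows/columns."""
    n = len(pae)
    cross = [pae[i][j]
             for i in range(n) for j in range(n)
             if (i < binder_len) != (j < binder_len)]
    return sum(cross) / len(cross)

# Toy 4-residue complex: 2-residue binder + 2-residue target
pae = [[0, 1, 8, 9],
       [1, 0, 7, 6],
       [8, 7, 0, 2],
       [9, 6, 2, 0]]
print(ipae(pae, 2))  # mean of the two off-diagonal 2x2 blocks -> 7.5
```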

Unlike, say, a Kaggle competition, there are no held-out data that your model is evaluated on in this competition. Instead, if you can calculate iPAE, you know your expected position on the leaderboard before submitting.

The original paper Adaptyv reference is Improving de novo protein binder design with deep learning, and the associated github repo has an implementation of iPAE that I use (and, I assume, the code Adaptyv uses).

Confusingly, there is also a metric called "iPAE" mentioned in the paper Systematic discovery of protein interaction interfaces using AlphaFold and experimental validation. It is different, but could actually be a more appropriate metric for binders?

At the end of last month (August 2024), there was a new Baker lab paper on Ras binders that also used iPAE, in combination with a few other metrics like pLDDT.

Experiments

A week or so after the competition ended, I found some time to try a few experiments.

Throughout these experiments, I include modal commands to run the relevant software. If you clone the biomodals repo it should just work(?)

iPAE vs Kd

The consensus seems to be that <10 represents a decent iPAE, but in order for iPAE to be useful, it should correlate with some physical measurement. As a small experiment, I took 55 PDB entries from PDBbind (out of ~100 binders that were <100 aas long, had an associated Kd, and only two chains), ran AlphaFold, calculated iPAE, and correlated this to the known Kd. I don't know that I really expected iPAE to correlate strongly with Kd, but the correlation turns out to be pretty weak.

PDBbind Kd vs iPAE correlation

# download the PDBbind protein-protein dataset in a more convenient format and run AlphaFold on one example
wget https://gist.githubusercontent.com/hgbrian/413dbb33bd98d75cc5ee6054a9561c54/raw -O pdbbind_pp.tsv
tail -1 pdbbind_pp.tsv
wget https://www.rcsb.org/fasta/entry/6har/display -O 6har.fasta
printf ">6HAR\nYVDYKDDDDKEFEVCSEQAETGPCRACFSRWYFDVTEGKCAPFCYGGCGGNRNNFDTEEYCMAVCGSAIPRHHHHHHAAA:IVGGYTCEENSLPYQVSLNSGSHFCGGSLISEQWVVSAAHCYKTRIQVRLGEHNIKVLEGNEQFINAAKIIRHPKYNRDTLDNDIMLIKLSSPAVINARVSTISLPTAPPAAGTECLISGWGNTLSFGADYPDELKCLDAPVLTQAECKASYPGKITNSMFCVGFLEGGKDSCQRDAGGPVVCNGQLQGVVSWGHGCAWKNRPGVYTKVYNYVDWIKDTIAANS\n" > 6har_m.fasta
modal run modal_alphafold.py --input-fasta 6har_m.fasta --binder-len 80
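To quantify the relationship, rank correlation is a reasonable choice since Kd spans orders of magnitude. A self-contained Spearman implementation, with made-up (iPAE, Kd) pairs for illustration:

```python
def ranks(xs):
    """Average ranks (1-based), with ties sharing their mean rank."""
    order = sorted(range(len(xs)), key=lambda i: xs[i])
    r = [0.0] * len(xs)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and xs[order[j + 1]] == xs[order[i]]:
            j += 1
        for k in range(i, j + 1):
            r[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return r

def spearman(xs, ys):
    """Spearman rank correlation = Pearson correlation of the ranks."""
    rx, ry = ranks(xs), ranks(ys)
    n = len(xs)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sum((a - mx) ** 2 for a in rx) ** 0.5
    sy = sum((b - my) ** 2 for b in ry) ** 0.5
    return cov / (sx * sy)

# Hypothetical (iPAE, Kd) pairs, invented for illustration
ipae_scores = [4.2, 6.1, 8.0, 9.5, 12.3, 15.0]
kd_nm = [30, 12, 90, 45, 400, 150]
print(round(spearman(ipae_scores, kd_nm), 2))  # 0.83
```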

Greedy search

This is about the simplest approach possible.

  • Start with EGF (53 amino acids)
  • Mask every amino acid, and have ESM propose the most likely amino acid
  • Fold and calculate iPAE for the top 30 options
  • Take the best scoring iPAE and iterate

Each round takes around 5-10 minutes and costs around $4 on an A10G on modal.

# predict one masked position in EGF using esm2
printf ">EGF\nNSDSECPLSHDGYCL<mask>DGVCMYIEALDKYACNCVVGYIGERCQYRDLKWWELR\n" > esm_masked.fasta
modal run modal_esm2_predict_masked.py --input-fasta esm_masked.fasta
# run AlphaFold on the EGF/EGFR complex and calculate iPAE
printf ">EGF\nNSDSECPLSHDGYCLHDGVCMYIEALDKYACNCVVGYIGERCQYRDLKWWELR:LEEKKVCQGTSNKLTQLGTFEDHFLSLQRMFNNCEVVLGNLEITYVQRNYDLSFLKTIQEVAGYVLIALNTVERIPLENLQIIRGNMYYENSYALAVLSNYDANKTGLKELPMRNLQEILHGAVRFSNNPALCNVESIQWRDIVSSDFLSNMSMDFQNHLGSCQKCDPSCPNGSCWGAGEENCQKLTKIICAQQCSGRCRGKSPSDCCHNQCAAGCTGPRESDCLVCRKFRDEATCKDTCPPLMLYNPTTYQMDVNPEGKYSFGATCVKKCPRNYVVTDHGSCVRACGADSYEMEEDGVRKCKKCEGPCRKVCNGIGIGEFKDSLSINATNIKHFKNCTSISGDLHILPVAFRGDSFTHTPPLDPQELDILKTVKEITGFLLIQAWPENRTDLHAFENLEIIRGRTKQHGQFSLAVVSLNITSLGLRSLKEISDGDVIISGNKNLCYANTINWKKLFGTSGQKTKIISNRGENSCKATGQVCHALCSPEGCWGPEPRDCVSCRNVSRGRECVDKCNLLEGEPREFVENSECIQCHPECLPQAMNITCTGRGPDNCIQCAHYIDGPHCVKTCPAGVMGENNTLVWKYADAGHVCHLCHPNCTYGCTGPGLEGCPTNGPKIPSI\n" > egf_01.fasta
modal run modal_alphafold.py --input-fasta egf_01.fasta --binder-len 53
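The greedy loop itself is simple; the expense is entirely in the fold-and-score step. A sketch with deterministic mocks in place of ESM and AlphaFold:

```python
import zlib

AAS = "ACDEFGHIKLMNPQRSTVWY"

def fold_and_ipae(seq: str) -> float:
    """Stand-in for the expensive step: run AlphaFold on binder:target
    and compute iPAE. Mocked deterministically here; lower is better."""
    return (zlib.crc32(seq.encode()) % 1000) / 100.0

def esm_top_substitutions(seq: str, k: int = 30):
    """Stand-in for ESM-proposed single mutants; a real run would rank
    all substitutions by masked-LM likelihood and keep the top k."""
    muts = [seq[:i] + aa + seq[i + 1:]
            for i in range(len(seq)) for aa in AAS if aa != seq[i]]
    return muts[:k]

def greedy_round(seq: str):
    """Fold the top-k proposals and keep the best-scoring mutant."""
    best = min(esm_top_substitutions(seq), key=fold_and_ipae)
    return best, fold_and_ipae(best)

seq = "NSDSECPLSHDGYCLHDGVCMYIEALDKYACNCVVGYIGERCQYRDLKWWELR"
for r in range(3):
    # a real run would only accept improvements, and can get stuck
    seq, score = greedy_round(seq)
    print(f"round {r}: iPAE={score:.2f}")
```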

One of the stipulations of the competition is that your design must be at least 10 amino acids different to any known binder, so you must run the loop above 10 or more times. Of course, there is no guarantee that there is a single amino acid change that will improve the score, so you can easily get stuck.

After 12 iterations (at a cost of around $50 in AlphaFold compute), the best score I got was 7.89, which would have been good enough to make the top 5. (I can't be sure, but I think my iPAE calculation is identical!) Still, this is really just brute-forcing EGF tweaks. I think the score was asymptoting, but there were also jumps in iPAE with certain substitutions, so who knows?

Unfortunately, though the spirit of the competition was to find novel binders, the way iPAE works means that the best scores are very likely to come from EGF-like sequences (or other sequences in AlphaFold's training set).

Adaptyv are attempting to mitigate this issue by (a) testing the top 200 and (b) taking the design process into account. It is a bit of an impossible situation, since the true wet lab evaluation happens only after the ranking step.

Bayesian optimization

Given an expensive black box like AlphaFold + iPAE, some samples, and a desire to find better samples, one appropriate method is Bayesian optimization.

Basically, this method allows you, in a principled way, to control how much "exploration" of new space is appropriate (looking for global minima) vs "exploitation" of variations on the current best solutions (optimizing local minima).

Bayesian optimization of a 1D function
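As a toy version of this explore/exploit trade-off: Thompson sampling over a discrete 1-D candidate set with conjugate normal updates. The black-box function and noise level are invented for illustration:

```python
import random

random.seed(0)

def true_score(x: float) -> float:
    """Stand-in for the expensive black box (e.g., AlphaFold + iPAE),
    observed only through noisy evaluations. Peak is at x = 0.3."""
    return -(x - 0.3) ** 2

candidates = [i / 20 for i in range(21)]    # a toy 1-D design space
post = {x: (0.0, 1.0) for x in candidates}  # N(mean, var) prior per candidate
NOISE = 0.01                                # known observation noise variance

for step in range(30):
    # Explore/exploit via Thompson sampling: draw one plausible score
    # per candidate from its posterior, then evaluate the argmax.
    draws = {x: random.gauss(m, v ** 0.5) for x, (m, v) in post.items()}
    x = max(draws, key=draws.get)
    y = true_score(x) + random.gauss(0, NOISE ** 0.5)
    m, v = post[x]                          # conjugate normal update
    v_new = 1 / (1 / v + 1 / NOISE)
    m_new = v_new * (m / v + y / NOISE)
    post[x] = (m_new, v_new)

best = max(post, key=lambda x: post[x][0])
print("best candidate by posterior mean:", best)
```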

The input to a Bayesian optimization is of course not amino acids, but numbers, so I thought reusing the ESM embeddings would be a decent, or at least convenient, idea here.

I tried both the Bayesian Optimization package and a numpyro Thompson sampling implementation. I saw some decent results at first (i.e., the first suggestions seemed reasonable and scored well), but I got stuck either proposing the same sequences over and over, or proposing sequences so diverged that testing them would be a waste of time. The total search space is gigantic, so testing random sequences will not help. I think probably the ESM embeddings were not helping me here, since there were a lot of near-zeros in there.

This is an interesting approach, and not too difficult to get started with, but I think it would work better with much deeper sampling of a smaller number of amino acids, or perhaps a cruder, less expensive, evaluation function.

ProteinMPNN

ProteinMPNN (now part of the LigandMPNN package) maps structure to sequence (i.e., the inverse of AlphaFold). For example, you can input an EGF PDB file, and it will return a sequence that should produce the same fold.

I found that for this task ProteinMPNN generally produced sequences with low confidence (as reported by ProteinMPNN), and as you'd expect, these resulted in poor (high) iPAE scores. Some folds are difficult for ProteinMPNN, and I think EGF falls into this category. To run ProteinMPNN, I would recommend Simon Duerr's huggingface space, since it has a friendly interface and includes an AlphaFold validation step.

ProteinMPNN running on huggingface


# download an EGF/EGFR crystal structure and try to infer a new sequence that folds to chain C (EGF)
wget https://files.rcsb.org/download/1IVO.pdb
modal run modal_ligandmpnn.py --input-pdb 1IVO.pdb --extract-chains AC --params-str '--seed 1 --checkpoint_protein_mpnn "/LigandMPNN/model_params/proteinmpnn_v_48_020.pt" --chains_to_design "C" --save_stats 1 --batch_size 5 --number_of_batches 100'

RFdiffusion

RFdiffusion was the first protein diffusion method that showed really compelling results in generating de novo binders. I would recommend ColabDesign as a convenient interface to this and other protein design tools.

The input to RFdiffusion can be a protein fold to copy, or a target protein to bind to, and the output is a PDB file with the correct backbone co-ordinates, but with every amino acid labeled as Glycine. To turn this output into a sequence, this PDB file must then be fed into ProteinMPNN or similar. Finally, that ProteinMPNN output is typically folded with AlphaFold to see if the fold matches.

Although RFdiffusion massively enriches for binders over random peptides, you still have to screen many samples to find the really strong binders. So, it's probably optimistic to think that a few RFdiffusion-derived binders will show strong binding, even if you can somehow get a decent iPAE.

In my brief tests with RFdiffusion here, I could not generate anything that looked reasonable. I think in practice, the process of using RFdiffusion successfully is quite a bit more elaborate and heuristic-driven than anything I was going to attempt.

Figure 1 from De novo design of Ras isoform selective binders, showing multiple methods for running RFdiffusion

# Run RFdiffusion on the EGF/EGFR crystal structure, and diffuse a 50-mer binder against chain A (EGFR)
modal run modal_rfdiffusion.py --contigs='A:50' --pdb="1IVO"

Other things

A few other strategies I thought might be interesting:

  • Search FoldSeek for folds similar to EGF. The idea here is that you might find a protein in another organism that wants to bind EGFR. I do find some interesting human-parasitic nematode proteins in here, but decided these were unlikely to be EGFR binders.
  • Search NCBI for EGF-like sequences with blastp. You can find mouse, rat, chimp, etc. but nothing too interesting. The iPAE scores are worse (higher) than for human EGF, as expected.
  • Search the patent literature for EGFR binders. I did find some antibody-based binders, but as expected for folds that AlphaFold cannot solve, the iPAE was poor (high).
  • Delete regions of the protein that contribute high PAE values, to improve the average score. I really thought this would work for at least one or two amino acids, but it did not seem to. I did not do this comprehensively, but perhaps there are no truly redundant parts of this small binder?
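The deletion idea can at least be prioritized computationally: attribute the inter-chain PAE to individual binder residues and target the worst contributors. A sketch using a toy PAE matrix:

```python
def per_residue_cross_pae(pae, binder_len):
    """Average inter-chain PAE attributed to each binder residue
    (both directions of the off-diagonal blocks)."""
    n = len(pae)
    contrib = []
    for i in range(binder_len):
        vals = [pae[i][j] for j in range(binder_len, n)] + \
               [pae[j][i] for j in range(binder_len, n)]
        contrib.append(sum(vals) / len(vals))
    return contrib

# Toy 4-residue complex: 2-residue binder + 2-residue target
pae = [[0, 1, 8, 9],
       [1, 0, 3, 2],
       [8, 3, 0, 1],
       [9, 2, 1, 0]]
contrib = per_residue_cross_pae(pae, 2)
worst = max(range(len(contrib)), key=lambda i: contrib[i])
print(contrib, "-> residue", worst, "contributes most error")
```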

Conclusion

All the top spots on the leaderboard went to Alex Naka, who helpfully detailed his methods in this thread. (A lot of this is similar to what I did above, including using modal!) Anthony Gitter also published an interesting thread on his attempts. I find these kinds of threads are very useful since they give a sense of the tools people are using in practice, including some I had never heard of, like pepmlm and Protrek.

Finally, I made a tree of the 200 designs that Adaptyv is screening (with iPAE <10 in green, <20 in orange, and >20 in red). All the top scoring sequences are EGF-like and cluster together. (Thanks to Andrew White for pointing me at the sequence data). We can look forward to seeing the wet lab results published in a couple of weeks.

Tree of Adaptyv binder designs

Brian Naughton | Mon 04 September 2023 | biotech | biotech machine learning ai

Molecular dynamics (MD) means simulating the forces acting on atoms. In drug discovery, MD usually means simulating protein–ligand interactions. This is clearly a crucial step in modern drug discovery, yet MD remains a pretty arcane corner of computational science.

This is a different problem to docking, where molecules are for the most part treated as rigid, and the problem is finding the right ligand orientation and the right pocket. Since in MD simulations the atoms can move, there are many more degrees of freedom, and so a lot more computation is required. For a great primer on this topic, see Molecular dynamics simulation for all (Hollingsworth, 2018).

What about deep learning?

Quantum chemical calculations, though accurate, are too computationally expensive to use for MD simulation. Instead, "force fields" are used, which enable computationally efficient calculation of the major forces. As universal function approximators, deep nets are potentially a good way to get closer to ground truth.

Analogously, fluid mechanics calculations are very computationally expensive, but deep nets appear to do a good job of approximating these expensive functions.

A deep net approximating Navier-Stokes

Recently, the SPICE dataset (Eastman, 2023) was published, which is a reference dataset that can be used to train deep nets for MD.

We describe the SPICE dataset, a new quantum chemistry dataset for training potentials relevant to simulating drug-like small molecules interacting with proteins. It contains over 1.1 million conformations for a diverse set of small molecules, dimers, dipeptides, and solvated amino acids.

This dataset has enabled new ML force fields like Espaloma (Takaba, 2023) and the recent Allegro paper (Musaelian, 2023), where they simulated a 44 million atom system of an HIV capsid. Interestingly, they scaled their system as high as 5120 A100s (which would cost $10k an hour to run!)


There are also hybrid ML/MM approaches (Rufa, 2020) based on the ANI2x ML force field (Devereux, 2020).

All of this work is very recent, and as I understand it, runs too slowly to replace regular force fields any time soon. Despite MD being a key step in drug development, only a small number of labs (e.g., Chodera lab) appear to work on OpenMM, OpenFF, and the other core technologies here.


Doing an MD simulation

I have only a couple of use-cases in mind:

  • does this ligand bind this protein in a human cell?
  • does this mutation affect ligand binding in a human cell?

Doing these MD simulations is tricky since a lot of background is expected of the user. There are many parameter choices to be made, and sensible options are not obvious. For example, you may need to choose force fields, ion concentrations, temperature, timesteps, and more.

By comparison, with AlphaFold you don't need to know how many recycles to run, or specify how the relaxation step works. You can just paste in a sequence and get a structure. As far as I can tell, there is no equivalent "default" for MD simulations.

A lot of MD tutorials I have found are geared towards simulating the trajectory of a system for inspection. However, with no specific numerical output, I don't know what to do with these results.

Choosing an approach

There are several MD tools out there for doing protein–ligand simulations, and calculating binding affinities:

  • Schrodinger is the main player in computational drug discovery, with a mature suite of tools. It's not really suitable for me, since it's expensive, geared toward chemists, designed for interactive use over scripting, and not even necessarily cutting-edge.
  • OpenEye also appears to be used a lot, and has close ties to open-source. Like Schrodinger, the tools are high quality, mostly interactive and designed for chemists.
  • HTMD from Acellera is not open-source, but it has a nice quickstart and tutorials.
  • GROMACS is open-source, actively maintained, and has tutorials, but is still a bit overwhelming with a lot of boilerplate.
  • Amber, like GROMACS, has been around for decades. It gets plenty of use (e.g., AlphaFold uses it as a final "relaxation" step), but is not especially user-friendly.
  • OpenMM seems to be where most of the open-source effort has been over the past five years or so, and is the de facto interface for a lot of the recent ML work in MD (e.g., Espaloma). A lot of tools are built on top of OpenMM:
    • yank is a tool for free energy binding calculations. Simulations are parameterized by a yaml file.
    • perses is also used for free energy calculation. It is pre-alpha software but under active development — e.g., this recent paper on protein–protein interaction. (Note, I will not claim to understand the differences between yank and perses!)
    • SEEKR2 is a tool that enables free energy calculation, among other things.
    • Making it rain is a suite of tools and colabs. It is a very well organized repo that guides you through running simulations on the cloud. For example, they include a friendly colab to run protein–ligand simulations. The authors did a great job and I'd recommend this repo broadly.
    • BAT, the Binding Affinity Tool, calculates binding affinity using MD (also see the related GHOAT).

OpenMM quickstart for protein simulation

Since I am not a chemist, I am really looking for a system with reasonable defaults for typical drug development scenarios. I found a nice repo by tdudgeon that appears to have the same goal. It uses OpenMM, and importantly has input from experts on parameters and settings. For example, I'm not sure I would have guessed you can multiply the mass of hydrogen by 4.

This keeps their total mass constant while slowing down the fast motions of hydrogens. When combined with constraints (typically constraints=AllBonds), this often allows a further increase in integration step size.

I forked the repo, with the idea that I could keep the simulation parameters intact but change the interface a bit to make it focused on the problems I am interested in.

Calculating affinity

I am interested in calculating ligand–protein affinity (or binding free energy) — in other words, how well does the ligand bind the protein. There's a lot here I do not understand, but here is my basic understanding of how to calculate affinity:

  • Using MD: This is the most accurate way to measure affinity, but the techniques are challenging. There are "end-point" approaches (e.g., MM/PBSA) and Free Energy Perturbation (FEP) / alchemical approaches. Alchemical free energy approaches are more accurate, and have been widely used for years. (I believe Schrodinger were the first to show accurate results (Wang, 2015).) Still, I found it difficult to figure out a straightforward way to do these calculations.
  • Using a scoring function: This is how most docking programs like vina or gnina work. Docking requires a very fast, but precise, score to optimize.
  • Using a deep net: Recently, several deep nets trained to predict affinity have been published. For example, HAC-Net is a CNN trained on PDBbind. This is a very direct way to estimate binding affinity, and should be accurate if there is enough training data.


The SQM/COSMO docking scoring function (Ajani, 2017)
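Since these approaches report affinity on different scales (Kd, pKd, free energies), a small conversion helper is handy. For example, a pKd of 8.873 corresponds to roughly 1.3 nM:

```python
import math

def pkd_to_kd_nm(pkd: float) -> float:
    """pKd = -log10(Kd in molar); convert back to nanomolar."""
    return 10 ** (-pkd) * 1e9

def kd_to_pkd(kd_nm: float) -> float:
    return -math.log10(kd_nm * 1e-9)

print(round(pkd_to_kd_nm(8.873), 2))  # 1.34 (nM)
```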

Unfortunately, I do not know of a benchmark comparing all the above approaches, so I just tried out a few things.

Predicting cancer drug resistance

One interesting but tractable problem is figuring out if a mutation in a protein will affect ligand binding. For example, if we sequence a cancer genome and see a mutation in a drug target, do we expect that drug will still bind?

There are many examples of resistance mutations evolving in cancer.

Common cancer resistance mutations (Hamid, 2020)

Experiments

BRAF V600E is a famous cancer target. Vemurafenib is a drug that targets V600E, and L505H is known to be a resistance mutation. There is a crystal structure of BRAF V600E bound to Vemurafenib (PDB:3OG7). Can I see any evidence of reduced binding of Vemurafenib if I introduce an L505H mutation?

PDB:3OG7, showing the distance between vemurafenib (cyan) and L505 (yellow)

I ran a simple simulation: starting with the crystal structure, introduce each possible mutation at position 505, allow the protein–ligand system to relax, and check to see if the new protein–ligand interactions are less favorable according to some measure of affinity.
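The scan can be sketched as a loop over all substitutions at the position, with a mocked relax-and-score step in place of the real relax/gnina pipeline (the mock is nearly flat, echoing the null result described below):

```python
import zlib

AAS = "ACDEFGHIKLMNPQRSTVWY"

def relax_and_score(pdb_id: str, mutation: str) -> float:
    """Stand-in for: introduce `mutation` into the crystal structure,
    relax the protein-ligand system, and score affinity (e.g., with gnina).
    Mocked to be nearly flat, echoing the real (null) result."""
    jitter = (zlib.crc32(f"{pdb_id}:{mutation}".encode()) % 100) / 1000.0
    return -9.0 + jitter

# Score every possible substitution at position 505 of 3OG7
scores = {f"L505{aa}": relax_and_score("3OG7", f"L505{aa}") for aa in AAS}
spread = max(scores.values()) - min(scores.values())
print(f"affinity spread across position-505 mutants: {spread:.3f}")
```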

I first used gnina's scoring function, which is fast and should be relatively precise (in order for gnina to work!) The rationale here was that the "obstruction" due to the resistance mutation would be detectable as the new atom positions of the amino acid and ligand would lead to a lower affinity.

Estimated affinity given mutations at position 505 in 3OG7

Nope. The resistance mutation scores a higher affinity (realistically, there are no distinguishable differences for any of the mutations).

We also know that MEK1 V215E acts as a resistance mutation to PD0325901, and the PDB has a crystal structure of MEK1 bound to PD0325901 (PDB:70MX).

Estimated affinity given mutations at position 215 in 70MX

Again, I can't detect any difference in affinity due to the resistance mutation.

HAC-Net

I also tried a deep learning–based affinity calculator, HAC-Net. HAC-Net has a nice colab and is relatively easy to run in a Docker container.

The HAC-Net colab gives me a pKd of 8.873 for 3OG7 (wild-type)

Estimated pKd given mutations at position 505 in 3OG7 using HAC-Net

I still see no difference in affinity with HAC-Net.

Each of these simulations (relaxing a protein–ligand system with solvent present) took a few minutes on a single CPU. If I wanted to simulate a full trajectory, which could be 50 nanoseconds or longer, it would take hundreds or thousands of times as long.


Conclusions

On the one hand, I can run state-of-the-art MD simulations pretty easily with this system. On the other hand, I could not discriminate cancer resistance mutations from neutral mutations.

There are several possible reasons. Most likely, the short relaxation I am doing is insufficient and I need longer simulations. The simulations may also be insufficiently accurate, either intrinsically or due to poor parameterization. It's difficult to feel confident in these kinds of simulations, since there's no simple way to verify anything.

If anyone knows of an obvious fix for the approach here, let me know! The next thing I would probably try is adapting the Making It Rain tools, which include (non-alchemical) free energy calculations. For some reason the Making It Rain colab specifies "This notebook is NOT a standard protocol for MD simulations! It is just simple MD pipeline illustrating each step of a simulation protocol.", which raises the questions: why not, and where is such a notebook?

I do think that enabling anyone to run such simulations — specifically, with default parameters blessed by experts for common use-cases — would be a very good thing.

There are already several cancer drug selection companies like Oncobox, so maybe there should be a company doing this kind of MD for predicting cancer resistance. Maybe there is and I just have not heard of it?

Addendum: modal labs

I have been experimenting with modal labs for running code like this, where there are very specific environment requirements (i.e., painful to install libraries) and heavy CPU/GPU load. Previously, I would have used Docker, which is fundamentally awkward, and still requires manually provisioning compute. Modal can be a joy to use and I'll probably write up a separate blogpost on it.

To do your own simulation (bearing in mind all the failed experiments above!), you can either use my MD_protein_ligand colab or if you have a modal account, clone the MD_protein_ligand repo and run

mkdir -p input && modal run run_simulation_modal.py --pdb-id 3OG7 --ligand-id 032 --ligand-chain A

This basic simulation (including solvent) should cost around 10c on modal. That means we could relax all 5000 protein–ligand complexes in the PDB for around $500, perhaps in just a day or two (depending on how many machines modal allows in parallel). I'm not sure there's any point to that, but it's pretty cool that things can scale up so easily these days!
