Personalized peptides for respiratory viruses

2026-06-28 · biotech · ai, biotech, proteindesign, drugdelivery

The nose is the primary entry point for respiratory infections (hence rhinoviruses), because most of the air you breathe enters through it. The nose has a host of defenses, including the physical barrier of mucus (which also contains antibodies and peptidases) and the production of nitric oxide, a potent antimicrobial.

The other major entry point is the mouth, which has its own defenses, including proteases and antibodies in saliva and the fact that you can swallow viruses. Even so, mouth breathing is a significant risk factor for respiratory infections in children.

Respiratory infections are not treated as seriously as they should be. The flu causes tens of thousands of deaths per year in the US and costs tens or hundreds of billions a year; RSV is the leading cause of infant hospitalizations; long COVID affects millions and globally costs an estimated $1T annually; TB remains one of the world's deadliest infectious agents. Even the "common cold", which is actually infection by any of 200 or so viruses, can cause severe complications like pneumonia.

We are increasingly recognizing the accumulated burden of frequent infections, including connections to Alzheimer's and other neurological diseases, and the profound neurological benefits of vaccination, like the large risk reductions in this 2025 paper from Maggi et al.:

Vaccination against herpes zoster was associated with a reduced risk of any dementia (RR 0.76, 95% CI 0.69–0.83) and Alzheimer’s disease (RR 0.53, 95% CI 0.44–0.64). Influenza vaccination was linked to a reduction in dementia risk (RR 0.87, 95% CI 0.77–0.99), as was pneumococcal vaccination (RR 0.64, 95% CI 0.47–0.87) for Alzheimer’s disease. Tetanus, diphtheria, pertussis (Tdap) vaccination was also associated with a significant reduction for any dementia (RR 0.67, 95% CI 0.54–0.83).
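For reading these numbers: a relative risk (RR) below 1 means lower risk in the vaccinated group, and the percent reduction is simply (1 - RR) × 100. The confidence-interval bounds convert the same way, but swap places. A minimal sketch (the function name is my own, purely illustrative):

```python
def relative_risk_reduction(rr: float, ci_low: float, ci_high: float) -> tuple[float, float, float]:
    """Convert a relative risk and its CI into percent risk reduction, (1 - RR) * 100.

    Illustrative helper (not from the paper). Returns (point, lower bound, upper bound).
    """
    point = (1.0 - rr) * 100.0
    # The upper RR bound gives the *smallest* reduction, so the bounds swap.
    lower = (1.0 - ci_high) * 100.0
    upper = (1.0 - ci_low) * 100.0
    return point, lower, upper


if __name__ == "__main__":
    # Herpes zoster vaccination vs Alzheimer's disease: RR 0.53 (95% CI 0.44-0.64)
    point, lower, upper = relative_risk_reduction(0.53, 0.44, 0.64)
    print(f"{point:.0f}% lower risk (95% CI {lower:.0f}-{upper:.0f}%)")
```

For the zoster/Alzheimer's result this prints "47% lower risk (95% CI 36-56%)".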

We should not be surprised if today's viral load is damaging. The variety and frequency of infections we are now subjected to is an unnatural state for humans; as a species, we are accustomed to living in small tribes with no international travel.

By coincidence, just this week, Stripe launched the Intercept project, "a $500M philanthropic initiative to make respiratory infections, like the common cold and flu, a thing of the past", so at least some people are starting to take the problem seriously.

Nose sprays

This brings me on to one of my favorite subjects: nose sprays! Preventing infectious disease is one of the easiest ways to improve long-term health and longevity. Luckily, nose sprays are a pretty simple, inexpensive, and effective intervention. These sprays primarily act outside your cells, so their risks are very low compared to, e.g., antibiotics.

There are two major types of nose spray: those that act as a physical barrier to prevent entry of viruses, and the more drug-like antimicrobial or antihistamine sprays. The fact that most of the studies below are COVID-specific is just because of the timing of the pandemic and associated funding.

Physical barrier

Salinex and LUCA V-Defense are carrageenan-based sprays with evidence for COVID infection prevention. $20 per bottle.
Profi is a gellan/pectin-based spray with evidence for influenza infection prevention in mice. $25 per bottle.
Xlear is a xylitol-based spray with weak evidence as a COVID treatment. $33 per bottle.

Antimicrobial / antihistamine

NOWONDER is a nitric oxide spray with evidence for treating mild COVID. $40 per bottle.
ViralDine is a povidone iodine spray with evidence for COVID inactivation. $20 per bottle.
AstePro is an antihistamine spray with evidence for COVID infection prevention. $18 per bottle.

Surprisingly, many of the papers above show pretty good evidence. Like masks, the mechanisms of action here are very straightforward.

My personal preference is for the physical barrier sprays. They act as an additional barrier like sunscreen, and appear to be very safe. For example, carrageenan is a GRAS food additive, sometimes used as a vegan alternative to gelatin.

Arguably, the successor to carrageenan sprays is Profi, which essentially builds on the "augmented mucus" concept. Profi has two main advantages over carrageenan: it provides both a physical barrier and pathogen neutralization, and it lasts a claimed eight hours. In 2024, the two professors at Harvard behind Profi published an intriguing study showing complete protection from Influenza A in a mouse model.

Profi acts as a physical barrier and neutralizes pathogens

I currently use Profi a maximum of once per day, but for more protection I would probably recommend Profi in the morning, and maybe NOWONDER nitric oxide spray before bed.

The alternatives

Good evidence and real papers are the exception in the supplement/wellness space. Maybe the nuttiest example is Oscillococcinum, which is somehow both homeopathic and snake oil, yet still gets sold in supermarkets all over the US and Europe.

Despite its insane ingredients, Oscillococcinum had revenue of $15M/yr in the US in 2008

Zicam, a nose spray you can find everywhere for "cold and allergy relief", appears to be exploiting the homeopathy loophole too. The evidence in its favor is weak, and hundreds of lawsuits have been filed against the company alleging loss of sense of smell.

Despite weak evidence and potential anosmia, Zicam has revenue of approximately $100M/yr

A viral infection case study

The children sometimes get respiratory infections at school. The last time this happened was a couple of months ago, and I decided to sequence some saliva to see what the infectious agent was.

Many of the likely culprits are RNA viruses, so sadly you can't just do DNA sequencing; you need metatranscriptomic sequencing.

Zymo has a great service where they will do 30 million paired-end reads of metatranscriptomic sequencing for $375 from an unprocessed sample (e.g., saliva).

Zymo is very amenable to small projects, and processed my single sample. I did need Zymo DNA/RNA shield ($74) to stabilize the RNA, but I had some from a previous project. The sequencing took around six weeks, and the results look exceptionally clean.

Metatranscriptomic results

The metatranscriptomic analysis found a normal, healthy distribution of bacteria.

Top bacterial species

Species	Abundance	Phylum	Seq identity	Genome coverage
Porphyromonas pasteri	8.5%	Bacteroidota	97.4%	93%
Rothia mucilaginosa	5.8%	Actinobacteriota	98.4%	79%
Rothia sp001808955	2.8%	Actinobacteriota	98.1%	56%
Alloprevotella sp015257125	2.6%	Bacteroidota	97.8%	89%
Prevotella melaninogenica	2.2%	Bacteroidota	98.4%	71%
Actinomyces graevenitzii	2.2%	Actinobacteriota	97.1%	78%
Rothia sp015265375	2.2%	Actinobacteriota	98.2%	47%
Rothia mucilaginosa_B	2.2%	Actinobacteriota	98.2%	47%
Capnocytophaga gingivalis	1.8%	Bacteroidota	97.5%	75%
Neisseria perflava	1.5%	Proteobacteria	98.8%	58%
Streptococcus mitis_BB	1.4%	Firmicutes	99.0%	51%
Alloprevotella sp900095835	1.4%	Bacteroidota	98.3%	79%
Bulleidia sp015256775	1.4%	Firmicutes	98.1%	84%
Rothia aeria	1.3%	Actinobacteriota	98.3%	86%
Gemella sanguinis	1.1%	Firmicutes	97.9%	75%

Top viral species

Virus	Abundance	Note
Tomato brown rugose fruit virus	64%	dietary plant virus (tobamovirus)
uncultured phage	27%	bacteriophage
Human metapneumovirus (HMPV)	9%	real respiratory pathogen → target

The tobamovirus hit is probably from recently eaten food. There is only one human virus in the dataset: human metapneumovirus (HMPV). HMPV is a single-stranded RNA virus with a lipid envelope, and one of the most common causes of the common cold. There is no antiviral treatment for HMPV; as with most viruses, the advice is to wait for your immune system to fight it off.

HMPV is usually not serious in older children or adults, but accounts for 5% to 10% of hospitalizations among pediatric patients with acute respiratory tract infections.

Diagram of HMPV from Lianou et al., 2025

Virus sequence

The sequence of the genome is about 3% diverged from the closest reference (FJ168778.1), with 64 missense mutations. It's not that surprising that it all matches a reference sequence so well, but it's still gratifying to see.

The sequence of my HMPV vs a reference sequence
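For anyone wanting to reproduce numbers like the ~3% divergence and 64 missense mutations above, here is a minimal sketch of how they can be counted. It assumes the query and reference coding sequences are already aligned, in frame, gap-free, and in coding (mRNA) sense. `compare_cds` is a hypothetical helper, not the tool I used, and it counts a codon with two substitutions as a single missense change.

```python
from itertools import product

# Standard genetic code; codons enumerated in TCAG order (TTT, TTC, TTA, TTG, TCT, ...).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def compare_cds(query: str, reference: str) -> dict:
    """Hypothetical helper: compare two aligned, gap-free, in-frame coding sequences.

    Assumes unambiguous bases only (A, C, G, T/U); raises ValueError otherwise.
    """
    query = query.upper().replace("U", "T")
    reference = reference.upper().replace("U", "T")
    if not query or len(query) != len(reference) or len(query) % 3 != 0:
        raise ValueError("sequences must be non-empty, equal length, and a multiple of 3")
    if set(query + reference) - set(BASES):
        raise ValueError("only unambiguous A/C/G/T(U) bases are supported")

    mismatches = sum(q != r for q, r in zip(query, reference))
    synonymous = missense = stop_changes = 0
    for i in range(0, len(query), 3):
        q_codon, r_codon = query[i:i + 3], reference[i:i + 3]
        if q_codon == r_codon:
            continue  # identical codon
        q_aa, r_aa = CODON_TABLE[q_codon], CODON_TABLE[r_codon]
        if q_aa == r_aa:
            synonymous += 1  # nucleotide change, same amino acid
        elif "*" in (q_aa, r_aa):
            stop_changes += 1  # gain or loss of a stop codon
        else:
            missense += 1  # amino acid substitution
    return {
        "divergence_pct": 100.0 * mismatches / len(query),
        "synonymous_codons": synonymous,
        "missense_codons": missense,
        "stop_changes": stop_changes,
    }
```

Run it per gene (e.g., on each open reading frame) rather than on the whole genome, since the intergenic regions do not encode protein.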

Virus structure

The structure of HMPV is way bigger and more complicated than you would think from the diagram above.

I am used to thinking of viruses as small icosahedra, with a tightly coiled genome inside (see this great article on icosahedral viruses from Asimov Press).

HMPV is pretty different: its genome is wrapped in a coiled nucleoprotein, which takes around 1900 "N" proteins to cover, producing a massive structure of hundreds of megadaltons per virion. All of this is squished into a lipid sphere like a ball of yarn.
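The ~1900 figure is easy to sanity-check with back-of-the-envelope arithmetic. The sketch below assumes a genome of about 13,335 nt and that each N protomer covers 7 nucleotides, as in other pneumoviruses such as RSV; both numbers are assumptions here, not values from my sequencing.

```python
import math

# Assumptions (not measured here): genome length and nucleotides bound per N protomer.
GENOME_NT = 13_335
NT_PER_N_PROTOMER = 7


def n_protomers_needed(genome_nt: int, nt_per_protomer: int = NT_PER_N_PROTOMER) -> int:
    """Number of N proteins needed to coat the whole genome end to end."""
    return math.ceil(genome_nt / nt_per_protomer)


print(n_protomers_needed(GENOME_NT))  # 1905
```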

I used AlphaFold 3 to fold ten nucleoproteins and some RNA from my virus. As you'd expect, given there are good reference structures in PDB, AlphaFold 3 does a fine job folding the nucleoproteins into a coil. In contrast, the RNA has formed a double-stranded hairpin and does not match the crystal structure.

(Left) Ten N proteins from my HMPV in a circular configuration (blue/yellow), with some RNA (orange) wound around. (Right) Eleven N proteins in a spiral configuration (PDB:8PDN)
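For reference, a multi-copy protein plus RNA job like this one can be described in the JSON input format of the open-source AlphaFold 3 release (the AlphaFold Server web interface uses a different format). A minimal sketch, assuming ten identical N chains and one RNA chain; `af3_input` is a hypothetical helper, and the sequences in the example are placeholders, not my HMPV sequences.

```python
import json
import string


def af3_input(name: str, protein_seq: str, n_copies: int, rna_seq: str, seeds=(1,)) -> dict:
    """Hypothetical helper: AlphaFold 3 input (open-source JSON dialect) for
    n_copies of one protein chain plus a single RNA chain."""
    chain_ids = list(string.ascii_uppercase)
    if not 1 <= n_copies < len(chain_ids):
        raise ValueError("n_copies must leave a spare single-letter chain ID for the RNA")
    return {
        "name": name,
        "modelSeeds": list(seeds),
        "sequences": [
            # A list of IDs means identical copies of the same protein sequence.
            {"protein": {"id": chain_ids[:n_copies], "sequence": protein_seq}},
            {"rna": {"id": chain_ids[n_copies], "sequence": rna_seq}},
        ],
        "dialect": "alphafold3",
        "version": 1,
    }


if __name__ == "__main__":
    # Placeholder sequences: substitute the real N protein and RNA sequences.
    config = af3_input("hmpv_n10_rna", protein_seq="ACDEFGHIKLMNPQRSTVWY",
                       n_copies=10, rna_seq="ACGUACGUAC")
    print(json.dumps(config, indent=2))
```

Save the printed JSON to a file and hand it to the AlphaFold 3 runner; the ten protein copies get chain IDs A-J and the RNA gets K.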

Viral target

The "F" (Fusion) protein is the obvious target for a therapeutic. It is on the surface of the lipid envelope and mediates cell adhesion and membrane fusion. It has two configurations: pre-fusion and post-fusion. Pre-fusion is the unstable form. When it comes into contact with the host cell, it snaps into the more stable "harpoon" that mediates membrane fusion.

The pre-fusion F protein is compact and the post-fusion F protein is elongated

Luckily, there is a paper (Wen et al., 2012) where the authors created a Fab ("DS7", PDB:4DAG) that binds both the pre-fusion and post-fusion forms. This is the perfect example for us to use as a reference. The sequence of their F protein is 99% similar to ours.

Making a peptide therapy

What if we could design a peptide that binds to the virus and neutralizes it? How hard would that be?

I hear binder design is all the rage these days, so I tried to design a peptide binder. I happened to get some credits for the new BoltzGen API so I decided to try that.

Thanks to the Wen et al. paper, I had a good epitope to go after and a crystal structure of the pre-fusion F protein.

I pointed Claude at the BoltzGen API and asked for a peptide binder of 20-40 residues, aimed at the DS7 epitope. I spent around $200 on the BoltzGen API and came up with a 28-residue peptide: VKVYDTETPEGYEKWKELARESHGMADV.

Complex	ipTM	ipSAE	iLIS	Notes
Peptide binder + reference pre-fusion F	0.898	0.616	0.563	confident closed-state interface
Peptide binder + sequenced pre-fusion F	0.909	0.635	0.578	confident closed-state interface
Peptide binder + reference post-fusion F	0.159	0.000	0.000	no confident open-state interface

The properties of the binder are pretty good, but not ideal.

The ipTM is high; the ipSAE is relatively high, given the size of the peptide; the iLIS is far into the "confident" range (>0.223), implying a low false positive rate.

One potential limitation is that in theory the pre- and post-fusion forms of the F protein have the same epitope, but when I refold with the post-fusion form it does not appear to bind. In practice, we probably only care about binding the pre-fusion form (before adhesion has occurred).

So I can't say it's definitely a binder, but I think it has a reasonably good shot of binding. Usually, if there is a known binder in PDB, making another binder for the same epitope is not so difficult.

How to make the peptide

There are two main ways to make a peptide: with a ribosome or with chemistry (solid phase synthesis). If you use a ribosome (i.e., translation in a cell or cell-free system), then you need to purify the peptide. For short peptides it's generally easier to synthesize chemically. For example, you can order a peptide from GenScript for around $10-25 per amino acid.

The main advantages of using chemical synthesis are (a) purity: specifically, the lack of endotoxins you get with ribosomal production; (b) the ability to go beyond the simple 20 proteinogenic amino acids.

For this peptide, we may want to add an N-terminal palmitic acid or a similar fatty acid, which should anchor the peptide in the cell membrane and keep it from being flushed away as the mucus refreshes.

This peptide would cost around $600 and take around 20 business days to arrive
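The ~$600 figure is roughly what the $10-25 per residue pricing above predicts. A quick sketch; the helper name is my own, and lipidation, purity grade, and QC surcharges are not modeled.

```python
BINDER = "VKVYDTETPEGYEKWKELARESHGMADV"


def synthesis_cost_range(sequence: str, low_per_residue: float = 10.0,
                         high_per_residue: float = 25.0) -> tuple[float, float]:
    """Illustrative estimate: base cost of solid-phase synthesis priced per residue."""
    n_residues = len(sequence)
    return n_residues * low_per_residue, n_residues * high_per_residue


low, high = synthesis_cost_range(BINDER)
print(f"{len(BINDER)} residues: ${low:.0f}-${high:.0f} before modifications")
```

For the 28-residue binder this prints "28 residues: $280-$700 before modifications".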

Note: I did not test the binder against the F protein! Maybe I'll do it at Adaptyv at some point, just for interest's sake.

Safety

One big open question is whether a peptide like this, sprayed into the nose, would be safe. The main reasons I think it probably would be are that (a) it acts extracellularly; (b) our noses are exposed to tons of peptides and proteins all day (e.g., from pollen); (c) if the user experienced irritation, they could stop using it, since it doesn't persist.

I did a quick review of the literature, and did not find much on the topic.

Conclusion

It's fun to sequence viruses and design peptide binders, but how would a peptide therapeutic like this actually work in practice?

Detection

First we would need a rapid test that could tell us which virus is present. In theory, sequencing would be best. Oxford Nanopore could do it, but it is still a bit impractical, especially since you'd need to sequence RNA and would ideally want results within an hour or so.

The most practical option would probably be a rapid antigen test (a lateral flow immunoassay), like the rapid COVID-19 tests. Today you can buy a COVID-19 / Flu A/B / RSV test in the US for around $10. Or, if you go on Alibaba, you can buy a 10-in-1 test that includes HMPV for $2.

10 in 1 test kit for "cat, dog, human"(!)

Once you have identified HMPV as the virus, you would spray the peptide in your nose. Would this actually work post-infection? That is very unclear, though even "protective" sprays like carrageenan do appear to reduce the duration of infection. It seems much more likely to be useful for preventing others from catching the virus.

My original idea here was to see if it would make sense to sequence and make a personalized peptide per virus. The answer is probably no, because, as we saw, the viruses are usually not that different, and the steps currently take way too long when a virus can run its course within a week or less.

Instead, we could make a cocktail of peptides to address the top ten common cold viruses. Influenza may evolve too quickly to be included in the panel—it depends on whether we can design a binder to a slowly-evolving part of the virus. Arguably this is all overkill when safe, protective nose sprays exist, but we should do it anyway!

Thanks to Darren Zhu and Saoirse N for helpful comments on this article.

VHH design competition results and easymosaic

2026-04-01 · biotech · ai, biotech, proteindesign, competition

A few months ago I launched a VHH binder design mini-competition. The itch I wanted to scratch was to see how well binder design tools do when run without hand-holding by the developers themselves—i.e., when run the way a typical user would.

There are more details in the original blog post, but the gist is that each competitor submits a script to generate designs, and I run that script on a target.

If we had a "best script" for binder design, kind of like AlphaFold 3 is for folding, it would be hugely enabling for scientists.

I ended up allowing $100 of compute per design, which I thought was just on the edge of possibly producing a binder. It's also approximately the price of testing one design in the lab, which seems like a reasonable benchmark. The consensus from experts I talked to was that this would be insufficient to generate a binder. Turns out they were right! Nevertheless, here is the rundown.

Competitors

I convinced one person to enter this competition: Nick Boyd from Escalante Bio. Nick won the recent Adaptyv Nipah G competition using his own Mosaic protein design library (and it wasn't close!)

As you'd expect, Nick entered using a Mosaic script, similar to his Nipah G script, but adapted to generate a VHH instead of a mini-binder. While Mosaic is well validated for mini-binders, it has not really been tested for VHH designs, which are generally believed to be more difficult.

I entered using a BoltzGen script. My reasoning was that BoltzGen showed very strong results for VHH designs in their preprint, though they certainly used a lot more GPU hours than I did.

BoltzGen has arguably the strongest published VHH design results

Results

I tested the designs against MBP, one of the targets in Adaptyv's BenchBB benchmark, a set of seven standardized targets for benchmarking binder design. If you elect to make the results public, as I did, you get a discount.

I posted the scripts and full results from Adaptyv on the competition github repo. The results should also appear on proteinbase.com in the near future. Of course, there is not much to see here, since none of the designs bound!

EasyMosaic

One complication of Mosaic compared to other tools like BindCraft, BoltzGen, or mBER is that Mosaic is a library, so the user is expected to define their own optimization parameters and loss function. For example, you could define a loss function as a weighted sum of ipTM, pLDDT, and distance to epitope. Different binder design problems might require a different balance of weights. This is a very powerful approach, and allows the user to tune Mosaic for different targets and use-cases, but it can be difficult to know where to start.
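To make the weighted-sum idea concrete, here is a generic sketch. It is not Mosaic's API (Mosaic builds its losses in JAX); the metric names, weights, and the `DesignMetrics`/`weighted_loss` names are hypothetical, chosen only to show how the balance between terms changes which design wins.

```python
from dataclasses import dataclass


@dataclass
class DesignMetrics:
    """Hypothetical per-design metrics (not a Mosaic type)."""
    iptm: float          # interface pTM in [0, 1]; higher is better
    plddt: float         # mean pLDDT rescaled to [0, 1]; higher is better
    epitope_dist: float  # angstroms from binder to the target epitope; lower is better


def weighted_loss(m: DesignMetrics, w_iptm: float = 1.0, w_plddt: float = 0.5,
                  w_dist: float = 0.1) -> float:
    """Lower is better: reward confident interfaces and folds, penalize missing the epitope."""
    return -w_iptm * m.iptm - w_plddt * m.plddt + w_dist * m.epitope_dist


designs = {
    "on_epitope": DesignMetrics(iptm=0.85, plddt=0.90, epitope_dist=3.0),
    "off_epitope": DesignMetrics(iptm=0.90, plddt=0.92, epitope_dist=20.0),
}
ranked = sorted(designs, key=lambda name: weighted_loss(designs[name]))
print(ranked)
```

With these default weights the on-epitope design ranks first; set `w_dist=0` and the slightly more confident off-epitope design wins instead. That is the kind of trade-off a library like Mosaic leaves to the user.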

Part of the point of this competition was to see if Mosaic could be packaged into a user-friendly script. Since its success in the Nipah G competition, there has been quite a bit of interest in this.

With some advice from Nick on parameters, I made a web-based interface to Mosaic called easymosaic. As with most of my stuff, it runs on Modal and lets you run Mosaic with reasonable default parameters for mini-binders or VHHs. The mini-binder parameters should match the ones Nick used in the Nipah G competition.

Easymosaic is designed to do a decent job producing a binder without the need for parameter tuning. Your mileage will certainly vary a lot based on your target!

Like protein folding tools, easymosaic's interface has almost no options

Mosaic-TUI

Nick's own Mosaic-TUI is a similar idea, but is more suitable for power users. It runs in the terminal, exposes all the relevant parameters, and has some nice features like the ability to use multiple GPUs.

Both easymosaic and Mosaic-TUI use B200 GPUs by default, so it is very easy to spend hundreds of dollars for a few good designs. Each design, before filtering out the bad ones, can cost $1 or more.

Mosaic-TUI has a sweet retro-futuristic UI
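On the cost point: if each raw design costs c and only a fraction p survives filtering, each keeper costs c/p. A tiny sketch; the helper name is my own and the pass rate in the example is a made-up illustration, not a measured number.

```python
def cost_per_passing_design(cost_per_design: float, pass_rate: float) -> float:
    """Illustrative helper: expected spend per design that survives filtering."""
    if not 0.0 < pass_rate <= 1.0:
        raise ValueError("pass_rate must be in (0, 1]")
    return cost_per_design / pass_rate
```

At $1 per design and a hypothetical 1% pass rate, `cost_per_passing_design(1.0, 0.01)` is about $100 per good design, so a handful of keepers quickly runs into hundreds of dollars.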

Sadly, it's a bit too late to use either of these tools to enter the Adaptyv RBX1 competition, but I'm sure there will be more competitions coming!

Hopefully, binder design tools will make some advances and I can try this again in a year or so, with a better chance of success. There are still plenty of things to try: combining the strengths of diffusion with hallucination; grounding designs in physics, etc.

Using Claude Code for computational biology

2025-12-13 · ai · biotech, ai

People are very excited about Anthropic's new Opus 4.5 model, and I am too. It is arguably the first coding model that can code continuously for hours without hitting a wall or entering a doom loop (continually producing the same bugs over and over).

Opus 4.5 has crossed a threshold: it has led to what appears to be a permanent change in how I work, so I wanted to write a short article on this, with a real-world example.

For software engineers, it's obvious how coding agents help: they write code for you. For computational scientists, writing code is one step of many: you read papers, download tools and data, log the steps and parameters of the experiment, plot results and write it all up. This is where agents like Claude Code shine.

Claude Code

There are two main ways to use Opus 4.5: in the Claude chat interface, just like ChatGPT etc., or as an agent in Claude Code. The difference is that an agent is a program running on your computer: it doesn't just produce text, it can run arbitrary commands in the terminal on your behalf.

With Opus 4.5, Claude Code is good enough that it is starting to become my primary interface to the terminal, not just my primary interface to code. This is a little hard to explain, but I will show a real-life example from my own work that hopefully illustrates the point.

You can categorize the eras kind of like the levels of self-driving cars. The first era, with zero AI, ended just a few years back, and I now feel like I am in era four. Things are progressing quickly!

Manual: I write code; I run code; I review output (most of my career!)
Copilot: I co-write code with AI (e.g., in an IDE like Cursor); I run code; I review output
Human-in-the-loop: Claude Code (CC) writes code; I read code to check it; I run code; I review output
Agent: CC writes code; CC runs code; CC and I review output
Teams of agents: A team of CC agents write code, run code and review output over multiple hours; they contact me when they need input
Autonomous: A team of CC agents work collaboratively and never need my input??

Adding ipSAE to af2rank

The task here is to add the ipSAE statistic to my af2rank modal app in the biomodals repo. The details don't matter too much, but ipSAE is a popular method of scoring protein–protein interactions—for example it is used in the recent Nipah protein design competition from Adaptyv—and there is a reference implementation on github.
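For the curious, the core of ipSAE is small. As I understand the published method, for each residue i in chain A you take the chain-B residues j whose PAE(i, j) is below a cutoff, set a TM-score-style d0 from how many such residues there are, average 1/(1 + (PAE/d0)^2) over them, and report the best residue. The sketch below is my simplified NumPy rendering of that idea (protein chains only), not the DunbrackLab reference code; the function names echo those in the transcript, but the signatures are my own, so check any real numbers against the reference implementation.

```python
import numpy as np


def calc_d0(n_residues: float) -> float:
    """TM-score-style d0; L is floored at 27 and d0 at 1.0 (protein-protein case)."""
    L = max(27.0, float(n_residues))
    return max(1.0, 1.24 * (L - 15.0) ** (1.0 / 3.0) - 1.8)


def ptm_func(pae, d0: float):
    """Map PAE values (angstroms) to TM-like scores in (0, 1]."""
    return 1.0 / (1.0 + (np.asarray(pae, dtype=float) / d0) ** 2)


def ipsae(pae: np.ndarray, chains: np.ndarray, chain_a: str, chain_b: str,
          pae_cutoff: float = 10.0) -> float:
    """Sketch of asymmetric ipSAE for chain_a -> chain_b.

    pae[i, j] is the predicted aligned error (angstroms) of residue j when the
    structure is aligned on residue i; chains[i] is residue i's chain ID.
    """
    partner = chains == chain_b
    best = 0.0
    for i in np.flatnonzero(chains == chain_a):
        valid = partner & (pae[i] < pae_cutoff)  # partner residues passing the cutoff
        n_valid = int(valid.sum())
        if n_valid == 0:
            continue
        d0 = calc_d0(n_valid)  # d0 scales with the number of confident partner residues
        best = max(best, float(ptm_func(pae[i, valid], d0).mean()))
    return best
```

The symmetric "max" score in the table below is then `max(ipsae(pae, chains, "A", "B"), ipsae(pae, chains, "B", "A"))`.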

Below are my two(!) prompts to Claude, abridged responses from Claude, and the final result. The full transcript is long so I have included it at the end of the post. There you can see more of the warts and debugging, and how much the model does besides just write code.

[Me] >clone the https://github.com/DunbrackLab/IPSAE repo, extract the ipsae scoring function, and add it to the af2rank modal app

[Claude clones the IPSAE github repo, finds the relevant files, reads the code, finds my modal_af2rank.py file, reads it, and updates it]

[Me] >ok test the repo to make sure it works using a test pdb file as in the docstring; then run ipsae and compare the results to make sure the calculation is correct

[Claude downloads a test pdb file, runs the modal app, learns how to run the reference IPSAE code, debugs several issues, and finally produces a working implementation, which it validates by checking the results against the reference. Finally it summarizes all it did.]

[Claude]
⏺ Now all scores match:

  Final Comparison:

  | Metric        | Our Implementation | Original IPSAE |
  |---------------|--------------------|----------------|
  | A→B ipSAE     | 0.2760             | 0.2760 ✅      |
  | B→A ipSAE     | 0.2862             | 0.2862 ✅      |
  | A-B_max ipSAE | 0.2862             | 0.2862 ✅      |
  | A→B LIS       | 0.2998             | 0.2998 ✅      |
  | B→A LIS       | 0.2206             | 0.2206 ✅      |
  | A-B_max LIS   | 0.2602             | 0.2602 ✅      |
  | pDockQ        | 0.1476             | 0.1476 ✅      |
  | A→B pDockQ2   | 0.1612             | 0.1612 ✅      |
  | B→A pDockQ2   | 0.1289             | 0.1289 ✅      |

⏺ All IPSAE scores now match exactly between our implementation and the original IPSAE script.

  Summary of changes made to modal_af2rank.py:

  1. Added IPSAE scoring functions (lines 66-260):
    - ptm_func() / ptm_func_vec() - PTM-like scoring
    - calc_d0() - TM-score d0 normalization
    - calc_ipsae_scores() - Computes ipSAE, pDockQ, pDockQ2, and LIS for all chain pairs
  2. Modified _get_score() to compute IPSAE scores for multimer models
  3. Added CLI parameters --pae-cutoff (default 10.0) and --dist-cutoff (default 8.0)
  4. Added PAE export for external comparison (af2_scores.json and *_predicted.pdb)

Full disclosure: I have asked Claude Code to do this exact task before, just in a different codebase, and that time it actually went more smoothly. I posted that on twitter, and then thought it might make for an interesting post on the blog.

This time it was not quite as slick, but this represents a more realistic example of my typical Claude Code interactions. The path was circuitous, but it got there in the end, with zero intervention on my part. The final result is functional but overwrought.

Conclusions

I don't test all the models all the time, but Opus 4.5 is the first model I've used that can keep context and make linear progress like this, and others seem to agree. I should also note I pay $200 per month for Claude Code access, but there are lower tiers. OpenAI and Gemini both have tools similar to Claude Code. There are also cheaper but workable Chinese models like Qwen. If these are not as good as Opus 4.5 today, they will be very soon.

People are often concerned with AIs writing "slop" code, meaning poor quality or unmaintainable code. Honestly I'm not totally sure I should care about that any more? If it is maintainable by Claude, that might be all that matters.

This workflow is powerful for software, but in software you probably have to care more about maintainability and consistency. The crux may be whether a team of people is working on the same code. There are definitely examples online of software teams trying out agents, then reverting to no AI for this reason. Even fully AI-pilled experts like Karpathy find full vibe-coding a poor fit for many kinds of code.

I would argue these agents are a better fit for scientific code because it's usually not the code that matters but the insight. If you can test ten hypotheses in a day instead of one, or if a scientist with no deep learning experience can now finetune a model themselves, the maintainability of the code, or even the language the code is written in, matters much less.

The relationship here is more like PI–student, and in general a PI would not look at code, but would ask questions to make sure it's working as intended. You can talk to Claude in the same way ("did you check this?", "did you read this paper?").

I have started doing projects from start to finish like this, never leaving Claude Code, and that includes writing up all the steps (which is tedious work) and detailing the results. It's not always faster or better, it screws up frequently, and you have to learn how to handle "context" (i.e., not overloading Claude with too much information), but it is powerful in a very compelling way. Even though it's only been a couple of weeks, it's now the way I kickstart any new project, and I can't see myself going back.

Claude Code transcript

>clone the https://github.com/DunbrackLab/IPSAE repo, extract the ipsae scoring function, and add it to the af2rank modal app
⏺ Bash(git clone https://github.com/DunbrackLab/IPSAE.git)
⏺ Read(modal_af2rank.py)
⏺ Bash(ls -la IPSAE/)
⏺ Read(IPSAE/ipsae.py)
⏺ Now I understand both files. The IPSAE script is a comprehensive scoring function for protein-protein interactions that calculates ipSAE, pDockQ, pDockQ2, and LIS scores. I need to extract the core scoring logic and integrate it into the af2rank modal app.
⏺ Read(IPSAE/README.md)
⏺ Now I understand the IPSAE scoring. For AF2Rank, I need to extract the key calculations:
  - ipSAE: Interface predicted scoring using PAE with cutoff
  - pDockQ: Based on pLDDT and contact counts
  - pDockQ2: Based on PAE values at interface
  - LIS: Local Interaction Score
  The key insight is that AF2Rank already calculates PAE and pLDDT, so I can add IPSAE scoring using those values. Let me create a clean implementation that works with the ColabDesign output.
⏺ Update(modal_af2rank.py)
⏺ Now I need to update the _get_score method and the run_af2rank function to compute and include IPSAE scores. Let me first look at how to extract the necessary information from the ColabDesign model:
⏺ Read(modal_af2rank.py)
⏺ I see that the model has access to all_atom_positions which includes CA (index 1) and CB (index 3) positions. I also see that score["pae"] is scaled by 31.0 from the model output. To compute IPSAE scores, I need to:
  1. Get the PAE matrix (unscaled, as angstroms)
  2. Get pLDDT per residue
  3. Get CB coordinates
  4. Get chain assignments
  The challenge is getting chain information. For multimer models, this should be stored in the model. Let me modify _get_score to compute IPSAE when using multimer:

⏺ Now I need to update the predict method to pass through pae_cutoff and dist_cutoff parameters:
⏺ Now I need to update the call to _get_score() in the predict method to pass the cutoffs:
⏺ Now I need to update the run_af2rank function to add the pae_cutoff and dist_cutoff parameters and pass them through:
⏺ Now let me also update the local entrypoint to add these parameters:
⏺ Now I need to check if the pLDDT is stored as per-residue array in the model. Let me look at how pLDDT is handled in the af2rank class - I see it converts score["plddt"] to a scalar. For IPSAE we need the per-residue array. Let me check the ColabDesign aux output more carefully:
⏺ Now let me also add IPSAE scores to the print output:
⏺ The JSON serialization may have issues with numpy arrays in the score dictionary. Let me check the results saving and ensure the ipsae scores are properly serialized
⏺ I need to add a custom JSON encoder for numpy types:
⏺ Here's a summary of the changes made to modal_af2rank.py:

  Added IPSAE scoring to AF2Rank:

  1. New scoring functions (lines 66-260):
    - ptm_func() / ptm_func_vec() - PTM-like scoring function
    - calc_d0() - TM-score d0 normalization
    - calc_ipsae_scores() - Main function computing ipSAE, pDockQ, pDockQ2, and LIS scores for all chain pairs
  2. Modified _get_score() (lines 437-510):
    - Added pae_cutoff and dist_cutoff parameters
    - For multimer models, extracts per-residue pLDDT, PAE matrix, CB coordinates, and chain assignments
    - Calls calc_ipsae_scores() and adds results to score dict
  3. Modified predict() (lines 507-628):
    - Added pae_cutoff and dist_cutoff parameters
    - Prints IPSAE scores when verbose
  4. Modified run_af2rank() and main():
    - Added pae_cutoff (default 10.0) and dist_cutoff (default 8.0) CLI parameters
    - Added JSON serialization helper for numpy types

  Usage with multimer:
  uv run modal run modal_af2rank.py --input-pdb complex.pdb --model-name "model_1_multimer_v3" --chains "A,B"

  The IPSAE scores will be computed automatically for multimer models and included in the results.json output.

> ok test the repo to make sure it works using a test pdb file as in the docstring; then run ipsae and compare the results to make sure the calculation is correct
⏺ Bash(wget -q https://files.rcsb.org/download/4KRL.pdb -O 4KRL.pdb && head -50 4KRL.pdb)
⏺ Bash(uv run modal run modal_af2rank.py --input-pdb 4KRL.pdb --model-name "model_1_multimer_v3" --chains "A,B" --run-name test_ipsae 2>&1) timeout: 10m 0s
⏺ It ran successfully! The IPSAE scores show 0.0 which seems suspicious - all scores are zero. Let me check the results and also run the original IPSAE script to compare:
⏺ The PAE is 3.38 which is good (low is better), but the IPSAE scores are all zeros. The problem is likely that I'm computing IPSAE from the PAE matrix incorrectly - the PAE matrix values from ColabDesign might already be scaled differently, or the chain assignments might be wrong.
⏺ Read(modal_af2rank.py)
⏺ Let me look at the _get_score method where the IPSAE calculation happens:
⏺ Read(modal_af2rank.py)
⏺ The issue is that self.model.aux["pae"] might be a scalar (the mean PAE), not the full matrix. Let me add some debug output to understand what's happening:
⏺ Bash(uv run --with modal==1.2.1 modal run modal_af2rank.py --input-pdb 4KRL.pdb --model-name "model_1_multimer_v3" --chai…) timeout: 10m 0s
⏺ The PAE matrix values are far too high: min=23.25, max=893.24, well above any reasonable PAE cutoff. The issue is that I'm multiplying by 31, but the values are already on some other scale. The score["pae"] in the log shows 3.379, which is the mean, so the raw matrix values must already be in angstroms or a different scale.

  Looking at the original code in _get_score:
  score["pae"] = 31.0 * score["pae"]  # This converts the scalar log value

  But the raw self.model.aux["pae"] matrix might not need the 31x scaling. Let me check:

⏺ Bash(uv run --with modal==1.2.1 modal run modal_af2rank.py --input-pdb 4KRL.pdb --model-name "model_1_multimer_v3" --chai…) timeout: 10m 0s
⏺ Now we get non-zero IPSAE scores:
  - A-B: ipSAE=0.2760
  - B-A: ipSAE=0.2862
  - A-B_max: ipSAE=0.2862

  The PAE matrix now has reasonable values: min=0.75, max=28.81 angstroms. Let me remove the debug output and then run the original IPSAE script to compare:
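These numbers are consistent with one extra application of the 31x factor from `_get_score`. Dividing the bad min/max by 31 gives exactly the corrected range, and a matrix whose smallest entry is 23.25 Å has no pairs under the default 10 Å cutoff, which explains why every IPSAE score came out as 0.0. Below is a quick arithmetic check plus a hypothetical sanity guard. `looks_double_scaled` is illustrative, and its 32 Å bound assumes AF2's PAE tops out at 31.75 Å:

```python
import numpy as np

PAE_SCALE = 31.0   # factor applied to score["pae"] in _get_score
PAE_CUTOFF = 10.0  # default --pae-cutoff

# Min/max printed by the debug run, before the fix.
buggy_min, buggy_max = 23.25, 893.24

# Dividing out one extra factor of 31 recovers the post-fix range.
print(round(buggy_min / PAE_SCALE, 2), round(buggy_max / PAE_SCALE, 2))

# Even the best pair fails a 10 Å cutoff, so no residue has a valid
# partner in the other chain and every ipSAE term is zero.
print(buggy_min < PAE_CUTOFF)


def looks_double_scaled(pae: np.ndarray, max_expected: float = 32.0) -> bool:
    """Hypothetical guard: flag PAE matrices above AF2's ~31.75 Å ceiling."""
    return float(np.max(pae)) > max_expected
```

A check like `looks_double_scaled` is cheap enough to run on every prediction and would have turned the silent all-zero result into an obvious error.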
⏺ Update(modal_af2rank.py)
⏺ Now let me run the original IPSAE script. First I need to save the PAE matrix from af2rank to a JSON file that IPSAE can read. But wait - the IPSAE script expects AlphaFold output files (PAE JSON + PDB). We're running AF2Rank which uses ColabDesign internally.

  Let me check what we can do - we need to either:
  1. Export the PAE matrix and pLDDT from af2rank to files IPSAE can read
  2. Or run an actual AlphaFold prediction on 4KRL and then compare

  Actually, the simplest approach is to save the PAE matrix and pLDDT from af2rank and run the IPSAE calculation on those. Let me modify the run_af2rank function to also save the PAE and pLDDT data:
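Here is a minimal sketch of that export, assuming the PAE matrix and per-residue pLDDT are NumPy arrays. The `plddt` key and the 0-100 scale come from the change shown later in this log. The `pae` key name is an assumption about what IPSAE's AF2 reader accepts and should be checked against ipsae.py. `export_for_ipsae` is a hypothetical name:

```python
import json
import tempfile
from pathlib import Path

import numpy as np


def export_for_ipsae(path, pae: np.ndarray, plddt_01: np.ndarray) -> None:
    """Sketch: write an AF2-style scores JSON from ColabDesign outputs."""
    n = pae.shape[0]
    # Both arrays should describe the same residues, in the same order.
    if pae.shape != (n, n) or plddt_01.shape != (n,):
        raise ValueError(f"shape mismatch: pae {pae.shape}, plddt {plddt_01.shape}")
    data = {
        "plddt": (plddt_01 * 100).tolist(),  # ColabDesign gives 0-1; IPSAE expects 0-100
        "pae": pae.tolist(),  # assumed key name; values in angstroms
    }
    Path(path).write_text(json.dumps(data))


# Round-trip a toy 3-residue example through a temporary file.
with tempfile.TemporaryDirectory() as tmp:
    out = Path(tmp) / "af2_scores.json"
    export_for_ipsae(out, np.full((3, 3), 2.5), np.array([0.5, 0.25, 1.0]))
    roundtrip = json.loads(out.read_text())
print(roundtrip["plddt"])
```

The shape check fails early if the PAE matrix and pLDDT vector disagree, instead of letting the mismatch surface later inside the scoring script.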
⏺ Read(modal_af2rank.py)
⏺ Let me add PAE matrix export to the run_af2rank function so we can compare with the original IPSAE script:
⏺ Update(modal_af2rank.py)
⏺ Bash(uv run --with modal==1.2.1 modal run modal_af2rank.py --input-pdb 4KRL.pdb --model-name "model_1_multimer_v3" --chai…) timeout: 10m 0s
⏺ Bash(ls -la out/af2rank/test_ipsae_final/)
⏺ Now let me run the original IPSAE script with the exported data:
⏺ Bash(python IPSAE/ipsae.py out/af2rank/test_ipsae_final/af2_scores.json out/af2rank/test_ipsae_final/4KRL_af2rank.pdb 10 10 2>&1) timeout: 1m 0s
⏺ There's a mismatch in the number of residues. The PDB file has 331 residues but the distance matrix has 333. This is likely because the original 4KRL.pdb has different residues than what ColabDesign outputs. Let me check:
⏺ Bash(grep "^ATOM" 4KRL.pdb | grep " CA " | wc -l && grep "^ATOM" out/af2rank/test_ipsae_final/4KRL_af2rank.pdb | grep " CA " | wc -l)
⏺ Both have 327 CA atoms which matches the PAE matrix size (327x327). Let me check what's happening in the IPSAE script:
⏺ Bash(python -c "...)
⏺ The data looks correct. The issue is that the IPSAE script is reading something else from the PDB. Let me check the original 4KRL.pdb vs the af2rank output:
⏺ Bash(grep "^ATOM" 4KRL.pdb | head -5 && echo "---" && grep "^ATOM" out/af2rank/test_ipsae_final/4KRL_af2rank.pdb | head -5)
⏺ The PDB files are identical. Let me check how IPSAE reads the PDB file - it might be counting something differently:
⏺ Bash(python -c "...)
⏺ Everything matches. The issue must be in how IPSAE is reading the file. Looking at the error more closely - it says shapes (331,) and (333,). Let me check what IPSAE is actually reading:
⏺ Bash(cd out/af2rank/test_ipsae_final && python ../../../IPSAE/ipsae.py af2_scores.json 4KRL_af2rank.pdb 10 10 2>&1 | head -50)
⏺ Let me add some debug output to the IPSAE script to understand what's happening:
⏺ Bash(python -c "...)
⏺ Bash(pwd && ls out/af2rank/)
⏺ Bash(ls out/af2rank/test_ipsae_final/)
⏺ It looks like IPSAE already ran successfully! There are output files. Let me check the IPSAE output:
⏺ Bash(cat out/af2rank/test_ipsae_final/4KRL_af2rank_10_10.txt)
⏺ Read(out/af2rank/test_ipsae_final/4KRL_af2rank_10_10.txt)
⏺ The file is empty. Let me run IPSAE fresh:
⏺ Bash(python IPSAE/ipsae.py out/af2rank/test_ipsae_final/af2_scores.json out/af2rank/test_ipsae_final/4KRL_af2rank.pdb 10 10 2>&1)
⏺ Let me look at what's happening in IPSAE around line 612. The issue is the mismatch between chains (331) and distances (333). Let me add debug output to understand:
⏺ Read(IPSAE/ipsae.py)
⏺ I see - chains is built from CA atoms (line 379, residues), but distances is built from coordinates which comes from cb_residues (line 378). The issue is that cb_residues has a different count because GLY residues add their CA to cb_residues too. Let me check the PDB more carefully:
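Because IPSAE sizes its arrays from two different atom selections (CA for chain labels, CB or glycine CA for distances), any residue that contributes to one selection but not the other, or contributes twice, desynchronizes the shapes. Below is a rough consistency check one could run before calling IPSAE. It is a simplified fixed-column PDB reader written for illustration, not IPSAE's own parser. It only looks at ATOM records, and it counts alternate conformations (altLoc A/B) as separate atoms, which is one way such counts can drift. `atom_line` is a hypothetical test helper:

```python
def count_ca_and_cb(pdb_text: str) -> tuple[int, int]:
    """Sketch: count CA atoms and 'CB, or CA for GLY' atoms in ATOM records."""
    n_ca = n_cb = 0
    for line in pdb_text.splitlines():
        if not line.startswith("ATOM"):
            continue  # this sketch ignores HETATM and other records
        atom = line[12:16].strip()  # atom name, PDB columns 13-16
        resname = line[17:20].strip()  # residue name, PDB columns 18-20
        if atom == "CA":
            n_ca += 1
        if atom == "CB" or (atom == "CA" and resname == "GLY"):
            n_cb += 1
    return n_ca, n_cb


def check_against_pae(pdb_text: str, pae_size: int) -> None:
    """Raise if the two atom selections and the PAE matrix disagree in size."""
    n_ca, n_cb = count_ca_and_cb(pdb_text)
    if not (n_ca == n_cb == pae_size):
        raise ValueError(f"CA={n_ca}, CB/GLY-CA={n_cb}, PAE={pae_size}x{pae_size}")


def atom_line(serial: int, name: str, resname: str, chain: str, resseq: int) -> str:
    """Hypothetical helper: a truncated ATOM record with just the columns read above."""
    return f"ATOM  {serial:5d}  {name:<3s} {resname:3s} {chain}{resseq:4d}"


demo = "\n".join([
    atom_line(1, "CA", "ALA", "A", 1),
    atom_line(2, "CB", "ALA", "A", 1),
    atom_line(3, "CA", "GLY", "A", 2),  # glycine: its CA stands in for CB
    atom_line(4, "CA", "SER", "B", 1),
    atom_line(5, "CB", "SER", "B", 1),
])
print(count_ca_and_cb(demo))
check_against_pae(demo, 3)  # passes: 3 CA, 3 CB/GLY-CA, 3x3 PAE
```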
⏺ Bash(python -c "...)
⏺ So our PDB should be fine (327 CA, 327 CB+GLY). The error mentions 331 and 333 which doesn't match our data. Let me check if there's something else going on - maybe the IPSAE script is reading a different file or the arrays are being built incorrectly for AF2 format:
⏺ Bash(python -c "...)
⏺ Now I see the problem! The PDB has 331 CA residues and 333 CB residues, but our PAE matrix is 327x327. The issue is that the original 4KRL.pdb has more residues than what ColabDesign processed. The 4KRL_af2rank.pdb file is just the original input - it wasn't updated by af2rank. Let me check if we're copying the wrong file:
⏺ Read(modal_af2rank.py)
⏺ Yes - the PDB being saved is pdb_str, which is the original input, not the processed structure from ColabDesign. I need to save the predicted structure instead. For the IPSAE comparison we need a structure with exactly the same residues as the PAE matrix, and ColabDesign processes the input and may remove some residues, so we should save the structure that ColabDesign actually used. Let me modify to save the model's output PDB:
⏺ Update(modal_af2rank.py)
⏺ Bash(uv run --with modal==1.2.1 modal run modal_af2rank.py --input-pdb 4KRL.pdb --model-name "model_1_multimer_v3" --chai…) timeout: 10m 0s
⏺ Bash(ls -la out/af2rank/test_ipsae_compare/)
⏺ Now let me check the predicted PDB size and run IPSAE on it:
⏺ Bash(grep "^ATOM" out/af2rank/test_ipsae_compare/4KRL_predicted.pdb | grep " CA " | wc -l)
⏺ 327 CA atoms - matches the PAE matrix. Now let me run IPSAE:
⏺ Bash(python IPSAE/ipsae.py out/af2rank/test_ipsae_compare/af2_scores.json out/af2rank/test_ipsae_compare/4KRL_predicted.pdb 10 10 2>&1)
⏺ Bash(cat out/af2rank/test_ipsae_compare/4KRL_predicted_10_10.txt)
⏺ IPSAE ran successfully. Now let me compare the results:
⏺ Comparison of Results:

  | Metric        | Our Implementation (af2rank) | Original IPSAE  |
  |---------------|------------------------------|-----------------|
  | A→B ipSAE     | 0.2760                       | 0.2760          |
  | B→A ipSAE     | 0.2862                       | 0.2862          |
  | A-B_max ipSAE | 0.2862                       | 0.2862          |
  | A→B LIS       | 0.2998                       | 0.2998          |
  | B→A LIS       | 0.2206                       | 0.2206          |
  | A-B_max LIS   | 0.2602                       | 0.2602          |
  | pDockQ        | 0.0183                       | 0.1476          |
  | pDockQ2       | 0.0074                       | 0.1612 / 0.1289 |

  The ipSAE and LIS scores match exactly! However, there's a significant difference in pDockQ and pDockQ2. Let me investigate the pDockQ calculation:
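For checks like this it helps to diff the two score sets programmatically rather than by eye. Here is a small sketch using `math.isclose` with an absolute tolerance of half the last printed digit. The values are the pDockQ row and several ipSAE/LIS rows from the table above, and `diff_scores` is a hypothetical helper:

```python
import math


def diff_scores(ours: dict, theirs: dict, abs_tol: float = 5e-5) -> list:
    """Hypothetical helper: metric names that differ beyond abs_tol or are missing."""
    mismatches = []
    for name in sorted(set(ours) | set(theirs)):
        a, b = ours.get(name), theirs.get(name)
        if a is None or b is None or not math.isclose(a, b, rel_tol=0.0, abs_tol=abs_tol):
            mismatches.append(name)
    return mismatches


ours = {"A->B ipSAE": 0.2760, "B->A ipSAE": 0.2862, "A->B LIS": 0.2998, "pDockQ": 0.0183}
theirs = {"A->B ipSAE": 0.2760, "B->A ipSAE": 0.2862, "A->B LIS": 0.2998, "pDockQ": 0.1476}
print(diff_scores(ours, theirs))
```

Treating a metric present on only one side as a mismatch catches renamed or dropped keys as well as numeric drift.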
⏺ Read(modal_af2rank.py)
⏺ I see the issue. The pDockQ calculation uses dist_cutoff, which we pass as a parameter (8.0 by default), while the original IPSAE uses a hardcoded 8.0 Å for pDockQ. More importantly, the pLDDT values are on different scales:
  - Our implementation uses ColabDesign's pLDDT which is 0-1 scale
  - IPSAE expects pLDDT in 0-100 scale

  Let me check the IPSAE code for pDockQ:
⏺ I see - IPSAE uses cb_plddt which comes from the JSON file. Let me check how it's loaded:
⏺ Read(IPSAE/ipsae.py)
⏺ The IPSAE script uses data['plddt'] directly. In our export, we multiply by 100:
  "plddt": (plddt_per_res * 100).tolist(),  # IPSAE expects 0-100 scale

  But in our internal calculation, we use the raw 0-1 pLDDT from ColabDesign. Let me fix the pDockQ calculation to use the 0-100 scale:
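The size of the discrepancy is what the pLDDT scale would predict. pDockQ passes x = (mean interface pLDDT) × log10(number of interface contacts) through a sigmoid, so feeding 0-1 pLDDT shrinks x by 100x and pins the score near the sigmoid's floor. The sketch below assumes the published pDockQ constants from Bryant et al. (2022), L = 0.724, k = 0.052, x0 = 152.611, b = 0.018. It assumes IPSAE uses the same constants. The interface pLDDT and contact count are made-up illustrative values, not taken from 4KRL:

```python
import math

# Published pDockQ sigmoid constants (assumed to match IPSAE's implementation).
L, K, X0, B = 0.724, 0.052, 152.611, 0.018


def pdockq(mean_interface_plddt: float, n_contacts: int) -> float:
    """pDockQ = L / (1 + exp(-K * (x - X0))) + B, x = pLDDT * log10(contacts)."""
    x = mean_interface_plddt * math.log10(n_contacts)
    return L / (1.0 + math.exp(-K * (x - X0))) + B


# The same made-up interface (mean pLDDT 80, 100 contacts) on both scales.
on_0_100 = pdockq(80.0, 100)
on_0_1 = pdockq(0.80, 100)
print(f"{on_0_100:.4f} {on_0_1:.4f}")
```

On the 0-1 scale x stays below about 3 for realistic contact counts, so the result sits at roughly 0.0183 almost regardless of the interface. That is the same value as in the first comparison. pDockQ2 also multiplies interface pLDDT into a sigmoid, so its depressed values would be consistent with the same scale error.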
⏺ Update(modal_af2rank.py)
⏺ Bash(eval "$(uv run --with modal==1.2.1 modal run modal_af2rank.py --input-pdb 4KRL.pdb --model-name "model_1_multimer_v3" --chai…) timeout: 10m 0s
⏺ Now all scores match:

  Final Comparison:

  | Metric        | Our Implementation | Original IPSAE |
  |---------------|--------------------|----------------|
  | A→B ipSAE     | 0.2760             | 0.2760 ✅      |
  | B→A ipSAE     | 0.2862             | 0.2862 ✅      |
  | A-B_max ipSAE | 0.2862             | 0.2862 ✅      |
  | A→B LIS       | 0.2998             | 0.2998 ✅      |
  | B→A LIS       | 0.2206             | 0.2206 ✅      |
  | A-B_max LIS   | 0.2602             | 0.2602 ✅      |
  | pDockQ        | 0.1476             | 0.1476 ✅      |
  | A→B pDockQ2   | 0.1612             | 0.1612 ✅      |
  | B→A pDockQ2   | 0.1289             | 0.1289 ✅      |

⏺ All IPSAE scores now match exactly between our implementation and the original IPSAE script.

  Summary of changes made to modal_af2rank.py:

  1. Added IPSAE scoring functions (lines 66-260):
    - ptm_func() / ptm_func_vec() - PTM-like scoring
    - calc_d0() - TM-score d0 normalization
    - calc_ipsae_scores() - Computes ipSAE, pDockQ, pDockQ2, and LIS for all chain pairs
  2. Modified _get_score() to compute IPSAE scores for multimer models
  3. Added CLI parameters --pae-cutoff (default 10.0) and --dist-cutoff (default 8.0)
  4. Added PAE export for external comparison (af2_scores.json and *_predicted.pdb)
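To show what the scoring functions listed above compute, here is a compact, illustrative re-implementation of one ipSAE direction. It follows the published ipSAE definition as I understand it and is not copied from modal_af2rank.py or ipsae.py. It makes three assumptions. First, `ptm_func_vec` is the TM-score term 1 / (1 + (PAE/d0)^2). Second, `calc_d0` uses d0 = 1.24 (L - 15)^(1/3) - 1.8, with L floored at 27 and d0 floored at 1.0. Third, for each aligned residue i in chain A, d0 comes from the number of chain-B residues whose PAE from i is below the cutoff, and ipSAE(A→B) is the maximum over i of the mean term:

```python
import numpy as np

# Illustrative re-implementation; names mirror the summary but bodies are sketches.


def calc_d0(n: float) -> float:
    """TM-score d0, with the length floored at 27 and d0 floored at 1.0."""
    n = max(float(n), 27.0)
    return max(1.0, 1.24 * (n - 15.0) ** (1.0 / 3.0) - 1.8)


def ptm_func_vec(pae: np.ndarray, d0: float) -> np.ndarray:
    """PTM-like per-pair term: 1 / (1 + (PAE / d0)^2)."""
    return 1.0 / (1.0 + (pae / d0) ** 2)


def ipsae_directional(pae: np.ndarray, chains: np.ndarray, a: str, b: str,
                      pae_cutoff: float = 10.0) -> float:
    """ipSAE(a -> b): best per-residue score over aligned residues of chain a."""
    best = 0.0
    for i in np.where(chains == a)[0]:
        valid = (chains == b) & (pae[i] < pae_cutoff)
        n_valid = int(valid.sum())
        if n_valid == 0:
            continue  # no confident partners for this residue
        d0 = calc_d0(n_valid)
        best = max(best, float(ptm_func_vec(pae[i][valid], d0).mean()))
    return best


# Toy complex: only residue 0 of chain A is confidently placed relative to B.
chains = np.array(["A"] * 3 + ["B"] * 2)
pae = np.full((5, 5), 25.0)
pae[0, 3:] = 2.0
ab = ipsae_directional(pae, chains, "A", "B")
ba = ipsae_directional(pae, chains, "B", "A")
print(round(ab, 3), ba, max(ab, ba))  # the last value plays the role of A-B_max
```

The toy example shows why the two directions can differ, as they do in the 4KRL comparison. PAE is not symmetric, so aligning on chain A residues can yield confident partners in B even when the reverse does not.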