As a quick recap: Last year, a challenge on the machine learning competition platform Kaggle was launched to use the data from the OpenKnot labs (as well as other sources) to develop new models which could predict reactivity. From there, an even better model - called RibonanzaNet - was developed, and then fine-tuned to predict secondary structure, creating the most accurate folding engine to date (though it will be interesting to see how AlphaFold3 stacks up!). For more on that, check out the preprint: https://www.biorxiv.org/content/10.1101/2024.02.24.581671v1
We’ve recently been working on getting this new folding engine - RibonanzaNet-SS - into the hands of players so that you can have this tool at your disposal. I’m happy to share that an initial version is now available for testing! You can access it through a demo version of puzzlemaker at the following link: https://scratch-static.eternagame.org/rnnet-demo/index.html?mode=puzzlemaker. Note that this demo is temporary and will be taken down once the engine is available in the main game.
We’d love to hear your feedback - let us know if it’s useful in your lab work, if you see any behavior worth noting, any potential issues, etc.!
Note that this is still a work in progress, and I’ll be updating this thread as it evolves and gets closer to making its way into the main game. Of particular note:
The model will currently predict hairpins of length 2. This was more or less an oversight that will be corrected soon.
You will see energies showing in puzzlemaker. These are carried over from the previously selected engine; this is an existing bug affecting all engines where energies should be disabled, and it will be fixed before release.
Not yet implemented are two new metrics: Estimated F1 score (eF1) and Estimated F1 score over cross-paired regions (eF1,cross-pair). These scores give an indication of how confident the model is in its prediction. F1 itself is a statistical accuracy metric, and the paper describes how we find it to be highly correlated with the average of the raw model output (pairing probabilities). We anticipate these will be put in the specbox (where the dot plot and other metrics are). A rough sketch of what these metrics capture follows this list.
This new model is large, requiring an additional download of ~60-70MB. This may take some time on slow internet connections (or from regions geographically far from our servers). We’ll be investigating options such as slimming down the download size (both for this model and other parts of the code), loading in the background or only when needed, download progress indicators, and improved global access.
There may be updates to the model itself in the future, either to improve accuracy or to create a “lighter” version that is less resource intensive (while hopefully maintaining reasonable accuracy).
Due to underlying technology limitations, this model can only be run asynchronously. That means interacting with it from the existing EternaScript APIs will not be possible. I’ll be adding new asynchronous APIs that can be opted into in order to use this model from EternaScript.
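Regarding the eF1 metrics mentioned above, here’s a minimal sketch of the two ideas (my own illustration - the exact formulas in the paper and the game may differ). F1 compares predicted base pairs (as sets of (i, j) tuples) against known pairs, and the estimate substitutes the average raw pairing probability when no ground truth is available:

```python
def f1(pred_pairs, true_pairs):
    """F1 over base pairs: harmonic mean of precision and recall."""
    tp = len(pred_pairs & true_pairs)  # correctly predicted pairs
    if tp == 0:
        return 0.0
    precision = tp / len(pred_pairs)
    recall = tp / len(true_pairs)
    return 2 * precision * recall / (precision + recall)

def estimated_f1(bpp, pred_pairs):
    """eF1-style proxy: with no ground truth available, average the raw
    model output (pairing probabilities) over the predicted pairs."""
    return sum(bpp[i, j] for i, j in pred_pairs) / max(len(pred_pairs), 1)
```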
Some other things to be aware of as far as how this model behaves:
This model behaves substantially differently from the other folding engines we’re used to. Other energy models are ultimately derived from nearest-neighbor free energies, and the natural structure is the minimum free energy (MFE) structure, determined by finding the structure whose energies add up to the lowest total versus all other possibilities. EternaFoldThreshknot differs in that it takes the dot plot (which encodes the probability that any two bases will pair, based on the energies of all possible structures) and picks out the “best” pairs. RibonanzaNet-SS is different again: energies are not calculated at all. It uses machine learning to find how patterns of bases across the sequence correlate with large volumes of SHAPE data as well as known secondary structures, through a large number of relatively complex calculations. The dot plot shows how likely it thinks each pair of bases is to pair based on its understanding of these patterns and relationships, and the predicted secondary structure is created from these probabilities using the Hungarian algorithm (similar to, but not the same as, Threshknot; see the first sketch at the end of this list). It should generally behave in ways you’d expect from other models (strong pairs are stronger, boosting exists, etc.) but it may not follow fully consistent rules like you’re used to!
Naturally, there are no energies available - however, you may want to consider using the “target expected accuracy” metric in the specbox (below the dot plot), which indicates how closely the dot plot matches the target structure (a sketch of this metric also follows below).
This model is relatively resource intensive. In my testing, a 150 base puzzle uses ~850-950MB of memory with a fold time of ~1.3-1.5s, and a 500 base puzzle increases to ~3.5-3.8GB of RAM with a fold time of ~20s. Even with more available RAM than this, you will find the model crashes due to current limitations in the underlying technology (which will change at some point, but we recognize even this could be too large for some players). On slower devices it will run even slower - one of the others on our team found it to run ~10-20x slower. There is a possibility for this to be GPU accelerated in the future, but the technology is not quite available yet (and the requirements may still wind up being out of reach for many folks).
Due to the potential for model changes and concerns around playability (given how “opaque” this new model is), we won’t be enabling the ability to publish RibonanzaNet-SS puzzles for the foreseeable future.
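As referenced above, here’s a rough sketch of how a Hungarian-style assignment can pull a structure out of the pairing-probability matrix, using SciPy’s linear_sum_assignment (the threshold and the mutual-assignment filter are my own simplifications, not necessarily what the game does):

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def pairs_from_bpp(bpp, threshold=0.5):
    """bpp[i, j] = the model's probability that bases i and j pair."""
    cost = -bpp.copy()                  # assignment minimizes cost, so negate
    np.fill_diagonal(cost, -threshold)  # "pair with self" = stay unpaired
    _, match = linear_sum_assignment(cost)
    # keep only mutual, off-diagonal assignments (i -> j and j -> i)
    return {(i, j) for i, j in enumerate(match) if i < j and match[j] == i}
```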
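And a rough illustration of what target expected accuracy gets at (again my own sketch - the exact in-game formula may differ): credit the model’s probability for each target pair, plus the leftover “unpaired” probability for each base the target leaves unpaired.

```python
import numpy as np

def target_expected_accuracy(bpp, target_pairs):
    """Average model confidence over the target structure."""
    n = bpp.shape[0]
    in_pair = np.zeros(n, dtype=bool)
    score = 0.0
    for i, j in target_pairs:
        score += bpp[i, j]              # confidence the target pair forms
        in_pair[i] = in_pair[j] = True
    p_unpaired = 1.0 - bpp.sum(axis=1)  # leftover probability mass per base
    score += p_unpaired[~in_pair].sum() # credit target-unpaired bases
    return score / n
```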
Thank you, LFP6, for the integration of the new RNNet model! I love it and have had fun making actually stable versions of the Azoarcus Ribozyme with a ton of mutations, mostly from my Azoarcus Ribozyme safer mutation sites spreadsheet and found with BLAST. 40 mutations and still stable!
However, at first I was confused because I didn’t get the scores that I got from running the structure model from the Kaggle notebook. I just see that my fold is likely, due to no red spots. Those scores from Kaggle were what I wanted, and pulling them from the Kaggle notebook is tedious. Otherwise it is just an engine like all the others, and probably more accurate. But no news on how well the knot does.
Example with scores from one of my Kaggle notebook runs:
LFP6 kindly explained the scores to me earlier:
global —> over the entire sequence, not just crossed pairs
ef1 —> estimated F1
So the global score is for the entire sequence, and eF1,cross-pair is for how certain the pseudoknot is estimated to be.
Getting those scores is going to speed up our designing of real fine pseudoknots. Just imagine what @spvincent could do with this and his synthetic knots!
There is one more thing I have hoped to get with the new model: I would have wanted a map like the ones Yang and AF3 provide.
I first saw this type of map when DigitalEmbrace wrote a blog post on the new model.
I looked up the MERS frameshift element and got lucky that it had been run in the lab. When I compared the lab design to the chemical mapping image, it hit me that I could see from it that it formed a pseudoknot. Color highlighting of the different structures on both the lab design and the chemical map.
> Those scores from Kaggle were what I wanted, and pulling them from the Kaggle notebook is tedious.
Both ef1 and ef1,cross-pair will be coming! Just not ready yet. You might want to consider looking at target expected accuracy (TEA) for now, which tells you how closely the dot plot matches your target structure (you just need to make sure the target structure is correct).
> There is one more thing I have hoped to get with the new model: I would have wanted a map like the ones Yang and AF3 provide.
The first graph you showed is actually a simulated mutate-and-map vs. an experimental mutate-and-map. While it is technically possible to implement, there are a couple of potential issues:
Getting the actual predicted reactivity requires loading a separate model. We’ve discussed doing this, though it means adding yet another 40-50 MB of download (and it has to be run separately from the secondary structure model, so it will take longer to run and use more RAM too).
It would be quite slow. If it takes 20 seconds to fold one sequence and you now have to do that 150 times (once per base of a 150 base sequence), that’s 150 × 20 s ≈ 50 minutes.
At least for now, I think generating something like that via a Kaggle notebook would probably be the right approach - though I don’t think anyone has put that together yet.
As far as I can tell, the error plot from AlphaFold shows the expected error in the predicted distance between any two residues, whereas the Yang server shows the actual predicted distance between any two residues. That is only relevant for 3D predictions specifically.
That said: We already have something analogous - the dot plot! If you view the dot plot with RibonanzaNet-SS active, it will be showing you the model’s output which is its confidence/predicted likelihood that any two bases will be paired.
I am very much looking forward to getting the eF1 and eF1,cross-pair scores.
The target expected accuracy (TEA) thing you mention, I am not sure how to use. Could you give me an example?
I now understand that the chemical mapping and the 3D predictions from Yang and AF3 are different beasts. I would still like to eventually get an option to see such a chemical mapping when a sequence is inputted, even if as a standalone Kaggle notebook.
RNNet-SS shows more pairs not bonding than EFTK, but both AlphaFold and trRosettaRNA still show no changes in the configuration. It also shows more pseudoknot bonds than EternaFoldThreshknot. Found a zigzag formed with three GUs, which I note as interesting but not a bug. Might try a white background on the PPP plot so that the light pairings show up better.
RNNet-SS shows, much better than the other models, how stacks are affected by which other NTs bond.
…
I created a design with 98 changes. Both EFTK (weak PPP) and RNNet-SS (strong PPP) say the design will work. AlphaFold says very low probability of working. It doesn’t look like RNNet-SS does 3D any better than EFTK.
…
A comment on AlphaFold: whether I have 95 changes or 5 changes, as long as the SS is stable, the 3D structure comes out the same. So the code must create the SS (dot-bracket) from the sequence, then the 3D from the SS (dot-bracket), then calculate the probability that the NTs will be able to hold that structure’s position, then color-code the results.
> Might try a white background on the PPP plot so that the light pairings show up better.
This is a unique issue with RNNet-SS - I think it has to do with all pairings having some minimal probability value, which is not the case in other models. Will need to investigate.
> I created a design with 98 changes. Both EFTK (weak PPP) and RNNet-SS (strong PPP) say the design will work. AlphaFold says very low probability of working.
Keep in mind - as far as I know, what AlphaFold is telling you is its confidence in its prediction. That does not necessarily say anything about the likelihood that it will “work” (which is also dependent on having a definition of what “working” means which AlphaFold is not being given, though I guess you could argue that ends-up-as-predicted could be used in that way).
New type of Eterna pairing observed in RNNet between bases 53 and 76. Please share screenshots if you observe new pairings or behavior not typically seen in Eterna folding predictions.
RN-SS seems prone to leaving bubbles if too many repeats exist. In this design, 60 of the 100 bases are in loops that “should” be paired off. In less extreme examples, a random pair or two in the middle of a stem will be unpaired for no obvious reason.
That is a completely inaccurate folding prediction - thanks for finding it! I experienced a similar misfold. The sequence does not represent sequences typically found in natural RNA and therefore won’t be encountered by researchers, but we should still fix it. In retrospect, we should have included more of the “bad RNA designs” players created in the training set, although the presence of repeats might generate ambiguous reactivity data. As we collect issues, perhaps @jandersonlee could include variations on them in his next contribution to Ribonanza 3.
The DMS looks maybe ok, the 2A3 has issues - though different from the issues of RNNet-SS.
Some questions that come to mind:
How much of this behavior is due to a lack of training data, vs. representing what would actually happen in experiments given how repeats behave?
We know there are experimental artifacts due to repeats, but is there any experimental data (maybe from other experimental methods) on what actually happens with these kinds of sequences? Would they actually do what we “expect”, or are there physical effects at play that prevent that? (I’m sure others who better understand the experimental issues with repeats could provide some insight here.)
How much would adding additional examples of this kind of thing to the Ribonanza experiments actually help? As DigitalEmbrace mentioned, would we get useful data? There’s also always a chance that RibonanzaNet-SS could still have issues, since there still won’t be example secondary structures for these kinds of sequences in the dataset used to fine-tune the base model to predict secondary structures. It could be fine, but it depends on how much the model is able to extrapolate!
While technically this is another instance of repeats, a further “edge case” that currently is not handled well is a string of all As. E.g., take the sequence AAAAAAAAAAAAAAAAAAAA
RNNet says this is highly paired, and the dot plot from RNNet-SS says that everything is likely to pair with everything else!
This doesn’t show up in the secondary structure in-game because we filter out invalid pairs, though the Kaggle notebook doesn’t do that. Interestingly, I tried adding that filter to the notebook (both changing the pairing probabilities before they’re passed to the Hungarian algorithm and filtering out pairs after it) and it did not change the performance on the CASP15 test set at all - it appears the model is generally good about not predicting invalid pairs, but has issues in this “adversarial” case.
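For anyone who wants to replicate that filter in a notebook, something along these lines (my own sketch, not the game’s actual code) zeroes out non-canonical pairs and too-short hairpins before the probabilities reach the Hungarian step - for the all-A sequence above, it zeroes everything:

```python
import numpy as np

CANONICAL = {("A", "U"), ("U", "A"), ("G", "C"),
             ("C", "G"), ("G", "U"), ("U", "G")}

def mask_invalid_pairs(bpp, seq, min_loop=3):
    """Zero out pairs the game would reject: non-canonical base
    combinations and pairs closing a hairpin loop shorter than min_loop."""
    masked = bpp.copy()
    n = len(seq)
    for i in range(n):
        for j in range(n):
            if (seq[i], seq[j]) not in CANONICAL or abs(i - j) <= min_loop:
                masked[i, j] = 0.0
    return masked
```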
I expect the bubbles are due to probability and not experimental results or training data. Increasing specificity in a couple places can cause the structure to pair more rationally. The lack of pairing is progressive and seems based on there being too many possibilities, so it ranges from 1 pair not pairing to no pairs forming at all. It’s as if RN-SS is hitting some confidence threshold and defaulting to unpaired, or just giving up.
Here’s a similar example made up of palindromic GUAC repeats with a single 5 repeat of each base. About 50 of the 100 bases “should” be paired off but aren’t. (100% GUAC forms no pairs.)
Viewing SV_B7 by Spvincent in RN-SS shows some bases as paired incorrectly. Natural mode of SV_B7 in RN-SS has a structure length of only 175, against a sequence length of 177. The pairing probability looks sane, but somewhere before base 120 RN-SS loses 2 bases of the structure in natural mode, causing everything after to shift and the display to show nonsense pairings.
Looks like this is due to a 4th-order pseudoknot being predicted, while some of our internal utilities currently handle only up to third-order. With that fixed in my local testing, the prediction is pretty gnarly, but presumably at least valid.
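For context on what “order” means here (my own illustration, not the actual internal utility): writing a structure out in dot-bracket notation needs one bracket type per level of crossing, so code that only knows (), [], {}, and <> runs out when a 4th-order knot needs a fifth type.

```python
BRACKETS = ["()", "[]", "{}", "<>", "Aa"]  # crossing orders 0 through 4

def to_dotbracket(pairs, n):
    """Greedily assign each pair (i, j), i < j, the lowest bracket level
    whose already-placed pairs it doesn't cross."""
    levels = [[] for _ in BRACKETS]
    db = ["."] * n
    for i, j in sorted(pairs):
        for brackets, placed in zip(BRACKETS, levels):
            # pairs cross when exactly one endpoint lies inside the other
            if all(not (a < i < b < j or i < a < j < b) for a, b in placed):
                placed.append((i, j))
                db[i], db[j] = brackets[0], brackets[1]
                break
        else:
            raise ValueError("more crossing levels than bracket types")
    return "".join(db)
```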
I have been having a bit of fun trying to see how to break the rules in RNNet while still getting a stable design. Here is what I have learned so far.
Can GUs be placed anywhere other than stack ends and boost spots?
It seems very hard to stick GU pairs in anywhere but at stack closings, plus at boost spots. I played with Jieux’s Solanum Tuberosum because it already came with a GU as a default.
Way too many GU’s possible, but with high eF1 ratings:
I managed to put in 7 GUs in stacks and 5 GUs at boost spots. This is most likely way too many GUs to even be legal, even though the positions themselves are quite legal in real RNA:
RNNet is ignoring lab limits
Generally RNNet ignores the warnings that are installed in the lab, which makes sense since it is not integrated there. So it:
Ignores the max percentage of A’s
Ignores poly(X) issues - which Ucad has already mentioned
Allows too many AU’s
Shows a weird AU mismatch (red marking) - normally that would be a base pair
Please do put some of these test cases in OpenKnot Round 6 where you can! I’d be interested to see how the data turns out.
NB: Some of those lab restrictions are due to issues in the synthesis process, right? So it’s possible RNNet’s predictions may still technically be correct. Otherwise I imagine it’s likely due to a lack of these kinds of sequences to train on, which I’m not sure how we could rectify.
The lab restrictions are due to the synthesis process. Since there have been limitations, there is less data to train on that has poly(X) and so on. However, the poly(X) limit was lifted for the first pseudoknot rounds, partly due to some of the known pseudoknots themselves having poly(G) and poly(C). Plus, later labs with robot solutions are rich in poly(X) violations, like 240 Pseudoknot detective. But I guess that is not in the algorithm yet.
The A limit was introduced in later labs, after too many A’s (even if not in a row) were found to be a problem.
All in all, I can make designs stable that are so far from the original sequence that I suspect they will not work, while sequences that are known to work show up as unstable.
There wasn’t any PK240 data available when the model was trained, so that tracks.
Yeah, the A percentage thing had to do with sequence dropout. Some of the dropout issues are likely to be improved with recent advancements, and beyond that it may be possible to get usable data via other means, but to be honest I don’t know quite enough about this.
Have you shared any examples of known-working sequences that RibonanzaNet doesn’t predict well? I don’t think I’ve seen that yet.