RibonanzaNet-SS in dev puzzlemaker

RNet has been enabled in puzzlemaker on the eternadev.org site so players can create RNet puzzles and assess how solvable those puzzles are. (New players will need to earn 20,000 points on the dev site in order to post puzzles.) We would like to avoid a repeat of EFTK, where players got frustrated by how difficult and illogical the EFTK player puzzles were. If you don’t have a dev account, you will need to create one. You can use the same player name and password as your main account for convenience if you like.

RNet has no energy calculation; instead it provides an F1 score. View the F1 score by opening the spec box and keeping it open while flipping. (The box can be resized by dragging the corners.) If F1 proves unhelpful, then I can report to the researchers that RNet in the game would benefit from another metric for puzzle solving.
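For anyone unsure what the F1 number means, here is a minimal sketch of the usual base-pair F1 between a target structure and a predicted one. The post does not say exactly how the game computes its score, so treat this as an assumption; the function names `parse_pairs` and `f1_score` are made up for illustration.

```python
# Bracket families often used for pseudoknotted dot-bracket strings.
BRACKETS = {"(": ")", "[": "]", "{": "}", "<": ">"}


def parse_pairs(structure):
    """Return the set of (i, j) base pairs (0-indexed, i < j)."""
    closers = {close: open_ for open_, close in BRACKETS.items()}
    stacks = {open_: [] for open_ in BRACKETS}
    pairs = set()
    for j, ch in enumerate(structure):
        if ch in BRACKETS:
            stacks[ch].append(j)          # remember the opening position
        elif ch in closers:
            i = stacks[closers[ch]].pop() # match with the latest opener
            pairs.add((i, j))
    return pairs


def f1_score(target, predicted):
    """Harmonic mean of precision and recall over base pairs."""
    t, p = parse_pairs(target), parse_pairs(predicted)
    if not t and not p:
        return 1.0                        # both fully unpaired: perfect match
    true_positives = len(t & p)
    if true_positives == 0:
        return 0.0
    precision = true_positives / len(p)
    recall = true_positives / len(t)
    return 2 * precision * recall / (precision + recall)
```

With this definition, a prediction that finds only one of two target pairs (and nothing extra) has precision 1 and recall 0.5, so it scores about 0.67 rather than 0.5.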

I recommend we keep puzzles under 300 bases to start because the RNet calculations become slower as the puzzles get longer. It will be great to see lots of pseudoknot puzzles, but non-pseudoknot puzzles also are good for testing. The puzzles on the dev site are for testing; they may not persist in the future. If you want to keep the structures for later use, keep a record for yourself.

Here are some observations I have made while making RNet puzzles:
There are no energies, though you can still tell where the bulges are, and all loops are boostable.


Interesting: a GU on 80;71 forms this PK on 24;120. Another pic shows a GC unbinding in a stem.
This Rnet puzzle is called Rnet Study 3.

Interesting, it seems like since it’s an ML model, instead of building up the pairing probabilities, it starts with almost everything being able to pair with something else (albeit with <0.1 confidence), and as you add more pairs, it removes possible bonds until only your intended structure can form. It seems to like structures like <…(… …)…> a lot more than other folding engines, same with PKs. It looks a lot like other AI models trained on a somewhat small dataset, where when given a really unlikely case, they give a really unlikely output. It’s a really interesting folding engine.

Thanks for the feedback! Can you post the RNet puzzle with that structure on the dev site? RNet was trained on >500,000 RNA sequences. Is that considered a small dataset?

Interesting. Simpler AI training sets like MNIST (https://en.wikipedia.org/wiki/MNIST_database) only have 60k training and 10k test images. While RNet might be a different type of model from these AIs, since they have fixed input and output spaces, over 500k is a respectable size for a training set. I assume that’s the training data, not the test data, so it is interesting to see it create wildly impossible pseudoknots on a string of Adenines with only a few Uracils, Guanines, and Cytosines. I’ll look into it more, however, and post the unlikely sequences on the dev site soon :slight_smile:

Puzzle is posted :smiley:, here’s the seq if anyone needs it (they probably will): AAAAAAAAUAAAACAAAAUAAAAAAAAAACUUUUAAAAAAAAAAAAAAAAAAAAAAAAGUGACAAAUAAACAAAAAAAAAAAAGCGAAAAAAAAGAAAAAA

To set RibonanzaNet-SS in context for newer players, RNet is far from perfect. RNet 2 has been released, if anyone would like to try it on the Kaggle site. No accurate RNA folding prediction model exists. That is one of the primary reasons Eterna exists: to help researchers develop better models. I personally don’t think a 2D model, such as Vienna 2, could ever be usably accurate without taking 3D contacts into account, which is precisely what we are trying to accomplish with RNet.

@DigitalEmbrace I’m not sure that I would say that RNet takes 3D folding into account. Rather, it is more willing to embrace pseudoknots as a valid possible structure; while many folding algorithms are more either-or about pairing possibilities, RNet is more open to “both”. Perhaps 2.5D in a sense.

The original (current) version of RNet does not generate a 3D model. The hope is that a future version will handle 3D. (The dilemma there is that such a model likely would be very slow in the game. But useful for medical researchers!)

The reactivity data reflects 3D contacts and bonding.

If by 3D contacts you mean pseudoknot bonds then yes, you are correct. The reactivity data includes that information. However, to me a full 3D model means one that predicts the 3D spatial orientation of the bases and their atoms, and RNet 2 is still quite a way away from that. It is, however, a kind of 2D+ model that includes the pseudoknot bonding, making it at least 2.1D. Adding non-WC/wobble bonds (see “Non-canonical base pairing” on Wikipedia) might make it even more accurate, but I’m not quite sure how one would easily present that. Perhaps in addition to the secondary structure string with (<[.)>] pairing markers one could add another string that encoded the -|[ct][WHS][WHS] (1+2x3x3=19) pairing types. However, that would still only work for cases where each base paired with at most one other base, and there may be exotic but important cases where a base pairs with multiple other bases.
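To make the 1+2x3x3=19 count concrete, here is a small sketch that enumerates the codes matched by the pattern -|[ct][WHS][WHS]. Reading "-" as unpaired, c/t as cis/trans, and W/H/S as the Watson-Crick, Hoogsteen, and Sugar edges is my interpretation of the post; the function name `pairing_types` is hypothetical.

```python
import re
from itertools import product

ORIENTATIONS = "ct"  # cis, trans (assumed meaning)
EDGES = "WHS"        # Watson-Crick, Hoogsteen, Sugar (assumed meaning)
PATTERN = re.compile(r"-|[ct][WHS][WHS]")


def pairing_types():
    """List every code the pattern accepts: '-' plus orientation + two edges."""
    codes = ["-"]  # the single "no pair" symbol
    codes += [o + a + b for o, a, b in product(ORIENTATIONS, EDGES, EDGES)]
    return codes


types = pairing_types()
# Every generated code should match the pattern from the post.
assert all(PATTERN.fullmatch(code) for code in types)
print(len(types))  # 1 + 2*3*3 = 19
```

Note that this counts ordered edge pairs, so tHS and tSH are separate codes, which is what the 2x3x3 arithmetic implies.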

It would be interesting to see a 3D version of Rnet, but it might be an easier step to release a 3D version of Vienna or EteRNAFold first, since it would both be easier to test and it is a simpler model. It might also be the premise for a new EteRNA-like game, where we have to find out the 3D model!

@DigitalEmbrace It occurred to me that the non-canonical base pairing could probably be represented by a third “edge” string that encoded the edges (Watson-Crick, Hoogsteen, Sugar) and orientation (uppercase for CIS, lowercase for TRANS) used by each base. So, for example, an A:G tHS boost in a GNRA tetraloop would be sequence “GNRA” plus shape “(..)” plus edge “s..h”, indicating that the Sugar edge of the Guanine trans paired with the Hoogsteen edge of the Adenine. I don’t know if there is enough information in reactivity data to extract all of this though. Also, if a base interacted with a second base beyond the primary bond, then additional shape2 plus edge2 strings could encode these secondary bonds.
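Here is a rough sketch of how a shape string plus an edge string could be decoded, just to check that the idea hangs together. It is only an illustration of the proposal in this post, not an existing format: `decode_edges` is a made-up name, only "(" and ")" are handled, and requiring both partners to share the same case is my assumption.

```python
EDGE_NAMES = {"W": "Watson-Crick", "H": "Hoogsteen", "S": "Sugar"}


def decode_edges(shape, edge):
    """Return (i, j, orientation, edge_of_i, edge_of_j) for each pair."""
    if len(shape) != len(edge):
        raise ValueError("shape and edge strings must have the same length")
    stack, result = [], []
    for j, ch in enumerate(shape):
        if ch == "(":
            stack.append(j)
        elif ch == ")":
            i = stack.pop()
            ei, ej = edge[i], edge[j]
            # Assumption: one pair has one orientation, so cases must agree.
            if ei.isupper() != ej.isupper():
                raise ValueError("partners disagree on cis/trans")
            orientation = "cis" if ei.isupper() else "trans"
            result.append(
                (i, j, orientation, EDGE_NAMES[ei.upper()], EDGE_NAMES[ej.upper()])
            )
    return result


# The GNRA example from the post: shape "(..)" with edge "s..h".
print(decode_edges("(..)", "s..h"))
```

For the GNRA example this reports a trans pair between base 0 (Sugar edge) and base 3 (Hoogsteen edge), matching the description in the post.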

Is anyone not making puzzles on eternadev.org because Puzzlemaker is requiring the player to earn 20,000 points on the dev site first? This can be changed if it is an issue.

For anyone comfortable with using the Kaggle site, there is a notebook for running RNet-SS 2. It would be quite informative to take sequences where RNet predicts poorly and run those through the notebook to see what the new prediction is. I’m particularly interested to see if the WC pairs that open in the middle of stacks in RNet 1 still unpair in RNet 2. A spreadsheet tracking the comparison would be great evidence to share with the researchers.

I’m looking through PDB files as candidates for the RNet Collection and right away came across an example of a natural molecule (PDB 5V3I) where an A and U at one end of a stack and a G and C at the other end do not pair.


Below is the molecule with non-canonical interactions. Those add a lot more structure in the area but nothing that directly interferes with the potential GC or AU pair. The reason these “pairs” don’t pair most likely is the underlying atomic geometry.

(I loaded the dot-bracket and sequence into Puzzlemaker on the dev site to test what RNet would predict. RNet predicts something entirely different in the area.)

My point is that when we see unexpected behavior in RNet, it might actually be correct. Natural RNA does not strictly follow the rules of nearest neighbor thermodynamics to which Eterna players have become accustomed.

I suppose it is possible that a pseudoknot or other tertiary structure could cause a stack to have a 1-1 loop somewhere in the middle (in order to flex/bend) where it would ordinarily have a WC pair, but I’d be surprised if RNet-SS or RNet2-SS had enough training/knowledge to be capable of figuring that out.

Here is an update from Shujun on RNet-SS 2 performance improvements: Stanford RNA 3D Folding | Kaggle

Going to respond to a bunch of things at once with my perspective:

Interesting, it seems like since it’s an ML model, instead of building up the pairing probabilities, it starts with almost everything being able to pair with something else (albeit with <0.1 confidence), and as you add more pairs, it removes possible bonds until only your intended structure can form.

My intuition is that this is at least somewhat an artifact. E.g., you see this with a string of all As, but I imagine that’s likely because it’s “out of distribution” (i.e., a situation that the model has no way of understanding because it’s never seen anything like it before), since the training data has limited “poly-N” sequences due to those not working with the experiments. Though I don’t know if there’s any information on what is expected of that situation in an experiment (does it have some other interesting behavior aside from just sitting around unpaired?)

It seems to like structures like <…(… …)…> a lot more than other folding engines, same with pks

I will note that it has a sorta-similar behavioral bias to EFTK, in that it takes pairing likelihood and then picks out non-conflicting pairs above some threshold (using Hungarian rather than Threshknot, but relatively similar), so you can wind up with some weird behavior on “borderline” pairs. That said I think RibonanzaNet may have more “strong opinions” about pairings, making that show up less often? Haven’t gotten to really play enough with it, I’m sure others may have more experience with that.
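To illustrate why “borderline” pairs can behave oddly, here is a minimal sketch of thresholded pair selection. To be clear, this is a ThreshKnot-style mutual-best-partner rule, not the Hungarian assignment that RNet uses in the game, and the 0.5 threshold, the function name `select_pairs`, and the probabilities are all invented for the example.

```python
def select_pairs(probs, threshold=0.5):
    """Keep pairs above the threshold where each base is the other's best partner.

    probs maps (i, j) with i < j to a pairing probability.
    """
    best = {}  # base index -> (probability, partner) of its strongest pair
    for (i, j), p in probs.items():
        for a, b in ((i, j), (j, i)):
            if a not in best or p > best[a][0]:
                best[a] = (p, b)
    chosen = []
    for (i, j), p in sorted(probs.items()):
        if p >= threshold and best[i][1] == j and best[j][1] == i:
            chosen.append((i, j))
    return chosen


# Base 1 is torn between partners 8 and 7 with nearly equal probability.
before = {(0, 9): 0.9, (1, 8): 0.51, (1, 7): 0.49, (2, 6): 0.3}
after = {(0, 9): 0.9, (1, 8): 0.49, (1, 7): 0.51, (2, 6): 0.3}
print(select_pairs(before))  # [(0, 9), (1, 8)]
print(select_pairs(after))   # [(0, 9), (1, 7)]
```

A shift of 0.02 in probability flips which pair is reported, even though the model's underlying "opinion" barely changed. That is the kind of borderline behavior any threshold-plus-conflict-resolution step can produce.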

And as DigitalEmbrace has mentioned, there may be situations that look weird to us, but could be reflecting some 3D behavior we’re not used to seeing! Using 3D models could help give a clue as to whether it’s more likely an artifact or more likely reflecting some interesting interaction.

I’m not sure that I would say that RNet takes 3D folding into account.

If by 3D contacts you mean pseudoknot bonds then yes, you are correct. The reactivity data includes that information.

Going to “yes and” the discussion here. The RibonanzaNet foundation model, trained on reactivity data, does have the capacity to take 3D contacts into account, even if they’re not Watson-Crick-Franklin pairs - and even potentially if they’re not reflected in the reactivity data directly.

Conventional thermodynamic models (particularly when we’re talking about minimum free energy structures from nearest-neighbor parameters) have to explicitly build any potential interaction the authors want to consider into their equations. However, RNet’s approach has the ability to pick up on any kind of motif - it has the ability to say something like “if I see a mix of As and Gs over here that has some characteristic, and I see a sequence that looks a particular way over there, it’s going to be protected”. And those motifs may represent some interesting 3D structure. And it doesn’t even need to be a specific sequence resulting in a specific reactivity profile - the architecture allows for things like complex patterns and inter-related conditions. This is the reason why there’s hope it could be useful as the basis of a 3D model! We have lots of reactivity data, but little 3D data, so the hope is the model can get a lot of insight from just the reactivity and then contextualize it with the 3D data we have.

That being said, what is ultimately reported by the RibonanzaNet secondary structure algorithm is limited. First off it was trained to predict Watson-Crick-Franklin pairs (ie, the training data that it was given to fine-tune from was the WCF pairs of some RNAs). That said, if there were 3D interactions which determined the WCF pairs, it could still pick those up. Additionally, even if the algorithm did predict a non-WCF pair, we filter those out in Eterna (I don’t think I’ve ever seen that happen, but I also haven’t rigorously checked!). Point being, while the 2D structure may be informed by 3D features, we don’t actually get information on what those 3D features are (nor does the model have the benefit of incorporating that contextual knowledge) - a true 3D model would be another step to giving us more information!

It would be interesting to see a 3D version of Rnet, but it might be an easier step to release a 3D version of Vienna or EteRNAFold first, since it would both be easier to test and it is a simpler model. It might also be the premise for a new EteRNA-like game, where we have to find out the 3D model!

Unfortunately, not really. That requires being able to encode specific 3D folding “rules”, which are hard to determine and measure. 3D behaviors - or rather, how you would model 3D structure algorithmically - are also substantially different from how algorithms like Vienna and Eternafold work. There are other models out there that do take a more traditional approach (such as some classic Rosetta RNA modeling code), but they’re not very good. :slight_smile: