Issues with RibonanzaNet-SS

I’m seeing some odd-looking constructs with this, for example in Pseudoknot 100 in Round 6:

SV_9e

  1. Single pair pseudoknot 69/86
  2. Unbonded pairs 7/21 and 66/38 and 32/72
  3. 1-nt stem 1/140 along with the barcode forming a pseudoknot
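For anyone who wants to scan a prediction for constructs like these, here is a minimal sketch. It assumes "unbonded" above means an isolated pair with no stacked neighbor on either side, which may not be exactly what was meant. The function name `lone_pairs` is made up for illustration, and the three-pair stem at 10-12/48-50 is invented so there is something that is not flagged; the other positions are the ones listed above.

```python
def lone_pairs(pairs):
    """Return pairs that have no stacked neighbor, i.e. 1-bp 'stems'."""
    # Normalize so that every pair is stored as (smaller index, larger index).
    norm = {(min(i, j), max(i, j)) for i, j in pairs}
    return sorted(
        (i, j)
        for (i, j) in norm
        # A stacked neighbor sits directly outside or directly inside the pair.
        if (i - 1, j + 1) not in norm and (i + 1, j - 1) not in norm
    )


# Pairs mentioned in the post, plus a hypothetical 3-bp stem for contrast.
example = [(7, 21), (66, 38), (32, 72), (69, 86), (1, 140),
           (10, 50), (11, 49), (12, 48)]
print(lone_pairs(example))
```

This only checks stacking; it says nothing about whether a pair is part of a pseudoknot.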

This is actually not a bug, but what the model is predicting. If it’s wrong, it’s poor behavior of the model!

@spvincent Could you post a screenshot of the RNNet-SS prediction and the SHAPE-directed (experimental data) prediction? I’d like to see how RNNet compares to the reactivity data.

[Screenshot: RibonanzaNet-SS prediction]

[Screenshot: EFTK prediction]

Since it’s Round 6, there’s no SHAPE data yet. I was trying to get a similar submission from Round 1 or 2 and paste it into the Round 6 puzzle, but attempts to browse submissions from those rounds right now result in an error message.

Oh right, too late in the day. I thought your post said round 1. That single pair at 69/86 as a tertiary contact is wild - but actually possible! The others don’t surprise me as much. Stems are going to start looking a lot different than we are accustomed to seeing. Bases do not pair in Nature nearly as neatly and consistently as the nearest neighbor models (Vienna) display.


My issue with RibonanzaNet as well as EternaFold ThreshKnot is that impossible puzzles can be created. In this example there is only one sequence that is able to form this structure. Any mutation to any base will change the structure. A base pair change far away from the pseudoknot should not affect the structure.

I’m not surprised that changing a base away from the pseudoknot is changing the prediction, and also possibly the structure. The pairing probabilities change. Is there anything odd in the dot plots? Would be insightful to submit 20-30 of these sequences in round 6 to get the reactivity data. Are the changes in the RNNet prediction minor? If the changes are minor, then the reactivity data may not tell us much.

Yeah, ultimately this is just the way these models work. “Conventional” MFE models are convenient to reason about: since they pick the structure with the lowest sum of substructure energies, you can’t cause non-local changes unless everything between the mutation and the other area changes.

With EFTK, this changed because it picks the best pairs based on the pairing probabilities, i.e. the combined likelihood from all possible structures. A mutation will cause a change in likelihood across many alternative structures, which can result in a “remote” pair no longer making the cut, particularly if it’s “on the edge”. With RibonanzaNet it’s actually looking at the entire sequence holistically, essentially considering the impact of every base/set of bases on every other base/set of bases, and running a function that comes up with most-likely pairs that’s tuned to known data. But it doesn’t have any constraints that would require any real defined behavior beyond what it has “inferred” from its training.
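To make the “on the edge” point about EFTK concrete, here is a toy sketch of threshold-based pair picking. It is not the actual EFTK code: the function name `threshold_pairs`, the 0.5 cutoff, and all the probabilities are made up for illustration. The rule shown (keep a pair if its probability beats the cutoff and is the highest for both of its bases) is my understanding of the ThreshKnot idea.

```python
def threshold_pairs(bpp, theta=0.5):
    """Keep (i, j) if p > theta and p is the best probability for both i and j.

    bpp maps (i, j) to a pairing probability; theta is a hypothetical cutoff.
    """
    best = {}
    for (i, j), p in bpp.items():
        for base in (i, j):
            if p > best.get(base, 0.0):
                best[base] = p
    return sorted(
        (i, j)
        for (i, j), p in bpp.items()
        if p > theta and p == best[i] and p == best[j]
    )


# Invented numbers: a mutation elsewhere shifts the ensemble a little,
# and the borderline pair 7/30 slips from 0.52 to 0.47.
before = {(1, 20): 0.95, (2, 19): 0.93, (7, 30): 0.52, (7, 28): 0.30}
after = {(1, 20): 0.95, (2, 19): 0.93, (7, 30): 0.47, (7, 28): 0.35}

print(threshold_pairs(before))  # [(1, 20), (2, 19), (7, 30)]
print(threshold_pairs(after))   # [(1, 20), (2, 19)]
```

The strong pairs are untouched, but the borderline one disappears even though nothing near it was mutated, which is the kind of remote change being described.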

I’m not sure there’s any real way around this (at least, that we know of) - it’s a price we pay for both accuracy and speed.

I’m seeing negative cross-pair scores (SV_odd in Pseudoknot 100 in Round 6). Is this legit?

The eF1 you see in the game is weighted in a way that could lead to a range that’s not 0 to 1 (even though it “should be” 0 to 1). We’re currently working on some different models which likely won’t have that behavior. More soon.