Is NUPACK's energy estimation of flush stacking leading us astray in the OpenTB puzzles?

@nando, @rhiju: I’ve been noticing a number of designs being submitted that rely on what seems to be to be a mis-prediction on the part of NUPACK.

Compare the following prediction, which seems reasonable:

to this is one, where I shifted bases 21-25 to the left, creating the potential for two flush stacks:

NUPACK predicts that the the reporter will not bind in this scenario, where it would if there is a base separating the two stacks.

To understand why NUPACK made this prediction, I set the target mode to have both stacks form:

Nupack is predicting a 0.0 energy bonus between the two stacks.  My understanding of the Turner 2004 parameters, on the other hand, is that flush stacks should be evaluated as though both backbones were unbroken. In this case, that corresponds to a -1.7 value instead of 0.

I know the very low concentration of the reporter oligo that we are using in this round leads to predictions I am not used to seeing.  (Specifically, the estimated energy for a folding that binds the reporter can need to be not just lower, but a lot lower than the non-binding folding energy, for the former to be predicted.  And I understand why that is.)  But I can’t see how the low concentration would be a factor in the comparison above.  Am I missing something?

I don’t expect Nando to try to tweak NUPACK.  But if I am right, we can at least alert players that designs that rely on flush stacking to eject the reporter are not likely to work that way in vitro.

2 Likes

Omei, that’s potentially an important obsercation. and actually that does look like a bug in NUPACK.

I know michelle was looking at how the code treats stacked helices, and it has some ‘special’ code to handle them, which may warrant a closer look.

but more likely is that in multistrand mode, NUPACK ‘forgets’ to handle flush stacks. if that’s the case, perhaps if we find the bug fix we can implement and also let the NUPACK devs know about it.

@nando, any thoughts?

I haven’t specifically looked into coaxial stacking in the multistrand case, although I agree that it seems likely that that’s what is happening here.

About the specifics of the presented issue: to begin with, NuPACK’s engine is tailored for older parameters (1995-1999) and is missing some code for handling parameters that were introduced only in Turner 2004. This is not necessarily the case of the one you’re mentioning, but I wanted to make it clear that there are things specified in Turner 2004 that current NuPACK builds *cannot* even handle.

Also, in the case you’re presenting, I would be quite doubtful that Turner 2004 has it right in this case. Really? A break in the backbone and the stability is supposed to be exactly the same? I propose people make following experiment: take 2 identical ladders. On one of them, use a saw to break through one of the rails, and now test and compare their stability thoroughly.

Now, from a general point of view: our models are just that, models. They are clearly imperfect and have always been. If they had been good enough, Eterna players would *not* have been capable of intuiting methods to improve the design of single state structures like they did a couple years ago.

Another important factor is that the MFE is not necessarily a very good predictor of how RNA strands behave in solution. It is a reasonable start, but it’s no guarantee, by far, no matter what the model or the folding engine. 

In short: I don’t believe that it is worth spending much time investigating the presented issue. Many other factors can influence the result/prediction much more strongly anyway.

Good idea about doing experiments.  I’ll propose a different set of experiments, though. Instead of ladders, we could use RNA molecules.  As I understand it, there is a crazy professor at Stanford who is willing to run experiments on RNA molecules for no charge.  In fact, the latest thing I heard is that he is actually encouraging people to publish the results of their experiments in scientific journals.  Now that’s really crazy.

1 Like

@omei it might not take long for you to verify this omission of stacking bonus in multistrand – just send in some of these test examples onto the NUPACK server and see if flush stack bonuses go in properly for single-strand cases but not multi-strand. 

if that’s the case, the nupack source code is available for download – perhaps if you or others check it out you might find a few lines that would fix it. And then we could easily implement it in the game as well as coordinate with the nupack developers to see if they’d be interested in updating their code.

@rhiju: following up on your suggestion to see if flush stack bonuses go in properly for single-strand cases but not multi-strand, I created two series of flush or near flush stacking scenarios, one using a single strand and one using 2. At least for the closing CG/UA pairs that I tried, NUPAC makes no distinction between single and two strands when predicting the effect of inserting additional As between two flush stacks.  In both cases, the predicted MFE drops by .9 kcal/mol for the first A, and .3 for the second.  

It looks to me that NUPACK is indeed leading us astray.  In R104, I submitted some designs that ignored the stated puzzle goals and instead did a controlled test of stacking predictions.  I’ve only looked at one of my better-thought-out experiments so far, but the results seem unambiguous. 

For the A (AND) B - RO puzzle, I submitted 7 designs with the name 
Omei Coaxial Stacking Series A v, where N varied from 1 to 7. Here is state 4 of v4:


In state 4 (both A and B present at 100nM), all but the 16 center bases are tied up either in static stems or by one or the other oligo. The only difference between the 7 variations is where the reporter complement is positioned. In v1, it is positioned all the way to the left of the 16 center bases; in v7, it is all the way to the right.

NUPACK predicts that the MFE is lowest for the middle 3 positions, increased by .02 kcal where the there is a single A separating the reporter complement from either of the other stacks, and significantly higher when the stacks abut. In fact, when slid all the way to the left, NUPACK predicts that the MFE structure doesn’t have the reporter binding.

The lab data appears to say the opposite:


Unless I am just totally confused (which is always a possibility), low values of KD100nM_A_B say that it takes only a small concentration of the the reporter for the reporter to bind to the design. Higher values say that it takes higher concentrations, i.e. the binding of the reporter is weaker. So the lab data seems to say that the binding of the reporter is stronger the closer the reporter stack is to one or the other of the A or B oligo stacks.

Please (gently) correct me if I’ve misinterpreted the data.

This observation for the reporter oligo, and the generalization that co-folding gets stronger, not weaker, as the gap between the co-folded helices gets close together, is turning out to explain a lot of things that are otherwise puzzling in the data, both from R104 and previous rounds.

I’l start with an example from the R104 Sensor A - RIRO puzzle.  I submitted a series of 7 designs that had (at least appeared to have had) a simple pattern that has always done well for me in the past.  They all had (almost) the same active switching sequence, but unlike the example above, the surrounding bases weren’t systematically controlled.

Here is NUPACK’s prediction for one of them:

I’ve marked the strong kernel attractions that can bind to either the reporter or the A oligo.

Much to my surprise when I saw the very first preliminary data, five of these seven had a fold change less than 1.0, meaning that instead of the reporter being ejected by the A oligo, it was supposedly binding more strongly when A was present.  (In fact, this design was supposedly acting as a good RI/RI switch, with a fold change of 13.8 when viewed that way.) Since the data didn’t make sense, I mentioned it to Johan, with the idea that it might point to a problem in some step of the processing.  But he found nothing wrong in the processing, so eventually I had to accept the data and figured out what really happened. 

The data implies this is what actually happened:

The energy reduction inherent in adjacent stacking was allowing the design to bind simultaneously to the A oligo and the reporter, making the net energy of the dimer lower than having only A bound.  NUPACK wouldn’t predict this because its energy model adds a penalty for the adjacent stacking instead of a bonus. 

Note that I have also marked another pair of adjacent stacks in yellow.  Thus, the folding of this design, which had the lowest fold change of the seven (as a RIRO), is reinforced by two adjacent stacks.  Other designs in the series had differing number of bases between these two stacks, and that seems to be the major contribution to the ordering of KD values for this state.  There are no other designs with this second adjacent stack; the only design with one separating base between the stacks has the next lowest KD, and the only design that has a fold change significantly greater than 1.0 has no supporting stack at all on that end.

I did another series of designs for the A (AND) B - RO puzzle, just like the one above but with the A and B complements reversed.

Results for the last two designs got lost in the processing, but the other 5 tell the same basic story.

Again, the reporter binds much more strongly when it is flush against the oligo stack.  

Here, there is no confirmation that  the single-base separation has an energy value between those for 0 and 2 base separations.  But the error factors say that the differences among the latter four numbers shouldn’t be taken as being statistically significant.

Last time I checked v6 and v7 were not lost, just under strange names (probably their initial names), search in the sequence column to find them.

That’s what I thought at first, too.  But the ids of those incorrectly named designs don’t match the ids of my designs, either.  The ids are what ties the eterna database to the measurements, so I’m pretty sure that those two lines of data go with their ids, whatever designs those happen to be.

Thank you for caring!  

Did you enter those two designs under those names initially and later delete and resubmit under the new names? Maybe the DB for some reason kept the first entries.

No, I didn’t change the name.  The flow of events that led to the publishing of these incorrect entries is collectively known, though I’m not sure any one dev has the whole picture.  Only about 1 percent of the entries were lost, and I think most of those disappeared completely rather than showing up under an assumed name.  I’ve given a heads up to the three players that seemed to be most affected.

@rhiju: You had suggested that NUPACK’s mis-prediction might be a bug in the code.  But now that Vienna has a server for predicting dimer structure, I ran a test on it.  Here’s the results of two runs, the first with the reporter complement directly adjacent to the main RNA’s hairpin, and a second with a single A inserted between the reporter complement and the hairpin stem.

Vienna also believes that abutting the stems is energetically unfavorable.

This makes me wonder whether there just isn’t good experimental data for dimeric coaxial stacking, and whether we should submit more experiments of this type in the current round with the intent of publishing the data.  

Your thoughts?

hey, that’s a super-interesting idea. can you outline a systematic set of expts that would let us measure the dimeric coaxial stacking energy in many different ways?

I think I would start out by running the same series (7 designs) using the oligos in the present round, which are different than the first:  For example,

Using one of the A, B or C oligos at either end, this would be 9 series.  The primary goal here would be to compare the flush values (at either end), against the middle values, where presumably interaction between the two oligos is negligible.

(We could try for a larger variety of base pairings by considering designs of this form:
But I don’t think we should use these, at least initially, because of the fact that in the experiment, the ends of our design are not really free, but tethered to DNA molecules, and we really don’t know how that would affect things.)

That’s only 63 synthesis slots.  The next tier of experiments, I think, should be to measure what happens when there are one or two single strand strands emerging from the abutment of the stacks, e.g. 

The number of possibilities here (e.g. which one or both of the helixes has an extending strand, the length of the strands, the bases that make them up) quickly explodes, so we could take advantage of how many synthesis slots were available.  I would have to think more about what to prioritize once I had some notion of how many slots were available.

But before spending too much time working on the finer details of this proposal, I would want to see if others have alternate suggestions for entirely different experimental configurations.

My suggestion is that you reduce the complexity by using only 3 interacting oligos. It would be good to have some way to test all possible pairs next to the nick - the energy would probably be different for different pairs. Testing all pairs with these limited inputs might be hard though.

Thanks for your interest, Atanas.  

I was thinking that each specific design should bind to only three oligos (the reporter and two others, one at each side). I’m thinking that is what you meant.  Or did you mean only gather data on one combination of the A/B/C choices.

And agreed it would be good to get some data for all possible pairs across the gap.  Although the experimental setup doesn’t give us enough flexibility to do that without having one or both single strands sticking out, I think that should be one of the considerations for choosing designs for the second, more open ended, level of experimentation.

I mean the design, the reporter and one oligo, so that you can test the gap between the reporter and the oligo. Considering we can’t test all possible pairs anyway, maybe the design+reporter is also a viable test even though the design has one end tethered, the other could be used for the test.