Is NUPACK's energy estimation of flush stacking leading us astray in the OpenTB puzzles?

Omei · September 12, 2016, 2:10am

@Atanas: OK, I see how your count of three differed from mine.

As far as “test the gap between the reporter and the oligo”, my experiment did that. But the gap on one side was widening at the same time the gap on the other side was narrowing. So now I’m thinking that you mean there might be a better way to design the experiments that would better isolate the contribution of just one of the gaps. I can think of other ways to design a series, but do you have a specific alternative in mind?

As for what is actually happening at the ends of the our design, unless I misunderstood his comment, Rhiju recently said that there was a DNA strand at both ends of our design RNA. @rhiju, did I understand correctly?

atanas.atanasov · September 12, 2016, 2:35am

I think the most flexible would be the one where only the design and the reporter interact, because we can tweak the design as we wish, while the input oligos are fixed. But if both ends of the design are tethered this won’t be possible.

Omei · September 12, 2016, 9:32pm

Ok, I think I have a proposal for how to quantifying dimeric stacking energies.

@Atanas, your suggestion that we reduce the complexity by limiting ourselves to three oligos at a time really helped me come up with this proposal.

I’m creating a document Quantifying the thermodynamics of dimeric stacking to put in more details, but here’s the main idea:

Extending this outline in three dimensions (rightward, downward, and putting A, B or C on either end of the reporter), we can systematically create an extremely large set of data on dimeric stacking.

The interpretation of the data, though, is going to be somewhat challenging, and refining that plan will probably influence which of the above measurements are most important.

I think the goals of the analysis should be

First fit the data to a nearest neighbor model that only takes into account the four bases that form the base of the two helices, i.e. ignoring the single-stranded dangles. This is clearly an oversimplification of the situation, but should be a lot more accurate than what NUPACK is currently using.
Extend the analysis to include the two dangling bases next to the helix. These bases are clearly important, because they can form non-canonical bonds with the other helix. The most commonly interaction might be a dimeric version of the A-minor motif, where dangling A would fit into the minor groove of the other helix, forming a base triple. A reasonable good “nearest neighbor” model for dimeric stacking will probably need to consider these two dangling bases in addition to the four that make up the nicked 0-loop.
Is anyone interested in collaborating on this effort? You wouldn’t need to have a lot of Eterna experience. The primary skills that would be useful would be programming (for generating the designs algorithmically and perhaps some manipulation of the data after collection), data analysis, and paper writing.

atanas.atanasov · September 12, 2016, 9:55pm

Have you checked if nupack reports 0 for the case of a dangling base(s). As far as I understand their algorithm, the case of dangling bases is evaluated as a outer loop with that consists of the dangling bases and should report non-zero energy for that loop. That is, the second and on series probably have already been modeled by nupack.

Also if I correctly understood, there is a more updated model of Turner? Maybe someone already researched this?

Another concern is how do you actually measure this energy. My understanding is that dG reported by the experiment is the difference in the reporter energy of being bound and unbound. Would that be enough to isolate just the energy of the gap? I don’t see any relation between the dG reported by the experiments and the ones in the simulation regarding the energy of the reporter binding (even if I disassemble it into Kd-s per complex and include pseudo knots). So maybe a model that explains the dG-s reported by the experiment and their relation to the simulation might be required.

Omei · September 12, 2016, 11:44pm

Good questions.

I haven’t looked into NUPACK’s estimate on the dangling base associated with stacking of separate oligos. At this point, I’m working on the presumption that NUPACK is not helpful for predicting our lab results once the two oligos become adjacent. That may turn out to be unduly pessimistic, but at the moment I am focusing on the opportunity to gather experimental data from the current round, which is closing in only a couple of days.

I have looked at the Turner parameters, and can say that they don’t address the question of multiple oligos at all. I’ve also searched the net for any sign of thermodynamic parameters for adjacent dimeric stacks, and have come up empty. But I’m not an expert in the field, so this doesn’t say a lot.

As for calculating the energy difference between two conditions, it is my understanding that the natural log of the ratio of the KDs (or the difference in the logs of the KDs) is a measure of the energy difference, in units of kT, between the two experimental states. (Given, of course, some simplifying assumptions). But I really need to get more input from @rhiju before saying anything about that with confidence.

atanas.atanasov · September 13, 2016, 12:14am

Do you know if the experiment uses the input oligos as defined in the games or it uses the full-length tuberculosis RNAs?

Omei · September 13, 2016, 3:55am

Yes, the same as those in the game.

Omei · September 13, 2016, 7:19pm

@atanas, prompted by your question about how the energy contribution due to stacking, I’ve modified my proposal; I realized that some of the data I collected in the previous version wouldn’t have a simple comparison.

Here’s the current screenshot from https://docs.google.com/document/d/1-G1pVWzacMBXuwISi2qaGfGrluHHdWEpE-qzYeUQD14/edit:

The idea here is to make sure that every observation of directly connected stacks has a an instance of a single base separation to be compared against.

I’ve actually added two comparisons, using either an A and a U for the intervening base. Ideally, it wouldn’t make a difference. But it probably will, and I would like to know what that is. Hopefully, that knowledge can be used to refine the estimate of how much of the difference is due to the coaxial stacking, as opposed to some base pairing involving the separating A.

atanas.atanasov · September 13, 2016, 8:27pm

I think dangling oligos would create too much interference as the number of dangling bases increase. That is, the first series look ok, the next series start to introduce more and more factors that could contribute to the energy. So I think we should reduce the tests to the 0-gap vs 1-gap cases.

We can measure the difference between the 0-gap and the 1-gap it seems by the dG-s of the two experiments. What bothers me is the to find the dG of the 0-gap we need to know the dG of the 1-gap. We can take the 1-gap energy from the current model used by nupack.
Adding a test with A replaced with U is a good idea because then we would have to values for the 0-gap vs 1-gap model and if we both values match then probably the model is good and our measurement might be ok. Maybe we should try with not only U but C and G.
Still there is the danger that this extra base might influence the dG by influencing the pairing probability and thus preventing or helping the reporter bind.
In general the experiment is very limited by the fixed inputs and that the inputs and the reporter are not equal in their chance to interact with the design because of the way the experiment is set up. Thus probably the result wouldn’t be deemed very reliable. But if we are lucky to get the exact same energy value in all 4 measurements this might be a good start.

Omei · September 13, 2016, 10:19pm

Agreed that there is a possibility of longer dangles influencing the results. For example, it is known that triplet stranded RNA helices can form. But if we limit the experiment to no more than single dangles, we only have 6 * 3 = 18 data points we can measure with the fixed set of oligos. But all of the other data points, 1680 in all, are relevant to future rounds of the OpenTB challenge because all can be present in a plausible switch. What’s in question is how well be can distinguish the relevant contributions of pi-pi stacking, pair-pair bonding, other hydrogen bonds, backbone torsion, … . And that’s always an issue issue in interpreting RNA thermodynamic data.

I’m think that when we get the results we would examine them critically before deciding exactly how to process it and what to claim as a conclusion. For example, if ddG values for some (or all) of the measurement points are clearly outside the distribution predicted by most, we might eliminate extreme outliers and/or fit the data to a more complex model, perhaps one that includes dangle lengths, and not claim it is all due to pi-pi stacking.

Omei · September 15, 2016, 7:26pm

I think I have a better plan for picking the non-adjacent stacking case against the adjacent one.

As I was writing the script to generate the designs, I realized that my plan for using both A and U as spacing bases wouldn’t be sufficient in the case where the two nearest dangles were A and U; the spacing base would pair with one or the other of the dangling bases.

A better plan, I think, is to always use a spacing of two, with each of the spacing bases selected so as not to form a pair (of the strong ones Eterna recognizes.)

So for the adjacent stacks on the left, a possible comparison case would be on the right, where the the spacing base U would be randomly chosen between U and C, since A and G would form a pair with the dangling U. Similarly, the second spacing base would be chosen between A and G, so as not to pair with the dangling G.

This way,

This requires only two designs per data point,
Neither of the stacks will be inadvertently extended, and
We’ll get sample of all the various adjacent bases, so we can better distinguish the base stacking energy from non-classical hydrogen bonding of the unpaired bases.

Omei · September 17, 2016, 4:39am

@rhiju, in one of the sets of data I published above,

the KD values for stacks that were flush and separated by a single A were 0.28 and 5.38. It is my understanding that the natural log of the ratio of the KDs is a measurement of the energy difference, in units of kT, between the two experimental states. In this case the computed value is -2.96 kT

Two questions:

Is this indeed how we should interpret Johan’s measurements, and
If so, what is the conversion of kT units to the kcal/mol units we normally use?

rhiju · September 17, 2016, 3:14pm

yes. that’s totally valid.
k = Boltzmann constant[ 0.0019872041(18), kcal/(mol⋅K)] and T = 37°C = 273.15+37 K = 310.15 K, so kT = 0.616 kcal/mol. In your example, -2.96 kT = - 1.8 kcal/mol. Pretty big!