Relative Concentrations of RNA Shapes

I’m curious about the relative concentrations of RNA shapes in solution. Although EteRNA shows just one shape (Minimum Free Energy?) as “the” shape for a sequence, other tools such as RNAshapes and RNAsubopt show that other shapes with different free energies can also be formed with different probabilities. In EteRNA, the only hint of these other shapes is in the Dot Plot.

My understanding is that in solution you would end up with each of the shapes showing up with a given relative frequency/probability that depends (in part) on the differences in free energy between the shapes. Different energy models might estimate these probabilities differently, though.

Is there a metric or rule of thumb for the relative concentrations based on the differences in free energies? For example, if a sequence had three likely shapes with free energies of -2.0, -1.0 and 0.0 kcal/mol, is there a reasonable estimate of their relative concentrations in solution (e.g. 25:5:1)? Or is that too simplistic?

From playing with RNAshapes and some simple sequences, it *seems* like each -1.0 kcal/mol of free energy difference corresponds to roughly a 5x difference in relative concentration. However, that is not 100% accurate, as shape/structure seems to play some role too: sometimes a higher-energy shape is given a higher probability.
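For reference, the standard model behind such numbers is a Boltzmann distribution: each structure gets a weight exp(-dG/RT), and the weights are normalized over the ensemble. A minimal Python sketch, assuming the three listed shapes are the only ones that matter and using the gas constant R = 0.0019872 kcal/(mol·K) at 37 °C (both are assumptions of this sketch, not settings taken from RNAshapes); relative_populations is a hypothetical helper:

```python
import math

# Gas constant in kcal/(mol*K); acts as the Boltzmann constant for per-mole energies.
R_KCAL = 0.0019872

def relative_populations(energies_kcal, temp_c=37.0):
    """Return normalized Boltzmann populations for the given free energies.

    Assumes the listed structures make up the whole ensemble, i.e. the
    partition function is approximated by this short list.
    """
    rt = R_KCAL * (temp_c + 273.15)
    weights = [math.exp(-g / rt) for g in energies_kcal]
    total = sum(weights)
    return [w / total for w in weights]

pops = relative_populations([-2.0, -1.0, 0.0])
# Express each population relative to the least stable shape.
ratio = [p / pops[-1] for p in pops]
print([round(r, 1) for r in ratio])
```

Under these assumptions the -2.0/-1.0/0.0 example comes out near 25.7 : 5.1 : 1, close to the 25:5:1 guess.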

AGGGAAACCA
-5.00 .((…)). 0.9997004 []
0.00 … 0.0002996 _
exp(ln(0.9997004/0.0002996)/5.0) = 5.07

AGCAAAAGCA
-1.20 .((…)). 0.8751232 []
0.00 … 0.1248768 _
exp(ln(0.8751232/0.1248768)/1.2) = 5.07

AGCAAAAGCAAAAGCAAAAGCA
-2.30 .((…((…))…)). 0.5711718 []
-2.40 .((…))…((…)). 0.4202706 [][]
0.00 … 0.0085576 _
exp(ln(0.4202706/0.0085576)/2.4) = 5.07
exp(ln(0.5711718/0.4202706)/-0.1) = 0.05
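The back-calculation used above can be written as a small helper. This sketch simply mirrors the exp(ln(p1/p2)/dG) formula; the name factor_per_kcal is hypothetical. The last call reproduces the odd 0.05 result, which arises because RNAshapes reports summed shape-class probabilities rather than single-structure probabilities (as Stefan Janssen explains further down the thread).

```python
import math

def factor_per_kcal(p_low, p_high, delta_g):
    """Population ratio per 1 kcal/mol implied by two probabilities.

    delta_g is E_high - E_low in kcal/mol, i.e. positive when the first
    probability belongs to the more stable entry.
    """
    return math.exp(math.log(p_low / p_high) / delta_g)

# Probabilities copied from the RNAshapes output above.
print(round(factor_per_kcal(0.9997004, 0.0002996, 5.0), 2))   # AGGGAAACCA
print(round(factor_per_kcal(0.8751232, 0.1248768, 1.2), 2))   # AGCAAAAGCA
print(round(factor_per_kcal(0.4202706, 0.0085576, 2.4), 2))   # [][] vs open chain
print(round(factor_per_kcal(0.5711718, 0.4202706, -0.1), 2))  # [] vs [][] (shape totals)
```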

What’s up?

That looks like a bug in RNAshapes. The lower energy structure should always have higher population… you might want to contact the authors with the problem case.

I’m curious whether the synthesis of simple sequences like the following could provide useful feedback for the energy models regarding multiple shape frequency distribution:

AGCAAAAGCAAAAGCAAAAGCA

According to RNAsubopt this sequence can fold at least 28 ways with free energies ranging from -1.50 to +4.50:

.((…((…))…))., -1.50
.((…))…((…))., -0.80
.((…))…, -0.40
…((…))…, -0.40
…((…))., -0.40
…, 0.00
.((…))…, 0.50
…((…))., 0.50
.((…))., 1.00
…(…((…))…)…, 1.50
.((…(…)…))., 2.10
.((…(…)…))., 2.10
.(…((…))…)., 2.10
.((…))…(…)…, 2.20
…(…)…((…))., 2.20
…(…)…, 2.60
…(…)…, 2.60
…(…)…, 2.60
.((…))…(…)., 2.80
.(…)…((…))., 2.80
.(…)…, 3.20
…(…)…, 3.20
…(…)., 3.20
…(…)…, 3.50
…(…)…, 3.50
…(…)…, 4.00
.(…)…, 4.50
…(…)., 4.50

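Using the rough metric of 5x lower concentration per +1.0 kcal/mol (normalized over all 28 structures), fifteen of these structures should occur >0.1% of the time (structure, free energy in kcal/mol, estimated fraction):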

.((…((…))…))., -1.5, 0.4871
.((…))…((…))., -0.8, 0.1579
.((…))…, -0.4, 0.0829
…((…))…, -0.4, 0.0829
…((…))., -0.4, 0.0829
…, 0, 0.0436
.((…))…, 0.5, 0.0195
…((…))., 0.5, 0.0195
.((…))., 1, 0.0087
…(…((…))…)…, 1.5, 0.0039
.((…(…)…))., 2.1, 0.0015
.((…(…)…))., 2.1, 0.0015
.(…((…))…)., 2.1, 0.0015
.((…))…(…)…, 2.2, 0.0013
…(…)…((…))., 2.2, 0.0013
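The table above can be reproduced with a short script. This sketch assumes the 28 RNAsubopt structures listed earlier are the whole ensemble and weights each one by 5^(-dG), i.e. the rough 5x-per-kcal rule rather than an exact Boltzmann factor; ENERGIES and populations are illustrative names.

```python
# Free energies (kcal/mol) of the 28 RNAsubopt structures listed above, in order.
ENERGIES = [-1.5, -0.8, -0.4, -0.4, -0.4, 0.0, 0.5, 0.5, 1.0, 1.5,
            2.1, 2.1, 2.1, 2.2, 2.2, 2.6, 2.6, 2.6, 2.8, 2.8,
            3.2, 3.2, 3.2, 3.5, 3.5, 4.0, 4.5, 4.5]

def populations(energies, factor_per_kcal=5.0):
    """Weight each structure by factor**(-dG) and normalize over the list."""
    weights = [factor_per_kcal ** (-g) for g in energies]
    total = sum(weights)
    return [w / total for w in weights]

probs = populations(ENERGIES)
# Keep only structures expected above 0.1% of the time.
common = [(g, round(p, 4)) for g, p in zip(ENERGIES, probs) if p > 0.001]
print(len(common))
print(common[:2])
```

Passing the exact 37 °C factor (about 5.07) instead of 5.0 would shift these fractions only slightly.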

Summing the probabilities, we end up with about a 76% bonding rate for the outer GC pairs and an 84% bonding rate for the inner GC pairs. If the SHAPE data showed something significantly different, we would know that something was off with the energy model.

Is that sequence one that could be synthesizable and subject to accurate SHAPE testing?

You probably know this, but a free energy difference of 1.0 kcal/mol corresponds to a population ratio of (with k_B taken per mole, i.e. the gas constant R):

exp( deltaE / (k_B * T) )
= exp( 1.0 kcal/mol / (0.00198 kcal/mol/K * (37 + 273.15) K) )
≈ 5.1

at 37 °C, which is where most online programs set their default temperature (human body temperature).

It's slightly different (about 5.5x for the same 1.0 kcal/mol) at 24 °C, where we do most of our experiments.
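A quick way to check these factors at other temperatures is sketched below. It assumes a fixed free energy gap of 1.0 kcal/mol; in a real energy model dG itself also shifts with temperature, which this sketch ignores. ratio_per_kcal is a hypothetical helper; 0.00198 is the constant used above and 0.0019872 kcal/(mol·K) is the more precise gas constant.

```python
import math

def ratio_per_kcal(temp_c, r_kcal=0.00198, delta_g=1.0):
    """Boltzmann population ratio exp(dG / (R*T)) for a gap of delta_g kcal/mol.

    r_kcal is the per-mole Boltzmann constant (gas constant R) in kcal/(mol*K).
    """
    return math.exp(delta_g / (r_kcal * (temp_c + 273.15)))

for t in (37.0, 24.0):
    print(t, round(ratio_per_kcal(t), 2), round(ratio_per_kcal(t, r_kcal=0.0019872), 2))
```

With R = 0.00198 this gives about 5.10 at 37 °C and about 5.47 at 24 °C; the more precise constant gives about 5.07 at 37 °C, matching the factor backed out of the RNAshapes probabilities earlier in the thread.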

I like the idea of giving players the option of seeing suboptimal structures. We currently do have the dot plot to show some relevant information, but it's not broken down in the way you like. We have discussed putting a suboptimal viewer in EteRNA in the past, and it's in the queue, but it's sufficiently difficult that it is not at the top of the queue.

Now, I *really* like the idea of designing and testing sequences that are really undecided as to their shape to test energy models. Good news: you’ll be able to test this and other sequences explicitly in about 2-3 months. ‘Player projects’ are advancing quickly!

I didn’t know that. Thanks. That’s the factor I was looking for.

Glad to hear that Player Projects are advancing too!

I don’t think there’s a bug in RNAshapes. The software simply treats things a bit differently. See http://bibiserv.techfak.uni-bielefeld.de/rnashapes/manual.html#shapetype

In your example, notice the structure abstractions at the end of the lines:

-2.30 .((…((…))…)). 0.5711718 []
-2.40 .((…))…((…)). 0.4202706 [][]

@nando: You are right. The probability is cumulative over all structures sharing that structure abstraction (shape class), whereas the structure and free energy shown are just one representative of that class. Here is how Stefan Janssen explained it:

Hi Jeff,

thank you for your “bug report”, but here it is really “the” feature of RNAshapes in probability mode that confuses you.
You are right that an individual structure's probability directly corresponds to its free energy value, because the energy is just transformed into “exp(-energy/(R*T)) / (sum of exp(-energy/(R*T)) over all structures)”. But with shapes, we group different structures together into shape classes. Thus, we have to add their probabilities up.
If we look at each possible RNA structure separately, we get:

AGCAAAAGCAAAAGCAAAAGCA
energy structure probability shape (level 5)
-1.20 .((…))… 0.0600 []
-2.40 .((…))…((…)). 0.4203 [][]
-0.30 .((…))… 0.0139 []
0.21 .((…)). 0.0061 []
-2.30 .((…((…))…)). 0.3573 []
0.00 … 0.0086 _
-1.20 …((…)). 0.0600 []
-1.20 …((…))… 0.0600 []
-0.30 …((…)). 0.0139 []

The MFE structure also has the highest individual probability, but you can clearly see that its shape [][] has only one member, while the [] shape is populated by 7 structures. Adding their probabilities up gives a higher probability than that of the MFE shape.

These overtaking effects can be observed relatively often and are the main reason for computing the computationally costly shape probabilities.

Best regards,
Stefan
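Stefan's point can be checked directly from his table. A minimal sketch, assuming the rounded probabilities shown above (so shape totals match the RNAshapes output only to about four decimals); shape_probabilities is a hypothetical helper, not part of RNAshapes:

```python
from collections import defaultdict

# (energy in kcal/mol, probability, level-5 shape) from Stefan's table;
# the dot-bracket strings are not needed for the sum and are omitted.
STRUCTURES = [
    (-1.20, 0.0600, "[]"),
    (-2.40, 0.4203, "[][]"),
    (-0.30, 0.0139, "[]"),
    (0.21, 0.0061, "[]"),
    (-2.30, 0.3573, "[]"),
    (0.00, 0.0086, "_"),
    (-1.20, 0.0600, "[]"),
    (-1.20, 0.0600, "[]"),
    (-0.30, 0.0139, "[]"),
]

def shape_probabilities(structures):
    """Sum individual structure probabilities within each shape class."""
    totals = defaultdict(float)
    for _energy, prob, shape in structures:
        totals[shape] += prob
    return dict(totals)

mfe = min(STRUCTURES, key=lambda s: s[0])  # lowest free energy structure
shapes = shape_probabilities(STRUCTURES)
print("MFE structure:", mfe)
print({s: round(p, 4) for s, p in shapes.items()})
```

The single lowest-energy structure belongs to [][], yet the seven [] structures together add up to about 0.571, which is why the [] shape outranks [][] in the shape output.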