EteRNA Tetraloop Reference Table

Hi All,

Recently, I had a conversation with Sneh in Chat that left me with a much greater awareness of the importance of loops in the game in general, and in our Lab Designs in particular. This got me thinking that I needed to learn more about loops.

So, I set off to do a bit of research, and here is what I have found so far. Source:

http://www.bioinfo.rpi.edu/zukerm/cgi…

This is a list of the most abundant, most frequently occurring Tetraloop configurations, including the stack-end pair the loop is attached to (the two bases on the ends). Asking around a bit, I found out that these sequences are in 5’ --> 3’ order, and that this list of 30 Tetraloops accounts for almost all of the Tetraloop configurations that are found in nature.

Before finding this table, I had no idea that there were such a small finite number (30) of valid Tetraloops, and no idea either that they were all so …identified and defined and recorded. (having just been creating them mostly by beginner’s guesswork thus far in the game).

This table also contained a few other surprises for me; among them, the solid realization that there ARE places where U’s and even C’s can and should be used in a loop, but usually only one at a time (although there is one configuration that does contain 2 consecutive U’s).

Tetra-loops
Seq - Energy Seq - Energy Seq - Energy


GGGGAC -3.00 CUACGG -2.50 GGGAAC -1.50
GGUGAC -3.00 GGCAAC -2.50 UGAAAA -1.50
CGAAAG -3.00 CGCGAG -2.50 AGCAAU -1.50
GGAGAC -3.00 UGAGAG -2.50 AGUAAU -1.50
CGCAAG -3.00 CGAGAG -2.00 CGGGAG -1.50
GGAAAC -3.00 AGAAAU -2.00 AGUGAU -1.50
CGGAAG -3.00 CGUAAG -2.00 GGCGAC -1.50
CUUCGG -3.00 CUAACG -2.00 GGGAGC -1.50
CGUGAG -3.00 UGAAAG -2.00 GUGAAC -1.50
CGAAGG -2.50 GGAAGC -1.50 UGGAAA -1.50

But before I started this endeavor, I found an even better and more detailed version of this table in this article that I was directed to by Player alan.robot (Thank You Alan) - (This version included Frequency-of-Occurrence Data for each Tetraloop):

http://bio.gnu.ac.kr/research/miRNA/R…

(see Table 8.)

Table 8. Tetraloop hairpin bonuses
Sequence - Occurrence - bonus (kcal/mol)

GGGGAC 87 -3.0   CUACGG 17 -2.5   GGGAAC 9 -1.5
GGUGAC 76 -3.0   GGCAAC 17 -2.5   UGAAAA 9 -1.5
CGAAAG 56 -3.0   CGCGAG 16 -2.5   AGCAAU 8 -1.5
GGAGAC 47 -3.0   UGAGAG 16 -2.5   AGUAAU 8 -1.5
CGCAAG 40 -3.0   CGAGAG 14 -2.0   CGGGAG 8 -1.5
GGAAAC 36 -3.0   AGAAAU 13 -2.0   AGUGAU 7 -1.5
CGGAAG 35 -3.0   CGUAAG 11 -2.0   GGCGAC 6 -1.5
CUUCGG 28 -3.0   CUAACG 11 -2.0   GGGAGC 6 -1.5
CGUGAG 23 -3.0   UGAAAG 11 -2.0   GUGAAC 6 -1.5
CGAAGG 18 -2.5   GGAAGC 9 -1.5    UGGAAA 6 -1.5

…Very interesting, but a bit cryptic, and not very accessible for many people, I thought, so I decided it might be a helpful contribution to the EteRNA community if I were to put it into a format that is hopefully a bit more usable and visually appealing.

After reading this article and finding this version of the table, I decided to create the following Excel Table with a color-coded and otherwise enhanced version of this information on valid Tetraloop configurations which Players can then use in their Lab Designs with confidence that these configurations ARE valid; that they ARE found in nature, and are also published in many scientific articles, and used in RNA folding software packages (such as Vienna RNA and likely EteRNA as well).

It has been my perception that thus far in the game, comparatively speaking, much less progress has been made by most players in improving their knowledge and skill regarding the construction of properly designed loops - than has been made in advancing skills at Stack Design (I know that is true for me, at least) - so it is conceivable this information could be a factor in changing that for the better.

I made the table in two accompanying sorts. The one on the left is sorted as I found it, in order of Frequency-of-Occurrence, or Abundance, in the Tested Sample of 914 Tetraloops. These most abundant, most frequently occurring Tetraloops were also assigned the lowest (most negative) energy values.

_The table on the right, I re-sorted for use by EteRNA players. It changes the sort to separate the Tetraloops by the bases of the attached stack-end, so it is easier to see what Tetraloops one can use with a particular stack-end base configuration when composing a Lab Design._

In both tables, I also included each sequence in both 5’ --> 3’ order and 3’ --> 5’ order, to help with visualization where the actual game layout differs from the given 5’ --> 3’ order (and to save players from having to mentally transpose the sequences). A Tetraloop that works on one pair will not always work on the flipped pair; they are mostly not interchangeable, so some care must be taken to select the proper orientation.

I also inserted a slight separation between the two end-bases (the stack-end pair) and the four center bases of the Tetraloop itself, just in an attempt to further increase clarity and readability.
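For anyone who would rather rebuild the two layouts than read them off the image, here is a minimal Python sketch of the steps described above: splitting each six-base entry into its stack-end pair and four loop bases, printing it in either direction, and grouping the entries by stack-end pair. The function names are my own, chosen for illustration, and the sketch assumes every entry is written 5’ --> 3’ as in the list at the top of the thread.

```python
from collections import defaultdict

# The 30 entries from the table above, written 5'->3':
# first base = 5' stack-end base, middle four = loop, last base = 3' stack-end base.
TETRALOOPS = [
    "GGGGAC", "GGUGAC", "CGAAAG", "GGAGAC", "CGCAAG",
    "GGAAAC", "CGGAAG", "CUUCGG", "CGUGAG", "CGAAGG",
    "CUACGG", "GGCAAC", "CGCGAG", "UGAGAG", "CGAGAG",
    "AGAAAU", "CGUAAG", "CUAACG", "UGAAAG", "GGAAGC",
    "GGGAAC", "UGAAAA", "AGCAAU", "AGUAAU", "CGGGAG",
    "AGUGAU", "GGCGAC", "GGGAGC", "GUGAAC", "UGGAAA",
]


def split_hexamer(hexamer):
    """Return (5' stack-end base, four loop bases, 3' stack-end base)."""
    if len(hexamer) != 6:
        raise ValueError("expected a six-base entry, got %r" % hexamer)
    return hexamer[0], hexamer[1:5], hexamer[5]


def spaced(hexamer, reverse=False):
    """Format an entry with a gap between stack-end bases and loop.

    With reverse=True the whole string is mirrored, giving the 3'->5' reading.
    """
    five, loop, three = split_hexamer(hexamer)
    text = "%s %s %s" % (five, loop, three)
    return text[::-1] if reverse else text


def group_by_stack_end(hexamers):
    """Group entries by their stack-end pair, keyed as "5'base-3'base"."""
    groups = defaultdict(list)
    for hexamer in hexamers:
        five, _loop, three = split_hexamer(hexamer)
        groups["%s-%s" % (five, three)].append(hexamer)
    return dict(groups)


if __name__ == "__main__":
    for pair, members in sorted(group_by_stack_end(TETRALOOPS).items()):
        print(pair, [spaced(m) for m in members])
```

Calling `spaced("CUUCGG", reverse=True)` gives the mirrored 3’ --> 5’ reading of the same entry; it does not produce a different Tetraloop, which is the orientation caveat mentioned above.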

(Please click on the table for a larger, clearer version)

I hope this table might help make this information a bit more accessible to some of us; that it might help us all to learn these valid Tetraloop configurations, and hopefully that it might also therefore enhance all of our chances to excel and succeed in our future Lab Designs.

Thanks, and Best Regards,

-d9

Oh wow, excellent work and research d9! :slight_smile:

Bravo!!! that table is very pretty and so much easier to interpret than those darned ascii files. . .

Unfortunately, the energy numbers don’t appear to line up with our model. The energy EteRNA shows for the first tetraloop [GGGGAC] is indeed awesome (only 0.2 kcal), but the “energy” number in the table shows 3.0.

Maybe instead of pointing this out I should have just posted the numbers. :stuck_out_tongue:

Edit: Also, this is an awesome thing to post.

Chris, Thanks for pointing that out! These are the published numbers, but EteRNA’s implementation may indeed calculate somewhat differently. What I will do as soon as time permits is to actually construct each of these tetraloops in EteRNA itself, and record the readings that EteRNA generates for each, then publish a correction table tailored to EteRNA’s actual readings. Much appreciation for your “eagle-eye,” and always helpful comments! - d9 :slight_smile:

PS- Chris; Here is the methodology used in the quoted study to assign the “bonus” energies. It may help explain the differences between their values and EteRNA’s:

“The magnitude of the bonus for each loop (Table 8) is based on its abundance in the database of structures assembled to test the algorithm. For this database, structures for each type of RNA were chosen from all available branches of phylogeny. Loops that occur more than 22 times in the structure database receive a bonus of -3.0 kcal/mol. Loops that occur between 16 and 18 times have a -2.5 kcal/mol bonus, -2.0 kcal/mol is assigned to loops that occur between 11 and 14 times, and a -1.5 kcal/mol bonus is assigned to loops that have between six and nine occurrences, inclusively.”
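The binning rule in that quote is simple enough to write down as a few lines of Python. This is only a sketch of the rule as quoted; the function name is made up for illustration, and counts the quote does not cover (for example 19 to 22, or fewer than 6) return `None` rather than a guessed value.

```python
from typing import Optional


def tetraloop_bonus(occurrences: int) -> Optional[float]:
    """Map an occurrence count to the quoted bonus in kcal/mol.

    Returns None for counts that fall outside the ranges given in the quote.
    """
    if occurrences > 22:
        return -3.0
    if 16 <= occurrences <= 18:
        return -2.5
    if 11 <= occurrences <= 14:
        return -2.0
    if 6 <= occurrences <= 9:
        return -1.5
    return None


if __name__ == "__main__":
    # A few occurrence counts taken from Table 8 above.
    for sequence, count in [("GGGGAC", 87), ("CGAAGG", 18), ("CGAGAG", 14), ("UGGAAA", 6)]:
        print(sequence, count, tetraloop_bonus(count))
```

Note that the loop seen 87 times and the loop seen 23 times land in the same bin, so the bonus is a coarse step function of abundance rather than a smooth fit.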

Again, thanks Chris! :slight_smile:

-d9

EteRNA’s loop energies for the left-hand table, numbered 1 to 30:

01=0.2 - 11=1.3 - 21=1.7
02=0.2 - 12=0.7 - 22=2.7
03=0.4 - 13=0.9 - 23=3.0
04=0.2 - 14=2.3 - 24=3.0
05=0.4 - 15=1.4 - 25=1.9
06=0.2 - 16=2.5 - 26=3.0
07=0.4 - 17=1.4 - 27=1.7
08=0.8 - 18=2.2 - 28=2.7
09=0.4 - 19=2.8 - 29=2.2
10=1.5 - 20=2.7 - 30=2.7

Ach!.. also, I just realized that all those energy figures are supposed to be negative numbers - and I neglected to put in the unary minus! Oh, well, I guess that is just what happens when you rush a project, have no proof-reader, and complete it late at night when sleepy. Next iteration, I will fix that oversight as well.

Awesome work! I also find it hilarious that they arbitrarily decided energy bonuses. Thanks madde for the extra work. :slight_smile:

EteRNA Tetraloop Reference Table (Revision 1) (which fixes the above issues) has now been published in a separate thread:

http://getsatisfaction.com/eternagame…

Special thanks to ccccc & madde for discovering the issue, and contributing to the solution. :slight_smile:

Best Regards,

-d9

… afterwards, it occurred to me I should also include that revised table right here - so here it is:

(Please click on table for a larger, clearer image)

Thanks,

-d9

These “bonus” energies get applied ON TOP of the normal penalty for creating a hairpin loop of size 4 plus the normal reward for formation of the closing base pair (plus any stacks with the next base pair). This is where the difference between what EteRNA shows and the original table values above comes from.
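As a rough worked example of that bookkeeping, take the one tetraloop for which both numbers are quoted in this thread: GGGGAC, which EteRNA shows at 0.2 and the published table lists with a -3.0 bonus. The sketch below assumes, and this is only an assumption, that the number EteRNA displays is simply the bonus plus everything else (loop penalty and closing-pair terms) lumped together. The function name is hypothetical.

```python
def implied_non_bonus_energy(displayed_kcal, bonus_kcal):
    """Energy left over once the tetraloop bonus is subtracted (kcal/mol).

    Assumes displayed = bonus + (loop penalty and closing-pair terms),
    which is an illustrative simplification, not a statement about
    how EteRNA actually composes the number it shows.
    """
    return round(displayed_kcal - bonus_kcal, 2)


if __name__ == "__main__":
    # GGGGAC: 0.2 shown in EteRNA, -3.0 bonus in the published table.
    print(implied_non_bonus_energy(0.2, -3.0))
```

Under that assumption, about 3.2 kcal/mol of unfavorable terms are almost exactly cancelled by the bonus, which is why the two tables can both be "right" while showing very different numbers.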

So I think both versions of the table are useful, because it’s worth thinking about loop penalties, stacking, AND tetraloop bonuses as separate, often competing terms - especially when you can’t get the native fold to match a target fold because loops are forming in the wrong place, especially an unintended tetraloop!

@ Chris, the energy bonus system is not as arbitrary as you might think. A free energy is nothing but a logarithm of probabilities; so if all tetraloops occurred with equal likelihood then the appropriate free energy bonus would be ~log(1/1) = 0. Each -0.5 kcal/mol is equivalent to saying it appears about 2.2x more frequently than you would expect from random chance alone (at 37 degrees), so estimating this from natural abundance is a good way to do this.
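To make the log relationship concrete, here is a small Python sketch of the conversion between a free-energy bonus and a relative frequency, using the standard Boltzmann relation ratio = exp(-ΔG / RT). The constants (R in kcal/(mol·K), 37 °C expressed as 310.15 K) and the function names are supplied for illustration; they do not come from the articles linked in this thread.

```python
import math

GAS_CONSTANT = 0.0019872  # kcal / (mol * K)
TEMPERATURE = 310.15      # 37 degrees C in kelvin


def enrichment_from_bonus(bonus_kcal):
    """How many times more frequent a loop is than chance, for a given bonus."""
    return math.exp(-bonus_kcal / (GAS_CONSTANT * TEMPERATURE))


def bonus_from_enrichment(ratio):
    """Inverse relation: the free-energy bonus implied by an enrichment ratio."""
    return -GAS_CONSTANT * TEMPERATURE * math.log(ratio)


if __name__ == "__main__":
    for bonus in (0.0, -0.5, -1.5, -3.0):
        print("%5.1f kcal/mol -> %.1fx" % (bonus, enrichment_from_bonus(bonus)))
```

A bonus of 0 gives a ratio of exactly 1 (no enrichment), and each additional -0.5 kcal/mol multiplies the ratio by a little over 2.2, in line with the figure quoted above.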

Finally, I’ll note that if you ACTUALLY measure the free energy of formation of these tetraloops, they are reasonably close to the model.

http://www.ncbi.nlm.nih.gov/pmc/artic…
http://www.ncbi.nlm.nih.gov/pmc/artic…

It’s worth noting for all the lab designers out there that the references above indicate that, in real life, the UUCG tetraloop is in fact EVEN MORE STABLE than the algorithm predicts, that C(UUUU)G is a mediocre tetraloop, and that G(UUUU)C is probably EVEN WORSE than the algorithm predicts.

Good point with the logs and such, but even in that case it’s just lazy arithmetic to apply the same bonus to the one that appears 23 times and the one that appears 87 times. I stand by my criticism that their numbers are needlessly arbitrary.

It’s true the abundance data doesn’t have good enough statistics for a direct fit of an energy function; it’s more of a supporting argument for needing an extra energy term.

I stand by my implication that these are fudge factors to match the experimental data :slight_smile:

But such kludges are inevitable in a base-pairing and loop-only model of RNA. There’s no mechanism to include other very important phenomena, such as steric exclusion or salt dependence, without going to a much less computationally tractable model.

:smiley:

Works for me.

Hi All,

After actually using this table to select alternate tetraloop configurations for my Round 5 Submissions, I decided I needed to make another revision for the following reasons:

  1. I missed having the Vienna Occurrences field, which I had removed from the first revision to make it “purely” EteRNA-based information. I found that having some idea of which tetraloops were more prevalent in the Vienna Sample was reassuring when trying out a new configuration, so I put it back in.

  2. Having the two different sorts both numbered from 1 to 30 was confusing. I decided each tetraloop configuration should have only one associated number in the table. Also, I found that in actual usage, I spent almost all my time in the far right-hand set of columns (the 3’ --> 5’ mirror of the Stack-End / Energy Sort), so I used the numbers from the right-hand table (the “Stack-End / EteRNA Energy Sort”) to assign the numbers. In the other table, sorted purely by Energy, these numbers are retained.

  3. After reading this comment from Ding in the 1st Revision thread:

“I don’t think the old table should be completely superseded though - it’ll be interesting in the lab to compare the two sets of data. After all, the big question there seems to be when and how the EteRNA energy model fails :)”

…and this comment from alan.robot (excerpted from above):

“So I think both versions of the table are useful, because it’s worth thinking about loop penalties, stacking, AND tetraloop bonuses as separate, often competing terms - especially when you can’t get the native fold to match a target fold because loops are forming in the wrong place, especially an unintended tetraloop!”

And because in actual usage, I found myself flipping back and forth between the two (difficult since the numberings didn’t match 100%)…

…I decided that I just had to re-integrate the first two versions to include ALL the data, so it was all available side-by-side for easy reference.

So here is Revision 2… (the original and Revision 1 combined and integrated …hopefully the final revision - I never intended to spend quite this much time on this project) :slight_smile:

(Please click on the table for a larger clearer image)

Thanks & Best Regards,

-d9

pow! excellent work.