Reverse Engineering EteRNA's Energy Model

A few of us have been trying to understand the EteRNA Energy model so as to write better Puzzle Solver scripts and also learn how to design potentially stronger lab candidates. We’ve written up a document on a system for estimating multi-loop energies that was reverse engineered from multi-loop energy values shown by the Puzzle Maker.

The 25-cent version is that multi-loop energies (for unboosted multi-loops) seem to depend on stem adjacency and stem closure. We collected a large sample of multi-loops with CG and GC stem closures in the Puzzle Maker and derived from them a set of linear equations in 19 unknowns: 9 cases that depend on the number of unbonded adenines adjacent to each stem closure, times two for CG versus GC orientation, plus a base multi-loop energy. Solving those equations shows how much each stem contributes to the multi-loop energy. The factors for non-GC and non-CG stems were determined by noting the energy difference between the CG or GC closures and other closing pairs. This process produced a table that seems to be predictive for unboosted multi-loop closure energies. (No counter-examples have been found.)
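For anyone who wants to see the "set of linear equations" idea in miniature, here is a rough sketch. It is not the procedure from the Google Doc: it assumes a toy additive model with only 3 unknowns (a base energy plus one term per CG closure and one per GC closure) instead of 19, and every energy value in it is made up for illustration. The function name `solve_linear` is likewise just a placeholder.

```python
from fractions import Fraction

def solve_linear(rows, rhs):
    """Solve a square linear system by Gauss-Jordan elimination.

    Exact fractions are used so that 0.1-step energies do not pick up
    floating-point noise during elimination.
    """
    n = len(rows)
    aug = [[Fraction(str(v)) for v in row] + [Fraction(str(b))]
           for row, b in zip(rows, rhs)]
    for col in range(n):
        # Find a row at or below the diagonal with a non-zero entry in this column.
        pivot = next((r for r in range(col, n) if aug[r][col] != 0), None)
        if pivot is None:
            raise ValueError("equations are not independent; sample more loops")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        pv = aug[col][col]
        aug[col] = [v / pv for v in aug[col]]
        # Clear this column from every other row.
        for r in range(n):
            if r != col and aug[r][col] != 0:
                f = aug[r][col]
                aug[r] = [a - f * b for a, b in zip(aug[r], aug[col])]
    return [float(aug[i][n]) for i in range(n)]

# Each row is one observed multi-loop: [1 (base term), number of CG closures,
# number of GC closures]. The energies are MADE-UP numbers, not EteRNA values.
samples = [
    [1, 3, 0],  # a 3-junction with three CG closures
    [1, 1, 2],  # a 3-junction with one CG and two GC closures
    [1, 2, 2],  # a 4-junction with two CG and two GC closures
]
energies = [4.5, 3.0, 3.5]

base, per_cg, per_gc = solve_linear(samples, energies)
print([base, per_cg, per_gc])  # -> [3.0, 0.5, -0.25]
```

One thing the toy model makes visible: if every sample is a 3-junction, the CG and GC counts always add up to 3, so the base term cannot be separated from the per-stem terms and the solver raises `ValueError`. Mixing in a larger junction makes the equations independent.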

A Google Doc (with lots of pretty pictures and tables) has the details. Comments and criticisms are invited.

I very much appreciate this work and your linked document. Thanks!

This seems to be based on 3-junctions; have you looked at 4-junctions and higher?

With this and the quad tables, someone could perhaps begin to develop a JavaScript-based energy model for the game.

It works with 3-junctions, 4-junctions, and 5-junctions, on all the cases we’ve tried so far. Some of the test cases used to compute the coefficients had to be four- and five-junctions in order to have enough unique linear equations to fully define all of the unknowns. So, yes, we have looked at 4 and higher.

As for a JavaScript-based energy model for the game, I think internal loops and hairpins have their own peculiarities: this doesn’t cover them. So no 1-loops or 2-loops as yet.

JL - thanks for this. I may not fully understand what you are doing here, but it occurred to me that the “ViennaRNA 1.85 vs 2.11” stuff that Jee put on Dev scripts might be used to programmatically generate some of the table data you are creating.

Idea would be to take a known E and structure/seq, loop thru mutations only in NT’s in the structure of interest and use the changing total E to back-calculate what the new Energy was for the structure of interest.

Simple example might be going in w/ ((((…)))) w/ CCCCAAAAGGGG, knowing total E = -5.8 and hairpin (area of interest) is +4.1.
Run Jee’s code on “G” boost (CCCCGAAAGGGG), returns total E = -9.5, so can back calc that a GAAA loop (w/ CG close, of course) is +0.4, i.e. +4.1 - (9.5 - 5.8) = +0.4.
Could rapidly mutate thru all the hairpin NT’s to build the table.

You’d have to check that new seq was still original struct (not a mis-fold), but it may be a quick way to build table rapidly.
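A minimal sketch of that loop, in Python rather than as an EteRNA script. Everything here is hypothetical scaffolding: `fold` is a caller-supplied function that returns `(structure, total_energy)` for a sequence, and in the demo it is faked with a lookup that holds only the two totals quoted above. The dot-bracket string assumes a 4-nt loop, to match the 12-nt sequence. The sketch attributes the whole change in total energy to the loop, which only holds if nothing else in the structure changed.

```python
def back_calc_loop_energy(base_total, base_loop, mutant_total):
    """Assign the entire change in total energy to the loop of interest."""
    return base_loop + (mutant_total - base_total)

def single_mutants(seq, positions, alphabet="ACGU"):
    """Yield every single-nucleotide mutant of seq at the given 0-based positions."""
    for i in positions:
        for nt in alphabet:
            if nt != seq[i]:
                yield seq[:i] + nt + seq[i + 1:]

def build_table(seq, structure, loop_positions, base_loop, fold):
    """Map each mutant loop sequence to its back-calculated loop energy.

    Mutants that do not fold into the original structure are skipped.
    """
    _, base_total = fold(seq)
    table = {}
    for mutant in single_mutants(seq, loop_positions):
        mut_struct, mut_total = fold(mutant)
        if mut_struct != structure:
            continue  # mis-fold: the energy change is not attributable to the loop
        loop_seq = "".join(mutant[i] for i in loop_positions)
        energy = back_calc_loop_energy(base_total, base_loop, mut_total)
        table[loop_seq] = round(energy, 2)
    return table

# Stand-in for a real folding call. It knows only the two totals from the
# example above; every other sequence is reported as unfolded.
KNOWN = {
    "CCCCAAAAGGGG": ("((((....))))", -5.8),
    "CCCCGAAAGGGG": ("((((....))))", -9.5),
}

def fake_fold(seq):
    return KNOWN.get(seq, ("." * len(seq), 0.0))

table = build_table("CCCCAAAAGGGG", "((((....))))", [4, 5, 6, 7], 4.1, fake_fold)
print(table)  # -> {'GAAA': 0.4}
```

With a real folding function in place of `fake_fold`, the same loop would fill in one entry per mutant that keeps the target shape.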

I think a particularly strong “base” structure might allow a large # of permutations to be captured (i.e. not discarded b/c of mis-folds).

Note - I’m ignoring the 1.85 vs 2.11, but no reason couldn’t build both tables.

Yes. It’s a fine idea. One needs to compensate for the change in energy of the closing stack cell and as you say, make sure that the target shape still forms. It also can produce a lot of data, and reducing it down to the meaningful bits is still a task, but it can help with automating some of the data collection.

I’ve played with this approach for 2-2 loops, but haven’t had time to complete the project. I don’t have as much time for EteRNA and scripting these days.

EteRNA uses the Vienna 1.8.5 Energy Model. The source code for this model is available from the Vienna University website: “…”. I’ve been reverse engineering the C code to learn how EteRNA works. I’ve also found the Windows executables for the Vienna toolset (“…”) very helpful.

Nice post. It is probably a lot of work and way over my head but softcoding the values in the table would allow non-programmers to test different value ranges and match against lab results. Do you think that is possible or worthwhile? Not asking you to volunteer or anything.