Tell us about your EteRNA lab algorithms!

JeehyungLee · April 19, 2011, 3:57pm

We have a very exciting announcement:

EteRNA players are beating state-of-the-art computer algorithms designing real-world RNAs. Congratulations to our amazing players!

We would like to publish this exciting result so that scientists can learn about what you are doing.

However, we just need ONE VERY IMPORTANT thing before we can publish. We want to fully describe what you have learned and what strategies you use in the publication. What do you do to select good RNA designs in the lab? Do you have an algorithm, or do you have a set of things you look at?

We’re looking forward to delivering great thoughts from the EteRNA community to the rest of the world!

Colored points indicate EteRNA players and gray points indicate computer algorithms. EteRNA players came from behind and won!

Raw SHAPE data of synthesized designs

Eli_Fisker · April 19, 2011, 4:46pm

This is amazing news. You ask for our strategy on how to select good designs. Here is what I do.

I look for color patterns in designs that the synthesis results say are good, but I also learn from the unlucky designs (my own and others’).

When I check a design in RNAfold, I look at the positional entropy option. I compare the color pattern in the EteRNA design against the colored warning points of high entropy (bad if too high) in RNAfold.

By color patterns I mean:
For example, 2 AU base pairs next to each other, especially if turned the same way, might be a problem. How sensitive the structure is to this seems to change a bit from puzzle to puzzle; some of the puzzles seem to allow it. But with 3 or 4 AU pairs in a row you are usually in trouble, meaning the structure gets unstable there. You might get away with three if they are turned in different directions.

A GU pair, if not carefully placed, has a tendency to break up the area nearby. I haven’t solved this mystery yet.

There are more of those color patterns to be aware of, but these are the ones I remember. It’s not fail-safe, but it can help you avoid using a pattern in your design that is very likely to fail.

Eli_Fisker · April 19, 2011, 5:01pm

And a simple one - always place a GC pair to close the loops. That also applies to the internal loops.

There should be a GC pair at the bottom and the top of a hairpin/stem/helix (or whatever it is called). As far as I remember, I haven’t seen a successful design that didn’t follow that “rule”.

One to teach the robots

Ding · April 19, 2011, 7:08pm

I think this thread could be a great resource for players starting out in lab also. One of the most common questions I see about lab is “how am I supposed to know what to vote on?”

The first thing I look at is bond content. I pretty much only look at designs in which between 30% and 80% of the bonds are GC, and no more than 10% to 20% are GU. Those are rough guidelines – in a shape like the Cross or One Bulge Cross where most of the nucleotides are bonded (very little unbonded content) I want the GC content to be lower than in a shape like one of the Stars where there’s a lot of unbonded content and a more complex design.
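
As a rough illustration, the bond-content screen described above could be sketched in Python as follows. This is only a sketch under stated assumptions: the target shape is given in dot-bracket notation, the function names (`pair_table`, `pair_fractions`, `passes_bond_screen`) are hypothetical, and the default thresholds simply restate the 30% to 80% GC and at most 20% GU guideline, which the post itself calls rough and shape-dependent.

```python
def pair_table(structure):
    """Return {index: partner_index} for a dot-bracket structure string."""
    stack, partner = [], {}
    for i, ch in enumerate(structure):
        if ch == "(":
            stack.append(i)
        elif ch == ")":
            j = stack.pop()
            partner[i], partner[j] = j, i
    return partner


def pair_fractions(sequence, structure):
    """Fraction of base pairs that are GC, AU, GU, or anything else."""
    counts = {"GC": 0, "AU": 0, "GU": 0, "other": 0}
    for i, j in pair_table(structure).items():
        if i < j:  # count each pair once
            kind = "".join(sorted(sequence[i] + sequence[j]))
            kind = {"CG": "GC", "AU": "AU", "GU": "GU"}.get(kind, "other")
            counts[kind] += 1
    total = sum(counts.values())
    return {k: (v / total if total else 0.0) for k, v in counts.items()}


def passes_bond_screen(sequence, structure,
                       gc_min=0.30, gc_max=0.80, gu_max=0.20):
    """Apply the rough guideline: GC share within range, GU share capped.

    The default thresholds are assumptions taken from the guideline text.
    """
    frac = pair_fractions(sequence, structure)
    return gc_min <= frac["GC"] <= gc_max and frac["GU"] <= gu_max


if __name__ == "__main__":
    # A made-up 5-pair hairpin: 3 GC pairs, 1 AU pair, 1 GU pair.
    seq, shape = "GCAUCGAAAGGUGC", "(((((....)))))"
    print(pair_fractions(seq, shape))
    print(passes_bond_screen(seq, shape))
```

The thresholds are keyword arguments so they can be loosened or tightened per shape, for example a lower `gc_max` for heavily bonded shapes like the Cross.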

Next thing I look at is tetraloops. If the tetraloops aren’t either AAAA or one of the known patterns that gets an energetic bonus in EteRNA I check to see if there are mispairing possibilities with a complementary sequence nearby (if for instance the four nucleotides in the tetraloop are AGUA and part of a nearby stack is UAC).
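
The mispairing check in the AGUA / UAC example can be approximated by searching the rest of the sequence for the reverse complement of any 3-nucleotide piece of the tetraloop. The sketch below assumes Watson-Crick pairs only (GU wobble pairs are ignored), searches the whole sequence rather than only "nearby" stacks, and uses hypothetical function names; it is an illustration of the idea, not the procedure actually used.

```python
COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}


def reverse_complement(rna):
    """Reverse complement of an RNA string (Watson-Crick pairs only)."""
    return "".join(COMPLEMENT[base] for base in reversed(rna))


def loop_mispair_sites(sequence, loop_start, loop_len=4, k=3):
    """List (loop_kmer, complementary_site, position) hits outside the loop.

    A hit means a k-mer inside the loop could pair with a stretch
    elsewhere in the sequence, which is the situation to watch for.
    """
    loop = sequence[loop_start:loop_start + loop_len]
    loop_end = loop_start + loop_len
    hits = []
    for offset in range(loop_len - k + 1):
        kmer = loop[offset:offset + k]
        target = reverse_complement(kmer)
        pos = sequence.find(target)
        while pos != -1:
            # Ignore matches that overlap the loop itself.
            if pos + k <= loop_start or pos >= loop_end:
                hits.append((kmer, target, pos))
            pos = sequence.find(target, pos + 1)
    return hits


if __name__ == "__main__":
    # Made-up hairpin with an AGUA tetraloop at index 5 and UAC in both arms.
    print(loop_mispair_sites("GGUACAGUAGUACC", loop_start=5))
    # A GAAA loop in a sequence without U has nothing to mispair with.
    print(loop_mispair_sites("GGGCGAAAGCCC", loop_start=4))
```

An AAAA loop can only produce hits against UUU under these assumptions, which is one way to see why plain A loops offer few mispairing partners.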

Then comes any multiloop. In the case of One Bulge Cross (the first lab I was around for) this would mean the center intersection. Fairly early on in the rounds it became clear that there were two central patterns that seemed to hold up better than others, so I looked for those. In the case of the more recent labs, it has meant making sure that any non-Adenine nucleotides in the unbonded parts of the multiloop don’t look like they’re going to interfere with the formation of that loop.

Next is the dotplot. I usually look at this as it is, then make any tetraloops in the design AAAA and take another look, then make them all GAAA and take a third look. This is harder to set rules for. I’m not necessarily looking for an absolutely perfect dotplot, but some things I like to see are an absence of what I think of as “shadow lines” running parallel to the lines we want to see (which suggest that two sections of the sequence may mispair rather than just a single bond), and not too much variance between the dotplot for the design as is, the design with AAAA loops, and the design with GAAA loops.

If the designer has given RNAfold stats I’ll look at them, otherwise I rarely run sequences through RNAfold or other servers anymore. If I do, I’m looking for an MFE% of over 80% for designs with AAAA tetraloops or over 90% for designs with “stabilized” tetraloops, ensemble diversity under about 0.5, and entropy range under about 0.3.

Finally I look at color patterns as Eli Fisker has described, and quad energies. I don’t like to see any quad energies over -0.9 kcal (a UA UA or AU AU quad). If there are GU bonds used, I like to see them stabilized on one side with a GC in a configuration that gives -2.1 or -2.5 kcal. I also don’t like more than 3 AU pairs in a row, even if alternating orientation (though if we ever have another shape with significantly longer stems this may change). I also look for GCs at the beginning and end of all stems, though I won’t rule out a design that has one or two stems closed with AU, especially in early rounds when I think we’re still testing the tolerance of the shape.
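
Two of the stem rules above (GC pairs at both ends of every stem, and no more than 3 AU pairs in a row) lend themselves to a small script. The following is a sketch under assumptions: the shape is given in dot-bracket notation, a "stem" is taken to be a run of directly stacked pairs, and the names `find_stems` and `stem_report` are hypothetical. Quad energies are deliberately left out because they depend on the energy tables used by the game.

```python
def find_stems(structure):
    """Split a dot-bracket structure into stems (runs of stacked pairs)."""
    stack, pairs = [], []
    for i, ch in enumerate(structure):
        if ch == "(":
            stack.append(i)
        elif ch == ")":
            pairs.append((stack.pop(), i))
    pairs.sort()
    stems, current = [], []
    for i, j in pairs:
        # A pair extends the current stem only if it stacks directly on it.
        if current and (i, j) == (current[-1][0] + 1, current[-1][1] - 1):
            current.append((i, j))
        else:
            if current:
                stems.append(current)
            current = [(i, j)]
    if current:
        stems.append(current)
    return stems


def pair_kind(a, b):
    """Classify a base pair as GC, AU, GU, or other (orientation ignored)."""
    key = "".join(sorted(a + b))
    return {"CG": "GC", "AU": "AU", "GU": "GU"}.get(key, "other")


def stem_report(sequence, structure, max_au_run=3):
    """Per-stem summary: GC closure at both ends and longest AU run."""
    reports = []
    for stem in find_stems(structure):
        kinds = [pair_kind(sequence[i], sequence[j]) for i, j in stem]
        run = longest = 0
        for kind in kinds:
            run = run + 1 if kind == "AU" else 0
            longest = max(longest, run)
        reports.append({
            "first_pair": stem[0],
            "length": len(stem),
            "gc_closed": kinds[0] == "GC" and kinds[-1] == "GC",
            "longest_au_run": longest,
            "too_many_au": longest > max_au_run,
        })
    return reports


if __name__ == "__main__":
    # Made-up hairpin with four AU pairs in a row between two GC pairs.
    for row in stem_report("GAUAUCGAAAGAUAUC", "((((((....))))))"):
        print(row)
```

Because `max_au_run` is a parameter, the limit can be raised if a future shape has significantly longer stems.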

In deciding whether to vote for a design or not, I base the decision on both the above considerations and what I know about other designs that have received votes. If for instance most of the designs being voted up are at the high end of what I consider “desirable” GC content, I’m more likely to vote for a lower-GC design (again, especially in rounds one and two of a new shape). In later rounds, if we already have four or five modifications of a successful shape voted up to the top, I’m more likely to vote for a new modification of a different design, or an entirely new design.

edit to add: in all the above I left out two important things that are harder to quantify. First: the “neck” of the design (the stack nearest the open loop). We’re still having a lot of trouble forming this in all shapes, so I go by my most recent gut feeling about it. Second, the comment section. Especially for designs that are modifications of previous designs this is important – I like to see what the designer is trying to correct and how.

Joshua_Weitzman · April 19, 2011, 7:33pm

I saw a name that had won multiple times, which everyone else had chosen, and picked that design so I could get the points. Then I would design something similar to other designs so I could get more points. You have only had five cycles of the game and you’re going to publish; why not get more data? You don’t even have asymmetrical patterns. The lab is flawed: there are only 50 people playing, and the same people win regardless of the merits of the design. 100,000 is a joke; you are not going to get 33,334 people designing RNA if they see the same people winning. As I have said before, the lab is a popularity contest.

JeehyungLee · April 19, 2011, 8:17pm

Hi Joshua - thanks for the comment!

I want to point out that the 5 labs and the 2 more labs including the current and the next lab we plan to do (before publishing the data) are about 30 rounds of data, each with 8 synthesized RNAs. That’s actually 240 synthesized sequences and we are also synthesizing designs from various computer algorithms as well, so we’ll have about 300 synthesized sequences. A set of 300 RNA sequences with full SHAPE data is actually quite a huge dataset.

It is true that there are people who get picked consistently. However, I want to point out that they gained their popularity through their past lab performances or their ability to explain why their designs are good. Also, new players do get their designs synthesized and succeed. If you look at the “Past RNA Lab” record, you’ll notice that there are always new players who win the lab.

JeehyungLee · April 19, 2011, 8:20pm

This is really interesting Eli! Just curious - do you have a way to use stack/loop color patterns from past RNA lab data as well?

JeehyungLee · April 19, 2011, 8:24pm

Ding, this is super interesting! Can I ask how you decide which tetraloops are good to use?

chaendryn · April 19, 2011, 8:32pm

Wow, Ding … awesome post

I follow a similar strategy to Ding. An additional step I use is to run some sequences through the Barrier server to see whether there are any obvious kinetic traps that would cause a suboptimal shape to dominate - http://rna.tbi.univie.ac.at/cgi-bin/b…

On designing, if the Barrier results on a design that synthesized above 80 in early rounds and 90 in later rounds show a problem area, I’d modify it and run it through again to see whether that potentially clears the problem area.

Something else that tends to influence my decisions on voting for a design is whether the person has bothered to explain their thinking. Perhaps they’re testing a theory or have a reason for modifying an already synthesized design. I’d be more inclined to vote for one of those designs above a ‘no comment’ design or one that doesn’t give any insight into what the person is trying to achieve by submitting their design.

Will spend a bit more time thinking about it and see whether there’s anything else I can add

alan.robot · April 19, 2011, 8:42pm

There have been voluminous discussions among all the top players and the developers on how the voting system can be improved - if you have any suggestions, I’m sure they would be welcome, as this is a difficult problem to solve. There are almost nightly discussions about this in the chat, if you care to join in. Very few of the top players even care about their rankings or points at this juncture, and they spend an inordinate amount of time trying to mentor new players into the synthesis realm.

Secondly, if you look at DNA synthesis and sequencing costs, they are falling at a Moore’s law rate. So it’s entirely possible more than 8 people can get synthesized at a time in the near future, and the WHOLE POINT of doing that is to avoid needing any sort of “voting system” at all. Even if it was just 50 a week, that’s enough for all 50 of the active players at this point, so I don’t see any problem with this.

Lastly, I should also point out that the costs of this project are certainly not being covered by the small, exploratory grant the ETERNA team currently has - which barely pays for the materials costs, let alone technician/developer time and two labs of graduate students. This means that most of the devs are doing this on their own time as an unfunded investment in the future - so publishing any preliminary results is really the best way to ensure that there IS a game in the future.

chaendryn · April 19, 2011, 8:59pm

I tend to prefer a GAAA tetraloop as it appears from past lab results to be less likely to cause interference. Though Mat747’s success with AAAA loops still confounds me

Eli_Fisker · April 19, 2011, 9:25pm

No, I haven’t looked much into stacks. The same goes for loop color patterns. Just checking now: there were a few loop color patterns that seemed to be successful, namely the classic GAAA, some with two A’s and two G’s that were working (GAGA), and of course Mat’s all four A’s. And I know there are more in a list of typical loop patterns that work in nature. (For those who haven’t seen it: http://getsatisfaction.com/eternagame…)

There were also a few very colorful variations that seem to work. Here are the loop patterns among the winners: Sneh had success with AUAG in the finger, and Dimension9 introduced AAGA in the One Bulge Cross, where Berex used UGAA.

In the star, Donald got away with even 4 different and mostly new loops in the same design (GGAA, GAGA, GCAA, GUGA), and Deep Thought used two loops of UUCG.

The pattern I see in the loop colors of the winners is that they tend to be “overweight” in yellow especially, followed by red (this might come down to player preference): four A’s, three A’s and one G, and also different combinations of two A’s and two G’s.

I think people tend to get their own favorite. Also, it is easier to use and vote on what you know is safe: GAAA.

I think moving successful loop patterns from old designs to new ones could work very well. Quite a few of the working loop color patterns returned in some of the later winning designs.

Ding · April 19, 2011, 10:21pm

For my own designs, I’m using GAAA tetraloops at this point, for the same reason chaendryn mentions. It’s just easier to minimize the chances of mispairing than something like a GUGA loop, where I’d want to avoid using any UCA or CAC sequences for example.

For a while I was using GGAGAC and CGGAAG loops because they seemed a little less likely to be penalized points-wise. But then I did a comparison between identical arms with those and with GAAA loops and it looked like the CGGAAG and GGAGAC loops were losing more bonds further down the arm, so I switched back.

Something else I look for in the loops is the risk of a GGAAAC loop slipping (becoming a triloop).

AAAA loops are tricky. If they work (if they form) they’re less likely to lose EteRNA points. But it seems like you have to be really sure that there are no other places that the stacks involved can mispair, and I haven’t figured out a good way to predict this other than by the designer’s history. So I’d be more likely to vote for a design with AAAA loops from mat747 than from myself, for instance

Eli_Fisker · April 19, 2011, 11:00pm

Another thing. There tend to be problems in the bigger (inner) loops if there are colors other than yellow, besides the area where the loop is connected to a stem. In the challenge puzzles it was sometimes a necessary strategy to place extra nucleotides in the ring of the loop to stabilize the structure. But in lab it often makes the RNA structure fold in the wrong places. It messes up the dot plot a bit as well.

The more nucleotides of other colors, the worse. A few may be okay. But generally the winning designs tend to avoid them.
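
A quick way to spot such nucleotides is to list every unpaired position in an internal loop or multiloop that is not an A (yellow). The sketch below assumes a dot-bracket target shape and a hypothetical function name, and it skips hairpin loops and unpaired ends. It does not model the exemption for positions right next to a stem, so those are flagged along with everything else.

```python
def non_a_in_inner_loops(sequence, structure):
    """Return (position, base) for non-A bases in internal loops and multiloops.

    Hairpin loops and exterior (unenclosed) unpaired stretches are skipped.
    """
    stack, partner, depth = [], {}, []
    for i, ch in enumerate(structure):
        if ch == "(":
            stack.append(i)
        elif ch == ")":
            j = stack.pop()
            partner[i], partner[j] = j, i
        depth.append(len(stack))  # number of pairs still open at position i

    flagged = []
    i, n = 0, len(structure)
    while i < n:
        if structure[i] != ".":
            i += 1
            continue
        start = i
        while i < n and structure[i] == ".":
            i += 1
        end = i - 1  # inclusive end of this unpaired run
        if depth[start] == 0:
            continue  # exterior: not enclosed by any pair
        if partner.get(start - 1) == end + 1:
            continue  # hairpin loop: closed by a single pair
        flagged.extend(
            (p, sequence[p]) for p in range(start, end + 1) if sequence[p] != "A"
        )
    return flagged


if __name__ == "__main__":
    # Made-up three-way junction; the U-rich hairpin loop is ignored.
    shape = "((..((...))..((...))..))"
    seq = "GCAGGCAAAGCAACGUUUCGUAGC"
    print(non_a_in_inner_loops(seq, shape))
```

A short list (or an empty one) matches the observation that winning designs mostly keep these loops yellow.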

Joshua_Weitzman · April 19, 2011, 11:13pm

Your conclusions are self-fulfilling. They do not take human nature into account. You have 300 answers to five basic shapes; that is not a large data set that supports publishing a statement like “EteRNA players are beating state-of-the-art computer algorithms designing real-world RNAs”. You are aware of the major flaws in your system.

Jeehyung Lee, Official Rep, replied 17 days ago:
Hi Joshua,

EteRNA is preparing to get rid of the voting system and move to a new system called the Elo rating system.

http://getsatisfaction.com/eternagame….
Check out this link for details!
You have had more than 20000 people sign up. Less than .0024% of them contribute to the lab. Do you ask yourselves why? You are off to a great start. But without some growth from criticism, the Eterna system will be stagnant. You will not get thousands of people to contribute like the other science-based games. I am sick of pointing this out. I will be laughing when you get cold fusioned.

blubblub · April 19, 2011, 11:41pm

We have been testing our color template with each eteRNA RNA synthesized puzzle challenge winner and have come to the conclusion that it may be ‘limiting’ to assign only one color to each nucleotide: blue for uracil; green for cytosine; yellow for adenine; and red for guanine. There may be a higher color template that may help identify nesting patterns within the ‘good’ and ‘problem’ folding patterns to achieve 100% RNA synthesis. This may be more important as more complex challenges are released.

To quote Tevya in ‘Fiddler On The Roof’ – “sounds crazy, no?!”

This is the ‘boohoo’ part. In order to actually test this idea the DEVS would have to produce an additional color template that allowed someone like us (blubblub) to assign any of 8 colors to the four nucleotide colors you have now. The colors would be black, white, blue, yellow, green, red, purple and orange.

In other words – the eteRNA folding and energy template and strategies underneath the current design still operate unchanged. However, players like Eli Fisker who use color to drive their intuition might find the ability to identify or discover new nesting patterns inside an existing design pattern useful.

Among other benefits (if it works) is nesting patterns could signal chemical and folding preferences within designs that may not be obvious right now. (If it works) the designs would be more articulate. (If it works) it would make any successful RNA design predictive.

One last comment. Marie Curie processed tons of pitchblende to isolate the radium that she and Pierre had detected with their electrometer. In the end, she had only a fraction of a gram of radium for her efforts, but it completely changed the world.

This collaboration to unravel the complexity of RNA is unparalleled. It is the stuff of a new paradigm. ENJOY!

Jonathan_Hall · April 20, 2011, 2:39am

I think the reason most players are not so involved in the lab is that it takes a big commitment, both to understand and to explain to others. It can also be intimidating when some of the best lab players are (or do a good impression of being) actual biochemists. When the challenges use such a different strategy they only somewhat prepare people for the lab.

I think it should be a priority to continually update Eterna’s computer modelling to better reflect reality (even if it messes up all the points). People are attracted by the puzzles, and some amass points because of the satisfaction of solving them. But we all know that GU competitions (which I love, and Joshua must too as a top-5 GU competitor) do not really help with the labs.

Of course this might be impossible, and we can just live with a fun puzzle world which attracts all kinds and a separate fun but time-consuming lab world which attracts mostly scientists.

–Jonathan

duanev · April 20, 2011, 5:14am

I was thinking I could probably create a program that solves the puzzles the way I do - do y’all have a simple way to represent the challenges in a data file? And a verification program (that declares a puzzle solved, and computes the total kcal for both incomplete and completed puzzles)? If there is interest in more grey-dot generators that can compete with players >:), post the above and some more bot code might be forthcoming…

Berex_NZ · April 20, 2011, 5:36am

blubblub, what is your background in this area? You seem to know a lot…

Berex_NZ · April 20, 2011, 5:50am

Joshua, two questions for you. 1) Why do you stick around Eterna? To wait for the inevitable closure, where you will revel in your own schadenfreude?
2) Please give me a list of all of your criticisms and recommendations, I would sincerely like to know what they would be.

Jonathan Hall, I am really curious. Which of the players do you think are biochemists? Because the only one I know of who is trained in this area is alan robot.