Tell us about your EteRNA lab algorithms!

Eli_Fisker · April 22, 2011, 10:10am

Saw a design, no name, no blame :), in the voting area today, so here is what I think.

Example from this puzzle:

UUUU
GAGA

(loop to the left before nucleotides and inner loop to the right after)

For memorization: Lady Gaga should not sing UUUU in your puzzle

No rows with 4 U’s in a row, no matter what combination of G’s and A’s they pair with.

This applies to nucleotides of any color. In general there is often a penalty for having more than two repeated nucleotides in a row.

That applies both when the base pairs only share the same colored nucleotide on one side of the string and when the whole base pair is repeated.

Two repeated (sameturning) basepairs like 2 AU’s, sometimes even 2 GC’s too, can make trouble. 2 GU’s beside each other are a problem no matter how they turn.

The puzzle already rules out four greens or four reds in a row by default.

So the general rule about repetitive nucleotides is:

Twist 'em baby!
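
A minimal sketch of how the repeat rule above could be checked by a script, assuming the design is available as a plain string of A, C, G and U. The function name `find_repeats` and the `min_run` parameter are hypothetical and not part of EteRNA; the default of 3 reflects the "more than two in a row" penalty described above.

```python
import re

def find_repeats(sequence, min_run=3):
    """Return (start, base, length) for every run of one base of at least min_run.

    start is 1-based, matching the nucleotide numbering used in this thread.
    """
    runs = []
    # Each match is a maximal run of a single repeated base.
    for match in re.finditer(r"A+|C+|G+|U+", sequence.upper()):
        length = match.end() - match.start()
        if length >= min_run:
            runs.append((match.start() + 1, match.group()[0], length))
    return runs

if __name__ == "__main__":
    # Made-up example sequence, not a real lab design.
    for start, base, length in find_repeats("GAUUUUCGAAAG"):
        print(f"{length} x {base} starting at nucleotide {start}")
```

This only flags runs along one strand. It does not look at which bases the run pairs with, so repeated same-turning base pairs would need the target structure as well.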

Ding · April 22, 2011, 10:58am

I wholeheartedly agree here. Even 3 Us in a row is (in my opinion) asking for trouble. Maybe because of the tails which we have no control over being mostly As.

Really, the only place I want to see three of any nucleotide in a row is if it’s As in loops.

Ding · April 22, 2011, 11:09am

Something I noticed in the Star designs: it seemed to be pretty much okay to place a G or a C just before a stack in the central multiloop (so the segment was AAAC or AAAG), whereas putting them just after a stack (GAAA or CAAA) seemed to cause problems.

Eli_Fisker · April 22, 2011, 11:11am

Yes. 3 U’s in a row, or 3 G’s, C’s or A’s, is a bad idea. 4 is a really bad idea.

Funny and true end comment.

wisdave · April 22, 2011, 1:40pm

Hmmmm… That wasn’t my design, but I do have 3 G’s in a row in one of the branches. When I flip the GU pair, the stack drops from -1.5 to -2.5, and the total drops from -59.1 to -59.4. The dot plot is minimally improved. I’ll remember this in future puzzles. Thanks for the insight.

Eli_Fisker · April 23, 2011, 6:56am

One of the robots just managed to make a row of 5 AU pairs. (Nupack bot design 1, The branches, 80 % synthesis) But that doesn’t mean that it usually will work.

Eli_Fisker · April 23, 2011, 7:41am

I noticed one more thing. As the neck area seems to be a constant battleground for us, I’m beginning to see a reason why it is so hard for us players to keep this part of the puzzle together.

My theory is that another rule we seem to work by interferes with our understanding of how to make the neck area stick together.

Interfering rule:
If a string holds more than two GC pairs, there is sometimes a penalty for having them all turn in the same direction, even if they are not right beside each other. Sometimes it is even necessary to twist one of the two GC base pairs at each closing end of the string, as it improves the stability of the structure and makes the entropy look better.

(I think this rule has to do with breaking symmetry and thereby preventing the structure from folding in the wrong places.)

The neck area is tricky and usually a lot of GC pairs are needed. (One robot made a functioning neck with only two GC pairs! Nupack bot design 1, The branches.) I think we unconsciously do what we usually do in other strings. That means we often make the closing GC pairs connected to the inner loop turn in a different direction than the closing GC pairs at the end of the neck. Those who use CGCG (11-12 green/93-94 red) in the neck near the loop and the differently turning GCGC (6-7 red/98-99 green) in the neck closing seem to be in trouble.

Actually we might learn from the robots. They seem to be onto something when it comes to the neck. Maybe it is because they start from the neck area and work their way in, while we may have a tendency to start from the middle and forget that nucleotides 1 and 2 are red and will pair up with whatever pair of green nucleotides they can find their way to.

xmbrst · April 23, 2011, 4:29pm

Voting strategy:

Look at designs with GC content at the lower end of the range for this shape, and pick a maximum GC content threshold such that there are some designs with GC less than this threshold that also have clean pairwise probability plots.
Look at the designs in descending order of current number of votes, and vote for any that have GC content less than threshold and clean plots.

Meta-comment: Some people have described their self-interested voting and designing strategies with a note of cynicism. I think that the cynicism is misplaced. The game is essentially an ensemble algorithm: we have many agents with their own algorithms, and the voting game is how the strategies get merged. Ensemble algorithms usually rule in natural language processing, and they seem to rule here too. You could probably write a computer program that captured this.
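
A rough sketch of the voting strategy above as a program, under stated assumptions. The design records (dicts with `name`, `sequence`, `votes`, `clean_plot`) are a hypothetical format, not an EteRNA data structure. Whether a pairwise probability plot is "clean" is a human judgment, so it is passed in as a flag rather than computed, and the GC threshold is chosen by the voter as in the first step.

```python
def gc_content(sequence):
    """Fraction of nucleotides that are G or C (0.0 for an empty sequence)."""
    seq = sequence.upper()
    if not seq:
        return 0.0
    return (seq.count("G") + seq.count("C")) / len(seq)

def pick_votes(designs, gc_threshold, max_votes):
    """Return names of designs to vote for, most-voted first.

    designs: list of dicts with keys 'name', 'sequence', 'votes', 'clean_plot'
    (a hypothetical record format used only for this sketch).
    """
    eligible = [
        d for d in designs
        if d["clean_plot"] and gc_content(d["sequence"]) < gc_threshold
    ]
    # Step two of the strategy: walk designs in descending order of votes.
    eligible.sort(key=lambda d: d["votes"], reverse=True)
    return [d["name"] for d in eligible[:max_votes]]

if __name__ == "__main__":
    # Made-up designs for illustration only.
    designs = [
        {"name": "a", "sequence": "GCGCAUAU", "votes": 3, "clean_plot": True},
        {"name": "b", "sequence": "GGCCGGCC", "votes": 10, "clean_plot": True},
        {"name": "c", "sequence": "AUAUGCAU", "votes": 5, "clean_plot": True},
    ]
    print(pick_votes(designs, gc_threshold=0.6, max_votes=2))
```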

Eli_Fisker · April 24, 2011, 3:01pm

Value of sameturning CG pairs in inner loops partly proven.

Two of Mat’s synthesised Branches designs are almost twins. One received an 85% synthesis score, the other 90%.

There are only two small differences:
The highest scoring design has all sameturning GC pairs in the loops, while in the other, the samedirection “rule” is broken in the two small loops.

Two GC base pairs (63-68 and 62-69) are twisted differently. I modified the 85% version and discovered that twisting these GC pairs is a necessary move to keep the ensemble diversity down after breaking GC direction in the two small loops.

When looking at Mat’s two designs in the blue/yellow mode (color sequences based on experimental data), it also seems that breaking samedirection GC’s in the two smaller loops destabilizes the center loop as well. (Wonder if the same happens in designs with tetraloops?)

It seems to result in a higher synthesis score if you drive your GC-car the same way around in the innerloops, not just in the center loop.

Mat – Branches V1.1 (submitted) 85% synthesis score
http://eterna.cmu.edu/htmls/game.html…

Mat – Branches V1 90% synthesis score
http://eterna.cmu.edu/htmls/game.html…

Thanks Mat, for your awesome designs.

blubblub · April 25, 2011, 5:54am

We saw the exchange between Matt747 and Jeehjung tonight in the eteRNA community forum chat area concerning color and coordinates for eteRNA designs. Both ideas complement each other and would be excellent tools. If color blind mode means the ability for players to assign their own colors to current data sets, that is fantastic. If so, team blubblub would like to make five requests based on our experience:
–at a minimum, could the color black be added to the current color template?
–could we be given the ability to assign any color to any nucleotide? (Example: guanine could be red but it could also be black, yellow or blue.)
–could the color template be made available for both the synthesized and unsynthesized data sets?
–could the expanded color template be made available for player created puzzles?
–could at least one nucleotide ball template be expanded to 1024 balls?

If future developments provide these additional layers of support, we will happily post and share our results with the eteRNA community.

Eli_Fisker · April 25, 2011, 1:23pm

I just noticed something funny in Mat’s 90% Branches design. All the GC pairs in the innerloops are turning in the same direction. This means, if you drive along the string in the reading direction, you are driving against red light all the way! (RNA is read counter-clockwise, from nucleotide 1 to 119.)

The exceptions to this redlight driving are the neck area and the loop closings. The loop closings are an exception because in strings it’s not good to have all GC pairs turn the same way, as I wrote in another post: If a string holds more than two GC pairs, there is sometimes a penalty for having them all turn in the same direction, even if they are not right beside each other. Sometimes it is even necessary to twist one of the two at each closing end of the string.

(Mat’s Branches design, 90%)
http://eterna.cmu.edu/htmls/game.html…

And it is not just Mat’s puzzle that follows this order. All the winners in the Branches design so far, and many of the players in general, turn most of their GC’s not only the same way relative to each other, but in THE same direction.

Rule about GC direction in innerloops:
In the innerloop: green to the left and red to the right

Left Right
Green Red
G C
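
A small sketch for checking GC pair direction in a design, assuming the sequence and its target structure in dot-bracket notation are both available. The function names are hypothetical. How "left" and "right" in the game view map onto sequence order is not specified in this thread, so the sketch only reports which base of each G-C pair comes first in the reading direction, and it lists every G-C pair rather than only those closing innerloops.

```python
def pair_table(structure):
    """Map each paired 0-based index to its partner, from dot-bracket notation."""
    stack, pairs = [], {}
    for i, ch in enumerate(structure):
        if ch == "(":
            stack.append(i)
        elif ch == ")":
            j = stack.pop()
            pairs[i], pairs[j] = j, i
    return pairs

def gc_orientations(sequence, structure):
    """List (i, j, label) for each G-C pair, 1-based positions with i < j.

    label is 'GC' if G comes first in the reading direction, else 'CG'.
    """
    seq = sequence.upper()
    result = []
    for i, j in sorted(pair_table(structure).items()):
        # Visit each pair once (from its earlier position) and keep G-C only.
        if i < j and {seq[i], seq[j]} == {"G", "C"}:
            result.append((i + 1, j + 1, seq[i] + seq[j]))
    return result

if __name__ == "__main__":
    # Made-up hairpin, not a real lab design.
    for i, j, label in gc_orientations("GCAAAAGC", "((....))"):
        print(f"pair {i}-{j}: {label}")
```

Counting how many pairs come out as 'GC' versus 'CG' would give a quick way to compare two designs for samedirection pairs.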

Starryjess’s Y oh Y, 84% (which so far is one of the best synthesised branches designs), breaks in 2 out of 3 points, where GC pairs in the innerloops are red/green and not green/red. Visible in the yellow/blue mode.

http://eterna.cmu.edu/htmls/game.html…

I have always thought the loops to be very different from the innerloops. I’ve been thinking that maybe this samedirection GC rule in innerloops comes down to the same rule that makes it important in which direction you place your nucleotides in (tetra)loops, if you use anything other than A’s: the reading direction of RNA.

So I am thinking that the direction/sequence/turning of nucleotides is important in both (tetra)loops (which was already formulated) and innerloops (which we did intuitively), simply because of the reading direction of RNA.

Actually, the fact that nucleotides of a color other than yellow only seem to be allowed in certain patterns on one side of the GC pair in an innerloop, and not on the other, kind of supports the samedirection GC rule in innerloops. Ding wrote earlier: Something I noticed in the Star designs: it seemed to be pretty much okay to place a G or a C just before a stack in the central multiloop (so the segment was AAAC or AAAG), whereas putting them just after a stack (GAAA or CAAA) seemed to cause problems.

The rule seems to hold in the multiloop Branches design. But it might be a bit different in earlier Lab designs. That might have something to do with the number and size of innerloops.

In The bulged star, Starryjess managed to turn 3 of 5 innerloop GC pairs in the opposite direction from usual. (Starry’s bulged star, 90%)

http://eterna.cmu.edu/htmls/game.html…

Direction of GC pairs in loops seems not to matter much in The bulged star in general. I think it is because the small loops in the string make the structure more stable in itself. An argument for this: The Star is very similar to The bulged star, but without the smaller loops in the string, and there the rule for GC pair direction in loops was strictly followed in the designs with synthesis success above 82%.

Conclusion: in puzzles with big innerloops and long strings with no small loops in them, and in multiloop puzzles, the nucleotide police order you to drive against red light. Going on green nucleotides is usually a no-go.

Just a last thought:
Would a really good design still work/be as good if we switched all the GC pairs in the inner loops to the opposite direction? (I’m aware this operation will affect how or if the neck area would work.)

alan.robot · April 25, 2011, 7:49pm

@Jonathan: I think your comments are all spot-on with what devs and players alike are worrying about, so don’t feel as if you are alone and unheard in these concerns. Bridging the puzzle->lab gap is probably the biggest and most significant challenge Eterna is going to have to address; both the devs and the top players are thinking really hard about this.

I, in particular, think that GU challenges have been sending advanced players all the wrong messages, as I can’t think of any problem that would realistically be solved in such a way. And that has nothing to do with the Eterna model per se, but rather the game-mandated constraints on what a solution should look like, and I do think those could be made to be more realistic (for example, keeping GC/AU ratios sane).

It is impressive that you might mistake any of the top players for trained biochemists. As far as I know none have any formal post-secondary exposure to biochemistry or molecular biology; it’s all stuff they picked up along the way as they became obsessed with getting better at the “lab” portion. There are some grad and undergrad students playing as well, but AFAIK none are in the top 100 or so players, and none of the players getting synthesized has any such training. So, if it’s any consolation for the rough transition to lab designing, none of the others who successfully bridged the gap have had any special training other than being meticulous, observant, and creative.

JeehyungLee · April 25, 2011, 8:01pm

That’s very interesting memory60 - do you have any specific patterns and colors you choose?

JeehyungLee · April 25, 2011, 8:04pm

This is indeed amazing - kind of like a pseudocode.

EteRNA bots can only do designs - their algorithms were fundamentally created to do the design only, and nothing else. Our hope is to make a bot that can actually vote based on player strategies like yours : )

JeehyungLee · April 25, 2011, 8:06pm

Zila Gorila - totally what we are looking for. Very interesting strategy!

JeehyungLee · April 25, 2011, 8:08pm

Xmbrst, thanks for sharing your strategy! GC ratio really seems to play a major role in picking good designs among players.

Adrien_Treuille · April 25, 2011, 8:13pm

Eli: This is so fascinating. If you modify one of the most “winning-est” designs (like Starry’s bulged star) and reverse the GC pairs, tell me and I’ll vote for it as much as possible, and hopefully we can get it synthesized.

(It’d actually be great to get a little coalition to accomplish this.)

Jonathan_Hall · April 26, 2011, 4:18am

@Alan (and Berex):

My impressions about biochemists involved with the lab were just general impressions from snips of chat and comments. Someone asked whether RNA synthesized elsewhere could be added to the lab database; people pass around articles and RNA analysis websites like they’re common knowledge. The few players’ backgrounds I do know are not biochemistry, but my concern is the impression a lab newcomer has.

My suggestion is that we maintain an up-to-date basic lab guide. It would include:
–basic terminology
–basic goals (is 90% a success?)
–instructions on how to interpret lab results
–basic principles learned so far
–a few examples with explanations about why certain pairings failed
–a quick rundown of the conventional wisdom
–links to helpful spreadsheets, conversations, and websites

This would have made me feel a lot better jumping into the lab. I keep saying to myself, “When I have the time, I’m going to go through all the lab results (whatever we’re up to now) and analyze what works and what doesn’t.” Of course, the more results come in, the more daunting such a task is. I’d be happy to work on such a guide, but I can’t promise it would happen anytime soon.

As for the puzzle play vs. lab design gap, I think it would be ideal to have a percentage score for the puzzles (i.e., your design has an n% chance of folding correctly). Different difficulty levels would have a different target percentage. Of course, this would require a more advanced algorithm, since currently a mostly GC puzzle seems to be the strongest.

I do think the GU challenges help identify the weakest spots in a structure, even if that is their only practical merit. I definitely think that competitions in the puzzle realm (both the GU sort and the designing a puzzle that few can solve sort) are important in attracting and retaining puzzle-solvers who like a good challenge. I’m only guessing, but I would think Joshua is still active on Eterna only because he had fun doing all the GU challenges and now designing difficult player puzzles.

(Sorry for the mishmash of comments and ideas, especially as I notice that I am not telling anything about my Eterna lab algorithms…)

xmbrst · April 26, 2011, 5:04am

Count me into the hypothesis testing coalition. Just let me know what to vote for.

alan.robot · April 26, 2011, 5:04am

Great comments, Jonathan! Hopefully others are taking note.

I remember being particularly underwhelmed when, upon reaching 10k points, the only instructions for jumping into “lab” were the two youtube videos, and links to very old forum posts.

I had expected some sort of communal “lab notebook”, like a wiki that anyone with lab access could edit. (Note that requiring an account with 10K points to edit pretty much eliminates random vandalism à la Wikipedia. Any player should be able to view content, of course.)

I think there are many players who would pitch in to organize and fill such a wiki with useful guides and content, but right now we are pretty much restricted to getsat posts, which really aren’t amenable to this sort of endeavor.