Well, we cheat, of course!
The computer algorithms take a single shot.
The humans try stuff, get feedback, and then evolve towards the higher scores.
Frankly, I’m underwhelmed by it all.
I think the human results are a simple mathematical consequence of the game-theoretic roles the humans and the algorithms are assigned. Wrap the algorithms in a multi-pass structure, even a dumb one, and you’ll equalize the outcomes.
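To make the point concrete, here is a minimal sketch of that “dumb” multi-pass structure: sample repeatedly, score each attempt, keep the best. The `generate` and `score` callables are hypothetical stand-ins for whatever the benchmark actually runs; the point is only that any retry-with-feedback loop hands the algorithm the same evolve-toward-higher-scores advantage the humans have.

```python
import random

def multi_pass(generate, score, passes=10, seed=0):
    """Dumb multi-pass wrapper: sample `passes` candidates and keep
    the one with the highest score. No learning, no cleverness --
    just the try/feedback/retry loop the one-shot setup denies."""
    rng = random.Random(seed)
    best, best_score = None, float("-inf")
    for _ in range(passes):
        candidate = generate(rng)
        s = score(candidate)
        if s > best_score:
            best, best_score = candidate, s
    return best, best_score

# Toy stand-ins: 'generate' guesses an integer, 'score' rewards
# closeness to a target of 42. Any real generator/scorer pair slots in.
guess = lambda rng: rng.randint(0, 100)
closeness = lambda x: -abs(x - 42)
best, s = multi_pass(guess, closeness, passes=50)
```

Even with a random-guess generator, fifty passes against a scoring oracle will land far closer to the target than a single shot would on average, which is the whole claim.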
As for the voting, I’m with JW: the way to score is to pile onto the popular answers. The actual differences between the models are minuscule, and I have zero reason to believe that the best candidates are being selected for synthesis. Well, that’s a little too harsh. More accurately, you could take any of the candidates that score points for being similar and probably find several that are superior to the one that was synthesized. This might actually merit testing: if the selected candidates even rank regularly in the top third, you could call that a success, but I wonder whether they would.
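The pile-onto-the-popular strategy can be sketched in a few lines. This is my own hypothetical scoring rule, not whatever the actual benchmark uses: each candidate’s score is its total similarity to all the others, so the winner is simply whichever answer sits in the densest cluster — quality never enters into it.

```python
def consensus_pick(candidates, similarity):
    """Pick the candidate most similar to all the others, i.e. the one
    that 'piles onto' the popular cluster. Indices (not identity) are
    used so duplicate values each count as separate votes."""
    n = len(candidates)
    totals = [sum(similarity(candidates[i], candidates[j])
                  for j in range(n) if j != i)
              for i in range(n)]
    return candidates[max(range(n), key=totals.__getitem__)]

# Toy run: numeric 'answers', similarity = negative distance.
# The cluster around 11 wins regardless of which answer is correct.
answers = [10, 11, 12, 11, 40]
sim = lambda a, b: -abs(a - b)
picked = consensus_pick(answers, sim)  # -> 11
```

Testing the skeptical claim above would be straightforward with this in hand: log the consensus pick’s rank under whatever independent quality measure exists, and see how often it lands in the top third.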