Eterna dreams

Eli_Fisker · October 2, 2011, 10:38am

I would love to have all the past lab submissions synthesised in the lab. It’s an old wish of mine. I have been discussing this with Brourd and Mat and both think this is a good idea. Actually, if it were not for them discussing this topic and including me, I would probably still just be “complaining” in the chat at regular intervals about the lack of data, instead of writing about it. Thanks to you both for ideas and feedback on this.

I think there is a lot more to be learned from the designs we have already submitted to the lab - those which did not make it through the voting process. In most of the lab series we need more data to uncover interesting tendencies or back up the ones we already have found.

I will illustrate with an example why I would like all the data. I made a spreadsheet where I looked at energy conditions around the neck area. My data criterion was that I would look only at working necks. I had the feeling that an energy pattern was at play in the neck area. I was interested in finding out how the placement of energy around the neck helped make the neck work.

I was onto this energy tendency for months before I wrote a post about it. I was waiting for more data to see if the tendency I spotted was confirmed in the other labs as well.

One look at my data and it becomes obvious that in the labs with few rounds or few working necks, there is too little data to draw clear conclusions from. I couldn’t get clear tendencies from the star and the finger labs. (The data spreadsheet originated from this post.)

Having the previous lab submissions folded and scored would solve my data problem. Then I might be able to predict more tendencies about what the right energy condition around a neck is for it to be successful. If all the submitted lab puzzles had been synthesised, I could have written my post and theory about it much earlier.

Another positive side effect is that there will be many more near-twin designs, where only one or a few nucleotides have been changed. Those designs have already proven to be very valuable when it comes to seeing what pays off and what does not.
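To make the near-twin idea concrete, here is a minimal sketch of how such pairs could be picked out of a set of submissions. It assumes each design is available as a plain sequence string of equal length; the function names, the design names and the sequences are made up for illustration and are not part of Eterna.

```python
from itertools import combinations


def hamming(a: str, b: str) -> int:
    """Number of positions at which two equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must have the same length")
    return sum(x != y for x, y in zip(a, b))


def near_twins(designs: dict[str, str], max_changes: int = 2):
    """Return (name_a, name_b, distance) for every pair of designs that
    differ in at least one and at most `max_changes` nucleotides."""
    pairs = []
    for (name_a, seq_a), (name_b, seq_b) in combinations(designs.items(), 2):
        if len(seq_a) != len(seq_b):
            continue  # designs of different length are not compared
        distance = hamming(seq_a, seq_b)
        if 1 <= distance <= max_changes:
            pairs.append((name_a, name_b, distance))
    return pairs


# Made-up sequences, for illustration only.
designs = {
    "design_a": "GGGAAACCC",
    "design_b": "GGGAAACCU",
    "design_c": "GCGAUACCU",
}
twins = near_twins(designs)
```

With synthesis scores attached to each design, every such pair would show directly what a change of one or two nucleotides did to the result.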

Brourd mentioned that he thinks it would definitely bring in more players: the chance to have one’s design synthesized, instead of having to sit back while other players always have their designs picked.

The more results we get, the more we learn from our submissions. The more we learn, the more we can contribute to science.

More synthesis slots would also mean we could have a new game area where we could test negative hypotheses, to see if designs we think will fail actually do fail.

For example, the lab “Things to test” had a new element, the 2-2 loop. It did not behave as usual. We were not sure how to make it work, so we experimented a lot. Things that haven’t worked before just might. I have earlier described why filling multiloop rings and loops with blue nucleotides was a bad idea.

In the second lab round, I deliberately made a mod of Mat’s successful round 1 design, which scored 94%, with pure blue inside the 2-2 loop, to rule out that this would work in a design. I also added an extra GC-pair, which, for the sake of the purity of my experiment, I probably shouldn’t have. But as I suspected, the blue 2-2 loop didn’t work. And for the next two rounds people dared not vote for me.

If we had a playground in the lab where we could test negative hypotheses, we could test things like this unpunished. It would even be cool if we got points for failing in there. People could be allowed to bet for or against an experiment successfully failing. As Mat says: I think the designs would need to be submitted into their own voting category for the idea to work fully.

Experiments with negative hypotheses could lead to finding patterns for why certain things don’t work - sort of the rules of the misfolds. If we can find the rules for what for sure won’t work, we are well on the way to discovering more rules about what works and what is to be avoided.

So for Eterna past, present and future – more slots, please…

Adrien_Treuille · October 2, 2011, 4:20pm

It will happen. Seriously!

One thing I’ve been thinking about: suppose there are, let’s say, 10,000 syntheses per week. Right now, that’s more slots than we have submissions, but that will quickly cease to be the case. There will always be slot scarcity, no matter how many we have. We need to find a way to allot slots when we have slots-a-plenty, but voting is definitely no longer going to cut it.

Any thoughts?

rhiju · October 2, 2011, 6:37pm

To reiterate what Adrien said … This will happen, and it will be amazing.

It is going to take some fairly challenging experimental innovations, but rest assured that we are ordering the DNA as we speak, and hope to have all the kinks worked out of the pipeline in the next 3-6 months. It’s a big undertaking, but a top priority for my lab and for EteRNA.

Now the question for the players is – with that much throughput (10,000 slots/month, perhaps per week), how are we going to analyze all the data? How can we “publish” our insights? We have some ideas, but we are most looking forward to yours. Please put them here, and I think we should plan another ‘chat summit’ meeting in late October where we discuss this not-too-distant and incredible future…

Quasispecies · October 3, 2011, 4:49am

10K/month? Per week? Holy smokes. I’ve thought about posting a strategy based on a library of structural subunits, but I thought the lack of synthesis data (currently) would make it of limited use. That’s a lot of RNA, but it would be great for the strategy. I’ll try to post it this week when I’m not about to fall asleep.

mat747 · October 3, 2011, 1:15pm

Hi rhiju, Adrien

“10,000 slots/month”
OK, let’s say we have 10,000 slots per month.

What will be the percentage of slots given to players?

How long will a lab cycle take per target shape/round?

Rhiju

“We have some ideas”
What ideas? Could you post them?

Eli_Fisker · October 3, 2011, 2:38pm

Hi Adrien and Rhiju!

Seems I’m going to have my wishes fulfilled… and then more.

Wow, you guys are nuts! But in just the right way.

You asked about our thoughts, so here are mine.

RNA LIBRARY SYSTEM (I liked that Quasispecies mentioned the word library)

With that many slots, we need a system (maybe more than one) to sort the results of the designs in a way that makes it easy to find or pull out the data we want to look at.

We need to be able to search the data by specific criteria, just like in “Current lab”, e.g. a number of GC-pairs within an interval of minimum 12 and maximum 18, combined with a given amount of free energy. Just much more advanced.

It should have groups like in “Current lab”, but with more categories, as we know more about the RNA after it has been through the lab than when it enters. We should think through which kind of extra data this gives us, and try to set up categories for it.

This RNA library system should also be able to register which of the elements in an RNA design are working. Is the neck working, do the smaller outer loops fold as they should, how many of the strings are working and which ones?
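A criteria search of this kind could look roughly like the sketch below. It is only an illustration: the `LabResult` record, its field names and the sample values are assumptions, not the actual Eterna data layout.

```python
from dataclasses import dataclass


@dataclass
class LabResult:
    # All field names are hypothetical; the real data layout may differ.
    title: str
    gc_pairs: int        # number of GC base pairs in the design
    free_energy: float   # predicted free energy from the energy model
    score: int           # synthesis score, 0-100


def search(results, min_gc=None, max_gc=None, max_energy=None, min_score=None):
    """Return the results that satisfy every criterion that was given.
    A criterion left as None is ignored."""
    hits = []
    for r in results:
        if min_gc is not None and r.gc_pairs < min_gc:
            continue
        if max_gc is not None and r.gc_pairs > max_gc:
            continue
        if max_energy is not None and r.free_energy > max_energy:
            continue
        if min_score is not None and r.score < min_score:
            continue
        hits.append(r)
    return hits


# Made-up records, for illustration only.
results = [
    LabResult("alpha", gc_pairs=10, free_energy=-30.1, score=70),
    LabResult("beta", gc_pairs=14, free_energy=-42.5, score=94),
    LabResult("gamma", gc_pairs=17, free_energy=-35.0, score=81),
    LabResult("delta", gc_pairs=20, free_energy=-55.2, score=60),
]

# Between 12 and 18 GC-pairs, and free energy of -40.0 or lower.
hits = search(results, min_gc=12, max_gc=18, max_energy=-40.0)
hit_titles = [r.title for r in hits]
```

Because every criterion is optional, the same function covers a search on GC-pairs alone, on energy alone, or on any combination of the two.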

ENERGY

I know the energy numbers we have on the RNA come from an energy model, and that we can’t be sure it looks the same in the real folded RNA. But I would love to be able to compare energy in specific spots, or for whole elements (like a neck) on RNA.

Just like in the example I gave above in the post. I would want to compare the necks that are working. Then I would want energy shown for spot 1, 2 and 3 in combination with the element “working neck”.

Spot 1: energy level inside multiloop, spot 2: energy level inside of hook and spot 3: the collected energy inside of the whole neck.

I should also be allowed to drag out a certain portion of this data, intervals of it. I would also like to be allowed to compare the same spots in different lab shapes.

We would also need a system that can somehow be used to cross-examine things between the different lab shapes. As I’m fascinated with necks, I would love to be able to compare working necks from one lab to those of another. I know many of them have different lengths. But some are of the same length. I would also like to be able to compare designs that have multiloops of the same size.

DESIGN STRATEGIES

We will have to rethink our RNA designing. Here is one idea: a change in one puzzle equals changes in many.

I could imagine taking up an old lab puzzle and, let’s say, pulling out all the existing designs for it. Then, through one puzzle, I would make changes at chosen spots, e.g. in the multiloop, changing all the base pairs in the multiloop to GC-pairs that turn in the right direction. And then synthesise all the designs with that specific change, as a way to test if it improves the statistics for correctly folded designs.

Or same situation, different strategy. I could throw a successful neck into a group of old synthesised designs, and then see if it improves the overall success of the designs. With many more slots coming, I guess that my strategy “Catalog of necks” would pose no problems.
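As a rough sketch of “a change in one puzzle equals changes in many”, the snippet below applies one set of substitutions to every design in a group. It is a simplification: it works on plain sequence strings with 0-based positions and does not know about base pairs or pair orientation; all names and sequences are invented for illustration.

```python
RNA_BASES = {"A", "C", "G", "U"}


def apply_change(sequence: str, substitutions: dict[int, str]) -> str:
    """Return a copy of `sequence` with the given 0-based positions replaced."""
    bases = list(sequence)
    for position, base in substitutions.items():
        if base not in RNA_BASES:
            raise ValueError(f"not an RNA base: {base!r}")
        bases[position] = base
    return "".join(bases)


def apply_to_all(designs: dict[str, str], substitutions: dict[int, str]) -> dict[str, str]:
    """Apply the same substitutions to every design. Designs that already
    contain the change are skipped, so no slot is spent on a duplicate."""
    variants = {}
    for name, sequence in designs.items():
        changed = apply_change(sequence, substitutions)
        if changed != sequence:
            variants[name + "_mod"] = changed
    return variants


# Made-up designs, for illustration only.
designs = {"old_1": "GAAAC", "old_2": "GGAAC", "old_3": "AGACA"}
variants = apply_to_all(designs, {1: "G", 3: "C"})
```

Comparing the scores of the original group with the scores of the modified group would then show whether the change helps across many designs instead of just one.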

ENSEMBLE ALGORITHM

I guess most of the many slots are going to be the work of our ensemble algorithm. If I understand it right, then having tons of synthesised RNA will help the algorithm learn what is right and wrong.

EXTRA IDEA

What about some sort of built-in spreadsheet in Eterna, where we can pull the data in from the “RNA library”? Also, I know too little about making graphs. Make the tools easy for us to use, like you have done so far, and we’ll give you scientific results.

Adrien_Treuille · October 3, 2011, 8:00pm

Eli,

Thanks for this fantastic set of ideas. I’ve suggested that we go over them in our next developers meeting, and I’ll let you know what ideas come out.

Eli_Fisker · October 4, 2011, 5:56am

Hi Adrien! Happy to help. This sounds promising. I look forward to all this new Eterna playing with old Eterna.

slydog · October 6, 2011, 7:19pm

Perhaps the lab focus could be expanded to test proposed hypotheses. Players would submit hypotheses they would like to test. A first round of voting would select which hypothesis to pursue. Players would then propose one or more puzzles to test that hypothesis. Other players could also propose puzzles for the hypothesis. Then, players would submit different designs with a goal (this will work because … , this won’t work because …). We could vote first on which puzzles are a good test case for the hypothesis, then on which solutions best add new information.

This is a bit loose, but I hope you get the idea which is something more than the simple “let’s try to synthesize this design.”

Adrien_Treuille · October 6, 2011, 9:38pm

That sounds like an awesome idea, Slydog. We have been thinking on similar lines, but haven’t worked out the details yet. Don’t worry, though, keep playing and stay abreast of the forums! We won’t do anything without a lot of discussion with the players!

Eli_Fisker · October 7, 2011, 10:25am

Sequence search tool

Another thing I would like to do in a new RNA library system is to search for a sequence in a string. I would like to be able to hunt down problem patterns.

A sequence tool would make it easier to find probable problem sequences - sequences with a high failure rate. The same goes for suspected successful patterns.

I would also like to be given statistical data for such a search. With huge masses of data, being given statistical tendencies will be useful.

Problem patterns sometimes do work. With sequence search, one might be able to answer this question:

Why does it work sometimes, what conditions must be present for it to work, and why does it not work most of the time?

I would hate it if we had to type in letters on a line (like in the present sequence saver in the lab). I would like to be asked how long the string/pattern I want to search for is. Then I should be given a visual string of that length, where I can color in nucleotides with the mouse, just as usual for puzzles and lab. This will make the sequence search tool easier to use for those who are not biology students.

It would save one from looking through all the designs that are not relevant to the specific thing one is looking for.
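A bare-bones version of such a search, with the statistics attached, might look like the sketch below. It assumes results are available as (sequence, score) pairs and that a design counts as a failure below some score cut-off; the cut-off of 80, the function name and the sample data are all assumptions for illustration. It only handles single-stranded, exact patterns.

```python
def pattern_statistics(results, pattern, pass_score=80):
    """Compare designs that contain `pattern` with those that do not.

    `results` is a list of (sequence, score) tuples. `pass_score` is an
    assumed cut-off below which a design is counted as a failure.
    """
    def summarize(scores):
        if not scores:
            return {"count": 0, "failure_rate": None}
        failures = sum(score < pass_score for score in scores)
        return {"count": len(scores), "failure_rate": failures / len(scores)}

    with_pattern = [score for seq, score in results if pattern in seq]
    without_pattern = [score for seq, score in results if pattern not in seq]
    return {"with": summarize(with_pattern), "without": summarize(without_pattern)}


# Made-up results, for illustration only.
results = [
    ("GGGGAAACCCC", 55),
    ("GCGGAAACGCC", 60),
    ("GCGCAAAGCGC", 90),
    ("GGGGUUCGCCC", 85),
    ("GCACGAAAGUG", 92),
]
stats = pattern_statistics(results, "GGGG")
```

Showing the failure rate with and without the pattern side by side is what turns a plain search into a hint about whether the pattern itself is the problem. A colored-in visual string, as described above, would only need to be converted to a letter string before calling a function like this.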

Adrien_Treuille · October 7, 2011, 11:08pm

This is a really beautiful idea, and we’re going to take it seriously. For tracking purposes, I added it to the bug tracker as feature #529, even though this is a little bigger than an ordinary bug request.

mat747 · October 8, 2011, 2:45am

Hi Dev

rhiju asked
“Now the question for the players is – with that much throughput (10,000 slots/month, perhaps per week), how are we going to analyze all the data?”

I think my idea for “Computationally selected elements” could, with the right UI, help other players do what I have been doing visually for some time now.

Mat

Quasispecies · October 8, 2011, 11:34am

Mat - I posted an overview of the sequence fragment / substructure library thing that we were talking about the other day. It’s on another thread. Take a look and tell me if it’s similar to what you had in mind.

mat747 · October 8, 2011, 1:32pm

Hi

Yes, very similar to what I have been using.

The “Computationally selected elements” idea is a bit different in how I think the “substructure” could be determined.

Mat

Eli_Fisker · October 8, 2011, 5:03pm

Clarifying comment to my idea for a sequence search tool:

I would like the option to search for both single stranded and double stranded patterns. And the single stranded part of the search tool could search for patterns in loops too.

mat747 · October 9, 2011, 2:31pm

Hi Adrien, slydog

Adrien
Last week you suggested the idea of testing “Negative Hypothesis” designs.

I think the best way for people to explore new ideas is to have those designs in their own category where no points are given. (Designs would still be scored.)
The “Negative Hypothesis” and “proposed hypotheses” (slydog) designs could be tested on the same lab puzzle, in a similar way to how the Bots/Ensemble now have their own categories/slots.

The Hypothesis designs could be displayed/listed in the lab together with the Main designs, similar to the player puzzles area, which has separate “Lab candidates only” and main puzzles.

In the lab, when a player has completed a design and submits it, he/she could be given a choice of which category to submit the design into: Hypothesis or Main.

Eli_Fisker · October 9, 2011, 4:57pm

Hi Slydog, Adrien and Mat!

Mat and I have discussed the negative hypothesis and points lately. Here is a short outline from a chat the other day, to underline Mat’s point.

mat747: back to the testing - me and d9 were asking for a way to test things like “negative hypothesis” before
Eli Fisker: Good.
mat747: i think the best way for people to explore new ideas is to have a category where there are no points given
mat747: the designs would be still scored
Eli Fisker: There could be a point to that. There is a certain shame, when one is doing something that does not work. So fear of failure may prevent some experiments
mat747: yes and no fear of losing points
Eli Fisker: Exactly. I’m against the idea of losing points, for the same reason. Fear is a motivator, but not in here.
Eli Fisker: More people will stay, if we don’t punish, and they will have a nicer experience. This is a game, we should have fun.

So negative hypotheses could be sort of like Market strategies, where we don’t get points either.

Quasispecies · October 10, 2011, 1:32pm

Could you elaborate on how you would define substructures or elements, mat?

slydog · October 10, 2011, 3:36pm

Hi mat and Eli,
Good ideas. Absolutely, hypotheses could be (proposed and) tested in a separate section of the lab design competition, using the current design competition as a vehicle. There are lots of ways to incorporate testing of hypotheses into the lab.
For the design competition part, from postings 6 months ago, it seems that a change to an Elo system is in the works. An Elo system is a rating system that incorporates scoring that can go up or down, a form of penalties and rewards for failures and successes. Like you, I don’t think that would be appropriate for hypothesis testing, but for the design competition, I think it’s right on. With such a system, or something incorporating penalties, participants would be much more thoughtful when voting. As it is now, it’s stupid not to vote for the higher scoring designs because only those will be synthesized and that’s the only way extra points can be earned. That’s more game-playing than I like. Carefully reasoned choices are what count. Incorporating both incentives and penalties would, over time, encourage players to become much better at analyzing designs and understanding what works.
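For readers who have not met Elo ratings, here is the standard two-party update as a short sketch. The formula is the usual chess one; the K-factor of 32 and the starting rating of 1500 are conventional example values, and how Eterna would pair up players or designs is not described in this thread, so that part is left open.

```python
def expected_score(rating_a: float, rating_b: float) -> float:
    """Expected result for A against B, between 0 and 1."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))


def update(rating_a: float, rating_b: float, score_a: float, k: float = 32.0):
    """Return the new (rating_a, rating_b) after one comparison.

    `score_a` is 1.0 if A did better, 0.5 for a tie, 0.0 if B did better.
    Whatever A gains, B loses, so ratings can go down as well as up.
    """
    delta = k * (score_a - expected_score(rating_a, rating_b))
    return rating_a + delta, rating_b - delta


# Two equally rated participants; A comes out ahead.
new_a, new_b = update(1500.0, 1500.0, score_a=1.0)
```

An expected win against a much lower-rated opponent moves the ratings very little, while an upset moves them a lot, which is the built-in mix of incentive and penalty.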

Sly