Looking for Player Input on Current Lab Round (R107)

Just want to put it out there that I’d prefer even less time between rounds - at the very least, there should always be SOMETHING for players just finishing the tutorials to work on.

@whbob To be sure, it’s sad to see players’ creativity reduced, either by cutting their submissions or by not letting them submit their designs in the first place. This really gets to the question that Johan asked above: do we think there will continue to be enough player involvement at this point in Eterna’s evolution to justify increasing the number of slots per lab (basically, by buying bigger chips)? If you have more thoughts on that, it’s probably best to write them as a comment to Johan’s post, where they will be certain to catch his eye.

@LFP6 That’s always the goal.  For something like the final round of OpenTB, it should be pretty easy to achieve because the puzzles are already defined.  It was uniquely hard for the current round because Johan and Nando had to sift through the literature to figure out what aptamers for multiple small molecules would make for appropriate puzzles, meaning they had to satisfy diverse constraints in both the game and the lab experiment.

@rhiju reminded me about another idea, one that Will Greenleaf had at the last dev meeting, that would be an alternative to bringing back some form of voting. That idea is to ask a player, when they are submitting a design, to give an estimate of what its Eterna score will be.  The reward for the design would then take into account not just the score, but also how close the prediction was.
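As an illustration only (no such reward formula exists in the game; the function name and the 0.5 weight are made-up placeholders), one simple way to combine the synthesis score with prediction accuracy would be to subtract a penalty proportional to the prediction error:

```python
# Hypothetical sketch: reward a design based on its Eterna score AND how
# close the player's self-prediction was. The 0.5 weight is a placeholder.

def reward(actual_score: float, predicted_score: float,
           prediction_weight: float = 0.5) -> float:
    """Reward = actual score minus a penalty for prediction error."""
    prediction_error = abs(actual_score - predicted_score)
    return actual_score - prediction_weight * prediction_error

# A design scoring 90: a close prediction (88) earns more than a wild one (60).
print(reward(90, 88))  # 89.0
print(reward(90, 60))  # 75.0
```

Under this kind of scheme, a player who understands switches well enough to predict their own results accurately is rewarded over one who happens to score well by luck.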

If you’re wondering why this would be an alternative to voting, it is because we are not looking at bringing back voting primarily as a way of influencing which designs are synthesized, but as a way of seeing (and improving) how well players understand what makes a good switch.

What do you think about that idea?

I like this idea. It may be interesting to see how players would rate their designs, and then compare those predictions to the final scores. It may also be better if that estimate is for a 10 point range, given there is some degree of uncertainty in the scores.

Well, the whole point is for us to make good designs, so I see no reason to rate an untested design. We test them to see whether they are good or not.

@Astroman None of us has a perfect idea of how our designs will do. But it is important for the scientific process that we try to improve our understanding, and hence our ability to predict (as best we can) what makes a design do what we want. If players simply submitted every design that NUPACK said would fold in the right way, we would never learn anything that NUPACK didn’t already know. And in that case, there would be no need for players: a computer could randomly generate NUPACK-approved solutions much faster than a human, and we could just choose the ones that scored the highest.

Speaking from my own experience, when I’ve created a design that I like, I usually do multiple variations. If I were a perfect scientist, and paying to have each design synthesized, I would hopefully formulate a hypothesis for each of my variations, record it in my lab notebook, and make a prediction of what effect my mutations would have on the results. Then, when the results came back, I would analyze them with respect to my hypotheses, in order to see which of them I should reject and which I should tentatively accept. Then I would repeat the process.

Now I certainly don’t have that kind of discipline, and I doubt whether many, if any, players do. But taking an extra few seconds to write down a number representing whether a variation would be more likely to raise or lower the score over my original would certainly be a step in that direction. And if a player doesn’t really care to think about it, they can always just take their average score from the previous round and use it as the estimated score for all their designs.

yeah, it makes more sense to do it in the second stage of each round after results. The first stage of a round it would be more of a guessing thing I would think. However you guys decide to do it is fine though!

In this case, I doubt it could even be done in this first round, because the devs couldn’t get it ready that quickly. But I certainly expect experienced players will do much better with these new ligands than they did in the first round of the FMN switch, because collectively we have learned quite a bit about RNA switches. At least some of that knowledge, and hopefully quite a lot, will generalize to these novel ligands.

Oh, are these the same puzzles? I did okay on the fifth round (101) of FMN riboswitches: http://prntscr.com/e0irqi

I think round two was my first lab; I just missed the first one. I thought these seemed familiar! I have made all new designs, so it will be interesting to see how these scores compare to them! I will go ahead and guess my high score will be better than before, 89–99ish. (:

Filling slots is a pre-synthesis task with the goal of improving synthesis results. An objective measure would be desirable, as would a proven correlation with better synthesis scores. Finally, it must be immediately available to all players from within the game presentation itself, i.e., something the game itself shows you, not something calculated off-game or by a script that not everyone has running.

So, what does the game show me that I can use? In this round we have 5 green or red boxes in common to each puzzle that I use: state 1 & reporter, state 2 & reporter, state 1, fixed site state 1, and fixed site state 2. In addition, the box % AU is available. There are currently 3 solving engines (Vienna, Vienna2, and NUPACK), so there are a total of 15 boxes to score, excluding the % AU box, which is reserved for ties or other discrimination. The penalty box (4 nt’s in a row) is also excluded, or used for exceptions.

A perfect score would be 15 green boxes, i.e., all 3 solving engines agree. (Some current solves meet this criterion.) More often, 2 of the 3 engines agree and some portion of the 3rd’s boxes are green. So, a max score of 15 boxes is available. Often only 1 engine solves, and sometimes none. (Thinking outside these boxes is penalized at this point, which may be resolved later.)

Count first the solving engines, 3, 2, or 1. First pass rank 3>2>1.
Next count total boxes green. Second pass rank 15>14>13…etc.
Next, rank boxes (and errors) on relevance to the problem. A bound molecule in state 2
with a fixed state is better than a bound molecule and a failed fixed state. (This measures the switchiness of the fixed state.) We want our solutions to switch and bind the molecule. Partial fixed/unfixed sites can be evaluated by the % of correct nt’s highlighted.

Next, did the reporter form or not form in state 2? We may bind the molecule and switch the fixed site but get no indicator. The same goes for state 1: does the reporter site switch?

Rank reporter site2 > reporter site1 or vice versa as applies to puzzle.

Assign points and totals. Decide on energy use: does |-E1| > |-E2| really matter, or is it also a tie-breaker?
Assign points and score.
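The multi-pass ranking described above could be sketched roughly like this. This is only my reading of the steps, not anything implemented in the game; the class, field names, and example box counts are placeholders:

```python
# Hypothetical sketch of the multi-pass slot-ranking heuristic:
# pass 1: engines solved (3 > 2 > 1), pass 2: total green boxes (max 15),
# pass 3: reporter formation in state 2, then state 1.
from dataclasses import dataclass, field

@dataclass
class DesignEval:
    name: str
    # green-box count per engine; 5 boxes each for Vienna, Vienna2, NUPACK
    green_boxes: dict = field(default_factory=dict)
    reporter_ok_state2: bool = False
    reporter_ok_state1: bool = False

    def engines_solved(self) -> int:
        # an engine "solves" when all 5 of its boxes are green
        return sum(1 for g in self.green_boxes.values() if g == 5)

    def total_green(self) -> int:
        return sum(self.green_boxes.values())  # out of 15

    def sort_key(self):
        return (self.engines_solved(), self.total_green(),
                self.reporter_ok_state2, self.reporter_ok_state1)

designs = [
    DesignEval("A", {"Vienna": 5, "Vienna2": 5, "NUPACK": 5},
               reporter_ok_state2=True, reporter_ok_state1=True),
    DesignEval("B", {"Vienna": 5, "Vienna2": 5, "NUPACK": 3},
               reporter_ok_state2=True),
    DesignEval("C", {"Vienna": 5, "Vienna2": 2, "NUPACK": 1}),
]

ranked = sorted(designs, key=DesignEval.sort_key, reverse=True)
print([d.name for d in ranked])  # ['A', 'B', 'C']
```

Energy comparisons and the % AU box could then be appended to the key tuple as further tie-breakers, per the last step above.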

This is how I am currently evaluating designs, with additional consideration given to how many nt’s need to change in a design to make it work in the missing solving engines. If only one engine is missing and only one nt needs to change, then that is a very near miss! I also consider the secondary shapes as displayed by the game, giving 6 total points to perfect agreement in both states. So, max score 6, min score 0. (The latter has actually happened in my recent submissions.) Finally, I look at the dotplots to see if they mostly agree on the major structure probabilities, and they often do (even NUPACK).

Someone should implement code for such an objective method and then correlate it with synthesis scores, especially those of winners, to gauge its usefulness.

Still it would be an objective scoring method for slot assignment.

A fuzzy-logic representation of the above algorithm would be helpful.

The similarity between the current puzzles and round 101 is that they use the formation of the MS2 hairpin as the output, as opposed to the reporter RNA in the OpenTB puzzles.

(When we are using MS2 as the output indicator, what happens is that the hairpin tertiary structure, when it forms, binds to a moderately large MS2 protein, which in turn is covalently bound to a fluorescent molecule.)


Here is an example from my current round, XTheo B #30, as a guide picture for the scoring idea above. I would score it a perfect score. YMMV. ;~)


Thanks for the detailed process you incorporate into your designs!
It sounds like a great system! I have heard that non-switching static stems in designs are a good idea, so I try to put them into my designs with the hashtag #non-switching-area.

Thanks for this information! So these use MS2, as opposed to the reporter in TB, for output indication. Interesting, thanks!

The players have the ability to supply solutions (as they are doing in R107).
I think their desire to play, combined with your desire for puzzle data, argues for adding more capacity.
From a practical standpoint though, some tasks may need small capacity and some large.
With the first OpenTB round, R104, the easy A/C & B/C puzzles saw scores of 80–100% after about 400 to 500 submissions. R105 (Round 2) had high scores in the first 20 submissions. In R106 (Round 3), high scores for A/C came in the first 20 submissions, and B/C came in at about the 100th submission.
I think that feedback matters a lot. First rounds may need to be large just to cover as much diversity as possible.  With feedback, each additional round may need less capacity because bad designs are weeded out early.  

Would more capacity mean just more slots or longer sequences?  My guess is just more slots.
 

Re more slots or longer sequences, there are chips that could do either or both.  But Johan was asking about increasing slots for the current round; the sequence length won’t be changing.