Player/dev collaboration needed to create an effective, rapid, in-round Select-Test-Publish-Analyze cycle

There are a lot of steps needed to transform our success in OpenTB Round 2 into a working diagnostic. The Gates Foundation has given the Das Lab a grant for the “hardware” part of that transformation. But when the dev team started thinking in more detail about the whole process, we realized the weakest link in the current process is getting rapid player feedback, in the form of new designs that address changing requirements, as the engineers/experimentalists narrow down the detailed design of a point-of-care device.

The most obvious case in point is getting array results for the Round 4 OpenTB designs. This has been an issue because Johan Andreasson, the post-doc who helped pioneer this technique and who personally conducted the Eterna array experiments, has gone on to a new job. He is still available for consultation, but can’t spend the time needed to guide the experiment through its many steps. Fortunately, Feriel Melaine, the new post-doc who has successfully replicated the array results for the AK2.5 design using a bead-based experiment, has now agreed to take on the task of getting us array-based results for the Round 4 designs.

But in addition to that, we realized that we didn’t really have adequate structure in place to run a rapid test cycle on the order of one every two weeks. The lab believes they can complete their part of the work (receive the list of designs, order and receive the DNA templates needed as inputs to the experiments, run the experiments and return the data to players) in one week. But a two-week cycle implies that players would then have only one week to look at the results, analyze them, disseminate that analysis to other players so they can create and submit new versions of their designs, and then collectively decide which designs should be submitted to start the next rapid feedback cycle. We have never done that before. Now we need to create a process for it, and we’ll be using the extended Light-up Sensors project as our testing ground.

Hoping to help organize the discussion a bit, I propose subdividing the player part of the process into individual steps, starting with the step that is about to fall into players’ laps with the first 6 designs (currently in the lab):

  1. Players (typically the more experienced ones) analyze the results and try to distill what it “means”.
  2. Analyses are disseminated in a form that is easily accessible by all lab players.
  3. Players submit new designs based on what they think they have learned from the analysis and subsequent discussion.
  4. Players select a new set of designs for the next rapid feedback cycle.
I’ll immediately follow up with a separate “Reply” on each of these substeps, and if that organization fits with what you want to say, you can “Comment” on the corresponding “Reply”. But there’s no need to feel constrained to that structure if it doesn’t suit you.

@omei, nice post in game! Time to fill out the placeholder text in this getsat post. =)

Analysis step

What techniques or tools do you find useful in analyzing results?  What can we do to make it easier for other players to use them?


Dissemination step

Standard practice would be to create a forum topic for the rapid feedback round and not try to further organize the discussion.

My impression, though, is that most players who are submitting designs do not routinely follow a forum thread, if they ever see it at all.  There must be a better way to reach more players. But what is it?

Design step 

What techniques or tools do you use to create new designs?  Is there something we can do to make it easier for other players to learn about and/or use them?


Selection step

One thought here is to create a way to add automated bot analysis to player analysis.  We could simplify the creation of evaluation bots, so that players need only express the evaluation function in code. For example, it would be reasonably easy for the dev team to write a generic evaluation bot that a player could tap into with a minimum of scripting skill:

Suppose you thought that having a specific range of GC pairs in each state was important for a good switch.  You might write an EternaScript as simple as this:

function Evaluate(design) {
  var score;
  if (design.state1.GC_percentage >= 50
      && design.state1.GC_percentage <= 70
      && design.state2.GC_percentage >= 40
      && design.state2.GC_percentage <= 60) {
    score = 100;
  } else {
    score = 0;
  }
  return score;
}

(Of course, advanced scripters could write a much more complex evaluation function.)

You would then tell the player-led development team the ID of your script, and it would be added to the list of evaluation bots.  On a regular basis, the team would update a spreadsheet and/or fusion table that evaluated all submitted designs with all bots.  Players could use these predictions to inform their selection of designs for the next rapid feedback cycle.
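
To make that concrete, here is a rough sketch of what the generic bot’s harness might look like. It’s only an illustration under assumptions: the function names and design fields are placeholders I’m making up, not an existing EternaScript API.

function runEvaluationBots(designs, evaluators) {
  // evaluators: { botName: function(design) -> numeric score }
  var rows = [];
  for (var i = 0; i < designs.length; i++) {
    var row = { designId: designs[i].id };
    for (var botName in evaluators) {
      // Guard each bot so one broken script doesn't stop the whole run.
      try {
        row[botName] = evaluators[botName](designs[i]);
      } catch (e) {
        row[botName] = "error";
      }
    }
    rows.push(row);
  }
  return rows; // one row per design, one column per bot, ready for a spreadsheet
}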

I think we could develop scripts to aid us with the analysis.

I’ve seen a template box in jandersonlee’s lab slot count script.

I think we can eventually even use scripts to help us do lab analysis, based on experiments we set up in a particular way, where we input data into a script and get a summary back out in a template.

Like what Omei did in a table. I wish for a simplified script that could do part of this, like calculating the fold change average.

So if we set our experiments up in a particular way - like making design series that test different settings - we could have a script help us sum up the data and see what the trends are.

This is an in-game way of doing what some of us are doing in Google Sheets and fusion tables.

What should such an analysis script do?

To make a simple illustration: I make two design series with the same number of designs.

  • Two sets of comparable designs changing just one thing, like the orientation of one aptamer.

I place the two MGA sequences around the tryptophan aptamer. I want to know what is most effective: placing the big (longer) MGA sequence before the aptamer or after it.

[Image: MGA sequence (green), tryptophan aptamer (orange)]

So I make two series of 10 or 20 designs each.

When the results come back I want to know which approach did better. Today I try to get this info by scanning a fusion table or a Google spreadsheet.

1) Compare 2 or more design series

I pull up the future analysis script. I tell it which lab round or lab to use. I give it the names of my two design series in the script input field.

  • An input field could be used to call up two design series by their titles. Then all the designs in each series would be read in. There should be an additional field to limit by player, to filter out potential mods (copies made by other players).

2) What could be calculated?

  • Score average
  • Fold change average
  • Global fold change average
  • Max fold change error rate

Just 3-4 things like score average, fold change average, global fold change average, and a limit on fold change error would give a valuable and quick way to compare two design series. The script will do the averaging. This will help make trends stand out between different data sets.
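
A loose sketch of what the core of such a script could do; the field names (title, score, foldChange, foldChangeError) are just placeholders for whatever the real lab data uses:

function summarizeSeries(designs, seriesTitle, maxFoldChangeError) {
  // Pick out one design series by its title prefix and drop noisy measurements.
  var selected = designs.filter(function (d) {
    return d.title.indexOf(seriesTitle) === 0 &&
           d.foldChangeError <= maxFoldChangeError;
  });
  var avg = function (values) {
    if (values.length === 0) return null;
    return values.reduce(function (a, b) { return a + b; }, 0) / values.length;
  };
  return {
    series: seriesTitle,
    count: selected.length,
    scoreAverage: avg(selected.map(function (d) { return d.score; })),
    foldChangeAverage: avg(selected.map(function (d) { return d.foldChange; }))
  };
}

// Comparing the two series from the example above might then be:
// summarizeSeries(designs, "MGA before aptamer", 0.5);
// summarizeSeries(designs, "MGA after aptamer", 0.5);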

Later there could be options as to what lab data should be used for input. 

I think we may eventually be able to use scripts not just as lab design tools, but also for testing ideas and getting answers about trends.

Benefit of script analysis

With just a little preparation of how we set up our lab slots, we can standardise the input (design series) in such a way that a script could help us run analysis on what we have done in the lab. Just naming these design series consistently would allow us to pull them out as a group for analysis.


Loose thought: we get a heads-up when there is a new message in our inbox.

Typically the related forum posts are already linked into the lab description of the current open lab. I wonder if some connection could be made so there is a highlight when there has been a new posting in the forum thread related to the lab.

If something could be attached to the forum post links, the highlight could appear automatically whenever there is a new post.


I would like to have the predicted structure of each state of each design included with results in an excel or google docs spreadsheet. This would allow me to much more easily write scripts for and analyze the data on my own. It would also be helpful if the spreadsheet included the comments so designs could be filtered by hashtag.
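
To illustrate the hashtag filtering I have in mind, something simple could already work once the comments are in the spreadsheet; the column names here are just placeholders for whatever the spreadsheet ends up using:

function hasHashtag(row, tag) {
  // Collect all #hashtags from the title, description and comments text.
  var text = [row.title, row.description, row.comments].join(" ");
  var tags = text.match(/#\w+/g) || [];
  return tags.indexOf("#" + tag) !== -1;
}

// e.g. rows.filter(function (r) { return hasHashtag(r, "sameState"); });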

@Meechl, that’s very doable. Quite a while ago, LFP6 wrote a script at my request to calculate and list the foldings for a puzzle, including switches.  My request at that time was pretty narrow (e.g. only evaluate puzzles that had been synthesized), but tweaking it to change what designs are displayed and what is displayed about each design is small potatoes compared to what he spent most of his time on (i.e. asynchronously interacting with the flash app’s folding engine.)

I made a copy of his script at https://eternagame.org/web/script/8959810/ which I modified to (by default) process all designs, submitted or not.  There’s a spreadsheet from the results at https://docs.google.com/spreadsheets/d/13i3xo_u5zr3AtispmZkIv9ibe1ftHxoJBobZi5Q9J1Q/edit#gid=3521795…

Let’s discuss what other script changes will be most useful to support analysis.


(The above spreadsheet only includes one puzzle, Tryptophan A Same State (MGA).) 

Certainly the first thing to add is the design name and description, since many players add hashtags there.  I think we should add comments, too, so that players can add hashtags to other players’ designs. Reporting those will probably slow down the script a bit, since with the current API that is a separate API call for each design.

Right now, it takes about a second per design to get the folding. Even at 12,000 designs, that would be about 3.5 hours, which would be pretty tedious if we wanted to update the spreadsheet once or more a day.  But it shouldn’t be hard to save the last design ID for each puzzle in the browser’s local storage so that, as long as one person is always running the script, it would only need to process newly submitted designs. Hopefully we will automate the process even more than that if the spreadsheet proves its worth.
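
Something along these lines, using the browser’s standard localStorage, should do it; the key name and design fields are illustrative:

function getNewDesigns(puzzleId, designs) {
  var key = "lastProcessedDesign-" + puzzleId;
  var last = parseInt(localStorage.getItem(key) || "0", 10);
  // Assumes design IDs increase with submission order.
  var fresh = designs.filter(function (d) { return d.id > last; });
  if (fresh.length > 0) {
    var maxId = Math.max.apply(null, fresh.map(function (d) { return d.id; }));
    localStorage.setItem(key, String(maxId));
  }
  return fresh; // only these need to be folded on this run
}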

Hm. I just noticed in the spreadsheet there are duplicate lines for most design IDs. I see the duplicates in the script output too, but they don’t come out consecutively.  I’ll look into that.

I would have to agree with Meechl that having structures for each state is useful, and more than that, having the structures for each state for each engine as well. Just to add statistics available to the average player (whether or not they are useful I really don’t know), maybe include the pair counts and delta pair counts for each state.
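
Those would be cheap to compute once the dot-bracket structures are in the spreadsheet; a quick illustration in plain JavaScript, no special API needed:

function pairCount(structure) {
  // Each "(" in a dot-bracket structure marks one base pair.
  var count = 0;
  for (var i = 0; i < structure.length; i++) {
    if (structure.charAt(i) === "(") count++;
  }
  return count;
}

function deltaPairCount(state1Structure, state2Structure) {
  return pairCount(state2Structure) - pairCount(state1Structure);
}

// e.g. deltaPairCount("(((....)))....", "((((....))))..") returns 1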


I’m glad there is a way to get the folding. I’m mostly interested in having the structures after we have results, so perhaps the script could just be run after designs are sent off to be tested. I agree with Brourd that it would be nice to have structures for different engines too. 

The factors that I currently use most are (1) a ranking of the top designs in each sublab (top 10 to 20 would be nice), and (2) some basic data including sequence, id, player, title, and the MFE for each state, plus (3) the Eterna score and sub scores for each category (base, folding, switch). After that I start looking at dot plots and doing mutation rounds, which is largely a manual/visual step at this point. For me it would be nice to have this data available in a textual tabular (e.g. .csv) format for external processing. But as I move forward with EternaScript and Booster design, having an API to get it in the scripting environment might be useful.

At the moment, most of my time is spent collecting and organizing this data and constructing links to be able to view the designs in the lab design tool. If there was some way that I could go straight from a list of the top designs for a sublab into an edit session, with the ‘u’ and ‘d’ keys working to switch between designs, that would be great. Also, rather than just raw score as the ranking, I like to throw in designs that have ‘different’ folding styles, so having some measure of how similar/different two designs are might be useful. And some way to pick and choose designs of interest to “me”.
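
On the similar/different point, even something as crude as the fraction of positions where two strings differ (applied to either the sequences or the predicted dot-bracket structures) might be enough to spread picks across folding styles; a rough sketch, not a settled metric:

function stringDistance(a, b) {
  // Fraction of positions where the two strings differ;
  // assumes equal lengths, as for designs within one puzzle.
  var diffs = 0;
  for (var i = 0; i < a.length; i++) {
    if (a.charAt(i) !== b.charAt(i)) diffs++;
  }
  return diffs / a.length; // 0 = identical, 1 = completely different
}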

Although we may ultimately find some metrics that work, I’m not yet convinced in this regard; still, it may be a fruitful line for some to pursue. What might be interesting, though, is for some of the “bot” designers (e.g. ViennaUCT, SaraBot, …) to express what metrics *they* use in a way that might help players to do similar manual or assisted design.

I tend to use the lab tool a lot, plus some of my own javascript-based tools. A first step is usually to look at better scoring designs and pick out a few that are sufficiently “different” from each other to warrant parallel mutation – i.e. not all eggs in one basket. (I don’t tend to do much original design these days.) I use the Mutation/Submission Booster a lot for this step, typically after manual edits to strengthen necks/stacks aided by the dot plot data. I’d like to be able to prune not just on whether or not a design passes a design engine, but on the delta between the MFE in the two states. What might also be nice is if players could vote on designs as potential targets for mutation. This might help less experienced players get a hint on what to try to modify, and could be useful if you think there are more potential targets than you can manage on your own.
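
For the MFE-delta pruning, the booster or a script would only need the predicted free energies of the two states; the field names below are assumed for illustration, not an existing API:

function mfeDelta(design) {
  // Energy gap between the two predicted states, in kcal/mol.
  return Math.abs(design.mfeState1 - design.mfeState2);
}

// e.g. keep only designs whose gap falls inside a chosen window:
// candidates.filter(function (d) { return mfeDelta(d) >= 1.0 && mfeDelta(d) <= 5.0; });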


The comments field for designs could possibly be used after the lab for annotations regarding a design. The votes field could be used as a like function. That way people could comment on designs they think might be worth study/mutation. Having a way to rank the designs by number of likes and/or comments would make it easier to find this information.
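
If the spreadsheet or script exposed vote and comment counts per design, that ranking would be a simple sort; a sketch with assumed field names:

function rankByAttention(designs) {
  // Most-liked / most-discussed designs first.
  return designs.slice().sort(function (a, b) {
    var scoreA = (a.votes || 0) + (a.commentCount || 0);
    var scoreB = (b.votes || 0) + (b.commentCount || 0);
    return scoreB - scoreA;
  });
}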

@Brourd, Meechl, jandersonlee – Lots of good suggestions here!

As a concrete step forward, Gerry Smith and I have agreed to work on an effort to use past lab switch data to create a Vienna2-based prediction engine for fold change. This should be (hopefully significantly) more helpful in selecting designs than the current in-game Vienna2 Switch/NoSwitch prediction.  But even before we have a full predictor, we should be able to provide CSV files with various predictions like MFE foldings and energy values for others to use in their own way.

We’ll be coordinating this effort in a newly created Player Led Development Slack channel named #vienna2-predictions. We’ll publish our progress here and/or with in-game news posts. But if you would like to actively contribute to the effort, ask LFP6 or me for an invitation to the Slack team.

Perhaps we could agree to all use one spreadsheet or fusion table, and individual scripters could add in columns based on the outputs of their favorite bots.