How Eterna players can generate 100,000 promising OpenCRISPR designs in just eight weeks

Omei · August 29, 2017, 9:46pm

Well, here we are, with a new challenge and an experimental capacity for analyzing 10 times as many designs as ever before. Wow!

I can imagine that many experienced players rolled their eyes a bit when they saw the goal of ~100,000 submissions, thinking there was no way we could get so many players to generate so many solutions. But really, there is method in this madness. Hear me out.

There are a total of 32 puzzles, most of which have a per-player limit of 150. That means each of us has a potential for submitting ~5000 solutions. I know of a handful of players who can actually do this with the tools they have available. Five players generating ~5000 designs comes to a total of ~25,000 designs, which is more than twice as many as we’ve ever had in the past, but it is still far short of 100,000.
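The round numbers above can be checked with a few lines of arithmetic. This sketch assumes every one of the 32 puzzles has the 150-design limit (the post only says most do), so the per-player figure is an upper bound.

```python
import math

puzzles = 32
per_player_limit = 150   # assumption: applied to all puzzles, not just "most"
goal = 100_000

max_per_player = puzzles * per_player_limit          # upper bound per player
five_players = 5 * max_per_player                    # the "handful" of scripted players
players_needed = math.ceil(goal / max_per_player)    # players at full capacity to reach the goal

print(max_per_player)   # 4800
print(five_players)     # 24000
print(players_needed)   # 21
```

So the goal needs roughly 21 players working at the per-puzzle limit, or more players contributing smaller batches.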

How is it possible for a single player to create 5000 designs in two months? They don’t all use the same techniques, but what they all have in common is having access to programs/scripts to do a lot of the heavy lifting of generating and/or evaluating possible designs. What I want to do is to make one or more of these automation tools accessible to 100 players, instead of only 5.

Here’s an example that has been rolling around in my head the last few days. Say you have a design you think is promising. Maybe you created it, or maybe another player created it. In either case, you bring it up in the game and decide on a way to experiment with variations. You mark the bases you would like to modify with the black marker, and load the Awesome booster from the booster menu. You then choose what kind of changes you would like to see (mutations, deletions, …) and what screening criteria (e.g. satisfies constraints with Vienna2) you would like applied. You click on a button and BOOM! you have a list of all the designs that satisfy your request.

At this point, you can choose to scroll through the designs to see how they look in the game and trim the list down as much as you want. When you are satisfied, you click another button to submit the remaining designs and … (well, it won’t be BOOM!; maybe you should go take a nap) your computer will do that for you.

Jandersonlee has already laid much of the groundwork for a booster like this. But it will take more work to turn it into something that is easily accessible and useful to any player. And that is really the whole point of this post – to recruit players who are willing to contribute their time and skills to making this (or other ideas) into reality. Typically, the skill in shortest supply is Javascript experience. But a lot more than coding goes into software development, like figuring out a really good UI, testing, organizing, cheering, etc.

The floor is now open for discussion. Let the Force be with you.

whbob · August 30, 2017, 2:40pm

@Omei: when I saw the 90,000 + submissions goal, I knew we were about to experience a game changer. A quick look at the puzzles and I thought 900 submissions might be a stretch for me. I set a goal for 30 submissions per puzzle for myself. Not sure I can do it.
I welcome this challenge to muster the players to step the game up a notch.

There has been a great topic going on under “analysis” here in the forum. It’s Brourd’s “Coder Required” topic. It’s an evolving fascinating discussion, but I don’t understand the context of “structure, shape, ensemble, motif etc.”. I can code some basic javascript, but I wouldn’t know where to begin from their discussion so far. It’s my lack of understanding them, not their presentation.

I can imagine two channels of discussion. One by players to propose tactics and choose a few for testing. One by coders to translate the tactics into code and provide feedback to the players if the coding process is not clear. The clearer the mission as defined by the players, the fewer coders and the less coding time would be required.

Hopefully, this post will:
1. Identify interested players.
2. Get suggestions as to where channels of discussion should take place.
3. Define the mission of each channel with short timeline goals.

PS: The booster link above does not work. It says “could not get page block for …”
I tried the link with and without the trailing slash, with the same result.

Eli_Fisker · August 30, 2017, 6:32pm

Experiment Template - Science 101

I have an idea that I think can help us players design our own experiments.

Really it is your idea, Omei.

I just transferred and rotated it a bit, just as if it was an RNA design. ;)

Science Template Across Labs

I had long been claiming that the FMN aptamer had a favorite orientation in relation to the switching area. I knew I was right, as scores in the new labs that had the FMN oriented as I wished to see it seemed to go in the right direction. But I didn’t have a way to demonstrate it in a short and easy manner so everybody would understand.

Omei demonstrated, across different labs, how the different FMN aptamer orientations affected average fold change and max fold change, and put it in an easily viewable manner.

Science Template Inside a Single Lab

I think we can snatch that template and adapt it to designs in a single lab with different groups of experiments run against each other. (I even suspect the template could be automated later.)

I have put up a Google doc with a couple of examples of how to shift Omei’s overview for a lab round into something that can be used to test hypotheses in a single lab:

Science Template

I also put up the labs and their links in a spreadsheet. One could also put up hypotheses like this.

Science Experiment Template

For those who haven’t followed the related discussion, I have put up a collection of hypotheses I think are worth testing, plus how to mark designs with a hypothesis using # (hashtags).

MasterStormer · August 30, 2017, 6:56pm

I want to point out that designing a UI and really thinking about how it would work from the user’s perspective is just as important as coding it. If you have any ideas about how this would work, they aren’t any less valuable than time spent working on these boosters.

I will post my ideas on this tomorrow, but I will just say for now that I think it would function like Foldit, only that instead of your score going up, your number of designs will go up.

Eli_Fisker · August 30, 2017, 7:45pm

Example of Experimental Plan

I’m attempting to make an experimental plan that allows for running several hypotheses against a starter set of designs.

I am wondering if 15 designs against 15 designs is enough to gather reliable enough data for a comparison? Probably not.

Exclusion lab

I have tried to structure it so I can run two different aptamer positions in relation to MS2, in an exclusion lab (this can be done for other than FMN labs). Then I would make two different sets, where one would have one long static stem and the other set would have two shorter static stems.

I would also like to test sets with and without GU pairs in the aptamer gate. (The aptamer gate is the stem that forms at the switching end of the aptamer when the aptamer is bound to the molecule.) This is to test whether GU’s are quite helpful in the aptamer gate, which I suspect they are.

Plan for 120 designs

All in all 4 experiments, testing 4 sets of hypotheses.

4 hypotheses, 4 pairs of 15 against 15 designs, or 120 designs. For the GU test I would have 60 designs against 60.

However, it may be better to test one fewer hypothesis and leave space for more designs and better data. Silly me wishing for more slots.

Alternative plan with 80 designs

Omei · August 30, 2017, 8:05pm

Right on, @Eli. Collaborative experimentation coordinated through the systematic use of hashtags, and automated design tools, could be treated as separate endeavors. But the synergy of combining them is really much more powerful.

Let me see if I can illustrate with a simple example. I don’t think it is on your list of possible experiments yet, but I have an interest in finding out what effect, if any, the choice of closing bases of the FMN aptamer has on the fold change. Simplistic examination of past labs suggests some bases are better than others, but we’ve never systematically gathered and analyzed enough data to rigorously demonstrate the effect. Let’s think about how we might go about that experiment.

The following is a rough mockup, using existing available screenshots.

Form a core group (at least two players, say) that are interested in the experiment. Give the experiment a name.
The group agrees on the hashtag(s) to be used, the question being asked, and the data needed to test it.
The group prepares a presentation for how the general player can participate in the experiment. Here’s how that might work where there is an appropriate booster:

First, bring up an existing design you think is promising.

Click on the lightning bolt and pick the “Mutation Generator” booster (not shown in the picture, but it would become a standard booster, like the “Sequence Stamper”).
Image: https://d2r1vs3d9006ap.cloudfront.net/s3_images/1643109/RackMultipart20170830-73453-1ah61wp-Screen_Shot_2017-08-30_at_12.33.44_PM_inline.png?1504122109

Image: https://d2r1vs3d9006ap.cloudfront.net/s3_images/1643110/RackMultipart20170830-113587-rmr9yc-Screen_Shot_2017-08-30_at_12.46.38_PM_inline.png?1504122441

(This booster isn’t written yet, so the screenshot is from one of jandersonlee’s existing boosters, which is close enough to be suggestive.)
Mark the base pairs that are to be mutated and click on the Mutate button to generate all the mutations that preserve those pairings.
Image: https://d2r1vs3d9006ap.cloudfront.net/s3_images/1643111/RackMultipart20170830-81058-qjramo-Screen_Shot_2017-08-30_at_11.39.44_AM_inline.png?1504122682
Here’s what happens immediately. All the mutations (35 in this case) are listed in the area at the bottom. One could, if desired, use the Next and Previous buttons to display each design and perhaps modify or delete some. But for this particular experiment, we would want to keep all of them. So all the player would need to do to submit all 35 would be to click on the “Submit” button (not shown in the screenshot), supply the constant part of a title for all the designs (which would include the experiment’s hashtag,) and wait.
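As a rough illustration of what the Mutate step might compute, here is a Python sketch (an actual booster would be written in JavaScript). It assumes that "preserving a pairing" means replacing a marked pair with any of the six standard pair types, GU wobbles included. The function name and the toy sequence are hypothetical. Under that assumption, two marked pairs give 6 × 6 − 1 = 35 variants, which matches the count mentioned above, though that is only a guess at how the 35 arose.

```python
from itertools import product

# Pair types assumed to count as "preserving" a pairing, GU wobbles included.
VALID_PAIRS = ["AU", "UA", "GC", "CG", "GU", "UG"]

def pair_preserving_mutations(sequence, marked_pairs):
    """Return every variant in which each marked (i, j) pair of 0-based
    positions holds some valid base pair, excluding the original sequence."""
    variants = []
    for choice in product(VALID_PAIRS, repeat=len(marked_pairs)):
        bases = list(sequence)
        for (i, j), pair in zip(marked_pairs, choice):
            bases[i], bases[j] = pair[0], pair[1]
        candidate = "".join(bases)
        if candidate != sequence:
            variants.append(candidate)
    return variants

# Hypothetical toy hairpin: positions 0-9 and 1-8 are the marked pairs.
start = "GCAAAAAAGC"
mutations = pair_preserving_mutations(start, [(0, 9), (1, 8)])
print(len(mutations))  # 35
```

A real booster would additionally screen each variant (for example, with a folding engine) before listing it.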
How does that mesh with your thoughts?

Eli_Fisker · August 30, 2017, 8:29pm

This meshes beautifully with my thoughts, Omei.

Dream demonstration. I love how you demonstrate the future booster.

And it is probably easier agreeing on hashtags between two players for starters. I even disagree with my past self.

I think you are right on track with what you wish to see tested.

Just as I suspect the bases right around MS2 have a huge effect on the switch, I believe the ones around FMN do so too. I am currently using some I have generally seen in winning designs, in particular at the closed end of the aptamer.

You have suggested a method to test this systematically and I think we collectively hold the capacity to get the job done.

One more fun experiment

I have been considering, just for the fun of it, throwing in crossed GU’s (can be done in two ways) at the 4 base spot you have highlighted in your last image, in a set of different designs. (At the non switching end of the aptamer.)
This is with the specific intention of destabilizing the area and seeing if we can get the CRISPR part to interact somehow with the Riboswitch part, and if that would have any effect.

Eli_Fisker · August 30, 2017, 8:36pm

Additional thought on submission of designs in one go.

It would be helpful having something to tell them apart. So they are not all called the same. So either they get a 1,2,3 etc. Or as jandersonlee’s script does - add a specifier for the specific mutation. So if base 34 is mutated to a G, the title would bear the mutation name G34.

Thought on the FMN surroundings experiment. I suspect it will be beneficial to split the experiment of probing the bases around FMN into two sets: the 4 bases before and the 6 bases after. Here is why. The 6 bases after, at least for exclusion labs, are partly decided by MS2.

Eli_Fisker · August 30, 2017, 8:46pm

Oh, I did get in basepairs around aptamers in the Hashtag Experiments doc. However I was mainly thinking of the closing base pairs. The whole area will be interesting. Adding it in.

Omei · August 30, 2017, 9:25pm

@whbob Without scripting support, I doubt any player has enough time and patience to submit 5000 designs, especially if they put any thought into them. Thus this push to make at least some level of automation available to all players.

I think Brourd’s analysis topic has been very interesting. But analysis has its own requirements that are more complex than sequence generation/submission. I haven’t taken an active part in that discussion merely because it seems like too big a topic for me to come to grips with right now.

I like your suggestions for building momentum. Hopefully more players will continue to chime in.

Re the booster link, it should be fixed now. But you may have to force a browser page flush, or just use this: http://www.eternagame.org/web/script/7466741/.

Omei · August 30, 2017, 10:36pm

Thank you, @Eli. It’s good to have a concrete example to discuss.

I think I understand your experimental plan, and it looks quite feasible if you are doing it yourself. But my guess is that if one wants to recruit lots of players to contribute designs, it would help to break down an experimental plan into multiple, simpler _experiments_.

For example, in the case of your 120 design plan, you have three experimental variables – aptamer, distance between aptamer and MS2, and presence/absence of GU pairs, resulting in your 8 boxes. I suspect trying to analyze the data coming from independent submissions from multiple players who may not really understand your intent would be a nightmare.

What if you were to frame your plan as multiple individual experiments, each experiment consisting of just one puzzle (which would fix the aptamer) and one MS2 position, which you (plural, i.e. core group) pre-determined, along with providing a set of starting solutions that lack GU pairs. You could then present the role of an individual contributor to be choosing a starting puzzle, creating one or more designs with one or more mutations to create GU pairs, and then perhaps doing an additional mutation in a static part of the puzzle if it turns out that the sequence is rejected because someone else has already submitted it.

That seems like it might be more manageable, both for individual contributors and for the later analysis.

Omei · August 30, 2017, 10:41pm

Re "It would be helpful having something to tell them apart. So they are not all called the same. So either they get a 1,2,3 etc. Or as jandersonlee’s script does - add a specifier for the specific mutation. So if base 34 is mutated to a G, the title would bear the mutation name G34. ", I agree. I figured that would be the responsibility of the booster script.
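A minimal sketch of how a booster script might build such titles. This is Python for illustration only; the function names, the "-" separator for multiple mutations, and the hashtag in the example are assumptions, not jandersonlee's actual script.

```python
def mutation_label(original, mutated):
    """Describe each changed position as new base + 1-based position,
    e.g. 'G34' when base 34 was mutated to a G. Multiple changes are
    joined with '-' (an arbitrary choice for this sketch)."""
    if len(original) != len(mutated):
        raise ValueError("sequences must have equal length")
    parts = [
        f"{new}{pos}"
        for pos, (old, new) in enumerate(zip(original, mutated), start=1)
        if old != new
    ]
    return "-".join(parts)

def design_title(constant_part, original, mutated):
    """Append the mutation label to the constant part of the title."""
    label = mutation_label(original, mutated)
    return f"{constant_part} {label}" if label else constant_part

# '#FMNclosing' is a made-up hashtag for the example.
print(design_title("#FMNclosing test", "GCAAAAAAGC", "GUAAAAAAGC"))
# prints: #FMNclosing test U2
```

Because the label is derived from the sequence difference, two distinct mutations of the same starter design always get distinct titles.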

MasterStormer · August 31, 2017, 8:51am

How about using names also for data in the Excel spreadsheet? For example:
“#DistanceFromAptamer Distance=5 Mutations=23G12C Note=‘Full of bulges’” Or will these be other parameters in the query?
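If titles did follow a convention like the example above, pulling the data back out for a spreadsheet could look roughly like this. The sketch is Python and assumes plain straight quotes around multi-word values; the key names are simply those from the example, and nothing here reflects an existing Eterna query.

```python
import re

# Matches either a hashtag (#Name) or a key=value field, where the value
# is a single-quoted string or a run of non-space characters.
TOKEN = re.compile(
    r"#(?P<tag>\w+)|(?P<key>\w+)=(?:'(?P<quoted>[^']*)'|(?P<plain>\S+))"
)

def parse_title(title):
    """Split a design title into a list of hashtags and a dict of fields."""
    tags, fields = [], {}
    for m in TOKEN.finditer(title):
        if m.group("tag"):
            tags.append(m.group("tag"))
        else:
            value = m.group("quoted")
            if value is None:
                value = m.group("plain")
            fields[m.group("key")] = value
    return tags, fields

tags, fields = parse_title(
    "#DistanceFromAptamer Distance=5 Mutations=23G12C Note='Full of bulges'"
)
print(tags)    # ['DistanceFromAptamer']
print(fields)  # {'Distance': '5', 'Mutations': '23G12C', 'Note': 'Full of bulges'}
```

Each parsed title then maps directly onto one spreadsheet row, with the keys as column names.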

MasterStormer · August 31, 2017, 10:18am

My idea of it is, very similarly to Jason’s tools, a Foldit toolbar with buttons like “Mutate bases”, “Move MS2”, “Insert/Remove Bulges”, “Mutate MS2” (change to another valid MS2 sequence), etc. As you go on it will generate more sequences, and will have a counter for how many you have, similarly to Foldit’s score. When you click on any of them, it will print it to the applet, and there you will be able to edit it, and then replace the old one, create a new one with your mutations, or simply remove it from the list.

You will be able to filter them via basic filters (bases 50-65 match the RY pattern you provide, includes a specific sequence, etc.), or create your own (it’s very basic scripting, so even in an in-game textbox).
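The two basic filters mentioned here are simple to express. Below is a Python sketch (a real booster would be JavaScript) in which R means purine (A or G), Y means pyrimidine (C or U), and N means any base. All function names and the sample sequences are made up for illustration.

```python
PURINES = set("AG")      # R
PYRIMIDINES = set("CU")  # Y

def matches_ry(sequence, start, pattern):
    """True if the bases beginning at 1-based position `start` fit an
    R/Y/N pattern (R = purine, Y = pyrimidine, N = any base)."""
    window = sequence[start - 1:start - 1 + len(pattern)]
    if len(window) != len(pattern):
        return False  # pattern runs past the end of the sequence
    for base, code in zip(window, pattern):
        if code == "R":
            ok = base in PURINES
        elif code == "Y":
            ok = base in PYRIMIDINES
        elif code == "N":
            ok = True
        else:
            raise ValueError(f"unknown pattern code: {code}")
        if not ok:
            return False
    return True

def contains(subsequence):
    """Build a filter that keeps designs containing `subsequence`."""
    return lambda sequence: subsequence in sequence

def apply_filters(designs, filters):
    """Keep only the designs that pass every filter."""
    return [d for d in designs if all(f(d) for f in filters)]

designs = ["GCAUGGCU", "GCCUGGCU", "AAAUGGCU"]
filters = [lambda d: matches_ry(d, 3, "RY"), contains("GGC")]
kept = apply_filters(designs, filters)
print(kept)  # ['GCAUGGCU', 'AAAUGGCU']
```

Since every filter is just a function from a sequence to true or false, player-written filters would plug into the same list.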

I think of 2 more types of scripts:

Custom mutation (e.g. add a bulge of size 2, change a random A to G).
Series of mutations (e.g. mutate a random base -> remove/add a bulge -> maybe move the MS2 one base -> repeat). I do think the 2 types should be separate, as that will allow type 2 to have a very simple graphical editor (with dropdowns for every type 1 script), but I’m not totally sure on that one. We will have to plan more in order to decide.
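One way to picture the split between the two script types: a type 1 script is a single function from a sequence to a mutated sequence, and a type 2 script is just an ordered list of type 1 scripts applied repeatedly. The Python sketch below uses two simple, length-preserving mutations as stand-ins (bulge insertion and MS2 moves are left out), and every name in it is hypothetical.

```python
import random

# Type 1: single custom mutations. Each takes a sequence and a random
# number generator and returns a new sequence.
def mutate_random_base(seq, rng):
    i = rng.randrange(len(seq))
    new = rng.choice([b for b in "ACGU" if b != seq[i]])
    return seq[:i] + new + seq[i + 1:]

def change_random_a_to_g(seq, rng):
    positions = [i for i, b in enumerate(seq) if b == "A"]
    if not positions:
        return seq  # nothing to change
    i = rng.choice(positions)
    return seq[:i] + "G" + seq[i + 1:]

# Type 2: a series of type 1 steps, repeated for a number of rounds.
def run_series(seq, steps, rounds, rng):
    """Apply the steps in order, `rounds` times, recording the sequence
    reached at the end of each round."""
    results = []
    for _ in range(rounds):
        for step in steps:
            seq = step(seq, rng)
        results.append(seq)
    return results

rng = random.Random(0)  # seeded so a run can be repeated
steps = [mutate_random_base, change_random_a_to_g]
results = run_series("GCAAAAAAGC", steps, rounds=5, rng=rng)
print(len(results))  # 5
```

With this shape, a graphical editor for type 2 only needs to let the player pick and order entries from the list of type 1 scripts.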

Maybe type ‘1)’ (and by extension type ‘2)’) scripts could accept filters? For example, mutate the MS2 only to MS2 sequences following the Strong/Strong/Weak ending (per eternac’s comment here).

Also, some questions:

Does a query to submit a lab currently exist? If not, I doubt it matters yet, as for prototyping / early gathering a Firebase server could be set up for now.
You said that we can get 100k designs with this setup. However, do you think that duplication of designs won’t be a problem? Especially if a lot of players start with the same sequence. Maybe we should use more than 24 versions of the MS2?

Eli_Fisker · August 31, 2017, 3:23pm

@Omei, thx for your feedback.

“I suspect trying to analyze the data coming from independent submissions from multiple players who may not really understand your intent would be a nightmare.”

I am certain you are quite right.

My experience from round 101 was that the more numerous and smaller the experiment sets were, the more painful the analysis.

Also, it is a great workaround to mutate in the static region of the puzzle if the design a player wants to submit is already submitted.

Multiple Individual Experiments

Let me see if I get what you are proposing and let me try again. Ok so instead of multiple variables tested against each other on individual player basis, we could start off with an experiment design series based on the same starter design. Or it could be considered set zero.

Generate Experiment starter design
Create series on that starter design.

This could be considered first experimental set that one player could post.

Split the variables out so one player only gets one.
Several different sets of hypotheses could be run against the first design set. Each hypothesis will in itself be a new experiment and a set of designs to submit for a player.

An Experiment design starter example

Since examples are helpful, I have created an experiment example starter design. I based it on the CRISPR/CAS FMN 1a Exclusion lab.

Experiment 1 - GU or not in aptamer gate

Basically only one variable change. Namely the GU content.

1a) Set 0

MS2 before the second aptamer sequence #ms2a2
One long static stem #StaticStem1
One GU pair in the aptamer gate #GU+

1b)

MS2 before the second aptamer sequence #ms2a2
One long static stem #StaticStem1
No GU in aptamer gate #GU-

Each of these subsets of the experiment could be done by a different player, with something like 100 designs per set, or both could be run by the same player (50+50 slots).

That is one hypothesis tested.
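For the later analysis, hashtags like these make it straightforward to split submitted designs into the two arms of the experiment. A Python sketch follows, assuming hashtags survive intact in design titles and that a trailing "+" or "-" is part of the tag. The design titles are invented.

```python
import re

# A hashtag is '#' plus letters/digits/underscores, optionally ending in + or -.
HASHTAG = re.compile(r"#[A-Za-z0-9_]+[+-]?")

def hashtags(title):
    """Return the set of hashtags found in a design title."""
    return set(HASHTAG.findall(title))

def split_by_variable(titles, shared, arm_a, arm_b):
    """Group titles that carry all `shared` tags into the two arms of an
    experiment, skipping titles tagged with neither or both arms."""
    groups = {arm_a: [], arm_b: []}
    for title in titles:
        tags = hashtags(title)
        if not shared <= tags:
            continue
        if arm_a in tags and arm_b not in tags:
            groups[arm_a].append(title)
        elif arm_b in tags and arm_a not in tags:
            groups[arm_b].append(title)
    return groups

titles = [
    "design 1 #ms2a2 #StaticStem1 #GU+",
    "design 2 #ms2a2 #StaticStem1 #GU-",
    "design 3 #ms2a2 #GU+",  # lacks #StaticStem1, so it is left out
]
groups = split_by_variable(titles, {"#ms2a2", "#StaticStem1"}, "#GU+", "#GU-")
print(groups)
```

Requiring the shared tags keeps designs from other experiments out of the comparison, so only the one intended variable differs between the arms.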

However it is possible to test more in the same go.

Experiment 2 - Deleting the turnoff sequence

From round 101 I saw that deleting the turnoff sequence killed the switch score. (For the turnoff sequence, see the image; it’s the sequence that takes turns to turn off FMN and MS2.)

I had a small set of designs confirming this. However I think it will be valuable confirming in bigger numbers, as it is proof of function for that sequence stretch.

2a) Set 0

MS2 before the second aptamer sequence #ms2a2
One long static stem #StaticStem1
One GU pair in the aptamer gate #GU+

2b)

MS2 before the second aptamer sequence #ms2a2
One long static stem #StaticStem1
One GU pair in the aptamer gate #GU+
Delete the pyrimidine stretch of C’s and U’s, before the MS2 #TurnoffDeletion

Only one variable change: deletion of the TurnoffSequence. And yet another experiment set for a player to make.

Remaking of hashtags for specifying order of MS2 to aptamer sequence

I will need to reconsider some of my new hashtags. Something weird happens with my naming of puzzles. I think I have found a lab bug.

On the lab page the design name is far longer.
When viewing the design in the game, everything in the title after "

Eli_Fisker · August 31, 2017, 4:41pm

“Mutate MS2” (change to another valid MS2 sequence)

MasterStormer, I in particular like this option. We lack the option to mutate on the full set of MS2’s. I think it will be very valuable. Even if most won’t be stable, getting those that are in fast, and then deleting the unstable ones in one go, would be really helpful and ensure that more of the alternative MS2’s get tested more fully.

On the duplicate part, the design will get rejected if the sequence is already submitted to lab.

Omei · August 31, 2017, 5:49pm

Re your questions:

Does a query to submit a lab currently exist? No. But I should be able to create a script that does that, which any other script can then call.
Duplicating designs when multiple players are trying to contribute to the same focused experiment will definitely be an issue, and not specific to MS2 sequences. As Eli pointed out, the submission process will reject duplicates, but having that happen often would be a disincentive for the player. My general thoughts are that for a focused experiment, part of the experiment’s design should be the designation of some set of bases as being available for mutations that aren’t expected to significantly affect the fold change. Typically, I think this would be part of a static stem where flipping a base pair would not be expected to make any difference in the partition function. How that would actually be implemented, from the players’ point of view, is an open question.
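One possible way a booster could apply this idea: the experiment designates some static-stem pairs as neutral, and when a sequence has already been submitted, the script flips as few of those pairs as it can until the sequence is new. The Python sketch below is illustrative only. The names are hypothetical, the set of submitted sequences would have to come from the lab, and whether a flip is truly neutral is the experiment designer's assumption.

```python
from itertools import combinations

def flip_pair(seq, i, j):
    """Swap the bases at 0-based positions i and j (e.g. G-C becomes C-G)."""
    bases = list(seq)
    bases[i], bases[j] = bases[j], bases[i]
    return "".join(bases)

def make_unique(seq, neutral_pairs, submitted):
    """Return `seq`, or a variant with the fewest neutral pairs flipped,
    that is not in `submitted`. Returns None if every combination is taken."""
    for k in range(len(neutral_pairs) + 1):
        for subset in combinations(neutral_pairs, k):
            candidate = seq
            for i, j in subset:
                candidate = flip_pair(candidate, i, j)
            if candidate not in submitted:
                return candidate
    return None

# Toy example: two designated neutral pairs in a static stem.
neutral = [(0, 9), (1, 8)]
already_submitted = {"GCAAAAAAGC"}
print(make_unique("GCAAAAAAGC", neutral, already_submitted))  # CCAAAAAAGG
```

With n designated pairs this allows up to 2^n players to submit what is otherwise the same design, so the designer can size the neutral region to the expected number of contributors.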

Omei · August 31, 2017, 11:24pm

This seems to me like a big improvement in focus, @Eli.

But now let me bring up another issue, and that is generalizability. Let’s consider the hypothesis you want to test to be “One or more GUs in the aptamer gate correlates with higher fold change”. That’s a general statement, with no restrictions on what aptamer, or what kind of gate it is, or what the reporter is, or what the other aspects of the switch are, etc.

Suppose you were to collect all your data for this hypothesis on variations of the one starter design you showed above, and they confirmed your hypothesis. Would this be evidence that the general statement above was true? Or would it be evidence that for this specific combination of aptamer, aptamer gate, reporter, blah blah blah, it was true? More the latter than the former, which would be kind of disappointing. The issue is that showing a statistically significant effect in an experiment relevant to one specific switch pattern is not nearly as scientifically significant as one that is generalizable to a broad class of switch patterns.

So I’m thinking your next step should be to decide how broad a class you think you want to generalize this hypothesis to, and then try creating a set of starter puzzles that together is representative of the broader class as a whole. Obviously, the broader the class you want to generalize to, the more starter puzzles you need to form a representative set.

I suspect you were intuitively thinking along these lines when you drew out a complete matrix of test conditions, and you may actually end up with a set of starter designs similar to the ones you would have created for that matrix. The advantage will be that your experiment will be more focused on testing one hypothesis well, as opposed to testing many hypotheses at the same time, and ending up with data that is insufficient to convincingly answer any.

Eli_Fisker · September 1, 2017, 3:36pm

Omei, thanks for your guidance!

I will try to focus the hypothesis.

“But now let me bring up another issue, and that is generalizability. Let’s consider the hypothesis you want to test to be “One or more GUs in the aptamer gate correlates with higher fold change”. That’s a general statement, with no restrictions on what aptamer, or what kind of gate it is, or what the reporter is, or what the other aspects of the switch are, etc.”

The lack of restrictions on what aptamer, aptamer gate and reporter is because I wish to test if it is general to all switches in the whole round (plus future switch elements), also the new ones we have no data on. No matter what their aptamer, reporter and lab type.

Lesson 1: Make a hypothesis that is universal enough that it can be tested on a huge set of different labs. Not just in one single lab.

This will reveal the bigger view of what is on the puzzle. A single lab only provides a smaller piece of the puzzle. Specifically, only one possible setting of the puzzle is tested.

“Suppose you were to collect all your data for this hypothesis on variations of the one starter design you showed above, and they confirmed your hypothesis. Would this be evidence that the general statement above was true?”

Rhetorical question. Answer is of course no.

I did only one Experiment demo puzzle, in an Exclusion lab and with a specific aptamer (FMN) and reporter (MS2), at one specific stem at a specific place in relation to the aptamer. And in particular, I was focusing on only a single spot.

I had a reason for that, but I now see it is not good enough. See the later section: What I really want to test.

"Or would it be evidence that for this specific combination of aptamer, aptamer gate, reporter, blah blah blah, it was true? "

Rhetorical question. The answer is of course yes.

“More the latter than the former, which would be kind of disappointing. The issue is that showing a statistically significant effect in an experiment relevant to one specific switch pattern is not nearly as scientifically significant as one that is generalizable to a broad class of switch patterns.”

I was aiming for detailed precision but would miss out on the big picture. That would give me an answer about what is on a microdot on a huge image. While I had planned to do the same on the rest of the labs, it would still only give me spread out microdots of a small part of the full picture.

Lesson 2: Test your hypothesis not just in one specific type of solve in a lab puzzle, but test it in a wide array of settings of that puzzle. And do the same for other labs.

What I really want to test

I wish to know if GU’s positioned in the switching area are beneficial to help the switch get switching easier. Something that jandersonlee suspected and I think too.

What I really want to test is the usefulness of GU’s in the whole switching area, across states, and in a huge number of different puzzles. That GU’s seem beneficial is an old observation I made in our early switches, where most of our first few winners had a GU in their switching area, disproportionately often.

All this I thought harder to test. Hence the focus on a single GU pair at a single spot in a single specific state. I thought this would be easier to test.

Background posts

GU-pairs in switch designs

Use of GU in two input labs

I have also seen GU’s being helpful in the XOR labs. A good deal of winners even had crossed GU’s inside long stems that needed to get switching.

Crossed GU’s are now legit

Eli_Fisker · September 1, 2017, 3:45pm

One hypothesis - multiple experiments

So I give it another go trying to set up an experiment plan.

Hypothesis: Are GU in switching area helpful or harmful to the switching ability of an RNA switch?

1a) One static stem, MS2 before second aptamer, with GU

1b) One static stem, MS2 before second aptamer, without GU

2a) One static stem, first aptamer sequence before MS2, with GU

2b) One static stem, first aptamer sequence before MS2, without GU

3a) Two static stems, MS2 before second aptamer, with GU

3b) Two static stems, MS2 before second aptamer, without GU

4a) Two static stems, first aptamer sequence before MS2, with GU

4b) Two static stems, first aptamer sequence before MS2, without GU
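The eight conditions above are the full cross of three two-valued variables, so the list can be generated rather than typed, which helps if a variable is added later. A small Python sketch follows; the 15 designs per condition is the figure from the earlier plan, used here only to reproduce the 120 total.

```python
from itertools import product

stems = ["One static stem", "Two static stems"]
orders = ["MS2 before second aptamer", "first aptamer sequence before MS2"]
gu = ["with GU", "without GU"]

conditions = []
for k, combo in enumerate(product(stems, orders, gu)):
    # Label as 1a, 1b, 2a, ...: the number picks stem/order, the letter picks GU.
    label = f"{k // 2 + 1}{'ab'[k % 2]}"
    conditions.append(f"{label}) " + ", ".join(combo))

for line in conditions:
    print(line)

designs_per_condition = 15
total_designs = len(conditions) * designs_per_condition
print(total_designs)  # 120
```

Every pair "a"/"b" differs only in GU content, which is what makes each pair a clean test of the one hypothesis.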

One hypothesis - potentially more insights gained

Things that could also be interesting to look out for, related to the hypothesis that GU is beneficial in the switching area:

ON and OFF states may not benefit equally from a GU addition.
Exclusion and Same state designs may not benefit equally either.
Also, in designs that hold a rotated aptamer, the rotated aptamer provokes a different solving style, so GU’s may work differently there too.
Different states may differ in their need for GU’s.

However, my suspicion is that GU’s help prepare a switching stem for switching and help it swap states. Not all GU’s everywhere in the switching area are expected to help. Some will worsen things.

Running the experiments

This could be run several ways:

One player could generate all the necessary designs, much like in my experiment plan, giving each design type 15 designs (8x15=120).
Several players could do as player one, each following the same template.
Each set could be given to a different individual player.

Advantages and disadvantages

I have tried to sum up what I think may be advantages and disadvantages of each option.

Disadvantage: Although one is testing only one hypothesis, in 8 different settings, one is still not generating many designs for comparison.

Disadvantage: It takes more work

Advantage: One can get started immediately and others can tag along.

Disadvantage: It takes more work for the individual player to generate 8 sets of puzzles, even over 8 design starter templates, than to just generate a ton of mutations in a single experiment set based on 1 starter template and one fixed variable.

Advantage: If one player misunderstands the task, the other players can still make good use of the data they generated as one whole set won’t be missing.

Advantage of option two for now. We can get started generating experiments now and have others follow when they are ready.

Disadvantage: Can only be run properly with many players working together.

Future advantage: If there are many players participating in the experimentation, this will be the best option. Since several players can run the same set, we should not be as prone to missing out on a full set.