Is there a working Bulk Submission script?

@jandersonlee What is the Notes box at the end? I’m unable to enter text. When I click in the field, the dialog closes.

Re: Target structure, this script is manually calling the API to submit, it’s not relying on the game code to do so. The target structure would need to be requested from the game and included in the upload. That being said, I’m wondering if there’s any reason why it would be a poor idea to have a booster function that takes a title and description and uploads directly through the game to simplify things like that.

Re: Puzzle ID and barcode size. You can get the puzzle ID by extracting it from the URL via the regex code I posted earlier in the thread. There’s a booster function I believe called get_barcode_indices that will return an array of indices that need to be unique (which will correlate to one side of the barcode) - note that the barcode can vary in length as well as position
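
The regex posted earlier in the thread is not reproduced here, so the following is only a minimal sketch of the idea. `extractPuzzleId` is a hypothetical helper, and it assumes the puzzle ID is the first run of digits that forms a whole path segment of the URL; the example URLs are made up.

```javascript
// Hypothetical helper, not the regex from earlier in the thread.
// Assumption: the puzzle ID is the first all-digit path segment of the URL.
function extractPuzzleId(url) {
  // A slash, then digits, then another slash, a query/fragment marker, or the end.
  const match = /\/(\d+)(?:[/?#]|$)/.exec(url);
  return match ? Number(match[1]) : null;
}

console.log(extractPuzzleId('https://example.org/game/puzzle/12345/'));
```

Inside a booster the argument would presumably be `window.location.href`; if the real URL layout differs from the assumption above, the pattern needs adjusting.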

@DigitalEmbrace The notes box at the end is for any error messages that might accumulate during submission to be passed on to the user. For example a submission might be a duplicate of one already submitted or a barcode might already be in use. Try submitting twice and you should get several errors on the second try.

I think I’m burnt out on this for now–too many sleepless nights with my sister-in-law in the hospital. v3.1 is somewhat usable for this lab minus the Target issue. Even for that it can still be used for the mutate/prune/delete portion and then manually step through with goto/next to submit via the game. As for the barcode size/placement generalization, I leave that as an exercise for the student.

Folks in my lab (primarily @rkretsch and @gracenye8 ) are preparing scripts to design barcodes that work in python, using the ARNIE wrapper to model folding with the different barcode hairpins.

Just made public here:

Current use cases include “Creating library from already prepared sequence list”, “Creating library of sliding windows”, “Creating library for M2Seq (all single-mutants)”, and “Creating library with all single-mutants and select double”.

If folks are interested in piloting a “Creating library for Eterna bulk submission” and helping make the codebase more robust, that might allow others both inside and outside Eterna to also make use of similar scripts.

Feel free to tackle issues (Issues · DasLab/big_library_design · GitHub) and submit pull requests!

(I haven’t yet finished reading all of the later comments)

I had the exact same problem with the turtles-all-the-way-down-ness of the async, and you can’t ever use it in a 100% sync fashion, even with then…
I found success in Eterna - Invent Medicine. by using a then in the top level function, and refactoring the barcode generator at the top of the script, so that you will be able to use the generator in a 100% sync fashion inside of the then function.
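
A rough sketch of that pattern, not the actual script: the only asynchronous step sits in the top-level function, and everything inside the `then` callback, including the barcode generator, runs synchronously. `fetchUsedBarcodes`, `unusedBarcodes`, and `main` are hypothetical names, and the used-barcode lookup is replaced by a stub that resolves immediately.

```javascript
// Hypothetical stand-in for the asynchronous lookup of barcodes already in use.
function fetchUsedBarcodes() {
  return Promise.resolve(new Set(['AAAC', 'AAAG']));
}

// Synchronous generator: yields candidates that are not yet in `used`,
// and records each one it hands out so it is never yielded twice.
function* unusedBarcodes(candidates, used) {
  for (const barcode of candidates) {
    if (!used.has(barcode)) {
      used.add(barcode);
      yield barcode;
    }
  }
}

// Top-level function: a single `then`, with purely synchronous work inside it.
function main(candidates, count) {
  return fetchUsedBarcodes().then((used) => {
    const picked = [];
    for (const barcode of unusedBarcodes(candidates, used)) {
      picked.push(barcode);
      if (picked.length === count) break;
    }
    return picked;
  });
}

main(['AAAC', 'AAAU', 'AAAG', 'AACA'], 2).then((picked) => console.log(picked));
```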

The version of my generator you tried to use probably had a lot of broken generators, so that might have been part of your issue (I’m mainly working in python these days, so it took me a while to figure out how it works in js).
Also the function naming is hopefully better in 11653717 than in the original script - and either way you can see which function I used there, and it should be enough.

(To be a bit clearer - in my script each mutation includes a barcode as part of it, and each barcode is unique (currently I use nextSequence and make sure that it isn’t used). I don’t yet generate the pairs of the barcode)

@MasterStormer unfortunately nextSequence seems to generate fairly poor barcodes once you start adding constraints to avoid 3’ conflicts with the barcode. I found generating a new random sequence each time seemed to produce better results. Even better would be to choose the next character based on the previous 2 or 3, but I’m a lazy coder and cycles are cheap when it comes to generating barcodes. I’m also mostly doing Python these days so writing JavaScript is probing a few quiescent brain cells.

When we finally run the boosters in the sandbox, maybe we can start writing them in python :stuck_out_tongue:

In my script I made sure it respects the repeated bases constraints (although not efficiently, and I don’t check the other side of the hairpin… this sure is hard…)
Are you talking about these constraints? or something else?

About the barcode generator, I’ll see what you and I end up using, and refactor the original script/library to have what we need with much clearer names.
The version you played with was very much a broken POC.

I originally had an idea of having the mutations booster “import” the barcode generator, but that probably is a bad idea, since changing the barcode library will break the script.

And about the mutation/submission booster - there are a couple of versions of it scattered around, and they contain a lot of code from almost 7 years and multiple developers.
I’ve been thinking of refactoring it (e.g. move more stuff to classes, store more data in properties instead of in the strings), and also maybe tracking it in github alongside having each version sitting on the site as an eternascript (although I need to make sure it doesn’t raise the barrier to entry…)

Do you have any thoughts about this?
I will never have time to change everything at once, but I am interested in making small changes iteratively, if you’re fine with that

@MasterStormer I tweaked your constraint routine to change the constraints for the barcode:

  • no more than 3Gs, 3Cs, 4As, or 4Us in a row
  • avoid GUU, UGU, UUG which pair with the AACAA spans in the 3’ tail
  • avoid CUU, UCU, UUC which pair with the AAGAA span in the 3’ tail
  • avoid SAA, ASA, AAS whose bar code complement pairs with AASAA in the 3’ tail

It throws out a LOT of barcodes, but the dotplot of that region is much cleaner. Coupling those constraints with nextSequence caused a lot of inefficiency, so I just generate random sequences. There are still millions of them, so the chance of picking two already used barcodes in a row is small - more are thrown out for the constraints.
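
For readers who want to see the shape of this, here is a minimal sketch of the constraint check plus random generation. It is not the booster's actual code: the function names are hypothetical, the barcode length is left as a parameter because it varies, S is read as "G or C", and only the generated strand is checked.

```javascript
// Spans rejected by the constraints listed above (S = G or C).
const FORBIDDEN = [
  /GGGG/, /CCCC/, /AAAAA/, /UUUUU/, // more than 3 Gs or Cs, or 4 As or Us, in a row
  /[GC]UU/, /U[GC]U/, /UU[GC]/,     // GUU, UGU, UUG, CUU, UCU, UUC
  /[GC]AA/, /A[GC]A/, /AA[GC]/,     // SAA, ASA, AAS
];

// True when the barcode contains none of the forbidden spans.
function passesConstraints(barcode) {
  return !FORBIDDEN.some((re) => re.test(barcode));
}

// Uniformly random sequence over A, C, G, U.
function randomBarcode(length, rng = Math.random) {
  let s = '';
  for (let i = 0; i < length; i++) s += 'ACGU'[Math.floor(rng() * 4)];
  return s;
}

// Draw random candidates until one passes the constraints and is unused.
// `used` is a Set of barcodes already taken; the pick is added to it.
function pickBarcode(length, used, rng = Math.random, maxTries = 100000) {
  for (let i = 0; i < maxTries; i++) {
    const candidate = randomBarcode(length, rng);
    if (passesConstraints(candidate) && !used.has(candidate)) {
      used.add(candidate);
      return candidate;
    }
  }
  throw new Error('no barcode found within maxTries');
}
```

Rejection sampling like this wastes many draws, but each draw is cheap and a bad span anywhere in the sequence is simply re-rolled on the next attempt.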

If you wanted to be less strict you could eliminate UUSU, USUU, ASAA, and AASA which would still be helpful, but less so.

You are welcome to do anything you wish with it. I contribute occasionally as the original author and occasional hacker of that POC, but I have no illusions that it is decent code, it just mostly accomplishes the desired goal. The POST strategy for submission was hacked from someone else’s code (Nando?). The checking and submission code would probably be best refactored to use async/await or async/callback but I don’t have the time or energy or neural cycles to do that. What I did over the last few days was mostly out of a desire to have something that worked for the current lab, not to clean it up for more general use. Tag, you’re it!

I see, I didn’t go that far. Seems like you need to know a lot about what you’re dealing with to choose them.

Seems to me like you fixed the potential of picking already used barcodes in your script; maybe I didn’t understand something about it correctly?

And @DigitalEmbrace said something in Discord about incremental barcodes causing less random misfolds - is that still relevant with all of these constraints?

Wait I don’t have the time to be It either -

Well let’s see what I’ll have the energy to do :slight_smile:

I did manage to get some used barcodes async code working late yesterday (based on your original WIP starter) in v3 and v3.1. I don’t know about incremental barcodes causing less random misfolds, but I do know that nextSequence did NOT mesh well with my additional constraints. If it got a bad span (like UCU) at one end of the sequence it could take MILLIONS of iterations to change that portion.

Anecdotally I don’t seem to notice many conflicts with the “random” barcodes in looking at the dotplots of mutation results I’ve run. It could be that avoiding the spans I chose is also helping against misfolds, but I have no quantitative analysis nor even any thought experiments in that regard. The constraints were selected to avoid problematic spans I’ve seen in the past with manual generation of barcodes.

The barcode constraints jandersonlee devised are very effective, let’s go with those! I didn’t go into that level of detail when I requested MasterStormer update the booster for players. When I saw his incremental approach, I figured most players using the booster would begin with a “smart” barcode and likely not run into a problem. Plus, most hairpins formed in the pilot SHAPE data.

Some notes on using the Mutation with Random Barcode Booster for those who have not used a similar booster before. Open the script page https://eternagame.org/web/script/11657798/ and click on the Favorites icon (a Star) near the top. This will add it to your list of available boosters in the Booster menu.

Now open/create a design you wish to use as a start for mutation.

It helps to minimize the “Design Information” side panel by clicking on the circled (i) icon near the upper right if you have not already done so. (You can always restore it later, but it tends to get in the way of the Boosters menu.) It also gives more screen real-estate for looking at the design when you zoom in.

Select the Mark Bases mode from the bottom toolbar (the dot with a circle around it). Now click on one or more bases to mark/unmark them. If you mark N = 1-4 bases the booster will try all 4 bases at each marked position in parallel, for up to 4**N combinations. If you mark N >= 5 bases the booster will instead try N*4 mutations serially, changing one marked base at a time. If choosing no more than 4 bases it sometimes helps to choose paired bases to try for alternative pairings; if choosing paired bases, it helps to pick no more than 2 pairs (4 bases) at a time.
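
To make the two counting rules concrete, here is a small sketch of how such a list could be enumerated. This is an illustration only, not the booster's code: `enumerateMutations` is a hypothetical name, positions are 0-based, and duplicates of the starting sequence are kept rather than filtered out.

```javascript
const BASES = 'ACGU';

// `marked` holds 0-based positions of the marked bases.
function enumerateMutations(sequence, marked) {
  const results = [];
  if (marked.length <= 4) {
    // Parallel mode: every combination of bases over all marked positions (4**N).
    const total = 4 ** marked.length;
    for (let k = 0; k < total; k++) {
      const chars = sequence.split('');
      let rest = k;
      for (const pos of marked) {
        chars[pos] = BASES[rest % 4]; // read k as a base-4 number, one digit per position
        rest = Math.floor(rest / 4);
      }
      results.push(chars.join(''));
    }
  } else {
    // Serial mode: change one marked position at a time (4*N).
    for (const pos of marked) {
      for (const base of BASES) {
        results.push(sequence.slice(0, pos) + base + sequence.slice(pos + 1));
      }
    }
  }
  return results;
}

console.log(enumerateMutations('GGAAAC', [2, 3]).length);          // 4**2 combinations
console.log(enumerateMutations('GGAAACUU', [0, 1, 2, 3, 4]).length); // 4*5 single changes
```

The gap between the two modes grows quickly: 4 marked bases give 256 combinations, while 5 marked bases in serial mode give only 20.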

Having marked some bases, now open the Mutation booster by selecting it from the Boosters (Lightning bolt) menu. Be sure to use the v3.1 version for now. You only need to add the Booster to the lab tool once per session (and set it as a Favorite once), even if you do multiple mutations. If you completely leave the lab tool you will have to open it again, but if you enter “View all submitted designs…” mode you can click on “Return to Game” or “View/Copy design” to get back to the session with the booster active.

When the booster is active it will add a small form to the bottom of the lab tool. If this is in your way you can collapse the form to a single line by clicking on the “Mutation Generator with Random Barcode” titlebar. To get it back, click on the title bar again. (Try this now if you are unsure.)

If you forgot to mark some bases, collapse the tool and mark one or more bases now. It may help to zoom in with the Plus key (Shift +) a few times so that you can see and click on the bases better. Pan the view by dragging with the mouse while zoomed in. If you zoom in too close, use the minus key (-) to zoom out. Be sure to select the EternaFoldThreshknot folding model, at least for a first pass with mutations of more than one base. The NuPACK folding model works so slowly that the booster will often time out and it quickly becomes frustrating. If you want to watch the sequence change shapes as it tries the mutations, also be sure to select Natural Mode (the Leaf icon).

Now that you have one or more bases marked and are using the EternaFoldThreshknot model and are in Natural mode, you can re-expand the Mutation booster and click on the Mutate button to start a mutation run. The booster will quickly generate a list of 4**N (N<=4) or 4*N (N>4) mutated versions of the sequence and then start cycling through them to try them out. Don’t interact with the game while it tries them. It should go fairly quickly if you are using the EternaFoldThreshknot model. (You did pick that one, right?) Once it is finished it will stay showing the last sequence and the line number box will stop incrementing.

The text box at the bottom of the window will now contain a list of “sequence,mutation,trueORfalse” lines. Each mutated sequence should have a different, currently unused, random barcode assigned to it. Sequences that failed to meet the constraints will have “false” in the third column; those that fold into a pseudoknot with the current folding model and meet the other constraints should have a “true” at the end of the line. The mutation field has a brief description of the mutation that was done (the bases that were changed). The original sequence should be on the first line and have an empty field for its mutation description.

To return to the original shape, you can type a “1” in the line number box next to the GoTo button and then click Goto. You can also traverse the list of sequences by using the Next and Prev buttons. If there is a design that you don’t want to submit, you can click on Delete to remove it from the list. If using Delete it helps to start at the head of the list by doing “1” GoTo first as then you can walk the whole list just using Delete and Next. If you want to drop all sequences that fail to satisfy one or more constraints you can also use the “Prune” button before you start to walk the list or at any time after.
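
If you copy the list out of the text box to work on it elsewhere, the format is easy to handle. The sketch below parses “sequence,mutation,trueORfalse” lines and mimics what Prune does; the function names are hypothetical, the mutation labels in the example are made up, and it assumes the mutation field contains no commas.

```javascript
// Split one "sequence,mutation,trueORfalse" line into fields.
function parseLine(line) {
  const [sequence, mutation = '', flag = ''] = line.split(',').map((f) => f.trim());
  return { sequence, mutation, ok: flag === 'true' };
}

// Parse the whole text box content, skipping blank lines.
function parseList(text) {
  return text.split('\n').filter((l) => l.trim() !== '').map(parseLine);
}

// Keep only the entries that met the constraints, as Prune does.
function prune(entries) {
  return entries.filter((e) => e.ok);
}

const text = 'GGAAAC,,true\nGGAGAC,A4G,false\nGGACAC,A4C,true';
console.log(prune(parseList(text)).map((e) => e.sequence));
```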

Having Prune(d)/Delete(d) all of the sequences that you don’t want to submit you can either walk the list one more time and manually submit each sequence the usual way, or you can do a “bulk” Submit.

Clicking the Submit button will prompt you separately for a TITLE and then a DESCRIPTION. The DESCRIPTION will be the same for all submissions from the same batch. The submission title will be different for each submission, composed from “TITLE #Mutation MUTATION” where the TITLE is what you entered at the prompt and “MUTATION” is whatever is in the second comma separated field on the line holding the sequence. The booster will then step through each of the lines and submit them one by one. After it has finished the submissions it will display a “Notes:” pop-up with any error messages accumulated during the submission process. If it just says “Notes:” then all sequences were submitted; otherwise it should tell you what failed to upload and why. Close the Notes popup.
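
The flow just described can be sketched roughly as follows. This is not the booster's code: `bulkSubmit` is a hypothetical name, `submitOne` is a hypothetical callback standing in for the real upload step, and the loop is synchronous for brevity whereas the real submission is asynchronous.

```javascript
// `submitOne` is a hypothetical callback standing in for the real upload.
// It is expected to throw an Error that explains why a submission failed.
function bulkSubmit(title, description, entries, submitOne) {
  const errors = [];
  for (const entry of entries) {
    // Per-design title: "TITLE #Mutation MUTATION"; the description is shared.
    const fullTitle = `${title} #Mutation ${entry.mutation}`;
    try {
      submitOne({ title: fullTitle, description, sequence: entry.sequence });
    } catch (err) {
      errors.push(`${fullTitle}: ${err.message}`);
    }
  }
  // A bare "Notes:" means every submission went through.
  return ['Notes:', ...errors].join('\n');
}

// Example with a stub that rejects sequences it has already seen.
const seen = new Set();
const stub = ({ sequence }) => {
  if (seen.has(sequence)) throw new Error('duplicate sequence');
  seen.add(sequence);
};
console.log(bulkSubmit('My design', 'Batch of single mutants', [
  { sequence: 'GGAGAC', mutation: 'A4G' },
  { sequence: 'GGAGAC', mutation: 'A4G' },
], stub));
```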

That’s it! Feel free to pick new bases and mutate again, or go to “View all designs…”, select a new design, and click “View/Copy design” to pick a new starting sequence to mutate. If you completely leave the lab tool you can reload the booster when you come back.

If you feel so inspired you can also use the Mutation tool to check or submit externally derived designs. It does not allow you to specify a target shape for bulk submissions, but you can specify a short title suffix by pasting a list of sequence,suffix lines into the text box instead of clicking on Mutate. (Select and copy the list into the cut buffer then click within the text box and use ctrl-A + ctrl-V for Windows or cmd-A + cmd-V on a Mac.) You can then step through the designs with GoTo/Next/Prev/Delete to check and select them.

Once again you can manually submit the designs by stepping through them (in which case you can change the target shape first), or bulk submit in which case the titles will be generated as “TITLE #Mutate SUFFIX”. For example you could enter the source pseudoknot reference id such as PKBnnn or “Mod NNNNNNNN” as the TITLE and the unique difference of this sequence as the SUFFIX. The DESCRIPTION will be the same for all submissions in the batch and could describe the source of the sequence and how the modifications were made/selected. When submitting batch designs the tool does not generate random barcodes so make sure they are unique first. As you step through them to review them you can always flip barcode pairs to make them unique if they are not already.
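
One way to check a pasted sequence,suffix list for repeated barcodes before submitting is sketched below. `findDuplicateBarcodes` is a hypothetical helper, and `barcodeIndices` is assumed to be a 0-based array of the barcode positions for the puzzle; the sequences and positions in the example are made up. Note that it only compares barcodes within the pasted list, not against barcodes already in use in the lab.

```javascript
// Report lines whose barcode repeats one seen earlier in the list.
// Assumption: `barcodeIndices` holds 0-based positions of the barcode bases.
function findDuplicateBarcodes(text, barcodeIndices) {
  const firstSeen = new Map(); // barcode -> line number where it first appeared
  const duplicates = [];
  text.split('\n').forEach((line, i) => {
    if (line.trim() === '') return;
    const sequence = line.split(',')[0].trim();
    const barcode = barcodeIndices.map((idx) => sequence[idx]).join('');
    if (firstSeen.has(barcode)) {
      duplicates.push({ line: i + 1, firstLine: firstSeen.get(barcode), barcode });
    } else {
      firstSeen.set(barcode, i + 1);
    }
  });
  return duplicates;
}

const pasted = 'GGAAACUUCGGUAC,first\nGGAAACUUCGGUAC,second\nGGAAACUUCGCUAC,third';
console.log(findDuplicateBarcodes(pasted, [9, 10, 11]));
```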

Thank you for explaining so clearly how to use this script. Bulk submissions were quite a mystery to me.

@MasterStormer This is the next feature players need added to the mutation/submission booster, if you can figure out a way to save the custom target structure from the puzzle when submitting a mutation series. We need this piece of information saved for later retrieval and eventually comparison with experimental results.

If there is a booster hook for submitting a design instead of the POST hack that would probably be the best way to resolve this.

@LFP6

Assuming that:

applet = document.getElementById('maingame');

Are these the correct booster callable hooks to:

  1. Get the current target shape
    applet.get_targets()[0].secstruct
  2. Get the base pairing probabilities
    applet.pairing_probabilities(seq,constr); //constrained
    applet.pairing_probabilities(seq); // unconstrained
  3. get the sequence string
    applet.get_sequence_string();
  4. set the sequence string
    applet.set_sequence_string(seq);
  5. get the MFE shape
    applet.fold(seq); //unconstrained
    applet.fold(seq,constr); // constrained
  6. compute the energy of a shape
    applet.energy_of_structure(seq,shape); // does NOT work for pseudo-knots?
  7. determine the locked bases
    applet.get_locks();

Are there booster callable hooks for:

  1. Set a target shape
  2. Submit a design with title and description
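
While waiting for a definitive answer, the hook names in the first list can at least be probed from the console. The sketch below only checks whether each name exists as a function on the applet object; it says nothing about arguments or return values. `probeHooks` is a hypothetical helper and the list contains only the names from the question above.

```javascript
// Hook names taken from the question above; whether they exist is what we test.
const HOOK_NAMES = [
  'get_targets', 'pairing_probabilities', 'get_sequence_string',
  'set_sequence_string', 'fold', 'energy_of_structure', 'get_locks',
];

// Returns { hookName: true/false } depending on whether applet[hookName] is callable.
function probeHooks(applet, names = HOOK_NAMES) {
  const report = {};
  for (const name of names) {
    report[name] = applet != null && typeof applet[name] === 'function';
  }
  return report;
}

// Demonstration with a mock object exposing only two of the hooks.
const mockApplet = { fold: (seq) => '.'.repeat(seq.length), get_locks: () => [] };
console.log(probeHooks(mockApplet));
```

In a booster the argument would be the `applet` object obtained as shown above.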

I’m

  1. open to using/setting the target shape if available
  2. considering an ArcKnot variant of ArcPlot for pseudo-knots