I’m working on a scripting tool to find examples of synthesized substructures in the results of previous labs. For example, you can look for triloops with attached 3-stacks by searching for substruct="(((...)))", which finds nearly 3000 examples.
The “substruct” search field can be used to specify a substructure pattern to look for in the lab candidates. Dots and round brackets can be used with their usual meanings. In addition, a vertical bar ‘|’ can be used to indicate a balanced pruned structure. For example, “(.(|).)” looks for 1-1 loops with just the closing pairs and unpaired bases, while “(((.(((|))))))” looks for 1-0 bulges with attached 3-stacks. Likewise, “((((|))((|))((|))))” looks for a 0-0-0-0 (4-way) multiloop whose attached stacks are each at least 2 pairs long. Adding a leading ‘|’ to the search anchors it to the “hook” area, so that “|(|)…(((((((…)))))))” will match cases where a lab structure is separated from a barcode by two unpaired bases.
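To make the ‘|’ wildcard concrete, here is a minimal Python sketch of one way such a pattern could be matched against a dot-bracket string. It is not the script's actual implementation: the function name find_matches is hypothetical, ‘|’ is assumed to match any balanced stretch (possibly empty), the pattern itself is assumed to be balanced, and the leading-‘|’ hook anchor is not handled.

```python
def find_matches(pattern, structure):
    """Return (start, end) spans where pattern matches inside structure.

    Assumption: '|' stands for any balanced dot-bracket stretch, and every
    other pattern character must match the structure literally.
    """
    def match(pi, si):
        # Yield each end index at which pattern[pi:] can finish
        # when matching starts at structure[si].
        if pi == len(pattern):
            yield si
            return
        token = pattern[pi]
        if token == "|":
            yield from match(pi + 1, si)  # '|' matching nothing
            depth = 0
            j = si
            while j < len(structure):
                ch = structure[j]
                if ch == "(":
                    depth += 1
                elif ch == ")":
                    depth -= 1
                    if depth < 0:
                        break  # would close a pair opened outside the stretch
                j += 1
                if depth == 0:
                    # structure[si:j] is balanced, so try the rest of the pattern
                    yield from match(pi + 1, j)
        elif si < len(structure) and structure[si] == token:
            yield from match(pi + 1, si + 1)

    spans = []
    for start in range(len(structure)):
        for end in match(0, start):
            spans.append((start, end))
            break  # keep only the first match found for each start position
    return spans


# 1-1 loop pattern from the post, against a small made-up structure
print(find_matches("(.(|).)", "..(.(...).).."))  # [(2, 11)]
```

Because the pattern is balanced and ‘|’ only consumes balanced stretches, every bracket in a matched span pairs with another bracket inside that span, so the literal comparison is enough to guarantee that the closing pairs line up.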
You can restrict the search to lab results scoring within a given range by specifying optional minscore and maxscore values (defaults 0 and 100, respectively).
If you are simply interested in finding which labs (if any) contain a given substructure pattern, you can specify soln==“none”, and the script will list the locations of the structure matches found within the “secstruct” of any synthesized labs (with links to the lab results pages).
You can also restrict the search to lab results where some particular subsequence is found, such as subseq==‘GGG’ for sequences with a run of 3 or more Gs anywhere in the design (not just in the targeted substructure).
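As an illustration of how the score and subsequence restrictions combine, here is a small Python sketch. The record layout (dictionaries with "score" and "sequence" keys) and the function name filter_results are assumptions made for this example, and the score range is assumed to be inclusive at both ends; the real script may store and compare results differently.

```python
def filter_results(results, minscore=0, maxscore=100, subseq=None):
    """Keep results whose score lies in [minscore, maxscore] and whose
    sequence contains subseq anywhere (when subseq is given)."""
    kept = []
    for result in results:
        if not (minscore <= result["score"] <= maxscore):
            continue  # outside the requested score range
        if subseq and subseq not in result["sequence"]:
            continue  # required subsequence not present in the design
        kept.append(result)
    return kept


# Made-up records, for illustration only
sample = [
    {"sequence": "GGGAAACCC", "score": 92},
    {"sequence": "GCGAAAGCC", "score": 95},
    {"sequence": "GGGAUACCC", "score": 60},
]
high_scoring_ggg = filter_results(sample, minscore=90, subseq="GGG")
```

With these sample records, only the first one passes both the score and the subsequence test.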
The script speed is still not optimal: it can take 15-20 minutes to search all synthesized lab results for a given substructure pattern, so you need to make sure that you set the script timeout value to at least 1800 seconds (30 minutes) to be safe, then go grab a coffee or lunch.
Possible Future Work
Right now, the script merely finds and displays all matching structures, unsorted and unranked. In some cases, there may be hundreds or thousands of matches. One possibility would be to apply a metric like the one derived for Computationally Selected Elements to the SHAPE values for the substructures, then group by similar sequences and sort by average rank.
The substruct search pattern syntax also does not yet have the flexibility to specify stacks of a precise length, so you cannot search, for example, for successful stacks of length exactly 4.
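A rough Python sketch of the grouping and sorting step might look like the following. It assumes each match has already been reduced to a (sequence, metric value) pair, treats "similar" as "identical" for simplicity, and leaves the metric itself out, since it has not been defined here; rank_groups is a hypothetical name.

```python
from collections import defaultdict
from statistics import mean


def rank_groups(matches):
    """Group (sequence, value) pairs by sequence and sort the groups
    by average value, best first."""
    groups = defaultdict(list)
    for sequence, value in matches:
        groups[sequence].append(value)
    averaged = [(sequence, mean(values)) for sequence, values in groups.items()]
    # Higher average is assumed to be better; flip reverse if the metric is a rank
    return sorted(averaged, key=lambda item: item[1], reverse=True)


# Made-up values, for illustration only
ranked = rank_groups([
    ("GCGAAAGCC", 0.25),
    ("GGGAAACCC", 1.0),
    ("GGGAAACCC", 0.5),
])
```

Grouping by near-identical rather than identical sequences would need some distance measure between sequences, which is left open here.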
Assistance with either of these goals (or other extensions) would be welcome.