Some sequences are hard to synthesize, and hard to analyze. Why?

Dear all,

The EteRNA synthesis team has recently been facing sequences that are hard to synthesize or, even when they do synthesize, hard to analyze properly because of unusual outcomes in the data.

These are not so-called “Christmas tree” designs (all GC pairs). They have a reasonable pair ratio and no obvious red flags.

Why are these sequences hard? We don’t know for sure. We suspect it’s the repetition of patterns, which can mess up the assembly of the DNA template, or perhaps a high proportion of “G”s (not GC pairs).

Here is a list of sequences that were hard to “synthesize” or “analyze”:

NUPACK designs of “The Finger”, “The Cross”, “One Bulge Cross”, “The Star”, and “Bulged Star.”

http://eterna.cmu.edu/game.php?myType…
http://eterna.cmu.edu/game.php?myType…
http://eterna.cmu.edu/game.php?myType…
http://eterna.cmu.edu/game.php?myType…
http://eterna.cmu.edu/game.php?myType…

Round 5 results of The Branches
http://eterna.cmu.edu/game.php?myType…

Perhaps there are common properties that these sequences share and we are trying to figure out what they are.

We would love to hear our players’ thoughts on this matter.

EteRNA team

The design has high symmetry and short branches which limit the unique pairings available. My understanding is that there is a normal dynamic shifting of structure. Is it possible that there is too much to analyze easily?

I think you’re probably right about repetitive patterns being a problem, at least for the designs that you’re having trouble synthesizing. For instance, in iojp’s “OPRELA bot design”, the patterns AGGA and CGG each occur 4 times. In all, GG occurs 10 times in the sequence. Maybe bot designs are more prone to repetition than player designs, since we tend to deliberately check for that?
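For anyone who wants to check a design for this kind of repetition, here is a minimal Python sketch. It counts how often a given motif occurs (overlaps included), lists every k-mer that shows up more than once, and reports the fraction of a given base. The function names and the toy sequence are made up for illustration; the sequence is not one of the lab designs, and this is not the tool the EteRNA team uses.

```python
from collections import Counter


def count_overlapping(seq: str, motif: str) -> int:
    """Count occurrences of motif in seq, including overlapping ones."""
    count = 0
    start = seq.find(motif)
    while start != -1:
        count += 1
        # Advance by one base only, so overlapping hits are also found.
        start = seq.find(motif, start + 1)
    return count


def repeated_kmers(seq: str, k: int, min_count: int = 2) -> dict:
    """Return every length-k substring that occurs at least min_count times."""
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    return {kmer: n for kmer, n in counts.items() if n >= min_count}


def base_fraction(seq: str, base: str) -> float:
    """Fraction of positions in seq occupied by the given base."""
    return seq.count(base) / len(seq) if seq else 0.0


# Hypothetical toy sequence, NOT an actual lab design.
toy = "AGGACGGAGGACGG"

print("GG count:", count_overlapping(toy, "GG"))
print("repeated 4-mers:", repeated_kmers(toy, 4))
print("G fraction:", round(base_fraction(toy, "G"), 2))
```

Running the same three checks on a bot design and a player design for the same shape would show whether bot designs really do repeat themselves more.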

As far as why the Branches lab designs seem to be so much harder to analyze, I really haven’t come up with any good ideas. Like Jim Morris, I’ve been wondering about dynamic diversity in the samples, whether this shape is prone for some reason to more shifting between pairings than others have been.

Also wondering about non-canonical interactions in the shape. Something that has struck me in the results so far is that the results in tetraloops haven’t been what I’ve expected. In other lab shapes, I’ve come to expect that using GAAA loops instead of AAAA loops is likely to make the G and sometimes the second A in the loop area look bonded to the chemical mapping reaction. But in Branches for some reason this hasn’t been so predictable – often it seems to be the G and the third A that appear most bonded, or even some combination of the three As but not the G. Weird.

Could the high proportion of very short unbonded sections in the three multiloops be changing the way tertiary or non-canonical bonds are forming?

This is an excellent point. Perhaps the repeated patterns made alternative structures much more frequent, making it hard to estimate the final structure?

Yes, as far as I know, multiloops are the least predictable element in the energy model we currently use, and that made “The Branches” a real challenge. I think many players tried to leave the multiloops with all As, hoping that would minimize the chance of them interacting with stacks.

But why would virtually ALL designs score lower than in the previous round? We haven’t seen that kind of dramatic late-round drop in other labs. I’m perplexed.

If the exact same design was synthesized & analyzed several different times, would there be a range in the scores? If so, what would be the range?

Regarding the comment: “Here is a list of sequences that were hard to ‘synthesize’ or ‘analyze.’”

In the lab, is it possible to differentiate between the designs that are hard to synthesize and the ones that are hard to analyze?

Thanks in advance.