[Strategy market] Strand repetition ban

I have been angry with a couple of patterns for a long time, but wasn’t sure how best to hit them. I think CUC and GUG are bad, and so are UCU and UGU. Or at least too many of them are, especially if they occur together.

What I have found so far is that these patterns are badly tolerated in designs with strings 5 nt long and shorter. In labs with short-stringed designs, e.g. the Binding branches and Branches labs, these patterns are less prevalent, though not totally absent, in the winning designs compared to lower-scoring designs.

The more GUG, UGU, CUC and UCU there are, the “better” the chance that they find something else to pair up with. Most designs can deal with one or two of them, sometimes even more. But sometimes even one or two is enough to make trouble. (Note: The first two patterns in the pictures are not to be totally avoided; just moderate the number of them.) CUC has a habit of making trouble if two of them are close together in the design. It is not that they pair up with each other; they are just prone to mispairing. I don’t know the chemistry behind this, but I think the repetitiveness makes them bind badly to the intended opposite strand when they sit in a short string.

And if there are more of them together, things often go really bad. If they run in continuation, like CUCUC, UCUCU, GUGUG or UGUGU, it adds to the problems. It also seems that some of the problems with these patterns arise when they touch the junctions, which happens more easily in shorter stacks.

After discussing the strategy with Brourd, I realized that the 4 patterns I mention are really just 2. They just start from a different place. But they behave a bit differently depending on their starting point, so I will still treat them as more than 2 patterns. UCU and UGU are rarer and often worse when they occur. Brourd said: According to you then, the switches are all going to fail :stuck_out_tongue: since most have UCU. He does have a point. But as long as it is in the hairpin, it is not necessarily bad. We don’t want the switch to be in the hairpin position always.
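The point that the four triplets are really two alternating repeats read from different starting points can be sketched in a few lines of Python. This is only an illustration: the function names `alternating_runs` and `triplets_in` are made up for this example, and the assumption is that a "pattern" means any stretch of at least 3 nt where U strictly alternates with C (or with G).

```python
def alternating_runs(seq, partner):
    """Return (start, run) for each maximal run of at least 3 nt in which
    U alternates with `partner` ("C" or "G"). Hypothetical helper."""
    runs = []
    i, n = 0, len(seq)
    while i < n:
        if seq[i] not in ("U", partner):
            i += 1
            continue
        j = i
        # Extend while the next base is the other member of the pair.
        while j + 1 < n and seq[j + 1] in ("U", partner) and seq[j + 1] != seq[j]:
            j += 1
        if j - i + 1 >= 3:
            runs.append((i, seq[i:j + 1]))
        i = j + 1
    return runs


def triplets_in(run):
    """List every 3 nt window of a run, in order."""
    return [run[k:k + 3] for k in range(len(run) - 2)]


# One C/U run contains both CUC and UCU, depending on where you start reading.
print(alternating_runs("AACUCUGG", "C"))  # [(2, 'CUCU')]
print(triplets_in("CUCU"))                # ['CUC', 'UCU']
```

Scanning for runs rather than for the four triplets separately also gives the run length directly, which is what matters for the continuations like CUCUC.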

I just think this pattern has a habit of splitting strings, and that the repetitiveness makes it bind badly to the other side of the string. That could be the case for other repetitive strand patterns too.

Some time ago I saw the pattern with long lines of blue and green do well, especially in the longer strings, as long as it stayed under a certain length. Back then, I thought it had some kind of stabilizing effect. But it might be worse than I originally thought, and that could be why my earlier strategy, Green and blue nucleotides, did badly.

Back to this strategy. The neck area seems to have a certain tolerance for these 3 nt long bad patterns, so no penalty there. One more thing: these bad patterns not only cause trouble if they are in a string, they can also cause trouble if present in a hairpin loop, or when crossing between elements, like touching both a string and a loop.

In designs with really long strings, the long strings themselves have a much higher tolerance for this pattern. In this design it works just fine, as it is not around these patterns that things break up.

I would like a strategy for designs with 5 nt long strings and shorter (not counting the neck in as a string):

For the presence of these patterns in designs with solely short strings:

0 CUC, give +1
1 CUC, give 0
2 CUC, give -1 and add -1 per each extra

0 GUG, give +1
1 GUG, give 0
2 GUG, give -1 and add -1 per each extra

For 1 UCU or UGU, give -1
For 2 UCU or UGU, give -1 and add -1 per each extra

If there is a combination of two or more of these 4 patterns present together, give -1 per extra pattern and penalize exponentially.

If any of these patterns continues for more than 3 nucleotides, e.g. CUCU or CUCUC, UCUC or UCUCU, GUGU or GUGUG, UGUG or UGUGU, penalize with -1 per extra nucleotide added to the length of the pattern.
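The short-string rules above could be sketched roughly like this in Python. It is one possible reading, not the official implementation, and it rests on several assumptions of mine: occurrences are counted with overlaps, UCU and UGU are scored separately, "present together" is read as "present in the same design", the combination penalty is -1 per extra distinct pattern, and the "penalize exponentially" part is left out because no base or formula is given. All function names are made up for the example.

```python
import re

PATTERNS = ("CUC", "GUG", "UCU", "UGU")


def count_overlapping(seq, pattern):
    """Count occurrences of `pattern`, allowing overlaps (assumption)."""
    return len(re.findall(f"(?={pattern})", seq))


def cuc_gug_score(n):
    """0 -> +1, 1 -> 0, 2 -> -1, then -1 per each extra."""
    return 1 if n == 0 else -(n - 1)


def ucu_ugu_score(n):
    """1 -> -1, 2 -> -1, then -1 per each extra (literal reading)."""
    if n == 0:
        return 0
    return -1 if n <= 2 else -1 - (n - 2)


def run_lengths(seq, partner):
    """Lengths of maximal runs where U alternates with `partner`."""
    lengths, i, n = [], 0, len(seq)
    while i < n:
        if seq[i] not in ("U", partner):
            i += 1
            continue
        j = i
        while j + 1 < n and seq[j + 1] in ("U", partner) and seq[j + 1] != seq[j]:
            j += 1
        lengths.append(j - i + 1)
        i = j + 1
    return lengths


def short_string_score(seq):
    counts = {p: count_overlapping(seq, p) for p in PATTERNS}
    score = cuc_gug_score(counts["CUC"]) + cuc_gug_score(counts["GUG"])
    score += ucu_ugu_score(counts["UCU"]) + ucu_ugu_score(counts["UGU"])
    # Combination rule: -1 per extra distinct pattern (exponential part omitted).
    distinct = sum(1 for n in counts.values() if n > 0)
    if distinct >= 2:
        score -= distinct - 1
    # Continuation rule: -1 per nucleotide beyond 3 in each alternating run.
    for partner in ("C", "G"):
        score -= sum(max(0, length - 3) for length in run_lengths(seq, partner))
    return score


print(short_string_score("GGAAACC"))      # 2: no CUC and no GUG
print(short_string_score("GCUCAAAGAGC"))  # 1: one CUC, no GUG
print(short_string_score("CUCUC"))        # -4
```

Note that with overlapping counts a run like CUCUC is hit three times: by the triplet counts, by the combination rule and by the continuation rule. Whether that much stacking is wanted is a judgment call the rules leave open.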

Designs with long strings:

Same as the strategy for short strings, only give half penalty if the patterns are present in a long string.

For any of the 4 patterns found somewhere in the sequence outside a string, whether touching a loop or inside a loop, penalize as for short strings.
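To apply the half penalty, each occurrence first has to be placed in the structure. Here is a minimal Python sketch, assuming the target structure is given in dot-bracket notation without pseudoknots, that a "long string" is an uninterrupted stack of 6 or more pairs (since 5 and shorter count as short), and that an occurrence only gets half weight when all three of its nucleotides sit inside a long string. The function names and the `long_min` parameter are hypothetical, and the neck exemption is not handled here.

```python
import re

PATTERNS = ("CUC", "GUG", "UCU", "UGU")


def pair_table(structure):
    """Map each paired index to its partner in a dot-bracket string."""
    stack, pairs = [], {}
    for i, ch in enumerate(structure):
        if ch == "(":
            stack.append(i)
        elif ch == ")":
            j = stack.pop()
            pairs[i], pairs[j] = j, i
    return pairs


def stem_lengths(structure):
    """Map each paired index to the length of the stack it belongs to."""
    pairs = pair_table(structure)
    length_at, seen = {}, set()
    for i in sorted(pairs):
        j = pairs[i]
        if i > j or i in seen:
            continue
        stem = [(i, j)]
        a, b = i, j
        # Walk inward while the next pair stacks directly on this one.
        while a + 1 < b - 1 and pairs.get(a + 1) == b - 1:
            a, b = a + 1, b - 1
            stem.append((a, b))
        for x, y in stem:
            length_at[x] = length_at[y] = len(stem)
            seen.add(x)
    return length_at


def weighted_occurrences(seq, structure, long_min=6):
    """Sum a weight per occurrence: 0.5 if fully inside a long string,
    1.0 otherwise (short string, loop, or touching a loop)."""
    length_at = stem_lengths(structure)
    totals = {}
    for p in PATTERNS:
        total = 0.0
        for m in re.finditer(f"(?={p})", seq):
            span = range(m.start(), m.start() + 3)
            in_long = all(length_at.get(k, 0) >= long_min for k in span)
            total += 0.5 if in_long else 1.0
        totals[p] = total
    return totals


# CUC sits fully inside a 6-pair string: half weight.
print(weighted_occurrences("GCUCAGGAAACUGAGC", "((((((....))))))"))
# CUC touches the hairpin loop of a 3-pair string: full weight.
print(weighted_occurrences("ACUCGAAGAG", "(((....)))"))
```

How the weighted totals are then turned into a final score (for instance by scaling the penalties from the short-string rules) is not spelled out in the strategy, so it is left open here.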

Here is a link to my spreadsheet with data.

https://docs.google.com/spreadsheet/c…

It is quite messy, as I sort of found out along the way which patterns I was going to check. But it gave me what I was after. It originally started as an investigation of another pattern that I later decided to dump. I ran through the data for different kinds of labs to get a feeling for how different types of designs behaved.

Any number marked with x means there was mispairing in the area near one or more occurrences of that pattern. I counted patterns that ended up in hairpins (hp) separately, and counted patterns that ended up in the neck (n) as included. I probably shouldn’t have done the latter, given that I found the neck area to be more tolerant. However, I didn’t know that at the time I counted.

Dear Eli,

Your strategy has been added to our implementation queue with task id 137. You can check the schedule of the implementation here.

Thanks for sharing your idea!

EteRNA team