[Market strategy] Energetic pressured designs

Eli_Fisker · January 24, 2012, 10:43am

With this strategy I’m trying to give our robot a way to determine whether a design is energetically pressured.

As an RNA structure is most easily held together by putting GC-pairs in the junctions, having many short strings dramatically raises the number of junctions present in the design. This in turn means that a high content of GC-pairs is necessary to hold the structure together, even though a very high GC-content is generally bad. So far it seems that the upper limit for GC-pairs in energetically pressured designs is 79% (80% might turn out to be allowed). Any higher than that and the design is destined to score badly in synthesis.
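To make the percentages in this thread concrete, here is a minimal sketch of how a GC-pair percentage could be computed. It assumes the design is given as a sequence plus a dot-bracket target structure without pseudoknots, and that "GC content" here means the share of base pairs that are G-C. The function names are made up for illustration and are not part of any EteRNA code.

```python
def parse_pairs(structure):
    """Return a list of (i, j) index pairs from a dot-bracket string."""
    stack, pairs = [], []
    for idx, ch in enumerate(structure):
        if ch == "(":
            stack.append(idx)
        elif ch == ")":
            if not stack:
                raise ValueError("unbalanced structure")
            pairs.append((stack.pop(), idx))
    if stack:
        raise ValueError("unbalanced structure")
    return pairs


def gc_pair_percent(sequence, structure):
    """Percentage of base pairs that are G-C (assumed meaning of 'GC-pairs')."""
    if len(sequence) != len(structure):
        raise ValueError("sequence and structure must have the same length")
    pairs = parse_pairs(structure)
    if not pairs:
        return 0.0
    gc = sum(1 for i, j in pairs if {sequence[i], sequence[j]} == {"G", "C"})
    return 100.0 * gc / len(pairs)


# A 3-pair hairpin with two G-C pairs and one A-U pair
print(round(gc_pair_percent("GACAAAAGUC", "(((....)))"), 1))
```

With this reading, the suggested limit would correspond to `gc_pair_percent(...) <= 79`.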

I think an energetically pressured design is a design with many short strings, for example one where strings of length 1, 2 or 3 nt collectively outnumber the strings of length 4 nt. Or put more simply, energetically pressured designs are designs with many strings of 3 nt and shorter.
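The definition above can be sketched in code. This is only one possible reading: it assumes a dot-bracket target structure without pseudoknots, treats a "string" as a run of directly stacked base pairs, and uses the first wording (strings of 1 to 3 nt outnumber strings of 4 nt). The function names are hypothetical.

```python
def stem_lengths(structure):
    """Lengths of all stacks (runs of directly nested pairs) in a dot-bracket string."""
    stack, partner = [], {}
    for idx, ch in enumerate(structure):
        if ch == "(":
            stack.append(idx)
        elif ch == ")":
            partner[stack.pop()] = idx  # opening index -> closing index
    lengths = []
    for i in sorted(partner):
        j = partner[i]
        # Skip pairs that continue a stem already started by (i - 1, j + 1)
        if partner.get(i - 1) == j + 1:
            continue
        n = 1
        while partner.get(i + n) == j - n:
            n += 1
        lengths.append(n)
    return lengths


def is_pressured(lengths):
    """Assumed reading: strings of 1-3 nt collectively outnumber strings of 4 nt."""
    short = sum(1 for n in lengths if n <= 3)
    four = sum(1 for n in lengths if n == 4)
    return short > four


lengths = stem_lengths("((..))..(((...)))..((((....))))")
print(lengths, is_pressured(lengths))
```

Bulges and internal loops break a stack in this sketch, so a stem interrupted by a 1-1 loop counts as two shorter strings.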

I would like a strategy that says:

In designs with strings 2 nt long (and shorter) and no strings longer than 4 nt:

Give +1 when the GC content is between 70-79%

In designs with strings of min. 2 nt and max. 4 nt, where more of the strings are 3 and 4 nt long than 3 and 2:

Give +1 when the GC-content is between 65-79%

In designs with strings of min. 3 nt and max. 4 nt:

Give +1 if GC content is between 50-65%
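The three rules above could be sketched as a single scoring function. This is an illustration under stated assumptions, not the implementation that was later queued: the function name is made up, the string lengths are passed in as a plain list, the GC content is given in percent, and "more strings of 3 and 4 nt than of 3 and 2" is read as "more 4 nt strings than 2 nt strings" (the 3 nt strings appear on both sides of the comparison).

```python
def pressured_gc_bonus(lengths, gc_percent):
    """Return 1 if the GC content falls in the window for this kind of design, else 0."""
    if not lengths:
        return 0
    shortest, longest = min(lengths), max(lengths)
    if longest > 4:
        return 0  # not covered by this strategy
    if shortest >= 3:
        low, high = 50, 65  # rule 3: strings of 3 to 4 nt only
    else:
        twos, fours = lengths.count(2), lengths.count(4)
        if shortest == 2 and fours > twos:
            low, high = 65, 79  # rule 2: 2 to 4 nt, leaning towards the longer end
        else:
            low, high = 70, 79  # rule 1: strings of 2 nt and shorter present
    return 1 if low <= gc_percent <= high else 0


print(pressured_gc_bonus([2, 2, 3, 4], 75))
print(pressured_gc_bonus([2, 3, 4, 4], 67))
print(pressured_gc_bonus([3, 4, 4], 60))
```

Designs with any string longer than 4 nt simply get no bonus here, since the rules say nothing about them.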

Eli_Fisker · February 3, 2012, 7:26pm

I put the data behind this strategy into a spreadsheet, plus some of my thoughts about unpressured designs.

Unpressured designs.

Designs with long strings, like The cross lab, are much more forgiving at both ends of the GC spectrum. Here the GC-content of winning designs ranges from 29% to 80%. (Two arms 8 nt long, and a neck and an arm 9 nt long.)

The size of the multiloops also affects how pressured a design is, with the adjacent stacks as the least pressurized. Small multiloops are not as energetically pressured as the bigger multiloops. Asymmetry also plays a role.

I will wait with posting a strategy for unpressured designs, as I still lack data that shows clear tendencies across the span of unpressured designs, with their differences in multiloop size and other variations.

JeehyungLee · February 4, 2012, 7:48pm

Dear Eli,

Your strategy has been added to our implementation queue with task id 115. You can check the schedule of the implementation here.

Thanks for sharing your idea!

EteRNA team

Eli_Fisker · June 1, 2012, 9:33pm

I have said there is such a thing as energetically pressured RNA designs.

I was reading a book tonight on molecular interactions, and it suddenly dawned on me that this must also hold the explanation for what I see RNA strands do in short-stringed designs.

The section on hydrogen bonds (non-covalent bonds) mentions this:

“In general, the strength of the interaction between two macromolecules increases with increasing size of the interaction interface.” (Molecular biology, Oxford, 2012, p. 38)

RNA strands are held together by hydrogen bonds. And what I observed was that the shorter the strings got, the harder it was to keep them in place and get them to bind to each other. So short contact surface = bad contact.

It seems very logical.

hoglahoo · June 4, 2012, 2:07am

interesting

jandersonlee · June 6, 2012, 12:43am

Except… if you look at the way the Barriers server folds things, it often starts with either a pair or a quad match and then extends the stack by adding additional pairs in one direction or another.

Granted, a quad is often *less* strong than a larger stack, but that can be because it (typically) has less inherent free-energy reduction than a larger stack, rather than just the number of bonds involved.

On the other hand, a strong quad might sometimes have more sites for mismatching than a longer stack of the same strength, since it only relies on having *two* bases in the right orientation, while a longer stack may require having three or more.

So two factors: bigger stacks mean better opportunities for FE reduction and bigger stacks mean more differentiation to protect against mismatch.

Eli_Fisker · June 6, 2012, 7:40am

I agree with you that bigger stacks give better protection against mispairing.

3 days ago Hogla mentioned his upcoming puzzle, which ended up being called Chinese lantern 2, and he said:

”This is one of those shapes where it seems like it needs lots of g-c to stabilize, but is more fragile with more g-c pairs”

And I said: (It has) many short strings and strings of equal length. Equal length strings open up for repetitive sequence, which leads to more mispairing.

Back when we were discussing the Nature of the bots, Paramodic mentioned something:

For example, they (bots) seem to also have issue with structures formed by repetitive sequences, so not just symmetrical structures, but repetitive ones.

Nothing opens up more for repetition than many strings of equal length, be they really short or even mid-length.

I hear you saying that how the bots start out will affect how well they do with a certain type of element in a design.

Eli_Fisker · June 13, 2012, 9:32pm

I have been running a few tests by making puzzles, to see a bit more of how the bots react to designs with short stacks, and to further test my hypothesis that short stems in lab are hard to get to bind. It is as I mentioned above:

RNA strands are held together by hydrogen bonds. And what I observed was that the shorter the strings got, the harder it was to keep them in place and get them to bind to each other. So short contact surface = bad contact.

Thanks to Rhiju for the idea that I could ”test this hypothesis further by creating short-stem challenges in silico for bots in the ‘player puzzles’…”

Here is Brourd’s uncomplicated series. As the number of stems grows, more bots get in trouble.

In my shortie series, Vienna only gets in trouble in the ones with the most legs. But tetraloops are much easier to stabilize than triloops, so these puzzles are not as pressured as Brourd’s, though the stacks are shorter.

I have some examples showing that putting in more legs of this type gets the bots in trouble. Check my turtle puzzles for more examples. The more legs, the more bot trouble, mainly for Vienna and SSD.

The bots don’t like too many repeats of similar shapes. A quite sure way to get the bots in trouble is to cram more short strings into one leg and triple it. So the more short stacks per leg, the more bot trouble.

Starryjess’ campfire is made of many short strings. Here the lack of symmetry and repetition does not help the robots, probably because the number of short strings is high.

But equal length strings generally seem to worsen the problem for the puzzle-solving algorithms, as they open up many more options for repetitive sequence, which leaves the puzzle prone to mispairing.

Symmetry, especially on more than one axis, enhances the problem with equal length strings. Also, as a general rule, the bigger the puzzle, the harder it is for the bots to solve. It only takes one spot the bots can’t solve to make them fail. I have tried running InfoRNA on big puzzles to see how it would do. To my surprise it only got stopped at a few spots, but otherwise it was able to solve a big puzzle.

With his Kyurem puzzles, Freywa found an almost certain method for making the bots go nuts:

they have short stacks (most of the length of 4 nt and shorter)
the stacks are of relatively equal length
bulges placed at sharp angles in relation to each other.

ROBOTS IN LAB COMPARED WITH PUZZLESOLVING ROBOTS

I also decided to take a look at the labs in general, as some of them originated from player puzzles. So the bots have solved some of them both in lab and as puzzles.

LABS WITH SIMILAR LENGTH OF STRINGS (many strings with same length)

The general picture is that bots scored in the 80s or worse in designs with strings of similar length if they were fairly short (4 nt and down). This goes for Chalk outline, Making it up as we go and FMN binding Branches. The picture was the same in the Branches lab, where more than half the strings are of equal length and 4 nt long. But here the multiloop size might play in as well.

The bots had no big problem solving the puzzle. Neither did players.

Nupack did fairly well in The star lab. Here the strings are of equal length, which opens up for mispairing. But I think equal length strings, when longer, are less prone to mispairing.

LABS WITH SHORT STRINGS (energetically pressured)

Here the bots overall did badly. Like, really badly. Kudzu was similarly hard on the puzzle-solving bots and the lab-designing bots, except that our own Nupack scored 84% at its highest. Human players had a hard time making winning designs too, hinting that short-string designs are really hard to get to stick together. That might not be surprising when we are up against getting very short strings to form hydrogen bonds with each other. And those strings have to consist mainly of the strong GC-pair, forcing us in the direction of Christmas tree designs. The two high-scoring designs in Kudzu (95%) and Water strider (92%) have high GC-pair ratios of 76% and 79%.

The Water strider lab actually never got a winner in lab. Vienna and Nupack as puzzle-solving algorithms failed to solve the puzzle. But the puzzles do not pose problems for players to solve as puzzles.

The Water strider has, as opposed to Kudzu, more strings of similar length, which opens up for mispairs. It also has more multiloops, more sharp angles between strings, and more strings. All of this puts more energetic pressure on the puzzle.

LABS WITH LONG STRINGS

Nupack did great in The cross lab and One bulged cross, where the strings were very long. Same in the finger lab. The finger lab had no multiloop, which might lessen the pressure. And though it has strings of very equal length, they are not short. Long-string designs seem to have a greater tolerance for patterns that would not be tolerated elsewhere. I think it is because it helps break repetition and thus prevents mispairing better. See my post Rule sensitivity according to length of string.

Nupack however did worse in the similarly shaped Bulged star with 1-1 loops in the arms. 1-1 loops can have a stabilizing effect on a lab design, as the G-G mismatch can actually pair up in the loop. But here the design gets full of small 3 nt long strings. The design has 9 small strings of 3 nt and one of 4 nt. GC-pairs at the ends of each string is the strongest solution, but it leaves the strings both GC-heavy and with a very repetitive pattern. If all 9 small ones get solved with GC-pairs at the ends of the strings, it opens up a huge number of mispairing opportunities.

LABS WITH VARIED LENGTH OF STRINGS

Nupack did great in the asymmetry lab, where string lengths were varied, and okay in the Bends and ends sampler lab. Nupack did better in the shape test lab, which had more varied string lengths (92%). None of the bots “in silico” had any trouble at all. So designs with varied string lengths are the easiest for both the lab bots and the puzzle-solving bots.

Conclusion
A majority of the designs that the puzzle-solving bots failed have short stacks and many equal length stacks. The number of short stacks affects how well the bots do with the puzzles. The more short stacks, the harder time they have solving the puzzles.

Repeats of a structure and symmetry on more than one axis give the bots a harder time solving the structure. The bigger the puzzle, the worse for the bots.

The designs that posed the fewest problems to both the puzzle-solving bots and the lab bots were designs with varied string lengths. The bots also did best in designs that did not have too many short strings.

The puzzle-solving bots did better than the lab bots (Vienna and Nupack) at solving designs with short strings. This suggests that the energy model of the bots allows for more short strings than nature finds fit, and that it is hard to make proper hydrogen bonds between short stretches of string.

E.g. the puzzle-solving bots did mostly fine on Chalk outline and Making it up as I go, where the lab bots did badly. Players were not able to solve Making it up as I go. Chalk outline was solved by one player, but the rest of the solutions were below 90%, suggesting it was a hard lab.

These are the tendencies I see so far, based on the data we have.

Eli_Fisker · June 27, 2012, 2:32pm

There is one more thing that reveals a design as pressured.

The meltplots of the highest scoring designs among the pressurized designs have a very special curve compared to the usual winning designs. I already noticed this when I made Kudzu and got the results. It was impossible to make a design with a decent meltplot. That was even after using almost 80% GC-pairs, something that normally has a flattening effect on the meltplot. Some designs with 100% GC-pairs have an all flat meltplot.

Its meltplot looked like that of a sure losing design. A promising meltplot should have at least one flat square at the beginning, to the left. To see what a normal meltplot should look like, see Lab guide for new players or read about what affects it here.

I have looked at the labs (excluding the ones with aptamers). I checked all designs scoring 94% and over, and the top scorer if none scored 94% or above. I have listed the designs from our bot separately, as it has a tendency to make worse meltplots and dot plots than we do and would thus blur the picture of what truly is a pressured design.

This odd meltplot pattern occurs in the following labs and designs. If a lab has no or few winners and the highest scoring design looks this way, it is a sure sign that one is dealing with a pressurized design. I have sorted them with the lab that looks most pressured first.

WATER STRIDER (no winner) Short strings and many of similar length
JP-5-0-19 (92%)

KUDZU (1 out of 1) Short strings but of mixed length
Fiskers Kudzu 2 (95%)

BACKWARDS C (1 of 4)
Mat - Backwards (96%)

Winning designs with this pattern from our bot.

MAKING IT UP AS I GO (no winner)
*Eterna Ensemble 03 (L2) (93%)

A TILTED PICTURE OF RUNNING MAN (1 of 3)
*Eterma ensemble design 07

SHAPE TEST (3 of 10)
*Eterna ensemble 8 (Sparse)
*Eterna ensemble 11 conventional
*Eterna Ensemble Design 6 (sparse 5)

Eli_Fisker · July 25, 2012, 6:41pm

Hydrogen bonds in long strings and their effect on RNA patterns‏

I have been asking Rhiju some questions about certain behaviours of RNA. Here is the conversation between us:

Hi Rhiju!

I think I might be able to explain why long strings behave as if they have different rules than short strings. It still has to do with hydrogen bonding.

“In general, the strength of the interaction between two macromolecules increases with increasing size of the interaction interface.” (Molecular biology - principles of genome function, Oxford, 2012, p. 38)

With the quote from the Molecular biology book in mind, I think it makes sense why long strings are more sloppy about what patterns they allow. The length of a string, and with it the growing number of hydrogen bonds, has in itself a stabilizing effect on the binding of the two strands. This makes it less important whether a basepair has 2 or 3 hydrogen bonds, and allows for a higher frequency of GU and AU-pairs in a string and a less specific order of basepairs. AU-pairs and GU-pairs are even seen as closing basepairs in long strings, something they rarely are with success in short strings. At least not in the EteRNA lab results. I have seen GU-pairs and AU-pairs do well as closing basepairs in naturally occurring RNAs.

Rhiju: Yes this is totally correct.
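As a small worked example of the hydrogen bond argument: G-C pairs form 3 hydrogen bonds, while A-U and G-U pairs form 2. The sketch below just adds these up for a stem. The function name and the input format (a list of two-letter pairs) are assumptions for illustration, and the count is only a crude proxy, since real stability also depends on stacking energies.

```python
# Hydrogen bonds per base pair (G-C: 3, A-U: 2, G-U wobble: 2)
H_BONDS = {
    frozenset("GC"): 3,
    frozenset("AU"): 2,
    frozenset("GU"): 2,
}


def stem_hydrogen_bonds(pairs):
    """Total hydrogen bonds in a stem given as two-letter pairs, e.g. ['GC', 'UA']."""
    total = 0
    for pair in pairs:
        key = frozenset(pair.upper())
        if len(pair) != 2 or key not in H_BONDS:
            raise ValueError(f"not a supported base pair: {pair!r}")
        total += H_BONDS[key]
    return total


short_all_gc = ["GC", "CG", "GC"]
long_no_gc = ["AU", "UA", "GU", "AU", "UA", "AU", "UG", "UA"]
print(stem_hydrogen_bonds(short_all_gc))  # 3 pairs x 3 bonds = 9
print(stem_hydrogen_bonds(long_no_gc))    # 8 pairs x 2 bonds = 16
```

So an 8 nt string with no GC-pairs at all still has more hydrogen bonds than a 3 nt string made entirely of GC-pairs.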

One more thing the book says:

“The fundamental structural unit of folded RNA molecules is the short, double-stranded helix, generally no longer than six to eight base pairs in length.” (Molecular biology - principles of genome function, Oxford, 2012, p. 54)

This also makes a weird kind of sense. As the rules for base placement get sloppier when strings are longer, runs of 3 or 4 of the same base in a row will occur more often in the same strand. Long runs of the same base (e.g. GGG, AAAA, CCC or UUUU) have shown a strong tendency to mispair elsewhere. And this pattern is most prevalent in the low scoring and badly folding RNAs in the EteRNA lab.

Having many really long strings in a design would make it harder for the RNA to keep its integrity.
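Runs like these are easy to flag automatically. Here is a minimal sketch that scans a whole sequence for runs of the same base using a regular expression. The function name and the default threshold of 3 are assumptions for illustration, and the sketch does not restrict itself to bases inside stems.

```python
import re


def same_base_runs(sequence, min_len=3):
    """Return (start_index, run) for every run of one base at least min_len long."""
    if min_len < 2:
        raise ValueError("min_len must be at least 2")
    # ([ACGU]) captures one base, \1{n,} requires it to repeat at least n more times
    pattern = r"([ACGU])\1{%d,}" % (min_len - 1)
    return [(m.start(), m.group()) for m in re.finditer(pattern, sequence.upper())]


print(same_base_runs("GGGAACUUUUCG"))
# → [(0, 'GGG'), (6, 'UUUU')]
```

Raising `min_len` to 4 would only report the longer runs.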

Rhiju: Here, I’m not so sure. I think having long RNA stems would be fine – the main issue then would be whether they become *too* stable. Also, most living cells in complex organisms (including ours) detect long RNA helices and consider them evidence of viral infection – in our cases, the cells often kill themselves!

But I do remember seeing strings much longer than 6-8 bases in naturally occurring RNA in the challenge puzzles. How can that be? And does that have anything to do with RNA type? Another thing that worries me is that naturally occurring RNA looks so different from ours. I think I don’t understand it.

Rhiju: Pretty simple reason: We have made our lab challenge puzzles small in size for ease of synthesis, so that keeps the helix lengths small. Also, we find that it is hard sometimes to reverse transcribe RNAs with long, stable stems (reverse transcription is part of our chemical mapping process), so we avoid them in lab challenges.

Eli_Fisker · August 3, 2012, 9:24pm

I actually like Rhiju’s answer on how long RNA stacks behave much better than my own guess. It is the totally logical opposite of what I’m saying about short stacks: they are too unstable because there is only a short stretch for the hydrogen bonds to form on. Long stretches for the hydrogen bonds to stabilize the stack mean a (too) super stable stack. RNA is behaving in a conceivable way.

His explanation actually fits the lab results for long-string designs better. It is much easier to make a high-scoring design there than in the other labs = more stability. And as we are not using our RNA to do things with, it has not been a problem so far. But I understand that it can be later.

I wonder what it is about long RNA strings that makes cells detect them as a virus. Is it solely the length of the helix, or is it specific types of sequences that trigger it?

I loved Quasispecies’ RNA joke that I picked up from chat:

Rule #1 - don’t trust strangers with double-stranded RNA (They are viruses)

Eli_Fisker · August 5, 2012, 11:18am

I asked whether it is the length of an RNA that gets human cells to recognize it as a virus, or whether particular sequences trigger it. And I think I found some partial answers.

I found this WIKI answer: Essentially, one way your body recognizes viruses by the antibodies it left behind the last time you were infected. Another is the sheer presence of something “different.”

I especially like the part about it getting recognized by something that is different. That difference could be length, sequence etc. Read more

Another article says it is certain sequences, or rather the lack of them, in a virus RNA that trigger a counterattack.

Molecular ‘signature’ protects cells from viruses

Eli_Fisker · May 26, 2013, 1:14pm

Note: The cloud labs showed that RNA was able to tolerate an even higher GC-percentage than the 80% I mentioned earlier as a probable limit. Here is an example from Cloud Lab 14 - Easy loops. In this lab most of the high scorers have about 79% GC-pairs. This pressured design and almost-winner has a GC-percentage of nearly 100%.

And though that will usually be at the high end, energetically pressured labs have shown they can deal with a GC-pair ratio higher than 80% and sometimes even need it, as can be seen in Cloud lab 4 - Random. Here is a design with 95% GC-pairs.

Even more interesting, designs where only part of the design is pressured, due to short stacks of similar length and perhaps also a small multiloop, seem to behave like pressured designs and have a higher need for GC-pairs. The only winner in this lab, Pumpkin Seed, has 87% GC-pairs.

Cloud lab 5 has the stem length to be called pressured and a high GC-percentage in the solves, but interestingly it wasn’t as hard to get a winner there as in a usual pressurized design. Apparently the lack of symmetry helps, and I think loop size and type also play into how pressurized a lab behaves. It appears that pressure gets worse if short stacks of equal length are packed closely together in a multiloop with adjacent stacks.

There were more pressured labs with few or no winners. Cl 11 has 1 winner.

Cloud lab 20 - Random 4 had zero winners and is pressured too, by both short stacks and symmetry.

Cloud lab 9 - Final count down. Not pressured because of short stems, but more due to symmetry and similar length stems.

Cloud lab 2 behaved partly as pressurized, more due to similar stem lengths and perhaps loops than to short stems. It had one winner with a GC-percentage of 84%.

Eli_Fisker · May 26, 2013, 11:10pm

I wanted to add a comment on the labs with long strings. For them the picture is reversed compared to the pressured designs. Just as the pressured designs can tolerate a really high percentage of GC-pairs, the long-stringed designs can tolerate a high number of AU-pairs. Here are some examples. I have picked out the one with the most extreme AU-content among the winners for these two long-stringed labs. Long-stringed designs are much more versatile, though, and can tolerate a relatively high GC-content too, up to about 87% and perhaps even higher.