[Strategy market] 2 and 3 nt bulges

I have looked through The Thinker lab and see the following pattern for successful 2 and 3 nt bulges. They both follow the same pattern.

This 2 nt bulge is from Great plot and balanced melting point no GU 1 (89%) by Jennifer Pearl.

This 3 nt bulge is from Stormed puppet solution #3 (89%) by StormedPuppet.

I rotated this one to make it more obvious that the closing GC base pairs in both bulge sizes follow the same pattern: they turn the same way when following the reading direction of the RNA.

I would like a strategy for 2 and 3 nt bulges that says:

Give +2 for each 2 or 3 nt bulge whose two GC closing pairs are flipped opposite each other. The pairs should follow a specific direction: following the RNA’s numbered sequence from nucleotide 1 upward, the green nucleotide (C) should come first (see the direction given by the numbers on the pictures).

Give +1 for bulges that have two GC closing base pairs.

I am most confident about the above part of the strategy. The rest might change a bit when we get more data.

Give -1/2 for each blue nucleotide lining up after the green nucleotides in the bulge closing, like below.

Give -1/2 if there is a GC base pair to the left of the bulge's closing base pairs, like below, no matter which way it turns.
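The scoring rules above can be sketched as a small function. This is a hypothetical illustration, not EteRNA's implementation, and it rests on several assumptions: green means C (EteRNA's color for cytosine) and blue means U; "green first" means the C of the preferred pattern is the 5'-most closing nucleotide on the bulge strand; the +1 applies only when the +2 pattern is absent; and the last two rules, whose geometry is only shown in the pictures, are passed in as precomputed inputs. With the flipped arrangement, both strands read C before G in the 5' to 3' direction, so the choice of strand does not change the result.

```python
from dataclasses import dataclass
from typing import Tuple

GC_PAIRS = {("G", "C"), ("C", "G")}


@dataclass
class Bulge:
    """Hypothetical summary of one bulge (not an EteRNA data structure)."""
    size: int                       # number of unpaired nucleotides in the bulge
    pair_5p: Tuple[str, str]        # closing pair before the bulge: (bulge-strand base, partner)
    pair_3p: Tuple[str, str]        # closing pair after the bulge: (bulge-strand base, partner)
    blues_after_green: int = 0      # U's lining up after the C's, judged from the pictures
    gc_left_neighbor: bool = False  # GC pair to the left of the closing pairs, either orientation


def strategy_score(b: Bulge) -> float:
    """Score one bulge under the proposed (hypothetical) rules."""
    if b.size not in (2, 3):
        return 0.0  # the strategy only covers 2 and 3 nt bulges
    score = 0.0
    if b.pair_5p in GC_PAIRS and b.pair_3p in GC_PAIRS:
        if b.pair_5p[0] == "C" and b.pair_3p[0] == "G":
            score += 2.0  # flipped GC pairs with green (C) first: +2
        else:
            score += 1.0  # two GC closing pairs in another arrangement: +1
    score -= 0.5 * b.blues_after_green  # -1/2 per blue after the greens
    if b.gc_left_neighbor:
        score -= 0.5  # -1/2 for a neighboring GC pair on the left
    return score


print(strategy_score(Bulge(2, ("C", "G"), ("G", "C"))))
```

Treating the +1 as a fallback avoids double-counting the same two GC pairs; if the strategy intends the bonuses to stack, the `else` branch would become an unconditional `score += 1.0`.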

In an older thread I looked at the success rate of closing stack cells for tetraloops.

For the tetraloops, here is what seemed to work best when not boosted (i.e., most like a bulge):

ACAAAAGU 67% (18/27)
UCAAAAGA 73% (29/40)
AGAAAACU 82% (18/22)
CGAAAACG 93% (14/15)
UGAAAACA 100% (9/9)

It would be interesting to see if/how this differs for bulges.

Yes, I agree. It will be interesting to see. At some spots in a design, there seems to be some directionality to which way a closing GC base pair should turn. For now I suspect it will be even stronger for bulges, and your numbers suggest there is such a tendency for tetraloops too. I am not too sure yet whether it will continue for designs like the star (a multiloop with strings sticking out from it). We have mostly worked with shorter strings than occur in nature. When there are two or more GC pairs in a string, they like to be twisted relative to each other. When the GC pairs in the multiloop ring are in a mostly fixed position to fit that directionality, the closing base pair at the tetraloop end is left to be twisted, which in turn affects the direction of the closing base pair for tetraloops. But of course this tendency does tell us something about what will be more usual for this specific type of design, which again can be useful to know.

It may be too early to say much about 2-0 and 3-0 bulges at this point, given the small amount of data we have. I looked at old labs with a score of 90 or better to see how their 2-0 and 1-0 bulges worked. We have more data for the 1-0 bulges since they have been used more in past labs.

First, the 2-0 bulges. Here are the data for the “opening” stack cell that leads into the bulge:

CC-GG 1/1 100%
UG-CA 1/1 100%
AG-CU 3/4 75%
UC-GA 5/7 71%
AC-GU 2/4 50%
GC-GC 1/2 50%

Figure 1.

Figure 1 shows an example of the AG-CU version, where 38-39 are AG and 58-59 are CU. This worked in 75% of the cases (three out of four instances). Half of the cases were tested only one or two times, so 100% success out of one trial should be taken with a cup of salt. Even the most frequent case occurred only seven times.
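To put a number on the "cup of salt", one standard option (not used in the original post) is a Wilson score interval for a success proportion k/n. A minimal sketch, assuming each listed instance is an independent trial and using z = 1.96 for an approximate 95% interval, applied to the 2-0 "opening" counts above:

```python
import math
from typing import Tuple


def wilson_interval(successes: int, trials: int, z: float = 1.96) -> Tuple[float, float]:
    """Approximate Wilson score interval for successes/trials (z = 1.96 gives ~95%)."""
    if trials <= 0:
        raise ValueError("trials must be positive")
    p = successes / trials
    denom = 1 + z * z / trials
    center = (p + z * z / (2 * trials)) / denom
    half = (z / denom) * math.sqrt(p * (1 - p) / trials + z * z / (4 * trials * trials))
    # Clamp to [0, 1] to absorb floating-point rounding at the extremes.
    return max(0.0, center - half), min(1.0, center + half)


# 2-0 "opening" stack cells from the table above: cell -> (successes, instances)
opening_2_0 = {"CC-GG": (1, 1), "UG-CA": (1, 1), "AG-CU": (3, 4),
               "UC-GA": (5, 7), "AC-GU": (2, 4), "GC-GC": (1, 2)}

for cell, (k, n) in opening_2_0.items():
    lo, hi = wilson_interval(k, n)
    print(f"{cell}  {k}/{n}  {k / n:.0%}  95% CI {lo:.0%}-{hi:.0%}")
```

For a 1/1 cell the interval runs from about 21% to 100%, so a single success says very little about the true rate.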

AG-CU 3/3 100%
GC-GC 1/1 100%
GC-GU 1/1 100%
UA-UA 1/1 100%
UC-GA 5/5 100%
UG-CA 1/1 100%
AC-GU 6/7 86%

The “closing” stack cell cases seemed to be more successful. Is this real, i.e., is the far side of the bulge easier to hold closed, or is it just a fluke of the small sample size? In this case, Figure 1 shows the UC-GA case, with 56-57 being UC and 42-43 being GA.
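One quick way to frame the opening-versus-closing question is to pool each 2-0 table. This is a minimal sketch using only the counts posted above; the pooling is an illustration, not part of the original analysis. Both tables total 19 instances, which is consistent with the same bulges each contributing one opening and one closing cell; if so, the two samples are paired rather than independent, and any formal test would need to account for that.

```python
from typing import Dict, Tuple

Counts = Dict[str, Tuple[int, int]]  # stack cell -> (successes, instances)

opening_2_0: Counts = {"CC-GG": (1, 1), "UG-CA": (1, 1), "AG-CU": (3, 4),
                       "UC-GA": (5, 7), "AC-GU": (2, 4), "GC-GC": (1, 2)}
closing_2_0: Counts = {"AG-CU": (3, 3), "GC-GC": (1, 1), "GC-GU": (1, 1),
                       "UA-UA": (1, 1), "UC-GA": (5, 5), "UG-CA": (1, 1),
                       "AC-GU": (6, 7)}


def pooled(table: Counts) -> Tuple[int, int]:
    """Sum successes and instances over all stack cells in one table."""
    return (sum(k for k, _ in table.values()),
            sum(n for _, n in table.values()))


for name, table in (("opening", opening_2_0), ("closing", closing_2_0)):
    k, n = pooled(table)
    print(f"{name}: {k}/{n} = {k / n:.0%}")
```

This prints `opening: 13/19 = 68%` and `closing: 18/19 = 95%`, which puts the "more successful" impression into a single comparable number per side.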

For 1-0 bulges, there is more data, but many cases have still been tried only once in a relatively successful design. Even with more data, the sample sizes are still small. For the “opening” stack cell:

AU-UA 1/1 100%
CU-AG 1/1 100%
CG-CG 5/6 83%
GC-GC 4/5 80%
UC-GA 9/12 75%
AG-CU 5/7 71%
UG-CA 7/10 70%
AC-GU 3/5 60%
GG-CC 1/2 50%
GU-AC 1/2 50%

and the “closing” stack cell:

CG-CG 5/5 100%
GA-UC 1/1 100%
GC-GC 1/1 100%
AG-CU 12/14 86%
AC-GU 8/10 80%
UC-GA 6/8 75%
CC-GG 2/3 67%
UG-CA 3/5 60%
GG-CC 1/4 25%

The sample sizes were so small that I did not consider interactions between the opening and closing stack cells, which of course may exist.

I didn’t sample the 3-0 bulges as there were even fewer instances of them in past labs.

True, it might be early to say much about 2 and 3 nt bulges. I just found it compelling that most of the working bulges (bulges that looked solid in the SHAPE data) were following a particular pattern, and that the same pattern for working bulges came up in both sizes. I know these are just tendencies, but they are the tendencies I see so far.

I probably should have mentioned that, contrary to usual practice, I counted all designs that had working bulges, no matter their overall score. That gave me a bit more data to work with. I thought the overall score of a design might matter less when I was just focusing on whether a smaller element in the design, like a bulge, was working or not.

There were a few working bulges that had another pattern, and I haven't made a strategy for those, as I need more data to tell whether the pattern continues. But I find it highly likely that more successful bulge patterns exist. We just need more data for them to reveal themselves.

As for the tetraloops that you have investigated, you are right, there are much more data for them. I am just pointing out that the pattern might occur for reasons other than the stability of the pattern itself.

Thanks for your comments.

There is an RNA database with sample sequences and shapes. I extracted the stacks from these and looked at the end stack cells. Here are the counts for the top 30 closing stack cells:

15288 CC-GG
14885 GG-CC
11903 GC-GC
6217 GU-GC
5747 CG-CG
4840 AC-GU
4760 AG-CU
4464 UC-GA
3876 GU-AC
3858 GA-UC
3610 CU-GG
3220 UG-CA
3107 CU-AG
2294 CA-UG
2067 GC-GU
1387 UC-GG
1351 GG-CU
1177 AU-GU
998 AU-AU
861 UA-UA
857 UG-CG
836 GA-GC
767 UU-AA
732 GG-AC
632 AA-UU
581 GG-UC
577 CU-UG
525 UG-CU
491 UU-GA
470 GU-UC

Interesting that some stacks are closed by UU according to the database?!

Note that counts for stacks of size 2 are doubled.
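The extraction described above (find every stack in a database structure and tally its end stack cells) can be sketched roughly as follows. The database and JL's actual code are not given in the thread, so this is a hypothetical reconstruction under stated assumptions: structures are well-formed dot-bracket strings (pseudoknot brackets are ignored), and each end cell is written loop-facing strand first, as in the tetraloop notation above (ACAAAAGU gives AC-GU). A stack of only two pairs has a single stack cell that is both of its ends, so it is tallied once per end, which is one way the doubling for size-2 stacks can arise.

```python
from collections import Counter
from typing import Dict, List, Tuple


def pair_table(structure: str) -> Dict[int, int]:
    """Map each paired position (0-based) to its partner; only '(' and ')' are read."""
    open_positions: List[int] = []
    pairs: Dict[int, int] = {}
    for k, ch in enumerate(structure):
        if ch == "(":
            open_positions.append(k)
        elif ch == ")":
            i = open_positions.pop()
            pairs[i], pairs[k] = k, i
    return pairs


def find_stacks(structure: str) -> List[List[Tuple[int, int]]]:
    """Group pairs (i, j), i < j, into maximal runs (i, j), (i+1, j-1), ..."""
    pairs = pair_table(structure)
    seen = set()
    runs = []
    for i in sorted(p for p, q in pairs.items() if p < q):
        if i in seen:
            continue  # already part of a run that started further out
        run = []
        a, b = i, pairs[i]
        while a < b and pairs.get(a) == b:
            run.append((a, b))
            seen.add(a)
            a, b = a + 1, b - 1
        runs.append(run)
    return runs


def end_cells(seq: str, structure: str) -> Counter:
    """Tally both end stack cells of every stack, loop-facing strand first (assumed)."""
    counts: Counter = Counter()
    for run in find_stacks(structure):
        if len(run) < 2:
            continue  # a lone pair has no stack cell
        (i, j), (i_next, j_prev) = run[0], run[1]    # outermost two pairs
        (m_prev, n_next), (m, n) = run[-2], run[-1]  # innermost two pairs
        counts[seq[j_prev:j + 1] + "-" + seq[i:i_next + 1]] += 1  # outer end
        counts[seq[m_prev:m + 1] + "-" + seq[n:n_next + 1]] += 1  # inner end, next to the enclosed loop
        # A two-pair stack has one cell, so it is tallied once per end.
    return counts


print(end_cells("ACAAAAGU", "((....))"))
```

For the tetraloop example this yields one AC-GU (the loop side) and one GU-AC (the outer side). Summing such counters over all database entries would give a table like the one above, with repeated stacks counted once per entry.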

JL, are the numbers in front of these examples the number of working stacks in front of a bulge? If so, I should reconsider the last part of my strategy about penalizing the GC pair one spot to the left of the bulge.

Just the number of working stack ends in the database; if the same stack appeared in more than one RNA DB entry, it is counted more than once. I didn't have the contextual information yesterday to know whether it fronts an end loop, loop, bulge, multiloop or hook. That might be possible to pull out from the raw data, but not today. :)

Ok, got you.

Dear Eli,

Your strategy has been added to our implementation queue with task id 144. You can check the schedule of the implementation here.

Thanks for sharing your idea!

EteRNA team

2 NT BULGES

It looks like the 2 nt bulge shows a mixed nature. It still digs the CC bottom pattern from the 1 nt bulge, which I also mentioned in my post on Bulge tendencies for 1 nt bulges, but it shows a preference for the 2 nt pattern I mentioned above: C first in the sequence and then G.

So except for the Sea Turtle lab, the other labs show a tendency for the pattern that I mentioned in this strategy.

It also looks like there is some preference against having a GC pair next to either of the bulge's closing base pairs, unless the design has many short strings, like The Thinker and Cloud Lab 4 - Random. It's still early on this, though.

It seems pretty consistent that 2 nt bulges want flipped GC pairs opposite each other, and in a particular order.

THE THINKER

THE SEA TURTLE

However, the picture changes here. Could the many short stacks make it necessary to vary the pattern more?

In this one, the first 2 nt bulge follows the CC inner-side bulge pattern from the 1 nt bulge.

CLOUD LAB 4 - RANDOM

It also seems that the use of an AU pair on one of the sides has opened up. That reminds me that in designs with many equal-length stacks 3 nt long or shorter, I have seen a higher frequency of closing AU and sometimes even GU pairs. Here the need to break up similar patterns might overrule other common design rules. However, when I check the designs in this Cloud lab that have AU as part of the 2 nt bulge closing pairs, half of them show instabilities. The future will tell on this one.

2 NT BULGES

I decided to rework the post after we got the new and more accurate data.


After the update, there is still a tendency for double CCs at the bottom of a 2 nt bulge, though it is a little weaker when counting all the designs scoring over 91%. A mixture of CG or GC also does well, with a slight tendency for CG over GC.

It also seems that the use of an AU pair on one of the sides has opened up. That reminds me that in designs with many equal-length stacks 3 nt long or shorter, I have seen a higher frequency of closing AU and sometimes even GU pairs. Here the need to break up similar patterns might overrule other common design rules. However, when I check the designs in this Cloud lab that have AU as part of the 2 nt bulge closing pairs, either one or both bases in that base pair show instabilities in the SHAPE data. The future will tell on this one.