Why do the repetition rule and the 40% A rule exist?

When I was trying to solve problems, some questions came up.

We have a repetition rule and a rule limiting adenine to a maximum of 40%.
For repetition, I roughly know that it could cause some misfolding. But I want to know the specific reason why each type of nucleotide shouldn't be repeated.

For the adenine limitation rule: they will run PCR to amplify the template DNA after getting some candidates. Adenine is limited to a maximum of 40%. In DNA, it means G or C is more than 60%, so there could be many GC pairs. But I know that many GC pairs make PCR hard, since they raise the temperature needed. So I think adenine should have a lower bound, not an upper bound. What do you think?
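
A side note on the arithmetic in this question: a sequence has four base types, so a 40% cap on A only guarantees that the other three bases together make up at least 60%. How much of that is G and C depends on how much U (T in the DNA template) is present. A minimal Python sketch, using a made-up sequence rather than a real lab design:

```python
from collections import Counter

def base_fractions(seq: str) -> dict[str, float]:
    """Return the fraction of each base (A, C, G, U) in a sequence."""
    seq = seq.upper().replace("T", "U")  # treat DNA T as RNA U
    counts = Counter(seq)
    return {base: counts[base] / len(seq) for base in "ACGU"}

# Hypothetical 20-nt sequence, for illustration only.
example = "AAAAUUUUUUUUGGGGCCCC"
fractions = base_fractions(example)
gc_fraction = fractions["G"] + fractions["C"]
print(f"A = {fractions['A']:.0%}, GC = {gc_fraction:.0%}")
```

In this example A is 20%, well under the cap, while GC is only 40%, because U takes up the rest.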


hi rutan3207!

The 40% adenine max dates back to when our fluorescent riboswitches were quite new. A lot of the lab designs got lost or had very low cluster counts.

As Johan said in his lab conclusion:
“The highest scoring designs had very few clusters, so beware when interpreting the results.”

Here is part of the backstory:
Cluster Siblings
Clusters in relation to base percentages

The end of the story was the adenine limitation. 

Actually for switches there seems to be an additional benefit to going lower on A’s. Repeat A’s in particular. 

I have made a few scripted tutorials on the topic.

Riboswitches #6: Repeat of A base frequencies in static lab designs
Riboswitches #7: Repeat of A base frequencies in switch lab designs 

And while I have mentioned earlier that repeat U’s seemed to raise the number of clusters, there also seems to be a negative relationship between having lots of repeat U’s and getting a good switch, although more repeat U’s than repeat A’s seem to be tolerated.


Thank you for your comment. I have some questions.

  1. It seems that ‘cluster’ is a key word for explaining what I was wondering about. I searched for the meaning of cluster, but most of the Google results are about RNA clusters in NGS. Is that what you mean? Could you please let me know the exact meaning of cluster?
  2. For A repetition, is there any reason why it is bad? Or do we just think so from the results of comparing the static case and the switch case?
  3. There is also a cytosine repeat limitation. Is the reason for that limitation the same as in the U case?

Thank you!!

In addition, could you please let me know why the numbers (3G, 3C, 4A) were chosen?

hi rutan3207!

  1. Omei has given an explanation of cluster counts here.
  2. Repeat A’s may result in a stable structure. With 6-7 A’s in a row, they start to leave the mark of a stable structure in the SHAPE data for our earlier static designs. Designs with many repeat A’s also scored worse.
  3. One problem with longer stretches of repeat bases is that they tend to cause trouble in the lab process. Ann has given an explanation here.

If you check RNA designs in the RFAM database, you will be able to find designs with more than 3 G’s in a row, and probably also longer stretches of C’s. So nature can handle more, but the lab method we have been using won’t take that. The limit was set so we would lose fewer designs during the lab process, in other words so we would get as many designs back as possible. If you check labs, you can almost always find some designs that haven’t synthesized properly. Those we don’t get data for. Earlier we had way more of these.
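
For checking a design against limits like these before submitting, a minimal Python sketch is below. It assumes the numbers from the question (3G, 3C, 4A) are the longest runs allowed for each base. The function names and the example sequence are made up for illustration and are not part of any lab tool.

```python
from itertools import groupby

# Assumed limits: longest allowed run of each base (from "3G, 3C, 4A").
MAX_RUN = {"G": 3, "C": 3, "A": 4}

def longest_runs(seq: str) -> dict[str, int]:
    """Return the longest consecutive run of each base present in seq."""
    runs: dict[str, int] = {}
    for base, group in groupby(seq.upper()):
        length = sum(1 for _ in group)
        runs[base] = max(runs.get(base, 0), length)
    return runs

def run_violations(seq: str, max_run: dict[str, int] = MAX_RUN) -> dict[str, int]:
    """Return the bases whose longest run exceeds its limit, with that run length."""
    runs = longest_runs(seq)
    return {b: n for b, n in runs.items() if b in max_run and n > max_run[b]}

# Hypothetical sequence, for illustration only.
example = "GGGGAUCCCAAAAAUG"
print(longest_runs(example))
print(run_violations(example))
```

With the assumed limits, the example is flagged for its run of 4 G’s and its run of 5 A’s, while the run of 3 C’s passes.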


Thank you, fisker.
I still have some questions.
As I understand it, clusters are used to amplify the fluorescent signal, and they are bound to a surface.

  1. That means the movement of the RNA is limited, since it is attached. Doesn't that have a bad effect on forming MS2 when the molecules are added?
  2. The comments of Daniel Cantu in the link below describe the steps of synthesizing RNA, and those steps are used to judge the folding subscore. There are also switch and baseline subscores in many switch labs. So are there experiments that judge each of them separately? And which experiment includes the construction of clusters?
  3. In your link, low clustering is good, and the graph plotting the number of clusters versus A (%) shows that the number of clusters decreases as the percentage goes up. That means we need many A's to keep the number of clusters low. Judging from these facts, a minimum of 40% A seems reasonable as a restriction. But why does the lab use a maximum of 40% A?

I am worried about whether my questions are fully understandable to you, since my English isn't good enough… I hope you get them.

Thank you always, fisker…

Here is my opinion:
“Why does lab use maximum 40% of A??” Too many A's have always given stinky lab results but pretty good player puzzle results. If you look at real RNA strings, they have all sorts of what looks like junk, but very few, if any, have large amounts of A's. The restrictions create testing issues and are there to point us to a usable lab solution.

Thank you for replying to me, JR

Then why 40%? Is it just intuition or observation from results, not a scientific basis?

I would say observation of results, which is (at least to me) scientifically based, since it is not an “off the top of my head” exclusion.