Naturally Occurring RNAs of 60-90nt

jandersonlee · November 18, 2014, 6:27pm

Just curious if there are some naturally occurring RNAs of length 60-90 nt with interesting folding that might be worth using as the basis for a lab or lab pilot. I know that transfer RNAs are about 70 nt, but they seem to have a lot of methylated bases in vivo and not many helix pairs to play with.

Brourd · November 18, 2014, 7:25pm

I suppose there are a variety of factors in your query that may help to narrow down options.

Interesting Folding? What constitutes an interesting fold?
An interesting tertiary structure?
An interesting secondary structure?
An interesting quaternary interaction?
Some biological function? Is it a function that is dependent on in vivo conditions, or is it something that can be tested in vitro?
Pseudoknots?
Multistate RNA?
Why naturally occurring? What kind of information do you aim to gain from any experiment run using this naturally occurring RNA?

Some of these may be more interesting from an artificially engineered viewpoint, such as the work I am doing with multistate sequences.

jandersonlee · November 19, 2014, 5:14am

As you mention, Brourd, there are a variety of ways that a structure can be interesting. I was trying to be open-ended in the question on purpose.

> Why naturally occurring?

My thought was that we have a billion years or so of nature selecting sequences and shapes through evolution. It might be useful to study some of those as a jumping-off point: either to figure out what makes them work well, or to see if we can improve on them somehow, to the degree that we can measure. It is Newton’s “standing on the shoulders of giants” idea, only standing on the shoulders of natural selection.

Also, nature has already found some (many?) of the structures that work, in the sense that the shapes are possible and sequences exist that fold into them. We don’t know if a 1-1-1-1-1 multiloop makes sense, but 1b23_1, for instance, shows us an example of a 2-2-4 loop:

CGUAACA|UGUAGCG|CGUCUAGUCC|GGAACG,(((…((|))…(((|)))…(((|))))))

And 1c0a_1 shows us a 1-0-4-0 loop:

CGGAGUU|AAUACC|GGGGUCGCGG|CCGCCG,(((.(((|)))(((|)))…(((|))))))
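For reading loop sizes like these off a structure automatically, here is a minimal sketch. It assumes a plain dot-bracket string with no strand breaks or pseudoknots, and it assumes the loop names above list the unpaired run before each branch, going 5' to 3' around the loop, with the final number being the run before the closing pair. The function names and the toy structure are hypothetical; the toy is not either of the PDB fragments above.

```python
def pair_table(structure):
    """Map every paired index to its partner index."""
    stack, partner = [], {}
    for i, ch in enumerate(structure):
        if ch == "(":
            stack.append(i)
        elif ch == ")":
            if not stack:
                raise ValueError("unbalanced ')' at position %d" % i)
            j = stack.pop()
            partner[i], partner[j] = j, i
    if stack:
        raise ValueError("unbalanced '(' at position %d" % stack[-1])
    return partner


def loop_gaps(structure, i):
    """Unpaired run lengths around the loop closed by the pair opening at i.

    Walks 5'->3' inside the closing pair, jumping over each branch helix,
    and records the unpaired run before each branch and before the close.
    """
    partner = pair_table(structure)
    j = partner[i]
    gaps, run, k = [], 0, i + 1
    while k < j:
        if k in partner:          # opening base of a branch helix
            gaps.append(run)
            run = 0
            k = partner[k] + 1    # skip the whole branch
        else:
            run += 1
            k += 1
    gaps.append(run)              # run before the closing base
    return gaps


# Toy three-branch multiloop (hypothetical example).
toy = "(..((...))..((...))....((...)))"
print("-".join(str(n) for n in loop_gaps(toy, 0)))  # 2-2-4-0
```

A hairpin comes back as a single number and an interior loop as two, so the same walk could be used to catalogue every loop type in a set of natural structures.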

Can they be reused? Can they be improved?

> What kind of information do you aim to gain from any experiment run using this naturally occurring RNA?

For one thing, natural RNA sequences are better studied at this point. There are x-ray diffraction models of many naturally occurring RNAs, which can perhaps act as a ground truth for better understanding SHAPE data results, for instance. It might also be interesting to look at how the folds predicted by models such as the Vienna energy model compare to those determined by other means.
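One simple way to make that comparison concrete is to count shared base pairs between a predicted structure and a reference structure. The sketch below assumes both are already available as dot-bracket strings of equal length (where they come from, e.g. an energy model and a crystal structure, is outside the sketch), and the function names and example strings are hypothetical.

```python
def base_pairs(structure):
    """Return the set of (i, j) base pairs in a dot-bracket string."""
    stack, pairs = [], set()
    for i, ch in enumerate(structure):
        if ch == "(":
            stack.append(i)
        elif ch == ")":
            pairs.add((stack.pop(), i))
    return pairs


def compare_structures(predicted, reference):
    """Return (sensitivity, ppv) of predicted pairs against reference pairs.

    sensitivity = shared pairs / reference pairs
    ppv         = shared pairs / predicted pairs
    """
    if len(predicted) != len(reference):
        raise ValueError("structures must have the same length")
    p, r = base_pairs(predicted), base_pairs(reference)
    shared = len(p & r)
    sensitivity = shared / len(r) if r else 0.0
    ppv = shared / len(p) if p else 0.0
    return sensitivity, ppv


# Made-up example: the prediction misses the outermost reference pair.
print(compare_structures(".(((...))).", "((((...))))"))  # (0.75, 1.0)
```

Pair-level scores like these ignore tertiary contacts entirely, so they only say how well the secondary structure agrees.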

Also, there might be a possibility of using natural RNA as a (refined) strawman or template/toolbox for designs. One idea off the top of my head is to compare how SHAPE data for natural sequences with known x-ray structures relates to SHAPE data for altered designs (altered helices, loops, or both). Another might be to take known loops from natural sequences and try to fit them together in new combinations using human-crafted, model-tuned, or lab data-mined stacks. Can we construct a tinkertoy kit from natural RNA + lab-data results + energy model + rules for combining? One step at a time.

Still brainstorming at this point.

Brourd · November 19, 2014, 8:33am

Motif-based design, the basis of the Eterna3D “game,” is tricky.

You could take any random secondary structure, and fill it in with all naturally occurring sequences, and it could fold into that target secondary structure.

You could take an algorithm that fills in an RNA secondary structure with “data-mined elements,” and it could fold into that target secondary structure.

You could take an algorithm that randomly fills in the sequence with A-U and G-C base pairs then optimizes the partition function, and it could fold into that target secondary structure.

You could have an algorithm that uses a couple of well tested design processes and heuristics, and it could fold into that target secondary structure.

In fact, at the level of data we see, they may all be equally valid. Preferring one over the other is pointless at that point; it all comes down to a matter of convenience and speed.
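To make the random-fill option above concrete, here is a minimal sketch of only its first step: seeding a target secondary structure with random A-U and G-C pairs. The partition-function optimization that would follow is not shown, filling unpaired positions with A is an assumption made for illustration, and `random_fill` is a hypothetical name.

```python
import random

# Watson-Crick pairs in both orientations (no G-U wobble in this sketch).
PAIRS = [("A", "U"), ("U", "A"), ("G", "C"), ("C", "G")]


def random_fill(structure, rng=None):
    """Seed a dot-bracket target with random complementary pairs.

    Unpaired positions are set to 'A' (an assumption, not a rule).
    """
    rng = rng or random.Random()
    seq = ["A"] * len(structure)
    stack = []
    for i, ch in enumerate(structure):
        if ch == "(":
            stack.append(i)
        elif ch == ")":
            j = stack.pop()
            seq[j], seq[i] = rng.choice(PAIRS)  # 5' base, 3' base
    return "".join(seq)


# Passing a seeded generator makes the draw repeatable.
print(random_fill("((((....))))", random.Random(0)))
```

A seed like this only guarantees that the intended pairs are complementary; whether the sequence actually prefers the target fold still has to be checked with an energy model or in the lab.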

However, as soon as you design for tertiary structure, you run into a brick wall.

There is no way to directly measure the tertiary fold of an RNA sequence using SHAPE chemical probing.

The only reason the motif-based design worked is the addition of a tetraloop receptor sequence as an indicator. When the tetraloop is docked, the average reactivity of the tetraloop and the 2-way junction it docks into will be lower. Without that, you can’t determine if the RNA folds into the intended 3D structure or something else entirely. And if this cannot be proven, there is no way to experimentally verify the tertiary structure of the motif you have inserted into the RNA.
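As a rough illustration of that readout, the sketch below compares the mean reactivity over the tetraloop and receptor positions between a design and a control. Using a separate control profile as the baseline is an assumption for illustration, as are the function names, the positions, and the reactivity values, which are invented and not lab data.

```python
from statistics import mean


def mean_reactivity(reactivities, positions):
    """Average reactivity over the given 0-based positions."""
    return mean(reactivities[i] for i in positions)


def docking_signal(design, control, positions):
    """Control mean minus design mean over the indicator positions.

    A positive value means the indicator region is less reactive
    (more protected) in the design than in the control.
    """
    return mean_reactivity(control, positions) - mean_reactivity(design, positions)


# Invented per-nucleotide reactivities; positions 0 and 1 stand in for
# the tetraloop and receptor nucleotides.
control = [1.0, 1.0, 1.0, 1.0]
design = [0.5, 0.5, 1.0, 1.0]
print(docking_signal(design, control, [0, 1]))  # 0.5
```

A drop in this average is consistent with docking but does not by itself prove the intended 3D fold, which is the limitation described above.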

And improving on the motifs? Even more difficult, for one reason alone: how do you improve a motif and then experimentally verify that it is improved?

At best, you may be able to focus on your secondary goal: chemically probing motifs, and then analyzing the reactivity signal based on the interactions each nucleotide is participating in. The Das Lab and probably the Weeks Lab have also looked into this, so it is worth reading through their published papers.

http://daslab.stanford.edu/das_public…

http://www.chem.unc.edu/rna/publicati…

You may also want to look into the paper for the RNA Motif Atlas, the database that all of the motifs used for Eterna3D were pulled from.

http://rnajournal.cshlp.org/content/e…