Thoughts about potential difference between messenger RNA and our kind of RNA

Eli_Fisker · October 29, 2013, 10:13am

I recently saw a codon table in Brock’s Biology of Microorganisms (page 204), and it made me think that this would give a string of RNA a very different base distribution from what I knew from Eterna. It was a codon table for a bacterial species. I also looked up a codon table for humans, which also lists how often each codon is used.

http://www.kazusa.or.jp/codon/cgi-bin…

When an alphabet of 20 amino acids is built from blocks of 3 bases (a codon) using just 4 letters, there must be a high number of places in messenger RNA where the same base appears more than once in a row, just from pure statistics.

Of the 64 possible codons, almost half (28 in the table above) have dinucleotides or trinucleotides in them (bases of the same kind in a row). And they are generally among the codons with the higher frequencies. When codons are put together, even more bases of the same kind get the chance to be in a row: across just two neighbouring codons, a run of identical bases can be anywhere from 2 to 6 bases long. Proteins can also have several of the same amino acid in a row, though I don’t have a feel for how typical this is in proteins compared to our RNA. I have only seen repeated amino acids now and then in books.
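The codon counts above can be checked by brute force. This short Python sketch (my own illustration, not taken from Brock or the codon table site) lists all 64 codons, counts the ones with at least two identical neighbouring bases, and finds the longest run possible across two codons placed side by side. It also shows that 28 is exactly what pure chance predicts: a codon has no repeat only if the second base differs from the first and the third differs from the second, which leaves 4 × 3 × 3 = 36 codons, so 64 - 36 = 28 contain a repeat. It does not look at the usage frequencies.

```python
from itertools import product

BASES = "ACGU"


def longest_run(seq: str) -> int:
    """Length of the longest stretch of identical neighbouring bases (seq must be non-empty)."""
    best = run = 1
    for prev, cur in zip(seq, seq[1:]):
        run = run + 1 if cur == prev else 1
        best = max(best, run)
    return best


codons = ["".join(triplet) for triplet in product(BASES, repeat=3)]

# Codons with at least one pair of identical neighbours: XXY, YXX or XXX.
with_repeat = [c for c in codons if longest_run(c) >= 2]

# Longest run of one base that can appear when two codons sit side by side.
max_run_two_codons = max(longest_run(a + b) for a in codons for b in codons)

print(len(codons), len(with_repeat), max_run_two_codons)  # 64 28 6
```

The maximum of 6 comes from pairs like AAA followed by AAA.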

I think this will make messenger RNA quite different from the RNA we are playing with, which we intend to fold into a very specific structure. Most of the RNA shapes we play with love variation and don’t like too many double bases, or too many base pairs turned the same way, or longer repeats. The exception is longer stems, which are more tolerant.

That made me think that our RNA must behave differently, as it is supposed to fold up into a very specific shape. Early on I noticed that, for the lab designs to do well, the bases liked to be twisted in relation to each other and generally varied. Longer stems are a bit more tolerant of repeat bases, though.

https://getsatisfaction.com/eternagam…

From what I understand, messenger RNA also has a 3D structure, so it folds up as well. But I’m imagining a much looser and weaker fold. I suspect the tangle is not as tight as in our (mostly shorter) RNA with its specific structure, or as in tRNA.

If it is folded too strongly, will it be able to unfold when it has to be read and made into protein? That makes me think about how RNA like ours gets made in a cell. I’m thinking that all the runs of two or more identical bases make sure that the mRNA doesn’t get too tangled up in itself. That should make it easier for splicing and so on to happen.

I have read that the base sequence can stop transcription. There is a short sequence near the end of the transcript that folds into a hairpin and thus stalls and ends transcription by the RNA polymerase. I don’t know if the ribosome that turns mRNA into protein is equally prone to being stopped. It is a lot bigger than the polymerase, so perhaps it is much more stable. As I understand it, it only gets stopped by stop codons, which makes me think it might be stronger.

So here is what I think. Messenger RNA will have a different frequency of dinucleotides and longer runs of identical bases. Having this many identical bases in a row over a long stretch ensures that it doesn’t fold too strictly, but stays loose enough to get untangled. I don’t know if it really is so. Either way, I find it fascinating that our RNA sequences appear to be so different from mRNA, and I’m mainly just trying to understand what that means.

In my experience, double nucleotides or longer repeats are much more welcome in loop areas than in stems. And a loop is exactly characterized by not binding (at least most of the loop does not, if it is big and the bases are well picked). So I’m imagining mRNA not being nearly as folded and tangled up as our kind of RNA, which folds into a particular structure.

Which reminds me of another thing I suspect: a stem with many base pairs in a row, all turned the same way, will be much easier to split than a stem with more flipped and varied content. I have mentioned earlier that I think double bases and longer repeats help a stem split and make it bind less strongly.

Just to get a feel for what an mRNA could look like, I took a random mRNA from NCBI. Sure enough, there are a lot of bases of the same kind in a row. (NCBI writes the mRNA in DNA letters, so it shows T where the mRNA has U.)

http://www.ncbi.nlm.nih.gov/nuccore/N…
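Whether a DNA-letter sequence is the coding (sense) strand or the template strand changes how it becomes RNA. As far as I know, NCBI mRNA records are written in the same orientation as the mRNA, so only T needs to become U. A true template strand would have to be complemented and reversed first. A small Python sketch of both cases; the function names are just for illustration:

```python
# DNA base -> the RNA base that pairs with it
PAIRING_RNA_BASE = str.maketrans("ACGT", "UGCA")


def sense_strand_to_rna(dna: str) -> str:
    """Coding (sense) strand in DNA letters, 5'->3' -> mRNA 5'->3': only T becomes U."""
    return dna.upper().replace("T", "U")


def template_strand_to_rna(dna: str) -> str:
    """Template strand written 5'->3' -> mRNA 5'->3': pair every base, then reverse."""
    return dna.upper().translate(PAIRING_RNA_BASE)[::-1]


print(sense_strand_to_rna("ATGGCCAAG"))     # AUGGCCAAG
print(template_strand_to_rna("CTTGGCCAT"))  # AUGGCCAAG
```

Both calls give the same mRNA because CTTGGCCAT is the template strand that pairs with ATGGCCAAG.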

For the fun of it, I also looked at some of the Eterna Classic designs. Vienna designs generally have more repeat bases than Nupack designs and ours. Nupack usually scores better than Vienna, and players start getting better scores as the shapes get harder.

I’m imagining something like this: a defined structure in our small structural RNAs, and a messy but not too tangled structure for mRNA.

tRNA

Messenger RNA

Question: Are messenger RNAs more short-lived than structurally folded RNAs? I’m trying to understand what the amount of repeat bases means for degradation speed.

Ok, let me hear what you guys think about this.

JR1 · October 29, 2013, 6:54pm

Here is a short discussion I liked, so I borrowed it from somewhere on the internet:

An organism’s DNA can be regarded as a set of instructions for putting together all the proteins it requires. The amino acid sequence necessary for each protein is encoded in the DNA in the form of groups of three nucleotides known as codons, each of which represents a particular amino acid unit. The processes of DNA transcription and RNA translation allow these units to be assembled into the correct sequences to form the necessary proteins when cells divide.
First, the DNA is transcribed to make a strand of messenger RNA, or mRNA. The mRNA moves out of the nucleus and into the cell’s cytoplasm to a ribosome, where translation takes place. The mRNA acts as a template for amino acids, allowing them to be joined together. For each codon, transfer RNA, or tRNA, carries the appropriate free amino acid from the cytoplasm to the ribosome where they are joined to the existing chain. As the mRNA is translated, the units are joined to form the specific sequence for that protein.

So where does that leave EteRNA? We may want to understand nature’s RNA but not necessarily duplicate it. Instead of designing a Swiss army knife (which is probably closer to nature’s RNA), we want to design switchblades (good at just one thing and very effective if handled correctly).

That does leave open the question: do lab designs with long strings of codons synthesize better than our regular designs? I’ll propose some labs to find out.

Eli_Fisker · October 29, 2013, 7:33pm

Thx for your explanation and thoughts. I love your likening of nature’s RNA to a Swiss army knife, and our RNA to a switchblade.

I look forward to seeing what your labs turn up. I suspect that strings with an mRNA-like sequence will fold less well. I also look forward to seeing how a realistic mRNA sequence folds. I know that mRNA is on average much longer than our RNA, but it is really nice that we now have access to longer sequences.

Good luck!

nando · October 29, 2013, 8:28pm

In vivo, scientists have already identified dozens (literally) of different classes of RNA. Messenger RNA is the most famous, but it’s only one kind of RNA. The point is that in (very) many cases, we have absolutely no clue what the purpose of the sequences is.

What we know from a general point of view is that RNA serves as an information storage device (a list of codons) and/or participates in the metabolism of the cell, essentially by means of its (often flexible) 3-dimensional shape.

In EteRNA, up until now, our experiments have been limited to the in silico and in vitro domains. And I believe it is completely intentional that experimental conditions are kept as simple as possible. What do people do when they try to study a complex system? They take each element separately, study it thoroughly until they understand the basics, and then go on to more complex stuff.

Take Physics and a theme like pendulums and springs. You start with simple things, one mass, no friction, etc. And then you make the system more and more complex. (just a plug: I strongly recommend the lectures of Walter Lewin, for an amazingly fun way to learn Physics, a famous example at http://videolectures.net/mit801f99_le… )

Coming back to RNA and EteRNA. It seems to me very natural that this community should first try to master the basics. And the basics currently mean, RNA and nothing else, or barely some Magnesium ions to help folding.

The day we seem to understand things better (I believe we still have a long work ahead of us with switches and some tougher single states), we may start thinking about manipulating the environment by modifying different factors. And there are awfully many of them: temperature, ionic concentrations, pH are the most obvious ones. And then we could play with dimers (yes, I know, I started looking there, what can I say, I’m impatient, but still, it’s not yet clear whether we did actually create dimers or not, so another plug: vote for the rerun of Dimer A please, and we shall know soon)

And I would expect that we could slowly increase the complexity of the experiments to the point where the tested environments would seem close to the actual ones in cells. At which point, it would probably be smart for us to start toying with bacteria…

off the tip of my hat

Eli_Fisker · October 29, 2013, 9:13pm

Hi Nando! Thx for the tip of your hat

Yes, very much of the “RNA world” is still unknown land. I remember hearing in a lecture that scientists were coming up with names for new RNA classes all the time, and they were not even sure which categories overlapped, just hoping that their lab would end up naming one. Which is actually kind of funny.

I also like that we started dissecting and tinkering with small RNAs totally from scratch, most of us starting out knowing nothing about RNA, not even what is normal according to textbooks. Had I started the other way around, by reading the textbook first, I might not even have wondered about the difference between the codon table and our RNA, as the codon table would have seemed normal to me then.

I think I’m in the process of comparing our normal with the scientists’ textbook normal and trying to understand it.

I look forward to what our coming switch adventure and slowly growing complexity of RNA experiments will bring.

Hehe, bacteria are big. Perhaps by that time the smallest bacteria will have become even smaller. I know there are investigations going on to see how small bacteria can get, and how many genes one can delete while still having the bacteria live and function. And I can’t forget that the biggest known virus is now bigger than the smallest known bacteria.

Thx for the video tip!

Omei · October 29, 2013, 11:47pm

Eli, I found your post very interesting. To me, the central point was the idea that, so far as we know, mRNA is a special RNA in the sense that the sequence of bases, and not the secondary or tertiary structure, is what determines its biological function. If that’s true, it makes sense that mRNA with weaker folding would present fewer obstacles to translation, and hence have a higher translation rate.

Looking at the codon frequency tables, my first thought was that they supported that theory. But then I looked closer, by comparing the frequency of each codon with the free energy that the Turner model predicts if that codon pairs with its Watson-Crick complementary triplet. There seemed to be lots of exceptions. It turned out that the correlation between codon frequency and predicted energy was pretty weak (-0.07). (The negative sign here means that, overall, codons with high frequencies were predicted to be more, rather than less, pair-bondable.)
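For anyone who wants to redo this kind of check, a Pearson correlation between codon usage and predicted pairing free energy can be computed as below. This is a generic sketch, not Omei’s actual script, and the numbers are made-up toy values rather than the real codon table or Turner energies. Since a more negative free energy means stronger pairing, a negative r means the frequently used codons tend to pair more strongly; a value as close to 0 as -0.07 means there is hardly any linear relation at all.

```python
import math
from typing import Sequence


def pearson(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation coefficient r between two equally long lists of numbers."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two lists of the same length with at least 2 values")
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    if spread_x == 0 or spread_y == 0:
        raise ValueError("one of the lists is constant, so r is undefined")
    return cov / (spread_x * spread_y)


# Made-up toy values, NOT real data: codon usage (per 1000 codons) and the
# predicted free energy (kcal/mol) of pairing each codon with its complement.
usage = [30.0, 20.0, 10.0]
delta_g = [-6.0, -5.0, -4.0]
print(round(pearson(usage, delta_g), 3))  # -1.0: more used codons pair more strongly here
```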

I also found this recent paper that I think you will find interesting. It seems to claim that secondary folding actually increases the mRNA’s translation rate. I haven’t read it carefully enough to really understand it, but so far the authors do seem to have strong supporting data and arguments.

Eli_Fisker · October 30, 2013, 12:27am

Hi Omei! Thx for your nice explanation and for checking into this. I think you are right: very likely reality is the opposite of how I imagined it in this case. I don’t mind. I have read a big part of the paper. Thx for providing me with an interesting answer to my questions.

JR1 · October 30, 2013, 1:17pm

Here is an article cited by Omei’s article that might be an interesting read.
http://www.ncbi.nlm.nih.gov/pubmed/23…
It helps relate the “other stuff” back to NTs and secondary folding.

Eli_Fisker · October 30, 2013, 5:35pm

Hi JR! Thx for the paper. I will check it out.

Eli_Fisker · October 31, 2013, 10:00pm

Just wanted to say that I have added a few thoughts related to this discussion in my cloud lab notes. Note that I’m in the process of changing my mind and understanding where all this leads.

https://docs.google.com/document/d/1D…

nando · October 31, 2013, 10:12pm

what’s the ISBN number you reserved for that one?
(and he calls that ‘notes’…)

Eli_Fisker · October 31, 2013, 10:34pm

Haha. Asking for ISBN number. That goes straight to my librarian heart.

I can guarantee that it started very innocently as a few notes, and then it just grew. It is still where I put my loose thoughts, questions and observations. Some wrong, some right, but all with the intention of getting a little bit wiser.

Eli_Fisker · October 31, 2013, 11:16pm

Sorry, I realize the link didn’t land on the intended page. Search for these words in the doc and you will get to it:

On repeat bases in line

Eli_Fisker · July 3, 2014, 7:51pm

I found an article that is relevant to how mRNA folds up. I find it quite intriguing: the folds of mRNA matter too.

“The analysis uncovered some interesting general features of mRNA structure. For example, protein-coding regions had less structure than noncoding regulatory sequences, particularly in segments involved in splicing—an enzymatic process that expands the number of proteins encoded by a single transcript. Additionally, nearly 10 per cent of the mRNAs the team examined assumed multiple secondary structure arrangements, suggesting that ‘switching’ between conformations plays an important regulatory role.”

Genomic differences between individuals can change the physical organization of RNA transcripts

Eli_Fisker · May 25, 2015, 1:28pm

Messenger RNA, base repeats and half-life

As a continuation of my post on the possible relation between raised entropy and repeat bases in riboswitches, I was reminded that I had earlier noticed that messenger RNA (mRNA) had many long repeats plus a high rate of base repeats.

All the repeat ratio work I have been doing lately on RNA switches made me think back to messenger RNA, which I recalled had even more repeats than what turned up in switches.

So now I wonder whether mRNA has a longer half-life if it has more repeats than if it has fewer.

I also wonder whether mRNA with lots of repeats has higher entropy and is more loosely folded than mRNA with fewer repeats.

Thoughts about potential difference between messenger RNA and our kind of RNA

For the fun of it, I ran the Zea mays (maize) mRNA that I mentioned in the blog post through Vienna to get an idea of its entropy.

I picked it from the current version, nicked the full DNA sequence under the FASTA link and dumped it into Vienna.

Vienna neatly converted the DNA sequence to RNA; I actually didn’t know it could. Then I got some beautiful structure output and a very high entropy of 3.3, the highest I have seen so far. The high entropy areas seem to be estimated at small hairpins at certain intervals, but most of the structure looks quite unstable.

How to get entropy numbers:

Quick guide to Vienna RNA fold
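As far as I understand, the entropy that Vienna shows per base is a positional (Shannon) entropy computed from the base-pair probabilities of the partition function: for base i, S(i) = -Σ_j p_ij log p_ij - q_i log q_i, where q_i = 1 - Σ_j p_ij is the chance that base i is unpaired. A base that is always paired with the same partner, or always unpaired, gets 0; higher values mean the ensemble disagrees about that base. This sketch computes that from a pair-probability matrix you already have. Which logarithm base Vienna uses is an assumption here, so it is a parameter.

```python
import math
from typing import Sequence


def positional_entropy(bpp: Sequence[Sequence[float]], log_base: float = math.e) -> list[float]:
    """Shannon entropy of each position from a base-pair probability matrix.

    bpp[i][j] is the probability that bases i and j (0-based) pair with each
    other; the matrix is assumed to be symmetric with zeros on the diagonal.
    """
    n = len(bpp)
    entropies = []
    for i in range(n):
        paired = [bpp[i][j] for j in range(n) if j != i]
        # Probability that base i is unpaired; clamp tiny negative rounding errors.
        unpaired = max(0.0, 1.0 - sum(paired))
        outcomes = paired + [unpaired]
        # Terms with p == 0 contribute nothing (p * log p -> 0 as p -> 0).
        entropies.append(sum(-p * math.log(p, log_base) for p in outcomes if p > 0.0))
    return entropies


# Toy 3-base example (hypothetical probabilities, not a real fold):
# base 0 pairs with base 2 half of the time, base 1 is never paired.
toy = [
    [0.0, 0.0, 0.5],
    [0.0, 0.0, 0.0],
    [0.5, 0.0, 0.0],
]
print(positional_entropy(toy, log_base=2))  # [1.0, 0.0, 1.0]
```

If you use the ViennaRNA Python bindings, calling `pf()` and then `bpp()` on a `fold_compound` should give the pair probabilities, but as far as I know that matrix is 1-indexed and only fills i < j, so it would need to be shifted and mirrored before being passed in here.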

Small experiment

Now I suspect that long sequences in themselves will cause high entropy. At least that’s the tendency in riboswitches. But I also wonder if the ratio of base repeats can affect things like half-life. So I wanted to look at two shorter mRNAs and see whether their base repeat ratios and entropies differed.

Then I got lucky. The science article that I recently introduced in the forum blog had two small protein sequences based on messenger RNA, and they even had half-lives attached.

Human genes - turned off

Paper: Global quantification of mammalian gene expression control

I asked Google how to translate a protein sequence back into RNA and ended up finding this tool, which can translate between RNA, protein and DNA in different directions.

EMBL-EBI

I picked this one:

backtranseq

And since the protein sequences I had taken from the paper were from mouse, I set that as the species, because I have read that codon tables can vary between species.
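As far as I know, backtranseq does the simplest possible back-translation: for each amino acid it picks the codon used most often in the chosen species’ codon usage table, so the result is one likely sequence, not the real mRNA. A minimal Python sketch of that idea, assuming that behaviour (this is not the EMBOSS code, and the lookup table is hypothetical: it only holds the codons that appear in the two base sequences listed below):

```python
# Hypothetical lookup table: it holds only the codons that backtranseq chose for
# the two mouse peptides in this post. It is NOT a full mouse codon usage table.
MOST_USED_CODON = {
    "A": "GCC", "D": "GAC", "E": "GAG", "K": "AAG", "L": "CTG", "N": "AAC",
    "P": "CCC", "R": "AGG", "S": "AGC", "T": "ACC", "V": "GTG",
}


def back_translate(protein: str, table: dict[str, str] = MOST_USED_CODON) -> str:
    """Replace each amino acid with one fixed codon, giving DNA letters 5'->3'."""
    missing = sorted(set(protein) - set(table))
    if missing:
        raise KeyError("no codon in the table for: " + ", ".join(missing))
    return "".join(table[aa] for aa in protein)


print(back_translate("SEAAPAAPAAAPPAEK"))
print(back_translate("APTNPSVEDEPLLR"))
```

With this table the two peptides come out identical to the base sequences below. A real run would need the full codon usage table for the chosen species.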

Protein long half life

Protein sequence: SEAAPAAPAAAPPAEK

Half-life for Hist1h1c: 62.1 hours

Amino acid repeat: 9/16=56%

Base sequence: AGCGAGGCCGCCCCCGCCGCCCCCGCCGCCGCCCCCCCCGCCGAGAAG

Base repeat: 30/48=62.5%

Protein short half life

Protein sequence: APTNPSVEDEPLLR

Half-life for Rrm2: 4.5 hours

Amino acid repeat: 2/14=14%

Base sequence: GCCCCCACCAACCCCAGCGTGGAGGACGAGCCCCTGCTGAGG

Base repeat: 22/42=52%
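The post does not spell out exactly how the repeat ratios were counted. One simple rule, assumed here, is to count every letter that sits in a run of two or more identical neighbours; the same function then works for amino acids and bases. Under this rule the amino acid counts are 9/16 and 2/14, while the base counts come out a little higher than the hand counts, at 32/48 (about 67%) and 23/42 (about 55%). The Hist1h1c sequence still has the higher ratio.

```python
def repeat_count(seq: str) -> int:
    """Count letters that sit in a run of two or more identical neighbours."""
    count = 0
    i = 0
    while i < len(seq):
        j = i
        while j < len(seq) and seq[j] == seq[i]:
            j += 1          # extend the run of seq[i]
        if j - i >= 2:      # only runs of length 2+ count as repeats
            count += j - i
        i = j
    return count


for name, seq in [
    ("Hist1h1c peptide", "SEAAPAAPAAAPPAEK"),
    ("Rrm2 peptide", "APTNPSVEDEPLLR"),
    ("Hist1h1c bases", "AGCGAGGCCGCCCCCGCCGCCCCCGCCGCCGCCCCCCCCGCCGAGAAG"),
    ("Rrm2 bases", "GCCCCCACCAACCCCAGCGTGGAGGACGAGCCCCTGCTGAGG"),
]:
    n = repeat_count(seq)
    print(f"{name}: {n}/{len(seq)} = {n / len(seq):.0%}")
```

Changing the rule (for example counting only the second and later letters of each run) would give different numbers, so it is worth fixing one rule before comparing many sequences.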

Result

The mRNA with the longer half-life was the one with the higher base repeat ratio (62.5% versus 52%). Now I wonder if there is a general trend here.

They also have quite different ratios of amino acid repeats. Now I wonder if repeat amino acids affect protein stability.

I took a look at the original data set attached to the paper. I downloaded it and moved it into a Google spreadsheet.

Original dataset

I then sorted by average mRNA half-life, took the 10 with the longest and the 10 with the shortest half-lives, and sent them through Vienna. To get each sequence, I pasted its mRNA ID number into the search field of this database:

refseq

I grabbed the sequence under the FASTA link and ran it through Vienna to see the entropy.
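Sorting the exported table and taking both ends can also be done in a few lines of Python instead of a spreadsheet. A sketch, assuming the sheet was saved as CSV; the column names (`refseq_id`, `mrna_half_life_h`) are hypothetical and would need to match the real header of the paper’s data set. The ID column is what would go into the RefSeq search.

```python
import csv
import io
import math
from typing import Iterable


def extremes(rows: Iterable[dict], column: str, n: int = 10) -> tuple[list[dict], list[dict]]:
    """Return (the n rows with the largest values, the n rows with the smallest values) in `column`.

    Rows whose value is missing, blank, non-numeric or NaN are skipped.
    """
    usable = []
    for row in rows:
        try:
            value = float(row[column])
        except (KeyError, TypeError, ValueError):
            continue
        if not math.isnan(value):
            usable.append((value, row))
    usable.sort(key=lambda pair: pair[0], reverse=True)  # largest value first
    longest = [row for _, row in usable[:n]]
    # The tail of the list holds the smallest values; reverse it so the smallest comes first.
    shortest = [row for _, row in reversed(usable[max(len(usable) - n, 0):])]
    return longest, shortest


# Toy stand-in for the exported spreadsheet; the column names are hypothetical.
toy_csv = """refseq_id,mrna_half_life_h
id_A,5.0
id_B,20.0
id_C,
id_D,1.0
id_E,12.0
"""
longest, shortest = extremes(csv.DictReader(io.StringIO(toy_csv)), "mrna_half_life_h", n=2)
print([r["refseq_id"] for r in longest])   # ['id_B', 'id_E']
print([r["refseq_id"] for r in shortest])  # ['id_D', 'id_A']
```

Skipping blank cells matters because some genes in such data sets have no measured half-life.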

The mRNAs with longer half-lives tend to be longer, but not always.

There isn’t much difference in entropy between sequences with long and short half-lives. Most of them are at 3 or above.

There could still be something to the repeat rate. I think I see more and longer repeats in the sequences with longer half-lives, but I can’t say for sure. I also think I see more U repeats in the short-lived RNAs than in the long-lived ones. So there might be something like an optimal ratio between the different kinds of repeats going on.
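To check the impression about U repeats, repeats could be tallied per base instead of in total, and then compared between the long-lived and short-lived groups. A sketch, assuming a base counts as repeated when it sits in a run of at least two identical bases (an assumed rule, adjustable with `min_run`):

```python
from collections import Counter
from itertools import groupby


def repeats_by_base(seq: str, min_run: int = 2) -> Counter:
    """For each base, count the positions that lie in runs of at least `min_run` identical bases.

    DNA letters are accepted: T is treated as U, since sequences copied from NCBI use T.
    """
    counts: Counter = Counter()
    for base, run in groupby(seq.upper().replace("T", "U")):
        length = sum(1 for _ in run)
        if length >= min_run:
            counts[base] += length
    return counts


print(repeats_by_base("GCCCCCACCAACCCCAGCGTGGAGGACGAGCCCCTGCTGAGG"))
```

Long mRNAs will have more repeats simply because they have more bases, so dividing each count by the sequence length would make long and short mRNAs comparable.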

I did the same experiment with protein half-lives. Proteins with short and long half-lives were very similar in amino acid repeat rate. There were surprisingly few repeat amino acids.

Anyway, it was a fun RNA adventure, even though it ended up in protein country. What can I say? I am a former Foldit player. Back to RNA again.