How to get same pairing probabilities from dot plot via external tool?

Can someone tell me how to get the pairing probabilities of a design from its sequence via an external tool, instead of reading them off the dot plots? I need them for an analysis program I am writing. If that is not possible, I would like access to the original raw BMP used for the dot plots.



Jennifer, here’s how I get pairing probabilities from Vienna.

The basic idea is to get Vienna RNA’s RNAfold to create a PDF file for the dot plot and then examine the PDF file as a text file. If you are only looking at a few PDF files, requesting them from the Web server is probably easiest. But if you want to process a large batch of them, it is probably best to download the ViennaRNA package to your own machine and generate the PDF files locally.

To get the dot plot PDF from the Web server, cut and paste your sequence into the first box. Make sure the “minimum free energy (MFE) and partition function” option is selected, and click on Proceed.

You’ll get a page saying your job has been submitted to the queue. Typically, in 10-20 seconds, this will be replaced by a screen showing you the results.

Find the line that says “You may look at the dot plot containing the base pair probabilities” and click on the PDF option. You should get something like this:

Although formatted slightly differently, this is essentially the same dot plot you see in the game UI. The biggest difference is that the probability values are indicated by the area of the dot, rather than by a gray scale level.

To extract the numeric values for the probabilities, copy the PDF file to your local drive and open the file with a text editor.

Whoops! I just discovered that the PDF file generated on the server has the drawing data in a compressed format. This isn’t what happens when I produce the files locally. But, it turns out that the EPS file generated on the server works, so download that instead and open it with a text editor. Now scroll down to the line “%start of base pair probability data”.

Each of these lines describes one dot in the dot plot. The first line, “1 27 0.013068432 ubox”, says that the box (“dot”) in the upper (right) half of the graph, at the intersection of row 1 and column 27, should have sides 0.013068432 units long. Since the area of a square is the square of its side length, the box will have an area of 0.00017078391494 square units. This means that the probability of finding base pairs 1 and 27 paired at any point in time is 0.00017078391494, or 0.017%.
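If you would rather not do that arithmetic by hand, here is a minimal Python sketch of the extraction just described. It assumes every line of interest has the form “i j value ubox” and squares the value to get the probability. The function name and the sample text are only illustrative; the “1 27” line is the one quoted above, and the other sample lines are made up to show that anything not ending in “ubox” is skipped.

```python
def parse_pair_probabilities(eps_text):
    """Return {(i, j): probability} from the 'i j value ubox' lines of a dot plot file."""
    probs = {}
    for line in eps_text.splitlines():
        fields = line.split()
        # Keep only lines of the form: <int> <int> <float> ubox
        if len(fields) != 4 or fields[3] != "ubox":
            continue
        try:
            i, j, side = int(fields[0]), int(fields[1]), float(fields[2])
        except ValueError:
            continue
        # The stored value is the side of the box; the probability is its area.
        probs[(i, j)] = side * side
    return probs


# Illustrative input: only the "1 27" line comes from the thread.
SAMPLE = """\
%start of base pair probability data
1 27 0.013068432 ubox
2 26 0.500000000 ubox
2 26 0.9500000 lbox
showpage
"""

probs = parse_pair_probabilities(SAMPLE)
for (i, j), p in sorted(probs.items()):
    print(f"{i:4d} {j:4d} {p:.8f}")
```

To use it on a real file, read the downloaded EPS as text (for example with `open(path).read()`) and pass the result to `parse_pair_probabilities`.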

But I suspect what you really want, if you’re going to methodically process the results, is to download the ViennaRNA package and script the whole process. The details will depend on your environment and preferred scripting language(s). I’m currently using a Mac, so I needed to build the software locally, and then used a combination of shell scripting, Python and manual operations to semi-automate the extraction of the specific data I was interested in. If you are going to go this route, I’ll be happy to share what I’ve done in more detail.
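For a rough idea of what such a script could look like, here is a minimal Python sketch (not the scripts mentioned above). It assumes a locally installed `RNAfold` on your PATH, that `RNAfold -p` reads the sequence from standard input, and that it writes the dot plot as `dot.ps` (or `<name>_dp.ps` when the input has a FASTA header) into the working directory. Those file names are from memory, so check them against your ViennaRNA version. The function names are hypothetical.

```python
import shutil
import subprocess
import tempfile
from pathlib import Path


def read_ubox_lines(ps_text):
    """Return {(i, j): probability} from 'i j value ubox' lines (probability = value squared)."""
    probs = {}
    for line in ps_text.splitlines():
        fields = line.split()
        if len(fields) == 4 and fields[3] == "ubox":
            try:
                i, j, side = int(fields[0]), int(fields[1]), float(fields[2])
            except ValueError:
                continue
            probs[(i, j)] = side * side
    return probs


def pairing_probabilities(sequence, rnafold="RNAfold"):
    """Run RNAfold -p on one sequence and return its pairing probabilities."""
    if shutil.which(rnafold) is None:
        raise FileNotFoundError(f"{rnafold!r} was not found on PATH")
    # Work in a scratch directory so the PostScript output does not clutter the current one.
    with tempfile.TemporaryDirectory() as tmp:
        subprocess.run(
            [rnafold, "-p"],
            input=sequence + "\n",
            text=True,
            cwd=tmp,
            check=True,
            capture_output=True,
        )
        # Assumed output names; adjust if your version names the dot plot differently.
        plots = sorted(Path(tmp).glob("*_dp.ps")) + sorted(Path(tmp).glob("dot.ps"))
        if not plots:
            raise RuntimeError("RNAfold ran, but no dot plot file was found")
        return read_ubox_lines(plots[0].read_text())
```

Calling `pairing_probabilities("GGGAAACCC")` would then return a dictionary keyed by (i, j) base positions, and looping over a list of sequences gives you the batch processing.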

Hope this helps.

This means that the probability of finding base pairs 1 and 27 paired at any point in time is 0.00017078391494, or 0.017%.

I think everything else you explained is correct, Omei, but I believe this bit to be a (possibly quite common) misconception. My understanding is that the partition function gives us the probabilities of base pairs after the solution has reached equilibrium. And as far as I know, even with the knowledge of the exact starting distribution, the partition function cannot tell you whether reaching equilibrium is going to take 5 milliseconds or 5 trillion years…

You’re right, of course, Nando. What I intended to convey was that even after the solution reaches equilibrium, it is a dynamic equilibrium. The molecules are not fixed in one MFE folding; instead there is a probability distribution associated with the various foldings that are accessible with the current temperature, and that probability distribution remains constant as long as the macro conditions don’t change. My choice of wording did not make that assumption of equilibrium explicit.
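For reference, the equilibrium distribution being discussed is the standard Boltzmann one. This is the textbook form rather than anything specific to this thread: E(s) is the free energy of folding s, R is the gas constant, T is the absolute temperature, and Z is the partition function. The pairing probability of bases i and j is the total weight of all foldings that contain that pair.

```latex
P(s) = \frac{e^{-E(s)/RT}}{Z},
\qquad
Z = \sum_{s'} e^{-E(s')/RT},
\qquad
p_{ij} = \sum_{s \,\ni\, (i,j)} P(s)
```

Note that no time variable appears anywhere in these expressions, which is why they say nothing about how long it takes to reach equilibrium.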

I also think this is a very relevant observation. I remember some time ago, Rhiju made the comment that the expected time for the lab designs to reach equilibrium was tiny compared to the time scale of the experiment. But we were making smaller RNAs then, and we were targeting designs that would have one clear local energy minimum. Looking at the current riboswitch lab data, I suspect we can no longer count on having reached equilibrium. I think we have to be concerned not only with whether the presence of the FMN leads to a clearly different folding, but also with how easily (i.e. quickly) the OFF state can switch to the ON state as the MS2 protein levels are increased. We haven’t talked about this explicitly, but I can be pretty sure from your comments that you are thinking along the same lines.

Thanks Omei, that helps a lot. Do you know where I can find probabilities for pairs outside of the base pairs? I want to take a look at the intermediate states as well, or is that something I can only get from EteRNA?

You’re absolutely right, and this is probably an even more common misconception among players. I tried to address that topic once in an Eterna University “lecture”, but I’m afraid the audience it reached was pretty microscopic in size… :smiley:

Talking about “tiny”, the Das lab also had to face a strange difficulty in the past: a certain construct (Tetrahymena P4-P6), which was about the same length as our Cloud lab designs, would refuse to fold properly. In other words, its SHAPE signature didn’t match the results obtained by other techniques like XRD, NMR, etc. And it is a very well-studied molecule in the field… It turns out, Ann Wipapat had to add an incubation at 52°C (if I recall correctly) for a period of 30 minutes, and then finally, experimental results started to make sense.

What I take away from the above story is that how long it takes to reach equilibrium depends very much on the design, and there can be very large differences, even for relatively short constructs. In essence, this is the reason why Xmas tree designs fail: the MFE predictions for GC-rich designs are quite possibly correct, but because of their much higher free energy barriers, reaching equilibrium takes way, way longer than the experiment allows for.

I’m guessing that what you mean by “outside the base pairs” actually is “outside the MFE”. The listing Omei showed above contains all non-zero pairing probabilities, which includes all possible pairs, whether they are part of the MFE or not.
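To pull out just the non-MFE pairs, one option is to compare the listing against the MFE structure. Here is a minimal Python sketch, assuming you already have the probabilities as a dictionary keyed by (i, j) with 1-based positions, and the MFE structure as a dot-bracket string using only “(”, “)” and “.”. The function names and the example numbers are made up for illustration.

```python
def pairs_from_dot_bracket(structure):
    """Return the set of 1-based (i, j) pairs encoded by a dot-bracket string."""
    stack, pairs = [], set()
    for pos, ch in enumerate(structure, start=1):
        if ch == "(":
            stack.append(pos)
        elif ch == ")":
            if not stack:
                raise ValueError("unbalanced ')' at position %d" % pos)
            pairs.add((stack.pop(), pos))
    if stack:
        raise ValueError("unbalanced '(' at position %d" % stack[-1])
    return pairs


def split_by_mfe(probs, mfe_structure):
    """Split {(i, j): p} into pairs that are in the MFE structure and pairs that are not."""
    mfe_pairs = pairs_from_dot_bracket(mfe_structure)
    in_mfe = {pair: p for pair, p in probs.items() if pair in mfe_pairs}
    outside = {pair: p for pair, p in probs.items() if pair not in mfe_pairs}
    return in_mfe, outside


# Made-up example values, not real RNAfold output.
example_probs = {(1, 7): 0.90, (2, 6): 0.80, (1, 6): 0.05}
in_mfe, outside = split_by_mfe(example_probs, "((...))")
print("in MFE:     ", in_mfe)
print("outside MFE:", outside)
```

Sorting `outside` by probability then shows which alternative pairings carry the most weight in the ensemble.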