So, we have these nice dotplots, but they are kind of hidden, and they are a visual tool.

Could we make this a data tool?

Let’s say I want to have the slope of the dotplot, and the initial start value, and the “cleanliness” (would have to think about how to describe this),

and normalize each value on a scale from 0-1, where 0 is bad and 1 is good, as defined by the group (i.e., us), so we get 1, 2, or 3 values telling us if our dotplot looks clean.

This we could use and interpret directly in our spreadsheets, and it would save us from having to look up thousands (literally) of dotplots for the MS2’s.
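To make the 0-to-1 idea concrete, here is a minimal Python sketch of one way it could work. The metric names and the "worst"/"best" bounds are placeholders that the group would have to agree on; nothing here is an existing Eterna tool.

```python
def normalize(value, worst, best):
    """Map value linearly onto [0, 1]: `worst` maps to 0, `best` maps to 1.

    Values outside the bounds are clipped. `worst` may be larger than
    `best` for metrics where lower is better.
    """
    if worst == best:
        raise ValueError("worst and best must differ")
    scaled = (value - worst) / (best - worst)
    return min(1.0, max(0.0, scaled))


# Hypothetical bounds, to be defined by the group (i.e., us).
BOUNDS = {
    "slope": (0.0, 2.0),
    "start_value": (0.5, 1.0),
    "cleanliness": (0.0, 1.0),
}


def score_dotplot(metrics):
    """Return one normalized 0-1 value per metric, ready for a spreadsheet."""
    return {name: normalize(metrics[name], *BOUNDS[name]) for name in BOUNDS}


print(score_dotplot({"slope": 1.0, "start_value": 0.9, "cleanliness": 0.4}))
```

A linear scale with clipping is just the simplest choice; a different mapping could be swapped in once we know what "good" looks like for each value.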

I pursued this path (for dot plots, not melt plots) in some detail a few months ago. I didn’t find the killer strategy I had been hoping for, so I never wrote it up. But EternaBot doesn’t rely on any single killer strategy, so it’s probably worthwhile describing what I looked at.

Here’s a dot plot created by ViennaFold 2.2 that I will use to illustrate. I’ve circled the seven base pairs that define the MS2 hairpin. (Note that Vienna represents the base pairing probabilities by varying the area of a black square instead of varying the gray scale of a fixed size square. For me, Vienna’s rendition is better at conveying the differences in probability.)

I started by asking myself why “messiness” was a problem at all. The answer is that each gray (as opposed to black) dot represents a non-zero probability of the RNA folding into something other than the target folding. But in our current round of switch labs, we’re not trying for a design that forms a unique overall folding; we only care about whether the MS2 hairpin is or isn’t present in the folding. So rather than trying to quantify all messiness, I figured I should concentrate on the messy base pairs that indicated the MS2 hairpin wasn’t folding.

Here I have marked off, in magenta and red, the areas in which a grey dot indicates one of the bases involved in the MS2 hairpin is actually pairing up with some other base. This “messiness” is bad because it means the MS2 hairpin isn’t forming. If you look closely, you’ll see that there are grey dots in these areas, but they aren’t that prominent.
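For anyone who wants to put a number on those bands, here is a rough Python sketch. It assumes the pairing probabilities are already available as a dictionary keyed by base positions; the function name and the example numbers are made up for illustration and are not taken from the designs discussed here.

```python
def ms2_band_messiness(pair_probs, ms2_pairs):
    """Sum the probability of MS2 hairpin bases pairing with the wrong partner.

    pair_probs: dict mapping (i, j) with i < j to a pairing probability.
    ms2_pairs:  set of (i, j) pairs that define the MS2 hairpin.
    """
    ms2_bases = {base for pair in ms2_pairs for base in pair}
    total = 0.0
    for (i, j), p in pair_probs.items():
        if (i, j) in ms2_pairs:
            continue  # this is a wanted hairpin pair, not messiness
        if i in ms2_bases or j in ms2_bases:
            total += p  # an MS2 base is paired with some other base
    return total


# Made-up example: two hairpin pairs, two competing pairs, one unrelated pair.
hairpin = {(10, 20), (11, 19)}
probs = {
    (10, 20): 0.90,
    (11, 19): 0.95,
    (5, 20): 0.05,   # competes for base 20
    (11, 30): 0.03,  # competes for base 11
    (40, 50): 0.50,  # outside the MS2 bands, ignored
}
print(ms2_band_messiness(probs, hairpin))
```

Pairs outside the bands, like the competing stems elsewhere in the plot, are deliberately left out of the sum.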

Looking at the areas outside the MS2 bands, we see varying degrees of messiness. The two stems called out in green have only minor “messiness” associated with them. On the other hand, the two stems in blue both have intermediate probability values, since they are actively competing for the same bases. But then, this has no direct impact on whether the hairpin can form, so we shouldn’t really need to care about it.

For a comparison, here’s the dot plot for another design that differs in only one base.

Here there is a significant amount of messiness in the red and magenta bands, indicating base pairings that belong to foldings in which the MS2 hairpin isn’t forming. So the prediction is that the first design (mod 14) should score better than the second (mod 12). And in fact, the first scored 86.5 and the second 63.5. (Note that I did not select the two dot plots first and then discover that the scores were so different. If that’s what I had done, the significance of the scoring difference would have been huge! Instead, I selected this pair of single-mutation designs because their scores differed by so much, and then compared their dot plots looking for a possible explanation.)

At this point, I made two more decisions to reduce the complexity of the calculation. The first was that rather than trying to quantify the “messiness” in the MS2 bands, I would quantify the “cleanness” of the MS2 hairpin itself. Basically, I would reduce the calculation to one of determining how close to 1.00000 the probability of the MS2 hairpin forming was. Secondly, after looking at the precise data values from lots of dot plots, I came to the conclusion that the one base pair farthest from the loop itself always had a lower probability of forming than the other pairs. So I decided to reduce the whole calculation of “messiness” to the single data point of the probability of pairing of the hairpin stem’s closing bases.
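A minimal Python sketch of pulling that single number out of a Vienna dot plot file follows. It relies on my understanding that the PostScript file lists each pair as a line of the form `i j value ubox`, where the value is the square root of the pairing probability, and that pairs below a small cutoff are omitted; please check this against the file your Vienna version writes. The function names and the sample lines are hypothetical.

```python
import re

# Matches data lines such as "10 28 0.9000000 ubox" (ensemble probabilities).
# Lines ending in "lbox" describe the minimum free energy structure instead.
UBOX = re.compile(r"^\s*(\d+)\s+(\d+)\s+([0-9.eE+-]+)\s+ubox\s*$")


def read_pair_probs(ps_text):
    """Return {(i, j): probability} from the text of a dot plot file."""
    probs = {}
    for line in ps_text.splitlines():
        m = UBOX.match(line)
        if m:
            i, j, root = int(m.group(1)), int(m.group(2)), float(m.group(3))
            probs[(i, j)] = root * root  # assumed: file stores sqrt(p)
    return probs


def closing_pair_probability(ps_text, closing_pair):
    """Probability of the hairpin stem's closing pair (0.0 if not listed)."""
    i, j = sorted(closing_pair)
    return read_pair_probs(ps_text).get((i, j), 0.0)


# Made-up sample lines, not taken from a real design.
SAMPLE = """\
%data starts here
10 28 0.9000000 ubox
11 27 0.9900000 ubox
10 40 0.3000000 ubox
10 28 0.9500000 lbox
"""
print(closing_pair_probability(SAMPLE, (10, 28)))
```

Reading the file this way avoids having to judge dot sizes by eye, and the same dictionary could feed other measures later.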

I have to admit that there is a pretty big distance between the goal of “quantifying the messiness of the dot plot” and “use a single number from the dot plot as that measure”. But each step along the way seemed justifiable. The above is a somewhat shortened version of what I wrote up in Predicting Riboswitch Characteristics from Ensemble Dot Plots. If some of my justification above doesn’t make sense, maybe the longer version will.

Anyway, to wrap up the story, I did a similar thing with the OFF state, recording the probability of the same pair in the dot plot using the constraints that act as proxies for the presence of the FMN aptamer. The end result turned out not to be a great predictor. It was pretty good at weeding out a lot of bad designs, but not so good at distinguishing between mediocre and great scoring switches. In retrospect, this makes a lot of sense now, because I think the key difference between mediocre and great scoring switches lies not in differences between the thermodynamic equilibria of the states, but in differences in the speed of switching (i.e., the kinetics).

But EternaBot doesn’t rely on finding a single killer predictor, so my simplistic predictor is probably a useful thing to add to its bag of tricks. I’ll propose it in the strategy marketplace. And if anyone else wants to experiment more with extracting data from Vienna-produced dot plots, I can probably help you with the mechanics.

Exactly what I was thinking. Well thought through, Omei.

I’ll see if we can get some “single digit indicator” from the melt plots, too, to make them a bit more useful than they are at the moment…