For reviewing a long list of lab submissions, need a metric for dot plot "sharpness"

Omei · November 20, 2012, 12:47am

It would be extremely useful to have a single metric for the overall “sharpness” of a dot plot. If this were a column in the list of lab submissions, it could serve as a useful guide for deciding which designs merited closer scrutiny for voting. As it stands now, there are too many designs to bring up one at a time to see the dot plot.

jandersonlee · November 20, 2012, 5:52pm

I’d love to see some [Strategy Market] ideas based around the dot plot. If we could get a quantitative dump of the dot plot (.csv?) the same way that we can copy the sequence to the cut buffer it would help us start to develop these.

The way I see it, each nt in a switch lab falls into one of four cases:
unpaired in both states
paired in unbonded state only
paired in bonded (aptamer) state only
paired in both states

There are probably separate thresholds/metrics for each of the four cases that can be averaged (arithmetic/geometric) or summed to give an estimator score for each NT and the whole shape. A user could run one or more models/strategies against the dot-plot for a given lab+sequence to get an overall metric.

But knowing what the quantitative data looks like for a given shape is where to start.
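The four cases listed above could be sketched roughly as follows. This is only an illustration: it assumes the two target shapes are available as dot-bracket strings, and the function names and label strings are made up for the example, not anything EteRNA provides.

```python
def paired_positions(dot_bracket):
    """Return the set of 0-based indices that are paired in a dot-bracket string."""
    stack, paired = [], set()
    for i, ch in enumerate(dot_bracket):
        if ch == "(":
            stack.append(i)
        elif ch == ")":
            j = stack.pop()
            paired.update((i, j))
    return paired


def classify_nts(unbonded, bonded):
    """Label each nt with one of the four cases (hypothetical label names)."""
    in_unbonded = paired_positions(unbonded)
    in_bonded = paired_positions(bonded)
    labels = []
    for i in range(len(unbonded)):
        if i in in_unbonded and i in in_bonded:
            labels.append("both")
        elif i in in_unbonded:
            labels.append("unbonded_only")
        elif i in in_bonded:
            labels.append("bonded_only")
        else:
            labels.append("neither")
    return labels


# Toy shapes, not a real lab target.
labels = classify_nts("((..))..", "(.(.).).")
print(labels)
```

A per-case threshold or score could then be applied to the dot-plot values of each nt according to its label.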

bentrem · November 20, 2012, 6:06pm

Omei - Glad to see you surfaced this here; I brought it up in Chat with a couple of people but couldn’t relay the nut of your suggestion / request.

One thing you said really piqued my interest; hoping you can elaborate here … and apologies if I misunderstood you.
Seems to me you said something like the PPlot was actually graphing more than one thing. I don’t have enough nomenclature in this field, so didn’t really understand you at that time.
Scientific visualization is a pet of mine … I know the techniques, but I don’t know this field!

p.s. one of the things I want to bring up in a separate thread is what I call a “diff” function, to allow A/B comparison of designs that succeed.

Omei · November 20, 2012, 7:36pm

For a single measure of how “sharply focused” the dot plot is, I was thinking of something simple like the standard deviation, or the Shannon entropy (http://en.wikipedia.org/wiki/Shannon_…), of the distribution of values in the plot. (Maybe excluding those involving a hook nt.) A perfectly clean plot, where there is nothing but black and white pixels, would have a large standard deviation and a low entropy value. A “messy” plot would have low standard deviation and high entropy.

Based on Eli’s “Lab guide for new players”, I think this would be pretty effective for highlighting the more promising designs, at least for single-shape ones. It might be less effective for switches, since they will inherently be messier.
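A minimal sketch of the two statistics mentioned above, assuming the dot-plot values are available as a flat list of numbers between 0 (white) and 1 (black). The entropy here is taken over a histogram of those values; the choice of 10 bins is an arbitrary assumption, and hook nts are not excluded.

```python
import math
from statistics import pstdev


def sharpness_stats(values, bins=10):
    """Return (standard deviation, Shannon entropy in bits) of the value distribution."""
    counts = [0] * bins
    for v in values:
        # Clamp so that a value of exactly 1.0 falls in the last bin.
        idx = min(int(v * bins), bins - 1)
        counts[idx] += 1
    n = len(values)
    entropy = -sum((c / n) * math.log2(c / n) for c in counts if c)
    return pstdev(values), entropy


# Toy data: a "clean" plot with only white and black cells,
# and a "messy" one with values spread across all shades of gray.
clean = [0.0] * 8 + [1.0] * 8
messy = [i / 10 + 0.05 for i in range(10)]

print(sharpness_stats(clean))
print(sharpness_stats(messy))
```

On this toy data the clean plot has the larger standard deviation and the lower entropy, matching the intuition above. Whether either number tracks lab results is something only real data could show.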

Omei · November 20, 2012, 7:50pm

I don’t remember saying something like that. Might it have been something about switches?

jandersonlee · November 20, 2012, 7:53pm

It also could be focused, but on the wrong NTs. I think it’s important to add the “desired shape” into the mix somehow. And yes, it appears to be messier for switches.

Omei · November 20, 2012, 7:56pm

I like the idea of being able to get the dot plot data.

(To be honest, I’m new to EteRNA and haven’t really tackled switches yet.)

Omei · November 20, 2012, 8:12pm

Wouldn’t satisfying EteRNA’s criteria for submission ensure that it had the desired shape?

jandersonlee · November 20, 2012, 8:14pm

Something like a chi-squared test might suffice for non-switches. And no, Eterna’s submission requirements don’t guarantee the right shape. That just says that the MFE (minimum free energy) matches the target shape according to the energy model.

jandersonlee · November 20, 2012, 8:16pm

If you look at the dot-plots for some of the lab submissions you can see that they don’t all have black dots in all the right places. Some are missing some, and some have extras.

Omei · November 20, 2012, 9:44pm

How about the Euclidean distance between target and actual?

jandersonlee · November 20, 2012, 9:50pm

This is where having the quantitative dot-plot data available for sequences with lab results might help: so we could try various strategies (rather than guessing). Euclidean distance might work, but statisticians tend to use least-squares or chi-squared tests to measure goodness of fit. I presume there is a reason for that.
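For trying out the distance idea, something like the sketch below might be a starting point. It assumes the dot-plot data comes as a mapping from base pairs (i, j) to a probability between 0 and 1, and that the target shape is a dot-bracket string; both the input format and the function names are assumptions. It returns the sum of squared differences (the least-squares residual) and its square root (the Euclidean distance). A textbook chi-squared statistic divides by the expected value, which is zero for every pair not in the target, so it is left out here.

```python
import math


def target_pairs(dot_bracket):
    """Return the set of (i, j) base pairs, 0-based with i < j, in a dot-bracket string."""
    stack, pairs = [], set()
    for i, ch in enumerate(dot_bracket):
        if ch == "(":
            stack.append(i)
        elif ch == ")":
            pairs.add((stack.pop(), i))
    return pairs


def distance_to_target(probs, dot_bracket):
    """Compare dot-plot probabilities with the target shape.

    probs maps (i, j) with i < j to a pairing probability; missing pairs count as 0.
    Returns (sum of squared differences, Euclidean distance).
    """
    n = len(dot_bracket)
    target = target_pairs(dot_bracket)
    sse = 0.0
    for i in range(n):
        for j in range(i + 1, n):
            expected = 1.0 if (i, j) in target else 0.0
            sse += (probs.get((i, j), 0.0) - expected) ** 2
    return sse, math.sqrt(sse)


# Toy example: the wanted pair is only partly there, and an unwanted pair shows up.
print(distance_to_target({(0, 3): 0.6, (1, 2): 0.3}, "(..)"))
```

A score of zero means every black dot is exactly where the target wants it and nowhere else; missing dots and extra dots both push the score up.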

Omei · November 20, 2012, 10:10pm

I don’t know how responsive the EteRNA team is to requests like this. But I’m thinking that it wouldn’t be that much work to write a program that takes the dot plot bitmap file (clipped from the screen) and outputs the data values along with some summary stats. If I were to do it just for myself, I would probably write it in C. But if other people wanted to use it, it would be better to do it as a web service of some kind.

At the moment, there are only two of us pushing for this. If I wrote it in C and gave you the source code, would you be able to either compile it yourself or trust me enough to run it on your own (Windows) machine?
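As a rough idea of what such a program would do (shown in Python rather than C, purely for brevity), here is a sketch that turns a grayscale pixel grid into per-cell values plus a few summary stats. It assumes the clipped image has already been decoded into rows of 0..255 gray levels, that each plot cell is a square block of `cell` pixels, and that darkness maps linearly to the plotted value. None of that is confirmed for the actual EteRNA dot plot.

```python
def bitmap_to_values(pixels, cell):
    """Average each cell-by-cell pixel block and convert it to darkness in [0, 1].

    pixels: list of rows of grayscale ints, 0 = black, 255 = white (assumed).
    cell:   side length in pixels of one dot-plot cell (assumed known).
    """
    rows = len(pixels) // cell
    cols = len(pixels[0]) // cell
    values = []
    for r in range(rows):
        row = []
        for c in range(cols):
            block = [
                pixels[y][x]
                for y in range(r * cell, (r + 1) * cell)
                for x in range(c * cell, (c + 1) * cell)
            ]
            mean = sum(block) / len(block)
            row.append(round(1 - mean / 255, 3))
        values.append(row)
    return values


def summary(values):
    """Return min, max and mean over all cells."""
    flat = [v for row in values for v in row]
    return {"min": min(flat), "max": max(flat), "mean": sum(flat) / len(flat)}


# Toy 4x4 bitmap with 2x2-pixel cells: black on the diagonal, white elsewhere.
pixels = [
    [0, 0, 255, 255],
    [0, 0, 255, 255],
    [255, 255, 0, 0],
    [255, 255, 0, 0],
]
values = bitmap_to_values(pixels, 2)
print(values)
print(summary(values))
```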

jandersonlee · November 20, 2012, 10:20pm

I can compile. I use Linux for development. Thanks.

jandersonlee · November 20, 2012, 10:20pm

I put in an RFE in any case. I’ll let them prioritize.

bentrem · November 20, 2012, 11:45pm

Omei - yes, indeed. But something you said … not “confounded variables”, but … and I was asking you about PPlot (as I called it) … you seemed to be saying something about measuring 2 things at once, and wanting just 1.
Perhaps I misunderstood. (Because I don’t yet grasp the nomenclature, what you said wasn’t clear to me, so of course I cannot recall / depict.) And perhaps you’ve here laid out what I recall badly.
^5

Omei · November 21, 2012, 12:02am

Ah! I remember now. At some point, I had the impression that you were thinking I was suggesting one statistic that summarized both the dot plot and the melt plot. I wanted to clarify that that wasn’t what I had in mind, since the two plots seem to measure more or less orthogonal properties.

Does that match what you remember?

bentrem · November 21, 2012, 12:31pm

uhhhh … well … ok. Nice to see we’re in sync in terminology.

Now _I think_ there was more to what you said, in the flow of the moment, than what’s here. Yes, orthogonality. That would not have confused me; it would certainly have intrigued me.

It seemed to me (Context: #ImplicitKnowledge #TacitKnowledge) that you said something about PPlot representing 2 factors. Orthogonal? Maybe not.

When you blue-sky imagined your finest data presentation (see Heidegger, essays on technology, re: real-ization and techne) … are you quite sure you weren’t anticipating an alternative?

Melt here; status quo PPlot there … you sure you don’t see another presentation?
Not orthogonal , exactly, but …
… what you said to me seemed right sensible. (see above) I didn’t have the nomenclature to record it / don’t have the nomenclature to depict it.

eternacac · November 25, 2012, 5:45am

To Bentram’s point: I just asked EteRNA to consider a difference tool for comparing a/b dot plots. I think it would make discerning qualities between nearly identical-seeming dot plots and actual results much easier.

bentrem · November 25, 2012, 6:50am

… trem. with an “e”. If you would.

I’d guess most if not all graphical presentations could be used in some form of A/B. And perhaps some might be more meaningful, but not more immediately accessible than showing which bases have changed where.
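The simplest form of that A/B view, showing which bases have changed where, could look something like this sketch. It assumes both designs are plain sequence strings of the same length; the function name is made up for illustration.

```python
def changed_bases(seq_a, seq_b):
    """Return (position, base in A, base in B) for each difference, positions 1-based."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must have the same length")
    return [
        (i + 1, a, b)
        for i, (a, b) in enumerate(zip(seq_a, seq_b))
        if a != b
    ]


# Toy sequences, not real lab designs.
print(changed_bases("GGAAACC", "GGAUACC"))
```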