In-game display of recent lab results has problems

Omei · May 30, 2014, 10:34pm

Eli brought to my attention that a lot of lab data has been (recently?) published, both in-game and in the RDAT files. I was especially interested in the round 85 results because it appears to be the first good set of data for the Reproducibility lab in a long time. But the first design I looked at, 2333413/2676335/Triloop Hairpin Test, didn’t make sense in the Eterna UI. I compared it against the RDAT files and found several problems.

Here’s a screenshot of the display.

Here are the issues I see:

The display is in target mode, but the target structure isn’t right. There should be three unpaired bases at the 5’ end, not 1. As a result, all of the SHAPE scores are misaligned with the structure.
The synthesis score is 80/100. This is probably a result of the misalignment with the target. But even when I realigned it in my head, it didn’t look like the results I expected. So I turned on the SHAPE values. Here’s a closeup.

Note that base 49 is bright yellow, even though its value is 0.40, and base 51 is still quite yellow at a value of 0.28. So it appears that legacy scaling is being used. I certainly thought that the UI had switched to absolute scaling. Is this a regression bug?
I never fully understood how legacy scaling was computed, but from observing it, it didn’t quite make sense to me that it should set the threshold for this design so low. So I cross-checked against the RDAT file. Here’s the main entry:

This part of the RDAT file contains the general description of the design, without the base-by-base data. Notice that the target structure and Eterna score are plausible (as opposed to what is displayed in the UI). The min, max and threshold values, however, don’t seem right, although they do seem consistent with the UI.

To verify that the min_SHAPE and max_SHAPE values aren’t right, here’s the section from the RDAT files that contains the SHAPE values. Note that within a single RDAT file, separate entries for the same design share a small integer identifier, in this case 949.

As a cross-check to make sure I was looking at the right data, I’ve highlighted (magenta) the values for bases 48-51, which match up with the values reported in the UI. I’ve also highlighted the actual minimum and maximum SHAPE scores (green), which are -0.02 and 2.00, contrary to the min_SHAPE and max_SHAPE values in the screenshot above.
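The cross-check described above can be sketched in a few lines of Python. This is only an illustration: the function names are hypothetical, the sample list is a placeholder (only -0.02, 0.28, 0.40 and 2.00 appear in this post), and it assumes min_SHAPE and max_SHAPE are meant to be the literal extremes of the per-base values.

```python
def shape_range(reactivities):
    """Return (minimum, maximum) of a sequence of per-base SHAPE values."""
    values = list(reactivities)
    if not values:
        raise ValueError("no SHAPE values supplied")
    return min(values), max(values)


def range_matches(reactivities, reported_min, reported_max, tol=1e-6):
    """True if the reported min/max equal the observed extremes within tol."""
    observed_min, observed_max = shape_range(reactivities)
    return (abs(observed_min - reported_min) <= tol
            and abs(observed_max - reported_max) <= tol)


# Placeholder data, not the real RDAT row for this design.
sample = [-0.02, 0.28, 0.40, 2.00]
print(shape_range(sample))
```

If `range_matches` returns False for a design, that only shows the reported fields are not the literal extremes; it does not by itself show the fields are wrong.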

I haven’t looked further to determine whether all the labs in this synthesis round have the same issues, or whether it affects more synthesis rounds than 85.

jnicol · June 1, 2014, 1:39pm

Hi Omei, Jee and I are looking into this. Thanks, John

Omei · June 2, 2014, 4:15am

Thanks, John. As an added note, I recall a bit more now about how legacy scaling worked, and remember that min_SHAPE and max_SHAPE were not supposed to represent the minimum and maximum SHAPE values, but the clamping points for 0.0 and 1.0. Since I never knew how those were calculated, my points 4) and 5) are probably non-issues.
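To make the "clamping points" idea concrete, here is a minimal Python sketch. It assumes a simple linear mapping between min_SHAPE and max_SHAPE, which is my assumption and not a confirmed description of the Eterna code; the function name is hypothetical, and how the two clamp points are chosen is not covered.

```python
def legacy_scale(value, min_shape, max_shape):
    """Map a raw SHAPE value onto [0.0, 1.0].

    min_shape maps to 0.0 and max_shape maps to 1.0; anything outside
    that interval is clamped. Linear interpolation is an assumption.
    """
    if max_shape <= min_shape:
        raise ValueError("max_shape must be greater than min_shape")
    fraction = (value - min_shape) / (max_shape - min_shape)
    return max(0.0, min(1.0, fraction))
```

With clamp points of 0.0 and 1.0, for example, a raw value of 2.00 would be displayed the same as 1.0, and -0.02 the same as 0.0.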

Certainly the most important issue is that we really don’t want to take a step backward by returning to legacy scaling in the UI.

Eli_Fisker · June 2, 2014, 7:24am

I too say we really need the absolute scaling as this is the most precise picture of the data we can get. Here is how Omei explained legacy scaling to me:

“Remember that with legacy scaling, every individual RNA molecule has its own unique scaling. It’s not just a matter of different labs being scaled differently; each design within the same lab is scaled differently.”
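A small Python sketch of what that means in practice. The design names and clamp points are made up for illustration, and the fixed 0.0 to 1.0 range used for absolute scaling is an assumption rather than a confirmed detail of the UI.

```python
def scale(value, low, high):
    """Linearly map value onto [0, 1] between low and high, clamping outside."""
    return max(0.0, min(1.0, (value - low) / (high - low)))


# Hypothetical per-design clamp points (legacy scaling: one pair per design).
design_clamps = {"design_a": (0.0, 0.5), "design_b": (0.0, 2.0)}

# Assumed fixed range shared by every design (absolute scaling).
ABSOLUTE_CLAMP = (0.0, 1.0)

reactivity = 0.40  # the same raw SHAPE value measured in both designs

legacy = {name: scale(reactivity, low, high)
          for name, (low, high) in design_clamps.items()}
absolute = {name: scale(reactivity, *ABSOLUTE_CLAMP) for name in design_clamps}

print(legacy)    # differs per design
print(absolute)  # identical for both designs
```

Under these made-up clamp points the same raw value lands at very different positions on the color scale in legacy mode, while absolute mode treats both designs identically.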

However, it would be really nice to still be able to see legacy scaling data, as we can in the old lab interface. Otherwise we will lose the connection to past analysis, and the ability to check it against current results, since all past analysis was based solely on legacy scaling.

I also have a problem to report. I have long been wondering why some of the labs have such odd SHAPE data when shown in absolute scaling in the new lab interface. This is particularly true for the classic labs and some of the player projects too. Their SHAPE data looks very wrong. However, I think I can explain what happened.

In these cases the legacy scaling is much easier to make sense of, because in absolute scaling the SHAPE values are not scaled to fit a 0-1 range like the later lab data. The values will need to be fitted between 0.0 and 1.0 for the data to make sense in absolute scaling.

First I will give some examples of how the same lab design looks in different labs.

Normal looking SHAPE data - legacy scaling data.

(http://eterna.cmu.edu/web/browse/733019/)

Somewhat abnormal-looking SHAPE data - absolute scaling.

Very abnormal-looking SHAPE data for the player project lab NUPACK’s finger revisited. The colors are screaming because the SHAPE values are not clamped between 0.0 and 1.0.

Eli_Fisker · June 12, 2014, 10:09am

I found one more lab with the data looking like they were posted in native instead of target mode:

Target

Estimate

Lab of origin: http://eterna.cmu.edu/web/browse/3376…

And it is a problem in its sibling labs as well:
http://eterna.cmu.edu/web/browse/3376…
http://eterna.cmu.edu/web/browse/3376…
http://eterna.cmu.edu/web/browse/3376…