Bug in Eterna's rna javascript library makes switch structure comparison difficult

Omei · June 17, 2015, 4:26am

In order to refine our algorithms for predicting switch success, it’s going to be necessary for us to distinguish stems that change when the state changes and those which are held constant across both states. Eli has made reference to this need a number of times.
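As a rough illustration of what "static versus changing stems" could mean in code, here is a minimal sketch that does not use the Eterna library at all. It assumes plain dot-bracket input without pseudoknots; the function names (`pairTable`, `stems`, `compareStems`) and the "start-end-length" stem key are made up for this example.

```javascript
// Parse a dot-bracket string into a pair table: pairs[i] is the partner index, or -1.
function pairTable(structure) {
  const pairs = new Array(structure.length).fill(-1);
  const stack = [];
  for (let i = 0; i < structure.length; i++) {
    const c = structure[i];
    if (c === "(") {
      stack.push(i);
    } else if (c === ")") {
      if (stack.length === 0) throw new Error("Unbalanced ')' at index " + i);
      const j = stack.pop();
      pairs[i] = j;
      pairs[j] = i;
    }
  }
  if (stack.length > 0) throw new Error("Unbalanced '(' at index " + stack[0]);
  return pairs;
}

// Group directly stacked pairs (i, j), (i+1, j-1), ... into stems.
// Each stem is keyed as "start-end-length" (hypothetical key format).
function stems(structure) {
  const pairs = pairTable(structure);
  const result = [];
  for (let i = 0; i < pairs.length; i++) {
    const j = pairs[i];
    if (j < i) continue; // unpaired base, or the closing side of a pair
    if (i > 0 && pairs[i - 1] === j + 1) continue; // continues a stem already recorded
    let len = 1;
    while (i + len < j - len && pairs[i + len] === j - len) len++;
    result.push(`${i}-${j}-${len}`);
  }
  return result;
}

// Split the stems of two states into those shared by both and those unique to one.
function compareStems(stateA, stateB) {
  const a = new Set(stems(stateA));
  const b = new Set(stems(stateB));
  return {
    static: [...a].filter((s) => b.has(s)),
    onlyInA: [...a].filter((s) => !b.has(s)),
    onlyInB: [...b].filter((s) => !a.has(s)),
  };
}

// Two made-up switch states of the same length.
console.log(compareStems("(((...)))....(((...)))", "(((...)))............."));
```

Note that this treats a stem that merely gains or loses one pair as a different stem. Comparing individual base pairs instead of whole stems would be a reasonable alternative if that is too strict.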

It should be straightforward to distinguish between static and changing stems using the Eterna library’s RNA constructor function. Unfortunately, there is a long-standing bug (or bugs) in that function, and it often gets the structure wrong.

To see this, I’ll recommend jandersonlee’s script. It walks the tree structure produced by the RNA constructor and dumps what it finds.

There are at least two different symptoms:

Given the secondary structure ((((((...))).(((...))).))), the test works as expected. But with the structure ((((((...)))(((...))).))), the map function throws an exception with the message “Hairpin length is under 3”. I have other examples that trigger the same exception, if that will help track down the problem.
Sometimes rna.map completes, but reports an incorrect characterization of loops. For example, the isolated unpaired base in (((...))).(((...))) is reported to be a bulge loop instead of part of the outer dangling loop. Adding an unpaired base at the end, i.e. (((...))).(((...)))., changes the categorization from bulge to internal loop, which still isn’t right. But if you add an unpaired base at the beginning, i.e. .(((...))).(((...)))., it categorizes all three isolated unpaired bases as dangles, which is correct.
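For reference, the categorization described above as correct can be sketched independently of the Eterna library. This is only an illustration under stated assumptions: plain dot-bracket input with three ASCII dots per hairpin loop, no pseudoknots, and label names ("dangle", "hairpin", "bulge", "internal", "multiloop", "paired") chosen for this example rather than taken from the RNA class.

```javascript
// Label every base by the loop that contains it.
// Unpaired bases with no enclosing pair belong to the outer (dangling) loop.
function classifyBases(structure) {
  const n = structure.length;
  const pairs = new Array(n).fill(-1);
  const stack = [];
  for (let i = 0; i < n; i++) {
    if (structure[i] === "(") {
      stack.push(i);
    } else if (structure[i] === ")") {
      if (stack.length === 0) throw new Error("Unbalanced ')' at index " + i);
      const j = stack.pop();
      pairs[i] = j;
      pairs[j] = i;
    }
  }
  if (stack.length > 0) throw new Error("Unbalanced '(' at index " + stack[0]);

  const labels = new Array(n).fill("paired");

  // Visit the loop closed by pair (i, j); (-1, n) stands for the outer loop.
  function walk(i, j) {
    const unpaired = [];
    const branches = [];
    let k = i + 1;
    while (k < j) {
      if (pairs[k] > k) {
        branches.push([k, pairs[k]]);
        walk(k, pairs[k]);
        k = pairs[k] + 1; // skip past the whole branch
      } else {
        unpaired.push(k);
        k++;
      }
    }
    let kind;
    if (i < 0) kind = "dangle";
    else if (branches.length === 0) kind = "hairpin";
    else if (branches.length >= 2) kind = "multiloop";
    else {
      // Exactly one branch: unpaired bases on one side only make a bulge.
      const [p, q] = branches[0];
      kind = p === i + 1 || q === j - 1 ? "bulge" : "internal";
    }
    for (const u of unpaired) labels[u] = kind;
  }

  walk(-1, n);
  return labels;
}

// The isolated base between the two hairpins sits in the outer loop.
console.log(classifyBases("(((...))).(((...)))")[9]); // "dangle"
```

With this definition, adding unpaired bases at either end of the structure does not change how the middle base is labeled, since all of them have no enclosing pair.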

I reported this a year and a half ago, but at the time the developers had other priorities and nothing happened. Perhaps now, when it seems to be quite important for characterizing the details of what substructures change when a switch changes state, is the appropriate time to fix it.

At least I hope so. It’s something I need in order to return to a tool that would allow searching for secondary structures with the power of regular expressions, but with basic units that carry more semantic information than just “.”, “(” and “)”.
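To make the idea concrete, here is a minimal sketch of regular-expression search over a structure string with richer units. The one-letter alphabet (S, H, B, I, M, D) and the helper `findMotifs` are assumptions invented for this example; they are not an existing Eterna format or API, and the annotated string is written by hand.

```javascript
// Hypothetical one-letter-per-base alphabet (not an Eterna format):
// S = stem, H = hairpin loop, B = bulge, I = internal loop, M = multiloop, D = dangle.

// Return every match of `pattern` in the annotated string with its start index.
function findMotifs(annotated, pattern) {
  // matchAll requires the global flag, so add it if the caller left it off.
  const flags = pattern.flags.includes("g") ? pattern.flags : pattern.flags + "g";
  const re = new RegExp(pattern.source, flags);
  return [...annotated.matchAll(re)].map((m) => ({ index: m.index, text: m[0] }));
}

// Hand-written annotation of (((...))).(((...))) using the alphabet above.
const annotated = "SSSHHHSSSDSSSHHHSSS";

// Hairpin loops of exactly three bases with at least three stem bases on each side.
console.log(findMotifs(annotated, /S{3,}H{3}S{3,}/));

// Single dangling bases sitting directly between two stems.
console.log(findMotifs(annotated, /(?<=S)D(?=S)/));
```

A flat string like this cannot express which stem bases pair with which, so a real tool would probably need to check candidate matches against the pairing information afterwards.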

nando · June 17, 2015, 11:51am

This is a very old story, indeed, and as can be seen in the comments of the script, other players (me included) were aware of these issues, even before you reported them…

The good news is that the workflow in the dev team is steadily improving, and I was finally able to push a patch (which has been waiting for two years now) to the web repo. I actually coded a near-complete rewrite of this RNA() class, and for now, it’s active on my personal developer instance.

Omei (and maybe others who were also interested in this topic like jandersonlee or codygeary), may I ask you guys a favor? My resources are unfortunately limited, and I don’t have much time to thoroughly test this class.

http://nando.eternadev.org/web/script…

Using the exact same script, it passes all the cases I know about, and it seems to also properly handle the cases you presented. But I’d like it to be tested further. And if we can’t manage to break it within the next few days, I’ll contact John to push the patch to production asap.

Omei · June 17, 2015, 6:44pm

Thank you Nando! I will exercise it and post back whatever I find.

Omei · June 18, 2015, 6:29am

I’ve tested some hand-crafted examples and found two small issues:

1. The exception “TypeError: Cannot read property ‘index’ of undefined” is sometimes thrown. For example, all of these cause a problem:

((...))((...))
((((((...)))(((...))))))
.((((((...)))(((...))))))
((((((...)))(((...)))))).

But this doesn’t:

((((((...))).(((...))))))

It would seem that the sequence “)(” isn’t handled quite right.

2. Any characters other than ‘(’ and ‘)’ are treated as ‘.’. It turns out this was true for the original version as well. But I burned a bunch of time because the script I was actually testing with allowed for batch entry, and there was a trailing blank in each of the strings that I passed to the RNA constructor. That trailing blank was creating a dangling loop as the root RNAElement, which I tried to track down as being a quirk in the code. Perhaps there is some reason for this being a usable feature, but in the absence of a rationale for it, I would vote for either ignoring unexpected characters or throwing an exception.
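The bracket-adjacency cases listed above (the four that throw and the one that doesn't) lend themselves to a small regression harness. The sketch below is an assumption about how such a harness might look, not the script actually used: `runCases` and `balanceOnly` are hypothetical names, and `balanceOnly` is only a stand-in so the block runs on its own. In practice the `parse` argument would wrap whatever constructor is under test. Hairpin loops are written with three ASCII dots.

```javascript
// Run every structure through `parse` and record whether it threw.
// `parse` is a placeholder for whatever parser is under test.
function runCases(parse, cases) {
  return cases.map((structure) => {
    try {
      parse(structure);
      return { structure, ok: true, error: null };
    } catch (e) {
      return { structure, ok: false, error: String(e && e.message ? e.message : e) };
    }
  });
}

// Stand-in parser: it only checks that the brackets balance.
function balanceOnly(structure) {
  let depth = 0;
  for (const c of structure) {
    if (c === "(") depth++;
    else if (c === ")" && --depth < 0) throw new Error("Unbalanced ')'");
  }
  if (depth !== 0) throw new Error("Unbalanced '('");
  return true;
}

// The five hand-crafted structures from this post.
const cases = [
  "((...))((...))",
  "((((((...)))(((...))))))",
  ".((((((...)))(((...))))))",
  "((((((...)))(((...)))))).",
  "((((((...))).(((...))))))",
];

for (const r of runCases(balanceOnly, cases)) {
  console.log((r.ok ? "ok    " : "FAIL  ") + r.structure + (r.ok ? "" : "  " + r.error));
}
```

Catching the exception per case keeps one failing structure from hiding the results for the rest of the batch.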

Yet to do: I’ll write a script that generates the enhanced structure string (as in the above script) along with the image of the target structure for each of the classic labs. That will give us some “realistic” data.

nando · June 18, 2015, 9:27am

1. I think I found the origin of that problem in the RNAElement() class. In short, it’s simply pointless to walk over an empty set, which was what the old code was trying to do. Hopefully, that is fixed now.
2. Unfortunately, I think I can neither ignore these characters nor throw an exception in these cases, because the puzzle database does contain a few such unexpected characters (left-over pseudoknots for instance), and the behavior of the Flash applet has always been to convert any character other than ‘(’ and ‘)’ to dots. Implementing a different behavior in the javascript classes would introduce serious inconsistencies. I’m not sure I want to go there…
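A short sketch of the behavior described here, plus a caller-side workaround for the trailing-blank problem. Both function names are hypothetical, and the regular expression is only my reading of "convert any character other than ‘(’ and ‘)’ to dots", not code taken from the Flash applet or the javascript classes.

```javascript
// Mimic the described legacy behavior: anything other than '(' or ')' becomes '.'.
function legacyNormalize(structure) {
  return structure.replace(/[^()]/g, ".");
}

// Caller-side cleanup for batch entry: trim each line and drop empty ones,
// so stray blanks never reach the parser in the first place.
function cleanBatch(text) {
  return text
    .split(/\r?\n/)
    .map((line) => line.trim())
    .filter((line) => line.length > 0);
}

// A trailing blank turns into an extra unpaired base.
console.log(JSON.stringify(legacyNormalize("((...)) "))); // "((...))."

// Left-over pseudoknot brackets are flattened to dots as well.
console.log(legacyNormalize("(([..)).]")); // ((...))..

// Trimming in the batch script avoids the surprise without changing the parser.
console.log(cleanBatch("((...)) \n\n (((...)))\n"));
```

Keeping the conversion in the parser and the trimming in the calling script preserves consistency with existing puzzle data while still avoiding the accidental dangling base.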

Omei · June 18, 2015, 4:14pm

Great! All my short hand-made examples work now. And backward compatibility is a legitimate concern, so I have no complaints about leaving 2) as it is.

Omei · June 19, 2015, 4:34am

OK, Nando. I’ve manually checked 100+ lab target structures against the structure graphic and haven’t found anything amiss. I think it’s ready to release.

Thanks again for being so responsive.

nando · June 19, 2015, 3:35pm

Thank you Omei for the report and the feedback.

Apparently, John is out of town for another week or so. Unless there’s an emergency, I’d rather not touch the prod server without him being around… I guess we’ll do the update soon after his return.

Omei · June 19, 2015, 4:37pm

I understand. Sounds like a plan.

nando · July 6, 2015, 11:24am

It took a little longer than expected, but the patch has been merged and pushed to prod now.