Post-Test Bot Review

paramodic · March 19, 2012, 1:40pm

Foreword: I know it’s quite a wall of text, so those who don’t feel like subjecting themselves to it may skip straight to the sections marked ‘The Bottom Line’ to get the basic idea of what I’m saying.

So I’ve spent the last couple of months focused primarily on identifying the flaws in the current bot algorithms, and I think I’ve found something worth reviewing. I’ll start with Vienna, whose problem I believe is the easiest to identify, confirm, and correct. I’ll finish with Infobot, whose problem I’ve identified but don’t understand, so I can’t provide any recommendations for a fix.

Vienna: This one wasn’t clear right away, particularly since Vienna’s semi-random algorithm occasionally solved some surprisingly intense puzzles, which clouded the data. Vienna seems to have the most trouble with puzzles that have either or both of two properties. The first is a high ratio of unpaired to paired bases, counting each residue individually rather than counting the pair as the unit. The second is a fold with closely placed internal loops and hairpins: internal loops of at least 2-2 or 3-3, no more than roughly 7 nt apart at a glance (I’ve yet to pin down the exact specs). I believe the two properties cause failure for a related reason. Vienna has no protocol for leaving unpaired bases as their default, unmutated adenines. Humans, by comparison, are generally lazy and won’t mutate the unpaired adenines unless they think it will help solve the fold. I think this is why Vienna fails some surprisingly simple puzzles that human players have little trouble solving.
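The first property can be measured directly. A minimal sketch, assuming the target is written in the dot-bracket notation ViennaRNA uses (‘(’ and ‘)’ for paired bases, ‘.’ for unpaired); the function name is hypothetical:

```python
def unpaired_to_paired_ratio(structure: str) -> float:
    """Ratio of unpaired to paired nucleotides in a dot-bracket structure.

    Each paired residue is counted on its own, so one base pair adds 2 to
    the paired total rather than 1.
    """
    unpaired = structure.count(".")
    paired = structure.count("(") + structure.count(")")
    if paired == 0:
        return float("inf")  # a fully unpaired strand: no pairs to divide by
    return unpaired / paired


print(unpaired_to_paired_ratio("((((....))))"))     # 0.5   (4 unpaired / 8 paired)
print(unpaired_to_paired_ratio("((..((...))..))"))  # 0.875 (7 unpaired / 8 paired)
```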

What I think is happening is that the randomly mutated unpaired bases keep pairing with randomly mutated unpaired bases in adjacent internal loops, or with bases in relatively unstable short stacks. Those stacks can then be broken, either by a stronger pair forming with a base from the internal loop or by the loop’s energy being raised too high. Granted, this doesn’t always happen, and because of its random nature, Vienna doesn’t fail consistently. Since Vienna uses a random algorithm, every state is possible, including the ‘correct’ configuration with the unpaired bases left unmutated.
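The cross-pairing idea can be made concrete by counting how many base combinations in nearby unpaired stretches could pair at all. This is a rough sketch, not an energy calculation. It assumes dot-bracket input, treats A-U, G-C and G-U as pairable, and borrows the rough 7 nt spacing above as a hypothetical `max_gap` cutoff. With every unpaired base left as A, nothing can cross-pair, since A-A is not a pair.

```python
# Pairs most RNA energy models allow: Watson-Crick plus the G-U wobble.
CAN_PAIR = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}


def unpaired_runs(structure):
    """Return (start, end) ranges, end exclusive, of consecutive '.' characters."""
    runs, start = [], None
    for i, c in enumerate(structure):
        if c == "." and start is None:
            start = i
        elif c != "." and start is not None:
            runs.append((start, i))
            start = None
    if start is not None:
        runs.append((start, len(structure)))
    return runs


def cross_pairs(sequence, structure, max_gap=7):
    """Count base combinations that could pair between nearby unpaired runs.

    Runs count as nearby if at most max_gap nucleotides separate them along
    the strand. This counts possible pairings only; it is not an energy model.
    """
    runs = unpaired_runs(structure)
    count = 0
    for a, (s1, e1) in enumerate(runs):
        for s2, e2 in runs[a + 1:]:
            if s2 - e1 > max_gap:
                continue
            count += sum((sequence[i], sequence[j]) in CAN_PAIR
                         for i in range(s1, e1) for j in range(s2, e2))
    return count


target = "((..((....))..))"
print(cross_pairs("GGAAGGAAAACCAACC", target))  # 0: all-A loops cannot pair with each other
print(cross_pairs("GGUUGGAAAACCAACC", target))  # 8: the UU can pair with each of the hairpin's four A's
```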

-----The Bottom Line-----
If Vienna were given a protocol that first attempts to solve a fold with the unpaired bases left as adenine, and only then moves on to mutating possible boost points, I expect its performance would be greatly enhanced. That is my recommendation.
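The first steps of the proposed protocol could look like the sketch below. The all G-C starting pairs, the function names, and the definition of a “boost point” (here, any unpaired base directly beside a pair) are assumptions for illustration. In practice the boost choice is probably narrower. Checking whether a design actually folds correctly would still need a folding engine such as ViennaRNA, which is left out here.

```python
def pair_table(structure):
    """Map each paired index to its partner by matching '(' and ')' with a stack."""
    stack, partner = [], {}
    for i, c in enumerate(structure):
        if c == "(":
            stack.append(i)
        elif c == ")":
            if not stack:
                raise ValueError("unbalanced structure")
            j = stack.pop()
            partner[i], partner[j] = j, i
    if stack:
        raise ValueError("unbalanced structure")
    return partner


def initial_sequence(structure):
    """Step 1: every unpaired base stays A; each pair starts as G-C (G on the 5' side)."""
    partner = pair_table(structure)
    seq = ["A"] * len(structure)
    for i, j in partner.items():
        if i < j:
            seq[i], seq[j] = "G", "C"
    return "".join(seq)


def boost_candidates(structure):
    """Step 2 (hypothetical): the only unpaired positions allowed to mutate,
    namely those directly beside a paired base."""
    n = len(structure)
    return [i for i, c in enumerate(structure)
            if c == "." and any(0 <= k < n and structure[k] != "." for k in (i - 1, i + 1))]


target = "((..((....))..))"
print(initial_sequence(target))  # GGAAGGAAAACCAACC
print(boost_candidates(target))  # [2, 3, 6, 9, 12, 13]
```

Restricting later mutations to `boost_candidates` keeps the interior of each loop as A, which is the behavior the recommendation asks for.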

Infobot: Infobot was a little trickier to pin down than Vienna, although Vienna’s specific error sites only came to me later, while I was testing Infobot. It seems to me that Infobot has trouble with 1 nt bulges under very specific conditions. These conditions do not always cause a failure, but each one individually seems to raise the odds that Infobot will fail, and they do seem to work in conjunction. They are as follows.

-The first condition is that the RNA fold has sequential internal loops. If the bulge is placed between these loops, Infobot is likely to fail. Whether the loops need to be uneven, or the stacks between them need to be a certain length, is yet undetermined and requires more testing, though I will say that shorter stacks do seem to baffle Infobot more effectively. The bulge is also more successful at baffling Infobot when it is placed within 1 nt of the internal loop. Additional 1 nt bulges, or 1 nt bulges that are not adjacent to an internal loop, seem to have no effect on Infobot.

-The second condition seems to be a clockwise turn in the fold. That is, following the strand from 5’ to 3’, the fold turns primarily in a clockwise direction, so that the end hairpin of the RNA fold sits further clockwise (to the right) than the origin of the strand.
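The bulge-placement part of the first condition can be checked mechanically by splitting a dot-bracket structure into loops. This is a sketch under stated assumptions. It reads “within 1 nt of the internal loop” as the bulge and the internal loop sharing a closing base pair (`max_stacks=0`), and it lets `max_stacks` widen that to a few stacked pairs. It does not try to encode the sequential-loop or clockwise-turn conditions. All names are hypothetical.

```python
def pair_table(structure):
    """Return pt where pt[i] is the partner index of i, or -1 if i is unpaired."""
    stack, pt = [], [-1] * len(structure)
    for i, c in enumerate(structure):
        if c == "(":
            stack.append(i)
        elif c == ")":
            if not stack:
                raise ValueError("unbalanced structure")
            j = stack.pop()
            pt[i], pt[j] = j, i
    if stack:
        raise ValueError("unbalanced structure")
    return pt


def classify_loops(structure):
    """Classify the loop closed by each base pair (i, j), in 5'-to-3' order of i."""
    pt = pair_table(structure)
    loops = []
    for i, j in enumerate(pt):
        if j <= i:  # skip unpaired bases and the 3' partner of each pair
            continue
        inner, unpaired, k = [], [], i + 1
        while k < j:
            if pt[k] > k:  # an enclosed pair: record it and jump past it
                inner.append((k, pt[k]))
                k = pt[k] + 1
            else:
                unpaired.append(k)
                k += 1
        if not inner:
            kind = "hairpin"
        elif len(inner) > 1:
            kind = "multiloop"
        else:
            left, right = inner[0][0] - i - 1, j - inner[0][1] - 1
            if left == 0 and right == 0:
                kind = "stack"
            elif left == 0 or right == 0:
                kind = "bulge"
            else:
                kind = "internal"
        loops.append({"kind": kind, "close": (i, j), "inner": inner, "unpaired": unpaired})
    return loops


def bulges_near_internal_loops(structure, max_stacks=0):
    """Closing pairs of 1-nt bulges that have an internal loop at most max_stacks
    stacked pairs away; max_stacks=0 means the two loops share a base pair."""
    loops = classify_loops(structure)
    by_close = {lp["close"]: lp for lp in loops}
    by_inner = {lp["inner"][0]: lp for lp in loops if len(lp["inner"]) == 1}
    flagged = []
    for lp in loops:
        if lp["kind"] != "bulge" or len(lp["unpaired"]) != 1:
            continue
        # Inward, toward the hairpin: follow stacks starting from the bulge's inner pair.
        pair, steps = lp["inner"][0], 0
        while by_close[pair]["kind"] == "stack" and steps < max_stacks:
            pair, steps = by_close[pair]["inner"][0], steps + 1
        near = by_close[pair]["kind"] == "internal"
        # Outward, toward the strand ends: follow stacks from the bulge's closing pair.
        pair, steps = lp["close"], 0
        while pair in by_inner and by_inner[pair]["kind"] == "stack" and steps < max_stacks:
            pair, steps = by_inner[pair]["close"], steps + 1
        near = near or (pair in by_inner and by_inner[pair]["kind"] == "internal")
        if near:
            flagged.append(lp["close"])
    return flagged


print(bulges_near_internal_loops("((.(..((......))..)))"))      # [(1, 19)]
print(bulges_near_internal_loops("((.(((..((......))..)))))"))  # []
```

In the second structure, two stacked pairs separate the bulge from the internal loop, so it is flagged only with `max_stacks=2`.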

Another observation is that when Infobot fails a fold, Vienna typically fails it as well, which stands to reason given the sequential, closely placed groups of unpaired bases.

-----The Bottom Line-----
I don’t know why Infobot has trouble with these conditions, and therefore I cannot suggest any solutions. I would ask my fellow EteRNA players to help me solve this mystery.

Thank you for taking the time to read this. I welcome any criticism, and I invite you all to test what I’ve observed and suggested here, and to use it freely in any of your future research.

Quasispecies · March 21, 2012, 6:44am

Nice work. It’s very interesting to see whether there is a pattern to the puzzles that stump certain bots. One thing I found striking about your series of puzzles is that they are also consistently solved by RNASSD. What does RNASSD do that the other two bots do not, and how might that give it an advantage on your puzzles?

From what I’ve seen, Infobot seems to struggle with zigzags, closely-spaced loops, and anything else that requires you to identify and deliberately destabilize unwanted structures. Maybe these structures cause info-rna to design a poor initial sequence. Maybe they cannot be efficiently solved by info-rna’s algorithm. Maybe both.

A lot of the bots accept/reject trial sequences based on whether they bring the new structure closer to the target structure. InfoRNA seems unique in having a chance of accepting a “bad” mutation. This could let it avoid getting “stuck” with a solution that seems to have nearly solved the puzzle but is actually further off the mark. For certain puzzles, though, InfoRNA’s starting point may be so far off the mark that many bad mutations would be required to reach a solution.
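The difference between the two acceptance styles is small in code. InfoRNA’s exact rule isn’t given here, so this sketch assumes a Metropolis-style criterion (the one simulated annealing uses) as a stand-in. `cost` is any measure of distance from the target, and the names and temperature value are hypothetical.

```python
import math
import random


def greedy_accept(old_cost, new_cost):
    """Keep a mutation only if it does not move the fold further from the target."""
    return new_cost <= old_cost


def metropolis_accept(old_cost, new_cost, temperature, rng=random):
    """Metropolis rule: always keep improvements; keep a worse mutation with
    probability exp(-(new_cost - old_cost) / temperature)."""
    delta = new_cost - old_cost
    if delta <= 0:
        return True
    if temperature <= 0:
        return False  # at zero temperature this reduces to the greedy rule
    return rng.random() < math.exp(-delta / temperature)


# Cost could be, e.g., the number of bases whose pairing differs from the target.
rng = random.Random(1)
trials = 10_000
accepted = sum(metropolis_accept(10, 12, temperature=1.0, rng=rng) for _ in range(trials))
print(accepted / trials)  # roughly exp(-2), about 0.135
```

Each worse step is accepted independently with probability exp(-delta/T), so a path that needs many bad mutations in a row becomes exponentially unlikely. That fits the concern about a starting point that is very far off the mark.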

The search algorithm may also be problematic. It may only consider changing improperly paired bases and those immediately adjacent. That won’t help if the bases that need to be changed are in a distant, properly-folded region. Imagine you’re trying to solve an isolated base pair between a multiloop and a large internal loop, like in the structure pictured below (failed by all bots). The UU terminal mismatch is required to solve the puzzle, but you can solve everything except the isolated pair if you use the wrong mismatch/closing pair.
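A search neighborhood like the one suspected above can be sketched as follows. This is a hypothetical illustration, not code from any of the bots. It assumes the current fold and the target are dot-bracket strings of equal length, and it proposes only bases whose pairing differs from the target plus their immediate neighbors.

```python
def pair_table(structure):
    """Return pt where pt[i] is the partner index of i, or -1 if i is unpaired."""
    stack, pt = [], [-1] * len(structure)
    for i, c in enumerate(structure):
        if c == "(":
            stack.append(i)
        elif c == ")":
            if not stack:
                raise ValueError("unbalanced structure")
            j = stack.pop()
            pt[i], pt[j] = j, i
    if stack:
        raise ValueError("unbalanced structure")
    return pt


def candidate_positions(current, target):
    """Hypothetical restricted neighborhood: mispaired bases plus their immediate neighbors."""
    cur, tgt = pair_table(current), pair_table(target)
    n = len(target)
    wrong = [i for i in range(n) if cur[i] != tgt[i]]
    return sorted({k for i in wrong for k in (i - 1, i, i + 1) if 0 <= k < n})


print(candidate_positions("(((......)))", "((((....))))"))  # [2, 3, 4, 7, 8, 9]
```

Only positions 2-4 and 7-9 are ever proposed here. If the real fix lay at position 0 or 11, far from the mispaired bases, a search restricted this way could never reach it.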

You or I look at that structure and think, “How can I destabilize the multiloop that forms if that isolated pair is broken?” Bots don’t seem to consider these things.