How to read the raw switch data

Eli_Fisker · May 28, 2012, 8:59pm

Questions keep coming up about how to read the raw switch data, so I have decided to write an intro. It is not as hard as it looks at first. Once you know which columns to read and what to look for, it gets much easier.

First I have marked the areas in the puzzle that are supposed to switch. After that I will mark the same spots in the raw switch data.

Here is the hairpin with the switching areas marked. The first column in the raw data holds the data for the hairpin shape. Whether or not there is a black line shows if a nucleotide is actually bound or unpaired in this shape. The red circles tell you what they are supposed to be. More on that later.

Below is the molecule-bound shape with the switching areas marked. The second column in the raw data holds the data for this shape.

To read the raw data you need to focus on the first two columns (from the left) in the raw data picture. The first column is for the hairpin shape. The second column is for the molecule-bound shape, which should bind the FMN molecule. I have marked the areas where switches are supposed to occur.

In the raw data a dark line means unpaired.

The red circles mark where a dark line should appear if the design is good.

Notice that there are circles on only one side of the boxed switching areas. That means that for a switch between the two RNA shapes to occur, there should be dark lines in only one of the two columns, under those red circles.

Remember that the first column represents the hairpin and the second column the molecule-bound shape. So when the lines disappear in the second column for four out of the five switching spots, that means those nucleotides are no longer unpaired, and should preferably be paired up with their intended partners. The exception is the fifth spot, which is the locked nucleotides in the hairpin string. Those should end up as single nucleotides.

Two black lines and a base number

If there are black lines in both column 1 and column 2, that means the nucleotide has not bound in either of the two switching shapes. This is bad. No binding means no switching.

Two white “lines” and a base number

If there is only white in both column 1 and column 2, that means the nucleotide is bound in both shapes and is not letting go in one of them, as it is supposed to. This is bad. Binding in both shapes means no switching.

One black line and one white (or fainter line)

A change in the tone of the line between the two columns means a switch occurred. Preferably it should be black in one column and white in the other. But even the slightest switch in the right direction counts when points are given.
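The three cases above can be written down as a small rule. This is only a sketch of the reading rule, not how the lab scores designs. It assumes each line has been turned into a number from 0.0 (white) to 1.0 (black). The function name `read_base`, the cutoffs `DARK` and `LIGHT`, and the `min_change` tolerance are all made up for illustration.

```python
# Intensities are an assumption: 0.0 = white (no line), 1.0 = black line.
DARK = 0.6    # assumed cutoff for "clearly dark"
LIGHT = 0.3   # assumed cutoff for "clearly white"

def read_base(hairpin, bound, circle_in, min_change=0.05):
    """Label one nucleotide from its line tone in column 1 and column 2.

    circle_in is "hairpin" or "bound": the column that holds the red
    circle, i.e. the column where the dark line is supposed to be.
    """
    if circle_in == "hairpin":
        change = hairpin - bound
    else:
        change = bound - hairpin
    # Even a small fade in the right direction counts as a switch.
    if change > min_change:
        return "switch"
    if min(hairpin, bound) >= DARK:
        return "unpaired in both shapes"
    if max(hairpin, bound) <= LIGHT:
        return "paired in both shapes"
    return "no clear switch"

print(read_base(0.9, 0.2, "hairpin"))
```

A base that is dark in both columns comes out as "unpaired in both shapes", and one that is white in both as "paired in both shapes", which are the two bad cases described above.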

Now, to see if the switches actually occur in a specific area, we will take a closer look at the raw data sheet. Let's zoom in on nucleotides 16 to 19, which represent one of the switching areas, the one where Rhiju mentioned in his analysis of my design that nucleotide 19 did not switch.

The reason 16-18 get points is that the lines get fainter in the second column, where there are no circles. The line at base 19 does not appear to get fainter, so there is no switch, not even a hint of one.
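The same check for bases 16 to 19 can be sketched in a few lines. The numbers below are invented to mimic the picture (they are not measured from the actual gel), with 1.0 standing for a black line and 0.0 for white. The helper name `fading_bases` and the tolerance are assumptions too.

```python
# Invented line tones: base -> (column 1 hairpin, column 2 molecule-bound).
# These are illustration values only, not real lab data.
region = {
    16: (0.90, 0.50),
    17: (0.80, 0.40),
    18: (0.85, 0.60),
    19: (0.90, 0.90),
}

def fading_bases(tones, tolerance=0.05):
    """Return the bases whose line gets fainter from column 1 to column 2."""
    return [base for base, (hairpin, bound) in sorted(tones.items())
            if hairpin - bound > tolerance]

print(fading_bases(region))  # [16, 17, 18]
```

Base 19 drops out of the list because its line is just as dark in the second column.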

And one last thing. When it comes to switch designs, you can’t trust the shape data alone. You need to look at the raw data to see if the switch occurs.

EXTRA - idea for what to do with the raw data

Here is how I use the raw data. There should be other ways to go at the lab, and if you have any ideas, please go ahead. Just read the raw data…

I pick one of the five switching areas that I want to improve. For that I pick a high-scoring design that already does well in that area, but has one or two non-switching nucleotides.

My thought is that we might be able to see which nucleotide (A, U, C or G) does best at switching at a specific base.

E.g. I want to make a change at base 19 as it isn’t switching well. If I put a U at base 19, is it then capable of being involved in an actual switch in any of the lab data? So you look through the lab designs at this spot, count which and how many of the U, G, C and A bases switch well there, and also count those that do not switch. That should give you a picture of tendencies.

If U at base 19 appears to be switching well and the other bases have less success at that spot, that might help in picking the right color for that specific problem spot. Of course a whole base pair is involved in that particular switch, so the partner of the nucleotide you are interested in should be watched the same way too.
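The counting can be sketched like this, assuming you have already read the raw data for each design by eye and noted whether the base at that spot switched. The sequences below are short made-up strings, not real lab designs, and the function name `tally_base` is hypothetical.

```python
def tally_base(designs, position):
    """Count, per base letter, how often it switched at a 1-based position.

    designs is a list of (sequence, switched) pairs, where switched is
    True if the raw data showed a switch at that position.
    Returns {base: (times_switched, times_not_switched)}.
    """
    counts = {}
    for sequence, switched in designs:
        base = sequence[position - 1]
        good, bad = counts.get(base, (0, 0))
        if switched:
            counts[base] = (good + 1, bad)
        else:
            counts[base] = (good, bad + 1)
    return counts

# Made-up example: which base at position 3 tends to switch?
designs = [
    ("GGUAC", True),
    ("GGUCC", True),
    ("GGACC", False),
    ("GGUAA", False),
]
print(tally_base(designs, 3))  # {'U': (2, 1), 'A': (0, 1)}
```

With only a handful of designs the counts say very little, so treat them as a hint of a tendency and nothing more.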

Honestly I don’t care what score a design has, as long as it has a proper switch at the spot I am looking at. As Brourd pointed out, that approach might lower the value of my data. True enough. But I think that if a nucleotide at this spot has shown itself capable of being involved in a switch several times, that improves my chances of picking the right data, even though I am looking at low-scoring designs. And it is a risk I’m willing to take until I figure out a better way to do it.

Can’t guarantee this method will work. Just some early thoughts on how to get the unswitching spots to switch.

Good luck with the data mining

hoglahoo · June 24, 2012, 5:44am

interesting idea. I have no idea whether it makes sense that if one nucleotide switches in one sequence, that tendency will carry over to other designs - I’m glad you’re going to go after it and I hope it yields good switching. Have you changed your approach with the recent ‘fix’ in scoring and shape data?

Eli_Fisker · June 24, 2012, 8:48am

Hi Hogla!

Imagine it like this. I am watching a whole base pair in the molecule-bound shape, where at least one of the nucleotides is not involved in a switch. I then look for designs where both nucleotides at this spot are involved in a switch. So my idea is basically this: say I have an AU base pair at a spot, and at least one of its nucleotides is not switching when following its route. Then I try to see if I should change that base pair into a GU, because statistically this has had better success at this spot, or rather into a GC pair.
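The base-pair version of that count could look like the sketch below. It assumes that for each design you have a hand-made set of positions that switched in the raw data. The sequences, the positions and the name `pair_success` are invented for illustration.

```python
def pair_success(designs, i, j):
    """For the pair at 1-based positions i and j, count per pair type
    how often BOTH nucleotides switched.

    designs is a list of (sequence, switching_positions) pairs.
    Returns {pair: (both_switched, total_seen)}.
    """
    counts = {}
    for sequence, switching in designs:
        pair = sequence[i - 1] + sequence[j - 1]
        both, total = counts.get(pair, (0, 0))
        if i in switching and j in switching:
            both += 1
        counts[pair] = (both, total + 1)
    return counts

# Made-up designs: pair between position 1 and position 5.
designs = [
    ("AGGGU", {1, 5}),
    ("GGGGU", {1, 5}),
    ("AGGGU", {1}),
    ("GGGGC", {1, 5}),
]
print(pair_success(designs, 1, 5))
```

This looks at one pair in isolation and ignores the neighbouring pairs.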

I used this method for round 6. I didn’t succeed in improving the design. I was modding the Tebowned design (89%) and my mod scored 75%.

Nascarnut modded Tebowned too, based on another suggestion I had on the raw data, and it scored 78%.

I still like the idea of my method. What could be wrong? I might have relied too much on lower-scoring designs, which Brourd pointed out could be a problem. Also I have been ignoring the effect neighbouring pairs can have on a base pair. I don’t know. Just that so far it does not look like what I did was working.

In the fifth round I used a slightly different method. I looked at the raw data to find a base that was not switching. Then I looked at its partner (in the molecule-bound shape), and it didn’t look too good either, according to my understanding of the black lines back then. So I simply swapped them around, on the idea that when both were bad, I might get lucky and get one or both better just by swapping them. Potential maximum output from minimum change. That was my Tebowned mod that ended up scoring 94% after the data update.
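The swap itself is a tiny edit to the sequence. A minimal sketch, with a made-up sequence (not the Tebowned design) and a hypothetical helper name `swap_pair`:

```python
def swap_pair(sequence, i, j):
    """Return the sequence with the bases at 1-based positions i and j swapped."""
    bases = list(sequence)
    bases[i - 1], bases[j - 1] = bases[j - 1], bases[i - 1]
    return "".join(bases)

# Made-up example: a G at position 1 paired with a C at position 8.
print(swap_pair("GGAAAUCC", 1, 8))  # CGAAAUCG
```

The pair stays the same type (a GC is still a GC), only which strand holds which base changes.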

For round 7, I have started looking at the shape data again. I still cast a glance at the raw data for the design I’m interested in modding. I aim at getting binds at spots that look like they are not binding in the shape data.

UPDATE
Above I said that: “When it comes to switch designs, you can’t trust the shape data alone. You need to look at the raw data to see if the switch occurs.”

Brourd found a concrete example of a mismatch between the shape data and the raw data. This allowed the devs to figure out what was wrong with how the shape data was presented.

And so far it looks like the new shape data is making more sense. I will now say: look at both the raw data and the shape data. But you should be able to go on the shape data alone.

Eli_Fisker · July 2, 2012, 6:00pm

Just an addition on the raw data versus shape data, in case you haven’t seen Rhiju’s update on our data situation.

The shape data can now be trusted more than the raw data. So when using the raw data, combine it with the shape data.

If you want to know more about the why and what happened in the meantime, go read here.