[Strategy Market-Switch] DPAT analysis data for R100 for use in designing high and low scoring designs

Jennifer_Pearl · March 18, 2016, 7:01am

Hi. I just finished running R100 through DPAT, and here is all the data it generated using the macro search filter settings for designs in the 80-100 score group and the 30 60 score group, for everyone to use. There are 3 types of files for each single oligo sublab, for both the 80-100 and the 30 60 groups. The file named with just the round number and sublab name contains only the list of markers from the MFE secondary structure. The RawDesignInfo file has all the unique secondary structures that fit the search criteria, along with sequences and partition function information when available. There is also a Report file, which has several sections.

The Report file starts with a summary of the structural characteristics of the macro search results. It gives the averages of stacks and pairs for the whole set of results and the standard deviations of those averages. This is broken into two groups that repeat. The first group is the designs that have stacks that fit the search criteria as well as stacks that don’t; the second group is the designs in which all stacks fit the search criteria. Then there is a list of each unique secondary structure and the number of times it appears in the search results, as well as its number of pairs and stacks and their ratios. I was doing some research into this before I changed focus and worked on how to detect the 30 60 group. This data is then repeated for the second group described above, and at the very end is a list of all the stacks found that fit the search criteria.
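For anyone who wants to recompute this kind of summary from the structure list, here is a minimal sketch. It assumes the structures are in dot-bracket notation and treats a "stack" as a maximal run of directly nested base pairs; DPAT's own definitions and file layout may differ, and the function names are made up for illustration.

```python
from collections import Counter
from statistics import mean, stdev


def base_pairs(structure):
    """Return the set of (i, j) base pairs in a dot-bracket string."""
    open_positions, pairs = [], set()
    for idx, ch in enumerate(structure):
        if ch == "(":
            open_positions.append(idx)
        elif ch == ")":
            pairs.add((open_positions.pop(), idx))
    return pairs


def count_stacks(structure):
    """Count stacks, assumed here to be maximal runs of directly nested pairs.

    An isolated pair counts as a stack of length 1 under this assumption.
    """
    pairs = base_pairs(structure)
    # A pair begins a new stack when no pair sits directly outside it.
    return sum(1 for (i, j) in pairs if (i - 1, j + 1) not in pairs)


def summarize(structures):
    """Summarize pair and stack counts over a list of search results."""
    pair_counts = [len(base_pairs(s)) for s in structures]
    stack_counts = [count_stacks(s) for s in structures]

    def spread(values):
        # stdev needs at least two values; report 0.0 for a single result.
        return stdev(values) if len(values) > 1 else 0.0

    return {
        "occurrences": Counter(structures),  # times each unique structure appears
        "mean_pairs": mean(pair_counts),
        "sd_pairs": spread(pair_counts),
        "mean_stacks": mean(stack_counts),
        "sd_stacks": spread(stack_counts),
    }
```

Calling `summarize` on the list of structures from one score group returns the occurrence count per unique structure together with the mean and sample standard deviation of pairs and stacks.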

Also included are the raw DPAT data files with all the stacks and the pairing probabilities for the stacks that were used as a source for the searches.

The DPAT data I used as the constraints for Sara in R101 is the plain names file for the markers, the partition function data ranges for the applicable score group, and the pairing probabilities of “ideal” binding sites, which don’t exist for these sublabs. So for round 2 of R100 I would recommend designing so that at least one of the stacks from the 80-100 score group is in your design, for a good chance at a high scorer, and so that the partition function values are in the same range as the good results. I can have values for you tomorrow, but I need to go to bed now.

Edit:
I fixed the link, added more files, and fixed my error in calling this round 101 vs. 100. I have included the macro settings file I used, along with an explanation of DPAT’s language, for a detailed explanation of what was searched. Basically, DPAT looked at specific score ranges and picked out structures that were only present in those score ranges.
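The idea of picking out structures that only occur inside a score range can be sketched as a set difference. This is only an illustration of the filtering idea, not DPAT's actual macro search; the function name and the `(structure, score)` input format are assumptions.

```python
def structures_unique_to_range(designs, low, high):
    """Return structures seen only in designs scoring within [low, high].

    `designs` is an iterable of (structure, score) tuples; this input
    format is hypothetical and not taken from the DPAT files.
    """
    inside, outside = set(), set()
    for structure, score in designs:
        if low <= score <= high:
            inside.add(structure)
        else:
            outside.add(structure)
    # Drop anything that also showed up outside the score range.
    return inside - outside
```

For example, calling it with `low=80` and `high=100` keeps a structure only if every design containing it scored between 80 and 100.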

Here is the zip file
https://dl.dropboxusercontent.com/u/87351147/R100_DPAT%20macro%20search%20results%20and%20raw%20data…

rhiju · March 20, 2016, 4:22pm

how well did the DPAT predictions do?

Jennifer_Pearl · March 20, 2016, 7:29pm

I am going through the DPAT predictions now. I just wrote a program to parse the data and give me a list of scores for each list of predictions. The predictions I posted in the forums use DPAT data to predict optimal binding site stack lengths with optimal minimum pairing probabilities. The designs that I submitted using SaraBot in SSNG1 and SSNG2 use marker theory mixed with some optimal binding site stuff and some partition function ranges.
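A parser of this kind could look roughly like the sketch below. It is not the actual program; it assumes the results are available as CSV text with hypothetical `prediction` and `score` columns, which may not match the real lab data export.

```python
import csv
import io
from collections import defaultdict


def scores_by_prediction(csv_text):
    """Group design scores by the prediction list each design belongs to.

    Expects CSV text with 'prediction' and 'score' columns; both column
    names are assumptions made for this sketch.
    """
    grouped = defaultdict(list)
    for row in csv.DictReader(io.StringIO(csv_text)):
        grouped[row["prediction"]].append(float(row["score"]))
    # Sort each list from highest to lowest score for easier reading.
    return {name: sorted(scores, reverse=True) for name, scores in grouped.items()}
```

The result maps each prediction list name (for example a "good", "bad", or "control" label) to its scores, which can then be plotted or compared group by group.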

The SaraBot data looks good, I think, and demonstrates that high scores can be gained using DPAT and Sara. I posted my initial impressions in the forum under the R101 conversation. Based on an initial look at the raw data as I go through everything, I think DPAT tells you which stacks can get high scores and which ones won’t, but not a “single ideal stack” for the binding site. I will know more once I dig into everything.

Jennifer_Pearl · March 20, 2016, 8:46pm

Just got done looking at and plotting data for SSNG3. The NG3 sublabs had a unique opportunity in that they were the only sublabs with large loop stacks that showed such a stark difference in performance with the same constant static long small loop binding site, when looking at the plots for stacks from Round 1 and 2. In SSNG3 I found that a large loop length of 5 would spell disaster for the design, while a large loop length of 4 would be good for it. I used a 3 base pair long large loop for the control. This was supported in the lab results, and here is a plot of everything.

I have more data to post but wanted to get this out there quickly since I have a lot to do today. I am going to post data for EXNG3 next and then look at data for the static small loops to see if the data agrees with Eli’s thoughts on the static small loop, which are similar to what DPAT has been seeing.

rhiju · March 20, 2016, 9:13pm

Cool – this is a great check. Was same state NG3 a repeat of a prior puzzle (and if so, did you use prior solves to help train the predictor?). Did you have predictions for the puzzles that were new to R101?

Jennifer_Pearl · March 20, 2016, 9:24pm

I used prior solves from SSNG3 round 1 and 2 to train the predictor. I did submit SaraBot good designs (no bad or control) for Inverted Same State NG 1 using generalized predictions from SSNG1 but did not post any predictions for new designs.

Jennifer_Pearl · March 21, 2016, 6:22am

Here is data for EXNG3 with good and bad large loop binding site predictions. Since the bad prediction happened to be the control group (3 base pair long) I was using for everything, I ended up using the predictions for static long loop without probabilities as the control. This basically includes all variations of the large loop submitted. It was not the best choice, but I did not think of a control to use back then and figured I could use it as a control now. I am not sure it is a valid control, though, since there are so many different things going on in it that it doesn’t really say anything other than that there are only 2 groups, a good and a bad, which I predicted. There were a couple of OK designs in the predicted bad group, but none that go higher than the high 70’s, and it seems to focus on the 60 group. The good group has a few designs where the predicted goods go up to 80, with the distribution between 50 and 80. The predicted bad group scores significantly lower than the predicted good group, with its distribution mostly between 10 and 40. The highest good is not the highest design in the round, but it is high.

I am starting to believe that there is a series of ideal binding site lengths and not just a single one, which is in agreement with Eli’s observations about the small loop binding site. He observed that the longer static loop is better but that the occasional non-static loop could still work, thus there is not just a single ideal structure. I have not looked at my predictions for dynamic small loop vs. long static small loop, but I will post that data next.

Here is the plot for EXNG3

Jennifer_Pearl · March 22, 2016, 1:19am

I’m going to repost the data for SSNG3 and EXNG3 in the discussion thread, as well as further data on the other sublabs.