[Strategy Market] Tests by Region

Quasispecies · August 2, 2011, 5:04pm

This strategy breaks the RNA into regions and applies penalties to each region. Each sequence starts with 100 points. Each time you penalize, you deduct one point. You could probably add a weighting factor to each penalty, but I would only be guessing at what the weights should be.

Score = (100 - total # of penalties)
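As a rough illustration of the scoring rule above, here is a minimal Python sketch. The function name, the penalty names, and the optional `weights` argument are all hypothetical; the post only specifies one point per penalty and leaves any weighting open.

```python
def region_score(penalties, weights=None, base=100):
    """Return base minus the (optionally weighted) penalty total.

    penalties: dict mapping a penalty name to how many times it fired.
    weights:   optional dict of per-penalty weights; missing names count as 1.
    """
    weights = weights or {}
    total = sum(count * weights.get(name, 1) for name, count in penalties.items())
    return base - total


# Hypothetical penalty counts for one design: 1 + 2 + 3 = 6 points deducted.
print(region_score({"loop_energy": 1, "repeat_4mer": 2, "boundary_not_gc": 3}))  # 94
```

With no weights supplied this reduces to Score = 100 - total number of penalties.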

Break the target structure into regions.
Use the following types: loops, stacks, and boundaries. A given base pair may be a member of more than one region.

- A “loop” includes bulges, internal loops, hairpin loops, multi-branched loops, and unpaired overhangs at the termini.
- The closing pair of a stack is a member of both the stack and the adjoining loop. It is additionally classified as a “boundary”.
- Any base pair shared by two loops is also classified as a “boundary”.
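One way to find the boundary pairs could look like the sketch below. It assumes the target structure is given in dot-bracket notation, which the post does not specify, and it treats a pair as a boundary whenever the pair stacked directly inside or directly outside it is missing. That covers both ends of every stack (including the end facing the termini) as well as an isolated pair sitting between two loops. The function names are hypothetical.

```python
def pair_table(structure):
    """Map each paired index to its partner, from dot-bracket notation."""
    stack, partner = [], {}
    for i, ch in enumerate(structure):
        if ch == "(":
            stack.append(i)
        elif ch == ")":
            j = stack.pop()
            partner[i], partner[j] = j, i
    return partner


def boundary_pairs(structure):
    """Return the (i, j) pairs that sit at either end of a stack."""
    partner = pair_table(structure)
    result = []
    for i, j in sorted(partner.items()):
        if i > j:
            continue  # visit each pair once, from its 5' side
        # Is there a pair stacked directly inside / directly outside this one?
        inner = i + 1 < j - 1 and partner.get(i + 1) == j - 1
        outer = partner.get(i - 1) == j + 1
        if not (inner and outer):
            result.append((i, j))
    return result


print(boundary_pairs("((((....))))"))  # [(0, 11), (3, 8)]
print(boundary_pairs("((.(...).))"))   # [(0, 10), (1, 9), (3, 7)]
```

In the second structure the pair (3, 7) is shared by the bulge and the hairpin loop, so it is reported as a boundary even though it is not part of a longer stack.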

**_For all loop and overhang regions:_**

- Penalize if the free energy contribution per nucleotide in the region is >0.5 kcal.
- Penalize for each stretch of 4 consecutive bases complementary to 4 bases elsewhere in the molecule. Example:

  
```
Round 1: 5' ...[GAGU]AACGGAC... 3'

Penalized if more than one sequence in the molecule
is complementary to "GAGU"

Round 2: 5' ...G[AGUA]ACGGAC... 3'

Penalized if more than one sequence in the molecule
is complementary to "AGUA"

And so on...
```
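The sliding-window check in this example could be sketched in Python roughly as follows. Several things here are assumptions rather than part of the strategy as posted: "complementary" is taken to mean the Watson-Crick reverse complement (GU wobble pairs are ignored), sites overlapping the window itself do not count as "elsewhere", and the threshold `min_hits=2` follows the "more than one sequence" wording of the example. All function names are hypothetical.

```python
COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}


def reverse_complement(seq):
    """Watson-Crick reverse complement of an RNA string (GU wobble ignored)."""
    return "".join(COMPLEMENT[b] for b in reversed(seq))


def complementary_sites(seq, i, k=4):
    """Start indices of sites complementary to seq[i:i+k], excluding overlaps with it."""
    target = reverse_complement(seq[i:i + k])
    return [j for j in range(len(seq) - k + 1)
            if seq[j:j + k] == target and (j + k <= i or j >= i + k)]


def kmer_complement_penalties(seq, start, end, k=4, min_hits=2):
    """Count k-mer windows inside seq[start:end] with at least min_hits complementary sites."""
    penalties = 0
    for i in range(start, end - k + 1):
        if len(complementary_sites(seq, i, k)) >= min_hits:
            penalties += 1
    return penalties


# Made-up sequence: "GAGU" at position 0 has two complementary "ACUC" sites.
seq = "GAGUAAAACUCAAAACUC"
print(complementary_sites(seq, 0))           # [7, 14]
print(kmer_complement_penalties(seq, 0, 4))  # 1
```

Setting `min_hits=1` would instead penalize any window with a complementary site anywhere else, which is the stricter reading of the rule text.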
  
**_For all stack regions:_**

Treat the stack as two separate chains. Example:

```
5' ...GAGUAACGGAC... 3'
3' ...CUCAUUGCUUG... 5'
```

- Penalize each instance of 3 or more of the same base placed consecutively (e.g. one penalty for UUU, two for UUUU, three for UUUUU, etc.).
- Penalize each placement of 2 or more adjacent purines (or pyrimidines). Example:

```
Y is any pyrimidine, R is any purine

YGAGY         YGACR         YGUGY
 ^^^           ^^
3 penalties   2 penalties   0 penalties
```

Do the following for both chains:
- Penalize each stretch of 4 or more consecutive bases that are complementary to more than one 4-nucleotide sequence elsewhere in the RNA. Example:

```
Round 1: 5' ...[GAGU]AACGGAC... 3'

Penalized if more than one sequence in the molecule
is complementary to "GAGU"

Round 2: 5' ...G[AGUA]ACGGAC... 3'

Penalized if more than one sequence in the molecule
is complementary to "AGUA"

And so on...
```
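The two run-based stack penalties could be counted along these lines. This is only a sketch with hypothetical function names. It reads the examples as "one penalty per base beyond the second in a run of identical bases" and "one penalty per base sitting in a run of two or more purines or pyrimidines", which reproduces the counts given for UUU/UUUU/UUUUU and for YGAGY/YGACR/YGUGY.

```python
from itertools import groupby

PURINES = set("AG")


def same_base_penalties(chain, min_run=3):
    """One penalty for a run of min_run identical bases, plus one per extra base."""
    return sum(max(0, len(list(g)) - (min_run - 1)) for _, g in groupby(chain))


def class_run_penalties(chain, min_run=2):
    """Count the bases that sit in runs of >= min_run purines (or pyrimidines)."""
    total = 0
    for _, group in groupby(chain, key=lambda base: base in PURINES):
        run_length = len(list(group))
        if run_length >= min_run:
            total += run_length
    return total


print([same_base_penalties(s) for s in ("UUU", "UUUU", "UUUUU")])     # [1, 2, 3]
# Concrete instances of YGAGY, YGACR and YGUGY from the example.
print([class_run_penalties(s) for s in ("CGAGU", "UGACA", "CGUGC")])  # [3, 2, 0]
```

Each function takes one chain of the stack, so both would be called twice per stack, once for each strand.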

**_For all boundary regions:_**

- Give a penalty if the boundary is not a G/C pair.
- Give a penalty if there is an adjacent base pair and it is identical to the boundary pair.

```
            A A
G A G U A A     A
C U C A U U     A
          ^ G A

The indicated base pair would be penalized twice.
It is not G/C and it is identical to the adjacent pair.
```
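The two boundary checks could be sketched like this. The function name and its arguments are hypothetical, and "identical" is assumed to mean the same two bases in the same strand orientation (so A-U next to U-A would not count). The test sequence is a made-up hairpin that reuses the GAGUAA/CUCAUU stack from the example with an arbitrary GAAA loop.

```python
def boundary_penalties(seq, pair, neighbours):
    """Penalties for one boundary pair.

    pair:       (i, j) indices of the boundary pair, i on the 5' strand.
    neighbours: list of (i, j) index pairs stacked directly against it.
    """
    i, j = pair
    bases = (seq[i], seq[j])
    penalties = 0
    if set(bases) != {"G", "C"}:
        penalties += 1  # the boundary is not a G/C pair
    if any((seq[a], seq[b]) == bases for a, b in neighbours):
        penalties += 1  # an adjacent pair repeats the boundary pair
    return penalties


# 5' GAGUAA-GAAA-UUACUC 3': the A-U pair closing the loop sits next to another A-U.
seq = "GAGUAAGAAAUUACUC"
print(boundary_penalties(seq, (5, 10), [(4, 11)]))  # 2
print(boundary_penalties(seq, (0, 15), [(1, 14)]))  # 0
```

The closing A-U pair collects both penalties, as in the example, while the terminal G-C pair next to an A-U pair collects none.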

Adrien_Treuille · August 2, 2011, 5:12pm

Wow. This is a super sophisticated strategy!

Quasispecies · August 2, 2011, 5:14pm

Unfortunately, yes.

Probably too unwieldy and poorly described, but maybe someone will play with the ideas and find a less complicated way to implement them.

JeehyungLee · August 4, 2011, 7:07pm

Dear Quasispecies,

Your strategy has been added to our implementation queue with task id 44. You can check the schedule of the implementation here.

ETA of the implementation is 8/11/2011

Thanks for sharing your idea!

EteRNA team

JeehyungLee · August 15, 2011, 3:20am

Hi Quasispecies,

As your strategy included more than one component, we have implemented it as 2 separate algorithms. These 2 algorithms will eventually be combined in the EteRNA ensemble strategy with all other strategies.

“For all loop and overhang regions” and “For all boundary regions” got implemented as 2 strategies. “For all stack regions” has not been implemented, as there were other strategies looking at the same thing (aldo’s “repetition”, Eli Fisker’s “blue/red/green line”).

The results from the 2 strategies will be posted shortly!

JeehyungLee · August 15, 2011, 3:23am

Dear Quasispecies

We are glad to report that your strategy has been implemented and tested.

While implementing your strategy, we have made small changes to the parameters you specified to optimize the performance.

Note that we’ll always run an optimization over the parameters you specify, so you won’t have to worry about fine-tuning all the numbers you use.

Just the idea and rough numbers are enough to run your algorithm!

Length : Your strategy was implemented with 10 lines of code.

Ordering : We ran your strategy on all synthesized designs and ordered them based on predicted scores. The correlation of your strategy’s ordering with the ordering based on the actual scores was 0.116889559965. (1.0 is the best score, -1.0 is the worst score. A completely random prediction would have 0 correlation)

Please note that the numbers specified above will change in the future, as we’ll rerun your algorithm whenever new synthesis data is available.

More detailed results have been posted on the strategy market page. Thank you for sharing your idea, and we look forward to other brilliant strategies from you!
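For readers wondering how an "ordering" correlation like the one above might be computed: the report does not say which statistic was used, so the sketch below simply assumes a Spearman-style rank correlation (Pearson correlation of the ranks). The function names and the numbers in the example are made up for illustration and are not EteRNA data.

```python
def ranks(values):
    """Average 1-based ranks, so tied values share the same rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    pos = 0
    while pos < len(order):
        end = pos
        while end + 1 < len(order) and values[order[end + 1]] == values[order[pos]]:
            end += 1
        average = (pos + end) / 2 + 1
        for k in range(pos, end + 1):
            result[order[k]] = average
        pos = end + 1
    return result


def pearson(x, y):
    """Pearson correlation; undefined (division by zero) if either input is constant."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def rank_correlation(predicted, actual):
    """Correlation between the orderings implied by two score lists."""
    return pearson(ranks(predicted), ranks(actual))


# Made-up scores: identical ordering gives 1.0, reversed ordering gives -1.0.
print(rank_correlation([97, 94, 88], [80, 60, 40]))  # 1.0
print(rank_correlation([97, 94, 88], [40, 60, 80]))  # -1.0
```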

JeehyungLee · August 15, 2011, 3:24am

Dear Quasispecies

We are glad to report that your strategy has been implemented and tested.

While implementing your strategy, we have made small changes to the parameters you specified to optimize the performance.

Note that we’ll always run an optimization over the parameters you specify, so you won’t have to worry about fine-tuning all the numbers you use.

Just the idea and rough numbers are enough to run your algorithm!

Length : Your strategy was implemented with 30 lines of code.

Ordering : We ran your strategy on all synthesized designs and ordered them based on predicted scores. The correlation of your strategy’s ordering with the ordering based on the actual scores was 0.16307161346. (1.0 is the best score, -1.0 is the worst score. A completely random prediction would have 0 correlation)

Please note that the numbers specified above will change in the future, as we’ll rerun your algorithm whenever new synthesis data is available.

More detailed results have been posted on the strategy market page. Thank you for sharing your idea, and we look forward to other brilliant strategies from you!

JeehyungLee · August 15, 2011, 3:24am

“For all boundary regions”

JeehyungLee · August 15, 2011, 3:24am

“For all loop and overhang regions”