[Strategy Market] deivad's strategy

deivad · June 4, 2011, 1:21am

My strategy consists of these steps:

If the RNA has two big loops close to each other (for me, that means fewer than 4 bonds between them), I put some GC pairs there (3 at most), plus AU pairs.
If two small loops are separated by a single bond, I put a GC pair there.
In the rest of the RNA sequence, I put random AU, UG and GC pairs to complete it. I use about 40-60% AU’s, 20-35% UG’s and 5-15% GC’s.
When I finish this step, I look at the target shape to see how close I am to the goal.

-If a short series of bonds between two loops isn’t bonded correctly, I switch to natural mode to see whether the problem is wrong bonding (first case) or no bonding at all (second case).

-In the first case, I swap some paired bases randomly to see if it changes something, and it usually does. If not, I do the same as in the second case.

In the second case, I change UG’s and AU’s to GC’s.
If that doesn’t solve the problem, I change some elements in the loops to reduce their energy, making sure that this doesn’t create wrong bonds.
If small loops between long bonds aren’t bonded correctly, I first swap the bond next to the wrong small loop. If that doesn’t solve the problem, I swap some others at random; that usually solves it, and the target shape is obtained.
Finally, as an “extra entertainment”, I try to change the elements in the loops, to avoid lots of A’s there. I do it randomly, but if a change takes me off the target shape, I undo it and try another element.
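
The random fill in the third step above could be sketched roughly as follows. This is only an illustration, not anything from the game itself: the function name `fill_stems`, the exact weights (picked from inside the stated ranges so that they sum to 1), the random orientation of each pair, and the choice to leave unpaired positions as A are all assumptions.

```python
import random

# Assumed weights, picked from inside the ranges in the post
# (AU 40-60%, UG 20-35%, GC 5-15%) so that they sum to 1.
PAIR_WEIGHTS = {"AU": 0.60, "GU": 0.30, "GC": 0.10}


def fill_stems(pairs, length, weights=PAIR_WEIGHTS, rng=None):
    """Assign a random AU/GU/GC pair to every (i, j) paired position.

    pairs  -- list of (i, j) index tuples that should be bonded
    length -- total sequence length
    Unpaired positions are left as "A" (an assumption of this sketch).
    """
    rng = rng or random.Random()
    seq = ["A"] * length
    kinds = list(weights)
    kind_weights = [weights[k] for k in kinds]
    for i, j in pairs:
        first, second = rng.choices(kinds, weights=kind_weights)[0]
        if rng.random() < 0.5:  # pick the orientation of the pair at random
            first, second = second, first
        seq[i], seq[j] = first, second
    return "".join(seq)


# Example: a 3-pair hairpin with a 4-base loop.
print(fill_stems([(0, 9), (1, 8), (2, 7)], 10))
```

The GC pairs placed by hand in the first two steps would be fixed before this fill and excluded from `pairs`.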

I hope you can implement that!

JeehyungLee · June 4, 2011, 4:47am

Dear deivad,

It seems like you are describing a strategy to “design” RNA sequences for the lab. Right now the focus in the Strategy Market is a strategy to “score” RNA sequences. In other words, if you are given another player’s RNA, how would you score it?

Your strategy is full of very exciting ideas…and we don’t want to overlook this. Could you try to steer the direction of your strategy to “scoring” instead of “designing”?

Thanks for sharing your idea!

deivad · June 4, 2011, 1:04pm

Oh, sorry, Jeehyung!! And thanks for pointing out my mistake!! It seems I didn’t properly understand the aim of the “strategy market”.

Well, in fact, I don’t really know how to score a design.

I’ll need to take a look at the lab designs first, and their actual results, because I’ve just reached 10,000 points and the lab was blocked for me. It will take me some time, a week or more, but as soon as I learn how to score a design, I’ll post it and let you know. Do you want me to reply to this post then, or start a new one?

JeehyungLee · June 5, 2011, 2:25pm

hi deivad, a reply to this post will work.

Yep, scoring is hard, but it’s closer to solving problems fundamentally : ] In fact, if we know how to score, we can always come up with a “design” algorithm that tries to maximize the score!

deivad · June 11, 2011, 5:14pm

Well, I’ve had a quick look at the lab, and this is what I’ve come up with as my scoring strategy. It’s quite long, but I think it works! I add or subtract points for each design according to these conditions:

I count how many loops have an energy below 1. I add a point for each one (+1 each).
I do the same for the loops with an energy higher than 4. I subtract a point for each loop like this (-1 each).
I add a point for each short chain (fewer than 5 bonds) that has two or more GC’s (+1 each chain).
I subtract a point for each short chain (fewer than 5 bonds) that has one or more GU’s (-1 each chain).
If a GU bond is right next to a loop with an energy higher than 2, I subtract 2 points for each one (-2 each).
I subtract a point for each C in a loop (-1 each), as they are potential undesirable strong bonds. I don’t take into account the blocked, default elements.
I subtract 0.2 points for each G in a loop (-0.2 each), as they can try to make a strong bond with any C. I don’t take into account the blocked, default elements.
I divide the energy by 10 and make it positive (I mean, if the total energy is -50 kcal, it would give 5 points).
If there are more than 50% GC’s, I subtract 2 points. This is a way of penalizing too many GC’s, which are an easy way of obtaining the desired design.
If there are more than 15% GU’s, I subtract 5 points. The design will probably be too weak, and it won’t fold correctly.
I divide the melting point by 20.

Finally, I add up the different marks.
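
Put together, the rules above amount to a simple additive score. The sketch below is one possible reading of them, not the EteRNA team’s implementation: it assumes the per-design features (loop energies, short chains, counts, fractions, melting point) have already been computed elsewhere, and the names `DesignFeatures` and `score_design` are made up for illustration. Pairs are assumed to be written in a fixed order ("GC", "GU", "AU") regardless of strand.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class DesignFeatures:
    """Precomputed inputs; every field name here is hypothetical."""
    loop_energies: List[float]     # energy of each loop
    short_stems: List[List[str]]   # chains with fewer than 5 bonds, as pair labels
    gu_next_to_hot_loop: int       # GU bonds next to a loop with energy > 2
    loop_c: int                    # C's in loops, excluding blocked elements
    loop_g: int                    # G's in loops, excluding blocked elements
    total_energy: float            # total free energy in kcal (usually negative)
    gc_fraction: float             # fraction of GC's, 0..1
    gu_fraction: float             # fraction of GU's, 0..1
    melting_point: float


def score_design(f: DesignFeatures) -> float:
    score = 0.0
    score += sum(1 for e in f.loop_energies if e < 1)    # +1 per low-energy loop
    score -= sum(1 for e in f.loop_energies if e > 4)    # -1 per high-energy loop
    score += sum(1 for s in f.short_stems if s.count("GC") >= 2)  # +1 per chain
    score -= sum(1 for s in f.short_stems if "GU" in s)  # -1 per chain with a GU
    score -= 2 * f.gu_next_to_hot_loop                   # -2 each
    score -= 1.0 * f.loop_c                              # -1 per C in a loop
    score -= 0.2 * f.loop_g                              # -0.2 per G in a loop
    score += abs(f.total_energy) / 10                    # -50 kcal gives +5
    if f.gc_fraction > 0.5:
        score -= 2                                       # too many GC's
    if f.gu_fraction > 0.15:
        score -= 5                                       # too many GU's
    score += f.melting_point / 20
    return score
```

Keeping each rule on its own line makes it easy to drop the energy-dependent terms, or to retune any single threshold, without touching the rest.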

As far as I’ve seen, good designs (with 95% or more) obtain more than 10 points (sometimes more than 20), 80-90% designs obtain between 0 and 10 points, while designs with less than 70% are often negative.

Obviously, that needs to be tested, because I’ve only tried it on a few cases.

I hope you can implement that!! Let me know if there’s anything you don’t understand or that isn’t completely clear.

JeehyungLee · June 12, 2011, 3:15pm

Dear deivad,

Your strategy has been added to our implementation queue with task id 15. You can check the schedule of the implementation here.

ETA of the implementation is 6/15/2011

Thanks for sharing your idea!

EteRNA team

JeehyungLee · June 18, 2011, 8:01pm

Hi deivad,

I want to apologize for the delays in implementing your strategy.

We are having technical problems integrating energy calculation into the strategy market right now… We’ll get back to your strategy as soon as we fix the problem!

EteRNA team

deivad · June 19, 2011, 1:43pm

Hi Jeehyung!

I’ve already seen this problem in other strategies, so I was right in thinking my strategy would have the same issue.

I encourage you to keep improving your program, because energy calculation is probably a key point to get good designs.

By the way, if you want, you can try implementing my strategy without the conditions that need energy calculations. Then, once you can finally do those calculations, you can compare whether my strategy is much better or worse with or without them.

JeehyungLee · June 23, 2011, 3:02am

Dear Deivad,

We are glad to report that your strategy has been implemented and tested.

While implementing your strategy, we have made small changes to the parameters you specified to optimize the performance.

Note that we’ll always run an optimization over the parameters you specify, so you won’t have to worry about fine tuning all the numbers you use. Just the idea and rough numbers are enough to run your algorithm!

Length : Your strategy was implemented with 60 lines of code.

Ordering : We ran your strategy on all synthesized designs and ordered them based on predicted scores. The correlation of your strategy’s ordering with the ordering based on the actual scores was 0.329. (1.0 is the best score, -1.0 is the worst score. A completely random prediction would have 0 correlation)
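
The post does not say which correlation statistic was used. As an illustration only, the sketch below assumes a Spearman rank correlation (Pearson correlation of the ranks), which matches the description of comparing two orderings; the function names are made up for this example.

```python
def ranks(values):
    """1-based ranks; tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average
        i = j + 1
    return result


def pearson(x, y):
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5  # undefined if either input is constant


def spearman(predicted, actual):
    """Rank correlation between predicted and actual scores, in [-1, 1]."""
    return pearson(ranks(predicted), ranks(actual))


# Hypothetical predicted vs. synthesis scores for four designs.
print(spearman([12.0, 3.5, -1.0, 8.0], [95, 80, 60, 70]))
```

A value near 1.0 means the strategy ranks designs in almost the same order as the lab results; a value near 0 means the ordering is no better than random.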

Please note that the numbers specified above will change in the future as we’ll rerun your algorithm whenever new synthesis data is available.

More detailed results have been posted on the strategy market page. Thank you for sharing your idea, and we look forward to other brilliant strategies from you!

Also, please take a look at the new strategy market plan. In short, we would like to ask you to use a single feature for each strategy - for example, instead of saying GC must be 60% and loops should have energy lower than -3.0… you’ll publish one strategy for each of them. The point is that you don’t have to worry about combining multiple features - we’ll do that for you!