Dear all -
Now that we have an (almost) final version of the EteRNA ensemble algorithm running in the lab, we thought we should share how we build the algorithm from the player strategies.
First, we create an “ensemble classifier” from all the player strategies in the market. The ensemble classifier combines the player strategies into a single strategy that scores a given RNA design on a 0 to 100 scale. There are two ways to do this: with sparse features or with L2 regularization.
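As a rough illustration of what "combining strategies into one score" can mean, here is a minimal sketch that assumes the ensemble is a weighted linear combination of each strategy's score for a design. The function name `ensemble_score`, the `bias` term, and the clipping to the 0 to 100 range are our own assumptions for illustration, not necessarily what the lab's code does.

```python
from typing import Sequence


def ensemble_score(strategy_scores: Sequence[float],
                   weights: Sequence[float],
                   bias: float = 0.0) -> float:
    """Combine per-strategy scores for ONE design into a single 0-100 score.

    Hypothetical sketch: assumes a linear combination of strategy scores.
    Clipping to [0, 100] is an illustrative choice, not the lab's method.
    """
    if len(strategy_scores) != len(weights):
        raise ValueError("need exactly one weight per strategy")
    # Weighted sum: each weight is the "importance" of one player strategy.
    raw = bias + sum(w * s for w, s in zip(weights, strategy_scores))
    # Keep the result on the 0-100 scale used for synthesis scores.
    return min(100.0, max(0.0, raw))
```

For example, `ensemble_score([80, 40, 95], [0.5, 0.2, 0.3])` gives 76.5. The two methods described next differ only in how the `weights` are chosen.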
Sparse features: In this method, we first preselect the 5 best features (here, each feature is a player strategy). We do this with a technique called Least Angle Regression and Shrinkage (LARS). Once the 5 features are selected, we determine the weights (the importance of each strategy in the classifier) that minimize the error between the predicted score and the actual synthesis score.
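The two-stage idea (select 5 features, then fit weights on just those) can be sketched with NumPy. To keep the example short and dependency-free, it swaps LARS for a simpler relative, orthogonal matching pursuit: at each step it adds the strategy most correlated with the current residual and refits least squares. LARS also brings in features by their correlation with the residual but moves the coefficients more gradually, so the two can select different sets. The function names and the data layout (rows are designs, columns are strategy scores, `y` is the synthesis score) are assumptions.

```python
import numpy as np


def select_sparse_features(X: np.ndarray, y: np.ndarray, k: int = 5) -> list:
    """Greedily pick k columns of X (one column per player strategy).

    Stand-in for LARS (orthogonal matching pursuit, not LARS itself).
    """
    Xc = X - X.mean(axis=0)          # center features
    yc = y - y.mean()                # center target
    norms = np.linalg.norm(Xc, axis=0)
    norms[norms == 0] = 1.0          # avoid dividing by zero for constant columns
    selected = []
    residual = yc.copy()
    for _ in range(min(k, X.shape[1])):
        # Normalized correlation of every strategy with what is still unexplained.
        corr = np.abs(Xc.T @ residual) / norms
        if selected:
            corr[selected] = -np.inf  # never pick the same strategy twice
        selected.append(int(np.argmax(corr)))
        # Refit least squares on the current selection, update the residual.
        coef, *_ = np.linalg.lstsq(Xc[:, selected], yc, rcond=None)
        residual = yc - Xc[:, selected] @ coef
    return selected


def fit_weights(X: np.ndarray, y: np.ndarray, selected: list):
    """Least-squares weights (plus intercept) using only the selected strategies."""
    A = np.column_stack([X[:, selected], np.ones(len(y))])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef[:-1], coef[-1]       # (weights, intercept)
```

Typical use: `sel = select_sparse_features(X, y, k=5)` followed by `weights, intercept = fit_weights(X, y, sel)`.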
L2 regularization: Unlike the previous method, we don’t preselect 5 features; we use ALL the strategies and try to determine their weights directly. However, we constrain the weights with L2 regularization to keep the classifier from overfitting to the existing data.
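L2 regularization applied to least squares is ridge regression, which has a closed-form solution. A minimal NumPy sketch, assuming the same layout as above (rows are designs, columns are all strategy scores) and a hypothetical penalty strength `lam` that would in practice be tuned, for example by cross-validation:

```python
import numpy as np


def fit_ridge(X: np.ndarray, y: np.ndarray, lam: float = 1.0):
    """L2-regularized least squares over ALL strategies (ridge regression).

    Minimizes ||y - b - X w||^2 + lam * ||w||^2. Centering X and y first
    means the intercept b is not penalized.
    """
    x_mean = X.mean(axis=0)
    y_mean = y.mean()
    Xc = X - x_mean
    yc = y - y_mean
    n_features = X.shape[1]
    # Solve (Xc^T Xc + lam * I) w = Xc^T yc; larger lam shrinks w toward 0.
    w = np.linalg.solve(Xc.T @ Xc + lam * np.eye(n_features), Xc.T @ yc)
    b = y_mean - x_mean @ w          # recover the intercept
    return w, b
```

Every strategy gets a nonzero weight here, but a large `lam` keeps those weights small so that no single strategy can be tuned too tightly to the existing synthesis data.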
Both techniques come from the machine learning literature, and if you are interested, you might find these slides useful.
Once we have the ensemble classifier, we pass it to a “sequence designer”, which first creates a sequence with 60% GC pairs and then keeps changing bases at random positions until it finds a sequence that gets a high ensemble classifier score.
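The designer loop might look roughly like the sketch below. Several details are our assumptions rather than the lab's design: the target structure is given in dot-bracket notation, unpaired bases start as `A`, a mutation at a paired position replaces the whole pair with a Watson-Crick pair so the pairing is preserved, a mutation is kept only if it does not lower the score, and the loop stops at a hypothetical `target` score or after `max_steps`. The `score` argument stands in for the ensemble classifier.

```python
import random
from typing import Callable, Dict, Optional, Tuple


def parse_pairs(structure: str) -> Dict[int, int]:
    """Map each paired position to its partner in a dot-bracket string."""
    stack, pairs = [], {}
    for i, ch in enumerate(structure):
        if ch == "(":
            stack.append(i)
        elif ch == ")":
            if not stack:
                raise ValueError("unbalanced structure")
            j = stack.pop()
            pairs[i], pairs[j] = j, i
    if stack:
        raise ValueError("unbalanced structure")
    return pairs


def design(structure: str, score: Callable[[str], float], target: float = 90.0,
           gc_fraction: float = 0.6, max_steps: int = 10000,
           seed: Optional[int] = None) -> Tuple[str, float]:
    rng = random.Random(seed)
    pairs = parse_pairs(structure)
    seq = ["A"] * len(structure)           # assumption: unpaired bases start as A
    openers = [i for i, j in pairs.items() if i < j]
    rng.shuffle(openers)
    n_gc = round(gc_fraction * len(openers))
    # Initial sequence: ~60% of the pairs are GC/CG, the rest AU/UA.
    for k, i in enumerate(openers):
        seq[i], seq[pairs[i]] = rng.choice(("GC", "CG") if k < n_gc else ("AU", "UA"))
    best = score("".join(seq))
    for _ in range(max_steps):
        if best >= target:
            break
        trial = seq[:]
        i = rng.randrange(len(seq))        # random position to change
        if i in pairs:
            # Replace the whole pair so it stays complementary.
            trial[i], trial[pairs[i]] = rng.choice(("GC", "CG", "AU", "UA"))
        else:
            trial[i] = rng.choice("ACGU")
        s = score("".join(trial))
        if s >= best:                      # assumption: keep non-worsening mutations
            seq, best = trial, s
    return "".join(seq), best
```

With a real classifier, `score` would compute the strategy scores for the sequence and combine them with the learned weights.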
We now have two versions of the EteRNA ensemble algorithm running: one with the classifier using sparse features and the other with the classifier using L2 regularization. If you have a good idea for improving the algorithm, please let us know!