Simple rating system for synthesis candidate selection

Hello all,

For the past month we have been discussing a new lab synthesis candidate selection system, the “Elo” rating system, and have run a beta test of it.

http://getsatisfaction.com/eternagame…
(A link to elo rating system discussion)

Recently, there was a discussion among devs about a simpler selection system: instead of comparing two designs, it would just ask you to “guess” the synthesis score of a given RNA, and you would be rewarded based on how accurate your prediction was.

http://getsatisfaction.com/eternagame…
(aldo has also discussed a similar idea in this post)

An advantage of this system is that we’ll be getting much more direct and fine-grained information from you (players) for selecting synthesis candidates. This might lead to better performance overall.

A possible disadvantage is that it’s harder for new players to “predict” exact scores than to vote or to pick the better of two designs (the Elo rating system).

What do you think of the system?

As for the implementation timeline, this system is fairly simple, so we can safely assume that both the Elo system and the new rating system would take about the same amount of time to settle.

Sounds interesting, but I would like to see the results of the elo 1 before going on to this elo2.

I also cannot help but wonder if the suggestion of an entirely new rating system here, at this time, is an indication that Elo 1 may not be working as hoped, in the devs’ estimation. I hope this is not the case, but the timing seems to suggest it.

In short, I think it may be better to finish chewing and swallow before taking another bite.

d9, it is not so much that Elo 1 is not working; it’s that we want to make sure we have the best system before we roll it out, because a constantly changing system is confusing to players and makes data collection and analysis more difficult.

The way I see it, the goals of the ideal voting system would be to provide some consistent player-enforced quality control to weed out known losing strategies, while simultaneously making it easier for new, innovative designers to get noticed and synthesized, even in a sea of hundreds of entries.

Simple voting is intuitive, but it is indeed hard to get noticed if you do not have a reputation as a good designer; plus there are too many designs to choose from, so the safe bets get all the votes (“snowballing”). I saw several promising designs that never got more than 1 or 2 votes because they were from new players.

In the past, I have also seen all GC designs with high votes, and the only thing I could do to compensate was vote the next-lowest reasonable design up to try and bump the GC down in the rankings.

With Elo 1, I could consistently pick other designs over the all-GC one, which is probably the easiest comparison decision to make. But I can see that too many pairwise comparisons may be needed to get a statistically significant ranking from all participating lab members as the number of submissions approaches triple digits, and this is the central limitation of the system. Unless everyone does hundreds of reviews, I don’t think it will scale well.
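For reference, this is a minimal sketch of the standard Elo update that a pairwise comparison system like Elo 1 could use; the K-factor and the 1500 starting rating are conventional defaults, not EteRNA’s actual parameters.

```python
def expected_score(rating_a: float, rating_b: float) -> float:
    """Probability that design A 'wins' a pairwise comparison against design B."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))

def elo_update(rating_a: float, rating_b: float, a_won: bool, k: float = 32.0):
    """Return the new (rating_a, rating_b) after one pairwise comparison."""
    exp_a = expected_score(rating_a, rating_b)
    actual_a = 1.0 if a_won else 0.0
    delta = k * (actual_a - exp_a)
    # Ratings are zero-sum: whatever A gains, B loses.
    return rating_a + delta, rating_b - delta

# Example: a design at the default rating beats a slightly higher-rated one.
new_a, new_b = elo_update(1500.0, 1550.0, a_won=True)
```

The scaling worry above follows from this structure: each comparison only moves two ratings, so with hundreds of submissions many comparisons per player are needed before the rankings stabilize.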

With Elo 1 + the comment system, the advantage is that designers I’ve never heard of have their design pop up randomly on my screen, and I can leave them some feedback to point them in the direction of something that would stand a good chance if synthesized. I never would have looked at them if it weren’t for the random aspect of the reviewing system.

The idea proposed above would essentially allow negative confidence in a design to be expressed as well as positive confidence (sort of like negative voting). This is an interesting idea; it would be neat if players could accumulate some sort of ranking of how accurate their scoring assignments are. There could be some weighting involved too, so that extreme predictions are weighted by how experienced a player has been in past voting rounds (like karma on Slashdot or something).
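The weighting idea could look something like the following hypothetical sketch, where each player’s score prediction is weighted by their track record so that extreme predictions from unproven players move the aggregate less; the function names and the weighting formula are illustrative assumptions, not anything proposed by the devs.

```python
def predictor_weight(past_errors: list, max_weight: float = 5.0) -> float:
    """Weight grows as a player's average past prediction error shrinks."""
    if not past_errors:
        return 1.0  # newcomers start with a neutral weight
    mean_error = sum(past_errors) / len(past_errors)
    return min(max_weight, 100.0 / (mean_error + 20.0))

def weighted_prediction(predictions: list) -> float:
    """Combine (predicted_score, predictor_error_history) pairs into one estimate."""
    total = sum(predictor_weight(hist) * score for score, hist in predictions)
    weight_sum = sum(predictor_weight(hist) for _, hist in predictions)
    return total / weight_sum

# A veteran with small past errors predicts 90; a newcomer predicts 40.
# The aggregate lands much closer to the veteran's prediction.
estimate = weighted_prediction([(90.0, [2.0, 3.0]), (40.0, [])])
```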

I still think reviews should be randomly assigned; it’s the only way to guarantee new designs get noticed.

I agree with D9 in that we need to see the results from Elo1 beta before going on to Elo2.

“guess a synthesis score of a given RNA”
Yes, that system would be a lot harder for both new and current players, and the time needed to analyze each design would increase too.

I would hold off on this idea at this time. Even though I can see its potential benefit, I think we’re missing evaluative outputs that we can use as measures, so the benefit would be muted.

Because the field of RNA is so new, so to speak, I think you will have to resign yourself to the fact that changes will always be necessary to keep up with the latest developments. The best system would be one that is up for review every six months, unless there’s a new measure that we can immediately put to use and that is an easy win.

I think now that we’re moving to more quantifiable voting systems, we should be careful. I wouldn’t be surprised if there is quite a wide disparity in how Elo 1 is shaping the results, even amongst the top players. (By the way, I’d love to see how Elo is currently performing.)

To push for another system change, into more unknown territory, will I think just lead players to rely more on dot plots and web servers like RNAfold, which I think blunts the effect of using human spatial awareness and recognition. I don’t know about other people, but I have trouble determining whose design is the best out of Ding’s, Mat’s, mine, and d9’s, much less what it’ll actually score. So I don’t blame people for only voting for the designs which are already leading in votes, aka snowballing.

Now if I played the inverse card, I could quite comfortably predict what an all-GC design would score. Would I get full points for that, or would you disallow it? There are designs that we know are going to fail; technically, the guess would be correct.

On a totally different side note that this thread actually popped into my head: what if, when your design made it into the top 8 for synthesis one week, all your designs had to sit out the following round? You could still submit designs; they just wouldn’t be in the running for synthesis.

As a new player, I fully admit to snowballing. I simply don’t know what else to do. As a player ranked in the top 20, I have a good grasp of creating puzzle-solving designs. But I am baffled by what makes a better design: low, high, or medium energy? More or fewer of certain nucleotide pairs? Short of taking classes here at UW-Madison, what would you suggest? I’m afraid I’m just causing more damage than contribution in the lab. Thanks.

I think the main thing to do is look through past lab results to see what has been tried and what worked or didn’t work. And read old posts on GetSat.

It also helped me in the beginning to do some modifications of already-synthesized designs rather than try to design my own from scratch. That way you can see which parts failed and try to think of ways to fix them; it’s a good way of getting a sense of what does and doesn’t work that only really comes through practice (which we’re all still working on). Plus, I’ve noticed that especially in later rounds of a shape, new players are a lot more likely to get a design voted for synthesis if it’s a modification of a design that did fairly well already (I know that’s how I got my first two voted up).

Hi wisdave - also, take a look at the table at the end of this post to get a rough idea of past successful percentages of each kind of base-pair:

http://getsatisfaction.com/eternagame…

Good Luck!

-d9

Wisdave - this may be totally facetious/irrelevant comment for you if you are fully invested in another field of study, but I can vouch that if Tom Record is still teaching biophysical chemistry at UW, you might try sitting in when you can because he’s the bee’s knees in biochemical thermodynamics which is ultimately what this game is all about :slight_smile:

I’m with dave, I haven’t a clue as to what you even have in mind about guessing a score. What’s a score anyway, points out of 100? How is that computed? For that matter, do you go back after the fact and rate our submissions, show the scores? I know we get lab rewards based on the scores. I’ve played most available puzzles, but still don’t really grok the lab side at all.

Since apparently there are lots of doubts as to whether this would work or would be preferable to the alternatives, and since it’s such a simple system to implement (i.e. add a “Predict” button to the design info dialog, a “My Prediction” column to the lab table, and a formula to assign points for predictions), why not roll it out just as an extra way to earn points at first and then look at the results to see whether they could also be used to select designs for synthesis? That would also allow you to try out various selection criteria (highest average prediction, highest median prediction, highest Elo rating after automated pairwise comparison, etc.) on real prediction data before settling on one to use for actual selection.
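To make the “extra way to earn points” idea concrete, here is a hypothetical sketch of a prediction points formula plus two of the selection criteria mentioned above (highest average and highest median prediction). The linear falloff and all names are illustrative assumptions, not a planned implementation.

```python
from statistics import mean, median

def prediction_points(predicted: float, actual: float) -> float:
    """Award up to 100 points, falling off linearly with absolute error."""
    return max(0.0, 100.0 - abs(predicted - actual))

def rank_by_average(predictions: dict) -> list:
    """Order design IDs by their average predicted score, best first."""
    return sorted(predictions, key=lambda d: mean(predictions[d]), reverse=True)

def rank_by_median(predictions: dict) -> list:
    """Order design IDs by their median predicted score, best first."""
    return sorted(predictions, key=lambda d: median(predictions[d]), reverse=True)
```

Running several such criteria over the same recorded predictions, as suggested above, would show whether they pick meaningfully different synthesis slates before committing to one.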

I don’t know about other people, but I have trouble determining whose design is the best out of Ding’s, Mat’s, mine and d9’s. Much less what it’ll actually score.

But certainly you have some idea of the ballpark of their scores. It doesn’t have to be exact, just an informed estimate. As far as synthesis selection is concerned, what matters is not the relative scores of the top eight designs but rather the fact that they are all judged to be better than the rest.

Now if I played the inverse card, I could quite comfortably predict what an all GC design would score. Would I get full points for that, or are you going to not make it allowable? There are designs that we know are going to fail, technically the guess would be correct.

If enough other people predict it will fail, it won’t be selected for synthesis, so no one will get points for it. If on the other hand enough people think it will do well that it gets selected for synthesis, and you’re among the few who correctly predict it will fail, you probably deserve the full points.

Thanks to everyone for the excellent suggestions.

Hi matt, if we try this “simple” prediction system now, it will mean we have three systems going simultaneously (normal voting, Elo, and the prediction system). Now THAT is confusing, especially since we have not even seen any result from Elo yet to judge how it is working or how it would change the designs sent to synthesis. I agree with you that we shouldn’t do anything confusing to players, and a third test running simultaneously would be exactly that. One test should be completed and evaluated before starting anything else new.

@JRStern - rhiju (who runs the lab where our RNA designs are synthesized and tested) explains the scoring system in his response to this thread: http://getsatisfaction.com/eternagame…

I tried to write out a layman’s version, but it ended up way too long and no clearer than rhiju’s explanation :slight_smile:

As far as the scores of designs that aren’t synthesized go, right now each one is just assigned the same score as the synthesized design from its submission round that has the most nucleotides in common with it. Those scores are what the lab rewards are based on. I’d take them with a grain (or a bowlful) of salt, since a design can share very few nucleotides with any of the synthesized designs but still has to be assigned a score: I’ve seen designs whose “closest” synthesized design shared only about half the nucleotides.
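The stand-in scoring described above amounts to a nearest-neighbor lookup; this sketch is inferred from that description, not from the actual codebase, so treat the details (position-by-position matching, equal-length sequences) as assumptions.

```python
def shared_nucleotides(seq_a: str, seq_b: str) -> int:
    """Count positions where two equal-length sequences agree."""
    return sum(1 for a, b in zip(seq_a, seq_b) if a == b)

def assigned_score(design: str, synthesized: dict) -> float:
    """Give an unsynthesized design the score of its closest synthesized neighbor.

    `synthesized` maps each synthesized sequence to its measured score.
    """
    closest = max(synthesized, key=lambda seq: shared_nucleotides(design, seq))
    return synthesized[closest]
```

The complaint above falls out directly: `max` always returns some neighbor, even when the best match shares only half its nucleotides with the design being scored.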

I think that part of either proposal for lab reform is doing away with lab rewards for “closest” sequences: in Elo we’d be rewarded for the accuracy of our comparisons only between two designs that both get synthesized, and in this rating system I think we’d only be rewarded for how closely we guessed the scores of designs that are actually synthesized.

Ok. I played around with d9’s spreadsheet and looked at everything synthesized at 95 and above. I also noted the melting points and energy levels. I modified one of my designs to fit within these parameters and took into account the comments about repeating patterns of nucleotides. I think it looks much better now. Unfortunately, I’ll have to wait for the next round, as I have used up my three solutions. Again, thanks for the help.

@ Alan.Robot - Thanks for the comment, but I’m at the tail end of a career in business. It would be a couple of years before I retire and could take a few classes. I just stumbled on this a couple of months ago and was taken in by the possibility of designing RNA. I might get one to synthesize yet. I’ll keep working on it now that I have a few tips.

We got some preliminary results from the new lab on “The Star.”

http://eterna.cmu.edu/news/393375

Unfortunately, I’ll have to wait for the next round, as I have used up my three solutions.

You can delete one of them if you want, you just have to unvote it first. Just make sure you copy the sequence and save it somewhere before deleting in case you want to refer back to that design again later.

Thanks, aldo. Done.