Simple rating system for synthesis candidate selection

Hello all,

For the past month we have been discussing a new lab synthesis candidate selection system, the “Elo” rating system, and have run a beta test of it.

http://getsatisfaction.com/eternagame…
(A link to elo rating system discussion)

Recently, there was a discussion among devs about a simpler selection system: instead of comparing two designs, it would just ask you to “guess” the synthesis score of a given RNA, and you would be rewarded based on how accurate your prediction was.

http://getsatisfaction.com/eternagame…
(aldo has also discussed a similar idea in this post)

An advantage of this system is that we’ll be getting much more direct and fine-grained information from you (players) for selecting synthesis candidates. This might lead to better performance overall.

A possible disadvantage is that it’s harder for new players to “predict” exact scores than to vote or to pick the better of two designs (the Elo rating system).

What do you think of the system?

As for the implementation timeline, this system is fairly simple, so we can safely assume that both the Elo system and the new rating system would take about the same amount of time to settle.

Sounds interesting, but I would like to see the results of the elo 1 before going on to this elo2.

I also cannot help but wonder if the suggestion of an entirely new rating system here, at this time, is an indication that Elo 1 may not be working as hoped, in the devs’ estimation. I hope this is not the case, but the timing seems to suggest it.

In short, I think it may be better to finish chewing and swallow before taking another bite.

d9, it is not so much that Elo 1 is not working; it’s that we want to make sure we have the best system before we roll it out, because a constantly changing system is confusing to players and makes data collection and analysis more difficult.

The way I see it, the goals of the ideal voting system would be to provide some consistent player-enforced quality control to weed out known losing strategies, while simultaneously making it easier for new, innovative designers to get noticed and synthesized, even in a sea of hundreds of entries.

Simple voting is intuitive, but it is indeed hard to get noticed if you do not have a reputation as a good designer; plus there are too many designs to choose from, so the safe bets get all the votes (“snowballing”). I saw several promising designs that never got more than 1 or 2 votes because they were from new players.

In the past, I have also seen all GC designs with high votes, and the only thing I could do to compensate was vote the next-lowest reasonable design up to try and bump the GC down in the rankings.

With Elo 1, I could consistently pick other designs over the all-GC one, which is probably the easiest comparison decision to make. But I can see that too many pairwise comparisons may be needed to get a statistically significant ranking from all participating lab members as the number of submissions approaches triple digits, and this is the central limitation of the system. Unless everyone does hundreds of reviews, I don’t think it will scale well.
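For reference, this is a minimal sketch of the standard Elo update that a pairwise comparison system like Elo 1 could use; the K-factor and the 1500 starting rating are conventional defaults, not EteRNA’s actual parameters.

```python
def expected_score(rating_a: float, rating_b: float) -> float:
    """Probability that design A 'wins' a pairwise comparison against design B."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))

def elo_update(rating_a: float, rating_b: float, a_won: bool, k: float = 32.0):
    """Return the new (rating_a, rating_b) after one pairwise comparison."""
    exp_a = expected_score(rating_a, rating_b)
    actual_a = 1.0 if a_won else 0.0
    delta = k * (actual_a - exp_a)
    # Ratings are zero-sum: whatever A gains, B loses.
    return rating_a + delta, rating_b - delta

# Example: a design at the default rating beats a slightly higher-rated one.
new_a, new_b = elo_update(1500.0, 1550.0, a_won=True)
```

The scaling worry above follows from this structure: each comparison only moves two ratings, so with hundreds of submissions many comparisons per player are needed before the rankings stabilize.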

With Elo 1 + the comment system, the advantage is that designers I’ve never heard of have their design pop up randomly on my screen, and I can leave them some feedback to point them in the direction of something that would stand a good chance if synthesized. I never would have looked at them if it weren’t for the random aspect of the reviewing system.

The idea proposed above would essentially allow negative confidence in a design to be expressed as well as positive confidence (sort of like negative voting). This is an interesting idea; it would be neat if players could accumulate some sort of ranking of how accurate their scoring assignments are. There could be some weighting involved too, so that extreme predictions are weighted by how experienced a player has been in past voting rounds (like karma on Slashdot or something).
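The weighting idea could look something like the following hypothetical sketch, where each player’s score prediction is weighted by their track record so that extreme predictions from unproven players move the aggregate less; the function names and the weighting formula are illustrative assumptions, not anything proposed by the devs.

```python
def predictor_weight(past_errors: list, max_weight: float = 5.0) -> float:
    """Weight grows as a player's average past prediction error shrinks."""
    if not past_errors:
        return 1.0  # newcomers start with a neutral weight
    mean_error = sum(past_errors) / len(past_errors)
    return min(max_weight, 100.0 / (mean_error + 20.0))

def weighted_prediction(predictions: list) -> float:
    """Combine (predicted_score, predictor_error_history) pairs into one estimate."""
    total = sum(predictor_weight(hist) * score for score, hist in predictions)
    weight_sum = sum(predictor_weight(hist) for _, hist in predictions)
    return total / weight_sum

# A veteran with small past errors predicts 90; a newcomer predicts 40.
# The aggregate lands much closer to the veteran's prediction.
estimate = weighted_prediction([(90.0, [2.0, 3.0]), (40.0, [])])
```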

I still think reviews should be randomly assigned; it’s the only way to guarantee new designs get noticed.

I agree with D9 in that we need to see the results from Elo1 beta before going on to Elo2.

“guess a synthesis score of a given RNA”
Yes, that system would be a lot harder for both new and current players, and the time needed to analyze each design would increase too.

I would hold off on this idea at this time. Even though I can see its potential benefit, I think we’re missing evaluative outputs that we can use as measures, so the benefit would be muted.

Because the field of RNA is so new, so to speak, I think you will have to resign yourself to the fact that changes will always be necessary to keep up with the latest developments. The best system would be one that is up for review every six months, unless there’s a new measure that we can immediately put to use and that is an easy win.

I think now that we’re moving to more quantifiable voting systems, we should be careful. I wouldn’t be surprised if there is quite a wide disparity in how Elo 1 is shaping the results, even amongst the top players. (By the way, I’d love to see how Elo is currently performing.)

To push for another system change, into more unknown territory, will I think just lead players to rely more on dot plots and web servers like RNAfold, which I think blunts the effect of using human spatial awareness and recognition. I don’t know about other people, but I have trouble determining whose design is the best out of Ding’s, Mat’s, mine, and d9’s, much less what it’ll actually score. So I don’t blame people for only voting for the designs which are already leading in votes, aka snowballing.

Now if I played the inverse card, I could quite comfortably predict what an all-GC design would score. Would I get full points for that, or would you disallow it? There are designs that we know are going to fail; technically, the guess would be correct.

On a totally different side note that this thread actually popped into my head: what if, when your design made it into the top 8 for synthesis one week, all your designs had to sit out the following round? You could still submit designs; they just wouldn’t be in the running for synthesis.

As a new player, I fully admit to snowballing. I simply don’t know what else to do. As a player ranked in the top 20, I have a good grasp of creating puzzle-solving designs. But I am baffled by what makes a better design: low, high, or medium energy? More or fewer of certain nucleotide pairs? Short of taking classes here at UW-Madison, what would you suggest? I’m afraid I’m just causing more damage than contribution in the lab. Thanks.

I think the main thing to do is look through past lab results to see what has been tried and what worked or didn’t work. And read old posts on GetSat.

It also helped me in the beginning to do some modifications of already-synthesized designs rather than try to design my own from scratch. That way you can see which parts failed and try to think of ways to fix them; it’s a good way of getting a sense of what does and doesn’t work that only really comes through practice (which we’re all still working on). Plus, I’ve noticed that especially in later rounds of a shape, new players are a lot more likely to get a design voted for synthesis if it’s a modification of a design that did fairly well already (I know that’s how I got my first two voted up).

Hi wisdave - also, take a look at the table at the end of this post to get a rough idea of past successful percentages of each kind of base-pair:

http://getsatisfaction.com/eternagame…

Good Luck!

-d9

Wisdave - this may be totally facetious/irrelevant comment for you if you are fully invested in another field of study, but I can vouch that if Tom Record is still teaching biophysical chemistry at UW, you might try sitting in when you can because he’s the bee’s knees in biochemical thermodynamics which is ultimately what this game is all about :slight_smile:

I’m with dave, I haven’t a clue as to what you even have in mind about guessing a score. What’s a score anyway, points out of 100? How is that computed? For that matter, do you go back after the fact and rate our submissions, show the scores? I know we get lab rewards based on the scores. I’ve played most available puzzles, but still don’t really grok the lab side at all.

Since apparently there are lots of doubts as to whether this would work or would be preferable to the alternatives, and since it’s such a simple system to implement (i.e. add a “Predict” button to the design info dialog, a “My Prediction” column to the lab table, and a formula to assign points for predictions), why not roll it out just as an extra way to earn points at first and then look at the results to see whether they could also be used to select designs for synthesis? That would also allow you to try out various selection criteria (highest average prediction, highest median prediction, highest Elo rating after automated pairwise comparison, etc.) on real prediction data before settling on one to use for actual selection.
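To make the “extra way to earn points” idea concrete, here is a hypothetical sketch of a prediction points formula plus two of the selection criteria mentioned above (highest average and highest median prediction). The linear falloff and all names are illustrative assumptions, not a planned implementation.

```python
from statistics import mean, median

def prediction_points(predicted: float, actual: float) -> float:
    """Award up to 100 points, falling off linearly with absolute error."""
    return max(0.0, 100.0 - abs(predicted - actual))

def rank_by_average(predictions: dict) -> list:
    """Order design IDs by their average predicted score, best first."""
    return sorted(predictions, key=lambda d: mean(predictions[d]), reverse=True)

def rank_by_median(predictions: dict) -> list:
    """Order design IDs by their median predicted score, best first."""
    return sorted(predictions, key=lambda d: median(predictions[d]), reverse=True)
```

Running several such criteria over the same recorded predictions, as suggested above, would show whether they pick meaningfully different synthesis slates before committing to one.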

I don’t know about other people, but I have trouble determining whose design is the best out of Ding’s, Mat’s, mine and d9’s. Much less what it’ll actually score.

But certainly you have some idea of the ballpark of their scores. It doesn’t have to be exact, just an informed estimate. As far as synthesis selection is concerned, what matters is not the relative scores of the top eight designs but rather the fact that they are all judged to be better than the rest.

Now if I played the inverse card, I could quite comfortably predict what an all GC design would score. Would I get full points for that, or are you going to not make it allowable? There are designs that we know are going to fail, technically the guess would be correct.

If enough other people predict it will fail, it won’t be selected for synthesis, so no one will get points for it. If on the other hand enough people think it will do well that it gets selected for synthesis, and you’re among the few who correctly predict it will fail, you probably deserve the full points.

Thanks to everyone for the excellent suggestions.

Hi matt, if we try this “simple” prediction system now, it will mean we have three systems going simultaneously (normal voting, Elo, and the prediction system). Now THAT is confusing, especially since we have not even seen any result from Elo yet to judge how it is working or how it would change the designs sent to synthesis. I agree with you that we shouldn’t do anything confusing to players, and a third test running simultaneously would be exactly that. One test should be completed and evaluated before starting anything else new.

@JRStern - rhiju (who runs the lab where our RNA designs are synthesized and tested) explains the scoring system in his response to this thread: http://getsatisfaction.com/eternagame…

I tried to write out a layman’s version, but it ended up way too long and no clearer than rhiju’s explanation :slight_smile:

As far as the scores of designs that aren’t synthesized go, right now each one is just assigned the same score as the synthesized design from its submission round that has the most nucleotides in common with it. Those scores are what the lab rewards are based on. I’d take them with a grain (or a bowlful) of salt, since a design can share very few nucleotides with any of the synthesized designs but still has to be assigned a score: I’ve seen designs whose “closest” synthesized design shared only about half the nucleotides.
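The stand-in scoring described above amounts to a nearest-neighbor lookup; this sketch is inferred from that description, not from the actual codebase, so treat the details (position-by-position matching, equal-length sequences) as assumptions.

```python
def shared_nucleotides(seq_a: str, seq_b: str) -> int:
    """Count positions where two equal-length sequences agree."""
    return sum(1 for a, b in zip(seq_a, seq_b) if a == b)

def assigned_score(design: str, synthesized: dict) -> float:
    """Give an unsynthesized design the score of its closest synthesized neighbor.

    `synthesized` maps each synthesized sequence to its measured score.
    """
    closest = max(synthesized, key=lambda seq: shared_nucleotides(design, seq))
    return synthesized[closest]
```

The complaint above falls out directly: `max` always returns some neighbor, even when the best match shares only half its nucleotides with the design being scored.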

I think that part of either proposal for lab reform is doing away with lab rewards for “closest” sequences: in Elo we’d be rewarded for the accuracy of our comparisons only between two designs that both get synthesized, and in this rating system I think we’d only be rewarded for how closely we guessed the scores of designs that are actually synthesized.

Ok. I played around with d9’s spreadsheet and looked at everything synthesized at 95 and above. I also noted the melting points and energy levels. I modified one of my designs to fit within these parameters and took into account the comments about repeating patterns of nucleotides. I think it looks much better now. Unfortunately, I’ll have to wait for the next round, as I have used up my three solutions. Again, thanks for the help.

@ Alan.Robot - Thanks for the comment, but I’m at the tail end of a career in business. It would be a couple of years before I retire and could take a few classes. I just stumbled on this a couple of months ago and was taken in by the possibility of designing RNA. I might get one to synthesize yet. I’ll keep working on it now that I have a few tips.

We got some preliminary results from the new lab on “The Star.”

http://eterna.cmu.edu/news/393375

Unfortunately, I’ll have to wait for the next round, as I have used up my three solutions.

You can delete one of them if you want, you just have to unvote it first. Just make sure you copy the sequence and save it somewhere before deleting in case you want to refer back to that design again later.

Thanks, aldo. Done.