New system for selecting synthesis candidates

An alternative would be to have players “order” designs by predicting their synthesis scores. The designs with the highest average prediction and whose number of predictions is above a certain threshold would be synthesized. The reward for predicting would be a function of the difference between the predicted and actual synthesis score.
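As a minimal sketch (the names, threshold, and reward formula here are purely illustrative, nothing EteRNA-specific), the selection and reward rules might look something like this:

```python
from statistics import mean

MIN_PREDICTIONS = 10     # illustrative threshold on how many predictions a design needs
NUM_TO_SYNTHESIZE = 8    # designs taken from the top of the ordering

def pick_candidates(predictions):
    """predictions maps design_id -> list of predicted synthesis scores."""
    eligible = {d: scores for d, scores in predictions.items()
                if len(scores) >= MIN_PREDICTIONS}
    ranked = sorted(eligible, key=lambda d: mean(eligible[d]), reverse=True)
    return ranked[:NUM_TO_SYNTHESIZE]

def prediction_reward(predicted, actual, max_reward=100):
    """Reward shrinks as a player's prediction misses the actual synthesis score."""
    return max(0, max_reward - abs(predicted - actual))
```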

This method still lends itself to the randomized evaluation method since you don’t have to look at all the designs to predict the synthesis score for any one of them, just as you don’t have to compare all of the designs to compare any two of them. It has the added benefit that A) more evaluations will be testable against lab results (the probability of any one design being tested is greater than the probability of any two designs both being tested) and B) it should be easier to get an ordering out of a list of scores than out of a relatively sparse set of comparisons (statisticians, feel free to correct/corroborate this).
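As a toy illustration of point A, assuming (purely for the sake of round numbers) that 8 of 100 designs end up being synthesized, chosen uniformly at random:

```python
N, S = 100, 8
p_single = S / N                          # chance a given predicted design gets lab results
p_pair = (S / N) * ((S - 1) / (N - 1))    # chance BOTH designs in a comparison do
print(p_single, p_pair)                   # 0.08 vs. roughly 0.006
```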

I thought the goal of the experiment was to design some novel, interesting RNA structures and, as a side effect, see if something could be learned about how RNA folds in order to improve RNA-folding computer algorithms.

Putting comments on submitted sequences is critical. This way old-timers can say: “This is a Christmas tree, it won’t sequence, much less fold; see documentation _here_. Don’t vote for this.”
I think it is critical that people with good reasons why they think a particular sequence will or won’t fold correctly be able to describe those reasons on a design.

The point is not for individual humans to recognize patterns. The point is for the community of humans to recognize patterns. Once a new pattern has been recognized, it needs to be disseminated through the community without forcing each individual to discover that pattern for themselves.

per something aculady wrote: Once synthesis patterns are selected and until the synthesis results are available, we should be encouraged to assign orderings (or guess the synthesis scores) to those selections.

@cesium62 - That is precisely why comments should not be allowed. And no, the point of the project is not to create novel structures.

The “point” of the “game” can be found here: http://www.cmu.edu/homepage/computing…

And? That doesn’t conflict with the new scoring system.
Discussion, whether it is about ordering or guessing synthesis scores, is a great thing. However, these types of discussions should go into the “peanut gallery” or hypothetical-type discussions. There is collaboration, which sometimes takes the form of 3 scientists at the bar, and there is analysis.
Two different parts of the same process.

That is why I believe the new system is better, but comments should be avoided when the analysis “moment” comes. Certainly individuals will, and should, take into account all discussions, but in the end the point of the project is _not_ to be a popularity contest.

I really like the fact that we are crowdsourcing the game development to some degree, as well as the RNA algorithms.

Since the initial impression-airing, I have been devoting some thought to practical implementation issues, and came up with a preliminary list of questions that I felt would need to be addressed “pre-start-up” (that is, of course, aside from the obvious need for significant interface infrastructure enhancements to be designed, programmed, and tested):

  1. Since this system seems to operate much like a simple sorting algorithm, how will new entries be inserted into the sort once it is already in progress (at the bottom? the top? the center of the existing order?)?

  2. How will late entries be handled so that they receive fair exposure (those submitted in the last 3 days, 2 days, 1 day; the last 3 hours, 2 hours, 1 hour)?

  3. In short, the above two issues illustrate that conducting Elo Pair Review during the design-creation cycle would reproduce the same issues that the current system already suffers from.

So, in reference to the above concerns, it strikes me that the preliminary implementation of the “alternating” lab system, as previously proposed here:

http://getsatisfaction.com/eternagame…

…in advance of implementing Elo, could significantly ease the transition to the Elo System.

…however, this “alternating week” system could also be adapted to significantly facilitate the new Elo Pair Review System as well, BUT this would mean the addition of a third Lab Cycle, “Lab C,” thereby lengthening the cycle by a week.

This scheme would address the above concerns, albeit at the cost of drawing the cycle out to a third week; however, once you take a look at the benefits, perhaps this will not seem too steep a price.

Number 1 above would cease to be an issue, since all entries would be received during Week 1, before the submission cut-off, meaning that the Elo algorithm would at least be working on a full, complete, stable data set. During the first week of design, there would be NO Elo Pair Comparisons done; the week would be devoted totally to design creation and submission.

Then, at the beginning of Week 2, after the design submission cut-off for “Lab A,” the Elo Pair Comparison process would begin for “Lab A” and proceed for that whole week, while “Lab B” would simultaneously begin ITS design phase. During this second week of “Lab A’s” Elo Pair Review of designs, there would be NO more design submissions for “Lab A”; the week would be devoted totally to Elo Pair Review. This would give the Elo Pair Review process more than adequate time for the best designs to “bubble” up to the top, thereby allaying any possible concerns about inadequate pair review affecting results. Also, since all designs would have been submitted prior to the previous week’s cut-off, there would be no concern about the last and latest entries not receiving a fair shake in the review process. Concurrent with the Elo Pair Review for “Lab A,” remember, the design phase for “Lab B” would be getting underway and going into full swing.

At the end of Week 2, the results of the Elo Pair Review for “Lab A” would be complete and published, and the top 8 designs would be sent to the lab for synthesis. “Lab B” would be winding up ITS design phase and getting ready to go into ITS Elo Pair Review week.

Finally, in Week 3, with the “Lab A” Elo Pair Review now completed and the winners sent off to be synthesized, “Lab A” would close for the week during synthesis.

Simultaneously, “Lab B” would be completing ITS Elo Pair Review and would, in turn, be preparing to send ITS top 8 results to synthesis.

Meanwhile, “Lab C” would then open for ITS design submission phase, and the whole process would begin again for all 3 Labs.

Here is a quick diagram of the Elo Pair-Review-Enhanced Lab Flow Proposal:

(please click on graphic for larger, clearer view)
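For anyone who prefers text to the graphic, here is a rough sketch of the same staggered schedule; the phase rotation follows the description above, and the assumption that each lab simply reopens for design after its synthesis week is mine:

```python
PHASES = ["design submission", "Elo Pair Review", "synthesis / closed"]
LABS = ["Lab A", "Lab B", "Lab C"]

def phase(lab_index, week):
    """Lab k opens its design phase in week k+1, then rotates through the phases."""
    offset = week - 1 - lab_index
    return PHASES[offset % 3] if offset >= 0 else "not yet open"

for week in range(1, 7):
    print(f"Week {week}: " + ", ".join(
        f"{lab} -> {phase(i, week)}" for i, lab in enumerate(LABS)))
```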

I think the added organization and separation of effort achieved in creating these clear-cut phases may be more than worth the cost of lengthening the cycle to a third week to accommodate the Elo Pair Review Process.

Thanks & Best Regards,

-d9

I really like the idea of synthesizing either single designs or pairs where there is strong controversy. Controversy indicates either that there are things about the design parameters that are not understood at all under those particular conditions, or that there are two or more competing paradigms in the community regarding how things work that yield different theoretical results under the same conditions.

Edit: This would, of course, be in addition to synthesizing some designs where there was strong consensus.

The point of the game cannot be found on the page that link leads to.

Got it: The reason to not allow comments is to help the spread of information through the community. Makes sense to me.

I think that this is an absolutely excellent proposal.

Sure it does. Watch the video on the page.

Well, let’s remember how Elo works: it’s the scoring system used in chess matches, and it assumes that the relative rankings actually mean someone beat someone else in a concrete way (i.e., a match). The problem here is that there is no actual match; it’s just an opinion (even if a well-informed one), and the goal is to synthesize the most representative composite answer from all the collective feedback. There needs to be some way to normalize when different people do different numbers of reviews; otherwise the sorting will inherently bias toward the viewpoints of those with more reviews. It’s not even necessarily a manipulation issue: Elo assumes there is one, true, universal ranking, and that’s not strictly true for something based on multiple users’ intuition (although that’s what we are trying to take advantage of). I’m all for unlimited comments, but if you can’t normalize the pairwise rankings somehow, the sorting will be inherently biased to reflect the opinions of those who do more reviews.
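For reference, here is the standard chess-style Elo update being referred to; a minimal sketch using the usual chess defaults (K = 32, 400-point scale), nothing EteRNA-specific. Each pairwise review would have to be fed in as if it were a decided match, which is exactly the assumption in question:

```python
def expected_score(r_a, r_b):
    """Win probability for A implied by the current ratings."""
    return 1.0 / (1.0 + 10 ** ((r_b - r_a) / 400.0))

def elo_update(r_a, r_b, a_preferred, k=32):
    """Treat one pairwise review as a match result; a_preferred=True means A 'won'."""
    e_a = expected_score(r_a, r_b)
    s_a = 1.0 if a_preferred else 0.0
    return r_a + k * (s_a - e_a), r_b + k * ((1.0 - s_a) - (1.0 - e_a))
```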

It seems like your proposal is a relatively complicated process, and not necessarily worth it as it may introduce as many new problems as it solves.
For example, I think being able to review designs while they are being created is very important to the design process, as it allows people to see the problems in their designs and fix them based on provided feedback. In your system, there is no opportunity to review the designs as they are created, and once they can be reviewed, the changes that should be made cannot be.

Personally, I think simple systems generally work better, and I don’t see that your issue #1 will really be an issue. A good design, even if created near the deadline, would still bubble up to the top relatively quickly if it consistently got positive votes.

Perhaps a combination of systems would work best, such as shortening it to a two-week cycle and locking out new submissions a day, or several hours, before the final choices are made?

I think it bears repeating that the only test that matters is the lab test. There are absolutely zero people in this group, or in the lab, who know the elements of a design across all shapes.

Too many people are trying to design a process without considering the goal of the project.

Again, the point of the new review process is to determine how the reviewer rates the designs comparatively.

“…problems in their designs and fix them based on provided feedback.”

The design of the software controls _known_ problems in designs. Feedback from other participants should be taken for what it’s worth: an opinion. That opinion should be shared, and people should certainly collaborate on their designs.

However, that is a distinctly separate issue from “reviews”, “votes”, or judging.

You could imagine a Quora-like system where the useful reviews get more view time and the unhelpful ones get voted down. Also, author reputation may become a factor.

A nice thing about this system is that we don’t need to worry about where to “insert” new designs, nor about when they’re created, because the ordering is well defined no matter when solutions get compared. Of course, we may want to include some sort of “stability” in the system so that newly submitted designs aren’t sorted until they receive a sufficient number of comparisons. But in general, this approach may allow us to weaken the importance of “rounds”: we would simply pop the top 8 solutions off the totally ordered list and synthesize those.

Alan: We wouldn’t necessarily use Elo, that’s just an example. There are many methods to create total orderings from sets of partial orderings, and we need to investigate which constraints (e.g. normalization constraints) need to be satisfied by a proper ranking.

I think the “Elo” method mentioned above is ill-advised. Many people will not want to take the time to compare designs. Although I believe a statistically based approach is warranted, it must be simple for people to grasp and have a “common sense” feel to it.

Here’s my suggestion:

  1. A computer program counts all the submissions. Let’s say, for example, 100 submissions.
  2. The computer program then counts how many of these submissions were submitted by people who have already had past submission(s) selected AND scored above 90. Let’s say, for our example, 23 of the 100 submissions.
  3. These 23 submissions are automatically entered into the new trials.
  4. The computer program then randomly selects an arbitrary OR pre-defined number of the remaining 77 submissions. Let’s say it’s predefined at 5 submissions.
  5. The end result is that, for our example, 28 go for lab analysis.
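A minimal sketch of this selection rule, under the assumption that we know each design’s author and each author’s best past synthesis score (the data shapes here are illustrative, not an existing EteRNA API):

```python
import random

SCORE_CUTOFF = 90    # past synthesis score that earns automatic entry
RANDOM_EXTRA = 5     # pre-defined number of additional random picks

def select_for_synthesis(submissions, best_past_score):
    """submissions: list of (design_id, author); best_past_score: author -> score."""
    automatic = [d for d, author in submissions
                 if best_past_score.get(author, 0) >= SCORE_CUTOFF]
    remainder = [d for d, _ in submissions if d not in automatic]
    lottery = random.sample(remainder, min(RANDOM_EXTRA, len(remainder)))
    return automatic + lottery
```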

This method removes any and all subjectivity and is based on past performance (the 90-or-above criterion). Certainly, someone could have achieved a 90 or above based on luck and perhaps never get a 90 or above again. This could be remedied by a time constraint on how long someone remains in the 90-or-above selection preference.

People simply aren’t going to take the time to do lengthy comparisons. People come to this site to have some fun and help science progress…not to have a part-time, unpaid job. People like to compete on an intellectual level and EteRNA offers that opportunity. But it’s not a good idea to have people like myself exiting this site feeling like getting a design selected for further analysis is essentially futile.

Here’s the reality: I received a nice chunk of points for selecting the latest winner…BumpyDiggs. But I didn’t put any thought into it at all…I just jumped on the snowball bandwagon, all the while complaining that the voting scheme needs to be changed.

Thanks for the opportunity,
SpaceFolder

I like the approach; however, since we cannot optimize a design by computer, the only judgment is the synthesizing process. Therefore, we should not select a design by marketing or by a fancy title.
The review should be completely blind, with the reviewer seeing only a number. Any design that is marketed, by calling up friends to select IT, should be disqualified.
RL

As some research suggests, one-to-one comparison IS the only really workable way to order decisions. And even with one-to-one comparison, people have a high chance of making mistakes if the objects being compared are too different.