New synthesis candidate selection system

I hope the scoring system also gets rebuilt.
10 Christmas trees scored 94% or higher in Round 4.
16 GC pairs is not similar to 33 GC pairs.

Or else we’ll only be dealing with half the issue.

Maybe you could add a threshold of at least x% correlation to get the score of a synthesized design.

I subscribe to the Elo method, but I would like to propose an important addition:
It would be nice to record all the hypotheses being made during each round.

  • A hypothesis is represented by a set of user-defined restrictions and a metric.

  • When designing a solution, people should be able to create their own standard goals for the design (for example: no 2 consecutive Gs, a minimum of 4 G-C pairs, …).

  • Users should also be able to define more complex goals (for example: the energy falling between certain values, limits on the types of stacks that can be used, …).

  • For each hypothesis, the user defines the metric for the design. Using some predefined macros, they should be able to compose a formula that represents the metric (there is a rough sketch of this after the list).

  • Users vote, using the Elo method, on the sets of rules that form the hypotheses rather than on the actual designs. Then only the top 10 hypotheses are selected.

  • Every design should be tested against the selected hypotheses and the best ones should be synthesized.
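Here is a rough Python sketch of what I have in mind; all the names (Hypothesis, no_two_consecutive_gs, gc_fraction, …) are made up for illustration, not anything from the actual game:

```python
# Hypothetical sketch: a hypothesis = a set of restrictions plus a metric formula.
# Every name here is illustrative only; this is not EteRNA code.

from dataclasses import dataclass
from typing import Callable, List

@dataclass
class Hypothesis:
    name: str
    restrictions: List[Callable[[str], bool]]  # each rule returns True if the design passes
    metric: Callable[[str], float]             # composed from "predefined macros"

    def passes(self, sequence: str) -> bool:
        return all(rule(sequence) for rule in self.restrictions)

    def score(self, sequence: str) -> float:
        # Designs that break a restriction drop out of contention entirely.
        return self.metric(sequence) if self.passes(sequence) else float("-inf")

# Example restrictions, taken from the bullet list above.
def no_two_consecutive_gs(seq: str) -> bool:
    return "GG" not in seq

def min_gc_pairs(n: int) -> Callable[[str], bool]:
    # Crude stand-in: a real check would use the actual pairing map, not base counts.
    return lambda seq: min(seq.count("G"), seq.count("C")) >= n

# Example "macro" a metric could be composed from.
def gc_fraction(seq: str) -> float:
    return (seq.count("G") + seq.count("C")) / max(len(seq), 1)

hypothesis = Hypothesis(
    name="no GG runs, at least 4 G-C pairs",
    restrictions=[no_two_consecutive_gs, min_gc_pairs(4)],
    metric=gc_fraction,
)

# Every design is tested against the selected hypotheses; the best scorers get synthesized.
designs = ["GCAUGCGCAUGC", "GGGGCCCCAAAA"]
print(sorted(designs, key=hypothesis.score, reverse=True))
```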

Yup, this seems like the obvious solution.

Awesome idea. Seriously, awesome.

Of course, implementation is terrifyingly difficult. For instance, one of the top things I look for when evaluating a design is this problem:

http://s3.amazonaws.com/satisfaction-…

I’m sure other people have various things like that that they look for. I bet most “hypotheses” are, like mine, too difficult to code?

GAH I could have sworn I got an image into a reply once before but I guess you have to live with a link for now.

If we’re only going to synthesize the top eight (not counting multiple entries from the same player), shouldn’t we make sure every comparison includes at least one design from the top eight at the time? All we need to know is whether a given design belongs in the top eight or not; comparing the current #20 to the current #50 seems to be a lot less useful for that purpose than comparing either of them to the current #6. This would also help with the reward problem, since making sure one design in each comparison is from the preliminary top eight increases the odds that both designs will be in the final top eight.
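A rough sketch of the pairing rule I mean, with made-up names; this is only to make the idea concrete, not a claim about how the matchmaking would actually work:

```python
import random

def make_comparison_pair(all_designs, provisional_top_eight):
    """Pick a pair to show a voter: one design from the current top eight and one
    from anywhere else, so every comparison helps decide top-eight membership
    rather than ordering the middle of the pack."""
    anchor = random.choice(provisional_top_eight)
    challenger = random.choice([d for d in all_designs if d != anchor])
    return anchor, challenger

# Example with placeholder design IDs.
designs = [f"design_{i}" for i in range(1, 51)]
top_eight = designs[:8]
print(make_comparison_pair(designs, top_eight))
```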

Again, everyone here should be aware that one of the _key_ reasons for this project is to determine how we, humans, recognize patterns, NOT solely to apply criteria (free energy, Christmas trees).
These are things that could quite simply be prevented from inclusion, BUT there is a reason that they are not.

What are the criteria for determining similarity? Maybe this should be one of the things users are asked to rate: “How similar are these two designs?” When I try to determine if two designs are similar, I don’t think my first impulse is to go through and count bases. I think my first impulse is to look at stack structure, to look at the points near where loops and stacks meet, and then to look at the categories of bond types and how they are being used (serially, alternately, irregularly, etc.).
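For what it’s worth, even a crude automated comparison along these lines is possible. Here is a sketch that only compares stack-length profiles from dot-bracket structures; that is just one guess at what “similar” might mean, not a settled criterion:

```python
import re
from collections import Counter

def stack_length_profile(dot_bracket: str) -> Counter:
    """Very rough proxy for stack structure: lengths of consecutive runs of
    paired bases on the 5' side of each helix."""
    return Counter(len(run) for run in re.findall(r"\(+", dot_bracket))

def structure_similarity(a: str, b: str) -> float:
    """Overlap of the two stack-length profiles, from 0 (nothing shared) to 1 (identical)."""
    pa, pb = stack_length_profile(a), stack_length_profile(b)
    shared = sum((pa & pb).values())
    return shared / max(sum(pa.values()), sum(pb.values()), 1)

print(structure_similarity("(((...)))..(((...)))", "(((...)))..((....))"))
```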

I agree that voting is too often done by popularity, guesswork or copying others. The job of studying past synthesis results and analyzing submissions is more work than most will do.

Several people have mentioned hypotheses and it seems to me that making predictions is what we are really trying to do here.

I suggest that, similar to how we have to learn and show our skill in order to be able to submit designs, we should have to learn and show our skill at predicting results in order to be able to vote.

I propose that rather than voting, we make a specific prediction for how a design will actually fold during synthesis. Just mark the design blue and yellow to match what we think the synthesis results will be.

This prediction is also a score and that score is our vote. The caveat is that our predicted score is weighted based upon how accurate our previous predictions have been.

This could be done using the current RNA Lab voting screen, but instead of voting, the viewer can select any design they want and, looking at the design, mark each nucleotide blue or yellow in an attempt to match what the lab results will show when/if it is synthesized.

After synthesis, the predictions would be graded based upon the number of correct nucleotide predictions as well as how close the predicted failure locations are to the actual failures. The proximity of the failure location must be included to prevent someone from just randomly marking 3 failure nucleotides and beating the averages.
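To make the grading concrete, here is a toy sketch; the 'B'/'Y' encoding and the distance weight are placeholders of mine, not anything defined in the game:

```python
# Hypothetical grading: score = fraction of nucleotides predicted correctly, minus a
# penalty that grows with the distance between each predicted failure and the nearest
# actual failure. The weighting constant is a placeholder.

def grade_prediction(predicted: str, actual: str, distance_weight: float = 0.02) -> float:
    """predicted/actual are strings of 'B' (blue = folded as designed) and
    'Y' (yellow = failed), one character per nucleotide."""
    assert len(predicted) == len(actual)
    n = len(actual)
    correct = sum(p == a for p, a in zip(predicted, actual)) / n

    predicted_fails = [i for i, p in enumerate(predicted) if p == "Y"]
    actual_fails = [i for i, a in enumerate(actual) if a == "Y"]
    if predicted_fails and actual_fails:
        avg_distance = sum(min(abs(i - j) for j in actual_fails)
                           for i in predicted_fails) / len(predicted_fails)
    else:
        # No failures predicted, or none occurred: the distance term contributes nothing.
        avg_distance = 0.0

    return max(0.0, correct - distance_weight * avg_distance)

print(grade_prediction("BBBYYBBB", "BBBBYYBB"))
```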

Then each person’s scores for all the designs that they have made predictions for are averaged. Designs that are predicted for but are not synthesized are not graded. This gives people an incentive to make predictions about potentially successful designs, and it also prevents someone from just choosing the worst designs and labeling all the nucleotides as failures to pad their score.

Rather than casting an actual vote, the prediction process is the voting. Every design will have several predicted results, each one being a predicted score. Each predicted score would be weighted based upon the past accuracy rating of the person making the prediction.

Thus, the most promising designs have a high predicted score, with the most historically accurate predictors’ scores weighing more heavily. Then the top-scoring designs are chosen for synthesis.
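And a small sketch of that weighted averaging, again with made-up numbers; the accuracy weights would come from the grading above:

```python
# Hypothetical aggregation: each design's score is the accuracy-weighted average of
# the predicted scores submitted for it. Names and numbers are illustrative only.

def weighted_design_score(predictions):
    """predictions: list of (predicted_score, predictor_accuracy) pairs."""
    total_weight = sum(acc for _, acc in predictions)
    if total_weight == 0:
        return 0.0
    return sum(score * acc for score, acc in predictions) / total_weight

design_predictions = {
    "design_A": [(0.92, 0.80), (0.70, 0.30)],
    "design_B": [(0.85, 0.95)],
}
ranked = sorted(design_predictions,
                key=lambda d: weighted_design_score(design_predictions[d]),
                reverse=True)
print(ranked)  # the top-scoring designs would be the synthesis candidates
```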

Unfortunately, the only way to get a design submitted for synthesis now is just publicity. So here’s my ugly and shameless self promotion.

I’ve combined sections of the One Bulge Cross that folded successfully in Round 4 and that can be combined without any alterations. The resulting design is made entirely of tested sections.

Here’s the breakdown: the right leg is from 1337 by Fabian, the top and left legs are from Donald’s nephew 2 by madde, the bottom leg is from -45.5 kcal 107 M.P. by donald, and the center section is from Ding’s Mod… by Ding.

All these parts folded successfully in Round 4 and fit together without any modifications. This should be a very good experiment, because even if they don’t fold correctly when put together as a single design, it will be very informative to see why a section that folded correctly before doesn’t fold correctly a second time.

So please vote for Ankh Will Fold! and I apologize for begging.
Thanks!

I like your idea about predicting how a design would perform in the test tube but you spoiled a very good post with the last paragraph.

But maybe people will make the effort if they score points when they vote? For example, 10 points per vote, and when the lab results come back, you get 20 points for each “correct” comparison.

Then you also get rid of the problem of people voting for popular designers’ submissions just because they are popular.

Another factor that can remove popular voting is to anonymize the submissions in some way.

Ok, I was able to grab a few details from Jee about the Elo system.
I’m sure the following could change, but I thought everyone might like to know the general gist of what is being planned. Or my grasp of it, anyway.

It’s scheduled to arrive around the end of the month.
When in Elo, all your decisions will be tracked and recorded.
At the end of the week, the top 8 designs will be synthesized.
Using those 8 designs, they will go back and highlight all the comparisons you made in which BOTH designs were among those synthesized.
There won’t be rounds anymore, but designs will still be sent weekly for synthesis until a winner (over 94%) is found.
Which means your designs won’t be deleted every week; they will stay there until they either get synthesized or the lab ends.
Because of this, we are likely to see the design limit go from 3 to 10.
We are also likely to see the designer score go up.
And you will get 500 points per correct comparison, which means that of the comparisons you do in Elo, 28 of them will be worth points. So with voting alone, you can have a maximum gain of 14,000 points per week.
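For reference, the 28 and the 14,000 just come from the pairs among the 8 synthesized designs:

$$\binom{8}{2} = \frac{8 \times 7}{2} = 28 \ \text{scorable comparisons}, \qquad 28 \times 500 = 14{,}000 \ \text{points per week}.$$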

Make sure you don’t set it up so that everyone has a point incentive to randomly vote on every pair. Perhaps +500 for each correct comparison, -500 for each incorrect comparison, down to a minimum of 0. Otherwise you will see people click every comparison randomly.
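A toy calculation of what I mean; the 500 points comes from the plan above, the penalty is the one I’m suggesting, and the floor at zero is ignored here for simplicity:

```python
# Toy sketch of the incentive math, using the 500-point figure and the 50% hit rate
# of a blind guess on a two-way comparison.

def expected_value_per_guess(p_correct: float, reward: float, penalty: float) -> float:
    return p_correct * reward - (1 - p_correct) * penalty

p_random = 0.5
print(expected_value_per_guess(p_random, reward=500, penalty=0))    # +250.0: random clicking pays
print(expected_value_per_guess(p_random, reward=500, penalty=500))  #    0.0: random clicking doesn't
```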

In a perfect world, maybe. But you don’t want to discourage people from voting. It’s already hard enough to get people to spend all their votes. From my experience, people only use 3 or 4 of their available 7 votes. The Elo system works better the more votes it has access to. Even with random voting, the best they will get is 50%. Until more useful factors turn up to evaluate designs, I don’t think a negative points system will benefit anyone.

If we were to implement a negative system, I’d make it a much smaller amount, like 50 or 100 points.

Anyways, my two cents. Up to the devs.

You don’t want to encourage random guessing! If a random guess has a positive expected score, then you are encouraging it.

Edit: maybe you didn’t notice “down to a minimum of zero”?

Standard game design: people have to feel they are making progress. If you make the risk the same as or higher than the reward, people won’t take risks to make the judgement calls.

Yes, you are right, I don’t want to encourage random guessing, but the greater benefit is more lab participation. A balance has to be struck between the validity of the decisions made and lab participation.

I did notice “down to a minimum of zero”; to me, that would also encourage random guessing, because they would have nothing to lose.

It’s up to the devs. I’m happy enough with how it’s planned at the moment.

Well, I have only minimal experience with this kind of pairwise rating system. It has faults, but also virtues, and the faults are not easily fixed. When you are presented with two bad cases, you have little incentive to proceed, and I suggest the system has little reason to reward you; maybe those two faults offset each other? And maybe there needs to be a maximum number of votes per round. But then you’d like to be able to rank your comparisons both by certainty and by the quality of the better entry, and submit just your N best. Again, these two shortcomings offset: just vote a lot, and let chance be your friend. But what about when you learn better and would redo your vote if you could? Aha, well, them’s the breaks, I guess. But then one has to wonder how much rationality is even involved.

I’m generally skeptical of crowd-sourcing anything. It’s a good way to get some random motion, and even a little progress out of that random motion, but it is not an optimizing process.

I’m not even aware of where one goes to see the lab results. I mean, yes, the leading model, but not an analysis of why it won. Is there even such a thing? And if not, I wonder what the point is.

I’ve been lobbying for a place to officially discuss specific designs for a long time, and if I remember correctly, there is talk of them implementing that feature soon.

Other than that, the only places I know of where people discuss what they learn or know are the chat window and the discussion thread for one specific round. But that was an awkward discussion, because GetSatisfaction is a crappy platform for an open-ended discussion, so I personally haven’t tried to replicate it.

Hi JRStern,

I can sympathize with your doubts and misgivings - I have them too - this is new and strange to all of us. However, I would respectfully counsel assuming (if you can find it within) an attitude of patience and open-mindedness behind the inevitable and fully understandable skepticism and doubt.

After all, the only way to really see how this all will work out, for either good or ill, is to at least give it a chance, so we might as well put our best foot forward into it.

Once it’s in place, the strengths and/or weaknesses will manifest themselves very quickly, I’m sure, and I’m equally certain that whatever does not work for the Players will quickly be changed or removed by the Devs. After all, their ultimate success depends on us as Players being engaged and happy with the system they create.

So, here’s hoping all our fears and doubts turn into pleasant surprises.

Best Regards,

-d9