CSV File of Complete Lab Design Submission Data for Download

@d9, jee:

I agree with d9 that it’s a pretty serious flaw. A couple “quick and dirty” fixes might be:

  1. don’t limit comparisons to within a round; compare against previous rounds as well. This wouldn’t change cases like Christmas Tree getting a 94, since that was first-round, but it would mean that in future rounds other christmas-tree-style designs get compared to the ones that have already failed, rather than to less similar designs, even if we have another christmas-tree-free synthesis round

  2. only reward designs whose similarity exceeds a certain threshold. Don’t get me wrong, I liked getting points for my Round Two submission, but it shared only half its nucleotides with its “closest” match and was otherwise very different (and more similar to other candidates in certain ways). Getting a reward based on the synthesis of a completely different RNA felt a bit cheap :wink:
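That threshold idea could be sketched roughly as follows. This is just an illustration, not the actual scoring code: the per-position match fraction and the 0.75 cutoff are hypothetical choices, and the real system would presumably handle length differences and other subtleties.

```python
def similarity(a: str, b: str) -> float:
    """Fraction of positions where two equal-length RNA sequences match."""
    if len(a) != len(b):
        raise ValueError("sequences must be the same length")
    return sum(x == y for x, y in zip(a, b)) / len(a)

def eligible_for_reward(candidate: str, synthesized: str,
                        cutoff: float = 0.75) -> bool:
    # Only reward a candidate if it is at least `cutoff` similar to its
    # closest synthesized match; the cutoff value here is illustrative.
    return similarity(candidate, synthesized) >= cutoff
```

Under a sketch like this, a design sharing only half its nucleotides with its “closest” match would fall below the cutoff and get no borrowed score.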

Hi Pytho, could you please direct us as to how to grab that data for ourselves?

Berex - if you click the link Pytho gave, it downloads an updated file in comma-separated values format (CSV). You should be able to open it in a spreadsheet program (OpenOffice, Microsoft Excel, …)

Very glad this is now on the list for more intense developer scrutiny. Under the current system, not only can a Christmas Tree receive a 94% for being “relatively” close to a high-scoring winning design (if 43 out of 85 matching nucleotides can be considered “relatively close”), but perhaps even worse, the HUNDREDS of players who receive relatively high scores for designs that may not actually merit them (had they actually been synthesized) may erroneously conclude that their (perhaps unsound) design was “really” a high scorer on its own merit. They then carry the wrong lessons into future rounds, duplicating weaknesses that were never revealed or made apparent because of the “closest-to” scoring system…

Most excellent~! Thank you Ding.
I was assuming it was a static file. And wouldn’t have any of the current round 4 submissions! :slight_smile:

Just bumping this comment, wondering if there have been any further thoughts on the subject.

I notice that in round 5, a handful of 30+ GC designs scored 94+ since it was overall a high-scoring round with no actual GC-heavy designs voted for synthesis.

Hi Pytho,
I’ve noticed with the new lab being put up, this link no longer provides me with the latest submissions. Can you please detail how we can get this ourselves?

Ding & dimension9

We are redesigning the scoring system along with the new synthesis candidate selection algorithm. At the very least, RNAs that differ by more than a certain percentage will be considered “different” and won’t get rewarded as they are now.

thanks jee, that’s good to hear :slight_smile:

The CSV link doesn’t give me the submissions for the latest lab.
Can somebody point me to the new file location please?

Thank you!

While we wait for a full EteRNA 2.0 database interface, something that would spit out a .CSV file of lab results would be a big help to those of us trying to mine the results for ideas. Even something that had just the following fields:

id,score,melt,FE,sequence,targetshape,bonds

where:

id is the submission number
score is the synthesis score
melt is the estimated melting point
FE is the computed free energy
sequence is the RNA sequence, e.g. “AGCAAAGCA”
targetshape is the target secondary-structure string, e.g. “.((…)).”
bonds is the bonding result, as explained below

Bonds could be something like “0110000110” if it is just a binary 0=unbonded 1=bonded estimate or something like “0891012871” using 0…9 for a 10-way binned estimate of the bonding of each nucleotide.

Other fields that might help are a lab-id to distinguish between different labs, and a best-guess secondary-shape since the system seems to predict how the sequence actually folded. It also wouldn’t hurt to add the CG, GU and AU counts since you have them.

Either something that would spit out the results for a given lab (at the bottom of the view results page?) or one .csv file to rule them all (including all lab results) would suffice. We can do without a complex search interface for now. Even a more current single static file would do if that’s all you have time for. (The old cached link doesn’t seem to be there anymore.)

Thanks!

Oh, and for “switch puzzles”, you could use two separate result lines, with an extra field to distinguish whether the (FNM) target molecule is present or not.

Is this on the implementation list yet? I sure wish I could get a .CSV file for the lab submissions/results. Thanks!