During the town hall, it was mentioned that you need someone to take a very large spreadsheet of lab results and divide it up into smaller ones that are manageable for the players. If nobody else who was in the town hall is already doing this, I’d be happy to do it if you explain what exactly needs to be done (in other words, which columns or rows of data you want to use as the “keys” to split the data by, or if you just want to start a new one every 100 rows, etc.)
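For reference, the two splitting options mentioned above (a new file every N rows, or one file per value of a "key" column) could be sketched roughly as below. This is only an illustration using Python's standard `csv` module. It assumes the data is a plain CSV with a header row. The function names `split_every` and `split_by_key` are hypothetical helpers, not an existing tool.

```python
import csv
from itertools import islice


def split_every(src, n=100):
    """Yield (header, chunk) pairs from CSV text in src, each chunk holding at most n data rows."""
    reader = csv.reader(src)
    header = next(reader, None)
    if header is None:  # empty input: nothing to split
        return
    while True:
        chunk = list(islice(reader, n))
        if not chunk:
            return
        yield header, chunk


def split_by_key(src, key):
    """Group CSV rows by the value in column `key` (e.g. one group per eterna_author)."""
    reader = csv.DictReader(src)
    groups = {}
    for row in reader:
        groups.setdefault(row[key], []).append(row)
    return reader.fieldnames, groups
```

Each chunk or group would then be written to its own file with `csv.writer` or `csv.DictWriter`.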
Thank you! The CSV file should be at this link. If not, try the Google folder; the file name is okr1-ultima-2A3-scored-eterna-20250728. Once you open it, post a list of the column names here and I will tell you which columns to delete. Players don’t need reactivity data, for example.
Here they are:
eterna_id
eterna_author
title
sequence
reads
signal_to_noise
snr
warning
reactivity
errors
modifier
chemical
temperature
score_start_idx
score_end_idx
synthesis
sequencer
ensemble_CPQ
ensemble_ECS
ensemble_OKS
ensemble_structures
ensemble_structures_ecs
ensemble_tags
So you still want it all in ONE spreadsheet, just with some columns missing? Not, for example, one spreadsheet per player, or one per puzzle, or something like that?
Let’s remove:
snr
warning
reactivity
errors
modifier
chemical
temperature
score_start_idx
score_end_idx
ensemble_structures
That will shrink the size quite a bit.
And then sort by ensemble_OKS. Once you have that, create one spreadsheet for ensemble_OKS >= 80 and a second spreadsheet for OKS < 80. Then we will have to figure out how to share the two spreadsheets with me. Maybe in Google Sheets?
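The steps above (drop the listed columns, sort by ensemble_OKS, split at 80) could be sketched roughly like this with Python's standard `csv` module. This is a sketch under some assumptions, not the script actually used here. It assumes the header uses the column names posted in this thread and that ensemble_OKS is a plain number. It also writes an "all designs" file plus an OKS >= 80 file, which is the setup the thread later settled on. Blank or non-numeric OKS values are sorted to the bottom. The function name `trim_sort_split` is hypothetical.

```python
import csv
import math

# Allow very long fields (reactivity-style columns can exceed csv's default limit).
csv.field_size_limit(2**31 - 1)

# Columns players don't need, as listed in this thread.
DROP = {
    "snr", "warning", "reactivity", "errors", "modifier", "chemical",
    "temperature", "score_start_idx", "score_end_idx", "ensemble_structures",
}


def oks_key(row):
    """ensemble_OKS as a float; blank, missing, non-numeric or NaN values sort last."""
    try:
        value = float(row.get("ensemble_OKS"))
    except (TypeError, ValueError):
        return -math.inf
    return -math.inf if math.isnan(value) else value


def trim_sort_split(src, all_out, best_out, cutoff=80.0):
    """Copy CSV from src minus DROP columns, sorted high to low by OKS.

    Writes every row to all_out and only rows with OKS >= cutoff to best_out.
    Returns (number of rows, number of rows at or above the cutoff).
    """
    reader = csv.DictReader(src)
    keep = [c for c in reader.fieldnames if c not in DROP]
    rows = [{c: r[c] for c in keep} for r in reader]
    rows.sort(key=oks_key, reverse=True)  # descending: best designs first
    best = [r for r in rows if oks_key(r) >= cutoff]
    for out, subset in ((all_out, rows), (best_out, best)):
        writer = csv.DictWriter(out, fieldnames=keep)
        writer.writeheader()
        writer.writerows(subset)
    return len(rows), len(best)
```

To use it, open the downloaded CSV for reading and the two output files for writing (all with `newline=""`), then pass the three file objects to `trim_sort_split`. The resulting files can be uploaded to Drive and opened in Google Sheets.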
If any player wants a spreadsheet with all your designs, speak up.
Does it matter if the ensemble_OKS is from low to high or from high to low within each sheet?
The OKS >=80 sheet is a little over 1 MB and the OKS < 80 is about 57 MB.
I usually do high to low (descending) on the sheet, but the user can always sort the sheet however they want. Wow, only 1MB! Let’s do one sheet with all rows and an additional sheet with OKS>=80 for easy access to our best designs. @Eli_Fisker Would that work for you?
The 1 MB file is the one with OKS >= 80. The one with OKS < 80 is 50 times larger (but still about 8x smaller than the original sheet with all of them).
Yes, I understood. That is why I’m thinking a sheet containing all designs might be manageable.
@Arosko, thank you for volunteering with spreadsheet help!
@DigitalEmbrace, you asked whether getting the OKS >= 80 sheet would work for me.
I took a look at our first Round 1 spreadsheet, and it also included data for the lower-scoring designs. If possible, I would rather have the spreadsheet that includes the lower scores, because then we also get read data and signal-to-noise data for the designs that did badly. That may help pinpoint data quality issues.
However, if this spreadsheet is too large for everyone to handle, it may be helpful to get the small one too. It is a bit like the good-quality filtering that was also done on previous data.
@arosko Eli agrees with me. Please generate a spreadsheet with all designs. If you like, also generate a second spreadsheet with only the >= 80 designs, or I can probably do that myself once you provide the spreadsheet containing all designs. Thank you!
Here they are.
@arosko, thank you very much!
Here they are in Google Sheets format.
okr1-ultima-2a3-scored-eterna-20250728_all
okr1-ultima-2a3-scored-eterna-20250728_over80
Thx to @DigitalEmbrace for making me realize that the files, once opened, can be opened with the "Open in Google Sheets" option.
Hi @arosko!
I hope you can work your data magic once more.
It was discovered that the earlier dataset had errors. The lab experiment itself was luckily fine, but the entire dataset had to have scores recomputed from scratch.
Here is the new Google Drive file:
Thx to @DigitalEmbrace, @LFP6 and @Rhiju for getting us the new data.
The file we need is the same as before, okr1-ultima-2A3-scored-eterna.
Sure, I can do that… so again you need one with all the scores and another with >= 80?
We only need one sheet with all the scores.
Here’s the link: https://drive.google.com/file/d/1JFrzT4tqjP-YLYCq889lWUbBGC-AylS8/view?usp=drive_link
I actually had the file almost a week ago; I just didn’t get around to making a shareable Drive link. Sorry for the delay.
Arosko, thank you very much!
Here are the spreadsheets in Google Sheets format.
okr1-ultima-2a3-scored-eterna-20251012_all_new
okr1-ultima-2a3-scored-eterna-20251012_all_new_80+
I split the data into two sets: one with all the data and one with the 80+ scoring designs.
@Arosko, I was wondering if you could do the same trick on the 2A3 Illumina data in the Google Drive as well?
@DigitalEmbrace made me realize that our recent data has been scored in a new way that does not reward lots of structure or crossing pairs as much as before. This means that our earlier Illumina Google Sheets were scored differently from our new Ultima Google Sheets. To properly compare the Illumina and Ultima data, both datasets need to be scored with the same method.