The post "Stability of barcode, affects SHAPE colors elsewhere in the design" made me realize that I wished there were a more effective way to dig up identical or near-identical designs. I have discussed it with Omei, and here I pass on our ideas:
I was wondering if it would be possible to dig up clusters of similar designs. It was the hairpin skewing the SHAPE data that got me wondering. I know there are more designs that share the same main design than just mine and the one you had. I know Mat did some experiments, and I'm sure others have as well. But as it is, they are very hard to find.
Your current script for finding similarity starts from an individual design. But what if one doesn't know which designs are similar and still wants to find twin designs? Could a script be made to dig up these twins (or clusters of designs) where, e.g., the main design is the same but the hairpin barcode is not? (Or any other specific area of interest.)
Date: Fri, 23 Aug 2013 10:23:01 -0700
Subject: Re: Clusters of identical designs
There are certainly algorithms for doing clustering. I spent a few minutes searching the Web to see if I could find anything ready-made that we could easily use, but didn’t find anything very promising.
What would be pretty easy, though, would be a script that takes a lab ID as input and outputs any designs that are identical (not counting the barcode). That could be a starting point. From there we could think about generalizing it, e.g. adding features for comparing only certain positions, for finding close (non-exact) matches, for automatically processing all labs, or …
Do you think the simple case would be useable enough to get things started?
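As a rough illustration of the simple case, here is a minimal Python sketch that groups designs whose sequences are identical once a set of positions (for example the barcode) is ignored. It is not the existing script. The input format (a dict of design name to sequence), the function names, and the toy sequences and barcode positions are all assumptions made up for the example; how designs are fetched for a lab ID is left out entirely.

```python
from collections import defaultdict


def mask_sequence(sequence, ignore_positions):
    """Replace ignored (0-based) positions with '-' so they do not affect comparison."""
    return "".join(
        "-" if i in ignore_positions else base
        for i, base in enumerate(sequence)
    )


def find_twins(designs, ignore_positions):
    """Group design names whose sequences match outside ignore_positions.

    designs: dict mapping design name -> sequence (hypothetical input format).
    Returns a list of groups (sorted lists of names) with at least two members.
    """
    ignore = set(ignore_positions)
    groups = defaultdict(list)
    for name, seq in designs.items():
        # Designs with the same masked sequence land in the same bucket.
        groups[mask_sequence(seq, ignore)].append(name)
    return [sorted(names) for names in groups.values() if len(names) > 1]


# Toy data: made-up sequences, with the last four bases standing in for a barcode.
designs = {
    "A": "GGGAAACCCAUG",
    "B": "GGGAAACCCUAC",
    "C": "GGCAAAGCCAUG",
}
twins = find_twins(designs, ignore_positions=range(8, 12))
print(twins)
```

Passing a different set of positions to ignore is one way the same function could cover "any specific area of interest".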
On 8/23/2013 5:31 AM, Eli Fisker wrote:
You already did that script that could find close designs from a lab ID. However, I do like the add-on ideas, like comparing only certain positions. And I love the idea of automatically processing all labs when searching for a specific sequence. We have the sequence search in lab, but it is limited to that specific lab.
I think everything you mention could be really nice. And I do think the simple case could be a stepping stone to the clustering.
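To make the add-on ideas a bit more concrete, here is a small Python sketch of comparing only certain positions, allowing close (non-exact) matches, and looking across several labs at once. Everything in it is an assumption for illustration: the nested dict of lab ID to designs, the function names, the mismatch threshold, and the toy sequences. It compares every pair of designs, so it would get slow for a very large number of designs.

```python
from itertools import combinations


def mismatches(seq_a, seq_b, positions):
    """Count differing bases at the given 0-based positions (a Hamming distance)."""
    return sum(1 for i in positions if seq_a[i] != seq_b[i])


def close_pairs(labs, positions, max_mismatches=1):
    """Find pairs of designs, across all labs, that nearly match at the given positions.

    labs: dict mapping lab ID -> dict of design name -> sequence (hypothetical format).
    Returns a list of ((lab, name), (lab, name), mismatch_count) tuples.
    """
    positions = list(positions)
    all_designs = [
        (lab, name, seq)
        for lab, designs in labs.items()
        for name, seq in designs.items()
    ]
    pairs = []
    for (lab1, name1, seq1), (lab2, name2, seq2) in combinations(all_designs, 2):
        # Skip pairs that cannot be compared at every requested position.
        if any(i >= len(seq1) or i >= len(seq2) for i in positions):
            continue
        d = mismatches(seq1, seq2, positions)
        if d <= max_mismatches:
            pairs.append(((lab1, name1), (lab2, name2), d))
    return pairs


# Toy data: made-up lab IDs and sequences.
labs = {
    "lab1": {"A": "GGGAAACC", "B": "GGGAAUCC"},
    "lab2": {"C": "GGGAAACC", "D": "CCCUUUGG"},
}
pairs = close_pairs(labs, positions=range(8), max_mismatches=1)
for pair in pairs:
    print(pair)
```

Pairs found this way could later be merged into clusters, which is where the simple case would connect to the clustering idea.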