If I’m not mistaken, sequences in labs can be sorted, and the sort currently seems to be based on the Hamming distance.
I’d like to propose a new sorting algorithm (which I dubbed “LDq9”), based on a Lee distance metric with a pseudo-alphabet of size 9 (or more). An example mapping would be:
A.G.-.-.U.C.-.-.-
0.1.2.3.4.5.6.7.8
Since the Lee distance between two codes x and y is min(|x - y|, 9 - |x - y|), this would result in the following distances:
A:G = 1
U:C = 1
G:C = 4
G:U = 3
A:U = 4
A:C = 4
The basic idea is simply that changes within the same nucleotide class (purine or pyrimidine) represent a short distance, while a change of class represents a larger jump.
I believe that this would give a somewhat better view of the similarity of sequences, especially in the context of switches.
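To make the proposal concrete, here is a minimal Python sketch of how the distance and the sort could be computed. It only uses the example mapping above; the names `LDQ9_CODE`, `lee_distance`, `ldq9` and `sort_by_ldq9` are made up for illustration, and summing the per-position distances over equal-length sequences is my assumption about how the metric would extend from single bases to whole sequences.

```python
# Example mapping from the post: A=0, G=1, U=4, C=5 on a ring of size 9.
# Slots 2, 3, 6, 7 and 8 are intentionally left unused.
LDQ9_CODE = {"A": 0, "G": 1, "U": 4, "C": 5}
Q = 9  # size of the pseudo-alphabet


def lee_distance(x, y, q=Q):
    """Lee distance between two codes on a ring of size q."""
    d = abs(x - y)
    return min(d, q - d)


def ldq9(seq_a, seq_b):
    """Sum of per-position Lee distances (assumes equal-length sequences)."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must have the same length")
    return sum(
        lee_distance(LDQ9_CODE[a], LDQ9_CODE[b])
        for a, b in zip(seq_a, seq_b)
    )


def sort_by_ldq9(sequences, reference):
    """Order sequences from closest to farthest from a reference sequence."""
    return sorted(sequences, key=lambda s: ldq9(s, reference))
```

For instance, `sort_by_ldq9(["GGAC", "GGAG", "GGAA"], "GGAA")` would rank `GGAG` (distance 1) ahead of `GGAC` (distance 4), whereas the Hamming distance puts both at 1.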