I was digging into the noncoding regions of the NCBI reference genome
(the "unindexed" positions that do not code for a gene):
Position | Location | Sequence |
---|---|---|
21556…21562 | between ORF1ab … spike | acgaaca |
25385…25392 | between spike … ORF3a | acgaactt |
26221…26244 | between ORF3a … ORF4 | gcacaagctgatgagtacgaactt |
26473…26522 | between ORF4 … ORF5 | acgaactaaatattatattagtttttctgtttggaactttaattttagcc |
27192…27201 | between ORF5 … ORF6 | gtgacaacag |
27388…27393 | between ORF6 … ORF7a | acgaac |
27888…27893 | between ORF7b … ORF8 | acgaac |
28260…28273 | between ORF8 … ORF9 | acgaacaaactaaa |
29534…29557 | between ORF9 … ORF10 | actcatgcagaccacacaaggcag |
29658…29674 | between ORF10 … 3' UTR | actttaatctcacatag |
7 of these 10 regions contain the motif acgaac; the exceptions are the regions starting at 27192, 29534 and 29658.
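For anyone who wants to double-check the count, here is a minimal Python sketch that searches the sequences from the table for the motif. The dictionary is just the table copied by hand (keyed by start position), and the names `regions` and `motif_offsets` are my own, not from any library.

```python
# Intergenic sequences copied by hand from the table (start position -> sequence).
regions = {
    21556: "acgaaca",
    25385: "acgaactt",
    26221: "gcacaagctgatgagtacgaactt",
    26473: "acgaactaaatattatattagtttttctgtttggaactttaattttagcc",
    27192: "gtgacaacag",
    27388: "acgaac",
    27888: "acgaac",
    28260: "acgaacaaactaaa",
    29534: "actcatgcagaccacacaaggcag",
    29658: "actttaatctcacatag",
}
MOTIF = "acgaac"


def motif_offsets(seq, motif=MOTIF):
    """Return the 0-based offsets of every (possibly overlapping) motif hit."""
    offsets = []
    i = seq.find(motif)
    while i != -1:
        offsets.append(i)
        i = seq.find(motif, i + 1)
    return offsets


hits = {start: motif_offsets(seq) for start, seq in regions.items()}
with_motif = [start for start, offs in hits.items() if offs]

for start, offs in hits.items():
    # Convert offsets within the region back to genome coordinates.
    genome_positions = [start + o for o in offs]
    print(start, genome_positions or "no motif")
print(f"{len(with_motif)} of {len(regions)} regions contain {MOTIF}")
```

In most of the matching regions the motif sits right at the start; only in the ORF3a … ORF4 region is it further in (offset 16).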
Does anyone know more about this motif?
Is it specific to SARS-CoV-2, or does it appear in other sequences?
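One way to start answering the second question yourself is to scan other genomes for the same six bases. Below is a rough sketch, assuming the sequences are available as FASTA text; `read_fasta` and `motif_positions` are hypothetical helper names, and the input is a toy example rather than real genome data. It only looks at the given strand and reports 1-based positions.

```python
def read_fasta(text):
    """Parse FASTA-formatted text into {header: lowercase sequence}."""
    records, header = {}, None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            header = line[1:]
            records[header] = []
        elif header is not None:
            records[header].append(line.lower())
    return {h: "".join(parts) for h, parts in records.items()}


def motif_positions(seq, motif="acgaac"):
    """Return 1-based start positions of every motif hit, overlaps included."""
    positions = []
    i = seq.find(motif)
    while i != -1:
        positions.append(i + 1)
        i = seq.find(motif, i + 1)
    return positions


# Toy input for illustration only; replace with the contents of a real FASTA file.
toy = ">toy_a\nTTACGAACGG\nACGAAC\n>toy_b\nGGGGCCCC\n"
for name, seq in read_fasta(toy).items():
    print(name, motif_positions(seq))
```

A six-base motif will also turn up by chance in any long genome, so raw hits on their own say little; what matters is whether they sit just upstream of gene starts, as they do in the table.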
Edit: I found a paper that mentions this sequence: