Finding Similar Sequences in Nature using BLAST

One way to provide evidence that our newly discovered pseudoknots actually form in the cell is to look for similar sequences in other organisms. Researchers commonly use this technique to identify unique structures that may be responsible for a necessary biological function.

Here are instructions for finding RNA sequences similar to our novel pseudoknot discoveries and entering them in a weekly lab puzzle.

  1. Go to the blastn search page.
  2. Copy the RNA sequence and paste it into the sequence field.
    a. Create a job title.
    b. Exclude the known organism.
    c. I like to have results open in a new window.
    d. Click the BLAST button.

  3. Wait 30-60 seconds for the result.

  4. Scroll down the page to get a feel for the organisms listed and the query cover rate.

  5. Click the MSA viewer option.

  6. The MSA viewer will display how similar sequences vary from the original sequence. Enter those mutations in the lab puzzle. Please use the organism name as the submission title. It also helps the researchers if we note the numerical position and the NCBI ID (e.g. CP068247.1).
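If you would rather not read the mutations off the viewer by eye, here is a minimal Python sketch that lists the differences between the starter and a hit. It assumes both sequences are the same length with no gaps, and the function name `list_mutations` and the `C37+G91`-style output are only my illustration, not anything the lab requires.

```python
def list_mutations(query: str, hit: str, first_position: int = 1) -> str:
    """List differences between two equal-length, gap-free sequences.

    Each difference is written as the hit's base followed by its position,
    joined with '+', for example 'C37+G91'.
    """
    if len(query) != len(hit):
        raise ValueError("sequences must be aligned to the same length")
    # BLAST reports T where the RNA has U, so compare in one alphabet.
    query = query.upper().replace("T", "U")
    hit = hit.upper().replace("T", "U")
    changes = [
        f"{h}{i}"
        for i, (q, h) in enumerate(zip(query, hit), start=first_position)
        if q != h
    ]
    return "+".join(changes)


# Toy sequences, only to show the output format.
print(list_mutations("GGGAAACCC", "GGGAUACCA"))  # U5+A9
```

Set `first_position` to something other than 1 if your snippet does not start at base 1 of the puzzle.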

More variations can be uncovered by further restricting the search organism field for the blastn search, e.g. by excluding the entire genus or family.


To get your sequence from the lab, you have to strip out the first 25 nt and the last 50 nt to extract just the sequence. Copy the sequence to a text document and split it using the position counter at the bottom of the page.
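The same trimming can be done with a couple of lines of Python instead of counting in a text document. This is just a sketch: the 25 and 50 come from the post above, and the function name `extract_insert` is made up for illustration.

```python
def extract_insert(full_sequence: str, lead: int = 25, tail: int = 50) -> str:
    """Drop the first `lead` and last `tail` bases and return the middle."""
    # Remove any spaces or line breaks picked up while copying.
    seq = "".join(full_sequence.split()).upper()
    if len(seq) <= lead + tail:
        raise ValueError("sequence is too short to contain an insert")
    return seq[lead:len(seq) - tail]


# Toy input: 25 A's, a 6-base insert, then 50 U's.
print(extract_insert("A" * 25 + "GGGCCC" + "U" * 50))  # GGGCCC
```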

This post holds the steps for isolating the pseudoknot itself using a booster.

Search for the text: How dig siblings up for your natural origin pseudoknot

Steps 1-3 show how to get the snippet of the pseudoknot structure.

Here is a hit:

Jus_Pseu100 #Mutation C37+G91 ID: 11768603 score:93
AUACCGGUGUAAGUGCAGCCCGUCUUACACCGUAAGGCACAGCGGAAACGCUGAUGUGAAAUACAGGGCUGA or
Jus_Pseu100 #Mutation U37+U91+G105 ID:11768651 score: 91
Severe acute respiratory syndrome coronavirus 2 isolate RNA genome assembly, complete genome: monopartite
Sequence ID: OW047800.1

Jus_Pseu100_9 #Mutation A46+U62 ID:11833956 Score:92
GUUUGCGGUGUAAGUUCAGCCCGUCUUACACCGUGCGGCACAGGCACUAGUACUCAUGUCGUAUACUGGGCUUUUGAAGA or
Jus_P100 #Mutation G96 ID:11713414 Score:84
AUACCGGUGUAAGUGCAGCCCGUCUUACACCGUAAGGCACAGCGGAAACGCUGAUGUCAAAUACAGGGCUGGGA or
Jus_Pseu100_12 #Mutation U29+G87+A99 ID:11835446 Score:82
GAUGAGCAGGCAGAUAAUCUUGUGCAGUAUGCAUGCUCGUAACUCCAGCGCACAAGUGGAGGCACAAGAUUUAAAUCCAGCAUAGCGCAGGAGCCGCAGA or
Jus_Pseu100_12 #Mutation U29+U38+G99 ID:11835359 Score:78
GAUGAGCAGGCUGAUAAUCUUGUGCAGUAUGCAUGCUCGUAACUCCAGCGCACAAGUGGAAGCACAAGAUUUGAAUCCAGCAUAGCGCAGGAGCCGCA or
Jus_Pseu100_9 #Mutation C46+C62+G87 ID:11833956 Score:92
AGUUUUUAAACGCGUUUGCGGUGUAAGUCCAGCCCGUCUUACACCGUGCGGCAGAGGCACUAGUACUCAUGUCGUAUACUGGGCUUUUGA or
AUACCGGUGUAAGUGCAGCCCGUCUUACACCGUAAGGCACAGCGGAAACGCUGAUGUCAAAUACAGGGCUGGGA
Severe acute respiratory syndrome coronavirus 2 isolate SARS-CoV-2/human/USA/CO-CDPHE-2101938143/2021
Sequence ID: OL940773.1

Jus_Pseu100 #Mutation G54 ID:11765159 Score:91
AAUUUUCCGGUGGCCAGCCGCCCGGGCCACCGUUACUCCACUCCACUCCUUCGGGACUGGUUUGGAGGAACAUAACAGGGCGGAUAGA
Porcine kobuvirus swine/K-30-HUN/2008/HUN, complete genome
Sequence ID: GQ249161.1

Jus_Pseu100 #Mutation U105 ID:11768567 Score:90
AUAACGGUGUAAGUGCAGCCCGUCUUACACCGUAAGGCACAGCGGAAACGCUGAUGUAAAAUACAGGGCUGUGA
Severe acute respiratory syndrome coronavirus 2 isolate SARS-CoV-2/human/USA/MN-CDC-VSX-A06586/2023
Sequence ID: OR816861.1

Very interesting one - many different organisms
Jus_Pseu100_16 #Mutation C41+C91 ID:11837603 Score:90
AUAUAAGAGCGUCUUCGCGUCUUGACGUGA
Danio rerio cDNA, clone cssl:d0340
Sequence ID: CU468765.1
Argyresthia albistria genome assembly, chromosome: 10
Sequence ID: OY735262.1

JR_P100_539 #Mutation G84 ID:11746727 Score:90
AUACGCUGGUAGCAGCAAGUAGCAGCUCAAUCCGCAGAUGAGCGGAAUAUACAAAUGAGAAUGGAUCUCGAUAGAGCUGCUACAAUCGAGACCAUUAGA
3 different hits

JR_P100_534 #Mutation G ID:11809462 Score:93
AGAAGCACGCGAUCCUUGGCACCGCGUGCAAGAGAAUGAGGCCGGCAAAGAAGGAAAGCCGGCAUAAGUGCCAAGUAAGAGA
Gadus macrocephalus isolate Gmac_GOA_2020 chromosome 10
Sequence ID: CP133511.1

JR_P100_579 ID:1166407 Score:87
AAACCGGUUCAUCCAAUUUAUUUGGACGCAUGAUAAGAGAAUAAGGAACCGGACUCUUAUCAACA
3 hits - Prochlorococcus sp. RS04 chromosome, partial genome
Sequence ID: CP018346.1

#2TPK #5#Flanking experiment ID:11641493 Score:92
UAAUUGCUUAUUAUAAAUAGAUUAUAAUUAUCUCACUGA
Many different hits - Escherichia phage T2 DNA, complete sequence
Sequence ID: NC_054931.1

NC_031475.1 #EFTK filtered bases 323841-323940 ID:11745233 Score:85
AGAGCUUCGUUCUUCUGCUGCUGACGAAGCUCGAUGGACUUGCACAUCGUCCUCAGUGUGCGACAGCAGCAU
1 hit - TPA_asm: Toxoplasma gondii VEG, chromosome chrVIIb, complete genome
Sequence ID: LN714497.1

NW_908275.1 #EFTK filtered bases 881-980 ID:11737463 Score:91
AAACUGCACGCGGGCGCCGGGCAUCGCCAGUGCAAGCACCGAGCCAAAGAAGAUGCCGAGGUGCAUCGCGUGCGCCACGACGUACGUCAGGCUCGGGAAG
Lots of hits, same bug - Trypanosoma cruzi cruzi strain Sylvio X10/cl1 chromosome TcI16, partial sequence
Sequence ID: CP015666.1

NC_018250.1 #EFTK filtered bases 469361-469460 ID:11712397 Score:92
ACCGGUCUCGAGCAUGCCGAGAUGGCAGCGGAGCAAUCGCCAAGCGCCGCUGCGGCAGCAUGCACGGCGAUGGCCUG
23 hits - same bug - Leishmania donovani strain pasteur chromosome 23, complete sequence
Sequence ID: CP022638.1

Here is one example search that yields several species of Mammaliicoccus. The similarity drops to 75%, but it is still worth testing.


OpenKnot Round 4 - Week 12

Hint for getting more than 100 design sequences to choose from when you blast with the Staphylococcus aureus sequence starter in the lab.

Open Algorithm parameters at the bottom of the BLAST window and change Max target sequences from the default of 100 to 5000 :)

Should be useful in future labs too.


Great suggestion, Eli!

No further similar sequence puzzles will be going up for the weekly series after this week. If players want to explore covariation in their pseudoknot discoveries, they can create a player puzzle and enter similar sequences there to assess whether the sequence variation occurs logically in their pseudoknot structure. Do pairs remain pairs?

I’m looking forward to reviewing the similar sequences submitted by players in the two weekly puzzles to see whether the variations maintain the pknot structure!
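For anyone who wants to check "do pairs remain pairs?" outside the game, here is a rough Python sketch. It assumes the structure is written in dot-bracket notation with a second bracket type such as `[` `]` for the pseudoknot pairs, and it counts G-U wobble pairs as valid. Both are my assumptions, and the function names are made up.

```python
# Pairs counted as valid here: Watson-Crick plus G-U wobble (an assumption).
VALID_PAIRS = {"AU", "UA", "GC", "CG", "GU", "UG"}
CLOSERS = {")": "(", "]": "[", "}": "{", ">": "<"}


def parse_pairs(structure: str) -> list[tuple[int, int]]:
    """Return 0-based (i, j) pairs from a dot-bracket string."""
    stacks = {opener: [] for opener in CLOSERS.values()}
    pairs = []
    for index, char in enumerate(structure):
        if char in stacks:
            stacks[char].append(index)
        elif char in CLOSERS:
            stack = stacks[CLOSERS[char]]
            if not stack:
                raise ValueError(f"unmatched '{char}' at position {index + 1}")
            pairs.append((stack.pop(), index))
    if any(stacks.values()):
        raise ValueError("structure has unclosed brackets")
    return sorted(pairs)


def broken_pairs(structure: str, sequence: str) -> list[tuple[int, int, str]]:
    """List 1-based pairs that can no longer pair in `sequence`."""
    seq = sequence.upper().replace("T", "U")
    if len(seq) != len(structure):
        raise ValueError("sequence and structure must be the same length")
    return [
        (i + 1, j + 1, seq[i] + seq[j])
        for i, j in parse_pairs(structure)
        if seq[i] + seq[j] not in VALID_PAIRS
    ]


# Toy pseudoknot and toy variants, for illustration only.
structure = "((..[[..))..]]"
print(broken_pairs(structure, "GGAAACAACCAAGU"))  # []
print(broken_pairs(structure, "GGAAACAAUCAAGU"))  # [] (G-C became G-U)
print(broken_pairs(structure, "GGAAACAACCAAAU"))  # [(6, 13, 'CA')]
```

An empty list means every pair in the structure can still form in the sibling sequence.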

To see the list of 5000, I had to select the “NEW” BLAST results page and then change “show” from 100 to 5000.

The sequence stamper in the lab is not working for me. I am not able to copy and paste sequences that I discover.

Hi Stevetclark!

I’m guessing this is where the trouble arises. Before stamping the sequence into the lab puzzle, I check my sequence against the original. Sometimes it is an exact match at the exact same spot. But often the sequence I have found and wish to post in the lab is shifted a bit. For example, in one week 12 case I pasted the sequence into the Sequence Stamper and then specified 2 for the base start.

Alternatively, you can filter your BLAST results so that the output is always the same length as your search query.

Hope this helps

Speaking of filters, here is another one that is pretty useful. We don’t want designs that are 100% identical to the sequence we start with. Setting Percent identity to 99 as a max will ensure that.
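If you download the hit table instead of filtering on the web page, both filters from this thread (same length as the query, at most 99% identity) can be applied afterwards. This is a sketch that assumes the standard 12-column tab-separated BLAST hit table (query id, subject id, percent identity, alignment length, mismatches, gap opens, query start, query end, subject start, subject end, e-value, bit score). The function name and the rows are invented for illustration.

```python
def keep_hit(row: str, query_length: int, max_identity: float = 99.0) -> bool:
    """Keep hits that span the whole query without gaps and are not identical."""
    fields = row.rstrip("\n").split("\t")
    percent_identity = float(fields[2])
    gap_opens = int(fields[5])
    query_start, query_end = int(fields[6]), int(fields[7])
    covers_whole_query = query_start == 1 and query_end == query_length
    return covers_whole_query and gap_opens == 0 and percent_identity <= max_identity


# Invented rows, not real BLAST output.
rows = [
    "query\thit_a\t100.000\t100\t0\t0\t1\t100\t501\t600\t1e-45\t185",
    "query\thit_b\t98.000\t100\t2\t0\t1\t100\t77\t176\t1e-40\t170",
    "query\thit_c\t97.938\t97\t2\t0\t3\t99\t12\t108\t1e-38\t160",
]
kept = [row.split("\t")[1] for row in rows if keep_hit(row, query_length=100)]
print(kept)  # ['hit_b']
```

Here hit_a is dropped for being 100% identical and hit_c for covering only bases 3-99 of the query.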

Here is another option I like when using the MSA Viewer. I have been playing with the Coloring option and I really like Frequency-Based Difference. When I have as many sequences as I got from the week 13 search (6182 in that case), I don’t want to scroll down through all of them. I want an average, which is exactly what this view gives me. :)

Sometimes one can even get a sense of how the codons for the coding sequences are positioned, because the third wobble base is the one most likely to change, for closely related siblings like the above sequences.
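One quick way to look for that codon pattern is to group the mutated positions by their remainder when divided by 3. If most changes land in the same group, that group is a candidate for the wobble position. This is only a sketch: the positions below are invented, and which remainder corresponds to the third codon base depends on the reading frame, which this does not know.

```python
from collections import Counter


def tally_by_frame(positions: list[int]) -> Counter:
    """Count mutated positions by their remainder modulo 3."""
    return Counter(position % 3 for position in positions)


# Invented mutation positions, for illustration only.
print(tally_by_frame([3, 6, 9, 12, 14]))  # Counter({0: 4, 2: 1})
```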

When I scroll further down the MSA Viewer, to the bottom, I can see the concrete base mutations against my query starter.

I did all of the above and the sequence is still not being pasted. Is there an updated script? What are you using to paste the sequences?

@stevetclark, ok, that is weird. I’m just using the sequence stamper in the Booster list.

I only use the paste option in the tool bar, when the sequence I paste in starts at base 1.

Is there an advantage to using Sequence stamper rather than Paste sequence? I’ve been using the latter.

@spvincent, normally not, if your sequence starts from base 1 and fills the rest. However if your sequence starts at base 3, you want to use the Sequence Stamper as it will allow you to decide where your sequence goes. This is the second box you get in the Sequence stamper after pasting in your sequence. I put in a 3 for specifying a different start position for my sequence.

So in this lab, where we get a starting structure and post sibling sequences, the Sequence Stamper also makes it easier to use sibling sequences that are a tiny bit shorter than our original 100 nt puzzle.
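To make the start position idea concrete, here is a small Python sketch of what stamping does: it overwrites part of the puzzle sequence with the pasted fragment, beginning at a chosen 1-based position. This only mimics the behaviour described above. It is not the booster's actual code, and the function name is made up.

```python
def stamp(target: str, fragment: str, start: int = 1) -> str:
    """Overwrite `target` with `fragment`, beginning at 1-based `start`."""
    offset = start - 1
    if start < 1 or offset + len(fragment) > len(target):
        raise ValueError("fragment does not fit at that start position")
    return target[:offset] + fragment + target[offset + len(fragment):]


# Toy example: stamp a 3-base fragment starting at base 3.
print(stamp("AAAAAAAAAA", "GGG", start=3))  # AAGGGAAAAA
```

Bases outside the stamped region keep whatever was in the puzzle before.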

From your BLAST alignments: if you select Tree View and then click on a node, you get a list of the exact nt changes, which you can apply in a text document and then copy into the lab to see what happens.