Pseudoknot Finder tool

We have a Pseudoknot Finder tool! Thank you jnicol for developing this great script for us!

After adding the Pseudoknot Finder script to your Favorites, open the OpenKnot lab puzzle and initiate the booster. The dialog box will pop up. Enter a title for the organism you are searching and a description such as the source (hyperlinks are supported), and paste the long sequence. Here is an mRNA I found on RNA Central.

Click the Apply button. The script will search 100-base segments at 20-base intervals, then return a list of pseudoknots found.
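For anyone curious what "100-base segments at 20-base intervals" looks like in practice, here is a minimal Python sketch of that kind of sliding window. It is not the script's actual code; the function name `sliding_windows` and the choice to drop any leftover shorter than a full window are assumptions made for illustration.

```python
def sliding_windows(sequence, window=100, step=20):
    """Yield (start, segment) pairs for fixed-size windows over a sequence.

    Assumption: a trailing piece shorter than `window` is simply skipped.
    How the real tool treats the end of the sequence is not known here.
    """
    for start in range(0, len(sequence) - window + 1, step):
        yield start, sequence[start:start + window]


# Toy 300-base sequence, just to show the stepping
seq = "ACGU" * 75
windows = list(sliding_windows(seq))
print(len(windows), "segments; the second one starts at base index", windows[1][0])
```

Because neighbouring segments overlap by 80 bases, a long input produces roughly one candidate segment per 20 bases, which is why very long sequences take a while.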

Pseudoknot Finder 2

If desired, click on each Pseudoknot entry to view folding in Natural Mode. Once you are comfortable with the tool, sequences can be submitted without reviewing each one. I’m keeping this one even though it has 6 A’s in a row. It will probably synthesize and test fine. And if it doesn’t, not a big deal. We have plenty of slots.

I’m keeping this one as well even though it folds with the 5’ end. Only base 26 is involved.

If you want to delete a sequence from the list (not submit it), click Enabled to change to Disabled. Please note, any mutation made to the sequences in the list will not be saved when sequences are submitted. I asked jnicol to keep this tool very simple so that new players can learn it easily, no bells and whistles.

Click the Submit 6 designs button and designs will submit!

The script can process up to 500,000 bases, but a more manageable RNA length to search for pseudoknots is in the 10,000 to 50,000 base range. (To close the dialog box, click the script in the booster list again.)

Hi all!

If you are wondering where to find genomes to browse for pseudoknots with Jnicol’s new Pseudoknot Finder tool, here is a way to do it. The tool doesn’t discriminate: whether you give it RNA or DNA, it will chunk out fine potential pseudoknots.

This is a demo of how to find a FASTA sequence. FASTA is just a simple text format for sequences that is easy for computers to read: a title line starting with >, followed by the sequence itself broken up into short lines instead of one string millions of bases long.

Here is how to get the FASTA. Go to NCBI. Choose the option Genome, beside the search box.

For this particular booster, viruses seem to have the perfect genome size.

Type in Flaviviridae

This will give you a bunch of viruses that are of a reasonable genome size:

I pick out Hepacivirus C. This lands me on a site with a lot of info. I scroll to the bottom:

This gives me the representative genomes for the organism, in this case two. Notice the NC in the entry name. Those are the ones I always go for; they are the quality mark of a reference genome. The page also shows how large the genome is. The first one is 9.65 kb (kb = kilobases, 1 kb = 1,000 bases), meaning it is around 10,000 bases long. This one is doable.

I open the entry for Hepatitis C virus genotype 1:

Notice the title says complete genome. This means you are not just getting a single protein or so. You get it all. It also shows the exact length of the sequence in base pairs.

Now click on FASTA (top left). And you have the DNA for the organism of your interest.

Highlight all the DNA and nothing else. Copy it and paste it into Jnicol’s Pseudoknot finder tool like this:
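If you save the FASTA text to a file instead of highlighting it by hand, the "all the DNA and nothing else" step can be sketched in a few lines of Python. This is only an illustration: the function name `fasta_to_sequence` and the made-up example record are assumptions, and it handles a single record only.

```python
def fasta_to_sequence(fasta_text):
    """Return the bare sequence from a single-record FASTA string.

    Title lines (starting with '>') and blank lines are dropped, and the
    remaining lines are joined into one uppercase string.
    """
    body = []
    for line in fasta_text.splitlines():
        line = line.strip()
        if not line or line.startswith(">"):
            continue
        body.append(line)
    return "".join(body).upper()


# Made-up record, not a real accession
example = """>EXAMPLE_1 made-up record for illustration
acgtacgtac
ggttaacc
"""
print(fasta_to_sequence(example))
```

The result is one continuous string that can be pasted straight into the sequence box.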

Click Apply. The tool starts running through the sequence and tests it for pseudoknots. Depending on the size of your genome, it may take a while.

It says it has found 79 pseudoknots, which is a reasonable number to look through.

If you want ideas for viruses in this size range, I have collected a bunch that could make nice targets in the Most wanted organism spreadsheet. I have picked mainly viruses that are troublemakers for food production, which I found on this Wikipedia page on Plant pathogens.

If you want to work with much larger organisms than viruses, you should give the bulk submission approach a thought.

Other virus families that tend to have small genomes include: Picornaviridae, Caliciviridae, Togaviridae, Paramyxoviridae, Orthomyxoviridae, Rhabdoviridae and Coronaviridae.

Huh. Wow! This is crazy good. I hope I did it right.

Great write up, thanks! Actually that interaction with base 26 is a bug, the tool should not have allowed that sequence :)

These instructions are very clear. Thanks!

Something’s wrong. The pseudoknot finder doesn’t work for me anymore. Everything looks normal. It goes through the motions but no pseudoknots are found.

@jnicol The Pseudoknot Finder tool isn’t finding any pknots for me either. Can you take a look?

I made a change to fix the base 26 issue and broke the checking. It’s working now, please test and let me know.

It worked well. It still lets pseudoknots through that may have more than 5 of the same bases in a row, but that’s ok I think.

@mjt Yes, that’s okay. Testing natural RNA segments with more than 5 of the same base in a row will be useful.
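For anyone who still wants to spot such runs before submitting, here is a minimal Python sketch that measures the longest stretch of one repeated base. It is not part of the Pseudoknot Finder; the function name `longest_run` is made up for this example.

```python
import re


def longest_run(sequence):
    """Length of the longest stretch of a single repeated base (0 if empty)."""
    runs = re.finditer(r"(.)\1*", sequence)
    return max((len(match.group(0)) for match in runs), default=0)


# Toy segment containing six A's in a row
segment = "GGCAAAAAAUCG"
print(longest_run(segment), "is the longest run in", segment)
```

A segment would count as having "more than 5 of the same base in a row" when `longest_run` returns 6 or more.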

I wish to highlight that there is a new version of the Pseudoknot finder that Jnicol has made. It is called Customized Pseudoknot Finder.

Its main difference is that it has two extra settings. You can choose Any knot and you will get each and every pseudoknot detected. Or you can choose Kissing loops and you will get kissing loops (which are a minority among the pseudoknots).

Then there is my favorite setting: pseudoknot bindings. It introduces an extra level of filtering beyond what was already there in the starter script (a minimum of 3 pseudoknot bindings). I typically set it at 4 pseudoknot bindings. This way I get rid of some of the weaker pseudoknots without losing too many potential pseudoknots.

This setting is equivalent to filtering level -f4 in jandersonlee’s spreadsheets for bulk submission. It requires at least 4 pseudoknot brackets and 4 normal brackets. Generally, jandersonlee and I matched the filtering level to the number of potential pseudoknots we got back. If we got few back, typically for viruses, we used a lower filtering level. If we got a ton back, typically for bacteria and eukaryotes, we raised the filtering level.

f1 = is there one set of [ ] (Basic level before more filtering)
f2 = [[ ]]
f3 = [[[ ]]]
-f3 = ((([[[ )))]]]
-f4 = (((([[[[ ))))]]]]
-f5 = ((((([[[[[ )))))]]]]]
-f6 = (((((([[[[[[ ))))))]]]]]]
-f7 = ((((((([[[[[[[ )))))))]]]]]]]
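The filter levels above can be sketched as a simple bracket count on a dot-bracket structure. This is only an illustration under stated assumptions: the function name `passes_filter` is made up, and it counts brackets anywhere in the structure, while the real spreadsheets or script may require the brackets to sit together in one stack.

```python
def passes_filter(structure, level, require_normal=False):
    """Check a dot-bracket structure against an fN or -fN style filter.

    level          : minimum number of '[' pseudoknot brackets (fN)
    require_normal : if True, also require `level` '(' brackets (-fN)

    Assumption: brackets are counted over the whole structure, not per stack.
    """
    pseudo_pairs = structure.count("[")
    normal_pairs = structure.count("(")
    if pseudo_pairs < level:
        return False
    if require_normal and normal_pairs < level:
        return False
    return True


# Toy structure with 4 normal and 4 pseudoknot pairs
toy = "((((..[[[[..))))..]]]]"
print("-f4:", passes_filter(toy, 4, require_normal=True))
print("f5:", passes_filter(toy, 5))
```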

Submitting a bacterial size organism with Pseudoknot Finder

I have submitted an organism with a genome of over 1 million bp with the Customized Pseudoknot Finder. Here is how I did it.

I took the FASTA from the monster size viral organism that I wanted to submit, including the FASTA title header. Mimivirus terra2 genome

Instead of pasting it into Pseudoknot Finder as usual, I pasted it into a bioinformatics tool. I set the fragment length to 200000 bp and clicked Submit.

Sequence Manipulation Suite: Split FASTA

This gives me six neatly portioned 200000 bp fragments (okay, the last one was shorter). I use the fragment title as the puzzle title:
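If you would rather do the splitting yourself, the same idea can be sketched in Python. This is not how the Split FASTA tool is implemented; the function name `split_into_fragments` and the "title start-end" naming scheme are hypothetical choices for this example.

```python
def split_into_fragments(title, sequence, fragment_length=200_000):
    """Cut a sequence into consecutive fragments, each with its own title.

    The last fragment may be shorter than `fragment_length`.
    Titles use 1-based positions, e.g. 'my genome 1-200000' (made-up scheme).
    """
    fragments = []
    for start in range(0, len(sequence), fragment_length):
        piece = sequence[start:start + fragment_length]
        end = start + len(piece)
        fragments.append((f"{title} {start + 1}-{end}", piece))
    return fragments


# Toy 1000-base "genome" cut into 300-base pieces
toy = "ACGT" * 250
parts = split_into_fragments("toy genome", toy, fragment_length=300)
for name, piece in parts:
    print(name, len(piece))
```

Each title can then be used as the puzzle title for that fragment.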

Then I just repeated the process in Pseudoknot Finder for each fragment.

Here is the result:

Working with 200000 base fragments was a rather painful experience; 100000 bp is probably a better size. You can adjust this to whatever size you prefer working with in the Pseudoknot Finder.

Now you can do bacteria or a smaller chromosome if you like.
