Rare diseases and how to find complete human genes
DigitalEmbrace has brought up that Rhiju and co will first be scanning the human genome in 2024. I have an idea for where to start with Homo sapiens…
There are a lot of rare diseases caused by a mutation in a gene. Here is a story about two little boys with a genetic mutation that means they have dementia. There are 70+ such rare diseases with the same result: a child with dementia.
“You mean there’s nothing?” The families fighting for their children with dementia, by The Guardian
I have downloaded the entire Orphanet database of rare diseases and isolated just the gene names. When I removed all the duplicates (many diseases share a gene), I ended up with 4418 human genes that have rare diseases associated with them.
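For anyone who wants to repeat the duplicate removal on their own export, here is a minimal sketch of the idea. I did this in a spreadsheet, so this is not my actual method. The column name `gene_symbol`, the sample rows and the placeholder gene `GENE_X` are all made up for illustration; the real Orphanet export will have its own layout.

```python
import csv
import io

def unique_gene_symbols(rows, gene_column="gene_symbol"):
    """Return the sorted set of gene symbols found in disease/gene rows."""
    seen = set()
    for row in rows:
        # Strip whitespace but keep the case as-is, since some symbols mix case.
        symbol = (row.get(gene_column) or "").strip()
        if symbol:
            seen.add(symbol)
    return sorted(seen)

# Tiny made-up sample standing in for the real export (hypothetical data).
sample_csv = """disease,gene_symbol
Example disease A,PFKM
Example disease B,GENE_X
Example disease C,PFKM
"""

rows = list(csv.DictReader(io.StringIO(sample_csv)))
genes = unique_gene_symbols(rows)
print(len(genes), genes)
```

Three disease rows collapse to two genes here, because two of the example diseases share PFKM.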
Orphan diseases related genes
There is a particularly useful trick when it comes to finding human genes. I earlier shared tips on how to find RefSeq genomes: a scientifically agreed-upon representative genome for the organism, and often also the most complete version so far. When it comes to human genes, there is something similar, called RefSeqGene. It includes both introns and exons.
To get a RefSeqGene, you first need the name of a human gene. Before I found the databases with rare diseases and gene names, I found the genes behind diseases by looking them up in Wikipedia.
Here is how to find a RefSeqGene. Open NCBI and choose Nucleotide in the menu. I decided to search for the human gene PFKM.
Most of the time you will end up with a search result like this: a direct link to the RefSeqGene.
The few exceptions I have hit upon are for genes that have not yet been curated. Then I specify Homo sapiens as the organism and try to pick the first variant among the RefSeqs.
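If you would rather script the lookup than click through the website, the search above can be written as a query for NCBI's E-utilities `esearch` endpoint. This is only a sketch that builds the URL without sending it. The field tags I use (`[Gene Name]`, `[Organism]`, `[Keyword]`) are my assumption about how the website search maps to a query, so check the result against what the Nucleotide page shows you.

```python
from urllib.parse import urlencode

# Public NCBI E-utilities search endpoint.
ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"

def refseqgene_search_url(gene_symbol):
    """Build (but do not send) a Nucleotide search URL for a human RefSeqGene."""
    # Assumed query: gene name, restricted to human, restricted to RefSeqGene records.
    term = (
        f"{gene_symbol}[Gene Name] AND Homo sapiens[Organism] "
        "AND RefSeqGene[Keyword]"
    )
    params = {"db": "nuccore", "term": term, "retmode": "json"}
    return f"{ESEARCH}?{urlencode(params)}"

url = refseqgene_search_url("PFKM")
print(url)
```

Pasting the printed URL into a browser should return a list of matching record IDs. For an uncurated gene the list may be empty, and then the manual route described above is the fallback.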
From here on, it is just like any other Personalized Pseudoknot Finder run. I copy the FASTA.
I use the number of base pairs to judge how much filtering to add. If the gene is anything less than 50,000 bp, I will typically set Pknot bindings to filter 4. Above 50,000 bp I will opt for Pknot bindings filter 5, and above 100,000 bp I will pick filter 6. I don't recall having used filter 7 on a gene yet, but if a gene gets longer than 300,000 bp, I'll probably consider it.
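My rule of thumb for the filter level can be written down as a small function. This is a sketch of my own habit, not a setting built into the Pseudoknot Finder. The function names are made up, and what happens at exactly 50,000 or 100,000 bp is an assumption, since I have only described "less than" and "above".

```python
def sequence_length(fasta_text):
    """Count bases in a single-record FASTA string, skipping the header line."""
    return sum(
        len(line.strip())
        for line in fasta_text.splitlines()
        if line and not line.startswith(">")
    )

def suggested_filter(length_bp):
    """Map gene length to a starting Pknot bindings filter level."""
    if length_bp < 50_000:
        return 4
    if length_bp <= 100_000:  # assumption: the boundary values fall in this band
        return 5
    # Above 300,000 bp filter 7 might be worth considering; it is left out
    # here because it has not actually been tried on a gene yet.
    return 6

example_fasta = ">example header (made up)\nACGT\nACG\n"
print(sequence_length(example_fasta))
print(suggested_filter(49_999), suggested_filter(75_000), suggested_filter(150_000))
```

Treat the returned level as a starting point only.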
I also adjust the filter setting to how many pseudoknots it looks like I will get from a search. So I sometimes do a prerun, just to get an idea of the number. If I can see I'll get more pseudoknots than I care to look through, I raise the filter level. This was the case this time, so I set my filter at 5 instead of the normal 4.
You can pick whichever genes you want from the Orphanet document. When I work my way through them, I'll search the lab to check whether a gene has already been used, and register it to your name if you have used a RefSeqGene.
When I get the Pseudoknot Finder results back, I look through all the pseudoknots alongside jandersonlee's ArcKnot tool, to pick out the knots judged to be stronger.
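The prerun habit boils down to a small loop: start at the usual level and step up until the number of hits is something you can look through. In the sketch below the counts are invented and `count_at_filter` is a hypothetical stand-in for doing a prerun by hand. The ceiling of 7 is an assumption based on the highest filter I have mentioned.

```python
def raise_filter_until_manageable(start_filter, count_at_filter, max_hits, max_filter=7):
    """Step the filter level up while a prerun reports too many pseudoknots."""
    level = start_filter
    while level < max_filter and count_at_filter(level) > max_hits:
        level += 1
    return level

# Made-up prerun counts per filter level, for illustration only.
fake_counts = {4: 120, 5: 35, 6: 9, 7: 2}

chosen = raise_filter_until_manageable(4, fake_counts.get, max_hits=40)
print(chosen)
```

With these invented numbers the loop stops at filter 5, the same kind of step up from 4 as in my run.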
How may this help?
We are not running pseudoknot tests on the diseased versions of a gene, which might also be interesting. However, there are so many disease variants that I wouldn't know where to start for now, and we wouldn't have the slots to run them all. Getting to know where the pseudoknots are in functional genes is, I think, where we will get more bang for our slot bucks. If a version of the gene has one or more mutations at the spot of such a pseudoknot, that may be valuable knowledge. Also, if pseudoknots turn out to be medical targets, they may be useful for creating medicine that slows down or speeds up the function of a gene, depending on whether it is overactive or not working. Using pseudoknots as medical targets is potentially akin to ASO (antisense oligonucleotide) medicine, just with a different method.
Additional data
For those who wish to look up a specific rare disease, or to know which disease(s) their chosen gene is involved in, I have made an extra spreadsheet with the full Orphanet dataset:
Full orphanet file