Identifying individuals with rare disease variants by inferring shared ancestral haplotypes from SNP array data. NAR Genomics and Bioinformatics • November 20, 2024
Erandee Robertson, Bronwyn Grinton, Karen Oliver, Liam Fearnley, Michael Hildebrand, Lynette Sadleir, Ingrid Scheffer, Samuel Berkovic, Mark Bennett, Melanie Bahlo
We describe FoundHaplo, an identity-by-descent algorithm that can be used to screen for untyped disease-causing variants using single nucleotide polymorphism (SNP) array data. FoundHaplo leverages knowledge of shared disease haplotypes for inherited variants to identify those who share the disease haplotype and are, therefore, likely to carry the rare [minor allele frequency (MAF) ≤ 0.01%] variant. We performed a simulation study to evaluate the performance of FoundHaplo across 33 disease-harbouring loci. FoundHaplo was used to infer the presence of two rare (MAF ≤ 0.01%) pathogenic variants, SCN1B c.363C>G (p.Cys121Trp) and WWOX c.49G>A (p.E17K), which can cause mild dominant and severe recessive epilepsy, respectively, in the Epi25 cohort and the UK Biobank. FoundHaplo demonstrated substantially better sensitivity at inferring the presence of these rare variants than existing genome-wide imputation. FoundHaplo is a valuable screening tool for finding disease-causing variants with known founder effects using only SNP genotyping data. It is also applicable to nonhuman organisms and nondisease-causing traits, including rare-variant drivers of quantitative traits. The FoundHaplo algorithm is available at https://github.com/bahlolab/FoundHaplo (DOI:10.5281/zenodo.8058286).
Understanding Plasmodium vivax recurrent infections using an amplicon deep sequencing assay, PvAmpSeq, identity-by-descent and model-based classification. MedRxiv : The Preprint Server For Health Sciences • June 10, 2025
Jason Rosado, Jiru Han, Thomas Obadia, Jacob Munro, Zeinabou Traore, Kael Schoffer, Jessica Brewster, Caitlin Bourke, Joseph Vinetz, Michael White, Melanie Bahlo, Dionicia Gamboa, Ivo Mueller, Shazia Ruybal Pesántez
Plasmodium vivax infections are characterised by recurrent bouts of blood-stage parasitaemia. Understanding the genetic relatedness of recurrences can distinguish whether these are caused by relapse, reinfection, or recrudescence, which is critical to understanding treatment efficacy and transmission dynamics. We developed PvAmpSeq, an amplicon sequencing assay targeting 11 SNP-rich regions of the P. vivax genome. PvAmpSeq was validated on field isolates from a clinical trial in the Solomon Islands and a longitudinal observational cohort in Peru, and statistical models were applied for genetic classification of infection pairs. In the Solomon Islands trial, where participants received antimalarials at baseline, half of the recurrent infections were caused by parasites with >50% relatedness to the baseline infection, with statistical models classifying 25% and 25% as probable relapses and recrudescences, respectively. In the Peruvian cohort, 26% of recurrences were likely relapses. PvAmpSeq provides high-resolution genotyping to characterise P. vivax recurrences, offering insights into transmission and treatment outcomes.
Comprehensive Characterisation of the RFC1 Repeat in an Australian Cohort. Cerebellum (London, England) • June 01, 2025
Kayli Davies, Haloom Rafehi, Liam Fearnley, Penny Snell, Greta Gillies, Tess Field, Gábor Halmágyi, Kishore Kumar, Kate Pope, Renee Smyth, Susan Tomlinson, Stephen Tisch, Chi-chang Tang, Shaun R Watson, Thomas Wellings, Kathy H Wu, David Szmulewicz, Martin Delatycki, Melanie Bahlo, Paul Lockhart
RFC1-related disease, which includes cerebellar ataxia, neuropathy, and vestibular areflexia syndrome (CANVAS), is a late-onset neurodegenerative disorder primarily caused by biallelic AAGGG(n) repeat expansions (RE) in RFC1. The RFC1 locus is highly polymorphic, with multiple pathogenic and non-pathogenic repeat motifs identified. This study aimed to characterise the structure of the RFC1 repeat and determine the pathogenic allele frequency in an Australian cohort. Using a combination of PCR and next-generation sequencing techniques, we provide a comprehensive characterisation of the RFC1 repeat locus in an Australian cohort of 232 individuals with adult-onset ataxia and 269 healthy controls. Biallelic pathogenic RFC1 variants were identified in 34.1% of affected individuals. The overwhelming majority (93.7%) have biallelic AAGGG(n) RE, although other pathogenic alleles, including ACAGG(n), AAAGG(>500) and the Māori AAAGG(10-25)AAGGG(n)AAAGG(4-6) configuration, were detected in some affected individuals. We also demonstrate the utility of targeted long-read sequencing in resolving complex alleles. The carrier frequency of the pathogenic AAGGG(n) expansion was approximately 1 in 16 in controls, highlighting the potential for pseudodominant inheritance and the likelihood that RFC1-related disease is underdiagnosed. We further demonstrate significant RFC1 repeat heterogeneity, identifying 16 distinct motifs, complex repeat structures, and at least six motifs with an allele frequency > 1%. The frequency of RFC1-related disease in individuals with adult-onset cerebellar ataxia and the high carrier frequency of pathogenic RFC1 alleles in the Australian population underscore the need for improved diagnostic strategies. Our findings indicate RFC1 RE are a major cause of late-onset cerebellar ataxia and sensory neuropathy in Australia and provide further insights into RFC1 repeat diversity.
GeneSetPheno: a web application for the integration, summary, and visualization of gene and variant-phenotype associations across gene sets. Bioinformatics Advances • December 09, 2024
Jiru Han, Zachary Gerring, Longfei Wang, Melanie Bahlo
The comprehensive study of genotype-phenotype relationships requires the integration of multiple data types to "triangulate" signals and derive meaningful biological conclusions. Large-scale biobanks and public resources generate a wealth of comprehensive results, facilitating the discovery of associations between genes or genetic variants and multiple phenotypes. However, analyzing these data across resources presents several challenges, including limited flexibility in gene set analysis, the integration of multiple databases, and the need for effective data visualization to aid interpretation. GeneSetPheno is a user-friendly graphical interface that integrates, summarizes, and visualizes gene and variant-phenotype associations across genomic resources. It allows users to explore interrelationships between genetic variants and phenotypes, offering insights into the genetic factors driving phenotypic variation within user-defined gene sets. GeneSetPheno also supports comparisons across gene sets to identify shared or unique genetic variants, phenotypic associations, biological pathways, and potential gene-gene interactions. GeneSetPheno is a free and highly configurable tool for exploring the complex relationships between gene sets, genetic variants, and phenotypes. Target users include molecular biologists and clinicians who wish to explore a gene or gene set of particular interest. GeneSetPheno is freely accessible at: https://shiny.wehi.edu.au/han.ji/GeneSetPheno/. The source code is available on GitHub at: https://github.com/bahlolab/GeneSetPheno.
Genetic Risk of Reticular Pseudodrusen in Age-Related Macular Degeneration: HTRA1/lncRNA BX842242.1 dominates, with no evidence for Complement Cascade involvement. MedRxiv : The Preprint Server For Health Sciences • October 14, 2024
Samaneh Farashi, Carla Abbott, Brendan Ansell, Zhichao Wu, Lebriz Altay, Ella Arnon, Louis Arnould, Yelena Bagdasarova, Konstantinos Balaskas, Fred Chen, Emily Chew, Itay Chowers, Steven Clarke, Catherine Cukras, Cécile Delcourt, Marie-noëlle Delyfer, Anneke Den Hollander, Sascha Fauser, Robert Finger, Pierre-henry Gabrielle, Jiru Han, Lauren Hodgson, Ruth Hogg, Frank Holz, Carel Hoyng, Himeesh Kumar, Eleonora Lad, Aaron Lee, Ulrich Luhmann, Matthias Mauschitz, Amy Mcknight, Samuel Mclenachan, Aniket Mishra, Ismail Moghul, Luz Orozco, Danuta Sampson, Liam Scott, Vasilena Sitnilska, Scott Song, Amy Stockwell, Anand Swaroop, Jan Terheyden, Liran Tiosano, Adnan Tufail, Brian Yaspan, Robyn Guymer, Melanie Bahlo
Age-related macular degeneration (AMD) is a multifactorial retinal disease with a large genetic risk contribution. Reticular pseudodrusen (RPD) is a sub-phenotype of AMD with a high risk of progression to late, vision-threatening AMD. In a genome-wide association study of 2,165 AMD+/RPD+ and 4,181 AMD+/RPD- cases compared to 7,660 control participants, the major AMD risk loci on both chromosome 1 (CFH) and chromosome 10 (ARMS2/HTRA1) were reidentified. However, when comparing AMD+/RPD+ to AMD+/RPD- cases, association was detected only for the chromosome 10 locus. The chromosome 1 locus was notably absent. The chromosome 10 RPD risk region contains a long non-coding RNA (ENSG00000285955/BX842242.1) which colocalizes with genetic markers of retinal thickness. BX842242.1 has a strong retinal eQTL signal, pinpointing the parafoveal photoreceptor outer segment layer. Whole genome sequencing of phenotypically extreme RPD cases identified even stronger enrichment for the chromosome 10 risk genotype.