Over the past five decades or so, the growing understanding that RNA can influence protein function through routes other than direct translation has opened the prospect of discovering small molecules for tackling diseases in novel ways. Drawing on sources such as the UK Biobank, researchers are now making progress in using multi-omics data to derive mechanism of action (MoA) insights into the effect of small molecules on RNA function, specifically the splice site selection process. However, such efforts are hindered by data management practices that were not designed for the large volumes of interconnected, complex data generated by RNA splicing experiments. These experiments rely on analysing changes in the distribution of transcript ranges along the genomic coordinate axis, and therefore require a flexible, scalable and shareable analysis platform. In this article, we describe how a vector-based database approach can allow data to be managed, interrogated, processed and shared much more effectively, offering a faster, less costly route to small-molecule insights, and hence future success in developing drugs for a range of conditions.
The Evolution in Drug Development
The approaches used for drug discovery have changed markedly over the last 50 years, as new enabling technologies have emerged and evolved. Prominent amongst these is the revolution in genomics and protein sequencing tools that enabled high-throughput screening of small-molecule libraries to become viable – and indeed this remains a widely used route today. However, powerful as this route is, it is time-consuming and resource-intensive, prompting scientists to develop new tools to speed up the identification of candidate drugs.
For example, advances in molecular modelling and computing power now enable in silico screening to play a major role in drug development, while improving capabilities in the analysis of ligand-protein complexes by NMR and XRD have aided the development of fragment-based lead discovery. Above and beyond this, and drawing on an ever-growing portfolio of analytical and biotechnological methods, biologics are now playing a major role as therapeutics, with peptides, antibodies, nucleic acids, blood components and cell therapies all seeing success in recent years.
A Promising Paradigm – RNA as a Drug Target
Joining this evolving portfolio of approaches to drug discovery is a paradigm – first suggested thirty years ago1 – in which RNA structures are targeted using small molecules, in a way that is analogous to the targeting of protein structures. Termed the ‘RNA revolution’,2 this approach has for a number of years been developed using antisense oligonucleotides and small interfering RNAs, which work by complementary base-pairing with the target and have helped researchers to identify therapeutically tractable target RNAs. However, small molecules are now receiving increased interest, thanks to their generally better pharmacological properties, their easily tunable structures and, in particular, their ability to ‘reprogram’ RNA processing.
This reprogramming capability opens the possibility of treating diseases in entirely new ways that overcome challenges associated with protein-based targets. Perhaps most importantly, by targeting RNAs directly, the activities of proteins that are difficult to drug, or prone to give rise to undesirable side effects, could be modulated. This potential is already beginning to be realized, with the small-molecule drug risdiplam (Evrysdi™), developed by Roche/Genentech in collaboration with PTC Therapeutics, having received FDA approval in 2020 for the treatment of spinal muscular atrophy.3
Addressing the Challenge of Data Processing
There are two interconnected data analysis challenges in RNA-targeted therapy development: (1) identifying disease-modifying intervention points in the RNA splicing process, and (2) detecting alterations in splicing at the genome scale, both broadly (selectivity) and specifically at those disease-modifying intervention points. Developing RNA-targeting small molecules as drugs first requires the identification of areas on the mRNA molecule that are relevant to the disease or trait in question. This can be done using genome- or phenome-wide association studies (GWAS and PheWAS) in conjunction with quantitative trait loci (QTL), to identify mutations in specific regions of mRNA that occur more frequently in people with that disease or trait than in those without it.
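To make the idea of a mutation occurring "more frequently in people with the disease" concrete, the following is a minimal sketch of a single-variant case-control comparison. It is an illustration only, not the pipeline used in this work: the function names and the allele counts are hypothetical, and real GWAS and PheWAS analyses typically use regression models with covariates and correct for testing many variants at once.

```python
from math import comb


def odds_ratio(case_alt, case_ref, ctrl_alt, ctrl_ref):
    """Odds of carrying the alternate allele in cases relative to controls."""
    return (case_alt * ctrl_ref) / (case_ref * ctrl_alt)


def fisher_exact_greater(case_alt, case_ref, ctrl_alt, ctrl_ref):
    """One-sided Fisher's exact p-value for a 2x2 allele-count table.

    Returns the probability of observing at least `case_alt` alternate
    alleles among cases if allele and disease status were unrelated
    (hypergeometric distribution with fixed row and column totals).
    """
    n_case = case_alt + case_ref            # alleles sampled from cases
    n_alt = case_alt + ctrl_alt             # alternate alleles overall
    n_total = n_case + ctrl_alt + ctrl_ref  # all alleles in the table
    upper = min(n_case, n_alt)              # largest possible case_alt
    favourable = sum(
        comb(n_alt, k) * comb(n_total - n_alt, n_case - k)
        for k in range(case_alt, upper + 1)
    )
    return favourable / comb(n_total, n_case)


if __name__ == "__main__":
    # Hypothetical allele counts for one variant (not real data).
    counts = dict(case_alt=30, case_ref=70, ctrl_alt=10, ctrl_ref=90)
    print("odds ratio:", odds_ratio(**counts))
    print("one-sided p-value:", fisher_exact_greater(**counts))
```

An odds ratio above 1 together with a small p-value would flag the variant as enriched in cases, which is the kind of signal used to prioritise regions of mRNA for closer study.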