Graph data modelling for genomic variants
Journal article
Authors | Anjum, Ashiq and Aizad, Sanna |
---|---|
Abstract | Genome variant analysis is performed on Variant Call Format (VCF) files. Processing these files for genome analytics can take days, because the files must be loaded for each user query and then processed to answer the questions of interest. As data sizes grow, timely processing of this data puts enormous pressure on computational resources, leads to significant processing delays, and may jeopardise the ultimate goal of bringing genomic discoveries to the masses. We believe this problem will not be solved until the underlying data structure used to organise and process these files undergoes a transformation. To overcome this problem, we propose a graph-based system to represent the data in VCF files. This allows the data to be loaded once into a graph model, which can then be queried and processed numerous times without any additional computational or data access penalties. This reduces data access time by giving constant-time access to any node, and addresses the performance and scalability challenges that have been a limiting factor for the mass-scale adoption of genome analytics. Accessing any data node in our graph model takes only 2 ms, and this access time remains constant for any number of nodes. |
Keywords | graph model; genome; Variant Analysis |
Year | 2019 |
Publisher | IEEE |
Web address (URL) | http://hdl.handle.net/10545/624334 (hdl:10545/624334) |
Publication dates | 2019 |
Publication process dates | |
Deposited | 13 Dec 2019, 12:03 |
Accepted | 01 Jul 2019 |
Contributors | University of Derby |
File access level | Open |
https://repository.derby.ac.uk/item/947q5/graph-data-modelling-for-genomic-variants
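
The abstract's central idea, loading VCF data once into a graph and then reading any node in constant time, can be illustrated with a small sketch. This is not the authors' system or schema: the `VariantGraph` class, the `load_vcf` function, the node keys, and the three sample records are all hypothetical, and the constant-time behaviour shown here is simply the average-case cost of a Python dictionary lookup.

```python
from collections import defaultdict

# Hypothetical, minimal VCF content: header lines plus three records, one sample.
VCF_TEXT = """\
##fileformat=VCFv4.2
#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tsample1
1\t10177\trs367896724\tA\tAC\t100\tPASS\t.\tGT\t0|1
1\t10352\trs555500075\tT\tTA\t100\tPASS\t.\tGT\t1|1
2\t10616\t.\tC\tG\t50\tPASS\t.\tGT\t0|0
"""


class VariantGraph:
    """Illustrative in-memory graph; both stores are hash maps."""

    def __init__(self):
        self.nodes = {}                 # node id -> attribute dict
        self.edges = defaultdict(set)   # node id -> set of neighbour ids

    def add_node(self, node_id, **attrs):
        self.nodes.setdefault(node_id, {}).update(attrs)

    def add_edge(self, a, b):
        self.edges[a].add(b)
        self.edges[b].add(a)


def load_vcf(text):
    """Parse VCF text once and return a graph of chromosomes, variants and samples.

    Assumption: one ALT allele per record (multi-allelic sites are not split).
    """
    graph = VariantGraph()
    samples = []
    for line in text.splitlines():
        if line.startswith("##") or not line.strip():
            continue  # skip meta-information and blank lines
        fields = line.split("\t")
        if line.startswith("#CHROM"):
            samples = fields[9:]  # sample names follow the FORMAT column
            for sample in samples:
                graph.add_node(("sample", sample))
            continue
        chrom, pos, variant_id, ref, alt = fields[:5]
        variant = ("variant", chrom, int(pos), ref, alt)
        graph.add_node(variant, id=variant_id, qual=fields[5], filter=fields[6])
        graph.add_node(("chrom", chrom))
        graph.add_edge(("chrom", chrom), variant)
        for sample, call in zip(samples, fields[9:]):
            genotype = call.split(":")[0]  # GT is the first FORMAT key here
            alleles = genotype.replace("|", "/").split("/")
            # Link a sample to a variant only if it carries a non-reference allele.
            if any(allele not in ("0", ".") for allele in alleles):
                graph.add_edge(("sample", sample), variant)
    return graph


# Load once, then query as often as needed without re-reading the file.
graph = load_vcf(VCF_TEXT)
node = graph.nodes[("variant", "1", 10177, "A", "AC")]   # direct lookup by key
carried = graph.edges[("sample", "sample1")]             # variants carried by sample1
```

Keying nodes by `(chrom, pos, ref, alt)` is one possible design choice: it lets a query jump straight to a variant without scanning the file, and the adjacency sets answer "which variants does this sample carry" without touching unrelated records.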