A Robust Unified Graph Model Based on Molecular Data Binning for Subtype Discovery in High-dimensional Spaces

PhD Thesis


Hassan Zada, M. 2023. A Robust Unified Graph Model Based on Molecular Data Binning for Subtype Discovery in High-dimensional Spaces. PhD Thesis. University of Derby, School of Computing and Engineering. https://doi.org/10.48773/q033x
Authors: Hassan Zada, M.
Type: PhD Thesis
Qualification name: Doctor of Philosophy (PhD)
Abstract

Machine learning (ML) is a subfield of artificial intelligence (AI) that has already revolutionised the world around us. It is a widely employed process for discovering patterns and groups within datasets. It has a wide range of applications including disease subtyping, which aims to discover intrinsic subtypes of disease in large-scale unlabelled data. Whilst the groups discovered in multi-view high-dimensional data by ML algorithms are promising, their capacity to identify pertinent and meaningful groups is limited by the presence of data variability and outliers. Since outlier values represent potential but unlikely outcomes, they are statistically and philosophically fascinating.

Therefore, the primary aim of this thesis was to propose a robust approach that discovers meaningful groups while considering the presence of data variability and outliers in the data. To achieve this aim, a novel robust approach (ROMDEX) was developed that utilised the proposed intermediate graph models (IMGs) for robust computation of proximity between observations in the data. Finally, a robust multi-view graph-based clustering approach was developed based on ROMDEX that improved the discovery of meaningful groups that were hidden behind the noise in the data.

The proposed approach was validated on real-world and synthetic data for disease subtyping. Additionally, the stability of the approach was assessed by evaluating its performance across different levels of noise in the clustering data. The results were evaluated through Kaplan-Meier survival time analysis for disease subtyping. The concordance index (CI) and normalised mutual information (NMI) were used to evaluate the predictive ability of the proposed clustering model, and the accuracy, Kappa statistic and Rand index were computed to evaluate the clustering stability against various levels of Gaussian noise. The proposed approach outperformed the existing state-of-the-art approaches MRGC, PINS, SNF, Consensus Clustering, and iCluster+ on these datasets. The findings for all datasets were outstanding, demonstrating the predictive ability of the proposed unsupervised graph-based clustering approach.

Keywords: Machine Learning; Artificial Intelligence; Clustering; Disease Subtyping; Genomics Analytics; Stratification; Precision Medicine; Data Science; Survival Analysis; Robust; Spectral Clustering; Hypothesis Testing; Exploratory Data Analysis; Data Integration; Network Fusion; Multiview data; Multimodal; Unsupervised Learning
Year: 2023
Publisher: College of Science and Engineering, University of Derby
Digital Object Identifier (DOI): https://doi.org/10.48773/q033x
Publication process dates
Deposited: 22 Aug 2023
Permalink: https://repository.derby.ac.uk/item/q033x/a-robust-unified-graph-model-based-on-molecular-data-binning-for-subtype-discovery-in-high-dimensional-spaces

Download files


File: Sadiq-PhD-Thesis-Amended-Final.pdf
License: CC BY-NC-ND 4.0
File access level: Open


Related outputs

A unified graph model based on molecular data binning for disease subtyping
Hassan Zada, M., Yuan, B., Khan, W., Anjum, A., Reiff-Marganiec, S. and Saleem, R. 2022. A unified graph model based on molecular data binning for disease subtyping. Journal of Biomedical Informatics. pp. 1-24. https://doi.org/10.1016/j.jbi.2022.104187
Large-scale Data Integration Using Graph Probabilistic Dependencies (GPDs)
Zada, Muhammad Sadiq Hassan, Yuan, Bo, Anjum, Ashiq, Azad, Muhammad Ajmal, Khan, Wajahat Ali and Reiff-Marganiec, Stephan 2020. Large-scale Data Integration Using Graph Probabilistic Dependencies (GPDs). IEEE. https://doi.org/10.1109/bdcat50828.2020.00028