Application of Machine Learning in Autism Spectrum Disorder Screening Final (2)

PhD Thesis

Qureshi, S. 2023. Application of Machine Learning in Autism Spectrum Disorder Screening Final (2). PhD Thesis. University of Derby, College of Computing and Mathematics.
Authors: Qureshi, S.
Type: PhD Thesis
Qualification name: Doctor of Philosophy

Autism Spectrum Disorder (ASD) is a neurodevelopmental disorder that manifests in difficulties with communication and social skills, and in repetitive and stereotypical behaviours. Caregivers, clinicians, and psychiatrists carry out the screening process using gold-standard screening and diagnostic tools. These tools are often criticized as too lengthy and time-consuming because of their number of items and administration time. There is a need to identify the most significant features of these tools so that autistic traits can be detected in a wider population accurately and
The current screening instruments for ASD, such as the Quantitative Checklist for Autism in Toddlers (Q-CHAT) and the Autism Quotient (AQ), rely heavily on subjective cut-off scores that remain open to debate. The aim of this study was to improve the screening process using unsupervised machine learning algorithms, specifically Principal Component Analysis (PCA) and K-means clustering, to detect early ASD symptoms and severity, and using Natural Language Processing (NLP) to examine the linguistic redundancies and construct validity of the AQ questionnaire.
This study used a dataset of 1016 children aged 16 to 36 months, divided into four
groups: typically developing toddlers, toddlers with parent-reported ASD-specific
concerns, toddlers at risk for autism due to having an older sibling with ASD, and
toddlers with a developmental delay. PCA was applied to detect the most influential
items in the Q-CHAT questionnaire and reduce the dimensionality, and K-means
clustering was employed to distinguish between ASD and non-ASD individuals in
the dataset.
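
As a rough illustration of the PCA and K-means steps described above, the following is a minimal sketch in plain Python. It is not the thesis's implementation: the score matrix, the number of items, and all function names are hypothetical, and only the first principal component and two clusters are considered. In practice a library such as scikit-learn would normally be used instead.

```python
from statistics import mean

# Hypothetical toy data: 8 toddlers x 4 questionnaire items, each scored 0-4.
# These numbers are invented for illustration and are not from the thesis dataset.
SCORES = [
    [0, 1, 0, 1], [1, 0, 1, 2], [0, 0, 1, 1], [1, 1, 0, 2],
    [4, 3, 4, 2], [3, 4, 3, 1], [4, 4, 3, 2], [3, 3, 4, 1],
]

def center(rows):
    """Subtract each item's mean so PCA describes variation around the average."""
    means = [mean(col) for col in zip(*rows)]
    return [[x - m for x, m in zip(row, means)] for row in rows]

def covariance(centered):
    """Sample covariance matrix of the (already centered) item scores."""
    n, d = len(centered), len(centered[0])
    return [[sum(r[i] * r[j] for r in centered) / (n - 1) for j in range(d)]
            for i in range(d)]

def first_component(cov, iters=200):
    """Leading eigenvector by power iteration. Assumes a clearly dominant
    component that is not orthogonal to the uniform start vector."""
    d = len(cov)
    v = [1.0 / d ** 0.5] * d
    for _ in range(iters):
        w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = sum(x * x for x in w) ** 0.5
        v = [x / norm for x in w]
    return v

def two_means_1d(values, iters=20):
    """K-means with k=2 on one-dimensional data, initialised at min and max."""
    lo, hi = min(values), max(values)
    labels = []
    for _ in range(iters):
        labels = [0 if abs(x - lo) <= abs(x - hi) else 1 for x in values]
        group0 = [x for x, lab in zip(values, labels) if lab == 0]
        group1 = [x for x, lab in zip(values, labels) if lab == 1]
        if not group0 or not group1:
            break
        lo, hi = mean(group0), mean(group1)
    return labels

centered = center(SCORES)
loadings = first_component(covariance(centered))
# Items with the largest absolute loading contribute most to the component.
ranked_items = sorted(range(len(loadings)), key=lambda i: -abs(loadings[i]))
# Project each toddler onto the component, then cluster the projections.
projections = [sum(x * w for x, w in zip(row, loadings)) for row in centered]
labels = two_means_1d(projections)

print("items ranked by |loading|:", ranked_items)
print("cluster labels:", labels)
```

In this toy setup the fourth item barely varies between the two invented groups, so it receives the smallest loading; that is the sense in which PCA loadings can flag less influential items.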
The findings suggest that there is no significant difference in the most influential Q-CHAT items among the three groups with ASD-related concerns, familial risk, or developmental delay, whereas typically developing toddlers exhibited different behaviour. The study also found that NLP techniques such as Bag of Words and Bag of N-Grams could be used to extract sentiment polarity, linguistic peculiarities, and redundancies between the questions in the AQ questionnaire. The results indicate that there is a bias in the way items are constructed that affects the ability to measure autistic traits among individuals, and highly similar sentences on the questionnaire could be combined to form an integral
Overall, this study proposes a more effective, valid, and robust screening process for ASD using unsupervised machine learning algorithms and NLP. The research opens the door to further investigation into the screening and diagnosis of ASD, as well as other mental health issues, and recommends more research on applying NLP to ASD questionnaires in order to devise effective questionnaires for a wider population with autism. This study is the first of its kind to combine unsupervised learning and natural language processing algorithms to evaluate ASD screening questionnaires.
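
The idea of using Bag of N-Grams to find redundant questionnaire items can be sketched as follows. This is an illustrative example only, under stated assumptions: the three sentences are invented and are not items from the AQ, the function names are hypothetical, and cosine similarity over unigram and bigram counts is one plausible similarity measure rather than the one confirmed by the thesis.

```python
import math
import re
from collections import Counter
from itertools import combinations

# Hypothetical questionnaire-style sentences, invented for illustration.
ITEMS = [
    "I find it easy to talk to new people.",
    "I find it hard to talk to new people.",
    "I enjoy collecting information about numbers.",
]

def bag_of_ngrams(text, n_max=2):
    """Count all word n-grams of length 1..n_max in a lowercased sentence."""
    tokens = re.findall(r"[a-z']+", text.lower())
    grams = Counter()
    for n in range(1, n_max + 1):
        for i in range(len(tokens) - n + 1):
            grams[" ".join(tokens[i:i + n])] += 1
    return grams

def cosine(a, b):
    """Cosine similarity between two n-gram count vectors."""
    dot = sum(a[g] * b[g] for g in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def most_similar_pair(items):
    """Return the index pair of the two most similar items and their score."""
    bags = [bag_of_ngrams(s) for s in items]
    best = max(combinations(range(len(items)), 2),
               key=lambda p: cosine(bags[p[0]], bags[p[1]]))
    return best, cosine(bags[best[0]], bags[best[1]])

pair, score = most_similar_pair(ITEMS)
print("most similar items:", pair, "similarity:", round(score, 3))
```

The first two invented sentences differ by a single word, so they share most of their unigrams and bigrams and score far higher than either does against the third. Note that such a pair can be lexically near-identical while opposite in meaning, so a high score flags candidates for review rather than items that can be merged automatically.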

Keywords: Autism Spectrum Disorder, ASD, Unsupervised Learning, Natural Language Processing, NLP, Q-CHAT, Autism Quotient
Publisher: College of Science and Engineering, University of Derby
Output status: Unpublished
Publication process dates
Deposited: 18 Aug 2023
