Abstract

Autism Spectrum Disorder (ASD) is a neurodevelopmental disorder characterized by difficulties with communication and social skills and by repetitive, stereotyped behaviours. Caregivers, clinicians, and psychiatrists carry out screening using gold-standard screening and diagnostic tools, which are often criticized as lengthy and time-consuming because of their number of items and their administration time. There is therefore a need to identify the most significant items in these tools so that autistic traits can be detected accurately and efficiently in a wider population. Current ASD screening instruments, such as the Quantitative Checklist for Autism in Toddlers (Q-CHAT) and the Autism-Spectrum Quotient (AQ), rely heavily on subjective cut-off scores that remain open to debate. This study aimed to improve the screening process by using unsupervised machine learning algorithms, specifically Principal Component Analysis (PCA) and K-means clustering, to detect early ASD symptoms and their severity, and Natural Language Processing (NLP) to examine the linguistic redundancies and construct validity of the AQ questionnaire. The study used a dataset of 1016 toddlers aged 16 to 36 months, divided into four groups: typically developing toddlers, toddlers whose parents reported ASD-specific concerns, toddlers at risk of autism because they have an older sibling with ASD, and toddlers with a developmental delay. PCA was applied to identify the most influential Q-CHAT items and to reduce dimensionality, and K-means clustering was used to distinguish ASD from non-ASD individuals in the dataset. The findings suggest that the most influential Q-CHAT items do not differ significantly among the three groups other than the typically developing toddlers, whereas typically developing toddlers exhibited different behaviour.
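The PCA and K-means pipeline described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: it uses NumPy only, runs on hypothetical synthetic data (400 simulated respondents answering 25 items scored 0 to 4, the Q-CHAT response format, with one simulated group scoring higher on items 0 to 7), and assumes that K-means is applied to the first two principal-component scores rather than to the raw items. The helper names `pca` and `kmeans` and the farthest-first seeding are illustrative choices.

```python
import numpy as np

def pca(X, n_components):
    """PCA via singular value decomposition of the column-centred data matrix."""
    Xc = X - X.mean(axis=0)
    _, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    explained = S**2 / np.sum(S**2)      # share of total variance per component
    loadings = Vt[:n_components]         # rows = components, columns = items
    scores = Xc @ loadings.T             # respondents projected onto components
    return scores, loadings, explained[:n_components]

def kmeans(Z, k, n_iter=100, seed=0):
    """Lloyd's K-means with farthest-first seeding (a simplification of k-means++)."""
    rng = np.random.default_rng(seed)
    centroids = [Z[rng.integers(len(Z))]]
    while len(centroids) < k:
        # Next seed: the point farthest from all seeds chosen so far.
        d = np.min([np.linalg.norm(Z - c, axis=1) for c in centroids], axis=0)
        centroids.append(Z[d.argmax()])
    centroids = np.array(centroids)
    for _ in range(n_iter):
        # Assign each respondent to its nearest centroid, then recompute centroids.
        dists = np.linalg.norm(Z[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        new = np.array([Z[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
                        for j in range(k)])
        if np.allclose(new, centroids):
            break
        centroids = new
    return labels, centroids

# Hypothetical synthetic responses: 25 items scored 0-4, two simulated groups of 200.
rng = np.random.default_rng(42)
n_items, n_per_group = 25, 200
group_a = rng.integers(0, 2, size=(n_per_group, n_items))  # scores 0 or 1 on every item
group_b = rng.integers(0, 2, size=(n_per_group, n_items))
group_b[:, :8] += 3                                         # scores 3 or 4 on items 0-7
X = np.vstack([group_a, group_b]).astype(float)
true_group = np.repeat([0, 1], n_per_group)

scores, loadings, explained = pca(X, n_components=2)
top_items = np.argsort(-np.abs(loadings[0]))[:8]            # largest |PC1 loading|
labels, _ = kmeans(scores, k=2)
# Cluster ids are arbitrary, so measure agreement up to a label swap.
agreement = max(np.mean(labels == true_group), np.mean(labels != true_group))

print("Most influential items on PC1:", sorted(top_items.tolist()))
print(f"PC1 share of variance: {explained[0]:.2f}")
print(f"Cluster/group agreement: {agreement:.2f}")
```

Ranking items by the absolute size of their loading on the first component is one simple way to call an item "influential"; the study may have used a different criterion or a different number of retained components.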
The study also found that NLP techniques such as Bag of Words and Bag of N-Grams can be used to extract sentiment polarity, linguistic peculiarities, and redundancies among the questions in the AQ questionnaire. The results indicate a bias in the way items are constructed that affects the questionnaire's ability to measure autistic traits, and that highly similar items could be combined into a single integrated question. Overall, this study proposes a more effective, valid, and robust ASD screening process based on unsupervised machine learning and NLP. The research opens the door to further investigation of the screening and diagnosis of ASD and other mental health conditions, and recommends further NLP research on ASD questionnaires to devise effective instruments for a wider population of autistic individuals. This study is the first of its kind to combine unsupervised machine learning and natural language processing to evaluate ASD screening questionnaires.
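The Bag of Words / Bag of N-Grams redundancy check mentioned above can be sketched in the same hedged way. This illustration uses only the Python standard library, represents each item as counts of word unigrams and bigrams, and flags item pairs whose cosine similarity reaches a threshold. The four items are hypothetical AQ-style statements written for this example (not quoted from the AQ), and the 0.5 threshold, the bigram limit, and the helper names are assumptions; the sketch does not attempt the sentiment-polarity analysis.

```python
import math
import re
from collections import Counter
from itertools import combinations

def bag_of_ngrams(text, n_max=2):
    """Count all word n-grams of length 1..n_max (plain Bag of Words when n_max=1)."""
    tokens = re.findall(r"[a-z']+", text.lower())
    grams = Counter()
    for n in range(1, n_max + 1):
        for i in range(len(tokens) - n + 1):
            grams[" ".join(tokens[i:i + n])] += 1
    return grams

def cosine(a, b):
    """Cosine similarity between two sparse count vectors stored as Counters."""
    dot = sum(a[g] * b[g] for g in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def redundant_pairs(items, threshold=0.5, n_max=2):
    """Return (i, j, similarity) for every item pair at or above the threshold."""
    bags = [bag_of_ngrams(t, n_max) for t in items]
    pairs = []
    for i, j in combinations(range(len(items)), 2):
        s = cosine(bags[i], bags[j])
        if s >= threshold:
            pairs.append((i, j, round(s, 3)))
    return pairs

# Hypothetical AQ-style items, written for illustration only.
items = [
    "I prefer to do things the same way every day.",
    "I prefer to do things the same way over and over again.",
    "I find it easy to make new friends.",
    "I enjoy meeting new people at social events.",
]
pairs = redundant_pairs(items, threshold=0.5)
for i, j, s in pairs:
    print(f"Items {i} and {j}: cosine similarity {s}")
```

Because raw n-gram counts ignore meaning, this kind of check catches near-verbatim rewording (items 0 and 1) but not paraphrases that share few words (items 2 and 3).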