The United Nations estimates that the world population will continue to grow, projecting approximately 8.5 billion people in 2030, 9.7 billion in 2050 and 10.9 billion in 2100. In addition to population growth, the United Nations also estimates that in 2050 about 70% of the world population will live in cities. These trends increase the complexity of the services that public administrations and private companies must provide to citizens in order to optimise resources and improve quality of life. Designing, implementing and managing these services adequately requires a substantial effort towards effective solutions for data collection and analysis that apply Data Science and Artificial Intelligence techniques.
This thesis explores several approaches to these challenges and presents real-world use cases in which the proposed work was tested and validated.
The first part of the thesis focuses on the analysis of data collected through crowdsourcing. The real case study used for the analyses was a study conducted in Sheffield whose goal was to understand people's interaction with green areas and their wellbeing. In this study, an app with a chatbot asked participants questions targeted to the study and collected not only their subjective answers but also objective data such as their location. Analysing these data made it possible to extract insights that would not be easily obtainable otherwise. However, some limitations emerged: in less frequented areas, not enough information was collected for the insights found to be statistically significant, whereas in the most frequented areas more information than necessary was collected. For this reason, a framework was developed that analyses, in real time, the amount of information collected and its statistical significance. This framework increases the efficiency of the study and reduces its intrusiveness towards participants. The main limitation of this approach is the small amount of data that can be acquired.
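The significance criterion used by the framework is not specified here, so the following is only a minimal Python sketch of how such a real-time check could work. It assumes that each area of interest collects yes/no answers, that the quantity of interest is a proportion, and that "enough data" means the conservative normal-approximation sample size n = z²p(1−p)/E² with p = 0.5 has been reached. The class `AreaSampler`, its methods and the 10% default margin are hypothetical illustrations, not the framework described in the thesis.

```python
import math
from collections import defaultdict
from typing import Optional

Z_95 = 1.96  # z-score for a two-sided 95% confidence level


def required_sample_size(margin: float, p: float = 0.5, z: float = Z_95) -> int:
    """Smallest n whose normal-approximation margin of error for a proportion is <= margin.

    p = 0.5 maximises p * (1 - p), so the default gives a conservative (worst-case) size.
    """
    return math.ceil(z * z * p * (1 - p) / (margin * margin))


class AreaSampler:
    """Hypothetical real-time tracker of yes/no answers per area of interest."""

    def __init__(self, margin: float = 0.1, z: float = Z_95):
        # Number of answers after which an area is considered sufficiently sampled.
        self.target_n = required_sample_size(margin, z=z)
        self.totals = defaultdict(int)
        self.positives = defaultdict(int)

    def record(self, area: str, answer: bool) -> None:
        """Store one participant answer collected in the given area."""
        self.totals[area] += 1
        self.positives[area] += int(answer)

    def should_prompt(self, area: str) -> bool:
        """Keep asking in this area only while the sample is still too small."""
        return self.totals[area] < self.target_n

    def estimate(self, area: str) -> Optional[float]:
        """Observed proportion of positive answers, or None if nothing was recorded."""
        n = self.totals[area]
        return self.positives[area] / n if n else None
```

With a 10% margin at 95% confidence, an area stops triggering new questions after 97 answers; tightening the margin to 5% raises the threshold to 385. Under these assumptions, frequented areas stop prompting early (reducing intrusiveness), while prompting continues in sparsely sampled areas.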
The second part of the thesis moves on to passive data collection, in which the user does not have to interact in any way. All data are pseudonymised upon capture so that privacy legislation is respected. A system is then presented that collects the probe requests that Wi-Fi devices transmit while scanning radio channels to detect Access Points. The system processes the collected data to extract key information on people's mobility, such as crowd density per area of interest, people flows, permanence time, return time, heat maps, origin-destination matrices and estimates of people's locations.
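As an illustration of how pseudonymised probe-request detections could be turned into some of the indicators listed above, the following Python sketch assumes that each capture is reduced to a (pseudonymised device, area, timestamp) triple. The keyed-hash pseudonymisation with HMAC-SHA256, the 300-second gap that separates two visits, the `Detection` record and all function names are assumptions made for the example, not the system presented in the thesis.

```python
import hashlib
import hmac
from collections import Counter, defaultdict
from dataclasses import dataclass

# Hypothetical secret key; in practice it would be kept secret and rotated or discarded.
SECRET_KEY = b"example-key-rotate-regularly"


def pseudonymise(mac: str, key: bytes = SECRET_KEY) -> str:
    """Keyed hash of a MAC address, so the raw address never needs to be stored."""
    normalised = mac.lower().replace("-", ":").encode()
    return hmac.new(key, normalised, hashlib.sha256).hexdigest()[:16]


@dataclass(frozen=True)
class Detection:
    device: str  # pseudonymised identifier
    area: str    # area of interest whose sensor captured the probe request
    ts: float    # capture time in seconds


def split_visits(detections, max_gap=300.0):
    """Group each device's detections into visits (area, start, end).

    Consecutive detections in the same area at most max_gap seconds apart
    belong to the same visit.
    """
    by_device = defaultdict(list)
    for d in detections:
        by_device[d.device].append(d)
    visits = {}
    for device, items in by_device.items():
        items.sort(key=lambda d: d.ts)
        vs = []
        for d in items:
            if vs and vs[-1][0] == d.area and d.ts - vs[-1][2] <= max_gap:
                vs[-1] = (d.area, vs[-1][1], d.ts)  # extend the current visit
            else:
                vs.append((d.area, d.ts, d.ts))  # start a new visit
        visits[device] = vs
    return visits


def crowd_count(detections, start, end):
    """Number of distinct devices seen per area in the window [start, end)."""
    seen = defaultdict(set)
    for d in detections:
        if start <= d.ts < end:
            seen[d.area].add(d.device)
    return {area: len(devs) for area, devs in seen.items()}


def permanence_times(visits):
    """Duration in seconds of every visit, grouped by area."""
    out = defaultdict(list)
    for vs in visits.values():
        for area, t0, t1 in vs:
            out[area].append(t1 - t0)
    return dict(out)


def return_times(visits):
    """Seconds between leaving an area and the same device's next visit to it."""
    out = defaultdict(list)
    for vs in visits.values():
        last_end = {}
        for area, t0, t1 in vs:
            if area in last_end:
                out[area].append(t0 - last_end[area])
            last_end[area] = t1
    return dict(out)


def origin_destination(visits):
    """Count transitions between consecutive visits in different areas."""
    od = Counter()
    for vs in visits.values():
        for (a, _, _), (b, _, _) in zip(vs, vs[1:]):
            if a != b:
                od[(a, b)] += 1
    return od
```

For example, a device seen at a hypothetical "station" area at t = 0 s and t = 120 s and then at a "square" area at t = 600 s yields a 120-second permanence at the station and one station→square entry in the origin-destination matrix. Keeping only pseudonymised identifiers from the first function onwards means that none of the indicators needs the raw MAC address.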
With respect to the state of the art, the main novelties are new indicators needed by key city services, such as safety management and passenger transport, and experimental activities carried out in real scenarios. Furthermore, a de-randomisation algorithm is presented to address the problem of MAC address randomisation.
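The de-randomisation algorithm itself is not described in this summary. As a hedged sketch of the general idea only, the Python code below flags randomised addresses through the locally administered bit (0x02 of the first octet) and then chains randomised MACs that share a fingerprint of the probe request's information elements and appear one after another within a short gap. The `fingerprint` field, the 60-second gap and the chaining heuristic are assumptions for illustration, not the algorithm proposed in the thesis; note also that the locally administered bit can be set for reasons other than randomisation.

```python
from dataclasses import dataclass


def is_randomised(mac: str) -> bool:
    """True if the locally administered bit (0x02 of the first octet) is set."""
    first_octet = int(mac.replace("-", ":").split(":")[0], 16)
    return bool(first_octet & 0x02)


@dataclass(frozen=True)
class Probe:
    mac: str
    ts: float
    fingerprint: str  # hypothetical digest of the probe request's information elements


def derandomise(probes, max_gap=60.0):
    """Return a mapping MAC -> cluster id (one id per presumed physical device).

    Globally unique MACs get their own cluster. A randomised MAC joins an
    existing chain with the same fingerprint if it starts at most max_gap
    seconds after that chain's last MAC stopped being seen (no overlap);
    otherwise it starts a new chain.
    """
    spans, fingerprint_of = {}, {}
    for p in probes:
        t0, t1 = spans.get(p.mac, (p.ts, p.ts))
        spans[p.mac] = (min(t0, p.ts), max(t1, p.ts))
        fingerprint_of[p.mac] = p.fingerprint  # assumes one fingerprint per MAC

    cluster, chains_by_fp, next_id = {}, {}, 0
    for mac in sorted(spans, key=lambda m: spans[m][0]):
        start, end = spans[mac]
        if not is_randomised(mac):
            cluster[mac] = next_id
            next_id += 1
            continue
        chains = chains_by_fp.setdefault(fingerprint_of[mac], [])
        # Candidate chains ended before this MAC appeared, within max_gap seconds.
        candidates = [c for c in chains if 0 <= start - c[1] <= max_gap]
        if candidates:
            chain = min(candidates, key=lambda c: start - c[1])  # closest in time
            chain[1] = end
        else:
            chain = [next_id, end]
            next_id += 1
            chains.append(chain)
        cluster[mac] = chain[0]
    return cluster
```

Requiring non-overlapping time spans reflects the assumption that a device uses one random address at a time; real probe-request traffic would need a more robust fingerprint and tolerance for overlapping addresses.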