Cognitive spammer: A Framework for PageRank analysis with Split by Over-sampling and Train by Under-fitting

Journal article


Makkar, A. and Kumar, N. 2018. Cognitive spammer: A Framework for PageRank analysis with Split by Over-sampling and Train by Under-fitting. Future Generation Computer Systems. 90 (January 2019), pp. 381-404. https://doi.org/10.1016/j.future.2018.07.046
Authors: Makkar, A. and Kumar, N.
Abstract

Over the past few years, the Internet of Things (IoT), one of the most popular technologies of the modern era, has grown exponentially. In IoT, various objects perform sensing, communication, and computation to provide uninterrupted services (e.g., e-health, e-transportation, security access) to end users. Cognitive Internet of Things (CIoT) is another IoT paradigm, developed to enhance the intelligence of IoT objects so that they can take independent decisions in any environment. IoT follows the service oriented architecture (SOA), in which the application layer is the topmost layer; it enables IoT objects to interact with other objects located across the globe. The ability of these objects to learn, think, and understand can make information access more accurate and reliable, but Web spam remains one of the challenges in accessing information from the web. The literature review shows that people mostly prefer search engines for accessing information. Efficient ranking by search engines can reduce the computational cost of information exchange by IoT objects. Search engines should be able to prevent spam from being injected into the web, but existing techniques aim to find spam only after it has appeared in search engine result pages. In this proposal, we present an intelligent cognitive spammer framework, Cognitive spammer, which eliminates spam pages during the web page rank score calculation by search engines. The framework updates Google's ranking algorithm, PageRank, so that it automatically prevents link spam by considering the link structure of the web when computing rank scores. The updated PageRank algorithm provides a better ranking of web pages. The proposed framework is validated with the WEBSPAM-UK2007 dataset.
Before processing, the dataset is preprocessed with a new technique, called ‘Split by Over-sampling and Train by Under-fitting’, to address the imbalance between instances of the target class. After data cleaning, we applied machine learning techniques (Bagged model, Boosted linear model, etc.) to the web page features to make accurate predictions. The detection classifiers consider only the link features of a web page, irrespective of the page content. Out of the fifteen classifiers, the best three are combined into an ensemble, which improves overall accuracy. Ten-fold cross validation was also applied to the resulting ensemble model, giving an accuracy of 99.6% for the proposed scheme.
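
The abstract's idea of leaving spam pages out of the rank score calculation can be illustrated with a minimal sketch. This is not the authors' updated PageRank, whose details are not given in this record; it is standard power-iteration PageRank run on a link graph from which pages already flagged as spam have been dropped. The function name `pagerank_without_spam`, the `spam` argument, and the toy graph are assumptions made for illustration only.

```python
def pagerank_without_spam(links, spam=frozenset(), damping=0.85,
                          tol=1e-12, max_iter=200):
    """Standard PageRank on the link graph with flagged pages removed.

    links maps a page to the pages it links to. spam is a set of pages
    assumed to have been flagged by some upstream classifier (hypothetical).
    """
    # Collect every page that appears as a source or a target.
    pages = set(links)
    for targets in links.values():
        pages.update(targets)
    # Drop flagged pages, then keep only edges between surviving pages.
    pages = sorted(pages - set(spam))
    if not pages:
        return {}
    keep = set(pages)
    out = {p: [q for q in links.get(p, ()) if q in keep] for p in pages}

    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(max_iter):
        # Rank held by pages with no out-links is spread uniformly.
        dangling = sum(rank[p] for p in pages if not out[p])
        base = (1.0 - damping) / n + damping * dangling / n
        new = {p: base for p in pages}
        for p in pages:
            if out[p]:
                share = damping * rank[p] / len(out[p])
                for q in out[p]:
                    new[q] += share
        delta = sum(abs(new[p] - rank[p]) for p in pages)
        rank = new
        if delta < tol:
            break
    return rank


# Toy graph: s1 and s2 form a small link farm that points at page "t".
toy_links = {
    "a": ["b"],
    "b": ["a", "t"],
    "t": ["a"],
    "s1": ["t", "s2"],
    "s2": ["t", "s1"],
}
plain = pagerank_without_spam(toy_links)
filtered = pagerank_without_spam(toy_links, spam={"s1", "s2"})
```

In this toy graph, the score of "t" relative to "a" is lower in `filtered` than in `plain`, because the links from the two flagged pages no longer contribute to it.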

Keywords: Cognitive IoT; Internet of Things (IoT); PageRank; Web spam; Service oriented architecture (SOA)
Year: 2018
Journal: Future Generation Computer Systems
Journal citation: 90 (January 2019), pp. 381-404
Publisher: Elsevier
ISSN: 0167-739X
Digital Object Identifier (DOI): https://doi.org/10.1016/j.future.2018.07.046
Web address (URL): http://www.scopus.com/inward/record.url?eid=2-s2.0-85051984719&partnerID=MN8TOARS
Output status: Published
Publication dates
Online: 06 Aug 2018
Publication process dates
Accepted: 22 Jul 2018
Deposited: 22 May 2023
Permalink: https://repository.derby.ac.uk/item/9yx39/cognitive-spammer-a-framework-for-pagerank-analysis-with-split-by-over-sampling-and-train-by-under-fitting

Related outputs

Preserving Accuracy in Federated Learning via Equitable Model and Efficient Aggregation
Mehdi, M., Makkar, A., Conway, M. and Sama, L. 2024. Preserving Accuracy in Federated Learning via Equitable Model and Efficient Aggregation. International Conference on Recent Trends in Image Processing and Pattern Recognition. Springer Nature. https://doi.org/10.1007/978-3-031-53082-1_7
Exploring Imaging Biomarkers for Early Detection of Alzheimer’s Disease Using Deep Learning: A Comprehensive Analysis
Sami, N., Makkar, A., Meziane, F. and Conway, M. 2024. Exploring Imaging Biomarkers for Early Detection of Alzheimer’s Disease Using Deep Learning: A Comprehensive Analysis. International Conference on Recent Trends in Image Processing and Pattern Recognition. Springer. https://doi.org/10.1007/978-3-031-53085-2_17
Advancements in enhancing cyber-physical system security: Practical deep learning solutions for network traffic classification and integration with security technologies
Gaba, S., Budhiraja, S., Kumar, V. and Makkar, A. Advancements in enhancing cyber-physical system security: Practical deep learning solutions for network traffic classification and integration with security technologies. Communications in Analysis and Mechanics. 21 (1), pp. 1527-155. https://doi.org/10.3934/mbe.2024066
SecureFed: federated learning empowered medical imaging technique to analyze lung abnormalities in chest X‑rays
Makkar, A. and Santosh, K. C. 2023. SecureFed: federated learning empowered medical imaging technique to analyze lung abnormalities in chest X‑rays. International Journal of Machine Learning and Cybernetics. 14, pp. 2659–2670. https://doi.org/10.1007/s13042-023-01789-7
A Fuzzy-based approach to Enhance Cyber Defence Security for Next-generation IoT
Makkar, A., Ghosh, U., Sharma, P.K. and Javed, A. 2023. A Fuzzy-based approach to Enhance Cyber Defence Security for Next-generation IoT. IEEE Internet of Things Journal. Vol 10 (Issue 3), pp. 2079-2086. https://doi.org/10.1109/jiot.2021.3053326
Quantum Machine Learning Driven Malicious User Prediction for Cloud Network Communications
Gupta, R., Saxena, R., Gupta, I., Makkar, A. and Sing, A. K. 2022. Quantum Machine Learning Driven Malicious User Prediction for Cloud Network Communications. IEEE Networking Letters. 4 (4), pp. 174-178. https://doi.org/10.1109/LNET.2022.3200724
SecureEngine: Spammer classification in cyber defence for leveraging green computing in Sustainable city
Aaisha Makkar 2022. SecureEngine: Spammer classification in cyber defence for leveraging green computing in Sustainable city. Sustainable Cities and Society. 79 (April 2022), p. 103658. https://doi.org/10.1016/j.scs.2021.103658
SecureIIoT Environment: Federated Learning Empowered Approach for Securing IIoT From Data Breach
Aaisha Makkar, Tae Woo Kim, Ashutosh Kumar Singh, Jungho Kang and Jong Hyuk Park 2022. SecureIIoT Environment: Federated Learning Empowered Approach for Securing IIoT From Data Breach. IEEE Transactions on Industrial Informatics. 18 (9), pp. 6406-6414. https://doi.org/10.1109/tii.2022.3149902
Visualization and deep-learning-based malware variant detection using OpCode-level features
Darem, A., Abawajy, J., Makkar, A., Alhashmi, A. and Alanazi, S. 2021. Visualization and deep-learning-based malware variant detection using OpCode-level features. Future Generation Computer Systems. Vol 125 (Dec 2021), pp. 314-323. https://doi.org/10.1016/j.future.2021.06.032
An Efficient Spam Detection Technique for IoT Devices Using Machine Learning
Makkar, A., Garg, S., Kumar, N., Hossain, M.S., Ghoneim, A. and Alrashoud, M. 2021. An Efficient Spam Detection Technique for IoT Devices Using Machine Learning. IEEE Transactions on Industrial Informatics. Vol 17 (Issue 2), pp. 903-912. https://doi.org/10.1109/tii.2020.2968927
PROTECTOR: An optimized deep learning-based framework for image spam detection and prevention
Makkar, A. and Kumar, N. 2021. PROTECTOR: An optimized deep learning-based framework for image spam detection and prevention. Future Generation Computer Systems. 125 (Dec 2021), pp. 41-58. https://doi.org/10.1016/j.future.2021.06.026
Artificial Intelligence and Edge Computing-enabled Web Spam Detection for Next Generation IoT Applications
Makkar, A., Ghosh, U. and Sharma, P.K. 2021. Artificial Intelligence and Edge Computing-enabled Web Spam Detection for Next Generation IoT Applications. IEEE Sensors Journal. 21 (Issue 22), pp. 25352-25361. https://doi.org/10.1109/jsen.2021.3066492
FedLearnSP: Preserving Privacy and Security using Federated Learning and Edge Computing
Makkar, A., Ghosh, U., Rawat, D.B. and Abawajy, J. 2021. FedLearnSP: Preserving Privacy and Security using Federated Learning and Edge Computing. IEEE. https://doi.org/10.1109/mce.2020.3048926
Ai based management of food wastage
Sama, L., Makkar, A., Prokshitha, P., Sharma, B.K. and Dhaloria, D. 2021. Ai based management of food wastage. 2021 International Semantic Intelligence Conference, ISIC 2021; New Delhi; India; 25 February 2021 through 27 February 2021; Code 167510. CEUR Workshop Proceedings.
An Intelligent Phishing Detection Scheme Using Machine Learning
Makkar, A., Kumar, N., Sama, L., Mishra, S. and Samdani, Y. Giri D., Buyya R., Ponnusamy S., De D., Adamatzky A. and Abawajy J.H. (ed.) An Intelligent Phishing Detection Scheme Using Machine Learning. Springer.
SPAMI: A cognitive spam protector for advertisement malicious images
Makkar, A., Kumar, N., Zomaya, A.Y. and Dhiman, S. 2020. SPAMI: A cognitive spam protector for advertisement malicious images. Information Sciences. 540 (Nov 2020), pp. 17-37. https://doi.org/10.1016/j.ins.2020.05.113
An efficient deep learning-based scheme for web spam detection in IoT environment
Makkar, A. and Kumar, N. 2020. An efficient deep learning-based scheme for web spam detection in IoT environment. Future Generation Computer Systems. 108 (July 2020), pp. 467-487. https://doi.org/10.1016/j.future.2020.03.004
DIADL: An Energy Efficient Framework for Detecting Intrusion Attack Using Deep Learning
Sama, L., Makkar, A., Mishra, S.K. and Samdani, Y. 2020. DIADL: An Energy Efficient Framework for Detecting Intrusion Attack Using Deep Learning. 12th International Conference on Computer Modeling and Simulation, ICCMS 2020 and the 9th International Conference on Intelligent Computing and Applications. ICICA 2020; Virtual, Online; Australia; 22 June 2020 through 24 June 2020; Code 162275. ACM. https://doi.org/10.1145/3408066.3408107
The Power of AI in IoT: Cognitive IoT-based Scheme for Web Spam Detection
Makkar, A., Kumar, N. and Guizani, M. 2019. The Power of AI in IoT: Cognitive IoT-based Scheme for Web Spam Detection. 2019 IEEE Symposium Series on Computational Intelligence, SSCI 2019; Xiamen; China; 6 December 2019 through 9 December 2019; Category number CFP19COI-ART; Code 157933. IEEE. https://doi.org/10.1109/ssci44817.2019.9002885
FS2RNN: Feature Selection Scheme for Web Spam Detection Using Recurrent Neural Networks
Makkar, A., Obaidat, M.S. and Kumar, N. 2018. FS2RNN: Feature Selection Scheme for Web Spam Detection Using Recurrent Neural Networks. 2018 IEEE Global Communications Conference, GLOBECOM 2018; Abu Dhabi National Exhibition Centre (ADNEC) Abu Dhabi; United Arab Emirates; 9 December 2018 through 13 December 2018; Category number CFP18GLO-ART; Code 145422. IEEE. https://doi.org/10.1109/glocom.2018.8647294
User behavior analysis-based smart energy management for webpage ranking: Learning automata-based solution
Makkar, A. and Kumar, N. 2018. User behavior analysis-based smart energy management for webpage ranking: Learning automata-based solution. Sustainable Computing: Informatics and Systems. Vol 20 (Dec 2018), pp. 174-191. https://doi.org/10.1016/j.suscom.2018.02.003
QAIR: Quality Assessment Scheme for Information Retrieval in IoT Infrastructures
Makkar, A., Kumar, N., Obaidat, M.S. and Hsiao, K.-F. 2018. QAIR: Quality Assessment Scheme for Information Retrieval in IoT Infrastructures. 2018 IEEE Global Communications Conference. Institute of Electrical and Electronics Engineers (IEEE). https://doi.org/10.1109/glocom.2018.8647180
Spammer classification using ensemble methods over content-based features
Makkar, A. and Goel, S. Kusum Deep, Jagdish Chand Bansal, Kedar Nath Das, Arvind Kumar Lal, Harish Garg, Atulya K. Nagar and Millie Pant (ed.) 2017. Spammer classification using ensemble methods over content-based features. Springer Verlag.