Abstract

The World Wide Web (WWW) is a platform that offers a wide range of information used in the development of web applications. Examples of these applications include social network analysis, personalized item recommendation, and web page classification and ranking. Among these, search engines and web page ranking are particularly important because search engines consistently index and store billions of web pages on the internet. The main objective of this paper is to create an innovative framework for the classification and re-ranking of web pages using intelligent techniques. The framework is structured into two key phases: classification and re-ranking-based retrieval. In the initial classification phase, a series of pre-processing steps is applied, including the removal of HTML tags, punctuation, and stop words, as well as stemming. After pre-processing, a word-to-vector conversion is performed, followed by feature extraction using Principal Component Analysis (PCA). This sequence of actions leads to optimal feature selection, which is vital for the precise classification of web pages. Because web pages contain a multitude of features that can compromise classification accuracy, this study employs a novel meta-heuristic algorithm, the Opposition Based-Tunicate Swarm Algorithm (O-TSA), to select the optimal features. The selected features are then processed by the Enhanced Convolutional-Recurrent Neural Network (E-CRNN), itself enhanced by O-TSA, which classifies the web pages into their categories. In the second phase, re-ranking is performed using O-TSA, whose objective function is based on a similarity function (correlation) for URL matching, leading to optimal re-ranking of web pages.
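
The pre-processing and correlation-based re-ranking steps described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: PCA, O-TSA, and E-CRNN are omitted entirely, the stop-word list and suffix-stripping stemmer are crude stand-ins, and the use of Pearson correlation over term-count vectors is an assumption about how the similarity function might be applied. All function names and the example URLs are hypothetical.

```python
import math
import re
from collections import Counter
from html import unescape

# Tiny illustrative stop-word list (an assumption, not the paper's list).
STOP_WORDS = {"a", "an", "and", "the", "of", "to", "in", "is", "for", "on"}


def crude_stem(word):
    """Strip a few common English suffixes; a stand-in for a real stemmer."""
    for suffix in ("ing", "ed", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def preprocess(html):
    """Remove HTML tags and punctuation, drop stop words, then stem."""
    text = unescape(re.sub(r"<[^>]+>", " ", html))
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    return [crude_stem(t) for t in tokens if t not in STOP_WORDS]


def to_vector(tokens, vocabulary):
    """Map tokens to a term-count vector over a fixed vocabulary."""
    counts = Counter(tokens)
    return [counts[term] for term in vocabulary]


def pearson(x, y):
    """Pearson correlation; 0.0 for empty or constant vectors."""
    n = len(x)
    if n == 0:
        return 0.0
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / math.sqrt(var_x * var_y)


def rerank(query, pages):
    """Order page URLs by correlation between query and page vectors."""
    query_tokens = preprocess(query)
    page_tokens = {url: preprocess(html) for url, html in pages.items()}
    vocabulary = sorted(set(query_tokens).union(*page_tokens.values()))
    query_vec = to_vector(query_tokens, vocabulary)
    scores = {
        url: pearson(query_vec, to_vector(tokens, vocabulary))
        for url, tokens in page_tokens.items()
    }
    return sorted(scores, key=scores.get, reverse=True)


if __name__ == "__main__":
    pages = {
        "https://example.org/a": "<p>Ranking web pages</p>",
        "https://example.org/b": "<p>Cooking pasta recipes</p>",
    }
    print(rerank("web page ranking", pages))
```

In the framework itself, the term-count vectors would be replaced by the learned word vectors and PCA features, and the ordering would be tuned by O-TSA rather than by a plain sort; the sketch only shows how a correlation score can drive a ranking.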

Keywords

Correlation, Classification, ECRNN, OTSA, Principal Component Analysis
