A Text Mining and Topic Modeling Approach to Analyzing Research Trends in Timor-Leste

  • Edio da Costa, Department of Computer Science, Dili Institute of Technology, Dili, Timor-Leste
Keywords: Text Mining; Topic Modeling; Research Trends; LDA; Timor-Leste

Abstract

Identifying national research trends is crucial for supporting academic development and evidence-based policymaking. In Timor-Leste, however, systematic, data-driven analyses of research outputs remain limited. This study applies text mining and topic modeling techniques to examine dominant research themes and emerging trends within Timor-Leste’s academic landscape. A corpus of academic publications and institutional research documents was collected and preprocessed using standard natural language processing methods, including case folding, tokenization, stop-word removal, and lemmatization. Topic modeling was conducted using the Latent Dirichlet Allocation (LDA) algorithm, with the number of topics set to five, to identify thematic clusters emerging during the 2021–2025 period. The findings reveal several major research clusters, including Education & Human Capital Development, Public Health and Social Wellbeing, Governance & Public Policy, Agriculture & Rural Development, and Digital Technology Innovation, and indicate their relative prominence. In addition, the analysis highlights underrepresented research areas that offer opportunities for future investigation.
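
The pipeline summarized in the abstract can be illustrated with a minimal, standard-library-only Python sketch. It is not the study's implementation: the abstract does not name the software or inference method used, so collapsed Gibbs sampling is chosen here as one common way to fit LDA. The stop-word list, the plural-stripping stand-in for lemmatization, the hyperparameters `alpha` and `beta`, the iteration count, and the four toy documents are all assumptions made for illustration only.

```python
import random
import re
from collections import Counter

# Tiny illustrative stop-word list (an assumption; real pipelines use fuller lists).
STOP_WORDS = {"the", "a", "an", "of", "and", "in", "to", "for", "is", "are", "on", "with"}

# Hypothetical toy documents, NOT taken from the study's corpus.
TOY_DOCS = [
    "Teacher training and school outcomes in rural districts",
    "Maternal health services and nutrition programs",
    "Coffee farming and rural livelihoods",
    "Digital services for school management",
]

def preprocess(text):
    """Case folding, tokenization, stop-word removal, and a crude plural
    stripper standing in for lemmatization (a real lemmatizer needs a lexicon)."""
    tokens = re.findall(r"[a-z]+", text.lower())  # case folding + tokenization
    tokens = [t for t in tokens if t not in STOP_WORDS]
    return [
        t[:-1] if len(t) > 3 and t.endswith("s") and not t.endswith(("ss", "is", "us")) else t
        for t in tokens
    ]

def lda_gibbs(docs, num_topics=5, alpha=0.1, beta=0.01, iterations=200, seed=0):
    """Fit LDA by collapsed Gibbs sampling.

    Returns (topic_word, doc_topic): per-topic word counts and per-document
    topic counts after the final sweep.
    """
    rng = random.Random(seed)
    vocab_size = len({w for doc in docs for w in doc})
    topic_word = [Counter() for _ in range(num_topics)]
    topic_total = [0] * num_topics
    doc_topic = [[0] * num_topics for _ in docs]
    assignments = []

    # Random initial topic assignment for every token.
    for d, doc in enumerate(docs):
        zs = []
        for w in doc:
            z = rng.randrange(num_topics)
            zs.append(z)
            topic_word[z][w] += 1
            topic_total[z] += 1
            doc_topic[d][z] += 1
        assignments.append(zs)

    for _ in range(iterations):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                # Remove the token's current assignment from the counts.
                z = assignments[d][i]
                topic_word[z][w] -= 1
                topic_total[z] -= 1
                doc_topic[d][z] -= 1
                # Conditional probability of each topic for this token (unnormalized).
                weights = [
                    (doc_topic[d][k] + alpha)
                    * (topic_word[k][w] + beta)
                    / (topic_total[k] + vocab_size * beta)
                    for k in range(num_topics)
                ]
                z = rng.choices(range(num_topics), weights=weights)[0]
                # Record the newly sampled assignment.
                assignments[d][i] = z
                topic_word[z][w] += 1
                topic_total[z] += 1
                doc_topic[d][z] += 1
    return topic_word, doc_topic

def top_words(topic_word, n=3):
    """List the n most frequent words currently assigned to each topic."""
    return [[w for w, c in counts.most_common(n) if c > 0] for counts in topic_word]

toy_tokens = [preprocess(doc) for doc in TOY_DOCS]
topic_word, doc_topic = lda_gibbs(toy_tokens, num_topics=5)
for k, words in enumerate(top_words(topic_word)):
    print(f"topic {k}: {words}")
```

With a corpus this small the resulting topics carry no meaning; the sketch only shows how preprocessed tokens feed a five-topic LDA model. Labels such as those reported in the abstract would normally be assigned by inspecting each topic's top words on the full corpus.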

Published
2026-03-21