Trending Research Topics Detection over Time Using the Latent Dirichlet Allocation Model
Abstract
Topic modeling has become a powerful tool for analyzing large collections of unclassified documents by detecting patterns of co-occurring words. It also permits the discovery of the hidden themes that pervade a collection, allowing the documents to be annotated according to those themes and then summarized and searched. This paper focuses on Latent Dirichlet Allocation, one of the most widely used topic modeling methods, and applies it to detect trends and the evolution of research topics in the Journal of the Modern Language Association of America. The study also identified the documents that best explained each topic, ordered by time and by citation. The experimental data consisted of 5,605 articles published in the Journal between 1889 and 2007 and held in the Journal Storage (JSTOR) digital library. The results show that the Latent Dirichlet Allocation model can effectively detect distinct topics and identify the documents that explain those topics over time.
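To illustrate the kind of analysis described above, the following Python sketch fits an LDA model to a toy corpus and aggregates the per-document topic proportions by publication year to trace topic trends. This is a minimal sketch only: the documents, years, and parameter settings are hypothetical placeholders, and it uses scikit-learn's LatentDirichletAllocation rather than the tooling actually used in the study.

# Minimal sketch of LDA-based topic trend detection (hypothetical data).
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation
import numpy as np

# Hypothetical article texts and publication years, standing in for the corpus.
documents = [
    "language literature criticism poetry analysis",
    "grammar syntax linguistics phonology morphology",
    "novel narrative fiction author character",
]
years = [1950, 1975, 2000]

# Build a bag-of-words document-term matrix.
vectorizer = CountVectorizer(stop_words="english")
dtm = vectorizer.fit_transform(documents)

# Fit an LDA model with an assumed number of topics.
lda = LatentDirichletAllocation(n_components=2, random_state=0)
doc_topics = lda.fit_transform(dtm)  # per-document topic proportions

# Print the top words for each learned topic.
terms = vectorizer.get_feature_names_out()
for k, weights in enumerate(lda.components_):
    top = [terms[i] for i in weights.argsort()[::-1][:5]]
    print(f"Topic {k}: {', '.join(top)}")

# Average topic proportions per year to trace how topics trend over time.
years_arr = np.array(years)
for year in sorted(set(years)):
    share = doc_topics[years_arr == year].mean(axis=0)
    print(year, share.round(3))

In a study of the scale reported here, the per-year topic shares would be computed over thousands of articles, and documents with the highest proportion of a given topic would be reported as the ones that best explain that topic.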
Article Details
Copyright: Asia-Pacific International University reserves exclusive rights to publish, reproduce, and distribute the manuscript and all contents therein.