Trending Research Topics Detection over Time Using the Latent Dirichlet Allocation Model

Pimpa Cheewaprakobkit

Published: May 28, 2019

Keywords:

Topic model, Latent Dirichlet Allocation, trending topics detection, topic evolution

Pimpa Cheewaprakobkit

Asia-Pacific International University, Thailand

Abstract

Topic modeling has become a powerful tool for analyzing large collections of unclassified documents by detecting patterns of similar word use. It permits the discovery of hidden themes that pervade a collection, the annotation of documents according to those themes, and the use of those annotations to summarize and search the text. This paper focuses on Latent Dirichlet Allocation (LDA), one of the most widely used methods for topic modeling. LDA was applied to detect trends in, and the evolution of, research topics in the Journal of the Modern Language Association of America. The study also identified the documents that explained each topic, by time series and by citation. The experimental data consisted of 5,605 articles published in the Journal between 1889 and 2007 and held in the Journal Storage (JSTOR) digital library. The results show that the LDA model can effectively detect distinct topics, and the documents that explain these topics, over time.
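The approach summarized in the abstract, fitting a Latent Dirichlet Allocation model and then following topic prevalence over time, can be illustrated with a small sketch. This is not the author's implementation: the abstract does not state which inference method or software was used, so the example assumes a plain collapsed Gibbs sampler, a made-up toy corpus, and hypothetical function names (`lda_gibbs`, `topic_trends`). The hyperparameters `alpha` and `beta` are likewise illustrative choices.

```python
import random
from collections import defaultdict


def lda_gibbs(docs, n_topics, alpha=0.1, beta=0.01, n_iter=200, seed=0):
    """Fit LDA by collapsed Gibbs sampling (illustrative, not optimized).

    docs is a list of token lists. Returns theta, the estimated
    topic proportions for each document.
    """
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    word_id = {w: i for i, w in enumerate(vocab)}
    n_words = len(vocab)

    doc_topic = [[0] * n_topics for _ in docs]             # tokens per (doc, topic)
    topic_word = [[0] * n_words for _ in range(n_topics)]  # tokens per (topic, word)
    topic_total = [0] * n_topics                           # tokens per topic
    assignments = []

    # Start from a random topic assignment for every token.
    for d, doc in enumerate(docs):
        z_doc = []
        for w in doc:
            k = rng.randrange(n_topics)
            z_doc.append(k)
            doc_topic[d][k] += 1
            topic_word[k][word_id[w]] += 1
            topic_total[k] += 1
        assignments.append(z_doc)

    for _ in range(n_iter):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                k, v = assignments[d][i], word_id[w]
                # Remove the token's current assignment from the counts.
                doc_topic[d][k] -= 1
                topic_word[k][v] -= 1
                topic_total[k] -= 1
                # Conditional probability of each topic given all other tokens.
                weights = [
                    (doc_topic[d][t] + alpha)
                    * (topic_word[t][v] + beta)
                    / (topic_total[t] + n_words * beta)
                    for t in range(n_topics)
                ]
                k = rng.choices(range(n_topics), weights=weights)[0]
                assignments[d][i] = k
                doc_topic[d][k] += 1
                topic_word[k][v] += 1
                topic_total[k] += 1

    # Smoothed per-document topic proportions; each row sums to 1.
    return [
        [(c + alpha) / (len(doc) + n_topics * alpha) for c in counts]
        for counts, doc in zip(doc_topic, docs)
    ]


def topic_trends(theta, years):
    """Average the topic proportions of all documents published in each year."""
    by_year = defaultdict(list)
    for row, year in zip(theta, years):
        by_year[year].append(row)
    return {
        year: [sum(col) / len(rows) for col in zip(*rows)]
        for year, rows in sorted(by_year.items())
    }


# Toy corpus, invented for illustration only.
docs = [
    "poetry verse meter rhyme poetry".split(),
    "verse rhyme poetry meter".split(),
    "novel narrative character plot novel".split(),
    "plot character novel narrative".split(),
]
years = [1900, 1900, 1950, 1950]

theta = lda_gibbs(docs, n_topics=2)
trends = topic_trends(theta, years)
for year, shares in trends.items():
    print(year, [round(s, 2) for s in shares])
```

A topic whose average share rises from one year to the next is a candidate trending topic. On a real corpus, the documents with the highest proportion for that topic in a given year are the ones that best represent it.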

Issue

Vol. 20 No. 2 (2019): April - June 2019

Section

Research Articles

This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.

Copyright: Asia-Pacific International University reserves exclusive rights to publish, reproduce, and distribute the manuscript and all contents therein.
