Volume 6, Issue 3, June 2018, Page: 77-83
Topic Modeling of Environmental Data on Social Networks Based on ED-LDA
Lei Feng, Chongqing Institute of Green and Intelligent Technology, Chinese Academy of Sciences, Chongqing, China; Institute for Applied Microelectronics (IUMA), ULPGC, Las Palmas de G.C., Spain; School of Urban Construction and Environmental Engineering, Chongqing University, Chongqing, China
Jose López, Institute for Applied Microelectronics (IUMA), ULPGC, Las Palmas de G.C., Spain
Li Feng, Chongqing Academe of Environmental Science, Chongqing, China
Sheng Zhang, Chongqing Academe of Environmental Science, Chongqing, China
Bormin Huang, Institute for Applied Microelectronics (IUMA), ULPGC, Las Palmas de G.C., Spain
Fang Fang, School of Urban Construction and Environmental Engineering, Chongqing University, Chongqing, China
Chongming Li, Chongqing Academe of Environmental Science, Chongqing, China
Received: Apr. 24, 2018; Accepted: Jun. 21, 2018; Published: Jul. 23, 2018
DOI: 10.11648/j.ijema.20180603.12
Abstract
Rapid advances in information and web technology have driven an enormous increase in the collection and storage of digital data. With the development of online environmental monitoring and Internet technology, more and more environmental data are stored on the Internet and shared on social networks. As a result, there is growing interest in environmental big data mining and in automatically identifying the environmental factors that contribute to public environmental risks, for example by mining discussions of water quality, air pollution, and soil problems on the Internet. A better understanding of these factors, together with analysis of the data, will enable more precise prediction of the location and time of high-risk events for environmental management. The environmental data in this study were collected from Twitter using a web crawler. Early research on environmental data analysis focused on field-specific analysis of traditional data and did not consider data relationships or data structure on social networks. Traditional environmental data analysis methods have been studied well, but no algorithms have been designed for analyzing environmental data on social networks. This paper proposes a novel probabilistic generative model based on latent Dirichlet allocation (LDA), called ED-LDA. The model not only builds on traditional environmental data analysis methods but also incorporates the relationships and structure of environmental data, so that it can mine the relationship between users and the environmental data they post on social networks and help interpret those data for environmental management. This research presents a Gibbs sampling implementation for inference in our model and uses it to identify environmental data topics on Twitter. The model can also be applied to many other kinds of environmental text files.
Experimental results show that, compared with the traditional LDA clustering algorithm, the ED-LDA method can effectively mine and classify environmental data. The method can serve as a powerful computational approach for clustering environmental data on the Internet.
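For readers unfamiliar with Gibbs sampling inference in topic models, the following is a minimal sketch of collapsed Gibbs sampling for standard LDA only. It is not the ED-LDA model, whose user and data-relationship components are not specified in this abstract. The function name `lda_gibbs`, the hyperparameter defaults, and the toy token lists are all illustrative assumptions.

```python
import random


def lda_gibbs(docs, num_topics, alpha=0.1, beta=0.01, iterations=50, seed=0):
    """Collapsed Gibbs sampling for standard LDA (illustrative, not ED-LDA).

    docs is a list of token lists. Returns (vocab, phi, theta), where
    phi[k][v] estimates P(word v | topic k) and theta[d][k] estimates
    P(topic k | document d).
    """
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    word_id = {w: i for i, w in enumerate(vocab)}
    V = len(vocab)

    doc_topic = [[0] * num_topics for _ in docs]        # tokens in doc d assigned to topic k
    topic_word = [[0] * V for _ in range(num_topics)]   # times word v is assigned to topic k
    topic_total = [0] * num_topics                      # total tokens assigned to topic k
    assignments = []                                    # current topic of every token

    # Start from a random topic assignment for every token.
    for d, doc in enumerate(docs):
        z_d = []
        for w in doc:
            k = rng.randrange(num_topics)
            z_d.append(k)
            doc_topic[d][k] += 1
            topic_word[k][word_id[w]] += 1
            topic_total[k] += 1
        assignments.append(z_d)

    for _ in range(iterations):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                v = word_id[w]
                k = assignments[d][i]
                # Remove the token's current assignment from the counts.
                doc_topic[d][k] -= 1
                topic_word[k][v] -= 1
                topic_total[k] -= 1
                # Unnormalized conditional probability of each topic for this token.
                weights = [
                    (doc_topic[d][t] + alpha)
                    * (topic_word[t][v] + beta)
                    / (topic_total[t] + V * beta)
                    for t in range(num_topics)
                ]
                k = rng.choices(range(num_topics), weights=weights)[0]
                # Record the newly sampled assignment.
                assignments[d][i] = k
                doc_topic[d][k] += 1
                topic_word[k][v] += 1
                topic_total[k] += 1

    # Smoothed point estimates from the final counts.
    phi = [
        [(topic_word[k][v] + beta) / (topic_total[k] + V * beta) for v in range(V)]
        for k in range(num_topics)
    ]
    theta = [
        [(doc_topic[d][k] + alpha) / (len(doc) + num_topics * alpha)
         for k in range(num_topics)]
        for d, doc in enumerate(docs)
    ]
    return vocab, phi, theta


# Hypothetical tokenized posts, invented for illustration only.
toy_docs = [
    ["river", "water", "quality", "pollution", "water"],
    ["air", "smog", "pollution", "air", "quality"],
    ["soil", "contamination", "farm", "soil"],
    ["water", "river", "contamination", "quality"],
]
vocab, phi, theta = lda_gibbs(toy_docs, num_topics=2)
for k, row in enumerate(phi):
    top_words = sorted(range(len(vocab)), key=lambda v: -row[v])[:3]
    print(k, [vocab[v] for v in top_words])
```

Each row of `phi` is a topic's word distribution and each row of `theta` is a document's topic mixture. An extension along the lines the abstract describes would add further count tables for user and relationship information, which this sketch does not attempt.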
Keywords
ED-LDA, Probabilistic, Environmental Data, Social Network, Data Mining
To cite this article
Lei Feng, Jose López, Li Feng, Sheng Zhang, Bormin Huang, Fang Fang, Chongming Li, Topic Modeling of Environmental Data on Social Networks Based on ED-LDA, International Journal of Environmental Monitoring and Analysis. Vol. 6, No. 3, 2018, pp. 77-83. doi: 10.11648/j.ijema.20180603.12
Copyright
Copyright © 2018 Authors retain the copyright of this article.
This article is an open access article distributed under the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/) which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.