Mahdi Farnaghi, Zeinab Ghaemi, Ali Mansourian. Dynamic Spatio-Temporal Tweet Mining for Event Detection: A Case Study of Hurricane Florence[J]. International Journal of Disaster Risk Science, 2020, 11(3): 378-393. doi: 10.1007/s13753-020-00280-z

Dynamic Spatio-Temporal Tweet Mining for Event Detection: A Case Study of Hurricane Florence

doi: 10.1007/s13753-020-00280-z
  • Available Online: 2021-04-26
  • Extracting information about emerging events in large study areas through spatiotemporal and textual analysis of geotagged tweets makes it possible to monitor the current state of a disaster. This study proposes dynamic spatio-temporal tweet mining as a method for dynamic event extraction from geotagged tweets in large study areas. It introduces a modified version of the ordering points to identify the clustering structure (OPTICS) algorithm to address the intrinsic heterogeneity of Twitter data. To calculate textual similarity precisely, three state-of-the-art text embedding methods, Word2vec, GloVe, and FastText, were used to capture both syntactic and semantic similarities. The impact of the selected embedding algorithm on the quality of the outputs was studied. Different combinations of spatial and temporal distances with the textual similarity measure were investigated to improve the event detection outcomes. The proposed method was applied to a case study of Hurricane Florence in 2018. The method was able to precisely identify events of varied sizes and densities before, during, and after the hurricane. The feasibility of the proposed method was quantitatively evaluated using the Silhouette coefficient and qualitatively discussed. The proposed method was also compared to an implementation based on the standard density-based spatial clustering of applications with noise (DBSCAN) algorithm, against which it showed more promising results.
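The abstract describes combining spatial and temporal distances with a textual similarity measure and scoring the resulting clusters with the Silhouette coefficient. A minimal sketch of that idea follows. It is an illustration only and is not the authors' implementation: the equal weights, the 50 km and 6 hour normalisation caps, the three-dimensional toy vectors (standing in for Word2vec, GloVe, or FastText embeddings), and the hand-assigned cluster labels are all assumptions, and no OPTICS clustering step is performed.

```python
import math

def haversine_km(p, q):
    """Great-circle distance in km between two (lat, lon) pairs in degrees."""
    lat1, lon1, lat2, lon2 = map(math.radians, (p[0], p[1], q[0], q[1]))
    a = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * 6371.0 * math.asin(math.sqrt(a))

def cosine_distance(u, v):
    """1 - cosine similarity of two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / norm

def combined_distance(a, b, max_km=50.0, max_hours=6.0, weights=(1/3, 1/3, 1/3)):
    """Weighted sum of normalised spatial, temporal, and textual distances.
    The caps and weights are hypothetical choices for this sketch."""
    d_space = min(haversine_km(a["loc"], b["loc"]) / max_km, 1.0)
    d_time = min(abs(a["hour"] - b["hour"]) / max_hours, 1.0)
    d_text = cosine_distance(a["vec"], b["vec"]) / 2.0  # map [0, 2] to [0, 1]
    w_space, w_time, w_text = weights
    return w_space * d_space + w_time * d_time + w_text * d_text

def silhouette(dist, labels):
    """Mean Silhouette coefficient from a precomputed distance matrix.
    Requires at least two distinct cluster labels."""
    n = len(labels)
    scores = []
    for i in range(n):
        own = [dist[i][j] for j in range(n) if labels[j] == labels[i] and j != i]
        if not own:
            scores.append(0.0)  # usual convention for singleton clusters
            continue
        a = sum(own) / len(own)  # mean distance within own cluster
        b = min(                 # mean distance to the nearest other cluster
            sum(dist[i][j] for j in range(n) if labels[j] == c) / labels.count(c)
            for c in set(labels) if c != labels[i]
        )
        scores.append(0.0 if max(a, b) == 0 else (b - a) / max(a, b))
    return sum(scores) / n

# Toy tweets (made-up data): location, hours since start, embedding vector.
tweets = [
    {"loc": (34.23, -77.94), "hour": 0.0, "vec": (0.9, 0.1, 0.0)},
    {"loc": (34.21, -77.89), "hour": 0.5, "vec": (0.8, 0.2, 0.1)},
    {"loc": (35.23, -80.84), "hour": 5.0, "vec": (0.1, 0.9, 0.2)},
    {"loc": (35.20, -80.80), "hour": 5.5, "vec": (0.0, 0.8, 0.3)},
]
labels = [0, 0, 1, 1]  # hand-assigned here; a clustering algorithm would produce these

dist = [[combined_distance(a, b) for b in tweets] for a in tweets]
score = silhouette(dist, labels)
print(f"Mean Silhouette coefficient: {score:.3f}")
```

In practice the distance matrix would be passed to a density-based clustering algorithm that accepts precomputed distances, and the Silhouette coefficient would then be computed on the labels it returns, with noise points handled separately.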