Volume 13 Issue 5
Oct.  2022
Volodymyr V. Mihunov, Navid H. Jafari, Kejin Wang, Nina S. N. Lam, Dylan Govender. Disaster Impacts Surveillance from Social Media with Topic Modeling and Feature Extraction: Case of Hurricane Harvey[J]. International Journal of Disaster Risk Science, 2022, 13(5): 729-742. doi: 10.1007/s13753-022-00442-1

Disaster Impacts Surveillance from Social Media with Topic Modeling and Feature Extraction: Case of Hurricane Harvey

doi: 10.1007/s13753-022-00442-1

This article is based on work supported by two grants from the National Science Foundation of the United States (under Grant Numbers 1620451 and 1945787). Any opinions, findings, and conclusions or recommendations expressed in this article are those of the authors and do not necessarily reflect the views of the National Science Foundation.

  • Available Online: 2022-11-01
  • Twitter can supply useful information on infrastructure impacts to emergency managers during major disasters, but filtering out the many irrelevant tweets is time-consuming. Previous studies have identified the types of messages that can be found on social media during disasters, but few solutions have been proposed to efficiently extract the useful ones. We present a framework that can be applied in a timely manner to provide disaster impact information sourced from social media. The framework is tested on the well-studied and data-rich case of Hurricane Harvey. The procedure consists of filtering the raw Twitter data based on keywords, location, and tweet attributes, and then applying latent Dirichlet allocation (LDA) to separate the tweets from the disaster-affected area into categories (topics) useful to emergency managers. The LDA revealed that, of the 24 topics found in the data, nine were directly related to disaster impacts such as outages, closures, flooded roads, and damaged infrastructure. Features such as frequent hashtags, mentions, URLs, and useful images were then extracted and analyzed. The relevant tweets, along with useful images, were correlated at the county level with flood depth, distributed disaster aid (damage), and population density. Significant correlations were found between the nine relevant topics and population density, but not with flood depth or damage, suggesting that more research is needed into the suitability of social media data for disaster impacts modeling. The results from this study provide baseline information for such efforts in the future.
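The filtering and county-level correlation steps described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the keyword list, the `is_retweet` attribute, the county labels, the density values, and the choice of the Pearson coefficient are all assumptions made for illustration, and the topic-modeling (LDA) step between filtering and correlation is omitted.

```python
from collections import Counter
from math import sqrt

# Hypothetical keyword list; the study's actual keywords are not given here.
KEYWORDS = {"flood", "flooded", "outage", "closed", "harvey"}

def is_candidate(tweet, keywords=KEYWORDS):
    """Keep tweets that are not retweets, have a county, and match a keyword."""
    if tweet["is_retweet"] or tweet["county"] is None:
        return False
    # Lowercase each token and strip a leading '#' and trailing punctuation.
    tokens = {tok.strip("#.,!?").lower() for tok in tweet["text"].split()}
    return bool(tokens & keywords)

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)

# Toy records (invented for illustration, not real tweets or counties).
tweets = [
    {"text": "Road flooded near the bridge #Harvey", "county": "A", "is_retweet": False},
    {"text": "Power outage downtown", "county": "A", "is_retweet": False},
    {"text": "Great tacos today", "county": "A", "is_retweet": False},
    {"text": "School closed tomorrow", "county": "B", "is_retweet": False},
    {"text": "Flooded streets everywhere", "county": None, "is_retweet": False},
    {"text": "Flood warning issued", "county": "C", "is_retweet": True},
    {"text": "Harvey flood on our street", "county": "C", "is_retweet": False},
]

# Step 1: filter by keyword, location, and a tweet attribute.
relevant = [t for t in tweets if is_candidate(t)]

# Step 2: aggregate relevant tweets per county.
counts = Counter(t["county"] for t in relevant)

# Step 3: correlate county counts with a county-level variable.
# Density values are invented placeholders, not census figures.
density = {"A": 300.0, "B": 100.0, "C": 120.0}
counties = sorted(density)
r = pearson([counts[c] for c in counties], [density[c] for c in counties])
print(dict(counts), round(r, 3))
```

In a fuller pipeline, the per-county counts would be computed separately for each topic produced by the topic model, and each topic's counts would then be correlated with flood depth, disaster aid, and population density.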