The spread of infectious disease threatens the health of all nations, so mitigating it is in the interest of the entire world. Mitigation depends on tracking a disease's progress across a country, but many traditional tracking methods are inadequate: manually counting positive cases and reporting them is time-consuming and prone to inaccuracy. As a result, predictive models that simulate the spread of disease are valuable for coordinating responses to outbreaks. Earlier approaches relied on statistical modeling and linear regression, but newer models that incorporate deep learning are a promising advancement in the field of public health. This study used South Korea as its region of interest and compared two established statistical methods of modeling the spread of infectious disease with two methods based on deep learning, testing the hypothesis that neural networks and big data can improve the accuracy of infectious disease prediction models. To test this, the researchers looked at the past spread of three diseases: chickenpox, malaria, and scarlet fever. The input data fell into four categories: search query data, social media big data, temperature data, and humidity data. Search query data was taken from the South Korean search engine Naver, and the social media big data was taken from Twitter and consisted of posts that contained keywords or phrases such as “chickenpox” and “chickenpox symptoms”. The weather data was taken from the Korea Meteorological Administration, and the values for temperature and humidity were daily averages for regions of the country. Weather data was included because the temperature and humidity of a region have been shown to be key factors in the spread of infectious disease. The four methods compared were Ordinary Least Squares (OLS), Autoregressive Integrated Moving Average (ARIMA), Deep Neural Network (DNN), and Long Short-Term Memory (LSTM).
DNN and LSTM are the two deep learning methods, and the results of all four methods were compared to the recorded spread of each disease over 576 days. Each method was tested using the variables above, with most of the data going into a training set and the remainder held out for testing. For OLS and ARIMA, the ratio of training data to test data was 8:2; for DNN and LSTM, the ratio of training to test to validation data was 6:2:2. The models used a lag of 7 days between the input data and the predicted number of infections, since that lag was shown to give the best predictive power. The predictions were then compared using Root Mean Squared Error (RMSE) to measure accuracy against the recorded values, and the standard deviation of the values was used to measure precision. The results showed that the methods that incorporated deep learning were more accurate than the OLS and ARIMA models, and that the DNN model was the most accurate overall. This is mostly because the DNN and LSTM models were more sensitive to increases and decreases in the number of infections, so they stayed truer to the actual number of infections throughout the timeframe of the study. However, the DNN and LSTM models had different strengths, which was most evident when both were used to track the spread of malaria, where LSTM was more accurate than DNN. This suggests that DNN is more accurate than LSTM most of the time, but that LSTM is the more effective model when an infectious disease is spreading rapidly. This study illustrates the potential of deep learning in tracking the spread of infectious disease and identifies some of the most effective methods that could be used to improve public health in any region.
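
The evaluation steps described above (a 7-day lag, ratio-based data splits, and RMSE scoring) can be illustrated with a short Python sketch. This is not the authors' code. The helper names `make_lagged`, `split_by_ratio`, and `rmse` are hypothetical, the data is randomly generated rather than taken from the study, and the split is assumed to be chronological, which the summary does not state. A plain least-squares fit stands in for the OLS baseline.

```python
import numpy as np


def make_lagged(features, cases, lag=7):
    """Pair the inputs from day t with the case count on day t + lag."""
    features = np.asarray(features, dtype=float)
    cases = np.asarray(cases, dtype=float)
    if lag <= 0 or lag >= len(cases):
        raise ValueError("lag must be between 1 and len(cases) - 1")
    return features[:-lag], cases[lag:]


def split_by_ratio(n_samples, ratios):
    """Cut range(n_samples) into consecutive index blocks sized by ratios.

    Assumption: blocks are taken in time order (earliest block first).
    """
    weights = np.asarray(ratios, dtype=float)
    bounds = np.round(np.cumsum(weights) / weights.sum() * n_samples).astype(int)
    starts = np.concatenate(([0], bounds[:-1]))
    return [np.arange(a, b) for a, b in zip(starts, bounds)]


def rmse(actual, predicted):
    """Root Mean Squared Error between recorded and predicted values."""
    actual = np.asarray(actual, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    return float(np.sqrt(np.mean((actual - predicted) ** 2)))


# Synthetic stand-in data: 576 days, 4 input columns (search queries,
# social media mentions, temperature, humidity). Values are made up.
rng = np.random.default_rng(0)
features = rng.normal(size=(576, 4))
cases = rng.poisson(20, size=576)

# A 7-day lag on 576 days leaves 569 (input, target) pairs.
X, y = make_lagged(features, cases, lag=7)

# 8:2 split as described for OLS and ARIMA.
train, test = split_by_ratio(len(y), (8, 2))

# 6:2:2 split as described for DNN and LSTM (train, test, validation).
dl_train, dl_test, dl_val = split_by_ratio(len(y), (6, 2, 2))

# Least-squares baseline with an intercept column, scored on the test block.
A_train = np.column_stack([np.ones(len(train)), X[train]])
coef, *_ = np.linalg.lstsq(A_train, y[train], rcond=None)
A_test = np.column_stack([np.ones(len(test)), X[test]])
print("baseline test RMSE:", rmse(y[test], A_test @ coef))
```

Because the synthetic inputs are unrelated to the synthetic case counts, the printed RMSE says nothing about real disease data; the sketch only shows how the lag, the splits, and the error metric fit together. A lower RMSE on the held-out block is what the study means when it calls one model more accurate than another.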

Chae S, Kwon S, Lee D. Predicting Infectious Disease Using Deep Learning and Big Data. Int J Environ Res Public Health. 2018;15(8):1596. Published 2018 Jul 27. doi:10.3390/ijerph15081596