Alex Coombs

DATA 150

11/15/20

Evaluating the Effectiveness of Data Science Models That Implement User-Generated Data for Mitigating the Spread of Infectious Diseases

Africa is the only continent where infectious diseases cause more deaths than chronic illnesses, and mitigating their spread is the first step toward addressing this issue. Healthcare in sub-Saharan Africa is also the worst in the world, and overall wellbeing there is rated 4.39/10 on the Cantril ladder. Improving healthcare as a whole is a difficult, time-consuming, and expensive process, but measures that stop the spread of disease can be implemented at much lower cost. One of the first steps in any public health crisis is understanding how diseases may spread given a country's total population and population densities. This is usually done through censuses, but many sub-Saharan countries have not held one in decades; DR Congo, for example, held its last census in 1984 because of logistical challenges and budget constraints. Bayesian hierarchical frameworks can help predict census data relatively accurately using data from micro-censuses. Data science allows for national-scale evaluations of infectious diseases, including ways to predict their spread. A DNN or LSTM model uses deep learning on big data from search queries and social media, along with weather and climate data, to predict the number of infections within a chosen region. This would allow preventative measures to be put in place when a large spike in infections is predicted, which would help lower the cost and manpower needed to treat many infected citizens. Other models use CDR (call detail record) data, which records where and when a call or text is made from a cell phone. Even in the poorer parts of sub-Saharan Africa, cell phone penetration is high, so CDR data is effective for gathering spatial and temporal data. Methods such as the impedance model predict the movement of populations using nothing but CDR data.
These movement predictions are useful for deciding where to allocate resources, since temporary population densities can be predicted for specific regions, and other information, such as the severity of an epidemic in a region, can possibly be inferred from how people migrate. Although predictions are useful for implementing countermeasures against infectious disease, those countermeasures should still be analyzed to quantify their effectiveness and guide changes to future actions. CDR data allows for this as well, since measures such as lockdowns can be evaluated by comparing the movement of people recorded in CDR data with infection rates. Even though data science techniques using user-generated data are effective for the populations they encompass, some rural populations do not have the technology necessary to generate data. These people should still be helped, but including them in data sets is impossible if they do not have the necessary technology. One option is to provide this population with cell phones, but that presents other logistical challenges, such as the need for electricity where it is not already available. The most pressing issue still to be addressed is how to incorporate populations that cannot generate their own data into these data sets.
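To make the idea of evaluating a lockdown through CDR data more concrete, the following is a minimal sketch rather than any of the models named above. It assumes each CDR record has already been reduced to an anonymized user id, a timestamp, and the region of the cell tower that handled the event. The record layout, the region labels, the lockdown date, and the helper name count_flows are all hypothetical and chosen only for illustration. The sketch counts region-to-region moves before and after the assumed lockdown date so the two periods can be compared.

```python
from collections import Counter
from datetime import datetime

# Hypothetical CDR records: (anonymized user id, timestamp, tower region).
# The layout and values are illustrative assumptions, not a real dataset.
records = [
    ("u1", "2020-03-01 08:00", "A"),
    ("u1", "2020-03-01 18:00", "B"),
    ("u1", "2020-03-10 09:00", "B"),
    ("u2", "2020-03-02 07:30", "A"),
    ("u2", "2020-03-02 12:00", "C"),
    ("u2", "2020-03-11 12:00", "C"),
    ("u3", "2020-03-03 10:00", "B"),
    ("u3", "2020-03-12 10:00", "A"),
]


def count_flows(records, start=None, end=None):
    """Count moves between regions across consecutive events of each user.

    A move is attributed to the time of the later event and is counted
    only if that time falls in the half-open window [start, end).
    """
    parsed = sorted(
        (user, datetime.strptime(ts, "%Y-%m-%d %H:%M"), region)
        for user, ts, region in records
    )
    flows = Counter()
    last_region = {}  # most recent region seen for each user
    for user, when, region in parsed:
        in_window = (start is None or when >= start) and (
            end is None or when < end
        )
        previous = last_region.get(user)
        if in_window and previous is not None and previous != region:
            flows[(previous, region)] += 1
        last_region[user] = region
    return flows


# Hypothetical lockdown date used to split the observation period.
lockdown = datetime(2020, 3, 5)
before = count_flows(records, end=lockdown)
after = count_flows(records, start=lockdown)

print("moves before lockdown:", sum(before.values()))
print("moves after lockdown:", sum(after.values()))
```

In practice the counts would be normalized by the length of each period and by the number of active users, and then set against infection rates for the same regions. The raw totals here only show the shape of the comparison.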