Alex Coombs

10/19/20

DATA 150

Word Count: 2553

Using Bayesian modeling, Neural Networks, and Population Mobility Metrics to Track and Mitigate the Spread of Infectious Disease in sub-Saharan Africa

Overall wellbeing in sub-Saharan Africa is the lowest in the world, scoring 4.39/10 on the Cantril ladder according to the 2012 Gallup World Poll. As a continent, Africa has the highest mortality rate and is the only continent with more deaths attributed to infectious disease than chronic illness (Deaton). Healthcare contributes greatly to overall wellbeing and to achieving an objectively good life, so it must be a priority when working to improve wellbeing in sub-Saharan Africa. Even though factors such as economic output and corruption limit sub-Saharan African development, developing an effective healthcare system is integral to the future of the region. Some of the obstacles to effective healthcare in sub-Saharan Africa are its lack of adequate census data and its limited access to health centers and health professionals. Although aggregate health data rank the region as the worst in the world, countries vary in healthcare quality, as do sub-populations within those countries. For example, according to the 2012 Gallup World Poll, only 17% of respondents in Madagascar and Tanzania reported being in perfect health, while 50% of respondents in Ethiopia and Somaliland did. This inequality applies not only to general health but also to contact with health professionals: less than 50% of respondents in Ghana reported ever being in contact with a health official, and less than 10% in Sudan (Deaton). Census data is an invaluable resource in health services since population numbers and density are key factors in the spread of epidemics. This paper seeks to address some of these issues and to propose methods that assist the development of healthcare in sub-Saharan Africa, with a focus on using census data and population mobility metrics to mitigate the spread of infectious diseases. 
It will also identify some subsets of people who may be more vulnerable than others.

A lack of accurate and recent census data is one of the most pressing issues for public health in sub-Saharan Africa, since census data helps inform officials where to concentrate resources. Generally, a nation conducts a census every 10 years, but this is not possible in many lower-income countries, which make up much of sub-Saharan Africa. Some sub-Saharan countries have gone years or even decades without a census: DR Congo has not had a national census since 1984, and Nigeria has not had one since 2006. Deriving enough information from census data this old is difficult, if not impossible, and has caused problems with infectious disease control in the past. The Ebola outbreak in DR Congo could not be adequately controlled because the accurate census data needed to formulate a response was not available, and yellow fever campaigns in Nigeria have not been entirely successful since the census data used is from 2006 (Leasure). Accurate census data cannot be replaced by any other data set, but it can be supplemented. Hierarchical Bayesian modeling can supplement census data by estimating values to fill the gaps in incomplete or outdated records. This is done by combining micro censuses with relevant covariates, such as access to roads, water sources, and satellite imagery, to estimate local population statistics. With proper training sets, this method can predict the overall population of a nation, as well as subnational population densities, with reasonable accuracy (Leasure). Also, since the micro censuses are localized and require far less data collection than a full-scale census, the approach is less expensive and could provide a usable data set when a census cannot be conducted. This data can help formulate responses to infectious diseases and other health emergencies more effectively than incomplete or out-of-date census data used alone. 
If census data is available, it may only be useful for aggregate population numbers, which is not enough to respond effectively to a public health crisis. The hierarchical Bayesian model described above is a bottom-up approach, so it inherently contains disaggregated population data. A top-down approach to data disaggregation instead uses satellite data to estimate the locations of high and low density within enumeration areas (Wardrop). A top-down approach is less expensive than a bottom-up approach if adequate census data is available, making it another option for obtaining the population density data needed to address public health crises.
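The difference between the bottom-up and top-down approaches can be illustrated with a small sketch. This is not the model used in the cited studies: the shrinkage formula below only mimics the partial pooling that a hierarchical model performs, and every name, count, and weight (including `prior_strength`) is a hypothetical value chosen for illustration.

```python
# Illustrative sketch only; not the model from the cited studies.
# All names, counts, weights, and prior_strength are hypothetical.

def pooled_density(samples_by_type, prior_strength=5.0):
    """Bottom-up step: estimate people per settled grid cell for each
    settlement type, shrinking sparsely surveyed types toward the overall
    mean (a simple stand-in for hierarchical partial pooling)."""
    all_counts = [c for counts in samples_by_type.values() for c in counts]
    overall_mean = sum(all_counts) / len(all_counts)
    estimates = {}
    for settlement_type, counts in samples_by_type.items():
        n = len(counts)
        group_mean = sum(counts) / n
        # Types with few surveyed cells lean more heavily on the overall mean.
        estimates[settlement_type] = (
            n * group_mean + prior_strength * overall_mean
        ) / (n + prior_strength)
    return estimates


def bottom_up_total(density_by_type, settled_cells_by_type):
    """Multiply each estimated density by the number of settled cells."""
    return sum(
        density_by_type[t] * settled_cells_by_type[t] for t in density_by_type
    )


def top_down_disaggregate(census_total, cell_weights):
    """Top-down step: split a known area total across grid cells in
    proportion to a satellite-derived weight (for example, built-up area)."""
    weight_sum = sum(cell_weights)
    return [census_total * w / weight_sum for w in cell_weights]


# Hypothetical micro-census counts (people per surveyed grid cell).
micro_census = {"urban": [40, 55, 48, 60], "rural": [6, 9]}
density = pooled_density(micro_census)
settled_cells = {"urban": 1000, "rural": 5000}

print(bottom_up_total(density, settled_cells))
print(top_down_disaggregate(10000, [0.5, 0.3, 0.2]))
```

The bottom-up functions build a total from local survey counts, while the top-down function starts from a known total and spreads it across cells, which is why the top-down route depends on adequate census data being available.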

Controlling the spread of infectious disease involves tracking its spread, which can be done by predicting how many people will be infected. Traditional methods for disease tracking involve using recorded positive cases and looking for patterns in a disease’s spread. These methods fall short since not all cases will be reported, and the process of manually recording data is slow. As a result, models that predict the spread of infectious diseases are integral to mitigating their effects. Previous methods have relied solely on statistical modeling, such as the Generalized Linear Model (GLM), the Least Absolute Shrinkage and Selection Operator (LASSO), and the Autoregressive Integrated Moving Average (ARIMA). Newer methods that use big data and deep learning algorithms, such as Deep Neural Networks (DNN) and Long Short-Term Memory (LSTM) models, are more effective (Chae). The improvement that deep learning methods offer is demonstrated by a study tracking the spread of chickenpox, malaria, and scarlet fever in South Korea. Researchers used data taken over 576 days, consisting of search query data, social media big data, and daily average temperature and humidity data, as inputs for the DNN and LSTM models. With an optimal lag time, the DNN model showed a 24.45% increase in performance over the ARIMA model when evaluated against the recorded spread of each disease over the tested time period, and the LSTM model showed an 18.78% increase over the ARIMA model on the same metric (Chae). Overall, the DNN model was the most accurate, but the LSTM model was the most accurate when tracking a rapidly spreading disease. Infectious diseases in sub-Saharan Africa are a major point of concern, so models such as these could be integral in helping public health officials respond to epidemics. The different strengths of the DNN and LSTM models can each be useful as well. 
Diseases that have been a persistent issue, such as HIV and Ebola, can be tracked primarily with a DNN model, while emerging diseases like COVID-19 can be tracked primarily with an LSTM model. Models such as these are inexpensive since they use widely available resources and require little labor. As a result, low-income countries, including many in sub-Saharan Africa, can take advantage of these neural networks.
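The idea of a lag time between a predictor, such as search volume, and reported cases can be made concrete with a short sketch. It is a simple correlation scan, not the deep learning procedure from the cited study, and the function names and the data series are hypothetical. It assumes both series have the same length and daily spacing.

```python
import math


def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant series carries no usable signal
    return cov / math.sqrt(var_x * var_y)


def best_lag(signal, cases, max_lag):
    """Return (lag, correlation) for the lag, in days, at which the signal
    on day t lines up best with reported cases on day t + lag."""
    scores = {}
    for lag in range(max_lag + 1):
        earlier_signal = signal[: len(signal) - lag]
        later_cases = cases[lag:]
        scores[lag] = pearson(earlier_signal, later_cases)
    lag = max(scores, key=scores.get)
    return lag, scores[lag]


# Hypothetical data: case reports echo search volume three days later.
search_volume = [1, 4, 2, 8, 5, 7, 3, 9, 6, 10, 2, 5, 8, 1, 7, 4]
reported_cases = [0, 0, 0] + search_volume[:-3]

lag, corr = best_lag(search_volume, reported_cases, max_lag=7)
print(lag, round(corr, 3))
```

Once a lag is chosen, the predictor values from that many days earlier can be fed to whichever model is being trained, whether statistical or neural.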

In addition to disease tracking using neural networks, general population movement can be tracked and predicted using call detail record (CDR) and GPS data. Cell phone penetration is high all over the world, including in sub-Saharan Africa. CDR data contains information on when and where a cell phone tower is used, which provides both spatial and temporal data. As a result, the general location of any phone user at any time can be recorded and used to derive the movement patterns of large populations. Even though this data is easy to access and record, it lacks fine spatial resolution since cell phone towers can be spread far apart. Also, not every individual has a cell phone, and the poorest members of a population are the least likely to own one, so the data generated from CDRs could exclude poorer parts of a population (Lai). However, it is still useful for looking at larger-scale movements in a population. GPS data has a fine resolution but requires a smartphone, and smartphones are not widespread in sub-Saharan Africa since most residents cannot afford one. Besides showing real-time population mobility, CDR data can be used to predict future mobility patterns. An impedance model uses CDR data to predict how people will move during an epidemic. According to a study in which it was used to estimate population mobility during the 2010 cholera epidemic in Haiti, it can predict population mobility with reasonable accuracy in the absence of other parameters, using only CDR data (Sallah). This makes an impedance model a powerful tool in low-income countries such as those in sub-Saharan Africa. It was also especially accurate when predicting the movement of heterogeneous populations, which are prevalent in many low-income countries where people in the same area may have very different levels of wealth. 
Overall, this model gives health officials another way to take the initiative in addressing infectious diseases, alongside predicting their spread. This more people-focused model can help inform responders where to concentrate resources before an area becomes heavily infected. This preventive measure can help limit the spread of infectious disease in a way similar to the DNN and LSTM models.
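To show how tower-level records can become movement patterns, the sketch below counts trips between regions from a handful of CDR-style rows. It is a minimal illustration rather than the impedance model itself, and the record layout, the tower-to-region lookup, and all identifiers are hypothetical.

```python
from collections import Counter, defaultdict


def origin_destination_counts(records, tower_region):
    """Count region-to-region moves from CDR-style records.

    records: iterable of (user_id, timestamp, tower_id) tuples.
    tower_region: dict mapping each tower_id to the region it serves.
    """
    events_by_user = defaultdict(list)
    for user_id, timestamp, tower_id in records:
        events_by_user[user_id].append((timestamp, tower_region[tower_id]))

    flows = Counter()
    for events in events_by_user.values():
        events.sort()  # order each user's events by time
        for (_, origin), (_, destination) in zip(events, events[1:]):
            if origin != destination:
                flows[(origin, destination)] += 1
    return flows


# Hypothetical anonymized records: (user, time step, tower).
cdr = [
    ("u1", 1, "t1"), ("u1", 5, "t2"), ("u1", 9, "t3"),
    ("u2", 2, "t3"), ("u2", 4, "t1"),
    ("u3", 3, "t1"), ("u3", 7, "t2"),
]
regions = {"t1": "A", "t2": "A", "t3": "B"}

flows = origin_destination_counts(cdr, regions)
print(dict(flows))
```

Because location is only known to the nearest tower, moves within a single tower's coverage area are invisible here, which reflects the coarse spatial resolution described above.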

Cell phone data presents a slightly different perspective on the spread of infectious disease compared to neural networks, since the DNN and LSTM methods focus on the spread of the disease while cell phone data looks at the movement of people. CDR data is uniquely useful because one of the best ways to mitigate the spread of a disease is to limit movement, which decreases the number of potential infections. Tracking the movement of people in real time can inform public health officials about the effectiveness of legislation or recommendations. The power of real-time tracking can be illustrated by Ghana’s COVID-19 restrictions and the data gathered on the movement of Ghana’s population before and after the restrictions were put into place. Using cell phone data, movement between and within Ghana’s regions could be monitored (Flowminder). The population was shown to move less both between and within regions after the movement restrictions were put into place, suggesting the restrictions were successful. Using deep learning algorithms and population tracking via cell phone data, predicted infections, recorded infection rates, and population movement can be compared to give public health officials a comprehensive view of an epidemic within a country.
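One simple way to judge whether restrictions changed movement is to compare daily counts against a pre-restriction baseline. The sketch below assumes a hypothetical list of daily trip counts in which the first few days form the baseline; neither the numbers nor the function name come from the cited report.

```python
def percent_change_from_baseline(daily_counts, baseline_days):
    """Return the percent change of each later day relative to the mean
    of the first `baseline_days` days."""
    baseline = sum(daily_counts[:baseline_days]) / baseline_days
    return [
        100.0 * (count - baseline) / baseline
        for count in daily_counts[baseline_days:]
    ]


# Hypothetical daily trip counts: four baseline days, then restrictions.
trips = [1000, 1040, 960, 1000, 900, 600, 500]
changes = percent_change_from_baseline(trips, baseline_days=4)
print(changes)
```

A sustained negative change after the restriction date is consistent with reduced movement, although it cannot by itself separate the effect of the policy from other causes.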

Along with the spread of infectious disease, access to healthcare is an issue in sub-Saharan Africa, especially in rural areas. For example, in Ghana, Somaliland, and Sudan, less than 50% of citizens have reported ever being in contact with a medical professional (Deaton). One of the metrics on which sub-Saharan Africa struggles is maternal health, where it ranks worst in the world at 546 deaths per 100,000 live births. Researchers studying Ghana’s Eastern Region postulated that this was due in part to the distance mothers needed to travel to reach a health center with skilled birth attendance (Dotse-Gborgbortsi). To test this, they used HMIS data drawn from DHIMS2 to locate mothers’ places of residence and measure the distance from each home to the hospital or health center where the mother gave birth. Using the gathered data, three maps were constructed: two displaying the health center where mothers were expected to give birth, and one showing where they actually gave birth. The expected and actual distances were not consistent, and mothers traveled farther than expected. On average, mothers traveled 5.73 km to give birth, but in rural areas mothers traveled an average of 7.53 km. When distance was compared with the quality of the health centers used, the study found that, in Ghana, use of quality health facilities decreased by 6.7% per kilometer. In Zambia, the decrease was much steeper at 36% per kilometer (Dotse-Gborgbortsi). Access to health centers is at the core of good public health, and this study showed that in low-income sub-Saharan countries such as Ghana and Zambia, access remains an issue. Also, since the actual distance traveled by the mothers was farther than the expected distance, some health centers were bypassed. 
This suggests that the quantity of health centers in Sub-Saharan Africa is not entirely the issue, but the quality of those health centers also needs to be addressed. Rural populations are most affected by the lack of access to effective healthcare since many rural areas will be farther away from health centers than urban ones, and the closest health centers may be worse on average. Even with the rapidly increasing urban populations in Sub-Saharan Africa, rural populations must not be left behind in development.
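The comparison between expected and actual travel can be sketched with straight-line distances. The code below uses the haversine formula to find each mother's nearest facility and checks whether the facility she used was a different one. The coordinates, facility names, and the Earth radius constant are illustrative assumptions and are not taken from the study's data.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius, an approximation


def haversine_km(lat1, lon1, lat2, lon2):
    """Straight-line (great-circle) distance between two points in km."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = (
        math.sin(dphi / 2) ** 2
        + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2
    )
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def nearest_facility(home, facilities):
    """Name of the facility closest to a (lat, lon) home location."""
    return min(
        facilities, key=lambda name: haversine_km(*home, *facilities[name])
    )


def bypass_summary(births, facilities):
    """Return (mean km traveled, share of births that bypassed the
    nearest facility). births: list of (home_coords, facility_used)."""
    total_km = 0.0
    bypassed = 0
    for home, used in births:
        total_km += haversine_km(*home, *facilities[used])
        if used != nearest_facility(home, facilities):
            bypassed += 1
    return total_km / len(births), bypassed / len(births)


# Hypothetical facilities and births, given as (latitude, longitude).
facilities = {"clinic": (6.10, -0.25), "hospital": (6.20, -0.25)}
births = [((6.11, -0.25), "hospital"), ((6.19, -0.25), "hospital")]

mean_km, bypass_share = bypass_summary(births, facilities)
print(round(mean_km, 2), bypass_share)
```

Straight-line distance understates real travel along roads, so results from a sketch like this are best read as a lower bound on the effort required to reach care.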

While conducting my research, I first focused on data science techniques since I knew that understanding the methods used in the articles I read would take the most time. As a result, I did not have much of an idea of what I wanted to investigate until late into my bibliography. However, I did try to focus on specific regions such as Africa and Asia in my initial annotations since I have some familiarity with these regions. I also had a general idea of the umbrella issue I wanted to address, which was health. As I worked on my bibliography, I found the most interesting methods to be those that attempted to predict an outcome, such as the spread of infectious disease or population statistics. I then focused my bibliography on these types of techniques, which led me to Bayesian modeling, neural networks, and the use of CDR data. My human development topic followed from the methods I wanted to investigate, which led me to look at the spread of infectious disease, the evaluation of subsequent responses, and physical distance to healthcare facilities. After doing my preliminary research, I have noticed that areas assumed to be unpopulated are often left out of data sets for the sake of accuracy and consistency. This is a glaring gap in my human development issue since these locations are not all unpopulated, so at least some people cannot be helped directly by data science methods. Poorer parts of countries have a similar issue since they are not likely to produce as much data, such as CDR data, which leaves them out of the algorithms that use it. A gap I may want to investigate is how to incorporate residents of regions assumed to be unpopulated, along with poorer citizens in rural areas, into data sets that do not capture them directly. 
Overall, I want to continue to study how these methods can help mitigate the spread of infectious disease and find ways to include rural populations and sparsely populated areas into data science algorithms.

Data science allows us to look at a wide range of data and draw more accurate conclusions about a region as a result. I have discussed some very large data sets, such as national weather data, CDR data, and census data; all of these would be almost impossible to analyze without data science methods. The ability to analyze these data sets is what data science contributes to human development: the capability to quantitatively capture and understand the big picture of a human development problem, and also to narrow it down to its fine details. Each nation is uniquely complex, with its own social, economic, and political systems. As a result, no two regions have the same solutions to their development problems. Data science addresses this as well, since the data sets it works with are so extensive that they can capture both the minute nuances and the great differences between countries. Overall, the sheer scale of data science is what makes it such a powerful tool for addressing human development in all of its complexity.

Annotated Bibliography

Dotse-Gborgbortsi, W.; Dwomoh, D.; Alegana, V.; Hill, A.; Tatem, A.J.; Wright, J.: 2020. The influence of distance and quality on utilization of birthing services at health facilities in Eastern Region, Ghana. BMJ Global Health, 2020.4: article e002020. DOI: 10.1136/bmjgh-2019-002020
The maternal mortality rate in sub-Saharan Africa is high in comparison to the rest of the world, at 546 deaths per 100,000 live births, and has declined at half the rate of the rest of the world. The key factor relating to this problem has proven to be skilled birth attendance during childbirth. Good health facilities and health care are core components of social freedom, so reducing maternal mortality rates is in the interest of increasing real freedoms, and therefore development. This study aimed to use routinely collected childbirth data from the HMIS (health management information system) for hospitals in Ghana to determine the effect of distance and quality of health care facilities on the use of birthing services. To do this, the researchers used HMIS data derived from DHIMS2 (Ghana’s District Health Information Management System 2) to collect data on individual patients. DHIMS2 data recorded mothers’ places of residence along with other information such as occupation, birth outcome, and health insurance. Birthing quality was determined by whether a skilled birth attendant was present at the time of birth, and since less than 1% of hospitals did not have a skilled attendant, hospitals were used as a proxy for skilled birth attendance. The spatial distribution of potential demand for obstetric care was determined from a gridded map (100 m x 100 m) of estimated pregnancies in 2015. Mothers’ places of residence were overlaid onto this map, and straight lines from the mothers’ places of residence were used to measure distance to the nearest health center. Three maps were developed, two displaying expected movement and one displaying observed movement. Results showed that women travelled an average of 5.73 km to give birth, but women living in rural areas travelled significantly farther than those in urban areas, at 7.53 km. Also, most women bypassed their nearest health facility, although at a lower rate for hospitals. 
On average, the quality of healthcare at the observed and bypassed destinations was similar, so the nearer facilities were most likely bypassed for reasons relating to reputation or familiarity. Overall, this analysis suggests that use of quality facilities decreases by 6.7% per kilometer, a much smaller decrease than in other regions such as Zambia, where it is 36%. More importantly, this study showed the importance of keeping accurate and consistent health records and the ability of HMIS data to assist development in the domain of public health. I chose this source since it describes a facet of healthcare in developing nations, which is one of the aspects of development I am planning to consider for my topic. More specifically, I am interested in the idea of efficiently spacing hospitals and other medical centers, which would lead to better and lower-cost healthcare if the medical professionals in those hospitals are well trained. Since this is one of the ideas I may focus on, the method of creating maps using spatial distributions of mothers’ places of residence seems like a useful model for identifying good places for a hospital.

Flowminder, “COVID-19 | Ghana: Report #1: Initial insights into the effect of mobility restrictions in Ghana, using anonymized and aggregated mobile phone data”, April 03, 2020, https://statsghana.gov.gh/COVID-19%20press%20release%20report%20-%20analysis%20overview%20-%20final1.pdf
This source analyzes the use of aggregated and anonymized data from MNOs (Mobile Network Operators), which can be used to understand the mobility patterns of populations and improve planning and decision making during the COVID-19 pandemic. The researchers in this study used the MNO data to track the mobility of Ghana’s population between and within the Greater Accra and Ashanti regions, using the average number of active mobile phone subscribers in a region as a proxy for the number of people in it. COVID-19 is an important public health concern, and this article aims to judge the effectiveness of Ghana’s lockdown. As a result, this article contributes to the development of Ghana by showing the efficacy of MNO data, reducing the unfreedoms associated with disease by increasing the efficiency of social freedoms such as good public health. The article describes the mobility pattern of Ghana’s population from February 17th to March 31st, a period of 6 weeks. The first 4 weeks were used as a baseline for population mobility, representing the time before lockdown was announced. Three dates in the final 2 weeks were analyzed: March 16th, when initial restrictions were put into place; March 27th, when lockdown was announced; and March 30th, when lockdown was put into action. Data from the inter-district analysis, which measured population changes in the Ayawaso West district of the Accra region, showed a small change in the overall number of phone subscribers after restrictions were announced, but a much larger decrease after lockdown was put into action. The trend in the Awutu Senya East district was similar but not identical: it had a peak in the days following the announcement of the lockdown, even though average subscribers dropped in a similar manner to the Ayawaso West district. This was likely due to the different characteristics of each district, such as the number of people who travel there to work or socialize. 
Overall, the inter-district data suggests that the lockdown was successful in reducing travel between districts in Ghana since average subscribers dropped significantly after the lockdown started. Data was also gathered on travel between the Accra and Ashanti regions, which showed no significant decrease in travel after the lockdown was announced, but a significant decrease in inter-regional travel after the lockdown was put in place. This also reinforced the notion that the lockdown succeeded in limiting travel to only essential trips. Further analysis aims to establish the degree to which the proportion of resident to non-resident subscribers influenced this decrease in travel. Healthcare is one of the development topics I am interested in looking at, and COVID-19 is an excellent model for understanding how the modern world reacts to an epidemic. Ghana’s reaction to the pandemic shows the effectiveness of good legislation for limiting the spread of disease, which is a core principle of good public health. This study also presents a method of monitoring movement and the potential spread of disease, which is important for both creating and evaluating legislation for mitigating a disease’s impact.

Lai, S.; Farnham, A.; Ruktanonchai, N.W.; Tatem, A.J.: 2019. Measuring mobility, disease connectivity and individual risk: A review of using phone data and mHealth for travel medicine. Journal of Travel Medicine, 2019.26(3): article taz019. DOI: 10.1093/jtm/taz019
The world has become more interconnected than ever before as the mobility of human populations has increased. However, the spread of diseases has also increased in conjunction with our increased mobility, prompting a global health concern. Accurate data on people’s movement can help mitigate the spread of disease since locational data can help experts simulate the progression of epidemics and identify high-risk areas. Mobile phones are providing more accurate real-time data than ever before due to the high penetration of mobile phone usage around the world, including in lower-income regions such as sub-Saharan Africa. This article aims to highlight some of the advantages and collection methods of mobile data, specifically CDR data, localized GPS data from social media and web browsers, and mHealth applications. Developing models using this mobile data would contribute to the world’s development by decreasing the unfreedoms associated with disease and increasing social freedoms such as efficient healthcare. CDR data is routinely collected by phone operators and contains subscriber identification, the date and time of communications, and the location of the cell tower used to make a call or send a message. Since mobile phone penetration is high, anonymized CDR data can approximate the movement of a population by identifying each individual’s location from the cell tower they use. Models that have used CDR data, such as those used in efforts to eliminate malaria in sub-Saharan Africa, have been successful in identifying transmission routes and local foci. The quickly updated nature of CDR data makes efficient responses easier and planning more effective. Even though this data is widespread, its measurements can only be as precise as the distance between cell towers. GPS data from social media, such as Twitter geotags, and location data from web browsers are more accurate than CDR data. 
However, since smartphone penetration is not nearly as high in low-income regions as in developed ones, this data cannot be used as widely as CDR data. mHealth (mobile health) applications also present an unprecedented opportunity to improve travel health. They can passively track GPS data and record responses to daily health questionnaires from their users to generate accurate and reliable data on both health and location. This would create better data than previous methodologies since it eliminates recall bias. Overall, mobile data and mHealth applications would be effective in improving global public health by increasing the efficiency of responses from officials due to their accuracy, breadth, and low cost. This study is useful since it provides two different methods for tracking the movement of a population, which adds more nuance to data collection. Population tracking is imperative to understanding the spread of a virus and how to mitigate it, so these methods are very useful in the domain of public health. Also, the use of mHealth apps presents an interesting way to increase the accessibility and speed of healthcare for many people.

Douglas R. Leasure, Warren C. Jochem, Eric M. Weber, Vincent Seaman, Andrew J. Tatem. 2020. National population mapping from sparse survey data: A hierarchical Bayesian modeling framework to account for uncertainty. Proceedings of the National Academy of Sciences, Sep 2020, 201913050. DOI: 10.1073/pnas.1913050117
Accurate and available census data is integral for understanding the size and density of a country’s population, which informs administrative bodies about what legislation or other actions would benefit the country. In addition, the portion of a population that would benefit the most from good census data is the portion most at risk of disease and poverty, but accurate census data is lacking in many developing countries. Good public services fall under the domain of eliminating unfreedoms associated with disease and poverty, among other aspects of development, meaning that improving population data is relevant to development. Methods such as satellite imagery and micro censuses have helped in the past but are not nearly as effective as true population census data. Bayesian modeling frameworks allow for the estimation of a more complete data set from sparse data points. In this study, a Bayesian modeling framework was used to estimate the population and population densities of Nigeria using micro censuses. To achieve this, the researchers used a Bayesian model with a hierarchical framework that aided it in predicting population densities across settled areas and administrative units. Furthermore, they used covariates such as high-resolution satellite data to distinguish settled areas from unsettled areas. Using the Bayesian modeling framework, the researchers generated population data containing overall population and population density for 100 m x 100 m squares of settled areas. The predicted population estimates for the areas where micro census data was not collected were fairly accurate, and overall population density across regions was similar to the national average. There were some exceptions, however, such as the rural Ebonyi state and the urban Kano state, which had larger population densities than expected. 
Also, the model overestimated the population of smaller spatial areas since there were many large clusters of small populations in the model. Other limitations of this method were that it did not account for uncertainty in any of the micro censuses, it did not contain the most recent data possible so the population estimates in the micro surveys were likely underestimated, and the model assumes that no one lives in unsettled areas, which is likely false. This method of estimation has shown itself to be effective but is still less accurate and contains more uncertainty than a population census, meaning it cannot serve as a replacement for true census data. Also, this model itself does rely on census data to some degree since micro censuses were used as individual data points. Overall, this source provides an incredibly useful insight into the capabilities of Bayesian modeling. As stated above, accurate population data is imperative for effective management of public health and effective healthcare, which is probably the aspect of human development I will be focusing on.

Deaton, Angus S., and Robert Tortora. “People In Sub-Saharan Africa Rate Their Health And Health Care Among The Lowest In The World.” Health Affairs, vol. 34, no. 3, 2015, pp. 519–527., doi:10.1377/hlthaff.2014.0798.
Africa has the highest mortality rate in the world and is the only continent with more deaths attributed to infectious disease than chronic illness. Africa, especially sub-Saharan Africa, has some of the worst healthcare in the world. This article sought to gain a relatively comprehensive and unbiased view of wellbeing in sub-Saharan Africa and compare it to the rest of the world. The researchers did this by examining the results of the 2012 Gallup World Poll. A study such as this one is not as closely related to analysis using data science techniques, but it does offer context for problems within Africa. Knowing the problems experienced in a region is necessary for forming a plan to address them. As a result, this article does relate to Sen’s definition of development since it is concerned with identifying the unfreedoms of residents in sub-Saharan Africa. Using the Gallup World Poll, the researchers were able to see the overall wellbeing of many regions, such as Europe and sub-Saharan Africa. The poll revealed that sub-Saharan Africa had the worst overall wellbeing score of all regions polled, with a score of 4.39/10 as measured by the Cantril ladder. This is far lower than in developed regions such as the non-Anglo countries of northern Europe and the rich Anglo countries, which each scored highest at 6.99. Sub-Saharan Africa also scored the worst on perceived healthcare, with only 42.4% responding that they are satisfied with their healthcare. This aggregate percentage does not accurately represent the entirety of sub-Saharan Africa, however, since the countries which comprise it experience radically different healthcare quality. For example, 50% of respondents from Somaliland and Ethiopia report being in perfect health, while only 17% in Madagascar and Tanzania do. Spending is also not consistent between nations, with South Africa spending $942 per capita on healthcare, while the median for all of sub-Saharan Africa is only $109. 
A regression model created by the researchers in this study did shine some light on some possible reasons why. More than any other statistic, HIV prevalence correlated the most with quality of healthcare, especially with nations with the highest HIV prevalence. This was postulated to be because more western aid goes to countries with the highest HIV prevalence, and that this aid with the HIV epidemic could spread to other aspects of healthcare. This article is relevant to my topic since it gives a good amount of context for the state of healthcare in Sub-Saharan Africa, and some of the reasons for the blatant inequality in healthcare between countries. It also highlights some of the main problems with healthcare such as differences in spending, aid and access to trained health officials. This context will help me form the problems I will focus on the most and some of the aspects of those problems that will be the hardest and most important to address.

Chae S, Kwon S, Lee D. Predicting Infectious Disease Using Deep Learning and Big Data. Int J Environ Res Public Health. 2018;15(8):1596. Published 2018 Jul 27. doi:10.3390/ijerph15081596.
The spread of infectious disease presents a threat to the health of all nations, so mitigating it is in the interest of the whole world and reduces the unfreedoms associated with disease. Mitigating the spread of an infectious disease involves tracking its progress across a country, but many traditional methods of tracking are inadequate. Manually recording and reporting the number of positive cases is time-consuming and prone to inaccuracy. As a result, predictive models that simulate the spread of disease are valuable in coordinating responses to infectious diseases. Previous methods relied on statistical modeling and linear regression, but newer models that incorporate deep learning are a promising advancement in the field of public health. This study used South Korea as its region of interest and compared two established statistical methods of modeling the spread of infectious disease with two methods that use deep learning algorithms, to test the hypothesis that neural networks and big data can improve the accuracy of infectious disease prediction models. To test this, the researchers looked at the past spread of three diseases: chickenpox, malaria, and scarlet fever. The data used was split into four categories: search query data, social media big data, temperature data, and humidity data. The search query data was taken from the South Korean search engine Naver, and the social media big data was taken from Twitter and consisted of posts that contained keywords or phrases such as “chickenpox” and “chickenpox symptoms”. The weather data was taken from the Korea Meteorological Administration, and the values for temperature and humidity were daily averages for regions of the country. Weather data was used since the temperature and humidity of a region have been shown to be key factors in the spread of infectious disease.
The methods used for comparison were Ordinary Least Squares (OLS), Autoregressive Integrated Moving Average (ARIMA), Deep Neural Network (DNN), and Long Short-Term Memory (LSTM). The DNN and LSTM models used deep learning algorithms, and the results of all four methods were compared to the recorded spread of each disease over 576 days. After the data sets were run through the models, the predictions were compared using Root Mean Squared Error (RMSE) to test for accuracy against the known rates, and the standard deviation of the values was used to test for precision. Results showed that the methods that incorporated deep learning algorithms were more accurate than the OLS and ARIMA models, and that the DNN model was the most accurate overall. This is mostly because the DNN and LSTM models were more sensitive to increases and decreases in the number of infections, so they stayed truer to the actual number of infections throughout the timeframe of the study. This source adds some more advanced computational methods to my development problem of healthcare in Sub-Saharan Africa. In combination with call detail record (CDR) data, these methods can provide a nuanced view of an epidemic, and they are also useful for general disease tracking.
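The RMSE comparison described above can be illustrated with a minimal Python sketch. The case counts and the model names "model_a" and "model_b" are hypothetical values invented for illustration; they are not data or models from the study.

```python
import math


def rmse(predicted, observed):
    """Root mean squared error between two equal-length sequences."""
    if len(predicted) != len(observed):
        raise ValueError("sequences must have the same length")
    squared_errors = [(p - o) ** 2 for p, o in zip(predicted, observed)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


# Hypothetical daily case counts (assumption, for illustration only).
observed = [10, 12, 15, 20, 18]

# Hypothetical forecasts from two competing models.
forecasts = {
    "model_a": [11, 12, 14, 17, 18],
    "model_b": [10, 13, 15, 19, 18],
}

# Score each model against the observed counts; lower RMSE is better.
scores = {name: rmse(pred, observed) for name, pred in forecasts.items()}
best_model = min(scores, key=scores.get)
print(scores, best_model)
```

A lower RMSE means the predictions stayed closer to the recorded counts, which is the sense in which the study ranks one model as more accurate than another.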

N.A. Wardrop, W.C. Jochem, T.J. Bird, H.R. Chamberlain, D. Clarke, D. Kerr, L. Bengtsson, S. Juran, V. Seaman, A. J. Tatem. Spatially disaggregated population estimates in the absence of national population and housing census data. Proceedings of the National Academy of Sciences. Apr 2018, 115 (14) 3529-3537; DOI: 10.1073/pnas.1715305115
Census data is only useful for controlling the spread of infectious disease if it contains enough information to tell public health officials how to proceed. Even if census data is available, it may be uninformative if it is incomplete or does not have a fine enough spatial resolution. Ways to address this, such as bottom-up and top-down approaches to data disaggregation, can greatly increase the effectiveness of responses to public health emergencies, which will in turn reduce the unfreedoms associated with disease. If enough census data is available but it is too aggregated because enumeration areas are too large, or if population density data is insufficient, a top-down approach to data disaggregation is useful. Using satellite data to identify covariates such as topography and distance to the nearest road, an area’s population density can be estimated. This is possible since people are more likely to live near areas where infrastructure or natural resources are available. Topography can also affect where people live because it affects how easy it is to enter and leave an area. After the satellite data is used to estimate densities, a dasymetric map can be created to show the relative population densities within a country. The main limitation of this method is that it is only as useful as the census data it is built on, so if the census data used to create a dasymetric map is inadequate, the map will be useless. Another way to disaggregate population data is to use a bottom-up approach, which is useful when census data is incomplete or out of date. It involves using micro-census data from around a country, along with the necessary covariates, to create an estimate of local population densities. This estimation can be done using a variety of statistical models, such as hierarchical Bayesian modeling. A dasymetric map can be created using this method by putting together the individual regions where population densities were estimated.
In theory, this map should resemble the one created by a top-down approach, so the two approaches are different methods for achieving the same result. The main limitation of a bottom-up approach is that it uses sparse data as the base of its estimation, so the more data that is added, the more accurate it can be. It also can never be as accurate as a census, since it is only estimating population numbers. However, it is much cheaper than a census, so it can be useful for gathering data when a census is not possible. This source is useful for me since it directly concerns the census data aspect of my human development problem. The top-down method is especially useful for my topic since I have already researched bottom-up methods of data disaggregation and collection, so now I also have methods for making use of census data when it is available.
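The core step of the top-down approach described above, spreading one census total across smaller grid cells, can be sketched in Python as follows. This is only a simplified illustration: the function name, the cell names, and the weights are hypothetical. In practice the weights would come from a statistical model built on satellite-derived covariates, which is not shown here.

```python
def disaggregate_population(census_total, cell_weights):
    """Split one census total across grid cells in proportion to their weights."""
    total_weight = sum(cell_weights.values())
    if total_weight <= 0:
        raise ValueError("at least one cell must have a positive weight")
    return {
        cell: census_total * weight / total_weight
        for cell, weight in cell_weights.items()
    }


# Hypothetical weights (assumption): higher values stand for cells that are
# closer to roads or easier to reach, so more people are expected there.
weights = {"cell_1": 5, "cell_2": 3, "cell_3": 2, "cell_4": 0}

# Hypothetical census count for the whole enumeration area.
estimates = disaggregate_population(10_000, weights)
print(estimates)
```

Because the cell estimates always add back up to the census total, the output can only be as good as the census count it starts from, which is the limitation noted above.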

Sallah, K., Giorgi, R., Bengtsson, L. et al. Mathematical models for predicting human mobility in the context of infectious disease spread: introducing the impedance model. Int J Health Geogr 16, 42 (2017). https://doi.org/10.1186/s12942-017-0115-7
Predicting the movement of people during an epidemic is necessary for responding as quickly and effectively as possible, since organizing a response is an intensive process. As a result, models that predict population mobility are useful for infectious disease control. Older methods such as the gravity and radiation models are outdated, and the gravity model is difficult to implement in low-income countries because it requires parameters that must be fitted to existing mobility data. The researchers in this article propose an impedance model that does not rely on such parameters, so it can be implemented well in low-income countries, and it is more accurate than the gravity and radiation models. This model could help address the unfreedoms that arise with infectious disease, and therefore relates to Amartya Sen’s definition of development. The impedance model is based on Ohm’s law and has three components: the number of trips taken per day between two locations (Fij), an overall probability of mobility (α) applied to the combined size of the source and destination populations (Pi + Pj), and the distance between the locations (dij). The equation is Fij = α(Pi + Pj)/dij. All the variables can be derived from CDR data, which is the data set used for this model. The model was tested on the 2010 cholera epidemic in Haiti within a susceptible-infected-recovered (SIR) framework, and its results were compared to those of a gravity model, a radiation model, and the actual data. Results showed that the impedance model was more accurate than the gravity and radiation models, especially when only a small amount of data was available. The impedance model was also especially accurate in comparison to the others when applied to a heterogeneous population. These aspects make the impedance model especially useful in low-income countries where data is scarce and levels of wealth may be very uneven in a given area.
The main limitations of this model are that not everyone has a cell phone, so some people generate no cell phone data (although this model is stronger in this regard than previous models), and that cell phone towers may be spaced far apart, making spatial resolution low in some cases. Since I am focusing on Sub-Saharan Africa, a poor region of the world, this model is incredibly useful for predicting human mobility. In conjunction with the other models I have researched for predicting the spread of infectious disease, this model provides even more of an opportunity to act proactively to prevent the spread of infectious disease.
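The impedance equation as stated in this annotation, Fij = α(Pi + Pj)/dij, can be written as a short Python sketch. The location names, populations, distances, and the value of α below are hypothetical and chosen only for illustration; the published model may include details that this summary of the formula leaves out.

```python
def impedance_flow(pop_i, pop_j, distance_ij, alpha=1.0):
    """Estimated trips between two locations: alpha * (P_i + P_j) / d_ij."""
    if distance_ij <= 0:
        raise ValueError("distance must be positive")
    return alpha * (pop_i + pop_j) / distance_ij


# Hypothetical populations and pairwise distances (assumptions).
populations = {"A": 50_000, "B": 20_000, "C": 5_000}
distances_km = {("A", "B"): 10.0, ("A", "C"): 40.0, ("B", "C"): 25.0}

# Estimate the flow for every pair of locations with an assumed alpha.
flows = {
    (i, j): impedance_flow(populations[i], populations[j], d, alpha=0.01)
    for (i, j), d in distances_km.items()
}
print(flows)
```

In this form, the estimated flow grows with the combined population of the two locations and shrinks as the distance between them grows.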