(In favor of Kitchin)
The opportunities that big data provides are extensive, owing to its volume, velocity, variety, and exhaustive scope. These properties make big data and data science powerful tools for analysis in nearly every field. However, data science is not a silver bullet and cannot replace traditional epistemology: theories and conclusions formed solely from data lose the context in which that data was produced, which makes them rigid and sometimes inaccurate.
The strongest proponents of the power of data science, such as Anderson, take an empiricist position: the sheer volume of big data, combined with analysis techniques, can reveal the truth inherent in the data without any guiding theory. This is true to some extent. Data will contain trends, and those trends may exhibit high correlations. If a correlation is near perfect, a causal relationship may be present. This approach is supposedly better than traditional scientific methods for two reasons: it is more efficient and accurate than theories and models, and it eliminates human bias. At least the second claim is incorrect. The analysis techniques that produce the supposedly objective results carry human bias, since humans designed the algorithms. The bias is also visible in the fact that many different analysis techniques can be applied to the same data set. Choosing among them requires human judgement, so the results cannot be entirely unbiased.
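The point about competing analysis techniques can be made concrete with a small sketch. The numbers below are invented purely for illustration, and the helper names (`pearson`, `ranks`, `spearman`) are hypothetical, not taken from any source discussed here. The sketch applies two standard correlation measures to the same data and gets two different answers, so the analyst still has to decide which one to report.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation: measures how close the relationship is to a straight line."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)

def ranks(values):
    """Rank each value from 1 (smallest) upward. Assumes no tied values."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0] * len(values)
    for rank, index in enumerate(order, start=1):
        result[index] = rank
    return result

def spearman(xs, ys):
    """Spearman correlation: Pearson correlation applied to the ranks."""
    return pearson(ranks(xs), ranks(ys))

# Invented data: y always rises with x, but not in a straight line.
xs = [1, 2, 3, 4, 5, 6]
ys = [x ** 3 for x in xs]

pearson_r = pearson(xs, ys)
spearman_r = spearman(xs, ys)

print(f"Pearson:  {pearson_r:.3f}")
print(f"Spearman: {spearman_r:.3f}")
```

On this data the rank-based measure reports a perfect relationship while the linear measure does not. Neither number is wrong; they answer different questions, and picking one is a human decision made before the data can "speak".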
(Would also add a section on social sciences and humanities but I did not have time)
(In favor of Anderson)
Data science works with data sets that are so large that they can cover an entire population, which makes statistical analysis more and more accurate. At some point, the trends and correlations found in the data may become so strong and so numerous that they can be treated as causal. At that point, theory has been eliminated from the formation of a conclusion. This is the underlying power of data science: the ability to form a conclusion without a preceding theory.
What sets big data apart from earlier data is, in no small part, its immense volume. With this volume, analysis techniques can uncover many underlying trends and form conclusions faster than a traditional scientific approach would. Google demonstrates this: it revolutionized advertising without knowing what was in its advertisements. Google uses the personalized data generated by its users to find trends in what they look at, and shows ads related to their personal habits. These targeted ads are often eerily close to what people are thinking about or looking for, yet they are selected by a machine rather than by the judgement of a person. The accuracy of these ads illustrates the idea that statistical correlations can produce accurate results, in this case targeted ads, more efficiently than a person could. The sheer number of people who use Google and receive relevant targeted ads shows this.
Data science also presents a more accurate view of the world than a model can. Models and theories are often crude and inaccurate since they are, at heart, generalizations. A data science result is not a generalization based on what little information is readily available, but a conclusion drawn mathematically from an enormous and growing amount of data. Mendelian genetics is an example: it stresses the importance of dominant and recessive genes and uses a simple model to predict which traits will be expressed. This model is not accurate, as more recent research on protein-DNA interactions has shown. A data science method here would be to analyze the many gene-protein interactions from many people to understand where genes are expressed and where they are not. This method does not rely on a potentially inaccurate model. Instead it uses the actual full or near-full-scale data, which does not have the inaccuracy associated with a model.