Decision-makers at the micro, meso, and macro levels use data, explicitly or implicitly, to make decisions. More often than not, households, firms, development practitioners, and policymakers base their decisions on ‘ever-pouring data.’ The need of academicians and researchers for data and analytical tools is almost insatiable.
There have been innovations in the ways data are collected, in checking the robustness of conclusions against data from different sources, and in the methodologies used to calibrate and process data. Applying data science tools to ‘messy data’ to arrive at meaningful insights has become the order of the day, be it in the natural sciences or the social sciences.
A reliable framework
Data and data science analytical tools are critical inputs for evidence-based research. Let us talk about the application of data science in various spheres of knowledge.
In the domain of economics, the growth of e-commerce has led to the generation of large volumes of messy data in a non-intrusive manner. Every click on a device used for purchasing generates data. The pandemic has boosted both e-commerce and the availability of data.
E-commerce firms, especially the big ones, use data science tools to clean the data and decode consumer preferences: the products and services demanded, the quality of products demanded, consumer satisfaction or dissatisfaction, and repeat purchases.
There is a need for talent to process data and automate actions based on the analysis. Suggesting a substitute good or service to a consumer almost instantaneously is one example of how data science is used to expand the consumer base. If you search for a house, you will start getting offers from a number of real estate developers in no time. Networking and sharing of data have thus become possible, though at times they violate the norms of privacy, and here again personnel are needed who can attend to these issues.
The tools of data science have found increasing application in the domain of public policy. Examples include using tax returns to measure inequality and temporary or permanent migration patterns, using satellite data to measure the area and density of forests, and using night light data as a proxy for economic activity.
Amy Finkelstein of the Massachusetts Institute of Technology says, “It opened my eyes to the idea that one could use data to inform what had otherwise seemed like ideological debates.” Her work encompasses issues ranging from the estimation of the welfare benefits of alternative social insurance programmes to the effectiveness of mammogram screening.
The underlying connection in her work is the use of large data sets to test economic models for arriving at conclusions which do not necessarily validate the conventional wisdom.
Raj Chetty, Professor of Public Economics at Harvard University, is often described as a ‘Data Evangelist’. His work on ‘how to improve opportunities for children by using big data’ caught the eye of many. His seminal work on identifying barriers to economic opportunity and devising solutions for breaking them won him accolades.
By identifying patterns in big data covering 20 million Americans, he could grade neighbourhoods in the US by the opportunity they offered people to escape poverty. In brief, big data can be used to evolve theoretical frameworks, rather than only to test existing theories.
Traditionally, courses on data science have been taught by scientists and engineers, while courses on social sciences and development studies have been taught by the faculties of humanities and social sciences. This compartmentalisation has to be broken. There is a growing realisation that an integrated approach to the study of societal issues is a much better way to examine problems and work out solutions.
(The writer is Senior Professor, Shiv Nadar University, Chennai.)