8.25 Humbler Data Science Reflection
Erin Liebe
Professor Tyler Frazier
Evolving Solutions - DATA 150
25 August 2020
Prioritizing the People Represented in the Data
Prior to reading Joshua Blumenstock’s article “Don’t forget people in the use of big data for development,” I had never considered how vulnerable big data is to manipulation. When I am fed a statistic from the news or read about population projections in an article, I am quick to trust the source and assume the data is not only clean but also authentically represented. However, I behaved very differently after the recent upsurge of support for the Black Lives Matter movement. After George Floyd’s murder, I sifted through donation options and found myself skeptical of donating to or buying from organizations that seemed unreliable or unverified, so I sought out alternatives with a bigger following. It is fascinating how careful I was with my own paychecks, even when donating to a movement and community I feel strongly about, simply because the organizations were small and my finances were immediately affected.
In situations far removed from my finances, well-being, and community, I am much less careful. When Blumenstock explained how people in developing countries forged thatched-roof housing in order to manipulate the system and receive financial benefits, I realized how tempting it is to trust big data simply because a news source’s credibility or a government official’s name is attached to it. I would instinctively assume that only the neediest families are being compensated and that the only wealthy people benefiting from such an initiative are the data scientists gaining social points for being charitable with their time and skills. This article completely changed that view. I agree with Blumenstock that we need a humbler data science, one that gives equal consideration to the people the statistics are about and pushes us to examine the field’s flaws and vulnerabilities.
The successful uses of data science that Blumenstock lists under “promise” convinced me of its efficacy. However, as he shows under “pitfalls,” the ability to identify and connect with the less fortunate, distribute resources, and track natural disasters can still fall short. What stuck out to me in particular is how little time data scientists have had with these technologies to consider all of their limitations. While a lack of experience can be troubling, a lack of knowledge can also inspire an open-minded approach and attention to detail. Spending time trying to understand the shortcomings of data science and how to combat them teaches us more about the field as a whole than merely identifying its flaws and choosing other technologies for a given project. A learning period seems to be in order so that data scientists, humanitarian lobbyists, and natural disaster recovery workers can employ data science properly. This calls for a healthy balance between recognizing the benefits of data science and remembering that numbers and charts are fallible and sometimes even false. The data represent people who are often misunderstood, misrepresented, or underrepresented, and it is important to consider how that translates into the data collected about them.