Challenging the Norm: A Data Scientist's Essential Skill
Chapter 1: The Power of Questioning
It's hard not to be impressed by some of the most groundbreaking achievements in data science. Algorithms that outplay professional gamers, count large crowds accurately, or identify cancerous cells are remarkable accomplishments. Each of these tasks takes a person extensive practice to master, and the ability of machines to replicate them, seemingly without effort, is nothing short of astounding.
However, a critical skill lies beneath these accomplishments: the capacity to question established beliefs. A decade ago, if you had suggested that machine learning could achieve what it does today, many would have been skeptical. A questioning perspective is invaluable for data scientists, because we fall prey to the Illusory Truth Effect more often than we recognize. This phenomenon can hinder advancement in machine learning.
What is the Illusory Truth Effect?
The Illusory Truth Effect describes the tendency to believe information simply because it has been repeated frequently, often without sufficient scrutiny. Consider this question:
Which planet spends the most time closest to Earth?
Your immediate response might be Venus. But why do we think this? It's rooted in the common teaching that Venus's orbit lies closest to Earth's, leading us to assume that Venus itself must usually be the nearest planet.
This notion exemplifies the Illusory Truth Effect. A quick online search reveals numerous articles claiming Venus is typically the closest planet to Earth. Interestingly, since I began writing this in March 2019, many of these claims have been challenged by newer content, yet older beliefs still linger in search results.
Which Planet Is Actually Closer?
Initially, my instinct was to agree that Venus is the closest. On closer examination, however, the nearest planet at any given moment depends on where each planet is in its orbit. At different times Mars, Mercury, or Venus can be the closest to Earth. To determine which planet spends the most time nearest to us, we need to look at how each planet's distance is distributed over time.
My first thought experiment involved visualizing these orbits.
Orbital Distance Thought Experiment (Credit: Author)
As illustrated above, if we consider the circular paths of each planet relative to Earth (excluding eccentricity for simplicity), we can visualize areas where each planet is closer. The blue area shows where Venus will typically be nearer, while the green area indicates Mercury's proximity. The grey zone represents a space where either could be closer, likely balancing out over time. This leads me to suspect that Mercury might actually be closer more often than Venus.
Conducting Simulations
To test this hypothesis, I ran a simple Python simulation in a Jupyter Notebook, using the semi-major axis and orbital period of each planet (again ignoring eccentricity) and running the model over 100 years.
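A simulation along these lines can be sketched in a few lines of NumPy. This is a minimal sketch rather than the original notebook: the semi-major axes and periods are approximate textbook values, the orbits are circular and coplanar, all planets are assumed to start aligned with Earth, and the names `PLANETS`, `position`, and `simulate_distances` are made up for illustration.

```python
import numpy as np

# Approximate semi-major axes (AU) and orbital periods (years).
# Assumed textbook values for this sketch, not taken from the article.
PLANETS = {
    "Mercury": (0.387, 0.2408),
    "Venus": (0.723, 0.6152),
    "Mars": (1.524, 1.8808),
}
EARTH = (1.0, 1.0)


def position(a, period, t):
    """x, y coordinates on a circular orbit of radius a at times t (years)."""
    angle = 2.0 * np.pi * t / period
    return a * np.cos(angle), a * np.sin(angle)


def simulate_distances(years=100, steps_per_year=365):
    """Return the time grid and each planet's distance from Earth in AU."""
    t = np.linspace(0.0, years, years * steps_per_year, endpoint=False)
    ex, ey = position(*EARTH, t)
    distances = {}
    for name, (a, period) in PLANETS.items():
        px, py = position(a, period, t)
        distances[name] = np.hypot(px - ex, py - ey)
    return t, distances


t, distances = simulate_distances()
for name, d in distances.items():
    print(f"{name}: min {d.min():.3f} AU, max {d.max():.3f} AU")
```

Roughly daily sampling keeps the arrays small while still resolving Mercury's fast orbit; a finer grid would change the results only marginally under these assumptions.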
From the simulation, I calculated the distances from Earth at various time points and determined the median distances as follows:
- Mars: 1.826 AU
- Venus: 1.231 AU
- Mercury: 1.073 AU
(AU stands for astronomical unit, a unit of length roughly equal to the average Earth-Sun distance, about 149.6 million km.)
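I use the median rather than the mean because the mean can be skewed by the extreme distances a planet reaches on the far side of the Sun. The median is the distance a planet stays within for half of the simulated time: Mercury is within 1.073 AU of Earth at 50% of the time steps, whereas Venus and Mars are that close less than half the time. This suggests that Mercury does spend more time near Earth.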
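The medians above can be reproduced approximately, and the question "which planet is nearest most often?" can be answered directly by counting time steps. The sketch below makes the same simplifying assumptions as the article (circular, coplanar orbits) plus a few of its own: approximate textbook orbital values, daily sampling, and all planets starting aligned with Earth. Positions are stored as complex numbers purely for brevity.

```python
import numpy as np

# Approximate semi-major axes (AU) and periods (years); assumed textbook values.
names = ["Mercury", "Venus", "Mars"]
a = np.array([0.387, 0.723, 1.524])
period = np.array([0.2408, 0.6152, 1.8808])

# Daily samples over 100 years, as a column so it broadcasts across planets.
t = np.linspace(0.0, 100.0, 36500, endpoint=False)[:, None]

earth = np.exp(2j * np.pi * t)                 # Earth: 1 AU, 1-year period
planets = a * np.exp(2j * np.pi * t / period)  # shape (steps, 3)
dist = np.abs(planets - earth)                 # distance from Earth in AU

median = np.median(dist, axis=0)
mean = dist.mean(axis=0)

# Share of time steps at which each planet is the nearest of the three.
share = np.bincount(dist.argmin(axis=1), minlength=len(names)) / len(dist)

for i, name in enumerate(names):
    print(f"{name}: median {median[i]:.3f} AU, mean {mean[i]:.3f} AU, "
          f"nearest {share[i]:.1%} of the time")
```

Counting the nearest planet at each step is the more direct test of the claim; the median is a convenient summary that points the same way under these assumptions.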
I also plotted an empirical cumulative distribution function (ECDF) for each planet, marking the 50% line, which corroborates the earlier findings.
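For readers who want to build such a curve themselves, here is a minimal sketch of an ECDF. The helper name `ecdf` is hypothetical, and the input is an illustrative Mercury distance series computed under the same circular-orbit assumption, using approximate values of 0.387 AU and 0.2408 years.

```python
import numpy as np


def ecdf(samples):
    """Return sorted samples and the fraction of samples at or below each."""
    x = np.sort(np.asarray(samples, dtype=float))
    y = np.arange(1, x.size + 1) / x.size
    return x, y


# Illustrative input: Mercury's distance from Earth on circular, coplanar
# orbits, sampled daily for 100 years (assumed values, see the text above).
t = np.linspace(0.0, 100.0, 36500, endpoint=False)
mercury = np.abs(0.387 * np.exp(2j * np.pi * t / 0.2408) - np.exp(2j * np.pi * t))

x, y = ecdf(mercury)

# The distance at which the ECDF first reaches 0.5 approximates the median.
half = x[np.searchsorted(y, 0.5)]
print(f"ECDF reaches 50% at {half:.3f} AU")
```

To draw the curve, something like Matplotlib's `plt.step(x, y, where="post")` with a horizontal line at 0.5 would do; repeating this for Venus and Mars on the same axes shows which curve crosses the 50% line first.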
Conclusion: The Need for Critical Thinking
This analysis aligns with data from Wolfram Alpha (noting that my model doesn't account for eccentric orbits). Surprisingly, my findings suggest that Mercury is the closest planet more often than Venus, which challenges the common assumption. It's essential to question established facts and conduct your own investigations, as misinformation can become entrenched simply through repetition (the Illusory Truth Effect).
By engaging in thorough research, one can often discern the validity of claims, embodying a spirit of inquiry reminiscent of Descartes. This is the essence of robust data science.
Chapter 2: Essential Skills for Data Scientists
The #1 Skill That Holds (Most) Data Scientists Back - In this insightful video, discover the crucial skill that many data scientists overlook and how it can impact their work.
MC Challenge Convention: Autonomous Decision Science in Global Consumer Goods and Retail Industries - This video explores how autonomous decision science is transforming industries, particularly in consumer goods and retail.