Challenging the Norm: A Data Scientist's Essential Skill

Chapter 1: The Power of Questioning

It's hard not to be impressed by some of the most groundbreaking achievements in data science. Algorithms that outplay professional gamers, count large crowds accurately, or identify cancerous cells are remarkable accomplishments. Each of these tasks demands extensive practice from a human, and the ability of machines to replicate them, seemingly without effort, is nothing short of astounding.

However, a critical skill lies beneath these accomplishments: the capacity to question established beliefs. A decade ago, if you had suggested that machine learning could achieve what it does today, many would have been skeptical. This discerning perspective is invaluable for data scientists and highlights how we can often fall prey to the Illusory Truth Effect more than we recognize. This phenomenon can hinder advancement in the realm of machine learning.

What is the Illusory Truth Effect?

The Illusory Truth Effect describes the tendency to believe information simply because it has been repeated frequently, often without sufficient scrutiny. Consider this question:

Which planet spends the most time closest to Earth?

Your immediate response might be Venus. But why do we think this? It's rooted in the common teaching that Venus's orbit lies closer to Earth's orbit than any other planet's, leading us to assume that Venus itself must also be the nearest planet most of the time.

This notion exemplifies the Illusory Truth Effect. A quick online search reveals numerous articles claiming Venus is typically the closest planet to Earth. Interestingly, since I began writing this in March 2019, many of these claims have been challenged by newer content, yet older beliefs still linger in search results.

Which Planet Is Actually Closer?

Initially, my instinct was to agree that Venus is the closest. However, on closer examination, the nearest planet at any given moment depends on where each planet is in its orbit. There are times when Mars, Mercury, or Venus may be closest to Earth. To determine which planet spends the most time in proximity, we need to look at the distances statistically, over a long period.

My first thought experiment involved visualizing these orbits.

Diagram of planetary orbits relative to Earth

Orbital Distance Thought Experiment (Credit: Author)

As illustrated above, if we consider the circular paths of each planet relative to Earth (excluding eccentricity for simplicity), we can visualize areas where each planet is closer. The blue area shows where Venus will typically be nearer, while the green area indicates Mercury's proximity. The grey zone represents a space where either could be closer, likely balancing out over time. This leads me to suspect that Mercury might actually be closer more often than Venus.

Conducting Simulations

To validate this hypothesis, I conducted a simple calculation using the semi-major axes and orbital periods of each planet (again, ignoring eccentricity) through a Python simulation in a Jupyter Notebook, running the model over 100 years.

Simulation of planetary orbits without eccentricity
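
The kind of simulation described above can be sketched in a few lines of Python. This is a minimal sketch, not the original notebook: it assumes circular, coplanar orbits, approximate semi-major axes and periods, and all planets starting aligned at t = 0. The names PLANETS, position and distances_from_earth are hypothetical.

```python
import numpy as np

# Assumed approximate values: (semi-major axis in AU, orbital period in years).
PLANETS = {
    "Mercury": (0.387, 0.241),
    "Venus": (0.723, 0.615),
    "Earth": (1.000, 1.000),
    "Mars": (1.524, 1.881),
}

def position(a, period, t):
    """x, y on a circular orbit of radius a; every planet starts at angle 0."""
    angle = 2.0 * np.pi * t / period
    return a * np.cos(angle), a * np.sin(angle)

def distances_from_earth(years=100.0, steps_per_year=1000):
    """Earth-to-planet distance (AU) at evenly spaced time points."""
    t = np.linspace(0.0, years, int(years * steps_per_year), endpoint=False)
    ex, ey = position(*PLANETS["Earth"], t)
    out = {}
    for name, (a, period) in PLANETS.items():
        if name == "Earth":
            continue
        px, py = position(a, period, t)
        out[name] = np.hypot(px - ex, py - ey)
    return out

distances = distances_from_earth()
medians = {name: float(np.median(d)) for name, d in distances.items()}
for name, m in sorted(medians.items(), key=lambda kv: kv[1]):
    print(f"{name}: {m:.3f} AU")
```

The last lines summarise each planet's distances by their median. The exact digits depend on the assumed orbital values, starting positions and time step, so they may differ slightly from run to run of a different setup.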

From the simulation, I calculated the distances from Earth at various time points and determined the median distances as follows:

  • Mars: 1.826 AU
  • Venus: 1.231 AU
  • Mercury: 1.073 AU

(AU stands for Astronomical Unit, a unit of length roughly equal to the average Earth-Sun distance, about 149.6 million km.)
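
As a cross-check on these medians, the same simplifying assumptions (circular, coplanar orbits) allow a short derivation. The notation is illustrative and not taken from the original notebook: a is the planet's orbital radius in AU, Earth's radius is 1 AU, and θ is the Sun-centred angle between Earth and the planet. By the law of cosines:

```latex
d(\theta) = \sqrt{1 + a^{2} - 2a\cos\theta},
\qquad
\operatorname{median}(d) = d\!\left(\tfrac{\pi}{2}\right) = \sqrt{1 + a^{2}}
```

Both bodies move at constant angular speed, so θ changes at a constant rate and is spread evenly over time. The distance grows as θ moves from 0 to π, which puts the median at θ = π/2. With assumed radii of about 0.387, 0.723 and 1.524 AU, this gives roughly 1.072, 1.234 and 1.823 AU for Mercury, Venus and Mars, within a few thousandths of an AU of the simulated values.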

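I use the median rather than the mean because the median is robust to extreme values and has a direct interpretation in terms of time: each planet is nearer than its median distance at half of the simulated time points. Mercury has the smallest median, so it spends half of its time within 1.073 AU of Earth, while Venus and Mars are within that distance less than half of the time.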
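
A more direct way to ask "which planet spends the most time nearest to Earth" is to count, at each time point, which planet is closest. The sketch below does this under the same assumptions as before (circular, coplanar orbits, approximate orbital values, planets aligned at t = 0). It is an illustration rather than the author's code, and ORBITS and closest_fractions are hypothetical names.

```python
import numpy as np

# Assumed approximate values: (semi-major axis in AU, orbital period in years).
ORBITS = {"Mercury": (0.387, 0.241), "Venus": (0.723, 0.615), "Mars": (1.524, 1.881)}

def closest_fractions(years=100.0, n_samples=200_000):
    """Share of sampled time points at which each planet is nearest to Earth."""
    t = np.linspace(0.0, years, n_samples, endpoint=False)
    # Positions as complex numbers; Earth has radius 1 AU and period 1 year.
    earth = np.exp(2j * np.pi * t)
    names = list(ORBITS)
    dist = np.stack([
        np.abs(a * np.exp(2j * np.pi * t / period) - earth)
        for a, period in ORBITS.values()
    ])
    nearest = np.argmin(dist, axis=0)  # row index of the closest planet
    counts = np.bincount(nearest, minlength=len(names))
    return {name: float(c) / n_samples for name, c in zip(names, counts)}

fractions = closest_fractions()
for name, share in fractions.items():
    print(f"{name}: nearest {share:.1%} of the time")
```

This statistic is related to the median comparison but not identical to it: the median describes each planet on its own, while the counts compare the three planets at every instant.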

I also plotted an Empirical Cumulative Distribution Function (ECDF) for each planet, marking the 50% line, which corroborates the earlier findings.

Empirical Cumulative Distribution Function of planetary distances
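
For readers who want to reproduce an ECDF, here is a minimal sketch using only NumPy. The functions ecdf and value_at_level are hypothetical helpers, and the input is an illustrative stand-in: Earth-to-Mercury distances on circular orbits, sampled at evenly spaced angles, with an assumed radius of 0.387 AU.

```python
import numpy as np

def ecdf(samples):
    """Return sorted values and the fraction of samples at or below each one."""
    x = np.sort(np.asarray(samples, dtype=float))
    y = np.arange(1, x.size + 1) / x.size
    return x, y

def value_at_level(x, y, level=0.5):
    """Smallest sorted value whose ECDF reaches the requested level."""
    return float(x[np.searchsorted(y, level)])

# Illustrative input: distances from the law of cosines for evenly spaced
# Sun-centred angles between Earth (radius 1 AU) and Mercury (assumed 0.387 AU).
theta = np.linspace(0.0, 2.0 * np.pi, 10_000, endpoint=False)
mercury = np.sqrt(1.0 + 0.387**2 - 2.0 * 0.387 * np.cos(theta))

x, y = ecdf(mercury)
half_level = value_at_level(x, y, 0.5)
print(f"ECDF crosses 50% at about {half_level:.3f} AU")
```

The point where the curve crosses the 50% line is the median, which is why the ECDF and the median table tell the same story. Passing x and y to a step plot for each planet would give a figure like the one described.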

Conclusion: The Need for Critical Thinking

This analysis aligns with data from Wolfram Alpha, with the caveat that my model doesn't account for eccentric orbits. Surprisingly, my findings suggest that Mercury is closer more frequently than Venus, which challenges the common assumption. It's essential to question established facts and conduct your own investigations, as misinformation can become entrenched simply through repetition (the Illusory Truth Effect).

By engaging in thorough research, one can often discern the validity of claims, embodying a spirit of inquiry reminiscent of Descartes. This is the essence of robust data science.

Chapter 2: Essential Skills for Data Scientists

The #1 Skill That Holds (Most) Data Scientists Back - In this insightful video, discover the crucial skill that many data scientists overlook and how it can impact their work.

MC Challenge Convention: Autonomous Decision Science in Global Consumer Goods and Retail Industries - This video explores how autonomous decision science is transforming industries, particularly in consumer goods and retail.
