# Introduction
Let’s face it: the clean, textbook version of data science rarely holds up in practice. We learn methods using perfectly curated, normally distributed data, but real-world projects throw us curveballs — extreme outliers, heavily skewed distributions, and wildly uneven variances.
In an earlier piece on building an exploratory data analysis (EDA) pipeline with Pingouin, we saw how statistical tests can flag when data breaks key assumptions like normality or homoscedasticity. But what happens when those tests fail? Discarding the data isn’t the answer — going robust is.
This article walks you through the art of applying robust statistics in your data science workflow. These are mathematical techniques specifically designed to produce trustworthy results even when your data violates standard assumptions or is riddled with outliers and noise. Using a “choose your own adventure” format, we’ll walk through three realistic scenarios using Python’s Pingouin library to tackle the messiest data challenges you’ll face on the job.
# Getting Started
First, let’s install (if you haven’t already) and import Pingouin and Pandas, then load the wine quality dataset available at the link below.
```python
!pip install pingouin pandas
```

```python
import pandas as pd
import pingouin as pg

# Loading our messy, real-world-like dataset with red and white wine samples
# (placeholder: substitute the wine quality dataset link from the article)
url = "<wine-quality-dataset-url>"
df = pd.read_csv(url)

# Taking a quick look at what we're working with
df.head()
```

If you read the previous Pingouin article, you’ll recall this is a notoriously messy dataset that falls short on several standard assumptions. Now we’ll dive into three separate “adventures,” each presenting a specific scenario, the core problem it poses, and a robust solution to handle it.
# Adventure 1: When the Normality Test Fails
Imagine we run normality tests on two groups: white wine samples and red wine samples.
```python
# Splitting alcohol content by wine type
white_wine_alcohol = df[df['type'] == 'white']['alcohol']
red_wine_alcohol = df[df['type'] == 'red']['alcohol']

print("Normality test for White Wine Alcohol content:")
print(pg.normality(white_wine_alcohol))
print("\nNormality test for Red Wine Alcohol content:")
print(pg.normality(red_wine_alcohol))
```

You’ll discover that neither group follows a normal distribution, with extremely low p-values. While non-normality alone doesn’t confirm the presence of outliers or skewness, a strong departure from normality often hints that such issues lurk in the data. Running a t-test to compare means under these conditions would be risky and likely produce misleading results.
The robust solution here is the Mann-Whitney U test. Rather than comparing group averages, this test works with data ranks — essentially sorting all values from lowest to highest, such as arranging all wines by alcohol content. This rank-based strategy is the key trick that neutralizes the outsized influence of outliers. Here’s how to apply it:
```python
# Splitting into our two groups
red_wine = df[df['type'] == 'red']['alcohol']
white_wine = df[df['type'] == 'white']['alcohol']

# Running the robust Mann-Whitney U test
mwu_results = pg.mwu(x=red_wine, y=white_wine)
print(mwu_results)
```

Output:

```
         U-val alternative     p-val       RBC      CLES
MWU  3829043.5   two-sided  0.181845 -0.022193  0.488903
```

With a p-value above 0.05, we conclude there’s no statistically significant difference in alcohol content between the two wine types, and this conclusion is robust to the very outliers and skewness that made the t-test risky.
# Adventure 2: When the Paired T-Test Fails
Now suppose you need to compare two measurements taken from the same subject — say, a patient’s blood sugar before and after taking an experimental drug, or two chemical properties measured from the same bottle of wine. The critical question here is how the differences between paired measurements are distributed. When those differences aren’t normally distributed, a standard paired t-test will give you unreliable confidence intervals.
The go-to robust fix is the Wilcoxon Signed-Rank Test: the non-parametric counterpart to the paired t-test. It works by computing the differences between paired measurements and ranking their absolute values. In Pingouin, you simply call `pg.wilcoxon()` and pass in the two columns containing the paired measurements from the same subject — for example, two types of acidity in wine.
```python
# Running the robust Wilcoxon signed-rank test for paired data
wilcoxon_results = pg.wilcoxon(x=df['fixed acidity'], y=df['volatile acidity'])
print(wilcoxon_results)
```

Result:

```
          W-val alternative  p-val  RBC  CLES
Wilcoxon    0.0   two-sided    0.0  1.0  1.0
```

This result reveals a statistically significant difference — a “perfect separation” — between the two measurements. Not only are the two wine properties different, but they also exist on entirely different scales across the dataset.
# Adventure 3: When ANOVA Fails
In this third and final scenario, we want to determine whether residual sugar levels in wine vary significantly across different quality ratings — which range from 3 to 9 as whole numbers, making them suitable to treat as discrete categories.
If Pingouin’s Levene test for homoscedasticity fails badly — for instance, because sugar variance is enormous in low-quality wines but minimal in premium ones — a standard one-way ANOVA can produce misleading conclusions, since it assumes equal variances across all groups.
The remedy is Welch’s ANOVA, which drops the equal-variance assumption: it weights each group by how precisely its mean is estimated (down-weighting high-variance groups) and adjusts the degrees of freedom accordingly, enabling fairer comparisons across multiple categories. Here’s how to run this robust alternative to the classic ANOVA using Pingouin:
```python
# Running Welch's ANOVA to compare sugar levels across quality ratings
welch_results = pg.welch_anova(data=df, dv='residual sugar', between='quality')
print(welch_results)
```

Result:

```
    Source  ddof1      ddof2          F         p-unc       np2
0  quality      6  54.507934  10.918282  5.937951e-08  0.008353
```

Even in situations where a standard ANOVA would struggle due to unequal variances, Welch’s ANOVA delivers a reliable conclusion. The extremely small p-value provides strong evidence that residual sugar levels do differ significantly across wine quality ratings. Keep in mind, though, that sugar is just one small piece of what determines wine quality — a fact reflected in the tiny partial eta-squared (`np2`) of roughly 0.008.
# Wrapping Up
Through three hands-on scenarios, each pairing a messy-data challenge with a robust statistical technique, we’ve seen that being a great data scientist isn’t about having pristine data or perfecting it — it’s about knowing how to respond when the data throws obstacles your way. Pingouin’s suite of functions implements a range of robust tests that help you sidestep the failed-assumptions trap and draw mathematically sound conclusions with minimal extra effort.
Iván Palomares Carrascosa is a leader, writer, speaker, and adviser in AI, machine learning, deep learning & LLMs. He trains and guides others in harnessing AI in the real world.



