the shape of data work

why data science is really about plumbing, not algorithms

16-Jul-25

Everyone thinks data science is about algorithms. It’s not. It’s about plumbing.

When someone asks for a “quick analysis” they imagine you’ll wave a magic wand over their data and insights will appear. What actually happens looks more like renovating an old house. Most of the work happens before anyone sees results.

The real process has a shape, and it’s not linear. It’s a funnel that keeps trying to become a circle.

You start wide: framing the problem. This is where half of all projects die. People say things like “we need insights from our data.” That’s not a problem. It’s a wish. A problem sounds like “we’re losing 10% of customers after their first purchase and don’t know why.”

Then comes discovery. You’re not analyzing yet; you’re spelunking. Where does this data live? Who owns it? Is that CSV file from 2019 still accurate? You discover that “customer data” means seven different things to seven different departments.

Next is the grind: collection, cleaning, storage. This is 80% of the work and 0% of what anyone wants to hear about. You’re finding that dates are stored as text, customer IDs don’t match between systems, and someone has been entering “NULL” as a string.

The cleaning phase reveals the truth: all data is dirty. The question is whether it’s dirty in consistent ways. You’re not just fixing typos. You’re building a pipeline that handles the specific ways your organization creates chaos.
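To make that concrete, here is a minimal sketch of one cleaning step in Python, using only the standard library. It is an illustration, not a recipe. The field names (customer_id, signup_date, email), the list of placeholder strings, the date formats, and the rule for canonical customer IDs are all assumptions. In real work, each of those comes out of the discovery phase.

```python
from datetime import date, datetime
from typing import Optional

# Assumed spellings of "missing"; a real list comes from profiling the data.
NULL_TOKENS = {"", "null", "none", "n/a", "na"}

# Assumed date layouts, tried in order. Day-first for slashes is a guess.
DATE_FORMATS = ("%Y-%m-%d", "%d/%m/%Y", "%d-%b-%y")


def clean_text(value: Optional[str]) -> Optional[str]:
    """Strip whitespace and turn placeholder strings such as "NULL" into None."""
    if value is None:
        return None
    stripped = value.strip()
    return None if stripped.lower() in NULL_TOKENS else stripped


def parse_date(value: Optional[str]) -> Optional[date]:
    """Parse a date stored as text; return None if missing or unparseable."""
    text = clean_text(value)
    if text is None:
        return None
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(text, fmt).date()
        except ValueError:
            continue  # try the next assumed format
    return None


def normalize_customer_id(value: Optional[str]) -> Optional[str]:
    """Map variants like " c-00123 " and "C123" to one form (assumed rule)."""
    text = clean_text(value)
    if text is None:
        return None
    digits = "".join(ch for ch in text if ch.isdecimal())
    return f"C{int(digits)}" if digits else None


def clean_record(raw: dict) -> dict:
    """Apply the three fixes to one raw row with hypothetical field names."""
    return {
        "customer_id": normalize_customer_id(raw.get("customer_id")),
        "signup_date": parse_date(raw.get("signup_date")),
        "email": clean_text(raw.get("email")),
    }


row = {"customer_id": " c-00123 ", "signup_date": "16-Jul-25", "email": "NULL"}
cleaned = clean_record(row)
```

Unparseable values come back as None instead of raising an error. That is a design choice: the pipeline keeps running, and you can count the None values afterward to see whether the mess is consistent or not.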

Only then do you get to the fun part: exploration and modeling. This is what everyone imagines data science is. Finding patterns, building forecasts, creating those satisfying visualizations that make everything suddenly clear.

But here’s the twist: it’s rarely linear. Every insight loops back. That beautiful model reveals your data has gaps. That perfect visualization shows you’ve been asking the wrong question. The prescriptive model that suggests the best action makes someone ask, “But what if we looked at it quarterly instead?”

The best data scientists aren’t the ones who know the fanciest algorithms. They’re the ones who’ve accepted this shape. They know that “quick analysis” is an oxymoron. They build systems that expect iteration.

They also know the secret: the earlier stages aren’t overhead. They’re where the real insights live. You learn more about a business by discovering how it stores customer IDs than from any clustering algorithm.

The four types of analytics (descriptive, diagnostic, predictive, prescriptive) aren’t stages. They’re tools you cycle through, each revealing new questions that send you back to the beginning.

That’s why the funnel wants to become a circle. Good data work doesn’t end with deployment. It ends with new questions that are better than the ones you started with.