# SQL + Python Just Isn’t Enough
For a long time, the path to landing a data job appeared straightforward: master SQL, learn Python, and you’re in. As mid-sized companies began embracing data-driven decision-making, hiring managers were thrilled to find anyone who could craft a decent GROUP BY and manipulate a pandas DataFrame without causing errors. If you know what PostgreSQL is? Welcome aboard! This approach worked for a while. Until it stopped.
If you haven’t caught on yet, the data professional job market has experienced a fundamental transformation. Yes, SQL and Python remain essential; they appear on every job listing. But they’ve shifted from standout qualifications to basic requirements.
Chances are, you’re still preparing for the interview questions you practiced three years ago. Let it go. This article addresses the disconnect between what candidates study and what companies truly need today.
# What the Job Market Is Actually Asking For
A January 2026 analysis by Future Proof Data Science of over 700 data scientist job postings revealed that Python and SQL still rank among the top three skills, but machine learning and AI skills hold the second and fourth positions.

Image Source: Future Proof Data Science
Not all AI-related job postings demand hands-on AI expertise, but one in three do. The most sought-after specific AI skills are:
- Large language models (LLMs)
- Retrieval-augmented generation (RAG)
- Prompt engineering
- Vector databases
This reflects a growing need for data professionals capable of building and deploying AI systems.
Remember that both the direction and speed of this shift are significant. It reminds me of how machine learning evolved from a specialized requirement in 2012 to an almost universal one by 2020.
The second trend is less obvious but arguably more pressing for most candidates: the baseline engineering expectations have climbed sharply. Data engineering competencies — pipelines, orchestration, cloud platforms, data quality checks — and machine learning in production — model monitoring, drift detection, evaluation design — are now fundamental expectations rather than extras in data science job listings.
A quick look at any major job board verifies it: alongside AI skills, positions labeled “Data Scientist” regularly include Snowflake, dbt, Airflow, and ETL pipeline ownership as requirements, not optional bonuses.
There are four skills you’re likely overlooking. These are the new standout qualifications in today’s job market.

# Skill #1: Data Modeling
// What It Is
Data modeling is the skill of designing how data should be organized, connected, and stored. Picture it as determining which tables to build, what they represent, and how they link to one another.
// Why It Became a Differentiator
Advances in tools reshaped the field. Snowflake, dbt, and BigQuery have made it fairly straightforward for data scientists to take charge of the data transformation layer. Put simply, modeling decisions once handled by data engineers are now being passed to data scientists.
Get a data schema wrong, and you’re in trouble. Usually, these mistakes aren’t apparent right away. By the time they surface, it’s too late. Your machine learning efforts have already been compromised by feature engineering built on data with incorrect granularity — a direct result of a poorly designed foundation.

// How to Acquire It
Take a real dataset you work with and redesign its schema from the ground up. Ask yourself these questions:
- What are the entities?
- What do they connect to?
- What level of detail makes sense?
- Which queries will be run most often?
After that, study dimensional modeling. Kimball’s methodology, outlined in his book The Data Warehouse Toolkit, remains a valuable reference.
# Skill #2: Performance Optimization
// What It Is
Performance optimization means grasping why a query behaves the way it does and how to make it faster, more cost-effective, or capable of handling larger scale. You can fine-tune SQL queries, but also Python pipelines and data workflows broadly — data scientists increasingly manage them from start to finish.
// Why It Became a Differentiator
First, data volumes have expanded to where a correct but sluggish query can cost hundreds of dollars and fail in production.
Second, as noted earlier, data scientists now need to own far more of the pipeline than before. Your code must be production-ready, not just functional in Jupyter notebooks.

// How to Acquire It
Select a few complex SQL queries you’ve written, run EXPLAIN ANALYZE on them, and examine what the query planner actually executed. Then leverage those insights to optimize the query. You’ll likely discover at least one index, restructuring, or rewrite that enhances each query.
For a slow Python pipeline, profile it. There are two primary tools for timing:
- cProfile: Execute it with
python -m cProfile -s cumulative your_script.pyand review the top of the output to identify the functions consuming the most cumulative time. - line_profiler: Provides deeper insight by displaying execution time line by line within a specific function. Apply it once cProfile has revealed which function is slow and you need to understand why.
For memory, use memory_profiler.
Identify the bottleneck — is it slow because a Python loop should be vectorized? Is data loaded into memory all at once rather than in chunks? — address it, and measure the improvement.
# Skill #3: Infrastructure Awareness
// What It Is
This skill means you understand the systems where data resides and flows through. These systems include cloud platforms, distributed computing, data pipelines, storage formats, and cost structures.
You should know enough about the infrastructure to design systems that can be deployed within it.
// Why It Became a Differentiator
Again,
Because much of what used to be a data engineer’s responsibility has shifted onto data scientists. If you rely entirely on data engineers for every infrastructure decision, you’re creating a bottleneck — and that’s not something hiring managers want to see.
Infrastructure awareness covers these key interconnected areas.

You’ll likely need to get comfortable with these tools.

// How to Acquire It
Set up time with your data engineering team. Sit alongside them and ask them to walk you through an entire pipeline. Learn where data is stored, how it’s organized, and what happens when something fails.
Then take it a step further by building a small pipeline on your own: use a free cloud tier, get familiar with cost and performance metrics, and then intentionally break the pipeline to see how it fails.
# Skill #4: Designing RAG Systems, Evaluating LLM Outputs, and Running AI Experiments
// What It Is
This group of skills centers on hands-on AI work. You need to know how to design retrieval-augmented generation (RAG) systems (linking LLMs to real data sources), build evaluation frameworks (determining whether an LLM-powered feature is genuinely effective), and run experiments on AI features.
// Why It Became a Differentiator
AI tools are the reason. They made it possible to build a RAG pipeline without deep research expertise. Frameworks like LangChain and LlamaIndex, paired with cloud-native vector databases, significantly lowered the entry barrier.
So the question is no longer whether it can be built — it can. But can it be built well, evaluated, and trusted in production? Answering that is what you need to do: define metrics, design experiments, and measure results.

When applying these skills, you’ll work with these tools.

// How to Acquire It
Find some interview questions to sharpen your AI thinking. Here are a few examples from AI Product & GenAI interview questions on StrataScratch.
Example #1: Measuring AI Feature Rollout in Retail Stores
How would you measure the impact of an AI-powered inventory recommendation system being rolled out to a sample of retail stores? How would you design the experiment and account for store-level variation?
Example #2: RAG System Architecture
Describe how you would architect a RAG system from scratch. What components are needed, and how would you optimize retrieval quality?
Once you’ve clarified your thinking, build a small RAG application: pick a domain, embed a document corpus, set up retrieval, and evaluate the outputs using a structured metric.
Also, design an experiment: write out a hypothesis, define the metrics, and think through a valid test to evaluate it.
# Conclusion
The four skills — data modeling, performance optimization, infrastructure awareness, and practical AI skills — represent the gap between you and the job market. Hopefully, you won’t fall into it. To help you avoid that, this article has included practical guidance on how to develop each one.
Nate Rosidi is a data scientist and in product strategy. He’s also an adjunct professor teaching analytics, and is the founder of StrataScratch, a platform helping data scientists prepare for their interviews with real interview questions from top companies. Nate writes on the latest trends in the career market, gives interview advice, shares data science projects, and covers everything SQL.



