A well-known proverb in American culture goes like this:
“You can’t have your cake and eat it too.”
This phrase strikes me as both beautifully poetic and refreshingly practical. At its core, it reminds us that every achievement comes with compromise — nothing comes for free.
We won’t dive into the deeper philosophy here, but when it comes to data science and software engineering, this idea hits especially close to home. Here’s why.
In the world of software engineering and data science, there’s no such thing as a universally “perfect” design. An algorithm that works brilliantly in one context may fall flat in another.
Consider the classic tradeoff between computation speed and memory usage. For instance:
It’s smart to precalculate the distance between two cities and store the result — there’s no reason to recalculate it on every request. Cities don’t move around, after all, and constantly redoing that math would be wasteful and inefficient. [Case A]
On the flip side, expecting a chatbot to memorize every possible question a user might ask — along with prepackaged answers — just isn’t realistic. The range of queries is far too vast and ever-changing, so real-time processing is the only sensible approach. [Case B]
In Case A, we trade extra memory for lightning-fast results. In Case B, we invest more processing power but avoid the burden of storing countless potential responses.
So can we enjoy zero computation and zero memory usage? Not a chance — after all, you can’t have your cake and eat it too 🙂
Now let’s explore a more modern and relevant example: Large Language Models (LLMs).
LLMs represent the cutting edge of AI, trained on vast swaths of human knowledge. They’re also enormous — so large that most teams access them via APIs rather than hosting them locally. But every API call costs tokens, and tokens cost money.
Picture this: you want an AI to help pick the perfect restaurant for dinner. You type something like, “Find me a romantic Italian spot that’s not too pricey and in a great neighborhood.”
If the LLM had to scan every restaurant on Earth to check if it’s Italian, affordable, well-located, and nearby — you’d be waiting forever and racking up a massive bill. By the time it finishes, you’d already be asleep!
Still, we don’t want to throw away the incredible language understanding and reasoning abilities LLMs offer. The trick is knowing when to use their full power — and when to hold back. After all, using the smartest part of the system for every tiny task would be like trying to eat your cake and keep it whole at the same time.
In this article, I’ll walk you through a practical blueprint for building intelligent, LLM-enhanced recommendation systems — using our restaurant finder as a hands-on example.
The system takes a user’s natural-language description of their dream restaurant in a given city and returns a curated list of top matches.
Ready? Let’s dive in!
1. System Design
That cake proverb has a formal name in engineering circles: the Accuracy-Scale-Time triangle:
- You can build something highly accurate that handles huge datasets — but it’ll be slow.
- You can make something fast and accurate — but it won’t handle large-scale data well.
- You can create something fast and scalable — but it won’t be very accurate.
Naturally, we want our final results to be accurate — so relying solely on option 3 won’t work. But here’s the clever part: we can layer a more precise model on top of the fast, scalable one. In other words, option 3 gives us a solid shortlist of candidates quickly, and then we use an LLM to fine-tune the final picks.
Here’s how the architecture breaks down:
- A fast, rule-based search retrieves the top K most relevant restaurants (high recall, low precision).
- A slower, highly intelligent LLM then evaluates those K options and selects the best match based on the user’s query (high precision, AI-driven).
This way, we avoid burning resources on the LLM for every single restaurant — but still benefit from its intelligence by applying it only to a focused set of promising candidates.
Enough theory — time to start coding!
2. The Script
2.1 The Setup
I’ve handled the groundwork for you 🙂
Everything is structured using object-oriented programming (OOP), with modular scripts and a streamlined pipeline that manages the full workflow. You can find the code in this GitHub repository. To get started, clone the repo and use the import block below:
2.2 Data Generation
Before recommending anything, we need data to work with. In a production environment, you’d pull from a live restaurant database — say, one stored in an S3 bucket. For this tutorial, though, we’ll generate a synthetic dataset so everything runs smoothly and costs nothing.
That’s the job of the RestaurantDataGenerator class in datagenerator.py. It creates a reproducible table of roughly 10,000 restaurants spread across eight major U.S. cities: New York, San Francisco, Chicago, Austin, Seattle, Boston, Miami, and Denver. Each entry includes:
– a randomly generated restaurant name
– a city and latitude/longitude coordinates sampled near the city center (within about 13 km)
– a cuisine type (Italian, Japanese, Mexican, Thai, French, etc.)
– a dietary category (omnivore, vegetarian, vegan)
– an average rating
– a total number of reviews
– a price tier (such as 10, 100, or 1,000—representing a rough spending-per-person range).
This tool is designed to be used one time only. Creating the dataset is straightforward—just do the following:
That single action saves the data to data/restaurants.csv, structured like this:
Great! Now that our restaurant data is ready, let’s explore how to use it for recommendations.
2.3 Building the Candidate List
This marks Phase 1 of the recommendation pipeline: a fast, low-cost filtering method based on rules. Users specify their city, and we narrow down options to the nearest locations. The system first selects entries from that city, calculates the straight-line distance between the user and each restaurant, then picks the top N_DISTANCE_CANDIDATES (default: 50).
This step prioritizes recall over accuracy. It processes all 10,000 restaurants without making any API calls or incurring token fees. While the logic isn’t complex, it effectively removes irrelevant options—already a significant improvement.
For instance, consider this real-world query:
“affordable vegan tacos in a vibrant setting” across several cities
Here’s what comes back:
You’ll notice the initial results don’t reflect “vegan,” “affordable,” or “tacos”—only proximity. That’s fine. This stage just creates a geographically relevant shortlist, which the LLM will refine further in the next phase.
Time to bring in the language model!
2.4 Refining the Shortlist
Welcome to Phase 2: the slower, smarter, precision-focused step powered by the LLM. This operates on the 50-candidate list generated previously. Never does the model see the full dataset—it only evaluates the small, location-filtered subset.
We communicate with the model via a lightweight OpenAI client. The API key is pulled from OPENAI_API_KEY in your environment. The recommendation logic lives in RestaurantRecommender, triggered via RestaurantRecommender.recommender(query, city):
A few key details stand out:
- Precision improves significantly. Phase 1 cast a wide net; now the LLM evaluates the actual request (“affordable vegan tacos in a vibrant setting”) and returns only the top 5–10 matches with a clear
fit_score.
- Structured responses using Pydantic. No messy text parsing—the model outputs results that conform strictly to a predefined Pydantic schema (thanks to OpenAI’s structured output feature), ensuring consistency every time.
Each result includes the restaurant_id and name (from the candidate pool), a fit_score ranging from 0 to 100, and a brief reason. The whole response is wrapped in a user-friendly summary. When tested across three cities, here’s an example output:
As you can see, this is a major upgrade over the raw distance-based lists from 2.3. Earlier, the “closest” restaurant was often unrelated (Korean, Lebanese, or vegetarian Mexican spots). Now, the model reorganizes the same 50 candidates based on what the user truly wants: vegan and Mexican options rank highest with strong fit_scores, while partial matches are scored lower with honest explanations—exactly the kind of accuracy the LLM delivers without blowing up costs.
3. Results
Let’s take a step back and assess what this two-stage approach achieved, using our recurring example: “affordable vegan tacos in a vibrant setting” across three cities.
- Phase 1 produces candidate options. These initial lists prioritize broad recall, not pinpoint accuracy.
- Phase 2 surfaces the best recommendations. The LLM reorders those 50 candidates based on real user intent.
Final top picks per city:
- New York: Golden Spoon (vegan, 4.9★) and Maison Fork (Mexican, within budget) lead with fit scores of 90 and 85.
- Miami: Royal Tavern & Co. (vegan, Mexican, affordable) tops the list at 85.
- Boston: Urban Spoon and Little House, both budget-friendly Mexican restaurants, claim the top two spots with fit scores of 90 and 85.
Across all cities, the model elevated entries that aligned with vegan, affordable, and Mexican/taco preferences—and was transparent about near matches: those hitting one criterion but not others appear as backups with noticeably lower fit_scores.
4. Conclusions
Thanks so much for joining me today—it truly means a lot. ❤️ Here’s a quick recap of what we accomplished:
– Designed a scalable, intelligent two-stage recommendation system.
– Applied a fast, rule-based geographic filter (Phase 1) to trim 10,000 restaurants down to the nearest 50.
– Leveraged an LLM for final ranking (Phase 2) to select the top 5–10 options, each with a clear score and explanation.
In practice, this kind of architecture is widely adopted. It keeps costs low by using the LLM sparingly, while delivering highly relevant, context-aware results—making it both efficient and smart.
7. Before You Go!
Really appreciate your time—it means everything. I’m Piero Paialunga, and here’s a little about me:

Originally from Italy, I earned my Ph.D. from the University of Cincinnati and currently work as a Data Scientist at The Trade Desk in New York City. I write about AI, Machine Learning, and the evolving role of data scientists here on TDS and on LinkedIn. If you enjoyed this piece and want to stay updated on my research in machine learning, you can:
A. Connect with me on LinkedIn for all my latest posts
B. Check out my GitHub to explore my code
C. Feel free to reach out via email at piero.paialunga@hotmail



