This is post #2 in my Ivory Tower Notes collection. In post #1, I wrote about the problem: how every data and AI project begins.
This time, the subject is the method, and why "prompt in, slop out" is what usually happens when we skip it.
Prompt in, slop out
I smirked slightly when one of my connections commented, "You Sent Me AI Slop" under some random post that had hundreds of likes. The post, which contained a decision matrix, offered guidance on which platform to use for specific data workloads, albeit with questionable criteria. Quality aside, it really looked great.
My amusement didn't end there as I thought about how AIS, i.e., "AI Slop", should be added as a button to all social media now, alongside the like button.
If any YouTube folks read this, here's a feature idea instead of quizzing people, "Does this feel like AI slop?"
Still, YouTube nailed the "feel" part, because we all tend to make decisions based on emotions, often at the expense of critical thinking.
Why would we invest energy in empiricism, rationalism, and scepticism when we have AI now? Deadlines are not on our side, and we have this new tool that delivers outputs for us, regardless of the "prompt in, slop out" effect.
But let's assume you're genuinely interested in how Platform A compares to Platform B in terms of machine learning (ML) capabilities, because you've seen two data teams in your company using separate platforms for almost identical ML use cases. So, your goal is to compile an objective overview of both and propose reducing development costs by keeping only one.
What now? How do you determine whether you should consolidate ML workloads?
Certainly not by relying purely on AI, but rather on…
The path of inquiry
And so you're back to Ivory Tower days again, where you were taught that every discovery is covered by "The Method":
The problem → The hypothesis → Testing the hypothesis → The conclusions
Moreover, you were taught that finding the problem is half the work, and the art of getting there lies in asking good questions to narrow it down to something specific and testable.
Hence, you take the vague question, "Should we consolidate onto one ML platform?", and you keep rewriting it until it becomes something a test can answer:
Does Platform A run our churn pipeline at comparable accuracy and lower cost than Platform B?
Now you have defined a subject, a comparison, and things you can measure, which is enough to turn a business question into a testable hypothesis.
But first, you do your homework and gather more information, such as what Platform B costs per job today, what accuracy it hits, and how it's designed (e.g., the data, algorithm, and hyperparameters it uses), so that you can reproduce the pipeline on Platform A.
Then, before opinions on your question start to roll in, you state:
If we run the same churn pipeline on Platform A instead of Platform B, using identical data, algorithm, and hyperparameters, then the median per-job cost will drop by at least 15%, while the mean accuracy stays within 1 percentage point of Platform B's.
With this "if-then" formulation, you managed to calm down (at least some) opinionated answers, knowing that the PoC comes next. Thus, to test the stated presumption, you design and run the PoC, where you change only the independent variable, which is the platform. In addition, you freeze the control variables: the dataset, the algorithm, and the hyperparameters, and measure cost and accuracy, which are your dependent variables.
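To make that setup concrete, here's a minimal sketch of how the experiment design could look in code. It's illustrative only: every name in it (run_pipeline, the dataset file, the hyperparameters) is a hypothetical placeholder, not a real platform's API.

```python
from dataclasses import dataclass

# Control variables: frozen for every run, on both platforms.
# (All names here are hypothetical placeholders.)
CONTROLS = {
    "dataset": "churn_2024_snapshot.parquet",
    "algorithm": "gradient_boosted_trees",
    "hyperparameters": {"max_depth": 6, "learning_rate": 0.1, "n_estimators": 300},
}

@dataclass
class RunResult:
    platform: str    # independent variable: "A" or "B"
    cost_usd: float  # dependent variable 1: per-job cost
    accuracy: float  # dependent variable 2: model accuracy (0-1 scale)

def run_pipeline(platform: str) -> RunResult:
    """Submit the churn pipeline, with CONTROLS frozen, to the given
    platform and collect cost and accuracy. Placeholder: wire this to
    your platforms' own job-submission APIs."""
    raise NotImplementedError
```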
You also repeat the run multiple times to separate the signal from the noise by collecting several data points, considering how a single run can be skewed by environmental noise (e.g., caching), and you want to avoid that scenario. Then you account for further nuances, e.g., triggering runs at different times of day (morning, evening, or night), to expose both platforms to the same mix of conditions.
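A sketch of that repetition logic, continuing the placeholder code above; in a real PoC the scheduling would likely live in a cron job or orchestrator rather than a loop:

```python
TIME_SLOTS = ["morning", "evening", "night"]  # same mix of conditions for both platforms
RUNS_PER_SLOT = 5                             # repeats so a single skewed run can't decide it

def collect_results(platform: str) -> list[RunResult]:
    """Gather several data points per time slot to separate signal from noise."""
    results = []
    for slot in TIME_SLOTS:
        # In practice, trigger the runs at the actual time of day `slot`
        # (e.g., via a scheduler) rather than looping synchronously.
        for _ in range(RUNS_PER_SLOT):
            results.append(run_pipeline(platform))
    return results
```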
Finally, you collect all the results and evaluate the data against your hypothesis, which leads you to one of these three outcomes (a small evaluation sketch follows the list):
- Outcome 1: The data supports your hypothesis*. The multiple runs show that Platform A is at least 15% cheaper, and accuracy remained within the defined threshold. (*For the note only: the data can support, but not prove, your hypothesis, i.e., it gives you a reason to hold on to it, which in science is as close to a "yes" as you get.)
- Outcome 2: The data rejects your hypothesis. The multiple runs show that Platform A failed to meet one or both criteria; it was only 5% cheaper, or the cost dropped but the accuracy degraded beyond the defined threshold.
- Outcome 3: Your runs are too noisy to call it either way, and the only answer is to keep testing before drawing any conclusions.
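Here's a hedged sketch of that evaluation step, continuing the placeholder code from above. The noise guard is my own simplistic stand-in (a standard-deviation check), not a substitute for a proper statistical test:

```python
import statistics

def evaluate(results_a: list[RunResult], results_b: list[RunResult]) -> str:
    """Judge the hypothesis: median per-job cost drops by at least 15% on
    Platform A, while mean accuracy stays within 1 percentage point of B."""
    cost_a = statistics.median(r.cost_usd for r in results_a)
    cost_b = statistics.median(r.cost_usd for r in results_b)
    acc_a = statistics.mean(r.accuracy for r in results_a)
    acc_b = statistics.mean(r.accuracy for r in results_b)

    # Outcome 3 guard (my assumption, not a rigorous test): if run-to-run
    # cost spread rivals the effect size we're testing for, keep testing.
    if statistics.stdev(r.cost_usd for r in results_a) > 0.15 * cost_b:
        return "Outcome 3: too noisy to call either way, keep testing"

    cheaper_enough = cost_a <= 0.85 * cost_b      # at least 15% cheaper
    accurate_enough = abs(acc_a - acc_b) <= 0.01  # within 1 pp (0-1 scale)
    if cheaper_enough and accurate_enough:
        return "Outcome 1: the data supports the hypothesis"
    return "Outcome 2: the data rejects the hypothesis"
```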
Whichever scenario you land in, you have findings: you either confirmed your educated guess, learned something new, or discovered that you need to keep testing.
And to be clear about this short example: the first two conclusions won't give you the green light to consolidate two platforms. Corporate reality (and a thorough evaluation) is a bit messier than that, and there's more data (to be collected and evaluated) affecting people and processes than a single-scoped PoC can settle.
All right, we can stop with the method now, because most of you are probably reading the steps above and wondering…
What the dickens? Where's AI in all this?
I can only imagine how something like "MCP, agentic frameworks, agents,…" was going through your head while reading the steps above. Couldn't agree more, all good stuff, and that is how you could speed up the process.
However, simply posting AI outputs from a prompt like, "Give me an overview of how Platform A compares to Platform B for ML workloads," is where the slop occurs, because:
“If you aren’t doing the hands-on, your opinion about it is very likely to be completely wrong.”
Relevance and positive influence don't come from pretty AI posts or presentation infographics, and they can damage work relationships.
If you're already influencing and want to be seen as an authority, it would be easier to share views and findings from real-life experiments and your own proven experience.
Instead of starting your posts with "This is where you should use Platform A over Platform B for…", try something more concrete (if it's true, of course):
"When we (I) changed the [independent variable] to see how it affects the [dependent variable], while keeping the [control variables] the same, our (my) findings were…"
And then see whether the number of your followers increases, and report back the findings.
The inspiration for this post came from a Croatian paper by Professor Mladen Šolić, "Uvod u znanstveni rad" (Introduction to Scientific Research, 2005, [LINK]). I first read it as a student, and it's still one of the clearest explanations of how to conduct scientific research I've come across.
Thanks for reading.
If you found this post helpful, feel free to share it with your network. 👏
Connect for more stories on Medium ✍️ and LinkedIn 🖇️.