By Chandrakant Deshmukh, Senior Vice President – Architecture & IP Governance, Mastek
Step into any enterprise IoT project today and you’ll encounter the same reality: the devices are in place, the data is streaming, yet the operational insights promised years ago remain frustratingly elusive.
The issue is seldom the hardware, and almost never the network connection. It’s the underlying architecture – the choices about where data gets processed, how it travels, and what transformations it undergoes along the way.
After nearly twenty years building data platforms for manufacturing, supply chain, and infrastructure clients, I’ve reached a clear conclusion: the gap between data created at the Edge and data acted upon in real time is where most IoT return on investment silently vanishes.
Many enterprises still rely on a “lift and stream” approach — send everything to the Cloud, sort out the analytics afterward. That approach was financially justified in 2015. With today’s device densities and data volumes, it destroys the business case.
Why Cloud-first IoT architectures are failing
Three forces are combining to make cloud-centric IoT architectures fundamentally inefficient.
The first is latency. A Cloud round-trip on a good day takes 80–200 milliseconds. For a PLC controlling a robotic arm, a turbine governor, or a grid-balancing inverter, that is far too long: these control loops need sub-10-millisecond decisions, and only local processing can deliver them.
The second is bandwidth costs. A single modern CNC machine can produce several gigabytes of vibration and process data daily. Scale that across a plant of three hundred machines, then across a fleet of forty plants, and the data transfer costs alone will devour the modernization budget.
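To put numbers on that, here is a rough back-of-envelope calculation. The per-machine volume, fleet size, and egress price below are illustrative assumptions for the sake of the sketch, not quoted figures.

```python
# Back-of-envelope estimate of a "lift and stream" approach.
# All figures below are illustrative assumptions, not measured values.
GB_PER_MACHINE_PER_DAY = 4       # assumed raw vibration/process data per CNC machine
MACHINES_PER_PLANT = 300
PLANTS = 40
EGRESS_COST_PER_GB = 0.09        # assumed Cloud egress price, USD

daily_gb = GB_PER_MACHINE_PER_DAY * MACHINES_PER_PLANT * PLANTS
monthly_cost_usd = daily_gb * 30 * EGRESS_COST_PER_GB

print(f"Raw telemetry shipped per day: {daily_gb:,} GB")              # 48,000 GB
print(f"Approximate monthly egress bill: ${monthly_cost_usd:,.0f}")   # roughly $130,000
```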
The third is data sovereignty and compliance. For regulated sectors — healthcare, defence, critical infrastructure — certain categories of data are legally prohibited from leaving the facility, the region, or the country of origin. The architecture must respect that restriction at the moment data is created, not at the point of analysis.
What I encounter most frequently is a version of the same flawed pattern: a Cloud data lake acting as a transit hub, batching everything overnight, with “real-time” dashboards that are in fact four-hour-old summaries dressed up as live data. That is not real-time intelligence; it is reporting latency under a different name.
The Edge–Cloud continuum: a superior framework
The architectural shift most organizations need is to stop viewing Edge and Cloud as rival locations, and start treating them as layers on a continuum.
At the device layer, you handle raw signals and instant control. At the Edge compute layer — a rugged server or gateway — you filter, aggregate, run inference, and make local decisions. At the regional Cloud, you store, correlate across sites, and deliver near-real-time analytics. At the global Cloud, you train models, run long-term analytics, and push improvements back down to the lower layers.
The core principle is data gravity: process data as close as possible to where it is created and move only what truly needs to move.
A vibration waveform sampled at 25 kHz does not need to go anywhere; a one-second FFT summary and an anomaly alert do. A surveillance feed does not need to stream nonstop; the event clips around a detected intrusion do.
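As a minimal sketch of that data-gravity principle, the snippet below reduces one second of a 25 kHz waveform to a handful of frequency-band energies and a yes/no anomaly flag. The band edges and the five-times-baseline threshold are arbitrary choices for illustration, not recommended settings.

```python
import numpy as np

SAMPLE_RATE = 25_000  # 25 kHz vibration sampling, as in the example above

def summarise_second(window: np.ndarray, band_edges=(0, 500, 2_000, 8_000, 12_500)):
    """Reduce one second of raw waveform to a handful of frequency-band energies."""
    spectrum = np.abs(np.fft.rfft(window)) ** 2
    freqs = np.fft.rfftfreq(window.size, d=1 / SAMPLE_RATE)
    return {
        f"{lo}-{hi}Hz": float(spectrum[(freqs >= lo) & (freqs < hi)].sum())
        for lo, hi in zip(band_edges[:-1], band_edges[1:])
    }

def is_anomalous(summary: dict, baseline: dict, factor: float = 5.0) -> bool:
    """Flag the window if any band energy exceeds its baseline by an (arbitrary) factor."""
    return any(summary[band] > factor * baseline[band] for band in summary)

window = np.random.randn(SAMPLE_RATE)                      # stand-in for one second of accelerometer data
baseline = summarise_second(np.random.randn(SAMPLE_RATE))
summary = summarise_second(window)
# Only the four-number summary (plus an alert, if raised) leaves the Edge node;
# the 25,000 raw samples per second stay local.
print(summary, is_anomalous(summary, baseline))
```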
The tipping point at which a tiered architecture starts paying for itself is usually one of three thresholds: a latency requirement below fifty milliseconds, a per-site data volume above roughly one terabyte per day, or regulated data exceeding ten percent of the total pipeline. Cross any one of them and the case for Edge compute is made.
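Expressed as a simple decision rule, with the thresholds exactly as stated above:

```python
def edge_tier_justified(latency_req_ms: float, site_tb_per_day: float, regulated_share: float) -> bool:
    """True if any of the three thresholds discussed above is crossed."""
    return (
        latency_req_ms < 50          # control decisions needed in under fifty milliseconds
        or site_tb_per_day > 1.0     # more than roughly one terabyte generated per site per day
        or regulated_share > 0.10    # regulated data exceeds ten percent of the pipeline
    )

print(edge_tier_justified(latency_req_ms=8, site_tb_per_day=0.3, regulated_share=0.02))  # True
```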
Modernizing the pipeline: what it looks like in practice
A modern IoT data pipeline has three essential components.
First, an event-driven transport backbone. MQTT 5.0 at the Edge for lightweight device telemetry, Kafka at the regional tier for durable, replayable streams. The combination matters: MQTT handles the constrained-device side, Kafka handles the enterprise-integration side, and a bridge or connector links the two.
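A minimal bridge sketch, assuming paho-mqtt 1.x and kafka-python, with placeholder broker addresses and topic names:

```python
# Minimal MQTT-to-Kafka bridge sketch, assuming paho-mqtt 1.x and kafka-python.
# Broker addresses and topic names are placeholders.
import paho.mqtt.client as mqtt
from kafka import KafkaProducer

producer = KafkaProducer(bootstrap_servers="regional-kafka:9092")

def on_message(client, userdata, msg):
    # Re-publish each device message onto a durable, replayable Kafka topic,
    # keyed by the originating MQTT topic so per-device ordering is preserved.
    producer.send("plant.telemetry", key=msg.topic.encode(), value=msg.payload)

mqtt_client = mqtt.Client()
mqtt_client.on_message = on_message
mqtt_client.connect("edge-broker.local", 1883)
mqtt_client.subscribe("devices/+/telemetry")
mqtt_client.loop_forever()
```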
Second, a stream-processing layer near the data. Apache Flink, Spark Structured Streaming, or the managed stream-analytics services offered alongside Azure IoT Hub and AWS IoT Core are all viable; the choice matters less than the commitment to processing streams rather than static batches.
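A Structured Streaming sketch of that idea, using Spark purely for brevity; the topic name, the one-minute window, and the Spark-Kafka connector being available are all assumptions.

```python
# Structured Streaming sketch (Spark chosen purely for brevity). Topic name, window size,
# and the Spark-Kafka connector being on the classpath are all assumptions.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("edge-telemetry").getOrCreate()

telemetry = (
    spark.readStream.format("kafka")
    .option("kafka.bootstrap.servers", "regional-kafka:9092")
    .option("subscribe", "plant.telemetry")
    .load()
    .selectExpr("CAST(key AS STRING) AS device", "CAST(value AS STRING) AS payload", "timestamp")
)

# Continuously maintained one-minute aggregates per device, instead of a nightly batch job.
per_device_rates = (
    telemetry.withWatermark("timestamp", "2 minutes")
    .groupBy(F.window("timestamp", "1 minute"), "device")
    .count()
)

query = per_device_rates.writeStream.outputMode("update").format("console").start()
query.awaitTermination()
```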
Third, a digital twin as a synchronization agreement. A well-designed twin provides a canonical representation of each physical asset’s state, linking edge observations and cloud analytics without either side needing to understand the other’s data structure.
This is where most pipelines either become sustainable or collapse under their own complexity.
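A minimal sketch of what such a twin can look like in code follows; the field names and the reported/desired split are assumptions in the spirit of common twin models, not a standard schema.

```python
# Minimal digital-twin state sketch: the canonical asset representation that Edge and Cloud
# both read and write. Field names and the reported/desired split are illustrative.
from dataclasses import dataclass, field
from datetime import datetime, timezone

@dataclass
class AssetTwin:
    asset_id: str
    firmware: str
    reported: dict = field(default_factory=dict)   # last state observed at the Edge
    desired: dict = field(default_factory=dict)    # target state pushed down from the Cloud
    updated_at: datetime = field(default_factory=lambda: datetime.now(timezone.utc))

    def apply_edge_observation(self, observation: dict) -> None:
        """Edge side: merge the latest local measurements into the reported state."""
        self.reported.update(observation)
        self.updated_at = datetime.now(timezone.utc)

    def pending_changes(self) -> dict:
        """Cloud side: desired properties the Edge has not yet reported back."""
        return {k: v for k, v in self.desired.items() if self.reported.get(k) != v}

twin = AssetTwin(asset_id="press-07", firmware="2.4.1", desired={"sampling_hz": 25_000})
twin.apply_edge_observation({"sampling_hz": 25_000, "vibration_rms": 0.42})
print(twin.pending_changes())   # {} once the Edge has converged on the desired configuration
```

The Edge side only ever writes reported state and the Cloud side only ever writes desired state, which is what makes the twin a synchronization agreement rather than a shared database.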
Teams that succeed treat schema as a first-class artifact. A schema registry, enforced at ingestion, is not bureaucratic overhead — it is the difference between a pipeline that evolves smoothly and one that silently corrupts its own downstream analytics.
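As an illustration of enforcement at the boundary, the sketch below validates incoming records against a versioned schema. jsonschema is used for brevity; a production pipeline would more likely register Avro or Protobuf schemas in a dedicated registry.

```python
# Schema enforcement at ingest, sketched with jsonschema for brevity; a production pipeline
# would more likely register Avro or Protobuf schemas in a dedicated schema registry.
from jsonschema import ValidationError, validate

TELEMETRY_V1 = {
    "type": "object",
    "required": ["device_id", "ts", "vibration_rms"],
    "properties": {
        "device_id": {"type": "string"},
        "ts": {"type": "string"},
        "vibration_rms": {"type": "number", "minimum": 0},
    },
    "additionalProperties": False,   # unknown fields are rejected, not silently passed downstream
}

def ingest(record: dict) -> dict:
    """Reject malformed records at the boundary instead of corrupting downstream analytics."""
    try:
        validate(instance=record, schema=TELEMETRY_V1)
    except ValidationError as err:
        raise ValueError(f"Schema violation at ingest: {err.message}") from err
    return record

ingest({"device_id": "press-07", "ts": "2025-01-01T00:00:00Z", "vibration_rms": 0.42})  # passes
```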
Governance, IP, and the question most architects overlook
This is the topic most IoT architecture discussions sidestep, and it is the one I consider most significant. In a multi-vendor industrial IoT deployment — where sensor manufacturers, integrators, cloud providers, and the enterprise all interact with the data — who owns what the devices produce?
The answer is almost never in the contract. It should be. Data ownership, derived data rights, training data rights for models built on that telemetry, and the right to extract data upon contract termination are all negotiable terms. Most enterprises realize they have surrendered the valuable ones only when they attempt to change vendors. An IP governance framework for IoT begins with explicit data classification at ingest, lineage tracking from device to decision, and contractual clarity on who can train what on which dataset.
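A minimal sketch of classification at ingest; the classes, field names, and rules below are illustrative assumptions, and the real rules belong in the contract, with the code merely mirroring them.

```python
# Classification at ingest, sketched in code. The classes, fields, and rules are illustrative;
# the real rules belong in the contract and are only mirrored here.
from dataclasses import dataclass
from enum import Enum

class DataClass(Enum):
    OPERATIONAL = "operational"          # may leave the site
    REGULATED = "regulated"              # must stay in-region
    VENDOR_RESTRICTED = "vendor"         # usable only under the relevant supply contract

@dataclass(frozen=True)
class GovernanceTag:
    classification: DataClass
    owner: str                           # contracting party that owns this record
    may_train_models: bool               # negotiated training-data right

def classify(record: dict) -> GovernanceTag:
    if record.get("contains_regulated_fields"):
        return GovernanceTag(DataClass.REGULATED, owner="enterprise", may_train_models=False)
    if record.get("source_vendor"):
        return GovernanceTag(DataClass.VENDOR_RESTRICTED, owner=record["source_vendor"], may_train_models=False)
    return GovernanceTag(DataClass.OPERATIONAL, owner="enterprise", may_train_models=True)
```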
On the technical side, such a framework translates to three capabilities: end-to-end data lineage, so any downstream insight can be traced back to the exact device, firmware, and calibration state that generated it; zero-trust identity at every Edge node, because a compromised gateway is a compromised pipeline; and observability that treats data quality as a first-class signal alongside latency and throughput.
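To show what the lineage piece might look like as data, here is a sketch in which every derived insight carries a pointer back to its sources; the field names and the Kafka-offset example are hypothetical.

```python
# Lineage as data: every derived insight carries a pointer back to its sources.
# Field names and the Kafka-offset example are hypothetical.
from dataclasses import dataclass

@dataclass(frozen=True)
class LineageRef:
    device_id: str
    firmware: str
    calibration_id: str
    source_offsets: tuple        # e.g. Kafka (partition, offset) pairs of the contributing records

@dataclass
class Insight:
    kind: str                    # e.g. "bearing_wear_alert"
    value: float
    lineage: LineageRef

alert = Insight(
    kind="bearing_wear_alert",
    value=0.87,
    lineage=LineageRef("press-07", "2.4.1", "cal-2024-11-03", ((0, 148_221),)),
)
print(f"{alert.kind} traced to {alert.lineage.device_id}, fw {alert.lineage.firmware}")
```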
The payoff, and what to do Monday morning
When this architecture comes together, the outcomes are concrete: predictive maintenance that activates on genuine anomaly detection rather than calendar schedules; supply chain adjustments that happen in minutes rather than overnight; energy optimization driven by live consumption patterns rather than monthly bills. These are not theoretical benefits — they are the standard for organizations that committed to the architectural shift three to five years ago.
For leaders assessing their own situation, the evaluation is straightforward. Are you processing data at the appropriate tier, or sending raw telemetry to the cloud out of habit? Are your pipelines built on streams, or are your “real-time” dashboards really disguised batch processes? Do you know, contractually and technically, who owns the data your devices generate — and can you demonstrate lineage from device to decision?
Edge and Cloud data modernization is not a technology initiative. It is an architectural commitment — and the enterprises that will lead the next decade of industrial intelligence are the ones making that commitment today, not retrofitting it later.
Author biography:
Chandrakant Deshmukh is Senior Vice President – Architecture & IP Governance at Mastek, a global IT services company focused on digital and Cloud transformation, headquartered in India.