Building A Custom GStreamer Plugin For NVIDIA DeepStream

NVIDIA DeepStream provides a production-ready pipeline for multi-stream video analytics: hardware-accelerated decoding, tracking, on-screen display, and message brokering, all wired through GStreamer. For standard detection models exported to TensorRT, nvinfer, the built-in inference element, handles everything.

However, the common case has its limits. Vision-language models, custom post-processing, rotated bounding boxes, or the need to hot-swap models at runtime: these are situations where nvinfer’s built-in assumptions fall apart. Sometimes you have a mature PyTorch inference stack that your team has carefully tuned, and you want DeepStream to call that rather than reimplementing it in a config file.

It’s worth noting that for YOLO-family models specifically, DeepStream-Yolo by Marcos Luciano has already done excellent work implementing custom post-processing in C++. If C++ is an option for you, start there. This article takes a different approach: achieving the same result entirely in Python, using a custom GStreamer plugin with pyservicemaker — without sacrificing throughput.

The key insight that makes this possible: downstream elements like nvtracker, nvdsosd, and nvmsgconv don’t care which element produced the detection metadata. Write to DeepStream’s metadata structure correctly, and the rest of the ecosystem works as if nvinfer was never in the picture.

DeepStream Metadata

Every buffer flowing through a DeepStream pipeline carries more than just pixel data. From the moment frames pass through nvstreammux, each GstBuffer has an NvDsBatchMeta structure attached to it. The hierarchy is straightforward and can be found in the official documentation.

NvDsBatchMeta
├── NvDsUserMeta                        (batch-level custom metadata)
└── NvDsFrameMeta                       (one per source stream)
    ├── NvDsUserMeta                    (frame-level custom metadata)
    └── NvDsObjectMeta                  (one per detected object)
        ├── NvDsClassifierMeta
        └── NvDsUserMeta                (object-level custom metadata)

NvDsBatchMeta describes the entire batch. Each NvDsFrameMeta corresponds to one source stream and carries frame-level information such as the source ID and frame number. Each NvDsObjectMeta represents a single detection — meaning that when our plugin writes detections, we’ll create an NvDsObjectMeta for each one.

The critical thing to understand is that none of this is owned by nvinfer. It’s a shared data contract. Any GStreamer element in the pipeline can read from it, write to it, or both:

nvtracker reads object bounding boxes and writes tracking IDs.
nvdsosd reads boxes and labels to draw overlays.
nvmsgconv reads the whole structure to produce message payloads.

Our custom plugin will simply write detections into this structure the same way nvinfer would, and everything downstream picks them up without modification. One important constraint worth understanding before we write any code: NvDsObjectMeta instances cannot be constructed directly from Python. Attempting to instantiate the class raises a No constructor defined! error at runtime.

The reason is architectural. DeepStream manages its metadata objects through memory pools — pre-allocated blocks that get recycled across frames to avoid the overhead of repeated heap allocation and deallocation in a high-throughput pipeline. These pools are owned by NvDsBatchMeta and live on the C side of the boundary. The Python bindings expose access to those pools, but deliberately don’t expose a Python-side constructor, because creating an NvDsObjectMeta outside the pool would bypass the lifecycle management that keeps DeepStream’s memory usage predictable. The correct way to get one is to ask the batch for it: batch_meta.acquire_object_meta(), which hands you a pre-allocated instance from the pool. When the frame is done, DeepStream returns it to the pool automatically.
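
To make the pooling idea concrete, here is a minimal pure-Python sketch of the acquire/release pattern. ObjectPool and PooledObject are hypothetical stand-ins written for this illustration; they are not part of DeepStream or pyservicemaker, whose pools live on the C side and are managed for you.

```python
class PooledObject:
    """Hypothetical stand-in for a pooled metadata record."""

    def __init__(self):
        self.class_id = -1
        self.confidence = 0.0

    def reset(self):
        # Clear per-frame state so the instance can be handed out again.
        self.class_id = -1
        self.confidence = 0.0


class ObjectPool:
    """Pre-allocates a fixed number of objects and recycles them."""

    def __init__(self, size):
        self._free = [PooledObject() for _ in range(size)]

    def acquire(self):
        if not self._free:
            raise RuntimeError("pool exhausted")
        return self._free.pop()

    def release(self, obj):
        obj.reset()
        self._free.append(obj)

    @property
    def available(self):
        return len(self._free)


pool = ObjectPool(size=2)
first = pool.acquire()
first.class_id = 3
pool.release(first)
second = pool.acquire()  # the recycled instance comes back, already reset
```

No object is created after the pool is built, which is the property a direct Python constructor would break.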

The Python Bridge: pyservicemaker

To interact with DeepStream’s metadata from Python, we’ll use pyservicemaker, NVIDIA’s current, supported Python SDK for DeepStream. The official documentation covers the basics of pipelines and flows, but stops short of showing how to write and attach metadata from a custom inference element. That’s the gap this article fills.

The key abstraction is BatchMetadataOperator. Subclassing it and implementing handle_metadata(batch_meta) gives you access to the full NvDsBatchMeta for every buffer flowing through the pipeline. From there, iterating frames is as simple as using batch_meta.frame_items and attaching a detection object.

pyservicemaker also provides a Buffer wrapper around Gst.Buffer that exposes batch_meta directly and, importantly, an extract(batch_id) method that returns a DLPack handle to each frame’s GPU memory. That’s what makes zero-copy inference possible — you can hand the frame straight to TensorRT without ever leaving the GPU.

Rather than using BatchMetadataOperator standalone via a probe, we’ll fold the same pattern directly into our custom plugin’s do_transform_ip method, which gives us control over the element’s lifecycle, properties, and caps negotiation alongside the metadata access. But first, we need to build that plugin.

A Discoverable Python GStreamer Plugin

GStreamer discovers plugins at runtime by scanning directories listed in GST_PLUGIN_PATH. For Python plugins specifically, it looks inside a python/ subdirectory within each of those paths. That means your plugin is just a .py file dropped in the right place — no compilation, no CMake, no shared library. The tradeoff is that the registration pattern is strict, and getting it wrong produces silent failures that are genuinely hard to debug.

$GST_PLUGIN_PATH/
└── python/
    └── gstexampleplugin.py   # your plugin

Set GST_PLUGIN_PATH to point at the parent directory, and GStreamer will find python/gstexampleplugin.py automatically on the next pipeline run.
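
A common mistake is pointing GST_PLUGIN_PATH at the python/ directory itself rather than its parent. The sketch below is a small stand-alone check that mirrors the layout described above; find_python_plugin is a hypothetical helper, not a GStreamer API, and it does not reproduce how GStreamer scans internally.

```python
import os
import tempfile
from pathlib import Path


def find_python_plugin(filename, plugin_path=None):
    """Return the first <dir>/python/<filename> on the plugin path, or None."""
    if plugin_path is None:
        plugin_path = os.environ.get("GST_PLUGIN_PATH", "")
    for entry in plugin_path.split(os.pathsep):
        if not entry:
            continue
        candidate = Path(entry) / "python" / filename
        if candidate.is_file():
            return candidate
    return None


# Demonstration against a throwaway directory tree.
with tempfile.TemporaryDirectory() as root:
    plugin_dir = Path(root) / "python"
    plugin_dir.mkdir()
    (plugin_dir / "gstexampleplugin.py").write_text("# plugin body goes here\n")

    # Correct: the path entry is the parent of python/.
    found = find_python_plugin("gstexampleplugin.py", plugin_path=root)
    found_name = found.name if found else None

    # Incorrect: the path entry is python/ itself, so nothing is found.
    missing = find_python_plugin("gstexampleplugin.py", plugin_path=str(plugin_dir))
```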

The Plugin Skeleton

Here’s the minimal skeleton for a passthrough inference element: it receives batched video buffers, runs inference, attaches metadata, and passes the buffer downstream unmodified.

import gi
gi.require_version('Gst', '1.0')
gi.require_version('GstBase', '1.0')
from gi.repository import Gst, GstBase, GObject

import torch
from pyservicemaker import Buffer

GST_PLUGIN_NAME = "gstexampleplugin"

Gst.init(None)

# Both pads accept RGB frames that stay in NVIDIA device memory (NVMM).
NVMM_RGB_CAPS = (
    "video/x-raw(memory:NVMM), format=RGB, "
    "width=(int)[ 1, 2147483647 ], height=(int)[ 1, 2147483647 ], "
    "framerate=(fraction)[ 0/1, 2147483647/1 ]"
)
src_format = Gst.Caps.from_string(NVMM_RGB_CAPS)
sink_format = Gst.Caps.from_string(NVMM_RGB_CAPS)

src_pad_template = Gst.PadTemplate.new(
    "src", Gst.PadDirection.SRC, Gst.PadPresence.ALWAYS, src_format
)
sink_pad_template = Gst.PadTemplate.new(
    "sink", Gst.PadDirection.SINK, Gst.PadPresence.ALWAYS, sink_format
)


class GstExamplePlugin(GstBase.BaseTransform):

    # --- Plugin Metadata ---
    __gstmetadata__ = (
        'GstExamplePlugin',          # long name
        'Filter/Effect/Video',       # classification
        'Custom inference element',  # description
        'Your Name',                 # author
    )

    __gsttemplates__ = (src_pad_template, sink_pad_template)

    __gproperties__ = {
        'model-engine': (
            str,
            'TensorRT engine path',
            'Path to the .engine file',
            '',                       # default
            GObject.ParamFlags.READWRITE
        ),
        'confidence-threshold': (
            float,
            'Confidence threshold',
            'Minimum confidence to attach a detection',
            0.0, 1.0, 0.5,            # min, max, default
            GObject.ParamFlags.READWRITE
        ),
    }

    def __init__(self):
        super().__init__()
        self.model_engine = ''
        self.confidence_threshold = 0.5
        self.engine = None

    def do_get_property(self, prop):
        if prop.name == 'model-engine':
            return self.model_engine
        elif prop.name == 'confidence-threshold':
            return self.confidence_threshold
        raise AttributeError(f'unknown property {prop.name}')

    def do_set_property(self, prop, value):
        if prop.name == 'model-engine':
            self.model_engine = value
        elif prop.name == 'confidence-threshold':
            self.confidence_threshold = value
        else:
            raise AttributeError(f'unknown property {prop.name}')

    def do_start(self):
        # load_engine is a placeholder: supply your own loader that returns
        # a callable accepting the batched tensor built in do_transform_ip.
        self.engine = load_engine(self.model_engine)
        return True

    def do_transform_ip(self, gst_buffer: Gst.Buffer) -> Gst.FlowReturn:
        """In-place transform: attach metadata, pass buffer unchanged."""
        buffer = Buffer(gst_buffer)
        batch_meta = buffer.batch_meta

        # Wrap each frame's GPU memory as a torch tensor without copying.
        frames = []
        for frame_meta in batch_meta.frame_items:
            t = torch.utils.dlpack.from_dlpack(buffer.extract(frame_meta.batch_id))
            frames.append(t)
        batch = torch.stack(frames, dim=0)

        # Run your model inference
        results = self.engine(batch)

        # Attach each frame's results. Detections and segmentations go into
        # object metadata; anything else can be attached as user metadata.
        # The following is pseudocode that depends on your inference output.
        # It assumes results holds one entry per frame, in batch order.
        for frame_meta, detections in zip(batch_meta.frame_items, results):
            for det in detections:
                obj = batch_meta.acquire_object_meta()
                # Fill obj from det (box, class, confidence, label)
                ...
                frame_meta.append(obj)

        return Gst.FlowReturn.OK


# --- Registration ---
GObject.type_register(GstExamplePlugin)
__gstelementfactory__ = (GST_PLUGIN_NAME, Gst.Rank.NONE, GstExamplePlugin)

A few important details about this basic structure:

GstBase.BaseTransform is the ideal base class for an in-place filter — one that takes a buffer, modifies it (by adding metadata), and sends it downstream. We override do_transform_ip instead of do_transform since we aren’t creating a new output buffer.

__gstmetadata__ and __gsttemplates__ are mandatory. GStreamer will not register the element without them. The caps string video/x-raw(memory:NVMM) informs GStreamer that this element handles NVIDIA memory, which is critical for keeping data on the GPU within a DeepStream pipeline.

__gproperties__ makes model-engine and confidence-threshold available as native GStreamer properties, allowing you to configure them via a gst-launch command or from Python pipeline code without modifying the source.

The final two lines are necessary for registration: GObject.type_register registers the class with the GObject type system, and __gstelementfactory__ tells GStreamer which element name to expose and which class to create.

Checking the plugin. After placing the file and clearing the GStreamer registry cache (by default under ~/.cache/gstreamer-1.0/), confirm registration by running:

GST_PLUGIN_PATH=/path/to/your/plugins gst-inspect-1.0 gstexampleplugin

You should see the element metadata, pad templates, and both properties displayed. If they appear, GStreamer recognizes your plugin and you can integrate it into a pipeline.

End-to-End Inference Example Using Ultralytics

With the plugin structure ready, it’s time to implement the inference logic. The complete working code is provided as a GitHub Gist. Once that plugin is discoverable on GST_PLUGIN_PATH, you can inspect it with gst-inspect-1.0 as shown earlier or run it in a pipeline. Below is a simple example that runs inference and reports the frame rate:

gst-launch-1.0 -v \
  nvstreammux name=m width=1280 height=720 batch-size=1 \
    batched-push-timeout=33000 ! \
  nvvideoconvert nvbuf-memory-type=0 ! \
  'video/x-raw(memory:NVMM), format=RGB' ! \
  gstyoloplugin model-path=/path/to/yolo26s.engine ! \
  fpsdisplaysink text-overlay=false silent=false sync=false \
    video-sink=fakesink \
  uridecodebin uri=file:///path/to/video.mp4 ! m.sink_0

Reviewing the Code

Compatibility Issue

If you’ve looked at the code, you may have noticed that we are overriding the tuple object, but only within the ultralytics.nn.backends.tensorrt module, since that is where the issue occurs. There is a known compatibility problem between the TensorRT Python bindings and the GStreamer Python wrapper framework (PyGObject) that can cause your pipeline to crash with the well-known error “Segmentation fault (core dumped).” This is why the following code snippet was needed to maintain the expected behavior:

import ultralytics.nn.backends.tensorrt as trt_backend

_original_tuple = tuple

def safe_tuple(obj):
    if "tensorrt" in type(obj).__module__ and type(obj).__name__ == "Dims":
        return _original_tuple(obj[i] for i in range(len(obj)))
    return _original_tuple(obj)

trt_backend.tuple = safe_tuple

This swaps the tuple reference inside the Ultralytics backend’s namespace at runtime with a version that uses index-based access for Dims objects, leaving all other behavior unchanged. It’s not a clean solution, but it’s precise and must be executed at import time, before any model is loaded.
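
The mechanism behind this patch is ordinary Python name lookup: a module-level name shadows the builtin of the same name, but only for code defined in that module. The sketch below demonstrates it with stand-ins, so it runs without TensorRT or Ultralytics. fake_backend, shape_of, and FakeDims are hypothetical names invented for the demonstration.

```python
import types

# Hypothetical stand-in for a third-party module that calls tuple() internally.
backend = types.ModuleType("fake_backend")
exec("def shape_of(dims):\n    return tuple(dims)\n", backend.__dict__)


class FakeDims:
    """Stand-in for a Dims-like object: indexing works, iteration does not."""

    def __init__(self, *values):
        self._values = list(values)

    def __len__(self):
        return len(self._values)

    def __getitem__(self, index):
        return self._values[index]

    def __iter__(self):
        raise RuntimeError("iteration is unsafe for this type")


_original_tuple = tuple


def safe_tuple(obj):
    # Use index-based access for the problematic type only.
    if isinstance(obj, FakeDims):
        return _original_tuple(obj[i] for i in range(len(obj)))
    return _original_tuple(obj)


# Shadow the builtin inside the stand-in module only.
backend.tuple = safe_tuple

patched = backend.shape_of(FakeDims(1, 3, 640, 640))
untouched = backend.shape_of([1, 2])

# Outside the patched module, the builtin behaves exactly as before.
try:
    tuple(FakeDims(1, 3))
    builtin_still_original = False
except RuntimeError:
    builtin_still_original = True
```

Because the override lives in one module's namespace, no other caller of tuple is affected, which is what keeps the workaround contained.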

The Inference Loop

The inference loop itself is fairly simple:

Extract the frames from the buffers
Preprocess + inference
Link the results to the object metadata of each frame if the downstream pipeline elements are DeepStream plugins.

Below is a code snippet demonstrating zero-copy data transfer using DLPack:

frames = []
for frame_meta in batch_meta.frame_items:
    t = torch.utils.dlpack.from_dlpack(buffer.extract(frame_meta.batch_id))
    frames.append(t)
batch = torch.stack(frames, dim=0)
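
The third step, linking results to each frame, can be sketched with plain-Python stand-ins so the control flow is visible without a running pipeline. FakeFrameMeta and FakeBatchMeta are hypothetical; they imitate only the frame_items, acquire_object_meta, and append calls used in this article. The (box, confidence, class_id) tuple layout, and the assumption that results arrive in the same order as the frames, are choices made for this illustration.

```python
from dataclasses import dataclass, field


@dataclass
class FakeFrameMeta:
    """Stand-in for a frame's metadata; holds only what this sketch needs."""
    batch_id: int
    objects: list = field(default_factory=list)

    def append(self, obj):
        self.objects.append(obj)


@dataclass
class FakeBatchMeta:
    """Stand-in for the batch metadata that owns the object pool."""
    frame_items: list

    def acquire_object_meta(self):
        # The real pool hands out recycled instances; a dict is enough here.
        return {}


def attach_detections(batch_meta, results, threshold):
    """Attach each frame's own detections, skipping low-confidence ones."""
    for frame_meta, detections in zip(batch_meta.frame_items, results):
        for box, confidence, class_id in detections:
            if confidence < threshold:
                continue
            obj = batch_meta.acquire_object_meta()
            obj.update(box=box, confidence=confidence, class_id=class_id)
            frame_meta.append(obj)


batch_meta = FakeBatchMeta([FakeFrameMeta(0), FakeFrameMeta(1)])
results = [
    [((10, 20, 50, 80), 0.9, 0), ((0, 0, 5, 5), 0.2, 1)],  # frame 0
    [((30, 40, 60, 90), 0.7, 2)],                          # frame 1
]
attach_detections(batch_meta, results, threshold=0.5)
counts = [len(frame.objects) for frame in batch_meta.frame_items]
```

Pairing frames with results explicitly matters in multi-stream batches: attaching the whole result list to every frame would draw one stream's boxes on another's video.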

Preparing the Input Data

YOLO models, when receiving a torch.Tensor, require a fixed input shape of (N, 3, 640, 640) as specified in the documentation. However, frames output by nvstreammux will match the resolution of your source video. The solution used is letterboxing: resizing the frame to fit within the target dimensions while maintaining the original aspect ratio, then filling the remaining space with padding. The key advantage is that this entire process can be performed on the GPU, across the entire batch simultaneously, without any need to access CPU memory.
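
The geometry of letterboxing is small enough to work through by hand. This sketch computes the scale factor and padding for one frame size. letterbox_params is a hypothetical helper, and centering the image inside the padding is an assumption; the Gist may pad differently.

```python
def letterbox_params(src_w, src_h, dst_w=640, dst_h=640):
    """Return (scale, new_w, new_h, pad_left, pad_top) for an aspect-preserving fit."""
    # The tighter of the two ratios decides the scale, so nothing is cropped.
    scale = min(dst_w / src_w, dst_h / src_h)
    new_w = round(src_w * scale)
    new_h = round(src_h * scale)
    # Split the leftover space evenly; any odd pixel goes to the far side.
    pad_left = (dst_w - new_w) // 2
    pad_top = (dst_h - new_h) // 2
    return scale, new_w, new_h, pad_left, pad_top


landscape = letterbox_params(1280, 720)
portrait = letterbox_params(720, 1280)
```

For a 1280x720 source the frame shrinks to 640x360 and gets 140 pixels of padding above and below. On batched GPU tensors the same numbers would drive a resize and a pad, for example with torch.nn.functional.interpolate and torch.nn.functional.pad.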

With frame extraction, letterboxing, inference, and coordinate mapping all executed on the GPU within a single do_transform_ip call, the plugin functions identically to nvinfer from the perspective of all downstream components, while offering the full flexibility of a Python-based inference stack underneath.
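
Coordinate mapping is the inverse of letterboxing: subtract the padding, divide by the scale, and clamp to the source frame. The sketch below assumes boxes in (x1, y1, x2, y2) pixel form within the model's input space. map_box_to_source is a hypothetical helper, not taken from the Gist.

```python
def _clamp(value, upper):
    """Limit value to the closed range [0, upper]."""
    return max(0.0, min(float(upper), value))


def map_box_to_source(box, scale, pad_left, pad_top, src_w, src_h):
    """Undo letterboxing for an (x1, y1, x2, y2) box and clamp to the frame."""
    x1, y1, x2, y2 = box
    x1 = (x1 - pad_left) / scale
    y1 = (y1 - pad_top) / scale
    x2 = (x2 - pad_left) / scale
    y2 = (y2 - pad_top) / scale
    return (_clamp(x1, src_w), _clamp(y1, src_h),
            _clamp(x2, src_w), _clamp(y2, src_h))


# A 1280x720 frame letterboxed into 640x640: scale 0.5, 140 px of padding on top.
inside = map_box_to_source((100, 240, 300, 440), 0.5, 0, 140, 1280, 720)

# A box that starts inside the padding band is clamped to the frame edge.
clipped = map_box_to_source((0, 100, 640, 200), 0.5, 0, 140, 1280, 720)
```

Clamping matters because a detector can place box edges inside the padded band, which would otherwise map to coordinates outside the frame.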

From this point onward, the rest of the DeepStream pipeline handles the remaining tasks: nvtracker assigns tracking IDs, nvdsosd renders visual overlays, and nvmsgconv serializes the output payloads.

Key Takeaways and Future Directions

If you’ve made it this far, you now have a functional approach for substituting nvinfer with your own custom Python inference element, and more importantly, you understand the reasoning behind each design decision.

This approach is broadly applicable. Everything covered here (the plugin framework, batched preprocessing, and metadata linking) is independent of any specific model. Replacing Ultralytics YOLO with Roboflow’s rfdetr is a simple swap, and the GStreamer and pyservicemaker infrastructure remains unchanged. The same applies to more complex architectures: NVIDIA’s deepstream_reference_apps repository contains a practical example of integrating a Vision-Language Model via vLLM using this exact plugin method, which is highly recommended for anyone looking to move beyond object detection into advanced video understanding.

The complete plugin code is available as a GitHub Gist. If you develop something based on it, whether a different model, a multi-stream configuration, or a VLM integration, I’d love to learn about your experience. Happy coding!
