We implement an advanced, end-to-end Kornia tutorial and show how modern, differentiable computer vision can be built entirely in PyTorch. We start by constructing GPU-accelerated, synchronized augmentation pipelines for images, masks, and keypoints, then move into differentiable geometry by optimizing a homography directly through gradient descent. We also show how learned feature matching with LoFTR integrates with Kornia’s RANSAC to estimate robust homographies and produce a simple stitched output, even under constrained or offline-safe conditions. Finally, we ground these ideas in practice by training a lightweight CNN on CIFAR-10 using Kornia’s GPU augmentations, highlighting how research-grade vision pipelines translate naturally into learning systems.

import os, math, time, random, urllib.request
from dataclasses import dataclass
from typing import Tuple
import sys, subprocess

def pip_install(pkgs):
    subprocess.check_call([sys.executable, "-m", "pip", "install", "-q"] + pkgs)

pip_install([
    "kornia==0.8.2",
    "torch",
    "torchvision",
    "matplotlib",
    "numpy",
    "opencv-python-headless"
])

import numpy as np
import torch
import torch.nn as nn
import torch.nn.functional as F
import torchvision
import torchvision.transforms.functional as TF
import matplotlib.pyplot as plt
import cv2
import kornia
import kornia.augmentation as K
import kornia.geometry.transform as KG
from kornia.geometry.ransac import RANSAC
from kornia.feature import LoFTR

torch.manual_seed(0)
np.random.seed(0)
random.seed(0)

# Select the compute device (GPU when available); every later section uses `device`.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
print("Torch:", torch.__version__)
print("Kornia:", kornia.__version__)
print("Device:", device)

We begin by setting up a reproducible environment, installing Kornia and its core dependencies so that GPU-accelerated, differentiable computer vision runs smoothly in Google Colab. We then import and organize PyTorch, Kornia, and supporting libraries, establishing a clean foundation for geometry, augmentation, and feature-matching workflows. We set the random seeds and select the available compute device so that all subsequent experiments stay as reproducible, debuggable, and performance-aware as possible.

def to_tensor_img_uint8(img_bgr_uint8: np.ndarray) -> torch.Tensor:
    # BGR uint8 (H, W, 3) -> RGB float tensor (1, 3, H, W) in [0, 1]
    img_rgb = cv2.cvtColor(img_bgr_uint8, cv2.COLOR_BGR2RGB)
    t = torch.from_numpy(img_rgb).permute(2, 0, 1).float() / 255.0
    return t.unsqueeze(0)

def show(img_t: torch.Tensor, title: str = "", max_size: int = 900):
    x = img_t.detach().float().cpu().clamp(0, 1)
    if x.shape[1] == 1:
        x = x.repeat(1, 3, 1, 1)
    x = x[0].permute(1, 2, 0).numpy()
    h, w = x.shape[:2]
    scale = min(1.0, max_size / max(h, w))
    if scale < 1.0:
        x = cv2.resize(x, (int(w * scale), int(h * scale)), interpolation=cv2.INTER_AREA)
    plt.figure(figsize=(7, 5))
    plt.imshow(x)
    plt.axis("off")
    plt.title(title)
    plt.show()

def show_mask(mask_t: torch.Tensor, title: str = ""):
    x = mask_t.detach().float().cpu().clamp(0, 1)[0, 0].numpy()
    plt.figure(figsize=(6, 4))
    plt.imshow(x)
    plt.axis("off")
    plt.title(title)
    plt.show()

def download(url: str, path: str):
    os.makedirs(os.path.dirname(path), exist_ok=True)
    if not os.path.exists(path):
        urllib.request.urlretrieve(url, path)

def safe_download(url: str, path: str) -> bool:
    # Same as download(), but reports failure instead of raising.
    try:
        os.makedirs(os.path.dirname(path), exist_ok=True)
        if not os.path.exists(path):
            urllib.request.urlretrieve(url, path)
        return True
    except Exception as e:
        print("Download failed:", e)
        return False

def make_grid_mask(h: int, w: int, cell: int = 32) -> torch.Tensor:
    # Checkerboard mask of shape (1, 1, h, w) with square cells of side `cell`.
    yy, xx = torch.meshgrid(torch.arange(h), torch.arange(w), indexing="ij")
    m = (((yy // cell) % 2) ^ ((xx // cell) % 2)).float()
    return m.unsqueeze(0).unsqueeze(0)

def draw_matches(img0_rgb: np.ndarray, img1_rgb: np.ndarray, pts0: np.ndarray, pts1: np.ndarray, max_draw: int = 200) -> np.ndarray:
    # Place both images side by side and connect up to `max_draw` matched points.
    h0, w0 = img0_rgb.shape[:2]
    h1, w1 = img1_rgb.shape[:2]
    out = np.zeros((max(h0, h1), w0 + w1, 3), dtype=np.uint8)
    out[:h0, :w0] = img0_rgb
    out[:h1, w0:w0+w1] = img1_rgb
    n = min(len(pts0), len(pts1), max_draw)
    if n == 0:
        return out
    idx = np.random.choice(len(pts0), size=n, replace=False) if len(pts0) > n else np.arange(n)
    for i in idx:
        x0, y0 = pts0[i]
        x1, y1 = pts1[i]
        x1_shift = x1 + w0
        p0 = (int(round(x0)), int(round(y0)))
        p1 = (int(round(x1_shift)), int(round(y1)))
        cv2.circle(out, p0, 2, (255, 255, 255), -1, lineType=cv2.LINE_AA)
        cv2.circle(out, p1, 2, (255, 255, 255), -1, lineType=cv2.LINE_AA)
        cv2.line(out, p0, p1, (255, 255, 255), 1, lineType=cv2.LINE_AA)
    return out

def normalize_img_for_loftr(img_rgb01: torch.Tensor) -> torch.Tensor:
    # LoFTR expects single-channel input; convert RGB to grayscale when needed.
    if img_rgb01.shape[1] == 3:
        return kornia.color.rgb_to_grayscale(img_rgb01)
    return img_rgb01

We define a set of reusable helper utilities for image conversion, visualization, safe data downloading, and synthetic mask generation, keeping the vision pipeline clean and modular. We also implement visualization and matching helpers that let us inspect augmented images, masks, and LoFTR correspondences directly during experimentation. We convert image inputs to the tensor formats expected by Kornia and LoFTR (batched float tensors in [0, 1], single-channel grayscale for LoFTR), so that all downstream geometry and feature-matching components operate consistently.
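
To make the mask helper concrete, here is a minimal pure-Python sketch of the checkerboard rule that make_grid_mask applies with tensors: a pixel is 1 when its row-cell index and its column-cell index have different parity. The function name `checkerboard` and the tiny sizes are illustrative assumptions and are not part of the tutorial code.

```python
from typing import List


def checkerboard(h: int, w: int, cell: int) -> List[List[int]]:
    """Return an h x w grid of 0/1 values using the XOR-of-parities rule."""
    return [
        [((y // cell) % 2) ^ ((x // cell) % 2) for x in range(w)]
        for y in range(h)
    ]


if __name__ == "__main__":
    # Print a 4 x 8 board with 2 x 2 cells, one text row per image row.
    for row in checkerboard(4, 8, cell=2):
        print("".join(str(v) for v in row))
```

Because the rule depends only on integer cell indices, the same expression works unchanged on integer tensors, which is what the tensor version relies on.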

print("\n[1] Differentiable augmentations: image + mask + keypoints")
B, C, H, W = 1, 3, 256, 384
img = torch.rand(B, C, H, W, device=device)
mask = make_grid_mask(H, W, cell=24).to(device)
kps = torch.tensor([[
    [40.0, 40.0],
    [W - 50.0, 50.0],
    [W * 0.6, H * 0.8],
    [W * 0.25, H * 0.65],
]], device=device)
aug = K.AugmentationSequential(
    K.RandomResizedCrop((224, 224), scale=(0.6, 1.0), ratio=(0.8, 1.25), p=1.0),
    K.RandomHorizontalFlip(p=0.5),
    K.RandomRotation(degrees=18.0, p=0.7),
    K.ColorJiggle(0.2, 0.2, 0.2, 0.1, p=0.8),
    data_keys=["input", "mask", "keypoints"],
    same_on_batch=True
).to(device)
img_aug, mask_aug, kps_aug = aug(img, mask, kps)
print("image:", tuple(img.shape), "->", tuple(img_aug.shape))
print("mask :", tuple(mask.shape), "->", tuple(mask_aug.shape))
print("kps  :", tuple(kps.shape), "->", tuple(kps_aug.shape))
print("Example keypoints (before -> after):")
print(torch.cat([kps[0], kps_aug[0]], dim=1))
show(img, "Original (synthetic)")
show_mask(mask, "Original mask (synthetic)")
show(img_aug, "Augmented (synced)")
show_mask(mask_aug, "Augmented mask (synced)")

We construct a synchronized, fully differentiable augmentation pipeline that applies the same geometric transformations to images, masks, and keypoints on the GPU. We generate synthetic data to show clearly how spatial consistency is preserved across modalities while still introducing realistic variability through cropping, rotation, flipping, and color jitter. We visualize the before-and-after results to verify that the augmented images, segmentation masks, and keypoints remain aligned after transformation.
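
What "synchronized" means can be illustrated without Kornia. The sketch below, in plain Python, mirrors a tiny image and a keypoint with the same horizontal flip and checks that the keypoint still lands on the same pixel value. The helper names and the integer pixel-index convention (x' = W - 1 - x) are assumptions made for illustration; Kornia applies its own coordinate convention internally, which may differ in detail.

```python
from typing import List, Tuple


def hflip_image(img: List[List[int]]) -> List[List[int]]:
    """Mirror a 2D image (list of rows) left to right."""
    return [row[::-1] for row in img]


def hflip_keypoint(pt: Tuple[int, int], width: int) -> Tuple[int, int]:
    """Mirror an (x, y) keypoint; assumes integer pixel-index coordinates."""
    x, y = pt
    return (width - 1 - x, y)


if __name__ == "__main__":
    img = [[10, 11, 12, 13],
           [20, 21, 22, 23]]
    kp = (3, 1)  # (x, y): points at the value 23
    flipped = hflip_image(img)
    fx, fy = hflip_keypoint(kp, width=4)
    # Both values are the same pixel, before and after the flip.
    print(img[kp[1]][kp[0]], flipped[fy][fx])
```

If the image were flipped but the keypoint left untouched, the two printed values would differ, which is the misalignment a shared set of sampled parameters is meant to prevent.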

print("\n[2] Differentiable homography alignment by optimization")
base = torch.rand(1, 1, 240, 320, device=device)
show(base, "Base image (grayscale)")
true_H_px = torch.eye(3, device=device).unsqueeze(0)
true_H_px[:, 0, 2] = 18.0
true_H_px[:, 1, 2] = -12.0
true_H_px[:, 0, 1] = 0.03
true_H_px[:, 1, 0] = -0.02
true_H_px[:, 2, 0] = 1e-4
true_H_px[:, 2, 1] = -8e-5
target = KG.warp_perspective(base, true_H_px, dsize=(base.shape[-2], base.shape[-1]), align_corners=True)
show(target, "Target (base warped by true homography)")
p = torch.zeros(1, 8, device=device, requires_grad=True)

def params_to_H(p8: torch.Tensor) -> torch.Tensor:
    # Build a 3x3 homography as identity plus 8 learnable offsets (H[2, 2] stays 1).
    Bp = p8.shape[0]
    Hm = torch.eye(3, device=p8.device).unsqueeze(0).repeat(Bp, 1, 1)
    Hm[:, 0, 0] = 1.0 + p8[:, 0]
    Hm[:, 0, 1] = p8[:, 1]
    Hm[:, 0, 2] = p8[:, 2]
    Hm[:, 1, 0] = p8[:, 3]
    Hm[:, 1, 1] = 1.0 + p8[:, 4]
    Hm[:, 1, 2] = p8[:, 5]
    Hm[:, 2, 0] = p8[:, 6]
    Hm[:, 2, 1] = p8[:, 7]
    return Hm

opt = torch.optim.Adam([p], lr=0.08)
losses = []
for step in range(120):
    opt.zero_grad(set_to_none=True)
    H_est = params_to_H(p)
    pred = KG.warp_perspective(base, H_est, dsize=(base.shape[-2], base.shape[-1]), align_corners=True)
    loss_photo = (pred - target).abs().mean()
    loss_reg = 1e-3 * (p ** 2).mean()
    loss = loss_photo + loss_reg
    loss.backward()
    opt.step()
    losses.append(loss.item())
print("Final loss:", losses[-1])
plt.figure(figsize=(6,4))
plt.plot(losses)
plt.title("Homography optimization loss")
plt.xlabel("step")
plt.ylabel("loss")
plt.show()
H_est_final = params_to_H(p.detach())
pred_final = KG.warp_perspective(base, H_est_final, dsize=(base.shape[-2], base.shape[-1]), align_corners=True)
show(pred_final, "Recovered warp (optimized)")
show((pred_final - target).abs(), "Abs error (recovered vs target)")
print("True H (pixel):\n", true_H_px.squeeze(0).detach().cpu().numpy())
print("Est H:\n", H_est_final.squeeze(0).detach().cpu().numpy())

We show that geometric alignment can be treated as a differentiable optimization problem by recovering a homography directly via gradient descent. We first generate a target image by warping a base image with a known homography, then learn the transformation parameters by minimizing a photometric (L1) reconstruction loss with a small L2 regularizer on the parameters. We also visualize the optimized warp and the error map, and print both matrices, to check how closely the estimated homography matches the ground-truth transformation.
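
For readers who want to see what the 3x3 matrix does to a single point, the following plain-Python sketch applies a homography using homogeneous coordinates and a perspective divide. The function name `apply_homography` is a hypothetical helper, and the matrix reuses the entries of true_H_px from the section above. Whether a warping routine interprets the matrix as source-to-destination or the reverse depends on that routine’s convention, so treat the direction here as an assumption.

```python
from typing import List, Tuple

Matrix3 = List[List[float]]


def apply_homography(H: Matrix3, x: float, y: float) -> Tuple[float, float]:
    """Map (x, y) through H using homogeneous coordinates and a perspective divide."""
    xh = H[0][0] * x + H[0][1] * y + H[0][2]
    yh = H[1][0] * x + H[1][1] * y + H[1][2]
    w = H[2][0] * x + H[2][1] * y + H[2][2]
    if abs(w) < 1e-12:
        raise ValueError("point maps to infinity under this homography")
    return xh / w, yh / w


# Same entries as true_H_px in the tutorial section above.
TRUE_H: Matrix3 = [
    [1.0, 0.03, 18.0],
    [-0.02, 1.0, -12.0],
    [1e-4, -8e-5, 1.0],
]

if __name__ == "__main__":
    print(apply_homography(TRUE_H, 0.0, 0.0))      # origin: only the translation acts
    print(apply_homography(TRUE_H, 319.0, 239.0))  # far corner of the 320 x 240 image
```

A homography is defined only up to scale, which is why the tutorial fixes the bottom-right entry to 1 and optimizes the remaining eight values.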

print("\n[3] LoFTR matching + RANSAC homography + stitching (403-safe)")
data_dir = "/content/kornia_demo"
os.makedirs(data_dir, exist_ok=True)
img0_path = os.path.join(data_dir, "img0.png")
img1_path = os.path.join(data_dir, "img1.png")
ok0 = safe_download(
    "",  # image URL missing in the source; supply one here
    img0_path
)
ok1 = safe_download(
    "",  # image URL missing in the source; supply one here
    img1_path
)
if not (ok0 and ok1):
    print("⚠️ Using synthetic fallback images (no network / blocked downloads)")
    base_rgb = torch.rand(1, 3, 480, 640, device=device)
    H_syn = torch.tensor([[
        [1.0, 0.05, 40.0],
        [-0.03, 1.0, 25.0],
        [1e-4, -8e-5, 1.0]
    ]], device=device)
    t0 = base_rgb
    t1 = KG.warp_perspective(base_rgb, H_syn, dsize=(480, 640), align_corners=True)
    img0_rgb = (t0[0].permute(1,2,0).detach().cpu().numpy() * 255).astype(np.uint8)
    img1_rgb = (t1[0].permute(1,2,0).detach().cpu().numpy() * 255).astype(np.uint8)
else:
    img0_bgr = cv2.imread(img0_path, cv2.IMREAD_COLOR)
    img1_bgr = cv2.imread(img1_path, cv2.IMREAD_COLOR)
    if img0_bgr is None or img1_bgr is None:
        raise RuntimeError("Failed to load downloaded images.")
    img0_rgb = cv2.cvtColor(img0_bgr, cv2.COLOR_BGR2RGB)
    img1_rgb = cv2.cvtColor(img1_bgr, cv2.COLOR_BGR2RGB)
    t0 = to_tensor_img_uint8(img0_bgr).to(device)
    t1 = to_tensor_img_uint8(img1_bgr).to(device)
show(t0, "Image 0")
show(t1, "Image 1")
g0 = normalize_img_for_loftr(t0)
g1 = normalize_img_for_loftr(t1)
loftr = LoFTR(pretrained="outdoor").to(device).eval()
with torch.inference_mode():
    correspondences = loftr({"image0": g0, "image1": g1})
mkpts0 = correspondences["keypoints0"]
mkpts1 = correspondences["keypoints1"]
mconf = correspondences.get("confidence", None)
print("Raw matches:", mkpts0.shape[0])
if mkpts0.shape[0] < 8:
    raise RuntimeError("Too few matches to estimate homography.")
if mconf is not None:
    # Keep only the most confident matches before robust estimation.
    mconf = mconf.detach()
    topk = min(2000, mkpts0.shape[0])
    idx = torch.topk(mconf, k=topk, largest=True).indices
    mkpts0 = mkpts0[idx]
    mkpts1 = mkpts1[idx]
    print("Kept top matches:", mkpts0.shape[0])
ransac = RANSAC(
    model_type="homography",
    inl_th=3.0,
    batch_size=4096,
    max_iter=10,
    confidence=0.999,
    max_lo_iters=5
).to(device)
with torch.inference_mode():
    H01, inliers = ransac(mkpts0, mkpts1)
print("Estimated H shape:", tuple(H01.shape))
print("Inliers:", int(inliers.sum().item()), "/", int(inliers.numel()))
vis = draw_matches(
    img0_rgb,
    img1_rgb,
    mkpts0.detach().cpu().numpy(),
    mkpts1.detach().cpu().numpy(),
    max_draw=250
)
plt.figure(figsize=(10,5))
plt.imshow(vis)
plt.axis("off")
plt.title("LoFTR matches (subset)")
plt.show()
H01 = H01.unsqueeze(0) if H01.ndim == 2 else H01
warped0 = KG.warp_perspective(t0, H01, dsize=(t1.shape[-2], t1.shape[-1]), align_corners=True)
stitched = torch.max(warped0, t1)
show(warped0, "Image0 warped into Image1 frame (via RANSAC homography)")
show(stitched, "Simple stitched blend (max)")

We perform learned feature matching using LoFTR to establish dense correspondences between two images, while staying robust to blocked or unavailable downloads through a synthetic fallback. We then apply Kornia’s RANSAC to estimate a stable homography from these matches and warp one image into the coordinate frame of the other. We visualize the correspondences and produce a simple stitched result (a per-pixel maximum of the two aligned images) to validate the geometric alignment end-to-end.
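
The `confidence` argument of a RANSAC estimator relates to a textbook estimate of how many random minimal samples are needed. The sketch below, in plain Python, computes that estimate for a homography (four correspondences per sample). The function name `ransac_iterations` and the inlier ratios are illustrative assumptions. This is the standard formula and not necessarily how Kornia schedules its iterations, since Kornia draws hypotheses in batches of `batch_size`, so its `max_iter` is not directly comparable to this count.

```python
import math


def ransac_iterations(inlier_ratio: float, sample_size: int = 4,
                      confidence: float = 0.999) -> int:
    """Draws needed so that, with probability `confidence`,
    at least one minimal sample consists only of inliers."""
    if not 0.0 < inlier_ratio <= 1.0:
        raise ValueError("inlier_ratio must be in (0, 1]")
    if not 0.0 < confidence < 1.0:
        raise ValueError("confidence must be in (0, 1)")
    p_good_sample = inlier_ratio ** sample_size
    if p_good_sample >= 1.0:
        return 1  # every sample is clean
    return math.ceil(math.log(1.0 - confidence) / math.log(1.0 - p_good_sample))


if __name__ == "__main__":
    # Fewer inliers means many more samples are required.
    for ratio in (0.8, 0.5, 0.3):
        print(ratio, ransac_iterations(ratio))
```

The steep growth as the inlier ratio drops is one reason for keeping only the most confident LoFTR matches before running RANSAC.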

print("\n[4] Mini training loop with Kornia augmentations (fast subset)")
cifar = torchvision.datasets.CIFAR10(root="/content/data", train=True, download=True)
num_samples = 4096
indices = np.random.permutation(len(cifar))[:num_samples]
subset = torch.utils.data.Subset(cifar, indices.tolist())

def collate(batch):
    # Convert PIL images to float tensors on the CPU; augmentation happens later on the GPU.
    imgs = []
    labels = []
    for im, y in batch:
        imgs.append(TF.to_tensor(im))
        labels.append(y)
    return torch.stack(imgs, 0), torch.tensor(labels)

loader = torch.utils.data.DataLoader(
    subset, batch_size=256, shuffle=True, num_workers=2, pin_memory=True, collate_fn=collate
)
aug_train = K.ImageSequential(
    K.RandomHorizontalFlip(p=0.5),
    K.RandomAffine(degrees=12.0, translate=(0.08, 0.08), scale=(0.9, 1.1), p=0.7),
    K.ColorJiggle(0.2, 0.2, 0.2, 0.1, p=0.8),
    K.RandomGaussianBlur((3, 3), (0.1, 1.5), p=0.3),
).to(device)

class TinyCifarNet(nn.Module):
    def __init__(self, num_classes=10):
        super().__init__()
        self.conv1 = nn.Conv2d(3, 48, 3, padding=1)
        self.conv2 = nn.Conv2d(48, 96, 3, padding=1)
        self.conv3 = nn.Conv2d(96, 128, 3, padding=1)
        self.head = nn.Linear(128, num_classes)

    def forward(self, x):
        x = F.relu(self.conv1(x))
        x = F.max_pool2d(x, 2)
        x = F.relu(self.conv2(x))
        x = F.max_pool2d(x, 2)
        x = F.relu(self.conv3(x))
        x = x.mean(dim=(-2, -1))  # global average pooling
        return self.head(x)

model = TinyCifarNet().to(device)
opt = torch.optim.AdamW(model.parameters(), lr=2e-3, weight_decay=1e-4)
model.train()
t_start = time.time()
running = []
for it, (xb, yb) in enumerate(loader):
    xb = xb.to(device, non_blocking=True)
    yb = yb.to(device, non_blocking=True)
    xb = aug_train(xb)
    logits = model(xb)
    loss = F.cross_entropy(logits, yb)
    opt.zero_grad(set_to_none=True)
    loss.backward()
    opt.step()
    running.append(loss.item())
    if (it + 1) % 10 == 0:
        print(f"iter {it+1:03d}/{len(loader)} | loss {np.mean(running[-10:]):.4f}")
    if it >= 39:
        break
print("Done in", round(time.time() - t_start, 2), "sec")
plt.figure(figsize=(6,4))
plt.plot(running)
plt.title("Training loss (quick demo)")
plt.xlabel("iteration")
plt.ylabel("loss")
plt.show()
xb0, yb0 = next(iter(loader))
xb0 = xb0[:8].to(device)
xbA = aug_train(xb0)

def tile8(x):
    x = x.detach().cpu().clamp(0,1)
    grid = torchvision.utils.make_grid(x, nrow=4)
    return grid.permute(1,2,0).numpy()

plt.figure(figsize=(10,5))
plt.imshow(tile8(xb0))
plt.axis("off")
plt.title("CIFAR batch (original)")
plt.show()
plt.figure(figsize=(10,5))
plt.imshow(tile8(xbA))
plt.axis("off")
plt.title("CIFAR batch (Kornia-augmented on GPU)")
plt.show()
print("\n✅ Tutorial complete.")
print("Next ideas:")
print("- Feathered stitching (soft masks) instead of max-blend.")
print("- Compare LoFTR vs DISK/LightGlue using kornia.feature.")
print("- Multi-scale homography optimization + SSIM/Charbonnier losses.")

We show how Kornia’s GPU-based augmentations integrate directly into a standard training loop by applying them on the fly to a subset of the CIFAR-10 dataset. We train a lightweight convolutional network end-to-end and time the loop, which gives a rough sense of the overhead that GPU-side augmentation adds while increasing data diversity. Finally, we visualize original versus augmented batches to confirm that the transformations are applied as intended during training.
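
As a quick sanity check on the network and loop above, the plain-Python sketch below traces the spatial size through the three convolutions and counts the trainable parameters, using only the layer sizes given in TinyCifarNet. The helper names are hypothetical. It also computes the number of batches per pass: with 4096 samples and a batch size of 256 the loader yields 16 batches, so the 40-iteration cap in the loop (`it >= 39`) is never reached in a single pass.

```python
def conv2d_params(c_in: int, c_out: int, k: int) -> int:
    """Weights plus biases of a k x k convolution."""
    return c_in * c_out * k * k + c_out


def linear_params(n_in: int, n_out: int) -> int:
    """Weights plus biases of a fully connected layer."""
    return n_in * n_out + n_out


def tiny_cifar_net_summary(size: int = 32, num_classes: int = 10):
    """Trace spatial size and parameter count for the 3-conv network above."""
    channels = [3, 48, 96, 128]
    total = 0
    sizes = [size]
    for i in range(3):
        total += conv2d_params(channels[i], channels[i + 1], 3)
        # A 3x3 conv with padding=1 keeps the size; 2x2 pooling follows conv1 and conv2 only.
        if i < 2:
            size //= 2
        sizes.append(size)
    total += linear_params(channels[-1], num_classes)
    return sizes, total


def batches_per_epoch(num_samples: int, batch_size: int) -> int:
    """Ceiling division: the last partial batch still counts."""
    return -(-num_samples // batch_size)


if __name__ == "__main__":
    sizes, total = tiny_cifar_net_summary()
    print("spatial sizes:", sizes)
    print("parameters:", total)
    print("batches per pass:", batches_per_epoch(4096, 256))
```

Global average pooling after the third convolution is what lets the linear head take 128 inputs regardless of the remaining 8 x 8 spatial size.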

In conclusion, we demonstrated that Kornia enables a unified vision workflow in which data augmentation, geometric reasoning, feature matching, and learning remain differentiable and GPU-friendly within a single framework. By combining LoFTR matching, RANSAC-based homography estimation, and optimization-driven alignment with a practical training loop, we showed how classical vision and deep learning complement each other rather than compete. The tutorial serves as a foundation for extending toward production-grade stitching, robust pose estimation, or large-scale training pipelines, and the same patterns we used here carry over to more complex, real-world vision systems.