In this tutorial, we implement a production-grade, large-scale graph analytics pipeline in NetworKit, focusing on speed, memory efficiency, and version-safe APIs in NetworKit 11.2.1. We generate a large scale-free network, extract its largest connected component, and then compute structural backbone indicators via k-core decomposition and centrality ranking. We also detect communities with PLM and quantify partition quality using modularity; estimate distance structure using effective and estimated diameters; and, finally, sparsify the graph to cut cost while preserving key properties. We export the sparsified graph as an edge list so we can reuse it in downstream workflows, benchmarking, or graph ML preprocessing.
!pip -q install networkit pandas numpy psutil
import gc, time, os
import numpy as np
import pandas as pd
import psutil
import networkit as nk
print("NetworKit:", nk.__version__)
nk.setNumberOfThreads(min(2, nk.getMaxNumberOfThreads()))
nk.setSeed(7, False)
def ram_gb():
    p = psutil.Process(os.getpid())
    return p.memory_info().rss / (1024**3)

def tic():
    return time.perf_counter()

def toc(t0, msg):
    print(f"{msg}: {time.perf_counter()-t0:.3f}s | RAM~{ram_gb():.2f} GB")

def report(G, name):
    print(f"\n[{name}] nodes={G.numberOfNodes():,} edges={G.numberOfEdges():,} directed={G.isDirected()} weighted={G.isWeighted()}")

def force_cleanup():
    gc.collect()
PRESET = "LARGE"
if PRESET == "LARGE":
    N = 120_000
    M_ATTACH = 6
    AB_EPS = 0.12
    ED_RATIO = 0.9
elif PRESET == "XL":
    N = 250_000
    M_ATTACH = 6
    AB_EPS = 0.15
    ED_RATIO = 0.9
else:
    N = 80_000
    M_ATTACH = 6
    AB_EPS = 0.10
    ED_RATIO = 0.9
print(f"\nPreset={PRESET} | N={N:,} | m={M_ATTACH} | approx-betweenness epsilon={AB_EPS}")

We set up the Colab environment with NetworKit and monitoring utilities, and we lock in a fixed random seed. We configure thread usage to match the runtime and define timing and RAM-tracking helpers for every major stage. We choose a scale preset that controls graph size and approximation knobs so the pipeline stays large but manageable.
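As a quick sanity check on the presets, a Barabási–Albert graph with attachment parameter m adds roughly m edges per new node, so edge counts scale as about m·N. A minimal back-of-the-envelope sketch (pure Python, independent of NetworKit; the helper name is ours):

```python
def expected_ba_edges(n_nodes: int, m_attach: int) -> int:
    # Each node beyond the initial core attaches with m edges,
    # so the total is approximately m * n_nodes for large n.
    return m_attach * (n_nodes - m_attach)

# For the LARGE preset (N=120_000, m=6) we expect ~720k edges.
print(expected_ba_edges(120_000, 6))  # 719964
```

This is why the LARGE preset lands near 720k edges before LCC extraction.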
t0 = tic()
G = nk.generators.BarabasiAlbertGenerator(M_ATTACH, N).generate()
toc(t0, "Generated BA graph")
report(G, "G")
t0 = tic()
cc = nk.components.ConnectedComponents(G)
cc.run()
toc(t0, "ConnectedComponents")
print("components:", cc.numberOfComponents())
if cc.numberOfComponents() > 1:
    t0 = tic()
    G = nk.components.ConnectedComponents.extractLargestConnectedComponent(G, True)
    toc(t0, "Extracted LCC (compactGraph=True)")
    report(G, "LCC")
force_cleanup()

We generate a large Barabási–Albert graph and immediately log its size and runtime footprint. We compute connected components to understand fragmentation and quickly diagnose the topology. We extract the largest connected component and compact it to improve the performance and reliability of the rest of the pipeline.
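For intuition, connected components can be found with repeated BFS; a minimal pure-Python sketch of labeling components and keeping the largest (illustrative only, not NetworKit's implementation):

```python
from collections import deque

def largest_component(adj):
    """adj: dict mapping node -> list of neighbors (undirected)."""
    seen, best = set(), []
    for start in adj:
        if start in seen:
            continue
        # BFS collects one whole component
        comp, queue = [], deque([start])
        seen.add(start)
        while queue:
            u = queue.popleft()
            comp.append(u)
            for v in adj[u]:
                if v not in seen:
                    seen.add(v)
                    queue.append(v)
        if len(comp) > len(best):
            best = comp
    return best

# Two components: {0, 1, 2} and {3, 4}
adj = {0: [1], 1: [0, 2], 2: [1], 3: [4], 4: [3]}
print(sorted(largest_component(adj)))  # [0, 1, 2]
```

NetworKit's `extractLargestConnectedComponent` with compaction additionally renumbers nodes contiguously, which is what keeps the later array-indexed score lookups valid.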
t0 = tic()
core = nk.centrality.CoreDecomposition(G)
core.run()
toc(t0, "CoreDecomposition")
core_vals = np.array(core.scores(), dtype=np.int32)
print("degeneracy (max core):", int(core_vals.max()))
print("core stats:", pd.Series(core_vals).describe(percentiles=[0.5, 0.9, 0.99]).to_dict())
k_thr = int(np.percentile(core_vals, 97))
t0 = tic()
nodes_backbone = [u for u in range(G.numberOfNodes()) if core_vals[u] >= k_thr]
G_backbone = nk.graphtools.subgraphFromNodes(G, nodes_backbone)
toc(t0, f"Backbone subgraph (k>={k_thr})")
report(G_backbone, "Backbone")
force_cleanup()
t0 = tic()
pr = nk.centrality.PageRank(G, damp=0.85, tol=1e-8)
pr.run()
toc(t0, "PageRank")
pr_scores = np.array(pr.scores(), dtype=np.float64)
top_pr = np.argsort(-pr_scores)[:15]
print("Top PageRank nodes:", top_pr.tolist())
print("Top PageRank scores:", pr_scores[top_pr].tolist())
t0 = tic()
abw = nk.centrality.ApproxBetweenness(G, epsilon=AB_EPS)
abw.run()
toc(t0, "ApproxBetweenness")
abw_scores = np.array(abw.scores(), dtype=np.float64)
top_abw = np.argsort(-abw_scores)[:15]
print("Top ApproxBetweenness nodes:", top_abw.tolist())
print("Top ApproxBetweenness scores:", abw_scores[top_abw].tolist())
force_cleanup()

We compute the core decomposition to measure degeneracy and identify the network's high-density backbone. We extract a backbone subgraph using a high core-percentile threshold to focus on structurally important nodes. We run PageRank and approximate betweenness to rank nodes by influence and bridge-like behavior at scale.
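The core decomposition underlying this step can be computed by repeatedly peeling a minimum-degree node; a minimal pure-Python sketch of the peeling idea (NetworKit uses a much faster bucket-based variant, so this is for intuition only):

```python
def core_numbers(adj):
    """adj: dict node -> set of neighbors. Returns dict node -> core number."""
    deg = {u: len(vs) for u, vs in adj.items()}
    neigh = {u: set(vs) for u, vs in adj.items()}
    core, k = {}, 0
    while deg:
        u = min(deg, key=deg.get)  # peel a node of minimum remaining degree
        k = max(k, deg[u])         # the peel level never decreases
        core[u] = k
        for v in neigh[u]:         # remove u from the remaining graph
            neigh[v].discard(u)
            deg[v] -= 1
        del deg[u], neigh[u]
    return core

# Triangle {0, 1, 2} with pendant node 3: the triangle is the 2-core
adj = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3}, 3: {2}}
cores = core_numbers(adj)
print(cores, "| degeneracy:", max(cores.values()))
```

The degeneracy reported by `CoreDecomposition` is exactly this maximum core number, and the 97th-percentile threshold above selects the densest peel levels as the backbone.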
t0 = tic()
plm = nk.community.PLM(G, refine=True, gamma=1.0, par="balanced")
plm.run()
toc(t0, "PLM community detection")
part = plm.getPartition()
num_comms = part.numberOfSubsets()
print("communities:", num_comms)
t0 = tic()
Q = nk.community.Modularity().getQuality(part, G)
toc(t0, "Modularity")
print("modularity Q:", Q)
sizes = np.array(list(part.subsetSizeMap().values()), dtype=np.int64)
print("community size stats:", pd.Series(sizes).describe(percentiles=[0.5, 0.9, 0.99]).to_dict())
t0 = tic()
eff = nk.distance.EffectiveDiameter(G, ED_RATIO)
eff.run()
toc(t0, f"EffectiveDiameter (ratio={ED_RATIO})")
print("effective diameter:", eff.getEffectiveDiameter())
t0 = tic()
diam = nk.distance.Diameter(G, algo=nk.distance.DiameterAlgo.EstimatedRange, error=0.1)
diam.run()
toc(t0, "Diameter (estimated range)")
print("estimated diameter (lower, upper):", diam.getDiameter())
force_cleanup()

We detect communities using PLM and record the number of communities found on the large graph. We compute modularity and summarize community-size statistics to validate the structure rather than simply trusting the partition. We estimate global distance behavior using the effective diameter and an estimated diameter range in an API-safe way for NetworKit 11.2.1.
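Modularity itself is simple to state: Q = Σ_c (e_c − d_c²), where e_c is the fraction of edges inside community c and d_c is the fraction of edge endpoints in c. A minimal pure-Python sketch of this formula on a toy graph (illustrative, not NetworKit's implementation; the helper name is ours):

```python
def modularity(edges, part):
    """edges: list of undirected (u, v) pairs; part: dict node -> community id."""
    m = len(edges)
    deg = {}
    for u, v in edges:
        deg[u] = deg.get(u, 0) + 1
        deg[v] = deg.get(v, 0) + 1
    q = 0.0
    for c in set(part.values()):
        # Fraction of edges fully inside community c
        e_c = sum(1 for u, v in edges if part[u] == c and part[v] == c) / m
        # Fraction of edge endpoints falling in community c
        d_c = sum(d for u, d in deg.items() if part[u] == c) / (2 * m)
        q += e_c - d_c ** 2
    return q

# Two triangles joined by one bridge edge
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
part = {0: 0, 1: 0, 2: 0, 3: 1, 4: 1, 5: 1}
print(round(modularity(edges, part), 4))  # 0.3571
```

A partition scoring well above 0 (here 6/7 − 1/2 ≈ 0.357) has far more intra-community edges than a random rewiring with the same degrees would produce, which is what the Q value on the large graph is checking.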
t0 = tic()
G.indexEdges()
sp = nk.sparsification.LocalSimilaritySparsifier()
G_sparse = sp.getSparsifiedGraph(G, 0.7)
toc(t0, "LocalSimilarity sparsification (parameter=0.7)")
report(G_sparse, "Sparse")
t0 = tic()
pr2 = nk.centrality.PageRank(G_sparse, damp=0.85, tol=1e-8)
pr2.run()
toc(t0, "PageRank on sparse")
pr2_scores = np.array(pr2.scores(), dtype=np.float64)
print("Top PR nodes (sparse):", np.argsort(-pr2_scores)[:15].tolist())
t0 = tic()
plm2 = nk.community.PLM(G_sparse, refine=True, gamma=1.0, par="balanced")
plm2.run()
toc(t0, "PLM on sparse")
part2 = plm2.getPartition()
Q2 = nk.community.Modularity().getQuality(part2, G_sparse)
print("communities (sparse):", part2.numberOfSubsets(), "| modularity (sparse):", Q2)
t0 = tic()
eff2 = nk.distance.EffectiveDiameter(G_sparse, ED_RATIO)
eff2.run()
toc(t0, "EffectiveDiameter on sparse")
print("effective diameter (orig):", eff.getEffectiveDiameter(), "| (sparse):", eff2.getEffectiveDiameter())
force_cleanup()
out_path = "/content/networkit_large_sparse.edgelist"
t0 = tic()
nk.graphio.EdgeListWriter("\t", 0).write(G_sparse, out_path)
toc(t0, "Wrote edge list")
print("Saved:", out_path)
print("\nAdvanced large-graph pipeline complete.")

We sparsify the graph using local similarity to reduce the number of edges while retaining useful structure for downstream analytics. We rerun PageRank, PLM, and effective diameter on the sparsified graph to check whether key indicators remain consistent. We export the sparsified graph as an edge list so we can reuse it across sessions, tools, or further experiments.
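Since the export is a plain tab-separated edge list, any tool can consume it; a minimal pure-Python sketch of reading such a file back, assuming the `u<TAB>v`-per-line, 0-based format that `EdgeListWriter("\t", 0)` produces (the parser name is ours):

```python
import os
import tempfile

def read_edgelist(path):
    """Parse a tab-separated edge list into a list of (u, v) int pairs."""
    edges = []
    with open(path) as f:
        for line in f:
            line = line.strip()
            if not line or line.startswith("#"):  # skip blanks and comments
                continue
            u, v = line.split("\t")[:2]
            edges.append((int(u), int(v)))
    return edges

# Round-trip demo with a temporary file
tmp = tempfile.NamedTemporaryFile("w", suffix=".edgelist", delete=False)
tmp.write("0\t1\n1\t2\n")
tmp.close()
print(read_edgelist(tmp.name))  # [(0, 1), (1, 2)]
os.remove(tmp.name)
```

The same parsed pairs can be fed to NetworKit, pandas, or a graph ML preprocessing step in a later session.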
In conclusion, we developed an end-to-end, scalable NetworKit workflow that mirrors real large-network analysis: we started from generation, stabilized the topology with LCC extraction, characterized the structure through cores and centralities, discovered communities and validated them with modularity, and captured global distance behavior through diameter estimates. We then applied sparsification to shrink the graph while keeping it analytically meaningful and saved it for repeatable pipelines. The tutorial provides a practical template we can reuse for real datasets by replacing the generator with an edge-list reader, while keeping the same analysis stages, performance monitoring, and export steps.



