Wednesday, July 22

Artificial Intelligence

A Practical Guide to Building a Semantic, Hybrid, Sparse, and Quantized Vector Search System with pgvector

By CarterMay 28, 2026Updated:May 28, 2026No Comments8 Mins Read

A Coding Guide to Implement a pgvector-Powered Semantic, Hybrid, Sparse, and Quantized Vector Search System

In this step-by-step guide, we create a fully functional pgvector environment within Google Colab and demonstrate how PostgreSQL can serve as a robust vector database for cutting-edge AI use cases. We begin by setting up PostgreSQL, compiling the pgvector extension from source, establishing connections via Psycopg, and registering vector types to ensure seamless integration with Python. Next, we generate embeddings using SentenceTransformers, store them in PostgreSQL, construct HNSW indexes, and perform advanced operations such as semantic search, filtered queries, comparisons across distance metrics, half-precision storage, binary quantization, sparse vector lookups, hybrid retrieval, and vector aggregation. This hands-on workflow reveals how pgvector enables practical solutions for retrieval-augmented generation, recommendation engines, similarity matching, and hybrid search—all built with open-source components.

Copy CodeCopiedUse a different Browser

import os
import subprocess
import sys
import time
def sh(cmd: str, check: bool = True):
   """Run a shell command, streaming a compact log."""
   print(f"  $ {cmd}")
   return subprocess.run(cmd, shell=True, check=check,
                         stdout=subprocess.DEVNULL, stderr=subprocess.STDOUT)
print("[0/10] Installing PostgreSQL + building pgvector (≈1–2 min)...")
sh("apt-get -qq update")
sh("apt-get -qq install -y postgresql postgresql-contrib "
  "postgresql-server-dev-all build-essential git")
if not os.path.exists("/tmp/pgvector"):
   sh("git clone --depth 1  /tmp/pgvector")
sh("cd /tmp/pgvector && make && make install")
sh("service postgresql start")
time.sleep(3)
sh("""sudo -u postgres psql -c "ALTER USER postgres PASSWORD 'postgres';" """)
print("[0/10] Installing Python packages...")
sh(f"{sys.executable} -m pip install -q pgvector psycopg[binary] "
  f"sentence-transformers numpy")

Here, we configure the full PostgreSQL + pgvector stack inside Colab. We install system dependencies, clone and compile pgvector from source, launch the PostgreSQL service, set the database password, and prepare Python libraries for interacting with the database and handling vector data.

Copy CodeCopiedUse a different Browser

import numpy as np
import psycopg
from pgvector import HalfVector, SparseVector
from pgvector.psycopg import register_vector
from sentence_transformers import SentenceTransformer
print("n[1/10] Connecting and enabling the 'vector' extension...")
conn = psycopg.connect(
   "host=127.0.0.1 port=5432 dbname=postgres user=postgres password=postgres",
   autocommit=True,
)
conn.execute("CREATE EXTENSION IF NOT EXISTS vector")
register_vector(conn)
ver = conn.execute("SELECT extversion FROM pg_extension WHERE extname="vector"").fetchone()[0]
print(f"      pgvector version: {ver}")
print("n[2/10] Loading embedding model + encoding corpus...")
model = SentenceTransformer("all-MiniLM-L6-v2")
DIM = model.get_sentence_embedding_dimension()
corpus = [
   ("Octopuses have three hearts and blue blood.",             "animals"),
   ("Transformers revolutionized natural language processing.","technology"),
   ("Quantum computers exploit superposition and entanglement.","technology"),
   ("GPUs accelerate deep learning by parallelizing matrix math.","technology"),
   ("Sourdough bread relies on wild yeast and lactobacilli.",  "food"),
   ("Dark chocolate contains flavonoid antioxidants.",         "food"),
   ("A black hole's gravity is so strong light cannot escape.","space")
]
contents   = [c for c, _ in corpus]
categories = [k for _, k in corpus]
embeddings = model.encode(contents, normalize_embeddings=True)
conn.execute("DROP TABLE IF EXISTS documents")
conn.execute(f"""
   CREATE TABLE documents (
       id        bigserial PRIMARY KEY,
       content   text,
       category  text,
       embedding vector({DIM})
   )
""")
with conn.cursor() as cur:
   cur.executemany(
       "INSERT INTO documents (content, category, embedding) VALUES (%s, %s, %s)",
       list(zip(contents, categories, [np.asarray(e) for e in embeddings])),
   )
print(f"      Inserted {len(corpus)} documents with {DIM}-d embeddings.")

We establish a connection to PostgreSQL, activate the pgvector extension, and register vector support so Psycopg can handle natively. Then, we load a SentenceTransformers model, define a small multi-domain text collection, produce normalized embeddings, and create a table to store each document alongside its category and vector. These vectors are then inserted into the database, laying the foundation for semantic search operations.

Copy CodeCopiedUse a different Browser

print("n[3/10] Building HNSW index and running semantic search...")
conn.execute(
   "CREATE INDEX ON documents USING hnsw (embedding vector_cosine_ops) "
   "WITH (m = 16, ef_construction = 64)"
)
conn.execute("SET hnsw.ef_search = 100")
def semantic_search(query: str, k: int = 4):
   q = np.asarray(model.encode(query, normalize_embeddings=True))
   return conn.execute(
       "SELECT content, category, embedding <=> %s AS distance "
       "FROM documents ORDER BY distance LIMIT %s",
       (q, k),
   ).fetchall()
for content, cat, dist in semantic_search("animals that are unusually quick"):
   print(f"      {dist:.3f}  [{cat:<10}] {content}")
print("n[4/10] Filtered search (only category = 'space')...")
q = np.asarray(model.encode("objects with extreme gravity", normalize_embeddings=True))
rows = conn.execute(
   "SELECT content, embedding <=> %s AS distance "
   "FROM documents WHERE category = %s ORDER BY distance LIMIT 3",
   (q, "space"),
)
).fetchall()
for content, dist in rows:
   print(f"      {dist:.3f}  {content}")
print("n[5/10] Same query under different distance metrics (top hit each)...")
q = np.asarray(model.encode("brewing a hot caffeinated drink", normalize_embeddings=True))
for op, label in [("<->", "L2"), ("<=>", "cosine"), ("<#>", "neg-inner"), ("<+>", "L1")]:
   content, score = conn.execute(
       f"SELECT content, embedding {op} %s AS s FROM documents ORDER BY s LIMIT 1", (q,)
   ).fetchone()
   print(f"      {label:<10} {score:+.3f}  {content}")

We construct an HNSW index on the document embeddings to accelerate nearest-neighbor lookups and execute a semantic search using cosine similarity. Then, we demonstrate a filtered search constrained to the “space” category, followed by a comparison of how the same query ranks results differently under four distinct distance metrics: L2, cosine, negative inner product, and L1.

Here is the paraphrased version of the text:

We construct an HNSW index on the embedding column to speed up vector searches. A semantic search function is defined to transform queries into embeddings and locate the most similar documents through cosine similarity. Additionally, we carry out metadata-filtered searches and examine various pgvector distance operators, such as L2, cosine, negative inner product, and L1.

Copy CodeCopiedUse a different Browser

print("n[6/10] Half-precision storage with halfvec...")
conn.execute(f"ALTER TABLE documents ADD COLUMN IF NOT EXISTS embedding_half halfvec({DIM})")
conn.execute("UPDATE documents SET embedding_half = embedding::halfvec")
conn.execute(
   "CREATE INDEX ON documents USING hnsw (embedding_half halfvec_cosine_ops)"
)
q_half = HalfVector(model.encode("the galaxy we live in", normalize_embeddings=True))
rows = conn.execute(
   "SELECT content, embedding_half <=> %s AS d FROM documents ORDER BY d LIMIT 2",
   (q_half,),
).fetchall()
for content, d in rows:
   print(f"      {d:.3f}  {content}")
print("n[7/10] Binary quantization (Hamming) + exact re-rank...")
conn.execute(
   f"CREATE INDEX ON documents "
   f"USING hnsw ((binary_quantize(embedding)::bit({DIM})) bit_hamming_ops)"
)
q = np.asarray(model.encode("parallel hardware for AI training", normalize_embeddings=True))
rerank_sql = f"""
   SELECT content, candidates.embedding <=> %(q)s AS exact_distance
   FROM (
       SELECT content, embedding
       FROM documents
       ORDER BY binary_quantize(embedding)::bit({DIM})
             <~> binary_quantize(%(q)s)::bit({DIM})
       LIMIT 8
   ) AS candidates
   ORDER BY exact_distance
   LIMIT 3
"""
for content, d in conn.execute(rerank_sql, {"q": q}).fetchall():
   print(f"      {d:.3f}  {content}")
print("n[8/10] Native sparse vectors...")
conn.execute("DROP TABLE IF EXISTS sparse_items")
conn.execute("CREATE TABLE sparse_items (id bigserial PRIMARY KEY, embedding sparsevec(10))")
sparse_data = [
   SparseVector({0: 1.0, 3: 2.0, 7: 1.5}, 10),
   SparseVector({1: 0.5, 3: 1.0, 9: 3.0}, 10),
   SparseVector({0: 0.2, 4: 2.5, 7: 0.8}, 10),
]
with conn.cursor() as cur:
   cur.executemany("INSERT INTO sparse_items (embedding) VALUES (%s)",
                   [(v,) for v in sparse_data])
query_sparse = SparseVector({0: 1.0, 7: 1.0}, 10)
rows = conn.execute(
   "SELECT id, embedding, embedding <#> %s AS neg_ip "
   "FROM sparse_items ORDER BY neg_ip LIMIT 3",
   (query_sparse,),
).fetchall()
for _id, vec, neg_ip in rows:
   print(f"      id={_id}  inner_product={-neg_ip:.2f}  nnz_indices={vec.indices()}")

We delve into sophisticated pgvector storage and retrieval methods that go beyond conventional dense vectors. Embeddings are transformed into half-precision vectors to cut down on storage requirements, binary quantization with Hamming search is applied for quick candidate fetching, and results are re-ranked using full-precision vectors. Sparse vectors are also constructed and queried via inner-product similarity, which proves beneficial for keyword-weighted or SPLADE-style retrieval.

Copy CodeCopiedUse a different Browser

print("n[9/10] Hybrid search (vector + full-text) via RRF...")
user_query = "fast animal"
qvec = np.asarray(model.encode(user_query, normalize_embeddings=True))
hybrid_sql = """
WITH semantic AS (
   SELECT id, RANK() OVER (ORDER BY embedding <=> %(qvec)s) AS rank
   FROM documents
   ORDER BY embedding <=> %(qvec)s
   LIMIT 20
),
keyword AS (
   SELECT d.id,
          RANK() OVER (ORDER BY ts_rank_cd(to_tsvector('english', d.content), q) DESC) AS rank
   FROM documents d, plainto_tsquery('english', %(qtext)s) AS q
   WHERE to_tsvector('english', d.content) @@ q
   LIMIT 20
)
SELECT d.content,
      COALESCE(1.0 / (60 + semantic.rank), 0.0)
    + COALESCE(1.0 / (60 + keyword.rank),  0.0) AS rrf_score
FROM documents d
LEFT JOIN semantic ON d.id = semantic.id
LEFT JOIN keyword  ON d.id = keyword.id
WHERE semantic.id IS NOT NULL OR keyword.id IS NOT NULL
ORDER BY rrf_score DESC
LIMIT 4
"""
for content, score in conn.execute(hybrid_sql, {"qvec": qvec, "qtext": user_query}).fetchall():
   print(f"      {score:.5f}  {content}")
print("n[10/10] Aggregating vectors with AVG (category centroid)...")
centroid = conn.execute(
   "SELECT AVG(embedding) FROM documents WHERE category = %s", ("food",)
).fetchone()[0]
typical = conn.execute(
   "SELECT content, embedding <=> %s AS d FROM documents "
   "WHERE category = %s ORDER BY d LIMIT 1",
   (np.asarray(centroid), "food"),
).fetchone()
print(f"      Centroid dim = {len(centroid)}")
print(f"      Most representative 'food' doc: {typical[0]}")
print("n Done. You now have a working pgvector playground inside Colab.")
print("   Try editing `corpus`, the queries, or swap in your own embedding model.")

Semantic vector search is combined with PostgreSQL full-text search by leveraging Reciprocal Rank Fusion. Results from both semantic and keyword rankings are gathered, their scores are merged, and a more powerful hybrid search outcome is generated. As a final step, the average embedding for a category is calculated and treated as a centroid to identify the most representative document within that category.

To sum up, we now have a fully functional pgvector-driven retrieval system operating entirely within Google Colab, requiring no external services or API keys. PostgreSQL has been utilized not merely as a standard relational database, but as a versatile vector search engine supporting dense vectors, half-precision vectors, binary-quantized retrieval, sparse vectors, full-text search, and aggregation. It was also observed how metadata filtering, HNSW indexing, Reciprocal Rank Fusion, and centroid-based analysis enhance pgvector’s applicability for real-world AI search pipelines.

Explore the complete code and notebook here. Additionally, follow us on Twitter and make sure to become part of our 150k+ ML SubReddit and subscribe to our Newsletter. Waiting for more? You can also join us on Telegram now!

Interested in partnering with us to promote your GitHub Repo, Hugging Face Page, Product Release, or Webinar? Reach out to us

The post A Coding Guide to Implement a pgvector-Powered Semantic, Hybrid, Sparse, and Quantized Vector Search System appeared first on MarkTechPost.

Leave A Reply