PostgreSQL Vector Search with pgvector

No new cluster required. Every EasyCloudify PostgreSQL cluster (version 13 or later) ships with pgvector and pgvectorscale pre-installed. Enable them with a single SQL command on any existing database.

What Are pgvector and pgvectorscale?

Extension	Available on	What it adds
`vector` (pgvector)	PostgreSQL 13+	`vector`, `halfvec`, `sparsevec` column types · exact and approximate nearest-neighbour search · HNSW and IVFFlat indexes
`vectorscale` (pgvectorscale)	PostgreSQL 14+	StreamingDiskANN index · Statistical Binary Quantization (SBQ) for large, disk-resident workloads

Both extensions share the same connection string, backups, and read replicas as the rest of your cluster — no separate vector database infrastructure to manage.

Enable the Extensions

Connect to any database inside your cluster with psql (or any SQL client):

-- Required: pgvector (all PG 13+ clusters)
CREATE EXTENSION vector;

-- Optional: pgvectorscale for large workloads (PG 14+ clusters)
CREATE EXTENSION vectorscale;

Verify installation:

\dx
-- Should list: vector, vectorscale

Quick Start — Storing and Querying Embeddings

1. Create a table with a vector column

-- 1536 dimensions = OpenAI text-embedding-3-small / text-embedding-ada-002
-- Adjust to match your embedding model's output dimension
CREATE TABLE documents (
  id        BIGSERIAL PRIMARY KEY,
  content   TEXT        NOT NULL,
  embedding VECTOR(1536)
);

2. Insert embeddings

Generate embeddings in your application pipeline (pgvector does not generate embeddings — pass pre-computed vectors as parameters):

import openai, psycopg2

conn = psycopg2.connect("postgresql://...")
cur = conn.cursor()

response = openai.embeddings.create(
    model="text-embedding-3-small",
    input="EasyCloudify managed PostgreSQL with pgvector"
)
embedding = response.data[0].embedding  # list of 1536 floats

cur.execute(
    "INSERT INTO documents (content, embedding) VALUES (%s, %s)",
    ("EasyCloudify managed PostgreSQL with pgvector", embedding)
)
conn.commit()

3. Build an HNSW index

Create the index after bulk-loading data for best performance:

-- Cosine distance (best for normalised OpenAI / Cohere embeddings)
CREATE INDEX ON documents USING hnsw (embedding vector_cosine_ops);

-- Euclidean distance (L2)
-- CREATE INDEX ON documents USING hnsw (embedding vector_l2_ops);

-- Inner product (dot product)
-- CREATE INDEX ON documents USING hnsw (embedding vector_ip_ops);

4. Nearest-neighbour search

-- Top-5 most similar documents to a query embedding
SELECT id, content, 1 - (embedding <=> $1::vector) AS similarity
FROM documents
ORDER BY embedding <=> $1::vector
LIMIT 5;

Pass the query embedding as a bound parameter from your application — never concatenate raw vectors into SQL strings.

Hybrid Search (Full-text + Vector)

Combine keyword precision with semantic recall:

SELECT
  d.id,
  d.content,
  ts_rank(to_tsvector('english', d.content), query) AS keyword_score,
  1 - (d.embedding <=> $1::vector)                  AS semantic_score
FROM documents d,
     plainto_tsquery('english', $2) AS query
WHERE to_tsvector('english', d.content) @@ query
ORDER BY semantic_score DESC
LIMIT 10;

Large Workloads with pgvectorscale

For datasets in the tens of millions of vectors or higher, use the diskann index from pgvectorscale instead of HNSW:

-- StreamingDiskANN: low memory footprint, high recall on disk-resident data
CREATE INDEX ON documents USING diskann (embedding);

-- Statistical Binary Quantization for maximum compression
CREATE INDEX ON documents USING diskann (embedding)
  WITH (quantizer = 'SBQ');

Query syntax is identical to pgvector — the index type is transparent to your application.

Choosing the Right Index

Scenario	Recommended index
Dataset fits in RAM, < 5 M rows	HNSW (`vector`)
Dataset exceeds RAM, 5 M – 100 M+ rows	StreamingDiskANN (`diskann`)
Maximum storage compression needed	diskann + SBQ
Exact brute-force search (small tables)	No index (sequential scan)

Upgrade Extension Versions

Extensions are kept at their installed version until you explicitly upgrade:

ALTER EXTENSION vector UPDATE;
ALTER EXTENSION vectorscale UPDATE;

Check current versions:

SELECT name, default_version, installed_version
FROM pg_available_extensions
WHERE name IN ('vector', 'vectorscale');

When to Use PostgreSQL vs OpenSearch for Vectors

	PostgreSQL + pgvector	OpenSearch
Best for	Vectors alongside relational data, existing PG schemas	Hybrid keyword + vector search as a first-class feature, log analytics
Dataset size	Up to tens of millions of rows	Hundreds of millions+
Tooling	Standard SQL, migrations, ORMs	OpenSearch Dashboards, k-NN plugin
Cluster type	Starter or HA	Starter or HA
Extra cost	None — same cluster	Separate OpenSearch subscription

If you already run PostgreSQL for your application data, adding pgvector is almost always the right first choice. Migrate to a dedicated vector store only if you outgrow pgvector's limits.

Connection Example (Node.js / pg)

import { Pool } from 'pg'

const pool = new Pool({ connectionString: process.env.DATABASE_URL })

async function similarDocuments(queryEmbedding: number[], limit = 5) {
  const { rows } = await pool.query(
    `SELECT id, content, 1 - (embedding <=> $1::vector) AS similarity
     FROM documents
     ORDER BY embedding <=> $1::vector
     LIMIT $2`,
    [`[${queryEmbedding.join(',')}]`, limit]
  )
  return rows
}

Next Steps

Connect to Your Database — get your connection URI
Database Users & Databases — create a dedicated role for your vector workload
Connection Pooling — PgBouncer for high-concurrency embedding ingestion
Database Monitoring — watch index build progress and query performance

PostgreSQL Vector Search with pgvector

No new cluster required. Every EasyCloudify PostgreSQL cluster (version 13 or later) ships with pgvector and pgvectorscale pre-installed. Enable them with a single SQL command on any existing database.

What Are pgvector and pgvectorscale?

Extension	Available on	What it adds
`vector` (pgvector)	PostgreSQL 13+	`vector`, `halfvec`, `sparsevec` column types · exact and approximate nearest-neighbour search · HNSW and IVFFlat indexes
`vectorscale` (pgvectorscale)	PostgreSQL 14+	StreamingDiskANN index · Statistical Binary Quantization (SBQ) for large, disk-resident workloads

Both extensions share the same connection string, backups, and read replicas as the rest of your cluster — no separate vector database infrastructure to manage.

Enable the Extensions

Connect to any database inside your cluster with psql (or any SQL client):

-- Required: pgvector (all PG 13+ clusters)
CREATE EXTENSION vector;

-- Optional: pgvectorscale for large workloads (PG 14+ clusters)
CREATE EXTENSION vectorscale;

Verify installation:

\dx
-- Should list: vector, vectorscale

Quick Start — Storing and Querying Embeddings

1. Create a table with a vector column

-- 1536 dimensions = OpenAI text-embedding-3-small / text-embedding-ada-002
-- Adjust to match your embedding model's output dimension
CREATE TABLE documents (
  id        BIGSERIAL PRIMARY KEY,
  content   TEXT        NOT NULL,
  embedding VECTOR(1536)
);

2. Insert embeddings

Generate embeddings in your application pipeline (pgvector does not generate embeddings — pass pre-computed vectors as parameters):

import openai, psycopg2

conn = psycopg2.connect("postgresql://...")
cur = conn.cursor()

response = openai.embeddings.create(
    model="text-embedding-3-small",
    input="EasyCloudify managed PostgreSQL with pgvector"
)
embedding = response.data[0].embedding  # list of 1536 floats

cur.execute(
    "INSERT INTO documents (content, embedding) VALUES (%s, %s)",
    ("EasyCloudify managed PostgreSQL with pgvector", embedding)
)
conn.commit()

3. Build an HNSW index

Create the index after bulk-loading data for best performance:

-- Cosine distance (best for normalised OpenAI / Cohere embeddings)
CREATE INDEX ON documents USING hnsw (embedding vector_cosine_ops);

-- Euclidean distance (L2)
-- CREATE INDEX ON documents USING hnsw (embedding vector_l2_ops);

-- Inner product (dot product)
-- CREATE INDEX ON documents USING hnsw (embedding vector_ip_ops);

4. Nearest-neighbour search

-- Top-5 most similar documents to a query embedding
SELECT id, content, 1 - (embedding <=> $1::vector) AS similarity
FROM documents
ORDER BY embedding <=> $1::vector
LIMIT 5;

Pass the query embedding as a bound parameter from your application — never concatenate raw vectors into SQL strings.

Hybrid Search (Full-text + Vector)

Combine keyword precision with semantic recall:

SELECT
  d.id,
  d.content,
  ts_rank(to_tsvector('english', d.content), query) AS keyword_score,
  1 - (d.embedding <=> $1::vector)                  AS semantic_score
FROM documents d,
     plainto_tsquery('english', $2) AS query
WHERE to_tsvector('english', d.content) @@ query
ORDER BY semantic_score DESC
LIMIT 10;

Large Workloads with pgvectorscale

For datasets in the tens of millions of vectors or higher, use the diskann index from pgvectorscale instead of HNSW:

-- StreamingDiskANN: low memory footprint, high recall on disk-resident data
CREATE INDEX ON documents USING diskann (embedding);

-- Statistical Binary Quantization for maximum compression
CREATE INDEX ON documents USING diskann (embedding)
  WITH (quantizer = 'SBQ');

Query syntax is identical to pgvector — the index type is transparent to your application.

Choosing the Right Index

Scenario	Recommended index
Dataset fits in RAM, < 5 M rows	HNSW (`vector`)
Dataset exceeds RAM, 5 M – 100 M+ rows	StreamingDiskANN (`diskann`)
Maximum storage compression needed	diskann + SBQ
Exact brute-force search (small tables)	No index (sequential scan)

Upgrade Extension Versions

Extensions are kept at their installed version until you explicitly upgrade:

ALTER EXTENSION vector UPDATE;
ALTER EXTENSION vectorscale UPDATE;

Check current versions:

SELECT name, default_version, installed_version
FROM pg_available_extensions
WHERE name IN ('vector', 'vectorscale');

When to Use PostgreSQL vs OpenSearch for Vectors

	PostgreSQL + pgvector	OpenSearch
Best for	Vectors alongside relational data, existing PG schemas	Hybrid keyword + vector search as a first-class feature, log analytics
Dataset size	Up to tens of millions of rows	Hundreds of millions+
Tooling	Standard SQL, migrations, ORMs	OpenSearch Dashboards, k-NN plugin
Cluster type	Starter or HA	Starter or HA
Extra cost	None — same cluster	Separate OpenSearch subscription

If you already run PostgreSQL for your application data, adding pgvector is almost always the right first choice. Migrate to a dedicated vector store only if you outgrow pgvector's limits.

Connection Example (Node.js / pg)

import { Pool } from 'pg'

const pool = new Pool({ connectionString: process.env.DATABASE_URL })

async function similarDocuments(queryEmbedding: number[], limit = 5) {
  const { rows } = await pool.query(
    `SELECT id, content, 1 - (embedding <=> $1::vector) AS similarity
     FROM documents
     ORDER BY embedding <=> $1::vector
     LIMIT $2`,
    [`[${queryEmbedding.join(',')}]`, limit]
  )
  return rows
}

Next Steps

Connect to Your Database — get your connection URI
Database Users & Databases — create a dedicated role for your vector workload
Connection Pooling — PgBouncer for high-concurrency embedding ingestion
Database Monitoring — watch index build progress and query performance

PostgreSQL Vector Search with pgvector

What Are pgvector and pgvectorscale?

Enable the Extensions

Quick Start — Storing and Querying Embeddings

1. Create a table with a vector column

2. Insert embeddings

3. Build an HNSW index

4. Nearest-neighbour search

Hybrid Search (Full-text + Vector)

Large Workloads with pgvectorscale

Choosing the Right Index

Upgrade Extension Versions

When to Use PostgreSQL vs OpenSearch for Vectors

Connection Example (Node.js / pg)

Next Steps

Product

Support

Company

Legal

PostgreSQL Vector Search with pgvector

What Are pgvector and pgvectorscale?

Enable the Extensions

Quick Start — Storing and Querying Embeddings

1. Create a table with a vector column

2. Insert embeddings

3. Build an HNSW index

4. Nearest-neighbour search

Hybrid Search (Full-text + Vector)

Large Workloads with pgvectorscale

Choosing the Right Index

Upgrade Extension Versions

When to Use PostgreSQL vs OpenSearch for Vectors

Connection Example (Node.js / pg)

Next Steps