Vector Databases: Pinecone, Weaviate, Chroma
The new database stack for the AI era. What are embeddings, why can't I use SQL, and which Vector DB should I choose?
In traditional apps, we use SQL (PostgreSQL) or NoSQL (MongoDB).
We query for exact matches: WHERE user_id = 123.
In AI apps, we need to query for meaning: WHERE content is similar to 'happy dog'.
To do this, we use Vector Databases.
What is a Vector (Embedding)?
An embedding model converts text into a long list of floating-point numbers, typically 1,536 dimensions (OpenAI's text-embedding models) or 768 (many open-source models). This list represents the semantic meaning of the text.
[0.001, -0.23, 0.88, ...]
- “Dog” and “Puppy” will have vectors that are mathematically close (high cosine similarity).
- “Dog” and “Tax Return” will be far apart.
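"Mathematically close" just means a high cosine similarity. A minimal sketch with made-up 3-dimensional vectors (real embeddings have hundreds or thousands of dimensions):

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors: 1.0 means same direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Toy "embeddings" — the numbers are invented for illustration.
dog   = [0.90, 0.80, 0.10]
puppy = [0.85, 0.75, 0.15]
tax   = [0.10, 0.05, 0.95]

print(cosine_similarity(dog, puppy))  # close to 1.0 — similar meaning
print(cosine_similarity(dog, tax))    # much lower — unrelated meaning
```

A vector database is, at its core, a system optimized to run this comparison against millions of stored vectors quickly.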
The Players
1. Pinecone
The Cloud Native Choice.
- Fully managed (SaaS).
- Fast, scalable, expensive.
- Good for enterprise teams who don’t want to manage servers.
2. Weaviate
The Hybrid Choice.
- Open source, but has a cloud offering.
- Supports “hybrid search” (Vector + Keyword) out of the box very well.
- Has a strong schema system and a GraphQL query API.
3. Chroma (ChromaDB)
The Developer Choice.
- Pure open source, runs locally easily.
- Install with pip install chromadb.
- Perfect for prototyping and small-to-medium apps.
4. pgvector (PostgreSQL)
The Pragmatic Choice.
- It’s just a Postgres extension!
- If you already use Postgres, just add a vector column.
- Pro: One database to manage. ACID compliance.
- Con: Doesn’t scale to billions of vectors as well as specialized DBs, but fine for 99% of apps.
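A minimal sketch of what "just add a vector column" looks like, assuming the pgvector extension is installed (the <=> operator is cosine distance; the table and column names here are made up for illustration):

```sql
CREATE EXTENSION IF NOT EXISTS vector;

CREATE TABLE documents (
    id        serial PRIMARY KEY,
    content   text,
    embedding vector(3)  -- use 1536 for OpenAI embeddings
);

-- Nearest neighbors by cosine distance (smaller = more similar):
SELECT content
FROM documents
ORDER BY embedding <=> '[0.9, 0.8, 0.1]'
LIMIT 5;
```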
How to Choose?
- Prototyping: Chroma (or a library like FAISS).
- Production (Small/Mid): pgvector (Keep your stack simple).
- Production (Huge Scale/Enterprise): Pinecone or Weaviate Cloud.
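Whichever you pick, they all answer the same question: which stored vectors are closest to the query vector? A brute-force version in plain Python makes the idea concrete (the document IDs and vectors are invented; real databases avoid this O(n) scan with approximate indexes like HNSW):

```python
import math

def top_k(query, vectors, k=2):
    """Exhaustive nearest-neighbor search by cosine similarity."""
    def cos(a, b):
        dot = sum(x * y for x, y in zip(a, b))
        return dot / (math.sqrt(sum(x * x for x in a)) *
                      math.sqrt(sum(x * x for x in b)))
    # Score every stored vector against the query, best first.
    ranked = sorted(vectors.items(), key=lambda kv: cos(query, kv[1]), reverse=True)
    return [doc_id for doc_id, _ in ranked[:k]]

store = {
    "cats_doc": [0.9, 0.1, 0.2],
    "dogs_doc": [0.8, 0.2, 0.3],
    "tax_doc":  [0.1, 0.9, 0.1],
}

# A query vector near the two pet documents returns them, not the tax one.
print(top_k([0.85, 0.15, 0.25], store, k=2))
```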
The Code (Python + Chroma)
import chromadb
client = chromadb.Client()
collection = client.create_collection("my_docs")
# Add documents (automatically embedded)
collection.add(
    documents=["This is a document about cats", "This is about dogs"],
    metadatas=[{"source": "wiki"}, {"source": "wiki"}],
    ids=["id1", "id2"]
)
# Query
results = collection.query(
    query_texts=["animal"],
    n_results=2
)
# Returns both documents, ranked by similarity to "animal".
Conclusion
Vector Databases are the long-term memory of AI. Choosing the right one depends on whether you value simplicity (pgvector) or raw performance (Pinecone).
Next: Running LLMs Locally — Take back control.