Embeddings & Vector Search
What are embeddings, supported models (OpenAI, Voyage AI), auto-indexing on object upload, pgvector similarity search, chunk strategies, re-indexing, and usage metering.
Embeddings are dense vector representations of text. NFYio uses them to power semantic search: similar texts have similar vectors, so you can find relevant documents by vector similarity instead of keyword matching. Embeddings are stored in pgvector and support auto-indexing, configurable chunk strategies, and usage metering.
What are Embeddings?
An embedding is a fixed-size vector of numbers that captures semantic meaning. For example:
- “refund policy” and “return policy” → similar vectors (close in vector space)
- “refund policy” and “weather forecast” → dissimilar vectors (far apart)
Models like OpenAI’s `text-embedding-3-small` convert text into these vectors. NFYio stores them in PostgreSQL with the pgvector extension for fast similarity search.
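The idea of "close" and "far apart" vectors can be made concrete with cosine similarity. A minimal sketch — the three-dimensional vectors below are toy stand-ins, not real model output (real embeddings have 1024–3072 dimensions):

```python
import math

def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two vectors (1.0 = same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Toy "embeddings" for the examples above:
refund = [0.9, 0.1, 0.0]
returns = [0.85, 0.2, 0.0]
weather = [0.0, 0.1, 0.95]

print(cosine_similarity(refund, returns))  # close to 1.0 (similar meaning)
print(cosine_similarity(refund, weather))  # close to 0.0 (unrelated)
```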
Supported Models
| Model | Provider | Dimensions | Max Tokens | Use Case |
|---|---|---|---|---|
| `text-embedding-3-small` | OpenAI | 1536 | 8191 | Fast, cost-effective |
| `text-embedding-3-large` | OpenAI | 3072 | 8191 | Higher quality |
| `voyage-3.5-lite` | Voyage AI | 1024 | 16000 | Long documents |
Configuring the Embedding Model
```json
{
  "embedding": {
    "model": "text-embedding-3-small",
    "dimensions": 1536
  }
}
```
For `text-embedding-3-large`, you can optionally reduce dimensions (e.g., 256 or 1024) for smaller indexes and faster search.
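Reduced dimensions amount to truncating the vector and re-normalizing it, which the text-embedding-3 family is trained to tolerate (prefixes of the vector remain meaningful). A minimal sketch of that idea — NFYio handles this for you when you set `dimensions`:

```python
import math

def truncate_embedding(vec: list[float], dims: int) -> list[float]:
    """Keep the first `dims` components and re-normalize to unit length."""
    head = vec[:dims]
    norm = math.sqrt(sum(x * x for x in head))
    return [x / norm for x in head]

full = [0.5, 0.5, 0.5, 0.5]          # stand-in for a 3072-dim vector
short = truncate_embedding(full, 2)  # 2-dim unit vector, still comparable
```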
Auto-Indexing on Object Upload
When you upload objects to a configured bucket, NFYio automatically:
- Detects new or updated objects (via S3 events or polling)
- Loads the document (PDF, DOCX, TXT, Markdown, images with OCR)
- Chunks the content according to your chunk strategy
- Embeds each chunk with the configured model
- Stores embeddings in pgvector with metadata (bucket, key, chunk index)
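The steps above can be sketched as a single loop per object. Everything here — `load_document`, `chunk_text`, `embed`, and the in-memory `STORE` — is an illustrative stub, not an NFYio API:

```python
STORE: list[dict] = []  # stands in for the pgvector table

def load_document(bucket: str, key: str) -> str:
    # Real pipeline parses PDF/DOCX/TXT/Markdown (with OCR for images).
    return f"contents of s3://{bucket}/{key} " * 50

def chunk_text(text: str, size: int, overlap: int) -> list[str]:
    # Word-based sliding window as a stand-in for token-based chunking.
    words = text.split()
    step = size - overlap
    return [" ".join(words[i:i + size]) for i in range(0, len(words), step)]

def embed(chunk: str, model: str) -> list[float]:
    # Placeholder for the embedding API call.
    return [float(len(chunk) % 7), 1.0]

def index_object(bucket: str, key: str) -> int:
    text = load_document(bucket, key)
    chunks = chunk_text(text, size=512, overlap=64)
    for i, chunk in enumerate(chunks):
        STORE.append({"bucket": bucket, "key": key, "chunk_index": i,
                      "content": chunk,
                      "embedding": embed(chunk, "text-embedding-3-small")})
    return len(chunks)
```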
Enabling Auto-Indexing
```json
{
  "bucket": "my-docs",
  "prefix": "knowledge/",
  "autoIndex": true,
  "embedding": {
    "model": "text-embedding-3-small",
    "chunkSize": 512,
    "chunkOverlap": 64
  }
}
```
Only objects under the specified prefix are indexed. Use `prefix: ""` to index the entire bucket.
pgvector Similarity Search
NFYio uses pgvector for vector storage and search. Supported distance metrics:
| Metric | Operator | Use Case |
|---|---|---|
| Cosine | <=> | Default, normalized vectors |
| L2 (Euclidean) | <-> | When magnitude matters |
| Inner product | <#> | Pre-normalized vectors |
Query Example
```sql
-- Cosine similarity (NFYio default)
SELECT chunk_id, content, 1 - (embedding <=> $1::vector) AS similarity
FROM document_embeddings
WHERE workspace_id = $2
ORDER BY embedding <=> $1::vector
LIMIT 5;
```
Indexing for Scale
For large corpora, create an HNSW or IVFFlat index:
```sql
CREATE INDEX ON document_embeddings
USING hnsw (embedding vector_cosine_ops)
WITH (m = 16, ef_construction = 64);
```
NFYio creates these indexes automatically when you configure a bucket for embedding.
Chunk Strategies
How you chunk documents affects retrieval quality:
Fixed Token Chunks
| Strategy | Chunk Size | Overlap | Best For |
|---|---|---|---|
| Small | 256 | 32 | Fine-grained retrieval, FAQs |
| Medium | 512 | 64 | General purpose (default) |
| Large | 1024 | 128 | Long-form context, narratives |
Semantic Chunking (Experimental)
Split on sentence or paragraph boundaries instead of fixed tokens. Preserves logical units and can improve retrieval for structured documents.
```json
{
  "chunkStrategy": "semantic",
  "splitOn": "paragraph",
  "minChunkSize": 200,
  "maxChunkSize": 512
}
```
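A minimal sketch of paragraph-based splitting with min/max bounds. Sizes are measured in characters here for simplicity; the real strategy counts tokens, and the merge logic is illustrative rather than NFYio's exact algorithm:

```python
def semantic_chunks(text: str, min_size: int = 200, max_size: int = 512) -> list[str]:
    """Split on blank-line paragraph boundaries, merging short paragraphs
    until a chunk reaches min_size, without exceeding max_size."""
    chunks, current = [], ""
    for para in text.split("\n\n"):
        para = para.strip()
        if not para:
            continue
        candidate = (current + "\n\n" + para) if current else para
        if len(candidate) <= max_size:
            current = candidate        # paragraph fits; keep merging
        else:
            if current:
                chunks.append(current) # flush what we have, start fresh
            current = para
        if len(current) >= min_size:
            chunks.append(current)     # chunk is big enough; emit it
            current = ""
    if current:
        chunks.append(current)
    return chunks
```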
Overlap
Overlap between chunks prevents splitting important context across boundaries. Typical overlap: 10–20% of chunk size.
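The sliding-window effect of overlap can be shown in a few lines; tokens are plain strings here rather than real tokenizer output:

```python
def chunk_with_overlap(tokens: list[str], size: int, overlap: int) -> list[list[str]]:
    """Sliding window: each chunk repeats the last `overlap` tokens of the
    previous one, so context near a boundary appears in both chunks."""
    step = size - overlap
    return [tokens[i:i + size] for i in range(0, len(tokens), step)]

tokens = [f"t{i}" for i in range(10)]
chunks = chunk_with_overlap(tokens, size=4, overlap=1)
# chunks[0][-1] == chunks[1][0]: the shared token at each boundary
```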
Re-indexing
Re-index when you:
- Change the embedding model
- Change chunk size or strategy
- Fix corrupted or missing embeddings
- Add new document types
Trigger Re-index via API
```bash
curl -X POST "https://api.yourdomain.com/v1/buckets/my-docs/reindex" \
  -H "Authorization: Bearer $TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "prefix": "knowledge/",
    "full": true
  }'
```
| Option | Description |
|---|---|
| `full: true` | Re-embed all documents (destructive) |
| `full: false` | Only process new/updated objects since last index |
| `prefix` | Limit to objects under this prefix |
Incremental Updates
By default, NFYio performs incremental indexing: only new or modified objects are processed. Deleted objects have their embeddings removed.
Analytics and Usage Metering
NFYio tracks embedding usage for billing and analytics:
| Metric | Description |
|---|---|
| `embedding_tokens` | Total tokens embedded |
| `embedding_requests` | Number of embedding API calls |
| `search_queries` | Number of similarity searches |
| `indexed_documents` | Documents in the vector store |
| `indexed_chunks` | Total chunks stored |
Usage API
```bash
curl "https://api.yourdomain.com/v1/usage/embeddings?workspaceId=ws_123&from=2026-03-01&to=2026-03-31" \
  -H "Authorization: Bearer $TOKEN"
```

```json
{
  "workspaceId": "ws_123",
  "period": "2026-03-01 to 2026-03-31",
  "embeddingTokens": 1250000,
  "embeddingRequests": 4200,
  "searchQueries": 8500,
  "indexedDocuments": 1200,
  "indexedChunks": 45000
}
```
Best Practices
Chunk Size
- Start with 512 tokens and 64 overlap
- Use smaller chunks (256) for precise retrieval; larger (1024) for narrative context
Model Selection
- `text-embedding-3-small` for most use cases
- `text-embedding-3-large` when quality is critical
- `voyage-3.5-lite` for very long documents (16K context)
Index Maintenance
- Run incremental re-index regularly if documents change often
- Monitor `indexed_chunks` growth; consider archiving old documents
Similarity Threshold
- Filter low-similarity results (e.g., `similarity < 0.7`) to reduce noise
- Tune per use case; support chatbots may need lower thresholds than strict Q&A
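A post-query threshold filter might look like the following sketch; the 0.7 cutoff and the hit structure are illustrative, not NFYio defaults:

```python
def filter_hits(hits: list[dict], threshold: float = 0.7) -> list[dict]:
    """Drop search results whose similarity falls below the threshold."""
    return [h for h in hits if h["similarity"] >= threshold]

hits = [{"chunk_id": 1, "similarity": 0.91},
        {"chunk_id": 2, "similarity": 0.74},
        {"chunk_id": 3, "similarity": 0.42}]
print(filter_hits(hits))  # chunk 3 is dropped as noise
```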
Next Steps
- RAG Agents — Use embeddings in RAG pipelines
- Agent Tools — Document search tool uses embeddings
- Storage Overview — Bucket configuration for ingestion