Scalability Guide

This guide covers scaling strategies for NFYio at high load: horizontal scaling, database scaling, caching, and multi-region deployment.

Horizontal Scaling

API Gateway

Scale the gateway horizontally behind a load balancer:

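# Kubernetes Deployment
apiVersion: apps/v1
kind: Deployment
metadata:
  name: nfyio-gateway
spec:
  replicas: 3
  selector:
    matchLabels:
      app: nfyio-gateway
  template:
    metadata:
      labels:
        app: nfyio-gateway   # must match spec.selector.matchLabels
    spec:
      containers:
      - name: gateway
        image: nfyio/gateway:latest
        ports:
        - containerPort: 3000
---
# Service that spreads traffic across the gateway replicas
apiVersion: v1
kind: Service
metadata:
  name: nfyio-gateway
spec:
  type: LoadBalancer
  selector:
    app: nfyio-gateway
  ports:
  - port: 80            # example external port
    targetPort: 3000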
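
Component | Scaling strategy
--- | ---
API Gateway | Add replicas. The gateway is stateless, so throughput grows roughly linearly until shared backends (PostgreSQL, Redis, storage) become the bottleneck
Load Balancer | Round-robin or least-connections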
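
The load balancer normally provides both strategies in the table, whether that is a Kubernetes Service or a cloud load balancer. Below is a minimal sketch of the selection logic. It assumes a hypothetical `Backend` record per gateway replica with a count of its open connections.

```typescript
// Hypothetical backend descriptor: one gateway replica and its open connections.
interface Backend {
  name: string;
  activeConnections: number;
}

// Round-robin: cycle through replicas in order, ignoring their current load.
class RoundRobin {
  private next = 0;
  private readonly backends: Backend[];

  constructor(backends: Backend[]) {
    if (backends.length === 0) throw new Error("need at least one backend");
    this.backends = backends;
  }

  pick(): Backend {
    const backend = this.backends[this.next];
    this.next = (this.next + 1) % this.backends.length;
    return backend;
  }
}

// Least-connections: choose the replica with the fewest open connections
// (ties go to the replica that appears first in the list).
function leastConnections(backends: Backend[]): Backend {
  return backends.reduce((best, b) =>
    b.activeConnections < best.activeConnections ? b : best,
  );
}

const replicas: Backend[] = [
  { name: "gateway-0", activeConnections: 12 },
  { name: "gateway-1", activeConnections: 3 },
  { name: "gateway-2", activeConnections: 7 },
];

const rr = new RoundRobin(replicas);
console.log(rr.pick().name, rr.pick().name, rr.pick().name, rr.pick().name);
console.log(leastConnections(replicas).name);
```

Round-robin works when requests cost roughly the same. Least-connections adapts better when some requests hold connections much longer than others, such as large uploads.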

Storage Nodes

SeaweedFS scales by adding volume nodes:

# Add more volume nodes
seaweedfs-volume-1:
  image: chrislusf/seaweedfs
  command: volume -mserver=seaweedfs-master:9333 -port=8080
seaweedfs-volume-2:
  image: chrislusf/seaweedfs
  command: volume -mserver=seaweedfs-master:9333 -port=8080

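Each volume node adds capacity and throughput. For every write, the master assigns a file ID on a writable volume and returns the address of the volume server that holds it. The client then uploads the data directly to that server, so writes spread across volume nodes without passing through the master.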
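
The write path is a two-step protocol:

1. The client asks the master for a file ID with `GET /dir/assign`.
2. The client sends the bytes to the volume server named in the response.

The sketch below covers only the pure part: parsing an assign response and building the upload URL. The sample response values are illustrative. The sketch describes SeaweedFS's internal write path; NFYio clients are not assumed to perform these calls themselves.

```typescript
// Fields of a SeaweedFS master /dir/assign response used here.
interface AssignResponse {
  fid: string;       // "<volumeId>,<fileKey+cookie>", e.g. "3,01637037d6"
  url: string;       // volume server that holds the assigned volume
  publicUrl: string;
  count: number;
}

// The volume ID is the part of the fid before the comma.
function volumeIdOf(fid: string): number {
  const comma = fid.indexOf(",");
  if (comma < 0) throw new Error(`malformed fid: ${fid}`);
  return Number.parseInt(fid.slice(0, comma), 10);
}

// Step 2 of a write: the client uploads the file body to this URL on the volume server.
function uploadUrl(assign: AssignResponse): string {
  return `http://${assign.url}/${assign.fid}`;
}

// Illustrative response, as if the master had picked volume 3 on seaweedfs-volume-2.
const assigned: AssignResponse = JSON.parse(
  '{"fid":"3,01637037d6","url":"seaweedfs-volume-2:8080","publicUrl":"seaweedfs-volume-2:8080","count":1}',
);
console.log(volumeIdOf(assigned.fid), uploadUrl(assigned));
```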

Embedding Workers

For high-volume embedding pipelines, scale worker replicas:

nfyio-embedding-worker:
  deploy:
    replicas: 4
  environment:
    - OPENAI_API_KEY=${OPENAI_API_KEY}
    - BATCH_SIZE=32
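
Setting | Impact
--- | ---
Replicas | Throughput; roughly linear until the embedding provider's rate limits are reached
BATCH_SIZE | API efficiency: larger batches mean fewer API calls but more memory per worker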
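
Here is a minimal sketch of how BATCH_SIZE turns into API calls. It assumes each worker sends one embeddings request per batch of text chunks; the chunk strings are placeholders.

```typescript
// Split inputs into batches of at most `batchSize` items. With one embeddings
// request per batch, the request count is ceil(items.length / batchSize).
function toBatches<T>(items: T[], batchSize: number): T[][] {
  if (!Number.isInteger(batchSize) || batchSize < 1) {
    throw new Error("batchSize must be a positive integer");
  }
  const batches: T[][] = [];
  for (let i = 0; i < items.length; i += batchSize) {
    batches.push(items.slice(i, i + batchSize));
  }
  return batches;
}

// 100 placeholder chunks with BATCH_SIZE=32 -> 4 requests of 32, 32, 32 and 4 items.
const chunks = Array.from({ length: 100 }, (_, i) => `chunk ${i}`);
const batches = toBatches(chunks, 32);
console.log(batches.length, batches.map((b) => b.length));
```

Doubling BATCH_SIZE to 64 would cut this to 2 requests. However, each worker then holds twice as many chunks and vectors in memory at once.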

Database Scaling

Read Replicas

Offload read traffic to replicas. Use PostgreSQL streaming replication:

# Primary
postgres-primary:
  image: pgvector/pgvector:pg16
  environment:
    - POSTGRES_REPLICATION_MODE=master

# Read replica
postgres-replica:
  image: pgvector/pgvector:pg16
  environment:
    - POSTGRES_REPLICATION_MODE=slave
    - POSTGRES_MASTER_HOST=postgres-primary

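Route read queries (SELECT statements and list operations) to replicas and send all writes to the primary. Streaming replication is asynchronous by default, so a replica can lag slightly behind the primary. Send any read that must see a just-committed write to the primary.

The POSTGRES_REPLICATION_MODE and POSTGRES_MASTER_HOST variables are not handled by the official postgres image that pgvector/pgvector builds on, so with that image they have no effect. You have two options:

  • Use an image whose entrypoint implements replication variables, and supply replication credentials. For example, Bitnami's postgresql image uses POSTGRESQL_REPLICATION_MODE and related variables.
  • Configure streaming replication directly: seed the replica with pg_basebackup -R, which writes standby.signal and primary_conninfo for you.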
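
Below is a minimal sketch of read/write routing in the application layer. The host names and the `mustSeeLatest` flag are hypothetical. The check only inspects the statement text, so it cannot see writes hidden inside a function call such as `SELECT nextval(...)`, which would fail on a read-only replica. For that reason, many applications route by code path rather than by SQL text.

```typescript
type Target = "primary" | "replica";

// Only plain SELECTs go to a replica. Locking reads (FOR UPDATE / FOR SHARE ...)
// and every other statement go to the primary. `mustSeeLatest` forces
// read-your-writes reads to the primary as well.
function routeQuery(sql: string, mustSeeLatest = false): Target {
  if (mustSeeLatest) return "primary";
  const normalized = sql.trim().toUpperCase();
  const isSelect = normalized.startsWith("SELECT");
  const isLocking = /\bFOR\s+(UPDATE|SHARE|NO\s+KEY\s+UPDATE|KEY\s+SHARE)\b/.test(normalized);
  return isSelect && !isLocking ? "replica" : "primary";
}

// Spread replica reads round-robin across hypothetical replica hosts.
const replicaHosts = ["postgres-replica-1", "postgres-replica-2"];
let nextReplica = 0;

function hostFor(sql: string, mustSeeLatest = false): string {
  if (routeQuery(sql, mustSeeLatest) === "primary") return "postgres-primary";
  const host = replicaHosts[nextReplica];
  nextReplica = (nextReplica + 1) % replicaHosts.length;
  return host;
}

console.log(hostFor("SELECT * FROM objects WHERE bucket = $1"));
console.log(hostFor("INSERT INTO objects (key) VALUES ($1)"));
console.log(hostFor("SELECT * FROM objects WHERE id = $1 FOR UPDATE"));
```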

Connection Pooling

Use PgBouncer to absorb connection spikes. It multiplexes many client connections onto a small pool of PostgreSQL connections:

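[databases]
nfyio = host=postgres port=5432 dbname=nfyio

[pgbouncer]
listen_addr = 0.0.0.0
listen_port = 6432
auth_type = scram-sha-256
; example path; userlist.txt lists the allowed users and their password hashes
auth_file = /etc/pgbouncer/userlist.txt
; transaction pooling: a server connection is assigned only for the duration of a transaction
pool_mode = transaction
max_client_conn = 1000
default_pool_size = 50
reserve_pool_size = 25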
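
Parameter | Purpose
--- | ---
max_client_conn | Maximum number of client connections PgBouncer accepts
default_pool_size | Server connections per user/database pair
reserve_pool_size | Additional server connections a pool may open when clients have waited longer than reserve_pool_timeout (burst allowance)

Transaction pooling means session state does not survive between transactions. Session-level SET commands, advisory locks, and LISTEN/NOTIFY do not work reliably. Protocol-level prepared statements also break unless you run PgBouncer 1.21 or later with max_prepared_statements enabled.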
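
Pool sizes multiply: each user/database pair gets its own pool. By contrast, max_client_conn does not consume PostgreSQL connections. Below is a quick budget check for the settings above. It assumes one application user and PostgreSQL's default max_connections of 100. The 10-connection headroom for superusers and maintenance is an arbitrary example.

```typescript
// Worst-case server connections PgBouncer can open: every user/database pair
// gets default_pool_size, plus up to reserve_pool_size more under pressure.
function maxServerConnections(
  pairs: number,
  defaultPoolSize: number,
  reservePoolSize: number,
): number {
  return pairs * (defaultPoolSize + reservePoolSize);
}

// Does the worst case fit under PostgreSQL's max_connections, leaving headroom
// for superuser_reserved_connections and direct (non-pooled) clients?
function fitsBudget(serverConns: number, maxConnections: number, headroom: number): boolean {
  return serverConns + headroom <= maxConnections;
}

// Settings above, assuming a single application user on the nfyio database.
const needed = maxServerConnections(1, 50, 25);
console.log(needed, fitsBudget(needed, 100, 10));
```

With two application users, the worst case doubles to 150 server connections, which exceeds the default. Either raise max_connections or cap each database with PgBouncer's max_db_connections.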

pgvector Optimization

For vector similarity search at scale:

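-- Choose one of the two indexes below.

-- IVFFlat: faster build, less memory. Create it after the table has data,
-- because the lists are computed from existing rows.
-- Starting point for lists: rows / 1000 up to 1M rows (100 suits ~100k rows).
CREATE INDEX idx_embeddings_ivfflat ON embeddings
  USING ivfflat (embedding vector_cosine_ops)
  WITH (lists = 100);
SET ivfflat.probes = 10;  -- lists searched per query (default 1); higher = better recall, slower

-- HNSW: better speed/recall trade-off, slower build, more memory.
CREATE INDEX idx_embeddings_hnsw ON embeddings
  USING hnsw (embedding vector_cosine_ops)
  WITH (m = 16, ef_construction = 64);
SET hnsw.ef_search = 40;  -- candidate list size per query (default 40); raise for recall

-- vector_cosine_ops indexes serve queries that order by the <=> operator:
-- SELECT id FROM embeddings ORDER BY embedding <=> $1 LIMIT 10;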
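
Index | Build time | Memory | Query speed and recall
--- | --- | --- | ---
IVFFlat | Faster | Lower | Good; tune with ivfflat.probes
HNSW | Slower | Higher | Better speed/recall trade-off; tune with hnsw.ef_search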
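
The pgvector README gives starting points for IVFFlat: lists = rows / 1000 for up to 1M rows and sqrt(rows) above that, and probes = sqrt(lists). The `lists = 100` above matches roughly 100,000 rows. This small helper turns those guidelines into numbers. Treat its output as a starting point to tune against measured recall, not a fixed rule.

```typescript
// IVFFlat starting points from the pgvector README:
//   lists  = rows / 1000 for up to 1M rows, sqrt(rows) above that
//   probes = sqrt(lists)
function ivfflatLists(rows: number): number {
  const lists = rows <= 1_000_000 ? rows / 1000 : Math.sqrt(rows);
  return Math.max(1, Math.round(lists));
}

function ivfflatProbes(lists: number): number {
  return Math.max(1, Math.round(Math.sqrt(lists)));
}

for (const rows of [100_000, 1_000_000, 4_000_000]) {
  const lists = ivfflatLists(rows);
  console.log(`${rows} rows -> WITH (lists = ${lists}); SET ivfflat.probes = ${ivfflatProbes(lists)};`);
}
```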

Caching Strategies

Redis

Use Redis for sessions, rate limiting, and query caching:

redis:
  image: redis:7-alpine
  command: redis-server --maxmemory 2gb --maxmemory-policy allkeys-lru
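
Use case | TTL | Key pattern
--- | --- | ---
Session | 24 h | session:{id}
Rate limit | 1 min | ratelimit:{key}:{window}
Query cache | 5 min | query:{hash}
Embedding cache | 24 h | emb:{hash}

With allkeys-lru, Redis may evict any key once maxmemory is reached, including sessions and rate-limit counters. If losing those is unacceptable, keep sessions and rate limits on a separate Redis instance and leave the LRU instance to the caches.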
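
The ratelimit:{key}:{window} pattern implies a fixed-window counter: INCR a key named after the current window, then give it a TTL on first use. Below is a minimal sketch. It uses a hypothetical synchronous `CounterStore` interface and an in-memory stand-in, so it runs without a Redis server.

```typescript
// Subset of Redis commands the limiter needs (hypothetical interface;
// with a real client such as ioredis these calls are asynchronous).
interface CounterStore {
  incr(key: string): number;
  expire(key: string, seconds: number): void;
}

// In-memory stand-in for Redis, for illustration only. It records TTLs but
// never evicts, which is harmless here because every window uses a new key.
class MemoryStore implements CounterStore {
  private counts = new Map<string, number>();
  private ttls = new Map<string, number>();

  incr(key: string): number {
    const value = (this.counts.get(key) ?? 0) + 1;
    this.counts.set(key, value);
    return value;
  }

  expire(key: string, seconds: number): void {
    this.ttls.set(key, seconds);
  }

  ttlOf(key: string): number | undefined {
    return this.ttls.get(key);
  }
}

// Fixed-window limiter: at most `limit` requests per client per window.
function allowRequest(
  store: CounterStore,
  clientKey: string,
  limit: number,
  windowSeconds: number,
  nowMs: number,
): boolean {
  const windowIndex = Math.floor(nowMs / (windowSeconds * 1000));
  const key = `ratelimit:${clientKey}:${windowIndex}`;
  const count = store.incr(key);
  // Set the TTL on the first hit so the counter disappears after its window.
  if (count === 1) store.expire(key, windowSeconds);
  return count <= limit;
}

const store = new MemoryStore();
const t0 = 1_700_000_000_000; // fixed timestamp for a reproducible demo
const results = [0, 1, 2, 3].map(() => allowRequest(store, "api-key-123", 3, 60, t0));
console.log(results, allowRequest(store, "api-key-123", 3, 60, t0 + 60_000));
```

With a real client, send INCR and EXPIRE in one MULTI/EXEC transaction or Lua script, so a crash between them cannot leave a counter without a TTL. Fixed windows also allow up to twice the limit in a burst that straddles a window boundary. A sliding window avoids that at the cost of more bookkeeping.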

CDN Caching

Cache public objects at the edge:

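Header | Effect
--- | ---
Cache-Control: public, max-age=3600 | Browsers and shared caches may reuse the response for 1 hour
Cache-Control: s-maxage=86400 | Shared caches (the CDN) may reuse it for 24 hours; overrides max-age for shared caches only
Vary: Authorization | Caches keep a separate copy per Authorization header value

Serve private objects with Cache-Control: private (or no-store) so the CDN never stores them.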
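
How a cache turns these headers into a lifetime can be sketched as follows. This is a simplified reading of HTTP caching (RFC 9111). It ignores Expires, heuristic freshness, and most other directives.

```typescript
// Simplified freshness lifetime in seconds. Shared caches (CDNs) prefer
// s-maxage over max-age and must not store "private" responses; "no-store"
// disables caching everywhere. No explicit lifetime is treated as 0 here.
function freshnessSeconds(cacheControl: string, sharedCache: boolean): number {
  const directives = new Map<string, string | undefined>();
  for (const part of cacheControl.split(",")) {
    const [name, value] = part.trim().toLowerCase().split("=");
    if (name) directives.set(name, value);
  }
  if (directives.has("no-store")) return 0;
  if (sharedCache && directives.has("private")) return 0;
  const pick = sharedCache && directives.has("s-maxage") ? "s-maxage" : "max-age";
  const seconds = Number.parseInt(directives.get(pick) ?? "", 10);
  return Number.isNaN(seconds) ? 0 : seconds;
}

const header = "public, max-age=3600, s-maxage=86400";
console.log(freshnessSeconds(header, true), freshnessSeconds(header, false));
```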

Query Result Caching

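Cache expensive RAG queries and list operations. A cached listing can be stale for up to its TTL (60 seconds here) after objects are added or deleted:

import { ListObjectsV2Command } from "@aws-sdk/client-s3";
// Assumes `redis` is an ioredis client and `s3` an AWS SDK v3 S3Client configured for NFYio.

// Cache each page of a listing for 60 s, keyed by its continuation token.
async function listObjectsCached(bucket: string, prefix: string, token?: string) {
  const cacheKey = `list:${bucket}:${prefix}:${token ?? "first"}`;
  const cached = await redis.get(cacheKey);
  if (cached) return JSON.parse(cached); // Date fields come back as ISO strings
  const result = await s3.send(
    new ListObjectsV2Command({ Bucket: bucket, Prefix: prefix, ContinuationToken: token }),
  );
  await redis.setex(cacheKey, 60, JSON.stringify(result));
  return result;
}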
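
The query:{hash} and emb:{hash} patterns from the Redis table key entries by a hash of the input rather than the raw text, which keeps keys short and uniform. Below is a minimal cache-aside sketch with an in-memory TTL store standing in for Redis GET/SETEX. FNV-1a is swapped in only because it needs no imports. It is not cryptographic, so collisions are possible; a hash such as SHA-256 makes accidental collisions negligible for real keys.

```typescript
// 32-bit FNV-1a over UTF-16 code units (same as byte-wise FNV-1a for ASCII),
// returned as 8 hex characters.
function fnv1a(text: string): string {
  let hash = 0x811c9dc5;
  for (let i = 0; i < text.length; i++) {
    hash ^= text.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193) >>> 0;
  }
  return hash.toString(16).padStart(8, "0");
}

// In-memory TTL store standing in for Redis GET/SETEX (illustration only).
class TtlCache {
  private entries = new Map<string, { value: string; expiresAt: number }>();

  get(key: string, nowMs: number): string | undefined {
    const entry = this.entries.get(key);
    if (!entry || entry.expiresAt <= nowMs) return undefined;
    return entry.value;
  }

  setex(key: string, ttlSeconds: number, value: string, nowMs: number): void {
    this.entries.set(key, { value, expiresAt: nowMs + ttlSeconds * 1000 });
  }
}

// Cache-aside: return the cached value while fresh; otherwise compute, store, return.
function cached<T>(
  cache: TtlCache, prefix: string, input: string, ttlSeconds: number,
  nowMs: number, compute: () => T,
): T {
  const key = `${prefix}:${fnv1a(input)}`;
  const hit = cache.get(key, nowMs);
  if (hit !== undefined) return JSON.parse(hit) as T;
  const value = compute();
  cache.setex(key, ttlSeconds, JSON.stringify(value), nowMs);
  return value;
}

let calls = 0;
const cache = new TtlCache();
const runQuery = () => { calls++; return { answer: 42 }; };
cached(cache, "query", "what is nfyio?", 300, 0, runQuery);       // miss: computed
cached(cache, "query", "what is nfyio?", 300, 1_000, runQuery);   // hit within the 5-min TTL
cached(cache, "query", "what is nfyio?", 300, 301_000, runQuery); // expired: recomputed
console.log(calls);
```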

Multi-Region Deployment

Architecture

Region A (Primary)          Region B (DR/Read)
┌─────────────────────┐     ┌─────────────────────┐
│ Gateway             │     │ Gateway (read)      │
│ Storage (primary)   │────▶│ Storage (replica)   │
│ PostgreSQL (primary)│────▶│ PostgreSQL (replica)│
│ Redis (primary)     │     │ Redis (replica)     │
└─────────────────────┘     └─────────────────────┘

Considerations

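Aspect | Strategy
--- | ---
Data replication | Asynchronous (PostgreSQL, SeaweedFS); writes not yet replicated can be lost on failover, so RPO is greater than zero
Routing | GeoDNS or latency-based routing
Consistency | Eventual consistency for cross-region reads
Failover | Manual or automated, against defined RTO and RPO targets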
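
GeoDNS and latency-based routing normally live in DNS or a global load balancer, but the decision they make can be sketched in a few lines. The `Region` record and region names are hypothetical. Reads go to the nearest healthy region; writes always go to the primary, because replication flows one way.

```typescript
// Hypothetical region status as seen by a routing layer.
interface Region {
  name: string;
  role: "primary" | "replica";
  healthy: boolean;
  latencyMs: number; // measured from the client
}

// Reads: nearest healthy region (replicas may be slightly stale).
// Writes: the healthy primary, or undefined if there is none.
function chooseRegion(regions: Region[], op: "read" | "write"): Region | undefined {
  const healthy = regions.filter((r) => r.healthy);
  if (op === "write") return healthy.find((r) => r.role === "primary");
  return healthy.reduce<Region | undefined>(
    (best, r) => (best === undefined || r.latencyMs < best.latencyMs ? r : best),
    undefined,
  );
}

const regions: Region[] = [
  { name: "region-a", role: "primary", healthy: true, latencyMs: 80 },
  { name: "region-b", role: "replica", healthy: true, latencyMs: 15 },
];
console.log(chooseRegion(regions, "read")?.name, chooseRegion(regions, "write")?.name);
```

When the primary is unhealthy, the function returns no write target. Accepting writes again requires promoting the replica, which is the failover step whose RTO and RPO you define up front.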

Cross-Region Object Replication

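Replicate objects across regions with bucket replication rules. The rule below copies objects under the critical/ prefix to my-bucket in region B: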

{
  "replication": {
    "role": "source",
    "rules": [
      {
        "id": "replicate-to-region-b",
        "status": "enabled",
        "destination": {
          "bucket": "arn:nfyio:storage:region-b::my-bucket",
          "storage_class": "STANDARD"
        },
        "filter": { "prefix": "critical/" }
      }
    ]
  }
}
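
Only keys under the rule's prefix are copied, so objects elsewhere in the bucket exist in region A only. Below is a minimal sketch of that filter. It assumes S3-style semantics: an enabled rule matches any key that starts with its prefix, and an empty or missing prefix matches everything. The `ReplicationRule` type is a hypothetical TypeScript view of the JSON above.

```typescript
// Hypothetical TypeScript view of the replication rule JSON.
interface ReplicationRule {
  id: string;
  status: "enabled" | "disabled";
  destination: { bucket: string; storage_class: string };
  filter?: { prefix?: string };
}

// IDs of the enabled rules whose prefix filter matches an object key.
function matchingRules(rules: ReplicationRule[], key: string): string[] {
  return rules
    .filter((rule) => rule.status === "enabled")
    .filter((rule) => key.startsWith(rule.filter?.prefix ?? ""))
    .map((rule) => rule.id);
}

const rules: ReplicationRule[] = [
  {
    id: "replicate-to-region-b",
    status: "enabled",
    destination: { bucket: "arn:nfyio:storage:region-b::my-bucket", storage_class: "STANDARD" },
    filter: { prefix: "critical/" },
  },
];

console.log(matchingRules(rules, "critical/invoices/2024.pdf"));
console.log(matchingRules(rules, "logs/app.log"));
```

Prefix matching is plain string matching, so critical/ does not match critical-backup/. The trailing slash in the prefix is what limits the rule to that folder.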

Scaling Checklist

  • API Gateway replicas behind load balancer
  • Storage volume nodes scaled for capacity
  • Embedding workers scaled for throughput
  • Read replicas for database
  • PgBouncer for connection pooling
  • pgvector index tuned (IVFFlat or HNSW)
  • Redis for sessions and caching
  • CDN for public objects
  • Multi-region plan if required

Next Steps