Cost Optimization
Storage lifecycle, compute sizing, AI cost control, and network cost reduction for NFYio.
This guide works through each of those layers in turn, with configuration examples and a checklist at the end, to help you reduce costs in NFYio deployments.
Storage Cost Optimization
Lifecycle Rules
Automatically transition or expire objects to reduce storage costs:
{
  "rules": [
    {
      "id": "archive-old-logs",
      "status": "enabled",
      "filter": { "prefix": "logs/" },
      "transitions": [
        { "days": 30, "storage_class": "STANDARD_IA" },
        { "days": 90, "storage_class": "GLACIER" }
      ],
      "expiration": { "days": 365 }
    }
  ]
}
| Transition | Use Case |
|---|---|
| STANDARD → STANDARD_IA | After 30 days, infrequently accessed |
| STANDARD_IA → GLACIER | After 90 days, archive |
| Expiration | Delete after retention period |
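To see what a lifecycle rule actually saves, it can help to model an object's cost over its lifetime. A minimal TypeScript sketch, using the transition days from the rule above; the per-GB-month prices are hypothetical placeholders, so substitute your provider's actual rates:

```typescript
// Sketch only: prices per GB-month are HYPOTHETICAL placeholders.
const PRICE_PER_GB_MONTH: Record<string, number> = {
  STANDARD: 0.023,
  STANDARD_IA: 0.0125,
  GLACIER: 0.004,
};

// Storage class an object occupies at a given age, per the rule above:
// 30 days -> STANDARD_IA, 90 days -> GLACIER, expired at 365 days.
function storageClassAt(ageDays: number): string | null {
  if (ageDays >= 365) return null; // expired and deleted
  if (ageDays >= 90) return "GLACIER";
  if (ageDays >= 30) return "STANDARD_IA";
  return "STANDARD";
}

// Monthly cost of an object of a given size at a given age.
function monthlyCost(sizeGB: number, ageDays: number): number {
  const cls = storageClassAt(ageDays);
  return cls === null ? 0 : sizeGB * PRICE_PER_GB_MONTH[cls];
}
```

With these placeholder prices, a 100 GB log object costs 2.30/month while hot but 0.40/month once archived, so the rule above cuts its steady-state cost by roughly 80% before expiring it entirely.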
Storage Classes
Choose the right storage class for each workload:
| Class | Cost | Access | Use Case |
|---|---|---|---|
| STANDARD | Highest | Instant | Hot data, active workloads |
| STANDARD_IA | Medium | Instant | Infrequent access |
| GLACIER | Lowest | Hours | Archives, compliance |
# Upload directly to STANDARD_IA
aws s3 cp backup.tar.gz s3://my-bucket/ \
--storage-class STANDARD_IA
Data Tiering
Tier data by access pattern:
| Tier | Data Type | Storage Class |
|---|---|---|
| Hot | Active app data, recent uploads | STANDARD |
| Warm | Logs, backups 30–90 days old | STANDARD_IA |
| Cold | Archives, compliance | GLACIER |
Compute Cost Optimization
Right-Size Containers
Avoid over-provisioning. Start with minimum viable resources and scale up:
# docker-compose - resource limits
services:
  nfyio-gateway:
    deploy:
      resources:
        limits:
          cpus: '2'
          memory: 2G
        reservations:
          cpus: '0.5'
          memory: 512M
| Service | Min (dev) | Recommended (prod) |
|---|---|---|
| Gateway | 0.5 CPU, 512M | 2 CPU, 2G |
| Storage proxy | 0.5 CPU, 512M | 1 CPU, 1G |
| Agent | 1 CPU, 2G | 2 CPU, 4G |
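Rather than guessing, limits can be derived from observed usage. A hypothetical sketch: take a high percentile of sampled memory usage and add headroom (the p99 choice and the 1.3x headroom factor are assumptions to tune for your workload):

```typescript
// Sketch only: derive a memory limit from observed usage samples.
// The p99 percentile and the 1.3x headroom factor are assumptions to tune.
function suggestLimit(samplesMiB: number[], headroom = 1.3): number {
  const sorted = [...samplesMiB].sort((a, b) => a - b);
  const idx = Math.min(sorted.length - 1, Math.floor(sorted.length * 0.99));
  return Math.ceil(sorted[idx] * headroom); // limit = p99 usage + 30% headroom
}
```

Feeding this a few days of per-minute usage samples gives a limit that tolerates spikes without paying for permanently idle capacity.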
Auto-Scaling
Scale based on load to avoid paying for idle capacity:
# Kubernetes HPA example
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: nfyio-gateway
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: nfyio-gateway
  minReplicas: 2
  maxReplicas: 10
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70
AI Cost Optimization
Model Selection
Balance cost vs. quality when choosing models:
| Task | Low Cost | Balanced | High Quality |
|---|---|---|---|
| Embeddings | text-embedding-3-small | voyage-2 | text-embedding-3-large |
| Chat/Completion | gpt-4o-mini | gpt-4o | gpt-4-turbo |
# Use smaller model for embeddings
EMBEDDING_MODEL=text-embedding-3-small
Token Usage Optimization
- Chunk size: Larger chunks mean fewer embedding calls and lower cost, but can dilute retrieval precision; tune per corpus
- Context window: Limit context sent to LLM; summarize when possible
- Caching: Cache embeddings and frequent query results
- Batch processing: Batch embedding requests to reduce API overhead
// Reduce tokens: summarize before sending to LLM
const summary = await summarize(chunks); // Shorter context
const response = await llm.chat([{ role: 'user', content: summary }]);
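Caching and batching can be combined. A sketch, where `embedBatch` is a hypothetical stand-in for your embedding provider's batch endpoint; vectors are cached by content hash, so repeated or unchanged texts are never re-sent:

```typescript
// Sketch only: `embedBatch` is a HYPOTHETICAL stand-in for your provider's
// batch embedding endpoint. Vectors are cached by SHA-256 of the text.
import { createHash } from "node:crypto";

const cache = new Map<string, number[]>();

const sha256 = (text: string) =>
  createHash("sha256").update(text).digest("hex");

async function embedAll(
  texts: string[],
  embedBatch: (batch: string[]) => Promise<number[][]>,
): Promise<number[][]> {
  // Deduplicate, then keep only texts whose vectors are not cached yet.
  const misses = [...new Set(texts)].filter((t) => !cache.has(sha256(t)));
  if (misses.length > 0) {
    const vectors = await embedBatch(misses); // one request instead of N
    misses.forEach((t, i) => cache.set(sha256(t), vectors[i]));
  }
  return texts.map((t) => cache.get(sha256(t))!);
}
```

On a second run over the same documents, `embedAll` resolves entirely from the cache and makes no API calls at all.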
Embedding Reuse
Re-embed only when documents change. Use content hashes to detect changes:
const contentHash = hash(documentContent); // stable digest, e.g. SHA-256
if (await db.getEmbeddingHash(docId) === contentHash) {
  return; // Content unchanged - skip re-embedding
}
Network Cost Optimization
Minimize Inter-Region Transfer
Keep data and compute in the same region. Cross-region transfer is typically more expensive.
| Traffic | Cost |
|---|---|
| Same region | Low / free |
| Cross-region | Higher |
| Internet egress | Highest |
VPC Peering vs. Endpoints
| Option | Use Case | Cost |
|---|---|---|
| VPC Peering | Connect multiple VPCs | No egress for peered traffic |
| VPC Endpoints | Access NFYio privately | Endpoint hourly + no egress |
| Public internet | Dev/testing | Egress charges |
Use VPC Peering when connecting your VPC to NFYio’s VPC. Use endpoints for single-VPC private access.
CDN for Public Content
Serve static/public objects via CDN to reduce origin egress:
User → CDN (edge) → Cache HIT  → No origin request
                  → Cache MISS → Origin (NFYio) → One-time egress
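The savings scale with cache hit rate, and hit rate depends on the `Cache-Control` headers the origin sets. A hypothetical sketch of choosing a header by content type (the extension list and max-age values are illustrative, not NFYio defaults):

```typescript
// Sketch only: extension list and max-age values are illustrative.
function cacheControlFor(path: string): string {
  if (/\.(js|css|png|jpg|woff2)$/.test(path)) {
    return "public, max-age=31536000, immutable"; // fingerprinted static assets
  }
  if (path.endsWith(".html")) {
    return "public, max-age=300"; // pages revalidate every 5 minutes
  }
  return "private, no-store"; // dynamic or user-specific responses
}
```

Long-lived `immutable` headers on fingerprinted assets let the CDN serve them from the edge indefinitely, so the origin pays egress at most once per asset per edge location.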
Cost Optimization Checklist
- Lifecycle rules for old data
- Storage classes matched to access patterns
- Container resources right-sized
- Auto-scaling configured
- Embedding model chosen for cost/quality
- Token usage optimized (chunk size, caching)
- Same-region deployment
- VPC peering/endpoints for private traffic
Next Steps
- Storage Classes — Class options and pricing
- Storage Overview — Lifecycle and versioning
- Scalability Guide — Scaling strategies
- Performance Optimization — Tuning for efficiency