LLM Agents
Direct LLM interaction without retrieval. Configure system prompts, temperature, token limits, conversation threads, and SSE streaming for translation, summarization, and classification.
LLM Agents provide direct interaction with language models without retrieval. Unlike RAG agents, they don’t search a document corpus — they rely on the model’s knowledge and your system prompt. Use them for translation, summarization, classification, and any task that doesn’t require document lookup.
Direct LLM Interaction
LLM agents send your messages directly to the model. No embeddings, no vector search, no retrieved context. The model responds based on:
- System prompt — Instructions that define behavior and persona
- Conversation history — Previous messages in the thread
- User input — The current message
This makes LLM agents simpler to configure and faster to respond than RAG agents.
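The context assembly described above can be sketched as a plain message list. This is an illustrative helper (the function name `build_context` is not part of the API): system prompt first, then thread history, then the current user input, with no retrieval step anywhere.

```python
def build_context(system_prompt, history, user_input):
    """Assemble the message list an LLM agent sends to the model:
    system prompt, then prior thread messages, then the current input.
    No retrieved documents are injected at any point."""
    return (
        [{"role": "system", "content": system_prompt}]
        + list(history)
        + [{"role": "user", "content": user_input}]
    )
```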
Configuration
System Prompts
The system prompt sets the agent’s behavior. Define role, tone, and constraints:
{
"name": "translator",
"type": "llm",
"systemPrompt": "You are a professional translator. Translate the user's text into the requested language. Preserve tone and formatting. Do not add commentary.",
"llm": {
"model": "gpt-4o",
"temperature": 0.3,
"maxTokens": 4096
}
}
Temperature
Controls randomness of outputs:
| Value | Behavior |
|---|---|
| 0.0–0.2 | Deterministic, focused — good for classification, extraction |
| 0.3–0.5 | Balanced — good for translation, summarization |
| 0.6–1.0 | Creative — good for brainstorming, open-ended generation |
Token Limits
- maxTokens — Maximum tokens in the completion. Set based on expected response length.
- maxContextTokens — Maximum total context (system + history + user). Defaults to model limit (e.g., 128K for GPT-4o).
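Both limits can be set together in the agent's llm block. A sketch, reusing the field names from this page (values here are illustrative, not defaults):

```json
{
  "llm": {
    "model": "gpt-4o",
    "temperature": 0.3,
    "maxTokens": 4096,
    "maxContextTokens": 128000
  }
}
```

If the combined system prompt and thread history approach maxContextTokens, older messages stop fitting, so size it with your expected conversation length in mind.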
Conversation Threads
LLM agents support multi-turn conversations via threads:
- Create a thread (or use an existing one)
- Send messages with threadId
- The agent maintains conversation history for context
# First message
curl -X POST "https://api.yourdomain.com/v1/agents/translator/chat" \
-H "Authorization: Bearer $TOKEN" \
-H "Content-Type: application/json" \
-d '{
"threadId": "thread_abc123",
"messages": [{"role": "user", "content": "Translate to French: Hello, how are you?"}],
"stream": true
}'
# Follow-up in same thread
curl -X POST "https://api.yourdomain.com/v1/agents/translator/chat" \
-H "Authorization: Bearer $TOKEN" \
-H "Content-Type: application/json" \
-d '{
"threadId": "thread_abc123",
"messages": [{"role": "user", "content": "Now translate that to Spanish"}],
"stream": true
}'
SSE Streaming
Responses can be streamed via Server-Sent Events (SSE) for real-time output:
curl -X POST "https://api.yourdomain.com/v1/agents/translator/chat" \
-H "Authorization: Bearer $TOKEN" \
-H "Content-Type: application/json" \
-H "Accept: text/event-stream" \
-d '{
"messages": [{"role": "user", "content": "Summarize this article..."}],
"stream": true
}'
SSE Event Format
data: {"choices":[{"delta":{"content":"The"},"index":0}]}
data: {"choices":[{"delta":{"content":" article"},"index":0}]}
data: {"choices":[{"delta":{"content":" discusses"},"index":0}]}
data: [DONE]
Parse delta.content to accumulate the streamed response.
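A minimal parser for that stream might look like this (an illustrative sketch; `accumulate_sse` is not part of the API). It skips non-data lines, stops at the `[DONE]` sentinel, and joins the `delta.content` fragments:

```python
import json

def accumulate_sse(lines):
    """Accumulate delta.content fields from SSE data lines into one string."""
    parts = []
    for line in lines:
        line = line.strip()
        if not line.startswith("data: "):
            continue  # ignore comments, blank keep-alives, etc.
        payload = line[len("data: "):]
        if payload == "[DONE]":
            break  # end-of-stream sentinel
        chunk = json.loads(payload)
        delta = chunk["choices"][0].get("delta", {})
        if "content" in delta:
            parts.append(delta["content"])
    return "".join(parts)
```

In a real client you would feed this the lines of the HTTP response body as they arrive, rendering each fragment as soon as it is parsed.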
Use Cases
Translation
{
"name": "translator",
"type": "llm",
"systemPrompt": "You are a translator. Translate the user's text into the language they specify. Preserve formatting.",
"llm": {
"model": "gpt-4o",
"temperature": 0.2,
"maxTokens": 4096
}
}
Summarization
{
"name": "summarizer",
"type": "llm",
"systemPrompt": "Summarize the user's text in 2-3 sentences. Preserve key facts and conclusions.",
"llm": {
"model": "gpt-4o",
"temperature": 0.3,
"maxTokens": 1024
}
}
Classification
{
"name": "classifier",
"type": "llm",
"systemPrompt": "Classify the user's text into one of: positive, negative, neutral. Respond with only the label.",
"llm": {
"model": "gpt-4o",
"temperature": 0.0,
"maxTokens": 10
}
}
Extraction
{
"name": "extractor",
"type": "llm",
"systemPrompt": "Extract entities from the user's text. Return JSON: {\"people\": [], \"organizations\": [], \"dates\": []}",
"llm": {
"model": "gpt-4o",
"temperature": 0.0,
"maxTokens": 2048
}
}
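Since the extractor is prompted to return JSON, the reply is usually parseable directly, but models sometimes wrap output in a markdown code fence. A tolerant parser (an illustrative sketch; `parse_entities` is not part of the API) handles both cases:

```python
import json

def parse_entities(raw):
    """Parse the extractor's JSON reply, tolerating a markdown code fence."""
    cleaned = raw.strip()
    if cleaned.startswith("```"):
        cleaned = cleaned.strip("`")
        # drop an optional "json" language tag left after stripping backticks
        if cleaned.startswith("json"):
            cleaned = cleaned[len("json"):]
    return json.loads(cleaned)
```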
API Reference
Chat Completion
| Endpoint | Method | Description |
|---|---|---|
| /v1/agents/{agentId}/chat | POST | Send messages and get completion |
Request Body
| Field | Type | Required | Description |
|---|---|---|---|
| threadId | string | No | Conversation thread ID |
| messages | array | Yes | Array of {role, content} |
| stream | boolean | No | Enable SSE streaming (default: false) |
Response (Non-Streaming)
{
"id": "chatcmpl-xyz",
"object": "chat.completion",
"choices": [
{
"index": 0,
"message": {
"role": "assistant",
"content": "Bonjour, comment allez-vous ?"
},
"finish_reason": "stop"
}
],
"usage": {
"prompt_tokens": 45,
"completion_tokens": 12,
"total_tokens": 57
}
}
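Reading that response in a client comes down to pulling the first choice's message and the usage totals (a sketch; `read_completion` is an illustrative helper, not part of the API):

```python
import json

def read_completion(body):
    """Extract the assistant's text and total token usage from a
    non-streaming chat completion response (dict or JSON string)."""
    data = json.loads(body) if isinstance(body, str) else body
    choice = data["choices"][0]
    return choice["message"]["content"], data["usage"]["total_tokens"]
```

Check finish_reason as well: "stop" means a complete answer, while "length" means the completion was truncated at maxTokens.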
Next Steps
- RAG Agents — Add document retrieval for grounded answers
- Workflow Agents — Chain LLM calls with tools and logic
- Custom Agents — Build custom agent logic