LLM Agents

Direct LLM interaction without retrieval. Configure system prompts, temperature, token limits, conversation threads, and SSE streaming for translation, summarization, and classification.

LLM Agents provide direct interaction with language models without retrieval. Unlike RAG agents, they don’t search a document corpus — they rely on the model’s knowledge and your system prompt. Use them for translation, summarization, classification, and any task that doesn’t require document lookup.

Direct LLM Interaction

LLM agents send your messages directly to the model. No embeddings, no vector search, no retrieved context. The model responds based on:

  • System prompt — Instructions that define behavior and persona
  • Conversation history — Previous messages in the thread
  • User input — The current message

This makes LLM agents simpler to configure and faster than RAG agents, since no retrieval step runs before the model call.

Configuration

System Prompts

The system prompt sets the agent’s behavior. Define role, tone, and constraints:

{
  "name": "translator",
  "type": "llm",
  "systemPrompt": "You are a professional translator. Translate the user's text into the requested language. Preserve tone and formatting. Do not add commentary.",
  "llm": {
    "model": "gpt-4o",
    "temperature": 0.3,
    "maxTokens": 4096
  }
}

Temperature

Controls randomness of outputs:

Value      Behavior
0.0–0.2    Deterministic, focused — good for classification, extraction
0.3–0.5    Balanced — good for translation, summarization
0.6–1.0    Creative — good for brainstorming, open-ended generation

Token Limits

  • maxTokens — Maximum tokens in the completion. Set based on expected response length.
  • maxContextTokens — Maximum total context (system + history + user). Defaults to model limit (e.g., 128K for GPT-4o).

Conversation Threads

LLM agents support multi-turn conversations via threads:

  1. Create a thread (or use an existing one)
  2. Send messages with threadId
  3. The agent maintains conversation history for context

# First message
curl -X POST "https://api.yourdomain.com/v1/agents/translator/chat" \
  -H "Authorization: Bearer $TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "threadId": "thread_abc123",
    "messages": [{"role": "user", "content": "Translate to French: Hello, how are you?"}],
    "stream": true
  }'

# Follow-up in same thread
curl -X POST "https://api.yourdomain.com/v1/agents/translator/chat" \
  -H "Authorization: Bearer $TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "threadId": "thread_abc123",
    "messages": [{"role": "user", "content": "Now translate that to Spanish"}],
    "stream": true
  }'
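
The two calls above can be sketched in Python. `build_chat_request` is a hypothetical helper (not part of any official SDK) that builds the request body shown in the curl examples; the server keeps the history for each `threadId`, so each follow-up only needs to carry the new user message.

```python
def build_chat_request(thread_id: str, text: str, stream: bool = True) -> dict:
    """Build a chat request body; the server maintains history per threadId."""
    return {
        "threadId": thread_id,
        "messages": [{"role": "user", "content": text}],
        "stream": stream,
    }

# Two turns in the same thread: the follow-up reuses the threadId,
# so "that" resolves against the earlier French translation.
first = build_chat_request("thread_abc123",
                           "Translate to French: Hello, how are you?")
follow_up = build_chat_request("thread_abc123",
                               "Now translate that to Spanish")
```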

SSE Streaming

Responses can be streamed via Server-Sent Events (SSE) for real-time output:

curl -X POST "https://api.yourdomain.com/v1/agents/translator/chat" \
  -H "Authorization: Bearer $TOKEN" \
  -H "Content-Type: application/json" \
  -H "Accept: text/event-stream" \
  -d '{
    "messages": [{"role": "user", "content": "Summarize this article..."}],
    "stream": true
  }'

SSE Event Format

data: {"choices":[{"delta":{"content":"The"},"index":0}]}

data: {"choices":[{"delta":{"content":" article"},"index":0}]}

data: {"choices":[{"delta":{"content":" discusses"},"index":0}]}

data: [DONE]

Parse delta.content to accumulate the streamed response.
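
A minimal accumulator, assuming the event format shown above (each `data:` line carries a JSON chunk, and `[DONE]` terminates the stream):

```python
import json

def accumulate_sse(lines):
    """Accumulate delta.content from SSE data lines into the full response."""
    parts = []
    for line in lines:
        line = line.strip()
        if not line.startswith("data: "):
            continue  # skip blank keep-alive lines between events
        payload = line[len("data: "):]
        if payload == "[DONE]":
            break  # end-of-stream sentinel
        event = json.loads(payload)
        delta = event["choices"][0].get("delta", {})
        if "content" in delta:
            parts.append(delta["content"])
    return "".join(parts)
```

In a real client you would feed this from the response body line by line as chunks arrive, rather than from a pre-collected list.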

Use Cases

Translation

{
  "name": "translator",
  "type": "llm",
  "systemPrompt": "You are a translator. Translate the user's text into the language they specify. Preserve formatting.",
  "llm": {
    "model": "gpt-4o",
    "temperature": 0.2,
    "maxTokens": 4096
  }
}

Summarization

{
  "name": "summarizer",
  "type": "llm",
  "systemPrompt": "Summarize the user's text in 2-3 sentences. Preserve key facts and conclusions.",
  "llm": {
    "model": "gpt-4o",
    "temperature": 0.3,
    "maxTokens": 1024
  }
}

Classification

{
  "name": "classifier",
  "type": "llm",
  "systemPrompt": "Classify the user's text into one of: positive, negative, neutral. Respond with only the label.",
  "llm": {
    "model": "gpt-4o",
    "temperature": 0.0,
    "maxTokens": 10
  }
}
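
Even at temperature 0.0, it is prudent to normalize and validate the classifier's reply before trusting it downstream. A sketch (the label set comes from the system prompt above; the stripping rules are a defensive assumption, not documented behavior):

```python
VALID_LABELS = {"positive", "negative", "neutral"}

def normalize_label(raw: str) -> str:
    """Normalize the model's reply to one of the expected labels.

    The model may add whitespace, casing, or a trailing period despite
    the prompt, so clean the reply before validating it.
    """
    label = raw.strip().lower().rstrip(".")
    if label not in VALID_LABELS:
        raise ValueError(f"unexpected label: {raw!r}")
    return label
```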

Extraction

{
  "name": "extractor",
  "type": "llm",
  "systemPrompt": "Extract entities from the user's text. Return JSON: {\"people\": [], \"organizations\": [], \"dates\": []}",
  "llm": {
    "model": "gpt-4o",
    "temperature": 0.0,
    "maxTokens": 2048
  }
}

API Reference

Chat Completion

Endpoint                     Method   Description
/v1/agents/{agentId}/chat    POST     Send messages and get completion

Request Body

Field      Type      Required   Description
threadId   string    No         Conversation thread ID
messages   array     Yes        Array of {role, content}
stream     boolean   No         Enable SSE streaming (default: false)
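
The field table above can be turned into a pre-flight check. This is a client-side sketch of that spec, not server validation logic; the non-empty-messages rule is an assumption beyond what the table states:

```python
def validate_chat_body(body: dict) -> list:
    """Return a list of problems with a chat request body, per the field table."""
    problems = []
    messages = body.get("messages")
    if not isinstance(messages, list) or not messages:
        problems.append("messages is required and must be a non-empty array")
    else:
        for i, m in enumerate(messages):
            if not isinstance(m, dict) or "role" not in m or "content" not in m:
                problems.append(f"messages[{i}] must have role and content")
    # Optional fields: only check types when present.
    if "threadId" in body and not isinstance(body["threadId"], str):
        problems.append("threadId must be a string")
    if "stream" in body and not isinstance(body["stream"], bool):
        problems.append("stream must be a boolean")
    return problems
```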

Response (Non-Streaming)

{
  "id": "chatcmpl-xyz",
  "object": "chat.completion",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "Bonjour, comment allez-vous ?"
      },
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 45,
    "completion_tokens": 12,
    "total_tokens": 57
  }
}
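
A small helper for pulling the reply text and token usage out of the response shape shown above (`parse_completion` is an illustrative name, not part of any SDK):

```python
def parse_completion(resp: dict):
    """Extract the assistant's text and total token count from a
    non-streaming chat completion response."""
    content = resp["choices"][0]["message"]["content"]
    total_tokens = resp.get("usage", {}).get("total_tokens", 0)
    return content, total_tokens
```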

Next Steps