LLM Agents
Direct LLM interaction without retrieval. Configure system prompts, temperature, token limits, conversation threads, and SSE streaming for translation, summarization, and classification.
LLM Agents provide direct interaction with language models without retrieval. Unlike RAG agents, they don’t search a document corpus — they rely on the model’s knowledge and your system prompt. Use them for translation, summarization, classification, and any task that doesn’t require document lookup.
Direct LLM Interaction
LLM agents send your messages directly to the model. No embeddings, no vector search, no retrieved context. The model responds based on:
- System prompt — Instructions that define behavior and persona
- Conversation history — Previous messages in the thread
- User input — The current message
This makes LLM agents simpler to configure and faster to respond than RAG agents.
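The context assembly described above can be sketched as a plain message list. This is an illustrative helper (the function name `build_context` is not part of the API): system prompt first, then thread history, then the current user input, with no retrieval step anywhere.

```python
def build_context(system_prompt, history, user_input):
    """Assemble the message list an LLM agent sends to the model:
    system prompt, then prior thread messages, then the current input.
    No retrieved documents are injected at any point."""
    return (
        [{"role": "system", "content": system_prompt}]
        + list(history)
        + [{"role": "user", "content": user_input}]
    )
```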
Configuration
System Prompts
The system prompt sets the agent’s behavior. Define role, tone, and constraints:
{
"name": "translator",
"type": "llm",
"systemPrompt": "You are a professional translator. Translate the user's text into the requested language. Preserve tone and formatting. Do not add commentary.",
"llm": {
"model": "gpt-4o",
"temperature": 0.3,
"maxTokens": 4096
}
}
Temperature
Controls randomness of outputs:
| Value | Behavior |
|---|---|
| 0.0–0.2 | Deterministic, focused — good for classification, extraction |
| 0.3–0.5 | Balanced — good for translation, summarization |
| 0.6–1.0 | Creative — good for brainstorming, open-ended generation |
Token Limits
- maxTokens — Maximum tokens in the completion. Set based on expected response length.
- maxContextTokens — Maximum total context (system + history + user). Defaults to model limit (e.g., 128K for GPT-4o).
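Both limits can be set together in the agent's llm block. A sketch, reusing the field names from this page (values here are illustrative, not defaults):

```json
{
  "llm": {
    "model": "gpt-4o",
    "temperature": 0.3,
    "maxTokens": 4096,
    "maxContextTokens": 128000
  }
}
```

If the combined system prompt and thread history approach maxContextTokens, older messages stop fitting, so size it with your expected conversation length in mind.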
Conversation Threads
LLM agents support multi-turn conversations via threads:
- Create a thread (or use an existing one)
- Send messages with threadId
- The agent maintains conversation history for context
# First message
curl -X POST "https://api.yourdomain.com/v1/agents/translator/chat" \
-H "Authorization: Bearer $TOKEN" \
-H "Content-Type: application/json" \
-d '{
"threadId": "thread_abc123",
"messages": [{"role": "user", "content": "Translate to French: Hello, how are you?"}],
"stream": true
}'
# Follow-up in same thread
curl -X POST "https://api.yourdomain.com/v1/agents/translator/chat" \
-H "Authorization: Bearer $TOKEN" \
-H "Content-Type: application/json" \
-d '{
"threadId": "thread_abc123",
"messages": [{"role": "user", "content": "Now translate that to Spanish"}],
"stream": true
}'
SSE Streaming
Responses can be streamed via Server-Sent Events (SSE) for real-time output:
curl -X POST "https://api.yourdomain.com/v1/agents/translator/chat" \
-H "Authorization: Bearer $TOKEN" \
-H "Content-Type: application/json" \
-H "Accept: text/event-stream" \
-d '{
"messages": [{"role": "user", "content": "Summarize this article..."}],
"stream": true
}'
SSE Event Format
data: {"choices":[{"delta":{"content":"The"},"index":0}]}
data: {"choices":[{"delta":{"content":" article"},"index":0}]}
data: {"choices":[{"delta":{"content":" discusses"},"index":0}]}
data: [DONE]
Parse delta.content to accumulate the streamed response.
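A minimal parser for that stream might look like this (an illustrative sketch; `accumulate_sse` is not part of the API). It skips non-data lines, stops at the `[DONE]` sentinel, and joins the `delta.content` fragments:

```python
import json

def accumulate_sse(lines):
    """Accumulate delta.content fields from SSE data lines into one string."""
    parts = []
    for line in lines:
        line = line.strip()
        if not line.startswith("data: "):
            continue  # ignore comments, blank keep-alives, etc.
        payload = line[len("data: "):]
        if payload == "[DONE]":
            break  # end-of-stream sentinel
        chunk = json.loads(payload)
        delta = chunk["choices"][0].get("delta", {})
        if "content" in delta:
            parts.append(delta["content"])
    return "".join(parts)
```

In a real client you would feed this the lines of the HTTP response body as they arrive, rendering each fragment as soon as it is parsed.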
Use Cases
Translation
{
"name": "translator",
"type": "llm",
"systemPrompt": "You are a translator. Translate the user's text into the language they specify. Preserve formatting.",
"llm": {
"model": "gpt-4o",
"temperature": 0.2,
"maxTokens": 4096
}
}
Summarization
{
"name": "summarizer",
"type": "llm",
"systemPrompt": "Summarize the user's text in 2-3 sentences. Preserve key facts and conclusions.",
"llm": {
"model": "gpt-4o",
"temperature": 0.3,
"maxTokens": 1024
}
}
Classification
{
"name": "classifier",
"type": "llm",
"systemPrompt": "Classify the user's text into one of: positive, negative, neutral. Respond with only the label.",
"llm": {
"model": "gpt-4o",
"temperature": 0.0,
"maxTokens": 10
}
}
Extraction
{
"name": "extractor",
"type": "llm",
"systemPrompt": "Extract entities from the user's text. Return JSON: {\"people\": [], \"organizations\": [], \"dates\": []}",
"llm": {
"model": "gpt-4o",
"temperature": 0.0,
"maxTokens": 2048
}
}
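Since the extractor is prompted to return JSON, the reply is usually parseable directly, but models sometimes wrap output in a markdown code fence. A tolerant parser (an illustrative sketch; `parse_entities` is not part of the API) handles both cases:

```python
import json

def parse_entities(raw):
    """Parse the extractor's JSON reply, tolerating a markdown code fence."""
    cleaned = raw.strip()
    if cleaned.startswith("```"):
        cleaned = cleaned.strip("`")
        # drop an optional "json" language tag left after stripping backticks
        if cleaned.startswith("json"):
            cleaned = cleaned[len("json"):]
    return json.loads(cleaned)
```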
API Reference
Chat Completion
| Endpoint | Method | Description |
|---|---|---|
| /v1/agents/{agentId}/chat | POST | Send messages and get completion |
Request Body
| Field | Type | Required | Description |
|---|---|---|---|
| threadId | string | No | Conversation thread ID |
| messages | array | Yes | Array of {role, content} |
| stream | boolean | No | Enable SSE streaming (default: false) |
Response (Non-Streaming)
{
"id": "chatcmpl-xyz",
"object": "chat.completion",
"choices": [
{
"index": 0,
"message": {
"role": "assistant",
"content": "Bonjour, comment allez-vous ?"
},
"finish_reason": "stop"
}
],
"usage": {
"prompt_tokens": 45,
"completion_tokens": 12,
"total_tokens": 57
}
}
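Reading that response in a client comes down to pulling the first choice's message and the usage totals (a sketch; `read_completion` is an illustrative helper, not part of the API):

```python
import json

def read_completion(body):
    """Extract the assistant's text and total token usage from a
    non-streaming chat completion response (dict or JSON string)."""
    data = json.loads(body) if isinstance(body, str) else body
    choice = data["choices"][0]
    return choice["message"]["content"], data["usage"]["total_tokens"]
```

Check finish_reason as well: "stop" means a complete answer, while "length" means the completion was truncated at maxTokens.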
Next Steps
- RAG Agents — Add document retrieval for grounded answers
- Workflow Agents — Chain LLM calls with tools and logic
- Custom Agents — Build custom agent logic