AI Generation
Generate AI Answer
Generate AI-powered answers using your data as context or direct LLM calls.
POST
Overview
Generate AI-powered answers in one of two modes:
- Search Mode: provide a namespace name; Moorcheh searches your text namespace and uses the retrieved chunks as RAG context
- Direct AI Mode: set namespace to "" (empty string) for a direct LLM call without retrieval
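To illustrate the two modes, here is a minimal sketch that builds the JSON body for each. The helper names `build_answer_payload` and `is_direct_mode` and the namespace `my-docs` are hypothetical; only the `query` and `namespace` fields come from this page. The request itself is left out because the endpoint URL depends on your deployment.

```python
import json


def build_answer_payload(query: str, namespace: str = "") -> dict:
    """Build a request body; an empty namespace selects Direct AI Mode."""
    return {"query": query, "namespace": namespace}


def is_direct_mode(payload: dict) -> bool:
    """Direct AI Mode is signalled by namespace == "" (no retrieval)."""
    return payload["namespace"] == ""


# Search Mode: "my-docs" is a placeholder namespace name (assumption).
search_payload = build_answer_payload("What is our refund policy?", "my-docs")

# Direct AI Mode: empty namespace, so the LLM is called without retrieval.
direct_payload = build_answer_payload("Explain RAG in one sentence.")

print(json.dumps(search_payload))
print(json.dumps(direct_payload))
```

Serializing with `json.dumps` gives a body suitable for a POST with an `application/json` content type.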
LLM providers: Ollama, OpenAI, or Cohere. Configure once with moorcheh configure (saved under llm in ~/.moorcheh/config.json). Override the model per request with ai_model.
After changing LLM (or embedding) settings, run moorcheh down and then moorcheh up so the running server loads the new config; moorcheh configure does not restart Docker. Check GET /health for llm_provider and llm_model.
Headers
Content-Type: must be application/json
Body Parameters
- query: The user’s question or query to be answered
- namespace: Namespace name for Search Mode, or empty string "" for Direct AI Mode
- Number of top relevant chunks to retrieve for your query (Search Mode only). Clamped to 1–100.
- threshold: Minimum relevance score threshold (0–1). Required when kiosk_mode is true.
- temperature: AI creativity level (0.0–2.0). Higher = more creative.
- Search type for RAG. Only "text" is supported on-prem.
- ai_model: Override the configured LLM model for this request
- kiosk_mode: When true, threshold is required and chunks below the threshold are filtered out (Search Mode).
- chat_history: Previous conversation turns: [{"role":"user"|"assistant","content":"..."}]
- Custom system instruction prepended to the prompt
- Custom instruction appended before the user query
- structured_response: When set with enabled: true, the API parses JSON from the model into structured_data. Optional: schema (JSON Schema object).
Available LLM models (configure defaults)
| Provider | Example model IDs | Notes |
|---|---|---|
| ollama | qwen2.5, llama3.2, mistral | Local; no API key |
| openai | gpt-5.5, gpt-5, gpt-4o-mini | Requires API key in config |
| cohere | command-a-plus-05-2026, command-r-plus-08-2024, command-r-08-2024 | Requires API key in config |
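The body-parameter constraints on this page (temperature within 0.0–2.0, threshold within 0–1, threshold required when kiosk_mode is true) can be checked on the client before a request is sent, and a model ID from the table can be passed as a per-request ai_model override. The sketch below is one way to do this; the helper name `build_generation_options` and the namespace `my-docs` are hypothetical, and how the server itself handles out-of-range values is not specified here.

```python
from typing import Optional


def build_generation_options(
    temperature: Optional[float] = None,
    threshold: Optional[float] = None,
    kiosk_mode: bool = False,
    ai_model: Optional[str] = None,
) -> dict:
    """Collect optional body fields, rejecting values outside the documented ranges."""
    if temperature is not None and not 0.0 <= temperature <= 2.0:
        raise ValueError("temperature must be within 0.0-2.0")
    if threshold is not None and not 0.0 <= threshold <= 1.0:
        raise ValueError("threshold must be within 0-1")
    if kiosk_mode and threshold is None:
        raise ValueError("threshold is required when kiosk_mode is true")

    options: dict = {}
    if temperature is not None:
        options["temperature"] = temperature
    if threshold is not None:
        options["threshold"] = threshold
    if kiosk_mode:
        options["kiosk_mode"] = True
    if ai_model is not None:
        # Per-request override of the configured default model.
        options["ai_model"] = ai_model
    return options


# "my-docs" is a placeholder namespace name (assumption).
body = {"query": "Summarize the onboarding guide.", "namespace": "my-docs"}
body.update(
    build_generation_options(
        temperature=0.2, threshold=0.3, kiosk_mode=True, ai_model="gpt-4o-mini"
    )
)
print(body)
```

Omitting `ai_model` leaves the field out of the body, so the server falls back to the model saved by moorcheh configure.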
Response Fields
- The AI-generated answer text
- The LLM model ID used for generation
- Number of context chunks retrieved (Search Mode). 0 in Direct AI Mode.
- The original query submitted
- Present when structured_response is enabled: whether RAG context was used.
- structured_data: Parsed JSON when structured_response.enabled is true.
Important Notes
- Search Mode requires a text namespace with indexed documents
- Direct AI Mode uses only LLM fields (namespace, query, temperature, chat_history, prompts, ai_model, structured_response)
- Configure LLM provider and default model with moorcheh configure or edit ~/.moorcheh/config.json
- /health reports llm_provider and llm_model alongside embedding settings
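After a configuration change and restart, it can help to confirm that the running server picked up the intended provider and model. The following sketch assumes the config file is a JSON object with an llm key and that the /health body is a JSON object containing llm_provider and llm_model, as described above. The helper names and the sample health body are hypothetical, and the layout inside the llm section is not assumed.

```python
import json
from pathlib import Path


def load_llm_config(path: Path) -> dict:
    """Return the llm section of a Moorcheh config file, or {} if the file is absent."""
    if not path.exists():
        return {}
    return json.loads(path.read_text(encoding="utf-8")).get("llm", {})


def health_matches(health: dict, provider: str, model: str) -> bool:
    """True when a parsed /health body reports the expected provider and model."""
    return health.get("llm_provider") == provider and health.get("llm_model") == model


# Sample /health body for illustration only; a real response carries more fields.
sample_health = {"llm_provider": "ollama", "llm_model": "llama3.2"}

print(health_matches(sample_health, "ollama", "llama3.2"))
print(health_matches(sample_health, "openai", "gpt-4o-mini"))

# Inspect whatever is stored under "llm" in the local config, if present.
print(load_llm_config(Path.home() / ".moorcheh" / "config.json"))
```

In practice, the dictionary passed to `health_matches` would be the parsed JSON from GET /health on your server; a mismatch after editing the config usually means moorcheh down and moorcheh up have not been run yet.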