POST /answer
curl -X POST "http://localhost:8080/answer" \
  -H "Content-Type: application/json" \
  -d '{
    "namespace": "my-documents",
    "query": "What are the main benefits of using serverless architecture?",
    "top_k": 5
  }'
{
  "answer": "Serverless architecture offers several benefits...",
  "model": "gpt-4o-mini",
  "context_count": 3,
  "query": "What are the main benefits of using serverless architecture?"
}

Overview

Generate AI-powered answers with two modes:
  • Search Mode — provide a namespace name; Moorcheh searches your text namespace and uses retrieved chunks as RAG context
  • Direct AI Mode — set namespace to "" (empty string) for a direct LLM call without retrieval
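As a minimal sketch of Direct AI Mode, the request below sends an empty namespace so that no retrieval takes place. The query text and temperature value are illustrative. The post_answer helper and the MOORCHEH_URL variable are hypothetical conveniences, not part of Moorcheh; the URL falls back to the http://localhost:8080 address used in the request example.

```shell
# Request body for Direct AI Mode: "namespace" is an empty string, so no retrieval runs.
# The query and temperature values are illustrative.
DIRECT_BODY='{
  "namespace": "",
  "query": "Explain retrieval-augmented generation in two sentences.",
  "temperature": 0.2
}'

# Hypothetical helper: POST a JSON body (first argument) to /answer.
# MOORCHEH_URL is an assumed override; it falls back to the documented local address.
post_answer() {
  curl -sS -X POST "${MOORCHEH_URL:-http://localhost:8080}/answer" \
    -H "Content-Type: application/json" \
    -d "$1"
}

# With the server running, send the request:
#   post_answer "$DIRECT_BODY"
```

Per the response field descriptions, context_count is 0 for a Direct AI Mode call.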
LLM providers: Ollama, OpenAI, or Cohere. Configure once with moorcheh configure (saved under llm in ~/.moorcheh/config.json). Override the model per request with ai_model.
After changing LLM (or embedding) settings, run moorcheh down then moorcheh up so the running server loads the new config. moorcheh configure does not restart Docker. Check GET /health for llm_provider and llm_model.
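The restart-and-verify step might be scripted as in the sketch below. It assumes that moorcheh up returns once the server has started, that the server listens on http://localhost:8080 as in the request example, and that llm_provider and llm_model appear in the /health response as plain string fields. The json_field and reload_and_check helpers and the MOORCHEH_URL variable are hypothetical names. The sed pattern is deliberately naive and only handles flat string fields.

```shell
# Naive extractor (hypothetical helper): print the first string value of field $1
# from JSON on stdin. Adequate for flat string fields; use a real JSON parser otherwise.
json_field() {
  sed -n 's/.*"'"$1"'"[[:space:]]*:[[:space:]]*"\([^"]*\)".*/\1/p' | head -n 1
}

# Restart the server so it loads the new config, then report which LLM it is using.
reload_and_check() {
  moorcheh down && moorcheh up || return 1
  health=$(curl -sS "${MOORCHEH_URL:-http://localhost:8080}/health") || return 1
  printf 'provider: %s\n' "$(printf '%s' "$health" | json_field llm_provider)"
  printf 'model:    %s\n' "$(printf '%s' "$health" | json_field llm_model)"
}

# After changing settings with moorcheh configure:
#   reload_and_check
```

If the server needs a moment to become ready after starting, the health request may have to be retried.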
If you change the embedding provider or model while text data already exists under ~/.moorcheh/data, RAG answers for existing namespaces may be wrong until you re-upload the documents so they are embedded with the new model, because the stored vectors were produced by the old one. Changing only the answer (LLM) model does not require re-uploading.

Headers

Content-Type
string
required
Must be application/json

Body Parameters

query
string
required
The user’s question or query to be answered
namespace
string
required
Namespace name for Search Mode, or empty string "" for Direct AI Mode
top_k
number
default:"10"
Number of most relevant chunks to retrieve for the query (Search Mode only). Clamped to 1–100.
threshold
number
Minimum relevance score threshold (0–1). Required when kiosk_mode is true.
temperature
number
default:"0.7"
Sampling temperature (0.0–2.0). Higher values produce more varied, creative output.
type
string
default:"text"
Search type for RAG. Only "text" is supported on-prem.
ai_model
string
Override the configured LLM model for this request
kiosk_mode
boolean
default:"false"
When true, threshold is required and chunks below the threshold are filtered out (Search Mode).
chat_history
array
default:"[]"
Previous conversation turns: [{"role":"user"|"assistant","content":"..."}]
header_prompt
string
Custom system instruction prepended to the prompt
Custom instruction appended before the user query
structured_response
object
When set with enabled: true, the API parses JSON from the model into structured_data. Optional: schema (JSON Schema object).
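To illustrate how several of these parameters combine, here is a sketch of a Search Mode request that enables kiosk_mode (and therefore supplies threshold), passes earlier turns in chat_history, and sets a header_prompt. The threshold value, conversation text, and prompt wording are made up for illustration. The kiosk_body_ok function is a hypothetical client-side check that mirrors the documented rule using plain text matching, so it is only a rough pre-flight guard.

```shell
# Search Mode body: kiosk_mode is true, so "threshold" must be present.
# All values below are illustrative.
KIOSK_BODY='{
  "namespace": "my-documents",
  "query": "Does that apply to sale items?",
  "top_k": 5,
  "kiosk_mode": true,
  "threshold": 0.25,
  "chat_history": [
    {"role": "user", "content": "What is our refund policy?"},
    {"role": "assistant", "content": "Refunds are available within 30 days of purchase."}
  ],
  "header_prompt": "Answer only from the provided context."
}'

# Hypothetical pre-flight check: succeed unless kiosk_mode is true and threshold is missing.
# It matches the exact spelling '"kiosk_mode": true', so it is not a JSON validator.
kiosk_body_ok() {
  case "$1" in
    *'"kiosk_mode": true'*)
      case "$1" in
        *'"threshold"'*) return 0 ;;
        *) return 1 ;;
      esac ;;
    *) return 0 ;;
  esac
}

# With the server running:
#   kiosk_body_ok "$KIOSK_BODY" && curl -sS -X POST "http://localhost:8080/answer" \
#     -H "Content-Type: application/json" -d "$KIOSK_BODY"
```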

Available LLM models (configure defaults)

Provider | Example model IDs | Notes
ollama | qwen2.5, llama3.2, mistral | Local; no API key
openai | gpt-5.5, gpt-5, gpt-4o-mini | Requires API key in config
cohere | command-a-plus-05-2026, command-r-plus-08-2024, command-r-08-2024 | Requires API key in config
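A per-request model override could look like the sketch below, which sets ai_model to one of the example IDs listed for openai. It assumes openai is the configured provider, since whether ai_model can select a model from a different provider is not stated here. The query text is illustrative.

```shell
# Example model ID; assumes the openai provider is configured.
AI_MODEL="gpt-4o-mini"

# Build the body with printf. This is safe here because neither value contains quotes
# or backslashes; arbitrary user text would need proper JSON escaping first.
OVERRIDE_BODY=$(printf '{"namespace": "my-documents", "query": "%s", "ai_model": "%s"}' \
  "Summarize the onboarding guide." "$AI_MODEL")

# With the server running:
#   curl -sS -X POST "http://localhost:8080/answer" \
#     -H "Content-Type: application/json" -d "$OVERRIDE_BODY"
```

The model field of the response reports the model ID that was used, which gives a quick way to confirm the override took effect.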

Response Fields

answer
string
The AI-generated answer text
model
string
The LLM model ID used for generation
context_count
number
Number of context chunks retrieved (Search Mode). 0 in Direct AI Mode.
query
string
The original query submitted
used_context
boolean
Present when structured_response is enabled. Indicates whether RAG context was used.
structured_data
object
Parsed JSON when structured_response.enabled is true.
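A sketch of a request that turns on structured_response with the optional schema follows. The schema itself (a summary string plus a list of key points) and the query are invented for illustration; only the enabled and schema keys come from the parameter description.

```shell
# Request body with structured_response enabled and an illustrative JSON Schema.
STRUCTURED_BODY='{
  "namespace": "my-documents",
  "query": "Summarize the onboarding guide.",
  "structured_response": {
    "enabled": true,
    "schema": {
      "type": "object",
      "properties": {
        "summary": {"type": "string"},
        "key_points": {"type": "array", "items": {"type": "string"}}
      },
      "required": ["summary"]
    }
  }
}'

# With the server running:
#   curl -sS -X POST "http://localhost:8080/answer" \
#     -H "Content-Type: application/json" -d "$STRUCTURED_BODY"
```

With this option enabled, the response is expected to carry structured_data and used_context in addition to the usual fields.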

Important Notes

  • Search Mode requires a text namespace with indexed documents
  • Direct AI Mode uses only LLM fields (namespace, query, temperature, chat_history, prompts, ai_model, structured_response)
  • Configure LLM provider and default model with moorcheh configure or edit ~/.moorcheh/config.json
  • /health reports llm_provider and llm_model alongside embedding settings