POST /answer
```bash
curl -X POST "http://localhost:8080/answer" \
  -H "Content-Type: application/json" \
  -d '{
    "query": "Who won the football match?",
    "query_vector": [0.01, -0.02, "... 768 floats ..."],
    "top_k": 5
  }'
```

```json
{
  "answer": "Manchester United beat Chelsea 2-1.",
  "model": "llama3.2:1b-instruct-q4_K_M",
  "query": "Who won the football match?",
  "context_count": 1,
  "sources": [
    {
      "id": "doc-1",
      "score": 0.894123,
      "label": "Close Match",
      "text": "Manchester United beat Chelsea 2-1 in the Premier League on Saturday."
    }
  ]
}
```

Overview

Run retrieval-augmented generation (RAG) on the local store:
  1. Search using the provided query_vector (same dimension as the store).
  2. Build a prompt from the top matching passages.
  3. Call Ollama on the host with the fixed model llama3.2:1b-instruct-q4_K_M.
The CLI and SDK embed the query locally before calling this endpoint; direct HTTP callers must supply query_vector themselves.
This endpoint requires Ollama to be running on the host. Start the stack with moorcheh-edge up (add --skip-ollama for a search-only stack).
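Because /answer returns HTTP 400 when the LLM is not configured or unreachable, a client may want to check Ollama before sending requests. The sketch below is a hypothetical helper, not part of moorcheh-edge: it assumes Ollama listens on its default address (http://127.0.0.1:11434) and uses Ollama's own GET /api/tags endpoint, which lists locally pulled models. OLLAMA_URL, ANSWER_MODEL, and ollama_ready are illustrative names.

```bash
# Preflight check before calling POST /answer (a sketch, not part of moorcheh-edge).
# Assumption: Ollama listens on its default address; OLLAMA_URL is a
# hypothetical override for hosts where it does not.
OLLAMA_URL=${OLLAMA_URL:-http://127.0.0.1:11434}
ANSWER_MODEL="llama3.2:1b-instruct-q4_K_M"   # the fixed model used by /answer

# Prints a status line; returns non-zero if Ollama or the model is missing.
ollama_ready() {
  local tags
  # GET /api/tags is Ollama's endpoint for listing locally available models.
  if ! tags=$(curl -sf --max-time 5 "$OLLAMA_URL/api/tags"); then
    echo "Ollama is not reachable at $OLLAMA_URL" >&2
    return 1
  fi
  # A substring match on the JSON is enough for a quick preflight check.
  if [[ $tags != *"\"$ANSWER_MODEL\""* ]]; then
    echo "Model $ANSWER_MODEL not found (try: ollama pull $ANSWER_MODEL)" >&2
    return 1
  fi
  echo "Ollama is ready with $ANSWER_MODEL"
}
```

A typical use is to gate requests on the check, for example `ollama_ready && curl -X POST "http://localhost:8080/answer" ...`.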

Request body

query
string
required
Original question text (included in the LLM prompt and echoed in the response).
query_vector
array
required
JSON array of floats used for similarity search. Length must match the store dimension (768 for text stores).
top_k
number
default: 5
Number of passages to retrieve for context. Capped at 100.
threshold
number
default: 0
Minimum search score a passage needs to be used as context when kiosk_mode is true; ignored otherwise.
kiosk_mode
boolean
default: false
When true, drops retrieved passages whose score is below threshold before they are passed to the LLM.
header_prompt
string
Optional system instruction (replaces the default RAG system prompt).
Optional instruction inserted before the user question in the final user message.
chat_history
array
Prior conversation turns, as an array of objects of the form {"role": "user"|"assistant", "content": "..."}.
temperature
number
default: 0.2
LLM sampling temperature (0.0–2.0).
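Because query_vector must be a flat array of floats whose length matches the store, it can help to build the request body with a dimension check before posting. The helpers below are a hypothetical bash sketch, not part of the moorcheh-edge CLI or SDK: they assume the embedding sits in a file with one float per line, that EXPECTED_DIM (default 768) matches your store, and they emit chat_history only when turns are passed.

```bash
# Hypothetical helpers for building a POST /answer body in plain bash.
# Assumptions: the embedding file holds one float per line, produced by the
# same embedding model as the store; EXPECTED_DIM defaults to 768 (text stores).
EXPECTED_DIM=${EXPECTED_DIM:-768}

# Escape a string for use inside a JSON string literal.
json_escape() {
  local s=$1
  s=${s//\\/\\\\}     # backslashes first, so later escapes are not doubled
  s=${s//\"/\\\"}     # double quotes
  s=${s//$'\n'/\\n}   # newlines
  s=${s//$'\r'/\\r}   # carriage returns
  s=${s//$'\t'/\\t}   # tabs
  printf '%s' "$s"
}

# Usage: history_turn ROLE CONTENT   -> one chat_history entry
history_turn() {
  case $1 in
    user|assistant) ;;
    *) echo "role must be user or assistant, got: $1" >&2; return 1 ;;
  esac
  printf '{"role":"%s","content":"%s"}' "$1" "$(json_escape "$2")"
}

# Usage: answer_body QUERY VECTOR_FILE [TOP_K] [TURN_JSON...]
answer_body() {
  local query=$1 vec_file=$2 top_k=${3:-5}
  shift $(( $# < 3 ? $# : 3 ))         # remaining args are chat_history turns
  local -a vec=()
  local line
  while IFS= read -r line || [[ -n $line ]]; do
    if [[ -n $line ]]; then vec+=("$line"); fi   # skip blank lines
  done < "$vec_file"
  if (( ${#vec[@]} != EXPECTED_DIM )); then
    echo "query_vector has ${#vec[@]} values; the store expects $EXPECTED_DIM" >&2
    return 1
  fi
  local floats extra=""
  floats=$(IFS=,; printf '%s' "${vec[*]}")       # comma-separated floats
  if (( $# > 0 )); then
    extra=",\"chat_history\":[$(IFS=,; printf '%s' "$*")]"
  fi
  printf '{"query":"%s","query_vector":[%s],"top_k":%d%s}\n' \
    "$(json_escape "$query")" "$floats" "$top_k" "$extra"
}
```

The body can then be piped straight into curl (query.vec is a placeholder file name):

```bash
answer_body "Who won the football match?" query.vec 5 \
  | curl -X POST "http://localhost:8080/answer" \
      -H "Content-Type: application/json" --data-binary @-
```

Other optional fields (threshold, kiosk_mode, header_prompt, temperature) could be added to the printf format in the same way.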

Response fields

| Field | Description |
|---|---|
| answer | Generated answer text |
| model | LLM model id (llama3.2:1b-instruct-q4_K_M) |
| query | Echo of the request question |
| context_count | Number of passages passed to the LLM |
| sources | Search hits used as context (same shape as /search results) |
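One way to read top_k, threshold, and kiosk_mode together, and to see why context_count can be smaller than top_k, is the filter below. It is an illustrative model of the documented behavior, not the server's implementation: it assumes search hits arrive as score<TAB>id lines sorted by descending score, and that a passage whose score equals threshold is kept.

```bash
# Illustrative model of context selection for /answer (not the server's code).
# stdin: "score<TAB>id" lines sorted by descending score.
# Usage: select_context TOP_K KIOSK_MODE THRESHOLD
select_context() {
  local top_k=$1 kiosk=$2 threshold=$3
  if (( top_k > 100 )); then top_k=100; fi   # documented cap on top_k
  awk -F'\t' -v k="$top_k" -v kiosk="$kiosk" -v t="$threshold" '
    NR > k { exit }                              # retrieve at most top_k hits
    kiosk == "true" && $1 + 0 < t + 0 { next }   # kiosk_mode drops low scores
    { print }                                    # survivors are the LLM context
  '
}
```

Under this reading, context_count is the number of lines that survive, so a kiosk_mode request with a high threshold can end up with a context_count of 0 while still returning HTTP 200.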

Errors

| Condition | Status | Message (example) |
|---|---|---|
| LLM not configured | 400 | LLM is not configured: start Ollama on the host and run moorcheh-edge up |
| LLM unreachable or error | 400 | LLM request failed |
| Empty store / no matches | 200 | No error; the answer may state that there is insufficient context |
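Since LLM failures come back as HTTP 400 with a message, a shell client can branch on the status code. The helper below is a hypothetical sketch: ANSWER_URL and post_answer are illustrative names, the default URL is the one used in the curl example, and 000 is the status curl reports when it cannot connect at all (for example, when the stack is not up).

```bash
# Client-side sketch: POST a JSON body from stdin to /answer and branch on status.
# ANSWER_URL and post_answer are hypothetical names.
ANSWER_URL=${ANSWER_URL:-http://localhost:8080/answer}

# Usage: post_answer < body.json
# On 200 prints the response body; otherwise reports to stderr and fails.
post_answer() {
  local resp status body
  # -w appends the HTTP status code on its own line after the body.
  resp=$(curl -s -w '\n%{http_code}' -X POST "$ANSWER_URL" \
    -H "Content-Type: application/json" --data-binary @-)
  status=${resp##*$'\n'}   # last line: status code (000 = no connection)
  body=${resp%$'\n'*}      # everything before it: response body
  case $status in
    200) printf '%s\n' "$body" ;;
    400) echo "answer failed (400): $body" >&2; return 1 ;;  # LLM not configured or failed
    000) echo "cannot reach $ANSWER_URL" >&2; return 2 ;;
    *)   echo "unexpected status $status: $body" >&2; return 1 ;;
  esac
}
```

For example, `post_answer < body.json | jq -r '.answer'` prints only the answer text when jq is installed (body.json is a placeholder). Note that a 200 response does not guarantee useful context: with an empty store or no matches, the answer itself may say the context is insufficient.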