Answer stream (RAG) - Moorcheh Documentation

curl -N -X POST "http://localhost:8080/answer/stream" \
  -H "Content-Type: application/json" \
  -d '{
    "query": "Who won the football match?",
    "query_vector": [0.01, -0.02, "... 768 floats ..."],
    "top_k": 5
  }'

event: meta
data: {"model":"llama3.2:1b-instruct-q4_K_M","context_count":1,"sources":[...],"query":"Who won the football match?"}

event: token
data: {"delta":"Manchester"}

event: token
data: {"delta":" United beat Chelsea 2-1."}

event: done
data: {"answer":"Manchester United beat Chelsea 2-1.","query":"Who won the football match?","model":"llama3.2:1b-instruct-q4_K_M","context_count":1}

POST /answer/stream

Overview

Same retrieval and prompting as POST /answer, but the LLM response is streamed as Server-Sent Events (SSE) instead of a single JSON body. Typical event order:

meta - model, sources, and context count (sent once)
token - one or more answer text deltas
done - full answer echo and metadata
error - if the LLM call fails (instead of done)

Requires Ollama running on the host. Start the stack with moorcheh-edge up (use --skip-ollama for search-only).
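The event order above can be read with a small SSE line parser. The following is a minimal sketch, not the official client: the function name `iter_sse_events` and the `SAMPLE` text are illustrative assumptions, and the sample replaces the elided `sources` value with an empty list so that it is valid JSON.

```python
import json
from typing import Iterable, Iterator, Tuple


def iter_sse_events(lines: Iterable[str]) -> Iterator[Tuple[str, dict]]:
    """Yield (event_name, data) pairs from already-decoded SSE text lines."""
    event_name = "message"  # SSE default when no "event:" field is present
    data_lines = []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if not line:
            # A blank line ends the current event.
            if data_lines:
                yield event_name, json.loads("\n".join(data_lines))
            event_name, data_lines = "message", []
            continue
        if line.startswith(":"):
            continue  # comment / keep-alive line
        field, _, value = line.partition(":")
        if value.startswith(" "):
            value = value[1:]  # the SSE format strips one leading space
        if field == "event":
            event_name = value
        elif field == "data":
            data_lines.append(value)
    if data_lines:
        # Lenient choice: flush a final event even without a trailing blank line.
        yield event_name, json.loads("\n".join(data_lines))


# Sample stream modeled on the example response ("sources" emptied for validity).
SAMPLE = """\
event: meta
data: {"model":"llama3.2:1b-instruct-q4_K_M","context_count":1,"sources":[],"query":"Who won the football match?"}

event: token
data: {"delta":"Manchester"}

event: token
data: {"delta":" United beat Chelsea 2-1."}

event: done
data: {"answer":"Manchester United beat Chelsea 2-1.","query":"Who won the football match?","model":"llama3.2:1b-instruct-q4_K_M","context_count":1}

"""

events = list(iter_sse_events(SAMPLE.splitlines()))
for name, data in events:
    print(name, data)
```

Against a live server, the same parser could be fed decoded lines from the HTTP response, for example `iter_sse_events(raw.decode("utf-8") for raw in response)` when `response` comes from `urllib.request.urlopen`.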

Request body

Same fields as Answer (RAG):

query (string, required) - Original question text (included in the LLM prompt and echoed in the response).
query_vector (array, required) - JSON array of floats used for similarity search. Length must match the store dimension (768 for text stores).
top_k (number, default: 5) - Number of passages to retrieve for context. Capped at 100.
threshold (number, default: 0) - Minimum search score when kiosk_mode is true.
kiosk_mode (boolean, default: false) - When true, filters retrieved passages below threshold.
header_prompt (string) - Optional system instruction (replaces the default RAG system prompt).
footer_prompt (string) - Optional instruction appended before the user question in the final user message.
chat_history (array) - Prior turns: [{"role": "user"|"assistant", "content": "..."}].
temperature (number, default: 0.2) - LLM sampling temperature (0.0–2.0).
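As a rough illustration of how these fields fit together, the sketch below assembles a request body and applies the documented limits on the client side. The helper `build_answer_request` and its checks are assumptions for illustration only; the server remains the authority on validation, and clamping `top_k` locally simply mirrors the documented cap.

```python
import json
from typing import Optional, Sequence

STORE_DIMENSION = 768  # text stores, per the query_vector description
MAX_TOP_K = 100        # documented cap on top_k


def build_answer_request(
    query: str,
    query_vector: Sequence[float],
    *,
    top_k: int = 5,
    threshold: float = 0.0,
    kiosk_mode: bool = False,
    temperature: float = 0.2,
    header_prompt: Optional[str] = None,
    footer_prompt: Optional[str] = None,
    chat_history: Optional[list] = None,
    dimension: int = STORE_DIMENSION,
) -> dict:
    """Return a JSON-serializable body for POST /answer/stream (hypothetical helper)."""
    if not query:
        raise ValueError("query is required")
    if len(query_vector) != dimension:
        raise ValueError(
            f"query_vector has {len(query_vector)} values, expected {dimension}"
        )
    if not 0.0 <= temperature <= 2.0:
        raise ValueError("temperature must be between 0.0 and 2.0")
    for turn in chat_history or []:
        if turn.get("role") not in ("user", "assistant"):
            raise ValueError("chat_history roles must be 'user' or 'assistant'")

    body = {
        "query": query,
        "query_vector": [float(x) for x in query_vector],
        "top_k": min(top_k, MAX_TOP_K),
        "threshold": threshold,
        "kiosk_mode": kiosk_mode,
        "temperature": temperature,
    }
    # Optional fields are left out when unset so the server defaults apply.
    if header_prompt is not None:
        body["header_prompt"] = header_prompt
    if footer_prompt is not None:
        body["footer_prompt"] = footer_prompt
    if chat_history:
        body["chat_history"] = chat_history
    return body


# Placeholder vector for illustration; a real one comes from an embedding model.
example_body = build_answer_request(
    "Who won the football match?",
    [0.0] * STORE_DIMENSION,
    kiosk_mode=True,
    threshold=0.3,
    chat_history=[{"role": "user", "content": "Tell me about last night's game."}],
)
payload = json.dumps(example_body)
```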

SSE events

Event	When	Data shape
`meta`	Once, before tokens	`model`, `context_count`, `sources`, `query`
`token`	Per LLM delta	`{"delta": "..."}`
`done`	Stream complete	`answer`, `query`, `model`, `context_count`
`error`	LLM failure	`{"message": "..."}`

The sources array in meta matches the shape returned by POST /search (id, score, label, text).
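One way to act on the four event types is a small dispatcher over already-parsed `(event, data)` pairs. This is a sketch under assumptions: `consume_events` and the `on_delta` callback are hypothetical names, and falling back to the joined deltas when `done` carries no `answer` is a defensive choice, not documented behavior.

```python
from typing import Callable, Iterable, Optional, Tuple


def consume_events(
    events: Iterable[Tuple[str, dict]],
    on_delta: Optional[Callable[[str], None]] = None,
) -> dict:
    """Fold a meta/token/done event sequence into a single result dict."""
    result = {"model": None, "sources": [], "answer": None}
    parts = []
    for name, data in events:
        if name == "meta":
            # Sent once, before any tokens.
            result["model"] = data.get("model")
            result["sources"] = data.get("sources", [])
        elif name == "token":
            parts.append(data["delta"])
            if on_delta is not None:
                on_delta(data["delta"])  # e.g. print the text as it arrives
        elif name == "done":
            # Prefer the server's full answer; fall back to the joined deltas.
            result["answer"] = data.get("answer", "".join(parts))
            return result
        elif name == "error":
            raise RuntimeError(data.get("message", "unknown stream error"))
    raise RuntimeError("stream closed before a done or error event")


# Events modeled on the example response; the source entry is a made-up placeholder.
sample_events = [
    ("meta", {
        "model": "llama3.2:1b-instruct-q4_K_M",
        "context_count": 1,
        "sources": [{"id": "doc-1", "score": 0.9, "label": "match report", "text": "..."}],
        "query": "Who won the football match?",
    }),
    ("token", {"delta": "Manchester"}),
    ("token", {"delta": " United beat Chelsea 2-1."}),
    ("done", {"answer": "Manchester United beat Chelsea 2-1."}),
]

streamed = []
result = consume_events(sample_events, on_delta=streamed.append)
```

Passing `on_delta` keeps display logic outside the dispatcher, so the same function can drive a terminal, a UI, or a test.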

Errors

Non-streaming HTTP errors (empty store, invalid vector, LLM not configured) return JSON with a 4xx status before SSE starts. Once streaming begins, LLM failures arrive as an SSE error event.

Condition	Status / event	Message (example)
LLM not configured	`400`	`LLM is not configured: start Ollama on the host and run moorcheh-edge up`
LLM unreachable or error	SSE `error`	`LLM request failed`
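The two failure paths can be handled separately in a client. The sketch below uses only the Python standard library and is illustrative, not the official client: `AnswerStreamError`, `error_from_http`, and `open_answer_stream` are hypothetical names. The field name of the JSON error body is not specified on this page, so the code tries a few common keys and otherwise falls back to the raw text.

```python
import json
import urllib.error
import urllib.request


class AnswerStreamError(Exception):
    """Raised for pre-stream HTTP errors (status set) or SSE error events (status None)."""

    def __init__(self, message, status=None):
        super().__init__(message)
        self.status = status


def error_from_http(status, body):
    """Build an error from a non-streaming JSON error response body (bytes)."""
    text = body.decode("utf-8", errors="replace")
    try:
        parsed = json.loads(text)
    except json.JSONDecodeError:
        parsed = None
    message = text
    if isinstance(parsed, dict):
        # Assumption: the error key is undocumented here, so try common names.
        for key in ("message", "detail", "error"):
            if isinstance(parsed.get(key), str):
                message = parsed[key]
                break
    return AnswerStreamError(message, status=status)


def error_from_sse(data):
    """Build an error from the data of an SSE `error` event."""
    return AnswerStreamError(data.get("message", "LLM request failed"))


def open_answer_stream(payload, url="http://localhost:8080/answer/stream"):
    """POST the request and return the open response for line-by-line SSE reading."""
    request = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    try:
        return urllib.request.urlopen(request)
    except urllib.error.HTTPError as exc:
        # urllib raises on 4xx/5xx, which here means streaming never started.
        raise error_from_http(exc.code, exc.read()) from exc
```

A caller can then check `err.status`: a numeric status means the request was rejected before streaming, while `None` means the stream started and the LLM call failed partway.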

Related

Answer (RAG)
Python: answer_stream()
Voice server: POST /ask/stream - proxies this endpoint on edge hardware
