Answer stream (RAG)
Stream a RAG answer from the local LLM as Server-Sent Events.
POST
Overview
Same retrieval and prompting as POST /answer, but the LLM response is streamed as Server-Sent Events (SSE) instead of a single JSON body.
Typical event order:
- meta: model, sources, and context count (sent once)
- token: one or more answer text deltas
- done: full answer echo and metadata
- error: sent instead of done if the LLM call fails
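The event order above can be consumed with a small SSE reader. The sketch below is a minimal Python example, not the official client: it assumes each data line carries one JSON object with the shapes listed under SSE events, and the helper names iter_sse_events and collect_answer are hypothetical. It takes already-decoded text lines, so it can sit on top of any HTTP library that iterates a response body line by line.

```python
import json
from typing import Any, Dict, Iterable, Iterator, List, Optional, Tuple


def iter_sse_events(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (event_name, data) pairs from decoded SSE lines.

    Minimal parser: handles the "event" and "data" fields, skips comment
    lines (starting with ":"), and dispatches on a blank line. Other fields
    such as "id" and "retry" are ignored.
    """
    event, data = "message", []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if not line:
            # Blank line: dispatch the buffered event, if it has any data.
            if data:
                yield event, "\n".join(data)
            event, data = "message", []
            continue
        if line.startswith(":"):
            continue  # comment or keep-alive line
        field, _, value = line.partition(":")
        if value.startswith(" "):
            value = value[1:]  # the SSE format strips one leading space
        if field == "event":
            event = value
        elif field == "data":
            data.append(value)
    # An event not terminated by a blank line is dropped, as the SSE format specifies.


def collect_answer(
    lines: Iterable[str],
) -> Tuple[Optional[Dict[str, Any]], str, Dict[str, Any]]:
    """Return (meta, streamed_text, done_payload); raise on an error event."""
    meta: Optional[Dict[str, Any]] = None
    parts: List[str] = []
    for event, data in iter_sse_events(lines):
        payload = json.loads(data)  # assumes one JSON object per event
        if event == "meta":
            meta = payload
        elif event == "token":
            parts.append(payload["delta"])
        elif event == "done":
            return meta, "".join(parts), payload
        elif event == "error":
            raise RuntimeError(payload["message"])
    raise RuntimeError("stream ended without a done or error event")
```

For a live request, the lines could come from something like requests' Response.iter_lines(decode_unicode=True) on a response opened with stream=True; joining the token deltas should reproduce the answer echoed in done.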
Requires Ollama running on the host. Start the stack with moorcheh-edge up (use --skip-ollama for a search-only stack).
Request body
Same fields as Answer (RAG):
- Original question text (included in the LLM prompt and echoed in the response).
- JSON array of floats used for similarity search. Its length must match the store dimension (768 for text stores).
- Number of passages to retrieve for context. Capped at 100.
- Minimum search score, applied when kiosk_mode is true.
- Flag that, when true, filters out retrieved passages scoring below that minimum.
- Optional system instruction (replaces the default RAG system prompt).
- Optional instruction inserted before the user question in the final user message.
- Prior conversation turns: [{"role": "user"|"assistant", "content": "..."}].
- LLM sampling temperature (0.0–2.0).
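The request-body constraints can be checked on the client before a request is sent. The helper below is hypothetical (check_answer_params is not part of any documented client) and mirrors only the documented limits: vector length equal to the store dimension (768 assumed, as for text stores), passage count capped at 100, and temperature within 0.0–2.0. Rejecting an out-of-range temperature locally is an assumption; how the server treats such values is not described here.

```python
from typing import Optional, Sequence

TEXT_STORE_DIM = 768  # store dimension for text stores, per the docs
MAX_PASSAGES = 100    # the server caps the passage count at 100


def check_answer_params(
    vector: Sequence[float],
    passages: int,
    temperature: Optional[float] = None,
    store_dim: int = TEXT_STORE_DIM,
) -> int:
    """Hypothetical pre-flight check; returns the passage count after capping."""
    if len(vector) != store_dim:
        raise ValueError(
            f"vector length {len(vector)} does not match store dimension {store_dim}"
        )
    # bool is a subclass of int in Python, so exclude it explicitly.
    if not all(isinstance(x, (int, float)) and not isinstance(x, bool) for x in vector):
        raise ValueError("vector must be an array of numbers")
    if temperature is not None and not 0.0 <= temperature <= 2.0:
        raise ValueError("temperature must be between 0.0 and 2.0")
    # Mirror the server-side cap instead of rejecting large values.
    return min(passages, MAX_PASSAGES)
```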
SSE events
| Event | When | Data shape |
|---|---|---|
| meta | Once, before tokens | model, context_count, sources, query |
| token | Per LLM delta | {"delta": "..."} |
| done | Stream complete | answer, query, model, context_count |
| error | LLM failure (sent instead of done) | {"message": "..."} |
The sources array in the meta event has the same shape as the results returned by POST /search (id, score, label, text).
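A sketch of reading the meta payload into typed records, assuming the fields named in the table above and the id, score, label, text shape of each source. The Source class and parse_meta function are hypothetical, and the field types (a numeric score, string text, id and label kept as received) are assumptions, since the docs do not state them.

```python
import json
from dataclasses import dataclass
from typing import Any, List, Tuple


@dataclass
class Source:
    id: Any       # type not documented; kept as received
    score: float  # assumed numeric
    label: Any    # may be a string or null; not documented
    text: str     # assumed to be the passage text


def parse_meta(data: str) -> Tuple[str, int, List[Source]]:
    """Parse a meta event's JSON data into (model, context_count, sources)."""
    payload = json.loads(data)
    sources = [
        Source(id=s["id"], score=float(s["score"]), label=s.get("label"), text=s["text"])
        for s in payload["sources"]
    ]
    return payload["model"], payload["context_count"], sources
```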
Errors
Errors detected before streaming starts (empty store, invalid vector, LLM not configured) return a JSON body with a 4xx status. Once streaming has begun, LLM failures arrive as an SSE error event instead.
| Condition | Status / event | Message (example) |
|---|---|---|
| LLM not configured | 400 | LLM is not configured: start Ollama on the host and run moorcheh-edge up |
| LLM unreachable or error | SSE error | LLM request failed |
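Because failures can surface in two places, a client has to check the HTTP response before reading any events. The hypothetical check_stream_start helper below covers that first phase: it raises on an error status and reports the JSON body as-is (its field names are not documented here), and it assumes a successful stream is served with the standard text/event-stream media type. Failures after streaming starts still have to be caught by watching for error events while reading the stream.

```python
import json


class AnswerStreamError(Exception):
    """Hypothetical error type for this sketch."""


def check_stream_start(status: int, content_type: str, body: bytes = b"") -> None:
    """Raise if the request failed before any SSE events were sent."""
    if status >= 400:
        # The docs describe 4xx JSON errors; treating any status >= 400 as a
        # failure is an assumption. JSON field names are not documented, so
        # the parsed body is reported as-is.
        try:
            detail = json.loads(body) if body else None
        except ValueError:  # covers JSONDecodeError and UnicodeDecodeError
            detail = body.decode("utf-8", errors="replace")
        raise AnswerStreamError(f"HTTP {status}: {detail}")
    # Assumes a successful stream uses the standard SSE media type.
    if not content_type.lower().startswith("text/event-stream"):
        raise AnswerStreamError(f"expected text/event-stream, got {content_type!r}")
```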
Related
- Answer (RAG)
- Python: answer_stream()
- Voice server: POST /ask/stream - proxies this endpoint on edge hardware