
HTTP client - answer_stream()

MoorchehEdgeApiClient.answer_stream() calls POST /answer/stream and yields the raw response bytes (SSE chunks) as they arrive. Use it when you need token-by-token output; for a one-shot answer, use answer_text() or answer() instead. The high-level MoorchehEdge SDK does not wrap streaming yet, so use MoorchehEdgeApiClient directly.
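import codecs

from moorcheh_edge import Embedder, MoorchehEdgeApiClient

client = MoorchehEdgeApiClient("http://localhost:8080")
embedder = Embedder()
query = "Who won the football match?"
vector = embedder.embed_query(query)

payload = {
    "query": query,
    "query_vector": vector,
    "top_k": 5,
}

# A raw chunk can end in the middle of a multi-byte UTF-8 character,
# so decode incrementally instead of calling chunk.decode() on each chunk.
decoder = codecs.getincrementaldecoder("utf-8")()
for chunk in client.answer_stream(payload):
    print(decoder.decode(chunk), end="", flush=True)
# Flush any buffered bytes and end the output with a newline.
print(decoder.decode(b"", final=True))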
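Each chunk is a slice of the raw SSE byte stream, and chunk boundaries are not guaranteed to line up with event boundaries. To work with individual events, parse the stream yourself, or use the internal _sse.iter_sse_events helper if you build tooling inside the package.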
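A minimal sketch of such a parser, assuming the server follows the standard text/event-stream format (event: and data: fields, each event terminated by a blank line). iter_sse is a hypothetical name: it is not part of moorcheh_edge and is not the internal _sse.iter_sse_events helper. It buffers bytes until a complete line is available, which also avoids decoding a split UTF-8 character. Lone \r line endings and the id: and retry: fields are not handled.

```python
from typing import Iterable, Iterator, List, Optional, Tuple


def iter_sse(chunks: Iterable[bytes]) -> Iterator[Tuple[str, str]]:
    """Yield (event, data) pairs from raw SSE byte chunks (hypothetical helper)."""
    buffer = b""
    event: Optional[str] = None
    data_lines: List[str] = []
    for chunk in chunks:
        buffer += chunk
        # Handle only complete lines; keep any trailing partial line buffered.
        *lines, buffer = buffer.split(b"\n")
        for raw in lines:
            # b"\n" never occurs inside a multi-byte UTF-8 sequence,
            # so decoding a complete line is safe. Strip CR from CRLF.
            line = raw.rstrip(b"\r").decode("utf-8")
            if line == "":
                # Blank line: dispatch the accumulated event, if it has data.
                if data_lines:
                    yield (event or "message", "\n".join(data_lines))
                event, data_lines = None, []
                continue
            if line.startswith(":"):
                continue  # comment or keep-alive line
            field, _, value = line.partition(":")
            if value.startswith(" "):
                value = value[1:]  # drop the single optional space after ":"
            if field == "event":
                event = value
            elif field == "data":
                data_lines.append(value)
    # An event not terminated by a blank line is discarded, as in the SSE spec.


# Usage against a live server (not run here):
# for event, data in iter_sse(client.answer_stream(payload)):
#     ...
```

Because the parser consumes any iterable of bytes, it can be tested with hand-written chunks that split events and lines at arbitrary points.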

Optional prompts and history

answer_stream() accepts the same optional fields as answer_text():
payload = {
    "query": "What are your hours?",
    "query_vector": embedder.embed_query("What are your hours?"),
    "top_k": 3,
    "kiosk_mode": True,
    "threshold": 0.3,
    "header_prompt": "You are a helpful store assistant.",
    "footer_prompt": "Answer in one sentence.",
    "chat_history": [
        {"role": "user", "content": "Hi"},
        {"role": "assistant", "content": "Hello! How can I help?"},
    ],
    "temperature": 0.2,
}

for chunk in client.answer_stream(payload):
    ...
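For a multi-turn conversation, one approach is to record each finished exchange in chat_history using the user/assistant message shape shown above. The sketch below is illustrative only: build_payload and record_turn are hypothetical helpers, not moorcheh_edge APIs. It assumes the server applies its own defaults when an optional field is omitted, and that the current question belongs only in "query", not in chat_history.

```python
from typing import Any, Dict, List, Sequence


def build_payload(
    query: str,
    query_vector: Sequence[float],
    history: List[Dict[str, str]],
    top_k: int = 3,
    **optional: Any,
) -> Dict[str, Any]:
    """Build an answer payload (hypothetical helper).

    Optional fields such as header_prompt, footer_prompt, threshold,
    kiosk_mode or temperature are included only when they are not None.
    """
    payload: Dict[str, Any] = {
        "query": query,
        "query_vector": query_vector,
        "top_k": top_k,
    }
    if history:
        # Copy the turns so later changes to history don't alter this payload.
        payload["chat_history"] = [dict(turn) for turn in history]
    payload.update({k: v for k, v in optional.items() if v is not None})
    return payload


def record_turn(history: List[Dict[str, str]], query: str, answer: str) -> None:
    """Append one finished user/assistant exchange to the history (hypothetical)."""
    history.append({"role": "user", "content": query})
    history.append({"role": "assistant", "content": answer})
```

After streaming an answer and joining its text, call record_turn so the next payload carries the conversation so far.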

SSE events

See API: Answer stream for the meta, token, done, and error event shapes.
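Given parsed (event, data) pairs, a consumer typically branches on the event name. This sketch joins token events into one string, stops at done, ignores meta, and raises on error. collect_answer and AnswerStreamError are hypothetical names. How the text is stored inside a token event's data is an assumption: the default extract_token treats the data as the text itself, so pass a different function (for example one built on json.loads) if API: Answer stream documents another shape.

```python
from typing import Callable, Iterable, List, Tuple


class AnswerStreamError(RuntimeError):
    """Raised when the stream reports an error event (hypothetical)."""


def collect_answer(
    events: Iterable[Tuple[str, str]],
    extract_token: Callable[[str], str] = lambda data: data,
) -> str:
    """Join token events into a single answer string (hypothetical helper)."""
    parts: List[str] = []
    for event, data in events:
        if event == "token":
            parts.append(extract_token(data))
        elif event == "error":
            raise AnswerStreamError(data)
        elif event == "done":
            break  # ignore anything after the end-of-answer event
        # meta and any unrecognized events are skipped.
    return "".join(parts)
```

To print tokens live instead of collecting them, replace parts.append with a print call that uses end="" and flush=True.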

Requirements

  • Store must contain uploaded documents (text or vector mode).
  • Ollama must be running with llama3.2:1b-instruct-q4_K_M (handled by moorcheh-edge up on first run).
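If streaming fails, a quick diagnostic is to ask Ollama which models it has pulled through its /api/tags endpoint. This sketch assumes Ollama is reachable at its default address, http://localhost:11434; your moorcheh-edge setup may expose it elsewhere, so adjust OLLAMA_URL. The helper names are hypothetical, and the check does not replace moorcheh-edge up, which handles the model on first run.

```python
import json
import urllib.request
from typing import Any, Dict

# Assumption: Ollama's default address; change it if your setup differs.
OLLAMA_URL = "http://localhost:11434"
MODEL = "llama3.2:1b-instruct-q4_K_M"


def model_available(tags: Dict[str, Any], model: str = MODEL) -> bool:
    """Return True if an /api/tags response lists the given model name."""
    return any(m.get("name") == model for m in tags.get("models", []))


def check_ollama(base_url: str = OLLAMA_URL, model: str = MODEL,
                 timeout: float = 3.0) -> bool:
    """Return False if Ollama is unreachable, replies badly, or lacks the model."""
    try:
        with urllib.request.urlopen(f"{base_url}/api/tags", timeout=timeout) as resp:
            tags = json.load(resp)
    except (OSError, ValueError):  # URLError/timeouts are OSError; bad JSON is ValueError
        return False
    return model_available(tags, model)


if __name__ == "__main__":
    print("Ollama ready:", check_ollama())
```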