Overview
The voice server is started with moorcheh-edge voice serve. It runs on Linux edge hardware (for example, Arduino UNO Q) and exposes mic, speaker, and RAG endpoints over HTTP.
It is not part of the Moorcheh Edge Docker container (:8080). RAG calls are proxied to Moorcheh Edge on the same device (default http://127.0.0.1:8080).
| Setting | Value |
|---|---|
| Default URL | http://<device-ip>:8766 |
| Start command | moorcheh-edge voice serve --port 8766 |
| Platform | Linux with ALSA (not Windows/macOS) |
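As a rough sketch of how a client might address the server, the snippet below builds endpoint URLs from the default port in the table and probes the GET /health endpoint. The helper names (`voice_url`, `check_health`) and the example IP address are illustrative assumptions, not part of the Moorcheh tooling. The source does not describe the health response body, so the sketch only checks for HTTP 200.

```python
import urllib.request

DEFAULT_PORT = 8766  # default voice server port from the table above


def voice_url(device_ip: str, path: str, port: int = DEFAULT_PORT) -> str:
    """Build a voice server URL such as http://192.168.1.50:8766/health."""
    return f"http://{device_ip}:{port}/{path.lstrip('/')}"


def check_health(device_ip: str, timeout: float = 3.0) -> bool:
    """Return True if GET /health answers with HTTP 200 (hypothetical helper)."""
    try:
        url = voice_url(device_ip, "/health")
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status == 200
    except OSError:  # URLError, timeouts, and refused connections are all OSError
        return False
```

Calling `check_health("192.168.1.50")` before issuing other requests is one way to fail early when the server has not been started on the device.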
Run moorcheh-edge voice setup once before starting the server.
GET /health
Check that the voice server is running.
POST /listen
Record from the device mic and return transcribed text.
Fixed recording length in seconds. When set, disables silence detection.
When true (and seconds is omitted), stop recording after a pause in speech.
Maximum recording length when using silence detection (3–60).
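The two recording modes above can be sketched as a small request builder. Only `seconds` is named in the text, and treating it as a JSON body field is itself an assumption. The key names `until_silence` and `max_seconds` are hypothetical placeholders for the silence-detection flag and the maximum length, so check them against the actual field names before use. The function names are likewise illustrative.

```python
import json
import urllib.request

# Hypothetical key names: only "seconds" appears in the text above.
SILENCE_KEY = "until_silence"
MAX_KEY = "max_seconds"


def listen_payload(seconds=None, max_seconds=None):
    """Build a /listen body: fixed length if seconds is given, else silence detection."""
    if seconds is not None:
        # A fixed length disables silence detection, so send nothing else.
        return {"seconds": seconds}
    body = {SILENCE_KEY: True}
    if max_seconds is not None:
        if not 3 <= max_seconds <= 60:  # documented range for the maximum length
            raise ValueError("max_seconds must be between 3 and 60")
        body[MAX_KEY] = max_seconds
    return body


def listen_request(base_url, payload):
    """Prepare (but do not send) a POST /listen request with a JSON body."""
    return urllib.request.Request(
        f"{base_url}/listen",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```

Passing the prepared request to `urllib.request.urlopen` would send it. The response format is not specified here, so the sketch stops short of parsing it.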
POST /speak
Synthesize and play text on the device speaker.
Text to speak.
POST /ask/stream
Stream a RAG answer as Server-Sent Events. Embeds the query on the edge device when query_vector is omitted; otherwise uses the vector you supply.
Proxies POST /answer/stream on Moorcheh Edge. When speak: true, plays TTS sentence-by-sentence on the device speaker.
Question text.
Optional precomputed embedding. When omitted, the server embeds locally with BGE (768-dim).
Passages to retrieve for context.
When true, filters passages below threshold.
Minimum search score when kiosk_mode is true.
Optional system instruction for RAG.
Optional instruction before the question.
Prior turns: [{"role": "user"|"assistant", "content": "..."}].
When true, enqueue sentence TTS on the device during the stream.
When speak is true, play the cached kiosk holding welcome audio in parallel with RAG (requires voice cache-holding).
SSE events
Inherits Moorcheh Edge events from Answer stream:
| Event | Description |
|---|---|
| meta | Model, sources, and context count |
| token | Answer text delta ({"delta": "..."}) |
| done | Full answer and metadata |
| error | LLM or upstream failure |
Additional events when speak: true:
| Event | Description |
|---|---|
| holding | Kiosk welcome audio started ({"text": "...", "playing": true}) |
| sentence | A completed sentence queued for TTS ({"text": "..."}) |
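The event tables above suggest a simple consumer loop. The following is a minimal sketch of parsing the stream and assembling the answer from token deltas. It assumes every event carries a JSON object in its data field, which the source confirms only for token, holding, and sentence. The function names `parse_sse` and `collect_answer` are illustrative.

```python
import json


def parse_sse(lines):
    """Yield (event, data) pairs from an iterable of decoded SSE lines."""
    event, data = "message", []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if not line:  # a blank line terminates one event
            if data:
                yield event, json.loads("\n".join(data))
            event, data = "message", []
        elif line.startswith(":"):
            continue  # comment or keep-alive line
        elif line.startswith("event:"):
            event = line[len("event:"):].strip()
        elif line.startswith("data:"):
            value = line[len("data:"):]
            # SSE strips a single optional space after the colon.
            data.append(value[1:] if value.startswith(" ") else value)


def collect_answer(events):
    """Join token deltas and gather TTS sentences; raise on an error event."""
    parts, sentences = [], []
    for name, payload in events:
        if name == "token":
            parts.append(payload["delta"])
        elif name == "sentence":
            sentences.append(payload["text"])
        elif name == "error":
            raise RuntimeError(f"stream error: {payload}")
        elif name == "done":
            break  # the stream is complete
    return "".join(parts), sentences
```

With an open HTTP response `resp` from POST /ask/stream, something like `collect_answer(parse_sse(line.decode("utf-8") for line in resp))` would drive the loop. Events the sketch does not recognize (meta, holding) are simply skipped.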
POST /ask and POST /ask/voice
Full voice loop: record from mic (unless query is provided), embed locally, call POST /answer on Moorcheh Edge, and speak the reply on the device.
/ask/voice is an alias for /ask.
When set, skip mic capture and use this text as the question.
Fixed recording length when query is omitted.
Stop recording after a pause when seconds is omitted.
Maximum recording length for silence detection.
Passages to retrieve for context.
Filter low-scoring passages when true.
Minimum score when kiosk_mode is true.
Optional system instruction.
Optional instruction before the question.
Prior conversation turns.
When true, play the answer on the device speaker.
Errors
| Condition | Status | Body |
|---|---|---|
| Invalid JSON | 400 | {"error": "invalid JSON body"} |
| Missing query on /ask/stream | 400 | {"error": "query is required"} |
| Unknown route | 404 | {"error": "not found"} |
| Voice runtime / audio failure | 503 | {"error": "..."} |
| Unexpected server error | 500 | {"error": "..."} |
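A client can turn the statuses in the table above into readable messages. This is a minimal sketch: the status-to-label mapping is taken from the table, and the function name `describe_error` and the exact message wording are illustrative assumptions.

```python
import json

# Short labels for the statuses listed in the error table.
ERROR_KINDS = {
    400: "bad request",
    404: "unknown route",
    500: "server error",
    503: "voice runtime or audio failure",
}


def describe_error(status: int, body: bytes) -> str:
    """Combine the status label with the "error" field of the JSON body, if any."""
    try:
        detail = json.loads(body).get("error", "")
    except (ValueError, AttributeError):
        # Body was empty, not JSON, or not a JSON object.
        detail = ""
    kind = ERROR_KINDS.get(status, "unexpected status")
    return f"{status} {kind}: {detail}" if detail else f"{status} {kind}"
```

With `urllib`, the status and body would typically come from a caught `urllib.error.HTTPError` via its `code` attribute and `read()` method.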