Voice server - Moorcheh Documentation

Overview

The voice server is started with `moorcheh-edge voice serve`. It runs on Linux edge hardware (for example, Arduino UNO Q) and exposes mic, speaker, and RAG endpoints over HTTP. It runs outside the Moorcheh Edge Docker container (port 8080) and proxies RAG calls to Moorcheh Edge on the same device (default `http://127.0.0.1:8080`).


Setting	Value
Default URL	`http://<device-ip>:8766`
Start command	`moorcheh-edge voice serve --port 8766`
Platform	Linux with ALSA (not Windows/macOS)

Run moorcheh-edge voice setup once before starting the server.

GET /health

Check that the voice server is running.

curl -X GET "http://192.168.1.50:8766/health"

{
  "status": "ok",
  "service": "moorcheh-edge-voice"
}
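
The same check can be scripted as a readiness probe. The sketch below is illustrative only: the helper names `parse_health` and `check_health` are hypothetical, and it assumes a healthy server always returns HTTP 200 with `"status": "ok"` as in the sample response above.

```python
import json
import urllib.request


def parse_health(body: bytes) -> bool:
    """Return True when the body looks like the documented healthy response."""
    try:
        data = json.loads(body)
    except ValueError:  # covers invalid JSON and undecodable bytes
        return False
    return isinstance(data, dict) and data.get("status") == "ok"


def check_health(base_url: str, timeout: float = 3.0) -> bool:
    """GET /health and report whether the voice server answered as expected."""
    try:
        with urllib.request.urlopen(f"{base_url}/health", timeout=timeout) as resp:
            return resp.status == 200 and parse_health(resp.read())
    except OSError:  # connection refused, timeout, or HTTP error status
        return False
```

Calling `check_health("http://192.168.1.50:8766")` before sending audio requests lets a client fail fast when the server has not been started.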

POST /listen

Record from the device mic and return transcribed text.

Parameter	Type	Default	Description
`seconds`	number	(none)	Fixed recording length in seconds. When set, disables silence detection.
`until_silence`	boolean	`true`	When true (and `seconds` is omitted), stop recording after a pause in speech.
`max_seconds`	number	`30`	Maximum recording length when using silence detection (3–60).

curl -X POST "http://192.168.1.50:8766/listen" \
  -H "Content-Type: application/json" \
  -d '{"until_silence": true, "max_seconds": 30}'

{
  "heard": "Do you have oat milk?"
}
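
The two recording modes are mutually exclusive, which a small request builder can make explicit. This is a minimal sketch: `listen_payload` is a hypothetical helper, and the client-side range check is an assumption (the documentation does not say how the server treats out-of-range values).

```python
from typing import Optional


def listen_payload(seconds: Optional[float] = None,
                   until_silence: bool = True,
                   max_seconds: float = 30) -> dict:
    """Build a JSON body for POST /listen.

    A fixed `seconds` value disables silence detection, so the
    silence-related fields are left out in that case.
    """
    if seconds is not None:
        if seconds <= 0:
            raise ValueError("seconds must be positive")
        return {"seconds": seconds}
    # Assumption: validate the documented 3-60 range before sending.
    if not 3 <= max_seconds <= 60:
        raise ValueError("max_seconds must be between 3 and 60")
    return {"until_silence": until_silence, "max_seconds": max_seconds}
```

The returned dict can be serialized with `json.dumps` and sent as the request body with a `Content-Type: application/json` header.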

POST /speak

Synthesize and play text on the device speaker.

Parameter	Type	Default	Description
`text`	string	(none)	Required. Text to speak.

curl -X POST "http://192.168.1.50:8766/speak" \
  -H "Content-Type: application/json" \
  -d '{"text": "Welcome to The Brew Corner."}'

{
  "spoke": true
}
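
For clients that do not shell out to curl, the request can be built with the Python standard library. The sketch below only constructs the request; `speak_request` is a hypothetical helper, and rejecting blank text on the client is an assumption rather than documented server behavior.

```python
import json
import urllib.request


def speak_request(base_url: str, text: str) -> urllib.request.Request:
    """Build (but do not send) a POST /speak request for the given text."""
    if not text.strip():
        raise ValueError("text is required")
    body = json.dumps({"text": text}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/speak",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```

Passing the result to `urllib.request.urlopen` sends it; the response body should then match the `{"spoke": true}` sample above.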

POST /ask/stream

Stream a RAG answer as Server-Sent Events. Embeds the query on the edge device when query_vector is omitted; otherwise uses the vector you supply. Proxies POST /answer/stream on Moorcheh Edge. When speak: true, plays TTS sentence-by-sentence on the device speaker.

Parameter	Type	Default	Description
`query`	string	(none)	Required. Question text.
`query_vector`	array	(none)	Optional precomputed embedding. When omitted, the server embeds locally with BGE (768-dim).
`top_k`	number	`5`	Passages to retrieve for context.
`kiosk_mode`	boolean	`true`	When true, filters passages below `threshold`.
`threshold`	number	`0.25`	Minimum search score when `kiosk_mode` is true.
`header_prompt`	string	(none)	Optional system instruction for RAG.
`footer_prompt`	string	(none)	Optional instruction before the question.
`chat_history`	array	(none)	Prior turns: `[{"role": "user"|"assistant", "content": "..."}]`.
`speak`	boolean	`false`	When true, enqueue sentence TTS on the device during the stream.
`holding_enabled`	boolean	`true`	When `speak` is true, play the cached kiosk holding welcome audio in parallel with RAG (requires voice cache-holding).

curl -N -X POST "http://192.168.1.50:8766/ask/stream" \
  -H "Content-Type: application/json" \
  -d '{
    "query": "Do you have oat milk?",
    "top_k": 2,
    "kiosk_mode": true,
    "threshold": 0.3,
    "speak": true
  }'

SSE events

The stream relays the events of the Moorcheh Edge answer stream (POST /answer/stream):

Event	Description
`meta`	Model, sources, and context count
`token`	Answer text delta (`{"delta": "..."}`)
`done`	Full answer and metadata
`error`	LLM or upstream failure

Additional events when speak: true:

Event	Description
`holding`	Kiosk welcome audio started (`{"text": "...", "playing": true}`)
`sentence`	A completed sentence queued for TTS (`{"text": "..."}`)

TTS continues in the background after the HTTP stream closes; the connection does not wait for playback to finish.
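
A client can rebuild the answer by concatenating `token` deltas as they arrive. The following is a minimal sketch, not the server's reference client: it assumes the standard Server-Sent Events framing (`event:` and `data:` lines, with a blank line ending each event) and JSON payloads as shown in the tables above. The names `iter_sse` and `collect_answer` are hypothetical.

```python
import json
from typing import Iterable, Iterator, List, Tuple


def iter_sse(lines: Iterable[str]) -> Iterator[Tuple[str, object]]:
    """Yield (event_name, decoded_json_payload) pairs from SSE text lines."""
    event, data = "message", []  # "message" is the SSE default event name
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line == "":
            # A blank line terminates the current event.
            if data:
                yield event, json.loads("\n".join(data))
            event, data = "message", []
        elif line.startswith(":"):
            continue  # SSE comment, often used as a keep-alive
        elif line.startswith("event:"):
            event = line[len("event:"):].strip()
        elif line.startswith("data:"):
            data.append(line[len("data:"):].lstrip(" "))


def collect_answer(events: Iterable[Tuple[str, object]]) -> str:
    """Join token deltas into the answer text; raise on an error event."""
    parts: List[str] = []
    for name, payload in events:
        if name == "token":
            parts.append(payload["delta"])
        elif name == "error":
            raise RuntimeError(f"stream failed: {payload}")
        elif name == "done":
            break
    return "".join(parts)
```

To consume a live stream, decode each line of the HTTP response before passing it in, for example `iter_sse(line.decode("utf-8") for line in resp)`. The `holding` and `sentence` events are ignored by `collect_answer`, so the same loop works whether or not `speak` is true.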

POST /ask and POST /ask/voice

Full voice loop: record from mic (unless query is provided), embed locally, call POST /answer on Moorcheh Edge, and speak the reply on the device. /ask/voice is an alias for /ask.

Parameter	Type	Default	Description
`query`	string	(none)	When set, skip mic capture and use this text as the question.
`seconds`	number	(none)	Fixed recording length when `query` is omitted.
`until_silence`	boolean	`true`	Stop recording after a pause when `seconds` is omitted.
`max_seconds`	number	`30`	Maximum recording length for silence detection.
`top_k`	number	`5`	Passages to retrieve for context.
`kiosk_mode`	boolean	`true`	Filter low-scoring passages when true.
`threshold`	number	`0.25`	Minimum score when `kiosk_mode` is true.
`header_prompt`	string	(none)	Optional system instruction.
`footer_prompt`	string	(none)	Optional instruction before the question.
`chat_history`	array	(none)	Prior conversation turns.
`speak`	boolean	`true`	When true, play the answer on the device speaker.

curl -X POST "http://192.168.1.50:8766/ask" \
  -H "Content-Type: application/json" \
  -d '{
    "query": "What are your hours?",
    "top_k": 2,
    "kiosk_mode": true,
    "threshold": 0.3,
    "speak": true
  }'

{
  "heard": "What are your hours?",
  "query": "What are your hours?",
  "answer": "We are open Monday through Friday, 7am to 6pm.",
  "model": "llama3.2:1b-instruct-q4_K_M",
  "context_count": 1,
  "spoke": true
}
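
For multi-turn conversations, each response can be folded back into `chat_history` for the next request. This sketch assumes `/ask` accepts the same turn shape documented for `/ask/stream` (`role` and `content` keys); `append_turn` and `ask_payload` are hypothetical helpers.

```python
from typing import Dict, List, Optional

Turn = Dict[str, str]


def append_turn(history: List[Turn], response: dict) -> List[Turn]:
    """Return a new history extended with the question and answer of one /ask call."""
    return history + [
        {"role": "user", "content": response["query"]},
        {"role": "assistant", "content": response["answer"]},
    ]


def ask_payload(history: List[Turn], query: Optional[str] = None, **options) -> dict:
    """Build a JSON body for POST /ask.

    Omitting `query` leaves mic capture enabled; extra keyword arguments
    (top_k, threshold, speak, ...) are passed through unchanged.
    """
    payload = dict(options)
    if query is not None:
        payload["query"] = query
    if history:
        payload["chat_history"] = history
    return payload
```

Keeping the history on the client means the voice server can stay stateless between requests; trimming old turns is left to the caller.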

Errors

Condition	Status	Body
Invalid JSON	`400`	`{"error": "invalid JSON body"}`
Missing `query` on `/ask/stream`	`400`	`{"error": "query is required"}`
Unknown route	`404`	`{"error": "not found"}`
Voice runtime / audio failure	`503`	`{"error": "..."}`
Unexpected server error	`500`	`{"error": "..."}`
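
Because every error body has the same `{"error": "..."}` shape, a client can convert failures into a single exception type. The sketch below is one possible approach: `VoiceServerError` and `error_from_response` are hypothetical names, and treating only `503` as retryable is an assumption (audio failures may be transient), not documented behavior.

```python
import json


class VoiceServerError(Exception):
    """An error response from the voice server."""

    def __init__(self, status: int, message: str):
        super().__init__(f"{status}: {message}")
        self.status = status
        self.message = message

    @property
    def retryable(self) -> bool:
        # Assumption: only voice runtime / audio failures (503) are worth retrying.
        return self.status == 503


def error_from_response(status: int, body: bytes) -> VoiceServerError:
    """Build an exception from an HTTP status and raw response body."""
    try:
        message = json.loads(body).get("error", "")
    except (ValueError, AttributeError):
        # Not JSON, or JSON that is not an object: fall back to the raw text.
        message = body.decode("utf-8", "replace")
    return VoiceServerError(status, message)
```

With `urllib`, an `HTTPError` exposes the status as `err.code` and the body through `err.read()`, so both values can be passed straight to `error_from_response`.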
