Synopsis
--query locally (BGE, 768-dim), searches the store, and calls POST /answer. The server uses Ollama with fixed model llama3.2:1b-instruct-q4_K_M.
Run
moorcheh-edge up first so Ollama is installed and the LLM model is pulled. Use --skip-ollama on up only if you want search without RAG.Options
| Flag | Default | Description |
|---|---|---|
--query | Required | Question to answer |
--top-k | 5 | Passages to retrieve for context |
--threshold | 0.0 | Minimum score when --kiosk-mode is set |
--kiosk-mode | off | Filter low-scoring passages |
--header-prompt | — | Custom system instruction |
--footer-prompt | — | Instruction before the question |
--chat-history-json | — | Prior turns as JSON array |
--temperature | 0.2 (server) | LLM temperature 0.0–2.0 |
--timeout | 120 | HTTP timeout in seconds |
--base-url | http://localhost:8080 | API base URL |