
SDK — answer_text()

answer_text() embeds the query locally, then calls POST /answer. The answer is generated by llama3.2:1b-instruct-q4_K_M, served by Ollama on the host.
result = edge.answer_text(
    "Who won the football match?",
    top_k=5,
    threshold=0.0,
    kiosk_mode=False,
    temperature=0.2,
)
print(result["answer"])
print(result["model"])  # llama3.2:1b-instruct-q4_K_M
for src in result["sources"]:
    print(src["id"], src["score"], src.get("text"))
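Because the result is a plain dictionary, the sources can be post-processed with ordinary Python. The following is a minimal sketch that assumes only the keys shown above ("answer", "model", and "sources" entries with "id", "score" and an optional "text"). The helper name top_sources, its min_score parameter, and the sample dictionary are hypothetical and not part of the SDK.

```python
def top_sources(result, min_score=0.0):
    """Return (id, score, text) tuples for sources scoring at least min_score, best first."""
    kept = [s for s in result.get("sources", []) if s["score"] >= min_score]
    kept.sort(key=lambda s: s["score"], reverse=True)
    # "text" may be absent, so fall back to None as src.get("text") does above.
    return [(s["id"], s["score"], s.get("text")) for s in kept]


# Hypothetical result, shaped like the answer_text() output above.
sample_result = {
    "answer": "The home team won 2-1.",
    "model": "llama3.2:1b-instruct-q4_K_M",
    "sources": [
        {"id": "doc-2", "score": 0.41},
        {"id": "doc-1", "score": 0.87, "text": "The home team won 2-1."},
    ],
}

for source_id, score, text in top_sources(sample_result, min_score=0.5):
    print(source_id, score, text)
```

Filtering on the client side like this is independent of the threshold argument sent with the request; it only trims what is displayed.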

Optional prompts and history

result = edge.answer_text(
    "What are your hours?",
    top_k=3,
    header_prompt="You are a helpful store assistant.",
    footer_prompt="Answer in one sentence.",
    chat_history=[
        {"role": "user", "content": "Hi"},
        {"role": "assistant", "content": "Hello! How can I help?"},
    ],
)
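For a multi-turn conversation, the chat_history list has to be carried from one call to the next. The sketch below shows one way to do that, assuming the role/content dictionary format from the example above is all the API expects. The helper append_turn is hypothetical and not part of the SDK.

```python
def append_turn(history, user_message, assistant_message):
    """Return a new history list with one user/assistant exchange appended."""
    return history + [
        {"role": "user", "content": user_message},
        {"role": "assistant", "content": assistant_message},
    ]


history = []
history = append_turn(history, "Hi", "Hello! How can I help?")
# Pass `history` as chat_history= on the next answer_text() call, then append
# the new question and result["answer"] before the following turn.
print(history)
```

Returning a new list instead of mutating the argument keeps earlier snapshots of the conversation intact, which can help when retrying a failed call.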

SDK — answer() with precomputed vector

If you already have a query embedding:
vector = edge.embedder.embed_query("Who won?")
result = edge.answer("Who won?", vector, top_k=5)

HTTP client

You must supply both query and query_vector:
from moorcheh_edge import MoorchehEdgeApiClient, Embedder

client = MoorchehEdgeApiClient("http://localhost:8080")
embedder = Embedder()
vector = embedder.embed_query("Who won the football match?")

result = client.answer({
    "query": "Who won the football match?",
    "query_vector": vector,
    "top_k": 5,
})
print(result["answer"])
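Since the HTTP client needs both fields, it can help to validate the payload before sending it. This is a minimal sketch, not part of the library: build_answer_payload is a hypothetical helper, the three-element vector is a placeholder rather than a real embedding, and converting the vector to a list of floats is an assumption about what serializes cleanly to JSON.

```python
def build_answer_payload(query, query_vector, top_k=5):
    """Validate inputs and return a dict shaped like the client.answer() payload above."""
    if not isinstance(query, str) or not query.strip():
        raise ValueError("query must be a non-empty string")
    if query_vector is None or len(query_vector) == 0:
        raise ValueError("query_vector must be a non-empty sequence of numbers")
    return {
        "query": query,
        # Assumption: plain Python floats are what the JSON body should carry.
        "query_vector": [float(x) for x in query_vector],
        "top_k": int(top_k),
    }


# Placeholder vector for illustration only; a real one comes from embed_query().
payload = build_answer_payload("Who won the football match?", [0.1, 0.2, 0.3])
print(payload["top_k"], len(payload["query_vector"]))
```

The resulting dictionary has the same three keys as the request shown above, so it could be passed to client.answer() in place of the inline literal.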

Requirements

  • Store must contain uploaded documents (text or vector mode).
  • Ollama must be running with llama3.2:1b-instruct-q4_K_M (the moorcheh-edge up command handles this on first run).

For token-by-token output, use answer_stream() with MoorchehEdgeApiClient. See API: Answer and CLI: answer.