Cohere + Moorcheh

This integration uses Cohere to generate embeddings and Moorcheh vector namespaces to store and search them with ITS ranking. Use this approach when you want full control over the embedding model (for example, embed-v4.0) and upload pre-computed vectors directly to Moorcheh.

Architecture

Embedding Generation

Generate vectors with Cohere embed-v4.0

Vector Storage

Store vectors in Moorcheh vector namespaces

Semantic Retrieval

Search by vector query for high-relevance results

Model Flexibility

Keep your preferred embedding provider while using Moorcheh search

Prerequisites

Install dependencies:
pip install moorcheh-sdk cohere
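
Both clients read their keys from environment variables. A typical shell setup looks like the following (the values shown are placeholders; substitute the keys from your Cohere and Moorcheh dashboards):

```shell
# Placeholder values — replace with your real API keys
export COHERE_API_KEY="your-cohere-api-key"
export MOORCHEH_API_KEY="your-moorcheh-api-key"
```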

End-to-end Example

import os
import textwrap
from typing import List

import cohere
from moorcheh_sdk import MoorchehClient


MOORCHEH_API_KEY = os.environ["MOORCHEH_API_KEY"]
COHERE_API_KEY = os.environ["COHERE_API_KEY"]

NAMESPACE = "cohere-v4-demo"
VECTOR_DIMENSION = 1536
CHUNK_SIZE = 900
CHUNK_OVERLAP = 180


def to_float_vector(vector: List[float]) -> List[float]:
    return [float(x) for x in vector]


def chunk_text(text: str, chunk_size: int = CHUNK_SIZE, overlap: int = CHUNK_OVERLAP) -> List[str]:
    chunks: List[str] = []
    start = 0
    while start < len(text):
        end = min(start + chunk_size, len(text))
        chunks.append(text[start:end].strip())
        if end == len(text):
            break
        start = max(end - overlap, 0)
    return [c for c in chunks if c]


def extract_text(result: dict) -> str:
    if result.get("text"):
        return str(result["text"])
    metadata = result.get("metadata") or {}
    if isinstance(metadata, dict):
        return str(metadata.get("text") or metadata.get("raw_text") or metadata.get("content") or "")
    return ""


def clean_text(text: str) -> str:
    return " ".join(str(text).split())


def print_result(idx: int, result: dict) -> None:
    metadata = result.get("metadata") or {}
    text_value = clean_text(extract_text(result))
    wrapped = textwrap.fill(text_value, width=100)
    print(f"[{idx}] id={result.get('id')}")
    print(f"score={result.get('score')} label={result.get('label')}")
    print(f"section={metadata.get('section')} source_doc_id={metadata.get('source_doc_id')}")
    print("text:")
    print(wrapped if wrapped else "(no text returned)")
    print("-" * 120)


# 1) Initialize clients
co = cohere.ClientV2(api_key=COHERE_API_KEY)
mc = MoorchehClient(api_key=MOORCHEH_API_KEY)

# 2) Create vector namespace once (ignore if it already exists)
try:
    mc.namespaces.create(
        namespace_name=NAMESPACE,
        type="vector",
        vector_dimension=VECTOR_DIMENSION,
    )
except Exception:
    pass  # namespace likely exists already; reuse it

# 3) Build richer source content and chunk it
source_documents = [
    {
        "id": "guide-vector-namespaces",
        "section": "vector-namespace-best-practices",
        "text": (
            "Moorcheh vector namespaces are designed for bring-your-own-embedding workflows. "
            "When using Cohere embed-v4.0 with 1536 dimensions, the namespace dimension must match exactly. "
            "Each vector item should include a stable id and the original chunk text in the text field so retrieved results "
            "can be displayed directly without a second data fetch. Include consistent metadata like source, section, and model."
        ),
    },
    {
        "id": "guide-search-tuning",
        "section": "semantic-search-tuning",
        "text": (
            "To increase relevance score, write question-style queries with domain terms and expected intent. "
            "Use coherent chunks of approximately 500 to 1000 characters with overlap to preserve context continuity. "
            "For production, use top_k values aligned with your use case and apply threshold filtering with kiosk_mode "
            "to remove low-confidence matches."
        ),
    },
]

documents = []
for doc in source_documents:
    parts = chunk_text(doc["text"])
    for idx, chunk in enumerate(parts):
        documents.append(
            {
                "id": f"{doc['id']}-chunk-{idx}",
                "text": chunk,
                "source_doc_id": doc["id"],
                "section": doc["section"],
                "chunk_index": idx,
                "total_chunks": len(parts),
            }
        )

# 4) Embed chunks using Cohere embed-v4.0 (request float embeddings explicitly)
doc_embeddings = co.embed(
    model="embed-v4.0",
    input_type="search_document",
    embedding_types=["float"],
    texts=[d["text"] for d in documents],
).embeddings.float_

# 5) Upload vectors to Moorcheh
mc.vectors.upload(
    namespace_name=NAMESPACE,
    vectors=[
        {
            "id": documents[i]["id"],
            "vector": to_float_vector(doc_embeddings[i]),
            "text": documents[i]["text"],
            "source": "cohere-embed-v4",
            "model": "embed-v4.0",
            "section": documents[i]["section"],
            "source_doc_id": documents[i]["source_doc_id"],
            "chunk_index": documents[i]["chunk_index"],
            "total_chunks": documents[i]["total_chunks"],
        }
        for i in range(len(documents))
    ],
)

# 6) Embed retrieval-focused query and search namespace
query = (
    "In Moorcheh vector namespaces, how do I use Cohere embed-v4.0 with 1536 dimensions, "
    "store raw text in vector metadata, and improve semantic relevance scores?"
)
query_embedding = co.embed(
    model="embed-v4.0",
    input_type="search_query",
    embedding_types=["float"],
    texts=[query],
).embeddings.float_[0]

results = mc.similarity_search.query(
    namespaces=[NAMESPACE],
    query=to_float_vector(query_embedding),
    top_k=5,
    kiosk_mode=True,
    threshold=0.15,
)

print(f"namespace={NAMESPACE} total_results={len(results.get('results', []))}")
print("=" * 120)
for idx, r in enumerate(results.get("results", []), start=1):
    print_result(idx, r)

Important Notes

  • The namespace vector_dimension must exactly match the dimension returned by your Cohere embedding configuration (1536 for embed-v4.0 at its default output dimension).
  • Use input_type="search_document" when embedding stored documents and input_type="search_query" for user queries; mixing them degrades relevance.
  • Include text in each uploaded vector object. Moorcheh stores it as metadata, so search results can return the original content without refetching from another data source.
  • Use coherent chunks (typically 500-1000 characters) with overlap, keep query phrasing intent-rich, and tune kiosk_mode + threshold to filter weaker matches.
  • Use the same embedding model/version and parameters for both index-time and query-time embeddings; vectors produced by different models are not comparable.
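
Because a dimension mismatch only surfaces at upload or query time, it can help to validate vectors locally before calling the API. A minimal sketch (the validate_dimensions helper is illustrative, not part of either SDK):

```python
from typing import Dict, List


def validate_dimensions(vectors: List[Dict], expected_dim: int) -> None:
    """Raise ValueError if any vector's length differs from the namespace dimension."""
    for item in vectors:
        actual = len(item["vector"])
        if actual != expected_dim:
            raise ValueError(
                f"Vector {item['id']!r} has dimension {actual}, expected {expected_dim}"
            )


# Example: a 1536-dim namespace should reject a 1024-dim vector before any network call
good = {"id": "a", "vector": [0.0] * 1536}
bad = {"id": "b", "vector": [0.0] * 1024}

validate_dimensions([good], expected_dim=1536)  # passes silently
try:
    validate_dimensions([good, bad], expected_dim=1536)
except ValueError as err:
    print(err)  # reports the offending vector id and its actual dimension
```

Running this check right before mc.vectors.upload turns a server-side rejection into an immediate, descriptive local error.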

Troubleshooting

  • No vector namespace found: Create the namespace first with type="vector".
  • Dimension mismatch: Recreate the namespace with the correct vector_dimension for your embedding output.
  • Low relevance: Re-check chunking strategy and ensure document/query input types are correct.