
What is Moorcheh?

Moorcheh is a lightning-fast semantic search engine and vector store. Instead of relying on simple distance metrics such as L2 or cosine similarity, Moorcheh uses Maximally Informative Binarization (MIB) and an Information-Theoretic Score (ITS) to retrieve accurate document chunks. This tutorial shows how to use Moorcheh with LangChain to upload and store text documents and vector embeddings, and to retrieve relevant chunks for your queries.

Key Features

  • MIB Technology: Uses Maximally Informative Binarization for superior search accuracy
  • ITS Scoring: Information-Theoretic Score provides better relevance ranking
  • LangChain Integration: Seamless integration with the LangChain ecosystem
  • Lightning Fast: Optimized for speed and performance

Setup

First, install the necessary package:
pip install langchain-moorcheh

Initialization

Get started with Moorcheh:

1. Sign Up: Sign up or log in at the Moorcheh Console.
2. Generate API Key: Go to the "API Keys" tab and generate an API key.
3. Set Environment Variable: Save the key as an environment variable named MOORCHEH_API_KEY.
4. Create Namespace: In the Console, open the "Namespaces" tab and click "Create namespace", or initialize it programmatically.
5. Start Using: Use your API key to create namespaces, upload documents, and retrieve answers.

For more information about the Moorcheh SDK functions, see the GitHub repository.
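
If you want to set the key from inside Python for a quick test (exporting it in your shell is the usual approach), a minimal sketch:
import os

# Set the key for the current Python session only; in production, export it in your shell instead
os.environ.setdefault("MOORCHEH_API_KEY", "<your-api-key>")
assert os.environ.get("MOORCHEH_API_KEY"), "MOORCHEH_API_KEY is not set"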

Importing Packages

Import the required packages:
from langchain_moorcheh import MoorchehVectorStore
from langchain_core.documents import Document

import os
from uuid import uuid4

Code Setup

Read your Moorcheh API key from the environment:
MOORCHEH_API_KEY = os.environ['MOORCHEH_API_KEY']
Set your namespace name and type, then create the vector store:
namespace = "your_namespace_name"
namespace_type = "text"  # or "vector"
store = MoorchehVectorStore(
    api_key=MOORCHEH_API_KEY,
    namespace=namespace,
    namespace_type=namespace_type,
)

Adding Documents

Create and add documents to your vector store:
document_1 = Document(
    page_content="Brewed a fresh cup of Ethiopian coffee and paired it with a warm croissant.",
    metadata={"source": "blog"},
)

document_2 = Document(
    page_content="Tomorrow's weather will be sunny with light winds, reaching a high of 78°F.",
    metadata={"source": "news"},
)

document_3 = Document(
    page_content="Experimenting with LangChain for an AI-powered note-taking assistant!",
    metadata={"source": "tweet"},
)

document_4 = Document(
    page_content="Local bakery donates 500 loaves of bread to the community food bank.",
    metadata={"source": "news"},
)

document_5 = Document(
    page_content="That concert last night was absolutely unforgettable—what a performance!",
    metadata={"source": "tweet"},
)

document_6 = Document(
    page_content="Check out our latest article: 5 ways to boost productivity while working from home.",
    metadata={"source": "website"},
)

document_7 = Document(
    page_content="The ultimate guide to mastering homemade pizza dough.",
    metadata={"source": "website"},
)

document_8 = Document(
    page_content="LangGraph just made multi-agent workflows way easier—seriously impressive!",
    metadata={"source": "tweet"},
)

document_9 = Document(
    page_content="Oil prices rose 3% today after unexpected supply cuts from major exporters.",
    metadata={"source": "news"},
)

document_10 = Document(
    page_content="I really hope this post doesn't vanish into the digital void…",
    metadata={"source": "tweet"},
)

documents = [
    document_1,
    document_2,
    document_3,
    document_4,
    document_5,
    document_6,
    document_7,
    document_8,
    document_9,
    document_10,
]

uuids = [str(uuid4()) for _ in range(len(documents))]

store.add_documents(documents=documents, ids=uuids)

Delete Documents

Remove documents from your vector store:
store.delete(ids=["chunk_id_here"])
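For example, to remove the first document added in the section above, pass the ID that was generated for it:
# Delete the first document from the "Adding Documents" example by its generated ID
store.delete(ids=[uuids[0]])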

Query Engine

Once your namespace has been created and documents have been uploaded to it, you can ask questions about those documents directly through the vector store. Set your query and the LLM you would like to use to answer it:
query = "Give me a brief summary of the provided documents"
answer = store.generative_answer(query, ai_model="anthropic.claude-3-7-sonnet-20250219-v1:0")
print(answer)
For more information on supported LLMs, please visit our GitHub page.

Advanced Usage

Custom Embeddings

You can use custom embeddings with Moorcheh:
from langchain_openai import OpenAIEmbeddings  # requires the langchain-openai package

embeddings = OpenAIEmbeddings()
store = MoorchehVectorStore(
    api_key=MOORCHEH_API_KEY,
    namespace=namespace,
    namespace_type=namespace_type,
    embedding=embeddings
)

Search with Similarity

Search for similar documents:
# Search for similar documents
results = store.similarity_search("coffee and breakfast", k=3)
for doc in results:
    print(doc.page_content)
    print(doc.metadata)
    print("---")

Search with Score

Get similarity scores along with results:
# Search with scores
results = store.similarity_search_with_score("weather forecast", k=3)
for doc, score in results:
    print(f"Score: {score}")
    print(f"Content: {doc.page_content}")
    print("---")

Configuration Options

Namespace Types

Moorcheh supports two namespace types:

Text Namespace

Store and search text documents with automatic embedding generation

Vector Namespace

Store pre-computed vector embeddings for custom use cases
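
As a minimal sketch combining the constructor options shown earlier (the namespace names are placeholders, and pairing a vector namespace with your own embedding model follows the Custom Embeddings example above):
# Text namespace: Moorcheh generates embeddings for the documents you add
text_store = MoorchehVectorStore(
    api_key=MOORCHEH_API_KEY,
    namespace="my_text_namespace",
    namespace_type="text",
)

# Vector namespace: bring your own embedding model for pre-computed vectors
vector_store = MoorchehVectorStore(
    api_key=MOORCHEH_API_KEY,
    namespace="my_vector_namespace",
    namespace_type="vector",
    embedding=embeddings,
)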

AI Models

Supported AI models for generative answers:
  • anthropic.claude-sonnet-4-20250514-v1:0 - Claude Sonnet 4
  • anthropic.claude-sonnet-4-5-20250929-v1:0 - Claude Sonnet 4.5
  • meta.llama4-maverick-17b-instruct-v1:0 - Llama 4 Maverick 17B
  • meta.llama3-3-70b-instruct-v1:0 - Llama 3.3 70B
  • amazon.nova-pro-v1:0 - Amazon Nova Pro
  • deepseek.r1-v1:0 - DeepSeek R1
  • openai.gpt-oss-120b-1:0 - OpenAI GPT OSS 120B
  • qwen.qwen3-32b-v1:0 - Qwen 3 32B
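
Any identifier from this list can be passed as the ai_model argument to generative_answer, for example:
# Same query as before, answered with a different model from the list above
answer = store.generative_answer(
    "Give me a brief summary of the provided documents",
    ai_model="meta.llama3-3-70b-instruct-v1:0",
)
print(answer)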

Best Practices

Document Preparation

1. Clean Your Data: Remove unnecessary whitespace and format text consistently.
2. Add Metadata: Include relevant metadata for better filtering and organization.
3. Chunk Appropriately: Split large documents into meaningful chunks (see the sketch after this list).
4. Use Unique IDs: Generate unique identifiers for each document.
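
One way to apply these steps is with LangChain's text splitters; RecursiveCharacterTextSplitter is just one option (it requires the langchain-text-splitters package and is not a Moorcheh requirement), and long_text stands in for your raw document string:
from langchain_text_splitters import RecursiveCharacterTextSplitter
from uuid import uuid4

# Split a long document into ~800-character chunks with a small overlap
splitter = RecursiveCharacterTextSplitter(chunk_size=800, chunk_overlap=100)
chunks = splitter.create_documents(
    [long_text],                       # long_text: your raw document string
    metadatas=[{"source": "report"}],  # metadata is copied onto every chunk
)

# Give every chunk a unique ID before uploading
chunk_ids = [str(uuid4()) for _ in chunks]
store.add_documents(documents=chunks, ids=chunk_ids)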

Performance Optimization

  • Use appropriate chunk sizes (typically 500-1000 characters)
  • Batch document uploads for better performance (see the sketch below)
  • Monitor your API usage and rate limits
  • Use caching for frequently accessed data
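
A minimal batching sketch built on the add_documents call shown earlier; the batch size is an arbitrary choice, not a Moorcheh limit:
# Upload documents in batches instead of one call per document
BATCH_SIZE = 100
for i in range(0, len(documents), BATCH_SIZE):
    store.add_documents(
        documents=documents[i : i + BATCH_SIZE],
        ids=uuids[i : i + BATCH_SIZE],
    )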

Error Handling

try:
    store.add_documents(documents=documents, ids=uuids)
    print("Documents added successfully")
except Exception as e:
    print(f"Error adding documents: {e}")

Troubleshooting

Common Issues

API key issues:
  • Ensure your API key is correctly set in environment variables
  • Check that the API key has the necessary permissions
  • Verify the API key is not expired

Namespace issues:
  • Make sure the namespace exists before adding documents
  • Check that the namespace type matches your use case
  • Verify you have access to the namespace

Document issues:
  • Check document format and content
  • Ensure all required fields are present
  • Verify document IDs are unique

Debug Mode

Enable debug logging to troubleshoot issues:
import logging
logging.basicConfig(level=logging.DEBUG)

Further Resources

For more information about Moorcheh, see the Moorcheh Console and the GitHub repository.

Support

Need help with the LangChain integration? Contact our support team for assistance.