documents.upload

Uploads text documents to a text namespace. Moorcheh processes and embeds them asynchronously, so newly uploaded documents are not searchable immediately.

Parameters

namespace_name
str
required
The name of the target text namespace.
documents
List[Dict]
required
A list of dictionaries. Each dictionary must contain an id key and a text key.
Returns: Dict[str, Any] - A dictionary confirming the documents were queued.
Raises: NamespaceNotFound, InvalidInputError.

Example

Upload Documents Example
from moorcheh_sdk import MoorchehClient

with MoorchehClient() as client:
    documents_to_upload = [
        {
            "id": "faq-1",
            "text": "To reset your password, go to the account settings page.",
            "category": "account"
        },
        {
            "id": "faq-2",
            "text": "Our return policy allows returns within 30 days of purchase.",
            "category": "shipping"
        }
    ]

    status = client.documents.upload(
        namespace_name="my-faq-documents",
        documents=documents_to_upload
    )
    print(status)

Document Structure

Each document in the documents array is a flat object with these properties:
  • id (required): Unique identifier for the document (string or number)
  • text (required): The main text content of the document
  • Additional fields: Any other fields are treated as metadata
Well-Structured Documents
documents = [
    {
        "id": "article-123",
        "text": "Full article content here...",
        # Metadata fields: every key other than "id" and "text" is treated as metadata
        "title": "Introduction to Machine Learning",
        "author": "Dr. Smith",
        "category": "education",
        "publish_date": "2024-01-15",
        "tags": ["ml", "ai", "tutorial"],
        "difficulty": "beginner"
    }
]
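
The structure described above can be checked locally before calling upload. The following is a minimal sketch, not part of the SDK: `split_document` and `REQUIRED_KEYS` are hypothetical names introduced here, and the type checks simply mirror the property list above (id as a string or number, text as a string).

```python
from typing import Any, Dict, Tuple

# Keys that the property list above marks as required.
REQUIRED_KEYS = ("id", "text")


def split_document(doc: Dict[str, Any]) -> Tuple[Any, str, Dict[str, Any]]:
    """Return (id, text, metadata) for one document, or raise ValueError.

    Hypothetical helper for illustration; the SDK performs its own validation.
    """
    missing = [key for key in REQUIRED_KEYS if key not in doc]
    if missing:
        raise ValueError(f"document is missing required keys: {missing}")

    doc_id = doc["id"]
    # bool is a subclass of int in Python, so reject it explicitly.
    if isinstance(doc_id, bool) or not isinstance(doc_id, (str, int, float)):
        raise ValueError("id must be a string or a number")
    if not isinstance(doc["text"], str):
        raise ValueError("text must be a string")

    # Everything that is not id or text is treated as metadata.
    metadata = {k: v for k, v in doc.items() if k not in REQUIRED_KEYS}
    return doc_id, doc["text"], metadata
```

Running each document through a check like this before upload surfaces a missing id or text key locally rather than as an error from the service.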

Complete Example

Complete Data Management Workflow
from moorcheh_sdk import MoorchehClient
import time

with MoorchehClient() as client:
    # 1. Create a namespace
    client.namespaces.create("my-data", type="text")

    # 2. Upload documents
    docs = [
        {
            "id": "doc-1",
            "text": "This is the first document",
            "category": "tutorial",
            "author": "John Doe"
        },
        {
            "id": "doc-2",
            "text": "This is the second document",
            "category": "guide",
            "author": "Jane Smith"
        }
    ]

    upload_result = client.documents.upload(namespace_name="my-data", documents=docs)
    print(f"Upload status: {upload_result}")

    # 3. Wait for processing (text documents need time to be embedded)
    print("Waiting for document processing...")
    time.sleep(5)
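
A fixed sleep is the simplest way to wait, but the time needed can vary. As an alternative, the sketch below polls a caller-supplied check until it succeeds or a timeout expires. `wait_until` is a hypothetical helper, not an SDK function, and this page does not document a status call, so what `is_ready` actually checks (for example, whether a search returns an expected document) is left to the caller.

```python
import time
from typing import Callable


def wait_until(is_ready: Callable[[], bool],
               timeout: float = 30.0,
               interval: float = 1.0) -> bool:
    """Call is_ready() repeatedly until it returns True or timeout elapses.

    Returns True if the check succeeded, False if the timeout was reached.
    Hypothetical helper; is_ready is supplied by the caller.
    """
    deadline = time.monotonic() + timeout
    while True:
        if is_ready():
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)
```

In the workflow above, a call to `wait_until` with a suitable check could replace `time.sleep(5)`.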

Important Notes

Asynchronous Processing: Text documents are processed asynchronously. Allow a few seconds after upload before searching.
ID Uniqueness: Document IDs must be unique within their namespace. Uploading with an existing ID will overwrite the previous entry.
Batch Processing: For large datasets, split the upload into multiple requests. Each request accepts at most 100 documents, and batches of 25-50 are recommended for optimal performance.
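
The ID uniqueness and batching notes can be handled with two small helpers. This is a sketch under stated assumptions: `find_duplicate_ids` and `iter_batches` are hypothetical names, not SDK functions, and the default batch size of 50 follows the recommended range listed under Document Limits.

```python
from typing import Any, Dict, Iterator, List


def find_duplicate_ids(documents: List[Dict[str, Any]]) -> List[Any]:
    """Return IDs that occur more than once, in first-repeat order."""
    seen = set()
    duplicates: List[Any] = []
    for doc in documents:
        doc_id = doc["id"]
        if doc_id in seen and doc_id not in duplicates:
            duplicates.append(doc_id)
        seen.add(doc_id)
    return duplicates


def iter_batches(documents: List[Dict[str, Any]],
                 batch_size: int = 50) -> Iterator[List[Dict[str, Any]]]:
    """Yield consecutive slices of at most batch_size documents."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    for start in range(0, len(documents), batch_size):
        yield documents[start:start + batch_size]


# Usage sketch (requires a configured client, so shown as comments):
# for batch in iter_batches(docs):
#     client.documents.upload(namespace_name="my-data", documents=batch)
```

Checking for duplicates first matters because a repeated ID overwrites the earlier entry, which would otherwise happen silently within a large dataset.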

Best Practices

  • Keep documents focused on a single topic
  • Include meaningful titles and metadata
  • Use consistent metadata schemas across documents
  • Break large documents into logical chunks
  • Upload in batches of 25-50 documents for optimal performance
  • Use meaningful document IDs for easier management

Document Limits

  • Text Length: Min 10 characters, Max 50,000 characters per document
  • Batch Size: Max 100 documents per request, Recommended 25-50
  • Metadata Size: Max 2KB per document, Up to 50 metadata keys
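
The limits above can be checked client-side before a request is sent. The sketch below is illustrative only: `check_limits` is a hypothetical helper, "2KB" is assumed to mean 2048 bytes, and measuring metadata as UTF-8 encoded JSON is an assumption, since this page does not state how the service measures metadata size.

```python
import json
from typing import Any, Dict, List

MIN_TEXT_CHARS = 10
MAX_TEXT_CHARS = 50_000
MAX_METADATA_KEYS = 50
MAX_METADATA_BYTES = 2 * 1024  # assumption: "2KB" means 2048 bytes


def check_limits(doc: Dict[str, Any]) -> List[str]:
    """Return a list of limit violations for one document (empty if none)."""
    problems: List[str] = []

    text = doc.get("text", "")
    if not MIN_TEXT_CHARS <= len(text) <= MAX_TEXT_CHARS:
        problems.append(
            f"text has {len(text)} characters "
            f"(allowed {MIN_TEXT_CHARS}-{MAX_TEXT_CHARS})"
        )

    # Every key other than id and text counts as metadata.
    metadata = {k: v for k, v in doc.items() if k not in ("id", "text")}
    if len(metadata) > MAX_METADATA_KEYS:
        problems.append(
            f"metadata has {len(metadata)} keys (max {MAX_METADATA_KEYS})"
        )

    # Assumption: size is approximated as the UTF-8 length of the JSON form.
    encoded = json.dumps(metadata, ensure_ascii=False, default=str)
    size = len(encoded.encode("utf-8"))
    if size > MAX_METADATA_BYTES:
        problems.append(
            f"metadata is {size} bytes (max {MAX_METADATA_BYTES})"
        )

    return problems
```

Returning a list of messages rather than raising on the first violation lets a caller report every problem in a batch at once.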