documents.get

Retrieves documents by ID from a namespace. The documents must already have been uploaded and indexed in that namespace.
For semantic search and similarity-based retrieval, use the Search API instead.

Parameters

namespace_name
str
required
The name of the namespace containing the documents.
ids
List[Union[str, int]]
required
A list of document IDs to retrieve (max 100 IDs per request).
Returns: Dict[str, Any] - A dictionary containing the retrieved documents.
Raises: NamespaceNotFound, InvalidInputError.
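
Because ids accepts at most 100 entries per request, a longer list has to be split before calling documents.get. The following is a minimal sketch of one way to do that. The helper name chunk_ids and the constant MAX_IDS_PER_REQUEST are illustrative and not part of the SDK.

```python
from typing import Iterator, List, Sequence, Union

DocId = Union[str, int]
MAX_IDS_PER_REQUEST = 100  # limit stated in the ids parameter description


def chunk_ids(
    ids: Sequence[DocId], size: int = MAX_IDS_PER_REQUEST
) -> Iterator[List[DocId]]:
    """Yield consecutive slices of `ids`, each at most `size` long."""
    if size < 1:
        raise ValueError("size must be at least 1")
    for start in range(0, len(ids), size):
        yield list(ids[start:start + size])


# 250 made-up IDs split into request-sized batches.
batches = list(chunk_ids([f"doc-{i}" for i in range(250)]))
print([len(batch) for batch in batches])  # [100, 100, 50]
```

Each batch can then be passed as the ids argument of a separate documents.get call.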

Example

Get Documents Example
from moorcheh_sdk import MoorchehClient

with MoorchehClient() as client:
    # Get specific documents by ID
    result = client.documents.get(
        namespace_name="my-faq-documents",
        ids=["faq-1", "faq-2", "faq-3"]
    )
    
    for item in result.get('items', []):
        print(f"ID: {item['id']}")
        print(f"Text: {item['text']}")
        print(f"Metadata: {item.get('metadata', {})}")

Response Structure

The response contains:
  • status (str): “success” or “partial”
  • message (str): Human-readable message
  • requested_ids (int): Number of document IDs requested
  • found_items (int): Number of documents successfully found
  • items (list): Array of retrieved document objects
  • not_found_ids (list, optional): IDs that were not found (for partial success)
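
As a sketch of how these fields fit together, the helper below reports whether every requested document came back and which IDs are missing. The function name check_response is hypothetical, and the sample dictionary is hand-written to match the field list above; it is not real API output.

```python
from typing import Any, Dict, List, Tuple


def check_response(response: Dict[str, Any]) -> Tuple[bool, List[Any]]:
    """Return (all_found, missing_ids) for a documents.get response dict."""
    requested = response.get("requested_ids", 0)
    found = response.get("found_items", 0)
    # not_found_ids is optional, so fall back to an empty list.
    missing = list(response.get("not_found_ids", []))
    all_found = response.get("status") == "success" and found == requested
    return all_found, missing


# Hand-written sample shaped like the field list; values are made up.
sample = {
    "status": "partial",
    "message": "2 of 3 documents found",
    "requested_ids": 3,
    "found_items": 2,
    "items": [
        {"id": "faq-1", "text": "..."},
        {"id": "faq-2", "text": "..."},
    ],
    "not_found_ids": ["faq-3"],
}

all_found, missing = check_response(sample)
print(all_found, missing)  # False ['faq-3']
```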

Complete Example

from moorcheh_sdk import MoorchehClient

with MoorchehClient() as client:
    namespace = "my-documents"
    
    # Retrieve multiple documents
    result = client.documents.get(
        namespace_name=namespace,
        ids=["doc-1", "doc-2", "doc-3", "doc-4", "doc-5"]
    )
    
    print(f"Requested: {result.get('requested_ids', 0)}")
    print(f"Found: {result.get('found_items', 0)}")
    
    # Process retrieved documents
    for item in result.get('items', []):
        print(f"\nDocument ID: {item['id']}")
        print(f"Text: {item['text'][:100]}...")  # First 100 chars
        if item.get('metadata'):
            print(f"Metadata: {item['metadata']}")
    
    # Handle partial success
    if result.get('status') == 'partial':
        not_found = result.get('not_found_ids', [])
        if not_found:
            print(f"\nDocuments not found: {not_found}")

Key Features

  • Batch Retrieval: Retrieve up to 100 documents in a single request
  • Partial Success: IDs that do not exist do not raise an error; the response has status “partial” and lists them in not_found_ids
  • Efficient Processing: Uses optimized batch retrieval for performance
  • Flexible IDs: Document IDs can be strings or numbers

Best Practices

  • Use the maximum batch size (100 documents) when possible
  • Group related document retrievals to minimize API calls
  • Always compare found_items against requested_ids to detect missing documents
  • Handle partial success responses gracefully
  • Cache frequently accessed documents client-side
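
The batching and caching advice above could be combined roughly as follows. This is a sketch under stated assumptions, not an SDK feature: the CachingFetcher class is hypothetical, it assumes each returned item carries an id equal to the ID that was requested, and it takes the actual retrieval as a callable so the example runs without network access.

```python
from typing import Any, Callable, Dict, List, Sequence, Union

DocId = Union[str, int]
Fetch = Callable[[List[DocId]], Dict[str, Any]]


class CachingFetcher:
    """Hypothetical helper: caches documents by ID and batches cache misses."""

    def __init__(self, fetch: Fetch, batch_size: int = 100) -> None:
        self._fetch = fetch
        self._batch_size = batch_size
        self._cache: Dict[DocId, Dict[str, Any]] = {}

    def get(self, ids: Sequence[DocId]) -> Dict[DocId, Dict[str, Any]]:
        # Request only IDs that are not cached yet, de-duplicated in order.
        misses = list(dict.fromkeys(i for i in ids if i not in self._cache))
        for start in range(0, len(misses), self._batch_size):
            response = self._fetch(misses[start:start + self._batch_size])
            for item in response.get("items", []):
                self._cache[item["id"]] = item
        # IDs the backend did not return are absent from the result.
        return {i: self._cache[i] for i in ids if i in self._cache}


# Stand-in for the real call, recording what would have been requested.
calls: List[List[DocId]] = []


def fake_fetch(ids: List[DocId]) -> Dict[str, Any]:
    calls.append(list(ids))
    known = {"doc-1", "doc-2"}
    items = [{"id": i, "text": f"text of {i}"} for i in ids if i in known]
    status = "success" if len(items) == len(ids) else "partial"
    return {"status": status, "items": items}


fetcher = CachingFetcher(fake_fetch)
first = fetcher.get(["doc-1", "doc-2", "doc-9"])
second = fetcher.get(["doc-1", "doc-9"])  # doc-1 is served from the cache
print(calls)  # [['doc-1', 'doc-2', 'doc-9'], ['doc-9']]
```

With the real client, the callable could be lambda ids: client.documents.get(namespace_name="my-documents", ids=ids). Note that this sketch never evicts entries and re-requests IDs that were not found, so a production cache would likely need an expiry policy and negative caching.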

Use Cases

  • Document Retrieval: Fetch specific documents by ID for display or processing
  • Content Management: Access and manage previously uploaded documents
  • Data Export: Extract documents for backup or migration purposes
  • Quality Assurance: Review uploaded content for accuracy and completeness
  • Integration: Sync document data with external systems and applications
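
For the data export use case, one simple option is to write the retrieved items to a JSON Lines file, one document per line. The sketch below uses only the Python standard library. The function name export_items_jsonl and the sample items are illustrative; in practice the items would come from the items list of a documents.get response.

```python
import json
import os
import tempfile
from typing import Any, Dict, Iterable


def export_items_jsonl(items: Iterable[Dict[str, Any]], path: str) -> int:
    """Write one JSON object per line and return the number written."""
    count = 0
    with open(path, "w", encoding="utf-8") as handle:
        for item in items:
            handle.write(json.dumps(item, ensure_ascii=False) + "\n")
            count += 1
    return count


# Hand-written items shaped like entries of the response's items list.
sample_items = [
    {"id": "doc-1", "text": "First document", "metadata": {"source": "faq"}},
    {"id": 2, "text": "Second document"},
]

export_path = os.path.join(tempfile.mkdtemp(), "documents.jsonl")
written = export_items_jsonl(sample_items, export_path)
print(f"Wrote {written} documents to {export_path}")
```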