Collections are containers that group related data from one or more source connections, enabling unified search across multiple data sources. Think of them as logical databases where you organize your integrated data.

Overview

A collection serves as a namespace for your data sources. When you create a collection, Airweave:
  • Assigns a unique readable_id for use in API calls and URLs
  • Configures vector storage with your organization’s embedding model
  • Sets up a dedicated search endpoint
  • Creates a container for source connections to sync data into
Collections are isolated per organization. Each collection can have its own sync configuration and contains data from one or more authenticated source connections.

Creating a Collection

Create a new collection to start organizing your data sources.
1. Choose a descriptive name

Pick a name that clearly identifies the data contained within (e.g., “Customer Support”, “Finance Data”, “Engineering Docs”).

2. Optionally set a readable_id

If you don’t provide one, Airweave auto-generates a URL-safe identifier from your collection name with a random suffix (e.g., customer-support-x7k9m).

3. Create via API or SDK

Use the REST API or SDK to create your collection.
from airweave import AirweaveSDK

client = AirweaveSDK(api_key="YOUR_API_KEY")

# Simple creation with auto-generated readable_id
collection = client.collections.create(
    name="Customer Support"
)

print(f"Collection ID: {collection.readable_id}")
# Example output (the suffix is random): Collection ID: customer-support-x7k9m

# Creation with custom readable_id
collection = client.collections.create(
    name="Finance Reports",
    readable_id="finance-data-2024"
)
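
To make the auto-generation rule from step 2 concrete, here is a minimal sketch of how a URL-safe identifier with a random suffix could be derived from a name. The helper `make_readable_id`, the suffix alphabet, and the default suffix length are assumptions for illustration; this is not Airweave's actual implementation.

```python
import re
import secrets
import string

# Hypothetical helper: mirrors the documented rule, not Airweave's actual code.
_SUFFIX_ALPHABET = string.ascii_lowercase + string.digits


def make_readable_id(name: str, suffix_length: int = 6) -> str:
    """Build a URL-safe identifier from a display name plus a random suffix."""
    # Lowercase, then collapse every run of other characters into one hyphen.
    slug = re.sub(r"[^a-z0-9]+", "-", name.lower()).strip("-")
    suffix = "".join(secrets.choice(_SUFFIX_ALPHABET) for _ in range(suffix_length))
    # Fall back to the suffix alone if the name had no usable characters.
    return f"{slug}-{suffix}" if slug else suffix


print(make_readable_id("Customer Support"))  # customer-support-<random suffix>
```

Passing your own readable_id, as in the second SDK call above, avoids the random suffix entirely and keeps identifiers predictable across environments.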

Request Parameters

  • name (string, required): Human-readable display name for the collection. Must be between 4 and 64 characters. Examples: "Finance Data", "Customer Support", "Marketing Analytics"
  • readable_id (string): URL-safe unique identifier. Must contain only lowercase letters, numbers, and hyphens, and cannot start or end with a hyphen. If not provided, it is generated automatically from the name with a random 6-character suffix. Examples: "finance-data-ab123", "customer-support-xy789"
  • sync_config (object): Optional default sync configuration for all syncs in this collection. Can be overridden at the sync or job level. Example:
{
  "handlers": {
    "enable_vector_handlers": true,
    "enable_postgres_handler": true
  }
}
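
The constraints on name and readable_id can be checked on the client before a request is sent. The following is a minimal sketch of such a check; `validate_collection_params` and `READABLE_ID_PATTERN` are hypothetical names, and the server remains the authority on what is accepted.

```python
import re
from typing import List, Optional

# Lowercase letters, digits, and hyphens; no hyphen at the start or end.
# Consecutive hyphens are allowed because the rules above do not forbid them.
READABLE_ID_PATTERN = re.compile(r"[a-z0-9](?:[a-z0-9-]*[a-z0-9])?")


def validate_collection_params(name: str, readable_id: Optional[str] = None) -> List[str]:
    """Return a list of problems; an empty list means the input looks valid."""
    problems = []
    if not 4 <= len(name) <= 64:
        problems.append("name must be between 4 and 64 characters")
    if readable_id is not None and not READABLE_ID_PATTERN.fullmatch(readable_id):
        problems.append(
            "readable_id must use only lowercase letters, numbers, and hyphens, "
            "and cannot start or end with a hyphen"
        )
    return problems


print(validate_collection_params("Finance Reports", "finance-data-2024"))  # []
print(len(validate_collection_params("Fin", "-Finance-")))  # 2
```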

Response Fields

The API returns a Collection object with the following fields:
  • id (UUID): Unique system identifier (auto-generated)
  • name (string): Display name of the collection
  • readable_id (string): URL-safe identifier for API endpoints
  • status (string): Current operational status:
      • NEEDS_SOURCE: No authenticated connections or no successful syncs yet
      • ACTIVE: At least one connection has completed a sync
      • ERROR: All connections failed their last sync
  • vector_size (integer): Embedding dimensions (derived from deployment metadata)
  • embedding_model_name (string): Name of the embedding model (e.g., "text-embedding-3-large")
  • organization_id (UUID): Your organization identifier
  • created_at (ISO 8601 datetime): Timestamp when created
  • modified_at (ISO 8601 datetime): Timestamp of last modification
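
If you work with the raw JSON response instead of the SDK object, a typed wrapper can make the field types explicit. The sketch below is an assumption-laden illustration: `CollectionRecord` is a hypothetical class, the sample payload uses made-up values, and timestamps are assumed to carry an explicit UTC offset so that `datetime.fromisoformat` accepts them on older Python versions.

```python
from dataclasses import dataclass
from datetime import datetime
from uuid import UUID


@dataclass(frozen=True)
class CollectionRecord:
    """Hypothetical typed view of the documented response fields."""
    id: UUID
    name: str
    readable_id: str
    status: str
    vector_size: int
    embedding_model_name: str
    organization_id: UUID
    created_at: datetime
    modified_at: datetime

    @classmethod
    def from_json(cls, payload: dict) -> "CollectionRecord":
        # Convert string fields into richer Python types where the docs name one.
        return cls(
            id=UUID(payload["id"]),
            name=payload["name"],
            readable_id=payload["readable_id"],
            status=payload["status"],
            vector_size=int(payload["vector_size"]),
            embedding_model_name=payload["embedding_model_name"],
            organization_id=UUID(payload["organization_id"]),
            created_at=datetime.fromisoformat(payload["created_at"]),
            modified_at=datetime.fromisoformat(payload["modified_at"]),
        )


# Made-up sample values, for illustration only.
sample = {
    "id": "00000000-0000-0000-0000-000000000001",
    "name": "Customer Support",
    "readable_id": "customer-support-x7k9m",
    "status": "NEEDS_SOURCE",
    "vector_size": 3072,
    "embedding_model_name": "text-embedding-3-large",
    "organization_id": "00000000-0000-0000-0000-000000000002",
    "created_at": "2024-01-15T09:30:00+00:00",
    "modified_at": "2024-01-15T09:30:00+00:00",
}

record = CollectionRecord.from_json(sample)
print(record.readable_id, record.status)
```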

Listing Collections

Retrieve all collections in your organization with optional pagination and search filtering.
# List all collections
collections = client.collections.list()

# Paginated listing
collections = client.collections.list(
    skip=0,
    limit=50
)

# Search by name or readable_id
collections = client.collections.list(
    search="customer"
)

# Get total count
count = client.collections.count()
print(f"Total collections: {count}")

Query Parameters

  • skip (integer, default: 0): Number of collections to skip for pagination (min: 0)
  • limit (integer, default: 100): Maximum number of collections to return (1-1000)
  • search (string): Filter collections by name or readable_id (case-insensitive partial match)

Getting a Collection

Retrieve details of a specific collection by its readable_id.
collection = client.collections.get(
    readable_id="customer-support-x7k9m"
)

print(f"Name: {collection.name}")
print(f"Status: {collection.status}")
print(f"Model: {collection.embedding_model_name}")
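
A common use of the get call is to wait for a freshly connected collection to become searchable. The sketch below polls the status field until it reads ACTIVE. `wait_until_active` is a hypothetical helper, the timeout and interval defaults are arbitrary, and it assumes status compares equal to the plain strings listed under Response Fields. A fake getter replaces `client.collections.get` so the example runs offline.

```python
import time
from types import SimpleNamespace
from typing import Any, Callable


def wait_until_active(
    get_collection: Callable[..., Any],
    readable_id: str,
    timeout_s: float = 300.0,
    interval_s: float = 5.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Any:
    """Poll a collection until its status is ACTIVE; raise on ERROR or timeout."""
    deadline = time.monotonic() + timeout_s
    while True:
        collection = get_collection(readable_id=readable_id)
        if collection.status == "ACTIVE":
            return collection
        if collection.status == "ERROR":
            raise RuntimeError(f"Collection {readable_id} is in ERROR status")
        if time.monotonic() >= deadline:
            raise TimeoutError(f"Collection {readable_id} not ACTIVE after {timeout_s}s")
        sleep(interval_s)


# Fake getter that reports NEEDS_SOURCE twice, then ACTIVE.
_states = iter(["NEEDS_SOURCE", "NEEDS_SOURCE", "ACTIVE"])


def fake_get(readable_id: str) -> SimpleNamespace:
    return SimpleNamespace(readable_id=readable_id, status=next(_states))


result = wait_until_active(fake_get, "customer-support-x7k9m", sleep=lambda _s: None)
print(result.status)  # ACTIVE
```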

Updating a Collection

Modify a collection’s name or sync configuration. The readable_id cannot be changed after creation.
# Update collection name
updated = client.collections.update(
    readable_id="customer-support-x7k9m",
    name="Customer Support Archive"
)

# Update sync configuration
updated = client.collections.update(
    readable_id="finance-data-2024",
    sync_config={
        "handlers": {
            "enable_vector_handlers": True,
            "enable_postgres_handler": False
        }
    }
)
The readable_id is immutable to maintain stable API endpoints and preserve existing integrations. Only the name and sync_config can be updated.
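
The sync_config stored on a collection is described as a default that can be overridden at the sync or job level. How overrides combine is not specified here, so the following sketch rests on an assumption: that overrides are merged key by key, with the more specific level winning. Wholesale replacement of the object is equally plausible. `merge_config` is a hypothetical helper.

```python
from typing import Any, Dict


def merge_config(base: Dict[str, Any], override: Dict[str, Any]) -> Dict[str, Any]:
    """Return a new dict with override applied on top of base, merging nested dicts."""
    merged = dict(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = merge_config(merged[key], value)
        else:
            merged[key] = value
    return merged


collection_default = {
    "handlers": {"enable_vector_handlers": True, "enable_postgres_handler": True}
}
sync_override = {"handlers": {"enable_postgres_handler": False}}
job_override: Dict[str, Any] = {}

# Assumed precedence: collection default, then sync level, then job level.
effective = merge_config(merge_config(collection_default, sync_override), job_override)
print(effective)
# {'handlers': {'enable_vector_handlers': True, 'enable_postgres_handler': False}}
```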

Deleting a Collection

Permanently delete a collection and all associated data. This operation cannot be undone.
deleted = client.collections.delete(
    readable_id="customer-support-x7k9m"
)

print(f"Deleted: {deleted.name}")
Warning: deleting a collection:
  • Removes all synced data from the vector database
  • Deletes all source connections within the collection
  • Cancels any scheduled sync jobs
  • Cleans up all related resources

Collection Status

Collections have three possible statuses that reflect their operational state:
  • NEEDS_SOURCE: The collection has no authenticated source connections, or connections exist but haven’t completed a successful sync yet. Next steps: add and authenticate a source connection, then trigger a sync.
  • ACTIVE: At least one source connection has completed a sync successfully, or a sync is currently running. This is the normal operating state for collections with data.
  • ERROR: All source connections have failed their most recent sync attempt. Action required: check the sync error logs and fix authentication or configuration issues.
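
The status rules can be read as a small decision function over each connection's most recent sync outcome. The sketch below is one interpretation, not Airweave's implementation: the `LastSync` enum and `derive_status` are hypothetical, and the order in which the rules are checked (ACTIVE before ERROR) is an assumption.

```python
from enum import Enum
from typing import Iterable


class LastSync(Enum):
    """Hypothetical outcome of a connection's most recent sync."""
    NONE = "none"            # not authenticated yet, or never synced
    RUNNING = "running"
    SUCCEEDED = "succeeded"
    FAILED = "failed"


def derive_status(last_syncs: Iterable[LastSync]) -> str:
    """Map per-connection sync outcomes to a collection status."""
    outcomes = list(last_syncs)
    if not outcomes:
        return "NEEDS_SOURCE"  # no source connections at all
    if any(o in (LastSync.SUCCEEDED, LastSync.RUNNING) for o in outcomes):
        return "ACTIVE"        # at least one success, or a sync in progress
    if all(o is LastSync.FAILED for o in outcomes):
        return "ERROR"         # every connection failed its latest sync
    return "NEEDS_SOURCE"      # connections exist, none has synced successfully


print(derive_status([]))                                    # NEEDS_SOURCE
print(derive_status([LastSync.FAILED, LastSync.SUCCEEDED]))  # ACTIVE
print(derive_status([LastSync.FAILED, LastSync.FAILED]))     # ERROR
```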

Next Steps

Search

Learn how to search across your collection data

Webhooks

Set up real-time notifications for collection events