Collections

Collections are containers that group related data from one or more source connections, enabling unified search across multiple data sources. Think of them as logical databases where you organize your integrated data.

Overview

A collection serves as a namespace for your data sources. When you create a collection, Airweave:

Assigns a unique readable_id for use in API calls and URLs
Configures vector storage with your organization’s embedding model
Sets up a dedicated search endpoint
Creates a container for source connections to sync data into

Collections are isolated per organization. Each collection can have its own sync configuration and contains data from one or more authenticated source connections.

Creating a Collection

Create a new collection to start organizing your data sources.

Choose a descriptive name

Pick a name that clearly identifies the data contained within (e.g., “Customer Support”, “Finance Data”, “Engineering Docs”).

Optionally set a readable_id

If you don’t provide one, Airweave will auto-generate a URL-safe identifier from your collection name with a random suffix (e.g., customer-support-x7k9m).

Create via API or SDK

Use the REST API or SDK to create your collection.

from airweave import AirweaveSDK

client = AirweaveSDK(api_key="YOUR_API_KEY")

# Simple creation with auto-generated readable_id
collection = client.collections.create(
    name="Customer Support"
)

print(f"Collection ID: {collection.readable_id}")
# Example output: Collection ID: customer-support-x7k9m

# Creation with custom readable_id
collection = client.collections.create(
    name="Finance Reports",
    readable_id="finance-data-2024"
)
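To make the auto-generated identifier described above more concrete, here is a minimal sketch of how a URL-safe ID with a random suffix could be derived from a display name. The helper names `slugify` and `make_readable_id` are hypothetical and not part of the Airweave SDK, and the suffix length is left as a parameter; Airweave's actual generator may behave differently.

```python
import re
import secrets
import string

# Hypothetical helpers for illustration only; not part of the Airweave SDK.
_SUFFIX_ALPHABET = string.ascii_lowercase + string.digits


def slugify(name: str) -> str:
    """Lowercase the name and collapse every run of other characters into one hyphen."""
    slug = re.sub(r"[^a-z0-9]+", "-", name.lower())
    return slug.strip("-")


def make_readable_id(name: str, suffix_length: int = 6) -> str:
    """Build '<slug>-<random suffix>' from a display name."""
    slug = slugify(name)
    if not slug:
        raise ValueError("name must contain at least one ASCII letter or digit")
    suffix = "".join(secrets.choice(_SUFFIX_ALPHABET) for _ in range(suffix_length))
    return f"{slug}-{suffix}"


print(slugify("Customer Support"))  # customer-support
print(make_readable_id("Customer Support"))  # random suffix, differs on every run
```

Because the suffix is random, pass an explicit readable_id when you need a predictable identifier, for example in configuration files or scripts.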

Request Parameters

name

string

required

Human-readable display name for the collection. Must be between 4 and 64 characters. Examples: "Finance Data", "Customer Support", "Marketing Analytics"

readable_id

string

URL-safe unique identifier. Must contain only lowercase letters, numbers, and hyphens. Cannot start or end with a hyphen. If not provided, automatically generated from the name with a random 6-character suffix. Examples: "finance-data-ab123", "customer-support-xy789"

sync_config

object

Optional default sync configuration for all syncs in this collection. Can be overridden at the sync or job level.

{
  "handlers": {
    "enable_vector_handlers": true,
    "enable_postgres_handler": true
  }
}
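The constraints on name and readable_id listed above can be checked on the client before a request is sent. The following is a sketch under that assumption: `validate_collection_params` is a hypothetical helper, not an SDK function, and the server remains the source of truth for validation.

```python
import re
from typing import List, Optional

# Only lowercase letters, digits, and hyphens; no leading or trailing hyphen.
READABLE_ID_PATTERN = re.compile(r"[a-z0-9](?:[a-z0-9-]*[a-z0-9])?")


def validate_collection_params(name: str, readable_id: Optional[str] = None) -> List[str]:
    """Return a list of problems; an empty list means the input matches the documented rules."""
    problems: List[str] = []
    if not 4 <= len(name) <= 64:
        problems.append("name must be between 4 and 64 characters")
    if readable_id is not None and not READABLE_ID_PATTERN.fullmatch(readable_id):
        problems.append(
            "readable_id may contain only lowercase letters, numbers, and hyphens, "
            "and cannot start or end with a hyphen"
        )
    return problems


print(validate_collection_params("Finance Reports", "finance-data-2024"))  # []
print(validate_collection_params("abc", "-Bad_ID"))  # two problems reported
```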

Response Fields

The API returns a Collection object with the following fields:

id

UUID

Unique system identifier (auto-generated)

name

string

Display name of the collection

readable_id

string

URL-safe identifier for API endpoints

status

string

Current operational status:

NEEDS_SOURCE: No authenticated connections or no successful syncs yet
ACTIVE: At least one connection has completed a successful sync, or a sync is currently running
ERROR: All connections failed their last sync

vector_size

integer

Embedding dimensions (derived from deployment metadata)

embedding_model_name

string

Name of the embedding model (e.g., "text-embedding-3-large")

organization_id

UUID

Your organization identifier

created_at

ISO 8601 datetime

Timestamp when created

modified_at

ISO 8601 datetime

Timestamp of last modification

Listing Collections

Retrieve all collections in your organization with optional pagination and search filtering.

# List all collections
collections = client.collections.list()

# Paginated listing
collections = client.collections.list(
    skip=0,
    limit=50
)

# Search by name or readable_id
collections = client.collections.list(
    search="customer"
)

# Get total count
count = client.collections.count()
print(f"Total collections: {count}")

Query Parameters

skip

integer

default:"0"

Number of collections to skip for pagination (min: 0)

limit

integer

default:"100"

Maximum number of collections to return (1-1000)

search

string

Filter collections by name or readable_id (case-insensitive partial match)

Getting a Collection

Retrieve details of a specific collection by its readable_id.

collection = client.collections.get(
    readable_id="customer-support-x7k9m"
)

print(f"Name: {collection.name}")
print(f"Status: {collection.status}")
print(f"Model: {collection.embedding_model_name}")
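A common use of the get call is to wait for a new collection to become searchable. The sketch below polls a getter until the status is ACTIVE. It assumes that status compares equal to the strings shown in this document (an SDK may expose an enum instead), and `wait_until_active` is a hypothetical helper rather than an SDK function. The getter is passed in as a callable so the example runs with a fake object.

```python
import time
from types import SimpleNamespace
from typing import Any, Callable


def wait_until_active(
    get_collection: Callable[[], Any],
    timeout: float = 300.0,
    interval: float = 5.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Any:
    """Poll get_collection() until status is ACTIVE; raise on ERROR or on timeout."""
    deadline = time.monotonic() + timeout
    while True:
        collection = get_collection()
        if collection.status == "ACTIVE":
            return collection
        if collection.status == "ERROR":
            raise RuntimeError("all source connections failed their last sync")
        if time.monotonic() >= deadline:
            raise TimeoutError("collection did not become ACTIVE in time")
        sleep(interval)


# Fake getter standing in for a call such as client.collections.get(readable_id=...).
_states = iter(["NEEDS_SOURCE", "NEEDS_SOURCE", "ACTIVE"])


def fake_get() -> SimpleNamespace:
    return SimpleNamespace(status=next(_states))


ready = wait_until_active(fake_get, sleep=lambda seconds: None)
print(ready.status)  # ACTIVE
```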

Updating a Collection

Modify a collection’s name or sync configuration. The readable_id cannot be changed after creation.

# Update collection name
updated = client.collections.update(
    readable_id="customer-support-x7k9m",
    name="Customer Support Archive"
)

# Update sync configuration
updated = client.collections.update(
    readable_id="finance-data-2024",
    sync_config={
        "handlers": {
            "enable_vector_handlers": True,
            "enable_postgres_handler": False
        }
    }
)

The readable_id is immutable to maintain stable API endpoints and preserve existing integrations. Only the name and sync_config can be updated.
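Since a collection's sync_config is a default that can be overridden at the sync or job level, it can help to picture how the layers combine. The sketch below assumes a key-by-key merge in which the job level wins over the sync level, which in turn wins over the collection default. That precedence order and the merge semantics are assumptions for illustration, and `merge_config` and `effective_sync_config` are hypothetical names; Airweave's actual resolution logic may differ.

```python
from typing import Any, Dict, Optional

Config = Dict[str, Any]


def merge_config(base: Config, override: Optional[Config]) -> Config:
    """Return a new dict where override values win; nested dicts are merged key by key."""
    merged = dict(base)
    for key, value in (override or {}).items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = merge_config(merged[key], value)
        else:
            merged[key] = value
    return merged


def effective_sync_config(
    collection: Config,
    sync: Optional[Config] = None,
    job: Optional[Config] = None,
) -> Config:
    """Apply layers in order: collection defaults, then sync overrides, then job overrides."""
    return merge_config(merge_config(collection, sync), job)


collection_defaults = {
    "handlers": {"enable_vector_handlers": True, "enable_postgres_handler": True}
}
job_override = {"handlers": {"enable_postgres_handler": False}}

print(effective_sync_config(collection_defaults, job=job_override))
# {'handlers': {'enable_vector_handlers': True, 'enable_postgres_handler': False}}
```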

Deleting a Collection

Permanently delete a collection and all associated data. This operation cannot be undone.

deleted = client.collections.delete(
    readable_id="customer-support-x7k9m"
)

print(f"Deleted: {deleted.name}")

Deleting a collection:

Removes all synced data from the vector database
Deletes all source connections within the collection
Cancels any scheduled sync jobs
Cleans up all related resources

All data will be permanently deleted.

Collection Status

Collections have three possible statuses that reflect their operational state:

NEEDS_SOURCE

The collection has no authenticated source connections, or connections exist but haven’t completed a successful sync yet. Next steps: Add and authenticate a source connection, then trigger a sync.

ACTIVE

At least one source connection has completed a sync successfully, or a sync is currently running. This is the normal operating state for collections with data.

ERROR

All source connections have failed their most recent sync attempt. Action required: Check the sync error logs and fix authentication or configuration issues.
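The three status rules above can be read as a small decision function over the most recent sync outcome of each authenticated connection. The following is a sketch of that reading, not Airweave's implementation: the `LastSync` enum and `collection_status` function are hypothetical, and the handling of mixed cases (for example one failed connection plus one that has never synced) is an assumption.

```python
from enum import Enum
from typing import Iterable


class LastSync(Enum):
    """Hypothetical summary of a connection's most recent sync."""
    NONE = "none"            # connection exists but has never synced
    RUNNING = "running"
    SUCCEEDED = "succeeded"
    FAILED = "failed"


def collection_status(last_syncs: Iterable[LastSync]) -> str:
    """Map per-connection sync outcomes to NEEDS_SOURCE, ACTIVE, or ERROR."""
    outcomes = list(last_syncs)
    # ACTIVE: at least one successful sync, or a sync currently running.
    if any(o in (LastSync.SUCCEEDED, LastSync.RUNNING) for o in outcomes):
        return "ACTIVE"
    # ERROR: there are connections and every one failed its last sync.
    if outcomes and all(o is LastSync.FAILED for o in outcomes):
        return "ERROR"
    # NEEDS_SOURCE: no connections, or none has completed a successful sync yet.
    return "NEEDS_SOURCE"


print(collection_status([]))                                    # NEEDS_SOURCE
print(collection_status([LastSync.FAILED, LastSync.SUCCEEDED]))  # ACTIVE
print(collection_status([LastSync.FAILED, LastSync.FAILED]))     # ERROR
```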

Next Steps

Search

Webhooks
