Guide

Define the table

Inherit from the Table class and define the columns as attributes with type hints. Advanced configuration can be expressed with typing.Annotated.
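To show how type hints and typing.Annotated can carry column configuration, here is a minimal sketch that uses only the standard library. It is not how vechord is implemented; IndexHint, Row and column_metadata are hypothetical names made up for this illustration.

```python
from dataclasses import dataclass
from typing import Annotated, get_args, get_origin, get_type_hints


@dataclass(frozen=True)
class IndexHint:
    """Hypothetical marker object, standing in for an index configuration."""

    lists: int


class Row:
    """Hypothetical table-like class: one annotated attribute per column."""

    uid: int
    vec: Annotated[list[float], IndexHint(lists=128)]
    text: str


def column_metadata(cls):
    """Map each attribute name to (base type, extra Annotated metadata)."""
    hints = get_type_hints(cls, include_extras=True)
    result = {}
    for name, hint in hints.items():
        if get_origin(hint) is Annotated:
            base, *extras = get_args(hint)
            result[name] = (base, tuple(extras))
        else:
            result[name] = (hint, ())
    return result


meta = column_metadata(Row)
```

The plain hints (uid, text) come back with empty metadata, while vec keeps both its base type and the attached marker. A library can read that marker at class-definition time to decide how to create the column or its index.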

Choose a primary key

  • PrimaryKeyAutoIncrease: generate an auto-incrementing integer as the primary key

  • PrimaryKeyUUID: use uuid7 as the primary key, suitable for distributed systems or general purposes

  • int or str: insert the key manually
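As background on why a uuid7 key suits distributed systems: the value starts with a millisecond timestamp, so keys generated on different machines sort roughly by creation time without any coordination. The sketch below builds such a value by hand following the UUIDv7 bit layout from RFC 9562. It is an illustration only, not the factory that PrimaryKeyUUID uses, and uuid7_sketch is a hypothetical name.

```python
import secrets
import time
import uuid


def uuid7_sketch(unix_ms=None):
    """Build a UUIDv7-style value: 48-bit timestamp, version, random bits."""
    if unix_ms is None:
        unix_ms = time.time_ns() // 1_000_000
    value = (unix_ms & ((1 << 48) - 1)) << 80  # bits 127..80: Unix time in ms
    value |= 0x7 << 76                         # bits 79..76: version 7
    value |= secrets.randbits(12) << 64        # bits 75..64: random
    value |= 0b10 << 62                        # bits 63..62: RFC variant
    value |= secrets.randbits(62)              # bits 61..0: random
    return uuid.UUID(int=value)


earlier = uuid7_sketch(unix_ms=1_000)
later = uuid7_sketch(unix_ms=2_000)
```

Because the timestamp occupies the most significant bits, earlier compares lower than later. Two values created within the same millisecond are ordered only by their random bits in this sketch.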

Configure the Index

The default index is suitable for small datasets (fewer than 100k rows). For larger datasets, you can customize the index configuration through typing.Annotated, for example:

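from typing import Annotated

import msgspec
from vechord.spec import PrimaryKeyUUID, Table, Vector, VectorIndex

# 3072-dimensional dense vector type
DenseVector = Vector[3072]

class MyTable(Table, kw_only=True):
    uid: PrimaryKeyUUID = msgspec.field(default_factory=PrimaryKeyUUID.factory)
    vec: Annotated[DenseVector, VectorIndex(lists=128)]
    text: str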
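For intuition about what a lists setting controls, the following toy sketch shows the general idea behind a partitioned (IVF-style) vector index: vectors are grouped into lists around centroids, and a query scans only the closest few lists. This is an assumption-laden illustration in pure Python, not the algorithm VectorChord actually runs; build_lists, search and the probes argument are hypothetical names for this sketch.

```python
import math
import random


def build_lists(vectors, centroids):
    """Assign each vector index to the list of its nearest centroid."""
    lists = [[] for _ in centroids]
    for idx, vec in enumerate(vectors):
        nearest = min(
            range(len(centroids)), key=lambda c: math.dist(vec, centroids[c])
        )
        lists[nearest].append(idx)
    return lists


def search(query, vectors, centroids, lists, probes, k):
    """Scan only the `probes` lists whose centroids are closest to the query."""
    ranked = sorted(
        range(len(centroids)), key=lambda c: math.dist(query, centroids[c])
    )
    candidates = [idx for c in ranked[:probes] for idx in lists[c]]
    return sorted(candidates, key=lambda i: math.dist(query, vectors[i]))[:k]


random.seed(0)
vectors = [(random.random(), random.random()) for _ in range(200)]
centroids = vectors[:8]  # toy choice; real indexes learn centroids from the data
lists = build_lists(vectors, centroids)

query = (0.5, 0.5)
exact = sorted(range(len(vectors)), key=lambda i: math.dist(query, vectors[i]))[:5]
approx = search(query, vectors, centroids, lists, probes=2, k=5)
full = search(query, vectors, centroids, lists, probes=len(centroids), k=5)
```

Scanning every list reproduces the exact brute-force result, while scanning fewer lists does less work and may miss some neighbors. More lists mean smaller partitions, which is why the setting matters more as the dataset grows.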

Tip

If you need a customized tokenizer, please refer to the VectorChord-bm25 documentation.

JSONB

To store a JSONB column, define it like this:

from psycopg.types.json import Jsonb

class MyJsonTable(Table, kw_only=True):
    uid: PrimaryKeyUUID = msgspec.field(default_factory=PrimaryKeyUUID.factory)
    json: Jsonb

item = MyJsonTable(json=Jsonb({"key": "value"}))

Inject with decorator

The inject() decorator loads function arguments from the database and dumps return values to the database.

To use it, specify at least one of input or output with a table class you have defined.

  • input=Type[Table]: loads the specified columns from the database and injects the data into the decorated function's arguments

    • if input=None, you need to pass the arguments to the function manually

  • output=Type[Table]: dumps the return values to the database (the return type must also be annotated with the provided table class or a list of the table class)

    • if output=None, you get the return value from the function call

The following example uses the pre-defined tables:

from uuid import UUID
import httpx
from vechord.registry import VechordRegistry
from vechord.extract import SimpleExtractor
from vechord.embedding import GeminiDenseEmbedding
from vechord.spec import DefaultDocument, create_chunk_with_dim

DefaultChunk = create_chunk_with_dim(3072)
vr = VechordRegistry(
    namespace="test",
    url="postgresql://postgres:postgres@127.0.0.1:5432/",
    tables=[DefaultDocument, DefaultChunk],
)
extractor = SimpleExtractor()
emb = GeminiDenseEmbedding()


@vr.inject(output=DefaultDocument)
async def add_document(url: str) -> DefaultDocument:
    async with httpx.AsyncClient() as client:
        resp = await client.get(url)
        text = extractor.extract_html(resp.text)
        return DefaultDocument(title=url, text=text)


@vr.inject(input=DefaultDocument, output=DefaultChunk)
async def add_chunk(uid: UUID, text: str) -> list[DefaultChunk]:
    chunks = text.split("\n")
    return [DefaultChunk(doc_id=uid, vec=await emb.vectorize_chunk(t), text=t) for t in chunks]


async def main():
    async with vr, emb:
        for url in ["https://paulgraham.com/best.html", "https://paulgraham.com/read.html"]:
            await add_document(url)
        await add_chunk()
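To make the load and dump behavior concrete, here is a toy sketch of a decorator with the same input/output shape, backed by an in-memory dictionary instead of a database. It is synchronous and greatly simplified, and it is not vechord's implementation; STORE, inject, Doc and Chunk are hypothetical names for this illustration.

```python
import inspect
from dataclasses import dataclass
from functools import wraps

STORE = {}  # hypothetical in-memory "database": table class -> list of rows


def inject(input=None, output=None):
    """Load arguments from STORE[input] and dump results into STORE[output]."""

    def decorator(func):
        arg_names = list(inspect.signature(func).parameters)

        @wraps(func)
        def wrapper(*args, **kwargs):
            if input is None:
                # No input table: the caller passes the arguments manually.
                results = [func(*args, **kwargs)]
            else:
                # Call the function once per stored row, matching columns
                # to arguments by name.
                rows = list(STORE.get(input, []))
                results = [
                    func(**{name: getattr(row, name) for name in arg_names})
                    for row in rows
                ]
            if output is None:
                # No output table: hand the result(s) back to the caller.
                return results[0] if input is None else results
            for res in results:
                new_rows = res if isinstance(res, list) else [res]
                STORE.setdefault(output, []).extend(new_rows)
            return None

        return wrapper

    return decorator


@dataclass
class Doc:
    uid: int
    text: str


@dataclass
class Chunk:
    doc_id: int
    text: str


@inject(output=Doc)
def add_doc(uid: int, text: str) -> Doc:
    return Doc(uid=uid, text=text)


@inject(input=Doc, output=Chunk)
def add_chunks(uid: int, text: str) -> list[Chunk]:
    return [Chunk(doc_id=uid, text=part) for part in text.split("\n")]


add_doc(1, "first line\nsecond line")
add_chunks()  # no arguments: uid and text are loaded from the stored Doc rows
```

The call to add_chunks() takes no arguments because the decorator supplies uid and text from each stored Doc, which mirrors how add_chunk() is called without arguments in the example above.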

Select/Insert/Delete

The registry also provides functions to select, insert, and delete data in the database.

docs = await vr.select_by(DefaultDocument.partial_init())
await vr.insert(DefaultDocument(text="hello world"))
await vr.copy_bulk([DefaultDocument(text="hello world"), DefaultDocument(text="hello vector")])
await vr.remove_by(DefaultDocument.partial_init())

Transaction

Use VechordPipeline to run multiple functions in a single transaction.

The pipeline also guarantees that the decorated functions load only the data from the current transaction rather than the whole table, so you can focus on the data processing.

pipeline = vr.create_pipeline([add_document, add_chunk])
await pipeline.run("https://paulgraham.com/best.html")
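The value of running several steps in one transaction is that they succeed or fail together. The sketch below demonstrates that property with the standard-library sqlite3 module as a stand-in for PostgreSQL. It does not use vechord, and run_in_transaction, add_doc and failing_step are hypothetical names for this illustration.

```python
import sqlite3


def run_in_transaction(conn, steps):
    """Run every step; commit if all succeed, roll back if any step fails."""
    try:
        for step in steps:
            step(conn)
        conn.commit()
    except Exception:
        conn.rollback()
        raise


def add_doc(conn):
    conn.execute("INSERT INTO doc (text) VALUES ('hello world')")


def failing_step(conn):
    raise ValueError("simulated failure in a later step")


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE doc (text TEXT)")
conn.commit()

# The second step fails, so the insert from the first step is rolled back.
try:
    run_in_transaction(conn, [add_doc, failing_step])
except ValueError:
    pass
count_after_failure = conn.execute("SELECT COUNT(*) FROM doc").fetchone()[0]

# With only successful steps, the insert is committed.
run_in_transaction(conn, [add_doc])
count_after_success = conn.execute("SELECT COUNT(*) FROM doc").fetchone()[0]
```

After the failed run the table is still empty, so a half-processed document never becomes visible. Only the fully successful run leaves a row behind.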

Access the cursor

If you need to change some settings or use the cursor directly:

await vr.client.get_cursor().execute("SET vchordrq.probes = 100;")