Guide

Define the table

Subclass the Table class and declare the columns as attributes with type hints. More advanced configuration can be expressed with typing.Annotated.

Choose a primary key

  • PrimaryKeyAutoIncrease: generate an auto-incrementing integer as the primary key

  • PrimaryKeyUUID: use uuid7 as the primary key, suitable for distributed systems or general purposes

  • int or str: insert the key manually
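To illustrate why a uuid7 key works well across several writers, here is a minimal standard-library sketch of the UUIDv7 bit layout. The function name uuid7_sketch is hypothetical and this is not how PrimaryKeyUUID is implemented; it only shows that the millisecond timestamp sits in the most significant bits, so keys created later sort after earlier ones without any central counter.

```python
import os
import time
import uuid
from typing import Optional


def uuid7_sketch(ts_ms: Optional[int] = None) -> uuid.UUID:
    """Build a UUIDv7-style value (illustrative only, not vechord's code)."""
    if ts_ms is None:
        ts_ms = time.time_ns() // 1_000_000
    rand = int.from_bytes(os.urandom(10), "big")  # 80 random bits, 74 are used
    rand_a = (rand >> 62) & 0xFFF                 # 12 bits after the version field
    rand_b = rand & ((1 << 62) - 1)               # 62 bits after the variant field
    value = (ts_ms & ((1 << 48) - 1)) << 80       # timestamp in the top 48 bits
    value |= 0x7 << 76                            # version 7
    value |= rand_a << 64
    value |= 0b10 << 62                           # RFC variant bits
    value |= rand_b
    return uuid.UUID(int=value)


# Two keys generated one millisecond apart: the timestamp bits dominate
# the comparison, so the later key always sorts after the earlier one.
earlier = uuid7_sketch(1_700_000_000_000)
later = uuid7_sketch(1_700_000_000_001)
print(earlier < later)
```

Time-ordered keys of this kind tend to keep B-tree inserts close together, which is one reason they are a reasonable general-purpose default.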

Configure the Index

The default index configuration is suitable for small datasets (fewer than 100k rows). For larger datasets, customize the index by wrapping the column type in typing.Annotated together with an index configuration:

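from typing import Annotated

import msgspec
from vechord.spec import PrimaryKeyUUID, Table, Vector, VectorIndex

DenseVector = Vector[768]

class MyTable(Table, kw_only=True):
    uid: PrimaryKeyUUID = msgspec.field(default_factory=PrimaryKeyUUID.factory)
    vec: Annotated[DenseVector, VectorIndex(lists=128)]
    text: str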
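For readers unfamiliar with typing.Annotated, the following standard-library sketch shows how extra configuration can ride along with a type hint and be read back later. IndexConfig, Row, and collect_index_configs are hypothetical stand-ins invented for this illustration; they are not part of vechord, and the library's own handling of VectorIndex may differ.

```python
from dataclasses import dataclass
from typing import Annotated, get_args, get_origin, get_type_hints


@dataclass(frozen=True)
class IndexConfig:
    """Hypothetical stand-in for an index option object such as VectorIndex."""

    lists: int


class Row:
    # The first Annotated argument is the column type, the rest is metadata.
    vec: Annotated[list[float], IndexConfig(lists=128)]
    text: str


def collect_index_configs(cls: type) -> dict[str, IndexConfig]:
    """Return the IndexConfig attached to each annotated attribute, if any."""
    configs: dict[str, IndexConfig] = {}
    # include_extras=True keeps the Annotated wrapper instead of stripping it.
    for name, hint in get_type_hints(cls, include_extras=True).items():
        if get_origin(hint) is Annotated:
            for meta in get_args(hint)[1:]:
                if isinstance(meta, IndexConfig):
                    configs[name] = meta
    return configs


print(collect_index_configs(Row))
```

The plain `text: str` attribute carries no metadata, so it is simply absent from the result.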

Tip

If you need a customized tokenizer, please refer to the VectorChord-bm25 documentation.

JSONB

To store a JSONB column, annotate the attribute with psycopg's Jsonb type and wrap the value in Jsonb when creating the row:

from psycopg.types.json import Jsonb

class MyJsonTable(Table, kw_only=True):
    uid: PrimaryKeyUUID = msgspec.field(default_factory=PrimaryKeyUUID.factory)
    json: Jsonb

item = MyJsonTable(json=Jsonb({"key": "value"}))

Inject with decorator

The inject() decorator loads function arguments from the database and dumps return values back to the database.

To use it, set at least one of input or output to a table class you have defined.

  • input=Type[Table]: loads the specified columns from the database and injects them as the decorated function's arguments

    • if input=None, the caller must pass the arguments manually

  • output=Type[Table]: dumps the return values to the database (the return type must also be annotated with the given table class or a list of that class)

    • if output=None, the return value is available from the function call

The following example uses the pre-defined tables:

from uuid import UUID
import httpx
from vechord.registry import VechordRegistry
from vechord.extract import SimpleExtractor
from vechord.embedding import GeminiDenseEmbedding
from vechord.spec import DefaultDocument, create_chunk_with_dim

DefaultChunk = create_chunk_with_dim(768)
vr = VechordRegistry(namespace="test", url="postgresql://postgres:postgres@127.0.0.1:5432/")
vr.register([DefaultDocument, DefaultChunk])
extractor = SimpleExtractor()
emb = GeminiDenseEmbedding()


@vr.inject(output=DefaultDocument)
def add_document(url: str) -> DefaultDocument:
    with httpx.Client() as client:
        resp = client.get(url)
        text = extractor.extract_html(resp.text)
        return DefaultDocument(title=url, text=text)


@vr.inject(input=DefaultDocument, output=DefaultChunk)
def add_chunk(uid: UUID, text: str) -> list[DefaultChunk]:
    chunks = text.split("\n")
    return [DefaultChunk(doc_id=uid, vec=emb.vectorize_chunk(t), text=t) for t in chunks]


for url in ["https://paulgraham.com/best.html", "https://paulgraham.com/read.html"]:
    add_document(url)
add_chunk()
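To make the load-and-dump behaviour of inject() easier to picture, here is a toy in-memory model of the same idea. It is a sketch under simplifying assumptions, not vechord's implementation: STORE, Doc, Chunk, and this inject function are all hypothetical, a dict of lists stands in for the database, and columns are matched to parameters purely by name.

```python
import functools
import inspect
from dataclasses import dataclass
from typing import Any, Optional

# Hypothetical in-memory "database": one list of rows per table class.
STORE: dict[type, list[Any]] = {}


def inject(input: Optional[type] = None, output: Optional[type] = None):
    """Toy decorator; `input` shadows the builtin to mirror the documented keyword."""

    def decorator(func):
        arg_names = list(inspect.signature(func).parameters)

        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            if input is None:
                # No input table: the caller supplies the arguments.
                results = [func(*args, **kwargs)]
            else:
                # Input table: call once per stored row, picking columns by name.
                results = [
                    func(*(getattr(row, name) for name in arg_names))
                    for row in list(STORE.get(input, []))
                ]
            if output is None:
                # No output table: hand the return value(s) back to the caller.
                return results[0] if input is None else results
            for ret in results:
                rows = ret if isinstance(ret, list) else [ret]
                STORE.setdefault(output, []).extend(rows)
            return None

        return wrapper

    return decorator


@dataclass
class Doc:
    uid: int
    text: str


@dataclass
class Chunk:
    doc_id: int
    text: str


@inject(output=Doc)
def add_doc(uid: int, text: str) -> Doc:
    return Doc(uid=uid, text=text)


@inject(input=Doc, output=Chunk)
def add_chunks(uid: int, text: str) -> list[Chunk]:
    return [Chunk(doc_id=uid, text=line) for line in text.split("\n")]


add_doc(1, "first line\nsecond line")
add_chunks()  # no arguments: they are loaded from the stored Doc rows
print(STORE[Chunk])
```

As in the real example above, the function with an input table is called without arguments, because its parameters are filled from stored rows.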

Select/Insert/Delete

The registry also provides methods to select, insert, and delete data in the database directly.

docs = vr.select_by(DefaultDocument.partial_init())
vr.insert(DefaultDocument(text="hello world"))
vr.copy_bulk([DefaultDocument(text="hello world"), DefaultDocument(text="hello vector")])
vr.remove_by(DefaultDocument.partial_init())

Transaction

Use the VechordPipeline to run multiple functions in a transaction.

A pipeline also guarantees that the decorated functions load only the data from the current transaction instead of the whole table, so you can focus on the data processing logic.

pipeline = vr.create_pipeline([add_document, add_chunk])
pipeline.run("https://paulgraham.com/best.html")
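The all-or-nothing aspect of running several steps in one transaction can be sketched with the standard library. This example uses sqlite3 as a stand-in for PostgreSQL, and run_pipeline, add_doc, add_chunks, and failing_step are hypothetical names; it does not model VechordPipeline's transaction-scoped loading, only the rollback behaviour when a later step fails.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE doc (id INTEGER PRIMARY KEY, text TEXT)")
conn.execute("CREATE TABLE chunk (doc_id INTEGER, text TEXT)")
conn.commit()


def add_doc(conn: sqlite3.Connection, text: str) -> None:
    conn.execute("INSERT INTO doc (text) VALUES (?)", (text,))


def add_chunks(conn: sqlite3.Connection) -> None:
    # Split every stored document into one chunk per line.
    for doc_id, text in conn.execute("SELECT id, text FROM doc").fetchall():
        for line in text.split("\n"):
            conn.execute(
                "INSERT INTO chunk (doc_id, text) VALUES (?, ?)", (doc_id, line)
            )


def failing_step(conn: sqlite3.Connection) -> None:
    raise RuntimeError("simulated failure")


def run_pipeline(conn: sqlite3.Connection, steps, *args) -> None:
    """Run all steps in one transaction: commit on success, roll back on error."""
    with conn:  # the connection context manager commits or rolls back
        steps[0](conn, *args)
        for step in steps[1:]:
            step(conn)


run_pipeline(conn, [add_doc, add_chunks], "a\nb")

try:
    # The document inserted by add_doc is discarded when the next step fails.
    run_pipeline(conn, [add_doc, failing_step], "c")
except RuntimeError:
    pass

doc_count = conn.execute("SELECT COUNT(*) FROM doc").fetchone()[0]
chunk_count = conn.execute("SELECT COUNT(*) FROM chunk").fetchone()[0]
print(doc_count, chunk_count)
```

The second run leaves no partial data behind, which is the property that makes it safe to chain a document step and a chunk step.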

Access the cursor

If you need to change some settings or use the cursor directly:

vr.client.get_cursor().execute("SET vchordrq.probes = 100;")