API Reference¶
VechordRegistry¶
- class vechord.registry.VechordPipeline(client, steps)[source]¶
Set up the pipeline to run multiple functions in a transaction.
- Parameters:
client (VechordClient) – VectorChordClient to be used for the transaction.
steps (list[Callable]) – a list of functions to be run in the pipeline, in the order they appear in the list. The first function accepts the input, the last function returns the output, and the functions in between process the intermediate data.
- run(*args, **kwargs)[source]¶
Execute the pipeline in a transactional manner.
All the args and kwargs will be passed to the first function in the pipeline. The pipeline runs in a single transaction, and every inject-decorated function can only see the data inserted within this transaction (this guarantees that only newly inserted data is processed by the pipeline).
This will also return the final result of the last function in the pipeline.
- Return type:
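The transactional behaviour described above can be modelled without a database. The following is a minimal sketch, not vechord's implementation: `ToyPipeline`, the dict-based store, and the step functions are all hypothetical, and it assumes steps communicate through shared storage rather than return values.

```python
from copy import deepcopy
from typing import Any, Callable


class ToyPipeline:
    """Toy stand-in for a transactional pipeline (hypothetical, not vechord)."""

    def __init__(self, store: dict, steps: list[Callable[..., Any]]):
        self.store = store
        self.steps = steps

    def run(self, *args, **kwargs):
        # Work on a private copy so that a failing step leaves the store untouched.
        staged = deepcopy(self.store)
        first, *rest = self.steps
        # Only the first step receives the caller's arguments.
        result = first(staged, *args, **kwargs)
        for step in rest:
            result = step(staged)
        # "Commit": publish the staged data only after every step succeeded.
        self.store.clear()
        self.store.update(staged)
        # The caller gets the result of the last step.
        return result


def load(staged, text):
    staged["docs"].append(text)


def chunk(staged):
    for doc in staged["docs"]:
        staged["chunks"].extend(s.strip() for s in doc.split(".") if s.strip())


def count(staged):
    return len(staged["chunks"])


def boom(staged):
    raise RuntimeError("step failed")


store = {"docs": [], "chunks": []}
ok_result = ToyPipeline(store, [load, chunk, count]).run("First. Second.")

failed_store = {"docs": [], "chunks": []}
try:
    ToyPipeline(failed_store, [load, boom]).run("Lost.")
except RuntimeError:
    pass  # the staged copy is discarded, like a rolled-back transaction
```

In the failing run nothing reaches `failed_store`, which mirrors the all-or-nothing guarantee a real database transaction provides.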
- class vechord.registry.VechordRegistry(namespace, url)[source]¶
Create a registry for the given namespace and PostgreSQL URL.
- Parameters:
- register(tables, create_index=True)[source]¶
Register the given tables to the registry.
This will create the tables in the database if they do not already exist.
- create_pipeline(steps)[source]¶
Create the VechordPipeline to run multiple functions in a transaction.
- select_by(obj, fields=None, limit=None)[source]¶
Retrieve the requested fields for the given object stored in the DB.
- Parameters:
obj (TypeVar(T, bound=Table)) – the object to be retrieved. This should be generated from Table.partial_init(); the given values will be used for filtering (= or is).
fields (Optional[Sequence[str]]) – the fields to be retrieved. If not set, all the fields will be retrieved.
limit (Optional[int]) – the maximum number of results to be returned. If not set, all the results will be returned.
- Return type:
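The filtering semantics of select_by (set values filter, unset values are ignored, with optional field projection and limit) can be sketched over plain dictionaries. This is an illustration under assumptions: `MISSING` and `toy_select_by` are hypothetical names, and the real method queries PostgreSQL rather than a Python list.

```python
MISSING = object()  # hypothetical marker for "field not set by partial_init"


def toy_select_by(rows, partial, fields=None, limit=None):
    """Return rows matching every set value in `partial` (hypothetical helper)."""
    # Only fields that were explicitly given take part in the filter.
    filters = {k: v for k, v in partial.items() if v is not MISSING}
    selected = []
    for row in rows:
        if limit is not None and len(selected) >= limit:
            break
        # SQL needs `IS` for NULL and `=` otherwise; Python's == covers None too.
        if all(row[k] == v for k, v in filters.items()):
            selected.append({f: row[f] for f in fields} if fields else dict(row))
    return selected


rows = [
    {"uid": 1, "title": "intro", "text": "hello"},
    {"uid": 2, "title": "intro", "text": "world"},
    {"uid": 3, "title": "other", "text": "!"},
]
partial = {"uid": MISSING, "title": "intro", "text": MISSING}

all_fields = toy_select_by(rows, partial)
only_text = toy_select_by(rows, partial, fields=["text"], limit=1)
```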
- search_by_vector(cls, vec, topk=10, return_fields=None, probe=None)[source]¶
Search the vector for the given Table class.
- Parameters:
cls (type[TypeVar(T, bound=Table)]) – the Table class to be searched.
vec (ndarray) – the vector to be searched.
topk (int) – the number of results to be returned.
return_fields (Optional[Sequence[str]]) – the fields to be returned, if not set, all the non-[vector,keyword] fields will be returned.
probe (Optional[int]) – how many K-means clusters to probe for the vec.
- Return type:
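The role of probe can be illustrated with a toy cluster-based search: only the `probe` clusters whose centroids are nearest to the query are scanned, which trades recall for speed. This sketch assumes Euclidean distance and hand-made clusters; the metric and index layout actually used are not stated here, and `toy_ivf_search` is a hypothetical name.

```python
import math


def l2(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def toy_ivf_search(centroids, clusters, query, topk=10, probe=1):
    # Rank clusters by centroid distance and keep only the `probe` nearest.
    order = sorted(range(len(centroids)), key=lambda i: l2(centroids[i], query))
    candidates = [item for i in order[:probe] for item in clusters[i]]
    # Exact ranking happens only inside the probed clusters.
    candidates.sort(key=lambda item: l2(item[1], query))
    return [name for name, _ in candidates[:topk]]


centroids = [(0.0, 0.0), (10.0, 10.0)]
clusters = [
    [("a", (0.0, 1.0)), ("b", (4.9, 4.9))],
    [("c", (5.2, 5.2)), ("d", (10.0, 9.0))],
]
query = (5.5, 5.5)

narrow = toy_ivf_search(centroids, clusters, query, topk=2, probe=1)
wide = toy_ivf_search(centroids, clusters, query, topk=2, probe=2)
```

With probe=1 the second-best match "b" is missed because it lives in the unprobed cluster; probe=2 recovers it at the cost of scanning more vectors.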
- search_by_multivec(cls, multivec, topk=10, return_fields=None, maxsim_refine=1000, probe=None)[source]¶
Search the multivec for the given Table class.
- Parameters:
cls (type[TypeVar(T, bound=Table)]) – the Table class to be searched.
multivec (ndarray) – the multivec to be searched.
topk (int) – the number of results to be returned.
maxsim_refine (int) – the maximum number of document vectors to be computed with full precision for each vector in the multivec. 0 means all the distances are computed with bit quantization.
probe (Optional[int]) – how many K-means clusters to probe for each vector in the multivec.
return_fields (Optional[Sequence[str]]) – the fields to be returned, if not set, all the non-[vector,keyword] fields will be returned.
- Return type:
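The maxsim_refine parameter name suggests MaxSim (late-interaction) scoring: for each query vector, take the best similarity against any document vector, then sum those maxima. The sketch below shows only that scoring rule with plain dot products; it is an assumption about the scoring scheme, and it does not model the quantization or refinement step. `maxsim` and the sample data are hypothetical.

```python
def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def maxsim(query_vecs, doc_vecs):
    """Sum, over query vectors, of the best match among the document vectors."""
    return sum(max(dot(q, d) for d in doc_vecs) for q in query_vecs)


docs = {
    "doc1": [(1.0, 0.0), (0.0, 1.0)],  # one vector per token, two tokens
    "doc2": [(0.7, 0.7)],              # a single token vector
}
query = [(1.0, 0.0), (0.0, 1.0)]

scores = {name: maxsim(query, vecs) for name, vecs in docs.items()}
ranked = sorted(scores, key=scores.get, reverse=True)
```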
- search_by_keyword(cls, keyword, topk=10, return_fields=None)[source]¶
Search the keyword for the given Table class.
- Parameters:
cls (type[TypeVar(T, bound=Table)]) – the Table class to be searched.
keyword (str) – the keyword to be searched.
topk (int) – the number of results to be returned.
return_fields (Optional[Sequence[str]]) – the fields to be returned, if not set, all the non-[vector,keyword] fields will be returned.
- Return type:
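Keyword columns are stored as bm25vector, so keyword search presumably ranks rows by a BM25 score. The following is a textbook BM25 sketch in pure Python, assuming whitespace tokenization and the common defaults k1=1.2 and b=0.75; the tokenizer and parameters used in PostgreSQL may differ, and `bm25_scores` is a hypothetical helper.

```python
import math
from collections import Counter


def bm25_scores(docs, query, k1=1.2, b=0.75):
    """Score every document against the query with classic BM25."""
    tokenized = [d.lower().split() for d in docs]
    n = len(tokenized)
    avgdl = sum(len(t) for t in tokenized) / n
    scores = []
    for tokens in tokenized:
        tf = Counter(tokens)
        score = 0.0
        for term in query.lower().split():
            # Document frequency: how many documents contain the term.
            df = sum(1 for t in tokenized if term in t)
            idf = math.log((n - df + 0.5) / (df + 0.5) + 1.0)
            # Length normalization penalizes long documents.
            norm = k1 * (1 - b + b * len(tokens) / avgdl)
            score += idf * tf[term] * (k1 + 1) / (tf[term] + norm)
        scores.append(score)
    return scores


docs = [
    "postgres stores vectors",
    "cats sleep all day",
    "vectors and more vectors in postgres",
]
scores = bm25_scores(docs, "vectors")
# Indices of the top-2 documents, best first.
top2 = sorted(range(len(docs)), key=lambda i: scores[i], reverse=True)[:2]
```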
- remove_by(obj)[source]¶
Remove the given object from the DB.
- Parameters:
obj (Table) – the object to be removed, this should be a Table.partial_init() instance, which means given values will be used for filtering.
- insert(obj)[source]¶
Insert the given object to the DB.
- Parameters:
obj (Table) – the object to be inserted.
- copy_bulk(objs)[source]¶
Insert the given list of objects to the DB.
This is more efficient than calling insert for each object.
- inject(input=None, output=None)[source]¶
Decorator to inject the data for the function arguments & return value.
- Parameters:
input (Optional[type[Table]]) – the input table to be retrieved from the DB. If not set, the function will require the input to be passed in the function call.
output (Optional[type[Table]]) – the output table to store the return value. If not set, the return value will be returned to the caller in a list.
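The input/output behaviour of the decorator can be modelled with an in-memory registry. This is a toy sketch, not vechord's code: `ToyRegistry` is hypothetical, tables are plain lists keyed by name, and it assumes the decorated function is called once per input row.

```python
import functools


class ToyRegistry:
    """In-memory stand-in for a registry (hypothetical, not vechord)."""

    def __init__(self):
        self.tables = {}

    def inject(self, input=None, output=None):
        def decorator(func):
            @functools.wraps(func)
            def wrapper(*args, **kwargs):
                if input is not None:
                    # Arguments come from the stored rows, not from the caller.
                    results = [func(row) for row in self.tables.get(input, [])]
                else:
                    # No input table: the caller must pass the arguments.
                    results = [func(*args, **kwargs)]
                if output is not None:
                    # Return values are stored instead of being handed back.
                    self.tables.setdefault(output, []).extend(results)
                    return None
                return results

            return wrapper

        return decorator


registry = ToyRegistry()
registry.tables["document"] = ["Hello World", "Foo"]


@registry.inject(input="document", output="chunk")
def lowercase(text):
    return text.lower()


@registry.inject(input="chunk")
def length(text):
    return len(text)


stored = lowercase()  # reads "document", writes "chunk"
lengths = length()    # reads "chunk", returns a list to the caller
```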
VechordClient¶
Types¶
- class vechord.spec.DefaultDocument(*, uid: ~vechord.spec.PrimaryKeyUUID = <factory>, title: str = '', text: str, created_at: ~datetime.datetime = <factory>)[source]¶
Default Document table class.
- class vechord.spec.ForeignKey(*args, **kwargs)[source]¶
Reference to another table’s attribute as a foreign key.
This should be used in the Annotated[] type hint.
- class vechord.spec.Keyword[source]¶
Keyword type for text search (wraps str).
Users can assign a str value; it will be tokenized and converted to bm25vector in PostgreSQL.
- class vechord.spec.PrimaryKeyAutoIncrease[source]¶
Primary key with auto-increment ID type (wraps int).
- class vechord.spec.PrimaryKeyUUID(hex=None, bytes=None, bytes_le=None, fields=None, int=None, version=None, *, is_safe=SafeUUID.unknown)[source]¶
Primary key with UUID type (wraps UUID).
This doesn’t come with auto-generation, because PostgreSQL doesn’t support UUID v7, while v4 is purely random and not sortable.
Choose this one over PrimaryKeyAutoIncrease when you need universal uniqueness.
We suggest using:
class MyTable(Table):
    uid: PrimaryKeyUUID = msgspec.field(default_factory=PrimaryKeyUUID.factory)
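To make the sortability point concrete, here is a minimal sketch of a UUID v7 style generator built from the standard library: a 48-bit millisecond timestamp in the high bits, followed by version, variant, and random bits. `toy_uuid7` is a hypothetical helper written for illustration; whether PrimaryKeyUUID.factory works this way is an assumption.

```python
import os
import time
import uuid


def toy_uuid7(unix_ms=None):
    """Build a time-ordered UUID (version 7 bit layout); hypothetical helper."""
    ms = time.time_ns() // 1_000_000 if unix_ms is None else unix_ms
    rand = int.from_bytes(os.urandom(10), "big")  # 80 random bits
    value = (ms & ((1 << 48) - 1)) << 80          # bits 127..80: timestamp
    value |= 0x7 << 76                            # bits 79..76: version 7
    value |= ((rand >> 68) & 0xFFF) << 64         # bits 75..64: 12 random bits
    value |= 0b10 << 62                           # bits 63..62: RFC variant
    value |= rand & ((1 << 62) - 1)               # bits 61..0: 62 random bits
    return uuid.UUID(int=value)


earlier = toy_uuid7(unix_ms=1_000)
later = toy_uuid7(unix_ms=2_000)
```

Because the timestamp occupies the most significant bits, IDs created later compare greater, which is the property a purely random v4 UUID lacks.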
- class vechord.spec.Table[source]¶
Base class for table definition.
- classmethod table_schema()[source]¶
Generate the table schema from the class attributes’ type hints.
- classmethod table_psql_types()[source]¶
Generate the corresponding PostgreSQL types for each column.
- classmethod vector_column()[source]¶
Get the vector column name.
- Return type:
Optional[IndexColumn[VectorIndex]]
- classmethod multivec_column()[source]¶
Get the multivec column name.
- Return type:
Optional[IndexColumn[MultiVectorIndex]]
- classmethod keyword_column()[source]¶
Get the keyword column name.
- Return type:
Optional[IndexColumn[KeywordIndex]]
- class vechord.spec.Vector(*args, **kwargs)[source]¶
Vector type with fixed dimension.
Users can assign an np.ndarray of dtype np.float32 or a list[float].
- vechord.spec.create_chunk_with_dim(dim)[source]¶
Create a chunk table class with a specific vector dimension.
This comes with vector and keyword columns. It also has a foreign key to the DefaultDocument table. (If this is used, the DefaultDocument table must be registered too.)
Augment¶
- class vechord.augment.GeminiAugmenter(model='models/gemini-1.5-flash-001', ttl_sec=600)[source]¶
Bases: BaseAugmenter
Gemini Augmenter.
Context caching is only available for stable models with fixed versions. The minimum cacheable size is 32768 tokens.
Chunk¶
- class vechord.chunk.RegexChunker(size=1536, overlap=200, separator='[\\n\\r\\f\\v\\t?!.;]{1,}', concat='. ')[source]¶
Bases: BaseChunker
A simple regex-based chunker.
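How the size, separator, and concat parameters could interact is sketched below: split on the separator pattern, then greedily pack the pieces into chunks no longer than size, joining them with concat. This is an assumed behaviour for illustration only; `toy_regex_chunk` is hypothetical, and the overlap parameter is deliberately left out because its exact semantics are not described here.

```python
import re


def toy_regex_chunk(text, size=1536, separator=r"[\n\r\f\v\t?!.;]{1,}", concat=". "):
    """Split on `separator`, then greedily pack pieces up to `size` characters."""
    pieces = [p.strip() for p in re.split(separator, text) if p.strip()]
    chunks, current = [], ""
    for piece in pieces:
        candidate = piece if not current else current + concat + piece
        if current and len(candidate) > size:
            # The next piece would overflow: close the current chunk.
            chunks.append(current)
            current = piece
        else:
            current = candidate
    if current:
        chunks.append(current)
    return chunks


chunks = toy_regex_chunk("One two. Three four! Five six? Seven", size=20)
```

A single piece longer than size is kept whole in this sketch rather than being cut mid-sentence.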
- class vechord.chunk.SpacyChunker(model='en_core_web_sm')[source]¶
Bases: BaseChunker
A semantic sentence Chunker based on SpaCy.
This guarantees the generated chunks are sentences.
- class vechord.chunk.WordLlamaChunker(size=1536)[source]¶
Bases: BaseChunker
A semantic chunker based on WordLlama.
This doesn’t guarantee the generated chunks are sentences.
- class vechord.chunk.GeminiChunker(model='gemini-2.0-flash', size=1536)[source]¶
Bases: BaseChunker
A semantic chunker based on Gemini.
Embedding¶
- class vechord.embedding.SpacyDenseEmbedding(model='en_core_web_sm', dim=96)[source]¶
Bases: BaseEmbedding
Spacy Dense Embedding.
- class vechord.embedding.GeminiDenseEmbedding(model='models/text-embedding-004', dim=768)[source]¶
Bases: BaseEmbedding
Gemini Dense Embedding.
- class vechord.embedding.OpenAIDenseEmbedding(model='text-embedding-3-large', dim=3072)[source]¶
Bases: BaseEmbedding
OpenAI Dense Embedding.
Evaluate¶
- class vechord.evaluate.BaseEvaluator[source]¶
Bases: ABC
- class vechord.evaluate.GeminiEvaluator(model='gemini-2.0-flash')[source]¶
Bases: BaseEvaluator
Evaluator using Gemini model to generate search queries.
Extract¶
- class vechord.extract.BaseHTMLParser[source]¶
Bases: HTMLParser
A simple HTML parser to extract text content.
- class vechord.extract.SimpleExtractor[source]¶
Bases: BaseExtractor
Local extractor for text files.
Load¶
- class vechord.load.LocalLoader(path, include=None)[source]¶
Bases: BaseLoader
Load documents from local file system.
- class vechord.load.S3Loader(bucket, prefix, include=None)[source]¶
Bases: BaseLoader
Rerank¶
- class vechord.rerank.CohereReranker(model='rerank-v3.5')[source]¶
Bases: BaseReranker
Rerank chunks using Cohere API (requires env COHERE_API_KEY).
Service¶
- vechord.service.create_web_app(registry, pipeline)[source]¶
Create a Falcon WSGI application for the given registry.
This includes:
- health check [GET](/)
- tables [GET/POST/DELETE](/api/table/{table_name})
- pipeline in a transaction [POST](/api/pipeline)
- OpenAPI spec and Swagger UI [GET](/openapi/swagger)
- Return type:
App