Interface
VechordRegistry
- class vechord.registry.VechordRegistry(namespace, url)
Create a registry for the given namespace and PostgreSQL URL.
- Parameters:
  - namespace (str) – the namespace for this registry; it is used as the prefix for all registered tables.
  - url (str) – the PostgreSQL URL to connect to.
- register(tables, create_index=True)
Register the given tables to the registry.
This creates the tables in the database if they do not already exist.
- Parameters:
  - tables (list[type[Table]]) – a list of Table classes to be registered.
  - create_index (bool) – whether to create the index if it does not already exist.
- run(*args, **kwargs)
Execute the pipeline in a transactional manner.
All args and kwargs are passed to the first function in the pipeline. The pipeline runs in a single transaction, and every injected function sees only the data inserted in that transaction (this guarantees that only newly inserted data is processed by the pipeline).
Returns the final result of the last function in the pipeline.
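The transactional behaviour described above can be illustrated with a minimal sketch that uses only the standard library. It is not vechord's implementation: `run_pipeline`, the step functions, and the SQLite table are hypothetical stand-ins, and the sketch does not model the rule that injected functions see only rows inserted in the current transaction.

```python
import sqlite3


def run_pipeline(conn, steps, *args, **kwargs):
    """Run `steps` in order inside one transaction and return the last result."""
    result = None
    with conn:  # commits on success, rolls back if any step raises
        for index, step in enumerate(steps):
            # Only the first step receives the caller's arguments.
            result = step(conn, *args, **kwargs) if index == 0 else step(conn)
    return result


def add_document(conn, text):
    conn.execute("INSERT INTO document (text) VALUES (?)", (text,))


def count_words(conn):
    rows = conn.execute("SELECT text FROM document").fetchall()
    return sum(len(text.split()) for (text,) in rows)


def failing_step(conn):
    raise RuntimeError("simulated failure")


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE document (text TEXT)")
conn.commit()

# Successful run: the insert is committed and the last step's result is returned.
total = run_pipeline(conn, [add_document, count_words], "hello vector world")

# Failing run: the insert from the first step is rolled back.
try:
    run_pipeline(conn, [add_document, failing_step], "never stored")
except RuntimeError:
    pass
remaining = conn.execute("SELECT COUNT(*) FROM document").fetchone()[0]
```

Because every step shares one transaction, a failure in a later step also discards what earlier steps wrote.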
- select_by(obj, fields=None, limit=None)
Retrieve the requested fields for the given object stored in the DB.
- Parameters:
  - obj (Table) – the object to be retrieved. This should be a Table.partial_init() instance, which means the given values will be used for filtering.
  - fields (Optional[Sequence[str]]) – the fields to be retrieved. If not set, all fields are retrieved.
  - limit (Optional[int]) – the maximum number of results to be returned. If not set, all results are returned.
- Return type:
  list[Table]
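To make the "given values are used for filtering" idea concrete, here is a small in-memory sketch. It does not call vechord: the `UNSET` sentinel, the `Document` dataclass, and `filter_rows` are hypothetical names that only mimic a partially initialized object whose set fields act as equality filters.

```python
from dataclasses import dataclass, fields

UNSET = object()  # hypothetical sentinel meaning "this field was not provided"


@dataclass
class Document:
    uid: object = UNSET
    title: object = UNSET
    text: object = UNSET


def filter_rows(rows, partial, wanted=None, limit=None):
    """Return rows matching every field set on `partial`, keeping only `wanted` keys."""
    conditions = {
        f.name: getattr(partial, f.name)
        for f in fields(partial)
        if getattr(partial, f.name) is not UNSET
    }
    matches = [row for row in rows if all(row[k] == v for k, v in conditions.items())]
    if limit is not None:
        matches = matches[:limit]
    if wanted is None:
        return matches
    return [{key: row[key] for key in wanted} for row in matches]


rows = [
    {"uid": 1, "title": "intro", "text": "hello"},
    {"uid": 2, "title": "intro", "text": "world"},
    {"uid": 3, "title": "other", "text": "!"},
]
# Only `title` is set, so it is the only filter; `wanted` and `limit` trim the result.
found = filter_rows(rows, Document(title="intro"), wanted=["uid"], limit=1)
```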
- search_by_vector(cls, vec, topk=10, return_fields=None)
Search the given Table class by vector.
- Parameters:
  - cls (type[Table]) – the Table class to be searched.
  - vec (ndarray) – the query vector.
  - topk (int) – the number of results to be returned.
  - return_fields (Optional[Sequence[str]]) – the fields to be returned. If not set, all fields except the vector and keyword fields are returned.
- Return type:
  list[Table]
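As a rough illustration of what a top-k vector search returns, the sketch below ranks in-memory rows by their distance to a query vector. It is a brute-force stand-in, not vechord's code path: the Euclidean distance metric, the `top_k_by_distance` helper, and the dict-based rows are all assumptions made for the example.

```python
import heapq
import math


def top_k_by_distance(query, rows, topk=10):
    """Return the `topk` rows whose vectors are closest to `query` (L2 distance)."""
    return heapq.nsmallest(topk, rows, key=lambda row: math.dist(query, row["vector"]))


rows = [
    {"text": "a", "vector": [0.0, 0.0]},
    {"text": "b", "vector": [1.0, 1.0]},
    {"text": "c", "vector": [5.0, 5.0]},
]
# Results come back ordered from nearest to farthest.
nearest = top_k_by_distance([0.9, 1.2], rows, topk=2)
```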
- search_by_multivec(cls, multivec, topk=10, return_fields=None, max_maxsim_tuples=1000, probe=None)
Search the given Table class by multivec.
- Parameters:
  - cls (type[Table]) – the Table class to be searched.
  - multivec (ndarray) – the query multivec.
  - topk (int) – the number of results to be returned.
  - max_maxsim_tuples (int) – the maximum number of tuples to be considered for each vector in the multivec.
  - probe (Optional[int]) – TODO
  - return_fields (Optional[Sequence[str]]) – the fields to be returned. If not set, all fields except the vector and keyword fields are returned.
- Return type:
  list[Table]
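The parameter name max_maxsim_tuples suggests MaxSim-style (late interaction) scoring, although this page does not say so explicitly. Under that assumption, the sketch below shows what a MaxSim score is for small in-memory data. The helper names and the use of the dot product as the similarity are hypothetical choices for illustration only.

```python
def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def maxsim(query_vecs, doc_vecs):
    """Sum, over query vectors, of the best dot product against any document vector."""
    return sum(max(dot(q, d) for d in doc_vecs) for q in query_vecs)


def top_k_by_maxsim(query_vecs, docs, topk=10):
    """Rank documents by MaxSim score, highest first, and keep the first `topk`."""
    ranked = sorted(docs, key=lambda doc: maxsim(query_vecs, doc["multivec"]), reverse=True)
    return ranked[:topk]


docs = [
    {"text": "x", "multivec": [[1.0, 0.0], [0.0, 1.0]]},
    {"text": "y", "multivec": [[0.5, 0.5]]},
]
query = [[1.0, 0.0], [0.0, 1.0]]
best = top_k_by_maxsim(query, docs, topk=1)
```

A document with one closely matching vector per query vector scores higher than a document whose single vector matches every query vector only partially.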
- search_by_keyword(cls, keyword, topk=10, return_fields=None)
Search the given Table class by keyword.
- Parameters:
  - cls (type[Table]) – the Table class to be searched.
  - keyword (str) – the keyword to search for.
  - topk (int) – the number of results to be returned.
  - return_fields (Optional[Sequence[str]]) – the fields to be returned. If not set, all fields except the vector and keyword fields are returned.
- Return type:
  list[Table]
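Keyword fields are stored as bm25vector values in PostgreSQL, so keyword search is presumably ranked with BM25. The sketch below computes a common textbook BM25 variant in pure Python to show what such a ranking rewards. The whitespace tokenizer, the `k1` and `b` defaults, and the helper names are assumptions and may differ from what the database actually does.

```python
import math
from collections import Counter


def tokenize(text):
    # Assumption: lowercase whitespace tokenization, much simpler than a real tokenizer.
    return text.lower().split()


def bm25_scores(query, documents, k1=1.2, b=0.75):
    """Return one BM25 score per document for the given query."""
    tokenized = [tokenize(doc) for doc in documents]
    avg_len = sum(len(tokens) for tokens in tokenized) / len(tokenized)
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))
    scores = []
    for tokens in tokenized:
        tf = Counter(tokens)
        score = 0.0
        for term in tokenize(query):
            if term not in tf:
                continue
            n = doc_freq[term]
            idf = math.log((len(tokenized) - n + 0.5) / (n + 0.5) + 1)
            # Longer-than-average documents are penalized through the length term.
            norm = tf[term] + k1 * (1 - b + b * len(tokens) / avg_len)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores


documents = [
    "postgres stores vectors",
    "keyword search with bm25 ranks documents",
    "vectors and keyword search together",
]
scores = bm25_scores("keyword search", documents)
```

The first document contains neither query term and scores zero. The other two contain both terms, and the shorter one ranks higher because of length normalization.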
- remove_by(obj)
Remove the given object from the DB.
- Parameters:
  - obj (Table) – the object to be removed. This should be a Table.partial_init() instance, which means the given values will be used for filtering.
- inject(input=None, output=None)
Decorator to inject the data for the function arguments and return value.
- Parameters:
  - input (Optional[type[Table]]) – the input table to be retrieved from the DB. If not set, the input must be passed in the function call.
  - output (Optional[type[Table]]) – the output table in which to store the return value. If not set, the return value is returned to the caller in a list.
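The input/output behaviour of the decorator can be mimicked with an in-memory stand-in. This sketch is not vechord's implementation: `TinyRegistry`, its dict-backed tables, and the use of table names as strings (instead of Table classes) are hypothetical simplifications.

```python
import functools


class TinyRegistry:
    """Hypothetical in-memory stand-in; each table is a list of dict rows."""

    def __init__(self):
        self.tables = {}

    def inject(self, input=None, output=None):
        def decorator(func):
            @functools.wraps(func)
            def wrapper(*args, **kwargs):
                if input is None:
                    # No input table: the caller supplies the arguments.
                    results = [func(*args, **kwargs)]
                else:
                    # Input table set: call the function once per stored row.
                    results = [func(**row) for row in self.tables.get(input, [])]
                if output is None:
                    # No output table: hand the results back in a list.
                    return results
                self.tables.setdefault(output, []).extend(results)
                return None

            return wrapper

        return decorator


registry = TinyRegistry()


@registry.inject(output="document")
def add_document(text):
    return {"text": text}


@registry.inject(input="document")
def word_count(text):
    return len(text.split())


add_document("hello vector world")
add_document("two words")
counts = word_count()
```

Here `add_document` writes its return value to the "document" table, and `word_count` is fed from that table without the caller passing any arguments.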
Types
- class vechord.spec.Vector(*args, **kwargs)
Vector type with fixed dimension.
Users can assign an np.ndarray with dtype np.float32 or a list[float].
- class vechord.spec.ForeignKey(*args, **kwargs)
Reference to another table’s attribute as a foreign key.
This should be used in the Annotated[] type hint.
- class vechord.spec.Keyword
Keyword type for text search.
Users can assign a str; it is tokenized and converted to a bm25vector in PostgreSQL.
- class vechord.spec.Table
Base class for table definition.
- classmethod table_schema()
Generate the table schema from the class attributes’ type hints.
- Return type:
  Sequence[tuple[str, str]]
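Deriving a schema from type hints can be sketched with the standard `typing` module. The example below is only an illustration of the idea: the `SQL_TYPES` mapping, the plain `Document` class, and `schema_from_hints` are hypothetical and do not reflect the column types vechord actually emits.

```python
from typing import get_type_hints

# Hypothetical mapping from Python types to PostgreSQL column types.
SQL_TYPES = {int: "BIGINT", str: "TEXT", float: "DOUBLE PRECISION", bool: "BOOLEAN"}


class Document:
    uid: int
    title: str
    score: float


def schema_from_hints(cls):
    """Return (column name, SQL type) pairs in attribute declaration order."""
    return [(name, SQL_TYPES[hint]) for name, hint in get_type_hints(cls).items()]


schema = schema_from_hints(Document)
```

The result has the same shape as the documented return type: a sequence of (name, type) string pairs.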
- classmethod vector_column()
Get the vector column name.
- Return type:
  Optional[IndexColumn]
- classmethod multivec_column()
Get the multivec column name.
- Return type:
  Optional[IndexColumn]
- classmethod keyword_column()
Get the keyword column name.
- Return type:
  Optional[IndexColumn]
Augment
- class vechord.augment.GeminiAugmenter(model='models/gemini-1.5-flash-001', ttl_sec=600)
Bases: BaseAugmenter
Gemini Augmenter.
Context caching is only available for stable models with fixed versions. The minimum token count for a cache is 32768.
Chunk
- class vechord.chunk.RegexChunker(size=1536, overlap=200, separator='[\\n\\r\\f\\v\\t?!.;]{1,}', concat='. ')
Bases: BaseChunker
A simple regex-based chunker.
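The following sketch shows one plausible way the four constructor parameters could interact. It is not the RegexChunker source: the `regex_chunks` function, the character-based size limit, and the way the overlap is carried over are assumptions made for illustration.

```python
import re


def regex_chunks(text, size=1536, overlap=200, separator=r"[\n\r\f\v\t?!.;]{1,}", concat=". "):
    """Split `text` on `separator`, then pack the pieces into chunks of about `size` characters."""
    pieces = [piece.strip() for piece in re.split(separator, text) if piece.strip()]
    chunks, current = [], ""
    for piece in pieces:
        candidate = piece if not current else current + concat + piece
        if current and len(candidate) > size:
            chunks.append(current)
            # Carry the tail of the finished chunk into the next one as overlap.
            tail = current[-overlap:] if overlap > 0 else ""
            current = tail + concat + piece if tail else piece
        else:
            current = candidate
    if current:
        chunks.append(current)
    return chunks


text = "First sentence. Second sentence! Third one? Fourth; fifth"
chunks = regex_chunks(text, size=30, overlap=0)
overlapping = regex_chunks(text, size=30, overlap=5)
```

With `overlap=0` the pieces are packed greedily. With a positive overlap, each chunk after the first starts with the last few characters of the previous one, so a chunk can slightly exceed `size`.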
- class vechord.chunk.SpacyChunker(model='en_core_web_sm')
Bases: BaseChunker
A semantic sentence chunker based on SpaCy.
This guarantees the generated chunks are sentences.
- class vechord.chunk.WordLlamaChunker(size=1536)
Bases: BaseChunker
A semantic chunker based on WordLlama.
This doesn’t guarantee the generated chunks are sentences.
- class vechord.chunk.GeminiChunker(model='gemini-2.0-flash', size=1536)
Bases: BaseChunker
A semantic chunker based on Gemini.
Embedding
- class vechord.embedding.SpacyDenseEmbedding(model='en_core_web_sm', dim=96)
Bases: BaseEmbedding
Spacy Dense Embedding.
- class vechord.embedding.GeminiDenseEmbedding(model='models/text-embedding-004', dim=768)
Bases: BaseEmbedding
Gemini Dense Embedding.
- class vechord.embedding.OpenAIDenseEmbedding(model='text-embedding-3-large', dim=3072)
Bases: BaseEmbedding
OpenAI Dense Embedding.
- class vechord.embedding.SpladePPSparseEmbedding(url, dim=30522, timeout_sec=10)
Bases: BaseEmbedding
Evaluate
- class vechord.evaluate.BaseEvaluator
Bases: ABC
- class vechord.evaluate.GeminiEvaluator(model='gemini-2.0-flash')
Bases: BaseEvaluator
Evaluator that uses a Gemini model to generate search queries.
Extract
- class vechord.extract.BaseHTMLParser(*, convert_charrefs=Ellipsis)
Bases: HTMLParser
A simple HTML parser to extract text content.
- class vechord.extract.SimpleExtractor
Bases: BaseExtractor
Local extractor for text files.
- class vechord.extract.GeminiExtractor(model='gemini-2.0-flash')
Bases: SimpleExtractor
Extract text with a Gemini model.
Load
- class vechord.load.LocalLoader(path, include=None)
Bases: BaseLoader
Load documents from the local file system.
- class vechord.load.S3Loader(bucket, prefix, include=None)
Bases: BaseLoader
Rerank
- class vechord.rerank.CohereReranker(model='rerank-v3.5')
Bases: BaseReranker
Rerank chunks using the Cohere API (requires the COHERE_API_KEY environment variable).
Service
- vechord.service.create_web_app(registry)
Create a Falcon WSGI application for the given registry.
This includes:
  - health check: GET /
  - tables: GET/POST/DELETE /api/table/{table_name}
  - pipeline in a transaction: POST /api/pipeline
  - OpenAPI spec and Swagger UI: GET /openapi/swagger
- Return type:
  App