Client¶
Documentation for the Vector AI client.
- class vectorai.client.ViClient(username: Optional[str] = None, api_key: Optional[str] = None, url: str = 'https://api.vctr.ai', verbose: bool = True)¶
The main Vi client with most of the available read and write methods available to it.
- Parameters
username – your username for accessing vectorai
api_key – your api key for accessing vectorai
url – url of the deployed vectorai database
Example
>>> from vectorai.client import ViClient
>>> vi_client = ViClient(username, api_key, vectorai_url)
>>> vi_client.list_collections()
- vectorai.client.request_api_key(username: str, email: str, description: str = "I'd like to try it out.", referral_code: str = 'github_referred')¶
Request an api key. Make sure to save the api key somewhere safe. If you have a valid referral code, you can receive the api key more quickly.
- Parameters
username – Username you’d like to create, lowercase only
email – Email you are using to sign up
description – Description of your intended use case
referral_code – The referral code you’ve been given to allow you to register for an api key before others
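Example
A minimal sketch; the username, email and description below are placeholders, not real values:
>>> from vectorai.client import request_api_key
>>> request_api_key(username='my_username', email='me@example.com',
...                 description='Trying out vector search on product data')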
- class vectorai.client.ViCollectionClient(collection_name: str, username: str, api_key: str, url: str = 'https://api.vctr.ai', verbose: bool = True)¶
The Vi client to use when you are mainly working with a single collection.
- Parameters
username – your username for accessing vecdb
api_key – your api key for accessing vecdb
url – url of the deployed vecdb database
collection_name – The name of the collection
Example
>>> from vectorai.client import ViCollectionClient
>>> vi_client = ViCollectionClient(collection_name, username, api_key, vectorai_url)
>>> vi_client.insert_documents(documents)
- advanced_cluster_aggregate(collection_name: str, aggregation_query: Dict, vector_field: str, alias: str = 'default', page: int = 1, page_size: int = 10, asc: bool = False, filters: list = [], flatten: bool = True)¶
Aggregate every cluster in a collection
Takes an aggregation query and gets the aggregate of each cluster in a collection. This helps you interpret each cluster and what is in them.
Can only be used after a vector field has been clustered with /advanced_cluster.
- Parameters
collection_name – Name of Collection
aggregation_query – Aggregation query to aggregate data
page_size – Size of each page of results
page – Page of the results
asc – Whether to sort results by ascending or descending order
vector_field – Clustered vector field
alias – Alias of a cluster
flatten – Whether to flatten the aggregated results into a list of dictionaries or a dictionary of lists.
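Example
A minimal sketch, assuming the collection has already been clustered on a 'product_description_vector_' field and contains 'category' and 'price' fields (illustrative names only):
>>> agg_query = {
...     "groupby": [{"name": "category", "field": "category", "agg": "category"}],
...     "metrics": [{"name": "avg_price", "field": "price", "agg": "avg"}]
... }
>>> vi_client.advanced_cluster_aggregate(collection_name, agg_query,
...                                      vector_field='product_description_vector_',
...                                      alias='default')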
- advanced_cluster_centroid_documents(collection_name: str, vector_field: str, alias: str = 'default', metric: str = 'cosine', include_vector: bool = True)¶
Returns the document closest to each cluster center of a collection
Can only be used after a vector field has been clustered with /advanced_cluster.
- Parameters
vector_field – Clustered vector field
alias – Alias is used to name a cluster
metric – Similarity Metric, choose from [‘cosine’, ‘l1’, ‘l2’, ‘dp’]
include_vector – Include vectors in the search results
collection_name – Name of Collection
- advanced_cluster_centroids(collection_name: str, vector_field: str, alias: str = 'default')¶
Returns the cluster centers of a collection by a vector field
Can only be used after a vector field has been clustered with /advanced_cluster.
- Parameters
vector_field – Clustered vector field
alias – Alias is used to name a cluster
collection_name – Name of Collection
- advanced_cluster_facets(collection_name: str, vector_field: str, alias: str = 'default', facets_fields: List = [], asc: bool = True)¶
Get Facets in each cluster in a collection
Takes a high level aggregation of every field and every cluster in a collection. This helps you interpret each cluster and what is in them.
Can only be used after a vector field has been clustered with /advanced_cluster.
- Parameters
vector_field – Clustered vector field
alias – Alias is used to name a cluster
facets_fields – Fields to include in the facets, if [] then all
page_size – Size of facet page
page – Page of the results
asc – Whether to sort results by ascending or descending order
date_interval – Interval for date facets
collection_name – Name of Collection
- advanced_clustering_job(collection_name: str, vector_field: str, alias: str = 'default', n_clusters: int = 0, n_init: int = 5, n_iter: int = 10, refresh: bool = True)¶
Clusters a collection by a vector field
Clusters a collection into groups using unsupervised machine learning. Clusters can then be aggregated to understand what is in them and how vectors are separating data into different groups. Advanced clustering allows for more parameters to tune and an alias to name each differently trained cluster.
- Parameters
vector_field – Vector field to perform clustering on
alias – Alias is used to name a cluster
n_clusters – Number of clusters
n_iter – Number of iterations in each run
n_init – Number of runs to run with different centroid seeds
refresh – Whether to refresh the whole collection and retrain the cluster model
collection_name – Name of Collection
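Example
A minimal sketch, assuming a 'product_description_vector_' field exists in the collection (the field name and cluster count are illustrative):
>>> vi_client.advanced_clustering_job(collection_name,
...                                   vector_field='product_description_vector_',
...                                   alias='kmeans_10', n_clusters=10)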
- advanced_hybrid_search(collection_name: str, text: str, multivector_query: Dict, text_fields: List, sum_fields: bool = True, facets: List = [], filters: List = [], metric: str = 'cosine', min_score=None, page: int = 1, page_size: int = 10, include_vector: bool = False, include_count: bool = True, include_facets: bool = False, asc: bool = False)¶
Advanced Search a text field with vector and text using Vector Search and Traditional Search
Advanced Vector similarity search + Traditional Fuzzy Search with text and vector.
You can also give weightings of each vector field towards the search, e.g. image_vector_ weights 100%, whilst description_vector_ 50%.
Advanced search also supports filtering to only search through filtered results and facets to get the overview of products available when a minimum score is set.
- Parameters
collection_name – Name of Collection
page – Page of the results
page_size – Size of each page of results
approx – Used for approximate search
sum_fields – Whether to sum the multiple vectors similarity search score as 1 or separate
metric – Similarity Metric, choose from [‘cosine’, ‘l1’, ‘l2’, ‘dp’]
filters – Query for filtering the search results
facets – Fields to include in the facets, if [] then all
min_score – Minimum score for similarity metric
include_vector – Include vectors in the search results
include_count – Include count in the search results
include_facets – Include facets in the search results
hundred_scale – Whether to scale up the metric by 100
multivector_query – Query for advance search that allows for multiple vector and field querying
text – Text Search Query (not encoded as vector)
text_fields – Text fields to search against
traditional_weight – Multiplier of traditional search. A value of 0.025~0.1 is good.
fuzzy – Fuzziness of the search. A value of 1-3 is good.
join – Whether to consider cases where there is a space in the word. E.g. Go Pro vs GoPro.
asc – Whether to sort the score by ascending order (default is false, for getting most similar results)
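Example
A minimal sketch, assuming a deployed text encoder and 'product_description' / 'product_description_vector_' fields (all illustrative names):
>>> from vectorai.models.deployed import ViText2Vec
>>> text_encoder = ViText2Vec(username, api_key, vectorai_url)
>>> query = 'wireless headphones'
>>> vi_client.advanced_hybrid_search(
...     collection_name,
...     text=query,
...     multivector_query={'text': {'vector': text_encoder.encode(query),
...                                 'fields': ['product_description_vector_']}},
...     text_fields=['product_description'])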
- advanced_search(collection_name: str, multivector_query: Dict, sum_fields: bool = True, facets: List = [], filters: List = [], metric: str = 'cosine', min_score=None, page: int = 1, page_size: int = 10, include_vector: bool = False, include_count: bool = True, include_facets: bool = False, asc: bool = False, approx: int = 0)¶
Advanced Vector Similarity Search. Support for multiple vectors, vector weightings, facets and filtering
Advanced Vector Similarity Search enables machine learning search with vector search. Search with multiple vectors for the most similar documents.
For example: search with a product's image and description vectors to find the most similar products by what they look like and what they are described to do.
You can also give weightings of each vector field towards the search, e.g. image_vector_ weights 100%, whilst description_vector_ 50%.
Advanced search also supports filtering to only search through filtered results and facets to get the overview of products available when a minimum score is set.
- Parameters
collection_name – Name of Collection
multivector_query – Query for advance search that allows for multiple vector and field querying
page – Page of the results
page_size – Size of each page of results
approx – Used for approximate search
sum_fields – Whether to sum the multiple vectors similarity search score as 1 or separate
metric – Similarity Metric, choose from [‘cosine’, ‘l1’, ‘l2’, ‘dp’]
filters – Query for filtering the search results
facets – Fields to include in the facets, if [] then all
min_score – Minimum score for similarity metric
include_vector – Include vectors in the search results
include_count – Include count in the search results
include_facets – Include facets in the search results
hundred_scale – Whether to scale up the metric by 100
asc – Whether to sort the score by ascending order (default is false, for getting most similar results)
Example
>>> vi_client = ViCollectionClient(collection_name, username, api_key, url)
>>> advanced_search_query = {
...     'text': {'vector': encode_question("How do I cluster?"),
...              'fields': ['function_vector_']}
... }
>>> vi_client.advanced_search(advanced_search_query)
- advanced_search_by_id(collection_name: str, document_id: str, fields: Dict, sum_fields: bool = True, facets: List = [], filters: List = [], metric: str = 'cosine', min_score=None, page: int = 1, page_size: int = 10, include_vector: bool = False, include_count: bool = True, include_facets: bool = False, asc: bool = False)¶
Advanced Single Product Recommendations (Search by an id).
For example: search with the id of a product in the database, using the product's image and description vectors to find the most similar products by what it looks like and what it is described to do.
You can also give weightings of each vector field towards the search, e.g. image_vector_ weights 100%, whilst description_vector_ 50%.
Advanced search also supports filtering to only search through filtered results and facets to get the overview of products available when a minimum score is set.
- Parameters
collection_name – Name of Collection
page – Page of the results
page_size – Size of each page of results
approx – Used for approximate search
sum_fields – Whether to sum the multiple vectors similarity search score as 1 or separate
metric – Similarity Metric, choose from [‘cosine’, ‘l1’, ‘l2’, ‘dp’]
filters – Query for filtering the search results
facets – Fields to include in the facets, if [] then all
min_score – Minimum score for similarity metric
include_vector – Include vectors in the search results
include_count – Include count in the search results
include_facets – Include facets in the search results
hundred_scale – Whether to scale up the metric by 100
document_id – ID of a document
search_fields – Vector fields to search against, and the weightings for them.
asc – Whether to sort the score by ascending order (default is false, for getting most similar results)
Example
>>> filter_query = [
...     {'field': 'field_name', 'filter_type': 'text',
...      'condition_value': 'monkl', 'condition': '=='}
... ]
>>> results = client.advanced_search_by_id(
...     document_id=client.random_documents()['documents'][0]['_id'],
...     fields={'image_url_field_flattened_vector_': 1},
...     filters=filter_query)
- advanced_search_by_ids(collection_name: str, document_ids: Dict, fields: Dict, vector_operation: str = 'mean', sum_fields: bool = True, facets: List = [], filters: List = [], metric: str = 'cosine', min_score=None, page: int = 1, page_size: int = 10, include_vector: bool = False, include_count: bool = True, include_facets: bool = False, asc: bool = False)¶
Advanced Multi Product Recommendations (Search by ids).
For example: search with multiple ids of products in the database, using the products' image and description vectors to find the most similar products by what they look like and what they are described to do.
You can also give weightings of each vector field towards the search, e.g. image_vector_ weights 100%, whilst description_vector_ 50%.
You can also give weightings on each product, e.g. product ID-A weights 100% whilst product ID-B 50%.
Advanced search also supports filtering to only search through filtered results and facets to get the overview of products available when a minimum score is set.
- Parameters
collection_name – Name of Collection
page – Page of the results
page_size – Size of each page of results
approx – Used for approximate search
sum_fields – Whether to sum the multiple vectors similarity search score as 1 or separate
metric – Similarity Metric, choose from [‘cosine’, ‘l1’, ‘l2’, ‘dp’]
filters – Query for filtering the search results
facets – Fields to include in the facets, if [] then all
min_score – Minimum score for similarity metric
include_vector – Include vectors in the search results
include_count – Include count in the search results
include_facets – Include facets in the search results
hundred_scale – Whether to scale up the metric by 100
document_ids – Document IDs to get recommendations for, and the weightings of each document
search_fields – Vector fields to search against, and the weightings for them.
vector_operation – Aggregation for the vectors, choose from [‘mean’, ‘sum’, ‘min’, ‘max’]
asc – Whether to sort the score by ascending order (default is false, for getting most similar results)
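Example
A minimal sketch, assuming two document ids and two vector fields that exist in your collection (the ids, fields and weightings are illustrative):
>>> vi_client.advanced_search_by_ids(
...     collection_name,
...     document_ids={'product_a': 1, 'product_b': 0.5},
...     fields={'image_vector_': 1, 'description_vector_': 0.5},
...     vector_operation='mean')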
- advanced_search_by_positive_negative_ids(collection_name: str, positive_document_ids: Dict, negative_document_ids: Dict, fields: Dict, vector_operation: str = 'mean', sum_fields: bool = True, facets: List = [], filters: List = [], metric: str = 'cosine', min_score=None, page: int = 1, page_size: int = 10, include_vector: bool = False, include_count: bool = True, include_facets: bool = False, asc: bool = False)¶
Advanced Multi Product Recommendations with likes and dislikes (Search by ids).
For example: search with multiple ids of liked and disliked products in the database, then use the products' image and description vectors to find the most similar products to the positives and the most dissimilar products to the negatives.
You can also give weightings of each vector field towards the search, e.g. image_vector_ weights 100%, whilst description_vector_ 50%.
You can also give weightings on each product, e.g. product ID-A weights 100% whilst product ID-B 50%.
Advanced search also supports filtering to only search through filtered results and facets to get the overview of products available when a minimum score is set.
- Parameters
collection_name – Name of Collection
page – Page of the results
page_size – Size of each page of results
approx – Used for approximate search
sum_fields – Whether to sum the multiple vectors similarity search score as 1 or separate
metric – Similarity Metric, choose from [‘cosine’, ‘l1’, ‘l2’, ‘dp’]
filters – Query for filtering the search results
facets – Fields to include in the facets, if [] then all
min_score – Minimum score for similarity metric
include_vector – Include vectors in the search results
include_count – Include count in the search results
include_facets – Include facets in the search results
hundred_scale – Whether to scale up the metric by 100
positive_document_ids – Positive Document IDs to get recommendations for, and the weightings of each document
negative_document_ids – Negative Document IDs to get recommendations for, and the weightings of each document
search_fields – Vector fields to search against, and the weightings for them.
vector_operation – Aggregation for the vectors, choose from [‘mean’, ‘sum’, ‘min’, ‘max’]
asc – Whether to sort the score by ascending order (default is false, for getting most similar results)
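Example
A minimal sketch, assuming ids of liked and disliked documents and a vector field that exist in your collection (all illustrative):
>>> vi_client.advanced_search_by_positive_negative_ids(
...     collection_name,
...     positive_document_ids={'liked_product': 1},
...     negative_document_ids={'disliked_product': 1},
...     fields={'product_description_vector_': 1})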
- advanced_search_with_positive_negative_ids_as_history(collection_name: str, vector: List, positive_document_ids: Dict, negative_document_ids: Dict, fields: Dict, vector_operation: str = 'mean', sum_fields: bool = True, facets: List = [], filters: List = [], metric: str = 'cosine', min_score=None, page: int = 1, page_size: int = 10, include_vector: bool = False, include_count: bool = True, include_facets: bool = False, asc: bool = False)¶
Advanced Search with Likes and Dislikes as history
For example: vector search with a query vector together with multiple ids of liked and disliked products in the database, then use the products' image and description vectors to find the most similar products to the positives and the most dissimilar products to the negatives.
You can also give weightings of each vector field towards the search, e.g. image_vector_ weights 100%, whilst description_vector_ 50%.
You can also give weightings on each product, e.g. product ID-A weights 100% whilst product ID-B 50%.
Advanced search also supports filtering to only search through filtered results and facets to get the overview of products available when a minimum score is set.
- Parameters
collection_name – Name of Collection
page – Page of the results
page_size – Size of each page of results
approx – Used for approximate search
sum_fields – Whether to sum the multiple vectors similarity search score as 1 or separate
metric – Similarity Metric, choose from [‘cosine’, ‘l1’, ‘l2’, ‘dp’]
filters – Query for filtering the search results
facets – Fields to include in the facets, if [] then all
min_score – Minimum score for similarity metric
include_vector – Include vectors in the search results
include_count – Include count in the search results
include_facets – Include facets in the search results
hundred_scale – Whether to scale up the metric by 100
positive_document_ids – Positive Document IDs to get recommendations for, and the weightings of each document
negative_document_ids – Negative Document IDs to get recommendations for, and the weightings of each document
search_fields – Vector fields to search against, and the weightings for them.
vector_operation – Aggregation for the vectors, choose from [‘mean’, ‘sum’, ‘min’, ‘max’]
vector – Vector, a list/array of floats that represents a piece of data
asc – Whether to sort the score by ascending order (default is false, for getting most similar results)
- aggregate(collection_name: str, aggregation_query: Dict, page: int = 1, page_size: int = 10, asc: bool = False, flatten: bool = True)¶
Aggregate a collection
Aggregation/Groupby of a collection using an aggregation query. The aggregation query is a json body that follows the schema of:
{ "groupby" : [ {"name": <nickname/alias>, "field": <field in the collection>, "agg": "category"}, {"name": <another_nickname/alias>, "field": <another groupby field in the collection>, "agg": "category"} ], "metrics" : [ {"name": <nickname/alias>, "field": <numeric field in the collection>, "agg": "avg"} ] }
“groupby” is the fields you want to split the data into. These are the available groupby types:
“category”: groupby a field that is a category
“metrics” is the metrics you want to calculate in each of those groups. These are the available metric types (every aggregation includes a frequency metric):
“average”, “max”, “min”, “sum”, “cardinality”
- Parameters
collection_name – Name of Collection
aggregation_query – Aggregation query to aggregate data
page_size – Size of each page of results
page – Page of the results
asc – Whether to sort results by ascending or descending order
flatten – Whether to flatten the aggregated results into a list of dictionaries or a dictionary of lists.
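Example
A minimal sketch, assuming the collection has a categorical 'category' field and a numeric 'price' field (illustrative names):
>>> aggregation_query = {
...     "groupby": [{"name": "category", "field": "category", "agg": "category"}],
...     "metrics": [{"name": "avg_price", "field": "price", "agg": "avg"}]
... }
>>> vi_client.aggregate(collection_name, aggregation_query, page_size=10)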
- bulk_edit_document(collection_name: str, documents: List[Dict])¶
Edits documents by providing a key value pair of fields you are adding or changing. Make sure to include the “_id” in the documents.
- Parameters
collection_name – Name of collection
documents – A list of documents. Document is a JSON-like data that we store our metadata and vectors with. For specifying id of the document use the field ‘_id’, for specifying vector field use the suffix of ‘_vector_’
- bulk_edit_documents(collection_name, edits: List[Dict], chunk_size=15)¶
Bulk edit documents.
- Parameters
collection_name – Name of collection
edits – A list of documents to edit
chunk_size – The size of the chunk
- Returns
Dictionary with the keys ‘edited_successfully’ (the number of documents successfully edited), ‘failed’ (the number of documents that failed to edit), ‘failed_document_ids’ (documents which failed to edit)
- bulk_id(collection_name: str, document_ids: List[str])¶
Look up multiple document by their ids
- Parameters
document_ids – IDs of documents
include_vector – Include vectors in the search results
collection_name – Name of Collection
- bulk_insert(collection_name: str, documents: List, insert_date: bool = True, overwrite: bool = True)¶
Insert multiple documents into a Collection. When inserting a document you can specify your own id for it by using the field name “_id”. For specifying your own vector use the suffix (ends with) “_vector_” for the field name, e.g. “product_description_vector_”.
- Parameters
collection_name – Name of Collection
documents – A list of documents. Document is a JSON-like data that we store our metadata and vectors with. For specifying id of the document use the field ‘_id’, for specifying vector field use the suffix of ‘_vector_’
insert_date – Whether to include insert date as a field ‘insert_date_’.
overwrite – Whether to overwrite document if it exists.
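Example
A minimal sketch of inserting documents with a custom id and a pre-computed vector (the field names and the short 4-dimensional vectors are illustrative):
>>> documents = [
...     {'_id': 'product_1', 'name': 'AirPods', 'product_description_vector_': [0.1, 0.2, 0.3, 0.4]},
...     {'_id': 'product_2', 'name': 'MacBook Pro', 'product_description_vector_': [0.4, 0.3, 0.2, 0.1]},
... ]
>>> vi_client.bulk_insert(collection_name, documents)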
- bulk_insert_and_encode(collection_name: str, docs: list, models: dict)¶
Client-side encoding of documents to improve speed of inserting. This removes the step of retrieving the vectors and can be useful to accelerate the encoding process if required. Models can be one of ‘text’, ‘audio’ or ‘image’.
- bulk_missing_id(collection_name: str, document_ids: List[str])¶
Return IDs that are not in a collection.
- check_schema(collection_name: str, document: Dict = None)¶
Check the schema of a given collection.
- Parameters
collection_name – Name of collection.
Example
>>> from vectorai.client import ViClient
>>> vi_client = ViClient(username, api_key, vectorai_url)
>>> vi_client.check_schema(collection_name)
- chunk_search(collection_name: str, vector: List, search_fields: list, chunk_scoring: str = 'max', facets: List = [], filters: List = [], metric: str = 'cosine', sum_fields: bool = True, approx: int = 0, min_score=None, page: int = 1, page_size: int = 20, include_vector: bool = False, include_count: bool = True, include_facets: bool = False, asc: bool = False)¶
Chunk search functionality
- Parameters
collection_name – Name of collection
vector – A list of values
search_fields – A list of fields to search
chunk_scoring – How each chunk should be scored
approx – How many approximate neighbors to go through
- cluster_aggregate(collection_name: str, aggregation_query: Dict, page: int = 1, page_size: int = 10, asc: bool = False, flatten: bool = True)¶
Aggregate every cluster in a collection
Takes an aggregation query and gets the aggregate of each cluster in a collection. This helps you interpret each cluster and what is in them.
Can only be used after a vector field has been clustered with /cluster.
- Parameters
collection_name – Name of Collection
aggregation_query – Aggregation query to aggregate data
page_size – Size of each page of results
page – Page of the results
asc – Whether to sort results by ascending or descending order
flatten – Whether to flatten the aggregated results into a list of dictionaries or a dictionary of lists.
- cluster_centroid_documents(collection_name: str, vector_field: str, metric: str = 'cosine', include_vector: bool = True)¶
Returns the document closest to each cluster center of a collection
Can only be used after a vector field has been clustered with /cluster.
- Parameters
vector_field – Clustered vector field
metric – Similarity Metric, choose from [‘cosine’, ‘l1’, ‘l2’, ‘dp’]
include_vector – Include vectors in the search results
collection_name – Name of Collection
- cluster_centroids(collection_name: str, vector_field: str)¶
Returns the cluster centers of a collection by a vector field
Can only be used after a vector field has been clustered with /cluster.
- Parameters
vector_field – Clustered vector field
collection_name – Name of Collection
- cluster_facets(collection_name: str, fields: List = [], asc: bool = True)¶
Get Facets in each cluster in a collection
Takes a high level aggregation of every field and every cluster in a collection. This helps you interpret each cluster and what is in them.
Can only be used after a vector field has been clustered with /cluster.
- Parameters
facets_fields – Fields to include in the facets, if [] then all
page_size – Size of facet page
page – Page of the results
asc – Whether to sort results by ascending or descending order
date_interval – Interval for date facets
collection_name – Name of Collection
- clustering_job(collection_name: str, vector_field: str, n_clusters: int = 0, refresh: bool = True)¶
Clusters a collection by a vector field
Clusters a collection into groups using unsupervised machine learning. Clusters can then be aggregated to understand what is in them and how vectors are separating data into different groups.
- Parameters
vector_field – Vector field to perform clustering on
n_clusters – Number of clusters
refresh – Whether to refresh the whole collection and retrain the cluster model
collection_name – Name of Collection
- collection_schema(collection_name: str)¶
Retrieves the schema of a collection
The schema of a collection can include types of: text, numeric, date, bool, etc.
- Parameters
collection_name – Name of Collection
- collection_stats(collection_name: str)¶
Retrieves stats about a collection
Stats include: size, searches, number of documents, etc.
- Parameters
collection_name – Name of Collection
- compare_vector_search_results(collection_name: str, vector_fields: List[str], label: str, id_document: str = None, id_value: str = None, num_rows=10)¶
Compare vector search results across vector fields.
- Parameters
vector_fields – The list of vectors
id_value – The value of the ID of the document
id_document – The document with the id_value in it
label – The label for the vector
num_rows – The number of rows to compare search results for
Example
>>> vi_client.compare_vector_search_results(collection_name, vector_fields)
- create_collection(collection_name: str, collection_schema: Dict = {})¶
Create a collection
A collection can store documents to be searched, retrieved, filtered and aggregated (similar to Collections in MongoDB, Tables in SQL, Indexes in ElasticSearch).
If you are inserting your own vector use the suffix (ends with) “_vector_” for the field name, and specify the length of the vector in collection_schema as in the example below:
{ "collection_schema": { "celebrity_image_vector_": 1024, "celebrity_audio_vector" : 512, "product_description_vector" : 128 } }
- Parameters
collection_name – Name of a collection
collection_schema – A collection schema. This is necessary if the first document is not representative of the overall schema collection. This should be specified if the items need to be edited. The schema needs to look like this : { vector_field_name: vector_length }
Example
>>> collection_schema = {'image_vector_': 2048}
>>> ViClient.create_collection(collection_name, collection_schema)
- create_collection_from_document(collection_name: str, document: dict)¶
Creates a collection by inferring the schema from a document
If you are inserting your own vector use the suffix (ends with) “_vector_” for the field name. e.g. “product_description_vector_”
- Parameters
collection_name – Name of Collection
document – A Document is a JSON-like data that we store our metadata and vectors with. For specifying id of the document use the field ‘_id’, for specifying vector field use the suffix of ‘_vector_’
- create_filter_query(collection_name: str, field: str, filter_type: str, filter_values: Union[List[str], str] = None)¶
Filter type can be one of contains/exact_match/categories/exists/insert_date/numeric_range.
contains: field must contain this specific string (not case sensitive)
exact_match: field must have an exact match
categories: matches the entire field
exists: if the field exists in the document
>= / > / < / <= : larger than or equal to / larger than / smaller than / smaller than or equal to. These can only be applied on numeric/date values; check collection_schema.
- Parameters
collection_name – The name of the collection
field – The field to filter on
filter_type – One of contains/exact_match/categories/>=/>/<=/<.
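Example
A minimal sketch, assuming a text 'category' field exists in the collection and that the returned query can be passed straight to filters() (both assumptions are illustrative, not confirmed here):
>>> filter_query = vi_client.create_filter_query(collection_name,
...                                              field='category',
...                                              filter_type='contains',
...                                              filter_values='bluetooth')
>>> vi_client.filters(collection_name, filter_query)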
- delete_by_id(collection_name: str, document_id: str)¶
Delete a document in a Collection by its id
- Parameters
document_id – ID of a document
collection_name – Name of Collection
- delete_collection(collection_name: str)¶
Delete the collection via the collection name.
- Parameters
collection_name – Name of collection to delete.
Example
>>> from vectorai.client import ViClient
>>> vi_client = ViClient(username, api_key, vectorai_url)
>>> vi_client.delete_collection(collection_name)
- dimensionality_reduce(collection_name: str, vectors: List[List[float]], vector_field: str, n_components: int, alias: str = 'default')¶
Trains a Dimensionality Reduction model on the collection
Dimensionality reduction allows your vectors to be reduced down to any dimensions greater than 0 using unsupervised machine learning. This is useful for even faster search and visualising the vectors.
- Parameters
vector_field – Vector field to perform dimensionality reduction on
alias – Alias is used to name the dimensionality reduced vectors
n_components – The size/length to reduce the vector down to. If 0 is set then the highest possible number of components is used; when this is done you can get reduction on demand of any length.
refresh – Whether to refresh the whole collection and retrain the dimensionality reduction model
collection_name – Name of Collection
- dimensionality_reduction_job(collection_name: str, vector_field: str, n_components: int = 0, alias: str = 'default', refresh: bool = True)¶
Trains a Dimensionality Reduction model on the collection
Dimensionality reduction allows your vectors to be reduced down to any dimensions greater than 0 using unsupervised machine learning. This is useful for even faster search and visualising the vectors.
- Parameters
vector_field – Vector field to perform dimensionality reduction on
alias – Alias is used to name the dimensionality reduced vectors
n_components – The size/length to reduce the vector down to. If 0 is set then the highest possible number of components is used; when this is done you can get reduction on demand of any length.
refresh – Whether to refresh the whole collection and retrain the dimensionality reduction model
collection_name – Name of Collection
- edit_document(collection_name: str, edits: Dict[str, str], verbose=True)¶
Edit a document in a collection based on ID
- Parameters
collection_name – Name of collection
edits – What edits to make in a collection.
document_id – Id of the document
Example
>>> from vectorai.client import ViClient
>>> vi_client = ViClient(username, api_key, vectorai_url)
>>> vi_client.edit_documents(collection_name, edits=documents, workers=10)
>>> from vectorai.client import ViClient
>>> vi_client = ViClient(username, api_key, vectorai_url)
>>> documents_df = pd.DataFrame.from_records([{'chicken': 'Big chicken'},
...                                           {'chicken': 'small_chicken'},
...                                           {'chicken': 'cow'}])
>>> vi_client.edit_document(documents=documents_df, models={'chicken': text_encoder.encode})
- edit_documents(collection_name: str, edits: Dict, workers: int = 1)¶
Edit documents in a collection
- Parameters
collection_name – Name of collection
edits – What edits to make in a collection. Ensure that _id is stored in the document.
workers – Number of parallel processes to run.
Example
>>> from vectorai.client import ViClient
>>> vi_client = ViClient(username, api_key, vectorai_url)
>>> vi_client.edit_documents(collection_name, edits=documents, workers=10)
- encode_array(collection_name: str, array: List, array_field: str)¶
Encode an array into a vector
For example: an array that represents a movie’s categories, field “movie_categories”:
["sci-fi", "thriller", "comedy"] -> <Encode the arrays to vectors> -> | sci-fi | thriller | comedy | romance | drama | |--------|----------|--------|---------|-------| | 1 | 1 | 1 | 0 | 0 | array vector: [1, 1, 1, 0, 0]
- Parameters
array_field – The array field that encoding of the dictionary is trained on
array – The array to encode into vectors
collection_name – Name of Collection
- encode_array_field(collection_name: str, array_fields: List)¶
Encode all arrays in a field for a collection into vectors
Within a collection encode the specified array field in every document into vectors.
For example, an array that represents a movie's categories, field “movie_categories”:
document 1 array field: {"category": ["sci-fi", "thriller", "comedy"]}
document 2 array field: {"category": ["sci-fi", "romance", "drama"]}

-> <Encode the arrays to vectors> ->

| sci-fi | thriller | comedy | romance | drama |
|--------|----------|--------|---------|-------|
|   1    |    1     |   1    |    0    |   0   |
|   1    |    0     |   0    |    1    |   1   |

document 1 array vector: {"movie_categories_vector_": [1, 1, 1, 0, 0]}
document 2 array vector: {"movie_categories_vector_": [1, 0, 0, 1, 1]}
- Parameters
array_fields – The array field to train on to encode into vectors
collection_name – Name of Collection
- encode_audio(collection_name: str, audio)¶
Encode audio into a vector
Note: audio has to be stored somewhere and be provided as audio_url, a url that stores the audio.
For example: an audio_url represents sounds that a pokemon make:
"https://play.pokemonshowdown.com/audio/cries/pikachu.mp3" -> <Encode the audio to vector> -> audio_url vector: [0.794617772102356, 0.3581121861934662, 0.21113917231559753, 0.24878688156604767, 0.9741804003715515 ...]
- Parameters
audio_url – The audio url of an audio to encode into a vector
collection_name – Name of Collection
- encode_audio_job(collection_name: str, audio_field: str, refresh: bool = False)¶
Encode all audios in a field into vectors
Within a collection encode the specified audio field in every document into vectors.
Note: audio has to be stored somewhere and be provided as audio_url, a url that stores the audio.
For example, an audio_url field “pokemon_cries” represents sounds that a pokemon make:
document 1 audio_url field: {"pokemon_cries": "https://play.pokemonshowdown.com/audio/cries/pikachu.mp3"}
document 2 audio_url field: {"pokemon_cries": "https://play.pokemonshowdown.com/audio/cries/meowth.mp3"}

-> <Encode the audios to vectors> ->

document 1 audio_url vector: {"pokemon_cries_vector_": [0.794617772102356, 0.3581121861934662, 0.21113917231559753, 0.24878688156604767, 0.9741804003715515 ...]}
document 2 audio_url vector: {"pokemon_cries_vector_": [0.8364648222923279, 0.6280597448348999, 0.8112713694572449, 0.36105549335479736, 0.005313870031386614 ...]}
- Parameters
audio_field – The audio field to encode into vectors
refresh – Whether to refresh the whole collection and re-encode all to vectors
collection_name – Name of Collection
- encode_dictionary(collection_name: str, dictionary: Dict, dictionary_field: str)¶
Encode a dictionary into a vector
For example: a dictionary that represents a person’s characteristics visiting a store, field “person_characteristics”:
{"height":180, "age":40, "weight":70} -> <Encode the dictionary to vector> -> | height | age | weight | purchases | visits | |--------|-----|--------|-----------|--------| | 180 | 40 | 70 | 0 | 0 | dictionary vector: [180, 40, 70, 0, 0]
- Parameters
collection_name – Name of Collection
dictionary – A dictionary to encode into vectors
dictionary_field – The dictionary field that encoding of the dictionary is trained on
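Example
A minimal sketch, assuming a 'person_characteristics' dictionary field has already been trained on with encode_dictionary_field (the field name and values are illustrative):
>>> vi_client.encode_dictionary(collection_name,
...                             dictionary={'height': 180, 'age': 40, 'weight': 70},
...                             dictionary_field='person_characteristics')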
- encode_dictionary_field(collection_name: str, dictionary_fields: List)¶
Encode all dictionaries in a field for collection into vectors
Within a collection encode the specified dictionary field in every document into vectors.
For example: a dictionary that represents a person’s characteristics visiting a store, field “person_characteristics”:
document 1 field: {"person_characteristics": {"height": 180, "age": 40, "weight": 70}}
document 2 field: {"person_characteristics": {"age": 32, "purchases": 10, "visits": 24}}

-> <Encode the dictionaries to vectors> ->

| height | age | weight | purchases | visits |
|--------|-----|--------|-----------|--------|
|  180   | 40  |   70   |     0     |   0    |
|   0    | 32  |   0    |    10     |   24   |

document 1 dictionary vector: {"person_characteristics_vector_": [180, 40, 70, 0, 0]}
document 2 dictionary vector: {"person_characteristics_vector_": [0, 32, 0, 10, 24]}
- Parameters
dictionary_fields – The dictionary field to train on to encode into vectors
collection_name – Name of Collection
- encode_image(collection_name: str, image)¶
Encode image into a vector
Note: image has to be stored somewhere and be provided as image_url, a url that stores the image.
For example: an image_url represents an image of a celebrity:
"https://www.celebrity_images.com/brad_pitt.png" -> <Encode the image to vector> -> image vector: [0.794617772102356, 0.3581121861934662, 0.21113917231559753, 0.24878688156604767, 0.9741804003715515 ...]
- Parameters
image – The image url of an image to encode into a vector
collection_name – Name of Collection
- encode_image_job(collection_name: str, image_field: str, refresh: bool = False)¶
Encode all images in a field into vectors
Within a collection encode the specified image field in every document into vectors.
Note: image has to be stored somewhere and be provided as image_url, a url that stores the image.
For example, an image_url field “celebrity_image” represents an image of a celebrity:
document 1 image_url field: {"celebrity_image": "https://www.celebrity_images.com/brad_pitt.png"}
document 2 image_url field: {"celebrity_image": "https://www.celebrity_images.com/brad_pitt.png"}

-> <Encode the images to vectors> ->

document 1 image_url vector: {"celebrity_image_vector_": [0.794617772102356, 0.3581121861934662, 0.21113917231559753, 0.24878688156604767, 0.9741804003715515 ...]}
document 2 image_url vector: {"celebrity_image_vector_": [0.8364648222923279, 0.6280597448348999, 0.8112713694572449, 0.36105549335479736, 0.005313870031386614 ...]}
- Parameters
image_field – The image field to encode into vectors
refresh – Whether to refresh the whole collection and re-encode all to vectors
collection_name – Name of Collection
- encode_text(collection_name: str, text)¶
Encode text into a vector
For example: a text field “product_description” represents the description of a product:
"AirPods deliver effortless, all-day audio on the go. And AirPods Pro bring Active Noise Cancellation to an in-ear headphone — with a customisable fit" -> <Encode the text to vector> -> text vector: [0.794617772102356, 0.3581121861934662, 0.21113917231559753, 0.24878688156604767, 0.9741804003715515 ...]
- Parameters
text – Text to encode into vector
collection_name – Name of Collection
- encode_text_job(collection_name: str, text_field: str, refresh: bool = False)¶
Encode all texts in a field into vectors
Within a collection encode the specified text field in every document into vectors.
For example, a text field “product_description” represents the description of a product:
document 1 text field: {"product_description": "AirPods deliver effortless, all-day audio on the go. And AirPods Pro bring Active Noise Cancellation to an in-ear headphone — with a customisable fit."}
document 2 text field: {"product_description": "MacBook Pro elevates the notebook to a whole new level of performance and portability. Wherever your ideas take you, you’ll get there faster than ever with high‑performance processors and memory, advanced graphics, blazing‑fast storage and more — all in a compact package."}

-> <Encode the texts to vectors> ->

document 1 text vector: {"product_description_vector_": [0.794617772102356, 0.3581121861934662, 0.21113917231559753, 0.24878688156604767, 0.9741804003715515 ...]}
document 2 text vector: {"product_description_vector_": [0.8364648222923279, 0.6280597448348999, 0.8112713694572449, 0.36105549335479736, 0.005313870031386614 ...]}
- Parameters
text_field – The text field to encode into vectors
refresh – Whether to refresh the whole collection and re-encode all to vectors
collection_name – Name of Collection
- facets(collection_name: str, fields: List[str] = [], page: int = 1, page_size: int = 20, asc: bool = False)¶
Retrieve the facets of a collection
Takes a high level aggregation of every field in a collection. This is used in advanced search to help create the filter bar for search.
- Parameters
facets_fields – Fields to include in the facets, if [] then all
date_interval – Interval for date facets
page_size – Size of facet page
page – Page of the results
asc – Whether to sort results by ascending or descending order
collection_name – Name of Collection
- filters(collection_name: str, filters: List, page=1, page_size=10, include_vector: bool = False)¶
Filters a collection
Filter is used to retrieve documents that match the conditions set in a filter query. This is used in advanced search to filter the documents that are searched.
The filters query is a json body that follows the schema of:
[
    {'field': <field to filter>, 'filter_type': <type of filter>, "condition": "==", "condition_value": "america"},
    {'field': <field to filter>, 'filter_type': <type of filter>, "condition": ">=", "condition_value": 90},
]
These are the available filter_type types:
1. "contains": for filtering documents that contains a string. {'field' : 'category', 'filter_type' : 'contains', "condition":"==", "condition_value": "bluetoo"]} 2. "exact_match"/"category": for filtering documents that matches a string or list of strings exactly. {'field' : 'category', 'filter_type' : 'categories', "condition":"==", "condition_value": "tv"]} 3. "categories": for filtering documents that contains any of a category from a list of categories. {'field' : 'category', 'filter_type' : 'categories', "condition":"==", "condition_value": ["tv", "smart", "bluetooth_compatible"]} 4. "exists": for filtering documents that contains a field. {'field' : 'purchased', 'filter_type' : 'exists', "condition":">=", "condition_value":" "} 5. "date": for filtering date by date range. {'field' : 'insert_date_', 'filter_type' : 'date', "condition":">=", "condition_value":"2020-01-01"} 6. "numeric": for filtering by numeric range. {'field' : 'price', 'filter_type' : 'date', "condition":">=", "condition_value":90}
These are the available conditions:
“==”, “!=”, “>=”, “>”, “<”, “<=”
- Parameters
collection_name – Name of Collection
filters – Query for filtering the search results
page – Page of the results
page_size – Size of each page of results
asc – Whether to sort results by ascending or descending order
include_vector – Include vectors in the search results
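Example
A minimal sketch, assuming 'category' and 'price' fields exist in the collection (the field names and values are illustrative):
>>> filter_query = [
...     {'field': 'category', 'filter_type': 'contains', 'condition': '==', 'condition_value': 'bluetooth'},
...     {'field': 'price', 'filter_type': 'numeric', 'condition': '>=', 'condition_value': 90},
... ]
>>> vi_client.filters(collection_name, filter_query)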
- head(collection_name: str, page_size: int = 5, return_as_pandas_df: bool = True)¶
Returns the first few documents of a collection for a quick preview.
- Parameters
collection_name – The name of your collection
page_size – The number of results to return
return_as_pandas_df – If True, return as a pandas DataFrame rather than a JSON.
Example
>>> from vectorai.client import ViClient
>>> vi_client = ViClient(username, api_key, vectorai_url)
>>> vi_client.head(collection_name, page_size=10)
- hybrid_search(collection_name: str, text: str, vector: List, fields: List, text_fields: List, sum_fields: bool = True, metric: str = 'cosine', min_score=None, traditional_weight=0.075, page: int = 1, page_size: int = 10, include_vector: bool = False, include_count: bool = True, asc: bool = False)¶
Search a text field with vector and text using Vector Search and Traditional Search
Vector similarity search + Traditional Fuzzy Search with text and vector.
- Parameters
text – Text Search Query (not encoded as vector)
vector – Vector, a list/array of floats that represents a piece of data.
text_fields – Text fields to search against
traditional_weight – Multiplier of traditional search. A value of 0.025~0.1 is good.
fuzzy – Fuzziness of the search. A value of 1-3 is good.
join – Whether to consider cases where there is a space in the word. E.g. Go Pro vs GoPro.
collection_name – Name of Collection
search_fields – Vector fields to search through
approx – Used for approximate search
sum_fields – Whether to sum the multiple vectors similarity search score as 1 or separate
page_size – Size of each page of results
page – Page of the results
metric – Similarity Metric, choose from [‘cosine’, ‘l1’, ‘l2’, ‘dp’]
min_score – Minimum score for similarity metric
include_vector – Include vectors in the search results
include_count – Include count in the search results
hundred_scale – Whether to scale up the metric by 100
asc – Whether to sort the score by ascending order (default is false, for getting most similar results)
- id(collection_name: str, document_id: str, include_vector: bool = True)¶
Look up a document by its id
- Parameters
document_id – ID of a document
include_vector – Include vectors in the search results
collection_name – Name of Collection
- insert(collection_name: str, document: Dict, insert_date: bool = True, overwrite: bool = True)¶
Insert a document into a Collection. When inserting the document you can specify your own id for it by using the field name “_id”. For specifying your own vector use the suffix (ends with) “_vector_” for the field name, e.g. “product_description_vector_”.
- Parameters
collection_name – Name of Collection
document – A Document is a JSON-like data that we store our metadata and vectors with. For specifying id of the document use the field ‘_id’, for specifying vector field use the suffix of ‘_vector_’
insert_date – Whether to include insert date as a field ‘insert_date_’.
overwrite – Whether to overwrite document if it exists.
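Example
A minimal sketch of inserting a single document with a custom id and a pre-computed vector (the field names and the short 4-dimensional vector are illustrative):
>>> document = {'_id': 'product_1', 'name': 'AirPods',
...             'product_description_vector_': [0.1, 0.2, 0.3, 0.4]}
>>> vi_client.insert(collection_name, document)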
- insert_df(collection_name: str, df: pandas.core.frame.DataFrame, models: Dict[str, Callable] = {}, chunksize: int = 15, workers: int = 1, verbose: bool = True, use_bulk_encode: bool = False)¶
Insert dataframe into a collection
- Parameters
collection_name – Name of collection
df – Pandas DataFrame
models – Models with an encode method
verbose – Whether to print document ids that have failed when inserting.
Example
>>> from vectorai.models.deployed import ViText2Vec
>>> text_encoder = ViText2Vec(username, api_key, vectorai_url)
>>> documents_df = pd.DataFrame.from_records([{'chicken': 'Big chicken'},
...                                           {'chicken': 'small_chicken'},
...                                           {'chicken': 'cow'}])
>>> vi_client.insert_df(documents=documents_df, models={'chicken': text_encoder.encode})
- insert_document(collection_name: str, document: Dict, verbose=False)¶
Insert a document into a collection
- Parameters
collection_name – Name of collection
documents – List of documents/jsons/dictionaries.
Example
>>> from vectorai import ViClient
>>> from vectorai.models.deployed import ViText2Vec
>>> vi_client = ViClient()
>>> collection_name = 'test_collection'
>>> document = {'chicken': 'Big chicken'}
>>> vi_client.insert_document(collection_name, document)
- insert_documents(collection_name: str, documents: List, models: Dict[str, Callable] = {}, chunksize: int = 15, workers: int = 1, verbose: bool = False, use_bulk_encode: bool = False, overwrite: bool = False, show_progress_bar: bool = True)¶
Insert documents into a collection with an option to encode with models.
- Parameters
collection_name – Name of collection
documents – All documents.
models – Models with an encode method
use_bulk_encode – Use the bulk_encode method in models
verbose – Whether to print document ids that have failed when inserting.
overwrite – If True, overwrites document based on _id field.
Example
>>> from vectorai.models.deployed import ViText2Vec
>>> text_encoder = ViText2Vec(username, api_key, vectorai_url)
>>> documents = [{'chicken': 'Big chicken'}, {'chicken': 'small_chicken'}, {'chicken': 'cow'}]
>>> vi_client.insert_documents(documents, models={'chicken': text_encoder.encode})
- insert_single_document(collection_name: str, document: Dict)¶
Insert a single document into a collection.
- Parameters
documents – List of documents/jsons/dictionaries.
Example
>>> from vectorai import ViClient
>>> from vectorai.models.deployed import ViText2Vec
>>> vi_client = ViClient()
>>> collection_name = 'test_collection'
>>> document = {'chicken': 'Big chicken'}
>>> vi_client.insert_single_document(collection_name, document)
- job_status(collection_name: str, job_id: str, job_name: str)¶
Get the status of a job: whether it is starting, running, failed or finished.
- Parameters
job_id – ID of the job
job_name – Name of the job
collection_name – Name of Collection
- list_jobs(collection_name: str)¶
Get history of jobs
List and get a history of all jobs, with their job_id, parameters, start time, etc.
- Parameters
collection_name – Name of Collection
- publish_aggregation(collection_name: str, aggregation_query: dict, aggregation_name: str, aggregated_collection_name: str, description: str = 'published aggregation', date_field: str = 'insert_date_', refresh_time: int = 30, start_immediately: bool = True)¶
Publishes your aggregation query to a new collection. Publishes and schedules your aggregation query and saves the results to a new collection. This new collection is just like any other collection and you can read, filter and aggregate it.
- Parameters
source_collection – The collection where the data to aggregate comes from
dest_collection – The name of collection of where the data will be aggregated to
aggregation_name – The name for the published scheduled aggregation
description – The description for the published scheduled aggregation
aggregation_query – The aggregation query to schedule
date_field – The date field to check whether there is new data coming in
refresh_time – How often should the aggregation check for new data
start_immediately – Whether to start the published aggregation immediately
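Example
A minimal sketch, assuming a 'category'/'price' aggregation query and an aggregated collection name of your choosing (all names are illustrative):
>>> aggregation_query = {
...     "groupby": [{"name": "category", "field": "category", "agg": "category"}],
...     "metrics": [{"name": "avg_price", "field": "price", "agg": "avg"}]
... }
>>> vi_client.publish_aggregation(collection_name,
...                               aggregation_query,
...                               aggregation_name='category_price_rollup',
...                               aggregated_collection_name='products_by_category')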
- random_aggregation_query(collection_name: str, groupby: int = 1, metrics: int = 1)¶
Generates a random aggregation query.
- Parameters
collection_name – name of collection
groupby – The number of groupbys to randomly generate
metrics – The number of metrics to randomly generate
Example
>>> from vectorai.client import ViClient
>>> vi_client = ViClient(username, api_key, vectorai_url)
>>> vi_client.random_aggregation_query(collection_name, groupby=1, metrics=1)
- random_documents(collection_name: str, page_size: int = 20, seed: int = None, include_vector: bool = True, include_fields: list = [])¶
Retrieve some documents randomly
Mainly for testing purposes.
- Parameters
seed – Random Seed for retrieving random documents.
page_size – Size of each page of results
include_vector – Include vectors in the search results
collection_name – Name of Collection
- random_filter_query(collection_name: str, text_filters: int = 1, numeric_filters: int = 0)¶
Generates a random filter query.
- Parameters
collection_name – name of collection
text_filters – The number of text filters to randomly generate
numeric_filters – The number of numeric filters to randomly generate
Example
>>> from vectorai.client import ViClient
>>> vi_client = ViClient(username, api_key, vectorai_url)
>>> vi_client.random_filter_query(collection_name, text_filters=1, numeric_filters=0)
- random_recommendation(collection_name: str, field: str, seed=None, sum_fields: bool = True, metric: str = 'cosine', min_score=0, page: int = 1, page_size: int = 10, include_vector: bool = False, include_count: bool = True, approx: int = 0, hundred_scale=True, asc: bool = False)¶
Recommend by random ID using vector search.
- Parameters
document_id – ID of a document
collection_name – Name of Collection
field – Vector fields to search through
approx – Used for approximate search
sum_fields – Whether to sum the multiple vectors similarity search score as 1 or separate
page_size – Size of each page of results
page – Page of the results
metric – Similarity Metric, choose from [‘cosine’, ‘l1’, ‘l2’, ‘dp’]
min_score – Minimum score for similarity metric
include_vector – Include vectors in the search results
include_count – Include count in the search results
hundred_scale – Whether to scale up the metric by 100
asc – Whether to sort the score by ascending order (default is false, for getting most similar results)
- resume_insert_documents(collection_name: str, documents: List, models: Dict[str, Callable] = {}, chunksize: int = 15, workers: int = 1, verbose: bool = False, use_bulk_encode: bool = False, show_progress_bar: bool = True)¶
Resume inserting documents
- retrieve_all_documents(collection_name: str, sort_by: List = [], asc: bool = True, include_vector: bool = True, include_fields: List = [], retrieve_chunk_size: int = 1000)¶
Retrieve all documents in a given collection. We recommend specifying the fields to extract, as otherwise this function may take a long time to run.
- Parameters
collection_name – Name of collection.
sort_by – Select the fields to sort by.
asc – If true, returns results in ascending order of the sort fields.
include_vector – If true, includes _vector_ fields to return them.
include_fields – Adjust which fields are returned.
retrieve_chunk_size – The number of documents to retrieve per request.
Example
>>> from vectorai.client import ViClient
>>> vi_client = ViClient(username, api_key, vectorai_url)
>>> all_documents = vi_client.retrieve_all_documents(collection_name)
- retrieve_and_encode(collection_name: str, models: Dict[str, Callable] = {}, chunksize: int = 15, use_bulk_encode: bool = False)¶
Retrieve all documents and re-encode with new models.
- Parameters
collection_name – Name of collection
models – Models as a dictionary
chunksize – The number of results to retrieve, encode and edit in one go
use_bulk_encode – Whether to use bulk_encode on the models.
- retrieve_documents(collection_name: str, page_size: int = 20, cursor: str = None, sort: List = [], asc: bool = True, include_vector: bool = True, include_fields: List = [])¶
Retrieve some documents
Cursor is provided to retrieve even more documents. Loop through it to retrieve all documents in the database.
- Parameters
include_fields – Fields to include in the document, if empty list [] then all is returned
cursor – Cursor to paginate the document retrieval
page_size – Size of each page of results
sort – Fields to sort the documents by
asc – Whether to sort results by ascending or descending order
include_vector – Include vectors in the search results
collection_name – Name of Collection
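Example
A minimal sketch of paginating with the cursor. This assumes the response is a dictionary containing 'documents' and 'cursor' keys; check the actual response shape of your deployment before relying on it:
>>> docs = []
>>> response = vi_client.retrieve_documents(collection_name, page_size=100, include_vector=False)
>>> while response['documents']:
...     docs.extend(response['documents'])
...     response = vi_client.retrieve_documents(collection_name, page_size=100,
...                                             cursor=response['cursor'],
...                                             include_vector=False)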
- search(collection_name: str, vector: List, field: List, approx: int = 0, sum_fields: bool = True, metric: str = 'cosine', min_score=None, page: int = 1, page_size: int = 10, include_vector: bool = False, include_count: bool = True, asc: bool = False)¶
Vector Similarity Search. Search a vector field with a vector, a.k.a Nearest Neighbors Search
Enables machine learning search with vector search. Search with a vector for the most similar vectors.
For example: Search with a person’s characteristics, who are the most similar (querying the “persons_characteristics_vector” field):
Query person's characteristics as a vector: [180, 40, 70] representing [height, age, weight]

Search Results:
[
    {"name": "Adam Levine", "persons_characteristics_vector": [180, 56, 71]},
    {"name": "Brad Pitt", "persons_characteristics_vector": [180, 56, 65]},
    ...
]
- Parameters
vector – Vector, a list/array of floats that represents a piece of data.
collection_name – Name of Collection
search_fields – Vector fields to search through
approx – Used for approximate search
sum_fields – Whether to sum the multiple vectors similarity search score as 1 or separate
page_size – Size of each page of results
page – Page of the results
metric – Similarity Metric, choose from [‘cosine’, ‘l1’, ‘l2’, ‘dp’]
min_score – Minimum score for similarity metric
include_vector – Include vectors in the search results
include_count – Include count in the search results
asc – Whether to sort the score by ascending order (default is false, for getting most similar results)
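Example
A minimal sketch, assuming a deployed text encoder and a 'product_description_vector_' field in the collection (illustrative names):
>>> from vectorai.models.deployed import ViText2Vec
>>> text_encoder = ViText2Vec(username, api_key, vectorai_url)
>>> vi_client.search(collection_name,
...                  vector=text_encoder.encode('noise cancelling headphones'),
...                  field=['product_description_vector_'],
...                  page_size=5)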
- search_audio(collection_name: str, audio, fields: List, metric: str = 'cosine', min_score=None, page: int = 1, page_size: int = 10, include_vector: bool = False, include_count: bool = True, asc: bool = False)¶
Search an audio field with audio using Vector Search. Vector similarity search with an audio directly.
Note: audio has to be stored somewhere and be provided as audio_url, a url that stores the audio.
For example: an audio_url represents sounds that a pokemon make:
"https://play.pokemonshowdown.com/audio/cries/pikachu.mp3" -> <Encode the audio to vector> -> audio vector: [0.794617772102356, 0.3581121861934662, 0.21113917231559753, 0.24878688156604767, 0.9741804003715515 ...] -> <Vector Search> -> Search Results: {...}
- Parameters
audio_url – The audio url of an audio to encode into a vector
collection_name – Name of Collection
search_fields – Vector fields to search through
approx – Used for approximate search
sum_fields – Whether to sum the similarity scores across multiple vector fields into one score or keep them separate
page_size – Size of each page of results
page – Page of the results
metric – Similarity Metric, choose from [‘cosine’, ‘l1’, ‘l2’, ‘dp’]
min_score – Minimum score for similarity metric
include_vector – Include vectors in the search results
include_count – Include count in the search results
hundred_scale – Whether to scale up the metric by 100
asc – Whether to sort the score by ascending order (default is false, for getting most similar results)
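Example
A minimal sketch; 'my_collection' and 'audio_vector_' are illustrative placeholders, and the audio URL is the one from the example above.
>>> from vectorai.client import ViClient
>>> vi_client = ViClient(username, api_key)
>>> vi_client.search_audio('my_collection', audio='https://play.pokemonshowdown.com/audio/cries/pikachu.mp3', fields=['audio_vector_'], page_size=5)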
- search_audio_by_upload(collection_name: str, audio, fields: List, metric: str = 'cosine', min_score=None, page: int = 1, page_size: int = 10, include_vector: bool = False, include_count: bool = True, asc: bool = False)¶
Search an audio field with an uploaded audio file using Vector Search. Vector similarity search with uploaded audio directly.
Note: the audio has to be sent as a base64-encoded string.
- Parameters
collection_name – Name of Collection
search_fields – Vector fields to search against
page_size – Size of each page of results
page – Page of the results
approx – Used for approximate search
sum_fields – Whether to sum the similarity scores across multiple vector fields into one score or keep them separate
metric – Similarity Metric, choose from [‘cosine’, ‘l1’, ‘l2’, ‘dp’]
min_score – Minimum score for similarity metric
include_vector – Include vectors in the search results
include_count – Include count in the search results
hundred_scale – Whether to scale up the metric by 100
audio – Audio as a local file path
asc – Whether to sort the score by ascending order (default is false, for getting most similar results)
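Example
A minimal sketch, assuming the client reads and base64-encodes the local file for you; 'my_collection', the file path and 'audio_vector_' are illustrative placeholders.
>>> from vectorai.client import ViClient
>>> vi_client = ViClient(username, api_key)
>>> vi_client.search_audio_by_upload('my_collection', audio='path/to/local_audio.mp3', fields=['audio_vector_'])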
- search_by_id(collection_name: str, document_id: str, field: str, sum_fields: bool = True, metric: str = 'cosine', min_score=0, page: int = 1, page_size: int = 10, include_vector: bool = False, include_count: bool = True, asc: bool = False, approx: int = 0, hundred_scale: bool = False)¶
Single Product Recommendations (Search by an id)
Recommendation by retrieving the vector from the specified id’s document, then performing a search with that vector.
- Parameters
document_id – ID of a document
collection_name – Name of Collection
search_field – Vector fields to search through
approx – Used for approximate search
sum_fields – Whether to sum the similarity scores across multiple vector fields into one score or keep them separate
page_size – Size of each page of results
page – Page of the results
metric – Similarity Metric, choose from [‘cosine’, ‘l1’, ‘l2’, ‘dp’]
min_score – Minimum score for similarity metric
include_vector – Include vectors in the search results
include_count – Include count in the search results
hundred_scale – Whether to scale up the metric by 100
asc – Whether to sort the score by ascending order (default is false, for getting most similar results)
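Example
A minimal sketch; 'my_collection', the document id and 'product_description_vector_' are illustrative placeholders.
>>> from vectorai.client import ViClient
>>> vi_client = ViClient(username, api_key)
>>> vi_client.search_by_id('my_collection', document_id='document_1', field='product_description_vector_', page_size=5)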
- search_by_ids(collection_name: str, document_ids: List, field: str, vector_operation: str = 'mean', sum_fields: bool = True, metric: str = 'cosine', min_score=0, page: int = 1, page_size: int = 10, include_vector: bool = False, include_count: bool = True, asc: bool = False)¶
Multi Product Recommendations (Search by ids)
Recommendation by retrieving the vectors from the documents with the specified ids, then performing a search with an aggregated vector that is the combination (depending on vector_operation) of those vectors.
- Parameters
document_ids – IDs of documents
vector_operation – Aggregation for the vectors, choose from [‘mean’, ‘sum’, ‘min’, ‘max’]
collection_name – Name of Collection
search_field – Vector fields to search through
approx – Used for approximate search
sum_fields – Whether to sum the similarity scores across multiple vector fields into one score or keep them separate
page_size – Size of each page of results
page – Page of the results
metric – Similarity Metric, choose from [‘cosine’, ‘l1’, ‘l2’, ‘dp’]
min_score – Minimum score for similarity metric
include_vector – Include vectors in the search results
include_count – Include count in the search results
hundred_scale – Whether to scale up the metric by 100
asc – Whether to sort the score by ascending order (default is false, for getting most similar results)
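Example
A minimal sketch; the collection name, document ids and vector field are illustrative placeholders.
>>> from vectorai.client import ViClient
>>> vi_client = ViClient(username, api_key)
>>> vi_client.search_by_ids('my_collection', document_ids=['document_1', 'document_2'], field='product_description_vector_', vector_operation='mean')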
- search_by_positive_negative_ids(collection_name: str, positive_document_ids: List, negative_document_ids: List, field: str, vector_operation: str = 'mean', sum_fields: bool = True, metric: str = 'cosine', min_score=0, page: int = 1, page_size: int = 10, include_vector: bool = False, include_count: bool = True, asc: bool = False)¶
Multi Product Recommendations with Likes and Dislikes (Search by ids)
Recommendation by retrieving the vectors from the documents with the specified positive and negative ids, then performing a search with an aggregated vector that is the combination (depending on vector_operation) of the positive id vectors minus the negative id vectors.
- Parameters
positive_document_ids – Positive Document IDs to get recommendations for, and the weightings of each document
negative_document_ids – Negative Document IDs to get recommendations for, and the weightings of each document
vector_operation – Aggregation for the vectors, choose from [‘mean’, ‘sum’, ‘min’, ‘max’]
collection_name – Name of Collection
search_field – Vector fields to search through
approx – Used for approximate search
sum_fields – Whether to sum the similarity scores across multiple vector fields into one score or keep them separate
page_size – Size of each page of results
page – Page of the results
metric – Similarity Metric, choose from [‘cosine’, ‘l1’, ‘l2’, ‘dp’]
min_score – Minimum score for similarity metric
include_vector – Include vectors in the search results
include_count – Include count in the search results
hundred_scale – Whether to scale up the metric by 100
asc – Whether to sort the score by ascending order (default is false, for getting most similar results)
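Example
A minimal sketch; the collection name, document ids and vector field are illustrative placeholders.
>>> from vectorai.client import ViClient
>>> vi_client = ViClient(username, api_key)
>>> vi_client.search_by_positive_negative_ids('my_collection', positive_document_ids=['liked_1', 'liked_2'], negative_document_ids=['disliked_1'], field='product_description_vector_')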
- search_image(collection_name: str, image, fields: List, metric: str = 'cosine', min_score=None, page: int = 1, page_size: int = 10, include_vector: bool = False, include_count: bool = True, asc: bool = False)¶
Search an image field with an image using Vector Search
Vector similarity search with an image directly.
Note: the image has to be stored somewhere and provided as image_url, a URL that stores the image.
For example: an image_url represents an image of a celebrity:
"https://www.celebrity_images.com/brad_pitt.png" -> <Encode the image to vector> -> image vector: [0.794617772102356, 0.3581121861934662, 0.21113917231559753, 0.24878688156604767, 0.9741804003715515 ...] -> <Vector Search> -> Search Results: {...}
- Parameters
image_url – The image url of an image to encode into a vector
collection_name – Name of Collection
search_fields – Vector fields to search through
approx – Used for approximate search
sum_fields – Whether to sum the similarity scores across multiple vector fields into one score or keep them separate
page_size – Size of each page of results
page – Page of the results
metric – Similarity Metric, choose from [‘cosine’, ‘l1’, ‘l2’, ‘dp’]
min_score – Minimum score for similarity metric
include_vector – Include vectors in the search results
include_count – Include count in the search results
hundred_scale – Whether to scale up the metric by 100
asc – Whether to sort the score by ascending order (default is false, for getting most similar results)
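Example
A minimal sketch; 'my_collection' and 'image_vector_' are illustrative placeholders, and the image URL is the one from the example above.
>>> from vectorai.client import ViClient
>>> vi_client = ViClient(username, api_key)
>>> vi_client.search_image('my_collection', image='https://www.celebrity_images.com/brad_pitt.png', fields=['image_vector_'], page_size=5)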
- search_image_by_upload(collection_name: str, image, fields: List, metric: str = 'cosine', min_score=None, page: int = 1, page_size: int = 10, include_vector=False, include_count=True, asc=False)¶
Search an image field with an uploaded image using Vector Search
Vector similarity search with an uploaded image directly.
Note: the image has to be sent as a base64-encoded string.
- Parameters
collection_name – Name of Collection
search_fields – Vector fields to search against
page_size – Size of each page of results
page – Page of the results
approx – Used for approximate search
sum_fields – Whether to sum the similarity scores across multiple vector fields into one score or keep them separate
metric – Similarity Metric, choose from [‘cosine’, ‘l1’, ‘l2’, ‘dp’]
min_score – Minimum score for similarity metric
include_vector – Include vectors in the search results
include_count – Include count in the search results
hundred_scale – Whether to scale up the metric by 100
image – Image as a local file path
asc – Whether to sort the score by ascending order (default is false, for getting most similar results)
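Example
A minimal sketch, assuming the client reads and base64-encodes the local file for you; 'my_collection', the file path and 'image_vector_' are illustrative placeholders.
>>> from vectorai.client import ViClient
>>> vi_client = ViClient(username, api_key)
>>> vi_client.search_image_by_upload('my_collection', image='path/to/local_image.png', fields=['image_vector_'])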
- search_text(collection_name: str, text, fields: List, metric: str = 'cosine', min_score=None, page: int = 1, page_size: int = 10, include_vector: bool = False, include_count: bool = True, asc: bool = False)¶
Search a text field with text using Vector Search. Vector similarity search with text directly.
For example: “product_description” represents the description of a product:
"AirPods deliver effortless, all-day audio on the go. And AirPods Pro bring Active Noise Cancellation to an in-ear headphone — with a customisable fit" -> <Encode the text to vector> -> i.e. text vector, "product_description_vector_": [0.794617772102356, 0.3581121861934662, 0.21113917231559753, 0.24878688156604767, 0.9741804003715515 ...] -> <Vector Search> -> Search Results: {...}
- Parameters
text – Text to encode into vector and vector search with
collection_name – Name of Collection
search_fields – Vector fields to search through
approx – Used for approximate search
sum_fields – Whether to sum the similarity scores across multiple vector fields into one score or keep them separate
page_size – Size of each page of results
page – Page of the results
metric – Similarity Metric, choose from [‘cosine’, ‘l1’, ‘l2’, ‘dp’]
min_score – Minimum score for similarity metric
include_vector – Include vectors in the search results
include_count – Include count in the search results
hundred_scale – Whether to scale up the metric by 100
asc – Whether to sort the score by ascending order (default is false, for getting most similar results)
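Example
A minimal sketch; 'my_collection', the query text and 'product_description_vector_' are illustrative placeholders.
>>> from vectorai.client import ViClient
>>> vi_client = ViClient(username, api_key)
>>> vi_client.search_text('my_collection', text='wireless noise-cancelling earphones', fields=['product_description_vector_'], page_size=5)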
- search_with_array(collection_name: str, array: List, array_field: str, fields: List, sum_fields: bool = True, metric: str = 'cosine', min_score=None, page: int = 1, page_size: int = 10, include_vector: bool = False, include_count: bool = True, asc: bool = False)¶
Search an array field with an array using Vector Search. Vector similarity search with an array directly.
For example: an array that represents a movie’s categories, field “movie_categories”:
["sci-fi", "thriller", "comedy"] -> <Encode the arrays to vectors> -> | sci-fi | thriller | comedy | romance | drama | |--------|----------|--------|---------|-------| | 1 | 1 | 1 | 0 | 0 | array vector: [1, 1, 1, 0, 0] -> <Vector Search> -> Search Results: {...}
- Parameters
array_field – The array field that encoding of the dictionary is trained on
array – The array to encode into vectors
collection_name – Name of Collection
search_fields – Vector fields to search through
approx – Used for approximate search
sum_fields – Whether to sum the similarity scores across multiple vector fields into one score or keep them separate
page_size – Size of each page of results
page – Page of the results
metric – Similarity Metric, choose from [‘cosine’, ‘l1’, ‘l2’, ‘dp’]
min_score – Minimum score for similarity metric
include_vector – Include vectors in the search results
include_count – Include count in the search results
hundred_scale – Whether to scale up the metric by 100
asc – Whether to sort the score by ascending order (default is false, for getting most similar results)
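Example
A minimal sketch; 'my_collection', 'movie_categories' and 'movie_categories_vector_' are illustrative placeholders for your own array field and its vector field.
>>> from vectorai.client import ViClient
>>> vi_client = ViClient(username, api_key)
>>> vi_client.search_with_array('my_collection', array=['sci-fi', 'thriller', 'comedy'], array_field='movie_categories', fields=['movie_categories_vector_'])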
- search_with_dictionary(collection_name: str, dictionary: Dict, dictionary_field: str, fields: List, sum_fields: bool = True, metric: str = 'cosine', min_score=None, page: int = 1, page_size: int = 10, include_vector: bool = False, include_count: bool = True, asc: bool = False)¶
Search a dictionary field with a dictionary using Vector Search. Vector similarity search with a dictionary directly.
For example: a dictionary that represents a person’s characteristics visiting a store, field “person_characteristics”:
{"height":180, "age":40, "weight":70} -> <Encode the dictionary to vector> -> | height | age | weight | purchases | visits | |--------|-----|--------|-----------|--------| | 180 | 40 | 70 | 0 | 0 | dictionary vector: [180, 40, 70, 0, 0] -> <Vector Search> -> Search Results: {...}
- Parameters
collection_name – Name of Collection
search_fields – Vector fields to search against
page_size – Size of each page of results
page – Page of the results
approx – Used for approximate search
sum_fields – Whether to sum the similarity scores across multiple vector fields into one score or keep them separate
metric – Similarity Metric, choose from [‘cosine’, ‘l1’, ‘l2’, ‘dp’]
min_score – Minimum score for similarity metric
include_vector – Include vectors in the search results
include_count – Include count in the search results
hundred_scale – Whether to scale up the metric by 100
dictionary – A dictionary to encode into vectors
dictionary_field – The dictionary field that encoding of the dictionary is trained on
asc – Whether to sort the score by ascending order (default is false, for getting most similar results)
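Example
A minimal sketch; 'my_collection', 'person_characteristics' and 'person_characteristics_vector_' are illustrative placeholders for your own dictionary field and its vector field.
>>> from vectorai.client import ViClient
>>> vi_client = ViClient(username, api_key)
>>> vi_client.search_with_dictionary('my_collection', dictionary={'height': 180, 'age': 40, 'weight': 70}, dictionary_field='person_characteristics', fields=['person_characteristics_vector_'])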
- search_with_positive_negative_ids_as_history(collection_name: str, vector: List, positive_document_ids: List, negative_document_ids: List, field: str, vector_operation: str = 'mean', sum_fields: bool = True, metric: str = 'cosine', min_score=0, page: int = 1, page_size: int = 10, include_vector: bool = False, include_count: bool = True, asc: bool = False)¶
Multi Product Recommendations with Likes and Dislikes (Search by ids)
Search by retrieving the vectors from the documents with the specified positive and negative ids, then performing a search with the search query vector and an aggregated vector that is the combination (depending on vector_operation) of the positive id vectors minus the negative id vectors.
- Parameters
vector – Vector, a list/array of floats that represents a piece of data.
positive_document_ids – Positive Document IDs to get recommendations for, and the weightings of each document
negative_document_ids – Negative Document IDs to get recommendations for, and the weightings of each document
vector_operation – Aggregation for the vectors, choose from [‘mean’, ‘sum’, ‘min’, ‘max’]
collection_name – Name of Collection
search_field – Vector fields to search through
approx – Used for approximate search
sum_fields – Whether to sum the similarity scores across multiple vector fields into one score or keep them separate
page_size – Size of each page of results
page – Page of the results
metric – Similarity Metric, choose from [‘cosine’, ‘l1’, ‘l2’, ‘dp’]
min_score – Minimum score for similarity metric
include_vector – Include vectors in the search results
include_count – Include count in the search results
hundred_scale – Whether to scale up the metric by 100
asc – Whether to sort the score by ascending order (default is false, for getting most similar results)
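Example
A minimal sketch; the collection name, query vector, document ids and vector field are illustrative placeholders.
>>> from vectorai.client import ViClient
>>> vi_client = ViClient(username, api_key)
>>> vi_client.search_with_positive_negative_ids_as_history('my_collection', vector=[0.1, 0.2, 0.3], positive_document_ids=['liked_1'], negative_document_ids=['disliked_1'], field='product_description_vector_')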
- wait_till_jobs_complete(collection_name: str, job_id: str, job_name: str)¶
Wait until a specific job is complete.
- Parameters
collection_name – Name of collection.
job_id – ID of the job.
job_name – Name of the job.
Example
>>> from vectorai.client import ViClient
>>> vi_client = ViClient(username, api_key, vectorai_url)
>>> job = vi_client.dimensionality_reduction_job('nba_season_per_36_stats_demo', vector_field='season_vector_', n_components=2)
>>> vi_client.wait_till_jobs_complete('nba_season_per_36_stats_demo', **job)