Creating Dense And Sparse Vectors
Since a hybrid index is a combination of a dense and a sparse index, you can create its vectors with the same methods you use for dense and sparse indexes and combine them. Upstash lets you upsert and query dense and sparse vectors directly, which gives you full control over the embedding models you use. To make embedding easier, Upstash also provides hosted models and lets you upsert and query text data; behind the scenes, the text is converted into dense and sparse vectors. To use this feature, create your index with both a dense and a sparse embedding model.
Using Hybrid Indexes
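A hybrid index works with the two kinds of vectors described above. A dense vector stores a value for every dimension. A sparse vector stores only its non-zero entries, as parallel lists of indices and values.

The minimal sketch below is plain Python. `SparseVec`, `to_dense`, and `sparse_dot` are hypothetical local helpers, not part of any Upstash SDK. The sketch expands a sparse vector into its dense equivalent and computes an inner product that only touches shared indices. Using the inner product as the sparse similarity is an assumption made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SparseVec:
    """Hypothetical local type: only the non-zero entries, as parallel lists."""

    indices: list[int]
    values: list[float]

    def __post_init__(self) -> None:
        # Each index needs exactly one value, and an index may appear only once.
        if len(self.indices) != len(self.values):
            raise ValueError("indices and values must have the same length")
        if len(set(self.indices)) != len(self.indices):
            raise ValueError("indices must be unique")


def to_dense(sparse: SparseVec, dim: int) -> list[float]:
    """Expand a sparse vector into a dense list of length `dim` (zeros elsewhere)."""
    dense = [0.0] * dim
    for i, v in zip(sparse.indices, sparse.values):
        dense[i] = v
    return dense


def sparse_dot(a: SparseVec, b: SparseVec) -> float:
    """Inner product that only visits indices present in both vectors (assumed metric)."""
    b_values = dict(zip(b.indices, b.values))
    return sum((v * b_values[i] for i, v in zip(a.indices, a.values) if i in b_values), 0.0)


# Sparse vectors over a large (hypothetical) vocabulary: few non-zero entries each.
doc = SparseVec(indices=[3, 1042], values=[0.5, 1.2])
query = SparseVec(indices=[1042, 77], values=[2.0, 0.4])

print(to_dense(SparseVec(indices=[0, 2], values=[1.5, 3.0]), dim=4))  # [1.5, 0.0, 3.0, 0.0]
print(sparse_dot(doc, query))  # 2.4 (only index 1042 is shared: 1.2 * 2.0)
```

Storing only non-zero entries keeps sparse vectors cheap even when the vocabulary has tens of thousands of dimensions.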
Upserting Dense and Sparse Vectors
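You can upsert dense and sparse vectors into Upstash Vector indexes in two different ways. You can provide the vectors directly. Alternatively, if the index uses Upstash-hosted embedding models, you can upsert text data that Upstash embeds for you.
Upserting Dense and Sparse Vectors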
You can upsert dense and sparse vectors into the index by passing the dense vector and the sparse vector together for each record.
Upserting Text Data
If you created the hybrid index with Upstash-hosted dense and sparse embedding models, you can upsert text data, and Upstash embeds it into dense and sparse vectors behind the scenes.
Querying Dense and Sparse Vectors
As with upserts, you can query a hybrid index in two different ways: with dense and sparse vectors, or with text data.
Querying with Dense and Sparse Vectors
Hybrid indexes can be queried by providing dense and sparse vectors.
Querying with Text Data
If you created the hybrid index with Upstash-hosted dense and sparse embedding models, you can query with text data. Upstash embeds it behind the scenes before running the actual query.
Fusing Dense And Sparse Query Scores
One of the most important steps in a hybrid search pipeline is fusing, or reranking, the dense and sparse search results into a single list. By default, Upstash returns hybrid query results that have been fused this way. It offers two fusion algorithms to choose from: Reciprocal Rank Fusion (RRF) and Distribution-Based Score Fusion (DBSF).
Reciprocal Rank Fusion
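RRF is a method for combining results from dense and sparse indexes. It focuses on the order of results, not their scores. Each result's score is mapped using the formula:
$$\text{Mapped Score} = \frac{1}{\text{rank} + K}$$
where rank is the position of the result in the dense or the sparse result list, and K is a constant set to 60.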
If a result appears in both the dense and the sparse results, its mapped scores are added together. If it appears in only one of them, its mapped score is used unchanged. After all scores are processed, the results are sorted by their combined scores, and the requested number of top results is returned. Because RRF relies only on rank positions, it can combine dense and sparse results whose raw scores are on different scales without normalizing them, which keeps the process simple.
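The fusion steps above can be sketched in a few lines of plain Python. This is a minimal illustration, not Upstash's implementation. `rrf_fuse` is a hypothetical helper that takes two lists of ids, each ordered best match first. It also assumes ranks start at 1, which the text above does not specify.

```python
from collections import defaultdict

K = 60  # the RRF constant described above


def rrf_fuse(dense_ids: list[str], sparse_ids: list[str], top_k: int) -> list[tuple[str, float]]:
    """Fuse two ranked lists of ids (best match first) with Reciprocal Rank Fusion."""
    fused: dict[str, float] = defaultdict(float)
    for ranking in (dense_ids, sparse_ids):
        # Only the position matters, not the raw score; ranks start at 1 here (assumption).
        for rank, doc_id in enumerate(ranking, start=1):
            fused[doc_id] += 1.0 / (rank + K)
    # An id found in both lists has accumulated two mapped scores; one found once keeps one.
    ordered = sorted(fused.items(), key=lambda item: item[1], reverse=True)
    return ordered[:top_k]


dense_results = ["a", "b", "c"]   # ids ordered by dense similarity
sparse_results = ["c", "a", "d"]  # ids ordered by sparse similarity

print([doc_id for doc_id, _ in rrf_fuse(dense_results, sparse_results, top_k=3)])  # ['a', 'c', 'b']
```

Both "a" and "c" appear in both lists, so they outrank "b", which appears only in the dense results. "a" edges out "c" because its worst rank (2) is better than the worst rank of "c" (3).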
By default, hybrid indexes use RRF to fuse dense and sparse scores. You can also set it explicitly for a query by specifying the fusion algorithm in the query request.
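As a hedged, dependency-free sketch, the code below sets the fusion algorithm explicitly on a hybrid query sent to the Upstash Vector REST API. It uses only Python's standard library. Several details reflect our understanding of the REST API rather than anything stated in this section, and should be checked against its reference:

- the `/query` path;
- the body fields (`vector`, `sparseVector` with `indices`/`values`, `topK`, `fusionAlgorithm` with values `RRF` or `DBSF`);
- the `UPSTASH_VECTOR_REST_URL` and `UPSTASH_VECTOR_REST_TOKEN` environment variables.

`build_hybrid_query` and `send_query` are hypothetical helpers.

```python
import json
import os
import urllib.request


def build_hybrid_query(dense: list[float], sparse_indices: list[int], sparse_values: list[float],
                       top_k: int = 5, fusion_algorithm: str = "RRF") -> dict:
    """Hypothetical helper: JSON body for a hybrid query with an explicit fusion algorithm."""
    if fusion_algorithm not in ("RRF", "DBSF"):
        raise ValueError("fusion_algorithm must be 'RRF' or 'DBSF'")
    # Field names are assumptions based on the REST API docs; verify before use.
    return {
        "vector": dense,
        "sparseVector": {"indices": sparse_indices, "values": sparse_values},
        "topK": top_k,
        "fusionAlgorithm": fusion_algorithm,  # set explicitly, even though RRF is the default
    }


def send_query(body: dict) -> object:
    """Hypothetical helper: POST the body to the index's /query endpoint (needs credentials)."""
    url = os.environ["UPSTASH_VECTOR_REST_URL"].rstrip("/") + "/query"
    token = os.environ["UPSTASH_VECTOR_REST_TOKEN"]
    request = urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={"Authorization": f"Bearer {token}", "Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read())


if __name__ == "__main__":
    body = build_hybrid_query([0.5, 0.4], [3, 5], [0.3, 0.5], fusion_algorithm="RRF")
    print(json.dumps(body))
    # Only contact the index when credentials are configured.
    if "UPSTASH_VECTOR_REST_URL" in os.environ and "UPSTASH_VECTOR_REST_TOKEN" in os.environ:
        print(send_query(body))
```

Switching the same query to DBSF only changes the `fusion_algorithm` argument.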
Distribution-Based Score Fusion
DBSF is a method for combining results from dense and sparse indexes by considering the distribution of their scores. Each score is normalized using the formula:
$$\text{Normalized Score} = \frac{s - (\mu - 3\sigma)}{(\mu + 3\sigma) - (\mu - 3\sigma)}$$
where:
s is the score.
μ is the mean of the scores.
σ is the standard deviation of the scores.
(μ − 3σ) represents the minimum value (the lower tail of the distribution).
(μ + 3σ) represents the maximum value (the upper tail of the distribution).
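The normalization formula above can be sketched in plain Python as a minimal illustration, not Upstash's implementation. The sketch makes several assumptions that this section does not state:

- μ and σ are computed separately for the dense and the sparse result lists.
- σ is the population standard deviation.
- The normalized scores of a result that appears in both lists are summed, as with RRF.
- A list whose scores are all equal maps to 0.5.
- Scores are not clipped to [0, 1].

`dbsf_normalize` and `dbsf_fuse` are hypothetical helpers.

```python
from collections import defaultdict
from statistics import mean, pstdev


def dbsf_normalize(scores: dict[str, float]) -> dict[str, float]:
    """Normalize one result list so that mu - 3*sigma maps to 0 and mu + 3*sigma maps to 1."""
    if not scores:
        return {}
    values = list(scores.values())
    mu = mean(values)
    sigma = pstdev(values)  # population standard deviation (assumption)
    low, high = mu - 3 * sigma, mu + 3 * sigma
    if high == low:
        # All scores are equal; place them mid-range instead of dividing by zero (assumption).
        return {doc_id: 0.5 for doc_id in scores}
    # No clipping: a score beyond mu +/- 3*sigma would land outside [0, 1].
    return {doc_id: (s - low) / (high - low) for doc_id, s in scores.items()}


def dbsf_fuse(dense: dict[str, float], sparse: dict[str, float], top_k: int) -> list[tuple[str, float]]:
    """Normalize the dense and sparse lists separately, then sum per id (assumed step)."""
    fused: dict[str, float] = defaultdict(float)
    for result_list in (dense, sparse):
        for doc_id, normalized in dbsf_normalize(result_list).items():
            fused[doc_id] += normalized
    return sorted(fused.items(), key=lambda item: item[1], reverse=True)[:top_k]


# Raw scores on very different scales (hypothetical values).
dense_scores = {"a": 0.9, "b": 0.85, "c": 0.6}
sparse_scores = {"c": 12.0, "a": 9.0, "d": 2.0}

print([doc_id for doc_id, _ in dbsf_fuse(dense_scores, sparse_scores, top_k=3)])  # ['a', 'c', 'b']
```

Unlike RRF, DBSF keeps information about how far apart the raw scores are within each list, while normalization stops the sparse scores (here around 10) from overwhelming the dense ones (below 1).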