Simple Semantic Search
In this tutorial, we’ll demonstrate how to use Upstash Vector for semantic search. We will upload several documents and perform a search query to find the most semantically similar documents using embeddings generated automatically by Upstash.
Installation and Setup
First, we need to create a Vector Index in the Upstash Console. Once we have our index, we will copy the UPSTASH_VECTOR_REST_URL
and UPSTASH_VECTOR_REST_TOKEN
and paste them to our .env
file. To learn more about index creation, you can check out this page.
Add the following content to your .env
file (replace with your actual URL and token):
UPSTASH_VECTOR_REST_URL=your_upstash_url
UPSTASH_VECTOR_REST_TOKEN=your_upstash_token
We now need to install the upstash-vector
library via PyPI. Additionally, we will install python-dotenv
to load environment variables from the .env
file.
pip install upstash-vector python-dotenv
Code
Create a Python script (e.g., main.py
) and add the following code to perform semantic search using Upstash Vector:
from upstash_vector import Index
from dotenv import load_dotenv
import time
# Load environment variables from a .env file
load_dotenv()
# Initialize the index from environment variables (URL and token)
index = Index.from_env()
# Example documents to be indexed
documents = [
{"id": "1", "text": "Python is a popular programming language."},
{"id": "2", "text": "Machine learning enables computers to learn from data."},
{"id": "3", "text": "Upstash provides low-latency database solutions."},
{"id": "4", "text": "Semantic search is a technique for understanding the meaning of queries."},
{"id": "5", "text": "Cloud computing allows for scalable and flexible resource management."}
]
# Reset the index to remove previous data
index.reset()
# Upsert documents into Upstash (embeddings are generated automatically)
for doc in documents:
index.upsert(
vectors=[
(doc["id"], doc["text"], {"text": doc["text"]})
]
)
print(f"Document {doc['id']} inserted.")
# Wait for the documents to be indexed
time.sleep(1)
# Search for documents similar to the query
query = "What is Python?"
results = index.query(data=query, top_k=3, include_metadata=True)
# Display search results
print("Search Results:")
for result in results:
print(f"ID: {result.id}")
print(f"Score: {result.score:.4f}")
print(f"Metadata: {result.metadata}")
print("-" * 40) # Separator line between results
Running the Code
To run the code, execute the following command in your terminal:
python main.py
Here is an example output for the search query “What is Python?“:
Document 1 inserted.
Document 2 inserted.
Document 3 inserted.
Document 4 inserted.
Document 5 inserted.
Search Results:
ID: 1
Score: 0.9080
Metadata: {'text': 'Python is a popular programming language.'}
----------------------------------------
ID: 2
Score: 0.7592
Metadata: {'text': 'Machine learning enables computers to learn from data.'}
----------------------------------------
ID: 4
Score: 0.7388
Metadata: {'text': 'Semantic search is a technique for understanding the meaning of queries.'}
----------------------------------------
Code Breakdown
-
Environment Setup: We use
python-dotenv
to load our environment variables and use theIndex.from_env()
method to initialize the index client. -
Document Insertion: We define a list of documents, each with a unique ID and text content. The
upsert()
function inserts these documents into our index. These documents are automatically converted into embeddings. To learn more about Upstash Embedding Models, you can check out this page. -
Index Reset: Before inserting documents, the
reset()
function clears any existing data in the index. -
Search Query: After inserting the documents, we perform semantic search. The
query()
function returns thetop_k
most similar documents to the query along with their metadata ifinclude_metadata
is set toTrue
.
Was this page helpful?