Replies: 2 comments 14 replies
-
Hello, @silwork99! I'm here to help you with your issue. Let's figure this out together! To get the similarity scores for each document in your dataset using Chroma, Faiss, or Pinecone, you can use the respective methods that return documents along with their similarity scores. Here are examples for each:

**Using Chroma**

```python
from langchain_chroma import Chroma
from langchain_openai import OpenAIEmbeddings

# Initialize embeddings and vector store
embeddings = OpenAIEmbeddings()
vectorstore = Chroma("langchain_store", embeddings)

# Define your query embedding
query_embedding = [0.1, 0.2, 0.3, 0.4]  # Example embedding

# Get similarity scores for each document
results = vectorstore.similarity_search_by_vector_with_relevance_scores(
    embedding=query_embedding,
    k=2500,  # Number of results to return
)

# Print the results
for doc, score in results:
    print(f"Document: {doc.page_content}, Score: {score}")
```

**Using Faiss**

```python
from langchain_community.embeddings.openai import OpenAIEmbeddings
from langchain_community.vectorstores import FAISS

# Initialize embeddings and vector store
embeddings = OpenAIEmbeddings()
texts = ["Document 1 content", "Document 2 content", "Document 3 content"]
faiss_store = FAISS.from_texts(texts, embeddings)

# Define your query embedding
query_embedding = [0.1, 0.2, 0.3, 0.4]  # Example embedding

# Get similarity scores for each document
results = faiss_store.similarity_search_with_score_by_vector(
    embedding=query_embedding,
    k=2500,  # Number of results to return
)

# Print the results
for doc, score in results:
    print(f"Document: {doc.page_content}, Score: {score}")
```

**Using Pinecone**

```python
from langchain_pinecone import PineconeVectorStore
from langchain_openai import OpenAIEmbeddings

# Initialize embeddings and vector store
embeddings = OpenAIEmbeddings()
index_name = "my-index"
namespace = "my-namespace"
vectorstore = PineconeVectorStore(
    index_name=index_name,
    embedding=embeddings,
    namespace=namespace,
)

# Define your query embedding
query_embedding = [0.1, 0.2, 0.3, 0.4]  # Example embedding

# Get similarity scores for each document
results = vectorstore.similarity_search_by_vector_with_score(
    embedding=query_embedding,
    k=2500,  # Number of results to return
)

# Print the results
for doc, score in results:
    print(f"Document: {doc.page_content}, Score: {score}")
```

These methods will return the documents along with their similarity scores, allowing you to see the relevance of each document in your dataset [1][2][3].
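As a complement to the library-specific calls above, it may help to see what these methods compute under the hood: a similarity search scores the query embedding against every stored embedding and then truncates to the top `k`, so asking for `k` equal to the collection size yields a score for every document. A minimal, self-contained sketch (toy embeddings and document ids invented for illustration, cosine similarity as the metric, no vector store involved):

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Toy corpus: document id -> (text, embedding). These embeddings are
# made up for illustration; a real pipeline would use an embedding model.
corpus = {
    "doc1": ("Document 1 content", [0.1, 0.2, 0.3, 0.4]),
    "doc2": ("Document 2 content", [0.4, 0.3, 0.2, 0.1]),
    "doc3": ("Document 3 content", [0.1, 0.2, 0.3, 0.5]),
}

query_embedding = [0.1, 0.2, 0.3, 0.4]

# Score *every* document, then sort by similarity (highest first).
scores = sorted(
    ((doc_id, cosine_similarity(query_embedding, emb))
     for doc_id, (_, emb) in corpus.items()),
    key=lambda pair: pair[1],
    reverse=True,
)

for doc_id, score in scores:
    print(f"{doc_id}: {score:.4f}")
```

With a real store, the collection size usually doesn't have to be hard-coded: FAISS exposes it as `faiss_store.index.ntotal`, which can be passed as `k` to cover the whole index.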
-
@dosu
-
Checked other resources
Commit to Help
Example Code
Description
Hello, I have built a vector store of more than 2500 documents and I want to calculate the similarity score of a query with all the documents in the data, not only the most relevant ones.
I have used Chroma, Faiss, and Pinecone, and all of them return only 4 or 5 scores at most, for the most relevant data.
I have tried to set k = 2500+, which is the size of the data, but nothing has changed.
What should I do to get the scores of each document in the data?
Thank you.
System Info
```shell
python -m langchain_core.sys_info
```