@cheriL cheriL commented Nov 8, 2024

To fix: #5061

@dosubot dosubot bot added the size:M This PR changes 30-99 lines, ignoring generated files. label Nov 8, 2024
@cheriL
Author

cheriL commented Nov 12, 2024

@liunux4odoo PTAL

@ZamboLin

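This looks odd to me.
Tracing similarity_search_with_relevance_scores down, it ends up calling similarity_search_with_score, and similarity_search_with_score raises NotImplementedError, yet the code still runs?
Also, the computed vectors do not appear to be normalized.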

@cheriL
Author

cheriL commented Dec 12, 2024

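This looks odd to me. Tracing similarity_search_with_relevance_scores down, it ends up calling similarity_search_with_score, and similarity_search_with_score raises NotImplementedError, yet the code still runs? Also, the computed vectors do not appear to be normalized.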

The similarity_search_with_score() method is implemented in each vector store subclass; the base class only declares it. Normalization is done by calling relevance_score_fn.
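The dispatch described above can be sketched in plain Python. This is a toy model rather than langchain's actual code: the class names `BaseStore` and `ToyMilvusStore` and the stored distances are hypothetical, and only the method names and the `1 - l2_distance / 4.0` mapping are taken from this thread.

```python
class BaseStore:
    """Toy stand-in for a vector store base class (hypothetical)."""

    def similarity_search_with_score(self, query, k=4):
        # The base class only declares the method; subclasses override it.
        raise NotImplementedError

    def _select_relevance_score_fn(self):
        raise NotImplementedError

    def similarity_search_with_relevance_scores(self, query, k=4):
        # Calls whichever override the concrete subclass provides, then
        # maps each raw score through the relevance function.
        score_fn = self._select_relevance_score_fn()
        docs_and_scores = self.similarity_search_with_score(query, k=k)
        return [(doc, score_fn(score)) for doc, score in docs_and_scores]


class ToyMilvusStore(BaseStore):
    """Toy subclass holding precomputed (doc, distance) pairs (hypothetical)."""

    def __init__(self, docs_and_distances):
        self._rows = docs_and_distances

    def similarity_search_with_score(self, query, k=4):
        # Smallest distance first, as a distance-based search would return.
        return sorted(self._rows, key=lambda row: row[1])[:k]

    def _select_relevance_score_fn(self):
        # Mapping quoted in this thread: distance 0 -> 1.0, distance 4 -> 0.0.
        return lambda l2_distance: 1 - l2_distance / 4.0


store = ToyMilvusStore([("doc-b", 2.0), ("doc-a", 1.0)])
results = store.similarity_search_with_relevance_scores("any query")
print(results)
```

Calling the method on the subclass never reaches the base `NotImplementedError`, which is why reading only the base class is misleading.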

@ZamboLin

The similarity_search_with_score() method is implemented in each vector store subclass; the base class only declares it. Normalization is done by calling relevance_score_fn.

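Yes, I saw that the score normalization is handled by the function chosen in _select_relevance_score_fn, as 1 - l2_distance / 4.0, so that part should be fine.
I then printed l2_distance and the results looked odd. I suspected that some step was skipping normalization, but after digging for a long time all I found was the NotImplementedError. Is my langchain version wrong?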
{
"l2_distance": 566.509521484375
}
{
"l2_distance": 617.7225341796875
}
{
"l2_distance": 642.596923828125
}
{
"l2_distance": 566.509521484375
}
{
"l2_distance": 617.7225341796875
}
{
"l2_distance": 642.596923828125
}

langchain                 0.1.17
langchain-chatchat        0.3.1    /home/admin123/LLM4/chatchat
langchain-community       0.0.36
langchain-core            0.1.53
langchain-experimental    0.0.58
langchain-milvus          0.1.7
langchain-openai          0.0.6
langchain-text-splitters  0.0.2
langchainhub              0.1.14

@cheriL
Author

cheriL commented Dec 13, 2024

@ZamboLin Your langchain version is fairly old. Mine are:

langchain                0.2.16
langchain-community      0.2.16
langchain-core           0.2.39

@ZamboLin

langchain                0.2.16
langchain-community      0.2.16
langchain-core           0.2.39

It feels slightly better after updating, but working backward from the formula, l2_distance still comes out at around 600:
护眼罩/戴防护面具。'), -140.55015563964844)]
docs_and_similarities = self.vectorstore.similarity_search_with_relevance_scores(query, **self.search_kwargs)
2024-12-13 15:33:50,801 langchain_core.vectorstores.base 18401 WARNING No relevant docs were retrieved using the relevance score threshold 0.5
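The back-calculation mentioned above can be reproduced with a few lines of arithmetic. This is a sketch that assumes the relevance score is exactly `1 - l2_distance / 4.0`, as quoted earlier in the thread; the helper names `l2_to_relevance` and `relevance_to_l2` are hypothetical.

```python
def l2_to_relevance(l2_distance):
    # Mapping quoted in this thread.
    return 1 - l2_distance / 4.0


def relevance_to_l2(relevance):
    # Algebraic inverse of the mapping above.
    return 4.0 * (1 - relevance)


logged_relevance = -140.55015563964844   # score from the output above
implied_distance = relevance_to_l2(logged_relevance)
print(implied_distance)                  # roughly 566.2

threshold = 0.5                          # threshold from the warning above
max_distance = relevance_to_l2(threshold)
print(max_distance)                      # 2.0

# A distance in the hundreds can never pass a 0.5 threshold.
print(l2_to_relevance(implied_distance) >= threshold)  # False
```

Under this mapping only distances of 2.0 or less clear a 0.5 threshold, so distances near 600 explain the "No relevant docs were retrieved" warning.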

Following the vector store subclass, I did find where the actual computation happens:
res = self.col.search(
    data=[embedding],
    anns_field=self._vector_field,
    param=param,
    limit=k,
    expr=expr,
    output_fields=output_fields,
    timeout=timeout,
    **kwargs,
)
I will start investigating from here.

@ZamboLin

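Found the problem. When Milvus computes distances through col.search, neither the query embedding nor the vectors stored in the collection are normalized, unless you normalize them yourself. 😑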
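A minimal sketch of the manual normalization mentioned above, using only the standard library. It assumes the reported distance is the squared Euclidean distance, which is what the 0 to 4 range of the `1 - l2_distance / 4.0` mapping implies for unit vectors. The helper names and the sample vectors are hypothetical, and the same normalization would have to be applied both when inserting vectors and when querying.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length."""
    norm = math.sqrt(sum(x * x for x in vec))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in vec]


def squared_l2(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def l2_to_relevance(l2_distance):
    # Mapping quoted in this thread.
    return 1 - l2_distance / 4.0


# Hypothetical unnormalized embeddings.
query = [12.0, -5.0, 9.0]
stored = [-10.0, 7.0, 20.0]

raw_distance = squared_l2(query, stored)
unit_distance = squared_l2(l2_normalize(query), l2_normalize(stored))

print(raw_distance, l2_to_relevance(raw_distance))    # far outside [0, 1]
print(unit_distance, l2_to_relevance(unit_distance))  # within [0, 1]
```

For unit vectors the squared distance is bounded by 4, so the mapped relevance score stays between 0 and 1 and a threshold such as 0.5 becomes meaningful.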

@azhe1234

langchain

Hello, did you manage to resolve the poor retrieval quality with Milvus?

Development

Successfully merging this pull request may close these issues.

[BUG] Incorrect logic when querying with the Milvus vector store

3 participants