feat: nvidia triton embedding integration #19226
Conversation
...embeddings/llama-index-embeddings-nvidia-triton/llama_index/embeddings/nvidia_triton/base.py
This PR is stale because it has been open 50 days with no activity. Remove stale label or comment or this will be closed in 10 days.
I believe the asynchronous embedding retrieval is now well implemented and the integration is ready to be merged.
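For context, asynchronous retrieval in LlamaIndex embedding integrations is usually exercised through the `aget_text_embedding` family of methods inherited from `BaseEmbedding`. The following is a minimal sketch of how that might look for this integration; the exported class name `NvidiaTritonEmbedding` and the constructor arguments `server_url` and `model_name` are assumptions inferred from the package path referenced above, not confirmed by this PR:

```python
import asyncio

# Assumed import path, based on the module referenced in this PR's review
# (llama_index/embeddings/nvidia_triton/base.py); the exported class name
# "NvidiaTritonEmbedding" is a guess.
from llama_index.embeddings.nvidia_triton import NvidiaTritonEmbedding


async def main() -> None:
    embed_model = NvidiaTritonEmbedding(
        server_url="localhost:8001",      # assumed address of the Triton server
        model_name="my_embedding_model",  # assumed name of the deployed model
    )
    # aget_text_embedding is part of LlamaIndex's standard async embedding API.
    vector = await embed_model.aget_text_embedding("Hello, Triton!")
    print(len(vector))


asyncio.run(main())
```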
This PR is stale because it has been open 50 days with no activity. Remove stale label or comment or this will be closed in 10 days.
This PR was closed because it has been stalled for 10 days with no activity. |
Description
This integration allows LlamaIndex to use embedding models hosted on a Triton Inference Server. It uses `tritonclient` to communicate with the server.
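In rough terms, the integration would be used like any other LlamaIndex embedding class. Here is a minimal synchronous sketch; the class name `NvidiaTritonEmbedding` and its constructor arguments are assumptions inferred from the package path, not confirmed by this PR:

```python
# Minimal usage sketch; class name, export, and constructor arguments
# are assumptions, not the confirmed API of this package.
from llama_index.embeddings.nvidia_triton import NvidiaTritonEmbedding

embed_model = NvidiaTritonEmbedding(
    server_url="localhost:8001",      # assumed address of the Triton server
    model_name="my_embedding_model",  # assumed name of the model deployed on Triton
)

# get_text_embedding is the standard synchronous entry point on
# LlamaIndex embedding classes.
vector = embed_model.get_text_embedding("Hello, Triton!")
print(len(vector))
```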
New Package?
Did I fill in the `tool.llamahub` section in the `pyproject.toml` and provide a detailed README.md for my new integration or package?
Version Bump?
Did I bump the version in the `pyproject.toml` file of the package I am updating? (Except for the `llama-index-core` package)
Type of Change
Please delete options that are not relevant.
How Has This Been Tested?
Your pull request will likely not be merged unless it is covered by some form of impactful unit testing.
Suggested Checklist:
I ran `uv run make format; uv run make lint` to appease the lint gods