feat: add colpali retriever #25
base: development
Conversation
… select k chunks from top pages
|         return self._run_model(inputs)
|
|     def calculate_images_embeddings(self, images: List[str]) -> List[Tensor]:
images: List[str] - is it actually str, or a PIL image?
Yes, it is a list of PIL images; corrected that.
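To make the agreed signature concrete, here is a minimal sketch assuming a colpali_engine-style processor/model API; the wrapper class and helper bodies are illustrative, not the PR's actual code:

```python
from typing import List

import torch
from PIL import Image
from torch import Tensor


class ColpaliEmbedder:  # illustrative wrapper, not the PR's actual class
    def __init__(self, model, processor):
        self.model = model          # e.g. a colpali_engine ColPali model
        self.processor = processor  # e.g. a colpali_engine ColPaliProcessor

    def _run_model(self, inputs) -> List[Tensor]:
        # Forward pass; split the batch into one multi-vector embedding per image.
        with torch.no_grad():
            embeddings = self.model(**inputs)
        return list(embeddings.unbind(0))

    def calculate_images_embeddings(self, images: List[Image.Image]) -> List[Tensor]:
        """PIL images (not paths) go through the processor and then the model."""
        batch = self.processor.process_images(images).to(self.model.device)
        return self._run_model(batch)
```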
| colpali_index_config is not None
| and colpali_index_config.enabled
| and model_resource_config is not None
| ):
If one of these fields is not set, we will get a created instance of ColpaliModelResource which is not functional.
We then have to call _check_model_processor_device_is_set every time we need to use it, because it can be in two states: a valid one and an invalid one.
The code will be easier to read and use if ColpaliModelResource could only have a valid state. I.e. if ColpaliModelResourceConfig is not set, we can simply not create an instance of ColpaliModelResource at all (this can be done in app.py). And if ColpaliModelResourceConfig is set, we should be able to assume the resource is in a valid state to use when needed (otherwise the constructor should throw an exception if something goes wrong).
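A rough sketch of the suggested pattern, with a simplified stand-in config (the field names are assumptions, not the PR's actual ones): the resource is either fully constructed and valid, or not constructed at all.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ColpaliModelResourceConfig:  # simplified stand-in for the PR's config
    model_name: Optional[str] = None


class ColpaliModelResource:
    """Always valid once constructed; construction fails loudly otherwise."""

    def __init__(self, config: ColpaliModelResourceConfig):
        if not config.model_name:
            raise ValueError("ColPali model name must be set")
        self.model_name = config.model_name
        # Real code would eagerly load the model and processor here, so any
        # loading error surfaces at startup, not at request time.


def create_colpali_resource(
    config: Optional[ColpaliModelResourceConfig],
) -> Optional[ColpaliModelResource]:
    # Done once in app.py: no config -> no resource; config set -> usable resource.
    return None if config is None else ColpaliModelResource(config)
```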
Removed the invalid state and simplified the logic: ColpaliModelResource itself can now be either None or valid, depending on ColpaliModelResourceConfig, which is set from the app config on start.
| # if both are set then we can load model
| if (
|     colpali_index_config is not None
|     and colpali_index_config.enabled
The colpali_index_config and colpali_index_config.enabled values come from the part of the config which can be overridden on a per-request basis: https://github.com/epam/ai-dial-rag/pull/25/files#diff-50338faca97bed0430a4330eb9445199dd51a3c636bf95bee711680786b234f4R211
I.e. if the default value for colpali_index_config.enabled is False and someone enables it in a particular request, the request will fail, because the ColpaliModelResourceConfig will not be in a valid, usable state.
Indeed. I have now removed the dependency on it from colpali_model_resource; it depends only on ColpaliModelResourceConfig and always creates the model, so its fields cannot be None anymore. Instead, ColpaliModelResource itself can be None.
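For illustration, the resulting gating could look roughly like this (the attribute names are assumptions based on the discussion, not the PR's exact fields):

```python
def colpali_enabled_for_request(request_config, colpali_model_resource) -> bool:
    # The model is loaded once at startup from ColpaliModelResourceConfig, so a
    # per-request override of colpali_index.enabled only decides whether the
    # already-loaded resource is used; it can no longer hit a half-initialized model.
    return (
        colpali_model_resource is not None
        and request_config.colpali_index is not None
        and request_config.colpali_index.enabled
    )
```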
| """ | ||
| # Patch create_retriever to use only ColPali retriever | ||
| with patch( | ||
| "aidial_rag.retrieval_chain.create_retriever", new=mock_create_retriever |
I do not understand why we need to mock create_retriever here. The run_e2e_test function already replaces the ColpaliModelResource with CachedColpaliModelResource. Shouldn't it be enough to get the responses from cache?
I have simplified mock_create_retriever; the ColPali model resource is now taken from the arguments, which should be the CachedColpaliModelResource that was set in the app.
There is a problem with create_retriever: if the PDF is small, it will use AllDocumentsRetriever instead, and if the PDF is big enough, it will add the semantic retriever anyway. So here I explicitly return only the ColPali retriever.
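Roughly, the simplified mock looks like this (the factory signature and the ColpaliRetriever constructor arguments are assumptions; the real ones are in the test file):

```python
def mock_create_retriever(colpali_model_resource, *args, **kwargs):
    # Bypass the size-based logic of the real create_retriever
    # (AllDocumentsRetriever for small PDFs, semantic retriever added for big
    # ones) and always return only the ColPali retriever, so the e2e test
    # exercises the ColPali path in isolation. The resource passed in is the
    # CachedColpaliModelResource configured in the app for the tests.
    return ColpaliRetriever(colpali_model_resource, *args, **kwargs)
```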
tests/test_colpali_retriever.py (Outdated)
| cached_app_class = create_cached_app_config()
|
| # Patch the app creation to use cached model resource
| with patch("aidial_rag.app.DialRAGApplication", new=cached_app_class):
We could just add a parameter to DialRAGApplication that allows injecting the ColpaliModelResource to be used here, instead of patching.
Removed the cached app class.
Passing in the class to be used does not seem like a good decision, and adding the resource as a parameter to the app also seems odd.
I also tried replacing the resource instance after the creation of DialRAGApplication, but getting its instance from the app is not that simple. So instead I just patched the class that is used when the ColPali resource is created, so it uses the cached version instead of the standard one.
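For reference, the patching approach described above, sketched with unittest.mock (the patch target string is an assumption; the actual one is in the test file):

```python
from unittest.mock import patch


def with_cached_colpali(cached_resource_class, test_fn):
    """Run test_fn with the app's ColPali resource class replaced by a cached double.

    The patch target must point at the name the app module actually uses when it
    constructs the resource; "aidial_rag.app.ColpaliModelResource" is an
    assumption here, not the PR's verified path.
    """
    with patch("aidial_rag.app.ColpaliModelResource", new=cached_resource_class):
        return test_fn()
```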
Looks like you forgot to include the name of the file with the tests in the path. This may cause collisions for tests with the same name in different files.
See tests/cache/test_colpali_retriever/test_colpali_retrieval_e2e/1adc73f3b7d87d911383d346295d6d66.response.
Moved it to a separate folder. But these embeddings are for the one PDF I use in the tests, and it is shared, so I cannot use the test name in it.
| local_files_only=model_path
| is not None,  # if set use only local files from folder
| ).eval()
| processor = processor_class.from_pretrained(model_name)
Why is the cache_dir not needed for processor_class.from_pretrained?
Here the cache dir holds the base weights for the image feature extractor and the LLM, while all of the processor's configs are in the model folder that was downloaded using snapshot download, so the cache is not needed in this case.
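A hedged sketch of the loading pattern being discussed; the PR uses model_class/processor_class variables, while this sketch assumes the colpali_engine classes directly, and the parameter wiring may differ:

```python
from typing import Optional

import torch
from colpali_engine.models import ColPali, ColPaliProcessor


def load_colpali(
    model_name: str,
    model_path: Optional[str],
    cache_dir: Optional[str],
):
    # model_path set -> weights were pre-downloaded (e.g. baked into the Docker
    # image via snapshot download), so only local files are used; cache_dir is
    # where from_pretrained keeps the base weights it has to fetch itself.
    source = model_path if model_path is not None else model_name
    model = ColPali.from_pretrained(
        source,
        torch_dtype=torch.float16,
        cache_dir=cache_dir,
        local_files_only=model_path is not None,
    ).eval()
    # The processor only needs its config files, which already sit in the
    # downloaded model folder, hence no cache_dir for it (see the reply above).
    processor = ColPaliProcessor.from_pretrained(source)
    return model, processor
```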
# Conflicts:
#   aidial_rag/documents.py
Description of changes
- Added ColPali retriever.
- Added unit tests for the retriever and an end-to-end test.
- Added a separate Dockerfile that allows building an image on top of the ai-dial-rag image, with the saved model weights for the specified model included in the Docker image.
- Some measurements of memory consumption on a T4 (in Colab); batch size 8 seems to be the best option time/memory-wise.
query + images parallel run
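For the Dockerfile item, the weight-baking step could be as simple as a snapshot download at build time (the model id and target path below are placeholders, not the PR's build arguments):

```python
from huggingface_hub import snapshot_download


def download_model_weights(model_name: str, target_dir: str) -> str:
    # Run at image build time so the runtime container can load the model with
    # local_files_only=True and never reaches out to the Hub.
    return snapshot_download(repo_id=model_name, local_dir=target_dir)


if __name__ == "__main__":
    download_model_weights("vidore/colpali-v1.2", "/models/colpali")
```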
Model Optimization Experiments
Timing was measured with vidore-benchmark, batch size 4 or 8.
0. PyTorch models
All PyTorch models were run on GPU with float16.
1. ONNX Export
Overview
ONNX export was attempted to optimize model inference performance by converting the PyTorch models.
Export Results
Performance (T4 & L4 GPUs)
Optimizations Tried
TensorRT
2. Quantization (Original PyTorch Models)
3. torch.compile
Backends Tested
cudagraphs, onnxrt, openxla, tvm, inductor
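For reference, switching between the backends listed above is just the backend argument of torch.compile (a generic sketch, not the benchmark harness that produced the measurements):

```python
import torch


def compile_model(model: torch.nn.Module, backend: str = "inductor") -> torch.nn.Module:
    # Backends tried above: cudagraphs, onnxrt, openxla, tvm, inductor.
    # onnxrt/openxla/tvm additionally require their own packages to be installed.
    return torch.compile(model, backend=backend)
```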