I'm not sure if this is a schema or a library responsibility, but I wonder if it would be possible to guarantee that, when doing the following, we won't be appending duplicate items:

```python
import rustac

old_items = await rustac.read("items.parquet")
await rustac.write("updated-items.parquet", old_items["features"] + new_items)
```

I think I tested it, and it seems like there are no guarantees on this.
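One way to confirm whether a combined list really contains duplicates before writing it is to count IDs. The following is a minimal sketch. It assumes the items are plain STAC item dicts with an `"id"` key, like the entries in `old_items["features"]`. `find_duplicate_ids` is a hypothetical helper and is not part of rustac.

```python
from collections import Counter
from typing import Any


def find_duplicate_ids(items: list[dict[str, Any]]) -> list[str]:
    """Return the IDs that appear more than once, in first-seen order.

    Hypothetical helper: it compares only the "id" field, so two items
    with the same id in different collections count as duplicates.
    """
    counts = Counter(item["id"] for item in items)
    return [item_id for item_id, n in counts.items() if n > 1]


old = [{"id": "a"}, {"id": "b"}]
new = [{"id": "b"}, {"id": "c"}]
print(find_duplicate_ids(old + new))  # ['b']
```

Running it on `old_items["features"] + new_items` before calling `rustac.write` shows whether the plain list concatenation introduced repeats.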
Answered by gadomski, Oct 21, 2025
Correct, there's no de-duplication inside of rustac. It's not too hard to do yourself:

```python
import rustac

old_items = await rustac.read("items.parquet")
old_item_ids = {item["id"] for item in old_items["features"]}
new_unique = [item for item in new_items if item["id"] not in old_item_ids]  # skip ids already present
await rustac.write("update-items.parquet", old_items["features"] + new_unique)
```

I'm not sure it belongs in rustac because "de-duplication" is use-case specific: one user might want to de-duplicate on `id`, another on `id` + `collection`, another on `id` + version, etc.
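Because the right key differs per use case, one option is to make the key a parameter. The following is a minimal sketch, assuming items are plain STAC item dicts. `merge_items` and the `by_*` key functions are hypothetical helpers, not rustac API. The version key assumes the version is stored in `properties["version"]` (as the STAC Versioning Indicators extension does), so adjust it to your own schema. Unlike the snippet above, this sketch also collapses duplicates that already exist within the old or the new list.

```python
from typing import Any, Callable, Hashable, Iterable

Item = dict[str, Any]
KeyFunc = Callable[[Item], Hashable]


def by_id(item: Item) -> Hashable:
    return item["id"]


def by_id_and_collection(item: Item) -> Hashable:
    # "collection" is optional on STAC items, so fall back to None
    return (item.get("collection"), item["id"])


def by_id_and_version(item: Item) -> Hashable:
    # Assumption: the version lives in properties["version"]
    return (item["id"], item.get("properties", {}).get("version"))


def merge_items(
    old: Iterable[Item],
    new: Iterable[Item],
    key: KeyFunc = by_id,
    replace: bool = False,
) -> list[Item]:
    """Combine old and new items, keeping one item per key.

    replace=False keeps the existing item when keys collide;
    replace=True lets the new item overwrite the old one.
    """
    merged: dict[Hashable, Item] = {}
    for item in old:
        merged.setdefault(key(item), item)  # first occurrence wins
    for item in new:
        k = key(item)
        if replace or k not in merged:
            merged[k] = item
    return list(merged.values())


old = [{"id": "a", "collection": "c1"}, {"id": "b", "collection": "c1"}]
new = [{"id": "b", "collection": "c2"}, {"id": "c", "collection": "c1"}]
print([i["id"] for i in merge_items(old, new)])  # ['a', 'b', 'c']
print(len(merge_items(old, new, key=by_id_and_collection)))  # 4
```

With rustac, the result would be passed straight to the writer, e.g. `await rustac.write("update-items.parquet", merge_items(old_items["features"], new_items, key=by_id_and_collection))`. The `replace` flag covers the other common policy question: whether an incoming item with the same key should win over the stored one.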
Answer selected by betolink