Commit 5b7a100

Re-enable chunking when saving to Parquet (#594)
### Change list

- Update to arro3 0.2.1
- Enable rechunking table before saving.
1 parent 08d6973 commit 5b7a100

3 files changed: +90 -91 lines changed

lonboard/_serialization.py

Lines changed: 1 addition & 5 deletions
```diff
@@ -29,16 +29,12 @@
 
 def serialize_table_to_parquet(table: Table, *, max_chunksize: int) -> List[bytes]:
     buffers: List[bytes] = []
-    # NOTE: passing `max_chunksize=0` creates an infinite loop
-    # https://github.com/apache/arrow/issues/39788
     assert max_chunksize > 0
 
     compression_string = (
         f"{DEFAULT_PARQUET_COMPRESSION}({DEFAULT_PARQUET_COMPRESSION_LEVEL})"
     )
-    # TODO: restore rechunking
-    # max_chunksize=max_chunksize
-    for record_batch in table.to_batches():
+    for record_batch in table.rechunk(max_chunksize=max_chunksize).to_batches():
         with BytesIO() as bio:
             # Occasionally it's possible for there to be empty batches in the
             # pyarrow table. This will error when writing to parquet. We want to
```
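For context, the pattern this commit restores is: rechunk the table to a bounded batch size, then serialize each non-empty batch into its own in-memory Parquet buffer. Below is a minimal sketch of that pattern using plain pyarrow instead of arro3; the function name `table_to_parquet_buffers`, the `zstd` compression choice, and the use of `Table.to_batches(max_chunksize=...)` as a stand-in for arro3's `Table.rechunk(...)` are illustrative assumptions, not lonboard's actual code.

```python
from io import BytesIO
from typing import List

import pyarrow as pa
import pyarrow.parquet as pq


def table_to_parquet_buffers(table: pa.Table, *, max_chunksize: int) -> List[bytes]:
    # Guard against max_chunksize=0, mirroring the assert in the diff above.
    assert max_chunksize > 0

    buffers: List[bytes] = []
    # to_batches(max_chunksize=...) yields batches of at most `max_chunksize`
    # rows. Unlike a full rechunk, it only splits oversized chunks and does
    # not merge small ones; it stands in here for arro3's Table.rechunk().
    for record_batch in table.to_batches(max_chunksize=max_chunksize):
        if record_batch.num_rows == 0:
            # Skip empty batches; the diff above notes these can error when
            # written to Parquet.
            continue
        with BytesIO() as bio:
            pq.write_table(
                pa.Table.from_batches([record_batch]),
                bio,
                compression="zstd",  # assumed codec, for illustration only
            )
            buffers.append(bio.getvalue())
    return buffers
```

For example, `table_to_parquet_buffers(table, max_chunksize=65_536)` would yield one compressed Parquet blob per batch of at most 65,536 rows, keeping each serialized payload bounded in size.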

0 commit comments
