2 files changed: +12 -7 lines changed
lines changed Original file line number Diff line number Diff line change @@ -524,9 +524,11 @@ class PARQUET_EXPORT WriterProperties {
524
524
525
525
// / Enable writing page index in general for all columns. Default disabled.
526
526
// /
527
- // / Page index contains statistics for data pages and can be used to skip pages
528
- // / when scanning data in ordered and unordered columns. Note that it does not
529
- // / write statistics to the page header once page index is enabled.
527
+ // / Writing statistics to the page index disables the old method of writing
528
+ // / statistics to each data page header.
529
+ // / The page index makes filtering more efficient than the page header, as
530
+ // / it gathers all the statistics for a Parquet file in a single place,
531
+ // / avoiding scattered I/O.
530
532
// /
531
533
// / Please check the link below for more details:
532
534
// / https://github.com/apache/parquet-format/blob/master/PageIndex.md
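
The new comment claims that gathering per-page statistics in one place makes filtering cheaper. A minimal sketch of that idea follows. It is a toy model and not the Parquet C++ API: `PageStats` and `PagesToRead` are hypothetical names introduced only for illustration, and real column indexes hold typed, encoded min/max values rather than plain integers.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical per-page statistics, loosely modelled on what a column index
// keeps for each data page (assumption: integer column, no nulls).
struct PageStats {
  int64_t min_value;
  int64_t max_value;
};

// Return the ordinals of the pages whose [min, max] range overlaps the
// query range [lo, hi]. Every other page can be skipped without being read.
// Because all statistics sit in one contiguous structure, the decision is
// made before any data page (or its header) is touched.
std::vector<std::size_t> PagesToRead(const std::vector<PageStats>& index,
                                     int64_t lo, int64_t hi) {
  std::vector<std::size_t> selected;
  for (std::size_t i = 0; i < index.size(); ++i) {
    if (index[i].max_value >= lo && index[i].min_value <= hi) {
      selected.push_back(i);
    }
  }
  return selected;
}
```

With statistics stored only in page headers, a reader would instead have to seek to each page header in turn to make the same decision, which is the scattered I/O the comment refers to.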
@@ -575,14 +575,17 @@ Miscellaneous
 +--------------------------+----------+----------+---------+
 | Feature                  | Reading  | Writing  | Notes   |
 +==========================+==========+==========+=========+
-| Column Index             | ✓        | ✓        |         |
+| Column Index             | ✓        | ✓        | \(1)    |
 +--------------------------+----------+----------+---------+
-| Offset Index             | ✓        | ✓        |         |
+| Offset Index             | ✓        | ✓        | \(1)    |
 +--------------------------+----------+----------+---------+
-| Bloom Filter             | ✓        | ✓        | \(1)    |
+| Bloom Filter             | ✓        | ✓        | \(2)    |
 +--------------------------+----------+----------+---------+
 | CRC checksums            | ✓        | ✓        |         |
 +--------------------------+----------+----------+---------+
 
-* \(1) APIs are provided for creating, serializing and deserializing Bloom
+* \(1) Access to the Column and Offset Index structures is provided, but
+  data read APIs do not currently make any use of them.
+
+* \(2) APIs are provided for creating, serializing and deserializing Bloom
   Filters, but they are not integrated into data read APIs.