# ParquetReadSupport

`ParquetReadSupport` is a `ReadSupport` (Apache Parquet) of [UnsafeRows](../../UnsafeRow.md) that is used for non-[Vectorized Parquet Decoding](../../vectorized-decoding/index.md) (i.e. when Spark falls back to the parquet-mr read path).

`ParquetReadSupport` is registered as the value of the `parquet.read.support.class` Hadoop configuration property by the following:

* [ParquetFileFormat](ParquetFileFormat.md#buildReaderWithPartitionValues)
* [ParquetScan](ParquetScan.md#createReaderFactory)
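
The following is a minimal sketch (not Spark's exact code) of what that registration amounts to at the parquet-mr level: the fully-qualified class name of a `ReadSupport` implementation is set under `parquet.read.support.class` (`ParquetInputFormat.READ_SUPPORT_CLASS`), and parquet-mr later instantiates the class and calls its `init(InitContext)` and `prepareForRead(...)` methods to build a record materializer for a file split.

```scala
import org.apache.hadoop.conf.Configuration
import org.apache.parquet.hadoop.ParquetInputFormat

// Hand a ReadSupport implementation over to parquet-mr by class name.
// ParquetInputFormat.READ_SUPPORT_CLASS is the "parquet.read.support.class" key.
val hadoopConf = new Configuration()
hadoopConf.set(
  ParquetInputFormat.READ_SUPPORT_CLASS,
  "org.apache.spark.sql.execution.datasources.parquet.ParquetReadSupport")
```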

## Creating Instance

`ParquetReadSupport` takes the following to be created:

* <span id="convertTz"> `ZoneId` (optional)
* <span id="enableVectorizedReader"> `enableVectorizedReader` flag
* <span id="datetimeRebaseSpec"> Datetime `RebaseSpec`
* <span id="int96RebaseSpec"> INT96 `RebaseSpec`
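
The two `RebaseSpec` values describe how datetime and INT96 (legacy timestamp) values written in the hybrid Julian calendar are rebased to the Proleptic Gregorian calendar on read. As an illustration, the following standard Spark SQL properties control the rebase mode on read; how exactly they end up in the constructor arguments above is assumed here, not shown:

```scala
// Standard Spark SQL properties for rebase behaviour on read
// (values: EXCEPTION, LEGACY or CORRECTED).
// `spark` is assumed to be an existing SparkSession.
spark.conf.set("spark.sql.parquet.datetimeRebaseModeInRead", "CORRECTED")
spark.conf.set("spark.sql.parquet.int96RebaseModeInRead", "CORRECTED")
```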

`ParquetReadSupport` is created when:

* `ParquetFileFormat` is requested to [buildReaderWithPartitionValues](ParquetFileFormat.md#buildReaderWithPartitionValues) (with [enableVectorizedReader](ParquetFileFormat.md#enableVectorizedReader) disabled)
* `ParquetPartitionReaderFactory` is requested to [createRowBaseParquetReader](ParquetPartitionReaderFactory.md#createRowBaseParquetReader)
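
As a quick way to exercise this (non-vectorized) code path, disable the vectorized reader before reading a Parquet dataset. The sketch below assumes an existing `SparkSession` named `spark` and a made-up input path:

```scala
// With the vectorized reader disabled, reading Parquet falls back to the
// parquet-mr path that goes through ParquetReadSupport.
spark.conf.set("spark.sql.parquet.enableVectorizedReader", "false")
spark.read.parquet("/tmp/parquet-demo").show()
```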

## Logging

Enable `ALL` logging level for `org.apache.spark.sql.execution.datasources.parquet.ParquetReadSupport` logger to see what happens inside.

Add the following lines to `conf/log4j2.properties`:

```text
logger.ParquetReadSupport.name = org.apache.spark.sql.execution.datasources.parquet.ParquetReadSupport
logger.ParquetReadSupport.level = all
```

Refer to [Logging](../../spark-logging.md).

<!---
## Review Me

`ParquetReadSupport` is <<creating-instance, created>> exclusively when `ParquetFileFormat` is requested for a [data reader](ParquetFileFormat.md#buildReaderWithPartitionValues) (with no support for [Vectorized Parquet Decoding](../../vectorized-decoding/index.md) and so falling back to parquet-mr).

[[parquet.read.support.class]]
`ParquetReadSupport` is registered as the fully-qualified class name for [parquet.read.support.class](ParquetFileFormat.md#parquet.read.support.class) Hadoop configuration when `ParquetFileFormat` is requested for a [data reader](ParquetFileFormat.md#buildReaderWithPartitionValues).

[[creating-instance]]
[[convertTz]]
`ParquetReadSupport` takes an optional Java `TimeZone` to be created.

=== [[init]] Initializing ReadSupport -- `init` Method

[source, scala]
----
init(context: InitContext): ReadContext
----

NOTE: `init` is part of the `ReadSupport` Contract to...FIXME.

`init`...FIXME

=== [[prepareForRead]] `prepareForRead` Method

[source, scala]
----
prepareForRead(
  conf: Configuration,
  keyValueMetaData: JMap[String, String],
  fileSchema: MessageType,
  readContext: ReadContext): RecordMaterializer[UnsafeRow]
----

NOTE: `prepareForRead` is part of the `ReadSupport` Contract to...FIXME.

`prepareForRead`...FIXME
-->