awkward array mishandles np.datetime64[D] -> pa.date32 conversion #3643

@phc27x

Version of Awkward Array

2.8.7

Description and code to reproduce

It looks like ak.to_arrow and ak.to_arrow_table mishandle the conversion of NumPy's 64-bit datetime64[D] values into Arrow's 32-bit date32 type, so the data is changed by the conversion.

(EDIT: much simpler example)

>>> import awkward as ak
>>> import numpy as np
>>> ak.to_arrow(np.array(['2011-01-27', '2011-01-28', '2011-01-29', '2011-01-30'], dtype='datetime64[D]'), extensionarray=True)
<awkward._connect.pyarrow.extn_types.AwkwardArrowArray object at 0x0000023BAFC4E2C0>
[
  2011-01-27,
  1970-01-01,
  2011-01-28,
  1970-01-01
]
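The pattern in this output (every other value is 1970-01-01, and the real dates appear at half their original spacing) is what you would see if the 64-bit day counts were read as 32-bit integers without being cast. This is only a hypothesis, not something confirmed against the Awkward source. The NumPy-only sketch below reproduces the same values by deliberately reinterpreting the buffer. The names `days64`, `halves`, and `misread` are illustrative, and little-endian byte order is forced so the result does not depend on the machine.

```python
import numpy as np

dates = np.array(
    ["2011-01-27", "2011-01-28", "2011-01-29", "2011-01-30"],
    dtype="datetime64[D]",
)

# datetime64[D] stores days since 1970-01-01 as 64-bit integers.
# Force little-endian so the byte layout is deterministic.
days64 = dates.astype("<i8")

# Reinterpret the same bytes as 32-bit integers (no value cast).
# Each 64-bit value becomes two 32-bit values: low half, then a zero high half.
halves = days64.view("<i4")

# Take only as many 32-bit values as there were dates, as a buggy
# reader that trusts the original length would do.
misread = halves[: len(dates)].astype(np.int64).view("datetime64[D]")

result = misread.astype(str).tolist()
print(result)
```

If this is the cause, a value cast (for example `astype("int32")` on the day counts) rather than a buffer reinterpretation would be needed somewhere in the conversion path.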

Original, more complex demo.
I'm using polars here purely for a clear visual comparison:

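import awkward as ak
import numpy as np
import polars as pl
import polars.testing as pltst

# Structured array with an int64 index and a day-resolution date field
data = np.array([(0, '2025-03-07'), (1, '2025-03-10')], dtype=[('index', '<i8'), ('date', '<M8[D]')])
df1 = pl.from_numpy(data)
# Same data routed through Awkward and Arrow; this should match df1
df2 = pl.from_arrow(ak.to_arrow_table(ak.to_packed(ak.from_numpy(data)), extensionarray=False))
pltst.assert_frame_equal(df1, df2)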
  • I expect dates to round-trip from NumPy through Awkward to Arrow with 100% fidelity.
  • ak.to_arrow_table(ak.to_packed(ak.from_numpy(data))) gives 1970-01-01 in the second row, where it should give 2025-03-10.
pyarrow.Table
index: extension<awkward<AwkwardArrowType>> not null
date: extension<awkward<AwkwardArrowType>> not null
----
index: [[0,1]]
date: [[2025-03-07,1970-01-01]]

Awkward array is awesome - thank you for such a useful and powerful package.

Labels

bug: The problem described is something that must be fixed
