Open
Labels
bug: The problem described is something that must be fixed
Description
Version of Awkward Array
2.8.7
Description and code to reproduce
It looks like ak.to_arrow and ak.to_arrow_table mishandle the conversion of 64-bit date types into Arrow's 32-bit date type, so the data is changed in the conversion.
(EDIT: much simpler example)
import numpy as np
import awkward as ak
ak.to_arrow(np.array(['2011-01-27', '2011-01-28', '2011-01-29', '2011-01-30'], dtype='datetime64[D]'), extensionarray=True)
<awkward._connect.pyarrow.extn_types.AwkwardArrowArray object at 0x0000023BAFC4E2C0>
[
2011-01-27,
1970-01-01,
2011-01-28,
1970-01-01
]
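The interleaved 1970-01-01 values look like what you would get if the 64-bit day counts were reinterpreted as 32-bit integers without a cast. That is only my guess at the cause, not something confirmed from the Awkward source. A minimal NumPy-only sketch of that guess (no Awkward or Arrow involved; the little-endian layout is forced explicitly):

```python
import numpy as np

dates = np.array(
    ['2011-01-27', '2011-01-28', '2011-01-29', '2011-01-30'],
    dtype='datetime64[D]',
)

# datetime64[D] stores days since 1970-01-01 as 64-bit integers.
days64 = dates.astype('<i8')

# Assumption: the same bytes get read as 32-bit integers (no cast).
# Each little-endian int64 then splits into (low word, high word),
# and the high word is 0 for these small day counts.
days32 = days64.view('<i4')

# Reading only the first len(dates) 32-bit values, as a date32 array would.
misread = days32[:len(dates)].astype('int64').astype('datetime64[D]')
print(misread)
```

If this is what happens, the fix would be a real cast from 64-bit to 32-bit days rather than a reinterpretation of the buffer.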
Original, more complex demo.
I'm using polars here purely for readable output and for the comparison:
import numpy as np
import polars as pl
import polars.testing as pltst
import awkward as ak
data = np.array([(0, '2025-03-07'), (1, '2025-03-10')], dtype=[('index', '<i8'), ('date', '<M8[D]')])
df1 = pl.from_numpy(data)
df2 = pl.from_arrow(ak.to_arrow_table(ak.to_packed(ak.from_numpy(data)), extensionarray=False))
pltst.assert_frame_equal(df1, df2)

I expect a round trip of dates from NumPy through Awkward to Arrow to be 100% faithful.
ak.to_arrow_table(ak.to_packed(ak.from_numpy(data))) gives 1970-01-01 when it should be 2025-03-10.
pyarrow.Table
index: extension<awkward<AwkwardArrowType>> not null
date: extension<awkward<AwkwardArrowType>> not null
----
index: [[0,1]]
date: [[2025-03-07,1970-01-01]]
Awkward array is awesome - thank you for such a useful and powerful package.
ianna