BUG: read_csv with engine=pyarrow and numpy-nullable dtype #62053

jbrockmendel · 2025-08-06T00:50:10Z

closes BUG: read_csv loses precision when engine='pyarrow' and dtype Int64 #56136
Tests added and passed if fixing a bug or adding a new feature
All code checks passed.
Added type annotations to new arguments/methods/functions.
Added an entry in the latest doc/source/whatsnew/vX.X.X.rst file if fixing a bug or adding a new feature.

Also makes this code path robust to always-distinguish behavior in #62040

mroeschke

From the original issue, do you know where we are introducing float to lose precision when wanting the result type to be int?

jbrockmendel · 2025-08-06T16:05:35Z

> From the original issue, do you know where we are introducing float to lose precision when wanting the result type to be int?

In arrow_table_to_pandas the pyarrow[int64] columns get converted to np.float64, then in finalize_pandas_output that gets cast back to Int64.
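To illustrate why that intermediate float64 step is lossy, here is a minimal pure-Python sketch. It does not call pandas or pyarrow; the helper name roundtrip_via_float64 is hypothetical and only mimics the int64 -> float64 -> Int64 sequence described above, relying on the fact that a Python float is an IEEE 754 double.

```python
def roundtrip_via_float64(value: int) -> int:
    """Mimic int64 -> float64 -> int64 (hypothetical helper, not pandas code)."""
    # A Python float is a 64-bit double with a 53-bit significand, so
    # integers above 2**53 cannot all be represented exactly.
    return int(float(value))


exact = 2**53       # still exactly representable as a double
lossy = 2**53 + 1   # rounds to a neighbouring representable value

print(roundtrip_via_float64(exact) == exact)  # True
print(roundtrip_via_float64(lossy) == lossy)  # False
```

Any int64 value whose magnitude exceeds 2**53 can be affected this way, which matches the precision loss reported in the original issue.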

mroeschke · 2025-08-06T16:49:04Z

OK, I see: pyarrow.Table.to_pandas casts the ints to float when the column contains nulls.

What if, in arrow_table_to_pandas, we always provide a fallback types_mapper={pyarrow ints: pandas nullable ints} to avoid the lossy conversion, and then cast the pandas nullable ints to the appropriate type afterwards?

jbrockmendel · 2025-08-06T20:47:08Z

That’s basically what this is currently doing, just not in that function since it is also called from other places.

I’m out of town for a few days. If you feel strongly that this logic should live inside that function, I’ll move it when I get back.

BUG: read_csv with engine=pyarrow and numpy-nullable dtype

98bedc4

jbrockmendel marked this pull request as ready for review August 6, 2025 00:50

jbrockmendel mentioned this pull request Aug 6, 2025

API: mode.nan_is_na to consistently distinguish NaN-vs-NA #62040

Draft

9 tasks

jbrockmendel added 4 commits August 5, 2025 18:14

mypy fixup, error message compat for 32bit builds

7aa640d

minimum version compat

e5b752e

not-infer-string compat

323414c

mypy fixup

96bed9d

mroeschke reviewed Aug 6, 2025


mroeschke added the IO CSV read_csv, to_csv label Aug 6, 2025
