API: mode.nan_is_na to consistently distinguish NaN-vs-NA #62040

Open
wants to merge 48 commits into
base: main
5e88fde
BUG: read_csv with engine=pyarrow and numpy-nullable dtype
jbrockmendel Aug 6, 2025
eae6f64
mypy fixup, error message compat for 32bit builds
jbrockmendel Aug 6, 2025
2861b16
minimum version compat
jbrockmendel Aug 6, 2025
5369afa
not-infer-string compat
jbrockmendel Aug 6, 2025
db35a9c
mypy fixup
jbrockmendel Aug 6, 2025
505bfb6
update usage
jbrockmendel Aug 11, 2025
febe83c
CLN: remove redundant check
jbrockmendel Aug 11, 2025
c81cbec
Use Matts idea
jbrockmendel Aug 11, 2025
26a3049
re-xfail
jbrockmendel Aug 12, 2025
a70b429
API: rank with nullable dtypes preserve NA
jbrockmendel Aug 4, 2025
99a71b7
API: improve dtype in df.where with EA other
jbrockmendel Aug 3, 2025
c86747d
GH refs
jbrockmendel Aug 3, 2025
9d222d8
doc fixup
jbrockmendel Aug 3, 2025
6f800b3
BUG: Decimal(NaN) incorrectly allowed in ArrowEA constructor with tim…
jbrockmendel Jul 3, 2025
514a56f
GH ref
jbrockmendel Jul 3, 2025
fca3c7c
BUG: ArrowEA constructor with timestamp type
jbrockmendel Jul 4, 2025
f20758a
POC: consistent NaN treatment for pyarrow dtypes
jbrockmendel Jun 28, 2025
cc416fa
comment
jbrockmendel Jun 28, 2025
7094d85
Down to 40 failing tests
jbrockmendel Jul 5, 2025
eeb0d32
Fix rank, json tests
jbrockmendel Jul 6, 2025
814d001
CLN: remove outdated
jbrockmendel Jul 6, 2025
5db5e4b
Fix where kludge
jbrockmendel Jul 6, 2025
87536a7
update tests
jbrockmendel Jul 6, 2025
64f4271
Fix remaining tests
jbrockmendel Jul 6, 2025
26d1177
mypy fixup
jbrockmendel Jul 7, 2025
bcb2506
old-numpy compat
jbrockmendel Jul 7, 2025
8f99d05
simplify
jbrockmendel Jul 7, 2025
5abd585
Better option name, fixture
jbrockmendel Jul 31, 2025
70830f7
default True
jbrockmendel Jul 31, 2025
58b3c4f
Patch ops
jbrockmendel Jul 31, 2025
cd7ec33
mypy fixup
jbrockmendel Jul 31, 2025
cf7b229
Test for setitem/construction
jbrockmendel Jul 31, 2025
eb12ea1
update ufunc test
jbrockmendel Jul 31, 2025
f0262ef
Improve rank test skips
jbrockmendel Jul 31, 2025
544faf1
ENH: mode.nan_is_na for numpy-nullable dtypes
jbrockmendel Aug 4, 2025
6c4b68f
update style test
jbrockmendel Aug 4, 2025
90d3a28
update asvs, mypy ignores
jbrockmendel Aug 4, 2025
408aa06
pre-commit fixup
jbrockmendel Aug 4, 2025
9e5ebec
doc fixup
jbrockmendel Aug 4, 2025
0fd2e2d
Remove special-casing
jbrockmendel Aug 4, 2025
7de9f40
comment
jbrockmendel Aug 4, 2025
2f61a58
ruff format
jbrockmendel Aug 5, 2025
36143ad
Set default to True
jbrockmendel Aug 6, 2025
b7ea9ae
whatsnew
jbrockmendel Aug 12, 2025
a625190
Merge branch 'main' into api-nan-vs-na
jbrockmendel Aug 20, 2025
d471aa8
update _cast_pointwise_result
jbrockmendel Aug 20, 2025
27cd097
update cast_pointwise_result
jbrockmendel Aug 20, 2025
1bb0a4e
Merge branch 'main' into api-nan-vs-na
jbrockmendel Aug 20, 2025
not-infer-string compat
jbrockmendel committed Aug 12, 2025
commit 5369afad7affc8d489c65c08bac61c3423f12185
27 changes: 24 additions & 3 deletions pandas/io/parsers/arrow_parser_wrapper.py
@@ -5,6 +5,8 @@

import numpy as np

from pandas._config import using_string_dtype

from pandas._libs import lib
from pandas.compat._optional import import_optional_dependency
from pandas.errors import (
@@ -13,7 +15,10 @@
)
from pandas.util._exceptions import find_stack_level

from pandas.core.dtypes.common import pandas_dtype
from pandas.core.dtypes.common import (
    is_string_dtype,
    pandas_dtype,
)
from pandas.core.dtypes.dtypes import (
    BaseMaskedDtype,
)
@@ -339,13 +344,29 @@ def read(self) -> DataFrame:
    ser = frame[key]
    if isinstance(ser.dtype, BaseMaskedDtype):
        new_dtype[key] = ser.dtype.numpy_dtype
        if (
            key in old_dtype
            and not using_string_dtype()
            and is_string_dtype(old_dtype[key])
            and not isinstance(old_dtype[key], StringDtype)
            and ser.array._hasna
        ):
            # Cast to make sure we get "NaN" string instead of "NA"
            frame[key] = ser.astype(old_dtype[key])
            frame.loc[ser.isna(), key] = np.nan
            old_dtype[key] = object  # Avoid re-casting
    elif isinstance(ser.dtype, StringDtype):
        # We cast here in case the user passed "category" in
        # order to get the correct dtype.categories.dtype
        # e.g. test_categorical_dtype_utf16
        sdt = StringDtype(na_value=np.nan)
        if not using_string_dtype():
            sdt = np.dtype(object)
            frame[key] = ser.astype(sdt)
            frame.loc[ser.isna(), key] = np.nan
        else:
            sdt = StringDtype(na_value=np.nan)
            frame[key] = frame[key].astype(sdt)
        new_dtype[key] = sdt  # type: ignore[assignment]
    frame[key] = frame[key].astype(new_dtype[key])

new_dtype.update(old_dtype)
self.dtype = new_dtype
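
For readers following the hunk above, here is a minimal standalone sketch of the pattern used in the `not using_string_dtype()` branch: cast a nullable string column to object dtype, then overwrite the missing positions with `np.nan` so that `pd.NA` does not leak into the result. The helper name `to_object_with_nan` and the sample data are hypothetical and are not part of the PR. The sketch works on a plain `Series` rather than the parser's `frame`.

```python
import numpy as np
import pandas as pd


def to_object_with_nan(ser: pd.Series) -> pd.Series:
    """Hypothetical helper: cast to object dtype, using np.nan for missing values."""
    mask = ser.isna()          # record missing positions before casting
    out = ser.astype(object)   # missing entries are still pd.NA at this point
    out.loc[mask] = np.nan     # replace them with float NaN
    return out


# "string" requests the NA-backed StringDtype, similar to what the pyarrow
# engine can hand back before this post-processing step (an assumption made
# for illustration only).
ser = pd.Series(["a", None, "c"], dtype="string")
result = to_object_with_nan(ser)
```

The mask is computed before the cast, mirroring how the diff calls `ser.isna()` on the original column rather than on the converted one.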