BUG: raise a proper exception when str.rsplit is passed a regex and clarify the docs #62035

Open: wants to merge 1 commit into base: main

11 changes: 8 additions & 3 deletions pandas/core/strings/accessor.py
@@ -737,12 +737,14 @@ def cat(
%(raises_split)s
See Also
--------
-Series.str.split : Split strings around given separator/delimiter.
+Series.str.split : Split strings around given separator/delimiter or
+    regular expression.
Series.str.rsplit : Splits string around given separator/delimiter,
starting from the right.
Series.str.join : Join lists contained as elements in the Series/Index
with passed delimiter.
-str.split : Standard library version for split.
+re.split : Standard library version for split with ``regex=True``.
+str.split : Standard library version for split with ``regex=False``.
str.rsplit : Standard library version for rsplit.

Notes
@@ -931,7 +933,7 @@ def split(
% {
"side": "end",
"pat_regex": "",
-"pat_description": "String to split on",
+"pat_description": "String to split on. Does not support regex",
"regex_argument": "",
"raises_split": "",
"regex_pat_note": "",
@@ -941,6 +943,9 @@ def split(
)
     @forbid_nonstring_types(["bytes"])
     def rsplit(self, pat=None, *, n=-1, expand: bool = False):
+        if pat is not None and not isinstance(pat, str):
+            msg = f"expected a string object, not {type(pat).__name__}"
+            raise TypeError(msg)
Comment on lines +946 to +948
Member

Can you put this in _str_rsplit for each implementation?

Author

Sure, no problem. This exact error is already raised here for the other methods, though, so I was following the same pattern. See, for example, find.
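
For readers following the thread, the guard under discussion can be sketched as a standalone function. This is only an illustration, not pandas code: `check_rsplit_pat` is a hypothetical name, and only the condition and the error message are taken from the diff above. Where the check should live (the accessor or each `_str_rsplit` implementation) is the open question in this review and is not settled by the sketch.

```python
import re


def check_rsplit_pat(pat):
    """Hypothetical helper mirroring the guard added to rsplit in this PR."""
    # None is allowed (no separator given); any other non-str value is rejected.
    if pat is not None and not isinstance(pat, str):
        msg = f"expected a string object, not {type(pat).__name__}"
        raise TypeError(msg)
    return pat


# A plain string and None pass through unchanged.
ok_values = [check_rsplit_pat(","), check_rsplit_pat(None)]

# A compiled pattern is rejected with the message the new test matches on.
try:
    check_rsplit_pat(re.compile("[,_]"))
except TypeError as err:
    error_message = str(err)
```

The message ends in `Pattern` because that is the `__name__` of the type returned by `re.compile`.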

         result = self._data.array._str_rsplit(pat, n=n)
         dtype = object if self._data.dtype == object else None
         return self._wrap_result(
5 changes: 4 additions & 1 deletion pandas/tests/strings/test_split_partition.py
@@ -122,13 +122,16 @@ def test_split_n(any_string_dtype, method, n):


 def test_rsplit(any_string_dtype):
-    # regex split is not supported by rsplit
     values = Series(["a,b_c", "c_d,e", np.nan, "f,g,h"], dtype=any_string_dtype)
     result = values.str.rsplit("[,_]")
     exp = Series([["a,b_c"], ["c_d,e"], np.nan, ["f,g,h"]])
     exp = _convert_na_value(values, exp)
     tm.assert_series_equal(result, exp)

+    # regex split is not supported by rsplit
+    with pytest.raises(TypeError, match="expected a string object, not Pattern"):
+        values.str.rsplit(re.compile("[,_]"))
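
The first assertion in this test depends on the string `"[,_]"` being treated as a literal separator rather than as a character class. The same contrast can be seen with the standard library alone. The sketch below does not use pandas; it assumes only that `Series.str.rsplit` behaves like `str.rsplit` element-wise, as the See Also entries in the docstring suggest.

```python
import re

text = "a,b_c"

# str.rsplit treats the separator literally. The five characters "[,_]"
# never occur in the text, so the string comes back unsplit.
literal_result = text.rsplit("[,_]")

# re.split interprets the same characters as a character class and
# splits on either "," or "_".
regex_result = re.split("[,_]", text)

print(literal_result)  # ['a,b_c']
print(regex_result)    # ['a', 'b', 'c']
```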


def test_rsplit_max_number(any_string_dtype):
# setting max number of splits, make sure it's from reverse