Description
Feature Type
- Adding new functionality to pandas
- Changing existing functionality in pandas
- Removing existing functionality in pandas
Problem Description
I wish I could use string aliases like "first" and "last" when aggregating a DataFrame, just as they can be used when aggregating a groupby object.
Feature Description
The goal is to allow "first" and "last" as valid aggregation strings in DataFrame.agg() and Series.agg() without requiring a groupby.
Implementation idea:
Currently, Series.agg() checks whether the passed function name is a valid aggregation among NumPy's or pandas' reduction methods. We can extend this logic to explicitly map "first" and "last" to the first and last elements of the Series.
Pseudocode:
# Inside Series.agg() (simplified)
if isinstance(func, str):
    if func == "first":
        return self.iloc[0]
    if func == "last":
        return self.iloc[-1]
    # existing code follows...
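The same idea can be tried today without touching pandas internals. The sketch below is only an illustration: `first_value` and `last_value` are hypothetical helper names, and returning NaN for an empty Series is an assumption (the pseudocode above would raise an IndexError on empty input).

```python
import numpy as np
import pandas as pd


def first_value(s: pd.Series):
    """Return the first element of s, or NaN if s is empty (assumed behavior)."""
    return s.iloc[0] if len(s) else np.nan


def last_value(s: pd.Series):
    """Return the last element of s, or NaN if s is empty (assumed behavior)."""
    return s.iloc[-1] if len(s) else np.nan


df = pd.DataFrame({"a": [1, 2, 3], "b": [4, 5, 6], "c": [7, 8, 9]})

# Callables stand in for the proposed "first" and "last" string aliases.
result = df.agg({"a": "sum", "b": first_value, "c": last_value})
print(result)
```

Whether the real aliases should raise or return a missing value on empty input is a design question the proposal leaves open.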
Expected behavior after change:
df = pd.DataFrame({"a": [1, 2, 3], "b": [4, 5, 6], "c": [7, 8, 9]})
aggregations = {"a": "sum", "b": "first", "c": "last"}
df.agg(aggregations)
Returns:
a    6
b    4
c    9
This would align the behavior with groupby().agg(), which already supports "first" and "last".
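One detail worth settling if the behaviors are to match: the groupby versions of "first" and "last" return the first and last non-null entries, whereas plain positional indexing with iloc does not skip missing values. A small sketch of the difference, using a constant grouping key (the variable names are illustrative only):

```python
import numpy as np
import pandas as pd

s = pd.Series([np.nan, 2.0, 3.0, np.nan])

# Put every row in one group so groupby acts on the whole Series.
key = np.zeros(len(s), dtype=int)

grouped_first = s.groupby(key).first().iloc[0]  # first non-null value
grouped_last = s.groupby(key).last().iloc[0]    # last non-null value

positional_first = s.iloc[0]   # NaN, because iloc does not skip nulls
positional_last = s.iloc[-1]   # NaN

print(grouped_first, grouped_last, positional_first, positional_last)
```

An implementation based on iloc would therefore differ from groupby on data with leading or trailing missing values, unless null-skipping is added.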
Alternative Solutions
# sumcols is assumed to hold the names of the columns that should be summed
aggregations = {col: ("sum" if col in sumcols else (lambda x: x.iloc[-1])) for col in df.columns}
df.agg(aggregations)
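Another possible workaround, sketched here under the assumption that null-skipping semantics are acceptable, is to reuse the existing groupby aliases by grouping all rows under a single constant key (`constant_key` is an illustrative name):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"a": [1, 2, 3], "b": [4, 5, 6], "c": [7, 8, 9]})
aggregations = {"a": "sum", "b": "first", "c": "last"}

# One group containing every row, so the groupby aliases apply to whole columns.
constant_key = np.zeros(len(df), dtype=int)

# The aggregated frame has a single row; take it as a Series indexed by column.
result = df.groupby(constant_key).agg(aggregations).iloc[0]
print(result)
```

This avoids lambdas but adds a groupby step that the proposed aliases would make unnecessary.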
Additional Context
No response