
ENH: String functions for df.aggregate() #62050

@JustusKnnck

Description


Feature Type

  • Adding new functionality to pandas

  • Changing existing functionality in pandas

  • Removing existing functionality in pandas

Problem Description

I wish I could use string aliases like "first" and "last" when aggregating a DataFrame, just as they can be used when aggregating a groupby object.

Feature Description

The goal is to allow "first" and "last" as valid aggregation strings in DataFrame.agg() and Series.agg() without requiring a groupby.

Implementation idea:

Currently, Series.agg() checks whether the passed string names a valid pandas reduction method or NumPy function. We can extend this logic to explicitly map "first" and "last" to the first and last elements of the Series.

Pseudocode:

```python
# Inside Series.agg() (simplified)
if isinstance(func, str):
    if func == "first":
        return self.iloc[0]   # first element by position
    if func == "last":
        return self.iloc[-1]  # last element by position
# existing code follows...
```
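The pseudocode above can be tried outside pandas with a small standalone sketch. `agg_first_last` is a hypothetical helper, not a pandas API: it intercepts "first" and "last" by position via `iloc`, exactly like the pseudocode, and defers every other function to the existing `Series.agg()`. Applying it column by column, mirroring how `df.agg()` applies each dict entry to its column, reproduces the expected result described in this issue.

```python
import pandas as pd


def agg_first_last(s: pd.Series, func):
    """Hypothetical helper mirroring the pseudocode: intercept "first"/"last",
    then defer every other func to the existing Series.agg()."""
    if isinstance(func, str):
        if func == "first":
            return s.iloc[0]   # positional first element (raises IndexError if s is empty)
        if func == "last":
            return s.iloc[-1]  # positional last element
    return s.agg(func)         # existing pandas behavior for everything else


df = pd.DataFrame({"a": [1, 2, 3], "b": [4, 5, 6], "c": [7, 8, 9]})
aggregations = {"a": "sum", "b": "first", "c": "last"}

# Apply each aggregation to its own column and collect the scalars into a Series
result = pd.Series({col: agg_first_last(df[col], f) for col, f in aggregations.items()})
print(result)
```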

Expected behavior after change:
```python
import pandas as pd

df = pd.DataFrame({"a": [1, 2, 3], "b": [4, 5, 6], "c": [7, 8, 9]})

aggregations = {"a": "sum", "b": "first", "c": "last"}
df.agg(aggregations)
```

Returns:

```
a    6
b    4
c    9
```

This would align the behavior with groupby().agg(), which already supports "first" and "last".
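For comparison, the same aggregation dict already works through `groupby().agg()` if every row is placed in a single group. The sketch below is a workaround, not part of the proposal; the constant key `one_group` is an illustrative assumption. It also surfaces one semantic detail worth settling: groupby's "first" returns the first non-null value, whereas an `iloc`-based implementation returns the positional element, NaN included.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"a": [1, 2, 3], "b": [4, 5, 6], "c": [7, 8, 9]})
aggregations = {"a": "sum", "b": "first", "c": "last"}

# Put every row in one group (constant key) so groupby().agg() reduces the whole frame
one_group = np.zeros(len(df), dtype=int)
via_groupby = df.groupby(one_group).agg(aggregations).iloc[0]

# Caveat: groupby "first" skips missing values, unlike a plain positional .iloc[0]
with_nan = pd.DataFrame({"x": [np.nan, 2.0, 3.0]})
first_nonnull = with_nan.groupby(np.zeros(len(with_nan), dtype=int))["x"].agg("first").iloc[0]
first_positional = with_nan["x"].iloc[0]

print(via_groupby.tolist(), first_nonnull, first_positional)
```

If exact parity with `groupby().agg()` is the goal, the Series-level version would presumably need to skip missing values too, for example by dropping NA values before taking `iloc[0]` or `iloc[-1]`.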

Alternative Solutions

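```python
sumcols = ["a"]  # illustrative: columns to sum; every other column takes its last value
aggregations = {col: ("sum" if col in sumcols else (lambda x: x.iloc[-1])) for col in df.columns}
df.agg(aggregations)
```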

Additional Context

No response
