Closed
Labels
API Design; ExtensionArray (Extending pandas with custom dtypes or arrays.); Performance (Memory or execution speed performance); Reshaping (Concat, Merge/Join, Stack/Unstack, Explode)
Description
AsOfMerge._get_join_indexers calls to_numpy() on ExtensionArrays (EAs), which can be costly. _MergeOperation._get_join_indexers is a bit more forgiving: it uses _values_for_factorize in _factorize_keys. That path still requires a cast to NumPy, which makes it a non-starter for (hypothetical) distributed/GPU EAs.
Can we push this up into an EA method that can potentially be overridden? This might be a use case for @mroeschke's "ExtensionManager" idea.
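To make the proposal concrete, here is a rough sketch of what such an overridable hook could look like. It is pure Python and does not use pandas at all: `HostArray`, `DeviceArray`, `to_host`, `_join_indexers`, and `get_join_indexers` are all hypothetical names invented for illustration, not existing pandas APIs. The idea is only that the merge code asks the array for join indexers first and falls back to materializing the values when the array does not provide them.

```python
from typing import List, Sequence, Tuple

Indexers = Tuple[List[int], List[int]]


def default_inner_join_indexers(left: Sequence, right: Sequence) -> Indexers:
    """Fallback path: a plain hash join (inner) over materialized values."""
    positions = {}
    for j, value in enumerate(right):
        positions.setdefault(value, []).append(j)
    left_idx, right_idx = [], []
    for i, value in enumerate(left):
        for j in positions.get(value, []):
            left_idx.append(i)
            right_idx.append(j)
    return left_idx, right_idx


class HostArray:
    """Hypothetical stand-in for an EA whose values are cheap to materialize."""

    def __init__(self, values):
        self._values = list(values)
        self.materialized = 0  # counts how often the costly conversion ran

    def to_host(self):
        # Plays the role of to_numpy(): the conversion we would like to avoid.
        self.materialized += 1
        return list(self._values)


class DeviceArray(HostArray):
    """Hypothetical stand-in for a distributed/GPU-backed EA."""

    def _join_indexers(self, other):
        # Hypothetical hook. A real implementation would run the join where
        # the data lives; this sketch reuses the hash join on internal storage
        # so that it stays runnable without any device.
        if not isinstance(other, DeviceArray):
            return NotImplemented
        return default_inner_join_indexers(self._values, other._values)


def get_join_indexers(left, right) -> Indexers:
    """Ask the array first; materialize only if it declines."""
    hook = getattr(left, "_join_indexers", None)
    if hook is not None:
        result = hook(right)
        if result is not NotImplemented:
            return result
    return default_inner_join_indexers(left.to_host(), right.to_host())
```

With this shape, two `DeviceArray` operands never hit `to_host()`, while any other combination goes through the generic path. Returning `NotImplemented` mirrors how Python's binary operators let an operand decline, which keeps mixed-type joins working without the array author having to handle every case.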