Feature Type
- [x] Adding new functionality to pandas
- [ ] Changing existing functionality in pandas
- [ ] Removing existing functionality in pandas
Problem Description
This should be a simple follow-up to https://github.com/pandas-dev/pandas/issues/9471, enabling support for alignment with method='nearest'.
Since fillna internally uses interpolate, which already supports method='nearest', this might work right away, though it will require extensive testing.
Feature Description
The new feature could be implemented by extending the current alignment functionality in pandas to support method='nearest'. This would allow the user to align two Series or DataFrames by their indices, using the nearest available value when exact matches are not found. Here's a basic idea of how it could be implemented in pseudocode:
import pandas as pd

def align_nearest(df1, df2):
    # Reindex df1 onto df2's index; each missing label takes the value
    # at the nearest existing label (df1's index must be monotonic)
    df1_nearest = df1.reindex(df2.index, method='nearest')
    return df1_nearest
This functionality could be added as a method to the existing pandas.DataFrame and pandas.Series objects, integrating smoothly into the current API.
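To make the intended behaviour more concrete, here is a minimal sketch of an outer alignment with nearest filling, built only on the existing reindex. The helper name `align_nearest_outer` and its `tolerance` pass-through are hypothetical, not part of pandas, and the sketch assumes both indices are monotonic, which reindex requires when a `method` is given.

```python
import pandas as pd

def align_nearest_outer(left, right, tolerance=None):
    """Hypothetical helper: align two objects on the union of their indices,
    filling labels missing from either side with the value at the nearest label."""
    union = left.index.union(right.index)  # sorted union of both indices
    return (
        left.reindex(union, method="nearest", tolerance=tolerance),
        right.reindex(union, method="nearest", tolerance=tolerance),
    )

a = pd.Series([1.0, 2.0, 3.0], index=[0, 10, 20])
b = pd.Series([10.0, 20.0], index=[4, 18])

a2, b2 = align_nearest_outer(a, b)
print(a2)
print(b2)
```

Both results share the index [0, 4, 10, 18, 20]. Passing a `tolerance` would leave NaN wherever the nearest label is further away than the given distance, which may be the safer default for a built-in version.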
Alternative Solutions
An alternative solution would be to use the existing interpolate function with method='nearest', which can be applied to the DataFrame or Series before performing the alignment. Additionally, third-party libraries like fuzzywuzzy or scipy.spatial could be used for more complex nearest matching.
import pandas as pd
from fuzzywuzzy import process

# Example using fuzzywuzzy to find the closest string match
df1 = pd.DataFrame([...])
df2 = pd.DataFrame([...])
# extractOne returns the best match first, so [0] is the matched value from df2
df1['nearest'] = df1['index_column'].apply(
    lambda x: process.extractOne(x, df2['index_column'])[0]
)
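For numeric indices, the scipy.spatial route mentioned above could look roughly like the following. This is only a sketch under the assumption that the index values are plain floats; the frames and column names are made up for illustration.

```python
import pandas as pd
from scipy.spatial import cKDTree

# Illustrative data: df1 holds the values, df2 supplies the target index
df1 = pd.DataFrame({"value": [1.0, 2.0, 3.0]}, index=[0.0, 10.0, 20.0])
df2 = pd.DataFrame({"other": [10.0, 20.0]}, index=[4.0, 18.0])

# Build a 1-D KD-tree over df1's index and query it with df2's index
tree = cKDTree(df1.index.to_numpy().reshape(-1, 1))
dist, pos = tree.query(df2.index.to_numpy().reshape(-1, 1))

# Take the nearest rows of df1 and relabel them with df2's index
aligned = df1.iloc[pos].set_axis(df2.index)
print(aligned)
print(dist)  # distance from each target label to its nearest source label
```

The returned distances make it easy to discard matches that are too far away, but the extra dependency and manual relabelling are exactly the overhead a native option would remove.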
However, native support within pandas would likely be more efficient and user-friendly than either workaround.