Hello,
I am not very familiar with the current discussions, but I was wondering if there was a feature in the pipeline to join two dataframes on the closest value of a given column. It happens quite often that two dataframes contain different data with different timestamps, yet it feels natural to want to join them by matching each row of the left dataframe with the closest row in the right dataframe.
If you want me to add more detail please do tell me.
Regards.
Comment From: TomAugspurger
I think you're looking for merge_asof, does that look right?
Comment From: MaxHalford
I hadn't noticed that method... Beautiful :).
Sorry for bothering you, have a nice day!
Comment From: MaxHalford
Actually I was a bit hasty, I just realized that this isn't exactly what I want. merge_asof matches to the upper closest, not to the closest.
Do you see what I mean?
Comment From: jreback
http://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.reindex.html?highlight=reindex#pandas.DataFrame.reindex
use tolerance, that works in both directions.
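As a rough illustration of the reindex suggestion, here is a minimal sketch that combines method="nearest" with tolerance. The frame, column name and timestamps are made up for the example, and it assumes the index being reindexed is sorted and has no duplicates.

```python
import pandas as pd

# Hypothetical weather readings indexed by timestamp (illustrative data only).
weather = pd.DataFrame(
    {"temperature": [10.0, 12.0, 15.0]},
    index=pd.to_datetime(
        ["2016-10-05 10:00", "2016-10-05 11:00", "2016-10-05 12:00"]
    ),
)

# Hypothetical timestamps to look up.
targets = pd.to_datetime(
    ["2016-10-05 10:20", "2016-10-05 10:50", "2016-10-05 14:00"]
)

# method="nearest" looks both backward and forward in time;
# tolerance leaves NaN when the closest reading is more than 30 minutes away.
nearest = weather.reindex(
    targets, method="nearest", tolerance=pd.Timedelta("30min")
)
print(nearest)
```

The last target has no reading within 30 minutes, so its row is filled with NaN rather than matched to a distant value.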
Comment From: MaxHalford
It doesn't work for me, the tolerance in the as_of function is working but it still only considers forward values, not backward ones.
Comment From: jreback
this is on .reindex
Comment From: jreback
docs are here. please post an example if you are still having difficulty.
Comment From: MaxHalford
Sorry I wasn't available this weekend. I'm not sure I've expressed myself clearly. The issue is the following: I have two dataframes: the first one contains updates for a bike station, the second one contains weather updates. I would like to add to each station update the closest weather update.
I did the following:
weather_df = get_weather_updates(city_name)
weather_df.sort_values(by='moment', inplace=True)
joined_df = pd.merge_asof(left=city_df, right=weather_df, on='moment')
But I got
I'm not sure how I can use the reindex method because the resulting index might not contain unique values, ie. some station updates could be merged with the same weather update.
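For reference, a minimal sketch of a closest-match join, assuming a pandas version whose merge_asof accepts the direction argument (the default direction is "backward"). The dataframes below are hypothetical stand-ins for city_df and weather_df; only the moment column name comes from the snippet above.

```python
import pandas as pd

# Hypothetical station updates (stand-in for city_df).
stations = pd.DataFrame({
    "moment": pd.to_datetime(
        ["2016-10-05 10:05", "2016-10-05 10:40", "2016-10-05 10:55"]
    ),
    "bikes": [3, 5, 4],
})

# Hypothetical weather updates (stand-in for weather_df).
weather = pd.DataFrame({
    "moment": pd.to_datetime(["2016-10-05 10:00", "2016-10-05 11:00"]),
    "temperature": [10.0, 12.0],
})

# merge_asof requires both frames to be sorted on the key column.
stations = stations.sort_values("moment")
weather = weather.sort_values("moment")

# direction="nearest" picks the closest right-hand row, earlier or later.
joined = pd.merge_asof(stations, weather, on="moment", direction="nearest")
print(joined)
```

Several station updates can be matched to the same weather update here, since the right-hand keys are only looked up and never need to be unique in the result.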