Hello,

I am not very familiar with the current discussions, but I was wondering whether there is a feature in the pipeline to join two dataframes on the closest value of a given column. It happens quite often that two dataframes contain different data with different timestamps, yet it feels natural to want to join them by matching each row in the left dataframe with the closest row in the right dataframe.

If you want me to add more detail, please tell me.

Regards.

Comment From: TomAugspurger

I think you're looking for merge_asof, does that look right?

Comment From: MaxHalford

I hadn't noticed that method... Beautiful :).

Sorry for bothering you, have a nice day!

Comment From: MaxHalford

Actually I was a bit hasty, I just realized that this isn't exactly what I want. merge_asof matches to the upper closest, not to the closest.

Do you see what I mean?

Comment From: jreback

http://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.reindex.html?highlight=reindex#pandas.DataFrame.reindex

use tolerance that works in both directions.

Comment From: MaxHalford

It doesn't work for me. The tolerance in merge_asof is working, but it still only considers forward values, not backward ones.

Comment From: jreback

this is on .reindex

Comment From: jreback

docs are here. please post an example if you are still having difficulty.
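
For illustration, here is a minimal sketch of the .reindex suggestion above. The data and the names weather, targets and temperature are made up for the example and are not from the thread. It assumes the right-hand frame has a sorted, unique DatetimeIndex, which method="nearest" relies on.

```python
import pandas as pd

# Hypothetical weather readings; the index is sorted and unique.
weather = pd.DataFrame(
    {"temperature": [10.0, 12.0, 15.0]},
    index=pd.to_datetime(
        ["2016-10-31 11:00", "2016-10-31 11:10", "2016-10-31 11:20"]
    ),
)

# Hypothetical timestamps we want a weather reading for.
targets = pd.to_datetime(
    ["2016-10-31 11:04", "2016-10-31 11:06", "2016-10-31 11:40"]
)

# method="nearest" looks both backward and forward; tolerance caps the
# allowed distance, so targets with no reading within 5 minutes get NaN.
nearest = weather.reindex(
    targets, method="nearest", tolerance=pd.Timedelta("5min")
)
print(nearest)
```

With this data, 11:04 picks up the 11:00 reading, 11:06 picks up the 11:10 reading, and 11:40 has nothing within five minutes, so it is left missing.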

Comment From: MaxHalford

Sorry I wasn't available this weekend. I'm not sure I've expressed myself clearly. The issue is the following. I have two dataframes: the first one contains updates for a bike station, and the second one contains weather updates. I would like to add to each station update the closest weather update.

[Image: screen shot 2016-10-31 at 11 09 08]

[Image: screen shot 2016-10-31 at 11 07 12]

I did the following:

weather_df = get_weather_updates(city_name)
weather_df.sort_values(by='moment', inplace=True)
joined_df = pd.merge_asof(left=city_df, right=weather_df, on='moment')

But I got

[Image: screen shot 2016-10-31 at 11 13 42]

I'm not sure how I can use the reindex method, because the resulting index might not contain unique values, i.e. several station updates could be merged with the same weather update.
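
As a sketch of the join described above: by default merge_asof searches in one direction only (backward, i.e. the last right-hand row whose key is at or before the left key). This assumes a pandas version in which merge_asof accepts the direction parameter (added in 0.20.0), where direction="nearest" matches in both directions. The data and the bikes and temperature columns are hypothetical; only the moment column name comes from the snippet above.

```python
import pandas as pd

# Hypothetical station updates (left side).
city_df = pd.DataFrame({
    "moment": pd.to_datetime(
        ["2016-10-31 11:04", "2016-10-31 11:06", "2016-10-31 11:07"]
    ),
    "bikes": [3, 4, 5],
})

# Hypothetical weather updates (right side).
weather_df = pd.DataFrame({
    "moment": pd.to_datetime(["2016-10-31 11:00", "2016-10-31 11:10"]),
    "temperature": [10.0, 12.0],
})

# merge_asof requires both frames to be sorted by the key column.
city_df = city_df.sort_values("moment")
weather_df = weather_df.sort_values("moment")

# direction="nearest" picks the closest right-hand row, earlier or later.
joined_df = pd.merge_asof(
    city_df, weather_df, on="moment", direction="nearest"
)
print(joined_df)
```

Here the 11:06 and 11:07 station updates both receive the 11:10 weather reading, so several left rows sharing one right row is not a problem for this approach.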