I'm trying to get through the MovieLens 1M Data Set, but am having an issue with the merge step. I checked all of my dataframes and confirmed that I successfully imported the data from the tables, even though the book/git don't have appropriately marked links. On page 28, you are supposed to merge ratings with users, then merge the result with movies. My code looks like this:

data = pd.merge(pd.merge(ratings, users), movies)

the output that results looks like this:


MergeError Traceback (most recent call last) in () 4 # pandas infers which columns to use as the join keys based on overlapping names 5 ----> 6 data = pd.merge(pd.merge(ratings, users), movies)

//anaconda/lib/python2.7/site-packages/pandas/tools/merge.pyc in merge(left, right, how, on, left_on, right_on, left_index, right_index, sort, suffixes, copy) 37 right_on=right_on, left_index=left_index, 38 right_index=right_index, sort=sort, suffixes=suffixes, ---> 39 copy=copy) 40 return op.get_result() 41 if debug:

//anaconda/lib/python2.7/site-packages/pandas/tools/merge.pyc in init(self, left, right, how, on, left_on, right_on, axis, left_index, right_index, sort, suffixes, copy) 181 (self.left_join_keys, 182 self.right_join_keys, --> 183 self.join_names) = self._get_merge_keys() 184 185 def get_result(self):

//anaconda/lib/python2.7/site-packages/pandas/tools/merge.pyc in _get_merge_keys(self) 299 left_keys, right_keys 300 """ --> 301 self._validate_specification() 302 303 left_keys = []

//anaconda/lib/python2.7/site-packages/pandas/tools/merge.pyc in _validate_specification(self) 398 self.right.columns) 399 if len(common_cols) == 0: --> 400 raise MergeError('No common columns to perform merge on') 401 self.left_on = self.right_on = common_cols 402 elif self.on is not None:

MergeError: No common columns to perform merge on

Comment From: jreback

pls post version on numpy/pandas, and your code up to this point

Comment From: WLazar

import numpy as np import pandas as pd cd /Users/WL/Documents/Data_science/pydata-book/ch02 np.version -->

unames = ['user_id', 'gender', 'age', 'occupation', 'zip'] users = pd.read_table('movielens/users.dat', sep='::', header=None, names=unames) rnames = ['user-id', 'movie-id', 'rating', 'timestamp'] ratings = pd.read_table('movielens/ratings.dat', sep='::', header=None, names=rnames) mnames = ['movie-id', 'title', 'genres'] movies = pd.read_table('movielens/movies.dat', sep='::', header=None, names=mnames)

users.merge(ratings, on='user-id')

Comment From: jreback

Looks like a typo in the book; I think should 'user_id' for the on field (and not user-id) and the names in the ratings should be the same

Their maybe someway to report typos if you are inclined.

In [23]: rnames = ['user_id', 'movie-id', 'rating', 'timestamp']

In [24]: ratings = pd.read_table('movielens/ratings.dat', sep='::', header=None,names=rnames)

In [25]: users.merge(ratings, on='user_id')
Out[25]: 
    user_id gender  age  occupation    zip  movie-id  rating  timestamp
0         1      F    1          10  48067      1193       5  978300760
1         1      F    1          10  48067       661       3  978302109
2         1      F    1          10  48067       914       3  978301968
3         1      F    1          10  48067      3408       4  978300275
4         1      F    1          10  48067      2355       5  978824291
5         1      F    1          10  48067      1197       3  978302268
6         1      F    1          10  48067      1287       5  978302039
7         1      F    1          10  48067      2804       5  978300719
8         1      F    1          10  48067       594       4  978302268
9         1      F    1          10  48067       919       4  978301368
10        1      F    1          10  48067       595       5  978824268
11        1      F    1          10  48067       938       4  978301752
12        1      F    1          10  48067      2398       4  978302281
13        1      F    1          10  48067      2918       4  978302124
14        1      F    1          10  48067      1035       5  978301753
15        1      F    1          10  48067      2791       4  978302188
16        1      F    1          10  48067      2687       3  978824268
17        1      F    1          10  48067      2018       4  978301777
18        1      F    1          10  48067      3105       5  978301713
19        1      F    1          10  48067      2797       4  978302039
20        1      F    1          10  48067      2321       3  978302205
21        1      F    1          10  48067       720       3  978300760
22        1      F    1          10  48067      1270       5  978300055
23        1      F    1          10  48067       527       5  978824195
24        1      F    1          10  48067      2340       3  978300103
25        1      F    1          10  48067        48       5  978824351
26        1      F    1          10  48067      1097       4  978301953
27        1      F    1          10  48067      1721       4  978300055
28        1      F    1          10  48067      1545       4  978824139
29        1      F    1          10  48067       745       3  978824268
30        1      F    1          10  48067      2294       4  978824291
31        1      F    1          10  48067      3186       4  978300019
32        1      F    1          10  48067      1566       4  978824330
33        1      F    1          10  48067       588       4  978824268
34        1      F    1          10  48067      1907       4  978824330
35        1      F    1          10  48067       783       4  978824291
36        1      F    1          10  48067      1836       5  978300172
37        1      F    1          10  48067      1022       5  978300055
38        1      F    1          10  48067      2762       4  978302091
39        1      F    1          10  48067       150       5  978301777
40        1      F    1          10  48067         1       5  978824268
41        1      F    1          10  48067      1961       5  978301590
42        1      F    1          10  48067      1962       4  978301753
43        1      F    1          10  48067      2692       4  978301570
44        1      F    1          10  48067       260       4  978300760
45        1      F    1          10  48067      1028       5  978301777
46        1      F    1          10  48067      1029       5  978302205
47        1      F    1          10  48067      1207       4  978300719
48        1      F    1          10  48067      2028       5  978301619
49        1      F    1          10  48067       531       4  978302149
50        1      F    1          10  48067      3114       4  978302174
51        1      F    1          10  48067       608       4  978301398
52        1      F    1          10  48067      1246       4  978302091
53        2      M   56          16  70072      1357       5  978298709
54        2      M   56          16  70072      3068       4  978299000
55        2      M   56          16  70072      1537       4  978299620
56        2      M   56          16  70072       647       3  978299351
57        2      M   56          16  70072      2194       4  978299297
58        2      M   56          16  70072       648       4  978299913
59        2      M   56          16  70072      2268       5  978299297
        ...    ...  ...         ...    ...       ...     ...        ...

[1000209 rows x 8 columns]

Comment From: jreback

@Lingzy your example is not reproducible. if you are having an issue, please open a new issue with a copy-pastable example.