Hi,

I am trying to group one column by region (regions 1-11, treated as a categorical variable) and then print the mean for each group. My data column is 'ALCOHOL' and the column to group by is 'REGION', but I keep getting an error. Here is the code. I need help, as I am new to programming.

import numpy as np
import pandas as pd
import statsmodels.formula.api as smf
import statsmodels.stats.multicomp as multi

data = pd.read_csv('gapminder2.csv', low_memory=False)

#setting variables you will be working with to numeric
data['ALCOHOL'] = data['alcconsumption'].convert_objects(convert_numeric=True)
data['REGION'] = data['country2'].convert_objects(convert_numeric=True)

ALCOHOL = data['ALCOHOL']

meanTest = pd.DataFrame(columns = ['ALCOHOL', 'REGION']).astype(float)

print ('mean per region')
mean = meanTest.groupby('REGION').mean()
print (mean)

Here is my error:

Warnings:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.
mean per region
Empty DataFrame
Columns: [ALCOHOL]
Index: []
C:/Users/blahblahblah/Documents/untitled1.py:15: FutureWarning: convert_objects is deprecated. Use the data-type specific converters pd.to_datetime, pd.to_timedelta and pd.to_numeric.
  data['ALCOHOL'] = data['alcconsumption'].convert_objects(convert_numeric=True)
C:/Users/C5182508/OneDrive - SAP SE/Documents/untitled1.py:16: FutureWarning: convert_objects is deprecated. Use the data-type specific converters pd.to_datetime, pd.to_timedelta and pd.to_numeric.
  data['REGION'] = data['country2'].convert_objects(convert_numeric=True)

Here is a picture: (https://cloud.githubusercontent.com/assets/26236052/23641223/7ad06c26-02af-11e7-8811-52cb3d1b8f2c.png)

Comment From: TomAugspurger

I don't see any errors in your output, just a couple warnings. This line:

meanTest = pd.DataFrame(columns = ['ALCOHOL', 'REGION']).astype(float)

creates an empty DataFrame because you don't pass a data argument, so the groupby has nothing to average. That is why the output shows "Empty DataFrame".
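
For illustration, here is a minimal sketch of one way to get a mean per region: select the two columns from the existing `data` frame instead of building a new, empty one. Since `gapminder2.csv` isn't available here, the small in-memory frame below is a made-up stand-in (the values are hypothetical); the column names come from the original post. It also uses `pd.to_numeric`, which the FutureWarning suggests as the replacement for `convert_objects`.

```python
import pandas as pd

# Hypothetical stand-in for gapminder2.csv; the values are made up for illustration.
data = pd.DataFrame({
    "alcconsumption": ["1.5", "2.5", "4.0", "missing", "6.0"],
    "country2": ["1", "1", "2", "2", "2"],
})

# pd.to_numeric replaces the deprecated convert_objects; errors="coerce"
# turns entries that cannot be parsed (such as "missing") into NaN.
data["ALCOHOL"] = pd.to_numeric(data["alcconsumption"], errors="coerce")
data["REGION"] = pd.to_numeric(data["country2"], errors="coerce")

# Select the existing columns rather than constructing a new, empty DataFrame.
meanTest = data[["ALCOHOL", "REGION"]]

print("mean per region")
mean = meanTest.groupby("REGION").mean()  # NaN values are skipped by mean()
print(mean)
```

With this toy data, region 1 averages 2.0 and region 2 averages 5.0. With the real file, replacing the stand-in frame with the original `pd.read_csv(...)` call should be the only change needed, assuming the columns are named as in the post.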

I'd recommend clarifying your question, making it reproducible, and asking on Stack Overflow.