GBQ offers a handy option to ignore unknown values in a data insert.
This is especially useful if working with data that changes in structure without notice.
To make pandas reflect the functionality of GBQ more fully, I would suggest adding an optional parameter to to_gbq that extends the request body (e.g. here) with
body = {'ignoreUnknownValues': True, 'rows': rows}
or to patch the table schema if necessary (which would be more complex to implement, I guess).
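To make the first suggestion concrete, here is a minimal sketch of how such a flag might be threaded into the request body. The helper name `build_insert_body` and the keyword `ignore_unknown_values` are hypothetical and are not part of to_gbq; `rows` is assumed to already be in the shape the streaming insert expects.

```python
def build_insert_body(rows, ignore_unknown_values=False):
    """Build a streaming-insert request body (hypothetical helper, not pandas API).

    `rows` is assumed to be the already-prepared list of row entries.
    """
    body = {'rows': rows}
    if ignore_unknown_values:
        # Ask BigQuery to drop values that do not match the table schema
        # instead of rejecting the row.
        body['ignoreUnknownValues'] = True
    return body


# Example rows: the second one carries a field the table may not know about.
rows = [
    {'json': {'AAA': 4, 'BBB': 10}},
    {'json': {'AAA': 5, 'BBB': 20, 'CCC': 100}},
]
body = build_insert_body(rows, ignore_unknown_values=True)
```

Leaving the flag off by default would keep the current behaviour unchanged for existing callers.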
Thanks for your consideration (and for working on this IO option! Love it. It saved me hours, if not days.)
Comment From: jreback
cc @parthea
Comment From: parthea
My initial thinking is that it is trivial to select specific columns in a DataFrame while ignoring others. Users should modify or create a DataFrame that includes only the columns of interest.
I agree that this feature request is useful and will add flexibility through the use of an optional parameter, however there are already a lot of parameters in to_gbq. I am hesitant to add another parameter in to_gbq when there is already a solution available.
Comment From: parthea
@jreback Can this be closed, or do you expect a PR for this?
Comment From: jreback
is there an example for this?
Comment From: parthea
Assuming the DataFrame is
df = pd.DataFrame({'AAA' : [4,5,6,7], 'BBB' : [10,20,30,40],'CCC' : [100,50,-30,-50]})
Users would specify which columns to include using the following code.
df[['AAA','BBB']].to_gbq('TestDataSet.TestTable', project_id='project-xxxx')
I don't see an example in the docs, although the pandas tutorial mentions how to select multiple columns in a DataFrame (Chapter 2: 2.3 Selecting multiple columns).
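For data whose structure changes without notice, the same workaround can be written so that it does not hard-code which columns exist. This is only a sketch: `select_known_columns` and `known_columns` are hypothetical names, and the list of table columns is assumed to be known up front rather than fetched from GBQ.

```python
import pandas as pd


def select_known_columns(df, known_columns):
    """Return only the columns of df that appear in known_columns.

    Hypothetical helper: columns are returned in the order given by
    known_columns, and names missing from df are silently skipped.
    """
    keep = [col for col in known_columns if col in df.columns]
    return df[keep]


df = pd.DataFrame({'AAA': [4, 5, 6, 7],
                   'BBB': [10, 20, 30, 40],
                   'CCC': [100, 50, -30, -50]})

# Assumed column names of the destination table.
known_columns = ['AAA', 'BBB']
subset = select_known_columns(df, known_columns)

# The upload itself needs credentials and a real project, so it is left commented out:
# subset.to_gbq('TestDataSet.TestTable', project_id='project-xxxx')
```

Unlike ignoreUnknownValues, this filtering happens on the client, so the unknown column never leaves the machine.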
Comment From: jorisvandenbossche
@parthea let's go with your judgement that this is easy to do yourself and that it is not necessary to add yet another kwarg. So closing for now. If others provide some concrete use cases where this is not trivial, we can always revisit later.