Hi - I recently upgraded from pandas 0.20.3 to 0.21.0 and now get the following error when writing a dataframe with to_json: OverflowError: Unsupported UTF-8 sequence length when encoding string.

The traceback looks like this: [screenshot: Pandas Overflow Error writing pandas 0.21.0 to_json in Windows 7]

As you can see, it seems to fail while converting dates(?), but my dataframe has no dates. Here is how my df looks: [screenshot: Pandas Overflow Error writing pandas 0.21.0 to_json in Windows 7]

However, to_json does work for these cases: df = pd.DataFrame(np.random.randint(0, 100, size=(5, 4)), columns=list('ABCD')) and also after adding a standard string column with df['some_string'] = 'blah'

  • The same exact dataframe does not throw an error in 0.20.3.
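Since the traceback points at a conversion step but does not say which column is involved, one way to narrow this down is to serialize each column on its own and record which ones raise. The following is only a minimal sketch: `find_failing_columns` is a hypothetical helper name (not part of pandas), and the sample frame simply rebuilds the two cases reported as working above, not the dataframe that actually fails.

```python
import numpy as np
import pandas as pd


def find_failing_columns(df):
    """Return {column: error text} for each column whose to_json call raises.

    Hypothetical helper for illustration; not a pandas API.
    """
    failures = {}
    for col in df.columns:
        try:
            # Serialize a one-column frame so a failure can be tied to this column.
            df[[col]].to_json()
        except (OverflowError, ValueError, TypeError) as exc:
            failures[col] = f"{type(exc).__name__}: {exc}"
    return failures


# Rebuild the two cases reported as working: random integers plus a plain string column.
df = pd.DataFrame(np.random.randint(0, 100, size=(5, 4)), columns=list("ABCD"))
df["some_string"] = "blah"

print(pd.__version__)
print(find_failing_columns(df))
```

Running the same helper on the real dataframe under both 0.20.3 and 0.21.0 should show whether a single column is responsible.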

Comment From: gfyoung

@gryBox : Thanks for reporting this! Could you provide the exact DataFrame that you were using OR provide a smaller one that we can use to confirm the error?

Comment From: gryBox

@gfyoung Yes. Would putting a link to a tiny .csv to my github work?

Comment From: gfyoung

@gryBox : Yes, that would. You can also upload one directly into the issue BTW.

Comment From: gryBox

@gfyoung Here you go. data

Comment From: gfyoung

@gryBox : I can't replicate the error on master using this dataset:

import pandas as pd

df = pd.read_csv("<your-data-file>")
df.to_json()   # No error

Comment From: gryBox

@gfyoung I was afraid of that. Can you try it via Jupyter notebook 5.2? If not, I will do more digging.

Comment From: gfyoung

> I was afraid of that

What do you mean? Does this example raise for you when you try to execute it?

As for Jupyter, I was unable to reproduce it there either.

Comment From: gryBox

I meant that it could be my system environment. Let me investigate some more.
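For comparing environments, a rough sketch of the details that could be collected on each machine is below. `environment_summary` is a hypothetical helper written for this thread; `pd.show_versions()` is the pandas function that prints the fuller report of installed dependency versions.

```python
import platform
import sys

import pandas as pd


def environment_summary():
    """Collect a few interpreter and platform details (hypothetical helper)."""
    return {
        "python": sys.version.split()[0],
        "platform": platform.platform(),
        "pandas": pd.__version__,
        "default_encoding": sys.getdefaultencoding(),
        "filesystem_encoding": sys.getfilesystemencoding(),
    }


print(environment_summary())

# Prints the versions of pandas and its dependencies.
pd.show_versions()
```

Posting this output from both the 0.20.3 and 0.21.0 environments would make it easier to spot what differs between them.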

Comment From: gfyoung

Okay, sounds good. Feel free to re-open if something turns up.