Pandas version checks
- [X] I have checked that this issue has not already been reported.
- [ ] I have confirmed this bug exists on the latest version of pandas.
- [ ] I have confirmed this bug exists on the main branch of pandas.
Reproducible Example
import pandas as pd
import gc
for i in range(0,10):
    series = pd.Series([0.008, 0.002])
    json_string = series.to_json(orient="records")
    _ = gc.collect()
    print("gc_count={}".format(len(gc.get_objects())))
Output:
gc_count=46619
gc_count=46619
gc_count=46620
gc_count=46621
gc_count=46622
gc_count=46623
gc_count=46624
gc_count=46625
gc_count=46626
gc_count=46627
Issue Description
pd.Series.to_json() appears to leak memory: the number of GC-tracked objects keeps growing across calls. See the reproducible example above.
Expected Behavior
After calling gc.collect(), the object count should stay constant from one iteration to the next.
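The check described above can be sketched as a small reusable helper. This is only an illustration: `gc_object_counts` and `grows_steadily` are hypothetical names, not part of pandas or the standard library, and the warm-up calls are an assumption added so that one-time caches do not show up as growth. The demo uses `json.dumps` from the standard library as a stand-in so that the snippet runs without pandas.

```python
import gc
import json


def gc_object_counts(fn, iterations=10, warmup=3):
    """Call fn repeatedly; record the number of GC-tracked objects after each call."""
    for _ in range(warmup):
        fn()  # let caches and lazy initialisation settle first
    counts = []
    for _ in range(iterations):
        fn()
        gc.collect()  # drop anything that is merely waiting for collection
        counts.append(len(gc.get_objects()))
    return counts


def grows_steadily(counts):
    """True if the count rises on every single iteration (a leak-like pattern)."""
    return all(later > earlier for earlier, later in zip(counts, counts[1:]))


# Stand-in workloads: one that retains nothing, one that deliberately leaks.
_retained = []


def clean_call():
    json.dumps([0.008, 0.002])


def leaky_call():
    _retained.append([])  # a new list stays reachable after every call


clean_counts = gc_object_counts(clean_call)
leaky_counts = gc_object_counts(leaky_call)
print("clean grows steadily:", grows_steadily(clean_counts))
print("leaky grows steadily:", grows_steadily(leaky_counts))
```

To apply it to the case in this report, one could pass `lambda: pd.Series([0.008, 0.002]).to_json(orient="records")` as `fn`. Note that `gc.get_objects()` only sees objects tracked by the garbage collector, so a flat count does not rule out leaks in memory allocated outside of it.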
Installed Versions
Comment From: loicdiridollou
Hi @rockanjan,
I tried to reproduce this on my setup and it does not behave the same way. I am using a more recent version of pandas, though (1.5.3). I ran the code multiple times and always saw the same pattern: the count decreases and then levels off (I removed i from the loop in case that could have been a factor).
Does your system behave similarly with a newer version of pandas?
import pandas as pd
import gc
for _ in range(0,10):
    series = pd.Series([0.008, 0.002])
    json_string = series.to_json(orient="records")
    _ = gc.collect()
    print("gc_count={}".format(len(gc.get_objects())))
gc_count=105533
gc_count=105529
gc_count=105525
gc_count=105521
gc_count=105521
gc_count=105521
gc_count=105521
gc_count=105521
gc_count=105521
gc_count=105521
And this is my local setup:
INSTALLED VERSIONS
------------------
commit : 2e218d10984e9919f0296931d92ea851c6a6faf5
python : 3.10.9.final.0
python-bits : 64
OS : Darwin
OS-release : 22.1.0
Version : Darwin Kernel Version 22.1.0: Sun Oct 9 20:14:30 PDT 2022; root:xnu-8792.41.9~2/RELEASE_ARM64_T8103
machine : arm64
processor : arm
byteorder : little
LC_ALL : None
LANG : en_US.UTF-8
LOCALE : en_US.UTF-8
pandas : 1.5.3
numpy : 1.24.2
pytz : 2022.7.1
dateutil : 2.8.2
setuptools : 65.4.1
pip : 23.0
Comment From: phofl
Yeah, same on main: no memory leak.
Please check on the 2.0 release candidate.
Comment From: rockanjan
I tried reproducing this again in the same environment, and now I do not see any issue.
My production service runs an older version, and there the heap size increases consistently.
commit : db08276bc116c438d3fdee492026f8223584c477
python : 3.8.16.final.0
python-bits : 64
OS : Linux
OS-release : 5.4.228-141.415.amzn2int.x86_64
Version : #1 SMP Tue Dec 20 23:52:14 UTC 2022
machine : x86_64
processor : x86_64
byteorder : little
LC_ALL : None
LANG : None
LOCALE : en_US.UTF-8
pandas : 1.1.3
I think this issue affects only older versions of pandas. I will re-open if I see it in the latest version.
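For anyone who wants to check heap growth directly rather than counting GC objects, a rough sketch with the standard-library `tracemalloc` module follows. The helper name `traced_growth` and the iteration counts are assumptions chosen for illustration, and the demo workloads are stand-ins so that the snippet runs without pandas. `tracemalloc` only sees allocations made through Python's memory allocators, so memory obtained by a C extension through other means would not appear here.

```python
import gc
import json
import tracemalloc


def traced_growth(fn, iterations=1000, warmup=10):
    """Return the net bytes of traced allocations retained over repeated calls to fn."""
    for _ in range(warmup):
        fn()  # exclude one-time setup costs from the measurement
    gc.collect()
    tracemalloc.start()
    try:
        before, _peak = tracemalloc.get_traced_memory()
        for _ in range(iterations):
            fn()
        gc.collect()  # only count memory that survives a collection
        after, _peak = tracemalloc.get_traced_memory()
    finally:
        tracemalloc.stop()
    return after - before


# Stand-in workloads: one that retains nothing, one that keeps 1 KiB per call.
_retained = []


def clean_call():
    json.dumps([0.008, 0.002])


def leaky_call():
    _retained.append(bytearray(1024))


clean_growth = traced_growth(clean_call)
leaky_growth = traced_growth(leaky_call)
print("clean growth (bytes):", clean_growth)
print("leaky growth (bytes):", leaky_growth)
```

Passing `lambda: pd.Series([0.008, 0.002]).to_json(orient="records")` as `fn` on both the old and the new pandas version would give a side-by-side comparison of retained memory.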