Pandas version checks

  • [X] I have checked that this issue has not already been reported.

  • [ ] I have confirmed this bug exists on the latest version of pandas.

  • [ ] I have confirmed this bug exists on the main branch of pandas.

Reproducible Example

import pandas as pd
import gc
for i in range(0,10):
    series = pd.Series([0.008, 0.002])
    json_string = series.to_json(orient="records")
    _ = gc.collect()
    print("gc_count={}".format(len(gc.get_objects())))


Output:
gc_count=46619
gc_count=46619
gc_count=46620
gc_count=46621
gc_count=46622
gc_count=46623
gc_count=46624
gc_count=46625
gc_count=46626
gc_count=46627

Issue Description

pd.Series.to_json() appears to leak memory: in the reproducible example above, the number of GC-tracked objects grows by one on almost every iteration.

Expected Behavior

After collecting GC objects, the count in each iteration should be constant.

Installed Versions

INSTALLED VERSIONS
------------------
commit           : 4bfe3d07b4858144c219b9346329027024102ab6
python           : 3.9.7.final.0
python-bits      : 64
OS               : Darwin
OS-release       : 21.6.0
Version          : Darwin Kernel Version 21.6.0: Mon Dec 19 20:44:01 PST 2022; root:xnu-8020.240.18~2/RELEASE_X86_64
machine          : x86_64
processor        : i386
byteorder        : little
LC_ALL           : None
LANG             : en_US.UTF-8
LOCALE           : en_US.UTF-8

pandas           : 1.4.2
numpy            : 1.22.2
...

Comment From: loicdiridollou

Hi @rockanjan, I tried to reproduce your code on my setup and it does not behave the same way. I am using a more recent version of pandas, though (1.5.3). I ran the code multiple times and always saw the same pattern: the count decreases for the first few iterations and then stays constant (I removed i from the loop in case that could have been a factor). Does your system behave the same way with a newer version of pandas?

import pandas as pd
import gc
for _ in range(0,10):
    series = pd.Series([0.008, 0.002])
    json_string = series.to_json(orient="records")
    _ = gc.collect()
    print("gc_count={}".format(len(gc.get_objects())))

gc_count=105533
gc_count=105529
gc_count=105525
gc_count=105521
gc_count=105521
gc_count=105521
gc_count=105521
gc_count=105521
gc_count=105521
gc_count=105521

And this is my local setup:

INSTALLED VERSIONS
------------------
commit           : 2e218d10984e9919f0296931d92ea851c6a6faf5
python           : 3.10.9.final.0
python-bits      : 64
OS               : Darwin
OS-release       : 22.1.0
Version          : Darwin Kernel Version 22.1.0: Sun Oct  9 20:14:30 PDT 2022; root:xnu-8792.41.9~2/RELEASE_ARM64_T8103
machine          : arm64
processor        : arm
byteorder        : little
LC_ALL           : None
LANG             : en_US.UTF-8
LOCALE           : en_US.UTF-8

pandas           : 1.5.3
numpy            : 1.24.2
pytz             : 2022.7.1
dateutil         : 2.8.2
setuptools       : 65.4.1
pip              : 23.0

Comment From: phofl

Yeah same on main, no memory leak.

Please check on the release candidate of 2.0

Comment From: rockanjan

I tried reproducing again myself on the same env, and now I do not see any issue.

My production service runs an older version, and there the heap size increases consistently.

commit           : db08276bc116c438d3fdee492026f8223584c477
python           : 3.8.16.final.0
python-bits      : 64
OS               : Linux
OS-release       : 5.4.228-141.415.amzn2int.x86_64
Version          : #1 SMP Tue Dec 20 23:52:14 UTC 2022
machine          : x86_64
processor        : x86_64
byteorder        : little
LC_ALL           : None
LANG             : None
LOCALE           : en_US.UTF-8

pandas           : 1.1.3

I think this issue affects only older versions of pandas. I will re-open if I see it in the latest version.
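
For anyone checking an older install: `len(gc.get_objects())` only counts Python objects tracked by the garbage collector, so memory allocated directly by C code would not show up in it. A rough sketch of a process-level check is below, assuming a Unix system where the standard-library `resource` module is available. The helper name `peak_rss_growth` is hypothetical, and `ru_maxrss` is reported in kilobytes on Linux but in bytes on macOS.

```python
import resource


def peak_rss_growth(func, iterations=1000):
    """Return how much the process's peak resident set size grew while calling func."""
    # ru_maxrss is the peak RSS so far: kilobytes on Linux, bytes on macOS.
    before = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
    for _ in range(iterations):
        func()
    after = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
    return after - before
```

Because this measures the peak, it only registers growth beyond the highest memory use the process has already reached, so it is best run in a fresh interpreter with a large iteration count. A value that keeps climbing as `iterations` increases would be consistent with a leak, though it is not proof of one.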