Pandas version checks

  • [X] I have checked that this issue has not already been reported.

  • [X] I have confirmed this bug exists on the latest version of pandas.

  • [X] I have confirmed this bug exists on the main branch of pandas.

Reproducible Example

>>> import pandas as pd
>>> import numpy as np
>>> data = {'A': [-2, -1, 0, 1, 2], 'B': [-2, 1, 0, 1, 2]}
>>> df = pd.DataFrame(data)
>>> df
   A  B
0 -2 -2
1 -1  1
2  0  0
3  1  1
4  2  2
>>> df.clip(lower=10, upper=[1,1])
    A   B
0  10  10
1  10  10
2  10  10
3  10  10
4   1   1

Issue Description

This behavior is undocumented. When upper < lower and upper is not a scalar, the resulting DataFrame is inconsistent: in the example above, rows 0-3 are set to lower (10) while row 4 is set to upper (1). If lower is an array and upper is a scalar, every value is fixed at the scalar upper.

I'm not sure whether pandas should simply raise an error when the bounds are inverted or handle them some other way, but some consistency would be helpful.

Expected Behavior

Either raise an error when upper < lower, or handle the result consistently for scalar and list-like bounds.
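The first option could look roughly like the sketch below. `check_clip_bounds` is a hypothetical helper written for illustration, not an existing pandas function, and it assumes the bounds are numbers or list-likes that numpy can broadcast against each other.

```python
import numpy as np


def check_clip_bounds(lower, upper):
    """Hypothetical helper: raise if any lower bound exceeds its upper bound."""
    if lower is None or upper is None:
        return  # a single bound can never be inverted
    lo = np.asarray(lower)
    hi = np.asarray(upper)
    # Broadcasting lets scalars and list-likes be compared the same way.
    if np.any(lo > hi):
        raise ValueError("clip bounds are inverted: lower > upper")


check_clip_bounds(3, [7, 7])  # valid bounds pass silently

try:
    check_clip_bounds(10, [1, 1])  # the bounds from the example above
except ValueError as exc:
    print(exc)
```

Because the comparison goes through broadcasting, scalar/scalar, scalar/list and list/scalar combinations would all be rejected by the same rule.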

Installed Versions

INSTALLED VERSIONS
------------------
commit           : 2e218d10984e9919f0296931d92ea851c6a6faf5
python           : 3.10.9.final.0
python-bits      : 64
OS               : Darwin
OS-release       : 21.4.0
Version          : Darwin Kernel Version 21.4.0: Mon Feb 21 20:35:58 PST 2022; root:xnu-8020.101.4~2/RELEASE_ARM64_T6000
machine          : arm64
processor        : arm
byteorder        : little
LC_ALL           : None
LANG             : en_US.UTF-8
LOCALE           : en_US.UTF-8

pandas           : 1.5.3
numpy            : 1.24.2
pytz             : 2022.7.1
dateutil         : 2.8.2
setuptools       : 67.6.0
pip              : 23.0
Cython           : None
pytest           : 7.2.2
hypothesis       : None
sphinx           : None
blosc            : None
feather          : None
xlsxwriter       : None
lxml.etree       : None
html5lib         : None
pymysql          : None
psycopg2         : None
jinja2           : 3.1.2
IPython          : 8.10.0
pandas_datareader: None
bs4              : 4.11.2
bottleneck       : None
brotli           :
fastparquet      : None
fsspec           : 2023.3.0
gcsfs            : None
matplotlib       : 3.7.1
numba            : None
numexpr          : None
odfpy            : None
openpyxl         : None
pandas_gbq       : None
pyarrow          : 10.0.1
pyreadstat       : None
pyxlsb           : None
s3fs             : None
scipy            : 1.10.1
snappy           : None
sqlalchemy       : 2.0.4
tables           : None
tabulate         : None
xarray           : None
xlrd             : None
xlwt             : None
zstandard        : 0.19.0
tzdata           : None

Comment From: m-ganko

take

Comment From: m-ganko

OK, I tested the clip method and it is quite a mess.

>>> import pandas as pd
>>> import numpy as np

>>> data = {'A': range(10), 'B': range(10)}
>>> df = pd.DataFrame(data)
>>> df
   A  B
0  0  0
1  1  1
2  2  2
3  3  3
4  4  4
5  5  5
6  6  6
7  7  7
8  8  8
9  9  9

When lower < upper it works as expected:

>>> expected = df.clip(lower=3, upper=7)
>>> expected
   A  B
0  3  3
1  3  3
2  3  3
3  3  3
4  4  4
5  5  5
6  6  6
7  7  7
8  7  7
9  7  7
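>>> # all(df) only iterates over column labels, so reduce the boolean frame explicitly
>>> bool((expected == df.clip(lower=3, upper=[7, 7])).all().all())
True
>>> bool((expected == df.clip(lower=[3, 3], upper=7)).all().all())
True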

When we swap lower and upper, we get 3 different results.

>>> df.clip(lower=7, upper=[3, 3])  # 1
   A  B
0  7  7
1  7  7
2  7  7
3  7  7
4  3  3
5  3  3
6  3  3
7  3  3
8  3  3
9  3  3
>>> df.clip(lower=[7, 7], upper=3)  # 2
   A  B
0  3  3
1  3  3
2  3  3
3  3  3
4  3  3
5  3  3
6  3  3
7  3  3
8  3  3
9  3  3
>>> df.clip(lower=7, upper=3)  # 3
   A  B
0  3  3
1  3  3
2  3  3
3  3  3
4  4  4
5  5  5
6  6  6
7  7  7
8  7  7
9  7  7
  • Case #1 - I don't have an explanation for this one; the result is simply wrong.
  • Case #2 - this one works the same as numpy.clip: when lower > upper, all values are equal to upper.
  • Case #3 - here lower and upper are silently swapped, but this behavior is not documented.

In my opinion, there are two reasonable approaches:

  1. Follow the numpy approach (#2).
  2. Raise an exception when lower > upper.

I will start coding the first one, to be consistent with numpy. If you have another idea, discussion is welcome.
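To make the numpy-consistent option concrete, here is a minimal sketch of what it would return for the three inverted cases. `clip_like_numpy` is a hypothetical helper, not the actual pandas change; it assumes list-like bounds apply per column and uses the rule min(upper, max(x, lower)), which the numpy documentation gives as the equivalent of numpy.clip.

```python
import numpy as np
import pandas as pd


def clip_like_numpy(df, lower, upper):
    """Hypothetical helper: clip as min(upper, max(x, lower)), like numpy.clip."""
    # List-like bounds broadcast across the columns of the 2-D value array.
    values = np.minimum(np.maximum(df.to_numpy(), lower), upper)
    return pd.DataFrame(values, index=df.index, columns=df.columns)


df = pd.DataFrame({"A": range(10), "B": range(10)})

results = [
    clip_like_numpy(df, 7, [3, 3]),  # case 1
    clip_like_numpy(df, [7, 7], 3),  # case 2
    clip_like_numpy(df, 7, 3),       # case 3
]

# With inverted bounds, every variant collapses to the upper bound (3).
print(all(bool((r == 3).all().all()) for r in results))
```

Under this rule the scalar and list-like forms agree, so all three calls give the result currently seen only in case #2.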