Pandas version checks
- [X] I have checked that this issue has not already been reported.
- [X] I have confirmed this bug exists on the latest version of pandas.
- [X] I have confirmed this bug exists on the main branch of pandas.
Reproducible Example
import numpy as np
import pandas as pd
# Create a single-column DataFrame
frame = pd.DataFrame(np.array([1, 2, 3, 4, 5, 100]))
# Call describe with a single percentile value
frame.describe(percentiles=[0.25])
Issue Description
Passing a single percentile value below 50% as percentiles to the DataFrame describe function also returns the 50th percentile by default, while the same is not reflected when the value is above 50%.
# Using the DataFrame from the example above
>>> frame.describe(percentiles = [0.25])
0
count 6.000000
mean 19.166667
std 39.625329
min 1.000000
25% 2.250000
50% 3.500000
max 100.000000
>>> frame.describe(percentiles = [0.35])
0
count 6.000000
mean 19.166667
std 39.625329
min 1.000000
35% 2.750000
50% 3.500000
max 100.000000
>>> frame.describe(percentiles = [0.51])
0
count 6.000000
mean 19.166667
std 39.625329
min 1.000000
50% 3.500000
51% 3.550000
max 100.000000
Expected Behavior
describe should return only the given percentile value.
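As a possible workaround until the behavior changes, the implicit median row can be dropped after the fact. This is only a minimal sketch: `describe_only` is a hypothetical helper name, not part of pandas, and it assumes the percentile row is labeled "50%" as in the output above.

```python
import numpy as np
import pandas as pd


def describe_only(frame, percentiles):
    """Hypothetical helper: describe() without an implicit median row."""
    summary = frame.describe(percentiles=percentiles)
    # Drop the "50%" row only when the caller did not ask for the median.
    # The membership check keeps this safe on versions that no longer add it.
    if 0.5 not in percentiles and "50%" in summary.index:
        summary = summary.drop(index="50%")
    return summary


frame = pd.DataFrame(np.array([1, 2, 3, 4, 5, 100]))
result = describe_only(frame, [0.25])
print(result)
```

The helper leaves count, mean, std, min and max untouched, so only the percentile rows differ from a plain describe call.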
Comment From: rhshadrach
Thanks for the report. This goes back to https://github.com/ivanovmg/pandas/commit/843aa600c8ed359f828c5a25d5e7130d6df1e08a and is indeed intentional.
But it is certainly not well documented, and I'm supportive of removing the behavior where we always include 0.5.
Comment From: kevkle
take
Comment From: yanweiSu
take
Comment From: Pushkar3232
Hi, I'm a first-time contributor and working on this issue. Could you please help me by pointing to the specific file or path where the .describe() function is implemented? I'm having trouble navigating the large repository and would appreciate some guidance on where to start. Thanks in advance for your help!
Comment From: rhshadrach
@Pushkar3232 - thanks for your interest, but this issue already has a PR up resolving it.
Comment From: Abhibhav2003
take
Comment From: Abhibhav2003
Refer to pull request #61024.
Comment From: preet545
take
Comment From: xaris96
take
Comment From: MartinBraquet
@rhshadrach and other maintainers, I have read differing opinions on the usefulness of the change requested here. In #60557, some comments advise preserving the median even when percentiles is not None.
If we remove the median when percentiles is not None, it raises the question of why we don't also remove the other default percentiles, namely the min and max. Indeed, the default percentiles rendered by .describe are the 0th, 25th, 50th, 75th and 100th ones. Keeping the 0th and 100th while discarding the 50th, when percentiles is passed, might seem arbitrary to some users. In some sense, this would make the judgment that the min and max are more important or valuable than the median. To me, this decision potentially makes sense; I am simply wondering whether such a judgment is also widely shared by users.
I took the liberty of handling this issue in #61158, as all related activity had been stale for weeks. If the utility of this issue is reconsidered in light of my comment above, I'll update my PR; I was asked to add more test coverage in any case.
Comment From: MartinBraquet
take