This behavior is obviously by design. Why choose to break substitutability?
>>> bool([])
False
>>> bool(numpy.array([]))
False
>>> bool(pandas.Series(numpy.array([])))
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/pandas/core/generic.py", line 892, in __nonzero__
.format(self.__class__.__name__))
ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().
I ask because I'd like to write idiomatic Python that works for all containers, without needing to special-case Pandas Series and DataFrames. There's no ambiguity for [False] nor numpy.array([False]) ... oh.
>>> bool([False])
True
>>> bool(numpy.array([False]))
False
Ok, NumPy made an interesting choice here. But Pandas could reverse it and go back to standard truthiness: len() == 0 means False and len() > 0 means True.
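The "standard truthiness" proposed here is what Python gives any container for free: if a class defines no __bool__, bool() falls back to __len__. A minimal sketch with a hypothetical Bag container (not part of pandas or the original discussion):

```python
# Hypothetical container: with no __bool__ defined, bool() falls
# back to __len__, so truthiness is purely length-based.
class Bag:
    def __init__(self, items):
        self._items = list(items)

    def __len__(self):
        return len(self._items)

print(bool(Bag([])))       # len() == 0 -> False
print(bool(Bag([False])))  # len() > 0 -> True, regardless of the values inside
```

This is exactly why bool([False]) is True: the built-in list only reports whether it has a length, never inspects its contents.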
I don't expect to actually cause a change here, but I'm curious as to what discussion happened on this point. If there's an archived email thread about this, could someone point me to it?
Comment From: chris-b1
Note that numpy also raises for anything but a 1-element array.
In [31]: bool(np.array([False, False]))
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
<ipython-input-31-32cc592d060f> in <module>()
----> 1 bool(np.array([False, False]))
ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()
Comment From: jreback
pretty clear explanation here
Comment From: selik
That wiki section asks some rhetorical questions to illustrate that the truthiness of a non-empty Series is ambiguous. However, I think those questions have very clear answers:
Should it be True because it’s not zero-length?
Yes. This is a very clear standard from the core Python language and the standard library.
>>> bool([False])
True
False because there are False values?
No, non-empty means True
. Again, this is a strong standard. There's no ambiguity here.
Obviously, Pandas can't change this, because it needs to stay consistent with NumPy. But if someday NumPy were to flip on this issue, I'd like to open it back up for discussion.
Comment From: jreback
@selik pls re-read the faq. These are very deliberate choices that ARE different from the standard library for good reason.
The point is that for a list-like you NEED to use .all/.any to disambiguate. It HAS to be a deliberate choice. Simply doing if df: is completely misleading, non-sensical, and is by definition different from what the standard library does, BECAUSE the standard library doesn't care about the contents of things (e.g. it doesn't care about the VALUES contained in a list), just that it has a length. This is pretty much useless for array computing.
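The point about forcing a deliberate choice can be made concrete: pandas offers a separate accessor for each question that if s: might plausibly have meant. A sketch (assuming pandas is installed):

```python
import pandas as pd

s = pd.Series([False, True])

# Each of the meanings "if s:" could have is spelled out explicitly:
print(s.empty)   # length-based question: is the Series zero-length?
print(s.any())   # does any element hold?
print(s.all())   # do all elements hold?
```

For this Series the three answers differ (False, True, False), which is the ambiguity the ValueError is guarding against.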
Comment From: jorisvandenbossche
@selik If you want idiomatic code that works for any sort of container, I think you can do something like if len(some_container): ... instead of if some_container: ...? (if non-zero length is what you want to check here)
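This suggestion does work uniformly across the containers discussed in the thread, since all of them support len() even where bool() raises. A quick check (assuming NumPy and pandas are installed):

```python
import numpy as np
import pandas as pd

# len() answers the length question directly, so no container needs
# to guess what truthiness should mean:
for container in ([False], np.array([False, False]), pd.Series([False])):
    if len(container):
        print(type(container).__name__, "is non-empty")
```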
Comment From: selik
@jorisvandenbossche I wouldn't say if len(container): is Pythonic, but perhaps it's the best option for this situation.
@jreback The choice makes sense for the emphasis on vectorized methods/functions, but causes usability problems for libraries that are primarily not for vectors. I suppose you might say this is a "no free lunch" API design situation and it's better to design for vectorized usage.