Code Sample, a copy-pastable example if possible
dataset[dataset.Column1 == "abc123"]  # this works fine
dataset.loc[dataset.Column1.startswith("abc")]  # this fails
Problem description
Traceback (most recent call last):
File "
Overall intent: To use a filtering function or a lambda to select a row or reject it.
Expected Output
Just as == is defined on a Series to compare each element, startswith and other functions, such as a regex match, could also be defined to work element-wise.
Output of pd.show_versions()
Comment From: chris-b1
Please make this a reproducible example, thanks.
Comment From: jorisvandenbossche
See your error message: a pandas Series has no method startswith. There is however a .str.startswith() method, see http://pandas.pydata.org/pandas-docs/stable/text.html. So you can do: dataset.Column1.str.startswith("abc")
But maybe that's not your point, and you want to raise a more general issue. But then indeed, please make a reproducible example with what you want to work and the desired output.
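For illustration, a minimal sketch of the .str.startswith() approach. The data below is hypothetical; only the column name Column1 comes from the report.

```python
import pandas as pd

# Hypothetical data; only the column name Column1 comes from the thread.
dataset = pd.DataFrame({"Column1": ["abc123", "abc456", "xyz789"]})

# .str.startswith returns a boolean Series, which can be used as a row mask.
mask = dataset.Column1.str.startswith("abc")
filtered = dataset.loc[mask]
print(filtered)
```

If the column can contain missing values, passing na=False to .str.startswith() should keep the mask strictly boolean.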
Comment From: amitavmohanty01
As per your suggestion, dataset.loc[dataset.Column1.str.startswith("abc")] works fine. How about considering dataset[re.search("^[A-Z]*", dataset.Column1.str).group(0) == "IAD"] for the more general case that I want to achieve?
Comment From: jorisvandenbossche
> How about considering dataset[re.search("^[A-Z]*", dataset.Column1.str).group(0) == "IAD"] for the more general case that I want to achieve?

re.search does not accept pandas Series objects, and that is not something we can change.
Closing this, as I don't think there is issue here for pandas. But feel free to further clarify if you disagree with that.
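To make the point concrete, here is a small sketch showing that re.search rejects a Series, together with one possible element-wise workaround using Series.apply. The data is hypothetical; only the column name and the pattern come from the thread.

```python
import re
import pandas as pd

# Hypothetical data; only the column name Column1 comes from the thread.
dataset = pd.DataFrame({"Column1": ["IAD123", "SFO456", "IAD789"]})

# re.search works on a single string, so passing a Series raises TypeError.
try:
    re.search("^[A-Z]*", dataset.Column1)
    raised = False
except TypeError:
    raised = True

# Running the regex on each element yields a Series that can be compared.
prefix = dataset.Column1.apply(lambda x: re.search("^[A-Z]*", x).group(0))
selected = dataset[prefix == "IAD"]
print(selected)
```

This sketch assumes every value in the column is a string; a missing value would make the lambda fail.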
Comment From: amitavmohanty01
Thanks for the help and direction. I was able to use the apply function as follows to get close to what I want: dataset[dataset.Column1.apply(lambda x : re.search("^[A-Z]*", x).group(0)) == "xYz"]. However, dataset[dataset.Column1.apply(lambda x : re.search("^[A-Z]*", x).group(0)) == dataset.Column2.apply(lambda x : re.search("^[A-Z]*", x).group(0))] did not work out.
Comment From: jorisvandenbossche
Note that there is a Series.str.extract which basically does a re.search. So I think you should be able to do something like:
dataset[dataset.Column1.str.extract("^[A-Z]*") == "xYz"]
instead of using the apply.
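A hedged sketch of the str.extract route follows. As far as I know, recent pandas versions require the pattern to contain at least one capture group, and expand=False is needed to get a Series back instead of a one-column DataFrame, so both are assumed here. The data is hypothetical.

```python
import pandas as pd

# Hypothetical data; only the column name Column1 comes from the thread.
dataset = pd.DataFrame({"Column1": ["IAD123", "SFO456", "IAD789"]})

# The parentheses form the capture group that str.extract returns.
# expand=False makes a single-group extract return a Series.
prefix = dataset.Column1.str.extract("^([A-Z]*)", expand=False)
selected = dataset[prefix == "IAD"]
print(selected)
```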
Comment From: amitavmohanty01
The intent is to select rows that meet a criterion. The criterion is re(column1).group(n) == re(column2).group(m). Am I approaching the problem incorrectly?
Comment From: amitavmohanty01
same = dataset[dataset.Column1.str.extract("^([A-Z]*)") == dataset.Column2.str.extract("^([A-Z]*)")]
This solved my purpose. Thanks for the help.
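For readers on newer pandas versions, a self-contained sketch of the same two-column comparison. It assumes expand=False so that each extract returns a Series and the comparison produces a row mask; the data and the names left and right are hypothetical.

```python
import pandas as pd

# Hypothetical data; only the column names come from the thread.
dataset = pd.DataFrame({
    "Column1": ["IAD123", "SFO456", "JFK789"],
    "Column2": ["IAD999", "LAX000", "JFK111"],
})

# Extract the leading uppercase prefix of each column as a Series.
left = dataset.Column1.str.extract("^([A-Z]*)", expand=False)
right = dataset.Column2.str.extract("^([A-Z]*)", expand=False)

# Keep only the rows where both prefixes agree.
same = dataset[left == right]
print(same)
```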