Selecting multiple index values in a pandas series [duplicate] - pandas

This question already has answers here:
How to filter Pandas dataframe using 'in' and 'not in' like in SQL
(11 answers)
Closed 3 years ago.
Suppose I have a Pandas Series:
import pandas as pd
foo = pd.Series(data=[1,2,3], index=['a','b','c'])
foo
a 1
b 2
c 3
dtype: int64
Comparing the index to a value returns a nice selector array:
foo.index == 'c'
array([False, False, True], dtype=bool)
What is the expression for a selector array for 'a' and 'c' ([True, False, True])?
Not this:
foo.index in ['a', 'c']
ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()
This is a simple example, but the true one is more complicated, and I want to select 10 or 15 items, so I'd like a concise format, ideally listing the elements I want to select by name.
I'm using pandas 0.23.4.

You can use:
foo.index.isin(['a','b'])
This returns a boolean selector array, here [True, True, False] for 'a' and 'b'. Change the list to select other labels; for example, foo.index.isin(['a','c']) gives [True, False, True].
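The sketch below uses the labels 'a' and 'c' from the question. It shows the mask, how to apply it, and how to invert it with ~. foo.loc[wanted] is included as a label-based alternative for when you only need the rows themselves. Note that in recent pandas versions (1.0+), .loc raises a KeyError if a listed label is missing, whereas isin simply returns False for it.

```python
import pandas as pd

foo = pd.Series(data=[1, 2, 3], index=['a', 'b', 'c'])

# Boolean selector array: True where the index label is in the list
wanted = ['a', 'c']
mask = foo.index.isin(wanted)
print(mask)

# Use the mask to pull out the matching entries
selected = foo[mask]
print(selected)

# Invert the mask with ~ to get everything NOT in the list
print(foo[~mask])

# Label-based alternative when you only need the rows themselves
print(foo.loc[wanted])
```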

Related

How to print the value of a row that returns false using .isin method in python [duplicate]

This question already has answers here:
How do I select rows from a DataFrame based on column values?
(16 answers)
How to filter Pandas dataframe using 'in' and 'not in' like in SQL
(11 answers)
Closed 4 months ago.
I am new to writing code and am currently working on a project that uses Python to compare two columns of an Excel sheet and return the rows that do not match.
I tried the .isin function and was able to compare the columns and output the resulting values. However, I am not sure how to print the actual row that returns the value "False".
For example:
import pandas as pd
data = ["Darcy Hayward","Barbara Walters","Ruth Fraley","Minerva Ferguson","Tad Sharp","Lesley Fuller","Grayson Dolton","Fiona Ingram","Elise Dolton"]
df = pd.DataFrame(data, columns=['Names'])
df
data1 = ["Darcy Hayward","Barbara Walters","Ruth Fraley","Minerva Ferguson","Tad Sharp","Lesley Fuller","Grayson Dolton","Fiona Ingram"]
df1 = pd.DataFrame(data1, columns=['Names'])
df1
data_compare = df["Names"].isin(df1["Names"])
for data in data_compare:
    if data == False:
        print(data)
However, I want to see that index 8 returned False, printed together with its index and name.
Could you please advise how I can modify the code to print the index and name of each row that returned False?
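Here is a minimal sketch of one common approach. It is not taken from the thread itself, but it is the boolean indexing covered by the linked duplicate. Negate the isin mask with ~ and index the DataFrame with it, so the non-matching rows keep their original index. The name `missing` and the print format are illustrative choices.

```python
import pandas as pd

data = ["Darcy Hayward", "Barbara Walters", "Ruth Fraley", "Minerva Ferguson",
        "Tad Sharp", "Lesley Fuller", "Grayson Dolton", "Fiona Ingram", "Elise Dolton"]
df = pd.DataFrame(data, columns=['Names'])

data1 = ["Darcy Hayward", "Barbara Walters", "Ruth Fraley", "Minerva Ferguson",
         "Tad Sharp", "Lesley Fuller", "Grayson Dolton", "Fiona Ingram"]
df1 = pd.DataFrame(data1, columns=['Names'])

# True where the name also appears in df1
data_compare = df["Names"].isin(df1["Names"])

# ~ flips the mask, keeping only the rows that returned False
missing = df[~data_compare]
print(missing)

# Print index and name pairs, one per line
for idx, name in missing["Names"].items():
    print(idx, name)
```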

Does a sorted dataframe keep its order after groupby? [duplicate]

This question already has answers here:
Take the positive value for a primary key in case of duplicates
(1 answer)
Pandas filter maximum groupby
(2 answers)
Closed 7 months ago.
I would like to keep the latest entry per group in a dataframe:
from datetime import date
import pandas as pd
data = [
['A', date(2018,2,1), "I want this"],
['A', date(2018,1,1), "Don't want"],
['B', date(2019,4,1), "Don't want"],
['B', date(2019,5,1), "I want this"]]
df = pd.DataFrame(data, columns=['name', 'date', 'result'])
The following does what I want (I found it in another answer; credit to its author):
df.sort_values('date').groupby('name').tail(1)
name date result
0 A 2018-02-01 I want this
3 B 2019-05-01 I want this
But how do I know that the order is always preserved when you do a groupby on a sorted DataFrame like df? Is this documented somewhere?
No, it won't. Try replacing A with Z to see it.
Use sort=False:
df.sort_values('date').groupby('name', sort=False).tail(1)
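You can check this on your own pandas version by running the experiment the answer suggests: rename group 'A' to 'Z' and compare both calls. The DataFrame.groupby docstring says the `sort` parameter "does not influence the order of observations within each group". The GroupBy.tail documentation says it returns rows of the original DataFrame with the original index and order preserved. So in the versions I am aware of, both calls below select the same rows. Treat this as something to verify against the version you use.

```python
from datetime import date
import pandas as pd

# The question's data, with group 'A' renamed to 'Z' as the answer suggests
data = [
    ['Z', date(2018, 2, 1), "I want this"],
    ['Z', date(2018, 1, 1), "Don't want"],
    ['B', date(2019, 4, 1), "Don't want"],
    ['B', date(2019, 5, 1), "I want this"]]
df = pd.DataFrame(data, columns=['name', 'date', 'result'])

# Default groupby (sort=True) versus sort=False, both after sorting by date
default = df.sort_values('date').groupby('name').tail(1)
unsorted = df.sort_values('date').groupby('name', sort=False).tail(1)
print(default)
print(unsorted)
```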

count the number of strings in a 2-D pandas series [duplicate]

This question already has answers here:
How do I count the values from a pandas column which is a list of strings?
(5 answers)
Closed 11 months ago.
I am trying to count the number of characters in an uneven 2-D pandas series.
df = pd.DataFrame({ 'A' : [['a','b'],['a','c','f'],['a'], ['b','f']]})
I want to count the number of times each character is repeated.
Any ideas?
You can use explode() and value_counts().
import pandas as pd
df = pd.DataFrame({ 'A' : [['a','b'],['a','c','f'],['a'], ['b','f']]})
df = df.explode("A")
print(df.value_counts())
Expected output:
A
a 3
b 2
f 2
c 1
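A variant of the same idea works on the column alone, so the result is a plain Series indexed by the strings. As far as I know, Series.explode needs pandas 0.25+, and DataFrame.value_counts (used in the answer) needs pandas 1.1+. Empty lists explode to NaN, and value_counts drops NaN by default.

```python
import pandas as pd

df = pd.DataFrame({'A': [['a', 'b'], ['a', 'c', 'f'], ['a'], ['b', 'f']]})

# Explode just the column: one row per list element, original index repeated
exploded = df['A'].explode()

# Count occurrences of each string
counts = exploded.value_counts()
print(counts)
```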

How to check if a Pandas Dataframe column contains a value? [duplicate]

This question already has answers here:
finding values in pandas series - Python3
(2 answers)
Closed 1 year ago.
I'd like to check if a pandas.DataFrame column contains a specific value. For instance, this toy DataFrame has an "h" in column "two":
import numpy as np
import pandas as pd
df = pd.DataFrame(
np.array(list("abcdefghi")).reshape((3, 3)),
columns=["one", "two", "three"]
)
df
one two three
0 a b c
1 d e f
2 g h i
But surprisingly,
"h" in df["two"]
evaluates to False.
My question is: What's the clearest way to find out if a DataFrame column (or pandas.Series in general) contains a specific value?
df["two"] is a pandas.Series which looks like this:
0 b
1 e
2 h
It turns out that the in operator checks the index, not the values, i.e.
2 in df["two"]
evaluates to True.
So one has to explicitly check for the values like this:
"h" in df["two"].values
This evaluates to True.
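The sketch below compares the index check with a few value checks on the same column. (col == "h").any() and isin are standard pandas alternatives to going through .values; isin is convenient when there are several candidate values. The extra value "z" is only an illustrative second candidate.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    np.array(list("abcdefghi")).reshape((3, 3)),
    columns=["one", "two", "three"]
)
col = df["two"]

# `in` on a Series looks at the index labels (0, 1, 2)
print("h" in col)   # False
print(2 in col)     # True

# Checking the values instead
print("h" in col.values)            # True: membership test on the NumPy array
print((col == "h").any())           # True: element-wise comparison, then any()
print(col.isin(["h", "z"]).any())   # True: several candidate values at once
```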

Different brackets in pandas DataFrame.loc [duplicate]

This question already has answers here:
Pandas selecting by label sometimes return Series, sometimes returns DataFrame
(8 answers)
Closed 4 years ago.
What is the difference in using loc[x,y] vs. loc[x][y] vs. loc[[x]][y]? They seem quite similar at first glance.
import numpy as np
import pandas as pd
df = pd.DataFrame(np.arange(6).reshape(3, 2),
columns=['price', 'count'],
index=['First', 'Second', 'Third'])
print(df)
# price count
# First 0 1
# Second 2 3
# Third 4 5
print(df.loc['Second', 'count'])
# 3
print(df.loc['Second']['count'])
# 3
print(df.loc[['Second'], 'count'])
# Second 3
Although the first two produce the same output, the second form is called chained indexing:
http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy
In the second form, df.loc['Second'] on its own returns a Series:
In[48]:
type(df.loc['Second'])
Out[48]: pandas.core.series.Series
You then index that Series by the label 'count', which returns the scalar value:
In[47]:
df.loc['Second']
Out[47]:
price 2
count 3
Name: Second, dtype: int32
In[49]:
df.loc['Second']['count']
Out[49]: 3
Regarding the last one, the additional brackets return a DataFrame, which is why you see the index value rather than just a scalar:
In[44]:
type(df.loc[['Second']])
Out[44]: pandas.core.frame.DataFrame
Passing the column label then selects that column from this DataFrame and returns it as a Series:
In[46]:
type(df.loc[['Second'],'count'])
Out[46]: pandas.core.series.Series
So it depends on what you want to achieve, but avoid the second form, as it can lead to unexpected behaviour when you try to assign to the column or DataFrame.
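Here is a minimal sketch of the safe way to assign: put the row and column labels in a single .loc call. The chained form is left as a comment only, because its effect depends on the pandas version. It may raise a SettingWithCopyWarning or leave df unchanged, and with copy-on-write enabled it never updates df. The value 10 is arbitrary.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.arange(6).reshape(3, 2),
                  columns=['price', 'count'],
                  index=['First', 'Second', 'Third'])

# Single .loc call with row and column labels: assigns into df itself
df.loc['Second', 'count'] = 10
print(df)

# The same pattern with a list of row labels updates several cells at once
df.loc[['First', 'Third'], 'count'] = 0
print(df)

# Avoid: df.loc['Second']['count'] = 10
# df.loc['Second'] may be a copy, so the chained form may not modify df.
```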