Remove the duplicated entry from this list [duplicate] - pandas

This question already has answers here:
Drop all duplicate rows across multiple columns in Python Pandas
(8 answers)
Closed 3 years ago.
Based on the example below, how can I remove just the last "A" from the list? Using drop_duplicates as I did deletes both entries. The end result should be A, B, C, but I get B, C.
import pandas as pd
df = pd.DataFrame({'ID': ["A", "B", "C", "A"]})
df.drop_duplicates(keep=False,inplace=True)
print(df)

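Set the parameter keep="first" instead of keep=False. With keep=False every row that has a duplicate is dropped, while keep="first" keeps the first occurrence and drops only the later repeats.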
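As a minimal sketch of that fix, using the same data as the question (the variable name deduped is only illustrative), the call below returns a new DataFrame rather than modifying df in place:

```python
import pandas as pd

df = pd.DataFrame({"ID": ["A", "B", "C", "A"]})

# keep="first" retains the first occurrence of each value and drops later repeats.
# It is also the default, so df.drop_duplicates() behaves the same way.
deduped = df.drop_duplicates(keep="first")

print(deduped["ID"].tolist())  # ['A', 'B', 'C']
```

Passing keep="last" instead would keep the final "A" at index 3 and drop the one at index 0.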

Related

How Should I skip Rows in Pandas DataFrame, that are above the Index Columns name? [duplicate]

This question already has answers here:
Pandas dataframe with multiindex column - merge levels
(4 answers)
Closed 2 months ago.
I want to set all column names at one line. How should I do it?
I tried many things but couldn't do it, including renaming columns.
You need to flatten your multiindex column header.
df.columns = df.columns.map('_'.join)
Or using f-string with list comprehension:
df.columns = [f'{i}_{j}' if j else f'{i}' for i, j in df.columns]
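The question did not include its data, so the following is only a sketch on a hypothetical two-level header (the column names sales, q1, q2 and region are made up for illustration). It shows how the two approaches differ when a second-level label is empty:

```python
import pandas as pd

# Hypothetical two-level column header; not taken from the original question.
columns = pd.MultiIndex.from_tuples([("sales", "q1"), ("sales", "q2"), ("region", "")])
df = pd.DataFrame([[1, 2, "north"]], columns=columns)

# Joining both levels unconditionally leaves a trailing underscore on "region".
joined = list(df.columns.map("_".join))

# The list comprehension skips the separator when the second level is empty.
flat = df.copy()
flat.columns = [f"{top}_{bottom}" if bottom else f"{top}" for top, bottom in flat.columns]

print(joined)              # ['sales_q1', 'sales_q2', 'region_']
print(list(flat.columns))  # ['sales_q1', 'sales_q2', 'region']
```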

How to print the value of a row that returns false using .isin method in python [duplicate]

This question already has answers here:
How do I select rows from a DataFrame based on column values?
(16 answers)
How to filter Pandas dataframe using 'in' and 'not in' like in SQL
(11 answers)
Closed 4 months ago.
I am new to writing code and am currently working on a project to compare two columns of an Excel sheet using Python and return the rows that do not match.
I tried the .isin function and was able to output the result of comparing the columns, but I am not sure how to print the actual row for which it returns False.
For Example:
import pandas as pd
data = ["Darcy Hayward","Barbara Walters","Ruth Fraley","Minerva Ferguson","Tad Sharp","Lesley Fuller","Grayson Dolton","Fiona Ingram","Elise Dolton"]
df = pd.DataFrame(data, columns=['Names'])
df
data1 = ["Darcy Hayward","Barbara Walters","Ruth Fraley","Minerva Ferguson","Tad Sharp","Lesley Fuller","Grayson Dolton","Fiona Ingram"]
df1 = pd.DataFrame(data1, columns=['Names'])
df1
data_compare = df["Names"].isin(df1["Names"])
for data in data_compare:
    if data==False:
        print(data)
However, I want to see that index 8 returned False.
Could you please advise how I can modify the code so that the output includes the index and name of each row that returned False?
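One possible approach, in line with the linked duplicates, is to use the boolean result of isin as a mask and invert it with ~. This is a sketch rather than the only way to do it; the names mask and missing are illustrative.

```python
import pandas as pd

names = ["Darcy Hayward", "Barbara Walters", "Ruth Fraley", "Minerva Ferguson",
         "Tad Sharp", "Lesley Fuller", "Grayson Dolton", "Fiona Ingram", "Elise Dolton"]
df = pd.DataFrame(names, columns=["Names"])
df1 = pd.DataFrame(names[:-1], columns=["Names"])  # same list without the last name

# True where the name also appears in df1, False otherwise.
mask = df["Names"].isin(df1["Names"])

# ~mask selects the rows where isin returned False; the original index is kept.
missing = df[~mask]

for idx, name in missing["Names"].items():
    print(idx, name)  # 8 Elise Dolton
```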

How to check if a Pandas Dataframe column contains a value? [duplicate]

This question already has answers here:
finding values in pandas series - Python3
(2 answers)
Closed 1 year ago.
I'd like to check if a pandas.DataFrame column contains a specific value. For instance, this toy Dataframe has a "h" in column "two":
import numpy as np
import pandas as pd
df = pd.DataFrame(
    np.array(list("abcdefghi")).reshape((3, 3)),
    columns=["one", "two", "three"]
)
df
  one two three
0   a   b     c
1   d   e     f
2   g   h     i
But surprisingly,
"h" in df["two"]
evaluates to False.
My question is: What's the clearest way to find out if a DataFrame column (or pandas.Series in general) contains a specific value?
df["two"] is a pandas.Series which looks like this:
0 b
1 e
2 h
It turns out, the in operator checks the index, not the values. I.e.
2 in df["two"]
evaluates to True
So one has to explicitly check for the values like this:
"h" in df["two"].values
This evaluates to True.
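The sketch below reproduces the behaviour described above and adds two alternatives that stay within pandas: an element-wise comparison followed by .any(), and isin. Which reads most clearly is a matter of taste.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    np.array(list("abcdefghi")).reshape((3, 3)),
    columns=["one", "two", "three"],
)

in_index = "h" in df["two"]            # checks the index labels 0, 1, 2
label_found = 2 in df["two"]           # 2 is an index label
in_values = "h" in df["two"].values    # checks the underlying array

# Alternatives that compare against the values directly.
by_comparison = bool((df["two"] == "h").any())
by_isin = bool(df["two"].isin(["h"]).any())

print(in_index, label_found, in_values, by_comparison, by_isin)
# False True True True True
```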

filter multiple separate rows in a DataFrame that meet the condition in another DataFrame with pandas? [duplicate]

This question already has answers here:
How to filter Pandas dataframe using 'in' and 'not in' like in SQL
(11 answers)
Closed 2 years ago.
This is my DataFrame
df = pd.DataFrame({'uid': [109200005, 108200056, 109200060, 108200085, 108200022],
                   'grades': [69.233627, 70.130900, 83.357011, 88.206387, 74.342212]})
This is my condition list which comes from another DataFrame
condition_list = [109200005, 108200085]
I use this code to filter records that meet the condition
idx_list = []
for i in condition_list:
    idx_list.append(df[df['uid']==i].index.values[0])
and get what I need
>>> df.iloc[idx_list]
         uid     grades
0  109200005  69.233627
3  108200085  88.206387
The job is done, but I'd like to know whether there is a simpler way to do it.
Yes, use isin:
df[df['uid'].isin(condition_list)]
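A runnable sketch with the data from the question follows (the name result is illustrative). One difference from the loop is worth noting: isin returns rows in the DataFrame's own order, not in the order of condition_list, and it returns every matching row rather than only the first match per value.

```python
import pandas as pd

df = pd.DataFrame({
    "uid": [109200005, 108200056, 109200060, 108200085, 108200022],
    "grades": [69.233627, 70.130900, 83.357011, 88.206387, 74.342212],
})
condition_list = [109200005, 108200085]

# Boolean mask: True for rows whose uid appears in condition_list.
result = df[df["uid"].isin(condition_list)]

print(result.index.tolist())   # [0, 3]
print(result["uid"].tolist())  # [109200005, 108200085]
```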

All column names not listed by df.columns [duplicate]

This question already has answers here:
pandas groupby without turning grouped by column into index
(3 answers)
Closed 2 years ago.
I wanted to perform a groupby and agg on my DataFrame, so I ran the code below:
basic_df = df.groupby(['S2PName','S2PName-Category'], sort=False)['S2PGTotal'].agg([('totSale','sum'), ('count','size')])
basic_df.head(2)
My output:
                           totSale  count
S2PName S2PName-Category
IDLY    Food             598771.47  19749
DOSA    Food             567431.03  14611
Now I try to print the columns using basic_df.columns.
My output:
Index(['totSale', 'count'], dtype='object')
Why are the other two columns, "S2PName" and "S2PName-Category", not being displayed? What do I need to do to display them as well?
Thanks!
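By default, groupby turns the grouping keys into the index of the result, so they are not listed in basic_df.columns. Add as_index=False to groupby, or call reset_index() at the end: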
basic_df = df.groupby(['S2PName','S2PName-Category'], sort=False,as_index=False)['S2PGTotal'].agg([('totSale','sum'), ('count','size')])
#basic_df = df.groupby(['S2PName','S2PName-Category'], sort=False)['S2PGTotal'].agg([('totSale','sum'), ('count','size')]).reset_index()
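The question's data is not available, so the sketch below uses a small hypothetical frame with the same column names. It takes the reset_index() route and uses keyword-based named aggregation in place of the list of tuples, which is a substitution on my part and not what the answer wrote.

```python
import pandas as pd

# Hypothetical data; only the column names come from the question.
df = pd.DataFrame({
    "S2PName": ["IDLY", "DOSA", "IDLY"],
    "S2PName-Category": ["Food", "Food", "Food"],
    "S2PGTotal": [10.0, 20.0, 5.0],
})

basic_df = (
    df.groupby(["S2PName", "S2PName-Category"], sort=False)["S2PGTotal"]
      .agg(totSale="sum", count="size")  # named aggregation: new_name=function
      .reset_index()                     # move the group keys from the index to columns
)

print(list(basic_df.columns))
# ['S2PName', 'S2PName-Category', 'totSale', 'count']
```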