How do I create lists of items for every unique ID in a Pandas DataFrame? [duplicate]

This question already has answers here:
How to get unique values from multiple columns in a pandas groupby
(3 answers)
Python pandas unique value ignoring NaN
(4 answers)
Closed 1 year ago.
Imagine I have a table that looks like this.
original table
How do I convert it into this?
converted table
Attached sample data. Thanks.
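The two tables did not survive on this page, so the following is only a minimal sketch of one common reading of the question: collect the distinct items seen for each ID into a list. The column names ID and item and the sample rows are hypothetical, not taken from the original data.

```python
import pandas as pd

# Hypothetical stand-in for the missing "original table".
df = pd.DataFrame({
    "ID": [1, 1, 2, 2, 2, 3],
    "item": ["a", "b", "a", "a", None, "c"],
})

# One row per ID; each cell holds the unique, non-null items
# in order of first appearance.
items_per_id = (
    df.dropna(subset=["item"])
      .groupby("ID")["item"]
      .agg(lambda s: s.unique().tolist())
      .reset_index()
)
print(items_per_id)
```

If duplicates should be kept, .agg(list) can be used in place of the lambda.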

Related

Create a new column with value 1/0 based on other column value in pandas [duplicate]

This question already has answers here:
Adding a new pandas column with mapped value from a dictionary [duplicate]
(1 answer)
Pandas conditional creation of a series/dataframe column
(13 answers)
Mapping values in place (for example with Gender) from string to int in Pandas dataframe [duplicate]
(3 answers)
Closed 9 hours ago.
I want to create a column with values 1 for female, 0 for male based on the gender column in Pandas.
Is using a for loop efficient?
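A for loop is usually not needed here, since pandas can apply the mapping to the whole column at once. The sketch below assumes the column is named gender and holds the strings "female" and "male"; the new column name is_female is made up for illustration.

```python
import pandas as pd

# Hypothetical sample data; the real values in the gender column may differ.
df = pd.DataFrame({"gender": ["female", "male", "male", "female"]})

# Option 1: map each label to its code with a dictionary.
# Labels missing from the dictionary become NaN.
df["is_female"] = df["gender"].map({"female": 1, "male": 0})

# Option 2: a vectorized comparison converted to integers.
# Anything that is not "female" becomes 0.
is_female_alt = (df["gender"] == "female").astype(int)

print(df)
```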

Pandas - list of unique strings in a column [duplicate]

This question already has answers here:
Find the unique values in a column and then sort them
(8 answers)
Closed 1 year ago.
I have a DataFrame column which contains these values:
A
A
A
F
R
R
B
B
A
...
I would like to make a list of the distinct strings, such as [A, B, F, ...].
I've used groupby with nunique(), but I don't need the counts.
How can I make the list?
Thanks
unique() is enough:
df['col'].unique().tolist()
pandas.Series.nunique() returns only the number of unique items, not the items themselves.
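As a small illustration using the values from the question (the column name col is an assumption), unique() keeps the order of first appearance, so an alphabetical list like the one asked for needs an extra sort:

```python
import pandas as pd

# The values listed in the question, in a column assumed to be named "col".
df = pd.DataFrame({"col": ["A", "A", "A", "F", "R", "R", "B", "B", "A"]})

# Distinct values in order of first appearance.
in_order = df["col"].unique().tolist()

# The same values sorted alphabetically.
sorted_values = sorted(df["col"].unique())

# nunique() gives only the count of distinct values.
n_distinct = df["col"].nunique()

print(in_order, sorted_values, n_distinct)
```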

How to use nlargest in pandas? [duplicate]

This question already has answers here:
Pandas max value index
(3 answers)
Closed 2 years ago.
I'm looking for the highest row of a DataFrame; the idea is to pick the highest value and its index. I'm trying to use this code:
data_q11.nlargest(144,['1980','2010'])
where data_q11 is the DataFrame, 144 is the number of rows in it, and ['1980','2010'] is meant as the range of columns.
However, the result is an empty DataFrame with 0 rows × 31 columns.
pandas has a function that returns the index of the maximum value. Called on a single column (a Series), it does not accept axis=1:
data_q11['col'].idxmax()
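A minimal sketch of getting both the maximum value and its index label. The DataFrame below is made up; only the column names '1980' and '2010' come from the question.

```python
import pandas as pd

# Hypothetical data with year columns like those in the question.
data = pd.DataFrame(
    {"1980": [3.0, 9.5, 1.2], "2010": [4.1, 7.0, 8.8]},
    index=["a", "b", "c"],
)

# Index label of the largest value in one column, and the value itself.
row_label = data["1980"].idxmax()
max_value = data["1980"].max()

# Index label of the largest value in every column at once.
labels_per_column = data.idxmax()

# nlargest returns the whole row(s) with the largest values in a column.
top_row = data.nlargest(1, "1980")

print(row_label, max_value)
print(labels_per_column)
print(top_row)
```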

Extracting information from Pandas dataframe column headers [duplicate]

This question already has answers here:
Find column whose name contains a specific string
(8 answers)
Closed 2 years ago.
I have a pandas DataFrame whose column headers contain information. I want to loop through the column headers and apply logical operations to each header to extract the columns that hold the information I need.
My df.columns output looks something like this:
['(param1:x)-(param2:y)-(param3:z1)',
'(param1:x)-(param2:y)-(param3:z2)',
'(param1:x)-(param2:y)-(param3:z3)']
I want to select only the columns that contain (param3:z1) or (param3:z3).
Is this possible?
You can use filter:
df = df.filter(regex='z1|z3')
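The pattern 'z1|z3' matches those substrings anywhere in a header, so it could also pick up unrelated columns. A sketch of a stricter pattern, using the three headers from the question with made-up cell values:

```python
import pandas as pd

# Column headers from the question; the row of values is invented.
cols = [
    "(param1:x)-(param2:y)-(param3:z1)",
    "(param1:x)-(param2:y)-(param3:z2)",
    "(param1:x)-(param2:y)-(param3:z3)",
]
df = pd.DataFrame([[1, 2, 3]], columns=cols)

# Match the full "(param3:z1)" / "(param3:z3)" token; parentheses are escaped
# because they are regex metacharacters.
selected = df.filter(regex=r"\(param3:(?:z1|z3)\)")

# Equivalent selection with a boolean mask over the column names.
mask = df.columns.str.contains(r"\(param3:(?:z1|z3)\)")
selected_alt = df.loc[:, mask]

print(list(selected.columns))
```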

pyspark sql dataframe keep only null [duplicate]

This question already has answers here:
Filter Pyspark dataframe column with None value
(10 answers)
Closed 6 years ago.
I have a SQL DataFrame df with a column user_id. How do I filter the DataFrame and keep only the rows where user_id is actually null, for further analysis? The pyspark module page here shows how to drop NA rows easily, but it does not say how to do the opposite.
I tried df.filter(df.user_id == 'null'), but the result is 0 columns. Maybe it is looking for the string "null". Also, df.filter(df.user_id == null) won't work, as it looks for a variable named 'null'.
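Try isNull(). Under SQL semantics, comparing a column to null with == never evaluates to true, so an equality filter cannot select these rows: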
df.filter(df.user_id.isNull())