This question already has answers here:
Find the unique values in a column and then sort them
(8 answers)
Closed 1 year ago.
I have a dataframe column which contains these values:
A
A
A
F
R
R
B
B
A
...
I would like to make a list of the different strings, such as [A, B, F, ...].
I've used groupby with nunique(), but I don't need the counts.
How can I make the list?
Thanks
unique() is enough
df['col'].unique().tolist()
pandas.Series.nunique() returns the number of unique items, not the items themselves.
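As a minimal sketch with made-up data mirroring the question (the column name 'col' is assumed):
import pandas as pd
df = pd.DataFrame({'col': ['A', 'A', 'A', 'F', 'R', 'R', 'B', 'B', 'A']})
# unique() keeps the values in order of first appearance
print(df['col'].unique().tolist())  # ['A', 'F', 'R', 'B']
# nunique() only gives the count of distinct values
print(df['col'].nunique())  # 4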
This question already has answers here:
Pandas Dataframe: split column into multiple columns, right-align inconsistent cell entries
(3 answers)
How to split a dataframe string column into two columns?
(11 answers)
Closed 4 months ago.
According to the docs https://pandas.pydata.org/docs/reference/api/pandas.Series.str.split.html, I want to split this one column of numbers into 2 columns on the default whitespace. However, the following doesn't appear to do anything.
self.data[0].str.split(expand=True)
The df has shape (1, 1), but I would like to split it into shape (1, 2).
Output
0
0 1.28353e-02 3.24985e-02
Desired Output
0 1
0 1.28353e-02 3.24985e-02
PS: I don't want to explicitly create columns A and B.
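One thing worth noting, as a sketch rather than a definitive answer: str.split(expand=True) returns a new DataFrame and does not modify the original column in place, so the result has to be assigned back (self.data is taken from the question; the sample value here is made up):
import pandas as pd
data = pd.DataFrame({0: ['1.28353e-02 3.24985e-02']})  # shape (1, 1), as in the question
# str.split returns a new frame; assign it back instead of discarding it
data = data[0].str.split(expand=True)
print(data.shape)  # (1, 2)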
This question already has answers here:
How to get unique values from multiple columns in a pandas groupby
(3 answers)
Python pandas unique value ignoring NaN
(4 answers)
Closed 1 year ago.
Imagine I have a table that looks like this.
original table
How do I convert it into this?
converted table
Attached sample data. Thanks.
This question already has answers here:
Pandas max value index
(3 answers)
Closed 2 years ago.
I'm looking for the highest row of a dataframe; the idea is to pick the highest value and its index. I'm trying to use this code:
data_q11.nlargest(144,['1980','2010'])
where data_q11 is the dataframe, 144 is the number of rows in this df, and ['1980', '2010'] is the range of columns.
However, the result is an empty frame of 0 rows × 31 columns.
There is a function in Pandas for the index of the maximum value:
data_q11['col'].idxmax()
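As a minimal sketch with invented numbers (the column names '1980' and '2010' are taken from the question):
import pandas as pd
data_q11 = pd.DataFrame({'1980': [3.1, 9.7, 5.2], '2010': [4.0, 1.5, 8.8]})
print(data_q11['1980'].idxmax())  # 1 -- index label of the row with the maximum value
print(data_q11['1980'].max())     # 9.7 -- the maximum value itself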
This question already has answers here:
Find column whose name contains a specific string
(8 answers)
Closed 2 years ago.
I have a pandas dataframe whose column headers contain information. I want to loop through the column headers and use logical operations on each header to extract the columns with the relevant information.
My df.columns command gives something like this:
['(param1:x)-(param2:y)-(param3:z1)',
'(param1:x)-(param2:y)-(param3:z2)',
'(param1:x)-(param2:y)-(param3:z3)']
I want to select only the columns which contain (param3:z1) and (param3:z3).
Is this possible?
You can use filter:
df = df.filter(regex='z1|z3')
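For example, using the headers from the question (the data itself is omitted, so this is just a sketch):
import pandas as pd
df = pd.DataFrame(columns=['(param1:x)-(param2:y)-(param3:z1)',
                           '(param1:x)-(param2:y)-(param3:z2)',
                           '(param1:x)-(param2:y)-(param3:z3)'])
# keep only the columns whose name matches the regex 'z1|z3'
df = df.filter(regex='z1|z3')
print(df.columns.tolist())
# ['(param1:x)-(param2:y)-(param3:z1)', '(param1:x)-(param2:y)-(param3:z3)']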
This question already has answers here:
Filter pandas DataFrame by substring criteria
(17 answers)
Closed 3 years ago.
I am trying to rewrite the following in one line using a list comprehension. I want to select only the cells that contain the substring '[edit]'. ut is my dataframe and the column that I want to select from is 'col1'. Thanks!
for u in ut['col1']:
    if '[edit]' in u:
        print(u)
I expect the following output:
Alabama[edit]
Alaska[edit]
Arizona[edit]
...
If the output of a Pandas Series is acceptable, then you can just use .str.contains with regex=False (so the brackets are matched literally rather than as a regex character class), without a loop:
s = ut[ut["col1"].str.contains("[edit]", regex=False)]
If you need to print each element of the Series separately, then loop over the Series using
for i in s:
    print(i)
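If you specifically want the one-line list comprehension asked for in the question, a sketch (assuming ut and 'col1' exist as described; the variable name is arbitrary):
edits = [u for u in ut['col1'] if '[edit]' in u]
print(edits)  # ['Alabama[edit]', 'Alaska[edit]', 'Arizona[edit]', ...]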