How to Select Rows and Columns of Dataframe?

How to Select Rows and Columns of Dataframe? - pandas

I have a dataframe with 4 columns and 6 rows, I want to select 2nd and 4th columns and 1st and 6th rows of this dataframe and create a new dataframe. How can i do this?

You can use the code given below, to do this but you have to be careful with the fact that in pandas the indexing starts from 0 which is the first column or could be first row when you are retrieving it.
>>> import pandas as pd
>>>
>>> dictA = {'col1': ['tom', 10,20,56,2,3,4],'col2': ['tom', 10,20,56,2,3,4],'col3': ['tom', 10,20,56,2,3,4],'col4': ['tom', 10,20,56,2,3,4]}
>>>
... dfA = pd.DataFrame(dictA)
>>> dfA
col1 col2 col3 col4
0 tom tom tom tom
1 10 10 10 10
2 20 20 20 20
3 56 56 56 56
4 2 2 2 2
5 3 3 3 3
6 4 4 4 4
>>> new_df = dfA.iloc[[0,5],[2,3]]
>>> new_df
col3 col4
0 tom tom
5 3 3
For more details have a look here

Related

Convert multiple columns in pandas dataframe to array of arrays

I have the following dataframe:
col1 col2 col3
1 1 2 3
2 4 5 6
3 7 8 9
4 10 11 12
I want to create a new column that will be an array of arrays, that contains a single array consisting of specific columns, casted to float.
So given column names, say "col2" and "col3", the output dataframe would look like this.
col1 col2 col3 new
1 1 2 3 [[2,3]]
2 4 5 6 [[5,6]]
3 7 8 9 [[8,9]]
4 10 11 12 [[11,12]]
What I have so far works, but seems clumsy and I believe there's a better way. I'm fairly new to pandas and numpy.
selected_columns = ["col2", "col3"]
df[selected_columns] = df[selected_columns].astype(float)
df['new'] = df.apply(lambda r: tuple(r[selected_columns]), axis=1)
.apply(np.array)
.apply(lambda r: tuple(r[["new"]]), axis=1)
.apply(np.array)
Appreciate your help, Thanks!

Using agg:
cols = ['col2', 'col3']
df['new'] = df[cols].agg(list, axis=1)
Using numpy:
df['new'] = df[cols].to_numpy().tolist()
Output:
col1 col2 col3 new
1 1 2 3 [2, 3]
2 4 5 6 [5, 6]
3 7 8 9 [8, 9]
4 10 11 12 [11, 12]
2D lists
cols = ['col2', 'col3']
df['new'] = df[cols].agg(lambda x: [list(x)], axis=1)
# or
df['new'] = df[cols].to_numpy()[:,None].tolist()
Output:
col1 col2 col3 new
1 1 2 3 [[2, 3]]
2 4 5 6 [[5, 6]]
3 7 8 9 [[8, 9]]
4 10 11 12 [[11, 12]]

pandas rolling calc count function

import pandas as pd
import numpy as np
df = pd.DataFrame({'col1': np.arange(6), 'col2': np.arange(2, 8)})
col1
col2
0
3
1
4
2
5
3
6
4
7
and i want get column col3 if condition with after rolling.tail(3) and re turn count(col1>=3 and col2>=3)
the last result i want it likes：
col1
col2
col3
reason
0
3
0
1
4
1
[(3>=3 and 6>=3)]
2
5
2
[(3>=3 and 6>=3),(4>=3 and 7>=3)]
3
6
nan
4
7
nan
Hope to get your reply as soon as possible

Add values in columns of multiple dataframes if values in another column are same

Question related to pandas dataframe
df1:
id count
1 3
2 7
3 11
df2:
id count
3 6
4 8
5 2
df3:
id count
2 1
4 3
6 9
Expected output df:
id count
1 3
2 8
3 17
4 11
5 2
6 9
Any help is appreciated &
Thanks in advance!

Use concat and aggregate sum:
df = pd.concat([df1, df2, df3]).groupby('id', as_index=False).sum()

Combine multiple columns into two columns: "column name" and "value"

There is probably an easy way of doing this, so I hope someone has a nice solution (currently I am doing it with ugly for loops).
My data looks like:
In [1]: df = pd.DataFrame({'Ref': [5, 6, 7],
'Col1': [10,11,12],
'Col2': [20,21,22],
'Col3': [30,31,32]})
In [2]: df
Out[2]:
Col1 Col2 Col3 Ref
0 10 20 30 5
1 11 21 31 6
2 12 22 32 7
And I am trying to flatten the table (for 2D histograms) to use a single column for the column id and one column for the actual values while keeping the corresponding Ref, like this:
Ref Col Value
0 5 1 10
1 5 2 20
2 5 3 30
3 6 1 11
4 6 2 21
5 6 3 31
6 7 1 12
7 7 2 22
8 7 3 32
I remember there was some kind of a join/group operation to do the reverse operation, but I cannot recall it anymore...

Maybe not the most elegant solution, but it works on your data. Using a combination of pivot_table and stack.
import pandas as pd
df = pd.DataFrame({'Ref': [5, 6, 7],
'Col1': [10,11,12],
'Col2': [20,21,22],
'Col3': [30,31,32]})
# In [23]: df
# Out[23]:
# Col1 Col2 Col3 Ref
# 0 10 20 30 5
# 1 11 21 31 6
# 2 12 22 32 7
piv = df.pivot_table(index=['Ref']).stack()
df2 = pd.DataFrame(piv)
df2.reset_index(inplace=True)
df2.columns = ['Ref','Col','Value']
# In [19]: df2
# Out[19]:
# Ref Col Value
# 0 5 Col1 10
# 1 5 Col2 20
# 2 5 Col3 30
# 3 6 Col1 11
# 4 6 Col2 21
# 5 6 Col3 31
# 6 7 Col1 12
# 7 7 Col2 22
# 8 7 Col3 32
If you want 'Col' to just be the last digit of the column name, could do something like this:
df2.Col = df2.Col.apply(lambda x: x[-1:])
# In [21]: df2
# Out[21]:
# Ref Col Value
# 0 5 1 10
# 1 5 2 20
# 2 5 3 30
# 3 6 1 11
# 4 6 2 21
# 5 6 3 31
# 6 7 1 12
# 7 7 2 22
# 8 7 3 32

In pandas, how to set_index with using column index instead of referring to column names?

For example:
We have a Pandas dataFrame foo with 2 columns ['A', 'B'].
I want to do function like
foo.set_index([0,1])
instead of
foo.set_index(['A', 'B'])
Have tried foo.set_index([[0,.1]]) as well but came with this error:
Length mismatch: Expected axis has 9 elements, new values have 2 elements

If the column index is unique you could use:
df.set_index(list(df.columns[cols]))
where cols is a list of ordinal indices.
For example,
In [77]: np.random.seed(2016)
In [79]: df = pd.DataFrame(np.random.randint(10, size=(5,4)), columns=list('ABCD'))
In [80]: df
Out[80]:
A B C D
0 3 7 2 3
1 8 4 8 7
2 9 2 6 3
3 4 1 9 1
4 2 2 8 9
In [81]: df.set_index(list(df.columns[[0,2]]))
Out[81]:
B D
A C
3 2 7 3
8 8 4 7
9 6 2 3
4 9 1 1
2 8 2 9
If the DataFrame's column index is not unique, then setting the index by label
is impossible and by ordinals more complicated:
import numpy as np
import pandas as pd
np.random.seed(2016)
def set_ordinal_index(df, cols):
columns, df.columns = df.columns, np.arange(len(df.columns))
mask = df.columns.isin(cols)
df = df.set_index(cols)
df.columns = columns[~mask]
df.index.names = columns[mask]
return df
df = pd.DataFrame(np.random.randint(10, size=(5,4)), columns=list('AAAA'))
print(set_ordinal_index(df, [0,2]))
yields
A A
A A
3 2 7 3
8 8 4 7
9 6 2 3
4 9 1 1
2 8 2 9

This worked for me, the other answer didn't.
# single column
df.set_index(df.columns[1])
# multi column
df.set_index(df.columns[[1, 0]].tolist())

We Keep Coding

sql objective-c vba vb.net react-native apache vue.js tensorflow api pandas

How to Select Rows and Columns of Dataframe? - pandas

I have a dataframe with 4 columns and 6 rows, I want to select 2nd and 4th columns and 1st and 6th rows of this dataframe and create a new dataframe. How can i do this?

Related

Convert multiple columns in pandas dataframe to array of arrays

pandas rolling calc count function

Add values in columns of multiple dataframes if values in another column are same

Combine multiple columns into two columns: "column name" and "value"

In pandas, how to set_index with using column index instead of referring to column names?

Categories

Resources