How can we convert two pandas DataFrame columns to a Python list after merging them vertically?

I have a dataframe...
print(df)
Name  ae_rank  adf  de_rank
   a        1   lk        4
   b        2   lp        5
   c        3   yi        6
How can I concatenate the ae_rank and de_rank columns vertically and convert them into a Python list?
Expectation...
my_list = [1, 2, 3, 4, 5, 6]

Simplest is to join the two lists:
my_list = df['ae_rank'].tolist() + df['de_rank'].tolist()
If you need to reshape the DataFrame, use DataFrame.melt:
my_list = df.melt(['Name','adf'])['value'].tolist()
print(my_list)
[1, 2, 3, 4, 5, 6]

Another option is
my_list = df[['ae_rank', 'de_rank']].T.stack().tolist()
#[1, 2, 3, 4, 5, 6]

Most efficiently, use filter to select the columns whose names contain "_rank", then flatten the underlying NumPy array with ravel in 'F' (column-major) order:
my_list = df.filter(like='_rank').to_numpy().ravel('F').tolist()
output: [1, 2, 3, 4, 5, 6]
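A self-contained sketch rebuilding the sample frame from the question and checking that the three approaches agree:

```python
import pandas as pd

# Reconstruct the sample frame shown in the question
df = pd.DataFrame({
    'Name': ['a', 'b', 'c'],
    'ae_rank': [1, 2, 3],
    'adf': ['lk', 'lp', 'yi'],
    'de_rank': [4, 5, 6],
})

# 1) Concatenate the two column lists
joined = df['ae_rank'].tolist() + df['de_rank'].tolist()

# 2) Reshape with melt, keeping the non-rank columns as identifiers
melted = df.melt(['Name', 'adf'])['value'].tolist()

# 3) Select the *_rank columns and flatten in column-major order
flat = df.filter(like='_rank').to_numpy().ravel('F').tolist()

print(joined)  # [1, 2, 3, 4, 5, 6]
```

All three give the same list; the filter/ravel route avoids building intermediate Python lists or a melted frame.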


Finding values from different rows in pandas

I have one dataframe holding the data and a second dataframe containing a single row of indices.
data = {'col_1': [4, 5, 6, 7], 'col_2': [3, 4, 9, 8],'col_3': [5, 5, 6, 9],'col_4': [8, 7, 6, 5]}
df = pd.DataFrame(data)
ind = {'ind_1': [2], 'ind_2': [1],'ind_3': [3],'ind_4': [2]}
ind = pd.DataFrame(ind)
Both have the same number of columns. I want to extract the values of df corresponding to the index stored in ind so that I get a single row at the end.
For this data it should be: [6, 4, 9, 6]. I tried df.loc[ind.loc[0]] but that of course gives me four different rows, not one.
The other idea I have is to zip columns and rows and iterate over them. But I feel there should be a simpler way.
You can drop down to NumPy and index there:
In [14]: df.to_numpy()[ind, np.arange(len(df.columns))]
Out[14]: array([[6, 4, 9, 6]], dtype=int64)
this pairs up 2, 1, 3, 2 from ind and 0, 1, 2, 3 from 0 to number of columns - 1; so we get the values at [2, 0], [1, 1] and so on.
There's also df.lookup but it's being deprecated, so...
In [19]: df.lookup(ind.iloc[0], df.columns)
~\Anaconda3\Scripts\ipython:1: FutureWarning: The 'lookup' method is deprecated and will be removed in a future version. You can use DataFrame.melt and DataFrame.loc as a substitute.
Out[19]: array([6, 4, 9, 6], dtype=int64)
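Since lookup is on its way out, here is a deprecation-proof sketch of the same result using plain NumPy fancy indexing, reconstructing the question's frames:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'col_1': [4, 5, 6, 7], 'col_2': [3, 4, 9, 8],
                   'col_3': [5, 5, 6, 9], 'col_4': [8, 7, 6, 5]})
ind = pd.DataFrame({'ind_1': [2], 'ind_2': [1], 'ind_3': [3], 'ind_4': [2]})

# Pair each row index from `ind` with its column position:
# (2, 0), (1, 1), (3, 2), (2, 3)
rows = ind.to_numpy().ravel()
cols = np.arange(df.shape[1])
result = df.to_numpy()[rows, cols]
print(result.tolist())  # [6, 4, 9, 6]
```

This is the same pairing described above, just with `ind` flattened to a 1-D array so the output is a single row rather than a 2-D array.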

pandas dataframe how to remove values from cell that is a list based on other column

I have a dataframe with 2 columns that represent a list:
a  b  vals         locs
1  2  [1,2,3,4,5]  [2,3]
5  1  [1,7,2,4,9]  [0,1]
8  2  [1,9,4,7,8]  [3]
I want, for each row, to exclude from the vals column all the positions listed in locs, so I get:
a  b  vals         locs   new_vals
1  2  [1,2,3,4,5]  [2,3]  [1,2,5]
5  1  [1,7,2,4,9]  [0,1]  [2,4,9]
8  2  [1,9,4,7,8]  [3]    [1,9,4,8]
What is the best way to do so?
Thanks!
You can use a list comprehension with an internal filter based on enumerate:
df['new_vals'] = [[v for i, v in enumerate(a) if i not in b]
                  for a, b in zip(df['vals'], df['locs'])]
however this will quickly become inefficient when b gets large.
A much better approach is to use Python sets, which allow fast (O(1) complexity) membership tests:
df['new_vals'] = [[v for i, v in enumerate(a) if i not in S]
                  for a, b in zip(df['vals'], df['locs']) for S in [set(b)]]
output:
a b vals locs new_vals
0 1 2 [1, 2, 3, 4, 5] [2, 3] [1, 2, 5]
1 5 1 [1, 7, 2, 4, 9] [0, 1] [2, 4, 9]
2 8 2 [1, 9, 4, 7, 8] [3] [1, 9, 4, 8]
Use a list comprehension with enumerate, converting the locs values to sets:
df['new_vals'] = [[z for i, z in enumerate(x) if i not in y]
                  for x, y in zip(df['vals'], df['locs'].apply(set))]
print(df)
a b vals locs new_vals
0 1 2 [1, 2, 3, 4, 5] [2, 3] [1, 2, 5]
1 5 1 [1, 7, 2, 4, 9] [0, 1] [2, 4, 9]
2 8 2 [1, 9, 4, 7, 8] [3] [1, 9, 4, 8]
One way to do this is to write a function that operates on each row:
def func(row):
    # enumerate gives each position directly; list.index would return only
    # the first match and break when vals contains duplicate values
    return [v for i, v in enumerate(row['vals']) if i not in row['locs']]
Then call this function for each row using apply:
df['new_vals'] = df.apply(func, axis=1)
This works well if the lists are short.
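For reference, a self-contained sketch of the set-based comprehension on the sample data (rebuilt by hand, since the question shows it only as text):

```python
import pandas as pd

df = pd.DataFrame({
    'a': [1, 5, 8],
    'b': [2, 1, 2],
    'vals': [[1, 2, 3, 4, 5], [1, 7, 2, 4, 9], [1, 9, 4, 7, 8]],
    'locs': [[2, 3], [0, 1], [3]],
})

# Set-based filter: each `i not in locs_set` membership check is O(1)
df['new_vals'] = [[v for i, v in enumerate(vals) if i not in locs_set]
                  for vals, locs_set in zip(df['vals'], df['locs'].apply(set))]

print(df['new_vals'].tolist())  # [[1, 2, 5], [2, 4, 9], [1, 9, 4, 8]]
```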

How to filter a pandas dataframe if a column is a list

In the dataframe that comes from
http://bit.ly/imdbratings
one column, actors_list, is a list of the actors in the movie.
How do I filter the dataframe for movies where Al Pacino took part?
e.g. [u'Marlon Brando', u'Al Pacino', u'James Caan']
You could use filtering with a map function inside it. Suppose you are looking for actor number 32:
import pandas as pd
import numpy as np
df = pd.DataFrame({'name': ['A', 'B', 'C', 'D', 'E', 'F'],
                   'Actors': [[1, 2, 3], [2, 4, 3], [3, 4, 5, 32, 1],
                              [4, 5, 2, 3], [102, 302], [1, 2, 3, 32, 5]]})
df[df['Actors'].map(lambda x: 32 in x)]
Output:
name Actors
2 C [3, 4, 5, 32, 1]
5 F [1, 2, 3, 32, 5]
Or, if you want to check whether at least one actor from your list is present in a movie, use any in combination with a lambda:
important_actors = [32,3]
print(df[df['Actors'].map(lambda x: any(i in x for i in important_actors))])
Output:
name Actors
0 A [1, 2, 3]
1 B [2, 4, 3]
2 C [3, 4, 5, 32, 1]
3 D [4, 5, 2, 3]
5 F [1, 2, 3, 32, 5]
The structure is the same throughout: change any to all if you want only the movies in which all of the listed actors appear, and so on. Feel free to leave a comment if you need further explanation.
You can use string contains:
l=[u'Marlon Brando', u'Al Pacino', u'James Caan']
m=df['actors_list'].str.join('|').str.contains('|'.join(l))
df=df[m]
Or
m=pd.DataFrame(df['actors_list'].tolist()).isin(l).any(axis=1)
df=df[m.values]
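A self-contained sketch of the membership test on a toy frame shaped like the IMDb data (the titles and casts here are just illustrative):

```python
import pandas as pd

df = pd.DataFrame({
    'title': ['The Godfather', 'Heat', 'Jaws'],
    'actors_list': [['Marlon Brando', 'Al Pacino', 'James Caan'],
                    ['Al Pacino', 'Robert De Niro'],
                    ['Roy Scheider', 'Richard Dreyfuss']],
})

# Boolean mask: True where 'Al Pacino' appears in the row's list
mask = df['actors_list'].map(lambda actors: 'Al Pacino' in actors)
print(df.loc[mask, 'title'].tolist())  # ['The Godfather', 'Heat']
```

The map-based mask works directly on the list cells; the str.join/str.contains variant above can mis-match on substrings of actor names, so the exact-membership test is safer.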

How to compute how many elements in three arrays are equal to some value at the same position across the arrays?

I have three numpy arrays
a = [0, 1, 2, 3, 4]
b = [5, 1, 7, 3, 9]
c = [10, 1, 3, 3, 1]
and I want to compute how many elements of a, b, c equal 3 at the same position; for this example the answer is 3.
An elegant solution is to use Numpy functions, like:
np.count_nonzero(np.vstack([a, b, c])==3, axis=0).max()
Details:
np.vstack([a, b, c]) - builds an array with 3 rows, one per source array.
np.count_nonzero(... == 3, axis=0) - counts how many 3s occur in each column; for your data the result is array([0, 0, 1, 3, 0], dtype=int64).
max() - takes the greatest value, in your case 3.
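The steps above as a runnable sketch:

```python
import numpy as np

a = np.array([0, 1, 2, 3, 4])
b = np.array([5, 1, 7, 3, 9])
c = np.array([10, 1, 3, 3, 1])

stacked = np.vstack([a, b, c])          # shape (3, 5): one row per array
per_column = np.count_nonzero(stacked == 3, axis=0)  # 3s in each column
print(per_column)        # [0 0 1 3 0]
print(per_column.max())  # 3
```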

Access Row Based on Column Value

I have the following pandas dataframe:
data = {'ID': [1, 2, 3], 'Neighbor': [3, 1, 2], 'x': [5, 6, 7]}
Now I want to create a new column 'y' which, for each row, holds the value of x from the row referenced by the Neighbor column (i.e. the row whose ID equals the value of Neighbor). E.g. for row 0 (ID 1), Neighbor is 3, so 'y' should be 7.
So the resulting dataframe should have the column y = [7, 5, 6].
Can I solve this without using df.apply? (As this is rather time-consuming for my big dataframes.)
I would like to use something like
df.loc[:, 'y'] = df.loc[df.Neighbor.eq(df.ID), 'x']
but this returns NaN.
We can build a dict from your ID and x columns, then map it into the new column.
your_dict_ = dict(zip(df['ID'],df['x']))
print(your_dict_)
{1: 5, 2: 6, 3: 7}
Then we can use .map to fill the new column, with the Neighbor column providing the keys.
df['Y'] = df['Neighbor'].map(your_dict_)
print(df)
ID Neighbor x Y
0 1 3 5 7
1 2 1 6 5
2 3 2 7 6
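The whole answer as a runnable sketch:

```python
import pandas as pd

df = pd.DataFrame({'ID': [1, 2, 3], 'Neighbor': [3, 1, 2], 'x': [5, 6, 7]})

# Build the ID -> x lookup table, then map each Neighbor through it
lookup = dict(zip(df['ID'], df['x']))
df['y'] = df['Neighbor'].map(lookup)
print(df['y'].tolist())  # [7, 5, 6]
```

Unlike apply, this is a single vectorized pass over the Neighbor column plus one dict build, so it scales well to large frames.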