numpy append() function doesn't change my ndarray? - numpy

I want to add something to a ndarray, what am I doing wrong?
import numpy as np
sequence =np.repeat(1, 4)
print(sequence)
np.append(sequence, 7)
print(sequence)
Expected result in console:
[1 1 1 1]
[1 1 1 1 7]
Actual result:
[1 1 1 1]
[1 1 1 1]

np.append() returns a new list, so what you wanna do is you wanna do sequence = np.append(sequence, 7)

Related

pandas merge on list type rows

i have two dataframe.
df1:
import pandas as pd
values=[[1,[1,2]],[2,[2,2]],[3,[2,3]]]
df=pd.DataFrame(values,columns=['idx','value'])
print(df)
'''
idx value
1 [1,2]
2 [2,2]
3 [2,3]
4 []
'''
df2:
values=[[1,'json'],[2,'csv'],[3,'xml']]
df2=pd.DataFrame(values,columns=['id2','type'])
print(df2)
'''
id2 type
1 json
2 csv
3 xml
'''
i want to merge this two dataframes. But the values ​​column in the first df consists of lists. Expected output:
idx value type
1 [1,2] [json,csv]
2 [2,2] [csv,csv]
3 [2,3] [csv,xml]
4 [] []
I tried the code below but got an error.. Is there a way I can merge for each element in the list?
final=df.merge(df2,how='left',left_on='value',right_on='id2')
#returns
TypeError: unhashable type: 'list'
here is one way to do it
df.explode('value').merge(df2, left_on = 'idx',
right_on='id2').drop(columns='id2').pivot_table(index='idx', aggfunc=list).reset_index()
idx type value
0 1 [json, json] [1, 2]
1 2 [csv, csv] [2, 2]
2 3 [xml, xml] [2, 3]
Explode the value column then map the type using common id then groupby and aggregate back to list
d = df2.set_index('id2')['type']
df['type'] = df['value'].explode().map(d).groupby(level=0).agg(list)
Alternative approach with list comp and dict lookup
d = df2.set_index('id2')['type']
df['type'] = df['value'].map(lambda l: [d.get(i) for i in l])
idx value type
0 1 [1, 2] [json, csv]
1 2 [2, 2] [csv, csv]
2 3 [2, 3] [csv, xml]

Pandas locate and apply changes to column

This is something I always struggle with and is very beginner. Essentially, I want to locate and apply changes to a column based on a filter from another column.
Example input.
import pandas as pd
cols = ['col1', 'col2']
data = [
[1, 1],
[1, 1],
[2, 1],
[1, 1],
]
df = pd.DataFrame(data=data, columns=cols)
# NOTE: In practice, I will be applying a more complex function
df['col2'] = df.loc[df['col1'] == 1, 'col2'].apply(lambda x: x+1)
Returned output:
col1 col2
0 1 2.0
1 1 2.0
2 2 NaN
3 1 2.0
Expected output:
col1 col2
0 1 2
1 1 2
2 2 2
3 1 2
What's happening:
Records that do not meet the filtering condition are being set to null because of my apply / lambda routine
What I request:
The correct locate/filter and apply approach. I can achieve the expected frame using update, however I want to use locate and apply.
By doing df['col2'] = ..., you're setting all the values of col2. But, since you're only calling apply on some of the values, the values that aren't included get set to NaN. To fix that, save your mask and reuse it:
mask = df['col1'] == 1
df.loc[mask, 'col2'] = df.loc[mask, 'col2'].apply(lambda x: x+1)
Output:
>>> df
col1 col2
0 1 2
1 1 2
2 2 1
3 1 2

Get back to DataFrame after df.as_matrix()

I play with a dataset in pandas.
At some point I use it as matrix (df.as_matrix()) , then I do some transformations (with sklearn) and I want to go back to DataFrame.
How can I go back from df.as_matrix() back to df this the most straightworward way and with preserving indexes and col names?
Consider the data frame df
df = pd.DataFrame(1, list('xyz'), list('abc'))
df
a b c
x 1 1 1
y 1 1 1
z 1 1 1
as_matrix gives you:
df.as_matrix()
array([[1, 1, 1],
[1, 1, 1],
[1, 1, 1]])
It is completely reasonable to go back to a data frame with
pd.DataFrame(df.as_matrix())
0 1 2
0 1 1 1
1 1 1 1
2 1 1 1
But you lose the index and column information.
If you still have that info lying around
pd.DataFrame(df.as_matrix(), df.index, df.columns)
a b c
x 1 1 1
y 1 1 1
z 1 1 1
And you are back where you started.

Given a tensor [5,4,3,4], how to generate a constant tensor where each row has n ones and m zeros, n=5,4,3,4, and m=0,1,2,1.

Given a tensor A: [5,4,3,4], I want to create a tensor B:
[[1,1,1,1,1],
[1,1,1,1,0],
[1,1,1,0,0],
[1,1,1,1,0]]
Each row of B has n ones where n = 5,4,3,4 according to A. The remaining positions are filled with zeros.
Can I realize this in tensorflow, and how?
You can use tf.sequence_mask for this.
import tensorflow as tf
A = tf.constant([5,4,3,4], dtype=tf.int32)
max_len = tf.reduce_max(A)
B = tf.sequence_mask(A, max_len, dtype=tf.int32)
with tf.Session() as sess:
print(sess.run(B))
Prints:
[[1 1 1 1 1]
[1 1 1 1 0]
[1 1 1 0 0]
[1 1 1 1 0]]

Numpy indexing in 3 dimensions

In [93]: a = np.arange(24).reshape(2, 3, 4)
In [94]: a[0, 1, ::2]
Out[94]: array([4, 6])
Can someone explain what '::2' means here?
Thanks!
::2 means : in this dimension, get all the "layers" having a pair index (starting from 0, counting by 2).
it means: get the element at a[0, 1, 0] and a[0, 1, 2] and put it into the same array.
each index position (you have 3 in this sample) is indexable and "sliceable". perhaps you saw slices like [this:slice] before in normal arrays. well... slices can also have a third value which is the "step" value.
so: [a:b:c] means [startPosition:endPosition:step] where endPosition is not included.
so having ::2 means start=0, end=the end of the ... dimension, step=2.
you have at most 4 in that dimension (see your reshape line), so the index it will count are 0 and 2 (1 and 3 are skipped, and 3 is the last element).
0 0 0 => 0
0 0 1 => 1
0 0 2 => 2
0 0 3 => 3
0 1 0 => 4 -> (0, 1, 0) is rescued via the slice
0 1 1 => 5
0 1 2 => 6 -> (0, 1, 2) is rescued via the slice