How can one rearrange chararrays without getting a TypeError?
import numpy as np
bggr = ([['B', 'G', 'B', 'G'],
['G', 'R', 'G', 'R'],
['B', 'G', 'B', 'G'],
['G', 'R', 'G', 'R']])
test = np.chararray((4, 4, 3)) #create empty chararray
test[:] = ''
test[::2, ::2, 2]=bggr[0::2, 0::2] #blue
test[1::2, ::2, 1]=bggr[1::2, 0::2] #green
test[::2, 1::2, 1]=bggr[0::2, 1::2] #green
test[1::2, 1::2, 0]=bggr[1::2, 1::2] #red
print(test)
Traceback (most recent call last):
File "main.py", line 16, in <module>
test[::2, ::2, 2]=bggr[0::2, 0::2] #blue
TypeError: list indices must be integers or slices, not tuple
Tried the exact same code with numbers instead of chars and it worked fine. Thank you.
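bggr is a plain Python list, and a list cannot be indexed with a tuple of slices such as bggr[0::2, 0::2] (that is what the TypeError is saying). You need to use a numpy array for bggr: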
import numpy as np
bggr = np.array([['B', 'G', 'B', 'G'],
['G', 'R', 'G', 'R'],
['B', 'G', 'B', 'G'],
['G', 'R', 'G', 'R']])
test = np.chararray((4, 4, 3)) #create empty chararray
test[::2, ::2, 2]=bggr[0::2, 0::2] #blue
test[1::2, ::2, 1]=bggr[1::2, 0::2] #green
test[::2, 1::2, 1]=bggr[0::2, 1::2] #green
test[1::2, 1::2, 0]=bggr[1::2, 1::2] #red
print(test)
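output (np.chararray does not initialize its memory, which is where the stray b'\x04' and b'\x06' entries come from; keep the test[:] = '' line from the question to clear the array before filling it):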
[[['' '' b'B']
['' b'G' '']
['' '' b'B']
['' b'G' '']]
[['' b'G' '']
[b'R' b'\x04' '']
['' b'G' '']
[b'R' '' '']]
[[b'\x06' '' b'B']
['' b'G' '']
['' '' b'B']
['' b'G' '']]
[['' b'G' '']
[b'R' '' '']
['' b'G' '']
[b'R' '' '']]]
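A minimal sketch of the same fix without np.chararray, assuming an ordinary array of 1-character strings (dtype '<U1') is acceptable for the result. NumPy's documentation recommends plain str/bytes arrays over chararray for new code, and np.full initializes every element, so no leftover bytes can show up:

```python
import numpy as np

# Bayer mosaic from the question, as a real NumPy array so tuple-of-slices indexing works
bggr = np.array([['B', 'G', 'B', 'G'],
                 ['G', 'R', 'G', 'R'],
                 ['B', 'G', 'B', 'G'],
                 ['G', 'R', 'G', 'R']])

# Fully initialized (4, 4, 3) array of empty 1-char strings; channel order R, G, B
test = np.full((4, 4, 3), '', dtype='<U1')
test[::2, ::2, 2] = bggr[0::2, 0::2]    # blue
test[1::2, ::2, 1] = bggr[1::2, 0::2]   # green
test[::2, 1::2, 1] = bggr[0::2, 1::2]   # green
test[1::2, 1::2, 0] = bggr[1::2, 1::2]  # red
print(test)
```

Each pixel ends up with exactly one non-empty channel, and the untouched entries are genuinely empty strings.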
I want to rearrange my data (two even-length 1d arrays):
cs = [w x y z]
rs = [a b c d e f]
to make a result like this:
[[a b w x]
[c d w x]
[e f w x]
[a b y z]
[c d y z]
[e f y z]]
This is what I have tried (it works):
ls = []
for c in range(0, len(cs), 2):
    for r in range(0, len(rs), 2):
        item = [rs[r], rs[r+1], cs[c], cs[c+1]]
        ls.append(item)
But I want to get the same result using reshaping/broadcasting or other numpy functions.
What is the idiomatic way to do this task in numpy?
You could tile the elements of rs, repeat the elements of cs and then arrange those as columns for a 2D array:
import numpy as np
cs = np.array(['w', 'x', 'y', 'z'])
rs = np.array(['a', 'b', 'c', 'd', 'e', 'f'])
res = np.c_[np.tile(rs[::2], len(cs) // 2), np.tile(rs[1::2], len(cs) // 2),
np.repeat(cs[::2], len(rs) // 2), np.repeat(cs[1::2], len(rs) // 2)]
Result:
array([['a', 'b', 'w', 'x'],
['c', 'd', 'w', 'x'],
['e', 'f', 'w', 'x'],
['a', 'b', 'y', 'z'],
['c', 'd', 'y', 'z'],
['e', 'f', 'y', 'z']], dtype='<U1')
An alternative:
np.c_[np.tile(rs.reshape(-1, 2), (len(cs) // 2, 1)),
np.repeat(cs.reshape(-1, 2), len(rs) // 2, axis=0)]
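Since the question also mentions broadcasting, here is a sketch of that route (an illustration, not part of the answer above): reshape both inputs into pairs, broadcast them against each other into a (len(cs)//2, len(rs)//2, 2) grid, join the halves along the last axis and flatten the first two axes:

```python
import numpy as np

cs = np.array(['w', 'x', 'y', 'z'])
rs = np.array(['a', 'b', 'c', 'd', 'e', 'f'])

r = rs.reshape(-1, 2)  # (3, 2): one row per pair of rs
c = cs.reshape(-1, 2)  # (2, 2): one row per pair of cs
shape = (len(c), len(r), 2)

# rs pairs vary along axis 1; cs pairs vary along axis 0 thanks to the inserted None axis
grid = np.concatenate([np.broadcast_to(r, shape),
                       np.broadcast_to(c[:, None, :], shape)], axis=2)
res = grid.reshape(-1, 4)  # cs-major row order, matching the loop in the question
print(res)
```

broadcast_to only creates read-only views; the single copy happens in np.concatenate.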
An alternative to using tile/repeat is to generate repeated row indices.
Make the two arrays, reshaped the way they will be combined:
In [106]: rs=np.reshape(list('abcdef'),(3,2))
In [107]: cs=np.reshape(list('wxyz'),(2,2))
In [108]: rs
Out[108]:
array([['a', 'b'],
['c', 'd'],
['e', 'f']], dtype='<U1')
In [109]: cs
Out[109]:
array([['w', 'x'],
['y', 'z']], dtype='<U1')
Make meshgrid-like indices (itertools.product could also be used):
In [110]: IJ = np.indices((3,2))
In [111]: IJ
Out[111]:
array([[[0, 0],
[1, 1],
[2, 2]],
[[0, 1],
[0, 1],
[0, 1]]])
Reshaping with order='F' gives two 1d arrays:
In [112]: I,J=IJ.reshape(2,6,order='F')
In [113]: I,J
Out[113]: (array([0, 1, 2, 0, 1, 2]), array([0, 0, 0, 1, 1, 1]))
Then just index the rs and cs and combine them with hstack:
In [114]: np.hstack((rs[I],cs[J]))
Out[114]:
array([['a', 'b', 'w', 'x'],
['c', 'd', 'w', 'x'],
['e', 'f', 'w', 'x'],
['a', 'b', 'y', 'z'],
['c', 'd', 'y', 'z'],
['e', 'f', 'y', 'z']], dtype='<U1')
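The itertools.product route mentioned above could look like this (a sketch). product varies its last argument fastest, so the cs range goes first to make the rs index cycle fastest, which reproduces the I and J arrays from the reshape:

```python
import itertools
import numpy as np

rs = np.reshape(list('abcdef'), (3, 2))
cs = np.reshape(list('wxyz'), (2, 2))

# (j, i) pairs in the order (0, 0), (0, 1), (0, 2), (1, 0), (1, 1), (1, 2)
pairs = np.array(list(itertools.product(range(len(cs)), range(len(rs)))))
J, I = pairs.T
res = np.hstack((rs[I], cs[J]))
print(res)
```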
Edit:
Here's another, somewhat more advanced, way of looking at this. With sliding_window_view we can get a "block" view of that Out[114] result:
In [130]: np.lib.stride_tricks.sliding_window_view(_114,(3,2))[::3,::2,:,:]
Out[130]:
array([[[['a', 'b'],
['c', 'd'],
['e', 'f']],
[['w', 'x'],
['w', 'x'],
['w', 'x']]],
[[['a', 'b'],
['c', 'd'],
['e', 'f']],
[['y', 'z'],
['y', 'z'],
['y', 'z']]]], dtype='<U1')
With a bit more reverse engineering, I find I can create Out[114] with:
In [147]: res = np.zeros((6,4),'U1')
In [148]: res1 = np.lib.stride_tricks.sliding_window_view(res,(3,2),writeable=True)[::3,::2,:,:]
In [149]: res1[:,0,:,:] = rs
In [150]: res1[:,1,:,:] = cs[:,None,:]
In [151]: res
Out[151]:
array([['a', 'b', 'w', 'x'],
['c', 'd', 'w', 'x'],
['e', 'f', 'w', 'x'],
['a', 'b', 'y', 'z'],
['c', 'd', 'y', 'z'],
['e', 'f', 'y', 'z']], dtype='<U1')
I can't say that either of these is superior, but they show there are various ways of "vectorizing" this kind of array layout.
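For comparison, the same block layout can be reached without stride tricks. Reshaping the contiguous (6, 4) result to (2, 3, 2, 2) gives a writeable view in which axis 0 picks the cs pair, axis 1 the rs pair and axis 2 the left or right half of the row. This sketch relies on reshape returning a view, which holds here because res is freshly allocated and C-contiguous:

```python
import numpy as np

rs = np.reshape(list('abcdef'), (3, 2))
cs = np.reshape(list('wxyz'), (2, 2))

res = np.zeros((6, 4), 'U1')
# View: blocks[p, q, s, t] is res[3*p + q, 2*s + t]
blocks = res.reshape(len(cs), len(rs), 2, 2)
blocks[:, :, 0, :] = rs              # left half: the rs pairs, repeated for every cs pair
blocks[:, :, 1, :] = cs[:, None, :]  # right half: one cs pair per block of rows
print(res)
```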
I have a long dataframe I need to transform to get a wide one.
The long one is:
df = pd.DataFrame({
'key' : ['E', 'E', 'E', 'E', 'J', 'J', 'J', 'J'],
'father' : ['A', 'D', 'C', 'B', 'F', 'H', 'G', 'I'],
'son' : ['B', 'E', 'D', 'C', 'G', 'I', 'H', 'J']
})
df
The first thing to do, I think, is to group it by key. Then I have to find where each key appears in the 'son' column: that row is the end (the last son) of the link I need to rebuild.
To rebuild the link, I need to look up that row's 'father'. This 'father' has to be kept as the father of the final step, and it also has to be looked up in the 'son' column.
I need to repeat this until a 'father' cannot be found in the 'son' column; that one is the root (father_0) of the link.
I think this could be done by iterating those steps in a recursive function whose stop case is: 'father' not found in 'son'.
Here is the dataframe I want to get from this:
df1 = pd.DataFrame({
'key' : ['E', 'J'],
'father_1' : ['A', 'F'],
'son_1' : ['B', 'G'],
'father_2' : ['B', 'G'],
'son_2' : ['C', 'H'],
'father_3' : ['C', 'H'],
'son_3' : ['D', 'I'],
'father_4' : ['D', 'I'],
'son_4' : ['E', 'J'],
})
df1
I simplified the problem here with 2 different links of the same depth, but they could be from depth 1 to depth 10 (maybe more but rarely and unpredictably) for a lot of different keys.
Here is another example of df with 2 links of different sizes:
df_ = pd.DataFrame({
'key' : ['E', 'E', 'E', 'E', 'K', 'K', 'K', 'K', 'K'],
'father' : ['A', 'D', 'C', 'B', 'F', 'H', 'G', 'I', 'J'],
'son' : ['B', 'E', 'D', 'C', 'G', 'I', 'H', 'J', 'K']
})
df_
df_1 = pd.DataFrame({
'key' : ['E', 'K'],
'father_1' : ['A', 'F'],
'son_1' : ['B', 'G'],
'father_2' : ['B', 'G'],
'son_2' : ['C', 'H'],
'father_3' : ['C', 'H'],
'son_3' : ['D', 'I'],
'father_4' : ['D', 'I'],
'son_4' : ['E', 'J'],
'father_5' : [np.nan, 'J'],
'son_5' : [np.nan, 'K']
})
df_1
Then the final step is easy: 'father_x' and 'son_x-1' hold the same value, so they are merged into a single 'step_x-1' column.
So the resulting dataframes for these examples would be:
df2 = pd.DataFrame({
'key' : ['E', 'J'],
'step_0' : ['A', 'F'],
'step_1' : ['B', 'G'],
'step_2' : ['C', 'H'],
'step_3' : ['D', 'I'],
'step_4' : ['E', 'J'],
})
df2
df_2 = pd.DataFrame({
'key' : ['E', 'K'],
'step_0' : ['A', 'F'],
'step_1' : ['B', 'G'],
'step_2' : ['C', 'H'],
'step_3' : ['D', 'I'],
'step_4' : ['E', 'J'],
'step_5' : [np.nan, 'K']
})
df_2
My concern is more about how to reshape the data from long to wide following the rules above inside a recursive function.
It's like a groupby.agg, except that I can't just pass a dictionary to it, because the new columns depend on how many times the recursive function iterates for each key.
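A minimal pandas-only sketch of the chain walk described above, using the df_ example. It is written as a while loop rather than a recursive function (the stop case is the same: a node that never appears in 'son' is the root), and it assumes, as in both examples, that each son has exactly one father within its key. chain is a hypothetical helper name:

```python
import pandas as pd

# Second example from the question: links of different depths
df_ = pd.DataFrame({
    'key': ['E', 'E', 'E', 'E', 'K', 'K', 'K', 'K', 'K'],
    'father': ['A', 'D', 'C', 'B', 'F', 'H', 'G', 'I', 'J'],
    'son': ['B', 'E', 'D', 'C', 'G', 'I', 'H', 'J', 'K'],
})

def chain(group, last_son):
    """Follow son -> father links from last_son back to the root; return root first."""
    # Assumes every son has exactly one father within its key group
    parent = dict(zip(group['son'], group['father']))
    node, path = last_son, [last_son]
    while node in parent:  # stop case: node never appears in 'son', so it is the root
        node = parent[node]
        path.append(node)
    return path[::-1]

keys, paths = [], []
for key, group in df_.groupby('key'):
    keys.append(key)
    paths.append(chain(group, key))  # the key itself is the last son of its link

# Ragged lists are padded with missing values for the shorter links
wide = pd.DataFrame(paths, index=pd.Index(keys, name='key'))
wide.columns = [f'step_{i}' for i in wide.columns]
print(wide.reset_index())
```

The number of step columns follows from the longest link, so no maximum depth has to be fixed in advance.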
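Assign a per-key counter with cumcount, then pivot. This assumes the rows of each key are already listed in link order (root first); in the df above they are not, so they have to be sorted along the link first, otherwise the father_n/son_n pairs will not line up as shown. Keeping the counter numeric also keeps step 10 after step 9 when sorting:
out = (df.assign(c=df.groupby('key').cumcount().add(1))
         .pivot(index='key', columns='c')
         .sort_index(level=1, axis=1))
out.columns = [f'{name}_{n}' for name, n in out.columns]
out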
Out[34]:
father_1 son_1 father_2 son_2 father_3 son_3 father_4 son_4
key
E A B B C C D D E
J F G G H H I I J
I found a solution for this specific type of dataframe, where every value except the root has exactly one predecessor.
It also requires using NetworkX. I didn't find a way to do it only using Pandas.
First, we need to build a graph from the edge list:
import networkx as nx
import matplotlib.pyplot as plt
from networkx.drawing.nx_agraph import write_dot, graphviz_layout

G = nx.from_pandas_edgelist(df, 'father', 'son', create_using=nx.MultiDiGraph, edge_key='key')
#write_dot(G,'test.dot')
plt.title('draw_networkx')
pos = graphviz_layout(G, prog='dot')
nx.draw(G, pos, with_labels=True, arrows=True)
For pygraphviz install, please see this question.
Then the end-to-end links dataframe is built with:
num = 0
num_max = len(df.key.drop_duplicates())
m_max = 30
dfy = pd.DataFrame(index=range(num_max), columns=range(m_max))
for n in df.key.drop_duplicates():
    m = 0
    dfy.iloc[num, m] = n
    while len(list(G.predecessors(dfy.iloc[num, m]))) != 0:
        dfy.iloc[num, m + 1] = list(G.predecessors(dfy.iloc[num, m]))[0]
        m += 1
    num += 1
print(dfy)
Output:
0 1 2 3 4 5 6 7 8 9 ...
0 E D C B A NaN NaN NaN NaN NaN ...
1 K J I H G F NaN NaN NaN NaN ...
I'm trying to print the 'word results' and the 'number results' next to each other without spaces, but everything I've tried so far prints them vertically.
import random
user_Input = input('Strong Or Weak?: ')
wrds = ['p', 'e', 'T', 'U', 'S', 'C', 'v', 'Q', 't', 'V', 'I', 'R', 'K', 'A', 'G', 'l', 'r', 'u', 'b', 'P', 'p', 'n', 'H', 'i', 'R', 'I', 'w', 'K', 'v', 'F', 'J', 'y', 'B', 'h', 'o', 'a', 'G', 'X', 'z']
rndm_num = random.randint(9999,99999)
rndm_wrds = random.sample(wrds , k = 8 )
result_wrds= rndm_wrds
result_num = rndm_num
if user_Input == 'Strong' or 'strong':
    print(*result_wrds, sep ='') , print(result_num)
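If you want the results on the same line, pass them all to the same print() statement. Also note that user_Input == 'Strong' or 'strong' is always true, because the non-empty string 'strong' is truthy on its own; compare case-insensitively instead:
if user_Input.lower() == 'strong':
    print(*result_wrds, result_num, sep='')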
# CunvvzpI52080
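The vertical output comes from print() ending every call with a newline by default. If you prefer to keep two separate print() calls, here is a sketch using the end parameter (user_Input is hard-coded in place of input() so the snippet runs on its own):

```python
import random

# Same letters as in the question
wrds = list('peTUSCvQtVIRKAGlrubPpnHiRIwKvFJyBhoaGXz')
result_wrds = random.sample(wrds, k=8)
result_num = random.randint(9999, 99999)

user_Input = 'strong'  # hard-coded stand-in for input('Strong Or Weak?: ')
if user_Input.lower() == 'strong':
    # print() appends '\n' by default; end='' suppresses it so the next print continues the line
    print(*result_wrds, sep='', end='')
    print(result_num)
```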
Example data
letters = np.array([
np.array([
np.array(['a','a','a'])
, np.array(['b','b','b'])
, np.array(['c','c','c'])
])
, np.array([
np.array(['d','d','d'])
, np.array(['e','e','e'])
, np.array(['f','f','f'])
])
, np.array([
np.array(['g','g','g'])
, np.array(['h','h','h'])
, np.array(['i','i','i'])
])
])
array([[['a', 'a', 'a'],
['b', 'b', 'b'],
['c', 'c', 'c']],
[['d', 'd', 'd'],
['e', 'e', 'e'],
['f', 'f', 'f']],
[['g', 'g', 'g'],
['h', 'h', 'h'],
['i', 'i', 'i']]], dtype='<U1')
Desired output
array([['a', 'a', 'a', 'd', 'd', 'd', 'g', 'g', 'g'],
['b', 'b', 'b', 'e', 'e', 'e', 'h', 'h', 'h'],
['c', 'c', 'c', 'f', 'f', 'f', 'i', 'i', 'i']], dtype='<U1')
See how the 2D arrays are now side-by-side?
For the sake of memory, I'd prefer to do this with transpose and reshape rather than stacking/concatenating into a new array.
Attempt
letters.reshape(
letters.shape[2],
letters.shape[0]*letters.shape[1]
)
array([['a', 'a', 'a', 'b', 'b', 'b', 'c', 'c', 'c'],
['d', 'd', 'd', 'e', 'e', 'e', 'f', 'f', 'f'],
['g', 'g', 'g', 'h', 'h', 'h', 'i', 'i', 'i']], dtype='<U1')
I think I need to transpose... before reshaping?
letters.transpose(
1,0,2
).reshape(
# where index represents dimension
letters.shape[2],
letters.shape[0]*letters.shape[1]
)
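For what it's worth, transposing first does produce the desired layout; in general the reshape should use the transposed shape (rows = letters.shape[1], columns = letters.shape[0] * letters.shape[2]), which for this cube happens to equal the sizes used above. The sketch below also checks memory sharing: after the transpose, the two axes being merged are no longer contiguous, so reshape cannot return a view and NumPy copies anyway (the desired row order cannot be expressed with fixed strides over the original buffer). It should report no shared memory and the same values as np.hstack:

```python
import numpy as np

# Same 3x3x3 example as above: block i, row j is filled with one letter
letters = np.array([[[ch] * 3 for ch in 'abc'],
                    [[ch] * 3 for ch in 'def'],
                    [[ch] * 3 for ch in 'ghi']])

# Move the row axis to the front, then merge the block axis with the column axis
side_by_side = letters.transpose(1, 0, 2).reshape(
    letters.shape[1], letters.shape[0] * letters.shape[2])
print(side_by_side)

# The merged axes are not contiguous after the transpose, so this is a copy, not a view
print(np.shares_memory(side_by_side, letters))
# Same values as stacking the 2D blocks horizontally
print(np.array_equal(side_by_side, np.hstack(letters)))
```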
I have the following dataframe:
df = pd.DataFrame([{'name': 'a', 'label': 'false', 'score': 10},
{'name': 'a', 'label': 'true', 'score': 8},
{'name': 'c', 'label': 'false', 'score': 10},
{'name': 'c', 'label': 'true', 'score': 4},
{'name': 'd', 'label': 'false', 'score': 10},
{'name': 'd', 'label': 'true', 'score': 6},
])
I want to return the names whose "false" score is at least double their "true" score. In my example, it should return only "c".
First you can pivot the data, then compute the ratio and filter on it:
new_df = df.pivot(index='name', columns='label', values='score')
new_df[new_df['false'].div(new_df['true']).ge(2)]
output:
label false true
name
c 10 4
If you only want the names, you can do:
new_df.index[new_df['false'].div(new_df['true']).ge(2)].values
which gives
array(['c'], dtype=object)
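Update: Since your data is the result of orig_df.groupby().count(), you could instead work on orig_df directly:
orig_df['label'].eq('true').groupby(orig_df['name']).mean()
and look at the rows with values <= 1/3 (a "true" share of at most one third means the "false" count is at least double the "true" count).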
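A small sketch checking that the two criteria agree. orig_df here is a hypothetical stand-in rebuilt by repeating each row of df score times, so that counting its rows per name and label gives back df:

```python
import pandas as pd

df = pd.DataFrame([{'name': 'a', 'label': 'false', 'score': 10},
                   {'name': 'a', 'label': 'true', 'score': 8},
                   {'name': 'c', 'label': 'false', 'score': 10},
                   {'name': 'c', 'label': 'true', 'score': 4},
                   {'name': 'd', 'label': 'false', 'score': 10},
                   {'name': 'd', 'label': 'true', 'score': 6}])

# Hypothetical raw data: one row per counted item
orig_df = df.loc[df.index.repeat(df['score']), ['name', 'label']].reset_index(drop=True)

# Criterion 1: "false" score at least double the "true" score
wide = df.pivot(index='name', columns='label', values='score')
by_ratio = wide.index[wide['false'] >= 2 * wide['true']]

# Criterion 2: share of "true" rows per name is at most 1/3
true_share = orig_df['label'].eq('true').groupby(orig_df['name']).mean()
by_share = true_share.index[true_share <= 1 / 3]

print(list(by_ratio), list(by_share))
```

The Series is grouped by orig_df['name'] rather than the string 'name', because on a Series a string key refers to an index level, not a column.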