I have the following dataframe:
import pandas as pd
# List of tuples
matrix = [([22, 23], [34, 35, 65], [23, 29, 31]),
([33, 34], [31, 44], [11, 16, 18]),
([44, 56, 76], [16, 34, 76], [21, 34]),
([55, 34], [32, 35, 38], [22, 24, 26]),
([66, 65, 67], [33, 38, 39], [27, 32, 34]),
([77, 39, 45], [35, 36, 38], [11, 21, 34])]
# Create a DataFrame object
df = pd.DataFrame(matrix, columns=list('xyz'), index=list('abcdef'))
I'm able to apply my custom function, which returns the first and last items of each list, to all columns like this:
def fl(x):
    return [x[0], x[len(x)-1]]
df.apply(lambda x : [fl(i) for i in x])
But I want to apply the function only to selected columns, x and z.
I tried the following, based on this link:
df.apply(lambda x: fl(x) if x in ['x', 'y'] else x)
and like this:
df[['x', 'y']].apply(fl)
How can I get the output with the function applied only to the x and z columns, leaving the y column unchanged?
Use DataFrame.applymap for elementwise processing. The last value can also be selected with [-1] indexing:
def fl(x):
    return [x[0], x[-1]]
df[['x', 'z']] = df[['x', 'z']].applymap(fl)
print (df)
          x             y         z
a  [22, 23]  [34, 35, 65]  [23, 31]
b  [33, 34]      [31, 44]  [11, 18]
c  [44, 76]  [16, 34, 76]  [21, 34]
d  [55, 34]  [32, 35, 38]  [22, 26]
e  [66, 67]  [33, 38, 39]  [27, 34]
f  [77, 45]  [35, 36, 38]  [11, 34]
Or, for a solution with DataFrame.apply, select the first and last values with the str accessor, zip them together, and map the resulting tuples to lists:
def fl(x):
    return list(map(list, zip(x.str[0], x.str[-1])))
df[['x', 'z']] = df[['x', 'z']].apply(fl)
print (df)
          x             y         z
a  [22, 23]  [34, 35, 65]  [23, 31]
b  [33, 34]      [31, 44]  [11, 18]
c  [44, 76]  [16, 34, 76]  [21, 34]
d  [55, 34]  [32, 35, 38]  [22, 26]
e  [66, 67]  [33, 38, 39]  [27, 34]
f  [77, 45]  [35, 36, 38]  [11, 34]
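As far as I know, DataFrame.applymap is deprecated in pandas 2.1 and later in favor of DataFrame.map. A minimal sketch that sidesteps the rename by calling Series.map on each selected column, using a two-row version of the data; the names first_last and cols are illustrative and not part of the answer above:

```python
import pandas as pd

df = pd.DataFrame(
    {
        "x": [[22, 23], [44, 56, 76]],
        "y": [[34, 35, 65], [16, 34, 76]],
        "z": [[23, 29, 31], [21, 34]],
    },
    index=list("ab"),
)


def first_last(values):
    """Return the first and last items of a list."""
    return [values[0], values[-1]]


cols = ["x", "z"]  # columns to transform; y is left untouched
for col in cols:
    # Series.map applies the function to each element of the column.
    df[col] = df[col].map(first_last)

print(df)
```

Looping over the column names keeps the selection explicit and avoids assigning a whole sub-DataFrame back at once.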
I found the mistake I was making.
Thanks for the reply.
I changed the function as follows:
def fl(x):
    new = []
    for i in x:
        new.append([i[0], i[-1]])
    return new
Then I applied the function like this:
df.apply(lambda x : fl(x) if x.name in ['x', 'z'] else x)
Now I get the expected output.
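The reason the check on x.name works: DataFrame.apply passes each column to the function as a Series, and the column label is stored in that Series' name attribute, so the Series itself cannot be compared against a list of labels. A small sketch on a two-row version of the data (col_names, first_last and result are illustrative names):

```python
import pandas as pd

df = pd.DataFrame(
    {
        "x": [[22, 23], [44, 56, 76]],
        "y": [[34, 35, 65], [16, 34, 76]],
        "z": [[23, 29, 31], [21, 34]],
    },
    index=list("ab"),
)

# Each column arrives as a Series whose .name is the column label.
col_names = df.apply(lambda col: col.name)
print(col_names.tolist())


def first_last(values):
    """Return the first and last items of a list."""
    return [values[0], values[-1]]


# Transform only the columns whose label is in the list; pass the rest through.
result = df.apply(
    lambda col: col.map(first_last) if col.name in ["x", "z"] else col
)
print(result)
```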
I generate graphs from a large set of JSON files, and I have no a priori information about node positions in the graph image. As a result, when I draw these graphs, the nodes and edges are arranged unevenly, with a lot of unused empty space.
The following is an example of a program that generates a connected graph of 38 nodes.
With the default NetworkX image size, connected nodes overlap each other, and with an increased image size, large empty areas appear.
How can I create a layout that arranges nodes and edges evenly, taking the image size into account, without large empty spaces?
import networkx as nx
import matplotlib.pyplot as plt
import random
import string
def generate_label(i):
    label = str(i)+':'+random.choice(['q','a'])+':' \
            +''.join(random.sample(string.ascii_letters, 3))
    return label
edges = [[0, 16], [1, 13], [2, 20], [17, 2], [3, 28], [17, 3], [4, 27],
[17, 4], [7, 26], [17, 7], [21, 9], [29, 10], [31, 11], [32, 12],
[1, 13], [0, 16], [17, 18], [17, 2], [17, 21], [17, 22], [17, 3],
[17, 4], [17, 29], [17, 24], [17, 7], [18, 19], [17, 18], [18, 19],
[2, 20], [21, 9], [17, 21], [22, 23], [17, 22], [22, 23], [24, 25],
[17, 24], [24, 25], [7, 26], [4, 27], [3, 28], [29, 10], [17, 29],
[30, 31], [30, 32], [30, 33], [30, 31], [31, 11], [30, 32], [32, 12],
[30, 33], [34, 35], [34, 35]]
G = nx.Graph()
for i in range(38):
    G.add_node(i, label = generate_label(i))
for e in edges:
    G.add_edge(e[0], e[1])
labels = nx.get_node_attributes(G, 'label')
plt.figure(figsize=(14,20))
nx.draw_networkx(nx.relabel_nodes(G, labels), with_labels=True,
                 node_color = 'orange', node_size=200, font_size=12)
plt.show()
The following code indicates that the Einstein sum of two 3D (2x2x2) matrices is a 4D (2x2x2x2) matrix.
$ c_{i,j,l,m} = \sum_k a_{i,j,k} b_{k,l,m} $
$ c_{0,0,0,0} = \sum_k a_{0,0,k} b_{k,0,0} = 1 \times 9 + 5 \times 11 = 64 $
But $ c_{0,0,0,0} = 35 $ according to the result below:
>>> a=np.array([[[1,2],[3,4]],[[5,6],[7,8]]])
>>> b=np.array([[[9,10],[11,12]],[[13,14],[15,16]]])
>>> c=np.einsum('ijk,klm->ijlm', a,b)
>>> c
array([[[[ 35,  38],
         [ 41,  44]],

        [[ 79,  86],
         [ 93, 100]]],


       [[[123, 134],
         [145, 156]],

        [[167, 182],
         [197, 212]]]])
Could someone explain how the operation is carried out?
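The particular element that you are testing, [0,0,0,0], is the sum of the following elementwise product, 9 + 26 = 35. The sum runs over the last axis of a and the first axis of b: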
In [167]: a[0,0,:]*b[:,0,0]
Out[167]: array([ 9, 26])
In [168]: a[0,0,:]
Out[168]: array([1, 2])
In [169]: b[:,0,0]
Out[169]: array([ 9, 13])
It may be easier to understand if we reshape both arrays to 2d:
In [170]: A=a.reshape(-1,2); B=b.reshape(2,-1)
In [171]: A
Out[171]:
array([[1, 2],
[3, 4],
[5, 6],
[7, 8]])
In [172]: B
Out[172]:
array([[ 9, 10, 11, 12],
[13, 14, 15, 16]])
In [173]: A@B
Out[173]:
array([[ 35, 38, 41, 44],
[ 79, 86, 93, 100],
[123, 134, 145, 156],
[167, 182, 197, 212]])
These are the same numbers, but in shape (4,4) instead of (2,2,2,2). It's easier to read the (1,2) row and the (9,13) column off of A and B.
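The pieces above can be checked together in one short script. This is only a sketch: manual, c_matmul and c_tensordot are illustrative names, and np.tensordot is brought in as an additional cross-check that the answer itself does not use.

```python
import numpy as np

a = np.array([[[1, 2], [3, 4]], [[5, 6], [7, 8]]])
b = np.array([[[9, 10], [11, 12]], [[13, 14], [15, 16]]])

c = np.einsum('ijk,klm->ijlm', a, b)

# k runs over the LAST axis of a and the FIRST axis of b:
# a[0, 0, :] is (1, 2) and b[:, 0, 0] is (9, 13), so 1*9 + 2*13.
manual = sum(a[0, 0, k] * b[k, 0, 0] for k in range(2))
print(manual, c[0, 0, 0, 0])

# The same contraction as a 2D matrix product, reshaped back to 4D.
c_matmul = (a.reshape(-1, 2) @ b.reshape(2, -1)).reshape(2, 2, 2, 2)

# tensordot with axes=1 contracts the last axis of a with the first axis of b.
c_tensordot = np.tensordot(a, b, axes=1)

print(np.array_equal(c, c_matmul), np.array_equal(c, c_tensordot))
```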
Looking at the answers to this question: How to understand numpy's combined slicing and indexing example
I'm still unable to understand the result of indexing with a combination of a slice and two 1d arrays, like this:
>>> m = np.arange(36).reshape(3,3,4)
>>> m
array([[[ 0,  1,  2,  3],
        [ 4,  5,  6,  7],
        [ 8,  9, 10, 11]],

       [[12, 13, 14, 15],
        [16, 17, 18, 19],
        [20, 21, 22, 23]],

       [[24, 25, 26, 27],
        [28, 29, 30, 31],
        [32, 33, 34, 35]]])
>>> m[1:3, [2,1],[2,1]]
array([[22, 17],
       [34, 29]])
Why is the result equivalent to this?
np.array([
    [m[1,2,2],m[1,1,1]],
    [m[2,2,2],m[2,1,1]]
])
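One way to see the equivalence is to spell out the pairing by hand. In the sketch below (rows, cols, expected and outer are illustrative names), the two index lists are matched element by element, while the slice is applied independently and keeps its own axis:

```python
import numpy as np

m = np.arange(36).reshape(3, 3, 4)

rows = np.array([2, 1])  # index array for axis 1
cols = np.array([2, 1])  # index array for axis 2

result = m[1:3, rows, cols]

# The index arrays are paired element by element: (2, 2) and (1, 1).
# The slice 1:3 selects i = 1 and i = 2 and contributes the first result axis.
expected = np.array([[m[i, r, c] for r, c in zip(rows, cols)]
                     for i in range(1, 3)])
print(result)
print(np.array_equal(result, expected))

# For contrast, indexing one axis at a time takes every (row, col)
# combination instead of pairs, which gives a (2, 2, 2) result.
outer = m[1:3][:, rows][:, :, cols]
print(outer.shape)
```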
X.shape == (10,4)
y.shape == (10,)
I'd like to produce M, where each entry in M is defined as M[r,c] == X[r, y[r]]; that is, use y to index into the appropriate column of X.
How can I do this efficiently (without loops)?
M could have a single column, though eventually I need to broadcast it so that it has the same shape as X. c starts from the first col of X (0) and goes to the last (9).
Just do:
import numpy as np
X = np.arange(40).reshape(10, 4)
Y = np.random.randint(0, 4, 10)
M = X[range(10), Y]
For example:
In [8]: X
Out[8]:
array([[ 0, 1, 2, 3],
[ 4, 5, 6, 7],
[ 8, 9, 10, 11],
[12, 13, 14, 15],
[16, 17, 18, 19],
[20, 21, 22, 23],
[24, 25, 26, 27],
[28, 29, 30, 31],
[32, 33, 34, 35],
[36, 37, 38, 39]])
In [9]: Y
Out[9]: array([1, 1, 3, 3, 1, 2, 2, 3, 2, 1])
In [10]: M
Out[10]: array([ 1, 5, 11, 15, 17, 22, 26, 31, 34, 37])
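The question also asks for M to end up with the same shape as X. A hedged sketch of one way to get there, reusing the Y values from the sample run above; np.take_along_axis and np.broadcast_to are swapped in here and are not part of the answer, and M_col and M_full are illustrative names:

```python
import numpy as np

X = np.arange(40).reshape(10, 4)
Y = np.array([1, 1, 3, 3, 1, 2, 2, 3, 2, 1])  # column to pick in each row

# Pair row r with column Y[r].
M = X[np.arange(X.shape[0]), Y]

# Alternative that keeps a column axis, giving shape (10, 1).
M_col = np.take_along_axis(X, Y[:, None], axis=1)

# Broadcast to X's shape as a read-only view; no data is copied.
M_full = np.broadcast_to(M_col, X.shape)

print(M)
print(M_full.shape)
```

If a writable array is needed, calling .copy() on the broadcast result materializes it.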
Suppose I have a (50, 5) array. Is there a way to shuffle it in groups of rows (sequences of datapoints), i.e. instead of shuffling every row individually, shuffle chunks of, say, 5 rows?
Thanks
Approach #1: Reshape into a 3D array based on the group size, index along the block axis with shuffled block indices obtained from np.random.permutation, and finally reshape back to 2D:
N = 5 # Blocks of N rows
M,n = a.shape[0]//N, a.shape[1]
out = a.reshape(M,-1,n)[np.random.permutation(M)].reshape(-1,n)
Sample run -
In [141]: a
Out[141]:
array([[89, 26, 12],
[97, 60, 96],
[94, 38, 54],
[41, 63, 29],
[88, 62, 48],
[95, 66, 32],
[28, 58, 80],
[26, 35, 89],
[72, 91, 38],
[26, 70, 93]])
In [142]: N = 2 # Blocks of N rows
In [143]: M,n = a.shape[0]//N, a.shape[1]
In [144]: a.reshape(M,-1,n)[np.random.permutation(M)].reshape(-1,n)
Out[144]:
array([[94, 38, 54],
[41, 63, 29],
[28, 58, 80],
[26, 35, 89],
[89, 26, 12],
[97, 60, 96],
[72, 91, 38],
[26, 70, 93],
[88, 62, 48],
[95, 66, 32]])
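Approach #1 can be wrapped in a small helper. This is a sketch under a few assumptions: the name shuffle_blocks, the divisibility check, and the use of np.random.default_rng for a reproducible run are my additions rather than part of the approach as posted.

```python
import numpy as np


def shuffle_blocks(a, block_size, rng=None):
    """Return a copy of 2D array `a` with blocks of `block_size` rows shuffled."""
    rng = np.random.default_rng() if rng is None else rng
    n_rows, n_cols = a.shape
    if n_rows % block_size:
        raise ValueError("number of rows must be a multiple of block_size")
    n_blocks = n_rows // block_size
    # Random order of the blocks; rows inside each block keep their order.
    order = rng.permutation(n_blocks)
    return a.reshape(n_blocks, block_size, n_cols)[order].reshape(n_rows, n_cols)


a = np.arange(50 * 5).reshape(50, 5)
out = shuffle_blocks(a, 5, rng=np.random.default_rng(0))
print(out[:10])
```

Because the blocks are selected with an index array, the result is a new array and the input is left unchanged.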
Approach #2: One can also simply use np.random.shuffle for an in-place change, since it shuffles only along the first axis:
np.random.shuffle(a.reshape(M,-1,n))
Sample run -
In [156]: a
Out[156]:
array([[15, 12, 14],
[55, 39, 35],
[73, 78, 36],
[54, 52, 32],
[83, 34, 91],
[42, 11, 98],
[27, 65, 47],
[78, 75, 82],
[33, 52, 93],
[87, 51, 80]])
In [157]: N = 2 # Blocks of N rows
In [158]: M,n = a.shape[0]//N, a.shape[1]
In [159]: np.random.shuffle(a.reshape(M,-1,n))
In [160]: a
Out[160]:
array([[15, 12, 14],
[55, 39, 35],
[27, 65, 47],
[78, 75, 82],
[73, 78, 36],
[54, 52, 32],
[33, 52, 93],
[87, 51, 80],
[83, 34, 91],
[42, 11, 98]])
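Approach #2 changes a only because the reshaped array is a view onto the same memory. A minimal sketch that makes this explicit with np.shares_memory before shuffling; the check and the seeded Generator are my additions, and blocks is an illustrative name:

```python
import numpy as np

a = np.arange(30).reshape(10, 3)
N = 2  # blocks of N rows
M, n = a.shape[0] // N, a.shape[1]

blocks = a.reshape(M, -1, n)

# The in-place shuffle only reaches `a` if `blocks` is a view, not a copy.
if not np.shares_memory(a, blocks):
    raise RuntimeError("reshape returned a copy; shuffling it would not change a")

rng = np.random.default_rng(0)
rng.shuffle(blocks)  # shuffles along axis 0, i.e. whole blocks, in place

print(a)
```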