How to add 2 or more kinds of Vertices into SGraph in GraphLab Create? - graphlab

I am using GraphLab Create on Ubuntu. I tried to add 2 kinds of vertices from 2 CSV files using the following commands:
import graphlab as gl
v1 = gl.SFrame.read_csv('~/Documents/1.csv')
v2= gl.SFrame.read_csv('~/Documents/2.csv')
g = g.add_vertices(vertices=v1, vid_field='name')
g = g.add_vertices(vertices=v2, vid_field='id')
But it does not work: after I run the last command to add the second kind of vertices, the vertices I added the first time are overwritten. How can I do this correctly? And how can I correctly add 2 kinds of edges?
Thanks in advance!

In the following example, I create two sets of vertices and add them to a graph, then create two sets of edges and add them to the graph.
>>> a = graphlab.SFrame({'id': [0, 1, 2, 3]})
>>> b = graphlab.SFrame({'name': [5, 6, 7]})
>>> g = graphlab.SGraph().add_vertices(a, 'id').add_vertices(b, 'name')
>>> e1 = graphlab.SFrame({'id': [0, 0, 1], 'name': [6, 6, 5]})
>>> e2 = graphlab.SFrame({'id': [2, 3], 'name': [5, 7]})
>>> g = g.add_edges(e1, 'id', 'name').add_edges(e2, 'id', 'name')
>>> g
SGraph({'num_edges': 5, 'num_vertices': 7})
Vertex Fields:['__id']
Edge Fields:['__src_id', '__dst_id']

Related

get elements in one array while not in other array along with axis 0 [duplicate]

I have 2 2d numpy arrays A and B
I want to remove all the rows in A which appear in B.
I tried something like this:
A[~np.isin(A, B)]
but isin keeps the dimensions of A, I need one boolean value per row to filter it.
EDIT: something like this
A = np.array([[3, 0, 4],
[3, 1, 1],
[0, 5, 9]])
B = np.array([[1, 1, 1],
[3, 1, 1]])
.....
A = np.array([[3, 0, 4],
[0, 5, 9]])
Probably not the most performant solution, but does exactly what you want. You can change the dtype of A and B to be a unit consisting of one row. You need to ensure that the arrays are contiguous first, e.g. with ascontiguousarray:
Av = np.ascontiguousarray(A).view(np.dtype([('', A.dtype, A.shape[1])])).ravel()
Bv = np.ascontiguousarray(B).view(Av.dtype).ravel()
Now you can apply np.isin directly:
>>> np.isin(Av, Bv)
array([False, True, False])
According to the docs, invert=True is faster than negating the output of isin, so you can do
A[np.isin(Av, Bv, invert=True)]
Try the following - it uses matrix multiplication for dimensionality reduction:
import numpy as np
A = np.array([[3, 0, 4],
[3, 1, 1],
[0, 5, 9]])
B = np.array([[1, 1, 1],
[3, 1, 1]])
arr_max = np.maximum(A.max(0) + 1, B.max(0) + 1)
print (A[~np.isin(A.dot(arr_max), B.dot(arr_max))])
Output:
[[3 0 4]
[0 5 9]]
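One caveat with the dot-product encoding above: distinct rows can map to the same number (with weights [4, 6, 10], both [3, 0, 0] and [0, 2, 0] give 12), so a row could be removed by mistake. A minimal sketch of a collision-free variant, assuming all entries are non-negative integers and the product of the per-column ranges fits in a 64-bit integer, uses np.ravel_multi_index. The helper name rows_not_in is hypothetical:

```python
import numpy as np

def rows_not_in(A, B):
    """Return the rows of A that do not appear in B (non-negative ints only)."""
    # One "digit range" per column, large enough for both arrays.
    dims = tuple(int(d) for d in np.maximum(A.max(0), B.max(0)) + 1)
    # Mixed-radix encoding: every distinct row gets a distinct integer code.
    codes_A = np.ravel_multi_index(tuple(A.T), dims)
    codes_B = np.ravel_multi_index(tuple(B.T), dims)
    return A[np.isin(codes_A, codes_B, invert=True)]

A = np.array([[3, 0, 4],
              [3, 1, 1],
              [0, 5, 9]])
B = np.array([[1, 1, 1],
              [3, 1, 1]])
result = rows_not_in(A, B)
print(result)
```

Because the encoding is a proper mixed-radix number rather than a weighted sum with arbitrary weights, two different rows cannot share a code.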
This is certainly not the most performant solution but it is relatively easy to read:
A = np.array([row for row in A if row not in B])
Edit:
I found that the code above does not work correctly, but this does:
A = [row for row in A if not any(np.equal(B, row).all(1))]
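The same row-by-row comparison can also be written without a Python loop. This is only a sketch, assuming A and B are small enough that an intermediate boolean array of shape (len(A), len(B), n_cols) fits in memory; the names matches and in_B are made up for the illustration:

```python
import numpy as np

A = np.array([[3, 0, 4],
              [3, 1, 1],
              [0, 5, 9]])
B = np.array([[1, 1, 1],
              [3, 1, 1]])

# matches[i, j] is True when row i of A equals row j of B.
matches = (A[:, None, :] == B[None, :, :]).all(axis=2)
# in_B[i] is True when row i of A equals at least one row of B.
in_B = matches.any(axis=1)
result = A[~in_B]
print(result)
```

This gives one boolean per row of A, which is exactly what is needed to filter it.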

Transform polygons using matrices

What is the better way to transform polygons using matrices?
mat = np.array([[-0.75, 0],[0,-0.75]])
a = np.array([0, 4])
b = np.array([-4, 8])
c = np.array([-8, 0])
d = np.array([-8, -4])
e = np.array([-4, -4])
np.dot(mat,a)
np.dot(mat,b)
np.dot(mat,c)
np.dot(mat,d)
np.dot(mat,e)
Dot products (shorthand: @) can be broadcast, but you need to make sure the axes align correctly.
mat = np.array([[-0.75,  0   ],
                [ 0,    -0.75]])
poly = np.array([[ 0,  4], # a
                 [-4,  8], # b
                 [-8,  0], # c
                 [-8, -4], # d
                 [-4, -4]]) # e
out = (mat @ poly.T).T
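As a quick check of the axis alignment, the following sketch compares the one-vertex-at-a-time version from the question with a batched product. It assumes the vertices are stored one per row; the names per_vertex and batched are only illustrative:

```python
import numpy as np

mat = np.array([[-0.75, 0.0],
                [0.0, -0.75]])
poly = np.array([[0, 4],
                 [-4, 8],
                 [-8, 0],
                 [-8, -4],
                 [-4, -4]])

# One matrix-vector product per vertex, as in the question.
per_vertex = np.array([np.dot(mat, v) for v in poly])

# All vertices at once: with one vertex per row, multiply by mat.T on the right.
batched = poly @ mat.T

print(np.allclose(per_vertex, batched))
```

Multiplying by mat.T on the right is the same as transposing the vertices, multiplying by mat on the left, and transposing back.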

Multiply every row of a matrix with every row of another matrix

In numpy / PyTorch, I have two matrices, e.g. X=[[1,2],[3,4],[5,6]], Y=[[1,1],[2,2]]. I would like to dot product every row of X with every row of Y, and have the results
[[3, 6],[7, 14], [11,22]]
How do I achieve this? Thanks!
I think this is what you are looking for:
import numpy as np
x= [[1,2],[3,4],[5,6]]
y= [[1,1],[2,2]]
x = np.asarray(x) #convert list to numpy array
y = np.asarray(y) #convert list to numpy array
product = np.dot(x, y.T)
.T transposes the matrix, which is necessary here because a matrix product pairs the rows of the first operand with the columns of the second. print(product) will output:
[[ 3 6]
[ 7 14]
[11 22]]
Using einsum
np.einsum('ij,kj->ik', X, Y)
array([[ 3, 6],
[ 7, 14],
[11, 22]])
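To see that these formulations agree, here is a small sketch that computes the same table of row-by-row dot products three ways. The variable names are only for illustration:

```python
import numpy as np

X = np.array([[1, 2], [3, 4], [5, 6]])
Y = np.array([[1, 1], [2, 2]])

# Entry [i, k] is the dot product of row i of X with row k of Y.
via_matmul = X @ Y.T
via_einsum = np.einsum('ij,kj->ik', X, Y)
# Explicit broadcasting: pair every row of X with every row of Y, then sum.
via_broadcast = (X[:, None, :] * Y[None, :, :]).sum(axis=2)

print(via_matmul)
```

The broadcasting version builds a (3, 2, 2) intermediate array, so the matrix product or einsum is usually preferable for large inputs.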
In PyTorch, you can achieve this using torch.mm(a, b) or torch.matmul(a, b), as shown below:
import numpy as np
import torch
x = np.array([[1,2],[3,4],[5,6]])
y = np.array([[1,1],[2,2]])
x = torch.from_numpy(x)
y = torch.from_numpy(y)
# print(torch.matmul(x, torch.t(y)))
print(torch.mm(x, torch.t(y)))
output:
tensor([[ 3, 6],
[ 7, 14],
[11, 22]], dtype=torch.int32)

Perform matrix multiplication between two arrays and get result only on masked places

I have two dense matrices, A [200000,10], B [10,100000]. I need to multiply them to get matrix C. I can't do that directly, since the resulting matrix won't fit into the memory. Moreover, I need only a few elements from the resulting matrix, like 1-2% of the total number of elements. I have a third matrix W [200000,100000] which is sparse and has non-zero elements on exactly those places which are interesting to me in the matrix C.
Is there a way to use W as a "mask" so that the resulting matrix C will be sparse and will contain only the needed elements?
Since a matrix multiplication is just a table of dot products, we can just perform the specific dot products we need, in a vectorized fashion.
import numpy as np
from scipy import sparse
iX, iY = W.nonzero()  # row and column indices of the non-zero entries of W
values = np.sum(A[iX] * B[:, iY].T, axis=-1)  # batched dot product
C = sparse.coo_matrix((values, (iX, iY)), shape=W.shape)
First, get the indices of the non-zero entries of W. The (i, j) element of the result is then the dot product of the i-th row of A with the j-th column of B. Store each result as a triple (i, j, value) instead of in a dense matrix; this coordinate layout is a standard way to store sparse matrices.
Here's one approach using np.einsum for a vectorized solution -
from scipy import sparse
from scipy.sparse import coo_matrix
# Get row, col for the output array
r,c,_= sparse.find(W)
# Get the sum-reduction using valid rows and corresponding cols from A, B
out = np.einsum('ij,ji->i',A[r],B[:,c])
# Store as sparse matrix
out_sparse = coo_matrix((out, (r, c)), shape=W.shape)
Sample run -
1) Inputs :
In [168]: A
Out[168]:
array([[4, 6, 1, 1, 1],
[0, 8, 1, 3, 7],
[2, 8, 3, 2, 2],
[3, 4, 1, 6, 3]])
In [169]: B
Out[169]:
array([[5, 2, 4],
[2, 1, 3],
[7, 7, 2],
[5, 7, 5],
[8, 5, 0]])
In [176]: W
Out[176]:
<4x3 sparse matrix of type '<type 'numpy.bool_'>'
with 5 stored elements in Compressed Sparse Row format>
In [177]: W.toarray()
Out[177]:
array([[ True, False, False],
[False, False, False],
[ True, True, False],
[ True, False, True]], dtype=bool)
2) Using dense array to perform direct calculations and verify results later on :
In [171]: (A.dot(B))*W.toarray()
Out[171]:
array([[52, 0, 0],
[ 0, 0, 0],
[73, 57, 0],
[84, 0, 56]])
3) Use the proposed codes and get sparse matrix output :
In [172]: # Using proposed codes
...: r,c,_= sparse.find(W)
...: out = np.einsum('ij,ji->i',A[r],B[:,c])
...: out_sparse = coo_matrix((out, (r, c)), shape=W.shape)
...:
4) Finally verify results by converting to dense/array version and checking against direct version -
In [173]: out_sparse.toarray()
Out[173]:
array([[52, 0, 0],
[ 0, 0, 0],
[73, 57, 0],
[84, 0, 56]])
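With hundreds of millions of non-zero positions in W, the temporaries A[r] and B[:, c] can themselves become large. A possible variation, sketched here under the assumption that processing the positions in slices is acceptable, computes the dot products chunk by chunk. The function name masked_matmul and the chunk_size parameter are hypothetical:

```python
import numpy as np
from scipy import sparse

def masked_matmul(A, B, W, chunk_size=100_000):
    """Compute (A @ B) only at the non-zero positions of sparse W."""
    r, c, _ = sparse.find(W)  # row and column indices of the non-zero entries
    out = np.empty(r.shape[0], dtype=np.result_type(A, B))
    for start in range(0, r.shape[0], chunk_size):
        stop = start + chunk_size
        # Row-wise dot products for this slice of positions only.
        out[start:stop] = np.einsum('ij,ij->i',
                                    A[r[start:stop]],
                                    B[:, c[start:stop]].T)
    return sparse.coo_matrix((out, (r, c)), shape=W.shape)

A = np.array([[4, 6, 1, 1, 1],
              [0, 8, 1, 3, 7],
              [2, 8, 3, 2, 2],
              [3, 4, 1, 6, 3]])
B = np.array([[5, 2, 4],
              [2, 1, 3],
              [7, 7, 2],
              [5, 7, 5],
              [8, 5, 0]])
W = sparse.csr_matrix(np.array([[1, 0, 0],
                                [0, 0, 0],
                                [1, 1, 0],
                                [1, 0, 1]]))
C = masked_matmul(A, B, W, chunk_size=2)
print(C.toarray())
```

The chunk size trades peak memory for the number of loop iterations; the result is the same as in the unchunked version.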

Transpose of a vector using numpy

I am having an issue with Ipython - Numpy. I want to do the following operation:
x^T.x
where x^T denotes the transpose of the vector x. x is loaded from a text file with:
x = np.loadtxt('myfile.txt')
The problem is that if I use the transpose function
np.transpose(x)
and then check the shape of the result, I get the same dimensions for x and x^T. NumPy prints each dimension with an uppercase L suffix, e.g.
print x.shape
print np.transpose(x).shape
(3L, 5L)
(3L, 5L)
Does anybody know how to solve this, and compute x^T.x as a matrix product?
Thank you!
What np.transpose does is reverse the shape tuple: feed it an array of shape (m, n) and it returns an array of shape (n, m); feed it an array of shape (n,) and it returns the same array with shape (n,).
What you are implicitly expecting is for numpy to take your 1D vector as a 2D array of shape (1, n), that will get transposed into a (n, 1) vector. Numpy will not do that on its own, but you can tell it that's what you want, e.g.:
>>> a = np.arange(4)
>>> a
array([0, 1, 2, 3])
>>> a.T
array([0, 1, 2, 3])
>>> a[np.newaxis, :].T
array([[0],
[1],
[2],
[3]])
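To connect this to the original x^T.x question, here is a short sketch with a made-up 1D vector. It shows the explicit row and column shapes and the resulting inner and outer products:

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0])   # 1D array, shape (3,)

row = x[np.newaxis, :]          # shape (1, 3)
col = x[:, np.newaxis]          # shape (3, 1)

inner = row @ col               # shape (1, 1), holds the scalar product
outer = col @ row               # shape (3, 3)

# For 1D arrays, @ already computes the scalar product directly.
scalar = x @ x

print(x.T.shape, row.shape, col.shape)
print(inner, scalar)
```

The element-wise * operator never changes the shape of a 1D array, which is why multiplying x by its "transpose" with * returns a vector.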
As explained by others, transposition won't "work" like you want it to for 1D arrays.
You might want to use np.atleast_2d to have a consistent scalar product definition:
def vprod(x):
    y = np.atleast_2d(x)
    return np.dot(y.T, y)
I had the same problem, I used numpy matrix to solve it:
# assuming x is a list or a numpy 1d-array
>>> x = [1,2,3,4,5]
# convert it to a numpy matrix
>>> x = np.matrix(x)
>>> x
matrix([[1, 2, 3, 4, 5]])
# take the transpose of x
>>> x.T
matrix([[1],
[2],
[3],
[4],
[5]])
# use * for the matrix product
>>> x*x.T
matrix([[55]])
>>> (x*x.T)[0,0]
55
>>> x.T*x
matrix([[ 1, 2, 3, 4, 5],
[ 2, 4, 6, 8, 10],
[ 3, 6, 9, 12, 15],
[ 4, 8, 12, 16, 20],
[ 5, 10, 15, 20, 25]])
While using numpy matrices may not be the best way to represent your data from a coding perspective, it's pretty good if you are going to do a lot of matrix operations!
For starters L just means that the type is a long int. This shouldn't be an issue. You'll have to give additional information about your problem though since I cannot reproduce it with a simple test case:
In [1]: import numpy as np
In [2]: a = np.arange(12).reshape((4,3))
In [3]: a
Out[3]:
array([[ 0, 1, 2],
[ 3, 4, 5],
[ 6, 7, 8],
[ 9, 10, 11]])
In [4]: a.T #same as np.transpose(a)
Out[4]:
array([[ 0, 3, 6, 9],
[ 1, 4, 7, 10],
[ 2, 5, 8, 11]])
In [5]: a.shape
Out[5]: (4, 3)
In [6]: np.transpose(a).shape
Out[6]: (3, 4)
There is likely something subtle going on with your particular case which is causing problems. Can you post the contents of the file that you're reading into x?
This is either the inner or outer product of the two vectors, depending on the orientation you assign to them. Here is how to calculate either without changing x.
import numpy
x = numpy.array([1, 2, 3])
inner = x.dot(x)
outer = numpy.outer(x, x)
The file 'myfile.txt' contains lines such as
5.100000 3.500000 1.400000 0.200000 1
4.900000 3.000000 1.400000 0.200000 1
Here is the code I run:
import numpy as np
data = np.loadtxt('iris.txt')
x = data[1,:]
print x.shape
print np.transpose(x).shape
print x*np.transpose(x)
print np.transpose(x)*x
And I get as a result
(5L,)
(5L,)
[ 24.01 9. 1.96 0.04 1. ]
[ 24.01 9. 1.96 0.04 1. ]
I would expect one of the last two results to be a scalar instead of a vector, because x^T.x (or x.x^T) should give a scalar.
b = np.array([1, 2, 2])
print(b)
print(np.transpose([b]))
print("rows, cols: ", b.shape)
print("rows, cols: ", np.transpose([b]).shape)
Results in
[1 2 2]
[[1]
[2]
[2]]
rows, cols: (3,)
rows, cols: (3, 1)
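Here (3,) is the shape of a 1D array: it has a single axis of length 3 and no second axis to swap, so transposing it changes nothing.
However, if you want the transpose of a matrix A, np.transpose(A) is the solution. In short, wrapping an array in [] adds a leading axis: it turns a vector into a one-row matrix, and a matrix into a higher-dimensional tensor.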