Can we use pytorch scatter_ on GPU - gpu

I'm trying to do one hot encoding on some data with pyTorch on GPU mode, however, it keeps giving me an exception. Can anybody help me?
Here's one example:
def char_OneHotEncoding(x):
coded = torch.zeros(x.shape[0], x.shape[1], 101)
for i in range(x.shape[1]):
coded[:,i] = scatter(x[:,i])
return coded
def scatter(x):
return torch.zeros(x.shape[0], 101).scatter_(1, x.view(-1,1), 1)
So if I give it an tensor on GPU, it shows like this:
x_train = [[ 0, 0, 0, 0, 0],
[ 0, 0, 0, 0, 0],
[ 0, 0, 0, 0, 0],
[14, 13, 83, 18, 14],
[ 0, 0, 0, 0, 0]]
print(char_OneHotEncoding(torch.tensor(x_train, dtype=torch.long).cuda()).shape)
---------------------------------------------------------------------------
RuntimeError Traceback (most recent call last)
<ipython-input-62-95c0c4ade406> in <module>()
4 [14, 13, 83, 18, 14],
5 [ 0, 0, 0, 0, 0]]
----> 6 print(char_OneHotEncoding(torch.tensor(x_train, dtype=torch.long).cuda()).shape)
7 x_train[:5, maxlen:maxlen+5]
<ipython-input-53-055f1bf71306> in char_OneHotEncoding(x)
2 coded = torch.zeros(x.shape[0], x.shape[1], 101)
3 for i in range(x.shape[1]):
----> 4 coded[:,i] = scatter(x[:,i])
5 return coded
6
<ipython-input-53-055f1bf71306> in scatter(x)
7
8 def scatter(x):
----> 9 return torch.zeros(x.shape[0], 101).scatter_(1, x.view(-1,1), 1)
RuntimeError: Expected object of backend CPU but got backend CUDA for argument #3 'index'
BTW, if we simply remove the .cuda() here, everything goes one well
print(char_OneHotEncoding(torch.tensor(x_train, dtype=torch.long)).shape)
torch.Size([5, 5, 101])

Yes, it is possible. You have to pay attention that all tensors are on GPU. In particular, by default, constructors like torch.zeros allocate on CPU, which will lead to this kind of mismatches. Your code can be fixed by constructing with device=x.device, as below
import torch
def char_OneHotEncoding(x):
coded = torch.zeros(x.shape[0], x.shape[1], 101, device=x.device)
for i in range(x.shape[1]):
coded[:,i] = scatter(x[:,i])
return coded
def scatter(x):
return torch.zeros(x.shape[0], 101, device=x.device).scatter_(1, x.view(-1,1), 1)
x_train = torch.tensor([
[ 0, 0, 0, 0, 0],
[ 0, 0, 0, 0, 0],
[ 0, 0, 0, 0, 0],
[14, 13, 83, 18, 14],
[ 0, 0, 0, 0, 0]
], dtype=torch.long, device='cuda')
print(char_OneHotEncoding(x_train).shape)
Another alternative are constructors called xxx_like, for instance zeros_like, though in this case, since you need different shapes than x, I found device=x.device more readable.

Related

How to use an input layer that also feeds on a previous layer of a neural network?

Let's say I want to predict the winner of a tag-team race, where some drivers are more usually place higher in certain weather conditions:
Race |Driver | Weather | Time
Dummy1 |D1 | Rain | 2:00
Dummy1 |D2 | Rain | 5:00
Dummy1 |D3 | Rain | 4:50
Dummy2 |D1 | Sunny | 3:00
Dummy2 |D2 | Sunny | 2:50
Dummy2 |D2 | Sunny | 2:30
...
The logic is that a team composed of D1 and D3 would outperform any other combination on Rain, but wouldn't have the same luck on other weather. With that said, I thought about the following model:
Layer 1 | Layer 2 | Layer 3 (output)
Driver encoding | weather encoding | expected race time
----------------------------------------------------------------
Input of 0 or 1 | sum(Layer 1 * weights | sum(Layer 2 * weights)
| * Input of 0 or 1) |
This means that layer 2 uses layer 1 as well as input values to compute a value.
The reason I want this architecture instead of having every feature on layer 1 is that I want different features to multiply each other instead of their sum.
I could not find anything like this, but it is probably just me not knowing the name of this approach. Can someone point me to sources or explain know how to replicate this on tensorflow/pytorch/any other lib?
Turns out it was actually pretty simple, for anyone that might stumble upon this post and would like to test this approach, here's rough code:
Racing dataset
# TEAM 1 TEAM 2 "Weather" "WON"
# "A","B","C","D","E", "A","B","C","D","E", W1 W2 W3 (combined times of team 1< combined times of team 2)
dataset=[[ 1, 1, 0, 0, 0, 0, 0, 1, 1, 0, 1, 0, 0, 1],
[ 1, 1, 0, 0, 0, 0, 0, 1, 0, 1, 1, 0, 0, 1],
[ 1, 1, 0, 0, 0, 0, 0, 0, 1, 1, 1, 0, 0, 1],
[ 1, 0, 1, 0, 0, 0, 1, 0, 1, 0, 1, 0, 0, 1],
[ 1, 0, 1, 0, 0, 0, 0, 0, 1, 1, 1, 0, 0, 0],
[ 1, 1, 0, 0, 0, 0, 0, 0, 1, 1, 0, 1, 0, 0],
[ 1, 1, 0, 0, 0, 0, 0, 1, 1, 0, 0, 1, 0, 0],
[ 1, 1, 0, 0, 0, 0, 0, 1, 0, 1, 0, 1, 0, 0],
[ 1, 0, 0, 0, 1, 0, 0, 1, 1, 0, 0, 1, 0, 0],
[ 0, 1, 1, 0, 0, 0, 0, 0, 1, 1, 0, 1, 0, 1],
]
inputs=[[x[0:-4],x[-4:-1]] for x in dataset]
results=[[x[-1]] for x in dataset]
Typings to make code more readable
from typing import Iterator
class InputLayer():
def __init__(self, inputs,useBias=False):
self.inputs=inputs
self.useBias=useBias
def __str__(self):
return "Layer of size "+ str(self.inputs)
def __repr__(self) -> str:
return self.__str__()
class InputLayerValue():
def __init__(self, values):
self.values=np.array(values)
Actual model
import torch
from torch import nn
class MutipleInputModel(nn.Module):
def __init__(self,input_layers:Iterator[InputLayer],output_size):
super(MutipleInputModel, self).__init__()
print(input_layers)
self.nns=[]
for i in range(len(input_layers)-1):
current:InputLayer=input_layers[i]
next:InputLayer=input_layers[i+1]
il=nn.Linear(current.inputs,next.inputs,current.useBias)
#To have hidden layers, you need to either use another model or create and attach multiple Linear models - nn.Linear(next.inputs,next.inputs)
name="nn"+str(i)
#models must be directly under self to be found by model.parameters()
self.__setattr__(name,il)
self.nns.append(name)
il=nn.Linear(input_layers[-1].inputs,output_size,current.useBias)
name="nnOutput"
self.__setattr__(name,il)
self.nns.append(name)
def forward(self, inputs:Iterator[InputLayerValue]):
inputsLen=len(inputs[0])
if inputsLen != len(self.nns):
raise Exception("Number of input values provided and input layers must be equal. Provided "+str(inputsLen)+" sets of inputs for a "+str(len(self.nns))+"-input-layer network")
#Initialize first layer of inputs with ones which will then be multiplied by the actual input values
lastOutput=torch.ones(len(inputs),len(inputs[0][0].values)) # Layer 1 Outputs | Layer 2 provided Inputs | Layer 2 actual Inputs
for i in range(inputsLen): # lastOutput | multiplier | input
multiplier=torch.from_numpy(np.array([x[i].values for x in inputs])).float() # 0.2 | 0 | 0
input=lastOutput*multiplier # 1.5 | 1 | 1.5
lastOutput=self.__getattr__(self.nns[i])(input) # 1.0 | 5 | 5
return lastOutput
Training
# Define hyperparameters
model = MutipleInputModel(input_layers=[InputLayer(len(x)) for x in inputs[0]],output_size=1)
n_epochs = 1000
lr=0.001
criterion = nn.BCEWithLogitsLoss()
optimizer = torch.optim.Adam(model.parameters(), lr=lr)
for epoch in range(1, n_epochs + 1):
optimizer.zero_grad() # Clears existing gradients from previous epoch
output = model([[InputLayerValue(y) for y in x] for x in inputs])
loss = criterion(output, torch.from_numpy(np.array(results)).float())
loss.backward()
optimizer.step()
print('Epoch: {}/{}.............'.format(epoch, n_epochs), end=' ')
print("Loss: {:.4f}".format(loss.item()))
Testing:
def predict(model, input):
input = [[InputLayerValue(y) for y in input]]
out = model(input)
return nn.Sigmoid()(out[0][0]).item()
print(predict(model,[[1, 1, 0, 0, 0, 0, 0, 1, 1, 0], [1, 0, 0]]))
print(predict(model,[[1, 1, 0, 0, 0, 0, 0, 1, 1, 0], [0, 1, 0]]))
print(predict(model,[[1, 1, 0, 0, 0, 0, 0, 1, 1, 0], [0, 0, 1]]))
This is a really basic implementation, but could easily be modified to have hidden layers.
Clearly needs further testing to see if it is actually better than a traditional NN, but I would say it is great for NN explainability.

PyTorch indexing by argmax

Dear community I have a challenge with regard to tensor indexing in PyTorch. The problem is very simple. Given a tensor create an index tensor to index its maximum values per column.
x = T.tensor([[0, 3, 0, 5, 9, 8, 2, 0],
[0, 4, 9, 6, 7, 9, 1, 0]])
Given this tensor I would like to build a boolean mask for indexing its maximum values per colum. To be specific I do not need its maximum values, torch.max(x, dim=0), nor its indices, torch.argmax(x, dim=0), but a boolean mask for indexing other tensor based on this tensor max values. My ideal output would be:
# Input tensor
x
tensor([[0, 3, 0, 5, 9, 8, 2, 0],
[0, 4, 9, 6, 7, 9, 1, 0]])
# Ideal output bool mask tensor
idx
tensor([[1, 0, 0, 0, 1, 0, 1, 1],
[0, 1, 1, 1, 0, 1, 0, 0]])
I know that values_max = x[idx] and values_max = x.max(dim=0) are equivalent but I am not looking for values_max but for idx.
I have built a solution around it but it just seem to complex and I am sure torch have an optimized way to do this. I have tried to use torch.index_select with the output of x.argmax(dim=0) but failed so I built a custom solution that seems to cumbersome to me so I am asking for help to do this in a vectorized / tensorial / torch way.
You can perform this operation by first extracting the index of the maximum value column-wise of your tensor with torch.argmax, setting keepdim to True
>>> x.argmax(0, keepdim=True)
tensor([[0, 1, 1, 1, 0, 1, 0, 0]])
Then you can use torch.scatter to place 1s in a zero tensor at the designated indices:
>>> torch.zeros_like(x).scatter(0, x.argmax(0,True), value=1)
tensor([[1, 0, 0, 0, 1, 0, 1, 1],
[0, 1, 1, 1, 0, 1, 0, 0]])

How to dot product (1,10^{13}) (10^13,1) scipy sparse matrix

Basically what the title entails.
The two matrices are mostly zeros. And the first is 1 x 9999999999999 and the second is 9999999999999 x 1
When I try to do a dot product I get this.
Unable to allocate 72.8 TiB for an array with shape (10000000000000,) and data type int64
Full traceback </br>
MemoryError: Unable to allocate 72.8 TiB for an array with shape (10000000000000,) and data type int64
In [31]: imputed.dot(s)
---------------------------------------------------------------------------
MemoryError Traceback (most recent call last)
<ipython-input-31-670cfc69d4cf> in <module>
----> 1 imputed.dot(s)
~/.local/lib/python3.8/site-packages/scipy/sparse/base.py in dot(self, other)
357
358 """
--> 359 return self * other
360
361 def power(self, n, dtype=None):
~/.local/lib/python3.8/site-packages/scipy/sparse/base.py in __mul__(self, other)
478 if self.shape[1] != other.shape[0]:
479 raise ValueError('dimension mismatch')
--> 480 return self._mul_sparse_matrix(other)
481
482 # If it's a list or whatever, treat it like a matrix
~/.local/lib/python3.8/site-packages/scipy/sparse/compressed.py in _mul_sparse_matrix(self, other)
499
500 major_axis = self._swap((M, N))[0]
--> 501 other = self.__class__(other) # convert to this format
502
503 idx_dtype = get_index_dtype((self.indptr, self.indices,
~/.local/lib/python3.8/site-packages/scipy/sparse/compressed.py in __init__(self, arg1, shape, dtype, copy)
32 arg1 = arg1.copy()
33 else:
---> 34 arg1 = arg1.asformat(self.format)
35 self._set_self(arg1)
36
~/.local/lib/python3.8/site-packages/scipy/sparse/base.py in asformat(self, format, copy)
320 # Forward the copy kwarg, if it's accepted.
321 try:
--> 322 return convert_method(copy=copy)
323 except TypeError:
324 return convert_method()
~/.local/lib/python3.8/site-packages/scipy/sparse/csc.py in tocsr(self, copy)
135 idx_dtype = get_index_dtype((self.indptr, self.indices),
136 maxval=max(self.nnz, N))
--> 137 indptr = np.empty(M + 1, dtype=idx_dtype)
138 indices = np.empty(self.nnz, dtype=idx_dtype)
139 data = np.empty(self.nnz, dtype=upcast(self.dtype))
MemoryError: Unable to allocate 72.8 TiB for an array with shape (10000000000000,) and data type int64
It seems the scipy is trying to create a temp array.
I am using the .dot method that scipy provides.
I am also open to non-scipy solutions.
Thanks!
In [105]: from scipy import sparse
If I make a (100,1) csr matrix:
In [106]: A = sparse.random(100,1,format='csr')
In [107]: A
Out[107]:
<100x1 sparse matrix of type '<class 'numpy.float64'>'
with 1 stored elements in Compressed Sparse Row format>
The data and indices are:
In [109]: A.data
Out[109]: array([0.19060481])
In [110]: A.indices
Out[110]: array([0], dtype=int32)
In [112]: A.indptr
Out[112]:
array([0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1], dtype=int32)
So even with only 1 nonzero term, one array is large (101).
On the other hand the csc format for the same array has a much smaller storage. But csc with (1,100) shape will look like the csr.
In [113]: Ac = A.tocsc()
In [114]: Ac.indptr
Out[114]: array([0, 1], dtype=int32)
In [115]: Ac.indices
Out[115]: array([88], dtype=int32)
Math, especially matrix products is done with csr/csc formats. So it may be hard to avoid this 80 TB memory use.
Looking at the traceback I see that it's trying to convert other to the format that matches self.
So with A.dot(B), and A is (1,N) csr, the small shape. B is (N,1) csc, also the small shape. But B.tocsr() requires the large (N+1,) shaped indptr.
Let's try an alternative to dot
First 2 matrices:
In [122]: A = sparse.random(1,100, .2,format='csr')
In [123]: B = sparse.random(100,1, .2,format='csc')
In [124]: A
Out[124]:
<1x100 sparse matrix of type '<class 'numpy.float64'>'
with 20 stored elements in Compressed Sparse Row format>
In [125]: B
Out[125]:
<100x1 sparse matrix of type '<class 'numpy.float64'>'
with 20 stored elements in Compressed Sparse Column format>
In [126]: A#B
Out[126]:
<1x1 sparse matrix of type '<class 'numpy.float64'>'
with 1 stored elements in Compressed Sparse Row format>
In [127]: _.A
Out[127]: array([[1.33661021]])
Their nonzero element indices. Only the ones that match matter.
In [128]: A.indices, B.indices
Out[128]:
(array([16, 20, 23, 28, 30, 37, 39, 40, 43, 49, 54, 59, 61, 63, 67, 70, 74,
91, 94, 99], dtype=int32),
array([ 5, 8, 15, 25, 34, 35, 40, 46, 47, 51, 53, 60, 68, 70, 75, 81, 87,
90, 91, 94], dtype=int32))
equality matrix:
In [129]: mask = A.indices[:,None]==B.indices
In [132]: np.nonzero(mask.any(axis=0))
Out[132]: (array([ 6, 13, 18, 19]),)
In [133]: np.nonzero(mask.any(axis=1))
Out[133]: (array([ 7, 15, 17, 18]),)
The matching indices:
In [139]: A.indices[Out[133]]
Out[139]: array([40, 70, 91, 94], dtype=int32)
In [140]: B.indices[Out[132]]
Out[140]: array([40, 70, 91, 94], dtype=int32)
sum of the corresponding data values matches [127]
In [141]: (A.data[Out[133]]*B.data[Out[132]]).sum()
Out[141]: 1.3366102138511582

Standard implementation of vectorize_sequences

In François Chollet's Deep Learning with Python, appears this function:
def vectorize_sequences(sequences, dimension=10000):
results = np.zeros((len(sequences), dimension))
for i, sequence in enumerate(sequences):
results[i, sequence] = 1.
return results
I understand what this function does. This function is asked about in this quesion and in this question as well, also mentioned here, here, here, here, here & here. Despite being so wide-spread, this vectorization is, according to Chollet's book is done "manually for maximum clarity." I am interested whether there is a standard, not "manual" way of doing it.
Is there a standard Keras / Tensorflow / Scikit-learn / Pandas / Numpy implementation of a function which behaves very similarly to the function above?
Solution with MultiLabelBinarizer
Assuming sequences is an array of integers with maximum possible value upto dimension-1, we can use MultiLabelBinarizer from sklearn.preprocessing to replicate the behaviour of the function vectorize_sequences
from sklearn.preprocessing import MultiLabelBinarizer
mlb = MultiLabelBinarizer(classes=range(dimension))
mlb.fit_transform(sequences)
Solution with Numpy broadcasting
Assuming sequences is an array of integers with maximum possible value upto dimension-1
(np.array(sequences)[:, :, None] == range(dimension)).any(1).view('i1')
Worked out example
>>> sequences
[[4, 1, 0],
[4, 0, 3],
[3, 4, 2]]
>>> dimension = 10
>>> mlb = MultiLabelBinarizer(classes=range(dimension))
>>> mlb.fit_transform(sequences)
array([[1, 1, 0, 0, 1, 0, 0, 0, 0, 0],
[1, 0, 0, 1, 1, 0, 0, 0, 0, 0],
[0, 0, 1, 1, 1, 0, 0, 0, 0, 0]])
>>> (np.array(sequences)[:, :, None] == range(dimension)).any(1).view('i1')
array([[0, 1, 1, 1, 0, 0, 0, 0, 0, 0],
[1, 0, 1, 0, 1, 0, 0, 0, 0, 0],
[1, 1, 0, 0, 1, 0, 0, 0, 0, 0]])

Counting zeros in a rolling - numpy array (including NaNs)

I am trying to find a way of Counting zeros in a rolling using numpy array ?
Using pandas I can get it using:
df['demand'].apply(lambda x: (x == 0).rolling(7).sum()).fillna(0))
or
df['demand'].transform(lambda x: x.rolling(7).apply(lambda x: 7 - np.count _nonzero(x))).fillna(0)
In numpy, using the code from Here
def rolling_window(a, window_size):
shape = (a.shape[0] - window_size + 1, window_size) + a.shape[1:]
print(shape)
strides = (a.strides[0],) + a.strides
return np.lib.stride_tricks.as_strided(a, shape=shape, strides=strides)
arr = np.asarray([10, 20, 30, 5, 6, 0, 0, 0])
np.count_nonzero(rolling_window(arr==0, 7), axis=1)
Output:
array([2, 3])
However, I need the first 6 NaNs as well, and fill it with zeros:
Expected output:
array([0, 0, 0, 0, 0, 0, 2, 3])
Think an efficient one would be with 1D convolution -
def sum_occurences_windowed(arr, W):
K = np.ones(W, dtype=int)
out = np.convolve(arr==0,K)[:len(arr)]
out[:W-1] = 0
return out
Sample run -
In [42]: arr
Out[42]: array([10, 20, 30, 5, 6, 0, 0, 0])
In [43]: sum_occurences_windowed(arr,W=7)
Out[43]: array([0, 0, 0, 0, 0, 0, 2, 3])
Timings on varying length arrays and window of 7
Including count_rolling from #Quang Hoang's post.
Using benchit package (few benchmarking tools packaged together; disclaimer: I am its author) to benchmark proposed solutions.
import benchit
funcs = [sum_occurences_windowed, count_rolling]
in_ = {n:(np.random.randint(0,5,(n)),7) for n in [10,20,50,100,200,500,1000,2000,5000]}
t = benchit.timings(funcs, in_, multivar=True, input_name='Length')
t.plot(logx=True, save='timings.png')
Extending to generic n-dim arrays
from scipy.ndimage.filters import convolve1d
def sum_occurences_windowed_ndim(arr, W, axis=-1):
K = np.ones(W, dtype=int)
out = convolve1d((arr==0).astype(int),K,axis=axis,origin=-(W//2))
out.swapaxes(axis,0)[:W-1] = 0
return out
So, on a 2D array, for counting along each row, use axis=1 and for cols, axis=0 and so on.
Sample run -
In [155]: np.random.seed(0)
In [156]: a = np.random.randint(0,3,(3,10))
In [157]: a
Out[157]:
array([[0, 1, 0, 1, 1, 2, 0, 2, 0, 0],
[0, 2, 1, 2, 2, 0, 1, 1, 1, 1],
[0, 1, 0, 0, 1, 2, 0, 2, 0, 1]])
In [158]: sum_occurences_windowed_ndim(a, W=7)
Out[158]:
array([[0, 0, 0, 0, 0, 0, 3, 2, 3, 3],
[0, 0, 0, 0, 0, 0, 2, 1, 1, 1],
[0, 0, 0, 0, 0, 0, 4, 3, 4, 3]])
# Verify with earlier 1D solution
In [159]: np.vstack([sum_occurences_windowed(i,7) for i in a])
Out[159]:
array([[0, 0, 0, 0, 0, 0, 3, 2, 3, 3],
[0, 0, 0, 0, 0, 0, 2, 1, 1, 1],
[0, 0, 0, 0, 0, 0, 4, 3, 4, 3]])
Let's test out our original 1D input array -
In [187]: arr
Out[187]: array([10, 20, 30, 5, 6, 0, 0, 0])
In [188]: sum_occurences_windowed_ndim(arr, W=7)
Out[188]: array([0, 0, 0, 0, 0, 0, 2, 3])
I would modify the function as follow:
def count_rolling(a, window_size):
shape = (a.shape[0] - window_size + 1, window_size) + a.shape[1:]
strides = (a.strides[0],) + a.strides
rolling = np.lib.stride_tricks.as_strided(a, shape=shape, strides=strides)
out = np.zeros_like(a)
out[window_size-1:] = (rolling == 0).sum(1)
return out
arr = np.asarray([10, 20, 30, 5, 6, 0, 0, 0])
count_rolling(arr,7)
Output:
array([0, 0, 0, 0, 0, 0, 2, 3])