View support in CUDA

Does numba+CUDA and/or CuPy support NumPy views on the GPU? Specifically, views with a different dtype, like this:
In [179]: x = np.random.randint(0,100,(10,5), dtype=np.int16)
In [180]: y = x[:,4].view(dtype=np.float16)
In [188]: y[:] = 0
In [189]: x
Out[189]:
array([[97, 14, 75, 42, 0],
[30, 87, 78, 62, 0],
[23, 92, 90, 37, 0],
[15, 12, 58, 36, 0],
[21, 32, 88, 83, 0],
[99, 70, 92, 16, 0],
[ 3, 88, 93, 16, 0],
[52, 32, 24, 15, 0],
[52, 99, 17, 97, 0],
[20, 33, 59, 56, 0]], dtype=int16)
In [191]: y
Out[191]: array([0., 0., 0., 0., 0., 0., 0., 0., 0., 0.], dtype=float16)
I need float16 to work.
numba: https://github.com/numba/numba/issues/4402
CuPy seems to have some support for float16.
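For CuPy, a minimal sketch of the same reinterpretation on the GPU (an untested assumption on my part; whether CuPy accepts a dtype-changing view of a non-contiguous column slice may depend on the version):

import cupy as cp

x = cp.random.randint(0, 100, (10, 5), dtype=cp.int16)  # device array
y = x[:, 4].view(dtype=cp.float16)  # reinterpret the 2-byte int16 column as float16
y[:] = 0  # should write through to the last column of x, as in NumPy
print(x)
print(y)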

Related

Keep getting nan-loss when using Seq2SeqTransformer

I am trying to train a transformer model on text data. The task is to predict missing (masked) words, so e.g. the sentence "How are you ?" becomes the masked input "How [MASK] you ?", with the original sentence as the target:
inputs = [69, 4, 1337, 666] # How [MASK] you ?
targets = [69, 42, 1337, 666] # How are you ?
The problem is that after a few steps, sometimes after a few hundred, sometimes after a few thousand, the loss becomes NaN.
I have tried a model with just 90k parameters and also one with 10M parameters. The result is always the same.
The code below shows how I instantiate a Seq2SeqTransformer.
Using the debugger does not give me anything in the "Graph Executions" section. All I see is this:
Any idea what I could be doing wrong here? The learning rate is already rather small, so I can't imagine that this is the problem.
model = Seq2SeqTransformer(
    vocab_size=vocab_size,  # <= 2000
    embedding_width=32,
    dropout_rate=0.1,
    encoder_layer=TransformerEncoder(
        num_layers=1, num_attention_heads=2, intermediate_size=64,
        dropout_rate=0.1, intermediate_dropout=0.1, attention_dropout_rate=0.1
    ),
    decoder_layer=TransformerDecoder(
        num_layers=1, num_attention_heads=2, intermediate_size=64,
        dropout_rate=0.1, intermediate_dropout=0.1, attention_dropout_rate=0.1
    )
)
optimizer = Adam(
    learning_rate=TransformerSchedule(
        min_lr=2.5e-6,
        max_lr=1.5e-4,
        warmup_steps=6000,
        warm_steps=30000
    )
)
model.compile(
    optimizer=optimizer,
    loss=SmoothedSparseCategoricalCrossentropy(0.1),
)
model.fit(
    train_data.repeat(),
    steps_per_epoch=1000,
    epochs=500,
    callbacks=callbacks,
    validation_data=valid_data,
    validation_steps=100,
)
Batch Data
Just to verify that the data I present to the model is alright, here is the output of print(a_batch). Since samples get bucketed, they all have the same length, which is also why input_masks is all 1s.
Note: ID of [MASK] is 4.
X
{'inputs': <tf.Tensor: shape=(119, 41), dtype=int32, numpy=
array([[ 2, 192, 214, ..., 525, 7, 3],
[ 2, 57, 964, ..., 15, 7, 3],
[ 2, 4, 191, ..., 646, 7, 3],
...,
[ 2, 430, 29, ..., 675, 4, 3],
[ 2, 101, 45, ..., 15, 7, 3],
[ 2, 421, 11, ..., 15, 4, 3]], dtype=int32)>,
'input_masks': <tf.Tensor: shape=(119, 41), dtype=float32, numpy=
array([[1., 1., 1., ..., 1., 1., 1.],
[1., 1., 1., ..., 1., 1., 1.],
[1., 1., 1., ..., 1., 1., 1.],
...,
[1., 1., 1., ..., 1., 1., 1.],
[1., 1., 1., ..., 1., 1., 1.],
[1., 1., 1., ..., 1., 1., 1.]], dtype=float32)>,
'targets': <tf.Tensor: shape=(119, 41), dtype=int32, numpy=
array([[ 2, 192, 214, ..., 525, 7, 3],
[ 2, 57, 964, ..., 15, 7, 3],
[ 2, 104, 191, ..., 646, 7, 3],
...,
[ 2, 430, 29, ..., 675, 7, 3],
[ 2, 101, 45, ..., 15, 7, 3],
[ 2, 421, 11, ..., 15, 7, 3]], dtype=int32)>}
Y
tf.Tensor(
[[ 2 192 214 ... 525 7 3]
[ 2 57 964 ... 15 7 3]
[ 2 104 191 ... 646 7 3]
...
[ 2 430 29 ... 675 7 3]
[ 2 101 45 ... 15 7 3]
[ 2 421 11 ... 15 7 3]], shape=(119, 41), dtype=int32)
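One generic way to localize where the NaNs first appear in TensorFlow 2.x (a debugging sketch, not specific to this model) is to enable numeric checking before building and fitting the model:

import tensorflow as tf

# Raises an error naming the first op that produces a NaN or Inf,
# instead of letting it propagate silently into the loss.
tf.debugging.enable_check_numerics()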

Fastest way to do vectorized reduce product with boolean mask

I have a 3D numpy array A and 2D numpy boolean mask B.
The first two dimensions of A match those of B.
I'm wondering if there is any fast way to, for each index along the first dimension of A, select entries along the second dimension based on B and take a reduced product over that second dimension.
My expected output C would be a 2D numpy array whose first dimension is the first dimension of A and whose second dimension is the third dimension of A.
My current solution is C = np.prod(A*np.repeat(B[...,np.newaxis], A.shape[-1], 2), 1)
Is there any better alternative?
With a concrete example:
In [364]: A=np.arange(1,25).reshape(2,3,4); B=np.arange(1,7).reshape(2,3)
In [365]: C = np.prod(A*np.repeat(B[...,np.newaxis], A.shape[-1], 2), 1)
That repeat does:
In [366]: np.repeat(B[...,np.newaxis], A.shape[-1], 2)
Out[366]:
array([[[1, 1, 1, 1],
[2, 2, 2, 2],
[3, 3, 3, 3]],
[[4, 4, 4, 4],
[5, 5, 5, 5],
[6, 6, 6, 6]]])
In [367]: _.shape
Out[367]: (2, 3, 4)
In [368]: A*np.repeat(B[...,np.newaxis], A.shape[-1], 2)
Out[368]:
array([[[ 1, 2, 3, 4],
[ 10, 12, 14, 16],
[ 27, 30, 33, 36]],
[[ 52, 56, 60, 64],
[ 85, 90, 95, 100],
[126, 132, 138, 144]]])
But by broadcasting rules, the repeat is not needed:
In [369]: A*B[...,np.newaxis]
Out[369]:
array([[[ 1, 2, 3, 4],
[ 10, 12, 14, 16],
[ 27, 30, 33, 36]],
[[ 52, 56, 60, 64],
[ 85, 90, 95, 100],
[126, 132, 138, 144]]])
In [371]: np.prod(_369, axis=1)
Out[371]:
array([[ 270, 720, 1386, 2304],
[556920, 665280, 786600, 921600]])
You could apply prod to A and B individually, but I don't know if that makes much of a difference:
In [373]: np.prod(A,1)*np.prod(B,1)[:,None]
Out[373]:
array([[ 270, 720, 1386, 2304],
[556920, 665280, 786600, 921600]])
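If B is genuinely boolean and masked-out entries should be skipped rather than multiplied in as 0, a sketch of one variant (my reading of the question, not part of the answer above) substitutes the multiplicative identity:

import numpy as np

A = np.arange(1, 25).reshape(2, 3, 4)
B = np.array([[True, False, True],
              [False, True, True]])

# Masked-out rows become 1, so they drop out of the product instead of zeroing it.
C = np.prod(np.where(B[..., np.newaxis], A, 1), axis=1)
print(C.shape)  # (2, 4)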

operation of Einstein sum of 3D matrices

The following code indicates that the Einstein sum of two 3D (2x2x2) matrices is a 4D (2x2x2x2) matrix.
$c_{ijlm} = \sum_k a_{ijk} b_{klm}$
$c_{0,0,0,0} = \sum_k a_{0,0,k} b_{k,0,0} = 1 \times 9 + 5 \times 11 = 64$
But $c_{0,0,0,0} = 35$ according to the result below:
>>> a=np.array([[[1,2],[3,4]],[[5,6],[7,8]]])
>>> b=np.array([[[9,10],[11,12]],[[13,14],[15,16]]])
>>> c=np.einsum('ijk,klm->ijlm', a,b)
>>> c
array([[[[ 35, 38],
[ 41, 44]],
[[ 79, 86],
[ 93, 100]]],
[[[123, 134],
[145, 156]],
[[167, 182],
[197, 212]]]])
Could someone explain how the operation is carried out?
The particular element that you are testing, c[0,0,0,0], is calculated with:
In [167]: a[0,0,:]*b[:,0,0]
Out[167]: array([ 9, 26])
In [168]: a[0,0,:]
Out[168]: array([1, 2])
In [169]: b[:,0,0]
Out[169]: array([ 9, 13])
Summing those products gives 1*9 + 2*13 = 9 + 26 = 35. It may be easier to understand if we reshape both arrays to 2d:
In [170]: A=a.reshape(-1,2); B=b.reshape(2,-1)
In [171]: A
Out[171]:
array([[1, 2],
[3, 4],
[5, 6],
[7, 8]])
In [172]: B
Out[172]:
array([[ 9, 10, 11, 12],
[13, 14, 15, 16]])
In [173]: A@B
Out[173]:
array([[ 35, 38, 41, 44],
[ 79, 86, 93, 100],
[123, 134, 145, 156],
[167, 182, 197, 212]])
The same numbers, but in (4,4) instead of (2,2,2,2). It's easier to read the (1,2) and (9,13) off of A and B.
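Putting the two views together, a small check with the same arrays:

import numpy as np

a = np.array([[[1, 2], [3, 4]], [[5, 6], [7, 8]]])
b = np.array([[[9, 10], [11, 12]], [[13, 14], [15, 16]]])

c = np.einsum('ijk,klm->ijlm', a, b)
# c[0,0,0,0] sums a[0,0,:] * b[:,0,0] = 1*9 + 2*13 = 35
print(c[0, 0, 0, 0])                    # 35
print(np.sum(a[0, 0, :] * b[:, 0, 0]))  # 35

# The reshaped matmul gives the same numbers in a (4, 4) layout:
C = a.reshape(-1, 2) @ b.reshape(2, -1)
print(np.array_equal(c, C.reshape(2, 2, 2, 2)))  # True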

How to skip 'for' loop when dealing with numpy arrays

Here is my code:
import numpy as np
>>> x
array([[ 1, 57],
[ 2, 21],
[ 4, 34],
...,
[3348, 29],
[3350, 23],
[3353, 11]])
>>> x.shape
(1310, 2)
>>> pic # greyscale image
array([[223, 222, 225, ..., 217, 219, 214],
[224, 222, 219, ..., 220, 219, 216],
[223, 224, 220, ..., 219, 215, 213],
...,
[228, 226, 231, ..., 224, 228, 229],
[229, 227, 227, ..., 216, 225, 227],
[226, 228, 225, ..., 218, 225, 230]], dtype=uint8)
pic = np.stack((pic,pic,pic), axis=2)
>>> pic.shape
(2208, 2752, 3)
>>>labels.shape
(2208, 2752)
color = [0, 0, 255]
for i in x:
    B = np.full((i[1], 3), color).astype('int')
    pic[labels == i[0]] = B
It colors blue (RGB 0, 0, 255) all the pixels of the grayscale image pic that satisfy the condition labels == i[0]. Now, this is very slow because of the for loop (for i in x).
Is there any efficient 'NumPy way' that avoids the for loop and would therefore be much faster? Thanks for your kind help!
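Since the color is the same for every label listed in x[:, 0], one common vectorization (a sketch under that assumption, using stand-in data in place of the real image) builds a single boolean mask with np.isin and assigns once:

import numpy as np

# Stand-in data shaped like the question's arrays (hypothetical values).
rng = np.random.default_rng(0)
labels = rng.integers(0, 3400, size=(2208, 2752))
pic = np.stack([rng.integers(0, 256, size=(2208, 2752), dtype=np.uint8)] * 3, axis=2)
x = np.array([[1, 57], [2, 21], [4, 34], [3348, 29], [3350, 23], [3353, 11]])
color = [0, 0, 255]

# One mask covering every label in x[:, 0]; the per-label pixel counts in
# x[:, 1] are not needed because the color is constant.
mask = np.isin(labels, x[:, 0])
pic[mask] = color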

How to multiply an NxN matrix H with an Nx1 array t = [t1,t2,...,tN] such that H*t = [H*t1,H*t2,...,H*tN] using numpy?

I know how to do this with for loops, but is there a way to use numpy arrays and their operations to achieve this type of multiplication?
You can use np.multiply.outer:
>>> import numpy as np
>>>
>>> H = np.arange(9).reshape(3, 3)
>>> t = np.c_[10:40:10]
>>>
>>> H
array([[0, 1, 2],
[3, 4, 5],
[6, 7, 8]])
>>> t
array([[10],
[20],
[30]])
>>>
>>> np.multiply.outer(t.ravel(), H)
array([[[ 0, 10, 20],
[ 30, 40, 50],
[ 60, 70, 80]],
[[ 0, 20, 40],
[ 60, 80, 100],
[120, 140, 160]],
[[ 0, 30, 60],
[ 90, 120, 150],
[180, 210, 240]]])
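The same stack can also be had with plain broadcasting, reshaping t to (N, 1, 1) so each scalar scales a full copy of H:

>>> out = t.reshape(-1, 1, 1) * H
>>> np.array_equal(out, np.multiply.outer(t.ravel(), H))
True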