How to build a model with a 3D array label in TensorFlow

I have predictor data in a 2D array like this:
array([[ 0, 0, 0, ..., 10, 6, 1],
[ 0, 0, 0, ..., 12, 6, 1],
[ 0, 0, 0, ..., 8, 6, 1],
...,
[ 0, 0, 0, ..., 54, 30, 60],
[ 0, 0, 0, ..., 1472, 5, 348],
[ 0, 0, 0, ..., 58, 45, 60]])
and label data with a 3D shape like this:
[array([[0., 0., 1., ..., 0., 0., 0.],
[0., 0., 0., ..., 0., 0., 0.],
[0., 0., 0., ..., 0., 0., 0.],
...,
[0., 1., 0., ..., 0., 0., 0.],
[0., 0., 0., ..., 0., 0., 0.],
[0., 0., 0., ..., 0., 0., 0.]], dtype=float32),
array([[0., 0., 1., ..., 0., 0., 0.],
[0., 0., 0., ..., 0., 0., 0.],
[0., 0., 0., ..., 0., 0., 0.],
...,
...
...,
[0., 1., 0., ..., 0., 0., 0.],
[0., 0., 0., ..., 0., 0., 0.],
[0., 0., 0., ..., 0., 0., 0.]], dtype=float32),
...]
I want to build the model with TensorFlow like this:
model = Sequential()
model.add(Embedding(total_words, 10, input_length=max_sequence_len))
model.add(LSTM(100))
model.add(Dropout(0.1))
model.add(Dense(total_words_label, activation='softmax'))
model.compile(loss='categorical_crossentropy', optimizer='adam')
But I get an error at the last Dense layer:
ValueError: Shapes (None, 10, 4991) and (None, 4991) are incompatible
It looks like the problem comes from the shape of the last Dense layer, because my label is 3D. How can I adjust it? I'm still learning TensorFlow.
And how can I convert the model's output back to the 3D label format after predicting?
--- update (adding the process) --
The example data:
input: Hartford Avenue, Bellingham, MA, 2019
output/label: {'address': 'Hartford Avenue', 'city': 'Bellingham', 'state': 'MA', 'zip': '2019'}
input: Oak Street, Brockton, MA, 2301
output/label: {'address': 'Oak Street', 'city': 'Brockton', 'state': 'MA', 'zip': '2301'}
The input column is used as the predictor and the output/label column as the label.
The process of creating the labels:
tokenizer_label = Tokenizer()
l = list(all['input'])
tokenizer_label.fit_on_texts(l)
total_words_label = len(tokenizer_label.word_index) + 1
labels = []
for i in range(len(l)):
    labels.append(keras_utils.to_categorical(tokenizer_label.texts_to_sequences(l[i].split()), num_classes=total_words_label))
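One way the shapes might be reconciled (a sketch only, assuming each label matrix has exactly max_sequence_len rows, i.e. one one-hot target per timestep): keep the timestep axis in the LSTM with return_sequences=True and apply the Dense layer per timestep with TimeDistributed, so the model output is 3D like the labels.

from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Embedding, LSTM, Dropout, Dense, TimeDistributed

# Sketch (assumption): emit one softmax per timestep so the output shape is
# (batch, max_sequence_len, total_words_label), matching the 3D labels.
model = Sequential()
model.add(Embedding(total_words, 10, input_length=max_sequence_len))
model.add(LSTM(100, return_sequences=True))
model.add(Dropout(0.1))
model.add(TimeDistributed(Dense(total_words_label, activation='softmax')))
model.compile(loss='categorical_crossentropy', optimizer='adam')

For the reverse direction, model.predict(...) would then return an array of shape (batch, max_sequence_len, total_words_label); taking the argmax over the last axis gives one word index per timestep, which the inverted tokenizer_label.word_index can map back to words.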

Related

merge two thresholds for two 3D arrays into a list

I have a first 3D array of size (50,250,250) that contains the data points (1,2,3,4,5). I set up a threshold of 3: data points above it should be set to 1 and those below it to 0. The only exception is when a data point equals 3; then a second threshold (threshold1=50) has to be tested, based on the second 3D array of size (50,250,250). My question is how to include the two thresholds in my code. In other words, the for loop should check every data point in array 1 and perform the first threshold test; if the data point equals 3, it should check the counterpart of that data point in the second array against the second threshold. I have tried the code below, but the results did not make sense.
res1=[]
f1=numpy.ones((250, 250))
threshold=3
threshold1=30
for i in array1:
    i = i.data
    ii = f1*i
    ii[ii < threshold] = 0
    ii[ii > threshold] = 1
    res1.append(ii)
    if ii[ii == threshold]:
        for j in array2:
            j = j.data
            jj[jj < threshold1] = 0
            jj[jj > threshold1] = 1
            res1.append(jj)
Array1:
array([[[0., 0., 0., ..., 0., 0., 0.],
[0., 0., 0., ..., 0., 0., 0.],
[0., 0., 0., ..., 0., 0., 0.],
...,
[3., 3., 3., ..., 0., 0., 0.],
[3., 3., 3., ..., 0., 0., 0.],
[3., 3., 3., ..., 0., 0., 0.]],
[[0., 0., 0., ..., 0., 0., 1.],
[0., 0., 0., ..., 0., 0., 0.],
[0., 0., 0., ..., 0., 0., 0.],
...,
[3., 3., 3., ..., 0., 0., 0.],
[3., 3., 3., ..., 0., 0., 0.],
[3., 3., 3., ..., 0., 0., 0.]],
Array2:
[[ nan, nan, nan, ..., nan,
0.9839769, 1.7042577],
[ nan, nan, nan, ..., nan,
nan, nan],
[ nan, nan, nan, ..., 3.2351596,
2.0924768, 1.7604152],
...,
[ nan, nan, nan, ..., 158.48865 ,
158.48865 , 125.888 ],
[ nan, nan, nan, ..., 158.48865 ,
158.48865 , 158.48865 ],
[ nan, nan, nan, ..., 125.88556 ,
158.48865 , 158.48865 ]],
The produced list (res1):
[array([[0., 0., 0., ..., 0., 0., 0.],
[0., 0., 0., ..., 0., 0., 0.],
[0., 0., 0., ..., 0., 0., 0.],
...,
[1., 1., 1., ..., 0., 0., 0.],
[1., 1., 1., ..., 0., 0., 0.],
[1., 1., 1., ..., 0., 0., 0.]]),
array([[0., 0., 0., ..., 0., 0., 0.],
[0., 0., 0., ..., 0., 0., 0.],
[0., 0., 0., ..., 0., 0., 0.],
...,
[1., 1., 1., ..., 0., 0., 0.],
[1., 1., 1., ..., 0., 0., 0.],
[1., 1., 1., ..., 0., 0., 0.]]),
array([[0., 0., 0., ..., 0., 0., 0.],
[0., 0., 0., ..., 0., 0., 0.],
[0., 0., 0., ..., 0., 0., 0.],
IIUC, in your second if condition you are trying to check whether there is at least one 3 in that 2D slice of array1, and then you want to pick the 2D slice of array2 at the same position. In that case, you should use the in operator.
for i in range(len(array1)):
    if threshold in array1[i]:
        array2[i][array2[i] < threshold1] = 0
        array2[i][array2[i] > threshold1] = 1
        res1.append(array2[i])
    else:
        array1[i][array1[i] < threshold] = 0
        array1[i][array1[i] > threshold] = 1
        res1.append(array1[i])
The above method is a bit lengthy for numpy. There's a numpy way to do this, too.
array1[array1 < threshold] = 0
array1[array1 > threshold] = 1
array2_condition = np.unique(np.argwhere(array1 == 3)[:,0]) # return the index of array1 if 3 in array1
chosen_array2 = array2[array2_condition]
chosen_array2[chosen_array2 < threshold1] = 0
chosen_array2[chosen_array2 > threshold1] = 1
array2[array2_condition] = chosen_array2 # if you still want array2 values to be changed
res1 = array1
res1[array2_condition] = chosen_array2 # Final result
Update
As the OP mentioned, every 2D slice has at least one 3 in it, so the array2_condition above is not applicable. Instead, we modify array2_condition and use a for loop to change the elements.
res1 = array1
res1[res1 < threshold] = 0
res1[res1 > threshold] = 1
array2_condition = np.argwhere(array1 == 3)
for data in array2_condition:
    if array2[tuple(data)] > threshold1:
        res1[tuple(data)] = 1
    elif array2[tuple(data)] < threshold1:
        res1[tuple(data)] = 0
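For reference, the same per-element update can also be written without the explicit Python loop. This is only a sketch, starting again from the original (unthresholded) array1 and assuming array1 and array2 have identical shapes; as in the loop above, positions where array2 is NaN or exactly equal to threshold1 are left unchanged.

import numpy as np

res1 = array1.copy()
res1[res1 < threshold] = 0
res1[res1 > threshold] = 1
mask = (array1 == threshold)             # positions that defer to the second threshold
res1[mask & (array2 > threshold1)] = 1   # NaNs compare False, so those positions keep the value 3
res1[mask & (array2 < threshold1)] = 0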

Obtaining multiple plots of a given data

I have this data:
s_result = [{'time': array([ 0. , 0.1, 0.2, ..., 299.7, 299.8, 299.9]), 'I': array([10., 10., 10., ..., 0., 0., 0.]), 'S': array([60., 60., 60., ..., 0., 0., 0.]), 'M': array([40., 40., 40., ..., 0., 0., 0.]), 'R': array([0., 0., 0., ..., 1., 1., 1.]), 'P1': array([ 0., 0., 0., ..., 19., 19., 19.]), 'D1': array([ 0., 0., 0., ..., 81., 81., 81.]), 'P2': array([0., 0., 0., ..., 0., 0., 0.]), 'D2': array([0., 0., 0., ..., 0., 0., 0.]), 'P3': array([0., 0., 0., ..., 0., 0., 0.]), 'D3': array([0., 0., 0., ..., 0., 0., 0.]), 'P4': array([0., 0., 0., ..., 0., 0., 0.]), 'D4': array([0., 0., 0., ..., 0., 0., 0.]), 'P5': array([0., 0., 0., ..., 0., 0., 0.]), 'D5': array([0., 0., 0., ..., 0., 0., 0.]), 'P6': array([0., 0., 0., ..., 0., 0., 0.]), 'D6': array([0., 0., 0., ..., 0., 0., 0.]), 'P7': array([0., 0., 0., ..., 0., 0., 0.]), 'D7': array([0., 0., 0., ..., 0., 0., 0.]), 'P8': array([0., 0., 0., ..., 0., 0., 0.]), 'D8': array([0., 0., 0., ..., 0., 0., 0.]), 'P9': array([0., 0., 0., ..., 0., 0., 0.]), 'D9': array([0., 0., 0., ..., 0., 0., 0.])}, {'time': array([ 0. , 0.1, 0.2, ..., 299.7, 299.8, 299.9]), 'I': array([10., 10., 10., ..., 0., 0., 0.]), 'S': array([60., 60., 60., ..., 0., 0., 0.]), 'M': array([40., 40., 40., ..., 0., 0., 0.]), 'R': array([0., 0., 0., ..., 0., 0., 0.]), 'P1': array([ 0., 0., 0., ..., 20., 20., 20.]), 'D1': array([ 0., 0., 0., ..., 80., 80., 80.]), 'P2': array([0., 0., 0., ..., 0., 0., 0.]), 'D2': array([0., 0., 0., ..., 0., 0., 0.]), 'P3': array([0., 0., 0., ..., 0., 0., 0.]), 'D3': array([0., 0., 0., ..., 0., 0., 0.]), 'P4': array([0., 0., 0., ..., 0., 0., 0.]), 'D4': array([0., 0., 0., ..., 0., 0., 0.]), 'P5': array([0., 0., 0., ..., 0., 0., 0.]), 'D5': array([0., 0., 0., ..., 0., 0., 0.]), 'P6': array([0., 0., 0., ..., 0., 0., 0.]), 'D6': array([0., 0., 0., ..., 0., 0., 0.]), 'P7': array([0., 0., 0., ..., 0., 0., 0.]), 'D7': array([0., 0., 0., ..., 0., 0., 0.]), 'P8': array([0., 0., 0., ..., 0., 0., 0.]), 'D8': array([0., 0., 0., ..., 0., 0., 0.]), 'P9': array([0., 0., 0., ..., 0., 0., 0.]), 'D9': array([0., 0., 0., ..., 0., 0., 0.])}, {'time': array([ 0. , 0.1, 0.2, ..., 299.7, 299.8, 299.9]), 'I': array([10., 10., 10., ..., 0., 0., 0.]), 'S': array([60., 60., 60., ..., 0., 0., 0.]), 'M': array([40., 40., 40., ..., 0., 0., 0.]), 'R': array([0., 0., 0., ..., 0., 0., 0.]), 'P1': array([ 0., 0., 0., ..., 20., 20., 20.]), 'D1': array([ 0., 0., 0., ..., 80., 80., 80.]), 'P2': array([0., 0., 0., ..., 0., 0., 0.]), 'D2': array([0., 0., 0., ..., 0., 0., 0.]), 'P3': array([0., 0., 0., ..., 0., 0., 0.]), 'D3': array([0., 0., 0., ..., 0., 0., 0.]), 'P4': array([0., 0., 0., ..., 0., 0., 0.]), 'D4': array([0., 0., 0., ..., 0., 0., 0.]), 'P5': array([0., 0., 0., ..., 0., 0., 0.]), 'D5': array([0., 0., 0., ..., 0., 0., 0.]), 'P6': array([0., 0., 0., ..., 0., 0., 0.]), 'D6': array([0., 0., 0., ..., 0., 0., 0.]), 'P7': array([0., 0., 0., ..., 0., 0., 0.]), 'D7': array([0., 0., 0., ..., 0., 0., 0.]), 'P8': array([0., 0., 0., ..., 0., 0., 0.]), 'D8': array([0., 0., 0., ..., 0., 0., 0.]), 'P9': array([0., 0., 0., ..., 0., 0., 0.]), 'D9': array([0., 0., 0., ..., 0., 0., 0.])}]
I intend to work on only M as follows:
for index in range(0, 3):
    x_stochastic = s_result[index]['M']
    x_stochastic = ((s_result['M'][0]-s_result['M'][:])/s_result['M'][0])
    plt.plot(s_trajectory['time'], x_stochastic, 'r')
    plt.xlabel('Time')
    plt.ylabel('Monomer Conversion,X')
The expected outcome is 3 different trajectories, as the data above suggests. But this is what I got graphically:
I couldn't get your code to run as provided. Furthermore, in the data snippet you posted, all three M value series are indistinguishable.
I've taken the liberty of modifying the data to
s_result = [{'time': np.array([0,100,200,300]),
'M': np.array([40,40,30,0])},
{'time': np.array([0,100,200,300]),
'M': np.array([40,10,5,0])},
]
The corrected loop
for index in range(0, len(s_result)):
    x_stochastic = s_result[index]
    x_stochastic = ((x_stochastic['M'][0] - x_stochastic['M'][:]) / x_stochastic['M'][0])
    plt.plot(s_result[index]['time'], x_stochastic, 'r')
    plt.xlabel('Time')
    plt.ylabel('Monomer Conversion,X')
then produces two distinct traces as desired.
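If the runs should also be easy to tell apart, a small optional extension (my addition, not part of the original answer) is to give each trace its own label and add a legend:

import matplotlib.pyplot as plt

for index in range(len(s_result)):
    traj = s_result[index]
    x_stochastic = (traj['M'][0] - traj['M'][:]) / traj['M'][0]
    plt.plot(traj['time'], x_stochastic, label='run {}'.format(index))  # one colour per run
plt.xlabel('Time')
plt.ylabel('Monomer Conversion,X')
plt.legend()
plt.show()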

Keras: result of model.evaluate() stays high with all the weights and biases being 0

I created a VGG16 model using the Keras applications module (TensorFlow backend). Then I wanted to change part of its weights and test the accuracy of the modified model. To be direct and intuitive, I changed ALL the weights and biases in ALL layers to 0, like this:
model = VGG16(weights='imagenet', include_top=True)
# here is the test data and label containing 10 pictures I created.
data = np.load('./10_random_samples_array.npz')
data, label = data["X"], data["Y"]
# Modify the weights to zero
for z in [1, 2, 4, 5, 7, 8, 9, 11, 12, 13, 15, 16, 17]:  # Conv layers
    weight_bias = model.layers[z].get_weights()
    shape_weight = np.shape(weight_bias[0])
    shape_bias = np.shape(weight_bias[1])
    weight_bias[0] = np.zeros(shape=(shape_weight[0], shape_weight[1], shape_weight[2], shape_weight[3]))
    weight_bias[1] = np.zeros(shape=(shape_bias[0],))
    model.layers[z].set_weights(weight_bias)

for z in [20, 21, 22]:  # FC layers
    weight_bias = model.layers[z].get_weights()
    shape_weight = np.shape(weight_bias[0])
    print(z, shape_weight)
    shape_bias = np.shape(weight_bias[1])
    weight_bias[0] = np.zeros(shape=(shape_weight[0], shape_weight[1],))
    weight_bias[1] = np.zeros(shape=(shape_bias[0],))
    model.layers[z].set_weights(weight_bias)

model.compile(loss='binary_crossentropy',
              optimizer=optimizers.SGD(lr=1e-4, momentum=0.9),
              metrics=['accuracy'])
# To check if the weights have been modified.
print(model.layers[1].get_weights())
loss, acc = model.evaluate(data, label, verbose=1)
print(acc)
Then I got result like this:
[array([[[[0., 0., 0., ..., 0., 0., 0.],
[0., 0., 0., ..., 0., 0., 0.],
[0., 0., 0., ..., 0., 0., 0.]],
...(All zero, I omit them)
[[0., 0., 0., ..., 0., 0., 0.],
[0., 0., 0., ..., 0., 0., 0.],
[0., 0., 0., ..., 0., 0., 0.]]]], dtype=float32),
array([0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.], dtype=float32)]
10/10 [==============================] - 2s 196ms/step
0.9989999532699585
Firstly, you can tell that all the weights and biases have already been changed to 0, but the accuracy still stays very high. That is unreasonable. (The original result returned by model.evaluate() is 0.9993000030517578.)
Secondly, I used only 10 pictures as my test dataset, so the accuracy should be a multiple of 0.1. But I got 0.9989999532699585.
I also tried modifying only the weights in Conv1-1 to zero, and the result is also 0.9989999532699585. It seems to be the minimum result. Is there something wrong with my model? Can the weights not be modified in this way? Or does model.evaluate() not work the way I assume?
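One possible explanation to check (an assumption on my part, since the labels and the full setup are not shown): with loss='binary_crossentropy', Keras resolves metrics=['accuracy'] to binary accuracy, which averages element-wise matches over all 1000 one-hot positions, so a network that predicts roughly zero everywhere still scores about 999/1000 ≈ 0.999 per image. A sketch of compiling with the categorical counterparts instead:

model.compile(loss='categorical_crossentropy',
              optimizer=optimizers.SGD(lr=1e-4, momentum=0.9),
              metrics=['categorical_accuracy'])  # fraction of images whose argmax matches the label
loss, acc = model.evaluate(data, label, verbose=1)
print(acc)  # with 10 test images this should be a multiple of 0.1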

how to understand the output of tf.nn.top_k() from tensorflow

I used the tf.nn.top_k() function from TensorFlow on the model's softmax probabilities to visualize the certainty of its predictions for 5 new images, with k=5. I got the output below, which I am not sure how to interpret exactly. Could anyone explain the output, please?
TopKV2(values=array([[ 1., 0., 0., 0., 0.],
[ 1., 0., 0., 0., 0.],
[ 1., 0., 0., 0., 0.],
[ 1., 0., 0., 0., 0.],
[ 1., 0., 0., 0., 0.]], dtype=float32), indices=array([[13, 0, 1, 2, 3],
[13, 0, 1, 2, 3],
[13, 0, 1, 2, 3],
[26, 0, 1, 2, 3],
[13, 0, 1, 2, 3]], dtype=int32))
From the documentation, it returns two tensors: the first with the top K values and the second with the indices of those values in the original tensor.
So for your data, what I see is that each row of the original tensor is effectively one-hot: the single 1.0 sits at the index listed first (13 or 26), and the remaining indices (0, 1, 2, 3) are simply the lowest positions among the tied 0.0 entries, since top_k breaks ties in index order.
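As a tiny illustration (a sketch assuming TF 2.x eager execution, not your original session-based code):

import tensorflow as tf

probs = tf.constant([[0., 0., 1., 0., 0., 0.]])  # a one-hot "softmax" row
values, indices = tf.nn.top_k(probs, k=5)
print(values.numpy())   # [[1. 0. 0. 0. 0.]]  -> the five largest probabilities, descending
print(indices.numpy())  # [[2 0 1 3 4]]       -> their positions; ties keep the lowest indices first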

Interpreting loss in LSTM tensorflow

I am using PTB dataset to predict next words.
My code : pastebin link.
The inputs to the model (Batch_input) are word indices with a vocabulary_size of 10000. All the outputs (Batch_labels) are one-hot encoded, as you can see from the sample in the output below.
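As a side note, here is a minimal sketch of how such one-hot labels relate to the word indices (my illustration, assuming a vocabulary of 10000):

import numpy as np

vocabulary_size = 10000
batch_input = np.array([9971, 9972, 2, 13, 141])                        # word indices, as in Batch_input
batch_labels = np.eye(vocabulary_size, dtype=np.float32)[batch_input]   # shape (5, 10000), one 1.0 per row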
Following is my output after training the LSTM Model.
Output : pastebin link.
Following is some part of output :
Initialized
('Loss :', 9.2027139663696289)
('Batch_input :', array([9971, 9972, 9974, 9975, 9976, 9980, 9981, 9982, 9983, 9984, 9986,
9987, 9988, 9989, 9991, 9992, 9993, 9994, 9995, 9996, 9997, 9998,
9999, 2, 9256, 1, 3, 72, 393, 33, 2133, 0, 146,
19, 6, 9207, 276, 407, 3, 2, 23, 1, 13, 141,
4, 1, 5465, 0, 3081, 1596, 96, 2, 7682, 1, 3,
72, 393, 8, 337, 141, 4, 2477, 657, 2170], dtype=int32))
('Batch_labels :', array([[ 0., 0., 0., ..., 0., 0., 0.],
[ 0., 0., 0., ..., 0., 0., 0.],
[ 0., 0., 0., ..., 0., 0., 0.],
...,
[ 0., 0., 0., ..., 0., 0., 0.],
[ 0., 0., 0., ..., 0., 0., 0.],
[ 0., 0., 0., ..., 0., 0., 0.]], dtype=float32))
Average loss at step 0: 0.092027 learning rate: 1.000000
('Label: ', array([[ 0., 0., 0., ..., 0., 0., 0.],
[ 0., 0., 0., ..., 0., 0., 0.],
[ 0., 0., 0., ..., 0., 0., 0.],
...,
[ 0., 0., 0., ..., 0., 0., 0.],
[ 0., 0., 0., ..., 0., 0., 0.],
[ 0., 0., 0., ..., 0., 0., 0.]], dtype=float32))
('Predicted:', array([[-0.36508381, -0.25612 , -0.26035795, ..., -0.42688274,
-0.4078168 , -0.36345699],
[-0.46035308, -0.27282876, -0.34078932, ..., -0.50623679,
-0.47014061, -0.43237451],
[-0.14694197, -0.07506246, -0.10392818, ..., -0.1128526 ,
-0.12404554, -0.13495158],
...,
[-0.07286638, -0.04560997, -0.05932444, ..., -0.08352474,
-0.07679331, -0.07829094],
[-0.13576414, -0.07057529, -0.1017022 , ..., -0.11192483,
-0.14713599, -0.11757012],
[-0.05446544, -0.02738103, -0.03401792, ..., -0.05073205,
-0.03746928, -0.05750648]], dtype=float32))
================================================================================
[[ 0. 0. 0. ..., 0. 0. 0.]]
8605
('f', u'altman')
('as', u'altman')
('feed', array([8605]))
('Sentence :', u'altman rake years regatta memotec pierre <unk> nonexecutive as will <eos> ssangyong director nahb group the cluett rubens snack-food fromstein calloway and memotec a board years regatta publishing fields rake group group rake cluett ssangyong pierre calloway memotec gitano gold rubens as as director sim is publishing gitano punts join <unk> and a old punts years memotec a rake is guterman cluett ssangyong will berlitz nahb <eos> of group join <unk> board join and pierre consolidated board cluett dutch gold as ipo ssangyong guterman a kia will dutch and director centrust consolidated rudolph guterman guterman cluett years n.v. old board rubens ')
================================================================================
('Loss :', 496.78199882507323)
('Batch_input :', array([4115, 5, 14, 45, 55, 3, 72, 195, 1244, 220, 2,
0, 3150, 7426, 1, 13, 4052, 1, 496, 14, 6885, 0,
1, 22, 113, 2652, 8068, 5, 14, 2474, 5250, 10, 464,
52, 3004, 466, 1244, 15, 2, 1, 80, 0, 167, 4,
35, 2645, 1, 65, 10, 558, 6092, 3574, 1898, 666, 1,
7, 27, 1, 4241, 6036, 7, 3, 2, 366], dtype=int32))
('Batch_labels :', array([[ 0., 0., 0., ..., 0., 0., 0.],
[ 0., 0., 0., ..., 0., 0., 0.],
[ 0., 0., 0., ..., 0., 0., 0.],
...,
[ 0., 0., 1., ..., 0., 0., 0.],
[ 0., 0., 0., ..., 0., 0., 0.],
[ 0., 0., 0., ..., 0., 0., 0.]], dtype=float32))
Average loss at step 100: 4.967820 learning rate: 1.000000
('Label: ', array([[ 0., 0., 0., ..., 0., 0., 0.],
[ 0., 0., 0., ..., 0., 0., 0.],
[ 0., 0., 0., ..., 0., 0., 0.],
...,
[ 0., 0., 1., ..., 0., 0., 0.],
[ 0., 0., 0., ..., 0., 0., 0.],
[ 0., 0., 0., ..., 0., 0., 0.]], dtype=float32))
('Predicted:', array([[ 4.41551352e+00, 9.98007679e+00, 1.75690575e+01, ...,
6.83443546e+00, -2.30797195e+00, 1.73750782e+00],
[ 1.26826172e+01, 5.96618652e-03, 1.18247871e+01, ...,
-3.70885038e+00, -8.55356884e+00, -9.16959190e+00],
[ 1.44652233e+01, 5.12977028e+00, 9.42045784e+00, ...,
1.39444172e+00, 1.95213389e+00, -4.00810099e+00],
...,
[ 2.93052626e+00, 9.41266441e+00, 1.79130135e+01, ...,
4.24245834e+00, -1.46551771e+01, -3.35697136e+01],
[ 2.48945675e+01, 2.32091904e+01, 2.47276134e+01, ...,
-6.39845896e+00, -2.66628218e+00, -4.59843445e+00],
[ 1.34414902e+01, 4.80197811e+00, 1.89214745e+01, ...,
-5.91268682e+00, -8.80736637e+00, -6.49542713e+00]], dtype=float32))
================================================================================
[[ 0. 0. 0. ..., 0. 0. 0.]]
3619
('f', u'officially')
('as', u'officially')
('feed', array([3619]))
('Sentence :', u'officially <unk> to <eos> filters ago cigarettes is that cigarette stopped to <eos> researchers <unk> to <eos> filters ago cigarettes asbestos the filters ago cigarettes asbestos the filters ago cigarettes is that cigarette up the <eos> researchers to <eos> researchers <unk> to <eos> filters ago cigarettes asbestos the filters ago cigarettes asbestos <eos> filters ago cigarettes asbestos the filters ago cigarettes is that cigarette up the <eos> researchers <unk> to <eos> researchers <unk> to <eos> filters ago cigarettes asbestos of percentage years the the the <eos> researchers <unk> to <eos> filters ago cigarettes asbestos the filters ago cigarettes asbestos the filters ')
================================================================================
The initial loss is 0.92, and the predicted text at that point is shown above. The next loss is around 4.57 at step 100. But as the number of steps increases, the loss increases, which is an anomaly (right?).
Also, the predicted word 'among' repeats in the output at step 500.
Is there any error in training?
This is the new output I get: pastebin link.
I am not 100% sure about the problem in your code, but I noticed that you start the learning rate at 1.
learning_rate = tf.train.exponential_decay(1.0, global_step, 5000, 0.1, staircase=True)
Try to pick a lower initial value.
A high learning rate causes the model weights to take long leaps, so the optimizer may miss the minimum and may even reach a point where the loss is higher (which may be your case). It is like jumping over a valley from one side to the other instead of going down into it.
Reference for the image: http://cs231n.github.io/neural-networks-3/
Lowering the learning rate from 1e-2 to 1e-4 solved a similar problem in a different model. Your model may work at a different learning rate.
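For example, a sketch of the same schedule with a smaller starting value (0.01 here is only an assumption to illustrate the idea; the optimizer line is also assumed, since the full code is on pastebin):

# TF 1.x style, matching the line above; only the initial value changes.
learning_rate = tf.train.exponential_decay(0.01, global_step, 5000, 0.1, staircase=True)
optimizer = tf.train.GradientDescentOptimizer(learning_rate).minimize(loss, global_step=global_step)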