How to use a customized loss function with MXNet?

I am trying to learn how to use a customized loss function with MXNet.
Below is a minimal (non-)working example of linear regression.
When I set use_custom = False everything works fine, but with the custom loss it does not work. What am I doing wrong?
import mxnet as mx
import logging
logging.basicConfig(level='DEBUG')
use_custom = False
mx.random.seed(1)
A = mx.nd.random.uniform(-1, 1, (5, 1))
B = mx.nd.random.uniform(-1, 1)
X = mx.nd.random.uniform(-1, 1, (100, 5))
y = mx.nd.dot(X, A) + B
iter = mx.io.NDArrayIter(data=X, label=y, data_name='data', label_name='label', batch_size=20, shuffle=True)
data = mx.sym.Variable('data')
label = mx.sym.Variable('label')
net = mx.sym.FullyConnected(data, num_hidden=1)
if use_custom:
    net = mx.sym.MakeLoss(mx.sym.square(net - label))
else:
    net = mx.sym.LinearRegressionOutput(net, label=label)
mod = mx.mod.Module(net, label_names=('label',))
mod.fit(iter, num_epoch=50, eval_metric='mse', optimizer='adam')

This question was answered here:
https://discuss.mxnet.io/t/cannot-implement-customized-loss-function/797
Your custom loss is working as expected; you think it is not converging because the eval_metric takes the output of your network (which is now the loss) and compares it with the label. In your case I would use a custom evaluation metric, the identity function.
mod = mx.mod.Module(net, label_names=['label'])
identity = mx.metric.CustomMetric(lambda x,y:y, name='mse_id')
mod.fit(iter, num_epoch=10, eval_metric=identity, optimizer='adam')
This gives you this:
INFO:root:Epoch[0] Train-mse_id=0.434285
INFO:root:Epoch[0] Time cost=0.056
INFO:root:Epoch[1] Train-mse_id=0.000387
INFO:root:Epoch[1] Time cost=0.055
INFO:root:Epoch[2] Train-mse_id=0.000000
INFO:root:Epoch[2] Time cost=0.055
INFO:root:Epoch[3] Train-mse_id=0.000000
INFO:root:Epoch[3] Time cost=0.055
INFO:root:Epoch[4] Train-mse_id=0.000000
INFO:root:Epoch[4] Time cost=0.055
INFO:root:Epoch[5] Train-mse_id=0.000000
INFO:root:Epoch[5] Time cost=0.056
INFO:root:Epoch[6] Train-mse_id=0.000000
INFO:root:Epoch[6] Time cost=0.056
INFO:root:Epoch[7] Train-mse_id=0.000000
INFO:root:Epoch[7] Time cost=0.056
INFO:root:Epoch[8] Train-mse_id=0.000000
INFO:root:Epoch[8] Time cost=0.056
INFO:root:Epoch[9] Train-mse_id=0.000000
INFO:root:Epoch[9] Time cost=0.056
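If you also need the network's predictions while training with MakeLoss (whose output is the loss itself), a common MXNet pattern, sketched here and not part of the linked answer, is to group a gradient-blocked copy of the prediction with the loss:
pred = mx.sym.FullyConnected(data, num_hidden=1)
loss = mx.sym.MakeLoss(mx.sym.square(pred - label))
# Group a gradient-blocked copy of the prediction with the loss, so that
# mod.get_outputs() returns both the prediction and the loss value.
net = mx.sym.Group([mx.sym.BlockGrad(pred), loss])
mod = mx.mod.Module(net, label_names=('label',))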

Related

Implementation of early stopping with gradient descent

I am developing an algorithm based on gradient descent and I would like to add early stopping regularization. I have an objective function, F, and I minimize it with respect to W.
This is given in the code below:
Data: X_Train, Y_Train
t = 1
while (t < maxIteration):
    W = W - step * Grad(F, X, W)
    loss(t) = computeLoss(X, Y, W)
    t = t + 1
end
Now I want to add early stopping regularization: this technique consists in choosing the moment to stop during the optimization process (i.e., breaking the loop). How should I choose this moment? Do I have to evaluate my model on the validation data at each iteration and keep a history?
What I'm trying to do is given below:
Data: X_Train, Y_Train, X_val, Y_val
t = 1
maxIteration = 100
models = array of size maxIteration
while (t < maxIteration):
    W = W - step * Grad(F, X, W)
    loss(t) = computeLoss(X, Y, W)
    models(t) = W
    t = t + 1
end
How do I choose the W model among all those I have stored?
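For reference, a minimal sketch of the usual recipe (my own illustration, reusing the Grad and computeLoss names from the pseudocode above): monitor the validation loss at each iteration, remember the weights with the lowest value so far, and stop once it has not improved for a fixed number of iterations (the patience):
best_W, best_val_loss = None, float('inf')
patience, wait = 10, 0                         # stop after 10 iterations without improvement
t = 1
while t < maxIteration:
    W = W - step * Grad(F, X_Train, W)
    val_loss = computeLoss(X_val, Y_val, W)    # monitor the validation loss, not the training loss
    if val_loss < best_val_loss:
        best_val_loss, best_W, wait = val_loss, W, 0
    else:
        wait += 1
        if wait >= patience:
            break
    t = t + 1
# best_W is the model to keep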

My model gives terrible results when I try to forecast a univariate time series

I am trying to do univariate time series forecasting. My model works perfectly on other datasets, but for this dataset the predictions are incredibly bad (I tried 50, 100, and 200 epochs, different batch sizes, and different learning rates; nothing changed, so I think there is something wrong with my dataset).
Here are the statistics of my dataset:
Mean: 49.840000, standard deviation: 31.786387
(My architecture, a sample from my dataset, and my predicted values were shown as images, not reproduced here.)
Here is the normalization code I'm using:
def NormalizeMult(data):
    # normalize is kept so the scaling can be inverted later (de-normalization)
    data = np.array(data)
    normalize = np.arange(2*data.shape[1], dtype='float64')
    normalize = normalize.reshape(data.shape[1], 2)
    print(normalize.shape)
    for i in range(0, data.shape[1]):
        # column i
        list = data[:, i]
        listlow, listhigh = np.percentile(list, [0, 100])
        # print(i)
        normalize[i, 0] = listlow
        normalize[i, 1] = listhigh
        delta = listhigh - listlow
        if delta != 0:
            # row j
            for j in range(0, data.shape[0]):
                data[j, i] = (data[j, i] - listlow)/delta
    # np.save("./normalize.npy", normalize)
    return data, normalize
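The comment notes that normalize is kept for de-normalization. As a reference, here is a minimal sketch of the matching inverse transform (my own addition; the name FNormalizeMult is only illustrative, assuming the same [low, high] layout per column):
def FNormalizeMult(data, normalize):
    # undo the per-column min-max scaling: x_original = x_scaled * (high - low) + low
    data = np.array(data)
    for i in range(data.shape[1]):
        listlow, listhigh = normalize[i, 0], normalize[i, 1]
        delta = listhigh - listlow
        if delta != 0:
            data[:, i] = data[:, i] * delta + listlow
    return data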
Here is the code where I load the dataset and normalize it:
INPUT_DIMS = 1
TIME_STEPS = 4
lstm_units = 64
# normalization
series = read_csv('/content/logs.csv')
series = series.drop(["timestamp"],axis=1)
series= series.dropna()
series = series.head(100)
data=series
data,normalize = NormalizeMult(data[0:50])
pollution_data = data[:,0].reshape(len(data[0:50]),1)
train_X, _ = split_sequence(data,TIME_STEPS)
_ , train_Y = split_sequence(data,TIME_STEPS)
optimizer = tf.keras.optimizers.Adam(lr=0.001)
m = attention_model()
m.summary()
m.compile(optimizer, loss='mse')
m.fit(train_X, train_Y, epochs=500, batch_size=2, validation_split=0.1)
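split_sequence is not shown in the question. Based on how it is called above, a common sliding-window implementation might look like this (an assumption, not the asker's code):
import numpy as np
def split_sequence(sequence, n_steps):
    # slide a window of length n_steps over the rows; the row right after
    # each window becomes the target value for that window
    X, y = [], []
    for i in range(len(sequence) - n_steps):
        X.append(sequence[i:i + n_steps])
        y.append(sequence[i + n_steps])
    return np.array(X), np.array(y)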

CPU time in docplex - Python

Let us assume that I have created a mathematical model in Python and want to solve it using the code below (with the docplex library).
start = time.perf_counter() # CPU time calculator of the CPLEX solver
# Obj 1
mdl.minimize(obj1)
solution = mdl.solve(log_output=True)
if (solution is not None) and (solution.is_feasible_solution()):
    lb[0] = obj1.solution_value
    if obj2.solution_value > ub[1]: ub[1] = obj2.solution_value
    if obj3.solution_value > ub[2]: ub[2] = obj3.solution_value
    sol[0, 0] = obj1.solution_value
    sol[0, 1] = obj2.solution_value
    sol[0, 2] = obj3.solution_value
    sol[0, 3] = round(time.perf_counter() - start, 3)
Given that I have set mdl.time_limit=480, why could the time recorded in sol[0, 3] be greater than 480 seconds?
Thanks!
Within docplex you can get the solve time:
from docplex.mp.model import Model
mdl = Model(name='buses')
nbbus40 = mdl.integer_var(name='nbBus40')
nbbus30 = mdl.integer_var(name='nbBus30')
mdl.add_constraint(nbbus40*40 + nbbus30*30 >= 300, 'kids')
mdl.minimize(nbbus40*500 + nbbus30*400)
mdl.solve(log_output=True,)
mdl.export("c:\\temp\\buses.lp")
for v in mdl.iter_integer_vars():
    print(v, " = ", v.solution_value)
print(mdl.solve_details)
print("solve time =",mdl.solve_details.time)
gives
status = integer optimal solution
time = 0.109 s.
problem = MILP
gap = 0%
solve time = 0.10900000005494803
To answer your initial question: the time_limit parameter applies only to the solve call and is handled internally by CPLEX, counting exclusively solve time.
The Python time module, on the other hand, counts time in a different manner: it also includes the Python code that is executed after the call to solve but before the second call to time.perf_counter().
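To make the difference concrete, here is a small sketch (reusing the buses model above) that records both timings side by side:
import time
from docplex.mp.model import Model
mdl = Model(name='buses')
nbbus40 = mdl.integer_var(name='nbBus40')
nbbus30 = mdl.integer_var(name='nbBus30')
mdl.add_constraint(nbbus40*40 + nbbus30*30 >= 300, 'kids')
mdl.minimize(nbbus40*500 + nbbus30*400)
start = time.perf_counter()
mdl.solve(log_output=False)
wall = time.perf_counter() - start
print("CPLEX solve time:", mdl.solve_details.time)  # solver-internal time; this is what time_limit bounds
print("Python wall time:", round(wall, 3))          # also includes the Python work around the solve() call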

Multi-GPU TFF simulation errors "Detected dataset reduce op in multi-GPU TFF simulation"

I ran my code for an emotion detection model using TensorFlow Federated simulation. My code works perfectly fine using CPUs only. However, I received this error when trying to run TFF with GPUs.
ValueError: Detected dataset reduce op in multi-GPU TFF simulation: `use_experimental_simulation_loop=True` for `tff.learning`; or use `for ... in iter(dataset)` for your own dataset iteration.Reduce op will be functional after b/159180073.
What is this error about, and how can I fix it? I tried searching in many places but found no answer.
Here is the call stack if it helps. It is very long, so I pasted it into this link: https://pastebin.com/b1R93gf1
EDIT:
Here is the code containing iterative_process
def startTraining(output_file):
    iterative_process = tff.learning.build_federated_averaging_process(
        model_fn,
        client_optimizer_fn=lambda: tf.keras.optimizers.SGD(learning_rate=0.01),
        server_optimizer_fn=lambda: tf.keras.optimizers.SGD(learning_rate=1.0),
        use_experimental_simulation_loop=True
    )
    flstate = iterative_process.initialize()
    evaluation = tff.learning.build_federated_evaluation(model_fn)
    output_file.write(
        'round,available_users,loss,sparse_categorical_accuracy,val_loss,val_sparse_categorical_accuracy,test_loss,test_sparse_categorical_accuracy\n')
    curr_round_result = [0,0,100,0,100,0]
    min_val_loss = 100
    for round in range(1, ROUND_COUNT + 1):
        available_users = fetch_available_users_and_increase_time(ROUND_DURATION_AVERAGE + random.randint(-ROUND_DURATION_VARIATION, ROUND_DURATION_VARIATION + 1))
        if(len(available_users) == 0):
            write_to_file(curr_round_result)
            continue
        train_data = make_federated_data(available_users, 'train')
        flstate, metrics = iterative_process.next(flstate, train_data)
        val_data = make_federated_data(available_users, 'val')
        val_metrics = evaluation(flstate.model, val_data)
        curr_round_result[0] = round
        curr_round_result[1] = len(available_users)
        curr_round_result[2] = metrics['train']['loss']
        curr_round_result[3] = metrics['train']['sparse_categorical_accuracy']
        curr_round_result[4] = val_metrics['loss']
        curr_round_result[5] = val_metrics['sparse_categorical_accuracy']
        write_to_file(curr_round_result)
Here is the code for make_federated_data
def make_federated_data(users, dataset_type):
    offset = 0
    if(dataset_type == 'val'):
        offset = train_size
    elif(dataset_type == 'test'):
        offset = train_size + val_size
    global LOADED_USER
    for id in users:
        if(id + offset not in LOADED_USER):
            LOADED_USER[id + offset] = getDatasetFromFilePath(filepaths[id + offset])
    return [
        LOADED_USER[id + offset]
        for id in users
    ]
TFF does support Multi-GPU, and as the error message says one of two things is happening:
The code is using tff.learning but using the default use_experimental_simulation_loop argument value of False. With multiple GPUs, this must be set to True when using APIs including tff.learning.build_federated_averaging_process. For example, calling with:
training_process = tff.learning.build_federated_averaging_process(
..., use_experimental_simulation_loop=True)
The code contains a custom tf.data.Dataset.reduce(...) call somewhere. This must be replaced with Python code that iterates over the dataset. For example:
result = dataset.reduce(initial_state=0, reduce_func=lambda s, x: s + x)
becomes
s = 0
for x in iter(dataset):
    s += x
I realized that TFF did not yet support multiple GPUs in my setup. Therefore, we need to limit the number of visible GPUs to just one, using:
os.environ["CUDA_VISIBLE_DEVICES"] = "0"
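Note that this environment variable must be set before TensorFlow initializes the GPUs. If you prefer to do it in code, a hedged alternative (assuming TensorFlow 2.x) is to hide the extra devices via tf.config before any TFF computation runs:
import tensorflow as tf
gpus = tf.config.list_physical_devices('GPU')
if gpus:
    # make only the first GPU visible to this process; must run before any op touches the GPUs
    tf.config.set_visible_devices(gpus[0], 'GPU')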

RNN Slow-down phenomenon of Tensorflow

I found a peculiar property of TensorFlow's LSTM cell (not limited to LSTM, but that is all I examined) which, as far as I know, has not been reported.
I don't know whether it actually has, so I'm leaving this post on SO. Below is a toy code for this problem:
import tensorflow as tf
import numpy as np
import time
def network(input_list):
    input, init_hidden_c, init_hidden_m = input_list
    cell = tf.nn.rnn_cell.BasicLSTMCell(256, state_is_tuple=True)
    init_hidden = tf.nn.rnn_cell.LSTMStateTuple(init_hidden_c, init_hidden_m)
    states, hidden_cm = tf.nn.dynamic_rnn(cell, input, dtype=tf.float32, initial_state=init_hidden)
    net = [v for v in tf.trainable_variables()]
    return states, hidden_cm, net
def action(x, h_c, h_m):
    t0 = time.time()
    outputs, output_h = sess.run([rnn_states[:,-1:,:], rnn_hidden_cm], feed_dict={
        rnn_input: x,
        rnn_init_hidden_c: h_c,
        rnn_init_hidden_m: h_m
    })
    dt = time.time() - t0
    return outputs, output_h, dt
rnn_input = tf.placeholder("float", [None, None, 512])
rnn_init_hidden_c = tf.placeholder("float", [None,256])
rnn_init_hidden_m = tf.placeholder("float", [None,256])
rnn_input_list = [rnn_input, rnn_init_hidden_c, rnn_init_hidden_m]
rnn_states, rnn_hidden_cm, rnn_net = network(rnn_input_list)
feed_input = np.random.uniform(low=-1.,high=1.,size=(1,1,512))
feed_init_hidden_c = np.zeros(shape=(1,256))
feed_init_hidden_m = np.zeros(shape=(1,256))
sess = tf.Session()
sess.run(tf.global_variables_initializer())
for i in range(10000):
    _, output_hidden_cm, deltat = action(feed_input, feed_init_hidden_c, feed_init_hidden_m)
    if i % 10 == 0:
        print('Running time: ' + str(deltat))
    (feed_init_hidden_c, feed_init_hidden_m) = output_hidden_cm
    feed_input = np.random.uniform(low=-1., high=1., size=(1,1,512))
[Not important] What this code does is generate an output from the 'network()' function containing an LSTM, where the input's temporal dimension is 1 (so the output's is also 1), and feed the hidden state back in at each step of running.
[Important] Look at the 'sess.run()' part. For some reason, in my real code I happened to put [:,-1:,:] on 'rnn_states'. What happens then is that the time spent in each 'sess.run()' call keeps increasing. From my own inspection, I found that this slowdown stems from that [:,-1:,:]. I just wanted to get the output at the last time step. If you do 'outputs, output_h = sess.run([rnn_states, rnn_hidden_cm], feed_dict={~' without [:,-1:,:] and take 'last_output = outputs[:,-1:,:]' after the 'sess.run()', then the slowdown does not occur.
I do not know why this exponential increase in time happens with that [:,-1:,:] inside the run call. Is this an undocumented property of TensorFlow that particularly slows things down (maybe by adding more nodes to the graph on its own)?
Thank you, and I hope this post keeps other users from making the same mistake.
I encountered the same problem, with TensorFlow slowing down with each iteration, and found this question while trying to debug it. Here's a short description of my situation and how I solved it, for future reference. Hopefully it can point someone in the right direction and save them some time.
In my case the problem was mainly that I didn't use feed_dict to supply the network state when executing sess.run(). Instead I redeclared outputs, final_state and prediction on every iteration. The answer at https://github.com/tensorflow/tensorflow/issues/1439#issuecomment-194405649 made me realize how stupid that was... I was constantly creating new graph nodes in every iteration, making everything slower and slower. The problematic code looked something like this:
# defining the network
lstm_layer = rnn.BasicLSTMCell(num_units, forget_bias=1)
outputs, final_state = rnn.static_rnn(lstm_layer, input, initial_state=rnn_state, dtype='float32')
prediction = tf.nn.softmax(tf.matmul(outputs[-1], out_weights)+out_bias)
for input_data in data_seq:
    # redeclaring, stupid stupid...
    outputs, final_state = rnn.static_rnn(lstm_layer, input, initial_state=rnn_state, dtype='float32')
    prediction = tf.nn.softmax(tf.matmul(outputs[-1], out_weights)+out_bias)
    p, rnn_state = sess.run((prediction, final_state), feed_dict={x: input_data})
The solution was of course to declare the nodes only once at the beginning and supply the new data with feed_dict. The code went from being fairly slow (> 15 ms at the beginning) and getting slower with every iteration, to executing each iteration in around 1 ms. My new code looks something like this:
out_weights = tf.Variable(tf.random_normal([num_units, n_classes]), name="out_weights")
out_bias = tf.Variable(tf.random_normal([n_classes]), name="out_bias")
# placeholder for the network state
state_placeholder = tf.placeholder(tf.float32, [2, 1, num_units])
rnn_state = tf.nn.rnn_cell.LSTMStateTuple(state_placeholder[0], state_placeholder[1])
x = tf.placeholder('float', [None, 1, n_input])
input = tf.unstack(x, 1, 1)
# defining the network
lstm_layer = rnn.BasicLSTMCell(num_units, forget_bias=1)
outputs, final_state = rnn.static_rnn(lstm_layer, input, initial_state=rnn_state, dtype='float32')
prediction = tf.nn.softmax(tf.matmul(outputs[-1], out_weights)+out_bias)
# actual network state, which we input with feed_dict
_rnn_state = tf.nn.rnn_cell.LSTMStateTuple(np.zeros((1, num_units), dtype='float32'), np.zeros((1, num_units), dtype='float32'))
it = 0
for input_data in data_seq:
    encl_input = [[input_data]]
    p, _rnn_state = sess.run((prediction, final_state), feed_dict={x: encl_input, rnn_state: _rnn_state})
    print("{} - {}".format(it, p))
    it += 1
Moving the declarations out of the for loop also got rid of the problem the OP sdr2002 had, namely doing a slice outputs[-1] inside sess.run() inside the for loop.
As mentioned above, not slicing the output inside 'sess.run()' is the right approach in this case.
def action(x, h_c, h_m):
    t0 = time.time()
    outputs, output_h = sess.run([rnn_states, rnn_hidden_cm], feed_dict={
        rnn_input: x,
        rnn_init_hidden_c: h_c,
        rnn_init_hidden_m: h_m
    })
    outputs = outputs[:,-1:,:]
    dt = time.time() - t0
    return outputs, output_h, dt
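One extra safeguard (my own addition, not part of the original posts): after building the graph, you can finalize it; TensorFlow 1.x then raises a RuntimeError as soon as anything, such as a slice built inside sess.run(), tries to add new nodes to the graph.
sess = tf.Session()
sess.run(tf.global_variables_initializer())
sess.graph.finalize()   # any later attempt to create new graph nodes raises a RuntimeError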