I have my own environment, which is basically a simulation of a stochastic differential equation, and I'm trying to train an agent on it using PPO. I can see from TensorBoard that the average reward and episode length are increasing, but when is it good enough?
I thought of comparing it with the maximum length of an episode, and I was thinking this is given by the parameter total_timesteps in the call:
model.learn(total_timesteps=TIMESTEPS, reset_num_timesteps=False, tb_log_name=f"PPO")
Is this true? I'm using Stable Baselines3 and I'm following the tutorial from here, so my code is similar, but I work with a continuous state-action space. This is my code without the environment:
from stable_baselines3 import PPO
import os
from Measure_gym import *
import time

models_dir = f"models/PPO/"
logdir = f"logs/"

if not os.path.exists(models_dir):
    os.makedirs(models_dir)

if not os.path.exists(logdir):
    os.makedirs(logdir)

env = bit_flip()
env.reset()

model = PPO('MlpPolicy', env, verbose=1, tensorboard_log=logdir)

# training
TIMESTEPS = 100000
episodes = 1000
iters = 0
for ep in range(episodes):
    iters += 1
    model.learn(total_timesteps=TIMESTEPS, reset_num_timesteps=False, tb_log_name=f"PPO")
    model.save(f"{models_dir}/{TIMESTEPS*iters}")
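For context, this is the kind of check I was planning to run after training, using Stable Baselines3's evaluate_policy (a minimal sketch; whether a mean episode length close to the environment's cap counts as "good enough" is exactly the assumption I'm asking about):

from stable_baselines3.common.evaluation import evaluate_policy

# Run some deterministic episodes with the trained model and look at the
# per-episode returns and lengths directly.
episode_rewards, episode_lengths = evaluate_policy(
    model, env,
    n_eval_episodes=20,
    deterministic=True,
    return_episode_rewards=True,  # per-episode lists instead of mean/std
)
print("mean reward:", sum(episode_rewards) / len(episode_rewards))
print("mean length:", sum(episode_lengths) / len(episode_lengths))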
I have a training set on which I would like to train a neural network, using K-fold cross-validation.
TL;DR: Given the number of epochs and the set of params to be used, and checking on the test set, how does RandomizedSearchCV train the model? I would think that for a given combination of params, it trains the model on (K-1) folds for the specified number of epochs, then tests it on the remaining fold. But then, what prevents us from overfitting? With "vanilla" training and a constant validation set, Keras evaluates on the validation set after each epoch; is that done here as well? Even though verbose=1, I don't see the scores from the fit on the remaining fold. I saw here that we can add callbacks to the KerasClassifier, but then, what happens if the settings of KerasClassifier and RandomizedSearchCV clash? Can I add a callback there to monitor val_prc, for example? If so, what would happen?
Sorry for the long TL;DR!
Regarding the training procedure, I am using the keras-sklearn interface. I defined the model using
model = KerasClassifier(build_fn=get_model_, epochs=120, batch_size=32, verbose=1)
where get_model_ is a function that returns a compiled tf.keras model.
Given the model, the training procedure is the following:
params = dict({'l2': [0.1, 0.3, 0.5, 0.8],
               'dropout_rate': [0.1, 0.3, 0.5, 0.8],
               'batch_size': [16, 32, 64, 128],
               'learning_rate': [0.001, 0.01, 0.05, 0.1]})

def trainer(model, X, y, folds, params, verbose=None):
    from keras.wrappers.scikit_learn import KerasClassifier
    from tensorflow.keras.optimizers import Adam
    from sklearn.model_selection import GridSearchCV, RandomizedSearchCV

    if not verbose:
        v = 0
    else:
        v = verbose

    clf = RandomizedSearchCV(model,
                             param_distributions=params,
                             n_jobs=1,
                             scoring="roc_auc",
                             cv=folds,
                             verbose=v)

    # -------------- fit ------------
    grid_result = clf.fit(X, y)

    # summarize results
    print('- ' * 40)
    print("Best: %f using %s" % (grid_result.best_score_, grid_result.best_params_))
    print('- ' * 40)

# ------ Training -------- #
trainer(model, X_train, y_train, folds, params, verbose=1)
First, do I use RandomizedSearchCV right? Regardless of the number of options for each param I get the same message: Fitting 5 folds for each of 10 candidates, totalling 50 fits
Second, I have a hard problem with imbalanced data + lack of data. Even so, I get unexpectedly low scores and high loss values.
Lastly, and following the TL;DR, what is the training procedure that is actually being done by the above code, assuming it is correct?
Thanks!
First, do I use RandomizedSearchCV right? Regardless of the number of options for each param I get the same message: Fitting 5 folds for each of 10 candidates, totalling 50 fits
RandomizedSearchCV has an argument n_iter that defaults to 10, so it samples 10 parameter configurations no matter how many possible ones there are. If you want to try all combinations, use GridSearchCV instead.
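For example (a small sketch; model, params, and folds stand for the objects from the question):

from sklearn.model_selection import GridSearchCV, RandomizedSearchCV

# Sample 25 random configurations instead of the default 10.
random_search = RandomizedSearchCV(model, param_distributions=params,
                                   n_iter=25, scoring="roc_auc", cv=folds)

# Exhaustively try all 4*4*4*4 = 256 combinations from the question's grid.
grid_search = GridSearchCV(model, param_grid=params,
                           scoring="roc_auc", cv=folds)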
Second, I have a hard problem with imbalanced data + lack of data. Even so, I get unexpectedly low scores and high loss values.
This is far too broad / ill-posed a question for Stack Overflow.
Lastly, and following the TL;DR, what is the training procedure that is actually being done by the above code, assuming it is correct?
for i = 1 to n_iter (10):
    get random hyperparameters from the provided space
    split the data into 5 equal chunks (X_1, y_1), ..., (X_5, y_5)
    scores = []
    for k = 1 to 5:
        train the model with the given hyperparameters on all chunks apart from (X_k, y_k)
        evaluate the above model on (X_k, y_k)
        append the score to scores
    if avg(scores) > best_score:
        best_score = avg(scores)
        best_model = model
        best_hyperparameters = hyperparameters
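As for not seeing the per-fold scores: after fit they are stored on the search object's cv_results_ attribute, and by default the best configuration is refit on the full training set (refit=True). A small sketch (search stands for the fitted RandomizedSearchCV, i.e. the grid_result inside trainer, which you would need to return):

import pandas as pd

# One row per sampled configuration, with a splitK_test_score column per fold.
results = pd.DataFrame(search.cv_results_)
print(results[['params', 'split0_test_score', 'mean_test_score',
               'std_test_score', 'rank_test_score']])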
Recently, I decided to apply some of the reinforcement learning and deep Q-learning I've learned to OpenAI's LunarLander environment.
My algorithm is Deep Q-learning with experience replay, and I wanted to be able to save the model/agent and then load it on its own and have it just interact with the environment, without any fitting/training of its weights. During training I saved a few models using q_network.save(directory+"lunar_model_score{}.h5".format(accum_reward)) at the end of the episodes with the highest consecutive scores and a low epsilon value (so that the model does more predicting than exploring).
However, when I load the model elsewhere and run it in the environment without training, it performs very poorly, as if it had not been trained. Here is the code for testing:
import gym
import tensorflow as tf
import matplotlib.pyplot as plt
import numpy as np

env = gym.make('LunarLander-v2')
action_space = env.action_space.n
state_space = env.observation_space.shape[0]

lunar_agent = tf.keras.models.load_model('C:/Users/haora/gymEnv/LunarLand/models/lunar_model_score215.65755254109038.h5')

file_name = 'lunarLand_test_data.txt'
datafile = open(file_name, "w+")

episodes = 10
lunar_agent.summary()
#print(lunar_agent.get_weights())

for e in range(episodes):
    state = env.reset()
    accum_reward = 0
    while True:
        env.render()
        state = np.reshape(state, [1, state_space])
        prediction = lunar_agent.predict(state)
        action = np.argmax(prediction[0])
        next_state, reward, done, _ = env.step(action)
        accum_reward += reward
        if done:
            break
    print("episode:{}/{} | score:{}".format(e, episodes, accum_reward))
    datafile.write(str(e) + ',' + str(accum_reward) + '\n')

env.close()
datafile.close()
I've verified that the weights and architecture saved from the training file are the same as the weights I get when I call print(lunar_agent.get_weights()). So I'm wondering why there is such a big discrepancy between the model during training and the model when it only interacts with the environment, and how to fix it so that I can load models from different stages of training and have the agent perform accordingly when it only interacts with the environment.
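For reference, one way to make that weight check explicit (a minimal sketch; q_network is the in-memory network from my training script, state_space is as above, and the path stands in for the saved .h5 file):

import numpy as np
import tensorflow as tf

# Reload the saved model and compare it to the in-memory network,
# both weight-by-weight and on the Q-values for a few random states.
reloaded = tf.keras.models.load_model('models/lunar_model.h5')

for w_train, w_loaded in zip(q_network.get_weights(), reloaded.get_weights()):
    assert np.allclose(w_train, w_loaded)

test_states = np.random.randn(5, state_space).astype('float32')
assert np.allclose(q_network.predict(test_states), reloaded.predict(test_states))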
I am training a model using Keras 2.2.4, Python 3.5.3, and TensorFlow on a GCP virtual machine with a K80 GPU.
GPU utilisation oscillates between 25% and 50%, while the Python process eats 98% of a CPU.
I assume Python is too slow to feed the K80 with data.
The code is below.
There are multiple days of data for each epoch.
Each day has around 20K samples; the number is a bit different for each day.
The batch size is fixed by the variable window_size = 900.
So I feed it around 19K batches per day. Batch 0 starts at sample 0 and takes 900 samples, batch 1 starts at sample 1 and takes 900 samples, and so on until the day ends.
So I have 3 loops: epoch, days, batches.
I feel the epoch and day loops should be preserved for clarity; I don't think they are the problem.
I think the innermost loop is the one to look at.
The implementation of the inner loop is naïve. Is there some trickery that can make the array work faster?
# d is a tuple from groupby - d[0] = date, d[1] = values
for epoch in epochs:
    print('epoch: ', epoch)
    for d in days:
        print(' day: ', d[0])
        # get arrays for the day
        features = np.asarray(d[1])[:, 2:9].astype(dtype='float32')
        print(len(features), len(features[0]), features[1].dtype)
        labels = np.asarray(d[1])[:, 9:].astype(dtype='int8')
        print(len(labels), len(labels[0]), labels[1].dtype)
        for batch in range(len(features) - window_size):
            # # # can these be optimised?
            fb = features[batch:batch + window_size, :]
            lb = labels[batch:batch + window_size, :]
            fb = fb.reshape(1, fb.shape[0], fb.shape[1])
            lb = lb.reshape(1, lb.shape[0], lb.shape[1])
            # # #
            model.train_on_batch(fb, lb)
        # for batches
        # model.reset_states()
    # for days
# for epoch
Try wrapping your script with:

import tensorflow as tf

with tf.device('/device:GPU:0'):
    <your code>

Check out the TensorFlow guide on using GPUs for more information.
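As for the innermost loop in the question, one possible way to cut the Python overhead is to build the sliding windows in bulk and pass many of them per train_on_batch call (a sketch only; it assumes NumPy >= 1.20 for sliding_window_view and a model that accepts more than one window per batch, which would not hold for a stateful RNN trained with batch size 1):

import numpy as np

def window_batches(features, labels, window_size=900, windows_per_batch=32):
    # Build all sliding windows for the day as zero-copy views, then hand
    # the GPU many windows per train_on_batch call instead of one.
    f_win = np.lib.stride_tricks.sliding_window_view(features, window_size, axis=0)
    l_win = np.lib.stride_tricks.sliding_window_view(labels, window_size, axis=0)
    # sliding_window_view appends the window axis; move it next to the batch axis
    f_win = np.moveaxis(f_win, -1, 1)   # (n_windows, window_size, n_features)
    l_win = np.moveaxis(l_win, -1, 1)   # (n_windows, window_size, n_labels)
    for start in range(0, len(f_win), windows_per_batch):
        yield (f_win[start:start + windows_per_batch],
               l_win[start:start + windows_per_batch])

# the inner loop would then become:
# for fb, lb in window_batches(features, labels, window_size):
#     model.train_on_batch(fb, lb)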
I just recently learned TensorFlow. I tried to run a simple regression example, but I got a bad result.
My input X is a 10x10000 matrix, that is, each data point is a 10x1 vector, with 10000 data points in total.
The desired output Y is just the first row of X.
My code is as follows:
import tensorflow as tf
import numpy as np
from numpy.random import RandomState

rdm = RandomState(1)
data_size = 10000
xdim = 10
X = rdm.rand(data_size, xdim)
Y = [x1[0] for x1 in X]

x = tf.placeholder(tf.float32, shape=(None, xdim))
y = tf.placeholder(tf.float32, shape=(None))

#logits = modelFun(x)
Weights = tf.Variable(tf.random_normal([xdim, 1]))
biases = tf.Variable(0.1)
logits = tf.matmul(x, Weights) + biases

loss = tf.reduce_mean(tf.square(logits - y))
optimizer = tf.train.GradientDescentOptimizer(0.005).minimize(loss)

batch_size = 50
saver = tf.train.Saver()

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    steps = 20001
    for i in range(steps):
        start = i * batch_size % data_size
        end = min(start + batch_size, data_size)
        sess.run(optimizer, feed_dict={x: X[start:end], y: Y[start:end]})
        if i % 5000 == 0:
            ypred, training_loss = sess.run([logits, loss], feed_dict={x: X, y: Y})
            print("Epoch %d: loss=%g" % (i, training_loss))
The output results are as follows:
Epoch 0: loss=6.31555
Epoch 5000: loss=0.0798763
Epoch 10000: loss=0.0797333
Epoch 15000: loss=0.0797259
Epoch 20000: loss=0.079724
The loss gets stuck around 0.0797 and won't go any lower.
I checked part of the output; the predictions are far from the correct answers.
>>>print(ypred[:10].T[0])
[ 0.49342471 0.49475971 0.50192004 0.48912409 0.50592101 0.48473218 0.48652697 0.50261581 0.50218904 0.48906678]
>>>print(np.array(Y[:10]))
[ 0.417022 0.41919451 0.80074457 0.09834683 0.98886109 0.01936696 0.10233443 0.90340192 0.88330609 0.11474597]
What is the reason for this? How can I solve it?
Thanks for your help!
You're asking too much of your model. You're generating 10000 points of ten-dimensional random data, so there's no structure to learn, and then doing linear regression with a single neuron; your model doesn't have the capacity to even begin to memorize your input, so guessing that every y is about 0.5 is the best it can do.
The biggest issue is the random input. Most types of machine learning models make strong assumptions about the structure of what they're trying to learn, and random data doesn't have that structure. A large enough neural network could memorize your input data and give you a low training error, but it would completely fail to generalize (the test error would be high), and generalizing is usually the goal.
I'm trying to solve a simple linear regression problem using gradient descent with TensorFlow, but unless I set my step size really, really small, the weights and bias balloon and overflow almost immediately. Here's my code:
import numpy as np
import tensorflow as tf

# Read the data
COLUMNS = ["url", "title_length", "article_length", "keywords", "shares"]
data = np.genfromtxt("OnlineNewsPopularitySample3.csv", delimiter=',', names=COLUMNS)

# We're looking for shares based on article_length
article_length = tf.placeholder("float")
shares = tf.placeholder("float")

# Set up the variables we're going to use
initial_m = 1.0
initial_b = 1.0
w = tf.Variable([initial_m, initial_b], name="w")

predicted_shares = tf.multiply(w[0], article_length) + w[1]
error = tf.square(predicted_shares - shares)

# This is as big as I can make it; any larger, and I have problems.
step_size = .000000025

optimizer = tf.train.GradientDescentOptimizer(step_size).minimize(error)
model = tf.global_variables_initializer()

with tf.Session() as session:
    # First initialize all the variables
    session.run(model)
    # Now we're going to run the optimizer
    for i in range(100000):
        session.run(optimizer, feed_dict={article_length: data['article_length'], shares: data['shares']})
        if (i % 100 == 0):
            print(session.run(w))
    # Once it's done, we need to get the value of w so we can display it.
    w_value = session.run(w)
    print("Predicted model: {a:.3f}x + {b:.3f}".format(a=w_value[0], b=w_value[1]))
So basically, when I run this, the outputs become "NaN" almost immediately. Any ideas?
Thanks in advance!
A very low learning rate means a very small update to the weights. In your case, even a relatively small learning rate is blowing up your weights; that's because the weight updates (dE/dW) seem to be very large. The update is a function of the output error: if the labels are large values, your squared error will be high at the start, since the predictions will be quite low. Try scaling the outputs to avoid this problem.
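For instance, a minimal sketch of what scaling could look like here, standardising the two columns from the question's data before feeding them (the exact scaler is a choice, not a requirement):

import numpy as np

# Standardise the feature and the target so the initial squared error,
# and therefore the gradient on w, starts at a moderate scale.
x_raw = data['article_length']
y_raw = data['shares']

x_scaled = (x_raw - x_raw.mean()) / x_raw.std()
y_scaled = (y_raw - y_raw.mean()) / y_raw.std()

# then train with a more ordinary step size, e.g. 0.01:
# session.run(optimizer, feed_dict={article_length: x_scaled, shares: y_scaled})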