Predict flight delays with RNN - tensorflow2.0

I want to use rnn to predict flight delay
here is dataset website : https://www.transtats.bts.gov/Tables.asp?DB_ID=120&DB_Name=Airline%20On-Time%20Performance%20Data&DB_Short_Name=On-Time
x_train.shape = (75395, 16)
x_test.shape = (32313, 16)
I need to build RNN model to predict flight delays .
here is my notebook
https://github.com/AhmadAbdElhameed/flight_delays/blob/master/Flight%20Delay.ipynb
(go to cell 83) - for rnn model
I need help to create rnn model to predict delay

Related

LSTM multivariate predicting multiple features

I am new to this neural networks and LSTM. I hope I will get a guidance from you and I will be thankful to you.
I have 2 years of bitcoin historical dataset and bitcoin sentiment dataset which is of one hour interval. My goal is to predict next 60 hours future chart using LSTM.
I have seen some of the articles regarding multivariate time series prediction. But in all of them they are taking only one feature for prediction. They predict only the price of one upcoming day and . So in order to predict next 2 months data, I have to predict all of the features. So that I can seed the predicted data as input for the next prediction and so on to predict for next 60 days.
Can someone help me to figure out how can I do this kind of prediction?
Edit:
The dataset looks like this:
timestamp,close,sentiment
2020-05-01_00,8842.85,0.21
2020-05-01_01,8824.43,0.2
2020-05-01_02,8745.91,0.2
2020-05-01_03,8639.12,0.19
2020-05-01_04,8625.69,0.2
And I would like to use tenserflow as backend. As of now i have not written code for building the model as I have to know what to do before i start coding.
The idea is to give 100 or 150 rows of data as input to the model and then forecast for the next 60 hours by seeding the prediction of the model as the input for the next prediction.
It would help if you shared some code for how you are constructing your model, and what your data looks like. How is your sentiment data encoded, and what framework you are using (tensorflow, pytorch, etc)? I am mostly familiar with Tensorflow, so I'll point you in that direction.
In general it can be helpful to use an Input Layer, but LSTMs expect a 3D tensor [batch, timestamps, feature].
You might want to consider a non-sequential model architecture, using functional APIs. If you went that route, you could have 2 separate inputs. One being the price time series, the other being the sentiment time series
pass each to an LSTM then you can concatenate/combine them and pass them to Dense layers or even convolutional layers.
Lastly you could also look into ConvLSTM2D which takes a 5D tensor:[samples, time, channels, rows, cols]
#------------------ Response (👇) post update: -------------------------
View the notebook here
#===========Design Model Architecture:
#==== Create Input Layers:
Price_Input = tf.keras.layers.Input(shape=(60,),name='Price_Input') #Price as Input
Sent_Input = tf.keras.layers.Input(shape=(60,),name='Sentiment_Input') #Sentiment as Input
#=== Handle Reshaping as part of the Model Architecture:
P_Input_rshp = tf.keras.layers.Reshape(target_shape=(60,1),
input_shape=(60,),
name='Price_Reshape')(Price_Input) #Pass price to reshape layer
S_Input_rshp = tf.keras.layers.Reshape(target_shape=(60,1),
input_shape=(60,),
name='Sentiment_Reshape')(Sent_Input) #Pass sentiment to rehape layer
#=== Use LSTM layers for timeseries:
P_x = tf.keras.layers.LSTM(units=1,activation='tanh',name='Price_LSTM')(P_Input_rshp) #Price Focused LSTM
S_x = tf.keras.layers.LSTM(units=1,activation='tanh',name='Sentiment_LSTM')(S_Input_rshp) #Sentiment Focused LSTM
C_x = tf.keras.layers.Concatenate(name='Concat')([P_x,S_x]) #Concatinate(join) inputs from each branch
Output = tf.keras.layers.Dense(units=1,name='Dense')(C_x) #Dense layer as model output to synthesize results
#============== Greate Model Graph:
model = tf.keras.Model(inputs=[Price_Input,Sent_Input],
outputs=Output,
name='Double_LSTM_Model')

How can I properly train a model to predict a moving average using LSTM in keras?

I'm learning how to train RNN model on Keras and I was expecting that training a model to predict the Moving Average of the last N steps would be quite easy.
I have a time series with thousands of steps and I'm able to create a model and train it with batches of data.
If I train it with the following model though, the test set predictions differ a lot from real values. (batch = 30, moving average window = 10)
inputs = tf.keras.Input(shape=(batch_length, num_features))
x = tf.keras.layers.LSTM(10, return_sequences=False)(inputs)
outputs = tf.keras.layers.Dense(num_labels)(x)
model = tf.keras.Model(inputs=inputs, outputs=outputs, name="test_model")
To be able to get good predictions, I need to add another layer of TimeDistributed, getting 2D predictions instead of 1D ones (I get one prediction per each time step)
inputs = tf.keras.Input(shape=(batch_length, num_features))
x = tf.keras.layers.LSTM(10, return_sequences=True)(inputs)
x = tf.keras.layers.TimeDistributed(tf.keras.layers.Dense(num_labels))(x)
outputs = tf.keras.layers.Dense(num_labels)(x)
model = tf.keras.Model(inputs=inputs, outputs=outputs, name="test_model")
I suggest that if your goal is to give as input the last 10 timesteps and have as a prediction the moving average to try a regressor model with Densely Connected layers rather than an RNN. (Linear activation with regularization might work well enough)
That option would be cheaper to train and run than an LSTM

Expected validation accuracy for Keras Mobile Net V1 for CIFAR-10 (training from scratch)

Has anybody trained Mobile Net V1 from scratch using CIFAR-10? What was the maximum accuracy you got? I am getting stuck at 70% after 110 epochs. Here is how I am creating the model. However, my training accuracy is above 99%.
#create mobilenet layer
MobileNet_model = tf.keras.applications.MobileNet(include_top=False, weights=None)
# Must define the input shape in the first layer of the neural network
x = Input(shape=(32,32,3),name='input')
#Create custom model
model = MobileNet_model(x)
model = Flatten(name='flatten')(model)
model = Dense(1024, activation='relu',name='dense_1')(model)
output = Dense(10, activation=tf.nn.softmax,name='output')(model)
model_regular = Model(x, output,name='model_regular')
I used Adam optimizer with a LR= 0.001, amsgrad = True and batch size = 64. Also normalized pixel data by dividing by 255.0. I am not using any Data Augmentation.
optimizer1 = tf.keras.optimizers.Adam(lr=0.001, amsgrad=True)
model_regular.compile(optimizer=optimizer1, loss='categorical_crossentropy', metrics=['accuracy'])
history = model_regular.fit(x_train, y_train_one_hot,validation_data=(x_test,y_test_one_hot),batch_size=64, epochs=100) # train the model
I think I am supposed to get at least 75% according to https://arxiv.org/abs/1712.04698
Am I am doing anything wrong or is this the expected accuracy after 100 epochs. Here is a plot of my validation accuracy.
Mobilenet was designed to train Imagenet which is much larger, therefore train it on Cifar10 will inevitably result in overfitting. I would suggest you plot the loss (not acurracy) from both training and validation/evaluation, and try to train it hard to achieve 99% training accuracy, then observe the validation loss. If it is overfitting, you would see that the validation loss will actually increase after reaching minima.
A few things to try to reduce overfitting:
add dropout before fully connected layer
data augmentation - random shift, crop and rotation should be enough
use smaller width multiplier (read the original paper, basically just reduce number of filter per layers) e.g. 0.75 or 0.5 to make the layers thinner.
use L2 weight regularization and weight decay
Then there are some usual training tricks:
use learning rate decay e.g. reduce the learning rate from 1e-2 to 1e-4 stepwise or exponentially
With some hyperparameter search, I got evaluation loss of 0.85. I didn't use Keras, I wrote the Mobilenet myself using Tensorflow.
The OP asked about MobileNetv1. Since MobileNetv2 has been published, here is an update on training MobileNetv2 on CIFAR-10 -
1) MobileNetv2 is tuned primarily to work on ImageNet with an initial image resolution of 224x224. It has 5 convolution operations with stride 2. Thus the GlobalAvgPool2D (penultimate layer) gets a feature map of Cx7x7, where C is the number of filters (1280 for MobileNetV2).
2) For CIFAR10, I changed the stride in the first three of these layers to 1. Thus the GlobalAvgPool2D gets a feature map of Cx8x8. Secondly, I trained with 0.25 on the width parameter (affects the depth of the network). I trained with mixup in mxnet (https://gluon-cv.mxnet.io/model_zoo/classification.html). This gets me a validation accuracy of 93.27.
3) Another MobileNetV2 implementation that seems to work well for CIFAR-10 is available here - PyTorch-CIFAR
The reported accuracy is 94.43. This implementation changes the stride in the first two of the original layers which downsample the resolution to stride 1. And it uses the full width of the channels as used for ImageNet.
4) Further, I trained a MobileNetV2 on CIFAR-10 with mixup while only setting altering the stride in the first conv layer from 2 to 1 and used the complete depth (width parameter==1.0). Thus the GlobalAvgPool2D (penultimate layer) gets a feature map of Cx2x2. This gets me an accuracy of 92.31.

Recurrent Neural Network Mini-Batch dependency after trained

Currently, I have a neural network, built in tensorflow that is used to classify time sequence data into one of 6 categories. The network is composed of:
2 fully connected layers -> LSTM unit -> softmax -> output
All layers have regularization in the form of dropout and or layer normalization. In order to speed up the training process, I am using mini-batching of the data, where the mini-batch size = # of categories = 6. Each mini-batch contains exactly one sample for each of the 6 categories, arranged randomly in the mini-batch. Below is the feed-forward code, where x is of shape [batch_size, number of time steps, number of features], and the various get commands are simple definitions for creating standard fully connected layers and LSTM units with regularization.
def getFullyConnected(input ,hidden ,dropout, layer, phase):
weight = tf.Variable(tf.random_normal([input.shape.dims[1].value,hidden]), name="weight_layer"+str(layer))
bias = tf.Variable(tf.random_normal([1]), name="bias_layer"+str(layer))
layer = tf.add(tf.matmul(input, weight), bias)
layer = tf.contrib.layers.batch_norm(layer,
center=True, scale=True,
is_training=phase)
layer = tf.minimum(tf.nn.relu(layer), FLAGS.relu_clip)
layer = tf.nn.dropout(layer, (1.0 - dropout))
return layer
def RNN(x, weights, biases, time_steps):
#shape the input as [batch_size*time_steps, input_depth]
x = tf.reshape(x, [-1,input_depth])
layer1 = getFullyConnected(input=x, hidden=16, dropout=full_drop, layer=1, phase=True)
layer2 = getFullyConnected(input=layer1, hidden=input_depth*3, dropout=full_drop, layer=2, phase=True)
rnn_input = tf.reshape(layer2, [-1,time_steps,input_depth*3])
# 1-layer LSTM with n_hidden units.
LSTM_cell = getLSTMcell(n_hidden)
#generate prediction
outputs, state = tf.nn.dynamic_rnn(LSTM_cell,
rnn_input,
dtype=tf.float32,
time_major=False)
#good old tensorboard saves
tf.summary.histogram('weight', weights['out'])
tf.summary.histogram('bias',biases['out'])
#there are time_steps outputs, but only grab the last output for the classification
return tf.sigmoid(tf.matmul(outputs[:,-1,:], weights['out']) + biases['out'])
Surprisingly, this network trained extremely well giving me about 99.75% accuracy on my test data (which the trained network had never seen). However, it only scored this high when I fed the training data into the network with a mini-batch size the same as during training, 6. If I only fed the training data one sample at a time (mini-batch size = 1), the network was scoring around 60%. What is weird is that, if I train the network with only single samples (mini-batch size = 1), the trained network works perfectly fine with high accuracy once the network is trained. This leads me to the weird conclusion that the network is almost learning to utilize the batch size in its learning, so much so that it becomes dependent on the mini-batch to classify correctly.
Is it a thing for a deep network to become dependent on the size of the mini-batch during training, so much that the final trained network will require input data to have the same mini-batch size just to perform correctly?
All ideas or thoughts would be loved!

Keras models in tensorflow

I'm building image processing network in tensorflow and I want to make use of texture loss. Texture loss seems simple to implement if you have pretrained model loaded.
I'm using TF to build the computational graph for my model and I want to incorporate Keras.application.VGG19 model to get output from layer 'block4_conv4'.
The problem is: I have two TF tensors target and result from my main model, how to feed them into keras VGG19 in the same session to compute their diff and use it in main loss for my model?
It seems following code does the trick
with tf.variable_scope("") as scope:
phi_func = VGG19(include_top=False, weights=None, input_shape=(128, 128, 3))
text_1 = phi_func(predicted)
scope.reuse_variables()
text_2 = phi_func(x)
text_loss = tf.reduce_mean((text_1 - text_2)**2)
right after session created I call phi_func.load_weights(path) to initiate weights