How can I build <feature_name : feature_weight> dict in Tensorflow v2.4 - tensorflow

I am new to TensorFlow. I use tf.feature_column for my feature-engineering task, including a crossed_column operation. After fitting a simple LR model, I can get the weight for every feature, but how do I know each feature's name? For example:
my_features = []
weight = IndicatorColumn(categorical_column=VocabularyListCategoricalColumn(key='weight', vocabulary_list=(2, 3, 0, 1), dtype=tf.int64, default_value=-1, num_oov_buckets=0))
my_features.append(weight)
vol = IndicatorColumn(categorical_column=VocabularyListCategoricalColumn(key='vol', vocabulary_list=(3, 4, 1, 2), dtype=tf.int64, default_value=-1, num_oov_buckets=0))
my_features.append(vol)
weight_X_vol = tf.feature_column.crossed_column(['weight', 'vol'], hash_bucket_size=50)
my_features.append(tf.feature_column.indicator_column(weight_X_vol))
In feature weight, its one-hot feature size is 4 (weight0, 1, 2, 3).
In feature vol, its one-hot feature size is 4 (vol0, 1, 2, 3).
In feature weight_X_vol, its one-hot feature size is 16 (weight0_vol0, weight0_vol1, ..., weight3_vol3).
So my feature size is 4 + 4 + 16 = 24.
After fitting a simple LR model, I can get the 24 feature weights via model.layers[1].get_weights(), but how can I build a dict mapping feature_name (key) to feature_weight (value)? For example:
my model is :
my_model = tf.keras.Sequential([
    tf.keras.layers.DenseFeatures(my_features),
    tf.keras.layers.Dense(1, activation="sigmoid")
])
feature_weight = my_model.layers[1].get_weights()
my weight is:
feature_weight = [0.1, 0.2, 0.1, ..., 0.8, 0.4, 0.3]  # length is 24
my feature_name is:
feature_name = [weight0, weight1, weight2, ..., vol0, vol1, ..., weight0_vol0, ..., weight3_vol3]  # length is 24
How can I map the key to the value? I just want to know every feature's weight for debugging.
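A minimal sketch of one way to build that mapping, assuming the model above: tf.keras.layers.DenseFeatures processes its columns sorted by column name, so the per-slot names can be rebuilt from each column's vocabulary and zipped with the Dense layer's kernel. The helper below is illustrative, not an official API; note that a hashed crossed column only exposes anonymous hash buckets (50 of them here, because of hash_bucket_size=50), so those slots can only be labelled by bucket index rather than as weight0_vol0-style pairs.
def column_slot_names(col):
    cat = getattr(col, "categorical_column", col)  # unwrap indicator_column
    if hasattr(cat, "vocabulary_list"):            # vocabulary-backed column
        return ["%s_%s" % (cat.key, v) for v in cat.vocabulary_list]
    # hashed crossed column: slots are anonymous hash buckets
    return ["%s_bucket%d" % (col.name, i) for i in range(cat.hash_bucket_size)]

ordered_cols = sorted(my_features, key=lambda c: c.name)      # DenseFeatures order
feature_names = [n for c in ordered_cols for n in column_slot_names(c)]
kernel = my_model.layers[1].get_weights()[0].reshape(-1)      # one weight per input slot
name_to_weight = dict(zip(feature_names, kernel))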

Related

Keras Sequential with multiple inputs

Given 3 arrays as input to the network, it should learn what links the data in the 1st, 2nd, and 3rd arrays.
In particular:
1st array contains integer numbers (eg.: 2, 3, 5, 6, 7)
2nd array contains integer numbers (eg.: 3, 2, 4, 6, 2)
3rd array contains integer numbers that are the results of an operation done between data in 1st and 2nd array (eg.: 6, 6, 20, 36, 14).
As you can see from the example data above, the operation is multiplication, so the network should learn this, giving:
model.predict(11,2) = 22.
Here's the code I've used:
import logging
import numpy as np
import tensorflow as tf
primo = np.array([2, 3, 5, 6, 7])
secondo = np.array([3, 2, 4, 6, 2])
risu = np.array([6, 6, 20, 36, 14])
l0 = tf.keras.layers.Dense(units=1, input_shape=[1])
model = tf.keras.Sequential([l0])
input1 = tf.keras.layers.Input(shape=(1, ), name="Pri")
input2 = tf.keras.layers.Input(shape=(1, ), name="Sec")
merged = tf.keras.layers.Concatenate(axis=1)([input1, input2])
dense1 = tf.keras.layers.Dense(
    2,
    input_dim=2,
    activation=tf.keras.activations.sigmoid,
    use_bias=True)(merged)
output = tf.keras.layers.Dense(
    1,
    activation=tf.keras.activations.relu,
    use_bias=True)(dense1)
model = tf.keras.models.Model([input1, input2], output)
model.compile(
    loss="mean_squared_error",
    optimizer=tf.keras.optimizers.Adam(0.1))
model.fit([primo, secondo], risu, epochs=500, verbose=False, batch_size=16)
print(model.predict(11, 2))
My questions are:
Is it correct to concatenate the 2 inputs as I did? I don't understand whether, when concatenating them this way, the network understands that input1 and input2 are two different inputs.
I'm not able to make model.predict() work; every attempt results in an error.
Your model has two inputs, each with shape (None,1), so you need to use np.expand_dims:
print(model.predict([np.expand_dims(np.array(11), 0), np.expand_dims(np.array(2), 0)]))
Output:
[[20.316557]]
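Equivalently, you can build the two (1, 1)-shaped arrays yourself (a hedged alternative, not from the original answer):
print(model.predict([np.array([[11.0]]), np.array([[2.0]])]))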

XGBoost gives nondeterministic predictions in version <1.0 for INT input

I've trained a classification model in 0.9:
param = {
    'objective': 'multi:softprob',
    'num_class': 9,
    'booster': 'dart',
    'eta': 0.3,
    'gamma': 0,
    'max_depth': 6,
    'alpha': 0,
    'lambda': 1,
    'colsample_bylevel': 0.8,
    'colsample_bynode': 0.8,
    'colsample_bytree': 0.8,
    'normalize_type': 'tree',
    'rate_drop': 1.0,
    'min_child_weight': 5,
    'subsample': 0.5,
    'num_parallel_tree': 1,
    'tree_method': 'approx'
}
model = xgb.train(param, D_train, num_boost_round=1)
model.save_model('./model.bst')
Here is the sample training data:
{f1:1, f2:1, label:"1"}
Both f1 and f2 are integers.
During prediction, the results are nondeterministic with the same input. Sometimes (~1 out of 10 times) it gives equal probability for every output class.
This issue is gone when switching to XGBoost 1.0, i.e. using XGBoost 1.0 to make predictions with the model trained in 0.9:
imported_model = xgb.Booster(model_file='./model.bst')
encoded = [[0, 0]]
prediction_input = xgb.DMatrix(
    np.array(encoded).reshape(2, -1), missing=None)
for i in range(1000):
    outputs = imported_model.predict(prediction_input)
    print(outputs)
Does anyone know the root cause?
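Not an answer from the original thread, but one thing worth checking: with booster='dart', Booster.predict() performs dropout at prediction time unless a non-zero ntree_limit is passed, so repeated calls on the same data can differ; that behaviour would also fit the occasional uniform class probabilities seen here. A hedged sketch of the check, with ntree_limit set to the single boosting round trained above:
outputs = imported_model.predict(prediction_input, ntree_limit=1)  # evaluate all trained trees, no prediction-time dropout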

Process output data from YOLOv5 TFlite

Question
Hi, I have successfully trained a custom model based on YOLOv5s and converted the model to TFlite. I feel silly asking, but how do you use the output data?
I get as output:
StatefulPartitionedCall:0 = [1, 25200, 7]
from the converted YOLOv5 model
Netron YOLOv5s.tflite model
But I expect an output like:
StatefulPartitionedCall:3 = [1, 10, 4] # boxes
StatefulPartitionedCall:2 = [1, 10] # classes
StatefulPartitionedCall:1 = [1, 10] #scores
StatefulPartitionedCall:0 = [1] #count
(this one is from a tensorflow lite mobilenet model (trained to give 10 output data, default for tflite))
Netron mobilenet.tflite model
It may also be some other form of output, but I honestly have no idea how to get the boxes, classes, scores from a [1,25200,7] array.
(on 15-January-2021 I updated pytorch, tensorflow and yolov5 to the latest version)
The data contained in the [1, 25200, 7] array can be found in this file: outputdata.txt
0.011428807862102985, 0.006756599526852369, 0.04274776205420494, 0.034441519528627396, 0.00012877583503723145, 0.33658933639526367, 0.4722323715686798
0.023071227595210075, 0.006947836373001337, 0.046426184475421906, 0.023744791746139526, 0.0002465546131134033, 0.29862138628959656, 0.4498370885848999
0.03636947274208069, 0.006819264497607946, 0.04913407564163208, 0.025004519149661064, 0.00013208389282226562, 0.3155967593193054, 0.4081345796585083
0.04930267855525017, 0.007249316666275263, 0.04969717934727669, 0.023645592853426933, 0.0001222355494974181, 0.3123127520084381, 0.40113094449043274
...
Should I add a Non Max Suppression or something else, can someone help me please? (github YOLOv5 #1981)
Thanks to @Glenn Jocher I found the solution. The output is [x, y, w, h, conf, class0, class1, ...]
My current code is now:
import numpy as np

def classFilter(classdata):
    classes = []  # create a list
    for i in range(classdata.shape[0]):          # loop through all predictions
        classes.append(classdata[i].argmax())    # get the best classification location
    return classes  # return classes (int)

def YOLOdetect(output_data):  # input = model output tensor, output is boxes(xyxy), classes, scores
    output_data = output_data[0]                 # x(1, 25200, 7) to x(25200, 7)
    boxes = np.squeeze(output_data[..., :4])     # boxes  [25200, 4]
    scores = np.squeeze(output_data[..., 4:5])   # confidences  [25200, 1]
    classes = classFilter(output_data[..., 5:])  # get classes
    # Convert nx4 boxes from [x, y, w, h] to [x1, y1, x2, y2] where xy1=top-left, xy2=bottom-right
    x, y, w, h = boxes[..., 0], boxes[..., 1], boxes[..., 2], boxes[..., 3]  # xywh
    xyxy = [x - w / 2, y - h / 2, x + w / 2, y + h / 2]  # xywh to xyxy  [4, 25200]
    return xyxy, classes, scores  # output is boxes(x,y,x,y), classes(int), scores(float) [predictions length]
To get the output data:
"""Output data"""
output_data = interpreter.get_tensor(output_details[0]['index']) # get tensor x(1, 25200, 7)
xyxy, classes, scores = YOLOdetect(output_data) #boxes(x,y,x,y), classes(int), scores(float) [25200]
And for the boxes:
for i in range(len(scores)):
    if (scores[i] > 0.1) and (scores[i] <= 1.0):
        H = frame.shape[0]
        W = frame.shape[1]
        xmin = int(max(1, (xyxy[0][i] * W)))
        ymin = int(max(1, (xyxy[1][i] * H)))
        xmax = int(min(W, (xyxy[2][i] * W)))  # clamp x to the frame width
        ymax = int(min(H, (xyxy[3][i] * H)))  # clamp y to the frame height
        cv2.rectangle(frame, (xmin, ymin), (xmax, ymax), (10, 255, 0), 2)
        ...
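Since the question also asks about Non-Max Suppression: the loop above draws every candidate that passes the score threshold, so overlapping duplicates remain. Below is a minimal sketch (not part of the original answer) that filters the 25200 candidates with tf.image.non_max_suppression before drawing; the 0.25/0.45 thresholds and max_output_size are illustrative values.
import numpy as np
import tensorflow as tf

boxes_xyxy = np.stack(xyxy, axis=-1)  # [25200, 4], normalized x1, y1, x2, y2
keep = tf.image.non_max_suppression(
    boxes_xyxy, np.array(scores), max_output_size=100,
    iou_threshold=0.45, score_threshold=0.25).numpy()
for i in keep:
    # draw only the surviving boxes, using the same scaling as in the loop above
    ...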

Google Colab freezes my browser and pc when trying to reconnect to a notebook

I am training a machine learning model in Google Colab; to be more specific, I am training a GAN with PyTorch Lightning. The problem occurs when I get disconnected from my current runtime due to inactivity. When I try to reconnect, my browser (tried on Firefox and Chrome) first becomes laggy and then freezes; my PC starts to lag so much that I am not able to close the browser, and it doesn't recover. I am forced to press the power button of my PC in order to restart it.
I have no clue why this happens.
I tried various batch sizes (also size 1), but it still happens. It can't be that my dataset is too big either, since I tried it on a dataset with 10 images for testing purposes.
I hope someone can help me.
Here is my code (to use it you will need a comet.ml account and have to enter your comet.ml API key):
import torch
import torch.nn as nn
import torch.nn.functional as F
import torchvision
import torchvision.transforms as transforms
from torch.utils.data import DataLoader
from torchvision.datasets import MNIST
from pytorch_lightning.callbacks import ModelCheckpoint
import pytorch_lightning as pl
from pytorch_lightning import loggers
import numpy as np
from numpy.random import choice
from PIL import Image
import os
from pathlib import Path
import shutil
from collections import OrderedDict
# custom weights initialization called on netG and netD
def weights_init(m):
    classname = m.__class__.__name__
    if classname.find('Conv') != -1:
        nn.init.normal_(m.weight.data, 0.0, 0.02)
    elif classname.find('BatchNorm') != -1:
        nn.init.normal_(m.weight.data, 1.0, 0.02)
        nn.init.constant_(m.bias.data, 0)
# randomly flip some labels
def noisy_labels(y, p_flip=0.05):  # flip labels with 5% probability
    # determine the number of labels to flip
    n_select = int(p_flip * y.shape[0])
    # choose labels to flip
    flip_ix = choice([i for i in range(y.shape[0])], size=n_select)
    # invert the labels in place
    y[flip_ix] = 1 - y[flip_ix]
    return y
class AddGaussianNoise(object):
    def __init__(self, mean=0.0, std=0.1):
        self.std = std
        self.mean = mean

    def __call__(self, tensor):
        return tensor + torch.randn(tensor.size()) * self.std + self.mean

    def __repr__(self):
        return self.__class__.__name__ + '(mean={0}, std={1})'.format(self.mean, self.std)

def get_valid_labels(img):
    return (0.8 - 1.1) * torch.rand(img.shape[0], 1, 1, 1) + 1.1  # soft labels

def get_unvalid_labels(img):
    return noisy_labels((0.0 - 0.3) * torch.rand(img.shape[0], 1, 1, 1) + 0.3)  # soft labels
class Generator(nn.Module):
    def __init__(self, ngf, nc, latent_dim):
        super(Generator, self).__init__()
        self.ngf = ngf
        self.latent_dim = latent_dim
        self.nc = nc
        self.main = nn.Sequential(
            # input is Z, going into a convolution
            nn.ConvTranspose2d(latent_dim, ngf * 8, 4, 1, 0, bias=False),
            nn.BatchNorm2d(ngf * 8),
            nn.LeakyReLU(0.2, inplace=True),
            # state size. (ngf*8) x 4 x 4
            nn.ConvTranspose2d(ngf * 8, ngf * 4, 4, 2, 1, bias=False),
            nn.BatchNorm2d(ngf * 4),
            nn.LeakyReLU(0.2, inplace=True),
            # state size. (ngf*4) x 8 x 8
            nn.ConvTranspose2d(ngf * 4, ngf * 2, 4, 2, 1, bias=False),
            nn.BatchNorm2d(ngf * 2),
            nn.LeakyReLU(0.2, inplace=True),
            # state size. (ngf*2) x 16 x 16
            nn.ConvTranspose2d(ngf * 2, ngf, 4, 2, 1, bias=False),
            nn.BatchNorm2d(ngf),
            nn.LeakyReLU(0.2, inplace=True),
            # state size. (ngf) x 32 x 32
            nn.ConvTranspose2d(ngf, nc, 4, 2, 1, bias=False),
            nn.Tanh()
            # state size. (nc) x 64 x 64
        )

    def forward(self, input):
        return self.main(input)
class Discriminator(nn.Module):
    def __init__(self, ndf, nc):
        super(Discriminator, self).__init__()
        self.nc = nc
        self.ndf = ndf
        self.main = nn.Sequential(
            # input is (nc) x 64 x 64
            nn.Conv2d(nc, ndf, 4, 2, 1, bias=False),
            nn.LeakyReLU(0.2, inplace=True),
            # state size. (ndf) x 32 x 32
            nn.Conv2d(ndf, ndf * 2, 4, 2, 1, bias=False),
            nn.BatchNorm2d(ndf * 2),
            nn.LeakyReLU(0.2, inplace=True),
            # state size. (ndf*2) x 16 x 16
            nn.Conv2d(ndf * 2, ndf * 4, 4, 2, 1, bias=False),
            nn.BatchNorm2d(ndf * 4),
            nn.LeakyReLU(0.2, inplace=True),
            # state size. (ndf*4) x 8 x 8
            nn.Conv2d(ndf * 4, ndf * 8, 4, 2, 1, bias=False),
            nn.BatchNorm2d(ndf * 8),
            nn.LeakyReLU(0.2, inplace=True),
            # state size. (ndf*8) x 4 x 4
            nn.Conv2d(ndf * 8, 1, 4, 1, 0, bias=False),
            nn.Sigmoid()
        )

    def forward(self, input):
        return self.main(input)
class DCGAN(pl.LightningModule):
    def __init__(self, hparams, logger, checkpoint_folder, experiment_name):
        super().__init__()
        self.hparams = hparams
        self.logger = logger  # only compatible with comet_logger at the moment
        self.checkpoint_folder = checkpoint_folder
        self.experiment_name = experiment_name
        # networks
        self.generator = Generator(ngf=hparams.ngf, nc=hparams.nc, latent_dim=hparams.latent_dim)
        self.discriminator = Discriminator(ndf=hparams.ndf, nc=hparams.nc)
        self.generator.apply(weights_init)
        self.discriminator.apply(weights_init)
        # cache for generated images
        self.generated_imgs = None
        self.last_imgs = None
        # For experience replay
        self.exp_replay_dis = torch.tensor([])
        # creating checkpoint folder
        dirpath = Path(self.checkpoint_folder)
        if not dirpath.exists():
            os.makedirs(dirpath, 0o755)

    def forward(self, z):
        return self.generator(z)

    def adversarial_loss(self, y_hat, y):
        return F.binary_cross_entropy(y_hat, y)

    def training_step(self, batch, batch_nb, optimizer_idx):
        # For adding instance noise, for more visit: https://www.inference.vc/instance-noise-a-trick-for-stabilising-gan-training/
        std_gaussian = max(0, self.hparams.level_of_noise - ((self.hparams.level_of_noise * 1.5) * (self.current_epoch / self.hparams.epochs)))
        AddGaussianNoiseInst = AddGaussianNoise(std=std_gaussian)  # the noise decays over time
        imgs, _ = batch
        imgs = AddGaussianNoiseInst(imgs)  # Adding instance noise to real images
        self.last_imgs = imgs
        # train generator
        if optimizer_idx == 0:
            # sample noise
            z = torch.randn(imgs.shape[0], self.hparams.latent_dim, 1, 1)
            # generate images
            self.generated_imgs = self(z)
            self.generated_imgs = AddGaussianNoiseInst(self.generated_imgs)  # Adding instance noise to fake images
            # Experience replay
            # for discriminator
            perm = torch.randperm(self.generated_imgs.size(0))  # Shuffling
            r_idx = perm[:max(1, self.hparams.experience_save_per_batch)]  # Getting the index
            self.exp_replay_dis = torch.cat((self.exp_replay_dis, self.generated_imgs[r_idx]), 0).detach()  # Add our new example to the replay buffer
            # ground truth result (ie: all fake)
            g_loss = self.adversarial_loss(self.discriminator(self.generated_imgs), get_valid_labels(self.generated_imgs))  # adversarial loss is binary cross-entropy
            tqdm_dict = {'g_loss': g_loss}
            log = {'g_loss': g_loss, "std_gaussian": std_gaussian}
            output = OrderedDict({
                'loss': g_loss,
                'progress_bar': tqdm_dict,
                'log': log
            })
            return output
        # train discriminator
        if optimizer_idx == 1:
            # Measure discriminator's ability to classify real from generated samples
            # how well can it label as real?
            real_loss = self.adversarial_loss(self.discriminator(imgs), get_valid_labels(imgs))
            # Experience replay
            if self.exp_replay_dis.size(0) >= self.hparams.experience_batch_size:
                fake_loss = self.adversarial_loss(self.discriminator(self.exp_replay_dis.detach()), get_unvalid_labels(self.exp_replay_dis))  # train on already seen images
                self.exp_replay_dis = torch.tensor([])  # Reset experience replay
                # discriminator loss is the average of these
                d_loss = (real_loss + fake_loss) / 2
                tqdm_dict = {'d_loss': d_loss}
                log = {'d_loss': d_loss, "d_exp_loss": fake_loss, "std_gaussian": std_gaussian}
                output = OrderedDict({
                    'loss': d_loss,
                    'progress_bar': tqdm_dict,
                    'log': log
                })
                return output
            else:
                fake_loss = self.adversarial_loss(self.discriminator(self.generated_imgs.detach()), get_unvalid_labels(self.generated_imgs))  # how well can it label as fake?
                # discriminator loss is the average of these
                d_loss = (real_loss + fake_loss) / 2
                tqdm_dict = {'d_loss': d_loss}
                log = {'d_loss': d_loss, "std_gaussian": std_gaussian}
                output = OrderedDict({
                    'loss': d_loss,
                    'progress_bar': tqdm_dict,
                    'log': log
                })
                return output

    def configure_optimizers(self):
        lr = self.hparams.lr
        b1 = self.hparams.b1
        b2 = self.hparams.b2
        opt_g = torch.optim.Adam(self.generator.parameters(), lr=lr, betas=(b1, b2))
        opt_d = torch.optim.Adam(self.discriminator.parameters(), lr=lr, betas=(b1, b2))
        return [opt_g, opt_d], []

    def train_dataloader(self):
        transform = transforms.Compose([transforms.Resize((self.hparams.image_size, self.hparams.image_size)),
                                        transforms.ToTensor(),
                                        transforms.Normalize([0.5], [0.5])])
        dataset = MNIST(os.getcwd(), train=True, download=True, transform=transform)
        return DataLoader(dataset, batch_size=self.hparams.batch_size)
        # transform = transforms.Compose([transforms.Resize((self.hparams.image_size, self.hparams.image_size)),
        #                                 transforms.ToTensor(),
        #                                 transforms.Normalize([0.5], [0.5])
        #                                 ])
        # train_dataset = torchvision.datasets.ImageFolder(
        #     root="./drive/My Drive/datasets/ghibli_dataset_small_overfit/",
        #     transform=transform
        # )
        # return DataLoader(train_dataset, num_workers=self.hparams.num_workers, shuffle=True, batch_size=self.hparams.batch_size)

    def on_epoch_end(self):
        z = torch.randn(4, self.hparams.latent_dim, 1, 1)
        # match gpu device (or keep as cpu)
        if self.on_gpu:
            z = z.cuda(self.last_imgs.device.index)
        # log sampled images
        sample_imgs = self.generator(z)
        sample_imgs = sample_imgs.view(-1, self.hparams.nc, self.hparams.image_size, self.hparams.image_size)
        grid = torchvision.utils.make_grid(sample_imgs, nrow=2)
        self.logger.experiment.log_image(grid.permute(1, 2, 0), f'generated_images_epoch{self.current_epoch}', step=self.current_epoch)
        # save model
        if self.current_epoch % self.hparams.save_model_every_epoch == 0:
            trainer.save_checkpoint(self.checkpoint_folder + "/" + self.experiment_name + "_epoch_" + str(self.current_epoch) + ".ckpt")
            comet_logger.experiment.log_asset_folder(self.checkpoint_folder, step=self.current_epoch)
            # Deleting the folder where we saved the model so that we don't upload a thing twice
            dirpath = Path(self.checkpoint_folder)
            if dirpath.exists() and dirpath.is_dir():
                shutil.rmtree(dirpath)
            # creating checkpoint folder
            access_rights = 0o755
            os.makedirs(dirpath, access_rights)
from argparse import Namespace
args = {
    'batch_size': 48,
    'lr': 0.0002,
    'b1': 0.5,
    'b2': 0.999,
    'latent_dim': 128,  # tested value which worked (in V4_1): 100
    'nc': 1,
    'ndf': 32,
    'ngf': 32,
    'epochs': 10,
    'save_model_every_epoch': 5,
    'image_size': 64,
    'num_workers': 2,
    'level_of_noise': 0.15,
    'experience_save_per_batch': 1,  # this value should be very low; tested value which works: 1
    'experience_batch_size': 50  # this value shouldn't be too high; tested value which works: 50
}
hparams = Namespace(**args)
# Parameters
experiment_name = "DCGAN_V4_2_MNIST"
dataset_name = "MNIST"
checkpoint_folder = "DCGAN/"
tags = ["DCGAN", "MNIST", "OVERFIT", "64x64"]
dirpath = Path(checkpoint_folder)
# init logger
comet_logger = loggers.CometLogger(
    api_key="",
    rest_api_key="",
    project_name="gan",
    experiment_name=experiment_name,
    # experiment_key="f23d00c0fe3448ee884bfbe3fc3923fd"  # used for resuming training; the id can be found in comet.ml
)
#defining net
net = DCGAN(hparams, comet_logger, checkpoint_folder, experiment_name)
#logging
comet_logger.experiment.set_model_graph(str(net))
comet_logger.experiment.add_tags(tags=tags)
comet_logger.experiment.log_dataset_info(dataset_name)
trainer = pl.Trainer(  # resume_from_checkpoint="GHIBLI_DCGAN_OVERFIT_64px_epoch_6000.ckpt",
    logger=comet_logger,
    max_epochs=args["epochs"]
)
trainer.fit(net)
comet_logger.experiment.end()
I fixed it by importing this:
from IPython.display import clear_output
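The import alone does nothing by itself; presumably the accumulating cell output (progress bars, logged lines) is what overwhelms the browser on reconnect, so the point is to clear it periodically. A hedged sketch of how it might be used with the LightningModule above (the clear_output call is an assumption, not part of the original answer):
from IPython.display import clear_output

# e.g. at the top of the on_epoch_end hook shown above:
clear_output(wait=True)  # drop the notebook's buffered cell output once per epoch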

ctc_loss error "No valid path found."

Training a model with tf.nn.ctc_loss produces an error every time the train op is run:
tensorflow/core/util/ctc/ctc_loss_calculator.cc:144] No valid path found.
Unlike in previous questions about this function, this is not due to divergence. I have a low learning rate, and the error occurs even on the first train op.
The model is a CNN -> LSTM -> CTC. Here is the model creation code:
# Build Graph
self.videoInput = tf.placeholder(shape=(None, self.maxVidLen, 50, 100, 3), dtype=tf.float32)
self.videoLengths = tf.placeholder(shape=(None), dtype=tf.int32)
self.keep_prob = tf.placeholder(dtype=tf.float32)
self.targets = tf.sparse_placeholder(tf.int32)
self.targetLengths = tf.placeholder(shape=(None), dtype=tf.int32)
conv1 = tf.layers.conv3d(self.videoInput ...)
pool1 = tf.layers.max_pooling3d(conv1 ...)
conv2 = ...
pool2 = ...
conv3 = ...
pool3 = ...
cnn_out = tf.reshape(pool3, shape=(-1, self.maxVidLen, 4*7*96))
fw_cell = tf.nn.rnn_cell.MultiRNNCell([self.cell() for _ in range(3)])
bw_cell = tf.nn.rnn_cell.MultiRNNCell([self.cell() for _ in range(3)])
outputs, _ = tf.nn.bidirectional_dynamic_rnn(
    fw_cell, bw_cell, cnn_out, sequence_length=self.videoLengths, dtype=tf.float32)
outputs = tf.concat(outputs, 2)
outputs = tf.reshape(outputs, [-1, self.hidden_size * 2])
w = tf.Variable(tf.random_normal((self.hidden_size * 2, len(self.char2index) + 1), stddev=0.2))
b = tf.Variable(tf.zeros(len(self.char2index) + 1))
out = tf.matmul(outputs, w) + b
out = tf.reshape(out, [-1, self.maxVidLen, len(self.char2index) + 1])
out = tf.transpose(out, [1, 0, 2])
cost = tf.reduce_mean(tf.nn.ctc_loss(self.targets, out, self.targetLengths))
self.train_op = tf.train.AdamOptimizer(0.0001).minimize(cost)
And here is the feed dict creation code:
fd = {}
indices = []
values = []
shape = [len(vids) * 2, self.maxLabelLen]
vidInput = np.zeros((len(vids) * 2, self.maxVidLen, 50, 100, 3), dtype=np.float32)
# Actual video, then left-right flip
for j in range(len(vids) * 2):
    # k is the video index
    k = j if j < len(vids) else j - len(vids)
    # convert video and label to input format
    vidInput[j, 0:len(vids[k])] = vids[k] if k == j else vids[k][:, ::-1, :]
    indices.extend([j, i] for i in range(len(labelList[k])))
    values.extend(self.char2index[c] for c in labelList[k])
fd[self.targets] = (indices, values, shape)
fd[self.videoInput] = vidInput
# Collect video lengths and label lengths
vidLengths = [len(j) for j in vids] + [len(j) for j in vids]
labelLens = [len(l) for l in labelList] + [len(l) for l in labelList]
fd[self.videoLengths] = vidLengths
fd[self.targetLengths] = labelLens
It turns out that the ctc_loss requires that the label lengths be shorter than the input lengths. If the label lengths are too long, the loss calculator cannot unroll completely and therefore cannot compute the loss.
For example, the label BIFI would require input length of at least 4 while the label BIIF would require input length of at least 5 due to a blank being inserted between the repeated symbols.
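A small illustrative check of that rule (not from the original answers; the helper name is made up): the minimum number of input time steps is the label length plus one extra step for every pair of adjacent repeated symbols.
def min_input_length(label):
    # one frame per symbol, plus one mandatory blank between equal neighbours
    repeats = sum(1 for a, b in zip(label, label[1:]) if a == b)
    return len(label) + repeats

assert min_input_length("BIFI") == 4
assert min_input_length("BIIF") == 5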
I had the same issue, but I soon realized it was just because I was using glob and my label came from the filename, so the label included the directory path and exceeded the input length.
You can fix this issue by using:
os.path.join(*(filename.split(os.path.sep)[noOfDir:]))  # keep only the last path components
For me the problem was fixed by setting preprocess_collapse_repeated=True.
FWIW: my target sequence lengths were already shorter than the input lengths, and the RNN outputs were already softmax outputs.
Another possible cause, which I found in my case, is that the input data is not normalized to the 0-1 range; because of that, the LSTM activation function saturates at the beginning of training and somehow produces the "No valid path found." log.