Text recognition model improvement suggestions - tensorflow

I am trying to build a text recognition model using CNNs and LSTMs with CTC loss. My current model looks like this, where the numbers in brackets are the shape of the tensor after each layer. I have a vocabulary size of 94, and my input images are 64x1024. The model is improving very slowly, and I would appreciate any thoughts on what I could change. Thanks :)
Input: (?, 64, 1024, 1)
cnn-1: [None, 64, 1024, 64]
bn-1: [None, 64, 1024, 64]
relu-1: [None, 64, 1024, 64]
maxpool-1: [None, 32, 512, 64]
cnn-2: [None, 32, 512, 128]
bn-2: [None, 32, 512, 128]
relu-2: [None, 32, 512, 128]
maxpool-2: [None, 16, 256, 128]
cnn-3: [None, 16, 256, 128]
bn-3: [None, 16, 256, 128]
relu-3: [None, 16, 256, 128]
maxpool-3: [None, 8, 128, 128]
cnn-4: [None, 8, 128, 256]
bn-4: [None, 8, 128, 256]
relu-4: [None, 8, 128, 256]
maxpool-4: [None, 4, 64, 256]
cnn-5: [None, 4, 64, 256]
bn-5: [None, 4, 64, 256]
relu-5: [None, 4, 64, 256]
maxpool-5: [None, 2, 32, 256]
lstm-input: [None, 32, 512]
lstm-output: [None, 32, 64]
lstm-output-reshaped: [None, 64]
fully-connected: [None, 94]
reshaped_logits: [None, None, 94]
transposed_logits: [None, None, 94]
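For anyone tracing the shapes: the step from maxpool-5 ([None, 2, 32, 256]) to lstm-input ([None, 32, 512]) amounts to treating the width axis (32) as the time axis and folding height x channels (2 x 256) into the feature dimension. A minimal sketch of that bridge (TF 1.x style; names are illustrative, not the asker's actual code):

features = tf.placeholder(tf.float32, [None, 2, 32, 256])  # maxpool-5 output
x = tf.transpose(features, [0, 2, 1, 3])                   # [None, 32, 2, 256]: width becomes time
lstm_input = tf.reshape(x, [-1, 32, 2 * 256])              # [None, 32, 512]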

Related

Using a model from TF Hub as a convolutional feature extractor

I would like to build a custom model on top of a ResNet-50 feature extractor (one of its intermediate layers).
Here is how I am trying to do that:
import tensorflow as tf
import tensorflow_hub as hub
from tensorflow.keras.layers import Input, Conv2D

def get_model(img_shape, num_classes):
    inputs = Input(shape=img_shape)
    backbone = hub.load("https://tfhub.dev/google/imagenet/resnet_v2_50/feature_vector/3").signatures["image_feature_vector"]
    # Pick one intermediate tensor out of the dict the signature returns
    x = backbone(inputs)['resnet_v2_50/block3/unit_1/bottleneck_v2/shortcut']
    # Add a per-pixel classification layer
    outputs = Conv2D(num_classes + 1, 3, activation="softmax", padding="same")(x)
    model = tf.keras.Model(inputs=inputs, outputs=outputs)
    return model
This code works. However, it takes very long to build (7 minutes in Colab), and the model summary I get looks like this:
input_14 (InputLayer)                    [(None, 224, 224, 3)]   0
tf_op_layer_StatefulPartitionedCall_16   [(None, 2048),          0
  (TensorFlowOpLayer)                     (None, 28, 28, 256),
                                          (None, 56, 56, 256),
                                          (None, 56, 56, 64),
                                          ... ~70 more shapes, one per
                                          intermediate ResNet-50 tensor ...
                                          (None, 7, 7, 2048),
                                          (None, 112, 112, 64),
                                          (None, 1, 1, 2048)]
conv2d_3 (Conv2D)                        (None, 14, 14, 2)        18434
To me that looks like hanging outputs from all the intermediate layers of the ResNet. Also, the execution time seems far too long for instantiating such a simple model.
How can I avoid getting a list of tensors as a TensorFlowOpLayer and instead link a single output to the rest of the model?
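One possible workaround (an untested sketch; it assumes this TF1-format Hub module works with hub.KerasLayer's signature/output_key arguments, which select a single tensor from the signature's output dict so Keras tracks only one output):

import tensorflow as tf
import tensorflow_hub as hub
from tensorflow.keras.layers import Input, Conv2D

def get_model(img_shape, num_classes):
    inputs = Input(shape=img_shape)
    # output_key picks one tensor out of the signature's dict of outputs
    backbone = hub.KerasLayer(
        "https://tfhub.dev/google/imagenet/resnet_v2_50/feature_vector/3",
        signature="image_feature_vector",
        output_key="resnet_v2_50/block3/unit_1/bottleneck_v2/shortcut",
        trainable=False)
    x = backbone(inputs)
    outputs = Conv2D(num_classes + 1, 3, activation="softmax", padding="same")(x)
    return tf.keras.Model(inputs=inputs, outputs=outputs)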

Implementation of a WGAN-GP in tensorflow

Using TensorFlow, I'm trying to reimplement the following architecture (for now I'm focusing on the generator part):
What I've done so far is define the generator in the following way:
N_Z = 128
generator = [
    tf.keras.layers.Dense(units=6144, activation="relu"),
    tf.keras.layers.Reshape(target_shape=(6, 4, 256)),
    tf.keras.layers.Conv2DTranspose(
        filters=128, kernel_size=(5, 5), strides=(2, 2), padding="same", activation="relu"),
    tf.keras.layers.Conv2DTranspose(
        filters=128, kernel_size=(3, 3), strides=(2, 1), padding="same", activation="relu"),
    tf.keras.layers.Conv2DTranspose(
        filters=64, kernel_size=(3, 3), strides=(1, 1), padding="same", activation="relu"),
    tf.keras.layers.Conv2DTranspose(
        filters=64, kernel_size=(3, 3), strides=(2, 1), padding="same", activation="relu"),
    tf.keras.layers.Conv2DTranspose(
        filters=32, kernel_size=(3, 3), strides=(1, 1), padding="same", activation="relu"),
    tf.keras.layers.Conv2DTranspose(
        filters=32, kernel_size=(3, 3), strides=(2, 1), padding="same", activation="relu"),
    tf.keras.layers.Conv2DTranspose(
        filters=1, kernel_size=(3, 3), strides=(1, 1), padding="same", activation="relu"),
]
Generator = tf.keras.models.Sequential(generator)
But if I take some random noise and let the model process it, this is the final shape I get back:
noise = tf.random.normal((64,128))
result = Generator(noise)
result.shape
TensorShape([64, 28, 28, 1])
What am I doing wrong here? I was also checking the original implementation for additional details, but I can't find anything that clears this up.
You need to trace the input/output shape of each layer; the model needs some help at the top layers to reach the target shape.
[ Sample ]:
"""""""""""""""""""""""""""""""""""""""""""""""""""""""""
: Model Initialize
"""""""""""""""""""""""""""""""""""""""""""""""""""""""""
model = tf.keras.models.Sequential([
    tf.keras.layers.InputLayer(input_shape=(6144,)),
    tf.keras.layers.Dense(48 * 128, activation="linear"),
    tf.keras.layers.BatchNormalization(momentum=0.99, epsilon=0.00001),
    tf.keras.layers.Reshape(target_shape=(6, 4, 256)),
    tf.keras.layers.Conv2DTranspose(
        filters=128, kernel_size=(5, 5), strides=(2, 2), padding="same", activation="relu"),
    tf.keras.layers.Resizing(11, 8, interpolation='bilinear', crop_to_aspect_ratio=False),
    tf.keras.layers.Reshape(target_shape=(11, 8, 128)),
    tf.keras.layers.Conv2DTranspose(
        filters=128, kernel_size=(3, 3), strides=(2, 1), padding="same", activation="relu"),
    tf.keras.layers.Resizing(22, 8, interpolation='bilinear', crop_to_aspect_ratio=False),
    tf.keras.layers.Reshape(target_shape=(22, 8, 128)),
    tf.keras.layers.Conv2DTranspose(
        filters=64, kernel_size=(3, 3), strides=(1, 1), padding="same", activation="relu"),
    tf.keras.layers.Resizing(22, 8, interpolation='bilinear', crop_to_aspect_ratio=False),
    tf.keras.layers.Reshape(target_shape=(22, 8, 64)),
    tf.keras.layers.Conv2DTranspose(
        filters=64, kernel_size=(3, 3), strides=(2, 1), padding="same", activation="relu"),
    tf.keras.layers.Resizing(43, 8, interpolation='bilinear', crop_to_aspect_ratio=False),
    tf.keras.layers.Reshape(target_shape=(43, 8, 64)),
    tf.keras.layers.Conv2DTranspose(
        filters=32, kernel_size=(3, 3), strides=(1, 1), padding="same", activation="relu"),
    tf.keras.layers.Resizing(43, 8, interpolation='bilinear', crop_to_aspect_ratio=False),
    tf.keras.layers.Reshape(target_shape=(43, 8, 32)),
    tf.keras.layers.Conv2DTranspose(
        filters=32, kernel_size=(3, 3), strides=(2, 1), padding="same", activation="relu"),
    tf.keras.layers.Resizing(85, 8, interpolation='bilinear', crop_to_aspect_ratio=False),
    tf.keras.layers.Reshape(target_shape=(85, 8, 32)),
])
model.summary()
[ Output ]:
Model: "sequential"
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
dense (Dense) (None, 6144) 37754880
batch_normalization (BatchN (None, 6144) 24576
ormalization)
reshape (Reshape) (None, 6, 4, 256) 0
conv2d_transpose (Conv2DTra (None, 12, 8, 128) 819328
nspose)
resizing (Resizing) (None, 11, 8, 128) 0
reshape_1 (Reshape) (None, 11, 8, 128) 0
conv2d_transpose_1 (Conv2DT (None, 22, 8, 128) 147584
ranspose)
resizing_1 (Resizing) (None, 22, 8, 128) 0
reshape_2 (Reshape) (None, 22, 8, 128) 0
conv2d_transpose_2 (Conv2DT (None, 22, 8, 64) 73792
ranspose)
resizing_2 (Resizing) (None, 22, 8, 64) 0
reshape_3 (Reshape) (None, 22, 8, 64) 0
conv2d_transpose_3 (Conv2DT (None, 44, 8, 64) 36928
ranspose)
resizing_3 (Resizing) (None, 43, 8, 64) 0
reshape_4 (Reshape) (None, 43, 8, 64) 0
conv2d_transpose_4 (Conv2DT (None, 43, 8, 32) 18464
ranspose)
resizing_4 (Resizing) (None, 43, 8, 32) 0
reshape_5 (Reshape) (None, 43, 8, 32) 0
conv2d_transpose_5 (Conv2DT (None, 86, 8, 32) 9248
ranspose)
resizing_5 (Resizing) (None, 85, 8, 32) 0
reshape_6 (Reshape) (None, 85, 8, 32) 0
=================================================================
Total params: 38,884,800
Trainable params: 38,872,512
Non-trainable params: 12,288
_________________________________________________________________
2022-04-03 03:37:10.354570: I tensorflow/stream_executor/cuda/cuda_dnn.cc:368] Loaded cuDNN version 8100
(1, 85, 8, 32)
1/1 [==============================] - 2s 2s/step - loss: 0.0000e+00 - accuracy: 1.0000 - val_loss: 0.0000e+00 - val_accuracy: 1.0000
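A note on why the Resizing layers are needed at all: with padding="same", a Conv2DTranspose produces output_size = input_size * stride along each spatial axis, so strides alone can only reach multiples of the starting (6, 4) grid, never odd targets such as 85 x 8. A quick check of that rule (assuming the standard Keras "same"-padding behavior; illustrative only):

def deconv_same_out(size, stride):
    # "same"-padded transposed convolution: output = input * stride
    return size * stride

h = 6
for s in (2, 2, 1, 2, 1, 2, 1):  # height strides of the generator above
    h = deconv_same_out(h, s)
print(h)  # 96, not 85 -- hence the bilinear Resizing steps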

Upsampling using 3d_transposed_convolution layers

Suppose I have a 4D tensor x from a previous layer with shape [2, 2, 7, 7, 64] where batch = 2, depth = 2, height = 7, width = 7, and in_channels = 64.
And I'd like to upsample it to a tensor with shape [2, 4, 14, 14, 32].
Maybe next steps are transferring it with shape like [2, 8, 28, 28, 16] and [2, 16, 112, 112, 1] and so on.
I'm new to TensorFlow, and I know that the implementations of transposed convolution in Caffe and TensorFlow differ: in Caffe you can set the output size by choosing the kernel strides, whereas it's more involved in TensorFlow.
So how can I do that with tf.layers.conv3d_transpose or tf.nn.conv3d_transpose?
Would anyone give me a hand? Thanks!
You can do the upsampling with both tf.layers.conv3d_transpose and tf.nn.conv3d_transpose.
Let's consider your input tensor as:
input_layer = tf.placeholder(tf.float32, (2, 2, 7, 7, 64)) # batch, depth, height, width, in_channels
With tf.nn.conv3d_transpose we need to take care of the creation of the variables (weights and bias):
def conv3d_transpose(name, l_input, w, b, output_shape, stride=1):
    transp_conv = tf.nn.conv3d_transpose(l_input, w, output_shape, strides=[1, stride, stride, stride, 1], padding='SAME')
    return tf.nn.bias_add(transp_conv, b, name=name)

# Create variables for the operation
with tf.device('/cpu:0'):
    # weights will have the shape [depth, height, width, output_channels, in_channels]
    weights = tf.get_variable(name='w_transp_conv', shape=[3, 3, 3, 32, 64])
    bias = tf.get_variable(name='b_transp_conv', shape=[32])

t_conv_layer = conv3d_transpose('t_conv_layer', input_layer, weights, bias,
                                output_shape=[2, 4, 14, 14, 32], stride=2)
print(t_conv_layer)
# Tensor("t_conv_layer:0", shape=(2, 4, 14, 14, 32), dtype=float32)
With tf.layers.conv3d_transpose, which will take care of the creation of both weights and bias, we use the same input tensor input_layer:
t_conv_layer2 = tf.layers.conv3d_transpose(input_layer, filters=32, kernel_size=[3, 3, 3],
                                           strides=(2, 2, 2), padding='SAME', name='t_conv_layer2')
print(t_conv_layer2)
# Tensor("t_conv_layer2/Reshape_1:0", shape=(2, 4, 14, 14, 32), dtype=float32)
To get the other upsampled tensors you can repeat this procedure by changing the strides as you need:
Example with tf.layers.conv3d_transpose:
t_conv_layer3 = tf.layers.conv3d_transpose(t_conv_layer2, filters=16, kernel_size=[3, 3, 3],
                                           strides=(2, 2, 2), padding='SAME', name='t_conv_layer3')
t_conv_layer4 = tf.layers.conv3d_transpose(t_conv_layer3, filters=8, kernel_size=[3, 3, 3],
                                           strides=(2, 2, 2), padding='SAME', name='t_conv_layer4')
t_conv_layer5 = tf.layers.conv3d_transpose(t_conv_layer4, filters=1, kernel_size=[3, 3, 3],
                                           strides=(1, 2, 2), padding='SAME', name='t_conv_layer5')
print(t_conv_layer5)
# Tensor("t_conv_layer5/Reshape_1:0", shape=(2, 16, 112, 112, 1), dtype=float32)
Note: since tf.nn.conv3d_transpose is actually the gradient of tf.nn.conv3d, you can make sure that the variable output_shape is correct, by considering the forward operation with tf.nn.conv3d.
def print_expected(weights, shape, stride=1):
    # Run the forward conv3d on a dummy tensor of the desired output shape;
    # its result tells us what input shape the transposed conv expects.
    output = tf.constant(0.1, shape=shape)
    expected_layer = tf.nn.conv3d(output, weights, strides=[1, stride, stride, stride, 1], padding='SAME')
    print("Expected shape of input layer when considering the output shape ({} and stride {}): {}".format(shape, stride, expected_layer.get_shape()))
Therefore, to produce a transposed convolution with shape [2, 4, 14, 14, 32], we can check, for example, strides 1 and 2:
print_expected(weights, shape=[2, 4, 14, 14, 32], stride=1)
print_expected(weights, shape=[2, 4, 14, 14, 32], stride=2)
which prints and confirms that the second option (using stride 2) is the right one to produce a tensor with our desired shape:
Expected shape of input layer when considering the output shape ([2, 4, 14, 14, 32] and stride 1): (2, 4, 14, 14, 64)
Expected shape of input layer when considering the output shape ([2, 4, 14, 14, 32] and stride 2): (2, 2, 7, 7, 64)
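For newer code bases: tf.layers.conv3d_transpose was removed in TF 2.x; the equivalent Keras layer is tf.keras.layers.Conv3DTranspose. A minimal TF 2.x sketch of the same upsampling step (shapes as above):

import tensorflow as tf

x = tf.zeros([2, 2, 7, 7, 64])  # batch, depth, height, width, in_channels
up = tf.keras.layers.Conv3DTranspose(
    filters=32, kernel_size=3, strides=2, padding='same')(x)
print(up.shape)  # (2, 4, 14, 14, 32)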

How to broadcast third dimension in tensorflow?

I have a Sobel filter:
sobel_x = tf.constant([[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]], tf.float32)
I want to get a depth of 64. The shape right now is [3, 3, 1], but it should become [3, 3, 64].
How do I do that? With the following line, I get shape errors:
tf.tile(sobel_x, [1, 1, 64])
ValueError: Shape must be rank 2 but is rank 3 for 'Tile' (op: 'Tile') with input shapes: [3,3], [3].
The reason you cannot broadcast is that the third dimension does not exist, and so you actually have a rank 2 tensor.
>>> sess.run(tf.shape(sobel_x))
array([3, 3], dtype=int32)
We can solve this problem by reshaping the tensor first.
>>> sobel_x = tf.reshape(sobel_x, [3, 3, 1])
>>> tf.tile(sobel_x, [1, 1, 64])
<tf.Tensor 'Tile_6:0' shape=(3, 3, 64) dtype=float32>
I think your issue is with sobel_x.
sobel_x.get_shape(): TensorShape([Dimension(3), Dimension(3)])
sobel_x: <tf.Tensor 'Const:0' shape=(3, 3) dtype=float32>
So sobel_x is a two-dimensional matrix, and you're passing a rank-3 multiples argument to tile, hence the error.
Fix: make sobel_x three-dimensional, so that its shape is (3, 3, 1);
then tf.tile(sobel_x, [1, 1, 64]) will output shape=(3, 3, 64).
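Equivalently, tf.expand_dims adds the missing axis without hard-coding the full shape (a small sketch of the same fix):

sobel_x = tf.expand_dims(sobel_x, axis=-1)   # (3, 3) -> (3, 3, 1)
sobel_x_64 = tf.tile(sobel_x, [1, 1, 64])    # (3, 3, 64)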

Trouble with variable initialization

I am building a graph via a function and am trying to extract the value of a variable to add further operations. Part of the function I have written is shown below:
def build(self, save_path=None, save_name=None):
    g = tf.Graph()
    with g.as_default():
        init_op = tf.initialize_all_variables()
        images = tf.placeholder(tf.float32, shape=[None, 300, 300, 3], name='input')
        with tf.variable_scope('conv1_'):
            conv11 = self.conv_relu(images, kernel_shape=[3, 3, 3, 64], bias_shape=64, name='c1')
            conv12 = self.conv_relu(conv11, kernel_shape=[3, 3, 64, 64], bias_shape=64, name='c2')
            pool1 = tf.nn.max_pool(conv12, ksize=[1, 2, 2, 1], strides=[1, 2, 2, 1], padding='SAME', name='pool1')
        with tf.variable_scope('conv2_'):
            conv21 = self.conv_relu(pool1, kernel_shape=[3, 3, 64, 128], bias_shape=128, name='c1')
            conv22 = self.conv_relu(conv21, kernel_shape=[3, 3, 128, 128], bias_shape=128, name='c2')
            pool2 = tf.nn.max_pool(conv22, ksize=[1, 2, 2, 1], strides=[1, 2, 2, 1], padding='SAME', name='pool2')
        with tf.variable_scope('conv3_'):
            conv31 = self.conv_relu(pool2, kernel_shape=[3, 3, 128, 256], bias_shape=256, name='c1')
            conv32 = self.conv_relu(conv31, kernel_shape=[3, 3, 256, 256], bias_shape=256, name='c2')
            conv33 = self.conv_relu(conv32, kernel_shape=[3, 3, 256, 256], bias_shape=256, name='c3')
            pool3 = tf.nn.max_pool(conv33, ksize=[1, 2, 2, 1], strides=[1, 2, 2, 1], padding='SAME', name='pool3')
        with tf.variable_scope('conv4_'):
            conv41 = self.conv_relu(pool3, kernel_shape=[3, 3, 256, 512], bias_shape=512, name='c1')
            conv42 = self.conv_relu(conv41, kernel_shape=[3, 3, 512, 512], bias_shape=512, name='c2')
            conv43 = self.conv_relu(conv42, kernel_shape=[3, 3, 512, 512], bias_shape=512, name='c3')
            pool4 = tf.nn.max_pool(conv43, ksize=[1, 2, 2, 1], strides=[1, 2, 2, 1], padding='SAME', name='pool4')
        with tf.variable_scope('conv5_'):
            conv51 = self.conv_relu(pool4, kernel_shape=[3, 3, 512, 512], bias_shape=512, name='c1')
            conv52 = self.conv_relu(conv51, kernel_shape=[3, 3, 512, 512], bias_shape=512, name='c2')
            conv53 = self.conv_relu(conv52, kernel_shape=[3, 3, 512, 512], bias_shape=512, name='c3')
            pool5 = tf.nn.max_pool(conv53, ksize=[1, 2, 2, 1], strides=[1, 2, 2, 1], padding='SAME', name='pool5')
        pool5_shape = tf.shape(pool5)
        pool5_reshaped = tf.reshape(pool5, shape=[pool5_shape[0], -1], name='pool5_reshaped')
        weight_rows = pool5_shape[1] * pool5_shape[2] * pool5_shape[3]
    sess = tf.Session(graph=g)
    inp = np.zeros(shape=(2, 300, 300, 3))
    print(inp.shape)
    sess.run(init_op)
    print(sess.run(weight_rows, feed_dict={images: inp}))
    sess.close()
At the line print(sess.run(weight_rows, feed_dict={images: inp})) I get the following error:
FailedPreconditionError (see above for traceback): Attempting to use uninitialized value conv5_/biasesc3
[[Node: conv5_/biasesc3/read = Identity[T=DT_FLOAT, _class=["loc:#conv5_/biasesc3"], _device="/job:localhost/replica:0/task:0/cpu:0"](conv5_/biasesc3)]]
What is the reason for this error when I ran the init_op in the session beforehand? How exactly does this work, and what am I doing wrong here?
You need to define your init_op (i.e. call tf.initialize_all_variables()) after you have declared all of your variables.
Creating a variable via tf.get_variable or tf.Variable places it in the GLOBAL_VARIABLES collection (unless otherwise specified with the collections kwarg). tf.initialize_all_variables() looks at this collection and creates an op that initializes the variables listed in it.
To see the GLOBAL_VARIABLES collection, you can use tf.get_collection with tf.GraphKeys.GLOBAL_VARIABLES as the argument.
TL;DR Place init_op = tf.initialize_all_variables() after the graph has been built.
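Applied to the code in the question, that means building init_op at the end of the graph-construction block (a minimal sketch of the fix, abbreviated):

with g.as_default():
    images = tf.placeholder(tf.float32, shape=[None, 300, 300, 3], name='input')
    # ... all conv_relu / max_pool layers exactly as before ...
    weight_rows = pool5_shape[1] * pool5_shape[2] * pool5_shape[3]
    init_op = tf.initialize_all_variables()  # every variable now exists

sess = tf.Session(graph=g)
sess.run(init_op)  # initializes all of them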