Implementation of a WGAN-GP in TensorFlow

Using TensorFlow, I'm trying to reimplement the following architecture (for now I'm focusing on the Generator part):
What I've done so far is define the generator in the following way:
N_Z = 128
generator = [
    tf.keras.layers.Dense(units=6144, activation="relu"),
    tf.keras.layers.Reshape(target_shape=(6, 4, 256)),
    tf.keras.layers.Conv2DTranspose(
        filters=128, kernel_size=(5, 5), strides=(2, 2), padding="SAME", activation="relu"
    ),
    tf.keras.layers.Conv2DTranspose(
        filters=128, kernel_size=(3, 3), strides=(2, 1), padding="SAME", activation="relu"
    ),
    tf.keras.layers.Conv2DTranspose(
        filters=64, kernel_size=(3, 3), strides=(1, 1), padding="SAME", activation="relu"
    ),
    tf.keras.layers.Conv2DTranspose(
        filters=64, kernel_size=(3, 3), strides=(2, 1), padding="SAME", activation="relu"
    ),
    tf.keras.layers.Conv2DTranspose(
        filters=32, kernel_size=(3, 3), strides=(1, 1), padding="SAME", activation="relu"
    ),
    tf.keras.layers.Conv2DTranspose(
        filters=32, kernel_size=(3, 3), strides=(2, 1), padding="SAME", activation="relu"
    ),
    tf.keras.layers.Conv2DTranspose(
        filters=1, kernel_size=(3, 3), strides=(1, 1), padding="SAME", activation="relu"
    ),
]
Generator = tf.keras.models.Sequential(generator)
But if I take some random noise and let the model process it, this is the final shape I get back:
noise = tf.random.normal((64,128))
result = Generator(noise)
result.shape
TensorShape([64, 28, 28, 1])
What am I doing wrong here? I also checked the original implementation for additional details, but I couldn't find anything that clears this up.

The key is to trace the input and output shape of every layer: the network needs some help at the top levels to reach the exact target sizes.
[ Sample ]:
"""""""""""""""""""""""""""""""""""""""""""""""""""""""""
: Model Initialize
"""""""""""""""""""""""""""""""""""""""""""""""""""""""""
model = tf.keras.models.Sequential([
    tf.keras.layers.InputLayer(input_shape=(6144,)),
    tf.keras.layers.Dense(48 * 128, activation="linear"),
    tf.keras.layers.BatchNormalization(momentum=0.99, epsilon=0.00001),
    tf.keras.layers.Reshape(target_shape=(6, 4, 256)),
    tf.keras.layers.Conv2DTranspose(
        filters=128, kernel_size=(5, 5), strides=(2, 2), padding="same", activation="relu"
    ),
    tf.keras.layers.Resizing(11, 8, interpolation='bilinear', crop_to_aspect_ratio=False),
    tf.keras.layers.Reshape(target_shape=(11, 8, 128)),
    tf.keras.layers.Conv2DTranspose(
        filters=128, kernel_size=(3, 3), strides=(2, 1), padding="same", activation="relu"
    ),
    tf.keras.layers.Resizing(22, 8, interpolation='bilinear', crop_to_aspect_ratio=False),
    tf.keras.layers.Reshape(target_shape=(22, 8, 128)),
    tf.keras.layers.Conv2DTranspose(
        filters=64, kernel_size=(3, 3), strides=(1, 1), padding="same", activation="relu"
    ),
    tf.keras.layers.Resizing(22, 8, interpolation='bilinear', crop_to_aspect_ratio=False),
    tf.keras.layers.Reshape(target_shape=(22, 8, 64)),
    tf.keras.layers.Conv2DTranspose(
        filters=64, kernel_size=(3, 3), strides=(2, 1), padding="same", activation="relu"
    ),
    tf.keras.layers.Resizing(43, 8, interpolation='bilinear', crop_to_aspect_ratio=False),
    tf.keras.layers.Reshape(target_shape=(43, 8, 64)),
    tf.keras.layers.Conv2DTranspose(
        filters=32, kernel_size=(3, 3), strides=(1, 1), padding="same", activation="relu"
    ),
    tf.keras.layers.Resizing(43, 8, interpolation='bilinear', crop_to_aspect_ratio=False),
    tf.keras.layers.Reshape(target_shape=(43, 8, 32)),
    tf.keras.layers.Conv2DTranspose(
        filters=32, kernel_size=(3, 3), strides=(2, 1), padding="same", activation="relu"
    ),
    tf.keras.layers.Resizing(85, 8, interpolation='bilinear', crop_to_aspect_ratio=False),
    tf.keras.layers.Reshape(target_shape=(85, 8, 32)),
])
model.summary()
[ Output ]:
Model: "sequential"
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
dense (Dense) (None, 6144) 37754880
batch_normalization (BatchN (None, 6144) 24576
ormalization)
reshape (Reshape) (None, 6, 4, 256) 0
conv2d_transpose (Conv2DTra (None, 12, 8, 128) 819328
nspose)
resizing (Resizing) (None, 11, 8, 128) 0
reshape_1 (Reshape) (None, 11, 8, 128) 0
conv2d_transpose_1 (Conv2DT (None, 22, 8, 128) 147584
ranspose)
resizing_1 (Resizing) (None, 22, 8, 128) 0
reshape_2 (Reshape) (None, 22, 8, 128) 0
conv2d_transpose_2 (Conv2DT (None, 22, 8, 64) 73792
ranspose)
resizing_2 (Resizing) (None, 22, 8, 64) 0
reshape_3 (Reshape) (None, 22, 8, 64) 0
conv2d_transpose_3 (Conv2DT (None, 44, 8, 64) 36928
ranspose)
resizing_3 (Resizing) (None, 43, 8, 64) 0
reshape_4 (Reshape) (None, 43, 8, 64) 0
conv2d_transpose_4 (Conv2DT (None, 43, 8, 32) 18464
ranspose)
resizing_4 (Resizing) (None, 43, 8, 32) 0
reshape_5 (Reshape) (None, 43, 8, 32) 0
conv2d_transpose_5 (Conv2DT (None, 86, 8, 32) 9248
ranspose)
resizing_5 (Resizing) (None, 85, 8, 32) 0
reshape_6 (Reshape) (None, 85, 8, 32) 0
=================================================================
Total params: 38,884,800
Trainable params: 38,872,512
Non-trainable params: 12,288
_________________________________________________________________
(1, 85, 8, 32)
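A note on why the Resizing layers appear (my reading of the sample, not stated in it): with padding="same", a Conv2DTranspose simply multiplies each spatial dimension by its stride, so a pure-transpose stack starting from height 6 lands on 12, 22, 44 and 86, while the target architecture needs 11, 22, 43 and 85; each Resizing layer trims its level to the exact size. A quick sketch to verify the arithmetic:
# Height under padding="same": out = in * stride; Resizing then trims it.
h = 6
for stride, target in [(2, 11), (2, 22), (1, 22), (2, 43), (1, 43), (2, 85)]:
    h = h * stride   # Conv2DTranspose output height
    print(f"transpose -> {h}, resize -> {target}")
    h = target       # Resizing output height
# transpose -> 12, resize -> 11
# transpose -> 22, resize -> 22
# transpose -> 22, resize -> 22
# transpose -> 44, resize -> 43
# transpose -> 43, resize -> 43
# transpose -> 86, resize -> 85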

Related

Using a model from TF Hub as a convolutional feature extractor

I would like to build a custom model on top of a ResNet-50 feature extractor (one of its intermediate layers).
Here is how I am trying to do that:
import tensorflow as tf
import tensorflow_hub as hub
from tensorflow.keras.layers import Input, Conv2D

def get_model(img_shape, num_classes):
    inputs = Input(shape=img_shape)
    backbone = hub.load("https://tfhub.dev/google/imagenet/resnet_v2_50/feature_vector/3").signatures["image_feature_vector"]
    x = backbone(inputs)['resnet_v2_50/block3/unit_1/bottleneck_v2/shortcut']
    # Add a per-pixel classification layer
    outputs = Conv2D(num_classes + 1, 3, activation="softmax", padding="same")(x)
    model = tf.keras.Model(inputs=inputs, outputs=outputs)
    return model
This code works. However, it takes very long to run (7 min in Colab), and the model summary I get looks like this:
input_14 (InputLayer) [(None, 224, 224, 3)] 0
tf_op_layer_StatefulPartitionedCall_16 (TensorFlowOpLayer) [(None, 2048),
(None, 28, 28, 256),
(None, 56, 56, 256),
(None, 56, 56, 64),
(None, 56, 56, 64),
(None, 56, 56, 256),
(None, 56, 56, 256),
(None, 56, 56, 256),
(None, 56, 56, 64),
(None, 56, 56, 64),
(None, 56, 56, 256),
(None, 28, 28, 256),
(None, 56, 56, 64),
(None, 28, 28, 64),
(None, 28, 28, 256),
(None, 14, 14, 512),
(None, 28, 28, 512),
(None, 28, 28, 128),
(None, 28, 28, 128),
(None, 28, 28, 512),
(None, 28, 28, 512),
(None, 28, 28, 512),
(None, 28, 28, 128),
(None, 28, 28, 128),
(None, 28, 28, 512),
(None, 28, 28, 512),
(None, 28, 28, 128),
(None, 28, 28, 128),
(None, 28, 28, 512),
(None, 14, 14, 512),
(None, 28, 28, 128),
(None, 14, 14, 128),
(None, 14, 14, 512),
(None, 7, 7, 1024),
(None, 14, 14, 1024),
(None, 14, 14, 256),
(None, 14, 14, 256),
(None, 14, 14, 1024),
(None, 14, 14, 1024),
(None, 14, 14, 1024),
(None, 14, 14, 256),
(None, 14, 14, 256),
(None, 14, 14, 1024),
(None, 14, 14, 1024),
(None, 14, 14, 256),
(None, 14, 14, 256),
(None, 14, 14, 1024),
(None, 14, 14, 1024),
(None, 14, 14, 256),
(None, 14, 14, 256),
(None, 14, 14, 1024),
(None, 14, 14, 1024),
(None, 14, 14, 256),
(None, 14, 14, 256),
(None, 14, 14, 1024),
(None, 7, 7, 1024),
(None, 14, 14, 256),
(None, 7, 7, 256),
(None, 7, 7, 1024),
(None, 7, 7, 2048),
(None, 7, 7, 2048),
(None, 7, 7, 512),
(None, 7, 7, 512),
(None, 7, 7, 2048),
(None, 7, 7, 2048),
(None, 7, 7, 2048),
(None, 7, 7, 512),
(None, 7, 7, 512),
(None, 7, 7, 2048),
(None, 7, 7, 2048),
(None, 7, 7, 512),
(None, 7, 7, 512),
(None, 7, 7, 2048),
(None, 112, 112, 64),
(None, 1, 1, 2048)]
conv2d_3 (Conv2D) (None, 14, 14, 2) 18434
To me that looks like hanging outputs from all intermediate layers of the ResNet. Also, the execution time seems far too long for instantiating such a simple model.
How can I avoid getting a list of tensors as a TensorFlowOpLayer and instead link a single output to the rest of the model?
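For reference, a possible workaround (my sketch, not from this thread): build the backbone with tf.keras.applications.ResNet50V2 instead of the hub signature and tap a single intermediate layer by name, so the graph exposes exactly one output. The layer name below is an assumption; check backbone.summary() for the block you actually want.
import tensorflow as tf

def get_model(img_shape, num_classes):
    backbone = tf.keras.applications.ResNet50V2(
        include_top=False, weights="imagenet", input_shape=img_shape)
    # Tap one intermediate activation; the name is illustrative,
    # verify it against backbone.summary().
    x = backbone.get_layer("conv3_block1_preact_relu").output
    # Per-pixel classification head, as in the original snippet.
    outputs = tf.keras.layers.Conv2D(
        num_classes + 1, 3, activation="softmax", padding="same")(x)
    return tf.keras.Model(inputs=backbone.input, outputs=outputs)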

How can I write DSSIM+MAE loss function for model training

How can I write a DSSIM+MAE loss function from this formula:
Loss = α·MAE + (1 − α)·DSSIM
with
MAE = (1/M) · Σᵢ |yᵢ − xᵢ|
DSSIM = 1 − SSIM
SSIM = (numerator1 · numerator2) / (denominator1 · denominator2)
numerator1 = 2·μ₁μ₂ + C₁
numerator2 = 2·σ₁₂ + C₂
denominator1 = μ₁² + μ₂² + C₁
denominator2 = σ₁² + σ₂² + C₂
where α is a trade-off parameter between MAE and DSSIM, M is the total number of pixels in the image, μ is the mean of an image, σ is its standard deviation, and σ₁₂ is the covariance of the two images x and y. C₁ and C₂ are two constants that stabilize the division when the denominator is weak. In my implementation I set α = 0.75, C₁ = (0.01·L)² and C₂ = (0.03·L)², where L is the dynamic range of the pixel values in the image.
So, this is my code:
def custom_loss(y_true, y_pred):
    M = 512  # M = total number of pixels in the sCT image
    sum = 0
    y_pred = tf.cast(y_pred, tf.int32)
    y_true = tf.cast(y_true, tf.int32)
    print(y_pred.shape)
    print(y_true.shape)
    y_pred = y_pred[0]
    y_true = y_true[0]
    for i in range(n):
        sum = sum + abs(y_true[i] - y_pred[i])
    my_mae = sum / n
    dssim = tf.reduce_mean((1 - tf.image.ssim(y_true, y_pred, max_val=512,
                           filter_size=11, filter_sigma=1.5, k1=0.01, k2=0.03)) / 2)
    my_mae = tf.cast(my_mae, tf.float32)
    return (0.75 * my_mae) + (1 - 0.75 * dssim)
It throws an error when I run
model.compile(optimizer='adam', loss=custom_loss, metrics=['accuracy'])
The error is:
Traceback (most recent call last):
File "C:/Users/CRA01/Desktop/Unet/custom loss.py", line 85, in
history = model.fit(Training_CBCT_dataset,Training_pCT_dataset,validation_split=0.2,batch_size=1, epochs=5, callbacks=[model_save_callback])
File "C:\Users\CRA01\miniconda3\envs\tf_2.9\lib\site-packages\keras\utils\traceback_utils.py", line 67, in error_handler
raise e.with_traceback(filtered_tb) from None
File "C:\Users\CRA01\AppData\Local\Temp_autograph_generated_fileqb1dimxg.py", line 15, in tf__train_function
retval = ag__.converted_call(ag__.ld(step_function), (ag__.ld(self), ag__.ld(iterator)), None, fscope)
ValueError: in user code:
File "C:\Users\CRA01\miniconda3\envs\tf_2.9\lib\site-packages\keras\engine\training.py", line 1051, in train_function *
return step_function(self, iterator)
File "C:\Users\CRA01\miniconda3\envs\tf_2.9\lib\site-packages\keras\engine\training.py", line 1040, in step_function **
outputs = model.distribute_strategy.run(run_step, args=(data,))
File "C:\Users\CRA01\miniconda3\envs\tf_2.9\lib\site-packages\keras\engine\training.py", line 1030, in run_step **
outputs = model.train_step(data)
File "C:\Users\CRA01\miniconda3\envs\tf_2.9\lib\site-packages\keras\engine\training.py", line 893, in train_step
self.optimizer.minimize(loss, self.trainable_variables, tape=tape)
File "C:\Users\CRA01\miniconda3\envs\tf_2.9\lib\site-packages\keras\optimizers\optimizer_v2\optimizer_v2.py", line 539, in minimize
return self.apply_gradients(grads_and_vars, name=name)
File "C:\Users\CRA01\miniconda3\envs\tf_2.9\lib\site-packages\keras\optimizers\optimizer_v2\optimizer_v2.py", line 640, in apply_gradients
grads_and_vars = optimizer_utils.filter_empty_gradients(grads_and_vars)
File "C:\Users\CRA01\miniconda3\envs\tf_2.9\lib\site-packages\keras\optimizers\optimizer_v2\utils.py", line 73, in filter_empty_gradients
raise ValueError(f"No gradients provided for any variable: {variable}. "
ValueError: No gradients provided for any variable: (['conv2d/kernel:0', 'conv2d/bias:0', 'conv2d_1/kernel:0', 'conv2d_1/bias:0', 'conv2d_2/kernel:0', 'conv2d_2/bias:0', 'conv2d_3/kernel:0', 'conv2d_3/bias:0', 'conv2d_4/kernel:0', 'conv2d_4/bias:0', 'conv2d_5/kernel:0', 'conv2d_5/bias:0', 'conv2d_6/kernel:0', 'conv2d_6/bias:0', 'conv2d_7/kernel:0', 'conv2d_7/bias:0', 'conv2d_8/kernel:0', 'conv2d_8/bias:0', 'conv2d_9/kernel:0', 'conv2d_9/bias:0', 'conv2d_10/kernel:0', 'conv2d_10/bias:0', 'conv2d_11/kernel:0', 'conv2d_11/bias:0', 'conv2d_12/kernel:0', 'conv2d_12/bias:0', 'conv2d_13/kernel:0', 'conv2d_13/bias:0', 'conv2d_14/kernel:0', 'conv2d_14/bias:0', 'conv2d_15/kernel:0', 'conv2d_15/bias:0', 'conv2d_16/kernel:0', 'conv2d_16/bias:0', 'conv2d_17/kernel:0', 'conv2d_17/bias:0', 'conv2d_18/kernel:0', 'conv2d_18/bias:0', 'conv2d_19/kernel:0', 'conv2d_19/bias:0', 'conv2d_20/kernel:0', 'conv2d_20/bias:0', 'conv2d_21/kernel:0', 'conv2d_21/bias:0', 'conv2d_22/kernel:0', 'conv2d_22/bias:0', 'conv2d_23/kernel:0', 'conv2d_23/bias:0', 'conv2d_24/kernel:0', 'conv2d_24/bias:0', 'conv2d_25/kernel:0', 'conv2d_25/bias:0', 'conv2d_26/kernel:0', 'conv2d_26/bias:0'],). Provided `grads_and_vars` is ((None, <tf.Variable 'conv2d/kernel:0' shape=(3, 3, 1, 32) dtype=float32>), (None, <tf.Variable 'conv2d/bias:0' shape=(32,) dtype=float32>), (None, <tf.Variable 'conv2d_1/kernel:0' shape=(3, 3, 32, 32) dtype=float32>), (None, <tf.Variable 'conv2d_1/bias:0' shape=(32,) dtype=float32>), (None, <tf.Variable 'conv2d_2/kernel:0' shape=(3, 3, 32, 64) dtype=float32>), (None, <tf.Variable 'conv2d_2/bias:0' shape=(64,) dtype=float32>), (None, <tf.Variable 'conv2d_3/kernel:0' shape=(3, 3, 64, 64) dtype=float32>), (None, <tf.Variable 'conv2d_3/bias:0' shape=(64,) dtype=float32>), (None, <tf.Variable 'conv2d_4/kernel:0' shape=(3, 3, 64, 128) dtype=float32>), (None, <tf.Variable 'conv2d_4/bias:0' shape=(128,) dtype=float32>), (None, <tf.Variable 'conv2d_5/kernel:0' shape=(3, 3, 128, 128) dtype=float32>), (None, <tf.Variable 'conv2d_5/bias:0' shape=(128,) dtype=float32>), (None, <tf.Variable 'conv2d_6/kernel:0' shape=(3, 3, 128, 256) dtype=float32>), (None, <tf.Variable 'conv2d_6/bias:0' shape=(256,) dtype=float32>), (None, <tf.Variable 'conv2d_7/kernel:0' shape=(3, 3, 256, 256) dtype=float32>), (None, <tf.Variable 'conv2d_7/bias:0' shape=(256,) dtype=float32>), (None, <tf.Variable 'conv2d_8/kernel:0' shape=(3, 3, 256, 512) dtype=float32>), (None, <tf.Variable 'conv2d_8/bias:0' shape=(512,) dtype=float32>), (None, <tf.Variable 'conv2d_9/kernel:0' shape=(3, 3, 512, 512) dtype=float32>), (None, <tf.Variable 'conv2d_9/bias:0' shape=(512,) dtype=float32>), (None, <tf.Variable 'conv2d_10/kernel:0' shape=(3, 3, 512, 1024) dtype=float32>), (None, <tf.Variable 'conv2d_10/bias:0' shape=(1024,) dtype=float32>), (None, <tf.Variable 'conv2d_11/kernel:0' shape=(3, 3, 1024, 1024) dtype=float32>), (None, <tf.Variable 'conv2d_11/bias:0' shape=(1024,) dtype=float32>), (None, <tf.Variable 'conv2d_12/kernel:0' shape=(3, 3, 1024, 2048) dtype=float32>), (None, <tf.Variable 'conv2d_12/bias:0' shape=(2048,) dtype=float32>), (None, <tf.Variable 'conv2d_13/kernel:0' shape=(3, 3, 2048, 2048) dtype=float32>), (None, <tf.Variable 'conv2d_13/bias:0' shape=(2048,) dtype=float32>), (None, <tf.Variable 'conv2d_14/kernel:0' shape=(3, 3, 3072, 1024) dtype=float32>), (None, <tf.Variable 'conv2d_14/bias:0' shape=(1024,) dtype=float32>), (None, <tf.Variable 'conv2d_15/kernel:0' shape=(3, 3, 1024, 1024) dtype=float32>), (None, <tf.Variable 'conv2d_15/bias:0' 
shape=(1024,) dtype=float32>), (None, <tf.Variable 'conv2d_16/kernel:0' shape=(3, 3, 1536, 512) dtype=float32>), (None, <tf.Variable 'conv2d_16/bias:0' shape=(512,) dtype=float32>), (None, <tf.Variable 'conv2d_17/kernel:0' shape=(3, 3, 512, 512) dtype=float32>), (None, <tf.Variable 'conv2d_17/bias:0' shape=(512,) dtype=float32>), (None, <tf.Variable 'conv2d_18/kernel:0' shape=(3, 3, 768, 256) dtype=float32>), (None, <tf.Variable 'conv2d_18/bias:0' shape=(256,) dtype=float32>), (None, <tf.Variable 'conv2d_19/kernel:0' shape=(3, 3, 256, 256) dtype=float32>), (None, <tf.Variable 'conv2d_19/bias:0' shape=(256,) dtype=float32>), (None, <tf.Variable 'conv2d_20/kernel:0' shape=(3, 3, 384, 128) dtype=float32>), (None, <tf.Variable 'conv2d_20/bias:0' shape=(128,) dtype=float32>), (None, <tf.Variable 'conv2d_21/kernel:0' shape=(3, 3, 128, 128) dtype=float32>), (None, <tf.Variable 'conv2d_21/bias:0' shape=(128,) dtype=float32>), (None, <tf.Variable 'conv2d_22/kernel:0' shape=(3, 3, 192, 64) dtype=float32>), (None, <tf.Variable 'conv2d_22/bias:0' shape=(64,) dtype=float32>), (None, <tf.Variable 'conv2d_23/kernel:0' shape=(3, 3, 64, 64) dtype=float32>), (None, <tf.Variable 'conv2d_23/bias:0' shape=(64,) dtype=float32>), (None, <tf.Variable 'conv2d_24/kernel:0' shape=(3, 3, 96, 32) dtype=float32>), (None, <tf.Variable 'conv2d_24/bias:0' shape=(32,) dtype=float32>), (None, <tf.Variable 'conv2d_25/kernel:0' shape=(3, 3, 32, 32) dtype=float32>), (None, <tf.Variable 'conv2d_25/bias:0' shape=(32,) dtype=float32>), (None, <tf.Variable 'conv2d_26/kernel:0' shape=(1, 1, 32, 1) dtype=float32>), (None, <tf.Variable 'conv2d_26/bias:0' shape=(1,) dtype=float32>)).
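For what it's worth (not from the original thread): the "No gradients provided" error comes from computing the loss on tensors cast to tf.int32, which severs the gradient path; note also that the posted return line computes 1 - 0.75*dssim rather than the formula's (1 - 0.75)·DSSIM. A hedged sketch of a differentiable version, assuming float32 image batches with dynamic range L = 512:
import tensorflow as tf

ALPHA = 0.75  # trade-off between MAE and DSSIM
L = 512.0     # dynamic range of the pixel values

def dssim_mae_loss(y_true, y_pred):
    # Stay in float32 so gradients can flow back to the weights.
    y_true = tf.cast(y_true, tf.float32)
    y_pred = tf.cast(y_pred, tf.float32)
    mae = tf.reduce_mean(tf.abs(y_true - y_pred))  # (1/M) * sum |y_i - x_i|
    ssim = tf.image.ssim(y_true, y_pred, max_val=L, filter_size=11,
                         filter_sigma=1.5, k1=0.01, k2=0.03)
    dssim = tf.reduce_mean(1.0 - ssim)             # DSSIM = 1 - SSIM
    return ALPHA * mae + (1.0 - ALPHA) * dssim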

Out of memory while running on tensorflow-gpu

I'm training a model on an RTX 3060 GPU with 6 GB of memory,
TensorFlow 2.4,
CUDA 11.0 and cuDNN 8.0.4.
I'm facing this problem despite using only batch_size=2 (even 1 fails):
2021-09-18 11:27:14.053184: I tensorflow/core/common_runtime/bfc_allocator.cc:1040] Sum Total of in-use chunks: 979.49MiB
2021-09-18 11:27:14.053190: I tensorflow/core/common_runtime/bfc_allocator.cc:1042] total_region_allocated_bytes_: 1081081856 memory_limit_: 4963174976 available bytes: 3882093120 curr_region_allocation_bytes_: 4294967296
2021-09-18 11:27:14.053200: I tensorflow/core/common_runtime/bfc_allocator.cc:1048] Stats:
Limit: 4963174976
InUse: 1027070208
MaxInUse: 2221276928
NumAllocs: 73401
MaxAllocSize: 1234173952
Reserved: 0
PeakReserved: 0
LargestFreeBlock: 0
2021-09-18 11:27:14.053218: W tensorflow/core/common_runtime/bfc_allocator.cc:441] **********************************************____**************************************************
2021-09-18 11:27:14.053235: W tensorflow/core/framework/op_kernel.cc:1763] OP_REQUIRES failed at cwise_ops_common.h:128 : Resource exhausted: OOM when allocating tensor with shape[2,128,128,128,16] and type bool on /job:localhost/replica:0/task:0/device:GPU:0 by allocator GPU_0_bfc
I can't solve the problem. I have set
physical_devices = tf.config.list_physical_devices('GPU')
tf.config.experimental.set_memory_growth(physical_devices[0], True)
but it doesn't help; I should have the necessary memory, yet it still fails to run.
Can somebody help me?
Edit:
I'm using a 3D U-Net model; here is a summary of my model:
Model: "model"
__________________________________________________________________________________________________
Layer (type) Output Shape Param # Connected to
==================================================================================================
input_1 (InputLayer) [(None, 128, 128, 12 0
__________________________________________________________________________________________________
conv3d (Conv3D) (None, 128, 128, 128 1312 input_1[0][0]
__________________________________________________________________________________________________
dropout (Dropout) (None, 128, 128, 128 0 conv3d[0][0]
__________________________________________________________________________________________________
conv3d_1 (Conv3D) (None, 128, 128, 128 6928 dropout[0][0]
__________________________________________________________________________________________________
max_pooling3d (MaxPooling3D) (None, 64, 64, 64, 1 0 conv3d_1[0][0]
__________________________________________________________________________________________________
conv3d_2 (Conv3D) (None, 64, 64, 64, 3 13856 max_pooling3d[0][0]
__________________________________________________________________________________________________
dropout_1 (Dropout) (None, 64, 64, 64, 3 0 conv3d_2[0][0]
__________________________________________________________________________________________________
conv3d_3 (Conv3D) (None, 64, 64, 64, 3 27680 dropout_1[0][0]
__________________________________________________________________________________________________
max_pooling3d_1 (MaxPooling3D) (None, 32, 32, 32, 3 0 conv3d_3[0][0]
__________________________________________________________________________________________________
conv3d_4 (Conv3D) (None, 32, 32, 32, 6 55360 max_pooling3d_1[0][0]
__________________________________________________________________________________________________
dropout_2 (Dropout) (None, 32, 32, 32, 6 0 conv3d_4[0][0]
__________________________________________________________________________________________________
conv3d_5 (Conv3D) (None, 32, 32, 32, 6 110656 dropout_2[0][0]
__________________________________________________________________________________________________
max_pooling3d_2 (MaxPooling3D) (None, 16, 16, 16, 6 0 conv3d_5[0][0]
__________________________________________________________________________________________________
conv3d_6 (Conv3D) (None, 16, 16, 16, 1 221312 max_pooling3d_2[0][0]
__________________________________________________________________________________________________
dropout_3 (Dropout) (None, 16, 16, 16, 1 0 conv3d_6[0][0]
__________________________________________________________________________________________________
conv3d_7 (Conv3D) (None, 16, 16, 16, 1 442496 dropout_3[0][0]
__________________________________________________________________________________________________
max_pooling3d_3 (MaxPooling3D) (None, 8, 8, 8, 128) 0 conv3d_7[0][0]
__________________________________________________________________________________________________
conv3d_8 (Conv3D) (None, 8, 8, 8, 256) 884992 max_pooling3d_3[0][0]
__________________________________________________________________________________________________
dropout_4 (Dropout) (None, 8, 8, 8, 256) 0 conv3d_8[0][0]
__________________________________________________________________________________________________
conv3d_9 (Conv3D) (None, 8, 8, 8, 256) 1769728 dropout_4[0][0]
__________________________________________________________________________________________________
conv3d_transpose (Conv3DTranspo (None, 16, 16, 16, 1 262272 conv3d_9[0][0]
__________________________________________________________________________________________________
concatenate (Concatenate) (None, 16, 16, 16, 2 0 conv3d_transpose[0][0]
conv3d_7[0][0]
__________________________________________________________________________________________________
conv3d_10 (Conv3D) (None, 16, 16, 16, 1 884864 concatenate[0][0]
__________________________________________________________________________________________________
dropout_5 (Dropout) (None, 16, 16, 16, 1 0 conv3d_10[0][0]
__________________________________________________________________________________________________
conv3d_11 (Conv3D) (None, 16, 16, 16, 1 442496 dropout_5[0][0]
__________________________________________________________________________________________________
conv3d_transpose_1 (Conv3DTrans (None, 32, 32, 32, 6 65600 conv3d_11[0][0]
__________________________________________________________________________________________________
concatenate_1 (Concatenate) (None, 32, 32, 32, 1 0 conv3d_transpose_1[0][0]
conv3d_5[0][0]
__________________________________________________________________________________________________
conv3d_12 (Conv3D) (None, 32, 32, 32, 6 221248 concatenate_1[0][0]
__________________________________________________________________________________________________
dropout_6 (Dropout) (None, 32, 32, 32, 6 0 conv3d_12[0][0]
__________________________________________________________________________________________________
conv3d_13 (Conv3D) (None, 32, 32, 32, 6 110656 dropout_6[0][0]
__________________________________________________________________________________________________
conv3d_transpose_2 (Conv3DTrans (None, 64, 64, 64, 3 16416 conv3d_13[0][0]
__________________________________________________________________________________________________
concatenate_2 (Concatenate) (None, 64, 64, 64, 6 0 conv3d_transpose_2[0][0]
conv3d_3[0][0]
__________________________________________________________________________________________________
conv3d_14 (Conv3D) (None, 64, 64, 64, 3 55328 concatenate_2[0][0]
__________________________________________________________________________________________________
dropout_7 (Dropout) (None, 64, 64, 64, 3 0 conv3d_14[0][0]
__________________________________________________________________________________________________
conv3d_15 (Conv3D) (None, 64, 64, 64, 3 27680 dropout_7[0][0]
__________________________________________________________________________________________________
conv3d_transpose_3 (Conv3DTrans (None, 128, 128, 128 4112 conv3d_15[0][0]
__________________________________________________________________________________________________
concatenate_3 (Concatenate) (None, 128, 128, 128 0 conv3d_transpose_3[0][0]
conv3d_1[0][0]
__________________________________________________________________________________________________
conv3d_16 (Conv3D) (None, 128, 128, 128 13840 concatenate_3[0][0]
__________________________________________________________________________________________________
dropout_8 (Dropout) (None, 128, 128, 128 0 conv3d_16[0][0]
__________________________________________________________________________________________________
conv3d_17 (Conv3D) (None, 128, 128, 128 6928 dropout_8[0][0]
__________________________________________________________________________________________________
conv3d_18 (Conv3D) (None, 128, 128, 128 68 conv3d_17[0][0]
==================================================================================================
Total params: 5,645,828
Trainable params: 5,645,828
Non-trainable params: 0
__________________________________________________________________________________________________
Image shape: (None, 128, 128, 128, 3)
Mask shape: (None, 128, 128, 128, 4)
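No answer is recorded here, but as a hedged sketch of common mitigations for activation-heavy 3D U-Nets (TF 2.4-era Keras API): a single [2, 128, 128, 128, 16] float32 activation is already ~256 MB, so savings have to come from activations rather than the 5.6 M parameters.
import tensorflow as tf

# Mixed precision keeps most activations in float16, roughly halving
# activation memory; a stable API since TF 2.4.
tf.keras.mixed_precision.set_global_policy('mixed_float16')

# Alternatively, train on smaller sub-volumes and stitch predictions,
# e.g. random 64x64x64 crops instead of full 128x128x128 volumes.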

Upsampling using 3d_transposed_convolution layers

Suppose I have a 5D tensor x from a previous layer with shape [2, 2, 7, 7, 64], where batch = 2, depth = 2, height = 7, width = 7, and in_channels = 64.
And I'd like to upsample it to a tensor with shape [2, 4, 14, 14, 32].
Maybe the next steps would be shapes like [2, 8, 28, 28, 16] and [2, 16, 112, 112, 1], and so on.
I'm new to TensorFlow, and I know that the implementations of transposed convolution in Caffe and TensorFlow differ: in Caffe you can define the output size by changing the strides of the kernel, whereas it's more involved in TensorFlow.
So how can I do that with tf.layers.conv3d_transpose or tf.nn.conv3d_transpose?
Would anyone give me a hand? Thanks!
You can do the upsampling with both tf.layers.conv3d_transpose and tf.nn.conv3d_transpose.
Let's consider your input tensor as:
input_layer = tf.placeholder(tf.float32, (2, 2, 7, 7, 64))  # batch, depth, height, width, in_channels
With tf.nn.conv3d_transpose we need to take care of the creation of the variables (weights and bias):
def conv3d_transpose(name, l_input, w, b, output_shape, stride=1):
    transp_conv = tf.nn.conv3d_transpose(l_input, w, output_shape,
                                         strides=[1, stride, stride, stride, 1],
                                         padding='SAME')
    return tf.nn.bias_add(transp_conv, b, name=name)

# Create variables for the operation
with tf.device('/cpu:0'):
    # weights will have the shape [depth, height, width, output_channels, in_channels]
    weights = tf.get_variable(name='w_transp_conv', shape=[3, 3, 3, 32, 64])
    bias = tf.get_variable(name='b_transp_conv', shape=[32])

t_conv_layer = conv3d_transpose('t_conv_layer', input_layer, weights, bias,
                                output_shape=[2, 4, 14, 14, 32], stride=2)
print(t_conv_layer)
# Tensor("t_conv_layer:0", shape=(2, 4, 14, 14, 32), dtype=float32)
With tf.layers.conv3d_transpose, which will take care of the creation of both weights and bias, we use the same input tensor input_layer:
t_conv_layer2 = tf.layers.conv3d_transpose(input_layer, filters=32, kernel_size=[3, 3, 3],
                                           strides=(2, 2, 2), padding='SAME', name='t_conv_layer2')
print(t_conv_layer2)
# Tensor("t_conv_layer2/Reshape_1:0", shape=(2, 4, 14, 14, 32), dtype=float32)
To get the other upsampled tensors you can repeat this procedure, changing the strides as you need.
Example with tf.layers.conv3d_transpose:
t_conv_layer3 = tf.layers.conv3d_transpose(t_conv_layer2, filters=16, kernel_size=[3, 3, 3],
                                           strides=(2, 2, 2), padding='SAME', name='t_conv_layer3')
t_conv_layer4 = tf.layers.conv3d_transpose(t_conv_layer3, filters=8, kernel_size=[3, 3, 3],
                                           strides=(2, 2, 2), padding='SAME', name='t_conv_layer4')
t_conv_layer5 = tf.layers.conv3d_transpose(t_conv_layer4, filters=1, kernel_size=[3, 3, 3],
                                           strides=(1, 2, 2), padding='SAME', name='t_conv_layer5')
print(t_conv_layer5)
# Tensor("t_conv_layer5/Reshape_1:0", shape=(2, 16, 112, 112, 1), dtype=float32)
Note: since tf.nn.conv3d_transpose is actually the gradient of tf.nn.conv3d, you can make sure that the variable output_shape is correct by considering the forward operation with tf.nn.conv3d:
def print_expected(weights, shape, stride=1):
    output = tf.constant(0.1, shape=shape)
    expected_layer = tf.nn.conv3d(output, weights, strides=[1, stride, stride, stride, 1], padding='SAME')
    print("Expected shape of input layer when considering the output shape ({} and stride {}): {}".format(shape, stride, expected_layer.get_shape()))
Therefore, to produce a transposed convolution with shape [2, 4, 14, 14, 32], we can check, for example, strides 1 and 2:
print_expected(weights, shape=[2, 4, 14, 14, 32], stride=1)
print_expected(weights, shape=[2, 4, 14, 14, 32], stride=2)
which prints and confirms that the second option (using stride 2) is the right one to produce a tensor with our desired shape:
Expected shape of input layer when considering the output shape ([2, 4, 14, 14, 32] and stride 1): (2, 4, 14, 14, 64)
Expected shape of input layer when considering the output shape ([2, 4, 14, 14, 32] and stride 2): (2, 2, 7, 7, 64)
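As an aside, tf.placeholder, tf.get_variable and tf.layers are TF1-era APIs. A minimal TF2 sketch of the same stride-2 upsampling step, using tf.keras.layers.Conv3DTranspose:
import tensorflow as tf

# Shape without the batch dimension: (depth, height, width, channels)
x = tf.keras.Input(shape=(2, 7, 7, 64))
y = tf.keras.layers.Conv3DTranspose(filters=32, kernel_size=3,
                                    strides=(2, 2, 2), padding='same')(x)
print(y.shape)  # (None, 4, 14, 14, 32)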

Text recognition model improvement suggestions

I am trying to build a text recognition model using CNNs and LSTMs with CTC loss. My current model looks like this, where the numbers in brackets are the tensor shape after each layer. I have a vocabulary size of 94, and my input images are 64×1024. The model is improving very slowly, and I'd appreciate any thoughts on what I could change. Thanks :)
Input: (?, 64, 1024, 1)
cnn-1: [None, 64, 1024, 64]
bn-1: [None, 64, 1024, 64]
relu-1: [None, 64, 1024, 64]
maxpool-1: [None, 32, 512, 64]
cnn-2: [None, 32, 512, 128]
bn-2: [None, 32, 512, 128]
relu-2: [None, 32, 512, 128]
maxpool-2: [None, 16, 256, 128]
cnn-3: [None, 16, 256, 128]
bn-3: [None, 16, 256, 128]
relu-3: [None, 16, 256, 128]
maxpool-3: [None, 8, 128, 128]
cnn-4: [None, 8, 128, 256]
bn-4: [None, 8, 128, 256]
relu-4: [None, 8, 128, 256]
maxpool-4: [None, 4, 64, 256]
cnn-5: [None, 4, 64, 256]
bn-5: [None, 4, 64, 256]
relu-5: [None, 4, 64, 256]
maxpool-5: [None, 2, 32, 256]
lstm-input: [None, 32, 512]
lstm-output: [None, 32, 64]
lstm-output-reshaped: [None, 64]
fully-connected: [None, 94]
reshaped_logits: [None, None, 94]
transposed_logits: [None, None, 94]
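No answer is recorded here, but since the post mentions CTC, here is a hedged sketch (my names, not the poster's) of how the listed shapes typically wire together, from the [None, 2, 32, 256] feature map down to per-step logits for tf.nn.ctc_loss:
import tensorflow as tf

features = tf.keras.Input(shape=(2, 32, 256))        # maxpool-5 output
# Treat the 32 horizontal positions as time steps:
# (2, 32, 256) -> (32, 2, 256) -> (32, 512), matching "lstm-input".
seq = tf.keras.layers.Permute((2, 1, 3))(features)
seq = tf.keras.layers.Reshape((32, 512))(seq)
seq = tf.keras.layers.LSTM(64, return_sequences=True)(seq)  # "lstm-output"
logits = tf.keras.layers.Dense(94)(seq)              # per-step vocab logits

# tf.nn.ctc_loss then consumes these logits; the post's transpose likely
# produces the layout it expects. blank_index here is an assumption.
# loss = tf.nn.ctc_loss(labels, logits, label_length, logit_length,
#                       logits_time_major=False, blank_index=93)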