Dynamic output shape is incorrect / not the same as static - TensorFlow

I am trying to implement a patch-creation function using TensorFlow's extract_image_patches function, but the dynamic output shape is not what I expect.
Let me briefly describe what it does. The input shape is supposed to be 6000x4000. We first divide both dimensions by their greatest common divisor, which gives an aspect ratio of 3:2. Then we pass 64 to our function to create patches of size 3x64 by 2x64, i.e. 192x128. This returns 31x31 distinct patches. Everything works with the static output, but the dynamic output is different, and I could not find which part causes this.
# input_shape_inbuild: (None, 6000, 4000, 1)
# ---LAYER---
# Input Size: (None, 6000, 4000, 1)
# Patch Size: (x,y) = 192, 128
# Aspect ratio: (3, 2)
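# Assumed imports (standalone Keras on TF 1.x, as implied by _keras_shape and tf.extract_image_patches)
import cv2
import tensorflow as tf
from keras import backend as K
from keras.layers import Input, Layer
!wget https://www.fujifilm.com/products/digital_cameras/x/fujifilm_x_pro2/sample_images/img/index/ff_x_pro2_001.JPG
img = cv2.imread('ff_x_pro2_001.JPG', 0)
img = tf.reshape(img, [1,img.shape[0],img.shape[1],1])
# Note: TensorFlow takes images as (y, x),
# so a 6000x4000 image is given as tf.func(4000, 6000)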
# Here I define custom layer in tensorflow.
class create_patches(Layer):
    def __init__(self, patchMultiplier):
        super(create_patches, self).__init__()
        self.patchMultiplier = patchMultiplier
    def build(self, input_shape):
        print('input_shape_inbuild: ', input_shape)
        def aspect_ratio(width, height):
            #find greatest common divider of input_shape
            def gcd(x, y):
                while y != 0:
                    (x, y) = (y, x % y)
                return x
            r = gcd(width, height)
            x = int(width/r)
            y = int(height/r)
            return x, y
        self.aspect_ratio = aspect_ratio(input_shape[1], input_shape[2])
        self.patchSize_x = self.aspect_ratio[0] * self.patchMultiplier
        self.patchSize_y = self.aspect_ratio[1] * self.patchMultiplier
    def call(self, inputs):
        print('---LAYER---')
        print('Input Size:', inputs._keras_shape)
        print('Patch Size: (x,y) = {}, {}'.format(self.patchSize_x, self.patchSize_y))
        print('Aspect ratio: {}'.format(self.aspect_ratio))
        print('---LAYER---')
        #call tf.extract_image_patches to return it.
        out = tf.extract_image_patches(images=inputs,
                                       ksizes=[1, self.patchSize_y, self.patchSize_x, 1],
                                       strides=[1, self.patchSize_y, self.patchSize_x, 1],
                                       rates=[1, 1, 1, 1],
                                       padding='VALID')
        return out
    def compute_output_shape(self, input_shape):
        """
        ksize_cols = patchSize_x
        ksize_rows = patchSize_y
        """
        #output shape=[batch, out_rows, out_cols, ksize_rows * ksize_cols * depth]
        """
        shape = (self.patchSize_x, self.patchSize_y,
                 (input_shape[1]/self.patchSize_x) * (input_shape[2]/self.patchSize_y))
        """
        shape =(input_shape[0],
                input_shape[1]/self.patchSize_x, # patch row count
                input_shape[2]/self.patchSize_y, # patch col count
                self.patchSize_x * self.patchSize_y) # patch pixel count
        return shape
#here is input with 6000x4000 pixels.
input_shape_1 = Input(shape=(6000, 4000, 1))
#here I fed input to my custom layer.
x1 = create_patches(64)(input_shape_1)
print('Output shape: ', x1.shape)
# here I build a model to see static output
f = K.function([input_shape_1], [x1])
import numpy as np
#result = f([np.random.randint(256, size=(1,4000,6000,1))])
result = f([img])
#print(result)
result = np.array(result)
# [batch, out_rows, out_cols, ksize_rows * ksize_cols * depth]
# Result shape: (1, 1, 125, 125, 1536)
print('Result shape: ', result.shape, '\n\n')
#print(result[:, :, :, 0].shape)
Here is the output I get:
input_shape_inbuild: (None, 6000, 4000, 1)
---LAYER---
Input Size: (None, 6000, 4000, 1)
Patch Size: (x,y) = 192, 128
Aspect ratio: (3, 2)
---LAYER---
Output shape: (?, 46, 20, 24576)
Result shape: (1, 1, 31, 31, 24576)
The result shape is what I expected, but I could not work out where the 46 and 20 in the output shape come from. Could you tell me why it is like this?
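One way to trace these numbers is to redo the 'VALID' patch-count arithmetic by hand. The sketch below is plain Python and does not call TensorFlow. It assumes that the printed output shape is inferred from the declared Input(shape=(6000, 4000, 1)), and that cv2.imread returns the 6000x4000 photo as an array with 4000 rows and 6000 columns. The helper name valid_patch_count is made up for this illustration.

```python
def valid_patch_count(size, ksize, stride):
    """Number of patch positions along one axis with 'VALID' padding (no padding added)."""
    return (size - ksize) // stride + 1

# The layer passes ksizes = strides = [1, patchSize_y, patchSize_x, 1] = [1, 128, 192, 1],
# so rows are cut in steps of 128 and columns in steps of 192.
ksize_rows, ksize_cols = 128, 192

# Shape inferred from the declared input: rows=6000, cols=4000.
static_rows = valid_patch_count(6000, ksize_rows, ksize_rows)
static_cols = valid_patch_count(4000, ksize_cols, ksize_cols)

# Shape at run time, assuming the image array has rows=4000, cols=6000.
runtime_rows = valid_patch_count(4000, ksize_rows, ksize_rows)
runtime_cols = valid_patch_count(6000, ksize_cols, ksize_cols)

# Last axis: ksize_rows * ksize_cols * depth, with depth = 1.
patch_pixels = ksize_rows * ksize_cols * 1

print(static_rows, static_cols, patch_pixels)    # 46 20 24576
print(runtime_rows, runtime_cols, patch_pixels)  # 31 31 24576
```

Under these assumptions, the 46 and 20 would simply reflect the row and column sizes being swapped between the declared input and the actual image, not a difference in how the patches are computed.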

Related

output from tf.tensordot(x,y,2)

I am trying to understand the collaborative filtering recommender.
One particular part is about tf.tensordot(x,y,2) in the following code:
def call(self, inputs):
    user_vector = self.user_embedding(inputs[:, 0])
    user_bias = self.user_bias(inputs[:, 0])
    movie_vector = self.movie_embedding(inputs[:, 1])
    movie_bias = self.movie_bias(inputs[:, 1])
    dot_user_movie = tf.tensordot(user_vector, movie_vector, 2)
    # Add all the components (including bias)
    x = dot_user_movie + user_bias + movie_bias
    # The sigmoid activation forces the rating to between 0 and 1
    return tf.nn.sigmoid(x)
dot_user_movie.shape is () but x.shape is (None, 1). Why? I expected dot_user_movie.shape to be (None, 1) as well.
What I really want to do is remove the bias terms, but I get an error if I do the following:
x = dot_user_movie
return tf.nn.sigmoid(x)
I tried to mimic the above calculation with explicit values:
emb = layers.Embedding(20,4)
bias = layers.Embedding(20, 1)
x1 = emb(np.array([0,2,1]))
y1 = emb(np.array([3,5,1]))
dot_x1_y1 = tf.tensordot(x1, y1, 2)
#<tf.Tensor: shape=(), dtype=float32, numpy=0.0065868925>
bias(np.array([0,2,1]))
dot_x1_y1+bias
## this will cause an error: ValueError: Attempt to convert a value
## (<keras.layers.core.embedding.Embedding object at 0x7f7211a59640>)
## with an unsupported type (<class 'keras.layers.core.embedding.Embedding'>)
## to a Tensor.
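For reference, tf.tensordot(a, b, 2) contracts the last two axes of a with the first two axes of b. For two tensors of shape (batch, embedding) that means both axes are summed away, which would explain the scalar shape (); adding the (None, 1) biases then broadcasts the scalar back up to (None, 1). The plain-Python sketch below mimics this with nested lists; the helper names and the sample values are made up for illustration.

```python
def tensordot_axes2(a, b):
    """Mimics axes=2 for two equally shaped 2-D inputs: sum over BOTH axes."""
    return sum(a[i][j] * b[i][j] for i in range(len(a)) for j in range(len(a[0])))

def rowwise_dot(a, b):
    """Per-row dot product that keeps a trailing axis of size 1, i.e. shape (batch, 1)."""
    return [[sum(x * y for x, y in zip(row_a, row_b))] for row_a, row_b in zip(a, b)]

# Three "users" and three "movies" with 2-dimensional embeddings (illustrative values).
users = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
movies = [[0.5, 0.5], [1.0, 0.0], [0.0, 2.0]]

scalar = tensordot_axes2(users, movies)  # one number for the whole batch
per_row = rowwise_dot(users, movies)     # one number per (user, movie) pair

print(scalar)   # 16.5
print(per_row)  # [[1.5], [3.0], [12.0]]
```

If a per-pair score is what is wanted, one option in TensorFlow would be tf.reduce_sum(user_vector * movie_vector, axis=1, keepdims=True), which corresponds to rowwise_dot above.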

Can't call numpy() on Tensor that requires grad while defining convolute method

I am trying to write my own version of the convolution that the library provides out of the box, using the following snippet:
def conv(x, in_channels, out_channels, kernel_size, stride, padding, weight, bias):
    """
    Args:
        x: torch tensor with size (N, C_in, H_in, W_in),
        in_channels: number of channels in the input image, it is C_in;
        out_channels: number of channels produced by the convolution;
        kernel_size: size of convolving kernel,
        stride: stride of the convolution,
        padding: implicit zero padding to be added on both sides of each dimension,
    Return:
        y: torch tensor of size (N, C_out, H_out, W_out)
    """
    y = None
    N, C_in, H_in, W_in = x.shape
    n_h = int((H_in - kernel_size + 2*padding)/stride) + 1
    n_w = int((W_in - kernel_size + 2*padding)/stride) + 1
    y = np.zeros((N, C_in, n_h, n_w))
    x_pad = np.pad(array = x, pad_width = ((0,0),(padding,padding), (padding,padding), (0,0)), mode = 'constant', constant_values = 0)
    for i in range(N):
        for h in range(n_h):
            for w in range(n_w):
                for c in range(C_in):
                    w_start = w * stride
                    w_end = w_start + kernel_size
                    h_start = h * stride
                    h_end = h_start + kernel_size
                    conv = np.multiply(x[i, :, h_start:h_end, w_start:w_end], weight[:,c,:,:])
                    y[i,c,h,w] = np.sum(conv) + bias[:,c,:,:]
    return y
I am making the following method call:
conv(x,in_channels=3,
out_channels=6,
kernel_size=3,
stride=1,
padding=0,
weight=torch_conv.weight,
bias=torch_conv.bias)
Shape of x: torch.Size([2, 3, 32, 32])
Shape of weight: torch.Size([6, 3, 3, 3])
Shape of bias: torch.Size([6])
However, I am getting the following error:
RuntimeError Traceback (most recent call last)
<ipython-input-113-d2d90ada6b93> in <module>
5 padding=0,
6 weight=torch_conv.weight,
----> 7 bias=torch_conv.bias)
1 frames
/usr/local/lib/python3.7/dist-packages/torch/_tensor.py in __array__(self, dtype)
755 return handle_torch_function(Tensor.__array__, (self,), self, dtype=dtype)
756 if dtype is None:
--> 757 return self.numpy()
758 else:
759 return self.numpy().astype(dtype, copy=False)
RuntimeError: Can't call numpy() on Tensor that requires grad. Use tensor.detach().numpy() instead.
Can someone please help with this?
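The error message itself points at one fix: convert the parameters with tensor.detach().numpy() before handing them to NumPy functions. Separately, the output shape promised by the docstring can be checked with a small plain-Python sketch. It uses the shapes quoted above and assumes the usual convention that weight is laid out as (C_out, C_in, k_h, k_w); the helper names are made up for illustration.

```python
def conv_output_size(size_in, kernel_size, stride, padding):
    """Output length along one spatial axis of a convolution."""
    return (size_in + 2 * padding - kernel_size) // stride + 1

def conv_output_shape(x_shape, weight_shape, stride, padding):
    """Expected output shape (N, C_out, H_out, W_out) for the given input and weight shapes."""
    n, c_in, h_in, w_in = x_shape
    c_out, c_in_weight, k_h, k_w = weight_shape
    if c_in != c_in_weight:
        raise ValueError("input channels of x and weight do not match")
    h_out = conv_output_size(h_in, k_h, stride, padding)
    w_out = conv_output_size(w_in, k_w, stride, padding)
    return (n, c_out, h_out, w_out)

# Shapes quoted in the question: x is (2, 3, 32, 32), weight is (6, 3, 3, 3).
shape = conv_output_shape((2, 3, 32, 32), (6, 3, 3, 3), stride=1, padding=0)
print(shape)  # (2, 6, 30, 30)
```

Note that the second axis here is C_out (6), whereas the snippet above allocates y with C_in channels, so that may be worth a second look once the numpy() error is resolved.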

Tensorflow YOLO Object Detection Loss Exploding

I am trying to implement and train YOLO on my own, based on this implementation https://github.com/allanzelener/YAD2K/. The problem I am having is that the width/height values in my prediction tensor are exploding, and I never see an IOU above 0 between my predicted objects and the ground truth objects. This all goes wrong within the first few minibatches of the first epoch. The loss and most of my predicted widths/heights are NaN.
My image size is 416x416, I'm using 5 anchors, and have 5 classes. I'm dividing the image into a 13x13 grid for a prediction tensor of [batch_size, 13, 13, 5, 10]. The ground truths for each batch are [batch_size, 13, 13, 5, 5], without one hot for the class probabilities.
Below is my loss function (based on https://github.com/allanzelener/YAD2K/blob/master/yad2k/models/keras_yolo.py#L152), which passes the image to my model and then calls predict_transform which reshapes the tensor and transforms the coordinates.
def loss_custom(true_box_grid, x):
    # training=training is needed only if there are layers with different
    # behavior during training versus inference (e.g. Dropout).
    y_ = model(x, training=training)
    # (batch, rows, cols, anchors, vals)
    center_coords, wh_coords, obj_scores, class_probs = DetectNet.predict_transform(y_)
    detector_mask = create_mask(true_box_grid)
    total_loss = 0
    pred_wh_half = wh_coords / 2.
    # bottom left corner
    pred_mins = center_coords - pred_wh_half
    # top right corner
    pred_maxes = center_coords + pred_wh_half
    true_xy = true_box_grid[..., 0:2]
    true_wh = true_box_grid[..., 2:4]
    true_wh_half = true_wh / 2.
    true_mins = true_xy - true_wh_half
    true_maxes = true_xy + true_wh_half
    # max bottom left corner
    intersect_mins = tf.math.maximum(pred_mins, true_mins)
    # min top right corner
    intersect_maxes = tf.math.minimum(pred_maxes, true_maxes)
    intersect_wh = tf.math.maximum(intersect_maxes - intersect_mins, 0.)
    # product of difference between x max and x min, y max and y min
    intersect_areas = intersect_wh[..., 0] * intersect_wh[..., 1]
    pred_areas = wh_coords[..., 0] * wh_coords[..., 1]
    true_areas = true_wh[..., 0] * true_wh[..., 1]
    union_areas = pred_areas + true_areas - intersect_areas
    iou_scores = intersect_areas / union_areas
    # Best IOUs for each location.
    iou_scores = tf.expand_dims(iou_scores, 4)
    best_ious = tf.keras.backend.max(iou_scores, axis=4) # Best IOU scores.
    best_ious = tf.expand_dims(best_ious, 4)
    # A detector has found an object if IOU > thresh for some true box.
    object_detections = tf.keras.backend.cast(best_ious > 0.6, dtype=tf.float32)
    no_obj_weights = params.noobj_loss_weight * (1 - object_detections) * (1 - detector_mask[...,:1])
    no_obj_loss = no_obj_weights * tf.math.square(obj_scores)
    # could use weight here on obj loss
    obj_conf_loss = params.obj_loss_weight * detector_mask[...,:1] * tf.math.square(1 - obj_scores)
    conf_loss = no_obj_loss + obj_conf_loss
    matching_classes = tf.cast(true_box_grid[...,4], tf.int32)
    matching_classes = tf.one_hot(matching_classes, params.num_classes)
    class_loss = detector_mask[..., :1] * tf.math.square(matching_classes - class_probs)
    # keras_yolo does a sigmoid on center_coords here but they should already be between 0 and 1 from predict_transform
    pred_boxes = tf.concat([center_coords, wh_coords], axis=-1)
    matching_boxes = true_box_grid[..., :4]
    coord_loss = params.coord_loss_weight * detector_mask[..., :1] * tf.math.square(matching_boxes - pred_boxes)
    confidence_loss_sum = tf.keras.backend.sum(conf_loss)
    classification_loss_sum = tf.keras.backend.sum(class_loss)
    coordinates_loss_sum = tf.keras.backend.sum(coord_loss)
    # not sure why .5 is here, maybe to make sure numbers don't get too large
    total_loss = 0.5 * (confidence_loss_sum + classification_loss_sum + coordinates_loss_sum)
    return total_loss
Below is predict_transform (based on https://github.com/allanzelener/YAD2K/blob/master/yad2k/models/keras_yolo.py#L66) which reshapes the prediction tensor into a grid in order to compare with the ground truth objects. For the center coordinates, object scores, and class probabilities it does a sigmoid or softmax.
For the width height coordinates it performs the exponential operation on them (to make them positive) and multiplies them by the anchors. This seems to be where they start exploding.
def predict_transform(predictions):
    predictions = tf.reshape(predictions, [-1, params.grid_height, params.grid_width, params.num_anchors, params.pred_vec_len])
    conv_dims = predictions.shape[1:3]
    conv_height_index = tf.keras.backend.arange(0, stop=conv_dims[0])
    conv_width_index = tf.keras.backend.arange(0, stop=conv_dims[1])
    conv_height_index = tf.tile(conv_height_index, [conv_dims[1]]) # (169,) tensor with 0-12 repeating
    conv_width_index = tf.tile(tf.expand_dims(conv_width_index, 0), [conv_dims[0], 1]) # (13, 13) tensor with x offset in each row
    conv_width_index = tf.keras.backend.flatten(tf.transpose(conv_width_index)) # (169,) tensor with 13 0's followed by 13 1's, etc (y offsets)
    conv_index = tf.transpose(tf.stack([conv_height_index, conv_width_index])) # (169, 2)
    conv_index = tf.reshape(conv_index, [1, conv_dims[0], conv_dims[1], 1, 2]) # y offset, x offset
    conv_dims = tf.cast(tf.reshape(conv_dims, [1, 1, 1, 1, 2]), tf.float32) # grid_height x grid_width, max dims of anchors
    # makes the center coordinate between 0 and 1, each grid cell is normalized to 1 x 1
    center_coords = tf.math.sigmoid(predictions[...,:2])
    conv_index = tf.cast(conv_index, tf.float32)
    center_coords = (center_coords + conv_index) / conv_dims
    # makes the objectness score a probability between 0 and 1
    obj_scores = tf.math.sigmoid(predictions[...,4:5])
    anchors = DetectNet.get_anchors()
    anchors = tf.reshape(anchors, [1, 1, 1, params.num_anchors, 2])
    # exp to make width and height positive then multiply by anchor dims to resize box to anchor
    # should fit close to anchor, normalizing by conv_dims should make it between 0 and approx 1
    wh_coords = (tf.math.exp(predictions[...,2:4])*anchors) / conv_dims
    # apply softmax to class scores to make them probabilities
    class_probs = tf.keras.activations.softmax(predictions[..., 5 : 5 + params.num_classes])
    # (batch, rows, cols, anchors, vals)
    return center_coords, wh_coords, obj_scores, class_probs
I also have a question about creating the ground truth data, based on https://github.com/allanzelener/YAD2K/blob/master/yad2k/models/keras_yolo.py#L352. In the code below, box[0] and box[1] are the center coordinates, i and j are the grid cell coordinates (between 0 and 13), and box[2] and box[3] are the width and height.
They have all been normalized to be within the grid coordinates (0 to 13). The code places the object in the ground truth grid with its corresponding best anchor. box[0] - j and box[1] - i ensure the center coordinates are between 0 and 1.
However I don't understand np.log(box[2] / anchors[best_anchor][0]), as the anchors are also on the grid coordinate scale, and the quotient may be less than 1, which will produce a negative number after the log. I often see negative widths and heights in my ground truth data as I am training, and don't know what to make of that.
if best_iou > 0:
    adjusted_box = np.array(
        [
            box[0] - j, # center should be between 0 and 1, like prediction will be
            box[1] - i,
            np.log(box[2] / anchors[best_anchor][0]), # quotient might be less than one, not sure why log is used
            np.log(box[3] / anchors[best_anchor][1]),
            box[4] # class label
        ],
        dtype=np.float32
    )
    true_box_grid[i, j, best_anchor] = adjusted_box
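A possible way to read the log encoding: it is the inverse of the exp(...) * anchors step in predict_transform, so a negative value would just mean that the box is smaller than its anchor. The plain-Python sketch below illustrates the round trip; the helper names and the box and anchor sizes are made up for illustration.

```python
import math

def encode_wh(box_wh, anchor_wh):
    """Ground-truth style encoding: log of the box size relative to the anchor."""
    return tuple(math.log(b / a) for b, a in zip(box_wh, anchor_wh))

def decode_wh(encoded_wh, anchor_wh):
    """Prediction style decoding: exp of the raw value, scaled by the anchor."""
    return tuple(math.exp(t) * a for t, a in zip(encoded_wh, anchor_wh))

anchor = (4.0, 6.0)     # hypothetical anchor, in grid units
small_box = (2.0, 3.0)  # hypothetical box, half the anchor size

encoded = encode_wh(small_box, anchor)  # negative, because the box is smaller than the anchor
decoded = decode_wh(encoded, anchor)    # recovers the original size, which is always positive

print(encoded)
print(decoded)
```

If that reading is right, one thing that may be worth checking in loss_custom is that matching_boxes holds these log-encoded widths and heights, while pred_boxes holds the exp-decoded wh_coords, so the two sides of the squared difference may not be on the same scale.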
Also here is my model, which is very watered down because of my lack of computational resources.
def create_model():
    model = models.Sequential()
    model.add(Conv2D(6, 3, padding='same', data_format='channels_last', kernel_regularizer=l2(5e-4)))
    model.add(BatchNormalization())
    model.add(LeakyReLU(alpha=0.1))
    model.add(MaxPool2D())
    model.add(Conv2D(8, 3, padding='same', data_format='channels_last', kernel_regularizer=l2(5e-4)))
    model.add(BatchNormalization())
    model.add(LeakyReLU(alpha=0.1))
    model.add(MaxPool2D())
    model.add(Conv2D(12, 3, padding='same', data_format='channels_last', kernel_regularizer=l2(5e-4)))
    model.add(BatchNormalization())
    model.add(LeakyReLU(alpha=0.1))
    model.add(Conv2D(8, 1, padding='same', data_format='channels_last', kernel_regularizer=l2(5e-4)))
    model.add(BatchNormalization())
    model.add(LeakyReLU(alpha=0.1))
    model.add(Conv2D(12, 3, padding='same', data_format='channels_last', kernel_regularizer=l2(5e-4)))
    model.add(BatchNormalization())
    model.add(LeakyReLU(alpha=0.1))
    model.add(MaxPool2D())
    model.add(Flatten())
    model.add(Dense(params.grid_height * params.grid_width * params.pred_vec_len * params.num_anchors, activation='relu'))
    return model
I'm wondering what I can do to prevent the predicted widths and heights, and thus the loss from exploding. The exponential is there to ensure they are positive, which makes sense. I could also do a sigmoid on them, but I don't want to restrict them to be between 0 and 1. In the YOLO paper, they mention that they pretrain their network so that the layer weights are already initialized when the YOLO training begins. Is this a problem of initializing the network properly?

In Pytorch, how to test simple image with my loaded model?

I built an alphabet classification CNN model using PyTorch, and I now want to test it on a single image that the model has never seen before. I extracted bounding boxes from my handwriting image with OpenCV, but I don't know how to feed them to the model.
[Image: bounded my_image]
This is my custom dataset:
class CustomDatasetFromCSV(Dataset):
    def __init__(self, csv_path, height, width, transforms=None):
        """
        Args:
            csv_path (string): path to csv file
            height (int): image height
            width (int): image width
            transform: pytorch transforms for transforms and tensor conversion
        """
        self.data = pd.read_csv(csv_path)
        self.labels = np.asarray(self.data.iloc[:, 0])
        self.height = height
        self.width = width
        self.transforms = transforms
    def __getitem__(self, index):
        single_image_label = self.labels[index]
        # Read each 784 pixels and reshape the 1D array ([784]) to 2D array ([28,28])
        img_as_np = np.asarray(self.data.iloc[index][1:]).reshape(28,28).astype('uint8')
        # Convert image from numpy array to PIL image, mode 'L' is for grayscale
        img_as_img = Image.fromarray(img_as_np)
        img_as_img = img_as_img.convert('L')
        # Transform image to tensor
        if self.transforms is not None:
            img_as_tensor = self.transforms(img_as_img)
        # Return image and the label
        return (img_as_tensor, single_image_label)
    def __len__(self):
        return len(self.data.index)
transformations = transforms.Compose([
transforms.ToTensor()
])
alphabet_from_csv = CustomDatasetFromCSV("/content/drive/My Drive/A_Z Handwritten Data.csv",
28, 28, transformations)
random_seed = 50
data_size = len(alphabet_from_csv)
indices = list(range(data_size))
split = int(np.floor(0.2 * data_size))
if True:
    np.random.seed(random_seed)
    np.random.shuffle(indices)
train_indices, test_indices = indices[split:], indices[:split]
train_dataset = SubsetRandomSampler(train_indices)
test_dataset = SubsetRandomSampler(test_indices)
train_loader = torch.utils.data.DataLoader(dataset = alphabet_from_csv,
batch_size = batch_size,
sampler = train_dataset)
test_loader = torch.utils.data.DataLoader(dataset = alphabet_from_csv,
batch_size = batch_size,
sampler = test_dataset)
This is my model:
class ConvNet3(nn.Module):
    def __init__(self, num_classes=26):
        super().__init__()
        self.layer1 = nn.Sequential(
            nn.Conv2d(1, 28, kernel_size=3, stride=1, padding=1),
            nn.BatchNorm2d(28),
            nn.ReLU(),
            nn.MaxPool2d(kernel_size=2, stride=2)
        )
        self.layer2 = nn.Sequential(
            nn.Conv2d(28, 56, kernel_size=3, stride=1, padding=1),
            nn.BatchNorm2d(56),
            nn.ReLU(),
            nn.MaxPool2d(kernel_size=2, stride=2)
        )
        self.fc = nn.Sequential(
            nn.Dropout(p = 0.5),
            nn.Linear(56 * 7 * 7, 512),
            nn.BatchNorm1d(512),
            nn.ReLU(),
            nn.Dropout(p = 0.5),
            nn.Linear(512, 26),
        )
    def forward(self, x):
        out = self.layer1(x)
        out = self.layer2(out)
        out = out.reshape(out.size(0), -1)
        out = self.fc(out)
        return out
model = ConvNet3(num_classes).to(device)
loss_func = nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(model.parameters(), lr=learning_rate)
def train():
    # train phase
    model.train()
    # create a progress bar
    batch_loss_list = []
    progress = ProgressMonitor(length=len(train_dataset))
    for batch, target in train_loader:
        # Move the training data to the GPU
        batch, target = batch.to(device), target.to(device)
        # forward propagation
        output = model( batch )
        # calculate the loss
        loss = loss_func( output, target )
        # clear previous gradient computation
        optimizer.zero_grad()
        # backpropagate to compute gradients
        loss.backward()
        # update model weights
        optimizer.step()
        # update progress bar
        batch_loss_list.append(loss.item())
        progress.update(batch.shape[0], sum(batch_loss_list)/len(batch_loss_list) )
def test():
    # test phase
    model.eval()
    correct = 0
    # We don't need gradients for test, so wrap in
    # no_grad to save memory
    with torch.no_grad():
        for batch, target in test_loader:
            # Move the training batch to the GPU
            batch, target = batch.to(device), target.to(device)
            # forward propagation
            output = model( batch )
            # get prediction
            output = torch.argmax(output, 1)
            # accumulate correct number
            correct += (output == target).sum().item()
    # Calculate test accuracy
    acc = 100 * float(correct) / len(test_dataset)
    print( 'Test accuracy: {}/{} ({:.2f}%)'.format( correct, len(test_dataset), acc ) )
for epoch in range(num_epochs):
    print("{}'s try".format(int(epoch)+1))
    train()
    test()
    print("-----------------------------------------------------------------------------")
This is the code that extracts bounding boxes from my image:
import cv2
import matplotlib.image as mpimg
import matplotlib.pyplot as plt
im = cv2.imread('/content/drive/My Drive/my_handwritten.jpg')
gray = cv2.cvtColor(im, cv2.COLOR_BGR2GRAY)
blur = cv2.GaussianBlur(gray, (5, 5), 0)
thresh = cv2.adaptiveThreshold(blur, 255, 1, 1, 11, 2)
# [-2] picks the contour list in both OpenCV 3.x (image, contours, hierarchy)
# and OpenCV 4.x (contours, hierarchy); [1] only works on 3.x
contours = cv2.findContours(thresh, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)[-2]
rects=[]
for cnt in contours:
    x, y, w, h = cv2.boundingRect(cnt)
    if h < 20: continue
    red = (0, 0, 255)
    cv2.rectangle(im, (x, y), (x+w, y+h), red, 2)
    rects.append((x,y,w,h))
cv2.imwrite('my_handwritten_bounding.png', im)
img_result = []
img_for_class = im.copy()
margin_pixel = 60
for rect in rects:
    #[y:y+h, x:x+w]
    img_result.append(
        img_for_class[rect[1]-margin_pixel : rect[1]+rect[3]+margin_pixel,
                      rect[0]-margin_pixel : rect[0]+rect[2]+margin_pixel])
    # Draw the rectangles
    cv2.rectangle(im, (rect[0], rect[1]),
                  (rect[0] + rect[2], rect[1] + rect[3]), (0, 0, 255), 2)
count = 0
nrows = 4
ncols = 7
plt.figure(figsize=(12,8))
for n in img_result:
    count += 1
    plt.subplot(nrows, ncols, count)
    plt.imshow(cv2.resize(n,(28,28)), cmap='Greys', interpolation='nearest')
plt.tight_layout()
plt.show()
You have already written the function test to evaluate your network. The only thing left to do is to create a batch containing one image, with the same preprocessing as the images in your dataset.
def test_one_image(I, model):
    '''
    I - 28x28 uint8 numpy array
    '''
    # test phase
    model.eval()
    # convert image to a float32 tensor in [0, 1] (as ToTensor does),
    # then add channel and batch dims: (28, 28) -> (1, 1, 28, 28)
    batch = torch.tensor(I / 255, dtype=torch.float32).unsqueeze(0).unsqueeze(0)
    # We don't need gradients for test, so wrap in
    # no_grad to save memory
    with torch.no_grad():
        batch = batch.to(device)
        # forward propagation
        output = model( batch )
        # get prediction
        output = torch.argmax(output, 1)
    return output
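The function returns a class index, not a letter. A minimal sketch of the last step is shown below. It assumes that the labels in the CSV run from 0 to 25 and correspond to 'A' to 'Z' in order, which the question does not state; index_to_letter is a hypothetical helper name.

```python
import string

def index_to_letter(class_index):
    """Map a class index in 0..25 to an uppercase letter (assumes 0 -> 'A', ..., 25 -> 'Z')."""
    if not 0 <= class_index < 26:
        raise ValueError("class index must be in the range 0..25")
    return string.ascii_uppercase[class_index]

print(index_to_letter(0))   # A
print(index_to_letter(25))  # Z
```

With the function above, usage would look roughly like index_to_letter(test_one_image(crop, model).item()), where crop is one of the extracted regions converted to grayscale and resized to 28x28 so that it matches the training data.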

input_shape not recognised in Keras model

I am trying to use TensorFlow 2.0's new MirroredStrategy, but I am getting the following error:
ValueError: We currently do not support distribution strategy with a `Sequential` model that is created without `input_shape`/`input_dim` set in its first layer or a subclassed model.
Model:
class Model(kr.Model):
    def __init__(self, input_shape, conv_sizes, num_outputs):
        super().__init__('model_1')
        self.num_outputs = num_outputs
        rows, cols, depth = input_shape
        self.one_hot = kl.Lambda(lambda x: tf.one_hot(tf.cast(x, 'int32'), num_outputs), input_shape=(rows, cols))
        self.concat = kl.Concatenate(axis=-1)
        vision_layers = []
        for i, (filters, kernel, stride) in enumerate(conv_sizes):
            if not i:
                depth += num_outputs - 1
                vision_layers += [kl.Conv2D(filters, kernel, stride, activation='relu',
                                            input_shape=(rows, cols, depth))]
            else:
                vision_layers += [kl.Conv2D(filters, kernel, stride, activation='relu')]
            vision_layers += [kl.MaxPool2D(pool_size=(2, 2))]
        flatten = kl.Flatten()
        dense = kl.Dense(num_outputs)
        self.net = kr.Sequential(vision_layers+[flatten]+[dense])
        self.build(input_shape=(None, ) + input_shape)
    def call(self, inputs):
        one_hot = self.one_hot(inputs[:, :, :, -1])
        return self.net(self.concat([inputs[:, :, :, :-1], one_hot]))
Reproduction code:
model_args = {'conv_sizes': [(32, (2, 2), 1), (32, (2, 2), 1), (32, (2, 2), 1)],
              'input_shape': (50, 50, 6),
              'num_outputs': 5}
def dummy_loss(values, targets):
    return tf.reduce_sum(values-targets, axis=-1)
mirrored_strategy = tf.distribute.MirroredStrategy()
with mirrored_strategy.scope():
    model = Model(**model_args)
    model.compile(optimizer=kr.optimizers.Adam(learning_rate=0.01), loss=dummy_loss)
Output:
Traceback (most recent call last):
File "/home/joao/anaconda3/envs/tf2/lib/python3.6/site-packages/IPython/core/interactiveshell.py", line 3296, in run_code
exec(code_obj, self.user_global_ns, self.user_ns)
File "<ipython-input-4-dc492e7c638b>", line 18, in <module>
model.compile(optimizer=kr.optimizers.Adam(learning_rate=0.01), loss=dummy_loss)
File "/home/joao/anaconda3/envs/tf2/lib/python3.6/site-packages/tensorflow/python/training/tracking/base.py", line 456, in _method_wrapper
result = method(self, *args, **kwargs)
File "/home/joao/anaconda3/envs/tf2/lib/python3.6/site-packages/tensorflow/python/keras/engine/training.py", line 263, in compile
'We currently do not support distribution strategy with a '
ValueError: We currently do not support distribution strategy with a `Sequential` model that is created without `input_shape`/`input_dim` set in its first layer or a subclassed model.
Model Summary (model.summary()):
Model: "model_1"
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
lambda (Lambda) multiple 0
_________________________________________________________________
concatenate (Concatenate) multiple 0
_________________________________________________________________
sequential (Sequential) (None, 5) 13573
=================================================================
Total params: 13,573
Trainable params: 13,573
Non-trainable params: 0
I would do away with the Sequential approach and use the Model class directly:
def create_model(input_shape, conv_sizes, fc_sizes, num_outputs):
    num_outputs = num_outputs
    rows, cols, depth = input_shape
    input_layer = kl.Input(shape=(rows, cols, depth))
    actions = tf.slice(input_layer, [0, 0, 0, depth - 1], [-1, rows, cols, 1])
    non_actions = tf.slice(input_layer, [0, 0, 0, 0], [-1, rows, cols, depth - 1])
    one_hot = kl.Lambda(lambda x: tf.one_hot(tf.cast(x, 'int32'), num_outputs),
                        input_shape=(rows, cols))(actions)
    concat = kl.Concatenate(axis=-1)([non_actions, tf.reshape(one_hot, (-1, rows, cols, num_outputs))])
    vision_layer = concat
    for i, (filters, kernel, stride) in enumerate(conv_sizes):
        vision_layer = kl.Conv2D(filters, kernel, stride, activation='relu')(vision_layer)
        vision_layer = kl.MaxPool2D(pool_size=(2, 2))(vision_layer)
    flatten = kl.Flatten()(vision_layer)
    dense = kl.Dense(num_outputs)(flatten)
    return kr.Model(inputs=input_layer, outputs=[dense])