I want to solve a 2D-differential equation using neural network and working with the JAX library. The neural network function I am using basically approximates the function u = f(x,y) and goes something like this:
def f(params, inputs_x, inputs_y):
inputs = jnp.concatenate((inputs_x, inputs_y), axis=1)
for w, b in params:
outputs = jnp.dot(inputs, w)
inputs = jnn.swish(outputs)
return outputs
params is a PyTree that contains the weights and biases matrices. For the 2D problem, let's take layer sizes as something like [2,5,1]. There are 10 batches of (x_inputs, y_inputs) passed onto the function, hence inputs_x, inputs_y both are of shapes (10,1). Therefore, the output I want should also have the shape (10,1). But, the real problem comes when I'm trying to find out du/dx, du/dy, d2u/dx2 or d2u/dy2. I am writing something like this:
u = lambda x,y: f(params, x, y)
u = lambda x,y: f(params, x)
u_x = lambda x,y: vmap(jacfwd(u,argnums=0), in_axes=(0,0))(x,y)
u_xx = lambda x,y: vmap(jacfwd(u_x,argnums=0), in_axes=(0,0))(x,y)
I am getting errors.
If I was solving a 1D differential equation, then everything was going fine. In that case, the neural network function is something like this:
def f(params, inputs):
for w, b in params:
outputs = jnp.dot(inputs, w)
inputs = jnn.swish(outputs)
return outputs
u = lambda x,: f(params, x)
u_x = lambda x: vmap(jacfwd(u,argnums=0))(x)
Layer Sizes are [1,5,1] and I pass 10 batches of inputs into the neural network function and compute the gradients using vmap. Everything works fine!
As soon as I have a 2D problem and two input neurons, the layer sizes become [2,5,1] and then I pass 10 batches of inputs for both x and y together, vmap doesn't work anymore. I wanted to find du/dx, du/dy, d2u/dx2 or d2u/dy2 using the neural network and four functions below, and I expect all the four functions to return me results of shape (10,1), but I am getting error.
It looks like your function is not compatible with vmap, because it expects explicit batch dimensions. You can fix this by concatenating along axis=-1 rather than axis=1. Then your function calls could look something like the following:
from functools import partial
import jax
import jax.numpy as jnp
from jax import nn as jnn
def f(params, inputs_x, inputs_y):
inputs = jnp.concatenate((inputs_x, inputs_y), axis=-1)
for w, b in params:
outputs = jnp.dot(inputs, w)
inputs = jnn.swish(outputs)
return outputs
# Some example inputs and parameters
inputs_x = jnp.ones((10, 1))
inputs_y = jnp.ones((10, 1))
params = [
(jnp.ones((2, 5)), 1),
(jnp.ones((5, 1)), 1)
]
u = partial(f, params)
# u: (10,1)->(10,1)
print(u(inputs_x, inputs_y).shape)
# (10, 1)
# u: (1)->(1) batched to (10,1)->(10,1)
print(jax.vmap(u)(inputs_x, inputs_y).shape)
# (10, 1)
# ∇u: (1) -> (1,1) batched to (10,1)->(10,1,1)
print(jax.vmap(jax.jacobian(u))(inputs_x, inputs_y).shape)
# (10, 1, 1)
# ∇²u: (1) -> (1,1,1) batched to (10,1)->(10,1,1,1)
print(jax.vmap(jax.hessian(u))(inputs_x, inputs_y).shape)
# (10, 1, 1, 1)
Related
I recently started learning pytorch and I am trying to convert a part of a large script including coding a MLP with variable number of hidden layers from Tensorflow to pytorch.
import tensorflow as tf
### Base neural network
def init_mlp(layer_sizes, std=.01, bias_init=0.):
params = {'w':[], 'b':[]}
for n_in, n_out in zip(layer_sizes[:-1], layer_sizes[1:]):
params['w'].append(tf.Variable(tf.random_normal([n_in, n_out], stddev=std)))
params['b'].append(tf.Variable(tf.mul(bias_init, tf.ones([n_out,]))))
return params
def mlp(X, params):
h = [X]
for w,b in zip(params['w'][:-1], params['b'][:-1]):
h.append( tf.nn.relu( tf.matmul(h[-1], w) + b ) )
#h.append( tf.nn.tanh( tf.matmul(h[-1], w) + b ) )
return tf.matmul(h[-1], params['w'][-1]) + params['b'][-1]
def compute_nll(x, x_recon_linear):
return tf.reduce_sum(tf.nn.sigmoid_cross_entropy_with_logits(x_recon_linear, x), reduction_indices=1, keep_dims=True)
def gauss_cross_entropy(mean_post, std_post, mean_prior, std_prior):
d = (mean_post - mean_prior)
d = tf.mul(d,d)
return tf.reduce_sum(-tf.div(d + tf.mul(std_post,std_post),(2.*std_prior*std_prior)) - tf.log(std_prior*2.506628), reduction_indices=1, keep_dims=True)
how could I write down similarly weights and bias variables and attach them in each hidden layer in pytorch?
how could I convert gauss_cross_entropy and compute_nll
functions as well (finding equivalent syntax)?
Are these two codes compatible?
import numpy as np
import torch
import torch.nn as nn
import torch.nn.functional as func
from torch.distributions import Normal, Categorical, Independent
from copy import
device = "cpu"
if torch.cuda.is_available():
device = "cuda:0"
if torch.cuda.device_count() > 1:
net = nn.DataParallel(net)
net.to(device)
def init_mlp(layer_sizes, std=.01, bias_init=0.):
params = {'w':[], 'b':[]}
for n_in, n_out in zip(layer_sizes[:-1], layer_sizes[1:]):
params['w'].append(torch.tensor(Normal([n_in, n_out], torch.tensor([std])) ,requires_grad=True))
params['b'].append(torch.tensor(torch.mul(bias_init, torch.ones([n_out,])),requires_grad=True))
return params
def mlp(X, params):
h = [X]
for w,b in zip(params['w'][:-1], params['b'][:-1]):
h.append( torch.nn.ReLU( tf.matmul(h[-1], w) + b ) )
return torch.matmul(h[-1], params['w'][-1]) + params['b'][-1]
def compute_nll(x, x_recon_linear):
return torch.sum(func.binary_cross_entropy_with_logits(x_recon_linear, x), reduction_indices=1, keep_dims=True)
def gauss_cross_entropy(mu_post, sigma_post, mu_prior, sigma_prior):
d = (mu_post - mu_prior)
d = torch.mul(d,d)
return torch.sum(-torch.div(d + torch.mul(sigma_post,sigma_post),(2.*sigma_prior*sigma_prior)) - torch.log(sigma_prior*2.506628), reduction_indices=1, keep_dims=True)
What is the substitute function for tf.placeholder in pytorch? For instance here:
class VAE(object):
def __init__(self, hyperParams):
self.X = tf.placeholder("float", [None, hyperParams['input_d']])
self.prior = hyperParams['prior']
self.K = hyperParams['K']
self.encoder_params = self.init_encoder(hyperParams)
self.decoder_params = self.init_decoder(hyperParams)
and also how should I change tf.shape in this line: tf.random_normal(tf.shape(self.sigma[-1]))
How could I write down similar weights and bias variables and attach them in each hidden layer in PyTorch?
An easier way to define those is to create a list containing the params as (weight, bias) tuples:
def init_mlp(layer_sizes, std=.01, bias_init=0.):
params = []
for n_in, n_out in zip(layer_sizes[:-1], layer_sizes[1:]):
params.append([
nn.init.normal_(torch.empty(n_in, n_out)).requires_grad_(True),
torch.empty(n_out).fill_(bias_init).requires_grad_(True)])
return params
Above I define my parameters as 'empty' (created with uninitialized data) tensors with torch.empty. I have used in-place functions such as nn.init.normal_ (there are many others available) and torch.Tensor.fill_ to fill the tensor with an arbitrary value (maybe it is .mul_(bias_init) you are looking for, based on your TensorFlow sample?).
For the inference code, you don't actually need to store the intermediate layer results:
def mlp(x, params):
for i, (W, b) in enumerate(params):
x = x#W + b
if i < len(params) - 1:
x = torch.relu(x)
return x
How could I convert gauss_cross_entropy and compute_nll functions as well (finding equivalent syntax)?
You can use PyTorch functions and mathematical operators to define your logic. For compute_loss you were using the built-in, which actually does not require summation after it, by default the losses of the batch elements are averaged.
def compute_loss(y_pred, y_true):
return F.binary_cross_entropy_with_logits(y_pred, y_true)
What is the substitute function for tf.placeholder in Pytorch?
You don't have placeholders in PyTorch, you compute your outputs explicitly using PyTorch operators, then you should be able to backpropagate through those operators and get the gradients for each parameter.
How should I change tf.shape in this line: tf.random_normal(tf.shape(self.sigma[-1]))
Function tf.shape returns the shape of the tensor, in PyTorch you call torch.Tensor.shape or by calling torch.Tensor.size: i.e. self.sigma[-1].shape or self.sigma[-1].size().
I'm trying to write a custom layer that will handle variable-length vectors, and reduce them to the same length vector.
The length is known in advance because the reason for the variable lengths is that I have several different data types that I encode using a different number of features.
In a sense, it is similar to Embedding only for numerical values.
I've tried using padding, but the results were bad, so I'm trying this approach instead.
So, for example let's say I have 3 data types, which I encode with 3, 4, 6 length vectors.
arr = [
# example one (data type 1 [len()==3], datat type 3[len()==6]) - force values as floats
[[1.0,2.0,3],[1,2,3,4,5,6]],
# example two (data type 2 [len()==4], datat type 3len()==6]) - force values as floats
[[1.0,2,3,4],[1,2,3,4,5,6]],
]
I tried implementing a custom layer like:
class DimensionReducer(tf.keras.layers.Layer):
def __init__(self, output_dim, expected_lengths):
super(DimensionReducer, self).__init__()
self._supports_ragged_inputs = True
self.output_dim = output_dim
for l in expected_lengths:
setattr(self,f'w_{l}', self.add_weight(shape=(l, self.output_dim),initializer='random_normal',trainable=True))
setattr(self, f'b_{l}',self.add_weight(shape=(self.output_dim,), initializer='random_normal',trainable=True))
def call(self, inputs):
print(inputs.shape)
# batch
if len(inputs.shape) == 3:
print("batch")
result = []
for i,x in enumerate(inputs):
_result = []
for v in x:
l = len(v)
print(l)
print(v)
w = getattr(self, f'w_{l}')
b = getattr(self, f'b_{l}')
out = tf.matmul([v],w) + b
_result.append(out)
result.append(tf.concat(_result, 0))
r = tf.stack(result)
print("batch output:",r.shape)
return r
Which seems to be working when called directly:
dim = DimensionReducer(3, [3,4,6])
dim(tf.ragged.constant(arr))
But when I try to incorporate it into a model, it fails:
import tensorflow as tf
val_ragged = tf.ragged.constant(arr)
inputs_ragged = tf.keras.layers.Input(shape=(None,None), ragged=True)
outputs_ragged = DimensionReducer(3, [3,4,6])(inputs_ragged)
model_ragged = tf.keras.Model(inputs=inputs_ragged, outputs=outputs_ragged)
# this one with RaggedTensor doesn't
print(model_ragged(val_ragged))
With
AttributeError: 'DimensionReducer' object has no attribute 'w_Tensor("dimension_reducer_98/strided_slice:0", shape=(), dtype=int32)'
I'm not sure how am I to implement such a layer, or what I'm doing wrong.
I'm working on a seq2sql project and I successfully build a model but when training I get an error. I'm not using any Keras embedding layer.
M=13 #Question Length
d=40 #Dimention of the LSTM
C=12 #number of table Columns
batch_size=9
inputs1=Input(shape=(M,100),name='question_token')
Hq=Bidirectional(LSTM(d,return_sequences=True),name='QuestionENC')(inputs1) #this is HQ shape is (num_samples,13,80)
inputs2=Input(shape=(C,3,100),name='col_token')
col_lstm_layer=Bidirectional(LSTM(d,return_sequences=False),name='ColENC')
def hidd(te):
t=tf.Variable(initial_value=1,dtype=tf.int32)
for i in range(batch_size):
t=tf.assign(t,i)
Z = tf.nn.embedding_lookup(te, t)
print(col_lstm_layer(Z))
h=tf.reshape(col_lstm_layer(Z),[1,C,d*2])
if i==0:
# cols_last_hidden=tf.Variable(initial_value=h)
cols_last_hidden=tf.stack(h)#this is because it gives an error if we use tf.Variable here
else:
cols_last_hidden=tf.concat([cols_last_hidden,h],0)#shape of this one is (num_samples,num_col,80) 80 is last encoding of each column
return cols_last_hidden
cols_last_hidden=Lambda(hidd)(inputs2)
Hq=Dense(d*2,name='QuestionLastEncode')(Hq)
I=tf.Variable(initial_value=1,dtype=tf.int32)
J=tf.Variable(initial_value=1,dtype=tf.int32)
K=1
def get_col_att(tensors):
global K,all_col_attention
if K:
t=tf.Variable(initial_value=1,dtype=tf.int32)
for i in range(batch_size):
t=tf.assign(t,i)
x = tf.nn.embedding_lookup(tensors[0], t)
# print("tensors[1]:",tensors[1])
y = tf.nn.embedding_lookup(tensors[1], t)
# print("x shape",x.shape,"y shape",y.shape)
y=tf.transpose(y)
# print("x shape",x.shape,"y",y.shape)
Ecol=tf.reshape(tf.transpose(tf.tensordot(x,y,axes=1)),[1,C,M])
if i==0:
# all_col_attention=tf.Variable(initial_value=Ecol,name=""+i)
all_col_attention=tf.stack(Ecol)
else:
all_col_attention=tf.concat([all_col_attention,Ecol],0)
K=0
print("all_col_attention",all_col_attention)
return all_col_attention
total_alpha_sel_lambda=Lambda(get_col_att,name="Alpha")([Hq,cols_last_hidden])
total_alpha_sel=Dense(13,activation="softmax")(total_alpha_sel_lambda)
# print("Hq",Hq," total_alpha_sel_lambda shape",total_alpha_sel_lambda," total_alpha_sel shape",total_alpha_sel.shape)
def get_EQcol(tensors):
global K
if K:
t=tf.Variable(initial_value=1,dtype=tf.int32)
global all_Eqcol
for i in range(batch_size):
t=tf.assign(t,i)
x = tf.nn.embedding_lookup(tensors[0], t)
y = tf.nn.embedding_lookup(tensors[1], t)
Eqcol=tf.reshape(tf.tensordot(x,y,axes=1),[1,C,d*2])
if i==0:
# all_Eqcol=tf.Variable(initial_value=Eqcol,name=""+i)
all_Eqcol=tf.stack(Eqcol)
else:
all_Eqcol=tf.concat([all_Eqcol,Eqcol],0)
K=0
print("all_Eqcol",all_Eqcol)
return all_Eqcol
K=1
EQcol=Lambda(get_EQcol,name='EQcol')([total_alpha_sel,Hq])#total_alpha_sel(12x13) Hq(13xd*2)
EQcol=Dropout(.2)(EQcol)
L1=Dense(d*2,name='L1')(cols_last_hidden)
L2=Dense(d*2,name='L2')(EQcol)
L1_plus_L2=Add()([L1,L2])
pre=Flatten()(L1_plus_L2)
Psel=Dense(12,activation="softmax")(pre)
model=Model(inputs=[inputs1,inputs2],outputs=Psel)
model.compile(loss='categorical_crossentropy', optimizer='adam',metrics=['accuracy'])
model.summary()
earlyStopping=EarlyStopping(monitor='val_loss', patience=7, verbose=0, mode='auto')
history=model.fit([Equestion,Col_Embeddings],y_train,epochs=50,validation_split=.1,shuffle=False,callbacks=[earlyStopping],batch_size=batch_size)
The shapes of the Equestion, Col_Embeddings, and y_train are (10, 12, 3, 100) ,(10, 13, 100) and (10, 12).
I searched about this error but in all cases they have used an embedding layer incorrectly. Here I get this error even though I'm not using one.
indices = 2 is not in [0, 1)
[[{{node lambda_3/embedding_lookup_2}} = GatherV2[Taxis=DT_INT32, Tindices=DT_INT32, Tparams=DT_FLOAT, _class=["loc:#col_token_2"], _device="/job:localhost/replica:0/task:0/device:CPU:0"](_arg_col_token_2_0_1, lambda_3/Assign_2, lambda_3/embedding_lookup_2/axis)]]
The problem here was the batch size is defined at the graph level.here i have used batch_size =9 for the graph and yes i get batch size of 9 for training by the validation split .1 for the full batch size of 10 but for the validation i left only one sample because 10*.1 is one.
So the batch size of 1 cannot be passed to the graph because it needs batch size of 9.that's why this error comes
As for the solution i put the batch_size=1 and then it works fine also got a good accuracy by using batch_size=1.
Hope this will help someone.
Cheers ..
For me this error was due to bad form of my input data. You have to double check your input data to the model and it depends on your model input.
I want to design a follow function for expanding any 1D/2D/3D matrix to a 4D matrix.
import tensorflow as tf
def inputs_2_4D(inputs):
_ranks = tf.rank(inputs)
return tf.case({tf.equal(_ranks, 3): lambda: tf.expand_dims(inputs, 3),
tf.equal(_ranks, 2): lambda: tf.expand_dims(tf.expand_dims(inputs, 0), 3),
tf.equal(_ranks, 1): lambda: tf.expand_dims(tf.expand_dims(tf.expand_dims(inputs, 0), 0), 3)},
default=lambda: tf.identity(inputs))
def run():
with tf.Session() as sess:
mat_1d = tf.constant([1, 1])
mat_2d = tf.constant([[1, 1]])
mat_3d = tf.constant([[[1, 1]]])
mat_4d = tf.constant([[[[1, 1]]]])
result = inputs_2_4D(mat_1d)
print(result.eval())
The function, however, cannot run well. It can only perform to output a 4-D matrix when the mat_3d and mat-4d tensors are passed into it. There will be some errors information if a 1D or 2D matrix are passed to the function.
When passing mat_3dormat_4dinto inputs_2_4D(), they can be expanded to a 4D matrix or original matrix:
mat_3d -----> [[[[1]
[1]]]]
mat_4d -----> [[[[1 1]]]]
When mat_1dormat_2dmatrixes are passed into inputs_2_4D, error information:
ValueError: dim 3 not in the interval [-2, 1]. for 'case/cond/ExpandDims' (op: 'ExpandDims') with input shapes: [2], [] and with computed input tensors: input[1] = <3>.
I tested another similar function before. That function can run correctly.
import tensorflow as tf
def test_2_4D(inputs):
_ranks = tf.rank(inputs)
return tf.case({tf.equal(_ranks, 3): lambda: tf.constant(3),
tf.equal(_ranks, 2): lambda: tf.constant(2),
tf.equal(_ranks, 1): lambda: tf.constant(1)},
default=lambda: tf.identity(inputs))
def run():
with tf.Session() as sess:
mat_1d = tf.constant([1, 1])
mat_2d = tf.constant([[1, 1]])
mat_3d = tf.constant([[[1, 1]]])
mat_4d = tf.constant([[[[1, 1]]]])
result = test_2_4D(mat_3d)
print(result.eval())
This function can correctly output the corresponding results when passing all of matrixes.
test_2_4D() RESULTS:
mat_1d -----> 1
mat_2d -----> 2
mat_3d -----> 3
mat_4d -----> [[[[1 1]]]]
I don't know why the correct branch in inputs_2_4D() cannot be found while the tf.equal() in each branch were executed. I feel that the 1st and 2nd branches in the function seem to still work if the input matrix is "mat_1d" or "mat_2d". So, the program will crash down. Please help me to analyze this problem!
I think I worked out what the problem is here. Turns out all condition/function pairs are evaluated. This can be revealed by giving the ops different names. The problem is that if your input is, say, rank 2, Tensorflow seems to still evaluate tf.equal(_ranks, 3): lambda: tf.expand_dims(inputs, 3). This leads to a crash because it cannot expand dim 3 for a rank-2 tensor (the maximum allowed value is 2).
This actually makes sense since with tf.case you're basically saying "I don't know which of these cases is going to be true at runtime, so check which one is appropriate and execute the corresponding function". However this means that Tensorflow needs to prepare execution paths for all possible cases, which in this case leads to invalid computations (trying to expand invalid dimensions).
At this point it would be nice to know a little more about your problem, i.e. why exactly you need that function. If you have different inputs and you simply want to bring them all to 4D, but each input always has the same dimensionality, consider simply using Python if-statements. Example:
inputs3d = tf.constant([[[1,1]]]) # this is always 3D
inputs2d = tf.constant([[1,1]]) # this is alwayas 2D
...
def inputs_2_4D(inputs):
_rank = len(inputs.shape.as_list())
if _rank == 3:
return tf.expand_dims(inputs, 3)
elif _rank == 2:
return tf.expand_dims(tf.expand_dims(inputs, 0), 3)
...
This will check the input rank while the graph is being built (not at runtime like tf.case) and really only prepare those expand_dims ops that are appropriate for the given input.
However if you have a single inputs tensor and this could have different ranks at different times of your program this would require a different solution. Please let us know which problem you're trying to solve!
I have implement the functionality I want through 2 ways. Now, I provide my code to share.
The 1st method based on tf.cond:
def inputs_2_4D(inputs):
_rank1d = tf.rank(inputs)
def _1d_2_2d(): return tf.expand_dims(inputs, 0)
def _greater_than_1d(): return tf.identity(inputs)
_tmp_2d = tf.cond(_rank1d < 2, _1d_2_2d, _greater_than_1d)
_rank2d = tf.rank(_tmp_2d)
def _2d_2_3d(): return tf.expand_dims(_tmp_2d, 0)
def _greater_than_2d(): return tf.identity(_tmp_2d)
_tmp_3d = tf.cond(_rank2d < 3, _2d_2_3d, _greater_than_2d)
_rank3d = tf.rank(_tmp_3d)
def _3d_2_4d(): return tf.expand_dims(_tmp_3d, 3)
def _greater_than_3d(): return tf.identity(_tmp_3d)
return (tf.cond(_rank3d < 4, _3d_2_4d, _greater_than_3d))
The 2nd method based on tf.case with tf.cond:
def inputs_2_4D_1(inputs):
_rank = tf.rank(inputs)
def _assign_original(): return tf.identity(inputs)
def _dummy(): return tf.expand_dims(inputs, 0)
_1d = tf.cond(tf.equal(_rank, 1), _assign_original, _dummy)
_2d = tf.cond(tf.equal(_rank, 2), _assign_original, _dummy)
_3d = tf.cond(tf.equal(_rank, 3), _assign_original, _dummy)
def _1d_2_4d(): return tf.expand_dims(tf.expand_dims(tf.expand_dims(_1d, 0), 0), 3)
def _2d_2_4d(): return tf.expand_dims(tf.expand_dims(_2d, 0), 3)
def _3d_2_4d(): return tf.expand_dims(_3d, 3)
return (tf.case({tf.equal(_rank, 1): _1d_2_4d,
tf.equal(_rank, 2): _2d_2_4d,
tf.equal(_rank, 3): _3d_2_4d},
default=_assign_original))
I think the efficiency of the 2nd method should be less than the 1st method's, because the function _dummy() always wastes 2 operations when allocating inputs into _1d,_2d,_3d respectively.
I want to use maxout activation function in tensorflow, but I don't know which function should use.
I sent a pull request for maxout, here is the link:
https://github.com/tensorflow/tensorflow/pull/5528
Code is as follows:
def maxout(inputs, num_units, axis=None):
shape = inputs.get_shape().as_list()
if axis is None:
# Assume that channel is the last dimension
axis = -1
num_channels = shape[axis]
if num_channels % num_units:
raise ValueError('number of features({}) is not a multiple of num_units({})'
.format(num_channels, num_units))
shape[axis] = -1
shape += [num_channels // num_units]
outputs = tf.reduce_max(tf.reshape(inputs, shape), -1, keep_dims=False)
return outputs
Here is how it works:
I don't think there is a maxout activation but there is nothing stopping yourself from making it yourself. You could do something like the following.
with tf.variable_scope('maxout'):
layer_input = ...
layer_output = None
for i in range(n_maxouts):
W = tf.get_variable('W_%d' % d, (n_input, n_output))
b = tf.get_variable('b_%d' % i, (n_output,))
y = tf.matmul(layer_input, W) + b
if layer_output is None:
layer_output = y
else:
layer_output = tf.maximum(layer_output, y)
Note that this is code I just wrote in my browser so there may be syntax errors but you should get the general idea. You simply perform a number of linear transforms and take the maximum across all the transforms.
How about this code?
This seems to work in my test.
def max_out(input_tensor,output_size):
shape = input_tensor.get_shape().as_list()
if shape[1] % output_size == 0:
return tf.transpose(tf.reduce_max(tf.split(input_tensor,output_size,1),axis=2))
else:
raise ValueError("Output size or input tensor size is not fine. Please check it. Reminder need be zero.")
I refer the diagram in the following page.
From version 1.4 on you can use tf.contrib.layers.maxout.
Maxout is a layer such that it calculates N*M output for a N*1 input, and then it returns the maximum value across the column, i.e., the final output has shape N*1 as well. Basically it uses multiple linear fittings to mimic a complex function.