from_tensor_slices() with big numpy array while using tf.keras - tensorflow

I have some training data in a numpy array - it fits in the memory but it is bigger than 2GB. I'm using tf.keras and the dataset API. To give you a simplified, self-contained example:
import numpy as np
import tensorflow as tf
from tensorflow.keras import layers
model = tf.keras.Sequential([
    layers.Dense(64, activation='relu', input_shape=(32,)),
    layers.Dense(64, activation='relu'),
    layers.Dense(1)
])
model.compile(optimizer=tf.train.AdamOptimizer(0.001),
              loss='mse',
              metrics=['mae'])
# generate some big input datasets, bigger than 2GB
data = np.random.random((1024*1024*8, 32))
labels = np.random.random((1024*1024*8, 1))
val_data = np.random.random((100, 32))
val_labels = np.random.random((100, 1))
train_dataset = tf.data.Dataset.from_tensor_slices((data, labels))
train_dataset = train_dataset.batch(32).repeat()
val_dataset = tf.data.Dataset.from_tensor_slices((val_data, val_labels))
val_dataset = val_dataset.batch(32).repeat()
model.fit(train_dataset, epochs=10, steps_per_epoch=30,
          validation_data=val_dataset, validation_steps=3)
Executing this results in the error "Cannot create a tensor proto whose content is larger than 2GB". The documentation lists a solution to this problem: https://www.tensorflow.org/guide/datasets#consuming_numpy_arrays - just use tf.placeholder and then feed_dict in session.run.
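For reference, the documented workaround looks roughly like this (a TF 1.x graph-mode sketch, not directly usable with model.fit):
features_ph = tf.placeholder(data.dtype, data.shape)
labels_ph = tf.placeholder(labels.dtype, labels.shape)
dataset = tf.data.Dataset.from_tensor_slices((features_ph, labels_ph))
iterator = dataset.make_initializable_iterator()
with tf.Session() as sess:
    # the big arrays are fed once at initialization, so no >2GB constant ends up in the graph
    sess.run(iterator.initializer, feed_dict={features_ph: data, labels_ph: labels})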
Now the main question is: how to do this with tf.keras? I cannot feed anything for the placeholders when I call model.fit() and in fact when I introduced the placeholders I got errors saying "You must feed a value for placeholder tensor".

As with the Estimator API, you can use from_generator:
data_chunks = list(np.split(data, 1024))
labels_chunks = list(np.split(labels, 1024))
def generator():
    for i, j in zip(data_chunks, labels_chunks):
        yield i, j
train_dataset = tf.data.Dataset.from_generator(generator, (tf.float32, tf.float32))
# shuffle() and batch() need explicit sizes; the values here are just illustrative
train_dataset = train_dataset.shuffle(buffer_size=128).batch(4).repeat()
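Alternatively (a sketch, assuming the same data and labels arrays as above), the generator can yield individual examples and let tf.data handle shuffling and batching, with explicit output shapes:
def sample_generator():
    for x, y in zip(data, labels):
        yield x, y
train_dataset = tf.data.Dataset.from_generator(
    sample_generator,
    output_types=(tf.float32, tf.float32),
    output_shapes=((32,), (1,)))
train_dataset = train_dataset.shuffle(10000).batch(32).repeat()
model.fit(train_dataset, epochs=10, steps_per_epoch=30,
          validation_data=val_dataset, validation_steps=3)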
Also take a look at https://github.com/tensorflow/tensorflow/issues/24520

Related

Implementing Variational Auto Encoder using Tensorflow_Probability NOT on MNIST data set

I know there are many questions related to Variational Auto Encoders. However, this question differs from the existing ones in two aspects: 1) it is implemented using Tensorflow V2 and Tensorflow_probability; 2) it does not use MNIST or any other image data set.
As for the problem itself:
I am trying to implement a VAE using Tensorflow_probability and Keras, and I want to train and evaluate it on some synthetic data sets -- as part of my research. I provided the code below.
Although the implementation is done and the loss value decreases during training, once I want to evaluate the trained model on my test set I face different errors.
I am somewhat confident that the issue is related to input/output shapes, but unfortunately I did not manage to solve it.
Here is the code:
import numpy as np
import tensorflow as tf
import tensorflow.keras as tfk
import tensorflow_probability as tfp
from tensorflow.keras import layers as tfkl
from sklearn.datasets import make_classification
from tensorflow_probability import layers as tfpl
from sklearn.model_selection import train_test_split
tfd = tfp.distributions
n_epochs = 5
n_features = 2
latent_dim = 1
n_units = 4
learning_rate = 1e-3
n_samples = 400
batch_size = 32
# Generate synthetic data / load data sets
x_in, y_in = make_classification(n_samples=n_samples, n_features=n_features, n_informative=2, n_redundant=0,
                                 n_repeated=0, n_classes=2, n_clusters_per_class=2, weights=[0.5, 0.5],
                                 flip_y=0.01, class_sep=1.0, hypercube=True,
                                 shift=0.0, scale=1.0, shuffle=False, random_state=42)
x_in = x_in.astype('float32')
y_in = y_in.astype('float32') # .reshape(-1, 1)
x_train, x_test, y_train, y_test = train_test_split(x_in, y_in, test_size=0.4, random_state=42, shuffle=True)
x_test, x_val, y_test, y_val = train_test_split(x_test, y_test, test_size=0.5, random_state=42, shuffle=True)
print("shapes:", x_train.shape, y_train.shape, x_test.shape, y_test.shape, x_val.shape, y_val.shape)
prior = tfd.Independent(tfd.Normal(loc=[tf.zeros(latent_dim)], scale=1.), reinterpreted_batch_ndims=1)
train_dataset = tf.data.Dataset.from_tensor_slices(x_train).batch(batch_size)
valid_dataset = tf.data.Dataset.from_tensor_slices(x_val).batch(batch_size)
test_dataset = tf.data.Dataset.from_tensor_slices(x_test).batch(batch_size)
encoder = tf.keras.Sequential([
    tfkl.InputLayer(input_shape=[n_features, ], name='enc_input'),
    tfkl.Lambda(lambda x: tf.cast(x, tf.float32)),  # - 0.5
    tfkl.Dense(n_units, activation='relu', name='enc_dense1'),
    tfkl.Dense(int(n_units / 2), activation='relu', name='enc_dense2'),
    tfkl.Dense(tfpl.MultivariateNormalTriL.params_size(latent_dim),
               activation=None, name='mvn_triL1'),
    tfpl.MultivariateNormalTriL(
        # weight >> num_train_samples or something other than 1 to convert the VAE to a beta-VAE
        latent_dim, activity_regularizer=tfpl.KLDivergenceRegularizer(prior, weight=1.), name='bottleneck'),
])
decoder = tf.keras.Sequential([
    tfkl.InputLayer(input_shape=latent_dim, name='dec_input'),
    # tfkl.Dense(n_units, activation='relu', name='dec_dense1'),
    # tfkl.Dense(int(n_units * 2), activation='relu', name='dec_dense2'),
    tfpl.IndependentBernoulli([n_features], tfd.Bernoulli.logits, name='dec_output'),
])
vae = tfk.Model(inputs=encoder.inputs, outputs=decoder(encoder.outputs), name='VAE')
print("encoder:", encoder)
print(" ")
print("encoder.inputs:", encoder.inputs)
print(" ")
print(" encoder.outputs:", encoder.outputs)
print(" ")
print("decoder:", decoder)
print(" ")
print("decoder:", decoder.inputs)
print(" ")
print("decoder.outputs:", decoder.outputs)
print(" ")
# negative log likelihood, i.e. E_{S(eps)} [p(x|z)];
# the KL term was already added in the last layer of the encoder, via activity_regularizer.
# This loss function takes two arguments: the original data points x, and the output of the model,
# which we call rv_x (because it is a random variable)
negloglik = lambda x, rv_x: -rv_x.log_prob(x)
vae.compile(optimizer=tf.optimizers.Adam(learning_rate=learning_rate),
loss=negloglik,)
vae.summary()
history = vae.fit(train_dataset, epochs=n_epochs, validation_data=valid_dataset,)
print("x.shape:", x_test.shape)
x_hat = vae(x_test)
print("original:")
print(x_test)
print(" ")
print("Decoded Random Samples:")
print(x_hat.sample())
print(" ")
print("Decoded Means:")
print(x_hat.mean())
The Questions:
With the above code I receive the following error:
tensorflow.python.framework.errors_impl.InvalidArgumentError: Input to reshape is a tensor with 80 values, but the requested shape has 160 [Op:Reshape]
As far as I know, we can add as many layers as we want in the decoder model before its output layer -- as is done in convolutional VAEs, am I right?
If I uncomment the following two lines of code in decoder:
# tfkl.Dense(n_units, activation='relu', name='dec_dense1'),
# tfkl.Dense(int(n_units * 2), activation='relu', name='dec_dense2'),
I see the following warnings and the upcoming error:
WARNING:tensorflow:Gradients do not exist for variables ['dec_dense1/kernel:0', 'dec_dense1/bias:0', 'dec_dense2/kernel:0', 'dec_dense2/bias:0'] when minimizing the loss.
WARNING:tensorflow:Gradients do not exist for variables ['dec_dense1/kernel:0', 'dec_dense1/bias:0', 'dec_dense2/kernel:0', 'dec_dense2/bias:0'] when minimizing the loss.
WARNING:tensorflow:Gradients do not exist for variables ['dec_dense1/kernel:0', 'dec_dense1/bias:0', 'dec_dense2/kernel:0', 'dec_dense2/bias:0'] when minimizing the loss.
WARNING:tensorflow:Gradients do not exist for variables ['dec_dense1/kernel:0', 'dec_dense1/bias:0', 'dec_dense2/kernel:0', 'dec_dense2/bias:0'] when minimizing the loss.
And the error:
tensorflow.python.framework.errors_impl.InvalidArgumentError: Input to reshape is a tensor with 640 values, but the requested shape has 160 [Op:Reshape]
Now the question is: why are the decoder layers not used during training, as the warning indicates?
PS: I also tried to pass x_train, x_valid, and x_test directly during the training and evaluation process, but it did not help.
Any help would be much appreciated.

Why Do trainable_variables Not Change after Training?

I went over a basic example of tf2.0 containing very simple code:
from __future__ import absolute_import, division, print_function, unicode_literals
import os
import tensorflow as tf
import cProfile
# Fetch and format the mnist data
(mnist_images, mnist_labels), _ = tf.keras.datasets.mnist.load_data()
dataset = tf.data.Dataset.from_tensor_slices(
(tf.cast(mnist_images[...,tf.newaxis]/255, tf.float32),
tf.cast(mnist_labels,tf.int64)))
dataset = dataset.shuffle(1000).batch(32)
# Build the model
mnist_model = tf.keras.Sequential([
    tf.keras.layers.Conv2D(16, [3, 3], activation='relu',
                           input_shape=(None, None, 1)),
    tf.keras.layers.Conv2D(16, [3, 3], activation='relu'),
    tf.keras.layers.GlobalAveragePooling2D(),
    tf.keras.layers.Dense(10)
])
for images, labels in dataset.take(1):
    print("Logits: ", mnist_model(images[0:1]).numpy())
optimizer = tf.keras.optimizers.Adam()
loss_object = tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True)
loss_history = []
def train_step(model, images, labels):
    with tf.GradientTape() as tape:
        logits = model(images, training=True)
        # Add asserts to check the shape of the output.
        tf.debugging.assert_equal(logits.shape, (32, 10))
        loss_value = loss_object(labels, logits)
    loss_history.append(loss_value.numpy().mean())
    grads = tape.gradient(loss_value, model.trainable_variables)
    optimizer.apply_gradients(zip(grads, model.trainable_variables))
def train(epochs):
    for epoch in range(epochs):
        for (batch, (images, labels)) in enumerate(dataset):
            train_step(mnist_model, images, labels)
        print('Epoch {} finished'.format(epoch))
I trained it and saved trainable_variables before and after with the following:
t0 = mnist_model.trainable_variables
train(epochs=3)
t1 = mnist_model.trainable_variables
diff = tf.reduce_mean(tf.abs(t0[0] - t1[0]))
# whether indexing [0] or [1] etc., the outcome of diff is the same
print(diff.numpy())
They are the same!!!
So am I checking something incorrectly? If that is the case, how can I observe the updated variables correctly?
You aren't creating new arrays of variables, just two references to the same objects.
Try this instead:
t0 = np.array(mnist_model.trainable_variables)
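Depending on the NumPy version, np.array over a list of variables may still just hold references to the same objects, so a more direct check (a small sketch, not from the original answer) is to snapshot the values as plain NumPy copies before training:
import numpy as np
t0 = [v.numpy().copy() for v in mnist_model.trainable_variables]
train(epochs=3)
t1 = [v.numpy() for v in mnist_model.trainable_variables]
print(np.mean(np.abs(t0[0] - t1[0])))  # now shows a non-zero difference after training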

Translating Pytorch program into Keras: different results

I have translated a pytorch program into keras.
A working Pytorch program:
import numpy as np
import cv2
import torch
import torch.nn as nn
from skimage import segmentation
np.random.seed(1)
torch.manual_seed(1)
fi = "in.jpg"
class MyNet(nn.Module):
    def __init__(self, n_inChannel, n_outChannel):
        super(MyNet, self).__init__()
        self.seq = nn.Sequential(
            nn.Conv2d(n_inChannel, n_outChannel, kernel_size=3, stride=1, padding=1),
            nn.ReLU(inplace=True),
            nn.BatchNorm2d(n_outChannel),
            nn.Conv2d(n_outChannel, n_outChannel, kernel_size=3, stride=1, padding=1),
            nn.ReLU(inplace=True),
            nn.BatchNorm2d(n_outChannel),
            nn.Conv2d(n_outChannel, n_outChannel, kernel_size=1, stride=1, padding=0),
            nn.BatchNorm2d(n_outChannel)
        )
    def forward(self, x):
        return self.seq(x)
im = cv2.imread(fi)
data = torch.from_numpy(np.array([im.transpose((2, 0, 1)).astype('float32')/255.]))
data = data.cuda()
labels = segmentation.slic(im, compactness=100, n_segments=10000)
labels = labels.flatten()
u_labels = np.unique(labels)
label_indexes = np.array([np.where(labels == u_label)[0] for u_label in u_labels])
n_inChannel = 3
n_outChannel = 100
model = MyNet(n_inChannel, n_outChannel)
model.cuda()
model.train()
loss_fn = torch.nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.1, momentum=0.9)
label_colours = np.random.randint(255,size=(100,3))
for batch_idx in range(100):
    optimizer.zero_grad()
    output = model(data)[0]
    output = output.permute(1, 2, 0).view(-1, n_outChannel)
    ignore, target = torch.max(output, 1)
    im_target = target.data.cpu().numpy()
    nLabels = len(np.unique(im_target))
    im_target_rgb = np.array([label_colours[c % 100] for c in im_target])  # correct position of "im_target"
    im_target_rgb = im_target_rgb.reshape(im.shape).astype(np.uint8)
    for inds in label_indexes:
        u_labels_, hist = np.unique(im_target[inds], return_counts=True)
        im_target[inds] = u_labels_[np.argmax(hist, 0)]
    target = torch.from_numpy(im_target)
    target = target.cuda()
    loss = loss_fn(output, target)
    loss.backward()
    optimizer.step()
    print(batch_idx, '/', 100, ':', nLabels, loss.item())
    if nLabels <= 3:
        break
fo = "out.jpg"
cv2.imwrite(fo, im_target_rgb)
(source: https://github.com/kanezaki/pytorch-unsupervised-segmentation/blob/master/demo.py)
My translation into Keras:
import cv2
import numpy as np
from skimage import segmentation
from keras.layers import Conv2D, BatchNormalization, Input, Reshape
from keras.models import Model
import keras.backend as k
from keras.optimizers import SGD, Adam
from skimage.util import img_as_float
from skimage import io
from keras.models import Sequential
np.random.seed(0)
fi = "in.jpg"
im = cv2.imread(fi).astype(float)/255.
labels = segmentation.slic(im, compactness=100, n_segments=10000)
labels = labels.flatten()
print (labels.shape)
u_labels = np.unique(labels)
label_indexes = [np.where(labels == u_label)[0] for u_label in np.unique(labels)]
n_channels = 100
model = Sequential()
model.add ( Conv2D(n_channels, kernel_size=3, activation='relu', input_shape=im.shape, padding='same'))
model.add( BatchNormalization())
model.add( Conv2D(n_channels, kernel_size=3, activation='relu', padding='same'))
model.add( BatchNormalization())
model.add( Conv2D(n_channels, kernel_size=1, padding='same'))
model.add( BatchNormalization())
model.add( Reshape((im.shape[0] * im.shape[1], n_channels)))
img = np.expand_dims(im,0)
print (img.shape)
output = model.predict(img)
print (output.shape)
im_target = np.argmax(output[0], 1)
print (im_target.shape)
for inds in label_indexes:
    u_labels_, hist = np.unique(im_target[inds], return_counts=True)
    im_target[inds] = u_labels_[np.argmax(hist, 0)]
def custom_loss(loss_target, loss_output):
    return k.categorical_crossentropy(target=k.stack(loss_target), output=k.stack(loss_output), from_logits=True)
model.compile(optimizer=SGD(lr=0.1, momentum=0.9), loss=custom_loss)
model.fit(img, output, epochs=100, batch_size=1, verbose=1)
pred_result = model.predict(x=[img])[0]
print (pred_result.shape)
target = np.argmax(pred_result, 1)
print (target.shape)
nLabels = len(np.unique(target))
label_colours = np.random.randint(255, size=(100, 3))
im_target_rgb = np.array([label_colours[c % 100] for c in im_target])
im_target_rgb = im_target_rgb.reshape(im.shape).astype(np.uint8)
cv2.imwrite("out.jpg", im_target_rgb)
However, the Keras output is really different from that of PyTorch.
Input image:
Pytorch result:
Keras result:
Could someone help me with this translation?
Edit 1:
I corrected two errors as advised by @sebrockm:
1. removed `relu` from the last conv layer
2. added `from_logits = True` in the loss function
Also, I changed the number of conv layers from 4 to 3 to match the original code.
However, the output image did not improve compared to before, and the `loss` values came out negative:
Epoch 99/100
1/1 [==============================] - 0s 92ms/step - loss: -22.8380
Epoch 100/100
1/1 [==============================] - 0s 99ms/step - loss: -23.039
I think the Keras code lacks a connection between the model and the output, but I could not figure out how to make this connection.
Two major mistakes that I see (likely related):
The last convolutional layer in the original model does not have an activation function, while your translation uses relu.
The original model uses CrossEntropyLoss as loss function, while your model uses categorical_crossentropy with from_logits=False (the default argument). Without a mathematical background the difference is tricky to explain, but in short: CrossEntropyLoss has a softmax built in, which is why the model doesn't have one on the last layer. To do the same in Keras, use k.categorical_crossentropy(..., from_logits=True). "from_logits" means the input values are expected not to be "softmaxed", i.e. all values can be arbitrary. Currently, your loss function expects the output values to be "softmaxed", i.e. all values must be between 0 and 1 (and sum up to 1).
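To make that concrete, here is a tiny illustration (tf.keras style, made-up values) showing that passing raw logits with from_logits=True is equivalent to applying softmax first and using the default:
import tensorflow as tf
logits = tf.constant([[2.0, -1.0, 0.3]])   # arbitrary, un-normalized scores
target = tf.constant([[1.0, 0.0, 0.0]])    # one-hot target
# raw logits: the softmax is applied inside the loss
loss_a = tf.keras.losses.categorical_crossentropy(target, logits, from_logits=True)
# softmax first, then the default (from_logits=False) path
loss_b = tf.keras.losses.categorical_crossentropy(target, tf.nn.softmax(logits))
print(loss_a.numpy(), loss_b.numpy())      # both print the same value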
Update:
One other mistake, likely a huge one: In Keras, you calculate the output once in the beginning and never change it from there on. Then you train your model to fit on this initially generated output.
In the original pytorch code, target (which is the variable being trained on) gets updated in every training loop.
So, you cannot use Keras' fit method which is designed for doing the entire training for you (given fixed training data). You will have to replicate the training loop manually, just as it is done in the pytorch code. I'm not sure if this is easily doable with the API Keras provides. train_on_batch is one method you surely will need in your manual loop. You will have to do some more work, I'm afraid...
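A rough sketch of such a manual loop (assuming the model is compiled with a plain categorical_crossentropy(from_logits=True) loss; img, label_indexes, and n_channels are taken from the question, and the one-hot conversion is an assumption to match that loss):
for batch_idx in range(100):
    output = model.predict(img)[0]                 # current per-pixel logits, shape (H*W, n_channels)
    im_target = np.argmax(output, 1)               # pseudo-labels from the model itself
    for inds in label_indexes:                     # majority vote within each superpixel
        u_labels_, hist = np.unique(im_target[inds], return_counts=True)
        im_target[inds] = u_labels_[np.argmax(hist, 0)]
    one_hot = np.eye(n_channels)[im_target][np.newaxis]   # shape (1, H*W, n_channels)
    loss = model.train_on_batch(img, one_hot)      # one gradient step on the refreshed target
    print(batch_idx, loss)
    if len(np.unique(im_target)) <= 3:
        break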

How to use tensorflow feature_columns as input to a keras model

Tensorflow's feature_columns API is quite useful for non-numerical feature processing. However, the current API docs are more about using feature_columns with the tensorflow Estimator. Is there a way to use feature_columns for categorical feature representation and then build a model based on tf.keras?
The only reference I found is the following tutorial. It shows how to feed feature columns to a Keras Sequential model: Link
The code snippet is as follows:
from tensorflow.python.feature_column import feature_column_v2 as fc
feature_columns = [fc.embedding_column(ccv, dimension=3), ...]
feature_layer = fc.FeatureLayer(feature_columns)
model = tf.keras.Sequential([
    feature_layer,
    tf.keras.layers.Dense(128, activation=tf.nn.relu),
    tf.keras.layers.Dense(64, activation=tf.nn.relu),
    tf.keras.layers.Dense(1, activation=tf.nn.sigmoid)
])
...
model.fit(dataset, steps_per_epoch=8) # dataset is created from tensorflow Dataset API
The question is how to use a custom model with the Keras functional model API. I tried the following, but it did not work (tensorflow version 1.12):
feature_layer = fc.FeatureLayer(feature_columns)
dense_features = feature_layer(features) # features is a dict of ndarrays in dataset
layer1 = tf.keras.layers.Dense(128, activation=tf.nn.relu)(dense_features)
layer2 = tf.keras.layers.Dense(64, activation=tf.nn.relu)(layer1)
output = tf.keras.layers.Dense(1, activation=tf.nn.sigmoid)(layer2)
model = Model(inputs=dense_features, outputs=output)
The error log:
ValueError: Input tensors to a Model must come from `tf.layers.Input`. Received: Tensor("feature_layer/concat:0", shape=(4, 3), dtype=float32) (missing previous layer metadata).
I don't know how to transform feature columns into a Keras model's input.
The behavior you desire can be achieved, and it is possible to combine tf.feature_column and the Keras functional API. And, actually, this is not mentioned in the TF docs.
This works at least in TF 2.0.0-beta1, but may be changed or even simplified in further releases.
Please check out the issue in the TensorFlow GitHub repository, Unable to use FeatureColumn with Keras Functional API #27416. There you will find useful comments about tf.feature_column and the Keras functional API.
Because you ask about the general approach, I will just copy the example snippet from the link above. Update: the code below should work.
from __future__ import absolute_import, division, print_function
import numpy as np
import pandas as pd
#!pip install tensorflow==2.0.0-alpha0
import tensorflow as tf
from tensorflow import feature_column
from tensorflow import keras
from tensorflow.keras import layers
from sklearn.model_selection import train_test_split
csv_file = tf.keras.utils.get_file('heart.csv', 'https://storage.googleapis.com/download.tensorflow.org/data/heart.csv')
dataframe = pd.read_csv(csv_file, nrows = 10000)
dataframe.head()
train, test = train_test_split(dataframe, test_size=0.2)
train, val = train_test_split(train, test_size=0.2)
print(len(train), 'train examples')
print(len(val), 'validation examples')
print(len(test), 'test examples')
# Define method to create tf.data dataset from Pandas Dataframe
# This worked with tf 2.0 but does not work with tf 2.2
def df_to_dataset_tf_2_0(dataframe, label_column, shuffle=True, batch_size=32):
    dataframe = dataframe.copy()
    #labels = dataframe.pop(label_column)
    labels = dataframe[label_column]
    ds = tf.data.Dataset.from_tensor_slices((dict(dataframe), labels))
    if shuffle:
        ds = ds.shuffle(buffer_size=len(dataframe))
    ds = ds.batch(batch_size)
    return ds
def df_to_dataset(dataframe, label_column, shuffle=True, batch_size=32):
    dataframe = dataframe.copy()
    labels = dataframe.pop(label_column)
    #labels = dataframe[label_column]
    ds = tf.data.Dataset.from_tensor_slices((dataframe.to_dict(orient='list'), labels))
    if shuffle:
        ds = ds.shuffle(buffer_size=len(dataframe))
    ds = ds.batch(batch_size)
    return ds
batch_size = 5 # A small batch sized is used for demonstration purposes
train_ds = df_to_dataset(train, label_column = 'target', batch_size=batch_size)
val_ds = df_to_dataset(val,label_column = 'target', shuffle=False, batch_size=batch_size)
test_ds = df_to_dataset(test, label_column = 'target', shuffle=False, batch_size=batch_size)
age = feature_column.numeric_column("age")
feature_columns = []
feature_layer_inputs = {}
# numeric cols
for header in ['age', 'trestbps', 'chol', 'thalach', 'oldpeak', 'slope', 'ca']:
    feature_columns.append(feature_column.numeric_column(header))
    feature_layer_inputs[header] = tf.keras.Input(shape=(1,), name=header)
# bucketized cols
age_buckets = feature_column.bucketized_column(age, boundaries=[18, 25, 30, 35])
feature_columns.append(age_buckets)
# indicator cols
thal = feature_column.categorical_column_with_vocabulary_list(
    'thal', ['fixed', 'normal', 'reversible'])
thal_one_hot = feature_column.indicator_column(thal)
feature_columns.append(thal_one_hot)
feature_layer_inputs['thal'] = tf.keras.Input(shape=(1,), name='thal', dtype=tf.string)
# embedding cols
thal_embedding = feature_column.embedding_column(thal, dimension=8)
feature_columns.append(thal_embedding)
# crossed cols
crossed_feature = feature_column.crossed_column([age_buckets, thal], hash_bucket_size=1000)
crossed_feature = feature_column.indicator_column(crossed_feature)
feature_columns.append(crossed_feature)
feature_layer = tf.keras.layers.DenseFeatures(feature_columns)
feature_layer_outputs = feature_layer(feature_layer_inputs)
x = layers.Dense(128, activation='relu')(feature_layer_outputs)
x = layers.Dense(64, activation='relu')(x)
baggage_pred = layers.Dense(1, activation='sigmoid')(x)
model = keras.Model(inputs=[v for v in feature_layer_inputs.values()], outputs=baggage_pred)
model.compile(optimizer='adam',
loss='binary_crossentropy',
metrics=['accuracy'])
model.fit(train_ds)
If you use the tensorflow Dataset API, the following code should work well:
feature_layer = keras.layers.DenseFeatures(feature_columns=feature_columns)
train_dataset = train_dataset.map(lambda x, y: (feature_layer(x), y))
test_dataset = test_dataset.map(lambda x, y: (feature_layer(x), y))
model.fit(train_dataset, epochs=, steps_per_epoch=,  # all_data/batch_num =
          validation_data=test_dataset,
          validation_steps=)
tf.feature_column.input_layer
Use this function; its API doc has a sample.
You can transform feature_columns into a Tensor and then use it in Model().
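For example, a minimal sketch (TF 1.x, with a made-up numeric column) of turning feature columns into a dense Tensor:
import tensorflow as tf
price = tf.feature_column.numeric_column('price')                # hypothetical column name
features = {'price': tf.constant([[1.0], [5.0]])}
dense_tensor = tf.feature_column.input_layer(features, [price])  # dense Tensor, shape (2, 1)
# dense_tensor can then be fed into further Keras layers, as the answer suggests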
I have recently been reading this document for the TensorFlow 2.0 alpha version. It has examples using Keras together with the feature column API. Not sure if TF 2.0 is what you are going to use.

How do I call model.predict with data stored in GPU in Keras

I have built and trained a Sequential model.
Now, before each model.predict call I want to upload the data to the GPU, do some operations, and then call model.predict using the output stored on the GPU, without downloading it to host memory and handing it over to the Keras model only for it to upload the data to the GPU again.
Edit:
I would like to use OpenCV operations on the input image on the GPU and use the output directly to call model.predict, if possible.
You can easily achieve this by adding the operations as Lambda layers on top of the model.
Here is a very simple example; you can extend from here:
import numpy as np
from keras import backend as K
from keras.models import Sequential, Model
from keras.layers import Dense, Lambda, Input, merge
X = np.random.random((1000,5))
Y = np.random.random((1000,1))
inp = Input(shape = (5,))
d1 = Dense(60, input_dim=5, init='normal', activation='relu')
d2 = Dense(1, init='normal', activation='sigmoid')
out = d2(d1(inp))
model = Model(input=[inp], output=[out])
model.compile(loss='binary_crossentropy', optimizer='adam')
model.summary()
model.fit(X, Y, nb_epoch=1)
X1 = np.random.random((10,3))
X2 = np.random.random((10,2))
inp1 = Input(shape = (3,))
inp2 = Input(shape = (2,))
p1 = Lambda(lambda x: K.sqrt(x))(inp1)
p2 = Lambda(lambda x: K.tf.exp(x))(inp2)
mer = merge([p1, p2], mode='concat')
out2 = d2(d1(mer))
model2 = Model(input=[inp1, inp2], output=[out2])
model2.summary()
ypred = model2.predict([X1, X2])
print(ypred.shape)
Here, from model.summary() you can see that both models share the upper layers, so in essence model2 uses the weights already learnt during the training of the first model.
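For reference, a rough modern tf.keras equivalent of the same idea (assumed layer sizes, not from the original answer), wrapping the preprocessing as Lambda layers in front of shared Dense layers:
import numpy as np
import tensorflow as tf
from tensorflow.keras import layers, Model, Input

inp = Input(shape=(5,))
d1 = layers.Dense(60, activation='relu')
d2 = layers.Dense(1, activation='sigmoid')
model = Model(inp, d2(d1(inp)))
model.compile(loss='binary_crossentropy', optimizer='adam')
model.fit(np.random.random((1000, 5)), np.random.random((1000, 1)), epochs=1)

inp1, inp2 = Input(shape=(3,)), Input(shape=(2,))
p1 = layers.Lambda(tf.sqrt)(inp1)          # preprocessing runs inside the graph, on the GPU if available
p2 = layers.Lambda(tf.exp)(inp2)
mer = layers.Concatenate()([p1, p2])
model2 = Model([inp1, inp2], d2(d1(mer)))  # reuses the trained d1/d2 weights
print(model2.predict([np.random.random((10, 3)), np.random.random((10, 2))]).shape)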