What's the meaning of 'input_length'? - tensorflow

The data has 4 timesteps, but the embedding's input_length=3, so what's the meaning of input_length?
from tensorflow import keras
import numpy as np

data = np.array([[0, 0, 0, 0]])  # one sample with 4 timesteps
emb = keras.layers.Embedding(input_dim=2, output_dim=3, input_length=3)
emb(data)  # calling the layer directly works even though input_length=3

As per the official documentation here,
input_length: Length of input sequences, when it is constant. This argument is required if you are going to connect Flatten then Dense layers upstream (without it, the shape of the dense outputs cannot be computed).
from tensorflow import keras
import numpy as np
model = keras.models.Sequential()
model.add(keras.layers.Embedding(input_dim=2, output_dim=3, input_length=4))
# the model will take as input an integer matrix of size (batch, input_length).
input_array = np.array([[0,0,0,0]])
model.compile('rmsprop', 'mse')
output_array = model.predict(input_array)
print(output_array)
The above works fine, but if you change input_length to 3, you will get the error below:
ValueError: Error when checking input: expected embedding_input to have shape (3,) but got array with shape (4,)
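For completeness, here is a small sketch of the matching case (an assumption of mine, not from the question: the Embedding layer is rebuilt with input_length=3). The model then expects a (batch, 3) integer matrix, and predict returns a (batch, input_length, output_dim) array:
model = keras.models.Sequential()
model.add(keras.layers.Embedding(input_dim=2, output_dim=3, input_length=3))
model.compile('rmsprop', 'mse')
input_array = np.array([[0, 0, 0]])  # 3 timesteps, matching input_length=3
output_array = model.predict(input_array)
print(output_array.shape)  # (1, 3, 3) -> (batch, input_length, output_dim)
So input_length pins the expected sequence length of the model's input; calling the layer eagerly on its own does not enforce it, but building a model with it does.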

Related

How to concatenate with a Flatten layer

I would like to flatten an input before concatenation like below.
import numpy as np
import pandas as pd
import tensorflow as tf
from matplotlib import pyplot as plt
from tensorflow import keras
from tensorflow.keras import Model
from tensorflow.keras.callbacks import TensorBoard
from tensorflow.keras.layers import (
    CategoryEncoding,
    Concatenate,
    Dense,
    Discretization,
    Embedding,
    Flatten,
    Input,
)
from tensorflow.keras.layers.experimental.preprocessing import HashedCrossing
dnn_hidden_units = [32, 8]
NBUCKETS = 16
latbuckets = np.linspace(start=38.0, stop=42.0, num=NBUCKETS).tolist()
lonbuckets = np.linspace(start=-76.0, stop=-72.0, num=NBUCKETS).tolist()
# Bucketization with Discretization layer
plon = Discretization(lonbuckets, name="plon_bkt")(inputs["pickup_longitude"])
plat = Discretization(latbuckets, name="plat_bkt")(inputs["pickup_latitude"])
dlon = Discretization(lonbuckets, name="dlon_bkt")(inputs["dropoff_longitude"])
dlat = Discretization(latbuckets, name="dlat_bkt")(inputs["dropoff_latitude"])
# Feature Cross with HashedCrossing layer
p_fc = HashedCrossing(num_bins=NBUCKETS * NBUCKETS, name="p_fc")((plon, plat))
d_fc = HashedCrossing(num_bins=NBUCKETS * NBUCKETS, name="d_fc")((dlon, dlat))
pd_fc = HashedCrossing(num_bins=NBUCKETS**4, name="pd_fc")((p_fc, d_fc))
# Embedding with Embedding layer
pd_embed = Embedding(input_dim=NBUCKETS**4, output_dim=10, name="pd_embed")(
    pd_fc
)
unk = Concatenate(axis=1)([pd_embed])
# Concatenate and define inputs for deep network
deep = Concatenate(name="deep_input", axis=0)(
    [
        inputs["pickup_longitude"],
        inputs["pickup_latitude"],
        inputs["dropoff_longitude"],
        inputs["dropoff_latitude"],
        Flatten(name="flatten_embedding")(pd_embed),
    ]
)
I am getting the following error at the concatenate layer.
ValueError: A Concatenate layer requires inputs with matching shapes
except for the concatenation axis. Received: input_shape=[(None,),
(None,), (None,), (None,), (None, 10)]
I understand that (None,10) should be (None*10) or just (None) but I am not sure how to get there.
The concatenate layer takes input as a list of tensors, all of the same shape except for the concatenation axis, and returns a single tensor that is the concatenation of all inputs.
The error you mention says that you are trying to concatenate two different shapes: (None,), which has a single dimension of unknown size, and (None, 10), which has two dimensions. The shapes therefore differ outside the concatenation axis.
For example, to concatenate two tensors a and b, they have to have the same shape (except along the concatenation axis):
import tensorflow as tf
a=tf.random.uniform([2,3])
b=tf.random.uniform([2,3])
tf.keras.layers.Concatenate(axis=0)([a.numpy(), b.numpy()])
output:<tf.Tensor: shape=(4, 3), dtype=float32, numpy=
array([[0.5595623 , 0.07109773, 0.646863 ],
[0.1997714 , 0.6131079 , 0.03418195],
[0.40428162, 0.94192684, 0.10390592],
[0.72463846, 0.3348019 , 0.95906615]], dtype=float32)>
If a and b have different shapes, it will produce an error:
ValueError: A `Concatenate` layer requires inputs with matching shapes except for the concatenation axis. Received: input_shape=[(2, 3), (3, 2)]
Thank You.
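Applied to the question above, one possible fix is to bring every branch to a matching rank and concatenate along the feature axis (axis=1) rather than the batch axis (axis=0). This is only a sketch under the assumption that each coordinate input is a scalar per example (shape (None,)); the Input layers below are stand-ins for the question's inputs dict and pd_embed tensor.
from tensorflow.keras.layers import Concatenate, Flatten, Input, Reshape

pickup_longitude = Input(shape=(), name="pickup_longitude")    # (None,)
pickup_latitude = Input(shape=(), name="pickup_latitude")
dropoff_longitude = Input(shape=(), name="dropoff_longitude")
dropoff_latitude = Input(shape=(), name="dropoff_latitude")
pd_embed = Input(shape=(10,), name="pd_embed_standin")         # (None, 10), like the embedding output

deep = Concatenate(name="deep_input", axis=1)(
    [
        Reshape((1,))(pickup_longitude),    # (None,) -> (None, 1)
        Reshape((1,))(pickup_latitude),
        Reshape((1,))(dropoff_longitude),
        Reshape((1,))(dropoff_latitude),
        Flatten(name="flatten_embedding")(pd_embed),  # stays (None, 10)
    ]
)
print(deep.shape)  # (None, 14)
With every branch shaped (None, k), the Concatenate layer no longer complains about mismatched shapes.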

Convert TensorFlow data to be used by ONNX inference

I'm trying to convert an LSTM model from TensorFlow into ONNX. The code for generating data for TensorFlow model training is as below:
def make_dataset(self, data):
    data = np.array(data, dtype=np.float32)
    ds = tf.keras.utils.timeseries_dataset_from_array(
        data=data,
        targets=None,
        sequence_length=self.total_window_size,
        sequence_stride=1,
        shuffle=True,
        batch_size=32,
    )
    ds = ds.map(self.split_window)
    return ds
The model training code is actually from the official tutorial. Then after conversion to ONNX, I try to perform prediction as follows:
import onnx
import onnxruntime as rt
from tf_lstm import WindowGenerator
import tensorflow as tf
wide_window = WindowGenerator(
    input_width=24, label_width=24, shift=1,
    label_columns=['T (degC)'])
model = onnx.load_model('models/onnx/tf-lstm-weather.onnx')
print(model)
sess = rt.InferenceSession('models/onnx/tf-lstm-weather.onnx')
input_name = sess.get_inputs()[0].name
label_name = sess.get_outputs()[0].name
pred = sess.run([label_name], {input_name: wide_window.test})[0]
But it throws this error:
RuntimeError: Input must be a list of dictionaries or a single numpy array for input 'lstm_input'.
I tried to convert wide_window.test into a numpy array and use it instead as follows:
test_data = []
test_label = []
for x, y in wide_window.test:
    test_data.append(x.numpy())
    test_label.append(y.numpy())
test_data2 = np.array(test_data, dtype=np.float)
pred = sess.run([label_name], {input_name: test_data2})[0]
Then it gives this error:
ValueError: setting an array element with a sequence. The requested array has an inhomogeneous shape after 1 dimensions. The detected shape was (219,) + inhomogeneous part.
Any idea?
That's a NumPy error: each element you add to the input array has to have the same shape. Here the list holds one array per batch, and the last batch yielded by wide_window.test is most likely smaller than the others, so the list is ragged.
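One way around that, sketched under the assumption that the raggedness comes from the smaller final batch, is to concatenate the batches along the sample axis instead of wrapping the list of batch arrays in np.array:
import numpy as np

test_data = []
test_label = []
for x, y in wide_window.test:
    test_data.append(x.numpy())
    test_label.append(y.numpy())

# join along the batch dimension; the result has shape (num_samples, input_width, num_features)
test_data2 = np.concatenate(test_data, axis=0).astype(np.float32)
pred = sess.run([label_name], {input_name: test_data2})[0]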

How to fix type error with Keras Lambda layer

I need to embed the Universal Sentence Encoder into my Keras model using Google Colab.
import tensorflow as tf
import tensorflow_hub as hub
from tensorflow import keras
def UniversalEmbedding(x):
    results = embed(tf.squeeze(tf.cast(x, tf.string)))["outputs"]
    return keras.backend.concatenate([results])
module_url = "https://tfhub.dev/google/universal-sentence-encoder/4"
embed = hub.load(module_url)
input_size = 1
embed_size = 512
input_1 = keras.layers.Input(shape=(input_size,), dtype=tf.string)
embed_layer = keras.layers.Lambda(UniversalEmbedding, output_shape=(embed_size,))
x1 = embed_layer(input_1)
It throws a type error as follows.
TypeError: Only integers, slices (`:`), ellipsis (`...`), tf.newaxis (`None`) and scalar tf.int32/tf.int64 tensors are valid indices, got 'outputs'
TF version: 2.3.0
Any hint to fix it is appreciated.
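One likely cause, stated here as an assumption rather than a confirmed answer: universal-sentence-encoder/4 loaded via hub.load is a callable that returns the (batch, 512) embedding tensor directly rather than a dict, so indexing the result with ["outputs"] is what raises the TypeError. A minimal sketch of the corresponding change:
def UniversalEmbedding(x):
    # embed(...) already returns the embedding tensor for this module version
    return embed(tf.squeeze(tf.cast(x, tf.string)))

embed_layer = keras.layers.Lambda(UniversalEmbedding, output_shape=(embed_size,))
x1 = embed_layer(input_1)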

Getting TypeError: sparse matrix length is ambiguous; use getnnz() or shape[0] while doing multi class classification

from sklearn.naive_bayes import CategoricalNB
from sklearn.datasets import make_multilabel_classification
X, y = make_multilabel_classification(sparse=True, n_labels=15,
                                      return_indicator='sparse', allow_unlabeled=False)
from sklearn.model_selection import train_test_split
X_train,X_test,y_train,y_test=train_test_split(X,y,test_size=0.25,random_state=0)
I tried using X.todense() but the error is still raised.
X_train = X_train.todense()
X_test = X_test.todense()
Training on the dataset
from skmultilearn.adapt import MLkNN
from sklearn.metrics import accuracy_score
classifier = MLkNN(k=20)
classifier.fit(X_train, y_train)
Predicting the output on the test set:
y_pred = classifier.predict(X_test)
accuracy_score(y_test,y_pred)
np.concatenate((y_pred.reshape(len(y_pred),1), y_test.reshape(len(y_test),1)),1)
You are trying to get the length of a sparse matrix, which is ambiguous:
len(y_pred)
Your matrix y_pred has the dimension (25,5), as seen with y_pred.shape.
So instead of len(y_pred), you could use y_pred.shape[0], which would return 25.
But then you will encounter a problem when you are using y_pred.reshape(y_pred.shape[0],1)
ValueError: cannot reshape array of size 125 into shape (25, 1)
(previously: y_pred.reshape(len(y_pred),1))
This error makes sense, because you are trying to reshape a matrix with 125 values into a matrix with only 25 values. You need to rethink your code here.
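One way to rethink it, sketched under the assumption that y_pred and y_test are scipy sparse indicator matrices of shape (n_samples, n_labels): densify both matrices and compare them directly instead of reshaping.
import numpy as np
from sklearn.metrics import accuracy_score

y_pred_dense = np.asarray(y_pred.todense())   # (25, 5) predicted label indicator matrix
y_test_dense = np.asarray(y_test.todense())   # (25, 5) true label indicator matrix

print(accuracy_score(y_test_dense, y_pred_dense))  # subset accuracy for multilabel targets
side_by_side = np.concatenate((y_pred_dense, y_test_dense), axis=1)  # (25, 10): predictions next to labels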

TensorFlow data import

I just started using TensorFlow, but I failed to import the data properly for use with the DNNClassifier. I actually have two files in HDF5 format that I import with pandas. The feature vector has dimension 100 and there are 5 classes that the features can belong to. If I use, for example, the following code:
import pandas as pd
import numpy as np
import tensorflow as tf
#Data
train = pd.read_hdf("train.h5", "train")
test = pd.read_hdf("test.h5", "test")
Y=train.iloc[0:,0]
X=train.iloc[0:,1:]
X_t=test.iloc[0:,0:]
Y=np.array(Y.values).astype('int')
X=np.array(X.values).astype('double')
X_t=np.array(X_t.values).astype('double')
#Train
feature_columns = [tf.contrib.layers.real_valued_column("", dimension=100)]
classifier = tf.contrib.learn.DNNClassifier(feature_columns=feature_columns,
                                            hidden_units=[10, 20],
                                            n_classes=5,
                                            model_dir="/tmp/model")
# Define the training inputs
def get_train_inputs():
    x = tf.constant(X)
    y = tf.constant(Y)
    return x, y
#fit
classifier.fit(input_fn=get_train_inputs, steps=1000)
predictions = list(classifier.predict(input_fn=get_train_inputs))
print(predictions)
I get the error: InvalidArgumentError (see above for traceback): Shape in shape_and_slice spec [100,10] does not match the shape stored in checkpoint: [1,10]
[[Node: save/RestoreV2_2 = RestoreV2[dtypes=[DT_FLOAT], _device="/job:localhost/replica:0/task:0/cpu:0"](_recv_save/Const_0, save/RestoreV2_2/tensor_names, save/RestoreV2_2/shape_and_slices)]]
I don't get why this happens. How should I transform my data to work with this classifier?
My solution: change model_dir="/tmp/model" to a fresh directory, e.g.
model_dir="/tmp/model-1"
Note: it does not have to be model-1; any new, valid directory name works, for example model_dir="/tmp/model-a". The error occurs because /tmp/model already contains a checkpoint saved by an earlier run with different variable shapes ([1, 10] instead of [100, 10]), so the estimator fails when it tries to restore it.