FC = mx.sym.FullyConnected(data=x_3, flatten=False, num_hidden=n_class)
x = mx.sym.softmax(data=FC)
sm_label = mx.sym.Reshape(data=label, shape=(-1,))
sm_label = mx.sym.Cast(data=sm_label, dtype='int32')
sm = mx.sym.WarpCTC(data=x, label=sm_label, label_length=n_len,
                    input_length=rnn_length)
My x layer's shape is [(32L, 35L, 27L)] (batch_size, input_length, n_class).
The label's shape is [(32L, 21L)] (batch_size, label_length).
The WarpCTC layer fails to bind with the following error:
simple_bind error.
Arguments:
data: (32, 1L, 32L, 286L)
label: (32, 21L)
Error in operator warpctc48: Shape inconsistent, Provided = [672], inferred shape=[0,1]
What can I do?
The MXNet repo has a WarpCTC example here. You can run the training with: python lstm_ocr_train.py --gpu 1 --num_proc 4 --loss warpctc font/Ubuntu-M.ttf. In the example, these are the shapes of the prediction and label used with the WarpCTC operator:
Prediction is (10240, 11)
Label is (512,)
label_length: 4
input_length: 80
batch_size = 128
seq_length = 80
In the above case,
Prediction is (batch_size*seq_length, n_class).
Label is (batch_size*label_length,).
Following along the lines of the example, I would suggest calling WarpCTC with prediction shape = (1120, 27), label shape = (672,), label_length = 21, input_length = 35 in your case.
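Concretely, that could look something like the sketch below (FC, label, n_class are the question's names; I'm assuming the same data layout as the lstm_ocr example, so double-check whether your WarpCTC build expects time-major order):
pred = mx.sym.Reshape(data=FC, shape=(-1, n_class))        # (32*35, 27) = (1120, 27)
pred = mx.sym.softmax(data=pred)
flat_label = mx.sym.Reshape(data=label, shape=(-1,))       # (32*21,) = (672,)
flat_label = mx.sym.Cast(data=flat_label, dtype='int32')
sm = mx.sym.WarpCTC(data=pred, label=flat_label,
                    label_length=21, input_length=35)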
I am trying to create an LSTM model that can take multiple input features and forecast those features for the next N days.
E.g. I give 4 features as input for 103 days, and now I want to forecast those 4 features for the next 5 days.
input features = [A1,B1,C1,D1],[A2,B2,C2,D2],....,[A103,B103,C103,D103]
output = [A104,B104,C104,D104],...,[A108,B108,C108,D108]
import numpy as np
from sklearn.preprocessing import MinMaxScaler

scaler = MinMaxScaler()
scaled_data = scaler.fit_transform(data)
train = scaled_data[:13813, :]
test = scaled_data[13813:, :]

def create_dataset(dataset, size):
    X = []
    y = []
    for i in range(len(dataset) - size - 1):
        a = dataset[i:(i + size), :]
        X.append(a)
        y.append(dataset[i + size])
    return np.array(X), np.array(y)

# 535 is the number of past records I want to use to forecast the features
X_test, y_test = create_dataset(test, 535)
X_train, y_train = create_dataset(train, 535)
# X_train and y_train shapes: (13277, 535, 17), (13277, 17)
# X_test and y_test shapes: (2917, 535, 17), (2917, 17)
model_2 = tf.keras.Sequential()
model_2.add(tf.keras.layers.LSTM(units=128, return_sequences=True, input_shape=(X_train.shape[1], 17)))
model_2.add(tf.keras.layers.LSTM(units=64, return_sequences=True))
model_2.add(tf.keras.layers.LSTM(units=50, return_sequences=True))
model_2.add(tf.keras.layers.LSTM(units=32))
model_2.add(tf.keras.layers.Dense(units=17, activation='linear'))
model_2.summary()
model_2.compile(loss='mse', optimizer='adam')
early_stopping = tf.keras.callbacks.EarlyStopping(patience=3)  # assumed; the callback is not defined in the original post
model_history_2 = model_2.fit(X_train, y_train, validation_data=(X_test, y_test),
                              callbacks=[early_stopping], epochs=5, batch_size=150)
But after this I am unable to understand how to create a sliding window to predict the next 103 records.
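The post stops here without an answer, but a common approach is a rolling (sliding-window) forecast: predict one step, append the prediction to the window, drop the oldest record, and repeat. A minimal sketch, assuming model_2, scaled_data and scaler from the code above:
import numpy as np

def rolling_forecast(model, history, n_steps, window=535):
    # history: array of shape (>= window, n_features); returns (n_steps, n_features)
    window_data = history[-window:].copy()
    preds = []
    for _ in range(n_steps):
        x = window_data[np.newaxis, :, :]          # (1, window, n_features)
        yhat = model.predict(x, verbose=0)[0]      # (n_features,)
        preds.append(yhat)
        window_data = np.vstack([window_data[1:], yhat])   # slide the window forward
    return np.array(preds)

# e.g. forecast the next 5 days from the end of the scaled training data
future_scaled = rolling_forecast(model_2, scaled_data[:13813], n_steps=5)
future = scaler.inverse_transform(future_scaled)   # back to the original scale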
Using the TFBertForQuestionAnswering.from_pretrained() function, we get a predefined head on top of BERT together with a loss function that are suitable for this task.
My question is how to create a custom head without relying on TFAutoModelForQuestionAnswering.from_pretrained().
I want to do this because there is no place where the architecture of the head is explained clearly. By reading the code here we can see the architecture they are using, but I can't be sure I understand their code 100%.
Starting from How to Fine-tune HuggingFace BERT model for Text Classification is a good first step. However, it covers only the classification task, which is much simpler.
'start_positions' and 'end_positions' are created following this tutorial.
So far, I've got the following:
train_dataset
# Dataset({
# features: ['input_ids', 'token_type_ids', 'attention_mask', 'start_positions', 'end_positions'],
# num_rows: 99205
# })
train_dataset.set_format(type='tensorflow', columns=['input_ids', 'token_type_ids', 'attention_mask'])
features = {x: train_dataset[x] for x in ['input_ids', 'token_type_ids', 'attention_mask']}
labels = [train_dataset[x] for x in ['start_positions', 'end_positions']]
labels = np.array(labels).T
tfdataset = tf.data.Dataset.from_tensor_slices((features, labels)).batch(16)
input_ids = tf.keras.layers.Input(shape=(256,), dtype=tf.int32, name='input_ids')
token_type_ids = tf.keras.layers.Input(shape=(256,), dtype=tf.int32, name='token_type_ids')
attention_mask = tf.keras.layers.Input((256,), dtype=tf.int32, name='attention_mask')
bert = TFAutoModel.from_pretrained("bert-base-multilingual-cased")
output = bert([input_ids, token_type_ids, attention_mask]).last_hidden_state
output = tf.keras.layers.Dense(2, name="qa_outputs")(output)
model = tf.keras.models.Model(inputs=[input_ids, token_type_ids, attention_mask], outputs=output)
num_train_epochs = 3
num_train_steps = len(tfdataset) * num_train_epochs
optimizer, schedule = create_optimizer(
    init_lr=2e-5,
    num_warmup_steps=0,
    num_train_steps=num_train_steps,
    weight_decay_rate=0.01
)

def qa_loss(labels, logits):
    loss_fn = tf.keras.losses.SparseCategoricalCrossentropy(
        from_logits=True, reduction=tf.keras.losses.Reduction.NONE
    )
    start_loss = loss_fn(labels[0], logits[0])
    end_loss = loss_fn(labels[1], logits[1])
    return (start_loss + end_loss) / 2.0

model.compile(
    loss=qa_loss,
    optimizer=optimizer
)
model.fit(tfdataset, epochs=num_train_epochs)
And I am getting the following error:
ValueError: `labels.shape` must equal `logits.shape` except for the last dimension. Received: labels.shape=(2,) and logits.shape=(256, 2)
It is complaining about the shape of the labels. This should not happen since I am using SparseCategoricalCrossentropy loss.
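For what it's worth, the mismatch is consistent with qa_loss indexing the batch dimension (labels[0] is one example's pair of positions, not the start column). A sketch of a loss that splits start/end explicitly instead (this is my reading, not the accepted solution below):
def qa_loss(labels, logits):
    # labels: (batch, 2) start/end positions; logits: (batch, seq_len, 2)
    loss_fn = tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True)
    start_loss = loss_fn(labels[:, 0], logits[:, :, 0])
    end_loss = loss_fn(labels[:, 1], logits[:, :, 1])
    return (start_loss + end_loss) / 2.0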
For future reference, I actually found a solution: editing the TFBertForQuestionAnswering class itself. For example, I added an additional layer in the code below, trained the model as usual, and it worked.
from transformers import TFBertPreTrainedModel
from transformers import TFBertMainLayer
from transformers.modeling_tf_utils import TFQuestionAnsweringLoss, get_initializer, input_processing
from transformers.modeling_tf_outputs import TFQuestionAnsweringModelOutput
from transformers import BertConfig
class MY_TFBertForQuestionAnswering(TFBertPreTrainedModel, TFQuestionAnsweringLoss):
    # names with a '.' represent the authorized unexpected/missing layers when a TF model is loaded from a PT model
    _keys_to_ignore_on_load_unexpected = [
        r"pooler",
        r"mlm___cls",
        r"nsp___cls",
        r"cls.predictions",
        r"cls.seq_relationship",
    ]

    def __init__(self, config: BertConfig, *inputs, **kwargs):
        super().__init__(config, *inputs, **kwargs)
        self.num_labels = config.num_labels
        self.bert = TFBertMainLayer(config, add_pooling_layer=False, name="bert")
        # This is the dense layer I added
        self.my_dense = tf.keras.layers.Dense(
            units=config.hidden_size,
            kernel_initializer=get_initializer(config.initializer_range),
            name="my_dense",
        )
        self.qa_outputs = tf.keras.layers.Dense(
            units=config.num_labels,
            kernel_initializer=get_initializer(config.initializer_range),
            name="qa_outputs",
        )

    def call(
        self,
        input_ids=None,
        attention_mask=None,
        token_type_ids=None,
        position_ids=None,
        head_mask=None,
        inputs_embeds=None,
        output_attentions=None,
        output_hidden_states=None,
        return_dict=None,
        start_positions=None,
        end_positions=None,
        training=False,
        **kwargs,
    ):
        r"""
        start_positions (`tf.Tensor` or `np.ndarray` of shape `(batch_size,)`, *optional*):
            Labels for position (index) of the start of the labelled span for computing the token classification loss.
            Positions are clamped to the length of the sequence (`sequence_length`). Positions outside of the sequence
            are not taken into account for computing the loss.
        end_positions (`tf.Tensor` or `np.ndarray` of shape `(batch_size,)`, *optional*):
            Labels for position (index) of the end of the labelled span for computing the token classification loss.
            Positions are clamped to the length of the sequence (`sequence_length`). Positions outside of the sequence
            are not taken into account for computing the loss.
        """
        inputs = input_processing(
            func=self.call,
            config=self.config,
            input_ids=input_ids,
            attention_mask=attention_mask,
            token_type_ids=token_type_ids,
            position_ids=position_ids,
            head_mask=head_mask,
            inputs_embeds=inputs_embeds,
            output_attentions=output_attentions,
            output_hidden_states=output_hidden_states,
            return_dict=return_dict,
            start_positions=start_positions,
            end_positions=end_positions,
            training=training,
            kwargs_call=kwargs,
        )
        outputs = self.bert(
            input_ids=inputs["input_ids"],
            attention_mask=inputs["attention_mask"],
            token_type_ids=inputs["token_type_ids"],
            position_ids=inputs["position_ids"],
            head_mask=inputs["head_mask"],
            inputs_embeds=inputs["inputs_embeds"],
            output_attentions=inputs["output_attentions"],
            output_hidden_states=inputs["output_hidden_states"],
            return_dict=inputs["return_dict"],
            training=inputs["training"],
        )
        sequence_output = outputs[0]
        # You also have to add the new layer here
        my_logits = self.my_dense(inputs=sequence_output)
        logits = self.qa_outputs(inputs=my_logits)
        start_logits, end_logits = tf.split(value=logits, num_or_size_splits=2, axis=-1)
        start_logits = tf.squeeze(input=start_logits, axis=-1)
        end_logits = tf.squeeze(input=end_logits, axis=-1)

        loss = None
        if inputs["start_positions"] is not None and inputs["end_positions"] is not None:
            labels = {"start_position": inputs["start_positions"]}
            labels["end_position"] = inputs["end_positions"]
            loss = self.hf_compute_loss(labels=labels, logits=(start_logits, end_logits))

        if not inputs["return_dict"]:
            output = (start_logits, end_logits) + outputs[2:]
            return ((loss,) + output) if loss is not None else output

        return TFQuestionAnsweringModelOutput(
            loss=loss,
            start_logits=start_logits,
            end_logits=end_logits,
            hidden_states=outputs.hidden_states,
            attentions=outputs.attentions,
        )

    def serving_output(self, output: TFQuestionAnsweringModelOutput) -> TFQuestionAnsweringModelOutput:
        hs = tf.convert_to_tensor(output.hidden_states) if self.config.output_hidden_states else None
        attns = tf.convert_to_tensor(output.attentions) if self.config.output_attentions else None
        return TFQuestionAnsweringModelOutput(
            start_logits=output.start_logits, end_logits=output.end_logits, hidden_states=hs, attentions=attns
        )
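A quick usage sketch (not part of the original answer; the inputs below are placeholders): the class loads pretrained weights the same way as the stock model and computes the QA loss itself when start/end positions are passed.
model = MY_TFBertForQuestionAnswering.from_pretrained("bert-base-multilingual-cased")

dummy_ids = tf.ones((2, 256), dtype=tf.int32)          # placeholder batch of 2 sequences
outputs = model(
    input_ids=dummy_ids,
    attention_mask=tf.ones((2, 256), dtype=tf.int32),
    token_type_ids=tf.zeros((2, 256), dtype=tf.int32),
    start_positions=tf.constant([10, 15]),
    end_positions=tf.constant([20, 25]),
)
print(outputs.loss, outputs.start_logits.shape)        # loss plus (2, 256) start logits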
My goal is to use a CNN to go through a picture, then add an array of extra data before the dense layers.
picIn = keras.Input(shape=x[0].shape)
conv1 = layers.Conv2D(32,kernel_size=3,padding='same',use_bias=False)(picIn)
batch1 = layers.BatchNormalization()(conv1)
leaky1 = layers.LeakyReLU(alpha=.3)(batch1)
conv2 = layers.Conv2D(32,kernel_size=3,padding='same',use_bias=False)(leaky1)
batch2 = layers.BatchNormalization()(conv2)
leaky2 = layers.LeakyReLU(alpha=.3)(batch2)
cdrop1 = layers.Dropout(.20)(leaky2)
conv3= layers.Conv2D(64,kernel_size=3,padding='same',use_bias=False)(cdrop1)
batch3 = layers.BatchNormalization()(conv3)
leaky3 = layers.LeakyReLU(alpha=.3)(batch3)
conv4 = layers.Conv2D(64,kernel_size=3,padding='same',use_bias=False)(leaky3)
batch4 = layers.BatchNormalization()(conv4)
leaky4 = layers.LeakyReLU(alpha=.3)(batch4)
cdrop2 = layers.Dropout(.20)(leaky4)
flat1 = layers.Flatten()(cdrop2)
rtheta1 = rtheta[trainCut]
rtheta1 = rtheta1.reshape(467526,1)
rtheta2 = rtheta[testCut]
rtheta2 = rtheta2.reshape(82247,1)
ip2 = keras.Input(shape=rtheta1.shape)
flat2 = layers.Flatten()(ip2)
merge = layers.Concatenate()([flat1,flat2])
hidden1 = layers.Dense(512,use_bias=False)(merge)
batch5 = layers.BatchNormalization()(hidden1)
leaky5 = layers.LeakyReLU(alpha=.3)(batch5)
ddrop1 = layers.Dropout(.20)(leaky5)
hidden2 = layers.Dense(512,use_bias=False)(ddrop1)
batch6 = layers.BatchNormalization()(hidden2)
leaky6 = layers.LeakyReLU(alpha=.3)(batch6)
ddrop2 = layers.Dropout(.20)(leaky6)
hidden3 = layers.Dense(512,use_bias=False)(ddrop2)
batch7 = layers.BatchNormalization()(hidden3)
leaky7 = layers.LeakyReLU(alpha=.3)(batch7)
ddrop3 = layers.Dropout(.20)(leaky7)
output = layers.Dense(1)(ddrop3)
model = keras.Model(inputs = [picIn,ip2], outputs = output)
H = model.fit(x=[x[trainCut], rtheta[trainCut]], y=y[trainCut], batch_size=args.bsize,
              validation_data=([x[testCut], rtheta[testCut]], y[testCut]), epochs=args.epochs)
I always get an error related to the shape of the inputs:
Input 0 of layer dense is incompatible with the layer: expected axis -1 of input shape to have value 473926 but received input with shape [None, 6401]
Model was constructed with shape (None, 467526, 1) for input Tensor("input_2:0", shape=(None, 467526, 1), dtype=float32), but it was called on an input with incompatible shape (None, 1, 1).
I'm confused about what exactly to do here.
x[trainCut] is a matrix of size (467526, 10, 10, 2).
rtheta1 is (467526, 1), and so is y[trainCut].
The validation data is the same, except it has 82247 rows instead of 467526.
I have tried it without flattening after ip2 and I get a different error, but I think the core issue is still the same.
Any help would be appreciated. Thanks!
Edit: The data was not the right shape, obviously, but I figured out how to fix it.
Are you ensuring that all of your training data has a uniform shape before you feed it into the first input tensor?
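For completeness, the usual cause of errors like this (a sketch of my reading, not the poster's exact fix): keras.Input(shape=...) excludes the batch dimension, so a per-sample scalar of shape (n_samples, 1) should be declared as shape=(1,) rather than shape=rtheta1.shape, and the already-split arrays should be passed to fit:
ip2 = keras.Input(shape=(1,))                      # one extra value per sample
merge = layers.Concatenate()([flat1, ip2])         # a (batch, 1) tensor needs no Flatten
# ... dense stack as above, ending in output ...
model = keras.Model(inputs=[picIn, ip2], outputs=output)
H = model.fit(x=[x[trainCut], rtheta1], y=y[trainCut],
              batch_size=args.bsize,
              validation_data=([x[testCut], rtheta2], y[testCut]),
              epochs=args.epochs)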
When I was trying to build an LSTM network, it kept telling me: "ValueError: Error when checking target: expected dense_1 to have 2 dimensions, but got array with shape (1188, 12, 2)".
My dataset has more than 1000 samples and 2 features, and I set the time_step to 12.
I have already reshaped my dataset to 3 dimensions; however, the error says that my last layer, a Dense layer which I use as the output, expected a 2-dimensional array. What should I do?
My code is as follows:
# read train set
readColsPro = (7, 20)
filename = 'train_set.txt'
xProTrain_1 = readCsv.csvMat(filename, 1, cols=readColsPro, rows=[0, 1200])
yProTrain_1 = readCsv.csvMat(filename, 1, cols=readColsPro, rows=[1, 1201])
xProTrain_1 = xProTrain_1.reshape(xProTrain_1.shape[0], 2)
yProTrain_1 = yProTrain_1.reshape(yProTrain_1.shape[0], 2)
# replace NaN values with 0
for i in xProTrain_1:
    if np.isnan(i[1]):
        i[1] = 0
for i in yProTrain_1:
    if np.isnan(i[1]):
        i[1] = 0
# read test set
xProTest_1 = readCsv.csvMat(filename, 1, cols=readColsPro, rows=[1, 1201])
yProTest_1 = readCsv.csvMat(filename, 1, cols=readColsPro, rows=[2, 1202])
xProTest_1 = np.reshape(xProTest_1, (xProTest_1.shape[0], xProTest_1.shape[1]))
yProTest_1 = np.reshape(yProTest_1, (yProTest_1.shape[0], yProTest_1.shape[1]))
for i in xProTest_1:
    if np.isnan(i[1]):
        i[1] = 0
for i in yProTest_1:
    if np.isnan(i[1]):
        i[1] = 0
# parameters
timeStepPro = 12
epoch = 24
batch_size = 24
trainNumPro = xProTrain_1.shape[0]
testNumPro = yProTrain_1.shape[0]
# reshape data to 3D
xProTrain_2 = []
for i in range(timeStepPro, trainNumPro):
    xProTrain_2.append(xProTrain_1[i - timeStepPro:i])
xProTrain_2 = np.array(xProTrain_2)

yProTrain_2 = []
for i in range(timeStepPro, trainNumPro):
    yProTrain_2.append(yProTrain_1[i - timeStepPro:i])
yProTrain_2 = np.array(yProTrain_2)

print(xProTrain_2.shape)
print(yProTrain_2.shape)

# reshape data to 3D
xProTest_2 = []
for i in range(timeStepPro, trainNumPro):
    xProTest_2.append(xProTest_1[i - timeStepPro:i])
xProTest_2 = np.array(xProTest_2)

yProTest_2 = []
for i in range(timeStepPro, trainNumPro):
    yProTest_2.append(yProTest_1[i - timeStepPro:i])
yProTest_2 = np.array(yProTest_2)
# define network
modelA = Sequential()
modelA.add(LSTM(units=64, return_sequences=True,
                input_shape=[xProTrain_2.shape[1], 2]))
modelA.add(BatchNormalization())
modelA.add(LSTM(units=128, return_sequences=True))
modelA.add(LSTM(units=128, return_sequences=True))
modelA.add(LSTM(units=256, return_sequences=True))
modelA.add(LSTM(units=64, return_sequences=False))
modelA.add(Dense(units=2, activation='relu'))
modelA.compile(optimizer='adam',
               loss='mean_squared_error',
               metrics=['accuracy'])
modelA.fit(x=xProTrain_2, y=yProTrain_2, epochs=epoch, batch_size=batch_size)
The error message is as follows:
ValueError: Error when checking target: expected dense_1 to have 2 dimensions, but got array with shape (1188, 12, 2)
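The thread ends here without an answer; a hedged sketch of the usual fix: the last LSTM has return_sequences=False, so dense_1 produces a (batch, 2) output, while the targets are built as whole 12-step windows of shape (batch, 12, 2). Either build a 2-D target or return a sequence from the last layers:
# Option 1: 2-D target - predict only the record that follows each window
yProTrain_2 = np.array([yProTrain_1[i] for i in range(timeStepPro, trainNumPro)])   # (1188, 2)

# Option 2: keep the 3-D target and predict one value per time step
# modelA.add(LSTM(units=64, return_sequences=True))
# modelA.add(Dense(units=2, activation='relu'))     # output shape: (batch, 12, 2)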
Training a model with tf.nn.ctc_loss produces an error every time the train op is run:
tensorflow/core/util/ctc/ctc_loss_calculator.cc:144] No valid path found.
Unlike in previous questions about this function, this is not due to divergence: I have a low learning rate, and the error occurs even on the first train op.
The model is a CNN -> LSTM -> CTC. Here is the model creation code:
# Build Graph
self.videoInput = tf.placeholder(shape=(None, self.maxVidLen, 50, 100, 3), dtype=tf.float32)
self.videoLengths = tf.placeholder(shape=(None), dtype=tf.int32)
self.keep_prob = tf.placeholder(dtype=tf.float32)
self.targets = tf.sparse_placeholder(tf.int32)
self.targetLengths = tf.placeholder(shape=(None), dtype=tf.int32)
conv1 = tf.layers.conv3d(self.videoInput ...)
pool1 = tf.layers.max_pooling3d(conv1 ...)
conv2 = ...
pool2 = ...
conv3 = ...
pool3 = ...
cnn_out = tf.reshape(pool3, shape=(-1, self.maxVidLen, 4 * 7 * 96))
fw_cell = tf.nn.rnn_cell.MultiRNNCell([self.cell() for _ in range(3)])
bw_cell = tf.nn.rnn_cell.MultiRNNCell([self.cell() for _ in range(3)])
outputs, _ = tf.nn.bidirectional_dynamic_rnn(
fw_cell, bw_cell, cnn_out, sequence_length=self.videoLengths, dtype=tf.float32)
outputs = tf.concat(outputs, 2)
outputs = tf.reshape(outputs, [-1, self.hidden_size * 2])
w = tf.Variable(tf.random_normal((self.hidden_size * 2, len(self.char2index) + 1), stddev=0.2))
b = tf.Variable(tf.zeros(len(self.char2index) + 1))
out = tf.matmul(outputs, w) + b
out = tf.reshape(out, [-1, self.maxVidLen, len(self.char2index) + 1])
out = tf.transpose(out, [1, 0, 2])
cost = tf.reduce_mean(tf.nn.ctc_loss(self.targets, out, self.targetLengths))
self.train_op = tf.train.AdamOptimizer(0.0001).minimize(cost)
And here is the feed dict creation code:
indices = []
values = []
shape = [len(vids) * 2, self.maxLabelLen]
vidInput = np.zeros((len(vids) * 2, self.maxVidLen, 50, 100, 3), dtype=np.float32)
# Actual video, then left-right flip
for j in range(len(vids) * 2):
    # k is the video index
    k = j if j < len(vids) else j - len(vids)
    # convert video and label to input format
    vidInput[j, 0:len(vids[k])] = vids[k] if k == j else vids[k][:, ::-1, :]
    indices.extend([j, i] for i in range(len(labelList[k])))
    values.extend(self.char2index[c] for c in labelList[k])
fd[self.targets] = (indices, values, shape)
fd[self.videoInput] = vidInput
# Collect video lengths and label lengths
vidLengths = [len(j) for j in vids] + [len(j) for j in vids]
labelLens = [len(l) for l in labelList] + [len(l) for l in labelList]
fd[self.videoLengths] = vidLengths
fd[self.targetLengths] = labelLens
It turns out that ctc_loss requires the label lengths to be shorter than the input lengths. If the label lengths are too long, the loss calculator cannot unroll completely and therefore cannot compute the loss.
For example, the label BIFI would require an input length of at least 4, while the label BIIF would require an input length of at least 5, due to a blank being inserted between the repeated symbols.
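A small pre-flight check based on that rule (my addition; vids and labelList are the question's variables): the required input length is the label length plus one frame for every pair of adjacent repeated symbols.
def min_input_length(label):
    repeats = sum(1 for a, b in zip(label, label[1:]) if a == b)
    return len(label) + repeats

for vid, label in zip(vids, labelList):
    if len(vid) < min_input_length(label):
        print("label %r needs at least %d frames, video has %d" % (label, min_input_length(label), len(vid)))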
I had the same issue, but I soon realized it was just because I was using glob and my label was taken from the filename, so the label was longer than expected.
You can fix this issue by using:
os.path.join(*(filename.split(os.path.sep)[noOfDir:]))
For me the problem was fixed by setting preprocess_collapse_repeated=True.
FWIW: my target sequence length was already shorter than the inputs, and the RNN outputs are those of a softmax.
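In the model above, that would amount to something like the following (a sketch, not the poster's exact code):
cost = tf.reduce_mean(tf.nn.ctc_loss(self.targets, out, self.targetLengths,
                                     preprocess_collapse_repeated=True))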
Another possible reason, which I found in my case, is that the input data range was not normalized to 0-1; because of that, the LSTM activation function saturates at the beginning of training, which somehow causes the "No valid path found" log.
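For the model above, that normalization could be as simple as scaling the frames before feeding them (a sketch, assuming 8-bit pixel values in 0-255):
vidInput = vidInput.astype(np.float32) / 255.0    # scale pixel values into the 0-1 range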