Feed LSTM layer with different feature sizes - tensorflow

For a NLP task, I want to train a neural network including LSTM layer. Input data has multiple features with different dimensions. Like this:
x1: [embedding(1,300), POS_idx(1,1), feature_x(1,20), feature_y(1,10)]
(each feature with it's shape)
My question is that how to feed this kind of data to LSTM layer? is concatenating all features to a single array works well?

Related

Best way to add features after last convolutional layer, before fully-connected layers?

I am working on a regression problem related to chess. The output will depend on about 68 values that are given by Stockfish's static evaluation function (example output shown here), as well as the state of the board. However, the static eval features should not be passed through the CNN, only through the final fully-connected layers. Therefore I want to have some convolutional layers take the (one-hot encoded) board state down to a flat vector, then extend it with the other features before passing the full vector to a fully-connected layer.
How can I use Tensorflow to combine these two feature vectors (the result from the CNN and the other game-related features) within a single Layer type that can be added to a Sequential? I couldn't find anything that would handle this in the docs. Would subclassing Layer be the only way to go?

How to implement skip connections from MSG-GAN paper

I am trying to implement the technique described in the MSG-GAN paper:
https://arxiv.org/pdf/1903.06048.pdf
But I am having difficulty understanding some things, for example, how are the connections made from the generator to the discriminator? These are Conv2D connections literally? (in that case, how would I insert the real images to train the discriminator?) Or does the discriminator have multiple outputs (one prediction for each resolution and the generator has to optimize the average loss of the resolutions)?
How are the connections made from the generator to the discriminator?
Generator output them and discriminator concatenate them with feature maps from last layer at corresponding input layer.
These are Conv2D connections literally?
Those just input tensor with shape like(batch size, W, H, 3), same as ordinary image input.
Does the discriminator have multiple outputs?
No, this is end to end training, training with all resolution outputs at same time, otherwise it will just like the Progressive Growing GAN and no reason for concatenate operation at each input layer(beginning of each block) of discriminator.
this is only a partial answer.
I would say that if you had to implement this in keras, and you don't want each model (G and D) to be in one piece, it's actually easier to have them separated and then use tf.GradientTape() to train
Does the discriminator have multiple outputs?
yes, if implementing them as separate models, yes there will be multiple inputs and multiple outputs of multiple resolutions, only one of those is the final output.

What are the uses of layers in keras/Tensorflow

So I am new to computer vision, and I do not really know what the layers do in keras. What is the use of adding layers (dense, Conv2D, etc) in keras? What do they add to it?
Convolution neural network has 4 main steps: Convolution, Pooling, Flatten, and Full connection.
Conv2D(), Conv3D(), etc. is for Feature extraction (It's a Convolution Layer).
Pooling layers (MaxPool2D(), AvgPool2D(), etc) is for Feature extraction as well (It has different operation though).
Flattening layers (Flatten() ) are to convert the extracted feature map into Vector before being fed into the Fully connection layers (The Dense layers).
Dense layers are for Fully connected step in Computer vision that acts as Classifier (The Neural network classify each extracted features from the Convolution layers.)
There are also optimization layers such as Dropout(), BatchNormalization(), etc.
For more information, just open the keras documentation.
If you want to start learning Convolution neural network, this article may help.
A layer in an Artificial Neural Network is a bunch of nodes bound together at a specific depth in a Neural Network. Keras is a high level API used over NN modules like TensorFlow or CNTK in order to simplify tasks. A Keras layer comprises 3 main parts:
Input Layer - Which contains the raw data
Hidden layer - Where the nodes of a layer learn some aspects about
the raw data which is input. It's similar to levels of abstraction
to form a Neural network.
Output Layer - Consists of a single output which is mostly a single
node and can be subjected to classification.
Keras, as a whole consists of many different types of layers. A Convolutional layer creates a kernel which is convoluted with the input over a single temporal space to derive a group of outputs. Pooling layers provide sampling of the feature maps by simplifying features in a map into patches. Max Pooling and Average Pooling are commonly used methods in a Pool layer.
Other commonly used layers in Keras are Embedding layers, Noise layers and Core layers. A single NN layer can represent only a Linearly seperable method. Most prediction problems are complicated and more than just one layer is required. This is where Multi Layer concept is required.
I think i clear your doubts and for any other queries you can see on https://www.tensorflow.org/api_docs/python/tf/keras
Neural networks are a great tool nowadays to automate classification problems. However when it comes to computer vision the amount of input data is too great to be handled efficiently by simple neural networks.
To reduce the network workload, your data needs to be preprocessed and certain features need to be identified. To find features in images we can use certain filters (like sobel edge detection), which will highlight the essential features needed for classification.
Again the amount of filters required to classify one image is too great, and thus the selection of those filters needs to be automated.
That's where the convolutional layer comes in.
We use a convolutional layer to generate multiple random (at first) filters that will highlight certain features in an image. While the network is training those filters are optimized to do a better job at highlighting features.
In Tensorflow we use Conv2D() to add one of those layers. An example of parameters is : Conv2D(64, 3, activation='relu'). 64 denotes the number of filters used, 3 denotes the size of the filters (in this case 3x3) and activation='relu' denotes the activation function
After the convolutional layer we use a pooling layer to further highlight the features produced by the previous convolutional layer. In Tensorflow this is usually done with MaxPooling2D() which takes the filtered image and applies a 2x2 (by default) layer every 2 pixels. The filter applied by MaxPooling is basically looking for the maximum value in that 2x2 area and adds it in a new image.
We can use this set of convolutional layer and pooling layers multiple times to make the image easier for the network to work with.
After we are done with those layers, we need to pass the output to a conventional (Dense) neural network.
To do that, we first need to flatten the image data from a 2D Tensor(Matrix) to a 1D Tensor(Vector). This is done by calling the Flatten() method.
Finally we need to add our Dense layers which are used to train on the flattened data. We do this by calling Dense(). An example of parameters is Dense(64, activation='relu')
where 64 is the number of nodes we are using.
Here is an example CNN structure I used recently:
# Build model
model = tf.keras.models.Sequential()
# Convolution and pooling layers
model.add(tf.keras.layers.Conv2D(64, 3, activation='relu', input_shape=(IMG_SIZE, IMG_SIZE, 1))) # Input layer
model.add(tf.keras.layers.MaxPooling2D())
model.add(tf.keras.layers.Conv2D(64, 3, activation='relu'))
model.add(tf.keras.layers.MaxPooling2D())
# Flattened layers
model.add(tf.keras.layers.Flatten())
# Dense layers
model.add(tf.keras.layers.Dense(64, activation='relu'))
model.add(tf.keras.layers.Dense(2, activation='softmax')) # Output layer
Of course this worked for a certain classification problem and the number of layers and method parameters differ depending on the problem.
The Youtube channel The Coding Train has a very helpful video explaining the Convolutional and Pooling layer.

What is the difference between conv1d with kernel_size=1 and dense layer?

I am building a CNN with Conv1D layers, and it trains pretty well. I'm now looking into how to reduce the number of features before feeding it into a Dense layer at the end of the model, so I've been reducing the size of the Dense layer, but then I came across this article. The article talks about the effect of using a Conv2D filters with a kernel_size=(1,1) to reduce the number of features.
I was wondering what the difference is between using a Conv2D layer with kernel_size=(1,1) tf.keras.layers.Conv2D(filters=n,kernel_size=(1,1)) and using a Dense layer of the same size tf.keras.layers.Dense(units=n)? From my perspective (I'm relatively new to neural nets), a filter with kernel_size=(1,1) is a single number, which is essentially equivalent to weight in a Dense layer, and both layers have biases, so are they equivalent, or am I misunderstanding something? And if my understanding is correct, in my case where I am using Conv1D layers, not Conv2D layers, does that change anything? As in is tf.keras.layers.Conv1D(filters=n, kernel_size=1) equivalent to tf.keras.layers.Dense(units=n)?
Please let me know if you need anything from me to clarify the question. I'm mostly curious about if Conv1D layers with kernel_size=1 and Conv2D layers with kernel_size=(1,1) behave differently than Dense layers.
Yes, since Dense layer is applied on the last dimension of its input (see this answer), Dense(units=N) and Conv1D(filters=N, kernel_size=1) (or Dense(units=N) and Conv2D(filters=N, kernel_size=1)) are basically equivalent to each other both in terms of connections and number of trainable parameters.
In 1D CNN, the kernel moves in 1 direction. The input and output data of 1D CNN is 2 dimensional. Mostly used on Time-Series Data, Natural Language Processing tasks etc. Definitely gonna see people using it in Kaggle NLP competitions and notebooks.
In 2D CNN, the kernel moves in 2 directions. The input and output data of 2D CNN is 3 dimensional. Mostly used on Image data.
Definitely gonna see people using it in Kaggle CNN Image Processing competitions and notebooks
In 3D CNN, the kernel moves in 3 directions. The input and output data of 3D CNN is 4 dimensional. Mostly used on 3D Image data (MRI, CT Scans). Haven't personally seen applied version in competitions

Weights update in Tensorflow embedding layer with pretrained fasttext weights

I'm not sure if my understanding is correct but...
While training a seq2seq model, one of the purpose I want to initiated a set of pre-trained fasttext weights in the embedding layers is to decrease the unknown words in the test environment (these unknown words are not in training set). Since pre-trained fasttext model has larger vocabulary, during test environment, the unknown word can be represented by fasttext out-of-vocabulary word vectors, which supposed to have similar direction of the semantic similar words in the training set.
However, due to the fact that the initial fasttext weights in the embedding layers will be updated through the training process (updating weights generates better results). I am wondering if the updated embedding weights would distort the relationship of semantic similarity between words and undermine the representation of fasttext out-of-vocabulary word vectors? (and, between those updated embedding weights and word vectors in the initial embedding layers but their corresponding ID didn't appear in the training data)
If the input ID can be distributed represented vectors extracted from pre-trained model and, then, map these pre-trained word vectors (fixed weights while training) via a lookup table to the embedding layers (these weights will be updated while training), would it be a better solution?
Any suggestions will be appreciated!
You are correct about the problem: when using pre-trained vector and fine-tuning them in your final model, the words that are infrequent or hasn't appear in your training set won't get any updates.
Now, usually one can test how much of the issue for your particular case this is. E.g. if you have a validation set, try fine-tuning and not fine-tuning the weights and see what's the difference in model performance on validation set.
If you see a big difference in performance on validation set when you are not fine-tuning, here is a few ways to handle this:
a) Add a linear transformation layer after not-trainable embeddings. Fine-tuning embeddings in many cases does affine transformations to the space, so one can capture this in a separate layer that can be applied at test time.
E.g. A is pre-trained embedding matrix:
embeds = tf.nn.embedding_lookup(A, tokens)
X = tf.get_variable("X", [embed_size, embed_size])
b = tf.get_vairable("b", [embed_size])
embeds = tf.mul(embeds, X) + b
b) Keep pre-trained embeddings in the not-trainable embedding matrix A. Add trainable embedding matrix B, that has a smaller vocab of popular words in your training set and embedding size. Lookup words both in A and B (and if word is out of vocab use ID=0 for example), concat results and use it input to your model. This way you will teach your model to use mostly A and sometimes rely on B for popular words in your training set.
fixed_embeds = tf.nn.embedding_lookup(A, tokens)
B = tf.get_variable("B", [smaller_vocab_size, embed_size])
oov_tokens = tf.where(tf.less(tokens, smaller_vocab_size), tokens, tf.zeros(tf.shape(tokens), dtype=tokens.dtype))
dyn_embeds = tf.nn.embedding_lookup(B, oov_tokens)
embeds = tf.concat([fixed_embeds, dyn_embeds], 1)