How to access pytorch embeddings lookup table as a tensor - tensorflow

I want to show my embeddings with the tensorboard projector. I would like to access the embeddings matrix (lookup table) of one of my layers so I can write it to the logs.
I instantiate my layer as this:
self.embeddings_user = torch.nn.Embedding(30,300)
And I'm looking for the tensor with shape (30, 300), i.e. 30 users each with a 300-dimensional embedding, to replace the vectors variable in this sample code:
import numpy as np
import tensorflow as tf
import tensorboard as tb
tf.io.gfile = tb.compat.tensorflow_stub.io.gfile
from torch.utils.tensorboard import SummaryWriter
vectors = np.array([[0,0,1], [0,1,0], [1,0,0], [1,1,1]])
metadata = ['001', '010', '100', '111'] # labels
writer = SummaryWriter()
writer.add_embedding(vectors, metadata)
writer.close()

Embedding layers have a weight attribute that holds the lookup table. You can access it as follows:
vectors = self.embeddings_user.weight  # torch.nn.Parameter of shape (30, 300)
Now you can visualize it with TensorBoard:
import numpy as np
import tensorflow as tf
import tensorboard as tb
tf.io.gfile = tb.compat.tensorflow_stub.io.gfile
from torch.utils.tensorboard import SummaryWriter
vectors = self.embeddings_user.weight
metadata = ['001', '010', '100', '111', ...] # labels
writer = SummaryWriter()
writer.add_embedding(vectors, metadata)
writer.close()
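Note that add_embedding expects one metadata label per row of the matrix, so with nn.Embedding(30, 300) you need 30 labels. A minimal sketch, assuming the same imports and gfile workaround as above (the user_&lt;i&gt; labels are made up for illustration):
vectors = self.embeddings_user.weight.detach().cpu()  # (30, 300) lookup table, detached so no gradients are tracked
metadata = [f"user_{i}" for i in range(vectors.shape[0])]  # one label per user/row
writer = SummaryWriter()
writer.add_embedding(vectors, metadata)
writer.close()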

Related

stacking cnn output layer with xgboost. Data prep gives OOM error

I have trained a CNN model and I am trying to stack its output layer with an XGBoost regressor to reduce MAPE. I get an OOM error in a SageMaker training job when I try to combine the input data (in npy format) with the CNN output layer and save it as CSV, so it can be fed to XGBoost. When I try to run this in a SageMaker notebook instance, the kernel dies. The training input npy file is around 42 GB, and I have tried these instances: ml.m5d.24xlarge, ml.r5.24xlarge.
Here is the code I am running in the notebook:
'''
import numpy as np
import tensorflow as tf
import boto3
from io import BytesIO
from urllib.parse import urlparse
from keras.models import load_model
from keras import backend as K

client = boto3.client("s3")
bucket = <bucket_name>
key = '/path/cnn_model.h5'
client.download_file(bucket, key, 'cnn_model.h5')
cnn_model = load_model("cnn_model.h5")

def read_s3_npy(s3_uri, arg=False):
    # fetch the .npy object from S3 and load it into memory
    parsed_s3 = urlparse(s3_uri)
    obj = client.get_object(Bucket=parsed_s3.netloc, Key=parsed_s3.path[1:])
    return np.load(BytesIO(obj['Body'].read()), allow_pickle=arg)

x_train_path = <path in s3>+'x_train.npy'
y_train_path = <path in s3>+'y_train.npy'
x_train = read_s3_npy(x_train_path)
y_train = read_s3_npy(y_train_path)

# output of the second-to-last layer for every training sample
last_layer_op = K.function([cnn_model.layers[0].input], [cnn_model.layers[-2].output])
train_layer = last_layer_op([x_train, 1])[0]
'''
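One way to avoid materializing the whole 42 GB array at once (a sketch of my own, not from the post above) is to download the .npy file to local disk, open it with mmap_mode='r' so slices are read lazily, and run the CNN feature extraction in batches, appending each batch to the CSV. The file name and batch size below are illustrative assumptions; last_layer_op is the K.function defined above.
import numpy as np
# memory-map the array instead of loading 42 GB into RAM
x_train = np.load('x_train.npy', mmap_mode='r')
batch_size = 1024  # illustrative value, tune to the instance memory
with open('cnn_features.csv', 'ab') as f:
    for start in range(0, x_train.shape[0], batch_size):
        batch = np.asarray(x_train[start:start + batch_size])  # pull only one slice into memory
        feats = last_layer_op([batch, 1])[0]
        np.savetxt(f, feats, delimiter=',')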

ValueError: Failed to convert a NumPy array to a Tensor (Unsupported object type tensorflow.python.framework.ops.EagerTensor)

I'm trying to use huggingface and tensorflow to train a BERT model on some data. Here's my code:
First, I initialized the tokenizer.
from transformers import BertTokenizer
tokenizer = BertTokenizer.from_pretrained('bert-base-uncased', sep_token = "||")
Then applied my tokenizer to my data.
def preprocess_function(x):
    return tokenizer(x, truncation = True, return_tensors = 'tf')['input_ids']
from tqdm import tqdm
tqdm.pandas()
df["Text"] = df["Text"].progress_apply(preprocess_function)
And some more preprocessing:
df["intvwStatus"] = [0 if x == "Completed" else 1 for x in df["intvwStatus"]]
import numpy as np
train, validate, test = np.split(df.sample(frac=1, random_state=42),
                                 [int(.6*len(df)), int(.8*len(df))])
Created an optimizer
from transformers import create_optimizer
import tensorflow as tf
batch_size = 16
num_epochs = 5
batches_per_epoch = len(train) // batch_size
total_train_steps = int(batches_per_epoch * num_epochs)
optimizer, schedule = create_optimizer(init_lr=2e-5, num_warmup_steps=0, num_train_steps=total_train_steps)
And then finally instantiated and compiled my model
from transformers import TFBertForSequenceClassification
model = TFBertForSequenceClassification.from_pretrained("bert-base-uncased")
import tensorflow as tf
model.compile(optimizer=optimizer)
Then fit my model
x_train = train["Text"]
y_train = train["intvwStatus"]
x_val = validate["Text"]
y_val = validate["intvwStatus"]
model.fit(x=x_train,y=y_train, validation_data=(x_val, y_val), epochs=3)
Which gives error:
ValueError: Failed to convert a NumPy array to a Tensor (Unsupported object type tensorflow.python.framework.ops.EagerTensor).
I'm confused. Why is it mistaking a tensorflow.python.framework.ops.EagerTensor for a NumPy array that it cannot convert?
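One likely cause (my reading, not stated in the original post): after progress_apply, the Text column holds one EagerTensor per row, so model.fit receives a pandas Series of tensor objects of varying lengths and fails to convert them into a single array. A hedged sketch of a common workaround, assuming the raw strings are still available (i.e. skipping the per-row progress_apply step), is to tokenize the whole column at once with padding so the inputs become one batched tensor:
import numpy as np
train_enc = tokenizer(list(train["Text"]), truncation=True, padding=True, return_tensors='tf')
val_enc = tokenizer(list(validate["Text"]), truncation=True, padding=True, return_tensors='tf')
model.fit(x=dict(train_enc),
          y=np.array(train["intvwStatus"]),
          validation_data=(dict(val_enc), np.array(validate["intvwStatus"])),
          epochs=3)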

what's the meaning of 'input_length'?

The data has 4 timesteps, but the embedding's input_length=3, so what is the meaning of input_length?
from tensorflow import keras
import numpy as np
data = np.array([[0,0,0,0]])
emb = keras.layers.Embedding(input_dim=2, output_dim=3, input_length=3)
emb(data)
As per the official documentation here,
input_length: Length of input sequences, when it is constant. This argument is required if you are going to connect Flatten then Dense layers upstream (without it, the shape of the dense outputs cannot be computed).
from tensorflow import keras
import numpy as np
model = keras.models.Sequential()
model.add(keras.layers.Embedding(input_dim=2, output_dim=3, input_length=4))
# the model will take as input an integer matrix of size (batch, input_length).
input_array = np.array([[0,0,0,0]])
model.compile('rmsprop', 'mse')
output_array = model.predict(input_array)
print(output_array)
The above works fine, but if you change input_length to 3, you will get the error below:
ValueError: Error when checking input: expected embedding_input to
have shape (3,) but got array with shape (4,)
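To make the documented reason concrete, here is a minimal sketch of my own (not part of the original answer): with a Flatten followed by a Dense layer, the flattened size is input_length * output_dim, so Keras needs input_length to build the Dense layer's weights.
from tensorflow import keras
import numpy as np
model = keras.models.Sequential()
model.add(keras.layers.Embedding(input_dim=2, output_dim=3, input_length=4))
model.add(keras.layers.Flatten())  # flattened size = input_length * output_dim = 12
model.add(keras.layers.Dense(1))   # needs that size to create its weight matrix
model.compile('rmsprop', 'mse')
print(model.predict(np.array([[0, 0, 0, 0]])).shape)  # (1, 1)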

ValueError: Cannot feed value of shape (1, 1) for Tensor 'Placeholder_765:0', which has shape '(1,)'

I have trained an InceptionV3 from scratch on a custom dataset containing 100 classes, and initialized the CNN model in Keras. I am now trying to generate adversarial examples for this model using Foolbox, however I am getting the above error. Where am I going wrong? The library (Foolbox) seems to be working fine for others, and my model gets through the image classification step correctly without any error, but the wrapped Foolbox model raises it.
import keras
from keras.models import load_model
from keras.applications.vgg16 import VGG16
import foolbox
from foolbox.models import KerasModel
from foolbox.attacks import LBFGSAttack
from foolbox.criteria import TargetClass
import numpy as np
keras.backend.set_learning_phase(0)
model = load_model('standard_inceptionV3.h5')
fmodel = foolbox.models.KerasModel(model, bounds=(0, 255))
from PIL import Image
img = Image.open('/home/shikhar/Downloads/suit.jpeg')
img = img.resize((224,224))
img = np.asarray(img)
img = img[:, :, :3]
lab=model.predict(np.expand_dims(img, axis=0))
label=np.argmax(lab,axis=1)
from foolbox.criteria import Misclassification, TargetClass
attack = foolbox.attacks.FGSM(model=fmodel)
adversarial = attack(img, label,unpack=False)
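The traceback says a value of shape (1, 1) was fed to a placeholder of shape (1,), which matches label here being an array of shape (1,) (from np.argmax with axis=1) rather than a scalar. A hedged guess at a fix, not confirmed by the post: pass the class index as a plain int.
# np.argmax(lab, axis=1) returns an array of shape (1,); older Foolbox versions
# expect a scalar class index here, so unwrap it before running the attack
label = int(np.argmax(lab, axis=1)[0])
attack = foolbox.attacks.FGSM(model=fmodel)
adversarial = attack(img, label, unpack=False)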

Tensorflow data import

I just started to use tensorflow, but I failed to import the data properly to use it with the DNNClassifier. I actually have two files in HDF5 format that I import with pandas. The feature vector has dimension 100, and there are 5 classes the features can belong to. If I use, for example, the following code:
import pandas as pd
import numpy as np
import tensorflow as tf

# Data
train = pd.read_hdf("train.h5", "train")
test = pd.read_hdf("test.h5", "test")
Y = train.iloc[0:, 0]
X = train.iloc[0:, 1:]
X_t = test.iloc[0:, 0:]
Y = np.array(Y.values).astype('int')
X = np.array(X.values).astype('double')
X_t = np.array(X_t.values).astype('double')

# Train
feature_columns = [tf.contrib.layers.real_valued_column("", dimension=100)]
classifier = tf.contrib.learn.DNNClassifier(feature_columns=feature_columns,
                                            hidden_units=[10, 20],
                                            n_classes=5,
                                            model_dir="/tmp/model")

# Define the training inputs
def get_train_inputs():
    x = tf.constant(X)
    y = tf.constant(Y)
    return x, y

# Fit
classifier.fit(input_fn=get_train_inputs, steps=1000)
predictions = list(classifier.predict(input_fn=get_train_inputs))
print(predictions)
I get the error: InvalidArgumentError (see above for traceback): Shape in shape_and_slice spec [100,10] does not match the shape stored in checkpoint: [1,10]
[[Node: save/RestoreV2_2 = RestoreV2[dtypes=[DT_FLOAT], _device="/job:localhost/replica:0/task:0/cpu:0"](_recv_save/Const_0, save/RestoreV2_2/tensor_names, save/RestoreV2_2/shape_and_slices)]]
I don't get why this happens. How should I transform my data so it works with this classifier?
My solution: change model_dir="/tmp/model" to a fresh directory, e.g.
model_dir="/tmp/model-1"
Note: it does not have to be model-1; any new, empty directory works, for example model_dir="/tmp/model-a". The error occurs because /tmp/model already contains a checkpoint saved by an earlier model with different layer shapes, and the classifier tries to restore that checkpoint into the new graph.
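Equivalently (my own suggestion, not part of the original answer), you can delete the stale checkpoint directory and keep the same model_dir:
import shutil
# remove the checkpoint left over from a previous run with a different architecture,
# so the classifier starts training from scratch in the same directory
shutil.rmtree("/tmp/model", ignore_errors=True)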