How to make XGBoost output ints, not objects? - text-mining

I am using XGBoost to classify news as fake or true. I have vectorized the text with a TF-IDF vectorizer, trained my XGBClassifier, and run predict on X_test successfully.
The error appears when I calculate accuracy, precision, recall, and F-measure. The message is:
TypeError: '<' not supported between instances of 'str' and 'int'.
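This error usually means the true labels and the predicted labels have mixed types (strings on one side, ints on the other). Below is a minimal sketch, not the asker's code, using a tiny made-up corpus and LabelEncoder so that both the training labels and the predictions end up as ints:

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.preprocessing import LabelEncoder
from sklearn.metrics import accuracy_score, precision_recall_fscore_support
from xgboost import XGBClassifier

# Tiny made-up corpus, only for illustration
texts = ["shocking claim with no source", "official report released today",
         "celebrity secret cure revealed", "new study published in journal"]
labels = ["fake", "true", "fake", "true"]

X = TfidfVectorizer().fit_transform(texts)

le = LabelEncoder()
y = le.fit_transform(labels)        # 'fake'/'true' -> 0/1, so everything is int

clf = XGBClassifier(n_estimators=10)
clf.fit(X, y)
y_pred = clf.predict(X)             # integer predictions, comparable to y

acc = accuracy_score(y, y_pred)
prec, rec, f1, _ = precision_recall_fscore_support(y, y_pred, average="binary")

Alternatively, mapping the predictions back to strings with le.inverse_transform(y_pred) before scoring also keeps both sides the same type.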

Related

Converting tensorflow dataset to numpy array

I have an autoencoder defined using tf.keras in tensorflow 1.15. I cannot upgrade to tensorflow 2.0 for specific reasons.
This particular autoencoder is used for anomaly detection. I currently compute the AUC score of the autoencoder as follows:
1. All anomalous inputs are labelled 1 and all normal inputs are labelled 0. This is y_true.
2. I feed the autoencoder with unseen inputs and then measure the reconstruction error, like so: errors = np.mean(np.square(data - model.predict(data)), axis=-1)
3. The mean of this array is then taken to be the predicted label, y_pred.
4. I then compute the AUC using auc = metrics.roc_auc_score(y_true, y_pred).
This approach works well. I now need to move towards using tf.data.Dataset to feed in my data; previously, it was numpy arrays. The issue is that I am unable to convert the tf.data.Dataset to a numpy array and hence unable to compute the mean squared error as described in step 2 above.
Once I have a tf.data.Dataset, I feed it for prediction like so: results = model.predict(x_test)
This yields a numpy array, results. I want to compute the mean squared error of results with x_test. However, x_test is of type tf.data.Dataset. So the question is: how can I convert a tf.data.Dataset to a numpy array in tensorflow 1.15, or what is an alternative way to do this?
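In TF 1.15 graph mode, one way to get a numpy array out of the dataset is to drain it through a one-shot iterator inside a session. A rough sketch under that assumption (the toy dataset below just stands in for x_test):

import numpy as np
import tensorflow as tf  # 1.15

data = np.random.rand(10, 4).astype(np.float32)        # stand-in for the real inputs
x_test = tf.data.Dataset.from_tensor_slices(data).batch(3)

iterator = x_test.make_one_shot_iterator()
next_batch = iterator.get_next()

batches = []
with tf.Session() as sess:
    while True:
        try:
            batches.append(sess.run(next_batch))        # each run returns a numpy array
        except tf.errors.OutOfRangeError:
            break

x_test_np = np.concatenate(batches, axis=0)
# x_test_np can now be used as before:
# errors = np.mean(np.square(x_test_np - model.predict(x_test_np)), axis=-1)

If eager execution is enabled, iterating the dataset directly and calling .numpy() on each element is another option.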

Calculate gradient of trained network

Assume I have a trained model; I am trying to calculate its Jacobian (to understand some of its mathematical properties after training). I am trying to use autograd as follows:
from autograd import jacobian
jacobian_pred=jacobian(model.predict)
jacobian_pred(x)
where x is from my training set. It raises an error:
TypeError: object of type 'numpy.float32' has no len()
What should I do?
Thanks!
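One thing worth noting: autograd can only differentiate plain numpy code, so it cannot see inside model.predict of a Keras/TensorFlow model. If the trained model is a tf.keras model (an assumption here, not stated in the question), a sketch using tf.GradientTape.jacobian would look roughly like this:

import numpy as np
import tensorflow as tf

# Stand-in model; replace with the actual trained model
model = tf.keras.Sequential([tf.keras.layers.Dense(3, input_shape=(4,))])

x = tf.constant(np.random.rand(1, 4).astype(np.float32))  # one sample from the training set

with tf.GradientTape() as tape:
    tape.watch(x)
    y = model(x)                      # call the model directly, not model.predict

jac = tape.jacobian(y, x)             # shape (1, 3, 1, 4): d y_j / d x_i
print(jac.shape)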

My tensorflow 2.0 custom model is not receiving the shape or values I expect

I'm in the process of converting my pytorch models into tensorflow 2.0, so I'm still getting used to it. I have mostly worked from the API docs. I made a custom model and defined its call method with the argument inputs:
class CustomModel(tf.keras.Model):
    <... init...>
    def call(self, inputs):
        print("inputs: ", inputs)
        self.sequential_convolution(inputs)
The sequential_convolution attribute is a keras.Sequential of multiple convolution-related layers. I can create the model object and compile it. The model is variable-length on both input and output:
model = CustomModel(inputs=tf.keras.Input(shape=(None, vdim)))
model.compile(optimizer=optimizer, loss=loss_func, metrics=[calc_accuracy])

for x, y in dataset:
    print("x.shape: ", x.shape)
    print("y.shape: ", y.shape)
    model.fit(x, y, batch_size=1)
Where the shapes are x.shape: (244, 161) and y.shape: (40,). Both are Tensorflow tensors created from numpy arrays with tf.convert_to_tensor().
But when the model's call method prints the inputs, I get the following:
Tensor("input_1_1:0", shape=(None, 161), dtype=float32)
Which, I should point out, is not the Input defined on the model; this shape is derived from the actual input provided to model.fit() (I manually changed the numbers to confirm what causes it).
Which then ultimately leads to the stack trace:
x = self.sequential_conv(inputs)
/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/autograph/impl/api.py:396 converted_call
return py_builtins.overload_of(f)(*args)
TypeError: 'NoneType' object is not callable
This error occurs in a function marked for internal use only, so I am not able to ascertain the cause of my problem.
As I can't find much information on the matter, I feel that it's most likely something simple I haven't done, but I'm not sure. Any help would be great...
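For what it's worth, one plausible explanation (an assumption, not a confirmed diagnosis): model.fit treats the first axis of x as the batch axis, so a (244, 161) tensor is interpreted as 244 samples of shape (161,), and call() is traced with shape (None, 161). Adding an explicit leading batch dimension keeps the whole (244, 161) sequence together as a single sample:

import tensorflow as tf

x = tf.random.normal((244, 161))        # stand-in for the real input tensor
y = tf.random.normal((40,))             # stand-in for the real target tensor

x_batched = tf.expand_dims(x, axis=0)   # shape (1, 244, 161)
y_batched = tf.expand_dims(y, axis=0)   # shape (1, 40)
print(x_batched.shape, y_batched.shape)

# model.fit(x_batched, y_batched, batch_size=1) would then pass the whole
# (244, 161) sequence to call() as one sample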

Why does the Keras to_categorical method not return a 3-D tensor when given a 2-D tensor?

I was trying to build an LSTM neural net with Keras to predict tags for words in a set of sentences.
The implementation is all pretty straightforward, but the surprising thing was that, given exactly the same and otherwise correctly implemented code, using Tensorflow 1.4.0 with Keras running on the Tensorflow backend, on some people's computers it returned tensors with wrong dimensions, while for others it worked perfectly.
The problem occurred in the following context:
First, we turned the list of training sentences (sentences as a list of word indices) into a 2-D matrix using the pad_sequences method from Keras (https://keras.io/preprocessing/sequence/):
def do_padding(sequences, length, padding_value):
    return pad_sequences(sequences, maxlen=length, padding='post',
                         truncating='post', value=padding_value)

train_sents_padded = do_padding(train_sents, MAX_LENGTH,
                                word_to_id[PAD_TOKEN])
Next, we used our do_padding method on the corresponding training labels to turn them into a padded matrix. At the same time, we used the Keras to_categorical method (https://keras.io/utils/#to_categorical) to add a one-hot encoded vector to the created label matrix (one one-hot vector for each cell in the matrix, that is, for each word in each training sentence):
train_labels_padded = to_categorical(do_padding(train_labels, MAX_LENGTH,
                                                label_to_id["O"]), NUM_LABELS)
We expected the resulting shape to be 3-D: (len(train_labels), MAX_LENGTH, NUM_LABELS). Yet, we found that the resulting shape was 2-D and basically looked like this: ((len(train_labels) x MAX_LENGTH), NUM_LABELS), meaning the numbers on the two expected dimensions len(train_labels) and MAX_LENGTH were multiplied and flattened into one dimension.
Interestingly, as said before, this problem only occurred for about 50% of the people, all using Tensorflow 1.4.0 and Keras running on the Tensorflow backend.
We managed to solve the problem by reshaping the label matrix manually:
train_labels_padded = np.reshape(train_labels_padded, (len(train_labels),
                                                       MAX_LENGTH, NUM_LABELS))
I was just wondering if any of you have experienced a similar problem and have figured out the reason why this happens.
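For anyone hitting the same thing, the behaviour is easy to check in isolation. On recent Keras versions, to_categorical preserves the input shape and appends the class axis, so an (N, L) label matrix becomes (N, L, C); older versions flattened the input first, which matches the reshape workaround above. A small, self-contained illustration (not the asker's data):

import numpy as np
from keras.utils import to_categorical   # or tensorflow.keras.utils

NUM_LABELS = 4
labels_2d = np.array([[0, 1, 2],
                      [3, 0, 1]])        # shape (2, 3): 2 sentences, 3 words each

one_hot = to_categorical(labels_2d, NUM_LABELS)
print(one_hot.shape)                     # (2, 3, 4) on recent versions

# If a flattened (2 * 3, 4) array comes back instead, reshape it manually:
one_hot = one_hot.reshape(labels_2d.shape[0], labels_2d.shape[1], NUM_LABELS)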

session.run error when implementing a pairwise CNN with tensorflow

I am very new to tensorflow. I am now implementing a pairwise CNN model, but I got the error:
tensorflow.python.pywrap_tensorflow.StatusNotOK: Not found:
FeedInputs: unable to find feed output Placeholder:0
The error is attributed to the code:
_, train_loss, train_feature1, train_feature2 = sess.run([model.train_op, model.cost, model.feature1, model.feature2], feed)
Specifically, in the sess.run call, feed is a dictionary containing the input data and the corresponding labels. feature1 and feature2 are the outputs of the same neural network, and finally I would like to compare feature1 and feature2.
The gist of the code is:
with self.graph.device('/gpu:1'):
    self.inputs1 = tf.placeholder(tf.float32, [self.batchSize, self.inputH, self.inputW, hp.channel])
    self.inputs2 = tf.placeholder(tf.float32, [self.batchSize, self.inputH, self.inputW, hp.channel])
    self.labels = tf.placeholder(tf.float32, [self.batchSize])
    self.feature1 = forward(self.inputs1)
    self.feature2 = forward(self.inputs2)
    self.cost = tf.reduce_sum(tf.mul(-1.0 * self.labels, tf.sub(self.feature1, self.feature2)))
    self.tvars = tf.trainable_variables()
    grads = tf.gradients(self.cost, self.tvars)
    optimizer = tf.train.MomentumOptimizer(learning_rate=self.learning_rate, momentum=self.momentum)
    self.train_op = optimizer.apply_gradients(zip(grads, self.tvars))
Here I create three placeholders and define operations like train_op. forward is the neural network used in my implementation. Then, in the training part, I load the text data and labels with data_loader.next_batch() and finally run the optimization operation with sess.run(). The code is below.
for b in xrange(data_loader.batch_num):
    pbar.update(int((b / data_loader.batch_num) * 100))
    time.sleep(0.01)
    maps1, maps2, labels = data_loader.next_batch()
    feed = {model.labels: labels, model.inputs1: maps1, model.inputs2: maps2}
    _, train_loss, train_feature1, train_feature2 = sess.run(
        [model.train_op, model.cost, model.feature1, model.feature2], feed)
NEW ERROR
I tried changing maps1 to Tensors instead of numpy ndarrays, and the error above disappeared. Unfortunately, a new error was raised:
File "Train.py", line 55, in train _,train_loss,train_feature1,train_feature2=sess.run([model.train_op,model.cost,model.feature1,model.feature2],feed)
File "/usr/lib/python2.7/site-packages/tensorflow/python/client/session.py",
line 334, in run
np_val = np.array(subfeed_val, dtype=subfeed_t.dtype.as_numpy_dtype)
ValueError: setting an array element with a sequence.
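A final note on the feed itself, offered as an assumption rather than a confirmed fix: feed_dict values should be plain numpy arrays (not Tensors) whose shapes and dtypes match the placeholders exactly, and "setting an array element with a sequence" usually means the fed value is ragged or has an inconsistent shape. A rough sketch of what the feed is expected to look like (all sizes below are hypothetical):

import numpy as np

batch_size, input_h, input_w, channel = 8, 32, 32, 3     # hypothetical sizes

maps1 = np.zeros((batch_size, input_h, input_w, channel), dtype=np.float32)
maps2 = np.zeros((batch_size, input_h, input_w, channel), dtype=np.float32)
labels = np.zeros((batch_size,), dtype=np.float32)

# With the model above, the feed would then be:
# feed = {model.inputs1: maps1, model.inputs2: maps2, model.labels: labels}
# sess.run([model.train_op, model.cost, model.feature1, model.feature2], feed)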