How to make spaCy train faster on NER for the Persian language - tensorflow

I have a blank model from spaCy. For the config file I used the Training Pipelines & Models widget with these settings:
Language = Arabic
Components = ner
Hardware = CPU
Optimize for = accuracy
Then in the config file I changed:
[nlp]
lang = "ar"
to
[nlp]
lang = "fa"
because there is no pretrained transformer (GPU) pipeline for the Persian language.
As you know, the accuracy setting is very slow, and I have 400,000 records.
This is my config file:
[paths]
train = null
dev = null
vectors = null
[system]
gpu_allocator = null
[nlp]
lang = "fa"
pipeline = ["tok2vec","ner"]
batch_size = 1000
[components]
[components.tok2vec]
factory = "tok2vec"
[components.tok2vec.model]
#architectures = "spacy.Tok2Vec.v2"
[components.tok2vec.model.embed]
#architectures = "spacy.MultiHashEmbed.v2"
width = ${components.tok2vec.model.encode.width}
attrs = ["ORTH", "SHAPE"]
rows = [5000, 2500]
include_static_vectors = true
[components.tok2vec.model.encode]
#architectures = "spacy.MaxoutWindowEncoder.v2"
width = 256
depth = 8
window_size = 1
maxout_pieces = 3
[components.ner]
factory = "ner"
[components.ner.model]
#architectures = "spacy.TransitionBasedParser.v2"
state_type = "ner"
extra_state_tokens = false
hidden_width = 64
maxout_pieces = 2
use_upper = true
nO = null
[components.ner.model.tok2vec]
#architectures = "spacy.Tok2VecListener.v1"
width = ${components.tok2vec.model.encode.width}
[corpora]
[corpora.train]
@readers = "spacy.Corpus.v1"
path = ${paths.train}
max_length = 0
[corpora.dev]
@readers = "spacy.Corpus.v1"
path = ${paths.dev}
max_length = 0
[training]
dev_corpus = "corpora.dev"
train_corpus = "corpora.train"
[training.optimizer]
@optimizers = "Adam.v1"
[training.batcher]
@batchers = "spacy.batch_by_words.v1"
discard_oversize = false
tolerance = 0.2
[training.batcher.size]
@schedules = "compounding.v1"
start = 100
stop = 1000
compound = 1.001
[initialize]
vectors = ${paths.vectors}
How can I make the training process faster?

To speed up training you have a few options.
Change the evaluation frequency. It's not in the config the widget generates, but there's an eval_frequency option - it should be filled in if you use fill-config as recommended. The default value is relatively low, and evaluation is slow. You should increase this value a lot if you have a large amount of training data.
Use the efficiency presets instead of accuracy. If speed is an issue then you should try this. For your pipeline, the relevant options are whether to include static vectors or not, and the width or depth of your tok2vec. Note this alone won't affect speed that much, but because it definitely reduces memory usage it can be usefully combined with the next option.
Increase the batch size. In training, the time to process a batch is relatively constant, so larger batches mean fewer batches for the same data, which means faster training. How large a batch you can handle depends on the size of your documents and your hardware. (See the config sketch at the end of this answer.)
Use less training data. This is very rarely something that I'd recommend, but if you have 400,000 records you shouldn't need that many to get a good NER model. (How many classes do you have?) Try 10,000 to start with and see how your model performs, and scale up until you get the accuracy/speed tradeoff you want. This will also help you figure out if there is some kind of issue with your data more quickly.
For tips on faster inference (not training), see the spaCy speed FAQ.
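As a concrete illustration of the eval_frequency and batch size points above, here is a sketch of the relevant config sections; the numbers are assumptions to adjust for your data and hardware, not recommendations:

[training]
eval_frequency = 4000          # assumed value; the fill-config default is far lower

[training.batcher.size]
start = 1000
stop = 2000
compound = 1.001

These can also be overridden on the command line, e.g. python -m spacy train config.cfg --training.eval_frequency 4000.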

You might just be using one core of your CPU, since that's effectively the Python default, IIRC. I would look into parallelizing the job with joblib and increasing your chunk size.
See: https://prrao87.github.io/blog/spacy/nlp/performance/2020/05/02/spacy-multiprocess.html#Option-3:-Parallelize-the-work-using-joblib
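For reference, a minimal parallelization sketch along the lines of that article (a sketch only: it uses a blank Persian pipeline and placeholder texts to stay self-contained; in practice you would load your trained pipeline and do your own processing in process_chunk):

import spacy
from joblib import Parallel, delayed

texts = ["این یک جمله است."] * 10000  # placeholder data

def chunker(seq, size):
    # split the corpus into chunks of `size` texts
    return (seq[i:i + size] for i in range(0, len(seq), size))

def process_chunk(chunk):
    # each worker builds its own pipeline; swap in spacy.load("your_model") in practice
    nlp = spacy.blank("fa")
    return [len(doc) for doc in nlp.pipe(chunk)]

results = Parallel(n_jobs=4)(delayed(process_chunk)(c) for c in chunker(texts, 1000))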

Related

Question on ElasticNet algorithm implemented in Cleverhans

I'm trying to use the Elastic-Net algorithm implemented in Cleverhans to generate adversarial samples in a classification task. The main problem is that I'm trying to use it to obtain a higher confidence at classification time on a target class (different from the original one), but I'm not able to reach good results.
The system I'm trying to fool is a DNN with a softmax output over 10 classes.
For instance:
Given a sample of class 3, I want to generate an adversarial sample of class 0.
Using the default hyperparameters implemented in the ElasticNetMethod of Cleverhans, I'm able to obtain a successful attack, so the class assigned to the adversarial sample becomes class 0, but the confidence is quite low (about 30%). This also happens when trying different values for the hyperparameters.
My goal is to obtain a considerably higher confidence (at least 90%).
For other algorithms like "FGSM" or "MadryEtAl" I'm able to achieve this by creating a loop in which the algorithm is applied until the sample is classified as the target class with a confidence greater than 90%, but I can't apply this iteration to the EAD algorithm, because at each step of the iteration it yields the adversarial sample generated at the first step, and in the following iterations it remains unchanged. (I know this may happen because the algorithm is different from the other two mentioned, but I'm trying to find a solution to reach my goal.)
This is the code I'm currently using to generate adversarial samples:
# imports assumed by this snippet (Cleverhans 2.x layout)
import numpy as np
from cleverhans.attacks import ElasticNetMethod
from cleverhans.utils_keras import KerasModelWrapper

ead_params = {'binary_search_steps': 9, 'max_iterations': 100, 'learning_rate': 0.001,
              'clip_min': 0, 'clip_max': 1, 'y_target': target}
adv_x = image
founded_adv = False
threshold = 0.9
wrap = KerasModelWrapper(model)
ead = ElasticNetMethod(wrap, sess=sess)

while not founded_adv:
    adv_x = ead.generate_np(adv_x, **ead_params)
    prediction = model.predict(adv_x).tolist()
    pred_class = np.argmax(prediction[0])
    confidence = prediction[0][pred_class]
    if pred_class == 0 and confidence >= threshold:
        founded_adv = True
The while loop keeps generating samples until the target class is reached with a confidence greater than 90%. This code works with FGSM and Madry, but runs forever with EAD.
Library version:
Tensorflow: 2.2.0
Keras: 2.4.3
Cleverhans: 2.0.0-451ccecad450067f99c333fc53592201
Can anyone help me?
Thanks a lot.
For anyone interested in this problem, the previous code can be modified as follows to work properly:
FIRST SOLUTION:
prediction = model.predict(image)
initial_predicted_class = np.argmax(prediction[0])
ead_params = {'binary_search_steps': 9, 'max_iterations': 100, 'learning_rate': 0.001,
              'confidence': 1, 'clip_min': 0, 'clip_max': 1, 'y_target': target}
adv_x = image
founded_adv = False
threshold = 0.9
wrap = KerasModelWrapper(model)
ead = ElasticNetMethod(wrap, sess=sess)

while not founded_adv:
    adv_x = ead.generate_np(adv_x, **ead_params)
    prediction = model.predict(adv_x).tolist()
    pred_class = np.argmax(prediction[0])
    confidence = prediction[0][pred_class]
    if pred_class == initial_predicted_class and confidence >= threshold:
        founded_adv = True
    else:
        ead_params['confidence'] += 1
This uses the confidence parameter implemented in the library: we increase the confidence parameter by 1 whenever the probability of the target class has not yet reached the threshold.
SECOND SOLUTION:
prediction = model.predict(image)
initial_predicted_class = np.argmax(prediction[0])
ead_params = {'beta': 5e-3, 'binary_search_steps': 6, 'max_iterations': 10,
              'learning_rate': 3e-2, 'clip_min': 0, 'clip_max': 1}
threshold = 0.96
adv_x = image
founded_adv = False
wrap = KerasModelWrapper(model)
ead = ElasticNetMethod(wrap, sess=sess)

while not founded_adv:
    eps_hyp = 0.5
    new_adv_x = ead.generate_np(adv_x, **ead_params)
    pert = new_adv_x - adv_x
    new_adv_x = adv_x - eps_hyp * pert
    new_adv_x = (new_adv_x - np.min(new_adv_x)) / (np.max(new_adv_x) - np.min(new_adv_x))
    adv_x = new_adv_x
    prediction = model.predict(new_adv_x).tolist()
    pred_class = np.argmax(prediction[0])
    confidence = prediction[0][pred_class]
    print(pred_class)
    print(confidence)
    if pred_class == initial_predicted_class and confidence >= threshold:
        founded_adv = True
In the second solution, the following modifications were made to the original code:
- initial_predicted_class is the class predicted by the model on the benign sample ("0" for our example).
- In the parameters of the algorithm (ead_params) we do not insert the target class.
- We then obtain the perturbation produced by the algorithm by computing pert = new_adv_x - adv_x, where "adv_x" is the original image (in the first step of the loop) and new_adv_x is the perturbed sample generated by the algorithm.
- This is useful because the original EAD algorithm computes the perturbation that maximizes the loss w.r.t. the class "0", but in our case we want to minimize it.
- So we compute the new perturbed image as new_adv_x = adv_x - eps_hyp*pert (where eps_hyp is an epsilon hyperparameter that I introduced to reduce the perturbation), and then we normalize the new perturbed image.
- I've tested the code on a large number of images, and the confidence always increases, so I think this is a good solution for the purpose.
I think the second solution yields a finer perturbation.

After quantisation in a neural network, does the output need to be scaled with the inverse of the weight scaling?

I'm currently writing a script to quantise a Keras model down to 8 bits. I'm doing a fairly basic linear scaling on the weights, by assuming a normal distribution of weights and biases, and then interpolating all the values within 2 standard deviations of the mean, to the range [-128, 127].
This all works, and I can run the model through inference, but the output image is very bad. I know there will be a small performance hit, but I'm seeing roughly a 10x performance degradation.
My question is, after this scaling of the weights, do I need to do the inverse scaling operation to my output? None of the papers I've been reading seem to mention this, but I'm unsure why else my results would be so bad.
The network is for image demosaicing. It takes in a RAW image, and is meant to output an image with very low noise, and no demosaicing artefacts. My full precision model is very good, with image PSNRs of around 40-43dB, but after quantisation, I'm getting 4-8dB, and incredibly bad looking images.
Code, for anyone who's bothered to read it:
import numpy as np  # assumed import

# running statistics over the selected layers
# (assumed initialisation; not shown in the original snippet)
count = 0
max_std = 0.0
mean_of_mean = 0.0

for i in layer_index:
    count = count + 1
    layer = model.get_layer(index=i)
    weights = layer.get_weights()
    weights_act = weights[0]
    bias_act = weights[1]
    std = np.std(weights_act)
    if std > max_std:
        max_std = std
    mean = np.mean(weights_act)
    mean_of_mean = mean_of_mean + mean

mean_of_mean = mean_of_mean / count
max_bound = mean_of_mean + 2 * max_std
min_bound = mean_of_mean - 2 * max_std
print(max_bound, min_bound)

for i in layer_index:
    layer = model.get_layer(index=i)
    weights = layer.get_weights()
    weights_act = weights[0]
    bias_act = weights[1]
    weights_shape = weights_act.shape
    bias_shape = bias_act.shape
    new_weights = np.empty(weights_shape, dtype=np.int8)
    print(new_weights.dtype)
    new_biass = np.empty(bias_shape, dtype=np.int8)
    for a in range(weights_shape[0]):
        for b in range(weights_shape[1]):
            for c in range(weights_shape[2]):
                for d in range(weights_shape[3]):
                    new_weight = ((weights_act[a, b, c, d] - min_bound) * (127 - (-128))
                                  / (max_bound - min_bound)) + (-128)
                    new_weights[a, b, c, d] = np.int8(new_weight)
                    # print(new_weights[a,b,c,d], weights_act[a,b,c,d])
    for e in range(bias_shape[0]):
        new_bias = ((bias_act[e] - min_bound) * (127 - (-128)) / (max_bound - min_bound)) + (-128)
        new_biass[e] = np.int8(new_bias)
    new_weight_layer = (new_weights, new_biass)
    layer.set_weights(new_weight_layer)
You don't do what you think you are doing; I'll explain.
If you wish to take a pre-trained model and quantize it, you have to add scales after each operation that involves weights. Let's take the convolution operation as an example.
As we know, the convolution operation is linear. In my explanation I will ignore the bias for the sake of simplicity (adding it back is relatively easy). Let's assume X is our input, Y is our output and W is the weights; the convolution can be written as:
Y = W*X
where '*' represents the convolution operation. What you are basically doing is taking the weights, multiplying them by some scalar (let's call it 'a') and shifting them by some other scalar (let's call it 'b'), so in your model you use W' where W' = Wa + b.
So if we return to the convolution operation, we get that in your quantized network you basically do the following operation: Y' = W'*X = (Wa + b)*X
Because convolution is linear we get: Y' = a(W*X) + b*X
Don't forget that in your network you want to receive Y, not Y', at the output of the convolution, so you must shift and re-scale to get the correct answer.
After that explanation (which I hope was clear enough), I hope you can see what the problem in your network is: you apply this scale and shift to all of the weights and you never compensate for it. I think your confusion comes from reading papers about models that were trained in quantized mode from the beginning, rather than about taking a pre-trained model and quantizing it.
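To make the compensation concrete, here is a tiny NumPy check (a toy 1-D sketch, not the asker's network): if the weights are transformed as W' = Wa + b, the convolution output must be shifted and re-scaled to recover Y.

import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=16)      # toy 1-D "image"
W = rng.normal(size=3)       # toy 1-D kernel
a, b = 0.7, 0.2              # arbitrary scale and shift

W_prime = a * W + b          # transformed ("quantised") weights

Y = np.convolve(X, W, mode="valid")
Y_prime = np.convolve(X, W_prime, mode="valid")

# the b*X term: b times the sum of X over each window (convolution with an all-ones kernel)
X_window_sum = np.convolve(X, np.ones_like(W), mode="valid")

Y_recovered = (Y_prime - b * X_window_sum) / a
print(np.allclose(Y, Y_recovered))   # True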
For your problem, I think the TensorFlow Graph Transform Tool might help; take a look at:
https://github.com/tensorflow/tensorflow/blob/master/tensorflow/tools/graph_transforms/README.md
If you wish to read more about quantizing a pre-trained model, you can find more information here (for more academic material, go to scholar.google.com):
https://www.tensorflow.org/lite/performance/post_training_quantization
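For completeness, a minimal post-training quantisation sketch using the TF Lite converter described on that page; this assumes TF 2.x and a Keras model saved at the hypothetical path model.h5 (the converter entry point differs slightly in TF 1.x):

import tensorflow as tf
from tensorflow import keras

model = keras.models.load_model("model.h5")                  # hypothetical saved model
converter = tf.lite.TFLiteConverter.from_keras_model(model)  # TF 2.x entry point
converter.optimizations = [tf.lite.Optimize.DEFAULT]         # dynamic-range quantisation
tflite_model = converter.convert()

with open("model_quant.tflite", "wb") as f:
    f.write(tflite_model)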

TensorflowJS takes too long

When I run TensorFlow.js in the browser, especially on a phone, it takes a really long time to predict and sometimes doesn't even work. I'm already using the optimized graph. I want to know if there is any way to speed it up, whether by running a prediction before the page loads so that subsequent ones are faster, or anything else.
I am using the InceptionV3 architecture, and the image size is 299 by 299; if I could make that smaller it could perhaps go faster, but that would mean I would have to retrain my model. Note: I am not training with TensorFlow.js, only making predictions. Here is the relevant code:
const ctx = canvas.getContext('2d');
const file = ctx.getImageData(0, 0, 120, 120);
const raw = tf.fromPixels(file).toFloat();
const resized = tf.image.resizeBilinear(raw, [299, 299]);
const offset = tf.scalar(127);
const normalized = resized.sub(offset).div(offset);
const batched = normalized.expandDims(0);
const f = model.execute(batched).dataSync();

Restored model in tensorflow gives different results for relu operation

The weights retrieved from the restored model don't change, and the input is also constant, but the output of the 'Relu:0' operation gives different results each time.
Below is my code:
# assumed imports
import cv2
import numpy as np
import tensorflow as tf

sess = tf.Session()
saver = tf.train.import_meta_graph('checkpoints/checkpoints_otherapproach_1/cameranetwork_RAID_CNN-3100.meta')
saver.restore(sess, tf.train.latest_checkpoint(checkpoint_dir='checkpoints/checkpoints_otherapproach_1/'))

images = tf.get_default_graph().get_tensor_by_name('images:0')
phase = tf.get_default_graph().get_tensor_by_name('phase:0')
Activ = tf.get_default_graph().get_tensor_by_name('network/siamese_model/convolution_1/conv_1/Relu:0')

image_array = np.zeros(shape=[1, 3, 128, 64, 3])  # *******
imagepath = 'RAiD_Dataset' + '/images_afterremoving_persons_notinallcameras/' + 'test' + '/camera_' + str(1)
fullfile_name = imagepath + "/" + 'camera_1_person_23_index_1.jpg'
image_array[0][0] = cv2.imread(fullfile_name)
image_array[0][1] = image_array[0][0]
image_array[0][2] = image_array[0][0]
image_array = image_array.astype(np.float32)

feed_dict_values = {images: image_array, phase: False}
temp2 = sess.run(Activ, feed_dict=feed_dict_values)
temp1 = sess.run(Activ, feed_dict=feed_dict_values)
print((temp1 == temp2).all())  # output is False
There are two possible reasons for this:
Some of the TensorFlow ops inherit non-deterministic behavior from CUDA. This results in small numerical errors (which might be amplified by non-linearities). See this answer on how to try running your model on a single CPU thread (a minimal sketch follows below). If the two arrays turn out to be identical under this condition, then this is the case.
I'm assuming that you know the graph you are loading, but the graph itself might produce inconsistent results 'by design' due to operations that deliberately introduce either randomness or non-constant data. For example, consider operations that use the random number generator, or operations that update variables (e.g., tf.assign) each time Activ is evaluated.
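For the first point, a minimal sketch of forcing single-threaded CPU execution in TF 1.x; this would replace the plain tf.Session() at the top of the question's code:

import tensorflow as tf

config = tf.ConfigProto(
    intra_op_parallelism_threads=1,
    inter_op_parallelism_threads=1,
    device_count={"GPU": 0},  # keep the graph off the GPU to rule out CUDA non-determinism
)
sess = tf.Session(config=config)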

How to write expensive summaries less often in tensorflow

I have a TensorFlow model. In it, I have different summaries. Some, such as loss and accuracy, are inexpensive, and I want to write them often. Others, like accuracy on the test set, are more expensive to calculate, and I want to write them, say, 100 times less often than the normal summaries. What is the best way to implement this in TensorFlow?
Instead of merging all summaries with merge_all(), create a few different groups of summaries with merge() and then write them at different frequencies. Something like this:
s1 = tf.summary.image(...)
s2 = tf.summary.scalar(...)
s3 = tf.summary.histogram(...)
s4 = tf.summary.audio(...)

summary_expensive = tf.summary.merge([s1, s4])
summary_cheap = tf.summary.merge([s2, s3])

# open a session `sess`
# init variables
# create a writer `writer`

for i in xrange(many_steps):
    summary1 = sess.run(summary_cheap)
    writer.add_summary(summary1, i)
    if i % 100 == 0:
        summary2 = sess.run(summary_expensive)
        writer.add_summary(summary2, i)