AFAIK YOLO calculates mAP against the validation dataset during training. Now, is it possible to calculate the same against an unseen test dataset?
Command:
./darknet detector map obj.data yolo-obj.cfg yolo-obj_best.weights
obj.data:
classes = 1
train = train.txt
valid = test.txt
names = classes.txt
backup = backup
I have directed valid to a test dataset containing annotated images, but I always get the following result:
calculation mAP (mean average precision)...
44
detections_count = 50, unique_truth_count = 43
class_id = 0, name = traffic_light, ap = 100.00% (TP = 43, FP = 0)
for conf_thresh = 0.25, precision = 1.00, recall = 1.00, F1-score = 1.00
for conf_thresh = 0.25, TP = 43, FP = 0, FN = 0, average IoU = 85.24 %
IoU threshold = 50 %, used Area-Under-Curve for each unique Recall
mean average precision (mAP#0.50) = 1.000000, or 100.00 %
Total Detection Time: 118 Seconds
It's not that I'm unhappy with 100% mAP, but it's definitely wrong, isn't it?
Any advice would be greatly appreciated.
Regards,
Setnug
Now, is it possible to calculate the same against an unseen test dataset?
Yes, mAP calculation just needs images with corresponding labels/annotations, that's all.
I have directed valid to a test dataset containing annotated images.
Yes, this is the way to do what you wanted.
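For reference, a minimal sketch of the layout Darknet expects (assuming the standard YOLO annotation format, so the names below are just an example): test.txt lists one image path per line, and next to each image there is a .txt label file with the same basename, one object per line as <class_id> <x_center> <y_center> <width> <height>, all normalized to [0, 1]:
data/test/img_001.jpg      <- a line in test.txt
data/test/img_001.txt      <- its label file, containing e.g.:
0 0.512 0.430 0.084 0.176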
There is a possibility that what you're seeing here is this known bug, provided you're using old code and haven't updated since. In that case I suggest you pull the latest darknet and try again.
Note that if the model is trained really well, and your test set is simple in terms of complexity (though unseen) or visually similar to the train set, it's possible to get numbers like this as well, especially since we're talking about a small number of test samples.
I'm trying to use the Elastic-Net algorithm implemented in Cleverhans to generate adversarial samples in a classification task. The main problem is that I'm trying to use it in a way that obtains a higher confidence at classification time on a target class (different from the original one), but I'm not able to reach good results.
The system that I'm trying to fool is a DNN with a softmax output on 10 classes.
For instance:
Given a sample of class 3, I want to generate an adversarial sample of class 0.
Using the default hyperparameters implemented in the ElasticNetMethod of Cleverhans, I'm able to obtain a successful attack, so the class assigned to the adversarial sample becomes class 0, but the confidence is quite low (about 30%). This also happens when trying different values for the hyperparameters.
My goal is to obtain a much higher confidence (at least 90%).
For other algorithms like "FGSM" or "MadryEtAl" I'm able to reach this goal by creating a loop in which the algorithm is applied until the sample is classified as the target class with a confidence greater than 90%, but I can't apply this iteration to the EAD algorithm, because at each step of the iteration it yields the adversarial sample generated at the first step, and in the following iterations it remains unchanged. (I know this may happen because the algorithm is different from the other two mentioned, but I'm trying to find a solution that reaches my goal.)
This is the code that I'm currently using to generate adversarial samples.
# imports assumed from the rest of the script; model, sess, image and target are defined elsewhere
import numpy as np
from cleverhans.attacks import ElasticNetMethod
from cleverhans.utils_keras import KerasModelWrapper

ead_params = {'binary_search_steps': 9, 'max_iterations': 100, 'learning_rate': 0.001, 'clip_min': 0, 'clip_max': 1, 'y_target': target}

adv_x = image
founded_adv = False
threshold = 0.9
wrap = KerasModelWrapper(model)
ead = ElasticNetMethod(wrap, sess=sess)

while (not founded_adv):
    adv_x = ead.generate_np(adv_x, **ead_params)
    prediction = model.predict(adv_x).tolist()
    pred_class = np.argmax(prediction[0])
    confidence = prediction[0][pred_class]
    if (pred_class == 0 and confidence >= threshold):
        founded_adv = True
The while loop should keep generating samples until the target class is reached with a confidence greater than 90%. This code works with FGSM and Madry, but runs forever with EAD.
Library version:
Tensorflow: 2.2.0
Keras: 2.4.3
Cleverhans: 2.0.0-451ccecad450067f99c333fc53592201
Can anyone help me?
Thanks a lot.
For anyone interested in this problem, the previous code can be modified in this way to work properly:
FIRST SOLUTION:
prediction = model.predict(image)
initial_predicted_class = np.argmax(prediction[0])

ead_params = {'binary_search_steps': 9, 'max_iterations': 100, 'learning_rate': 0.001, 'confidence': 1, 'clip_min': 0, 'clip_max': 1, 'y_target': target}

adv_x = image
founded_adv = False
threshold = 0.9
wrap = KerasModelWrapper(model)
ead = ElasticNetMethod(wrap, sess=sess)

while (not founded_adv):
    adv_x = ead.generate_np(adv_x, **ead_params)
    prediction = model.predict(adv_x).tolist()
    pred_class = np.argmax(prediction[0])
    confidence = prediction[0][pred_class]
    if (pred_class == initial_predicted_class and confidence >= threshold):
        founded_adv = True
    else:
        ead_params['confidence'] += 1
This uses the confidence parameter implemented in the library: we increase the confidence parameter by 1 whenever the target class has not yet been reached with the required probability.
SECOND SOLUTION:
prediction = model.predict(image)
initial_predicted_class = np.argmax(prediction[0])

ead_params = {'beta': 5e-3, 'binary_search_steps': 6, 'max_iterations': 10, 'learning_rate': 3e-2, 'clip_min': 0, 'clip_max': 1}

threshold = 0.96
adv_x = image
founded_adv = False
wrap = KerasModelWrapper(model)
ead = ElasticNetMethod(wrap, sess=sess)

while (not founded_adv):
    eps_hyp = 0.5
    new_adv_x = ead.generate_np(adv_x, **ead_params)
    pert = new_adv_x - adv_x
    new_adv_x = adv_x - eps_hyp * pert
    new_adv_x = (new_adv_x - np.min(new_adv_x)) / (np.max(new_adv_x) - np.min(new_adv_x))
    adv_x = new_adv_x
    prediction = model.predict(new_adv_x).tolist()
    pred_class = np.argmax(prediction[0])
    confidence = prediction[0][pred_class]
    print(pred_class)
    print(confidence)
    if (pred_class == initial_predicted_class and confidence >= threshold):
        founded_adv = True
In the second solution there are the following modifications to the original code:
- initial_predicted_class is the class predicted by the model on the benign sample ("0" in our example).
- In the parameters of the algorithm (ead_params) we do not set a target class.
- Then we obtain the perturbation produced by the algorithm by calculating pert = new_adv_x - adv_x, where adv_x is the original image (in the first step of the loop) and new_adv_x is the perturbed sample generated by the algorithm.
- This is useful because the original EAD algorithm calculates the perturbation so as to maximize the loss w.r.t. class "0", but in our case we want to minimize it.
- So we can calculate the new perturbed image as new_adv_x = adv_x - eps_hyp*pert (where eps_hyp is an epsilon hyperparameter that I've introduced to reduce the perturbation), and then we normalize the new perturbed image.
- I've tested the code on a large number of images, and the confidence always increases, so I think this can be a good solution for this purpose.
I think the second solution allows obtaining a finer perturbation.
I am implementing federated learning with TensorFlow.js, but I am kind of stuck in the federated averaging process. The idea is simple: get updated weights from multiple clients and average them on the server.
I have trained a model in the browser, got the updated weights via the model.getWeights() method, and sent the weights to the server for averaging.
// get weights from multiple clients (happens on the client side)
w1 = model.getWeights(); // weights from client 1
w2 = model.getWeights(); // weights from client 2

// calculate the average of the weights (server side)
var mean_weights = [];
let length = w1.length; // length of all the weight arrays is the same
for (var i = 0; i < length; i++) {
    let sum = w1[i].add(w2[i]);
    let mean = sum.divide(2); // got confused here, how to calculate the mean of tensors??
    mean_weights.push(mean);
}

// apply updates to the model (both client side and server side)
model.setWeights(mean_weights);
So my questions are:
How do I calculate the mean of a tensor array?
Also, is this the right approach to perform federated averaging via TensorFlow.js?
Yes, but be careful. You can average two tensors with tf.mean like https://stackoverflow.com/users/5069957/edkeveked said. However, remember axis=0 should be shortened to just 0 in JavaScript.
Just to rewrite his code in a second way:
const x = tf.tensor([1, 2, 3, 2, 3, 4], [2, 3]);
x.mean(0).print()
However, you asked if you're doing it right, and that depends on whether you're averaging as you go or not. There's an issue with a rolling average.
Example:
If you average (10, 20) and then 30, you get 22.5, a different number than averaging (20, 30) and then 10 (17.5), which is of course different from averaging all three at the same time, which would give you 20.
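A quick check of that arithmetic (plain Python, just to make the ordering problem explicit):
a = (10 + 20) / 2            # 15.0
print((a + 30) / 2)          # 22.5 -> averaged (10, 20) first, then 30
b = (20 + 30) / 2            # 25.0
print((b + 10) / 2)          # 17.5 -> averaged (20, 30) first, then 10
print((10 + 20 + 30) / 3)    # 20.0 -> true mean of all three values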
Averages are not order-independent once they've been computed incrementally; it's the division step that breaks associativity. So you'll need to either:
A: Store all model weights and calculate a new average each time based on all previous models
or
B: Add a weighting system to your federated average so more recent models do not significantly affect the system.
Which makes sense?
I recommend B in situations where you:
Don't want to or cannot store every model and weight ever submitted.
Know that some models have seen more valid data and should be weighted appropriately compared to blind models.
You can compute a weighted average by adjusting the denominator for your existing model vs. your incoming model.
In JavaScript you can do something simple like this to compute a weighted average between two values:
const modelVal1 = 0
const modelVal2 = 1
const weight1 = 0.5
const weight2 = 1 - weight1
const average = (modelVal1 * weight1) + (modelVal2 * weight2)
The above code is your common evenly weighted average, but as you adjust the weight1, you are rebalancing the scales to significantly adjust the outcome in favor of modelVal1 or modelVal2.
Obviously, you'll need to convert the JavaScript I have shown into tensor mathematical functions, but that's trivial.
Iterate averaging (or a weighted average) with decaying weights is often used in federated learning. See Iterate averaging as regularization for stochastic gradient descent, and Server Averaging for Federated Learning.
To calculate the mean of 2 tensors, you can use tf.mean
const x = tf.tensor1d([1, 2, 3]);
const y = tf.tensor1d([2, 3, 4]);
tf.stack([x, y]).print()
const mean = tf.stack([x, y]).mean(axis=0)
mean.print();
I'm currently writing a script to quantise a Keras model down to 8 bits. I'm doing a fairly basic linear scaling on the weights, by assuming a normal distribution of weights and biases, and then interpolating all the values within 2 standard deviations of the mean, to the range [-128, 127].
This all works, and I can run the model through inference, but the image I get out is really bad. I know there will be a small performance hit, but I'm seeing roughly a 10x degradation.
My question is, after this scaling of the weights, do I need to do the inverse scaling operation to my output? None of the papers I've been reading seem to mention this, but I'm unsure why else my results would be so bad.
The network is for image demosaicing. It takes in a RAW image, and is meant to output an image with very low noise, and no demosaicing artefacts. My full precision model is very good, with image PSNRs of around 40-43dB, but after quantisation, I'm getting 4-8dB, and incredibly bad looking images.
Code for anyone who's bothered to read it
# count, max_std and mean_of_mean are assumed to be initialised to 0 earlier in the script
for i in layer_index:
    count = count + 1
    layer = model.get_layer(index=i)
    weights = layer.get_weights()
    weights_act = weights[0]
    bias_act = weights[1]
    std = np.std(weights_act)
    if (std > max_std):
        max_std = std
    mean = np.mean(weights_act)
    mean_of_mean = mean_of_mean + mean

mean_of_mean = mean_of_mean / count
max_bound = mean_of_mean + 2 * max_std
min_bound = mean_of_mean - 2 * max_std
print(max_bound, min_bound)

for i in layer_index:
    layer = model.get_layer(index=i)
    weights = layer.get_weights()
    weights_act = weights[0]
    bias_act = weights[1]
    weights_shape = weights_act.shape
    bias_shape = bias_act.shape
    new_weights = np.empty(weights_shape, dtype=np.int8)
    print(new_weights.dtype)
    new_biass = np.empty(bias_shape, dtype=np.int8)
    for a in range(weights_shape[0]):
        for b in range(weights_shape[1]):
            for c in range(weights_shape[2]):
                for d in range(weights_shape[3]):
                    new_weight = (((weights_act[a, b, c, d] - min_bound) * (127 - (-128)) / (max_bound - min_bound)) + (-128))
                    new_weights[a, b, c, d] = np.int8(new_weight)
                    # print(new_weights[a, b, c, d], weights_act[a, b, c, d])
    for e in range(bias_shape[0]):
        new_bias = (((bias_act[e] - min_bound) * (127 - (-128)) / (max_bound - min_bound)) + (-128))
        new_biass[e] = np.int8(new_bias)
    new_weight_layer = (new_weights, new_biass)
    layer.set_weights(new_weight_layer)
You're not doing what you think you're doing; I'll explain.
If you wish to take a pre-trained model and quantize it, you have to add scales after each operation that involves weights. Let's take the convolution operation as an example.
As we know, the convolution operation is linear. In my explanation I will ignore the bias for the sake of simplicity (adding it back is relatively easy). Let's say X is our input, Y is our output and W are the weights, so the convolution can be written as:
Y = W*X
where '*' represents the convolution operation. What you are basically doing is taking the weights, multiplying them by some scalar (let's call it 'a') and shifting them by some other scalar (let's call it 'b'), so in your model you use W' where: W' = aW + b
If we return to the convolution operation, in your quantized network you basically do: Y' = W'*X = (aW + b)*X
Because convolution is linear we get: Y' = a(W*X) + b*X (the second term being the constant value b convolved with X, i.e. b times the sum of X over each receptive field)
Don't forget that in your network you want to receive Y, not Y', at the output of the convolution, so you must undo the shift and rescale to get the correct answer.
After that explanation (which I hope was clear enough), I hope you can see the problem in your network: you apply this scale and shift to all of the weights and never compensate for it. I think your confusion comes from reading papers that train models in quantized mode from the beginning, rather than taking a pre-trained model and quantizing it.
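To make the compensation concrete, here is a minimal numpy sketch (my own illustration with made-up shapes, not your exact pipeline): it affine-quantizes a weight matrix with W' = aW + b, runs a plain linear layer with the int8 weights, and then undoes the shift and the scale on the output:
import numpy as np

rng = np.random.default_rng(0)
W = rng.normal(0, 0.05, size=(16, 32)).astype(np.float32)  # full-precision weights
x = rng.normal(0, 1.0, size=(32,)).astype(np.float32)      # one input vector

# affine mapping W' = a*W + b onto [-128, 127]
min_bound, max_bound = W.min(), W.max()
a = 255.0 / (max_bound - min_bound)
b = -128.0 - a * min_bound
Wq = np.clip(np.round(a * W + b), -128, 127).astype(np.int8)

# quantized layer output: Y' = W'x = a*(Wx) + b*sum(x)
y_quant = Wq.astype(np.float32) @ x

# compensation: undo the shift, then rescale, to recover Y ~ Wx
y_rec = (y_quant - b * x.sum()) / a

print(np.abs(y_rec - W @ x).max())  # small rounding error, not a 10x degradation
Without the last compensation step, the raw Y' values are on a completely different scale, which is consistent with the kind of output degradation you are seeing.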
For your problem I think the TensorFlow Graph Transform Tool might help; take a look at:
https://github.com/tensorflow/tensorflow/blob/master/tensorflow/tools/graph_transforms/README.md
If you wish to read more about quantizing a pre-trained model, you can find more information here (for more academic info just go to scholar.google.com):
https://www.tensorflow.org/lite/performance/post_training_quantization
Hi, I have a question: how can I run prediction with input data of varying shape? I will try to describe it in detail:
I use MTCNN for face detection (it's OK if you're unfamiliar with it), and it employs 3 networks: PNet, RNet and ONet. PNet detects a mass of proposal face bounding boxes, then these boxes are refined coarse-to-fine by the remaining networks one after another, finally producing precise face bounding box(es). When taking an image as input to PNet, the image size is not fixed, and the number of proposal boxes output by PNet is also not fixed, and likewise for RNet and ONet. Following another MTCNN implementation, I set large data_shapes (e.g. image size, batch size) when I bind the module, initialize everything to zero, and then run prediction. That works, but isn't that redundant computation? (Question 1)
PNet:
max_img_w = 1000
max_img_h = 1000

sym, arg_params, aux_params = mx.model.load_checkpoint('det1', 0)
self.PNets = mx.mod.Module(symbol=sym, context=ctx, label_names=None)
self.PNets.bind(data_shapes=[('data', (1, 3, max_img_w, max_img_h))], for_training=False)
self.PNets.set_params(arg_params, aux_params)

RNet:
sym, arg_params, aux_params = mx.model.load_checkpoint('det2', 0)
self.RNet = mx.mod.Module(symbol=sym, context=ctx, label_names=None)
self.RNet.bind(data_shapes=[('data', (2048, 3, 24, 24))], for_training=False)
self.RNet.set_params(arg_params, aux_params, allow_missing=True)

ONet:
sym, arg_params, aux_params = mx.model.load_checkpoint('det3', 0)
self.ONet = mx.mod.Module(symbol=sym, context=ctx, label_names=None)
self.ONet.bind(data_shapes=[('data', (256, 3, 48, 48))], for_training=False)
self.ONet.set_params(arg_params, aux_params, allow_missing=True)
I also tried mx.mod.Module.reshape before predicting, which adjusts the data shape according to the previous network's output, but I get this error: (Question 2)
AssertionError: Shape of unspecified array arg:prob1_label changed. This can cause the new executor to not share parameters with the old one. Please check for error in the network. If this is intended, set partial_shaping=True to suppress this warning.
One more thing: the MTCNN code (https://github.com/pangyupo/mxnet_mtcnn_face_detection) primarily uses a deprecated function to load models:
self.PNet = mx.model.FeedForward.load('det1', 0)
A single line that works with arbitrary data_shapes, so why was this function deprecated? (Question 3)
I also found a small difference: after loading the model, FeedForward uses 0 MB of memory before making a prediction, but mx.mod.Module takes up memory as soon as it is loaded, and the usage increases noticeably after making one prediction.
You can use MXNet's imperative API, Gluon, and that will let you use different batch sizes.
If, like in this case, your model was trained using the symbolic API or has been exported in the serialized MXNet format ('-0001.params', '-symbol.json', for example), you can load it in Gluon this way:
ctx = mx.cpu()
sym = mx.sym.load_json(open('det1-symbol.json', 'r').read())
PNet = gluon.nn.SymbolBlock(outputs=sym, inputs=mx.sym.var('data'))
PNet.load_params('det1-0001.params', ctx=ctx)
Then you can use it the following way:
# a given batch size (1)
data1 = mx.nd.ones((1, C, W, H))
output1 = PNet(data1)
# a different batch size (5)
data2 = mx.nd.ones((5, C, W, H))
output2 = PNet(data2)
And it would work.
You can get started with MXNet Gluon with the official 60-minute crash course.
I'm new to SQL, so I'm hoping someone can shed some light on this. We have a stored procedure in place that uses simple linear regression. Now I want to apply some weighting using a discount factor lamda, i.e. 1, lamda, lamda^2, ..., lamda^n, where n is the length of the original series.
How should I generate the discounted weight series and apply it to the current code structure below?
...
SUM((OASSpline-OASPriorSpline) * (AdjOASDolDur-AdjOASPriorDolDur))/SUM(SQUARE((AdjOASDolDur-AdjOASPriorDolDur))) as Beta, /* Beta = Sxy/Sxx */
SUM(SQUARE((AdjOASDolDur-AdjOASPriorDolDur))) as Sxx,
SUM((OASSpline-OASPriorSpline) * (AdjOASDolDur-AdjOASPriorDolDur)) as Sxy
...
e.g.
If I set the discount factor (lamda) = 0.99, my weighting array should be generated automatically using the length of 10 from my series:
OASSpline = [1.11,1.45,1.79, 2.14, 2.48, 2.81,3.13,3.42,3.70,5.49]
AdjOASDolDur = [0.75,1.06,1.39, 1.73, 2.10, 2.48,2.85,3.20,3.52,3.61]
OASPriorSpline = 5.49
AdjOASPriorDolDur = 5.61
Weight = [1,0.99,0.9801,0.970299,0.96059601,0.9509900, 0.941480149,0.932065348,0.922744694,0.913517247]
The weighted linear regression should return a beta of 0.81243398, while the current simple linear regression should return a beta of 0.81164174.
Thanks much in advance!
I'll take a stab.
You could look at this article about generating sequence numbers and then use the generated row number as the exponent for the discount factor (a sketch of the resulting weighted formula is below). Does that work? I think a fair few are bamboozled by the request.
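For illustration only, here is a minimal numpy sketch of where the discount weights would enter the Sxy/Sxx sums, using the arrays from the question. It assumes the Prior values pair with every row exactly as listed, which may not match the real query, so don't expect it to reproduce the quoted betas exactly:
import numpy as np

lamda = 0.99
OASSpline = np.array([1.11, 1.45, 1.79, 2.14, 2.48, 2.81, 3.13, 3.42, 3.70, 5.49])
AdjOASDolDur = np.array([0.75, 1.06, 1.39, 1.73, 2.10, 2.48, 2.85, 3.20, 3.52, 3.61])
OASPriorSpline = 5.49
AdjOASPriorDolDur = 5.61

w = lamda ** np.arange(len(OASSpline))   # 1, lamda, lamda^2, ...
y = OASSpline - OASPriorSpline
x = AdjOASDolDur - AdjOASPriorDolDur

Sxy = np.sum(w * x * y)                  # weighted Sxy
Sxx = np.sum(w * x * x)                  # weighted Sxx
print(Sxy / Sxx)                         # weighted Beta
In SQL the same idea would be a ROW_NUMBER()-based weight, POWER(@lamda, rn - 1), multiplied into both sums before dividing.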