I have encountered a problem with the MNIST dataset on TensorFlow. As you probably know, batching does not preserve the order of the dataset, but I need to know exactly which samples I am working on. Does TF have any kind of indicator, such as an ID, that tells you which images it has extracted? For instance, one batch may contain images 20, 1, 4, 6 and another may contain 3, 7, 88, etc. from MNIST. I want to have access to these IDs; is this possible?
You can always add your own indicator: when you enqueue the features and labels, you can enqueue the indicator as well.
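For example, with the tf.data API you can attach the original index to each element before shuffling and batching, so it survives into every batch (a minimal sketch, assuming the Keras copy of MNIST; the shuffle buffer and batch size are arbitrary):

```python
import tensorflow as tf

(x_train, y_train), _ = tf.keras.datasets.mnist.load_data()

# Pair every image/label with its original index so it travels through
# shuffling and batching.
indices = tf.data.Dataset.range(len(x_train))
examples = tf.data.Dataset.from_tensor_slices((x_train, y_train))
dataset = tf.data.Dataset.zip((indices, examples)).shuffle(10_000).batch(32)

for idx, (images, labels) in dataset.take(1):
    print(idx.numpy())  # the original MNIST indices contained in this batch
```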
I'm working on a custom NER model with spacy-transformers and RoBERTa. I'm really only using the CLI for this and am trying to alter my spaCy config.cfg file to account for custom entity labels in the pipeline.
I'm new to spaCy, but I've gathered that people usually use ner.add_label to accomplish this. I wonder if I might be able to change something in [initialize.components.ner.labels] of the config, but I haven't come across a good way to do that.
I can't seem to find any options to alter the config file in a similar fashion - does anyone know if this is possible, or what the most succinct way to achieve those custom labels might be?
Edited for clarity: my issue could be different from my config theory. Right now I am getting output, but instead of text labels I get numeric labels, such as:
('Oct',383) ('2019',383) ('February',383)
Thank you in advance for your help!
If you are working with the config-based training, generally you should not have to specify the labels anywhere - spaCy will look at the training data and get the list of labels from there.
There are a few cases where this won't work.
You have labels that aren't in your training data. These can't be learned, so I would just consider this an error, but sometimes you have to work with the data you've been given.
Your training data is very large. In this case, reading over all the training data to get a complete list of labels can be an issue. You can use the init labels command to generate the label data once so that the input data doesn't have to be scanned every time you start training.
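For reference, a rough sketch of that workflow (the paths below are placeholders for your own corpus and config, and the generated file name may differ depending on your pipeline components):

```
python -m spacy init labels config.cfg corpus/labels --paths.train corpus/train.spacy --paths.dev corpus/dev.spacy
```

The config can then point the NER component at the generated file instead of re-reading the corpus:

```
[initialize.components.ner.labels]
@readers = "spacy.read_labels.v1"
path = "corpus/labels/ner.json"
```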
Hello, I'm new to the TensorFlow object detection area. I tagged my images with the labelImg program and then trained a model, but in the results I got multiple detections on a single object. What can I do to prevent this?
It is normal to get multiple detections with different scores. Post-process your results: apply a score threshold and merge detections that sit at the same position. Check out this guide
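As a rough sketch of that post-processing (assuming boxes and scores come out of your detector in the usual [y1, x1, y2, x2] format; the thresholds are arbitrary), TensorFlow's built-in non-max suppression can drop the overlapping duplicates:

```python
import tensorflow as tf

def filter_detections(boxes, scores, score_threshold=0.5,
                      iou_threshold=0.5, max_detections=100):
    """Keep high-scoring boxes and merge near-duplicates via non-max suppression."""
    # boxes: [N, 4] in [y1, x1, y2, x2] format, scores: [N]
    keep = tf.image.non_max_suppression(
        boxes, scores,
        max_output_size=max_detections,
        iou_threshold=iou_threshold,
        score_threshold=score_threshold)
    return tf.gather(boxes, keep), tf.gather(scores, keep)
```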
Supposing I have a Keras model which is already trained. When using the predict() method, I want to get the instance key and the corresponding prediction at the same time (I can pass the key as a feature/column in the input).
I wonder, is it realistic to do that?
I struggled with this for a while. I'm using the tf.data.Dataset infrastructure, so my first approach was to see if I could ensure that the order of the examples produced by the dataset was deterministic; that wasn't optimal because it gave up a bunch of the parallel-processing performance benefits, and it ended up not being the case in any event. Instead of feeding the entire dataset into model.predict, I ended up processing predictions with model.predict_on_batch, feeding in batches iterated out of the dataset manually. That way I was able to grab the ids from each batch and associate them with the returned predictions.
I was surprised there wasn't a more ready made solution to a problem that must come up a lot. I haven't gotten up to speed on the Estimator interface or custom training/prediction loops yet, but hopefully this problem becomes trivial there.
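A minimal sketch of that pattern, assuming each dataset element carries an id alongside its features (the names here are illustrative, not from the original code):

```python
import tensorflow as tf

def predict_with_ids(model, dataset):
    """Iterate the batches manually so each prediction stays paired with its id.

    Expects `dataset` to yield (ids, features) batches, e.g.
    tf.data.Dataset.from_tensor_slices((ids, features)).batch(64).
    """
    results = []
    for ids, features in dataset:
        preds = model.predict_on_batch(features)
        results.extend(zip(ids.numpy().tolist(), preds))
    return results
```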
I'm in a situation where the input to my ML model is a variable number of images per example (but only one label for each set), so I would like to be able to pack multiple images into a single TFRecord example. However, every example I come across online is single image, single label, which is understandable because that's the most common use case. I also wonder about decoding... it appears that tf.image.decode_png only does one image at a time, but perhaps I can convert all the images to tf.string and use tf.decode_raw, then resize to get all the images?
Thanks
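One possible layout, sketched under the assumption that each image is stored as already PNG-encoded bytes in a repeated bytes feature (the feature names and target size are placeholders, not a confirmed solution):

```python
import tensorflow as tf

def serialize_example(encoded_pngs, label):
    """Pack a variable number of PNG-encoded images plus a single label."""
    feature = {
        "images": tf.train.Feature(bytes_list=tf.train.BytesList(value=encoded_pngs)),
        "label": tf.train.Feature(int64_list=tf.train.Int64List(value=[label])),
    }
    return tf.train.Example(features=tf.train.Features(feature=feature)).SerializeToString()

def parse_example(serialized):
    """Decode every image in the example and resize so they stack into one tensor."""
    parsed = tf.io.parse_single_example(serialized, {
        "images": tf.io.VarLenFeature(tf.string),
        "label": tf.io.FixedLenFeature([], tf.int64),
    })
    encoded = parsed["images"].values  # variable-length list of PNG strings
    images = tf.map_fn(
        lambda png: tf.image.resize(tf.io.decode_png(png, channels=3), [224, 224]),
        encoded, fn_output_signature=tf.float32)
    return images, parsed["label"]
```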
I have trained a Faster R-CNN model with a custom dataset using TensorFlow's Object Detection API. Over time I would like to continue to update the model with additional images (collected weekly). The goal is to optimize for accuracy and to weight newer images more heavily over time.
Here are a few alternatives:
Add images to previous dataset and train a completely new model
Add images to previous dataset and continue training previous model
New dataset with just new images and continue training previous model
Here are my thoughts:
Option 1: would be more time consuming, but all images would be treated "equally".
Option 2: would likely take less additional training time, but one concern is that the algorithm might be weighting the earlier images more.
Option 3: this seems like the best option. Take the original model and simply focus on training on the new stuff.
Is one of these clearly better? What would be the pros/cons of each?
In addition, I'd like to know if it's better to keep one test set as a control for accuracy or to create a new one each time that includes newer images. Perhaps add some portion of the new images to the training set and another portion to the test set, then feed the older test set images back into the model (or throw them out)?
Consider the case where your dataset is nearly perfect. If you ran the model on new images (collected weekly), then the results (i.e. boxes with scores) would be exactly what you want from the model and it would be pointless adding these to the dataset because the model would not be learning anything new.
For the imperfect dataset, results from new images will show (some) errors and these are appropriate for further training. But there may be "bad" images already in the dataset and it is desirable to remove these. This indicates that Option 1 must occur, on some schedule, to remove entirely the effect of "bad" images.
On a shorter schedule, Option 3 is appropriate if the new images are reasonably balanced across the domain categories (in some sense a representative subset of the previous dataset).
Option 2 seems pretty safe and is easier to understand. When you say "the algorithm might be weighting the earlier images more", I don't see why this is a problem if the earlier images are "good". However, I can see that the domain may change over time (evolution), in which case you may well wish to counter-weight older images. I understand that you can modify the training data to do just that, as discussed in this question:
Class weights for balancing data in TensorFlow Object Detection API
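As a rough illustration of that counter-weighting idea (the 'image/object/weight' field name and the decay scheme below are assumptions to check against the linked question, not a confirmed recipe):

```python
import tensorflow as tf

def object_weight_feature(num_boxes, image_age_weeks, half_life_weeks=26.0):
    """Hypothetical scheme: down-weight all boxes from older images with an exponential decay."""
    weight = 0.5 ** (image_age_weeks / half_life_weeks)
    return tf.train.Feature(float_list=tf.train.FloatList(value=[weight] * num_boxes))

# When building each tf.train.Example for the Object Detection API, something like:
# feature["image/object/weight"] = object_weight_feature(num_boxes, image_age_weeks)
```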