I'm seeing this issue when training against my own dataset, which was converted to binary via data_convert_example.py. After a week of training, the decode results make no sense when compared against the ref files.
If anyone has been successful and gotten results similar to what is posted in the Textsum readme using their own data, I would love to know what has worked for you: environment, TF build, number of articles.
I have not had luck with 0.11, but have gotten some results with 0.9; however, the decode results are similar to those shown below, and I have no idea where they are even coming from.
I am currently running Ubuntu 16.04, TF 0.9, CUDA 7.5 and CuDNN 4. I tried TF 0.11 but was dealing with other issues, so I went back to 0.9. It does seem that the decode results are being generated from valid articles, but the reference file and decode file indices have NO correlation.
If anyone can provide any help or direction, it would be greatly appreciated. Otherwise, should I figure anything out, I will post here.
A few final questions. Regarding the vocab file: does it need to be sorted by word frequency? I never did anything along those lines when generating it and wasn't sure whether that would throw something off.
Finally, when generating the data I assumed that the training articles should be broken into smaller shards, so I separated them into multiple files of 100 articles each, named data-0, data-1, etc. Was that a correct assumption on my part? I also kept all the vocab in one file, which has not thrown any errors.
Are the above assumptions correct as well?
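For reference, here is roughly how I did the sharding (a minimal sketch of my setup; it writes the same length-prefixed serialized tf.Example records with 'article' and 'abstract' byte features that data_convert_example.py produces, as far as I understand that format, so double-check it against your copy of the script):

```python
import struct
import tensorflow as tf

def write_shards(examples, shard_size=100, prefix='data'):
    """examples: iterable of (article, abstract) strings."""
    writer, shard, count = None, 0, 0
    for article, abstract in examples:
        if count % shard_size == 0:          # start a new shard every 100 articles
            if writer:
                writer.close()
            writer = open('%s-%d' % (prefix, shard), 'wb')
            shard += 1
        ex = tf.train.Example()
        ex.features.feature['article'].bytes_list.value.append(article.encode())
        ex.features.feature['abstract'].bytes_list.value.append(abstract.encode())
        s = ex.SerializeToString()
        writer.write(struct.pack('q', len(s)))        # 8-byte length prefix
        writer.write(struct.pack('%ds' % len(s), s))  # serialized tf.Example
        count += 1
    if writer:
        writer.close()
```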
Below are some ref and decode results which you can see are quite odd and seem to have no correlation.
DECODE:
output=Wild Boy Goes About How I Can't Be Really Go For Love
output=State Department defends the campaign of Iran
output=John Deere sails profit - Business Insider
output=to roll for the Perseid meteor shower
output=Man in New York City in Germany
REFERENCE:
output=Battle Chasers: Nightwar Combines Joe Mad's Stellar Art With Solid RPG Gameplay
output=Obama Meets a Goal That Could Literally Destroy America
output=WOW! 10 stunning photos of presidents daughter Zahra Buhari
output=Koko the gorilla jams out on bass with Flea from Red Hot Chili Peppers
output=Brenham police officer refused service at McDonald's
Going to answer this one myself. It seems the issue here was a lack of training data. In the end I did sort my vocab file, though it turns out this is not necessary; I only did it so that the end user could limit the vocab to something like 200k words if they wish.
The biggest reason for the problems above was simply the lack of data. When I ran the training in the original post, I was working with 40k+ articles. I thought this was enough, but clearly it wasn't, and this became even more evident when I got deeper into the code and gained a better understanding of what was going on. In the end I increased the number of articles to over 1.3 million, trained for about a week and a half on my 980 GTX, and once I got the average loss down to about 1.6 to 2.2 I was seeing MUCH better results.
I am learning this as I go, but I stopped at the above average loss because some reading I did stated that when you run "eval" against your "test" data, the average loss should be close to what you are seeing in training; when the two are far apart it is a sign you may be over-fitting. Again, take this with a grain of salt, as I am still learning, but it seems to make sense logically to me.
One last note that I learned the hard way: make sure you upgrade to the latest 0.11 TensorFlow version. I originally trained using 0.9, but when I went to figure out how to export the model, I found that there was no export.py file in that repo. When I upgraded to 0.11, I then found that the checkpoint file structure seems to have changed in 0.11, and I needed another two weeks to retrain. So I would recommend just upgrading, as they have resolved a number of the problems I was seeing during the RC. I still had to set is_tuple=false, but that aside, all has worked out well. Hope this helps someone.
I need assistance as an ML beginner. I have retrained a TF2 model using make_image_classifier from TensorFlow Hub (the command-line approach).
My very first training:
I immediately retrained the model once again, as the first one's predictions did not satisfy me, this time using 4 classes instead, and achieved 85% val. accuracy in Epoch 4/5 and 81% val. accuracy in Epoch 5/5. Because the problem is complex, I decided to expand my dataset, increasing the amount of data fed to the model.
I have expanded the dataset more than twofold! The results are shocking and I have no clue what to do. I can't believe this.
If anything, the added data is of much better quality, relevance, and diversity, and it was actually obtained by myself, a human "discriminator" of the 2nd model's performance (the 81% accuracy one). If I knew how to do it (which is a second question), I would simply add this "feedback data" to my already working model. But I have no idea how to make that happen; ideally, I'd want to give feedback to the model and let it train once again on the additional data, understanding that this is feedback rather than a random additional set of data. Training a new model from scratch on the updated data is something I'd like to try anyway.
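Something like the sketch below is what I imagine for the feedback idea, i.e. loading the model that make_image_classifier saved and continuing training on just the new images, but I don't know whether this is the right approach (the paths, image size, scaling, learning rate, and loss are placeholders/assumptions on my side):

```python
import tensorflow as tf
import tensorflow_hub as hub

# Load the SavedModel produced by make_image_classifier (path is a placeholder).
model = tf.keras.models.load_model(
    'my_saved_model', custom_objects={'KerasLayer': hub.KerasLayer})

# New "feedback" images, organized one sub-folder per class.
feedback_ds = tf.keras.preprocessing.image_dataset_from_directory(
    'feedback_images/', image_size=(224, 224), batch_size=32,
    label_mode='categorical')
feedback_ds = feedback_ds.map(lambda x, y: (x / 255.0, y))  # assuming [0,1] scaling

# Small learning rate so the extra epochs refine the model instead of overwriting it.
model.compile(optimizer=tf.keras.optimizers.Adam(1e-4),
              loss='categorical_crossentropy', metrics=['accuracy'])
model.fit(feedback_ds, epochs=3)
model.save('my_saved_model_v2')
```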
How do I interpret those numbers? What could be wrong with the data? Is the data even the problem, as I assume? Why is the training accuracy below 0.5? What could have caused the drop between Epoch 2/5 and Epoch 3/5 that started the downfall? What are common issues and similar situations when something like this happens?
I'd love to understand what happened behind the curtains at a lower level, but I'm having difficulty and need guidance. This is as discouraging as the first and second trainings were encouraging.
Possible problems I can see with the data (though I can't believe they could cause such a drop, especially since they were present during the 1st and 2nd trainings, which went okay):
1-5% of the images could belong to multiple classes (2, or at most 3). In my opinion this is even desirable for this problem, since the model imitates me and I want it to struggle with working out what it is about certain elements that made me classify them as two or more classes at once, i.e. to find the features in them that made me think they'd satisfy both.
1-3% can be duplicates.
There's a white border around the images (since it didn't seem to affect the first trainings, I didn't bother to remove it).
The drop is 40%.
I will now try to follow TensorFlow's tutorial and do it through a script rather than the command line, but seeing such a huge drop is very discouraging. I was hoping to see an increase; after all, the common advice is to feed a model more data, and I made sure to feed it quality additional data. I appreciate every single suggestion for fine-tuning the model, as I am a beginner and have no experience with what might work and what most probably won't. Thank you.
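For the scripted attempt, my plan is roughly the usual TensorFlow Hub transfer-learning setup; a minimal sketch of what I intend to try (the module URL, image size, directory layout, and epoch count are just my starting assumptions):

```python
import tensorflow as tf
import tensorflow_hub as hub

IMAGE_SIZE = (224, 224)
NUM_CLASSES = 4  # my current number of classes

train_ds = tf.keras.preprocessing.image_dataset_from_directory(
    'dataset/', validation_split=0.2, subset='training', seed=42,
    image_size=IMAGE_SIZE, batch_size=32)
val_ds = tf.keras.preprocessing.image_dataset_from_directory(
    'dataset/', validation_split=0.2, subset='validation', seed=42,
    image_size=IMAGE_SIZE, batch_size=32)

# Scale pixels to [0, 1], which is what the Hub feature vectors expect.
train_ds = train_ds.map(lambda x, y: (x / 255.0, y))
val_ds = val_ds.map(lambda x, y: (x / 255.0, y))

model = tf.keras.Sequential([
    tf.keras.layers.InputLayer(input_shape=IMAGE_SIZE + (3,)),
    hub.KerasLayer(
        'https://tfhub.dev/google/imagenet/mobilenet_v2_100_224/feature_vector/4',
        trainable=False),  # frozen feature extractor
    tf.keras.layers.Dropout(0.2),
    tf.keras.layers.Dense(NUM_CLASSES, activation='softmax'),
])
model.compile(optimizer='adam',
              loss='sparse_categorical_crossentropy',  # int labels from image_dataset_from_directory
              metrics=['accuracy'])
model.fit(train_ds, validation_data=val_ds, epochs=5)
```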
EDIT: I have cleaned my data by:
removing borders (now they are tiny, white, and sometimes not present at all)
removing dust
removing artifacts, of which there were quite a few in the added data!
Convinced the last point was the issue, I retrained. Results are even worse!
2 classes binary
4 classes
I'm using TensorFlow-GPU 1.8 on Windows 10. For many projects I use tf.Estimator, which really works great. It takes care of a bunch of steps, including writing summaries for TensorBoard. But right now the 'events.out.tfevents' file is getting way too big and I am running into "out of space" errors. For that reason I want to disable summary writing, or at least reduce the number of summaries written.
Going along with that mission, I found out about the RunConfig you can pass in when constructing a tf.Estimator. Apparently the parameter 'save_summary_steps' (100 by default) controls how often summaries are written out. Unfortunately, changing this parameter seems to have no effect at all: it neither disables summaries (using a None value) nor reduces the size of 'events.out.tfevents' (when choosing higher values, e.g. 3000).
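For completeness, this is roughly how I am wiring it up; a minimal sketch with a toy model_fn and random data just to show where RunConfig goes (the real model and input_fn are of course different):

```python
import numpy as np
import tensorflow as tf

def my_model_fn(features, labels, mode):
    # Trivial linear model, just enough to exercise the Estimator machinery.
    logits = tf.layers.dense(features['x'], 2)
    loss = tf.losses.sparse_softmax_cross_entropy(labels=labels, logits=logits)
    train_op = tf.train.AdamOptimizer().minimize(
        loss, global_step=tf.train.get_global_step())
    return tf.estimator.EstimatorSpec(mode, loss=loss, train_op=train_op)

run_config = tf.estimator.RunConfig(
    model_dir='model_dir',
    save_summary_steps=3000,      # hoping for far fewer summary writes
    log_step_count_steps=3000)    # the steps/sec summaries have their own knob

estimator = tf.estimator.Estimator(model_fn=my_model_fn, config=run_config)

train_input_fn = tf.estimator.inputs.numpy_input_fn(
    x={'x': np.random.rand(1000, 4).astype(np.float32)},
    y=np.random.randint(0, 2, 1000),
    batch_size=32, num_epochs=None, shuffle=True)

estimator.train(input_fn=train_input_fn, steps=5000)
```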
I hope you guys can help me out here. Any help is appreciated.
Cheers,
Tobs.
I've observed the following behavior. It doesn't make sense to me so I hope we get a better answer:
When the input_fn gets its data from a tf.data.TFRecordDataset, the number of steps between saved events is the minimum of save_summary_steps and (number of training examples divided by batch size). In other words, it saves at least once per epoch.
When the input_fn gets its data from tf.TextLineReader, it follows save_summary_steps as you'd expect, and I can give it a large value for infrequent updates.
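To make the comparison concrete, these are roughly the two kinds of input_fn I mean (file names, feature shapes, and the CSV layout are placeholders for whatever your records actually contain):

```python
import tensorflow as tf

def tfrecord_input_fn():
    # Variant 1: tf.data pipeline reading TFRecords.
    def parse(serialized):
        parsed = tf.parse_single_example(
            serialized, {'x': tf.FixedLenFeature([4], tf.float32),
                         'y': tf.FixedLenFeature([], tf.int64)})
        return {'x': parsed['x']}, parsed['y']
    ds = tf.data.TFRecordDataset('train.tfrecords')
    ds = ds.map(parse).shuffle(1000).repeat().batch(32)
    return ds.make_one_shot_iterator().get_next()

def textline_input_fn():
    # Variant 2: queue-based reader over a CSV file.
    filename_queue = tf.train.string_input_producer(['train.csv'])
    reader = tf.TextLineReader()
    _, line = reader.read(filename_queue)
    cols = tf.decode_csv(line, record_defaults=[[0.0]] * 4 + [[0]])
    features, label = tf.stack(cols[:4]), cols[4]
    x_batch, y_batch = tf.train.shuffle_batch(
        [features, label], batch_size=32, capacity=1000, min_after_dequeue=100)
    return {'x': x_batch}, y_batch
```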
I have a model published on Google Cloud ML and it is working fine: I can run online predictions and get the right results. My issue is performance. Each prediction (inference) takes around 3.5 seconds, which is not good for my case. I'm using automatic scaling and my bucket is in us-central. My images are around 100 KB and I'm in Brazil (in the GCloud console I can see the latency is about 1.5 seconds).
I already tried optimize_for_inference.py, but it didn't work (I can't generate the saved_model from the optimized graph; is that even possible?).
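For context, this is roughly the direction I was attempting when trying to wrap the optimized graph back into a SavedModel; a sketch only, and the file path and tensor names are placeholders from my own model:

```python
import tensorflow as tf

# Load the optimized/frozen GraphDef.
graph_def = tf.GraphDef()
with tf.gfile.GFile('optimized_graph.pb', 'rb') as f:
    graph_def.ParseFromString(f.read())

with tf.Graph().as_default() as graph:
    tf.import_graph_def(graph_def, name='')
    inputs = graph.get_tensor_by_name('input_image:0')   # my input tensor name
    outputs = graph.get_tensor_by_name('scores:0')       # my output tensor name

    builder = tf.saved_model.builder.SavedModelBuilder('export_dir')
    with tf.Session(graph=graph) as sess:
        signature = tf.saved_model.signature_def_utils.predict_signature_def(
            inputs={'image': inputs}, outputs={'scores': outputs})
        builder.add_meta_graph_and_variables(
            sess, [tf.saved_model.tag_constants.SERVING],
            signature_def_map={
                tf.saved_model.signature_constants.DEFAULT_SERVING_SIGNATURE_DEF_KEY:
                    signature})
    builder.save()
```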
I need to get the result in at most 2 seconds. My question is: is that possible, or is it normal to get results in 3-4 seconds using gcloud ml predict?
Thanks! Any ideas are welcome! If you need more information to help me, please add a comment!
Thanks again!
Recently I started toying with TensorFlow, DNNs, etc., and now I'm trying to implement something more serious: information retrieval from short sentences (doctor instructions).
Unfortunately the dataset I have is, as always, quite "dirty". Since I'm trying to use word embeddings, I actually need "clean" data. Take one example:
"Take two pilleach day". There is a missing white space between "pill" and "each". I am implementing a "tokenizer improver" that looks at each sentence and proposes a new tokenization based on the joint probability of the words in the sentence, given the term frequencies (tf) in the whole document. As I was working on it today, a thought came to my mind: why bother writing a suboptimal solution to this problem when I could employ a powerful learning algorithm such as an LSTM network to do it for me? However, as of today, I only have a feeling that this is actually possible, and as we know, feelings are not the best basis for architecting such complex problems. I don't know where to begin: what should my training set and learning goal be?
I know this is a broad question, but I know there are many brilliant people with more knowledge about TensorFlow and neural nets, so I'm sure somebody has either already solved a similar problem or knows how to approach it.
Any guidance is welcome; I do not expect you to solve this for me, of course :)
Kisses and all the best to the whole TensorFlow community :)
I had the same issue. I solved it by using a character-level net. Basically I reimplemented Character-Aware Neural Language Models, threw out the word-level elements, and just stayed at the character level.
Training data: I took the data I had, as dirty as it was, used the dirty data as targets, and made it even dirtier to create the inputs.
So your "Take two pilleach day" can be learned because in many cases you do have a clean, similar phrase, e.g. "Take one pill each morning"; with the regime described above, that phrase serves as the target and you train the net on corrupted inputs like "Take oe pileach mornin".
Having read this article about a guy who uses TensorFlow to sort cucumbers into nine different classes, I was wondering if this type of process could be applied to a large number of classes. My idea would be to use it to identify Lego parts.
At the moment, a site like BrickLink describes more than 40,000 different parts, so it would be a bit different from the cucumber example, but I am wondering if it sounds suitable. There is no easy way to get hundreds of pictures for each part, but does the following process sound feasible?
take pictures of a part;
try to identify the part using TensorFlow;
if it does not identify the correct part, take more pictures and feed them to the neural network;
go on with the next part.
That way, each time we encounter a new piece, we "teach" the network its reference so that it can be recognized better the next time. After hundreds of such human-monitored iterations, could we imagine TensorFlow being able to recognize the parts, or at least the most common ones?
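In rough Python, the loop being described would look something like this; classify, retrain, take_pictures, and ask_human are hypothetical stand-ins for whatever model, training pipeline, and capture/review steps end up being used:

```python
# Sketch of the human-in-the-loop idea: the model labels each new part,
# a human confirms or corrects the label, and corrections feed back into training.
def human_in_the_loop(parts, classify, retrain, take_pictures, ask_human):
    training_set = []
    for part in parts:
        images = take_pictures(part)
        guess = classify(images)
        true_label = ask_human(guess, images)   # human confirms or corrects the guess
        training_set.extend((img, true_label) for img in images)
        if guess != true_label:
            retrain(training_set)               # only retrain when the model was wrong
    return training_set
```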
My question might sound stupid, but I am not into neural networks, so any advice is welcome. At the moment I have not found any way to identify a Lego part from pictures, and this "cucumber example" sounds promising, so I am looking for feedback.
Thanks.
You can read about the work of Jacques Mattheij; he actually uses a customized version of Xception [1] running on Keras (https://keras.io/).
The introduction is Sorting 2 Metric Tons of Lego.
In Sorting 2 Tons of Lego, The Software Side, you can read:
The hard challenge to deal with next was to get a training set large enough to make working with 1000+ classes possible. At first this seemed like an insurmountable problem. I could not figure out how to make enough images and to label them by hand in acceptable time, even the most optimistic calculations had me working for 6 months or longer full-time in order to make a data set that would allow the machine to work with many classes of parts rather than just a couple.
In the end the solution was staring me in the face for at least a week before I finally clued in: it doesn't matter. All that matters is that the machine labels its own images most of the time and then all I need to do is correct its mistakes. As it gets better there will be fewer mistakes. This very rapidly expanded the number of training images. The first day I managed to hand-label about 500 parts. The next day the machine added 2000 more, with about half of those labeled wrong. The resulting 2500 parts were the basis for the next round of training 3 days later, which resulted in 4000 more parts, 90% of which were labeled right! So I only had to correct some 400 parts, rinse, repeat… So, by the end of two weeks there was a dataset of 20K images, all labeled correctly.
This is far from enough, some classes are severely under-represented so I need to increase the number of images for those, perhaps I'll just run a single batch consisting of nothing but those parts through the machine. No need for corrections, they'll all be labeled identically.
A recent update is Sorting 2 Tons of Lego, Many Questions, Results.
[1] CHOLLET, François. Xception: Deep Learning with Depthwise Separable Convolutions. arXiv preprint arXiv:1610.02357, 2016.
I have started this using IBM Watson's Visual Recognition.
I had six different bricks to be recognized on the transport belt background.
I am actually thinking about TensorFlow, since I can have it running locally.
The codelab TensorFlow for Poets describes almost exactly what you want to achieve.
For a demo of the Watson version:
https://www.ibm.com/developerworks/community/blogs/ibmandgoogle/entry/Lego_bricks_recognition_with_Watosn_lego_and_raspberry_pi?lang=en