How to solve User Warning Image Retrieving Problem? - google-colaboratory

How do I solve this problem:
/usr/local/lib/python3.6/dist-packages/keras/utils/data_utils.py:616: UserWarning: The input 49 could not be retrieved. It could be because a worker has died.
And does the val_accuracy get affected?

Follow this thread; others have faced the same problem and dealt with it too. Basically it is just a warning message: it only matters if you have a flaw in your logic, and if not, don't worry about it.
Here is the link; I hope it is useful.
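If the warning keeps reappearing, a common workaround is to reduce the number of data-loading workers or disable multiprocessing in Keras. Here is a minimal sketch; the model and the Sequence generator below are made up for illustration, and only the workers / use_multiprocessing / max_queue_size arguments are the point:

import numpy as np
from keras.models import Sequential
from keras.layers import Dense
from keras.utils import Sequence

# A dummy Sequence-based generator standing in for the real data pipeline
# (shapes and sizes are made up for illustration).
class DummySequence(Sequence):
    def __len__(self):
        return 10
    def __getitem__(self, idx):
        x = np.random.rand(32, 8)
        y = np.random.randint(0, 2, size=(32, 1))
        return x, y

model = Sequential([Dense(16, activation='relu', input_shape=(8,)),
                    Dense(1, activation='sigmoid')])
model.compile(optimizer='adam', loss='binary_crossentropy',
              metrics=['accuracy'])

# Reducing workers / disabling multiprocessing often silences the
# "input X could not be retrieved" warning, at the cost of slower loading.
model.fit_generator(DummySequence(),
                    epochs=2,
                    workers=1,
                    use_multiprocessing=False,
                    max_queue_size=10)

This trades data-loading speed for stability of the worker processes.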

Related

Unable to resume solving using a previous best solution or a serialized solution

To all,
Version of optaplanner: 7.48
For a while now, I have no longer been able to resume solving.
The process is:
thread 1: solver.solve();
thread 2: solver.terminateEarly();
thread 2: solver.solve(solver.getBestSolution());
The shorter the time between solve() and terminateEarly(), the less likely the resume is to work correctly.
When it does not work, the symptoms are: after the Construction Heuristic finishes, only a few new best solutions are found, and then the solver stops finding new best solutions forever, even though it is still running at a significant CPU rate.
The problem is similar when solver.getBestSolution() is serialized and reloaded later.
Any suggestion?
Thanks.
Regards.
JLL
Based on the contents of the question, the title is wrong: OptaPlanner resumes just fine, it just cannot find any better solutions. There are two reasons why that could be the case:
There are no more better solutions to be found. The bigger your data set becomes, the less likely this is.
There are better solutions available, but OptaPlanner cannot get to them because it is stuck in a local optimum. This is a common problem.
Escaping local optima is usually accomplished by a combination of the following:
Eliminating score traps from your constraints.
Increasing variety in move selection. See the available generic moves, or consider implementing a custom move for any intricacies of your particular problem.
Iterative local search. We do not (yet) support that out of the box, but the general idea is that at a certain point, you ruin a part of your solution (perhaps by uninitializing it) and then recreate it (randomly or otherwise).
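As a rough, generic illustration of that last point, here is a sketch of the ruin-and-recreate idea in plain Python. This is not OptaPlanner API; the score, greedy_assign and local_search callbacks and the list-of-assignments encoding are all hypothetical placeholders:

import random

# Generic iterated local search: periodically "ruin" part of the best
# solution and rebuild it, so the search can jump out of a local optimum.
def iterated_local_search(initial, score, greedy_assign, local_search,
                          iterations=100, ruin_fraction=0.2):
    best = local_search(initial)
    for _ in range(iterations):
        candidate = list(best)
        # Ruin: uninitialize a random subset of the planning variables.
        ruined = random.sample(range(len(candidate)),
                               int(len(candidate) * ruin_fraction))
        for i in ruined:
            candidate[i] = None
        # Recreate: reassign the ruined variables greedily, then polish
        # the result with ordinary local search.
        for i in ruined:
            candidate[i] = greedy_assign(candidate, i)
        candidate = local_search(candidate)
        if score(candidate) > score(best):
            best = candidate
    return best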
Finally, I wholeheartedly recommend that you upgrade to OptaPlanner 8. The upgrade is easy, and the 7.x stream has been in maintenance mode for a very long time now.

Is there a reason behind this weird noise in GAN generated images?

When training a GAN to generate images of faces (with the celebrity faces dataset from Kaggle), I keep getting OK-ish looking faces, but with very weird noise that looks like fire (and makes the generated faces look unnervingly demonic). An example of the result can be found here. The same phenomenon has appeared all three times I've tried to train it, so I was wondering if anyone knows of a reason why such a specific, and weird, type of noise would keep appearing? I'm asking mostly out of curiosity about what's going on; since I'm a total beginner, if the way of mitigating the effect is really complicated, don't worry about trying to explain how to fix it.
Thanks!

Tensorflow: The graph couldn't be sorted in topological order only when running in terminal

I encountered a problem while running a Python script on a Google Cloud compute instance, with Python 3.6 and TensorFlow 1.13.1. I have seen several people on Stack Overflow encounter similar problems with loops in the computational graph, but none of them really found the culprit. I observed something interesting, so maybe someone experienced can figure it out.
The error message is like this:
2019-05-28 22:28:57.747339: E tensorflow/core/grappler/optimizers/dependency_optimizer.cc:704] Iteration = 0, topological sort failed with message: The graph couldn't be sorted in topological order.
2019-05-28 22:28:57.754195: E tensorflow/core/grappler/optimizers/dependency_optimizer.cc:704] Iteration = 1, topological sort failed with message: The graph couldn't be sorted in topological order.
My train.py script looks like this:
import A, B, C
...
def main():
    ....
if __name__ == '__main__':
    main()
Here are the two ways I run this script:
VERSION 1:
In the terminal:
python3 train.py
This gives me the error stated above. When I only use a CPU, I notice it throws something like failed call to cuInit: CUDA_ERROR_NO_DEVICE: no CUDA-capable device is detected. So I added a GPU to my instance, but the loop in the computational graph is still there.
VERSION 2 (this is where the weird thing happens):
I simply copy the code in main, with nothing changed, into a Jupyter notebook and run it there. Then suddenly, no error occurs anymore.
I don't really know what's going on under the hood. I just notice that the messages at the beginning of the run are not the same between the two ways of running the code.
If you encounter the same problem, copying the code into a Jupyter notebook might help directly. I would really like to share more info if anyone has an idea of what might be causing this. Thank you!
Well, it turns out that, no matter what, I chose a wrong way to build the graph at the beginning, one which from my perspective should not create a loop. The loop error gave me a hint that I was doing something wrong, but the interesting question I mentioned above is still not answered. However, I'd like to share my mistake, so anyone who sees the loop error should check whether they are doing the same thing as me.
In the input_fn, I used tensor.eval() to get the corresponding numpy.array mid-graph in order to interact with data outside of that function. I chose not to use tf.data.Dataset because the whole process is complicated and I couldn't compress the whole thing into a Dataset directly. But it turns out this approach sabotages the static computational graph design of TensorFlow, so during training it trained on the same batch again and again. My two cents of advice: if you want to achieve something super complex in your input_fn, you will likely be better off, or may even only be doing the right thing, by using the old-fashioned modelling style with tf.placeholder.
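To illustrate that alternative, here is a minimal sketch of the tf.placeholder + feed_dict style (assuming TensorFlow 1.x; the shapes, model and next_batch helper below are made up), where arbitrary Python-side batching logic stays outside the graph instead of calling tensor.eval() inside input_fn:

import numpy as np
import tensorflow as tf  # TensorFlow 1.x, as in the question

# Placeholders let arbitrary Python/NumPy code prepare each batch outside
# the graph, then feed it in per training step.
x = tf.placeholder(tf.float32, shape=[None, 10], name="features")
y = tf.placeholder(tf.float32, shape=[None, 1], name="labels")
pred = tf.layers.dense(x, 1)
loss = tf.losses.mean_squared_error(y, pred)
train_op = tf.train.AdamOptimizer(1e-3).minimize(loss)

def next_batch(batch_size=32):
    # Arbitrarily complex Python-side batching logic can live here,
    # without touching the static graph.
    return (np.random.rand(batch_size, 10).astype(np.float32),
            np.random.rand(batch_size, 1).astype(np.float32))

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    for step in range(100):
        bx, by = next_batch()
        _, l = sess.run([train_op, loss], feed_dict={x: bx, y: by})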

YOLO 3 Assertion '0' failed error - NOT CRLF error

I'm looking to train my own object detector using YOLO 3 for a single class. Basically, it needs to detect whether the test images contain the object or not. I face an error where the training doesn't begin and exits with an assertion '0' failed. I checked other answers, which say that CRLF line endings must be converted to LF for it to work on Linux, but that solution doesn't work either. I'm following all the steps outlined on pjreddie's website.
The error can happen because of the image size or the batch size (and sometimes the subdivisions as well).
You can try tuning these parameters to lower values.
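For example, these are the relevant fields in the darknet .cfg file; the values below are only illustrative, so adjust them to your GPU memory and data:

[net]
# smaller batch and larger subdivisions reduce GPU memory per step
batch=32
subdivisions=16
# input size must be a multiple of 32
width=416
height=416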
Okay, so what helped was restarting the server and redoing everything from the beginning.
It hasn't happened again, so I think it was just a random error?
I'm not sure though. I will try to look into it further.

Textsum - Incorrect decode results compared to ref file

This issue is seen when training against my own dataset, which was converted to binary via data_convert_example.py. After a week of training, I get decode results that don't make sense when comparing the decode and ref files.
If anyone has been successful and gotten results similar to what is posted in the Textsum readme using their own data, I would love to know what has worked for you: environment, TF build, number of articles.
I currently have not had luck with 0.11, but have gotten some results with 0.9; however, the decode results are similar to those shown below, and I have no idea where they are even coming from.
I am currently running Ubuntu 16.04, TF 0.9, CUDA 7.5 and cuDNN 4. I tried TF 0.11 but was dealing with other issues, so I went back to 0.9. It does seem that the decode results are being generated from valid articles, but the reference file and decode file indices have NO correlation.
If anyone can provide any help or direction, it would be greatly appreciated. Otherwise, should I figure anything out, I will post here.
A few final questions. Regarding the vocab file referenced: does it need to be sorted by word frequency at all? I never did anything along those lines when generating it, and I just wasn't sure if this would throw something off as well.
Finally, when generating the data I made the assumption that the training articles should be broken down into smaller batches. I separated the articles into multiple files of 100 articles each, which were then named data-0, data-1, etc. I assume this was a correct assumption on my part? I also kept all the vocab in one file, which has not seemed to throw any errors.
Are the above assumptions correct as well?
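For what it's worth, the sharding described above can be done with a few lines of Python. This is only a sketch (the write_shards helper and the plain-text output are illustrative; the actual binary conversion still goes through data_convert_example.py):

# Split a list of article strings into files of 100 each,
# named data-0, data-1, ... as described above.
def write_shards(articles, shard_size=100, prefix="data"):
    for start in range(0, len(articles), shard_size):
        shard = articles[start:start + shard_size]
        with open("%s-%d" % (prefix, start // shard_size), "w") as f:
            f.write("\n".join(shard))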
Below are some ref and decode results, which you can see are quite odd and seem to have no correlation.
DECODE:
output=Wild Boy Goes About How I Can't Be Really Go For Love
output=State Department defends the campaign of Iran
output=John Deere sails profit - Business Insider
output=to roll for the Perseid meteor shower
output=Man in New York City in Germany
REFERENCE:
output=Battle Chasers: Nightwar Combines Joe Mad's Stellar Art With Solid RPG Gameplay
output=Obama Meets a Goal That Could Literally Destroy America
output=WOW! 10 stunning photos of presidents daughter Zahra Buhari
output=Koko the gorilla jams out on bass with Flea from Red Hot Chili Peppers
output=Brenham police officer refused service at McDonald's
Going to answer this one myself. It seems the issue here was a lack of training data. In the end I did end up sorting my vocab file; however, it seems this is not necessary. The reason this was done was to allow the end user to limit the vocab to something like 200k words should they wish.
The biggest reason for the problems above was simply the lack of data. When I ran the training in the original post, I was working with 40k+ articles. I thought this was enough, but clearly it wasn't, and this became even more evident when I got deeper into the code and gained a better understanding of what was going on. In the end I increased the number of articles to over 1.3 million, trained for about a week and a half on my 980 GTX, got the average loss down to about 1.6 to 2.2, and was seeing MUCH better results.
I am learning this as I go, but I stopped at the above average loss because some reading I did stated that when you run "eval" against your "test" data, the average loss should be close to what you are seeing in training; when the two are far apart, it is a sign that you may be getting close to over-fitting. Again, take this with a grain of salt, as I am still learning, but it seems to make sense logically to me.
One last note that I learned the hard way: make sure you upgrade to the latest 0.11 TensorFlow version. I originally trained using 0.9, but when I went to figure out how to export the model for TensorFlow, I found that there was no export.py file in that repo. When I upgraded to 0.11, I then found that the checkpoint file structure seems to have changed in 0.11, and I needed to take another two weeks to train. So I would recommend just upgrading, as they have resolved a number of the problems I was seeing during the RC. I still did have to set is_tuple=false, but that aside, all has worked out well. Hope this helps someone.