The same OOM curve on different apps - google-fabric

We record our crashes and OOM events in Fabric, but we recently discovered that several of our apps, running on different servers, show the same OOM anomaly curve at the same time. I would like to know if anyone understands this situation, or if anyone has seen the same thing.

I had a similar problem in May, particularly on May 9th, 15th, and 18th.
Does that match yours?
Cordially,


Unable to resume solving using a previous best solution or a serialized solution

To all,
OptaPlanner version: 7.48
For a while now, I have no longer been able to resume solving.
The process is:
thread 1: solver.solve();
thread 2: solver.terminateEarly();
thread 2: solver.solve(solver.getBestSolution());
The shorter the time between solve() and terminateEarly(), the less likely the resume is to work well.
When it does not work, the symptoms are: after the Construction Heuristic finishes, only a few new best solutions are found, and then the solver stops finding new best solutions altogether, even though it is still calculating at a significant CPU rate.
The problem is similar when solver.getBestSolution() is serialized and reloaded later.
Any suggestion?
Thanks.
Regards.
JLL
Based on the contents of the question, the title is wrong - OptaPlanner resumes just fine, it just cannot find any better solutions. There are two possible reasons for that:
There are no better solutions left to be found. The bigger your data set becomes, the less likely this is.
There are better solutions available, but OptaPlanner cannot get to them, as it is stuck in a local optimum. This is a common problem.
Escaping local optima is usually accomplished by a combination of the following:
Eliminating score traps from your constraints.
Increasing variety in move selection. See the available generic moves, or consider implementing a custom move for any intricacies of your particular problem.
Iterative local search. We do not (yet) support that out of the box, but the general idea is that at a certain point, you ruin a part of your solution (perhaps by uninitializing it) and then recreate it (randomly or otherwise).
Finally, I wholeheartedly recommend upgrading to OptaPlanner 8. The upgrade is easy, and the 7.x stream has been in maintenance mode for a very long time now.

video object classification suggestion

I am trying to build an outdoor smoke detector for the neighbors' chimneys.
I live in a neighborhood where a couple of houses still use wood-burning fireplaces that cause lots of smoke, and they do so during the daytime. When it is smoky outside, the kids' room sometimes has its windows open; smoke gets in and is very hard to get out. The worst part is that it is not illegal (yet), so I have found little help apart from talking to the neighbors and reacting quickly, in vain.
I am thinking of having an outdoor camera pointed at the chimneys to detect smoke, with a program that then sends an alert text message. Most of the time the image is pretty still, without much variation, so I imagine it shouldn't be too hard a classification problem. I have little experience with TensorFlow or machine learning, but I am a good programmer, so given some direction and an existing model I hope I can get this working...
I know this sounds desperate; nevertheless, it is for a good cause. Please help.
For fire and smoke classification, you can check the following tutorial: https://www.pyimagesearch.com/2019/11/18/fire-and-smoke-detection-with-keras-and-deep-learning/.
PyImageSearch is a very good website for image processing; you can find many articles there that can help you (even deployment of neural networks on a Raspberry Pi and so on).
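That tutorial walks through training a fire/smoke classifier with Keras; I won't reproduce its code here, but a minimal transfer-learning sketch of the same idea looks roughly like this (the data/ layout with smoke/ and no_smoke/ sub-folders, the MobileNetV2 backbone, and the image size are assumptions for illustration, not the tutorial's exact setup):

    import tensorflow as tf

    # Illustrative layout: camera frames sorted into data/smoke and data/no_smoke.
    train_ds = tf.keras.utils.image_dataset_from_directory(
        "data", validation_split=0.2, subset="training", seed=42,
        label_mode="binary", image_size=(224, 224), batch_size=32)
    val_ds = tf.keras.utils.image_dataset_from_directory(
        "data", validation_split=0.2, subset="validation", seed=42,
        label_mode="binary", image_size=(224, 224), batch_size=32)

    # Frozen ImageNet backbone with a small binary head on top.
    base = tf.keras.applications.MobileNetV2(
        input_shape=(224, 224, 3), include_top=False, weights="imagenet")
    base.trainable = False

    inputs = tf.keras.Input(shape=(224, 224, 3))
    x = tf.keras.applications.mobilenet_v2.preprocess_input(inputs)
    x = base(x, training=False)
    x = tf.keras.layers.GlobalAveragePooling2D()(x)
    outputs = tf.keras.layers.Dense(1, activation="sigmoid")(x)
    model = tf.keras.Model(inputs, outputs)

    model.compile(optimizer="adam", loss="binary_crossentropy",
                  metrics=["accuracy"])
    model.fit(train_ds, validation_data=val_ds, epochs=5)

From there, the alerting part is just running the model on a cropped frame every so often and sending a text (for example via an email-to-SMS gateway) whenever the smoke probability stays high for several consecutive frames.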

Textsum - Incorrect decode results compared to ref file

This issue is seen when training against my own dataset, which was converted to binary via data_convert_example.py. After a week of training, I get decode results that don't make sense when comparing the decode and ref files.
If anyone has been successful and gotten results similar to what is posted in the Textsum readme using their own data, I would love to know what has worked for you: environment, TF build, number of articles.
I have not had luck with 0.11, but have gotten some results with 0.9; however, the decode results are similar to those shown below, and I have no idea where they are even coming from.
I am currently running Ubuntu 16.04, TF 0.9, CUDA 7.5 and cuDNN 4. I tried TF 0.11 but was dealing with other issues, so I went back to 0.9. It does seem that the decode results are being generated from valid articles, but the reference file and decode file indices have NO correlation.
If anyone can provide any help or direction, it would be greatly appreciated. Otherwise, should I figure anything out, I will post here.
A few final questions. Regarding the vocab file referenced: does it need to be sorted by word frequency at all? I did nothing along these lines when generating it and wasn't sure whether this would throw something off.
Finally, when generating the data I assumed that the training articles should be broken down into smaller batches, so I separated them into multiple files of 100 articles each, named data-0, data-1, etc. I assume this was a correct assumption on my part? I also kept all the vocab in one file, which has not seemed to throw any errors.
Are the above assumptions correct as well?
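For reference, the split itself is nothing fancy; a rough sketch of how I produce the data-0, data-1, ... shards, assuming one plain-text article per line (the file names are illustrative, and the real files are converted to binary with data_convert_example.py afterwards):

    import os

    # Split the articles into shards of 100, named data-0, data-1, ...
    SHARD_SIZE = 100

    with open("articles.txt", encoding="utf-8") as f:
        articles = [line.strip() for line in f if line.strip()]

    os.makedirs("shards", exist_ok=True)
    for shard, start in enumerate(range(0, len(articles), SHARD_SIZE)):
        out_path = os.path.join("shards", "data-%d" % shard)
        with open(out_path, "w", encoding="utf-8") as out:
            out.write("\n".join(articles[start:start + SHARD_SIZE]) + "\n")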
Below are some ref and decode results which you can see are quite odd and seem to have no correlation.
DECODE:
output=Wild Boy Goes About How I Can't Be Really Go For Love
output=State Department defends the campaign of Iran
output=John Deere sails profit - Business Insider
output=to roll for the Perseid meteor shower
output=Man in New York City in Germany
REFERENCE:
output=Battle Chasers: Nightwar Combines Joe Mad's Stellar Art With Solid RPG Gameplay
output=Obama Meets a Goal That Could Literally Destroy America
output=WOW! 10 stunning photos of presidents daughter Zahra Buhari
output=Koko the gorilla jams out on bass with Flea from Red Hot Chili Peppers
output=Brenham police officer refused service at McDonald's
Going to answer this one myself. It seems the issue here was a lack of training data. In the end I did sort my vocab file, although it appears this is not necessary; the reason for doing so was to allow the end user to limit the vocab to something like 200k words should they wish.
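The sort itself was trivial; a minimal sketch, assuming the vocab file is one "word count" pair per line (the file names and the 200k cutoff are illustrative):

    # Sort a "word count" vocab file by frequency, highest first, and
    # optionally cap it at a maximum size (200k here, as mentioned above).
    MAX_WORDS = 200000

    with open("vocab", encoding="utf-8") as f:
        pairs = [line.split() for line in f if len(line.split()) == 2]

    pairs.sort(key=lambda pair: int(pair[1]), reverse=True)

    with open("vocab-sorted", "w", encoding="utf-8") as out:
        for word, count in pairs[:MAX_WORDS]:
            out.write("%s %s\n" % (word, count))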
The biggest reason for the problems above was simply the lack of data. When I ran the training in the original post, I was working with 40k+ articles. I thought this was enough, but clearly it wasn't, and this became even more evident once I got deeper into the code and gained a better understanding of what was going on. In the end I increased the number of articles to over 1.3 million and trained for about a week and a half on my 980 GTX; once the average loss got down to about 1.6 to 2.2, I was seeing MUCH better results.
I am learning this as I go, but I stopped at the above average loss because some reading I did stated that when you run "eval" against your "test" data, the average loss should be close to what you are seeing in training; if the two are far apart, it is a sign you may be getting close to over-fitting. Again, take this with a grain of salt, as I am still learning, but it seems to make sense logically to me.
One last note that I learned the hard way: make sure you upgrade to the latest 0.11 TensorFlow version. I originally trained using 0.9, but when I went to figure out how to export the model, I found that there was no export.py file in that repo. When I upgraded to 0.11, I found that the checkpoint file structure seems to have changed, and I needed to take another two weeks to train. So I would recommend just upgrading, as they have resolved a number of the problems I was seeing during the RC. I still had to set is_tuple=false, but that aside, all has worked out well. Hope this helps someone.

Using tensorflow to identify lego bricks?

Having read this article about a guy who uses TensorFlow to sort cucumbers into nine different classes, I was wondering if this type of process could be applied to a large number of classes. My idea would be to use it to identify Lego parts.
At the moment, a site like BrickLink describes more than 40,000 different parts, so it would be a bit different from the cucumber example, but I am wondering whether it sounds suitable. There is no easy way to get hundreds of pictures for each part, but does the following process sound feasible:
take pictures of a part;
try to identify the part using TensorFlow;
if it does not identify the correct part, take more pictures and feed them to the neural network;
go on to the next part.
That way, each time we encounter a new piece we "teach" the network its reference so that it can be better recognized the next time. After hundreds of iterations monitored by a human, could we imagine TensorFlow being able to recognize the parts? At least the most common ones?
My question might sound stupid, but I am not into neural networks, so any advice is welcome. At the moment I have not found any way to identify a Lego part based on pictures, and this "cucumber example" sounds promising, so I am looking for some feedback.
Thanks.
You can read about the work of Jacques Mattheij; he actually uses a customized version of Xception1 running on https://keras.io/.
The introduction is Sorting 2 Metric Tons of Lego.
In Sorting 2 Tons of Lego, The software Side, you can read:
The hard challenge to deal with next was to get a training set large
enough to make working with 1000+ classes possible. At first this
seemed like an insurmountable problem. I could not figure out how to
make enough images and to label them by hand in acceptable time, even
the most optimistic calculations had me working for 6 months or longer
full-time in order to make a data set that would allow the machine to
work with many classes of parts rather than just a couple.
In the end the solution was staring me in the face for at least a week
before I finally clued in: it doesn’t matter. All that matters is that
the machine labels its own images most of the time and then all I need
to do is correct its mistakes. As it gets better there will be fewer
mistakes. This very rapidly expanded the number of training images.
The first day I managed to hand-label about 500 parts. The next day
the machine added 2000 more, with about half of those labeled wrong.
The resulting 2500 parts where the basis for the next round of
training 3 days later, which resulted in 4000 more parts, 90% of which
were labeled right! So I only had to correct some 400 parts, rinse,
repeat… So, by the end of two weeks there was a dataset of 20K images,
all labeled correctly.
This is far from enough, some classes are severely under-represented
so I need to increase the number of images for those, perhaps I’ll
just run a single batch consisting of nothing but those parts through
the machine. No need for corrections, they’ll all be labeled
identically.
A recent update is Sorting 2 Tons of Lego, Many Questions, Results.
1 CHOLLET, François. Xception: Deep Learning with Depthwise Separable Convolutions. arXiv preprint arXiv:1610.02357, 2016.
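Not Mattheij's actual code, but a minimal Keras sketch of the same idea, assuming an ImageNet-pretrained Xception base with a new classification head on top (the parts/ directory layout, image size, and class count are illustrative):

    import tensorflow as tf

    NUM_CLASSES = 1000  # illustrative: one class per Lego part to recognize

    # One sub-directory of images per part, e.g. parts/3001/, parts/3002/, ...
    train_ds = tf.keras.utils.image_dataset_from_directory(
        "parts", image_size=(299, 299), batch_size=32)

    base = tf.keras.applications.Xception(
        weights="imagenet", include_top=False, input_shape=(299, 299, 3))
    base.trainable = False  # start by training only the new head

    inputs = tf.keras.Input(shape=(299, 299, 3))
    x = tf.keras.applications.xception.preprocess_input(inputs)
    x = base(x, training=False)
    x = tf.keras.layers.GlobalAveragePooling2D()(x)
    outputs = tf.keras.layers.Dense(NUM_CLASSES, activation="softmax")(x)
    model = tf.keras.Model(inputs, outputs)

    model.compile(optimizer="adam",
                  loss="sparse_categorical_crossentropy",
                  metrics=["accuracy"])
    model.fit(train_ds, epochs=5)

The human-in-the-loop part he describes then amounts to running the trained model on new photos, correcting only the wrong labels, and retraining on the grown dataset.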
I have started this using IBM Watson's Visual Recognition.
I had six different bricks to be recognized against the conveyor belt background.
I am now actually considering TensorFlow, since I can have it running locally.
The codelab TensorFlow for Poets describes almost exactly what you want to achieve.
For a demo of the Watson version:
https://www.ibm.com/developerworks/community/blogs/ibmandgoogle/entry/Lego_bricks_recognition_with_Watosn_lego_and_raspberry_pi?lang=en

region monitoring accuracy in iOS 5/6, late 2012

I am trying to use region monitoring in my app, but I need accuracy fine enough to tell which building in a given city area the user is in.
Reading through other articles here on region monitoring, I have found a bunch of conflicting claims about the accuracy of the system. Now, at the end of 2012, what is the accuracy like?
From my own testing it seems to be checking me into locations that are a few dozen meters away from where I am, which is not precise enough for my needs. I need to know if this is an issue with region monitoring or just my implementation.
Thanks, and I hope this question isn't too much of a repetition of other ones, but the dates of those questions and responses make it hard to tell what the current answer is.