Our development team is scratching our heads over why Google App Engine latency goes through the roof from time to time, with very little predictability or warning. When latency jumps like this, we start to see timeouts between our app and the database. CPU utilization is pretty flat across the instances at the time, which makes this even harder to understand.
We are using the Flex environment to host a .NET Core API. We like App Engine for its PaaS feel and its always-on behavior. We are thinking about testing Cloud Run as an alternative since we can't figure this out.
Any suggestions on where to look or how we could troubleshoot this?
Here's the latest spike in latency from last night. Plenty of Cloud SQL db timeouts and other exceptions happening here due to this spike as well.
There are a couple of possible reasons for this. This page lists the possible causes and the debugging steps. Mainly, check your monitoring for memory usage; there might be a memory leak. Also check your autoscaler configuration.
We suspect we're experiencing thread pool starvation on a server that is running a couple of ASP.NET Core APIs and a couple of .NET Core console apps.
I ran PerfView on one of the servers where we suspect problems with thread pool starvation. However, I'm having a bit of trouble analyzing the results.
I ran PerfView /threadTime collect for about 60 seconds, and this is the result I got (I chose to look at one of our ASP.NET Core APIs):
Looking at "By Name" we can see that a lot of time is spent in BLOCKED_TIME. If I double-click it, I'm taken to a view where I can expand one of the nodes to get the following (the redacted part is the name of our API process):
What does that tell me? Shouldn't I be able to see what exactly is blocking? And does it look like the problem is that a lot of threads are blocking, each for a small amount of time?
Are there any other conclusions we can draw from this?
BLOCKED_TIME generally means a period when the thread wasn't doing anything at all. This could be time spent in I/O, where network or other latency is involved, or time spent waiting on locks, such as semaphores. In short, this doesn't necessarily tell you anything, as there are perfectly standard and reasonable reasons for a thread to be idle. However, a significant amount of time spent blocked can be an indication of an underlying problem. Perhaps you have too much network latency. Perhaps you're trying to do too much file system work on a slow drive. It may or may not indicate a problem, and even if it does, it doesn't really tell you what the problem is.
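To make that concrete, here is a minimal, made-up C# sketch (the file name and URL are just placeholders, not anything from your app). Both waits below park the calling thread, and both show up as BLOCKED_TIME in a /threadTime trace, even though only the second one hints at a design problem in a web app:

    using System.IO;
    using System.Net.Http;
    using System.Threading;

    static class BlockedTimeExamples
    {
        static readonly SemaphoreSlim Gate = new SemaphoreSlim(1, 1);
        static readonly HttpClient Http = new HttpClient();

        // Waiting on a lock and then on disk I/O: both are "idle" to the
        // profiler, and both can be perfectly reasonable.
        static string WaitOnLockAndDisk()
        {
            Gate.Wait();                                    // blocked until the semaphore is free
            try { return File.ReadAllText("config.json"); } // blocked on file I/O
            finally { Gate.Release(); }
        }

        // Sync-over-async: the thread sits parked for the whole network round trip.
        // This is the kind of blocked time that eats a thread pool in a web app.
        static string WaitOnNetwork() =>
            Http.GetStringAsync("https://example.com/").Result;
    }

The stacks under BLOCKED_TIME will show you where the threads were parked (Wait, ReadAllText, Result), but not whether that was legitimate; that judgement is yours.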
In general, if you're experiencing thread starvation, the first thing you should look at is thread pool utilization. Are you using async everywhere you can? Are you doing things that are big no-nos in web apps, such as using Task.Run, Task.Factory.StartNew or, worse, Thread.Start? Work queued with Task.Run or Task.Factory.StartNew runs on the same thread pool that services requests, and threads you spin up yourself compete for the same CPU, so all of it proportionally reduces your server's throughput.
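For illustration only (this controller and IReportService are invented, not taken from your code), the first action below is the no-no pattern, and the second is the async equivalent that hands the thread back to the pool while it waits:

    using System.Threading.Tasks;
    using Microsoft.AspNetCore.Mvc;

    [ApiController]
    [Route("api/reports")]
    public class ReportsController : ControllerBase
    {
        private readonly IReportService _reports;   // hypothetical service

        public ReportsController(IReportService reports) => _reports = reports;

        // Anti-pattern: blocks a pool thread on .Result and burns a second
        // pool thread with Task.Run for work that is already asynchronous.
        [HttpGet("blocking/{id}")]
        public IActionResult GetBlocking(int id)
        {
            var report = Task.Run(() => _reports.BuildAsync(id)).Result;
            return Ok(report);
        }

        // Async end-to-end: the thread is returned to the pool while the
        // report is being built, so it can service other requests.
        [HttpGet("{id}")]
        public async Task<IActionResult> Get(int id)
        {
            var report = await _reports.BuildAsync(id);
            return Ok(report);
        }
    }

    public interface IReportService
    {
        Task<string> BuildAsync(int id);
    }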
There's an all-too-common pattern of attempting to schedule long-running jobs by shuffling them off to new threads. That's death to a web application. The threads in the pool are there to service requests, not long-running jobs; requests should be handled quickly and efficiently so that the thread can be returned to the pool in short order to field other requests. If you need to do background work, you need to truly background it, by offloading it to another process or even a different machine entirely.
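If a separate process or machine isn't on the table yet, a common in-between step in ASP.NET Core is a queued hosted service, so the work at least never ties up a request thread. This is only a rough sketch with invented names (JobQueue, JobWorker, the placeholder delay), and the machine is still shared, so the advice above about truly offloading heavy work stands:

    using System;
    using System.Collections.Generic;
    using System.Threading;
    using System.Threading.Channels;
    using System.Threading.Tasks;
    using Microsoft.Extensions.Hosting;

    // Request handlers enqueue a job id and return immediately; the worker
    // drains the queue outside the request path, on its own schedule.
    public sealed class JobQueue
    {
        private readonly Channel<int> _jobs = Channel.CreateUnbounded<int>();

        public ValueTask EnqueueAsync(int jobId) => _jobs.Writer.WriteAsync(jobId);

        public IAsyncEnumerable<int> DequeueAllAsync(CancellationToken ct) =>
            _jobs.Reader.ReadAllAsync(ct);
    }

    public sealed class JobWorker : BackgroundService
    {
        private readonly JobQueue _queue;

        public JobWorker(JobQueue queue) => _queue = queue;

        protected override async Task ExecuteAsync(CancellationToken stoppingToken)
        {
            await foreach (var jobId in _queue.DequeueAllAsync(stoppingToken))
            {
                // Placeholder for the actual long-running job.
                await Task.Delay(TimeSpan.FromSeconds(30), stoppingToken);
            }
        }
    }

You would register both with services.AddSingleton<JobQueue>() and services.AddHostedService<JobWorker>().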
Short of all that, maybe you're just getting more load than the server can handle in general. That's always a possibility. Perhaps you need to vertically scale your system resources (and the thread pool with it). Perhaps you need to horizontally scale by replicating this server with a load balancer in front. Given that you're running multiple different things on the same server, an easy way to horizontally scale is to simply divvy out these things to their own machines. That alone would probably help tremendously. However, scaling, either vertically or horizontally, should be your last resort. Make sure you're using resources efficiently first, before throwing more resources at your inefficient things.
I have a Rails 3.2 application running on a production server. The server has 8 GB of RAM, and every other process works fine. But there is a Ruby process that keeps memory utilization on the high side. I have to manually log in to the server console, run the top command, and kill the process using its PID.
But I am unable to figure out how to check which Ruby process is taking so much memory, or how to keep it under control permanently.
Please suggest a solution.
Thanks.
Could be so many things. Finding memory leaks is tough. What kind of application server are you using? If you're using Unicorn, consider checking out Puma. It's actually really easy to switch over, and we saw big gains in our app when we did.
Also look through your app for n+1 queries. Optimizing some queries here and there would help tremendously.
Another thing you could consider trying is moving some longer-running tasks to a background job with something like Sidekiq.
Lots of performance monitoring services out there, like New Relic, that you could check out as well. Without more info it's a tough question to answer.
I noticed the other day that all of the w3wp.exe processes running on my server seem to be consuming way more RAM than I would expect. So I created a test web application with a single .aspx page and a vanilla global.asax (default methods, no additional code). I then deployed that site to IIS 6 targeting .NET 4.0, built in release mode with debug set to "false". The site is also set up under its own application pool. I then used iisapp.vbs to figure out which w3wp.exe this test site was running under, and I was surprised to see that the single site with one page was using almost 40 MB of RAM.
40 MB of RAM seems like a lot for a single-page website. Is this normal, and if so, why? Is there anything I can do to reduce the memory footprint?
I also noticed that each time the default page was refreshed, w3wp.exe grew a little bit more. Is IIS 6 caching the same page over and over?
40 MB for a single AppDomain (irrespective of how many pages you have) is alright. Consider that .NET is a managed environment and the app pool contains a large majority of the logic and heap for serving requests. In many cases it will also be less eager to free up RAM if the RAM is not under contention (demanded by other parts of the system).
You can share an app pool amongst different websites, which spreads the "overhead" of the sandbox across sites and makes it a smaller share of whatever you consider acceptable. However, I don't think this is bad at all.
You're trying to optimize the case where you have no load and no clients. That really isn't sensible. Optimize for a realistic use case.
You don't build a factory and then try to get it to make one item as efficiently as possible. You try to get it to crank them out efficiently by the dozens.
This is not really an issue; it is quite normal. The .NET Framework is being loaded into the process, hence the size. When a real issue happens, you would use post-production debugging or profiling tools to get it fixed. Here are some articles on memory and post-production debugging in case you need them in the future:
http://msdn.microsoft.com/en-us/magazine/cc188781.aspx
http://aspalliance.com/1350_Post_Production_Debugging_for_ASPNET_Applications__Part_1.all
http://blogs.msdn.com/b/tess/archive/2006/09/06/net-memory-usage-a-restaurant-analogy.aspx
http://blogs.msdn.com/b/tess/archive/2008/09/12/asp-net-memory-issues-high-memory-usage-with-ajaxpro.aspx
Our network team is thinking of setting up a virtual desktop environment (via a Windows Server 2008 virtual host) for each developer.
So we are going to have dumb terminals/laptops and should be using the virtual desktops for all of our work.
Ours is a Microsoft shop and we work with all versions of the .NET Framework. Not having the development environments on the laptops is making the team uncomfortable.
Are there any potential problems with that kind of setup? Is there any reason to be worried about this setup?
Unless there's a very good development-oriented reason for doing this, I'd say don't.
Your developers are going to work best in an environment they want to work in. Unless your developers are the ones suggesting it and pushing for it, you shouldn't be instituting radical changes in their work environments without very good reasons.
I personally am not at all a fan of remote virtualized instances for development work, either. They're often slower, you have to deal with network issues and latency, and you often don't have as much control as you would on your own machine. The list goes on and on, and little things add up to create major annoyances.
What happens when the network goes down? Are your devs just supposed to sit on their hands? Or maybe they could bring cards and play real solitaire...
Seriously, though: unless you have virtually 100% network uptime and your devs never work off-site (say, from home), I'm on the "this is a Bad Idea" side.
One option is to get rid of your network team.
Seriously though, I have worked with this same type of setup through VMware and it wasn't much fun. The only reason I did it was that my boss thought it might be worth a try. Since I was newly hired, I didn't object. However, after several months of programming this way, I told him that I preferred to have my development studio on my machine, and he agreed.
First, the graphical interface isn't really sharp on a virtual workstation, since images are sent over the network rather than being rendered by your video card's driver. Staring at this constantly gave me a headache.
Secondly, any install of components or tools required the network administrator's help, which meant I had to hurry up and wait.
Third, your own computer will run one application faster than a shared server running many apps, and on top of that, the server has to send the rendered image over the network. It doesn't sound like it would slow you down, but it does. Again, hurry up and wait.
Fourth, this may be specific to VMware, but the virtual disk size was fixed at 4 GB, which my network guy seemed to think was enough. It filled up rather quickly. To expand the drive, I had to wait for the network admin to run Partition Magic on it, which screwed it up, and I had to have him rebuild my installation.
There are several more reasons, but I would strongly encourage you to protest if you can. Your company is probably trying to implement this because it's a new fad and it can be a way for them to save money. However, your productivity time will be wasted, and that needs to be counted as a cost.
Bad Idea. You're taking the most critical tool in your developers' arsenal and making it run much, much, much slower than it needs to, and introducing several critical dependencies along the way.
It's good if you ever have to develop on-site: you can move your dev environment to a laptop and hit the road.
I could see it being required for some highly confidential multiple-client work - it gives you proof that you didn't leak any test data or debug files from one customer to another.
Downsides:
Few VMs support multiple monitors - without multiple monitors you can't be a productive developer.
Only VirtualBox 3 gets close to being able to develop for OpenGL/ActiveX on a VM.
In my experience, virtual environments are ideal for test environments (for testing deployments), not development environments. They are great as a blank slate / clean sheet for testing. I think the risk of alienating your developers is high if you pursue this route. Developers should have the best tools at their disposal, i.e. a high-spec laptop or desktop; this keeps morale and productivity high.
Going down this route precludes any home working, which may or may not be an issue. Virtual environments are by their nature slower than dedicated environments, and you may also have issues with multiple-monitor setups on a VM.
If you go that route, make sure you benchmark the system aggressively before making any serious commitment.
My experience of remote desktops is that they're OK for occasional use, but seldom sufficient for the intensive computation and compilation typical of development work, especially at crunch time when everyone needs resources at the same time.
Not sure if this will affect you, but both VMware and Virtual PC run very slowly when viewed over Remote Desktop. For some reason Radmin (http://www.radmin.com/) does a much better job.
I regularly work with remote development environments and it is OK (although it takes some time to get used to keeping track of which system you're working in at any given moment ;)) - but most of the time I'm alone on the system.
I'm using NSOperation and NSOperationQueue to handle all of my networking threads so my interface can remain responsive while handling data transfer over the internet. Currently, I've got my operation queue set to a maximum concurrent operation count of 5, and it seems to work well.
I'm wondering, though, if there is a more ideal number of concurrent network operations that would best maximize the available resources without choking the hardware. Are there any recommendations, or steps I might take to measure and find out for myself?
Given the iPhone (currently) runs a single core, I would guess 5 is around the right number.
But the only way to be sure would be to instrument it and find out what the usage looks like (CPU, memory, and network). Network usage you could gauge from the data transferred - but it's hard to know what reasonable usage would be. I'm not sure whether it's possible to get CPU/memory statistics from the iPhone.
If you are doing large transfers, then more connections probably won't help much. If you are doing lots of small transfers, then more connections will help work around the back-and-forth of setting up and tearing down each connection.