I have a Ruby on Rails application with following versions:
Ruby: 1.9.3-p547
Rails: 3.2.16
I am facing some performance issues in this application. My initial diagnostic efforts led me to the RackTimer gem (https://github.com/joelhegg/rack_timer), mentioned in the article at http://www.codinginthecrease.com/news_article/show/137153, to record middleware timestamps.
Using it, I found that Rack::Lock consumes a lot of time. For example, here
are some of the RackTimer entries from the logs linked below:
Rack Timer (incoming) -- Rack::Lock: 73.77910614013672 ms
Rack Timer (incoming) -- Rack::Lock: 67.05522537231445 ms
Rack Timer (incoming) -- Rack::Lock: 87.3713493347168 ms
Rack Timer (incoming) -- Rack::Lock: 59.815168380737305 ms
Rack Timer (incoming) -- Rack::Lock: 55.583953857421875 ms
Rack Timer (incoming) -- Rack::Lock: 111.56821250915527 ms
Rack Timer (incoming) -- Rack::Lock: 119.28486824035645 ms
Rack Timer (incoming) -- Rack::Lock: 69.2741870880127 ms
Rack Timer (incoming) -- Rack::Lock: 75.4690170288086 ms
Rack Timer (incoming) -- Rack::Lock: 86.68923377990723 ms
Rack Timer (incoming) -- Rack::Lock: 113.18349838256836 ms
Rack Timer (incoming) -- Rack::Lock: 116.78934097290039 ms
Rack Timer (incoming) -- Rack::Lock: 118.49355697631836 ms
Rack Timer (incoming) -- Rack::Lock: 132.1699619293213 ms
As can be seen, the Rack::Lock middleware's processing time fluctuates from around 10 ms to more than 130 ms. The majority of these occur while serving assets on my home page.
BTW, I have the asset pipeline enabled in my config/application.rb:
# Enable the asset pipeline
config.assets.enabled = true
My production application is monitored by New Relic. There, too, the charts highlight Rack::Lock as taking the highest percentage of request time.
I am totally blank on what is making Rack::Lock take so many milliseconds. I would appreciate any guidance from the community in figuring out what might be causing this and how to fix it.
Below you can find the Gemfile, the middlewares involved, and the development environment logs.
Gemfile:
https://gist.github.com/JigneshGohel-BoTreeConsulting/1b10977de58d09452e19
Middlewares Involved:
https://gist.github.com/JigneshGohel-BoTreeConsulting/91c004686de21bd6ebc1
Development Environment Logs:
----- FIRST TIME LOADED THE HOME INDEX PAGE
https://gist.github.com/JigneshGohel-BoTreeConsulting/990fab655f156a920131
----- SECOND TIME LOADED THE HOME INDEX PAGE WITHOUT RESTARTING THE SERVER
https://gist.github.com/JigneshGohel-BoTreeConsulting/f5233302c955e3b31e2f
Thanks,
Jignesh
I am posting my findings here, after my question above, in the hope that somebody else benefits from them when they end up in a situation like this.
I discussed the Rack::Lock issue with one of my senior associates, Juha Litola; below were his first thoughts (quoting his own words as-is):
Could it just be possible that you are seeing measuring artifact in sense that you are just seeing Rack::Lock as taking a lot of time, but that is just because it is wrapping the actual call? So the Rack::Lock time is cumulative time from everything that happens in the request processing. See
https://github.com/rack/rack/blob/master/lib/rack/lock.rb .
As for the performance issues, could you elaborate on what kind of problems you have so I could help?
I thought that could be a possibility. However, I could not convince myself of it because of the following doubt:
Rack::Lock is at the second position in the middleware chain of a Rails application (please refer to the middleware list I mentioned above at https://gist.github.com/JigneshGohel-BoTreeConsulting/91c004686de21bd6ebc1). Each middleware is processed in sequential order in the chain, so Rack::Lock would be the second one to process the request, and then the others in the chain would get their chance.
In such a case, as per my understanding, the timestamps recorded for the Rack::Lock middleware should not be the cumulative time of everything that happens in request processing; they should be the time taken by Rack::Lock itself.
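For reference, a Rack middleware receives the rest of the stack as its app and wraps it. Here is a minimal sketch of my own (not RackTimer's actual code); whether a timer attributes downstream time to a given middleware depends entirely on where its timestamps are taken:
# Minimal sketch of Rack middleware nesting (illustrative, not RackTimer's code).
# @app is the *rest* of the middleware chain plus the application itself,
# so anything measured around @app.call includes all downstream work.
class TimingMiddleware
  def initialize(app)
    @app = app
  end

  def call(env)
    started = Time.now
    response = @app.call(env)  # every middleware below, and the app, run inside this call
    elapsed_ms = (Time.now - started) * 1000.0
    puts "downstream took #{elapsed_ms.round(2)} ms"
    response
  end
end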
Later, after spending a few minutes looking at the server config (see the note below), Juha provided the following input:
With a quick look I think that there is a quite clear problem in how the application has been configured.
The application doesn't have config.threadsafe! enabled, which means that Rack::Lock is enabled and request processing is limited to one thread per process. Puma is configured with only one process, but 16-32 threads. What this means in effect is that Puma is processing only one request at any given moment.
The best solution would of course be to enable thread-safe mode, but that will require thorough testing.
If that fails or is not an option, Puma should be configured with multiple workers with 1 thread each.
Note: I forgot to include details about the configuration of the web server my application is deployed on. We are using the Puma web server (https://github.com/puma/puma).
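For what it's worth, Juha's fallback suggestion would look roughly like this in a Puma config file (the worker count of 4 is just a placeholder; tune it to your CPU and memory):
# config/puma.rb: sketch of "multiple workers, 1 thread each"
# (worker count is a placeholder; tune to your hardware)
workers 4
threads 1, 1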
With that, I got a hint to dig deeper into config.threadsafe!. Searching the web, I landed on the following articles,
http://www.sitepoint.com/config-threadsafe/
http://tenderlovemaking.com/2012/06/18/removing-config-threadsafe.html
which shed great insight on how enabling or disabling config.threadsafe! impacts the performance of an application deployed on a multi-threaded or multi-process web server in production.
A Brief Summary Of What The Above Articles Conveyed
What is Rack::Lock?
Rack::Lock is a middleware inserted into the Rails middleware stack to protect our applications from the multi-threaded bogeyman. It is supposed to protect us from nasty race conditions and deadlocks by wrapping our requests with a mutex: it locks the mutex at the beginning of the request and unlocks it when the request finishes.
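In essence it behaves like this simplified sketch (the real implementation, linked earlier, also takes care of unlocking when the response body is closed):
# Simplified sketch of Rack::Lock; see lib/rack/lock.rb in the rack repo
# for the real implementation.
class Lock
  def initialize(app)
    @app = app
    @mutex = Mutex.new
  end

  def call(env)
    @mutex.synchronize { @app.call(env) }  # one request at a time, period
  end
end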
Let's assume there is a program sending 5 simultaneous requests, 100 times over, to an application whose code (say, a controller) is NOT thread-safe (see the sketch below).
Now let's observe the impact of each combination of the Rack::Lock middleware, the config.threadsafe! option (enabled or disabled), thread-unsafe application code, and a multi-threaded or multi-process web server, after the program finishes or is killed.
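A load generator along those lines might look like this (my own sketch, not the script from the article; the URL is a placeholder):
# Sketch of the test setup described above: 100 rounds of 5 simultaneous
# requests (500 total). The URL is a placeholder.
require 'net/http'

uri = URI('http://localhost:3000/counter')
100.times do
  threads = 5.times.map do
    Thread.new { Net::HTTP.get_response(uri) }
  end
  threads.each(&:join)
end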
Multi-Threaded Web Server (Puma)
# Combination 1:
config.threadsafe! option : Disabled
Rack::Lock middleware : Available in the app's middleware stack because config.threadsafe! is disabled
With this combination the web server successfully serves all 500 requests received.
This is because each request is wrapped by Rack::Lock so that it executes synchronously. In other words,
Rack::Lock ensures we have only one concurrent request at a time, so each of the 500 requests gets a chance
to execute.
# Combination 2:
config.threadsafe! option : Enabled
Rack::Lock middleware : Unavailable in the app's middleware stack because config.threadsafe! is enabled
With this combination the web server manages to serve only 200 out of the 500 requests received.
This is because Rack::Lock, which would otherwise ensure only one concurrent request at a time
(and thereby give each request its chance), is absent: the thread-unsafe code misbehaves under
concurrent execution, so not every request completes.
However, each combination mentioned above has advantages as well as disadvantages:
# Combination 1
Advantage:
Each request received gets a chance to be processed
Disadvantage:
* Processing all 500 requests took 1 min 46 secs (compare with the runtime of Combination 2)
* Using a multi-threaded web server is pointless if Rack::Lock remains in the middleware stack
# Combination 2
Advantage:
Processing the 200 requests took 24 secs (compare with the runtime of Combination 1).
The reason is that the multi-threaded nature of the web server is actually leveraged here to serve incoming requests concurrently.
Disadvantage:
* Not all 500 requests got a chance to be processed
Note: Examples and Runtime statistics have been quoted from http://tenderlovemaking.com/2012/06/18/removing-config-threadsafe.html
Multi-Process Web Server (Unicorn)
# Combination 1:
config.threadsafe! option : Disabled
Rack::Lock middleware : Available in the app's middleware stack because config.threadsafe! is disabled
Since the web server forks multiple processes, each listening for requests, and the
Rack::Lock middleware is available, the web server successfully serves all 500 requests received.
# Combination 2:
config.threadsafe! option : Enabled
Rack::Lock middleware : Unavailable in the app's middleware stack because config.threadsafe! is enabled
Here, too, the web server forks multiple processes, each listening for requests.
However, the Rack::Lock middleware is unavailable, which enables multi-threading, which in turn means we could
get race conditions in the thread-unsafe code of the application. Yet, strangely, with this combination
too the web server successfully serves all 500 requests received.
The reason is that a process-based web server forks worker processes, each holding one instance of
our application, and each worker handles one request at a time, so the thread-unsafe code is never actually run concurrently within a process.
Conclusion:
In a multi-process environment, Rack::Lock is redundant if we keep config.threadsafe! disabled,
because in a multi-process environment the socket is our lock and we don't need any additional locking.
Thus it is beneficial to enable config.threadsafe! and remove the Rack::Lock overhead in production.
In a multi-threaded environment, if we enable config.threadsafe!, developers need to ensure the application's code is thread-safe.
The advantage of enabling config.threadsafe! is that less runtime is needed to process the incoming requests.
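For Rails 3.x this boils down to one line in the environment config (only after verifying the app's code really is thread-safe):
# config/environments/production.rb (Rails 3.x)
# Drops Rack::Lock from the middleware stack; the app code must be thread-safe.
config.threadsafe!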
In my application's context, we tweaked the Puma server's config by increasing the number of workers, along the lines of the config sketch shown earlier. I hope the performance improves.
Related
I have an on-premises instance of APIConnect.
Analyzing the logs, I have seen the task called "security-appID" moving from 10ms execution time to 200ms execution time.
What is the meaning of this task?
This task, I believe, offloads application security requests to other integrations if you have it so configured. It does not necessarily have anything to do with APIConnect; it is probably related to your Bluemix ID, dashboard, or landing page and how that is set up. You can probably find more information in the Bluemix docs: https://console.dys0.bluemix.net/docs/services/appid/existing.html#adding-app-id-to-an-existing-app
I have a simple twisted application which I run using a systemd service, executing a script, which subsequently executes a .tac file.
The application is structured as a JSON-RPC endpoint (fastjsonrpc) built into a t.w.r.Resource, which is in a t.w.s.Site, served by a t.a.i.TCPServer, with the whole thing packed into a t.a.Application. This works fine.
Where I do run into trouble is when I try to warm up caches at startup. This warm-up process is pretty slow (~300 seconds), and makes systemd timeout and kill the process. Increasing the timeout is not really a viable option, since I wouldn't want this to block system boot.
Analogous code is used in a separate stack running on Flask from within Apache and wsgi. That server starts itself off and lets systemd go on while it takes its time building the caches. This behaviour is fine for me.
I've tried calling the warmup function using the following within the setup function of the t.w.r.Resource:
reactor.callLater(1, ep.warmup, None)
I've not yet tried using this from within systemd, and have been testing it from twistd directly on the command line. The server does work as expected; however, it no longer responds to SIGINT (^C). Removing the callLater is all that's needed to let the server respond to SIGINT again.
If the warmup function is called directly (not by callLater, i.e., the arrangement which makes systemd give up while waiting for warm up to complete), the resulting server also continues to respond to SIGINT.
Is there a better / good way to handle this sort of long-running warmup code?
Why would twistd / the reactor not respond to SIGINT? Am I missing something here?
Twisted is single-threaded. It sounds like your cache-warmup code is blocking the reactor for those 300 seconds. One easy way to fix this would be to use deferToThread to let it run without blocking the reactor.
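A minimal sketch of that idea (my own illustration; warm_up and its body are placeholders, not your actual code):
# Sketch: run slow warm-up work off the reactor thread via deferToThread.
import time

from twisted.internet import reactor
from twisted.internet.threads import deferToThread

def warm_up():
    # Placeholder for the ~300 s cache-warming work; runs in a thread
    # from the reactor's thread pool, so the reactor stays responsive.
    time.sleep(300)

def warmup_done(result):
    print("cache warm-up finished")

d = deferToThread(warm_up)   # returns a Deferred immediately
d.addCallback(warmup_done)
Because the work happens off the main thread, the reactor keeps handling requests and signals while the caches build; just make sure whatever the warm-up touches is safe to use from a thread, or hand results back via the callback.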
Our users are restless. They keep complaining about woolly, unmeasurable stuff, particularly slowness, without giving specifics, which of course makes it very difficult to track down.
Nonetheless, it is quite possible that they are right, that there are server calls that are taking way too long to come back. So I want to put some kind of sniffer on the web site (we're using ASP.NET MVC 4 on IIS7) that will log any call that takes more than n seconds to turn around, or that returns more than x megabytes of data, along with all request parameters, the response size, and maybe a certain amount of response data.
I haven't a clue how to do this, though. Any suggestions?
here is my take on this:
FRT
While you can use Failed Request Tracing to log slow requests, in my experience it's more useful for finding out why a request fails before it hits your application than why it's running slowly. 9 times out of 10 it will simply show you that the slowdown is somewhere in your code.
Log Parser
Yes, you can download and analyze IIS logs. I use Log Parser Lizard to do the analysis; it's a great GUI over Log Parser. Here's a sample of how you might query for requests slower than 1000 ms:
SELECT
To_String(To_timestamp(date, time), 'dd/MM/yyyy hh:mm:ss') As Time,
cs-uri-stem, cs-uri-query, cs-method, time-taken, cs-bytes, sc-status
FROM
'C:\inetpub\logs\LogFiles\W3SVC1\u_ex140721.log'
WHERE
time-taken > 1000
ORDER BY time-taken desc
New Relic
My recommendation: go easy on yourself and sign up for a free trial. No, I don't work for them, but I've used their APM product a lot. Install the agent on the server and set it up. In 10 minutes you will be amazed at the data you see about the site. Trust me.
It's designed to work in production environments and gives you amazing depth of information on what's running slowly, down to the database query and stack traces. It's pure awesome. Once it's set up, wait for the next user complaint, log in, and look at traces for that time frame.
When your pro trial ends, you can still get valuable data on the free tier, but it will only keep the last 24 hours. We purchased licenses. Expensive, yes, but worth every cent. Why? The time taken to identify root causes was reduced by an order of magnitude, we can get proactive by looking at what is number 2, 3, and 4 on the slow-requests list and working on those before they become big problems, and the alerting makes us much more responsive when things go wrong.
Code it
You could roll your own. This blog uses MVC ActionFilters to do the logging. You could also use an HttpModule, similar to this post. The nice thing about that approach is that you can compile and implement the module separately from your application, then just drop in the DLL and update web.config to wire up the module. I would be wary of these approaches on a very busy site. Also, getting the right level of detail to fully identify the root cause is challenging.
View Requests
As touched on by Appleman1234, IIS has a little-known feature for looking at currently executing requests. It's handy for the 'hey, it's running slow right now' situation. You can use appcmd.exe or the IIS GUI to do it. You will need to install the 'Request Monitor' IIS feature for this to work. This approach is OK for rudimentary narrowing down of the problem, but it does not show you what's running slowly inside your controller.
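For example, with Request Monitor installed, something along these lines lists requests that have been executing longer than a given number of milliseconds (check appcmd's built-in help for the exact syntax on your version):
%windir%\system32\inetsrv\appcmd list requests /elapsed:3000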
There are various ways you can do this:
Failed Request Tracing (FRT), formerly known as Failed Request Event Buffering (FREB), with a custom failure condition of taking over a certain time to load/run
Logging request information with IIS logging functionality and then using a tool like LogParserStudio
Using tools like Fiddler or IISMonitor on the IIS server to capture request information
For FRT, the official documentation is available here, and information on how to capture dumps for long-running processes is available here.
For logging request information in IIS, information about log file analysis is located here.
For information on configuring Fiddler to capture IIS requests, see here.
A summary of the steps in the linked resources is provided below.
For FRT
From IIS Manager, for a given site, in the Actions pane under Configure, click Failed Request Tracing and enter the desired values in the dialog box to enable Failed Request Tracing.
From IIS Manager, for a given site, under IIS, click Failed Request Tracing Rules to define the failure rules for a given request. In the Actions pane, click Add and follow the wizard.
The logs go into the directory you specify and are viewable in a web browser.
For IIS logging
Logging is enabled by default on IIS
From IIS Manager, for a given site, under IIS, click Logging, and in the Actions pane click Enable to enable logging if it isn't already.
From IIS Manager, for a given site, under IIS, click Logging, then configure as desired and click Apply.
Install Log Parser, .NET 4.x, and LogParserStudio (if you need additional steps, see here).
Open LogParserStudio and add logs to it; you can then use SQL queries to get information from the log files.
For Fiddler
You need to change the user that IIS runs as to a user that can launch applications, like Fiddler (instead of Network Service), and then launch Fiddler with that user.
Also see Monitor Activity on a Web Server (IIS 7) for further information.
I'm developing a Rails 3.2.16 app and deploying to a Heroku dev account with one free web dyno and no worker dynos. I'm trying to determine if a (paid) worker dyno is really needed.
The app sends various emails. I use delayed_job_active_record to queue those and send them out.
I also need to check a notification count every minute. For that I'm using rufus-scheduler.
rufus-scheduler seems able to run a background task/thread within a Heroku web dyno.
On the other hand, everything I can find on delayed_job indicates that it requires a separate worker process. Why? If rufus-scheduler can run a daemon within a web dyno, why can't delayed_job do the same?
I've tested the following for running my every-minute task and working off delayed_jobs, and it seems to work within the single Heroku web dyno:
config/initializers/rufus-scheduler.rb
require 'rufus-scheduler'
require 'delayed/command'
s = Rufus::Scheduler.singleton
s.every '1m', :overlap => false do # Every minute
Rails.logger.info ">> #{Time.now}: rufus-scheduler task started"
# Check for pending notifications and queue to delayed_job
User.send_pending_notifications
# work off delayed_jobs without a separate worker process
Delayed::Worker.new.work_off
end
This seems so obvious that I'm wondering if I'm missing something. Is this an acceptable way to handle the delayed_job queue without the added complexity and expense of a separate worker process?
Update
As #jmettraux points out, Heroku will idle an inactive web dyno after an hour. I haven't set it up yet, but let's assume I'm using one of the various keep-alive methods to keep it from sleeping: Easy way to prevent Heroku idling?.
According to this
https://blog.heroku.com/archives/2013/6/20/app_sleeping_on_heroku
your dyno will go to sleep if it hasn't serviced requests for an hour. No dyno, no scheduling.
This could help as well: https://devcenter.heroku.com/articles/clock-processes-ruby
If I have the following action in a controller
def give_a
print a
a = a+1
end
What happens in each web server when a request comes in, and when multiple requests are received?
I know that WEBrick and Thin are single-threaded, so I guess that means a request doesn't get processed until the current request is done.
What happens in concurrent web servers such as Puma or Unicorn (and perhaps others)?
If 2 requests come in and 2 Unicorn threads handle them, would both responses give the same a value (in a situation where both requests enter the method at the same time)?
Or does it all depend on what happens on the server itself, with access to the data being serialized?
Is there a way to have a mutex/semaphore for the concurrent webservers?
AFAIK, the Rails application makes a YourController.new for each request env.
From what you post, it is not possible to tell what a means. If it is some shared class variable, then it is mutable state and could be modified from both request threads.
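If you really do need shared mutable state across requests in a threaded server, a plain Ruby Mutex works. This is a generic sketch; GiveAController and @@a are hypothetical names for illustration, not taken from the question:
# Generic sketch: protecting shared mutable state with a Mutex.
# GiveAController and @@a are made-up names for illustration.
class GiveAController < ApplicationController
  @@a = 0              # shared, mutable class variable
  @@lock = Mutex.new   # one lock shared by all threads in this process

  def give_a
    value = @@lock.synchronize { @@a += 1 }  # read-modify-write is now atomic
    render text: value.to_s
  end
end
Note that a Mutex only serializes threads within a single process; with a multi-process server such as Unicorn, each worker holds its own copy of @@a, so truly shared state needs an external store (database, Redis, etc.).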