I've installed FusionAuth (awesome product) into a Docker Swarm cluster using the official docker-compose.yml file and everything seems to work brilliantly.
EXCEPT
Periodically, when a user goes to login they will be presented with the above error stating that the search engine is not available. If they try again immediately then everything works correctly! I would, obviously, prefer that they never saw the error.
Elasticsearch is definitely running and is responding to API calls correctly, and I can see the fusionauth_user index is present and populated with docs.
I guess my question is two fold:
1) What role does the ElasticSearch engine play in the FusionAuth ecosystem and can it be disabled?
2) Is there a configurable timeout somewhere that is causing the error message and, if so, where can change it?
I've search the docs for answers to the above but I can't seem to find anything :-(
Thanks for the kind feedback.
1) What role does the ElasticSearch engine play in the FusionAuth ecosystem and can it be disabled?
Elasticsearch provides full text search of user data. Each time a user is created or updated the user is re-indexed. In this case during login, we are updating the search index with the last login instant.
This service is required and cannot be disabled. We have had clients request to make this service optional for embedded applications or small scale scenarios where Elasticsearch may not be required. While this is not currently in plan, it is possible we may revisit this option in the future.
2) Is there a configurable timeout somewhere that is causing the error message and, if so, where can change it?
Not currently.
Full disclosure, I am not a Docker or Docker Swarm expert at all - perhaps there are some nuances to Swarm and response time due to spin up and spin down of resources?
Do you see any exceptions in the log when a user sees this error on the login?
I'm using a open-source chef server managing about 150 nodes.
Analytics/Reporting module is not activated in the chef server due to resource constraints.
"chef-client" is running on all the nodes every 30 minutes
How can I find, how much time each chef-client run is taking to complete?
I'm trying to find the nodes that are slowest in completing their chef-client runs
Chef Server doesn't store this information. You'll need to manage it yourself, possibly using a handler as linked above in the comments. A simple option would be to make a handler which stores the duration of the last run as a node attribute, but the sky is the limit. If you want something to help debug long runs once you find them, check out my poise-profiler cookbook.
Our users are restless. They keep complaining about woolly, unmeasurable stuff, particularly slowness, without giving specifics, which of course makes it very difficult to track down.
Nonetheless, it is quite possible that they are right, that there are server calls that are taking way too long to come back. So I want to put some kind of sniffer on the web site (we're using ASP.NET MVC 4 on IIS7) that will log any call that takes more than n seconds to turn around, or that returns more than x megabytes of data, along with all request parameters, the response size, and maybe a certain amount of response data.
I haven't a clue how to do this, though. Any suggestions?
here is my take on this:
FRT
While you can use failed request tracing to log slow requests, in my experience is more useful for finding out why a request fails before it hits your application, rather than why its running slowly. 9/10 times its going to simply show you that the slowdown is in your code somewhere.
Log Parser
Yes you can download and analyze iis logs. I use Log Parser Lizard to do the analysis - its a great gui over log parser. Here's a sample of how you might query slow requests over 1000ms:
SELECT
To_String(To_timestamp(date, time), 'dd/MM/yyyy hh:mm:ss') As Time,
cs-uri-stem, cs-uri-query, cs-method, time-taken, cs-bytes, sc-status
FROM
'C:\inetpub\logs\LogFiles\W3SVC1\u_ex140721.log'
WHERE
time-taken > 1000
ORDER BY time-taken desc
New Relic
My recommendation - go easy on yourself and sign up for a free trial. No I don't work for them, but I've used their APM product a lot. Install the agent on the server - set it up. In 10 mins you will be amazed at the data you see about the site. Trust me.
Its designed to work in production environments and gives you amazing depth of info on what's running slow, down to the database query and stack traces. Its pure awesome. Once its setup wait for the next user complaint, log in and look at traces for the time frame.
When your pro trial ends, you can still get valuable data on the free tier, but it will only keep last 24 hours. We purchased licenses -expensive yes, but worth every cent. Why? Time taken to identify root causes was reduced by an order of magnitude, we can get proactive by looking at what is number 2, 3 and 4 on the slow requests list and working those before they become big problems, and finally the alerting makes us much more responsive when things were going wrong.
Code it
You could roll you own. This blog uses Mvc ActionFilters to do the logging. You could also use an HttpModule similar to this post. The nice thing about this approach is you can compile and implement the module separately from your application, and then just drop in the dll and update web.config to wire up the module. I would be wary of these approaches for a very busy site. Also, getting the right level of detail to fully identify the root is challenging.
View Requests
As touched on by Appleman1234, IIS has a little known feature to look at requests currently executing. Its handy for the 'hey its running slow right now' situation. You can use appcmd.exe or the IIS gui to do it. You will need to install the 'Request Monitor' IIS feature for this to work. This approach is ok for rudimentary narrowing of the problem, but does not show you whats running slowly in your controller.
There are various ways you can do this:
Failed Requests Tracing(FRT) – formerly known as Failed Request Event Buffering (FREB) with custom failure condition of takes over a certain time to load / run
Logging request information with IIS logging functionality and then using a tool like LogParserStudio
Using tools like Fiddler or IISMonitor on the IIS server to capture request information
For FRT the official documentation is available here and information how to capture dumps for long running process is avaliable here
For logging request information in IIS information about log file analysis is located here
For information on configuring Fiddler to capture IIS requests find information here
A summary of the steps in the linked resources is provided below.
For FRT
From IIS Manager for a given site,In the Actions pane, under Configure, click Failed Request Tracing and enter desired values in dialog box to enable Failed Request Tracing.
From IIS Manager for a given site, under IIS click Failed Request Tracing Rules, in order to define rules of failure for a given request. In the Actions pane, click Add and follow the wizard.
The logs will go in the directory you specify and are viewable in a web broswer.
For IIS logging
Logging is enabled by default on IIS
From IIS Manager for a given site,under IIS click Logging, and in the Actions Pane, click Enable to enable logging if it isn't already.
From IIS Manager for a given site,under IIS click Logging, and then configure as desired and click apply.
Install LogParser, .Net 4.x and LogParserStudio (if you need additional steps see here
Open LogParserStudio and add logs to it, you then can use SQL queries to get information from the log files.
For Fiddler
You need to change the user that IIS runs as to a user that can launch applications, like Fiddler (instead of Network Service), and then launch Fiddler with that user.
Also see Monitor Activity on a Web Server (IIS 7) for further information.
I was looking a way to list chef-client where chef is failing (Only failing NOT stopped).
I checked knife status document, but didn't find anything usefull.
anyone know better way to do this ?
Thanks
The knife lastrun plugin provides a useful chef handler that will record the time and status of most recent chef run. It will also usefully store the stack trace of any failed chef run. This is useful plugin when running large numbers of chef clients.
I'm experimenting with the Distributed Shell example in YARN 2.2 and am hoping that someone can clarify what the difference between a managed and and an un-managed application manager is?
For example the following lines appear in the client code
// unmanaged AM
appContext.setUnmanagedAM(true);
but I am unable to find documentation explaining the difference this line makes to the execution behaviour.
Many thanks.
The setUnmanagedAM(true) is used for debugging purposes i.e. it runs an application manager in local mode and does not submit it to a cluster so it is easier to step into code and debug.
You can see it in use in the hadoop-yarn-applications-unmanaged-am-launcher.jar that ships with yarn
Check the respective JIRA tickets: JIRA-420 and JIRA-419 (client side)
Currently, the RM itself manages the AM by allocating a container for it and negotiating the launch on the NodeManager and manages the AM lifecycle. Thereafter, the AM negotiates resources with the RM and launches tasks to do the real work.
It would be a useful improvement to enhance this model by allowing the AM to be launched independently by the client without requiring the RM. These AM's would be launched on a gateway machine that can talk to the cluster. This would open up new use cases such as the following
1) Easy debugging of AM, specially during initial development. Having the AM launched on an arbitrary cluster node makes it hard to looks at logs or attach a debugger to the AM. If it can be launched locally then these tasks would be easier.
2) Running AM's that need special privileges that may not be available on machines managed by the NodeManager
Blog post with more implementation details on unmanaged AM: click-me
Example of how Impala manages its resources with the help of unmanaged applications: Llama