Erlang/OTP, how to signal an application startup error without a crash - crash

So I have this application that has a process that requires some gen_servers to be alive somewhere else in the cluster.
If they are up it just works, if they are not, my gen_server fails in init with {error,Reason}, this propagates through my supervisor into my applications start function.
The problem is that if I return anything other than {ok,Pid} I get a crash report.
My intention here would be to somehow signal that the application couldn't start properly and that all the processes are down and because of that the application should not be considered active, however, I can only choose to return {ok, self()} and see my application listed as active when it is not, or return {error, Error} and see how it crashes with:
{gen_server,init_it,6},{proc_lib,init_p_do_apply,3}]}},{ancestors,[rtb_sup,<0.134.0>]},
{messages,[]},{links,[<0.135.0>]},{dictionary,[]},{trap_exit,false},{status,running},
{heap_size,377},{stack_size,24},{reductions,255}],[]]:\n"
The problem seems to be bigger than this, basically there is no way to tell to the application framework that the app failed. It may look like one of these things that are handled by let the process die in erlang, but allow for an {error, } return value on application:start seems like a good tradeoff.
Any hints?

Application will crash at any moment, so application's dependence relationship at the start time can not provide helpful dynamic crash information.
Before I have read part of rabbitmq project source code, it is also a cluster-based program.
I think rabbitmq has faced your similar question as you said, because cluster need collect related nodes's application "is live" information and memory water highmark information and then make decision.
It's solution is
to register the the first main process of the application in the node locally, the name is "rabbit" in the rabbitmq system, you can find it is rabbit.erl file, and in the function "start/2".
start(normal, []) ->
case erts_version_check() of
ok ->
{ok, SupPid} = rabbit_sup:start_link(),
true = register(rabbit, self()),
print_banner(),
[ok = run_boot_step(Step) || Step <- boot_steps()],
io:format("~nbroker running~n"),
{ok, SupPid};
Error ->
Error
end.
And the other 4 modules, rabbit_node_monitor.erl, rabbit_memory_monitor.erl,
vm_memory_monitor.erl, rabbit_alarm.erl to use two erlang technique, one is monitor process to get "DOWN" message of the registered process, the other is alarm handler to collect these information.

Related

Remote debug Idea doesn't works for openresty

I am using mobDebug. If run a lua script from command line everything works.
But when I run them from openresty the Idea doesn't stop. It only writes "Connected/ Disconnected"
Configs:
location / {
access_by_lua_block {
local client = require("client")
}
client.lua:
local mobdebug = require("mobdebug");
mobdebug.start()
local lfs = require("lfs")
print("Folder: "..lfs.currentdir())
modebug debug_hook is not invoked for needed lines, set_breakpoints don't invoked.
Idea Debug Logs, but nothing occures:
Idea catch debug from terminal client.lua; But it miss it from running nginx.
THIS IS NOT AN ANSWER. It's just that I am experiencing basically the same problem, and comment space is too small to fit all the relevant observations I would like to share:
I was actually able to stop immediately after mobdebug.start() in code running in nginx, and to step-debug - but only in code called directly from init_by_lua_block. This code of course executes once during the server startup or config reload.
I was never able to stop in worker code (e.g. rewrite_by_lua_*). mobdebug.coro() didn't help, and mobdebug.on() threw about "attempt to yield across C-call boundary"
I was only ever able to stop one time, on next statement after mobdebug.start(); once I hit |> (Resume program), it won't stop on any further breakpoints.
Using mobdebug.loop() is not a correct way to do this, as it's used for live coding, which is not going to work as expected with this setup. mobdebug.start() should be used instead.
Please see an example of how this debugging can be setup with ZeroBrane Studio here: http://notebook.kulchenko.com/zerobrane/debugging-openresty-nginx-lua-scripts-with-zerobrane-studio. All the details to how paths to mobdebug and required modules are configured should still be applicable to your environment.

Is it possible to write a plugin for Glimpse's existing SQL tab?

Is it possible to write a plugin for Glimpse's existing SQL tab?
I'm trying to log my SQL queries and the currently available extensions don't support our in-house SQL libary. I have written a custom plugin which logs what I want, but it has limited functionality and it doesn't integrate with the existing SQL tab.
Currently, I'm logging to my custom plugin using a single helper method inside my DAL's base class. This function looks takes the SqlCommand and Duration in order to show data on my custom tab:
// simplified example:
Stopwatch sw = Stopwatch.StartNew();
sqlCommand.Connection = sqlConnection;
sqlConnection.Open();
object result = sqlCommand.ExecuteScalar();
sqlConnection.Close();
sw.Stop();
long duration = sw.ElapsedMilliseconds;
LogSqlActivity(sqlCommand, null, duration);
This works well on my 'custom' tab but unfortunately means I don't get metrics shown on Glimpse's HUD:
Is there a way I can provide Glimpse directly with the info it needs (in terms of method names, and parameters) so it displays natively on the SQL tab?
The following advise is based on the fact that you can't use DbProviderFactory and you can't use a proxied SqlCommand, etc.
The data that appears in the "out-of-the-box" SQL tab is based on messages of given types been published through our internal Message Broker (see below on information on this). Because of the above limitations in your case, to get things lighting up correctly (i.e. your data showing up in HUD and the SQL tab), you will need to simulate the work that we do under the covers when we publish these messages. This shouldn't be that difficult and once done, should just work moving forward.
If you have a look at the various proxies we have here you will be above to see what messages we publish in what circumstances. Here are some highlights:
DbCommand
Log command start - here
Log command error - here
Log command end - here
DbConnection:
Log connection open - here
Log connection closed - here
DbTransaction
Log Started - here
Log committed - here
Log rollback - here
Other
Command row count here - Glimpses calculates this at the DbDataReader level but you could do it elsewhere as well
Now that you have an idea of what messages we are expecting and how we generate them, as long as you pass in the right data when you publish those messages, everything should just light up - if you are interested here is the code that looks for the messages that you will be publishing.
Message Broker: If you at the GlimpseConfiguration here you will see how to access the Broker. This can be done statically if needed (as we do here). From here you can publish the messages you need.
Helpers: For generating some of the above messages, you can use the helpers inside the Support class here. I would have shifted all the code for publishing the actual messages to this class, but I didn't think there would be too many people doing what you are doing.
Update 1
Starting point: With the above approach you shouldn't need to write your own plugin. You should just be able to access the broker GlimpseConfiguration.GetConfiguredMessageBroker() (make sure you check if its null, which it is if Glimpse is turned off, etc) and publish your messages.
I would imagine that you would put the inspection that leverages the broker and published the messages, where ever you have knowledge of the information that needs to be collected (i.e. inside your custom lib). Normally this would require references inside your lib to glimpse (which you may not want), so to protect against this, from your lib, you would call a proxy (which could be another VS proj) that has the glimpse dependency. Hence your ado lib only has references to your own code.
To get your toes wet, try just publishing a couple of fake connection and command messages. Assuming the broker you get from GlimpseConfiguration.GetConfiguredMessageBroker() isn't null, these should just show up. Then you can work towards getting real data into it from your lib.
Update 2
Obsolete Broker Access
Its marked as obsolete because its going to change in v2. You will still be able to do what you need to do, but the way of accessing the broker has changed. For what you currently need to do this is ok.
Sometimes null
As you have found this is really dependent on where in the page lifecycle you are currently at. To get around this, I would probably change my original recommendation a little.
In the code where you are currently creating messages and pushing them to the message bus, try putting them into HttpContext.Current.Items. If you haven't used it before, this is a store which asp.net provides out of the box which lasts the lifetime of a given request. You could have a list that you put in there, still create the message objects that you are doing, but put them into that list instead of pushing them through the broker.
Then, create a HttpModule (its really simple to do) which taps into the PostLogRequest event. Within this handler, you would pull the list out of the context, iterate through it and push the message into the message broker (accessing the same way you have been).

Allow only one running instance of a program

How can i restrict my program to run only instance? Currently i'm running my program as daemon(starts and stops automatically), and when user clicks and tries to launch again(which is not a valid usecase), process gets launched in user context and i would like to avoid this for many reasons.
How can i achieve this?
As of now i'm getting list of processes and doing some checks and exiting at the begining itself but this method is not clean, though it solves my problem.
can someone give me a better solution?
And i'm using ps to get process list, is there any reliable API to get this done?
Use a named semaphore with count of 1. On startup, check if this semaphore is taken. If it is, quit. Otherwise, take it.
Proof of concept code: (put somewhere near application entry point)
#include <semaphore.h>
...
if (sem_open(<UUID string for my app>, O_CREAT, 600, 1) == SEM_FAILED) {
exit(0);
}
From the sem_open documentation,
The returned semaphore descriptor is available to the calling process until it is closed with sem_close(), or until the caller exits or execs.

How can I reject a Windows "Service Stop" request in ATL 7?

I have a Windows service built upon ATL 7's CAtlServiceModuleT class. This service serves up COM objects that are used by various applications on the system, and these other applications naturally start getting errors if the service is stopped while they are still running.
I know that ATL DLLs solve this problem by returning S_OK in DllCanUnloadNow() if CComModule's GetLockCount() returns 0. That is, it checks to make sure no one is currently using any COM objects served up by the DLL. I want equivalent functionality in the service.
Here is what I've done in my override of CAtlServiceModuleT::OnStop():
void CMyServiceModule::OnStop()
{
if( GetLockCount() != 0 ) {
return;
}
BaseClass::OnStop();
}
Now, when the user attempts to Stop the service from the Services panel, they are presented with an error message:
Windows could not stop the XYZ service on Local Computer.
The service did not return an error. This could be an internal Windows error or an internal service error.
If the problem persists, contact your system administrator.
The Stop request is indeed refused, but it appears to put the service in a bad state. A second Stop request results in this error message:
Windows could not stop the XYZ service on Local Computer.
Error 1061: The service cannot accept control messages at this time.
Interestingly, the service does actually stop this time (although I'd rather it not, since there are still outstanding COM references).
I have two questions:
Is it considered bad practice for a service to refuse to stop when asked?
Is there a polite way to signify that the Stop request is being refused; one that doesn't put the Service into a bad state?
You can't do this. Once the SCM sends a SERVICE_CONTROL_STOP to your service, you have to stop.
If your other apps are also services, you can make them dependencies within the SCM. Of course, if the apps using this service are just regular applications that can't be used.
When ATL's lock count increments to 1, call SetServiceStatus() with the SERVICE_ACCEPT_STOP flag omitted in the SERVICE_STATUS::dwControlsAccepted field. Then you will not receive any SERVICE_CONTROL_STOP requests at all. Any attempt to stop the service will fail immediately. When ATL's lock count falls back to 0, call SetServiceStatus() again with the SERVICE_ACCEPT_STOP flag specified.
I just had to do this in 2 (older) ATL-based services, and it works well. Granted, I was unable to figure out the best way to override Lock() and Unlock() directly, so I just put a small loop inside my service that checks GetLockCount() at frequent intervals and calls SetServiceStatus() when needed.
In your constructor, update m_status.dwControlsAccepted removing SERVICE_ACCEPT_STOP. For instance:
CMyServiceModule::CMyServiceModule()
: ATL::CAtlServiceModuleT<CMyServiceModule, IDS_SERVICENAME>()
{
m_status.dw &= ~SERVICE_ACCEPT_STOP
}

Error handling: show error message or not?

Generally, in software design, which of the options below is preferred when there is a problem or error with a resource such as a database or file?
Show an error message
Do not show an error message and act as though the resource was empty (eg. do not populate a GUI component)]
For example, should the user see an empty DataGrid following which they complain, or should there be an error message? Which is better?
I don't see this as an either/or. Also, we need to consider all "users" of the system.
First consider the UI. Let's consider a contrived general case: you are populating a UI by calling a service which in turn uses a couple of of databases (for example a "current data" and an "historic data") database.
There are at least these possibilities:
It all works, data is retrieved
It all works but as it happens there's no data for this particular query
Can't reach the service
Service is invoked, but one database is down
Service is invoked, but both databases are down
Then also consider your application's semantics. Can your applciation procede in a "degraded" mode if all the data cannot be retrieved? For example, we can't query the history but that doesn't stop us creating a new item.,
Now also consider the roles here. There's the person using the UI, there's also support and maintenance people who need to know about and fix problems.
My general rules:
First Failure Data capture: Whichever component first detects an error should log it in some detail. So, service up, database down the service should log the problem. Service down, the UI should log the problem. This log should be a technical record targeting the support roles.
UIs should be tolerant: if at all possible run in a degraded mode. So if the service is down but (for example) local working is possible put up an empty screen and continue. BUT ...
Always indicate a problem: The "no data for this query" and "databases unavailable" cases may both result in an empty screen. The user needs to know the status of the display, is it showing complete information, partial information (eg. because one DB is down) or is no information available (eg. service or both dbs down). So have a "Status" field somewhere on the screen. Giving messages such as
Historica Data not currently available
or
There are problems retrieveing
information, if these persist please
contact support ...
There are some pitfalls to each of the options
Showing error message
This is specially helpful when your application is in testing stage or public testing. Also when clients meets an error, he or she can copy down the details and forward to you.
However sometimes this error message gets very ugly (call stacks and so on - remember ASP.NET?) and it becomes so large that it becomes difficult for clients to copy down the details.
Do not show error message and act as though nothing happened =)
This is useful when you don't want error messages to cog up your software UI design. But be reminded that it becomes difficult and further error prone when clients can't differentiate between an actual error, or really nothing on the GUI. The error stays there and nothing gets fixed.
My stand
Get the best of both worlds. In fact most modern applications how have a very good error handling process. I'll take the example of Mozilla Firefox 3.
A deadly error occurred and Firefox crashes
Error is captured and stored into a file as a form of error report
Error Reporting Application pops up apologizing to the user
Ask the user if the user want to send the error report to the software dev team
Then ask the user if want to restart the application
Or if the error is a warning or of lesser severity:
Show a simple error code and tell the user that there's the error with that action. Something like: "Error 123 at RequestSalary() Line 2"
The practice I usualy use is:
If the error didn't happen due to user error, then you should try to handle the error quietly.
If the error occurred because of some external problem (such as no internet connection) then you should alert the user.
IMO you should show a message (albeit a user friendly one and not something like "java.io.IOException: Connection timed out".) You could have a message box telling the user that an error occured while getting the data and provide helpful tips like: Trying after some time, check network cable, etc.
Also allow user to report that error to you (error reporting build into the app) that will send you the actual error and stack trace.