Repeated IBM bluemix Node Red app crashing; status 1 - crash

My Node Red application in IBM BlueMix is repeatedly crashing - once an hour - with no real error message other than "exited with status: 1."
How can I troubleshoot this issue?
Is there someone from IBM BlueMix support that monitors this that could take a look?
I looked at my logs and there's nothing in there that really says what's going on.
Edit per requests:
The regular log for "OUT/ERR" is scrolling so fast with HTTPD logs that I can't get it to copy/paste. Filtering to "ERR" Channel the only thing I see is below. I believe this is an error which occurs during deploy when the application restarts.
[App/0] ERR js-bson: Failed to load c++ bson extension, using pure JS version
My Node Red application is gathering data from Wink, LIFX, and other IoT services and compiles them together into a Freeboard dashboard.
Caught crash on screenshot here -- not enough cred to post images so it'll only post as a link

The zlib error was fixed in the 0.13.2 Node-RED release (that shipped 19/02/16).
If you re-stage your application is should pick up the new version of Node-RED
You can re-stage the application using the cf command line management application:
cf restage <app name>

Related

SO_KEEPALIVE issue in Mulesoft

we had a Mulesoft app that basically picks message from queue (ActiveMQ), then posts to target app via HTTP request to target's API.
Runtime: 4.3.0
HTTP Connector version: v1.3.2
Server: Windows, On-premise standalone
However, sometimes the message doesn't get sent successfully after picking from queue , and below message can be found in the log -
WARN 2021-07-10 01:24:46,080 [[masked-app].http.requester.requestConfig.02 SelectorRunner] [event: ] org.glassfish.grizzly.nio.transport.TCPNIOTransport: GRIZZLY0005: Can not set SO_KEEPALIVE to false
java.net.SocketException: Invalid argument: no further information
at sun.nio.ch.Net.setIntOption0(Native Method) ~[?:1.8.0_281]
The flow completed silently without any error after above message, hence no error handling happens.
I found this mentioning it is a known bug on Windows server and won’t affect the well behavior of the application, but the document is failing to set SO_KEEPALIVE to true rather than false.
Looks the message didn't get posted successfully as the target system team can't find corresponding incoming request in their log.
It is not acceptable as the message is critical and no one knows unless the target system realizes something is wrong... Not sure if the SO_KEEPALIVE is failing to be set to false is the root cause, could you please share some thoughts? Thanks a lot in advance.
The is probably unrelated to the warning you mentioned but there doesn't seem to be enough information to identify the actual root cause.
Having said that the version of the HTTP connector is old and it's missing almost 3 years of fixes. Updating the version to the last one should improve the reliability of the application.

Get Jaeger agent error when distributed trace spans Node.js and Java services

In our application a Node.js front end talks to a Java Spring backend. Everything is containerized and running in Kubernetes. Some time ago we added support for Jaeger distribtued tracing across the front end and back end services. Jaeger has been running fine until recently.
Our Elasticsearch cluster was out of date so we upgraded. That mandated an upgrade of Jaeger--we ended up with the following bits:
Jaeger Helm Chart: 0.13.3 from https://github.com/helm/charts/tree/master/incubator/jaeger
Jaeger Client for Node: 3.17.1
Jaeger Client for Java:
opentracing-spring-jaeger-cloud-starter 2.0.3
opentracing-spring-jaeger-web-starter 2.0.3
Both of the opentracing libraries have a dependency on the version 0.35.1 of the Jaeger Java client.
Since upgrading, traces that are created on one side or the other seem to be fine. But traces that span the boundary (i.e. start on the Node.js front end and complete on the Java backend) generate errors in the jaeger-agent pod like this:
{"level":"error","ts":1574224941.7531824,"caller":"processors/thrift_processor.go:119",
"msg":"Processor failed","error":"*jaeger.Batch error reading struct: *jaeger.Span error
reading struct: *jaeger.Log error reading struct: *jaeger.Tag error reading struct:
error reading field 3: Invalid data length","stacktrace":"github.com/jaegertracing/jaeger/cmd/agent/app/processors.
(*ThriftProcessor).processBuffer\n\t/home/travis/gopath/src/github.com/jaegertracing/jaeger/cmd/
agent/app/processors/thrift_processor.go:119\ngithub.com/jaegertracing/jaeger/cmd/agent/app/proc
essors.NewThriftProcessor.func2\n\t/home/travis/gopath/src/github.com/jaegertracing/jaeger/cmd/a
gent/app/processors/thrift_processor.go:83"}
For these traces, the Jaeger UI shows us the spans that were created by the front end before invoking the backend API, but the child backend spans do not show up as you would expect.
What might cause this sort of processor error?
It looks like you have different versions of opentracing. The spring-starter-jaeger version 2.x upgrade the version of opentracing, so you might have introduced this breaking changes when you upgraded the dependency version.

Getting VersionConflictEngineException in IBM Mobile First 6.3

I'm receiving the following message on server log in IBM Mobile First 6.3 every time an Adapter is getting called:
Stacktrace
[ERROR ] Error sending bulk request: java.lang.RuntimeException:
failure in bulk execution: [2]: index [worklight], type [devices], id
[b2deefe7-0d15-4ed4-b199-7e42440fc372], message
[VersionConflictEngineException[[worklight][1]
[devices][b2deefe7-0d15-4ed4-b199-7e42440fc372]: version conflict,
current [58], provided [57]]] at
com.ibm.elasticsearch.servlet.DataReceiver.processData(DataReceiver.java:132)
at
com.ibm.elasticsearch.servlet.DataReceiver.processDataLegacy(DataReceiver.java:85)
at sun.reflect.GeneratedMethodAccessor57.invoke(Unknown Source) ...
The adapter is executed correctly and the response is returned to the app.
Any idea why this error is happening?
Help will be appreciated.
Thanks.
This is an internal error in analytics. The error itself is actually harmless, however the analytics platform should be catching it... A defect will be logged for the message. In the meantime, if you're not using analytics, you can disable it by removing the WAR files from the Liberty server.
If you are using analytics, then I would recommend clearing out the analytics data folder and restarting the IMF platform (this would remove any data you have stored in analytics). This is assuming that you are running in development mode. The analytics data folder can be in the same directory as the server.xml file for your Liberty server.

IBM MobileFirst Platform 6.3 Operational Analytics Failed installation for Tomcat

I have installed MobileFirst 6.3 appcenter console, worklight console successfully, they are operating fine on Tomcat/7.0.57. However when I try to install Operational Analytics, the documentation has the following
http://www-01.ibm.com/support/knowledgecenter/SSHS8R_6.3.0/com.ibm.worklight.installconfig.doc/monitor/c_op_analytics_installation_tomcat.html
I am using tomcat manager http://localhost:8080/html to deploy the war files. logging in as manager, with the manager-gui role.
worklight-analytics.war - deployed with no issues
when I select the worklight-analytics-service.war file and deploy in the GUI, it throws a blank page first, indicating "connection error", and when I refresh the page, on the status bar in Tomcat manager GUI, I get this message - "FAIL - Tried to use command /upload via a GET request but POST is required";
Please provide some direction on what I need to do get this fixed. I am not sure If I have provided all required information - please bear with me and ask, if anything relevant (obviously I can't figure out what is relevant yet) is required to debug.
So I was able to reproduce your error and I saw this in the logs:
java.lang.IllegalStateException:
org.apache.tomcat.util.http.fileupload.FileUploadBase$SizeLimitExceededException:
the request was rejected because its size (57353297) exceeds the
configured maximum (52428800)
It looks like by default, the web UI will only upload WARs of size 50MB or smaller. The analytics service WAR file is larger than this, so that is why this is failing. I was able to increase the limit by modifying the following lines in
/webapps/manager/WEB-INF/web.xml
<max-file-size>100000000</max-file-size>
<max-request-size>100000000</max-request-size>
This will increase the limit to 100MB. After I did this, I was able to successfully deploy the service WAR.
Just as a heads up, once you get the WAR deployed, you'll be presented with the login page. You'll need a tomcat user with the 'worklightadmin' role in order to get past the login screen.
The worklight-analytics-service WAR file does not have a user interface. It is simply referenced by the worklight-analytics WAR file. When both WARs have been deployed, can you see the analytics console? And does data load just fine? If so, then everything is fine. There is only an issue if you are unable to use the user interface provided by the worklight-analytics WAR file.

Cannot Direct Update after fixpack WL 6.1.0.1

after the fixpack in subject, i cannot manage to direct update my application.
It notify me that there is a new version available, but when i hit Update it keeps saying that the download failed (both on iOS and Android).
I attached my android to adb and i noticed those lines in the console:
Authentication error: Unable to respond to any of these challenges:
{wl-composite-challenge=WWW-Authenticate: WL-Composite-Challenge}
java.io.IOException: Error downloading update file The following
message has been received from the server instead of the expected
application update zip file: HTTP/1.1 401 Unauthorized
/-secure-{"challenges":{"wl_deviceNoProvisioningRealm":{"token":"1or0tj7gnoev1rn06s188j4u9h"},"wl_antiXSRFRealm":{"WL-Instance-Id":"np8c8o3c4dk1k7s79i2ikddfab"}}}/
at
com.worklight.androidgap.plugin.WebResourcesDownloaderPlugin$WebResourcesDownloader.downloadZipFile(WebResourcesDownloaderPlugin.java:364)
To complete the information I deployed the IBM_Worklight_Console with no security role in its web.xml and we have Worklight 6.1.0.1 installed on WAS Network Deployment 7.0.0.23 running AIX.
Before the fixpack, everything worked well.
Thank you
EDIT: here you can see my application-descriptor.xml and my server configuration:
I forgot about this stack, however I found the problem.
When I open my application, the WL Framework asks for new available update. In the same time, my application ask an adapter for some data, before the WL server can response about the Update. So if it arrives first the response of my adpater, then the Update response will contain a different requestID causing a 403 forbidden.
I don't know if I explained it clearly, however the temporary fix is to put updateSilently: true, not disturbing the direct update.