Traefik Healthcheck apparently causing intermittent RST

Our application is reporting regular connection errors which seem to originate from the Traefik healthcheck.
We have a Java application sitting behind Traefik (v1.7.5) - and keep getting this error in our logs:
2019-01-02 16:30:04.676 ERROR 1 --- [or-http-epoll-3]
reactor.netty.tcp.TcpServer : [id: 0xa37b0112,
L:/10.0.1.80:8080 - R:/10.0.1.72:48298]
onUncaughtException(SimpleConnection{channel=[id: 0xa37b0112,
L:/10.0.1.80:8080 - R:/10.0.1.72:48298]})
We ran a packet sniff and it appears that Traefik is occasionally sending an RST, e.g.:
traefik -> receiver GET /monitor/health HTTP/1.1
traefik -> receiver FIN, ACK
receiver -> traefik HTTP/1.1 200 OK
receiver -> traefik FIN, ACK
traefik -> receiver RST
traefik -> receiver RST
...which causes the uncaught error in our logs. I'm wondering if this is a bug in Traefik or something we've misconfigured, but I'm not quite sure where to look next.
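For context, the health check itself is configured roughly like this (shown in the file-provider TOML form; the backend name and interval are illustrative):
[backends]
  [backends.app]
    [backends.app.healthcheck]
      # Traefik probes this path on each backend server at the given interval
      path = "/monitor/health"
      interval = "10s"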

Related

Health Check on Fabric CA

I have a hyperledger fabric network v2.2.0 deployed with 2 peer orgs and an orderer org in a kubernetes cluster. Each org has its own CA server. The CA pod keeps on restarting sometimes. In order to know whether the service of the CA server is reachable or not, I am trying to use the healthz API on port 9443.
I have configured the livenessProbe in the CA deployment like so:
livenessProbe:
  failureThreshold: 3
  httpGet:
    path: /healthz
    port: 9443
    scheme: HTTP
  initialDelaySeconds: 10
  periodSeconds: 10
  successThreshold: 1
  timeoutSeconds: 1
After configuring this liveness probe, the pod keeps on restarting with the event Liveness probe failed: HTTP probe failed with status code: 400. Why might this be happening?
HTTP 400 code:
The HTTP 400 Bad Request response status code indicates that the server cannot or will not process the request due to something that is perceived to be a client error (for example, malformed request syntax, invalid request message framing, or deceptive request routing).
This indicates that Kubernetes is sending the request in a way Hyperledger is rejecting, but without more information it is hard to say where the problem is. Some quick checks to start with:
Send some GET requests directly to the Hyperledger /healthz resource yourself (see the curl sketch after the event listing below). What do you get? You should get back either a 200 "OK" if everything is functioning, or a 503 "Service Unavailable" with details of which nodes are down (docs).
kubectl describe pod liveness-request. You should see a few lines towards the bottom describing the state of the liveness probe in more detail:
Restart Count: 0
.
.
.
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Scheduled <unknown> default-scheduler Successfully assigned example-dc/liveness-request to dcpoz-d-sou-k8swor3
Normal Pulling 4m45s kubelet, dcpoz-d-sou-k8swor3 Pulling image "nginx"
Normal Pulled 4m42s kubelet, dcpoz-d-sou-k8swor3 Successfully pulled image "nginx"
Normal Created 4m42s kubelet, dcpoz-d-sou-k8swor3 Created container liveness
Normal Started 4m42s kubelet, dcpoz-d-sou-k8swor3 Started container liveness
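For the direct check, something like this from any machine or pod that can reach the CA is enough (the hostname is illustrative; the port is the operations port from your config):
# Replace ca-org1.example.com with the CA service's DNS name or pod IP
curl -v http://ca-org1.example.com:9443/healthz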
Some other things to investigate:
httpGet options that might be helpful:
scheme – Protocol type HTTP or HTTPS
httpHeaders – Custom headers to set in the request
Have you configured the operations service?
You may need a valid client certificate (if TLS is enabled, and clientAuthRequired is set to true).
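If the operations endpoint has TLS enabled, a probe along these lines may help (a sketch only; the header is illustrative):
livenessProbe:
  failureThreshold: 3
  httpGet:
    path: /healthz
    port: 9443
    # The kubelet skips certificate verification for HTTPS probes
    scheme: HTTPS
    httpHeaders:
    - name: Accept
      value: application/json
  initialDelaySeconds: 10
  periodSeconds: 10
Note that an httpGet probe can never present a client certificate, so if clientAuthRequired is true the probe will keep failing regardless of these settings.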

Spring Cloud Gateway hides server websocket handshake 401 failures from clients

I'm reverse proxying a WebSocket backend API with spring-cloud-gateway 2.2.3. When this backend API rejects a WebSocket handshake request with a 401 Unauthorized status response, spring-cloud-gateway still returns a 101 handshake status to the client (which gets confused and then misbehaves).
I need spring-cloud-gateway to return the original 401 WebSocket handshake error to the client, so that the SCG reverse proxy is transparent to the client (which conforms to the WebSocket handshake spec).
Here are the full wiretap traces and exception (I have redacted hostnames).
The client-side response for this WSS request is available as a HAR file captured from Chrome, which displays in Chrome as this screenshot.
Here is my spring cloud gateway configuration
spring:
  cloud:
    gateway:
      routes:
      - id: route_shield
        uri: https://shield-webui-cf-mysql.nd-int-cfapi.was.redacted
        predicates:
        - Host=**
        filters:
        - SetRequestHostHeader=shield-webui-cf-mysql.nd-int-cfapi.was.redacted
      ssl:
        useInsecureTrustManager: true
I'm wondering whether this is a spring-cloud-gateway bug, or a desired behavior which I can override.
To override it, here are alternatives I'm considering:
using the circuit breaker filter and falling back to a local handler returning a 401 (see the sketch after the trace below)
write a custom post-filter
Override/patch the WebsocketRoutingFilter
However, my debugger breakpoint in the handle(WebSocketSession session) method does not trigger, so I suspect it is not called.
I would likely need to provide a RequestUpgradeStrategy bean as an alternative to the default implementation of org.springframework.web.reactive.socket.server.upgrade.ReactorNettyRequestUpgradeStrategy#getNativeResponse mentioned in the trace:
io.netty.handler.codec.http.websocketx.WebSocketHandshakeException: Invalid handshake response getStatus: 401 Unauthorized
at io.netty.handler.codec.http.websocketx.WebSocketClientHandshaker13.verify(WebSocketClientHandshaker13.java:274) ~[netty-codec-http-4.1.51.Final.jar:4.1.51.Final]
Suppressed: reactor.core.publisher.FluxOnAssembly$OnAssemblyException:
Error has been observed at the following site(s):
|_ checkpoint ⇢ http://localhost:8080/v2/events [ReactorNettyRequestUpgradeStrategy]
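Coming back to the first alternative, the route could look roughly like this (the filter args and fallback path are illustrative, the Resilience4J circuit-breaker starter is assumed to be on the classpath, and I have not verified that the handshake 401 actually trips the fallback):
spring:
  cloud:
    gateway:
      routes:
      - id: route_shield
        uri: https://shield-webui-cf-mysql.nd-int-cfapi.was.redacted
        predicates:
        - Host=**
        filters:
        - SetRequestHostHeader=shield-webui-cf-mysql.nd-int-cfapi.was.redacted
        - name: CircuitBreaker
          args:
            name: wsHandshakeFallback
            fallbackUri: forward:/ws-handshake-error
A local handler mapped to /ws-handshake-error would then return the 401 to the client.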

WildFly Swarm apps using an external ActiveMQ broker

I'm having a very hard time getting two WildFly Swarm apps (based on version 2017.9.5) to communicate with each other over a standalone ActiveMQ 5.14.3 broker. Everything is done using YAML config, as I can't have a main method in my case.
After reading hundreds of outdated examples and inaccurate pages of documentation, I settled on the following settings for both the producer and consumer apps:
swarm:
  messaging-activemq:
    servers:
      default:
        jms-topics:
          domain-events: {}
  messaging:
    remote:
      name: remote-mq
      host: localhost
      port: 61616
      jndi-name: java:/jms/remote-mq
      remote: true
Now it seems that at least part of the configuration is correct, as the apps start, except for the following warning:
2017-09-16 14:20:04,385 WARN [org.jboss.activemq.artemis.wildfly.integration.recovery] (MSC service thread 1-2) AMQ122018: Could not start recovery discovery on XARecoveryConfig [transportConfiguration=[TransportConfiguration(name=, factory=org-apache-activemq-artemis-core-remoting-impl-netty-NettyConnectorFactory) ?port=61616&localAddress=::&host=localhost], discoveryConfiguration=null, username=null, password=****, JNDI_NAME=java:/jms/remote-mq], we will retry every recovery scan until the server is available
Also, when the producer tries to send messages, it just times out and I get the following exception (last part only):
Caused by: javax.jms.JMSException: Failed to create session factory
at org.apache.activemq.artemis.jms.client.ActiveMQConnectionFactory.createConnectionInternal(ActiveMQConnectionFactory.java:727)
at org.apache.activemq.artemis.jms.client.ActiveMQConnectionFactory.createXAConnection(ActiveMQConnectionFactory.java:304)
at org.apache.activemq.artemis.jms.client.ActiveMQConnectionFactory.createXAConnection(ActiveMQConnectionFactory.java:300)
at org.apache.activemq.artemis.ra.ActiveMQRAManagedConnection.setup(ActiveMQRAManagedConnection.java:785)
... 127 more
Caused by: ActiveMQConnectionTimedOutException[errorType=CONNECTION_TIMEDOUT message=AMQ119013: Timed out waiting to receive cluster topology. Group:null]
at org.apache.activemq.artemis.core.client.impl.ServerLocatorImpl.createSessionFactory(ServerLocatorImpl.java:797)
at org.apache.activemq.artemis.jms.client.ActiveMQConnectionFactory.createConnectionInternal(ActiveMQConnectionFactory.java:724)
... 130 more
I suspect the problem is that ActiveMQ has security turned on, but I found no place in the Swarm config to supply a username and password.
The ActiveMQ instance is running in Docker with the following compose file:
version: '2'
services:
  activemq:
    image: webcenter/activemq
    environment:
      - ACTIVEMQ_NAME=amqp-srv1
      - ACTIVEMQ_REMOVE_DEFAULT_ACCOUNT=true
      - ACTIVEMQ_ADMIN_LOGIN=admin
      - ACTIVEMQ_ADMIN_PASSWORD=your_password
      - ACTIVEMQ_WRITE_LOGIN=producer_login
      - ACTIVEMQ_WRITE_PASSWORD=producer_password
      - ACTIVEMQ_READ_LOGIN=consumer_login
      - ACTIVEMQ_READ_PASSWORD=consumer_password
      - ACTIVEMQ_JMX_LOGIN=jmx_login
      - ACTIVEMQ_JMX_PASSWORD=jmx_password
      - ACTIVEMQ_MIN_MEMORY=1024
      - ACTIVEMQ_MAX_MEMORY=4096
      - ACTIVEMQ_ENABLED_SCHEDULER=true
    ports:
      - "1883:1883"
      - "5672:5672"
      - "8161:8161"
      - "61616:61616"
      - "61613:61613"
      - "61614:61614"
Any idea what's going wrong?
I had a hard time getting this working too. The following YAML solved my problem:
swarm:
  network:
    socket-binding-groups:
      standard-sockets:
        outbound-socket-bindings:
          myapp-socket-binding:
            remote-host: localhost
            remote-port: 61616
  messaging-activemq:
    servers:
      default:
        remote-connectors:
          myapp-connector:
            socket-binding: myapp-socket-binding
        pooled-connection-factories:
          myAppRemote:
            user: username
            password: password
            connectors:
              - myapp-connector
            entries:
              - 'java:/jms/remote-mq'
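With that in place, the application can look up the pooled connection factory through the JNDI entry declared above. A minimal sketch of a publisher (the bean itself is illustrative; the topic name comes from the question's config):
import javax.annotation.Resource;
import javax.enterprise.context.ApplicationScoped;
import javax.jms.ConnectionFactory;
import javax.jms.JMSContext;

@ApplicationScoped
public class DomainEventsPublisher {

    // Matches the 'entries' value of the pooled-connection-factory above;
    // the configured user/password are applied when connections are created.
    @Resource(lookup = "java:/jms/remote-mq")
    private ConnectionFactory connectionFactory;

    public void publish(String payload) {
        try (JMSContext context = connectionFactory.createContext()) {
            context.createProducer().send(context.createTopic("domain-events"), payload);
        }
    }
}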

Weblogic 12.1.2. "https + t3" combination on a single managed server. Is it possible?

WLS 12.1.2 is running under JDK 1.7_60 on Windows 7
To meet the requirement "Switch to HTTPS, but leave t3", the following steps are performed in the admin console for the managed server (where the apps reside):
Disable the default listen port 7280 (http and t3)
Enable the default SSL listen port 7282 (https and t3s)
In order to enable t3, create a custom Channel:
Protocol: t3
Port: 7280
"HTTP Enabled for This Protocol" flag is set to false
After that, we have https and t3s on port 7282 and t3 only on port 7280.
In this case, we have issues with deploying applications.
The deployer fails to start/stop the apps.
The reason is that the deployer still tries to send messages to the managed server via HTTP.
I turned on deployment debugging and see the following messages in the admin server log:
…<DeploymentServiceTransportHttp> …<HTTPMessageSender: IOException: java.io.EOFException: Response had end of stream after 0 bytes when making a DeploymentServiceMsg request to URL: http://localhost:7280/bea_wls_deployment_internal/DeploymentService>
… <DeploymentServiceTransportHttp> …<sending message for id '-1' to 'my_srv' using URL 'http://localhost:7280' via http>
If I disable the custom t3 Channel, everything is ok. The deployer sends messages to https://localhost:7282, as expected. But in this case, we have no t3 available.
Any help is much appreciated.
Thanks

Apache HTTPD Websocket Tunnel Plugin Error

My WebSocket connection intermittently fails to connect when going through the Apache ws tunnel plugin. The connection always works when hitting the app servers directly.
I see the errors below.
Error during WebSocket handshake: Invalid status line
WebSocket connection to 'ws://host' failed: One or more reserved bits are on: reserved1 = 1, reserved2 = 0, reserved3 = 0
and sometimes
WebSocket connection to 'ws://host' failed: Unrecognized frame opcode: 12
and at times
Error during WebSocket handshake: Status line does not end with CRLF ui-toolkit-vendor.js:21965
Infrastructure
Apache HTTPD 2.4.9 with mod_proxy_wstunnel and mod_proxy_balancer modules
The ws tunnel module shipped with the 2.4.9 version has several bugs which were later fixed in the 2.4.12 build. Here is an excerpt from the SVN log:
Revision 1587075 - Modified Sun Apr 13 18:41:05 2014 UTC by covener
several related mod_proxy_wstunnel changes that are tough to pull apart:
make async websockets tunnel opt-in
add config for how long we block a thread in asynch mode
add config for a cap on the synchronous path
avoid sending error responses down the upgraded tunnel
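Assuming an upgrade to 2.4.12 or later, the wstunnel proxy configuration itself can stay minimal, roughly along these lines (the paths and backend host are illustrative):
LoadModule proxy_module          modules/mod_proxy.so
LoadModule proxy_wstunnel_module modules/mod_proxy_wstunnel.so

# Forward WebSocket upgrade requests to the backend over the ws:// scheme
ProxyPass        "/ws/" "ws://app-backend.example.com:8080/ws/"
ProxyPassReverse "/ws/" "ws://app-backend.example.com:8080/ws/"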