Hyperledger Fabric Raft consensus not electing a leader after Kafka-to-Raft migration in version 1.4.9

I have a Fabric 1.4.9 network with 3 orderers and 2 peers running on Kafka consensus. I wanted to migrate it to Raft, so, as the prerequisites require, I updated the channel capabilities for both the application channel and the system channel. After upgrading the capabilities, I started the consensus migration by putting the network into maintenance mode and updating the ConsensusType like this:
"ConsensusType": {
  "mod_policy": "Admins",
  "value": {
    "metadata": {
      "consenters": [
        {
          "client_tls_cert": "CERTS",
          "host": "orderer1",
          "port": 7050,
          "server_tls_cert": "CERTS"
        },
        {
          "client_tls_cert": "CERTS",
          "host": "orderer2",
          "port": 7050,
          "server_tls_cert": "CERTS"
        },
        {
          "client_tls_cert": "CERTS",
          "host": "orderer3",
          "port": 7050,
          "server_tls_cert": "CERTS"
        }
      ],
      "options": {
        "election_tick": 10,
        "heartbeat_tick": 1,
        "max_inflight_blocks": 5,
        "snapshot_interval_size": 20971520,
        "tick_interval": "500ms"
      }
    },
    "type": "etcdraft"
  },
  "version": "1"
}
I applied this update, with the correct certificates, to both the application channel and the system channel. After this I restarted the Docker containers. The logs showed that the migration to Raft consensus had started and that an election began, but the election never completes. The log looks the same on all the orderers; an excerpt follows the orderer config.
This is my orderer config
- GODEBUG=netdns=go
- ORDERER_HOST=orderer2
- CONFIGTX_ORDERER_KAFKA_BROKERS=[kafka0:9092,kafka1:9092,kafka2:9092,kafka3:9092]
- ORDERER_GENERAL_GENESISFILE=/var/hyperledger/orderer/configtx/genesis.block
- ORDERER_GENERAL_LOCALMSPDIR=/var/hyperledger/orderer/msp
# enabled TLS
- ORDERER_GENERAL_TLS_PRIVATEKEY=/var/hyperledger/orderer/tls/server.key
- ORDERER_GENERAL_TLS_CERTIFICATE=/var/hyperledger/orderer/tls/server.crt
- ORDERER_GENERAL_TLS_ROOTCAS=/var/hyperledger/orderer/msp/cacerts/ca-orderer-7054.pem
- ORDERER_GENERAL_CLUSTER_CLIENTCERTIFICATE=/var/hyperledger/orderer/tls/server.crt
- ORDERER_GENERAL_CLUSTER_CLIENTPRIVATEKEY=/var/hyperledger/orderer/tls/server.key
- ORDERER_GENERAL_CLUSTER_ROOTCAS=[/var/hyperledger/orderer/msp/cacerts/ca-orderer-7054.pem,/var/hyperledger/peers/eprocure1/peer/msp/cacerts/ca-eprocure1-7054.pem,/var/hyperledger/peers/eprocure2/peer/msp/cacerts/ca-eprocure2-7054.pem]
# Client Auth
- ORDERER_GENERAL_TLS_CLIENTROOTCAS=[/var/hyperledger/orderer/msp/cacerts/ca-orderer-7054.pem,/var/hyperledger/peers/eprocure1/peer/msp/cacerts/ca-eprocure1-7054.pem,/var/hyperledger/peers/eprocure2/peer/msp/cacerts/ca-eprocure2-7054.pem]
- ORG_ADMIN_CERT=/var/hyperledger/orderer/msp/admincerts/cert.pem
2023-02-16 11:45:06.175 UTC [orderer.consensus.etcdraft] Step -> INFO 07e 1 is starting a new election at term 1 channel=channel node=1
2023-02-16 11:45:06.175 UTC [orderer.consensus.etcdraft] becomePreCandidate -> INFO 07f 1 became pre-candidate at term 1 channel=channel node=1
2023-02-16 11:45:06.180 UTC [orderer.consensus.etcdraft] poll -> INFO 080 1 received MsgPreVoteResp from 1 at term 1 channel=channel node=1
2023-02-16 11:45:06.180 UTC [orderer.consensus.etcdraft] campaign -> INFO 081 1 [logterm: 1, index: 3] sent MsgPreVote request to 2 at term 1 channel=channel node=1
2023-02-16 11:45:06.180 UTC [orderer.consensus.etcdraft] campaign -> INFO 082 1 [logterm: 1, index: 3] sent MsgPreVote request to 3 at term 1 channel=channel node=1
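The excerpt above shows node 1 receiving a pre-vote response only from itself. A minimal sketch of the arithmetic behind that, assuming the usual Raft majority rule and assuming the base election timeout is roughly election_tick × tick_interval (as far as I know etcd/raft randomizes the actual timeout above that base, so treat the number as a lower bound). The function names are hypothetical, and the values are copied from the question.

```python
def quorum(cluster_size):
    """Smallest number of votes that forms a majority."""
    return cluster_size // 2 + 1


def has_quorum(cluster_size, responders):
    """True if the distinct responders form a majority of the cluster."""
    return len(set(responders)) >= quorum(cluster_size)


def base_election_timeout_ms(election_tick, tick_interval_ms):
    """Base election timeout, assuming one tick every tick_interval_ms."""
    return election_tick * tick_interval_ms


# Three consenters (orderer1..3); only node 1 answered the pre-vote in the log.
consenter_ids = [1, 2, 3]
prevote_responders = [1]

needed = quorum(len(consenter_ids))
elected = has_quorum(len(consenter_ids), prevote_responders)
timeout_ms = base_election_timeout_ms(election_tick=10, tick_interval_ms=500)

print(f"votes needed: {needed}, leader possible: {elected}")
print(f"base election timeout: {timeout_ms} ms")
```

With a single responder out of three, no candidate can reach a majority, so the node keeps restarting the election every few seconds until nodes 2 and 3 become reachable.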
At the beginning of the logs I get the errors shown here:
2023-02-16 11:44:33.175 UTC [orderer.consensus.etcdraft] logSendFailure -> ERRO 049 Failed to send StepRequest to 2, because: rpc error: code = Unavailable desc = all SubConns are in TransientFailure, latest connection error: connection error: desc = "transport: Error while dialing dial tcp: lookup orderer2 on no such host" channel=testchainid node=1
2023-02-16 11:44:33.175 UTC [orderer.consensus.etcdraft] logSendFailure -> ERRO 04a Failed to send StepRequest to 3, because: rpc error: code = Unavailable desc = all SubConns are in TransientFailure, latest connection error: connection error: desc = "transport: Error while dialing dial tcp: lookup orderer3 on no such host" channel=testchainid node=1
2023-02-16 11:44:33.705 UTC [core.comm] ServerHandshake -> ERRO 04b TLS handshake failed with error tls: first record does not look like a TLS handshake server=Orderer remoteaddress=
I have added the environment variables related to General.Cluster, but it made no difference.
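The "lookup orderer2 ... no such host" errors suggest that the consenter host names do not resolve from inside the orderer1 container. A small sketch for checking that, assuming it is run inside the same container and network as the orderer; the helper name and the stub resolver are hypothetical and exist only to make the example self-contained.

```python
import socket


def unresolved_hosts(hosts, resolve=socket.gethostbyname):
    """Return the hosts that the given resolver cannot resolve."""
    missing = []
    for host in hosts:
        try:
            resolve(host)
        except OSError:  # socket.gaierror is a subclass of OSError
            missing.append(host)
    return missing


def stub_resolver(host):
    """Stand-in resolver that only knows orderer1 (illustration only)."""
    known = {"orderer1": "10.0.0.1"}
    if host not in known:
        raise socket.gaierror(f"no such host: {host}")
    return known[host]


# Host names taken from the "consenters" list in the question.
consenter_hosts = ["orderer1", "orderer2", "orderer3"]
print(unresolved_hosts(consenter_hosts, resolve=stub_resolver))
```

Dropping the `resolve` argument makes the function use the container's real DNS configuration. Any host it reports has to match a name that the Docker network (or /etc/hosts) actually provides, because the consenter entries are dialed exactly as written.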


Connections handling

I've been using Karate for a while now, and there is an issue we have been facing for over a year: org.apache.http.conn.ConnectTimeoutException
Other threads about that ConnectTimeoutException were solved by specifying a proxy, but that did not help us.
After a lot of investigation, it turned out that our Azure SNAT ports were exhausted, meaning Karate was opening far too many connections.
To verify this I enabled debug logging and used this feature:
* url "https://www.karatelabs.io/"
* method GET
* method GET
The logs then had the following lines:
13:10:17.868 [main] DEBUG com.intuit.karate - request:
1 > GET https://www.karatelabs.io/
1 > Host: www.karatelabs.io
1 > Connection: Keep-Alive
1 > User-Agent: Apache-HttpClient/4.5.13 (Java/
1 > Accept-Encoding: gzip,deflate
13:10:17.868 [main] DEBUG o.a.h.i.c.PoolingHttpClientConnectionManager - Connection request: [route: {s}->https://www.karatelabs.io:443][total available: 0; route allocated: 0 of 5; total allocated: 0 of 10]
13:10:17.874 [main] DEBUG o.a.h.i.c.PoolingHttpClientConnectionManager - Connection leased: [id: 0][route: {s}->https://www.karatelabs.io:443][total available: 0; route allocated: 1 of 5; total allocated: 1 of 10]
13:10:17.875 [main] DEBUG o.a.h.impl.execchain.MainClientExec - Opening connection {s}->https://www.karatelabs.io:443
13:10:17.883 [main] DEBUG o.a.h.i.c.DefaultHttpClientConnectionOperator - Connecting to www.karatelabs.io/
13:10:17.883 [main] DEBUG o.a.h.c.s.SSLConnectionSocketFactory - Connecting socket to www.karatelabs.io/ with timeout 30000
13:10:17.924 [main] DEBUG o.a.h.c.s.SSLConnectionSocketFactory - Enabled protocols: [TLSv1.3, TLSv1.2]
13:10:17.924 [main] DEBUG o.a.h.c.s.SSLConnectionSocketFactory - Enabled cipher suites:[...]
13:10:17.924 [main] DEBUG o.a.h.c.s.SSLConnectionSocketFactory - Starting handshake
13:10:18.012 [main] DEBUG o.a.h.c.s.SSLConnectionSocketFactory - Secure session established
13:10:18.012 [main] DEBUG o.a.h.c.s.SSLConnectionSocketFactory - negotiated protocol: TLSv1.3
13:10:18.012 [main] DEBUG o.a.h.c.s.SSLConnectionSocketFactory - negotiated cipher suite: TLS_AES_256_GCM_SHA384
13:10:18.012 [main] DEBUG o.a.h.c.s.SSLConnectionSocketFactory - peer principal: CN=karatelabs.io
13:10:18.012 [main] DEBUG o.a.h.c.s.SSLConnectionSocketFactory - peer alternative names: [karatelabs.io, www.karatelabs.io]
13:10:18.012 [main] DEBUG o.a.h.c.s.SSLConnectionSocketFactory - issuer principal: CN=Sectigo RSA Domain Validation Secure Server CA, O=Sectigo Limited, L=Salford, ST=Greater Manchester, C=GB
13:10:18.014 [main] DEBUG o.a.h.i.c.DefaultHttpClientConnectionOperator - Connection established localIp<->serverIp
13:10:18.015 [main] DEBUG o.a.h.i.c.DefaultManagedHttpClientConnection - http-outgoing-0: set socket timeout to 120000
13:10:18.015 [main] DEBUG o.a.h.impl.execchain.MainClientExec - Executing request GET / HTTP/1.1
13:10:18.066 [main] DEBUG o.a.h.impl.execchain.MainClientExec - Connection can be kept alive indefinitely
13:10:18.196 [main] DEBUG com.intuit.karate - request:
2 > GET https://www.karatelabs.io/
13:10:18.196 [main] DEBUG o.a.h.i.c.PoolingHttpClientConnectionManager - Connection request: [route: {s}->https://www.karatelabs.io:443][total available: 0; route allocated: 0 of 5; total allocated: 0 of 10]
13:10:18.196 [main] DEBUG o.a.h.i.c.PoolingHttpClientConnectionManager - Connection leased: [id: 1][route: {s}->https://www.karatelabs.io:443][total available: 0; route allocated: 1 of 5; total allocated: 1 of 10]
13:10:18.196 [main] DEBUG o.a.h.impl.execchain.MainClientExec - Opening connection {s}->https://www.karatelabs.io:443
13:10:18.196 [main] DEBUG o.a.h.i.c.DefaultHttpClientConnectionOperator - Connecting to www.karatelabs.io/
13:10:18.196 [main] DEBUG o.a.h.c.s.SSLConnectionSocketFactory - Connecting socket to www.karatelabs.io/ with timeout 30000
13:10:18.206 [main] DEBUG o.a.h.c.s.SSLConnectionSocketFactory - Enabled protocols: [TLSv1.3, TLSv1.2]
13:10:18.206 [main] DEBUG o.a.h.c.s.SSLConnectionSocketFactory - Enabled cipher suites:[...]
13:10:18.206 [main] DEBUG o.a.h.c.s.SSLConnectionSocketFactory - Starting handshake
13:10:18.236 [main] DEBUG o.a.h.c.s.SSLConnectionSocketFactory - Secure session established
13:10:18.236 [main] DEBUG o.a.h.c.s.SSLConnectionSocketFactory - negotiated protocol: TLSv1.3
13:10:18.236 [main] DEBUG o.a.h.c.s.SSLConnectionSocketFactory - negotiated cipher suite: TLS_AES_256_GCM_SHA384
13:10:18.236 [main] DEBUG o.a.h.c.s.SSLConnectionSocketFactory - peer principal: CN=karatelabs.io
13:10:18.236 [main] DEBUG o.a.h.c.s.SSLConnectionSocketFactory - peer alternative names: [karatelabs.io, www.karatelabs.io]
13:10:18.236 [main] DEBUG o.a.h.c.s.SSLConnectionSocketFactory - issuer principal: CN=Sectigo RSA Domain Validation Secure Server CA, O=Sectigo Limited, L=Salford, ST=Greater Manchester, C=GB
13:10:18.236 [main] DEBUG o.a.h.i.c.DefaultHttpClientConnectionOperator - Connection established localIp<->serverIp
13:10:18.236 [main] DEBUG o.a.h.i.c.DefaultManagedHttpClientConnection - http-outgoing-1: set socket timeout to 120000
13:10:18.279 [main] DEBUG o.a.h.impl.execchain.MainClientExec - Connection can be kept alive indefinitely
13:10:18.609 [Finalizer] DEBUG o.a.h.i.c.PoolingHttpClientConnectionManager - Connection manager is shutting down
13:10:18.610 [Finalizer] DEBUG o.a.h.i.c.DefaultManagedHttpClientConnection - http-outgoing-1: Shutdown connection
13:10:18.611 [Finalizer] DEBUG o.a.h.i.c.PoolingHttpClientConnectionManager - Connection manager shut down
13:10:18.612 [Finalizer] DEBUG o.a.h.i.c.PoolingHttpClientConnectionManager - Connection manager is shutting down
13:10:18.612 [Finalizer] DEBUG o.a.h.i.c.DefaultManagedHttpClientConnection - http-outgoing-2: Shutdown connection
13:10:18.612 [Finalizer] DEBUG o.a.h.i.c.PoolingHttpClientConnectionManager - Connection manager shut down
13:10:18.612 [Finalizer] DEBUG o.a.h.i.c.PoolingHttpClientConnectionManager - Connection manager is shutting down
"Connecting to socket" and "handshake" indicate that karate is establishing a new connection instead of using an already opened one, even though I am sending a request to the same host.
On the other hand, in longer scenarios, I was seeing "http-outgoing-x: Shutdown connection" about 1s after the connection was opened, in the middle of the run, despite having "karate.configure('readTimeout', 120000)" specified.
I don't think that was intentional, especially after seeing the "Keep-Alive" header and "Connection can be kept alive indefinitely" in the log.
That being said, is there any way to force Karate to reuse the same connection instead of establishing a new one for each request?
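To put a number on the behaviour described above, a rough sketch that counts how often the HttpClient debug log reports a lease versus a freshly opened socket. It assumes the message wording shown in the excerpt ("Connection leased: [id: N]" and "Opening connection"); the function name is hypothetical and the sample lines are shortened copies of the log.

```python
import re

LEASED = re.compile(r"Connection leased: \[id: (\d+)\]")
OPENING = "Opening connection"


def summarize_connections(log_lines):
    """Count leases, distinct connection ids and newly opened sockets."""
    leased_ids = []
    opened = 0
    for line in log_lines:
        match = LEASED.search(line)
        if match:
            leased_ids.append(int(match.group(1)))
        if OPENING in line:
            opened += 1
    return {
        "leases": len(leased_ids),
        "distinct_ids": len(set(leased_ids)),
        "opened": opened,
    }


sample = [
    "DEBUG o.a.h.i.c.PoolingHttpClientConnectionManager - Connection leased: [id: 0][route: {s}->https://www.karatelabs.io:443]",
    "DEBUG o.a.h.impl.execchain.MainClientExec - Opening connection {s}->https://www.karatelabs.io:443",
    "DEBUG o.a.h.i.c.PoolingHttpClientConnectionManager - Connection leased: [id: 1][route: {s}->https://www.karatelabs.io:443]",
    "DEBUG o.a.h.impl.execchain.MainClientExec - Opening connection {s}->https://www.karatelabs.io:443",
]
summary = summarize_connections(sample)
print(summary)
```

If the pool were reusing connections, I would expect fewer "Opening connection" lines than leases and repeated ids; here every lease gets a new id and a new socket.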
As far as we know, we use the Apache HTTP Client API the right way.
But you never know. The best thing is for you to dive into the code and see what we could be missing. Or you could provide a way to replicate following these instructions: https://github.com/karatelabs/karate/wiki/How-to-Submit-an-Issue

Hyperledger Fabric - Peer unable to connect to (raft) Orderer with Mutual TLS

I am running HLF on Kubernetes (3 Raft orderers and 2 peers).
Since Raft requires mutual TLS, I had to set up some certificates.
The 3 Raft orderers are able to communicate with each other: they elect a leader, and re-elect another leader when I bring that leader down.
When I set up the peer, I used the same CA to generate the certificates. I am able to create the channel and join it from the peer. However, I have to set CORE_PEER_MSPCONFIGPATH=$ADMIN_MSP_PATH before running those commands, otherwise I get an Access Denied error.
I am also forced to append the following flags to every peer channel x command I run:
--tls --cafile $ORD_TLS_PATH/cacert.pem --certfile $CORE_PEER_TLS_CLIENTCERT_FILE --keyfile $CORE_PEER_TLS_CLIENTKEY_FILE --clientauth
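Since the same flags have to be appended every time, a small wrapper can build the argument list once. This is only a sketch: the helper names, the channel name "mychannel" and the example paths are hypothetical, while the flags and environment variable names are the ones from the question.

```python
def tls_flags(env):
    """Mutual-TLS flags from the question, filled in from an env mapping."""
    return [
        "--tls",
        "--cafile", f"{env['ORD_TLS_PATH']}/cacert.pem",
        "--certfile", env["CORE_PEER_TLS_CLIENTCERT_FILE"],
        "--keyfile", env["CORE_PEER_TLS_CLIENTKEY_FILE"],
        "--clientauth",
    ]


def peer_channel_cmd(subcommand, extra_args, env):
    """Build the argv list for `peer channel <subcommand>` with TLS flags."""
    return ["peer", "channel", subcommand, *extra_args, *tls_flags(env)]


# Hypothetical values; in practice pass os.environ instead of this dict.
example_env = {
    "ORD_TLS_PATH": "/var/hyperledger/tls/ord",
    "CORE_PEER_TLS_CLIENTCERT_FILE": "/var/hyperledger/tls/client/cert.pem",
    "CORE_PEER_TLS_CLIENTKEY_FILE": "/var/hyperledger/tls/client/key.pem",
}
cmd = peer_channel_cmd(
    "fetch",
    ["newest", "-c", "mychannel", "-o", "orderer-1.hlf-orderers.svc.cluster.local:7050"],
    example_env,
)
print(" ".join(cmd))
```

The resulting list can be handed to `subprocess.run(cmd, check=True)` inside the peer container.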
I am able to create, fetch and join the channel using the admin MSP.
However, once the channel is joined, the peer is unable to connect to the orderer; somehow a bad certificate is presented.
Orderer Logs
A bad certificate is used ?
2019-08-15 16:07:55.699 UTC [core.comm] ServerHandshake -> ERRO 221 TLS handshake failed with error remote error: tls: bad certificate server=Orderer remoteaddress=
2019-08-15 16:07:55.699 UTC [grpc] handleRawConn -> DEBU 222 grpc: Server.Serve failed to complete security handshake from "": remote error: tls: bad certificate
Peer Logs
These suggest that the peer could not validate the orderer's certificate against the ca.crt?
2019-08-15 16:10:17.990 UTC [grpc] DialContext -> DEBU 03a parsed scheme: ""
2019-08-15 16:10:17.990 UTC [grpc] DialContext -> DEBU 03b scheme "" not registered, fallback to default scheme
2019-08-15 16:10:17.991 UTC [grpc] watcher -> DEBU 03c ccResolverWrapper: sending new addresses to cc: [{orderer-2.hlf-orderers.svc.cluster.local:7050 0 <nil>}]
2019-08-15 16:10:17.991 UTC [grpc] switchBalancer -> DEBU 03d ClientConn switching balancer to "pick_first"
2019-08-15 16:10:17.991 UTC [grpc] HandleSubConnStateChange -> DEBU 03e pickfirstBalancer: HandleSubConnStateChange: 0xc00260b710, CONNECTING
2019-08-15 16:10:18.009 UTC [grpc] createTransport -> DEBU 03f grpc: addrConn.createTransport failed to connect to {orderer-2.hlf-orderers.svc.cluster.local:7050 0 <nil>}. Err :connection error: desc = "transport: authentication handshake failed: x509: certificate signed by unknown authority". Reconnecting...
2019-08-15 16:10:18.012 UTC [grpc] HandleSubConnStateChange -> DEBU 040 pickfirstBalancer: HandleSubConnStateChange: 0xc00260b710, TRANSIENT_FAILURE
2019-08-15 16:10:18.991 UTC [grpc] HandleSubConnStateChange -> DEBU 041 pickfirstBalancer: HandleSubConnStateChange: 0xc00260b710, CONNECTING
2019-08-15 16:10:19.003 UTC [grpc] createTransport -> DEBU 042 grpc: addrConn.createTransport failed to connect to {orderer-2.hlf-orderers.svc.cluster.local:7050 0 <nil>}. Err :connection error: desc = "transport: authentication handshake failed: x509: certificate signed by unknown authority". Reconnecting...
2019-08-15 16:10:19.003 UTC [grpc] HandleSubConnStateChange -> DEBU 043 pickfirstBalancer: HandleSubConnStateChange: 0xc00260b710, TRANSIENT_FAILURE
2019-08-15 16:10:20.719 UTC [grpc] HandleSubConnStateChange -> DEBU 044 pickfirstBalancer: HandleSubConnStateChange: 0xc00260b710, CONNECTING
2019-08-15 16:10:20.731 UTC [grpc] createTransport -> DEBU 045 grpc: addrConn.createTransport failed to connect to {orderer-2.hlf-orderers.svc.cluster.local:7050 0 <nil>}. Err :connection error: desc = "transport: authentication handshake failed: x509: certificate signed by unknown authority". Reconnecting...
2019-08-15 16:10:20.733 UTC [grpc] HandleSubConnStateChange -> DEBU 046 pickfirstBalancer: HandleSubConnStateChange: 0xc00260b710, TRANSIENT_FAILURE
2019-08-15 16:10:20.990 UTC [ConnProducer] NewConnection -> ERRO 047 Failed connecting to {orderer-2.hlf-orderers.svc.cluster.local:7050 [OrdererMSP]} , error: context deadline exceeded
I generated the used certificates as follows:
Orderer Admin
fabric-ca-client enroll -u https://u:p@ca.example.com -M ./OrdererMSP
Orderer Node X
As I use the same certificates for TLS, I added the following hosts to the certificate for TLS purposes:
orderer-x.hlf-orderers.svc.cluster.local #kubernetes
orderer-x.hlf-orderers #kubernetes
orderer-x #kubernetes
localhost #local debug
fabric-ca-client enroll -m orderer-x \
-u https://ox:px@ca.example.com \
--csr.hosts orderer-x.hlf-orderers.svc.cluster.local,orderer-x.hlf-orderers,orderer-x,localhost \
-M orderer-x-MSP
Peer Admin
fabric-ca-client enroll -u https://u:p@ca.example.com -M ./PeerMSP
Peer Node X
fabric-ca-client enroll -m peer-x \
-u https://ox:px@ca.example.com \
--csr.hosts peer-x.hlf-peers.svc.cluster.local,peer-x.hlf-peers,peer-x,localhost \
-M peer-x-MSP
All of these have the same ca.crt (/cacerts/ca.example.com.pem).
<<: *OrdererDefaults
OrdererType: etcdraft
- Host: orderer-1.hlf-orderers.svc.cluster.local
  Port: 7050
  ClientTLSCert: orderer-1-MSP/signcerts/cert.pem
  ServerTLSCert: orderer-1-MSP/signcerts/cert.pem
- Host: orderer-2.hlf-orderers.svc.cluster.local
  Port: 7050
  ClientTLSCert: orderer-2-MSP/signcerts/cert.pem
  ServerTLSCert: orderer-2-MSP/signcerts/cert.pem
- Host: orderer-3.hlf-orderers.svc.cluster.local
  Port: 7050
  ClientTLSCert: orderer-3-MSP/signcerts/cert.pem
  ServerTLSCert: orderer-3-MSP/signcerts/cert.pem
- orderer-1.hlf-orderers.svc.cluster.local:7050
- orderer-2.hlf-orderers.svc.cluster.local:7050
- orderer-3.hlf-orderers.svc.cluster.local:7050
I have checked multiple times that the correct certificates are mounted in the correct places and configured.
On the peer side I made sure that:
CORE_PEER_TLS_CLIENTROOTCAS_FILES is set correctly and that the (correct) file gets mounted (CORE_PEER_TLS_CLIENTROOTCAS_FILES: "/var/hyperledger/tls/client/cert/ca.crt")
On the orderer side I made sure that:
It seems strange to me that the orderers are able to talk to each other (as they are electing leaders), but that the peer is not able to talk to them.
So it appears that the tlscacerts must be in the MSP directories PRIOR to creating the genesis block and the channel block. Simply mounting them in the pod at runtime is not enough.
My msp directories (used in configtx.yaml) look like:
After this it all started to work
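A quick pre-flight check along the lines of that fix, run before generating the genesis and channel blocks. It is only a sketch: the helper name is hypothetical, it checks for the presence of files rather than validating them, and the demo builds throwaway directories instead of touching real MSP material.

```python
import tempfile
from pathlib import Path


def msp_dirs_missing_tlscacerts(msp_dirs):
    """Return the MSP directories that have no file under tlscacerts/."""
    missing = []
    for msp_dir in msp_dirs:
        tls_dir = Path(msp_dir) / "tlscacerts"
        has_cert = tls_dir.is_dir() and any(p.is_file() for p in tls_dir.iterdir())
        if not has_cert:
            missing.append(str(msp_dir))
    return missing


# Demo with temporary directories: OrdererMSP has a TLS CA cert, PeerMSP does not.
with tempfile.TemporaryDirectory() as root:
    orderer_msp = Path(root) / "OrdererMSP"
    peer_msp = Path(root) / "PeerMSP"
    (orderer_msp / "tlscacerts").mkdir(parents=True)
    (orderer_msp / "tlscacerts" / "ca.example.com.pem").write_text("placeholder")
    peer_msp.mkdir()
    missing = msp_dirs_missing_tlscacerts([orderer_msp, peer_msp])

print([Path(p).name for p in missing])
```

In a real setup the list passed in would be the MSP directories referenced from configtx.yaml.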
It seems you are getting the error below:
E0923 16:30:14.963567129 31166 ssl_transport_security.cc:989] Handshake failed with fatal error SSL_ERROR_SSL: error:14094412:SSL routines:ssl3_read_bytes:sslv3 alert bad certificate.
E0923 16:30:15.964456710 31166 ssl_transport_security.cc:188] ssl_info_callback: error occured.
According to your details, everything seems to be correct.
However, check the following:
"certificate signed by unknown authority" makes me doubt your certificate mapping.

Hyperledger Fabric - Error on Invoke / TLS handshake failed with error tls: first record does not look like a TLS handshake

Scope: This is a network with one channel composed of 3 Orgs, 1 anchor peer per organization, 1 CA per org and 1 MSP per org.
I'm facing an issue on my Hyperledger Fabric network related to the TLS handshake process that occurs when I make an invoke call to my chaincode (which is correctly installed and instantiated) through the CLI container.
[core.comm] ServerHandshake -> ERRO 01b TLS handshake failed with error tls: first record does not look like a TLS handshake {"server": "Orderer", "remote address": ""}
Error: error sending transaction for invoke: could not send: EOF - proposal response: version:1 response:<status:200 >
I couldn't find a solution to that so any kind of help would be great.
EDIT: I'm also having warnings like this one in the orderer container when I update the anchor peers:
2018-12-12 14:06:00.518 UTC [common.deliver] Handle -> WARN 014 Error reading from rpc error: code = Canceled desc = context canceled
2018-12-12 14:06:00.518 UTC [comm.grpc.server] 1 -> INFO 016 streaming call completed {"grpc.start_time": "2018-12-12T14:06:00.509Z", "grpc.service": "orderer.AtomicBroadcast", "grpc.method": "Deliver", "grpc.peer_address": "", "error": "rpc error: code = Canceled desc = context canceled", "grpc.code": "Canceled", "grpc.call_duration": "8.958614ms"}
2018-12-12 14:06:00.518 UTC [orderer.common.broadcast] Handle -> WARN 015 Error reading from rpc error: code = Canceled desc = context canceled
2018-12-12 14:06:00.518 UTC [comm.grpc.server] 1 -> INFO 017 streaming call completed {"grpc.start_time": "2018-12-12T14:06:00.511Z", "grpc.service": "orderer.AtomicBroadcast", "grpc.method": "Broadcast", "grpc.peer_address": "", "error": "rpc error: code = Canceled desc = context canceled", "grpc.code": "Canceled", "grpc.call_duration": "7.13278ms"}
2018-12-12 14:06:10.328 UTC [comm.grpc.server] 1 -> INFO 018 streaming call completed {"grpc.start_time": "2018-12-12T14:06:05.692Z", "grpc.service": "orderer.AtomicBroadcast", "grpc.method": "Deliver", "grpc.peer_address": "", "grpc.peer_subject": "CN=peer1.farmer.supply-chain-network.com,L=San Francisco,ST=California,C=US", "error": "context finished before block retrieved: context canceled", "grpc.code": "Unknown", "grpc.call_duration": "4.636199388s"}
Thanks in advance
It seems the orderer was expecting a TLS connection but the CLI did not connect with TLS.
Did you specify --tls --cafile <orderer-cert> during the invoke?
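To illustrate what "first record does not look like a TLS handshake" means: a TLS connection starts with a record whose first byte is 0x16 (handshake) followed by major version 0x03, whereas a plaintext gRPC client starts with the HTTP/2 preface. A minimal sketch of that check; the function name is hypothetical and the real server performs more validation than this.

```python
def looks_like_tls_handshake(first_bytes):
    """True if the bytes start like a TLS handshake record header."""
    return (
        len(first_bytes) >= 2
        and first_bytes[0] == 0x16  # record type: handshake
        and first_bytes[1] == 0x03  # major protocol version
    )


# First bytes a TLS client would send (record header of a ClientHello).
tls_client_hello = bytes([0x16, 0x03, 0x01, 0x02, 0x00])
# First bytes a plaintext HTTP/2 (gRPC without TLS) client sends.
plaintext_grpc = b"PRI * HTTP/2.0\r\n\r\nSM\r\n\r\n"

print(looks_like_tls_handshake(tls_client_hello))
print(looks_like_tls_handshake(plaintext_grpc))
```

A client invoked without --tls sends the second kind of bytes, which is consistent with the orderer rejecting the connection at the handshake.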

Spring Cloud Config (Server) Monitor does not send events to clients via RabbitMQ

I'm using Spring Boot 2.1.0.RELEASE and Spring Cloud Dependencies Greenwich.M3. Consider the following projects, with the following dependencies:
https://github.com/dnijssen/configurationclient, dependencies:
https://github.com/dnijssen/configurationserver, dependencies:
My configurationserver fetches the properties for the configurationclient from a Git repository (namely https://github.com/dnijssen/configuration). When a change is made, I trigger the /monitor endpoint on my configurationserver manually (just for testing). This is supposed to send an event via RabbitMQ, which I started with the following Docker command:
docker run -d --hostname my-rabbit --name some-rabbit -p 5672:5672 -p 15672:15672 rabbitmq:3-management
This gives me the following response: ["*"]
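For reference, a sketch of how such a manual trigger can be built. It assumes the config server listens on localhost:8888 (the port is inferred from the "nio-8888" thread name in the log) and that /monitor accepts a form-encoded `path` parameter; the helper name is hypothetical and the request is only constructed here, not sent.

```python
from urllib import parse, request


def build_monitor_request(base_url, path="*"):
    """Build a form-encoded POST to the config server's /monitor endpoint."""
    body = parse.urlencode({"path": path}).encode("ascii")
    return request.Request(
        f"{base_url.rstrip('/')}/monitor",
        data=body,
        method="POST",
        headers={"Content-Type": "application/x-www-form-urlencoded"},
    )


req = build_monitor_request("http://localhost:8888")
print(req.get_method(), req.full_url, req.data)

# To actually send it (requires the server to be running):
# with request.urlopen(req) as response:
#     print(response.read().decode("utf-8"))
```

A `path` of `*` asks for a refresh of all applications, which matches the "Refresh for: *" line in the server log.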
With the following console output on my configurationserver:
2018-11-22 09:19:48.527 INFO 19316 --- [nio-8888-exec-6] o.s.c.c.monitor.PropertyPathEndpoint : Refresh for: *
2018-11-22 09:19:48.543 INFO 19316 --- [nio-8888-exec-6] o.s.a.r.c.CachingConnectionFactory : Attempting to connect to: [localhost:5672]
2018-11-22 09:19:48.550 INFO 19316 --- [nio-8888-exec-6] o.s.a.r.c.CachingConnectionFactory : Created new connection: rabbitConnectionFactory.publisher#79a8318:0/SimpleConnection#34a9bacb [delegate=amqp://guest#, localPort= 55205]
2018-11-22 09:19:48.553 INFO 19316 --- [nio-8888-exec-6] o.s.amqp.rabbit.core.RabbitAdmin : Auto-declaring a non-durable, auto-delete, or exclusive Queue (springCloudBusInput.anonymous.cT0DOhixRX6H82A1zBDF9g) durable:false, auto-delete:true, exclusive:true. It will be redeclared if the broker stops and is restarted while the connection factory is alive, but all messages will be lost.
2018-11-22 09:19:48.880 INFO 19316 --- [nio-8888-exec-6] trationDelegate$BeanPostProcessorChecker : Bean 'configurationPropertiesRebinderAutoConfiguration' of type [org.springframework.cloud.autoconfigure.ConfigurationPropertiesRebinderAutoConfiguration$$EnhancerBySpringCGLIB$$135b8961] is not eligible for getting processed by all BeanPostProcessors (for example: not eligible for auto-proxying)
2018-11-22 09:19:49.162 INFO 19316 --- [nio-8888-exec-6] o.s.boot.SpringApplication : No active profile set, falling back to default profiles: default
2018-11-22 09:19:49.165 INFO 19316 --- [nio-8888-exec-6] o.s.boot.SpringApplication : Started application in 0.6 seconds (JVM running for 29.712)
2018-11-22 09:19:49.228 INFO 19316 --- [nio-8888-exec-6] o.s.cloud.bus.event.RefreshListener : Received remote refresh request. Keys refreshed []
However my configurationclient does not seem to receive any event, and is thus not refreshing itself.
What am I missing here?
Kind regards,
Dennis Nijssen

HiveServer2: Thrift SASL related exception when using custom PasswdAuthenticationProvider

I've created a custom implementation of the PasswdAuthenticationProvider interface, based on OAuth2. I think the code is irrelevant for the problem I'm experiencing, nevertheless, it can be found here.
I've configured hive-site.xml with the following properties:
Then I've restarted the Hive service and I've connected a JDBC based remote client with success. This is an example of a successful run found in /var/log/hive/hiveserver2.log:
2016-02-01 11:52:44,515 INFO [pool-5-thread-5]: authprovider.HttpClientFactory (HttpClientFactory.java:<init>(66)) - Setting max total connections (500)
2016-02-01 11:52:44,515 INFO [pool-5-thread-5]: authprovider.HttpClientFactory (HttpClientFactory.java:<init>(67)) - Setting default max connections per route (100)
2016-02-01 11:52:44,799 INFO [pool-5-thread-5]: authprovider.HttpClientFactory (OAuth2AuthenticationProviderImpl.java:Authenticate(65)) - Doing request: GET https://account.lab.fiware.org/user?access_token=xxxxxxxxxxxxxxxxxxxxxxxxxxxxxx HTTP/1.1
2016-02-01 11:52:44,800 INFO [pool-5-thread-5]: authprovider.HttpClientFactory (OAuth2AuthenticationProviderImpl.java:Authenticate(76)) - Response received: {"organizations": [], "displayName": "frb", "roles": [{"name": "provider", "id": "106"}], "app_id": "xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx", "email": "frb#tid.es", "id": "frb"}
2016-02-01 11:52:44,801 INFO [pool-5-thread-5]: authprovider.HttpClientFactory (OAuth2AuthenticationProviderImpl.java:Authenticate(104)) - User frb authenticated
2016-02-01 11:52:44,868 INFO [pool-5-thread-5]: thrift.ThriftCLIService (ThriftCLIService.java:OpenSession(188)) - Client protocol version: HIVE_CLI_SERVICE_PROTOCOL_V6
2016-02-01 11:52:44,871 INFO [pool-5-thread-5]: session.SessionState (SessionState.java:start(358)) - No Tez session required at this point. hive.execution.engine=mr.
2016-02-01 11:52:44,873 INFO [pool-5-thread-5]: session.SessionState (SessionState.java:start(358)) - No Tez session required at this point. hive.execution.engine=mr.
The problem is that, after this, the following error appears repeatedly:
2016-02-01 11:52:48,227 ERROR [pool-5-thread-4]: server.TThreadPoolServer (TThreadPoolServer.java:run(215)) - Error occurred during processing of message.
java.lang.RuntimeException: org.apache.thrift.transport.TTransportException
at org.apache.thrift.transport.TSaslServerTransport$Factory.getTransport(TSaslServerTransport.java:219)
at org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:189)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
Caused by: org.apache.thrift.transport.TTransportException
at org.apache.thrift.transport.TIOStreamTransport.read(TIOStreamTransport.java:132)
at org.apache.thrift.transport.TTransport.readAll(TTransport.java:84)
at org.apache.thrift.transport.TSaslTransport.receiveSaslMessage(TSaslTransport.java:182)
at org.apache.thrift.transport.TSaslServerTransport.handleSaslStartMessage(TSaslServerTransport.java:125)
at org.apache.thrift.transport.TSaslTransport.open(TSaslTransport.java:253)
at org.apache.thrift.transport.TSaslServerTransport.open(TSaslServerTransport.java:41)
at org.apache.thrift.transport.TSaslServerTransport$Factory.getTransport(TSaslServerTransport.java:216)
... 4 more
2016-02-01 11:53:18,323 ERROR [pool-5-thread-5]: server.TThreadPoolServer (TThreadPoolServer.java:run(215)) - Error occurred during processing of message.
java.lang.RuntimeException: org.apache.thrift.transport.TTransportException
at org.apache.thrift.transport.TSaslServerTransport$Factory.getTransport(TSaslServerTransport.java:219)
at org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:189)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
Caused by: org.apache.thrift.transport.TTransportException
at org.apache.thrift.transport.TIOStreamTransport.read(TIOStreamTransport.java:132)
at org.apache.thrift.transport.TTransport.readAll(TTransport.java:84)
at org.apache.thrift.transport.TSaslTransport.receiveSaslMessage(TSaslTransport.java:182)
at org.apache.thrift.transport.TSaslServerTransport.handleSaslStartMessage(TSaslServerTransport.java:125)
at org.apache.thrift.transport.TSaslTransport.open(TSaslTransport.java:253)
at org.apache.thrift.transport.TSaslServerTransport.open(TSaslServerTransport.java:41)
at org.apache.thrift.transport.TSaslServerTransport$Factory.getTransport(TSaslServerTransport.java:216)
... 4 more
Why? I've seen in several other questions this occurs when using the default value of hive.server2.authentication, i.e. SASL, and the client is not doing the handshake. But in my case, the value of such a property is CUSTOM. I cannot understand it, and any help would be really appreciated.
I've found there are periodical requests to the HiveServer2... from the HiveServer2 itself! These are the requests that are resulting in Thrift SASL errors:
$ sudo tcpdump -i lo port 10000
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on lo, link-type EN10MB (Ethernet), capture size 65535 bytes
10:18:48.183469 IP dev-fiwr-bignode-11.hi.inet.ndmp > dev-fiwr-bignode-11.hi.inet.55758: Flags [.], ack 7, win 512, options [nop,nop,TS val 1034162147 ecr 1034162107], length 0
21 packets captured
42 packets received by filter
0 packets dropped by kernel
[fiware-portal#dev-fiwr-bignode-11 ~]$ sudo netstat -nap | grep 55758
tcp 0 0 CLOSE_WAIT 7190/java
tcp 0 0 FIN_WAIT2 -
[fiware-portal#dev-fiwr-bignode-11 ~]$ ps -ef | grep 7190
hive 7190 1 1 10:10 ? 00:00:10 /usr/java/jdk1.7.0_71//bin/java -Xmx1024m -Djava.net.preferIPv4Stack=true -Dhadoop.log.dir=/var/log/hadoop/hive -Dhadoop.log.file=hadoop.log -Dhadoop.home.dir=/usr/lib/hadoop -Dhadoop.id.str=hive -Dhadoop.root.logger=INFO,console -Djava.library.path=:/usr/lib/hadoop/lib/native/Linux-amd64-64:/usr/lib/hadoop/lib/native -Dhadoop.policy.file=hadoop-policy.xml -Djava.net.preferIPv4Stack=true -Xmx1024m -Xmx4096m -Dhadoop.security.logger=INFO,NullAppender org.apache.hadoop.util.RunJar /usr/lib/hive/lib/hive-service- org.apache.hive.service.server.HiveServer2 -hiveconf hive.metastore.uris=" " -hiveconf hive.log.file=hiveserver2.log -hiveconf hive.log.dir=/var/log/hive
1011 14158 12305 0 10:19 pts/1 00:00:00 grep 7190
Any idea?
I did more research into the connections sent from HiveServer2 to HiveServer2. The data packets always carry the same 5 bytes (hexadecimal): 22 41 30 30 31
Any idea about these connections?
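Decoding those five bytes helps to see why the server chokes on them. The sketch below prints them as ASCII and then reads them the way a SASL negotiation header would be read, assuming Thrift's framing of one status byte followed by a 4-byte big-endian payload length, with valid status codes 1 to 5 (this is my understanding of TSaslTransport and worth verifying against your Thrift version).

```python
payload = bytes.fromhex("22 41 30 30 31")

# Plain ASCII view of the probe.
ascii_text = payload.decode("ascii")
print(repr(ascii_text))

# How a SASL negotiation header reader would interpret the same bytes
# (assumed layout: 1 status byte, then a 4-byte big-endian length).
status = payload[0]
claimed_length = int.from_bytes(payload[1:5], "big")
valid_statuses = {1, 2, 3, 4, 5}  # START, OK, BAD, ERROR, COMPLETE (assumed)

print(f"status byte: {status:#04x}, valid: {status in valid_statuses}")
print(f"claimed payload length: {claimed_length}")
```

The bytes are the ASCII text `"A001`, so whatever sends them is not speaking the SASL handshake at all, which fits a simple liveness probe rather than a real client.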
I finally "fixed" this. Since the message was sent by the Ambari agent running on the HiveServer2 machine (some kind of weird ping), I simply added an iptables rule blocking all connections to TCP port 10000 on the loopback interface:
iptables -A INPUT -i lo -p tcp --dport 10000 -j DROP
Of course, Ambari now warns that HiveServer2 is not alive (the pings are dropped). The above rule must also be removed whenever I want to restart the server from Ambari (there is another alive check in the start script); after the restart I can enable the rule again. Well, I can live with that.