ZooKeeper node fails to communicate with leader node

I have a ZooKeeper cluster with 3 nodes; the configuration is below. When I restart a node, the start script reports success, but the status check then reports a failure.
zoo.cfg
dataDir=/ngs/app/<app>/zookeeper-3.4.6/zookeeperdata/1
clientPort=2181
initLimit=5
syncLimit=2
server.1=pr2-ligerp-lapp27.<domain.com>:2888:3888
server.2=pr2-ligerp-lapp28.<domain.com>:2889:3889
server.3=pr2-ligerp-lapp29.<domain.com>:2890:3890
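Each node's dataDir also contains a myid file that matches its server.N entry; for reference, this is how it can be checked on node 1 (the path is abridged the same way as in the config above):
cat /ngs/app/<app>/zookeeper-3.4.6/zookeeperdata/1/myid   # expected output on this node: 1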
Please find the logs below:
sh zkServer.sh start
JMX enabled by default
Using config: /ngs/app/ligerp/solr/zookeeper-3.4.6/bin/../conf/zoo.cfg
Starting zookeeper ... STARTED
-bash-4.1$
-bash-4.1$ cat zookeeper.out
2017-04-18 18:58:13,840 [myid:] - INFO [main:QuorumPeerConfig#103] - Reading configuration from: /ngs/app/ligerp/solr/zookeeper-3.4.6/bin/../conf/zoo.cfg
2017-04-18 18:58:13,843 [myid:] - INFO [main:QuorumPeerConfig#340] - Defaulting to majority quorums
2017-04-18 18:58:13,845 [myid:1] - INFO [main:DatadirCleanupManager#78] - autopurge.snapRetainCount set to 3
2017-04-18 18:58:13,845 [myid:1] - INFO [main:DatadirCleanupManager#79] - autopurge.purgeInterval set to 0
2017-04-18 18:58:13,846 [myid:1] - INFO [main:DatadirCleanupManager#101] - Purge task is not scheduled.
2017-04-18 18:58:13,854 [myid:1] - INFO [main:QuorumPeerMain#127] - Starting quorum peer
2017-04-18 18:58:13,861 [myid:1] - INFO [main:NIOServerCnxnFactory#94] - binding to port 0.0.0.0/0.0.0.0:2181
2017-04-18 18:58:13,875 [myid:1] - INFO [main:QuorumPeer#959] - tickTime set to 3000
2017-04-18 18:58:13,875 [myid:1] - INFO [main:QuorumPeer#979] - minSessionTimeout set to -1
2017-04-18 18:58:13,875 [myid:1] - INFO [main:QuorumPeer#990] - maxSessionTimeout set to -1
2017-04-18 18:58:13,875 [myid:1] - INFO [main:QuorumPeer#1005] - initLimit set to 5
2017-04-18 18:58:13,884 [myid:1] - INFO [main:FileSnap#83] - Reading snapshot /ngs/app/ligerp/solr/zookeeper-3.4.6/zookeeperdata/1/version-2/snapshot.1300000032
2017-04-18 18:58:13,954 [myid:1] - INFO [Thread-1:QuorumCnxManager$Listener#504] - My election bind port: pr2-ligerp-lapp27.<domain>/10.136.145.38:3888
2017-04-18 18:58:13,960 [myid:1] - INFO [QuorumPeer[myid=1]/0:0:0:0:0:0:0:0:2181:QuorumPeer#714] - LOOKING
2017-04-18 18:58:13,961 [myid:1] - INFO [QuorumPeer[myid=1]/0:0:0:0:0:0:0:0:2181:FastLeaderElection#815] - New election. My id = 1, proposed zxid=0x130000024b
2017-04-18 18:58:13,962 [myid:1] - INFO [WorkerReceiver[myid=1]:FastLeaderElection#597] - Notification: 1 (message format version), 1 (n.leader), 0x130000024b (n.zxid), 0x1 (n.round), LOOKING (n.state), 1 (n.sid), 0x13 (n.peerEpoch) LOOKING (my state)
2017-04-18 18:58:13,964 [myid:1] - INFO [WorkerSender[myid=1]:QuorumCnxManager#193] - Have smaller server identifier, so dropping the connection: (2, 1)
2017-04-18 18:58:13,964 [myid:1] - INFO [WorkerSender[myid=1]:QuorumCnxManager#193] - Have smaller server identifier, so dropping the connection: (3, 1)
2017-04-18 18:58:14,165 [myid:1] - INFO [QuorumPeer[myid=1]/0:0:0:0:0:0:0:0:2181:QuorumCnxManager#193] - Have smaller server identifier, so dropping the connection: (2, 1)
2017-04-18 18:58:14,166 [myid:1] - INFO [QuorumPeer[myid=1]/0:0:0:0:0:0:0:0:2181:QuorumCnxManager#193] - Have smaller server identifier, so dropping the connection: (3, 1)
2017-04-18 18:58:14,166 [myid:1] - INFO [QuorumPeer[myid=1]/0:0:0:0:0:0:0:0:2181:FastLeaderElection#849] - Notification time out: 400
2017-04-18 18:58:15,566 [myid:1] - INFO [QuorumPeer[myid=1]/0:0:0:0:0:0:0:0:2181:QuorumCnxManager#193] - Have smaller server identifier, so dropping the connection: (2, 1)
2017-04-18 18:58:15,567 [myid:1] - INFO [QuorumPeer[myid=1]/0:0:0:0:0:0:0:0:2181:QuorumCnxManager#193] - Have smaller server identifier, so dropping the connection: (3, 1)
2017-04-18 18:58:15,567 [myid:1] - INFO [QuorumPeer[myid=1]/0:0:0:0:0:0:0:0:2181:FastLeaderElection#849] - Notification time out: 800
2017-04-18 18:58:16,368 [myid:1] - INFO [QuorumPeer[myid=1]/0:0:0:0:0:0:0:0:2181:QuorumCnxManager#193] - Have smaller server identifier, so dropping the connection: (2, 1)
2017-04-18 18:58:16,368 [myid:1] - INFO [QuorumPeer[myid=1]/0:0:0:0:0:0:0:0:2181:QuorumCnxManager#193] - Have smaller server identifier, so dropping the connection: (3, 1)
2017-04-18 18:58:16,368 [myid:1] - INFO [QuorumPeer[myid=1]/0:0:0:0:0:0:0:0:2181:FastLeaderElection#849] - Notification time out: 1600
2017-04-18 18:58:17,969 [myid:1] - INFO [QuorumPeer[myid=1]/0:0:0:0:0:0:0:0:2181:QuorumCnxManager#193] - Have smaller server identifier, so dropping the connection: (2, 1)
2017-04-18 18:58:17,969 [myid:1] - INFO [QuorumPeer[myid=1]/0:0:0:0:0:0:0:0:2181:QuorumCnxManager#193] - Have smaller server identifier, so dropping the connection: (3, 1)
2017-04-18 18:58:17,970 [myid:1] - INFO [QuorumPeer[myid=1]/0:0:0:0:0:0:0:0:2181:FastLeaderElection#849] - Notification time out: 3200
But after checking the status, we found that it is not running, even though we can still see the process ID. Could someone help fix this issue?
sh zkServer.sh status
JMX enabled by default
Using config: /ngs/app/ligerp/solr/zookeeper-3.4.6/bin/../conf/zoo.cfg
Error contacting service. It is probably not running.
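To see which role each node believes it has (or whether it is serving at all), ZooKeeper's four-letter-word commands can be sent to each node's client port; a sketch, assuming nc is available and that the client port is adjusted per node (the leader log below shows it listening on 2183, so the ports evidently differ across nodes):
for h in pr2-ligerp-lapp27 pr2-ligerp-lapp28 pr2-ligerp-lapp29; do
  echo "--- $h ---"
  echo stat | nc "$h.<domain.com>" 2181 | grep Mode   # prints "Mode: leader" or "Mode: follower"; nothing if the node is not serving
done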
This is the exception on the ZooKeeper leader node:
2017-04-18 18:25:32,634 [myid:3] - INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2183:NIOServerCnxnFactory#197] - Accepted socket connection from /127.0.0.1:47916
2017-04-18 18:25:32,635 [myid:3] - INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2183:NIOServerCnxn#827] - Processing srvr command from /127.0.0.1:47916
2017-04-18 18:25:32,635 [myid:3] - INFO [Thread-22:NIOServerCnxn#1007] - Closed socket connection for client /127.0.0.1:47916 (no session established for client)
2017-04-18 18:30:01,662 [myid:3] - WARN [RecvWorker:1:QuorumCnxManager$RecvWorker#780] - Connection broken for id 1, my id = 3, error =
java.io.EOFException
at java.io.DataInputStream.readInt(DataInputStream.java:392)
at org.apache.zookeeper.server.quorum.QuorumCnxManager$RecvWorker.run(QuorumCnxManager.java:765)
2017-04-18 18:30:01,663 [myid:3] - WARN [RecvWorker:1:QuorumCnxManager$RecvWorker#783] - Interrupting SendWorker
2017-04-18 18:30:01,662 [myid:3] - ERROR [LearnerHandler-/10.136.145.38:47656:LearnerHandler#633] - Unexpected exception causing shutdown while sock still open
java.io.EOFException
at java.io.DataInputStream.readInt(DataInputStream.java:392)
at org.apache.jute.BinaryInputArchive.readInt(BinaryInputArchive.java:63)
at org.apache.zookeeper.server.quorum.QuorumPacket.deserialize(QuorumPacket.java:83)
at org.apache.jute.BinaryInputArchive.readRecord(BinaryInputArchive.java:103)
at org.apache.zookeeper.server.quorum.LearnerHandler.run(LearnerHandler.java:546)
2017-04-18 18:30:01,663 [myid:3] - WARN [SendWorker:1:QuorumCnxManager$SendWorker#697] - Interrupted while waiting for message on queue
java.lang.InterruptedException
at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.reportInterruptAfterWait(AbstractQueuedSynchronizer.java:2014)
at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitNanos(AbstractQueuedSynchronizer.java:2088)
at java.util.concurrent.ArrayBlockingQueue.poll(ArrayBlockingQueue.java:418)
at org.apache.zookeeper.server.quorum.QuorumCnxManager.pollSendQueue(QuorumCnxManager.java:849)
at org.apache.zookeeper.server.quorum.QuorumCnxManager.access$500(QuorumCnxManager.java:64)
at org.apache.zookeeper.server.quorum.QuorumCnxManager$SendWorker.run(QuorumCnxManager.java:685)
2017-04-18 18:30:01,663 [myid:3] - WARN [LearnerHandler-/10.136.145.38:47656:LearnerHandler#646] - ******* GOODBYE /10.136.145.38:47656 ********
2017-04-18 18:30:01,663 [myid:3] - WARN [SendWorker:1:QuorumCnxManager$SendWorker#706] - Send worker leaving thread
2017-04-18 18:39:40,076 [myid:3] - INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2183:NIOServerCnxnFactory#197] - Accepted socket connection from /10.136.145.38:58748
2017-04-18 18:39:40,077 [myid:3] - WARN [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2183:NIOServerCnxn#357] - caught end of stream exception
EndOfStreamException: Unable to read additional data from client sessionid 0x0, likely client has closed socket
at org.apache.zookeeper.server.NIOServerCnxn.doIO(NIOServerCnxn.java:228)
at org.apache.zookeeper.server.NIOServerCnxnFactory.run(NIOServerCnxnFactory.java:208)
at java.lang.Thread.run(Thread.java:745)
2017-04-18 18:39:40,078 [myid:3] - INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2183:NIOServerCnxn#1007] - Closed socket connection for client /10.136.145.38:58748 (no session established for client)
2017-04-18 18:42:46,516 [myid:3] - INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2183:NIOServerCnxnFactory#197] - Accepted socket connection from /127.0.0.1:47988
2017-04-18 18:42:46,516 [myid:3] - INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2183:NIOServerCnxn#827] - Processing srvr command from /127.0.0.1:47988
2017-04-18 18:42:46,517 [myid:3] - INFO [Thread-23:NIOServerCnxn#1007] - Closed socket connection for client /127.0.0.1:47988 (no session established for client)

Fixed the issue by making the changes below in zoo.cfg on all of the ZooKeeper nodes:
- changed the ZooKeeper hostnames to IP addresses
- started the ZooKeeper instances in descending order of server id (see the note after the sample config below)
- increased initLimit from 5 to 100
A sample zoo.cfg looks like this:
dataDir=/ngs/app/ligerp/solr/zookeeper-3.4.6/zookeeperdata/1
clientPort=2181
initLimit=100
syncLimit=2
server.1=10.136.145.38:2888:3888
server.2=10.136.145.39:2889:3889
server.3=10.136.145.40:2890:3890
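The descending start order lines up with the repeated "Have smaller server identifier, so dropping the connection" messages in the question's log: during leader election, when two peers race to connect to each other, only the connection initiated by the peer with the larger id is kept, so node 1 has to wait for nodes 2 and 3 to connect back to it. Before starting the ensemble it is also worth confirming that the quorum and election ports are reachable between the nodes; a sketch using nc from node 1, with the IPs and ports taken from the sample config above:
nc -vz 10.136.145.39 2889 && nc -vz 10.136.145.39 3889   # server.2 quorum and election ports
nc -vz 10.136.145.40 2890 && nc -vz 10.136.145.40 3890   # server.3 quorum and election ports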

In my case, it was simply a matter of insufficient CPU allocation to the process. Try allocating additional CPU and see if that fixes the problem.
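To check whether the JVM is actually CPU-starved, something like this can be run on the affected node (a sketch; in 3.4.x, zkServer.sh writes zookeeper_server.pid into the dataDir, and the path is abridged as in the question):
nproc                                                        # cores available on the host
top -b -n 1 -p "$(cat /ngs/app/<app>/zookeeper-3.4.6/zookeeperdata/1/zookeeper_server.pid)"   # CPU usage of the ZooKeeper JVM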

Related

Karate scripts with API request to Splunk server work fine on Mac but fail on Windows & pipeline [Alpine Linux] [duplicate]


Connections handling

So I've been using Karate for a while now, and there is an issue we have been facing for over a year: org.apache.http.conn.ConnectTimeoutException.
Other threads mentioning that connect-timeout exception said it was solvable by specifying a proxy, but that did not help us.
After a lot of investigation, it turned out that our Azure SNAT ports were exhausted, meaning Karate was opening far too many connections.
To verify this I enabled debug logging and used this feature:
Background:
* url "https://www.karatelabs.io/"
Scenario:
* method GET
* method GET
The logs then had the following lines:
13:10:17.868 [main] DEBUG com.intuit.karate - request:
1 > GET https://www.karatelabs.io/
1 > Host: www.karatelabs.io
1 > Connection: Keep-Alive
1 > User-Agent: Apache-HttpClient/4.5.13 (Java/17.0.4.1)
1 > Accept-Encoding: gzip,deflate
13:10:17.868 [main] DEBUG o.a.h.i.c.PoolingHttpClientConnectionManager - Connection request: [route: {s}->https://www.karatelabs.io:443][total available: 0; route allocated: 0 of 5; total allocated: 0 of 10]
13:10:17.874 [main] DEBUG o.a.h.i.c.PoolingHttpClientConnectionManager - Connection leased: [id: 0][route: {s}->https://www.karatelabs.io:443][total available: 0; route allocated: 1 of 5; total allocated: 1 of 10]
13:10:17.875 [main] DEBUG o.a.h.impl.execchain.MainClientExec - Opening connection {s}->https://www.karatelabs.io:443
13:10:17.883 [main] DEBUG o.a.h.i.c.DefaultHttpClientConnectionOperator - Connecting to www.karatelabs.io/34.149.87.45:443
13:10:17.883 [main] DEBUG o.a.h.c.s.SSLConnectionSocketFactory - Connecting socket to www.karatelabs.io/34.149.87.45:443 with timeout 30000
13:10:17.924 [main] DEBUG o.a.h.c.s.SSLConnectionSocketFactory - Enabled protocols: [TLSv1.3, TLSv1.2]
13:10:17.924 [main] DEBUG o.a.h.c.s.SSLConnectionSocketFactory - Enabled cipher suites:[...]
13:10:17.924 [main] DEBUG o.a.h.c.s.SSLConnectionSocketFactory - Starting handshake
13:10:18.012 [main] DEBUG o.a.h.c.s.SSLConnectionSocketFactory - Secure session established
13:10:18.012 [main] DEBUG o.a.h.c.s.SSLConnectionSocketFactory - negotiated protocol: TLSv1.3
13:10:18.012 [main] DEBUG o.a.h.c.s.SSLConnectionSocketFactory - negotiated cipher suite: TLS_AES_256_GCM_SHA384
13:10:18.012 [main] DEBUG o.a.h.c.s.SSLConnectionSocketFactory - peer principal: CN=karatelabs.io
13:10:18.012 [main] DEBUG o.a.h.c.s.SSLConnectionSocketFactory - peer alternative names: [karatelabs.io, www.karatelabs.io]
13:10:18.012 [main] DEBUG o.a.h.c.s.SSLConnectionSocketFactory - issuer principal: CN=Sectigo RSA Domain Validation Secure Server CA, O=Sectigo Limited, L=Salford, ST=Greater Manchester, C=GB
13:10:18.014 [main] DEBUG o.a.h.i.c.DefaultHttpClientConnectionOperator - Connection established localIp<->serverIp
13:10:18.015 [main] DEBUG o.a.h.i.c.DefaultManagedHttpClientConnection - http-outgoing-0: set socket timeout to 120000
13:10:18.015 [main] DEBUG o.a.h.impl.execchain.MainClientExec - Executing request GET / HTTP/1.1
...
13:10:18.066 [main] DEBUG o.a.h.impl.execchain.MainClientExec - Connection can be kept alive indefinitely
...
...
13:10:18.196 [main] DEBUG com.intuit.karate - request:
2 > GET https://www.karatelabs.io/
13:10:18.196 [main] DEBUG o.a.h.i.c.PoolingHttpClientConnectionManager - Connection request: [route: {s}->https://www.karatelabs.io:443][total available: 0; route allocated: 0 of 5; total allocated: 0 of 10]
13:10:18.196 [main] DEBUG o.a.h.i.c.PoolingHttpClientConnectionManager - Connection leased: [id: 1][route: {s}->https://www.karatelabs.io:443][total available: 0; route allocated: 1 of 5; total allocated: 1 of 10]
13:10:18.196 [main] DEBUG o.a.h.impl.execchain.MainClientExec - Opening connection {s}->https://www.karatelabs.io:443
13:10:18.196 [main] DEBUG o.a.h.i.c.DefaultHttpClientConnectionOperator - Connecting to www.karatelabs.io/34.149.87.45:443
13:10:18.196 [main] DEBUG o.a.h.c.s.SSLConnectionSocketFactory - Connecting socket to www.karatelabs.io/34.149.87.45:443 with timeout 30000
13:10:18.206 [main] DEBUG o.a.h.c.s.SSLConnectionSocketFactory - Enabled protocols: [TLSv1.3, TLSv1.2]
13:10:18.206 [main] DEBUG o.a.h.c.s.SSLConnectionSocketFactory - Enabled cipher suites:[...]
13:10:18.206 [main] DEBUG o.a.h.c.s.SSLConnectionSocketFactory - Starting handshake
13:10:18.236 [main] DEBUG o.a.h.c.s.SSLConnectionSocketFactory - Secure session established
13:10:18.236 [main] DEBUG o.a.h.c.s.SSLConnectionSocketFactory - negotiated protocol: TLSv1.3
13:10:18.236 [main] DEBUG o.a.h.c.s.SSLConnectionSocketFactory - negotiated cipher suite: TLS_AES_256_GCM_SHA384
13:10:18.236 [main] DEBUG o.a.h.c.s.SSLConnectionSocketFactory - peer principal: CN=karatelabs.io
13:10:18.236 [main] DEBUG o.a.h.c.s.SSLConnectionSocketFactory - peer alternative names: [karatelabs.io, www.karatelabs.io]
13:10:18.236 [main] DEBUG o.a.h.c.s.SSLConnectionSocketFactory - issuer principal: CN=Sectigo RSA Domain Validation Secure Server CA, O=Sectigo Limited, L=Salford, ST=Greater Manchester, C=GB
13:10:18.236 [main] DEBUG o.a.h.i.c.DefaultHttpClientConnectionOperator - Connection established localIp<->serverIp
13:10:18.236 [main] DEBUG o.a.h.i.c.DefaultManagedHttpClientConnection - http-outgoing-1: set socket timeout to 120000
...
13:10:18.279 [main] DEBUG o.a.h.impl.execchain.MainClientExec - Connection can be kept alive indefinitely
...
...
13:10:18.609 [Finalizer] DEBUG o.a.h.i.c.PoolingHttpClientConnectionManager - Connection manager is shutting down
13:10:18.610 [Finalizer] DEBUG o.a.h.i.c.DefaultManagedHttpClientConnection - http-outgoing-1: Shutdown connection
13:10:18.611 [Finalizer] DEBUG o.a.h.i.c.PoolingHttpClientConnectionManager - Connection manager shut down
13:10:18.612 [Finalizer] DEBUG o.a.h.i.c.PoolingHttpClientConnectionManager - Connection manager is shutting down
13:10:18.612 [Finalizer] DEBUG o.a.h.i.c.DefaultManagedHttpClientConnection - http-outgoing-2: Shutdown connection
13:10:18.612 [Finalizer] DEBUG o.a.h.i.c.PoolingHttpClientConnectionManager - Connection manager shut down
13:10:18.612 [Finalizer] DEBUG o.a.h.i.c.PoolingHttpClientConnectionManager - Connection manager is shutting down
"Connecting to socket" and "handshake" indicate that karate is establishing a new connection instead of using an already opened one, even though I am sending a request to the same host.
On the other hand, on longer scenarios, I was seeing "http-outgoing-x: Shutdown connection" after about ~1s from opening it, in the middle of the run, despite having "karate.configure('readTimeout', 120000)" specified.
I don't think that was intentional, especially after seeing the "keep-alive" header and the "Connection can be kept alive indefinitely" in the log"
That being said, is there any way to force karate to use the same connection instead of establishing a new one each request?
As far as we know, we use the Apache HTTP Client API the right way.
But you never know. The best thing is for you to dive into the code and see what we could be missing. Or you could provide a way to replicate it, following these instructions: https://github.com/karatelabs/karate/wiki/How-to-Submit-an-Issue
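If you want to cross-check the connection churn outside of the Karate logs, counting established sockets to the target during a run is a quick way to do it; a sketch, where the port filter and refresh interval are assumptions:
watch -n 1 "ss -tan state established '( dport = :443 )' | wc -l"   # the count should stay flat if connections are being reused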

Karate DSL: Getting connection timeout error

I am working on automating an API hosted on a server in China, and when I send a request it throws a timeout exception (org.apache.http.conn.HttpHostConnectException).
My feature file:
Background:
* url 'http://myurl'
* configure connectTimeout = 500000
Scenario: Get Client details
Given path '/clients'
And header Authorization = 'sdssSSLwWDSD'
When method get
Then match response.client_id == 'TestId'
Error details:
11:22:30.347 [main] DEBUG org.apache.http.impl.conn.PoolingHttpClientConnectionManager - Connection request: [route: {s}->https://myurl.com][total kept alive: 0; route allocated: 0 of 5; total allocated: 0 of 10]
11:22:30.365 [main] DEBUG org.apache.http.impl.conn.PoolingHttpClientConnectionManager - Connection leased: [id: 0][route: {s}->https://myurl.com][total kept alive: 0; route allocated: 1 of 5; total allocated: 1 of 10]
11:22:30.365 [main] DEBUG org.apache.http.impl.execchain.MainClientExec - Opening connection {s}->https://myurl.com
11:22:30.384 [main] DEBUG org.apache.http.impl.conn.DefaultHttpClientConnectionOperator - Connecting to myurl.com/54.223.191.33:443
11:22:30.384 [main] DEBUG org.apache.http.conn.ssl.SSLConnectionSocketFactory - Connecting socket to myurl.com/54.223.191.33:443 with timeout 500000
11:22:51.407 [main] DEBUG org.apache.http.impl.conn.DefaultHttpClientConnectionOperator - Connect to myurl.com/54.223.191.33:443 timed out. Connection will be retried using another IP address
11:22:51.407 [main] DEBUG org.apache.http.impl.conn.DefaultHttpClientConnectionOperator - Connecting to myurl.com/52.80.167.86:443
11:22:51.408 [main] DEBUG org.apache.http.conn.ssl.SSLConnectionSocketFactory - Connecting socket to myurl.com/52.80.167.86:443 with timeout 500000
11:23:12.438 [main] DEBUG org.apache.http.impl.conn.DefaultManagedHttpClientConnection - http-outgoing-0: Shutdown connection
11:23:12.439 [main] DEBUG org.apache.http.impl.execchain.MainClientExec - Connection discarded
11:23:12.440 [main] DEBUG org.apache.http.impl.conn.PoolingHttpClientConnectionManager - Connection released: [id: 0][route: {s}->https://myurl.com:443][total kept alive: 0; route allocated: 0 of 5; total allocated: 0 of 10]
11:23:12.440 [main] ERROR com.intuit.karate - org.apache.http.conn.HttpHostConnectException: Connect to myurl.com [myurl.com/54.223.191.33, myurl.com/52.80.167.86] failed: Connection timed out: connect, http call failed after 42094 milliseconds for URL: https://myurl.com/tnc/v1/tnc/all
11:23:12.441 [main] ERROR com.intuit.karate - http request failed:
org.apache.http.conn.HttpHostConnectException: Connect to myurl.com:443 [myurl.com/54.223.191.33, myurl.com/52.80.167.86] failed: Connection timed out: connect
11:23:12.601 [Finalizer] DEBUG org.apache.http.impl.conn.PoolingHttpClientConnectionManager - Connection manager is shutting down
11:23:12.601 [Finalizer] DEBUG org.apache.http.impl.conn.PoolingHttpClientConnectionManager - Connection manager shut down
HTML report: (paste into browser to view) | Karate version: 0.9.4
What I have tried so far:
I tried configuring the timeout to the maximum, as below:
* configure connectTimeout = 500000
It didn't work.
However, the same request works fine when tried from Postman, and I get a response within 2000 ms.
Not sure where I am going wrong.
Most likely you have a corporate HTTP proxy.
Refer: https://github.com/intuit/karate#configure
* configure proxy = 'http://my.proxy.host:8080'
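A quick way to confirm that the proxy is the difference is to request the same URL with and without it from the machine running the tests; a sketch using curl, where the proxy host and port are placeholders:
curl -v -m 10 https://myurl.com/tnc/v1/tnc/all                                     # direct: expect the same connect timeout
curl -v -m 10 --proxy http://my.proxy.host:8080 https://myurl.com/tnc/v1/tnc/all   # via the proxy: expect a response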

HDP stuck while adding a host

I have an HDP cluster with 2 nodes. We had a problem and one host's heartbeat was lost with no chance of recovery, as the machine had failed, so we finally re-installed Ubuntu and configured it again.
It was impossible to recover the host in Ambari (trying to give it the same FQDN, IP, configuration, ...), so I tried changing the hostname and adding it as a completely new host.
I was able to complete installation step 2 with 'SUCCESS' status, but then it gets stuck with the message 'Please wait while the hosts are being checked for potential problems...' for hours.
I have attached the ambari-server log, the ambari-agent log, the ambari registration log and an error image.
Do you have any idea what is happening and how to solve it?
Thanks.
ambari-server.log
12 jun 2018 09:34:55,667 WARN [ambari-action-scheduler] ExecutionCommandWrapper:185 - Unable to lookup the cluster by ID; assuming that there is no cluster and therefore no configs for this execution command: Cluster not found, clusterName=clusterID=-1
12 jun 2018 09:34:56,675 WARN [ambari-action-scheduler] ExecutionCommandWrapper:185 - Unable to lookup the cluster by ID; assuming that there is no cluster and therefore no configs for this execution command: Cluster not found, clusterName=clusterID=-1
12 jun 2018 09:34:57,683 WARN [ambari-action-scheduler] ExecutionCommandWrapper:185 - Unable to lookup the cluster by ID; assuming that there is no cluster and therefore no configs for this execution command: Cluster not found, clusterName=clusterID=-1
ambari-agent.log
INFO 2018-06-12 09:00:16,026 Controller.py:512 - Registration response from bigdata was OK
INFO 2018-06-12 09:00:16,026 Controller.py:517 - Resetting ActionQueue...
INFO 2018-06-12 09:00:26,035 Controller.py:304 - Heartbeat (response id = 0) with server is running...
INFO 2018-06-12 09:00:26,036 Controller.py:311 - Building heartbeat message
INFO 2018-06-12 09:00:26,037 Heartbeat.py:90 - Adding host info/state to heartbeat message.
INFO 2018-06-12 09:00:26,099 logger.py:75 - Testing the JVM's JCE policy to see it if supports an unlimited key length.
INFO 2018-06-12 09:00:26,168 Hardware.py:176 - Some mount points were ignored: /dev, /run, /, /dev/shm, /run/lock, /sys/fs/cgroup, /boot, /run/user/1000, /run/user/0, /run/user/994
INFO 2018-06-12 09:00:26,169 Controller.py:320 - Sending Heartbeat (id = 0)
INFO 2018-06-12 09:00:26,174 Controller.py:332 - Heartbeat response received (id = 1)
INFO 2018-06-12 09:00:26,174 Controller.py:341 - Heartbeat interval is 10 seconds
INFO 2018-06-12 09:00:26,174 Controller.py:377 - Updating configurations from heartbeat
INFO 2018-06-12 09:00:26,174 Controller.py:386 - Adding cancel/execution commands
INFO 2018-06-12 09:00:26,174 Controller.py:403 - Adding recovery commands
INFO 2018-06-12 09:00:26,174 Controller.py:471 - Waiting 9.9 for next heartbeat
INFO 2018-06-12 09:00:36,075 Controller.py:478 - Wait for next heartbeat over
registration log
INFO 2018-06-12 09:34:38,350 Controller.py:512 - Registration response from bigdata was OK
INFO 2018-06-12 09:34:38,350 Controller.py:517 - Resetting ActionQueue...
', None)
Connection to master.es closed.
SSH command execution finished
host=master.es, exitcode=0
Command end time 2018-06-12 09:34:38
Registering with the server...
Registering with the server...
Do an ambari-agent reset on all nodes and change the cluster name.
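On each node the reset can look roughly like this (a sketch; ambari-agent reset takes the Ambari server hostname, shown here as a placeholder):
sudo ambari-agent stop
sudo ambari-agent reset <ambari-server-hostname>
sudo ambari-agent start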

How can I trace the failure of TSaslTransport (Hive related)

I've been debugging a JDBC connection error in Hive, similar to what was asked here:
Hive JDBC getConnection does not return.
By turning on log4j properly, I finally got down to seeing the output below before getConnection() hangs. What is Thrift waiting for? If this is related to using the wrong Thrift APIs, how can I determine versioning differences between client and server?
I have tried copying all the libraries from my Hive server onto my client app to test whether it is some kind of minor Thrift class versioning error, but that didn't solve the problem; my JDBC connection still hangs.
0 [main] DEBUG org.apache.thrift.transport.TSaslTransport - opening transport org.apache.thrift.transport.TSaslClientTransport#219ba640
0 [main] DEBUG org.apache.thrift.transport.TSaslTransport - opening transport org.apache.thrift.transport.TSaslClientTransport#219ba640
3 [main] DEBUG org.apache.thrift.transport.TSaslClientTransport - Sending mechanism name PLAIN and initial response of length 14
3 [main] DEBUG org.apache.thrift.transport.TSaslClientTransport - Sending mechanism name PLAIN and initial response of length 14
5 [main] DEBUG org.apache.thrift.transport.TSaslTransport - CLIENT: Writing message with status START and payload length 5
5 [main] DEBUG org.apache.thrift.transport.TSaslTransport - CLIENT: Writing message with status START and payload length 5
5 [main] DEBUG org.apache.thrift.transport.TSaslTransport - CLIENT: Writing message with status COMPLETE and payload length 14
5 [main] DEBUG org.apache.thrift.transport.TSaslTransport - CLIENT: Writing message with status COMPLETE and payload length 14
5 [main] DEBUG org.apache.thrift.transport.TSaslTransport - CLIENT: Start message handled
5 [main] DEBUG org.apache.thrift.transport.TSaslTransport - CLIENT: Start message handled
5 [main] DEBUG org.apache.thrift.transport.TSaslTransport - CLIENT: Main negotiation loop complete
5 [main] DEBUG org.apache.thrift.transport.TSaslTransport - CLIENT: Main negotiation loop complete
6 [main] DEBUG org.apache.thrift.transport.TSaslTransport - CLIENT: SASL Client receiving last message
6 [main] DEBUG org.apache.thrift.transport.TSaslTransport - CLIENT: SASL Client receiving last message
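One way to answer the versioning question is to compare the Thrift and Hive jars on both sides; a sketch, where the library path is an assumption for a typical server install:
ls /usr/lib/hive/lib | grep -Ei 'libthrift|hive-jdbc|hive-service'                  # run on both client and server and diff the output
unzip -p /usr/lib/hive/lib/libthrift-*.jar META-INF/MANIFEST.MF | grep -i version   # jar manifest version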