I am trying to set up a testing environment using Selenium Grid. The hub is supposed to run on an AWS machine while the nodes run on AWS WorkSpaces. At this point, the nodes can register with the hub, but about one minute later the hub complains:
Marking the node http://192.168.x.x:1444 as down: cannot reach the node for 2 tries
I have done some research and the problem seems to be that the hub sends heartbeat signals that do not reach the nodes, and it then drops the session. Since a firewall might be blocking these heartbeat signals, I need to know which ports they are sent on so I can configure the firewall accordingly.
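For reference, this is roughly how the hub and the nodes are started (hostnames are placeholders). If I understand the Grid setup correctly, the node registers with the hub on the hub port (4444 by default), and the hub then reaches back to the node on the node's own port (1444 in my case) to check its status, so I suspect those are the two ports the firewall has to allow:
java -jar selenium-server-standalone.jar -role hub -port 4444
java -jar selenium-server-standalone.jar -role node -hub http://hub-host:4444/grid/register -host 192.168.x.x -port 1444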
Thank you
I want to understand how a NODE_LEFT event is triggered for an Apache Ignite grid.
Do nodes keep pinging each other constantly to find out whether other nodes are present, or do they ping each other only when required?
If a ping from a client node fails, can that also trigger a NODE_LEFT event, or can it only be triggered by a server node?
Once a node has left, which node triggers the topology update event, i.e. PME? Can it be triggered by a client node, or only by server nodes?
Yes, nodes ping each other to verify the connection. Here is a more detailed explanation of how a node failure happens. You might also check this video.
The final decision to fail a node (i.e. have it leave the cluster) is made on the coordinator node, which issues a special event that has to be acknowledged by the other nodes (NODE_FAILED).
A node might also leave the cluster explicitly by sending a TcpDiscoveryNodeLeftMessage (i.e. triggering a NODE_LEFT event), for example when you stop it gracefully.
Only the coordinator node can change the topology version, meaning that a PME always starts on the coordinator and is spread to the other nodes afterward.
Using the azure-iot-sdk for Python, I have a program that opens a connection to the IoT Hub and continually listens for direct methods over the MQTT protocol. This works as expected. I have a second Python program, invoked hourly from cron, that connects to the IoT Hub and updates the device twin for my device, again over MQTT. Everything is working fine.
However, I've read in the documentation that a device can only have one MQTT connection at a time and that opening a second one causes the first to drop. I'm not seeing this behaviour, but is what I'm doing unsupported?
Should I have a single program doing both tasks and sharing a single connection?
Yes, that is correct: you can't have more than one connection to the IoT Hub with the same device ID. Over time you will see inconsistent behaviour, and that scenario is unsupported. You should use a single program, with a single device ID, doing both tasks.
Depending on the scenario, you may also want to consider using an iothubowner connection string for service-side operations: managing your IoT hub and, optionally, sending messages, scheduling jobs, invoking direct methods, or sending desired property updates to your IoT devices or modules.
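If it helps, here is a minimal sketch of that single-program approach, with one device client handling both jobs over a single MQTT connection (this uses the azure-iot-device package; the connection string, property names and timings are placeholders):

import time
from azure.iot.device import IoTHubDeviceClient, MethodResponse

# one device client = one MQTT connection for the whole process
client = IoTHubDeviceClient.create_from_connection_string(
    "HostName=<your-hub>;DeviceId=<your-device>;SharedAccessKey=<key>")
client.connect()

def handle_method(request):
    # answer direct methods as they arrive
    response = MethodResponse.create_from_method_request(request, 200, {"ok": True})
    client.send_method_response(response)

# requires a recent SDK version that supports handler properties
client.on_method_request_received = handle_method

while True:
    # the hourly twin update now reuses the same connection instead of opening a second one
    client.patch_twin_reported_properties({"lastReport": time.time()})
    time.sleep(3600)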
I have one Grid Hub server and three Selenium nodes.
I would like to execute multiple test suites against the one Grid Hub server.
Each test suite should be executed across all three nodes, and the remaining (pending) test suites should wait until the current test suite finishes its execution.
Can the Grid Hub manage a queue of the test suites?
If not, is there a workaround or another solution?
TL;DR: Yes, the Grid can manage this.
Long answer
The Selenium Hub doesn't care whether the requests come from three different test suites or from one. Think of it this way: the hub processes every request that comes to it. When a request arrives, the hub checks whether there is a node with the capabilities to execute it.
If there is a node that is available and free, the hub sends the command to that node.
If there is a node that can execute the request but is currently busy, the hub puts the request in its queue.
If no node has the requested capability, the request is marked as failed.
The hub doesn't check the source of the request anywhere in this flow.
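To make this concrete, here is a minimal sketch of the client side (the hub URL is a placeholder and this uses the older desired-capabilities style of the Python bindings). Five session requests are fired at once against three Firefox nodes that each run one session at a time: the hub hands sessions to the free nodes and simply queues the rest until a node frees up.

from concurrent.futures import ThreadPoolExecutor
from selenium import webdriver

HUB_URL = "http://hub-host:4444/wd/hub"  # placeholder hub address

def run_one_test(test_id):
    # each call asks the hub for a session; the hub picks a free, capable node or queues the request
    driver = webdriver.Remote(command_executor=HUB_URL,
                              desired_capabilities={"browserName": "firefox"})
    try:
        driver.get("https://example.com")
        print(test_id, driver.title)
    finally:
        driver.quit()

# five requests: as many run as there are free node slots, the rest wait in the hub's queue
with ThreadPoolExecutor(max_workers=5) as pool:
    list(pool.map(run_one_test, range(5)))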
How do I detect a network timeout of Grid requests after starting a Sauce test using the RemoteWebDriver client object? The scenario I want my framework to catch is when connectivity out to SauceLabs works but connectivity back fails. In other words, this is a network scenario where my Selenium test sends a browser .get() that opens a new browser in SauceLabs with a new URL, but then, because of a network issue, the subsequent JSON packets fail and the test appears to hang. I know what the problem is; I just want my unit test framework to report the network issue. Right now the RemoteWebDriver will block indefinitely when this condition occurs, and that is not acceptable.
I know that solving this requires an understanding of how the client-side timeout works when a RemoteWebDriver client first tries to send JSON commands to the Grid Hub.
I know I can specify timeouts when starting the Grid, but similar options do not appear to exist on the client side.
Hub start:
java -jar /tools/grid/selenium-server-standalone-2.35.0.jar -role hub -maxSession 20 -browserTimeout 240 -remoteControlPollingIntervalInSeconds 180 -sessionMaxIdleTimeInSeconds 240 -newSessionMaxWaitTimeInSeconds 250 -timeout 30
Setting an idle timeout on the Sauce end might help with this: http://saucelabs.com/docs/additional-config#idle-timeout
This desired capability (90 seconds by default) times the job out if no commands are received from your Selenium script.
While it can't detect network issues directly, this could prevent your minutes from being eaten by blocked responses.
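For example, a minimal sketch of setting it from a Python test (the capability name, the 120-second value, and the endpoint/credentials are placeholders to check against the Sauce docs; this is the older desired-capabilities style):

from selenium import webdriver

caps = {
    "browserName": "firefox",
    "idle-timeout": 120,  # fail the job if no commands arrive for 120 seconds
}

driver = webdriver.Remote(
    command_executor="https://USERNAME:ACCESS_KEY@ondemand.saucelabs.com/wd/hub",
    desired_capabilities=caps,
)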
You might also benefit from Sauce Connect (https://saucelabs.com/docs/connect), a free, standalone Java utility that enables you to test against firewalled resources. Connect also checks for dropped packets, and will automatically resend them (up to a certain point) in an attempt to re-establish the connection.
I've been trying to set up Enterprise Jenkins with the High Availability setup. The current setup consists of two Jenkins masters sharing the same Jenkins home, say master1 and master2, and an installation of the jenkins-ha-monitor-1.1-1.1 rpm on both of these masters, say monitor1 and monitor2. With this setup, according to the documentation at least, the HA plugin should work as expected. The promotion and demotion scripts are similar to the ones in the documentation (only the IP and interface differ, same approach), i.e.
For demotion:
ifconfig eth0:2 down
For promotion:
ifconfig eth0:2 the.floating.ip
Now, for the nodes to get registered correctly, I have to start master1, master2, monitor1 and monitor2 in that order. Tailing the logs for both monitors, I see that when the services are started in that order they are registered correctly by both monitor services as nodes in a cluster, and they show up in the HA status GUI in the Jenkins console.
Now, when master1 is killed by sending it a KILL signal, monitor2 recognizes this and runs the promotion script. But monitor1 keeps throwing:
Oct 24, 2012 3:47:36 PM com.cloudbees.jenkins.ha.singleton.HASingleton$3 suspect
INFO: Suspecting a node failure in a cluster: jenkins-master-1-285
Oct 24, 2012 3:47:39 PM com.cloudbees.jenkins.ha.singleton.HASingleton$3 suspect
INFO: Suspecting a node failure in a cluster: jenkins-master-1-285
continuously, without ever running the demotion script. Since master2 has taken up the floating IP via its promotion script, and master1 still holds that IP because the demotion script is never run, the setup ends up with two boxes claiming the same IP. Moreover, restarting master1 does not help: master1 does not get added to the cluster as a secondary node, monitor1 keeps spitting the above messages to the log, the floating IP keeps returning "Unable to connect", and master2 and monitor2 show the cluster as master2, monitor2 and monitor1. So my question/problem is twofold: why isn't master1 accepted back into the cluster, and why isn't the demotion script run as it should be?
Also, FYI, I have tried running
service jenkins stop
and in that case the demotion script runs, but there are again similar issues when
service jenkins start
is run on the master that was stopped earlier, since the promotion script is run regardless of whether a primary Jenkins already exists. In this case the two monitors register different clusters, like so: monitor1: master1, monitor1 and monitor2: master2, monitor2.
Running ifconfig shows that both masters have taken up the floating IP at this point.
Any help is appreciated! Thanks!
Still under investigation with support. The originally reported problem (here) suggests that the two nodes are communicating fine but that promotions/demotions are not run correctly, due either to a bug in JGroups or to a bug in its usage in Jenkins high availability.
But further tests turned up problems with UDP multicast communication, which have been reported for RedHat/CentOS hosts. Work is underway to offer an alternate JGroups stack that does not rely on multicast (or UDP) at all, and instead uses the shared $JENKINS_HOME directory to register Jenkins and monitor instances (as TCP address:port records).