I configured keepalived to monitor services on my other machine and to automatically fail over to a standby if problems occur.
Here is my config file:
vrrp_script chk_haproxy {
    script "/usr/bin/pidof -s haproxy"
    interval 2
}

vrrp_instance VI_1 {
    interface eth1
    state MASTER
    priority 200
    virtual_router_id 33
    unicast_src_ip 10.122.0.25.6
    unicast_peer {
        10.122.28.6
    }
    authentication {
        auth_type PASS
        auth_pass password
    }
    track_script {
        chk_haproxy
    }
    notify_master /etc/keepalived/script.sh
}
But when I start the keepalived service with sudo systemctl start keepalived.service, I get this error:
May 15 19:36:39 ubuntu-ams0101-ss-df killall5[5973]: only one argument, a signal number, allowed
How do I resolve this?
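The message itself comes from a process named killall5, not from keepalived. One way to narrow it down is to run the check script by hand and watch keepalived's own log while restarting the service; the commands below are a sketch assuming a systemd-based Ubuntu box, as in the log line:

# Run the vrrp_script command by hand and print its exit status
/usr/bin/pidof -s haproxy; echo "pidof exit code: $?"

# In a second terminal, follow keepalived's messages while restarting the service
sudo journalctl -u keepalived.service -f
sudo systemctl restart keepalived.service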
Hello, I am trying to deploy RKE Kubernetes with Terraform, but I am not able to connect to the desired host via SSH:
time="2022-02-28T11:17:38+01:00" level=warning msg="Failed to set up SSH tunneling for host [poc-k8s.my-domain.com]: Can't retrieve Docker Info: error during connect: Get \"http://%2Fvar%2Frun%2Fdocker.sock/v1.24/info\": Unable to access node with address [poc-k8s.my-domain.com:22] using SSH. Please check if you are able to SSH to the node using the specified SSH Private Key and if you have configured the correct SSH username. Error: ssh: handshake failed: ssh: unable to authenticate, attempted methods [none publickey], no supported methods remain"
and this is the .tf file I am using:
terraform {
  required_providers {
    rke = {
      source  = "rancher/rke"
      version = "1.3.0"
    }
  }
}

provider "rke" {
  log_file = "rke_debug.log"
}

resource "rke_cluster" "cluster" {
  nodes {
    address = "poc-k8s.my-domain.com"
    user    = "root"
    role    = ["controlplane", "worker", "etcd"]
    ssh_key = file("~/.ssh/root_key")
  }
  nodes {
    address = "poc-k8s.my-domain.com"
    user    = "root"
    role    = ["worker", "etcd"]
    ssh_key = file("~/.ssh/root_key")
  }
  addons_include = [
    "https://raw.githubusercontent.com/kubernetes/dashboard/v1.10.1/src/deploy/recommended/kubernetes-dashboard.yaml",
    "https://gist.githubusercontent.com/superseb/499f2caa2637c404af41cfb7e5f4a938/raw/930841ac00653fdff8beca61dab9a20bb8983782/k8s-dashboard-user.yml",
  ]
}

resource "local_file" "kube_cluster_yaml" {
  filename          = "~/.kube/kube_config_cluster.yml"
  sensitive_content = "rke_cluster.cluster.kube_config_yaml"
}
The key is of course correct and I am able to connect to the desired host:
ssh -i ~/.ssh/root_key root@poc-k8s.my-domain.com
What am I missing here?
[Update]
The cluster resource has a delay_on_creation property that can be used:
resource "rke_cluster" "cluster" {
delay_on_creation = 180
(...)
}
I'm facing a similar issue. On the second run of terraform apply it works correctly. In my case the issue is that Docker is not up fast enough for the RKE provider.
I've found the following workaround from citynetwork/citycloud-examples:
resource "rke_cluster" "cluster" {
(...)
depends_on = [null_resource.wait-for-docker]
}
resource "null_resource" "wait-for-docker" {
provisioner "local-exec" {
command = "sleep 180"
}
depends_on = [
# list of servers docker being installed on
(...)
]
}
It waits for 180s which is not ideal, though.
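A rough shell-level alternative to the fixed sleep, assuming the host and key from the question, is to poll the node until the Docker daemon answers and only then run terraform apply:

# Confirm key-based SSH works and Docker responds before Terraform tries to
ssh -i ~/.ssh/root_key root@poc-k8s.my-domain.com 'docker info --format "{{.ServerVersion}}"'

# Poll until the Docker daemon is up instead of sleeping for a fixed 180s
until ssh -i ~/.ssh/root_key root@poc-k8s.my-domain.com 'docker info > /dev/null 2>&1'; do
  echo "waiting for docker..."
  sleep 5
done
terraform apply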
I am having problems exposing my RSK node to an external IP.
My startup command looks as follows:
java \
-cp $HOME/Downloads/rskj-core-3.0.1-IRIS-all.jar \
-Drsk.conf.file=/root/bitcoind-lnd/rsk/rsk.conf \
-Drpc.providers.web.cors=* \
-Drpc.providers.web.ws.enabled=true \
co.rsk.Start \
--regtest
This is my rsk.conf:
rpc {
    providers {
        web {
            cors: "*",
            http {
                enabled = true
                bind_address = "0.0.0.0"
                hosts = ["localhost", "0.0.0.0"]
                port: 4444
            }
        }
    }
}
The API is accessible from localhost, but from an external network I get error 400. How do I expose it to the external network?
You should add your external IP to hosts. Adding just 0.0.0.0 is not enough to indicate that all IPs are valid. Port forwarding also needs to be enabled for the port number that you have configured in rsk.conf, which in this case is the default value of 4444.
rpc {
    providers {
        web {
            cors: "*",
            http {
                enabled = true
                bind_address = "0.0.0.0"
                hosts = ["localhost", "0.0.0.0", "216.58.208.100"]
                port: 4444
            }
        }
    }
}
where 216.58.208.100 is your external IP
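To verify the change, a minimal JSON-RPC call from the external network might look like this (216.58.208.100 and port 4444 are taken from the example above; eth_blockNumber is one of the standard methods the node exposes):

# Query the node from outside; a JSON result instead of a 400 means the host entry works
curl -s -X POST http://216.58.208.100:4444 \
  -H "Content-Type: application/json" \
  --data '{"jsonrpc":"2.0","method":"eth_blockNumber","params":[],"id":1}'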
We are running haproxy on two non-production servers balanced by keepalived to manage failover.
We recently upgraded from haproxy 1.5 to 2.0.3. In our non-production environment, we never had an HA solution, so we decided to run keepalived to detect haproxy failure/stoppage and apply the VIPs to the backup server.
When we applied these updates, everything worked pretty well...until we noticed something with the addition of new sites into the LB. When keepalived is restarted (not reloaded), the new sites behind the LB seem to work well for an indeterminate amount of time...then they start to return "err_empty_response". Nothing seems to fix this until keepalived is restarted; then they work again for an indeterminate amount of time and then start returning "err_empty_response" again.
The site is still marked as up on the stats page.
The painful part is that the calls stop making it into the haproxy.log file, which leads me to think that the problem is not (just) haproxy.
What we have tried:
Splitting up each environment into its own virtual interface in keepalived.conf
Updating the binding of the api on the backend server to a working api (to eliminate api code as being an option)
Creating a new binding with a shortened url
Decreasing timeouts (client, server)
keepalived.conf:
! Configuration File for keepalived
global_defs {
notification_email {
test@blah.com
}
notification_email_from keepalived@blah.com
smtp_server blah.mail.protection.outlook.com.
smtp_connect_timeout 30
router_id LVS_NONPROD
}
# Script used to check if HAProxy is running
vrrp_script check_haproxy {
script "pidof haproxy"
interval 2
weight 2
}
vrrp_instance VI_DEV {
state MASTER
interface ens160
virtual_router_id 52
priority 101
advert_int 1
authentication {
auth_type PASS
auth_pass 1111
}
virtual_ipaddress {
xxx.xxx.xxx.xxx
xxx.xxx.xxx.xxx
xxx.xxx.xxx.xxx
xxx.xxx.xxx.xxx
}
track_script {
check_haproxy
}
}
vrrp_instance VI_TEST {
state MASTER
interface ens160
virtual_router_id 53
priority 101
advert_int 1
authentication {
auth_type PASS
auth_pass 1111
}
virtual_ipaddress {
xxx.xxx.xxx.xxx
xxx.xxx.xxx.xxx
xxx.xxx.xxx.xxx
}
track_script {
check_haproxy
}
}
vrrp_instance VI_UAT {
state MASTER
interface ens160
virtual_router_id 54
priority 101
advert_int 1
authentication {
auth_type PASS
auth_pass 1111
}
virtual_ipaddress {
xxx.xxx.xxx.xxx
xxx.xxx.xxx.xxx
xxx.xxx.xxx.xxx
}
track_script {
check_haproxy
}
}
vrrp_instance VI_STAGING {
state MASTER
interface ens160
virtual_router_id 55
priority 101
advert_int 1
authentication {
auth_type PASS
auth_pass 1111
}
virtual_ipaddress {
xxx.xxx.xxx.xxx
xxx.xxx.xxx.xxx
xxx.xxx.xxx.xxx
xxx.xxx.xxx.xxx
}
track_script {
check_haproxy
}
}
vrrp_instance VI_SS {
state MASTER
interface ens160
virtual_router_id 56
priority 101
advert_int 1
authentication {
auth_type PASS
auth_pass 1111
}
virtual_ipaddress {
xxx.xxx.xxx.xxx
xxx.xxx.xxx.xxx
xxx.xxx.xxx.xxx
}
track_script {
check_haproxy
}
}
vrrp_instance VI_NS {
state MASTER
interface ens160
virtual_router_id 57
priority 101
advert_int 1
authentication {
auth_type PASS
auth_pass 1111
}
virtual_ipaddress {
xxx.xxx.xxx.xxx
}
track_script {
check_haproxy
}
}
haproxy globals:
global
# to have these messages end up in /var/log/haproxy.log you will
# need to:
#
# 1) configure syslog to accept network log events. This is done
# by adding the '-r' option to the SYSLOGD_OPTIONS in
# /etc/sysconfig/syslog
#
# 2) configure local2 events to go to the /var/log/haproxy.log
# file. A line like the following can be added to
# /etc/sysconfig/syslog
#
# local2.* /var/log/haproxy.log
#
log 127.0.0.1 local2 debug
tune.chksize 32768 #don't get me started...dev requirement because of antiquated requirement not coded away
tune.bufsize 32768 #refer to previous statement
tune.ssl.default-dh-param 2048
max-spread-checks 20000
tune.maxpollevents 10000
chroot /var/lib/haproxy
pidfile /var/run/haproxy.pid
maxconn 40000
user haproxy
group haproxy
daemon
# turn on stats unix socket
stats socket /var/lib/haproxy/stats
defaults:
defaults
mode http
log global
option httplog
option log-health-checks
option dontlognull
option http-server-close
option redispatch
retries 3
timeout http-request 10s
timeout queue 60000
timeout connect 10s
timeout client 60000
timeout server 60000
timeout http-keep-alive 30s
timeout check 30s
maxconn 30000
errorfile 503 /etc/haproxy/errorfiles/503.http
The answer was a bit silly. Internal DNS to the load balancer was incorrect, so remoting to it was impossible until I tried to SSH into the machine during a period when the website was throwing these errors. It turns out that the old load balancer had the IP addresses as part of the network scripts (i.e. /etc/sysconfig/network-scripts/ifcfg-eth0:0-20).
So the new instances would work when I restarted keepalived because it would take over the IP addresses, and then the old instance would take them back (subsequently causing a failure, because the old instance didn't have the entry in it).
I stopped haproxy on the old instance, removed the /etc/sysconfig/network-scripts/ifcfg-eth0:* files from the old server, restarted keepalived on the new cluster and everything is working as it should.
Feeling a little stupid right now.
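For anyone hitting the same thing, the cleanup described above amounts to roughly the following (the interface name and paths are the ones from this setup; adjust to yours):

# On the old load balancer: stop haproxy and remove the static VIP alias definitions
sudo systemctl stop haproxy
sudo rm /etc/sysconfig/network-scripts/ifcfg-eth0:*

# On the new cluster: restart keepalived so it claims the VIPs again
sudo systemctl restart keepalived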
I created a simple .NET Core console application with Docker support. The following MassTransit code fails to connect to the RabbitMQ instance on the host machine, but a similar implementation using RabbitMQ.Client is able to connect to the host machine's RabbitMQ instance.
Masstransit throws
MassTransit.RabbitMqTransport.RabbitMqConnectionException: Connect
failed: ctas@192.168.0.9:5672/ --->
RabbitMQ.Client.Exceptions.BrokerUnreachableException:
Host machine IP: 192.168.0.9
Using MassTransit:
string rabbitMqUri = "rabbitmq://192.168.0.9/";
string userName = "ctas";
string password = "ctas#123";
string assetServiceQueue = "hello";
var bus = Bus.Factory.CreateUsingRabbitMq(cfg =>
{
    var host = cfg.Host(new Uri(rabbitMqUri), hst =>
    {
        hst.Username(userName);
        hst.Password(password);
    });

    cfg.ReceiveEndpoint(host, assetServiceQueue, e =>
    {
        e.Consumer<AddNewAssetReceivedConsumer>();
    });
});

bus.Start();
Console.WriteLine("Service Running.... Press enter to exit");
Console.ReadLine();
bus.Stop();
Using RabbitMQ.Client:
public static void Main()
{
    var factory = new ConnectionFactory();
    factory.UserName = "ctas";
    factory.Password = "ctas#123";
    factory.VirtualHost = "watcherindustry";
    factory.HostName = "192.168.0.9";

    using (var connection = factory.CreateConnection())
    using (var channel = connection.CreateModel())
    {
        channel.QueueDeclare(queue: "hello",
                             durable: false,
                             exclusive: false,
                             autoDelete: false,
                             arguments: null);

        var consumer = new EventingBasicConsumer(channel);
        consumer.Received += (model, ea) =>
        {
            var body = ea.Body;
            var message = Encoding.UTF8.GetString(body);
            Console.WriteLine(" [x] Received {0}", message);
        };

        channel.BasicConsume(queue: "hello",
                             autoAck: true,
                             consumer: consumer);

        Console.WriteLine(" Press [enter] to exit.");
        Console.ReadLine();
    }
}
Dockerfile:
FROM microsoft/dotnet:1.1-runtime
ARG source
WORKDIR /app
COPY ${source:-obj/Docker/publish} .
ENTRYPOINT ["dotnet", "TestClient.dll"]
I created an example and was able to connect to my host, using the preview package from MassTransit.
Start RabbitMQ in Docker and expose the ports on the host:
docker run -d -p 5672:5672 -p 15672:15672 --hostname my-rabbit --name some-rabbit rabbitmq:3-management
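To confirm the ports really are published on the host, an optional check:

# Show the host ports mapped to the some-rabbit container
docker port some-rabbit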
Build and run the console app:
docker build -t dotnetapp .
docker run -d -e RABBITMQ_URI=rabbitmq://guest:guest@172.17.0.2:5672 --name some-dotnetapp dotnetapp
To verify you are receiving messages, run:
docker logs some-dotnetapp --follow
You should see the following output:
Application is starting...
Connecting to rabbitmq://guest:guest#172.17.0.2:5672
Received: Hello, World [08/12/2017 04:35:53]
Received: Hello, World [08/12/2017 04:35:58]
Received: Hello, World [08/12/2017 04:36:03]
Received: Hello, World [08/12/2017 04:36:08]
Received: Hello, World [08/12/2017 04:36:13]
...
Notes:
172.17.0.2 was the my-rabbit container's IP address, but you can replace it with your machine's IP address (a lookup command is sketched after these notes).
http://localhost:15672 is the RabbitMQ management console; log in with guest as the username and password.
Lastly, portainer.io is a very useful application for visually viewing your local Docker environment.
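If you need to look up that container address, one way (assuming the container is named some-rabbit as above) is:

# Print the IP address of the some-rabbit container on its network(s)
docker inspect -f '{{range .NetworkSettings.Networks}}{{.IPAddress}}{{end}}' some-rabbit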
Thanks for the response. I managed to resolve this issue. My findings are as follows.
To connect to a RabbitMQ instance in another Docker container, both containers have to be connected to the same network. To do this:
Create a network:
docker network create -d bridge my_bridge
Connect both the app and RabbitMQ containers to the same network:
docker network connect my_bridge <container name>
For the MassTransit URI, use the RabbitMQ container's IP on that network, or the container name.
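To double-check that both containers ended up on the network, and to see the addresses you can point the MassTransit URI at, something like this works:

# List the containers attached to my_bridge together with their IPs
docker network inspect my_bridge --format '{{range .Containers}}{{.Name}} {{.IPv4Address}}{{"\n"}}{{end}}'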
To connect to the RabbitMQ instance of the host machine from an app in a Docker container, the MassTransit URI should include the machine name (I tried the IP, but that did not work).
Try using the virtual host in the MassTransit configuration too; not sure why you decided to omit it.
var host = cfg.Host("192.168.0.9", "watcherindustry", hst =>
{
hst.Username(userName);
hst.Password(password);
});
Look at Alexey Zimarev's comment on your question: if your RabbitMQ runs in a container, then it should be in your docker-compose file, and you should use that entry in your endpoint definition to connect to RabbitMQ, because Docker creates an internal network that your source code stays agnostic of...
rabbitmq:
  container_name: "rabbitmq-yournode01"
  hostname: rabbit
  image: rabbitmq:3.6.6-management
  environment:
    - RABBITMQ_DEFAULT_USER=yourusergoeshere
    - RABBITMQ_DEFAULT_PASS=yourpasswordgoeshere
    - RABBITMQ_DEFAULT_VHOST=vhost
  volumes:
    - rabbit-volume:/var/lib/rabbitmq
  ports:
    - "5672:5672"
    - "15672:15672"
In your app settings you should have something like:
"ConnectionString": "host=rabbitmq:5672;virtualHost=vhost;username=yourusergoeshere;password=yourpasswordgoeshere;timeout=0;prefetchcount=1",
And if you were to use EasyNetQ you could do:
_bus = RabbitHutch.CreateBus(_connectionString); // The one above
I hope it helps,
Juan
I terminated the redis server using SHUTDOWN from redis-cli. Now the prompt shows 'not connected>'.
The only way I found to restart the server was to exit the redis-cli prompt and then do a restart of the redis service.
My question is, is there any way to restart the server from the redis-cli prompt using any redis commands WITHOUT EXITING the redis-cli prompt?
While you don't have to exit the cli, the server cannot be restarted from it once it is shut down.
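In practice that means restarting the server from outside the cli (service names and paths vary by distro), after which the still-open redis-cli prompt reconnects on the next command:

# From another terminal: bring the server back up (service name varies by distro)
sudo systemctl restart redis-server

# Or start it directly from a config file
redis-server /etc/redis/redis.conf

# Back in the existing redis-cli session, any command (e.g. PING) triggers a reconnect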
I agree with Itamar Haber's answer, and I will add some details.
After the server restarts, if you type any command at this 'not connected>' prompt, redis-cli will attempt to connect again when sending the command fails:
while (1) {
    config.cluster_reissue_command = 0;
    if (cliSendCommand(argc, argv, repeat) != REDIS_OK) {
        cliConnect(1);  /* try to reconnect to the redis server if sending the command failed */
        if (cliSendCommand(argc, argv, repeat) != REDIS_OK) {  /* after reconnecting, send the command again */
            cliPrintContextError();
            return REDIS_ERR;
        }
    }
}
After redis-server restarts successfully, it listens for socket events; when a connection comes in, the server accepts it here:
void acceptTcpHandler(aeEventLoop *el, int fd, void *privdata, int mask) {
    ......some code.......
    while (max--) {
        /* accept the incoming connection */
        cfd = anetTcpAccept(server.neterr, fd, cip, sizeof(cip), &cport);
        if (cfd == ANET_ERR) {
            if (errno != EWOULDBLOCK)
                serverLog(LL_WARNING,
                    "Accepting client connection: %s", server.neterr);
            return;
        }
        serverLog(LL_VERBOSE, "Accepted %s:%d", cip, cport);
        acceptCommonHandler(cfd, 0, cip);
    }
}