I'm running a Redis instance as a standalone server with the following config:
#bind option is commented
protected-mode no
port 6379
timeout 0
supervised no
The problem: the instance runs normally, but after some time it just hangs and stops accepting incoming connections from anywhere other than localhost. When I check the logs I see this:
* SLAVE OF 45.148.122.184:39844 enabled (user request from 'id=214 addr=45.148.122.184:32324 fd=30 name= age=0 idle=0 flags=N db=0 sub=0 psub=0 multi=-1 qbuf=0 qbuf-free=32768 obl=0 oll=0 omem=0 events=r cmd=slaveof')
* Connecting to MASTER 45.148.122.184:39844
# Unable to connect to MASTER: Permission denied
* Connecting to MASTER 45.148.122.184:39844
# Unable to connect to MASTER: Permission denied
* Connecting to MASTER 45.148.122.184:39844
# Unable to connect to MASTER: Permission denied
In the config I'm not setting any slave or master server, and even the cluster option is disabled. I don't recognize the IP address either.
Every time I restart the service it runs well, and after a few hours the same thing happens. The sentinel service is disabled and not running.
What can I do to solve this?
Is this a configuration issue?
Thanks in advance!
It seems your Redis host has an external IP and is accessible to everyone on the internet; someone spotted it and is using SLAVEOF to take over your server, either accidentally or intentionally.
I recommend you at least set up authentication and/or restrict access to your Redis host with a firewall.
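A minimal sketch of how that could look in redis.conf (the interface address and password below are placeholders, not values from your setup):

bind 127.0.0.1 10.0.0.5         # listen only on loopback and a private interface (10.0.0.5 is an example)
protected-mode yes              # refuse remote connections when no bind/password is configured
requirepass use-a-long-random-password-here

With requirepass set, clients have to AUTH before issuing any command, which also blocks the unauthenticated SLAVEOF seen in your log.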
I'm currently trying to set up a Docker socket proxy container. I've added the new TCP address as the Docker endpoint in traefik.toml, but the logs report that Traefik is unable to connect to the daemon at the standard address "unix:///var/run/docker.sock".
Is this just a bug in how the log reports the problem, or is Traefik actually looking for the daemon at the old endpoint?
Immediately after Traefik reads traefik.toml:
time="2019-06-16T18:33:10Z" level=info msg="Starting provider *docker.Provider {\"Watch\":true,\"Filename\":\"\",\"Constraints\":null,\"Trace\":false,\"TemplateVersion\":2,\"DebugLogGeneratedTemplate\":false,\"Endpoint\":\"unix:///var/run/docker.sock\",\"Domain\":\"\",\"TLS\":null,\"ExposedByDefault\":true,\"UseBindPortIP\":false,\"SwarmMode\":false,\"Network\":\"\",\"SwarmModeRefreshSeconds\":15}"
Error that repeats:
time="2019-06-16T18:33:11Z" level=error msg="Failed to retrieve information of the docker client and server host: Cannot connect to the Docker daemon at unix:///var/run/docker.sock. Is the docker daemon running?"
time="2019-06-16T18:33:11Z" level=error msg="Provider connection error Cannot connect to the Docker daemon at unix:///var/run/docker.sock. Is the docker daemon running?, retrying in 676.457252ms"
traefik.toml config:
[docker]
endpoint = "tcp://localhost:2375"
domain = "my.domain"
I've tried changing the endpoint to the local IP of the host machine as well as the Docker network IP, but all attempts result in the same error.
Everything works fine as long as /var/run/docker.sock is mounted in the container (regardless of what the endpoint in traefik.toml is), but as soon as I remove it, I start getting the above errors.
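For reference, a minimal sketch of the kind of setup this seems to aim at (the service and network names and the proxy image are assumptions, not from the post). Note that inside the Traefik container "localhost" refers to the Traefik container itself, so the endpoint has to point at the proxy container's hostname on a shared network:

# docker-compose.yml (hypothetical names)
version: "3"
services:
  socket-proxy:
    image: tecnativa/docker-socket-proxy   # assumed socket-proxy image
    environment:
      CONTAINERS: 1                        # let clients read container info
    volumes:
      - /var/run/docker.sock:/var/run/docker.sock
    networks:
      - proxynet
  traefik:
    image: traefik:1.7
    volumes:
      - ./traefik.toml:/traefik.toml       # with endpoint = "tcp://socket-proxy:2375"
    networks:
      - proxynet
networks:
  proxynet: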
I currently have an HTTPS Load Balancer setup operating with a 443 Frontend, Backend and Health Check that serves a single host nginx instance.
When navigating directly to the host via browser the page loads correctly with valid SSL certs.
When trying to access the site through the load balancer IP, I receive a 502 - Server error message. When I check the Google logs I notice "failed_to_pick_backend" errors at the load balancer. I also notice that it is failing health checks.
Some digging around leads me to these two links: https://cloudplatform.googleblog.com/2015/07/Debugging-Health-Checks-in-Load-Balancing-on-Google-Compute-Engine.html
https://github.com/coreos/bugs/issues/1195
Issue #1 - Not sure if google-address-manager is running on the server (RHEL 7). I do not see an entry for the HTTPS load balancer IP in the routes. The Google SDK is installed. This is a Google-provided image, and if I update the IP address in the console, it also gets updated on the host. How do I check if google-address-manager is running on RHEL 7?
[root@server]# ip route ls table local type local scope host
10.212.2.40 dev eth0 proto kernel src 10.212.2.40
127.0.0.0/8 dev lo proto kernel src 127.0.0.1
127.0.0.1 dev lo proto kernel src 127.0.0.1
Output of all Google services:
[root@server]# systemctl list-unit-files
google-accounts-daemon.service enabled
google-clock-skew-daemon.service enabled
google-instance-setup.service enabled
google-ip-forwarding-daemon.service enabled
google-network-setup.service enabled
google-shutdown-scripts.service enabled
google-startup-scripts.service enabled
Issue #2: Not receiving a 200 OK response. The certificate is valid and the same on both the LB and the server. When running curl against the app server I receive this response:
[root@server.com]# curl -I https://app-server.com
curl: (60) SSL certificate problem: unable to get local issuer certificate
More details here: https://curl.haxx.se/docs/sslcerts.html
Thoughts?
You should add firewall rules for the health check source IPs - https://cloud.google.com/compute/docs/load-balancing/health-checks#health_check_source_ips_and_firewall_rules - and make sure that your backend service listens on the load balancer IP (easiest is to bind to 0.0.0.0). This is definitely true for an internal load balancer; I'm not sure about HTTPS with an external IP.
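As a rough sketch (the rule and network names are placeholders), such a rule could be created with gcloud using the documented health-check source ranges:

gcloud compute firewall-rules create allow-lb-health-checks \
    --network=default \
    --source-ranges=130.211.0.0/22,35.191.0.0/16 \
    --allow=tcp:443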
A couple of updates and lessons learned:
I have found out that "google-address-manager" is now deprecated and has been replaced by "google-ip-forwarding-daemon", which is running.
[root@server ~]# sudo service google-ip-forwarding-daemon status
Redirecting to /bin/systemctl status google-ip-forwarding-daemon.service
google-ip-forwarding-daemon.service - Google Compute Engine IP Forwarding Daemon
Loaded: loaded (/usr/lib/systemd/system/google-ip-forwarding-daemon.service; enabled; vendor preset: enabled)
Active: active (running) since Fri 2017-12-22 20:45:27 UTC; 17h ago
Main PID: 1150 (google_ip_forwa)
CGroup: /system.slice/google-ip-forwarding-daemon.service
└─1150 /usr/bin/python /usr/bin/google_ip_forwarding_daemon
There is an active firewall rule allowing IP ranges 130.211.0.0/22 and 35.191.0.0/16 for port 443. The target is also properly set.
Finally, the health check was using the default "/" path. The developers had put authentication in front of the site during development, so when I bypassed the SSL cert error, curl returned a 401 Unauthorized. This was the root cause of the issue we were experiencing. To remedy it, we modified the nginx basic-authentication configuration to disable authentication for a new route (e.g. /health).
Once the nginx configuration was updated and the health check path was changed to the new /health route, we received valid 200 responses. This allowed the health check to report healthy instances and let the LB pass traffic through.
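A minimal sketch of that kind of nginx exception (the location name and response are placeholders; the actual config was not shown here):

location /health {
    auth_basic off;    # skip the site-wide basic auth for the health check path
    return 200;
}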
This Spring guide on messaging with RabbitMQ does not talk about host/port configuration. I followed the same guide and added these properties to application.properties to connect to a RabbitMQ broker installed on GCP:
spring:
  rabbitmq:
    host: XXX.XXX.XXX.XX
    port: 5672
    username: user
    password: bitnami
    virtual-host: /
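(The snippet above is in YAML form, i.e. application.yml; assuming Spring Boot's standard property names, the flat application.properties equivalent would be:

spring.rabbitmq.host=XXX.XXX.XXX.XX
spring.rabbitmq.port=5672
spring.rabbitmq.username=user
spring.rabbitmq.password=bitnami
spring.rabbitmq.virtual-host=/
)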
When running the app I get a timeout exception while connecting to RabbitMQ:
2017-08-06 17:16:54.322 ERROR 7280 --- [ container-1] o.s.a.r.l.SimpleMessageListenerContainer : Failed to check/redeclare auto-delete queue(s).
org.springframework.amqp.AmqpConnectException: java.net.ConnectException: Connection timed out: connect
at org.springframework.amqp.rabbit.support.RabbitExceptionTranslator.convertRabbitAccessException(RabbitExceptionTranslator.java:62) ~[spring-rabbit-1.7.2.RELEASE.jar:na]
at org.springframework.amqp.rabbit.connection.AbstractConnectionFactory.createBareConnection(AbstractConnectionFactory.java:367) ~[spring-rabbit-1.7.2.RELEASE.jar:na]
at org.springframework.amqp.rabbit.connection.CachingConnectionFactory.createConnection(CachingConnectionFactory.java:565) ~[spring-rabbit-1.7.2.RELEASE.jar:na]
at org.springframework.amqp.rabbit.core.RabbitTemplate.doExecute(RabbitTemplate.java:1430) ~[spring-rabbit-1.7.2.RELEASE.jar:na]
at org.springframework.amqp.rabbit.core.RabbitTemplate.execute(RabbitTemplate.java:1411) ~[spring-rabbit-1.7.2.RELEASE.jar:na]
at org.springframework.amqp.rabbit.core.RabbitTemplate.execute(RabbitTemplate.java:1387) ~[spring-rabbit-1.7.2.RELEASE.jar:na]
at org.springframework.amqp.rabbit.core.RabbitAdmin.getQueueProperties(RabbitAdmin.java:336) ~[spring-rabbit-1.7.2.RELEASE.jar:na]
at org.springframework.amqp.rabbit.listener.SimpleMessageListenerContainer.redeclareElementsIfNecessary(SimpleMessageListenerContainer.java:1136) ~[spring-rabbit-1.7.2.RELEASE.jar:na]
at org.springframework.amqp.rabbit.listener.SimpleMessageListenerContainer$AsyncMessageProcessingConsumer.run(SimpleMessageListenerContainer.java:1387) [spring-rabbit-1.7.2.RELEASE.jar:na]
I tried the following, but still get the same error:
Opened up tcp:5672 through the GCP firewall configuration.
Changed the RabbitMQ config at /opt/bitnami/rabbitmq/etc/rabbitmq/rabbitmq.config to change the allowed IPs from localhost (127.0.0.1) to 0.0.0.0:
[
  {rabbit,
    [{tcp_listeners, [{"0.0.0.0", 5672}, {"::", 5672}]},
     {default_vhost, <<"/">>},
     {default_user, <<"user">>},
     {default_pass, <<"bitnami">>},
     {default_permissions, [<<".*">>, <<".*">>, <<".*">>]}]}
].
What could be the problem here?
Update
I have installed rabbitmq locally and everything works fine.
I suspect the updates to the config file are not actually being picked up. This is how I applied them:
updated the rabbitmq.config
rabbitmqctl stop_app
rabbitmqctl start_app
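Worth noting: rabbitmq.config is only read when the Erlang node boots, so stop_app/start_app alone does not reload it; a full node restart is needed. On a Bitnami stack that would presumably be something like (exact script path depends on the install):

sudo /opt/bitnami/ctlscript.sh restart rabbitmq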
But I still see some differences under the 'Ports and contexts' section in the UI (screenshots: localhost vs. GCP).
Any pointers? Or does it all look fine and the problem is something different, like the GCP setup?
After telnet-ing to the port and checking the port config through the GCP console, I figured out that I had made a mistake in setting the right tag name on the instance where I installed RabbitMQ.
Please do verify that the 'target tag' mentioned in your firewall rule is indeed mapped to the VM instance where RabbitMQ is installed.
Otherwise the config mentioned in the question is enough to make it work from a remote client.
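As a sketch (the rule, instance, tag, and zone names below are placeholders), the target tag can be checked and attached from the CLI:

gcloud compute firewall-rules describe allow-rabbitmq          # check its target tags
gcloud compute instances add-tags rabbitmq-vm --tags=rabbitmq --zone=us-central1-a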
I have an activemq installation with master / slave failover.
Master and Slave are synced using the lease-database-locker
Master and Slave run on 2 different machines and the database is located on a third machine.
Failover and client reconnection work properly on a forced shutdown of the master broker. The slave takes over properly and the clients reconnect due to their failover setting.
The problems start if I simulate a network outage on the master broker only. This is done by using an iptables DROP rule for packets going to the database on the master.
The master now realizes that it cannot connect to the database any longer. The slave starts up, since its network connection is still alive.
It seems from the logs that the clients still try to reconnect to the non-responding master.
To my understanding, the master should inform the clients that there is no connection anymore. The clients should then fail over and reconnect to the slave.
But this is not happening.
The clients do reconnect to the slave if I re-establish the DB connection by re-enabling the network connection to the DB on the master. The master then gives up being the master.
I have set a queryTimeout on the lease-database-locker.
I have set updateClusterClients=true for the transport connector.
I have set a validationQueryTimeout of 10s on the db connection.
I have set testOnBorrow for the db connection.
Is there a way to force the master to inform the clients to fail over in this particular case?
After some digging I found the trick.
The broker was not informing the clients due to a missing ioExceptionHandler configuration.
The documentation can be found here
http://activemq.apache.org/configurable-ioexception-handling.html
I needed to specify
<bean id="ioExceptionHandler" class="org.apache.activemq.util.LeaseLockerIOExceptionHandler">
<property name="stopStartConnectors"><value>true</value></property>
<property name="resumeCheckSleepPeriod"><value>5000</value></property>
</bean>
and tell the broker to use the handler:
<broker xmlns="http://activemq.apache.org/schema/core" ....
ioExceptionHandler="#ioExceptionHandler" >
In order to produce an error on network outages I also had to set a queryTimeout on the lease query:
<jdbcPersistenceAdapter dataDirectory="${activemq.base}/data" dataSource="#mysql-ds-db01-st" lockKeepAlivePeriod="3000">
  <locker>
    <lease-database-locker lockAcquireSleepInterval="10000" queryTimeout="8" />
  </locker>
</jdbcPersistenceAdapter>
This will produce an SQL exception if the query takes too long due to a network outage.
I did test the network by dropping packages to the database using an iptables rule:
/sbin/iptables -A OUTPUT -p tcp --destination-port 13306 -j DROP
Sounds like your client doesn't have the address of the slave in its URI, so it doesn't know where to reconnect. The master broker doesn't inform the client where the slave is, because it doesn't know that there is a slave (or where that slave might be on the network), and even if it did, that information would be unreliable depending on the conditions that caused the master broker to drop in the first place.
You need to provide the client with the connection information for both the master and the slave in the failover URI.
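For example, a client URI listing both brokers (the host names and ports are placeholders) would look like:

failover:(tcp://master-host:61616,tcp://slave-host:61616)

The failover transport then retries the listed brokers and reconnects to whichever one is currently the active master.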
Apache Mesos fails to show slave usage when you select slaves in the Mesos GUI. The web console also shows "failing when trying to load resource."
This is a common issue when running on EC2 or other cloud providers where machines have both an external and an internal IP. Mesos reports the internal IP in the UI, so if you're using the web UI from outside of EC2, the URLs won't work.
The current Mesos master and the latest 0.15 release candidate fix this issue by adding a --hostname command line option to set the hostname that gets reported in the UI.
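For example, reusing the sample addresses from the /etc/hosts entry below (and leaving out whatever other flags your deployment needs), the master could be started along the lines of:

mesos-master --ip=10.98.58.170 --hostname=ec2-54-224-191-136.compute-1.amazonaws.com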
If you're running <0.15, you can fix the issue by adding all the hosts in your Mesos cluster to /etc/hosts like so:
<private ip> <public fqdn> <machine hostname>
for example:
10.98.58.170 ec2-54-224-191-136.compute-1.amazonaws.com ec2-54-224-191-136