Azure Container App restarts every 30 seconds - azure-container-instances

I have an Azure Container App that's based on the hosted BackgroundService model. It's essentially just a long-running console app that overrides the BackgroundService.ExecuteAsync method and waits for the stop signal (via the passed cancellation token). When I run it locally in Docker, everything runs as expected. When I deploy it as an Azure Container App, it deploys and runs (although I had to manually set the scale minimum to 1 to get it to run at all), but it restarts every 30 seconds or so, which is obviously not ideal. My guess is that the Azure Container Apps host is somehow checking my instance for health, isn't satisfied, and tries to restart it? Just a guess. What am I missing?
using FR911.DataAccess.Repository;
using FR911.Infrastructure.Commands;
using FR911.Utils;
using FR911.Utils.Extensions;
using SimpleInjector;

IHost host = Host.CreateDefaultBuilder(args)
    .ConfigureServices(services =>
    {
        services.AddFR911Log4NetConfig();
        services.AddTransient<ICommandProcessor, CommandProcessor>();

        Container container = new Container();
        container.Register(typeof(ICommandHandler<,>), new List<Type>()
        {
            //typeof(CacheSyncCommandHandler),
        });

#if DEBUG
        container.Verify();
#endif

        services.AddSingleton<Container>(container);
        services.AddHostedService<Worker>();
    })
    .Build();

await host.RunAsync();
public class Worker : BackgroundService
{
    private readonly ILogger<Worker> _logger;
    private ICommandProcessor _commandProcessor;

    public Worker(ILogger<Worker> logger, ICommandProcessor cmdProcessor)
    {
        _logger = logger;
        _commandProcessor = cmdProcessor;
    }

    protected override async Task ExecuteAsync(CancellationToken stoppingToken)
    {
        _logger.LogInformation("Worker starting at: {time}", DateTimeOffset.Now);
        DateTime? lastGC = null;

        while (!stoppingToken.IsCancellationRequested)
        {
            _logger.LogInformation("Worker running at: {time}", DateTimeOffset.Now);
            await Task.Delay(1000, stoppingToken);
        }

        _logger.LogInformation("Worker stopping at: {time}", DateTimeOffset.Now);
    }
}
2022-05-24 12:10:46,248 FR911.Worker.Worker fr911worker-app-20--vki2kmn-cf5bff474-5w6mh INFO Worker starting at: 05/24/2022 12:10:46 +00:00
2022-05-24 12:10:46,249 FR911.Worker.Worker fr911worker-app-20--vki2kmn-cf5bff474-5w6mh INFO Worker running at: 05/24/2022 12:10:46 +00:00
2022-05-24 12:10:46,251 Microsoft.Hosting.Lifetime fr911worker-app-20--vki2kmn-cf5bff474-5w6mh INFO Application started. Press Ctrl+C to shut down.
2022-05-24 12:10:46,252 Microsoft.Hosting.Lifetime fr911worker-app-20--vki2kmn-cf5bff474-5w6mh INFO Hosting environment: Production
2022-05-24 12:10:46,336 Microsoft.Hosting.Lifetime fr911worker-app-20--vki2kmn-cf5bff474-5w6mh INFO Content root path: /app
2022-05-24 12:10:47,637 FR911.Worker.Worker fr911worker-app-20--vki2kmn-cf5bff474-5w6mh INFO Worker running at: 05/24/2022 12:10:47 +00:00
2022-05-24 12:10:48,638 FR911.Worker.Worker fr911worker-app-20--vki2kmn-cf5bff474-5w6mh INFO Worker running at: 05/24/2022 12:10:48 +00:00
2022-05-24 12:10:49,637 FR911.Worker.Worker fr911worker-app-20--vki2kmn-cf5bff474-5w6mh INFO Worker running at: 05/24/2022 12:10:49 +00:00
2022-05-24 12:10:50,638 FR911.Worker.Worker fr911worker-app-20--vki2kmn-cf5bff474-5w6mh INFO Worker running at: 05/24/2022 12:10:50 +00:00
2022-05-24 12:10:51,638 FR911.Worker.Worker fr911worker-app-20--vki2kmn-cf5bff474-5w6mh INFO Worker running at: 05/24/2022 12:10:51 +00:00
2022-05-24 12:10:52,638 FR911.Worker.Worker fr911worker-app-20--vki2kmn-cf5bff474-5w6mh INFO Worker running at: 05/24/2022 12:10:52 +00:00
2022-05-24 12:10:53,637 FR911.Worker.Worker fr911worker-app-20--vki2kmn-cf5bff474-5w6mh INFO Worker running at: 05/24/2022 12:10:53 +00:00
2022-05-24 12:10:54,638 FR911.Worker.Worker fr911worker-app-20--vki2kmn-cf5bff474-5w6mh INFO Worker running at: 05/24/2022 12:10:54 +00:00
2022-05-24 12:10:55,636 FR911.Worker.Worker fr911worker-app-20--vki2kmn-cf5bff474-5w6mh INFO Worker running at: 05/24/2022 12:10:55 +00:00
2022-05-24 12:10:56,637 FR911.Worker.Worker fr911worker-app-20--vki2kmn-cf5bff474-5w6mh INFO Worker running at: 05/24/2022 12:10:56 +00:00
2022-05-24 12:10:57,638 FR911.Worker.Worker fr911worker-app-20--vki2kmn-cf5bff474-5w6mh INFO Worker running at: 05/24/2022 12:10:57 +00:00
2022-05-24 12:10:58,638 FR911.Worker.Worker fr911worker-app-20--vki2kmn-cf5bff474-5w6mh INFO Worker running at: 05/24/2022 12:10:58 +00:00
2022-05-24 12:10:59,638 FR911.Worker.Worker fr911worker-app-20--vki2kmn-cf5bff474-5w6mh INFO Worker running at: 05/24/2022 12:10:59 +00:00
2022-05-24 12:11:00,637 FR911.Worker.Worker fr911worker-app-20--vki2kmn-cf5bff474-5w6mh INFO Worker running at: 05/24/2022 12:11:00 +00:00
2022-05-24 12:11:01,638 FR911.Worker.Worker fr911worker-app-20--vki2kmn-cf5bff474-5w6mh INFO Worker running at: 05/24/2022 12:11:01 +00:00
2022-05-24 12:11:02,637 FR911.Worker.Worker fr911worker-app-20--vki2kmn-cf5bff474-5w6mh INFO Worker running at: 05/24/2022 12:11:02 +00:00
2022-05-24 12:11:03,638 FR911.Worker.Worker fr911worker-app-20--vki2kmn-cf5bff474-5w6mh INFO Worker running at: 05/24/2022 12:11:03 +00:00
2022-05-24 12:11:04,637 FR911.Worker.Worker fr911worker-app-20--vki2kmn-cf5bff474-5w6mh INFO Worker running at: 05/24/2022 12:11:04 +00:00
2022-05-24 12:11:05,636 FR911.Worker.Worker fr911worker-app-20--vki2kmn-cf5bff474-5w6mh INFO Worker running at: 05/24/2022 12:11:05 +00:00
2022-05-24 12:11:06,638 FR911.Worker.Worker fr911worker-app-20--vki2kmn-cf5bff474-5w6mh INFO Worker running at: 05/24/2022 12:11:06 +00:00
2022-05-24 12:11:07,637 FR911.Worker.Worker fr911worker-app-20--vki2kmn-cf5bff474-5w6mh INFO Worker running at: 05/24/2022 12:11:07 +00:00
2022-05-24 12:11:08,638 FR911.Worker.Worker fr911worker-app-20--vki2kmn-cf5bff474-5w6mh INFO Worker running at: 05/24/2022 12:11:08 +00:00
2022-05-24 12:11:09,637 FR911.Worker.Worker fr911worker-app-20--vki2kmn-cf5bff474-5w6mh INFO Worker running at: 05/24/2022 12:11:09 +00:00
2022-05-24 12:11:10,637 FR911.Worker.Worker fr911worker-app-20--vki2kmn-cf5bff474-5w6mh INFO Worker running at: 05/24/2022 12:11:10 +00:00
2022-05-24 12:11:11,638 FR911.Worker.Worker fr911worker-app-20--vki2kmn-cf5bff474-5w6mh INFO Worker running at: 05/24/2022 12:11:11 +00:00
2022-05-24 12:11:12,637 FR911.Worker.Worker fr911worker-app-20--vki2kmn-cf5bff474-5w6mh INFO Worker running at: 05/24/2022 12:11:12 +00:00
2022-05-24 12:11:13,638 FR911.Worker.Worker fr911worker-app-20--vki2kmn-cf5bff474-5w6mh INFO Worker running at: 05/24/2022 12:11:13 +00:00
2022-05-24 12:11:14,636 FR911.Worker.Worker fr911worker-app-20--vki2kmn-cf5bff474-5w6mh INFO Worker running at: 05/24/2022 12:11:14 +00:00
2022-05-24 12:11:14,930 Microsoft.Hosting.Lifetime fr911worker-app-20--vki2kmn-cf5bff474-5w6mh INFO Application is shutting down...

I'm an Engineering Manager in Container Apps.
Your Container App was being restarted because it was failing the readiness probes.
If your Container App’s HTTP ingress is set to ‘Enabled’, the platform will try to ping it on the specified Target port (80 by default). If the platform can’t successfully ping it, the app is considered ‘unhealthy’ and is restarted. Please refer to Health probes in Azure Container Apps to learn about the default health probes and how to specify your own settings.
If your Container App is not listening on the specified ingress port (for example, if your app is processing messages from a queue and not expecting external HTTP requests), set HTTP ingress to ‘Disabled’. When HTTP ingress is set to ‘Disabled’, health probes won't be configured and your app won't be pinged.
If your Container App is listening on the specified Target port but requires a longer startup time, you can define a longer initial delay and/or a longer period between pings.
Also, make sure that the Target port specified in the HTTP configuration is the same port that is EXPOSEd in the Dockerfile of your Container App.
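For a worker like the one in this question, both options can be applied from the Azure CLI. This is only a sketch, assuming the containerapp CLI extension is installed; the app and resource group names below are placeholders:
# Option 1: the app exposes no HTTP endpoint, so turn ingress off entirely
az containerapp ingress disable --name fr911worker-app --resource-group my-rg
# Option 2: keep ingress but relax the probes; they live under
# template.containers[].probes in the container app spec supplied as YAML
az containerapp update --name fr911worker-app --resource-group my-rg --yaml app.yaml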

Turning off ingress for my service fixed the problem. Having it turned on while the service didn't provide any accessible endpoints seems to have been the cause.

Related

Redis on GKE is running out of disk space

I just installed redis (actually a reinstall and upgrade) on GKE via helm. It was a pretty standard install and nothing too out of the norm. Unfortunately my "redis-master" container logs are showing sync errors over and over again:
Info 2022-02-01 12:58:22.733 MST redis 1:M 01 Feb 2022 19:58:22.733 * Waiting for end of BGSAVE for SYNC
Info 2022-02-01 12:58:22.733 MST redis 8085:C 01 Feb 2022 19:58:22.733 # Write error saving DB on disk: No space left on device
Info 2022-02-01 12:58:22.830 MST redis 1:M 01 Feb 2022 19:58:22.829 # Background saving error
Info 2022-02-01 12:58:22.830 MST redis 1:M 01 Feb 2022 19:58:22.829 # Connection with replica redis-replicas-0.:6379 lost.
Info 2022-02-01 12:58:22.830 MST redis 1:M 01 Feb 2022 19:58:22.829 # SYNC failed. BGSAVE child returned an error
Info 2022-02-01 12:58:22.830 MST redis 1:M 01 Feb 2022 19:58:22.829 # Connection with replica redis-replicas-1.:6379 lost.
Info 2022-02-01 12:58:22.830 MST redis 1:M 01 Feb 2022 19:58:22.829 # SYNC failed. BGSAVE child returned an error
Info 2022-02-01 12:58:22.832 MST redis 1:M 01 Feb 2022 19:58:22.832 * Replica redis-replicas-0.:6379 asks for synchronization
Info 2022-02-01 12:58:22.832 MST redis 1:M 01 Feb 2022 19:58:22.832 * Full resync requested by replica redis-replicas-0.:6379
Info 2022-02-01 12:58:22.832 MST redis 1:M 01 Feb 2022 19:58:22.832 * Starting BGSAVE for SYNC with target: disk
Info 2022-02-01 12:58:22.833 MST redis 1:M 01 Feb 2022 19:58:22.833 * Background saving started by pid 8086
I then looked at my persistent volume claim specification "redis-data" and it is in the "Pending" Phase and never seems to get out of that phase. If I look at all my PVCs though then they are all bound and appear to be healthy.
Clearly something isn't as healthy as it seems but I am not sure how to diagnose. Any help would be appreciated.
I know it's late to the party, but to add more: if anyone gets stuck in the same scenario and can't delete the PVC, they can increase the size of the PVC in GKE.
Check the StorageClass (it must allow volume expansion):
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
…
provisioner: kubernetes.io/gce-pd
allowVolumeExpansion: true
Edit the PVC
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 10Gi
The field you need to update in the PVC:
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:          # <== make sure it is in the requests section
      storage: 30Gi    # <=========
Once the changes to the PVC are applied and saved, just restart the Pod.
Sharing a link below: https://medium.com/@harsh.manvar111/resizing-pvc-disk-in-gke-c5b882c90f7b
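If you prefer the command line over editing the PVC by hand, a patch along these lines does the same thing. A sketch only; the PVC and Pod names are placeholders, check yours with kubectl get pvc:
# bump the requested size on the existing claim (the StorageClass must allow expansion)
kubectl patch pvc redis-data-redis-master-0 -p '{"spec":{"resources":{"requests":{"storage":"30Gi"}}}}'
# restart the Pod so it picks up the resized volume
kubectl delete pod redis-master-0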
So I was pretty close on the heels of it: in my case, when I uninstalled Redis it didn't remove the PVC (which makes some sense), and when I reinstalled, it tried to use the same PVC.
Unfortunately, that PVC had run out of space.
I was able to manually delete the PVC's that previously existed (we didn't need to keep the data) and then reinstall redis via helm. At that point, it created new PVC's and worked fine.
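For reference, the cleanup described above looks roughly like this. The release name, namespace, and label selector are assumptions for a standard Bitnami chart install; only do this if you can afford to lose the data:
# remove the release, then the leftover claims that Helm keeps around
helm uninstall redis -n redis
kubectl delete pvc -n redis -l app.kubernetes.io/name=redis
# a fresh install creates new, empty PVCs
helm install redis bitnami/redis -n redis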

Redis crashing without any log errors

I'm debugging some weird behavior in my Redis, where it crashes every 2 days more or less, but it's not showing any errors whatsoever, only this in the logs:
1:C 10 Sep 2020 15:44:14.517 # Configuration loaded
1:M 10 Sep 2020 15:44:14.522 * Running mode=standalone, port=6379.
1:M 10 Sep 2020 15:44:14.522 # Server initialized
1:M 10 Sep 2020 15:44:14.524 * Ready to accept connections
1:C 12 Sep 2020 13:20:23.751 # oO0OoO0OoO0Oo Redis is starting oO0OoO0OoO0Oo
1:C 12 Sep 2020 13:20:23.751 # Redis version=6.0.5, bits=64, commit=00000000, modified=0, pid=1, just started
1:C 12 Sep 2020 13:20:23.751 # Configuration loaded
1:M 12 Sep 2020 13:20:23.757 * Running mode=standalone, port=6379.
1:M 12 Sep 2020 13:20:23.757 # Server initialized
1:M 12 Sep 2020 13:20:23.758 * Ready to accept connections
That's all redis says to me.
I have lots of RAM available, but I have Redis running as a single instance in a Docker container. Could the lack of processing power cause this? Should I use multiple nodes? I don't want to set up a cluster just to find out the problem was something else. How can I track down the actual cause of the problem?
So, in the end, it was exactly what I thought it was not: a memory leak!
I had 16GB that was slowly being consumed until redis crashed with no warnings, nor the operating system/docker. I fixed the app that caused the leak and the problem was gone.
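If anyone wants to catch something like this before it crashes, a couple of quick checks from the host usually suffice (the container name my-redis is a placeholder):
# memory Redis itself reports using, plus its configured ceiling
docker exec my-redis redis-cli INFO memory | grep -E 'used_memory_human|maxmemory_human'
# memory the container is using from the host's point of view
docker stats --no-stream my-redis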

(gcloud.beta.compute.ssh) [/usr/bin/ssh] exited with return code [255]

Trying to use SSH to connect to a Google Cloud Compute Engine instance (macOS Catalina):
gcloud beta compute ssh --zone "us-west1-b" "mac-vm" --project "mac-vm-282201"
and get error
ssh: connect to host 34.105.11.187 port 22: Operation timed out
ERROR: (gcloud.beta.compute.ssh) [/usr/bin/ssh] exited with return code [255].
and I try
ssh -i ~/.ssh/mac-vm-key asd61404@34.105.11.187
also get error
ssh: connect to host 34.105.11.187 port 22: Operation timed out
so I found this command to diagnose it
gcloud compute ssh --zone "us-west1-b" "mac-vm" --project "mac-vm-282201" --ssh-flag="-vvv"
return
OpenSSH_7.9p1, LibreSSL 2.7.3
debug1: Reading configuration data /etc/ssh/ssh_config
debug1: /etc/ssh/ssh_config line 48: Applying options for *
debug2: resolve_canonicalize: hostname 34.105.11.187 is address
debug2: ssh_connect_direct
debug1: Connecting to 34.105.11.187 [34.105.11.187] port 22.
I don't know how to fix this issue.
Thanks in advance!
Here is my recent serial console output:
Jul 4 02:28:39 mac-vm google_network_daemon[684]: For info, please visit https://www.isc.org/software/dhcp/
Jul 4 02:28:39 mac-vm dhclient[684]:
Jul 4 02:28:39 mac-vm dhclient[684]: Listening on Socket/ens4
[ 19.458355] google_network_daemon[684]: Listening on Socket/ens4
Jul 4 02:28:39 mac-vm google_network_daemon[684]: Listening on Socket/ens4
Jul 4 02:28:39 mac-vm dhclient[684]: Sending on Socket/ens4
[ 19.458697] google_network_daemon[684]: Sending on Socket/ens4
Jul 4 02:28:39 mac-vm google_network_daemon[684]: Sending on Socket/ens4
Jul 4 02:28:39 mac-vm systemd[1]: Finished Wait until snapd is fully seeded.
Jul 4 02:28:39 mac-vm systemd[1]: Starting Apply the settings specified in cloud-config...
Jul 4 02:28:39 mac-vm systemd[1]: Condition check resulted in Auto import assertions from block devices being skipped.
Jul 4 02:28:39 mac-vm systemd[1]: Reached target Multi-User System.
Jul 4 02:28:39 mac-vm systemd[1]: Reached target Graphical Interface.
Jul 4 02:28:39 mac-vm systemd[1]: Starting Update UTMP about System Runlevel Changes...
Jul 4 02:28:39 mac-vm systemd[1]: systemd-update-utmp-runlevel.service: Succeeded.
Jul 4 02:28:39 mac-vm systemd[1]: Finished Update UTMP about System Runlevel Changes.
[ 20.216129] cloud-init[718]: Cloud-init v. 20.1-10-g71af48df-0ubuntu5 running 'modules:config' at Sat, 04 Jul 2020 02:28:39 +0000. Up 20.11 seconds.
Jul 4 02:28:39 mac-vm cloud-init[718]: Cloud-init v. 20.1-10-g71af48df-0ubuntu5 running 'modules:config' at Sat, 04 Jul 2020 02:28:39 +0000. Up 20.11 seconds.
Jul 4 02:28:39 mac-vm systemd[1]: Finished Apply the settings specified in cloud-config.
Jul 4 02:28:39 mac-vm systemd[1]: Starting Execute cloud user/final scripts...
Jul 4 02:28:41 mac-vm google-clock-skew: INFO Synced system time with hardware clock.
[ 20.886105] cloud-init[725]: Cloud-init v. 20.1-10-g71af48df-0ubuntu5 running 'modules:final' at Sat, 04 Jul 2020 02:28:41 +0000. Up 20.76 seconds.
[ 20.886430] cloud-init[725]: Cloud-init v. 20.1-10-g71af48df-0ubuntu5 finished at Sat, 04 Jul 2020 02:28:41 +0000. Datasource DataSourceGCE. Up 20.87 seconds
Jul 4 02:28:41 mac-vm cloud-init[725]: Cloud-init v. 20.1-10-g71af48df-0ubuntu5 running 'modules:final' at Sat, 04 Jul 2020 02:28:41 +0000. Up 20.76 seconds.
Jul 4 02:28:41 mac-vm cloud-init[725]: Cloud-init v. 20.1-10-g71af48df-0ubuntu5 finished at Sat, 04 Jul 2020 02:28:41 +0000. Datasource DataSourceGCE. Up 20.87 seconds
Jul 4 02:28:41 mac-vm systemd[1]: Finished Execute cloud user/final scripts.
Jul 4 02:28:41 mac-vm systemd[1]: Reached target Cloud-init target.
Jul 4 02:28:41 mac-vm systemd[1]: Starting Google Compute Engine Startup Scripts...
Jul 4 02:28:41 mac-vm startup-script: INFO Starting startup scripts.
Jul 4 02:28:41 mac-vm startup-script: INFO Found startup-script in metadata.
Jul 4 02:28:42 mac-vm startup-script: INFO startup-script: sudo: ufw: command not found
Jul 4 02:28:42 mac-vm startup-script: INFO startup-script: Return code 1.
Jul 4 02:28:42 mac-vm startup-script: INFO Finished running startup scripts.
Jul 4 02:28:42 mac-vm systemd[1]: google-startup-scripts.service: Succeeded.
Jul 4 02:28:42 mac-vm systemd[1]: Finished Google Compute Engine Startup Scripts.
Jul 4 02:28:42 mac-vm systemd[1]: Startup finished in 1.396s (kernel) + 20.065s (userspace) = 21.461s.
Jul 4 02:29:06 mac-vm systemd[1]: systemd-hostnamed.service: Succeeded.
Jul 4 02:43:32 mac-vm systemd[1]: Starting Cleanup of Temporary Directories...
Jul 4 02:43:32 mac-vm systemd[1]: systemd-tmpfiles-clean.service: Succeeded.
Jul 4 02:43:32 mac-vm systemd[1]: Finished Cleanup of Temporary Directories.

Cannot restart redis-sentinel unit

I'm trying to configure 3 Redis instances and 6 Sentinels (3 of them running on the Redis hosts and the rest on different hosts). But when I install the redis-sentinel package, put my configuration under /etc/redis/sentinel.conf, and restart the service using systemctl restart redis-sentinel, I get this error:
Job for redis-sentinel.service failed because a timeout was exceeded.
See "systemctl status redis-sentinel.service" and "journalctl -xe" for details.
Here is the output of journalctl -u redis-sentinel:
Jan 01 08:07:07 redis1 systemd[1]: Starting Advanced key-value store...
Jan 01 08:07:07 redis1 redis-sentinel[16269]: 16269:X 01 Jan 2020 08:07:07.263 # oO0OoO0OoO0Oo Redis is starting oO0OoO0OoO0Oo
Jan 01 08:07:07 redis1 redis-sentinel[16269]: 16269:X 01 Jan 2020 08:07:07.263 # Redis version=5.0.7, bits=64, commit=00000000, modified=0, pid=16269, just started
Jan 01 08:07:07 redis1 redis-sentinel[16269]: 16269:X 01 Jan 2020 08:07:07.263 # Configuration loaded
Jan 01 08:07:07 redis1 systemd[1]: redis-sentinel.service: Can't open PID file /var/run/sentinel/redis-sentinel.pid (yet?) after start: No such file or directory
Jan 01 08:08:37 redis1 systemd[1]: redis-sentinel.service: Start operation timed out. Terminating.
Jan 01 08:08:37 redis1 systemd[1]: redis-sentinel.service: Failed with result 'timeout'.
Jan 01 08:08:37 redis1 systemd[1]: Failed to start Advanced key-value store.
Jan 01 08:08:37 redis1 systemd[1]: redis-sentinel.service: Service hold-off time over, scheduling restart.
Jan 01 08:08:37 redis1 systemd[1]: redis-sentinel.service: Scheduled restart job, restart counter is at 5.
Jan 01 08:08:37 redis1 systemd[1]: Stopped Advanced key-value store.
Jan 01 08:08:37 redis1 systemd[1]: Starting Advanced key-value store...
Jan 01 08:08:37 redis1 redis-sentinel[16307]: 16307:X 01 Jan 2020 08:08:37.738 # oO0OoO0OoO0Oo Redis is starting oO0OoO0OoO0Oo
Jan 01 08:08:37 redis1 redis-sentinel[16307]: 16307:X 01 Jan 2020 08:08:37.739 # Redis version=5.0.7, bits=64, commit=00000000, modified=0, pid=16307, just started
Jan 01 08:08:37 redis1 redis-sentinel[16307]: 16307:X 01 Jan 2020 08:08:37.739 # Configuration loaded
Jan 01 08:08:37 redis1 systemd[1]: redis-sentinel.service: Can't open PID file /var/run/sentinel/redis-sentinel.pid (yet?) after start: No such file or directory
and my sentinel.conf file:
port 26379
daemonize yes
sentinel myid 851994c7364e2138e03ee1cd346fbdc4f1404e4c
sentinel deny-scripts-reconfig yes
sentinel monitor mymaster 172.28.128.11 6379 2
sentinel down-after-milliseconds mymaster 5000
# Generated by CONFIG REWRITE
dir "/"
protected-mode no
sentinel failover-timeout mymaster 60000
sentinel config-epoch mymaster 0
sentinel leader-epoch mymaster 0
sentinel current-epoch 0
If you are running your Redis servers on a Debian-based distribution, add the following to your Redis configurations:
pidfile /var/run/redis/redis-sentinel.pid in /etc/redis/sentinel.conf
pidfile /var/run/redis/redis-server.pid in /etc/redis/redis.conf
What's the output in the sentinel log file?
I had a similar issue where Sentinel received a lot of SIGTERMs.
In that case you need to make sure that if you use the daemonize yes setting, the systemd unit file uses Type=forking.
Also make sure that the location of the PID file specified in the sentinel config matches the location specified in the systemd unit file.
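A sketch of the systemd side, assuming the PID file path shown in the journal output above (written as a drop-in, e.g. via systemctl edit redis-sentinel):
# /etc/systemd/system/redis-sentinel.service.d/override.conf
[Service]
Type=forking
# must match the pidfile location configured for Sentinel in /etc/redis/sentinel.conf
PIDFile=/var/run/sentinel/redis-sentinel.pid
# have systemd create /run/sentinel so Sentinel can actually write the PID file there
RuntimeDirectory=sentinel
Then run systemctl daemon-reload and systemctl restart redis-sentinel.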
If you face the below error in journalctl or systemctl logs,
Jun 26 10:13:02 x systemd[1]: redis-server.service: Failed with result 'exit-code'.
Jun 26 10:13:02 x systemd[1]: redis-server.service: Scheduled restart job, restart counter is at 5.
Jun 26 10:13:02 x systemd[1]: Stopped Advanced key-value store.
Jun 26 10:13:02 x systemd[1]: redis-server.service: Start request repeated too quickly.
Jun 26 10:13:02 x systemd[1]: redis-server.service: Failed with result 'exit-code'.
Jun 26 10:13:02 x systemd[1]: Failed to start Advanced key-value store.
then check /var/log/redis/redis-server.log for more information.
In most cases the issue is mentioned there,
e.g. if a dump.rdb file is placed in /var/lib/redis then the issue might be with the database count or the Redis version,
or in another scenario disabled IPv6 might be the issue.

Why not start rabbitmq-server?

Good evening. I'm faced with a problem: at some point the message broker "hangs" (it stops updating the web interface and accepting connections). When I try to restart rabbitmq-server, the start command stops responding after printing this output:
RabbitMQ 3.6.6. Copyright (C) 2007-2016 Pivotal Software, Inc.
Licensed under the MPL. See www.rabbitmq.com
Logs: /var/log/rabbitmq/rabbit@dbserver.log
/var/log/rabbitmq/rabbit@dbserver-sasl.log
Starting broker...
Can you suggest how to fix this?
P.S. The command systemctl status rabbitmq-server.service printed:
● rabbitmq-server.service - RabbitMQ broker.
Loaded: loaded (/lib/systemd/system/rabbitmq-server.service; enabled; vendor preset: enabled)
Active: failed (Result: exit-code) since Sun 2016-12-11 15:11:24 UTC; 15s ago
Process: 2872 ExecStop=/usr/lib/rabbitmq/bin/rabbitmqctl stop (code=exited, status=0/SUCCESS)
Process: 2795 ExecStart=/usr/lib/rabbitmq/bin/rabbitmq-server (code=exited, status=73)
Main PID: 2795 (code=exited, status=73)
Status: "Processing port mapping requests..."
Dec 11 15:11:24 dbserver rabbitmqctl[2872]: * epmd reports: node 'rabbit' not running at all
Dec 11 15:11:24 dbserver rabbitmqctl[2872]: no other nodes on dbserver
Dec 11 15:11:24 dbserver rabbitmqctl[2872]: * suggestion: start the node
Dec 11 15:11:24 dbserver rabbitmqctl[2872]: current node details:
Dec 11 15:11:24 dbserver rabbitmqctl[2872]: - node name: 'rabbitmq-cli-41@dbserver'
Dec 11 15:11:24 dbserver rabbitmqctl[2872]: - home dir: /var/lib/rabbitmq
Dec 11 15:11:24 dbserver rabbitmqctl[2872]: - cookie hash: /637Yxskpg6/iDHKRZfS8g==
Dec 11 15:11:24 dbserver systemd[1]: Failed to start RabbitMQ broker.
Dec 11 15:11:24 dbserver systemd[1]: rabbitmq-server.service: Unit entered failed state.
Dec 11 15:11:24 dbserver systemd[1]: rabbitmq-server.service: Failed with result 'exit-code'.
Although I couldn't verify whether exit code status=73 means exactly this, for me this issue turned out to be running out of disk space. If you check the RabbitMQ logs (/var/log/rabbitmq/), they should give you more context if this is a different error.
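A quick way to check for that condition, assuming the default Debian/Ubuntu paths:
# RabbitMQ's mnesia database and message store live under /var/lib/rabbitmq
df -h /var/lib/rabbitmq
# the broker logs usually name the failing resource explicitly
sudo tail -n 100 /var/log/rabbitmq/*.log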