DC/OS navstar service failed to start on agent nodes

I'm setting up DC/OS on dev servers, and one of the agent nodes fails to start the navstar service:
# journalctl -u dcos-navstar -b
Mar 18 13:45:15 localhost.localdomain systemd[1]: Starting Navstar: A distributed systems & network overlay orchestration engine...
Mar 18 13:45:15 localhost.localdomain check-time[5868]: Checking whether time is synchronized using the kernel adjtimex API.
Mar 18 13:45:15 localhost.localdomain check-time[5868]: Time can be synchronized via most popular mechanisms (ntpd, chrony, systemd-timesyncd, etc.)
Mar 18 13:45:15 localhost.localdomain check-time[5868]: Time is in sync!
Mar 18 13:45:15 localhost.localdomain ping[5870]: ping: ready.spartan: Name or service not known
Mar 18 13:45:15 localhost.localdomain systemd[1]: dcos-navstar.service: control process exited, code=exited status=2
Mar 18 13:45:15 localhost.localdomain systemd[1]: Failed to start Navstar: A distributed systems & network overlay orchestration engine.
The ntpd service is installed and running (the service is active), and time synchronization with ntpd works fine. Please advise.

Check that port 123 (UDP, used by NTP) is open and not blocked by iptables or another firewall. Alternatively, try using chrony to synchronize the system clock with NTP servers (it is more accurate and has more features than ntpd).
For CentOS:
yum install chrony
I had the same trouble with DC/OS: not only navstar.service but also metronome.service failed (the same time-sync issue). I spent a lot of time searching for the root of the problem. Finally I migrated to chrony, and the problem disappeared.
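After switching to chrony, it is worth confirming that the clock really is synchronized before restarting dcos-navstar. In the journal output in the question, check-time reports "Time is in sync!" and the step that actually fails is ping ready.spartan with "Name or service not known", which is a name-resolution error, so it is also worth checking whether the agent can resolve ready.spartan at all. A minimal sketch; the function names are hypothetical, and chrony_synced parses the text output of chronyc tracking:

```shell
#!/bin/sh
# Switching from ntpd to chrony on CentOS 7 (as root), per the answer above:
#   yum install -y chrony
#   systemctl stop ntpd && systemctl disable ntpd
#   systemctl enable chronyd && systemctl start chronyd

# Succeed if `chronyc tracking` output (on stdin) shows a synchronized clock.
# chrony reports "Leap status" as Normal, Insert second, Delete second or
# Not synchronised; only the last one means the clock is unsynchronized.
chrony_synced() {
  awk -F': *' '
    /^Leap status/ { found = 1; status = $2; sub(/[ \t]+$/, "", status) }
    END { exit !(found && status != "Not synchronised") }'
}

# Succeed if NAME resolves via the system resolver (getent goes through the
# same NSS configuration in /etc/nsswitch.conf that ping uses).
check_name() {
  if getent hosts "$1" >/dev/null; then
    echo "resolves: $1"
  else
    echo "does not resolve: $1"
    return 1
  fi
}

# Usage on the failing agent:
#   chronyc tracking | chrony_synced && echo "clock synchronized"
#   check_name ready.spartan
#   grep nameserver /etc/resolv.conf
```

If check_name fails, the resolvers listed in /etc/resolv.conf cannot answer for the DC/OS-internal .spartan name, and navstar's pre-start ping will keep failing regardless of which time daemon is used.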

For long-running tasks, use Marathon; for one-time or cron tasks, use Chronos. You place and manage your tasks on DC/OS through the REST APIs of these frameworks. I also recommend using containers. You can read more here: micro-services at DCOS
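For Marathon, "using the REST API" means POSTing a JSON app definition to its /v2/apps endpoint. A minimal sketch with curl; the function names and the MARATHON_URL variable are hypothetical, the JSON covers only a few basic fields (id, cmd, cpus, mem, instances), and a cluster with authentication enabled will also need an auth header:

```shell
#!/bin/sh
# Build a minimal Marathon app definition as JSON.
# Arguments: app id, shell command, cpus, memory in MiB, instance count.
# Assumption: the arguments contain no double quotes or backslashes
# (no JSON escaping is done here).
marathon_app_json() {
  printf '{"id": "%s", "cmd": "%s", "cpus": %s, "mem": %s, "instances": %s}\n' \
    "$1" "$2" "${3:-0.1}" "${4:-32}" "${5:-1}"
}

# POST the app definition to Marathon's /v2/apps endpoint.
# MARATHON_URL is a hypothetical variable pointing at your Marathon instance.
deploy_app() {
  marathon_app_json "$@" |
    curl -fsS -X POST -H 'Content-Type: application/json' \
      --data-binary @- "${MARATHON_URL:?set MARATHON_URL first}/v2/apps"
}

# Usage:
#   MARATHON_URL=http://<master>/service/marathon deploy_app /hello 'sleep 1000'
```

On DC/OS, Marathon is normally reached through the master (for example under /service/marathon); verify the path for your version. The dcos CLI (dcos marathon app add app.json) is an alternative to raw curl.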

Related

Failed to start The Apache HTTP Server on ubuntu 18.04

I am trying to set up a web server on my Ubuntu 18.04, so I installed Apache2, but I can't start it.
Here's what appeared when I ran the systemctl status apache2.service command:
apache2.service - The Apache HTTP Server
Loaded: loaded (/lib/systemd/system/apache2.service; enabled; vendor preset: enabled)
Drop-In: /lib/systemd/system/apache2.service.d
└─apache2-systemd.conf
Active: failed (Result: exit-code) since Sat 2020-02-22 13:58:09 CET; 34s ago
Process: 2791 ExecStart=/usr/sbin/apachectl start (code=exited, status=1/FAILURE)
Feb 22 13:58:09 moemen apachectl[2791]: AH00558: apache2: Could not reliably determine the server's
Feb 22 13:58:09 moemen apachectl[2791]: (98)Address already in use: AH00072: make_sock: could not b
Feb 22 13:58:09 moemen apachectl[2791]: (98)Address already in use: AH00072: make_sock: could not b
Feb 22 13:58:09 moemen apachectl[2791]: no listening sockets available, shutting down
Feb 22 13:58:09 moemen apachectl[2791]: AH00015: Unable to open logs
Feb 22 13:58:09 moemen apachectl[2791]: Action 'start' failed.
Feb 22 13:58:09 moemen apachectl[2791]: The Apache error log may have more information.
Feb 22 13:58:09 moemen systemd[1]: apache2.service: Control process exited, code=exited status=1
Feb 22 13:58:09 moemen systemd[1]: apache2.service: Failed with result 'exit-code'.
Feb 22 13:58:09 moemen systemd[1]: Failed to start The Apache HTTP Server.
I'm new at this; can you please help me?
I also faced the same problem.
First check whether nginx is running:
$ sudo systemctl status nginx
If nginx is active, stop it with:
$ sudo systemctl stop nginx
Then try to start the apache2 server again in a different terminal.
First remove apache2:
sudo apt-get --purge remove apache2
sudo apt-get autoremove
After that, if there are leftover .conf files in /etc/apache2/sites-available, remove them, for example:
sudo rm /etc/apache2/sites-available/example.com.conf
Then install it again:
sudo apt-get install apache2
Now it should be fixed. Allow it through the firewall and check its status:
sudo ufw allow 'Apache'
sudo systemctl status apache2
Let me give a more general answer than the first two. One possible problem with Apache is that it fails to start because port 80 is already used by another program:
a common case is nginx, which is covered by Devashish Mishra's answer
in my case it was a server app that I had deployed (in Node.js; I had to tell pm2 to stop it)
in general, you want to find out what uses port 80. This can be done as Chi.C.J.Rajeeva Lochana suggested: install netstat if you don't have it (sudo apt install net-tools), then run sudo netstat -antup | grep 80. It will show some lines, which may include :::80 or <your IP>:80; those tell you what is listening on the port (note that grep 80 also matches unrelated lines, such as :8080)
Once you've found what listens on port 80, you have to decide what to do with it. For instance, if it's nginx and you don't use it, you can do as Devashish Mishra suggested and just stop it (sudo systemctl stop nginx). Likewise, you can stop or kill other programs (sudo killall -9 program-name). However, if you need them, you'll have to configure Apache and those programs further (for example, so that they listen on different ports) and then restart them; the exact steps depend heavily on the case.
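Because grep 80 can also match ports like 8080 or 180, a stricter filter helps when the netstat output is long. A minimal sketch that reads netstat -antup output on stdin and prints the PID/program of TCP sockets listening on exactly the given port; the function name is hypothetical, and sudo ss -ltnp is a newer tool that shows the same information:

```shell
#!/bin/sh
# Read `netstat -antup` output on stdin and print the PID/program of every
# TCP socket in LISTEN state whose local address ends in exactly ":PORT".
# Matching ":PORT$" avoids false hits such as :8080 or :180 for port 80.
listeners_on_port() {
  awk -v port="$1" '
    $1 ~ /^tcp/ && $6 == "LISTEN" && $4 ~ (":" port "$") {
      sub(/:$/, "", $7)   # "1234/nginx:" -> "1234/nginx" ("nginx: master" is split)
      print $7
    }' | sort -u
}

# Usage:
#   sudo netstat -antup | listeners_on_port 80
```

The output (for example 1234/nginx) gives both the PID to kill and the program name to stop with systemctl.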
Please read this carefully.
Run the following command; if you see that the port is taken by Apache (or another program), do what is described below the command.
Note: you need to install the net-tools package before you can run netstat. Run sudo apt install net-tools to install it.
sudo netstat -antup | grep 80
Look for the line with something like <Your IP>:80.
Note that this can also happen if you uninstall Apache while it is running.
The command could be:
sudo killall -9 program-name
Replace program-name with the name of the program running on port 80 if it cannot be stopped normally. Let me know if it doesn't work.
Thanks.
I ran into this problem and was able to solve it by creating the folder /var/log/apache2. When I checked /var/log/, it turned out there was no apache2 folder, similar to the case where MySQL won't start.
Your log shows the same symptom:
Feb 22 13:58:09 moemen apachectl[2791]: AH00015: Unable to open logs
Maybe this will help.
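AH00015 means Apache could not open its log files; if /var/log/apache2 is missing, recreating it is enough. A minimal sketch, assuming the default Ubuntu log location (APACHE_LOG_DIR in /etc/apache2/envvars); the helper name is hypothetical. On Ubuntu the directory is normally owned by root:adm with mode 750:

```shell
#!/bin/sh
# Create Apache's log directory if it is missing (default: /var/log/apache2,
# the Ubuntu default for APACHE_LOG_DIR in /etc/apache2/envvars).
ensure_log_dir() {
  dir=${1:-/var/log/apache2}
  if [ -d "$dir" ]; then
    echo "exists: $dir"
    return 0
  fi
  mkdir -p "$dir" && chmod 750 "$dir" && echo "created: $dir"
}

# Usage (as root), then fix ownership and start Apache:
#   ensure_log_dir
#   chown root:adm /var/log/apache2
#   systemctl start apache2
```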
In your terminal, type:
sudo /etc/init.d/apache2 stop
The response will be:
Stopping apache2 (via systemctl): apache2.service.
Now start the server:
sudo /opt/lampp/lampp start
If you installed LAMPP (XAMPP) correctly, this should work.

How to restart prometheus?

I have set up Prometheus on my Ubuntu machine, and it is running at localhost:9090 now. But when I run the following command, I get a failed status:
systemctl status prometheus
Output:
● prometheus.service - Prometheus
Loaded: loaded (/lib/systemd/system/prometheus.service; enabled; vendor preset: enabled)
Active: failed (Result: exit-code) since Wed 2019-11-06 14:58:36 +0530; 8s ago
Main PID: 7046 (code=exited, status=1/FAILURE)
Nov 06 14:58:36 ayesh systemd[1]: prometheus.service: Service hold-off time over, scheduling restart
Nov 06 14:58:36 ayesh systemd[1]: prometheus.service: Scheduled restart job, restart counter is at 5
Nov 06 14:58:36 ayesh systemd[1]: Stopped Prometheus.
Nov 06 14:58:36 ayesh systemd[1]: prometheus.service: Start request repeated too quickly.
Nov 06 14:58:36 ayesh systemd[1]: prometheus.service: Failed with result 'exit-code'.
Nov 06 14:58:36 ayesh systemd[1]: Failed to start Prometheus.
I tried to restart Prometheus using:
killall -HUP prometheus
sudo systemctl daemon-reload
sudo systemctl restart prometheus
and using:
curl -X POST http://localhost:9090/-/reload
but neither worked for me. I have checked prometheus.yml for syntax errors using promtool, and it passed.
Is there any other way to fix this problem?
Check whether it is still running in your task manager and, if so, kill its task from there; that should work.
The output shows a failed start of Prometheus, so there shouldn't be anything to kill. Just check your processes with:
ps -ef | grep '[p]rometheus' # the [p] keeps grep from matching its own process; the quotes stop the shell from globbing the pattern
Use the following command to see more log output from Prometheus:
journalctl -t prometheus
There might also be more information in the log files under /var/log, especially /var/log/messages and/or /var/log/syslog.
For debugging purposes, just start Prometheus in the foreground by running:
$(which prometheus)
This will help you find additional information about the failed start.
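Running $(which prometheus) without arguments uses Prometheus's default flags (for example, it looks for prometheus.yml in the current directory), which may differ from what the unit passes. To reproduce exactly what systemd runs, take the ExecStart command from systemctl cat prometheus. A minimal sketch; the function name is hypothetical, it joins backslash-continued lines, and it does not handle ExecStart prefixes such as "-" or "@" or quoted arguments:

```shell
#!/bin/sh
# Read unit text (e.g. from `systemctl cat prometheus`) on stdin and print the
# last ExecStart= command, joining lines continued with a trailing backslash.
# Assumption: no ExecStart prefixes such as "-" or "@" need special handling.
unit_execstart() {
  awk '
    cont {                      # continuation of the previous ExecStart= line
      line = $0
      sub(/^[ \t]+/, "", line)
      cont = (line ~ /\\$/)
      if (cont) sub(/[ \t]*\\$/, "", line)
      cmd = cmd " " line
      next
    }
    /^ExecStart=/ {
      cmd = substr($0, 11)      # text after "ExecStart="
      cont = (cmd ~ /\\$/)
      if (cont) sub(/[ \t]*\\$/, "", cmd)
    }
    END { if (cmd != "") print cmd }'
}

# Usage: inspect the command, then run it in the foreground as the service
# user ("prometheus" is an assumption; check User= in the unit):
#   systemctl cat prometheus | unit_execstart
#   sudo -u prometheus $(systemctl cat prometheus | unit_execstart)
```

Once the cause is fixed, sudo systemctl reset-failed prometheus clears the "Start request repeated too quickly" state so that systemctl start prometheus is accepted again.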

.NET Core 2.2 AWS RHEL 7.5 Deployment

I'm trying to deploy my first ASP.NET Core 2.2 API on AWS RHEL 7.5.
My /etc/systemd/system/kestrel-mytest.service:
[Unit]
Description=.NET Prototypes Application on Linux
[Service]
WorkingDirectory=/home/ec2-user/webapi
ExecStart=/usr/bin/dotnet /home/ec2-user/webapi/prototypes.dll
Restart=always
# Restart service after 10 seconds if the dotnet service crashes:
RestartSec=10
KillSignal=SIGINT
SyslogIdentifier=dotnet-example
User=apache
Environment=ASPNETCORE_ENVIRONMENT=Production
Environment=DOTNET_PRINT_TELEMETRY_MESSAGE=false
TimeoutStopSec=90
[Install]
WantedBy=multi-user.target
Now I am facing this:
[ec2-user@ip-172-31-6-33 dotnet]$ sudo systemctl status kestrel-mytest.service
● kestrel-mytest.service - .NET Prototypes Application on Linux
Loaded: loaded (/etc/systemd/system/kestrel-mytest.service; disabled; vendor preset: disabled)
Active: activating (auto-restart) (Result: exit-code) since Tue 2019-02-19 14:06:39 UTC; 6s ago
Process: 3902 ExecStart=/usr/bin/dotnet /home/ec2-user/webapi/prototypes.dll (code=exited, status=145)
Main PID: 3902 (code=exited, status=145)
Feb 19 14:06:39 ip-172-31-6-33.ap-southeast-1.compute.internal systemd[1]: kestrel-mytest.service: main process exited, code=exited, status=145/n/a
Feb 19 14:06:39 ip-172-31-6-33.ap-southeast-1.compute.internal systemd[1]: Unit kestrel-mytest.service entered failed state.
Feb 19 14:06:39 ip-172-31-6-33.ap-southeast-1.compute.internal systemd[1]: kestrel-mytest.service failed.
[ec2-user@ip-172-31-6-33 dotnet]$
What did I miss?
Thanks a lot in advance,
Don
This resolved it:
The working directory MUST be the same as Apache's DocumentRoot (see /etc/httpd/conf/httpd.conf). In my case DocumentRoot is /var/www/html/, so it should be:
WorkingDirectory=/var/www/html/webapi, and ExecStart likewise:
ExecStart=/usr/bin/dotnet /var/www/html/webapi/prototypes.dll
Things to consider:
chown -R apache:your_group /var/www/html/webapi
Don't forget to reload systemd after editing the unit file, then stop and start kestrel for the change to take effect:
systemctl daemon-reload
systemctl stop kestrel-xxx
systemctl start kestrel-xxx
To start it automatically after the machine reboots:
systemctl enable kestrel-xxx
To check which port the dotnet listener is on:
sudo lsof -i -P -n | grep LISTEN
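A related check: the unit runs as User=apache, so that user must be able to traverse every directory on the way to the DLL and read the DLL itself. On RHEL-family systems, home directories such as /home/ec2-user are usually created with mode 700, which would block the apache user; that may be part of why moving the app under /var/www/html and chown-ing it to apache helped. A minimal sketch that checks each path component; the function names are hypothetical, and namei -l from util-linux gives a similar per-component permission listing without any script:

```shell
#!/bin/sh
# Print every path component of an absolute path, from / down to the target.
ancestors() {
  local path out
  path=$1
  out=$path
  while [ "$path" != "/" ] && [ "$path" != "." ]; do
    path=$(dirname "$path")
    out="$path
$out"
  done
  printf '%s\n' "$out"
}

# Report components that USER ($1) cannot traverse (directories) or read
# (the final file) on the way to TARGET ($2). Requires sudo.
check_access() {
  ancestors "$2" | while IFS= read -r p; do
    if [ -d "$p" ]; then
      sudo -u "$1" test -x "$p" || echo "cannot traverse: $p"
    else
      sudo -u "$1" test -r "$p" || echo "cannot read: $p"
    fi
  done
}

# Usage:
#   check_access apache /home/ec2-user/webapi/prototypes.dll
```

No output from check_access means the apache user can reach and read the file.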

Redis service is not starting

I built Redis from source and installed it on my system following this DigitalOcean guide, but after running
$ sudo systemctl status redis
I get this failed status report.
● redis.service - Redis In-Memory Data Store
Loaded: loaded (/etc/systemd/system/redis.service; disabled; vendor preset: enabled)
Active: failed (Result: exit-code) since Tue 2018-04-03 01:51:54 +0530; 1s ago
Process: 24974 ExecStart=/usr/local/bin/redis-server /etc/redis/redis.conf (code=exited, status=203/EXEC)
Main PID: 24974 (code=exited, status=203/EXEC)
systemd[1]: redis.service: Unit entered failed state.
systemd[1]: redis.service: Failed with result 'exit-code'.
systemd[1]: redis.service: Service hold-off time over, scheduling restart.
systemd[1]: Stopped Redis In-Memory Data Store.
systemd[1]: redis.service: Start request repeated too quickly.
systemd[1]: Failed to start Redis In-Memory Data Store.
systemd[1]: redis.service: Unit entered failed state.
systemd[1]: redis.service: Failed with result 'exit-code'.
My system is Ubuntu 17.10 x64
Check by typing this:
sudo /usr/local/bin/redis-server /etc/redis/redis.conf
It will tell you what is wrong.
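Note that status=203/EXEC is systemd's code for a failure to execute the ExecStart binary itself, before redis-server gets a chance to log anything. The most common causes are a wrong path or a file without the execute bit. A minimal sketch of that check; the function name is hypothetical:

```shell
#!/bin/sh
# systemd's status=203/EXEC means the ExecStart binary could not be executed
# at all. Check the two most common causes: missing path, or no execute bit.
check_exec() {
  if [ ! -e "$1" ]; then
    echo "missing: $1"
    return 1
  fi
  if [ ! -x "$1" ]; then
    echo "not executable: $1"
    return 1
  fi
  echo "ok: $1"
}

# Usage: check the path used in the unit, and see where redis-server really is:
#   check_exec /usr/local/bin/redis-server
#   command -v redis-server
```

If the two paths differ, either fix ExecStart in /etc/systemd/system/redis.service (then run systemctl daemon-reload) or install the binary where the unit expects it.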
I often run into this kind of trouble. Usually I use the install script from the utils directory of the Redis source, and the problem is solved:
sudo /tmp/redis-stable/utils/install_server.sh
I think this approach works well.
What happens when you run the following command directly? redis-server
Please also give more information about your redis.conf file, because the root of the problem could be there, and check the Redis logs too. You can add the following line to your config file to get error logs:
logfile /path/to/my/log/file.log
After that, restart or reload the service to get additional information.
I hope this information helps!
I had the same error and solved it by fixing dir in redis.conf (I believe that is the directory for dump.rdb):
dir /some/directory
so that dump.rdb is written to a directory the user has permission to write to.
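To check this quickly, read the effective dir from redis.conf and test whether the service user can write there. A minimal sketch; the function names are hypothetical, it uses the last uncommented dir line, and the user name redis is an assumption (check User= in the unit). A relative value such as dir ./ is resolved against the service's working directory:

```shell
#!/bin/sh
# Print the dir setting from a redis.conf file (this sketch uses the last
# occurrence; commented lines are skipped because their first field is "#").
redis_conf_dir() {
  awk '$1 == "dir" { d = $2; gsub(/"/, "", d) } END { if (d != "") print d }' \
    "${1:-/etc/redis/redis.conf}"
}

# Succeed if USER ($1) can write to DIR ($2). Requires sudo.
can_write_as() {
  sudo -u "$1" test -w "$2"
}

# Usage:
#   d=$(redis_conf_dir /etc/redis/redis.conf)
#   can_write_as redis "$d" || echo "redis cannot write to $d"
```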

Not able to start RabbitMQ server on CentOS 7 using systemctl

I am trying to start the RabbitMQ server on CentOS 7. I installed Erlang, since it is a dependency of rabbitmq-server (package erlang.x86_64 0:R16B-03.7.el7). I then installed RabbitMQ using the package rabbitmq-server-3.2.2-1.noarch.rpm, and the installation was successful. I enabled the management console using rabbitmq-plugins enable rabbitmq_management. But starting the rabbitmq-server service fails:
[root@tve-centos ~]# systemctl start rabbitmq-server.service
Job for rabbitmq-server.service failed. See 'systemctl status rabbitmq-server.service' and 'journalctl -xn' for details.
[root@tve-centos ~]# systemctl status rabbitmq-server.service
rabbitmq-server.service - LSB: Enable AMQP service provided by RabbitMQ broker
Loaded: loaded (/etc/rc.d/init.d/rabbitmq-server)
Active: failed (Result: exit-code) since Fri 2014-09-12 13:07:05 PDT; 8s ago
Process: 20235 ExecStart=/etc/rc.d/init.d/rabbitmq-server start (code=exited, status=1/FAILURE)
Sep 12 13:07:04 tve-centos su[20245]: (to rabbitmq) root on none
Sep 12 13:07:05 tve-centos su[20296]: (to rabbitmq) root on none
Sep 12 13:07:05 tve-centos su[20299]: (to rabbitmq) root on none
Sep 12 13:07:05 tve-centos rabbitmq-server[20235]: Starting rabbitmq-server: FAILED - check /var/log/rabbitmq/startup_{log, _err}
Sep 12 13:07:05 tve-centos rabbitmq-server[20235]: rabbitmq-server.
Sep 12 13:07:05 tve-centos systemd[1]: rabbitmq-server.service: control process exited, code=exited status=1
Sep 12 13:07:05 tve-centos systemd[1]: Failed to start LSB: Enable AMQP service provided by RabbitMQ broker.
Sep 12 13:07:05 tve-centos systemd[1]: Unit rabbitmq-server.service entered failed state.
and logs shows /var/log/rabbitmq/startup_log
BOOT FAILED
===========
Error description:
{could_not_start,rabbitmq_management,
{could_not_start_listener,[{port,15672}],eacces}}
Log files (may contain more information):
/var/log/rabbitmq/rabbit@tve-centos.log
/var/log/rabbitmq/rabbit@tve-centos-sasl.log
But no process is using port 15672.
If I start it directly with /usr/sbin/rabbitmq-server, it starts successfully, but my requirement is to start it using systemctl.
A better answer is to actually fix SELinux and the firewall rather than disable them.
Open the port:
firewall-cmd --permanent --add-port=5672/tcp
firewall-cmd --reload
setsebool -P nis_enabled 1
That works for me.
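The startup log in the question names port 15672 (the management plugin), not only 5672, and the error is eacces on bind. firewalld filters packets and does not make a local bind() fail, so on CentOS 7 an SELinux denial is the more likely cause, which would explain why setsebool helps. A minimal sketch that opens both ports and sets the boolean, with a hypothetical DRY_RUN switch so you can review the commands before running them as root; the function names are hypothetical:

```shell
#!/bin/sh
# Run a command, or only print it when DRY_RUN is set (useful for review).
run() {
  if [ -n "${DRY_RUN:-}" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# Open the AMQP port (5672) and the management UI port (15672) in firewalld,
# then set the nis_enabled SELinux boolean as in the answer above (it is
# commonly used to let confined services bind to non-standard ports).
open_rabbitmq_ports() {
  for port in 5672 15672; do
    run firewall-cmd --permanent --add-port="$port/tcp"
  done
  run firewall-cmd --reload
  run setsebool -P nis_enabled 1
}

# Usage:
#   DRY_RUN=1 open_rabbitmq_ports   # print the commands only
#   open_rabbitmq_ports             # apply them (as root)
```

To test the SELinux theory without editing /etc/selinux/config and rebooting, getenforce shows the current mode and setenforce 0 switches to permissive mode until the next reboot.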
It looks like a port issue. To confirm that, stop and disable firewalld:
systemctl stop firewalld
systemctl disable firewalld
and disable SELinux for the time being in the /etc/selinux/config file:
SELINUX=disabled
Then reboot your machine and see whether the issue persists.
After running this command:
[root@gcp-hehe-amqp ~]# /sbin/service rabbitmq-server start
and getting the error:
Redirecting to /bin/systemctl start rabbitmq-server.service
Job for rabbitmq-server.service failed because the control process exited with error code. See "systemctl status rabbitmq-server.service" and "journalctl -xe" for details.
after many attempts, I solved the error as follows.
Run:
firewall-cmd --permanent --add-port=5672/tcp
then:
firewall-cmd --reload
Set SELINUX=disabled in /etc/selinux/config.
Set proxy_protocol to true in /etc/rabbitmq/rabbitmq.conf:
proxy_protocol = true