cannot start galera cluster on centos 7 - galera

I am trying to install a new Galera cluster on 3 nodes running CentOS 7.
When I start the cluster on the master node with gcomm://, it starts successfully, but I cannot start MariaDB on the other nodes.
systemctl status mariadb.service -l
● mariadb.service - MariaDB database server
Loaded: loaded (/usr/lib/systemd/system/mariadb.service; enabled; vendor preset: disabled)
Drop-In: /etc/systemd/system/mariadb.service.d
└─migrated-from-my.cnf-settings.conf
Active: activating (auto-restart) (Result: signal) since Wed 2019-03-06 13:50:12 EET; 720ms ago
Process: 20749 ExecStartPost=/bin/sh -c systemctl unset-environment _WSREP_START_POSITION (code=exited, status=0/SUCCESS)
Process: 54893 ExecStart=/usr/sbin/mysqld $MYSQLD_OPTS $_WSREP_NEW_CLUSTER $_WSREP_START_POSITION (code=killed, signal=ABRT)
Process: 54813 ExecStartPre=/bin/sh -c [ ! -e /usr/bin/galera_recovery ] && VAR= || VAR=`/usr/bin/galera_recovery`; [ $? -eq 0 ] && systemctl set-environment _WSREP_START_POSITION=$VAR || exit 1 (code=exited, status=0/SUCCESS)
Process: 54811 ExecStartPre=/bin/sh -c systemctl unset-environment _WSREP_START_POSITION (code=exited, status=0/SUCCESS)
Main PID: 54893 (code=killed, signal=ABRT)
CGroup: /system.slice/mariadb.service
├─54902 /bin/sh -ue /usr//bin/wsrep_sst_rsync --role joiner --address 10.1.0.172" --datadir /var/lib/mysql/ --parent 54893
├─54976 rsync --daemon --no-detach --port 4444 --config /var/lib/mysql//rsync_sst.conf
└─55027 sleep 0.5
Mar 06 13:50:12 tms-galeracl2 systemd[1]: mariadb.service: main process exited, code=killed, status=6/ABRT
Mar 06 13:50:12 tms-galeracl2 systemd[1]: Failed to start MariaDB database server.
Mar 06 13:50:12 tms-galeracl2 systemd[1]: Unit mariadb.service entered failed state.
Mar 06 13:50:12 tms-galeracl2 systemd[1]: mariadb.service failed.
And here is the servers.cnf config:
# Mandatory settings
wsrep_on=ON
wsrep_provider=/usr/lib64/galera/libgalera_smm.so
wsrep_cluster_address="gcomm://10.x.x.x,10.x.x.x,10.x.x.x"
binlog_format=row
default_storage_engine=InnoDB
innodb_autoinc_lock_mode=2
#Cluster name
wsrep_cluster_name="galeracl"
#
# Allow server to accept connections on all interfaces.
#
bind-address=0.0.0.0
wsrep_node_address=”10.x.x.x"
wsrep_node_name=”galeracl2"
wsrep_sst_method=rsync
# Optional setting
#wsrep_slave_threads=1
#innodb_flush_log_at_trx_commit=0

You have the wrong type of quotes in your config. In servers.cnf, note the opening quote (a curly ” instead of a plain ") on the following lines:
wsrep_node_address=”10.x.x.x"
wsrep_node_name=”galeracl2"
Edit the file and replace each opening quote with the same plain quote that is used for closing (copy and paste it from your config above):
wsrep_node_address="10.x.x.x"
wsrep_node_name="galeracl2"
And you should be golden!
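Curly quotes are easy to miss by eye, so here is a small sketch for finding and replacing them from the shell. The function names find_curly_quotes and fix_curly_quotes are made up for illustration; pass the path to your own servers.cnf.

```shell
# Print every line (with its line number) that contains a curly double quote.
find_curly_quotes() {
    grep -n -e '“' -e '”' "$1"
}

# Write a copy of the file to stdout with curly double quotes
# replaced by plain ASCII double quotes. The input file is not modified.
fix_curly_quotes() {
    sed -e 's/“/"/g' -e 's/”/"/g' "$1"
}
```

Review the output of fix_curly_quotes before overwriting the real config file with it, then restart MariaDB on that node.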

Related

mariadb galera node stuck at WSREP state transfer ongoing

I restarted a MariaDB Galera cluster node, and now it is taking ages to start. In the service status I noticed
Status: "WSREP state transfer ongoing, current seqno: 9331 waited 510.000000 secs", and the memory usage (Memory: 71.3M) goes up every time I recheck the MariaDB service status:
mariadb.service - MariaDB 10.1.47 database server
Loaded: loaded (/lib/systemd/system/mariadb.service; enabled; vendor preset: enabled)
Active: activating (start) since Fri 2020-11-27 21:26:43 GMT; 8min ago
Docs: man:mysqld(8)
https://mariadb.com/kb/en/library/systemd/
Process: 19409 ExecStartPost=/bin/sh -c systemctl unset-environment _WSREP_START_POSITION (code=exited, statu
Process: 19407 ExecStartPost=/etc/mysql/debian-start (code=exited, status=0/SUCCESS)
Process: 19497 ExecStartPre=/bin/sh -c [ ! -e /usr/bin/galera_recovery ] && VAR= || VAR=`cd /usr/bin/..; /u
Process: 19495 ExecStartPre=/bin/sh -c systemctl unset-environment _WSREP_START_POSITION (code=exited, status
Process: 19494 ExecStartPre=/usr/bin/install -m 755 -o mysql -g root -d /var/run/mysqld (code=exited, status=
Main PID: 19701 (mysqld)
Status: "WSREP state transfer ongoing, current seqno: 9331 waited 510.000000 secs"
Tasks: 14 (limit: 4573)
Memory: 71.3M
CPU: 49.181s
CGroup: /system.slice/mariadb.service
├─19701 /usr/sbin/mysqld --wsrep_start_position=5b96f94b-2dcb-11eb-8e8b-eff8238871c4:9331
├─19764 sh -c wsrep_sst_rsync --role 'joiner' --address '192.168.5.165' --datadir '/var/lib/mysql/'
├─19765 /bin/bash -ue /usr//bin/wsrep_sst_rsync --role joiner --address 192.168.5.165 --datadir /var
├─19824 rsync --daemon --no-detach --port 4444 --config /var/lib/mysql//rsync_sst.conf
├─19865 rsync --daemon --no-detach --port 4444 --config /var/lib/mysql//rsync_sst.conf
├─19884 rsync --daemon --no-detach --port 4444 --config /var/lib/mysql//rsync_sst.conf
└─23405 sleep 1
What should I do, and what is the best possible way to avoid this in the future?
First of all, support for MariaDB 10.1 has ended. That said, you could increase the systemd start timeout so that the state transfer (SST) has time to finish.
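A minimal sketch of raising the timeout with a systemd drop-in, assuming the unit is named mariadb.service as in the status output above. The function name, the drop-in file name timeout.conf and the 900-second default are illustrative assumptions, not values from the question. TimeoutStartSec is the standard systemd setting for how long a unit may take to start.

```shell
# Write a drop-in that raises the start timeout for mariadb.service.
# $1: drop-in directory (defaults to the usual location for this unit)
# $2: timeout in seconds (assumption: size it to your slowest expected SST)
write_timeout_dropin() {
    dropin_dir="${1:-/etc/systemd/system/mariadb.service.d}"
    timeout="${2:-900}"
    mkdir -p "$dropin_dir" || return 1
    printf '[Service]\nTimeoutStartSec=%s\n' "$timeout" > "$dropin_dir/timeout.conf"
}

# Typical use as root (not run here):
#   write_timeout_dropin
#   systemctl daemon-reload
#   systemctl restart mariadb
```

The drop-in only changes how long systemd waits. It does not make the state transfer itself any faster.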

Job for apache2.service failed because the control process exited with error code. See "systemctl status apache2.service"

I am trying to install DevStack, but it stops in the middle.
I am installing OpenStack with DevStack because installing the individual components by hand is a long process. I have already installed OSM release 5 and just need a VIM to hold the images. However, the DevStack run stops partway through. This is the error I am getting:
See any operating system documentation about shared libraries for
more information, such as the ld(1) and ld.so(8) manual pages.
----------------------------------------------------------------------
chmod 644 /usr/lib/apache2/modules/mod_proxy_uwsgi.so
+lib/apache:install_apache_uwsgi:102 popd
~/devstack
+lib/apache:install_apache_uwsgi:104 sudo rm -rf /tmp/tmp.EgMQRfNaCS
+lib/apache:install_apache_uwsgi:106 is_ubuntu
+functions-common:is_ubuntu:466 [[ -z deb ]]
+functions-common:is_ubuntu:469 '[' deb = deb ']'
+lib/apache:install_apache_uwsgi:108 sudo a2enmod proxy
Module proxy already enabled
+lib/apache:install_apache_uwsgi:109 sudo a2enmod proxy_uwsgi
Considering dependency proxy for proxy_uwsgi:
Module proxy already enabled
Module proxy_uwsgi already enabled
+lib/apache:install_apache_uwsgi:115 restart_apache_server
+lib/apache:restart_apache_server:231 restart_service apache2
+functions-common:restart_service:2393 '[' -x /bin/systemctl ']'
+functions-common:restart_service:2394 sudo /bin/systemctl restart apache2
Job for apache2.service failed because the control process exited with error code. See "systemctl status apache2.service" and "journalctl -xe" for details.
+functions-common:restart_service:1 exit_trap
+./stack.sh:exit_trap:521 local r=1
++./stack.sh:exit_trap:522 jobs -p
+./stack.sh:exit_trap:522 jobs=
+./stack.sh:exit_trap:525 [[ -n '' ]]
+./stack.sh:exit_trap:531 '[' -f '' ']'
+./stack.sh:exit_trap:536 kill_spinner
+./stack.sh:kill_spinner:417 '[' '!' -z '' ']'
+./stack.sh:exit_trap:538 [[ 1 -ne 0 ]]
+./stack.sh:exit_trap:539 echo 'Error on exit'
Error on exit
+./stack.sh:exit_trap:541 type -p generate-subunit
+./stack.sh:exit_trap:542 generate-subunit 1559639730 82 fail
+./stack.sh:exit_trap:544 [[ -z /opt/stack/logs ]]
+./stack.sh:exit_trap:547 /opt/stack/devstack/tools/worlddump.py -d /opt/stack/logs
World dumping... see /opt/stack/logs/worlddump-2019-06-04-091653.txt for details
+./stack.sh:exit_trap:556 exit 1
When I restart Apache manually, it gives this:
stack#bozz-feedz:/etc/apache2$ sudo service apache2 restart
Job for apache2.service failed because the control process exited with error code. See "systemctl status apache2.service" and "journalctl -xe" for deta
So I tried killing all the PIDs. That fixes it once, but afterwards the error comes back:
stack#bozz-feedz:/etc/apache2$ systemctl status apache2.service
● apache2.service - LSB: Apache2 web server
Loaded: loaded (/etc/init.d/apache2; bad; vendor preset: enabled)
Drop-In: /lib/systemd/system/apache2.service.d
└─apache2-systemd.conf
Active: failed (Result: exit-code) since ti 2019-06-04 12:48:10 EEST; 3min 47s ago
Docs: man:systemd-sysv-generator(8)
Process: 2339 ExecStop=/etc/init.d/apache2 stop (code=exited, status=1/FAILURE)
Process: 2314 ExecStart=/etc/init.d/apache2 start (code=exited, status=0/SUCCESS)
stack#bozz-feedz:/etc/apache2$ sudo service apache2 restart
stack#bozz-feedz:/etc/apache2$ sudo service apache2 restart
stack#bozz-feedz:/etc/apache2$ sudo service apache2 restart
stack#bozz-feedz:/etc/apache2$ sudo service apache2 restart
stack#bozz-feedz:/etc/apache2$ sudo service apache2 restart
stack#bozz-feedz:/etc/apache2$ systemctl status apache2.service
● apache2.service - LSB: Apache2 web server
Loaded: loaded (/etc/init.d/apache2; bad; vendor preset: enabled)
Drop-In: /lib/systemd/system/apache2.service.d
└─apache2-systemd.conf
Active: inactive (dead) since ti 2019-06-04 12:52:15 EEST; 1s ago
Docs: man:systemd-sysv-generator(8)
Process: 10597 ExecStop=/etc/init.d/apache2 stop (code=exited, status=0/SUCCESS)
Process: 10555 ExecStart=/etc/init.d/apache2 start (code=exited, status=0/SUCCESS)
stack#bozz-feedz:/etc/apache2$ sudo service apache2 restart
Then after a few seconds the error comes back.
stack#bozz-feedz:/etc/apache2$ sudo service apache2 restart
Job for apache2.service failed because the control process exited with error code. See "systemctl status apache2.service" and "journalctl -xe" for details.
stack#bozz-feedz:/etc/apache2$ systemctl status apache2.service
● apache2.service - LSB: Apache2 web server
Loaded: loaded (/etc/init.d/apache2; bad; vendor preset: enabled)
Drop-In: /lib/systemd/system/apache2.service.d
└─apache2-systemd.conf
Active: failed (Result: exit-code) since ti 2019-06-04 12:53:29 EEST; 10s ago
Docs: man:systemd-sysv-generator(8)
Process: 13707 ExecStop=/etc/init.d/apache2 stop (code=exited, status=1/FAILURE)
Process: 13680 ExecStart=/etc/init.d/apache2 start (code=exited, status=0/SUCCESS)
stack#bozz-feedz:/etc/apache2$
The error persists unless I kill the PIDs again. I don't understand why this is happening.
I expect DevStack to install completely. It's just tiring.
Try reinstalling apache2. It worked for me.
sudo apt-get install --reinstall apache2

Datastax & systemd

I have configured a systemd Unit for DataStax Enterprise 4.8.5:
### /etc/systemd/system/dse1.service
[Unit]
Description=DataStax Enterprise
[Service]
User=cassandra
ExecStart=/opt/dse/dse1/bin/dse cassandra -k
ExecStop=/opt/dse/dse1/bin/dse cassandra-stop
When I execute sudo systemctl start dse1 and immediately check the status afterwards, I get:
● dse1.service - DataStax Enterprise 1
Loaded: loaded (/etc/systemd/system/dse1.service; static; vendor preset: disabled)
Active: active (running) since Wed 2016-03-23 13:47:57 EDT; 1s ago
Main PID: 31699 (cassandra)
CGroup: /system.slice/dse1.service
├─31699 /bin/sh /opt/dse/dse1/resources/cassandra/bin/cassandra -k -Djava.library.path=:/opt/dse/dse1/resources/hadoop/native...
├─31894 /bin/java -cp :/opt/dse/dse1/lib/dse-core-4.8.5.jar:/opt/dse/dse1/lib/dse-hadoop-4.8.5.jar:/opt/dse/dse1/lib/dse-hive...
└─31895 grep -q Error: Exception thrown by the agent : java.lang.NullPointerException
If I then wait a few seconds and try again, I get:
● dse1.service - DataStax Enterprise 1
Loaded: loaded (/etc/systemd/system/dse1.service; static; vendor preset: disabled)
Active: inactive (dead)
Mar 23 13:34:28 pspldsea01p.fleet.ad systemd[1]: Started DataStax Enterprise 1.
Mar 23 13:34:28 pspldsea01p.fleet.ad systemd[1]: Starting DataStax Enterprise 1...
Mar 23 13:38:33 pspldsea01p.fleet.ad systemd[1]: Started DataStax Enterprise 1.
Mar 23 13:38:33 pspldsea01p.fleet.ad systemd[1]: Starting DataStax Enterprise 1...
Mar 23 13:47:41 pspldsea01p.fleet.ad systemd[1]: Started DataStax Enterprise 1.
Mar 23 13:47:41 pspldsea01p.fleet.ad systemd[1]: Starting DataStax Enterprise 1...
Mar 23 13:47:44 pspldsea01p.fleet.ad dse[31267]: nodetool: Failed to connect to '127.0.0.1:7199' - ConnectException: 'Connection refused'.
Mar 23 13:47:57 pspldsea01p.fleet.ad systemd[1]: Started DataStax Enterprise 1.
Mar 23 13:47:57 pspldsea01p.fleet.ad systemd[1]: Starting DataStax Enterprise 1...
Mar 23 13:48:01 pspldsea01p.fleet.ad dse[32004]: nodetool: Failed to connect to '127.0.0.1:7199' - ConnectException: 'Connection refused'.
Hint: Some lines were ellipsized, use -l to show in full.
If I just execute /opt/dse/dse1/bin/dse cassandra -k as the cassandra user, it works fine.
I can't seem to find any additional logging in the normal logging locations or with sudo journalctl -u dse1
Any ideas? Thanks!
It is unfortunate that DataStax Enterprise doesn't come with a systemd service file for use with systemctl. However, it does come with an init script. Full documentation is available in the DataStax docs.
Basically you have two options. The first one is to use the init.d directly, by starting the service:
sudo service dse start
I am, however, too used to systemctl now to go back to that, so this is my systemd service file:
[Unit]
Description=DataStax Enterprise
After=network.target
[Service]
PIDFile=/var/run/dse/dse.pid
ExecStart=/etc/init.d/dse start
ExecStop=/etc/init.d/dse stop
SuccessExitStatus=143
TimeoutSec=300
[Install]
WantedBy=multi-user.target
The init script has many configuration options. For the sake of simplicity, it may be wise to just use those directly in the script. For example, you specify the user in your systemd service file. That was giving me problems until I noticed that the user is already specified in the script. No need to duplicate options.
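The SuccessExitStatus=143 option is a common configuration for Java applications: a JVM stopped with SIGTERM exits with status 143 (128 + 15), which systemd would otherwise record as a failure.
You may have to adapt the location of the script if you didn't install DSE with your package manager.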
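One possible variation, which is not part of the answer above: systemd only uses PIDFile= for services of Type=forking, so if the init script daemonizes and writes /var/run/dse/dse.pid, declaring that type explicitly may help systemd track the right process. The sketch below prints such a unit. The function name print_dse_unit is hypothetical, and whether the init script really forks and writes that PID file is an assumption to verify on your system.

```shell
# Print a variant of the unit file with Type=forking declared explicitly.
# Assumption: /etc/init.d/dse daemonizes and writes the PID file named below.
print_dse_unit() {
    cat <<'EOF'
[Unit]
Description=DataStax Enterprise
After=network.target

[Service]
Type=forking
PIDFile=/var/run/dse/dse.pid
ExecStart=/etc/init.d/dse start
ExecStop=/etc/init.d/dse stop
SuccessExitStatus=143
TimeoutSec=300

[Install]
WantedBy=multi-user.target
EOF
}

# Typical use as root (not run here; the target file name is an assumption):
#   print_dse_unit > /etc/systemd/system/dse.service
#   systemctl daemon-reload
```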
Adding this late in case it is still useful.
This is the unit from the question with one additional parameter, RemainAfterExit:
/etc/systemd/system/dse1.service
[Unit]
Description=DataStax Enterprise
[Service]
User=cassandra
RemainAfterExit=yes
ExecStart=/opt/dse/dse1/bin/dse cassandra -k
ExecStop=/opt/dse/dse1/bin/dse cassandra-stop
[Install]
WantedBy=multi-user.target

Redis - Monit does not start properly

Monit cannot start redis-server properly.
Running Redis through init.d works correctly:
$ sudo su
$ /etc/init.d/redis_6379 start
$ #=> Starting Redis server...
$ ps aux | grep redis
$ #=> root 8980 0.0 0.0 42128 1964 ? Ssl 04:56 0:00 /etc/redis/src/redis-server *:6379
$ /etc/init.d/redis_6379 stop
$ #=> Stopping ...
$ #=> Redis stopped
$ #=> (ps aux| grep redis) There's no redis process.
Running Redis through Monit does not work correctly:
(I killed the Redis process and rm /var/run/redis_6379.pid)
$ sudo su
$ monit start redis
$ ps aux | grep redis
$ #=> root 9082 0.0 0.0 35076 1972 ? Ssl 05:08 0:00 /etc/redis/src/redis-server *:6379
monit.log:
[MSK Jan 6 05:08:14] info : 'redis' start on user request
[MSK Jan 6 05:08:14] info : monit daemon at 3947 awakened
[MSK Jan 6 05:08:14] info : Awakened by User defined signal 1
[MSK Jan 6 05:08:14] info : 'redis' start: /etc/init.d/redis_6379
[MSK Jan 6 05:08:44] error : 'redis' failed to start
[MSK Jan 6 05:08:44] info : 'redis' start action done
Stopping Redis through Monit does not work correctly either:
$ ps aux | grep redis
$ #=> root 9018 0.0 0.0 35076 1968 ? Ssl 05:02 0:00 /etc/redis/src/redis-server *:6379
$ monit stop redis
$ ps aux | grep redis
$ #=> root 9082 0.0 0.0 35076 1972 ? Ssl 05:08 0:00 /etc/redis/src/redis-server *:6379
monit.log
[MSK Jan 6 05:10:02] info : 'redis' stop on user request
[MSK Jan 6 05:10:02] info : monit daemon at 3947 awakened
[MSK Jan 6 05:10:02] info : Awakened by User defined signal 1
[MSK Jan 6 05:10:02] info : 'redis' stop action done
I have:
Ubuntu 12.04.3 LTS
redis-2.8.2
monit-5.3.2
redis installation path /etc/redis
monit installation path /etc/monit (installed from apt-get repo)
And following config files:
https://gist.github.com/itsNikolay/665112df34d2eae09330
I had the same problem, and there is not much discussion of this situation around. I fixed it with a different solution, which may help someone else, so I am posting it here.
In the Monit configuration file I had:
start program = "/etc/init.d/redis start"
stop program = "/etc/init.d/redis stop"
Replacing it with the following fixed the problem (on Ubuntu):
start program = "/usr/sbin/service redis start"
stop program = "/usr/sbin/service redis stop"
Just change the owner of the /etc/redis directory:
$ chown -R root /etc/redis
and restart Monit:
$ monit restart
The problem is gone. Strange. I hope it helps.
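One way to investigate failures like this, as a sketch: Monit starts programs with a stripped-down environment, so a script that works from an interactive root shell can still fail under Monit. The helper below runs a command with a similarly bare environment so the failure can be reproduced by hand. The name run_like_monit is made up, and the PATH value is an assumption based on Monit's documentation, so check it against your Monit version.

```shell
# Run a command string with an empty environment and a minimal PATH,
# roughly the way Monit would launch a start or stop program.
run_like_monit() {
    env -i PATH=/bin:/usr/bin:/sbin:/usr/sbin /bin/sh -c "$1"
}

# Example as root (not run here):
#   run_like_monit '/etc/init.d/redis_6379 start'; echo "exit status: $?"
```

If the init script fails here but works from your normal shell, the difference is likely an environment variable or PATH entry that the script depends on.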

Apache not making core dump even though kill -11 makes one

Hope this is something simple, but maybe not.
I have 4 RHEL5 web boxes that are set up behind a load balancer. All serve the majority of their content off of an NFS share.
Occasionally (twice a day or less) I'll see a note in the syslog from the kernel about a segfault from apache:
/var/log/messages.2:Sep 13 14:09:14 20050lpweb01 kernel: httpd[10006]: segfault at 00007fffae2eede8 rip 00002ab21a4045d4 rsp 00007fffae2eedd0 error 6
Sometimes, this is accompanied by a message to the apache error log about it as well, but not always:
# grep -ic seg /var/log/messages* |egrep -v '0$'
/var/log/messages.2:1
/var/log/messages.3:2
/var/log/messages.4:4
# zgrep -ic seg /var/log/httpd/error_log* |egrep -v '0$'
/var/log/httpd/error_log:1
/var/log/httpd/error_log.10.gz:1
/var/log/httpd/error_log.17.gz:1
/var/log/httpd/error_log.19.gz:1
/var/log/httpd/error_log.23.gz:1
/var/log/httpd/error_log.24.gz:2
/var/log/httpd/error_log.25.gz:1
/var/log/httpd/error_log.28.gz:2
/var/log/httpd/error_log.30.gz:1
/var/log/httpd/error_log.31.gz:1
/var/log/httpd/error_log.35.gz:1
/var/log/httpd/error_log.39.gz:4
/var/log/httpd/error_log.42.gz:1
/var/log/httpd/error_log.44.gz:3
/var/log/httpd/error_log.46.gz:1
I've set up core dumping per the instructions found all over the web:
echo "ulimit -c unlimited >/dev/null 2>&1" >> /etc/profile
echo "DAEMON_COREFILE_LIMIT='unlimited'" >> /etc/sysconfig/init
echo 1 > /proc/sys/fs/suid_dumpable
echo "core.%p" > /proc/sys/kernel/core_pattern
echo "CoreDumpDirectory /home/coredump" > /etc/httpd/conf.d/core_dumps.conf
mkdir /home/coredump
chown apache: /home/coredump
source /etc/profile
service httpd stop
service httpd start
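To confirm that these settings actually reached the running processes, it can help to read the limits back from /proc. This is only a sketch: core_limit is a made-up helper name, and it assumes the kernel exposes /proc/<pid>/limits, which older kernels may not.

```shell
# Print the soft and hard core-file size limits of one process.
# $1: PID of the process to inspect
core_limit() {
    awk '/^Max core file size/ { print $5, $6 }' "/proc/$1/limits"
}

# Example (not run here): check every running httpd process,
# then show the kernel-wide settings written above.
#   for pid in $(pgrep -x httpd); do
#       printf '%s: %s\n' "$pid" "$(core_limit "$pid")"
#   done
#   cat /proc/sys/fs/suid_dumpable /proc/sys/kernel/core_pattern
```

A soft limit of 0 on an httpd child means that process will not write a core file regardless of the CoreDumpDirectory setting.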
When I "induce" a segfault (kill -11 on an httpd child PID), the coredump is generated:
/var/log/httpd/error_log:[Sat Sep 15 20:43:32 2012] [notice] child pid 20746 exit signal Segmentation fault (11), possible coredump in /home/coredumps
But when the segfault occurs on its own, no coredump is made:
/var/log/httpd/error_log:[Sat Sep 15 12:03:44 2012] [notice] child pid 10652 exit signal Segmentation fault (11)
Why is this happening and how can I make sure the core dump happens every time?
We are running PHP 5.2 but other than that, everything is installed from standard RHEL or EPEL repos.