I have a Proxmox cluster of 3 nodes (node1, node2, node3), but 2 days ago the cluster started intermittently losing node2. The node itself is active: its VMs keep running, I have access via SSH and the web UI, and all nodes can see it on the network. The problem prevents VM replication and VM migration.
Log from node2:
Oct 28 08:08:00 node2 systemd[1]: Starting Proxmox VE replication runner...
Oct 28 08:08:00 node2 pvesr[50099]: trying to acquire cfs lock 'file-replication_cfg' ...
Oct 28 08:08:01 node2 pvestatd[3594]: storage 'nfs-back' is not online
Oct 28 08:08:01 node2 corosync[3573]: [KNET ] rx: host: 4 link: 0 is up
Oct 28 08:08:01 node2 corosync[3573]: [KNET ] host: host: 4 (passive) best link: 0 (pri: 1)
Oct 28 08:08:01 node2 pvesr[50099]: trying to acquire cfs lock 'file-replication_cfg' ...
Oct 28 08:08:02 node2 pvesr[50099]: trying to acquire cfs lock 'file-replication_cfg' ...
Oct 28 08:08:03 node2 corosync[3573]: [TOTEM ] A new membership (2.bb9e) was formed. Members
Oct 28 08:08:03 node2 corosync[3573]: [CPG ] downlist left_list: 0 received
Oct 28 08:08:03 node2 corosync[3573]: [QUORUM] Members[1]: 2
Oct 28 08:08:03 node2 corosync[3573]: [MAIN ] Completed service synchronization, ready to provide service.
Oct 28 08:08:03 node2 pvesr[50099]: trying to acquire cfs lock 'file-replication_cfg' ...
Oct 28 08:08:04 node2 pvesr[50099]: trying to acquire cfs lock 'file-replication_cfg' ...
Oct 28 08:08:05 node2 corosync[3573]: [TOTEM ] A new membership (2.bba2) was formed. Members
Oct 28 08:08:05 node2 corosync[3573]: [CPG ] downlist left_list: 0 received
Oct 28 08:08:05 node2 corosync[3573]: [QUORUM] Members[1]: 2
Oct 28 08:08:05 node2 corosync[3573]: [MAIN ] Completed service synchronization, ready to provide service.
Oct 28 08:08:05 node2 pvesr[50099]: trying to acquire cfs lock 'file-replication_cfg' ...
Oct 28 08:08:06 node2 pvesr[50099]: trying to acquire cfs lock 'file-replication_cfg' ...
Oct 28 08:08:07 node2 corosync[3573]: [TOTEM ] A new membership (2.bba6) was formed. Members
Oct 28 08:08:07 node2 corosync[3573]: [CPG ] downlist left_list: 0 received
Oct 28 08:08:07 node2 corosync[3573]: [QUORUM] Members[1]: 2
Oct 28 08:08:07 node2 corosync[3573]: [MAIN ] Completed service synchronization, ready to provide service.
Oct 28 08:08:07 node2 pvesr[50099]: trying to acquire cfs lock 'file-replication_cfg' ...
Oct 28 08:08:08 node2 pvesr[50099]: trying to acquire cfs lock 'file-replication_cfg' ...
Oct 28 08:08:09 node2 corosync[3573]: [TOTEM ] A new membership (2.bbaa) was formed. Members
Oct 28 08:08:09 node2 corosync[3573]: [CPG ] downlist left_list: 0 received
Oct 28 08:08:09 node2 corosync[3573]: [QUORUM] Members[1]: 2
Oct 28 08:08:09 node2 corosync[3573]: [MAIN ] Completed service synchronization, ready to provide service.
Oct 28 08:08:09 node2 pvesr[50099]: error with cfs lock 'file-replication_cfg': no quorum!
Oct 28 08:08:09 node2 systemd[1]: pvesr.service: Main process exited, code=exited, status=13/n/a
Oct 28 08:08:09 node2 systemd[1]: pvesr.service: Failed with result 'exit-code'.
Oct 28 08:08:09 node2 systemd[1]: Failed to start Proxmox VE replication runner.
Oct 28 08:08:10 node2 pvestatd[3594]: storage 'nfs-back' is not online
Oct 28 08:08:11 node2 corosync[3573]: [TOTEM ] A new membership (2.bbae) was formed. Members
Oct 28 08:08:11 node2 corosync[3573]: [CPG ] downlist left_list: 0 received
Oct 28 08:08:11 node2 corosync[3573]: [QUORUM] Members[1]: 2
Oct 28 08:08:11 node2 corosync[3573]: [MAIN ] Completed service synchronization, ready to provide service.
Oct 28 08:08:13 node2 corosync[3573]: [TOTEM ] A new membership (2.bbb2) was formed. Members
Oct 28 08:08:13 node2 corosync[3573]: [CPG ] downlist left_list: 0 received
Oct 28 08:08:13 node2 corosync[3573]: [QUORUM] Members[1]: 2
Oct 28 08:08:13 node2 corosync[3573]: [MAIN ] Completed service synchronization, ready to provide service.
Oct 28 08:08:15 node2 corosync[3573]: [TOTEM ] A new membership (2.bbb6) was formed. Members
Oct 28 08:08:15 node2 corosync[3573]: [CPG ] downlist left_list: 0 received
Oct 28 08:08:15 node2 corosync[3573]: [QUORUM] Members[1]: 2
Oct 28 08:08:15 node2 corosync[3573]: [MAIN ] Completed service synchronization, ready to provide service.
Oct 28 08:08:17 node2 corosync[3573]: [TOTEM ] A new membership (2.bbba) was formed. Members
Oct 28 08:08:17 node2 corosync[3573]: [CPG ] downlist left_list: 0 received
Oct 28 08:08:17 node2 corosync[3573]: [QUORUM] Members[1]: 2
Oct 28 08:08:17 node2 corosync[3573]: [MAIN ] Completed service synchronization, ready to provide service.
Oct 28 08:08:19 node2 corosync[3573]: [TOTEM ] A new membership (2.bbbe) was formed. Members
Oct 28 08:08:19 node2 corosync[3573]: [CPG ] downlist left_list: 0 received
Oct 28 08:08:19 node2 corosync[3573]: [QUORUM] Members[1]: 2
Oct 28 08:08:19 node2 corosync[3573]: [MAIN ] Completed service synchronization, ready to provide service.
Oct 28 08:08:21 node2 corosync[3573]: [TOTEM ] A new membership (2.bbc2) was formed. Members
Oct 28 08:08:21 node2 corosync[3573]: [CPG ] downlist left_list: 0 received
Oct 28 08:08:21 node2 corosync[3573]: [QUORUM] Members[1]: 2
Oct 28 08:08:21 node2 corosync[3573]: [MAIN ] Completed service synchronization, ready to provide service.
Oct 28 08:08:23 node2 corosync[3573]: [TOTEM ] A new membership (2.bbc6) was formed. Members
Oct 28 08:08:23 node2 corosync[3573]: [CPG ] downlist left_list: 0 received
Oct 28 08:08:23 node2 corosync[3573]: [QUORUM] Members[1]: 2
Oct 28 08:08:23 node2 corosync[3573]: [MAIN ] Completed service synchronization, ready to provide service.
Oct 28 08:08:25 node2 corosync[3573]: [TOTEM ] A new membership (2.bbca) was formed. Members
Oct 28 08:08:25 node2 corosync[3573]: [CPG ] downlist left_list: 0 received
Oct 28 08:08:25 node2 corosync[3573]: [QUORUM] Members[1]: 2
Oct 28 08:08:25 node2 corosync[3573]: [MAIN ] Completed service synchronization, ready to provide service.
Oct 28 08:08:27 node2 corosync[3573]: [TOTEM ] A new membership (2.bbce) was formed. Members
Oct 28 08:08:27 node2 corosync[3573]: [CPG ] downlist left_list: 0 received
Oct 28 08:08:27 node2 corosync[3573]: [QUORUM] Members[1]: 2
Oct 28 08:08:27 node2 corosync[3573]: [MAIN ] Completed service synchronization, ready to provide service.
Oct 28 08:08:27 node2 corosync[3573]: [KNET ] rx: host: 1 link: 0 is up
Oct 28 08:08:27 node2 corosync[3573]: [KNET ] host: host: 1 (passive) best link: 0 (pri: 1)
Oct 28 08:08:27 node2 corosync[3573]: [TOTEM ] A new membership (1.bbd2) was formed. Members joined: 1 4
Oct 28 08:08:27 node2 corosync[3573]: [CPG ] downlist left_list: 0 received
Oct 28 08:08:27 node2 corosync[3573]: [CPG ] downlist left_list: 0 received
Oct 28 08:08:27 node2 corosync[3573]: [CPG ] downlist left_list: 0 received
Oct 28 08:08:27 node2 pmxcfs[3565]: [dcdb] notice: members: 1/2737, 2/3565, 4/14943
Oct 28 08:08:27 node2 pmxcfs[3565]: [dcdb] notice: starting data syncronisation
Oct 28 08:08:27 node2 pmxcfs[3565]: [status] notice: members: 1/2737, 2/3565, 4/14943
Oct 28 08:08:27 node2 pmxcfs[3565]: [status] notice: starting data syncronisation
Oct 28 08:08:27 node2 corosync[3573]: [QUORUM] This node is within the primary component and will provide service.
Oct 28 08:08:27 node2 corosync[3573]: [QUORUM] Members[3]: 1 2 4
Oct 28 08:08:27 node2 corosync[3573]: [MAIN ] Completed service synchronization, ready to provide service.
Oct 28 08:08:27 node2 pmxcfs[3565]: [status] notice: node has quorum
Oct 28 08:08:27 node2 pmxcfs[3565]: [dcdb] notice: received sync request (epoch 1/2737/00000CB2)
Oct 28 08:08:27 node2 pmxcfs[3565]: [status] notice: received sync request (epoch 1/2737/00000CDA)
Oct 28 08:08:27 node2 pmxcfs[3565]: [dcdb] notice: received all states
Oct 28 08:08:27 node2 pmxcfs[3565]: [dcdb] notice: leader is 1/2737
Oct 28 08:08:27 node2 pmxcfs[3565]: [dcdb] notice: synced members: 1/2737, 4/14943
Oct 28 08:08:27 node2 pmxcfs[3565]: [dcdb] notice: waiting for updates from leader
Oct 28 08:08:27 node2 pmxcfs[3565]: [dcdb] notice: dfsm_deliver_queue: queue length 6
Oct 28 08:08:27 node2 pmxcfs[3565]: [status] notice: received all states
Oct 28 08:08:27 node2 pmxcfs[3565]: [status] notice: all data is up to date
Oct 28 08:08:27 node2 pmxcfs[3565]: [status] notice: dfsm_deliver_queue: queue length 3
Oct 28 08:08:27 node2 pmxcfs[3565]: [dcdb] notice: update complete - trying to commit (got 8 inode updates)
Oct 28 08:08:27 node2 pmxcfs[3565]: [dcdb] notice: all data is up to date
Oct 28 08:08:27 node2 pmxcfs[3565]: [dcdb] notice: dfsm_deliver_sync_queue: queue length 6
Oct 28 08:08:37 node2 pve-ha-crm[3627]: status change wait_for_quorum => slave
Oct 28 08:08:48 node2 corosync[3573]: [KNET ] link: host: 4 link: 0 is down
Oct 28 08:08:48 node2 corosync[3573]: [KNET ] link: host: 1 link: 0 is down
Oct 28 08:08:48 node2 corosync[3573]: [KNET ] host: host: 4 (passive) best link: 0 (pri: 1)
Oct 28 08:08:48 node2 corosync[3573]: [KNET ] host: host: 4 has no active links
Oct 28 08:08:48 node2 corosync[3573]: [KNET ] host: host: 1 (passive) best link: 0 (pri: 1)
Oct 28 08:08:48 node2 corosync[3573]: [KNET ] host: host: 1 has no active links
Oct 28 08:08:48 node2 corosync[3573]: [TOTEM ] Token has not been received in 1237 ms
Oct 28 08:08:49 node2 corosync[3573]: [TOTEM ] A processor failed, forming new configuration.
Oct 28 08:08:50 node2 pvestatd[3594]: storage 'nfs-back' is not online
Oct 28 08:08:51 node2 corosync[3573]: [TOTEM ] A new membership (2.bbd6) was formed. Members left: 1 4
Oct 28 08:08:51 node2 corosync[3573]: [TOTEM ] Failed to receive the leave message. failed: 1 4
Oct 28 08:08:51 node2 corosync[3573]: [CPG ] downlist left_list: 2 received
Oct 28 08:08:51 node2 pmxcfs[3565]: [dcdb] notice: members: 2/3565
Oct 28 08:08:51 node2 corosync[3573]: [QUORUM] This node is within the non-primary component and will NOT provide any services.
Oct 28 08:08:51 node2 corosync[3573]: [QUORUM] Members[1]: 2
Oct 28 08:08:51 node2 pmxcfs[3565]: [status] notice: members: 2/3565
Oct 28 08:08:51 node2 corosync[3573]: [MAIN ] Completed service synchronization, ready to provide service.
Oct 28 08:08:51 node2 pmxcfs[3565]: [status] notice: node lost quorum
Oct 28 08:08:51 node2 pmxcfs[3565]: [dcdb] crit: received write while not quorate - trigger resync
Oct 28 08:08:51 node2 pmxcfs[3565]: [dcdb] crit: leaving CPG group
Oct 28 08:08:51 node2 pmxcfs[3565]: [dcdb] notice: start cluster connection
Oct 28 08:08:51 node2 pmxcfs[3565]: [dcdb] crit: cpg_join failed: 14
Oct 28 08:08:51 node2 pmxcfs[3565]: [dcdb] crit: can't initialize service
Oct 28 08:08:53 node2 pve-ha-crm[3627]: status change slave => wait_for_quorum
Oct 28 08:08:57 node2 pmxcfs[3565]: [dcdb] notice: members: 2/3565
Oct 28 08:08:57 node2 pmxcfs[3565]: [dcdb] notice: all data is up to date
Oct 28 08:09:00 node2 pvestatd[3594]: storage 'nfs-back' is not online
Oct 28 08:09:00 node2 systemd[1]: Starting Proxmox VE replication runner...
Oct 28 08:09:00 node2 pvesr[51468]: trying to acquire cfs lock 'file-replication_cfg' ...
Oct 28 08:09:01 node2 pvesr[51468]: trying to acquire cfs lock 'file-replication_cfg' ...
Oct 28 08:09:02 node2 pvesr[51468]: trying to acquire cfs lock 'file-replication_cfg' ...
Oct 28 08:09:03 node2 pvesr[51468]: trying to acquire cfs lock 'file-replication_cfg' ...
Oct 28 08:09:04 node2 pvesr[51468]: trying to acquire cfs lock 'file-replication_cfg' ...
Oct 28 08:09:05 node2 pvesr[51468]: trying to acquire cfs lock 'file-replication_cfg' ...
Oct 28 08:09:06 node2 pvesr[51468]: trying to acquire cfs lock 'file-replication_cfg' ...
Oct 28 08:09:07 node2 pvesr[51468]: trying to acquire cfs lock 'file-replication_cfg' ...
Oct 28 08:09:08 node2 pvesr[51468]: trying to acquire cfs lock 'file-replication_cfg' ...
Oct 28 08:09:09 node2 pvesr[51468]: error with cfs lock 'file-replication_cfg': no quorum!
Oct 28 08:09:09 node2 systemd[1]: pvesr.service: Main process exited, code=exited, status=13/n/a
Oct 28 08:09:09 node2 systemd[1]: pvesr.service: Failed with result 'exit-code'.
Oct 28 08:09:09 node2 systemd[1]: Failed to start Proxmox VE replication runner.
Oct 28 08:09:10 node2 pvestatd[3594]: storage 'nfs-back' is not online
Oct 28 08:09:19 node2 corosync[3573]: [KNET ] rx: host: 1 link: 0 is up
Oct 28 08:09:19 node2 corosync[3573]: [KNET ] host: host: 1 (passive) best link: 0 (pri: 1)
Oct 28 08:09:20 node2 pvestatd[3594]: storage 'nfs-back' is not online
Oct 28 08:09:20 node2 corosync[3573]: [TOTEM ] Token has not been received in 1439 ms
Oct 28 08:09:22 node2 corosync[3573]: [TOTEM ] Token has not been received in 3089 ms
Oct 28 08:09:25 node2 corosync[3573]: [TOTEM ] A new membership (2.bbe2) was formed. Members
Oct 28 08:09:25 node2 corosync[3573]: [CPG ] downlist left_list: 0 received
Oct 28 08:09:25 node2 corosync[3573]: [QUORUM] Members[1]: 2
Oct 28 08:09:25 node2 corosync[3573]: [MAIN ] Completed service synchronization, ready to provide service.
Oct 28 08:09:26 node2 corosync[3573]: [TOTEM ] Token has not been received in 1238 ms
Oct 28 08:09:27 node2 corosync[3573]: [TOTEM ] Token has not been received in 2888 ms
Oct 28 08:09:28 node2 corosync[3573]: [KNET ] rx: host: 4 link: 0 is up
Oct 28 08:09:28 node2 corosync[3573]: [KNET ] host: host: 4 (passive) best link: 0 (pri: 1)
Oct 28 08:09:28 node2 corosync[3573]: [TOTEM ] A new membership (2.bbf2) was formed. Members
Oct 28 08:09:28 node2 corosync[3573]: [CPG ] downlist left_list: 0 received
Oct 28 08:09:28 node2 corosync[3573]: [QUORUM] Members[1]: 2
Oct 28 08:09:28 node2 corosync[3573]: [MAIN ] Completed service synchronization, ready to provide service.
Oct 28 08:09:28 node2 corosync[3573]: [TOTEM ] A new membership (1.bbf6) was formed. Members joined: 1 4
Oct 28 08:09:28 node2 corosync[3573]: [CPG ] downlist left_list: 0 received
Oct 28 08:09:28 node2 corosync[3573]: [CPG ] downlist left_list: 0 received
Oct 28 08:09:28 node2 corosync[3573]: [CPG ] downlist left_list: 0 received
Oct 28 08:09:28 node2 pmxcfs[3565]: [dcdb] notice: members: 1/2737, 2/3565, 4/14943
Oct 28 08:09:28 node2 pmxcfs[3565]: [dcdb] notice: starting data syncronisation
Oct 28 08:09:28 node2 pmxcfs[3565]: [status] notice: members: 1/2737, 2/3565, 4/14943
Oct 28 08:09:28 node2 pmxcfs[3565]: [status] notice: starting data syncronisation
Oct 28 08:09:28 node2 corosync[3573]: [QUORUM] This node is within the primary component and will provide service.
Oct 28 08:09:28 node2 corosync[3573]: [QUORUM] Members[3]: 1 2 4
Oct 28 08:09:28 node2 corosync[3573]: [MAIN ] Completed service synchronization, ready to provide service.
Oct 28 08:09:28 node2 pmxcfs[3565]: [status] notice: node has quorum
Oct 28 08:09:29 node2 pmxcfs[3565]: [dcdb] notice: received sync request (epoch 1/2737/00000CB4)
Oct 28 08:09:29 node2 pmxcfs[3565]: [status] notice: received sync request (epoch 1/2737/00000CDC)
Oct 28 08:09:29 node2 pmxcfs[3565]: [dcdb] notice: received all states
Oct 28 08:09:29 node2 pmxcfs[3565]: [dcdb] notice: leader is 1/2737
Oct 28 08:09:29 node2 pmxcfs[3565]: [dcdb] notice: synced members: 1/2737, 4/14943
Oct 28 08:09:29 node2 pmxcfs[3565]: [dcdb] notice: waiting for updates from leader
Oct 28 08:09:29 node2 pmxcfs[3565]: [status] notice: received all states
Oct 28 08:09:29 node2 pmxcfs[3565]: [status] notice: all data is up to date
Oct 28 08:09:29 node2 pmxcfs[3565]: [dcdb] notice: update complete - trying to commit (got 7 inode updates)
Oct 28 08:09:29 node2 pmxcfs[3565]: [dcdb] notice: all data is up to date
Oct 28 08:09:38 node2 pve-ha-crm[3627]: status change wait_for_quorum => slave
Oct 28 08:09:53 node2 corosync[3573]: [KNET ] link: host: 4 link: 0 is down
Oct 28 08:09:53 node2 corosync[3573]: [KNET ] link: host: 1 link: 0 is down
Oct 28 08:09:53 node2 corosync[3573]: [KNET ] host: host: 4 (passive) best link: 0 (pri: 1)
Oct 28 08:09:53 node2 corosync[3573]: [KNET ] host: host: 4 has no active links
Oct 28 08:09:53 node2 corosync[3573]: [KNET ] host: host: 1 (passive) best link: 0 (pri: 1)
Oct 28 08:09:53 node2 corosync[3573]: [KNET ] host: host: 1 has no active links
Oct 28 08:09:54 node2 corosync[3573]: [TOTEM ] Token has not been received in 1237 ms
Oct 28 08:09:55 node2 corosync[3573]: [TOTEM ] A processor failed, forming new configuration.
Oct 28 08:09:57 node2 corosync[3573]: [TOTEM ] A new membership (2.bbfa) was formed. Members left: 1 4
Oct 28 08:09:57 node2 corosync[3573]: [TOTEM ] Failed to receive the leave message. failed: 1 4
Oct 28 08:09:57 node2 corosync[3573]: [CPG ] downlist left_list: 2 received
Oct 28 08:09:57 node2 pmxcfs[3565]: [dcdb] notice: members: 2/3565
Oct 28 08:09:57 node2 corosync[3573]: [QUORUM] This node is within the non-primary component and will NOT provide any services.
Oct 28 08:09:57 node2 corosync[3573]: [QUORUM] Members[1]: 2
Oct 28 08:09:57 node2 pmxcfs[3565]: [status] notice: members: 2/3565
Oct 28 08:09:57 node2 corosync[3573]: [MAIN ] Completed service synchronization, ready to provide service.
Oct 28 08:09:57 node2 pmxcfs[3565]: [status] notice: node lost quorum
Oct 28 08:09:57 node2 pmxcfs[3565]: [dcdb] crit: received write while not quorate - trigger resync
Oct 28 08:09:57 node2 pmxcfs[3565]: [dcdb] crit: leaving CPG group
Oct 28 08:09:57 node2 pmxcfs[3565]: [dcdb] notice: start cluster connection
Oct 28 08:09:57 node2 pmxcfs[3565]: [dcdb] crit: cpg_join failed: 14
Oct 28 08:09:57 node2 pmxcfs[3565]: [dcdb] crit: can't initialize service
Oct 28 08:09:57 node2 pve-ha-lrm[3636]: unable to write lrm status file - unable to open file '/etc/pve/nodes/node2/lrm_status.tmp.3636' - Permission denied
Oct 28 08:10:00 node2 systemd[1]: Starting Proxmox VE replication runner...
Oct 28 08:10:00 node2 pvesr[52774]: trying to acquire cfs lock 'file-replication_cfg' ...
Oct 28 08:10:01 node2 pvestatd[3594]: storage 'nfs-back' is not online
Oct 28 08:10:01 node2 pvesr[52774]: trying to acquire cfs lock 'file-replication_cfg' ...
Oct 28 08:10:02 node2 pve-ha-crm[3627]: status change slave => wait_for_quorum
Oct 28 08:10:02 node2 pvesr[52774]: trying to acquire cfs lock 'file-replication_cfg' ...
Oct 28 08:10:03 node2 pmxcfs[3565]: [dcdb] notice: members: 2/3565
Oct 28 08:10:03 node2 pmxcfs[3565]: [dcdb] notice: all data is up to date
Oct 28 08:10:03 node2 pvesr[52774]: trying to acquire cfs lock 'file-replication_cfg' ...
Oct 28 08:10:04 node2 pvesr[52774]: trying to acquire cfs lock 'file-replication_cfg' ...
Oct 28 08:10:05 node2 pvesr[52774]: trying to acquire cfs lock 'file-replication_cfg' ...
Oct 28 08:10:06 node2 pvesr[52774]: trying to acquire cfs lock 'file-replication_cfg' ...
Oct 28 08:10:07 node2 pvesr[52774]: trying to acquire cfs lock 'file-replication_cfg' ...
Oct 28 08:10:08 node2 pvesr[52774]: trying to acquire cfs lock 'file-replication_cfg' ...
Oct 28 08:10:09 node2 pvesr[52774]: error with cfs lock 'file-replication_cfg': no quorum!
Oct 28 08:10:09 node2 systemd[1]: pvesr.service: Main process exited, code=exited, status=13/n/a
Oct 28 08:10:09 node2 systemd[1]: pvesr.service: Failed with result 'exit-code'.
Oct 28 08:10:09 node2 systemd[1]: Failed to start Proxmox VE replication runner.
Oct 28 08:10:10 node2 pvestatd[3594]: storage 'nfs-back' is not online
Oct 28 08:10:10 node2 corosync[3573]: [KNET ] rx: host: 4 link: 0 is up
Oct 28 08:10:10 node2 corosync[3573]: [KNET ] host: host: 4 (passive) best link: 0 (pri: 1)
Oct 28 08:10:13 node2 corosync[3573]: [TOTEM ] A new membership (2.bbfe) was formed. Members
Oct 28 08:10:13 node2 corosync[3573]: [CPG ] downlist left_list: 0 received
Oct 28 08:10:13 node2 corosync[3573]: [QUORUM] Members[1]: 2
Oct 28 08:10:13 node2 corosync[3573]: [MAIN ] Completed service synchronization, ready to provide service.
Oct 28 08:10:15 node2 corosync[3573]: [TOTEM ] A new membership (2.bc02) was formed. Members
Oct 28 08:10:15 node2 corosync[3573]: [CPG ] downlist left_list: 0 received
Oct 28 08:10:15 node2 corosync[3573]: [QUORUM] Members[1]: 2
Oct 28 08:10:15 node2 corosync[3573]: [MAIN ] Completed service synchronization, ready to provide service.
Oct 28 08:10:17 node2 corosync[3573]: [TOTEM ] A new membership (2.bc06) was formed. Members
Oct 28 08:10:17 node2 corosync[3573]: [CPG ] downlist left_list: 0 received
Oct 28 08:10:17 node2 corosync[3573]: [QUORUM] Members[1]: 2
Oct 28 08:10:17 node2 corosync[3573]: [MAIN ] Completed service synchronization, ready to provide service.
Oct 28 08:10:19 node2 corosync[3573]: [TOTEM ] A new membership (2.bc0a) was formed. Members
Oct 28 08:10:19 node2 corosync[3573]: [CPG ] downlist left_list: 0 received
Oct 28 08:10:19 node2 corosync[3573]: [QUORUM] Members[1]: 2
Oct 28 08:10:19 node2 corosync[3573]: [MAIN ] Completed service synchronization, ready to provide service.
Oct 28 08:10:20 node2 corosync[3573]: [KNET ] rx: host: 1 link: 0 is up
Oct 28 08:10:20 node2 corosync[3573]: [KNET ] host: host: 1 (passive) best link: 0 (pri: 1)
Oct 28 08:10:20 node2 corosync[3573]: [TOTEM ] A new membership (1.bc0e) was formed. Members joined: 1 4
Oct 28 08:10:20 node2 corosync[3573]: [CPG ] downlist left_list: 0 received
Oct 28 08:10:20 node2 corosync[3573]: [CPG ] downlist left_list: 0 received
Oct 28 08:10:20 node2 corosync[3573]: [CPG ] downlist left_list: 0 received
Oct 28 08:10:20 node2 pmxcfs[3565]: [dcdb] notice: members: 1/2737, 2/3565, 4/14943
Oct 28 08:10:20 node2 pmxcfs[3565]: [dcdb] notice: starting data syncronisation
Oct 28 08:10:20 node2 pmxcfs[3565]: [status] notice: members: 1/2737, 2/3565, 4/14943
Oct 28 08:10:20 node2 pmxcfs[3565]: [status] notice: starting data syncronisation
Oct 28 08:10:20 node2 corosync[3573]: [QUORUM] This node is within the primary component and will provide service.
Oct 28 08:10:20 node2 corosync[3573]: [QUORUM] Members[3]: 1 2 4
Oct 28 08:10:20 node2 corosync[3573]: [MAIN ] Completed service synchronization, ready to provide service.
Oct 28 08:10:20 node2 pmxcfs[3565]: [status] notice: node has quorum
Oct 28 08:10:20 node2 pmxcfs[3565]: [dcdb] notice: received sync request (epoch 1/2737/00000CB6)
Oct 28 08:10:20 node2 pmxcfs[3565]: [status] notice: received sync request (epoch 1/2737/00000CDE)
Oct 28 08:10:20 node2 pmxcfs[3565]: [dcdb] notice: received all states
Oct 28 08:10:20 node2 pmxcfs[3565]: [dcdb] notice: leader is 1/2737
Oct 28 08:10:20 node2 pmxcfs[3565]: [dcdb] notice: synced members: 1/2737, 4/14943
Oct 28 08:10:20 node2 pmxcfs[3565]: [dcdb] notice: waiting for updates from leader
Oct 28 08:10:20 node2 pmxcfs[3565]: [dcdb] notice: dfsm_deliver_queue: queue length 5
Oct 28 08:10:20 node2 pmxcfs[3565]: [status] notice: received all states
Oct 28 08:10:20 node2 pmxcfs[3565]: [status] notice: all data is up to date
Oct 28 08:10:20 node2 pmxcfs[3565]: [status] notice: dfsm_deliver_queue: queue length 6
Oct 28 08:10:20 node2 pmxcfs[3565]: [dcdb] notice: update complete - trying to commit (got 7 inode updates)
Oct 28 08:10:20 node2 pmxcfs[3565]: [dcdb] notice: all data is up to date
Oct 28 08:10:20 node2 pmxcfs[3565]: [dcdb] notice: dfsm_deliver_sync_queue: queue length 5
Oct 28 08:10:27 node2 pve-ha-crm[3627]: status change wait_for_quorum => slave
Oct 28 08:11:00 node2 systemd[1]: Starting Proxmox VE replication runner...
Oct 28 08:11:00 node2 systemd[1]: pvesr.service: Succeeded.
Oct 28 08:11:00 node2 systemd[1]: Started Proxmox VE replication runner.
Oct 28 08:12:00 node2 systemd[1]: Starting Proxmox VE replication runner...
Oct 28 08:12:00 node2 systemd[1]: pvesr.service: Succeeded.
Oct 28 08:12:00 node2 systemd[1]: Started Proxmox VE replication runner.
Oct 28 08:12:01 node2 sshd[55659]: Accepted publickey for root from 172.16.100.10 port 60110 ssh2: RSA SHA256:6w/SKpMwegZ2NibghYuUTJWUZRg7k+PO9d2eyex6Rm0
Oct 28 08:12:01 node2 sshd[55659]: pam_unix(sshd:session): session opened for user root by (uid=0)
Oct 28 08:12:01 node2 systemd-logind[2958]: New session 38678 of user root.
Oct 28 08:12:01 node2 systemd[1]: Started Session 38678 of user root.
Oct 28 08:12:01 node2 sshd[55659]: Received disconnect from 172.16.100.10 port 60110:11: disconnected by user
Oct 28 08:12:01 node2 sshd[55659]: Disconnected from user root 172.16.100.10 port 60110
Oct 28 08:12:01 node2 sshd[55659]: pam_unix(sshd:session): session closed for user root
Oct 28 08:12:01 node2 systemd[1]: session-38678.scope: Succeeded.
Oct 28 08:12:01 node2 systemd-logind[2958]: Session 38678 logged out. Waiting for processes to exit.
Oct 28 08:12:01 node2 systemd-logind[2958]: Removed session 38678.
Oct 28 08:12:01 node2 sshd[55677]: Accepted publickey for root from 172.16.100.10 port 60116 ssh2: RSA SHA256:6w/SKpMwegZ2NibghYuUTJWUZRg7k+PO9d2eyex6Rm0
Oct 28 08:12:01 node2 sshd[55677]: pam_unix(sshd:session): session opened for user root by (uid=0)
Oct 28 08:12:01 node2 systemd-logind[2958]: New session 38679 of user root.
Oct 28 08:12:01 node2 systemd[1]: Started Session 38679 of user root.
Oct 28 08:12:02 node2 zed[55796]: eid=99008 class=history_event pool_guid=0xE8B91311C244C8FC
In syslog, pve-ha-lrm shows:
Oct 28 10:19:10 node2 systemd[1]: Started PVE Local HA Resource Manager Daemon.
Oct 28 10:20:17 node2 pve-ha-lrm[51076]: unable to write lrm status file - unable to open file '/etc/pve/nodes/node2/lrm_status.tmp.51076' - Permission denied
Oct 28 10:21:20 node2 pve-ha-lrm[51076]: unable to write lrm status file - unable to open file '/etc/pve/nodes/node2/lrm_status.tmp.51076' - Permission denied
What could it be?
Permission is probably denied because the node lost quorum. When quorum is lost, /etc/pve (the pmxcfs cluster filesystem) becomes read-only. Quorum is lost when a node cannot be reached by the other nodes, which matches the flapping corosync/knet links in your log.
You should look at pvecm status to verify this.
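As a minimal set of checks (run on node2 and on one of the other nodes; pvecm, corosync-cfgtool, journalctl and pvesm are standard tools, and nfs-back is the storage name from your log):
pvecm status                                          # quorum, expected votes, membership as this node sees it
corosync-cfgtool -s                                   # knet link status per ring; flapping links point at the cluster network
journalctl -u corosync -u pve-cluster --since "1 hour ago"   # correlate the link drops with the other nodes' logs
touch /etc/pve/test && rm /etc/pve/test               # /etc/pve is writable only while the node has quorum
pvesm status                                          # separate symptom: checks whether nfs-back is reachable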
Related
I run a container from the Docker image redis:latest, and after about 30 minutes the master changes to a slave and is no longer writable.
The slave also logs an error once per second saying it cannot reach its master.
1:M 08 Jul 2022 03:10:55.899 * DB saved on disk
1:M 08 Jul 2022 03:15:56.087 * 100 changes in 300 seconds. Saving...
1:M 08 Jul 2022 03:15:56.089 * Background saving started by pid 61
61:C 08 Jul 2022 03:15:56.091 * DB saved on disk
61:C 08 Jul 2022 03:15:56.092 * Fork CoW for RDB: current 0 MB, peak 0 MB, average 0 MB
1:M 08 Jul 2022 03:15:56.189 * Background saving terminated with success
1:S 08 Jul 2022 03:20:12.258 * Before turning into a replica, using my own master parameters to synthesize a cached master: I may be able to synchronize with the new master with just a partial transfer.
1:S 08 Jul 2022 03:20:12.258 * Connecting to MASTER 178.20.40.200:8886
1:S 08 Jul 2022 03:20:12.258 * MASTER <-> REPLICA sync started
1:S 08 Jul 2022 03:20:12.259 * REPLICAOF 178.20.40.200:8886 enabled (user request from 'id=39 addr=95.182.123.66:36904 laddr=172.31.9.234:6379 fd=11 name= age=1 idle=0 flags=N db=0 sub=0 psub=0 multi=-1 qbuf=47 qbuf-free=20427 argv-mem=24 multi-mem=0 rbs=1024 rbp=0 obl=0 oll=0 omem=0 tot-mem=22320 events=r cmd=slaveof user=default redir=-1 resp=2')
1:S 08 Jul 2022 03:20:12.524 * Non blocking connect for SYNC fired the event.
1:S 08 Jul 2022 03:20:12.791 * Master replied to PING, replication can continue...
1:S 08 Jul 2022 03:20:13.335 * Trying a partial resynchronization (request 6743ff015583c86f3ac7a4305026c42991a1ca18:1).
1:S 08 Jul 2022 03:20:13.603 * Full resync from master: ZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZ:1
1:S 08 Jul 2022 03:20:13.603 * MASTER <-> REPLICA sync: receiving 54976 bytes from master to disk
1:S 08 Jul 2022 03:20:14.138 * Discarding previously cached master state.
1:S 08 Jul 2022 03:20:14.138 * MASTER <-> REPLICA sync: Flushing old data
1:S 08 Jul 2022 03:20:14.139 * MASTER <-> REPLICA sync: Loading DB in memory
1:S 08 Jul 2022 03:20:14.140 # Wrong signature trying to load DB from file
1:S 08 Jul 2022 03:20:14.140 # Failed trying to load the MASTER synchronization DB from disk: Invalid argument
1:S 08 Jul 2022 03:20:14.140 * Reconnecting to MASTER 178.20.40.200:8886 after failure
1:S 08 Jul 2022 03:20:14.140 * MASTER <-> REPLICA sync started
...
1:S 08 Jul 2022 05:09:50.010 * MASTER <-> REPLICA sync started
1:S 08 Jul 2022 05:09:50.298 * Non blocking connect for SYNC fired the event.
1:S 08 Jul 2022 05:09:50.587 # Failed to read response from the server: Connection reset by peer
1:S 08 Jul 2022 05:09:50.587 # Master did not respond to command during SYNC handshake
1:S 08 Jul 2022 05:09:51.013 * Connecting to MASTER 178.20.40.200:8886
1:S 08 Jul 2022 05:09:51.014 * MASTER <-> REPLICA sync started
1:S 08 Jul 2022 05:09:51.294 * Non blocking connect for SYNC fired the event.
1:S 08 Jul 2022 05:09:51.581 # Failed to read response from the server: Connection reset by peer
1:S 08 Jul 2022 05:09:51.581 # Master did not respond to command during SYNC handshake
1:S 08 Jul 2022 05:09:52.017 * Connecting to MASTER 178.20.40.200:8886
1:S 08 Jul 2022 05:09:52.017 * MASTER <-> REPLICA sync started
1:S 08 Jul 2022 05:09:52.297 * Non blocking connect for SYNC fired the event.
1:S 08 Jul 2022 05:09:52.578 # Failed to read response from the server: Connection reset by peer
1:S 08 Jul 2022 05:09:52.578 # Master did not respond to command during SYNC handshake
1:S 08 Jul 2022 05:09:53.021 * Connecting to MASTER 178.20.40.200:8886
1:S 08 Jul 2022 05:09:53.021 * MASTER <-> REPLICA sync started
1:S 08 Jul 2022 05:09:53.308 * Non blocking connect for SYNC fired the event.
1:S 08 Jul 2022 05:09:53.594 # Failed to read response from the server: Connection reset by peer
1:S 08 Jul 2022 05:09:53.594 # Master did not respond to command during SYNC handshake
1:S 08 Jul 2022 05:09:54.025 * Connecting to MASTER 178.20.40.200:8886
1:S 08 Jul 2022 05:09:54.025 * MASTER <-> REPLICA sync started
1:S 08 Jul 2022 05:09:54.316 * Non blocking connect for SYNC fired the event.
1:S 08 Jul 2022 05:09:54.608 # Failed to read response from the server: Connection reset by peer
1:S 08 Jul 2022 05:09:54.608 # Master did not respond to command during SYNC handshake
1:S 08 Jul 2022 05:09:55.028 * Connecting to MASTER 178.20.40.200:8886
1:S 08 Jul 2022 05:09:55.028 * MASTER <-> REPLICA sync started
1:S 08 Jul 2022 05:09:55.309 * Non blocking connect for SYNC fired the event.
1:S 08 Jul 2022 05:09:55.588 # Failed to read response from the server: Connection reset by peer
1:S 08 Jul 2022 05:09:55.588 # Master did not respond to command during SYNC handshake
1:S 08 Jul 2022 05:09:56.031 * Connecting to MASTER 178.20.40.200:8886
1:S 08 Jul 2022 05:09:56.031 * MASTER <-> REPLICA sync started
1:S 08 Jul 2022 05:09:56.311 * Non blocking connect for SYNC fired the event.
1:S 08 Jul 2022 05:09:56.592 # Failed to read response from the server: Connection reset by peer
1:S 08 Jul 2022 05:09:56.592 # Master did not respond to command during SYNC handshake
1:S 08 Jul 2022 05:09:57.035 * Connecting to MASTER 178.20.40.200:8886
1:S 08 Jul 2022 05:09:57.035 * MASTER <-> REPLICA sync started
1:S 08 Jul 2022 05:09:57.321 * Non blocking connect for SYNC fired the event.
1:S 08 Jul 2022 05:09:57.610 * Master replied to PING, replication can continue...
...
SLAVEOF NO ONE
config set slave-read-only no
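For reference, I issue those two commands like this (just a sketch via redis-cli; 127.0.0.1:6379 matches the laddr shown in the log above):
redis-cli -h 127.0.0.1 -p 6379 SLAVEOF NO ONE                    # promote this instance back to master
redis-cli -h 127.0.0.1 -p 6379 CONFIG SET slave-read-only no     # allow writes while it still thinks it is a replica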
If I force the slave to be writable with the commands above and then write to it, all data is flushed again after about 5 seconds.
I do not want my master to be turned into a slave.
I am getting this behaviour on a clean EC2 Amazon Linux instance.
I don't know what is causing it; the instance has plenty of free memory.
Why does Redis forcibly demote my master to a slave?
I have enabled both RDB and AOF persistence via save 1 1 and appendonly yes. This configuration creates both RDB and AOF files at the configured locations. However, when Redis restarts I notice the following:
If appendonly yes, the RDB file is not read, regardless of whether an AOF file exists
If appendonly no, the RDB file is read
I've tested the above by setting appendonly yes and running rm /persistent/redis/appendonly.aof; systemctl restart redis. The log file shows
Aug 13 11:11:06 saltspring-zynqmp redis-server[16292]: 16292:M 13 Aug 11:11:06.199 # Redis is now ready to exit, bye bye...
Aug 13 11:11:06 saltspring-zynqmp redis[16292]: DB saved on disk
Aug 13 11:11:06 saltspring-zynqmp redis[16292]: Removing the pid file.
Aug 13 11:11:06 saltspring-zynqmp redis[16292]: Redis is now ready to exit, bye bye...
Aug 13 11:11:06 saltspring-zynqmp systemd[1]: redis.service: Succeeded.
Aug 13 11:11:06 saltspring-zynqmp systemd[1]: Stopped redis.service.
Aug 13 11:11:06 saltspring-zynqmp systemd[1]: Starting redis.service...
Aug 13 11:11:06 saltspring-zynqmp redis-check-aof[16354]: Cannot open file: /persistent/redis/appendonly.aof
Aug 13 11:11:06 saltspring-zynqmp redis-server[16355]: 16355:C 13 Aug 11:11:06.232 # oO0OoO0OoO0Oo Redis is starting oO0OoO0OoO0Oo
Aug 13 11:11:06 saltspring-zynqmp redis-server[16355]: oO0OoO0OoO0Oo Redis is starting oO0OoO0OoO0Oo
Aug 13 11:11:06 saltspring-zynqmp redis-server[16355]: 16355:C 13 Aug 11:11:06.233 # Redis version=4.0.14, bits=64, commit=00000000, modified=0, pid=16355, just started
Aug 13 11:11:06 saltspring-zynqmp redis-server[16355]: Redis version=4.0.14, bits=64, commit=00000000, modified=0, pid=16355, just started
Aug 13 11:11:06 saltspring-zynqmp redis-server[16355]: 16355:C 13 Aug 11:11:06.234 # Configuration loaded
Aug 13 11:11:06 saltspring-zynqmp redis-server[16355]: Configuration loaded
Aug 13 11:11:06 saltspring-zynqmp redis-server[16355]: 16355:C 13 Aug 11:11:06.234 * supervised by systemd, will signal readiness
Aug 13 11:11:06 saltspring-zynqmp redis-server[16355]: supervised by systemd, will signal readiness
Aug 13 11:11:06 saltspring-zynqmp systemd[1]: Started redis.service.
Aug 13 11:11:06 saltspring-zynqmp redis-server[16355]: 16355:M 13 Aug 11:11:06.239 * Increased maximum number of open files to 10032 (it was originally set to 1024).
Aug 13 11:11:06 saltspring-zynqmp redis[16355]: Increased maximum number of open files to 10032 (it was originally set to 1024).
Aug 13 11:11:06 saltspring-zynqmp redis-server[16355]: 16355:M 13 Aug 11:11:06.241 * Running mode=standalone, port=6379.
Aug 13 11:11:06 saltspring-zynqmp redis[16355]: Running mode=standalone, port=6379.
Aug 13 11:11:06 saltspring-zynqmp redis-server[16355]: 16355:M 13 Aug 11:11:06.242 # WARNING: The TCP backlog setting of 511 cannot be enforced because /proc/sys/net/core/somaxconn is set to the lower value of 128.
Aug 13 11:11:06 saltspring-zynqmp redis[16355]: WARNING: The TCP backlog setting of 511 cannot be enforced because /proc/sys/net/core/somaxconn is set to the lower value of 128.
Aug 13 11:11:06 saltspring-zynqmp redis-server[16355]: 16355:M 13 Aug 11:11:06.242 # Server initialized
Aug 13 11:11:06 saltspring-zynqmp redis[16355]: Server initialized
Aug 13 11:11:06 saltspring-zynqmp redis-server[16355]: 16355:M 13 Aug 11:11:06.242 * Ready to accept connections
Aug 13 11:11:06 saltspring-zynqmp redis[16355]: Ready to accept connections
Notice that the expected message
...
Aug 13 11:26:53 saltspring-zynqmp redis[16616]: DB loaded from disk: 0.000 seconds
Aug 13 11:26:53 saltspring-zynqmp redis[16616]: Ready to accept connections
is missing. To get the RDB file read, appendonly must be set to no.
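For completeness, the relevant persistence settings in my redis.conf are roughly the following (the dir matches the path above; the file names are the stock defaults and only shown for context):
dir /persistent/redis            # both the RDB file and the AOF end up here
save 1 1                         # RDB snapshot after 1 second if at least 1 key changed
appendonly yes                   # with AOF enabled, Redis restores from the AOF on startup and ignores the RDB
dbfilename dump.rdb
appendfilename "appendonly.aof"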
Any thoughts?
Cheers,
I am trying to connect to a Google Cloud Compute Engine instance over SSH (from macOS Catalina):
gcloud beta compute ssh --zone "us-west1-b" "mac-vm" --project "mac-vm-282201"
and get the error
ssh: connect to host 34.105.11.187 port 22: Operation timed out
ERROR: (gcloud.beta.compute.ssh) [/usr/bin/ssh] exited with return code [255].
I also tried
ssh -i ~/.ssh/mac-vm-key asd61404@34.105.11.187
and also get the error
ssh: connect to host 34.105.11.187 port 22: Operation timed out
so I ran this command to diagnose it:
gcloud compute ssh --zone "us-west1-b" "mac-vm" --project "mac-vm-282201" --ssh-flag="-vvv"
which returns:
OpenSSH_7.9p1, LibreSSL 2.7.3
debug1: Reading configuration data /etc/ssh/ssh_config
debug1: /etc/ssh/ssh_config line 48: Applying options for *
debug2: resolve_canonicalize: hostname 34.105.11.187 is address
debug2: ssh_connect_direct
debug1: Connecting to 34.105.11.187 [34.105.11.187] port 22.
I don't know how to fix this issue.
Thanks in advance!
Here is my recent serial console output:
Jul 4 02:28:39 mac-vm google_network_daemon[684]: For info, please visit https://www.isc.org/software/dhcp/
Jul 4 02:28:39 mac-vm dhclient[684]:
Jul 4 02:28:39 mac-vm dhclient[684]: Listening on Socket/ens4
[ 19.458355] google_network_daemon[684]: Listening on Socket/ens4
Jul 4 02:28:39 mac-vm google_network_daemon[684]: Listening on Socket/ens4
Jul 4 02:28:39 mac-vm dhclient[684]: Sending on Socket/ens4
[ 19.458697] google_network_daemon[684]: Sending on Socket/ens4
Jul 4 02:28:39 mac-vm google_network_daemon[684]: Sending on Socket/ens4
Jul 4 02:28:39 mac-vm systemd[1]: Finished Wait until snapd is fully seeded.
Jul 4 02:28:39 mac-vm systemd[1]: Starting Apply the settings specified in cloud-config...
Jul 4 02:28:39 mac-vm systemd[1]: Condition check resulted in Auto import assertions from block devices being skipped.
Jul 4 02:28:39 mac-vm systemd[1]: Reached target Multi-User System.
Jul 4 02:28:39 mac-vm systemd[1]: Reached target Graphical Interface.
Jul 4 02:28:39 mac-vm systemd[1]: Starting Update UTMP about System Runlevel Changes...
Jul 4 02:28:39 mac-vm systemd[1]: systemd-update-utmp-runlevel.service: Succeeded.
Jul 4 02:28:39 mac-vm systemd[1]: Finished Update UTMP about System Runlevel Changes.
[ 20.216129] cloud-init[718]: Cloud-init v. 20.1-10-g71af48df-0ubuntu5 running 'modules:config' at Sat, 04 Jul 2020 02:28:39 +0000. Up 20.11 seconds.
Jul 4 02:28:39 mac-vm cloud-init[718]: Cloud-init v. 20.1-10-g71af48df-0ubuntu5 running 'modules:config' at Sat, 04 Jul 2020 02:28:39 +0000. Up 20.11 seconds.
Jul 4 02:28:39 mac-vm systemd[1]: Finished Apply the settings specified in cloud-config.
Jul 4 02:28:39 mac-vm systemd[1]: Starting Execute cloud user/final scripts...
Jul 4 02:28:41 mac-vm google-clock-skew: INFO Synced system time with hardware clock.
[ 20.886105] cloud-init[725]: Cloud-init v. 20.1-10-g71af48df-0ubuntu5 running 'modules:final' at Sat, 04 Jul 2020 02:28:41 +0000. Up 20.76 seconds.
[ 20.886430] cloud-init[725]: Cloud-init v. 20.1-10-g71af48df-0ubuntu5 finished at Sat, 04 Jul 2020 02:28:41 +0000. Datasource DataSourceGCE. Up 20.87 seconds
Jul 4 02:28:41 mac-vm cloud-init[725]: Cloud-init v. 20.1-10-g71af48df-0ubuntu5 running 'modules:final' at Sat, 04 Jul 2020 02:28:41 +0000. Up 20.76 seconds.
Jul 4 02:28:41 mac-vm cloud-init[725]: Cloud-init v. 20.1-10-g71af48df-0ubuntu5 finished at Sat, 04 Jul 2020 02:28:41 +0000. Datasource DataSourceGCE. Up 20.87 seconds
Jul 4 02:28:41 mac-vm systemd[1]: Finished Execute cloud user/final scripts.
Jul 4 02:28:41 mac-vm systemd[1]: Reached target Cloud-init target.
Jul 4 02:28:41 mac-vm systemd[1]: Starting Google Compute Engine Startup Scripts...
Jul 4 02:28:41 mac-vm startup-script: INFO Starting startup scripts.
Jul 4 02:28:41 mac-vm startup-script: INFO Found startup-script in metadata.
Jul 4 02:28:42 mac-vm startup-script: INFO startup-script: sudo: ufw: command not found
Jul 4 02:28:42 mac-vm startup-script: INFO startup-script: Return code 1.
Jul 4 02:28:42 mac-vm startup-script: INFO Finished running startup scripts.
Jul 4 02:28:42 mac-vm systemd[1]: google-startup-scripts.service: Succeeded.
Jul 4 02:28:42 mac-vm systemd[1]: Finished Google Compute Engine Startup Scripts.
Jul 4 02:28:42 mac-vm systemd[1]: Startup finished in 1.396s (kernel) + 20.065s (userspace) = 21.461s.
Jul 4 02:29:06 mac-vm systemd[1]: systemd-hostnamed.service: Succeeded.
Jul 4 02:43:32 mac-vm systemd[1]: Starting Cleanup of Temporary Directories...
Jul 4 02:43:32 mac-vm systemd[1]: systemd-tmpfiles-clean.service: Succeeded.
Jul 4 02:43:32 mac-vm systemd[1]: Finished Cleanup of Temporary Directories.
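To rule out the VPC firewall from my side, I can also check the project rules and the instance details (a sketch; gcloud is authenticated for the same project):
gcloud compute firewall-rules list                                                    # confirm a rule allows tcp:22 to this VM's network/tags
gcloud compute instances describe mac-vm --zone us-west1-b --project mac-vm-282201    # network tags and external IP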
I currently have a working Aerospike cluster with two nodes. Following is the content of /etc/aerospike/aerospike.conf:
network {
    service {
        address any
        port 3000
    }
    heartbeat {
        mode mesh
        port 3002 # Heartbeat port for this node.
        # List one or more other nodes, one ip-address & port per line:
        mesh-seed-address-port <existing server's ip> 3002
        mesh-seed-address-port <other server's ip> 3002
        interval 250
        timeout 10
    }
    fabric {
        port 3001
    }
    info {
        port 3003
    }
}
I tried changing the heartbeat setting by removing the other node's address and port:
heartbeat {
    mode mesh
    port 3002 # Heartbeat port for this node.
    # List one or more other nodes, one ip-address & port per line:
    mesh-seed-address-port <existing server's ip> 3002
    interval 250
    timeout 10
}
Then I restarted the aerospike and amc services:
service aerospike restart
service amc restart
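A quick way to see the cluster size the node itself reports (a sketch; asinfo ships with aerospike-tools):
asinfo -v "statistics" | tr ';' '\n' | grep cluster_size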
However, /var/log/aerospike/aerospike.log still shows two nodes present:
Mar 07 2017 13:16:28 GMT: INFO (info): (ticker.c:249) system-memory: free-kbytes 125756260 free-pct 99 heap-kbytes (2343074,2344032,2417664) heap-efficiency-pct 96.9
Mar 07 2017 13:16:28 GMT: INFO (info): (ticker.c:263) in-progress: tsvc-q 0 info-q 0 nsup-delete-q 0 rw-hash 0 proxy-hash 0 tree-gc-q 0
Mar 07 2017 13:16:28 GMT: INFO (info): (ticker.c:285) fds: proto (20,23,3) heartbeat (1,1,0) fabric (19,19,0)
Mar 07 2017 13:16:28 GMT: INFO (info): (ticker.c:294) heartbeat-received: self 0 foreign 1488
Mar 07 2017 13:16:28 GMT: INFO (info): (ticker.c:348) {FC} objects: all 0 master 0 prole 0
Mar 07 2017 13:16:28 GMT: INFO (info): (ticker.c:409) {FC} migrations: complete
Mar 07 2017 13:16:28 GMT: INFO (info): (ticker.c:428) {FC} memory-usage: total-bytes 0 index-bytes 0 sindex-bytes 0 data-bytes 0 used-pct 0.00
Mar 07 2017 13:16:28 GMT: INFO (info): (ticker.c:348) {TARGETPARAMS} objects: all 0 master 0 prole 0
Mar 07 2017 13:16:28 GMT: INFO (info): (ticker.c:409) {TARGETPARAMS} migrations: complete
Mar 07 2017 13:16:28 GMT: INFO (info): (ticker.c:428) {TARGETPARAMS} memory-usage: total-bytes 0 index-bytes 0 sindex-bytes 0 data-bytes 0 used-pct 0.00
Mar 07 2017 13:16:38 GMT: INFO (info): (ticker.c:169) NODE-ID bb93c00b70b0022 CLUSTER-SIZE 2
Mar 07 2017 13:16:38 GMT: INFO (info): (ticker.c:249) system-memory: free-kbytes 125756196 free-pct 99 heap-kbytes (2343073,2344032,2417664) heap-efficiency-pct 96.9
So does the AMC console.
This should help: http://www.aerospike.com/docs/operations/manage/cluster_mng/removing_node
Once the node is removed properly, you can restart it with the different heartbeat config so that it doesn't join the other node.
To check the version, simply run asd --version. You can also use asinfo -v build.
The version is also printed within asadm / AMC and in the logs right at startup.
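For example (a quick sketch, run on the node itself):
asd --version              # version of the server binary
asinfo -v build            # same information via the info protocol
asadm -e info              # summary including node IDs and cluster size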
I tried hard to install nginx, but it didn't work. After that I tried to install and configure the Apache (httpd) server on my Fedora 23 system, but the server fails to start and returns an error when I try to start it:
sudo systemctl start httpd
Job for httpd.service failed because the control process exited with error code. See "systemctl status httpd.service" and "journalctl -xe" for details.
After that I looked at the logs with journalctl -xe, which shows this:
Jun 28 11:26:49 cyber audit[15981]: CRED_REFR pid=15981 uid=0 auid=1000 ses=1 subj=unconfined_u:unconfined_r:unconfined_t:s0-s0:c0.c1023 msg='op=PAM:s
Jun 28 11:26:49 cyber sudo[15981]: pam_systemd(sudo:session): Cannot create session: Already running in a session
Jun 28 11:26:49 cyber kernel: audit: type=1105 audit(1467095209.363:842): pid=15981 uid=0 auid=1000 ses=1 subj=unconfined_u:unconfined_r:unconfined_t:
Jun 28 11:26:49 cyber sudo[15981]: pam_unix(sudo:session): session opened for user root by (uid=0)
Jun 28 11:26:49 cyber audit[15981]: USER_START pid=15981 uid=0 auid=1000 ses=1 subj=unconfined_u:unconfined_r:unconfined_t:s0-s0:c0.c1023 msg='op=PAM:
Jun 28 11:26:49 cyber polkitd[900]: Registered Authentication Agent for unix-process:15982:464115 (system bus name :1.256 [/usr/bin/pkttyagent --notif
Jun 28 11:26:49 cyber systemd[1]: Starting The Apache HTTP Server...
-- Subject: Unit httpd.service has begun start-up
-- Defined-By: systemd
-- Support: http://lists.freedesktop.org/mailman/listinfo/systemd-devel
--
-- Unit httpd.service has begun starting up.
Jun 28 11:26:49 cyber audit[15988]: AVC avc: denied { append } for pid=15988 comm="httpd" name="error.log" dev="dm-0" ino=1704951 scontext=system_u
Jun 28 11:26:49 cyber kernel: audit: type=1400 audit(1467095209.491:843): avc: denied { append } for pid=15988 comm="httpd" name="error.log" dev="d
Jun 28 11:26:49 cyber systemd[1]: httpd.service: Main process exited, code=exited, status=1/FAILURE
Jun 28 11:26:49 cyber systemd[1]: Failed to start The Apache HTTP Server.
-- Subject: Unit httpd.service has failed
-- Defined-By: systemd
-- Support: http://lists.freedesktop.org/mailman/listinfo/systemd-devel
--
-- Unit httpd.service has failed.
--
-- The result is failed.
Jun 28 11:26:49 cyber systemd[1]: httpd.service: Unit entered failed state.
Jun 28 11:26:49 cyber audit[1]: SERVICE_START pid=1 uid=0 auid=4294967295 ses=4294967295 subj=system_u:system_r:init_t:s0 msg='unit=httpd comm="system
Jun 28 11:26:49 cyber kernel: audit: type=1130 audit(1467095209.525:844): pid=1 uid=0 auid=4294967295 ses=4294967295 subj=system_u:system_r:init_t:s0
Jun 28 11:26:49 cyber kernel: audit: type=1106 audit(1467095209.533:845): pid=15981 uid=0 auid=1000 ses=1 subj=unconfined_u:unconfined_r:unconfined_t:
Jun 28 11:26:49 cyber kernel: audit: type=1104 audit(1467095209.533:846): pid=15981 uid=0 auid=1000 ses=1 subj=unconfined_u:unconfined_r:unconfined_t:
Jun 28 11:26:49 cyber systemd[1]: httpd.service: Failed with result 'exit-code'.
Jun 28 11:26:49 cyber sudo[15981]: pam_unix(sudo:session): session closed for user root
Jun 28 11:26:49 cyber audit[15981]: USER_END pid=15981 uid=0 auid=1000 ses=1 subj=unconfined_u:unconfined_r:unconfined_t:s0-s0:c0.c1023 msg='op=PAM:se
Jun 28 11:26:49 cyber audit[15981]: CRED_DISP pid=15981 uid=0 auid=1000 ses=1 subj=unconfined_u:unconfined_r:unconfined_t:s0-s0:c0.c1023 msg='op=PAM:s
Jun 28 11:26:50 cyber polkitd[900]: Unregistered Authentication Agent for unix-process:15982:464115 (system bus name :1.256, object path /org/freedesk
Jun 28 11:27:15 cyber google-chrome.desktop[3111]: [1:1:0628/112715:ERROR:PlatformKeyboardEvent.cpp(117)] Not implemented reached in static PlatformEv
I followed all the steps of the guide I was using.
Can anybody explain what the error is and how to solve it?
service httpd status
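Based on the AVC denial { append } for error.log in the journal above, a way to inspect and fix the SELinux labels would be (a sketch, assuming the stock Fedora locations for the httpd logs):
ausearch -m avc -ts recent -c httpd       # show recent SELinux denials for httpd
ls -Z /var/log/httpd                      # the log directory should be labelled httpd_log_t
restorecon -Rv /var/log/httpd /etc/httpd  # restore the default contexts if the labels are wrong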