Change Replication and other base commands - replication

I have been attempting to do something here; I will explain what I am attempting and what I have done so far, which is not much, since I cannot find the information online.
There is a feature that has not been added to Proxmox (according to others online): the "-w" option for zvol transfers (the zfs send and recv commands).
Full story:
I have some blades and wanted to create a high-availability system with (if possible) load balancing for a work cloud solution. I set my software up and that worked great.
When I try to replicate I receive:
2022-11-30 07:57:03 123-0: start replication job
2022-11-30 07:57:03 123-0: guest => VM 123, running => 204166
2022-11-30 07:57:03 123-0: volumes => local-zfs:vm-123-disk-0
2022-11-30 07:58:15 123-0: create snapshot '__replicate_123-0_1669823823__' on local-zfs:vm-123-disk-0
2022-11-30 07:58:15 123-0: using secure transmission, rate limit: none
2022-11-30 07:58:15 123-0: full sync 'local-zfs:vm-123-disk-0' (__replicate_123-0_1669823823__)
2022-11-30 07:58:30 123-0: cannot send rpool/data/vm-123-disk-0@__replicate_123-0_1669823823__: encrypted dataset rpool/data/vm-123-disk-0 may not be sent with properties without the raw flag
2022-11-30 07:58:30 123-0: warning: cannot send 'rpool/data/vm-123-disk-0@__replicate_123-0_1669823823__': backup failed
2022-11-30 07:58:30 123-0: command 'zfs send -Rpv -- rpool/data/vm-123-disk-0@__replicate_123-0_1669823823__' failed: exit code 1
2022-11-30 07:58:30 123-0: cannot receive: failed to read from stream
2022-11-30 07:58:30 123-0: cannot open 'rpool/data/vm-123-disk-0': dataset does not exist
2022-11-30 07:58:30 123-0: command 'zfs recv -F -- rpool/data/vm-123-disk-0' failed: exit code 1
2022-11-30 07:58:30 123-0: delete previous replication snapshot '__replicate_123-0_1669823823__' on local-zfs:vm-123-disk-0
2022-11-30 07:58:31 123-0: end replication job with error: command 'set -o pipefail && pvesm export local-zfs:vm-123-disk-0 zfs - -with-snapshots 1 -snapshot __replicate_123-0_1669823823__ | /usr/bin/ssh -e none -o 'BatchMode=yes' -o 'HostKeyAlias=ProxmoxVEBlade1' root@192.168.1.175 -- pvesm import local-zfs:vm-123-disk-0 zfs - -with-snapshots 1 -snapshot __replicate_123-0_1669823823__ -allow-rename 0' failed: exit code
The simple solution to this is to add -w for raw transfer.
So my question is: to get functional replication in Proxmox VE, how would I add/modify options of the standard ZFS send/recv functions? Thank you for your time and answers. Please use the simplest terms, as I am very ignorant of this stuff.
I was directed here by someone who probably could do this but is too busy to help. They told me this is likely the only place I would get a "real" answer.
So far I have determined that the command is called by a Perl script. But where that script is, I am unsure.
Note: Non-Encrypted zvols are not an option. Unfortunately.
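For reference, what I believe the fix amounts to at the ZFS level is a raw send of the same snapshot. Done by hand it would look roughly like this (just a sketch, using the host and snapshot name from the log above; -w is the raw flag documented in zfs-send(8)):

zfs send -w -Rpv -- rpool/data/vm-123-disk-0@__replicate_123-0_1669823823__ | ssh root@192.168.1.175 zfs recv -F -- rpool/data/vm-123-disk-0

And to locate the Perl code that assembles the send command, something like the following should narrow it down (I am assuming the storage code lives under /usr/share/perl5/PVE/, so treat the path as a guess):

grep -rn -e "zfs send" -e "with-snapshots" /usr/share/perl5/PVE/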

Related

failed retrieving file from mirror.erickochen.nl

When I run pacman -Syu to update, it first shows no error and I update everything as normal. But when I run pacman -Syu again afterwards, it shows the following. What is the reason, and is there any solution?
:: Synchronizing package databases...
core is up to date
extra is up to date
community is up to date
error: failed retrieving file 'core.db' from mirror.erickochen.nl : Failed to connect to mirror.erickochen.nl port 443 after 5241 ms: Connection timed out
error: failed retrieving file 'extra.db' from mirror.erickochen.nl : Failed to connect to mirror.erickochen.nl port 443 after 5202 ms: Connection timed out
error: failed retrieving file 'community.db' from mirror.erickochen.nl : Failed to connect to mirror.erickochen.nl port 443 after 5202 ms: Connection timed out
warning: too many errors from mirror.erickochen.nl, skipping for the remainder of this transaction
:: Starting full system upgrade...
there is nothing to do
Sometimes mirrors go offline. It's recommended to have multiple mirrors enabled so you don't have a single point of failure, and to keep your mirror list updated. Using reflector is recommended, since it also finds fast candidates based on your location.
For the time being, edit /etc/pacman.d/mirrorlist and uncomment a couple of mirrors, then try updating again.
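If you want to regenerate the list rather than hand-edit it, a reflector run along these lines does the job (adjust the country to wherever you are, and back up the old mirrorlist first):

sudo reflector --country Germany --protocol https --latest 20 --sort rate --save /etc/pacman.d/mirrorlist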

Why can I read ksqldb streams but not topics within ksql client?

I am testing ksqldb on AWS EC2 instances in the latest release (confluent 5.5.1) and have an access problem that I can't solve.
I have a secured Kafka server (SASL_SSL, SASL mode PLAIN), an unsecured Schema Registry (another issue with Avro serializers, but OK for the moment), and a secured KSQL server and client.
Topics are filled properly with AVRO data (value only, no key) from a JDBC source connector.
I can access the KSQL Server with ksql without issues
I can access KSQL REST API without issues
When I list topics within ksql, I get the correct list.
When I select a push stream, I get messages when I push something into the topic (with Kafka Connect, in my case).
BUT: when I call "print topic", the client blocks for about 60 seconds and then reports 'Timeout expired while fetching topic metadata'.
The ksql-kafka.log goes wild with repeated entries like
[2020-09-02 18:52:46,246] WARN [Consumer clientId=consumer-2, groupId=null] Bootstrap broker ip-10-1-2-10.eu-central-1.compute.internal:9093 (id: -3 rack: null) disconnected (org.apache.kafka.clients.NetworkClient:1037)
The corresponding broker log shows
Sep 2 18:52:44 ip-10-1-6-11 kafka-server-start: [2020-09-02 18:52:44,704] INFO [SocketServer brokerId=1002] Failed authentication with ip-10-1-2-231.eu-central-1.compute.internal/10.1.2.231 (Unexpected Kafka request of type METADATA during SASL handshake.) (org.apache.kafka.common.network.Selector)
This is my ksql-server.properties file:
ksql.service.id= hf_kafka_ksql_001
bootstrap.servers=ip-10-1-11-229.eu-central-1.compute.internal:9093,ip-10-1-6-11.eu-central-1.compute.internal:9093,ip-10-1-2-10.eu-central-1.compute.internal:9093
ksql.streams.state.dir=/var/data/ksqldb
ksql.schema.registry.url=http://ip-10-1-1-22.eu-central-1.compute.internal:8081
ksql.output.topic.name.prefix=ksql-interactive-
ksql.internal.topic.replicas=3
confluent.support.metrics.enable=false
# currently the keystore contains only the ksql server and the certificate chain to the CA
ssl.keystore.location=/var/kafka-ssl/ksql.keystore.jks
ssl.keystore.password=kspassword
ssl.key.password=kspassword
ssl.client.auth=true
# Need to set this to empty, otherwise the REST API is not accessible with the client key.
ssl.endpoint.identification.algorithm=
# currently the truststore contains only the CA certificate
ssl.truststore.location=/var/kafka-ssl/client.truststore.jks
ssl.truststore.password=ctpassword
security.protocol=SASL_SSL
sasl.mechanism=PLAIN
sasl.jaas.config=org.apache.kafka.common.security.plain.PlainLoginModule required \
username="ksql" \
password="ksqlsecret";
listeners=https://0.0.0.0:8088
advertised.listener=https://ip-10-1-2-231.eu-central-1.compute.internal:8088
authentication.method=BASIC
authentication.roles=admin,ksql,cli
authentication.realm=KsqlServerProps
# authentication for producers, needed for ksql commands like "Create Stream"
producer.ssl.endpoint.identification.algorithm=HTTPS
producer.security.protocol=SASL_SSL
producer.sasl.mechanism=PLAIN
producer.ssl.truststore.location=/var/kafka-ssl/client.truststore.jks
producer.ssl.truststore.password=ctpassword
producer.sasl.mechanism=PLAIN
producer.sasl.jaas.config=org.apache.kafka.common.security.plain.PlainLoginModule required \
username="ksql" \
password="ksqlsecret";
# authentication for consumers, needed for ksql commands like "Create Stream"
consumer.ssl.endpoint.identification.algorithm=HTTPS
consumer.security.protocol=SASL_SSL
consumer.ssl.truststore.location=/var/kafka-ssl/client.truststore.jks
consumer.ssl.truststore.password=ctpassword
consumer.sasl.mechanism=PLAIN
consumer.sasl.jaas.config=org.apache.kafka.common.security.plain.PlainLoginModule required \
username="ksql" \
password="ksqlsecret";
I call ksql with
ksql --user cli --password test --config-file /var/kafka-ssl/ksql_cli.properties https://ip-10-1-2-231.eu-central-1.compute.internal:8088
This is my ksql client configuration ksql_cli.properties:
security.protocol=SSL
#ssl.client.auth=true
ssl.truststore.location=/var/kafka-ssl/client.truststore.jks
ssl.truststore.password=ctpassword
ssl.keystore.location=/var/kafka-ssl/ksql.keystore.jks
ssl.keystore.password=kspassword
ssl.key.password=kspassword
JAAS config, included as a parameter on service start:
KsqlServerProps {
org.eclipse.jetty.jaas.spi.PropertyFileLoginModule required
file="/var/kafka-ssl/cli.password"
debug="false";
};
with cli.password containing the authentication users and passwords for the ksql client.
I have probably tried every permutation of keys, settings, etc., but to no avail. Obviously there is something wrong in the key management. To me it is surprising that using streams works but accessing the low-level topics does not.
Has someone found a solution for this issue? I am really running out of ideas here. Thanks.
Found it! It was easy to overlook: the client's configuration, of course, also needs a SASL setting...
security.protocol=SASL_SSL
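For completeness, the client properties file then presumably needs the matching SASL mechanism and JAAS entry as well, mirroring the server config above (a sketch only; I have not verified the minimal set of properties):

security.protocol=SASL_SSL
sasl.mechanism=PLAIN
sasl.jaas.config=org.apache.kafka.common.security.plain.PlainLoginModule required \
username="ksql" \
password="ksqlsecret";
ssl.truststore.location=/var/kafka-ssl/client.truststore.jks
ssl.truststore.password=ctpassword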

Node-RED: Handle error from node that throws no standard exception

I want to use the Pushover node to send notifications. I have already been using it via curl for some time, and very seldom some messages aren't sent. That's why I have this in bash:
echo "$curlOutput" | grep -qP '{"status":1'
if [ ! $? -eq 0 ]
then
echo "$2" | mail --append "Content-Type: text/plain; charset=UTF-8" -s "$1" name#company.com
fi
to capture the error and then send the message via email.
Now I want to do something similar in Node-RED. For testing purposes I simulated a network error via sudo iptables -A OUTPUT -d 104.20.0.0/16 -m comment --comment "Pushovertest" -j REJECT
That successfully blocks. In node-red-log I see
18 May 13:46:24 - [error] [pushover:252a17dc.1239d8] Error: connect
ECONNREFUSED 104.20.125.71:443
Now look at this Node-RED flow
The error is displayed in the debug window and comes from the pushover node. The catch node doesn't catch the exception, obviously because pushover doesn't use the exception framework (https://developer.ibm.com/recipes/tutorials/nodered-exception-handling-framework/).
First test passed: There is an error logged. But how can I react to this error within Node-RED to do something else in this case?
I'll guess from looking at the source that the error is coming from line 103
of the 57-pushover.js file.
The call to node.error() on this line is not passing the incoming msg object, so it won't be forwarded to the catch node. There are two signatures for the node.error() function: the first takes just the error message; the second takes the error message and the incoming msg object. Only the second forwards the error, and the msg object is then passed on to the catch node.
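In other words, the difference inside the node's code is roughly this (a sketch, not the actual pushover source):

// logs the error, but a Catch node will NOT be triggered
node.error("pushover request failed: " + err);

// logs the error AND forwards msg, so a Catch node on the same tab fires
node.error("pushover request failed: " + err, msg);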
Please feel free to submit a pull request to update this node.

Issue with Open Shift Origin Mongo DB service

I have installed OpenShift Origin V3 on AWS EC2 (Fedora 19) using oo-install. The setup is one Broker + one Node.
I was making some modifications to the security groups to make them more restrictive, and it ended up causing some issues with the mongo service.
1. service mongod does not start up and the status shows failed.
The /var/log/mongodb/mongodb.log says
Thu Mar 6 11:24:08.189 [initandlisten] ERROR: listen(): bind() failed errno:99 Cannot assign requested address for socket: :27017
Thu Mar 6 11:24:08.189 [initandlisten] now exiting
Running oo-accept-broker -v says
FAIL: error logging into mongo db: MOPED: Retrying connection to primary for replica set :27017">]>: MOPED: Retrying connection to primary for replica set :27017">]>/MOPED: --username Retrying, exit code: 1
Any pointers on how to resolve this will be greatly appreciated.
Thanks
Shabna
I would try rolling back your changes to the security groups first, then reapply the changes one by one and see which one causes the issue; then post that specific change and see if anyone can comment on why it affects mongodb.
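One more thing worth checking: errno:99 is EADDRNOTAVAIL, which usually means mongod is configured to bind to an address that is no longer assigned to the host. Comparing the bind_ip setting with the addresses actually on the box is quick (I am assuming the stock Fedora config path; adjust if yours differs):

grep -i bind_ip /etc/mongodb.conf
ip addr show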

How do I use Nagios to monitor a log file

We are using Nagios to monitor our network with great success. However, we have a syslog for critical application errors, and while I set up check_log, it doesn't seem to work as well as monitoring a device.
The issues are:
It only shows the last entry
There doesn't seem to be a way to acknowledge the critical error and return the monitor to a good state
Is Nagios the wrong tool, or are we just not setting up the service monitoring right?
Here are my entries
# log file
define command{
command_name check_log
command_line $USER1$/check_log -F /var/log/applications/appcrit.log -O /tmp/appcrit.log -q ?
}
# Define the log monitering service
define service{
name logfile-check ;
use generic-service ;
check_period 24x7 ;
max_check_attempts 1 ;
normal_check_interval 5 ;
retry_check_interval 1 ;
contact_groups admins ;
notification_options w,u,c,r ;
notification_period 24x7 ;
register 0 ;
}
define service{
use logfile-check
host_name localhost
service_description CritLogFile
check_command check_log
}
For monitoring logs with Nagios, typically the log checker will return a warning only for newly discovered error messages each time it is invoked (so it must retain some state in order to know to ignore them on subsequent runs). Therefore I usually set:
max_check_attempts 1
is_volatile 1
This causes Nagios to send out the alert immediately, but only once, and then go back to normal.
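Applied to the service template from the question, that looks something like this (only max_check_attempts and is_volatile change; everything else is as you already have it):

define service{
        name                    logfile-check
        use                     generic-service
        check_period            24x7
        max_check_attempts      1
        is_volatile             1
        normal_check_interval   5
        retry_check_interval    1
        contact_groups          admins
        notification_options    w,u,c,r
        notification_period     24x7
        register                0
        }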
My favorite log checker is logwarn, but I'm biased because I wrote it myself after not finding any existing ones that I liked. The logwarn package includes a Nagios plugin.
Nothing in your config jumps out at me as being misconfigured.
By design, check_log will only show either an OK message, or the last log entry that triggered an alert. If you need to see multiple entries, you'll need to modify the plugin.
However, I find the fact that you're not getting recoveries somewhat odd. The way check_log works (by comparing the current log to the previous version), you should get a recovery on the very next service check. Except of course, when there have been additional matching entries added to the log since the last check.
Does forcing another service check (or several) cause it to recover?
Also, I don't intend this in a mean way, but make sure it's really malfunctioning.
Is your log getting additional matching entries in between checks, causing it not to recover? Your check is matching "?" which will match anything new in the log. Is something else (a non-error) being added to the log and inadvertently causing a match?
If none of the above are the issue, I would suggest narrowing it down by taking Nagios out of the equation. Try running check_log manually (from the command line, but as the same user as nagios), and with a different oldlog; an example invocation is shown after these steps. It should go something like this -
run check with a new "oldlog" - get initialization message
run check - check OK
make change to log
run check - check fails
run check - check OK
If this doesn't work, then you know to focus on the log, the oldlog, and how the check_log is doing the check.
If it works, then it points more towards a problem with your nagios configuration.
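For the manual run in the steps above, the invocation should look something like this (the plugin directory is whatever $USER1$ points to in your resource.cfg; /usr/lib/nagios/plugins is just a common default, and the oldlog path is deliberately a fresh one):

sudo -u nagios /usr/lib/nagios/plugins/check_log -F /var/log/applications/appcrit.log -O /tmp/appcrit.manual-test.log -q ?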
There is a Nagios plugin that you can use to check the log files: it's called check_logfiles and it's used to scan the lines of a file for regular expressions.
The following link shows how to install and configure check_logfiles for Nagios and Opsview:
https://www.opsview.com/resources/nagios-alternative/blog/syslog-monitoring-nagios-opsview
As there are many ways to achieve a goal, there is also a nice plugin from Consol available:
https://labs.consol.de/lang/en/nagios/check_logfiles/
supports regex
supports log rotation
To use it, you need a cfg file; this is an example for Oracle databases:
@searches = ({
tag => 'oraalerts',
options => 'sticky=28800',
logfile => '/u01/app/oracle/diag/rdbms/davmdkp/DAVMDKP1/trace/alert_DAVMDKP1.log',
criticalpatterns => [
'ORA\-0*204[^\d]', # error in reading control file
'ORA\-0*206[^\d]', # error in writing control file
'ORA\-0*210[^\d]', # cannot open control file
'ORA\-0*257[^\d]', # archiver is stuck
'ORA\-0*333[^\d]', # redo log read error
'ORA\-0*345[^\d]', # redo log write error
'ORA\-0*4[4-7][0-9][^\d]',# ORA-0440 - ORA-0485 background process failure
'ORA\-0*48[0-5][^\d]',
'ORA\-0*6[0-3][0-9][^\d]',# ORA-6000 - ORA-0639 internal errors
'ORA\-0*1114[^\d]', # datafile I/O write error
'ORA\-0*1115[^\d]', # datafile I/O read error
'ORA\-0*1116[^\d]', # cannot open datafile
'ORA\-0*1118[^\d]', # cannot add a data file
'ORA\-0*1122[^\d]', # database file 16 failed verification check
'ORA\-0*1171[^\d]', # datafile 16 going offline due to error advancing checkpoint
'ORA\-0*1201[^\d]', # file 16 header failed to write correctly
'ORA\-0*1208[^\d]', # data file is an old version - not accessing current version
'ORA\-0*1578[^\d]', # data block corruption
'ORA\-0*1135[^\d]', # file accessed for query is offline
'ORA\-0*1547[^\d]', # tablespace is full
'ORA\-0*1555[^\d]', # snapshot too old
'ORA\-0*1562[^\d]', # failed to extend rollback segment
'ORA\-0*162[89][^\d]', # ORA-1628 - ORA-1632 maximum extents exceeded
'ORA\-0*163[0-2][^\d]',
'ORA\-0*165[0-6][^\d]', # ORA-1650 - ORA-1656 tablespace is full
'ORA\-16014[^\d]', # log cannot be archived, no available destinations
'ORA\-16038[^\d]', # log cannot be archived
'ORA\-19502[^\d]', # write error on datafile
'ORA\-27063[^\d]', # number of bytes read/written is incorrect
'ORA\-0*4031[^\d]', # out of shared memory.
'No space left on device',
'Archival Error',
],
warningpatterns => [
'ORA\-0*3113[^\d]', # end of file on communication channel
'ORA\-0*6501[^\d]', # PL/SQL internal error
'ORA\-0*1140[^\d]', # follows WARNING: datafile #20 was not in online backup mode
'Archival stopped, error occurred. Will continue retrying',
]
});
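Wired into Nagios, the plugin is then called with that cfg file, roughly like this (I believe the config file is passed with -f, but double-check against the check_logfiles documentation; the cfg path and host name here are made up):

define command{
        command_name    check_logfiles
        command_line    $USER1$/check_logfiles -f /etc/nagios/oracle_alerts.cfg
        }
define service{
        use                     generic-service
        host_name               dbhost01
        service_description     OracleAlertLog
        check_command           check_logfiles
        }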
I believe there's now a real Nagios plugin that monitors logs effectively.
http://support.nagios.com/forum/viewtopic.php?f=6&t=8851&p=42088&hilit=unixautomation#p42088
The home page of the Nagios plugin on that page is Nagios Log Monitor
Your [ commands.cfg file ] will contain:
define command {
command_name NagiosLogMonitor
command_line $USER1$/NagiosLogMonitor $HOSTNAME$ $ARG1$ $ARG2$ $ARG3$ $ARG4$ '$ARG5$' '$ARG6$' $ARG7$ $ARG8$ $ARG9$ $ARG10$
}
OR
define command {
command_name NagiosLogMonitor
command_line $USER1$/NagiosLogMonitor $HOSTADDRESS$ $ARG1$ $ARG2$ $ARG3$ $ARG4$ '$ARG5$' '$ARG6$' $ARG7$ $ARG8$ $ARG9$ $ARG10$
}
Your [ services.cfg file ] will look similar to:
define service {
check_command NagiosLogMonitor!logrobot!autofig!/var/log/proteus.log!15!500.html!500 Internal Server Error!1!2!-foundn
max_check_attempts 1
service_description 500_ERRORS_LOGCHECK
host_name sky.blat-01.net,sky.blat-02.net,sky.blat-03.net
use fifteen-minute-interval
}
Nagios now has a solution that integrates tightly with Nagios Core, XI, etc.
Nagios Log Server which can alert on any query on any log file on any system in your infrastructure.