Issue with backups failing on Bacula - bacula

I'm new to Bacula, and I've inherited an environment that was already set up and deployed. Recently, one of the servers we had always backed up crashed and was deemed unusable, so I was tasked with removing it from the client list, which I did. Since then, jobs have been failing every morning, and the email I receive shows that Bacula is trying to copy an old job:
15-Jun 01:00 bacula-dir JobId 56332: Copying using JobId=55657 Job=server2-fd.2022-05-31_18.00.01_46
15-Jun 01:00 bacula-dir JobId 56332: Fatal error: Previous Job resource not found for "server2-fd".
15-Jun 01:00 bacula-dir JobId 56332: Error: Bacula bacula-dir 9.4.2 (04Feb19):
Build OS: x86_64-pc-linux-gnu redhat Enterprise release
Prev Backup JobId: 55657
Prev Backup Job: server2-fd.2022-05-31_18.00.01_46
New Backup JobId: 0
Current JobId: 56332
Current Job: CopyDiskToTape.2022-06-15_01.00.01_17
Backup Level: Incremental
I can't find any indication of server2 in any of my jobs and I'm not sure how to get rid of these errors. What am I missing here?

OK, I found a utility called dbcheck. It comes with Bacula and allowed me to check for orphaned client records.
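To see which old job records the copy job is still picking up, the report email can be scanned for the two lines quoted in the question. The following is only a rough sketch: the function name and regular expressions are my own, written against the log format shown above, and are not part of Bacula.

```python
import re

# Matches the "Copying using JobId=... Job=..." line from the Director's report.
COPY_LINE = re.compile(r"Copying using JobId=(\d+) Job=(\S+)")
# Matches the fatal error naming the Job resource that no longer exists.
MISSING_LINE = re.compile(r'Previous Job resource not found for "([^"]+)"')


def find_stale_copy_sources(log_text):
    """Return (source_job_id, source_job_name, missing_resource) tuples."""
    results = []
    pending = None  # last "Copying using" match not yet paired with an error
    for line in log_text.splitlines():
        copy = COPY_LINE.search(line)
        if copy:
            pending = (int(copy.group(1)), copy.group(2))
            continue
        missing = MISSING_LINE.search(line)
        if missing and pending:
            results.append((pending[0], pending[1], missing.group(1)))
            pending = None
    return results


sample = """\
15-Jun 01:00 bacula-dir JobId 56332: Copying using JobId=55657 Job=server2-fd.2022-05-31_18.00.01_46
15-Jun 01:00 bacula-dir JobId 56332: Fatal error: Previous Job resource not found for "server2-fd".
"""
print(find_stale_copy_sources(sample))
```

Each tuple names a source JobId that is still in the catalog even though its Job resource has been removed from the configuration.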

Related

Determine when a gitlab CI job ran

I have a CI job that ran last week:
Is there a way to find out exactly when it finished? I am trying to debug a problem that we just noticed, and knowing if the job finished at 9:00am or 9:06am or 6:23pm a week ago would be useful information.
The output from the job does not appear to indicate what time it started or stopped. When I asked Google, I got information about how to run jobs in serial or parallel or create CI jobs, but nothing about getting the time of the job.
For the future, I could put date into script or before_script, but that is not going to help with this job.
This is on a self-hosted gitlab instance. I am not sure of the version or what optional settings have been enabled.
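One possible approach, assuming the instance is recent enough to expose the v4 REST API and that you have an access token, is to read the job record from the Jobs API, which includes started_at and finished_at timestamps. This is a minimal sketch: the helper names are my own, the host, project ID, and job ID in the usage note are placeholders, and older GitLab versions may expose jobs under a different path.

```python
import json
import urllib.request
from datetime import datetime


def job_url(base_url, project_id, job_id):
    """Build the Jobs API URL for a single job."""
    return f"{base_url.rstrip('/')}/api/v4/projects/{project_id}/jobs/{job_id}"


def parse_times(job):
    """Return (started_at, finished_at) as datetimes, or None where absent."""
    def parse(value):
        # Timestamps are ISO 8601 strings such as 2015-12-24T17:54:30.733Z.
        return datetime.fromisoformat(value.replace("Z", "+00:00")) if value else None
    return parse(job.get("started_at")), parse(job.get("finished_at"))


def fetch_job(base_url, project_id, job_id, token):
    """Fetch one job record; needs network access and a valid token."""
    request = urllib.request.Request(
        job_url(base_url, project_id, job_id),
        headers={"PRIVATE-TOKEN": token},
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)
```

Usage would look like parse_times(fetch_job("https://gitlab.example.com", 42, 1001, token)), where every argument is a placeholder for your own instance.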

Tar incremental restore : Cannot rename

I wrote a Python script that implements a seven-day incremental backup strategy, with a full backup on Sunday, using the tar command.
Generating the various backups works without problems.
However, restoring an incremental backup fails with this error message:
tar: Cannot rename `./path1' to `./path2': No such file or directory
tar: Exiting with failure status due to previous errors
The backup strategy runs for a Jenkins service.
Do you know why I get this error message, which stops my restore, and how to fix it?
The short answer is: DO NOT use GNU tar for incremental backups.
The long answer is that there is a fairly old bug that prevents incremental archives from being restored reliably. The bug still exists and has been reported multiple times since 2004.
References:
stackexchange 01,stackexchange 02,
Ubuntu Launchpad,
GNU 01, GNU 02, GNU 03,
Debian

amq-broker failed to start in ServiceMix

When starting ServiceMix, I'm getting this error on startup.
2017-06-21 16:24:51,647 | ERROR | ctivemq.server]) | configadmin | 3 - org.apache.felix.configadmin - 1.8.12 | [org.osgi.service.cm.ManagedServiceFactory, id=188, bundle=25/mvn:org.apache.activemq/activemq-osgi/5.14.3]: Updating configuration org.apache.activemq.server.598341f8-41a8-446f-b9f2-0de589a8a14c caused a problem: Cannot start the broker
org.osgi.service.cm.ConfigurationException: null : Cannot start the broker
at org.apache.activemq.osgi.ActiveMQServiceFactory.updated(ActiveMQServiceFactory.java:144)[25:org.apache.activemq.activemq-osgi:5.14.3]
at org.apache.felix.cm.impl.helper.ManagedServiceFactoryTracker.updated(ManagedServiceFactoryTracker.java:159)[3:org.apache.felix.configadmin:1.8.12]
at org.apache.felix.cm.impl.helper.ManagedServiceFactoryTracker.provideConfiguration(ManagedServiceFactoryTracker.java:93)[3:org.apache.felix.configadmin:1.8.12]
at org.apache.felix.cm.impl.ConfigurationManager$ManagedServiceFactoryUpdate.provide(ConfigurationManager.java:1620)[3:org.apache.felix.configadmin:1.8.12]
at org.apache.felix.cm.impl.ConfigurationManager$ManagedServiceFactoryUpdate.run(ConfigurationManager.java:1563)[3:org.apache.felix.configadmin:1.8.12]
at org.apache.felix.cm.impl.UpdateThread.run0(UpdateThread.java:141)[3:org.apache.felix.configadmin:1.8.12]
at org.apache.felix.cm.impl.UpdateThread.run(UpdateThread.java:109)[3:org.apache.felix.configadmin:1.8.12]
at java.lang.Thread.run(Thread.java:745)[:1.8.0_121]
Caused by: javax.management.InstanceAlreadyExistsException: org.apache.activemq:type=Broker,brokerName=amq-broker
at com.sun.jmx.mbeanserver.Repository.addMBean(Repository.java:437)[:1.8.0_121]
at com.sun.jmx.interceptor.DefaultMBeanServerInterceptor.registerWithRepository(DefaultMBeanServerInterceptor.java:1898)[:1.8.0_121]
at com.sun.jmx.interceptor.DefaultMBeanServerInterceptor.registerDynamicMBean(DefaultMBeanServerInterceptor.java:966)[:1.8.0_121]
at com.sun.jmx.interceptor.DefaultMBeanServerInterceptor.registerObject(DefaultMBeanServerInterceptor.java:900)[:1.8.0_121]
at com.sun.jmx.interceptor.DefaultMBeanServerInterceptor.registerMBean(DefaultMBeanServerInterceptor.java:324)[:1.8.0_121]
at com.sun.jmx.mbeanserver.JmxMBeanServer.registerMBean(JmxMBeanServer.java:522)[:1.8.0_121]
at org.apache.activemq.broker.jmx.ManagementContext.registerMBean(ManagementContext.java:408)[25:org.apache.activemq.activemq-osgi:5.14.3]
at org.apache.activemq.broker.jmx.AnnotatedMBean.registerMBean(AnnotatedMBean.java:72)[25:org.apache.activemq.activemq-osgi:5.14.3]
at org.apache.activemq.broker.BrokerService.startManagementContext(BrokerService.java:2584)[25:org.apache.activemq.activemq-osgi:5.14.3]
at org.apache.activemq.broker.BrokerService.start(BrokerService.java:608)[25:org.apache.activemq.activemq-osgi:5.14.3]
at org.apache.activemq.osgi.ActiveMQServiceFactory.updated(ActiveMQServiceFactory.java:140)[25:org.apache.activemq.activemq-osgi:5.14.3]
Nothing has been deployed to it, and the only changes so far are that the camel-http4, camel-jetty9, and camel-mongodb features have been installed.
What could be causing this and how can I fix it?
I've figured out the cause. ServiceMix was started, had the features installed, stopped, zipped, sent to a new machine, and unpacked in a different directory.
The problem was fixed by deleting the following folder:
apache-servicemix-7.0.0\data\cache\bundle3\data\config\org\apache\activemq\server
This folder contained ActiveMQ config information that was no longer valid after the server was moved.
Other systems also appeared to be affected by this. The proper fix seems to be either to delete the data directory while Karaf is offline or to start Karaf with the clean flag. (Note that this will wipe all changes made on top of the base version.)
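For the narrower fix of deleting only the cached ActiveMQ config folder, something like the following could be run while ServiceMix is stopped. This is a sketch under assumptions: the function name is hypothetical, and the bundle* wildcard is my own guess to cover installs where the config admin bundle is not numbered 3.

```python
import glob
import os
import shutil


def remove_stale_activemq_config(servicemix_home, dry_run=True):
    """Find cached ActiveMQ server config folders; delete them unless dry_run."""
    # Assumption: the bundle number can differ per install, hence the wildcard.
    pattern = os.path.join(
        servicemix_home, "data", "cache", "bundle*",
        "data", "config", "org", "apache", "activemq", "server",
    )
    matches = sorted(p for p in glob.glob(pattern) if os.path.isdir(p))
    if not dry_run:
        for path in matches:
            shutil.rmtree(path)
    return matches
```

Calling it with dry_run=True first lists what would be removed, which is worth checking before deleting anything.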
I have since moved on to using the karaf-maven-plugin for pre-setting up the server, and only installing the servicemix components I'm actually using.

Aerospike migration aborted

We tried to upgrade our Aerospike version and ran into a strange problem.
We had a 3-node cluster running version 3.5.4 with replication factor 2.
We decided to upgrade to 3.8.2.3, so we installed the new version on a new server and added it to the cluster as a new node. After migration finished, we removed an old node. Everything went perfectly.
We decided to repeat the procedure.
We added one more new node to the cluster and saw that migration failed. We found many errors like the one below in the logs.
Jun 06 2016 22:43:26 GMT: WARNING (partition): (partition.c::2221) {namespace:3368} migrate rx aborted. During migrate receive start, duplicate partition contains primary version
In addition, we saw that the count of replica objects was lower than the count of original objects, for example:
Our migration config:
So, how can we fix this situation?
I see from your output that there aren't any migrations in progress and that the replica counts do not match the primary counts.
Prior to 3.7.0.1, migrations from an earlier round could interfere with subsequent rounds. I suspect that is what happened here. I recommend that you continue to upgrade and disregard these issues for now. If the counts still do not match on completion, you will need to force the partitions to resync.
To force the partitions to resync, issue the following commands:
asadm -h [NODE IP] -e "cluster dun all";
sleep 10;
asadm -h [NODE IP] -e "cluster undun all";
This will cause all partition versions to diverge and resync.
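Before forcing a resync, the count comparison mentioned above can be sketched as follows. This is illustrative only: the function and the dictionary keys are hypothetical, the counts are assumed to be copied by hand from each node's namespace statistics, and the check assumes migrations have finished and the cluster has at least as many nodes as the replication factor.

```python
def replica_shortfall(nodes, replication_factor=2):
    """Return how many replica objects are missing across the cluster.

    nodes: list of dicts with hypothetical keys 'master_objects' and
    'prole_objects', one dict per node for a single namespace.
    """
    masters = sum(n["master_objects"] for n in nodes)
    proles = sum(n["prole_objects"] for n in nodes)
    # Each master object should have (replication_factor - 1) replicas.
    expected = masters * (replication_factor - 1)
    return expected - proles


# Made-up numbers purely for illustration.
cluster = [
    {"master_objects": 100, "prole_objects": 90},
    {"master_objects": 120, "prole_objects": 110},
    {"master_objects": 80, "prole_objects": 95},
]
print(replica_shortfall(cluster))
```

A result of zero means the replica counts match the primary counts; a positive result is the number of replica objects still missing.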

Restart-ability in pentaho community edition

I am using Pentaho 5.2 Community Edition in my production environment, and I am aware that there is no restart-ability (checkpoint) in Pentaho Community Edition. How do I set up restart-ability in Pentaho Community Edition? Any references or links would be very useful.
There is no such feature in CE.
The idea behind EE restart-ability is to have a separate database table (like the log tables that exist in CE) and to control job entries based on the success or failure records in it. The gain is that failed job entries are restarted automatically and that execution results can be shown over time.
For example, one can monitor a job's execution status code from the console and restart the job from the console. In this case the whole job will be restarted.
With checkpoints and restart-ability, the job will be restarted from the failed entry.
So if your jobs usually contain only one or two entries, if the running time of a restart is not critical, or if failure handling is implemented some other way, you may not need this feature at all.
Once again, restart-ability only restarts failed job entries. If a failed job entry left the DB inconsistent, restarting that job entry should fix it. If job entries rely on some initial external state and that state changed during the job (for example, some files were deleted), a restart will only rerun the job; it will not recover anything that is unrecoverably broken.
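The checkpoint idea described above (a table of success/failure records that decides where a rerun resumes) can be illustrated with a small sketch. This is not Pentaho's implementation or API: the function is hypothetical, and a plain dict stands in for the database log table.

```python
def run_job(entries, checkpoint_log):
    """Run job entries in order, skipping those already logged as successful.

    entries: list of (name, callable) pairs.
    checkpoint_log: dict mapping entry name to "success" or "failed";
    it stands in for a database log table and persists between runs.
    """
    for name, action in entries:
        if checkpoint_log.get(name) == "success":
            continue  # finished in an earlier run, so do not repeat it
        try:
            action()
        except Exception:
            checkpoint_log[name] = "failed"
            return False  # stop here; the next run resumes from this entry
        checkpoint_log[name] = "success"
    return True
```

Running the same entries again with the same checkpoint_log after a failure skips the entries that already succeeded and starts at the failed one. Without the log, the only option is to rerun the whole job. The caller is expected to reset the log before a fresh run.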