I'm trying to set up a simple backup solution using LTFS and backintime (which uses rsync). The first backup works fine. Rsync writes all files to the tape. However, when I try to update the backup, the tape starts hunting around and performance gets horrible.
If I mount ltfs with -o syslogtrace, I can see what ltfs is doing in the syslog:
Backend readpos: (1, 2296175) FM = 10
FUSE release file
FUSE set times
FUSE chmod
FUSE rename
FUSE create file
Backend locate: (1, 63137)
FUSE chmod
Backend readpos: (1, 63137) FM = 6
Backend locate: (1, 2296176)
Backend readpos: (1, 2296176) FM = 10
FUSE flush
Backend locate: (1, 63138)
I'd interpret this as rsync reading from two locations simulataneously.
Why is rsync reading at all? It should be writing! How can I prevent the tape from hunting around?
The command used for rsync without includes and excludes to make it more readable:
sync --recursive --times --devices --specials --hard-links --human-readable --links --perms --executability --group --owner --info=progress2 --no-inc-recursive --whole-file --delete --delete-excluded -v -i --out-format=BACKINTIME: %i %n%L --link-dest=../../20230114-102905-656/backup --chmod=Du+wx / /media/tape/5/new_snapshot/backup
Related
our Redis server went down (unsure why, not a server outage) and trying to restart with the pre-existing aof file in place and it wouldn't start getting stuck on the line "Reading RDB preamble from AOF file..."
The server was originally started as follows:
docker run -d -p 6379:6379 --name redis -e REDIS_PASSWORD=# -v /var/redis/data:/bitnami/redis/data bitnami/redis:5.0.5-r5
BGREWRITEAOF was enabled so the aof was rewritten by Redis to keep the size lower
Moving the previous AOF file and starting a new container, Redis came up fresh.
I have tried to import the previous AOF using
redis-cli -a # --pipe < previousfile.aof
or
cat previousfil.aof | redis-cli - a # --pipe
But it fails due to the input being too large
redis-check-aof says my previous AOF is invalid, but I wonder it is because the file was compressed with BGREWRITEAOF
I'm stuck with how to move forward.
Can I revert the BGREWRITEAOF to get the raw file?
Can I break the file up in that format and import?
Are there better ways to import Redis data
Thanks
Scaleway recently launched GLACIER class storage "C14 Cold Storage Class"
They have a great plan of 75GB free and I'd like to take advantage of this using the restic backup tool.
To get this working I have successfully followed the S3 instructions for repository creation and uploading, with one caveat. I can not successfully pass the storage-class header as GLACIER.
Using awscliv2, I can successfully pass a header that looks very much like this from my local machine: aws s3 cp object s3://bucket/ --storage-class GLACIER
But with restic, having dug through some github issues, I can see an option to pass a -o flag. The linked issues resolution is not that clear to me so I have tried the following restic commands without successfully seeing the "GLACIER" class of storage label next to the files objects in the Scaleway bucket console:
restic -r s3:s3.fr-par.scw.cloud/restic-testing -o GLACIER --verbose backup ~/test.txt
restic -r s3:s3.fr-par.scw.cloud/restic-testing -o storage-class=GLACIER --verbose backup ~/test.txt
Can someone suggest another option?
I'm starting to use C14's GLACIER storage class with restic, and until now it seems be working very well.
I suggest to create the repository in the usual way with restic -r s3:s3.fr-par.scw.cloud/test-bucket init, which will create the config file and keys in the STANDARD storage class.
For backups, I'm using the command:
$ restic backup -r s3:s3.fr-par.scw.cloud/test-bucket -o s3.storage-class=GLACIER --host host /path
similar to what you did, apart the option is s3.storage-class and not storage-class.
In this way files in the data and snapshots directories are in GLACIER storage class, and you can add backups with no problem.
I can also mount the repository while data is in GLACIER class (I suppose all the info are taken from cache) so I can do restic mount /mnt/c14 and I can browse the files, also if I cannot copy them or see their content.
If I need to restore files, I restore all bucket in STANDARD class with s3cmd restore --recursive s3://test-bucket/ (see s3cmd), I test that all files are correctly in standard class with:
$ aws s3 ls s3://test-bucket --recursive | tr -s ' ' | cut -d' ' -f 4 | xargs -n 1 -I {} sh -c "aws s3api head-object --bucket unitedhost --key '{}' | jq -r .StorageClass" | grep --quiet GLACIER
which returns true if at least one file is in GLACIER class, so you have to wait this command to returns false.
Obviously a restore will need more time, but I'm using C14 glacier as a second or third backup, while using another restic repository in Backblaze B2 which is a warm storage.
In addition to vstefanoxx 's answer : Here is my workflow.
I setup the restic repository just like vstefanoxx.
Now, if you want to prune the repository... you cannot as the files are in glacier and restic needs read-write access to the bucket to prune.
What is interesting about Scaleway is that file transferts between glacier and standard class are free. So let's move the data back to the standard class :
s3cmd restore --recursive s3://test-bucket
And wait until the end of the process using the command given by vstefanoxx. Once your data is in the standard class it costs you five times more, so we have to be efficient :-)
So we now prune the repository:
restic prune -r s3:s3.fr-par.scw.cloud/test-bucket
And once it is finished, move everything (in fact data, index and snapshots but not keys) back to glacier:
s3cmd cp s3://test-bucket/data/ s3://test-bucket/data/ --recursive --storage-class=GLACIER
s3cmd cp s3://test-bucket/index/ s3://test-bucket/index/ --recursive --storage-class=GLACIER
s3cmd cp s3://test-bucket/snapshots/ s3://test-bucket/snapshots/ --recursive --storage-class=GLACIER
So we are now to a point where we have pruned the repository, trying to pay the least amount of money !
The chosen answer doesn't seem to work when doing incremental backups. I went with a different solution.
I set up a normal bucket, initialized with your usual restic init. Then I set up the following lifetime rule:
<?xml version="1.0" ?>
<LifecycleConfiguration xmlns="http://s3.amazonaws.com/doc/2006-03-01/">
<Rule>
<ID>data-to-glacier</ID>
<Filter>
<Prefix>data/</Prefix>
</Filter>
<Status>Enabled</Status>
<Transition>
<Days>0</Days>
<StorageClass>GLACIER</StorageClass>
</Transition>
</Rule>
</LifecycleConfiguration>
Days is set to 0, which means that the rule will be applied to all files. Rules are not applied continuously though, they're applied once a day at midnight UTC.
This rule will only apply to the files in data/, which are the big files.
This rule description is supposed to be used with s3cmd but you can also do it from the dashboard if you prefer a GUI.
how to perform precopy memory migration in LXC /LXD so that I can perform live migration from one host to another?How do I set to precopy migration in CRIU?
The Pre-copy memory migration for Live migration with CRUI
Live Migration of Linux Containers between hosts (there is migration script), LXC and CRUI:
You should have built and installed a recent (>= 1.3.1) version of
CRIU.
LXC upstream has begun to integrate checkpoint/restore support through
the lxc-checkpoint tool. This functionality has been in the recent
released version of LXC---LXC 1.1.0 , you can install the LXC 1.1.0 or
you can check out the development version on Ubuntu by doing:
sudo add-apt-repository ppa:ubuntu-lxc/daily
sudo apt-get update
sudo apt-get install lxc
And add the following lines (as above) to LXC container config:
cat | sudo tee -a /var/lib/lxc/u1/config << EOF
# hax for criu
lxc.console = none
lxc.tty = 0
lxc.cgroup.devices.deny = c 5:1 rwm
EOF
checkpoint the container:
lxc-checkpoint -s -D /tmp/checkpoint -n u1
At this point, the container's state is stored in /tmp/checkpoint, and the filesystem is in /var/lib/lxc/u1/rootfs. You can restore the container by doing:
lxc-checkpoint -r -D /tmp/checkpoint -n u1
PS:
You can do live migration for processes:
Dump
Take tasks you're about to migrate and dump them into some place, asking criu to leave them in stopped state after dump:
criu dump --tree <pid> --images-dir <path-to-existing-directory> --leave-stopped
The directory you put images to can reside on the shared file-system if you're using one. In this case you can skip the Copy step and proceed to Restore.
Copy
Copy images to destination node:
scp -r <path-to-images-dir> <dst>:/<path-to-images>
Restore
Go to the destination node and restore the apps from images on it:
criu restore --tree <pid> --images-dir <path-to-images>
I have a Arduino Yun and want setup the server for Yun.
So what I want is to copy a folder that contain a py file and a index.html to my Yun
I used mac terminal to do this operation
the command looks like this
scp -r /Users/gudi/Desktop/LobsterHeartRate root#192.168.240.1:/mnt/sda1
and then terminal asked for the password
after I typed, it shows
scp: /mnt/sda1/LobsterHeartRate: Not a directory
I didn't type /mnt/sda1/LobsterHeartRate why it shows this error
Your code
scp -r /Users/gudi/Desktop/LobsterHeartRate root#192.168.240.1:/mnt/sda1
requires that the remote directory /mnt/sda1 exists. This looks like it is not true in your case. Check it using ssh root#192.168.240.1 ls /mnt/sda1.
scp is simple tool and it does not allow you to rename directories on the fly and the target directory must exists. You might try
scp -r /Users/gudi/Desktop/LobsterHeartRate root#192.168.240.1:/mnt/
ssh root#192.168.240.1 mv /mnt/LobsterHeartRate /mnt/sda1
or so, if it will suit your needs. But copying more files, rsync is usually more suitable. Check its manual page and give it a try next time.
As #Jens Höpken notes, your post is a bit sparse. But trying to read between the lines of your post I suspect that LobsterHeartRate is a DIRECTORY on your local system but a FILE named LobsterHeartRate in your target system. This might be happening right at the top of the directory tree, or perhaps you have directories/files of the same name further down the tree. scp -rv might help resolve any confusions here.
Beware: scp -r resolves symbolic links. If you want to preserve symlinks you need to do something else. For historic reasons I use the following, though cpio with a find front-end opens up interesting possibilities for fine-grained file selections.
( cd /Users/gudi/Desktop && tar -cf - LobsterHeartRate ) |
ssh root#192.168.240.1 'cd /mnt/sda1 && tar -xf -'
For a safe "dry run" you could change the -xf to a -tf. The && chains are required to prevent bad things from happening if any prior command fails.
Disclaimer: any debugging is left as an exercise for the student.
I was wondering if anybody could help me with this issue in deploying a spark cluster using the bdutil tool.
When the total number of cores increase (>= 1024), it failed all the time with the following reasons:
Some machine is never sshable, like "Tue Dec 8 13:45:14 PST 2015: 'hadoop-w-5' not yet sshable (255); sleeping"
Some nodes fail with an "Exited 100" error when deploying spark worker nodes, like "Tue Dec 8 15:28:31 PST 2015: Exited 100 : gcloud --project=cs-bwamem --quiet --verbosity=info compute ssh hadoop-w-6 --command=sudo su -l -c "cd ${PWD} && ./deploy-core-setup.sh" 2>>deploy-core-setup_deploy.stderr 1>>deploy-core-setup_deploy.stdout --ssh-flag=-tt --ssh-flag=-oServerAliveInterval=60 --ssh-flag=-oServerAliveCountMax=3 --ssh-flag=-oConnectTimeout=30 --zone=us-central1-f"
In the log file, it says:
hadoop-w-40: ==> deploy-core-setup_deploy.stderr <==
hadoop-w-40: dpkg-query: package 'openjdk-7-jdk' is not installed and no information is available
hadoop-w-40: Use dpkg --info (= dpkg-deb --info) to examine archive files,
hadoop-w-40: and dpkg --contents (= dpkg-deb --contents) to list their contents.
hadoop-w-40: Failed to fetch http://httpredir.debian.org/debian/pool/main/x/xml-core/xml-core_0.13+nmu2_all.deb Error reading from server. Remote end closed connection [IP: 128.31.0.66 80]
hadoop-w-40: E: Unable to fetch some archives, maybe run apt-get update or try with --fix-missing?
I tried 16-core 128-nodes, 32-core 64-nodes, 32-core 32-nodes and other over 1024-core configurations, but either the above Reason 1 or 2 will show up.
I also tried to modify the ssh-flag to change the ConnectTimeout to 1200s, and change bdutil_env.sh to set the polling interval to 30s, 60s, ..., none of them works. There will be always some nodes which fail.
Here is one of the configurations that I used:
time ./bdutil \
--bucket $BUCKET \
--force \
--machine_type n1-highmem-32 \
--master_machine_type n1-highmem-32 \
--num_workers 64 \
--project $PROJECT \
--upload_files ${JAR_FILE} \
--env_var_files hadoop2_env.sh,extensions/spark/spark_env.sh \
deploy
To summarize some of the information that came out from a separate email discussion, as IP mappings change and different debian mirrors get assigned, there can be occasional problems where the concurrent calls to apt-get install during a bdutil deployment can either overload some unbalanced servers or trigger DDOS protections leading to deployment failures. These do tend to be transient, and at the moment it appears I can deploy large clusters in zones like us-east1-c and us-east1-d successfully again.
There are a few options you can take to reduce the load on the debian mirrors:
Set MAX_CONCURRENT_ASYNC_PROCESSES to a much smaller value than the default 150 inside bdutil_env.sh, such as 10 to only deploy 10 at a time; this will make the deployment take longer, but would lighten the load as if you just did several back-to-back 10-node deployments.
If the VMs were successfully created but the deployment steps fail, instead of needing to retry the whole delete/deploy cycle, you can try ./bdutil <all your flags> run_command -t all -- 'rm -rf /home/hadoop' followed by ./bdutil <all your flags> run_command_steps to just run through the whole deployment attempt.
Incrementally build your cluster using resize_env.sh; initially set --num_workers 10 and deploy your cluster, and then edit resize_env.sh to set NEW_NUM_WORKERS=20, and run ./bdutil <all your flags> -e extensions/google/experimental/resize_env.sh deploy and it will only deploy the new workers 10-20 without touching those first 10. Then you just repeat, adding another 10 workers to NEW_NUM_WORKERS each time. If a resize attempt fails, you simply ./bdutil <all your flags> -e extensions/google/experimental/resize_env.sh delete to only delete those extra workers without affecting the ones you already deployed successfully.
Finally, if you're looking for more reproducible and optimized deployments, you should consider using Google Cloud Dataproc, which lets you use the standard gcloud CLI to deploy cluster, submit jobs, and further manage/delete clusters without needing to remember your bdutil flags or keep track of what clusters you have on your client machine. You can SSH into Dataproc clusters and use them basically the same as bdutil clusters, with some minor differences, like Dataproc DEFAULT_FS being HDFS so that any GCS paths you use should fully-specify the complete gs://bucket/object name.