Is tar ignoring --after-date option? - backup

I wanted to make an incremental backup with tar.
I made a full backup on 2012-04-08, and later I wanted to back up all files created or changed after that date.
So I did something like this:
cd /directory/I/wanted/to/back/up
tar --newer 2012-04-08 -cvnf "/backup/dir/$(date +%F).tar" .
After a while I realised that tar was archiving files that I know have not changed since the last backup. I checked their modification dates, and they should not have been included.
I couldn't believe it, so I made a little test:
cd ~
mkdir test
cd test
touch -t 201101010000 OLD
touch NEW
cd ..
tar -N 2012-01-01 -cvf test.tar ./test/*
tar -tf test.tar
ls -o ./test/
It clearly seems that tar is ignoring the -N, --newer and --after-date options. It has archived both files, even though the one I named OLD was created with a timestamp before 2012.
What am I doing wrong, and how can I fix this?

Your date format must be
tar --newer 20120408
to select files newer than Apr 8, 2012. No "-"!
Or you can use
TWODAYSAGO=$(date --date '2 days ago' --rfc-3339=seconds)
tar -cz --newer-mtime="${TWODAYSAGO}" -f bakfile.tgz /dir_to_backup
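For reference, here is a minimal sketch of how the original backup command could look with either of these date formats. The paths are the ones from the question, and the trailing . names the directory to archive; verify the behaviour against your own tar version before relying on it:
cd /directory/I/wanted/to/back/up
# compact date, no dashes
tar --newer 20120408 -cvf "/backup/dir/$(date +%F).tar" .
# or an explicit RFC-3339 timestamp via --newer-mtime
SINCE=$(date --date '2012-04-08' --rfc-3339=seconds)
tar --newer-mtime="${SINCE}" -cvf "/backup/dir/$(date +%F).tar" .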

Related

RSYNC and folder hierarchy

After making a full forensic copy of a hard drive using dd, I would like to keep up with changes between the original and the backup disk, so I started using rsync.
Whenever I run
rsync -a -v -n --progress /media/drive1 /media/drive2
the command starts listing all the files contained in drive1, even though only a couple of them have changed since I ran dd.
Trying that on a single folder
rsync -a -v -n --progress /media/drive1/folder /media/drive2
works fine and just displays the new files in that folder - those which are not contained in /media/drive2/folder.
However, executing the command on the level of both volumes
rsync -a -v -n --progress /media/drive1 /media/drive2
does not transfer only the differences, as the documentation available everywhere describes, but includes all the files that are already on both drives.
What is my mistake?
The way rsync treats its source and destination paths is easy to get wrong. When you use the command:
rsync -a -v -n --progress /media/drive1 /media/drive2
...it tries to sync the drive1 folder into drive2; that is, it creates and populates /media/drive2/drive1. When you add "/folder" to the source path, it works as expected because then it's trying to sync with /media/drive2/folder, which is what you want.
Fortunately, the solution is easy: add "/" to the end of the source path, which tells it to sync the contents of drive1 into drive2, rather than the folder itself:
rsync -a -v -n --progress /media/drive1/ /media/drive2
BTW, the -n (--dry-run) flag you already have is a good way to make sure it's doing what you want before running it "for real"; just drop it for the actual sync. You'll probably also have to delete the /media/drive2/drive1 directory that the earlier runs created.
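To see the trailing-slash rule in isolation, here is a small sketch with throwaway directories (the paths are made up for the example, and -n keeps both runs as dry runs):
mkdir -p /tmp/src/folder /tmp/dst
touch /tmp/src/folder/file
# without a trailing slash: would create /tmp/dst/src/folder/file
rsync -a -v -n /tmp/src /tmp/dst
# with a trailing slash: would create /tmp/dst/folder/file
rsync -a -v -n /tmp/src/ /tmp/dst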

Extract huge tar.gz archives from S3 without copying archives to a local system

I'm looking for a way to extract a huge dataset (18 TB+, found here: https://github.com/cvdfoundation/open-images-dataset#download-images-with-bounding-boxes-annotations). I need the process to be fast (i.e. I don't want to spend twice the time by first copying and then extracting the files), and I don't want the archives to take up extra space, not even a single 20 GB+ archive.
Any thoughts on how one can achieve that?
If you can arrange to pipe the data straight into tar, it can uncompress and extract it without needing a temporary file.
Here is an example. First, create a tar file to play with:
$ echo abc >one
$ echo def >two
$ tar cvf test.tar one two
one
two
$ gzip test.tar
Remove the test files
$ rm one two
$ ls one two
ls: cannot access one: No such file or directory
ls: cannot access two: No such file or directory
Now extract the contents by piping the compressed tar file into the tar command.
$ cat test.tar.gz | tar xzvf -
one
two
$ ls one two
one two
The only part missing now is how to download the data and pipe it into tar. Assuming you can access the URL with wget, you can get it to send the data to stdout. So you end up with this:
wget -qO- https://youtdata | tar xzvf -
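If the archives live in an S3 bucket and you have the AWS CLI configured, the same trick works by streaming the object to stdout with - as the destination; the bucket and key below are placeholders, not the dataset's real location:
aws s3 cp s3://your-bucket/path/to/archive.tar.gz - | tar xzvf -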

No such file or directory from sh script

Looking for the origin of this error message:
Processing: +([^_]).flv
date: +([^_]).flv: No such file or directory
I started getting this at some point in the last few months (can't say when as I wasn't logging my cron output. I know, I know!).
When I originally wrote this, it worked ok for at least two months. I'm wondering if there was an sh update that broke it?
The script runs via crontab and picks up every .flv file in the current directory whose name contains no underscore, processing each one. It then checks the modification date and, for files changed in the last 24 hours, runs the yamdi metadata injector on them.
It seems to me like it's treating the pattern as a literal filename rather than expanding it as a glob. If I run this script from an ssh shell it works OK; it's only when running via cron that it gives this error.
shopt -s extglob
now=$(date +"%s")
for f in +([^_]).flv; do
    echo "Processing: $f"
    age=$(date -r "$f" +"%s")
    calc=$(( (now - age) / 60 / 60 ))
    if (( calc < 24 )); then
        echo "$f age=$calc"
        yamdi -i "$f" -o "$f".seek
        rm "$f"
        cp "$f".seek "$f"
        touch -d "@$age" "$f"
    fi
done
This is most likely a problem of the wrong shell being used; make sure your script's first line specifies the right shell:
#!/bin/bash
for bash, or whatever shell you wrote this for. You might also want to check the environment variables that cron sets (a very common problem: one assumes everything is set up correctly, but the environment cron offers to the scripts it executes is different).
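As a sketch of what that looks like in practice (the paths and schedule here are examples, not taken from the question), the script would start with an explicit bash shebang:
#!/bin/bash
shopt -s extglob
cd /path/to/flv/files || exit 1
# ... rest of the script as above ...
and the crontab entry (crontab -e) can invoke bash explicitly as well, so it does not depend on the shebang at all:
0 3 * * * /bin/bash /home/user/process-flv.sh >> /home/user/process-flv.log 2>&1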

trying to download a dataset from a website

I am trying to download a dataset from a website, but I can't download the whole folder; I have to download each file separately, which will take a lot of time. I am wondering if there is any way to download the whole folder at once?
The website link: http://www.physionet.org/pn4/eegmmidb/
Use wget with the -r switch to turn on mirroring.
This command will do what you want:
wget --no-parent -r http://www.physionet.org/pn4/eegmmidb/
It'll produce a mirror copy of everything from that directory on down.
These two nested for loops, run in bash, should do it:
for S in S{001..109}; do
    mkdir ${S}
    cd ${S}
    for R in R{01..14}; do
        file="http://www.physionet.org/pn4/eegmmidb/${S}/${S}${R}.edf"
        wget "$file"
        wget "${file}.event"
    done
    cd ..
done

Forcedly update workspace in Accurev

Is there any command to force-update my workspace in AccuRev, i.e. directly replace the local files with the backed versions, without caring about conflicting files, modified files, and so on?
I really miss the cvs command cvs update -C -d
I have had similar issues. Usually I just use the following commands:
accurev update -9
accurev pop -O -R .
accurev update
No, you will need to run a few operations. You can create a script to force-update your workspace.
Basically, you will generate a list of all the modified, kept, overlap, and member files, then purge those files, then update your workspace.
Check out the stat section in the CLI manual.
What you can do is delete all the local files from the file system and then do a:
accurev pop -R <path to local workspace directory>
I had a similar issue. First take a backup of the existing workspace, then delete all the files in the local workspace folder. Click the Update button in AccuRev, and all files will be reloaded into the workspace.
If you just want to undo all changes you have done in the workspace:
accurev stat -R -m -fl . | xargs -n 1 accurev purge
You can use a similar command with rm and accurev pop to force a refetch from the backend. You can also vary the flags: -m for modified, -k for kept, -a for all.
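Putting that together, a minimal sketch of the force-refetch variant, assuming GNU xargs, that it is run from the workspace root, and that losing all local changes is acceptable:
# delete the local copies of the files AccuRev knows about, then repopulate from the backing stream
accurev stat -R -a -fl . | xargs -d '\n' rm -f
accurev pop -O -R .
accurev update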