The backup files I get from gitlab-rake are tar files. How can I get tar.gz files instead?
Here are the files:
root@gitlab:~# ll /mnt/backup-git/ -h
total 1.9G
-rw------- 1 git git 57M Nov 29 15:57 1480431448_gitlab_backup.tar
-rw------- 1 git git 57M Nov 29 15:57 1480431473_gitlab_backup.tar
-rw------- 1 git git 452M Nov 30 02:00 1480467623_gitlab_backup.tar
Here are my configuration values for the backup:
$ grep -i backup /etc/gitlab/gitlab.rb | grep -v '^#'
gitlab_rails['backup_path'] = "/mnt/backup-git/"
gitlab_rails['backup_keep_time'] = 604800
To create them, I followed the documentation (Omnibus installation) and set up this cron job:
root@gitlab:~# crontab -l | grep -v '^#'
0 2 * * * /opt/gitlab/bin/gitlab-rake gitlab:backup:create CRON=1
It doesn't really make sense to compress the GitLab backup tar files. Each one is the final tarball produced by the backup process, and its contents are files that were already compressed during that process, so compressing the tarball again gains very little. You can read more in the GitLab backup documentation.
Using:
podman version 4.2.0
AlmaLinux 8.7
I've created an image based on redhat/ubi8 with the following Dockerfile:
FROM docker.io/redhat/ubi8
RUN dnf install -y gcc-c++ cmake python39 openssh git
RUN useradd -ms /bin/bash foobar -g users
USER foobar
WORKDIR /home/foobar/
RUN mkdir -p .ssh
$ docker build -t mount_test_image .
I run the image from a directory that contains a directory named ssh, and I want to mount that directory at /home/foobar/.ssh, owned by foobar:users.
$ ls -l
-rw-r--r--. 1 host_user users 269 Dec 7 09:10 Dockerfile
drwxrwxr-x. 2 host_user users 18 Dec 2 10:41 ssh
docker run -it -d --rm --mount type=bind,src=ssh,target=/home/foobar/.ssh --name=mount_test mount_test_image
However when I enter the container via
docker exec -it mount_test '/bin/sh'
The home directory looks like this:
drwx------. 1 foobar users 18 Dec 7 17:10 .
drwxr-xr-x. 1 root root 21 Dec 7 17:10 ..
-rw-r--r--. 1 foobar users 18 Jun 20 11:31 .bash_logout
-rw-r--r--. 1 foobar users 141 Jun 20 11:31 .bash_profile
-rw-r--r--. 1 foobar users 376 Jun 20 11:31 .bashrc
drwxrwxr-x. 2 root root 18 Dec 2 18:41 .ssh
I obviously get a "permission denied" when trying to access that directory.
sh-4.4$ ls /home/foobar/.ssh
ls: cannot open directory '/home/foobar/.ssh': Permission denied
I tried changing the ownership of the directory on the host to match the uid of the container user, but then it just looks like this:
drwxrwxr-x. 2 nobody root 18 Dec 2 18:41 .ssh
My host user uid:gid is 501:100 and the container user is 1000:100. Right now I'm just trying to generate an ssh key to upload to bitbucket, but this seems like a simple feature a container should have. All the tutorials and examples just stop after the --mount command instruction and say "there ya go!". What good is the mount point if you can't read/write it?
EDIT:
I tried on Arch Linux using docker instead of podman, and it works as one would expect with both -v and --mount. The owner of the mounted directory in the container matches the uid and gid of the host. Is this then a bug in podman, or is it just done differently?
You are using a non-root user (foobar) in a rootless container. You must use --userns=keep-id for the container user to see the mounted volumes.
https://github.com/containers/podman/blob/main/docs/tutorials/rootless_tutorial.md#using-volumes
How does the gitlab-ci cache work on the Docker runner?
What is the /cache directory?
What is cache_dir?
Where and how are the files matching "paths" under "cache" in gitlab-ci.yml stored?
The volume mounted at the /cache directory is created automatically when gitlab-runner is installed and is managed by the cache_dir setting.
More about cache_dir:
https://docs.gitlab.com/runner/configuration/advanced-configuration.html#the-runners-section
https://gitlab.com/gitlab-org/gitlab-runner/blob/main/docs/executors/docker.md#the-builds-and-cache-storage
If you modify the /cache storage path, you also need to make sure to mark this
directory as persistent by defining it in volumes = ["/my/cache/"] under the
[runners.docker] section in config.toml.
TL;DR
The /cache dir is different from the cache config in gitlab-ci.yml.
The /cache dir in the job container is where the cached files are stored.
Files matching the cache config in gitlab-ci.yml are copied to /cache/$CI_PROJECT_NAMESPACE/$CI_PROJECT_NAME/<cache-key>-<cache-number> at the end of the job.
The "Clear Runner Caches" button on your project's "Pipelines" page does not remove the contents of the /cache folder, as I thought at first. Instead, it schedules later jobs to stop extracting the old /cache/$CI_PROJECT_NAMESPACE/$CI_PROJECT_NAME/<cache-key>-<cache-number>/cache.zip into the dirs specified in the cache config in gitlab-ci.yml.
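As a small sketch of the naming scheme described above: the helper name cache_archive_path is made up for illustration, and the values come from the job logs in the Proofs section (project srghma/myproject, key test00000, cache numbers 11 and 12).

```shell
# Hypothetical helper: builds the path of the locally stored cache archive.
# Arguments: namespace, project name, cache key, cache number.
cache_archive_path() {
    printf '/cache/%s/%s/%s-%s/cache.zip\n' "$1" "$2" "$3" "$4"
}

before_clear=$(cache_archive_path srghma myproject test00000 11)
after_clear=$(cache_archive_path srghma myproject test00000 12)
echo "$before_clear"
echo "$after_clear"
```

In the logs, clearing the runner caches moved the job from test00000-11 to test00000-12, so the job looks for a different archive while the old one stays on disk.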
P.S.
There is a container named gitlab-runner-cache created on the machine that runs gitlab-runner (https://gitlab.com/gitlab-org/gitlab-runner/blob/af343971874198a1923352107409583b78e8aa80/executors/docker/executor_docker.go#L382)
It seems that this container is used to create the anonymous volume where the /cache data is stored. After the anonymous volume is created, this container is stopped.
The job containers (meaning the containers where your tests typically run) mount this anonymous volume.
Proofs
Given this gitlab-ci.yml:
image: srghma/docker-nixos-with-git-crypt
cache:
  key: "test00000" # to reset the cache, change this key OR clear the cache in the project settings page
  paths:
    - .mycache # gitlab only allows cache dirs that are relative to the project root OR /cache (created automatically)
testtest:
  script:
    - nix-env -i tree
    - tree --dirsfirst -L 4 /cache
    - ls -al ./.mycache || true
    - echo "test" > /cache/test
    - mkdir -p ./.mycache
    - echo "test" > ./.mycache/test
    - tree --dirsfirst -L 4 /cache
    - ls -al ./.mycache || true
Output:
on first run
Running with gitlab-runner 11.6.0 (f100a208)
on srghma_gitlab_runner 9b3980da
Using Docker executor with image srghma/docker-nixos-with-git-crypt ...
Pulling docker image srghma/docker-nixos-with-git-crypt ...
Using docker image sha256:ad3491aae178f629df713e0719750cc445b4881702b6b04b7cf325121f0032bf for srghma/docker-nixos-with-git-crypt ...
Running on runner-9b3980da-project-222-concurrent-0 via myrunner.com...
Fetching changes...
Removing .mycache/
HEAD is now at 675caa7 feat: cache update
From https://gitlab.com/srghma/myproject
675caa7..3d1e223 nix -> origin/nix
Checking out 3d1e2237 as nix...
Skipping Git submodules setup
Checking cache for test00000-11...
No URL provided, cache will be not downloaded from shared cache server. Instead a local version of cache will be extracted.
Successfully extracted cache
$ nix-env -i tree
installing 'tree-1.8.0'
these paths will be fetched (0.03 MiB download, 0.09 MiB unpacked):
/nix/store/dhfq0dsg9a0j5ai78bmh5qlrla8wvcxz-tree-1.8.0
copying path '/nix/store/dhfq0dsg9a0j5ai78bmh5qlrla8wvcxz-tree-1.8.0' from 'https://cache.nixos.org'...
building '/nix/store/dankqr2x4g5igc4w7lw9xqnn7lcy4f7a-user-environment.drv'...
created 233 symlinks in user environment
$ tree --dirsfirst -L 4 /cache
/cache
0 directories, 0 files
$ ls -al ./.mycache || true
$ echo "test" > /cache/test
ls: ./.mycache: No such file or directory
$ mkdir -p ./.mycache
$ echo "test" > ./.mycache/test
$ tree --dirsfirst -L 4 /cache
/cache
`-- test
0 directories, 1 file
$ ls -al ./.mycache || true
total 12
drwxr-xr-x 2 root root 4096 Feb 24 11:44 .
drwxrwxrwx 20 root root 4096 Feb 24 11:44 ..
-rw-r--r-- 1 root root 5 Feb 24 11:44 test
Creating cache test00000-11...
.mycache: found 2 matching files
No URL provided, cache will be not uploaded to shared cache server. Cache will be stored only locally.
Created cache
Job succeeded
on second run
Running with gitlab-runner 11.6.0 (f100a208)
on srghma_gitlab_runner 9b3980da
Using Docker executor with image srghma/docker-nixos-with-git-crypt ...
Pulling docker image srghma/docker-nixos-with-git-crypt ...
Using docker image sha256:ad3491aae178f629df713e0719750cc445b4881702b6b04b7cf325121f0032bf for srghma/docker-nixos-with-git-crypt ...
Running on runner-9b3980da-project-222-concurrent-0 via myrunner.com...
Fetching changes...
Removing .mycache/
HEAD is now at 3d1e223 feat: cache update
Checking out 3d1e2237 as nix...
Skipping Git submodules setup
Checking cache for test00000-11...
No URL provided, cache will be not downloaded from shared cache server. Instead a local version of cache will be extracted.
Successfully extracted cache
$ nix-env -i tree
installing 'tree-1.8.0'
these paths will be fetched (0.03 MiB download, 0.09 MiB unpacked):
/nix/store/dhfq0dsg9a0j5ai78bmh5qlrla8wvcxz-tree-1.8.0
copying path '/nix/store/dhfq0dsg9a0j5ai78bmh5qlrla8wvcxz-tree-1.8.0' from 'https://cache.nixos.org'...
building '/nix/store/dankqr2x4g5igc4w7lw9xqnn7lcy4f7a-user-environment.drv'...
created 233 symlinks in user environment
$ tree --dirsfirst -L 4 /cache
/cache
|-- srghma
|   `-- myproject
|       `-- test00000-11
|           `-- cache.zip
`-- test
3 directories, 2 files
$ ls -al ./.mycache || true
total 12
drwxr-xr-x 2 root root 4096 Feb 24 11:44 .
drwxrwxrwx 20 root root 4096 Feb 24 11:44 ..
-rw-r--r-- 1 root root 5 Feb 24 11:44 test
$ echo "test" > /cache/test
$ mkdir -p ./.mycache
$ echo "test" > ./.mycache/test
$ tree --dirsfirst -L 4 /cache
/cache
|-- srghma
|   `-- myproject
|       `-- test00000-11
|           `-- cache.zip
`-- test
3 directories, 2 files
$ ls -al ./.mycache || true
total 12
drwxr-xr-x 2 root root 4096 Feb 24 11:44 .
drwxrwxrwx 20 root root 4096 Feb 24 11:44 ..
-rw-r--r-- 1 root root 5 Feb 24 11:44 test
Creating cache test00000-11...
.mycache: found 2 matching files
No URL provided, cache will be not uploaded to shared cache server. Cache will be stored only locally.
Created cache
Job succeeded
after you clear cache by clicking on "Clear Runner Caches" in your project "Pipelines" page
Running with gitlab-runner 11.6.0 (f100a208)
on srghma_gitlab_runner 9b3980da
Using Docker executor with image srghma/docker-nixos-with-git-crypt ...
Pulling docker image srghma/docker-nixos-with-git-crypt ...
Using docker image sha256:ad3491aae178f629df713e0719750cc445b4881702b6b04b7cf325121f0032bf for srghma/docker-nixos-with-git-crypt ...
Running on runner-9b3980da-project-222-concurrent-0 via myrunner.com...
Fetching changes...
Removing .mycache/
HEAD is now at 3d1e223 feat: cache update
Checking out 3d1e2237 as nix...
Skipping Git submodules setup
Checking cache for test00000-12...
No URL provided, cache will be not downloaded from shared cache server. Instead a local version of cache will be extracted.
Successfully extracted cache
$ nix-env -i tree
installing 'tree-1.8.0'
these paths will be fetched (0.03 MiB download, 0.09 MiB unpacked):
/nix/store/dhfq0dsg9a0j5ai78bmh5qlrla8wvcxz-tree-1.8.0
copying path '/nix/store/dhfq0dsg9a0j5ai78bmh5qlrla8wvcxz-tree-1.8.0' from 'https://cache.nixos.org'...
building '/nix/store/dankqr2x4g5igc4w7lw9xqnn7lcy4f7a-user-environment.drv'...
created 233 symlinks in user environment
$ tree --dirsfirst -L 4 /cache
/cache
|-- srghma
|   `-- myproject
|       `-- test00000-11
|           `-- cache.zip
`-- test
3 directories, 2 files
$ ls -al ./.mycache || true
ls: ./.mycache: No such file or directory
$ echo "test" > /cache/test
$ mkdir -p ./.mycache
$ echo "test" > ./.mycache/test
$ tree --dirsfirst -L 4 /cache
/cache
|-- srghma
|   `-- myproject
|       `-- test00000-11
|           `-- cache.zip
`-- test
3 directories, 2 files
$ ls -al ./.mycache || true
total 12
drwxr-xr-x 2 root root 4096 Feb 24 11:45 .
drwxrwxrwx 20 root root 4096 Feb 24 11:45 ..
-rw-r--r-- 1 root root 5 Feb 24 11:45 test
Creating cache test00000-12...
.mycache: found 2 matching files
No URL provided, cache will be not uploaded to shared cache server. Cache will be stored only locally.
Created cache
Job succeeded
When I run my GitLab backup, the backup files have no timestamp.
They should look like this: the filename will be [TIMESTAMP]_gitlab_backup.tar
Here are the files:
root@gitlab:~# ll /mnt/backup-git/ -h
total 1.9G
-rw------- 1 git git 57M Nov 29 15:57 1480431448_gitlab_backup.tar
-rw------- 1 git git 57M Nov 29 15:57 1480431473_gitlab_backup.tar
-rw------- 1 git git 452M Nov 30 02:00 1480467623_gitlab_backup.tar
Here are my configuration values for the backup:
$ grep -i backup /etc/gitlab/gitlab.rb | grep -v '^#'
gitlab_rails['backup_path'] = "/mnt/backup-git/"
gitlab_rails['backup_keep_time'] = 604800
To create them, I followed the documentation (Omnibus installation) and set up this cron job:
root@gitlab:~# crontab -l | grep -v '^#'
0 2 * * * /opt/gitlab/bin/gitlab-rake gitlab:backup:create CRON=1
The files clearly have a timestamp already:
1480431448_gitlab_backup.tar
The leading number (1480431448) is the Unix time of the backup.
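To make the timestamp readable, something like the following sketch could be used. It assumes GNU date (the -d @SECONDS form is not portable to every date implementation), and the variable names are only illustrative.

```shell
# Extract the leading Unix timestamp from the backup file name.
backup_file=1480431448_gitlab_backup.tar
timestamp=${backup_file%%_*}

# Convert it to a readable UTC date (GNU date syntax).
backup_date=$(date -u -d "@$timestamp" '+%Y-%m-%d %H:%M:%S')
echo "$backup_date"   # 2016-11-29 14:57:28
```

That is one hour behind the 15:57 modification time in the listing, which would fit a server clock set to UTC+1; the timezone is an assumption.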
I'm trying to install httpd in Docker. I wrote a Dockerfile like this:
FROM centos
VOLUME /var/log/httpd
VOLUME /etc/httpd
VOLUME /var/www/html
# Update Yum Repository
RUN yum clean all && \
    yum makecache fast && \
    yum -y update && \
    yum -y install httpd
RUN yum clean all
EXPOSE 80
CMD /usr/sbin/httpd -D BACKGROUND && tail -f /var/log/httpd/access_log
It works if I run the image without host volumes, but fails if I use these parameters:
--volume /data/httpd/var/www/html:/var/www/html --volume /data/httpd/var/log:/var/log --volume /data/httpd/etc:/etc/httpd
the error message is:
httpd: Could not open configuration file /etc/httpd/conf/httpd.conf: No such file or directory
I checked the mount point which is empty:
# ll /data/httpd/etc/
total 0
But if I don't use "--volume", Docker by default copies the files over to a temp folder:
# ll /var/lib/docker/volumes/04f083887e503c6138a65b300a1b40602d227bb2bbb58c69b700f6ac753d1c34/_data
total 4
drwxr-xr-x. 2 root root 35 Nov 3 03:16 conf
drwxr-xr-x. 2 root root 78 Nov 3 03:16 conf.d
drwxr-xr-x. 2 root root 4096 Nov 3 03:16 conf.modules.d
lrwxrwxrwx. 1 root root 19 Nov 3 03:16 logs -> ../../var/log/httpd
lrwxrwxrwx. 1 root root 29 Nov 3 03:16 modules -> ../../usr/lib64/httpd/modules
lrwxrwxrwx. 1 root root 10 Nov 3 03:16 run -> /run/httpd
So I'm confused: why does Docker refuse to copy them to the named location, and how can I fix this problem?
This is a documented behavior indeed:
Volumes are initialized when a container is created. If the container’s
base image contains data at the specified mount point, that existing data
is copied into the new volume upon volume initialization. (Note that this
does not apply when mounting a host directory.)
That is, when you mount a host directory over /etc/httpd with --volume /data/httpd/etc:/etc/httpd, no data will be copied.
You can also see https://github.com/docker/docker/pull/9092 for a more detailed discussion on why it works this way (in case you are interested).
A usual workaround is to copy your initial data into the volume folder from within the container, in your ENTRYPOINT or CMD script, if the folder is empty.
Note that your initial dataset must be kept outside the volume folder (e.g. as .tar file in /opt), for this to work, as the volume folder will be shadowed by the host folder mounted over it.
Given below is a sample Dockerfile and Script, which demonstrate the behavior:
Sample Dockerfile
FROM debian:stable
RUN mkdir -p /opt/test/; touch /opt/test/initial-data-file
VOLUME /opt/test
Sample script (try various volume mappings)
#Build image
>docker build -t volumetest .
Sending build context to Docker daemon 2.56 kB
Step 0 : FROM debian:stable
---> f7504c16316c
Step 1 : RUN mkdir -p /opt/test/; touch /opt/test/initial-data-file
---> Using cache
---> 1ea0475e1a18
Step 2 : VOLUME /opt/test
---> Using cache
---> d8d32d849b82
Successfully built d8d32d849b82
#Implicit Volume mapping (as defined in Dockerfile)
>docker run --rm=true volumetest ls -l /opt/test
total 0
-rw-r--r-- 1 root root 0 Nov 4 18:26 initial-data-file
#Explicit Volume mapping
> docker run --rm=true --volume /opt/test volumetest ls -l /opt/test/
total 0
-rw-r--r-- 1 root root 0 Nov 4 18:26 initial-data-file
#Explicitly Mounted Volume
>mkdir test
>docker run --rm=true --volume "$(pwd)/test/:/opt/test" volumetest ls -l /opt/test
total 0
And here is a simple entrypoint script, illustrating a possible workaround:
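#!/bin/bash
# Seed the volume from the archive only while the volume is still empty.
VOLUME=/opt/test
DATA=/opt/data-volume.tar.gz
if [[ -n $(find "$VOLUME" -maxdepth 0 -empty) ]]
then
    echo "Preseeding VOLUME $VOLUME with data from $DATA..."
    tar -C "$VOLUME" -xvf "$DATA"
fi
# Hand control over to the command passed to the container.
exec "$@"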
Add the following to the Dockerfile:
COPY data-volume.tar.gz entrypoint /opt/
ENTRYPOINT ["/opt/entrypoint"]
First run:
>docker run --rm=true --volume "$(pwd)/test/:/opt/test" volumetest ls -l /opt/test
Preseeding VOLUME /opt/test with data from /opt/data-volume.tar.gz...
preseeded-data
total 0
-rw-r--r-- 1 1001 users 0 Nov 4 18:43 preseeded-data
Subsequent runs:
>docker run --rm=true --volume "$(pwd)/test/:/opt/test" volumetest ls -l /opt/test
ls -l /opt/test
total 0
-rw-r--r-- 1 1001 users 0 Nov 4 18:43 preseeded-data
Note that the volume folder will only be populated with data
if it was completely empty before.
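The "seed only when empty" behavior can also be tried outside Docker. The following is a minimal sketch in plain sh; the helper name seed_if_empty and the temporary directories are made up for the demonstration and are not part of the image above.

```shell
# Hypothetical helper: unpack a seed archive into a directory only if it is empty.
# Arguments: seed archive, target directory.
seed_if_empty() {
    if [ -z "$(ls -A "$2")" ]; then
        tar -C "$2" -xf "$1"
        echo "seeded"
    else
        echo "skipped"
    fi
}

# Build a tiny seed archive and an empty stand-in for the mounted host folder.
workdir=$(mktemp -d)
mkdir "$workdir/seed" "$workdir/volume"
touch "$workdir/seed/preseeded-data"
tar -C "$workdir/seed" -cf "$workdir/data-volume.tar" preseeded-data

first_run=$(seed_if_empty "$workdir/data-volume.tar" "$workdir/volume")
second_run=$(seed_if_empty "$workdir/data-volume.tar" "$workdir/volume")
echo "$first_run, then $second_run"
```

The second call leaves the folder alone because it already contains preseeded-data, which mirrors the "subsequent runs" output above.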
I have found the question How to determine if data is valid tar file without a file?, but I was wondering: is there a ready made command line solution?
What about just getting a listing of the tarball and throwing away the output, rather than decompressing the file?
tar -tzf my_tar.tar.gz >/dev/null
Edited as per comment. Thanks zrajm!
Edit as per comment. Thanks Frozen Flame! This test in no way guarantees the integrity of the data. Because tar was designed as a tape archival utility, most implementations will allow multiple copies of the same file in one archive!
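A quick way to see the "multiple copies" behavior is the sketch below. It assumes a tar that supports appending with -r (GNU tar and bsdtar do for uncompressed archives); the file names are invented.

```shell
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Archive a file, change it, then append it again under the same name.
echo one > note.txt
tar -cf dup.tar note.txt
echo two > note.txt
tar -rf dup.tar note.txt

# Listing still succeeds; uniq -d reveals member names that occur more than once.
tar -tf dup.tar > /dev/null
listing_status=$?
duplicates=$(tar -tf dup.tar | sort | uniq -d)
echo "listing exit status: $listing_status"
echo "duplicated members: $duplicates"
```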
You could probably use the gzip -t option to test the file's integrity:
http://linux.about.com/od/commands/l/blcmdl1_gzip.htm
from: http://unix.ittoolbox.com/groups/technical-functional/shellscript-l/how-to-test-file-integrity-of-targz-1138880
To test the gzip file is not corrupt:
gunzip -t file.tar.gz
To test the tar file inside is not corrupt:
gunzip -c file.tar.gz | tar -t > /dev/null
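As part of the backup you could probably just run the latter command and
check the value of $? afterwards for a 0 (success) value. If the tar has an
issue, $? will have a non-zero value. Note that $? only reflects the last
command of a pipeline (tar here), so in bash run set -o pipefail first if a
gunzip failure should be caught as well.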
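One way to wrap both checks is sketched below. The function name check_targz and the sample files are hypothetical. Running gzip -t first means the pipeline's exit status (which comes from tar) is enough, so the sketch does not depend on pipefail.

```shell
# Hypothetical helper: succeeds only if both the gzip layer and the tar layer are readable.
check_targz() {
    gzip -t "$1" 2>/dev/null || return 1
    gzip -dc "$1" | tar -tf - > /dev/null 2>&1
}

workdir=$(mktemp -d)
cd "$workdir" || exit 1
echo hello > hello.txt
tar -cf - hello.txt | gzip > good.tar.gz
echo "this will not pass the test" > bad.tar.gz

if check_targz good.tar.gz; then good_result=ok; else good_result=broken; fi
if check_targz bad.tar.gz; then bad_result=ok; else bad_result=broken; fi
echo "good.tar.gz: $good_result"
echo "bad.tar.gz: $bad_result"
```

As with the commands above, a passing check only says the archive is readable, not that its contents are what was originally backed up.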
If you want to do a real test extract of a tar file without extracting to disk, use the -O option. This spews the extract to standard output instead of the filesystem. If the tar file is corrupt, the process will abort with an error.
Example of failed tar ball test...
$ echo "this will not pass the test" > hello.tgz
$ tar -xvzf hello.tgz -O > /dev/null
gzip: stdin: not in gzip format
tar: Child returned status 1
tar: Error exit delayed from previous errors
$ rm hello.*
Working Example...
$ ls hello*
ls: hello*: No such file or directory
$ echo "hello1" > hello1.txt
$ echo "hello2" > hello2.txt
$ tar -cvzf hello.tgz hello[12].txt
hello1.txt
hello2.txt
$ rm hello[12].txt
$ ls hello*
hello.tgz
$ tar -xvzf hello.tgz -O
hello1.txt
hello1
hello2.txt
hello2
$ ls hello*
hello.tgz
$ tar -xvzf hello.tgz
hello1.txt
hello2.txt
$ ls hello*
hello1.txt hello2.txt hello.tgz
$ rm hello*
You can also check the contents of a *.tar.gz file using pigz (parallel gzip) to speed up the archive check:
pigz -cvdp number_of_threads /[...]path[...]/archive_name.tar.gz | tar -tv > /dev/null
A nice option is to use tar -tvvf <filePath> which adds a line that reports the kind of file.
Example in a valid .tar file:
> tar -tvvf filename.tar
drwxr-xr-x 0 diegoreymendez staff 0 Jul 31 12:46 ./testfolder2/
-rw-r--r-- 0 diegoreymendez staff 82 Jul 31 12:46 ./testfolder2/._.DS_Store
-rw-r--r-- 0 diegoreymendez staff 6148 Jul 31 12:46 ./testfolder2/.DS_Store
drwxr-xr-x 0 diegoreymendez staff 0 Jul 31 12:42 ./testfolder2/testfolder/
-rw-r--r-- 0 diegoreymendez staff 82 Jul 31 12:42 ./testfolder2/testfolder/._.DS_Store
-rw-r--r-- 0 diegoreymendez staff 6148 Jul 31 12:42 ./testfolder2/testfolder/.DS_Store
-rw-r--r-- 0 diegoreymendez staff 325377 Jul 5 09:50 ./testfolder2/testfolder/Scala.pages
Archive Format: POSIX ustar format, Compression: none
Corrupted .tar file:
> tar -tvvf corrupted.tar
tar: Unrecognized archive format
Archive Format: (null), Compression: none
tar: Error exit delayed from previous errors.
I have tried the following commands and they work well.
bzip2 -t file.bz2
gunzip -t file.gz
However, these two commands turn out to be time-consuming. Maybe we need a quicker way to determine whether the compressed files are intact.
These are all very sub-optimal solutions. From the GZIP spec:
ID1 (IDentification 1)
ID2 (IDentification 2)
These have the fixed values ID1 = 31 (0x1f, \037), ID2 = 139
(0x8b, \213), to identify the file as being in gzip format.
This check has to be coded in whatever language you're using.
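In shell, a magic-byte check along those lines might look like the following sketch. The helper name is_gzip and the sample files are made up. Keep in mind that it only identifies the format from the first two bytes; it says nothing about whether the rest of the file is intact.

```shell
# Hypothetical helper: true if the file starts with the gzip magic bytes 0x1f 0x8b.
is_gzip() {
    magic=$(od -An -tx1 -N2 "$1" | tr -d '[:space:]')
    [ "$magic" = "1f8b" ]
}

workdir=$(mktemp -d)
cd "$workdir" || exit 1
echo hello | gzip > real.gz
echo hello > fake.gz

if is_gzip real.gz; then real_result=gzip; else real_result=other; fi
if is_gzip fake.gz; then fake_result=gzip; else fake_result=other; fi
echo "real.gz: $real_result"
echo "fake.gz: $fake_result"
```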
> use the -O option. [...] If the tar file is corrupt, the process will abort with an error.
Sometimes yes, but sometimes not. Let's see an example of a corrupted file:
echo Pete > my_name
tar -cf my_data.tar my_name
# Simulate a corruption
sed < my_data.tar 's/Pete/Fool/' > my_data_now.tar
# "my_data_now.tar" is the corrupted file
tar -xvf my_data_now.tar -O
It shows:
my_name
Fool
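Even if you then execute
echo $?
tar reports no error:
0
but the file was corrupted: it now contains "Fool" instead of "Pete". tar only stores a checksum of each header block, not of the file contents, so a change like this goes unnoticed.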
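If this kind of silent change matters, one option (a suggestion, not something tar offers itself) is to record a checksum when the archive is created and compare it later. The sketch below assumes sha256sum from GNU coreutils and reuses the Pete/Fool example; the .sha256 file name is arbitrary.

```shell
workdir=$(mktemp -d)
cd "$workdir" || exit 1

echo Pete > my_name
tar -cf my_data.tar my_name

# Record a checksum while the archive is known to be good.
sha256sum my_data.tar > my_data.tar.sha256

# Simulate the same silent corruption as in the example above.
LC_ALL=C sed 's/Pete/Fool/' < my_data.tar > my_data_now.tar
mv my_data_now.tar my_data.tar

# tar still extracts without complaint, but the checksum no longer matches.
tar -xOf my_data.tar > /dev/null
tar_status=$?
if sha256sum -c my_data.tar.sha256 > /dev/null 2>&1; then
    checksum_result=match
else
    checksum_result=mismatch
fi
echo "tar exit status: $tar_status"
echo "checksum: $checksum_result"
```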