How to merge gz files into a tar.gz without decompression?

I have a program that only consumes uncompressed files. I have a couple of .gz files, and my goal is to feed the concatenation of them to that program. If I had a tar.gz file, I could mount the archive with the archivemount command.
I know I can concatenate the gz files:
cat a.gz b.gz > c.gz
But there is no way that I am aware of to mount a gz file. I don't have enough disk space to uncompress all of the files, and the tar command does not accept stdin as its input, so I cannot do this:
zcat *.gz | tar - | gzip > file.tar.gz

It is not clear what operations you need to perform on the tar.gz archive, but from what I can discern, tar.gz is not the right format for this application. The entire archive stream is compressed by gzip, so you can't pull out or change a file without re-compressing everything after it. The tar.gz stream can be specially prepared to keep the compression of each file independent, but then you might as well use the .zip format, which is better suited to random access and manipulation of individual files in the archive.
To address one of your comments, tar can in fact accept stdin as input. See pipe tar extract into tar create for some examples, where both GNU tar and BSD tar (with different syntax) can take in a tar file from stdin, delete entries, and write a new tar file to stdout.
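As a rough sketch of that idea (the archive and path names here are invented, and the exact flags are worth checking against your tar's manual), both flavors can act as a filter on a tar stream:

# GNU tar: read a tar stream from stdin, delete an entry, write the rest to stdout
tar -f - --delete unwanted/file.txt < old.tar > new.tar

# bsdtar: copy the entries of the tar on stdin into a new tar on stdout,
# skipping the unwanted paths ("@-" means "take entries from the archive on stdin")
bsdtar -cf - --exclude 'unwanted/*' @- < old.tar > new.tar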

Related

tar gzip of directory leaves old directory there and tar.gz is much larger

New to zipping files here. I used the following command to gzip a bunch of large files within a single directory:
tar -cvzf archive-RAW-MAFs.tar.gz RAW_MAFS/
When this was done, I noticed that it left the old directory tree where it was, and that the tar.gz was much larger. I'm not sure what the original size of the directory was as I didn't check it beforehand, but I think it was much larger than stated here...
-rw-r----- 1 xxx xxxx 21218045403 May 8 21:39 archive-RAW-MAFs.tar.gz
drwxr-s--- 34 xxx xxxx 4096 May 8 20:21 RAW_MAFS
I can also still traverse the original RAW_MAFS directory and open files. Ideally, I would like to have only the zipped file, because I don't need to touch this data again for a while and want to save as much space as I can.
I'll take the second question first.
The original files are still there because you haven't told tar to delete them. Add the --remove-files option to the command line to get tar to do what you want:
tar -cvzf archive-RAW-MAFs.tar.gz RAW_MAFS/ --remove-files
Regarding the size of the RAW_MAFS directory tree: if it hasn't been deleted yet, can you not check its size?
If the original files in RAW_MAFS are already compressed, then compressing them again when you put them in your tar file will increase the size. Can you provide more details on what you are storing in the tar file?
If you are storing compressed files in the tar, try running without the z option.
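For example (same paths as above), keeping the archive uncompressed while still deleting the originals would be:

tar -cvf archive-RAW-MAFs.tar RAW_MAFS/ --remove-files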

make: Convert .pdf files in a folder to .txt files without using loops

I want to convert all .pdf files in a folder into .txt files with make, without using loops and with the help of pdftotext. The new .txt files shall keep the original file name. Additionally, each new file gets a new file extension.
Example:
test1.pdf --> test1.newextension
Everything is written in a Makefile. I start the conversion by typing "make converted" in my console.
My first (miserable) attempt was:
converted:
	@ls *.pdf | xargs -n1 pdftotext
However, there are three things still missing from it:
It doesn't repeat the process
The new file extension isn't being added to the newly created files.
Is the original name being kept, or is it being given to the pdftotext function?
I used to program in bash, and Makefiles are completely new to me. I'd be thankful for answers!
You can refer to this simple example:
SOURCES ?= $(wildcard *.pdf)

%.txt: %.pdf
	pdftotext $< $@

all: $(SOURCES:%.pdf=%.txt)

clean:
	rm -f *.txt
If no SOURCES was defined, it will just take all *.pdf files from the current directory.
Then we define a pattern rule teaching make how to make *.txt out of *.pdf.
We also define a target all that tries to make a .txt file for each .pdf file in the SOURCES variable.
And also a clean rule that quietly deletes all .txt files in the current directory (so be careful, it is potentially dangerous).
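Assuming the Makefile above is in the current directory, typical invocations would look like this (the explicit file name in the second line is made up):

make all                        # convert every *.pdf found by the wildcard
make SOURCES="paper.pdf" all    # convert only an explicit list of files
make clean                      # delete all generated .txt files (careful!)

If you want the "make converted" entry point and the .newextension suffix from the question instead, the same pattern applies: rename %.txt to %.newextension in the pattern rule, in the substitution used by all, and in clean, and add a "converted: all" alias target.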

How to untar and gzip the extracted files in one operation?

I have a huge (500GB) gzipped tar file, and I want to extract all the files in it. The tar file is gzipped, but the files in it are not. The problem is that if I extract them all like this
tar xzf huge.tgz
then I run out of space.
Is there a way to simultaneously extract and gzip the files? I could write a script to do
tar tzf huge.tgz
and then extract each file and gzip it, one after the other. But I was hoping there might be a more efficient solution.
You would have to write a program that uses, for example, libarchive and zlib to extract entries and run them through gzip compression.
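If your tar is GNU tar, a shell-level alternative to writing such a program is the --to-command option, which pipes each extracted regular file to a command instead of writing it to disk, with the member name available in the TAR_FILENAME environment variable. A rough sketch (the helper script name is invented, and handling of non-regular entries such as directories and symlinks should be checked against your tar's documentation):

cat > gzip-entry.sh <<'EOF'
#!/bin/sh
# Receives one file's contents on stdin and writes <original name>.gz instead.
mkdir -p "$(dirname "$TAR_FILENAME")"
exec gzip > "$TAR_FILENAME.gz"
EOF
chmod +x gzip-entry.sh

tar -xzf huge.tgz --to-command=./gzip-entry.sh

Each file's data then only ever exists uncompressed inside the pipe, so disk usage stays close to the size of the recompressed output.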

gsutil finding the uncompressed file size

Is there a way to get the uncompressed file size using gsutil? I've looked into du and ls -l; both return the compressed size. I would like to avoid having to download the files to see their size.
gsutil provides only some basic commands like copying and listing the files in a directory. I would suggest writing a Python script that works out the original size of the zipped files for you.
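If the objects are ordinary single-member gzip files, one shortcut that avoids both a download and a full script is to read just the last four bytes of each object: gzip stores the uncompressed size modulo 2^32 there (the ISIZE field), little-endian. A sketch with an invented bucket and object name, relying on gsutil cat's ranged-read flag:

# print the final 4 bytes as an unsigned 32-bit integer
# (assumes a little-endian machine; the value wraps for files of 4 GiB or more)
gsutil cat -r -4 gs://my-bucket/myfile.gz | od -An -t u4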

bunzip / bzip2 an entire directory instead of individual files in the directory

With gunzip it's simply zip -r archive.zip my_dir/.
I am failing to find an equivalent command for bunzip. Some that I found were zipping individual files inside a directory, but I want one .bzip2 archive.
gunzip is not zip. zip is an archiver which handles files and directories. gzip/gunzip only compresses a single file or stream of data.
bzip2 is just like gzip, and only compresses a single file or stream of data. For both gzip and bzip2, it is traditional to use tar as the archiving program and compress its output. In fact, that is such a common idiom that tar has options to invoke gzip or bzip2 for you. See man tar.
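For example, with the directory name from the question, the -j flag tells tar to run bzip2:

tar -cjf archive.tar.bz2 my_dir/    # create a bzip2-compressed tar archive
tar -tjf archive.tar.bz2            # list its contents
tar -xjf archive.tar.bz2            # extract it again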