How should you write the prep stage when the source file is a .gz?

The setup macro cannot handle a .gz file. I think I should extract the source .gz file without deleting it (decompressing with gzip normally deletes the original file), and then manually cd into the uncompressed directory.
Is this a good solution?

You cannot extract a directory from just a .gz file, because a .gz file holds a single compressed file. To get a directory you would need a .tar.gz file. Extracting a .tar.gz file does not normally delete the .tar.gz file.
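If the source really is a plain .gz (one compressed file, no directory), it can be decompressed while keeping the original. The following is a minimal sketch in Python rather than a spec-file recipe; the function name gunzip_keep is hypothetical, and the default output name simply drops the trailing .gz.

```python
import gzip
import shutil
from pathlib import Path
from typing import Optional


def gunzip_keep(src: str, dest: Optional[str] = None) -> Path:
    """Decompress a .gz file to dest and leave the original in place."""
    src_path = Path(src)
    # Default output name: the input name without its last suffix (.gz).
    dest_path = Path(dest) if dest else src_path.with_suffix("")
    if dest_path == src_path:
        raise ValueError("output path would overwrite the input file")
    # Stream in chunks so large files are never held fully in memory.
    with gzip.open(src_path, "rb") as fin, open(dest_path, "wb") as fout:
        shutil.copyfileobj(fin, fout)
    return dest_path
```

On the command line, gzip -dc file.gz > file has the same effect, since writing to stdout leaves the input untouched.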

Related

How to upload large files to PyPI?

I have a data file, about 2.5GB in .h5 format, and I want to upload it to PyPI. As the limit is only 60MB, I compressed the file in .rar format. When I run pip show -f packagename, it shows data.rar under the uploaded files. But for the .py file to work, the data has to be in an uncompressed state. How do I deal with this issue? Thank you.

Open a gz file using Minizip Library

I'm trying to open a gz file with Minizip library (built on zlib).
Here is the code:
......
......
unzFile uf = unzOpen("MyFile.gz");
......
But Visual Studio 2013 crashes with this message:
Debug Assertion Failed!
file open.c
line 98
Expression: ("Invalid file open mode",0)
What could it mean?
A .gz file is a single file that's been compressed.
A .zip file is a compressed archive; i.e. a hierarchical structure of compressed files.
tl;dr: minizip doesn't support .gz files, because a .gz file is not a .zip file.
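The difference between the two formats is visible in the first bytes of the file, so it can be checked before picking a library. This is a rough sketch in Python (not Minizip code); sniff_container is a hypothetical helper name. It relies on the gzip magic bytes 0x1f 0x8b and on the standard library's zipfile.is_zipfile.

```python
import zipfile

# Every gzip stream starts with these two bytes.
GZIP_MAGIC = b"\x1f\x8b"


def sniff_container(path: str) -> str:
    """Return 'gzip', 'zip', or 'unknown' based on the file's contents."""
    with open(path, "rb") as f:
        head = f.read(2)
    if head == GZIP_MAGIC:
        return "gzip"
    # is_zipfile looks for the zip end-of-central-directory record.
    if zipfile.is_zipfile(path):
        return "zip"
    return "unknown"
```

A file reported as 'gzip' needs a gzip reader (zlib's gz* functions in C, or the gzip module in Python), not a zip archive reader.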

bunzip / bzip2 an entire directory instead of individual files in the directory

With gunzip it's simply zip -r archive.zip my_dir/.
I am failing to find an equivalent command for bunzip. Some I found zip the individual files inside a directory, but I want one .bzip2 archive.
gunzip is not zip. zip is an archiver which handles files and directories. gzip/gunzip only compresses a single file or stream of data.
bzip2 is just like gzip, and only compresses a single file or stream of data. For both gzip and bzip2, it is traditional to use tar as the archiving program, and compressing the output. In fact that is such a common idiom that tar has options to invoke gzip or bzip2 for you. Do a man tar.
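The tar-then-compress idiom described above can also be done in one step from Python's standard tarfile module. This is a minimal sketch; archive_dir_bz2 is a hypothetical name, and the archive stores entries under the directory's own base name.

```python
import tarfile
from pathlib import Path


def archive_dir_bz2(directory: str, archive: str) -> None:
    """Pack a whole directory tree into one bzip2-compressed tar archive."""
    src = Path(directory)
    # Mode "w:bz2" makes tarfile run the tar stream through bzip2.
    with tarfile.open(archive, "w:bz2") as tar:
        # add() recurses into subdirectories by default.
        tar.add(str(src), arcname=src.name)
```

For example, archive_dir_bz2("my_dir", "my_dir.tar.bz2") produces a single archive, much like asking tar itself to invoke bzip2.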

How to merge gz files into a tar.gz without decompression?

I have a program that only consumes uncompressed files. I have a couple of .gz files and my goal is to feed the concatenation of them to that program. If I had a tar.gz file I could mount the tar.gz archive with the archivemount command.
I know I can concatenate the gz files:
cat a.gz b.gz > c.gz
But there is no way that I am aware of to mount a gz file. I don't have enough disk space to uncompress all of the files, and the tar command does not accept stdin as input, so I cannot do this:
zcat *.gz | tar - | gzip > file.tar.gz
It is not clear what operations you need to perform on the tar.gz archive. But from what I can discern, tar.gz is not the format for this application. The entire archive stream is compressed by gzip, so you can't pull out or change a file without having to re-compress everything after it. The tar.gz stream can be specially prepared to keep the compression of each file independent, but then you might as well use the .zip format, which is better suited for random access and manipulation of individual files in the archive.
To address one of your comments, tar can in fact accept stdin as input. See pipe tar extract into tar create for some examples, where both GNU tar and BSD tar (with different syntax) can take in a tar file from stdin, delete entries, and write a new tar file to stdout.
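If the consuming program can read from a pipe, the decompressed concatenation can be streamed to it without ever landing on disk. The sketch below makes that assumption, which the question does not confirm; stream_gz_concat is a hypothetical name. It also works on a file made with cat a.gz b.gz > c.gz, because Python's gzip module reads all members of a multi-member gzip file in sequence.

```python
import gzip
import shutil
from typing import BinaryIO, Iterable


def stream_gz_concat(paths: Iterable[str], out: BinaryIO) -> None:
    """Write the decompressed contents of each .gz file to out, in order."""
    for path in paths:
        with gzip.open(path, "rb") as fin:
            # Copy in chunks: neither disk space nor much memory is needed.
            shutil.copyfileobj(fin, out)
```

Called with sys.stdout.buffer as out, the result can be piped straight into the consumer.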

How to uncompress and import a .tar.gz file in kettle?

I am trying to figure out how to create a job/transformation to uncompress and load a .tar.gz file. Does anyone have any advice for getting this to work?
Do you want to read a text file that is compressed?
Just specify the file in the text file input step in the transformation, and specify the compression (GZip). Kettle can read directly from compressed files.
If you do need the file uncompressed then use a job step. I'm not sure whether there is a native uncompress step, but if not, just use a shell script step.
I found no component in Kettle that uncompresses a tar.gz file.
But if the CSV text file is compressed in gzip format, we can use the gzip input component.
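If the shell script route is taken, the script could hand the archive to something like the following Python sketch, which reads CSV rows from every regular file inside a .tar.gz without unpacking to disk. This is outside Kettle itself; rows_from_tar_gz is a hypothetical name, and the sketch assumes the members are UTF-8 CSV files.

```python
import csv
import io
import tarfile


def rows_from_tar_gz(path):
    """Yield (member_name, row) for each CSV row in each file of a .tar.gz."""
    with tarfile.open(path, "r:gz") as tar:
        for member in tar:
            if not member.isfile():
                continue  # skip directories, links, and other non-files
            raw = tar.extractfile(member)
            # Assumption: members are UTF-8 encoded CSV text.
            text = io.TextIOWrapper(raw, encoding="utf-8", newline="")
            for row in csv.reader(text):
                yield member.name, row
```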