gsutil: finding the uncompressed file size

Is there a way to get the uncompressed file size using gsutil? I've looked into du and ls -l; both return the compressed size. I would like to avoid downloading the files just to see their size.

gsutil provides only basic commands such as copying files and listing directory contents. I would suggest writing a Python script that reports the original size of the compressed files.
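A minimal sketch of such a script, assuming the objects are gzip (.gz) files. The gzip format stores the uncompressed length modulo 2^32 in the last four bytes of the file (the ISIZE field). Reading only those bytes is therefore enough for single-member files smaller than 4 GiB. How those four bytes are fetched from the bucket (for example with a ranged read) is left open here, and both function names are hypothetical. The second function is a fallback that streams the whole object and counts bytes without writing anything to disk.

```python
import struct
import zlib


def size_from_gzip_trailer(last_four_bytes):
    """Return ISIZE: the uncompressed length modulo 2**32 of the last gzip member."""
    if len(last_four_bytes) != 4:
        raise ValueError("expected exactly the last 4 bytes of the gzip file")
    (isize,) = struct.unpack("<I", last_four_bytes)  # little-endian unsigned 32-bit
    return isize


def size_by_streaming(fileobj, chunk_size=64 * 1024):
    """Count uncompressed bytes by decompressing a gzip stream chunk by chunk.

    Handles concatenated (multi-member) gzip data; nothing is kept or written.
    """
    total = 0
    decomp = zlib.decompressobj(wbits=31)  # wbits=31 selects the gzip container
    while True:
        chunk = fileobj.read(chunk_size)
        if not chunk:
            break
        while chunk:
            total += len(decomp.decompress(chunk))
            if decomp.eof:
                # One member ended; any leftover bytes start the next member.
                chunk = decomp.unused_data
                decomp = zlib.decompressobj(wbits=31)
            else:
                chunk = b""
    return total
```

The trailer value is only a shortcut: for files of 4 GiB or more, or for files made by concatenating several gzip members, it does not give the full size, and the streaming count is the safer choice.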

Related

how to upload large files to pypi?

I have a data file of about 2.5 GB in .h5 format that I want to upload to PyPI. Because the limit is only 60 MB, I compressed the file into .rar format. When I run pip show -f packagename, it lists data.rar among the uploaded files. But for the .py file to work, the data must be in an uncompressed state. How do I deal with this issue? Thank you.

tar gzip of directory leaves old directory there and tar.gz is much larger

New to zipping files here. I used the following command to gzip a bunch of large files within a single directory:
tar -cvzf archive-RAW-MAFs.tar.gz RAW_MAFS/
When this was done, I noticed that it left the old directory tree where it was, and that the tar.gz was much larger. I'm not sure what the original size of the directory was as I didn't check it beforehand, but I think it was much larger than stated here...
-rw-r----- 1 xxx xxxx 21218045403 May 8 21:39 archive-RAW-MAFs.tar.gz
drwxr-s--- 34 xxx xxxx 4096 May 8 20:21 RAW_MAFS
I can also traverse the original RAW_MAFS directory and open files. Ideally, I would like to keep only the zipped file, because I don't need to touch this data again for a while and want to save as much space as I can.
I'll take the second question first.
The original files are still there because you haven't told tar to delete them. Add the --remove-files option to the command line to get tar to do what you want:
tar -cvzf archive-RAW-MAFs.tar.gz RAW_MAFS/ --remove-files
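Regarding the size of the RAW_MAFS directory tree: if it hasn't been deleted yet, can you check the size of its contents? The 4096 that ls -l reports for RAW_MAFS is the size of the directory entry itself, not of the files inside it.
If the original files in RAW_MAFS are already compressed, then compressing them again when you put them in your tar file will not shrink them further and may slightly increase the size. Can you provide more details on what you are storing in the tar file?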
If you are storing compressed files in the tar, try running without the z option.
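The size check suggested above can be sketched in Python as follows. The function name tree_size is hypothetical, and the result is the sum of the file lengths, which can differ from the disk space actually allocated.

```python
import os


def tree_size(root):
    """Sum the sizes in bytes of all regular files below root."""
    total = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            if not os.path.islink(path):  # skip symbolic links
                total += os.path.getsize(path)
    return total
```

Comparing tree_size("RAW_MAFS") with os.path.getsize("archive-RAW-MAFs.tar.gz") shows whether the archive is really larger than the data it contains.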

How to untar and gzip the extracted files in one operation?

I have a huge (500GB) gzipped tar file, and I want to extract all the files in it. The tar file is gzipped, but the files in it are not. The problem is that if I extract them all like this
tar xzf huge.tgz
then I run out of space.
Is there a way to simultaneously extract and gzip the files? I could write a script to do
tar tzf huge.tgz
and then extract each file and gzip it, one after the other. But I was hoping there might be a more efficient solution.
You would have to write a program that uses, for example, libarchive and zlib to extract entries and run them through gzip compression.
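As a rough sketch of such a program, the version below uses Python's standard tarfile and gzip modules in place of libarchive and zlib. It reads the archive as a forward-only stream and writes each regular file straight into its own .gz file, so the uncompressed data never sits on disk. The function name and the .gz naming scheme are assumptions made for illustration.

```python
import gzip
import os
import shutil
import tarfile


def extract_and_gzip(archive_path, dest_dir):
    """Write every regular file in a gzipped tar archive as <name>.gz under dest_dir."""
    dest_root = os.path.abspath(dest_dir)
    written = []
    # "r|gz" reads the archive sequentially, without seeking backwards.
    with tarfile.open(archive_path, mode="r|gz") as tar:
        for member in tar:
            if not member.isfile():
                continue  # directories, links, and devices are skipped
            target = os.path.abspath(os.path.join(dest_root, member.name)) + ".gz"
            # Refuse entries that would land outside dest_dir (e.g. "../x").
            if os.path.commonpath([dest_root, target]) != dest_root:
                raise ValueError("unsafe path in archive: " + member.name)
            os.makedirs(os.path.dirname(target), exist_ok=True)
            with tar.extractfile(member) as src, gzip.open(target, "wb") as dst:
                shutil.copyfileobj(src, dst)
            written.append(target)
    return written
```

For the example in the question this would be called as extract_and_gzip("huge.tgz", "extracted"), where "extracted" is a placeholder directory name.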

How to Merge PDFs from S3 using ghostscript

Ghostscript works fine for merging multiple PDF files into one when the files are on our server. Now I want to merge PDF files that are stored in Amazon S3.
Is that possible? Something like:
gs -q -dNOPAUSE -dBATCH -sDEVICE=pdfwrite -sOutputFile=output.pdf https://<bucket>.s3.amazonaws.com/pdf1.pdf http://<bucket>.s3.amazonaws.com/pdf2.pdf
No, you cannot do that. Ghostscript does not have an HTTP client built in, and it requires random access to the files as well, so it might be very slow even if it did work.
All files must be available via the local Operating System's file system.
Of course, it would in principle be possible to add a new file device type (similar to %rom% and %ram%) to do file access by http. Ghostscript is open source so you can add this yourself if you want.
Please note that you aren't merging PDF files: the source files are interpreted, and a brand new PDF file is created from the marking content of the input. It's not the same thing.
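Since the inputs must be on the local file system, one workaround is to download them first and then run the same command on the local copies. The sketch below assumes that gs is on the PATH and that the URLs can be read without authentication (public objects or presigned URLs). Both function names are hypothetical.

```python
import os
import subprocess
import tempfile
import urllib.request


def build_gs_command(input_paths, output_path):
    """Build the Ghostscript command line from the question for local input files."""
    return [
        "gs", "-q", "-dNOPAUSE", "-dBATCH", "-sDEVICE=pdfwrite",
        "-sOutputFile=" + output_path,
        *input_paths,
    ]


def combine_remote_pdfs(urls, output_path):
    """Download each URL to a temporary local file, then run Ghostscript on the copies."""
    with tempfile.TemporaryDirectory() as workdir:
        local_paths = []
        for index, url in enumerate(urls):
            local_path = os.path.join(workdir, "input%d.pdf" % index)
            urllib.request.urlretrieve(url, local_path)  # assumes no auth is needed
            local_paths.append(local_path)
        # check=True raises CalledProcessError if Ghostscript exits with an error.
        subprocess.run(build_gs_command(local_paths, output_path), check=True)
```

The temporary copies are deleted when the function returns, so only output.pdf (or whatever output path is passed in) remains.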

how to merge gz files into a tar.gz without decompression?

I have a program that only consumes uncompressed files. I have a couple of .gz files, and my goal is to feed their concatenation to that program. If I had a tar.gz file, I could mount the archive with the archivemount command.
I know I can concatenate the gz files:
cat a.gz b.gz > c.gz
But there is no way that I am aware of to mount a gz file. I don't have enough disk space to uncompress all of the files, and the tar command does not accept stdin as input, so I cannot do this:
zcat *.gz | tar - | gzip > file.tar.gz
It is not clear what operations you need to perform on the tar.gz archive. But from what I can discern, tar.gz is not the format for this application. The entire archive stream is compressed by gzip, so you can't pull out or change a file without having to re-compress everything after it. The tar.gz stream can be specially prepared to keep the compression of each file independent, but then you might as well use the .zip format, which is better suited for random access and manipulation of individual files in the archive.
To address one of your comments, tar can in fact accept stdin as input. See the question "pipe tar extract into tar create" for some examples, where both GNU tar and BSD tar (with different syntax) can take in a tar file from stdin, delete entries, and write a new tar file to stdout.
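If the real goal is only to hand the program the concatenated uncompressed data, a tar archive may not be needed at all. The sketch below decompresses each .gz file in order and writes the bytes to a single output stream, so nothing uncompressed is stored on disk. It assumes the consuming program can read from a pipe or standard input, which the question does not state. The function name is hypothetical.

```python
import gzip
import shutil


def stream_concatenated(gz_paths, sink):
    """Decompress each .gz file in order and write the bytes to sink.

    sink is any writable binary file object, for example the stdin pipe of a
    subprocess started with subprocess.Popen(..., stdin=subprocess.PIPE).
    """
    for path in gz_paths:
        with gzip.open(path, "rb") as src:
            shutil.copyfileobj(src, sink)  # copies in fixed-size chunks
```

This produces the same bytes as decompressing the result of cat a.gz b.gz > c.gz, because concatenated gzip files decompress to the concatenation of their contents.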