How to upload large files to PyPI?

I have a data file of about 2.5 GB in .h5 format that I want to upload to PyPI. Since the limit is only 60 MB, I compressed the file into a .rar archive. When I run pip show -f packagename, it lists data.rar among the installed files. But for the .py file to work, the data needs to be in an uncompressed state. How do I deal with this issue? Thank you.
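One way around this, sketched below in Python, is to ship the .rar and extract it once on first use instead of expecting pip to install the data uncompressed. This is only a minimal sketch: it assumes the third-party rarfile library (which needs an unrar backend installed on the system), and ensure_data is a hypothetical helper name.

import os
import rarfile  # third-party library; requires an unrar backend on the system

_PKG_DIR = os.path.dirname(os.path.abspath(__file__))
_ARCHIVE = os.path.join(_PKG_DIR, "data.rar")
_TARGET = os.path.join(_PKG_DIR, "data.h5")

def ensure_data():
    # Extract data.rar next to this module the first time it is needed.
    # Note: site-packages may not be writable; a per-user cache directory
    # is often a safer extraction target.
    if not os.path.exists(_TARGET):
        rarfile.RarFile(_ARCHIVE).extractall(_PKG_DIR)
    return _TARGET

The package's .py files would then call ensure_data() and open the returned path.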

Related

How can you read a select few files from a tar file in S3 without having to download the whole tar?

You do not want to download the whole tar file, just a select few files inside it.
Does S3 provide any API to do this, or is Apache Commons Compress my best bet?
While you'll be able to stream the file from S3, you'll still effectively download it. The Apache Commons Compress library will help hide some of this and is a good solution. The other option would be to store the files individually in S3, rather than inside a tar, so that they can be accessed randomly.
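In Python the same streaming idea looks roughly like the sketch below, assuming boto3 credentials are configured; the bucket, key, and member names are placeholders. tarfile's "r|" mode treats the S3 body as a forward-only stream, so the transfer can stop once the last wanted member is out, though everything before it is still downloaded.

import tarfile
import boto3

s3 = boto3.client("s3")
body = s3.get_object(Bucket="my-bucket", Key="archive.tar")["Body"]

wanted = {"dir/file1.txt", "dir/file2.txt"}
with tarfile.open(fileobj=body, mode="r|") as tar:  # forward-only stream
    for member in tar:
        if member.name in wanted:
            tar.extract(member, path="out")  # extract while member is current
            wanted.discard(member.name)
            if not wanted:
                break  # stop reading (and downloading) early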

gsutil finding the uncompressed file size

Is there a way to get the uncompressed file size using gsutil? I've looked into du and ls -l; both return the compressed size. I would like to avoid having to download the files just to see their size.
gsutil provides only basic commands such as copying and listing files. I would suggest writing a Python script that reports the original size of the compressed files.
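For gzip objects there is a cheap trick worth knowing: the last four bytes of a gzip stream hold the uncompressed length modulo 2**32 (the ISIZE field), so a ranged read of four bytes is enough. A minimal sketch with the google-cloud-storage client, assuming the objects are single-member gzip files; the bucket and object names are placeholders:

import struct
from google.cloud import storage

client = storage.Client()
blob = client.bucket("my-bucket").blob("data.csv.gz")
blob.reload()  # fetch metadata, including the compressed size

trailer = blob.download_as_bytes(start=blob.size - 4)  # last 4 bytes only
(uncompressed_size,) = struct.unpack("<I", trailer)    # little-endian ISIZE
print(uncompressed_size)

Note the modulo: for objects of 4 GiB and larger the reported value wraps around.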

how to merge gz files into a tar.gz without decompression?

I have a program that only consumes uncompressed files. I have a couple of .gz files, and my goal is to feed the concatenation of them to that program. If I had a tar.gz file, I could mount the archive with the archivemount command.
I know I can concatenate the gz files:
cat a.gz b.gz > c.gz
But there is no way, that I am aware of, to mount a .gz file. I don't have enough disk space to uncompress all of the files, and the tar command does not accept stdin as the input, so I cannot do this:
zcat *.gz | tar - | gzip > file.tar.gz
It is not clear what operations you need to perform on the tar.gz archive. But from what I can discern, tar.gz is not the format for this application. The entire archive stream is compressed by gzip, so you can't pull out or change a file without having to re-compress everything after it. The tar.gz stream can be specially prepared to keep the compression of each file independent, but then you might as well use the .zip format, which is better suited for random access and manipulation of individual files in the archive.
To address one of your comments, tar can in fact accept stdin as input. See pipe tar extract into tar create for some examples, where both GNU tar and BSD tar (with different syntax) can take in a tar file from stdin, delete entries, and write a new tar file to stdout.
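If the real constraint is disk space rather than CPU, the repacking can also be done as a pure stream in Python: read each member's uncompressed size from its gzip trailer, then pipe the decompressed bytes straight into a tar written in "w|gz" mode, so no uncompressed copy ever touches the disk. A minimal sketch, assuming each input is a single-member .gz smaller than 4 GiB (the trailer stores the size modulo 2**32):

import glob
import gzip
import os
import struct
import tarfile

def uncompressed_size(path):
    # ISIZE: the last four bytes of a gzip file, little-endian
    with open(path, "rb") as f:
        f.seek(-4, os.SEEK_END)
        return struct.unpack("<I", f.read(4))[0]

paths = sorted(p for p in glob.glob("*.gz") if not p.endswith(".tar.gz"))
with tarfile.open("file.tar.gz", "w|gz") as tar:  # forward-only gzip stream
    for path in paths:
        info = tarfile.TarInfo(name=path[:-3])  # member name without .gz
        info.size = uncompressed_size(path)
        with gzip.open(path, "rb") as member:
            tar.addfile(info, member)  # decompress/recompress on the fly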

How should you write the prep stage when the source file is a .gz?

The setup macro cannot handle a .gz file. I think I should extract the source .gz file without deleting it (decompressing with gzip normally deletes the original file), and then manually cd into the uncompressed directory.
I am wondering if this is a good solution.
You cannot extract a directory from just a .gz file. That would be a .tar.gz file. Extracting a .tar.gz file does not normally delete the archive itself.
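A quick illustration of that last point in Python; the filename is a placeholder and the archive is assumed to be trusted:

import tarfile

with tarfile.open("source-1.0.tar.gz", "r:gz") as tar:
    tar.extractall(path=".")  # the .tar.gz itself is left in place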

How to uncompress and import a .tar.gz file in kettle?

I am trying to figure out how to create a job/transformation to uncompress and load a .tar.gz file. Does anyone have any advice for getting this to work?
Do you want to read a text file that is compressed?
Just specify the file in the Text file input step in the transformation, and specify the compression (GZip). Kettle can read directly from compressed files.
If you do need the file uncompressed, use a job step; I'm not sure whether there is a native uncompress step, but if not, just use a shell-script step.
I found no component in Kettle that uncompresses a tar.gz file.
But if the CSV text file is compressed in gzip format, we can use the gzip input component.
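For comparison, this is essentially what such a gzip-aware input does: it decompresses while reading, with no separate uncompress step. A minimal Python sketch with a placeholder filename:

import csv
import gzip

with gzip.open("data.csv.gz", "rt", newline="") as f:  # decompress on the fly
    for row in csv.reader(f):
        print(row)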