How to uncompress and import a .tar.gz file in kettle? - pentaho

I am trying to figure out how to create a job/transformation to uncompress and load a .tar.gz file. Does anyone have any advice for getting this to work?

Do you want to read a text file that is compressed?
Just specify the file in the text file input step in the transformation - and specify the compression (GZip). Kettle can read directly from compressed files.
If you do need the file uncompressed then use a job step - not sure if there is a native uncompress, but if not just use a shell script step.
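If the shell script step route is taken, the script it calls could be as small as the following Python sketch. It is not a Kettle feature. It only assumes a Python interpreter is available on the machine running the job, and the function name `extract_tar_gz` and its arguments are made up for illustration.

```python
import tarfile
from pathlib import Path


def extract_tar_gz(archive_path, dest_dir):
    """Extract the regular files of a .tar.gz archive into dest_dir.

    Returns the list of extracted file paths so a later step
    (for example a text file input) knows what to read.
    """
    dest = Path(dest_dir).resolve()
    dest.mkdir(parents=True, exist_ok=True)

    with tarfile.open(archive_path, "r:gz") as archive:
        # Only regular files; skip directories, links and devices.
        members = [m for m in archive.getmembers() if m.isfile()]

        # Refuse entries whose path would land outside dest_dir.
        for member in members:
            target = (dest / member.name).resolve()
            if dest not in target.parents:
                raise ValueError(f"unsafe path in archive: {member.name}")

        archive.extractall(dest, members=members)

    return [dest / m.name for m in members]
```

Calling `extract_tar_gz("data.tar.gz", "/tmp/unpacked")` (both paths are placeholders) leaves plain files in the target directory, which the transformation can then read without any compression setting.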

I did not find any component in Kettle that uncompresses a .tar.gz file.
But if the CSV text file is compressed in gzip format, we can use the gzip input component.

Related

how to upload large files to pypi?

I have a data file of about 2.5 GB in .h5 format that I want to upload to PyPI. Because the limit is only 60 MB, I compressed the file into .rar format. When I run pip show -f packagename, it shows data.rar under the uploaded files. But for the .py file to work, the data file must be in an uncompressed state. How do I deal with this issue? Thank you.

How to import .zip data to neo4j?

I have a zipped CSV file stored on S3, and I'm planning to import it directly from the URL. I can't find any way of doing that in the official Neo4j docs.
LOAD CSV can do this. neo4j-import has the same underlying file reader and so can read zipped files directly, although it seems to be lacking URL support currently.

gsutil finding the uncompressed file size

Is there a way to get the uncompressed file size using gsutil? I've looked into du and ls -l. Both return the compressed size. I would like to avoid having to download the files to see their size.
gsutil provides only basic commands such as copying files and listing directories. I would suggest writing a Python script that reports the original size of the zipped files.
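A minimal sketch of such a script, assuming the files are gzip streams (not .zip archives): the gzip format stores the uncompressed length, modulo 2^32, in the last four bytes of the stream (the ISIZE field), so only those bytes are needed. The function names are hypothetical. How the trailing bytes are fetched from the bucket is left open here; gsutil's cat command documents a -r range option that may serve for this, but that part is not shown or verified.

```python
import struct


def gzip_uncompressed_size(trailer: bytes) -> int:
    """Return the ISIZE field from the end of a gzip stream.

    `trailer` may be the whole file or just its last bytes;
    only the final four bytes are read (little-endian uint32).
    """
    if len(trailer) < 4:
        raise ValueError("need at least the last 4 bytes of the gzip stream")
    return struct.unpack("<I", trailer[-4:])[0]


def local_gzip_uncompressed_size(path) -> int:
    """Same as above, for a gzip file on local disk."""
    with open(path, "rb") as handle:
        handle.seek(-4, 2)  # 2 = seek relative to end of file
        return gzip_uncompressed_size(handle.read(4))
```

Two caveats follow from the format itself: the value wraps for data of 4 GiB or more, and for a file made of several concatenated gzip members it reflects only the last member.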

Moovweb - uncompress gzipped outgoing_response.http content?

I have a Moovweb project and I'm trying to compare the incoming_response.http and outgoing_response.http files in tmp/messages/... folders.
The incoming_response.http from the upstream server is saved in plain text,
but the outgoing_response.http file is gzipped content!
How can I convert it to plain text so I can look through the response?
Thanks!
Put this in your main.ts file:
export("disable_compression", "true")

BULK IMPORT a zip file in T-SQL

I've got some data files that are stored compressed on our company's server with a .Z extension (UNIX compress utility used to zip them down).
Can SQL Server's BULK IMPORT operation read these files in that format? Or must I uncompress them before getting at the data?
BULK IMPORT cannot do this natively; however, if you are on SQL Server 2005 or later you can use SSIS. The first step is an Execute Process Task that runs a zip utility to unzip the file. The second step is the SSIS Bulk Insert task, which pushes the data into SQL Server.
EDIT: use the compress from unixutils rather than cygwin to uncompress the files as it understands native windows filenames. This means that you don't have to maintain /cygdrive paths as well as native paths.