Pentaho Compress File - pentaho

Is it possible to compress a file so that it ends up even smaller than with the normal compression option? I need the best compression possible, but when Pentaho sends a file by email it simply compresses it; there is no option to compress it any further.
The goal is to send a file with the best compression.

If a stronger compressor is installed on the server or machine where you run PDI, you can write a shell script that performs the compression and execute that script from Pentaho.
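As a rough illustration of that approach, here is a minimal sketch that assumes Python 3 is available on the machine running PDI, so the function could be wrapped in a script and called from the shell script. It uses the standard-library lzma module (xz format) at its highest preset, which usually gives smaller output than default zip or gzip settings at the cost of CPU time. The function name compress_xz and any paths are hypothetical.

```python
import lzma
import shutil


def compress_xz(src_path, dst_path=None):
    """Compress src_path to an .xz file using the strongest lzma preset."""
    if dst_path is None:
        dst_path = src_path + ".xz"
    # PRESET_EXTREME trades extra CPU time for a slightly better ratio.
    preset = 9 | lzma.PRESET_EXTREME
    with open(src_path, "rb") as src, lzma.open(dst_path, "wb", preset=preset) as dst:
        shutil.copyfileobj(src, dst)
    return dst_path
```

Calling compress_xz on a file writes a sibling file with the .xz suffix and returns its path; the recipient needs an xz-capable tool to unpack it.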

Related

Consume gzip files with Databricks Auto Loader

I am currently unable to find a direct way to load .gz files via Auto Loader. I can load the files as binary content, but I cannot extract the compressed XML files and process them further in a streaming way.
Therefore, I would like to know if there is a way to consume the content of a gzip file via Databricks Auto Loader.

How to upload large files to mediawiki in an efficient way

We have to upload a lot of VirtualBox images, which are between 1 GB and 6 GB each.
So I would prefer to use FTP for the upload and then include the files in MediaWiki.
Is there a way to do this?
Currently I use a jailed FTP user who can upload to a folder, and then I use the UploadLocal extension to include the files.
But this only works for files smaller than about 1 GB. With bigger files we get a timeout, and even after setting PHP's max_execution_time to 3000 s the import stops after about 60 s with a 504 Gateway Timeout (which is also the only thing appearing in the logs).
So is there a better way of doing this?
You can import files from the shell using maintenance/importImages.php. Alternatively, upload by URL by enabling $wgAllowCopyUploads, $wgAllowAsyncCopyUploads and friends (this requires the job queue to be run by cron jobs). Or decide whether you need to upload these files into MediaWiki at all, because just linking to them might suffice.

How can I upload a gzipped JSON file to BigQuery via the HTTP API?

When I try to upload an uncompressed JSON file, it works fine; but when I try a gzipped version of the same JSON file, the job fails with a lexical error caused by a failure to parse the JSON content.
I gzipped the JSON file with the gzip command from Mac OS X 10.8 and I have set the sourceFormat to "NEWLINE_DELIMITED_JSON".
Did I do something incorrectly, or should a gzipped JSON file be processed differently?
I believe that it is not possible to submit binary data (such as the compressed file) using the multipart/related request. However, if you don't want to use uncompressed data, you may be able to use resumable upload.
What language are you coding in? The Python jobs.insert() API takes a media upload parameter, which you should be able to give a filename to in order to do a resumable upload (which sends your job metadata and new table data as separate streams). I was able to use this to upload a compressed file.
This is what bq.py uses, so you could look at the source code here.
If you aren't using Python, the Google API client libraries for other languages should have similar functionality.
You can upload gzipped files to Google Cloud Storage, and BigQuery will be able to ingest them with a load job:
https://developers.google.com/bigquery/loading-data-into-bigquery#loaddatagcs
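Whichever route is used, the file has to be a valid gzip stream of newline-delimited JSON. The following is a minimal sketch, using only the Python standard library, that writes such a file and reads it back as a sanity check before uploading. The function names and any sample rows are made up for illustration, and the upload and the load job themselves are not shown.

```python
import gzip
import json


def write_ndjson_gz(rows, path):
    """Write one JSON object per line into a gzip-compressed file."""
    with gzip.open(path, "wt", encoding="utf-8") as f:
        for row in rows:
            f.write(json.dumps(row) + "\n")


def read_ndjson_gz(path):
    """Read the file back to confirm every non-empty line parses as JSON."""
    with gzip.open(path, "rt", encoding="utf-8") as f:
        return [json.loads(line) for line in f if line.strip()]
```

If read_ndjson_gz returns the same rows that were written, the file is at least well-formed on the local side, which helps separate file problems from upload problems.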

Why does Amazon S3 return an Error 330 for simple files?

I have added the "Content-Encoding: gzip" header to my S3 files, and now when I try to access them I get "Error 330 (net::ERR_CONTENT_DECODING_FAILED)".
Note that my files are simply images, js and css.
How do I solve that issue?
You're going to have to manually gzip them and then upload them to S3. S3 doesn't have the ability to gzip on the fly like your web server does.
EDIT: Images are already compressed so don't gzip them.
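A minimal sketch of that preparation step in Python, assuming the assets are on local disk: text assets are gzipped before upload, while images are left alone. The extension list and the function name prepare_for_upload are illustrative assumptions, and the actual upload call is not shown. Whichever client is used, only the gzipped objects should carry the Content-Encoding: gzip header.

```python
import gzip
import os

# Assumed set of text-like extensions worth gzipping; adjust as needed.
COMPRESSIBLE = {".js", ".css", ".html", ".svg", ".json"}


def prepare_for_upload(path):
    """Return (body_bytes, headers) for one file.

    Text assets are gzipped and tagged with Content-Encoding: gzip;
    everything else (e.g. PNG or JPEG images) is returned unchanged.
    """
    with open(path, "rb") as f:
        data = f.read()
    ext = os.path.splitext(path)[1].lower()
    if ext in COMPRESSIBLE:
        return gzip.compress(data), {"Content-Encoding": "gzip"}
    return data, {}
```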
I don't know if you are using Grunt as your deployment tool, but if so, use this to compress your files:
https://github.com/gruntjs/grunt-contrib-compress
Then use this to upload the compressed files to Amazon S3:
https://github.com/MathieuLoutre/grunt-aws-s3
Et voilà!

Speeding up file downloads from web server

How do I optimize my server for file downloads, apart from buying a faster network plan? The files are static and stored in a folder as is. An .htaccess file ensures the files are forced to download rather than opening in the browser. Is it recommended to use a CDN?
A CDN is definitely a way to get good performance, but the price only makes sense for quite large volumes.
What kind of files are you serving? If they're compressible (e.g. HTML, word processor documents, PDFs; not JPEGs), then make sure compression is enabled: use mod_deflate.
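To get a feel for whether a given file type would benefit, one rough check is to compress a sample with zlib (the DEFLATE algorithm that mod_deflate also uses) and look at the ratio. This is only a sketch; the function name deflate_ratio is hypothetical, and what counts as worth compressing is a judgment call.

```python
import zlib


def deflate_ratio(data, level=6):
    """Return compressed size divided by original size (lower is better)."""
    if not data:
        return 1.0
    return len(zlib.compress(data, level)) / len(data)
```

A ratio close to 1.0 means the content is already compressed (typical for JPEGs and archives), so enabling compression for it would mostly cost CPU time.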