Pentaho - upload list of files to Amazon S3

I am looking for a way to upload a list of files to Amazon S3.
I have tried this: http://open-bi.blogspot.co.il/2010/03/kettel-job-plugin-send-files-to-amazon.html
But it did not work for me. I am using Kettle 5.
I would prefer a transformation step, but a job step would also be great.
Thanks

I am looking for the same thing. I think the best solution might be to use FTP? I think you can send files to S3 with FTP.
In my scenario, I also have to move and rename the files before uploading. Since we have the path to the file and the filename, we can use a File Exists step to make sure the file exists first, then run the move and rename. Then I was going to try an SFTP step to upload the entire directory of files up to Amazon.
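If a scripted alternative to a native Kettle step is acceptable, here is a minimal boto3 sketch of uploading a list of files; the bucket name, key prefix, and file list are assumptions for illustration:

import os
import boto3

s3 = boto3.client("s3")

# Hypothetical inputs: adapt the bucket, prefix, and file list to your setup.
bucket = "my-bucket"
files = ["/data/out/part1.csv", "/data/out/part2.csv"]

for path in files:
    if not os.path.exists(path):       # same idea as the File Exists step
        continue
    key = "uploads/" + os.path.basename(path)
    s3.upload_file(path, bucket, key)  # one PUT per file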

Related

How to create an interaction between Google Drive and AWS S3?

I'm trying to set up a connection between a Google Drive folder and an S3 bucket, but I'm not sure where to start.
I've already created a sort of "Frankenstein process", but only I can use it easily, and sharing it with my co-workers is a pain.
I have a script that generates a plain text file and saves it into a Drive folder. To upload, I've installed Drive File Stream to sync it to my Mac, and then I created a script using Python 3, with the boto3 library, to upload the text file into different S3 buckets depending on the file name.
I was thinking that I could create a Lambda to process the file into the S3 buckets, but I cannot work out how to create the connection between Drive and S3. I would appreciate it if someone could give me a piece of advice on how to start with this.
Thanks
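For reference, a minimal boto3 sketch of the kind of routing script described above; the bucket names and filename convention are assumptions:

import boto3

s3 = boto3.client("s3")

# Hypothetical routing: pick a destination bucket based on a filename prefix.
BUCKETS = {"sales": "sales-bucket", "ops": "ops-bucket"}

def upload(path):
    name = path.rsplit("/", 1)[-1]
    prefix = name.split("_", 1)[0]          # e.g. "sales_2020.txt" -> "sales"
    bucket = BUCKETS.get(prefix, "default-bucket")
    s3.upload_file(path, bucket, name)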
If you simply want to connect Google Drive and AWS S3, there is a service named Zapier which provides different types of integrations without writing a line of code. You can check this link for more details:
https://zapier.com/apps/amazon-s3/integrations/google-drive

Upload empty folder to S3 using SDK

Is it possible to upload an empty folder to S3 using an SDK (preferably .NET)?
As far as I know, I can't do it using the AWS Console.
Does anybody have an idea of a workaround using the SDK that makes it possible to upload an empty folder? I need to build an application that lets users upload an entire folder, including any empty folders.
Many thanks & Regards
As a workaround, AWS suggests simulating an empty folder by creating a 0-byte object with the folder's key, and deleting it once the first real object in the folder is created.
You can find more details here:
https://forums.aws.amazon.com/thread.jspa?threadID=50849
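To illustrate the workaround (shown here with Python/boto3 rather than the .NET SDK, and with an assumed bucket name), a zero-byte object whose key ends in a slash is what the console treats as an empty folder:

import boto3

s3 = boto3.client("s3")

# A zero-byte object with a trailing-slash key shows up as an empty
# folder in the S3 console. "my-bucket" is an assumed bucket name.
s3.put_object(Bucket="my-bucket", Key="reports/archive/", Body=b"")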

Upload file to my server using Pentaho Data Integration (PDI)

In my PDI transformations, a file is created and then a table in my database_1 is updated with the information in that file.
That works perfectly fine.
What I need now is to upload the file to some place on my web server, automatically, each time the transformation described above finishes.
Is there any job entry that could send a file to my web server? Or any other viable automated way to do it?
Thank you.
Can't you just use the SFTP step?
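If you would rather script it than use the job entry, a minimal SFTP upload with Python's paramiko library could look like this; the hostname, credentials, and paths are assumptions:

import paramiko

# Assumed host, credentials, and paths; adapt to your web server.
ssh = paramiko.SSHClient()
ssh.set_missing_host_key_policy(paramiko.AutoAddPolicy())
ssh.connect("webserver.example.com", username="deploy", password="secret")

sftp = ssh.open_sftp()
sftp.put("/tmp/export.csv", "/var/www/uploads/export.csv")
sftp.close()
ssh.close()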

Allowing users to download files as a batch from AWS S3 or CloudFront

I have a website that allows users to search for music tracks and download the ones they select as MP3s.
I have the site on my server and all of the MP3s on S3, distributed via CloudFront. So far so good.
The client now wishes for users to be able to select a number of music tracks and then download them all in bulk, as a batch, instead of one at a time.
Usually I would place all the files in a zip and then present the user a link to that new zip file to download. In this case, as the files are on S3, that would require me to first copy all the files from S3 to my web server, process them into a zip, and then serve the download from my server.
Is there any way I can create a zip on S3 or CloudFront, or is there some way to batch/group files into a zip?
Maybe I could set up an EC2 instance to handle this?
I would greatly appreciate some direction.
Best
Joe
I am afraid you won't be able to create the batches without additional processing. Firing up an EC2 instance might be an option to create a batch per user.
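As a sketch of what that per-user batch job could do (the bucket name, keys, and link expiry are assumptions), you could pull the selected objects with boto3, zip them in memory, push the zip back to S3, and hand out a presigned link:

import io
import zipfile
import boto3

s3 = boto3.client("s3")

def zip_tracks(bucket, keys, zip_key):
    """Fetch the selected objects, bundle them into an in-memory zip,
    and upload the zip back to S3 (fine for modest batch sizes)."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", zipfile.ZIP_DEFLATED) as zf:
        for key in keys:
            obj = s3.get_object(Bucket=bucket, Key=key)
            zf.writestr(key.rsplit("/", 1)[-1], obj["Body"].read())
    buf.seek(0)
    s3.put_object(Bucket=bucket, Key=zip_key, Body=buf)
    # Hand the user a time-limited link to the finished zip.
    return s3.generate_presigned_url(
        "get_object", Params={"Bucket": bucket, "Key": zip_key}, ExpiresIn=3600
    )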
I am facing the exact same problem. So far the only thing I was able to find is the AWS CLI's s3 sync command:
https://docs.aws.amazon.com/cli/latest/reference/s3/sync.html
In my case, I am using Rails + its Paperclip addon which means that I have no way to easily download all of the user's images in one go, because the files are scattered in a lot of subdirectories.
However, if you can group your user's files in a better way, say like this:
/users/<ID>/images/...
/users/<ID>/songs/...
...etc., then you can solve your problem right away with:
aws s3 sync s3://<your_bucket_name>/users/<user_id>/songs /cache/<user_id>
Do keep in mind that you'll have to give your server the proper credentials so the S3 CLI tools can work without prompting for usernames/passwords.
And that should sort you out.
Additional discussion here:
Downloading an entire S3 bucket?
S3 serves each object over a single HTTP request, so the answer is to use threads to achieve the same thing.
The Java API provides TransferManager for this:
http://docs.aws.amazon.com/AWSJavaSDK/latest/javadoc/com/amazonaws/services/s3/transfer/TransferManager.html
You can get great performance with multiple threads.
There is no bulk download, sorry.
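Along the same lines as TransferManager, a minimal threaded-download sketch in Python (boto3 clients are thread-safe for downloads; the bucket and keys are assumptions):

from concurrent.futures import ThreadPoolExecutor
import boto3

s3 = boto3.client("s3")  # safe to share across threads for downloads

def fetch(key):
    # Save each object under its base filename in the working directory.
    s3.download_file("my-bucket", key, key.rsplit("/", 1)[-1])

keys = ["songs/a.mp3", "songs/b.mp3"]  # hypothetical object keys
with ThreadPoolExecutor(max_workers=8) as pool:
    list(pool.map(fetch, keys))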

How to upload a file from the web onto Amazon S3?

I have a link to a file (like so: http://example.com/tmp/database.csv). I want to upload it directly into S3, instead of downloading it on my computer first (and then uploading). Is this possible?
The file will have to move through some application you write. Amazon S3 does not have any mechanism to execute code or pull files, so the only way to do this is to send it directly from the server where the file is hosted or from another server.
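A minimal sketch of such an application in Python, streaming the remote file into S3 without writing it to local disk; the URL comes from the question, while the bucket name and key are assumptions:

import boto3
import requests

s3 = boto3.client("s3")
url = "http://example.com/tmp/database.csv"  # link from the question

# Stream the response body straight into S3 so the whole file never
# has to land on local disk.
with requests.get(url, stream=True) as resp:
    resp.raise_for_status()
    resp.raw.decode_content = True  # transparently handle gzip encoding
    s3.upload_fileobj(resp.raw, "my-bucket", "tmp/database.csv")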