How to add the date to file name while uploading a file to s3 bucket using alteryx - amazon-s3

I have a workflow in Alteryx that downloads two files from two different URLs. After making the required modifications, I want to upload them to an S3 bucket and also save a copy locally. In both cases I want to add the current date to the file name. I was able to rename the locally saved file with a Formula tool, but I can't do the same for the copy being uploaded to S3. Can anyone help me with this? PS: Since it's company data, I can't share a screenshot of the workflow.

Related

Unzip files from S3 before putting them into Snowflake

I have data available in an S3 bucket we don't own, with a zipped folder containing files for each date.
We are using Snowflake as our data warehouse. Snowflake accepts gzip'd files, but does not ingest zip'd folders.
Is there a way to directly ingest the files into Snowflake that will be more efficient than copying them all into our own S3 bucket and unzipping them there, then pointing e.g. Snowpipe to that bucket? The data is on the order of 10GB per day, so copying is very doable, but would introduce (potentially) unnecessary latency and cost. We also don't have access to their IAM policies, so can't do something like S3 Sync.
I would be happy to write something myself, or use a product/platform like Meltano or Airbyte, but I can't find a suitable solution.
How about using SnowSQL to load the data into Snowflake, with a Snowflake stage (table stage, user stage, or named stage) holding the files before they are loaded?
https://docs.snowflake.com/en/user-guide/data-load-local-file-system-create-stage.html
I had a similar use case. I use an event-based trigger that runs a Lambda function every time a new zipped file lands in my S3 folder. The Lambda function opens the zipped file, gzips each individual file, and re-uploads them to a different S3 folder. Here's the full working code: https://betterprogramming.pub/unzip-and-gzip-incoming-s3-files-with-aws-lambda-f7bccf0099c9
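For illustration only, here is a minimal sketch of the re-compression step described in that answer; it is not the code from the linked article. The function name `regzip_members` is hypothetical, and the S3 download and upload (for example with boto3's `get_object` and `put_object`) are left out so the snippet runs on its own.

```python
import gzip
import io
import zipfile
from typing import Dict


def regzip_members(zip_bytes: bytes) -> Dict[str, bytes]:
    """Map '<member name>.gz' to the gzip-compressed content of each file in a zip archive."""
    gzipped = {}
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as archive:
        for info in archive.infolist():
            if info.is_dir():
                continue  # directory entries carry no data
            gzipped[info.filename + ".gz"] = gzip.compress(archive.read(info))
    return gzipped


# In a Lambda handler (assumption), zip_bytes would be the body of the object
# that triggered the event, and each entry of the result would be uploaded
# to the destination folder that Snowflake reads from.
```

This reads every member fully into memory, which is fine for small files; larger members would need streaming through `archive.open(...)` instead.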

how to store auto generated files in a different AWS S3 folder while running Tableau using Athena connector?

I am using Athena to connect a single csv file stored in AWS S3 folder with Tableau Desktop and have been successful in connecting the S3 data using Athena.
However, whenever I perform any activity in Tableau (drag and drop, slice and dice), an auto-generated CSV and a metadata file get saved in the same folder as my input file.
Because these additional files are generated in the input file's folder, the visuals in Tableau are also affected (by the additional records).
How do I ensure that, for any activity I perform in Tableau, the auto-generated files get stored in a different folder (rather than the folder the input file is read from)?
This will solve my problem as the visuals and the analysis will show correct numbers.
Currently, my workaround is this: after every activity I perform in Tableau (slice, filter, etc.), I go back to the S3 folder, delete the additional auto-generated files, continue with the next activity in Tableau, go back to the S3 folder to delete again, and so on. (Definitely not the ideal way.)
While executing Athena query, I am storing the query results in a different folder, because there is a provision for doing the same.
Please suggest if there is a similar provision for storing the auto-generated files (while working on Tableau) in a different folder ?
P.S. If there is an option of preventing these files from getting generated, that will also be helpful.
Anand
How do I ensure that the auto-generated files get stored in a different folder?
To store the results of your queries in a different location, you need to specify a different path for the S3 Staging Directory. To do that, edit the connection to AWS Athena.
Here we did everything within Tableau itself, but the same result can be achieved in the AWS Athena settings for the query result location.
If there is an option of preventing these files from getting generated, that will also be helpful.
On the left side of the toolbar, there is an option called Pause/Resume Auto Updates. When paused, Tableau doesn't send new queries to AWS Athena.

Can Someone Help Me Troubleshoot Error In BQ "does not contain valid backup metadata."

I keep trying to upload a new table to my company's BQ, but I keep getting the error you see in the title ("does not contain valid backup metadata.").
For reference, I'm uploading a .csv file that has been saved to our Google Cloud data storage. It's being uploaded as a native table.
Can anyone help me troubleshoot this?
It sounds like you are specifying the file type DATASTORE_BACKUP. When you specify that file type, BigQuery takes whatever URI you provide (even if it has a .csv suffix) and searches for Cloud Datastore backup files relative to that URI. For a .csv file, the file type should be CSV.

S3 — Auto generate folder structure?

I need to store user-uploaded files in Amazon S3. I'm new to S3, but as I understand from the docs, S3 requires me to specify the file upload path in the PUT method.
I'm wondering if there is a way to send a file to S3 and simply get a link for HTTP(S) access. I'd like Amazon to handle all the headaches related to file/folder structure itself. For example, I just pipe a file from node.js to S3, and in the callback I get an HTTP link with no expiration date. Amazon itself creates something like /2014/12/01/.../$hash.jpg and just returns the final link. This use case seems quite common.
Is it possible? If not, could you suggest any options to simplify the file storage/filesystem tree structure in S3?
Many thanks.
S3 doesn't have folders, actually. In a normal filesystem, 2014/12/01/blah.jpg would mean you've got a 2014 folder with a folder called 12 inside it and so on, but in S3 the entire 2014/12/01/blah.jpg is the key, essentially a single long filename. You don't have to create any folders.
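Since the key is just a string, the date-and-hash layout from the question can be built on the client before uploading. A minimal sketch, assuming a SHA-256 content hash is acceptable as the file name; the function name `dated_key` is hypothetical and the upload itself is not shown:

```python
import hashlib
from datetime import date


def dated_key(content: bytes, extension: str, day: date) -> str:
    """Build a key such as '2014/12/01/<sha256 hex>.jpg' from the file content and a date."""
    digest = hashlib.sha256(content).hexdigest()
    # The slashes are ordinary characters in the key; nothing has to be created first.
    return f"{day:%Y/%m/%d}/{digest}.{extension}"


key = dated_key(b"example image bytes", "jpg", date(2014, 12, 1))
```

The resulting string is what you would pass as the object key in the PUT request.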

query regarding cloud file storage services- can i append data to an existing file

I am working to create an application where some files will be stored in Amazon S3/Rackspace Cloud Files/other similar cloud file storage providers.
There are a couple of scenarios where it would be easier for me, if I could append data to an existing file... Is this possible? Or do I have to download the file from Amazon S3, then append data to it, and finally upload the modified file back to Amazon S3?
There is no way to append anything to existing files in S3.
You will have to download it and upload it again after modifying.
If you wish, though, you can always upload the new data as a separate object with a tag (a timestamp or a counter), e.g. file_201201011344. Then, when reading, you fetch all the objects matching your pattern and append them together on the client side.
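A minimal sketch of that naming-and-merging idea, with a plain dict standing in for the bucket so it runs without any cloud SDK. The names `part_key` and `combine_parts` are hypothetical; with a real provider you would list objects by prefix and download each one instead.

```python
from datetime import datetime
from typing import Dict


def part_key(base: str, when: datetime) -> str:
    """Name a new chunk after its upload time, e.g. 'file_201201011344'."""
    return f"{base}_{when:%Y%m%d%H%M}"


def combine_parts(objects: Dict[str, bytes], base: str) -> bytes:
    """Concatenate every chunk whose key starts with '<base>_', in key order."""
    prefix = base + "_"
    keys = sorted(k for k in objects if k.startswith(prefix))
    return b"".join(objects[k] for k in keys)


# Stand-in for the bucket: key -> object content.
bucket = {
    part_key("file", datetime(2012, 1, 1, 13, 44)): b"first\n",
    part_key("file", datetime(2012, 1, 1, 13, 50)): b"second\n",
}
merged = combine_parts(bucket, "file")
```

Fixed-width timestamps sort lexicographically in chronological order, which is why a plain `sorted` is enough here. Two uploads within the same minute would collide at this resolution, so a counter or finer timestamp may be needed.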