Update Athena table from 2 external tables in Athena from S3 - sql

I am relatively new to Athena and S3.
I have an S3 bucket which contains 2 folders, each with CSV files in it. I have created 2 external tables in Athena, one for each folder.
I want to create a final table in Athena which joins the two tables and picks up more rows automatically as more files are added to the S3 bucket. Please could you advise the best way to get the output needed?
I have tried "create table from query" in Athena, but the table remains static as I upload more files to S3 and doesn't update.

For this use case I would suggest creating a view in Athena; the Athena documentation on views covers the details.
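A view recomputes its result at query time, so rows from newly uploaded files appear automatically. A minimal sketch, assuming two tables named folder1_table and folder2_table that share a join key id (all names here are hypothetical):

```sql
-- Hypothetical table and column names; replace with your own schema.
CREATE OR REPLACE VIEW combined AS
SELECT a.id,
       a.value_a,
       b.value_b
FROM folder1_table a
JOIN folder2_table b
  ON a.id = b.id;
```

Because the view is just a stored query over the two external tables, any new CSV files dropped into either folder are picked up the next time the view is queried.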

Related

Query S3 Bucket With Amazon Athena and modify values

I have an S3 bucket with 500 CSV files that are identical except for the number values in each file.
How do I write a query that grabs dividendsPaid, makes it positive for each file, and sends the results back to S3?
Amazon Athena is a query engine that can perform queries on objects stored in Amazon S3. It cannot modify files in an S3 bucket. If you want to modify those input files in-place, then you'll need to find another way to do it.
However, it is possible for Amazon Athena to create a new table with the output files stored in a different location. You could use the existing files as input and then store new files as output.
The basic steps are:
Create a table definition (DDL) for the existing data (I would recommend using an AWS Glue crawler to do this for you)
Use CREATE TABLE AS to select data from the table and write it to a different location in S3. The command can include an SQL SELECT statement to modify the data (changing the negatives).
See: Creating a table from query results (CTAS) - Amazon Athena
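The steps above might look like the following in Athena SQL; the table name raw_dividends, the ticker column, and the output location are assumptions for illustration:

```sql
-- CTAS sketch: writes transformed copies of the data to a new S3 location.
CREATE TABLE dividends_fixed
WITH (
    format = 'PARQUET',
    external_location = 's3://my-bucket/output/dividends_fixed/'
) AS
SELECT
    ticker,
    abs(dividendsPaid) AS dividendsPaid  -- flip negative values to positive
FROM raw_dividends;
```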

Is it possible to delete records from Hive external table with AWS S3 bucket as location?

Is it possible to delete records from a Hive external table with an AWS S3 bucket as the location, using IICS?
For example : DELETE FROM MY_HIVE_TABLE WHERE COLUMN1='TEST1';
You cannot delete from Hive tables (unless they are Kudu tables).
As per your comment, I think you can add a Filter transformation right before the target Hive S3 table with the condition COLUMN1 <> 'TEST1',
and then overwrite the Hive target table in S3 using IICS.
This will overwrite the same table with everything except COLUMN1='TEST1', i.e. effectively deleting that data.
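In plain Hive SQL, the same filter-and-overwrite pattern can be sketched like this, staging through a temporary table to avoid reading and overwriting the same table in a single statement (the temp table name is an assumption):

```sql
-- Stage the rows we want to keep.
CREATE TABLE my_hive_table_tmp AS
SELECT * FROM MY_HIVE_TABLE
WHERE COLUMN1 <> 'TEST1';

-- Overwrite the original table with only the filtered rows.
INSERT OVERWRITE TABLE MY_HIVE_TABLE
SELECT * FROM my_hive_table_tmp;

DROP TABLE my_hive_table_tmp;
```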

Update table in Athena

I have a table in Athena created from S3. I wanted to update the column values using the UPDATE statement. Is UPDATE not supported in Athena?
Is there any other way to update the table ?
Thanks
Athena only supports external tables, which are tables created on top of data in S3. Since S3 objects are immutable, there is no concept of UPDATE in Athena. What you can do is create a new table using CTAS, or a view with the transformation applied there, or perhaps use Python to read the data from S3, manipulate it, and write it back.
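For example, an "update" can be simulated with CTAS by rewriting the affected column into a new table; the table names, column names, and output location below are hypothetical:

```sql
-- New table with the 'status' column rewritten; the old table stays untouched.
CREATE TABLE my_table_v2
WITH (external_location = 's3://my-bucket/my_table_v2/') AS
SELECT
    id,
    CASE WHEN status = 'pending' THEN 'done' ELSE status END AS status
FROM my_table;
```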

Mapping AWS glue table columns to target RDS instance table columns

I have created a Glue job that takes data from an S3 bucket and inserts it into an **RDS Postgres instance**.
In the S3 bucket I have created different folders (partitions).
Can I map different columns in different partitions to the same target RDS instance table?
When you say partition in S3, is it indicated using the Hive style? E.g. bucket/folder1/folder2/partition1=xx/partition2=xx/partition3=yy/...
If so, you shouldn't be storing data with different structures in S3 partitions and then mapping them to a single table. However, if it's just data in different folders, like s3://bucket/folder/folder2/folder3/..., and these are genuinely different datasets, then yes, it is possible to map them to a single table. You cannot do it via the UI, though: you will need to read the datasets as separate dynamic/data frames, join them on a key in Glue/Spark, and load the result into RDS.

How to efficiently append new data to table in AWS Athena?

I have a table in Athena that is created from a CSV file stored in S3, and I am using Lambda to query it. But I have incoming data being processed by the Lambda function and want to append new rows to the existing table in Athena. How can I do this? I saw in the documentation that Athena prohibits some SQL statements like INSERT INTO and CREATE TABLE AS SELECT.
If you are adding new data you can save the new data file into the same folder (prefix/key) that the table is reading from. Athena reads from all files in this folder; the format of the new file just needs to be the same as the existing ones.
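This works because an external table's LOCATION points at a folder, not a single file, and Athena scans every object under it. A sketch of such a DDL (bucket, prefix, and columns are assumptions):

```sql
-- Any CSV dropped under s3://my-bucket/data/my_table/ is included in queries.
CREATE EXTERNAL TABLE my_table (
    id    string,
    value string
)
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.OpenCSVSerde'
LOCATION 's3://my-bucket/data/my_table/';
```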