How to update column in AWS Athena Table - sql

I have a table in Athena with the following columns.
Describe my_table
row_id
icd9_code
linksto
The column icd9_code is empty and has the int data type. I want to insert some integer values into the column icd9_code of my table named my_table.
Those integer values are stored in an Excel sheet on my local PC. Does AWS Athena provide some way to do this?

Amazon Athena is primarily designed to run SQL queries across data stored in Amazon S3. It is not able to access data stored in Microsoft Excel files, nor is it able to access files stored on your computer.
To update a particular column of data for existing rows of data, you would need to modify the files in Amazon S3 that contain those rows of data.
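A common workaround (not a built-in Athena feature) is to export the Excel sheet to CSV, upload it to S3, expose it as a lookup table, and then use CTAS to write a new copy of the data with the column filled in. A minimal sketch, assuming the exported CSV has row_id and icd9_code columns; the bucket names and locations are illustrative:
-- lookup table over the CSV exported from Excel (names and location are examples)
CREATE EXTERNAL TABLE icd9_lookup (
  row_id INT,
  icd9_code INT
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
LOCATION 's3://my-bucket/icd9_lookup/'
TBLPROPERTIES ('skip.header.line.count' = '1');

-- write a new copy of the data with icd9_code filled in from the lookup
CREATE TABLE my_table_updated
WITH (external_location = 's3://my-bucket/my_table_updated/') AS
SELECT t.row_id,
       l.icd9_code,
       t.linksto
FROM my_table t
LEFT JOIN icd9_lookup l ON t.row_id = l.row_id;
The original files in S3 are left untouched; subsequent queries would point at the new table and location.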

Related

How to update table in Athena

I am creating a table 'A' from another table 'B' in Athena using a CREATE TABLE AS query. However, table 'B' is updated with new rows every hour. I want to know how I can update table A's data without dropping table A and creating it again.
I tried dropping the table and creating it again, but that seems to create a performance issue, since a new table is created every time. I want to insert into table A only the new rows that are added to table B.
Amazon Athena is a query engine, not a database.
When a query runs on a table, Athena uses the location of a table to determine where the data is stored in an Amazon S3 bucket. It then reads all files in that location (including sub-directories) and runs the query on that data.
Therefore, the easiest way to add data to Amazon Athena tables is to create additional files in that location in Amazon S3. The next time Athena runs a query, those files will be included as part of the referenced table. Even running the INSERT INTO command creates new files in that location. ("Each INSERT operation creates a new file, rather than appending to an existing file.")
If you wish to copy data from Table-B to Table-A, and you know a way to identify which rows to add (e.g. there is a column with a timestamp), you could use something like:
-- copy only rows newer than anything already in table_a
INSERT INTO table_a
SELECT * FROM table_b
WHERE timestamp_field > (SELECT MAX(timestamp_field) FROM table_a)

Query S3 Bucket With Amazon Athena and modify values

I have an S3 bucket with 500 csv files that are identical except for the number values in each file.
How do I write a query that grabs dividendsPaid, makes it positive for each file, and sends that back to S3?
Amazon Athena is a query engine that can perform queries on objects stored in Amazon S3. It cannot modify files in an S3 bucket. If you want to modify those input files in-place, then you'll need to find another way to do it.
However, it is possible for Amazon Athena to create a new table with the output files stored in a different location. You could use the existing files as input and then store new files as output.
The basic steps are:
Create a table definition (DDL) for the existing data (I would recommend using an AWS Glue crawler to do this for you)
Use CREATE TABLE AS to select data from the table and write it to a different location in S3. The command can include an SQL SELECT statement to modify the data (changing the negatives); a sketch follows below.
See: Creating a table from query results (CTAS) - Amazon Athena
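As a rough sketch of step 2, a CTAS query along these lines could flip the sign; the table name stock_data, the extra column names, and the output location are assumptions for illustration:
CREATE TABLE stock_data_fixed
WITH (
  format = 'TEXTFILE',                  -- write CSV-style output
  field_delimiter = ',',
  external_location = 's3://my-bucket/fixed-output/'
) AS
SELECT
  ticker,                               -- example pass-through column
  abs(dividendsPaid) AS dividendsPaid   -- convert negative values to positive
FROM stock_data;
Athena writes the result files to the new location, leaving the 500 input CSV files unchanged.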

AWS - How to extract CSV reports from a set of JSON files in S3

I have an RDS database with the following structure: CustomerId|Date|FileKey.
FileKey points to a JSON file in S3.
Now I want to create CSV reports with customer and date range filters and a columns definition (ColumnName + JsonPath), like this:
Name => data.person.name
OtherColumn1 => data.exampleList[0]
OtherColumn2 => data.exampleList[2]
I often need to add and remove columns from the columns definition.
I know I can run a SQL SELECT on RDS, get each S3 file (JSON), extract the data, and create my CSV file, but this is not a good solution because I would need to query my RDS instance and make millions of requests to S3 for each report request or each change to the columns definition.
Saving all the data in an RDS table instead of S3 is also not a good solution, because the JSON files contain a lot of data and the columns are not the same across customers.
Any idea?

How to ETL a table that contains Blob columns from one Oracle table to another using SSIS

We have a table that contains 50 rows of data. The table includes BLOB data types, and we are trying to see if we can use SSIS to copy the data from table1 to table2, including the BLOB columns, as we have tried other methods but did not succeed.
The BLOB columns contain Excel documents.
Is this possible? Is there an easier way to do it in SSIS?
If not possible, is there an easier way to do it in Oracle?

How to rename AWS Athena columns with parquet file source?

I have data loaded in my S3 bucket folder as multiple parquet files.
After loading them into Athena I can query the data successfully.
What are the ways to rename the Athena table columns for parquet file source and still be able to see the data under renamed column after querying?
Note: I checked the edit-schema option; the column is getting renamed, but after querying you will not see data under that column.
There is, as far as I know, no way to create a table with different names for the columns than what they are called in the files. The table can have fewer or extra columns, but only the names that are the same as in the files will be queryable.
You can, however, create a view with other names, for example:
CREATE OR REPLACE VIEW a_view AS
SELECT
a AS b,
b AS c
FROM the_table
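Queries then reference the renamed columns through the view, for example:
SELECT b, c
FROM a_view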