Can we add column to an existing table in AWS Athena using SQL query? - sql

I have a table in AWS Athena which contains 2 records. Is there a SQL query using which a new column can be inserted in to the table?

You can find more information about adding columns to table in Athena documentation
Or you can use CTAS
For example, you have a table with
CREATE EXTERNAL TABLE sample_test(
id string)
LOCATION
's3://bucket/path'
and you can create another table from sample_test with the query
CREATE TABLE new_test
AS
SELECT *, 'new' AS new_col FROM sample_test
You can use any available query after AS

This is mainly for future readers like me, who was struggling to get this working for Hive table with AVRO data and if you don't want to create new table i.e updating schema of the existing table. It works for csv using 'add columns', but not for Hive + AVRO. For Hive + AVRO, to append columns at the end, before partition columns, the solution is available at this link. However, there are couple of things to note that, we need to pass full schema to the literal attribute and not just the changes; and (not sure why but) we had to alter hive table for all 3 things in the same order - 1. add columns using add columns 2. set tblproperties and 3. set serdeproperties. Hopefully it helps someone.

Related

How to modify CTAS query to append query results to table based on if new partition doesn't exist? - Athena

I have a query that I want to execute daily that's to be partitioned by the date it's executed. The results of this query should be appended to a the same table.
My idea was ideally having something similar to the CREATE TABLE IF NOT EXISTS command for adding data by a new partition every day to the existing table if the partition doesn't already exist, but I can't figure out how I'd be able to integrate this in my query.
My query:
CREATE TABLE IF NOT EXISTS db_name.table_name
WITH (
external_location = 's3://my-query-results-location/',
format = 'PARQUET',
parquet_compression = 'SNAPPY',
partitioned_by = ARRAY['date_executed'])
AS
SELECT
{columns_that_I_am_selecting_here_including_'date_executed'}
What this does is create a new table for the first day it's executed but nothing happens for subsequent days, I'm assuming because of the CREATE TABLE IF NOT EXISTS validating that the table already exists and not proceeding with the logic.
Is there a way to modify my query to create a table for the first day executed and append the results by a new partition for each subsequent day?
I'm quite sure ALTER TABLE table_name ADD [IF NOT EXISTS] PARTITION would not apply to my use case here as I'm running a CTAS query.
You can simply use INSERT INTO existing_table SELECT....
Presumably your table is already partitioned, so include that partition column in the SELECT and Amazon Athena will automatically put the data in the correct directory.
For example, you might include hte column like this: SELECT ... CURRENT_DATE as date_executed
See: INSERT INTO - Amazon Athena

Hive - Create Table statement with 'select query' and 'partition by' commands

I want to create a Partitioned Table in Hive. I know to create a table structure first with the help of "Create table ... Partitioned by" command and then insert the data into the table using "Insert Into Table" command
But what I am trying to do is to combine these two commands into a single query like below but it is throwing errors.
CREATE TABLE test_extract AS
SELECT
*
FROM master_extract
PARTITION BY (year string
,month string)
;
Both Year and Month are two separate columns in the master_extract table.
Is there any way to achieve something like this ?
No, this is not possible, because Create Table As Select (CTAS) has restrictions:
The target table cannot be a partitioned table.
The target table cannot be an external table.
The target table cannot be a list bucketing table.
You can create table separately and then insert overwrite it.
There has been some development since this question was originally asked and answered. As per hive documentation: Starting with Hive 3.2.0, CTAS statements can define a partitioning specification for the target table (HIVE-20241).
You can also see the related ticket here. It has been resolved back in July 2018.
Therefore if your hive is of 3.2.0 or higher, then you can simply do
CREATE TABLE test_extract PARTITIONED BY (year string, month string) AS
SELECT
col1,
col2,
year,
month
FROM master_extract

How to copy table by spark-sql

Actually, I want to move one table to another database.
But spark don't permit this.
Then, how to copy table by spark-sql?
I already tried this.
SELECT *
INTO table1 IN new_database
FROM old_database.table1
But it was not working.
maybe try:
CREATE TABLE new_db.new_table AS
SELECT *
FROM old_db.old_table;
To preserve partitioning and storage format do the following-
Get the complete schema of the existing table by running-
show create table db.old_table
The above query will output the table schema which you can just execute after changing the path name and table name.
Then insert all the rows into the new blank table using-
insert into db.new_table select * from db.old_table
The following snippet will create a new table while preserving the definition of the "old" table.
CREATE TABLE db.new_table LIKE db.old_table;
For more info, check the doc's CREATE TABLE.

How can I create a partitioned table 'like' an unpartitioned table with Hive HQL?

I've got a table with two weeks worth of entries, and I would like to copy those entries into a table partitioned by date (creating it if it does not exist).
I'm writing a luigi task to do this, and I would love for it to be independent of the table schema--i.e. I wouldn't have to specify column names and types, and it would CREATE TABLE IF NOT EXISTS when necessary.
I was hoping I could use:
CREATE TABLE IF NOT EXISTS test_part
COMMENT 'This is a test table to see if partitioning works in this case'
PARTITIONED BY (event_date string)
AS select *, '2014-12-15' from source_db.source_table
where event_at <'2014-12-16' and event_at >='2014-12-15';
But this of course fails with: FAILED: SemanticException [Error 10068]: CREATE-TABLE-AS-SELECT does not support partitioning in the target table
I tried again with "like" with basically the same results. Is there a way to do this that I am missing? It doesn't have to be atomic. Multiple sequential commands are fine.
You do not do a create table as.
You create a table first using describe source_table and then you make an insert into table partition (event_date string)
2 steps it works better.

How can I copy a Redshift table but add a sortkey to a column?

I'm currently working on a project that uses a Redshift table with 51 columns. However, the person who made the table forgot to add a sortkey to our time column which will hurt performance for our use case if we don't add it.
How can I make a version of the table with our time column as the sortkey? I'm aware that you can't make a column a sortkey if its a member of an existing table, but I was hoping there's a way to do it that doesn't involve writing out the CREATE TABLE syntax by hand; for example, something like this would be nice:
timecube=# CREATE TABLE foo (like bar) sortkey(time);
ERROR: CREATE TABLE LIKE is not supported with DISTSTYLE, DISTKEY(), or SORTKEY() clauses
but as you can see its not supported. Is there another way? As we're still developing we don't need any of existing data.
Using traditional tools like pgdump didn't work well because they don't include any of the Redshift extras like encoding.
Redshift supports specifying the DIST and SORT keys as part of CREATE TABLE AS statements, as per the docs.
CREATE TABLE table_name
DISTSTYLE KEY
DISTKEY ( column )
SORTKEY ( column )
AS
(SELECT *
FROM source_table)
;
First step you need to do use get create table statement for existing table. Then create new table this time add sort key to new table.
Check encoding for old table ( when you load data using copy command it automatically adds compression encodings)
select "column", type, encoding
from pg_table_def where tablename = 'old_table'
When creating new table add encoding type for each column. Create table with Sort key .
Once new table is created use below command
insert into new table ( select * from old table order by time asc)