CREATE TABLE AS SELECT * from partitioned table - sql

I want to create a table from a partitioned table using CTAS.
The new table must have all the data, partitions, and subpartitions of the old table.
How can I do this?

You need to first create the new table with all the partitions; there is no way to add partition definitions to a CTAS. Once the table is created you can populate it using INSERT INTO ... SELECT.
You can use dbms_metadata.get_ddl to get the definition of the old table.
select dbms_metadata.get_ddl('TABLE', 'NAME_OF_EXISTING_TABLE')
from dual;
Save the output of that into a script, do a search and replace to adjust the table name, then run the CREATE TABLE followed by the INSERT INTO ... SELECT ...
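For illustration, a minimal sketch of that workflow, assuming a hypothetical partitioned table named SALES:
-- 1. extract the DDL of the existing partitioned table
select dbms_metadata.get_ddl('TABLE', 'SALES')
from dual;
-- 2. in the generated script, replace SALES with SALES_COPY,
--    keeping the PARTITION BY / SUBPARTITION BY clauses, then run it
-- 3. populate the copy; rows are routed to partitions and
--    subpartitions automatically based on the partition keys
insert into sales_copy select * from sales;
commit;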

Related

BigQuery create Temp table

I am trying to create a temporary table with a specific name so I can use it in other queries. The problem is that when I create a temp table like this:
BEGIN
BEGIN TRANSACTION;
--delete test_temp where 1=1;
create temp table test_temp as SELECT * FROM `analytics.reports.analysis` ;
COMMIT TRANSACTION;
END;
the table name is indeed test_temp, but if I try to SELECT * FROM test_temp I get nothing. The table ID always gets a new unique ID, something like analytics-prod:_script3991beb3e9868774854ac09d407f1c397428a4f6.test_temp.
Is there a way to give this temp table a pre-defined name?
What I am trying to achieve, basically, is to create a table from a SELECT every 15 minutes.
I thought to create a scheduled query running this transaction that would delete and repopulate the temp table, so I can always reference test_temp without the table ID changing.
According to the docs:
You can refer to a temporary table by name for the duration of the current multi-statement query
So if you want to refer to the table in other queries, an option could be to use a permanent table in a separate dataset:
...
CREATE OR REPLACE TABLE `analytics.my_temp_dataset.test_temp` AS SELECT * FROM `analytics.reports.analysis` ;
...
In that case, you don't need a temp table but a permanent table that you simply replace on the next run using the CREATE OR REPLACE TABLE statement. If you are using the temp table as part of a scripting stage, then I suggest running this instead:
BEGIN
CREATE OR REPLACE TEMP TABLE _SESSION.test_temp AS
SELECT * FROM `analytics.reports.analysis`;
-- SELECT COUNT(*) CNT FROM _SESSION.test_temp;
END;
Since it is a temp table you don't need a DROP query at the end.
You can create a temporary table in a session in BigQuery.
You can create a session via the API and then create your temporary tables in the session. The tables will then be visible to all queries using that session.
There are limitations on sessions you should be aware of:
They last at most 24 hours
No concurrent jobs can run in a session
I use temporary tables with sessions to do complex transformations in an ETL pipeline. For example:
Create session
Create temporary tables
Do joins, and other transformations, write results to temporary tables
Do final transformations and output results to permanent tables
End session
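As a minimal sketch of that pattern, assuming a session has already been created via the API and all of the statements below run inside it (the staged table and the output dataset names are hypothetical):
-- inside an active session: the temp table survives across queries
CREATE TEMP TABLE staged AS
SELECT * FROM `analytics.reports.analysis`;
-- a later query in the same session can still read it
SELECT COUNT(*) AS cnt FROM staged;
-- final transformations go to a permanent table
CREATE OR REPLACE TABLE `analytics.my_dataset.final_output` AS
SELECT * FROM staged;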
I am writing a Python API for this and will put it on GitHub at some point. I could probably post a gist if that is helpful.
In the meantime, see Google's documentation for sessions in BigQuery:
https://cloud.google.com/bigquery/docs/sessions-intro

Drop and overwrite external table in hive

I need to create an external table in HiveQL with the output of a SELECT clause. Every time the HiveQL is run, the table should be dropped and recreated. When we drop an external table, only the table structure is dropped, not the data files in the HDFS location. How can I achieve this?
Create Table As Select (CTAS) has restrictions; one of them is that the target table cannot be external.
You have these options:
Create external table once, then INSERT OVERWRITE
INSERT OVERWRITE TABLE tablename1 [PARTITION (partcol1=val1, partcol2=val2 ...)] select_statement1 FROM from_statement;
Use managed table, then you can DROP TABLE, then CREATE TABLE ... as SELECT
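As a minimal sketch of the first option, with hypothetical table names, columns, and location:
-- one-time setup: the external table and its HDFS location
CREATE EXTERNAL TABLE report_out (id INT, val STRING)
STORED AS ORC
LOCATION '/user/hive/warehouse/report_out';
-- every run: rewrites the table's data in place
INSERT OVERWRITE TABLE report_out
SELECT id, val FROM source_table;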
See also the answer about skipTrash and the auto.purge property.
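If you go the managed-table route, the auto.purge table property makes dropped or overwritten data skip the HDFS trash; a hypothetical example:
ALTER TABLE tablename1 SET TBLPROPERTIES ('auto.purge' = 'true');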

Hive table creation with a default value

I have a table in RDBMS like so:
create table test (sno number, entry_date date default sysdate).
Now I want to create a table in Hive with the same structure, adding a default value to a column.
Hive currently doesn't support adding a default value to a column while creating a table.
As a workaround, load the data into a temporary table and use an insert overwrite table statement to add the current date and time into the main table.
Create a temporary table:
create table test (sno int);
Load data into the table (the path below is a placeholder):
load data local inpath '/path/to/data' into table test;
Create the final table:
create table final_table (sno int, createDate string);
Finally, load the data from the temp test table into the final table:
insert overwrite table final_table select sno, FROM_UNIXTIME(UNIX_TIMESTAMP(), 'dd/MM/yyyy') from test;
Hive doesn't support DEFAULT fields
That doesn't mean you can't do it, though. It's just a two-step process: create one "staging" table, then insert into a second table while selecting that "default" value.
Adding a default value to a column while creating table in hive
Since you mention,
I have a table in RDBMS
you could also use your existing table and use Sqoop to import the data into Hive.

How to copy table by spark-sql

Actually, I want to move one table to another database.
But Spark doesn't permit this.
So how can I copy a table with spark-sql?
I already tried this:
SELECT *
INTO table1 IN new_database
FROM old_database.table1
But it did not work.
Maybe try:
CREATE TABLE new_db.new_table AS
SELECT *
FROM old_db.old_table;
To preserve partitioning and storage format, do the following:
Get the complete schema of the existing table by running-
show create table db.old_table
The above query will output the table schema, which you can execute after changing the location path and table name.
Then insert all the rows into the new blank table using:
insert into db.new_table select * from db.old_table
The following snippet will create a new table while preserving the definition of the "old" table.
CREATE TABLE db.new_table LIKE db.old_table;
For more info, check the docs for CREATE TABLE.
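Combining the LIKE form with an insert gives a full copy in two statements (database and table names are hypothetical):
CREATE TABLE new_db.table1 LIKE old_db.table1;
INSERT INTO new_db.table1 SELECT * FROM old_db.table1;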

Add partitions on existing hive table

I'm processing a big Hive table (more than 500 billion records).
The processing is too slow and I would like to make it faster.
I think that by adding partitions, the process could be more efficient.
Can anybody tell me how I can do that?
Note that my table already exists.
My table:
create table T(
nom string,
prenom string,
...
date string)
I want to partition on the date field.
Thx
SET hive.exec.dynamic.partition = true;
SET hive.exec.dynamic.partition.mode = nonstrict;
INSERT OVERWRITE TABLE partitioned_table_name PARTITION(`date`) SELECT nom, prenom, ..., `date` FROM table_name;
Note: in the INSERT statement for a partitioned table, make sure that you specify the partition columns last in the SELECT clause.
You have to restructure the table. Here are the steps:
1. Make sure no other process is writing to the table.
2. Create a new external table using partitioning.
3. Insert into the new table by selecting from the old table.
4. Drop the new (external) table; only the table definition is dropped, the data files stay in place.
5. Drop the old table.
6. Create the table with the original name, pointing to the location from step 2.
7. Run the repair command (MSCK REPAIR TABLE) to fix all the partition metadata.
Alternative to steps 4, 5, 6 and 7:
Create the table with the original name by running SHOW CREATE TABLE on the new table and replacing it with the original table name.
Run the LOAD DATA INPATH command to move the files under the partitions to the new partitions of the new table.
Drop the external table created in step 2.
Both approaches achieve the restructuring with a single insert/MapReduce job; a minimal sketch of the first sequence follows.
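The sketch below uses hypothetical column names, a hypothetical HDFS location, and renames the partition column to date_part, since date is a reserved word in Hive:
-- step 2: new external table, partitioned by the date column
CREATE EXTERNAL TABLE t_new (nom STRING, prenom STRING)
PARTITIONED BY (date_part STRING)
LOCATION '/tmp/t_new';
-- step 3: dynamic-partition insert from the old table
SET hive.exec.dynamic.partition = true;
SET hive.exec.dynamic.partition.mode = nonstrict;
INSERT OVERWRITE TABLE t_new PARTITION (date_part)
SELECT nom, prenom, `date` FROM T;
-- steps 4 and 5: drop the external table (data stays) and the old table
DROP TABLE t_new;
DROP TABLE T;
-- step 6: recreate under the original name at the same location
CREATE EXTERNAL TABLE T (nom STRING, prenom STRING)
PARTITIONED BY (date_part STRING)
LOCATION '/tmp/t_new';
-- step 7: register the partitions in the metastore
MSCK REPAIR TABLE T;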