How to recalculate a table created by CTAS? - Impala

I have created a table using this statement:
CREATE TABLE tablename STORED AS PARQUET AS (SELECT ...)
How can I recalculate it without the DROP TABLE / CREATE TABLE flow?

In Impala, the INSERT INTO syntax appends data to a table. The existing data files are left as-is, and the inserted data is put into one or more new data files.
The INSERT OVERWRITE syntax replaces the data in a table. Currently, the overwritten data files are deleted immediately; they do not go through the HDFS trash mechanism.
So if you want to replace the data in the table tablename without going through drop table and create table, you can run a query like this:
INSERT OVERWRITE TABLE tablename SELECT * from <source_tablename>;
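A minimal sketch of the full cycle (table and column names are hypothetical):
-- initial build via CTAS
CREATE TABLE report STORED AS PARQUET AS
SELECT customer_id, SUM(amount) AS total FROM sales GROUP BY customer_id;

-- later, recalculate in place: the existing data files are replaced
INSERT OVERWRITE TABLE report
SELECT customer_id, SUM(amount) AS total FROM sales GROUP BY customer_id;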

Related

Drop and overwrite external table in hive

I need to create an external table in HiveQL with the output of a SELECT clause. Every time the HiveQL is run, the table should be dropped and recreated. When we drop an external table, only the table structure is dropped, not the data files at the HDFS location. How can I achieve this?
Create Table As Select (CTAS) has restrictions. One of them is that the target table cannot be external.
You have these options:
Create the external table once, then INSERT OVERWRITE it:
INSERT OVERWRITE TABLE tablename1 [PARTITION (partcol1=val1, partcol2=val2 ...)] select_statement1 FROM from_statement;
Use a managed table; then you can DROP TABLE, then CREATE TABLE ... AS SELECT.
See also the answer about skipTrash and the auto.purge property.
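A minimal sketch of the first option (schema, location, and source table are placeholders):
-- create the external table once
CREATE EXTERNAL TABLE report (id INT, name STRING)
STORED AS TEXTFILE
LOCATION '/user/hive/warehouse/report_ext';

-- rerun this each time instead of dropping and recreating the table
INSERT OVERWRITE TABLE report
SELECT id, name FROM source_table;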

Delete data in external and partitioned table in hive

I'm trying to delete data from external and partitioned table in hive. I can delete partitions with:
ALTER TABLE myTable DROP PARTITION(field > 'xxxx')
or
TRUNCATE TABLE myTable PARTITION(field)
But related files in Blob storage are not deleted. How do I delete those files?
On the other hand, I'd like to delete data using any field as a filter (not only the partition field). Can it be done in my case (an external and partitioned table)? I've tried to achieve this using:
INSERT OVERWRITE TABLE myTable PARTITION(field)
SELECT * FROM myTable WHERE machine = 'xxxxx'
But the data from the SELECT doesn't replace the data in myTable.
Data in an external table remains if you drop the table or a partition. Only if the table is managed is the data deleted automatically when the table or partition is dropped.
INSERT OVERWRITE TABLE myTable PARTITION(field) SELECT...
statement can replace data with newly loaded data, but only for partitions that appear in the returned dataset. If the returned dataset is empty, the existing data remains untouched.
To delete data in an external table, you need to delete the files on the filesystem.
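For the second question (deleting rows by a non-partition field), a hedged sketch is to overwrite the affected partitions while keeping only the rows that should survive; note the inverted WHERE condition, and that partitions missing from the result set are left untouched:
-- enable dynamic partitioning so PARTITION(field) can be resolved from the data
SET hive.exec.dynamic.partition=true;
SET hive.exec.dynamic.partition.mode=nonstrict;
-- keep everything except the rows to delete; SELECT * puts the partition column last
INSERT OVERWRITE TABLE myTable PARTITION(field)
SELECT * FROM myTable WHERE machine <> 'xxxxx';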

Insert overwrite doesn't update data in external table

There is one external table in Hive that has data. When I do INSERT OVERWRITE, it updates the files at the location the table points to, but the table data is not updated when executing SELECT *.
I tried a lot but couldn't find an answer for this issue. I worked around it with an alternative method, which I am sharing here so that anyone facing the same issue can use it.
1. create table target_table_name like source_table_name;
2. insert overwrite table target_table_name partition(partition_column_name) select * from source_table_name;
3. create external table another_table_name like source_table_name stored as file_format_of_source_table location 'location_of_source_table';
4. msck repair table another_table_name;
5. Then you can drop source_table_name and rename another_table_name to source_table_name, as spelled out below.
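For completeness, step 5 written as statements (a sketch using the placeholder names from the steps above; since source_table_name is external, as in the question, DROP TABLE removes only its metadata):
-- drop the original external table (metadata only), then take over its name
DROP TABLE source_table_name;
ALTER TABLE another_table_name RENAME TO source_table_name;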

How to copy table by spark-sql

Actually, I want to move one table to another database.
But Spark doesn't permit this.
So how can I copy a table with spark-sql?
I already tried this:
SELECT *
INTO table1 IN new_database
FROM old_database.table1
But it did not work.
Maybe try:
CREATE TABLE new_db.new_table AS
SELECT *
FROM old_db.old_table;
To preserve the partitioning and storage format, do the following:
Get the complete schema of the existing table by running:
show create table db.old_table
The above query outputs the table's DDL, which you can execute after changing the path and table name.
Then insert all the rows into the new, empty table using:
insert into db.new_table select * from db.old_table
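The output of SHOW CREATE TABLE is a complete CREATE TABLE statement, roughly like the sketch below (the columns, partition column, format, and path shown here are placeholders; in practice you keep the real output and edit only the table name and LOCATION):
-- edited output of SHOW CREATE TABLE: new table name, new path, same layout
CREATE TABLE db.new_table (
  id INT,
  name STRING)
PARTITIONED BY (dt STRING)
STORED AS PARQUET
LOCATION 'hdfs:///path/to/new_table';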
The following snippet will create a new table while preserving the definition of the "old" table.
CREATE TABLE db.new_table LIKE db.old_table;
For more info, check the docs for CREATE TABLE.
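CREATE TABLE ... LIKE copies only the definition; to copy the rows as well, a minimal sketch (database and table names are placeholders; for a partitioned table you may need to enable dynamic partitioning first):
-- copy the definition, then the data
CREATE TABLE db.new_table LIKE db.old_table;
INSERT INTO db.new_table SELECT * FROM db.old_table;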

Create external table with select from other table

I am using HDInsight and need to delete my clusters when I am finished running queries. However, I need the data I gather to survive for another day. I am working on queries that would create calculated columns from table1 and insert them into table2. First I wanted a simple test to copy the rows. Can you create an external table from a select statement?
drop table if exists table2;
create external table table2 as
select *
from table1
STORED AS TEXTFILE LOCATION 'wasb://{container name}@{storage name}.blob.core.windows.net/';
Yes, but you have to separate it into two commands: first create the external table, then fill it.
create external table table2(attribute STRING)
STORED AS TEXTFILE
LOCATION 'table2';
INSERT OVERWRITE TABLE table2 Select * from table1;
The schema of table2 has to match the SELECT query; in this example it consists of only one string attribute.
I know this is a stale question, but here is the solution.
CREATE EXTERNAL TABLE table2
STORED AS textfile
LOCATION 'wasb://....'
AS SELECT * FROM table1
Since CREATE EXTERNAL TABLE with an AS SELECT clause is not supported in Hive, first we need to create the external table with a complete DDL command and then load the data into it. Please go through the documentation for the different supported data formats.
create external table table_ext(col1 typ1,...)
STORED AS ORC
LOCATION 'table2'; -- optional; if not provided, the default location is used
INSERT OVERWRITE TABLE table_ext Select * from table1;
Make sure table_ext has the same DDL as table1.
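One way to get a matching DDL (an assumption on my part, not part of the original answer) is to start from the existing table's definition and edit it by hand:
-- print table1's full DDL; add EXTERNAL, rename to table_ext, adjust LOCATION
SHOW CREATE TABLE table1;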