CREATE EXTERNAL TABLE with clause in Hive - hive

I would like to know if it is possible to create an external table in Hive according to a condition (I mean a WHERE) ?

You cannot create an external table with Create Table As Select (CTAS) in Hive. But you can create the external table first and insert data into the table from any other table with your filter criteria. Below is an example of creating a partitioned external table stored as ORC and inserting records into that table.
CREATE EXTERNAL TABLE `table_name`(
`column_1` bigint,
`column_2` string)
PARTITIONED BY (
`partition_column_1` string,
`partition_column_2` string)
ROW FORMAT SERDE
'org.apache.hadoop.hive.ql.io.orc.OrcSerde'
STORED AS INPUTFORMAT
'org.apache.hadoop.hive.ql.io.orc.OrcInputFormat'
OUTPUTFORMAT
'org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat'
LOCATION
'${dataWarehouseDir}/table_name'
TBLPROPERTIES (
'orc.compress'='ZLIB');
set hive.exec.dynamic.partition.mode=nonstrict;
INSERT OVERWRITE TABLE table_name PARTITION(partition_column_1, partition_column_2)
SELECT column_1, column_2, partition_column_1, partition_column_2 FROM Source_Table WHERE column = "your filter criteria here";

CREATE TABLE myTable AS
SELECT a,b,c FROM selectTable;

Hive cannot create an external table with CTAS.
CTAS has these restrictions:
1.The target table cannot be a partitioned table.
2.The target table cannot be an external table.
3.The target table cannot be a list bucketing table.
Reference:
https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DDL#LanguageManualDDL-CreateTableAsSelect(CTAS)
Alternately,
You can create an external table and insert into the table with select query.

You can create and external table by separating out the select statement with where clause.First create the external table and then use insert overwrite to external table using select with a where clause.
CREATE EXTERNAL TABLE table_name
STORED AS TEXTFILE
LOCATION '/user/path/table_name';
INSERT OVERWRITE TABLE table_name
SELECT * FROM Source_Table WHERE column="something";

Related

Drop and overwrite external table in hive

I need to create an external table in hiveql with the output from a SELECT clause. Every time when the HiveQL is ran the table should be dropped and recreated . When we drop an external table only the table structure is getting dropped but not the data files from HDFS location. How to achieve this?
Create Table As Select (CTAS) has restrictions. One of them is that target table cannot be External.
You have these options:
Create external table once, then INSERT OVERWRITE
INSERT OVERWRITE TABLE tablename1 [PARTITION (partcol1=val1, partcol2=val2 ...) select_statement1 FROM from_statement;
Use managed table, then you can DROP TABLE, then CREATE TABLE ... as SELECT
See also answer about skipTrash and auto.purge property.

Copied data column is not partitioned in target table in hive

I have created a table in hive from existing partitioned table using the command
create table new_table As select * from old_table;
Record counts are matching in both the table but when I give DESC table I could see the column is not partitioned in New table.
You should explicitly specify partition columns when creating the table.
create table new_table partitioned by (col1 datatype,col2 datatype,...) as
select * from old_table;

Read multiple files in Hive table by date range

Let's imagine I store one file per day in a format:
/path/to/files/2016/07/31.csv
/path/to/files/2016/08/01.csv
/path/to/files/2016/08/02.csv
How can I read the files in a single Hive table for a given date range (for example from 2016-06-04 to 2016-08-03)?
Assuming every files follow the same schema, I would then suggest that you store the files with the following naming convention :
/path/to/files/dt=2016-07-31/data.csv
/path/to/files/dt=2016-08-01/data.csv
/path/to/files/dt=2016-08-02/data.csv
You could then create an external table partitioned by dt and pointing to the location /path/to/files/
CREATE EXTERNAL TABLE yourtable(id int, value int)
PARTITIONED BY (dt string)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
LOCATION '/path/to/files/'
If you have several partitions and don't want to write alter table yourtable add partition ... queries for each one, you can simply use the repair command that will automatically add partitions.
msck repair table yourtable
You can then simply select data within a date range by specifying the partition range
SELECT * FROM yourtable WHERE dt BETWEEN '2016-06-04' and '2016-08-03'
Without moving your file:
Design your table schema. In hive shell, create the table (partitioned by date)
Loading files into tables
Query with HiveQL ( select * from table where dt between '2016-06-04 ' and '2016-08-03')
Moving your file:
Design your table schema. In hive shell, create the table (partitioned by date)
move /path/to/files/2016/07/31.csv under /dbname.db/tableName/dt=2016-07-31, then you'll have
/dbname.db/tableName/dt=2016-07-31/file1.csv
/dbname.db/tableName/dt=2016-08-01/file1.csv
/dbname.db/tableName/dt=2016-08-02/file1.csv
load partition with
alter table tableName add partition (dt=2016-07-31);
See Add partitions
In Spark-shell, read hive table
/path/to/data/user_info/dt=2016-07-31/0000-0
1.create sql
val sql = "CREATE EXTERNAL TABLE `user_info`( `userid` string, `name` string) PARTITIONED BY ( `dt` string) ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe' STORED AS INPUTFORMAT 'org.apache.hadoop.mapred.TextInputFormat' OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat' LOCATION 'hdfs://.../data/user_info'"
2. run it
spark.sql(sql)
3.load data
val rlt= spark.sql("alter table user_info add partition (dt=2016-09-21)")
4.now you can select data from table
val df = spark.sql("select * from user_info")

Create external table with select from other table

I am using HDInsight and need to delete my clusters when I am finished running queries. However, I need the data I gather to survive for another day. I am working on queries that would create calculated columns from table1 and insert them into table2. First I wanted a simple test to copy the rows. Can you create an external table from a select statement?
drop table if exists table2;
create external table table2 as
select *
from table1
STORED AS TEXTFILE LOCATION 'wasb://{container name}#{storage name}.blob.core.windows.net/';
yes but you have to seperate it into two commands. First create the external table then fill it.
create external table table2(attribute STRING)
STORED AS TEXTFILE
LOCATION 'table2';
INSERT OVERWRITE TABLE table2 Select * from table1;
The schema of table2 has to be the same as the select query, in this example it consists only of one string attribute.
I know this is too stale question but here is the solution.
CREATE EXTERNAL TABLE table2
STORED AS textfile
LOCATION wasb://....
AS SELECT * FROM table1
Since create external table with "as select" clause is not supported in Hive, first we need to create external table with complete DDL command and then load the data into the table. Please go through this for different data format supports.
create external table table_ext(col1 typ1,...)
STORED AS ORC
LOCATION 'table2'; // optional if not provided then default location is used
INSERT OVERWRITE TABLE table_ext Select * from table1;
make sure table_ext has same DDL as table1.

Load Hive partition from Hive view

I have a External Hive table with 4 partitions. I also have 4 hive views based on a different Hive table.
Every week I want the hive view to overwrite the partitions in the External Hive table.
I know I can create an unpartitioned hive table from a view like show below
CREATE TABLE hive_table AS SELECT * FROM hive_view;
But is there a way to overwrite partitions from view data?
Yes, there is a way:
INSERT OVERWRITE TABLE <table_name>
PARTITION(<partition_clause>)
SELECT <select_clause>
It is required to set hive.exec.dynamic.partition to true before such operations. See details here: Hive Language Manual DML - Dynamic Partitions
set hive.exec.dynamic.partition=true;
set hive.exec.dynamic.partition.mode=nonstrict;
--partition table
create external table pracitse_part (
id int,
first_name string,
last_name string,
email string,
ip_address string
)
partitioned by (gender string)
row format delimited
fields terminated by ',';
--create veiw table
create view practise_view as
select p.*
from practise p join practise_temp pt
on p.id=pt.id
where p.id < 11;
--load data into partition table from view table
insert overwrite table pracitse_part partition(gender)
select id,first_name,last_name,email,ip_address,gender from practise_view;