I have created a table in Hive from an existing partitioned table using the command
create table new_table As select * from old_table;
Record counts match in both tables, but when I run DESC on the new table I can see that the column is no longer a partition column.
You should explicitly specify the partition columns when creating the new table; a plain CTAS does not carry the partitioning over. In Hive versions that support partitioned CTAS, the partition columns are listed by name only (their types are taken from the SELECT) and must be the last columns selected:
create table new_table partitioned by (col1, col2, ...) as
select * from old_table;
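For example, a minimal sketch assuming a hypothetical old_table with data columns id and name and a partition column load_dt (hypothetical names, for illustration only):
-- dynamic partition settings may be needed, depending on your Hive version and configuration
set hive.exec.dynamic.partition=true;
set hive.exec.dynamic.partition.mode=nonstrict;
create table new_table partitioned by (load_dt) as
select id, name, load_dt from old_table;
-- confirm the partitioning carried over
show partitions new_table;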
Related
I am creating a table in Hive and inserting data based on the partition column. The data loads, but when I query the table and try to find the distinct values of the partition column, I am getting nulls.
Table creation syntax: CREATE TABLE IF NOT EXISTS Table1 (column1 INT, column2 STRING, column3 STRING) PARTITIONED BY (ACC_D DATE) STORED AS ORC;
Inserting data into the Table:
INSERT INTO TABLE Table1 PARTITION (ACC_D)
SELECT DISTINCT
*
FROM
Schema.Sales;
I am setting these properties after creating the table and before inserting into it:
SET hive.exec.dynamic.partition=TRUE;
SET hive.exec.dynamic.partition.mode=nonstrict;
I am using these properties based on the answer from this post: create table in hive.
Data is loaded on a daily basis, and I need to create a partition on the date column. Sample values of the Date column:
Date
3/15/2021 8:02:32 AM
12/21/2020 12:20:41 PM
You need to convert the table into a partitioned table, then change the loading SQL so that it inserts into the table properly.
Create a new table identical to the original table, making sure to exclude the partition column from the list of columns and instead add it in the PARTITIONED BY clause, like below.
create table new_table ( ... ) partitioned by (partition_dt string); -- list all the original columns in place of ( ... )
Load data into new_table from the original table. Make sure the last column in your SELECT clause is the partition column.
set hive.exec.dynamic.partition.mode=nonstrict;
insert into new_table partition(partition_dt)
select src.*, from_unixtime(unix_timestamp(dttm_column),'MM/dd/yyyy') as partition_dt from original_table src;
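If dttm_column is stored as a string in the sample format shown above (e.g. 3/15/2021 8:02:32 AM), unix_timestamp may need an explicit pattern, since without one it expects yyyy-MM-dd HH:mm:ss; a hedged variant of the same insert:
set hive.exec.dynamic.partition.mode=nonstrict;
insert into new_table partition(partition_dt)
select src.*, from_unixtime(unix_timestamp(dttm_column, 'M/d/yyyy h:mm:ss a'), 'MM/dd/yyyy') as partition_dt -- the input pattern is an assumption based on the sample values
from original_table src;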
Drop the original table and rename new_table to the original table's name.
drop table original_table;
alter table new_table rename to original_table;
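To confirm the conversion worked (assuming the rename above), a quick check is:
-- partitions should now show one entry per distinct date
show partitions original_table;
select distinct partition_dt from original_table;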
I have a Hive table foo. There are several fields in this table; one of them is some_id. The number of unique values in this field is in the range of 5,000-10,000. For each value (10385 in the example) I need to perform CTAS queries like
CREATE TABLE bar_10385 AS
SELECT * FROM foo WHERE some_id=10385 AND other_id=10385;
What is the best way to perform this bunch of queries?
You can store all these tables in a single partitioned one. This approach allows you to load all the data in a single query, and query performance will not be compromised.
Create table T (
... --columns here
)
partitioned by (id int); --new calculated partition key
Load the data using one query; it will read the source table only once:
insert overwrite table T partition(id)
select ..., --columns
case when some_id=10385 AND other_id=10385 then 10385
when some_id=10386 AND other_id=10386 then 10386
...
--and so on
else 0 --default partition for records not attributed
end as id --partition column
from foo
where some_id in (10385,10386) AND other_id in (10385,10386) --filter
Then you can use this table in queries specifying partition:
select * from T where id = 10385; --you can create a view named bar_10385; it will act the same as your table, and partition pruning works fast
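For instance, a minimal sketch of such a view, assuming the partitioned table T above:
-- the view filters on the partition column, so partition pruning still applies
create view bar_10385 as
select * from T where id = 10385;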
I would like to know whether it is possible to create an external table in Hive based on a condition (I mean a WHERE clause)?
You cannot create an external table with Create Table As Select (CTAS) in Hive. But you can create the external table first and insert data into the table from any other table with your filter criteria. Below is an example of creating a partitioned external table stored as ORC and inserting records into that table.
CREATE EXTERNAL TABLE `table_name`(
`column_1` bigint,
`column_2` string)
PARTITIONED BY (
`partition_column_1` string,
`partition_column_2` string)
ROW FORMAT SERDE
'org.apache.hadoop.hive.ql.io.orc.OrcSerde'
STORED AS INPUTFORMAT
'org.apache.hadoop.hive.ql.io.orc.OrcInputFormat'
OUTPUTFORMAT
'org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat'
LOCATION
'${dataWarehouseDir}/table_name'
TBLPROPERTIES (
'orc.compress'='ZLIB');
set hive.exec.dynamic.partition.mode=nonstrict;
INSERT OVERWRITE TABLE table_name PARTITION(partition_column_1, partition_column_2)
SELECT column_1, column_2, partition_column_1, partition_column_2 FROM Source_Table WHERE column = "your filter criteria here";
CREATE TABLE myTable AS
SELECT a,b,c FROM selectTable;
Hive cannot create an external table with CTAS.
CTAS has these restrictions:
1. The target table cannot be a partitioned table.
2. The target table cannot be an external table.
3. The target table cannot be a list bucketing table.
Reference:
https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DDL#LanguageManualDDL-CreateTableAsSelect(CTAS)
Alternatively, you can create the external table first and then insert into it with a SELECT query.
You can create an external table by separating out the SELECT statement with the WHERE clause: first create the external table, then use INSERT OVERWRITE into that external table using a SELECT with a WHERE clause.
CREATE EXTERNAL TABLE table_name
STORED AS TEXTFILE
LOCATION '/user/path/table_name';
INSERT OVERWRITE TABLE table_name
SELECT * FROM Source_Table WHERE column="something";
I want to create a table using CTAS from a partitioned table.
The new table must have all the data, partitions, and subpartitions of the old table.
How can I do this?
You need to first create the new table with all the partitions; there is no way to add partition definitions to a CTAS. Once the table is created, you can populate it using insert into ... select.
You can use dbms_metadata.get_ddl to get the definition of the old table.
select dbms_metadata.get_ddl('TABLE', 'NAME_OF_EXISTING_TABLE')
from dual;
Save the output of that into a script, do a search and replace to adjust the table name, then run the create table, and finally run the insert into ... select ...
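As a rough sketch of those last two steps, assuming the adjusted DDL has already created a table named new_table (both table names here are placeholders):
-- populate the new partitioned table from the existing one
insert into new_table
select * from name_of_existing_table;
commit;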