Is it possible to add bucketing to an already partitioned table in Hive?

Is it possible to add bucketing to a table that is already partitioned?
I have a Hive table with more than 100M records and I want to bucket it.

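No, not in place: changing the bucketing spec with ALTER TABLE only updates the metadata and does not reorganize the rows already stored. You will have to create another table with bucketing enabled and copy the data into it:
set hive.enforce.bucketing = true;
FROM old_table INSERT INTO TABLE new_bucketed_partitioned_table SELECT *;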
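A minimal sketch of the full copy, assuming a hypothetical schema (columns id and name, partition column dt, 32 buckets on id, ORC storage); none of these names come from the question. On Hive 2.0 and later the hive.enforce.bucketing setting should no longer be needed, since bucketing is always enforced there.

```sql
-- Hypothetical schema: adjust columns, partition column and bucket count.
CREATE TABLE new_bucketed_partitioned_table (
  id   INT,
  name STRING
)
PARTITIONED BY (dt STRING)
CLUSTERED BY (id) INTO 32 BUCKETS
STORED AS ORC;

-- Only relevant before Hive 2.0.
SET hive.enforce.bucketing = true;
-- Let Hive derive the target partition from the last selected column.
SET hive.exec.dynamic.partition = true;
SET hive.exec.dynamic.partition.mode = nonstrict;

-- The partition column must come last in the SELECT list.
INSERT INTO TABLE new_bucketed_partitioned_table PARTITION (dt)
SELECT id, name, dt
FROM old_table;
```

Once the row counts match, the old table can be dropped and the new one renamed with ALTER TABLE ... RENAME TO.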

Related

Drop and overwrite external table in hive

I need to create an external table in HiveQL from the output of a SELECT statement. Every time the HiveQL is run, the table should be dropped and recreated. When we drop an external table, only the table definition is dropped, not the data files in the HDFS location. How can I achieve this?
Create Table As Select (CTAS) has restrictions. One of them is that the target table cannot be external.
You have these options:
1. Create the external table once, then use INSERT OVERWRITE:
INSERT OVERWRITE TABLE tablename1 [PARTITION (partcol1=val1, partcol2=val2 ...)] select_statement1 FROM from_statement;
2. Use a managed table; then you can DROP TABLE and CREATE TABLE ... AS SELECT.
See also the answer about skipTrash and the auto.purge property.
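Both options can be sketched as follows. The table names, columns and HDFS path (report_ext, report_managed, source_table, id, amount, /data/report_ext) are hypothetical and only illustrate the shape of the statements.

```sql
-- Option 1: hypothetical external table, created once.
CREATE EXTERNAL TABLE IF NOT EXISTS report_ext (
  id    INT,
  total DOUBLE
)
STORED AS ORC
LOCATION '/data/report_ext';

-- Each run replaces the files under LOCATION with the new result.
INSERT OVERWRITE TABLE report_ext
SELECT id, SUM(amount) AS total
FROM source_table
GROUP BY id;

-- Option 2: managed table, dropped together with its data files.
DROP TABLE IF EXISTS report_managed;
CREATE TABLE report_managed STORED AS ORC AS
SELECT id, SUM(amount) AS total
FROM source_table
GROUP BY id;
```

Option 1 keeps the table definition stable between runs, which is usually what downstream readers of an external location expect.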

How to registerTempTable in SparkSQL

Spark version:2.2.0.cloudera2
Usually, we register a temp table in this way:
dataframe.registerTempTable($table_name)
But if I create a table in a SQL statement, like this:
CREATE TABLE test_table AS SELECT * FROM table1
Spark creates a permanent table. Is there a way to create a temp table in a Spark SQL statement?
You need to add the TEMPORARY keyword to the SQL statement, which keeps that table out of the Hive metastore. In Spark 2.x this has to be a temporary view, since as far as I know CREATE TEMPORARY TABLE ... AS SELECT is rejected by the parser:
CREATE TEMPORARY VIEW test_table AS SELECT * FROM table1
Refer: https://docs.databricks.com/spark/latest/spark-sql/language-manual/create-table.html

Copied data column is not partitioned in target table in hive

I created a table in Hive from an existing partitioned table using the command
create table new_table as select * from old_table;
The record counts match in both tables, but when I run DESC on the new table, the column is not shown as a partition column.
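CTAS does not carry over the partitioning, so you should explicitly specify the partition columns when creating the table. Note that older Hive releases do not accept PARTITIONED BY in a CTAS statement at all; there you have to create the partitioned table first and fill it with INSERT ... PARTITION.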
create table new_table partitioned by (col1 datatype,col2 datatype,...) as
select * from old_table;
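On Hive versions where CTAS cannot create a partitioned table, a two-step variant should work instead. This is a sketch under assumptions: val stands in for the regular columns and col1 for the partition column; neither name comes from the question.

```sql
-- Hypothetical schema: val is a regular column, col1 the partition column.
CREATE TABLE new_table (
  val STRING
)
PARTITIONED BY (col1 STRING);

SET hive.exec.dynamic.partition = true;
SET hive.exec.dynamic.partition.mode = nonstrict;

-- SELECT * on a partitioned table returns the partition columns last,
-- but listing the columns makes the required order explicit.
INSERT OVERWRITE TABLE new_table PARTITION (col1)
SELECT val, col1
FROM old_table;

-- Partition columns are listed under "# Partition Information".
DESCRIBE new_table;
```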

Add partitions on existing hive table

I'm processing a big Hive table (more than 500 billion records).
The processing is too slow and I would like to make it faster.
I think that by adding partitions, the process could be more efficient.
Can anybody tell me how I can do that?
Note that my table already exists.
My table :
create table T(
nom string,
prenom string,
...
date string)
I want to partition on the date field.
Thx
SET hive.exec.dynamic.partition = true;
SET hive.exec.dynamic.partition.mode = nonstrict;
-- table_name must already be declared with PARTITIONED BY (`date` string); T is the original, unpartitioned table
INSERT OVERWRITE TABLE table_name PARTITION (`date`) SELECT nom, prenom, ..., `date` FROM T;
Note:
In an INSERT into a partitioned table, the partition columns must come last in the SELECT list, in the same order as in the PARTITION clause.
You have to restructure the table. Here are the steps:
1. Make sure no other process is writing to the table.
2. Create a new external table with partitioning.
3. Insert into the new table by selecting from the old table.
4. Drop the new (external) table; only the table definition is dropped, the data stays in place.
5. Drop the old table.
6. Create the table with the original name, pointing to the location from step 2.
7. Run the repair command (MSCK REPAIR TABLE) to register the partitions in the metastore.
Alternative to steps 4, 5, 6 and 7:
4. Create the table with the original name by running SHOW CREATE TABLE on the new table and replacing the name with the original table name.
5. Run LOAD DATA INPATH to move the files under the partitions of the new table into the partitions of the recreated table.
6. Drop the external table.
Both approaches achieve the restructuring with a single insert/MapReduce job.
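The first variant (steps 2 to 7) could look roughly like this. It is a sketch under assumptions: T_part and the path /data/T_part are hypothetical, the columns elided with "..." in the question are left out, and the recreated table is declared external, which the steps above do not specify.

```sql
-- Step 2: external, partitioned copy of T (hypothetical name and location).
-- `date` is a reserved word in recent Hive versions, hence the backticks.
CREATE EXTERNAL TABLE T_part (
  nom    STRING,
  prenom STRING
)
PARTITIONED BY (`date` STRING)
LOCATION '/data/T_part';

-- Step 3: a single dynamic-partition insert rewrites the data.
SET hive.exec.dynamic.partition = true;
SET hive.exec.dynamic.partition.mode = nonstrict;
INSERT OVERWRITE TABLE T_part PARTITION (`date`)
SELECT nom, prenom, `date`
FROM T;

-- Steps 4 and 5: dropping the external table leaves its files in place.
DROP TABLE T_part;
DROP TABLE T;

-- Step 6: recreate T on top of the partitioned files.
CREATE EXTERNAL TABLE T (
  nom    STRING,
  prenom STRING
)
PARTITIONED BY (`date` STRING)
LOCATION '/data/T_part';

-- Step 7: register the existing partition directories in the metastore.
MSCK REPAIR TABLE T;
```

With many distinct dates, the limits hive.exec.max.dynamic.partitions and hive.exec.max.dynamic.partitions.pernode may have to be raised before the insert in step 3.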

Load Hive partition from Hive view

I have an external Hive table with 4 partitions. I also have 4 Hive views based on a different Hive table.
Every week I want the Hive views to overwrite the partitions in the external Hive table.
I know I can create an unpartitioned Hive table from a view as shown below
CREATE TABLE hive_table AS SELECT * FROM hive_view;
But is there a way to overwrite partitions from view data?
Yes, there is a way:
INSERT OVERWRITE TABLE <table_name>
PARTITION(<partition_clause>)
SELECT <select_clause>
If the partition values are determined dynamically, hive.exec.dynamic.partition must be set to true before such operations. See details here: Hive Language Manual DML - Dynamic Partitions
set hive.exec.dynamic.partition=true;
set hive.exec.dynamic.partition.mode=nonstrict;
-- partitioned table
create external table practise_part (
  id int,
  first_name string,
  last_name string,
  email string,
  ip_address string
)
partitioned by (gender string)
row format delimited
fields terminated by ',';
-- create the view
create view practise_view as
select p.*
from practise p join practise_temp pt
on p.id = pt.id
where p.id < 11;
-- load data into the partitioned table from the view
insert overwrite table practise_part partition (gender)
select id, first_name, last_name, email, ip_address, gender from practise_view;
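For the setup in the question (one view per partition), a static partition spec may be simpler: it names the target partition explicitly and should not need the dynamic-partition settings. A sketch with hypothetical names (ext_table, partition column region, views view_north and view_south, columns id and amount):

```sql
-- Each statement replaces exactly one partition with the view's current rows.
-- The partition value is given in the PARTITION clause, so it is not selected.
INSERT OVERWRITE TABLE ext_table PARTITION (region = 'north')
SELECT id, amount
FROM view_north;

INSERT OVERWRITE TABLE ext_table PARTITION (region = 'south')
SELECT id, amount
FROM view_south;
```

The weekly job would run one such statement per view, leaving the other partitions untouched.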