hive concatenate partition all - hive

How can I concatenate all partitions of a table in one single command?
For example, to compute statistics we can run analyze table 'tablename' partition ('partition column name') compute statistics;
Similarly, to drop partitions we can run alter table 'tablename' drop partition ('partition column name' '>/=/< etc' 'value')
But there seems to be no way to run the concatenate command for all partitions in one go.
For now I am generating the statements like this: hdfs dfs -ls 'hdfs table location' | awk -F '/' '{print "alter table tablename partition (" $NF ") concatenate;"}'
Is there any way to get the same result in one command?
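A minimal sketch of the generation step, assuming the partition names come from SHOW PARTITIONS (one line per partition, in the form col1=val1/col2=val2) rather than from the HDFS listing. The function names, my_table and concat.hql are hypothetical, and quoting every value as a string is an assumption that may need adjusting for your column types.

```shell
#!/usr/bin/env bash
# Turn one SHOW PARTITIONS line (e.g. "year=2020/month=01") into a
# partition spec with quoted values: year='2020',month='01'.
partition_spec() {
  printf '%s\n' "$1" | sed -e "s/=\([^/]*\)/='\1'/g" -e 's|/|,|g'
}

# Read partition names on stdin and print one CONCATENATE statement
# per partition. The table name is passed as the first argument.
concat_statements() {
  local table="$1" part
  while IFS= read -r part; do
    [ -n "$part" ] || continue   # skip empty lines
    printf 'ALTER TABLE %s PARTITION (%s) CONCATENATE;\n' \
      "$table" "$(partition_spec "$part")"
  done
}

# Hypothetical usage (needs a working hive CLI, so it is commented out):
# hive -S -e 'SHOW PARTITIONS my_table' | concat_statements my_table > concat.hql
# hive -f concat.hql
```

This still issues one statement per partition; it only automates producing them, including for multi-level partitions.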

Related

Alter Hive partition column name NOT changed in HDFS

When I alter the partition column name of a partitioned table (named partitioned_table), the corresponding directory in HDFS does not change. However, dropping and moving partitions is reflected in HDFS. The column name does show as changed in the output of "show partitions partitioned_table".
Hive version is 4.0.0-alpha-2.
I use the statement below to alter the partition column name.
ALTER TABLE table_name PARTITION
(partition_column = partition_col_value,
partition_column = partition_col_value)
RENAME TO PARTITION (partition_column = partition_col_value,
partition_column = partition_col_value);
Why does this happen, and how can I change the corresponding directory in HDFS when altering the partition column name in Hive?
When you alter a partition, it only affects the Hive Metastore and will never affect data in HDFS. To change the data layout, you need to explicitly insert data into the Hive table at that partition, or issue an hdfs mv command and then run an MSCK REPAIR query in Hive to fix the metadata.

sqoop : Pull data to hive table with extra columns

I need to pull records from a MySQL table with n columns and store them in hive with extra columns. Is there any way in sqoop to perform it?
Example:
The MySQL table has the fields id, name, place.
The Hive table structure is id, name, place and contact number (null).
So when running sqoop, I want to add the extra column contact number in Hive as null.
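You can do this with the --query option in sqoop, selecting the extra column with NULL AS. Note that a free-form query must contain the $CONDITIONS token in its WHERE clause and needs a --target-dir; with a single mapper (-m 1) no --split-by column is required.
sqoop import \
--connect jdbc:mysql://mysql.example.com/sqoop \
--query 'SELECT id, name, place, NULL AS contact_number FROM mysql_table WHERE $CONDITIONS' \
--target-dir <target directory> \
-m 1 \
<any other options>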

dropping hive partition dynamically

I have a Hive table partitioned by day, with partitions like the ones below (including partitions for future dates):
20160901
20160902
........
........
........
20160931
20161001
20161002
I want to pass one date, for example yesterday's date 20160922, and dynamically drop all partitions that are >= 20160922 (today is 20160923, but I want to drop starting from 20160922).
How can I drop all these partitions dynamically?
You cannot do this in Hive directly, as it does not support dynamic SQL.
A workaround is to use a shell script (or any other script) to create a file containing drop partition statements like the ones below.
alter table partition_t drop if exists partition (y=20160922 );
alter table partition_t drop if exists partition (y=20160921 );
alter table partition_t drop if exists partition (y=20160920 );
...
Then run hive -v -f ./file.sh
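The generation step of this workaround could be sketched as follows. This is only an illustration: the function name and drop_partitions.hql are hypothetical, and it assumes the partition names are fed in one per line in the form y=yyyymmdd, for example from SHOW PARTITIONS.

```shell
#!/usr/bin/env bash
# Read partition names such as "y=20160922" on stdin and print a DROP
# statement for every partition whose value is >= the cutoff date.
# Arguments: table name, cutoff date (yyyymmdd).
drop_statements() {
  local table="$1" cutoff="$2" part value
  while IFS= read -r part; do
    value="${part#*=}"   # strip the "y=" prefix, leaving the date
    # yyyymmdd values order correctly when compared as integers;
    # lines with a non-numeric value are silently skipped
    if [ "$value" -ge "$cutoff" ] 2>/dev/null; then
      printf 'ALTER TABLE %s DROP IF EXISTS PARTITION (%s);\n' "$table" "$part"
    fi
  done
}

# Hypothetical usage (needs a working hive CLI, so it is commented out):
# hive -S -e 'SHOW PARTITIONS partition_t' | drop_statements partition_t 20160922 > drop_partitions.hql
# hive -v -f drop_partitions.hql
```

Because the cutoff is an argument, the same script can be called each day with a different date.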
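Before inserting data into the table, perform the steps below.
1) Go to the HDFS folder of that table and delete the partition folders inside the table directory using shell commands: hadoop fs -rm -r <>
2) Run MSCK REPAIR TABLE to update the partition metadata. Plain MSCK REPAIR TABLE only adds partitions that are missing from the metastore; to remove entries whose directories are gone, Hive 3.0 and later offer the DROP PARTITIONS and SYNC PARTITIONS options.
These two steps will delete all the partitions that match the pattern.
Now insert your new data.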
You can drop partitions using a range filter. For reference, see this answer: https://stackoverflow.com/a/48422251/3132181
So your code could look like this:
ALTER TABLE mytable DROP PARTITION (datehour >= '20160922');
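To pass the date in from outside, the range statement can be built in a small shell function. This is a sketch under assumptions: the function name is hypothetical, and the cutoff is assumed to be an 8-digit yyyymmdd value, which is checked before it is placed into the statement.

```shell
#!/usr/bin/env bash
# Print one range-based DROP statement.
# Arguments: table name, partition column, cutoff value (yyyymmdd).
range_drop_statement() {
  # Reject anything that is not exactly eight digits, so that arbitrary
  # text cannot end up inside the generated statement.
  case "$3" in
    [0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9]) ;;
    *) echo "cutoff must be yyyymmdd" >&2; return 1 ;;
  esac
  printf "ALTER TABLE %s DROP IF EXISTS PARTITION (%s >= '%s');\n" "$1" "$2" "$3"
}

# Hypothetical usage (needs a working hive CLI, so it is commented out):
# hive -e "$(range_drop_statement mytable datehour 20160922)"
```

Compared with generating one statement per partition, this sends a single statement and lets Hive resolve which partitions match.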

Export the data from a hive/impala table with few conditions into file

What is an efficient way to export data from a Hive/Impala table, with conditions, into a file (the data would be huge, close to 10 GB)? The Hive table is stored as Parquet with Snappy compression, and the output file is CSV.
The table is partitioned daily and the data needs to be extracted on a daily basis. I would like to know whether one of the following would be efficient:
1) Impala approach
impala-shell -k -i servername:portname -B -q 'select * from table where year_month_date=$$$$$$$$' -o filename '--output_delimiter=\001'
2) Hive approach
Insert overwrite directory '/path' select * from table where year_month_date=$$$$$$$$
Assume tbl is your Hive Parquet table and condition is your filter condition.
CTAS command:
CREATE TABLE tbl_text ROW FORMAT DELIMITED FIELDS TERMINATED BY ',' LOCATION '/tmp/data' AS select * from tbl where condition;
You will find your CSV text files (delimited by ',') at /tmp/data in HDFS.
If needed, you can copy them to your local file system using:
hadoop fs -get /tmp/data
Try using dynamic partitioning for your Hive/Impala table to export the data conditionally and efficiently.
For best results, partition your table by the columns you filter on in your queries.
Step 1: Create a temporary Hive table TmpTable and load your raw data into it
Step 2: Set the Hive parameters that enable dynamic partitioning
SET hive.exec.dynamic.partition.mode=nonstrict;
SET hive.exec.dynamic.partition=true;
Step 3: Create your main Hive table with partition columns, for example:
CREATE TABLE employee (
emp_id int,
emp_name string
)
PARTITIONED BY (location string)
STORED AS PARQUET;
Step 4: Load data from the temporary table into your employee table (main table)
insert overwrite table employee partition(location)
select emp_id,emp_name, location from TmpTable;
Step 5: Export the data from Hive with a condition
INSERT OVERWRITE DIRECTORY '/path/to/output/dir' SELECT * FROM employee WHERE location='CALIFORNIA';
Hope this is useful.

S3 folder structure for hive partitioned tables without "=" in it

I have an existing S3 folder structure like this,
s3://mydata/{country}/{date}/
{country} could be any of 30 different countries
{date} could be any date since 20150101
How can I read this in Hive, treating {country} as a partition and {date} as a sub-partition?
You can use the Hive DDL statement ALTER TABLE ADD PARTITION:
ALTER TABLE mydata
ADD PARTITION (country='south-africa', date='20191024')
LOCATION 's3://mydata/south-africa/20191024/';
You can automate this with a shell script that passes each statement to Hive, like hive -e "ALTER TABLE $TABLE ADD PARTITION $PARTITION_SPEC LOCATION '$PARTITION_LOCATION'" (use double quotes around the statement so that the shell expands the variables).
https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DDL#LanguageManualDDL-AddPartitions
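As an illustration of such a script, the sketch below derives the partition values from the S3 prefix itself. It assumes the layout s3://bucket/{country}/{date}/ from the question. The function name, prefixes.txt and add_partitions.hql are hypothetical, and how you list the prefixes is left open.

```shell
#!/usr/bin/env bash
# Turn a prefix like "s3://mydata/south-africa/20191024/" into an
# ADD PARTITION statement. Arguments: table name, S3 prefix.
add_partition_statement() {
  local table="$1" location="${2%/}/" rest country day
  rest="${location#s3://*/}"   # drop "s3://bucket/" -> "south-africa/20191024/"
  country="${rest%%/*}"        # first path component
  rest="${rest#*/}"            # -> "20191024/"
  day="${rest%%/*}"            # second path component
  printf "ALTER TABLE %s ADD IF NOT EXISTS PARTITION (country='%s', date='%s') LOCATION '%s';\n" \
    "$table" "$country" "$day" "$location"
}

# Hypothetical usage, with prefixes.txt holding one prefix per line
# (needs a working hive CLI, so it is commented out):
# while IFS= read -r prefix; do
#   add_partition_statement mydata "$prefix"
# done < prefixes.txt > add_partitions.hql
# hive -f add_partitions.hql
```

Writing all statements to one file and running it with hive -f avoids starting a separate Hive session per partition.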