How to rename partition value in Hive? - hive

I have a hive table 'videotracking_playevent' which uses the following partition format (all strings): source/createyear/createmonth/createday.
Example: source=home/createyear=2016/createmonth=9/createday=1
I'm trying to update the partition values of createmonth and createday to consistently use double digits instead.
Example: source=home/createyear=2016/createmonth=09/createday=01
I've tried the following query:
ALTER TABLE videotracking_playevent PARTITION (
source='home',
createyear='2015',
createmonth='11',
createday='1'
) RENAME TO PARTITION (
source='home',
createyear='2015',
createmonth='11',
createday='01'
);
However that returns the following, non-descriptive error from hive: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask. null
I've confirmed that this partition exists, and I think I'm using the correct syntax. My Hive version is 1.1.0.
Any ideas what I might be doing wrong?
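For a table with many partitions, the rename statements can be generated rather than typed by hand. A minimal sketch, assuming the input is the `k=v/k=v` lines printed by `show partitions videotracking_playevent`; the helper name is hypothetical:

```python
# Sketch: generate the ALTER TABLE ... RENAME TO PARTITION statements needed
# to zero-pad createmonth/createday. Assumes each input line comes from
# `show partitions videotracking_playevent` (table name from the question).

def pad_partition_spec(line):
    """Parse 'k1=v1/k2=v2/...' and zero-pad createmonth/createday to 2 digits."""
    pairs = [kv.split("=", 1) for kv in line.strip().split("/")]
    old = ", ".join(f"{k}='{v}'" for k, v in pairs)
    new = ", ".join(
        f"{k}='{v.zfill(2)}'" if k in ("createmonth", "createday") else f"{k}='{v}'"
        for k, v in pairs
    )
    if new == old:
        return None  # already padded, nothing to rename
    return (f"ALTER TABLE videotracking_playevent PARTITION ({old}) "
            f"RENAME TO PARTITION ({new});")

print(pad_partition_spec("source=home/createyear=2016/createmonth=9/createday=1"))
```

Each emitted statement can then be run in the hive CLI (subject to the bug and workaround discussed in the answers).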

There was an issue in older versions of Hive with renaming partitions, and this may be the problem in your case too; see the HIVE-10362 link below for details.
If you are using an older version of Hive, you need to set the two properties below before executing the rename partition command.
set fs.hdfs.impl.disable.cache=false;
set fs.file.impl.disable.cache=false;
Now run the query with these properties set:
hive> set fs.hdfs.impl.disable.cache=false;
hive> set fs.file.impl.disable.cache=false;
hive> ALTER TABLE partition_test PARTITION (year='2016',day='1') RENAME TO PARTITION (year='2016',day='01');
OK
Time taken: 0.28 seconds
hive> show partitions partition_test;
OK
year=2016/day=01
Time taken: 0.091 seconds, Fetched: 1 row(s)
hive>
This issue is fixed in recent Hive versions. In my case the Hive version is 1.2.1 and the rename works without setting those properties. Please see the example below.
Create a partitioned table.
hive> create table partition_test(
> name string,
> age int)
> partitioned by (year string, day string);
OK
Time taken: 5.35 seconds
hive>
Now add the partition and check the newly added partition.
hive> alter table partition_test ADD PARTITION (year='2016', day='1');
OK
Time taken: 0.137 seconds
hive>
hive> show partitions partition_test;
OK
year=2016/day=1
Time taken: 0.169 seconds, Fetched: 1 row(s)
hive>
Rename the partition using the RENAME TO PARTITION command and check it.
hive> ALTER TABLE partition_test PARTITION (year='2016',day='1') RENAME TO PARTITION (year='2016',day='01');
OK
Time taken: 0.28 seconds
hive> show partitions partition_test;
OK
year=2016/day=01
Time taken: 0.091 seconds, Fetched: 1 row(s)
hive>
Hope this helps.

From the Hive documentation:
"Rename lets you change the value of a partition column. One of its use cases is that you can use this statement to normalize your legacy partition column value to conform to its type. In this case, the type conversion and normalization are not enabled for the column values in the old partition_spec, even with the property hive.typecheck.on.insert set to true (the default), which allows you to specify any legacy data in the form of a string in the old partition_spec."
The bug is tracked here:
https://issues.apache.org/jira/browse/HIVE-10362

You can create a copy of the table without the partition, then update the column in that table, and then recreate the original table with the partition:
create table table_name partitioned by (table_column) as
select
*
from
source_table
That worked for me.

Related

How to specify fields when there are keywords in the field list in Hive?

I am trying to parse some historical SQL on a newer version of Hive (2.3.7) so that tasks can be migrated to it, and I encountered the following keyword problem. I cannot delete the field list after the table name because that may disrupt the insertion order.
How do I deal with such a keyword problem? The field names in the old SQL cannot be changed.
hive> create database db_test;
OK
Time taken: 0.017 seconds
hive> use db_test;
OK
Time taken: 0.007 seconds
hive> create table tb_test_to(
> `name` String,
> `interval` STRING
> );
OK
Time taken: 0.037 seconds
hive> create table tb_test_from(
> `name` String,
> `interval` STRING
> );
OK
Time taken: 0.052 seconds
hive> show tables;
OK
tb_test_from
tb_test_to
Time taken: 0.011 seconds, Fetched: 2 row(s)
hive> insert into tb_test_to (name,`interval`) select name, `interval` from tb_test_from;
FAILED: SemanticException 1:24 '`interval`' in insert schema specification is not found among regular columns of db_test.tb_test_to nor dynamic partition columns.. Error encountered near token '`interval`'
hive>
You can only do
insert into tb_test_to select name, `interval` from tb_test_from;
because you cannot specify a column list in HiveQL INSERT statements, according to the HiveQL manual at https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DML#LanguageManualDML-Synopsis.3:
Values must be provided for every column in the table. The standard SQL syntax that allows the user to insert values into only some columns is not yet supported. To mimic the standard SQL, nulls can be provided for columns the user does not wish to assign a value to.
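The manual's workaround (provide NULLs for the columns you don't want to set) can be scripted. A sketch, assuming you know the target table's full column order; `build_insert` and its arguments are hypothetical names:

```python
# Sketch: build an INSERT that supplies NULL for every target column you are
# not selecting, since HiveQL (per the manual quoted above) cannot take a
# column list on INSERT. The target table's column order is assumed known.

def build_insert(target, target_cols, select_map, source):
    """select_map maps target column -> source expression; others become NULL."""
    exprs = [select_map.get(c, "NULL") for c in target_cols]
    return f"insert into {target} select {', '.join(exprs)} from {source};"

stmt = build_insert(
    "tb_test_to",
    ["`name`", "`interval`"],
    {"`name`": "name", "`interval`": "`interval`"},
    "tb_test_from",
)
print(stmt)
```

For the two-column tables in the question this reproduces the working statement above; with a column omitted from `select_map`, NULL is substituted in its position.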

Deletion of Partitions

I am not able to drop partition in hive table.
ALTER TABLE db.table drop if exists partition(dt="****-**-**/id=**********");
OK
Time taken: 0.564 seconds
But partitions are not getting deleted
Below is what I get when I check the partitions of my table:
hive> show partitions db.table;
OK
dt=****-**-**/id=**********
dt=****-**-**/id=**********
dt=****-**-**/id=**********
dt=****-**-**/id=**********
After running the ALTER TABLE db.table DROP IF EXISTS command, the partition should actually be deleted, but that is not happening.
Can you please advise me on this?
Thanks in advance.
Try this:
ALTER TABLE db.table drop if exists partition(dt='****-**-**', id='**********');
As @leftjoin also mentioned, you have to specify the partition columns comma-separated:
ALTER TABLE page_view DROP if exists PARTITION (dt='****-**-**', id='**********');
Please note -
In Hive 0.7.0 or later, DROP returns an error if the partition doesn't
exist, unless IF EXISTS is specified or the configuration variable
hive.exec.drop.ignorenonexistent is set to true.
Because of this, your query didn't fail and returned an OK response even though no matching partition was found.
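When many partitions need dropping, the comma-separated spec can be generated from `show partitions` output rather than written by hand. A sketch with hypothetical values (the real values in the question are masked):

```python
# Sketch: turn a `show partitions` output line like 'dt=2019-01-01/id=123'
# into the comma-separated spec that ALTER TABLE ... DROP PARTITION expects.
# The table name and values below are placeholders.

def drop_partition_stmt(table, partition_line):
    pairs = [kv.split("=", 1) for kv in partition_line.strip().split("/")]
    spec = ", ".join(f"{k}='{v}'" for k, v in pairs)
    return f"ALTER TABLE {table} DROP IF EXISTS PARTITION ({spec});"

print(drop_partition_stmt("db.table", "dt=2019-01-01/id=123"))
```

Note how the slash-separated display form is split into separate key/value pairs, which is exactly the mistake the failing query made.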

How to create n number of external tables with a single hdfs path using Hive

Is it possible to create n external tables pointing to a single HDFS path using Hive? If yes, what are the advantages and limitations?
It is possible to create many tables (both managed and external at the same time) on top of the same location in HDFS.
Creating tables with exactly the same schema on top of the same data is not useful, but you can create tables with a different number of columns, or with differently parsed columns (using RegexSerDe, for example), so these tables can expose different schemas. You can also set different permissions on these tables in Hive. A table can even be created on top of a sub-folder of another table's folder, in which case it will contain a subset of the data, although partitions in a single table are a better fit for that.
The drawback is that this is confusing: you can rewrite the same data through more than one table, and you may drop the data accidentally, thinking it belongs only to the table you are deleting because you no longer need that table.
Here are a few tests.
Create a table with an INT column:
create table T(id int);
OK
Time taken: 1.033 seconds
Check location and other properties:
hive> describe formatted T;
OK
# col_name data_type comment
id int
# Detailed Table Information
Database: my
Owner: myuser
CreateTime: Fri Jan 04 04:45:03 PST 2019
LastAccessTime: UNKNOWN
Protect Mode: None
Retention: 0
Location: hdfs://myhdp/user/hive/warehouse/my.db/t
Table Type: MANAGED_TABLE
Table Parameters:
transient_lastDdlTime 1546605903
# Storage Information
SerDe Library: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
InputFormat: org.apache.hadoop.mapred.TextInputFormat
OutputFormat: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
Compressed: No
Num Buckets: -1
Bucket Columns: []
Sort Columns: []
Storage Desc Params:
serialization.format 1
Time taken: 0.134 seconds, Fetched: 26 row(s)
Create second table on top of the same location but with STRING column:
hive> create table T2(id string) location 'hdfs://myhdp/user/hive/warehouse/my.db/t';
OK
Time taken: 0.029 seconds
Insert data:
hive> insert into table T values(1);
OK
Time taken: 33.266 seconds
Check data:
hive> select * from T;
OK
1
Time taken: 3.314 seconds, Fetched: 1 row(s)
Insert into second table:
hive> insert into table T2 values( 'A');
OK
Time taken: 23.959 seconds
Check data:
hive> select * from T2;
OK
1
A
Time taken: 0.073 seconds, Fetched: 2 row(s)
Select from first table:
hive> select * from T;
OK
1
NULL
Time taken: 0.079 seconds, Fetched: 2 row(s)
The string was selected as NULL because this table is defined as having an INT column.
Now let's insert a STRING into the first table (the INT column):
insert into table T values( 'A');
OK
Time taken: 84.336 seconds
Surprisingly, it does not fail!
What was inserted?
hive> select * from T2;
OK
1
A
NULL
Time taken: 0.067 seconds, Fetched: 3 row(s)
NULL was inserted, because during the previous insert the string was converted to int, and the conversion resulted in NULL.
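Hive's implicit string-to-int conversion yields NULL instead of raising an error. That behavior can be mimicked by this Python illustration (this is not Hive code, just a sketch of the semantics):

```python
# Sketch of Hive's cast semantics: casting a non-numeric string to int
# yields NULL rather than raising an error, which is why inserting 'A'
# into an INT column silently stores NULL.

def hive_cast_int(value):
    try:
        return int(value)
    except (TypeError, ValueError):
        return None  # Hive's NULL

print(hive_cast_int("1"))   # 1
print(hive_cast_int("A"))   # None, i.e. NULL
```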
Now let's try to drop one table and select from another one:
hive> drop table T;
OK
Time taken: 4.996 seconds
hive> select * from T2;
OK
Time taken: 6.978 seconds
It returned 0 rows because the first table was MANAGED, so dropping it also removed the common location.
The data is gone; do we still need the T2 table with no data in it?
drop table T2;
OK
The second table is removed; as you can see, by then it was metadata only. This table was also managed, so DROP TABLE should remove the location with its data as well, but there was nothing left to remove in HDFS, so only the metadata was removed.

HDINSIGHT hive, MSCK REPAIR TABLE table_name throwing error

I have an external partitioned table named employee with partitions (year, month, day). Every day a new file arrives in that day's location; for example, today's file will land under 2016/10/13.
TABLE SCHEMA:
create External table employee(EMPID Int,FirstName String,.....)
partitioned by (year string,month string,day string)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ',' LINES TERMINATED BY '\n' LOCATION '/.../emp';
So every day we need to run the following command, which works fine:
ALTER TABLE employee ADD IF NOT EXISTS PARTITION (year=2016,month=10,day=14) LOCATION '/.../emp/2016/10/14';
But when we try the command below (because we don't want to execute the ALTER TABLE command above manually every day), it throws this error:
hive> MSCK REPAIR TABLE employee;
FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask
Note:
hive> MSCK TABLE employee; -- this shows that a partition has not been added to the table
OK
Partitions not in metastore: employee:2016/10/14
Time taken: 1.066 seconds, Fetched: 1 row(s)
Please help me, as I am stuck with this. Is there any workaround for this type of situation?
I found a workaround for my problem: if the table's static partition name is like 'year=2016/month=10/day=13', then we can use the commands below and it works:
set hive.msck.path.validation=ignore;
MSCK REPAIR TABLE table_name;
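An alternative to MSCK here is to keep running the daily ALTER TABLE, but generate it from the date instead of typing it by hand. A sketch; the '/.../emp' base path is the placeholder from the question:

```python
# Sketch: generate the daily ADD PARTITION statement from a date, so it can
# be scheduled instead of typed by hand. The '/.../emp' base path is a
# placeholder, exactly as written in the question.
from datetime import date

def add_partition_stmt(d, base="/.../emp"):
    return (f"ALTER TABLE employee ADD IF NOT EXISTS PARTITION "
            f"(year={d.year},month={d.month},day={d.day}) "
            f"LOCATION '{base}/{d.year}/{d.month}/{d.day}';")

print(add_partition_stmt(date(2016, 10, 14)))
```

For 2016-10-14 this reproduces the working ALTER TABLE statement shown above; a scheduler (cron, Oozie, etc.) could run it with the current date each day.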

Hive external table not showing partitions

I have created an external table using Hive. My
hive> desc <table_name>;
shows the following output:
OK
transactiontype string
transactionid int
sourcenumber int
destnumber int
amount int
assumedfield1 int
transactionstatus string
assumedfield2 int
assumedfield3 int
transactiondate date
customerid int
# Partition Information
# col_name data_type comment
transactiondate date
customerid int
Time taken: 0.094 seconds, Fetched: 17 row(s)
But when I execute the following command:
hive> show partitions <dbname.tablename>;
OK
Time taken: 0.11 seconds
No partitions are shown. What might be the problem? When I look at hive.log, the data in the table seems to be partitioned properly according to the 'transactiondate' and 'customerid' fields. What is the maximum number of partitions that a single node should have? I have set 1000 partitions.
2015-06-15 10:33:44,713 INFO [LocalJobRunner Map Task Executor #0]: exec.FileSinkOperator (FileSinkOperator.java:createBucketForFileIdx(593)) - Writing to temp file: FS hdfs://localhost:54310/home/deepak/mobile_money_jan.txt/.hive-staging_hive_2015-06-15_10-30-53_308_5507019849041735537-1/_task_tmp.-ext-10002/transactiondate=2015-01-16/customerid=34560544/_tmp.000002_0
I am running hive on a single node hadoop cluster.
Try adding partitions manually
> alter table db.table add IF NOT EXISTS
> partition(datadate='2017-01-01') location
> 'hdfs_location/datadate=2017-01-01';
Whenever we create an external table, its location is recorded in the Hive metastore, so that change is reflected in the metastore too.
But the partition information remains unchanged: partitions are not registered in the metastore automatically, so we need to add those partitions manually:
ALTER TABLE "your-table" ADD PARTITION(transactiondate='datevalue',customerid='id-value');
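If there are many such partitions, the ADD PARTITION statements can be generated from the partition-style directory fragments visible in the log (like transactiondate=2015-01-16/customerid=34560544 above). A sketch; the table name is a placeholder:

```python
# Sketch: recover an ADD PARTITION statement from a partition-style directory
# fragment such as 'transactiondate=2015-01-16/customerid=34560544' (seen in
# the question's log output). The table name below is a placeholder.

def add_partition_from_dir(table, dir_fragment):
    pairs = [kv.split("=", 1) for kv in dir_fragment.strip("/").split("/")]
    spec = ",".join(f"{k}='{v}'" for k, v in pairs)
    return f"ALTER TABLE {table} ADD IF NOT EXISTS PARTITION ({spec});"

print(add_partition_from_dir("mytable", "transactiondate=2015-01-16/customerid=34560544"))
```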