Trino: Invalid partition spec (Hive connector)

I've created an external partitioned table in Trino, using the Hive connector.
I'm changing a partition's location: first I unregister the partition with
system.unregister_partition(schema_name, table_name, partition_columns, partition_values)
and then register the new partition location with system.register_partition(schema_name, table_name, partition_columns, partition_values, location).
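For illustration, the two calls look roughly like this (schema, table, and partition names are taken from the error below; the new location is just a placeholder path):
CALL system.unregister_partition('default', 'register_tbl1', ARRAY['partition_col'], ARRAY['2022-12-07']);
CALL system.register_partition('default', 'register_tbl1', ARRAY['partition_col'], ARRAY['2022-12-07'], 'hdfs://namenode/path/new_location/partition_col=2022-12-07');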
Now if I run system.sync_partition_metadata(schema_name, table_name, mode, case_sensitive), it fails with the error below:
trino:default> call system.sync_partition_metadata('default','register_tbl1','full');
Query 20221207_050340_00129_cs2qk failed: Invalid partition spec: /path/default/compact_register_tbl1/partition_col=2022-12-07
Is there another way to handle this when we are changing partition locations?
Thanks in advance!

Related

How to add a partition in Presto?

In Hive I can do it with:
ALTER TABLE xxx ADD PARTITION
(datehour='yy') LOCATION
'zz';
How can I do it in presto?
Currently, the Presto Hive connector does not provide a means of creating new partitions at arbitrary locations. If your partition location is under the table location, you can use the Presto Hive connector procedures:
system.create_empty_partition -- creates a new empty partition with the specified values for the partition keys
system.sync_partition_metadata -- synchronizes the partition list in the Metastore with the partitions on storage
If you want to create/declare partitions somewhere other than under the table's location, please file an issue.
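As a rough sketch (schema and table names here are hypothetical), the two procedures can be called like this, provided the new partition directory sits under the table location:
-- create a new, empty partition with the given partition-key values
CALL system.create_empty_partition('web', 'page_views', ARRAY['ds'], ARRAY['2016-08-09']);
-- or, after writing files to <table location>/ds=2016-08-09/ yourself, pick them up from storage
CALL system.sync_partition_metadata('web', 'page_views', 'ADD');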

Getting exception while updating table in Hive

I have created a table in Hive from an existing S3 file as follows:
create table reconTable (
entryid string,
run_date string
)
LOCATION 's3://abhishek_data/dump1';
Now I would like to update one entry as follows:
update reconTable set entryid='7.24E-13' where entryid='7.24E-14';
But I am getting the following error:
FAILED: SemanticException [Error 10294]: Attempt to do update or delete using transaction manager that does not support these operations.
I have gone through a few posts here but haven't found how to fix this.
I think you should create an external table when reading data from a source like S3.
Also, for UPDATE/DELETE to work the table should be stored in ORC format with the table property 'transactional'='true'.
Please refer to this for more info: attempt-to-do-update-or-delete-using-transaction-manager
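A minimal sketch of the ORC + transactional suggestion above, assuming a Hive setup where ACID transactions are enabled (hive.txn.manager set to DbTxnManager and hive.support.concurrency=true); reconTable_txn is a made-up name and the column names are copied from the question:
-- ORC + 'transactional'='true' is what allows UPDATE/DELETE;
-- the CLUSTERED BY clause is required for ACID tables on Hive versions before 3.0
CREATE TABLE reconTable_txn (
  entryid string,
  run_date string
)
CLUSTERED BY (run_date) INTO 4 BUCKETS
STORED AS ORC
TBLPROPERTIES ('transactional'='true');

-- copy the data over, then the UPDATE from the question works against the new table
INSERT INTO TABLE reconTable_txn SELECT * FROM reconTable;
UPDATE reconTable_txn SET entryid='7.24E-13' WHERE entryid='7.24E-14';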
You can refer to this Cloudera Community Thread:
https://community.cloudera.com/t5/Support-Questions/Hive-update-delete-and-insert-ERROR-in-cdh-5-4-2/td-p/29485

Hive metastore partitions: how do they work?

I have a couple of queries; please help me understand.
In Hive I see that, for a couple of tables, the partition information in the cluster and in the metastore are different. What could be the reason?
I used "hive> show partitions" in Hive and "SELECT * FROM PARTITIONS WHERE TBL_ID=;" in the metastore.
For some Hive tables I see fewer partitions in the cluster than in the metastore. For these tables, running a query with a partition filter in the WHERE clause gives an error saying that some partitions are missing.
For other Hive tables the metastore has fewer partitions than the cluster, and in that case a query with a partition filter in the WHERE clause does not give an error.
I suppose you are using Cloudera/Impala. The documentation says: If you believe an object exists but you cannot see it in the SHOW output, check with the system administrator if you need to be granted a new privilege for that object.
A table could span multiple different HDFS directories if it is partitioned. The directories could be widely scattered because a partition can reside in an arbitrary HDFS directory based on its LOCATION attribute.
See here: show partitions
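As a rough way to reconcile the two sides (the table name below is hypothetical; MSCK is covered in more detail in the answers further down), compare what the metastore reports against what is on storage, then resync:
-- what the metastore currently knows about
SHOW PARTITIONS mydb.mytable;
-- add partitions that exist under the table location on storage but are missing from the metastore
MSCK REPAIR TABLE mydb.mytable;
-- partitions that exist only in the metastore (no directory on storage) can be dropped with
-- ALTER TABLE mydb.mytable DROP PARTITION IF EXISTS (...), as shown in a later answer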

external hive metastore issue in EMR cluster

I am pointing my EMR cluster's Hive metastore to an external MySQL RDS instance.
I created a new Hive database "mydb" and got an entry in the external MySQL DB in the hive.DBS table:
hdfs://ip-10-239-1-118.ec2.internal:8020/user/hive/warehouse/mydb.db mydb hadoop USER
I also created a new Hive table "mytable" under the mydb database and got an entry in the external MySQL DB in hive.TBLS. So far everything is good.
I terminated my cluster. When I came back the next day, I launched a new cluster.
Now I did the following:
USE MYDB;
create table mytable_2(id int);
I am getting the error below:
Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask. MetaException(message:Got exception: java.net.NoRouteToHostException No Route to Host from ip-10-239-1-4.ec2.internal/10.239.1.4 to ip-10-239-1-118.ec2.internal:8020 failed on socket timeout exception: java.net.NoRouteToHostException: No route to host; For more details see: http://wiki.apache.org/hadoop/NoRouteToHost)
Note:
IP 10.239.1.4 is my current cluster's name node.
IP 10.239.1.118 is my earlier cluster's name node.
Please let me know which properties need to be overridden to avoid this kind of error.
I had the same issue and fixed it. ^_^
hive> create table sales.t1(i int);
FAILED: Execution Error, return code 1 from
org.apache.hadoop.hive.ql.exec.DDLTask. MetaException(message:Got
exception: java.net.NoRouteToHostException
No Route to Host from ip-123-234-101-101.ec2.internal/123-234-101-101
to ip-111-111-202-202.ec2.internal:8020 failed on socket timeout
exception: java.net.NoRouteToHostException: No route to host;
For more details see: http://wiki.apache.org/hadoop/NoRouteToHost)
Cause:
We had an external metastore for the cluster so that we could get rid of the cluster and spin up a new one at any time. The Hive metastore still keeps references to the old cluster if there are ‘MANAGED’ tables.
Solution:
hive --service metatool -listFSRoot
hive --service metatool -updateLocation < new_value > < old_value >
E.g.:
new_value = hdfs://ip-XXX.New.XXX.XXX:PORT/user/hive/warehouse
old_value = hdfs://ip-YYY.Old.YYY.YYY:PORT/user/hive/warehouse
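Putting the example values together, the call would look roughly like this (the host names and PORT are placeholders for your new and old name nodes):
hive --service metatool -updateLocation hdfs://ip-XXX.New.XXX.XXX:PORT/user/hive/warehouse hdfs://ip-YYY.Old.YYY.YYY:PORT/user/hive/warehouse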
Alternatively, you can go into Glue in the AWS console, go to databases/default, and edit the entry to have the updated IP in the Location field (which is the output of hive --service metatool -listFSRoot).

HIVE Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask

I am getting the below error when creating a Hive database:
FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask. com/facebook/fb303/FacebookService$Iface
Hadoop version: hadoop-1.2.1
Hive version: hive-0.12.0
Hadoop path: /home/hadoop_test/data/hadoop-1.2.1
Hive path: /home/hadoop_test/data/hive-0.12.0
I have copied hive*.jar, jline-*.jar, antlr-runtime.jar from hive-0.12.0/lib to hadoop-1.2.1/lib.
set hive.msck.path.validation=ignore;
MSCK REPAIR TABLE table_name;
Make sure the location is specified correctly
I solved the problem in the following way:
set hive.msck.repair.batch.size=1;
set hive.msck.path.validation=ignore;
If you cannot set the value and instead get the error Error: Error while processing statement: Cannot modify hive.msck.path.validation at runtime. It is not in list of params that are allowed to be modified at runtime (state=42000,code=1), add the following to hive-site:
key:
hive.security.authorization.sqlstd.confwhitelist.append
value:
hive\.msck\.path\.validation|hive\.msck\.repair\.batch\.size
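As a sketch, the corresponding hive-site.xml entry would look like this (append to the existing value if the property is already set):
<property>
  <name>hive.security.authorization.sqlstd.confwhitelist.append</name>
  <value>hive\.msck\.path\.validation|hive\.msck\.repair\.batch\.size</value>
</property>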
Set the hive.metastore.schema.verification property in hive-site.xml to true; by default it is false.
For further details check this link.
Amazon Athena
If you get here because of Amazon Athena errors, you might use the bit below. First check that all your files have the same schema:
If you run an ALTER TABLE ADD PARTITION (or MSCK REPAIR TABLE) statement and mistakenly specify a partition that already exists and an incorrect Amazon S3 location, zero byte placeholder files of the format partition_value_$folder$ are created in Amazon S3. You must remove these files manually.
We removed the files with the awscli.
aws s3 rm s3://bucket/key/table/ --exclude="*" --include="*folder*" --recursive --dryrun
See also the docs with some extra steps included.
To properly fix this with MSCK:
Remove the older partitions from the metastore, if their paths no longer exist, using
ALTER TABLE dbname.tablename DROP PARTITION IF EXISTS (partition_column_name > 0);
Run the MSCK REPAIR command:
MSCK REPAIR TABLE dbname.tablename;
Step 1 is required because the MSCK REPAIR command throws an error if a partition has been removed from the file system (HDFS); by removing all the partitions from the metastore first and then syncing with MSCK, the required partitions are added back properly.
The reason we got this error was that we had added a new column to the external Hive table. set hive.msck.path.validation=ignore; was enough to fix the Hive queries, but Impala had additional issues, which were solved with the steps below:
After doing an invalidate metadata, Impala queries started failing with Error: incompatible Parquet schema for column
Impala error SOLUTION: set PARQUET_FALLBACK_SCHEMA_RESOLUTION=name;
If you're using the Cloudera distribution, the steps below will make the change permanent so you don't have to set the option per session.
Cloudera Manager -> Clusters -> Impala -> Configuration -> Impala Daemon Query Options Advanced Configuration Snippet (Safety Valve)
Add the value: PARQUET_FALLBACK_SCHEMA_RESOLUTION=name
NOTE: do not use SET or semi-colon when setting the parameter in Cloudera Manager
Open the Hive CLI using "hive --hiveconf hive.root.logger=DEBUG,console" to enable logs and debug from there. In my case, a camel-case partition name had been written on HDFS while I had created the Hive table with its name fully in lowercase.
None of the proposed solutions worked for me.
I discovered a 0-byte file named _$folder$ inside my table location path (at the same level as the partitions).
Removing it allowed me to run MSCK REPAIR TABLE t without issues.
This file was coming from an S3 restore (a rollback to a previous versioned state).
I faced the same error. The reason in my case was a directory created in the HDFS warehouse with the same name. When this directory was deleted, it resolved my issue.
It's probably because your metastore_db is corrupted. Delete the .lck files from metastore_db.
hive -e "msck repair table database.tablename"
It will repair the metastore schema of the table.
Setting the property below and then running MSCK repair worked for me:
set hive.mapred.mode=nonstrict;
I faced a similar issue when the underlying HDFS directory was updated with new partitions and the Hive metastore went out of sync.
I solved it using the following two steps:
MSCK TABLE table_name showed which partitions were out of sync.
MSCK REPAIR TABLE table_name added the missing partitions.
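For reference, the two statements look like this (table_name is a placeholder):
-- report which partitions are out of sync without changing anything
MSCK TABLE table_name;
-- add the partitions that are missing from the metastore
MSCK REPAIR TABLE table_name;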