Can't create a Delta Lake table in Hive

I'm trying to create a Delta Lake table in Hive 3.
I moved delta-hive-assembly_2.11-0.3.0.jar to the Hive aux directory and ran the following in the Hive CLI:
SET hive.input.format=io.delta.hive.HiveInputFormat;
SET hive.tez.input.format=io.delta.hive.HiveInputFormat;
but when I try to create the table it throws the following error:
[2b8497c1-b4d3-492e-80a5-ec4db4119018 HiveServer2-Handler-Pool: Thread-133]: Exception occured while getting the URI from storage handler: Expected authority at index 22: deltastoragehandler://: Unsupported ex
java.net.URISyntaxException: Expected authority at index 22: deltastoragehandler://: Unsupported ex
at org.apache.hadoop.hive.ql.metadata.DefaultStorageHandler.getURIForAuth(DefaultStorageHandler.java:76) ~[hive-exec-3.1.3000.7.1.7.0-551.jar:3.1.3000.7.1.7.0-551]
at org.apache.hadoop.hive.ql.security.authorization.command.CommandAuthorizerV2.addHivePrivObject(CommandAuthorizerV2.java:210) [hive-exec-3.1.3000.7.1.7.0-551.jar:3.1.3000.7.1.7.0-551]
The create statement:
CREATE EXTERNAL TABLE default.test_delta
( id INT, activ INT)
STORED BY 'io.delta.hive.DeltaStorageHandler'
LOCATION '/dev/delta/tessttt';
Does anyone know why this error occurs?

You need to add the following to the table properties: TBLPROPERTIES('DO_NOT_UPDATE_STATS'='true');
Example:
CREATE EXTERNAL TABLE default.test_delta
( id INT, activ INT)
STORED BY 'io.delta.hive.DeltaStorageHandler'
LOCATION '/dev/delta/tessttt'
TBLPROPERTIES('DO_NOT_UPDATE_STATS'='true');
The same fix also works for the Iceberg format.
Inspired by: https://github.com/delta-io/connectors/issues/279

Related

Create hive external table with partitions

I already have an internal table in Hive. Now I want to create an external table from it, partitioned by date. But it throws an error when I try to create it.
Sample code:
create external table db_1.T_DATA1 partitioned by (date string) as select * from db_2.temp
LOCATION 'file path';
Error:
ParseException line 2:0 cannot recognize input near 'LOCATION' ''file path'' '' in table source
As per the answer provided at https://stackoverflow.com/a/26722320/4326922, you should be able to create an external table with CTAS. Note that in your statement the LOCATION clause appears after the AS SELECT; in Hive's grammar LOCATION must come before AS SELECT, which is likely what triggers the ParseException. See the sketch below.
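A hedged sketch of two possible fixes, reusing the table names from the question (whether CTAS may target an EXTERNAL or PARTITIONED table depends on your Hive version, so the two-step variant is the safer bet; the column names and the HDFS path are placeholders, and the partition column is backticked because date is a reserved word in newer Hive versions):
-- Variant 1: CTAS with the LOCATION clause placed before AS SELECT
create external table db_1.T_DATA1
LOCATION '/some/hdfs/path'
as select * from db_2.temp;

-- Variant 2: create the partitioned external table first, then load it
-- with dynamic partitioning
create external table db_1.T_DATA1 (col1 string, col2 int)
partitioned by (`date` string)
LOCATION '/some/hdfs/path';

set hive.exec.dynamic.partition=true;
set hive.exec.dynamic.partition.mode=nonstrict;
insert overwrite table db_1.T_DATA1 partition (`date`)
select col1, col2, `date` from db_2.temp;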

"columns has 2 elements while hbase.columns.mapping has 3 elements" error while creating a Hive table from HBase

I'm getting the following error when I run the command below to create a Hive table.
sample is the Hive table I'm trying to create; hloan is my existing HBase table. Please help.
create external table sample(id int, name string)
STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
WITH SERDEPROPERTIES("hbase.columns.mapping"=":key,hl:id,hl:name")
TBLPROPERTIES ("hbase.table.name"="hloan","hbase.mapred.output.outputtable"="sample");
ERROR:
FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask. java.lang.RuntimeException: MetaException(message:org.apache.hadoop.hive.serde2.SerDeException org.apache.hadoop.hive.hbase.HBaseSerDe: columns has 2 elements while hbase.columns.mapping has 3 elements (counting the key if implicit))
As the error describes, your CREATE EXTERNAL TABLE statement has 2 columns (id, name), while the HBase mapping has 3 columns (:key, hl:id, hl:name).
Create table with 3 columns:
hive> create external table sample(key int, id int, name string)
STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
WITH SERDEPROPERTIES("hbase.columns.mapping"=":key,hl:id,hl:name")
TBLPROPERTIES ("hbase.table.name"="hloan","hbase.mapred.output.outputtable"="hloan");
(or)
If the key and id columns hold the same data, you can skip hl:id in the mapping.
Create table with 2 columns:
hive> create external table sample(id int, name string)
STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
WITH SERDEPROPERTIES("hbase.columns.mapping"=":key,hl:name")
TBLPROPERTIES ("hbase.table.name"="hloan","hbase.mapred.output.outputtable"="hloan");

PrestoDB Hive Catalog: no viable alternative at input 'CREATE EXTERNAL'

I am running a query in the Teradata PrestoDB distribution against the Hive catalog:
CREATE EXTERNAL TABLE hive.default.mydata (
id INT, datetime timestamp, latitude FLOAT,
longitude FLOAT, bookingid VARCHAR, pre_lat FLOAT,
pre_long FLOAT, time_hour decimal(6, 1), dist_kms decimal(6, 1),
ma6_dist_kms decimal(6, 1), istationary INT, quality_overall VARCHAR,
quality_nonstationary VARCHAR, cartype VARCHAR, isbigloss INT,
bookregion VARCHAR, iho_road VARCHAR)
STORED AS PARQUET
LOCATION "s3://sb.mycompany.com/someFolder/anotherFolder";
It throws the following exception:
Query 20180316_022346_00001_h9iie failed: line 1:8: no viable alternative at input 'CREATE EXTERNAL'
Even when I use hive and run a show tables command, I see an error saying the schema is set but the catalog is not:
presto> use hive;
presto:hive> show tables;
Error running command:
Error starting query at http://localhost:8080/v1/statement returned HTTP response code 400.
Response info:
JsonResponse{statusCode=400, statusMessage=Bad Request, headers={Content-Length=[32], Date=[Fri, 16 Mar 2018 02:25:25 GMT], Content-Type=[text/plain]}, hasValue=false, value=null}
Response body:
Schema is set but catalog is not
Any help would be appreciated. Thanks.
There is no such thing as CREATE EXTERNAL TABLE in Presto. To create a Hive external table in Presto, do something like:
CREATE TABLE hive.web.request_logs (
  request_time timestamp,
  url varchar,
  ip varchar,
  user_agent varchar
)
WITH (
  format = 'TEXTFILE',
  external_location = 's3://my-bucket/data/logs/'
)
Please visit this page to see how to interact with Hive from Presto: https://docs.starburstdata.com/latest/connector/hive.html?highlight=hive
use hive; sets only the current schema in the user session. I think you wanted to do something like USE hive.default;. Please take a look here for more details: https://docs.starburstdata.com/latest/sql/use.html
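For example, a quick sketch in the Presto CLI (after USE hive.default; the prompt reflects that both catalog and schema are set for the session):
presto> USE hive.default;
presto:default> SHOW TABLES;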

Vora Modeler View preview fails with com.sap.spark.vora.client.jdbc.VoraJdbcException

I'm running Vora 1.3 on HDP 2.4.3 with Spark 1.6.2.
I've got two tables with data of the same schema: one resides in a HANA db, the other is stored as a CSV file in HDFS.
I created both tables in Vora using Zeppelin:
CREATE TABLE flights_2006 (Year int, Month_ int, DayofMonth int, DayOfWeek int, DepTime int, CRSDepTime int, ArrTime int, CRSArrTime int, UniqueCarrier string, FlightNum int,
TailNum string, ActualElapsedTime int, CRSElapsedTime int, AirTime int, ArrDelay int, DepDelay int, Origin string, Dest string, Distance int, TaxiIn int, TaxiOut int,
Cancelled int, CancellationCode int, Diverted int, CarrierDelay int, WeatherDelay int, NASDelay int, SecurityDelay int, LateAircraftDelay int)
USING com.sap.spark.vora
OPTIONS (
files "/exch/flights_filtered/part-00000,/exch/flights_filtered/part-00001,/exch/flights_filtered/part-00002,/exch/flights_filtered/part-00003,/exch/flights_filtered/part-00004",
csvdelimiter ","
)
Q1. By the way, when is it going to be possible to supply just a directory name, rather than listing every file in the directory, when creating Vora tables from file sources? It's very impractical, as one cannot predict how many part-files a directory will contain.
CREATE TABLE flights_2007
USING com.sap.spark.hana
OPTIONS (
tablepath "XXXXXXXXXXXX",
dbschema "XXXXXXXXXX",
host "XXXXXXXXXXX",
instance "00",
user "XXXXXXXXXXX",
passwd "XXXXXXXXXX"
)
And I was able to produce a result from joining these two tables (the business meaning of such a join aside):
select f7.MONTH, f7.DAYOFMONTH, f7.UNIQUECARRIER, f7.FLIGHTNUM, f7.YEAR, f7.DEPTIME, f6.year, f6.DepTime
from flights_2007 as f7 inner join flights_2006 as f6
on f7.MONTH = f6.Month_ and f7.DAYOFMONTH = f6.DayofMonth and f7.UNIQUECARRIER = f6.UniqueCarrier and f7.FLIGHTNUM = f6.FlightNum
where f7.MONTH = 1 and f7.DAYOFMONTH = 2 and f7.UNIQUECARRIER = 'WN'
Then I tried to do the very same steps in Vora Modeler.
Q2. How come REGISTER TABLE in Zeppelin doesn't make the tables available in Vora Modeler?
So I executed the same two table creation statements in Vora Modeler, using all capitals in the table names, as I remember Vora had some issues with that earlier. I then created a Vora View as a join of the two tables with this condition:
FLIGHTS_2007.MONTH = FLIGHTS_2006.MONTH_ and
FLIGHTS_2007.DAYOFMONTH = FLIGHTS_2007.DAYOFMONTH and
FLIGHTS_2007.UNIQUECARRIER = FLIGHTS_2006.UNIQUECARRIER and
FLIGHTS_2007.FLIGHTNUM = FLIGHTS_2006.FLIGHTNUM
... and used this where-condition:
FLIGHTS_2007.MONTH = 1 and
FLIGHTS_2007.DAYOFMONTH = 2 and
FLIGHTS_2007.UNIQUECARRIER = 'WN'
The expected result for that View preview would be the same as for the Zeppelin-based select. The actual result (first few lines):
org.apache.spark.SparkException: Job aborted due to stage failure: Task 2 in stage 2165.0 failed 4 times, most recent failure: Lost task 2.3 in stage 2165.0 (TID 78743, eba165.extendtec.com.au): com.sap.spark.vora.client.jdbc.VoraJdbcException: [Vora [eba165.extendtec.com.au:34530.1615085]] Unknown error when executing SELECT "FLIGHTS_2006"."FLIGHTNUM", "FLIGHTS_2006"."DEPTIME", "FLIGHTS_2006"."UNIQUECARRIER", "FLIGHTS_2006"."MONTH_", "FLIGHTS_2006"."YEAR" FROM "FLIGHTS_2006": HL(9): Runtime error. (schema error: could not resolve column "FLIGHTS_2006"."YEAR" (sql parse error))
at com.sap.spark.vora.client.jdbc.VoraJdbcClient.liftedTree1$1(VoraJdbcClient.scala:210)
at com.sap.spark.vora.client.jdbc.VoraJdbcClient.generateAutocloseableIteratorFromQuery(VoraJdbcClient.scala:187)
at com.sap.spark.vora.client.VoraClient$$anonfun$generateAutocloseableIteratorFromQuery$1.apply(VoraClient.scala:363)
at com.sap.spark.vora.client.VoraClient$$anonfun$generateAutocloseableIteratorFromQuery$1.apply(VoraClient.scala:363)
at scala.util.Try$.apply(Try.scala:161)
at com.sap.spark.vora.client.VoraClient.handleExceptions(VoraClient.scala:775)
at com.sap.spark.vora.client.VoraClient.generateAutocloseableIteratorFromQuery(VoraClient.scala:362)
at com.sap.spark.vora.VoraRDD.compute(voraRDD.scala:54)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:313)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:277)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:313)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:277)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:313)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:277)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:313)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:277)
at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:73)
at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41) at
Q3. Did I do anything wrong in Vora Modeler? Or is it actually a bug?
You mention that you used all caps for table names when running your CREATE statements. In my experience with the 1.3 Modeler, you must use all uppercase for your column names as well.
schema error: could not resolve column "FLIGHTS_2006"."YEAR"
For example, if you used "CREATE TABLE FLIGHTS_2006 (Year int, ...", try changing that to "CREATE TABLE FLIGHTS_2006 (YEAR int, ..."
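A sketch of the corrected Modeler statement, reusing the column list and OPTIONS from the Zeppelin statement in the question with the column names uppercased (the file list and delimiter are unchanged):
CREATE TABLE FLIGHTS_2006 (YEAR int, MONTH_ int, DAYOFMONTH int, DAYOFWEEK int, DEPTIME int, CRSDEPTIME int, ARRTIME int, CRSARRTIME int, UNIQUECARRIER string, FLIGHTNUM int,
TAILNUM string, ACTUALELAPSEDTIME int, CRSELAPSEDTIME int, AIRTIME int, ARRDELAY int, DEPDELAY int, ORIGIN string, DEST string, DISTANCE int, TAXIIN int, TAXIOUT int,
CANCELLED int, CANCELLATIONCODE int, DIVERTED int, CARRIERDELAY int, WEATHERDELAY int, NASDELAY int, SECURITYDELAY int, LATEAIRCRAFTDELAY int)
USING com.sap.spark.vora
OPTIONS (
files "/exch/flights_filtered/part-00000,/exch/flights_filtered/part-00001,/exch/flights_filtered/part-00002,/exch/flights_filtered/part-00003,/exch/flights_filtered/part-00004",
csvdelimiter ","
)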
Regarding your Q1: yes, this is something we are currently reviewing as a feature request.
Regarding your Q2: is your Zeppelin connected to the same Vora Thriftserver as your Vora Modeler (aka Vora Tools)?
Regarding your Q3: the other reply from Ryan is correct, column names are also case sensitive in Vora 1.3.

Impala can not drop external table

I created an external table with a wrong (non-existent) path:
create external table IF NOT EXISTS ds_user_id_csv
(
type string,
imei string,
imsi string,
idfa string,
msisdn string,
mac string
)
PARTITIONED BY(prov string,day string)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
stored as textfile
LOCATION 'hdfs://cdh0:8020/user/hive/warehouse/test.db/ds_user_id';
And I cannot drop the table:
[cdh1:21000] > drop table ds_user_id_csv
> ;
Query: drop table ds_user_id_csv
ERROR:
ImpalaRuntimeException: Error making 'dropTable' RPC to Hive Metastore:
CAUSED BY: MetaException: java.lang.IllegalArgumentException: Wrong FS: hdfs://cdh0:8020/user/hive/warehouse/test.db/ds_user_id, expected: hdfs://nameservice1
How can I solve this? Thank you.
Use the following command to change the location so that it points at the expected filesystem, then drop the table:
ALTER TABLE ds_user_id_csv SET LOCATION '{new location}';
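A minimal sketch of the full sequence, assuming the cluster's default filesystem is the hdfs://nameservice1 shown in the error (the new location is simply the original path with the correct authority; since the table is external, dropping it does not delete any data):
ALTER TABLE ds_user_id_csv SET LOCATION 'hdfs://nameservice1/user/hive/warehouse/test.db/ds_user_id';
DROP TABLE ds_user_id_csv;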