I'm creating a new external table in AWS Athena. When I define a decimal(9,2) column, the table is created fine, but when I try to SELECT from it, I get the following error:
GENERIC_INTERNAL_ERROR: java.lang.String cannot be cast to org.apache.hadoop.hive.common.type.HiveDecimal
Example of my DDL:
CREATE EXTERNAL TABLE IF NOT EXISTS `MY_DB`.`MY_NEW_TABLE` (
`event_id` bigint
,`time_stamp` string
,`amount` decimal(9,2)
)
PARTITIONED BY (
`year` smallint
,`month` tinyint
)
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.OpenCSVSerde'
WITH SERDEPROPERTIES (
'separatorChar' = ','
,'quoteChar' = '"'
,'escapeChar' = '\\'
)
STORED AS INPUTFORMAT 'org.apache.hadoop.mapred.TextInputFormat' OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
LOCATION 's3://<BUCKETNAME>/<SUBFOLDER>/'
TBLPROPERTIES (
'classification' = 'csv'
,'skip.header.line.count' = '1'
);
If I define amount as a float, it works, and I can later CAST() it to a decimal(9,2) as I want, but I would prefer to have it in the table structure directly.
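The error happens because OpenCSVSerde hands every field to Hive as a string, which Athena then cannot cast to HiveDecimal. A common workaround, in the same spirit as the CAST() approach above, is to declare amount as string (or keep the float) in the DDL and wrap the cast in a view. A minimal sketch, with the view name as a hypothetical choice:
CREATE OR REPLACE VIEW MY_DB.MY_NEW_TABLE_V AS
SELECT event_id
,time_stamp
,CAST(amount AS decimal(9,2)) AS amount
,year
,month
FROM MY_DB.MY_NEW_TABLE;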
I have an external table with the below format in Hive:
CREATE EXTERNAL TABLE cs_mbr_prov(
key struct<inid:string,......>,
memkey string,
ob_id string,
.....
)
ROW FORMAT SERDE
'org.apache.hadoop.hive.hbase.HBaseSerDe'
STORED BY
'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
WITH SERDEPROPERTIES (
'hbase.columns.mapping'=' :key,ci:MEMKEY, .....',
'serialization.format'='1')
I want to create the same type of table in Azure Databricks, where my input and output are in Parquet format.
As per the official doc, I created and reproduced the table with input and output in Parquet format.
Sample code:
CREATE EXTERNAL TABLE `vams`(
`country` string,
`count` int)
ROW FORMAT SERDE
'org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe'
STORED AS INPUTFORMAT
'org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat'
OUTPUTFORMAT
'org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat'
LOCATION
'dbfs:/FileStore/'
TBLPROPERTIES (
'totalSize'='2335',
'numRows'='240',
'rawDataSize'='2095',
'COLUMN_STATS_ACCURATE'='true',
'numFiles'='1',
'transient_lastDdlTime'='1418173653')
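If you also need the struct column from the original HBase table, the same Parquet approach should carry over. A minimal sketch (the table name and the shortened struct are hypothetical, since the original columns were elided):
CREATE EXTERNAL TABLE cs_mbr_prov_parquet(
key struct<inid:string>,
memkey string,
ob_id string
)
STORED AS PARQUET
LOCATION 'dbfs:/FileStore/cs_mbr_prov/';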
Reference:
https://learn.microsoft.com/en-us/azure/databricks/spark/latest/spark-sql/language-manual/sql-ref-syntax-ddl-create-table-hiveformat
I'm trying to follow the examples in the Hive connector documentation to create a Hive table. I can write HQL to create a table via Beeline, but I wonder how to do it via PrestoSQL.
Given table
CREATE TABLE hive.web.request_logs (
request_time varchar,
url varchar,
ip varchar,
user_agent varchar,
dt varchar
)
WITH (
format = 'CSV',
partitioned_by = ARRAY['dt'],
external_location = 's3://my-bucket/data/logs/'
)
How to specify SERDEPROPERTIES like separatorChar and quoteChar?
How to specify TBLPROPERTIES like skip.header.line.count?
In Presto you do it like this:
CREATE TABLE table_name( ... columns ... )
WITH (format='CSV', csv_separator='|', skip_header_line_count=1);
You can list all supported table properties in Presto with
SELECT * FROM system.metadata.table_properties;
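Putting those properties together for the request_logs table above, a sketch (csv_quote is the counterpart of quoteChar; run the system.metadata query on your version to confirm these properties are available):
CREATE TABLE hive.web.request_logs (
request_time varchar,
url varchar,
ip varchar,
user_agent varchar,
dt varchar
)
WITH (
format = 'CSV',
csv_separator = ',',
csv_quote = '"',
skip_header_line_count = 1,
partitioned_by = ARRAY['dt'],
external_location = 's3://my-bucket/data/logs/'
);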
I just ran a simple query like this, but some exception appeared:
insert overwrite table stage_dfqp.user_currency partition (dt='2018-05-16')
select fuid,
fbpid,
fgamefsk
from stage_dfqp.pb_gamecoins
but when I change the query like this (just adding limit XXX), the exception disappears:
insert overwrite table stage_dfqp.user_currency partition (dt='2018-05-16')
select fuid,
fbpid,
fgamefsk
from stage_dfqp.pb_gamecoins limit 100
Hive table info:
CREATE TABLE `stage_dfqp.user_currency`(
`fuid` bigint ,
`coin_type` string ,
`coin_num` bigint
)
PARTITIONED BY (
`dt` string)
ROW FORMAT SERDE
'org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe'
STORED AS INPUTFORMAT
'org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat'
OUTPUTFORMAT
'org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat'
I am parsing a nested XML file using the hivexml SerDe, but it returns null when we select the data from the Hive table.
A sample XML file: xml data.
The query I created for parsing the XML:
CREATE EXTERNAL TABLE IF NOT EXISTS abc ( mail string, Type string, Id bigint, Date string, LId bigint, value string)
ROW FORMAT SERDE 'com.ibm.spss.hive.serde2.xml.XmlSerDe'
WITH SERDEPROPERTIES (
"column.xpath.OptOutEmail"="/Re/mail/text()",
"column.xpath.OptOutType"="/Re/Type/text()",
"column.xpath.SurveyId"="/Re/Id/text()",
"column.xpath.RequestedDate"="/Re/Date/text()",
"column.xpath.EmailListId"="/Re/Lists/LId/text()",
"column.xpath.Description"="/Re/Lists/value/text()")
STORED AS INPUTFORMAT 'com.ibm.spss.hive.serde2.xml.XmlInputFormat'
OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.IgnoreKeyTextOutputFormat'
LOCATION '/abc/xyz'
TBLPROPERTIES ("xmlinput.start"="<Out>","xmlinput.end"= "</Out>");
Can someone please help?
Try the below query. I have loaded the data into the table from a local path; a sketch of that load step follows the DDL.
CREATE EXTERNAL TABLE IF NOT EXISTS xmlList ( mail string, Type string, Id bigint, Dated string, LId bigint, value string)
ROW FORMAT SERDE 'com.ibm.spss.hive.serde2.xml.XmlSerDe'
WITH SERDEPROPERTIES (
"column.xpath.mail"="/Re/mail/text()",
"column.xpath.Type"="/Re/Type/text()",
"column.xpath.Id"="/Re/Id/text()",
"column.xpath.Dated"="/Re/Dated/text()",
"column.xpath.LId"="/Re/Lists/List/LId/text()",
"column.xpath.value"="/Re/Lists/List/value/text()")
STORED AS INPUTFORMAT 'com.ibm.spss.hive.serde2.xml.XmlInputFormat'
OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.IgnoreKeyTextOutputFormat'
TBLPROPERTIES ("xmlinput.start"="<Re>","xmlinput.end"= "</Re>");
I am new to Hadoop. I need help with an error encountered in Hive while creating a new table. I have already gone through this: Hive FAILED: ParseException line 2:0 cannot recognize input near ''macaddress'' 'CHAR' '(' in column specification
My question: is it necessary to specify the table's location in the script? I am writing the table location at the start, and I am worried that specifying it might disturb the rest of my databases through some malfunctioning operation.
Here is my query:
CREATE TABLE meta_statistics.tank_items (
shop_offers_history_before bigint,
shop_offers_temp bigint,
videos_distinct_temp bigint,
deleted_temp bigint,
t_stamp timestamp )
CLUSTERED BY (
tank_items_id)
INTO 8 BUCKETS
ROW FORMAT SERDE
TBLPROPERTIES (transactional=true)
STORED AS ORC;
The error I am getting is:
ParseException line 1:3 cannot recognize input near 'TBLPROPERTIES'
'(' 'transactional'
What other errors might come up, and how can I remove them?
There is a syntax error in your CREATE query. The error you shared says that Hive cannot recognize input near 'TBLPROPERTIES'.
Solution:
As per Hive syntax, the key and value passed in TBLPROPERTIES should be in double quotes. It should be like this: TBLPROPERTIES ("transactional"="true")
So if I correct your query, it will be:
CREATE TABLE meta_statistics.tank_items (
shop_offers_history_before bigint,
shop_offers_temp bigint,
videos_distinct_temp bigint,
deleted_temp bigint,
t_stamp timestamp
) CLUSTERED BY (tank_items_id) INTO 8 BUCKETS
ROW FORMAT SERDE TBLPROPERTIES ("transactional"="true") STORED AS ORC;
Execute the above query; if you get any other syntax error, make sure that the order of STORED AS, CLUSTERED BY, and TBLPROPERTIES follows the Hive syntax (see the skeleton below).
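For reference, a skeleton of the clause order the Hive grammar expects (a sketch, not a full corrected query):
CREATE TABLE db.table_name ( ... columns ... )
PARTITIONED BY ( ... )
CLUSTERED BY ( ... ) INTO n BUCKETS
ROW FORMAT SERDE '...'
STORED AS ...
LOCATION '...'
TBLPROPERTIES ("key"="value");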
Refer this for more details:
https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DDL#LanguageManualDDL-CreateTable
1) ROW FORMAT SERDE -> you should pass a SerDe class with it
2) the TBLPROPERTIES key and value should be in double quotes
3) if you use CLUSTERED BY, the column must be one of the columns you defined
Replace it as follows:
CREATE TABLE meta_statistics.tank_items (
shop_offers_history_before bigint,
shop_offers_temp bigint,
videos_distinct_temp bigint,
deleted_temp bigint,
t_stamp timestamp
)
CLUSTERED BY (shop_offers_history_before) INTO 8 BUCKETS
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.OpenCSVSerde'
STORED AS ORC
TBLPROPERTIES ("transactional"="true");
Hope this helps.