I am trying to create a table in Hive and need help with it.
Sample code:
CREATE EXTERNAL TABLE table1(
id STRING,
name STRING,
"12489738" STRING,
"12492628" STRING,
"12492633" STRING,
"12492638" STRING,
"12492655" STRING,
"12492659" STRING)
ROW FORMAT DELIMITED FIELDS TERMINATED BY "\t"
LOCATION ""
tblproperties ("skip.header.line.count"="1");
But it throws this error:
NoViableAltException(320#[])
at org.apache.hadoop.hive.ql.parse.HiveParser_IdentifiersParser.identifier(HiveParser_IdentifiersParser.java:11633)
at org.apache.hadoop.hive.ql.parse.HiveParser.identifier(HiveParser.java:49892)
at org.apache.hadoop.hive.ql.parse.HiveParser.columnNameType(HiveParser.java:40082)
at org.apache.hadoop.hive.ql.parse.HiveParser.columnNameTypeList(HiveParser.java:38241)
at org.apache.hadoop.hive.ql.parse.HiveParser.createTableStatement(HiveParser.java:6726)
at org.apache.hadoop.hive.ql.parse.HiveParser.ddlStatement(HiveParser.java:4122)
at org.apache.hadoop.hive.ql.parse.HiveParser.execStatement(HiveParser.java:1786)
at org.apache.hadoop.hive.ql.parse.HiveParser.statement(HiveParser.java:1152)
at org.apache.hadoop.hive.ql.parse.ParseDriver.parse(ParseDriver.java:211)
at org.apache.hadoop.hive.ql.parse.ParseDriver.parse(ParseDriver.java:171)
at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:447)
at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:330)
at org.apache.hadoop.hive.ql.Driver.compileInternal(Driver.java:1233)
at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1274)
at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1170)
at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1160)
at org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:217)
at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:169)
at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:380)
at org.apache.hadoop.hive.cli.CliDriver.executeDriver(CliDriver.java:740)
at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:685)
at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:625)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.hadoop.util.RunJar.run(RunJar.java:233)
at org.apache.hadoop.util.RunJar.main(RunJar.java:148)
FAILED: ParseException line 4:0 cannot recognize input near '"12489738"' 'STRING' ',' in column specification
Try escaping the numeric column names with backticks (`). Hive identifiers that start with a digit must be backtick-quoted; double quotes are not valid identifier quoting in Hive DDL.
hive> CREATE EXTERNAL TABLE table1(
  id STRING,
  name STRING,
  `12489738` STRING,
  `12492628` STRING,
  `12492633` STRING,
  `12492638` STRING,
  `12492655` STRING,
  `12492659` STRING)
ROW FORMAT DELIMITED FIELDS TERMINATED BY "\t"
LOCATION ""
tblproperties ("skip.header.line.count"="1");
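A usage note: backtick-quoted identifiers rely on quoted-identifier support, which is the default from Hive 0.13 onward; on older versions it may need to be enabled first:

SET hive.support.quoted.identifiers=column;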
Related
I have stored the data source in S3, and when I query it in Athena and count the total number of rows, I get more rows than are present in the CSV file stored in S3.
I have also given a separate path for the Athena query results, i.e. different from the S3 folder path of the data source.
Please help me with this: why is Athena giving me extra rows with unknown values in them, creating discrepancies in the data?
Please find below the query I wrote to create the table in Athena:
athena_client.start_query_execution(
    QueryString='create database cms_data',
    ResultConfiguration={'OutputLocation': 's3://cms-dashboard-automation/Athenaoutput/'})

# Tables created for Athena
context = {'Database': 'cms_data'}
athena_client.start_query_execution(QueryString='''CREATE EXTERNAL TABLE IF NOT EXISTS `cms_data`.`mpf_data` (
`State` String,
`County` String,
`Org_Name` String,
`Contract_ID` String,
`Plan_ID` double,
`Segment_ID` double,
`Plan_Type_Desc` String,
`Contract_Year` double,
`Category_Name` String,
`Service_Name` String,
`Limit_Flag` double,
`Authorization_Flag` double,
`Referral_Flag` double,
`Network_Description` String,
`Cost_Share` String )
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe'
WITH SERDEPROPERTIES (
'serialization.format' = ',',
'field.delim' = ','
) LOCATION 's3://cms-dashboard-automation/MPF_Data/'
TBLPROPERTIES ('has_encrypted_data'='false');
''',QueryExecutionContext = context,ResultConfiguration={'OutputLocation': 's3://cms-dashboard-automation/Athenaoutput/'})
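One thing worth checking: LazySimpleSerDe splits on every comma and newline, so if the CSV has quoted fields containing embedded commas or line breaks, Athena will see extra rows and shifted, unknown values. A sketch of the same DDL using OpenCSVSerde, which honors quoting (assuming the file is a standard double-quoted CSV; OpenCSVSerde reads every column as string, so the numeric columns would need casting at query time):

CREATE EXTERNAL TABLE IF NOT EXISTS `cms_data`.`mpf_data` (
  `State` string,
  `County` string,
  `Org_Name` string,
  `Contract_ID` string,
  `Plan_ID` string,
  `Segment_ID` string,
  `Plan_Type_Desc` string,
  `Contract_Year` string,
  `Category_Name` string,
  `Service_Name` string,
  `Limit_Flag` string,
  `Authorization_Flag` string,
  `Referral_Flag` string,
  `Network_Description` string,
  `Cost_Share` string )
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.OpenCSVSerde'
WITH SERDEPROPERTIES (
  'separatorChar' = ',',
  'quoteChar' = '"'
)
LOCATION 's3://cms-dashboard-automation/MPF_Data/'
TBLPROPERTIES ('has_encrypted_data'='false', 'skip.header.line.count'='1');

Also make sure no other files (for example, old query results) sit under s3://cms-dashboard-automation/MPF_Data/, since Athena reads every object under the LOCATION prefix.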
I am trying to create an empty table, containing some columns with fixed datatypes, in S3 with a command launched in Athena, but it throws the following error:
line 11:1: mismatched input 'PARTITIONED'. Expecting: 'COMMENT', 'WITH', <EOF>
The query I'm executing is the following:
CREATE TABLE IF NOT EXISTS boards_raw_fields_v1 (
"uuid" bigint,
"source" string,
"raw_company_name" string,
"raw_contract_type" string,
"raw_employment_type" string,
"raw_working_hours_type" string,
"raw_all_locations" array<string>,
"raw_categories" array<string>,
"raw_industry" string)
PARTITIONED BY (
"year" string,
"month" string,
"day" string,
"hour" string,
"version" bigint)
ROW FORMAT SERDE
'org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe'
STORED AS INPUTFORMAT
'org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat'
OUTPUTFORMAT
'org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat'
LOCATION
's3://my-data/datalake-raw/boards/raw-fields/v1'
TBLPROPERTIES (
'classification'='parquet',
'compressionType'='snappy',
'projection.enabled'='false',
'typeOfData'='file')
In the URI:
s3://my-data/datalake-raw/boards/raw-fields/v1
IMPORTANT NOTE: All the folders currently exist except the last one, v1.
What am I doing wrong in the process of creating the table?
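The error suggests Athena is parsing this as a CTAS statement: plain CREATE TABLE goes through the Presto-style grammar (which only allows COMMENT, WITH, or end-of-statement after the column list), while Hive-style clauses such as PARTITIONED BY, ROW FORMAT, and LOCATION require CREATE EXTERNAL TABLE. Hive DDL also quotes identifiers with backticks rather than double quotes. A sketch of the same statement rewritten along those lines:

CREATE EXTERNAL TABLE IF NOT EXISTS boards_raw_fields_v1 (
  `uuid` bigint,
  `source` string,
  `raw_company_name` string,
  `raw_contract_type` string,
  `raw_employment_type` string,
  `raw_working_hours_type` string,
  `raw_all_locations` array<string>,
  `raw_categories` array<string>,
  `raw_industry` string)
PARTITIONED BY (
  `year` string,
  `month` string,
  `day` string,
  `hour` string,
  `version` bigint)
ROW FORMAT SERDE
  'org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe'
STORED AS INPUTFORMAT
  'org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat'
OUTPUTFORMAT
  'org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat'
LOCATION
  's3://my-data/datalake-raw/boards/raw-fields/v1'
TBLPROPERTIES (
  'classification'='parquet',
  'compressionType'='snappy',
  'projection.enabled'='false',
  'typeOfData'='file');

The missing v1 folder should not matter: S3 has no real folders, and Athena simply treats a non-existent prefix as an empty table.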
I am trying to read a Parquet file and dump it into Postgres. One of the columns in the Postgres table is of JSONB datatype, while in Parquet it is a String:
val parquetDF = session.read.parquet("s3a://test/ovd").selectExpr("id", "topic", "update_id", "blob")
parquetDF.write.format("jdbc")
.option("driver", "org.postgresql.Driver")
.option("url", "jdbc:postgresql://localhost:5432/db_metamorphosis?binaryTransfer=true&stringtype=unspecified")
.option("dbtable", "entitlements.general")
.option("user", "mdev")
.option("password", "")
.option("stringtype", "unspecified")
.mode(SaveMode.Append)
.save()
And it fails with this error:
Caused by: org.postgresql.util.PSQLException: ERROR: column "blob" is of type jsonb but expression is of type character
Hint: You will need to rewrite or cast the expression.
Position: 85
at org.postgresql.core.v3.QueryExecutorImpl.receiveErrorResponse(QueryExecutorImpl.java:2270)
at org.postgresql.core.v3.QueryExecutorImpl.processResults(QueryExecutorImpl.java:1998)
... 16 more
Someone on SO suggested putting stringtype=unspecified in the URL so that Postgres decides the datatype for strings, but it does not seem to be working.
<scala.major.version>2.12</scala.major.version>
<scala.version>${scala.major.version}.8</scala.version>
<spark.version>2.4.0</spark.version>
<postgres.version>9.4-1200-jdbc41</postgres.version>
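If stringtype=unspecified alone does not help, a workaround often suggested for this exact error is to define an implicit cast on the Postgres side, so the server coerces incoming varchar values to jsonb. A sketch (run once against the target database as a superuser; note this changes cast behavior database-wide, so weigh it carefully):

-- PostgreSQL: let the server implicitly coerce varchar to jsonb on INSERT
-- (assumption: a database-wide implicit cast is acceptable here)
CREATE CAST (varchar AS jsonb) WITH INOUT AS IMPLICIT;

It may also be worth upgrading the driver: 9.4-1200-jdbc41 is quite old, and a recent org.postgresql:postgresql 42.x artifact is the usual pairing with Spark 2.4.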
This is my second attempt at using a SerDe. The first one worked quite well, but now I'm really struggling.
I got an XML of this structure:
This is the Hive table I created
CREATE TABLE raw_abc.text_abc
(
publicationid string,
parentid string,
id string,
level string,
usertypeid string,
name string,
assetcrossreferences_ordered string,
assetcrossreferences MAP<string, string>,
attributenames_ordered string,
attributenames map<string,string>,
seo_ordered string,
seo MAP<string, string>
)
ROW FORMAT SERDE 'com.ibm.spss.hive.serde2.xml.XmlSerDe'
WITH SERDEPROPERTIES (
"column.xpath.publicationid"="/ST:ECC-HierarchyMessage/#PublicationID",
"column.xpath.parentid"="/ST:ECC-HierarchyMessage/Product/#ParentID",
"column.xpath.id"="/ST:ECC-HierarchyMessage/Product/#ID",
"column.xpath.level"="/ST:ECC-HierarchyMessage/Product/#Level",
"column.xpath.usertypeid"="/ST:ECC-HierarchyMessage/Product/#UserTypeID",
"column.xpath.name"="/ST:ECC-HierarchyMessage/Product/#Name",
"column.xpath.assetcrossreferences_ordered"="/ST:ECC-HierarchyMessage/Product/AssetCrossReferences/#Ordered",
"column.xpath.assetcrossreferences"="/ST:ECC-HierarchyMessage/Product/AssetCrossReferences/AssetCrossReference",
"column.xpath.attributenames_ordered"="/ST:ECC-HierarchyMessage/Product/AttributeNames/#Ordered",
"column.xpath.attributenames"="/ST:ECC-HierarchyMessage/Product/AttributeNames/#Ordered",
"column.xpath.seo_ordered"="/ST:ECC-HierarchyMessage/Product/SEO/#Ordered",
"column.xpath.seo"="/ST:ECC-HierarchyMessage/Product/SEO"
)
STORED AS
INPUTFORMAT 'com.ibm.spss.hive.serde2.xml.XmlInputFormat'
OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.IgnoreKeyTextOutputFormat'
location 's3a://ec-abc-dev/inbound/abc/abc/'
TBLPROPERTIES (
"xmlinput.start"="<ST:ECC-HierarchyMessage>",
"xmlinput.end"="</ST:ECC-HierarchyMessage>"
)
;
The table is created successfully; however, when I try select * from raw_abc.text_abc, I get no records in return.
Any idea what's wrong here? I've spent the last 2 days trying to figure it out with no luck.
Thanks,
G
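A likely culprit with this SerDe: XmlInputFormat matches xmlinput.start as a literal byte sequence. If the root element in the files carries attributes (for example a namespace declaration such as xmlns:ST="..."), the literal <ST:ECC-HierarchyMessage> never appears in the data and every file yields zero records. A sketch of the usual workaround, dropping the closing angle bracket so the match succeeds on the tag prefix (assumption: the real root tag does carry attributes):

TBLPROPERTIES (
  -- match the opening tag by prefix so attributes don't break the literal match
  "xmlinput.start"="<ST:ECC-HierarchyMessage",
  "xmlinput.end"="</ST:ECC-HierarchyMessage>"
);

Separately, column.xpath.attributenames points at /ST:ECC-HierarchyMessage/Product/AttributeNames/#Ordered, the same XPath as attributenames_ordered; a MAP column presumably needs the repeating child elements rather than that attribute.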
I am very new here. I am trying to run the following code on my Cloudera QuickStart VM.
CREATE TABLE apache_common_log (
host STRING,
identity STRING,
user STRING,
time STRING,
request STRING,
status STRING,
size STRING
)
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.RegexSerDe'
WITH SERDEPROPERTIES (
"input.regex" = "([^ ]*) ([^ ]*) ([^ ]*) (-|\\[[^\\]]*\\]) ([^ \"]*|\"
[^\"]*\") (-|[0-9]*) (-|[0-9]*)",
"output.format.string" = "%1$s %2$s %3$s %4$s %5$s %6$s %7$s"
)
STORED AS TEXTFILE;
but I got this error:
failed: execution error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask, Cannot validate serde: org.apache.hadoop.hive.serde2.RegexSerde
I did some research: all the fields are STRING, and I have added the jars
/usr/lib/hive/lib/hive-contrib.jar
/usr/lib/hive/lib/hive-serde.jar
/usr/lib/hive/lib/hive-common.jar
but it still didn't work.
I really need some help! Any input will be appreciated!
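One detail stands out: the error names org.apache.hadoop.hive.serde2.RegexSerde with a lowercase "d", but the class is RegexSerDe, and Java class lookup is case-sensitive, so Hive cannot validate a misspelled name. If the built-in serde still fails, the hive-contrib variant (the one documented to support output.format.string) is a reasonable fallback. A minimal sketch, assuming hive-contrib.jar sits where the Cloudera VM usually ships it:

-- class names are case-sensitive: RegexSerDe, not RegexSerde
ADD JAR /usr/lib/hive/lib/hive-contrib.jar;

CREATE TABLE apache_common_log (
  host STRING,
  identity STRING,
  `user` STRING,    -- backticked: "user" collides with a reserved word on some Hive versions
  `time` STRING,
  request STRING,
  status STRING,
  size STRING
)
ROW FORMAT SERDE 'org.apache.hadoop.hive.contrib.serde2.RegexSerDe'
WITH SERDEPROPERTIES (
  "input.regex" = "([^ ]*) ([^ ]*) ([^ ]*) (-|\\[[^\\]]*\\]) ([^ \"]*|\"[^\"]*\") (-|[0-9]*) (-|[0-9]*)",
  "output.format.string" = "%1$s %2$s %3$s %4$s %5$s %6$s %7$s"
)
STORED AS TEXTFILE;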