I am trying to create a table in Hive and need help with it.
Sample code:
CREATE EXTERNAL TABLE table1(
id STRING,
name STRING,
"12489738" STRING,
"12492628" STRING,
"12492633" STRING,
"12492638" STRING,
"12492655" STRING,
"12492659" STRING)
ROW FORMAT DELIMITED FIELDS TERMINATED BY "\t"
LOCATION ""
tblproperties ("skip.header.line.count"="1");
But it throws this error:
NoViableAltException(320#[])
at org.apache.hadoop.hive.ql.parse.HiveParser_IdentifiersParser.identifier(HiveParser_IdentifiersParser.java:11633)
at org.apache.hadoop.hive.ql.parse.HiveParser.identifier(HiveParser.java:49892)
at org.apache.hadoop.hive.ql.parse.HiveParser.columnNameType(HiveParser.java:40082)
at org.apache.hadoop.hive.ql.parse.HiveParser.columnNameTypeList(HiveParser.java:38241)
at org.apache.hadoop.hive.ql.parse.HiveParser.createTableStatement(HiveParser.java:6726)
at org.apache.hadoop.hive.ql.parse.HiveParser.ddlStatement(HiveParser.java:4122)
at org.apache.hadoop.hive.ql.parse.HiveParser.execStatement(HiveParser.java:1786)
at org.apache.hadoop.hive.ql.parse.HiveParser.statement(HiveParser.java:1152)
at org.apache.hadoop.hive.ql.parse.ParseDriver.parse(ParseDriver.java:211)
at org.apache.hadoop.hive.ql.parse.ParseDriver.parse(ParseDriver.java:171)
at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:447)
at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:330)
at org.apache.hadoop.hive.ql.Driver.compileInternal(Driver.java:1233)
at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1274)
at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1170)
at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1160)
at org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:217)
at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:169)
at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:380)
at org.apache.hadoop.hive.cli.CliDriver.executeDriver(CliDriver.java:740)
at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:685)
at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:625)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.hadoop.util.RunJar.run(RunJar.java:233)
at org.apache.hadoop.util.RunJar.main(RunJar.java:148)
FAILED: ParseException line 4:0 cannot recognize input near '"12489738"' 'STRING' ',' in column specification
Try escaping the numeric column names with backticks (`). Hive identifiers that start with a digit must be backtick-quoted; double quotes are not valid identifier quoting in Hive DDL.
hive> CREATE EXTERNAL TABLE table1(
  id STRING,
  name STRING,
  `12489738` STRING,
  `12492628` STRING,
  `12492633` STRING,
  `12492638` STRING,
  `12492655` STRING,
  `12492659` STRING)
ROW FORMAT DELIMITED FIELDS TERMINATED BY "\t"
LOCATION ""
tblproperties ("skip.header.line.count"="1");
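A usage note: backtick-quoted identifiers rely on quoted-identifier support, which is the default from Hive 0.13 onward; on older versions it may need to be enabled first:

SET hive.support.quoted.identifiers=column;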
Related
I have stored the data source in S3, and when I query it in Athena and count the total number of rows, I get more rows than are present in the CSV file stored in S3.
I have also given a separate path for the Athena query results, i.e. different from the S3 folder path of the data source.
Please help me with this: why is Athena giving me extra rows with unknown values in them, creating discrepancies in the data?
Please find below the query I wrote to create the table in Athena:
athena_client.start_query_execution(
    QueryString='create database cms_data',
    ResultConfiguration={'OutputLocation': 's3://cms-dashboard-automation/Athenaoutput/'})

# Tables created for Athena
context = {'Database': 'cms_data'}
athena_client.start_query_execution(QueryString='''CREATE EXTERNAL TABLE IF NOT EXISTS `cms_data`.`mpf_data` (
`State` String,
`County` String,
`Org_Name` String,
`Contract_ID` String,
`Plan_ID` double,
`Segment_ID` double,
`Plan_Type_Desc` String,
`Contract_Year` double,
`Category_Name` String,
`Service_Name` String,
`Limit_Flag` double,
`Authorization_Flag` double,
`Referral_Flag` double,
`Network_Description` String,
`Cost_Share` String )
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe'
WITH SERDEPROPERTIES (
'serialization.format' = ',',
'field.delim' = ','
) LOCATION 's3://cms-dashboard-automation/MPF_Data/'
TBLPROPERTIES ('has_encrypted_data'='false');
''',QueryExecutionContext = context,ResultConfiguration={'OutputLocation': 's3://cms-dashboard-automation/Athenaoutput/'})
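One thing worth checking: LazySimpleSerDe splits on every comma and newline, so if the CSV has quoted fields containing embedded commas or line breaks, Athena will see extra rows and shifted, unknown values. A sketch of the same DDL using OpenCSVSerde, which honors quoting (assuming the file is a standard double-quoted CSV; OpenCSVSerde reads every column as string, so the numeric columns would need casting at query time):

CREATE EXTERNAL TABLE IF NOT EXISTS `cms_data`.`mpf_data` (
  `State` string,
  `County` string,
  `Org_Name` string,
  `Contract_ID` string,
  `Plan_ID` string,
  `Segment_ID` string,
  `Plan_Type_Desc` string,
  `Contract_Year` string,
  `Category_Name` string,
  `Service_Name` string,
  `Limit_Flag` string,
  `Authorization_Flag` string,
  `Referral_Flag` string,
  `Network_Description` string,
  `Cost_Share` string )
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.OpenCSVSerde'
WITH SERDEPROPERTIES (
  'separatorChar' = ',',
  'quoteChar' = '"'
)
LOCATION 's3://cms-dashboard-automation/MPF_Data/'
TBLPROPERTIES ('has_encrypted_data'='false', 'skip.header.line.count'='1');

Also make sure no other files (for example, old query results) sit under s3://cms-dashboard-automation/MPF_Data/, since Athena reads every object under the LOCATION prefix.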
I am trying to create an empty table, containing some columns with fixed datatypes, in S3 with a command launched in Athena, but it throws the following error:
line 11:1: mismatched input 'PARTITIONED'. Expecting: 'COMMENT', 'WITH', <EOF>
The query I'm executing is the following:
CREATE TABLE IF NOT EXISTS boards_raw_fields_v1 (
"uuid" bigint,
"source" string,
"raw_company_name" string,
"raw_contract_type" string,
"raw_employment_type" string,
"raw_working_hours_type" string,
"raw_all_locations" array<string>,
"raw_categories" array<string>,
"raw_industry" string)
PARTITIONED BY (
"year" string,
"month" string,
"day" string,
"hour" string,
"version" bigint)
ROW FORMAT SERDE
'org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe'
STORED AS INPUTFORMAT
'org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat'
OUTPUTFORMAT
'org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat'
LOCATION
's3://my-data/datalake-raw/boards/raw-fields/v1'
TBLPROPERTIES (
'classification'='parquet',
'compressionType'='snappy',
'projection.enabled'='false',
'typeOfData'='file')
In the URI:
s3://my-data/datalake-raw/boards/raw-fields/v1
IMPORTANT NOTE: All the folders currently exist except the last one, v1.
What am I doing wrong in the process of creating the table?
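The error suggests Athena is parsing this as a CTAS statement: plain CREATE TABLE goes through the Presto-style grammar (which only allows COMMENT, WITH, or end-of-statement after the column list), while Hive-style clauses such as PARTITIONED BY, ROW FORMAT, and LOCATION require CREATE EXTERNAL TABLE. Hive DDL also quotes identifiers with backticks rather than double quotes. A sketch of the same statement rewritten along those lines:

CREATE EXTERNAL TABLE IF NOT EXISTS boards_raw_fields_v1 (
  `uuid` bigint,
  `source` string,
  `raw_company_name` string,
  `raw_contract_type` string,
  `raw_employment_type` string,
  `raw_working_hours_type` string,
  `raw_all_locations` array<string>,
  `raw_categories` array<string>,
  `raw_industry` string)
PARTITIONED BY (
  `year` string,
  `month` string,
  `day` string,
  `hour` string,
  `version` bigint)
ROW FORMAT SERDE
  'org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe'
STORED AS INPUTFORMAT
  'org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat'
OUTPUTFORMAT
  'org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat'
LOCATION
  's3://my-data/datalake-raw/boards/raw-fields/v1'
TBLPROPERTIES (
  'classification'='parquet',
  'compressionType'='snappy',
  'projection.enabled'='false',
  'typeOfData'='file');

The missing v1 folder should not matter: S3 has no real folders, and Athena simply treats a non-existent prefix as an empty table.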
I am trying to read a Parquet file and dump it into Postgres. One of the columns in the Postgres table is of JSONB datatype, while in Parquet it is a String:
val parquetDF = session.read.parquet("s3a://test/ovd").selectExpr("id", "topic", "update_id", "blob")
parquetDF.write.format("jdbc")
.option("driver", "org.postgresql.Driver")
.option("url", "jdbc:postgresql://localhost:5432/db_metamorphosis?binaryTransfer=true&stringtype=unspecified")
.option("dbtable", "entitlements.general")
.option("user", "mdev")
.option("password", "")
.option("stringtype", "unspecified")
.mode(SaveMode.Append)
.save()
And it fails with this error:
Caused by: org.postgresql.util.PSQLException: ERROR: column "blob" is of type jsonb but expression is of type character
Hint: You will need to rewrite or cast the expression.
Position: 85
at org.postgresql.core.v3.QueryExecutorImpl.receiveErrorResponse(QueryExecutorImpl.java:2270)
at org.postgresql.core.v3.QueryExecutorImpl.processResults(QueryExecutorImpl.java:1998)
... 16 more
Someone on SO suggested putting stringtype=unspecified in the URL so that Postgres decides the datatype for strings, but it does not seem to be working.
<scala.major.version>2.12</scala.major.version>
<scala.version>${scala.major.version}.8</scala.version>
<spark.version>2.4.0</spark.version>
<postgres.version>9.4-1200-jdbc41</postgres.version>
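If stringtype=unspecified alone does not help, a workaround often suggested for this exact error is to define an implicit cast on the Postgres side, so the server coerces incoming varchar values to jsonb. A sketch (run once against the target database as a superuser; note this changes cast behavior database-wide, so weigh it carefully):

-- PostgreSQL: let the server implicitly coerce varchar to jsonb on INSERT
-- (assumption: a database-wide implicit cast is acceptable here)
CREATE CAST (varchar AS jsonb) WITH INOUT AS IMPLICIT;

It may also be worth upgrading the driver: 9.4-1200-jdbc41 is quite old, and a recent org.postgresql:postgresql 42.x artifact is the usual pairing with Spark 2.4.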
This is my second attempt at using a SerDe. The first one worked quite well, but now I'm really struggling.
I got an XML of this structure:
This is the Hive table I created
CREATE TABLE raw_abc.text_abc
(
publicationid string,
parentid string,
id string,
level string,
usertypeid string,
name string,
assetcrossreferences_ordered string,
assetcrossreferences MAP<string, string>,
attributenames_ordered string,
attributenames map<string,string>,
seo_ordered string,
seo MAP<string, string>
)
ROW FORMAT SERDE 'com.ibm.spss.hive.serde2.xml.XmlSerDe'
WITH SERDEPROPERTIES (
"column.xpath.publicationid"="/ST:ECC-HierarchyMessage/#PublicationID",
"column.xpath.parentid"="/ST:ECC-HierarchyMessage/Product/#ParentID",
"column.xpath.id"="/ST:ECC-HierarchyMessage/Product/#ID",
"column.xpath.level"="/ST:ECC-HierarchyMessage/Product/#Level",
"column.xpath.usertypeid"="/ST:ECC-HierarchyMessage/Product/#UserTypeID",
"column.xpath.name"="/ST:ECC-HierarchyMessage/Product/#Name",
"column.xpath.assetcrossreferences_ordered"="/ST:ECC-HierarchyMessage/Product/AssetCrossReferences/#Ordered",
"column.xpath.assetcrossreferences"="/ST:ECC-HierarchyMessage/Product/AssetCrossReferences/AssetCrossReference",
"column.xpath.attributenames_ordered"="/ST:ECC-HierarchyMessage/Product/AttributeNames/#Ordered",
"column.xpath.attributenames"="/ST:ECC-HierarchyMessage/Product/AttributeNames/#Ordered",
"column.xpath.seo_ordered"="/ST:ECC-HierarchyMessage/Product/SEO/#Ordered",
"column.xpath.seo"="/ST:ECC-HierarchyMessage/Product/SEO"
)
STORED AS
INPUTFORMAT 'com.ibm.spss.hive.serde2.xml.XmlInputFormat'
OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.IgnoreKeyTextOutputFormat'
location 's3a://ec-abc-dev/inbound/abc/abc/'
TBLPROPERTIES (
"xmlinput.start"="<ST:ECC-HierarchyMessage>",
"xmlinput.end"="</ST:ECC-HierarchyMessage>"
)
;
The table is created successfully; however, when I try select * from raw_abc.text_abc, I get no records in return.
Any idea what's wrong here? I've spent the last 2 days trying to figure it out with no luck.
Thanks,
G
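A likely culprit with this SerDe: XmlInputFormat matches xmlinput.start as a literal byte sequence. If the root element in the files carries attributes (for example a namespace declaration such as xmlns:ST="..."), the literal <ST:ECC-HierarchyMessage> never appears in the data and every file yields zero records. A sketch of the usual workaround, dropping the closing angle bracket so the match succeeds on the tag prefix (assumption: the real root tag does carry attributes):

TBLPROPERTIES (
  -- match the opening tag by prefix so attributes don't break the literal match
  "xmlinput.start"="<ST:ECC-HierarchyMessage",
  "xmlinput.end"="</ST:ECC-HierarchyMessage>"
);

Separately, column.xpath.attributenames points at /ST:ECC-HierarchyMessage/Product/AttributeNames/#Ordered, the same XPath as attributenames_ordered; a MAP column presumably needs the repeating child elements rather than that attribute.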
I am very new here. I am trying to run the following code on my Cloudera QuickStart VM.
CREATE TABLE apache_common_log (
host STRING,
identity STRING,
user STRING,
time STRING,
request STRING,
status STRING,
size STRING
)
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.RegexSerDe'
WITH SERDEPROPERTIES (
"input.regex" = "([^ ]*) ([^ ]*) ([^ ]*) (-|\\[[^\\]]*\\]) ([^ \"]*|\"
[^\"]*\") (-|[0-9]*) (-|[0-9]*)",
"output.format.string" = "%1$s %2$s %3$s %4$s %5$s %6$s %7$s"
)
STORED AS TEXTFILE;
but I got this error:
failed: execution error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask, Cannot validate serde: org.apache.hadoop.hive.serde2.RegexSerde
I did some research: all the fields are STRING, and I have added the jars
/usr/lib/hive/lib/hive-contrib.jar
/usr/lib/hive/lib/hive-serde.jar
/usr/lib/hive/lib/hive-common.jar
but it still didn't work.
I really need some help! Any input will be appreciated!
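One detail stands out: the error names org.apache.hadoop.hive.serde2.RegexSerde with a lowercase "d", but the class is RegexSerDe, and Java class lookup is case-sensitive, so Hive cannot validate a misspelled name. If the built-in serde still fails, the hive-contrib variant (the one documented to support output.format.string) is a reasonable fallback. A minimal sketch, assuming hive-contrib.jar sits where the Cloudera VM usually ships it:

-- class names are case-sensitive: RegexSerDe, not RegexSerde
ADD JAR /usr/lib/hive/lib/hive-contrib.jar;

CREATE TABLE apache_common_log (
  host STRING,
  identity STRING,
  `user` STRING,    -- backticked: "user" collides with a reserved word on some Hive versions
  `time` STRING,
  request STRING,
  status STRING,
  size STRING
)
ROW FORMAT SERDE 'org.apache.hadoop.hive.contrib.serde2.RegexSerDe'
WITH SERDEPROPERTIES (
  "input.regex" = "([^ ]*) ([^ ]*) ([^ ]*) (-|\\[[^\\]]*\\]) ([^ \"]*|\"[^\"]*\") (-|[0-9]*) (-|[0-9]*)",
  "output.format.string" = "%1$s %2$s %3$s %4$s %5$s %6$s %7$s"
)
STORED AS TEXTFILE;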