mapping JSON object stored in HBase to struct<Array<..>> Hive external table - hive

I have an HBase table that contains a column in JSON format, so I want to create a Hive external table with an array-of-struct type.
HBase table named smms:
column name: nodeid, value: "4545781751" in STRING FORMAT
column name: events, in JSON FORMAT
value: [{"id":12542, "status" :"true", ..},{"id":"1477", "status":"false", ..}]
Hive external table:
CREATE EXTERNAL TABLE msg (
key INT,
nodeid STRING,
events ARRAY<STRUCT<id:INT, status:STRING>>
)
STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
WITH SERDEPROPERTIES ("hbase.columns.mapping" = ":key,data:nodeid,data:events")
TBLPROPERTIES ("hbase.table.name" = "smms");
The Hive query select * from msg; returns the following result:
nodeid : 4545781751
events : NULL
Thanks

The HBaseStorageHandler (de)serializer only supports string and binary fields: https://cwiki.apache.org/confluence/display/Hive/HBaseIntegration
What you store in HBase is actually a string (which happens to contain JSON), but you can't map it to a complex Hive type.
The solution is to define events as a STRING column, and then export the data to another Hive table using a Hive JSON SerDe: https://github.com/rcongiu/Hive-JSON-Serde
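The two-step workaround can be sketched as follows; this reuses the table and column names from the question, and the JSON paths assume the events array shown above. As an alternative to a second table, Hive's built-in get_json_object can also parse the string at query time:

```sql
-- Step 1: map the JSON column as a plain string in the HBase-backed table.
CREATE EXTERNAL TABLE msg (
  key INT,
  nodeid STRING,
  events STRING
)
STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
WITH SERDEPROPERTIES ("hbase.columns.mapping" = ":key,data:nodeid,data:events")
TBLPROPERTIES ("hbase.table.name" = "smms");

-- Step 2: parse the JSON string at query time, e.g. pull the first
-- event's fields out of the array with get_json_object:
SELECT nodeid,
       get_json_object(events, '$[0].id')     AS first_id,
       get_json_object(events, '$[0].status') AS first_status
FROM msg;
```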

Related

Json to Athena table gives 0 results

I have a json that looks like this. No nesting.
[{"id": [1984262,1984260]}]
I want to create a table in Athena using SQL such that I have a column "id" and each row in that column contains a value from the array. Something like this:
id
1984262
1984260
What I tried
CREATE EXTERNAL TABLE table1 (
id string
)
ROW FORMAT SERDE 'org.apache.hive.hcatalog.data.JsonSerDe'
LOCATION 's3://data-bucket/data.json';
and
CREATE EXTERNAL TABLE table2 (
id array<string>
)
ROW FORMAT SERDE 'org.apache.hive.hcatalog.data.JsonSerDe'
LOCATION 's3://data-bucket/data.json';
and
CREATE EXTERNAL TABLE table2 (
id array<bigint>
)
ROW FORMAT SERDE 'org.apache.hive.hcatalog.data.JsonSerDe'
LOCATION 's3://data-bucket/data.json';
When I preview the table I see empty rows with absolutely no data. Please help.
Long story short: your JSON file needs to be compliant with the JSON SerDe.
To query JSON data with Athena you need to define a JSON (de)serializer; you chose the Hive JSON SerDe: https://docs.aws.amazon.com/athena/latest/ug/json-serde.html
Your data then needs to be compliant with that serializer. For the Hive JSON SerDe, each line must be a single-line JSON object corresponding to one record. For you that would mean:
{ "id" : 1984262 }
{ "id" : 1984260 }
and the corresponding table definition would be:
CREATE EXTERNAL TABLE table1 (
id bigint
)
ROW FORMAT SERDE 'org.apache.hive.hcatalog.data.JsonSerDe'
LOCATION 's3://data-bucket/';
Note that LOCATION must point to the S3 prefix (folder) containing the file, not to the file itself.
https://github.com/rcongiu/Hive-JSON-Serde/blob/develop/README.md
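If you'd rather keep the array in a single record and still get one row per value, an alternative sketch (assuming the file is first rewritten as one JSON object per line, e.g. {"id": [1984262, 1984260]}; the table name table3 is made up for illustration) declares the column as an array and flattens it with Athena's UNNEST:

```sql
CREATE EXTERNAL TABLE table3 (
  id array<bigint>
)
ROW FORMAT SERDE 'org.apache.hive.hcatalog.data.JsonSerDe'
LOCATION 's3://data-bucket/';

-- Flatten the array into one row per value:
SELECT value AS id
FROM table3
CROSS JOIN UNNEST(id) AS t(value);
```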

Hive table from HBase with a column containing Avro

I was able to create an external Hive table with just one column, containing Avro data stored in HBase, through the following query:
CREATE EXTERNAL TABLE test_hbase_avro
ROW FORMAT SERDE 'org.apache.hadoop.hive.hbase.HBaseSerDe'
STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
WITH SERDEPROPERTIES (
"hbase.columns.mapping" = ":key,familyTest:columnTest",
"familyTest.columnTest.serialization.type" = "avro",
"familyTest.columnTest.avro.schema.url" = "hdfs://path/person.avsc")
TBLPROPERTIES (
"hbase.table.name" = "otherTest",
"hbase.mapred.output.outputtable" = "hbase_avro_table",
"hbase.struct.autogenerate"="true");
What I wish to do is to create a table with the same Avro file plus other columns containing strings or integers, but I was not able to do that and didn't find any example. Can anyone help me? Thank you
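One untested direction, extrapolating only from the per-column SERDEPROPERTIES pattern in the query above, is to list the extra columns in hbase.columns.mapping alongside the Avro column, so that serialization.type applies only to columnTest. The "info:name" and "info:age" qualifiers and the table name below are hypothetical:

```sql
-- Untested sketch: plain columns mapped next to the Avro-backed column.
-- Only familyTest:columnTest is decoded as Avro; the others stay strings/ints.
CREATE EXTERNAL TABLE test_hbase_avro_mixed
ROW FORMAT SERDE 'org.apache.hadoop.hive.hbase.HBaseSerDe'
STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
WITH SERDEPROPERTIES (
  "hbase.columns.mapping" = ":key,familyTest:columnTest,info:name,info:age",
  "familyTest.columnTest.serialization.type" = "avro",
  "familyTest.columnTest.avro.schema.url" = "hdfs://path/person.avsc")
TBLPROPERTIES (
  "hbase.table.name" = "otherTest",
  "hbase.mapred.output.outputtable" = "hbase_avro_table",
  "hbase.struct.autogenerate" = "true");
```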

HBase scan showing hex characters instead of special characters

I inserted data into an HBase table through a Hive external table.
The Hive table contains 3 columns:
id - string
identifier - map<string,string>
src - string
The identifier column holds map data. Sample map data:
{"CUSTID":"CUST4302109","LYLT":"44302109"}
After inserting into the HBase table, fetching the data through the scan command shows:
O2008031353044301300 column=INTR:IDNFS-string, timestamp=1626515550906, value=CUSTID\x03CUST4301300\x02\x03\x02LYLT\x0344301300
Hex characters appear instead of the special characters.
I used the following properties while creating the HBase-backed Hive external table:
ROW FORMAT SERDE
'org.apache.hadoop.hive.hbase.HBaseSerDe'
STORED BY
'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
WITH SERDEPROPERTIES (
'hbase.columns.mapping'=':key,INTR:CID,INTR:IDNFS-string,INTR:SRC',
'hbase.table.default.storage.type'='string',
'serialization.format'='1')
How do I get the actual special characters?

I have a json file and I want to create Hive external table over it but with more descriptive field names

I have a JSON file and I want to create a Hive external table over it, but with more descriptive field names. Basically, I want to map the less descriptive field names in the JSON file to more descriptive column names in the Hive external table.
e.g.
{"field1":"data1","field2":100}
Hive Table:
Create External Table my_table (Name string, Id int)
ROW FORMAT SERDE 'org.apache.hadoop.hive.contrib.serde2.JsonSerde'
LOCATION '/path-to/my_table/';
Where Name points to field1 and Id points to field2.
Thanks!!
You can use this SerDe that allows custom mappings between the JSON data and the hive columns: https://github.com/rcongiu/Hive-JSON-Serde
See in particular this part: https://github.com/rcongiu/Hive-JSON-Serde#mapping-hive-keywords
So, in your case, you'd need to do something like:
CREATE EXTERNAL TABLE my_table (name STRING, id INT)
ROW FORMAT SERDE 'org.openx.data.jsonserde.JsonSerDe'
WITH SERDEPROPERTIES (
"mapping.name" = "field1",
"mapping.id" = "field2")
LOCATION '/path-to/my_table/';
Note that Hive column names are case-insensitive, while JSON attributes are case-sensitive.
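With those mappings in place, queries use the descriptive column names while the SerDe reads field1/field2 from the underlying JSON; for instance, against the sample record {"field1":"data1","field2":100}:

```sql
-- Matches the sample record, returning (name = 'data1', id = 100).
SELECT name, id
FROM my_table
WHERE id = 100;
```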

Create External Hive Table Pointing to HBase Table

I have a table named "HISTORY" in HBase with column family "VDS" and columns ROWKEY, ID, START_TIME, END_TIME, VALUE. I am using the Cloudera Hadoop distribution and want to provide an SQL interface to this HBase table through Impala. To do this, do I need to create a corresponding external table in Hive? If so, how do I create an external Hive table pointing to this HBase table?
Run the following code in Hive Query Editor:
CREATE EXTERNAL TABLE IF NOT EXISTS HISTORY
(
ROWKEY STRING,
ID STRING,
START_TIME STRING,
END_TIME STRING,
VALUE DOUBLE
)
STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
WITH SERDEPROPERTIES
(
"hbase.columns.mapping" = ":key,VDS:ID,VDS:START_TIME,VDS:END_TIME,VDS:VALUE"
)
TBLPROPERTIES("hbase.table.name" = "HISTORY");
Don't forget to refresh Impala metadata after creating the external table, with the following bash command:
echo "INVALIDATE METADATA" | impala-shell;
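Once Impala's metadata is refreshed, the HBase-backed table can be queried like any other; the row key value below is purely illustrative:

```sql
SELECT ROWKEY, ID, START_TIME, END_TIME, VALUE
FROM HISTORY
WHERE ROWKEY = 'row-001';  -- equality on the row key lets HBase serve a point lookup
```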