JSON to Athena table gives 0 results - SQL

I have a JSON file that looks like this. No nesting.
[{"id": [1984262,1984260]}]
I want to create a table in Athena using SQL such that I have a column "id" and each row in that column contains one value from the array. Something like this:
id
1984262
1984260
What I tried
CREATE EXTERNAL TABLE table1 (
id string
)
ROW FORMAT SERDE 'org.apache.hive.hcatalog.data.JsonSerDe'
LOCATION 's3://data-bucket/data.json';
and
CREATE EXTERNAL TABLE table2 (
id array<string>
)
ROW FORMAT SERDE 'org.apache.hive.hcatalog.data.JsonSerDe'
LOCATION 's3://data-bucket/data.json';
and
CREATE EXTERNAL TABLE table2 (
id array<bigint>
)
ROW FORMAT SERDE 'org.apache.hive.hcatalog.data.JsonSerDe'
LOCATION 's3://data-bucket/data.json';
When I preview the table I see empty rows with absolutely no data. Please help.

Long story short: your JSON file needs to be compliant with the JSON SerDe.
To query JSON data with Athena you need to define a JSON (de)serializer. You chose the Hive JSON SerDe.
https://docs.aws.amazon.com/athena/latest/ug/json-serde.html
Now your data needs to be compliant with that serializer. For the Hive JSON SerDe that means each line must be a single-line JSON object that corresponds to one record. For you that would mean:
{ "id" : 1984262 }
{ "id" : 1984260 }
and the corresponding table definition would be
CREATE EXTERNAL TABLE table1 (
id bigint
)
ROW FORMAT SERDE 'org.apache.hive.hcatalog.data.JsonSerDe'
-- LOCATION must point to the S3 folder that contains the file, not to the file itself
LOCATION 's3://data-bucket/';
https://github.com/rcongiu/Hive-JSON-Serde/blob/develop/README.md
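If you would rather keep the array, a possible alternative (a sketch on my part, not something from the linked docs) is to write the file as one JSON object per line with the array left intact and flatten it at query time with UNNEST; the table name table_ids below is made up:
-- Assumes the file was rewritten without the outer brackets, e.g.
-- {"id": [1984262, 1984260]}   (one object per line)
CREATE EXTERNAL TABLE table_ids (
id array<bigint>
)
ROW FORMAT SERDE 'org.apache.hive.hcatalog.data.JsonSerDe'
LOCATION 's3://data-bucket/';
-- Flatten the array into one row per value at query time.
SELECT flat_id AS id
FROM table_ids
CROSS JOIN UNNEST(id) AS t(flat_id);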

Related

Hive table from HBase with a column containing Avro

I was able to create an external Hive table with just one column containing Avro data stored in HBase, using the following query:
CREATE EXTERNAL TABLE test_hbase_avro
ROW FORMAT SERDE 'org.apache.hadoop.hive.hbase.HBaseSerDe'
STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
WITH SERDEPROPERTIES (
"hbase.columns.mapping" = ":key,familyTest:columnTest",
"familyTest.columnTest.serialization.type" = "avro",
"familyTest.columnTest.avro.schema.url" = "hdfs://path/person.avsc")
TBLPROPERTIES (
"hbase.table.name" = "otherTest",
"hbase.mapred.output.outputtable" = "hbase_avro_table",
"hbase.struct.autogenerate"="true");
What I wish to do is create a table with the same Avro column plus other columns containing strings or integers, but I was not able to do that and didn't find any example. Can anyone help me? Thank you

HBase scan showing hex characters for special characters

I inserted data into an HBase table through a Hive external table.
The Hive table contains basically 3 columns:
id - string
identifier - map<string,string>
src - string
The map data for the identifier column is present in the Hive table. Sample map data:
{"CUSTID":"CUST4302109","LYLT":"44302109"}
The data was inserted into the HBase table, but while fetching it through the scan command:
O2008031353044301300 column=INTR:IDNFS-string, timestamp=1626515550906, value=CUSTID\x03CUST4301300\x02\x03\x02LYLT\x0344301300
Hex characters appear instead of the special characters.
I used the below-mentioned properties while creating the HBase Hive external table:
ROW FORMAT SERDE
'org.apache.hadoop.hive.hbase.HBaseSerDe'
STORED BY
'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
WITH SERDEPROPERTIES (
'hbase.columns.mapping'=':key,INTR:CID,INTR:IDNFS-string,INTR:SRC',
'hbase.table.default.storage.type'='string',
'serialization.format'='1')
How do I get the actual special characters?

send data from stage to multi column table in snowflake

I have an internal named stage where JSON files are stored, and from there I want to load them into a Snowflake table. The structure of the destination table is as follows:
file_name (string)
load_date (timestamp)
data (variant)
I am using the following query to move the data from the stage to the table:
copy into tableName (data) from @stagename/filename.json;
But the above query only populates the data column; what I want is to insert the timestamp and filename too. Any idea what changes I need to make to the query? Thanks
You need to use a COPY statement with a transformation - documentation here. When you use that method you can query the metadata of the files to get the filename, row number etc - documentation for that here.
Example file filename.json uploaded to an internal stage called stagename:
[{"name": "simon"},{"name": "jason"}, {"name": "jessica"}]
SQL to create and load the table:
-- Create example table first with 3 columns
create or replace transient table test_table
(
file_name varchar,
load_date timestamp,
data variant
);
-- Load with transformation:
copy into test_table (file_name, load_date, data) from (
select
metadata$filename,
current_timestamp,
f.$1
from @stagename/filename.json f
)
file_format = (
type = json
strip_outer_array = true
)
force=true
;
Results:
+-------------+-----------------------------+-----------------------+
|FILE_NAME |LOAD_DATE |DATA |
+-------------+-----------------------------+-----------------------+
|filename.json|2021-07-16 08:56:24.075000000|{"name": "simon"} |
|filename.json|2021-07-16 08:56:24.075000000|{"name": "jason"} |
|filename.json|2021-07-16 08:56:24.075000000|{"name": "jessica"} |
+-------------+-----------------------------+-----------------------+
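Once loaded, fields can be read out of the variant column with the usual colon/cast syntax; for example (a hypothetical follow-up query, not part of the answer above):
select file_name,
       data:name::string as name
from test_table;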

mapping JSON object stored in HBase to struct<Array<..>> Hive external table

I have an HBase table that contains a column in JSON format. So, I want to create a Hive external table that contains a struct<array<..>> type.
HBase table named smms:
column name: nodeid, value: "4545781751" in string format
column name: events, in JSON format
value: [{"id":12542, "status":"true", ..},{"id":"1477", "status":"false", ..}]
Hive external table:
Create external table msg (
key INT,
nodeid STRING,
events ARRAY<STRUCT<id:INT, status:STRING>>
)
STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
WITH SERDEPROPERTIES ("hbase.columns.mapping" = ":key,data:nodeid,data:events") TBLPROPERTIES ("hbase.table.name" = "smms");
The Hive query select * from msg; returns the following result:
nodeid : 4545781751
events : NULL
Thanks
The HBaseStorageHandler (de)serializer only supports string and binary fields: https://cwiki.apache.org/confluence/display/Hive/HBaseIntegration
What you store in HBase is actually a string (which indeed contains JSON), but you can't map it to a complex Hive type.
The solution would be to define events as a string, and to export the data to another Hive table using the Hive JSON deserializer: https://github.com/rcongiu/Hive-JSON-Serde
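A minimal sketch of that first step, reusing the column names from the question (msg_raw is a made-up table name, and the get_json_object call at the end is just one possible way to read fields back out of the string, an assumption on my part rather than something from the answer above):
-- Map the JSON column as a plain string, since the HBase storage handler
-- cannot populate complex Hive types from it.
CREATE EXTERNAL TABLE msg_raw (
key INT,
nodeid STRING,
events STRING
)
STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
WITH SERDEPROPERTIES ("hbase.columns.mapping" = ":key,data:nodeid,data:events")
TBLPROPERTIES ("hbase.table.name" = "smms");
-- Example: pull the first element's id out of the JSON array held in events.
SELECT nodeid, get_json_object(events, '$[0].id') AS first_id
FROM msg_raw;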

I have a json file and I want to create Hive external table over it but with more descriptive field names

I have a JSON file and I want to create a Hive external table over it, but with more descriptive field names. Basically, I want to map the less descriptive field names present in the JSON file to more descriptive fields in the Hive external table.
e.g.
{"field1":"data1","field2":100}
Hive Table:
Create External Table my_table (Name string, Id int)
ROW FORMAT SERDE 'org.apache.hadoop.hive.contrib.serde2.JsonSerde'
LOCATION '/path-to/my_table/';
Where Name points to field1 and Id points to field2.
Thanks!!
You can use this SerDe that allows custom mappings between the JSON data and the hive columns: https://github.com/rcongiu/Hive-JSON-Serde
See in particular this part: https://github.com/rcongiu/Hive-JSON-Serde#mapping-hive-keywords
so, in your case, you'd need to do something like
CREATE EXTERNAL TABLE my_table (name STRING, id INT)
ROW FORMAT SERDE 'org.openx.data.jsonserde.JsonSerDe'
WITH SERDEPROPERTIES (
"mapping.name" = "field1",
"mapping.id" = "field2" )
LOCATION '/path-to/my_table/';
Note that Hive column names are case-insensitive, while JSON attributes are case-sensitive.
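For illustration, with the single example record {"field1":"data1","field2":100} stored under /path-to/my_table/, the remapped columns could then be queried by their Hive names (expected output shown as comments, assuming just that one record):
SELECT name, id FROM my_table;
-- name    id
-- data1   100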