I have a JSON file that looks like this (no nesting):
[{"id": [1984262,1984260]}]
I want to create a table in Athena using SQL such that I have a column "id", and each row in that column contains one value from the array. Something like this:
id
1984262
1984260
What I tried
CREATE EXTERNAL TABLE table1 (
id string
)
ROW FORMAT SERDE 'org.apache.hive.hcatalog.data.JsonSerDe'
LOCATION 's3://data-bucket/data.json';
and
CREATE EXTERNAL TABLE table2 (
id array<string>
)
ROW FORMAT SERDE 'org.apache.hive.hcatalog.data.JsonSerDe'
LOCATION 's3://data-bucket/data.json';
and
CREATE EXTERNAL TABLE table2 (
id array<bigint>
)
ROW FORMAT SERDE 'org.apache.hive.hcatalog.data.JsonSerDe'
LOCATION 's3://data-bucket/data.json';
When I preview the table I see empty rows with absolutely no data. Please help.
Long story short: your JSON file needs to be compliant with the JSON SerDe.
To query JSON data with Athena you need to define a JSON (de)serializer. You chose the Hive JSON SerDe:
https://docs.aws.amazon.com/athena/latest/ug/json-serde.html
Now your data needs to be compliant with that serializer. For the Hive JSON SerDe that means each line must be a single-line JSON object corresponding to one record (so no enclosing [...] array, either). For you that would mean:
{ "id" : 1984262 }
{ "id" : 1984260 }
and the corresponding table definition would be (note that LOCATION must point to the S3 folder that contains the file, not to the file itself):
CREATE EXTERNAL TABLE table1 (
id bigint
)
ROW FORMAT SERDE 'org.apache.hive.hcatalog.data.JsonSerDe'
LOCATION 's3://data-bucket/';
https://github.com/rcongiu/Hive-JSON-Serde/blob/develop/README.md
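If you would rather keep the array form, a single-line record like {"id": [1984262,1984260]} (without the enclosing [ ]) together with the array<bigint> definition from the question should also load, and Athena can then flatten the array at query time with UNNEST. A sketch:

SELECT t.id
FROM table2
CROSS JOIN UNNEST(id) AS t (id);

Each array element comes back as its own row, which is exactly the id / 1984262 / 1984260 output asked for.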
Related
I was able to create an external Hive table with just one column containing Avro data stored in HBase through the following query:
CREATE EXTERNAL TABLE test_hbase_avro
ROW FORMAT SERDE 'org.apache.hadoop.hive.hbase.HBaseSerDe'
STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
WITH SERDEPROPERTIES (
"hbase.columns.mapping" = ":key,familyTest:columnTest",
"familyTest.columnTest.serialization.type" = "avro",
"familyTest.columnTest.avro.schema.url" = "hdfs://path/person.avsc")
TBLPROPERTIES (
"hbase.table.name" = "otherTest",
"hbase.mapred.output.outputtable" = "hbase_avro_table",
"hbase.struct.autogenerate"="true");
What I wish to do is create a table from the same Avro file plus other columns containing strings or integers, but I was not able to do that and didn't find any example. Can anyone help me? Thank you
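A hypothetical, untested sketch of one direction (the extra qualifiers familyTest:name and familyTest:age are made up for illustration): additional plain columns can in principle be listed in hbase.columns.mapping alongside the Avro-backed one, keeping the per-column Avro properties only for that column. Whether this combines cleanly with hbase.struct.autogenerate may depend on the Hive version:

-- Hypothetical: name and age are extra string/int qualifiers in the same family
CREATE EXTERNAL TABLE test_hbase_avro_mixed
ROW FORMAT SERDE 'org.apache.hadoop.hive.hbase.HBaseSerDe'
STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
WITH SERDEPROPERTIES (
"hbase.columns.mapping" = ":key,familyTest:columnTest,familyTest:name,familyTest:age",
"familyTest.columnTest.serialization.type" = "avro",
"familyTest.columnTest.avro.schema.url" = "hdfs://path/person.avsc")
TBLPROPERTIES (
"hbase.table.name" = "otherTest",
"hbase.mapred.output.outputtable" = "hbase_avro_table",
"hbase.struct.autogenerate" = "true");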
Inserted data into an HBase table through a Hive external table.
The Hive table basically contains 3 columns:
id - String
identifier - map<string,string>
src - string
Inserted the table into the HBase table.
Map data is present in the Hive table for the identifier column.
Sample map data:
{"CUSTID":"CUST4302109","LYLT":"44302109"}
Data was inserted into the HBase table. While fetching it through the scan command:
O2008031353044301300 column=INTR:IDNFS-string, timestamp=1626515550906, value=CUSTID\x03CUST4301300\x02\x03\x02LYLT\x0344301300
Hex characters appear instead of the special characters.
The properties below were used while creating the HBase Hive external table:
ROW FORMAT SERDE
'org.apache.hadoop.hive.hbase.HBaseSerDe'
STORED BY
'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
WITH SERDEPROPERTIES (
'hbase.columns.mapping' = ':key,INTR:CID,INTR:IDNFS-string,INTR:SRC',
'hbase.table.default.storage.type' = 'string',
'serialization.format' = '1')
How do I get the actual special characters?
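For context (not from the original thread, but consistent with the scan output): \x02 and \x03 are Hive's default collection-item and map-key delimiters, so this is simply how Hive serializes a map<string,string> into a single HBase cell; Hive itself reads the value back correctly even though the hbase shell shows escaped bytes. If a human-readable cell is required, one option is to flatten the map into a JSON-style string before inserting. A sketch, with hypothetical table names hive_table and hbase_ext and the HBase-backed column declared as a plain string:

-- Hypothetical sketch: build '{"CUSTID":"...","LYLT":"..."}' from the map
INSERT INTO TABLE hbase_ext
SELECT id,
       concat('{', concat_ws(',', collect_list(concat('"', k, '":"', v, '"'))), '}'),
       src
FROM hive_table
LATERAL VIEW explode(identifier) t AS k, v
GROUP BY id, src;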
I have an internal named stage where JSON files are stored, and from there I want to load them into a Snowflake table. The structure of the destination table is as follows:
file_name (string)
load_date (timestamp)
data (variant)
I am using the following query to move the data from the stage to the table:
copy into tableName (data) from @stagename/filename.json;
But the above query only populates the data column; what I want is to insert the timestamp and filename too. Any idea what changes I need to make in the query? Thanks
You need to use a COPY statement with a transformation - documentation here. When you use that method you can query the metadata of the files to get the filename, row number etc - documentation for that here.
Example file filename.json uploaded to an internal stage called stagename:
[{"name": "simon"},{"name": "jason"}, {"name": "jessica"}]
SQL to create and load the table:
-- Create example table first with 3 columns
create or replace transient table test_table
(
file_name varchar,
load_date timestamp,
data variant
);
-- Load with transformation:
copy into test_table (file_name, load_date, data) from (
select
metadata$filename,
current_timestamp,
f.$1
from @stagename/filename.json f
)
file_format = (
type = json
strip_outer_array = true
)
force=true
;
Results:
+-------------+-----------------------------+-----------------------+
|FILE_NAME |LOAD_DATE |DATA |
+-------------+-----------------------------+-----------------------+
|filename.json|2021-07-16 08:56:24.075000000|{"name": "simon"} |
|filename.json|2021-07-16 08:56:24.075000000|{"name": "jason"} |
|filename.json|2021-07-16 08:56:24.075000000|{"name": "jessica"} |
+-------------+-----------------------------+-----------------------+
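If a per-file row number is wanted as well, the stage metadata exposes one (METADATA$FILE_ROW_NUMBER). A sketch, assuming a named file format my_json_format is created first:

create or replace file format my_json_format type = json strip_outer_array = true;

select metadata$filename, metadata$file_row_number, t.$1
from @stagename/filename.json (file_format => 'my_json_format') t;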
I have an HBase table that contains a column in JSON format. So, I want to create a Hive external table that contains an array<struct> type.
HBase table named smms:
column name: nodeid, value: "4545781751" in STRING format
column name: events, value in JSON format:
value : [{"id":12542, "status" :"true", ..},{"id":"1477", "status":"false", ..}]
Hive external table:
Create external table msg (
key INT,
nodeid STRING,
events ARRAY<STRUCT<id:INT, status:STRING>>
)
STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
WITH SERDEPROPERTIES ("hbase.columns.mapping" = ":key,data:nodeid,data:events") TBLPROPERTIES ("hbase.table.name" = "smms");
The Hive query select * from msg; returns the following result:
nodeid : 4545781751
events : NULL
Thanks
The HBaseStorageHandler (de)serializer only supports string and binary fields: https://cwiki.apache.org/confluence/display/Hive/HBaseIntegration
What you store in HBase is actually a string (which indeed contains JSON), but you can't map it to a complex Hive type.
The solution would be to define events as a string and to export the data to another Hive table using a Hive JSON deserializer: https://github.com/rcongiu/Hive-JSON-Serde
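A sketch of that two-step approach (untested; msg_raw is a made-up name, and get_json_object is used here instead of a full SerDe just to show the string comes back parseable):

CREATE EXTERNAL TABLE msg_raw (
key INT,
nodeid STRING,
events STRING
)
STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
WITH SERDEPROPERTIES ("hbase.columns.mapping" = ":key,data:nodeid,data:events")
TBLPROPERTIES ("hbase.table.name" = "smms");

-- The JSON text now arrives intact and can be picked apart,
-- or inserted into a second table that uses the JSON SerDe.
SELECT nodeid,
       get_json_object(events, '$[0].id')     AS first_id,
       get_json_object(events, '$[0].status') AS first_status
FROM msg_raw;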
I have a JSON file and I want to create a Hive external table over it, but with more descriptive field names. Basically, I want to map the less descriptive field names present in the JSON file to more descriptive fields in the Hive external table.
e.g.
{"field1":"data1","field2":100}
Hive Table:
Create External Table my_table (Name string, Id int)
ROW FORMAT SERDE 'org.apache.hadoop.hive.contrib.serde2.JsonSerde'
LOCATION '/path-to/my_table/';
Where Name points to field1 and Id points to field2.
Thanks!!
You can use this SerDe that allows custom mappings between the JSON data and the hive columns: https://github.com/rcongiu/Hive-JSON-Serde
See in particular this part: https://github.com/rcongiu/Hive-JSON-Serde#mapping-hive-keywords
So, in your case, you'd need to do something like:
CREATE EXTERNAL TABLE my_table (name STRING, id INT)
ROW FORMAT SERDE 'org.openx.data.jsonserde.JsonSerDe'
WITH SERDEPROPERTIES (
"mapping.name" = "field1",
"mapping.id" = "field2")
LOCATION '/path-to/my_table/';
Note that Hive column names are case-insensitive, while JSON attributes are case-sensitive.
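With the sample record {"field1":"data1","field2":100}, the mapped table should then return the renamed columns:

SELECT name, id FROM my_table;
-- expected: name = 'data1', id = 100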