Creating an infinite table in Hive (avro) - hive

I am new to Hive and Cloudera. I am trying to create a table in Hive from an Avro schema and then load data into it. The code for table creation is below:
CREATE EXTERNAL TABLE newTab3
ROW FORMAT SERDE
'org.apache.hadoop.hive.serde2.avro.AvroSerDe'
STORED AS INPUTFORMAT
'org.apache.hadoop.hive.ql.io.avro.AvroContainerInputFormat'
OUTPUTFORMAT
'org.apache.hadoop.hive.ql.io.avro.AvroContainerOutputFormat'
TBLPROPERTIES (
'avro.schema.literal'='{
"namespace": "namespaceNameTochange",
"type": "record",
"name": "customer",
"fields": [
{ "name": "name","type": "string"},
{ "name": "id","type": "int"}
]
}');
The table is created successfully (the columns are as in the Avro schema). However, the number of rows grows infinitely and all of the values are NULL, even before loading any data into the table.
Could anybody tell me what I am doing wrong here? Thanks in advance.

Related

Get a property from a JSON which is stored in a clob column

I have a CLOB column in my table, in which I store a JSON string. The data in the column is something like below:
{
"date": "2021/11/11",
"name": "test",
"errorCode": "00000",
"type": "test"
}
I want to write a select query to get the name.
Any suggestions?
Assuming you're on 12c or higher, you can extract the value using json_value:
select json_value (
'{"date":"2021/11/11", "name":"test","errorCode":"00000","type":"test"}',
'$.name'
) nm
from dual;
NM
test
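Applied to your own table, the same json_value call works directly on the CLOB column. A minimal sketch, where the table and column names below are assumptions:
-- hypothetical names: my_table / my_clob_column
select json_value(my_clob_column, '$.name') as nm
from my_table;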
Another option is using JSON_TABLE, provided your DB version is 12c+ (12.1.0.2), such as
SELECT jt.*
FROM t,
JSON_TABLE(jscol,
'$'
COLUMNS(
name VARCHAR2(100) PATH '$.name',
"date" VARCHAR2(100) PATH '$.date'
)
) jt
This way, you can pick all the columns listed after the COLUMNS keyword.
Demo
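For reference, a minimal setup for the t / jscol names used in the query above (the CLOB definition and the IS JSON check are assumptions):
create table t (jscol clob check (jscol is json));
insert into t (jscol)
values ('{"date":"2021/11/11", "name":"test","errorCode":"00000","type":"test"}');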

Saving JSON file to SQL Server Database tables

I have a nested JSON file as shown below (where condition and rules can be nested to multiple levels)
{
"condition": "and",
"rules": [
{
"field": "26",
"operator": "=",
"value": "TEST1"
},
{
"field": "36",
"operator": "=",
"value": "TEST2"
},
{
"condition": "or",
"rules": [
{
"field": "2",
"operator": "=",
"value": 100
},
{
"field": "3",
"operator": "=",
"value": 12
},
{
"condition": "or",
"rules": [
{
"field": "12",
"operator": "=",
"value": "CA"
},
{
"field": "12",
"operator": "=",
"value": "AL"
}
]
}
]
}
]
}
I want to save this JSON (the condition and rules fields can be nested to multiple levels) into SQL Server tables and later construct the same JSON from those tables. How can I do this? I am also planning to produce other JSON formats from these tables, which is why I decided to split the JSON into table columns.
I think I need to create a recursive SQL function to do this.
I have created the following tables to save the JSON.
CREATE TABLE [Ruleset]
([RulesetID] [BIGINT] IDENTITY(1, 1) NOT NULL PRIMARY KEY,
[Condition] [VARCHAR](50) NOT NULL,
[ParentRuleSetID] [BIGINT] NULL
);
GO
CREATE TABLE [Rules]
([RuleID] [BIGINT] IDENTITY(1, 1) NOT NULL PRIMARY KEY,
[Fields] [VARCHAR](MAX) NOT NULL,
[Operator] [VARCHAR](MAX) NOT NULL,
[Value] [VARCHAR](MAX) NOT NULL,
[RulesetID] [BIGINT] NULL
FOREIGN KEY REFERENCES [Ruleset](RulesetID)
);
The insert script is as follows:
INSERT INTO [Ruleset] values
('AND',0),
('OR',1),
('OR',2)
INSERT INTO [Rules] values
('26','=','TEST1',1),
('364','=','TEST2',1),
('2','=','100',2),
('3','=','12',2),
('12','=','CA',3),
('12','=','AL',3)
Will these tables be enough? Will I be able to save all the details?
Attaching the values that I have added to these tables manually.
How can I save this JSON to these tables, and later construct the same JSON from them via a stored procedure or queries?
Please provide suggestions and samples!
Actually you can declare the column type as NVARCHAR(MAX) and save the JSON string into it.
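A minimal sketch of that approach (table and column names are assumptions; ISJSON requires SQL Server 2016+):
CREATE TABLE RulesetDocument
(Id BIGINT IDENTITY(1, 1) NOT NULL PRIMARY KEY,
 Payload NVARCHAR(MAX) NOT NULL CHECK (ISJSON(Payload) = 1) -- reject strings that are not valid JSON
);
INSERT INTO RulesetDocument (Payload)
VALUES (N'{"condition":"and","rules":[{"field":"26","operator":"=","value":"TEST1"}]}');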
As JSON is case sensitive, please check your schema definition and sample data. I see a discrepancy between the definition of the tables, their contents, and your JSON.
All scripts tested on MS SQL Server 2016
I used a temporary table variable in this script, but you can do without it. See an example in SQL Fiddle
-- JSON -> hierarchy table
DECLARE @ExpectedJSON NVARCHAR(MAX) = '
{
"condition": "and",
"rules": [
{
"field": "26",
"operator": "=",
"value": "TEST1"
},
{
"field": "36",
"operator": "=",
"value": "TEST2"
},
{
"condition": "or",
"rules": [
{
"field": "2",
"operator": "=",
"value": 100
},
{
"field": "3",
"operator": "=",
"value": 12
},
{
"condition": "or",
"rules": [
{
"field": "12",
"operator": "=",
"value": "CA"
},
{
"field": "12",
"operator": "=",
"value": "AL"
}
]
}
]
}
]
}
'
DECLARE @TempRuleset AS TABLE
(RulesetID BIGINT NOT NULL PRIMARY KEY,
condition VARCHAR(50) NOT NULL,
ParentRuleSetID BIGINT NOT NULL,
RulesJSON NVARCHAR(MAX)
)
;WITH ParseRuleset AS (
SELECT 1 AS RulesetID,
p.condition,
p.rules,
0 AS ParentRuleSetID
FROM OPENJSON(@ExpectedJSON, '$') WITH (
condition VARCHAR(50),
rules NVARCHAR(MAX) AS JSON
) AS p
UNION ALL
SELECT RulesetID + 1,
p.condition,
p.rules,
c.RulesetID AS ParentRuleSetID
FROM ParseRuleset AS c
CROSS APPLY OPENJSON(c.rules) WITH (
condition VARCHAR(50),
rules NVARCHAR(MAX) AS JSON
) AS p
where
p.Rules IS NOT NULL
)
INSERT INTO @TempRuleset (RulesetID, condition, ParentRuleSetID, RulesJSON)
SELECT RulesetID,
condition,
ParentRuleSetID,
rules
FROM ParseRuleset
-- INSERT INTO Ruleset ...
SELECT RulesetID,
condition,
ParentRuleSetID,
RulesJSON
FROM @TempRuleset
-- INSERT INTO Rules ...
SELECT RulesetID,
field,
operator,
value
FROM @TempRuleset tmp
CROSS APPLY OPENJSON(tmp.RulesJSON) WITH (
field VARCHAR(MAX),
operator VARCHAR(MAX),
value VARCHAR(MAX)
) AS p
WHERE p.field IS NOT NULL
SQL Fiddle
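If you want to materialize the parsed rows instead of only selecting them, a rough sketch (run in the same batch as the script above, assuming the Ruleset/Rules definitions used in the next script and that you keep the generated RulesetID values):
SET IDENTITY_INSERT Ruleset ON;
INSERT INTO Ruleset (RulesetID, condition, ParentRuleSetID)
SELECT RulesetID, condition, ParentRuleSetID
FROM @TempRuleset;
SET IDENTITY_INSERT Ruleset OFF;
INSERT INTO Rules (field, operator, value, RulesetID)
SELECT p.field, p.operator, p.value, tmp.RulesetID
FROM @TempRuleset tmp
CROSS APPLY OPENJSON(tmp.RulesJSON) WITH (
field VARCHAR(MAX),
operator VARCHAR(MAX),
value VARCHAR(MAX)
) AS p
WHERE p.field IS NOT NULL;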
Hierarchy tables -> JSON:
CREATE TABLE Ruleset
(RulesetID BIGINT IDENTITY(1, 1) NOT NULL PRIMARY KEY,
condition VARCHAR(50) NOT NULL,
ParentRuleSetID BIGINT NULL
);
GO
CREATE TABLE Rules
(RuleID BIGINT IDENTITY(1, 1) NOT NULL PRIMARY KEY,
field VARCHAR(MAX) NOT NULL,
operator VARCHAR(MAX) NOT NULL,
value VARCHAR(MAX) NOT NULL,
RulesetID BIGINT NULL FOREIGN KEY REFERENCES Ruleset(RulesetID)
);
INSERT INTO Ruleset values
('and',0),
('or',1),
('or',2)
INSERT INTO Rules values
('26','=','TEST1',1),
('36','=','TEST2',1),
('2','=','100',2),
('3','=','12',2),
('12','=','CA',3),
('12','=','AL',3)
-- hierarchy table -> JSON
;WITH GetLeafLevel AS
(
SELECT Ruleset.RulesetID,
Ruleset.condition,
Ruleset.ParentRuleSetID,
1 AS lvl,
( SELECT field,
operator,
value
FROM Rules
WHERE Rules.RulesetID = Ruleset.RulesetID
FOR JSON AUTO, WITHOUT_ARRAY_WRAPPER
) AS JSON_Rules
FROM Ruleset
WHERE ParentRuleSetID = 0
UNION ALL
SELECT Ruleset.RulesetID,
Ruleset.condition,
Ruleset.ParentRuleSetID,
GetLeafLevel.lvl + 1,
( SELECT field,
operator,
value
FROM Rules
WHERE Rules.RulesetID = Ruleset.RulesetID
FOR JSON AUTO, WITHOUT_ARRAY_WRAPPER
)
FROM Ruleset
INNER JOIN GetLeafLevel ON Ruleset.ParentRuleSetID = GetLeafLevel.RulesetID
),
-- SELECT * FROM GetLeafLevel -- debug
ConcatReverseOrder AS
(
SELECT GetLeafLevel.*,
CONCAT('{"condition":"',
GetLeafLevel.condition,
'","rules":[',
GetLeafLevel.JSON_Rules,
']}'
) AS js
FROM GetLeafLevel
WHERE GetLeafLevel.lvl = (SELECT MAX(lvl) FROM GetLeafLevel)
UNION ALL
SELECT GetLeafLevel.*,
CONCAT('{"condition":"',
GetLeafLevel.condition,
'","rules":[',
GetLeafLevel.JSON_Rules,
',',
ConcatReverseOrder.js,
']}'
) AS js
FROM GetLeafLevel
INNER JOIN ConcatReverseOrder ON GetLeafLevel.RuleSetID = ConcatReverseOrder.ParentRuleSetID
)
-- SELECT * FROM ConcatReverseOrder -- debug
SELECT js
FROM ConcatReverseOrder
WHERE ParentRuleSetID = 0
SQL Fiddle
I feel like I would need to know more about how you plan to use the data to answer this. My heart is telling me that there is something wrong about storing this information in MSSQL, if not wrong, problematic.
If I had to do it, I would convert these conditions into a matrix lookup table of rotatable events within your branch, so for each conceivable logic branch you could create a row in a lookup to evaluate this.
Depending on your required output / feature set, you can either do something like the above or just throw everything into an NVARCHAR as suggested by rkortekaas.
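One possible reading of that lookup idea, purely as a sketch with assumed names: every leaf comparison is flattened into one row, keyed by the logic branch it belongs to.
CREATE TABLE BranchLookup
(BranchID BIGINT NOT NULL, -- one ID per conceivable logic branch
 Field VARCHAR(50) NOT NULL,
 Operator VARCHAR(10) NOT NULL,
 Value VARCHAR(100) NOT NULL
);
-- Branch 1: field 26 = TEST1 AND field 36 = TEST2 AND field 2 = 100
INSERT INTO BranchLookup VALUES
(1, '26', '=', 'TEST1'),
(1, '36', '=', 'TEST2'),
(1, '2', '=', '100');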
Your use case really does seem a perfect match for a NoSql Option such as MongoDb, Azure Table storage, or CosmosDB (CosmosDB can be pricey if you don't know your way round it).
Extract from MongoDB page:
In MongoDB, data is stored as documents. These documents are stored in MongoDB in JSON (JavaScript Object Notation) format. JSON documents support embedded fields, so related data and lists of data can be stored with the document instead of an external table.
However, from here on I'm going to assume you are tied to SQL Server for other reasons.
You have stated that you are just putting the document in and getting the same document out, so it doesn't make sense to go to the effort of splitting out all the fields.
SQL Server is much better at handling text fields than it used to be IMO.
Systems I've worked on before have had the following columns (I would write the SQL, but I'm not at my dev machine):
Id [Primary Key, Integer, Incrementing index]
UserId [a Foreign Key to what this relates to - probably not 'user' in your case!]
Value [nvarchar(1000) contains the json as a string]
The lookup is easily done based on the foreign key.
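A rough sketch of that layout, since the columns above are only described in prose (sizes and names are assumptions):
CREATE TABLE JsonDocument
(Id INT IDENTITY(1, 1) NOT NULL PRIMARY KEY,
 UserId INT NOT NULL, -- foreign key to whatever the document relates to
 Value NVARCHAR(1000) NOT NULL -- the JSON as a string
);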
However, suppose you want it to be more NoSql like, you could have:
Id [Primary Key, Integer, Incrementing index]
Key [nvarchar(100) a string key that you make, and can easily re-make for looking up the value (e.g. User_43_Level_6_GameData) - this column should have an index]
Value [nvarchar(1000) contains the json as a string]
The reason I've kept to having an integer ID is to avoid fragmentation. You could obviously make the Value column bigger.
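And the NoSql-like variant, again only as a sketch with assumed names:
CREATE TABLE JsonKeyValue
(Id INT IDENTITY(1, 1) NOT NULL PRIMARY KEY,
 [Key] NVARCHAR(100) NOT NULL, -- e.g. 'User_43_Level_6_GameData'
 Value NVARCHAR(1000) NOT NULL -- the JSON as a string
);
CREATE INDEX IX_JsonKeyValue_Key ON JsonKeyValue ([Key]);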
JSON can easily be converted between a JSON object and a JSON string. In JavaScript, you would use JSON.parse and JSON.stringify. If you are using C#, you could use the following snippets, though there are many ways to do this task (the objects can be nested as deep as you like).
.NET Object to Json
Weather w = new Weather("rainy", "windy", "32");
var jsonString = JsonSerializer.Serialize(w);
Json to .NET Object (C#)
var w = JsonSerializer.Deserialize<Weather>(jsonString);
UPDATE
Although this is the way I've done things in the past, it looks like there are new options in SQL Server to handle JSON - OPENJSON and JSON_QUERY could be potential options, though I haven't used them myself - they still use NVARCHAR for the JSON column.
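Purely as an illustration (not something from the systems above), reading the stored string back with OPENJSON might look like this, assuming the key/value table sketched earlier holds a document like the ruleset JSON:
SELECT j.field, j.operator, j.value
FROM JsonKeyValue kv
CROSS APPLY OPENJSON(kv.Value, '$.rules') WITH (
field VARCHAR(50) '$.field',
operator VARCHAR(10) '$.operator',
value VARCHAR(100) '$.value'
) AS j
WHERE kv.[Key] = 'User_43_Level_6_GameData';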

How to insert geojson data to geometry field in postgresql

I want to insert GeoJSON into a geometry column of a table.
I have already inserted a CSV file into the same column following this tutorial.
I wonder how to insert the GeoJSON into any geometry column?
I tried following this answer but could not get what is going on there.
Just use an update with the function ST_GeomFromGeoJSON:
UPDATE mytable SET geom = ST_GeomFromGeoJSON(json_column);
The following example inserts a GeoJSON point into a JSON column and afterwards updates the geometry column with the above mentioned function.
CREATE TEMPORARY TABLE mytable(
json_column json,
geom geometry);
INSERT INTO mytable (json_column) VALUES ('{
"type": "Point",
"coordinates": [7.0069, 51.1623]
}');
UPDATE mytable SET geom = ST_GeomFromGeoJSON(json_column);
SELECT * FROM mytable;
json_column | geom
--------------------------------------+--------------------------------------------
{ +| 01010000009E5E29CB10071C400612143FC6944940
"type": "Point", +|
"coordinates": [7.0069, 51.1623]+|
} |
(1 row)
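If you want to insert straight into the geometry column without the intermediate JSON column, a minimal sketch using the same table name:
INSERT INTO mytable (geom)
VALUES (ST_GeomFromGeoJSON('{
"type": "Point",
"coordinates": [7.0069, 51.1623]
}'));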

Ingesting decimal in hive table of Avro Serde

I am trying to check whether I can change the precision and scale of a decimal field in Hive with the Avro SerDe. So I have written the code below.
create database test_avro;
use test_avro;
create external table test_table(
name string,
salary decimal(17,2),
country string
)
row format delimited
fields terminated by ","
STORED AS textfile;
LOAD DATA LOCAL INPATH '/home/appsdesdssu/data/CACS_POC/data/' INTO TABLE
test_table;
create external table test_table_avro
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.avro.AvroSerDe'
STORED AS INPUTFORMAT
'org.apache.hadoop.hive.ql.io.avro.AvroContainerInputFormat'
OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.avro.AvroContainerOutputFormat'
tblproperties ('avro.schema.literal'='{
"name": "my_record",
"type": "record",
"fields": [
{"name":"name", "type":"string"},
{"name":"salary","type": "bytes","logicalType": "decimal","precision":
17,"scale": 2},
{"name":"country", "type":"string"}
]}');
insert overwrite table test_table_avro select * from test_table;
Here, I am getting error saying
FAILED: UDFArgumentException Only string, char, varchar or binary data can be cast into binary data types.
Data file:
steve,976475632987465.257,USA
rogers,349643905318384.137,mexico
groot,534563663653653.896,titan
If I am missing anything here then please let me know.
Hive does not support casting decimal to binary directly as of now, so we have to work around it by first converting the value to string and then to binary. So, the line below
insert overwrite table test_table_avro select * from test_table;
needs to change to
insert overwrite table test_table_avro select name,cast(cast(salary as string) as binary),country from test_table;

move a function to another schema

It is possible to move a table from one schema to another:
ALTER TABLE my_table SET SCHEMA different_schema;
However, I cannot find the equivalent feature for moving a function from one schema to another.
How can I do this?
(version 8.3+)
Taken from the docs:
ALTER FUNCTION name ( [ [ argmode ] [ argname ] argtype [, ...] ] )
SET SCHEMA new_schema
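For example (the function name and signature here are assumptions), the argument types must be listed so PostgreSQL can identify the exact function to move:
ALTER FUNCTION my_function(integer, text) SET SCHEMA different_schema;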