OPENJSON does not select all documents into the SQL table - sql

I have been trying to export the contents of a JSON file to an SQL Server table. However, despite the presence of multiple rows in the JSON, the output SQL table consists of only the first row from the JSON. The code I am using is as follows:
DROP TABLE IF EXISTS testingtable;
DECLARE @json VARCHAR(MAX) = '{ "_id" : "01001", "city" : "AGAWAM", "loc" : [ -72.622739, 42.070206 ], "pop" : 15338, "state" : "MA" },
{ "_id" : "01002", "city" : "CUSHMAN", "loc" : [ -72.51564999999999, 42.377017 ], "pop" : 36963, "state" : "MA" }';
SELECT * INTO testingtable FROM OPENJSON(@json) WITH (_id int, city varchar(20), loc float(50), pop int, state varchar(5))
SELECT * FROM testingtable
The output table contains only the first row from the JSON.

A multiline JSON text is enclosed in a square bracket, for example;
[
{first data set},
{second data set}, .....
]
You can either add the square brackets while passing the data to this query, or add them to your @json variable (e.g. '[' + @json + ']'):
DECLARE @json VARCHAR(MAX) = '{ "_id" : "01001", "city" : "AGAWAM", "loc" : [ -72.622739, 42.070206 ], "pop" : 15338, "state" : "MA" },
{ "_id" : "01002", "city" : "CUSHMAN", "loc" : [ -72.51564999999999, 42.377017 ], "pop" : 36963, "state" : "MA" }';
SELECT * INTO testingtable FROM OPENJSON('[' + @json + ']') WITH (_id int, city varchar(20), loc float(50), pop int, state varchar(5))
SELECT * FROM testingtable

The string isn't valid JSON. You can't have two root objects in a JSON document. As written, the string is
DECLARE @json VARCHAR(MAX) = '{ "_id" : "01001", "city" : "AGAWAM", "loc" : [ -72.622739, 42.070206 ], "pop" : 15338, "state" : "MA" },
{ "_id" : "01002", "city" : "CUSHMAN", "loc" : [ -72.51564999999999, 42.377017 ], "pop" : 36963, "state" : "MA" }';
It should be
DECLARE @json VARCHAR(MAX) = '[{ "_id" : "01001", "city" : "AGAWAM", "loc" : [ -72.622739, 42.070206 ], "pop" : 15338, "state" : "MA" },
{ "_id" : "01002", "city" : "CUSHMAN", "loc" : [ -72.51564999999999, 42.377017 ], "pop" : 36963, "state" : "MA" }
]';
Looks like OPENJSON parsed the first object and stopped as soon as it encountered the invalid text.
The quick & dirty way to fix this would be to add the missing square brackets:
SELECT * FROM OPENJSON('[' + @json + ']') WITH (_id int, city varchar(20), loc float(50), pop int, state varchar(5))
I suspect that string came from a log or event file that stores individual records on separate lines. That's not valid JSON, nor is there any official standard or specification for the format (the names JSON Lines and NDJSON notwithstanding), but a lot of high-traffic applications use it, e.g. in log files or event streaming.
The reason they do this is that there's no need to construct or read the entire array to get a record. It's easy to just append a new line for each record. Reading huge files and processing them in parallel is also easier - just read the text line by line and feed it to workers. Or split the file in N parts to the nearest newline and feed individual parts to different machines. That's how Map-Reduce works.
That's why adding the square brackets is a dirty solution - you have to read the entire text from a multi-MB or GB-sized file before you can parse it. That's not what OPENJSON was built to do.
The proper solution would be to read the file line-by-line using another tool, parse the records and insert the values into the target tables.
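A minimal sketch of that line-by-line approach, in Python (the file name and helper are hypothetical; the stray trailing commas from the question's string are stripped before parsing each line):

```python
import json

def iter_records(path):
    """Yield one parsed JSON object per line of an NDJSON-style file."""
    with open(path, encoding="utf-8") as f:
        for line in f:
            # strip whitespace and the stray trailing comma seen in the question
            line = line.strip().rstrip(",")
            if line:
                yield json.loads(line)

# The same per-line logic applied to the string from the question:
sample = (
    '{ "_id" : "01001", "city" : "AGAWAM", "pop" : 15338, "state" : "MA" },\n'
    '{ "_id" : "01002", "city" : "CUSHMAN", "pop" : 36963, "state" : "MA" }\n'
)
records = [
    json.loads(line.strip().rstrip(","))
    for line in sample.splitlines()
    if line.strip()
]
```

Each record parses on its own, so the whole file never has to fit in memory, and batches of records can be fed to bulk inserts or to parallel workers.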

If you know the JSON docs will not contain any internal newline characters, you can split the string with string_split. OPENJSON doesn't care about leading whitespace or a trailing comma. That way you avoid adding the [ ] characters, and don't have to parse it as one big document.
EG:
DROP TABLE IF EXISTS testingtable;
DECLARE @jsonFragment VARCHAR(MAX) = '{ "_id" : "01001", "city" : "AGAWAM", "loc" : [ -72.622739, 42.070206 ], "pop" : 15338, "state" : "MA" },
{ "_id" : "01002", "city" : "CUSHMAN", "loc" : [ -72.51564999999999, 42.377017 ], "pop" : 36963, "state" : "MA" }';
SELECT *
INTO testingtable
FROM string_split(@jsonFragment, CHAR(10)) docs
CROSS APPLY
(
    SELECT *
    FROM OPENJSON(docs.value)
    WITH (_id int, city varchar(20), loc float(50), pop int, state varchar(5))
) d
SELECT * FROM testingtable
This format is what you might call a "JSON Fragment", by analogy with XML. And this is another difference between XML and JSON support in SQL Server: for XML the engine is happy to parse and store fragments, but not for JSON.
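The point is easy to demonstrate with any strict JSON parser; Python's json module is used here purely as an illustration:

```python
import json

fragment = '{ "_id" : "01001" },\n{ "_id" : "01002" }'

# A strict parser stops after the first object ("Extra data")
try:
    json.loads(fragment)
    fragment_is_valid = True
except json.JSONDecodeError:
    fragment_is_valid = False

# Wrapped in brackets, the same text becomes a valid two-element array
docs = json.loads("[" + fragment + "]")
```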

Related

How to return nested json in one row using JSON_TABLE

I am struggling with the following query. I have JSON, and the query is splitting the duration and distance elements over two rows. I need them in one row and cannot see where to change this.
The following will just work from SQL*Plus etc (Oracle 20.1)
select *
from json_table(
'{
"destination_addresses" : [ "Page St, London SW1P 4BG, UK" ],
"origin_addresses" : [ "Corbridge Drive, Luton, LU2 9UH, UK" ],
"rows" : [
{
"elements" : [
{
"distance" : {
"text" : "88 km",
"value" : 87773
},
"duration" : {
"text" : "1 hours 25 mins",
"value" : 4594
},
"status" : "OK"
}
]
}
],
"status" : "OK"
}',
'$' COLUMNS
(
origin_addresses varchar2(1000 char) path '$.origin_addresses[*]',
destination_addresses varchar2(1000 char) path '$.destination_addresses[*]',
nested path '$.rows.elements.distance[*]'
COLUMNS(
distance_text varchar2(100) path '$.text',
distance_value varchar2(100) path '$.value'
),
nested path '$.rows.elements.duration[*]'
COLUMNS(
duration_text varchar2(100) path '$.text',
duration_value varchar2(100) path '$.value'
)
)
);
Mathguy: I don't have influence over the JSON that Google returns. Origin and destination addresses are arrays; it's just that this search is from A to Z rather than A,B,C to X,Y,Z, which would return 9 results instead of 1.
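For what it's worth, the two-row split comes from having two sibling NESTED PATH clauses, each of which generates its own set of rows. Since distance and duration live inside the same element object, they can be read together through a single path over rows[*].elements[*]. A quick illustration of that structure in Python (using the response body from the question):

```python
import json

response = json.loads("""
{
  "rows": [
    { "elements": [
        { "distance": { "text": "88 km", "value": 87773 },
          "duration": { "text": "1 hours 25 mins", "value": 4594 },
          "status": "OK" }
    ] }
  ]
}
""")

# one pass over rows[*].elements[*] reads distance and duration together,
# yielding a single combined row per element
rows = [
    (el["distance"]["text"], el["duration"]["text"])
    for r in response["rows"]
    for el in r["elements"]
]
```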

Mongodb query problem, how to get the matching items of the $or operator

Thanks in advance.
MongoDB Version:4.2.11
I have a piece of data like this:
{
"name":...,
...
"administration" : [
{"name":...,"job":...},
{"name":...,"job":...}
],
"shareholder" : [
{"name":...,"proportion":...},
{"name":...,"proportion":...},
]
}
I want to match some specified data through regular expressions:
For a example:
db.collection.aggregate([
{"$match" :
{
"$or" :
[
{"name" : {"$regex": "Keyword"}},
{"administration.name": {"$regex": "Keyword"}},
{"shareholder.name": {"$regex": "Keyword"}}
]
}
},
])
I want to set a flag when the $or operator matches any condition, represented by a custom field. For example, when {"name" : {"$regex": "Keyword"}} matches, set:
{"$project" :
{
"_id":false,
"name" : true,
"__regex_type__" : "name"
}
},
When {"administration.name" : {"$regex": "Keyword"}} matches, set "__regex_type__" : "administration.name", and so on.
I tried this:
{"$project" :
{
"_id":false,
"name" : true,
"__regex_type__" :
{
"$switch":
{
"branches":
[
{"case": {"$regexMatch":{"input":"$name","regex": "Keyword"}},"then" : "name"},
{"case": {"$regexMatch":{"input":"$administration.name","regex": "Keyword"}},"then" : "administration.name"},
{"case": {"$regexMatch":{"input":"$shareholder.name","regex": "Keyword"}},"then" : "shareholder.name"},
],
"default" : "Other matches"
}
}
}
},
But $regexMatch cannot match against an array. I tried using $unwind, but that returns one document per array member, which is not what I want.
I want to implement the equivalent of this MySQL statement in MongoDB:
SELECT name,administration.name,shareholder.name,(
CASE
WHEN name REGEXP("Keyword") THEN "name"
WHEN administration.name REGEXP("Keyword") THEN "administration.name"
WHEN shareholder.name REGEXP("Keyword") THEN "shareholder.name"
END
)AS __regex_type__ FROM db.mytable WHERE
name REGEXP("Keyword") OR
shareholder.name REGEXP("Keyword") OR
administration.name REGEXP("Keyword");
Maybe this method is stupid, but I don’t have a better solution.
If you have a better solution, I would appreciate it!!!
Thank you!!!
Since $regexMatch does not handle arrays, use $filter to filter individual array elements with $regexMatch, then use $size to see how many elements matched.
[{"$match"=>{"$or"=>[{"a"=>"test"}, {"arr.a"=>"test"}]}},
{"$project"=>
{"a"=>1,
"arr"=>1,
"src"=>
{"$switch"=>
{"branches"=>
[{"case"=>{"$regexMatch"=>{"input"=>"$a", "regex"=>"test"}},
"then"=>"a"},
{"case"=>
{"$gte"=>
[{"$size"=>
{"$filter"=>
{"input"=>"$arr.a",
"cond"=>
{"$regexMatch"=>{"input"=>"$$this", "regex"=>"test"}}}}},
1]},
"then"=>"arr.a"}],
"default"=>"def"}}}}]
[{"_id"=>BSON::ObjectId('5ffb2df748966813f82f15ad'), "a"=>"test", "src"=>"a"},
{"_id"=>BSON::ObjectId('5ffb2df748966813f82f15ae'),
"arr"=>[{"a"=>"test"}],
"src"=>"arr.a"}]
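The same branching logic, sketched in plain Python to show what the pipeline computes (the sample documents are made up for illustration):

```python
import re

def regex_type(doc, pattern="Keyword"):
    """Mirror the $switch: first match on name, then count matching
    array elements the way $filter + $size does."""
    rx = re.compile(pattern)
    if rx.search(doc.get("name", "")):
        return "name"
    if sum(1 for a in doc.get("administration", []) if rx.search(a.get("name", ""))) >= 1:
        return "administration.name"
    if sum(1 for s in doc.get("shareholder", []) if rx.search(s.get("name", ""))) >= 1:
        return "shareholder.name"
    return "Other matches"

docs = [
    {"name": "Keyword Corp"},
    {"name": "Acme", "administration": [{"name": "Keyword person"}]},
    {"name": "Acme", "shareholder": [{"name": "nothing"}]},
]
flags = [regex_type(d) for d in docs]
```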

How do I query an array in MongoDB?

I have been trying multiple queries but still can't figure it out. I have multiple documents that look like this:
{
"_id" : ObjectId("5b51f519a33e7f54161a0efb"),
"assigneesEmail" : [
"felipe@gmail.com"
],
"organizationId" : "5b4e0de37accb41f3ac33c00",
"organizationName" : "PaidUp Volleyball Club",
"type" : "athlete",
"firstName" : "Mylo",
"lastName" : "Fernandes",
"description" : "",
"status" : "active",
"createOn" : ISODate("2018-07-20T14:43:37.610Z"),
"updateOn" : ISODate("2018-07-20T14:43:37.610Z"),
"__v" : 0
}
I need help writing a query that finds this document by looking up the email in any element of the array assigneesEmail. Any suggestions? I have tried $elemMatch but still could not get it to work.
Looks like my query was just incorrect. I figured it out.
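For anyone landing here later: in MongoDB, a plain equality filter on an array field already matches documents where any element equals the value, so a query like db.collection.find({ "assigneesEmail": "felipe@gmail.com" }) should be enough; $elemMatch is only needed when several conditions must hold for the same array element. A sketch of that matching rule (sample data only, and assuming the # in the document's email is a mangled @):

```python
# MongoDB semantics for {field: value} on an array field: the filter
# matches if value equals the field itself or any element of the array.
def matches(doc, field, value):
    v = doc.get(field)
    return v == value or (isinstance(v, list) and value in v)

doc = {
    "_id": "5b51f519a33e7f54161a0efb",
    "assigneesEmail": ["felipe@gmail.com"],
    "type": "athlete",
}
hit = matches(doc, "assigneesEmail", "felipe@gmail.com")
```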

Data not being populated after altering the hive table

I have a Hive table which is populated from an underlying parquet file in an HDFS location. I altered the table schema by changing a column name, but the renamed column is now populated with NULL instead of the original data from the parquet file.
Give this a try. Open up the .avsc file. For the column, you will find something like
{
"name" : "start_date",
"type" : [ "null", "string" ],
"default" : null,
"columnName" : "start_date",
"sqlType" : "12"
}
Add an alias
{
"name" : "start_date",
"type" : [ "null", "string" ],
"default" : null,
"columnName" : "start_date",
"aliases" : [ "sta_dte" ],
"sqlType" : "12"
}

How to setup a field mapping for ElasticSearch that allows both exact and full text searching?

Here is my problem:
I have a field called product_id that is in a format similar to:
A+B-12321412
If I used the standard text analyzer it splits it into tokens like so:
curl -XGET "localhost:9200/_analyze?analyzer=standard&pretty=true" -d '
A+B-1232412
'
{
"tokens" : [ {
"token" : "a",
"start_offset" : 1,
"end_offset" : 2,
"type" : "<ALPHANUM>",
"position" : 1
}, {
"token" : "b",
"start_offset" : 3,
"end_offset" : 4,
"type" : "<ALPHANUM>",
"position" : 2
}, {
"token" : "1232412",
"start_offset" : 5,
"end_offset" : 12,
"type" : "<NUM>",
"position" : 3
} ]
}
Ideally, I would like to sometimes search for an exact product id and other times use a sub string and or just do a query for part of the product id.
My understanding of mappings and analyzers is that I can only specify one analyzer per field.
Is there a way to store a field as both analyzed and exact match?
Yes, you can use the fields parameter. In your case:
"product_id": {
"type": "string",
"fields": {
"raw": { "type": "string", "index": "not_analyzed" }
}
}
http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/_multi_fields.html
This allows you to index the same data twice, using two different definitions. In this case it will be indexed via both the default analyzer and not_analyzed which will only pick up exact matches. This is also useful for sorting return results:
http://www.elasticsearch.org/guide/en/elasticsearch/guide/current/multi-fields.html
However, you will need to spend some time thinking about how you want to search. In particular, given part numbers with a mix of alpha, numeric and punctuation or special characters you may need to get creative to tune your queries and matches.
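To make that concrete, the two query styles against such a mapping might look like the following; these are plain query bodies (the product_id.raw name follows the mapping above), shown as Python dicts only for illustration:

```python
# exact match: a term query against the not_analyzed sub-field,
# so "A+B-12321412" is compared as a single unmodified token
exact_query = {"query": {"term": {"product_id.raw": "A+B-12321412"}}}

# full-text / partial match: a match query against the analyzed field,
# which will hit any document whose id produced the token "1232412"
fulltext_query = {"query": {"match": {"product_id": "1232412"}}}
```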