Filter datetime value within a JSON collection of a collection inside Cosmos DB using SQL - sql

I am using Microsoft Cosmos DB's SQL-like syntax. I have a collection of entries that follow a schema like this (simplified for this post):
{"id":"123456",
"activities": {
"activityA": {
"loginType": "siteA",
"lastLogin": "2018-02-06T19:42:22.205Z"
},
"activityB": {
"loginType": "siteB",
"lastLogin": "2018-03-07T11:39:50.346Z"
},
"activityC": {
"loginType": "siteC",
"lastLogin": "2018-04-08T15:21:15.312Z"
}
}
}
Without knowing the exact key of each entry in the activities sub-collection, how can I query to get back all items in the Cosmos DB collection that have a "lastLogin" within a date range?
If I only wanted to search on the first item in the activities list, I could do something like this using index 0.
SELECT * FROM c where (c.activities[0].lastLogin > '2018-01-01T00:00:00') and (c.activities[0].lastLogin <= '2019-02-15T00:00:00')
But I want to search all entries in the list. Would be nice if there was something like this:
SELECT * FROM c where (c.activities[?].lastLogin > '2018-01-01T00:00:00') and (c.activities[?].lastLogin <= '2019-02-15T00:00:00')
But that doesn't exist.

The answer is that you cannot iterate over a non-list collection. Had the collection item been structured like this:
{"id":"123456",
"activities": [
{ "label": "activityA",
"loginType": "siteA",
"lastLogin": "2018-02-06T19:42:22.205Z"
},
{
"label": "activityB",
"loginType": "siteB",
"lastLogin": "2018-03-07T11:39:50.346Z"
},
etc...
It would be easy to create a UDF to iterate over it with something like this:
UDF: filterActivityList
function(activityList, targetDateTimeStart, targetDateTimeEnd) {
var s, _i, _len;
for (_i = 0, _len = activityList.length; _i < _len; _i++) {
s = activityList[_i];
if ((s.lastLogin >= targetDateTimeStart) && (s.lastLogin < targetDateTimeEnd))
{
return true;
}
}
return false;
}
Then to query:
select * from c WHERE udf.filterActivityList(c.activities, '2018-01-01T00:00:00', '2018-02-01T00:00:00');
If I were to leave the structure as a JSON hierarchy instead of converting it to a JSON list, I would have to write another UDF that accepts the top-level node of the hierarchy as an input parameter and converts the nodes under it to a list, and then apply the udf.filterActivityList UDF to the result. In my experience this approach is resource intensive and takes a very long time for Cosmos DB to process.
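If the hierarchy has to stay as it is, the two steps described above (turn the child nodes into a list, then apply the date check) can be folded into a single function. This is a minimal sketch, assuming every child of activities carries a lastLogin string in the same format as the sample document. The name filterActivityMap is hypothetical, and in Cosmos DB the body would be registered as a UDF rather than called directly as shown here.

```javascript
// Hypothetical helper: same date check as filterActivityList, but for an
// object keyed by activity name instead of an array.
function filterActivityMap(activities, targetDateTimeStart, targetDateTimeEnd) {
  if (!activities || typeof activities !== "object") {
    return false;
  }
  // Object.keys yields the child node names ("activityA", "activityB", ...).
  return Object.keys(activities).some(function (key) {
    var entry = activities[key];
    var lastLogin = entry && entry.lastLogin;
    // Same string comparison as the original UDF.
    return typeof lastLogin === "string" &&
      lastLogin >= targetDateTimeStart &&
      lastLogin < targetDateTimeEnd;
  });
}

var sampleDoc = {
  id: "123456",
  activities: {
    activityA: { loginType: "siteA", lastLogin: "2018-02-06T19:42:22.205Z" },
    activityB: { loginType: "siteB", lastLogin: "2018-03-07T11:39:50.346Z" },
    activityC: { loginType: "siteC", lastLogin: "2018-04-08T15:21:15.312Z" }
  }
};

var inMarch = filterActivityMap(sampleDoc.activities, "2018-03-01T00:00:00", "2018-04-01T00:00:00");
var in2019 = filterActivityMap(sampleDoc.activities, "2019-01-01T00:00:00", "2019-02-01T00:00:00");
console.log(inMarch, in2019); // true false
```

This still evaluates the function once per document, so the performance concern above applies to it as well.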

Related

JSON query to extract all ids where array is not empty

I have a JSON structure similar to this:
{
"id":"1234",
"feedback": {
"Features": []
}
}
I wish to find all the documents where Features is not an empty array.
This is what I have tried:
SELECT * FROM c where ARRAY_LENGTH([c.feedback.Features])> 0
I am not sure if this is the correct approach. Any suggestions are appreciated.
Your query will not work as intended: it still returns the document below, which you provided, even though Features is empty:
{
"id":"1234",
"feedback": {
"Features": []
}
}
I would suggest using a query like the one below. It covers the case where the 'Features' attribute has zero items as well as the case where the 'Features' attribute is missing altogether.
SELECT * FROM c where ARRAY_LENGTH([c.feedback.Features[0]]) > 0
It should work if you exclude the surrounding brackets from the property path:
SELECT * FROM c where ARRAY_LENGTH(c.feedback.Features) > 0
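The brackets matter because [c.feedback.Features] builds a new array whose single element is the Features array, so its length is 1 even when Features itself is empty. The same effect can be seen in plain JavaScript. This is an analogy only, not Cosmos DB code, and the variable names are made up for illustration.

```javascript
// Analogy only: plain JavaScript arrays, not Cosmos DB's ARRAY_LENGTH.
var emptyDoc = { id: "1234", feedback: { Features: [] } };

// Wrapping in brackets creates an outer array holding one element.
var wrappedLength = [emptyDoc.feedback.Features].length;

// Without the brackets, the length of the Features array itself is measured.
var directLength = emptyDoc.feedback.Features.length;

console.log(wrappedLength, directLength); // 1 0
```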

Parsing JSON in Snowflake

I'm trying to parse the nested JSON below in Snowflake using the lateral function, but I want each nested entry in "GoalTime" to show up as a column. For example:
GoalTime_InDoorOpen
2020-03-26T12:58:00-04:00
GoalTime_InLastOff
null
GoalTime_OutStartBoarding
2020-03-27T14:00:00-04:00
"GoalTime": [
{
"GoalName": "GoalTime_InDoorOpen",
"GoalTime": "2020-03-26T12:58:00-04:00"
},
{
"GoalName": "GoalTime_InLastOff"
},
{
"GoalName": "GoalTime_InReadyToTow"
},
{
"GoalName": "GoalTime_OutTowAtGate"
},
{
"GoalName": "GoalTime_OutStartBoarding",
"GoalTime": "2020-03-27T14:00:00-04:00"
},
If you have many rows (what appear to be flights) and thus need the columns per flight, this code could be what you are after:
with data as (
select flight_code, parse_json(json) as json from values ('nz101','{GoalTime:[{"GoalName": "GoalA", "GoalTime": "2020-03-26T12:58:00-04:00"}, {"GoalName": "GoalB"}]}'),
('nz201','{GoalTime:[{"GoalName": "GoalA"}, {"GoalName": "GoalB", "GoalTime": "2020-03-26T12:58:00-02:00"}]}')
j(flight_code, json)
), unrolled as (
select d.flight_code, f.value:GoalName as goal_name, f.value:GoalTime as goal_time
from data d,
lateral flatten (input => json:GoalTime) f
)
select *
from unrolled
pivot(min(goal_time) for goal_name in ('GoalA', 'GoalB'))
order by flight_code;
it gives the results:
FLIGHT_CODE 'GoalA' 'GoalB'
nz101 "2020-03-26T12:58:00-04:00" null
nz201 null "2020-03-26T12:58:00-02:00"
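For readers less familiar with FLATTEN and PIVOT, the data movement in the query above can be mimicked in plain JavaScript. This is an illustration only, not Snowflake code. It assumes at most one value per flight and goal name, so the MIN aggregate is not needed, and the variable names are chosen for this sketch.

```javascript
// Illustration only: mimics what LATERAL FLATTEN + PIVOT compute.
const flights = [
  { flight_code: "nz101", json: { GoalTime: [
    { GoalName: "GoalA", GoalTime: "2020-03-26T12:58:00-04:00" },
    { GoalName: "GoalB" } ] } },
  { flight_code: "nz201", json: { GoalTime: [
    { GoalName: "GoalA" },
    { GoalName: "GoalB", GoalTime: "2020-03-26T12:58:00-02:00" } ] } },
];
const goalNames = ["GoalA", "GoalB"];

// "Flatten": one row per (flight, goal) pair; a missing GoalTime becomes null.
const unrolled = flights.flatMap(f =>
  f.json.GoalTime.map(g => ({
    flight_code: f.flight_code,
    goal_name: g.GoalName,
    goal_time: g.GoalTime ?? null,
  }))
);

// "Pivot": one row per flight, one property per listed goal name.
const pivoted = new Map();
for (const row of unrolled) {
  if (!pivoted.has(row.flight_code)) {
    const emptyGoals = Object.fromEntries(goalNames.map(n => [n, null]));
    pivoted.set(row.flight_code, { flight_code: row.flight_code, ...emptyGoals });
  }
  if (goalNames.includes(row.goal_name)) {
    pivoted.get(row.flight_code)[row.goal_name] = row.goal_time;
  }
}
const pivotResult = [...pivoted.values()];
console.log(pivotResult);
```

As in the SQL version, the list of goal names has to be known up front, because it determines the output columns.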
create or replace function JSON_STRING()
returns string
language javascript
as
$$
return `
[
{
"GoalName": "GoalTime_InDoorOpen",
"GoalTime": "2020-03-26T12:58:00-04:00"
},
{
"GoalName": "GoalTime_InLastOff"
},
{
"GoalName": "GoalTime_InReadyToTow"
},
{
"GoalName": "GoalTime_OutTowAtGate"
},
{
"GoalName": "GoalTime_OutStartBoarding",
"GoalTime": "2020-03-27T14:00:00-04:00"
}
]
`;
$$;
select value:GoalName::string as GoalName, value:GoalTime::timestamp as GoalTime
from lateral flatten(input => parse_json(JSON_STRING()));
-- See how the lateral flatten combination works on a JSON variant:
select * from lateral flatten(input => parse_json(JSON_STRING()));
I wrote this to run in any Snowflake worksheet, no tables needed. The function on top simply allows the JSON to be written as a multi-line string in the SQL statement below it. It has no other use than representing a string holding your JSON.
Step 1 is to PARSE_JSON, which converts a string into a variant data type formatted as a JSON object.
Step 2 is the lateral flatten. If you do a select star on that, it will return a number of columns. One of them is "value".
Step 3 is to extract the properties you want using single : notation for the property name and dots to traverse down the nodes from there (if there are any).
Step 4 is to cast the property to the data type you want using double :: notation. This is especially important if you're doing comparisons on the column particularly in join keys.
Note that one part of the JSON was slightly invalid and did not parse: at the top level, the array had a property name attached. I removed that to allow parsing.
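The four steps can also be mirrored in plain JavaScript to make the flow concrete. This is only an analogy under my own assumptions, not Snowflake SQL: JSON.parse stands in for PARSE_JSON, Array.map for the lateral flatten, and the Date constructor for the ::timestamp cast. The sample is shortened to three entries.

```javascript
// Illustration only: the four steps mirrored in plain JavaScript.
const jsonString = `[
  {"GoalName": "GoalTime_InDoorOpen", "GoalTime": "2020-03-26T12:58:00-04:00"},
  {"GoalName": "GoalTime_InLastOff"},
  {"GoalName": "GoalTime_OutStartBoarding", "GoalTime": "2020-03-27T14:00:00-04:00"}
]`;

// Step 1: turn the string into a structured value.
const parsed = JSON.parse(jsonString);

// Step 2: produce one row per array element.
// Steps 3 and 4: extract each property and cast it to the wanted type.
const goalRows = parsed.map(value => ({
  GoalName: String(value.GoalName),
  GoalTime: value.GoalTime ? new Date(value.GoalTime) : null,
}));

console.log(goalRows);
```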
Probably close to what you seek is using a standard SQL UNION statement.
Given the following are true to recreate the solution:
You have created a table 'JSON_GOALS' with one column for raw JSON, called GOALS_RAW
You have loaded JSON data into the table as raw JSON, with compliant JSON object array syntax and a parent, GoalTimeGroup, ex: {[{}]}, so
{
"GoalTimeGroup": [{
"GoalName": "GoalTime_InDoorOpen",
"GoalTime": "2020-03-26T12:58:00-04:00"
},
{
"GoalName": "GoalTime_InLastOff"
},
{
"GoalName": "GoalTime_InReadyToTow"
},
{
"GoalName": "GoalTime_OutTowAtGate"
},
{
"GoalName": "GoalTime_OutStartBoarding",
"GoalTime": "2020-03-27T14:00:00-04:00"
}
]
}
Doing so allows you to write a fairly standard JSON retrieve in Snowflake with the following syntax:
SELECT GOALS_RAW:GoalTimeGroup[0].GoalName, GOALS_RAW:GoalTimeGroup[1].GoalName, GOALS_RAW:GoalTimeGroup[2].GoalName
FROM JSON_GOALS
UNION
SELECT GOALS_RAW:GoalTimeGroup[0].GoalTime, GOALS_RAW:GoalTimeGroup[1].GoalTime, GOALS_RAW:GoalTimeGroup[2].GoalTime
FROM JSON_GOALS
;
This gives you closer to the answer you are looking for and seems to provide a simpler solution. You can also control how many rows you'd want based on your JSON object attributes for each GOAL object.
Recommendations to enhance this would be to create a function that could detect the depth of each nested element and perhaps auto generate the indexes for 'n' number of columns.
The library below provides a method called "ExecuteAll", one of whose parameters is "tags". If you provide an array of tags and values, all of them are parsed and validated while keeping Snowflake's SQL injection protection.
snowflake-multisql

Returning unknown JSON in a query

Here is my scenario. I have data in a Cosmos DB and I want to return c.this, c.that etc as the indexer for Azure Cognitive Search. One field I want to return is JSON of an unknown structure. The one thing I do know about it is that it is flat. However it is my understanding that the return value for an indexer needs to be known. How, using SQL in a SELECT, would I return all JSON elements in the flat object? Here is an example value I would be querying:
{
"BusinessKey": "SomeKey",
"Source": "flat",
"id": "SomeId",
"attributes": {
"Source": "flat",
"Element": "element",
"SomeOtherElement": "someOtherElement"
}
}
So I would want my select to be maybe something like:
SELECT
c.BusinessKey,
c.Source,
c.id,
-- SOMETHING HERE TO LIST OUT ALL ATTRIBUTES IN THE JSON AS FIELDS IN THE RESULT
And I would want the result to be:
{
"BusinessKey": "SomeKey",
"Source": "flat",
"id": "SomeId",
"attributes": [{"Source":"flat"},{"Element":"element"},{"SomeOtherElement":"someotherelement"}]
}
Currently we are calling ToString on the c.attributes, which is the JSON of unknown structure but it is adding all the escape characters. When we want to search the index, we have to add all those escape characters and it's getting really unruly.
Is there a way to do this using SQL?
Thanks for any help!
You could use UDF in cosmos db sql.
UDF code:
function userDefinedFunction(object){
var returnArray = [];
for (var key in object) {
var map = {};
map[key] = object[key];
returnArray.push(map);
}
return returnArray;
}
Sql:
SELECT
c.BusinessKey,
c.Source,
c.id,
udf.test(c.attributes) as attributes
from c
Output:
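The shape of the attributes field can be checked by running the same function body locally as plain JavaScript. This is a small sketch; attributesToArray is a hypothetical local name, since in Cosmos DB the function is invoked through the id it was registered under (udf.test above).

```javascript
// Same body as the UDF above, under a hypothetical local name.
function attributesToArray(object) {
  var returnArray = [];
  for (var key in object) {
    var map = {};
    map[key] = object[key]; // one single-property object per attribute
    returnArray.push(map);
  }
  return returnArray;
}

var sampleAttributes = {
  Source: "flat",
  Element: "element",
  SomeOtherElement: "someOtherElement"
};

var attributesJson = JSON.stringify(attributesToArray(sampleAttributes));
console.log(attributesJson);
// [{"Source":"flat"},{"Element":"element"},{"SomeOtherElement":"someOtherElement"}]
```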

MongoDB like statement with multiple fields

With SQL we can do the following :
select * from x where concat(x.y ," ",x.z) like "%find m%"
when x.y = "find" and x.z = "me".
How do I do the same thing with MongoDB when I use a JSON structure similar to this:
{
data:
[
{
id:1,
value : "find"
},
{
id:2,
value : "me"
}
]
}
The comparison to SQL here is not valid since no relational database has the same concept of embedded arrays that MongoDB has, and is provided in your example. You can only "concat" between "fields in a row" of a table. Basically not the same thing.
You can do this with the JavaScript evaluation of $where, which is not optimal, but it's a start. And you can add some extra "smarts" to the match as well with caution:
db.collection.find({
"$or": [
{ "data.value": /^f/ },
{ "data.value": /^m/ }
],
"$where": function() {
var items = [];
this.data.forEach(function(item) {
items.push(item.value);
});
var myString = items.join(" ");
if ( myString.match(/find m/) != null )
return 1;
}
})
So there you go. We optimized this a bit by taking the first characters from your "test string" in each word and compared the tokens to each element of the array in the document.
The next part "concatenates" the array elements into a string and then does a "regex" comparison ( same as "like" ) on the concatenated result to see if it matches. Where it does then the document is considered a match and returned.
Not optimal, but these are the options available to MongoDB on a structure like this. Perhaps the structure should be different. But you don't specify why you want this so we can't advise a better solution to what you want to achieve.
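The concatenate-then-match logic inside $where can be tried outside MongoDB as an ordinary function, which makes it easier to test the pattern before running it against a collection. A minimal sketch; matchesJoinedValues is a hypothetical name, and the document shape is the one from the question.

```javascript
// Hypothetical stand-alone version of the $where body, for testing outside MongoDB.
function matchesJoinedValues(doc, pattern) {
  // Join the embedded array's values with spaces, like concat(x.y, " ", x.z).
  const joined = doc.data.map(item => item.value).join(" ");
  // A regular expression test plays the role of SQL's LIKE.
  return pattern.test(joined);
}

const sampleDoc = { data: [{ id: 1, value: "find" }, { id: 2, value: "me" }] };

const hit = matchesJoinedValues(sampleDoc, /find m/);
const miss = matchesJoinedValues(sampleDoc, /find x/);
console.log(hit, miss); // true false
```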

NodeJS JSON Array filtering

I have used Node to retrieve a set of results from SQL and they're returned like this;
[
{
"event_id": 111111,
"date_time": "2012-11-16T01:59:07.000Z",
"agent_addr": "127.0.0.1",
"priority": 6,
"message": "aaaaaaaaa",
"up_time": 9015040,
"hostname": "bbbbbbb",
"context": "ccccccc"
},
{
"event_id": 111112,
"date_time": "2012-11-16T01:59:07.000Z",
"agent_addr": "127.0.0.1",
"priority": 6,
"message": "aaaaaaaaa",
"up_time": 9015040,
"hostname": "bbbbbbb",
"context": "ddddddd"
},
]
There are usually a lot of entries in the array and I need to efficiently filter the array to show only the entries that have a context of "ccccccc". I've tried a for loop, but it's incredibly slow.
Any suggestions?
There is a very simple way of doing that. If you want to do it in Node and don't want to use SQL for it, you can use JavaScript's built-in Array.filter function.
var output = arr.filter(function(x){return x.context=="ccccccc"}); // arr here is your result array
The output array will contain only objects having context "ccccccc".
Another way of doing what Khurrum said, is with the arrow function. It has the same result but some people prefer that notation.
var output = arr.filter(x => x.context == "ccccccc" );
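For a quick check, the filter can be run against a trimmed version of the rows from the question. This sketch keeps only the event_id and context fields; the variable names sampleRows and matching are made up for the example.

```javascript
// Trimmed sample rows; the real rows carry more fields.
const sampleRows = [
  { event_id: 111111, context: "ccccccc" },
  { event_id: 111112, context: "ddddddd" },
];

// Keep only the rows whose context is exactly "ccccccc".
const matching = sampleRows.filter(x => x.context === "ccccccc");

console.log(matching.map(x => x.event_id)); // [ 111111 ]
```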
As suggested by Matt, why not include WHERE context = "ccccccc" in your SQL query?
Otherwise, if you must keep it all in Node, maybe use one of the following to filter the results:
// Place all "ccccccc" context rows in an array
var ccccccc = [];
for (var i = results.length - 1; i >= 0; i--) {
if (results[i].context === 'ccccccc')
ccccccc.push(results[i]);
}
// Place each context's rows in a named array within an object
var contexts = {};
for (var i = results.length - 1; i >= 0; i--) {
var key = results[i].context;
// Create the array the first time a context is seen
if (typeof contexts[key] === 'undefined')
contexts[key] = [];
contexts[key].push(results[i]);
}
or use the underscore (or similar) filter function.