MongoDB like statement with multiple fields - sql

With SQL we can do the following :
select * from x where concat(x.y ," ",x.z) like "%find m%"
when x.y = "find" and x.z = "me".
How do I do the same thing with MongoDB, When I use a JSON structure similar to this:
{
data:
[
{
id:1,
value : "find"
},
{
id:2,
value : "me"
}
]
}

The comparison to SQL here is not valid since no relational database has the same concept of embedded arrays that MongoDB has, and is provided in your example. You can only "concat" between "fields in a row" of a table. Basically not the same thing.
You can do this with the JavaScript evaluation of $where, which is not optimal, but it's a start. And you can add some extra "smarts" to the match as well with caution:
db.collection.find({
"$or": [
{ "data.value": /^f/ },
{ "data.value": /^m/ }
],
"$where": function() {
var items = [];
this.data.forEach(function(item) {
items.push(item.value);
});
var myString = items.join(" ");
if ( myString.match(/find m/) != null )
return 1;
}
})
So there you go. We optimized this a bit by taking the first characters from your "test string" in each word and compared the tokens to each element of the array in the document.
The next part "concatenates" the array elements into a string and then does a "regex" comparison ( same as "like" ) on the concatenated result to see if it matches. Where it does then the document is considered a match and returned.
Not optimal, but these are the options available to MongoDB on a structure like this. Perhaps the structure should be different. But you don't specify why you want this so we can't advise a better solution to what you want to achieve.

Related

Add exception to natural sorting in datatables

Is there a way to add exceptions to the natural ordering plugin, so that it ignores things like c. , [, ], ? ?
This is an example of my data:
161?
1604
[1563]
c. 1476
I'd like the sorted asc. output to be:
c. 1476
[1563]
1604
161?
Right now what I get is all the numbers first, and the strings beginning with [ afterwards.
My initialisation code:
<script type="text/javascript" src="//cdn.datatables.net/plug-ins/1.10.24/sorting/natural.js"></script>
$('#sourcesList').DataTable({
"paging": false,
"columnDefs": [
{ type: 'natural-nohmtl', targets: '_all' }
],
[...]
PS: this is my data in the wild.
The natural sorting option takes care of certain types of numeric data which you can reasonably expect to encounter in the "real world" - such as numbers with different thousands separators, or with currency symbols of different types in different positions - or "1st", "2nd" "3rd" and so on. Don't quote me on these exact examples, as I have not looked at that add-on in detail. But that is the overall idea.
Items such as question-marks - I would not expect that add-on to handle those. Items in square brackets - same thing.
Given your clarification in a comment, I think you do not need or want this add-on.
Instead you can use a column renderer with DataTables' ability to store different versions of a value - for display, sorting, and filtering.
var dataSet = [ ["161?"], ["1604"], ["[1563]"], ["c. 1476"] ];
$(document).ready(function() {
$('#example').DataTable( {
data: dataSet,
columnDefs: [ {
type: "natural-nohmtl",
targets: [ 0 ],
title: "My Data",
"render": function ( data, type, row, meta ) {
if ( type === 'sort' ) {
//return parseInt(data.replace(/\D/g, '')); // numbers as numbers
return data.replace(/\D/g, ''); // numbers as strings
} else {
return data;
}
}
} ]
} );
} );
This saves a version of each value with all non-digits stripped from the value, using replace(/\D/g, ''). This altered version of the data is stored as the value which will be used when sorting. The unaltered version is what the user will see.
My result:
This is the case where numbers are treated as text.
You can uncomment the commented-out return statement to get numbers as numbers.
However...
This crude approach of stripping non-digits from the data meets the narrow example from your question, but it may not be sufficient for all the data you are expecting to handle. So, you may need to refine the "replace" logic.
But the render function should meet your needs, more generally.

Parsing JSON in Snowflake

I'm trying to parse a the below nested JSON in Snowflake using the latteral function in Snowflake but I wanted to each nested column in "GoalTime" to show up as a column. For example,
GoalTime_InDoorOpen
2020-03-26T12:58:00-04:00
GoalTime_InLastOff
null
GoalTime_OutStartBoarding
2020-03-27T14:00:00-04:00
"GoalTime": [
{
"GoalName": "GoalTime_InDoorOpen",
"GoalTime": "2020-03-26T12:58:00-04:00"
},
{
"GoalName": "GoalTime_InLastOff"
},
{
"GoalName": "GoalTime_InReadyToTow"
},
{
"GoalName": "GoalTime_OutTowAtGate"
},
{
"GoalName": "GoalTime_OutStartBoarding",
"GoalTime": "2020-03-27T14:00:00-04:00"
},
or if you have many rows (what appear to be flights) and thus you need to columns per flight this code be what you are after
with data as (
select flight_code, parse_json(json) as json from values ('nz101','{GoalTime:[{"GoalName": "GoalA", "GoalTime": "2020-03-26T12:58:00-04:00"}, {"GoalName": "GoalB"}]}'),
('nz201','{GoalTime:[{"GoalName": "GoalA"}, {"GoalName": "GoalB", "GoalTime": "2020-03-26T12:58:00-02:00"}]}')
j(flight_code, json)
), unrolled as (
select d.flight_code, f.value:GoalName as goal_name, f.value:GoalTime as goal_time
from data d,
lateral flatten (input => json:GoalTime) f
)
select *
from unrolled
pivot(min(goal_time) for goal_name in ('GoalA', 'GoalB'))
order by flight_code;
it gives the results:
FLIGHT_CODE 'GoalA' 'GoalB'
nz101 "2020-03-26T12:58:00-04:00" null
nz201 null "2020-03-26T12:58:00-02:00"
create or replace function JSON_STRING()
returns string
language javascript
as
$$
return `
[
{
"GoalName": "GoalTime_InDoorOpen",
"GoalTime": "2020-03-26T12:58:00-04:00"
},
{
"GoalName": "GoalTime_InLastOff"
},
{
"GoalName": "GoalTime_InReadyToTow"
},
{
"GoalName": "GoalTime_OutTowAtGate"
},
{
"GoalName": "GoalTime_OutStartBoarding",
"GoalTime": "2020-03-27T14:00:00-04:00"
}
]
`;
$$;
select value:GoalName::string as GoalName, value:GoalTime::timestamp as GoalTime
from lateral flatten(input => parse_json(JSON_STRING()));
-- See how the lateral flatten combination works on a JSON variant:
select * from lateral flatten(input => parse_json(JSON_STRING()));
I wrote this to run in any Snowflake worksheet, no tables needed. The function on top simply allows the JSON to be written as a multi-line string in the SQL statement below it. It has no other use than representing a string holding your JSON.
Step 1 is to PARSE_JSON, which converts a string into a variant data type formatted as a JSON object.
Step 2 is the lateral flatten. If you do a select star on that, it will return a number of columns. One of them is "value".
Step 3 is to extract the properties you want using single : notation for the property name and dots to traverse down the nodes from there (if there are any).
Step 4 is to cast the property to the data type you want using double :: notation. This is especially important if you're doing comparisons on the column particularly in join keys.
Note that there's a slight invalid part of the JSON that did not allow it to parse. In the top level the array had a property, which did not parse. I removed that to allow parsing.
Probably close to what you seek is using a standard SQL UNION statement.
Given the following are true to recreate the solution:
Created a table 'JSON_GOALS' with one column for raw JSON called, GOALS_RAW
You have loaded JSON data into a table as the raw JSON, with compliant JSON object array syntax, and a parent, GoalTimeGroup, ex: {[{}]}, so
{
"GoalTimeGroup": [{
"GoalName": "GoalTime_InDoorOpen",
"GoalTime": "2020-03-26T12:58:00-04:00"
},
{
"GoalName": "GoalTime_InLastOff"
},
{
"GoalName": "GoalTime_InReadyToTow"
},
{
"GoalName": "GoalTime_OutTowAtGate"
},
{
"GoalName": "GoalTime_OutStartBoarding",
"GoalTime": "2020-03-27T14:00:00-04:00"
}
]
}
Doing so allows you to write a fairly standard JSON retrieve in Snowflake with the following syntax:
SELECT GOALS_RAW:GoalTimeGroup[0].GoalName, GOALS_RAW:GoalTimeGroup[1].GoalName, GOALS_RAW:GoalTimeGroup[2].GoalName
FROM JSON_GOALS
UNION
SELECT GOALS_RAW:GoalTimeGroup[0].GoalTime, GOALS_RAW:GoalTimeGroup[1].GoalTime, GOALS_RAW:GoalTimeGroup[2].GoalName
FROM JSON_GOALS
;
This gives you closer to the answer you are looking for and seems to provide a simpler solution. You can also control how many rows you'd want based on your JSON object attributes for each GOAL object.
Recommendations to enhance this would be to create a function that could detect the depth of each nested element and perhaps auto generate the indexes for 'n' number of columns.
The library below provides a method called "ExecuteAll" which one of the params is "tags", so if you provide an array of tags and values, all of them will be parsed and validated plus keeping the features of the sql injection protection from Snowflake.
snowflake-multisql

Returning unknown JSON in a query

Here is my scenario. I have data in a Cosmos DB and I want to return c.this, c.that etc as the indexer for Azure Cognitive Search. One field I want to return is JSON of an unknown structure. The one thing I do know about it is that it is flat. However it is my understanding that the return value for an indexer needs to be known. How, using SQL in a SELECT, would I return all JSON elements in the flat object? Here is an example value I would be querying:
{
"BusinessKey": "SomeKey",
"Source": "flat",
"id": "SomeId",
"attributes": {
"Source": "flat",
"Element": "element",
"SomeOtherElement": "someOtherElement"
}
}
So I would want my select to be maybe something like:
SELECT
c.BusinessKey,
c.Source,
c.id,
-- SOMETHING HERE TO LIST OUT ALL ATTRIBUTES IN THE JSON AS FIELDS IN THE RESULT
And I would want the result to be:
{
"BusinessKey": "SomeKey",
"Source": "flat",
"id": "SomeId",
"attributes": [{"Source":"flat"},{"Element":"element"},{"SomeOtherElement":"someotherelement"}]
}
Currently we are calling ToString on the c.attributes, which is the JSON of unknown structure but it is adding all the escape characters. When we want to search the index, we have to add all those escape characters and it's getting really unruly.
Is there a way to do this using SQL?
Thanks for any help!
You could use UDF in cosmos db sql.
UDF code:
function userDefinedFunction(object){
var returnArray = [];
for (var key in object) {
var map = {};
map[key] = object[key];
returnArray.push(map);
}
return returnArray;
}
Sql:
SELECT
c.BusinessKey,
c.Source,
c.id,
udf.test(c.attributes) as attributes
from c
Output:

Is there a way to use the graphLookup aggregation pipeline stage for arrays?

I am currently working on an application that uses MongoDB as the data repository. I am mainly concerned about the graphLookup query to establish links between different people, based on what flights they took. My document contains an array field, that in turn contains key value pairs. I need to establish the links based on one of the key:value pairs of that array.
I have already tried some queries of aggregation pipeline with $graphLookup as one of the stages and they have all worked fine. But now that I am trying to use it with an array, I am hitting a blank.
Below is the array field from the first document :
"movementSegments":[
{
"carrierCode":"MO269",
"departureDateTimeMillis":1550932676000,
"arrivalDateTimeMillis":1551019076000,
"departurePort":"DOH",
"arrivalPort":"LHR",
"departurePortText":"HAMAD INTERNATIONAL AIRPORT",
"arrivalPortText":"LONDON HEATHROW",
"serviceNameText":"",
"serviceKey":"BA007_1550932676000",
"departurePortLatLong":"25.273056,51.608056",
"arrivalPortLatLong":"51.4706,-0.461941",
"departureWeeklyTemporalSpatialWindow":"DOH_8",
"departureMonthlyTemporalSpatialWindow":"DOH_2",
"arrivalWeeklyTemporalSpatialWindow":"LHR_8",
"arrivalMonthlyTemporalSpatialWindow":"LHR_2"
}
]
The other document has the below field :
"movementSegments":[
{
"carrierCode":"MO269",
"departureDateTimeMillis":1548254276000,
"arrivalDateTimeMillis":1548340676000,
"departurePort":"DOH",
"arrivalPort":"LHR",
"departurePortText":"HAMAD INTERNATIONAL AIRPORT",
"arrivalPortText":"LONDON HEATHROW",
"serviceNameText":"",
"serviceKey":"BA003_1548254276000",
"departurePortLatLong":"25.273056,51.608056",
"arrivalPortLatLong":"51.4706,-0.461941",
"departureWeeklyTemporalSpatialWindow":"DOH_4",
"departureMonthlyTemporalSpatialWindow":"DOH_1",
"arrivalWeeklyTemporalSpatialWindow":"LHR_4",
"arrivalMonthlyTemporalSpatialWindow":"LHR_1"
},
{
"carrierCode":"MO270",
"departureDateTimeMillis":1548254276000,
"arrivalDateTimeMillis":1548340676000,
"departurePort":"DOH",
"arrivalPort":"LHR",
"departurePortText":"HAMAD INTERNATIONAL AIRPORT",
"arrivalPortText":"LONDON HEATHROW",
"serviceNameText":"",
"serviceKey":"BA003_1548254276000",
"departurePortLatLong":"25.273056,51.608056",
"arrivalPortLatLong":"51.4706,-0.461941",
"departureWeeklyTemporalSpatialWindow":"DOH_4",
"departureMonthlyTemporalSpatialWindow":"DOH_1",
"arrivalWeeklyTemporalSpatialWindow":"LHR_4",
"arrivalMonthlyTemporalSpatialWindow":"LHR_1"
}
]
And I am running the below query :
db.person_events.aggregate([
{ $match: { eventId: "22446688" } },
{
$graphLookup: {
from: 'person_events',
startWith: '$movementSegments.carrierCode',
connectFromField: 'carrierCode',
connectToField: 'carrierCode',
as: 'carrier_connections'
}
}
])
The above query creates an array field in the document, but there are no values in it. As per the expectation, both my documents should get linked based on the carrier number.
Just to be clear about the query, the documents contain an eventId field, and the match pipeline returns one document to me after the match stage.
Well, I don't know how I missed it, but here is the solution to my problem which gives me the required results :
db.person_events.aggregate([
{ $match: { eventId: "22446688" } },
{
$graphLookup: {
from: 'person_events',
startWith: '$movementSegments.carrierCode',
connectFromField: 'movementSegments.carrierCode',
connectToField: 'movementSegments.carrierCode',
as: 'carrier_connections'
}
}
])

Dojo DGrid RQL Search

I am working with a dgrid where I want to find a search term in my grid on two columns.
For instance, I want to see if the scientific name and commonName columns contain the string "Aca" (I want my search to be case insensitive)
My Grid definition:
var CustomGrid = declare([Grid, Pagination ]);
var gridStore = new Memory({ idProperty: 'tsn', data: null });
gridStore.queryEngine = rql.query;
grid = new CustomGrid({
store: gridStore,
columns:
[
{ field: "tsn", label: "TSN #"},
{ field: "scientificName", label: "Scientific Name"},
{ field: "commonName", label: "Common Name",},
],
autoHeight: 'true',
firstLastArrows: 'true',
pageSizeOptions: [50, 100],
}, id);
With the built in query language (I think simple query language), I was able to find the term in one column or the other, but I couldn't do a complex search that would return results for both columns.
grid.set("query", { scientificName : new RegExp(speciesKeyword, "i") });
grid.refresh()
I started reading and I think RQL can solve this problem, however, I am struggling with the syntax.
I have been looking at these pages:
http://rql-engine.eu01.aws.af.cm/
https://github.com/kriszyp/rql
And I am able to understand basic queries, however the "contains" syntax eludes me.
For instance if I had this simple data set and wanted to find the entries with scientific and common names that contain the string "Aca" I would think my contains query would look like this:
contains(scientificName,string:aca)
However, this results in no matches.
[
{
"tsn": 1,
"scientificName": "Acalypha ostryifolia",
"commonName": "Rough-pod Copperleaf",
},
{
"tsn": 2,
"scientificName": "Aegalius acadicus",
"commonName": "Northern Saw-whet Owl",
},
{
"tsn": 3,
"scientificName": "Portulaca pilosa",
"commonName": "2012-02-01",
},
{
"tsn": 4,
"scientificName": "Accipiter striatus",
"commonName": "Kiss-me-quick",
},
{
"tsn": 5,
"scientificName": "Acorus americanus",
"commonName": "American Sweetflag",
}
]
Can someone guide me in how to formulate the correct syntax? Thank you.
From what I'm briefly reading, it appears that:
contains was replaced by any and all
these are meant for array comparisons, not string comparisons
I'm not sure offhand whether RegExps can just be handed to other operations e.g. eq.
With dojo/store/Memory, you can also pass a query function which will allow you to do whatever you want, so if you wanted to compare for a match in one field or the other you could do something like this:
grid.set('query', function (item) {
var scientificRx = new RegExp(speciesKeyword, 'i');
var commonRx = new RegExp(...);
return scientificRx.test(item.scientificName) || commonRx.test(item.commonName);
});
Of course, if you want to filter only items that match both, you can do that with simple object syntax:
grid.set('query', {
scientificName: scientificRx,
commonName: commonRx
});