Convert MongoDB schema including nested object

Suppose a MongoDB collection has documents like this:
{
"_id" : ObjectId("5a5b2657a19692e18a3792ad"),
"Toponym" : "Climate station Stavenhagen",
"Lat" : 53.33333,
"Lon" : "13.99999",
"SensorMaker" : "Hitachi",
"SensorClass" : "Thermometer",
"Dimension" : "SoilTemperature_0.05mSensor1",
"Gauge" : "degC"
}
I would like to convert the whole collection (~90k documents) to the following shape, to conform to minimal GeoJSON:
{
"_id" : ObjectId("5a5b2657a19692e18a3792ad"),
"Toponym" : "Climate station Stavenhagen",
"geometry" : {
"type" : "Point",
"coordinates" : [53.33333, 13.99999]
},
"SensorMaker": "Hitachi",
"SensorClass": "Thermometer",
"Dimension" : "SoilTemperature_0.05mSensor1",
"Gauge" : "degC"
}
I tried to convert it using this query, but whatever I do I get an error like "Line 5: Unexpected string":
db.sensor_geo.aggregate([
{ '$group' : {
'_id' : '$_id',
'Toponym' : '$Toponym'
'geometry': { 'type': 'Point', { $set : {"coordinates.$[]": [ {'$Lat', '$Lon'} ] }}},
'SensorMaker' : '$SensorMaker',
'SensorClass' : '$SensorClass',
'Dimension' : '$Dimension',
'Gauge' : '$Gauge'
}
}
]);
Should I have used $push instead of $set, even though that also led nowhere? Do I also have to create an ObjectId for the nested object, and could that have caused the problem?

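You can try the aggregation pipeline below combined with bulk writes.
The aggregation builds a geometry object from the Lat and Lon fields, and the bulk update writes the new geometry field and removes the Lat and Lon fields. Note that GeoJSON expects coordinates in [longitude, latitude] order, and that Lon is stored as a string in your sample, so both values are converted with $toDouble (available from MongoDB 4.0).
As for the error: "Unexpected string" comes from the missing comma after '$Toponym'. Beyond that, every field in $group other than _id must use an accumulator such as $first or $push, and $set is not one, so $group is the wrong stage for reshaping documents; $project is. The nested object does not need its own ObjectId.
// Collect the updates in unordered bulk batches and flush every batchSize documents.
var bulk = db.getCollection("sensor_geo").initializeUnorderedBulkOp();
var count = 0;
var batchSize = 1000;
db.getCollection("sensor_geo").aggregate([
  // Only documents that still have the flat fields, so the script can be re-run safely.
  {"$match": {"Lat": {"$exists": true}, "Lon": {"$exists": true}}},
  {"$project": {
    "geometry": {
      "type": "Point",
      // GeoJSON order is [longitude, latitude]; Lon is a string in the sample, so convert both.
      "coordinates": [{"$toDouble": "$Lon"}, {"$toDouble": "$Lat"}]
    }
  }}
]).forEach(function(doc) {
  bulk.find({"_id": doc._id}).updateOne({
    "$set": {"geometry": doc.geometry},
    "$unset": {"Lat": "", "Lon": ""}
  });
  count++;
  if (count === batchSize) {
    bulk.execute();
    bulk = db.getCollection("sensor_geo").initializeUnorderedBulkOp();
    count = 0;
  }
});
// Flush the last, partially filled batch.
if (count > 0) {
  bulk.execute();
}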
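To sanity-check the intended result before touching ~90k documents, here is a minimal client-side sketch of the same per-document conversion in plain JavaScript. toGeoJsonPoint is a hypothetical helper, not a MongoDB API, and _id is a plain string here because no driver is involved.

```javascript
// Hypothetical helper (not a MongoDB API): reshapes one sensor document into
// the minimal GeoJSON form. Lat/Lon may be numbers or numeric strings.
function toGeoJsonPoint(doc) {
  const { Lat, Lon, ...rest } = doc; // everything except the flat coordinates
  const lon = Number(Lon);
  const lat = Number(Lat);
  if (Lat == null || Lon == null || Number.isNaN(lat) || Number.isNaN(lon)) {
    throw new Error("Missing or non-numeric Lat/Lon in document " + String(doc._id));
  }
  return {
    ...rest,
    // GeoJSON positions are [longitude, latitude]
    geometry: { type: "Point", coordinates: [lon, lat] },
  };
}

// Usage with the sample document from the question (_id simplified to a string).
const converted = toGeoJsonPoint({
  _id: "5a5b2657a19692e18a3792ad",
  Toponym: "Climate station Stavenhagen",
  Lat: 53.33333,
  Lon: "13.99999",
  SensorMaker: "Hitachi",
  SensorClass: "Thermometer",
  Dimension: "SoilTemperature_0.05mSensor1",
  Gauge: "degC",
});
console.log(JSON.stringify(converted.geometry));
// prints {"type":"Point","coordinates":[13.99999,53.33333]}
```

On MongoDB 4.2 and later, update commands also accept an aggregation pipeline, so the same reshaping could be done entirely on the server with a single updateMany call; check your server version before relying on that.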

Related

How do I write this SQL query in Mongodb syntax?

select a.title
from movies as a
inner join ratings as b on a.movieId=b.movieId
where a.genres like '%Children%'
and b.rating>3
group by a.title;
For this SQL Query:
select movies.title
from movies
inner join ratings on movies.movieId=ratings.movieId
where movies.genres like '%Children%'
and ratings.rating>3
group by movies.title;
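The equivalent MongoDB query is below (sort and limit are included as well; remove them if not required). A condition on an array field matches when any element satisfies it, so matching on ratings_docs.rating after the $lookup keeps every movie that has at least one rating above 3, just like the inner join followed by GROUP BY.
db.movies.aggregate(
  [
    // WHERE movies.genres LIKE '%Children%' (filtering first means fewer lookups)
    {
      "$match" : {
        "genres" : /Children/i
      }
    },
    // INNER JOIN ratings ON movies.movieId = ratings.movieId
    {
      "$lookup" : {
        "from" : "ratings",
        "localField" : "movieId",
        "foreignField" : "movieId",
        "as" : "ratings_docs"
      }
    },
    // AND ratings.rating > 3: true if ANY joined rating is above 3 (also drops movies without ratings)
    {
      "$match" : {
        "ratings_docs.rating" : {
          "$gt" : 3
        }
      }
    },
    // GROUP BY movies.title
    {
      "$group" : {
        "_id" : {
          "title" : "$title"
        }
      }
    },
    {
      "$project" : {
        "title" : "$_id.title"
      }
    },
    {
      "$sort" : {
        "_id" : -1
      }
    },
    {
      "$limit" : 100
    }
  ]
)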
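To check any pipeline (hand-written or generated) against the SQL, a small in-memory model of the SQL semantics helps. This is a plain JavaScript sketch with made-up movies and ratings, not MongoDB code, and childrenTitlesRatedAbove3 is a hypothetical name. It illustrates the two points that are easy to get wrong: a movie with several ratings qualifies if any one of them is above 3, and movies without ratings are dropped by the inner join.

```javascript
// Hypothetical in-memory model of the SQL query (not MongoDB code):
//   SELECT movies.title FROM movies
//   INNER JOIN ratings ON movies.movieId = ratings.movieId
//   WHERE movies.genres LIKE '%Children%' AND ratings.rating > 3
//   GROUP BY movies.title;
function childrenTitlesRatedAbove3(movies, ratings) {
  // Group ratings by movieId, similar to what $lookup attaches to each movie.
  const ratingsByMovie = new Map();
  for (const r of ratings) {
    if (!ratingsByMovie.has(r.movieId)) ratingsByMovie.set(r.movieId, []);
    ratingsByMovie.get(r.movieId).push(r);
  }
  const titles = new Set(); // GROUP BY title => each title once
  for (const m of movies) {
    if (!/Children/i.test(m.genres)) continue; // case-insensitive, like the /i regex flag
    const joined = ratingsByMovie.get(m.movieId) || []; // no ratings => dropped by the inner join
    // A movie qualifies if ANY of its ratings is above 3, not only the first one.
    if (joined.some((r) => r.rating > 3)) titles.add(m.title);
  }
  return [...titles];
}

// Made-up sample data for illustration.
const movies = [
  { movieId: 1, title: "Toy Story", genres: "Adventure|Animation|Children" },
  { movieId: 2, title: "Heat", genres: "Action|Crime" },
  { movieId: 3, title: "Jumanji", genres: "Adventure|Children|Fantasy" },
  { movieId: 4, title: "Babe", genres: "Children|Drama" },
];
const ratings = [
  { movieId: 1, rating: 2 },   // first rating is low...
  { movieId: 1, rating: 4.5 }, // ...but this one qualifies
  { movieId: 2, rating: 5 },   // not a Children movie
  { movieId: 3, rating: 3 },   // not above 3
];
console.log(childrenTitlesRatedAbove3(movies, ratings)); // [ 'Toy Story' ]
```

Whether SQL LIKE is case-sensitive depends on the database collation, so the regex flag may need adjusting to match your SQL results exactly.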
You can also generate the equivalent MongoDB query from a tool. In my case I am using the free version of NoSQLBooster for MongoDB.
Steps you can follow:
STEP 1: Connect using your MongoDB connection string and select the SQL option.
STEP 2: You will see a text area containing mb.runSQLQuery(). Write your SQL query there and click Code; the equivalent MongoDB code is generated below it. This only converts the query; it does not run it against the database.

Minimize the size of JSON data returned from a Spring Data repository

I have two microservices; one of them needs to load all operator names/codes at boot and index them in a RadixTree.
I am loading around 36,000 records using Feign/Spring Data REST, and it works, but I noticed that approximately half of the response size comes from links:
{
"_embedded" : {
"operatorcode" : [ {
"enabled" : true,
"code" : 9320,
"operatorCodeId" : 110695,
"operatorName" : "Afghanistan - Kabul/9320",
"operatorId" : 1647,
"activationDate" : "01-01-2008",
"deactivationDate" : "31-12-2099",
"countryId" : 1,
"_links" : {
"self" : {
"href" : "http://10.44.0.51:8083/operatorcode/110695"
},
"operatorCode" : {
"href" : "http://10.44.0.51:8083/operatorcode/110695{?projection}",
"templated" : true
},
"operator" : {
"href" : "http://10.44.0.51:8083/operatorcode/110695/operator"
}
}
}
...
]
}
Is there any way to stop sending back the _links, since they are not used in my case? I tried setting use-hal-as-default-JSON-media-type: false and using projections, but did not succeed.
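I am not sure that this is the correct way to do it, but you can try something like this:
import com.fasterxml.jackson.annotation.JsonIgnoreProperties;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.http.converter.json.Jackson2ObjectMapperBuilder;
@Configuration
public class JacksonConfig { // hypothetical class name; any @Configuration class works
    // Apply the mix-in to every type so that a "_links" property is ignored during serialization
    @Bean
    public Jackson2ObjectMapperBuilder jacksonBuilder() {
        Jackson2ObjectMapperBuilder b = new Jackson2ObjectMapperBuilder();
        b.mixIn(Object.class, IgnorePropertiesInJackson.class);
        return b;
    }
    @JsonIgnoreProperties({"_links"})
    private abstract static class IgnorePropertiesInJackson {
    }
}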
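To get a feel for how much of the payload the links account for, here is a small JavaScript illustration using the record from the question. It only models the effect of dropping _links from the serialized output; it is not how Spring Data REST or Jackson implement it, and stripLinks is a hypothetical name.

```javascript
// Illustration only: Jackson is not involved here. This recursively copies a
// parsed JSON value and drops every "_links" property, to show the size saving.
function stripLinks(value) {
  if (Array.isArray(value)) return value.map(stripLinks);
  if (value !== null && typeof value === "object") {
    const out = {};
    for (const [key, child] of Object.entries(value)) {
      if (key !== "_links") out[key] = stripLinks(child);
    }
    return out;
  }
  return value;
}

// One record from the question's response.
const response = {
  _embedded: {
    operatorcode: [{
      enabled: true,
      code: 9320,
      operatorCodeId: 110695,
      operatorName: "Afghanistan - Kabul/9320",
      operatorId: 1647,
      activationDate: "01-01-2008",
      deactivationDate: "31-12-2099",
      countryId: 1,
      _links: {
        self: { href: "http://10.44.0.51:8083/operatorcode/110695" },
        operatorCode: { href: "http://10.44.0.51:8083/operatorcode/110695{?projection}", templated: true },
        operator: { href: "http://10.44.0.51:8083/operatorcode/110695/operator" },
      },
    }],
  },
};

const before = JSON.stringify(response).length;
const after = JSON.stringify(stripLinks(response)).length;
console.log(`${before} -> ${after} characters`);
```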

db.find vs db.aggregate to select a nested array object

I've tried the following query:
db.getCollection('fxh').find({"username": "user1", "pf.acc.accnbr" : 915177},{userid: true, "pf.pfid": true, "pf.acc.accid":true})
and a document in my collection looks like this:
{
"_id" : ObjectId("5932fd8f381d4c0a7de21942"),
"userid" : 1496513894,
"username" : "user1",
"email" : "user1@gmail.com",
"fullname" : "User 1",
"pf" : {
"acc" : [
{
"cyc" : [
{
"det" : {
"status" : "New",
"dcycid" : 1496513941
},
"status" : "New",
"name" : "QPT202017_M1",
"cycid" : 1496513940
}
],
"status" : "New",
"accnbr" : 915177,
"accid" : 1496513939
},
{
"cyc" : [
{
"det" : {
"status" : "New",
"dcycid" : 1496552643
},
"status" : "New",
"name" : "QPT202017_S8",
"cycid" : 1496552642
}
],
"status" : "New",
"accnbr" : 73497,
"accid" : 1496552641
}
],
"pfid" : 1496513935
},
"lastupdate" : ISODate("2017-06-03T18:18:55.080Z"),
"__v" : 0
}
When I execute the query, the result is the following:
{
"_id" : ObjectId("5932fd8f381d4c0a7de21942"),
"userid" : 1496513894,
"pf" : {
"acc" : [
{
"accid" : 1496513939
},
{
"accid" : 1496552641
}
],
"pfid" : 1496513935
}
}
My problem is that I need to see only the matching accid, but the result returns all of the accid values.
Any idea how to return only the accid of the selected accnbr?
NB: I have also tried adding the positional $ operator to the projection. It selects the right acc, but it returns the whole acc object with everything nested in it, whereas I need just ONE returned value.
Edit on 6/5/17:
I also used the aggregate command instead of find, and got the result with this:
db.getCollection('fxh').aggregate([ { $unwind : "$pf.acc"} , { $match : {"username":"adh1", "pf.acc.accnbr": 915177 } }, {$project : {_id:0, accid: "$pf.acc.accid"}}])
But I could NOT get a lower-level result when I ran this:
db.getCollection('fxh').aggregate([ { $unwind : "$pf.acc.cyc"} , { $match : {"username":"adh1", "pf.acc.accnbr": 915177, "pf.acc.cyc.name": "QPT202017_M1" } }, {$project : {_id:0, cycid: "$pf.acc.cyc.cycid"}}])
Any idea?
You can try the aggregation pipeline below.
The idea is to $unwind one nested level at a time, from the outermost array to the innermost. In your second query pf.acc has not been unwound yet, so pf.acc.cyc does not point at a single array; unwind pf.acc first.
After unwinding each level, you can apply a $match to limit the documents, and continue until you have the desired shape.
You can $group the results back together at the end if you need the original shape.
db.getCollection('fxh').aggregate([
  { $match : { "username" : "adh1" } },
  { $unwind : "$pf.acc" },
  { $match : { "pf.acc.accnbr" : 915177 } },
  { $unwind : "$pf.acc.cyc" },
  { $match : { "pf.acc.cyc.name" : "QPT202017_M1" } },
  { $project : { _id : 0, accid : "$pf.acc.accid", cycid : "$pf.acc.cyc.cycid" } }
])
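The unwind-then-match idea can be mirrored on a single, already fetched document with plain array methods: each filter plus flatMap corresponds to one $unwind followed by a $match. This is an illustrative JavaScript sketch with a hypothetical helper name (findCycleIds) and a trimmed copy of the sample document, not MongoDB code.

```javascript
// Plain-JavaScript sketch of the pipeline's logic on one already-fetched document
// (hypothetical helper, not a MongoDB API): each filter + flatMap mirrors an $unwind + $match.
function findCycleIds(doc, accnbr, cycName) {
  return doc.pf.acc
    .filter((acc) => acc.accnbr === accnbr) // $unwind "$pf.acc" + $match accnbr
    .flatMap((acc) =>
      acc.cyc
        .filter((cyc) => cyc.name === cycName) // $unwind "$pf.acc.cyc" + $match name
        .map((cyc) => ({ accid: acc.accid, cycid: cyc.cycid })) // $project
    );
}

// Trimmed copy of the sample document from the question.
const doc = {
  username: "user1",
  pf: {
    pfid: 1496513935,
    acc: [
      { accnbr: 915177, accid: 1496513939, cyc: [{ name: "QPT202017_M1", cycid: 1496513940 }] },
      { accnbr: 73497, accid: 1496552641, cyc: [{ name: "QPT202017_S8", cycid: 1496552642 }] },
    ],
  },
};
console.log(findCycleIds(doc, 915177, "QPT202017_M1")); // [ { accid: 1496513939, cycid: 1496513940 } ]
```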

MongoDB is slower than SQL Server

I have the same data, around 30 million records, stored in a SQL Server table and in a MongoDB collection. A sample record is shown below, and I have set up the same indexes in both. Below are the two queries that return the same data, one in SQL and the other in Mongo. The SQL query takes 2 seconds to compute and return; Mongo, on the other hand, takes 50 seconds. Any ideas why Mongo is so much slower than SQL?
SQL
SELECT
COUNT(DISTINCT IP) AS Count,
DATEPART(dy, datetime)
FROM
collection
GROUP BY
DATEPART(dy, datetime)
MONGO
db.collection.aggregate([{$group:{ "_id": { $dayOfYear:"$datetime" }, IP: { $addToSet: "$IP"} }},{$unwind:"$IP"},{$group:{ _id: "$_id", count: { $sum:1} }}])
Sample document (both databases hold the same ~30 million records):
{
"_id" : ObjectId("57968ebc7391bb1f7c2f4801"),
"IP" : "127.0.0.1",
"userAgent" : "Mozilla/5.0+(Windows+NT+10.0;+WOW64;+Trident/7.0;+LCTE;+rv:11.0)+like+Gecko",
"Country" : null,
"datetime" : ISODate("2016-07-25T16:50:18-05:00"),
"proxy" : null,
"url" : "/records/archives/archivesdb/deathcertificates/",
"HTTPStatus" : "302",
"HTTPResponseTime" : "218"
}
EDIT: added the explain output for the queries
MONGO
{
"waitedMS" : NumberLong(0),
"stages" : [
{
"$cursor" : {
"query" : {
},
"fields" : {
"IP" : 1,
"datetime" : 1,
"_id" : 0
},
"queryPlanner" : {
"plannerVersion" : 1,
"namespace" : "IISLogs.pubprdweb01",
"indexFilterSet" : false,
"parsedQuery" : {
"$and" : [ ]
},
"winningPlan" : {
"stage" : "COLLSCAN",
"filter" : {
"$and" : [ ]
},
"direction" : "forward"
},
"rejectedPlans" : [ ]
}
}
},
{
"$group" : {
"_id" : {
"$dayOfYear" : [
"$datetime"
]
},
"IP" : {
"$addToSet" : "$IP"
}
}
},
{
"$unwind" : {
"path" : "$IP"
}
},
{
"$group" : {
"_id" : "$_id",
"count" : {
"$sum" : {
"$const" : 1
}
}
}
}
],
"ok" : 1
}
For SQL Server, I don't have permission to view the execution plan since I'm not a DBA, but it runs fast enough that I'm not too concerned about it. What troubles me is that Mongo is using FETCH.
The MongoDB version is slow because $group can't use an index (as evidenced by the "COLLSCAN" in the query plan), so all 30 million docs must be read into memory and run through the pipeline.
This type of real-time query (computing summary data from all docs) is simply not a good fit for MongoDB. It would be better to periodically run your aggregate with an $out stage (or use a map-reduce) to generate the summary data from the main collection and then query the resulting summary collection instead.
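A summary collection like the one suggested above would hold one document per day of year with its distinct-IP count. To make those semantics concrete, here is an in-memory JavaScript sketch of what both queries compute (hypothetical function names, made-up log entries). One detail worth knowing: $dayOfYear works in UTC unless a timezone is specified, so day boundaries can differ from SQL Server's DATEPART if the SQL column holds local times.

```javascript
// Day of year (1-366) computed in UTC, which is what $dayOfYear uses when no timezone is given.
function dayOfYearUTC(date) {
  const startOfYear = Date.UTC(date.getUTCFullYear(), 0, 1);
  const startOfDay = Date.UTC(date.getUTCFullYear(), date.getUTCMonth(), date.getUTCDate());
  return (startOfDay - startOfYear) / 86400000 + 1;
}

// COUNT(DISTINCT IP) ... GROUP BY day of year: one summary row per day.
function distinctIpsPerDay(logs) {
  const ipsByDay = new Map();
  for (const { IP, datetime } of logs) {
    const day = dayOfYearUTC(datetime);
    if (!ipsByDay.has(day)) ipsByDay.set(day, new Set());
    ipsByDay.get(day).add(IP);
  }
  return [...ipsByDay]
    .map(([day, ips]) => ({ _id: day, count: ips.size }))
    .sort((a, b) => a._id - b._id);
}

// Made-up log entries; the last one is July 25 in UTC-5 but July 26 in UTC.
const logs = [
  { IP: "127.0.0.1", datetime: new Date("2016-07-25T16:50:18-05:00") },
  { IP: "127.0.0.1", datetime: new Date("2016-07-25T17:10:00-05:00") },
  { IP: "10.0.0.2", datetime: new Date("2016-07-25T10:00:00Z") },
  { IP: "127.0.0.1", datetime: new Date("2016-07-25T20:00:00-05:00") },
];
console.log(distinctIpsPerDay(logs)); // [ { _id: 207, count: 2 }, { _id: 208, count: 1 } ]
```

A commonly used alternative to $addToSet followed by $unwind is to $group on the { day, IP } pair first and then count per day, which avoids building one potentially large array of IPs per day; it still scans every document, so it does not change the main point above.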

Errors doing a nested query in MongoDB

I am having a little trouble with nested queries in mongodb.
I have a collection with the following structure:
{
"_id" : ObjectId(...),
"result" : {
"name" : nameValue,
"reference" : base64Value,
"city" : cityValue
}
}
Now I need to run two queries in the mongo shell:
1. Search for a specific reference value (a query for equality).
I am using the following query:
db.TestCollection.find("result.reference" : a3d245e343 }
but I get nothing, even though I know the record is in the collection.
2. Search for and print all city values.
I am looking to print something like this:
{ "city": "new york city" }
{ "city" : "brooklyn" }
... etc
For this I use this query:
db.TestCollection.find( {}, {"results.city", 1} )
Instead of the output I was hoping for, I only get a list of all "_id" values like this:
{ "_id" : ObjectId("52e466bd562bdb7b1b320d1d") }
{ "_id" : ObjectId("52e466be562bdb7b1b320d1e") }
{ "_id" : ObjectId("52e466be562bdb7b1b320d1f") }
{ "_id" : ObjectId("52e466bf562bdb7b1b320d20") }
{ "_id" : ObjectId("52e466bf562bdb7b1b320d21") }
{ "_id" : ObjectId("52e466bf562bdb7b1b320d22") }
{ "_id" : ObjectId("52e466c0562bdb7b1b320d23") }
{ "_id" : ObjectId("52e466c0562bdb7b1b320d24") }
{ "_id" : ObjectId("52e466c1562bdb7b1b320d25") }
{ "_id" : ObjectId("52e466c1562bdb7b1b320d26") }
{ "_id" : ObjectId("52e466c2562bdb7b1b320d27") }
{ "_id" : ObjectId("52e466c2562bdb7b1b320d28") }
{ "_id" : ObjectId("52e466c2562bdb7b1b320d29") }
{ "_id" : ObjectId("52e466c3562bdb7b1b320d2a") }
{ "_id" : ObjectId("52e466c3562bdb7b1b320d2b") }
{ "_id" : ObjectId("52e466c4562bdb7b1b320d2c") }
{ "_id" : ObjectId("52e466c4562bdb7b1b320d2d") }
{ "_id" : ObjectId("52e466c4562bdb7b1b320d2e") }
{ "_id" : ObjectId("52e466c5562bdb7b1b320d2f") }
{ "_id" : ObjectId("52e466c5562bdb7b1b320d30") }
...
What am I doing wrong?
I know there are lot of questions regarding queries but I am still wrapping my head around the whole idea. Thanks for helping a newbie out.
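I found the issues with both commands:
1. The filter has to be a document wrapped in braces, and a string value has to be quoted. Single and double quotes are interchangeable in the mongo shell, so the quote style was not the problem:
db.TestCollection.find({ "result.reference" : "a3d245e343" })
2. I was using a comma instead of a colon in the projection, and the field is result, not results. The correct query is:
db.TestCollection.find( {}, { "result.city" : 1, "_id" : 0 } )
This prints documents such as { "result" : { "city" : "new york city" } }. To get the flat { "city" : ... } shape, use an aggregation instead:
db.TestCollection.aggregate([ { $project : { _id : 0, city : "$result.city" } } ])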
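A list of only _id values is what an inclusion projection returns when the projected path does not exist: _id is included by default and nothing else matches. The simplified JavaScript model below shows the difference between result.city and results.city; projectPath is a hypothetical helper, not MongoDB's implementation, and it ignores arrays and multiple paths.

```javascript
// Simplified model of an inclusion projection with one dotted path, e.g. { "result.city": 1 }.
// Hypothetical helper, not MongoDB's implementation: arrays and multiple paths are not handled.
function projectPath(doc, path, includeId = true) {
  const keys = path.split(".");
  const out = includeId && "_id" in doc ? { _id: doc._id } : {};
  let value = doc;
  for (const key of keys) {
    if (value === null || typeof value !== "object" || !(key in value)) {
      return out; // path does not exist: only _id is left
    }
    value = value[key];
  }
  // Rebuild the nested shape, e.g. { result: { city: "..." } }
  let node = out;
  for (const key of keys.slice(0, -1)) {
    node[key] = {};
    node = node[key];
  }
  node[keys[keys.length - 1]] = value;
  return out;
}

const doc = {
  _id: "52e466bd562bdb7b1b320d1d",
  result: { name: "someName", reference: "a3d245e343", city: "new york city" },
};
console.log(projectPath(doc, "results.city"));       // { _id: '52e466bd562bdb7b1b320d1d' }
console.log(projectPath(doc, "result.city", false)); // { result: { city: 'new york city' } }
```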