How do I write this SQL query in Mongodb syntax? - sql

How do I write this SQL query in Mongodb syntax?
select a.title
from movies as a
inner join ratings as b on a.movieId=b.movieId
where a.genres like '%Children%'
and b.rating>3
group by a.title;

For this SQL Query:
select movies.title
from movies
inner join ratings on movies.movieId=ratings.movieId
where movies.genres like '%Children%'
and ratings.rating>3
group by movies.title;
The equivalent MongoDB Query is: (included sort and limit as well, remove if not required)
db.movies.aggregate(
[
{
"$lookup" : {
"from" : "ratings",
"localField" : "movieId",
"foreignField" : "movieId",
"as" : "ratings_docs"
}
},
{
"$match" : {
"ratings_docs" : {
"$ne" : [ ]
}
}
},
{
"$addFields" : {
"ratings_docs" : {
"$arrayElemAt" : [
"$ratings_docs",
0
]
}
}
},
{
"$match" : {
"genres" : /^.*Children.*$/is,
"ratings_docs.rating" : {
"$gt" : 3
}
}
},
{
"$group" : {
"_id" : {
"title" : "$title"
}
}
},
{
"$project" : {
"title" : "$_id.title"
}
},
{
"$sort" : {
"_id" : -1
}
},
{
"$limit" : 100
}
]
)
You can also generate the equivalent mongodb query anytime from the tools. like in my case I am using No Sql Booster for MongoDB. I am also using free version of No Sql Booster for MongoDB
Steps that you can follow:
STEP 1: Connect your Mongo DB Query String, and select this SQL as shown in image:
STEP 2: You wll see a text area with mb.runSQLQuery() as shown below. You can write any query, and click on Code. The code will be generated below as shown in image. Don't worry, it converts all the queries, doesnot connect on the database.

Related

Convert SQL query to Java Elasticsearch Query with grails plugin

How can I convert the following SQL query to Java Elasticsearch Query?
SELECT s.*
FROM supplier_detail_info sdi
JOIN supplier s ON sdi.supplier_id = s.id
JOIN supplier_detail sd ON sdi.supplier_detail_id = sd.id
WHERE (sdi.value = '70' AND sd.id = 1 ) OR (sdi.value = '46' and sd.id = 4);
I have tried the following (excluding the OR clause from above) but failed:
def query = QueryBuilders.boolQuery().
must(QueryBuilders.nestedQuery('supplierDetailInfos', QueryBuilders.matchQuery("supplierDetailInfos.value", '46')))
.must(QueryBuilders.nestedQuery('supplierDetailInfos.supplierDetail', QueryBuilders.matchQuery("supplierDetailInfos.supplierDetail.id", 4)))
This resulted in:
{
"bool" : {
"must" : [ {
"nested" : {
"query" : {
"match" : {
"supplierDetailInfos.value" : {
"query" : "46",
"type" : "boolean"
}
}
},
"path" : "supplierDetailInfos"
}
}, {
"nested" : {
"query" : {
"match" : {
"supplierDetailInfos.supplierDetail.id" : {
"query" : 4,
"type" : "boolean"
}
}
},
"path" : "supplierDetailInfos.supplierDetail"
}
} ]
}
}
I'm using grails 2.3.7 and have the followings domains:
class Supplier {
String name
....
static hasMany = [supplierDetailInfos : SupplierDetailInfo]
static searchable = {
only = ['name', 'supplierDetailInfos']
supplierDetailInfos component: true
}
}
class SupplierDetailInfo {
SupplierDetail supplierDetail
Supplier supplier
String value
static searchable = {
only = ['id', 'supplierDetail', 'value']
supplierDetail component: true
}
}
class SupplierDetail {
String name
....
static searchable = {
only = ['id']
}
}
I'm using elasticsearch 2.3.5 and the elastic search plugin "org.grails.plugins:elasticsearch:0.1.0".
From what I understand,
must is equivalent to AND
&
should is equivalent to OR.
The above query should have returned only those suppliers for which the value of supplier_detail_info is '46' when the supplier_detail id is 4. But it returns all suppliers for whom there exists a supplier detail info with value '46'. It simply ignores the 'and sd.id = 4' part of the query. Also, I cannot figure out how to use the OR part.
Instead of a bool query try a nested query with a bool inner query and two must clauses. Also I think that the match queries should be term queries as they seem like keyword/long fields and not text fields and if you want OR make the must a should clause
{
"query":{
"nested":{
"path":"supplierDetailInfos",
"query":{
"bool":{
"must":[
{
"match":{
"supplierDetailInfos.value":"46"
}
},
{
"match":{
"supplierDetailInfos.supplierDetail.id":4
}
}
]
}
}
}
}
}
I found the solution to my problem thanks to #sramalingam24.
The elasticsearch query can be written as follows:
{
"query":{
"nested":{
"path":"supplierDetailInfos",
"query":{
"bool":{
"must":[
{
"match":{
"supplierDetailInfos.value":"46"
}
},
{
"query":{
"nested":{
"path":"supplierDetailInfos.supplierDetail",
"query":{
"bool":{
"must":[
{
"match":{
"supplierDetailInfos.supplierDetail.id":4
}
},
]
}
}
}
}
}
]
}
}
}
}
}
The equivalent query can be written using QueryBuilders as follows:
def supplierDetailQuery = QueryBuilders.boolQuery()
def supplierDetailIdQuery = QueryBuilders.nestedQuery('supplierDetailInfos.supplierDetail',
QueryBuilders.boolQuery()
.must(QueryBuilders.matchQuery("supplierDetailInfos.supplierDetail.id", id)))
def supplierDetailIdAndValueQuery = QueryBuilders.boolQuery()
.must(QueryBuilders.matchQuery("supplierDetailInfos.value", value))
.must(supplierDetailIdQuery)
supplierDetailQuery.must(QueryBuilders.nestedQuery('supplierDetailInfos',
supplierDetailIdAndValueQuery))

MongoDB is slower than SQL Server

I have the same data of around 30 million record saved in a SQL Server table and a MongoDB collection. A sample record is shown below, I have set up the same indexes as well. Below are the queries to return the same data, one in SQL the other in mongo. The SQL query takes 2 seconds to compute and return, mongo on the other hand takes 50. Any ideas why mongo so much slower than SQL??
SQL
SELECT
COUNT(DISTINCT IP) AS Count,
DATEPART(dy, datetime)
FROM
collection
GROUP BY
DATEPART(dy, datetime)
MONGO
db.collection.aggregate([{$group:{ "_id": { $dayOfYear:"$datetime" }, IP: { $addToSet: "$IP"} }},{$unwind:"$IP"},{$group:{ _id: "$_id", count: { $sum:1} }}])
Sample Document, there are around 30 million of exact same data in both
{
"_id" : ObjectId("57968ebc7391bb1f7c2f4801"),
"IP" : "127.0.0.1",
"userAgent" : "Mozilla/5.0+(Windows+NT+10.0;+WOW64;+Trident/7.0;+LCTE;+rv:11.0)+like+Gecko",
"Country" : null,
"datetime" : ISODate("2016-07-25T16:50:18-05:00"),
"proxy" : null,
"url" : "/records/archives/archivesdb/deathcertificates/",
"HTTPStatus" : "302",
"HTTPResponseTime" : "218"
}
EDIT: added the explanation of both queries
MONGO
{
"waitedMS" : NumberLong(0),
"stages" : [
{
"$cursor" : {
"query" : {
},
"fields" : {
"IP" : 1,
"datetime" : 1,
"_id" : 0
},
"queryPlanner" : {
"plannerVersion" : 1,
"namespace" : "IISLogs.pubprdweb01",
"indexFilterSet" : false,
"parsedQuery" : {
"$and" : [ ]
},
"winningPlan" : {
"stage" : "COLLSCAN",
"filter" : {
"$and" : [ ]
},
"direction" : "forward"
},
"rejectedPlans" : [ ]
}
}
},
{
"$group" : {
"_id" : {
"$dayOfYear" : [
"$datetime"
]
},
"IP" : {
"$addToSet" : "$IP"
}
}
},
{
"$unwind" : {
"path" : "$IP"
}
},
{
"$group" : {
"_id" : "$_id",
"count" : {
"$sum" : {
"$const" : 1
}
}
}
}
],
"ok" : 1
}
SQL Server I don't have the permissions on it since I'm not a DBA or anything but it works fast enough that I'm not too concerned about its execution plan, the troublesome thing to me is that the mongo is using FETCH
The MongoDB version is slow because $group can't use an index (as evidenced by the "COLLSCAN" in the query plan), so all 30 million docs must be read into memory and run through the pipeline.
This type of real-time query (computing summary data from all docs) is simply not a good fit for MongoDB. It would be better to periodically run your aggregate with an $out stage (or use a map-reduce) to generate the summary data from the main collection and then query the resulting summary collection instead.

What is the default doc sequence of the result from an Elasticsearch filter request?

I recently run an Elasticsearch filter request that is
{
"from" : 0,
"size" : 10,
"query" : {
"filtered" : {
"filter" : {
"bool" : {
"must" : {
"terms" : {
"a_id" : [ 257793, 257798, 257844 ]
}
}
}
}
}
},
"explain" : false,
"fields" : "a_id"
}
So that I can find all docs with a_id in 257793, 257798, 257844 and the results are 257844, 257798, 257793. So far so good.
Then I find that whatever the sequence of the term numbers are, the return docs are always in the same a_id order. That is, even I run
"terms" : {
"a_id" : [257798, 257844, 257793 ]
}
The result docs are in the order of 257844, 257798, 257793 as well.
So I am so curious about the mechanism behind the Elasticsearch filtering. Can anyone help me and give me a hint?
By default, ES returns in descending order of _score. You can provide the sort option, to say in which order and based on what you want the results to be returned. For e.g., for based on date field
{
"sort": { "date": { "order": "desc" }}
"query" : {
"term" : { "user" : "kimchy" }
}
}
You can get more information:
https://www.elastic.co/guide/en/elasticsearch/reference/current/search-request-sort.html
https://www.elastic.co/guide/en/elasticsearch/guide/current/_sorting.html

Errors doing mongodb nested query

I am having a little trouble with nested queries in mongodb.
I have a collection with the following structure --
{
"_id" : Objectid(..),
"result" : {
"name" : nameValue,
"reference" : base64Value,
"city" : cityValue
}
}
Now I am to do two queries in the mongo shell -
search for a specific reference value (so query for equality)
I am using the following query -
db.TestCollection.find("result.reference" : a3d245e343 }
but I get nothing when I know the record is there in the collection
search and print for all city values.
I am looking to print something like this--
{ "city": "new york city" }
{ "city" : "brooklyn" }
... etc
For this I use this query --
db.TestCollection.find( {}, {"results.city", 1} )
For this I do not get the output I was hoping for but only get a list of all "_id" values like this --
{ "_id" : ObjectId("52e466bd562bdb7b1b320d1d") }
{ "_id" : ObjectId("52e466be562bdb7b1b320d1e") }
{ "_id" : ObjectId("52e466be562bdb7b1b320d1f") }
{ "_id" : ObjectId("52e466bf562bdb7b1b320d20") }
{ "_id" : ObjectId("52e466bf562bdb7b1b320d21") }
{ "_id" : ObjectId("52e466bf562bdb7b1b320d22") }
{ "_id" : ObjectId("52e466c0562bdb7b1b320d23") }
{ "_id" : ObjectId("52e466c0562bdb7b1b320d24") }
{ "_id" : ObjectId("52e466c1562bdb7b1b320d25") }
{ "_id" : ObjectId("52e466c1562bdb7b1b320d26") }
{ "_id" : ObjectId("52e466c2562bdb7b1b320d27") }
{ "_id" : ObjectId("52e466c2562bdb7b1b320d28") }
{ "_id" : ObjectId("52e466c2562bdb7b1b320d29") }
{ "_id" : ObjectId("52e466c3562bdb7b1b320d2a") }
{ "_id" : ObjectId("52e466c3562bdb7b1b320d2b") }
{ "_id" : ObjectId("52e466c4562bdb7b1b320d2c") }
{ "_id" : ObjectId("52e466c4562bdb7b1b320d2d") }
{ "_id" : ObjectId("52e466c4562bdb7b1b320d2e") }
{ "_id" : ObjectId("52e466c5562bdb7b1b320d2f") }
{ "_id" : ObjectId("52e466c5562bdb7b1b320d30") }
has more
What am I doing wrong?
I know there are lot of questions regarding queries but I am still wrapping my head around the whole idea. Thanks for helping a newbie out.
I found the issue with both the commands -
For some reason the condition should be given in single quotes like this --
db.TestCollection.find('result.reference' : a3d245e343 }
I thought I can use double quotes in there but looks like I am wrong.
I was using a comma instead of a colon in the projector. The correct answer is -
db.TestCollection.find( {}, {"results.city" : 1} )

Obtaining Object IDs for Schedule States in Rally

I have set up a "checkbox group" with the five schedule states in our organization's workspace. I would like to query using the Lookback API with the selected schedule states as filters. Since the LBAPI is driven by ObjectIDs, I need to pass in the ID representations of the schedule states, rather than their names. Is there a quick way to get these IDs so I can relate them to the checkbox entries?
Lookback API will accept string-valued ScheduleStates as query arguments. Thus the following query:
{
find: {
_TypeHierarchy: "HierarchicalRequirement",
"ScheduleState": "In-Progress",
__At:"current"
}
}
Works correctly for me. If you want/need OIDs though, and add &fields=true to the end of your REST query URL, you'll notice the following information coming back:
GeneratedQuery: {
{ "fields" : true,
"find" : { "$and" : [ { "_ValidFrom" : { "$lte" : "2013-04-18T20:00:25.751Z" },
"_ValidTo" : { "$gt" : "2013-04-18T20:00:25.751Z" }
} ],
"ScheduleState" : { "$in" : [ 2890498684 ] },
"_TypeHierarchy" : { "$in" : [ -51038,
2890498773,
10487547445
] },
"_ValidFrom" : { "$lte" : "2013-04-18T20:00:25.751Z" }
},
"limit" : 10,
"skip" : 0
}
}
You'll notice the ScheduleState OID here:
"ScheduleState" : { "$in" : [ 2890498684 ] }
So you could run a couple of sample queries on different ScheduleStates and find their corresponding OIDs.