MongoDB query to filter documents based on the length of a field value

I am writing AWS Lambda code in Python. My database is AWS DocumentDB, and I use pymongo.
This code snippet works fine:
query = {"media_id": {"$exists": True}}
collection.find(query)
But it returns a lot of records, so I want to fetch only the records where the length of the media_id field is less than 3.
For that I tried this query:
query = {"media_id": {"$exists": True}, "$expr": {"$lt": [{"$strLenCP": "$media_id"}, 3]}}
but I get a "Feature not supported" error, because $expr is not supported in DocumentDB.
I am looking for a query that works in DocumentDB.

The solution might seem a bit tedious, but all of the operations inside should be supported according to the official docs.
Use an aggregation to project an auxiliary field that stores the length of media_id, then match on your criteria.
db.collection.aggregate([
  // match on existence first, so $strLenCP never receives a missing field
  {
    $match: {
      media_id: { $exists: true }
    }
  },
  {
    $addFields: {
      length: { $strLenCP: "$media_id" }
    }
  },
  {
    $match: {
      length: { $lt: 3 }
    }
  },
  {
    $project: { length: false }
  }
])
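From pymongo the same pipeline can be expressed as a list of stage dicts. A minimal sketch, assuming `collection` is the handle your Lambda already obtains from the DocumentDB client:

```python
def media_id_shorter_than(max_len):
    """Build a DocumentDB-compatible pipeline selecting documents
    whose media_id exists and is shorter than max_len characters."""
    return [
        # Filter on existence first so $strLenCP never sees a missing field.
        {"$match": {"media_id": {"$exists": True}}},
        {"$addFields": {"length": {"$strLenCP": "$media_id"}}},
        {"$match": {"length": {"$lt": max_len}}},
        # Drop the auxiliary field from the output.
        {"$project": {"length": False}},
    ]

# results = collection.aggregate(media_id_shorter_than(3))
```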

Related

Flatten complex json using Databricks and ADF

I have the following JSON, which I have flattened partially using explode:
{
  "result": [
    {
      "employee": [
        {
          "employeeType": {
            "name": "[empName]",
            "displayName": "theName"
          },
          "groupValue": "value1"
        },
        {
          "employeeType": {
            "name": "#bossName#",
            "displayName": "theBoss"
          },
          "groupValue": [
            {
              "id": "1",
              "type": {
                "name": "firstBoss",
                "displayName": "CEO"
              },
              "name": "Martha"
            },
            {
              "id": "2",
              "type": {
                "name": "secondBoss",
                "displayName": "cto"
              },
              "name": "Alex"
            }
          ]
        }
      ]
    }
  ]
}
I need to get the following fields:
employeeType.name
groupValue
I am able to extract those fields and values. But if the name value starts with #, as in "name":"#bossName#", I get groupValue as a string, from which I need to extract id and name.
How can I convert this string to JSON and get the values?
My code so far:
from pyspark.sql.functions import *
db_flat = (df.select(explode("result.employee").alias("emp"))
           .withColumn("emp_name", col("emp.employeeType.name"))
           .withColumn("emp_val", col("emp.groupValue"))
           .drop("emp"))
How can I extract groupValue from db_flat and get id and name from it? Maybe using the Python pandas library?
Since the paths won't be dynamic, you can traverse through the JSON while mapping, as below. Just identify each record and array, and specify the index [i] as needed.
Example:
id --> $['employee'][1]['groupValue'][0]['id']
name --> $['employee'][1]['groupValue'][0]['type']['name']
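When groupValue arrives as a string, it can be parsed back into structured data with json.loads. In Spark the same idea is usually applied via from_json with an explicit schema, but a plain-Python sketch (field names taken from the sample above) shows the extraction:

```python
import json

def extract_bosses(group_value_str):
    """Parse a stringified groupValue array and pull out (id, name) pairs."""
    entries = json.loads(group_value_str)
    return [(e["id"], e["name"]) for e in entries]

sample = ('[{"id":"1","type":{"name":"firstBoss","displayName":"CEO"},"name":"Martha"},'
          '{"id":"2","type":{"name":"secondBoss","displayName":"cto"},"name":"Alex"}]')
# extract_bosses(sample) → [("1", "Martha"), ("2", "Alex")]
```

In PySpark, this function could be registered as a UDF, or replaced with from_json plus the matching StructType schema, to turn the string column into a queryable struct column.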

Need to convert this SQL query to MongoDB

I am new to MongoDB. I need to convert this SQL query to MongoDB:
select TOP 5 r.regionName, COUNT(c.RegionID)
from region as r,
company as c
where c.RegionID = r._id
group by r.regionName
order by COUNT(c.RegionID) DESC;
Option 1. You can use the aggregation framework with $lookup, $group, $project, $sort and $limit stages, but this seems like the wrong approach, since the true power of replacing a relational database with MongoDB is denormalization and the avoidance of join-like ($lookup) queries.
Option 2. You convert your multi-table relational schema to the document model and proceed with a simple $group, $project, $sort and $limit aggregation for the above task.
Since you have not provided any MongoDB document examples, it is hard to say what your queries will look like.
Despite my comment, I'll try to give a translation (not tested):
db.region.aggregate([
  {
    $lookup: { // left outer join of the collections
      from: "company",
      localField: "_id",
      foreignField: "RegionID",
      as: "c"
    }
  },
  { $match: { c: { $ne: [] } } }, // drop non-matching documents (i.e. INNER JOIN)
  { $group: { _id: "$regionName", count: { $sum: { $size: "$c" } } } }, // count companies per region name
  { $project: { regionName: "$_id", count: 1, _id: 0 } }, // some cosmetics
  { $sort: { count: -1 } }, // order by the count, descending, as in the SQL
  { $limit: 5 } // limit the number of returned documents
])
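The same translation can be driven from pymongo by building the pipeline as plain Python dicts. A sketch, assuming `db` is a pymongo database handle with `region` and `company` collections as in the SQL:

```python
def top_regions_pipeline(limit=5):
    """Pipeline mirroring the SQL: join region -> company, count companies
    per region name, and return the `limit` biggest regions."""
    return [
        {"$lookup": {"from": "company", "localField": "_id",
                     "foreignField": "RegionID", "as": "c"}},
        {"$match": {"c": {"$ne": []}}},  # INNER JOIN semantics
        {"$group": {"_id": "$regionName",
                    "count": {"$sum": {"$size": "$c"}}}},
        {"$project": {"regionName": "$_id", "count": 1, "_id": 0}},
        {"$sort": {"count": -1}},
        {"$limit": limit},
    ]

# top5 = list(db.region.aggregate(top_regions_pipeline()))
```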

Remove old MongoDB documents

Quite an interesting case. I have an enormous MongoDB collection with lots of documents. These are two of the fields (I changed the field names):
{
"pidNumber" : NumberLong(12103957251),
"eventDate" : ISODate("2018-05-15T00:00:00.000+0000")
}
I need to count all the instances where the date is older than 1 year but ONLY if there's a more recent document with the same pidNumber.
So for example:
If there's only one document with pidNumber 1234 and it's from three
years ago - keep it (don't count).
But if on top of that there's another document with pidNumber 1234 and
it's from two years ago - count the three years old one.
Is it possible to do? Does anyone have an idea how to do it?
Not fully clear what you are looking for, but this might be a good starting point:
db.collection.aggregate([
{
$group:
{
_id: "$pidNumber",
eventDate: { $max: "$eventDate" },
count: { $sum: 1 }
}
},
{
$match: {
$or: [
{ count: 1 },
{ eventDate: { $gt: moment().endOf('day').subtract(1, 'year').toDate() } }
]
}
}
])
Whenever one has to deal with date/time values, I recommend the moment.js library.
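From Python/pymongo, the same starting point can be built with the standard datetime module instead of moment.js. A sketch (the cutoff approximates "1 year" as 365 days; adjust as needed):

```python
from datetime import datetime, timedelta

def keeper_pipeline(now=None):
    """Group by pidNumber; keep groups that are either singletons or whose
    newest eventDate falls within the last year. Everything else is a
    candidate for counting/removal."""
    now = now or datetime.now()
    cutoff = now - timedelta(days=365)  # "older than 1 year" boundary
    return [
        {"$group": {"_id": "$pidNumber",
                    "eventDate": {"$max": "$eventDate"},
                    "count": {"$sum": 1}}},
        {"$match": {"$or": [{"count": 1},
                            {"eventDate": {"$gt": cutoff}}]}},
    ]

# keepers = list(collection.aggregate(keeper_pipeline()))
```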

Creating a CouchDB view to index whether an item in an array exists

I have the following sample documents in my CouchDB. The original database in production has about 2M records.
{
  "_id": "someid|goes|here",
  "collected": {
    "tags": ["abc", "def", "ghi"]
  }
}
{
  "_id": "someid1|goes|here",
  "collected": {
    "tags": ["abc", "klm", "pqr"]
  }
}
{
  "_id": "someid2|goes|here",
  "collected": {
    "tags": ["efg", "hij", "klm"]
  }
}
Based on my previous question here, how to search for values when the selector is an array,
I currently have an index added for the collected.tags field, but the search is still taking a long time. Here is the search query I have.
{
"selector": {
"collected.tags": {
"$elemMatch": {
"$regex": "abc"
}
}
}
}
There are about 300k records matching the above condition, and the search seems to take a long time. So I want to create an indexed view to make retrieval and lookup faster, instead of a find/search. I am new to CouchDB and am not sure how to set up the map function to create the indexed view.
Figured the map function out myself. Now all the documents are indexed and retrievals are faster:
function (doc) {
  if (doc.collected && doc.collected.tags && doc.collected.tags.indexOf('abc') > -1) {
    emit(doc._id, doc);
  }
}
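Once the map function is saved in a design document, the view is queried over plain HTTP. The design document and view names below are assumptions for illustration (e.g. _design/tags with a view called by_abc); a sketch of building the request URL from Python:

```python
from urllib.parse import quote

def view_url(base, db, ddoc, view, **params):
    """Build a CouchDB view URL, e.g.
    http://host:5984/db/_design/ddoc/_view/view?limit=10"""
    url = f"{base}/{db}/_design/{ddoc}/_view/{view}"
    if params:
        url += "?" + "&".join(f"{k}={quote(str(v))}" for k, v in params.items())
    return url

# view_url("http://localhost:5984", "mydb", "tags", "by_abc", limit=10)
# → 'http://localhost:5984/mydb/_design/tags/_view/by_abc?limit=10'
```

Note that the map function above hardcodes the 'abc' tag; to query arbitrary tags from one view, a map function that emits each tag as the key would be the more general design.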

How to convert this SQL query to MongoDB

Considering this query written for SQL Server, how would I efficiently convert it to MongoDB:
select * from thetable where column1 = column2 * 2
You can use the aggregation below.
Project a new field comp to hold the result of the comparison expression, followed by a $match to keep the docs whose comp value is 0 (i.e. equal), and a $project with exclusion to drop the comp field.
db.collection.aggregate([
{ $addFields: {"comp": {$cmp: ["$column1", {$multiply: [ 2, "$column2" ]} ]}}},
{ $match: {"comp":0}},
{ $project:{"comp":0}}
])
If you want to run your query in the mongo shell, try the code below:
db.thetable.find({}).forEach(function(tt) {
  var ttcol2 = tt.column2 * 2;
  var compareCurrent = db.thetable.findOne({ _id: tt._id, column1: ttcol2 });
  if (compareCurrent) {
    printjson(compareCurrent);
  }
});
I liked the answer posted by @Veeram, but it is also possible to achieve this using $project and $match pipeline stages.
This is just for understanding the flow.
Assume we have the two documents below stored in a math collection:
Mongo Documents
{
"_id" : ObjectId("58a055b52f67a312c3993553"),
"num1" : 2,
"num2" : 4
}
{
"_id" : ObjectId("58a055be2f67a312c3993555"),
"num1" : 2,
"num2" : 6
}
Now we need to find the documents where num2 = 2 times num1 (in our case, the document with _id ObjectId("58a055b52f67a312c3993553") matches this condition).
Query:
db.math.aggregate([
{
"$project": {
"num2": {
"$multiply": ["$num2",1]
},
"total": {
"$multiply": ["$num1",2]
},
"doc": "$$ROOT"
}
},
{
"$project": {
"areEqual": {"$eq": ["$num2","$total"]
},
doc: 1
}
},
{
"$match": {
"areEqual": true
}
},
{
"$project": {
"_id": 1,
"num1": "$doc.num1",
"num2": "$doc.num2"
}
}
])
Pipeline stages:
The 1st $project stage calculates the total.
The 2nd $project stage checks whether the total matches num2. This is needed because we cannot compare num2 against total directly in a $match stage.
The 3rd stage matches documents where areEqual is true.
The 4th $project stage is just used for projecting the fields.
Note:
In the 1st $project stage I multiplied num2 by 1, since num1 and num2 are stored as integers while $multiply returns a double value. So in case I did not use $multiply for num2, it would try to match 4 against 4.0, which would not match the document.
Certainly there is no need for multiple pipeline stages when a single $redact stage will suffice, as it beautifully incorporates the functionality of the $project and $match stages. Consider running the following pipeline for an efficient query:
db.collection.aggregate([
{
"$redact": {
"$cond": [
{
"$eq": [
"$column1",
{ "$multiply": ["$column2", 2] }
]
},
"$$KEEP",
"$$PRUNE"
]
}
}
])
In the above, $redact keeps all documents that match the condition via the $$KEEP system variable and discards those that don't via $$PRUNE.
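On MongoDB 3.6+ (though, as the first question on this page notes, not on DocumentDB), the same condition can also be expressed directly in a find filter with $expr, avoiding the aggregation pipeline entirely. A pymongo sketch, assuming `db.thetable` is the collection from the question:

```python
def column_ratio_filter():
    """find() filter matching documents where column1 == column2 * 2."""
    return {"$expr": {"$eq": ["$column1",
                              {"$multiply": ["$column2", 2]}]}}

# docs = list(db.thetable.find(column_ratio_filter()))
```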