Remove old MongoDB documents

Quite an interesting case. I have an enormous MongoDB collection with lots of documents. These are two of the fields (I changed the field names):
{
"pidNumber" : NumberLong(12103957251),
"eventDate" : ISODate("2018-05-15T00:00:00.000+0000")
}
I need to count all the instances where the date is older than 1 year but ONLY if there's a more recent document with the same pidNumber.
So for example:
If there's only one document with pidNumber 1234 and it's from three years ago, keep it (don't count it).
But if there's also another document with pidNumber 1234 from two years ago, count the three-year-old one.
Is it possible to do this? Does anyone have an idea of how to do it?

It's not entirely clear what you are looking for, but this might be a good starting point:
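db.collection.aggregate([
  {
    // One group per pidNumber: its latest eventDate and its number of documents
    $group: {
      _id: "$pidNumber",
      eventDate: { $max: "$eventDate" },
      count: { $sum: 1 }
    }
  },
  {
    // Keep pidNumbers with a single document, or whose latest document is less than a year old
    $match: {
      $or: [
        { count: 1 },
        { eventDate: { $gt: moment().endOf('day').subtract(1, 'year').toDate() } }
      ]
    }
  }
])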
Whenever you have to deal with date/time values, I recommend the moment.js library.
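The pipeline above is only a starting point and does not return the count itself. As a sketch of the exact rule from the question, here is the same logic in plain JavaScript over an in-memory array (the function name `countStaleDuplicates` and the sample data are hypothetical, and this is not a MongoDB call): a document counts when its eventDate is more than one year before `now` and another document with the same pidNumber has a strictly later eventDate. In a pipeline, the rough equivalent would be a `$group` on pidNumber that keeps `$max: "$eventDate"` and `$push`es all dates, followed by a `$filter` over the pushed dates; that translation is untested here.

```javascript
// Count documents that are older than one year AND have a more recent
// document with the same pidNumber (plain in-memory sketch, not a MongoDB call).
function countStaleDuplicates(docs, now) {
  // Cutoff: the same UTC calendar date one year before `now`
  const cutoff = new Date(now);
  cutoff.setUTCFullYear(cutoff.getUTCFullYear() - 1);

  // Latest eventDate per pidNumber (what $group + $max computes)
  const latest = new Map();
  for (const { pidNumber, eventDate } of docs) {
    const prev = latest.get(pidNumber);
    if (prev === undefined || eventDate > prev) latest.set(pidNumber, eventDate);
  }

  // A document counts if it is older than the cutoff and is not the latest for its pidNumber
  let count = 0;
  for (const { pidNumber, eventDate } of docs) {
    if (eventDate < cutoff && eventDate < latest.get(pidNumber)) count++;
  }
  return count;
}

// The two scenarios from the question (dates and pidNumber values are examples)
const now = new Date("2021-06-01T00:00:00Z");
const docs = [
  // pid 1234: a 3-year-old doc plus a 2-year-old doc -> only the 3-year-old one counts
  { pidNumber: 1234, eventDate: new Date("2018-06-01T00:00:00Z") },
  { pidNumber: 1234, eventDate: new Date("2019-06-01T00:00:00Z") },
  // pid 5678: a single 3-year-old doc -> kept, not counted
  { pidNumber: 5678, eventDate: new Date("2018-06-01T00:00:00Z") },
];
console.log(countStaleDuplicates(docs, now)); // 1
```

The 2019 document is also older than a year, but it is the latest for pid 1234, so it is not counted; documents that tie for the latest date are never counted either, since neither has a strictly more recent sibling.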

Related

Need to convert this SQL query to MongoDB

I am new to MongoDB. I need to convert this SQL code to MongoDB:
select TOP 5 r.regionName, COUNT(c.RegionID)
from region as r,
company as c
where c.RegionID = r._id
group by r.regionName
order by COUNT(c.RegionID) DESC;
Option 1. You can use the aggregation framework with $lookup, $group, $project, $sort and $limit stages, but this seems like the wrong approach, since the real benefit of moving from a relational database to MongoDB comes from denormalization and avoiding join-like ($lookup) queries.
Option 2. Convert your multi-table relational schema to a document model and use a simple aggregation with $group, $project, $sort and $limit stages for the above task.
Since you have not provided any example MongoDB documents, it is hard to say what your queries will look like ...
Despite my comment, here is an attempt at a translation (not tested):
db.region.aggregate([
  {
    $lookup: // left outer join collections
    {
      from: "company",
      localField: "_id",
      foreignField: "RegionID",
      as: "c"
    }
  },
  { $match: { c: { $ne: [] } } }, // remove non-matching documents (i.e. INNER JOIN)
  { $group: { _id: "$regionName", count: { $sum: { $size: "$c" } } } }, // GROUP BY regionName, COUNT(c.RegionID)
  { $project: { regionName: "$_id", count: 1, _id: 0 } }, // rename _id to regionName
  { $sort: { count: -1 } }, // ORDER BY COUNT(c.RegionID) DESC
  { $limit: 5 } // TOP 5
])
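To check what the translation above should return, here is the SQL query's logic (inner join, GROUP BY regionName, ORDER BY count DESC, TOP 5) in plain JavaScript over hypothetical in-memory `regions` and `companies` arrays. The field names come from the SQL; the function name `topRegions` and the data are made up for illustration. It can serve as a reference when comparing results from the pipeline.

```javascript
// Plain-JS reference for:
//   SELECT TOP 5 r.regionName, COUNT(c.RegionID)
//   FROM region r, company c WHERE c.RegionID = r._id
//   GROUP BY r.regionName ORDER BY COUNT(c.RegionID) DESC
function topRegions(regions, companies, limit = 5) {
  // Index region names by _id for the join
  const nameById = new Map(regions.map((r) => [r._id, r.regionName]));

  // Inner join + GROUP BY regionName: count matching companies per name
  const counts = new Map();
  for (const c of companies) {
    const name = nameById.get(c.RegionID);
    if (name === undefined) continue; // no matching region: dropped by the inner join
    counts.set(name, (counts.get(name) || 0) + 1);
  }

  // ORDER BY count DESC, then keep the first `limit` rows (TOP n)
  return [...counts]
    .map(([regionName, count]) => ({ regionName, count }))
    .sort((a, b) => b.count - a.count)
    .slice(0, limit);
}

const regions = [
  { _id: 1, regionName: "North" },
  { _id: 2, regionName: "South" },
  { _id: 3, regionName: "East" }, // no companies: absent from the result
];
const companies = [{ RegionID: 1 }, { RegionID: 1 }, { RegionID: 2 }, { RegionID: 99 }];

// returns [{ regionName: "North", count: 2 }, { regionName: "South", count: 1 }]
console.log(topRegions(regions, companies));
```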

MongoDB query to filter documents based on the length of a field value

I am writing AWS Lambda code in Python. My database is Amazon DocumentDB, and I use pymongo.
This code snippet works fine:
query = {"media_id": {"$exists": True}}
collection.find(query)
But it returns a lot of records, so I want to fetch the records where the length of media_id field is less than 3.
For that I tried this query:
query = {"media_id": {"$exists": True}, "$expr": {"$lt": [{"$strLenCP": "$media_id"}, 3]}}
but I get a "Feature not supported" error, because $expr is not supported in DocumentDB.
I am looking for the query which works in DocumentDB.
The solution might seem a bit tedious, but all the operations it uses should be supported according to the official documentation.
Use an aggregation to add an auxiliary field that stores the length of media_id, then match on your criteria.
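db.collection.aggregate([
  // Keep only documents whose media_id is a string;
  // $strLenCP raises an error on missing or non-string values
  { $match: { media_id: { $type: "string" } } },
  // Add an auxiliary field holding the length of media_id
  { $addFields: { length: { $strLenCP: "$media_id" } } },
  // Keep documents whose media_id is shorter than 3 characters
  { $match: { length: { $lt: 3 } } },
  // Drop the auxiliary field from the output
  { $project: { length: 0 } }
])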
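$strLenCP counts Unicode code points, not bytes. The plain-JavaScript sketch below mirrors the pipeline's filtering on hypothetical in-memory documents (the function name `shortMediaIds` and the sample data are assumptions), which can help when checking which records should come back. Spreading a string (`[...s]`) iterates code points, so it matches $strLenCP more closely than `s.length`, which counts UTF-16 code units.

```javascript
// In-memory mirror of the pipeline: keep string media_id values,
// measure them in code points, and keep those shorter than the limit.
function shortMediaIds(docs, maxExclusive = 3) {
  return docs
    .filter((d) => typeof d.media_id === "string") // only strings have a length to measure
    .filter((d) => [...d.media_id].length < maxExclusive); // code-point length, like $strLenCP
}

const docs = [
  { _id: 1, media_id: "ab" },
  { _id: 2, media_id: "abcd" },
  { _id: 3 }, // no media_id: skipped
  { _id: 4, media_id: "😀x" }, // 2 code points, although "😀x".length === 3
];
console.log(shortMediaIds(docs).map((d) => d._id)); // [ 1, 4 ]
```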

how to select a single item and get its relations in faunadb?

I have two collections which have the data in the following format:
{
"ref": Ref(Collection("Leads"), "267824207030650373"),
"ts": 1591675917565000,
"data": {
"notes": "voicemail ",
"source": "key-name",
"name": "Glenn"
}
}
{
"ref": Ref(Collection("Sources"), "266777079541924357"),
"ts": 1590677298970000,
"data": {
"key": "key-name",
"value": "Google Ads"
}
}
I want to query the Leads collection and retrieve the corresponding Sources document in a single query.
I came up with the following query to try to use an index, but I couldn't get it to run:
Let(
  {
    data: Get(Ref(Collection('Leads'), '267824207030650373'))
  },
  {
    data: Select(['data'], Var('data')),
    source: q.Lambda('data',
      Match(Index('LeadSourceByKey'), Get(Select(['source'], Var('data'))))
    )
  }
)
Is there an easy way to retrieve the Sources document ?
What you are looking for is the following query, which I have broken down into multiple steps:
Let(
  {
    // Get the Lead document
    lead: Get(Ref(Collection("Leads"), "269038063157510661")),
    // Get the source key out of the lead document
    sourceKey: Select(["data", "source"], Var("lead")),
    // Use the index to get the values via Match
    sourceValues: Paginate(Match(Index("LeadSourceValuesByKey"), Var("sourceKey")))
  },
  {
    lead: Var("lead"),
    sourceValues: Var("sourceValues")
  }
)
The result is:
{
lead: {
ref: Ref(Collection("Leads"), "269038063157510661"),
ts: 1592833540970000,
data: {
notes: "voicemail ",
source: "key-name",
name: "Glenn"
}
},
sourceValues: {
data: [["key-name", "Google Ads"]]
}
}
sourceValues is an array because your index specifies two values to return (the key and the value), and an index returns each entry's values as an array. Since Match could return multiple entries if the relationship were not one-to-one, the result is an array of arrays.
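On the client, that nesting means the value you want sits two levels down. A minimal plain-JavaScript sketch, assuming the result object has exactly the shape shown above (the helper name `sourceValue` is hypothetical), that returns the value of the first matching entry:

```javascript
// Pull the "value" column out of the index result shown above.
// sourceValues.data is an array of [key, value] tuples, one per index match.
function sourceValue(result) {
  const rows = (result.sourceValues && result.sourceValues.data) || [];
  if (rows.length === 0) return undefined; // no Sources document matched the key
  const [, value] = rows[0]; // first match; there may be more if not one-to-one
  return value;
}

const result = {
  lead: { data: { notes: "voicemail ", source: "key-name", name: "Glenn" } },
  sourceValues: { data: [["key-name", "Google Ads"]] },
};
console.log(sourceValue(result)); // Google Ads
```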
This is only one approach; you could also make the index return a reference and use Map/Get to fetch the actual document, as explained on the forum.
However, I assume you asked the same question here. Although I applaud asking questions on Stack Overflow rather than Slack or even our own forum, please do not post the same question everywhere without linking to the others. It makes many people spend a lot of time on a question that is already answered elsewhere.
You could change the Leads document to store a Ref to the Sources document in source:
{
"ref": Ref(Collection("Leads"), "267824207030650373"),
"ts": 1591675917565000,
"data": {
"notes": "voicemail ",
"source": Ref(Collection("Sources"), "266777079541924357"),
"name": "Glenn"
}
}
{
"ref": Ref(Collection("Sources"), "266777079541924357"),
"ts": 1590677298970000,
"data": {
"key": "key-name",
"value": "Google Ads"
}
}
And then query this way:
Let(
  {
    lead: Select(['data'], Get(Ref(Collection('Leads'), '267824207030650373'))),
    source: Select(['source'], Var('lead'))
  },
  {
    data: Var('lead'),
    source: Select(['data'], Get(Var('source')))
  }
)

Creating a CouchDB view to index whether an item exists in an array

I have the following sample documents in my CouchDB database. The production database has about 2M records.
[
  {
    "_id": "someid|goes|here",
    "collected": {
      "tags": ["abc", "def", "ghi"]
    }
  },
  {
    "_id": "someid1|goes|here",
    "collected": {
      "tags": ["abc", "klm", "pqr"]
    }
  },
  {
    "_id": "someid2|goes|here",
    "collected": {
      "tags": ["efg", "hij", "klm"]
    }
  }
]
Based on my previous question here (how to search for values when the selector is an array), I have added an index on the collected.tags field, but the search still takes a long time. Here is my search query:
{
"selector": {
"collected.tags": {
"$elemMatch": {
"$regex": "abc"
}
}
}
}
There are about 300k records matching the above condition, and the search takes a long time. So I want to create an indexed view to retrieve and look up documents faster instead of using find/search. I am new to CouchDB and am not sure how to set up the map function to create the indexed view.
I figured out the map function myself. Now all the documents are indexed and retrievals are faster:
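function (doc) {
  // Skip documents without a collected.tags array instead of throwing
  if (doc.collected && Array.isArray(doc.collected.tags) &&
      doc.collected.tags.indexOf('abc') > -1) {
    emit(doc._id, doc);
  }
}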
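The map function above indexes only the tag 'abc', so every other tag would need its own view. A common CouchDB alternative is to emit one row per tag, so a single view answers queries for any tag via `?key="abc"`; emitting `null` as the value and querying with `include_docs=true` avoids copying whole documents into the index. Note that both map functions match whole tags, whereas the `$regex` selector matched substrings. Since map functions are plain JavaScript, the sketch below runs a per-tag map over the sample documents with a stand-in `emit` (the collecting harness is hypothetical and only imitates what CouchDB does) to show which rows the view would hold.

```javascript
// Collect emitted rows instead of writing them to a real CouchDB index.
const rows = [];
function emit(key, value) {
  rows.push({ key, value });
}

// Per-tag map function: one row per tag, keyed by the tag.
const mapByTag = function (doc) {
  if (doc.collected && Array.isArray(doc.collected.tags)) {
    doc.collected.tags.forEach(function (tag) {
      emit(tag, null); // null value: fetch documents with include_docs=true
    });
  }
};

const docs = [
  { _id: "someid|goes|here", collected: { tags: ["abc", "def", "ghi"] } },
  { _id: "someid1|goes|here", collected: { tags: ["abc", "klm", "pqr"] } },
  { _id: "someid2|goes|here", collected: { tags: ["efg", "hij", "klm"] } },
];

// CouchDB records the source doc._id with each row; mimic that here.
for (const doc of docs) {
  const before = rows.length;
  mapByTag(doc);
  for (let i = before; i < rows.length; i++) rows[i].id = doc._id;
}

// Equivalent of querying the view with ?key="abc"
const abcIds = rows.filter((r) => r.key === "abc").map((r) => r.id);
console.log(abcIds); // [ 'someid|goes|here', 'someid1|goes|here' ]
```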

extract children + parent details using lodash

I have the following JSON structure and I want to extract all the information about a particular team with a given id including the division it belongs to.
{
"teams":[
{
"divisionName":"5th Grade - Green",
"divisionTeams":[
{
"id":3222,
"name":"Columbia Ravens 5th",
"coach":"John Miller"
},
{
"id":3220,
"name":"HC Elite OMalley 5th",
"coach":"Eddie OMalley"
}
]
},
{
"divisionName":"5th Grade - White",
"divisionTeams":[
{
"id":3225,
"name":"CBSA Hoyas 5th Grade",
"coach":"Terrance Taylor"
},
{
"id":3276,
"name":"HC Elite 4th Tookes",
"coach":"Anthony Tookes"
}
]
}
]
}
I tried using the following lodash code, but it came up as undefined.
var team = _.chain(data.teams)
  .flatten("divisionTeams")
  .find({ "id": 3222 })
  .value();
console.log(team);
Any help is much appreciated.
You can do everything you need with find() and some(). There's no need to flatten the arrays.
// Returns the division whose divisionTeams contains a team with id 3222
var division = _.find(data.teams, function(item) {
  return _.some(item.divisionTeams, { id: 3222 });
});
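Note that this returns the whole division object, including all of its teams. If you want only the matching team merged with its division name, a plain-JavaScript sketch without lodash could look like this; the function name `findTeamWithDivision` and the merged output shape are assumptions, not part of the question. It uses `Array.prototype.find` on the inner arrays and stops at the first match.

```javascript
// Return the team with the given id plus the name of the division it belongs to,
// or undefined if no division contains that id.
function findTeamWithDivision(data, id) {
  for (const division of data.teams) {
    const team = division.divisionTeams.find((t) => t.id === id);
    if (team) {
      return { divisionName: division.divisionName, ...team };
    }
  }
  return undefined;
}

const data = {
  teams: [
    {
      divisionName: "5th Grade - Green",
      divisionTeams: [
        { id: 3222, name: "Columbia Ravens 5th", coach: "John Miller" },
        { id: 3220, name: "HC Elite OMalley 5th", coach: "Eddie OMalley" },
      ],
    },
    {
      divisionName: "5th Grade - White",
      divisionTeams: [
        { id: 3225, name: "CBSA Hoyas 5th Grade", coach: "Terrance Taylor" },
        { id: 3276, name: "HC Elite 4th Tookes", coach: "Anthony Tookes" },
      ],
    },
  ],
};

// returns { divisionName: "5th Grade - Green", id: 3222, name: "Columbia Ravens 5th", coach: "John Miller" }
console.log(findTeamWithDivision(data, 3222));
```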