PyMongo Aggregation "AttributeError: 'dict' object has no attribute '_txn_read_preference'" - pymongo

I'm sure there is an error in my code since I'm a newbie to PyMongo, but I'll give it a go. The collection in MongoDB has 167k+ documents, which look like this:
{'overall': 5.0,
'reviewText': {'ago': 1,
'buy': 2,
'daughter': 1,
'holiday': 1,
'love': 2,
'niece': 1,
'one': 2,
'still': 1,
'today': 1,
'use': 1,
'year': 1},
'reviewerName': 'dcrm'}
I would like to get a tally of terms used within that reviewText field for all 5.0 ratings. I have run the following code and I get the error that follows. Any insight?
#1 Find the top 20 most common words found in 1-star reviews.
aggr = [{"$unwind": "$reviewText"},
{"$group": { "_id": "$reviewText", "word_freq": {"$sum":1}}},
{"$sort": {"word_freq": -1}},
{"$limit": 20},
{"$project": {"overall":"$overall", "word_freq":1}}]
disk_use = { 'allowDiskUse': True }
findings = list(collection.aggregate(aggr, disk_use))
for item in findings:
    p(item)
As you can see, I came across the 'allowDiskUse' option because I seemed to exceed the 100MB limit. But the error that I get is:
AttributeError: 'dict' object has no attribute '_txn_read_preference'

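You are quite close: allowDiskUse is a keyword argument, not a dictionary. The second positional parameter of Collection.aggregate() is session, so the dict you pass is treated as a ClientSession, which is why PyMongo complains about _txn_read_preference. The statement should look like this:
findings = list(collection.aggregate(aggr, allowDiskUse=True))
or
findings = list(collection.aggregate(aggr, **disk_use))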
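To see why a positional dict produces that odd AttributeError, here is a minimal sketch using a hypothetical stand-in function, `aggregate`, that only mimics the first two parameters (pipeline, session) of PyMongo's Collection.aggregate(). It does not talk to MongoDB; it just records where each argument lands.

```python
def aggregate(pipeline, session=None, **kwargs):
    """Hypothetical stand-in: same positional order as Collection.aggregate(),
    returns where each argument ended up instead of running a query."""
    return {"session": session, "options": kwargs}


pipeline = [{"$limit": 20}]
disk_use = {"allowDiskUse": True}

wrong = aggregate(pipeline, disk_use)           # dict lands in `session`
right = aggregate(pipeline, allowDiskUse=True)  # keyword argument
also_right = aggregate(pipeline, **disk_use)    # dict unpacked into keywords

print(wrong)
print(right)
```

In the real driver, whatever lands in `session` is used as a ClientSession, so a plain dict fails as soon as a session method such as `_txn_read_preference` is looked up.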

ManishSingh's answer is the best, but in case it isn't clear what he means and why you get this error, here is what is wrong with your code:
The problem is that you are passing "allowDiskUse" as a key in a dictionary, like this:
findings = list(collection.aggregate(aggr, {"allowDiskUse": True})) # wrong
but the correct form is a keyword argument:
findings = list(collection.aggregate(aggr, allowDiskUse=True)) # correct
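Separately from the allowDiskUse fix, the pipeline in the question will not produce a word tally: reviewText is an embedded document, not an array, so $unwind treats it as a single element and $group then groups by the whole document. The trailing $project also refers to overall, which no longer exists after $group. Below is a sketch of a pipeline that does tally the words for 5.0 reviews, assuming MongoDB 3.4.4+ (for $objectToArray), plus a hypothetical plain-Python helper, `tally_in_python`, that computes the same thing so you can check results on a small sample.

```python
from collections import Counter

# Sketch: top 20 words across all reviewText documents of 5.0 reviews.
pipeline = [
    {"$match": {"overall": 5.0}},
    # {"buy": 2, ...} -> [{"k": "buy", "v": 2}, ...]
    {"$project": {"words": {"$objectToArray": "$reviewText"}}},
    {"$unwind": "$words"},
    # add up the per-review counts for each word
    {"$group": {"_id": "$words.k", "word_freq": {"$sum": "$words.v"}}},
    {"$sort": {"word_freq": -1}},
    {"$limit": 20},
]
# findings = list(collection.aggregate(pipeline, allowDiskUse=True))


def tally_in_python(docs, rating=5.0, top=20):
    """Plain-Python equivalent of the pipeline above, for small samples."""
    counts = Counter()
    for doc in docs:
        if doc.get("overall") == rating:
            # Counter.update with a mapping adds the counts
            counts.update(doc.get("reviewText", {}))
    return counts.most_common(top)


sample = [
    {"overall": 5.0, "reviewText": {"buy": 2, "love": 2, "ago": 1}},
    {"overall": 5.0, "reviewText": {"buy": 1}},
    {"overall": 1.0, "reviewText": {"bad": 5}},
]
print(tally_in_python(sample))
```

If you would rather count how many reviews mention each word, use {"$sum": 1} instead of {"$sum": "$words.v"}.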

Related

MongoDB - how to get last item within a two level hierarchy

My goal is to get the last stock quote from a database for each of the ticker symbols in the list.
I found the $last operator, but I'm not sure how to apply it to one level of a hierarchy, i.e. per ticker symbol. (I just realized I only need one level in the hierarchy now, but it would be nice to know how to do it for two as well.) Per this doc page, the _id is supposed to be the group-by expression, and I have set that to the ticker symbol.
(I'm using Python3 and PyMongo.)
This is the query I have now, but it is only returning one row. I want it to return 4 rows, one for each ticker.
tickerList = ['AAPL', 'MSFT', 'AMZN', 'NFLX']
docs = dbCollectionQuotes.aggregate([
    {'$match': {
        '$and': [
            {'timestampYear': 2020},
            {'timestampMonth': 10},
            {'timestampDay': 6},
            {'timestampHour': 15},
            {'ticker': {'$in': tickerList}}
        ]
    }},
    {'$sort': {
        'ticker': 1,
        'timestampIsoDateTime': 1
    }},
    {'$group': {
        '_id': '$ticker',
        'lastId': {'$last': '$_id'},
        'minuteClose': {'$last': '$minuteClose'},
        'todaysChangePerc': {'$last': '$todaysChangePerc'},
        'timestampIsoDateTime': {'$last': '$timestampIsoDateTime'}
    }}
])
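A $group keyed on '$ticker' emits one document per distinct ticker that survives $match, and after the $sort, $last picks each ticker's latest document. The sketch below emulates that $sort + $group/$last combination in plain Python to show the expected shape of the output; `last_per_ticker` is a hypothetical helper and the sample values are made up. It also shows that the $and wrapper is optional, since top-level conditions in $match are ANDed implicitly. If only one row comes back, one thing worth checking (an assumption, not a diagnosis) is whether the other tickers actually have documents for that exact hour.

```python
from itertools import groupby
from operator import itemgetter

ticker_list = ["AAPL", "MSFT", "AMZN", "NFLX"]

# Equivalent filter without $and: top-level fields are ANDed implicitly.
match_stage = {"$match": {
    "timestampYear": 2020,
    "timestampMonth": 10,
    "timestampDay": 6,
    "timestampHour": 15,
    "ticker": {"$in": ticker_list},
}}


def last_per_ticker(docs):
    """Emulate $sort by (ticker, timestampIsoDateTime), then $group with $last."""
    ordered = sorted(docs, key=itemgetter("ticker", "timestampIsoDateTime"))
    out = []
    for ticker, group in groupby(ordered, key=itemgetter("ticker")):
        last = list(group)[-1]  # $last takes the final document of each group
        out.append({
            "_id": ticker,
            "minuteClose": last["minuteClose"],
            "timestampIsoDateTime": last["timestampIsoDateTime"],
        })
    return out


# Made-up sample documents, already filtered to one hour.
sample = [
    {"ticker": "AAPL", "timestampIsoDateTime": "2020-10-06T15:59:00", "minuteClose": 2.0},
    {"ticker": "MSFT", "timestampIsoDateTime": "2020-10-06T15:30:00", "minuteClose": 3.0},
    {"ticker": "AAPL", "timestampIsoDateTime": "2020-10-06T15:01:00", "minuteClose": 1.0},
]
print(last_per_ticker(sample))
```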

Pymongo: Best way to remove $oid in Response

I have started using PyMongo recently and now I want to find the best way to remove $oid from the response.
When I use find_one:
result = db.nodes.find_one({"name": "Archer"})
and convert the result with dumps (from bson.json_util):
json.loads(dumps(result))
The result would be:
{
"_id": {
"$oid": "5e7511c45cb29ef48b8cfcff"
},
"about": "A jazz pianist falls for an aspiring actress in Los Angeles."
}
My expected:
{
"_id": "5e7511c45cb29ef48b8cfcff",
"about": "A jazz pianist falls for an aspiring actress in Los Angeles."
}
As you can see, I could use:
resp = json.loads(dumps(result))
resp['_id'] = resp['_id']['$oid']
But I don't think this is the best way. I hope you have a better solution.
You can take advantage of aggregation:
result = db.nodes.aggregate([
    {'$match': {"name": "Archer"}},
    {'$addFields': {"Id": {'$toString': '$_id'}}},
    {'$project': {'_id': 0}}
])
data = json.dumps(list(result))
Here, with $addFields I add a new field Id holding the ObjectId converted to a string. Then I make a projection that eliminates the _id field from the result. Finally, since aggregate() returns a cursor, I turn it into a list before dumping it.
Note that $toString requires MongoDB 4.0 or later.
First of all, there's no $oid in the stored data. What you are seeing is that the Python driver represents the _id field as an ObjectId instance, and the dumps() method then renders that ObjectId in Extended JSON form. The $oid bit just tells you that the field is an ObjectId, should you need to use it for some purpose later.
The next part of the answer depends on what exactly you are trying to achieve. Almost certainly you can achieve it using the result object without converting it to JSON.
If you just want to get rid of it altogether, you can exclude _id in the projection:
result = db.nodes.find_one({"name": "Archer"}, {'_id': 0})
print(result)
which gives (other fields omitted):
{'name': 'Archer'}
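Another option is to post-process the JSON string with a regular expression that replaces each {"$oid": "..."} wrapper with the bare quoted id:
import re
from bson.json_util import dumps

# Matches {"$oid": "<hex>"} and captures the quoted hex string
OID_PATTERN = re.compile(r'\{\s*"\$oid":\s*("[0-9a-f]+")\s*\}')

def remove_oid(string):
    # Replace every wrapper with the captured "<hex>" string
    return OID_PATTERN.sub(r'\1', string)

string = dumps(mongo_query_result)
string = remove_oid(string)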
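Instead of matching the string with a regex, you can let json.loads rewrite the wrappers while it decodes: its object_hook is called for every decoded JSON object, innermost first, so a hook that returns the bare string for a one-key {"$oid": ...} dict flattens every _id in one pass. A minimal sketch, assuming the input is the Extended JSON that bson.json_util.dumps produces; `unwrap_oid` is a hypothetical name.

```python
import json


def unwrap_oid(obj):
    """object_hook: replace {"$oid": "..."} wrappers with the bare hex string."""
    if len(obj) == 1 and "$oid" in obj:
        return obj["$oid"]
    return obj


# Extended JSON as shown in the question
extended_json = (
    '{"_id": {"$oid": "5e7511c45cb29ef48b8cfcff"}, '
    '"about": "A jazz pianist falls for an aspiring actress in Los Angeles."}'
)
resp = json.loads(extended_json, object_hook=unwrap_oid)
print(resp)
```

Because the hook only touches dicts whose sole key is $oid, ordinary fields pass through unchanged, and nested ObjectIds are handled too.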
I am using a custom handler. I managed to remove $oid and replace it with just the id string:
import datetime
import json

import bson

# Custom handler: json.dumps calls it for objects it can't serialize natively
def my_handler(x):
    if isinstance(x, datetime.datetime):
        return x.isoformat()
    elif isinstance(x, bson.objectid.ObjectId):
        return str(x)
    else:
        raise TypeError(x)

# Parsing
def parse_json(data):
    return json.loads(json.dumps(data, default=my_handler))

result = db.nodes.aggregate([{'$match': {"name": "Archer"}}])
# aggregate() returns a cursor, so materialize it before serializing;
# _id is kept, and my_handler turns its ObjectId into a plain string
data = parse_json(list(result))
In the second argument of find_one, you can define which fields to exclude, in the following way:
site_information = mongo.db.sites.find_one({'username': username}, {'_id': False})
This statement will exclude the '_id' field from being selected from the returned documents.

AWS IoT rules SQL query Select not returning expected values from shadow

I am trying to create a rule that publishes the selected data from the things shadow.
My SQL query is
SELECT state FROM '$aws/things/+/shadow/update/accepted'
I would expect this to return both desired and reported, but it only returns one object, and it is not nested:
{
temp: 200,
io: false
}
instead of
{
desired: {
temp: 200,
io: true
},
reported: {
temp: 200,
io: false
}
}
So then I tried:
SELECT state.desired, state.reported FROM '$aws/things/+/shadow/update/accepted'
and I only receive one object: whichever one I put last in the SELECT list, after the comma.
Anyone have any idea? I am trying to strip out all the metadata and timestamps.
Found the answer, for whoever comes across this in the future: in the rule-creation screen, above where you enter your SQL query, you need to change the SQL version to beta.
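The fix is a rule setting rather than a change to the SQL. If you create rules programmatically, the SQL version sits next to the query in the rule payload. Below is a sketch of the topicRulePayload dict that boto3's iot client accepts in create_topic_rule(ruleName=..., topicRulePayload=...); the call itself is left commented out. The empty actions list is only a placeholder (a real rule needs at least one action), and 'beta' is simply the value the answer names; the AWS IoT documentation also lists dated SQL versions such as 2016-03-23.

```python
import json

# Sketch of a topic rule payload; the SQL version is set alongside the query.
topic_rule_payload = {
    "sql": "SELECT state FROM '$aws/things/+/shadow/update/accepted'",
    "awsIotSqlVersion": "beta",  # the setting the answer refers to
    "ruleDisabled": False,
    "actions": [],  # placeholder: add the action(s) your rule should trigger
}

# import boto3
# boto3.client("iot").create_topic_rule(ruleName="shadowState",
#                                       topicRulePayload=topic_rule_payload)
print(json.dumps(topic_rule_payload, indent=2))
```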

Filtering dstore collection against an array field

I'm trying to filter a dstore collection by a field that has an array of values. My json data looks like the following (simplified):
[{
user_id: 1,
user_name: "John Doe",
teams: [{team_id: 100, team_name: 'Red Sox'}, {team_id: 101, team_name: 'Buccaneers'}]
},
{
user_id: 2,
user_name: "Fred Smith",
teams: [{team_id: 100, team_name: 'Buccaneers'}, {team_id: 102, team_name: 'Rays'}]
}]
I can do a simple filter against the username field and it works perfectly.
this.dstoreFilter = new this.dstore.Filter();
var results = this.dgrid.set('collection', this.dstore.filter(
this.dstoreFilter.match('user_name',new RegExp(searchTerm, 'i'))
));
How, though, do I construct a filter to show me only those players who play for the Red Sox, for example. I've tried using the filter.contains() method, but I can't find any adequate documentation on how it works. Looking at the dstore code, I see that the filter.contains() method has the following signature: (value, required, object, key), but that's not helping me much.
Any guidance would be much appreciated. Thanks in advance!
You can find documentation on Filtering here.
In your case, .contains() will not work because it is intended to work on values of array type, and what you want to filter here is an array of objects. Here is a quote from the doc link:
contains: Filters for objects where the specified property's value is an array and the array contains any value that equals the provided value or satisfies the provided expression.
In my opinion, the best way here is to pass your own filter function to filter() where you want to filter by team name. Here is some sample code:
this.grid.set('collection', this.dstore.filter(lang.hitch(this, function (item) {
    // Keep the user if any of their teams is the Red Sox
    return item.teams.some(function (team) {
        return team.team_name === 'Red Sox';
    });
})));
this.grid.refresh();
The function is called for each user in the store: if it returns false, the user is hidden, and if it returns true, the user is displayed. This is by far the easiest way I know of to apply complex filtering on dstore.
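The predicate passed to filter() reduces to one question: does any entry in item.teams have the requested team_name? As a language-neutral illustration (not dstore code), here is that check in Python over the sample users from the question; `plays_for` is a hypothetical helper name.

```python
# Sample data from the question
users = [
    {"user_id": 1, "user_name": "John Doe",
     "teams": [{"team_id": 100, "team_name": "Red Sox"},
               {"team_id": 101, "team_name": "Buccaneers"}]},
    {"user_id": 2, "user_name": "Fred Smith",
     "teams": [{"team_id": 100, "team_name": "Buccaneers"},
               {"team_id": 102, "team_name": "Rays"}]},
]


def plays_for(user, team_name):
    """True if any of the user's teams has the given name (like Array.some)."""
    return any(team["team_name"] == team_name for team in user["teams"])


red_sox_players = [u for u in users if plays_for(u, "Red Sox")]
print([u["user_name"] for u in red_sox_players])
```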

Is it possible to turn an array returned by the Mongo GeoNear command (using Ruby/Rails) into a Plucky object?

As a total newbie, I have been trying to get the geoNear command working in my Rails application, and it appears to be working fine. The major annoyance for me is that it returns an array with strings rather than keys that I can call on to pull out data.
Having dug around, I understand that MongoMapper uses Plucky to turn the query result into a friendly object that can be handled easily, but I haven't been able to find out how to transform the result of my geoNear query into a Plucky object.
My questions are:
(a) Is it possible to turn this into a Plucky object, and how do I do that?
(b) If it is not possible how can I most simply and systematically extract each record and each field?
Here is the query in my controller:
@mult = 3963 * (3.14159265 / 180) # Scale to miles on earth
@results = @db.command({'geoNear' => "places", 'near' => @search.coordinates, 'distanceMultiplier' => @mult, 'spherical' => true})
Here is the object I'm getting back (with document content removed for simplicity):
{"ns"=>"myapp-development.places", "near"=>"1001110101110101100100110001100010100010000010111010", "results"=>[{"dis"=>0.04356444023196527, "obj"=>{"_id"=>BSON::ObjectId('4ee6a7d210a81f05fe000001'),...}}], "stats"=>{"time"=>0, "btreelocs"=>0, "nscanned"=>1, "objectsLoaded"=>1, "avgDistance"=>0.04356444023196527, "maxDistance"=>0.0006301239824196907}, "ok"=>1.0}
Help is much appreciated!!
OK, so let's say you store the result in a variable called places_near:
places_near = t.command( {'geoNear' => "places", 'near'=> [50,50] , 'distanceMultiplier' => 1, 'spherical' => true})
This command returns a hash with a key (results) that maps to a list of results for the query. The returned document looks like this:
{
"ns": "test.places",
"near": "1100110000001111110000001111110000001111110000001111",
"results": [
{
"dis": 69.29646421910687,
"obj": {
"_id": ObjectId("4b8bd6b93b83c574d8760280"),
"y": [
1,
1
],
"category": "Coffee"
}
},
{
"dis": 69.29646421910687,
"obj": {
"_id": ObjectId("4b8bd6b03b83c574d876027f"),
"y": [
1,
1
]
}
}
],
"stats": {
"time": 0,
"btreelocs": 1,
"nscanned": 2,
"objectsLoaded": 2,
"avgDistance": 69.29646421910687
},
"ok": 1
}
To iterate over the results, just iterate as you would over any array in Ruby:
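places_near['results'].each do |result|
  distance = result['dis']  # distance from the query point
  place = result['obj']     # the matched document itself
  puts "#{place['_id']}: #{distance}"
end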
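Note that the geoNear database command used here was deprecated in MongoDB 4.0 and removed in 4.2; the $geoNear aggregation stage replaces it. Below is a sketch of an equivalent stage, written as a Python dict (the stage document itself is driver-neutral), plus a hypothetical helper, `flatten_geo_near`, that turns the old command reply into (distance, document) pairs. It assumes the places collection has a 2dsphere index on its location field.

```python
# Sketch of a $geoNear stage mirroring the command in the answer.
pipeline = [
    {"$geoNear": {
        "near": [50, 50],         # legacy coordinate pair, as in the answer
        "distanceField": "dis",   # each output document gets its distance here
        "distanceMultiplier": 1,
        "spherical": True,
    }}
]
# docs = list(db.places.aggregate(pipeline))  # each doc then carries a "dis" field


def flatten_geo_near(command_result):
    """Turn the old geoNear command reply into (distance, document) pairs."""
    return [(r["dis"], r["obj"]) for r in command_result.get("results", [])]


reply = {"results": [{"dis": 69.29646421910687, "obj": {"category": "Coffee"}}], "ok": 1}
for distance, place in flatten_geo_near(reply):
    print(place.get("category"), distance)
```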