PyMongo: how to get all objects that match any of possible filters? - pymongo

I have a list of usernames in an array, and a MongoDB collection of posts where each post has an "author" field. I want to get all documents from the collection whose author is one of the usernames in the array.
So if:
Collection:
{
"author": "tim"
},
{
"author": "bob"
},
{
"author": "jon"
}
following = ["tim", "jon"]
then I only want to get the posts by tim and jon.

I got the answer; it should be:
following = ["tim", "jon"]
mongo.db.posts.find({"author": {"$in": following}})
This will only get posts where the author is tim or jon.
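As a rough illustration of what the "$in" filter selects, here is a plain-Python sketch that applies the same membership test to the example documents in memory. It does not talk to MongoDB; the helper name match_in is hypothetical, and it only covers scalar field values (MongoDB's "$in" also matches array fields, which this sketch ignores).

```python
# In-memory stand-in for the example collection from the question.
posts = [{"author": "tim"}, {"author": "bob"}, {"author": "jon"}]
following = ["tim", "jon"]

# The same filter document that is passed to find() in the answer.
query = {"author": {"$in": following}}


def match_in(documents, field, allowed):
    """Hypothetical helper: keep documents whose field equals any allowed value."""
    return [doc for doc in documents if doc.get(field) in allowed]


matched = match_in(posts, "author", query["author"]["$in"])
print(matched)
```

With the example data, only the posts by tim and jon are kept, which mirrors what the find() call is expected to return.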

Related

Query for entire JSON document in nested JSON schema

Background:
I want to retrieve the entire JSON document that satisfies the condition "state" = "new" and length(Features.id) > 4.
{
    "id": "123",
    "feedback": {
        "Features": [
            {
                "state": "new",
                "id": "12345"
            }
        ]
    }
}
This is what I have tried so far.
Since this is a nested document, a Stack Overflow member helped me access the nested contents within the query, but is there a way to obtain the full document?
I have used:
SELECT VALUE t.id FROM t IN f.feedback.Features where t.state = 'new' and length(t.id)>4
This gives me the ids.
What I want is the full document that satisfies this condition:
{
    "id": "123",
    "feedback": {
        "Features": [
            {
                "state": "new",
                "id": "12345"
            }
        ]
    }
}
Any help is appreciated
Try this
SELECT *
FROM f
WHERE
f.feedback.Features[0].state = 'new'
AND length(f.feedback.Features[0].id)>4
Here is the SELECT spec for CosmosDB for more details
https://learn.microsoft.com/en-us/azure/cosmos-db/sql-query-select
Also, check out "working with JSON" in CosmosDB notes
https://learn.microsoft.com/en-us/azure/cosmos-db/sql-query-working-with-json
If the Features array has more than 1 value, you can use EXISTS clause to search within them. See specs of EXISTS here with examples:
https://learn.microsoft.com/en-us/azure/cosmos-db/sql-query-subquery#exists-expression
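A minimal sketch of the EXISTS variant mentioned above, under some assumptions: the query text is my reading of the linked subquery documentation and has not been run against a real container, the helper has_matching_feature is hypothetical, and the second sample document ("124") is invented for illustration. The Python function applies the same predicate in memory so the intent is easy to check.

```python
# Assumed query text for the EXISTS form; it returns whole documents (f),
# not just the matching ids.
EXISTS_QUERY = """
SELECT VALUE f
FROM f
WHERE EXISTS (
    SELECT VALUE t
    FROM t IN f.feedback.Features
    WHERE t.state = 'new' AND LENGTH(t.id) > 4
)
"""


def has_matching_feature(document):
    """Hypothetical in-memory equivalent of the EXISTS predicate."""
    features = document.get("feedback", {}).get("Features", [])
    return any(
        t.get("state") == "new" and len(t.get("id", "")) > 4
        for t in features
    )


documents = [
    # Document from the question.
    {"id": "123", "feedback": {"Features": [{"state": "new", "id": "12345"}]}},
    # Invented document: no single feature meets both conditions.
    {"id": "124", "feedback": {"Features": [
        {"state": "old", "id": "12345"},
        {"state": "new", "id": "12"},
    ]}},
]

full_matches = [doc for doc in documents if has_matching_feature(doc)]
print([doc["id"] for doc in full_matches])
```

Unlike the Features[0] version, this checks every element of the array, and both conditions must hold on the same element.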

Mapping Apple podcast episode id to rss feed element

I'm trying to map Apple podcast's episode id to that specific podcast entry in RSS feed. Say I have the episode with the following link https://podcasts.apple.com/us/podcast/the-numberphile-podcast/id1441474794?i=1000475383420 so the podcast_id=1441474794 and episode_id=1000475383420. Now I'm able to get the RSS feed with podcast id through this code:
from urllib.request import urlopen
import json
import xmltodict
podcast_id = "1441474794"
ITUNES_URL = 'https://itunes.apple.com/lookup?id='
with urlopen(ITUNES_URL + podcast_id) as response:
    res = json.load(response)
    feedUrl = res['results'][0]['feedUrl']
    print(feedUrl)
with urlopen(feedUrl) as response:
    res = xmltodict.parse(response)
with open('res.json', "w") as f:
    f.write(json.dumps(res))
This gives me a JSON with some general info about the podcast and an array with all the episodes. For a specific episode the result looks like this:
"item": [
{
"title": "The Parker Quiz - with Matt Parker",
"dc:creator": "Brady Haran",
"pubDate": "Thu, 21 May 2020 16:59:08 +0000",
"link": "https://www.numberphile.com/podcast/matt-parker-quiz",
"guid": {
"#isPermaLink": "false",
"#text": "5b2cf993266c07b1ca7a812f:5bd2f1a04785d353e1b39d76:5ec683354f70a700f9f04555"
},
"description": "some description here...",
"itunes:author": "Numberphile Podcast",
"itunes:subtitle": "Matt Parker takes a quiz prepared by Brady. The YouTube version of this quiz contains a few visuals at https://youtu.be/hMwQwppzrys",
"itunes:explicit": "no",
"itunes:duration": "00:55:34",
"itunes:image": {
"#href": "https://images.squarespace-cdn.com/content/5b2cf993266c07b1ca7a812f/1541821254439-PW3116VHYDC1Y3V7GI0A/podcast_square2_2000x2000.jpg?format=1500w&content-type=image%2Fjpeg"
},
"itunes:title": "The Parker Quiz - with Matt Parker",
"enclosure": {
"#url": "https://traffic.libsyn.com/secure/numberphile/numberphile_parker_quiz.mp3",
"#type": "audio/mpeg"
},
"media:content": {
"#url": "https://traffic.libsyn.com/secure/numberphile/numberphile_parker_quiz.mp3",
"#type": "audio/mpeg",
"#isDefault": "true",
"#medium": "audio",
"media:title": {
"#type": "plain",
"#text": "The Parker Quiz - with Matt Parker"
}
}
},
...]
The episode_id=1000475383420 doesn't appear anywhere in the RSS feed response so there is no way to find which episode corresponds to this id. Is there a clean way to find the episode by id? For example an Apple api call with episode id which will give me info about the episode and then I can match the info with RSS feed entry.
The element/tag that is supposed to uniquely identify an episode in a podcast RSS feed is:
<guid>
Here is some related info from the Apple Podcasts Connect Guide to RSS that might be helpful.
If you can get a hold of the <guid> then you can access the episode from the feed.
A less reliable option would be to try the <link> tag for the episode. On the URL that you provided, there is a link toward the end of the page named 'Episode Website'.
That may also give you a unique key to the episode in the RSS feed, but it may not work as you would expect in all cases, for example if the creator/publisher of the podcast RSS put the same URL in every episode instead of a unique URL per episode.
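Assuming you have obtained the episode's <guid> (or, failing that, its 'Episode Website' link) some other way, the lookup in the parsed feed could be sketched as below. The helper names guid_text and find_episode are hypothetical, and the sample item is trimmed from the feed output in the question. The guid may be parsed either as a plain string or as a dict with a "#text" key, so both are handled.

```python
def guid_text(item):
    """Return the guid of a feed item, whether parsed as a string or a dict."""
    guid = item.get("guid")
    if isinstance(guid, dict):
        return guid.get("#text")
    return guid


def find_episode(items, guid=None, link=None):
    """Hypothetical lookup: prefer an exact guid match, then fall back to link."""
    if guid is not None:
        for item in items:
            if guid_text(item) == guid:
                return item
    if link is not None:
        for item in items:
            if item.get("link") == link:
                return item
    return None


# Trimmed copy of the item shown in the question.
items = [
    {
        "title": "The Parker Quiz - with Matt Parker",
        "link": "https://www.numberphile.com/podcast/matt-parker-quiz",
        "guid": {
            "#isPermaLink": "false",
            "#text": "5b2cf993266c07b1ca7a812f:5bd2f1a04785d353e1b39d76:5ec683354f70a700f9f04555",
        },
    },
]

episode = find_episode(
    items,
    guid="5b2cf993266c07b1ca7a812f:5bd2f1a04785d353e1b39d76:5ec683354f70a700f9f04555",
)
print(episode["title"])
```

The link fallback is only as reliable as the feed: if every episode carries the same URL, it will return the first item.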
Yeah, the second response is a general-purpose podcast RSS feed, independent of Apple or other sources. I would not expect it ever to contain Apple-specific or podcast-player-specific data.
The best I've been able to do is a title match based on the JSON-LD metadata in the podcast episode HTML page. JSON-LD data is semantic data (as opposed to presentation), so it is much less likely to change. I use the extruct library to extract the metadata and jsonpath_rw to query the resulting JSON (amazing library).
import extruct
from jsonpath_rw import parse
# itunes_podcast_episode_html is assumed to hold the HTML text of the Apple Podcasts episode page
metadata = extruct.extract(itunes_podcast_episode_html, uniform=True)
title_pattern = "[json-ld][*]['name']"
expr = parse(title_pattern)
title = [match.value for match in expr.find(metadata)][0]
print(f"itunes podcast episode name = '{title}'")

amadeus api where to get a list of hotel chain codes

In the Amadeus API hotel search, the result contains "chainCode". Is there a CSV list of all chain codes?
https://developers.amadeus.com/self-service/category/hotel/api-doc/hotel-search/api-reference
I'm referring to "chainCode" in this result example
{
"data": [
{
"type": "hotel-offers",
"hotel": {
"type": "hotel",
"hotelId": "XKPARC12",
"chainCode": "XK",
"dupeId": "501132260",
"name": "Holiday Inn Paris-notre Dame",
How can I get the hotel chain name from "XK" in this example?
Sorry, we totally missed this question. Since you cannot retrieve it via the API, please take a look at the data-collection repository, which contains a list of hotel chain codes.

MongoDB: How retrieve data that is newly constructed instead of original documents in the collection?

I have a collection in which documents are all in this format:
{"user_id": ObjectId, "book_id": ObjectId}
It represents the relationship between users and books, which is one-to-many: a user can have more than one book.
Now I have three book_id values, for example:
["507f191e810c19729de860ea", "507f191e810c19729de345ez", "507f191e810c19729de860efr"]
I want to find the users who have all three of these books. The result I want is not the documents in this collection but a newly constructed array of user_id values. It seems complicated and I have no idea how to write the query, so please help.
NOTE:
The reason why I didn't use the structure like:
{"user_id": ObjectId, "book_ids": [ObjectId, ...]}
is that in my system the number of books grows frequently and has no upper limit; in other words, a user may read thousands of books, so I think it's better to store the relationship the traditional way.
This question is not restricted to MongoDB; you can also answer it in relational database terms.
Using a regular find, you cannot get back all the user_id values that own all of the book_ids, because you normalized your collection (flattened it).
You can do it if you use the aggregation framework:
db.collection.aggregate([
    {
        $match: {
            book_id: {
                $in: ["507f191e810c19729de860ea",
                      "507f191e810c19729de345ez",
                      "507f191e810c19729de860efr" ]
            }
        }
    },
    {
        $group: {
            _id: "$user_id",
            count: { $sum: 1 }
        }
    },
    {
        $match: {
            count: 3
        }
    },
    {
        $group: {
            _id: null,
            users: { $addToSet: "$_id" }
        }
    }
]);
The first stage keeps only the documents that match one of the three book_id values. The second groups them by user_id and counts how many matches each user got. Users with exactly three matches pass to the last stage, which collects them into a single array of user_ids. This solution assumes that each 'user_id, book_id' record appears at most once in the original collection. Note that if book_id is stored as an ObjectId, the values passed to $in must be ObjectIds as well, not strings.
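To make the stages concrete, here is a plain-Python sketch of the same logic on in-memory records. It does not use MongoDB; the function name users_with_all_books and the sample ids ("u1", "b1", and so on) are invented for illustration, and it relies on the same assumption that each (user_id, book_id) pair occurs at most once.

```python
from collections import Counter


def users_with_all_books(records, wanted_book_ids):
    """Hypothetical in-memory version of the match / group / match / group pipeline."""
    wanted = set(wanted_book_ids)
    # $match + $group: count, per user, the records whose book is wanted.
    counts = Counter(r["user_id"] for r in records if r["book_id"] in wanted)
    # $match on the count + final $group: keep users who matched every book.
    # Sorted only to make the output deterministic.
    return sorted(user for user, n in counts.items() if n == len(wanted))


# Invented sample data: only u1 owns all three wanted books.
records = [
    {"user_id": "u1", "book_id": "b1"},
    {"user_id": "u1", "book_id": "b2"},
    {"user_id": "u1", "book_id": "b3"},
    {"user_id": "u2", "book_id": "b1"},
    {"user_id": "u2", "book_id": "b4"},
    {"user_id": "u3", "book_id": "b2"},
    {"user_id": "u3", "book_id": "b3"},
]

users = users_with_all_books(records, ["b1", "b2", "b3"])
print(users)
```

Comparing the count against the number of wanted books, rather than a hard-coded 3, keeps the same approach usable for any number of book ids.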

JSON parsing using JSON.net

I am accessing the Facebook API and using the Json.NET library (Newtonsoft.Json).
I declare a JObject to parse the content, look for specific elements, and get their values. Everything works fine for the first few, but then I get this unexplained null reference exception: "Object reference not set to an instance of an object".
I took a look at the declaration but cannot see how to change it. Any help appreciated:
Dim jobj as JObject = JObject.Parse(responseData)
message = jobj("message").tostring
The error occurs at the last line above. I check whether message is null and then look for the next desired field as follows:
catch exception..
dim jobj2 as JObject = JObject.parse(responseData)
description = jobj2("description").tostring
JSON responsedata:
{
"id": "5281959998_126883980715630",
"from": {
"name": "The New York Times",
"category": "Company",
"id": "5281959998"
},
"picture": "http://external.ak.fbcdn.net/safe_image.php?d=e207958ca7563bff0cdccf9631dfe488&w=90&h=90&url=http\u00253A\u00252F\u00252Fgraphics8.nytimes.com\u00252Fimages\u00252F2011\u00252F02\u00252F04\u00252Fbusiness\u00252FMadoff\u00252FMadoff-thumbStandard.jpg",
"link": "http://nyti.ms/hirbn0",
"name": "JPMorgan Said to Have Doubted Madoff Long Before His Scheme Was Revealed",
"caption": "nyti.ms",
"description": "Newly unsealed court documents show that bank executives were suspicious of Bernard Madoff\u2019s accounts and steered clients away from him but did not alert regulators.",
"icon": "http://static.ak.fbcdn.net/rsrc.php/yD/r/aS8ecmYRys0.gif",
"type": "link",
"created_time": "2011-02-04T16:09:03+0000",
"updated_time": "2011-02-06T20:09:51+0000",
"likes": {
"data": [
{
"name": "Siege Ind.",
"category": "Product/service",
"id": "152646224787462"
},
{
"name": "Lindsey Souter",
"id": "100000466998283"
},
This is one example where "message" does not appear in the first few lines but appears later. So I look for the positions of message and description and fetch whichever comes first. If I get an error or the fields return nothing, I try to parse with a regex, and even that is not working right.
Well, presumably jobj("message") has returned Nothing, which will happen if jobj doesn't have a message property. If you think this is incorrect, please post a short but complete piece of JSON to help us investigate.
(Is there any reason why you're declaring message and assigning it a value on the second line, only to overwrite that value on the third line?)
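The underlying idea, checking whether a property exists before reading it, is not specific to Json.NET. As a language-neutral illustration, here is the same pattern in Python using the standard json module; it is not Json.NET code, the helper first_present is hypothetical, and the sample payload is trimmed from the response in the question (which has no "message" property).

```python
import json

# Trimmed copy of the response from the question; note there is no "message".
response_data = (
    '{"id": "5281959998_126883980715630", '
    '"link": "http://nyti.ms/hirbn0", '
    '"caption": "nyti.ms"}'
)


def first_present(obj, keys):
    """Hypothetical helper: return the first value found among keys, else None."""
    for key in keys:
        value = obj.get(key)  # get() yields None instead of raising when absent
        if value is not None:
            return value
    return None


post = json.loads(response_data)
message = post.get("message")
text = first_present(post, ("message", "description", "caption"))
print(message, text)
```

Testing for the missing value before calling anything on it avoids the exception entirely, so no try/catch or regex fallback is needed.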