Using an API to Extract All Comments from a Reddit Post

Using an API to Extract All Comments from a Reddit Post - api

I am using the Reddit API (Pushshift) : https://github.com/pushshift/api
Using the documentation, I understand how I can use this to extract every comment containing the word "covid" that was left in a certain time period:
https://api.pushshift.io/reddit/search/comment?q=covid&after=3h&before=2h&size=1
The output looks something like this:
{"data":[{"subreddit_id":"t5_2qh6p","author_is_blocked":false,"comment_type":null,"edited":false,"author_flair_type":"richtext","total_awards_received":0,"subreddit":"Conservative","author_flair_template_id":null,"id":"j98zf27","gilded":0,"archived":false,"collapsed_reason_code":null,"no_follow":false,"author":"VamboRoolOkay","send_replies":true,"parent_id":41917615743,"score":1,"author_fullname":"t2_7uxkru5f","all_awardings":[],"body":"I will never believe that election fraud wasn't a significant factor. Go ahead - call it a conspiracy theory. But I also maintained that Covid was lab-created. Truth is the Daughter of Time.","top_awarded_type":null,"author_flair_css_class":null,"author_patreon_flair":false,"collapsed":false,"author_flair_richtext":[{"e":"text","t":"Conservative"}],"is_submitter":false,"gildings":{},"collapsed_reason":null,"associated_award":null,"stickied":false,"author_premium":false,"can_gild":true,"link_id":"t3_116l7ct","unrepliable_reason":null,"author_flair_text_color":"dark","score_hidden":true,"permalink":"/r/Conservative/comments/116l7ct/kamala_harris_plans_on_running_with_biden_in_2024/j98zf27/","subreddit_type":"public","locked":false,"author_flair_text":"Conservative","treatment_tags":[],"created_utc":1676866031,"subreddit_name_prefixed":"r/Conservative","controversiality":0,"author_flair_background_color":"","collapsed_because_crowd_control":null,"distinguished":null,"retrieved_utc":1676866047,"updated_utc":1676866048,"body_sha1":"328df3784d15f77b98a84418c4ce720822227cfe","utc_datetime_str":"2023-02-20 04:07:11"}],"error":null,"metadata":{"es":{"took":98,"timed_out":false,"_shards":{"total":828,"successful":828,"skipped":824,"failed":0},"hits":{"total":{"value":573,"relation":"eq"},"max_score":null}},"es_query":{"size":1,"query":{"bool":{"must":[{"bool":{"must":[{"simple_query_string":{"fields":["body"],"query":"covid","default_operator":"and"}},{"range":{"created_utc":{"gte":1676862433000}}},{"range":{"created_utc":{"lt":1676866033000}}}]}}]}},"aggs":{},"sort":{"created_utc":"desc"}},"es_query2":"{\"size\":1,\"query\":{\"bool\":{\"must\":[{\"bool\":{\"must\":[{\"simple_query_string\":{\"fields\":[\"body\"],\"query\":\"covid\",\"default_operator\":\"and\"}},{\"range\":{\"created_utc\":{\"gte\":1676862433000}}},{\"range\":{\"created_utc\":{\"lt\":1676866033000}}}]}}]}},\"aggs\":{},\"sort\":{\"created_utc\":\"desc\"}}","api_launch_time":1673017478.254743,"api_request_start":1676873233.6143198,"api_request_end":1676873233.7406816,"api_total_time":0.12636184692382812}}
My Question: Suppose I identify a post that contains the word "covid" - now, I want to retrieve every comment on this post : Is this possible to do?
For instance, based on the output of these results, I see that :
link_id: t3_116l7ct
parent_id:41917615743
Can I somehow use this information to write an API query to retrieve all comments from this post?
I tried the following query but got an empty result: https://api.pushshift.io/reddit/comment/search/?link_id=t3_116cjib
Thanks!

Related

Using the Reddit API, is it possible to return a list of comments if the submission title includes a specific keyword?

Using the Reddit API, is it possible to return a list of Reddit comments if the submission title includes a specific keyword? For example, if the keyword is "Lime Sparkling Water", I want to return all the comments under submissions that have "Lime Sparkline Water" in the title.
I've tried using the Pushshift API for Reddit but looks like we can only isolate the submission data or the comment data and not isolate the comments data based on the submissions data.
Please help :)

Yes, this is possible with PRAW.
You can use PRAW's stream function, that page also has examples how to use PRAW.
An example being:
subreddit = reddit.subreddit("AskReddit")
for submission in subreddit.stream.submissions():
# do something with submission
...
This will return all submissions within "AskReddit". From there you could check the post title:
if 'Lime Sparking Water' in submission:
# do something with the submission
Although, I know this is a hypothetical phrase to search, you'd be better off searching lowercase phrases/words and .lower()

How can I get page id, wikidata id of some title along with multiple languages in a single API call?

I have been trying to call Wikipedia API to retrieve page id and wikidata item id using below call and it works fine.
https://en.wikipedia.org/w/api.php?action=query&prop=pageprops&ppprop=wikibase_item&redirects=1&format=xml&titles=Cat
but I need to retrieve the same information from other languages of my choice for example if I mention German and French languages in my call, it should look for their translation of word Cat and retrieve their page info. There is langlink property in Wikipedia API but somehow it doesn't work with query action along with pageprop.
So ideally, I want something like this:
https://en.wikipedia.org/w/api.php?action=query&prop=pageprops&ppprop=wikibase_item&prop=langlinks&lllang=de&lllang=fr&titles=Cat
Any help would be appreciated.

Using lllang twice will just result in the second value overwriting the first one. You'll have to omit the paramter and then you get all the links:
https://en.wikipedia.org/w/api.php?action=query&prop=pageprops|langlinks&ppprop=wikibase_item&titles=Cat

Obtain Deviantart Deviation ID / UUID from page URL

I was looking at the Deviantart API to see what you can do with it .
A lot of requests require you to provide a deviation id to work with.
Take for instance adding a deviation to favorites ( in Collections -> Add deviation to favorites above, I cannot post more than 2 links... )
Now I looked through the API to figure out how to obtain that id, but I did not find out how to do so.
If I only have the deviation URL, for instance http://kennyklent.deviantart.com/art/Pinkie-Pie-Dancing-296143815 , how can I tell its deviation-id?
It is not the number at the end 296143815, I would've thought so, but it's not.
If it helps, here's one example from the api's /browse/dailydeviations endpoint
"deviationid": "27FD366A-30CB-FC3E-DE54-9621E90FCE60",
"printid": "E984FC87-8B57-239C-FE7C-E2674A0DDFC4",
"url": "http://mudimba.deviantart.com/art/SF-Botanical-Gardens-57879397",
So this deviation SF-Botanical-Gardens-57879397 has the id 27FD366A-30CB-FC3E-DE54-9621E90FCE60 - but how would I find out if it wasn't listed in the examples?

Update 06/2017:
For anyone stumbling across this 2 years later, the answer below still works but there is now another way to get the UUID. Every Deviation now has a meta property da:appurl showing the UUID value on the deviation page itself.
To stay with the SF-Botanical-Gardens-57879397 example from above, looking at the page source at http://mudimba.deviantart.com/art/SF-Botanical-Gardens-57879397 reveals:
<meta property="da:appurl" content="DeviantArt://deviation/27FD366A-30CB-FC3E-DE54-9621E90FCE60">
Which contains exactly the UUID value 27FD366A-30CB-FC3E-DE54-9621E90FCE60
Original answer
I got an answer from a Deviantart dev directly, http://comments.deviantart.com/1/492518964/3755610860
You cannot convert integer IDs into UUID format, you have to query the api to find the correct uuid. So for your example, you would query the /gallery/folders endpoint and then the gallery/{folderid} endpoint to get the list of deviations in that folder.
There's no easier way to obtain the UUID for a given URL for now.

Use custom function to populate gSpreadsheet cell based on a XML/JSON response

Ok, this one has become a little tricky for me and I really need some assistance to work through it.
Problem
I have a GSpreadsheet which has a list of data, in this case Twitter usernames. Using the API of a service provider (in this case the Klout API), I would like to retrieve information about that user to populate a cell within a spreadsheet.
Based on what I can work out so far, I would need to write a custom function to do this but I have no idea where to start, how I might construct it, or if there are any examples of doing this.
Scenario
The Klout API can return either an XML or JSON response (see http://developer.klout.com/docs/read/api/API), based on the string passed. For example, the URL:
http://api.klout.com/1/users/show.xml?key=SECRET&users=thewinchesterau
would return the following XML response:
<users>
<user>
<twitter_id>17439480</twitter_id>
<twitter_screen_name>thewinchesterau</twitter_screen_name>
<score>
<kscore>56.63</kscore>
<slope>0</slope>
<description>creates content that is spread throughout their network and drives discussions.</description>
<kclass_id>10</kclass_id>
<kclass>Socializer</kclass>
<kclass_description>You are the hub of social scenes and people count on you to find out what's happening. You are quick to connect people and readily share your social savvy. Your followers appreciate your network and generosity.</kclass_description>
<kscore_description>thewinchesterau has a low level ofinfluence.</kscore_description>
<network_score>58.06</network_score>
<amplification_score>29.16</amplification_score>
<true_reach>90</true_reach>
<delta_1day>0.3</delta_1day>
<delta_5day>0.5</delta_5day>
</score>
</user>
</users>
Based on this response, I would like to be able to populate different cells with the values returned within the XML (or JSON if easier) packet.
So, for example, I would have a spreadsheet like the following which would have custom functions to go out and retrieve the value of the relevant XML element response to populate the cell:
Cell A B C D E
1 Username kscore Network score Amplification score True reach
2 thewinchester =kscore(A2) =nscore(A2) =ascore(A2) =tscore(A2)
Questions
Are there any gSpreadsheet examples you know of that use an API to pull data in from an external source?
How would one write a custom function to fetch the result from the API and populate a cell with a result of a specific element?
Any information, examples or helpers you have are greatly appreciated.

You want the importXML function, documented here. The formula you want will look something like this:
=importXML("http://api.klout.com/1/users/show.xml?key=SECRET&users=" + A1, "//users/user/score/kscore")

You could write a custom script with Google AppScript, but there's a simple solution to this similar to what Nick Johnson posted. I've tested this against the score function, but it could be easily adapted to the show endpoint with different XPath.
=importXML("http://api.klout.com/1/klout.xml?users="&A1&"&key=YOUR_API_KEY", "//users/user/kscore")
This presumes your Twitter IDs are in the A column.
Note, Google Docs limits the number of such importXML functions to 50 per spreadsheet. You could concatenate groups of 5 userids for each importXML call, effectively putting your limit to 250 a sheet.
This could also be adapted to a similar call in Excel that doesn't have that limit. Keep in mind the Klout ToS, though, using proper attribution and rate limits.

Google Custom Search API, Howto return country specific results only

I am making some PHP code which takes a given search phrase and url and searches through the google search results until it finds the url (only first 100 results). My problem is, this is only working for the US. I have tried adding the "&cr=" option, but it still on returns US results.
The full URL I am using for the request is:
https://www.googleapis.com/customsearch/v1?key=API_KEY&cx=CX_VALUE&q=KEYWORD&cr=COUNTRY&alt=JSON
Does anyone have any experience with this? I want to be able to see UK results. Tried inserting &cr=countryUK , but still only does US results.
Thanks :)
Regards,
Stian

Use the gl=<country code> param to limit it to your country of choice (so gl=gb for the uk).
More info here:
http://googleajaxsearchapi.blogspot.com/2009/10/web-search-in-your-country.html

We Keep Coding

sql objective-c vba vb.net react-native apache vue.js tensorflow api pandas

Using an API to Extract All Comments from a Reddit Post - api

Related

Using the Reddit API, is it possible to return a list of comments if the submission title includes a specific keyword?

How can I get page id, wikidata id of some title along with multiple languages in a single API call?

Obtain Deviantart Deviation ID / UUID from page URL

Use custom function to populate gSpreadsheet cell based on a XML/JSON response

Google Custom Search API, Howto return country specific results only

Categories

Resources