Yahoo Answers API - questionSearch - Filter out certain words

Is the Yahoo Answers API very limited in its functionality, or is the documentation incomplete?
Here is API ref:
http://developer.yahoo.com/answers/V1/questionSearch.html
I need to be able to retrieve questions that contain termA or termB but neither termC nor termD.
Example URL for questions containing termA:
http://answers.yahooapis.com/AnswersService/V1/questionSearch?appid=YahooDemo&query=termA
Also, any further information on the API would be helpful.
Thanks!

Some Googling showed that the query supports AND, OR, ANDNOT and brackets. So for a query that returns anything on cats but nothing on dogs:
http://answers.yahooapis.com/AnswersService/V1/questionSearch?appid=YahooDemo&query=cats+ANDNOT+dogs
As for the brackets, to query for anything that mentions cats or dogs but not fish:
http://answers.yahooapis.com/AnswersService/V1/questionSearch?appid=YahooDemo&query=(cats+dogs)+ANDNOT+fish
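Applied to the original question, the operators above suggest a query of the form (termA OR termB) ANDNOT (termC OR termD). The following is a minimal Python sketch that only builds such a URL; it assumes the service accepts an explicit OR inside brackets on both sides of ANDNOT, which the answer above does not confirm. The function names are illustrative.

```python
from urllib.parse import urlencode

BASE_URL = "http://answers.yahooapis.com/AnswersService/V1/questionSearch"


def build_query(include, exclude):
    """Combine terms as (a OR b) ANDNOT (c OR d).

    Assumption: OR inside brackets is accepted on both sides of ANDNOT.
    """
    query = "(" + " OR ".join(include) + ")"
    if exclude:
        query += " ANDNOT (" + " OR ".join(exclude) + ")"
    return query


def build_url(include, exclude, appid="YahooDemo"):
    """Build the questionSearch URL; urlencode turns spaces into '+'
    and percent-encodes the brackets."""
    params = {"appid": appid, "query": build_query(include, exclude)}
    return BASE_URL + "?" + urlencode(params)
```

For example, build_url(["termA", "termB"], ["termC", "termD"]) produces a URL whose query parameter decodes to (termA OR termB) ANDNOT (termC OR termD).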

Related

Using an API to Extract All Comments from a Reddit Post

I am using the Pushshift Reddit API: https://github.com/pushshift/api
Using the documentation, I understand how I can use this to extract every comment containing the word "covid" that was left in a certain time period:
https://api.pushshift.io/reddit/search/comment?q=covid&after=3h&before=2h&size=1
The output looks something like this:
{"data":[{"subreddit_id":"t5_2qh6p","author_is_blocked":false,"comment_type":null,"edited":false,"author_flair_type":"richtext","total_awards_received":0,"subreddit":"Conservative","author_flair_template_id":null,"id":"j98zf27","gilded":0,"archived":false,"collapsed_reason_code":null,"no_follow":false,"author":"VamboRoolOkay","send_replies":true,"parent_id":41917615743,"score":1,"author_fullname":"t2_7uxkru5f","all_awardings":[],"body":"I will never believe that election fraud wasn't a significant factor. Go ahead - call it a conspiracy theory. But I also maintained that Covid was lab-created. Truth is the Daughter of Time.","top_awarded_type":null,"author_flair_css_class":null,"author_patreon_flair":false,"collapsed":false,"author_flair_richtext":[{"e":"text","t":"Conservative"}],"is_submitter":false,"gildings":{},"collapsed_reason":null,"associated_award":null,"stickied":false,"author_premium":false,"can_gild":true,"link_id":"t3_116l7ct","unrepliable_reason":null,"author_flair_text_color":"dark","score_hidden":true,"permalink":"/r/Conservative/comments/116l7ct/kamala_harris_plans_on_running_with_biden_in_2024/j98zf27/","subreddit_type":"public","locked":false,"author_flair_text":"Conservative","treatment_tags":[],"created_utc":1676866031,"subreddit_name_prefixed":"r/Conservative","controversiality":0,"author_flair_background_color":"","collapsed_because_crowd_control":null,"distinguished":null,"retrieved_utc":1676866047,"updated_utc":1676866048,"body_sha1":"328df3784d15f77b98a84418c4ce720822227cfe","utc_datetime_str":"2023-02-20 04:07:11"}],"error":null,"metadata":{"es":{"took":98,"timed_out":false,"_shards":{"total":828,"successful":828,"skipped":824,"failed":0},"hits":{"total":{"value":573,"relation":"eq"},"max_score":null}},"es_query":{"size":1,"query":{"bool":{"must":[{"bool":{"must":[{"simple_query_string":{"fields":["body"],"query":"covid","default_operator":"and"}},{"range":{"created_utc":{"gte":1676862433000}}},{"range":{"created_utc":{"lt":1676866033000}}}]}}]}},"aggs":{},"sort":{"created_utc":"desc"}},"es_query2":"{\"size\":1,\"query\":{\"bool\":{\"must\":[{\"bool\":{\"must\":[{\"simple_query_string\":{\"fields\":[\"body\"],\"query\":\"covid\",\"default_operator\":\"and\"}},{\"range\":{\"created_utc\":{\"gte\":1676862433000}}},{\"range\":{\"created_utc\":{\"lt\":1676866033000}}}]}}]}},\"aggs\":{},\"sort\":{\"created_utc\":\"desc\"}}","api_launch_time":1673017478.254743,"api_request_start":1676873233.6143198,"api_request_end":1676873233.7406816,"api_total_time":0.12636184692382812}}
My question: suppose I identify a post that contains the word "covid". Now I want to retrieve every comment on this post. Is this possible?
For instance, in the output above I see:
link_id: t3_116l7ct
parent_id:41917615743
Can I somehow use this information to write an API query to retrieve all comments from this post?
I tried the following query but got an empty result: https://api.pushshift.io/reddit/comment/search/?link_id=t3_116cjib
Thanks!
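The link_id field of a comment (t3_116l7ct above) identifies the post it belongs to, so one hedged approach is to read that field from a comment record and search comments filtered on it. The Python sketch below assumes that Pushshift's /reddit/search/comment endpoint (the path used in the working query above) accepts a link_id filter with the t3_ prefix exactly as it appears in the record; that, and whether the service still answers such requests, are assumptions to verify. Note also that the failing query above uses the path /reddit/comment/search/ and a different link_id (t3_116cjib), so both are worth double-checking. The helper names are hypothetical.

```python
import json
import urllib.request
from urllib.parse import urlencode

SEARCH_URL = "https://api.pushshift.io/reddit/search/comment"


def comments_url_for(comment_json, size=100):
    """Build a comment-search URL for the post that a comment belongs to.

    Assumption: the endpoint filters on link_id with the "t3_" prefix
    exactly as it appears in the comment record.
    """
    record = json.loads(comment_json)["data"][0]
    params = {"link_id": record["link_id"], "size": size}
    return SEARCH_URL + "?" + urlencode(params)


def fetch_comments(url):
    """Fetch one page of results and return its "data" list."""
    with urllib.request.urlopen(url) as resp:
        return json.load(resp)["data"]
```

Passing the JSON output above to comments_url_for yields a URL ending in link_id=t3_116l7ct&size=100; a post with more comments than size would need further paging.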

How to get table info and a summary of a page using the Wikipedia API?

I want to get minimal information about a Wikipedia page using the MediaWiki API, like DuckDuckGo shows. For example, for Steve Carell: https://duckduckgo.com/?q=steve+carell&t=hp&ia=news&iax=about
How can I get this information from a Wikipedia URL (e.g. https://en.wikipedia.org/wiki/Steve_Carell) in HTML format?
You can use the MediaWiki API for that. There's an extension, TextExtracts, which is exactly for that (and it is installed on Wikipedia).
In your case, e.g.:
https://en.wikipedia.org/w/api.php?action=query&prop=extracts&exsentences=1&titles=Steve%20Carell
will return something like:
<p class=\"mw-empty-elt\">\n</p>\n\n<p class=\"mw-empty-elt\">\n \n</p>\n<p><b>Steven John Carell</b> (<span></span>; born August 16, 1962) is an American actor, comedian, producer, writer and director.</p>
You can also customize how many sentences (exsentences) or characters (exchars) the API returns; please consult the API documentation for details.
There is also a way to retrieve the short description, which is stored on Wikidata (and visible in the mobile view of Wikipedia). The call would be:
https://en.wikipedia.org/w/api.php?action=query&prop=pageprops&titles=Steve_Carell
This returns the following property in the pageprops of the page:
"wikibase-shortdesc": "American actor"
This may fit better depending on your use case.
You can even get both results with a single combined request:
https://en.wikipedia.org/w/api.php?action=query&prop=extracts|pageprops&exsentences=1&titles=Steve_Carell
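A minimal Python sketch of the combined request, assuming you want machine-readable JSON: it adds format=json and formatversion=2 (with formatversion=2, query.pages is a list rather than an object keyed by page ID). urlencode percent-encodes the pipe in prop as %7C, which decodes back to extracts|pageprops on the server. The helper names are illustrative.

```python
import json
from urllib.parse import urlencode

API_URL = "https://en.wikipedia.org/w/api.php"


def summary_url(title, sentences=1):
    """Build one request for the TextExtracts extract plus the page props."""
    params = {
        "action": "query",
        "prop": "extracts|pageprops",
        "exsentences": sentences,
        "titles": title,
        "format": "json",
        "formatversion": 2,
    }
    return API_URL + "?" + urlencode(params)


def parse_summary(response_text):
    """Return (extract_html, short_description) for the first page.

    Either value is None if the response lacks it.
    """
    page = json.loads(response_text)["query"]["pages"][0]
    shortdesc = page.get("pageprops", {}).get("wikibase-shortdesc")
    return page.get("extract"), shortdesc
```

Fetching summary_url("Steve Carell") and passing the body to parse_summary should give the one-sentence HTML extract and the "American actor" short description shown above.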

Google's search operator AND

I am trying to figure out exactly how Google's search operator AND works, for an application that I am currently building.
I found this article:
https://supple.com.au/tools/google-advanced-search-operators/
which says that Google uses AND as the default operator when searching.
However, when I try some examples, this doesn't seem to be the whole story.
For example, when I search for:
Google search term: perth tourism sea surfing
it gives me more search results than:
Google search term: perth tourism sea
How is that possible? I would expect an AND operator to narrow the search results, not increase them (increasing them is what I would expect from an OR operator).
Any ideas of why this is happening?
Check this resource out: https://www.webpagefx.com/blog/google-2/google-advanced-search-operators-cheat-sheet/
I have a feeling the default behaviour is and/or, but I am somewhat speculating.
You could incorporate some quotes, such as perth tourism "sea surfing", but I might try something like allintext: perth tourism sea surfing. That would be more machine-friendly than worrying about where to include quotes. Quotes may also constrain the order of your keywords. For example, "I like cats" will find an exact match (and will not return pages containing "I cats like").
Additionally, I think you might not be using AND as it is documented on the site you linked.
The AND operator functions with the same logic as an AND operator - similar to the OR operator, it must be in all CAPS to work. Google will look for all conditions to be met before returning any results.
e.g. site:twitter.com AND intitle:SuppleSolutions AND inurl:Saijo_George where all the 3 conditions should be satisfied for Google to return any result.
Your search terms should probably be:
perth AND tourism AND sea AND surfing
You could add logic that replaces the spaces in the search string with AND, but before that, I would check whether allintext: perth tourism sea surfing is viable.
I think this is a problem of personalization and localization.
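The string rewriting suggested above can be sketched in Python as follows: join whitespace-separated terms with an upper-case AND, or prefix them with allintext:. The function names are hypothetical, nothing here guarantees how Google will rank or count the results, and splitting on whitespace would break quoted phrases such as "sea surfing", so a real implementation would need a tokenizer that respects quotes.

```python
def and_query(search):
    """Join whitespace-separated terms with an explicit, upper-case AND."""
    return " AND ".join(search.split())


def allintext_query(search):
    """Prefix the terms with the allintext: operator, normalizing spaces."""
    return "allintext: " + " ".join(search.split())
```

For example, and_query("perth tourism sea surfing") gives perth AND tourism AND sea AND surfing, matching the search string suggested above.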
I did the following things:
1. I opened an incognito window.
2. I signed out of all Google accounts.
3. I opened google.com rather than google.xx, using www.google.com/ncr.
4. I searched "perth" AND "tourism" AND "sea" AND "surfing".
This seems to narrow the results compared with searches that use fewer terms.

CiteSeerX search API

Is there a way to access CiteSeerX programmatically (e.g. search by author and/or title)? Surprisingly, I cannot find anything relevant; surely others are also trying to get scholarly article metadata without resorting to scraping?
EDIT: note that CiteSeerX supports OAI-PMH, but that seems to be an API geared towards digital libraries keeping up to date with each other ("content dissemination"), and it does not specifically support search. Moreover, the CiteSeer info on that page is very sparse and even says "Currently, there are difficulties with the OAI".
There is another SO question about the CiteSeerX API (though not specifically about search); its two answers do not resolve the problem (one talks about Mendeley, another piece of software, and the other says OAI-PMH implementations are free to offer extensions to the minimal spec).
Alternatively, can anyone suggest a good way to obtain citations from authors/titles programmatically?
As suggested by one of the commenters, I tried JabRef first:
jabref -n -f "citeseer:title:(lessons from) author:(Brewer)"
However, JabRef does not seem to realize that the query string needs to contain colons, so it throws an error.
For search results, I ended up scraping the CiteSeerX results with Python's BeautifulSoup:
import urllib.request
from urllib.parse import quote_plus
from bs4 import BeautifulSoup

# author_last and title hold the search inputs
url = "http://citeseerx.ist.psu.edu/search?q="
q = "title%3A%28{1}%29+author%3A%28{0}%29&submit=Search&sort=cite&t=doc"
url += q.format(quote_plus(author_last), quote_plus(title))
soup = BeautifulSoup(urllib.request.urlopen(url).read(), "html.parser")
result = soup.html.body("div", id="result_list")[0].div  # first result entry
title = result.h3.a.string.strip()
authors = result("span", "authors")[0].string
authors = authors[len("by "):].strip()  # drop the leading "by "
date = result("span", "pubyear")[0].string.strip(", ")
It is possible to get a document ID from the results (the misleadingly named "doi=..." part in the summary link URL) and then pass that to the CiteSeerX OAI engine to get Dublin Core XML (e.g. http://citeseerx.ist.psu.edu/oai2?verb=GetRecord&metadataPrefix=oai_dc&identifier=oai:CiteSeerX.psu:10.1.1.42.2177); however, that XML ends up containing multiple dc:date elements, which makes it less useful than the scrape output.
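The OAI route described above can be sketched in Python with the standard library: pull the doi parameter out of a summary link, build the GetRecord URL in the form shown above, and collect every dc:date so the caller can decide which one to keep. The summary-link format used in the example (http://citeseerx.ist.psu.edu/viewdoc/summary?doi=...) is an assumption, as are the helper names; the Dublin Core namespace is the standard http://purl.org/dc/elements/1.1/.

```python
import xml.etree.ElementTree as ET
from urllib.parse import parse_qs, urlencode, urlparse

OAI_URL = "http://citeseerx.ist.psu.edu/oai2"
DC_NS = "{http://purl.org/dc/elements/1.1/}"


def doc_id_from_summary_link(link):
    """Return the value of the "doi" query parameter of a summary link."""
    return parse_qs(urlparse(link).query)["doi"][0]


def get_record_url(doc_id):
    """Build an OAI-PMH GetRecord URL for Dublin Core metadata."""
    params = {
        "verb": "GetRecord",
        "metadataPrefix": "oai_dc",
        "identifier": "oai:CiteSeerX.psu:" + doc_id,
    }
    # safe=":" keeps the colons in the identifier readable
    return OAI_URL + "?" + urlencode(params, safe=":")


def dc_dates(xml_text):
    """Return the text of every dc:date element, in document order."""
    root = ET.fromstring(xml_text)
    return [el.text for el in root.iter(DC_NS + "date")]
```

For the example document, get_record_url("10.1.1.42.2177") reproduces the GetRecord URL given above.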
Too bad CiteSeerX makes people resort to scraping in spite of all the open archives / open access rhetoric.

Google Custom Search API: how to return country-specific results only

I am writing some PHP code that takes a given search phrase and URL and searches through the Google search results until it finds the URL (only the first 100 results). My problem is that this only works for the US. I have tried adding the "&cr=" option, but it still only returns US results.
The full URL I am using for the request is:
https://www.googleapis.com/customsearch/v1?key=API_KEY&cx=CX_VALUE&q=KEYWORD&cr=COUNTRY&alt=JSON
Does anyone have any experience with this? I want to be able to see UK results. I tried inserting &cr=countryUK, but it still only returns US results.
Thanks :)
Regards,
Stian
Use the gl=<country code> parameter to limit it to your country of choice (so gl=gb for the UK).
More info here:
http://googleajaxsearchapi.blogspot.com/2009/10/web-search-in-your-country.html
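A hedged sketch of the rank check from the question, written in Python rather than the asker's PHP: it builds one request URL per page with gl set to a country code, then scans parsed JSON pages for the target URL. It assumes the JSON API returns at most 10 results per request (num) with start values 1, 11, ..., 91 covering the first 100 results, and that each result's URL is in the link field of the items array; the function names are hypothetical.

```python
from urllib.parse import urlencode

CSE_URL = "https://www.googleapis.com/customsearch/v1"


def search_urls(api_key, cx, keyword, country="gb", pages=10):
    """One request URL per page of 10 results, geolocated with gl."""
    urls = []
    for page in range(pages):
        params = {
            "key": api_key,
            "cx": cx,
            "q": keyword,
            "gl": country,
            "num": 10,
            "start": page * 10 + 1,  # 1-based index of the first result
        }
        urls.append(CSE_URL + "?" + urlencode(params))
    return urls


def position_of(target_url, responses):
    """Return the 1-based rank of target_url across parsed JSON pages, or None."""
    rank = 0
    for data in responses:
        for item in data.get("items", []):  # a page may have no "items"
            rank += 1
            if item.get("link") == target_url:
                return rank
    return None
```

Each URL would be fetched and decoded with json.loads before being passed to position_of; stopping at the first match avoids requesting pages that are not needed.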