This may well be documented somewhere obvious, but I'm not seeing it. I'm parsing Google News results from RSS, but I'm struggling to get the RSS feed to match what I'm seeing online, with results limited to my country.
I'm in South Africa. To see SA news on a topic, I search for the topic in Google News, then select "Pages from South Africa" in the left menu. Although that option is under "The web", it does limit the news results as well.
However, the RSS link in the page footer goes to the generic (i.e. not region-specific) news results, as if I hadn't selected "Pages from..." at all. I've been playing with the parameters in the feed URL, but I haven't found any way to restrict the RSS results to my region. (Similarly, I can't find an option to limit a CSE, a custom search engine, the same way.)
Any ideas?
Update: looks like it can't be done - the RSS URL doesn't obey the same rules as the regular searches ("&cr=countryZA"). Manipulating the query string to get the result, and scraping the results out, is in defiance of Google's Ts&Cs.
You can use &geo=usa, or even a zip code such as &geo=99553.
It can be done like so: https://news.google.com/news/feeds?country=AU&geo=2000
The above URL would limit the RSS output to news in Sydney, Australia.
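A minimal Python sketch of building such a feed URL, assuming the `country` and `geo` parameters behave as described in the answers above. The `build_news_feed_url` helper is hypothetical, the `q` parameter for the search topic is an assumption, and Google may have changed or retired this endpoint, so treat the result as something to test rather than a guaranteed working feed.

```python
from urllib.parse import urlencode

# Feed endpoint taken from the answer above; it may no longer be served.
FEED_BASE = "https://news.google.com/news/feeds"


def build_news_feed_url(country, geo=None, query=None):
    """Return a feed URL with a properly encoded query string (hypothetical helper)."""
    params = {"country": country}
    if geo is not None:
        params["geo"] = geo
    if query is not None:
        params["q"] = query  # assumption: 'q' carries the search topic
    # urlencode joins parameters with '&' and escapes special characters
    return FEED_BASE + "?" + urlencode(params)


print(build_news_feed_url("AU", geo="2000"))
```

Building the query string with `urlencode` avoids hand-written separators: only the first parameter follows `?`, and every later one is joined with `&`.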
I am a long-time paying user of the Custom Search API.
The problem, as described in the CSE documentation, is that the API is intended to search your own site and not the web in general. It misses results, for example from books.google.com, as well as results in other languages.
Is there another (paid) API that returns all results?
Sample search string: "الاستخدامات التالية من التطبيق"
(The above search gets 1 result in Google Search but 0 results in the Custom Search I am paying for.)
Thanks.
I didn't want to switch to Bing, but in the end it gave me better results.
For anyone else having this issue:
https://learn.microsoft.com/en-us/rest/api/cognitiveservices/bing-web-api-v7-reference
How can I get all Wikipedia article titles in one place, without extra characters and page IDs? Just the articles' titles. Something like this:
When I download the Wikipedia dump, I get this:
I may know a way that would get me all the pages, but I wanted to get them all in one go.
You'll find it on https://dumps.wikimedia.org
The latest List of page titles in main namespace for English Wikipedia as a database dump is here (69 MB).
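A rough Python sketch for reading that dump once it is downloaded. It assumes the file is gzip-compressed with one title per line, underscores in place of spaces, and possibly a `page_title` header on the first line; the file name in the usage note and the `iter_dump_titles` helper are illustrative, not something the answer above specifies.

```python
import gzip


def iter_dump_titles(path):
    """Yield article titles from a gzipped, one-title-per-line dump file."""
    with gzip.open(path, "rt", encoding="utf-8") as handle:
        for index, line in enumerate(handle):
            title = line.rstrip("\n")
            # Assumption: the first line may be a column header, not a title
            if index == 0 and title == "page_title":
                continue
            if not title:
                continue
            # Titles are assumed to use underscores where the site shows spaces
            yield title.replace("_", " ")
```

Usage would look something like `for title in iter_dump_titles("enwiki-latest-all-titles-in-ns0.gz"): print(title)`. Streaming line by line keeps memory use flat even though the uncompressed list is large.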
If you would rather get it through the API, use action=query with list=allpages, but that only gives you a maximum of 500 titles (5,000 for bots) per request, so you will have to make more than 10,000 API calls for the English Wikipedia.
Example: https://en.wikipedia.org/w/api.php?action=query&format=xml&list=allpages&aplimit=max
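A sketch of how the repeated calls could be chained in Python, assuming the standard MediaWiki continuation mechanism: each response carries a `continue` object whose fields are sent back with the next request. The `fetch_json` and `iter_titles` names and the User-Agent string are made up for illustration.

```python
import json
import urllib.parse
import urllib.request

API_URL = "https://en.wikipedia.org/w/api.php"


def fetch_json(params):
    """Perform one API request and decode the JSON response."""
    url = API_URL + "?" + urllib.parse.urlencode(params)
    # Placeholder User-Agent; replace with one that identifies your tool
    request = urllib.request.Request(url, headers={"User-Agent": "allpages-sketch/0.1"})
    with urllib.request.urlopen(request) as response:
        return json.load(response)


def iter_titles(fetch=fetch_json, namespace=0):
    """Yield every page title in a namespace, following continuation tokens."""
    params = {
        "action": "query",
        "format": "json",
        "list": "allpages",
        "aplimit": "max",
        "apnamespace": namespace,
    }
    while True:
        data = fetch(params)
        for page in data["query"]["allpages"]:
            yield page["title"]
        continuation = data.get("continue")
        if not continuation:
            break  # no 'continue' object means this was the last batch
        # Merge the continuation fields (e.g. 'apcontinue') into the next request
        params = {**params, **continuation}
```

Passing the fetch function as a parameter makes it easy to add throttling or retries, which matters when the full run takes thousands of requests.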
I'm looking for an API that would give results similar to Google's "people also search for" feature, so that, for instance, when I search for Stanley Kubrick, I see the other film directors that people search for.
I know about the Freebase API but it simply provides information about the search item, not what other search items it may be related to.
There is also a TargetingIdeaSelector tool in the Google AdWords API that shows related keywords, but it doesn't really rank the results semantically.
Finally, there's a very simple Bing API that shows related searches (also here), but, again, it does not rank the information semantically.
Do you know of any API, perhaps something in Google's own APIs, that would show me related searches ranked semantically?
Google used to offer such an API, but it was deprecated a few years back. I am unsure why, but my guess is that it brought them no real benefit and likely cost a lot to maintain. In my experience, most major search engines tend not to have search APIs.
You could, however, try to make your own, using PHP and a DOM parser to parse the results from somewhere like Google and export the data as JSON.
The Simple HTML DOM parser is available for download here: http://simplehtmldom.sourceforge.net
The snippet below should pull out all the links from a Google results page, which you can then format as needed. You can parse any of the data and target specific elements; see the documentation for more.
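// Load the Simple HTML DOM library downloaded from the link above
include 'simple_html_dom.php';

// URL-encode the search term; a '#q=' fragment is never sent to the server, so use /search?q=
$search = urlencode($_GET['search']);
$google_search = file_get_html('https://www.google.co.uk/search?q=' . $search);

// Print the href of every link found on the results page
foreach ($google_search->find('a') as $item) {
    echo $item->href . '<br>';
}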
Hope that helps
The results that Google shows are based on a massive amount of data that, I guess, is built on "what X who searched for Y also searched for", "what other people similar to X who also searched for Y searched for", and so on. In addition, there may be some reliance on semantic information coming from Freebase.
As part of an initiative to understand which properties Google shows in its infoboxes (i.e. why, when we search for France, we get a card with the map, flag, capital, population, etc., out of the hundreds of properties related to France), I created a "Knowledge Base Extractor" that parses the Google infobox and exposes the data as RDF using the Fresnel Vocabulary.
The algorithm implemented is the following:
1- Query DBpedia for all concepts (types) for which at least one instance has a link to a Freebase ID.
2- For each of these concepts, pick (n) instances at random.
3- For each instance, issue a Google Search query:
   - If an infobox is available, scrape it to extract the properties.
   - If no infobox is available, check whether Google suggests "do you mean ... ?" and, if so, follow the link and look for an infobox.
   - If no infobox or correction is available, disambiguate by adding the concept (type) to the search query and check whether an infobox is returned.
   - If Google suggests a disambiguation in an infobox, parse all the links in it. Ideally, find which suggestion maps to the data type currently in use by checking the Freebase-DBpedia mappings.
4- Cluster the properties for each concept.
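The control flow of those steps can be sketched in Python as below. This is not the extractor's actual code: the lookup strategies are hypothetical callables standing in for the DBpedia and Google steps, and "clustering" is interpreted here, as an assumption, as counting how often each property appears per concept.

```python
import random
from collections import Counter


def find_infobox(name, concept, strategies):
    """Try each lookup strategy in order; return the first property list found."""
    for strategy in strategies:
        properties = strategy(name, concept)
        if properties is not None:
            return properties
    return None  # no strategy produced an infobox


def cluster_properties(instances_by_concept, strategies, n=3, seed=None):
    """Sample up to n instances per concept and count the properties seen."""
    rng = random.Random(seed)
    clusters = {}
    for concept, instances in instances_by_concept.items():
        # Pick n random instances, or all of them if fewer are available
        sample = rng.sample(instances, min(n, len(instances)))
        counts = Counter()
        for name in sample:
            properties = find_infobox(name, concept, strategies)
            if properties is not None:
                counts.update(properties)
        clusters[concept] = counts
    return clusters
```

Each strategy takes an instance name and its concept and returns a list of property names or `None`, so the fallbacks (direct search, spelling correction, disambiguation) can be ordered simply by their position in the `strategies` list.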
I also capture that "people searched for" section, but you might also want to tweak it a bit more.
Also note that you might want to check the CSS selectors for the infobox, as Google changes them often (they may be auto-generated). They are set in options.json:
"knowledgeBox" : "#kno-result",
"knowledgeBox_disambiguate" : ".kp-blk",
"property" : "._Nl",
"property_value" : ".kno-fv",
"label" : ".kno-ecr-pt",
"description" : ".kno-rdesc",
"type" : "._kx",
"images" : ".bicc",
"special_property" : ".kno-sh",
"special_property_value" : "._Zh",
"special_property_value_link" : "a._dt"
As the question suggests, does the Google Custom Search API have a function that returns a category (music, entertainment, news, gaming, etc.) based on input keywords?
You can use the AdWords Keyword Planner tool or the Display Planner tool to get related keyword ideas.
We would like to be able to pull out certain places from Foursquare and categorize them on our website along with comments from Foursquare users. I have the following questions:
1- Can we pull out places and categorize them the way we want on our website? e.g. restaurants/bar/lounge/club/landmarks/others.
2- Can we also pull out phone numbers (when available) and addresses (longitude-latitude) of places?
3- Does foursquare have any general descriptive summaries of each place?
Thanks for the help.
Chris
Foursquare has an API; more information can be found at this link.
To answer your questions:
1- Yes, check out the Venues Platform in the Foursquare API, specifically the search endpoint. When you query the API, each venue in the result set comes with a category.
2- If available, you will get them back under the 'contact' field; check out the venue object in the response from the search function.
3- Yes, in the 'description' field, but you will need to make a separate API request to get the complete venue object.
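A small Python sketch of what that could look like, assuming the legacy v2 venue search endpoint and a response shape with `name`, `categories`, `contact` and `location` fields. The helper names, the placeholder credentials and the version date are illustrative, so check the field names against the current Foursquare documentation before relying on them.

```python
from urllib.parse import urlencode

# Assumed legacy v2 endpoint; newer Foursquare APIs use different URLs
SEARCH_URL = "https://api.foursquare.com/v2/venues/search"


def build_search_url(lat, lng, client_id, client_secret, version="20140101"):
    """Build a venue search URL around a latitude/longitude pair."""
    params = {
        "ll": f"{lat},{lng}",
        "client_id": client_id,
        "client_secret": client_secret,
        "v": version,  # assumed API version date in YYYYMMDD form
    }
    return SEARCH_URL + "?" + urlencode(params)


def summarize_venue(venue):
    """Extract the fields discussed above, tolerating missing ones."""
    contact = venue.get("contact", {})
    location = venue.get("location", {})
    return {
        "name": venue.get("name"),
        "categories": [c["name"] for c in venue.get("categories", [])],
        "phone": contact.get("phone"),
        "address": location.get("address"),
        "lat": location.get("lat"),
        "lng": location.get("lng"),
        "description": venue.get("description"),  # usually absent in search results
    }
```

Using `.get` with defaults reflects the point that contact and description data are optional: a venue without them yields `None` values instead of raising an error.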
Edit: one last thing, attribute and play nice :)
In my experience, not many venues come with 'contact' and 'description' information. However, Foursquare is not very popular where I test my application, so my experience may not be representative. Experiment with it yourself.
Foursquare has a great category tree that you can use for categorizing restaurants:
http://aboutfoursquare.com/foursquare-categories/
Actually, I'm using this tree in my website:
Dishes Map