Freebase API - listing a city's tourist attractions by relevance - api

I'm trying to use Freebase to list tourist attractions for cities by relevance.
Using the Topic API, it's simple to retrieve results for a certain city using its MID (e.g. "/m/04jpl" for London)
https:// www.googleapis.com/freebase/v1/topic/m/04jpl/?&filter=/travel/travel_destination/tourist_attractions
However, this gives a limited 10 results. The response ends with "count": 87.0". How do I get all 87? It's possible to click a "87 values total" link on London's Freebase page. Effectively, I want to do the same here.
I realise I could use MQL, but I want the results to be ranked by relevance, not by timestamp. Using the Search API, it's possible to rank by freebase, entity or schema, so I'd rather use that.
First, I looked at the Search Output schema for the Search API. However, even outputting "all" didn't produce Tourist Attraction results. Using metaschema with the Search API DID work. I used "part_of" to select London. However, it only works for some locations:
https:// www.googleapis.com/freebase/v1/search?limit=50&filter=(all%20type:/travel/tourist_attraction%20part_of:/m/04jpl)&indent=true
What I REALLY want to be able to do is make it work for a relatively unknown location like "Loughborough" (MID /m/01z21p). As you can see, substituting /m/04jpl for /m/01z21p produces no results:
https:// www.googleapis.com/freebase/v1/search?limit=50&filter=(all%20type:/travel/tourist_attraction%20part_of:/m/01z21p)&indent=true
Looking at "Loughborough", we see that its tourist attraction like "Loughborough Town Hall" has a "/travel/tourist_attraction/near_travel_destination" of "Loughborough". How would I compose this filter?
I want something like the following (that actually works):
https:// www.googleapis.com/freebase/v1/search?limit=50&filter=(all%20type:/travel/tourist_attraction)&filter=(/travel/tourist_attraction/near_travel_destination:/m/01z21p)&indent=true
Thanks!
NOTE: To enter the links into your browser you need to remove the space between the https:// and www. I would have done so, but I don't have the required permissions here yet to post more than 2 links.

I solved this problem using 2 Freebase API calls.
1) An MQL query that gets a list of all the tourist attractions for a particular MID. These results are not ranked in any useful way. I am also returning the result number to make processing a little easier later
https://www.googleapis.com/freebase/v1/mqlread?query={"mid":"/m/04jpl","/travel/travel_destination/tourist_attractions":[{"mid":null}],"resultnumber:/travel/travel_destination/tourist_attractions":[{"return":"count"}]}
The list of returned MIDs are then used to create a new query (using a for loop). You must enter all MIDs returned from the above query, so that they can all be ranked together.
2) https://www.googleapis.com/freebase/v1/search?limit=10&filter=(any%20mid:/m/0gsxw%20mid:/m/01d_0p%20mid:/m/07gyc)&scoring=entity
It's best to choose a return format that just returns MIDs, to ensure that loading times aren't extensive.
You then have a ranked list of MIDs! You'll need one final query to return whatever details you desire.
I hope this has proved helpful.

Related

Getting search results from wikidata website, but not API

I'm trying out the wikidata API but have some trouble with the search query "Jas 39 C Gripen". It returns results on the wikidata website, but not if I use the API.
On The wikidata website I get two search results for the query
https://www.wikidata.org/w/index.php?search=Jas+39+C+Gripen&title=Special:Search&fulltext=1
The same query using the API, does not return a result
https://www.wikidata.org/w/api.php?action=wbsearchentities&format=json&language=en&type=item&continue=0&search=Jas%2039%20C%20Gripen
Am I missing some parameters or using the wrong parameters? For many other queries I get results from the API.
It seems that the native Wikidata search applies some "fuzzy logic" when interpreting the user's entries. In your case, it shows two results, although the character C is missing in the first one.
Coming back to the API and the action you have chosen, you could use Jas 39 Gripen as search term (which will show one result) as well as Jas 39C Gripen (which will also show one result). But it seems that you can't use Jas 39 C Gripen (note the space character between 9 and C).
In other words,
https://www.wikidata.org/w/api.php?action=wbsearchentities&format=json&language=en&type=item&continue=0&search=Jas%2039%20Gripen
https://www.wikidata.org/w/api.php?action=wbsearchentities&format=json&language=en&type=item&continue=0&search=Jas%2039C%20Gripen
both work, but
https://www.wikidata.org/w/api.php?action=wbsearchentities&format=json&language=en&type=item&continue=0&search=Jas%2039%20C%20Gripen
does not.
I have investigated this issue further and finally found the solution. Try this:
https://www.wikidata.org/w/api.php?action=query&list=search&format=json&srsearch=Jas+39+C+Gripen
The action query allows some "fuzziness" in the search term. Please refer to the API documentation for further details. In short, this action performs a full text search (which you obviously want) and allows for a nearmatch search type.
The reason seems to be that the English label is JAS 39C Gripen. By removing one space from your query, you will get the result you are looking for:
https://www.wikidata.org/w/api.php?action=wbsearchentities&format=json&language=en&type=item&continue=0&search=Jas%2039C%20Gripen

What is the appropriate use case in a REST API to use the ? Question Mark for parameters?

On this REST tutorial site,
When, if ever, is it appropriate to put something like
http://dev.m.gatech.edu/developer/USER_NAME/api/WIDGET_NAME/test?query=someparam
instead of
http://dev.m.gatech.edu/developer/USER_NAME/api/WIDGET_NAME/test/someparam
or
http://dev.m.gatech.edu/developer/USER_NAME/api/WIDGET_NAME/test/someparam/var1/param/var2/param
?
I've seen various things on SO.
All cases where you are performing a GET request and need to pass some parameters should be in the form of ?param=value.
So is that first link up top, they are just wrong? Man, who can you trust these days :).
No, they are not wrong. Take this example from that site
GET http://www.example.com/customers/33245/orders
Here, customers, 33245 and orders are not query parameters, they are resource endpoints, or uri nodes as they call them on your restapitutorial.com
If you do
GET http://www.example.com/customers you get all customers
GET http://www.example.com/customers/33245 you get customer 33245
GET http://www.example.com/customers/33245/orders you get customer 33245's orders
They all return 0 or more resources. If you were to apply a query to for example the first one and you wanted to GET all customers with John as first name, you would do this
GET http://www.example.com/customers?firstname=John
In the last example in your question, it would be written as GET http://www.example.com/customers/firstname/john instead, which is wrong in terms of restfulness. There is no customer resource 'firstname', and there is no firstname resource 'john'.
There are customers whose firstname is 'john' and you would GET them with
GET http://www.example.com/customers?firstname=John

Is it possible to order lucene documents by matching term?

I'm using Lucene 4.10.3 with Java 1.7
I'm wondering whether it's possible to order query results the matching term?
Simply put, if my documents conatin a text field;
The query is
text:a*
I want documents with ab, then ac, then ad etc.
The real case is more complex however, what I'm actually trying to accomplish is to "stuff" a relational DB into my lucene Index (probably not the best idea?).
An appropriate example would be :
I have documents representing books in a library. every book has a title and also a list of people who has borrowed this book and the date of borrowing.
when a user searches for a book with title containing "JAVA", I want to give priority to books that were borrowed by this user. This could be accomplished by adding a TextField "borrowers", adding a SHOULD clause on it and ordering by score)
also, if there are several books with "JAVA" that this user has borrowed before, I want to show the most recent borrowed ones first. so I thought to create a TextField "borrowers" that will look like
borrowers : "user1__20150505 user2__20150506" etc.
I will add a BooleanClause borrowers: user1* and order by matching term.
any other solution ideas will be welcome
I understand your real problem is more complex, but maybe this is helpful anyway.
You could first search for Tokens in the index that match your query, then for each matching token executing a query using this token specifically.
See https://lucene.apache.org/core/6_0_1/core/org/apache/lucene/index/TermsEnum.html for that. Just seek to the prefix and iterate until the prefix stops matching.
In general it is sometimes easy to just issue two queries. For example one within the corpus of books the user as borrowed before and another witin the whole corpus.
These approaches may not work, but in that case you could implement a custom Scorer somehow mapping the ordering to a number.
See http://opensourceconnections.com/blog/2014/03/12/using-customscorequery-for-custom-solrlucene-scoring/

RESTful API - URI Structure Advice

I have REST API URL structure similar to:
/api/contacts GET Returns an array of contacts
/api/contacts/:id GET Returns the contact with id of :id
/api/contacts POST Adds a new contact and return it with an id added
/api/contacts/:id PUT Updates the contact with id of :id
/api/contacts/:id PATCH Partially updates the contact with id of :id
/api/contacts/:id DELETE Deletes the contact with id of :id
My question is about:
/api/contacts/:id GET
Suppose that in addition to fetching the contact by ID, I also want to fetch it by an unique alias.
What should be URI structure be if I want to be able to fetch contact by either ID or Alias?
If you're alias's are not numeric i would suggest using the same URI structure and figuring out if it's an ID or an alias on your end. Just like Facebook does with username and user_id. facebook.com/user_id or facebook.com/username.
Another approach would be to have the client use GET /contacts with some extra GET parameters as filters to first search for a contact and then looking up the ID from that response.
Last option i think would be to use a structure like GET /contacts/alias/:alias. But this would kinda imply that alias is a subresource of contacts.
The path and query part of IRIs are up to you. The path is for hierarchical data, like api/version/module/collection/item/property, the query is for non-hierarchical data, like ?display-fields="id,name,etc..." or ?search="brown teddy bear"&offset=125&count=25, etc...
What you have to keep in mind, that you are working with resources and not operations. So the IRIs are resource identifiers, like DELETE /something, and not operation identifiers, like POST /something/delete. You don't have to follow any structure by IRIs, so for example you could use simply POST /dashuif328rgfiwa. The server would understand, but it would be much harder to write a router for this kind of IRIs, that's why we use nice IRIs.
What is important that a single IRI always belongs only to a single resource. So you cannot read cat properties with GET /cats/123 and write dog properties with PUT /cats/123. What ppl usually don't understand, that a single resource can have multiple IRIs, so for example /cats/123, /cats/name:kitty, /users/123/cats/kitty, cats/123?fields="id,name", etc... can belong to the same resource. Or if you want to give an IRI to a thing (the living cat, not the document which describes it), then you can use /cats/123#thing or /users/123#kitty, etc... You usually do that in RDF documents.
What should be URI structure be if I want to be able to fetch contact
by either ID or Alias?
It can be /api/contacts/name:{name} for example /api/contacts/name:John, since it is clearly hierarchical. Or you can check if the param contains numeric or string in the /api/contacts/{param}.
You can use the query too, but I don't recommend that. For example the following IRI can have 2 separate meanings: /api/contacts?name="John". You want to list every contact with name John, or you want one exact contact. So you have to make some conventions about this kind of requests in the router of your server side application.
I would consider adding a "search" resource when you are trying to resolve a resource with the alias:
GET /api/contacts/:id
and
GET /api/contacts?alias=:alias
or
GET /api/contacts/search?q=:alias
First of all, the 'ID' in the URL doesn't have to be a numerical ID generated by your database. You could use any piece of data (including the alias) in the URL, as long as its unique. Of course, if you are using numerical ID's everywhere, it is more consistent to do the same in your contacts API. But you could choose to use the aliases instead of numeric IDs (as long as they are always unique).
Another approach would be, as Stromgren suggested, to allow both numeric IDs and aliases in the URL:
/api/contacts/123
/api/contacts/foobar
But this can obviously cause problems if aliases can be numeric, because then you wouldn't have any way to differentiate between an ID and a (numeric) alias.
Last but not least, you can implement a way of filtering the complete collection, as shlomi33 already suggested. I wouldn't introduce a search resource, as that isn't really RESTful, so I'd go for the other solution instead:
/api/contacts?alias=foobar
Which should return all contacts with foobar as alias. Since the alias should be unique, this will return 1 or 0 results.

Google Places API - RadarSearch results are confusing

I'm running a query vs the Google Places RadarSearch API and don't entirely understand the results. I'm trying to find nearby Tesco Supermarkets. My query is structured like this:
https://maps.googleapis.com/maps/api/place/radarsearch/xml?location=51.503186,-0.126446&types=store&keyword=tesco&name=tesco&radius=5000&key=XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
I've tried a bunch of variations of the fields types, keyword and name. None of the results are Tesco stores. Am i missing something?
The Google docs show the fields as:
keyword — A term to be matched against all content that Google has indexed for this place, including but not limited to name, type, and address, as well as customer reviews and other third-party content.
name — One or more terms to be matched against the names of places, separated by a space character. Results will be restricted to those containing the passed name values. Note that a place may have additional names associated with it, beyond its listed name. The API will try to match the passed name value against all of these names. As a result, places may be returned in the results whose listed names do not match the search term, but whose associated names do.
I always get the maximum of 200 results which maybe includes 1 or 2 Tescos. When I check on Google maps there are 10 Tescos in the radius I am searching. It's as if the api is ignoring the name field. It doesn't matter what I populate in the name field, I still get the same results
UPDATE: Seems this is a known bug https://code.google.com/p/gmaps-api-issues/issues/detail?id=7082
maybe I am wrong, but I believe it is a commercial issue, google will show all business filtering them with a particular criteria they are no publishing the rules, for example in your search, the type you used was "store" , so they are returning to you all stores, and using the name or keyword in their own way who knows which criteria they are internally using, and there is something else, on the API description, the sample that they provide for radar search shows the name of the place in the result, but in the tests i am doing, they are not even sending the name, so you couldn't iterate those results, and filter by your own, for you to get the name, you have to do another call using:
https://maps.googleapis.com/maps/api/place/details/json?placeid=ChIJq4lX1doEdkgR5JXPstgQjc0&key=YOUR_KEY
Maybe there is another way but I don't see it.
I find the radar search is returning strange results today. It worked differently a couple of days ago.
The keyword-parameter has no effect at the moment and I have breaking integration-tests that were working before. I hope this is a temporary issue.
I filed a bug report for it: https://code.google.com/p/gmaps-api-issues/issues/detail?id=7086