Getting search results from wikidata website, but not API - wikipedia-api

I'm trying out the wikidata API but have some trouble with the search query "Jas 39 C Gripen". It returns results on the wikidata website, but not if I use the API.
On The wikidata website I get two search results for the query
https://www.wikidata.org/w/index.php?search=Jas+39+C+Gripen&title=Special:Search&fulltext=1
The same query using the API, does not return a result
https://www.wikidata.org/w/api.php?action=wbsearchentities&format=json&language=en&type=item&continue=0&search=Jas%2039%20C%20Gripen
Am I missing some parameters or using the wrong parameters? For many other queries I get results from the API.

It seems that the native Wikidata search applies some "fuzzy logic" when interpreting the user's entries. In your case, it shows two results, although the character C is missing in the first one.
Coming back to the API and the action you have chosen, you could use Jas 39 Gripen as search term (which will show one result) as well as Jas 39C Gripen (which will also show one result). But it seems that you can't use Jas 39 C Gripen (note the space character between 9 and C).
In other words,
https://www.wikidata.org/w/api.php?action=wbsearchentities&format=json&language=en&type=item&continue=0&search=Jas%2039%20Gripen
https://www.wikidata.org/w/api.php?action=wbsearchentities&format=json&language=en&type=item&continue=0&search=Jas%2039C%20Gripen
both work, but
https://www.wikidata.org/w/api.php?action=wbsearchentities&format=json&language=en&type=item&continue=0&search=Jas%2039%20C%20Gripen
does not.
I have investigated this issue further and finally found the solution. Try this:
https://www.wikidata.org/w/api.php?action=query&list=search&format=json&srsearch=Jas+39+C+Gripen
The action query allows some "fuzziness" in the search term. Please refer to the API documentation for further details. In short, this action performs a full text search (which you obviously want) and allows for a nearmatch search type.

The reason seems to be that the English label is JAS 39C Gripen. By removing one space from your query, you will get the result you are looking for:
https://www.wikidata.org/w/api.php?action=wbsearchentities&format=json&language=en&type=item&continue=0&search=Jas%2039C%20Gripen

Related

Ldap search for objects where attribute X contains multiple values

I would like to know if it is possible to do a search like this:
"give me all objects where description has more than 1 value"
The short answer is no. At least not from a single LDAP Query without somehow parsing the results.
I know of a tool that will provide those results however it has not been updated in a while but last time I used it, it worked.

Advanced search REST API

My requirement is to implement advanced search Rest API for searching the phones. The URI for the search API is http://myservice/api/v1/phones/search?q=${query_expression}
Where q is the complex query expression. Have the following questions
1) Since advanced search involves a lengthy query expression, the URI will not fit in a GET call. Is it alright to implement the search API via POST request and still maintain the RESTfulness?
2) I have come across the following implementations for the advanced search:
1st approach - Send the complete infix expression for the query expression.
eg.
PHONENAME STARTSWITH 'AR' AND ( PHONETYPE = '4G' OR PHONECOLOR = 'RED')
2nd approach - Constructing entire query expression in the form of a json.
eg.
{"criteria":[
{"index":1,"field":"PHONENAME","value":"AR","comparator":"STARTSWITH"},
{"index":2,"field":"PHONETYPE","value":"4G","comparator":"EQUALS"},
{"index":3,"field":"PHONECOLOR","value":"RED","comparator":"EQUALS"}
],"criteria":"( 1 AND (2 OR 3) )"}
3rd approach - Alternative way to implement the query expression as a json.
eg.
{"and":[
{"field":"PHONENAME","value":"AR","comparator":"STARTSWITH"},
"or":[
{"field":"PHONETYPE","value":"4G","comparator":"EQUALS"},
{"field":"PHONECOLOR","value":"RED","comparator":"EQUALS"}]
]}
Which approach would be considered more RESTful out of the three? Suggestions for any other approaches are welcome :)
You could follow the approach taken by ElasticSearch, which out of the examples you had given is the third one.
See https://www.elastic.co/guide/en/elasticsearch/reference/current/search.html
The third approach is also easier to understand and easier to maintain.
For example if in the future you would need to add "fuzzy" query operator and it would have a completely different model, that would be an easy thing to do.
Yes, POST is a catch-all. It's preferable to use it for resource creation, but according to the spec it may be used in this way also. However, you should consider changing the endpoint to be /search-results. This gives you the flexibility to start storing search results later, and you can return a Location header pointing to the results of a particular complex query. Another alternative is to let users POST their search criteria, and then do a GET /search-results?criteria={id}.
Don't do the second one. It's hard to read and more complex than it should be. Either the first or the third are fine. The first is more compact but probably harder to handle on the back end. For the third, you really don't need the index.

Google Places API - RadarSearch results are confusing

I'm running a query vs the Google Places RadarSearch API and don't entirely understand the results. I'm trying to find nearby Tesco Supermarkets. My query is structured like this:
https://maps.googleapis.com/maps/api/place/radarsearch/xml?location=51.503186,-0.126446&types=store&keyword=tesco&name=tesco&radius=5000&key=XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
I've tried a bunch of variations of the fields types, keyword and name. None of the results are Tesco stores. Am i missing something?
The Google docs show the fields as:
keyword — A term to be matched against all content that Google has indexed for this place, including but not limited to name, type, and address, as well as customer reviews and other third-party content.
name — One or more terms to be matched against the names of places, separated by a space character. Results will be restricted to those containing the passed name values. Note that a place may have additional names associated with it, beyond its listed name. The API will try to match the passed name value against all of these names. As a result, places may be returned in the results whose listed names do not match the search term, but whose associated names do.
I always get the maximum of 200 results which maybe includes 1 or 2 Tescos. When I check on Google maps there are 10 Tescos in the radius I am searching. It's as if the api is ignoring the name field. It doesn't matter what I populate in the name field, I still get the same results
UPDATE: Seems this is a known bug https://code.google.com/p/gmaps-api-issues/issues/detail?id=7082
maybe I am wrong, but I believe it is a commercial issue, google will show all business filtering them with a particular criteria they are no publishing the rules, for example in your search, the type you used was "store" , so they are returning to you all stores, and using the name or keyword in their own way who knows which criteria they are internally using, and there is something else, on the API description, the sample that they provide for radar search shows the name of the place in the result, but in the tests i am doing, they are not even sending the name, so you couldn't iterate those results, and filter by your own, for you to get the name, you have to do another call using:
https://maps.googleapis.com/maps/api/place/details/json?placeid=ChIJq4lX1doEdkgR5JXPstgQjc0&key=YOUR_KEY
Maybe there is another way but I don't see it.
I find the radar search is returning strange results today. It worked differently a couple of days ago.
The keyword-parameter has no effect at the moment and I have breaking integration-tests that were working before. I hope this is a temporary issue.
I filed a bug report for it: https://code.google.com/p/gmaps-api-issues/issues/detail?id=7086

Freebase API - listing a city's tourist attractions by relevance

I'm trying to use Freebase to list tourist attractions for cities by relevance.
Using the Topic API, it's simple to retrieve results for a certain city using its MID (e.g. "/m/04jpl" for London)
https:// www.googleapis.com/freebase/v1/topic/m/04jpl/?&filter=/travel/travel_destination/tourist_attractions
However, this gives a limited 10 results. The response ends with "count": 87.0". How do I get all 87? It's possible to click a "87 values total" link on London's Freebase page. Effectively, I want to do the same here.
I realise I could use MQL, but I want the results to be ranked by relevance, not by timestamp. Using the Search API, it's possible to rank by freebase, entity or schema, so I'd rather use that.
First, I looked at the Search Output schema for the Search API. However, even outputting "all" didn't produce Tourist Attraction results. Using metaschema with the Search API DID work. I used "part_of" to select London. However, it only works for some locations:
https:// www.googleapis.com/freebase/v1/search?limit=50&filter=(all%20type:/travel/tourist_attraction%20part_of:/m/04jpl)&indent=true
What I REALLY want to be able to do is make it work for a relatively unknown location like "Loughborough" (MID /m/01z21p). As you can see, substituting /m/04jpl for /m/01z21p produces no results:
https:// www.googleapis.com/freebase/v1/search?limit=50&filter=(all%20type:/travel/tourist_attraction%20part_of:/m/01z21p)&indent=true
Looking at "Loughborough", we see that its tourist attraction like "Loughborough Town Hall" has a "/travel/tourist_attraction/near_travel_destination" of "Loughborough". How would I compose this filter?
I want something like the following (that actually works):
https:// www.googleapis.com/freebase/v1/search?limit=50&filter=(all%20type:/travel/tourist_attraction)&filter=(/travel/tourist_attraction/near_travel_destination:/m/01z21p)&indent=true
Thanks!
NOTE: To enter the links into your browser you need to remove the space between the https:// and www. I would have done so, but I don't have the required permissions here yet to post more than 2 links.
I solved this problem using 2 Freebase API calls.
1) An MQL query that gets a list of all the tourist attractions for a particular MID. These results are not ranked in any useful way. I am also returning the result number to make processing a little easier later
https://www.googleapis.com/freebase/v1/mqlread?query={"mid":"/m/04jpl","/travel/travel_destination/tourist_attractions":[{"mid":null}],"resultnumber:/travel/travel_destination/tourist_attractions":[{"return":"count"}]}
The list of returned MIDs are then used to create a new query (using a for loop). You must enter all MIDs returned from the above query, so that they can all be ranked together.
2) https://www.googleapis.com/freebase/v1/search?limit=10&filter=(any%20mid:/m/0gsxw%20mid:/m/01d_0p%20mid:/m/07gyc)&scoring=entity
It's best to choose a return format that just returns MIDs, to ensure that loading times aren't extensive.
You then have a ranked list of MIDs! You'll need one final query to return whatever details you desire.
I hope this has proved helpful.

Lucene: Accessing payloads of results of a query

When I'm searching for a query in Lucene, I receive a list of documents as result. But how can I get the hits within those documents? I want to access the payload of those word, which where found by the query.
If your query contains only one term you can simply use TermPositions to access the payload of this term. But if you have a more complex query with Phrase Search, Proximity Search, ... you can't just search for the single terms in TermPositions.
I would like to receive a List<Token>, TokenStream or something similiar, which contains all the Tokens that were found by the query. Then I can iterate over the list and access the payload of each Token.
I solved my problem by using SpanQueries. Nearly every Query can be expressed as SpanQuery. A SpanQuery gives access to the spans where the hit within a document is. Because the normal QueryParser doesn`t produce SpanQueries, I had to write my own parser which only creates SpanQueries. Another option would be the SurroundParser from Lucene-Contrib, which also creates SpanQueries.
I think you'll want to start by looking at the Lucene Highlighter, as it highlights the matching terms in the document.