!contains on Notes field - Rally

My app has keyword boxes for words to filter on (containing or not containing), and it applies them to a number of fields on the objects (defects, user stories and tasks), such as Notes and Description.
We just realized that if we specify a nonsense word in the exclude box, the query returns fewer results than if we did not specify the word at all. I believe that specifying a word that does not appear anywhere in the search results should return exactly the same results as omitting that criterion.
I narrowed it down to only the Notes field. The rest of the queries return the expected results.
(Notes !contains "illidan")
This is how I am using it. I am testing on the Web Services API page, so I've completely removed my app from the equation. If I search defects with only that query, I get 4512 results. If I run an empty query, I get 16526 results. But my test word does not appear anywhere in defects, as confirmed by searching (Notes contains "illidan"), which returns 0 results.
Is there something I'm missing here or is this a bug in Rally?

Simon's comment is on-target. Testing on a much smaller dataset, what I see is that !contains works as follows:
(Notes !contains "illidan")
This will return all Defects where the Notes field is non-empty AND does not contain "illidan". Thus if you have:
N Defects in total
EN Defects with an empty Notes field
zero Defects whose Notes field contains the string "illidan"
then your query will return (N - EN) Defects.
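A toy model in Python, written only to illustrate the behavior described above (it is not Rally's implementation): `!contains` is modelled as skipping records whose Notes field is empty, while the naive expectation is the plain negation of `contains`. The sample data is made up.

```python
# Toy model of the observed (Notes !contains "...") behavior; NOT Rally's code.
# Assumption: records with an empty or missing Notes field never match !contains.

def contains(notes, word):
    return notes is not None and word.lower() in notes.lower()

def not_contains_as_observed(notes, word):
    # Empty fields are dropped instead of counting as "does not contain".
    return bool(notes) and word.lower() not in notes.lower()

def not_contains_as_expected(notes, word):
    # What the question expected: the logical negation of contains.
    return not contains(notes, word)

defects = ["Crash on login", "", None, "UI glitch", ""]  # made-up Notes values
word = "illidan"

N = len(defects)
EN = sum(1 for notes in defects if not notes)
observed = [d for d in defects if not_contains_as_observed(d, word)]
expected = [d for d in defects if not_contains_as_expected(d, word)]
print(N, EN, len(observed), len(expected))  # 5 3 2 5
```

Under this model, the numbers in the question would mean 16526 - 4512 = 12014 defects with an empty Notes field.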
This is not documented in Rally's query help system, adding to the confusion. However, it is expected behavior in terms of how the query logic functions.
I'll file an enhancement request with Rally's documentation team to better document this behavior.


Is it possible to order Lucene documents by matching term?

I'm using Lucene 4.10.3 with Java 1.7
I'm wondering whether it's possible to order query results by the matching term.
Simply put, suppose my documents contain a text field and the query is
text:a*
I want documents with ab first, then ac, then ad, etc.
The real case is more complex, however. What I'm actually trying to accomplish is to "stuff" a relational DB into my Lucene index (probably not the best idea?).
An appropriate example would be:
I have documents representing books in a library. Every book has a title and also a list of people who have borrowed it, with the date of borrowing.
When a user searches for a book with a title containing "JAVA", I want to give priority to books that were borrowed by this user. This could be accomplished by adding a TextField "borrowers", adding a SHOULD clause on it and ordering by score.
Also, if there are several books with "JAVA" that this user has borrowed before, I want to show the most recently borrowed ones first. So I thought to create a TextField "borrowers" that will look like
borrowers : "user1__20150505 user2__20150506" etc.
I will add a BooleanClause borrowers: user1* and order by matching term.
Any other solution ideas will be welcome.
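A plain-Python simulation of the "borrowers" encoding from the question (no Lucene API is used). Because the date is zero-padded yyyymmdd, string order equals chronological order, so the most recent borrow is simply the largest matching term. The titles and the helper names are made up for illustration.

```python
# Simulates ordering by matching term for "<user>__<yyyymmdd>" tokens.
books = {
    "Java in Action": "user1__20150505 user2__20150506",
    "Effective Java": "user2__20150101 user1__20150610",
    "Java Puzzlers": "user3__20150301",
}

def latest_borrow(borrowers_field, user):
    """Return the most recent matching term for `user`, or None."""
    prefix = user + "__"  # include the separator so user10 does not match user1
    matches = [t for t in borrowers_field.split() if t.startswith(prefix)]
    return max(matches) if matches else None

def rank_for_user(books, user):
    borrowed, others = [], []
    for title, field in books.items():
        term = latest_borrow(field, user)
        if term is None:
            others.append(title)
        else:
            borrowed.append((term, title))
    # Most recently borrowed first, then books this user never borrowed.
    borrowed.sort(reverse=True)
    return [title for _, title in borrowed] + others

print(rank_for_user(books, "user1"))
# ['Effective Java', 'Java in Action', 'Java Puzzlers']
```

Matching on the prefix "user1__" rather than "user1" matters: a wildcard such as borrowers:user1* would also match terms like user10__20150101.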
I understand your real problem is more complex, but maybe this is helpful anyway.
You could first search the index for tokens that match your query, then, for each matching token, execute a query using that token specifically.
See https://lucene.apache.org/core/6_0_1/core/org/apache/lucene/index/TermsEnum.html for that. Just seek to the prefix and iterate until the prefix stops matching.
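A plain-Python model of that seek-and-iterate idea (no Lucene involved): because the term dictionary is sorted, seeking to the prefix is a binary search, similar in spirit to TermsEnum.seekCeil, and iteration stops at the first term that no longer starts with the prefix. The postings map is made-up data for the text:a* example from the question.

```python
import bisect

# Hypothetical term dictionary for the "text" field: term -> doc ids.
postings = {"ad": [2], "ab": [3], "ba": [5], "ac": [1, 4]}

def terms_with_prefix(sorted_terms, prefix):
    # Seek to the first term >= prefix, then walk forward while it matches.
    i = bisect.bisect_left(sorted_terms, prefix)
    while i < len(sorted_terms) and sorted_terms[i].startswith(prefix):
        yield sorted_terms[i]
        i += 1

def docs_ordered_by_term(postings, prefix):
    ordered, seen = [], set()
    for term in terms_with_prefix(sorted(postings), prefix):
        # In Lucene this step would be one TermQuery per matching term.
        for doc in postings[term]:
            if doc not in seen:
                seen.add(doc)
                ordered.append(doc)
    return ordered

print(docs_ordered_by_term(postings, "a"))  # [3, 1, 4, 2]
```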
In general it is sometimes easiest to just issue two queries: for example, one within the corpus of books the user has borrowed before and another within the whole corpus.
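A minimal sketch of the two-query approach, with a hypothetical substring search standing in for Lucene: search the user's borrowed books first, then the whole corpus, and concatenate the results while dropping duplicates.

```python
# Hypothetical stand-in for a real search: case-insensitive substring match.
def search(corpus, word):
    return [doc_id for doc_id, title in corpus.items() if word.lower() in title.lower()]

def merge_unique(*result_lists):
    """Concatenate result lists, keeping only the first occurrence of each id."""
    seen, merged = set(), []
    for results in result_lists:
        for doc_id in results:
            if doc_id not in seen:
                seen.add(doc_id)
                merged.append(doc_id)
    return merged

corpus = {1: "Java Basics", 2: "Python Tricks", 3: "Advanced Java", 4: "Java Puzzlers"}
borrowed = {3}  # books this user has borrowed before
borrowed_corpus = {i: t for i, t in corpus.items() if i in borrowed}

result = merge_unique(search(borrowed_corpus, "java"), search(corpus, "java"))
print(result)  # [3, 1, 4]
```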
If these approaches don't work for you, you could implement a custom Scorer that maps the desired ordering to a number.
See http://opensourceconnections.com/blog/2014/03/12/using-customscorequery-for-custom-solrlucene-scoring/
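One hypothetical way to "map the ordering to a number", shown in plain Python rather than as a Lucene Scorer: turn the yyyymmdd borrow date into a value in [0, 1] and add it, plus a fixed bonus, to the text relevance score. The date range and the bonus are arbitrary assumptions.

```python
from datetime import date, datetime

# Hypothetical scoring sketch, not a Lucene Scorer. START/END bound the
# borrow dates we expect; both are arbitrary assumptions.
START = date(2000, 1, 1)
END = date(2030, 1, 1)

def recency(yyyymmdd):
    """Map a yyyymmdd string to [0, 1] for dates between START and END."""
    d = datetime.strptime(yyyymmdd, "%Y%m%d").date()
    return (d - START).days / (END - START).days

def combined_score(text_score, last_borrowed=None, bonus=1.0):
    # Books this user never borrowed keep their plain text score.
    if last_borrowed is None:
        return text_score
    # Borrowed books get a fixed bonus plus a recency term, so later
    # borrows rank above earlier ones when text scores are equal.
    return text_score + bonus + recency(last_borrowed)
```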

Google Places API - RadarSearch results are confusing

I'm running a query against the Google Places RadarSearch API and don't entirely understand the results. I'm trying to find nearby Tesco supermarkets. My query is structured like this:
https://maps.googleapis.com/maps/api/place/radarsearch/xml?location=51.503186,-0.126446&types=store&keyword=tesco&name=tesco&radius=5000&key=XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
I've tried a bunch of variations of the fields types, keyword and name. None of the results are Tesco stores. Am I missing something?
The Google docs show the fields as:
keyword — A term to be matched against all content that Google has indexed for this place, including but not limited to name, type, and address, as well as customer reviews and other third-party content.
name — One or more terms to be matched against the names of places, separated by a space character. Results will be restricted to those containing the passed name values. Note that a place may have additional names associated with it, beyond its listed name. The API will try to match the passed name value against all of these names. As a result, places may be returned in the results whose listed names do not match the search term, but whose associated names do.
I always get the maximum of 200 results, which include maybe 1 or 2 Tescos. When I check on Google Maps there are 10 Tescos within the radius I am searching. It's as if the API is ignoring the name field: it doesn't matter what I put in the name field, I still get the same results.
UPDATE: Seems this is a known bug https://code.google.com/p/gmaps-api-issues/issues/detail?id=7082
Maybe I am wrong, but I believe it is a commercial issue: Google shows all businesses, filtering them by criteria whose rules they don't publish. For example, in your search the type you used was "store", so they return all stores, and they use the name or keyword in their own way; who knows which criteria they apply internally. There is something else: in the API description, the sample they provide for radar search shows the name of the place in the result, but in the tests I am doing they are not even sending the name, so you can't iterate over those results and filter them yourself. To get the name, you have to make another call using:
https://maps.googleapis.com/maps/api/place/details/json?placeid=ChIJq4lX1doEdkgR5JXPstgQjc0&key=YOUR_KEY
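A sketch of that two-step workaround: build the Radar Search request with the question's parameters, then fetch Place Details for each result and filter by name on the client side. It only builds URLs and filters dicts; no HTTP calls are made. The function names, the sample details data and the use of the json endpoint (the question used xml) are assumptions for illustration.

```python
from urllib.parse import urlencode

RADAR_URL = "https://maps.googleapis.com/maps/api/place/radarsearch/json"
DETAILS_URL = "https://maps.googleapis.com/maps/api/place/details/json"

def radar_search_url(lat, lng, radius, key, **filters):
    """Build a Radar Search URL; extra keyword args (types, keyword, name) pass through."""
    params = {"location": f"{lat},{lng}", "radius": radius, "key": key}
    params.update(filters)
    return RADAR_URL + "?" + urlencode(params)

def details_url(place_id, key):
    """Build the Place Details URL used to look up a result's name."""
    return DETAILS_URL + "?" + urlencode({"placeid": place_id, "key": key})

def filter_by_name(details, needle):
    """Keep details dicts whose "name" contains `needle` (case-insensitive)."""
    needle = needle.lower()
    return [d for d in details if needle in d.get("name", "").lower()]

url = radar_search_url(51.503186, -0.126446, 5000, "YOUR_KEY", types="store", keyword="tesco")
# Hypothetical details results, one per place_id returned by the radar search.
sample = [{"name": "Tesco Express"}, {"name": "Boots"}, {"name": "TESCO Metro"}]
print(filter_by_name(sample, "tesco"))
```

The cost of this design is one extra Details request per radar result, which adds up quickly with 200 results.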
Maybe there is another way but I don't see it.
I find the radar search is returning strange results today. It worked differently a couple of days ago.
The keyword parameter has no effect at the moment, and I have failing integration tests that were passing before. I hope this is a temporary issue.
I filed a bug report for it: https://code.google.com/p/gmaps-api-issues/issues/detail?id=7086

Freebase API - listing a city's tourist attractions by relevance

I'm trying to use Freebase to list tourist attractions for cities by relevance.
Using the Topic API, it's simple to retrieve results for a certain city using its MID (e.g. "/m/04jpl" for London)
https://www.googleapis.com/freebase/v1/topic/m/04jpl/?&filter=/travel/travel_destination/tourist_attractions
However, this returns only 10 results. The response ends with "count": 87.0. How do I get all 87? On London's Freebase page it's possible to click an "87 values total" link; effectively, I want to do the same here.
I realise I could use MQL, but I want the results to be ranked by relevance, not by timestamp. Using the Search API, it's possible to rank by freebase, entity or schema, so I'd rather use that.
First, I looked at the Search Output schema for the Search API. However, even outputting "all" didn't produce Tourist Attraction results. Using metaschema with the Search API DID work: I used "part_of" to select London. However, it only works for some locations:
https://www.googleapis.com/freebase/v1/search?limit=50&filter=(all%20type:/travel/tourist_attraction%20part_of:/m/04jpl)&indent=true
What I REALLY want is to make it work for a relatively unknown location like "Loughborough" (MID /m/01z21p). As you can see, substituting /m/01z21p for /m/04jpl produces no results:
https://www.googleapis.com/freebase/v1/search?limit=50&filter=(all%20type:/travel/tourist_attraction%20part_of:/m/01z21p)&indent=true
Looking at "Loughborough", we see that its tourist attractions, like "Loughborough Town Hall", have a "/travel/tourist_attraction/near_travel_destination" of "Loughborough". How would I compose this filter?
I want something like the following (but that actually works):
https://www.googleapis.com/freebase/v1/search?limit=50&filter=(all%20type:/travel/tourist_attraction)&filter=(/travel/tourist_attraction/near_travel_destination:/m/01z21p)&indent=true
Thanks!
I solved this problem using 2 Freebase API calls.
1) An MQL query that gets a list of all the tourist attractions for a particular MID. These results are not ranked in any useful way. I am also returning the result count to make processing a little easier later.
https://www.googleapis.com/freebase/v1/mqlread?query={"mid":"/m/04jpl","/travel/travel_destination/tourist_attractions":[{"mid":null}],"resultnumber:/travel/travel_destination/tourist_attractions":[{"return":"count"}]}
The list of returned MIDs is then used to create a new query (using a for loop). You must include all MIDs returned by the above query, so that they can all be ranked together.
2) https://www.googleapis.com/freebase/v1/search?limit=10&filter=(any%20mid:/m/0gsxw%20mid:/m/01d_0p%20mid:/m/07gyc)&scoring=entity
It's best to choose a return format that just returns MIDs, to ensure that loading times aren't extensive.
You then have a ranked list of MIDs! You'll need one final query to return whatever details you desire.
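A sketch of the two calls above in Python, building the URLs only (no network). It assumes the mqlread response wraps the query result under a "result" key, and it omits the resultnumber count clause for brevity; the function names are hypothetical.

```python
import json
from urllib.parse import quote

MQLREAD = "https://www.googleapis.com/freebase/v1/mqlread"
SEARCH = "https://www.googleapis.com/freebase/v1/search"
ATTRACTIONS = "/travel/travel_destination/tourist_attractions"

def mql_url(city_mid):
    """Step 1: ask for the MIDs of all tourist attractions of a city."""
    query = {"mid": city_mid, ATTRACTIONS: [{"mid": None}]}  # None -> null
    return MQLREAD + "?query=" + quote(json.dumps(query, separators=(",", ":")))

def mids_from_mql(response):
    # Assumed response shape: {"result": {ATTRACTIONS: [{"mid": ...}, ...]}}
    return [item["mid"] for item in response["result"][ATTRACTIONS]]

def ranked_search_url(mids, limit=10):
    """Step 2: rank all the MIDs together with an (any mid:... ) filter."""
    filter_expr = "(any " + " ".join("mid:" + m for m in mids) + ")"
    return (SEARCH + "?limit=" + str(limit)
            + "&filter=" + quote(filter_expr, safe="():/")
            + "&scoring=entity")

print(ranked_search_url(["/m/0gsxw", "/m/01d_0p", "/m/07gyc"]))
```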
I hope this has proved helpful.

How do I include other fields in a Lucene search?

Let's use emails as an example document. You have the subject, the body, the person it's from, and let's say we can also tag them (as Gmail does).
From my understanding of QueryParser, I give it ONE field and the analyzer. If a user enters text, the search only covers whatever field I set. I notice it will look in the subject or body field if I write fieldName: text to search, but how do I make a regular query such as "funny SO question unicorn" find results with some of those strings in the subject and the others in the body? At the moment, because I knew it would be easy, I made a field called ALL and combined all the other fields into it, but I would like to know how to do it properly, especially since my next app depends on text search.
Use MultiFieldQueryParser. You can specify the list of fields to be searched using the following constructor:
MultiFieldQueryParser(Version matchVersion, String[] fields, Analyzer analyzer)
This generates a query as if you had created separate queries on the different fields, which partially addresses your problem. Each unqualified term is expanded into a disjunction over all the listed fields, so a term matching in field1 and another matching in field2 can still produce a hit. What will not cross field boundaries is a phrase query: "SO question" in quotes must occur within a single field. For that case, as you rightly pointed out, you will need to combine all the fields into one single field and search in that field. Nevertheless, you will find MultiFieldQueryParser useful whenever query terms do not need to cross field boundaries.
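A toy model in plain Python (not Lucene) to compare the two options. It assumes each unqualified term is expanded into an OR over the listed fields, which is roughly the shape MultiFieldQueryParser produces for simple terms (printing the parsed Query in your Lucene version is the way to confirm), and it also shows the catch-all ALL field from the question. The field names subject and body are illustrative.

```python
# Toy model in plain Python (not Lucene); field names are illustrative.

def expand_terms(terms, fields):
    """Per-term OR over fields, written in Lucene query syntax."""
    groups = ("(" + " ".join(f"{f}:{t}" for f in fields) + ")" for t in terms)
    return " ".join(groups)

def term_hits(doc, term, fields):
    # True if `term` appears as a whole word in any listed field.
    return any(term in doc.get(f, "").lower().split() for f in fields)

def matches(doc, terms, fields, require_all=False):
    hits = [term_hits(doc, t, fields) for t in terms]
    return all(hits) if require_all else any(hits)

def catch_all(doc, fields):
    # The question's ALL field: every field concatenated into one string.
    return " ".join(doc.get(f, "") for f in fields)

email = {"subject": "Funny SO question", "body": "about a unicorn"}
fields = ["subject", "body"]
print(expand_terms(["funny", "unicorn"], fields))
# (subject:funny body:funny) (subject:unicorn body:unicorn)
print(matches(email, ["funny", "unicorn"], fields, require_all=True))  # True
```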

How would you reproduce a tagging system like the one StackOverflow uses?

I am trying to produce a tagging system for a recruitment agency model and love the way SO separates tags and searches for the remaining phrases.
How would you compare the tags in a table to the search query etc...
I have come up with the following, but it has some hiccups...
User enters search query
Full text SQL contains() search on tbl_tags
Returns 5 results
Check if each "exact tag phrase" exists in original query string.
If it does exist then add tagID to array.
Remove tag names from original search string...
Search in tbl_people for people with linked TagIDs and search text fields with remaining text.
Example search : French Project Managers with Oracle experience
Tags : [French] [Project Manager]s with [Oracle] experience
Remaining text : s with experience
Now the problem comes when I search for Project Managers: it leaves me with a surplus "s"... and there are probably other bugs in this logic that I cannot account for...
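A small Python reproduction of the surplus "s": plain substring removal leaves the plural ending behind, while a whole-word regex that also swallows an optional trailing "s" does not. Treating a trailing "s" as a plural is an assumption of this sketch; real plural handling would need stemming.

```python
import re

query = "French Project Managers with Oracle experience"
tags = ["French", "Project Manager", "Oracle"]

def naive_remove(text, tags):
    # Mirrors the question's approach: delete each tag name as a substring.
    for tag in tags:
        text = text.replace(tag, "")
    return " ".join(text.split())

def whole_word_remove(text, tags):
    # Assumption: a tag may be followed by an optional plural "s".
    for tag in tags:
        pattern = r"\b" + re.escape(tag) + r"s?\b"
        text = re.sub(pattern, "", text, flags=re.IGNORECASE)
    return " ".join(text.split())

print(naive_remove(query, tags))       # s with experience
print(whole_word_remove(query, tags))  # with experience
```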
Any ideas on how to make the logic perfect?
Thanks in advance, I understand this might be a bit of an abstract question...
You're missing a key ingredient of how StackOverflow does its search: SO requires the user to delineate the tags in the search string by explicitly putting brackets around them. The (probably simplified) logic would then be:
Extract marked tags, using a regex to detect the contents inside brackets
Using a list of the most common tags, scan the string for unmarked tags and extract them
Remove tag meta characters
Perform full-text search, filtered by tags
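A rough Python sketch of those steps, assuming a small hypothetical list of common tags and whole-word, case-insensitive matching. It returns the tags and the leftover free text; the full-text search itself is left to the caller.

```python
import re

COMMON_TAGS = ["project manager", "oracle", "french", "java"]  # hypothetical list
BRACKETED = re.compile(r"\[([^\[\]]+)\]")

def parse_search(query, common_tags=COMMON_TAGS):
    # Step 1: extract the tags the user marked with brackets.
    tags = [t.strip().lower() for t in BRACKETED.findall(query)]
    # Step 3 (done early): drop the marked tags together with their brackets.
    text = BRACKETED.sub(" ", query)
    # Step 2: scan what is left for unmarked common tags, whole words only.
    for tag in common_tags:
        pattern = re.compile(r"\b" + re.escape(tag) + r"\b", re.IGNORECASE)
        if pattern.search(text):
            if tag not in tags:
                tags.append(tag)
            text = pattern.sub(" ", text)
    # Step 4: the caller runs the full-text search on `text`, filtered by `tags`.
    return tags, " ".join(text.split())

print(parse_search("[French] project manager with oracle experience"))
# (['french', 'project manager', 'oracle'], 'with experience')
```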