Relative search in Lucene (not geo-spatial search) - lucene

I only have "Europe" indexed, along with some related data. When someone searches for "Germany", nothing is indexed specifically for Germany, but logically I could return the results under Europe rather than nothing at all. Is there any way to do this? Does Lucene have any supporting libraries for it?
I don't want to use geo-spatial search, so how can this be achieved?

I think this would work out of the box with a multi-valued field. You can have an indexed field containing place information (let's call it "place") with values such as Munich, Bavaria, Germany, Europe, World or Nice, French Riviera, France, Europe, World. Then, if you are looking for something in Bavaria, just run the query:
+text:something +place:(Bavaria Germany Europe World)
Every document that has "something" in its text field and at least one of the listed place terms appears in the result set, and documents that match more of the place terms (that is, those closer to Bavaria in the hierarchy) score higher.
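For the case in the question, where only "Europe" is indexed and the user types "Germany", the search term has to be expanded to its ancestors before the query is run. Below is a minimal sketch of that expansion in plain Java, without any Lucene dependency. The class name PlaceQueryBuilder, its methods and the child-to-parent map are all hypothetical; a real application would load the hierarchy from its own data and pass the resulting string to whatever query parser it uses.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper: expands a place to its ancestors and builds the query string.
final class PlaceQueryBuilder {
    // Assumed child -> parent lookup; not part of Lucene.
    private final Map<String, String> parentOf = new HashMap<>();

    PlaceQueryBuilder addPlace(String child, String parent) {
        parentOf.put(child, parent);
        return this;
    }

    // Returns the place followed by all of its ancestors, most specific first.
    List<String> expand(String place) {
        List<String> chain = new ArrayList<>();
        String current = place;
        while (current != null && !chain.contains(current)) { // stops on cycles
            chain.add(current);
            current = parentOf.get(current);
        }
        return chain;
    }

    // Builds a query string in the same shape as the one shown above.
    String buildQuery(String text, String place) {
        StringBuilder sb = new StringBuilder("+text:").append(text).append(" +place:(");
        List<String> chain = expand(place);
        for (int i = 0; i < chain.size(); i++) {
            if (i > 0) {
                sb.append(' ');
            }
            String term = chain.get(i);
            // Multi-word places are quoted so they are treated as one phrase.
            sb.append(term.contains(" ") ? "\"" + term + "\"" : term);
        }
        return sb.append(')').toString();
    }

    public static void main(String[] args) {
        PlaceQueryBuilder builder = new PlaceQueryBuilder()
                .addPlace("Munich", "Bavaria")
                .addPlace("Bavaria", "Germany")
                .addPlace("Germany", "Europe")
                .addPlace("Europe", "World");
        // prints: +text:something +place:(Germany Europe World)
        System.out.println(builder.buildQuery("something", "Germany"));
    }
}
```

With this expansion, a search for "Germany" still matches documents whose place field contains only "Europe", which is the fallback the question asks for.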

Related

OpenRefine: after reconciliation, I want to retrieve the title of Wikidata items in other languages

In the same way I can obtain geographic coordinates for, let's say, a list of cities, I'd like to create columns in OpenRefine with the names of those cities in different languages (Venice, Venezia, Wenecja, Venedig…). Is it possible? Apparently there is no property like that in Wikidata.
You can fetch those using special property codes in OpenRefine. To fetch the label in a given language, find out the language code for it, and then prepend L to it. For instance, Lit will give you the label in Italian.
You can also do Dfr (French description) or Ade (German aliases).
This is documented at https://wikidata.reconci.link/
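As a small illustration of the naming scheme described above (prefix letter plus language code), here is a sketch in Java that generates the codes for several languages at once. The class and method names are hypothetical and are not part of OpenRefine or Wikidata; only the L, D and A prefixes come from the answer.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper that composes the special property codes described above.
final class TermCodes {
    static String label(String languageCode) {
        return "L" + languageCode;
    }

    static String description(String languageCode) {
        return "D" + languageCode;
    }

    static String aliases(String languageCode) {
        return "A" + languageCode;
    }

    // One label code per requested language, in the same order.
    static List<String> labels(List<String> languageCodes) {
        List<String> codes = new ArrayList<>();
        for (String code : languageCodes) {
            codes.add(label(code));
        }
        return codes;
    }

    public static void main(String[] args) {
        // prints: [Len, Lit, Lpl, Lde]
        System.out.println(labels(List.of("en", "it", "pl", "de")));
    }
}
```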

SQL Server - Creating a "Search library" of terms to use in a query

Firstly I apologise in advance if this question is a bit bare bones or has misleading/confusing terminology but I'm not sure how else to phrase it.
I have a few tables which capture the language of interactions based on a few different factors. What I would like to do is set up a sort of temporary library of language-based terms that I can reference in a query, so that I can search the various tables and find matches against the terms stored in the library.
I'll try and give an example:
The library might consist of the following terms:
English, German, French, Italian, Spanish
I then want to search these tables:
teacherSpokenLanguages, courseLanguages, studentLanguages
And find all the rows that contain the search terms in any particular field (and specify in which field each term was found).
I hope there's enough information to piece together my request. Is this even remotely possible? Could I create a temporary table to contain these values perhaps? I can't do anything permanent on the database, it all has to be housed within this one query and has to be non-destructive.

Neo4j 2 Cypher fuzzy search

I'm using Neo4j 2 REST API and I have the ability to add plugins.
I have an entity in my database with the label 'Entity' and name 'United Kingdom'.
How do I execute a fuzzy search to find this entity?
I would like to be able to find it using queries like
United
Kingdom
Uniter Kingdom
United Kinjdom
So a pattern like .*<query>.* won't do it.
I notice there was support for something like this in previous versions.
start n = node:index("name : 'United Kinjom'~0.2") return n
But this doesn't appear to work anymore.
It still works. Adding fulltext search to the automatic new schema indexes is on the roadmap. Until then you can still use the "legacy" indexes.
http://jexp.de/blog/2014/03/full-text-indexing-fts-in-neo4j-2-0/
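The ~ operator in the query above is Lucene's fuzzy match, which is based on edit distance. As an illustration only (this is a textbook Levenshtein distance, not Lucene's or Neo4j's implementation, and the class name is made up), the following Java sketch shows how far the misspellings from the question are from the stored name:

```java
// Illustrative edit-distance calculation; not the code Lucene uses internally.
final class EditDistance {
    // Classic dynamic-programming Levenshtein distance using two rolling rows.
    static int levenshtein(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j; // distance from the empty prefix of a
        }
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i; // distance to the empty prefix of b
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                int insertOrDelete = Math.min(curr[j - 1] + 1, prev[j] + 1);
                curr[j] = Math.min(insertOrDelete, prev[j - 1] + cost);
            }
            int[] tmp = prev;
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }

    public static void main(String[] args) {
        String stored = "United Kingdom";
        System.out.println(levenshtein("Uniter Kingdom", stored)); // 1
        System.out.println(levenshtein("United Kinjdom", stored)); // 1
        System.out.println(levenshtein("United Kinjom", stored));  // 2
        System.out.println(levenshtein("United", stored));         // 8
    }
}
```

The misspelled variants are only one or two edits away from the stored name, which is why a fuzzy query can find them. A bare "United" is eight edits away from the full name, so matching it depends on the name being tokenized into separate words, which is what a fulltext index does.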

Hibernate Search: how to query for embedded entities

I would like to use Hibernate Search to implement a sophisticated autosuggestion feature across multiple input fields on a web page.
Each input field is for its own entity, let's say Country and City. There is a many-to-one relationship between both entities
(countries contain cities).
The autosuggestion should work such that when typing e.g. a country name prefix and the city field is already filled,
you get only suggestions for countries that have such a city (and vice versa).
The server-side autosuggestion service should return a list of projections
(entityId, entityName) which are rendered into the input field (dropdown, whatever).
According to the schema and after having read the manual I tried the following index schema:
SearchMapping mapping = new SearchMapping();
mapping.analyzerDef(...
    .entity(City.class).indexed().indexName("MyIndex")
        .property("cityId", ElementType.FIELD)
            .documentId()
            .name("id")
        .property("name", ElementType.FIELD)
            .documentId()
            .name("id")
        .property("country", ElementType.METHOD)
            .indexEmbedded()
    .entity(Country.class).indexed()
        .property("id", ElementType.FIELD)
            .documentId()
            .name("id")
        .property("name", ElementType.METHOD)
            .field()
            .name("name")
This mapping defines City to be the main entity, right?
I have indexed all cities and am able to query for them (also by combining both fields). However, I only get matches when querying for cities.
i.e. when querying like
fullTextSession.getSearchFactory().buildQueryBuilder().forEntity(City.class).get();
This is not useful for the country field because when I type in "Spain", I get a single row for each city of Spain. (Spain, Spain, Spain, Spain, ... ;-))
The question is: How is it possible to search for country entities? Changing the index structure? The indexing procedure? Or how to query?
The only way I found was to set up a facet for country and use the different possible facet values as autosuggestions. However, this is also not perfect
since it is not possible to sort facets alphabetically.
Of course, in this example, I could switch both entities in the mapping, but suppose scenarios with more complex entity graphs.
UPDATE: adding queries requested in comment
For building queries, I employ the QueryBuilder. The following produces a result set like in the Spain example:
fullTextSession.getSearchFactory().buildQueryBuilder().forEntity(City.class).get();
with query:
country.name:Spain
If I try to use a query builder for countries
fullTextSession.getSearchFactory().buildQueryBuilder().forEntity(Country.class).get();
and query:
name:Spain
I get no results.
You are not showing your actual query. You don't have to use the query DSL; you can also write native Lucene queries. In both cases (DSL or native Lucene) you can combine queries via boolean logic. Embedded entities follow JavaBean property notation: in a city query, for example, the country name is reached as country.name. Again, without your actual query it is hard to give more specific feedback.
Last but not least, facets can also be sorted alphabetically. Check FacetSortOrder.FIELD_VALUE (FacetSortOrder.COUNT_DESC sorts by count instead).

Lucene exact ordering

I've had a long-term issue with not quite understanding how to implement a decent Lucene sort or ranking. Say I have a list of cities and their populations. If someone searches "new" or "london", I want the list of prefix matches ordered by population, and I have that working with a prefix search and a reversed sort on a population field, e.g. New Mexico, New York; or London, Londonderry.
However I also always want the exact matching name to be at the top. So in the case of "London" the list should show "London, London, Londonderry" where the first London is in the UK and the second London is in Connecticut, even if Londonderry has a higher population than London CT.
Does anyone have a single query solution?
dlamblin, let me see if I get this correctly: you want to make a prefix-based query, then sort the results by population, and maybe combine the sort order with a preference for exact matches.
I suggest you separate the search from the sort and use a CustomSorter for the sorting:
Here's a blog entry describing a custom sorter.
The classic Lucene book describes this well.
The API documentation for SortComparator says:
"There is a distinct Comparable for each unique term in the field - if some documents have the same term in the field, the cache array will have entries which reference the same Comparable"
You can apply a FieldSortedHitQueue to the SortComparator, which has a Comparator field for which the API says:
"Stores a comparator corresponding to each field being sorted by."
Thus the terms can be sorted accordingly.
My current solution is to create an exact searcher and a prefix searcher, both sorted by reverse population, and then copy out all my hits starting from the exact hits, moving to the prefix hits. It makes paging my results slightly more annoying than I think it should be.
Also I used a hash to eliminate duplicates but later changed the prefix searcher into a boolean query of a prefix search (MUST) with an exact search (MUST NOT), to have Lucene remove the duplicates. Though this seemed even more wasteful.
Edit: Moved to a comment (since the feature now exists): Yuval F Thank you for your blog post ... How would the sort comparator know that the name field "london" exactly matches the search term "london" if it cannot access the search term?
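The ordering asked for in the question (exact name matches first, then the remaining prefix matches by descending population) can be sketched outside Lucene as a plain in-memory comparator. This is only an illustration of the target order, not a Lucene sort implementation: the class names are hypothetical and the population figures are made-up round numbers. Unlike a field-based sort comparator, this code has access to the search term, which is the point raised in the comment above.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical in-memory ranking: exact matches first, then population descending.
final class ExactFirstRanking {
    static final class City {
        final String name;
        final String region;
        final long population;

        City(String name, String region, long population) {
            this.name = name;
            this.region = region;
            this.population = population;
        }
    }

    static List<City> rank(List<City> cities, String query) {
        String prefix = query.toLowerCase(Locale.ROOT);
        List<City> hits = new ArrayList<>();
        for (City city : cities) {
            // Prefix match, case-insensitive.
            if (city.name.toLowerCase(Locale.ROOT).startsWith(prefix)) {
                hits.add(city);
            }
        }
        hits.sort((a, b) -> {
            boolean aExact = a.name.equalsIgnoreCase(query);
            boolean bExact = b.name.equalsIgnoreCase(query);
            if (aExact != bExact) {
                return aExact ? -1 : 1; // exact matches come first
            }
            return Long.compare(b.population, a.population); // larger population first
        });
        return hits;
    }

    public static void main(String[] args) {
        // Population values are illustrative only.
        List<City> cities = List.of(
                new City("Londonderry", "UK", 85_000),
                new City("London", "CT", 27_000),
                new City("London", "UK", 8_000_000),
                new City("New York", "NY", 8_000_000));
        for (City city : rank(cities, "london")) {
            System.out.println(city.name + ", " + city.region);
        }
    }
}
```

Because the whole order comes from one comparator, paging can be done over a single sorted list instead of stitching two result sets together.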