RavenDB spatial search with bounding rectangle

I have documents similar to this in RavenDB:
public class MyClass
{
    ...
    public double Latitude { get; set; }
    public double Longitude { get; set; }
    ...
}
Now I want to find all documents with positions within the bounds of a rectangle specified by its N and S latitude and W and E longitude.
A simple approach would be a query like this:
.Where(o => o.Latitude <= boundaryNorth &&
            o.Latitude >= boundarySouth &&
            o.Longitude >= boundaryWest &&
            o.Longitude <= boundaryEast)
But that doesn't work if the bounding rectangle crosses the antimeridian; handling that case complicates the query (check whether west > east, split the bounding rectangle in two, and combine two of the previous expressions with ||), as sketched below.
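For illustration, a minimal sketch of that split (in Kotlin rather than C#, with a hypothetical BoundingBox class; the same logic applies to the LINQ expression above):

data class BoundingBox(val north: Double, val south: Double, val west: Double, val east: Double)

// A box crosses the antimeridian exactly when west > east; split it
// into two boxes that each lie within [-180, 180].
fun splitAtAntimeridian(box: BoundingBox): List<BoundingBox> =
    if (box.west <= box.east) listOf(box)
    else listOf(
        box.copy(east = 180.0),  // from west up to the antimeridian
        box.copy(west = -180.0)  // from the antimeridian to east
    )

fun contains(box: BoundingBox, lat: Double, lon: Double): Boolean =
    splitAtAntimeridian(box).any {
        lat in it.south..it.north && lon in it.west..it.east
    }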
I know that you can create a spatial index and then query it for all documents with positions within a specified radius from a given origin, but I haven't found any other methods of querying that index.
Is there any other way to query spatial indexes?

Currently we only allow searching for items within a specific distance of a point.
We do have other capabilities, but they aren't exposed at the present time.
I suggest taking this to the mailing list and seeing if we can provide you with the API you want.

Related

Kotlin - Can the minimum distance between a location and a list of locations be found with a one liner?

I would like the following "pseudocode" to be valid syntax (but clearly it's not):
minDistance = minOf(myLocations.forEach{return location.distanceTo(it)})
To clarify, I am trying to find the distance from the smartphone (location) to the closest location in a mutable list of locations (myLocations).
Does Kotlin allow this level of terseness, or must I break it up in a few more lines and help variables?
I believe this is what you're looking for:
minDistance = myLocations.minOf { location.distanceTo(it) }
Additional info:
If you want the location with the shortest distance instead, you can use:
myLocations.minByOrNull { location.distanceTo(it) }
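One caveat worth adding (not from the original answer, but standard Kotlin behaviour): minOf throws a NoSuchElementException when the collection is empty, so if myLocations can be empty, the null-safe variant is safer:
minDistance = myLocations.minOfOrNull { location.distanceTo(it) } // null for an empty list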

How to index 2-dimensional values like x,y position in map

I'm developing a map in Kotlin.
When I zoom in, I need to know which places are within the visible screen area (e.g. places with 123.45 < x < 124.57 and 63.2 < y < 64.5).
When I used MySQL, it calculated this very quickly, but now I have to implement the data structure myself.
For this I used two TreeMaps, one indexing x and one indexing y, and intersected them, but the performance is awful.
What can I do to index 2-dimensional values and query for all points within a range on each axis?
Edit:
// method 1: without TreeMap - a linear scan over all points
for (point in points) {
    if (point.x in startX..finishX && point.y in startY..finishY) {
        satisfiedPoints.add(point)
    }
}

// method 2: with TreeMap - range query on x, then filter on y
val satisfiedXPoints = xIndex.subMap(startX, true, finishX, true).values
for (point in satisfiedXPoints) {
    if (point.y in startY..finishY) {
        satisfiedPoints.add(point)
    }
}
I used TreeMap like this, but method 2 takes more time than method 1. Is there something wrong with iterating over the map?
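Not from the original thread, but one common approach to this kind of query is a uniform grid (spatial hash): bucket points into fixed-size cells and scan only the cells that overlap the query rectangle. A minimal sketch, assuming an in-memory point set and a hypothetical Point class:

import kotlin.math.floor

data class Point(val x: Double, val y: Double)

class GridIndex(private val cellSize: Double) {
    private val cells = HashMap<Pair<Int, Int>, MutableList<Point>>()

    private fun cellOf(v: Double) = floor(v / cellSize).toInt()

    fun insert(p: Point) {
        cells.getOrPut(cellOf(p.x) to cellOf(p.y)) { mutableListOf() }.add(p)
    }

    fun query(startX: Double, finishX: Double, startY: Double, finishY: Double): List<Point> {
        val result = mutableListOf<Point>()
        for (cx in cellOf(startX)..cellOf(finishX)) {
            for (cy in cellOf(startY)..cellOf(finishY)) {
                cells[cx to cy]?.forEach { p ->
                    // border cells may hold points outside the rectangle,
                    // so each candidate is re-checked
                    if (p.x in startX..finishX && p.y in startY..finishY) {
                        result.add(p)
                    }
                }
            }
        }
        return result
    }
}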

Getting Term Frequencies For Query

In Lucene, a query can be composed of many sub-queries. (such as TermQuery objects)
I'd like a way to iterate over the documents returned by a search, and for each document, to then iterate over the sub-queries.
For each sub-query, I'd like to get the number of times it matched. (I'm also interested in the fieldNorm, etc.)
I can get access to that data by using indexSearcher.explain, but that feels quite hacky because I would then need to parse the "description" member of each nested Explanation object to try and find the term frequency, etc. (also, calling "explain" is very slow, so I'm hoping for a faster approach)
The context here is that I'd like to experiment with re-ranking Lucene's top N search results, and to do that it's obviously helpful to extract as many "features" as possible about the matches.
From looking at the source code for classes like TermQuery, the following appears to be a basic approach:
// For each document... (scoreDoc.doc is an integer)
Weight weight = weightCache.get(query);
if (weight == null)
{
    weight = query.createWeight(indexSearcher, true);
    weightCache.put(query, weight);
}

// Find the leaf (segment) reader that holds this document and convert
// the global doc id into one relative to that segment's docBase.
IndexReaderContext context = indexReader.getContext();
List<LeafReaderContext> leafContexts = context.leaves();
int n = ReaderUtil.subIndex(scoreDoc.doc, leafContexts);
LeafReaderContext leafReaderContext = leafContexts.get(n);
Scorer scorer = weight.scorer(leafReaderContext);
int deBasedDoc = scoreDoc.doc - leafReaderContext.docBase;

// Advance the scorer to this document; if it lands exactly on it, the
// query matched and freq() gives the within-document frequency.
int thisDoc = scorer.iterator().advance(deBasedDoc);
float freq = 0;
if (thisDoc == deBasedDoc)
{
    freq = scorer.freq();
}
The 'weightCache' is a Map<Query, Weight> and is useful so that you don't have to re-create the Weight object for every document you process (otherwise, the code runs about 10x slower).
Is this approximately what I should be doing? Are there any obvious ways to make this run faster? (it takes approx 2 ms for 280 documents, as compared to about 1 ms to perform the query itself)
Another challenge with this approach is that it requires code to navigate through your Query object to find the sub-queries. For example, if it's a BooleanQuery, you call query.clauses() and recurse on them to look for all leaf TermQuery objects, etc. I'm not sure if there is a more elegant / less brittle way to do that; a sketch of the recursion follows.
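As a sketch of that recursion (written in Kotlin against the Lucene Java API; BoostQuery exists from Lucene 5.1 on, so the wrapper cases depend on your version):

import org.apache.lucene.search.BooleanQuery
import org.apache.lucene.search.BoostQuery
import org.apache.lucene.search.Query
import org.apache.lucene.search.TermQuery

// Recursively collect the leaf TermQuery objects of a query tree.
fun collectTermQueries(query: Query, out: MutableList<TermQuery>) {
    when (query) {
        is TermQuery -> out.add(query)
        is BooleanQuery -> for (clause in query.clauses()) collectTermQueries(clause.query, out)
        is BoostQuery -> collectTermQueries(query.query, out)
        // other wrappers (DisjunctionMaxQuery, etc.) would need cases of their own
    }
}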

How can I ask Lucene to do simple, flat scoring?

Let me preface by saying that I'm not using Lucene in a very common way and explain how my question makes sense. I'm using Lucene to do searches in structured records. That is, each document, that is indexed, is a set of fields with short values from a given set. Each field is analysed and stored, the analysis producing usually no more than 3 and in most cases just 1 normalised token. As an example, imagine files for each of which we store two fields: the path to the file and a user rating in 1-5. The path is tokenized with a PathHierarchyTokenizer and the rating is just stored as-is. So, if we have a document like
path: "/a/b/file.txt"
rating: 3
This document will have for its path field the tokens "/a", "/a/b" and "/a/b/file.txt", and for rating the token "3".
I wish to score this document against a query like "path:/a path:/a/b path:/a/b/different.txt rating:1" and get a value of 2, the number of terms that match.
My understanding and observation is that the score of the document depends on various term metrics and with many documents with many fields each, I most definitely am not getting simple integer scores.
Is there some way to make Lucene score documents in the outlined fashion? The queries that are run against the index are not generated by the users, but are built by the system and have an optional filter attached, meaning they all have a fixed form of several TermQuerys joined in a BooleanQuery with nothing like any fuzzy textual searches. Currently I don't have the option of replacing Lucene with something else, but suggestions are welcome for a future development.
I doubt there's something ready to use, so most probably you will need to implement your own scoring and use it when searching. For complicated cases you may want to play around with queries, but for a simple case like yours it should be enough to override DefaultSimilarity, setting the tf factor to the raw frequency (the number of occurrences of the term in the document in question) and all other components to 1. Something like this:
public class MySimilarity extends DefaultSimilarity {
    @Override
    public float computeNorm(String field, FieldInvertState state) {
        return 1;
    }

    @Override
    public float queryNorm(float sumOfSquaredWeights) {
        return 1;
    }

    @Override
    public float tf(float freq) {
        return freq;
    }

    @Override
    public float idf(int docFreq, int numDocs) {
        return 1;
    }

    @Override
    public float coord(int overlap, int maxOverlap) {
        return 1;
    }
}
(Note that tf() is the only method that returns something other than 1.)
And then just set the similarity on the IndexSearcher, as sketched below.
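For completeness, a minimal sketch of that wiring (in Kotlin; note that computeNorm is applied at index time, so the same similarity should also be set on the writer configuration when building the index):

import org.apache.lucene.index.IndexReader
import org.apache.lucene.search.IndexSearcher

// Search with the flat, frequency-only scoring defined above.
fun flatSearcher(reader: IndexReader): IndexSearcher {
    val searcher = IndexSearcher(reader)
    searcher.setSimilarity(MySimilarity())
    return searcher
}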

Lucene SpellChecker Prefer Permutations or special scoring

I'm using Lucene.NET 3.0.3
How can I modify the scoring of the SpellChecker (or queries in general) using a given function?
Specifically, I want the SpellChecker to score any results that are permutations of the searched word higher than the rest of the suggestions, but I don't know where this should be done.
I would also accept an answer explaining how to do this with a normal query. I have the function, but I don't know if it would be better to make it a query or a filter or something else.
I think the best way to go about this would be to use a customized Comparator in the SpellChecker object.
Check out the source code of the default comparator here:
http://grepcode.com/file/repo1.maven.org/maven2/org.apache.lucene/lucene-spellchecker/3.6.0/org/apache/lucene/search/spell/SuggestWordScoreComparator.java?av=f
Pretty simple stuff, should be easy to extend if you already have the algorithm you want to use to compare the two Strings.
Then you can set it up to use your comparator with SpellChecker.SetComparator.
I think I mentioned the possibility of using a Filter for this in a previous question of yours, but looking at it a bit more, I don't think that's really the right way to go.
EDIT:
Indeed, no Comparator is available in 3.0.3, so I believe you'll need to access the scoring through a StringDistance object. The Comparator would have been nicer, since the scoring has already been applied and is passed into it to do with as you please. Extending a StringDistance may be a bit less direct, since you will have to apply your rules as part of the score.
You'll probably want to extend LevensteinDistance (source code), which is the default StringDistance implementation, but of course feel free to try JaroWinklerDistance as well; I'm not really that familiar with its algorithm.
Primarily, you'll want to override getDistance and apply your scoring rules there, after getting a baseline distance from the standard (parent) implementation's getDistance call.
I would probably implement something like the following (assuming you have a helper method boolean isPermutation(String, String)):
class CustomDistance extends LevensteinDistance {
    @Override
    public float getDistance(String target, String other) {
        float distance = super.getDistance(target, other);
        if (isPermutation(target, other)) {
            // move the score halfway toward 1 for permutations
            distance = distance + (1 - distance) / 2;
        }
        return distance;
    }
}
This calculates a score half again closer to 1 for a result that is a permutation (that is, if the default algorithm gave distance = .6, this would return distance = .8, etc.). Distances returned must be between 0 and 1. My example is just one idea of a possible scoring for it, and you will likely need to tune the algorithm somewhat. I'd be cautious about simply returning 1.0 for all permutations, since that would be certain to prefer 'isews' over 'weis' when searching for 'weiss', and it would also lose the ability to rank the closeness of different permutations ('isews' and 'wiess' would be equal matches for 'weiss' in that case).
Once you have your custom StringDistance, it can be passed to the SpellChecker either through the constructor or with SpellChecker.setStringDistance.
Following femtoRgon's advice, here's what I ended up doing:
public class PermutationDistance : SpellChecker.Net.Search.Spell.StringDistance
{
    public PermutationDistance()
    {
    }

    public float GetDistance(string target, string other)
    {
        // Start from the plain Levenshtein distance, then boost it by
        // how close the two words are to being permutations of each other.
        LevenshteinDistance l = new LevenshteinDistance();
        float distance = l.GetDistance(target, other);
        distance = distance + ((1 - distance) * PermutationScore(target, other));
        return distance;
    }

    public bool IsPermutation(string a, string b)
    {
        char[] ac = a.ToLower().ToCharArray();
        char[] bc = b.ToLower().ToCharArray();
        Array.Sort(ac);
        Array.Sort(bc);
        a = new string(ac);
        b = new string(bc);
        return a == b;
    }

    // Levenshtein distance between the sorted character sequences:
    // 1.0 for true permutations, less for near-permutations.
    public float PermutationScore(string a, string b)
    {
        char[] ac = a.ToLower().ToCharArray();
        char[] bc = b.ToLower().ToCharArray();
        Array.Sort(ac);
        Array.Sort(bc);
        a = new string(ac);
        b = new string(bc);
        LevenshteinDistance l = new LevenshteinDistance();
        return l.GetDistance(a, b);
    }
}
Then:
_spellChecker.setStringDistance(new PermutationDistance());
List<string> suggestions = _spellChecker.SuggestSimilar(word, 10).ToList();