Why or() queries are not using index in Datastax DSE 5.0.x Graph? - datastax

I have created an index on User and on uuid
if I do:
schema.vertexLabel("User").describe()
I get:
schema.vertexLabel("User").index("byUuid").materialized().by("uuid").add()
When I am running:
g.V().hasLabel("User").has("uuid","oneUuid")
The index is picked up properly..
but when I do the following:
g.V().or(__.hasLabel("User").has("uuid","oneUuid"), __.hasLabel("User").has("uuid","anotherUUID"))
I am getting:
org.apache.tinkerpop.gremlin.driver.exception.ResponseException: Could
not find an index to answer query clause and graph.allow_scan is
disabled:
Thanks!

or() is not easily optimizable as you can do much more complicated things like:
g.V().or(
hasLabel("User").has("uuid","oneUuid"),
hasLabel("User").has("name","bob"))
...where one or more condition could not be answered by an index lookup. It is doable and will likely be done in future versions, but afaik none of the currently available graph databases is trying to optimize the OrStep.
Anyway, your sample query can easily be rewritten so that it actually uses the index:
g.V().hasLabel("User").has("uuid", within("oneUuid", "anotherUUID"))

Related

Difficulties using Janusgraph indexes with Gremlin UNION

I have an issue using UNION operator: in this first example the query works but gives no result:
g.V().has('name','Barack Obama').union(has('name','Michelle Obama'))
Instead in this second example the gremlin compiler replies that cannot use indexes:
g.V().union(has('name','Barack Obama'), g.V().has('name','Michelle Obama'))
Could not find a suitable index to aswer graph query and graph scans are disabled: [()]:VERTEX
Am I wrongly doing this type of query or has Janugraph some limitations?
Not sure about the error message, probably related to the fact you are trying to start a new traversal inside the union step.
I think this is the query you are trying to run:
g.V().union(has('name','Barack Obama'), has('name','Michelle Obama'))
or even better:
g.V().has('name', within('Barack Obama', 'Michelle Obama'))

Django performance issue with exclude

I have word model and phrase model
class Word(models.Model):
checked = models.BooleanField(default=False)
class Phrase(models.Model):
words = models.ManyToManyField(Word, null=True,related_name = "phrases")
Word model has attribute checked and many to many connection to phrase
I have a query, something like this:
Phrase.objects.exclude(words__checked=True).filter(words__group__pk__in = groups_ids)
But it works really really slow on big datasets, and the problem is in exclude section, cause without it - it works fast enough
So I found a suggestion that I should use raw sql here,
Performance issue with django exclude
So, how can rewrite this sentence with raw sql ?
(I need this query to both work postgres and mysql due to requirements, or I will need two queries if one query can't achieve this, postgres query has more priority)
So far I 've tried to use .extra syntax,but it didn't work, so asking it here.

Hibernate Search - possible to get new Lucene query after facets applied?

A Lucene Query is generated as so:
Query luceneQuery = builder.all().createQuery();
Then facets are applied.
I'm not sure if when facets are applied the luceneQuery is ANDed and ORed with other Querys resulting in a new Lucene Query. Alternatively, perhaps a bunch of BitSets's are applied to the original Query to refine the results. (I don't know).
If a new query is generated I'd like to retrieve it. If not, I need a rethink. That's the crux of the question.
Why:
I'm applying a faceted search on a field with multiple possible values.
E.g. TMovie.class many-to-many TTag.class (multiple-value-facet)
I'm filtering on TMovie where TTag is some value.
Anyway, the filtering works but there is a known problem whereby the Facet-counts returned are incorrect.
Detailed here: Add faceting over multivalued to application using Hibernate Search and https://forum.hibernate.org/viewtopic.php?f=9&t=1010472
I'm using this solution:
http://sujitpal.blogspot.ie/2007/04/lucene-search-within-search-with.html (see comment on new API under article)
The BitSet solution (in this example at least) generates counts based on the original Lucene Query. This works perfectly. However.....
If alternate (different, not TTags) facets are applied to the original query some complications arise.
The Bitset solution calculates on the original Lucene query. It does not calculate on the lucene query now reduced by the application of alternate Facets (a different FacetSelection) (or even TTag Facets themselves for that matter). I.e. the count calculations are irrespective of any other FacetSelection Facets applied.
So...
A. can I get the new Lucene query after facets are applied? The BitSet solution applied to this would be correct.
B. Any other alternative suggestions?
Thanks so much.. All comments welcome.
John
Regarding your first question, applying a facet is not modifying the original query, it uses a custom Collector called FacetCollector - see https://github.com/hibernate/hibernate-search/blob/master/engine/src/main/java/org/hibernate/search/query/collector/impl/FacetCollector.java. Under the hood the collector uses a Lucene FieldCache for doing the facet count. There is also the root of the limitation for multi-value faceting. FieldCache does not support multiple values per field.
Anyways, no additional queries are applied during faceting and the original query is unmodified. The benefit of course is performance. The solution you are pointing to probably works as well, but relies on running multiple queries. However, it might be a valid work around for your use case.

not query in lucene

i need to do not queries on my lucene index. Lucene currently allows not only when we have two or more terms in the query:
So I can do something like:
country:canada not sweden
but I can't run a query like:
country:not sweden
Could you please let me know if there is some efficient solution for this problem
Thanks
A very late reply, but it might be useful for somebody else later:
*:* AND NOT country:sweden
IF I'm not mistaken this should do a logical "AND" with all documents and the documents with a country that is different from "sweden".
Try with the following query in the search box:
NOT message:"warning"
message being the search field
Please check answer for similar question. The solution is to use MatchAllDocsQuery.
The short answer is that this is not possible using the standard Lucene.
Lucene does not allow NOT queries as a single term for the same reason it does not allow prefix queries - to perform either, the engine would have to look through each document to ascertain whether the document is/is not a hit. It has to look through each document because it cannot use the search term as the key to look up documents in the inverted index (used to store the indexed documents).
To take your case as an example:
To search for not sweden, the simplest (and possibly most efficient) approach would be to search for sweden and then "invert" the result set to return all documents that are not in that result set. Doing this would require finding all the required (ie. not in the result set) documents in the index, but without a key to look them up by. This would be done by iterating over the documents in the index - a task it is not optimised for, and hence speed would suffer.
If you really need this functionality, you could maintain your own list of items when indexing, so that a not sweden search becomes a sweden search using Lucene, followed by an inversion of the results using your set of items.
OK, I see what you are trying to do.
You can use it as a query refinement since there are no unary Boolean operators in Lucene. Despite the answers above, I believe this is a better and most forward approach (note the space before the wildcard):
&query= *&qf=-country:Canada

Lucene Boolean Query on Not ANalyzed Fields

Using RavenDB to do a query on Lucene Index.
This query parses okay:
X:[[a]] AND Y:[[b]] AND Z:[[c]]
However this query gives me a parse exception:
X:[[a]] AND Y:[[b]] AND Z:[[c]] AND P:[[d]]
"Lucene.Net.QueryParsers.ParseException: Cannot parse '( AND )': Encountered \" \"AND"
I tried this on complexed index and simple reproduce cases and same result it seems once you go past three ands it blows up. Im using [[]] and not analyzed because i want exact matches (also sometimes values contain whitespace etc..) and from RavenDB I have veyr little control over the indexing.
Im wondering how I can rewrite the query to avoid the parse exception?
This is now fixed in the latest RavenDB builds. See this thread for more info.
This looks rather like a bug in Lucene's QueryParser, perhaps try reporting this in the user mailing list.
As a bypass, you could create a BooleanQuery manually and add the terms you want yourself. Since they are not analyzed, and the query doesn't look too complicated, you may be better off without the query-parser.