Selecting documents with empty strings in RavenDB Studio - ravendb

This ought to be simple enough, but I can't get a select statement going in RavenDB Studio where I say "give me all documents where field NAME_x is empty".
The usual suspects don't yield anything:
NAME_5:[[EMPTY_STRING]]
NOT NAME_5:[[EMPTY_STRING]] (just to see if the logic was backwards somehow)
NAME_5: [[NULL_VALUE]]
The index DOES include all fields (NAME_0 through NAME_5) and there ARE plenty of records that have this
(...)
"NAME_5": "",
(...)
so my quesion is - why don't they show up ? I guess the query's syntax is wrong, but I don't know where/how.
Thanks for a hint !

This should just work.
See: http://live-test.ravendb.net/studio/index.html#databases/query/index/recentquery--187795092?&database=44438301
Here is what I get when I try.

Related

Why does this choose statement not work in an Access criteria?

I really don't know what I did wrong...I was following advice from a blog post stating that this code would allow me to keep Access from breaking up my criteria (I have a ton of criteria and it was making this statement into four separate lines and adding columns.) Here's my code right now.
Choose(1,(([dbo_customerQuery].[store])>=[forms]![TransactionsForm]![txtStoreFrom] Or
[forms]![TransactionsForm]![txtStoreFrom] Is Null) And (([dbo_customerQuery].[store])
<=[forms]![TransactionsForm]![txtStoreTo] Or [forms]![TransactionsForm]![txtStoreTo]
Is Null))
The statement inside of the choose is definitely correct so am I using "Choose" wrong? I don't get it, the blog post used it exactly this way. When I execute queries, no matter what those fields do, I end up getting no results. The query is supposed to filter based on a date range, taking null values into account
My concern is that you are trying to work around a bad design. You may get this immediate issue solved to some degree, and continue to build the bad design. Access is flexible, and forgiving, but there's a big price eventually -- maybe you're already there.
I realize this is not an answer. It may seem rude -- I apologize. But I think the general advice may help you. I'll tag this "community wiki" since I'm not contributing to a programming solution.
It might be the placing of the parens, try this:
Choose(1,(([dbo_customerQuery].[store]>=[forms]![TransactionsForm]![txtStoreFrom]) Or
[forms]![TransactionsForm]![txtStoreFrom] Is Null) And (([dbo_customerQuery].[store]
<=[forms]![TransactionsForm]![txtStoreTo]) Or [forms]![TransactionsForm]![txtStoreTo]
Is Null))
I have moved two closing parentheses.
I have found what it was now. My statement
Choose(1,(([dbo_customerQuery].[store])>=[forms]![TransactionsForm]![txtStoreFrom] Or
[forms]![TransactionsForm]![txtStoreFrom] Is Null) And (([dbo_customerQuery].[store])
<=[forms]![TransactionsForm]![txtStoreTo] Or [forms]![TransactionsForm]![txtStoreTo]
Is Null))
was correct, the problem was I assumed it would work as a criteria, but it actually had to be done exactly as in the blog post posted above. It had to be posted directly as the FIELD, with "<> False" being the criteria.
Once done, it did stay on one line, and it worked just as expected.

Lucene query fails with mixed MUST/MUST_NOT

Given a document with this text, indexed in a field named Content:
The dish ran away with the spoon.
The following query fails to match that document:
+Content:dish +(-Content:xyz) <-- no results!
I want the query to be treated as must include "dish", must not include "xyz". It's the "must not" part that is failing.
I know the +- combination looks funny but syntactically it should be correct, especially considering that the following variations all work:
+Content:dish +(-Content:xyz +Content:spoon) <-- this works
+Content:dish -Content:xyz <-- this works
So why doesn't +(-Content:xyz) work? Is that by design, or a bug, or am I just missing something? I'm using Lucene.Net but I assume regular Lucene behaves the same.
Lucene doesn't start with a full view of everything, like a SQL database. Lucene starts with no documents matched, and finds things based on the clauses searched on. This is why:
-Content:xyz
On it's own doesn't really work. It knows not to bring in content:xyz, but hasn't been given any documents to match. The same is true of your query, because it's placed in a subquery.
-Content:xyz is evaluated first, which gets no docs on it's own. So then you have, effectively
+Content:dish +(no documents)
It's useful to think of - as an AND NOT rather than simply a NOT (though don't take that to imply the +/- and AND/OR/NOT syntax necessarily map to each other directly).
If you want to be able to execute a lonely negative query like that, you need to bring in all documents first. The MatchAllDocsQuery is the best way to accomplish that, something like:
BooleanQuery query = new BooleanQuery();
query.add(new BooleanClause(new MatchAllDocsQuery(), BooleanClause.Occur.SHOULD));
query.add(new BooleanClause(new TermQuery(new Term("Content","xyz")), BooleanClause.Occur.MUST_NOT));
Would be the equivalent of a SQL style query with only a negation for a WHERE clause.
Of course, this isn't really necessary in the case you've listed since:
+Content:dish -Content:xyz
Is perfectly adequate.

Lucene Boolean Query on Not ANalyzed Fields

Using RavenDB to do a query on Lucene Index.
This query parses okay:
X:[[a]] AND Y:[[b]] AND Z:[[c]]
However this query gives me a parse exception:
X:[[a]] AND Y:[[b]] AND Z:[[c]] AND P:[[d]]
"Lucene.Net.QueryParsers.ParseException: Cannot parse '( AND )': Encountered \" \"AND"
I tried this on complexed index and simple reproduce cases and same result it seems once you go past three ands it blows up. Im using [[]] and not analyzed because i want exact matches (also sometimes values contain whitespace etc..) and from RavenDB I have veyr little control over the indexing.
Im wondering how I can rewrite the query to avoid the parse exception?
This is now fixed in the latest RavenDB builds. See this thread for more info.
This looks rather like a bug in Lucene's QueryParser, perhaps try reporting this in the user mailing list.
As a bypass, you could create a BooleanQuery manually and add the terms you want yourself. Since they are not analyzed, and the query doesn't look too complicated, you may be better off without the query-parser.

Lucene Index and Query Design Question - Searching People

I have recently just started working with Lucene (specifically, Lucene.Net) and have successfully created several indicies and have no problem with any of them. Previously having worked with Endeca, I find that Lucene is lightweight, powerful, and has a much lower learning curve (due mostly to a concise API).
However, I have one specific index/query situation which I am having problems wrapping my head around. What I have is a person directory. People can be searched for in this application, with the goal of returning both exact and approximate matches. Right now, in the index I concatenate the "FirstName" and "LastName" into a single field called "FullName", adding a space between the two. So FirstName:Jon with LastName:Smith yield FullName:Jon Smith. I do anticipate the possibility of middle names and possibly suffix, but that is not important at the moment.
I would like to do the equivalent of a fuzzy search on the name, so someone searching for "John Smith" would still get back "Jon Smith". I had thought about a multisearch, however, this becomes more involved if his name was actually "Jon Del Carmen" or "Jon Paul Del Carmen". I have nothing in what the user types in to delineate the first name or last name pieces.
The only thought that I have is that I could replace spaces in the concatenated value with a character that would not be discarded. If I did this when I built the document for the index and also when I parsed the query, I could treat it as one larger word, right? Is there another way to do this that would work for both simple names ("Jon Smith") and also more complex names ("Jon Paul Del Carmen")?
Any advice would truly be appreciated. Thanks in advance!
Edit: Additional detail follows.
In Luke, I put in the following query:
FullName:jonn smith~
It is being parsed as:
FullName:jonn CreatedOn:smith~0.5
With an Explanation of:
BooleanQuery:boost=1.0000
clauses=2, maxClauses=1024
Clause 0: SHOULD
TermQuery:boost=1.0000
Term: field='FullName' text='jonn'
Cluase 1: SHOULD
FuzzyQuery: boost=1.0000
prefixLen=0, minSimilarity=0.5000
org.apache.lucene.search.FuzzyTermEnum: diff=-1.0000
FilteredTermEnum: Exception null
"CreatedOn" is another Field in the index. I tried putting quotes around the term "jonn smith", but it then treats it like a phrasequery, instead. I am sure that the problem is that I am just not doing something right, but being so green at all of this, I am not sure what that something truly is.
My problem was with how I was building the index. What I ended up doing was making sure that it was not tokenizing the FullName, and the query started returning the correct results. The Explain results from above were due to an ID10T error on my part and is now returning correctly.

Prevent "Too Many Clauses" on lucene query

In my tests I suddenly bumped into a Too Many Clauses exception when trying to get the hits from a boolean query that consisted of a termquery and a wildcard query.
I searched around the net and on the found resources they suggest to increase the BooleanQuery.SetMaxClauseCount().
This sounds fishy to me.. To what should I up it? How can I rely that this new magic number will be sufficient for my query? How far can I increment this number before all hell breaks loose?
In general I feel this is not a solution. There must be a deeper problem..
The query was +{+companyName:mercedes +paintCode:a*} and the index has ~2.5M documents.
the paintCode:a* part of the query is a prefix query for any paintCode beginning with an "a". Is that what you're aiming for?
Lucene expands prefix queries into a boolean query containing all the possible terms that match the prefix. In your case, apparently there are more than 1024 possible paintCodes that begin with an "a".
If it sounds to you like prefix queries are useless, you're not far from the truth.
I would suggest you change your indexing scheme to avoid using a Prefix Query. I'm not sure what you're trying to accomplish with your example, but if you want to search for paint codes by first letter, make a paintCodeFirstLetter field and search by that field.
ADDED
If you're desperate, and are willing to accept partial results, you can build your own Lucene version from source. You need to make changes to the files PrefixQuery.java and MultiTermQuery.java, both under org/apache/lucene/search. In the rewrite method of both classes, change the line
query.add(tq, BooleanClause.Occur.SHOULD); // add to query
to
try {
query.add(tq, BooleanClause.Occur.SHOULD); // add to query
} catch (TooManyClauses e) {
break;
}
I did this for my own project and it works.
If you really don't like the idea of changing Lucene, you could write your own PrefixQuery variant and your own QueryParser, but I don't think it's much better.
It seems like you are using this on a field that is sort of a Keyword type (meaning there will not be multiple tokens in your data source field).
There is a suggestion here that seems pretty elegant to me: http://grokbase.com/t/lucene.apache.org/java-user/2007/11/substring-indexing-to-avoid-toomanyclauses-exception/12f7s7kzp2emktbn66tdmfpcxfya
The basic idea is to break down your term into multiple fields with increasing length until you are pretty sure you will not hit the clause limit.
Example:
Imagine a paintCode like this:
"a4c2d3"
When indexing this value, you create the following field values in your document:
[paintCode]: "a4c2d3"
[paintCode1n]: "a"
[paintCode2n]: "a4"
[paintCode3n]: "a4c"
By the time you query, the number of characters in your term decide which field to search on. This means that you will perform a prefix query only for terms with more of 3 characters, which greatly decreases the internal result count, preventing the infamous TooManyBooleanClausesException. Apparently this also speeds up the searching process.
You can easily automate a process that breaks down the terms automatically and fills the documents with values according to a name scheme during indexing.
Some issues may arise if you have multiple tokens for each field. You can find more details in the article