How can I write a RavenDB query in the Studio that finds all fields that are not empty - Lucene

I'm new to Lucene, day 1 new. I've read a tutorial on Lucene and spent a while trying to work out how to find a non-null value with it.
So I have a document called Inspect
The document has two fields I'm interested in: Inspect and Direct.
{
"Inspect": "Feather",
"Direct": {}
}
I want to find all documents where Inspect = "Feather" and Direct is not empty.
I am also interested in finding documents where Direct is also empty.
I am doing this in the RavenDB Studio, so I am using Lucene. I have tried a few things like
Inspect: Feather
And NOT
Direct: [[NULL_VALUE]]
However this doesn't seem to work. Any advice or some direction would be much appreciated.
Cheers

You need to run a query like this:
Inspect: Feather AND NOT Direct.Count: 0
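Comparing against a null value fails here because Direct is not null; it is an empty object. With .Count you are actually counting the number of properties in the object, so excluding a count of 0 leaves only the documents where Direct is not empty, which seems to be what you want. Dropping the NOT (Inspect: Feather AND Direct.Count: 0) should give you the opposite case, where Direct is empty.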
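To make the null-versus-empty distinction concrete, here is a minimal sketch in plain Java (not RavenDB code) that applies the same condition to in-memory maps. The class name, the method names, and the "Angle" property are made up for illustration; only Inspect, Direct, and "Feather" come from the question.

```java
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

final class NonEmptyDirectFilter {
    // Mirrors "Inspect: Feather AND NOT Direct.Count: 0" over in-memory documents.
    static boolean matches(Map<String, Object> doc) {
        Object direct = doc.get("Direct");
        // An empty object is not null, so a null check alone would let it through.
        boolean directHasProperties =
                direct instanceof Map && !((Map<?, ?>) direct).isEmpty();
        return "Feather".equals(doc.get("Inspect")) && directHasProperties;
    }

    static List<Map<String, Object>> filter(List<Map<String, Object>> docs) {
        return docs.stream()
                .filter(NonEmptyDirectFilter::matches)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // "Angle" is a hypothetical property, used only to make Direct non-empty.
        Map<String, Object> emptyDirect = Map.of("Inspect", "Feather", "Direct", Map.of());
        Map<String, Object> filledDirect =
                Map.of("Inspect", "Feather", "Direct", Map.of("Angle", 45));
        System.out.println(matches(emptyDirect));  // false
        System.out.println(matches(filledDirect)); // true
    }
}
```

The document with "Direct": {} passes a not-null test but fails the property-count test, which is why the [[NULL_VALUE]] attempt does not separate the two cases.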

@stacka Hi! I'm also rather new to RavenDB, but I have some ideas that may help you. First of all, use the '-' (minus) character instead of NOT. It's a convention. Second, you may find that the query cannot be run against the database when a property is not indexed, so you should create an index that includes the field you want to query against. Hope this helps.


Lucene part:123 vs part 123 vs part123

I'm not too familiar with Lucene so my apologies if this isn't clear or I'm getting my terms/nomenclature mixed up.
So I have a requirement where a field containing text (example part:123) should be able to be found via the following:
part:123
part 123
part123
Now my understanding of StandardAnalyzer is that it will break the word "part:123" into terms "part" and "123".
So with that, I'm able to search with part:123 or part 123, but because they're two different terms "part123" won't work.
It seems to me like I'd also need to get the indexer to add another term where both are combined, so it'd be "part", "123", "part123".
Is this the right way to accomplish this -- and does anyone know how I'd go about implementing this?
Thanks!
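A rough sketch of the "part", "123", "part123" idea in plain Java, without any Lucene classes: split on non-alphanumeric characters, keep each part, and append the concatenation of all parts as an extra term. The class and method names are hypothetical, and a real implementation would have to live inside the analyzer chain. Lucene's WordDelimiterFilter has catenation options that may cover this, but that has not been checked against this exact requirement.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

final class CatenatedTerms {
    // Splits on non-alphanumeric characters, lowercases, and appends
    // the concatenation of all parts when there is more than one part.
    static List<String> terms(String text) {
        List<String> parts = new ArrayList<>();
        for (String part : text.toLowerCase(Locale.ROOT).split("[^a-z0-9]+")) {
            if (!part.isEmpty()) {
                parts.add(part);
            }
        }
        List<String> result = new ArrayList<>(parts);
        if (parts.size() > 1) {
            result.add(String.join("", parts));
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(terms("part:123")); // [part, 123, part123]
        System.out.println(terms("part123"));  // [part123]
    }
}
```

With the extra term in the index, a query for part123 has something to match, while part:123 and part 123 still match through the individual terms.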

How to index only words with a minimum length using Apache Lucene 5.3.1?

Could someone give me a hint on how to index only words with a minimum length using Apache Lucene 5.3.1?
I've searched through the API but didn't find anything which suits my needs except this, but I couldn't figure out how to use that.
Thanks!
Edit:
I guess that's important info, so here's a copy of my explanation of what I want to achieve from my reply below:
"I don't intend to use queries. I want to create a source code summarization tool for which I created a doc-term matrix using Lucene. Now it also shows single- or double-character words. I want to exclude them so they don't show up in the results as they have little value for a summary. I know I could filter them when outputting the results, but that's not a clean solution imo. An even worse option would be to add all combinations of single- or double-character words to the stoplist. I am hoping there is a more elegant way than one of those."
You should use a custom Analyzer with a LengthFilter (the "length" token filter). E.g.
Analyzer ana = CustomAnalyzer.builder()
    .withTokenizer("standard")
    .addTokenFilter("standard")
    .addTokenFilter("lowercase")
    .addTokenFilter("length", "min", "4", "max", "50")
    .addTokenFilter("stop", "ignoreCase", "false", "words", "stopwords.txt", "format", "wordset")
    .build();
But it is better to use a stopword list (words that occur in almost all documents, such as articles in English). This gives a more accurate result.
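To show what the length step in that chain does to the tokens, here is a small stand-alone Java sketch that does not use Lucene at all. It assumes the min and max bounds are inclusive; the class name, method name, and sample tokens are invented for the example.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

final class MinLengthTokens {
    // Keeps only tokens whose length lies within [min, max], both inclusive.
    static List<String> keep(List<String> tokens, int min, int max) {
        return tokens.stream()
                .filter(t -> t.length() >= min && t.length() <= max)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<String> tokens = Arrays.asList("i", "db", "open", "connection");
        System.out.println(keep(tokens, 4, 50)); // [open, connection]
    }
}
```

Because the short tokens are dropped before indexing, they never reach the doc-term matrix, so nothing needs to be filtered when the results are output.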

LINQ query to filter a list returns nothing

Following a tutorial, I've built a LINQ query to filter a List of objects:
Dim Query = From FilteredItem In Items, FiltSupp In FilteredItem.Suppliers
Where ((SuppliersComboBox.SelectedItem.ToString = "All Suppliers" OrElse FiltSupp.CompanyName = SelectedSupplier) And
(String.IsNullOrEmpty(TxtItemCode.Text) OrElse FilteredItem.ItemCode.ToString.Contains(TxtItemCode.Text)) And
(String.IsNullOrEmpty(TxtName.Text) OrElse FilteredItem.Name.ToLower.Contains(TxtName.Text.ToLower)) And
(String.IsNullOrEmpty(TxtPkgCost.Text) OrElse FilteredItem.PkgCost.ToString.Contains(TxtPkgCost.Text)))
Select FilteredItem
The problem is that the query returns nothing even when the list is populated.
When debugging, the list value is correct (populated with a few items), but the query doesn't return anything. I can't figure out what is wrong, so any suggestion would be greatly appreciated!
PS: I'm using VB.NET, but answers in C# are also welcome.
To start off with, this answer is obviously not complete yet, so please hold off any downvotes until I finalize it. I have every intention of helping the OP if I can, however at this time there is no way for me to even begin helping out because the information needed is not present.
@ISAE, can you please provide the following:
- the class file for the Items object type (I'm hoping the Items being queried is a strongly typed DataTable), preferably with the designer files as well (.xsc, .xsd, and .xss)
- the forms' designer files and code-behinds
Alternatively, if the project can be put on GitHub, that would be even better. If you can't release all of that code, then I really need to know enough to somewhat reconstruct what's going on. As one of the comments pointed out, that LINQ query is really messy and it's hard to dissect right now.

TSearch2 - dots explosion

Following conversion
SELECT to_tsvector('english', 'Google.com');
returns this:
'google.com':1
Why doesn't the TSearch2 engine return something like this?
'google':2, 'com':1
Or how can I make the engine return the exploded string as I wrote above?
I just need "Google.com" to be findable by "google".
Unfortunately, there is no quick and easy solution.
Denis is correct in that the parser is recognizing it as a hostname, which is why it doesn't break it up.
There are 3 other things you can do, off the top of my head.
1. You can disable the host parsing in the database. See the PostgreSQL documentation for details. E.g. something like:
ALTER TEXT SEARCH CONFIGURATION your_parser_config DROP MAPPING FOR url, url_path
2. You can write your own custom dictionary.
3. You can pre-parse your data before it's inserted into the database in some manner (maybe splitting all domains before going into the database).
I had a similar issue to you last year and opted for solution (2), above.
My solution was to write a custom dictionary that splits words up on non-word characters. A custom dictionary is a lot easier & quicker to write than a new parser. You still have to write C tho :)
The dictionary I wrote would return something like 'www.facebook.com':4, 'com':3, 'facebook':2, 'www':1 for the 'www.facebook.com' domain (we had a unique-ish scenario, hence the 4 results instead of 3).
The trouble with a custom dictionary is that you will no longer get stemming (i.e. www.books.com will come out as www, books and com). I believe there is some work (which may have been completed) to allow chaining of dictionaries which would solve this problem.
First off in case you're not aware, tsearch2 is deprecated in favor of the built-in functionality:
http://www.postgresql.org/docs/9/static/textsearch.html
As for your actual question, google.com gets recognized as a host by the parser:
http://www.postgresql.org/docs/9.0/static/textsearch-parsers.html
If you don't want this to occur, you'll need to pre-process your text accordingly (or use a custom parser).
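As one possible shape for the pre-processing route, here is a minimal Java sketch that runs on the text before it is sent to the database. It keeps each whitespace-separated word and, when the word contains dots, appends its dot-separated parts so that both the full hostname and the pieces get indexed. The class and method names are hypothetical, and how the resulting text is positioned by to_tsvector is not shown or verified here.

```java
import java.util.ArrayList;
import java.util.List;

final class HostnameExploder {
    // Returns the original token followed by its dot-separated parts (if any).
    static List<String> explode(String token) {
        List<String> out = new ArrayList<>();
        out.add(token);
        for (String part : token.split("\\.")) {
            // Skip empty pieces and the case where the token has no dots.
            if (!part.isEmpty() && !part.equals(token)) {
                out.add(part);
            }
        }
        return out;
    }

    // Applies explode() to every whitespace-separated word in the text.
    static String preprocess(String text) {
        List<String> words = new ArrayList<>();
        for (String word : text.trim().split("\\s+")) {
            if (!word.isEmpty()) {
                words.addAll(explode(word));
            }
        }
        return String.join(" ", words);
    }

    public static void main(String[] args) {
        System.out.println(preprocess("Google.com")); // Google.com Google com
    }
}
```

The pre-processed string would then be what gets passed to to_tsvector, so "google" exists as a word of its own rather than only inside the hostname token.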

Newbie issue with LINQ in vb.net

Here is the single line from one of my functions to test if any objects in my array have a given property with a matching value
Return ((From tag In DataCache.Tags Where (tag.FldTag = strtagname) Select tag).Count = 1)
where:
DataCache.Tags is an array of custom objects
strtagname = "brazil"
and brazil is definitely a tag name stored within one of the custom objects in the array.
However the function continually returns false.
Can someone confirm whether the above should or should not work?
And if it won't work, can someone tell me the best way to test whether any of the objects in the array contain a property with a specific value?
I suppose in summary I am looking for the equivalent of a SQL EXISTS statement.
Many thanks in hope.
Your code is currently checking whether the count is exactly one.
The equivalent of EXISTS in LINQ is Any. You want something like:
Return DataCache.Tags.Any(Function(tag) tag.FldTag = strtagname)
(Miraculously it looks like that syntax may be about right... it looks like the docs examples...)
Many Thanks for the response.
Your code did not work. Then I realised that I was comparing to an array value so it would be case sensitive.
However glad I asked the question, as I found a better way than mine.
Many thanks again !
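For comparison, the same two points (an existence test instead of "count is exactly one", plus a case-insensitive comparison) can be sketched in Java, where Stream.anyMatch plays the role that Any plays in LINQ. This is a Java analogue, not VB.NET, and the class name, method names, and sample tags are made up for the example.

```java
import java.util.Arrays;
import java.util.List;

final class TagLookup {
    // Existence test: true as soon as one tag matches, ignoring case.
    static boolean exists(List<String> tagNames, String wanted) {
        return tagNames.stream().anyMatch(name -> name.equalsIgnoreCase(wanted));
    }

    // The original approach: true only when exactly one tag matches, case-sensitively.
    static boolean exactlyOne(List<String> tagNames, String wanted) {
        return tagNames.stream().filter(name -> name.equals(wanted)).count() == 1;
    }

    public static void main(String[] args) {
        List<String> tags = Arrays.asList("Brazil", "Peru");
        System.out.println(exactlyOne(tags, "brazil")); // false (case differs)
        System.out.println(exists(tags, "brazil"));     // true
    }
}
```

The count-based check also returns false when the tag appears twice, which an existence test does not.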