RavenDB DocumentId with whitespace

Is it recommended to have whitespace in a documentId when using RavenDB? For example: Job\E15E83C2-7C00-491D-8EAE-DD8B4ED6DA77\file one.pdf
Can anyone think of any problems?

Spaces are supported, but it is easier to avoid them if you can.
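For what it's worth, here is a minimal sketch of storing such a document with the .NET client; the server URL and the Attachment class are made-up placeholders, and only the ID format comes from the question:

    using Raven.Client.Document;

    public class Attachment
    {
        public string Id { get; set; }
        public string Name { get; set; }
    }

    public class Program
    {
        public static void Main()
        {
            var store = new DocumentStore { Url = "http://localhost:8080" };
            store.Initialize();

            using (var session = store.OpenSession())
            {
                // A string Id property is used as the document ID by
                // convention; note the space in "file one.pdf".
                session.Store(new Attachment
                {
                    Id = @"Job\E15E83C2-7C00-491D-8EAE-DD8B4ED6DA77\file one.pdf",
                    Name = "file one.pdf"
                });
                session.SaveChanges();
            }
        }
    }

The main practical wrinkle is that document IDs travel in URLs, so the space goes over the wire as %20; the client should handle that encoding for you, but hand-rolled calls against the REST API would need to encode it themselves.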

Related

How to wrap each of multiple lines in quotes?

This question is the same as this one, which has an accepted answer that doesn't solve the problem. I have:
line1
line2
line3
and the desired result is:
'line1'
'line2'
'line3'
Multiple cursors do not solve the issue: for more than two lines, placing n * 2 cursors around n lines is exactly the kind of manual work I'm trying to avoid. The solution suggested in the question mentioned earlier is to write a plugin in Java to achieve the desired outcome, which I regard as overkill; I'm certain there are built-in features that can achieve the same thing.
A regex search and replace may work here if you don't need to do this too often. Enable the In Selection option (and the Regex option, of course), search for (^.*$), and replace with '$1'.
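If you ever need the same transformation outside the editor, the identical pattern works in any multiline-aware regex engine; a small sketch in C# (the input is just the example lines from the question):

    using System;
    using System.Text.RegularExpressions;

    class WrapLines
    {
        static void Main()
        {
            string text = "line1\nline2\nline3";
            // Multiline mode makes ^ and $ match at each line boundary,
            // so every line is captured and wrapped in single quotes.
            string quoted = Regex.Replace(text, "(^.*$)", "'$1'",
                                          RegexOptions.Multiline);
            Console.WriteLine(quoted);
            // 'line1'
            // 'line2'
            // 'line3'
        }
    }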

Lucene tag searching with C#: escaping problems?

I am using Lucene 2.9.2 (.NET doesn't have a Lucene 3).
"tag:C#" gets me the same results as "tag:c". How do I allow 'C#' to be a search word? I tried changing Field.Index.ANALYZED to Field.Index.NOT_ANALYZED, but that gave me no results.
I assume I need to escape each tag; how might I do that?
The problem isn't the query, it's the query analyzer you are using, which is removing the "#" from both the query and (assuming you use the same analyzer for indexing, which you should be) the field.
You will need to find an analyzer that preserves special characters like that or write a custom one.
Edit: Check out KeywordAnalyzer - it might just do the trick:
"Tokenizes" the entire stream as a single token. This is useful for data like zip codes, ids, and some product names.
According to the Java documentation for Lucene 2.9.2, '#' is not a special character that needs escaping in the query. Can you check (e.g. by opening the index with Luke) how the value 'C#' is actually stored in the index?
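If KeywordAnalyzer does the trick, a rough sketch of how it could look against the Lucene.Net 2.9 API (the RAMDirectory is just for illustration); pairing it with a TermQuery at search time also sidesteps the query parser, so nothing strips the '#':

    using Lucene.Net.Analysis;
    using Lucene.Net.Documents;
    using Lucene.Net.Index;
    using Lucene.Net.Search;
    using Lucene.Net.Store;

    // KeywordAnalyzer keeps "C#" as a single, unmodified token.
    var dir = new RAMDirectory();
    var writer = new IndexWriter(dir, new KeywordAnalyzer(), true,
                                 IndexWriter.MaxFieldLength.UNLIMITED);

    var doc = new Document();
    doc.Add(new Field("tag", "C#", Field.Store.YES, Field.Index.ANALYZED));
    writer.AddDocument(doc);
    writer.Close();

    // A TermQuery matches the indexed token exactly, '#' included.
    var searcher = new IndexSearcher(dir, true);
    var hits = searcher.Search(new TermQuery(new Term("tag", "C#")), 10);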

NOT query in Lucene

I need to do NOT queries on my Lucene index. Lucene currently allows NOT only when there are two or more terms in the query:
So I can do something like:
country:canada not sweden
but I can't run a query like:
country:not sweden
Could you please let me know if there is an efficient solution to this problem?
Thanks
A very late reply, but it might be useful for somebody else later:
*:* AND NOT country:sweden
If I'm not mistaken, this does a logical AND between all documents and the documents whose country is different from "sweden".
Try with the following query in the search box:
NOT message:"warning"
where message is the search field.
Please check the answer to a similar question. The solution is to use MatchAllDocsQuery.
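Programmatically, that amounts to pairing MatchAllDocsQuery with a prohibited clause; a sketch against the Lucene 2.9-era .NET API (field and term taken from the question):

    using Lucene.Net.Index;
    using Lucene.Net.Search;

    // "Everything except country:sweden": a lone NOT clause has nothing
    // to subtract from, so MatchAllDocsQuery supplies the base set.
    var query = new BooleanQuery();
    query.Add(new MatchAllDocsQuery(), BooleanClause.Occur.MUST);
    query.Add(new TermQuery(new Term("country", "sweden")),
              BooleanClause.Occur.MUST_NOT);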
The short answer is that this is not possible using the standard Lucene.
Lucene does not allow NOT queries with a single term for the same reason it does not allow leading-wildcard queries: to perform either, the engine would have to look through every document to ascertain whether it is a hit. It has to look through every document because it cannot use the search term as a key to look up documents in the inverted index (which maps indexed terms to documents).
To take your case as an example:
To search for not sweden, the simplest (and possibly most efficient) approach would be to search for sweden and then "invert" the result set to return all documents that are not in it. Doing this would require finding all the remaining (i.e. not in the result set) documents in the index, but without a key to look them up by. That means iterating over the documents in the index, a task it is not optimised for, so speed would suffer.
If you really need this functionality, you could maintain your own list of items when indexing, so that a not sweden search becomes a sweden search using Lucene, followed by an inversion of the results using your set of items.
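A hedged sketch of that inversion, assuming you maintain a set of all indexed keys (allKeys here is hypothetical) alongside the index:

    using System.Collections.Generic;

    // Invert a Lucene result set against a self-maintained key set:
    // everything in allKeys that did NOT match the sweden search.
    ISet<string> Invert(ISet<string> allKeys, IEnumerable<string> swedenHits)
    {
        var result = new HashSet<string>(allKeys);
        result.ExceptWith(swedenHits);
        return result;
    }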
OK, I see what you are trying to do.
You can use it as a query refinement, since there are no unary Boolean operators in Lucene. Despite the answers above, I believe this is a better and more straightforward approach (note the space before the wildcard):
&query= *&qf=-country:Canada

Does Lucene.Net support phrases? What is the best approach to tokenize comma-delimited data (atomically) in fields during indexing?

I have a database with a column I wish to index that has comma-delimited names, e.g.,
User.FullNameList = "Helen Ready, Phil Collins, Brad Paisley"
I prefer to tokenize each name atomically (name as a whole searchable entity). What is the best approach for this?
Did I miss a simple option to set the tokenizer delimiter?
Do I have to subclass or write my own class to roll my own tokenizer?
Something else? ;)
Or does Lucene.net not support phrases?
Or is it smart enough to handle this use case automatically?
I'm sure I'm not the first person who has had to do this. Googling produced no notable solutions.
EDIT: Using my example, I want to store these name phrases in a single field:
Helen Ready
Phil Collins
Brad Paisley
NOT these individual words:
Helen
Ready
Phil
Collins
Brad
Paisley
Edit:
Having read your clarification, here is hopefully a more relevant answer:
You did not miss an option to modify the separator character.
You do need to roll your own tokenizer. I suggest you subclass CharTokenizer. You need to define isTokenChar() according to your spec, meaning that anything but a comma is a token char.
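In Lucene.Net the method to override is IsTokenChar; a sketch of such a subclass (the class name is invented, and you may still want to trim the whitespace that follows each comma):

    using System.IO;
    using Lucene.Net.Analysis;

    // Treats everything except ',' as part of a token, so
    // "Helen Ready, Phil Collins" yields "Helen Ready" and
    // " Phil Collins" (note the leading space; trim as needed).
    public class CommaTokenizer : CharTokenizer
    {
        public CommaTokenizer(TextReader input) : base(input) { }

        protected override bool IsTokenChar(char c)
        {
            return c != ',';
        }
    }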
You can split the string by comma yourself, and either --
Index each name using the Keyword analyzer (non-tokenized; a sketch of this follows below)
OR index each name using the standard analyzer, and wrap your searches in quotes. Make sure to index a dummy term in between each name so that "Ready Phil" doesn't match the document
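The first option might look roughly like this (the field name is invented; the input string comes from the question):

    using Lucene.Net.Documents;

    // One non-analyzed "name" field per comma-separated entry, so each
    // full name is indexed as a single searchable term.
    var fullNameList = "Helen Ready, Phil Collins, Brad Paisley";
    var doc = new Document();
    foreach (var name in fullNameList.Split(','))
    {
        doc.Add(new Field("name", name.Trim(),
                          Field.Store.YES, Field.Index.NOT_ANALYZED));
    }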

Lucene search and underscores

When I use Luke to search my Lucene index using a standard analyzer, I can see that the field I am searching for contains values of the form MY_VALUE.
When I search for field:"MY_VALUE", however, the query is parsed as field:"my value".
Is there a simple way to escape the underscore (_) character so that it will search for it?
EDIT:
4/1/2010 11:08AM PST
I think there is a bug in the tokenizer for Lucene 2.9.1 and it was probably there before.
Load up Luke and try to search for "BB_HHH_FFFF5_SSSS"; when there is a number, the following tokens are returned:
"bb hhh_ffff5_ssss"
After some testing, I've found that this is because of the number. If I input
"BB_HHH_FFFF_SSSS", I get
"bb hhh ffff ssss"
At this point, I'm leaning towards a tokenizer bug, unless the presence of the number is supposed to cause this behavior, but I fail to see why.
Can anyone confirm this?
It doesn't look like you used the StandardAnalyzer to index that field. In Luke you'll need to select the analyzer that you used to index that field in order to match MY_VALUE correctly.
Incidentally, you might be able to match MY_VALUE by using the KeywordAnalyzer.
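If the rest of your fields still need normal tokenization, one way (sketched here against the Lucene 2.9-era API; the field name is a placeholder) is PerFieldAnalyzerWrapper:

    using Lucene.Net.Analysis;
    using Lucene.Net.Analysis.Standard;

    // StandardAnalyzer everywhere, except the one field whose values
    // (e.g. MY_VALUE) must stay a single, untouched term.
    var analyzer = new PerFieldAnalyzerWrapper(
        new StandardAnalyzer(Lucene.Net.Util.Version.LUCENE_29));
    analyzer.AddAnalyzer("field", new KeywordAnalyzer());

Use the same wrapper for both indexing and searching so the field is treated consistently on both sides.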
I don't think you'll be able to use the standard analyser for this use case.
Judging by what I think your requirements are, the keyword analyser should work fine for little effort (the whole field becomes a single term).
I think some of the confusion arises when looking at the field with luke. The stored value is not what's used by queries, what you need are the terms. I suspect that when you look at the terms stored for your field, they'll be "my" and "value".
Hope this helps,