Search an analyzed field through the stored original value in Elasticsearch - lucene

In Elasticsearch I have a field that is analyzed, and I am also storing the original value. I want to search the field by the stored value, not the analyzed value.
Is there any way to do it?
Note: I cannot make the field not_analyzed, because I am also searching the analyzed values.

Take a look at the multi fields type, which lets you index the same field twice: analyzed for full-text search and not_analyzed for exact matches.
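As a rough sketch of what this could look like, the Python helpers below build the JSON bodies for such a mapping and for the two kinds of query. The field name `title` and the sub-field name `raw` are made up for illustration. The `string` / `not_analyzed` syntax shown is the one used by Elasticsearch versions before 5.0; later versions express the same idea with a `keyword` sub-field instead.

```python
import json


def multi_field_mapping(field, raw_name="raw"):
    """Mapping fragment: analyzed main field plus a not_analyzed sub-field."""
    return {
        "properties": {
            field: {
                "type": "string",  # analyzed by default (pre-5.0 syntax)
                "fields": {
                    raw_name: {"type": "string", "index": "not_analyzed"}
                },
            }
        }
    }


def full_text_query(field, text):
    """Match query: the text is analyzed and compared with the analyzed terms."""
    return {"query": {"match": {field: text}}}


def exact_query(field, value, raw_name="raw"):
    """Term query on the sub-field: the value is compared as one exact term."""
    return {"query": {"term": {f"{field}.{raw_name}": value}}}


if __name__ == "__main__":
    print(json.dumps(multi_field_mapping("title"), indent=2))
    print(json.dumps(full_text_query("title", "right way")))
    print(json.dumps(exact_query("title", "The Right Way")))
```

With a mapping like this, the exact query only matches documents whose original value is the whole string, while the full-text query keeps working against the analyzed terms.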

Related

How to use Lucene Luke for testing search results on more than one field?

I am using Lucene Luke to test search index results and noticed that I cannot select more than one field in the 'Default field' drop-down list. Is this by design, or can the Luke tool not be used for searching against multiple fields?
Basically I would like to know the Lucene equivalent of Solr's qf (query fields) parameter.
Thanks
You can search a specific field using the format field:query.
For details see: https://lucene.apache.org/core/8_0_0/queryparser/org/apache/lucene/queryparser/classic/package-summary.html#package.description
Lucene supports fielded data. When performing a search you can either
specify a field, or use the default field. The field names and default
field is implementation specific.
You can search any field by typing the field name followed by a colon
":" and then the term you are looking for.
As an example, let's assume a Lucene index contains two fields, title
and text and text is the default field. If you want to find the
document entitled "The Right Way" which contains the text "don't go
this way", you can enter:
    title:"The Right Way" AND text:go
or
    title:"The Right Way" AND go
Since text is the default field, the field indicator is not required.
Note: The field is only valid for the term that it directly precedes,
so the query
    title:The Right Way
will only find "The" in the title field. It will find "Right" and
"Way" in the default field (in this case the text field).
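The scoping rule in the quoted documentation can be illustrated with a toy function. This is not Lucene's query parser, only a simplified sketch that assumes whitespace-separated clauses, double-quoted phrases, and the operators AND, OR and NOT; the name `assign_fields` is made up for this example.

```python
import re

# An optional "field:" prefix followed by a quoted phrase or a single word.
_CLAUSE = re.compile(r'(?:(\w+):)?("[^"]*"|\S+)')
_OPERATORS = {"AND", "OR", "NOT"}


def assign_fields(query, default_field):
    """Return (field, term) pairs, showing which field each term is searched in."""
    clauses = []
    for field, term in _CLAUSE.findall(query):
        if not field and term in _OPERATORS:
            continue  # boolean operators are not search terms
        # A term without its own prefix falls back to the default field.
        clauses.append((field or default_field, term.strip('"')))
    return clauses


if __name__ == "__main__":
    print(assign_fields('title:"The Right Way" AND go', "text"))
    print(assign_fields("title:The Right Way", "text"))
```

In the second call only "The" is paired with title, because the prefix applies to the single term that follows it; "Right" and "Way" fall back to the default field.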

Search in all columns in crate

Elasticsearch has the _all field, but I am not able to find anything equivalent in CrateDB. So do we need to maintain our own analyzed field for that purpose, or does Crate provide something built in?
The _all field is a special catch-all field which concatenates the values of all of the other fields into one big string, using space as a delimiter, which is then analyzed and indexed, but not stored. This means that it can be searched, but not retrieved.
The _all field allows you to search for values in documents without knowing which field contains the value. This makes it a useful option when getting started with a new dataset.
Reference: https://www.elastic.co/guide/en/elasticsearch/reference/current/mapping-all-field.html
We don't have something similar to that, so you'd need to add it to the query or maintain a dedicated column.

Lucene - Expected behavior when indexing multiple occurrences of a token within a field

Let's say that I'm indexing a string value "useridA;useridB,userdidC,useridA,useridA"
The field is set to ANALYZED and uses a custom CharTokenizer which looks for a boundary comma char.
What is the expected behavior in the index, given that the token "useridA" occurs multiple times within the same field?
Will it just re-index the same value and take up the same space as if there had been only one occurrence?
At the most basic level Lucene is an "inverted term index": it stores term->docID. So if a term occurs many times in a document, the term itself is only recorded once for that document.
Obviously this is a huge simplification. For an analyzed field the term frequency and the position of each occurrence are normally stored alongside the docID as well, so repeated occurrences do add a little data. This is controlled by the field's index options rather than by its TermVector setting, and the positions are what phrase and slop queries rely on.
Depending on your use case I'd suggest you de-dupe the list, either when indexing or by using a HashSet<string> for that property of whatever your class is.
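A toy inverted index makes the behavior easier to see. The sketch below is plain Python, not Lucene: it assumes a tokenizer that splits on commas only, as described in the question, and records the positions of each token per document.

```python
from collections import defaultdict


def index_field(postings, doc_id, value, delimiter=","):
    """Add one field value to a toy inverted index: term -> {doc_id: [positions]}."""
    for position, token in enumerate(value.split(delimiter)):
        postings[token].setdefault(doc_id, []).append(position)


if __name__ == "__main__":
    postings = defaultdict(dict)
    index_field(postings, 1, "useridA;useridB,userdidC,useridA,useridA")
    for term, docs in sorted(postings.items()):
        print(term, docs)
```

The repeated token appears once as a term, with one entry for the document and a list of positions. Note also that with a comma-only tokenizer the semicolon in the sample string keeps "useridA;useridB" together as a single token.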

Luke: Where are my field values?

I've used Luke like four times per year for the past three years. I only break it out when I need it. One concept I've never understood is why only certain fields' values are displayed. I can query these "empty" fields for expected values and get the expected results, but Luke never displays these. I assume I'm missing something fundamental and obvious, but it's not so obvious to me.
[Screenshot: example Search tab]
[Screenshot: example Documents tab]
When a program creates a Lucene Document, it tells Lucene for each field whether to store the value or not. See, for example, the stored argument to the StringField constructor. If the value is not stored, the field can still be searched on, but the original bytes of the value are not saved in the index, since they are neither required nor used by the search.
A typical pattern with, say, http://www.elasticsearch.org/ is to store the original JSON in a single field and not to store the individual indexed fields. That way the application working with the retrieved data can use its native data format and does not have to be aware of Lucene and its flat key-value Document.
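The difference between "indexed" and "stored" can be sketched with a toy index. This is an illustration in plain Python, not Lucene's API; the class `TinyIndex` and its methods are invented for this example.

```python
class TinyIndex:
    """Toy index that keeps search terms and stored values in separate structures."""

    def __init__(self):
        self.postings = {}  # (field, term) -> set of doc ids, used for searching
        self.stored = {}    # doc id -> {field: original value}, used for display

    def add(self, doc_id, field, value, stored):
        # Every field is indexed: its lowercased words become searchable terms.
        for term in value.lower().split():
            self.postings.setdefault((field, term), set()).add(doc_id)
        # Only fields flagged as stored keep their original value.
        if stored:
            self.stored.setdefault(doc_id, {})[field] = value

    def search(self, field, term):
        return sorted(self.postings.get((field, term.lower()), set()))

    def document(self, doc_id):
        return self.stored.get(doc_id, {})


if __name__ == "__main__":
    index = TinyIndex()
    index.add(1, "title", "The Right Way", stored=True)
    index.add(1, "body", "do not go this way", stored=False)
    print(index.search("body", "go"))  # the unstored field is still searchable
    print(index.document(1))           # but only the stored field can be shown
```

A tool that displays documents can only show what is in the stored part, which is why a field can return hits in a search and still appear empty when the document is viewed.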

Untokenized field in Lucene search

I have stored a field in the index which is untokenized. When I try to get that field's value from the index, I'm not able to get it.
Note: I have another untokenized field whose value I am able to get; the data stored in that field does not contain any whitespace.
Example: (smith,david,walter,john)... But the field I'm asking about contains whitespace. Example: (david smith,mark john,bill man)...
I don't think this might be the reason.
Your help is appreciated.
Remember that tokenization, or the lack of it, has to be applied consistently both while indexing and while searching.
Did you try using a KeywordTokenizer on the search side?
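The mismatch can be reproduced with a toy index. The sketch below is plain Python rather than Lucene: `keyword_tokens` stands in for a keyword-style tokenizer that keeps the whole value as one term, `whitespace_tokens` for a tokenizer that splits on whitespace, and query terms are combined with OR as a simplification.

```python
def keyword_tokens(value):
    """Keep the whole value as a single term (untokenized)."""
    return [value]


def whitespace_tokens(value):
    """Split the value into one term per whitespace-separated word."""
    return value.split()


def build_index(values, tokenizer):
    """Map each term to the set of document ids (list positions) containing it."""
    index = {}
    for doc_id, value in enumerate(values):
        for term in tokenizer(value):
            index.setdefault(term, set()).add(doc_id)
    return index


def search(index, query, tokenizer):
    """Return ids of documents matching any of the query's terms."""
    hits = set()
    for term in tokenizer(query):
        hits |= index.get(term, set())
    return sorted(hits)


if __name__ == "__main__":
    values = ["smith", "david", "david smith", "mark john"]
    index = build_index(values, keyword_tokens)
    print(search(index, "david smith", whitespace_tokens))
    print(search(index, "david smith", keyword_tokens))
```

When the query is split on whitespace, the terms "david" and "smith" only match the single-word documents, and the document indexed as the one term "david smith" is missed. Searching with the same keyword-style tokenization finds it, which would also explain why values without whitespace appear to work either way.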