elasticsearch exact match containing hash value - indexing

I am facing problem with elastic search, i am using query to search data from document. following is the query to search single data from document.
"query": {
"filtered": {
"query": {
"query_string": {
"query": "'.$lotnumber.'",
"fields": ["LotNumber"]
}
}
}
}
}'
It is working fine for simple value but if $lotnumber contains any value with hash in between then it is showing all the data from document.any one here who can help me to resolve problem of searching exact value from document with hash value ??

The first things that I would think of in this case is make the field lotnumber not-analyzed in your mapping. That should do the trick.
In your mapping
"album": {
"type": "string",
"fields": {
"raw": {
"type": "string",
"index": "not_analyzed"
}
}
}

Related

Can Elasticsearch make suggestions for mapping?

Playing around with Elasticsearch I added a document to my index called "pets", that looks like this:
{
"name" : "Piper",
"type" : "dog"
}
Then I added a second document:
{
"name" : "Max",
"type" : "dog",
"breed": "Scottish Terrier"
}
Now, I understand that the mapping of my "pets" index is initially created based on my first document ( unless i define a mapping at some point ). However, I am curious to know if ES can suggest a mapping based on the existing data ( like MySQL's "Propose table structure" ) or maybe update the mapping automatically.
Yes, ElasticSearch will automatically update the mapping.
Sometimes the language in the ElasticSearch documentation makes it sound like once the mapping is set, it cannot be changed. This is only true for the existing fields. Any additional fields will be automatically assigned a type and added to the mapping.
Remember you can always check the mapping of an index with the get mapping API:
http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/indices-get-mapping.html
For example, with the example you have above, after your first "pet" document the mapping is:
{
"my_index": {
"mappings": {
"pet": {
"properties": {
"name": {
"type": "string"
},
"type": {
"type": "string"
}
}
}
}
}
}
And after the second "pet" document, your mapping is:
{
"my_index": {
"mappings": {
"pet": {
"properties": {
"breed": {
"type": "string"
},
"name": {
"type": "string"
},
"type": {
"type": "string"
}
}
}
}
}
}
I'm not familiar with MySQL's propose table structure, so I can't comment on that...

How to specify an analyzer while creating an index in ElasticSearch

I'd like to specify an analyzer, name it, and use that name in a mapping while creating an index. I'm lost, my ES instance always returns me an error message.
This is, roughly, what I'd like to do:
"settings": {
"mappings": {
"alfedoc": {
"properties": {
"id": { "type": "string" },
"alfefield": { "type": "string", "analyzer": "alfeanalyzer" }
}
}
},
"analysis": {
"analyzer": {
"alfeanalyzer": {
"type": "pattern",
"pattern":"\\s+"
}
}
}
}
But this does not seem to work; the ES instance always returns me an error like
MapperParsingException[mapping [alfedoc]]; nested: MapperParsingException[Analyzer [alfeanalyzer] not found for field [alfefield]];
I tried putting the "analysis" branch of the dictionary at several places (inside the mapping etc.) but to no avail. I guess a working complete example (which I couldn't find up to now) would help me along as well. Probably I'm missing something rather basic.
"analysis" goes in the "settings" block, which goes either before or after the "mappings" block when creating an index.
"settings": {
"analysis": {
"analyzer": {
"alfeanalyzer": {
"type": "pattern",
"pattern": "\\s+"
}
}
}
},
"mappings": {
"alfedoc": { ... }
}
Here's a good complete, example: Example 1

Indexing with edge NGrams for typeahead

I'm trying to get Elasticsearch to index some documents for typeahead suggestions. As far as I can tell, the edge NGram handling in Elasticsearch is provided by Lucene underneath. Unfortunately, the documentation for Lucene in this regard is proving to be very tough for me to make sense of. The best I have come up with is based on https://gist.github.com/988923, but it doesn't seem to work (the index with these settings only returns matches on full words, as though the settings didn't exist):
{
"settings":{
"index":{
"analysis":{
"analyzer":{
"typeahead_analyzer":{
"type":"custom",
"tokenizer":"edgeNGram",
"filter":["typeahead_ngram"]
}
},
"filter":{
"typeahead_ngram":{
"type":"edgeNGram",
"min_gram":1,
"max_gram":8,
"side":"front"
}
}
}
}
}
}
I really don't know at all how analyzers, tokenizers, and filters go together - do I even want a filter? Should I just have a tokenizer? Do I have to reference these settings when I index the documents for them to be used? How can I find out what settings Lucene underneath is using for a given index? How do I debug this? Help :-)
I solved this using edgeNGram. Below are the mappings and analysis that I used to accomplish this.
{
"analysis": {
"analyzer": {
"str_search_analyzer": {
"tokenizer": "standard",
"filter": [
"lowercase"
]
},
"str_index_analyzer": {
"tokenizer": "standard",
"filter": [
"lowercase",
"substring"
]
}
},
"filter": {
"substring": {
"type": "edgeNGram",
"min_gram": 1,
"max_gram": 10,
"side": "front"
}
}
}
}
{
"index_name": {
"properties": {
"location": {
"type": "geo_point"
},
"name": {
"type": "string",
"index": "analyzed",
"search_analyzer": "str_search_analyzer",
"index_analyzer": "str_index_analyzer"
}
}
}
}
An important footnote is that I needed to use a match query with the AND operator to query against this properly.
Hope this helps.

How can I handle duplicate data in Elasticsearch?

I have used parent & child mapping to normalize data but as far as I understand there is no way to get any fields from _parent document.
Here is the mapping of my index:
{
"mappings": {
"building": {
"properties": {
"name": {
"type": "string"
}
}
},
"flat": {
"_parent": {
"type": "building"
},
"properties": {
"name": {
"type": "string"
}
}
},
"room": {
"_parent": {
"type": "flat"
},
"properties": {
"name": {
"type": "string"
},
"floor": {
"type": "long"
}
}
}
}
}
Now, I'm trying to find the best way of storing flat_name and building_name in room type. I won't query these fields but I should be able to get them when I query other fields like floor.
There will be millions of rooms and I don't have much memory so I suspect that these duplicate values may cause out of memory. For now, flat_name and building_name fields are has "index": "no" property and I turned on compression for _source field.
Do you have any efficient suggestion for avoiding duplicate values like querying multiple queries or hacky way to get fields from _parent document or denormalized data is the only way to handle this kindle of problem?

Elasticsearch/Tire text query DSL for excluding certain fields from being searched

I have a elastic search query like the following,
{
"query": {
"bool": {
"must": [
{
"query_string": {
"fields": ["title"],
"query": "test"
}
}
],
"must_not": [],
"should": []
}
},
"from": 0,
"size": 50,
"sort": [],
"facets": {}
}
I am able to execute an elastic search query on certain fields by giving a fields param to query_string as mentioned above. In my index mapping i have around 50 fields indexed. How do i query for all but one field. Something like an exclude option to query string. Is it possible with Tire/Elastic Search ?
I assumed it cannot be done and proceeded with getting all the mappings and parsing the hash which kinda sucks actually.