I have the following problem:
I have an index of 30 million documents with the following mapping:
curl -XPUT localhost:8080/xxxxx/yyyyy/_mapping?pretty=true -d '{
  "xxxxx": {
    "_id": {"type": "string", "index": "not_analyzed"},
    "properties": {
      "content": {"type": "string", "store": "no"},
      "title": {"type": "string", "index": "no"},
      "created_date": {"type": "integer", "index": "not_analyzed"},
      "url": {"type": "string", "index": "not_analyzed"},
      "author": {"type": "string", "index": "no"},
      "author_url": {"type": "string", "index": "no"},
      "domain": {"type": "string", "index": "not_analyzed"},
      "lang": {"type": "string", "index": "no"}
    }
  }
}'
No tokenizer is specified in the index settings, so the standard one applies.
I would like to run a facets request on the content field to build a ranking of links (URLs). Unfortunately I cannot do that, because the standard tokenizer splits links (URLs) into pieces.
Question:
Can I change the tokenizer of an existing index without reindexing, so that new documents added to the index are handled by the new tokenizer (uax_url_email) while old documents remain unchanged?
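The difference between the two tokenizers can be illustrated with the _analyze API (the sample text here is made up):
# the standard tokenizer breaks the URL into pieces
curl 'localhost:8080/_analyze?tokenizer=standard&pretty=true' -d 'see http://example.com/some/page'
# uax_url_email keeps the whole URL as a single token
curl 'localhost:8080/_analyze?tokenizer=uax_url_email&pretty=true' -d 'see http://example.com/some/page'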
I tried this:
curl -XPUT localhost:8080/xxxxx -d '{
  "settings": {
    "index": {
      "analysis": {
        "analyzer": {
          "default": {
            "type": "custom",
            "tokenizer": "uax_url_email",
            "filter": "lowercase"
          }
        }
      }
    }
  }
}'
but I get an error:
{"error": "IndexAlreadyExistsException [[xxxxx] Already exists]", "status": 400}
Is there another way, without reindexing, to run a facets query that builds a ranking of links (URLs)?
Thank you in advance for any help.
Try the following for the existing index "xxxxx":
curl -XPUT localhost:8080/xxxxx/_settings -d '{
  "analysis": {
    "analyzer": {
      "default": {
        "type": "custom",
        "tokenizer": "uax_url_email",
        "filter": "lowercase"
      }
    }
  }
}'
Be sure your Elasticsearch port really is 8080; by default it is 9200.
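Note that analysis settings can usually only be updated on a closed index, so you may need to close the index first and reopen it after the change (a sketch, using the same host and port):
curl -XPOST localhost:8080/xxxxx/_close
# ...run the _settings update above...
curl -XPOST localhost:8080/xxxxx/_open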
I'm trying to create a new content page through the Confluence REST API.
Any page created with the REST API shows up in the old editor view. I have tried the editor2 option instead of storage, and I also tried with metadata (Example), but no luck.
Is there any solution by which I can create a page in the new editor (v2) through the Confluence REST API?
Found this example on Atlassian.net and it works for me:
curl -u 'user:apitoken' -H 'content-type: application/json' \
'https://hello.atlassian.net/wiki/rest/api/content' \
-d '{
"type": "page",
"title": "Page created via API",
"space": {
"key": "TEST"
},
"body": {
"storage": {
"value": "<h1>Test page</h1>",
"representation": "storage"
}
},
"metadata": {
"properties": {
"editor": {
"value": "v2"
}
}
}
}'
I'm new to Elasticsearch, and I'm having a hard time with the analyzers.
I am creating an index like this (to replicate my problem, you can copy and paste the following code into your console directly).
Please read the comments in the script for my problem and questions.
#!/bin/bash
# fails if the index doesn't exist but that's OK
curl -XDELETE 'http://localhost:9200/movies/'
# creating the index that will allow type wrapper, and generate _id automatically from the path
curl -XPOST http://localhost:9200/movies -d '{
"settings" : {
"number_of_shards" : 1,
"mapping.allow_type_wrapper" : true,
"analysis": {
"analyzer": {
"en_std": {
"type":"standard",
"stopwords": "_english_"
}
}
}
},
"mappings" : {
"movie" : {
"_id" : {
"path" : "movie.id"
}
}
}
}'
# inserting some data
curl -XPOST http://localhost:9200/movies/movie -d '{
"movie" : {
"id" : 101,
"title" : "Bat Man",
"starring" : {
"firstname" : "Christian",
"lastname" : "Bale"
}
}
}'
#trying to get by ID ... \m/ works!!!
curl -XGET http://localhost:9200/movies/movie/101
# trying to search using query_string ... \m/ works
curl -XPOST http://localhost:9200/movies/movie/_search -d '{
"query" : {
"query_string" : {
"query" : "bat"
}
}
}'
# when I try to search in a particular field, it fails and returns 0 hits
curl -XPOST http://localhost:9200/movies/_search -d '{
"query" : {
"query_string" : {
"query" : "bat",
"fields" : ["movie.title"]
}
}
}'
# I thought the analyzer was the problem, so I checked.
curl 'http://localhost:9200/movies/movie/_search?pretty=true' -d '{
"query" : {
"query_string" : {
"query" : "bat"
}
},
"script_fields": {
"terms" : {
"script": "doc[field].values",
"params": {
"field": "movie.title"
}
}
}
}'
# The field wasn't analyzed.
# the following is the result:
#{
# "took" : 1,
# "timed_out" : false,
# "_shards" : {
# "total" : 1,
# "successful" : 1,
# "failed" : 0
# },
# "hits" : {
# "total" : 1,
# "max_score" : 0.13424811,
# "hits" : [ {
# "_index" : "movies",
# "_type" : "movie",
# "_id" : "101",
# "_score" : 0.13424811,
# "fields" : {
# "terms" : [ "Bat Man" ]
# }
# } ]
# }
#}
# So I even tried the exact term as-is... Nope, didn't work :( 0 hits.
curl -XPOST http://localhost:9200/movies/_search -d '{
"query" : {
"query_string" : {
"query" : "Bat Man",
"fields" : ["movie.title"]
}
}
}'
Can anyone point out what I'm doing wrong?
You should insert a sleep 1 command right after inserting the doc, and everything will work.
Elasticsearch provides search in near real-time (read this). Every time you index a document, the Lucene index is not updated (refreshed, in Elasticsearch terms) immediately. How frequently your index is refreshed is configurable at the index level. You can also force a refresh by passing the query parameter refresh=true with every HTTP indexing request, which makes ES update the index immediately, but depending on your requirements this may start to hurt performance.
There is a Refresh API as well.
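For example, with the movies index from the question (a sketch; the document below is just illustrative):
# refresh as part of the indexing request itself
curl -XPOST 'http://localhost:9200/movies/movie?refresh=true' -d '{
  "movie" : {
    "id" : 102,
    "title" : "Bat Man Returns"
  }
}'
# or force a refresh of the whole index via the Refresh API
curl -XPOST 'http://localhost:9200/movies/_refresh'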
I am facing a problem with Elasticsearch. I am using a query to search data in a document; the following is the query to search for a single value:
"query": {
"filtered": {
"query": {
"query_string": {
"query": "'.$lotnumber.'",
"fields": ["LotNumber"]
}
}
}
}
}'
It works fine for simple values, but if $lotnumber contains a value with a hash (#) in it, it returns all the data in the document. Can anyone here help me resolve the problem of searching for an exact value that contains a hash?
The first thing I would think of in this case is to make the LotNumber field not_analyzed in your mapping. That should do the trick.
In your mapping:
"album": {
"type": "string",
"fields": {
"raw": {
"type": "string",
"index": "not_analyzed"
}
}
}
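With a multi-field like this (album is just a placeholder name; for this question the field would be LotNumber), you then search against the raw sub-field, which holds the exact untokenized value. A sketch, with a made-up index name and lot number:
curl -XPOST 'http://localhost:9200/myindex/_search' -d '{
  "query": {
    "filtered": {
      "filter": {
        "term": {
          "LotNumber.raw": "AB#1234"
        }
      }
    }
  }
}'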
I read a couple of similar problems on SO, but the suggested solutions did not work.
I want to find all documents where the word field is shorter than 8 characters.
My database screen: [screenshot not reproduced]
I tried to do this using the following query:
{
"query": {
"match_all": {}
},
"filter": {
"script": {
"script": "doc['word'].length < 5"
}
}
}
What am I doing wrong? Am I missing something?
Any field used in a script is loaded entirely into memory (http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/modules-scripting.html#_document_fields), so you may want to consider an alternative approach.
You can, for example, use the regexp filter to find only terms of a certain length, with a pattern like .{0,4}.
Here's a runnable example you can play with: https://www.found.no/play/gist/2dcac474797b0b2b952a
#!/bin/bash
export ELASTICSEARCH_ENDPOINT="http://localhost:9200"
# Index documents
curl -XPOST "$ELASTICSEARCH_ENDPOINT/_bulk?refresh=true" -d '
{"index":{"_index":"play","_type":"type"}}
{"word":"bar"}
{"index":{"_index":"play","_type":"type"}}
{"word":"barf"}
{"index":{"_index":"play","_type":"type"}}
{"word":"zip"}
'
# Do searches
# This will not match barf
curl -XPOST "$ELASTICSEARCH_ENDPOINT/_search?pretty" -d '
{
"query": {
"filtered": {
"filter": {
"regexp": {
"word": {
"value": ".{0,3}"
}
}
}
}
}
}
'
I have a document like the one below. The tags field is a nested document, and I want every child field of tags to be index: not_analyzed. The problem is that the fields in tags are dynamic; any tag is possible.
So how can I define a dynamic mapping for this?
{
'level': 'info',
'tags': {
'content': u'Nov 6 11:07:10 ja10 Keepalived_healthcheckers: Adding service [172.16.08.105:80] to VS [172.16.1.21:80]',
'id': 1755360087,
'kid': '2012121316',
'mailto': 'yanping3,chunying,pengjie',
'route': 15,
'service': 'LVS',
'subject': 'LVS_RS',
'upgrade': 'no upgrade configuration for this alert'
},
'timestamp': 1383707282.500464
}
I think you can use dynamic templates for this. For example, the following shell script creates a dynamic_mapping_test index with a dynamic template: when a field matching tags.* is indexed, its mapping is set to type: string and index: not_analyzed.
echo "Delete dynamic_mapping_test"
curl -s -X DELETE http://localhost:9200/dynamic_mapping_test?pretty ; echo ""
echo "Create dynamic_mapping_test with nested tags and dynamic_template"
curl -s -X POST http://localhost:9200/dynamic_mapping_test?pretty -d '{
"mappings": {
"document": {
"dynamic_templates": [
{
"string_template": {
"path_match": "tags.*",
"mapping": {
"type": "string",
"index": "not_analyzed"
}
}
}
],
"properties": {
"tags": {
"type": "nested"
}
}
}
}
}' ; echo ""
echo "Display mapping"
curl -s "http://localhost:9200/dynamic_mapping_test/_mapping?pretty" ; echo ""
echo "Index document with new property tags.content"
curl -s -X POST "http://localhost:9200/dynamic_mapping_test/document?pretty" -d '{
"tags": {
"content": "this CONTENT should not be analyzed"
}
}' ; echo ""
echo "Refresh index"
curl -s -X POST "http://localhost:9200/dynamic_mapping_test/_refresh"
echo "Display mapping again"
curl -s "http://localhost:9200/dynamic_mapping_test/_mapping?pretty" ; echo ""
echo "Index document with new property tags.title"
curl -s -X POST "http://localhost:9200/dynamic_mapping_test/document?pretty" -d '{
"tags": {
"title": "this TITLE should not be analyzed"
}
}' ; echo ""
echo "Refresh index"
curl -s -X POST "http://localhost:9200/dynamic_mapping_test/_refresh"; echo ""
echo "Display mapping again"
curl -s "http://localhost:9200/dynamic_mapping_test/_mapping?pretty" ; echo ""
I suggest making all strings not_analyzed, and mapping all numbers to long (also not_analyzed).
Analyzed strings (the default) take more memory and disk space.
This way I have reduced the index size, I can still search fields by the full word, and I can do range searches on the long type.
{
"mappings": {
"_default_": {
"_source": {
"enabled": true
},
"_all": {
"enabled": false
},
"_type": {
"index": "no",
"store": false
},
"dynamic_templates": [
{
"el": {
"match": "*",
"match_mapping_type": "long",
"mapping": {
"type": "long",
"index": "not_analyzed"
}
}
},
{
"es": {
"match": "*",
"match_mapping_type": "string",
"mapping": {
"type": "string",
"index": "not_analyzed"
}
}
}
]
}
}
}
I don't think there is any way to specify a mapping while indexing the data. So, as an alternative, you can restructure your tags document to use the following mapping:
{
  "tags": {
    "properties": {
      "tag_type": {"type": "string", "index": "not_analyzed"},
      "tag_value": {"type": "string", "index": "not_analyzed"}
    }
  }
}
Here, tag_type can contain any of the keys (content, id, kid, mailto, etc.), and tag_value contains the actual value of the field named in tag_type.
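A document would then store its tags as an array of such pairs; a sketch based on the sample document above:
{
  "level": "info",
  "tags": [
    {"tag_type": "service", "tag_value": "LVS"},
    {"tag_type": "subject", "tag_value": "LVS_RS"},
    {"tag_type": "kid", "tag_value": "2012121316"}
  ],
  "timestamp": 1383707282.500464
}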