How to query and list all types within an Elasticsearch index?

Problem: What is the most correct way to simply query for and list all types within a specific index (and all indices) in Elasticsearch?
I've been reading through the reference and API but can't seem to find anything obvious.
I can list indices with the command:
$ curl 'localhost:9200/_cat/indices?v'
I can get stats (which don't seem to include types) with the command:
$ curl localhost:9200/_stats
I'd expect that there'd be a straightforward command as simple as:
$ curl localhost:9200/_types
or
$ curl localhost:9200/index_name/_types
Thanks for any help you can offer.

What you call a "type" is actually a "mapping type", and the way to get them is simply by using:
curl -XGET localhost:9200/_all/_mapping
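The response is one big JSON object keyed by index name, with the mapping type names as the keys under "mappings". A trimmed sketch (index and type names here are placeholders):
{
  "index_name" : {
    "mappings" : {
      "type_one" : { ... },
      "type_two" : { ... }
    }
  }
}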
Now since you only want the names of the mapping types, you don't need to install anything; you can simply use Python to pull just what you want out of that response:
curl -XGET localhost:9200/_all/_mapping | python -c 'import json,sys; resp=json.load(sys.stdin); print([t for index in resp for t in resp[index]["mappings"]])'
The Python script does something very simple: it iterates over all the indices and their mapping types, and only retrieves the latter's names:
import json, sys

resp = json.load(sys.stdin)
# collect the mapping type names across all indices
types = [t for index in resp for t in resp[index]["mappings"]]
print(types)
UPDATE
Since you're using Ruby, the same trick is available in Ruby code:
curl -XGET localhost:9200/_all/_mapping | ruby -e "require 'rubygems'; require 'json'; resp = JSON.parse(STDIN.read); resp.each { |index, indexSpec| indexSpec['mappings'].each { |type, fields| puts type } }"
The Ruby script looks like this:
require 'rubygems'
require 'json'

resp = JSON.parse(STDIN.read)
resp.each { |index, indexSpec|
  indexSpec['mappings'].each { |type, fields|
    puts type
  }
}

You can simply query the index with the _mapping API, which shows you only the "mappings" section of the index.
For example: curl -XGET http://localhost:9200/YourIndexName/_mapping?pretty
You will get something like this:
{
  "YourIndexName" : {
    "mappings" : {
      "mapping_type_name_1" : {
        "properties" : {
          "dateTime" : { "type" : "date" },
          "diskMaxUsedPct" : { "type" : "integer" },
          "hostName" : { "type" : "keyword" },
          "load" : { "type" : "float" },
          "memUsedPct" : { "type" : "float" },
          "netKb" : { "type" : "long" }
        }
      },
      "mapping_type_name_2" : {
        "properties" : {
          "dateTime" : { "type" : "date" },
          "diskMaxUsedPct" : { "type" : "integer" },
          "hostName" : { "type" : "keyword" },
          "load" : { "type" : "float" },
          "memUsedPct" : { "type" : "float" }
        }
      }
    }
  }
}
mapping_type_name_1 and mapping_type_name_2 are the types in this index, and you can also see the structure of these types.
A good explanation of mapping types is here: https://logz.io/blog/elasticsearch-mapping/
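If you only want the type names out of that response, the same curl-plus-Python piping trick from the first answer works for a single index too (a sketch, keeping the YourIndexName placeholder):
curl -XGET 'http://localhost:9200/YourIndexName/_mapping' | python -c 'import json,sys; resp=json.load(sys.stdin); print(list(resp["YourIndexName"]["mappings"]))'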

// Requires Apache HttpClient, commons-io (IOUtils) and json-lib (JSONObject) on the classpath.
private Set<String> getTypes(String indexName) throws Exception {
    HttpClient client = HttpClients.createDefault();
    // GET <server>/<index>/_mappings and read the raw JSON response
    HttpGet mappingsRequest = new HttpGet(getServerUri() + "/" + indexName + "/_mappings");
    HttpResponse mappingsResponse = client.execute(mappingsRequest);
    String response = IOUtils.toString(mappingsResponse.getEntity().getContent(), Charset.defaultCharset());
    // the mapping type names are the keys of the "mappings" object under the index name
    JSONObject mappings = JSONObject.fromObject(response).getJSONObject(indexName).getJSONObject("mappings");
    return mappings.keySet();
}

Related

How to get a specific stub on multiple matching URLs with different query parameters

I have 2 WireMock mapping JSON files with the same URL. In the first mapping JSON file, I only have xDate as a query parameter. In the second mapping JSON file, I have both xDate and yType as query parameters.
How do I set up the stubs so that when I hit the URL with the 2 parameters, it resolves to the correct mapping/file?
1st mapping JSON file:
"request" : {
  "customMatcher" : {
    "name" : "is-today",
    "parameters" : {
      "queryParamName" : "xDate",
      "dateFormat" : "yyyy-MM-dd"
    }
  },
  "urlPathPattern" : "/myUrl",
  "method" : "GET"
},
"response" : {
  "status" : 200,
  "bodyFileName" : "body1.json",
  "headers" : {
    "Server" : "Apache-Coyote/1.1",
    "Content-Type" : "application/json"
  }
}
2nd mapping JSON file:
"request" : {
  "customMatcher" : {
    "name" : "is-today",
    "parameters" : {
      "queryParamName" : "xDate",
      "dateFormat" : "yyyy-MM-dd"
    }
  },
  "queryParameters" : {
    "yType" : {
      "equalTo" : "Value"
    }
  },
  "urlPathPattern" : "/myUrl",
  "method" : "GET"
},
"response" : {
  "status" : 200,
  "bodyFileName" : "body2.json",
  "headers" : {
    "Server" : "Apache-Coyote/1.1",
    "Content-Type" : "application/json"
  }
}
When I was testing it, it always hit the 1st mapping JSON; even when I hit the URL with both input parameters, it always went to the 1st mapping.
I tried putting a "priority" value on the 1st and 2nd mapping files, but somehow it's not working properly for me.
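For reference, priority is a top-level attribute of a WireMock stub mapping, and lower values are matched first, so the more specific second mapping should carry the lower number. A sketch of where the attribute sits (the value 1 is only an example):
{
  "priority" : 1,
  "request" : {
    ...
  },
  "response" : {
    ...
  }
}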

Minimize the size of returned JSON data from a Spring Data repository

I have two microservices; one of them needs, at boot, to load all operator names/codes and index them in a RadixTree.
I am trying to load around 36000 records using Feign/Spring Data REST, and it works, but I noticed that in the response approximately half of the data size comes from the _links:
{
  "_embedded" : {
    "operatorcode" : [ {
      "enabled" : true,
      "code" : 9320,
      "operatorCodeId" : 110695,
      "operatorName" : "Afghanistan - Kabul/9320",
      "operatorId" : 1647,
      "activationDate" : "01-01-2008",
      "deactivationDate" : "31-12-2099",
      "countryId" : 1,
      "_links" : {
        "self" : {
          "href" : "http://10.44.0.51:8083/operatorcode/110695"
        },
        "operatorCode" : {
          "href" : "http://10.44.0.51:8083/operatorcode/110695{?projection}",
          "templated" : true
        },
        "operator" : {
          "href" : "http://10.44.0.51:8083/operatorcode/110695/operator"
        }
      }
    },
    ...
    ]
  }
}
Is there any way to stop sending back the _links? In my case they are not being used. I tried setting use-hal-as-default-JSON-media-type: false and using projections, but did not succeed.
I am not sure that this is the correct way to do it, but you can try something like this:
@Bean
public Jackson2ObjectMapperBuilder jacksonBuilder() {
    Jackson2ObjectMapperBuilder b = new Jackson2ObjectMapperBuilder();
    // register a mix-in on Object so Jackson skips any "_links" property
    b.mixIn(Object.class, IgnorePropertiesInJackson.class);
    return b;
}

@JsonIgnoreProperties({"_links"})
private abstract class IgnorePropertiesInJackson {
}

What is the default doc sequence of the result from an Elasticsearch filter request?

I recently ran an Elasticsearch filter request, namely:
{
  "from" : 0,
  "size" : 10,
  "query" : {
    "filtered" : {
      "filter" : {
        "bool" : {
          "must" : {
            "terms" : {
              "a_id" : [ 257793, 257798, 257844 ]
            }
          }
        }
      }
    }
  },
  "explain" : false,
  "fields" : "a_id"
}
so that I can find all docs with a_id in 257793, 257798, 257844, and the results come back as 257844, 257798, 257793. So far so good.
Then I found that whatever the order of the term numbers is, the returned docs are always in the same a_id order. That is, even if I run
"terms" : {
"a_id" : [257798, 257844, 257793 ]
}
The result docs come back in the order 257844, 257798, 257793 as well.
So I am curious about the mechanism behind Elasticsearch filtering. Can anyone help me and give me a hint?
By default, ES returns results in descending order of _score. You can provide the sort option to say in which order and based on what you want the results to be returned. For example, based on a date field:
{
  "sort" : { "date" : { "order" : "desc" } },
  "query" : {
    "term" : { "user" : "kimchy" }
  }
}
You can find more information here:
https://www.elastic.co/guide/en/elasticsearch/reference/current/search-request-sort.html
https://www.elastic.co/guide/en/elasticsearch/guide/current/_sorting.html
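If you specifically want the docs back in the order of the ids you queried, one option the answer above doesn't cover is re-ordering client-side, for example with the same curl-plus-Python piping used elsewhere on this page (a sketch; your_index is a placeholder, and it assumes the a_id values come back under fields as in the question):
curl -XPOST 'localhost:9200/your_index/_search' -d '{"query":{"filtered":{"filter":{"terms":{"a_id":[257798,257844,257793]}}}},"fields":"a_id"}' | python -c 'import json,sys; ids=[257798,257844,257793]; hits=json.load(sys.stdin)["hits"]["hits"]; hits.sort(key=lambda h: ids.index(h["fields"]["a_id"][0])); print([h["fields"]["a_id"][0] for h in hits])'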

Elasticsearch - Extracting PDF content and encoding with base64

I want to be able to extract content from a PDF file and search within that content using Elasticsearch.
I installed elasticsearch/elasticsearch-mapper-attachments/2.6.0.
I created a new index named "docs".
I created a file named "tmp.json" with this content:
{"title": "file.pdf", "file": "IkdvZCBTYXZlIHRoZSBRdWVlbiIgKGFsdGVybmF0aXZlbHkgIkdvZCBTYXZlIHRoZSBLaW5nIg=="}
I then executed the following:
curl -X PUT "http://localhost:9200/docs/attachment/_mapping" -d '{
  "attachment" : {
    "properties" : {
      "file" : {
        "type" : "attachment",
        "fields" : {
          "title" : { "store" : "yes" },
          "file" : {
            "type" : "string",
            "term_vector" : "with_positions_offsets",
            "store" : "yes"
          }
        }
      }
    }
  }
}'
and the following:
curl -X POST "http://localhost:9200/docs/attachment" -d @tmp.json
The problem is that the content is stored as-is, still base64-encoded.
I was expecting the content to be decoded, like so:
base64.b64decode("IkdvZCBTYXZlIHRoZSBRdWVlbiIgKGFsdGVybmF0aXZlbHkgIkdvZCBTYXZlIHRoZSBLaW5nIg==")
That gives:
b'"God Save the Queen" (alternatively "God Save the King"'
To encode in base64, here is what I do:
import json, base64

fname = 'file.pdf'
file64 = base64.b64encode(open(fname, "rb").read()).decode('ascii')
data = {"file": file64, "title": fname}
with open('tmp.json', 'w') as f:
    json.dump(data, f)
I would like to be able to see the content using Kibana (but for now I see only the base64 data...).
This didn't work:
curl -X PUT "http://localhost:9200/docs/attachment/_mapping" -d '{
  "attachment" : {
    "properties" : {
      "content" : {
        "type" : "attachment",
        "fields" : {
          "title" : { "store" : "yes" },
          "content" : {
            "type" : "string",
            "term_vector" : "with_positions_offsets",
            "store" : "yes"
          }
        }
      }
    }
  }
}'
This worked, and I can see the content of the PDF through Kibana:
curl -X PUT "http://localhost:9200/docs" -d '{
  "mappings" : {
    "attachment" : {
      "properties" : {
        "content" : {
          "type" : "attachment",
          "fields" : {
            "content" : { "store" : "yes" },
            "author" : { "store" : "yes" },
            "title" : { "store" : "yes" },
            "date" : { "store" : "yes" },
            "keywords" : { "store" : "yes", "analyzer" : "keyword" },
            "name" : { "store" : "yes" },
            "content_length" : { "store" : "yes" },
            "content_type" : { "store" : "yes" }
          }
        }
      }
    }
  }
}'
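Once a document is indexed against that mapping, a simple match query on the content field should hit the text extracted from the PDF (a sketch; "Queen" is just a word from the sample content above):
curl -XPOST "http://localhost:9200/docs/attachment/_search?pretty" -d '{
  "query" : {
    "match" : { "content" : "Queen" }
  }
}'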

Elasticsearch index not analyzed

I'm new to Elasticsearch, and I'm having a hard time with the analyzers.
I am creating an index like this (to replicate my problem, you can copy and paste the following code directly into your console).
Please read the comments in the script for my problems and questions.
#!/bin/bash
# fails if the index doesn't exist but that's OK
curl -XDELETE 'http://localhost:9200/movies/'
# creating the index that will allow type wrapper, and generate _id automatically from the path
curl -XPOST http://localhost:9200/movies -d '{
  "settings" : {
    "number_of_shards" : 1,
    "mapping.allow_type_wrapper" : true,
    "analysis" : {
      "analyzer" : {
        "en_std" : {
          "type" : "standard",
          "stopwords" : "_english_"
        }
      }
    }
  },
  "mappings" : {
    "movie" : {
      "_id" : {
        "path" : "movie.id"
      }
    }
  }
}'
# inserting some data
curl -XPOST http://localhost:9200/movies/movie -d '{
  "movie" : {
    "id" : 101,
    "title" : "Bat Man",
    "starring" : {
      "firstname" : "Christian",
      "lastname" : "Bale"
    }
  }
}'
#trying to get by ID ... \m/ works!!!
curl -XGET http://localhost:9200/movies/movie/101
# trying to search using query_string ... \m/ works
curl -XPOST http://localhost:9200/movies/movie/_search -d '{
  "query" : {
    "query_string" : {
      "query" : "bat"
    }
  }
}'
# when I try to search in a particular field, it fails and returns 0 hits
curl -XPOST http://localhost:9200/movies/_search -d '{
  "query" : {
    "query_string" : {
      "query" : "bat",
      "fields" : ["movie.title"]
    }
  }
}'
# I thought the analyzer was the problem, so I checked.
curl 'http://localhost:9200/movies/movie/_search?pretty=true' -d '{
  "query" : {
    "query_string" : {
      "query" : "bat"
    }
  },
  "script_fields" : {
    "terms" : {
      "script" : "doc[field].values",
      "params" : {
        "field" : "movie.title"
      }
    }
  }
}'
# The field wasn't analyzed.
# The following is the result:
#{
#  "took" : 1,
#  "timed_out" : false,
#  "_shards" : {
#    "total" : 1,
#    "successful" : 1,
#    "failed" : 0
#  },
#  "hits" : {
#    "total" : 1,
#    "max_score" : 0.13424811,
#    "hits" : [ {
#      "_index" : "movies",
#      "_type" : "movie",
#      "_id" : "101",
#      "_score" : 0.13424811,
#      "fields" : {
#        "terms" : [ "Bat Man" ]
#      }
#    } ]
#  }
#}
# So I even tried the term as such... Nope, didn't work :( 0 hits.
curl -XPOST http://localhost:9200/movies/_search -d '{
  "query" : {
    "query_string" : {
      "query" : "Bat Man",
      "fields" : ["movie.title"]
    }
  }
}'
Can anyone point out what I'm doing wrong?
You should insert a sleep 1 command right after inserting the doc and everything will work.
Elasticsearch provides search in near real-time (read this). Every time you index a document, the Lucene index is not updated (refreshed, in Elasticsearch terms) immediately. How frequently your index is refreshed is configurable at the index level. You can also forcefully refresh the index by passing the query parameter refresh=true with every HTTP request, which will make ES update the index, but you may start suffering performance-wise depending on your requirements.
There is a Refresh API as well.
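For example, manually refreshing the movies index from the question looks like this:
curl -XPOST 'http://localhost:9200/movies/_refresh'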