Searching within an array in kibana - lucene

I am pushing my logs to elasticsearch which stores a typical doc as-
{
"_index": "logstash-2014.08.11",
"_type": "machine",
"_id": "2tSlN1P1QQuHUkmoJfkmnQ",
"_score": null,
"_source": {
"category": "critical log with list",
"app_name": "attachment",
"stacktrace_array": [
"this is the first line",
"this is the second line",
"this is the third line",
"this is the fourth line",
],
"#timestamp": "2014-08-11T13:30:51+00:00"
},
"sort": [
1407763851000,
1407763851000
]
}
Kibana makes searching substrings very easy. For example searching for "critical" in the dashboard will fetch all logs with the word critical in any string mapped value.
How do i go about searching for something like "second line" which is a string nested in an array within my doc?

It would be a simple field:<search_term> query, like -
"query": {
"query_string": {
"query": "stacktrace_array:*second line*"
}
...
So in layman terms, for Kibana dashboard, put your search query like so -
stacktrace_array:*second line*

Related

Solr Nested Documents: query for parent document which has several specific nested documents

Let's imagine I have document with several nested documents:
{
"id": "doc1",
"type": "maindoc",
"title": "some document 1 title"
"nested": [
{
"id": "nested1",
"nested_type": "nestedType1",
"title": "nested doc 1 title"
},
{
"id": "nested2",
"nested_type": "nestedType2",
"title": "nested doc 2 title"
},
{
"id": "nested3",
"nested_type": "nestedType3",
"title": "nested doc 3 title"
}
]
}
So now if I want to search for document which has nested doc 1 - I do this:
{!parent which='type:maindoc'}
nested_type:nestedType1
But what if I want to search for document which has 2 specific children at the same time?
For example I want to find doc which has both nestedType1 + nestedType2.
Obviously query like this will not work:
{!parent which='type:maindoc'}
nested_type:nestedType1 AND nested_type:nestedType2
So how can I do that? Is that possible at all?
Something like this did the trick in my testing:
({!parent which='type:maindoc' v='nested_type:nestedType1'}) AND ({!parent which='type:maindoc' v='nested_type:nestedType2'})

sql hibernate hql

How can we use regexp in sql query to fetch data(key/value) from a json field.
For more understanding i have a table market inside that book is json having a field as title .
{
"book": [
{
"category": "reference",
"author": "Nigel Rees",
"title": ["A123"],
"price": 8.95
},
{
"category": "fiction",
"author": "Evelyn Waugh",
"title": ["A1235"],
"price": 12.99
}]
The query i have written is :
select * from market where books REGEXP('"title":\s*(\["[A-Za-z0-9]*"\])');
But I do not get output.
there you go - follow the url
MY_Sql_5.7_JSON_FUNCTIONS
JSON_EXTRACT - is the method - what you need -
JSON_EXTRACT(json_doc, path[, path] ...) - is the syntax for it.
On the page you will find the examples also. Check it out.

putting exact matches first in elasticsearch multimatch query

it sure seems that there's no easy way to do this ... how can i ensure that certain fields in my multi match query going to actually be boosted correctly so that exact matches show up at the top?
i honestly seem to have tried this a multitude of ways, but maybe someone knows the answer ...
in my movie and music database, i'm trying to search multiple fields at once, but ensure that exact matches make it to the top and that certain fields such as title and artist name have more boost.
here's the main portion of my query...
"query": {
"bool": {
"should": [
{
"multi_match": {
"type": "phrase_prefix",
"query": "brave",
"max_expansions": 10,
"fields": [
"title^3",
"artists.name^2",
"starring.name^2",
"credits.name",
"tracks^0.1"
]
}
}
],
"minimum_number_should_match": 1
}
}
as you see, the query is 'brave'. it just so happens there's a movie called brave. perfect, i want it at the top - since not only is it an exact match, but the match is in the title. however, there's a popular song called 'brave' from sara bareilles which ends up on top. why?
i've tried every analyzer known to man, custom and otherwise, and i've tried changing the 'type' parameter to every other permutation (phrase, best_fields, cross_fields, most_fields), and it just doesn't seem to honor the fact that i'm effectively trying to either promote 'title' and 'artists.name' and 'starring.name' and DEMOTE 'tracks'.
is there any way i can ensure all exact matches show up at the top (especially in title, etc) followed by expansions, etc?
any suggestions would be helpful.
EDIT
the analyzer i'm currently using which seems to work better than others is a custom one i call 'nameAnalyzer' which is made up of a 'lowercase' filter and 'keyword' tokenizer only.
here's some example documents in the order in which they're appearing in the results:
fields": {
"title": [
"Brave"
],
"credits.name": [
"Kelly MacDonald",
"Emma Thompson",
"Billy Connolly",
"Julie Walters",
"Kevin McKidd",
"Craig Ferguson",
"Robbie Coltrane"
],
"starring.name": [
"Emma Thompson",
"Julie Walters",
"Billy Connolly",
"Kevin Mckidd",
"Kelly Macdonald"
]
,
fields": {
"credits.name": [
"Hilary Weeks",
"Scott Wiley",
"Sarah Sample",
"Debra Fotheringham",
"Dustin Christensen",
"Russ Dixon"
],
"title": [
"Say Love"
],
"artists.name": [
"Hilary Weeks"
],
"tracks": [
"Say Love",
"Another Second Chance",
"It's A Good Day",
"Brave",
"I Found Me",
"Hero",
"Tell Me",
"Where I Am",
"Better Promises",
"Even When"
]
,
fields": {
"title": [
"Brave Little Toaster"
],
"credits.name": [
"Randy Bennett",
"Jim Jackman",
"Randy Cook",
"Judy Toll",
"Jon Lovitz",
"Tim Stack",
"Timothy E. Day",
"Thurl Ravenscroft",
"Deanna Oliver",
"Phil Hartman",
"Jonathon Benair",
"Joe Ranft"
],
"starring.name": [
"Jon Lovitz",
"Thurl Ravenscroft",
"Tim Stack",
"Timothy E. Day",
"Deanna Oliver"
]
},
"fields": {
"title": [
"Braveheart"
],
"credits.name": [
"Bernard Horsfall",
"Martin Dempsey",
"James Robinson",
"Robert Paterson",
"Alan Tall",
"Rupert Vansittart",
"Donal Gibson",
"Malcolm Tierney",
"Sandy Nelson",
"Sean Lawlor"
],
"starring.name": [
"Brendan Gleeson",
"Sophie Marceau",
"Mel Gibson",
"Patrick Mcgoohan",
"Catherine Mccormack"
]
}
maybe someone knows why the second title ... (in this case not sara bareilles as i said before, but) Hillary Weeks - who has a track called 'brave' ... why is it before title 'braveheart' and 'brave little toaster'?
EDIT AGAIN
to further complicate the situation, what if i had a 'rank' field that was a part of my document? i'm finding it very difficult to add that to my _score field using a script score function...
"functions": [
{
"script_score": {
"script": "_score * 1/ doc['rank'].value"
}
}
]

How to get list of statements for a given Wikidata ID?

The only thing I managed to do is this link:
https://www.wikidata.org/w/api.php?action=wbgetentities&ids=Q568&format=jsonfm
But this produces lots of useless data. What I need is to get all the statements for the given item, but I can't see any of the statements in the query above.
here it will be:
{ "instance of" : "chemical element",
"element symbol" : "Li",
"atomic number" : 3,
"oxidation state" : 1,
"subclass of" : ["chemical element", "alkali metal"]
// etc...
}
Is there an API for this or must I scrape the web page?
The information you want is in your query, except it's hard to decode. For example, this:
"P246": [
{
"id": "q568$E47B8CE7-C91D-484A-9DA4-6153F132997D",
"mainsnak": {
"snaktype": "value",
"property": "P246",
"datatype": "string",
"datavalue": {
"value": "Li",
"type": "string"
}
},
"type": "statement",
"rank": "normal",
"references": …
}
]
means that the “element symbol” (property P246) is “Li”. So, you will need to read all the properties from your query and then find out the name for each of the properties you found.
To get just the statements, you could also use action=wbgetclaims, but it's in the same format as above.

In Elasticsearch, Why do I lose the whole word token when I run a word through an ngram filter?

It seems that if I am running a word or phrase through an ngram filter, the original word does not get indexed. Instead, I only get chunks of the word up to my max_gram value. I would expect the original word to get indexed as well. I'm using Elasticsearch 0.20.5. If I set up an index using a filter with ngrams like so:
CURL -XPUT 'http://localhost:9200/test/' -d '{
"settings": {
"analysis": {
"filter": {
"my_ngram": {
"max_gram": 10,
"min_gram": 1,
"type": "nGram"
},
"my_stemmer": {
"type": "stemmer",
"name": "english"
}
},
"analyzer": {
"default_index": {
"filter": [
"standard",
"lowercase",
"asciifolding",
"my_ngram",
"my_stemmer"
],
"type": "custom",
"tokenizer": "standard"
},
"default_search": {
"filter": [
"standard",
"lowercase"
],
"type": "custom",
"tokenizer": "standard"
}
}
}
}
}'
Then I put a long word into a document:
CURL -XPUT 'http://localhost:9200/test/item/1' -d '{
"foo" : "REALLY_REALLY_LONG_WORD"
}'
And I query for that long word:
CURL -XGET 'http://localhost:9200/test/item/_search' -d '{
"query":
{
"match" : {
"foo" : "REALLY_REALLY_LONG_WORD"
}
}
}'
I get 0 results. I do get a result if I query for a 10 character chunk of that word. When I run this:
curl -XGET 'localhost:9200/test/_analyze?text=REALLY_REALLY_LONG_WORD
I get tons of grams back, but not the original word. Am I missing a configuration to make this work the way I want?
If you would like to keep the complete word of phrase, use a multi-field mapping for the value where you keep one "not analyzed" or with keyword-tokenizer instead.
Also, when searching a field with nGram-tokenized values, you should probably also use the nGram-tokenizer for the search, then the n-character limit will also apply for the search-phrase, and you will get the expected results.