Elasticsearch UPDATE by JOIN, SQL-style, using Logstash

Create two indices in Elasticsearch, parent and child:
PUT parent/car/sedan
{
  "type": "sedan",
  "details": {
    "wheels": 4,
    "doors": 4,
    "seats": 5,
    "fuel": "gasoline"
  }
}
PUT child/toyota/corolla
{
  "color": "white",
  "type": "sedan",
  "details": {
    "wheels": 4,
    "doors": 4,
    "seats": 5,
    "fuel": "gasoline"
  }
}
SQL UPDATE by JOIN (the equivalent SQL statement that we'll perform on Elasticsearch using Logstash):
UPDATE CHILD
SET CHILD.doors = PARENT.doors
FROM PARENT, CHILD
WHERE PARENT.type = CHILD.type
Elasticsearch UPDATE by JOIN (run Logstash with the logstash.conf shown below):
input {
  elasticsearch {
    docinfo => true
    hosts => ["127.0.0.1:9200"]
    user => "admin"
    password => "pass"
    index => "child"
    query => '{ "query": { "match": { "type": "sedan" } } }'
  }
}
filter {
  mutate {
    remove_field => ["message", "@version", "@timestamp"]
  }
  elasticsearch {
    hosts => ["127.0.0.1:9200"]
    user => "admin"
    password => "pass"
    index => "parent"
    query => "type:sedan"
    fields => {
      "details.doors" => "parent_doors"
      "details.seats" => "parent_seats"
      "type" => "parent_type"
    }
  }
  prune {
    whitelist_names => ["color", "type", "details", "parent_doors", "parent_seats", "parent_type"]
  }
}
output {
  stdout {
    codec => rubydebug
  }
  elasticsearch {
    hosts => ["127.0.0.1:9200"]
    user => "admin"
    password => "pass"
    index => "%{[@metadata][_index]}"
    document_type => "%{[@metadata][_type]}"
    document_id => "%{[@metadata][_id]}"
    action => "update"
    doc_as_upsert => true
    script_lang => "painless"
    script => "if ( ctx._source.type == '%{parent_type}' ) { ctx._source.details.doors = %{parent_doors} }"
  }
}
This works. If you have a better way of achieving the same, please let us know.
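For anyone comparing approaches, the join semantics the pipeline implements can be sketched in plain Ruby. The hashes below are stand-ins for the parent and child documents (an illustration only; no cluster involved, and the sample values are made up):

```ruby
# Stand-ins for documents in the parent and child indices.
parent_docs = [
  { 'type' => 'sedan', 'details' => { 'doors' => 2, 'seats' => 5 } }
]
child_docs = [
  { 'color' => 'white', 'type' => 'sedan', 'details' => { 'doors' => 4, 'seats' => 5 } },
  { 'color' => 'red',   'type' => 'coupe', 'details' => { 'doors' => 3, 'seats' => 4 } }
]

# UPDATE CHILD SET CHILD.doors = PARENT.doors WHERE PARENT.type = CHILD.type
child_docs.each do |child|
  parent = parent_docs.find { |p| p['type'] == child['type'] }
  child['details']['doors'] = parent['details']['doors'] if parent
end
```

After the loop, the sedan child has inherited the parent's doors value while the coupe, which has no matching parent, is untouched; the Logstash pipeline does the same per event, with the elasticsearch filter playing the role of the lookup.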

Related

How to filter with a nested document based on multiple terms?

I am trying to replicate this DSL query in NEST. Basically a structured filter that will return all of the products that have the color red.
{
  "query": {
    "bool": {
      "filter": [
        {
          "nested": {
            "path": "keywordFacets",
            "query": {
              "bool": {
                "filter": [
                  { "term": { "keywordFacets.name": "color" } },
                  { "term": { "keywordFacets.value": "Red" } }
                ]
              }
            }
          }
        }
      ]
    }
  }
}
Here is the POCO with attribute mapping.
[ElasticsearchType]
public class Product
{
    [Keyword]
    public long ProductId { get; set; }

    [Nested]
    public List<KeywordFacet> KeywordFacets { get; set; }

    // other properties...
}

[ElasticsearchType]
public class KeywordFacet
{
    [Keyword]
    public string Name { get; set; }

    [Keyword]
    public string Value { get; set; }
}
I can't figure out how to get the two terms inside the nested filter array. This is my failed attempt so far:
var searchRequest = new SearchDescriptor<Product>()
.Query(q => q
.Bool(b => b
.Filter(bf => bf
.Nested(nq => nq
.Path(nqp => nqp.KeywordFacets)
.Query(qq => qq
.Bool(bb => bb
.Filter(ff => ff
.Term(t => t
.Field(p => p.KeywordFacets.First().Name)
.Value("color")
.Field(p2 => p2.KeywordFacets.First().Value).Value("Red")))))))));
Here is some sample data that is returned when I run the DSL query in Postman:
{
  "productId": 183150,
  "keywordFacets": [
    { "name": "color", "value": "Red" },
    { "name": "color", "value": "Blue" },
    { "name": "color", "value": "Grey" }
  ]
}
Here's the proper syntax after fumbling around for a while.
var searchRequest = new SearchDescriptor<Product>()
.Query(q => q
.Bool(b => b
.Filter(bf => bf
.Nested(nq => nq
.Path(nqp => nqp.KeywordFacets)
.Query(qq => qq.Bool(bb => bb
.Filter(ff => ff
.Term(t => t
.Field(p => p.KeywordFacets[0].Name).Value("color"))))
&& qq.Bool(bb => bb
.Filter(ff => ff
.Term(t => t
.Field(p2 => p2.KeywordFacets[0].Value).Value("Red"))))
)
)
)
)
);
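One note on why the nested query matters here: both term filters must match inside the same facet object, not merely somewhere in the facets array. That semantics can be sketched in plain Ruby (an illustration only, not NEST code; the second product is hypothetical):

```ruby
products = [
  { 'productId' => 183150,
    'keywordFacets' => [
      { 'name' => 'color', 'value' => 'Red' },
      { 'name' => 'color', 'value' => 'Blue' }
    ] },
  { 'productId' => 183151,
    'keywordFacets' => [
      { 'name' => 'size',  'value' => 'Red' },   # 'Red' is a size here, not a color
      { 'name' => 'color', 'value' => 'Grey' }
    ] }
]

# nested filter: some single facet must satisfy BOTH terms at once
red = products.select do |p|
  p['keywordFacets'].any? { |f| f['name'] == 'color' && f['value'] == 'Red' }
end
```

A non-nested query would match the second product too, because it has a facet named "color" and a facet valued "Red", just not on the same object.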

Can't get data via GraphQL from Couchbase Server

I am using a Couchbase server running in a Docker build on my Mac.
To access the data I run an Express app via Node.
After building my GraphQL schema and testing it with the GraphiQL interface, I can get the data for a specific user, but not the documents for all users from my users resolver.
Here is my Code:
//resolvers
var resolvers = {
  users: () => {
    var statement = "SELECT META(user).id, user.* FROM `" + bucket._name + "` AS user WHERE user.type = 'user'";
    var query = Couchbase.N1qlQuery.fromString(statement);
    return new Promise((resolve, reject) => {
      bucket.query(query, (error, result) => {
        if (error) {
          return reject(error);
        }
        resolve(result);
      });
    });
  },
  user: (data) => {
    var id = data.id;
    return new Promise((resolve, reject) => {
      bucket.get(id, (error, result) => {
        if (error) {
          return reject(error);
        }
        resolve(result.value);
      });
    });
  }
};
//express setup
app.use("/graphql", ExpressGraphQL({
  schema: schema,
  rootValue: resolvers,
  graphiql: true
}));
app.listen(3000, () => {
  console.log("listening...")
});
//Schema
var schema = BuildSchema(`
  type Query {
    user(id: String!): User,
    users: [User]
  }
  type User {
    id: String,
    profileImage: String,
    birthdate: String,
    reviews: [String],
    premium: Boolean
  }
`);
When I try my query like this:
{
  users {
    id
  }
}
I get the following error message returned:
{
  "errors": [
    {
      "message": "syntax error - at user",
      "locations": [
        {
          "line": 31,
          "column": 3
        }
      ],
      "path": [
        "users"
      ]
    }
  ],
  "data": {
    "users": null
  }
}
The specific user query from the second resolver works fine!
Here my test document in the Database:
{
  "id": "ohuibjnklmönaio",
  "profileImage": "pic",
  "birthdate": "date",
  "reviews": [
    "a",
    "b"
  ],
  "premium": true,
  "type": "user"
}

Nest serialize elasticsearch Terms Query request

I'm using NEST 2.0.2 to query Elasticsearch.
Really great API, thanks for the effort, but it needs a documentation update, I think.
Anyway, I want to serialize my request. I could not find any info; there are some Stack Overflow questions, but they are about older versions and the API has changed.
I want to write a "terms" query, but could not succeed.
The working Sense DSL is below:
GET myindex/mytype/_search?search_type=count
{
  "query": {
    "bool": {
      "must": [
        {
          "term": {
            "field1": {
              "value": 2
            }
          }
        }
      ],
      "must_not": [
        {
          "terms": {
            "field2": [
              16,
              17,
              18,
              19
            ]
          }
        }
      ]
    }
  },
  "aggs": {
    "termsAggField2": {
      "terms": {
        "field": "field2",
        "size": 20
      },
      "aggs": {
        "sumAggField3": {
          "sum": {
            "field": "field3"
          }
        }
      }
    }
  }
}
And the terms query code is below. The DSL works in Sense, but the NEST query does not work: the "not in" does not filter the output.
List<QueryContainer> must_not = new List<QueryContainer>();
must_not.Add(Query<mytype>.Terms(trms => trms.Terms(new string[] { "16", "17", "18", "19" })));
var resultTermsSum = b1.ElasticClient.Search<mytype>(q => q
    .SearchType(SearchType.Count)
    .Query(q2 => q2.Bool(b => b.MustNot(must_not.ToArray())))
    .Aggregations(a => a
        .Terms("termsAggField2", terms => terms
            .Field("field2")
            .Size(20)
            .Aggregations(a2 => a2.Sum("sumAggField3", sum => sum.Field("field3"))))));
That is why I want to see the serialized request and spot my problem.
Thanks. Regards.
Edit: It's now working with the following update. It'd be great if I could serialize it ;)
List<QueryContainer> must_not = new List<QueryContainer>();
short[] valueCollection = new short[] { 16, 19, 99, 100 };
must_not.Add(Query<mytype>.Terms(trms => trms.Field("field2").Terms(valueCollection)));
var resultTermsSum = b1.ElasticClient.Search<mytype>(q => q
    .SearchType(SearchType.Count)
    .Query(q2 => q2.Bool(b => b.MustNot(must_not.ToArray())))
    .Aggregations(a => a
        .Terms("termsAggField2", terms => terms
            .Field("field2")
            .Size(20)
            .Aggregations(a2 => a2.Sum("sumAggField3", sum => sum.Field("field3"))))));

How to resolve ELK Stack Mapping Conflict for apache access combined logs

I am trying to learn the ELK stack and have started by indexing Apache access logs. I have Logstash 1.4.2, Elasticsearch 1.5.1, and Kibana 4.0.2 for Windows. Following are my configuration files. For the mapping in Elasticsearch I have used:
curl -XPOST localhost:9200/apache_access?ignore_conflicts=true -d '{
  "settings" : {
    "number_of_shards" : 1
  },
  "mappings" : {
    "apache" : {
      "properties" : {
        "timestamp" : { "type": "date", "format" : "DD/MMM/YYYY:HH:mm:ss" },
        "bytes": { "type": "long" },
        "response": { "type": "long" },
        "clientip": { "type": "ip" },
        "geoip" : { "type" : "geo_point" }
      }
    }
  }
}'
and my logstash-apache.conf is
input {
  file {
    path => "D:\data\access_log1.log"
    start_position => beginning
  }
}
filter {
  grok {
    match => { "message" => "%{COMBINEDAPACHELOG}" }
  }
  geoip {
    source => "clientip"
    target => "geoip"
  }
  date {
    match => [ "timestamp", "dd/MMM/yyyy:HH:mm:ss Z", "ISO8601" ]
  }
}
output {
  elasticsearch {
    host => "localhost"
    protocol => http
    index => "apache_access"
  }
  stdout { codec => rubydebug }
}
What I am facing is a conflict for the fields to which I applied the mapping in Elasticsearch, i.e. bytes, response, and clientip. I understand what is happening: it says these fields have both string and long as field types. But I don't understand why it is happening, since I applied the mapping. I would also like to resolve this issue. Any help is appreciated.

Using has_key? on multi-level JSON data in Rails

I have a set of JSON data with multiple levels. A snippet of the JSON (please ignore the missing formatting):
DATA SET A:
"interaction": {
  "author": {
    "username": "johndoe",
    "name": "John Doe"
  }
},
"gender": "male"
DATA SET B:
"interaction": {
  "author": {
    "name": "Jane Doe"
  }
},
"gender": "male"
At a single level I can use:
if record.has_key?('gender')
and that returns true or false depending on whether the key is present.
If I try to seed data and the key isn't present, it throws an error and stops seeding.
My question: how would I check whether the "username" key exists? Data Set B, for example, doesn't have a username key and would throw an error, but I can't figure out how to make the has_key? method check a few levels down.
Thanks for the help.
I've decided to work around the has_key? method and use a begin/rescue/end:
begin
  @data.username = record['interaction']['author']['username']
rescue
  @data.username = nil
end
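On Ruby 2.3+, Hash#dig achieves the same thing without the rescue: it walks the keys in order and returns nil as soon as any level is missing. A minimal sketch, with a record hash mirroring Data Set B:

```ruby
# Data Set B: the author has no 'username' key.
record = { 'interaction' => { 'author' => { 'name' => 'Jane Doe' } } }

# dig returns nil instead of raising when a key is absent at any level.
username = record.dig('interaction', 'author', 'username')
name     = record.dig('interaction', 'author', 'name')
```

Here username comes back nil and name comes back "Jane Doe", with no exception raised.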
Here's a possibility:
# where a and b are loaded from JSON
a = {
  'interaction' => {
    'author' => {
      'username' => 'johndoe',
      'name' => 'jd'
    }
  }
}
b = {
  'interaction' => {
    'author' => {
      'oijoijoij' => 'johndoe',
      'name' => 'jd'
    }
  }
}
class Hash
  def recursive_has_key?(key)
    has_key?(key) or values.any? { |v| v.is_a?(Hash) and v.recursive_has_key?(key) }
  end
end
puts a.recursive_has_key?('username')
puts b.recursive_has_key?('username')
puts a.recursive_has_key?('username')
puts b.recursive_has_key?('username')
Which outputs
$ ruby foo.rb
true
false