Push from Filebeat to Elasticsearch with custom _type and _id

The problem is to push JSON logs collected by Filebeat to Elasticsearch with a defined _type and _id. By default Elasticsearch sets _type to "log" and _id to something like "AVryuUKMKNQ7xhVUFxN2".
My log row:
{"unit_id":10001,"node_id":1,"message":"Msg ..."}
Desired record in Elasticsearch:
"hits" : [ {
"_index" : "filebeat",
"_type" : "unit_id",
"_id" : "10001",
...
"_source" : {
"message" : "Msg ...",
"node_id" : 1,
...
}
} ]
I know how to do this with Logstash: just use document_id => "%{unit_id}" and document_type => "unit_id" in the output section. The goal, however, is to use only Filebeat, because it is a very lightweight solution and no intermediate aggregation is needed here.

You can set a custom _type by using the document_type option in Filebeat.
There is no way to set the _id directly in Filebeat as of version 5.x.
filebeat.prospectors:
- paths: ['/var/log/messages']
  document_type: syslog
You could use the Elasticsearch Ingest Node feature to set the _id field. You would need to use a script processor to copy a value from the event into the _id field. Once you have defined your pipeline you would tell Filebeat to send its data to that pipeline using the output.elasticsearch.pipeline config option.
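For example, here is a minimal sketch of such a pipeline, created via the REST API with Python's requests module (the pipeline name set-unit-id and the host are assumptions, and the script processor syntax can differ slightly between Elasticsearch versions):
import requests

# Hypothetical pipeline that copies the unit_id field of each event into _id.
pipeline = {
    "description": "Use unit_id as the document _id",
    "processors": [
        {
            "script": {
                "lang": "painless",
                "source": "ctx._id = ctx.unit_id.toString();"
            }
        }
    ]
}

# Assumed local Elasticsearch; adjust host and credentials as needed.
resp = requests.put("http://localhost:9200/_ingest/pipeline/set-unit-id", json=pipeline)
print(resp.status_code, resp.json())
With the pipeline in place, filebeat.yml would reference it via output.elasticsearch.pipeline: set-unit-id.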

In newer Filebeat versions you can set a custom _id directly, for example via the json.document_id setting or the add_id and fingerprint processors: https://www.elastic.co/guide/en/beats/filebeat/current/filebeat-deduplication.html

Complex MongoDB query in Mule 4

I am trying to make a MongoDB query in Mule with the $in operator, but Mule says Invalid input '$', expected Namespace or NameIdentifier.
I have a collection that stores user authorizations:
{
    "_id" : ObjectId("584a0dea073d4c3e976140a9"),
    "partnerDataAccess" : [
        {
            "factoryID" : "Fac-1",
            "partnerID" : "Part-1"
        }
    ],
    "userID" : "z12"
}
{
    "_id" : ObjectId("584f5eba073d4c3e976140ab"),
    "partnerDataAccess" : [
        {
            "factoryID" : "Fac-1",
            "partnerID" : "Part-2"
        },
        {
            "factoryID" : "Fac-2",
            "partnerID" : "Part-2"
        }
    ],
    "userID" : "w12"
}
The flow will submit a userID and a partnerID and query the database to see if an authorization exists.
When I query from Robo 3T, I write queries like this (e.g. user w12 and partner Part-2):
db.getCollection('user').find({
userID:"w12", "partnerDataAccess.partnerID": {$in : ["Part-2", "ALL"]}
})
The $in is used because there is an "ALL" setting for admins.
But when I try to put the find part into the MongoDB connector, Mule gives errors both during development and at runtime.
Hardcoded:
<mongo:find-one-document collectionName="user" doc:name="Find one document" doc:id="a03a6689-6b9d-473c-b8a6-3b8d1e989e38" config-ref="MongoDB_Config">
<mongo:find-query ><![CDATA[#[{
userID:"w12",
"partnerDataAccess.partnerID": {$in : ["Part-2", "ALL"]}
}]]]></mongo:find-query>
</mongo:find-one-document>
Parametrized:
<mongo:find-one-document collectionName="user" doc:name="Find one document" doc:id="a03a6689-6b9d-473c-b8a6-3b8d1e989e38" config-ref="MongoDB_Config">
<mongo:find-query ><![CDATA[#[{
userID: payload.User,
"partnerDataAccess.partnerID": {$in : [ payload.partner, "ALL"]}
}]]]></mongo:find-query>
</mongo:find-one-document>
Error:
during development:
Invalid input '$', expected } or ~ or , (line 3, column 38):
Runtime:
Message : "Script '{
userID:"w12",
"partnerDataAccess.partnerID": {$in : ["Part-2", "ALL"]}
} ' has errors:
Invalid input '$', expected Namespace or NameIdentifier (line 3, column 38):
at 3 : 3" evaluating expression:
I have tried removing the $ or escaping the $ with a backslash, but it does not work.
I know my query is not actually complex; any help is welcome.
I seem to have found the correct way: escaping the $ inside a quoted key works:
><![CDATA[#[{
userID:"w12",
"partnerDataAccess.partnerID": {"\$in" : ["Part-2", "ALL"]}
}]]]>
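For reference, outside DataWeave the same lookup needs no escaping at all. A minimal sketch of the equivalent check run directly against MongoDB with pymongo (connection string and database name are placeholders):
from pymongo import MongoClient

# Placeholder connection string and database name.
client = MongoClient("mongodb://localhost:27017")
db = client["mydb"]

def is_authorized(user_id, partner_id):
    # Same criteria as the DataWeave query: match the user and either
    # the specific partner or the "ALL" admin setting.
    doc = db["user"].find_one({
        "userID": user_id,
        "partnerDataAccess.partnerID": {"$in": [partner_id, "ALL"]},
    })
    return doc is not None

print(is_authorized("w12", "Part-2"))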

FIWARE-Orion Context Broker metadata updates trigger notifications

I'm using 3 FIWARE GEs: IDAS+Orion+CEP.
As reported in the Orion documentation (https://github.com/telefonicaid/fiware-orion/blob/develop/doc/manuals/user/metadata.md) "changing the metadata of a given attribute or adding a new metadata element is considered a change even if attribute value itself hasn't changed".
Is there a way to send notifications from Orion only if the value of the attribute specified in the subscription changes?
I've tried the solution proposed in the documentation (delete and re-create the attribute in order to remove the metadata), but since the messages to Orion are produced by IDAS, the metadata is created again with the next communication.
Thanks.
UPDATE:
GEs Version:
- Orion - 0.26.1-next
- IoTAgent (IDAS) - 1.3.1
The metadata added by IDAS are:
"attributes" : [
{
"name" : "temperature",
"type" : "int",
"value" : "37",
"metadatas" : [
{
"name" : "TimeInstant",
"type" : "ISO8601",
"value" : "2015-12-29T12:46:04.421859"
}
]
}
]
Specifically, from a MongoDB query:
"temperature" : { "value" : "37", "type" : "int", "md" : [ { "name" : "TimeInstant", "type" : "ISO8601", "value" : "2015-12-29T12:46:04.421859" } ], "creDate" : 1450716887, "modDate" : 1451393164 }
As far as I know, the sending of TimeInstant metadata from IDAS/IoTAgent to Orion cannot be disabled at the time being.
A possible workaround could be to have a proxy between IDAS and Orion in order to remove the TimeInstant metadata (or the whole metadata field in the JSON, in case some other metadata causes a similar problem), as in the sketch below.
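A minimal sketch of such a proxy, assuming the NGSI v1 updateContext payload shape shown above and using Flask and requests (the Orion host, port and listening port are placeholders):
import requests
from flask import Flask, Response, request

app = Flask(__name__)
ORION_URL = "http://orion-host:1026"  # placeholder Orion endpoint

@app.route("/<path:path>", methods=["POST"])
def forward(path):
    body = request.get_json(force=True, silent=True) or {}
    # Strip the metadatas array from every attribute of an updateContext payload.
    for element in body.get("contextElements", []):
        for attribute in element.get("attributes", []):
            attribute.pop("metadatas", None)
    resp = requests.post(ORION_URL + "/" + path, json=body,
                         headers={"Accept": "application/json"})
    return Response(resp.content, status=resp.status_code,
                    content_type=resp.headers.get("Content-Type", "application/json"))

if __name__ == "__main__":
    # IDAS would be configured to send its updates to this port instead of Orion's.
    app.run(host="0.0.0.0", port=5000)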

Artifactory REST API fully qualified class search

Is there any way to use a fully qualified class name to search in Artifactory (similar to the class search in the Artifactory web UI)? Based on the documentation, I know I can use a wildcard (*) and the .class file extension like this:
GET /api/search/archive?name=*Logger.class&repos=third-party-releases-local,repo1-cache
But I am looking for a way to use the fully qualified class name, similar to this:
GET /api/search/archive?name=org.apache.log4j.Logger&repos=third-party-releases-local,repo1-cache
This does not work.
You can use the Artifactory Query Language (AQL) for this.
For example, a query that searches for an archive entry org/apache/log4j/Logger.class in the jcenter-cache repository would be:
items.find({
    "repo" : "jcenter-cache",
    "archive.entry.name" : {"$eq" : "Logger.class"},
    "archive.entry.path" : {"$eq" : "org/apache/log4j"}
})
The response would be
{
    "results" : [ {
        "repo" : "jcenter-cache",
        "path" : "org/apache/log4j/com.springsource.org.apache.log4j/1.2.16",
        "name" : "com.springsource.org.apache.log4j-1.2.16.jar",
        "type" : "file",
        "size" : 481202,
        "created" : "2015-12-30T20:57:36.305Z",
        "created_by" : "admin",
        "modified" : "2010-08-04T13:18:06.000Z",
        "modified_by" : "admin",
        "updated" : "2015-12-30T20:57:36.354Z"
    } ],
    "range" : {
        "start_pos" : 0,
        "end_pos" : 1,
        "total" : 1
    }
}
To run such a query using curl, with the query saved in a file named aql.txt:
curl -H "content-type: text/plain" -uuser:password --data @aql.txt http://my-artifactory-host/api/search/aql
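If you prefer to run the same AQL query from Python instead of curl, a minimal sketch with the requests module (host and credentials are placeholders):
import requests

# Placeholder Artifactory host and credentials.
AQL_ENDPOINT = "http://my-artifactory-host/api/search/aql"

query = '''items.find({
    "repo" : "jcenter-cache",
    "archive.entry.name" : {"$eq" : "Logger.class"},
    "archive.entry.path" : {"$eq" : "org/apache/log4j"}
})'''

# AQL queries are sent as plain text in the POST body.
resp = requests.post(AQL_ENDPOINT, data=query,
                     auth=("user", "password"),
                     headers={"Content-Type": "text/plain"})
print(resp.json())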

Elasticsearch bulk/batch indexing with python requests module

I have a smallish (~50,00) array of JSON dictionaries that I want to store/index in ES. My preference is to use Python, since the data I want to index comes from a CSV file, loaded and converted to JSON via Python. Alternatively, I would like to skip the conversion to JSON and simply use the array of Python dictionaries I have. Anyway, a quick search revealed the bulk indexing functionality of ES. I want to do something like this:
post_url = 'http://localhost:9202/_bulk'
requests.post(post_url, data=acc)  # acc is a Python array of dictionaries
or
post_url = 'http://localhost:9202/_bulk'
requests.post(post_url, params=acc)  # acc is a Python array of dictionaries
Both requests give an HTTP 500 error.
My understanding is that you have to have one "command" per line (index, create, delete, ...), and then some of them (like index) take a row of data on the next line, like so:
{"index": {}}\n
{"your": "data"}\n
{"index": {}}\n
{"other": "data"}\n
NB the newlines, including the one after the last row.
Empty index actions like the above work if you POST to ../index/type/_bulk; otherwise you need to specify index and type in each action line, I think (have not tried that).
The following function will do it:
import requests

def post_request(data):
    # Bulk endpoint of a local Elasticsearch; adjust host/port as needed.
    endpoint = 'http://localhost:9200/_bulk'
    response = requests.post(endpoint, data=data,
                             headers={'Content-Type': 'application/json; charset=UTF-8'})
    return response
As data you need to pass a string such as:
{ "index" : { "_index" : "test-index", "_type" : "_doc", "_id" : "1681", "routing" : 0 }}
{ "field1" : ... , ..., "fieldN" : ... }
{ "index" : { "_index" : "test-index", "_type" : "_doc", "_id" : "1684", "routing" : 1 }}
{ "field1" : ... , ..., "fieldN" : ... }
Make sure you add a "\n" at the end of each line.
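Since the question starts from a Python list of dictionaries, here is a minimal sketch that builds such a bulk string from the list and posts it (the index name, URL, and sample documents are assumptions):
import json
import requests

def bulk_index(docs, index='test-index', url='http://localhost:9200/_bulk'):
    # Build the newline-delimited bulk body: one action line per document,
    # followed by the document itself, plus a trailing newline at the end.
    lines = []
    for doc in docs:
        lines.append(json.dumps({"index": {"_index": index}}))
        lines.append(json.dumps(doc))
    body = "\n".join(lines) + "\n"
    return requests.post(url, data=body.encode("utf-8"),
                         headers={'Content-Type': 'application/x-ndjson'})

# acc is the array of dictionaries from the question.
acc = [{"field1": "a"}, {"field1": "b"}]
print(bulk_index(acc).json())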
I don't know much about Python, but did you look at Pyes?
Bulk is supported in Pyes.

Two "id" fields in one MongoDB collection with Rails 3?

I've got a Rails 3.0.9 project using the latest version of MongoDB and Mongoid 2.2.
I imported a CSV with an "id" field into a MongoDB collection named College, resulting in a collection like so:
{ "_id" : ObjectID("abc123"), "id" : ######, ... }
Observations:
The show action results in a URL utilizing the ObjectID
Displaying 'college.id' in index.html.erb displays the ObjectID
Questions:
How do I use the original "id" field as the parameter?
Is "id" reserved by MongoDB, meaning I need to rename the "id" field in the College collection (perhaps to "code")? If so, how?
Thanks!
Update
Answer:
db.colleges.update( { "name" : { $exists : true } } , { $rename : { "id" : "code" } }, false, true )
I used "name" since that was a field I could check for the existence.
_id is a reserved and required property in MongoDB. I think Mongoid maps id to _id, since that makes sense. There might be a way to access the id property through Mongoid, but I think you are much better off renaming the id field to something else to avoid confusion in the future.
{ $rename : { old_field_name : new_field_name } }
will rename the field name in a document (mongo 1.7.2+).
so
db.college.update({ "_id" : { $exists : true }}, { $rename : { 'id' : 'code' } }, false, true);
should update every record in that collection and rename the id field to code.
(Obviously test this before running it on any important data.)
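If you would rather do the rename from a script than from the mongo shell, a minimal sketch with pymongo (connection string and database name are placeholders):
from pymongo import MongoClient

# Placeholder connection string and database name.
client = MongoClient("mongodb://localhost:27017")
db = client["myapp_development"]

# Rename the imported "id" field to "code" in every document that has it.
result = db.colleges.update_many(
    {"id": {"$exists": True}},
    {"$rename": {"id": "code"}},
)
print(result.modified_count, "documents updated")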