Timezone offset with Logstash / Redis / Elasticsearch

I'm trying to configure logstash with redis and elasticsearch.
I have a problem with the @timestamp field.
The value of @timestamp is always the real event timestamp minus 2 hours.
I have a shipper configured like this:
input { file { ... } }
filter {
  if [type] == "apachelogs" {
    grok {
      match => [ "message", "%{COMBINEDAPACHELOG}" ]
    }
    date {
      locale => "en"
      timezone => "Europe/Brussels"
      match => [ "timestamp", "dd/MMM/yyyy:HH:mm:ss Z" ]
    }
  }
}
output { redis { ... } }
and a logstash-indexer like this:
input{ redis {...}}
output { elasticsearch {...}}
The result of an event in ES looks like this:
"@timestamp": "2014-05-21T13:29:53.000Z"
...
"timestamp": "21/May/2014:15:29:53 +0200"
So as you can see, there is always a 2-hour offset in @timestamp and I can't figure out why.
I've tried different things such as changing the timezone, without success.
Any idea about this one?
Thanks

You can use this filter to change the timezone.
Change
"@timestamp": "2014-04-23T13:40:29.000Z"
to
"@timestamp": "2014-04-23T15:40:29.000+0200"
Try this filter:
filter {
  ruby {
    code => "
      event['@timestamp'] = event['@timestamp'].localtime('+02:00')
    "
  }
}
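If you're on a newer Logstash (5.x and later), the event['...'] hash syntax no longer exists and you need the event API instead. Here is a minimal sketch under that assumption; it writes the shifted value into a separate field rather than mutating @timestamp (which is always stored in UTC internally), and the field name local_timestamp is just an example:
filter {
  ruby {
    # Logstash 5.x+ event API; 'local_timestamp' is a hypothetical field name
    code => "event.set('local_timestamp', event.get('@timestamp').time.localtime('+02:00').strftime('%Y-%m-%dT%H:%M:%S.%L%z'))"
  }
}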
Hope this can help you.

The timezone option should work.
What's wrong with your result? The values in the following two fields denote the same point in time:
"@timestamp": "2014-05-21T13:29:53.000Z"
"timestamp": "21/May/2014:15:29:53 +0200"
Z stands for +0000, so 15:29:53 +0200 is exactly 13:29:53 in UTC.

Related

Kafka Connect S3 sink - how to use the timestamp from the message itself [timestamp extractor]

I've been struggling with a problem using kafka connect and the S3 sink.
First the structure:
{
  Partition: number
  Offset: number
  Key: string
  Message: json string
  Timestamp: timestamp
}
Normally when posting to Kafka, the timestamp should be set by the producer. Unfortunately, there seem to be cases where this didn't happen, which means the Timestamp might sometimes be null.
To extract this timestamp the connector was set to the following value:
"timestamp.extractor":"Record".
However, it is certain that the Message field itself always contains a timestamp.
Message:
{
  timestamp: "2019-04-02T06:27:02.667Z"
  metadata: {
    creationTimestamp: "1554186422667"
  }
}
The question now is how to use that field for the timestamp.extractor.
I was thinking the following would suffice, but it doesn't seem to work:
"timestamp.extractor":"RecordField",
"timestamp.field":"message.timestamp",
This results in a NullPointerException as well.
Any ideas on how to use the timestamp from the Kafka message payload itself, instead of the default timestamp field that is set for Kafka v0.10+?
EDIT:
Full config:
{ "name": "<name>",
"config": {
"connector.class":"io.confluent.connect.s3.S3SinkConnector",
"tasks.max":"4",
"topics":"<topic>",
"flush.size":"100",
"s3.bucket.name":"<bucket name>",
"s3.region": "<region>",
"s3.part.size":"<partition size>",
"rotate.schedule.interval.ms":"86400000",
"key.converter": "org.apache.kafka.connect.storage.StringConverter",
"value.converter": "org.apache.kafka.connect.json.JsonConverter",
"key.converter.schemas.enable": "false",
"value.converter.schemas.enable": "false",
"storage.class":"io.confluent.connect.s3.storage.S3Storage",
"format.class":"io.confluent.connect.s3.format.json.JsonFormat",
"locale":"ENGLISH",
"timezone":"UTC",
"schema.generator.class":"io.confluent.connect.storage.hive.schema.TimeBasedSchemaGenerator",
"partitioner.class":"io.confluent.connect.storage.partitioner.TimeBasedPartitioner",
"partition.duration.ms": "3600000",
"path.format": "'year'=YYYY/'month'=MM/'day'=dd",
"timestamp.extractor":"RecordField",
"timestamp.field":"message.timestamp",
"max.poll.interval.ms": "600000",
"request.timeout.ms": "610000",
"heartbeat.interval.ms": "6000",
"session.timeout.ms": "20000",
"s3.acl.canned":"bucket-owner-full-control"
}
}
EDIT 2:
Kafka message payload structure:
{
  "reference": "",
  "clientId": "",
  "gid": "",
  "timestamp": "2019-03-19T15:27:55.526Z"
}
EDIT 3:
{
"transforms": "convert_op_creationDateTime",
"transforms.convert_op_creationDateTime.type": "org.apache.kafka.connect.transforms.TimestampConverter$Value",
"transforms.convert_op_creationDateTime.target.type": "Timestamp",
"transforms.convert_op_creationDateTime.field": "timestamp",
"transforms.convert_op_creationDateTime.format": "yyyy-MM-dd'T'HH:mm:ss.SSSXXX"
}
So I tried adding a transform on the value, but now I'm stuck again: the pattern is rejected as invalid. Looking around the internet it does seem to be a valid SimpleDateFormat pattern, yet it appears to be complaining about the 'T'. I've updated the message payload structure above as well.
Based on the schema you've shared, you should be setting:
"timestamp.extractor":"RecordField",
"timestamp.field":"timestamp",
i.e. no message prefix to the timestamp field name.
If the data is a string, then Connect will try to parse it as milliseconds (see the extractor's source code).
In any case, message.timestamp assumes the data looks like { "message" : { "timestamp": ... } }, so just timestamp would be correct. And extracting nested fields didn't use to be possible anyway, so you might want to clarify which version of Connect you have.
I'm not entirely sure how you would get instanceof Date to evaluate to true when using the JSON converter, and even if you had set schemas.enable = true, the code only has conditions for number and string schema types, and it still assumes the value is milliseconds.
You can try using the TimestampConverter transformation to convert your date string.
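Putting both suggestions together, the relevant part of the connector config would look something like the fragment below. This is only a sketch: the transform alias tsConvert is just a name I picked, and I haven't verified it against your Connect version.
"timestamp.extractor": "RecordField",
"timestamp.field": "timestamp",
"transforms": "tsConvert",
"transforms.tsConvert.type": "org.apache.kafka.connect.transforms.TimestampConverter$Value",
"transforms.tsConvert.field": "timestamp",
"transforms.tsConvert.target.type": "Timestamp",
"transforms.tsConvert.format": "yyyy-MM-dd'T'HH:mm:ss.SSSXXX"
The idea is that the SMT converts the string field into a proper Timestamp before the TimeBasedPartitioner's RecordField extractor reads it.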

Use sprintf syntax inside logstash's sprintf syntax

For the below data structure:
{
  "sprints": [
    {
      "id": 17193,
      "name": "Sprint 12"
    },
    {
      "id": 16510,
      "name": "Sprint 11"
    }
  ],
  "velocityStatEntries": {
    "16510": {
      "estimated": {
        "value": 49
      },
      "completed": {
        "value": 36
      }
    },
    "17193": {
      "estimated": {
        "value": 52
      },
      "completed": {
        "value": 70
      }
    }
  }
}
Given this, I want to produce an Elasticsearch object that's easier to handle, by adding the values of the estimated and completed fields to the sprints with their matching IDs.
Ideally, I would like to handle this without writing Ruby, but I am not finding a Logstash-native solution that handles this scenario.
First, I split the data on the sprints field using the split filter, so I only have a single sprints object and can use [sprints][id] to know which sprint I'm processing.
Then, I have attempted to work with the mutate filter, in one of two ways:
- using merge to add the matching [velocityStatEntries] object to the current sprint
- using add_field to add the two fields I need
Syntactically, is this possible? Ideally, I would want to do a 'double substitution' of sorts, obtaining the estimated time for the current sprint with something like:
add_field => {
  "estimatedTime" => "%{[velocityStatEntries][%{[sprints][id]}][estimated][value]}"
}
but this only seems to work with a hardcoded reference such as "estimatedTime" => "%{[velocityStatEntries][1234][estimated][value]}".
Do I have to use the Ruby format for this?
For what it's worth, the Ruby solution is very simple:
ruby {
  code => "
    sprintId = event.get('[sprints][id]')
    estimated = event.get('[velocityStatEntries][' + sprintId.to_s + '][estimated][value]')
    completed = event.get('[velocityStatEntries][' + sprintId.to_s + '][completed][value]')
    event.set('[sprints][estimatedUnits]', estimated)
    event.set('[sprints][completedUnits]', completed)
  "
}

Logstash: modify apache date format

The grok filter %{COMBINEDAPACHELOG} captures the timestamp as dd/MMM/yyyy:HH:mm:ss Z, however I need the timestamp in the format yyyy-MM-dd HH:mm:ss.
I tried the below configuration
grok {
  match => [ "message", "%{COMBINEDAPACHELOG}" ]
  break_on_match => false
}
date {
  match => [ "timestamp", "yyyy-MM-dd HH:mm:ss" ]
  target => ["datetime"]
}
but got the below parsing error:
Failed parsing date from field {:field=>"timestamp", :value=>"19/May/2012:12:40:18 -0700", :exception=>java.lang.IllegalArgumentException: Invalid format: "19/May/2012:12:40:18 -0700" is malformed at "/May/2012:12:40:18 -0700", :level=>:warn}
I would highly appreciate it if anyone could throw more light on this.
The COMBINEDAPACHELOG pattern is expecting the date in the log entry to match the format so it can shove it into the "timestamp" field. It doesn't format your timestamp at all.
Once the date has been grok'ed out into "timestamp", you can use the date{} filter to move it into @timestamp. The pattern you supply there should match whatever's in the field.
So, pass "dd/MMM/yyyy:HH:mm:ss Z" as the format to date{} and you should be all set.
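For example:
date {
  match => [ "timestamp", "dd/MMM/yyyy:HH:mm:ss Z" ]
}
With no target set, the parsed value goes into @timestamp.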
EDIT:
Based on your additional details, I was hoping that you could match each component of the input date and then combine them into a new field. That would work if you were trying to swap, say, firstName and lastName in a string, but dates are more complicated. A simple string swap wouldn't handle converting "Jan" to "01" or deal with timezones at all.
So, we're back to creating a date object and then outputting that as a string in the format you desire.
# convert "timestamp" to a date field "datetime"
date {
match => [ "timestamp", "dd/MMM/yyyy:HH:mm:ss Z" ]
target => ["datetime"]
}
# convert "datetime" to a string "datestring"
ruby {
code => "
event['datestring'] = event['datetime'].strftime('%Y-%m-%d %H:%M:%S')
"
}
For the latest version of Logstash the code would be:
# convert "datetime" to a string "datestring"
ruby {
code => "event.set('datestring', event.get('datetime').strftime('%Y-%m-%d %H:%M:%S'))"
}

ElasticSearch analyzed fields

I'm building my search but need to analyze one field with two different analyzers: one for stemming (snowball) and one that keeps the full value as a single token (keyword). I can get indexing to work with the following index settings:
curl -X PUT "http://localhost:9200/$IndexName/" -d '{
  "settings": {
    "analysis": {
      "analyzer": {
        "analyzer1": {
          "type": "custom",
          "tokenizer": "keyword",
          "filter": [ "standard", "lowercase", "stop", "snowball", "my_synonyms" ]
        }
      },
      "filter": {
        "my_synonyms": {
          "type": "synonym",
          "synonyms_path": "synonyms.txt"
        }
      }
    }
  },
  "mappings": {
    "product": {
      "properties": {
        "title": {
          "type": "string",
          "search_analyzer": "analyzer1",
          "index_analyzer": "analyzer1"
        }
      }
    }
  }
}';
The problem comes when searching on a single word in the title field. If it's populated with The Cat in the Hat it will store it as "The Cat in the Hat" but if I search for cats I get nothing returned.
Is this even possible to accomplish or do I need to have 2 separate fields and analyze one with keyword and the other with snowball?
I'm using NEST in VB code to index the data, if that matters.
Thanks
Robert
You can apply two different analyzers to the same field using the fields property (previously known as multi fields).
My VB.NET is a bit rusty, so I hope you don't mind the C# examples. If you're using the latest code from the dev branch, Fields was just added to each core mapping descriptor so you can now do this:
client.Map<Foo>(m => m
.Properties(props => props
.String(s => s
.Name(o => o.Bar)
.Analyzer("keyword")
.Fields(fs => fs
.String(f => f
.Name(o => o.Bar.Suffix("stemmed"))
.Analyzer("snowball")
)
)
)
)
);
Otherwise, if you're using NEST 1.0.2 or earlier (which you likely are), you have to accomplish this via the older multi field type way:
client.Map<Foo>(m => m
.Properties(props => props
.MultiField(mf => mf
.Name(o => o.Bar)
.Fields(fs => fs
.String(s => s
.Name(o => o.Bar)
.Analyzer("keyword"))
.String(s => s
.Name(o => o.Bar.Suffix("stemmed"))
.Analyzer("snowball"))
)
)
)
);
Both ways are supported by Elasticsearch and do the exact same thing: they apply the keyword analyzer to the primary bar field and the snowball analyzer to the bar.stemmed field. stemmed is of course just the suffix I chose in these examples; you can use whatever suffix name you desire. In fact, you don't need a suffix at all - you can name the multi field something completely different from the primary field.
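For reference, both snippets should produce a mapping roughly like the sketch below (using the multi-field syntax of the Elasticsearch 1.x line the question targets; bar stands in for your title field):
"bar": {
  "type": "string",
  "analyzer": "keyword",
  "fields": {
    "stemmed": {
      "type": "string",
      "analyzer": "snowball"
    }
  }
}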

Rally Lookback: help fetching all history based on future state

Probably a lookback newbie question, but how do I return all of the history for stories based on an attribute that gets set later in their history?
Specifically, I want to load all of the history for all stories/defects in my project that have an accepted date in the last two weeks.
The following query doesn't work because it (of course) only returns those history records where the accepted date matches the query. What I actually want is all of the history records for any defect/story that is eventually accepted after that date...
filters: [
  {
    property: "_TypeHierarchy",
    value: { $nin: [ -51009, -51012, -51031, -51078 ] }
  },
  {
    property: "_ProjectHierarchy",
    value: this.getContext().getProject().ObjectID
  },
  {
    property: "AcceptedDate",
    value: { $gt: Ext.Date.format(twoWeeksBack, 'Y-m-d') }
  }
]
Thanks to Nick's help, I divided this into two queries. The first grabs the final history record for stories/defects with an accepted date. I accumulate the ObjectIDs from that list and then kick off the second query, which fetches the entire history for each object returned by the first query (a sketch of that glue code follows the two filter sets below).
Note that I'm caching some variables in the "window" scope - that's my lame workaround to the fact that I can't ever quite figure out the context of "this" when I need it...
window.projectId = this.getContext().getProject().ObjectID;
I also end up flushing window.objectIds (where I store the results from the first query) when I execute the query, so I don't accumulate results across reloads. I'm sure there's a better way to do this, but I struggle with scope in JavaScript.
Filter for the first query:
filters : [ {
property : "_TypeHierarchy",
value : {
$nin : [ -51009, -51012, -51031, -51078 ]
}
}, {
property : "_ProjectHierarchy",
value : window.projectId
}, {
property : "AcceptedDate",
value : {
$gt : Ext.Date.format(monthBack, 'Y-m-d')
}
}, {
property : "_ValidTo",
value : {
$gt : '3000-01-01'
}
} ]
Filter for second query:
filters : [ {
property : "_TypeHierarchy",
value : {
$nin : [ -51009, -51012, -51031, -51078 ]
}
}, {
property : "_ProjectHierarchy",
value : window.projectId
}, {
property : "ObjectID",
value : {
$in : window.objectIds
}
}, {
property : "c_Kanban",
value : {
$exists : true
}
} ]
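And here is a rough sketch of the glue between the two queries. It assumes the Rally App SDK 2.x SnapshotStore; firstQueryFilters stands for the first filter array above, and loadFullHistory is a hypothetical helper that runs the second query with the second filter set:
window.objectIds = [];
Ext.create('Rally.data.lookback.SnapshotStore', {
    autoLoad: true,
    fetch: ['ObjectID'],
    filters: firstQueryFilters,  // the first filter array above
    listeners: {
        load: function(store, records) {
            // collect the ObjectIDs of the final snapshots...
            Ext.Array.each(records, function(rec) {
                window.objectIds.push(rec.get('ObjectID'));
            });
            // ...then kick off the second query with them
            loadFullHistory(window.objectIds);  // hypothetical helper using the second filter set
        }
    }
});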
Here's an alternative query that will return only the snapshots that represent transition into the Accepted state.
find:{
_TypeHierarchy: { $in : [ -51038, -51006 ] },
_ProjectHierarchy: 999999,
ScheduleState: { $gte: "Accepted" },
"_PreviousValues.ScheduleState": {$lt: "Accepted", $exists: true},
AcceptedDate: { $gte: "2014-02-01TZ" }
}
A second query is still required if you need the full history of the stories/defects. This should at least give you a cleaner initial list. Also note that Project: 999999 limits to the given project, while _ProjectHierarchy finds stories/defects in the child projects, as well.
In case you are interested, the query is similar to scenario #5 in the Lookback API documentation at https://rally1.rallydev.com/analytics/doc/.
If I understand the question, you want to get stories that are currently accepted, but you want the returned results to include snapshots from the time when they were not yet accepted. Before you write code, you can test an equivalent query in the browser and see if the results look as expected.
Here is an example - you will have to change OIDs.
https://rally1.rallydev.com/analytics/v2.0/service/rally/workspace/12352608129/artifact/snapshot/query.js?find={"_ProjectHierarchy":12352608219,"_TypeHierarchy":"HierarchicalRequirement","ScheduleState":"Accepted",_ValidFrom:{$gte: "2013-11-01",$lt: "2014-01-01"}}&sort=[{"ObjectID": 1},{_ValidFrom: 1}]&fields=["Name","ScheduleState","PlanEstimate"]&hydrate=["ScheduleState"]
You are correct that a query like this: find={"AcceptedDate":{$gt:"2014-01-01T00:00:00.000Z"}}
will return one snapshot per story that satisfies it.
https://rally1.rallydev.com/analytics/v2.0/service/rally/workspace/12352608129/artifact/snapshot/query.js?find={"AcceptedDate":{$gt:"2014-01-01T00:00:00.000Z"}}&fields=true&start=0&pagesize=1000
but a query like this: find={"ObjectID":{$in:[16483705391,16437964257,14943067452]}}
will return the whole history of the 3 artifacts:
https://rally1.rallydev.com/analytics/v2.0/service/rally/workspace/12352608129/artifact/snapshot/query.js?find={"ObjectID":{$in:[16483705391,16437964257,14943067452]}}&fields=true&start=0&pagesize=1000
To illustrate, here are some numbers: the last query returns 17 results for me. Checking each story's revision history, the numbers of revisions per story are 5, 5, and 7 respectively, the sum of which equals the total result count returned by the query.
On the other hand the number of stories that meet find={"AcceptedDate":{$gt:"2014-01-01T00:00:00.000Z"}} is 13. And the query based on the accepted date returns 13 results, one snapshot per story.