Aggregating Fields from AWS CloudWatch Logs

Aggregating Fields from AWS CloudWatch Logs - amazon-cloudwatch

I am pretty new to CloudWatch unfortunately, and have a rather complex task. So after some extensive work, we have some logs that break down all of our clicks into reasons that it was tossed or did not work as good as it could have. It comes in with a title of "Successful Parse Report".
Successful Parse Report: {
"<IDENTIFIER>": {
"b39fjoa...................................................": {
"errorCounts": {},
"recordCounts": {
"optouts": 0,
"clicks": 100
},
"warningCounts": {
"<FIELD 1>": {
"<REASON1>": 1
},
"<FIELD2>": {
"<REASON2>": 100
},
"<FIELD3>": {
"<REASON3>": 100
},
"<FIELD4>": {
"<REASON4>": 12
},
"<FIELD5>": {
"<REASON5>": 9
},
"<FIELD6>": {
"<REASON6>": 14
}
}
}
}
}
As you can see, there are a few fields. This means 100 clicks happened, and each "REASON" is a different thing I need aggregated with the "FIELD" as the identifier on the graph.I also need this done on all logs for each of the lambdas I am using this for.
My question is, knowing each log I am trying to aggregate starts with the same "Successful Parse Report", how can I possibly aggregate all of the clicks from all of my logs, have each "REASON" aggregate across all the same logs, have the "FIELD" be the identifier for each "REASON", and graph it to be able to visualize these counts in real time as they come in.

Related

Is there a way to use the graphLookup aggregation pipeline stage for arrays?

I am currently working on an application that uses MongoDB as the data repository. I am mainly concerned about the graphLookup query to establish links between different people, based on what flights they took. My document contains an array field, that in turn contains key value pairs. I need to establish the links based on one of the key:value pairs of that array.
I have already tried some queries of aggregation pipeline with $graphLookup as one of the stages and they have all worked fine. But now that I am trying to use it with an array, I am hitting a blank.
Below is the array field from the first document :
"movementSegments":[
{
"carrierCode":"MO269",
"departureDateTimeMillis":1550932676000,
"arrivalDateTimeMillis":1551019076000,
"departurePort":"DOH",
"arrivalPort":"LHR",
"departurePortText":"HAMAD INTERNATIONAL AIRPORT",
"arrivalPortText":"LONDON HEATHROW",
"serviceNameText":"",
"serviceKey":"BA007_1550932676000",
"departurePortLatLong":"25.273056,51.608056",
"arrivalPortLatLong":"51.4706,-0.461941",
"departureWeeklyTemporalSpatialWindow":"DOH_8",
"departureMonthlyTemporalSpatialWindow":"DOH_2",
"arrivalWeeklyTemporalSpatialWindow":"LHR_8",
"arrivalMonthlyTemporalSpatialWindow":"LHR_2"
}
]
The other document has the below field :
"movementSegments":[
{
"carrierCode":"MO269",
"departureDateTimeMillis":1548254276000,
"arrivalDateTimeMillis":1548340676000,
"departurePort":"DOH",
"arrivalPort":"LHR",
"departurePortText":"HAMAD INTERNATIONAL AIRPORT",
"arrivalPortText":"LONDON HEATHROW",
"serviceNameText":"",
"serviceKey":"BA003_1548254276000",
"departurePortLatLong":"25.273056,51.608056",
"arrivalPortLatLong":"51.4706,-0.461941",
"departureWeeklyTemporalSpatialWindow":"DOH_4",
"departureMonthlyTemporalSpatialWindow":"DOH_1",
"arrivalWeeklyTemporalSpatialWindow":"LHR_4",
"arrivalMonthlyTemporalSpatialWindow":"LHR_1"
},
{
"carrierCode":"MO270",
"departureDateTimeMillis":1548254276000,
"arrivalDateTimeMillis":1548340676000,
"departurePort":"DOH",
"arrivalPort":"LHR",
"departurePortText":"HAMAD INTERNATIONAL AIRPORT",
"arrivalPortText":"LONDON HEATHROW",
"serviceNameText":"",
"serviceKey":"BA003_1548254276000",
"departurePortLatLong":"25.273056,51.608056",
"arrivalPortLatLong":"51.4706,-0.461941",
"departureWeeklyTemporalSpatialWindow":"DOH_4",
"departureMonthlyTemporalSpatialWindow":"DOH_1",
"arrivalWeeklyTemporalSpatialWindow":"LHR_4",
"arrivalMonthlyTemporalSpatialWindow":"LHR_1"
}
]
And I am running the below query :
db.person_events.aggregate([
{ $match: { eventId: "22446688" } },
{
$graphLookup: {
from: 'person_events',
startWith: '$movementSegments.carrierCode',
connectFromField: 'carrierCode',
connectToField: 'carrierCode',
as: 'carrier_connections'
}
}
])
The above query creates an array field in the document, but there are no values in it. As per the expectation, both my documents should get linked based on the carrier number.
Just to be clear about the query, the documents contain an eventId field, and the match pipeline returns one document to me after the match stage.

Well, I don't know how I missed it, but here is the solution to my problem which gives me the required results :
db.person_events.aggregate([
{ $match: { eventId: "22446688" } },
{
$graphLookup: {
from: 'person_events',
startWith: '$movementSegments.carrierCode',
connectFromField: 'movementSegments.carrierCode',
connectToField: 'movementSegments.carrierCode',
as: 'carrier_connections'
}
}
])

Use sprintf syntax inside logstash's sprintf syntax

For the below data structure:
{
"sprints": [
{
"id": 17193,
"name": "Sprint 12"
},
{
"id": 16510,
"name": "Sprint 11"
}
],
"velocityStatEntries": {
"16510": {
"estimated": {
"value": 49
},
"completed": {
"value": 36
}
},
"17193": {
"estimated": {
"value": 52
},
"completed": {
"value": 70
}
}
}
}
Given this, I want to be able to produce an Elasticsearch object that's easier to handle, by adding the values of the Estimated and Completed fields to the sprints with their matching IDs.
Ideally, I would like to handle this without writing Ruby, but I am not finding a logstash-native solution that handles this scnenario.
First, I split the data on the sprints field using split, so, I only have a single sprints object, and can use [sprints][id] to know what sprint I'm processing.
Then, I have attempted to work with the mutate filter, in one of two ways:
- using merge to add the [velocityStateEntries][] object to the
current sprint
- using add_field to add the two fields I need
Syntactically, is this possible? Ideally, I would want to be able to do a 'double substitution' of sorts, obtaining the estimated time for the current sprint something like:
add_field => {
"estimatedTime" => "%{[velocityStatEntries][%{[sprints][id]}][estimated][value]}"
}
but this only seems to work with a hardcoded format such as "estimatedTime" => "%{[velocityStatEntries][1234][estimated][value]}"
Do I have to use the Ruby format for this?

For what it's worth, the Ruby solution is very simple:
ruby {
code => "
sprintId = event.get('[sprints][id]');
estimated = event.get('[velocityStatEntries]['+(sprintId).to_s+'][estimated][value]');
completed = event.get('[velocityStatEntries]['+(sprintId).to_s+'][completed][value]');
event.set('[sprints][estimatedUnits]', estimated);
event.set('[sprints][completedUnits]', completed);
"
}

Collapsing a group using Google Sheets API

So as a workaround to difficulties creating a new sheet with groups I am trying to create and collapse these groups in a separate call to batchUpdate. I can call request an addDimensionGroup successfully, but when I request updateDimensionGroup to collapse the group I just created, either in the same API call or in a separate one, I get this error:
{
"error": {
"code": 400,
"message": "Invalid requests[1].updateDimensionGroup: dimensionGroup.depth must be \u003e 0",
"status": "INVALID_ARGUMENT"
}
}
But I'm passing depth as 0 as seen by the following JSON which I send in my request:
{
"requests":[{
"addDimensionGroup":{
"range":{
"dimension":"ROWS",
"sheetId":0,
"startIndex":2,
"endIndex":5}
}
},{
"updateDimensionGroup":{
"dimensionGroup":{
"range": {
"dimension":"ROWS",
"sheetId":0,
"startIndex":2,
"endIndex":5
},
"depth":0,
"collapsed":true
},
"fields":"*"
}
}],
"includeSpreadsheetInResponse":true}',
...
I'm not entirely sure what I am supposed to provide for "fields", the documentation for UpdateDimensionGroupRequest says it is supposed to be a string ("string ( FieldMask format)"), but the FieldMask definition itself shows the possibility of multiple paths, and doesn't tell me how they are supposed to be separated in a single string.
What am I doing wrong here?

The error message is actually instructing you that the dimensionGroup.depth value must be > 0:
If you call spreadsheets.get() on your sheet, and request only the DimensionGroup data, you'll note that your created group is actually at depth 1:
GET https://sheets.googleapis.com/v4/spreadsheets/{SSID}?fields=sheets(rowGroups)&key={API_KEY}
This makes sense, since the depth is (per API spec):
depth numberThe depth of the group, representing how many groups have a range that wholly contains the range of this group.
Note that any given particular DimensionGroup "wholly contains its own range" by definition.
If your goal is to change the status of the DimensionGroup, then you need to set its collapsed property:
{
"requests":
[
{
"updateDimensionGroup":
{
"dimensionGroup":
{
"range":
{
"sheetId": <your sheet id>,
"dimension": "ROWS",
"startIndex": 2,
"endIndex": 5
},
"collapsed": true,
"depth": 1
},
"fields": "collapsed"
}
}
]
}
For this particular Request, the only attribute you can set is collapsed - the other properties are used to identify the desired DimensionGroup to manipulate. Thus, specifying fields: "*" is equivalent to fields: "collapsed". This is not true for the majority of requests, so specifying fields: "*" and then omitting a non-required request parameter is interpreted as "Delete that missing parameter from the server's representation".
To change a DimensionGroup's depth, you must add or remove other DimensionGroups that encompass it.

square connect api batch processing

I need assistance with batch processing, especially in adding tax codes to items.
I'm experimenting with the square batch processing feature and my sample cases are create 2 items and add the tax code to them. In all 4 requests - 2 for creating item, 2 to 'put' the tax code. I have tried the following orders:
1. create the two items; add the taxes
2. create one item; add tax code to that item; create second item, add code to the second item.
In both instances, the result is the same - the taxes are applied to only one item. For the second item, the response I get is:
{
"status_code":404,
"body":{
"type":"not_found",
"message":"NotFound"
},
"request_id":4
}
To help with the investigation, here's the sample json that I use in the cURL request.
{
"requests":[
{
"method":"POST",
"relative_path":"\/v1\/me\/items",
"access_token":"XXX-YYY",
"body":
{
"id":126,
"name":"TestItem",
"description":"TestItemDescription",
"category_id":"DF1F51FB-11D6-4232-B138-2ECE3D89D206",
"variations":[
{
"name":"var1",
"pricing_type":"FIXED_PRICING",
"price_money":
{
"currency_code":"CAD",
"amount":400
},
"sku":"123444:QWEFASDERRG"
}
]},
"request_id":1
},
{
"method":"PUT",
"relative_path":"\/v1\/me\/items\/126\/fees\/7F2D50D8-43C1-4518-8B8D-881CBA06C7AB",
"access_token":"XXX-YYY",
"request_id":2
},
{
"method":"POST",
"relative_path":"\/v1\/me\/items",
"access_token":"XXX-YYY",
"body":
{
"id":127,
"name":"TestItem1",
"description":"TestItemDescription1",
"category_id":"DF1F51FB-11D6-4232-B138-2ECE3D89D206",
"variations":[
{
"name":"var1",
"pricing_type":"FIXED_PRICING",
"price_money":
{
"currency_code":"CAD",
"amount":400
},
"sku":"123444:QWEFASDERRG1"
}
]
},
"request_id":3
},
{
"method":"PUT",
"relative_path":"\/v1\/me\/items\/127\/fees\/7F2D50D8-43C1-4518-8B8D-881CBA06C7AB",
"access_token":"XXX-YYY",
"request_id":4
}
]
}
Below is the full response that I receive indicating successful creation of two items and only one successful tax push.
[
{
"status_code":200,
"body":
{
"visibility":"PUBLIC",
"available_online":false,
"available_for_pickup":false,
"id":"126",
"description":"TestItemDescription",
"name":"TestItem",
"category_id":"DF1F51FB-11D6-4232-B138-2ECE3D89D206",
"category":
{
"id":"DF1F51FB-11D6-4232-B138-2ECE3D89D206",
"name":"Writing Instruments"
},
"variations":[
{
"pricing_type":"FIXED_PRICING",
"track_inventory":false,
"inventory_alert_type":"NONE",
"id":"4c70909b-90bd-4742-b772-e4fabe636557",
"name":"var1",
"price_money":
{
"currency_code":"CAD",
"amount":400
},
"sku":"123444:QWEFASDERRG",
"ordinal":1,
"item_id":"126"
}
],
"modifier_lists":[],
"fees":[],
"images":[]
},
"request_id":1
},
{
"status_code":200,
"body":{},
"request_id":2
},
{
"status_code":200,
"body":
{
"visibility":"PUBLIC",
"available_online":false,
"available_for_pickup":false,
"id":"127",
"description":"TestItemDescription1",
"name":"TestItem1",
"category_id":"DF1F51FB-11D6-4232-B138-2ECE3D89D206",
"category":
{
"id":"DF1F51FB-11D6-4232-B138-2ECE3D89D206",
"name":"Writing Instruments"
},
"variations":[
{
"pricing_type":"FIXED_PRICING",
"track_inventory":false,
"inventory_alert_type":"NONE",
"id":"6de8932f-603e-4cd9-99ad-67f6c7777ffd",
"name":"var1",
"price_money":
{
"currency_code":"CAD",
"amount":400
},
"sku":"123444:QWEFASDERRG1",
"ordinal":1,
"item_id":"127"
}
],
"modifier_lists":[],
"fees":[],
"images":[]
},
"request_id":3
},
{
"status_code":404,
"body":
{
"type":"not_found",
"message":"NotFound"
},
"request_id":4
}
]
I have checked through going for the list of items and both items with their item ID's are present in the inventory. So the questions I have are, Why the tax is applied to one item and not to the other? How to resolve it?

From the Square docs:
Note the following when using the Submit Batch endpoint:
You cannot include more than 30 requests in a single batch.
Recursive
requests to the Submit Batch endpoint are not allowed (i.e., none of
the requests included in a batch can itself be a request to this
endpoint).
There is no guarantee of the order in which batched
requests are performed.
(emphasis mine).
If you want to use the batch API, you will have to create parent entities like items first, then in a separate batch request apply any child entities like fees, discounts, etc... Alternately, you can just make separate requests. There may not be much benefit from using the batch API in this case.

Rally Lookback: help fetching all history based on future state

Probably a lookback newbie question, but how do I return all of the history for stories based on an attribute that gets set later in their history?
Specifically, I want to load all of the history for all stories/defects in my project that have an accepted date in the last two weeks.
The following query (below) doesn't work because it (of course) only returns those history records where accepted date matches the query. What I actually want is all of the history records for any defect/story that is eventually accepted after that date...
filters :
[
{
property: "_TypeHierarchy",
value: { $nin: [ -51009, -51012, -51031, -51078 ] }
},
{
property: "_ProjectHierarchy",
value: this.getContext().getProject().ObjectID
},
{
property: "AcceptedDate",
value: { $gt: Ext.Date.format(twoWeeksBack, 'Y-m-d') }
}
]

Thanks to Nick's help, I divided this into two queries. The first grabs the final history record for stories/defects with an accepted date. I accumulate the object ids from that list, then kick off the second query, which finds the entire history for each object returned from the first query.
Note that I'm caching some variables in the "window" scope - that's my lame workaround to the fact that I can't ever quite figure out the context of "this" when I need it...
window.projectId = this.getContext().getProject().ObjectID;
I also end up flushing window.objectIds (where I store the results from the first query) when I exec the query, so I don't accumulate results across reloads. I'm sure there's a better way to do this, but I struggle with scope in javascript.
filter for first query
filters : [ {
property : "_TypeHierarchy",
value : {
$nin : [ -51009, -51012, -51031, -51078 ]
}
}, {
property : "_ProjectHierarchy",
value : window.projectId
}, {
property : "AcceptedDate",
value : {
$gt : Ext.Date.format(monthBack, 'Y-m-d')
}
}, {
property : "_ValidTo",
value : {
$gt : '3000-01-01'
}
} ]
Filter for second query:
filters : [ {
property : "_TypeHierarchy",
value : {
$nin : [ -51009, -51012, -51031, -51078 ]
}
}, {
property : "_ProjectHierarchy",
value : window.projectId
}, {
property : "ObjectID",
value : {
$in : window.objectIds
}
}, {
property : "c_Kanban",
value : {
$exists : true
}
} ]

Here's an alternative query that will return only the snapshots that represent transition into the Accepted state.
find:{
_TypeHierarchy: { $in : [ -51038, -51006 ] },
_ProjectHierarchy: 999999,
ScheduleState: { $gte: "Accepted" },
"_PreviousValues.ScheduleState": {$lt: "Accepted", $exists: true},
AcceptedDate: { $gte: "2014-02-01TZ" }
}
A second query is still required if you need the full history of the stories/defects. This should at least give you a cleaner initial list. Also note that Project: 999999 limits to the given project, while _ProjectHierarchy finds stories/defects in the child projects, as well.
In case you are interested, the query is similar to scenario #5 in the Lookback API documentation at https://rally1.rallydev.com/analytics/doc/.

If I understand the question, you want to get stories that are currently accepted, but you want that the returned results include snapshots from the time when they were not accepted. Before you write code, you may test an equivalent query in the browser and see if the results look as expected.
Here is an example - you will have to change OIDs.
https://rally1.rallydev.com/analytics/v2.0/service/rally/workspace/12352608129/artifact/snapshot/query.js?find={"_ProjectHierarchy":12352608219,"_TypeHierarchy":"HierarchicalRequirement","ScheduleState":"Accepted",_ValidFrom:{$gte: "2013-11-01",$lt: "2014-01-01"}}},sort:[{"ObjectID": 1},{_ValidFrom: 1}]&fields=["Name","ScheduleState","PlanEstimate"]&hydrate=["ScheduleState"]
You are correct that a query like this: find={"AcceptedDate":{$gt:"2014-01-01T00:00:00.000Z"}}
will return one snapshot per story that satisfies it.
https://rally1.rallydev.com/analytics/v2.0/service/rally/workspace/12352608129/artifact/snapshot/query.js?find={"AcceptedDate":{$gt:"2014-01-01T00:00:00.000Z"}}&fields=true&start=0&pagesize=1000
but a query like this: find={"ObjectID":{$in:[16483705391,16437964257,14943067452]}}
will return the whole history of the 3 artifacts:
https://rally1.rallydev.com/analytics/v2.0/service/rally/workspace/12352608129/artifact/snapshot/query.js?find={"ObjectID":{$in:[16483705391,16437964257,14943067452]}}&fields=true&start=0&pagesize=1000
To illustrate, here are some numbers: the last query returns 17 results for me. I check each story's revision history, and the number of revisions per story are 5, 5, 7 respectively, sum of which is equal to the total result count returned by the query.
On the other hand the number of stories that meet find={"AcceptedDate":{$gt:"2014-01-01T00:00:00.000Z"}} is 13. And the query based on the accepted date returns 13 results, one snapshot per story.

We Keep Coding

sql objective-c vba vb.net react-native apache vue.js tensorflow api pandas

Aggregating Fields from AWS CloudWatch Logs - amazon-cloudwatch

Related

Is there a way to use the graphLookup aggregation pipeline stage for arrays?

Use sprintf syntax inside logstash's sprintf syntax

Collapsing a group using Google Sheets API

square connect api batch processing

Rally Lookback: help fetching all history based on future state

Categories

Resources