How to track SLA of VM availability set (or availability zone) through heartbeats with Log Analytics (KQL) - azure-log-analytics

I want to track the SLAs of our VMs in a Monitor Workbook using a Log Analytics query.
For this, I use the 'Heartbeat' table, which gives the heartbeats of each VM.
However, some of our VMs are in an availability set/zone, so the SLA is only broken
if, within a 1-minute interval, the heartbeats of both VMs are missing.
As such I need to be able to group the heartbeats by availability set/zone in the query, but there doesn't seem to be such a property on the heartbeat.
I can use a separate Azure Resource Graph query to search for which VMs are in an availability set/zone, but when I merge this query with my Log Analytics query, I can't do any further Kusto Query Language processing on the query (I can only merge the tables).
For information, these are my Log Analytics Heartbeat query and my Resource Graph SLA query:
let timeRangeStart = {TimeRange:start};
let timeRangeEnd = {TimeRange:end};
Heartbeat
| where ResourceType == "virtualMachines"
| extend ResourceGroup = case(ResourceGroup <> "", ResourceGroup, "On-Prem")
| where TimeGenerated > timeRangeStart and TimeGenerated < timeRangeEnd and Computer in ({Servers})
| extend Resource=tolower(iff(isempty(_ResourceId), Resource, _ResourceId))
| summarize heartbeat_tot = count() by Resource,ResourceGroup, SubscriptionId
| extend total_number_of_buckets=round((timeRangeEnd-timeRangeStart)/1m)
| extend availability_rate = round(heartbeat_tot*100/total_number_of_buckets, 2)
| extend availability_rate = min_of(availability_rate, 100)
| order by availability_rate asc
Resources // VMs
| where type == 'microsoft.compute/virtualmachines'
| extend AvSet = properties.availabilitySet.id
| extend AvZone = properties.availabilityZone.id
| extend VMname_SLA = iff(isnotempty(AvZone), AvZone, iff(isnotempty(AvSet), AvSet, id))
| extend SLA_VM = iff(isnotnull(AvZone), '99.99%', iff(isnotnull(AvSet), '99.95%', ''))
| extend managedBy = tolower(id)
| join kind = leftouter (
Resources // Disks
| where type == 'microsoft.compute/disks'
| where isnotempty(managedBy)
| extend managedBy = tolower(managedBy)
// What do Standard HDD disks have as SKU tag??? I used StandardHDD for the time being
| extend Tier_disk = sku.tier
| extend SLA_disk = iff(Tier_disk == 'StandardHDD', '95%', iff(Tier_disk == 'Standard', '99.5%', '99.9%'))
) on managedBy
| extend SLA_tot = iff(isnotempty(SLA_VM), SLA_VM, SLA_disk)
| project managedBy, VMname_SLA, SLA_tot
| order by managedBy asc

How many resources are there?
If it is not a large number of resources, a workaround would be:
Run your ARG query in a text parameter, and format the query results so that they effectively generate a JSON array of objects with the id, location, etc. that you need. Then mark this parameter as hidden.
In your Logs query, reference that parameter's JSON text before the query, and use KQL operators to turn that JSON structure into a table. You can then join/filter on that table in the query.
It isn't optimal, and it won't work well with large numbers of resources, since every time you run your query you're effectively "uploading" a JSON blob and then immediately parsing it apart again.
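The grouping rule from the question (a minute only counts as down when every VM in the same availability set/zone missed its heartbeat) can be sketched outside of KQL. The Python below is only an illustration under assumptions: `vm_to_group` stands in for the Resource Graph result, the heartbeat pairs stand in for `Heartbeat` rows bucketed to the minute, and none of these names come from Azure.

```python
from collections import defaultdict

def group_availability(heartbeats, vm_to_group, total_minutes):
    """Return availability (%) per availability group.

    heartbeats: iterable of (vm_id, minute_index) pairs (hypothetical shape).
    vm_to_group: dict mapping vm_id to its availability set/zone id;
                 VMs missing from the dict form their own group.
    """
    covered = defaultdict(set)  # group -> minutes with at least one heartbeat
    for vm_id, minute in heartbeats:
        group = vm_to_group.get(vm_id, vm_id)
        covered[group].add(minute)
    # Groups without any heartbeat at all never appear in the result.
    return {
        group: round(100.0 * len(minutes) / total_minutes, 2)
        for group, minutes in covered.items()
    }

# Two VMs share an availability set; vm-c stands alone.
vm_to_group = {"vm-a": "avset-1", "vm-b": "avset-1"}
heartbeats = [
    ("vm-a", 0), ("vm-a", 1),   # vm-a misses minutes 2 and 3
    ("vm-b", 1), ("vm-b", 2),   # vm-b misses minutes 0 and 3
    ("vm-c", 0), ("vm-c", 1), ("vm-c", 2), ("vm-c", 3),
]
result = group_availability(heartbeats, vm_to_group, total_minutes=4)
```

Here only minute 3 is missed by both VMs of `avset-1`, so the set is counted as unavailable for one minute out of four, even though each VM on its own missed two.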

Related

How to filter a date-field with a swift vapor-fluent query

To avoid multiple inserts of the same person in a database, I wrote the following function:
func anzahlDoubletten(_ req: Request, nname: String, vname: String, gebTag: Date)
async throws -> Int {
try await
Teilnehmer.query(on: req.db)
.filter(\.$nname == nname)
.filter(\.$vname == vname)
.filter(\.$gebTag == gebTag)
.count()
}
The function always returns 0, even if there are multiple records with the same surname, first name and birthday in the database.
Here is the resulting sql-query:
[ DEBUG ] SELECT COUNT("teilnehmer"."id") AS "aggregate" FROM "teilnehmer" WHERE "teilnehmer"."nname" = $1 AND "teilnehmer"."vname" = $2 AND "teilnehmer"."geburtstag" = $3 ["neumann", "alfred e.", 1999-09-09 00:00:00 +0000] [database-id: psql, request-id: 1AC70C41-EADE-43C2-A12A-99C19462EDE3] (FluentPostgresDriver/FluentPostgresDatabase.swift:29)
[ INFO ] anzahlDoubletten=0 [request-id: 1AC70C41-EADE-43C2-A12A-99C19462EDE3] (App/Controllers/TeilnehmerController.swift:49)
If I query the database directly, I get:
lwm=# select nname, vname, geburtstag from teilnehmer;
nname | vname | geburtstag
---------+-----------+------------
neumann | alfred e. | 1999-09-09
neumann | alfred e. | 1999-09-09
neumann | alfred e. | 1999-09-09
neumann | alfred e. | 1999-09-09
so count() should return 4 not 0:
lwm=# select count(*) from teilnehmer where nname = 'neumann' and vname = 'alfred e.' and geburtstag = '1999-09-09';
count
-------
4
My DateFormatter is defined like so:
let dateFormatter = ISO8601DateFormatter()
dateFormatter.formatOptions = [.withFullDate, .withDashSeparatorInDate]
And finally the attribute "birthday" in my model:
...
@Field(key: "geburtstag")
var gebTag: Date
...
I inserted the 4 alfreds in my database using the model and fluent, passing the birthday "1999-09-09" as a String and fluent inserted all records correctly.
But .filter(\.$gebTag == gebTag) seems to always evaluate to 'false'.
Is it at all possible to use .filter() with data types other than String?
And if so, what am I doing wrong?
Many thanks for your help
Michael
The problem you've hit is that you're storing only dates whereas you're filtering on dates with times. Unfortunately there's no native way to store just a date. However there are a few options.
The easiest way is to change the date field to a String and then use your date formatter (make sure you remove the time part) to convert the query option to a String.
I am guessing slightly here, but I suspect that your table was not created by a Migration? If it had been, your geburtstag field would include a time component as this is the default and you would have spotted the problem quickly.
In any event, the filter is actually filtering on the time component of gebTag as well as the date. This is why it is returning zero.
I suggest converting the geburtstag to a type that includes the time and ensuring that the time component is set to 0:00:00 when you store it. You can reset the time component to 'midnight' using something like this:
extension Date {
var midnight: Date { return Calendar.current.date(bySettingHour: 0, minute: 0, second: 0, of: self)! }
}
Then change your filter to:
.filter(\.$gebTag == gebTag.midnight)
Alternatively, just use the startOfDay(for:) method on Calendar:
.filter(\.$gebTag == Calendar.current.startOfDay(for: gebTag))
I think this is the most straightforward way of doing it.
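The mismatch described in these answers is not specific to Swift. As a language-neutral illustration (plain Python, not Fluent code), a timestamp that carries a time of day never equals the same calendar day stored at midnight until it is normalized. The variable names and the 14:30 time are made up for the example.

```python
from datetime import datetime, timezone

def start_of_day(ts: datetime) -> datetime:
    """Reset the time component to 00:00:00, keeping the date and tzinfo."""
    return ts.replace(hour=0, minute=0, second=0, microsecond=0)

# What a date-only column effectively holds: the day at midnight.
stored = datetime(1999, 9, 9, tzinfo=timezone.utc)
# A filter value that still carries a time of day (hypothetical).
query_value = datetime(1999, 9, 9, 14, 30, tzinfo=timezone.utc)

matches_raw = stored == query_value                        # time part differs
matches_normalized = stored == start_of_day(query_value)   # same day at midnight
```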

NextgenSplunk: Need help forming a splunk query which takes sessionId from a particular set of logs, use it to form next query

I need to form a Splunk query to find a particular sessionId for which log a is available but log b is not. Both are part of the same transaction but code breaking in between somewhere.
LOGGER.info("Log a:: setting some details in session");
Response response = handler.transactionMethod(token); //throws some exception
LOGGER.info("Log b:: getting details in session");
So in the success scenario, both Log a and Log b will be printed. But when transactionMethod throws an exception, only Log a will be printed for that sessionId and not Log b.
The requirement is I need to find any of the sessionId for which only Log a is present, not Log b.
Assuming that you have 2 fields TEXT and SessionID already defined, we will use the following test data:
SessionID=1001 TEXT="setting"
SessionID=1001 TEXT="getting"
SessionID=1002 TEXT="setting"
SessionID=1003 TEXT="getting"
Splunk query:
| makeresults count=4
| streamstats count
| eval TEXT=case(count=1 OR count=3, "setting", count=2 OR count=4, "getting")
| eval SessionID=case(count=1 OR count=2, 1001, count=3, 1002, count=4, 1003)
``` The above is just setting of the test data ```
``` Below is the actual SPL for the task ```
| stats count(eval(TEXT="setting")) as LogA count(eval(TEXT="getting")) as LogB by SessionID
| search LogA>0 AND LogB=0
As you can see, I specifically excluded the case where only the "LogB" record is present (SessionID=1003).
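The counting logic behind the `stats`/`search` pair can also be written out step by step. This is a minimal Python sketch of the same idea, not Splunk code; the `(session_id, text)` event shape is an assumption that mirrors the test data above.

```python
from collections import Counter

def sessions_with_only_log_a(events):
    """Return session ids that logged "setting" but never "getting".

    events: iterable of (session_id, text) pairs (hypothetical shape).
    """
    log_a = Counter()  # per-session count of "setting" events
    log_b = Counter()  # per-session count of "getting" events
    for session_id, text in events:
        if text == "setting":
            log_a[session_id] += 1
        elif text == "getting":
            log_b[session_id] += 1
    # Equivalent of: LogA > 0 AND LogB = 0
    return sorted(s for s in log_a if log_a[s] > 0 and log_b[s] == 0)

events = [
    (1001, "setting"), (1001, "getting"),
    (1002, "setting"),
    (1003, "getting"),
]
only_a = sessions_with_only_log_a(events)
```

With this test data only session 1002 qualifies: 1001 has both logs and 1003 has only Log b.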

Process fields with nested arrays into strings with strcat_array for output in Kusto

I would like to process Azure AD audit Logs into HTML tables/csv files. The data contains nested sets of arrays that I would like to summarise into a comma separated string.
eg data that looks like this
{
"TargetResources": [{"displayName": "Policy",
"modifiedProperties": [{"displayname": "PolicySetting1"},
{"displayname": "PolicySetting2"}]
}]
}
Would be processed into
TargetResource | Policy
modifiedProps | PolicySetting1, PolicySetting2
mv-expand doesn't seem to work because some rows do not have modifiedProperties so those rows get eliminated
The only solution I have been able to find that gets close to what I am trying to do looks like this:
AuditLogs
| extend TargetResource = tostring(TargetResources[0].displayName)
| extend ModifiedProperty0 = tostring(parse_json(tostring(TargetResources[0].modifiedProperties))[0].displayName)
| extend ModifiedProperty1 = tostring(parse_json(tostring(TargetResources[0].modifiedProperties))[1].displayName)
| extend ModifiedProperty2 = tostring(parse_json(tostring(TargetResources[0].modifiedProperties))[2].displayName)
| extend ModifiedProperties = strcat(ModifiedProperty0,", ",ModifiedProperty1,", ",ModifiedProperty2)
This solution is limited in that it cannot work for arbitrary numbers of modifiedProperty values (it only works properly for exactly 3) which is a requirement for my purposes, I would like the solution to work if modifiedProperties does not exist and if there are 0-15 values.
Thank you for any help you can provide
If I understood your description correctly, you could use mv-apply (twice) to achieve that:
datatable(d: dynamic)
[
dynamic({"TargetResources":[{"displayName": "Policy0","someOtherProperty":"hello world"}]}),
dynamic({"TargetResources":[{"displayName": "Policy1","modifiedProperties":[{"displayname":"PolicySetting1"},{"displayname":"PolicySetting2"}]}]}),
dynamic({"TargetResources":[{"displayName": "Policy2","modifiedProperties":[{"displayname":"PolicySetting3"},{"displayname":"PolicySetting4"}]}, {"displayName":"Policy3","modifiedProperties":[{"displayname":"PolicySetting5"},{"displayname":"PolicySetting6"}]}]}),
]
| mv-apply tr = d.TargetResources on (
extend TargetResource = tr.displayName
| mv-apply mp = tr.modifiedProperties on (
extend propertyName = mp.displayname
| summarize modifiedProps = strcat_array(make_set(propertyName), ", ")
)
)
| project TargetResource, modifiedProps
TargetResource | modifiedProps
Policy0 |
Policy1 | PolicySetting1, PolicySetting2
Policy2 | PolicySetting3, PolicySetting4
Policy3 | PolicySetting5, PolicySetting6
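For comparison, the same flattening can be sketched in Python, for example when post-processing exported audit records. This is only an illustration of the logic, not a replacement for the KQL; the record layout follows the sample data, and the function name is hypothetical.

```python
def summarize_target_resources(record):
    """Return (displayName, comma-joined modified property names) pairs.

    Resources without modifiedProperties are kept with an empty string,
    so no rows are lost.
    """
    rows = []
    for resource in record.get("TargetResources", []):
        props = resource.get("modifiedProperties") or []  # missing or null -> empty
        names = []
        for prop in props:
            name = prop.get("displayname")
            if name is not None and name not in names:  # de-duplicate, keep order
                names.append(name)
        rows.append((resource.get("displayName"), ", ".join(names)))
    return rows

record = {
    "TargetResources": [
        {"displayName": "Policy0", "someOtherProperty": "hello world"},
        {"displayName": "Policy1",
         "modifiedProperties": [{"displayname": "PolicySetting1"},
                                {"displayname": "PolicySetting2"}]},
    ]
}
rows = summarize_target_resources(record)
```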

Google Pub/Sub to Dataflow, avoid duplicates with Record ID

I'm trying to build a Streaming Dataflow Job which read events from Pub/Sub and write them into BigQuery.
According to the documentation, Dataflow can detect duplicate messages delivery if a Record ID is used (see: https://cloud.google.com/dataflow/model/pubsub-io#using-record-ids)
But even with this Record ID, I still get some duplicates (around 0.0002%).
Did I miss something?
EDIT:
I use the Spotify Async PubSub Client to publish messages with the following snippet:
Message
.builder()
.data(new String(Base64.encodeBase64(json.getBytes())))
.attributes("myid", id, "mytimestamp", timestamp.toString)
.build()
Then I use Spotify scio in Dataflow to read the messages from Pub/Sub and save them to BigQuery:
val input = sc.withName("ReadFromSubscription")
.pubsubSubscription(subscriptionName, "myid", "mytimestamp")
input
.withName("FixedWindow")
.withFixedWindows(windowSize) // apply windowing logic
.toWindowed // convert to WindowedSCollection
//
.withName("ParseJson")
.map { wv =>
wv.copy(value = TableRow(
"message_id" -> (Json.parse(wv.value) \ "id").as[String],
"message" -> wv.value)
)
}
//
.toSCollection // convert back to normal SCollection
//
.withName("SaveToBigQuery")
.saveAsBigQuery(bigQueryTable(opts), BQ_SCHEMA, WriteDisposition.WRITE_APPEND)
The Window size is 1 minute.
After only a few seconds of injecting messages, I already have duplicates in BigQuery.
I use this query to count duplicates:
SELECT
COUNT(message_id) AS TOTAL,
COUNT(DISTINCT message_id) AS DISTINCT_TOTAL
FROM my_dataset.my_table
//returning 273666 273564
And this one to look at them:
SELECT *
FROM my_dataset.my_table
WHERE message_id IN (
SELECT message_id
FROM my_dataset.my_table
GROUP BY message_id
HAVING COUNT(*) > 1
) ORDER BY message_id
//returning for instance:
row|id | processed_at | processed_at_epoch
1 00166a5c-9143-3b9e-92c6-aab52601b0be 2017-02-02 14:06:50 UTC 1486044410367 { ...json1... }
2 00166a5c-9143-3b9e-92c6-aab52601b0be 2017-02-02 14:06:50 UTC 1486044410368 { ...json1... }
3 00354cc4-4794-3878-8762-f8784187c843 2017-02-02 13:59:33 UTC 1486043973907 { ...json2... }
4 00354cc4-4794-3878-8762-f8784187c843 2017-02-02 13:59:33 UTC 1486043973741 { ...json2... }
5 0047284e-0e89-3d57-b04d-ebe4c673cc1a 2017-02-02 14:09:10 UTC 1486044550489 { ...json3... }
6 0047284e-0e89-3d57-b04d-ebe4c673cc1a 2017-02-02 14:08:52 UTC 1486044532680 { ...json3... }
The BigQuery documentation states that there may be rare cases where duplicates arrive:
"BigQuery remembers this ID for at least one minute" -- if Dataflow takes more than one minute before retrying the insert BigQuery may allow the duplicate in. You may be able to look at the logs from the pipeline to determine if this is the case.
"In the rare instance of a Google datacenter losing connectivity unexpectedly, automatic deduplication may not be possible."
You may want to try the instructions for manually removing duplicates. This will also allow you to see the insertID that was used with each row to determine if the problem was on the Dataflow side (generating different insertIDs for the same record) or on the BigQuery side (failing to deduplicate rows based on their insertID).
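The diagnosis suggested above (same insertID means BigQuery failed to deduplicate, different insertIDs mean the duplicates were produced upstream) can be sketched as a small check over exported rows. This Python is a hedged illustration: the `(message_id, insert_id)` pairs and the sample values are hypothetical, and how the insert ID is obtained for each row is left to the BigQuery instructions mentioned above.

```python
from collections import defaultdict

def classify_duplicates(rows):
    """Map each duplicated message_id to the side that likely caused it.

    rows: iterable of (message_id, insert_id) pairs (hypothetical export).
    """
    insert_ids = defaultdict(list)
    for message_id, insert_id in rows:
        insert_ids[message_id].append(insert_id)

    verdict = {}
    for message_id, ids in insert_ids.items():
        if len(ids) < 2:
            continue  # not duplicated
        # All insert IDs identical -> BigQuery did not deduplicate the retry.
        # Otherwise the same record was sent with distinct insert IDs upstream.
        verdict[message_id] = "bigquery" if len(set(ids)) == 1 else "dataflow"
    return verdict

rows = [
    ("msg-1", "ins-a"), ("msg-1", "ins-a"),   # same insert ID twice
    ("msg-2", "ins-b"), ("msg-2", "ins-c"),   # two different insert IDs
    ("msg-3", "ins-d"),                       # no duplicate
]
verdict = classify_duplicates(rows)
```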

How to store smart-list rules in a relational database

The system I'm building has smart groups. By smart groups, I mean groups that update automatically based on these rules:
Include all people that are associated with a given client.
Include all people that are associated with a given client and have these occupations.
Include a specific person (i.e., by ID)
Each smart group can combine any number of these rules. So, for example, a specific smart list might have these specific rules:
Include all people that are associated with client 1
Include all people that are associated with client 5
Include person 6
Include all people associated with client 10, and who have occupations 2, 6, and 9
These rules are OR'ed together to form the group. I'm trying to think about how to best store this in the database given that, in addition to supporting these rules, I'd like to be able to add other rules in the future without too much pain.
The solution I have in mind is to have a separate model for each rule type. The model would have a method on it that returns a queryset that can be combined with other rules' querysets to, ultimately, come up with a list of people. The one downside of this that I can see is that each rule would have its own database table. Should I be concerned about this? Is there, perhaps, a better way to store this information?
Why not use Q objects?
rule1 = Q(client = 1)
rule2 = Q(client = 5)
rule3 = Q(id = 6)
rule4 = Q(client = 10) & (Q(occupation = 2) | Q(occupation = 6) | Q(occupation = 9))
people = Person.objects.filter(rule1 | rule2 | rule3 | rule4)
and then store their pickled strings into the database.
rule = rule1 | rule2 | rule3 | rule4
pickled_rule_string = pickle.dumps(rule)
Rule.objects.create(pickled_rule_string=pickled_rule_string)
Here are the models we implemented to deal with this scenario.
class ConsortiumRule(OrganizationModel):
BY_EMPLOYEE = 1
BY_CLIENT = 2
BY_OCCUPATION = 3
BY_CLASSIFICATION = 4
TYPES = (
(BY_EMPLOYEE, 'Include a specific employee'),
(BY_CLIENT, 'Include all employees of a specific client'),
(BY_OCCUPATION, 'Include all employees of a specified client ' + \
'that have the specified occupation'),
(BY_CLASSIFICATION, 'Include all employees of a specified client ' + \
'that have the specified classifications'))
consortium = models.ForeignKey(Consortium, related_name='rules')
type = models.PositiveIntegerField(choices=TYPES, default=BY_CLIENT)
negate_rule = models.BooleanField(default=False,
help_text='Exclude people who match this rule')
class ConsortiumRuleParameter(OrganizationModel):
""" example usage: two of these objects one with "occupation=5" one
with "occupation=6" - both FK linked to a single Rule
"""
rule = models.ForeignKey(ConsortiumRule, related_name='parameters')
key = models.CharField(max_length=100, blank=False)
value = models.CharField(max_length=100, blank=False)
At first I was resistant to this solution as I didn't like the idea of storing references to other objects in a CharField (CharField was selected, because it is the most versatile. Later on, we might have a rule that matches any person whose first name starts with 'Jo'). However, I think this is the best solution for storing this kind of mapping in a relational database. One reason this is a good approach is that it's relatively easy to clean hanging references. For example, if a company is deleted, we only have to do:
ConsortiumRuleParameter.objects.filter(key='company', value=str(pk)).delete()
If the parameters were stored as serialized objects (e.g., Q objects as suggested in a comment), this would be a lot more difficult and time consuming.
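To make the key/value idea concrete without a Django project, here is a minimal, framework-free sketch of how such rules could be evaluated. It is not the author's implementation: the `Rule` class, the dictionary-based person records, and the convention that several values for one key are alternatives while different keys must all match are assumptions for illustration. Values are compared as strings, mirroring the CharField storage.

```python
from dataclasses import dataclass, field

@dataclass
class Rule:
    # key -> set of accepted values, stored as strings like the CharField
    parameters: dict = field(default_factory=dict)

    def matches(self, person: dict) -> bool:
        # Every key must match; within one key any listed value is accepted.
        return all(
            str(person.get(key)) in values
            for key, values in self.parameters.items()
        )

def in_group(person, rules):
    """Rules are OR'ed together, as described in the question."""
    return any(rule.matches(person) for rule in rules)

rules = [
    Rule({"client": {"1"}}),                                 # all people of client 1
    Rule({"id": {"6"}}),                                     # person 6
    Rule({"client": {"10"}, "occupation": {"2", "6", "9"}}), # client 10 with these occupations
]

people = [
    {"id": 6, "client": 3, "occupation": 1},
    {"id": 7, "client": 10, "occupation": 6},
    {"id": 8, "client": 10, "occupation": 4},
    {"id": 9, "client": 1, "occupation": 4},
]
members = [p["id"] for p in people if in_group(p, rules)]
```

Person 8 is the only one left out: they belong to client 10 but have none of the listed occupations, and no other rule covers them.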