Process fields with nested arrays into strings with strcat_array for output in Kusto - kql

I would like to process Azure AD audit logs into HTML tables/CSV files. The data contains nested arrays that I would like to summarise into a comma-separated string.
For example, data that looks like this:
{
"TargetResources": [{"displayName": "Policy",
"modifiedProperties": [{"displayname": "PolicySetting1"},
{"displayname": "PolicySetting2"}]
}]
}
would be processed into:
TargetResource | Policy
modifiedProps | PolicySetting1, PolicySetting2
mv-expand doesn't seem to work, because some rows do not have modifiedProperties and those rows get eliminated.
The only solution I have been able to find that gets close to what I am trying to do looks like this:
AuditLogs
| extend TargetResource = tostring(TargetResources[0].displayName)
| extend ModifiedProperty0 = tostring(parse_json(tostring(TargetResources[0].modifiedProperties))[0].displayName)
| extend ModifiedProperty1 = tostring(parse_json(tostring(TargetResources[0].modifiedProperties))[1].displayName)
| extend ModifiedProperty2 = tostring(parse_json(tostring(TargetResources[0].modifiedProperties))[2].displayName)
| extend ModifiedProperties = strcat(ModifiedProperty0,", ",ModifiedProperty1,", ",ModifiedProperty2)
This solution is limited: it cannot handle an arbitrary number of modifiedProperties values (it only works properly for exactly 3), which is a requirement for my purposes. I would like the solution to work both when modifiedProperties does not exist and when it holds anywhere from 0 to 15 values.
Thank you for any help you can provide.

If I understood your description correctly, you could use mv-apply (twice) to achieve that:
datatable(d: dynamic)
[
dynamic({"TargetResources":[{"displayName": "Policy0","someOtherProperty":"hello world"}]}),
dynamic({"TargetResources":[{"displayName": "Policy1","modifiedProperties":[{"displayname":"PolicySetting1"},{"displayname":"PolicySetting2"}]}]}),
dynamic({"TargetResources":[{"displayName": "Policy2","modifiedProperties":[{"displayname":"PolicySetting3"},{"displayname":"PolicySetting4"}]}, {"displayName":"Policy3","modifiedProperties":[{"displayname":"PolicySetting5"},{"displayname":"PolicySetting6"}]}]}),
]
| mv-apply tr = d.TargetResources on (
extend TargetResource = tr.displayName
| mv-apply mp = tr.modifiedProperties on (
extend propertyName = mp.displayname
| summarize modifiedProps = strcat_array(make_set(propertyName), ", ")
)
)
| project TargetResource, modifiedProps
| TargetResource | modifiedProps                  |
|----------------|--------------------------------|
| Policy0        |                                |
| Policy1        | PolicySetting1, PolicySetting2 |
| Policy2        | PolicySetting3, PolicySetting4 |
| Policy3        | PolicySetting5, PolicySetting6 |
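For readers who want to check the expected result outside Kusto, the same flattening can be sketched in Python. This is only an illustration of the logic, not a replacement for the query: the function name summarize_target_resources is hypothetical, the records mirror the sample datatable above (including its lowercase "displayname" key), and insertion order is kept when de-duplicating, which make_set does not promise.

```python
def summarize_target_resources(record):
    """Return one (TargetResource, modifiedProps) pair per target resource."""
    rows = []
    for resource in record.get("TargetResources") or []:
        # A missing or null modifiedProperties is treated as an empty list,
        # so the resource still produces a row with an empty string.
        props = resource.get("modifiedProperties") or []
        names = []
        for prop in props:
            name = prop.get("displayname")
            if name is not None and name not in names:  # de-duplicate, keep order
                names.append(name)
        rows.append((resource.get("displayName"), ", ".join(names)))
    return rows


# Same shape as the sample datatable in the answer above.
records = [
    {"TargetResources": [{"displayName": "Policy0", "someOtherProperty": "hello world"}]},
    {"TargetResources": [{"displayName": "Policy1", "modifiedProperties": [
        {"displayname": "PolicySetting1"}, {"displayname": "PolicySetting2"}]}]},
    {"TargetResources": [
        {"displayName": "Policy2", "modifiedProperties": [
            {"displayname": "PolicySetting3"}, {"displayname": "PolicySetting4"}]},
        {"displayName": "Policy3", "modifiedProperties": [
            {"displayname": "PolicySetting5"}, {"displayname": "PolicySetting6"}]},
    ]},
]

all_rows = [row for record in records for row in summarize_target_resources(record)]
for target, props in all_rows:
    print(f"{target} | {props}")
```

The row for Policy0 survives with an empty modifiedProps value, which is the behaviour the question asks for.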

Related

How to filter a date-field with a swift vapor-fluent query

To avoid multiple inserts of the same person in a database, I wrote the following function:
func anzahlDoubletten(_ req: Request, nname: String, vname: String, gebTag: Date)
async throws -> Int {
try await
Teilnehmer.query(on: req.db)
.filter(\.$nname == nname)
.filter(\.$vname == vname)
.filter(\.$gebTag == gebTag)
.count()
}
The function always returns 0, even if there are multiple records with the same surname, first name and birthday in the database.
Here is the resulting sql-query:
[ DEBUG ] SELECT COUNT("teilnehmer"."id") AS "aggregate" FROM "teilnehmer" WHERE "teilnehmer"."nname" = $1 AND "teilnehmer"."vname" = $2 AND "teilnehmer"."geburtstag" = $3 ["neumann", "alfred e.", 1999-09-09 00:00:00 +0000] [database-id: psql, request-id: 1AC70C41-EADE-43C2-A12A-99C19462EDE3] (FluentPostgresDriver/FluentPostgresDatabase.swift:29)
[ INFO ] anzahlDoubletten=0 [request-id: 1AC70C41-EADE-43C2-A12A-99C19462EDE3] (App/Controllers/TeilnehmerController.swift:49)
If I query the database directly, I obtain:
lwm=# select nname, vname, geburtstag from teilnehmer;
nname | vname | geburtstag
---------+-----------+------------
neumann | alfred e. | 1999-09-09
neumann | alfred e. | 1999-09-09
neumann | alfred e. | 1999-09-09
neumann | alfred e. | 1999-09-09
So count() should return 4, not 0:
lwm=# select count(*) from teilnehmer where nname = 'neumann' and vname = 'alfred e.' and geburtstag = '1999-09-09';
count
-------
4
My DateFormatter is defined like so:
let dateFormatter = ISO8601DateFormatter()
dateFormatter.formatOptions = [.withFullDate, .withDashSeparatorInDate]
And finally the attribute "birthday" in my model:
...
@Field(key: "geburtstag")
var gebTag: Date
...
I inserted the 4 alfreds in my database using the model and Fluent, passing the birthday "1999-09-09" as a String, and Fluent inserted all records correctly.
But .filter(\.$gebTag == gebTag) never seems to match.
Is it at all possible to use .filter() with data types other than String?
And if so, what am I doing wrong?
Many thanks for your help
Michael
The problem you've hit is that you're storing only dates whereas you're filtering on dates with times. Unfortunately there's no native way to store just a date. However there are a few options.
The easiest way is to change the date field to a String and then use your date formatter (make sure you remove the time part) to convert the query option to a String.
I am guessing slightly here, but I suspect that your table was not created by a Migration? If it had been, your geburtstag field would include a time component as this is the default and you would have spotted the problem quickly.
In any event, the filter is actually filtering on the time component of gebTag as well as the date. This is why it is returning zero.
I suggest converting the geburtstag to a type that includes the time and ensuring that the time component is set to 0:00:00 when you store it. You can reset the time component to 'midnight' using something like this:
extension Date {
var midnight: Date { return Calendar.current.date(bySettingHour: 0, minute: 0, second: 0, of: self)! }
}
Then change your filter to:
.filter(\.$gebTag == gebTag.midnight)
Alternatively, just use the startOfDay(for:) method of Calendar (it is an instance method, so call it on Calendar.current):
.filter(\.$gebTag == Calendar.current.startOfDay(for: gebTag))
I think this is the most straightforward way of doing it.
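The normalisation idea is independent of Swift. As a rough, language-neutral illustration (written in Python, not Fluent code), the sketch below shows why an equality test fails when one side carries a time of day, and how truncating to the start of the day repairs it. The helper start_of_day_utc is hypothetical, and UTC is an assumption: Calendar.current works in the system time zone, so its midnight may not be midnight UTC.

```python
from datetime import datetime, timezone


def start_of_day_utc(moment):
    """Return midnight (UTC) of the day that contains `moment`."""
    utc = moment.astimezone(timezone.utc)
    return utc.replace(hour=0, minute=0, second=0, microsecond=0)


# Value as stored: date only, which reads back as midnight.
stored = datetime(1999, 9, 9, 0, 0, 0, tzinfo=timezone.utc)
# Value used in the query: same day, but with a time component.
queried = datetime(1999, 9, 9, 14, 30, 5, tzinfo=timezone.utc)

print(stored == queried)                    # False
print(stored == start_of_day_utc(queried))  # True
```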

How to track SLA of VM availability set (or availability zone) through heartbeats with Log Analytics (KQL)

I want to track the SLAs of our VMs in a Monitor Workbook using a Log Analytics query.
For this, I use the 'Heartbeat' table, which gives the heartbeats of each VM.
However, some of our VMs are in an availability set/zone, and as such the SLA is only broken if both heartbeats are missing in a 1-minute interval.
As such I need to be able to group the heartbeats by availability set/zone in the query, but there doesn't seem to be such a property on the heartbeat.
I can use a separate Azure Resource Graph query to search for which VMs are in an availability set/zone, but when I merge this query with my Log Analytics query, I can't do any further Kusto Query Language processing on the query (I can only merge the tables).
For information, these are my Log Analytics Heartbeat query and my Resource Graph SLA query:
let timeRangeStart = {TimeRange:start};
let timeRangeEnd = {TimeRange:end};
Heartbeat
| where ResourceType == "virtualMachines"
| extend ResourceGroup = case(ResourceGroup <> "", ResourceGroup, "On-Prem")
| where TimeGenerated > timeRangeStart and TimeGenerated < timeRangeEnd and Computer in ({Servers})
| extend Resource=tolower(iff(isempty(_ResourceId), Resource, _ResourceId))
| summarize heartbeat_tot = count() by Resource,ResourceGroup, SubscriptionId
| extend total_number_of_buckets=round((timeRangeEnd-timeRangeStart)/1m)
| extend availability_rate = round(heartbeat_tot*100/total_number_of_buckets, 2)
| extend availability_rate = min_of(availability_rate, 100)
| order by availability_rate asc
Resources // VMs
| where type == 'microsoft.compute/virtualmachines'
| extend AvSet = properties.availabilitySet.id
| extend AvZone = properties.availabilityZone.id
| extend VMname_SLA = iff(isnotempty(AvZone), AvZone, iff(isnotempty(AvSet), AvSet, id))
| extend SLA_VM = iff(isnotnull(AvZone), '99.99%', iff(isnotnull(AvSet), '99.95%', ''))
| extend managedBy = tolower(id)
| join kind = leftouter (
Resources // Disks
| where type == 'microsoft.compute/disks'
| where isnotempty(managedBy)
| extend managedBy = tolower(managedBy)
// What do Standard HDD disks have as SKU tag??? I used StandardHDD for the time being
| extend Tier_disk = sku.tier
| extend SLA_disk = iff(Tier_disk == 'StandardHDD', '95%', iff(Tier_disk == 'Standard', '99.5%', '99.9%'))
) on managedBy
| extend SLA_tot = iff(isnotempty(SLA_VM), SLA_VM, SLA_disk)
| project managedBy, VMname_SLA, SLA_tot
| order by managedBy asc
How many resources are there?
If it is not a large number of resources, a workaround would be:
Run your ARG query in a text parameter, and format the results of the query to effectively generate a JSON array of objects with the id, location, etc. that you need. Then mark this parameter as hidden.
In your Logs query, reference that parameter's JSON text before the query, and use KQL operators to turn that JSON structure into a table. Then you can join/filter on that table in the query.
It isn't optimal, and it won't work well with large numbers of resources, since every time you run your query you're effectively "uploading" a JSON blob and then immediately parsing it apart again.
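The calculation this workaround enables can be sketched outside KQL. The Python below is only an illustration of the logic under stated assumptions: the JSON text stands in for the hidden parameter, the field names "id" and "group" and the function availability_by_group are hypothetical, and a group counts as available in a minute if any of its VMs sent a heartbeat in that minute.

```python
import json
from collections import defaultdict
from datetime import datetime, timedelta

# Hypothetical stand-in for the hidden parameter holding the ARG result.
mapping_json = """[
  {"id": "vm-a", "group": "avset-1"},
  {"id": "vm-b", "group": "avset-1"},
  {"id": "vm-c", "group": "vm-c"}
]"""


def availability_by_group(heartbeats, mapping_json, start, end):
    """heartbeats: iterable of (resource_id, timestamp). Returns {group: percent}."""
    group_of = {row["id"].lower(): row["group"] for row in json.loads(mapping_json)}
    total_minutes = int((end - start).total_seconds() // 60)
    if total_minutes <= 0:
        raise ValueError("time range must span at least one minute")
    covered = defaultdict(set)  # group -> set of minute buckets with a heartbeat
    for resource_id, ts in heartbeats:
        if not (start <= ts < end):
            continue
        rid = resource_id.lower()
        group = group_of.get(rid, rid)  # VMs without a mapping stand alone
        covered[group].add(int((ts - start).total_seconds() // 60))
    return {g: round(100.0 * len(m) / total_minutes, 2) for g, m in covered.items()}


start = datetime(2024, 1, 1, 0, 0)  # hypothetical 4-minute window
end = start + timedelta(minutes=4)
beats = (
    [("vm-a", start + timedelta(minutes=m, seconds=10)) for m in (0, 1)]
    + [("vm-b", start + timedelta(minutes=m, seconds=20)) for m in (1, 2, 3)]
    + [("vm-c", start + timedelta(minutes=m, seconds=30)) for m in (0, 1, 2)]
)
result = availability_by_group(beats, mapping_json, start, end)
print(result)  # {'avset-1': 100.0, 'vm-c': 75.0}
```

Neither vm-a nor vm-b covers the whole window on its own, but together the availability set has a heartbeat in every minute, so it reports 100%.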

Pandas: Subset of subset with multiple conditions

I need to grab a subset of the following using multiple conditions:
Event Type must contain the string 'Outreach'
AND any other field can contain the string 'STEM' - case insensitive.
Data Sample:
Title Event Type Presenter Description Tags
STEM event STEM Gloria Bubbles Craft
Robots Outreach STEM - John EV3 Bots
School STEM Outreach Billy Robots Craft
Code:
cond = df['Event Type'].str.contains('Outreach')
stemA = df[cond]
This gets me all the outreach events.
cond = df['Event Type'].str.contains('Outreach') & (df['Presenter'].str.contains('STEM') | df['Tags'].str.contains('STEM') | df['Description'].str.contains('STEM') | df['Title'].str.contains('STEM'))
stem[cond]
I was hoping for a grep-like solution. The above gets me less than grep does on the command line and I know this result is wrong from looking at the data.
IIUC, this should work for you (case=False makes the 'STEM' match case-insensitive, as the question requires):
cols_to_include = df.columns[df.columns != 'Event Type']
a = df[cols_to_include].astype(str).sum(axis=1)
df[df['Event Type'].str.contains('Outreach') & a.str.contains('STEM', case=False)]

Generating seed code from existing database in ASP.NET MVC

I wondered if anyone has encountered a similar challenge:
I have a database with some data that was ETL'ed (imported and transformed) in there from an Excel file. In my ASP.NET MVC web application I'm using Code First approach and dropping/creating every time database changes:
#if DEBUG
Database.SetInitializer(new DropCreateDatabaseIfModelChanges<MyDataContext>());
#endif
However, since the data in the Database is lost, I have to ETL it again, which is annoying.
Since the DB will be dropped only on a model change, I know I will have to tweak my ETL anyway. But I'd rather change my DB seed code.
Does anyone know how to take the contents of the database and generate seed code, assuming that both Models and SQL Tables are up to date?
EDIT 1:
I'm planning to use the auto-generated Configuration.cs, and its Seed method, and then use AddOrUpdate() method to add data into the database: Here is Microsoft's Tutorial on migrations (specifically the "Set up the Seed method" section).
Let's say we have a simple database table with 3750 records in it:
| Id | Age | FullName |
|------|-----|-----------------|
| 1 | 50 | Michael Jackson |
| 2 | 42 | Elvis Presley |
| 3 | 48 | Whitney Houston |
| ... | ... | ... |
| 3750 | 57 | Prince |
We want to create this table in our database using the auto-generated Configuration.cs file and its Seed() method.
protected override void Seed(OurDbContainer context)
{
context.GreatestSingers.AddOrUpdate(
p => p.Id,
new GreatestSinger { Id = 1, Age = 50, FullName = "Michael Jackson" },
new GreatestSinger { Id = 2, Age = 42, FullName = "Elvis Presley" },
new GreatestSinger { Id = 3, Age = 48, FullName = "Whitney Houston" }
);
}
This is what you would have to do, 3750 times!
But you already have this data in your existing database table, so you can use it to generate the Seed() code.
With the help of SQL string concatenation:
SELECT
CONCAT('new GreatestSinger { Id = ', Id ,', Age = ', Age ,', FullName = "', FullName ,'" },')
FROM GreatestSinger
This will give us all the code needed to create the 3750 rows of data.
Just copy/paste it into the Seed() method. Then, from the Package Manager Console:
Add-Migration SeedDBwithSingersData
Update-Database
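One caveat with the CONCAT approach is that a FullName containing a double quote or a backslash would produce C# that does not compile. A small script can generate the same lines and escape those characters. The sketch below is an assumption-laden illustration: it uses Python's built-in sqlite3 with an in-memory table as a stand-in for the real database, the helpers csharp_string and seed_lines are hypothetical, the second sample row is made up to exercise the escaping, and newlines inside values are not handled.

```python
import sqlite3


def csharp_string(value):
    """Render a Python string as a C# regular string literal."""
    escaped = value.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"'


def seed_lines(conn):
    """Build one 'new GreatestSinger { ... },' line per row, ordered by Id."""
    rows = conn.execute("SELECT Id, Age, FullName FROM GreatestSinger ORDER BY Id")
    return [
        f"new GreatestSinger {{ Id = {id_}, Age = {age}, FullName = {csharp_string(name)} }},"
        for id_, age, name in rows
    ]


# In-memory stand-in for the existing table (hypothetical sample rows).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE GreatestSinger (Id INTEGER PRIMARY KEY, Age INTEGER, FullName TEXT)")
conn.executemany(
    "INSERT INTO GreatestSinger VALUES (?, ?, ?)",
    [(1, 50, "Michael Jackson"), (2, 42, 'Elvis "The King" Presley')],
)

lines = seed_lines(conn)
print("\n".join(lines))
```

The printed lines can be pasted into the AddOrUpdate call in the same way as the output of the SQL query.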
Another way of seeding data is to run it as sql in an Up migration.
I have code that will read a sql file and run it
using System;
using System.Data.Entity.Migrations;
using System.IO;
public partial class InsertStandingData : DbMigration
{
public override void Up()
{
var baseDir = AppDomain.CurrentDomain
.BaseDirectory
.Replace("\\bin", string.Empty) + "\\Data\\Sql Scripts";
Sql(File.ReadAllText(baseDir + "\\StandingData.sql"));
}
public override void Down()
{
//Add delete sql here
}
}
So if your ETL generates sql for you then you could use that technique.
The advantages of doing it in the Up method are:
It will be quicker than doing it using AddOrUpdate, because AddOrUpdate queries the database each time it is called to get any already existing entity.
You are normally going from a known state (e.g. empty tables), so you probably don't need to check whether data exists already. NB: to ensure this, you should delete the data in the Down method so that you can tear all the way down and back up again.
The Up method does not run every time the application starts.
The Seed method provides convenience, and it has the advantage (!?) that it runs every time the application starts.
But if you prefer to run the sql from there use ExecuteSqlCommand instead of Sql:
string baseDir = AppDomain.CurrentDomain.BaseDirectory.Replace("\\bin", string.Empty)
+ "\\Data\\Sql Scripts";
string path = Path.Combine(baseDir, "StandingData");
foreach (string file in Directory.GetFiles(path, "*.sql"))
{
context.Database.ExecuteSqlCommand(File.ReadAllText(file));
}

How to store smart-list rules in a relational database

The system I'm building has smart groups. By smart groups, I mean groups that update automatically based on these rules:
Include all people that are associated with a given client.
Include all people that are associated with a given client and have these occupations.
Include a specific person (i.e., by ID)
Each smart group can combine any number of these rules. So, for example, a specific smart group might have these rules:
Include all people that are associated with client 1
Include all people that are associated with client 5
Include person 6
Include all people associated with client 10, and who have occupations 2, 6, and 9
These rules are OR'ed together to form the group. I'm trying to think about how to best store this in the database given that, in addition to supporting these rules, I'd like to be able to add other rules in the future without too much pain.
The solution I have in mind is to have a separate model for each rule type. The model would have a method on it that returns a queryset that can be combined with other rules' querysets to, ultimately, come up with a list of people. The one downside of this that I can see is that each rule would have its own database table. Should I be concerned about this? Is there, perhaps, a better way to store this information?
Why not use Q objects?
rule1 = Q(client = 1)
rule2 = Q(client = 5)
rule3 = Q(id = 6)
rule4 = Q(client = 10) & (Q(occupation = 2) | Q(occupation = 6) | Q(occupation = 9))
people = Person.objects.filter(rule1 | rule2 | rule3 | rule4)
and then store their pickled strings into the database.
rule = rule1 | rule2 | rule3 | rule4
pickled_rule_string = pickle.dumps(rule)
Rule.objects.create(pickled_rule_string=pickled_rule_string)
Here are the models we implemented to deal with this scenario.
class ConsortiumRule(OrganizationModel):
BY_EMPLOYEE = 1
BY_CLIENT = 2
BY_OCCUPATION = 3
BY_CLASSIFICATION = 4
TYPES = (
(BY_EMPLOYEE, 'Include a specific employee'),
(BY_CLIENT, 'Include all employees of a specific client'),
(BY_OCCUPATION, 'Include all employees of a specified client ' + \
'that have the specified occupation'),
(BY_CLASSIFICATION, 'Include all employees of a specified client ' + \
'that have the specified classifications'))
consortium = models.ForeignKey(Consortium, related_name='rules')
type = models.PositiveIntegerField(choices=TYPES, default=BY_CLIENT)
negate_rule = models.BooleanField(default=False,
help_text='Exclude people who match this rule')
class ConsortiumRuleParameter(OrganizationModel):
""" example usage: two of these objects one with "occupation=5" one
with "occupation=6" - both FK linked to a single Rule
"""
rule = models.ForeignKey(ConsortiumRule, related_name='parameters')
key = models.CharField(max_length=100, blank=False)
value = models.CharField(max_length=100, blank=False)
At first I was resistant to this solution as I didn't like the idea of storing references to other objects in a CharField (CharField was selected, because it is the most versatile. Later on, we might have a rule that matches any person whose first name starts with 'Jo'). However, I think this is the best solution for storing this kind of mapping in a relational database. One reason this is a good approach is that it's relatively easy to clean hanging references. For example, if a company is deleted, we only have to do:
ConsortiumRuleParameter.objects.filter(key='company', value=str(pk)).delete()
If the parameters were stored as serialized objects (e.g., Q objects as suggested in a comment), this would be a lot more difficult and time consuming.
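How such key/value parameters turn back into a membership test is not shown above, so here is one possible reading, sketched in plain Python without Django. Everything in it is an assumption for illustration: the Rule dataclass and the functions rule_matches and in_group are hypothetical, parameters that share a key are OR'ed while different keys are AND'ed, rules are OR'ed together, and a negated rule excludes anyone it matches.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Rule:
    parameters: list      # (key, value) string pairs, as stored in the CharFields
    negate: bool = False  # mirrors negate_rule on the model


def rule_matches(rule, person):
    """True if the person satisfies every key, with any of that key's values."""
    if not rule.parameters:
        return False  # assumption: a rule without parameters matches nobody
    wanted = defaultdict(set)
    for key, value in rule.parameters:
        wanted[key].add(value)
    return all(str(person.get(key)) in values for key, values in wanted.items())


def in_group(rules, person):
    """Rules are OR'ed; any matching negated rule excludes the person."""
    included = any(rule_matches(r, person) for r in rules if not r.negate)
    excluded = any(rule_matches(r, person) for r in rules if r.negate)
    return included and not excluded


# The example rule set from the question.
rules = [
    Rule([("client", "1")]),
    Rule([("client", "5")]),
    Rule([("id", "6")]),
    Rule([("client", "10"), ("occupation", "2"), ("occupation", "6"), ("occupation", "9")]),
]

print(in_group(rules, {"id": 6, "client": 3, "occupation": 1}))   # True (person 6)
print(in_group(rules, {"id": 7, "client": 10, "occupation": 6}))  # True (client 10, occupation 6)
print(in_group(rules, {"id": 8, "client": 10, "occupation": 4}))  # False
```

In a Django project the same structure would more likely be translated into Q objects at query time, so that the filtering happens in the database rather than in Python.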