KQL - return entries not matching IP from watchlist (query optimization) - kql

I want to receive a high severity alert in Sentinel when a user is added to a defined "high severity" group (via watchlist), but I want to omit any users that are connected from a Zscaler IP address. The query below works, but I'm not sure this is the neatest or most optimized logic. Is there a shorter/better way to write this?
I'm only concerned about the lines beginning with asterisks (the asterisks are only added for clarity).
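watchlist "aadgroups"
| Group             | Severity |
|-------------------|----------|
| Prod Owners       | High     |
| Prod Contributors | High     |
watchlist "ZSIPs"
| zscaler_ip      | location |
|-----------------|----------|
| 165.225.0.0/23  | Chicago  |
| 165.225.60.0/22 | Chicago  |
| 165.225.56.0/22 | Chicago  |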
let HighSeverityGroups = (_GetWatchlist('aadgroups') | where severity == "High" | project group_name, severity);
let ZSIPs = (_GetWatchlist('zscaler_ip') | project zscaler_ip);
AuditLogs
| where ActivityDisplayName == "Add member to group"
| where parse_json(tostring(parse_json(tostring(TargetResources[0].modifiedProperties))[1].newValue)) has_any (HighSeverityGroups)
| extend InitiatedByActor = tostring(parse_json(tostring(InitiatedBy.user)).userPrincipalName)
| extend GroupName = tostring(parse_json(tostring(parse_json(tostring(TargetResources[0].modifiedProperties))[1].newValue)))
| extend Actor_ipv4 = tostring(parse_json(tostring(InitiatedBy.user)).ipAddress)
| extend TargetUser = tostring(TargetResources[0].userPrincipalName)
| project-reorder TimeGenerated,SourceSystem,InitiatedBy,ActivityDisplayName,TargetUser,GroupName,InitiatedByActor,Actor_ipv4,Result
| where TargetUser <> ""
** | evaluate ipv4_lookup(ZSIPs, Actor_ipv4, zscaler_ip, return_unmatched = true)
** | where isempty(zscaler_ip)
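For readers less familiar with ipv4_lookup, the effect of the two starred lines (keep only rows whose actor IP falls in none of the watchlist CIDR ranges) can be sketched in Python. This is only an illustration of the logic, not Sentinel code; the event dictionaries and the function names is_zscaler and exclude_zscaler are made up for the sketch, and the CIDR ranges are the ones from the ZSIPs watchlist above.

```python
import ipaddress

# CIDR ranges copied from the "ZSIPs" watchlist in the question.
ZSCALER_RANGES = [
    ipaddress.ip_network(cidr)
    for cidr in ("165.225.0.0/23", "165.225.60.0/22", "165.225.56.0/22")
]


def is_zscaler(ip: str) -> bool:
    """Return True if the address falls inside any watchlist range."""
    addr = ipaddress.ip_address(ip)
    return any(addr in net for net in ZSCALER_RANGES)


def exclude_zscaler(events: list[dict]) -> list[dict]:
    """Keep only events whose actor IP matched no range (the 'unmatched' rows)."""
    return [e for e in events if not is_zscaler(e["Actor_ipv4"])]


# Hypothetical sample events, for illustration only.
events = [
    {"TargetUser": "u1@example.com", "Actor_ipv4": "165.225.1.20"},
    {"TargetUser": "u2@example.com", "Actor_ipv4": "203.0.113.7"},
    {"TargetUser": "u3@example.com", "Actor_ipv4": "165.225.58.9"},
]

for event in exclude_zscaler(events):
    print(event["TargetUser"], event["Actor_ipv4"])
```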

A couple of things you can try to optimize the query:
This filter is quite costly: | where parse_json(tostring(parse_json(tostring(TargetResources[0].modifiedProperties))[1].newValue)) has_any (HighSeverityGroups). If TargetResources rarely contains strings from HighSeverityGroups, you can add a much cheaper filter before it that discards most of the records: | where TargetResources has_any (HighSeverityGroups). That way, the heavy parsing runs only on a small number of records.
You're parsing some of the data more than once, for example parse_json(tostring(InitiatedBy.user)). Instead, use the extend operator to parse each value once, and then reuse the resulting column later in the query.
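The two suggestions above (a cheap filter before the expensive parse, and parsing each value only once) are general ideas rather than KQL-specific ones. A minimal Python sketch of the same pattern follows; the record layout and the names cheap_prefilter and high_severity_events are assumptions made for this illustration and have nothing to do with the actual AuditLogs schema.

```python
import json

HIGH_SEVERITY_GROUPS = {"Prod Owners", "Prod Contributors"}

# Hypothetical raw records: each one is an unparsed JSON string.
records = [
    json.dumps({"group": "Prod Owners", "actor": "alice@example.com"}),
    json.dumps({"group": "Dev Readers", "actor": "bob@example.com"}),
    json.dumps({"group": "Prod Contributors", "actor": "carol@example.com"}),
]


def cheap_prefilter(raw: str) -> bool:
    """Plain substring test on the raw text; no JSON parsing involved."""
    return any(group in raw for group in HIGH_SEVERITY_GROUPS)


def high_severity_events(raw_records: list[str]) -> list[tuple[str, str]]:
    results = []
    for raw in raw_records:
        # Step 1: discard most records with the cheap test.
        if not cheap_prefilter(raw):
            continue
        # Step 2: parse once and reuse the parsed value for every field.
        event = json.loads(raw)
        if event["group"] in HIGH_SEVERITY_GROUPS:
            results.append((event["actor"], event["group"]))
    return results


for actor, group in high_severity_events(records):
    print(actor, group)
```

The exact check after parsing is still needed, because the substring test can also match records where the group name appears in some unrelated field.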

Related

Defender KQL to show blocked Bluetooth Devices with all relevant fields

I'm trying to write a query to report on BlueToothPolicyTriggered events, that will return all the details to show when a device was blocked by policy AND the details of that device.
Our BT policy should basically allow everything but block file transfer over BT. That seems to be working as expected, but before rolling it out more widely, we want a quick way to see whether any other devices are being blocked incorrectly, and to be able to refer to it if a user reports an issue, so we can get all the details of the blocked device and add an exception etc.
However (and I'm new to KQL), it seems that once I filter a table on an 'ActionType', the columns available to report on are restricted, and in this case we lose the details of the BT device that has been blocked.
This shows all events that have triggered the policy and whether the connection was 'accepted' or 'blocked', but not the details of the device:
search in (DeviceEvents) ActionType == "BluetoothPolicyTriggered"
| extend parsed=parse_json(AdditionalFields)
| extend Result = tostring(parsed.Accepted)
| extend BluetoothMACAddress = tostring(parsed.BluetoothMacAddress)
| extend PolicyName = tostring(parsed.PolicyName)
| extend PolicyPath = tostring(parsed.PolicyPath)
| summarize arg_max(Timestamp, *) by DeviceName, BluetoothMACAddress
| sort by Timestamp desc
| project Timestamp, DeviceName, DeviceId, Result, ActionType, BluetoothMACAddress, PolicyPath, PolicyName, ReportId
Then I have this, which shows every BT connection with the device details I'm looking for, but not whether it was blocked or accepted:
DeviceEvents
| extend parsed=parse_json(AdditionalFields)
| extend MediaClass = tostring(parsed.ClassName)
| extend MediaDeviceId = tostring(parsed.DeviceId)
| extend MediaDescription = tostring(parsed.DeviceDescription)
| extend MediaSerialNumber = tostring(parsed.SerialNumber)
| where MediaClass == "Bluetooth"
| project Timestamp, DeviceId, DeviceName, MediaClass, MediaDeviceId, MediaDescription, parsed
| order by Timestamp desc
I've been trying to somehow join these together (despite both coming from the same DeviceEvents table) without much success. I don't trust the output, as I'm seeing entries saying a device was blocked when I know it wasn't.
DeviceEvents
| where ActionType == "BluetoothPolicyTriggered"
| extend parsed=parse_json(AdditionalFields)
| extend Result = tostring(parsed.Accepted)
| extend BluetoothMACAddress = tostring(parsed.BluetoothMacAddress)
| extend PolicyName = tostring(parsed.PolicyName)
| extend PolicyPath = tostring(parsed.PolicyPath)
| project Timestamp, DeviceName, DeviceId, Result, ActionType, BluetoothMACAddress, PolicyPath, PolicyName, ReportId
| join kind =inner (DeviceEvents
| extend parsed=parse_json(AdditionalFields)
| extend MediaClass = tostring(parsed.ClassName)
| extend MediaDeviceId = tostring(parsed.DeviceId)
| extend MediaDescription = tostring(parsed.DeviceDescription)
| extend MediaSerialNumber = tostring(parsed.SerialNumber)
) on DeviceName
| where MediaClass == "Bluetooth"
| project Timestamp, DeviceName, Result, ActionType, MediaClass, MediaDeviceId, MediaDescription,BluetoothMACAddress
| sort by Timestamp desc
Am I going about this completely wrong?
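One possible explanation for the untrustworthy output, offered as a guess since the underlying data isn't shown: an inner join keyed only on DeviceName pairs every policy event on a machine with every Bluetooth event on that same machine, so a "blocked" result can end up next to a device that was never blocked. The Python sketch below reproduces that effect with made-up rows; the field values are hypothetical.

```python
# Hypothetical policy events: one accepted, one blocked, same machine.
policy_events = [
    {"DeviceName": "pc1", "Result": "true", "BluetoothMACAddress": "AA:AA"},
    {"DeviceName": "pc1", "Result": "false", "BluetoothMACAddress": "BB:BB"},
]

# Hypothetical Bluetooth device events on the same machine.
media_events = [
    {"DeviceName": "pc1", "MediaDescription": "Headset"},
    {"DeviceName": "pc1", "MediaDescription": "Phone"},
]


def inner_join_on_device_name(left: list[dict], right: list[dict]) -> list[dict]:
    """Inner join keyed on DeviceName only, like the query in the question."""
    return [
        {**l, **r}
        for l in left
        for r in right
        if l["DeviceName"] == r["DeviceName"]
    ]


joined = inner_join_on_device_name(policy_events, media_events)
for row in joined:
    print(row["Result"], row["BluetoothMACAddress"], row["MediaDescription"])
```

With two rows on each side the join yields four rows, and both the headset and the phone appear once as accepted and once as blocked. A join key that also identifies the Bluetooth device or the point in time would be needed to avoid this, if the events expose one.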

Self-join Kusto Query in Analytics Rule

I am working within Microsoft Sentinel Analytics Rules with the Kusto Query Language (KQL).
I need to work with a table called CrowdstrikeReplicatorLogs_CL, which contains a) data rows that I need to alert on and b) metadata rows that contain information about the subject of the alert.
This means I need to join the table with itself to get the final result.
The column to use for the self-join is aid_g.
ThreatIntelligenceIndicator
| where foo == bar
| join kind=innerunique (
CrowdstrikeReplicatorLogs_CL
| where TimeGenerated >= ago(dt_lookBack)
| where event_simpleName_s has_any ("NetworkConnectIP4", "NetworkConnectIP6")
| extend json=parse_json(custom_fields_message_s)
| extend ip4 = json["RemoteAddressIP4"], ip6=json["RemoteAddressIP6"]
| extend CS_ipEntity = tostring(iff(isnotempty(ip4), ip4, ip6))
| extend CommonSecurityLog_TimeGenerated = TimeGenerated
) on $left.TI_ipEntity == $right.CS_ipEntity
| join kind=innerunique (
CrowdstrikeReplicatorLogs_CL
| where custom_fields_message_s has "ComputerName"
| extend customFields=parse_json(custom_fields_message_s)
| project Hostname=customFields['ComputerName'], Platform=event_platform_s, aid_g
) on $left.aid_g == $right.aid_g
;
However, this raises a "Query contains incompatible 'set' commands." error in Sentinel.
Is there a proper way to self-join tables?

Force an Azure Resource Graph to show 0 - KQL - Azure Monitor

I want to create a pie chart showing the counts of open (not closed) alerts, which is working. However, I want it to default to 0 in the chart when there is no alert for a particular severity.
alertsmanagementresources
| extend Sev = tostring(parse_json(properties.essentials.severity)),
    LastModifiedTime = todatetime(properties.essentials.lastModifiedDateTime)
| where tostring(parse_json(properties.essentials.alertState)) <> 'Closed'
| where resourceGroup == 'ai-eazyfuel-eu-prd-rg'
| where Sev == 'Sev0'
| where LastModifiedTime >= datetime(2022/07/26)
| summarize count() by Sev
Is this even possible? I understand there are no results to show, but you know what end users are like.
While it's feasible to write the KQL query, there are two caveats:
Azure Resource Graph supports only a limited subset of KQL, which makes the query syntax cumbersome.
Azure Resource Graph cannot display a zero-size slice in a pie chart.
P.S.
Please note the removal of unnecessary transformations of properties and the use of ISO format for datetime.
resources
| take 1
| mv-expand severity = range(0,4) to typeof(string)
| project severity = strcat("Sev", severity)
| join kind=leftouter
(
alertsmanagementresources
| extend severity = tostring(properties.essentials.severity)
,lastModifiedDateTime = todatetime(properties.essentials.lastModifiedDateTime)
| where properties.essentials.alertState <> "Closed"
and resourceGroup == "ai-eazyfuel-eu-prd-rg"
and severity == "Sev0"
and lastModifiedDateTime >= datetime("2022-07-26")
| summarize count() by severity
) on severity
| project severity, count_ = coalesce(count_, 0)
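The idea behind the query above is to start from the full list of severities, left-join the actual counts onto it, and replace missing counts with 0. A rough Python equivalent of that pattern is sketched below; the function name counts_with_zeros and the sample input are made up for the illustration.

```python
from collections import Counter

# Sev0..Sev4, mirroring range(0,4) in the query (KQL's range includes the end value).
ALL_SEVERITIES = [f"Sev{n}" for n in range(5)]


def counts_with_zeros(open_alert_severities: list[str]) -> dict[str, int]:
    """Count alerts per severity, reporting 0 for severities with no alerts."""
    counts = Counter(open_alert_severities)
    # Iterating over the full severity list plays the role of the left side
    # of the leftouter join; .get(..., 0) plays the role of coalesce(count_, 0).
    return {sev: counts.get(sev, 0) for sev in ALL_SEVERITIES}


print(counts_with_zeros(["Sev0", "Sev0", "Sev3"]))
```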

How to make this very complicated query from three connected models with Django ORM?

Good day, everyone. Hope you're doing well. I'm a Django newbie, trying to learn the basics of RESTful development while helping in a small app project. We currently want some of our models to update based on the data we submit to them, using the Django ORM and the fields that some of them share through one-to-many relationships. There's a really difficult query that I need so that one of my fields updates automatically given that filter. First, let me explain the models. These are not the real ones, but doppelgangers that should work the same:
First, we have a Report model that is a teacher's report on a student:
class Report(models.Model):
    status = models.CharField(max_length=32, choices=Statuses.choices, default=Statuses.created,)
    student = models.ForeignKey(Student, on_delete=models.CASCADE,)
    headroom_teacher = models.ForeignKey(TeacherStaff, on_delete=models.CASCADE,)
    # Various dates
    results_date = models.DateTimeField(null=True, blank=True)
    report_created = models.DateTimeField(null=True, blank=True)
    # ... other fields that don't matter
Here we have two related models, which are student and headroom_teacher. It's not necessary to show their models, but their relationship with the next two models is really important. We also have an Exams model:
class Exams(models.Model):
    student = models.ForeignKey(Student, on_delete=models.CASCADE,)
    headroom_teacher = models.ForeignKey(TeacherStaff, on_delete=models.CASCADE,)
    # Various dates
    results_date = models.DateTimeField(null=True, blank=True)
    initial_exam_date = models.DateTimeField(null=True, blank=True)
    # ... other fields that don't matter
As you can see, the purpose of this app is to report on the performance of students after they complete some exams, and every Report is made by a teacher for a specific student on how they did on those exams. Finally, we have a model called StudentMood that aims to show how a student should be feeling depending on the status of their exams:
class StudentMood(models.Model):
    report = models.ForeignKey(Report, on_delete=models.CASCADE,)
    student_status = models.CharField(
        max_length=32, choices=Status.choices,
        default=None, null=True, blank=False)
    headroom_teacher = models.ForeignKey(TeacherStaff, on_delete=models.CASCADE,)
With these three models, we arrive at the crux of the issue. One of our possible student_status options is called Anxious for results, which we believe a student will feel after having taken an exam and while waiting for the results.
I want to automatically set student_status to that value, using a custom manager that takes into account the date the report was made or the day the data was entered. I believe this can be done with a query that takes initial_exam_date into account.
I already have my custom manager set up, and the only thing missing is this query. I have no choice but to do it with Django's ORM. However, I've come up with an approximate raw SQL query, though I'm not sure it's correct:
SELECT student_mood.id AS student_mood_id
FROM school_student_mood student_mood
LEFT JOIN school_reports report
ON student_mood.report_id = report.id AND student_mood.headroom_teacher_id = report.headroom_teacher_id
JOIN school_exams exams
ON report.headroom_teacher_id = exams.headroom_teacher_id
AND report.student_id = exams.student_id
AND exams.results_date > :entry_date -- placeholder: the date when the student_mood or report data is entered, I guess
And that's what I've come to ask for help with. Could someone shed some light on how to translate this into a single ORM query?
I don't have an environment set up and don't know exactly what you want out of the data, but this should be a good start.
Generally speaking, the Django ORM is not great for these types of queries, and trying to use select_related or prefetches results in really complex and inefficient queries.
I've found the best way to achieve these types of queries in Django is to break each piece of your puzzle down into a query that returns a "list" of ids that you can then use in a subquery.
Then you keep working down until you have your final output
from django.db.models import Subquery
from django.utils import timezone
# Grab the students of any exams where results_date is later than a specific date.
student_exam_subquery = Exams.objects.filter(
    results_date__gt=timezone.now()
).values_list('student__id', flat=True)
# Grab the reports behind all student moods where the student is anxious
# and the report's student is among the students found above.
# ('anxious' stands in for whatever value the Status choices actually use.)
student_mood_subquery = StudentMood.objects.filter(
    student_status='anxious',
    report__student__in=Subquery(student_exam_subquery)
).values_list('report__id', flat=True)
# Get the final list of students
Student.objects.values_list('id', flat=True).filter(
    report__id__in=Subquery(student_mood_subquery)
)
Now I doubt this will work out of the box, but it's really to give you an understanding of how you might go about solving this in a way that is readable to future devs and the most efficient (db wise).
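To make the "list of ids fed into a subquery" idea concrete without a Django project, here is a small sketch that runs roughly the kind of nested IN (...) SQL such chained querysets tend to produce, using Python's built-in sqlite3 module. The table names, column names and sample rows are simplified stand-ins invented for this sketch, not the real schema from the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE exams (id INTEGER PRIMARY KEY, student_id INTEGER, results_date TEXT);
    CREATE TABLE report (id INTEGER PRIMARY KEY, student_id INTEGER);
    CREATE TABLE student_mood (id INTEGER PRIMARY KEY, report_id INTEGER, student_status TEXT);
    INSERT INTO exams VALUES (1, 1, '2020-01-15'), (2, 2, '2020-07-01');
    INSERT INTO report VALUES (1, 1), (2, 2);
    INSERT INTO student_mood VALUES (1, 1, 'anxious'), (2, 2, 'anxious');
    """
)


def anxious_mood_ids(cutoff: str) -> list[int]:
    """Ids of 'anxious' moods whose report belongs to a student with results due after cutoff."""
    rows = conn.execute(
        """
        SELECT sm.id
        FROM student_mood sm
        WHERE sm.student_status = 'anxious'
          AND sm.report_id IN (
              SELECT r.id FROM report r
              WHERE r.student_id IN (
                  SELECT e.student_id FROM exams e
                  WHERE e.results_date > ?
              )
          )
        ORDER BY sm.id
        """,
        (cutoff,),
    ).fetchall()
    return [row[0] for row in rows]


print(anxious_mood_ids("2020-06-01"))
```

Each inner SELECT corresponds to one of the intermediate querysets, so the database resolves the whole chain in a single statement instead of several round trips.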
So, the issue I was running into is that the school has exam cycles each period, and it was difficult to retrieve only the students' reports for the current cycle. Let's assume we have the following database:
+-----------+-----------+----------------+-------------------+-------------------+------------------+
| Student | Report ID | StudentMood ID | Exam Cycle Status | Initial Exam Date | Report created a |
+-----------+-----------+----------------+-------------------+-------------------+------------------+
| Student 1 | 1 | 1 | Done | 01/01/2020 | 02/01/2020 |
| Student 2 | 2 | 2 | Done | 01/01/2020 | 02/01/2020 |
| Student 1 | 3 | 3 | On Going | 02/06/2020 | 01/01/2020 |
| Student 2 | 4 | 4 | On Going | 02/06/2020 | 01/01/2020 |
+-----------+-----------+----------------+-------------------+-------------------+------------------+
And obviously, I wanted to limit my query to just this cycle, like this:
+-----------+-----------+----------------+-------------------+-------------------+------------------+
| Student | Report ID | StudentMood ID | Exam Cycle Status | Initial Exam Date | Report created a |
+-----------+-----------+----------------+-------------------+-------------------+------------------+
| Student 1 | 3 | 3 | On Going | 02/06/2020 | 01/01/2020 |
| Student 2 | 4 | 4 | On Going | 02/06/2020 | 01/01/2020 |
+-----------+-----------+----------------+-------------------+-------------------+------------------+
Now, your answer, trent, was really useful, but I'm still having issues retrieving the data in the shape shown above:
qs_exams = Exams.objects.filter(initial_exam_date__gt=now()).values_list('student__id', flat=True)
qs_report = Report.objects.filter(student__id__in=qs_exams).values_list('id', flat=True)
qs_mood = StudentMood.objects.select_related('report') \
.filter(report__id__in=qs_report).order_by('report__student_id', '-created').distinct()
But this query is still giving me all the StudentMoods throughout the school year. Sooooo, any ideas?

How to match phone number prefix to country from phonenumber in SQL

I am trying to extract the country code prefix from a list of numbers, and match them to the region that they belong to. The data might look something like this:
| id | phone_number |
|----|----------------|
| 1 | +27000000000 |
| 2 | +16840000000 |
| 3 | +10000000000 |
| 4 | +27000000000 |
The country codes here are:
American Samoa: +1684
United States and Caribbean: +1
South Africa: +27
And the desired result would be something like this:
| country | count |
|-----------------------------|-------|
| South Africa | 2 |
| American Samoa | 1 |
| United States and Caribbean | 1 |
There are some difficulties, because:
country prefix codes vary from 1 to 4 digits, and even without the country prefix,
phone number length varies from place to place.
I do not have write access to this DB, so adding another column, while probably the best solution, will not work in this use case.
This is my current solution:
SELECT
CASE
WHEN SUBSTRING(phone_number,1,5) = '+1684' THEN 'American Samoa'
WHEN SUBSTRING(phone_number,1,5) = '+1264' THEN 'Anguilla'
...
WHEN SUBSTRING(phone_number,1,5) = '+1599' THEN 'Saint Martin'
WHEN SUBSTRING(phone_number,1,4) = '+355' THEN 'Albania'
WHEN SUBSTRING(phone_number,1,4) = '+213' THEN 'Algeria'
...
WHEN SUBSTRING(phone_number,1,4) = '+263' THEN 'Zimbabwe'
WHEN SUBSTRING(phone_number,1,3) = '+93' THEN 'Afghanistan'
WHEN SUBSTRING(phone_number,1,3) = '+54' THEN 'Argentina'
...
WHEN SUBSTRING(phone_number,1,3) = '+58' THEN 'Venezuela'
WHEN SUBSTRING(phone_number,1,3) = '+84' THEN 'Vietnam'
WHEN SUBSTRING(phone_number,1,2) = '+1' THEN 'United States and Caribbean'
WHEN SUBSTRING(phone_number,1,2) = '+7' THEN 'Kazakhstan, Russia'
ELSE 'unknown'
END as country_name,
count(*)
FROM users
GROUP BY country_name
order by count desc
There are ~205 WHEN ... THEN cases. It seems to be very inefficient and times out. I assume this is because it runs the pattern matching on every row. This would need to scale to tens of millions of rows.
Is there a more efficient way to do this?
I am using PostgreSQL 9.6.16.
In spite of reading the whole table, an index could help here. In order to aggregate the data per country code, the DBMS must sort all rows by country code. Sorting is an expensive operation, especially on large data sets. If you had an index on the country codes, the DBMS would find the codes already pre-sorted in the index and could avoid the work of sorting the data.
You don't have the separate country code in a column, but each phone number starts with the code, so you could index the complete phone number:
create index idx on users (phone_number);
Then you must make it obvious to the DBMS that you are interested in the beginnings of the strings, so it will consider using the index. Invoking a function like SUBSTRING on the phone number is likely to make the DBMS blind to this. Use LIKE instead. According to the docs (https://www.postgresql.org/docs/9.3/indexes-types.html), indexes on strings can be used with LIKE 'something%' (if the database does not use the C locale, the index has to be created with a pattern operator class such as text_pattern_ops for this to work):
WHEN phone_number LIKE '+1684%' THEN 'American Samoa'
There is no guarantee this will help, but it's worth a try I think. It depends on whether the optimizer sees the advantage of using the pre-sorted phone numbers from the index.
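Separately from the indexing advice, the matching rule that the long CASE expression encodes (try the longest prefix first, then progressively shorter ones) can be expressed as a lookup table plus a short loop. The Python sketch below illustrates that rule on the sample data from the question; it is not what the answer above proposes, the dictionary holds only the three prefixes listed in the question, and the names COUNTRY_BY_PREFIX and country_for are made up.

```python
from collections import Counter

# Only the prefixes listed in the question; a real table would hold ~205 entries.
COUNTRY_BY_PREFIX = {
    "+1684": "American Samoa",
    "+1": "United States and Caribbean",
    "+27": "South Africa",
}
MAX_PREFIX_LEN = max(len(prefix) for prefix in COUNTRY_BY_PREFIX)


def country_for(phone_number: str) -> str:
    """Return the country of the longest known prefix, or 'unknown'."""
    # Try the longest candidate first so '+1684' wins over '+1'.
    # The shortest possible prefix is '+' plus one digit, i.e. length 2.
    for length in range(min(MAX_PREFIX_LEN, len(phone_number)), 1, -1):
        country = COUNTRY_BY_PREFIX.get(phone_number[:length])
        if country is not None:
            return country
    return "unknown"


phone_numbers = ["+27000000000", "+16840000000", "+10000000000", "+27000000000"]
counts = Counter(country_for(number) for number in phone_numbers)
for country, count in counts.most_common():
    print(country, count)
```

Each number needs at most four dictionary lookups instead of up to ~205 comparisons, which is the main appeal of keeping the prefixes in a table rather than in a CASE expression.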