Is it possible to rename a column that contains special characters and numbers in Microsoft Azure using KQL?

I started working with Microsoft Sentinel.
I'm working on gathering information from the logs that Sentinel produces.
For better readability, I want to change the names of the columns that I'm projecting, but I couldn't rename a column whose name contained numbers and special characters.
I'm using KQL to gather the logs from Sentinel:
AuditLogs
| where OperationName == "Add group" or OperationName == "Delete group"
| where TimeGenerated > ago(20d)
| project TargetResources[0].displayName, OperationName, ActivityDateTime
| project-rename GroupName = TargetResources[0].displayName, Time = ActivityDateTime, Type = OperationName
Renaming the columns ActivityDateTime and OperationName works, but I get a "column name expected" error when trying to rename the first column, even though that column appears in the output when I run the query without the rename.
Is there a way to rename that column?

The extend operator creates a calculated column and appends it to the result set. Since you only need to give the value a name, you can do that directly in the project operator. project-rename doesn't work for expressions.
AuditLogs
| where OperationName == "Add group" or OperationName == "Delete group"
| where TimeGenerated > ago(20d)
| project GroupName=TargetResources[0].displayName, Type=OperationName, Time = ActivityDateTime

TargetResources[0].displayName is an expression, not a column name, so there's nothing to rename here.
If you want to give this expression a name, you can use the extend operator.
| extend GroupName = TargetResources[0].displayName
print TargetResources = dynamic([{"displayName": "Tic"}, {"displayName": "Tac"}, {"displayName": "Toe"}])
| project-rename GroupName = TargetResources[0].displayName
project-rename: expression '' cannot be used as a column name
print TargetResources = dynamic([{"displayName": "Tic"}, {"displayName": "Tac"}, {"displayName": "Toe"}])
| extend GroupName = TargetResources[0].displayName
TargetResources | GroupName
[{"displayName":"Tic"},{"displayName":"Tac"},{"displayName":"Toe"}] | Tic
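For the case the title asks about, where a real column name contains spaces, digits, or special characters, the name can be wrapped in [' '] so that project-rename accepts it. The following is a minimal sketch; the table and the column name 'Group-Name 1' are made up for illustration and are not part of AuditLogs.

```kql
// Hypothetical table with a column whose name contains a hyphen, a space, and a digit.
datatable(['Group-Name 1']: string, OperationName: string)
[
    "Tic", "Add group",
    "Tac", "Delete group"
]
// Bracket-quote the existing column name; the new name is a plain identifier.
| project-rename GroupName = ['Group-Name 1'], Type = OperationName
```

This only applies when the quoted text is the literal name of an existing column. TargetResources[0].displayName is still an expression, so it needs project or extend as described above.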

Related

Self-join Kusto Query in Analytics Rule

I am working within Microsoft Sentinel Analytics Rules with the Kusto Query Language (KQL).
I need to work with a table called CrowdstrikeReplicatorLogs_CL, which contains a) data rows that I need to alert on and b) metadata rows that contain information about the subject of the alert.
This means I need to self-join the table to get the final result.
The column to join on is aid_g.
ThreatIntelligenceIndicator
| where foo == bar
| join kind=innerunique (
CrowdstrikeReplicatorLogs_CL
| where TimeGenerated >= ago(dt_lookBack)
| where event_simpleName_s has_any ("NetworkConnectIP4", "NetworkConnectIP6")
| extend json=parse_json(custom_fields_message_s)
| extend ip4 = json["RemoteAddressIP4"], ip6=json["RemoteAddressIP6"]
| extend CS_ipEntity = tostring(iff(isnotempty(ip4), ip4, ip6))
| extend CommonSecurityLog_TimeGenerated = TimeGenerated
) on $left.TI_ipEntity == $right.CS_ipEntity
| join kind=innerunique (
CrowdstrikeReplicatorLogs_CL
| where custom_fields_message_s has "ComputerName"
| extend customFields=parse_json(custom_fields_message_s)
| project Hostname=customFields['ComputerName'], Platform=event_platform_s, aid_g
) on $left.aid_g == $right.aid_g
;
However, this raises a "Query contains incompatible 'set' commands." error in Sentinel.
Is there a proper way to self-join tables?
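As a rough illustration of the self-join shape described above, here is a minimal sketch that runs against a small inline table rather than the real one. The datatable, its sample values, and the names Logs, Hosts, and RemoteIP are assumptions made for the example. It shows the join pattern only and does not diagnose the 'set' commands error.

```kql
// Hypothetical stand-in for CrowdstrikeReplicatorLogs_CL with only the columns used here.
let Logs = datatable(aid_g: string, event_simpleName_s: string, custom_fields_message_s: string)
[
    "a1", "NetworkConnectIP4", '{"RemoteAddressIP4":"203.0.113.7"}',
    "a1", "AgentOnline",       '{"ComputerName":"host-01"}',
    "a2", "NetworkConnectIP4", '{"RemoteAddressIP4":"198.51.100.9"}'
];
// Metadata rows: reduce to one hostname per agent id.
let Hosts =
    Logs
    | where custom_fields_message_s has "ComputerName"
    | extend Hostname = tostring(parse_json(custom_fields_message_s)["ComputerName"])
    | summarize Hostname = take_any(Hostname) by aid_g;
// Data rows, enriched with the metadata through a join on aid_g.
Logs
| where event_simpleName_s has_any ("NetworkConnectIP4", "NetworkConnectIP6")
| extend RemoteIP = tostring(parse_json(custom_fields_message_s)["RemoteAddressIP4"])
| join kind=leftouter Hosts on aid_g
| project aid_g, RemoteIP, Hostname
```

Naming each side with let keeps the two roles of the same table visible, and reducing the metadata side to one row per aid_g before joining avoids multiplying the data rows.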

SPLUNK use result from first search in second search

Say I have a query such as
index="example" source="example.log" host="example" "ERROR 1234"
| stats distinct_count by id
This will give me all the events with that error code per id.
I then want to combine this query to search the same log file for another string but only on the unique id's returned from the first search. Because the new string will appear on a separate event I can't just do an 'AND'.
There are a few ways to do that, including using subsearches, join, or append, but those require multiple passes through the data. Here is a way that makes a single pass through the index.
index=example source="example.log" ("ERROR 1234" OR "ERROR 5678")
``` Check for the presence of each string in the event ```
| eval string1=if(searchmatch("ERROR 1234"), 1, 0)
| eval string2=if(searchmatch("ERROR 5678"), 1, 0)
``` Count string occurrences by id ```
| stats sum(string1) as string1, sum(string2) as string2 by id
``` Keep only the ids that have both strings ```
| where (string1 > 0 AND string2 > 0)
You can search for "some other string" in subsearch and then join the queries on the id:
index="example" source="example.log" host="example" "ERROR 1234"
| join id [search index="example" source="example.log" host="example" "some other string" ]
| stats distinct_count by id
Presuming your id field is the same and available in both indices, this form should work:
(index=ndxA sourcetype=srctpA id=* source=example.log host=example "ERROR 1234") OR (index=ndxB sourcetype=srctpB id=* "some other string")
| rex field=_raw "(?<first_string>ERROR 1234)"
| rex field=_raw "(?<second_string>some other string)"
| fillnull value="-" first_string second_string
| stats count by id first_string second_string
| search NOT (first_string="-" OR second_string="-")
If your id field has a different name in the other index, do a rename like this before the stats line:
| rename otherIdFieldName as id
Advantages of this format:
You are not limited by subsearch constraints (the search must finish in 60 seconds and return no more than 50k rows).
The Search Peers (i.e. Indexers) handle all of the overhead, instead of waiting on the Search Head that initiated the search to do lots of post-processing (all the SH does is send the distributed search, then apply a post-stats filter to ensure both first_string and second_string have the values you are looking for).

Extracting value of a json in Spark SQl

I am looking to aggregate by the value of a JSON key extracted from one of the columns. Can someone help me with the right syntax in Spark SQL?
select count(distinct(Name)) as users, xHeaderFields['xyz'] as app group by app order by users desc
The table column looks something like this. I have removed other columns for simplification. The table has other columns such as Name.
Assuming that your dataset is called ds and there is only one object with key = xyz per column value:
First, parse the JSON string (if needed):
ds = ds.withColumn("xHeaderFields", expr("from_json(xHeaderFields, 'array<struct<key:string,value:string>>')"))
Then filter the key = xyz and take the first element (assuming there is only one xyz key):
.withColumn("xHeaderFields", expr("filter(xHeaderFields, x -> x.key == 'xyz')[0]"))
Finally, extract value from your object:
.withColumn("xHeaderFields", expr("xHeaderFields.value"))
Final result:
+-------------+
|xHeaderFields|
+-------------+
|null |
|null |
|Settheclass |
+-------------+
Good luck!

Deep dive Azure Log analytics cost using KQL query

I'm running the following Log Analytics Kusto query to find out what generates our Log Analytics cost:
Usage
| where IsBillable == true
| summarize BillableDataGB = sum(Quantity) by Solution, DataType
| sort by Solution asc, DataType asc
The output is as follows:
What kind of query should I use if I want to dig deeper into, e.g., ContainerInsights/InfrastructureInsights/ServiceMap/VMInsights/LogManagement, to get more detailed data on which names or namespaces actually drive the cost?
The InsightsMetrics table has, for example, these names and namespaces.
I was able to get something out using the following query, but something is still missing. I'm not sure whether I'm on the right track:
union withsource = tt *
| where _IsBillable == true
| extend Namespace, Name
Here is a Kusto query for getting the pod name and namespace details:
let startTimestamp = ago(1h);
KubePodInventory
| where TimeGenerated > startTimestamp
| project ContainerID, PodName=Name, Namespace
| where PodName contains "name" and Namespace startswith "namespace"
| distinct ContainerID, PodName
| join
(
ContainerLog
| where TimeGenerated > startTimestamp
)
on ContainerID
// at this point before the next pipe, columns from both tables are available to be "projected". Due to both
// tables having a "Name" column, we assign an alias as PodName to one column which we actually want
| project TimeGenerated, PodName, LogEntry, LogEntrySource
| summarize by TimeGenerated, LogEntry
| order by TimeGenerated desc
For more information, see the Microsoft documentation and the Kusto Query tutorial.
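To tie the breakdown back to cost, one possible approach is to sum the standard _BilledSize column (ingested bytes per record) for billable records and group by the columns of interest. This is a sketch only, assuming the workspace has an InsightsMetrics table with Namespace and Name columns as mentioned in the question. The one-day time range and the BilledGB name are arbitrary choices.

```kql
InsightsMetrics
| where TimeGenerated > ago(1d)
// Keep only records that count toward the bill.
| where _IsBillable == true
// _BilledSize is in bytes; dividing by 1e9 gives decimal gigabytes.
| summarize BilledGB = sum(_BilledSize) / 1e9 by Namespace, Name
| sort by BilledGB desc
```

The same pattern should carry over to other tables by changing the table name and the grouping columns to whatever that table exposes.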

Take output from query and use in subsequent KQL query

I'm using Azure Log Analytics to review certain events of interest.
I would like to obtain timestamps from data that meets certain criteria, and then reuse these timestamps in further queries, i.e. to see what else occurred around these times.
The following query returns the desired results, but I'm stuck at how to use the interestingTimes var to then perform further searches and show data within X minutes of each previously returned timestamp.
let interestingTimes =
Event
| where TimeGenerated between (datetime(2021-04-01T11:57:22) .. datetime('2021-04-01T15:00:00'))
| where EventID == 1
| parse EventData with * '<Data Name="Image">' ImageName "<" *
| where ImageName contains "MicrosoftEdge.exe"
| project TimeGenerated
;
Any pointers would be greatly appreciated.
interestingTimes will only be available for use in the query where you declare it. You can't use it in another query, unless you define it there as well.
By the way, you can make your query much more efficient by adding a filter that will utilize the built-in index for the EventData column, so that the parse operator will run on a much smaller amount of records:
let interestingTimes =
Event
| where TimeGenerated between (datetime(2021-04-01T11:57:22) .. datetime('2021-04-01T15:00:00'))
| where EventID == 1
| where EventData has "MicrosoftEdge.exe" // <-- OPTIMIZATION that will filter out most records
| parse EventData with * '<Data Name="Image">' ImageName "<" *
| where ImageName contains "MicrosoftEdge.exe"
| project TimeGenerated
;
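Within the same query, interestingTimes can then be joined back against the events to find what happened within X minutes of each timestamp. Below is a minimal sketch of a time-window join, written against a small inline table so it can be run anywhere. The Events datatable, its sample rows, and the names lookupWindow, InterestingTime, and TimeKey are assumptions made for the example.

```kql
let lookupWindow = 5m;
// Hypothetical stand-in for the Event table.
let Events = datatable(TimeGenerated: datetime, EventID: int, Note: string)
[
    datetime(2021-04-01 12:00:00), 1, "interesting",
    datetime(2021-04-01 12:03:00), 7, "nearby",
    datetime(2021-04-01 14:00:00), 7, "unrelated"
];
let interestingTimes =
    Events
    | where EventID == 1
    | project InterestingTime = TimeGenerated;
Events
// Bucket every event so that the join has an equality key.
| extend TimeKey = bin(TimeGenerated, lookupWindow)
| join kind=inner (
    interestingTimes
    // Emit every bucket that overlaps the window around each interesting timestamp.
    | extend TimeKey = range(bin(InterestingTime - lookupWindow, lookupWindow), bin(InterestingTime + lookupWindow, lookupWindow), lookupWindow)
    | mv-expand TimeKey to typeof(datetime)
) on TimeKey
// The buckets are coarse, so apply the exact window check afterwards.
| where TimeGenerated between ((InterestingTime - lookupWindow) .. (InterestingTime + lookupWindow))
| project InterestingTime, TimeGenerated, EventID, Note
```

Bucketing is used because join only supports equality conditions. The final where clause removes pairs that share a bucket but fall outside the exact window.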