Need help forming a Splunk query that takes sessionId from a particular set of logs and uses it to form the next query

I need to form a Splunk query to find a particular sessionId for which Log a is available but Log b is not. Both are part of the same transaction, but the code breaks somewhere in between.
LOGGER.info("Log a:: setting some details in session");
Response response = handler.transactionMethod(token); //throws some exception
LOGGER.info("Log b:: getting details in session");
So in the success scenario, both Log a and Log b will be printed. But when transactionMethod throws an exception, only Log a will be printed for that sessionId and not Log b.
The requirement: I need to find any sessionId for which only Log a is present and Log b is not.

Assuming that you have 2 fields TEXT and SessionID already defined, we will use the following test data:
SessionID=1001 TEXT="setting"
SessionID=1001 TEXT="getting"
SessionID=1002 TEXT="setting"
SessionID=1003 TEXT="getting"
Splunk query:
| makeresults count=4
| streamstats count
| eval TEXT=case(count=1 OR count=3, "setting", count=2 OR count=4, "getting")
| eval SessionID=case(count=1 OR count=2, 1001, count=3, 1002, count=4, 1003)
``` The above is just setting of the test data ```
``` Below is the actual SPL for the task ```
| stats count(eval(TEXT="setting")) as LogA count(eval(TEXT="getting")) as LogB by SessionID
| search LogA>0 AND LogB=0
As you can see, I specifically excluded the case where only the "LogB" record is present (SessionID=1003).
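To make the stats/search logic concrete, here is the same check sketched in Python over the test data above: count Log a ("setting") and Log b ("getting") events per SessionID, then keep the sessions with LogA > 0 and LogB = 0. `sessions_missing_log_b` is a hypothetical helper for illustration, not anything in Splunk.

```python
from collections import Counter

# Test data mirroring the makeresults example: (SessionID, TEXT) pairs
events = [
    (1001, "setting"),
    (1001, "getting"),
    (1002, "setting"),
    (1003, "getting"),
]

def sessions_missing_log_b(events):
    """Return SessionIDs with at least one 'setting' (Log a) event
    and no 'getting' (Log b) event, like the stats/search pipeline."""
    log_a = Counter(sid for sid, text in events if text == "setting")
    log_b = Counter(sid for sid, text in events if text == "getting")
    # A Counter returns 0 for a missing key, matching LogB=0 in the SPL
    return sorted(sid for sid in log_a if log_a[sid] > 0 and log_b[sid] == 0)

print(sessions_missing_log_b(events))  # [1002]
```

SessionID 1003 is excluded because it has no Log a event at all, which is the same reason the SPL requires LogA>0.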


Retrieve a variable in the case of a Scenario Outline

I want to retrieve an object id (processId) from the response of a simple POST request in feature A:
Given url <url>
And path 'processes'
And header Authorization = 'Bearer ' + <token>
And request process
When method post
Then status 201
* def processId = response.id
I test this request against 3 different environments in a Scenario Outline (variable <url>), so I need to retrieve the 3 ids to use them in feature B.
My question is: how can I retrieve these IDs for use in feature B?
Thanks
There is no way you can re-use data from one Scenario in another, and a Feature is certainly out of the question. Please take some time to read this: https://stackoverflow.com/a/46080568/143475
That said, if all you need to do is call a Scenario in a loop, just do that. Here is a simple example you can try:
Feature:
Scenario:
* table data
| value |
| 'one' |
| 'two' |
* def result = call read('called.feature') data
* def traceIds = $result[*].traceId
* print traceIds
And called.feature is simply:
#ignore
Feature:
Scenario:
* url 'https://httpbin.org/post'
* request { key: '#(value)' }
* method post
* def traceId = response.headers['X-Amzn-Trace-Id']
Please read the docs to understand how you can "collect" data from the "called" feature, traceId in this case: https://github.com/karatelabs/karate#data-driven-features
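As a rough analogy (Python, not Karate), a data-driven call runs the called feature once per row of the table and returns a list holding each run's variables; `$result[*].traceId` then projects one field out of that list. `called_feature` below is a hypothetical stand-in that returns a made-up trace id instead of posting to httpbin.

```python
def called_feature(row):
    # Hypothetical stand-in for called.feature: a real run would POST to
    # httpbin and read the X-Amzn-Trace-Id header. The id here is made up.
    return {"value": row["value"], "traceId": "trace-" + row["value"]}

# Equivalent of the `table data` step: one dict per row
data = [{"value": "one"}, {"value": "two"}]

# Like: * def result = call read('called.feature') data
result = [called_feature(row) for row in data]

# Like: * def traceIds = $result[*].traceId
trace_ids = [r["traceId"] for r in result]
print(trace_ids)  # ['trace-one', 'trace-two']
```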

Process fields with nested arrays into strings with strcat_array for output in Kusto

I would like to process Azure AD audit logs into HTML tables/CSV files. The data contains nested sets of arrays that I would like to summarise into a comma-separated string.
E.g., data that looks like this:
{
"TargetResources": [{"displayName": "Policy",
"modifiedProperties": [{"displayname": "PolicySetting1"},
{"displayname": "PolicySetting2"}]
}]
}
Would be processed into:
TargetResource | Policy
modifiedProps | PolicySetting1, PolicySetting2
mv-expand doesn't seem to work, because some rows do not have modifiedProperties, so those rows get eliminated.
The only solution I have been able to find that gets close to what I am trying to do looks like this:
AuditLogs
| extend TargetResource = tostring(TargetResources[0].displayName)
| extend ModifiedProperty0 = tostring(parse_json(tostring(TargetResources[0].modifiedProperties))[0].displayName)
| extend ModifiedProperty1 = tostring(parse_json(tostring(TargetResources[0].modifiedProperties))[1].displayName)
| extend ModifiedProperty2 = tostring(parse_json(tostring(TargetResources[0].modifiedProperties))[2].displayName)
| extend ModifiedProperties = strcat(ModifiedProperty0,", ",ModifiedProperty1,", ",ModifiedProperty2)
This solution is limited: it only works properly for exactly 3 modifiedProperty values, but I need it to handle an arbitrary number. The solution should work when modifiedProperties does not exist and when there are 0-15 values.
Thank you for any help you can provide
If I understood your description correctly, you could use mv-apply (twice) to achieve that:
datatable(d: dynamic)
[
dynamic({"TargetResources":[{"displayName": "Policy0","someOtherProperty":"hello world"}]}),
dynamic({"TargetResources":[{"displayName": "Policy1","modifiedProperties":[{"displayname":"PolicySetting1"},{"displayname":"PolicySetting2"}]}]}),
dynamic({"TargetResources":[{"displayName": "Policy2","modifiedProperties":[{"displayname":"PolicySetting3"},{"displayname":"PolicySetting4"}]}, {"displayName":"Policy3","modifiedProperties":[{"displayname":"PolicySetting5"},{"displayname":"PolicySetting6"}]}]}),
]
| mv-apply tr = d.TargetResources on (
extend TargetResource = tr.displayName
| mv-apply mp = tr.modifiedProperties on (
extend propertyName = mp.displayname
| summarize modifiedProps = strcat_array(make_set(propertyName), ", ")
)
)
| project TargetResource, modifiedProps
TargetResource | modifiedProps
Policy0 |
Policy1 | PolicySetting1, PolicySetting2
Policy2 | PolicySetting3, PolicySetting4
Policy3 | PolicySetting5, PolicySetting6
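To make the mv-apply semantics concrete, the same flattening can be sketched in plain Python: one output row per TargetResource, with its modifiedProperties display names joined into a comma-separated string, and an empty string when the array is missing. This assumes the records are already parsed into dicts; `flatten` is a hypothetical helper.

```python
records = [
    {"TargetResources": [{"displayName": "Policy0", "someOtherProperty": "hello world"}]},
    {"TargetResources": [{"displayName": "Policy1",
                          "modifiedProperties": [{"displayname": "PolicySetting1"},
                                                 {"displayname": "PolicySetting2"}]}]},
    {"TargetResources": [{"displayName": "Policy2",
                          "modifiedProperties": [{"displayname": "PolicySetting3"},
                                                 {"displayname": "PolicySetting4"}]},
                         {"displayName": "Policy3",
                          "modifiedProperties": [{"displayname": "PolicySetting5"},
                                                 {"displayname": "PolicySetting6"}]}]},
]

def flatten(records):
    rows = []
    for record in records:
        # Outer loop plays the role of the first mv-apply (one row per TargetResource)
        for tr in record.get("TargetResources", []):
            # A missing modifiedProperties array yields no names instead of dropping the row
            names = [mp.get("displayname") for mp in tr.get("modifiedProperties", [])]
            # dict.fromkeys de-duplicates while keeping order; make_set also
            # de-duplicates but does not guarantee order
            unique = [n for n in dict.fromkeys(names) if n is not None]
            rows.append((tr.get("displayName"), ", ".join(unique)))
    return rows

for target, props in flatten(records):
    print(target, "|", props)
```

The key point, in both versions, is that the inner step aggregates back to a single value per resource, so rows without modifiedProperties survive with an empty string rather than disappearing as they do with a bare mv-expand.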

How to track SLA of VM availability set (or availability zone) through heartbeats with Log Analytics (KQL)

I want to track the SLAs of our VMs in a Monitor Workbook using a Log Analytics query.
For this, I use the 'Heartbeat' table, which gives the heartbeats of each VM.
However, some of our VMs are in an availability set/zone, so for those the SLA is only broken if, within a 1-minute interval, both heartbeats are missing.
As such, I need to be able to group the heartbeats by availability set/zone in the query, but there doesn't seem to be such a property on the heartbeat.
I can use a separate Azure Resource Graph query to search for which VMs are in an availability set/zone, but when I merge this query with my Log Analytics query, I can't do any further Kusto Query Language processing on the query (I can only merge the tables).
For information, these are my Log Analytics Heartbeat query and my Resource Graph SLA query:
let timeRangeStart = {TimeRange:start};
let timeRangeEnd = {TimeRange:end};
Heartbeat
| where ResourceType == "virtualMachines"
| extend ResourceGroup = case(ResourceGroup <> "", ResourceGroup, "On-Prem")
| where TimeGenerated > timeRangeStart and TimeGenerated < timeRangeEnd and Computer in ({Servers})
| extend Resource=tolower(iff(isempty(_ResourceId), Resource, _ResourceId))
| summarize heartbeat_tot = count() by Resource,ResourceGroup, SubscriptionId
| extend total_number_of_buckets=round((timeRangeEnd-timeRangeStart)/1m)
| extend availability_rate = round(heartbeat_tot * 100.0 / total_number_of_buckets, 2)
| extend availability_rate = min_of(availability_rate, 100)
| order by availability_rate asc
Resources // VMs
| where type == 'microsoft.compute/virtualmachines'
| extend AvSet = properties.availabilitySet.id
| extend AvZone = properties.availabilityZone.id
| extend VMname_SLA = iff(isnotempty(AvZone), AvZone, iff(isnotempty(AvSet), AvSet, id))
| extend SLA_VM = iff(isnotnull(AvZone), '99.99%', iff(isnotnull(AvSet), '99.95%', ''))
| extend managedBy = tolower(id)
| join kind = leftouter (
Resources // Disks
| where type == 'microsoft.compute/disks'
| where isnotempty(managedBy)
| extend managedBy = tolower(managedBy)
// What do Standard HDD disks have as SKU tag??? I used StandardHDD for the time being
| extend Tier_disk = sku.tier
| extend SLA_disk = iff(Tier_disk == 'StandardHDD', '95%', iff(Tier_disk == 'Standard', '99.5%', '99.9%'))
) on managedBy
| extend SLA_tot = iff(isnotempty(SLA_VM), SLA_VM, SLA_disk)
| project managedBy, VMname_SLA, SLA_tot
| order by managedBy asc
How many resources is it?
If it is not a large number of resources, a workaround would be:
Run your ARG query in a text parameter, and format the results of the query to effectively generate a JSON array of objects with the id, location, etc. that you need. Then mark this parameter as hidden.
In your Logs query, reference that parameter's JSON text before the query, and use KQL operators to turn that JSON structure into a table. Then you can join/filter on that table in the query.
It isn't optimal, and won't work well with large numbers of resources, since every time you run your query you're effectively "uploading" a JSON blob and then immediately parsing it apart again.
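Once a VM-to-availability-set/zone mapping is available in the Logs query (for example via the hidden-parameter workaround above), the rule from the question can be applied: a group's 1-minute bucket only counts as down when none of its VMs sent a heartbeat. The following is a Python sketch of that rule, not KQL; `group_availability`, the group ids and the sample timestamps are made up for illustration. Unlike the question's query, which divides the raw heartbeat count by the number of buckets, it counts distinct minutes that had at least one heartbeat.

```python
from collections import defaultdict
from datetime import datetime, timedelta

def group_availability(heartbeats, vm_group, start, end):
    """heartbeats: iterable of (vm, timestamp).
    vm_group: vm -> availability set/zone id (standalone VMs map to themselves).
    Returns group -> availability percentage over 1-minute buckets in [start, end)."""
    minute = timedelta(minutes=1)
    total_buckets = int((end - start) / minute)
    up = defaultdict(set)  # group -> indexes of buckets with at least one heartbeat
    for vm, ts in heartbeats:
        if start <= ts < end:
            bucket = int((ts - start) / minute)
            up[vm_group.get(vm, vm)].add(bucket)
    groups = set(vm_group.values()) | set(up)
    return {g: round(100 * len(up[g]) / total_buckets, 2) for g in sorted(groups)}

# Made-up sample: vm1 and vm2 share an availability set, vm3 is standalone
start = datetime(2024, 1, 1, 0, 0)
end = start + timedelta(minutes=4)
vm_group = {"vm1": "avset-a", "vm2": "avset-a", "vm3": "vm3"}
heartbeats = [
    ("vm1", start + timedelta(seconds=10)),
    ("vm1", start + timedelta(minutes=1, seconds=10)),
    ("vm2", start + timedelta(minutes=1, seconds=40)),
    ("vm2", start + timedelta(minutes=2, seconds=5)),
] + [("vm3", start + timedelta(minutes=m)) for m in range(4)]

print(group_availability(heartbeats, vm_group, start, end))  # {'avset-a': 75.0, 'vm3': 100.0}
```

Here avset-a is only down in the last minute, when neither vm1 nor vm2 reported.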

Using PowerShell to get the values of a SQL job schedule

Currently, I am getting the job schedule for an existing SQL job, and I want to get how frequently it runs (i.e. daily, weekly, monthly, etc.), when it should run next, and whether the job runs on weekends. I understand how to get all that information by running:
Get-SqlAgent -ServerInstance "$SERVER" | Get-SqlAgentJob $job | Get-SqlAgentJobSchedule | Format-List -Property *
This shows me all the relevant information I need:
Parent : Test_Job
ActiveEndDate : 12/31/9999 12:00:00 AM
ActiveEndTimeOfDay : 23:59:59
ActiveStartDate : 3/4/2020 12:00:00 AM
ActiveStartTimeOfDay : 00:00:00
DateCreated : 3/4/2020 2:08:00 PM
FrequencyInterval : 1
FrequencyRecurrenceFactor : 0
FrequencyRelativeIntervals : First
FrequencySubDayInterval : 2
FrequencySubDayTypes : Hour
FrequencyTypes : Daily
IsEnabled : True
JobCount : 1
I am looking at Microsoft's page on how to interpret the frequency information, but so far it seems the only option is a bunch of nested IF statements that determine how often the job runs. I can do it this way, but I figured there has to be a cleaner way to get the information I need. This is how I am currently parsing the information:
if($frequency -eq "DAILY")
{
$MINUTES_LATEST_RUN_SUCCESS = "1500"
#Code here to see how often it runs in a day
}
elseif($frequency -eq "WEEKLY")
{
$MINUTES_LATEST_RUN_SUCCESS = "11520"
#Code here to see how many days a week it runs
}
elseif($frequency -eq "MONTHLY")
{
$MINUTES_LATEST_RUN_SUCCESS = "50400"
#Code here to see which day of the month it runs
}
else
{
$MINUTES_LATEST_RUN_SUCCESS = "1500"
}
I figured this can't be the best approach.
Any time you get into many if/then statements, it's time to use a switch. There are sample snippets for this in the PowerShell ISE (CTRL+J) and VSCode (CTRL+ALT+J).
$a = 5
switch ($a)
{
1 {"The color is red."}
2 {"The color is blue."}
3 {"The color is green."}
4 {"The color is yellow."}
5 {"The color is orange."}
6 {"The color is purple."}
7 {"The color is pink."}
8 {"The color is brown."}
default {"The color could not be determined."}
}
Here is yours; of course, add your other code to each block as needed.
$frequency = 'DAILY'
switch ($frequency)
{
DAILY {"The job is run $frequency"}
WEEKLY {"The job is run $frequency"}
MONTHLY {"The job is run $frequency"}
default {$MINUTES_LATEST_RUN_SUCCESS = "1500"}
}
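A related way to flatten the if/elseif chain is a lookup table instead of branching. Here is that idea sketched in Python, using the thresholds from the question; `minutes_latest_run_success` is a hypothetical helper name. In PowerShell itself, a hashtable (whose keys are case-insensitive by default) would play the role of the dict.

```python
# Thresholds copied from the question's if/elseif chain
MINUTES_LATEST_RUN_SUCCESS = {
    "DAILY": "1500",
    "WEEKLY": "11520",
    "MONTHLY": "50400",
}

def minutes_latest_run_success(frequency):
    # .upper() mimics PowerShell's case-insensitive -eq/switch matching;
    # unknown frequency types fall back to the same value as the else branch
    return MINUTES_LATEST_RUN_SUCCESS.get(frequency.upper(), "1500")

print(minutes_latest_run_success("Weekly"))  # 11520
```

Adding a new frequency type then means adding one table entry rather than another branch.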
I'm a bit late to the party, but I was also looking for documentation on the frequency interval, as there is no proper documentation on the PowerShell pages, and came across this: https://learn.microsoft.com/en-us/sql/relational-databases/system-tables/dbo-sysschedules-transact-sql?view=sql-server-ver15
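That sysschedules page documents that, for weekly schedules (freq_type = 8), freq_interval is a bitmask of days: 1 = Sunday, 2 = Monday, 4 = Tuesday, 8 = Wednesday, 16 = Thursday, 32 = Friday, 64 = Saturday. Assuming the FrequencyInterval returned by Get-SqlAgentJobSchedule carries the same bitmask for a weekly schedule, the "does it run on weekends" question reduces to a bit test. A Python sketch, with hypothetical helper names:

```python
# Weekly freq_interval bits, per the dbo.sysschedules documentation
WEEKDAY_BITS = [
    (1, "Sunday"), (2, "Monday"), (4, "Tuesday"), (8, "Wednesday"),
    (16, "Thursday"), (32, "Friday"), (64, "Saturday"),
]

def weekly_days(freq_interval):
    """Return the day names encoded in a weekly schedule's freq_interval."""
    return [name for bit, name in WEEKDAY_BITS if freq_interval & bit]

def runs_on_weekend(freq_interval):
    # Sunday (1) or Saturday (64) bit set
    return bool(freq_interval & (1 | 64))

print(weekly_days(62))      # ['Monday', 'Tuesday', 'Wednesday', 'Thursday', 'Friday']
print(runs_on_weekend(62))  # False
```

For daily and monthly schedules the same field means something else (every N days, or the day of the month), so the decoding has to be chosen based on the frequency type first.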

Google Pub/Sub to Dataflow, avoid duplicates with Record ID

I'm trying to build a streaming Dataflow job that reads events from Pub/Sub and writes them into BigQuery.
According to the documentation, Dataflow can detect duplicate message delivery if a Record ID is used (see: https://cloud.google.com/dataflow/model/pubsub-io#using-record-ids).
But even using this Record ID, I still get some duplicates (around 0.0002%).
Did I miss something?
EDIT:
I use the Spotify Async PubSub Client to publish messages with the following snippet:
Message
.builder()
.data(new String(Base64.encodeBase64(json.getBytes())))
.attributes("myid", id, "mytimestamp", timestamp.toString)
.build()
Then I use Spotify Scio to read the messages from Pub/Sub in Dataflow and save them to BigQuery:
val input = sc.withName("ReadFromSubscription")
.pubsubSubscription(subscriptionName, "myid", "mytimestamp")
input
.withName("FixedWindow")
.withFixedWindows(windowSize) // apply windowing logic
.toWindowed // convert to WindowedSCollection
//
.withName("ParseJson")
.map { wv =>
wv.copy(value = TableRow(
"message_id" -> (Json.parse(wv.value) \ "id").as[String],
"message" -> wv.value)
)
}
//
.toSCollection // convert back to normal SCollection
//
.withName("SaveToBigQuery")
.saveAsBigQuery(bigQueryTable(opts), BQ_SCHEMA, WriteDisposition.WRITE_APPEND)
The window size is 1 minute.
After only a few seconds of injecting messages, I already have duplicates in BigQuery.
I use this query to count duplicates:
SELECT
COUNT(message_id) AS TOTAL,
COUNT(DISTINCT message_id) AS DISTINCT_TOTAL
FROM my_dataset.my_table
-- returns 273666 273564
And this one to look at them:
SELECT *
FROM my_dataset.my_table
WHERE message_id IN (
SELECT message_id
FROM my_dataset.my_table
GROUP BY message_id
HAVING COUNT(*) > 1
) ORDER BY message_id
-- returns, for instance:
row | id | processed_at | processed_at_epoch | message
1 | 00166a5c-9143-3b9e-92c6-aab52601b0be | 2017-02-02 14:06:50 UTC | 1486044410367 | { ...json1... }
2 | 00166a5c-9143-3b9e-92c6-aab52601b0be | 2017-02-02 14:06:50 UTC | 1486044410368 | { ...json1... }
3 | 00354cc4-4794-3878-8762-f8784187c843 | 2017-02-02 13:59:33 UTC | 1486043973907 | { ...json2... }
4 | 00354cc4-4794-3878-8762-f8784187c843 | 2017-02-02 13:59:33 UTC | 1486043973741 | { ...json2... }
5 | 0047284e-0e89-3d57-b04d-ebe4c673cc1a | 2017-02-02 14:09:10 UTC | 1486044550489 | { ...json3... }
6 | 0047284e-0e89-3d57-b04d-ebe4c673cc1a | 2017-02-02 14:08:52 UTC | 1486044532680 | { ...json3... }
The BigQuery documentation states that there may be rare cases where duplicates arrive:
"BigQuery remembers this ID for at least one minute" -- if Dataflow takes more than one minute before retrying the insert, BigQuery may allow the duplicate in. You may be able to look at the logs from the pipeline to determine if this is the case.
"In the rare instance of a Google datacenter losing connectivity unexpectedly, automatic deduplication may not be possible."
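To see why a late retry can slip through, here is a toy Python model of best-effort deduplication keyed on an insert id with a limited memory window. It assumes a fixed one-minute window and is only an illustration of the quoted behaviour, not BigQuery's actual implementation; the class name and sample id are made up.

```python
from datetime import datetime, timedelta

class InsertIdDeduper:
    """Toy model: an insert id is remembered only for `window`;
    a retry arriving after that is accepted as a new row."""

    def __init__(self, window=timedelta(minutes=1)):
        self.window = window
        self.seen = {}  # insert id -> time it was last accepted

    def accept(self, insert_id, now):
        first = self.seen.get(insert_id)
        if first is not None and now - first < self.window:
            return False  # duplicate inside the window: dropped
        self.seen[insert_id] = now
        return True

t0 = datetime(2017, 2, 2, 14, 0, 0)
d = InsertIdDeduper()
print(d.accept("00166a5c", t0))                          # True
print(d.accept("00166a5c", t0 + timedelta(seconds=20)))  # False
print(d.accept("00166a5c", t0 + timedelta(seconds=90)))  # True (retry after the window)
```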
You may want to try the instructions for manually removing duplicates. This will also allow you to see the insertID that was used with each row to determine if the problem was on the Dataflow side (generating different insertIDs for the same record) or on the BigQuery side (failing to deduplicate rows based on their insertID).
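The manual clean-up boils down to keeping one row per message_id. A minimal Python sketch over rows already fetched from the table, using ids and epochs from the sample output above; keeping the earliest processed_at_epoch is an arbitrary choice, and `dedupe` is a hypothetical helper. In BigQuery itself this is usually expressed with ROW_NUMBER() OVER (PARTITION BY message_id ...).

```python
def dedupe(rows, key="message_id", order_by="processed_at_epoch"):
    """Keep one row per key: the one with the smallest order_by value."""
    best = {}
    for row in rows:
        k = row[key]
        if k not in best or row[order_by] < best[k][order_by]:
            best[k] = row
    return list(best.values())

rows = [
    {"message_id": "00166a5c-9143-3b9e-92c6-aab52601b0be", "processed_at_epoch": 1486044410367},
    {"message_id": "00166a5c-9143-3b9e-92c6-aab52601b0be", "processed_at_epoch": 1486044410368},
    {"message_id": "00354cc4-4794-3878-8762-f8784187c843", "processed_at_epoch": 1486043973907},
    {"message_id": "00354cc4-4794-3878-8762-f8784187c843", "processed_at_epoch": 1486043973741},
]

unique = dedupe(rows)
# Same comparison as the COUNT vs COUNT(DISTINCT) query
print(len(rows), len(unique))  # 4 2
```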