Calculate time difference between two operations using a Kusto query (KQL)

I would like to calculate the time difference between two operations whose name contains an operation ID, using a Kusto query. Below is the table and the required output. It would be helpful if anyone could share a Kusto query to achieve this.
timestamp [IST] | name
2022/30/6, 4:10:00.460 AM | fbf759a0-d4be-4d07-adfb-7090a207e667 Success: Reached Three
2022/30/6, 4:12:00.091 AM | fbf759a0-d4be-4d07-adfb-7090a207e667 Success: Reached Two
2022/30/6, 4:10:00.460 AM | ewerewtetete-4d07-adfb-7090a207e667 Retry message
2022/30/6, 4:14:10.791 AM | fbf759a0-d4be-4d07-adfb-7090a207e667 Success: Reached One
2022/30/6, 4:15:10.460 AM | ewerewtetete-4d07-adfb-7090a207e667 Success: Reached Three
2022/30/6, 4:20:04.343 AM | fbf759a0-d4be-4d07-adfb-7090a207e667 Retry message
Output: time difference by operationID (which is part of name), in seconds

operationID | time difference (s)
fbf759a0-d4be-4d07-adfb-7090a207e667 | 604
ewerewtetete-4d07-adfb-7090a207e667 | 310

datatable(['timestamp [IST]']:datetime, name:string)
[
"2022-06-30 04:10:00.460" ,"fbf759a0-d4be-4d07-adfb-7090a207e667 Success: Reached Three"
,"2022-06-30 04:12:00.091" ,"fbf759a0-d4be-4d07-adfb-7090a207e667 Success: Reached Two"
,"2022-06-30 04:10:00.460" ,"ewerewtetete-4d07-adfb-7090a207e667 Retry message"
,"2022-06-30 04:14:10.791" ,"fbf759a0-d4be-4d07-adfb-7090a207e667 Success: Reached One"
,"2022-06-30 04:15:10.460" ,"ewerewtetete-4d07-adfb-7090a207e667 Success: Reached Three"
,"2022-06-30 04:20:04.343" ,"fbf759a0-d4be-4d07-adfb-7090a207e667 Retry message"
]
| project-rename ts = ['timestamp [IST]']
| parse name with OperationID " " *
| summarize TimeDiffSec = round((max(ts) - min(ts)) / 1s) by OperationID
OperationID | TimeDiffSec
fbf759a0-d4be-4d07-adfb-7090a207e667 | 604
ewerewtetete-4d07-adfb-7090a207e667 | 310
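If you also want to see the boundary timestamps behind each difference, you can swap the final summarize of the query above for this variant (same logic, extra columns):

| summarize StartTime = min(ts), EndTime = max(ts),
            TimeDiffSec = round((max(ts) - min(ts)) / 1s)
    by OperationID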


NextgenSplunk: Need help forming a Splunk query which takes a sessionId from a particular set of logs and uses it to form the next query

I need to form a Splunk query to find a particular sessionId for which log a is available but log b is not. Both are part of the same transaction, but the code breaks somewhere in between.
LOGGER.info("Log a:: setting some details in session");
Response response = handler.transactionMethod(token); //throws some exception
LOGGER.info("Log b:: getting details in session");
So in the success scenario, both Log a and Log b will be printed. But when transactionMethod throws an exception, only Log a will be printed for that sessionId, not Log b.
The requirement is to find any sessionId for which only Log a is present and not Log b.
Assuming that you have 2 fields TEXT and SessionID already defined, we will use the following test data:
SessionID=1001 TEXT="setting"
SessionID=1001 TEXT="getting"
SessionID=1002 TEXT="setting"
SessionID=1003 TEXT="getting"
Splunk query:
| makeresults count=4
| streamstats count
| eval TEXT=case(count=1 OR count=3, "setting", count=2 OR count=4, "getting")
| eval SessionID=case(count=1 OR count=2, 1001, count=3, 1002, count=4, 1003)
``` The above just sets up the test data ```
``` Below is the actual SPL for the task ```
| stats count(eval(TEXT="setting")) as LogA count(eval(TEXT="getting")) as LogB by SessionID
| search LogA > 0 AND LogB = 0
As you can see, I specifically excluded the case where only the "getting" record is present (SessionID=1003).
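For reference, over the test data above this should return only the session that logged "setting" without "getting":

SessionID  LogA  LogB
1002       1     0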

Sequelize Query - Count associated tables and count all for pagination

This is my first question on Stack Overflow; I've never used it before, but this issue is making me tear my hair out.
I'm building an infinite scroll component for a React app I'm working on, and I'm trying to make a Postgres DB query work.
I have 2 tables: Challenges and UserChallenges. A Challenge has many UserChallenges.
I need to get a subsection of Challenges (from start to end), with each Challenge carrying a count of its "participants" (the number of associated UserChallenges), and also a count of all challenges.
Something like this:
{
    rows: [Challenge, Challenge, Challenge],
    count: n
}
Where each challenge includes the total number of userChallenges as "participants" and count is a count of all challenges.
Here is the query:
let json_query = {
    attributes: {
        include: [[Sequelize.fn("COUNT", Sequelize.col("user_challenges.id")), "participants"]]
    },
    include: [{
        model: UserChallenge, attributes: []
    }],
    order: [['timestamp', 'DESC']],
    offset: start,
    limit: end
}
The start and end quantities are the start and end of the pagination.
I'm running this query as follows:
var challengeInstances = await Challenge.findAndCountAll(json_query)
This results in the following error:
name: 'SequelizeDatabaseError',
parent: error: missing FROM-clause entry for table "user_challenges"
and this is the SQL it says it's running:
`SELECT "challenge".* FROM (SELECT "challenge"."id", "challenge".*, COUNT("user_challenges"."id"), "challenge"."participants" FROM "challenges" AS "challenge" GROUP BY "challenge"."id" ORDER BY "challenge"."end_date" DESC LIMIT '4' OFFSET '0') AS "challenge" LEFT OUTER JOIN "user_challenges" AS "user_challenges" ON "challenge"."id" = "user_challenges"."challenge_id" ORDER BY "challenge"."end_date" DESC;`,
Sequelize or raw queries are both good.
Do let me know if you need any more information and thank you so so much.
You can use a Sequelize literal for this: remove the object form of attributes and paste this instead.
attributes: [
    [
        sequelize.literal(`(
            SELECT COUNT(id)
            FROM user_challenges
            -- your foreign-key condition; from the generated SQL above it should be:
            WHERE user_challenges.challenge_id = "challenge"."id"
        )`),
        'numberOfParticipants'
    ]
]
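Putting it together, a minimal sketch of the whole call (assuming, from the generated SQL in the question, that the foreign key is user_challenges.challenge_id and that Sequelize aliases the model as "challenge"; adjust both to your schema):

let json_query = {
    attributes: {
        // keep all Challenge columns and add the computed count
        include: [
            [
                Sequelize.literal(`(
                    SELECT COUNT(*)
                    FROM user_challenges AS uc
                    WHERE uc.challenge_id = "challenge"."id"
                )`),
                'participants'
            ]
        ]
    },
    order: [['timestamp', 'DESC']],
    offset: start,
    limit: end
};
var challengeInstances = await Challenge.findAndCountAll(json_query);

Because the count now comes from a correlated subquery instead of a JOIN, there is no GROUP BY involved, and findAndCountAll can paginate and count challenges as usual.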

Using PowerShell to get the values of a SQL job schedule

Currently, I am getting the job schedule for an existing SQL job, and I want to get the values of how frequently it runs (i.e., Daily, Weekly, Monthly, etc.), when it should run next, and whether the job runs on weekends. I understand how to get all that information by doing
Get-SqlAgent -ServerInstance "$SERVER" | Get-SqlAgentJob $job | Get-SqlAgentJobSchedule | Format-List -Property *
This shows me all the relevant information I need:
Parent : Test_Job
ActiveEndDate : 12/31/9999 12:00:00 AM
ActiveEndTimeOfDay : 23:59:59
ActiveStartDate : 3/4/2020 12:00:00 AM
ActiveStartTimeOfDay : 00:00:00
DateCreated : 3/4/2020 2:08:00 PM
FrequencyInterval : 1
FrequencyRecurrenceFactor : 0
FrequencyRelativeIntervals : First
FrequencySubDayInterval : 2
FrequencySubDayTypes : Hour
FrequencyTypes : Daily
IsEnabled : True
JobCount : 1
I am looking at Microsoft's page on how to understand all the frequency information, but so far it seems like the only option is a bunch of nested if statements to determine how often it runs. I can do it this way, but I figured there has to be a cleaner way to get the information I need. This is how I am currently parsing the information:
if ($frequency -eq "DAILY")
{
    $MINUTES_LATEST_RUN_SUCCESS = "1500"
    # Code here to see how often it runs in a day
}
elseif ($frequency -eq "WEEKLY")
{
    $MINUTES_LATEST_RUN_SUCCESS = "11520"
    # Code here to see how many days a week it runs
}
elseif ($frequency -eq "MONTHLY")
{
    $MINUTES_LATEST_RUN_SUCCESS = "50400"
    # Code here to see which day of the month it runs
}
else
{
    $MINUTES_LATEST_RUN_SUCCESS = "1500"
}
I figured this can't be the best approach.
Anytime you get into many if/then statements, it's time to use a switch. There are sample code blocks for this idea in the PowerShell ISE (Ctrl+J) and VS Code (Ctrl+Alt+J).
$a = 5
switch ($a)
{
    1 {"The color is red."}
    2 {"The color is blue."}
    3 {"The color is green."}
    4 {"The color is yellow."}
    5 {"The color is orange."}
    6 {"The color is purple."}
    7 {"The color is pink."}
    8 {"The color is brown."}
    default {"The color could not be determined."}
}
Yours would look like this; of course, add your other code as needed in each block.
$frequency = 'DAILY'
switch ($frequency)
{
    DAILY {"The job is run $frequency"}
    WEEKLY {"The job is run $frequency"}
    MONTHLY {"The job is run $frequency"}
    default {$MINUTES_LATEST_RUN_SUCCESS = "1500"}
}
I'm a bit late to the party, but I was also looking for documentation on the frequency interval. There is no proper documentation on the PowerShell pages, but I came across this: https://learn.microsoft.com/en-us/sql/relational-databases/system-tables/dbo-sysschedules-transact-sql?view=sql-server-ver15
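Building on that link: the frequency fields map onto dbo.sysschedules semantics, so a switch can decode them directly. A minimal sketch (assuming the schedule object returned by Get-SqlAgentJobSchedule above; the bitmask and day-of-month interpretations come from the linked sysschedules docs, so verify them for your version):

$schedule = Get-SqlAgent -ServerInstance "$SERVER" | Get-SqlAgentJob $job | Get-SqlAgentJobSchedule

switch ($schedule.FrequencyTypes)
{
    # daily: sub-day fields say how often it repeats within the day
    'Daily'   { "Runs daily, every $($schedule.FrequencySubDayInterval) $($schedule.FrequencySubDayTypes)" }
    # weekly: FrequencyInterval is a weekday bitmask (1=Sunday ... 64=Saturday)
    'Weekly'  { "Runs weekly; FrequencyInterval $($schedule.FrequencyInterval) is a weekday bitmask" }
    # monthly: FrequencyInterval is the day of the month
    'Monthly' { "Runs monthly on day $($schedule.FrequencyInterval)" }
    default   { "Unhandled frequency type: $($schedule.FrequencyTypes)" }
}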

Eloquent: get AVG from all rows that have minimum timestamp

I want to get each user ID and its average score, taken from the minimum-timestamp row of each category. Here's the table structure:
Skill Table

id | user_id | category | score | timestamp
0  | 10      | a        | 11    | 12
1  | 10      | a        | 10    | 9
2  | 10      | b        | 12    | 10
3  | 10      | c        | 11    | 8
4  | 11      | a        | 8     | 9
5  | 11      | b        | 9     | 10
6  | 11      | c        | 10    | 8
7  | 11      | c        | 15    | 14
I want to get a result like this:

user_id | AVG(score)
10      | 11   (average of ids 1, 2, 3)
11      | 9    (average of ids 4, 5, 6)
For now I run a query in a loop for every user:
foreach ($userIds as $id) {
    // in some cases I need to get only specified IDs, not all of them
    foreach ($category as $cat) {
        // get the minimum timestamp's score for each category
        $rawCategory = Skill::where("user_id", $id)
            ->where("timestamp", "<=", $start)
            ->where("category", $cat->id)
            ->orderBy("timestamp", "desc")
            ->first();
        if ($rawCategory) {
            $skillCategory[$cat->cat_name] = $rawCategory->score;
        }
    }
    // get the average score
    $average = array_sum($skillCategory) / count($skillCategory);
}
I want to write a better Eloquent query that gets this data with good performance (under 60 seconds). Has anyone faced a similar problem and solved it? If so, can you please share a link? Thanks
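For what it's worth, one set-based approach is to compute the minimum timestamp per (user_id, category) once and join back, instead of querying per user. A sketch in raw SQL (assuming the table is named skills; in Laravel you could run it via DB::select()):

SELECT s.user_id, AVG(s.score) AS avg_score
FROM skills s
JOIN (
    -- earliest timestamp for each user/category pair
    SELECT user_id, category, MIN(timestamp) AS min_ts
    FROM skills
    GROUP BY user_id, category
) m
  ON m.user_id = s.user_id
 AND m.category = s.category
 AND m.min_ts = s.timestamp
GROUP BY s.user_id;

Against the sample table above this yields user 10 -> 11 and user 11 -> 9, matching the expected output.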

Google Pub/Sub to Dataflow, avoid duplicates with Record ID

I'm trying to build a streaming Dataflow job which reads events from Pub/Sub and writes them into BigQuery.
According to the documentation, Dataflow can detect duplicate messages delivery if a Record ID is used (see: https://cloud.google.com/dataflow/model/pubsub-io#using-record-ids)
But even using this Record ID, I still have some duplicates (around 0.0002%).
Did I miss something?
EDIT:
I use the Spotify async Pub/Sub client to publish messages with the following snippet:
Message
    .builder()
    .data(new String(Base64.encodeBase64(json.getBytes())))
    .attributes("myid", id, "mytimestamp", timestamp.toString)
    .build()
Then I use Spotify scio to read the messages from Pub/Sub and write them to BigQuery via Dataflow:
val input = sc.withName("ReadFromSubscription")
  .pubsubSubscription(subscriptionName, "myid", "mytimestamp")

input
  .withName("FixedWindow")
  .withFixedWindows(windowSize) // apply windowing logic
  .toWindowed // convert to WindowedSCollection
  //
  .withName("ParseJson")
  .map { wv =>
    wv.copy(value = TableRow(
      "message_id" -> (Json.parse(wv.value) \ "id").as[String],
      "message" -> wv.value)
    )
  }
  //
  .toSCollection // convert back to a normal SCollection
  //
  .withName("SaveToBigQuery")
  .saveAsBigQuery(bigQueryTable(opts), BQ_SCHEMA, WriteDisposition.WRITE_APPEND)
The window size is 1 minute.
After only a few seconds of injecting messages, I already have duplicates in BigQuery.
I use this query to count duplicates:
SELECT
COUNT(message_id) AS TOTAL,
COUNT(DISTINCT message_id) AS DISTINCT_TOTAL
FROM my_dataset.my_table
-- returns: TOTAL=273666, DISTINCT_TOTAL=273564
And this one to look at them:
SELECT *
FROM my_dataset.my_table
WHERE message_id IN (
SELECT message_id
FROM my_dataset.my_table
GROUP BY message_id
HAVING COUNT(*) > 1
) ORDER BY message_id
-- returns, for instance:
row | id | processed_at | processed_at_epoch | message
1 | 00166a5c-9143-3b9e-92c6-aab52601b0be | 2017-02-02 14:06:50 UTC | 1486044410367 | { ...json1... }
2 | 00166a5c-9143-3b9e-92c6-aab52601b0be | 2017-02-02 14:06:50 UTC | 1486044410368 | { ...json1... }
3 | 00354cc4-4794-3878-8762-f8784187c843 | 2017-02-02 13:59:33 UTC | 1486043973907 | { ...json2... }
4 | 00354cc4-4794-3878-8762-f8784187c843 | 2017-02-02 13:59:33 UTC | 1486043973741 | { ...json2... }
5 | 0047284e-0e89-3d57-b04d-ebe4c673cc1a | 2017-02-02 14:09:10 UTC | 1486044550489 | { ...json3... }
6 | 0047284e-0e89-3d57-b04d-ebe4c673cc1a | 2017-02-02 14:08:52 UTC | 1486044532680 | { ...json3... }
The BigQuery documentation states that there may be rare cases where duplicates arrive:
"BigQuery remembers this ID for at least one minute" -- if Dataflow takes more than one minute before retrying the insert BigQuery may allow the duplicate in. You may be able to look at the logs from the pipeline to determine if this is the case.
"In the rare instance of a Google datacenter losing connectivity unexpectedly, automatic deduplication may not be possible."
You may want to try the instructions for manually removing duplicates. This will also allow you to see the insertID that was used with each row to determine if the problem was on the Dataflow side (generating different insertIDs for the same record) or on the BigQuery side (failing to deduplicate rows based on their insertID).
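For reference, the manual de-duplication described in those instructions amounts to keeping one row per ID, along these lines (a sketch in BigQuery standard SQL, reusing message_id and my_dataset.my_table from the question; the official instructions key on the row's insertID):

SELECT * EXCEPT(row_num)
FROM (
    SELECT
        *,
        -- keep the first row seen for each message_id
        ROW_NUMBER() OVER (PARTITION BY message_id ORDER BY processed_at) AS row_num
    FROM `my_dataset.my_table`
)
WHERE row_num = 1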