Setting up an alert for Long Running Pipelines in ADF v2 using Kusto Query - azure-log-analytics

I have a pipeline in ADF V2 which generally takes 3 hours to run, but sometimes it takes more than 3 hours. I want to set up an alert, using Azure Log Analytics (a Kusto query), for when the pipeline has been running for more than 3 hours. I have written a query, but it only returns results once the pipeline has succeeded or failed. I want an alert while the pipeline is still in progress and has already taken more than 3 hours.
My query is
ADFPipelineRun
| where PipelineName == "XYZ"
| where (End - Start) > 3h
| project information = 'Expected Time : 3 Hours, Pipeline took more than 3 hours', PipelineName, (End - Start)
Could you please help me to solve this issue?
Thanks in Advance.
Lalit

Update:
Please change your query as below:
ADFPipelineRun
| where PipelineName == "pipeline11"
| top 1 by TimeGenerated
| where Status in ("Queued","InProgress")
| where (now() - Start) > 3h //please change the time to 3h in your case
| project information = 'Expected Time : 3 Hours, Pipeline took more than 3 hours', PipelineName, (now() - Start)
Explanation:
A pipeline run has one of these statuses: Succeeded, Failed, Queued, InProgress. If the pipeline is currently running and not completed, its status must be one of the two: Queued or InProgress.
So we just need to get the latest record by using top 1 by TimeGenerated, then check whether its status is Queued or InProgress (in the query, that's where Status in ("Queued","InProgress")).
Finally, we just need to check whether it has been running for more than 3 hours by using where (now() - Start) > 3h.
I tested it myself and it works. Please let me know if you still have issues.
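To make the filtering logic above concrete, here is a minimal Python sketch of the same three steps (latest record, status check, duration check), assuming run records shaped like the ADFPipelineRun columns used in the query:

```python
from datetime import datetime, timedelta

def long_running_alert(runs, now, threshold=timedelta(hours=3)):
    """Mimic the KQL: take the latest record for the pipeline, keep it
    only if it is still Queued/InProgress and has run past the threshold."""
    if not runs:
        return None
    latest = max(runs, key=lambda r: r["TimeGenerated"])   # top 1 by TimeGenerated
    if latest["Status"] not in ("Queued", "InProgress"):   # completed runs never alert
        return None
    if now - latest["Start"] <= threshold:                 # not yet over the limit
        return None
    return {"PipelineName": latest["PipelineName"],
            "RunningFor": now - latest["Start"]}
```

This is only an illustration of the query's logic; the real alert is evaluated by Log Analytics against the ADFPipelineRun table.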

Related

How can I find last time an azure function was executed in log analytics workspace?

Trying to find the last execution time for all my functions in log analytics
I wrote this simple query to start
AppRequests
| distinct OperationName
| take 10
But how can I get the last execution time? I tried adding TimeGenerated as well, which would also be needed.
I would like the final result to be:
OperationName, LastExecutionTime
Function1 2022-01-01
Function2 2021-05-05
And so on
Use the summarize operator:
AppRequests
| summarize LastExecutionTime = max(TimeGenerated) by OperationName
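The aggregation above (keep the most recent timestamp per operation) can be sketched in plain Python, assuming request records with the two columns used in the query:

```python
def last_execution_times(requests):
    """Equivalent of `summarize LastExecutionTime = max(TimeGenerated)
    by OperationName`: keep the newest timestamp seen per operation."""
    latest = {}
    for r in requests:
        name, ts = r["OperationName"], r["TimeGenerated"]
        if name not in latest or ts > latest[name]:
            latest[name] = ts
    return latest
```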

Is there a Log Analytics query to get the ADF pipeline details that are running more than 24 hours?

I tried the below query to get the pipelines that have been in progress for more than 1 day;
however, it retrieved results for pipelines that were at some point in progress during the past 24 hours.
ADFActivityRun
| where TimeGenerated > ago(1d)
| where Status contains "progress"
| extend dataFactory=split(ResourceId, '/')[-1]
| project TimeGenerated, dataFactory, OperationName,Status, PipelineName
| summarize count() by PipelineName, tostring(dataFactory), Status,TimeGenerated
My requirement is to get only those pipeline results that are running more than 24 hours.
Could anyone please let me know if this is even possible?
Thanks!
The below query may work for you.
ADFPipelineRun
| where TimeGenerated > ago(1d)
| where Status == 'InProgress'
| where RunId !in (( ADFPipelineRun | where Status in ("Succeeded","Failed","Cancelled") | project RunId ))
| where datetime_diff('hour',now(),Start) > 24
| extend dataFactory=split(ResourceId, '/')[-1]
| project TimeGenerated, dataFactory, OperationName,Status, PipelineName
| summarize count() by PipelineName, tostring(dataFactory), Status,TimeGenerated
It will give the pipelines which are InProgress and still not completed after 24 hours.
As I don't have any pipelines that have been running for more than 24 hours, it did not display any details for me. When I lowered the threshold (execution time of more than 1 second) and tested against pipelines that were InProgress for some time before failing, the query did return results.
You can try the above query to get details of pipelines which have been running for more than 24 hours and are still running.
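The key trick in the query is the anti-join: a RunId that later appears with a terminal status is excluded, so only genuinely still-running runs survive. A minimal Python sketch of that logic, assuming records with the RunId, Status, and Start columns used above:

```python
from datetime import datetime, timedelta

def still_running_over(runs, now, hours=24):
    """Mimic `RunId !in (terminal RunIds)` plus the duration check:
    drop any RunId that also appears with a terminal status, then
    keep in-progress runs older than `hours`."""
    finished = {r["RunId"] for r in runs
                if r["Status"] in ("Succeeded", "Failed", "Cancelled")}
    return [r for r in runs
            if r["Status"] == "InProgress"
            and r["RunId"] not in finished
            and now - r["Start"] > timedelta(hours=hours)]
```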
Reference:
https://www.techtalkcorner.com/long-running-azure-data-factory-pipelines/
I'd recommend using summarize arg_max(...) by ... to find the latest state of every ADF pipeline run.

Scheduling of jobs through SQL Server stored procedure

I have to write a stored procedures for scheduling the Azure pipelines (Jobs).
Frequency ----Number of times batch needs to run in a day
Timing column will have entry for batch start time
Table A will have static entries for batches. Frequency denotes how many times per day a job will run, and the Timing column holds the batch run times separated by commas (,).
Batch_ID Batch_Name Frequency Timing
-----------------------------------------------
1 ABC 2 7:00,13:00
Table B will have a listing of jobs corresponding to one particular batch. This table will be static and have one-time entries like Table A.
Table B
Batch_ID JOB_ID JOB_NM
--------------------------------
1 1 Job_1
1 2 Job_2
Table C will contain the dependencies of the jobs in a batch
Table C
Batch_ID JOB_ID DEPENDENY_JOB_ID
----------------------------------------
1 1
1 2 1
When Batch executes, table D will be populated with batch start time.
Table D
Batch_ID Batch_Name Status start_Time end_time
-------------------------------------------------------
1 abc Start 7:00
As soon as Table D is populated, Table E will be populated with the job details. Job 2 will start only when Job 1 finishes.
Table E
Batch_ID Batch_Name JOB_ID JOB_NM Start_Time End_Time
----------------------------------------------------------------------
1 abc 1 Job_1 7:00
1 abc 2 Job_2 7:15
When Job 2 completes then we will update the Table D end time column.
Once the first run is completed, we need to check the Frequency column of Table A and, if it's more than 1, run the job again and do the entire exercise again.
In case the 1st batch doesn't complete before the start time of batch 2, we have to hold the 2nd batch until batch 1 is completed.
Could anyone help me with how to start this?
As Gordon Linoff said, you are lacking an actual question in your "question".
If I can give an opinion on this, I don't think it's a good design idea to split your logic between Data Factory and stored procedures in a database. Be mindful that in the future, the user maintaining the pipelines may not have access to the database and will not be able to understand half of it. Even if YOU are the one maintaining this, two years from now chances are you will have forgotten what you did, and following the logic across two resources may take you more time than it should. It will also make troubleshooting harder.
It really depends on the scenario you are working on, but to sum it up: try to keep everything logic-related in one place.
Hope this helped!
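Wherever the logic ends up living, the core of the design is the dependency rule from Table C: a job becomes runnable only once all of its prerequisite jobs are done. A small Python sketch of that ordering (purely illustrative; the table names and shapes follow the question):

```python
def run_order(jobs, deps):
    """Derive an execution order for one batch from Table C-style data:
    `deps` maps job_id -> set of prerequisite job_ids (empty when none).
    A job becomes runnable once all of its prerequisites are done."""
    done, order = set(), []
    pending = set(jobs)
    while pending:
        runnable = [j for j in pending if deps.get(j, set()) <= done]
        if not runnable:
            raise ValueError("circular dependency in Table C")
        for j in sorted(runnable):   # deterministic order among peers
            order.append(j)
            done.add(j)
            pending.remove(j)
    return order
```

For the sample data (Job 2 depends on Job 1) this yields the order 1 then 2, matching the requirement that Job 2 starts only when Job 1 finishes.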

Using SELECT ... FOR UPDATE to poll for a value change

I have a table that contains tasks and their status, akin to:
| task_id | task_status |
+---------+-------------+
| 71 | 1 |
| 85 | 3 |
| 110 | 2 |
Let's call the table TASKS.
Status is an enumerated value, for example:
1 = SCHEDULED
2 = RUNNING
3 = DONE
I need to poll this status to inform the user about a task he started. Currently, I'm just polling it on the server using a while loop, like this pseudocode:
status = old_status
while (timeout_not_expired and status == old_status) {
    status = get_status("SELECT task_status FROM TASKS WHERE task_id=%1", task_id)
    wait(check_interval)
}
return status
That's nasty: not only does it spam the Oracle server, it also spams our log of SQL queries.
So I did a bit of googling and found about SELECT ... FOR UPDATE. I tried to run this statement:
SELECT
task_status
FROM TASKS
WHERE task_id = 361
FOR UPDATE OF task_status
But it returns immediately. So the question:
Is this even what FOR UPDATE is for?
If yes, how do I get it to wait on the row with a timeout?
No, that isn't what that clause is for. From the documentation:
The FOR UPDATE clause lets you lock the selected rows so that other users cannot lock or update the rows until you end your transaction.
Your query selects the current status for that task and locks the row, essentially on the assumption that you plan to update it, and don't want anyone else to be able to change it between your select and subsequent update.
So after you perform that query, no-one else can update the status of that task until you commit or rollback - kind of the opposite of what you're trying to achieve.
You could look at alert or queueing mechanisms, but you might want to investigate continuous query notification, though it could be overkill for this.
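Until a push-style mechanism like continuous query notification is in place, the polling loop itself can at least be made gentler. A sketch in Python (the names `get_status`, timeout, and intervals are illustrative, not from the original code) that adds a hard deadline and exponential backoff to cut query spam:

```python
import time

def wait_for_status_change(get_status, old_status, timeout=60.0,
                           interval=0.5, max_interval=8.0):
    """Poll `get_status()` until it differs from `old_status` or the
    timeout expires; back off exponentially to reduce query volume."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        status = get_status()
        if status != old_status:
            return status
        # never sleep past the deadline
        time.sleep(min(interval, max(0.0, deadline - time.monotonic())))
        interval = min(interval * 2, max_interval)
    return old_status
```

This does not remove the polling, only its cost; an alert/queue mechanism remains the cleaner fix.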

SQL Access - Transpose Multiple results into one row

I am really struggling with the following problem as I am fairly new to SQL.
Problem: The projtask table has multiple tasks for 1 project. I need to transpose results so that I show all the statuses (e.g. task 150, 130, 110, 70 status) for every task on a single result row against 1 project.
At the moment I am coming back with multiple result rows against 1 project due to the number of tasks associated with that project. I hope this makes sense. If not please probe. Thanks, all the help would be appreciated :)
Ultimately I want the result to look like:
Project X - Task 10 - Status C - Task 130 - Status A - Task 150 - Status C
Project Y - Task 10 - Status A - Task 130 - Status C - Task 150 - Status A
Project Z - Task 10 - Status C - Task 130 - Status C - Task 150 - Status C
SELECT IIf(dbo_projtask.[task-num]=150 And dbo_projtask.stat='C','Released') AS 150_status, dbo_projtask.[proj-num],
IIf(dbo_projtask.[task-num]=130 And dbo_projtask.stat='A','Active') AS 130_status
FROM dbo_projtask
GROUP BY IIf(dbo_projtask.[task-num]=150 And dbo_projtask.stat='C','Released'), dbo_projtask.[proj-num],
IIf(dbo_projtask.[task-num]=130 And dbo_projtask.stat='A','Active');
Not sure if I understand correctly, but trying to create columns for all tasks under a project doesn't sound like a scalable solution. Why not create a resultset with ProjectID, TaskID and StatusID and do any processing/modifications client-side? Relational databases tend not to like ragged/dynamic columns all that much. If you are absolutely set on the proposed structure, you'd need to build a dynamic query that uses a pivot construction of sorts, but I have my doubts whether it will work if you have a flexible number of tasks per project.
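The client-side reshaping suggested above is straightforward: keep the query as flat (project, task, status) rows and pivot them in application code. A minimal Python sketch (column naming is illustrative):

```python
def pivot_statuses(rows):
    """Turn flat (project, task, status) rows into one record per
    project, with one key per task number -- the shape the asker wants."""
    projects = {}
    for proj, task, stat in rows:
        projects.setdefault(proj, {})[f"task_{task}"] = stat
    return projects
```

This handles any number of tasks per project without dynamic SQL, which is the main weakness of the pivot-in-SQL approach.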