Is it possible to set a schedule in Azure Data Factory to execute a pipeline at intervals?
For example, I would like a schedule that runs every hour from Monday to Friday between 9am and 5am.
At the moment I have the following, but I am not sure how to enter the execution times.
It should look something like that:
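One possible shape for such a schedule, sketched as a Python dict that serialises to a trigger definition. This assumes the intended window is 09:00 to 17:00 inclusive (reading the "5am" above as 5pm); the trigger name, start time and time zone are placeholders, and the property names should be checked against the Data Factory schedule trigger documentation before use.

```python
import json

# Assumption: run on the hour from 09:00 through 17:00 inclusive.
hours = list(range(9, 18))

trigger = {
    "name": "HourlyWeekdayTrigger",  # hypothetical name
    "properties": {
        "type": "ScheduleTrigger",
        "typeProperties": {
            "recurrence": {
                "frequency": "Week",
                "interval": 1,
                "startTime": "2024-01-01T00:00:00Z",  # placeholder
                "timeZone": "UTC",  # placeholder
                "schedule": {
                    "weekDays": ["Monday", "Tuesday", "Wednesday",
                                 "Thursday", "Friday"],
                    "hours": hours,
                    "minutes": [0],
                },
            }
        },
    },
}

print(json.dumps(trigger, indent=2))
```

The idea is that a weekly recurrence combined with explicit weekday, hour and minute lists expands to one run per listed hour on each listed day.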
I have a pipeline that ingests data from an API and loads it into an Azure database. The pipeline is called by a trigger. The load normally takes 6 to 7 hours, but sometimes, for some reason, the pipeline runs for more than 24 hours and is then executed again by the trigger the next day. So I want to stop the pipeline if it runs for more than 24 hours. I would appreciate any help.
In Azure Pipelines, setting a timeout on the agent job would achieve what you need. Each job has a timeout. If the job has not completed in the specified time, the server will cancel the job. It will attempt to signal the agent to stop, and it will mark the job as canceled: https://learn.microsoft.com/en-us/azure/devops/pipelines/process/runs?view=azure-devops#timeouts-and-disconnects
Set 1440 minutes for 24 hours.
I have a query that runs fine if I simply run it through the console or from code.
When I created a Scheduled Query for it, it would not run. The Scheduled Query is created successfully, and the interval I set (every 2 hours) is applied correctly, but no jobs are created (I can see in the Scheduled Query that the next run time is incremented by 2 hours every time it is supposed to run).
These are the properties when running query from Scheduled query:
Overwrite table, Processing location: US, Allow large results, Batch priority
If I do a Schedule Backfill, it creates 12 jobs, which fail with error messages similar to the following:
Exceeded CPU limit 125%
Exceeded memory
If I cancel all but one of the created jobs, the remaining one runs successfully. The Scheduled Query itself does not create any jobs.
I started the Scheduled Query at 12:00 and set it to repeat every 2 hours.
I assumed the first job would run at the start time, but apparently that is not the case. The Scheduled Query ran as intended from 14:00 onwards, then at 16:00, and so on.
The errors about maximum CPU/memory usage occurred because the query I wrote had an ORDER BY clause. Removing it cleared the issue.
Snowflake's documentation shows how to have a TASK run on a schedule when inserts, updates, deletions or other DML operations occur on a table, by creating a STREAM on that table.
Is there any way to have a TASK run if a view from an external Snowflake data share is refreshed, i.e. dropped and recreated?
As part of this proposed pipeline, we receive a one-time refresh of a view within a specific time period each day, and the goal is to start a downstream pipeline that runs at most once during that time period, when the view is refreshed.
For example, for the following TASK schedule
'USING CRON 0,10,20,30,40,50 8-12 * * MON,WED,FRI America/New_York', the downstream pipeline should only run once every Monday, Wednesday, and Friday between 8-12.
Yes, I can point you to the documentation if you would like to see if this works for the tables you might already have set up:
Is there any way to have a TASK run if a view from an external
Snowflake data share is refreshed, i.e. dropped and recreated?
You could create a stored procedure to monitor the existence of the table. I have not tried that myself, though; I will see if I can ask an expert.
Separately, is there any way to guarantee that the task runs at most
once on a specific day or other time period?
Yes, you can use the optional SCHEDULE parameter with a CRON expression to restrict the task to specific days of the week or times. An example:
CREATE TASK delete_old_data
WAREHOUSE = deletion_wh
SCHEDULE = 'USING CRON 0 0 * * * UTC';
Reference: https://docs.snowflake.net/manuals/user-guide/tasks.html more specifically: https://docs.snowflake.net/manuals/sql-reference/sql/create-task.html#optional-parameters
A TASK can only be triggered by a calendar schedule, either directly or indirectly via a predecessor TASK being run by a schedule.
Since the tasks are only run on a schedule, they will not run more often than the schedule says.
A TASK can't be triggered by a data share change, so you have to monitor it on a calendar schedule.
This limitation will probably be lifted at some point, but it holds as of December 2019.
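Since the cron schedule in the question fires every 10 minutes within the window, the "at most once per day" part needs a guard inside whatever the task calls. A minimal sketch of that guard logic in Python follows; the function and argument names are hypothetical, and how the view's creation time and the last run date are obtained and stored is left as an assumption.

```python
from datetime import date, datetime
from typing import Optional


def should_run_downstream(now: datetime,
                          view_created_at: datetime,
                          last_run_on: Optional[date]) -> bool:
    """True only if the view was recreated today and we have not run yet today."""
    today = now.date()
    refreshed_today = view_created_at.date() == today
    already_ran_today = last_run_on == today
    return refreshed_today and not already_ran_today


# Simulated polls on one day: the view is recreated at 08:25.
old_view = datetime(2019, 12, 1, 9, 0)   # created on an earlier day
new_view = datetime(2019, 12, 2, 8, 25)  # recreated today
last_run = None
fired = []
for hour, minute in [(8, 0), (8, 10), (8, 20), (8, 30), (8, 40)]:
    now = datetime(2019, 12, 2, hour, minute)
    created = new_view if now >= new_view else old_view
    if should_run_downstream(now, created, last_run):
        fired.append(now)
        last_run = now.date()  # remember that today's run happened
```

In this simulation only the 08:30 poll fires; later polls on the same day are skipped because the last run date already equals today.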
I've been given the task of scheduling 2 SSIS packages (by the way, I'm a junior data analyst just starting to get my feet wet with SSIS). Here is the scenario:
I have a package that needs to be scheduled to run weekly at 1pm every Friday (this sends out files to an outside vendor). I will call this the weekly package.
I have another package for the same vendor that needs to be scheduled to run on the first Friday of every month. I will call this the monthly package.
So I have scheduled the weekly package to run every Friday, BUT I need the weekly package not to run on the Friday that the monthly package runs. Any ideas would be greatly appreciated. Thank you.
Add an 'Execute SQL' task that runs:
IF (DATEPART(day,GETDATE()) BETWEEN 1 AND 7 AND DATEDIFF(day,0,GETDATE())%7 = 4) RAISERROR('Skip job on first Friday of month',16,1)
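To see why that condition picks out the first Friday: day 0 in T-SQL is 1900-01-01, which was a Monday, so a day count with remainder 4 modulo 7 is a Friday, and a Friday whose day of month is 1 to 7 is the first one. The same arithmetic as a small Python sketch (the function name is just for illustration):

```python
from datetime import date, timedelta


def is_first_friday(d: date) -> bool:
    """Mirror of the T-SQL check: day of month 1-7 and weekday is Friday."""
    # 1900-01-01 (T-SQL date 0) was a Monday, so remainder 4 means Friday.
    days_since_base = (d - date(1900, 1, 1)).days
    return 1 <= d.day <= 7 and days_since_base % 7 == 4


# Collect the dates in March 2024 that the check flags.
march_2024 = [date(2024, 3, 1) + timedelta(days=i) for i in range(31)]
flagged = [d for d in march_2024 if is_first_friday(d)]
```

For March 2024 the only flagged date is the 1st, which is the first Friday of that month.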
You could make it easy and schedule four jobs. Weeks 1, 2 and 3 would run the weekly package and the 4th would run the monthly package. SQL Server Agent makes it easy to do this.
I am facing an issue with SQL Agent: after the server date is changed, a job that is scheduled daily at a fixed time does not start at its scheduled time. No logs are found for this. The issue also occurs when the system date is changed back to the real date. I have to restart SQL Agent after that for the job to start at its scheduled time.
SQL Agent scheduled jobs will have last run date/time and next run date/time. Each time the jobs run, these values get updated. Please look in MSDB and you will see these details. You can also look in Job history.
When you manually reset the system clock to a date later than now, then your next run date/time will be in the past. As you have mentioned, bouncing the agent service should start the jobs again.
Raj