Run Snowflake Task when a data share view is refreshed

Snowflake's documentation illustrates how to have a TASK run on a scheduled basis when there are inserts, updates, deletes, or other DML operations on a table, by creating a STREAM on that specific table.
Is there any way to have a TASK run if a view from an external Snowflake data share is refreshed, i.e. dropped and recreated?
As part of this proposed pipeline, we receive a one-time refresh of a view within a specific time window each day, and the goal is to start a downstream pipeline that runs at most once during that window, when the view is refreshed.
For example, with the following TASK schedule,
'USING CRON 0,10,20,30,40,50 8-12 * * MON,WED,FRI America/New_York', the downstream pipeline should run only once every Monday, Wednesday, and Friday between 8 AM and 12 PM Eastern.

Yes, I can point you to the documentation if you would like to see if this works for the tables you might already have set up:
Is there any way to have a TASK run if a view from an external
Snowflake data share is refreshed, i.e. dropped and recreated?
You could create a stored procedure to monitor the existence of the table. I have not tried that before, though, so I will see if I can ask an expert.
Separately, is there any way to guarantee that the task runs at most
once on a specific day or other time period?
Yes, you can use the optional CRON syntax in the SCHEDULE parameter to restrict a task to specific days of the week or times of day. An example (the task body here is only illustrative):
CREATE TASK delete_old_data
WAREHOUSE = deletion_wh
SCHEDULE = 'USING CRON 0 0 * * * UTC'  -- every day at midnight UTC
AS
DELETE FROM old_data WHERE created_at < DATEADD(day, -30, CURRENT_TIMESTAMP());  -- illustrative body; CREATE TASK requires an AS clause
Reference: https://docs.snowflake.net/manuals/user-guide/tasks.html more specifically: https://docs.snowflake.net/manuals/sql-reference/sql/create-task.html#optional-parameters

A TASK can only be triggered by a calendar schedule, either directly or indirectly via a predecessor TASK being run by a schedule.
Since the tasks are only run on a schedule, they will not run more often than the schedule says.
A TASK can't be triggered by a data share change, so you have to monitor it on a calendar schedule.
This limitation is bound to be lifted at some point, but it is valid as of December 2019.
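To make the schedule-driven approach concrete, here is a minimal sketch combining the two answers: a TASK on the question's CRON schedule calls a stored procedure that checks whether the shared view has been recreated since the last refresh we processed, runs the downstream work only if it has, and then records that refresh so later runs in the same window do nothing. All object names (pipeline_control, shared_db, shared_view, downstream_table, pipeline_wh) are placeholders, and it assumes the imported database exposes the view's CREATED timestamp through its INFORMATION_SCHEMA.

-- Control table tracking the last refresh of the shared view that has been processed.
CREATE TABLE IF NOT EXISTS pipeline_control (
    view_name       STRING,
    last_processed  TIMESTAMP_LTZ
);

CREATE OR REPLACE PROCEDURE run_if_view_refreshed()
RETURNS STRING
LANGUAGE JAVASCRIPT
AS
$$
  // Has the shared view been (re)created since the last refresh we processed?
  var rs = snowflake.execute({sqlText:
    "SELECT COALESCE(" +
    "  (SELECT MAX(created) FROM shared_db.information_schema.views" +
    "    WHERE table_schema = 'PUBLIC' AND table_name = 'SHARED_VIEW')" +
    "  > COALESCE((SELECT MAX(last_processed) FROM pipeline_control" +
    "               WHERE view_name = 'SHARED_VIEW'), '1970-01-01'::timestamp_ltz)," +
    "  FALSE)"});
  rs.next();
  if (!rs.getColumnValue(1)) {
    return 'Shared view not refreshed since last run - nothing to do';
  }

  // Downstream pipeline step(s) go here, e.g. repopulating a local table.
  snowflake.execute({sqlText:
    "CREATE OR REPLACE TABLE downstream_table AS SELECT * FROM shared_db.public.shared_view"});

  // Record the refresh so later runs in the same window are no-ops.
  snowflake.execute({sqlText:
    "INSERT INTO pipeline_control (view_name, last_processed)" +
    " SELECT 'SHARED_VIEW', MAX(created) FROM shared_db.information_schema.views" +
    "  WHERE table_schema = 'PUBLIC' AND table_name = 'SHARED_VIEW'"});
  return 'Downstream pipeline executed';
$$;

CREATE OR REPLACE TASK poll_shared_view
  WAREHOUSE = pipeline_wh
  SCHEDULE  = 'USING CRON 0,10,20,30,40,50 8-12 * * MON,WED,FRI America/New_York'
AS
  CALL run_if_view_refreshed();

ALTER TASK poll_shared_view RESUME;  -- tasks are created suspended

The task still fires on the calendar schedule; it is the guard inside the procedure that keeps the downstream pipeline to at most one run per window.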

Related

Run a BigQuery schedule every time a table is updated

So I have this schedule that gets data from some tables and aggregates them into a single table that I use as a source in a Data Studio dashboard. One of these tables (table1) is updated daily, sometimes more than once, and I wanted to know if there is a way to automatically run the schedule every time this table (table1) is updated.
Unfortunately, BigQuery scheduled queries only work with the run_time and run_date parameters, so if you want to stay entirely within BigQuery, the best you can do is schedule the query multiple times at different hours.
Additionally, triggers are not supported in BigQuery.
Instead, I recommend a Cloud Run service (or similar) that receives an event after any insert into the table and acts as an event trigger; that could do what you are looking for without any scheduled queries.

Can I automate an SSIS package that requires user input?

I've been developing a data pipeline in SSIS on an on-premise VM during my internship, and was tasked with gathering data from Marketo (re: https://www.marketo.com/ ). This package runs without error, starting with a Truncate table execute SQL task, followed by 5 data flow tasks that gather data from different sources within Marketo and moves them to staging tables within SQL Server, and concludes with an execute SQL task to load processing tables with only new data.
The problem I'm having: my project lead wants this process to be automated to run daily, and I have noticed tons of resources online that show automation of an SSIS package, but within my package, I have to have user input for the Marketo source. The Marketo source requires a user input of a time frame from which to gather data.
Is it possible to automate this package to run daily even with user input required? I was thinking there may be a way to increment the date value by one for the start and end dates (So start date could be 2018-07-01, and end date could be 2018-07-02, incrementing each day by one), to make the package run by itself. Thank you in advance for any help!
As you are automating your extract, this suggests that you have a predefined schedule on which to pull that data. From this schedule, you should be able to work out your start and end dates based on the date that the package was run.
In SSIS there are numerous ways to achieve this depending on the data source and your connection methods. If you are using a script task, you can simply calculate the dates required using your .Net code. Another alternative would be to use calculated variables that return the result of an expression, such as:
DATEADD("Month", -1, GETDATE())
Assuming you schedule your extract to run on the first day of the month, the expression above would return the first day of the previous month.
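For the daily Marketo pull described in the question, the same idea works with two hypothetical package variables, say StartDate and EndDate of type DateTime, whose values are set by expressions so that each run automatically covers the previous day:

StartDate:  DATEADD("day", -1, (DT_DATE)(DT_DBDATE)GETDATE())
EndDate:    (DT_DATE)(DT_DBDATE)GETDATE()

The double cast strips the time portion, so the window is always midnight to midnight; these variables would then be passed to the Marketo source in place of the manual input.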

Run a SQL Server job until it succeeds

I have a SQL Server job that has run for almost 2 years.
It's connecting to an unreliable Oracle database that keeps disconnecting, and it always fails because of that. When I run it again after 10 or 15 minutes, it works successfully. I'm getting tired of checking it every day...
Is there a way to make the job keep retrying the connection to that Oracle source until it succeeds, or to have another job that watches this job's status and, if it failed, reruns it until it succeeds?
A solution we are using is something like this:
Wrap your Oracle query in an SSIS package, and after the query, have the package update a SQL table that keeps either a history of executions, or just a single row that tracks the last time the job ran successfully. In short, if the Oracle query was successful, then put something in a table saying the query ran successfully today. If it was not successful, then don't put anything in the table for today.
Then at the beginning of the package, BEFORE the Oracle query, check to see if the query has been run successfully today. If it has already run successfully, then do nothing and exit the package. If it has not run successfully today, then go ahead and try to run it, following the post-query steps described above. If you have any other conditions about when the package should run (like "only after 10 am" or anything like that) you would include that logic here.
Finally, create a job that calls the package, and schedule it to run every 15 minutes, or however often you like. It will try every 15 minutes until it runs successfully, and after that it will stop doing anything until the next day.
As a bonus, you can use this same package and job to initiate all tasks that you want handled the same way. You just need to keep metadata about all these tasks in your history/metadata table.
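The same check-then-run-then-record logic, expressed here as a rough T-SQL sketch for illustration (the answer implements it with SSIS tasks; the table, column, and procedure names are hypothetical):

-- Has the extract already succeeded today? If so, do nothing.
IF NOT EXISTS (SELECT 1
               FROM dbo.JobRunHistory
               WHERE JobName = 'OracleExtract'
                 AND Succeeded = 1
                 AND CAST(RunTime AS date) = CAST(GETDATE() AS date))
BEGIN
    BEGIN TRY
        EXEC dbo.RunOracleExtract;   -- the wrapped Oracle query / package logic

        -- Only recorded when the extract did not fail.
        INSERT INTO dbo.JobRunHistory (JobName, RunTime, Succeeded)
        VALUES ('OracleExtract', GETDATE(), 1);
    END TRY
    BEGIN CATCH
        PRINT 'Extract failed; the next scheduled run will retry.';
    END CATCH
END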
An alternative is to create the job step and leave it unscheduled, and then create an SSIS job that acts as the master of all your jobs. It runs every minute, checking a config table for all job steps that have yet to succeed today, and executes any it finds using sp_start_job.
If they run successfully, it logs the stats to a log table, which prevents them from ever being launched again until the next day. This avoids all your jobs needing to be scheduled every 15 minutes or so; they launch as soon as possible, and you can add extra logic to handle dependencies, the number running in parallel, importance level, start time, latest start time, maximum number of retries, and so on.
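A rough sketch of what that master step might run every minute (JobConfig and JobRunLog are hypothetical tables; sp_start_job is asynchronous, so each job logs its own success):

DECLARE @job sysname;

DECLARE job_cursor CURSOR FOR
    SELECT c.JobName
    FROM dbo.JobConfig c
    WHERE NOT EXISTS (SELECT 1
                      FROM dbo.JobRunLog l
                      WHERE l.JobName = c.JobName
                        AND l.Succeeded = 1
                        AND CAST(l.RunTime AS date) = CAST(GETDATE() AS date));

OPEN job_cursor;
FETCH NEXT FROM job_cursor INTO @job;
WHILE @@FETCH_STATUS = 0
BEGIN
    EXEC msdb.dbo.sp_start_job @job_name = @job;  -- fire and forget
    FETCH NEXT FROM job_cursor INTO @job;
END
CLOSE job_cursor;
DEALLOCATE job_cursor;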

SQL Periodic table updates

Good Morning all
I wish to create a table which essentially acts like a view but only updates once a week. Not sure how to start this process...looking at scheduled tasks.
I was thinking essentially it would be a clearing function which clears the table at the start of the week and re-runs the initial SQL command to populate...But unsure and need some advice please.
The reason I am looking at using a table rather than a view is so I can set it up to update the tables periodically and create/show trend analysis over time.
Thanks in advance
You may also look at using an SSIS package and then scheduling it as a task.
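If you are on SQL Server, the core of it can be as simple as a stored procedure that clears and repopulates the table, scheduled once a week with SQL Server Agent (or an SSIS package, as suggested above); the table, column, and procedure names below are only placeholders:

CREATE OR ALTER PROCEDURE dbo.RefreshWeeklySnapshot
AS
BEGIN
    TRUNCATE TABLE dbo.WeeklySnapshot;            -- clear last week's contents

    INSERT INTO dbo.WeeklySnapshot (SnapshotDate, SomeKey, SomeMeasure)
    SELECT GETDATE(), SomeKey, SomeMeasure        -- the same query the view would run
    FROM dbo.SourceTable;
END

If the goal is trend analysis over time, you may prefer to skip the TRUNCATE and instead append a dated snapshot each week so that history accumulates.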

Check scheduled job status in SAP BODS

I am new to BODS. At present I have configured a job to execute every 2 minutes to pull transactions from a MySQL server and load them into HANA tables.
But sometimes, when the data volume in MySQL is too large to transform and load into HANA within 2 minutes, the job is still executing when the next iteration of the same job starts, which results in a BODS failure.
My question is: is there any option in BODS to check the execution status of the scheduled job between runs?
Please help me out with this.
You can create a control/audit table to keep a history of each run of the BODS job. The table should contain fields like ExtractionStart, ExtractionEnd, Status, etc. You then need to change the job so that it reads the status of the previous run from this table before starting the data flow that loads HANA. If the previous run has not finished, the job can raise an exception.
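A sketch of what such a control table and the pre-load check might look like (the names are illustrative; the check could be issued from a BODS script, e.g. with its sql() function):

-- History of each run of the BODS job.
CREATE TABLE JOB_RUN_AUDIT (
    JOB_NAME          VARCHAR(100),
    EXTRACTION_START  TIMESTAMP,
    EXTRACTION_END    TIMESTAMP,
    STATUS            VARCHAR(20)   -- e.g. 'RUNNING', 'SUCCEEDED', 'FAILED'
);

-- Run this before the load to HANA: if the count is greater than zero,
-- the previous run is still going, so raise an exception instead of loading.
SELECT COUNT(*) AS STILL_RUNNING
FROM JOB_RUN_AUDIT
WHERE JOB_NAME  = 'MYSQL_TO_HANA_LOAD'
  AND STATUS    = 'RUNNING';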
Let me know if this has been helpful or if you need more information.