I am trying to run a SQL package on our SQL Server via a scheduled job at different times of the day with different parameters. The package imports property-specific information (our company has multiple properties) that becomes available after a certain time of day. The package accepts a property code parameter to identify which property's information should be imported.
If possible, I would like to re-use one package/job and set up multiple steps that execute at certain times of the day.
Is there a better way to set this up besides using multiple jobs that run the same package, each with its own parameters?
I would really appreciate some advice, thank you.
Multiple steps allow you to stop execution if a step fails, so the thing you do in step 10 will NOT run if step 9 fails.
If you care about that requirement, use a single job with multiple steps. If you don’t, use multiple jobs.
If you need to control the time of each step, then you must use multiple jobs.
Really, the answer is based on what you need to accomplish.
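If it helps, here is a rough sketch of the single-job-with-multiple-steps option, scripted with SMO (the SQL Server Management Objects assemblies). It assumes the package is invoked through dtexec from CmdExec steps and exposes a User::PropertyCode variable; the server name, package path, property codes and schedule are placeholders, and you could just as easily build the same job by hand in Management Studio.

using System;
using Microsoft.SqlServer.Management.Smo;
using Microsoft.SqlServer.Management.Smo.Agent;

class CreatePropertyImportJob
{
    static void Main()
    {
        // Connect to the instance that hosts SQL Server Agent (placeholder name).
        var server = new Server("MYSERVER");
        var job = new Job(server.JobServer, "Import property data");
        job.Create();
        job.ApplyToTargetServer(server.Name);

        // One step per property code; every step runs the same package with a different parameter.
        string[] propertyCodes = { "PROP01", "PROP02" };
        for (int i = 0; i < propertyCodes.Length; i++)
        {
            var step = new JobStep(job, "Import " + propertyCodes[i])
            {
                SubSystem = AgentSubSystem.CmdExec,
                // How the package is deployed/invoked is an assumption; adjust to your setup.
                Command = "dtexec /FILE \"C:\\Packages\\PropertyImport.dtsx\" " +
                          "/SET \\Package.Variables[User::PropertyCode].Properties[Value];" + propertyCodes[i],
                // Later steps will NOT run if this one fails.
                OnFailAction = StepCompletionAction.QuitWithFailure,
                OnSuccessAction = i == propertyCodes.Length - 1
                    ? StepCompletionAction.QuitWithSuccess
                    : StepCompletionAction.GoToNextStep
            };
            step.Create();
        }

        // One schedule for the whole job; individual steps cannot have their own start times,
        // which is why you need separate jobs if each property must run at a different hour.
        var schedule = new JobSchedule(job, "Daily property import")
        {
            FrequencyTypes = FrequencyTypes.Daily,
            FrequencyInterval = 1,
            ActiveStartTimeOfDay = new TimeSpan(6, 0, 0)
        };
        schedule.Create();
    }
}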
I have a dataset located in europe-west3, and I'm trying to set up scheduled queries on that dataset. However, when setting up the scheduled query, the "processing location" option doesn't include europe-west3. Leaving it as "default" makes the processing location US, and then the query is unable to run. There are only about 7 processing locations available; I tried both EU and europe-west2, but neither works.
I don't really know what to do to get my queries to run on a schedule. I can run the queries just fine normally, but when I try to schedule them, the processing location picker simply won't let me choose the correct location.
Any ideas?
Currently, scheduled queries do not support the europe-west3 region. Follow (star) this public issue tracker to stay updated.
Right now, if you need scheduled queries, you should create a replica of that dataset in a supported region and run them there. The dataset copy feature would be the obvious way to create that replica; unfortunately, it is not available for europe-west3 right now either.
I hope you can achieve what you desire without many headaches.
I need your ideas on how to develop an automated Work Task.
I want to create an automated Work Task by pulling data from SQL. I currently use a website (XXX) to submit Work Tasks, and separately I pull the data from SQL, then manually enter that data into the website to submit the Work Task. My idea is to make this a single process: whenever I pull the data, it should automatically be sent to the website and the Work Task submitted. Can anyone help me do that, or is it impossible? - Noobiest SQL
Use a console application. From there you can extract the data, format it in any way you want, and even automate the upload of that information using the .NET libraries.
Then, with the Windows Task Scheduler, you can tell it to run however often you need to.
For example, I have an application that reads a database, gets the info, and then executes a number of tasks using it. It's scheduled to run every 5 minutes.
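To make that concrete, here is a minimal sketch of such a console application, assuming a SQL Server source and a plain form post to the website; the connection string, query, URL and field names are all placeholders for whatever your real Work Task site expects (it may also need authentication).

using System;
using System.Collections.Generic;
using System.Data.SqlClient;
using System.Net.Http;
using System.Threading.Tasks;

class WorkTaskUploader
{
    static async Task Main()
    {
        var rows = new List<Dictionary<string, string>>();

        // 1. Pull the data from SQL (placeholder connection string and query).
        using (var conn = new SqlConnection("Server=.;Database=Ops;Integrated Security=true"))
        using (var cmd = new SqlCommand("SELECT TaskId, Description FROM dbo.PendingWorkTasks", conn))
        {
            conn.Open();
            using (var reader = cmd.ExecuteReader())
            {
                while (reader.Read())
                {
                    rows.Add(new Dictionary<string, string>
                    {
                        ["taskId"] = reader["TaskId"].ToString(),
                        ["description"] = reader["Description"].ToString()
                    });
                }
            }
        }

        // 2. Submit each row to the website (placeholder URL and form fields).
        using (var http = new HttpClient())
        {
            foreach (var row in rows)
            {
                var response = await http.PostAsync("https://example.com/worktask/submit",
                                                    new FormUrlEncodedContent(row));
                Console.WriteLine("Task " + row["taskId"] + ": " + (int)response.StatusCode);
            }
        }
    }
}

Compile it once, then point the Windows Task Scheduler at the resulting .exe with whatever interval you need.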
My team at work is currently looking for a replacement for a rather expensive ETL tool that, at this point, we are using as a glorified scheduler. Any of the integrations offered by the ETL tool we have since improved on with our own Python code, so I really just need its scheduling ability. One option we are looking at is Data Pipeline, which I am currently piloting.
My problem is thus: imagine we have two datasets to load - products and sales. Each of these datasets requires a number of steps to load (get source data, call a python script to transform, load to Redshift). However, product needs to be loaded before sales runs, as we need product cost, etc to calculate margin. Is it possible to have a "master" pipeline in Data Pipeline that calls products first, waits for its successful completion, and then calls sales? If so, how? I'm open to other product suggestions as well if Data Pipeline is not well-suited to this type of workflow. Appreciate the help
I think I can relate to this use case. Anyhow, Data Pipeline does not do this kind of dependency management on its own. It can, however, be simulated using file preconditions.
In this example, your child pipelines may depend on a file being present (as a precondition) before starting. A Master pipeline would create trigger files based on some logic executed in its activities. A child pipeline may create other trigger files that will start a subsequent pipeline downstream.
Another solution is to use the Simple Workflow product. That has the features you are looking for, but it would need custom coding using the Flow SDK.
This is a basic use case of Data Pipeline and should definitely be possible. You can use their graphical pipeline editor for creating this pipeline. Breaking down the problem:
There are two datasets:
Product
Sales
Steps to load these datasets:
Get source data: Say from S3. For this, use S3DataNode
Call a Python script to transform: Use ShellCommandActivity with staging. Data Pipeline does data staging implicitly for S3DataNodes attached to a ShellCommandActivity. You can access them through the special env variables provided: Details
Load output to Redshift: Use RedshiftDatabase
You will need to add the above components for each of the datasets you need to work with (product and sales in this case). For easy management, you can run these on an EC2 instance.
Condition: 'product' needs to be loaded before 'sales' runs
Add a dependsOn relationship: add this field on the ShellCommandActivity of Sales so that it refers to the ShellCommandActivity of Product. See the dependsOn field in the documentation. It says: 'One or more references to other Activities that must reach the FINISHED state before this activity will start'. A rough sketch of this follows below.
Tip: in most cases, you would not want the next day's execution to start while the previous day's execution is still active, i.e. RUNNING. To avoid such a scenario, use the 'maxActiveInstances' field and set it to '1'.
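To make the dependsOn relationship concrete, here is a rough sketch of the two ShellCommandActivity objects expressed through the AWS SDK for .NET (the graphical editor mentioned above produces an equivalent definition). The object ids, commands, the Ec2Instance resource and the pipeline id are invented for illustration, and the schedule, data nodes and Redshift pieces are omitted.

using System.Collections.Generic;
using System.Threading.Tasks;
using Amazon.DataPipeline;
using Amazon.DataPipeline.Model;

class PipelineDefinitionSketch
{
    static async Task Main()
    {
        var client = new AmazonDataPipelineClient();

        // Activity that loads the product dataset (names and command are placeholders).
        var productLoad = new PipelineObject
        {
            Id = "ProductLoad",
            Name = "ProductLoad",
            Fields = new List<Field>
            {
                new Field { Key = "type", StringValue = "ShellCommandActivity" },
                new Field { Key = "command", StringValue = "python load_product.py" },
                new Field { Key = "runsOn", RefValue = "Ec2Instance" }
            }
        };

        // Sales load: dependsOn makes it wait until ProductLoad reaches FINISHED.
        var salesLoad = new PipelineObject
        {
            Id = "SalesLoad",
            Name = "SalesLoad",
            Fields = new List<Field>
            {
                new Field { Key = "type", StringValue = "ShellCommandActivity" },
                new Field { Key = "command", StringValue = "python load_sales.py" },
                new Field { Key = "runsOn", RefValue = "Ec2Instance" },
                new Field { Key = "dependsOn", RefValue = "ProductLoad" }
            }
        };

        await client.PutPipelineDefinitionAsync(new PutPipelineDefinitionRequest
        {
            PipelineId = "df-EXAMPLE",   // id returned by CreatePipeline
            PipelineObjects = new List<PipelineObject> { productLoad, salesLoad }
        });
    }
}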
I have a SQL query (shown below) that I need to run on a regular basis:
db.Execute("UPDATE property_info SET IsActive=false WHERE ExpiryDate < @0", CurrentDate);
This query is intended to check ALL properties and see whether or not they are past their expiration date. If they are, it will automatically set the property to Inactive. Because "CurrentDate" is a rolling window, I want to re-run this query automatically, probably every day.
Is this something i should be using a stored procedure for?
Any suggestions on the best way to achieve this without any user interaction?
One simple way to achieve this would be to add that line of code to _PageStart.cshtml in the root of your project. This will make it execute every time any page on the site is requested. That is probably massively overkill for something that, by the looks of it, only needs to be checked once a day or so. To alleviate this, you could keep a simple DateTime stamp in the Application collection to make sure it only runs a maximum of once a day or so (or tune the interval as appropriate for your needs). This is in no way a solution for fully scheduled code execution, but it may well serve your purposes (and your budget).
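A rough sketch of what that could look like in _PageStart.cshtml, assuming the Web Pages Database helper and a connection string named "PropertyDb"; the key name and the 24-hour interval are just examples, and a race between two simultaneous requests would only mean the statement runs twice, which is harmless here.

@{
    // Runs before every page request; only fire the cleanup at most once a day.
    var lastRun = HttpContext.Current.Application["LastExpiryCheck"] as DateTime?;
    if (lastRun == null || (DateTime.Now - lastRun.Value).TotalHours >= 24)
    {
        var db = Database.Open("PropertyDb");   // connection string name is a placeholder
        db.Execute("UPDATE property_info SET IsActive=false WHERE ExpiryDate < @0", DateTime.Now);
        HttpContext.Current.Application["LastExpiryCheck"] = DateTime.Now;
    }
}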
Okay, I'll try to explain as well as I can... Quite a particular case.
Tools: SSIS 2008
We have a control flow that now needs to be triggered by an event: the presence of one or multiple files (1, 2 or 3).
The variables used:
BO_FileLocation_1
BO_FileLocation_2
BO_FileLocation_3
BO_FileName_1
BO_FileName_2
BO_FileName_3
There can be one, two or three files, defined in the variables above. When they are filled in, they should be processed. When they are empty, this means there is just one file; the process should ignore them and jump to the next (file watcher?) task.
For example:
BO_FileLocation_1= "C:\"
BO_FileLocation_2 NULL
BO_FileLocation_3 NULL
BO_FileName_1= "test.csv"
BO_FileName_2 NULL
BO_FileName_3 NULL
The report only needs one file.
I'd need a generic approach that checks for the presence of these files; it may need to be more generic than my SSIS knowledge can handle right now. It would be handy, for example, if there's a 4th file in the future. I was also thinking of working with a single script to handle all the logic.
Thanks in advance
If all you want is to trigger the Copy Source File task when one or more of the files is present, just use the OR constraint in your flow. Here is how:
First, connect all of them to the destination.
Then click one of the green arrows; this will make its properties window pop up. Select the Logical OR instead of the Logical AND.
If everything went well, you should now see the connections as dashed lines.
There are several possible solutions:
1. Create a sequence container and include all the file imports in the sequence container. Add int variables for RowCountFile1, RowCountFile2, and RowCountFile3 and set the value to 0 (this is the default value when you create an int variable). Add a Row Count transformation to each of the data flows. Create a precedence constraint from the sequence container to the "Do something" task, and set the precedence constraint to evaluate both success and an expression. Set the expression value to @RowCountFile1 > 0 || @RowCountFile2 > 0 || @RowCountFile3 > 0. The advantage of this approach is that you can take an action as soon as the files are detected, you import all available files, and you only take an action after all the files have been imported. You could then schedule running this SSIS package as a SQL Server Agent job step and run it as frequently as you want.
2. A variant on solution 1 is to use Foreach Loop containers with file enumerators inside the sequence container. This would be useful if you don't know the exact name of the file and you expect to import more than one under some circumstances. For instance, if you get a file every few minutes with a timestamp in its file name and your process doesn't run for some reason, then you may have to process multiple files to get caught up and then take an action once that has been done.
3. You could use the File Watcher task as you outlined in your question. The only problem I have with the File Watcher task is that the package has to be in a constantly running state. This makes it hard to troubleshoot problems and performance. It can also introduce other problems; I remember having some issues with the File Watcher task years ago when it first came out. It may well be a totally stable task now, but I prefer other methods after having been burned previously. If you really want the package to run continuously instead of having it be called by a job, then you could always use a Script Task to check for the file, sleep the thread if it is not found, check again, and so on (a rough sketch of such a loop is at the end of this answer). I'm sure that's what the File Watcher task does, but I would trust my own C# over the task. Power to anyone who has had better experiences with File Watcher than I have...
4. Use PowerShell. If you just want to take an action when a file appears and you aren't importing the data, then a PowerShell script could do this just as well as an SSIS package. The drawback is that you have to learn some basic PowerShell, it may be hard to maintain in the future since PowerShell is probably not your bread-and-butter language, and you may have to rewrite the code as an SSIS package if you later want to import the data. You would probably call the PowerShell script from a SQL Server Agent job step, so scheduling can be handled pretty easily.
There are more options than what I listed, so let me know if you still want more suggestions.
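For what it's worth, here is the kind of loop solution 3 refers to: a minimal C# sketch (usable inside an SSIS Script Task or a small console application) that polls for a file, sleeps between checks, and gives up after a timeout. The path and timings are placeholders; in the SSIS case the path would come from the BO_FileLocation/BO_FileName variables.

using System;
using System.IO;
using System.Threading;

class FilePoller
{
    // Waits for a file to appear, checking every pollSeconds, for at most timeout. Returns true if found.
    static bool WaitForFile(string path, int pollSeconds, TimeSpan timeout)
    {
        var deadline = DateTime.UtcNow + timeout;
        while (DateTime.UtcNow < deadline)
        {
            if (File.Exists(path))
                return true;
            Thread.Sleep(TimeSpan.FromSeconds(pollSeconds));
        }
        return false;
    }

    static void Main()
    {
        bool found = WaitForFile(@"C:\test.csv", 30, TimeSpan.FromHours(2));
        Console.WriteLine(found ? "File arrived." : "Timed out waiting for the file.");
    }
}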