In a media management system, my task is to create workflow automation. Currently, I have built it using SQL Server triggers, with the UI in ASP.NET and jQuery.
For example:
When a new file enters the system, the trigger fires and updates the metadata table in the database with some data for that file.
Millions of assets pass through the system. Is it ideal to have triggers do this processing?
Is there a better way to create this automation?
Is there a "best practice" to do this kind of works?
I'm having the same issue, and data enters my central asset database in several ways (which may differ from client to client).
So I also want to create an easily customizable workflow in the data layer (no other dependencies).
As other people have mentioned, triggers may affect the parent activity.
That can be overcome by having the trigger write the action that should be performed to a queue table instead of executing it directly.
Example trigger condition: Hardware.Status = 'Issue Work Order'
INSERT INTO Queue (Created, Task, Completed) VALUES (GETUTCDATE(), 'EXEC dbo.IssueWorkOrder 123', 0);
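For illustration, here is a rough sketch of what such a trigger could look like; the table, column, and procedure names are assumptions, not a definitive implementation:

CREATE TRIGGER trg_Hardware_QueueWorkOrder
ON dbo.Hardware
AFTER UPDATE
AS
BEGIN
    SET NOCOUNT ON;
    -- Queue the follow-up work instead of doing it inside the trigger.
    INSERT INTO dbo.Queue (Created, Task, Completed)
    SELECT GETUTCDATE(),
           'EXEC dbo.IssueWorkOrder ' + CAST(i.HardwareId AS VARCHAR(20)),
           0
    FROM inserted AS i
    JOIN deleted  AS d ON d.HardwareId = i.HardwareId
    WHERE i.Status = 'Issue Work Order'
      AND d.Status <> 'Issue Work Order';  -- fire only when the status actually changes
END;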
Inserting a record into your queue table reduces the problems highlighted in the other comments.
Then you build a scheduling tool (Hangfire, SQL Server Agent jobs, or whatever) that executes the tasks in the queue in the order they were added.
Now, of course in practice it's not as simple as that. You will have to address the following:
1. What if a step fails?
2. Dependencies on previous steps having to complete first.
3. Multiple operators changing a record (the delay between the queued job step being executed and another person updating the same record).
I guess #2 and #3 are issues with any workflow engine or pipeline. To address them, a locking mechanism must be put in place.
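As a sketch of one such locking mechanism (assuming the Queue table above plus an identity column QueueId, which is an assumption on my part), a worker can claim the oldest unprocessed row with UPDLOCK/READPAST so that two workers never pick up the same task:

DECLARE @QueueId INT, @Task NVARCHAR(MAX);

BEGIN TRANSACTION;

-- READPAST skips rows another worker has already locked;
-- UPDLOCK keeps the claimed row locked until the transaction commits.
SELECT TOP (1) @QueueId = QueueId, @Task = Task
FROM dbo.Queue WITH (UPDLOCK, READPAST, ROWLOCK)
WHERE Completed = 0
ORDER BY Created;  -- process tasks in the order they were added

IF @QueueId IS NOT NULL
BEGIN
    EXEC (@Task);  -- in real code, wrap this in TRY/CATCH and record failures
    UPDATE dbo.Queue SET Completed = 1 WHERE QueueId = @QueueId;
END

COMMIT TRANSACTION;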
I am trying to create a dashboard app for my company that displays data from a few different sources that they use. I am starting with an in-house system that stores data in MSSQL. I'm struggling to decide how I can display real-time (or at least regularly updated) data based on this database.
I was thinking of writing a Node server to poll the company database, check for updates, and store a copy of the relevant tables in my own database. Then I would create another Node server that computes metrics (average delivery time, turnover, etc.) from my database, and a frontend (probably React) to display these metrics nicely and trigger the backend logic whenever the page is loaded by a user.
This is my first project, so I just need some guidance on whether this is the right way to go about it or whether I'm overcomplicating it.
Thanks
One solution is to implement a cron job in Node.js (or on your frontend side) that periodically retrieves the new data inserted into your database.
You can refer to this link for more information about the cron package:
https://www.npmjs.com/package/cron
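If you go the polling route, the query the cron job runs against MSSQL can be a simple watermark query; the table and column names below are assumptions for illustration only:

-- Fetch only rows added since the last poll, using an ever-increasing
-- key (an IDENTITY column here) as the watermark.
DECLARE @LastSeenId BIGINT =
    (SELECT LastSeenId FROM dbo.SyncState WHERE SourceTable = 'Deliveries');

SELECT DeliveryId, CustomerId, DeliveredAt, Amount
FROM dbo.Deliveries
WHERE DeliveryId > @LastSeenId
ORDER BY DeliveryId;

-- After copying the rows, advance the watermark.
UPDATE dbo.SyncState
SET LastSeenId = COALESCE((SELECT MAX(DeliveryId) FROM dbo.Deliveries), LastSeenId)
WHERE SourceTable = 'Deliveries';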
If you are using MySQL, you can use the mysql-events listener; it watches a MySQL database and runs callbacks on matched events.
https://www.npmjs.com/package/mysql-events
I'm new on Stack Overflow, even though I have already solved a lot of problems with your hints. Now I have a problem for which I have not found a solution.
I'm developing a push service using WSO2 CEP and GCM. CEP handles the subscribe/unsubscribe requests and the push events. The subscription keys are stored on my own server in MySQL, together with other info.
My problem comes with the subscribe step. This step has to handle both new subscriptions (insert) and existing subscriptions (update). To make the operation easier, I decided to normalize the two operations by deleting and then inserting the record (even if the record might already be in the DB).
To handle this, I developed an execution plan using Siddhi. The plan defines 2 streams: an event stream and a table stream linked to a MySQL table.
In the execution plan, a delete is first done using the key taken from the event, and afterwards a new record is inserted using the info contained in the event.
But it seems that the order of the operations (delete and insert) is not guaranteed, so sometimes I find two or more records with the same GCM key on my server. I applied a workaround by adding a unique constraint on the table, but I'd like to know if there is a way to enforce a deterministic order on the Siddhi operations.
Regards
Michele de Rosa
Since you are using the same stream to update and insert into the table, there is no guarantee that the delete query will execute first. All queries receiving events from the same stream execute in parallel, and we do not have any control over the order. The only way to enforce an order is either to introduce a query pipeline or to use a pattern query to delay events.
However, for your requirement you can use the newly added insert/overwrite functionality in event tables. This will automatically handle your requirement of updating if the record exists and inserting otherwise.
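(For comparison only, and not Siddhi syntax: the "update if it exists, otherwise insert" semantics described above is a plain upsert. In SQL terms, with assumed table and column names, it looks roughly like this:)

DECLARE @GcmKey VARCHAR(200) = 'example-key',
        @UserInfo VARCHAR(200) = 'example-info';

-- Update the row if the GCM key already exists, insert it otherwise.
MERGE dbo.Subscription AS target
USING (SELECT @GcmKey AS GcmKey, @UserInfo AS UserInfo) AS source
    ON target.GcmKey = source.GcmKey
WHEN MATCHED THEN
    UPDATE SET UserInfo = source.UserInfo
WHEN NOT MATCHED THEN
    INSERT (GcmKey, UserInfo) VALUES (source.GcmKey, source.UserInfo);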
Hope this helps!!
Thanks
Tishan
I'm working on a project where we are thinking of using SQLCacheDependency with SQL Server 2005/2008 and we are wondering how this will affect the performance of the system.
So we are wondering about the following questions:
Can the number of SQLCacheDependency objects (query notifications) have a negative effect on SQL Server performance, i.e. on insert, update, and delete operations on the affected tables?
What effect (performance-wise) would, for example, 50,000 different query notifications on a single table have in SQL Server 2005/2008 on insertions and deletions on that table?
Are there any recommendations on how to use SQLCacheDependencies? Any official do's and don'ts? We have found some information on the internet but haven't found anything on the performance implications.
If there is anyone here that has some answers to these questions that would be great.
SQL cache dependency using the polling mechanism should not put a significant load on the SQL Server or the application server.
Let's look at the steps required for SqlCacheDependency to work and analyze them:
The database is enabled for SqlCacheDependency.
A table, say 'Employee', is enabled for SqlCacheDependency (this can be any number of tables).
Web.config is updated to enable SqlCacheDependency.
The page where you are using SQL cache dependency is configured.
That's it.
Internally:
Step 1 creates a table 'ASPnet_sqlcachetablesforchangenotification' in the database, which will store the name of the 'Employee' table for which SqlCacheDependency is enabled, and adds some stored procedures as well.
Step 2 inserts an 'Employee' entry into the 'ASPnet_sqlcachetablesforchangenotification' table. It also creates an insert/update/delete trigger on the 'Employee' table.
Step 3 enables the application for SqlCacheDependency by providing the connection string and poll time.
Whenever there is a change in the 'Employee' table, the trigger fires, which in turn updates the 'ASPnet_sqlcachetablesforchangenotification' table.
Now the application polls the database, say every 5000 ms, and checks for any changes to the 'ASPnet_sqlcachetablesforchangenotification' table. If there are any changes, the respective cache entries are removed from memory.
You get the benefit of caching combined with freshness of data (at most, the data can be 5 seconds stale). The polling is taken care of by a background process and should not be a performance hurdle, because, as you can see from the list above, the tasks are not CPU-demanding.
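To make the mechanism concrete, here is a simplified sketch of what the generated trigger and the poll boil down to. The real objects created by aspnet_regsql differ in detail, so treat this as an illustration rather than the actual generated code:

CREATE TRIGGER trg_Employee_SqlCacheNotification
ON dbo.Employee
AFTER INSERT, UPDATE, DELETE
AS
BEGIN
    SET NOCOUNT ON;
    -- Bump the change counter for this table.
    UPDATE dbo.ASPnet_sqlcachetablesforchangenotification
    SET changeId = changeId + 1
    WHERE tableName = 'Employee';
END;
GO

-- What the application's poll (every pollTime milliseconds) effectively asks:
SELECT tableName, changeId
FROM dbo.ASPnet_sqlcachetablesforchangenotification;
-- If changeId differs from the value remembered at the previous poll,
-- the cache entries that depend on that table are removed.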
SQLCacheDependency is implemented like an indexed view, and every time the table is modified this view's index gets changed. So many views (SQLCacheDependency objects) on the same table mean quite a performance hit for modifications. However, if you have one view (SQLCacheDependency object) per table, you should have no problems.
The cache-changed notification is asynchronous and is triggered when the server has resources.
You're right, not much information on this is provided, but there's a phrase related to your question on this page: http://msdn.microsoft.com/en-us/library/ms178604%28VS.80%29.aspx
"The database operations associated with SQL cache dependency are simple and therefore do not incur a heavy processing cost on the server."
Hope this helps you although your question is a little bit old already.
This page appears to have some good info on setup and on which technique to use (granted, I did just skim it).
All I can provide is anecdotal evidence for performance, but we use SqlCacheDependency as a sort of "messaging solution" for a large enterprise application that processes on the order of ten thousand messages per hour.
The basic architecture is that our company uses Perforce for source control, and we have a "subscription service" that receives messages from a trigger web service call that gets called on every p4 commit and inserts a record into a SQL database. Our application has the dependency set up to send subscription notifications for every changelist that affects a branch or path that you are monitoring.
The performance is fine. The trigger runs on the order of 200 ms, and we have never had a complaint about the latency of relaying the messages to end users.
As always, your mileage may vary.
I have a number of stored procs which I would like to run simultaneously on the server, ideally all on the server without relying on connections from an external client.
What options are there to launch all these and have them run simultaneously (I don't even need to wait until all the processes are done to do additional work)?
I have thought of:
Launching multiple connections from a client, having each start the appropriate SP.
Setting up jobs for each SP and starting the jobs from a SQL Server connection or SP (see the sketch further down).
Using xp_cmdshell to start additional runs, equivalent to osql or whatever.
SSIS - I need to see if the package can be dynamically written to handle more SPs, because I'm not sure how much access my clients are going to get to production.
In the job and cmdshell cases, I'm probably going to run into permissions level problems from the DBA...
SSIS could be a good option - if I can table-drive the SP list.
This is a data warehouse situation, and the work is largely independent and NOLOCK is universally used on the stars. The system is an 8-way 32 GB machine, so I'm going to load it down and scale it back if I see problems.
I basically have three layers. Layer 1 has a small number of processes and depends on basically all the facts/dimensions already being loaded (effectively, the stars are a Layer 0 - and yes, unfortunately they will all need to be loaded), Layer 2 has a number of processes which depend on some or all of Layer 1, and Layer 3 has a number of processes which depend on some or all of Layer 2. I already have the dependencies in a table, and would initially only launch all the procs in a particular layer at the same time, since they are orthogonal within a layer.
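For the jobs option mentioned above, a rough sketch of the launch step might look like this; having one SQL Agent job per proc and the dependency/layer table name are assumptions on my part. sp_start_job returns as soon as the job is started, so the calls below effectively launch a whole layer in parallel without waiting for completion:

DECLARE @job SYSNAME;

DECLARE job_cursor CURSOR LOCAL FAST_FORWARD FOR
    SELECT JobName
    FROM dbo.ProcDependency      -- assumed table mapping procs/jobs to layers
    WHERE LayerNo = 1;           -- launch everything in one layer together

OPEN job_cursor;
FETCH NEXT FROM job_cursor INTO @job;
WHILE @@FETCH_STATUS = 0
BEGIN
    EXEC msdb.dbo.sp_start_job @job_name = @job;  -- asynchronous start
    FETCH NEXT FROM job_cursor INTO @job;
END
CLOSE job_cursor;
DEALLOCATE job_cursor;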
Is SSIS an option for you? You can create a simple package with parallel Execute SQL tasks to execute the stored procs simultaneously. However, depending on what your stored procs do, you may or may not get benefit from starting this in parallel (e.g. if they all access the same table records, one may have to wait for locks to be released etc.)
At one point I did some architectural work on a product known as Acumen Advantage that has a warehouse manager that does this.
The basic strategy for this is to have a control DB with a list of the sprocs and their dependencies. Based on the dependencies you can do a Topological Sort to give them an order to run in. If you do this, you need to manage the dependencies - all of the predecessors of a stored procedure must complete before it executes. Just starting the sprocs in order on multiple threads will not accomplish this by itself.
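As an illustration of deriving that order from a dependency table, a recursive CTE can assign each proc a run "layer" (everything in a layer can start together once the previous layer has finished). The table and column names are assumptions, and the dependency graph is assumed to be acyclic:

-- Assumed schema: dbo.ProcDependency(ProcName, DependsOn),
-- with DependsOn NULL for procs that have no predecessors.
WITH Layers AS
(
    SELECT ProcName, 0 AS LayerNo
    FROM dbo.ProcDependency
    WHERE DependsOn IS NULL

    UNION ALL

    SELECT d.ProcName, l.LayerNo + 1
    FROM dbo.ProcDependency AS d
    JOIN Layers AS l ON d.DependsOn = l.ProcName
)
SELECT ProcName, MAX(LayerNo) AS LayerNo  -- run after the deepest predecessor's layer
FROM Layers
GROUP BY ProcName
ORDER BY LayerNo, ProcName;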
Implementing this meant knocking much of the SSIS functionality on the head and implementing another scheduler. This is OK for a product but probably overkill for a bespoke system. A simpler solution is thus:
You can manage the dependencies at a more coarse-grained level by organising the ETL vertically by dimension (sometimes known as Subject Oriented ETL) where a single SSIS package and set of sprocs takes the data from extraction through to producing dimensions or fact tables. Typically the dimensions will mostly be siloed, so they will have minimal interdependency. Where there is interdependency, make one dimension (or fact table) load process dependent on whatever it needs upstream.
Each loader becomes relatively modular and you still get a useful degree of parallelism by kicking off the load processes in parallel and letting the SSIS scheduler work it out. The dependencies will contain some redundancy. For example an ODS table may not be dependent on a dimension load being completed but the upstream package itself takes the components right through to the dimensional schema before it completes. However this is not likely to be an issue in practice for the following reasons:
The load process probably has plenty of other tasks that can execute in the meantime
The most resource-hungry tasks will almost certainly be the fact table loads, which will mostly not be dependent on each other. Where there is a dependency (e.g. a rollup table based on the contents of another table) this cannot be avoided anyway.
You can construct the SSIS packages so they pick up all of their configuration from an XML file, and the location of that file can be supplied externally in an environment variable. This sort of thing can be fairly easily implemented with scheduling systems like Control-M.
This means that a modified SSIS package can be deployed with relatively little manual intervention. The production staff can be handed the packages to deploy along with the stored procedures, and can maintain the config files on a per-environment basis without having to manually fiddle with configuration in the SSIS packages.
You might want to look at Service Broker and its activation stored procedures... it might be an option...
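A rough sketch of what that could look like follows; all the object names are assumptions, and the activation procedure (which would RECEIVE each message and EXEC the command text it contains) must already exist before the queue is created:

CREATE MESSAGE TYPE RunProcMsg VALIDATION = NONE;
CREATE CONTRACT RunProcContract (RunProcMsg SENT BY INITIATOR);

CREATE QUEUE dbo.RunProcQueue
    WITH ACTIVATION (
        STATUS = ON,
        PROCEDURE_NAME = dbo.usp_RunQueuedProc,  -- assumed activation proc
        MAX_QUEUE_READERS = 8,                   -- effective degree of parallelism
        EXECUTE AS OWNER);

CREATE SERVICE RunProcService ON QUEUE dbo.RunProcQueue (RunProcContract);
GO

-- Queue one stored procedure call; activation runs it asynchronously.
DECLARE @h UNIQUEIDENTIFIER;
BEGIN DIALOG CONVERSATION @h
    FROM SERVICE RunProcService
    TO SERVICE 'RunProcService'
    ON CONTRACT RunProcContract
    WITH ENCRYPTION = OFF;

SEND ON CONVERSATION @h
    MESSAGE TYPE RunProcMsg (CAST(N'EXEC dbo.Load_Layer1_ProcA;' AS VARBINARY(MAX)));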
In the end, I created a C# management console program which launches the processes asynchronously as they become able to run and keeps track of the connections.
I am currently working on a project with specific requirements. A brief overview of these are as follows:
Data is retrieved from external webservices
Data is stored in SQL 2005
Data is manipulated via a web GUI
The Windows service that communicates with the web services has no coupling with our internal web UI, except via the database.
Communication with the web services needs to be both time-based, and triggered via user intervention on the web UI.
The current (pre-pre-production) model for triggering web service communication is via a database table that stores trigger requests generated from manual intervention. I do not really want to have multiple trigger mechanisms, but I would like to be able to populate the database table with triggers based upon the time of the call. As I see it, there are two ways to accomplish this.
1) Adapt the trigger table to store two extra parameters: one indicating "Is this time-based or manually added?", and a nullable field to store the timing details (exact format to be determined). If it is a manually created trigger, mark it as processed when the trigger has been fired, but not if it is a timed trigger.
or
2) Create a second windows service that creates the triggers on-the-fly at timed intervals.
The second option seems like a fudge to me, but the management of option 1 could easily turn into a programming nightmare (how do you know whether the last poll of the table returned the event that needs to fire, and how do you then stop it re-triggering on the next poll?).
I'd appreciate it if anyone could spare a few minutes to help me decide which route (one of these two, or possibly a third, unlisted one) to take.
Why not use a SQL job instead of the Windows service? You can encapsulate all of your db "trigger" code in stored procedures. Then your UI and the SQL job can call the same stored procedures and create the triggers the same way, whether it happens manually or at a time interval.
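A minimal sketch of such a shared procedure, with assumed table, column, and procedure names:

-- Both the web UI and a scheduled SQL Agent job call this procedure;
-- only the Source value differs.
CREATE PROCEDURE dbo.usp_QueueWebServiceTrigger
    @Source VARCHAR(10)   -- 'Manual' or 'Timed'
AS
BEGIN
    SET NOCOUNT ON;
    INSERT INTO dbo.WebServiceTrigger (RequestedAt, Source, Processed)
    VALUES (GETUTCDATE(), @Source, 0);
END;

-- SQL Agent job step:  EXEC dbo.usp_QueueWebServiceTrigger @Source = 'Timed';
-- Web UI call:         EXEC dbo.usp_QueueWebServiceTrigger @Source = 'Manual';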
The way I see it is this.
You have a Windows service which plays the role of a scheduler, and in it there are some classes which simply call the web services and put the data in your databases.
So you can use these classes directly from the web UI as well and import the data based on the web UI trigger.
I don't like the idea of storing a user-generated action as a flag (trigger) in the database where some service will poll it (at an interval which is not under the user's control) to execute that action.
You could even convert the whole thing into an exe which you can then schedule using the Windows Task Scheduler, and call the same exe whenever the user triggers the action from the web UI.
@Vaibhav
Unfortunately, the physical architecture of the solution will not allow any direct communication between the components, other than web UI to database and database to service (which can then call out to the web services). I do, however, agree that reuse of the communication classes would be ideal here - I just can't do it within the confines of our business*
*Isn't it always the way that a technically "better" solution is stymied by external factors?