SQL Server parallel inserts with Azure Eventhub + Logic app

SQL Server parallel inserts with Azure Eventhub + Logic app - sql

I'm using Azure Event Hub to stream Azure diagnostics data to a Logic app to save it into an Azure SQL table for monitoring purposes. This works great. However, it sometimes occurs that EventHub sends duplicates. To avoid duplicate inserts I'm using a INSERT INTO statement with a WHERE NOT EXISTS clause. However, very infrequently I am still getting double rows. The Logic app runs parallel so I guess this is causing the issue. I think sometimes it does the same insert on exactly the same time, which causes the WHERE NOT EXISTS clause not to work.
Does anyone know a workaround? i'd rather not do a DELETE from and remove duplicate rows afterwards as I want to put a unique key constraint on the table.

You have to have some idempotency checking in you function. When scaling happens the new scale unit will take over processing a partition and thus reprocess the same messages.
I don't know if Service Bus can queue those messages.
I advise you to report to the Event Hubs team.

Related

Order excecution in Siddhi on wso2cep

I'm new on Stackoverflow even if I solved a lot of problems with your hints. Now I have a problem I have not found the solution.
I'm developing a pushing service using the WSO2 CEP and the GCM. CEP handles the subscribe/unsubscribe requests and the push events. The subscriptions keys are stored on my own server using MySQL together with other info.
My problems come with the subscribe step. This step has to handle either the new subscriptions (insert) and existing subscription (update). To make the operation easier, I decided to normalise the two operations by deleting and inserting the records (even if the record could be already on the DB).
To handle this, I developed an execution plan using Siddhi. The plan defines 2 streams: an event stream and a table stream linked to a MySQL table.
In the Execution Plan, first a delete is done using the key taken from the event and after a new record is inserted using the info contained into the event.
But it seems that the sequence of the operations (delete and insert) differs, so sometimes I found two or more records with the same GCM key on my server. I applied a workaround by adding a unique constraint on the table, but I'd like to know if there is a way to fix a deterministic order on the Siddhi operations.
Regards
Michele de Rosa

Since you are using same stream to update and insert to table there is no guarantee that delete query will execute earlier. All queries which are receiving from same stream will execute in parallel and we do not have any control over order. Only way we can enforce order is by either introducing a query pipeline or using a pattern query to delay events.
However your requirement you can use newly added insert overwrite functionality in event tables. This will automatically handle your requirement of updating if exists and inserting otherwise.
Hope this helps!!
Thanks
Tishan

How to efficiently trigger system command with SQL query or table change?

I have data conversion and caching service running as self-hosted WCF service.
Now it uses database polling in constant short intervals to update its data.
I think it's unnecessary. The data can be changed only if one of the tables is changed, and when the data is changed depends on system users actions.
There is no problem in setting a trigger for specific tables, however I would need an action outside SQL-Server to update my cache. My WCF service could perform update when receiving specific URI via HTTP. So all I need is a command in table trigger which would send a request. Is it even possible?
I think about a hack I used back in the days with HTTP requests. I halted HTTP request response at server until data packet from somewhere else arrived. There was no delay between polling requests. I achieved fully asynchronous, "real-time" updates.
Maybe this approach is possible to apply with SQL? I think about a query which blocks termination until receives a signal. Well, it eventually times out, but it's good enough to try. Then - how to signal and wait in SQL? By locking and unlocking shared resource, like cursor or dummy table?
Any other options?
I need the cache update done at lowest possible frequency (because it's pretty expensive, so once per minute is great), but I need immediate update when the data is changed.

To answer your question, have you looked at xp_cmdshell?
https://msdn.microsoft.com/en-us/library/ms175046.aspx
However, the security/performance implications of such a decision could be non-trivial depending on your use case.

How to handle errors in a trigger?

I'm writing some SQL code that needs to be executed when rows are inserted in a database table, so I'm using an AFTER INSERT trigger; the code is quite complex, thus there could still be some bugs around.
I've discovered that, if an error happens when executing a trigger, SQL Server aborts the batch and/or the whole transaction. This is not acceptable for me, because it causes problems to the main application that uses the database; I also don't have the source code for that application, so I can't perform proper debugging on it. I absolutely need all database actions to succeed, even if my trigger fails.
How can I code my trigger so that, should an error happen, SQL Server will not abort the INSERT action?
Additionally, how can I perform proper error handling so that I can actually know the trigger has failed? Sending an email with the error data would be ok for me (the trigger's main purpose is actually sending emails), but how do I detect an error condition in a trigger and react to it?
Edit:
Thanks for the tips about optimizing performance by using something else than a trigger, but this code is not "complex" in the sense that it's long-running or performance intensive; it simply builds and sends a mail message, but in order to do so, it must retrieve data from various linked tables, and since I am reverse-engineering this application, I don't have the database schema available and am still trying to find my way around it; this is why conversion errors or unexpected/null values can still creep up, crashing the trigger execution.
Also, as stated above, I absolutely can't perform debugging on the application itself, nor modify it to do what I need in the application layer; the only way to react to an application event is by firing a database trigger when the application writes to the DB that something has just heppened.

If the operations in the trigger are complex and/or potentially long running, and you don't want the activity to affect the original transaction, then you need to find a way to decouple the activity.
One way might be to use Service Broker. In the trigger, just create message(s) (one per row) and send them on their way, then do the rest of the processing in the service.
If that seems too complex, the older way to do it is to insert the rows needing processing into a work/queue table, and then have a job continuously pulling rows from there are doing the work.
Either way, you're now not preventing the original transaction from committing.

Triggers are part of the transaction. You could do try catch swallow around the trigger code, or somewhat more professional try catch log swallow, but really you should let it go bang and then fix the real problem which can only be in your trigger.
If none of the above are acceptable, then you can't use a trigger.

Async Message Testing

Here is the problem I am facing with respect to Asynchronous Testing. The Problem statement is as below
I get a big batch of xml with data of multiple candidates. We do some validations and split that big xml into multiple xml's for each candidate. Each and every xml is persisted to the file structured database wih a Unique Identifier. A Unique identifier is generated for each of the messages that got persisted to the database. Each of those unique identifier's are hosted on to the Queue for subscription.
I am working on developing the automation test framework. I am looking for a way to let the test class know that unique idenifier has been subscribed by the next step in Data processing.
I have read information regarding the above problem. Most of which specifies Thread sleeps and timers. The problem what would happen is when we run the large number of test cases, it takes enoromously large amount of time.
Have read Awaitility. Had some hopes on it. Any ideas and anyonehas faced a similar situation. Please help.
Thanks
DevAutotester

You could use Awaitility to wait until all id's exists in the db or queue (if I understand it correctly) and then continue to do the validation afterwards. You will have to provide a supplier to Awaitility that checks that all IDs are present. Awaitility will then wait for this statement to be true.
/Johan

Performance questions for SQL Cache Dependency

I'm working on a project where we are thinking of using SQLCacheDependency with SQL Server 2005/2008 and we are wondering how this will affect the performance of the system.
So we are wondering about the following questions
Can the number of SQLCacheDependency objects (query notifications) have negative effect on SQL Server performance i.e. on insert, update and delete operations on affected tables ?
What effect (performance wise) would for example 50000 different query notifications on a single table have in SQL Server 2005/2008 on insertion and deletion on that table.
Are there any recommendations of how to use SQLCacheDependencies? Any official do‘s and don‘ts? We have found some information on the internet but haven‘t found information on performance implications.
If there is anyone here that has some answers to these questions that would be great.

The SQL Cache dependency using the polling mechanism should not be a load on the sql server or the application server.
Lets see what all steps are there for sqlcachedependency to work and analyze them:
Database is enabled for sqlcachedependency.
A table say 'Employee' is enabled for sqlcachedependency. (can be any number of tables)
Web.config is updated to enable sqlcachedependency.
The Page where u r using sql cache dependency is configured.
thats it.
Internally:
step 1. creates a table 'ASPnet_sqlcachetablesforchangenotification' in database which will store the 'Employee' table name for which sqlcachedependency is enabled. and add some stored procedures aswell.
step 2. inserts a 'Employee' table entry in the 'ASPnet_sqlcachetablesforchangenotification' table. Also creates an insert update delete trigger on this 'Employee' table.
step 3. enables application for sqlcachedependency by providing the connectionstring and polltime.
whenever there is a change in 'Employee' table, trigger is fired which inturn updates the 'ASPnet_sqlcachetablesforchangenotification' table.
Now application polls the database say every 5000ms and checks for any changes to the 'ASPnet_sqlcachetablesforchangenotification' table. if there r any changes the respective caches is removed from memory.
The great benefit of caching combined with freshness of data ( atmost data can be 5 seconds stale). The polling is taken care by a background process with should not be a performance hurdle. because as u see from above list the task are least CPU demanding.

SQLCacheDependency is implemented as an indexed view and every time the table is modified this views index gets changed. so many views (SQLCacheDependency objects) on the same table mean quite a perf hit for modifications. however if you have 1 view (SQLCacheDependency object) per table you should have no problems.
the cache changed notification is async and is triggered when the server has resources.

You're right, not much information on this is provided but there's a phrase related to your question in this page http://msdn.microsoft.com/en-us/library/ms178604%28VS.80%29.aspx
"The database operations associated with SQL cache dependency are simple and therefore do not incur a heavy processing cost on the server."
Hope this helps you although your question is a little bit old already.

This page appears to have some good info on setup which technique to use well (granted I did just skim it).

All I can provide is anecdotal evidence for performance, but we use SqlCacheDependency as a sort of "messaging solution" for a large enterprise application that processes on the order of ten thousand messages per hour.
The basic architecture is that our company uses Perforce for source control and we have a "subscription service" that receives messages from a trigger webservice call than gets called on every p4 commit and inserts a record into a SQL database. Our application has the dependency setup to send subscription notifications for every changeliest that affects a branch or path that you are monitoring.
The performance is fine. Trigger runs on the order of 200ms and we have never had a complaint about the latency of relaying the messages to end users.
As always, your mileage may vary.

We Keep Coding

sql objective-c vba vb.net react-native apache vue.js tensorflow api pandas