Order of execution in Siddhi on WSO2 CEP - google-cloud-messaging

I'm new on Stack Overflow, even though I've solved a lot of problems with your hints. Now I have a problem I haven't found a solution for.
I'm developing a push service using WSO2 CEP and GCM. CEP handles the subscribe/unsubscribe requests and the push events. The subscription keys are stored on my own server in MySQL, together with other info.
My problem is with the subscribe step. This step has to handle both new subscriptions (insert) and existing subscriptions (update). To keep the operation simple, I decided to normalise the two cases by always deleting and then inserting the record (even if the record may already be in the DB).
To handle this, I developed an execution plan using Siddhi. The plan defines 2 streams: an event stream and a table stream linked to a MySQL table.
In the execution plan, a delete is done first using the key taken from the event, and then a new record is inserted using the info contained in the event.
But it seems that the order of the operations (delete and insert) is not guaranteed, so sometimes I find two or more records with the same GCM key on my server. As a workaround I added a unique constraint on the table, but I'd like to know whether there is a way to enforce a deterministic order on the Siddhi operations.
Regards
Michele de Rosa

Since you are using the same stream to delete from and insert into the table, there is no guarantee that the delete query will execute first. All queries that receive events from the same stream execute in parallel, and we do not have any control over their order. The only way to enforce an order is either to introduce a query pipeline or to use a pattern query to delay events.
However, for your requirement you can use the newly added insert overwrite functionality in event tables. This automatically handles updating the record if it exists and inserting it otherwise.
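A minimal sketch of what that could look like, assuming a hypothetical SubscribeStream and an RDBMS-backed subscriptions table (the stream, table, column and datasource names here are placeholders, not taken from your setup):

@From(eventtable='rdbms', datasource.name='PUSH_DS', table.name='subscriptions')
define table SubscriptionTable (gcmKey string, userInfo string);

define stream SubscribeStream (gcmKey string, userInfo string);

from SubscribeStream
select gcmKey, userInfo
insert overwrite SubscriptionTable
    on SubscriptionTable.gcmKey == gcmKey;

Here "insert overwrite ... on ..." updates the matching row when SubscriptionTable.gcmKey already exists and inserts a new row otherwise, so the separate delete query is no longer needed.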
Hope this helps!!
Thanks
Tishan

Related

SQL Server parallel inserts with Azure Eventhub + Logic app

I'm using Azure Event Hub to stream Azure diagnostics data to a Logic App, which saves it into an Azure SQL table for monitoring purposes. This works great. However, it sometimes happens that Event Hubs sends duplicates. To avoid duplicate inserts I'm using an INSERT INTO statement with a WHERE NOT EXISTS clause. However, very infrequently I still get double rows. The Logic App runs in parallel, so I guess this is causing the issue. I think it sometimes performs the same insert at exactly the same time, which causes the WHERE NOT EXISTS clause not to work.
Does anyone know a workaround? I'd rather not do a DELETE afterwards to remove duplicate rows, as I want to put a unique key constraint on the table.
You have to have some idempotency checking in your function. When scaling happens, the new scale unit will take over processing a partition and thus reprocess the same messages.
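At the database level, one way to make the WHERE NOT EXISTS insert idempotent under concurrent runs is to take a key-range lock during the existence check. A rough sketch, assuming a hypothetical dbo.Diagnostics table keyed by EventId (the table and column names are placeholders, not from the original setup):

INSERT INTO dbo.Diagnostics (EventId, Payload)
SELECT @EventId, @Payload
WHERE NOT EXISTS (
    -- UPDLOCK + HOLDLOCK takes a key-range lock on the checked key, so a
    -- second run checking the same EventId blocks until this statement ends
    SELECT 1
    FROM dbo.Diagnostics WITH (UPDLOCK, HOLDLOCK)
    WHERE EventId = @EventId
);

Combined with the unique key constraint you plan to add, any residual race then surfaces as a constraint violation rather than a duplicate row.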
I don't know if Service Bus can queue those messages.
I advise you to report this to the Event Hubs team.

How to call BizTalk Server when Inserting new row into SQL table?

I need to set up an interaction between two SQL tables using BizTalk Server.
The simplest example is when a new record is added to one table. Is it possible to call BizTalk, transfer this row to a BizTalk solution where the row will be processed, and then transfer it to another SQL table?
I found some information about BizTalk-to-SQL interaction, but I cannot find any information or examples about SQL-to-BizTalk interaction.
If it is possible, can you tell me how, or point me to some instructions?
Yes, it is possible. But it is not possible for us to give detailed instructions based on your question.
You would have to have a receive location that polls for records in that table, using either inline SQL in the receive location or a call to a stored procedure (a sketch of such a polling procedure is included at the end of this answer).
Then you would do whatever transformation etc. you needed using maps, possibly Orchestrations, and have a send port that would insert it into another table.
How to Configure a port using the WCF-SQL adapter
However, as others have said in the comments, you have to consider whether BizTalk is the best fit for this. That depends on the frequency, what sort of processing is needed, how quickly after insert the record needs to be processed, the number of rows, and whether each row is a discrete message or part of a large group of records.
Some other possibilities to consider include
Insert trigger on the first table, if you need it processed instantly
An SSIS package running on a schedule, if it is a large batch of messages that needs to be processed on a scheduled basis
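Going back to the polling receive location mentioned earlier in this answer, a stored procedure along these lines (the table, column and procedure names are hypothetical, not from the question) could serve as the polling statement of a WCF-SQL receive location:

-- Atomically claim a batch of unprocessed rows and return them;
-- the result set becomes the message BizTalk receives.
CREATE PROCEDURE dbo.GetNewSourceRows
AS
BEGIN
    SET NOCOUNT ON;

    UPDATE TOP (100) src
    SET    Processed = 1
    OUTPUT inserted.Id, inserted.Payload
    FROM   dbo.SourceTable AS src
    WHERE  Processed = 0;
END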

When to invalidate cache - .net core api

How do I know when to invalidate the cache, if a table change is made from an outside source?
I have an API call that returns an employee table. The first time this call is made, I will cache the results so that subsequent calls pull the data from the cache instead of the database. This makes sense; however, what happens if someone adds a new record to the employee table from outside of the API? How does the cache know that it is now invalid?
If the user makes the change to the employee table through the API I can capture that, but we have a separate desktop app that doesn't use the API, and that app can directly make changes to the employee table. Are there any accepted standards for handling this?
The only possible solution I can think of is to add a trigger to the employee table and somehow use that to know when a table has changed. But we have over a thousand tables, and we are making an API call for each table, so I do not think that adding a thousand triggers to our database is an acceptable solution.
Yes, you could add a trigger as suggested. Or you could use a caching system that supports an expiry time/sliding expiry. You would then be serving up stale data some of the time, but not always.
As the other answer suggests, your trigger idea is OK; however, as you've stated, that would be a lot of triggers.
If your cache is not local to the API (which I assume it isn't, if triggers would be able to access it), could you not access it from your desktop application? You could invalidate the cache by having the desktop application remove the employee record from the cache when it makes a successful change to the employee table.
It boils down to..
You have a cache (which is essentially a read store).
You have two options to update it
- Either it times out and refetches (which is OK if you don't need up-to-the-minute, real-time data)
- Or it has to be told its data is no longer valid.
Two ways to solve this
Push model
Pull model
Push model: using a CLR trigger and pushing the updates to an API. Whenever DML happens, the CLR trigger calls the API, which in turn can update the cache.
Pull model: using a database trigger on the SQL Server table to populate an intermediate audit table, and polling that table using a background task.
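A minimal sketch of the trigger side of the pull model, assuming a hypothetical dbo.Employee table and a dbo.CacheInvalidation audit table (all names here are placeholders):

CREATE TABLE dbo.CacheInvalidation (
    Id        int IDENTITY(1,1) PRIMARY KEY,
    TableName sysname   NOT NULL,
    ChangedAt datetime2 NOT NULL DEFAULT SYSUTCDATETIME()
);
GO

-- Any change to Employee records that the cached copy is stale;
-- a background task polls CacheInvalidation and evicts the entry.
CREATE TRIGGER trg_Employee_CacheInvalidation
ON dbo.Employee
AFTER INSERT, UPDATE, DELETE
AS
BEGIN
    SET NOCOUNT ON;
    INSERT INTO dbo.CacheInvalidation (TableName) VALUES (N'Employee');
END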
Hope this helps!

SQL Server: Using triggers for workflow automation

In a media management system my task is to create workflow automation. Currently, I have created it using SQL Server triggers, and the UI using ASP.NET with jQuery.
For example:
When a new file enters the system, the trigger fires and updates the database metadata table with some data for that file.
Millions of assets go through the system. Is it ideal to have triggers do this processing?
Is there a better way to create this automation?
Is there a "best practice" for this kind of work?
I'm having the same issue: data enters my central asset database in several ways (these may differ from client to client).
So I also want to create an easily customizable workflow in the data layer (no other dependencies).
As the other people mention, triggers may affect the parent activity.
That can be overcome by writing the action that should be performed to a queue table instead.
Example trigger: Hardware.Status = "Issue Work Order"
INSERT INTO dbo.[Queue] (Created, Task, Completed) VALUES (GETUTCDATE(), 'EXEC dbo.IssueWorkOrder 123', 0);
Inserting a record into your queue table will reduce the problems highlighted in the other users' comments.
Then you build a scheduling tool (Hangfire, SQL Agent jobs, or whatever) that executes the tasks in the queue in the order they were added.
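As a very rough sketch of what each scheduler run could do, assuming the queue table above plus a hypothetical identity column Id as its key (error handling is left out):

DECLARE @Id int, @Task nvarchar(max);

BEGIN TRANSACTION;

-- Claim the oldest unfinished task; READPAST skips rows another
-- worker has already locked, so two runs do not pick the same task.
SELECT TOP (1) @Id = Id, @Task = Task
FROM dbo.[Queue] WITH (UPDLOCK, READPAST)
WHERE Completed = 0
ORDER BY Created;

IF @Task IS NOT NULL
BEGIN
    EXEC sp_executesql @Task;   -- e.g. EXEC dbo.IssueWorkOrder 123
    UPDATE dbo.[Queue] SET Completed = 1 WHERE Id = @Id;
END

COMMIT;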
Now, of course in practice it's not as simple as that. You will have to address the following:
What if the step fails?
Dependencies on previous steps having completed first
Multiple operators changing a record (the window between the job step being executed and another person updating the same record)
I guess #2 and #3 are issues with any workflow engine / pipeline. To address them, a locking mechanism must be put in place.

Performance questions for SQL Cache Dependency

I'm working on a project where we are thinking of using SQLCacheDependency with SQL Server 2005/2008 and we are wondering how this will affect the performance of the system.
So we are wondering about the following questions
Can the number of SQLCacheDependency objects (query notifications) have a negative effect on SQL Server performance, i.e. on insert, update and delete operations on the affected tables?
What effect (performance-wise) would, for example, 50,000 different query notifications on a single table have in SQL Server 2005/2008 on insertions and deletions on that table?
Are there any recommendations on how to use SQLCacheDependencies? Any official dos and don'ts? We have found some information on the internet but haven't found information on the performance implications.
If there is anyone here that has some answers to these questions that would be great.
The SQL cache dependency using the polling mechanism should not be a load on the SQL Server or the application server.
Let's look at the steps required for SqlCacheDependency to work and analyze them:
The database is enabled for SqlCacheDependency.
A table, say 'Employee', is enabled for SqlCacheDependency (this can be any number of tables).
Web.config is updated to enable SqlCacheDependency.
The page where you are using SQL cache dependency is configured.
That's it.
Internally:
Step 1 creates a table 'ASPnet_sqlcachetablesforchangenotification' in the database, which will store the 'Employee' table name for which SqlCacheDependency is enabled, and adds some stored procedures as well.
Step 2 inserts an 'Employee' entry into the 'ASPnet_sqlcachetablesforchangenotification' table. It also creates an insert/update/delete trigger on the 'Employee' table.
Step 3 enables the application for SqlCacheDependency by providing the connection string and poll time.
Whenever there is a change in the 'Employee' table, the trigger is fired, which in turn updates the 'ASPnet_sqlcachetablesforchangenotification' table.
The application now polls the database, say every 5000 ms, and checks for any changes to the 'ASPnet_sqlcachetablesforchangenotification' table. If there are any changes, the respective cache entries are removed from memory.
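As a simplified illustration of that mechanism (the objects actually generated by aspnet_regsql differ in detail, so treat the names and logic here as approximate), the per-table trigger just bumps a change counter, and the poll compares the counters with the values it last saw:

-- Approximation of the generated notification trigger on Employee:
-- any modification bumps the change counter for that table.
CREATE TRIGGER Employee_ChangeNotification
ON dbo.Employee
AFTER INSERT, UPDATE, DELETE
AS
BEGIN
    SET NOCOUNT ON;
    UPDATE dbo.ASPnet_sqlcachetablesforchangenotification
    SET    changeId = changeId + 1
    WHERE  tableName = N'Employee';
END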
The great benefit is caching combined with freshness of data (at most the data can be 5 seconds stale). The polling is taken care of by a background process and should not be a performance hurdle, because, as you can see from the list above, the tasks are not very CPU demanding.
SQLCacheDependency is implemented as an indexed view, and every time the table is modified this view's index gets changed. So many views (SQLCacheDependency objects) on the same table mean quite a performance hit for modifications. However, if you have one view (SQLCacheDependency object) per table you should have no problems.
The cache-changed notification is asynchronous and is triggered when the server has resources.
You're right, not much information on this is provided, but there's a phrase related to your question on this page: http://msdn.microsoft.com/en-us/library/ms178604%28VS.80%29.aspx
"The database operations associated with SQL cache dependency are simple and therefore do not incur a heavy processing cost on the server."
Hope this helps you although your question is a little bit old already.
This page appears to have some good info on setup and which technique to use (granted, I did just skim it).
All I can provide is anecdotal evidence for performance, but we use SqlCacheDependency as a sort of "messaging solution" for a large enterprise application that processes on the order of ten thousand messages per hour.
The basic architecture is that our company uses Perforce for source control, and we have a "subscription service" that receives messages from a trigger web service call that gets called on every p4 commit and inserts a record into a SQL database. Our application has the dependency set up to send subscription notifications for every changelist that affects a branch or path that you are monitoring.
The performance is fine. The trigger runs on the order of 200 ms, and we have never had a complaint about the latency of relaying the messages to end users.
As always, your mileage may vary.