I need to set up an interaction between two SQL tables using BizTalk Server.
The simplest example is when a new record is added to one table. Is it possible to call BizTalk, pass this row to a BizTalk solution, process it there, and then transfer it to another SQL table?
I found some information about BizTalk-to-SQL interaction, but I cannot find any information or examples about SQL-to-BizTalk interaction.
If it is possible, can you tell me how, or point me to some instructions?
Yes, it is possible, but it is not possible for us to give detailed instructions based on your question.
You would need a receive location that polls for records in that table, using either inline SQL in the receive location or a call to a stored procedure.
Then you would do whatever transformation you need using maps (and possibly orchestrations) and have a send port that inserts the result into another table.
How to Configure a port using the WCF-SQL adapter
However, as others have said in the comments, you have to consider whether BizTalk is the best fit for this. That depends on the frequency, the sort of processing needed, how quickly after insert each record needs to be processed, the number of rows, and whether each row is a discrete message or part of a large group of records.
Some other possibilities to consider include:
An insert trigger on the first table, if you need each record processed instantly
An SSIS package running on a schedule, if it is a large batch of records that needs to be processed on a scheduled basis
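Whichever way you go, the underlying pattern is the same: poll for unprocessed rows, transform them, insert them into the target table, and mark them as processed. As a minimal sketch of that pattern outside of BizTalk (pyodbc against SQL Server is assumed, and the table and column names, including the Processed flag, are hypothetical):

```python
import pyodbc

# Connect to the database that holds both tables (the DSN is a placeholder).
conn = pyodbc.connect("DSN=MyDatabase")
cur = conn.cursor()

# Poll for unprocessed rows -- roughly what the WCF-SQL receive location does.
cur.execute("SELECT Id, Payload FROM dbo.SourceTable WHERE Processed = 0")
for row_id, payload in cur.fetchall():
    transformed = payload.upper()  # stand-in for the map / orchestration step
    # Insert into the destination table -- roughly what the send port does.
    cur.execute(
        "INSERT INTO dbo.TargetTable (SourceId, Payload) VALUES (?, ?)",
        row_id, transformed)
    # Flag the source row so it is not picked up on the next poll.
    cur.execute("UPDATE dbo.SourceTable SET Processed = 1 WHERE Id = ?", row_id)

conn.commit()
```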
I have a table that records a lot of information at any given moment, for example 100 rows per second.
After each row is written, certain operations must be performed; in particular, some of these rows should be copied to another table.
Now a few questions:
Can I use triggers to do this, given the high rate of incoming rows?
If multiple conditions have to be checked before copying to the other table, will the triggers stay responsive?
Additional explanation: the records in this table are added by a fingerprint recorder.
First of all, consider the following:
1. When you define your trigger, it can fire on insert, update, etc., so it does not need to execute for every operation (it is not required for all inserts).
2. Business logic buried in a trigger is easy to forget about as the rules of your application change over time;
you have to keep it in mind on every change to avoid introducing bugs.
4....
I strongly suggest that you do not define a trigger unless you have no other choice.
If you have an application, you can put this business logic there instead
(for instance, run a thread in your application that checks for new rows and does the work; see the sketch after this list).
You can also have a Windows service do it for you.
If you only have database access, you can define a job in the database to do it for you (not recommended).
Finally, to avoid blocking if you decide to use a second thread (which, per your question, only reads data from the original table and inserts it into another), you can turn on READ_COMMITTED_SNAPSHOT (is_read_committed_snapshot_on) in your database.
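As a rough illustration of the "separate thread in your application" suggestion: a worker copies qualifying rows using the row Id as a watermark, so the fingerprint recorder's inserts are never slowed down by a trigger. pyodbc is assumed, and all table and column names are hypothetical.

```python
import threading
import time

import pyodbc

def copy_worker(poll_seconds=1):
    # A dedicated connection for the copy thread (the DSN is a placeholder).
    conn = pyodbc.connect("DSN=MyDatabase")
    cur = conn.cursor()
    last_id = 0  # watermark: highest Id already copied
    while True:
        # Only read rows added since the last pass that meet the copy conditions.
        cur.execute(
            "SELECT Id, PersonId, ScanTime FROM dbo.FingerprintLog "
            "WHERE Id > ? AND NeedsCopy = 1 ORDER BY Id",
            last_id)
        for row_id, person_id, scan_time in cur.fetchall():
            cur.execute(
                "INSERT INTO dbo.FingerprintArchive (PersonId, ScanTime) "
                "VALUES (?, ?)",
                person_id, scan_time)
            last_id = row_id
        conn.commit()
        time.sleep(poll_seconds)

# Run the copy loop on a background thread so the insert path is never blocked.
threading.Thread(target=copy_worker, daemon=True).start()
```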
I'm new on Stack Overflow, even though I have solved a lot of problems with your hints. Now I have a problem I have not found a solution for.
I'm developing a push service using WSO2 CEP and GCM. CEP handles the subscribe/unsubscribe requests and the push events. The subscription keys are stored on my own server in MySQL, together with other info.
My problem comes with the subscribe step. This step has to handle both new subscriptions (insert) and existing subscriptions (update). To make the operation simpler, I decided to normalise the two cases by deleting and then inserting the record (even if the record might already be in the DB).
To handle this, I developed an execution plan using Siddhi. The plan defines two streams: an event stream and a table stream linked to a MySQL table.
In the execution plan, a delete is done first, using the key taken from the event, and then a new record is inserted using the info contained in the event.
But it seems that the order of the operations (delete and insert) is not guaranteed, so sometimes I find two or more records with the same GCM key on my server. I applied a workaround by adding a unique constraint on the table, but I'd like to know whether there is a way to enforce a deterministic order for the Siddhi operations.
Regards
Michele de Rosa
Since you are using the same stream to update and insert into the table, there is no guarantee that the delete query will execute first. All queries receiving from the same stream execute in parallel, and we do not have any control over the order. The only way to enforce an order is either to introduce a query pipeline or to use a pattern query to delay events.
However, for your requirement you can use the newly added insert overwrite functionality for event tables. It automatically handles updating the record if it exists and inserting it otherwise.
Hope this helps!!
Thanks
Tishan
I want to stream some time series data into BigQuery with insertAll but only retain the last 3 months (say) to avoid unbounded storage costs. The usual answer is to save each day of data into a separate table but AFAICT this would require each such table to be created in advance. I intend to stream data directly from unsecured clients authorized with a token that only has bigquery.insertdata scope, so they wouldn't be able to create the daily tables themselves. The only solution I can think of would be to run a secure daily cron job to create the tables -- not ideal, especially since if it misfires data will be dropped until the table is created.
Another approach would be to stream data into a single table and use table decorators to control query costs as the table grows. (I expect all queries to be for specific time ranges so the decorators should be pretty effective here.) However, there's no way to delete old data from the table, so storage costs will become unsustainable after a while. I can't figure out any way to "copy and truncate" the table atomically either, so that I can partition old data into daily tables without losing rows being streamed at that time.
Any ideas on how to solve this? Bonus points if your solution lets me re-aggregate old data into temporally coarser rows to retain more history for the same storage cost. Thanks.
Edit: just realized this is a partial duplicate of Bigquery event streaming and table creation.
If you look at the streaming API discovery document, there's a curious new experimental field called "templateSuffix", with a very relevant description.
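For illustration only, here is a sketch using the newer google-cloud-bigquery Python client, which exposes this as the template_suffix argument of insert_rows_json; the project, dataset, table, and row fields below are assumptions, not anything from the discovery document itself.

```python
from google.cloud import bigquery

client = bigquery.Client()

rows = [{"ts": "2015-06-01T12:00:00Z", "value": 42}]  # hypothetical row

# Streams into a table named <base table> + suffix (here "events_20150601"),
# creating it from the base table's schema if it does not exist yet.
errors = client.insert_rows_json(
    "my-project.my_dataset.events",  # base template table (placeholder name)
    rows,
    template_suffix="_20150601",     # e.g. derived from the row's timestamp
)
if errors:
    print("insert errors:", errors)
```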
I'd also point out that no official documentation has been released, so special care should probably go into using this field -- especially in a production setting. Experimental fields could possibly have bugs etc. Things I could think to be careful of off the top of my head are:
Modifying the schema of the base table in non-backwards-compatible ways.
Modifying the schema of a created table directly in a way that is incompatible with the base table.
Streaming to a created table directly and via this suffix -- row insert ids might not apply across boundaries.
Performing operations on the created table while it's actively being streamed to.
And I'm sure other things. Anyway, just thought I'd point that out. I'm sure official documentation will be much more thorough.
Most of us are doing the same thing as you described.
But we don't use a cron; we create tables 1 year in advance, or on some projects 5 years in advance. You may wonder why and when we do so.
We do this whenever the schema is changed by us, the developers. We do a deploy and run a script that takes care of the schema changes for old/existing tables; the script also deletes all the still-empty future tables and simply recreates them. We didn't complicate our lives with a cron, as we know the exact moment the schema changes (the deploy), and there is no disadvantage to creating tables that far in advance. On SaaS-based systems we also do this per tenant, when a user is created or closes their account.
This way we don't need a cron; we just need to know that the deploy has to do this additional step whenever the schema changes.
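As a rough sketch of what that deploy-time step could look like (using the google-cloud-bigquery Python client; the dataset name, table prefix, schema, and one-year horizon are assumptions):

```python
import datetime

from google.cloud import bigquery

client = bigquery.Client()
dataset = bigquery.DatasetReference("my-project", "my_dataset")

# The current schema; on a deploy that changes it, drop the still-empty future
# tables first and rerun this so they are recreated with the new schema.
schema = [
    bigquery.SchemaField("ts", "TIMESTAMP"),
    bigquery.SchemaField("value", "INTEGER"),
]

start = datetime.date.today()
for day in range(365):  # one year of daily tables, created up front
    suffix = (start + datetime.timedelta(days=day)).strftime("%Y%m%d")
    table = bigquery.Table(dataset.table(f"events_{suffix}"), schema=schema)
    client.create_table(table, exists_ok=True)  # leave existing tables alone
```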
As for not losing streaming inserts while you do maintenance on your tables, you need to address that in your business logic at the application level. You probably have some sort of message queue, such as Beanstalkd, to queue all the rows into a tube, with a worker that later pushes them to BigQuery. You may already have this to cover the case where the BigQuery API responds with an error and you need to retry; it's easy to do with a simple message queue. You would then rely on this retry phase when you stop or rename a table for a while: the streaming insert will fail, most probably because the table is not ready for streaming inserts, e.g. it has been temporarily renamed to do some ETL work.
If you don't have this retry phase, you should consider adding it, as it not only lets you retry failed BigQuery calls but also gives you a maintenance window.
You've already solved it by partitioning. If table creation is an issue, have an hourly cron in App Engine that verifies that today's and tomorrow's tables are always created.
Very likely the App Engine app won't go over the free quotas, and it has a 99.95% uptime SLO, so the cron will never go down.
For an application I am writing, I need to be able to identify when new data is inserted into several tables of a database.
The problem is twofold: this data will be inserted many times per minute into sometimes very large databases (so I need to be sensitive to load and database polling issues), and I have no control over the application creating this data (so, as far as I know, I can't use the notify/listen functionality available within Postgres for exactly this kind of task*).
Any suggestion regarding a good strategy would be much appreciated.
*I believe the application controlling this data uses the notify/listen functionality itself, but I have no clue how (if it is possible at all) to find out which "channel" it uses externally, or whether I could ever latch onto it.
Generally, you need something in the table that you can use to determine newness, and there are a few approaches.
A timestamp column would let you use the date but you'd still have the application issue of storing a date outside of your database, and data that isn't in the database means another realm of data to manage. Yuck.
A tracking table that stores last update/insert timestamps on a per-table basis could give you what you want. You'd use a trigger to maintain the last-DML timestamp.
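As a sketch of how an application could consume such a tracking table (psycopg2 is assumed; the table_activity table with table_name and last_dml columns is hypothetical and would be maintained by the triggers just described):

```python
import time

import psycopg2

conn = psycopg2.connect("dbname=mydb user=me")  # placeholder connection string
last_seen = {}  # table_name -> last_dml value we have already handled

while True:
    with conn, conn.cursor() as cur:
        cur.execute("SELECT table_name, last_dml FROM table_activity")
        for table_name, last_dml in cur.fetchall():
            if last_seen.get(table_name) != last_dml:
                last_seen[table_name] = last_dml
                # New inserts/updates happened in table_name since the last poll;
                # fetch and process the new rows here.
                print(f"{table_name} changed at {last_dml}")
    time.sleep(5)  # one cheap query per interval keeps the polling load low
```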
A solution you don't want to use is a serial (integer) id that comes from nextval, for any purpose other than uniqueness. The standard/common mistake is to presume serial keys will be contiguous (they're not) or monotonic (they're not).