SQL Server Change Data Capture - how to separate log entries by transaction?

SQL Server Change Data Capture - how to separate log entries by transaction? - sql

I'm just getting started with SQL Server's Change Data Capture functionality. I'd like to be able to pick out entries from the change table based on which transactions they were part of. Is there a way to do this? The transaction ID field in those tables doesn't seem to correspond to anything meaningful.
I realize that there are the LSN's and you can look for general start and end times, but is it possible to account for multiple transactions which run at the same time, each affecting different records? (with potentially interleaving start and end times for different operations)

I ran into this same problem a few minutes ago. I did some Googling and it appears there's a cdc.lsn_time_mapping table that may do this for you.
Here's how I used it (not sure how reliable this is, what would happen if you truncate the transaction log, etc.):
DECLARE #from_lsn binary(10), #to_lsn binary(10);
SET #from_lsn = sys.fn_cdc_get_min_lsn('Application_Person');
SET #to_lsn = sys.fn_cdc_get_max_lsn();
SELECT m.*, c.*
FROM cdc.fn_cdc_get_all_changes_Application_Person(#from_lsn, #to_lsn, N'all') c
INNER JOIN cdc.lsn_time_mapping m ON c.__$start_lsn = m.start_lsn
Other useful links for future readers:
https://msdn.microsoft.com/en-us/library/bb510494.aspx (How to set up
CDC)
https://msdn.microsoft.com/en-us/library/bb510627.aspx (How to
query changes)

Related

Oracle SQL -- Can I timestamp a query?

I have a simple query:
Select Count(p.Group_ID)
From Player_Source P
Inner Join Feature_Group_Xref X On P.Group_Id=X.Group_Id
where x.feature_name ='Try this site'
which spits out the current number of people in a specific test group at the current moment in time.
If I wanted to see what this number was, say, on 9/10/12 instead, could I add something in to the query to time phase this information as the database had it 2 days ago?

No. If you want to store historical information, you will need to incorporate that into your schema. For example, you might extend Feature_Group_Xref to add the columns Effective_Start_Timestamp and Effective_End_Timestamp; to find which groups currently have a given feature, you would write AND Effective_End_Timestamp > CURRENT_TIMESTAMP() (or AND Effective_End_Timestamp IS NULL, depending how you want to define the column), but to find which groups had a given feature at a specific time, you would write AND ... BETWEEN Effective_Start_Timestamp AND Effective_End_Timestamp (or AND Effective_Start_Timestamp < ... AND (Effective_End_Timestamp > ... OR Effective_End_Timestamp IS NULL)).
Wikipedia has a good article on various schema designs that people use to tackle this sort of problem: see http://en.wikipedia.org/wiki/Slowly_changing_dimension.

It depends...
It is at least theoretically possible that you could use flashback query
Select Count(p.Group_ID)
From Player_Source as of timestamp( date '2012-09-10' ) P
Join Feature_Group_Xref as of timestamp( date '2012-09-10' ) X
On P.Group_Id=X.Group_Id
where x.feature_name ='Try this site'
This requires, though, that you have the privileges necessary to do a flashback query and that there is enough UNDO for Oracle to apply to be able to get back to the state those tables were in at midnight two days ago. It is unlikely that the database is configured to retain that much UNDO though it is generally possible. This query would also work if you happen to be using Oracle Total Recall.
More likely, though, you will need to modify your schema definition so that you are storing historical information that you can then query as of a point in time. There are a variety of ways to accomplish this-- adding effective and expiration date columns to the table as #ruakh suggests is one of the more popular options. Which option(s) are appropriate in your particular case will depend on a variety of factors including how much history you want to retain, how frequently data changes, etc.

SQL update working not insert

Ok I am going to do my best describing this. I have a SP which passes in XML and updates and inserts another table. This was working yesterday. All I changed today was loading the temp table with a OPENXML vs xml.nodes. I even changed it back and I am still getting this interesting issue. I have an update and insert in the same transaction. The update works and then the Insert hangs, no error no nothing... going on 9 minutes. Normally takes 10 seconds. No Blocking processes according to master.sys.sysprocesses. The funny thing is the Select of the Insert returns no rows as they are already in the database. The update updates 72438 in
SQL Server Execution Times:
CPU time = 1359 ms, elapsed time = 7955 ms.
ROWS AFFECTED(72438)
I am out of ideas as to what could be causing my issue? Permissions I don't think so? Space I don't think so because a Error would be returned?
queries:
UPDATE [Sales].[dbo].[WeeklySummary]
SET [CountryId] = I.CountryId
,[CurrencyId] = I.CurrencyId
,[WeeklySummaryType] = #WeeklySummaryTypeId
,[WeeklyBalanceAmt] = M.WeeklyBalanceAmt + I.WeeklyBalanceAmt
,[CurrencyFactor] = I.CurrencyFactor
,[Comment] = I.Comment
,[UserStamp] = I.UserStamp
,[DateTimeStamp] = I.DateTimeStamp
FROM
[Sales].[dbo].[WeeklySummary] M
INNER JOIN
#WeeklySummaryInserts I
ON M.EntityId = I.EntityId
AND M.EntityType = I.EntityType
AND M.WeekEndingDate = I.WeekEndingDate
AND M.BalanceId = I.BalanceId
AND M.ItemType = I.ItemType
AND M.AccountType = I.AccountType
and
INSERT INTO [Sales].[dbo].[WeeklySummary]
([EntityId]
,[EntityType]
,[WeekEndingDate]
,[BalanceId]
,[CountryId]
,[CurrencyId]
,[WeeklySummaryType]
,[ItemType]
,[AccountType]
,[WeeklyBalanceAmt]
,[CurrencyFactor]
,[Comment]
,[UserStamp]
,[DateTimeStamp])
SELECT
I.[EntityId]
, I.[EntityType]
, I.[WeekEndingDate]
, I.[BalanceId]
, I.[CountryId]
, I.[CurrencyId]
, #WeeklySummaryTypeId
, I.[ItemType]
, I.[AccountType]
, I.[WeeklyBalanceAmt]
, I.[CurrencyFactor]
, I.[Comment]
, I.[UserStamp]
, I.[DateTimeStamp]
FROM
#WeeklySummaryInserts I
LEFT OUTER JOIN
[Sales].[dbo].[WeeklySummary] M
ON I.EntityId = M.EntityId
AND I.EntityType = M.EntityType
AND I.WeekEndingDate = M.WeekEndingDate
AND I.BalanceId = M.BalanceId
AND I.ItemType = M.ItemType
AND I.AccountType = M.AccountType
WHERE M.WeeklySummaryId IS NULL
UPDATE
Trying the advice here worked for a while I run the following before my stored procedure call
UPDATE STATISTICS Sales.dbo.WeeklySummary;
UPDATE STATISTICS Sales.dbo.ARSubLedger;
UPDATE STATISTICS dbo.AccountBalance;
UPDATE STATISTICS dbo.InvoiceUnposted
UPDATE STATISTICS dbo.InvoiceItemUnposted;
UPDATE STATISTICS dbo.InvoiceItemUnpostedHistory;
UPDATE STATISTICS dbo.InvoiceUnpostedHistory;
EXEC sp_recompile N'dbo.proc_ChargeRegister'
Still stalling at the Insert Statement, which again inserts 0 rows.

There are really only a few things that can be going on, and the trick here is to eliminate them in order, from simplest to most complex.
STEP 1: Hand craft a set of XML to run that will produce exactly one insert and no updates, so you can go "back to basics" as it were and establish that the code is still doing what you expect, and the result is exactly what you expect. This may seem silly or unnecessary but you really need this reality check to start.
STEP 2: Hand craft a set of XML that will produce a medium-size set of inserts, still with no updates. Based on your experience with the routine, try to find something that will run in a 3-4 seconds. Perhaps 5000 rows. Does it continue to behave as expected?
STEP 3: Assuming steps 1 and 2 pass easily, the next most likely problem is TRANSACTION SIZE. If your update hits 74,000 rows in a single statement, then SQL Server must allocate resources to be able to roll back all 74,000 rows in the case of an abort. Generally you should assume the resources (and time) required to maintain a transaction explode exponentially as the row count goes up. So hand-craft one more set of inserts that contains 50,000 rows. You should find it takes dramatically more time. Let it finish. Is it 10 minutes, an hour? If it takes a long time but finishes, you have an issue with TRANSACTION SIZE, the server is choking trying to keep track of everything required to roll back the insert in the event of failure.
STEP 4: Determine if your entire stored procedure is operating within a single implied transaction. If it is, the matter is entirely worse, because SQL Server is tracking together everything required to roll back both the 74,000 updates and the ??? inserts in a single transaction. See this page:
http://msdn.microsoft.com/en-us/library/ms687099(v=vs.85).aspx
STEP 5: If you've got a single implicit transaction, you can either. A) Turn that off, which may help some but will not entirely fix the problem, or B) break the sproc into two separate calls, one for updates, one for inserts, so that at least the two are in separate transactions.
STEP 6: Consider "chunking". This is a technique for avoiding exploding transaction costs. Considering just the INSERT to get us started, you wrap the insert into a loop that begins and commits a transaction inside each iteration, and exits when affected rows is zero. The INSERT is modified so that you pull only the first 1000 rows from the source and insert them (that 1000 number is kind of arbitrary, you may find 5000 produces better performance, you have to experiment a bit). Once the INSERT affects zero rows, there are no more rows to handle and the loop exits.
QUICK EDIT: The "chunking" system works because the complete throughput for a large set of rows looks something like a quadratic. If you execute an INSERT that affects a huge number of rows, the total time for all rows to be handled explodes. If on the other hand you break it up and go row-by-row, the overhead of opening and committing each statement causes the total time for all rows to explode. Somewhere in the middle, when you've "chunked" out 1k rows per statement, the transaction requirements are at their minimum and the overhead of opening and committing the statement is negligible, and the total time for all rows to be handled is a minimum.

I had a problem where the stored proc was actually getting recompiled in the middle of running because it was deleting rows from a temp table. My situation doesn't look like yours, but mine was so odd that reading about it might give you some ideas.
Unexplained SQL Server Timeouts and Intermittent Blocking
I think you should post the full stored proc because the problem doesn't look to be where you think it is.

mysterious oracle query

if a query in oracle takes the first time it is executed 11 minutes, and the next time, the same query 25 seconds, with the buffer being flushed, what is the possible cause? could it be that the query is written in a bad way?
set timing on;
set echo on
set lines 999;
insert into elegrouptmp select idcll,idgrpl,0 from elegroup where idgrpl = 109999990;
insert into SLIMONTMP (idpartes, indi, grecptseqs, devs, idcll, idclrelpayl)
select rel.idpartes, rel.indi, rel.idgres,rel.iddevs,vpers.idcll,nvl(cdsptc.idcll,vpers.idcll)
from
relbqe rel,
elegrouptmp ele,
vrdlpers vpers
left join cdsptc cdsptc on
(cdsptc.idclptcl = vpers.idcll and
cdsptc.cdptcs = 'NOS')
where
rel.idtits = '10BCPGE ' and
vpers.idbqes = rel.idpartes and
vpers.cdqltptfc = 'N' and
vpers.idcll = ele.idelegrpl and
ele.idgrpl = 109999990;
alter system flush shared_pool;
alter system flush buffer_cache;
alter system flush global context;
select /* original */ mvtcta_part_SLIMONtmp.idpartes,mvtcta_part_SLIMONtmp.indi,mvtcta_part_SLIMONtmp.grecptseqs,mvtcta_part_SLIMONtmp.devs,
mvtcta_part_SLIMONtmp.idcll,mvtcta_part_SLIMONtmp.idclrelpayl,mvtcta_part_vrdlpers1.idcll,mvtcta_part_vrdlpers1.shnas,mvtcta_part_vrdlpers1.cdqltptfc,
mvtcta_part_vrdlpers1.idbqes,mvtcta_part_compte1.idcll,mvtcta_part_compte1.grecpts,mvtcta_part_compte1.seqc,mvtcta_part_compte1.devs,mvtcta_part_compte1.sldminud,
mvtcta.idcll,mvtcta.grecptseqs,mvtcta.devs,mvtcta.termel,mvtcta.dtcptl,mvtcta.nusesi,mvtcta.fiches,mvtcta.indl,mvtcta.nuecrs,mvtcta.dtexel,mvtcta.dtvall,
mvtcta.dtpayl,mvtcta.ioi,mvtcta.mtd,mvtcta.cdlibs,mvtcta.libcps,mvtcta.sldinitd,mvtcta.flagtypei,mvtcta.flagetati,mvtcta.flagwarnl,mvtcta.flagdonei,mvtcta.oriindl,
mvtcta.idportfl,mvtcta.extnuecrs
from SLIMONtmp mvtcta_part_SLIMONtmp
left join vrdlpers mvtcta_part_vrdlpers1 on
(
mvtcta_part_vrdlpers1.idbqes = mvtcta_part_SLIMONtmp.idpartes
and mvtcta_part_vrdlpers1.cdqltptfc = 'N'
and mvtcta_part_vrdlpers1.idcll = mvtcta_part_SLIMONtmp.idcll
)
left join compte mvtcta_part_compte1 on
(
mvtcta_part_compte1.idcll = mvtcta_part_vrdlpers1.idcll
and mvtcta_part_compte1.grecpts = substr (mvtcta_part_SLIMONtmp.grecptseqs, 1, 2 )
and mvtcta_part_compte1.seqc = substr (mvtcta_part_SLIMONtmp.grecptseqs, -1 )
and mvtcta_part_compte1.devs = mvtcta_part_SLIMONtmp.devs
and (mvtcta_part_compte1.devs = ' ' or ' ' = ' ')
and mvtcta_part_compte1.cdpartc not in ( 'L' , 'R' )
)
left join mvtcta mvtcta on
(
mvtcta.idcll = mvtcta_part_SLIMONtmp.idclrelpayl
and mvtcta.devs = mvtcta_part_SLIMONtmp.devs
and mvtcta.grecptseqs = mvtcta_part_SLIMONtmp.grecptseqs
and mvtcta.flagdonei <> 0
and mvtcta.devs = mvtcta_part_compte1.devs
and mvtcta.dtvall > 20101206
)
where 1=1
order by mvtcta_part_compte1.devs,
mvtcta_part_SLIMONtmp.idpartes,
mvtcta_part_SLIMONtmp.idclrelpayl,
mvtcta_part_SLIMONtmp.grecptseqs,
mvtcta.dtvall;

"if a query in oracle takes the first
time it is executed 11 minutes, and
the next time, the same query 25
seconds, with the buffer being
flushed, what is the possible cause?"
The thing is, flushing the DB Buffers, like this ...
alter system flush shared_pool
/
... wipes the Oracle data store but there are other places where data gets cached. For instance the chances are your OS caches its file reads.
EXPLAIN PLAN is good as a general guide to how the database thinks it will execute a query, but it is only a prediction. It can be thrown out by poor statistics or ambient conditions. It is not good at explaining why a specific instance of a query took as much time as it did.
So, if you really want to understand what occurs when the database executes a specific query you need to get down and dirty, and learn how to use the Wait Interface. This is a very powerful tracing mechanism, which allows us to see the individual events that happen over the course of a single query execution. Each version of Oracle has extended the utility and richness of the Wait Interface, but it has been essential to proper tuning since Oracle 9i (if not earlier).
Find out more by reading Roger Schrag's very good overview .
In your case you'll want to run the trace multiple times. In order to make it easier to compare results you should use a separate session for each execution, setting the 10046 event each time.

What else is happening on the box when you ran these? You can get different timings based on other processes chewing resources. Also, with a lot of joins, performance will depend on memory usage (hash_area_size, sort_area_size, etc) and availability, so perhaps you are paging (check temp space size/usage also). In short, try sql_trace and tkprof to analyze deeper

Sometimes a block is written to the file system before it is committed (a dirty block). When that block is read later, Oracle sees that it was uncommitted. It checks the open transaction and, if the transaction isn't still there, it knows the change was committed. Therefore it writes the block back as a clean block. It is called delayed block cleanout.
That is one possible reason why reading blocks for the first time can be slower than a subsequent re-read.

Could be the second time the execution plan is known. Maybe the optimizer has a very hard time finding a execution plan for some reason.
Try setting
alter session set optimizer_max_permutations=100;
and rerun the query. See if that makes any difference.

could it be that the query is written in a bad way?
"bad" is a rather emotional expression - but broadly speaking, yes, if a query performs significantly faster the second time it's run, it usually means there are ways to optimize the query.
Actually optimizing the query is - as APC says - rather a question of "down and dirty". Obvious candidate in your examply might be the substring - if the table is huge, and the substring misses the index, I'd imagine that would take a bit of time, and I'd imagine the result of all those substrin operations are cached somewhere.

Here's Tom Kyte's take on flushing Oracle buffers as a testing practice. Suffice it to say he's not a fan. He favors the approach of attempting to emulate your production load with your test data ("real life"), and tossing out the first and last runs. #APC's point about OS caching is Tom's point - to get rid of that (non-trivial!) effect you'd need to bounce the server, not just the database.

How to update a column value incrementally

I'm experimenting with a personal finance application, and I'm thinking about what approach to take to update running balances when entering a transaction in an account.
Currently the way I'm using involves retrieving all records more recent than the inserted/modified one, and go one by one incrementing their running balance.
For example, given the following transactions:
t1 date = 2008-10-21, amount = 500, running balance = 1000
t2 date = 2008-10-22, amount = 300, running balance = 1300
t3 date = 2008-10-23, amount = 100, running balance = 1400
...
Now suppose I insert a transaction between t1 and t2, then t2 and all subsequent transactions would need their running balances adjusted.
Hehe, now that I wrote this question, I think I know the answer... so I'll leave it here in case it helps someone else (or maybe there's even a better approach?)
First, I get the running balance from the previous transaction, in this case, t1. Then I update all following transactions (which would include the new one):
UPDATE transactions
SET running_balance = running_balance + <AMOUNT>
WHERE date > <t1.date>
The only issue I see is that now instead of storing only a date, I'll have to store a time too. Although, what would happen if two transactions had the exact same date/time?
PS: I'd prefer solutions not involving propietary features, as I'm using both PostgreSQL and SQLite... Although a Postgre-only solution would be helpful too.

Some sort of Identity / Auto-increment columnn in there would be wise as well, purely for the transaction order if anything.
Also in addition to just the date of the transaction, a date that the transaction is inserted into the database (not always the same) would be wise / helpful as well.
These sort of things simply help you arrange things in the system and make it easier to change things i.e. for transactions, at a later time.

I think this might work:
I was using both the date and the id to order the transactions, but now I'm going to store both the date and the id on one column, and use that for ordering. So, using comparisons (like >) should always work as expected, right? (as opposed to the situation I describe earlier where two columns have the exact datetime (however unlikely that'd be).

If you have a large volume of transactions, then you are better off storing the running balance date-wise or even week/month-wise in a separate table.
This was if you are inserting rows for the same date you just need to change the running balance in one row.
The querying and reporting will be more trickier as using this running balance you would need to arrive at balances after each transaction, it would be more like taking the last days running balance and adding or subtracting the transaction value.

How do I output progress messages from a SELECT statement?

I have a SQL script that I want to output progress messages as it runs. Having it output messages between SQL statements is easy, however I have some very long running INSERT INTO SELECTs. Is there a way to have a select statement output messages as it goes, for example after every 1000 rows, or every 5 seconds?
Note: This is for SQL Anywhere, but answers in any SQL dialect will be fine.

There's no way to retrieve the execution status of a single query. None of the mainstream database engines provide this functionality.
Furthermore, a measurable overhead would be generated from any progress implementation were one to exist, so if a query is already taking an uncomfortably long time such that you want to show progress, causing additional slowdown by showing said progress might not be a design goal.
You may find this article on estimating SQL execution progress helpful, though its practical implications are limited.

SQL itself has no provision for this kind of thing. Any way of doing this would involve talking directly to the database engine, and would not be standard across databases.

Really the idea of progress with set based operations (which is what a relational database uses) wouldn't be too helpful, at least not as displayed with a progress bar (percent done vs total). By the time the optimizer figured out what it needed to do and really understood the full cost of the operation, you have already completed a significant portion of the operation. Progress displays are really meant for iterative operations rather than set operations.
That's talking about your general SELECT statement execution. For inserts that are separate statements there are all kinds of ways to do that from the submitter by monitoring the consumption rate of the statements. If they are bulk inserts (select into, insert from, and the like) then you really have the same problem that I described above. Set operations are batched in a way that make a progress bar type of display somewhat meaningless.

I am on the SQL Anywhere engine development team and there is currently no way to do this. I can't promise anything, but we are considering adding this type of functionality to a future release.

There's certainly no SQL-standard solution to this. Sorry to be doom-laden, but I haven't seen anything that can do this in Oracle, SQL Server, Sybase or MySQL, so I wouldn't be too hopeful for SQLAnywhere.

I agree that SQL does not have a way to do this directly. One way might be to only insert the TOP 1000 at a time and then print your status message. Then keep repeating this as needed (in a loop of some kind). The downside is that you would then need a way to keep track of where you are.
I should note that this approach will not be as efficient as just doing one big INSERT

Here's what I would do (Sybase / SQL Server syntax):
DECLARE #total_rows int
SELECT #total_rows = count(*)
FROM Source_Table
WHILE #total_rows > (SELECT count(*) FROM Target_Table)
BEGIN
SET rowcount 1000
print 'inserting 1000 rows'
INSERT Target_Table
SELECT *
FROM Source_Table s
WHERE NOT EXISTS( SELECT 1
FROM Target_Table t
WHERE t.id = s.id )
END
set rowcount 0
print 'done'
Or you could do it based on IDs (assumes Id is a number):
DECLARE #min_id int,
#max_id int,
#start_id int,
#end_id int
SELECT #min_id = min(id) ,
#max_id = max(id)
FROM Source_Table
SELECT #start_id = #min_id ,
#end_id = #min_id + 1000
WHILE #end_id <= #max_id
BEGIN
print 'inserting id range: ' + convert(varchar,#start_id) + ' to ' + convert(varchar,#end_id)
INSERT Target_Table
SELECT *
FROM Source_Table s
WHERE id BETWEEN #start_id AND #end_id
SELECT #start_id = #end_id + 1,
#end_id = #end_id + 1000
END
set rowcount 0
print 'done'

One thought might to have another separate process count the number of rows in the table where the insert is being done to determine what percentage of them are there already. This of course would require that you know the total in the end. This would probably only be okay if this you're not too worried about server load.

On the off chance you're using Toad, you can generate a set of INSERT statements from a table and configure it to commit at a user input frequency. You could modify your scripts a little bit and then see how much of the new data has been commited as you go.

You can simulate the effect for your users by timing several runs, then having a progress bar advance at the average records / second rate.
The only other ways will be
1 - Refer to the API of your database engine to see if it makes any provision for that
or
2 - Break your INSERT into many smaller statements, and report on them as you go. But that will have a significant negative performance impact.

If you need to have it or you die, for insert,update,delete you can use some trigger logic with db variables, and time by time you do sql to retrieve variable data and display some progress to user.
If you wan`t to use it, I can write an example and send it.

Stumbled upon this old thread looking for something else. I disagree with the idea that we don't want progress information just because it's a set operation. Users will often tolerate even a long wait if they know how long it is.
Here's what I suggest:
Each time this runs, log the number of rows inserted and the total time, then add a step at the beginning of that process to query that log and calculate an estimated total time. If you base your estimate on the last runs, you should be able to present an acceptably good guess for the wait time for the thing to finish.

We Keep Coding

sql objective-c vba vb.net react-native apache vue.js tensorflow api pandas

SQL Server Change Data Capture - how to separate log entries by transaction? - sql

Related

Oracle SQL -- Can I timestamp a query?

SQL update working not insert

mysterious oracle query

How to update a column value incrementally

How do I output progress messages from a SELECT statement?

Categories

Resources