Can I select rows based on row version?
I am querying a database table periodically for new rows.
I want to store the last row version and then read all rows from the previously stored row version.
I cannot add anything to the table, the PK is not generated sequentially, and there is no date field.
Is there any other way to get all the rows that are new since the last query?
Currently I am creating a new table that contains the primary keys of all rows that have already been processed, and I will join on that table to get new rows, but I would like to know if there is a better way.
EDIT
This is the table structure:
Everything except product_id and stock_code is a field describing the product.
You can cast the rowversion to a bigint; then, when you read the rows again, you cast the column to bigint and compare against your previously stored value. The problem with this approach is the table scan each time you select based on the cast of the rowversion - this could be slow if your source table is large.
I haven't tried a persisted computed column of this, I'd be interested to know if it works well.
Sample code (Tested in SQL Server 2008R2):
DECLARE @TABLE TABLE
(
Id INT IDENTITY(1,1) NOT NULL PRIMARY KEY,
Data VARCHAR(10) NOT NULL,
LastChanged ROWVERSION NOT NULL
)
INSERT INTO @TABLE(Data)
VALUES('Hello'), ('World')
SELECT
Id,
Data,
LastChanged,
CAST(LastChanged AS BIGINT)
FROM
@TABLE
DECLARE @Latest BIGINT = (SELECT MAX(CAST(LastChanged AS BIGINT)) FROM @TABLE)
SELECT * FROM @TABLE WHERE CAST(LastChanged AS BIGINT) >= @Latest
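Against a real (persistent) table, the periodic poll would then look something like this; dbo.SourceTable and @StoredValue are hypothetical names, and @StoredValue stands for whatever bigint you saved after the previous run:

-- Read only rows changed since the value stored by the last poll
DECLARE @StoredValue BIGINT = 2001; -- loaded from wherever you persist it
SELECT Id, Data
FROM dbo.SourceTable
WHERE CAST(LastChanged AS BIGINT) > @StoredValue;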
EDIT: It seems I've misunderstood, and you don't actually have a ROWVERSION column, you just mentioned row version as a concept. In that case, SQL Server Change Data Capture would be the only thing left I could think of that fits the bill: http://technet.microsoft.com/en-us/library/bb500353(v=sql.105).aspx
Not sure if that fits your needs, as you'd need to be able to store the LSN of "the last time you looked" so you can query the CDC tables properly. It lends itself more to data loads than to typical queries.
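If CDC is an option, a rough sketch of the flow could look like this (dbo.Products and the capture instance name are made up; this assumes the edition supports CDC and SQL Server Agent is running):

-- One-time setup: enable CDC on the database and on the source table
EXEC sys.sp_cdc_enable_db;
EXEC sys.sp_cdc_enable_table
    @source_schema = N'dbo',
    @source_name   = N'Products',
    @role_name     = NULL;

-- Each poll: read everything changed between the LSN stored last time and now
DECLARE @from_lsn BINARY(10) = sys.fn_cdc_get_min_lsn('dbo_Products'); -- replace with your stored LSN
DECLARE @to_lsn   BINARY(10) = sys.fn_cdc_get_max_lsn();

SELECT *
FROM cdc.fn_cdc_get_all_changes_dbo_Products(@from_lsn, @to_lsn, N'all');

-- Persist @to_lsn so the next poll can start from it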
Assuming you can create a temporary table, the EXCEPT command seems to be what you need:
Copy your table into a temporary table.
The next time you look, select everything from your table EXCEPT everything from the temporary table, and extract the keys you need from the result (sketched below).
Make sure your temporary table is up to date again.
Note that your temporary table only needs to contain the keys you need. If this is just one column, you can go for a NOT IN rather than EXCEPT.
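A minimal sketch of that approach, with made-up names (dbo.source_table with a single key column pk; a regular table is used for the snapshot so it survives between polls):

-- Initial snapshot of the keys already seen
SELECT pk
INTO dbo.seen_keys
FROM dbo.source_table;

-- The next time you look: keys in the source that are not in the snapshot
SELECT pk FROM dbo.source_table
EXCEPT
SELECT pk FROM dbo.seen_keys;

-- Bring the snapshot up to date again
INSERT INTO dbo.seen_keys (pk)
SELECT pk FROM dbo.source_table
EXCEPT
SELECT pk FROM dbo.seen_keys;

-- Single-column alternative with NOT IN
SELECT pk FROM dbo.source_table
WHERE pk NOT IN (SELECT pk FROM dbo.seen_keys);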
Related
I have a database table which constantly gets updated. I am looking to query only the changes/additions that have been made on rows with a specific attribute in a column. e.g. get the rows which have been changed/added, the 'description' column of which is "xyz". My end goal is to copy these rows to another table in another database. Is this even possible? The reason for not just querying and overwriting the rows in the other database is to avoid inefficiency.
What I have tried so far?
I am able to run a select query on the table to get the rows, but it gives me all the rows, not the ones that have been changed or recently added. If I add these rows to the table in the other database, the only option I have is to overwrite the rows.
A log table logs the changes in a table, but I can't add filters in SQL that tell me which of these changes are associated with the 'description' column being "xyz".
Write your update statements to make use of OUTPUT to capture the before and after values and log them to a table of your choice.
Here is a really simple update example that uses OUTPUT to store the RowID and the before and after values of the ActivityType column:
DECLARE @MyTableVar table (
SummaryBefore nvarchar(max),
SummaryAfter nvarchar(max),
RowID int
);
update DBA.dbo.dtest set ActivityType = 3
OUTPUT deleted.ActivityType,
inserted.ActivityType,
inserted.RowID
INTO @MyTableVar
select * From @MyTableVar
You can do it two ways
Add new date columns like update_time and/or create_time (they can be defaulted if needed). These fields indicate the status of the record. You save your previous_run_time, your select query then looks for records with update_time/create_time greater than previous_run_time, and you move those records to the new DB (a rough sketch follows after this list).
Turn on CDC (Change Data Capture) for the source table, which is available out of the box in SQL Server, and then move only those records that have been impacted.
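A rough sketch of the first option, assuming columns named create_time/update_time (defaulted to GETDATE()) on a hypothetical dbo.SourceTable, and a previous_run_time value persisted somewhere by the last run:

-- Load the time of the previous run from wherever you persist it
DECLARE @previous_run_time DATETIME = '2019-01-15T22:00:00';

-- Rows created or updated since the last run
SELECT *
FROM dbo.SourceTable
WHERE create_time > @previous_run_time
   OR update_time > @previous_run_time;

-- After moving these rows to the other DB, save GETDATE() as the new previous_run_time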
One of the tables has to be a hierarchy of product reps and their assigned areas. These reps and their areas change every day, and I need to keep track of exactly what that table looks like every day, so I will need to take snapshots of the table daily. I would like to know what I have to do, or how I have to store the data in the table, to be able to know exactly what the data in the table was at a certain point in time. Is this possible? Please keep in mind that the table will not be more than one megabyte and that the table has an incremental load. I do not want to use any tool for it; I want to build the logic in a stored proc only.
You can do one of these:
Create a new table each day, and copy the data of your table in it;
Create one new table with the same structure as your table, plus one additional date column, to store the date of the snapshot taken, then each day copy your table along with the current system date;
Make your existing table a temporal table (as also suggested by sticky bit in the comments). Please note, that you need SQL Server 2016 or newer for this.
My personal preference is the last option, but first two may be easier for you.
For the first 2 options you need to create a SQL Server Agent job to run nightly and take the snapshots. The 3rd option works automatically.
Let's say your table is named MyTable and has a primary key ID int and a field Name varchar(50).
For the first option you need to use dynamic SQL, because each time your new table's name will be different:
declare @sql nvarchar(max) = N'select ID, Name into MyTable_' +
convert(nvarchar(10), getdate(), 112) + N' from MyTable'
exec (@sql)
When executed, this statement will create a new table with the same structure as your existing table, but named with the current date as suffix, e.g. MyTable_20190116, and copy MyTable to it.
For the second option you need to create one table like below, and copy data to it each day using a script like this:
create table MyTableDailySnapshots(
SnapshotDate date not null
, ID int not null
, Name varchar(50)
, constraint PK_MyTableDailySnapshots primary key clustered (SnapshotDate, ID)
)
insert into MyTableDailySnapshots(SnapshotDate, ID, Name)
select GETDATE(), ID, Name
from MyTable
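For the first two options, the nightly SQL Server Agent job mentioned earlier could be set up roughly like this (job, schedule, and database names are made up; shown here for option 2):

EXEC msdb.dbo.sp_add_job @job_name = N'Daily MyTable snapshot';
EXEC msdb.dbo.sp_add_jobstep
    @job_name = N'Daily MyTable snapshot',
    @step_name = N'Copy snapshot',
    @subsystem = N'TSQL',
    @database_name = N'MyDb',
    @command = N'insert into MyTableDailySnapshots(SnapshotDate, ID, Name)
                 select GETDATE(), ID, Name from MyTable';
EXEC msdb.dbo.sp_add_schedule
    @schedule_name = N'Nightly at 01:00',
    @freq_type = 4,             -- daily
    @freq_interval = 1,         -- every day
    @active_start_time = 10000; -- 01:00:00
EXEC msdb.dbo.sp_attach_schedule
    @job_name = N'Daily MyTable snapshot',
    @schedule_name = N'Nightly at 01:00';
EXEC msdb.dbo.sp_add_jobserver @job_name = N'Daily MyTable snapshot';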
If you choose the third option, no actions are needed to maintain the snapshots. Just use query like this, to get the state of the table for a period of time:
select ID, Name from MyTable
for system_time between '2019-01-16 00:00:00.0000000' and '2019-01-16 23:59:59.9999999'
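Making the existing table temporal in the first place (option 3) is a one-time setup, roughly like this; the period column and history table names are just examples, and SQL Server 2016 or newer is required:

ALTER TABLE MyTable ADD
    ValidFrom datetime2 GENERATED ALWAYS AS ROW START NOT NULL
        CONSTRAINT DF_MyTable_ValidFrom DEFAULT SYSUTCDATETIME(),
    ValidTo datetime2 GENERATED ALWAYS AS ROW END NOT NULL
        CONSTRAINT DF_MyTable_ValidTo DEFAULT CONVERT(datetime2, '9999-12-31 23:59:59.9999999'),
    PERIOD FOR SYSTEM_TIME (ValidFrom, ValidTo);

ALTER TABLE MyTable
    SET (SYSTEM_VERSIONING = ON (HISTORY_TABLE = dbo.MyTableHistory));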
The first option is more flexible if your table's schema changes over time, because each day you can create a table with a different schema. Options 2 and 3 have only one table to store the snapshots, so you may need to be creative if your table's schema needs to change. But the disadvantage of the first option is the large number of tables created in your database.
So it is up to you to choose what's the best for your case.
I have a query that moves year-old rows from one table to an identical "archive" table.
Sometimes, invalid dates get entered into a dateprocessed column (used to evaluate whether the row is more than a year old), and the query errors out. I want to essentially "screen" the bad rows -- i.e. rows where isdate(dateprocessed) does not equal 1 -- so that the query does not try to archive them.
I have a few ideas about how to do this, but want to do this in the absolute simplest way possible. If I select the good data into a temp table in my stored procedure, then inner join it with the live table, then run the delete from live output to archive -- will it delete from the underlying live table or the new joined table?
Is there a better way to do this? Thanks for the help. I am a .NET programmer playing DBA, but really want to do this properly.
Here is the query that errors when some of the dateprocessed column values are invalid:
delete from live
output deleted.* into archive
where isdate(dateprocessed) = 1
and cast (dateprocessed as datetime) < dateadd(year, -1, getdate())
and not exists (select * from archive where live.id = archive.id)
The simplest thing to do is (a sketch follows below):
Select the correct records into a temp table
One of the fields you copy into the temp table should be a unique identifier like an "ID" column
Do any additional processing in the temp table
Archive from the temp table to the archive table
Delete from the live table with a join to the temp table using the "ID" column. This will ensure no mistakes are made.
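A minimal sketch of that flow, using the live and archive tables from the question (columns other than id and dateprocessed are omitted; the CASE avoids casting the bad values):

-- 1. Select the correct records into a temp table, including the ID column
select id,
       cast(case when isdate(dateprocessed) = 1 then dateprocessed end as datetime) as processed_date
into #to_archive
from live
where isdate(dateprocessed) = 1;

-- 2. Additional processing: keep only rows more than a year old
delete from #to_archive
where processed_date >= dateadd(year, -1, getdate());

-- 3. Archive from the temp table to the archive table
insert into archive
select l.*
from live l
join #to_archive t on t.id = l.id
where not exists (select * from archive a where a.id = l.id);

-- 4. Delete from the live table with a join on the temp table's ID column
delete l
from live l
join #to_archive t on t.id = l.id;

drop table #to_archive;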
If you are a .NET guy you could bring all the data down and do a DateTime.TryParse. Better yet, just do it once to populate a real DateTime column. The dates that don't parse you could assign a fixed date or null. And there are some date strings that .NET will parse that SQL will not (e.g. "November 2010").
I am developing an application that is required to store previous versions of database table rows to maintain a history of changes. I am recording the history in the same table but need the most current data to be accessible by a unique identifier that doesn't change with new versions. I have a few ideas on how this could be done and was just looking for some ideas on the best way of doing this or whether there is any reason not to use one of my ideas:
Create a new row for each row version, with a field to indicate which row was the current row. The drawback of this is that the new version has a different primary key and any references to the old version will not return the current version.
When data is updated, the old row version is duplicated to a new row, and the new version replaces the old row. The current row can be accessed by the same primary key.
Add a second table with only a primary key, and add a column to the original table that is a foreign key to the new table's primary key. Use the same method as described in option 1 for storing multiple versions, and create a view that finds the current version by using the new table's primary key.
PeopleSoft uses (used?) "effective dated records". It took a little while to get the hang of it, but it served its purpose. The business key is always extended by an EFFDT column (effective date). So if you had a table EMPLOYEE[EMPLOYEE_ID, SALARY] it would become EMPLOYEE[EMPLOYEE_ID, EFFDT, SALARY].
To retrieve the employee's salary:
SELECT e.salary
FROM employee e
WHERE employee_id = :x
AND effdt = (SELECT MAX(effdt)
FROM employee
WHERE employee_id = :x
AND effdt <= SYSDATE)
An interesting application was future-dating records: you could give every employee a 10% increase effective Jan 1 next year, and pre-populate the table a few months beforehand. When SYSDATE crosses Jan 1, the new salary comes into effect. It was also good for running historical reports: instead of using SYSDATE, you plug in a date from the past in order to see the salaries (or exchange rates or whatever) as they would have been reported if run at that time in the past.
In this case, records are never updated or deleted, you just keep adding records with new effective dates. Makes for more verbose queries, but it works and starts becoming (dare I say) normal. There are lots of pages on this, for example: http://peoplesoft.wikidot.com/effective-dates-sequence-status
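For example, the historical-report variant just swaps SYSDATE for a bind variable holding the as-of date (:as_of_date is a hypothetical name):

SELECT e.salary
FROM employee e
WHERE employee_id = :x
AND effdt = (SELECT MAX(effdt)
             FROM employee
             WHERE employee_id = :x
             AND effdt <= :as_of_date)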
#3 is probably best, but if you wanted to keep the data in one table, I suppose you could add a datetime column populated with now() for each new row, and then you could at least order by that date descending and take the top row.
Overall though, handling multiple versions needs more information about what you want to do, functionally as much as programmatically.
Have you considered using AutoAudit?
AutoAudit is a SQL Server (2005, 2008) Code-Gen utility that creates
Audit Trail Triggers with:
Created, CreatedBy, Modified, ModifiedBy, and RowVersion (incrementing INT) columns to table
Insert event logged to Audit table
Updates old and new values logged to Audit table
Deletes log all final values to the Audit table
view to reconstruct deleted rows
UDF to reconstruct Row History
Schema Audit Trigger to track schema changes
Re-code-gens triggers when Alter Table changes the table
For me, history tables are always separate, so I would definitely go with that. Why create some complex versioning scheme where you need to look at the current production record? In reporting, that results in nasty unions that are really unnecessary.
Table has a primary key and who cares what else.
TableHist has these columns: an incrementing int/bigint primary key, a history-written date/time, history written by, a record type (I, U, or D for insert, update, delete), the PK from Table as an FK on TableHist, and then all of the remaining columns from Table with the same names.
If you create this history table structure and populate it via triggers on Table, you will have all versions of every row in the tables you care about and can easily determine the original record, every change, and the deletion records as well. AND if you are reporting, you only need to use your historical tables to get all of the information you'd like.
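A minimal sketch of that structure, assuming a source table dbo.SourceTable with just an ID primary key and a Name column (all names here are placeholders):

create table dbo.SourceTableHist (
    HistId        bigint identity(1,1) not null primary key,
    HistWrittenAt datetime2 not null default sysutcdatetime(),  -- history written date/time
    HistWrittenBy sysname not null default suser_sname(),       -- history written by
    RecordType    char(1) not null,                             -- I, U, or D
    ID            int not null,                                 -- the PK of dbo.SourceTable
    Name          varchar(50) null                              -- ...plus every other column, same names
);
go
create trigger dbo.trg_SourceTable_Hist on dbo.SourceTable
after insert, update, delete
as
begin
    set nocount on;
    -- inserts and the new side of updates
    insert into dbo.SourceTableHist (RecordType, ID, Name)
    select case when exists (select 1 from deleted) then 'U' else 'I' end, i.ID, i.Name
    from inserted i;
    -- deletes (no rows in inserted)
    insert into dbo.SourceTableHist (RecordType, ID, Name)
    select 'D', d.ID, d.Name
    from deleted d
    where not exists (select 1 from inserted);
end
go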
create table table1 (
Id int identity(1,1) primary key,
[Key] varchar(max),
Data varchar(max)
)
go
create view view1 as
with q as (
select [Key], Data, row_number() over (partition by [Key] order by Id desc) as 'r'
from table1
)
select [Key], Data from q where r=1
go
create trigger trigger1 on view1 instead of update, insert as begin
insert into table1 ([Key], Data)
select [Key], Data
from (select distinct [Key], Data from inserted) a
end
go
insert into view1 values
('key1', 'foo')
,('key1', 'bar')
select * from view1
update view1
set Data='updated'
where [Key]='key1'
select * from view1
select * from table1
drop trigger trigger1
drop table table1
drop view view1
Results:
Key Data
key1 foo
Key Data
key1 updated
Id Key Data
1 key1 bar
2 key1 foo
3 key1 updated
I'm not sure if the distinct is needed.
I have a SQL query where I am going to be transferring a fair amount of response data down the wire, but I want to get the total rowcount as quickly as possible to facilitate binding in the UI. Basically I need to get a snapshot of all of the rows that meet a certain criteria, and then be able to page through all of the resulting rows.
Here's what I currently have:
SELECT --primary key column
INTO #tempTable
FROM --some table
--some filter clause
ORDER BY --primary key column
SELECT @@ROWCOUNT
SELECT --the primary key column and some others
FROM #tempTable
JOIN -- some table
DROP TABLE #tempTable
Every once in a while, the query results end up out of order (presumably because I am doing an unordered select from the temp table).
As I see it, I have a couple of options:
Add a second order by clause to the select from the temp table.
Move the order by clause to the second select and let the first select be unordered.
Create the temporary table with a primary key column to force the ordering of the temp table.
What is the best way to do this?
Use number 2. Just because you have a primary key on the table does not mean that the result set from a select statement will be ordered (even if what you see actually is).
There's no need to order the data when putting it in the temp table, so take that one out. You'll get the same @@ROWCOUNT value either way.
So do this:
SELECT --primary key column
INTO #tempTable
FROM --some table
--some filter clause
SELECT @@ROWCOUNT
SELECT --the primary key column and some others
FROM #tempTable
JOIN -- some table
ORDER BY --primary key column
DROP TABLE #tempTable
Move the order by from the first select to the second select.
A database isn't a spreadsheet. You don't put the data into a table in a particular order.
Just make sure you order it properly when you get it back out.
Personally I would select out the data in the order you want to eventually have it. So in your first select, have your order by. That way it can take advantage of any existing indexes and access paths.