Date of inserting a row into a table - SQL

Is it possible to find date+time when a row was inserted into a table in SQL Server 2005?
Does SQL Server log insert commands?

Whenever I create a table, I always include the following two columns:
CreatedBy varchar(255) default system_user,
CreatedAt datetime default getdate()
Although this uses a bit of extra space, I've found that the information proves very, very useful over time.
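A minimal sketch of the pattern (the table itself is hypothetical):
CREATE TABLE dbo.Customer (
    CustomerId int IDENTITY(1,1) PRIMARY KEY,
    Name varchar(100) NOT NULL,
    CreatedBy varchar(255) NOT NULL DEFAULT SYSTEM_USER,  -- login that inserted the row
    CreatedAt datetime NOT NULL DEFAULT GETDATE()         -- date and time of the insert
);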
Your question is about the log. The answer is "yes". However, whether you can get the information depends on your recovery model. With simple recovery, the log records are overwritten as the space is reused for subsequent transactions. With bulk-logged or full recovery, the information is in the log, at least as far back as the last log backup.
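If you do need to dig into the log, a hedged sketch using the undocumented fn_dblog function (unsupported and subject to change; for exploration only):
SELECT [Current LSN], Operation, [Transaction ID], [Begin Time]
FROM fn_dblog(NULL, NULL)   -- NULL, NULL = read the whole active log
WHERE Operation IN ('LOP_BEGIN_XACT', 'LOP_INSERT_ROWS');
-- [Begin Time] is populated on the LOP_BEGIN_XACT row of each transaction,
-- so match inserts to their transaction via [Transaction ID] to get the time.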

You can derive the insert date as long as you are using the CDC-created functions to pull the actual data records.
So, for example, you can pull something like:
DECLARE @from_lsn binary(10), @to_lsn binary(10);
SET @from_lsn = sys.fn_cdc_get_min_lsn('name_of_your_cdc_instance_on_cdc_table');
SET @to_lsn = sys.fn_cdc_get_max_lsn();
SELECT * FROM
cdc.fn_cdc_get_net_changes_name_of_your_cdc_instance_on_cdc_table
(
    @from_lsn,
    @to_lsn,
    N'all'
);
You can use the built-in CDC function sys.fn_cdc_map_lsn_to_time to convert a log sequence number (LSN) to a datetime. For example:
SELECT sys.fn_cdc_map_lsn_to_time(__$start_lsn), * FROM
cdc.fn_cdc_get_net_changes_name_of_your_cdc_instance_on_cdc_table
(
    @from_lsn,
    @to_lsn,
    N'all'
);
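All of this assumes CDC is already enabled for the database and the table. A minimal sketch of enabling it (schema, table, and option values are illustrative; the capture instance name created here, e.g. dbo_YourTable, is what replaces 'name_of_your_cdc_instance_on_cdc_table' above):
EXEC sys.sp_cdc_enable_db;  -- enable CDC at the database level

EXEC sys.sp_cdc_enable_table
    @source_schema        = N'dbo',
    @source_name          = N'YourTable',   -- hypothetical source table
    @role_name            = NULL,           -- no gating role
    @supports_net_changes = 1;              -- required for fn_cdc_get_net_changes_*
Note that @supports_net_changes = 1 requires the table to have a primary key (or a unique index passed via @index_name).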

You can have an InsertDate column with a default of getdate() on your table; that would be the easiest approach.
On SQL Server 2008 you can use CDC (Change Data Capture) to track changed data on your table:
Change data capture records insert, update, and delete activity that is applied to a SQL Server table. This makes the details of the changes available in an easily consumed relational format. Column information and the metadata that is required to apply the changes to a target environment is captured for the modified rows and stored in change tables that mirror the column structure of the tracked source tables. Table-valued functions are provided to allow systematic access to the change data by consumers.

Related

Missing data while loading data through ETL DataStage

I am trying to load data from a DB2 table into Netezza through ETL DataStage. This is a delta load against a timestamp column.
So source SQL is like
select * from db2_table where timestamp_column > '2017-02-10 08:24:00';
After loading the data into the Netezza table, I ran the query below and got the following result.
select max(timestamp_column) from netezza_table;
returns '2017-02-10 11:17:56'
That looks good to me.
But I have noticed that we have a record in the DB2 table whose timestamp_column is '2017-02-10 11:17:54', yet that record is missing from the destination Netezza table.
This is not a regular occurrence, but whenever the issue has happened, the missing record's timestamp_column value has been within 1 or 2 seconds of the maximum.
My question is: if the max(timestamp_column) value in Netezza is '2017-02-10 11:17:56', then the ETL job should have fetched the '2017-02-10 11:17:54' record.
How is it possible to miss this record?
A way to solve your problem could be a row change timestamp.
This timestamp is generated by DB2 automatically at insert or update time, which makes it a perfect way to determine deltas.
Add an additional column to your source table like this:
rct timestamp not null generated always for each row on update as row change timestamp
To avoid conflicts caused by the DDL change, you can also define this column as "hidden". This means it can be selected explicitly but is not returned when running a
SELECT * FROM tab
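A hedged sketch against the table from the question (the column name rct is illustrative; IMPLICITLY HIDDEN is what keeps it out of SELECT *):
ALTER TABLE db2_table
    ADD COLUMN rct TIMESTAMP NOT NULL
        GENERATED ALWAYS FOR EACH ROW ON UPDATE AS ROW CHANGE TIMESTAMP
        IMPLICITLY HIDDEN;
-- the delta load then names the generated column explicitly:
SELECT rct, t.* FROM db2_table t WHERE rct > '2017-02-10 08:24:00';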
It's entirely possible that the transaction that inserted the '2017-02-10 11:17:54' record committed only after the ETL job had already read past that point. The default isolation level in the DB2 database (I'm assuming DB2 for LUW) is CS, which only locks the current row when processing a cursor; other transactions are free to modify rows that have already been read.
You can try raising the ETL job's isolation level to RR to ensure the result set does not change until you're done reading it, but keep in mind that this will affect the concurrency of updates on the DB2 side.
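If the ETL tool lets you edit the source SQL directly, a minimal sketch of requesting RR for just that statement (DB2 for LUW supports a per-statement isolation clause):
select * from db2_table
where timestamp_column > '2017-02-10 08:24:00'
with rr;  -- repeatable read for this statement only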

How to track changes for certain database tables?

I have a program that takes a user and updates information about him/her in five tables. The process is fairly sophisticated, as it takes many steps (pages) to complete. I have logs and sysout/syserr statements that help me find SQL queries in the IDE console, but it doesn't show all of them. I've already spent many days trying to catch the other missing queries by debugging, but no luck so far. The reason I am doing this is that I want to automate user information updates so I don't have to go through every page entering user details manually.
I wonder if there is some technique that will show me the database table changes, as I already know the table names; by changes I mean whether it was an update or insert statement and what exactly changed (column name and value inserted/updated). Any advice is greatly appreciated. I have IBM RAD and a DB2 database. Thanks.
In DB2 you can track basic auditing information.
DB2 can track what data was modified, who modified the data, and the SQL operation that modified the data.
To track when data was modified, define your table as a system-period temporal table. The row-begin and row-end columns in the associated history table contain information about when data modifications occurred.
To track who and what SQL modified the data, you can use non-deterministic generated expression columns. These columns can contain values that are helpful for auditing purposes, such as the value of the CURRENT SQLID special register at the time that the data was modified. Possible values for non-deterministic generated expression columns are defined in the syntax for the CREATE TABLE and ALTER TABLE statements.
For example
CREATE TABLE TempTable
   (balance INT,
    userId VARCHAR(128) GENERATED ALWAYS AS ( SESSION_USER ),
    opCode CHAR(1) GENERATED ALWAYS AS ( DATA CHANGE OPERATION )
    ... PERIOD SYSTEM_TIME (SYS_START, SYS_END));
The userId column stores who modified the data. This column is defined as a non-deterministic generated expression column that contains the value of SESSION_USER special register.
The opCode column stores the SQL operation that modified the data. This column is defined as a non-deterministic generated expression column and stores a value that indicates the type of SQL operation.
Suppose that you then use the following statements to create a history table for TempTable and to associate that history table with TempTable:
CREATE TABLE TempTable_HISTORY (balance INT, userId VARCHAR(128), opCode CHAR(1) ... );
ALTER TABLE TempTable ADD VERSIONING
USE HISTORY TABLE TempTable_HISTORY ON DELETE ADD EXTRA ROW;
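Once versioning is enabled, a hedged sketch of reading the audit trail back through a period specification (the timestamps are illustrative):
-- returns every version of each row that was current at some point in the period,
-- including historical versions from TempTable_HISTORY, along with who (userId)
-- and which SQL operation (opCode) produced it
SELECT balance, userId, opCode, SYS_START, SYS_END
FROM TempTable
    FOR SYSTEM_TIME BETWEEN '2017-01-01-00.00.00' AND '2017-12-31-00.00.00';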
Capturing SQL statements for a limited number of tables and a limited time - as far as I understand your problem - can be done with the DB2 audit facility.
create audit policy tabsql categories execute status both error type normal
audit table <tabname> using policy tabsql
You have to have SECADM rights in the database, and the second command will start the audit process. You can stop it with
audit table <tabname> remove policy
Check out the
db2audit
command to configure paths and extract the data from the audit file into a delimited file, which can then be loaded back into the database.
The necessary tables can be created with the provided sqllib/misc/db2audit.ddl script. You will need to query the EXECUTE table for your SQL details.
Please note that auditing can capture huge amounts of data, so make sure to switch it off again after you have captured the necessary information.

SQL: Get real time data from different database

I want to insert data from database1 into database2 whenever something changes. Like a trigger, but the thing is I am not allowed to create a trigger in database1.
I basically need to insert certain data from a table in one database into a table in another database as changes happen.
Any suggestions would be great!
Thanx
You can create a program to do that; there are many other ways, but I think this is the most straightforward and easy one.
For example, if you work with SQL Server, you can create C# code that connects to the two databases, checks for data in the first one, then sends it to the second.
For example:
you can create a trigger on the tables you want to watch in the first DB, then create a web service that checks whether the trigger has fired; it will get the data from the first database and send it to the second. This is easier to enhance later, and you can change the code to do whatever you want without making any further changes on the database side
MS SQL Server Replication
I think the most suitable option for you is Transactional Replication.
Transactional replication is asynchronous but close to real time
What about the MERGE statement?
A MERGE will commit changes from the source table into the target table when something changes.
As you cannot use a trigger, this cannot be a real-time process, but with a process scheduler like SQL Agent you can run it every 10 seconds or so, depending on your table size.
https://msdn.microsoft.com/en-us/library/bb510625.aspx
You can handle a source that comes from multiple tables by using a CTE, like this:
With TableSources As
(
    Select id, Column1, Column2 From table1
    UNION
    Select id, Column1, Column2 From table2
)
MERGE INTO TargetTable
USING TableSources ON TargetTable.id = TableSources.id
WHEN NOT MATCHED BY Target THEN
    Insert (id, Column1, Column2) Values (TableSources.id, TableSources.Column1, TableSources.Column2)
WHEN NOT MATCHED BY Source THEN
    Delete
WHEN MATCHED THEN
    Update Set Column1 = TableSources.Column1, Column2 = TableSources.Column2
;

Retrieving the last inserted rows

I have a table which contains GUID and Name columns, and I need to retrieve the last inserted rows so I can load them into table2.
But how would I find out the latest data in Table1? I seem to be lost on this; I have read similar posts posing the same question, but the answers don't seem to work for my situation.
I am using SQL Server 2008 and I upload my data using SSIS.
1 - One way to do this is with triggers. Check out my blog entry that shows how to copy data from one table to another on an insert.
Triggers to replicate data = http://craftydba.com/?p=1995
However, like most things in life, there is overhead with triggers. If you are bulk loading a ton of data via SSIS, this can add up.
2 - Another way to do this is to add a modify date to your first table and modify your SSIS package.
ALTER TABLE [MyTable1]
ADD [ModifyDate] [datetime] NULL DEFAULT GETDATE();
Next, change your SSIS package. In the control flow, add an Execute SQL Task that inserts data from [MyTable1] into [MyTable2] using T-SQL.
INSERT INTO [MyTable2]
SELECT * FROM [MyTable1]
WHERE [ModifyDate] >= 'Start Date/Time Of Package';
Execute SQL Task =
http://technet.microsoft.com/en-us/library/ms141003.aspx
This will be quicker than a data flow or an Execute OLE DB Command, since you are working with the data on the server.

Concurrency handling of SQL transaction

Suppose I am about to start a project using ASP.NET and SQL Server 2005. I have to design the concurrency requirements for this application. I am planning to add a TimeStamp column to each table. While updating a table, I will check that the TimeStamp column is the same as it was when the row was selected.
Will this approach suffice? Or are there any shortcomings to this approach under any circumstances?
Please advise.
Thanks
Lijo
First of all, the approach you describe in your question is in my opinion the best way for an ASP.NET application with MS SQL as the database. There is no locking in the database, which is perfect for permanently disconnected clients like web clients.
As one can read in some of the answers, there is a misunderstanding in the terminology. We all mean using Microsoft SQL Server 2008 or higher to hold the database. If you open the topic "rowversion (Transact-SQL)" in the MS SQL Server 2008 documentation, you will find the following:
"timestamp is the synonym for the
rowversion data type and is subject to
the behavior of data type synonym." …
"The timestamp syntax is deprecated.
This feature will be removed in a
future version of Microsoft SQL
Server. Avoid using this feature in
new development work, and plan to
modify applications that currently use
this feature."
So the timestamp data type is the synonym for the rowversion data type in MS SQL. It holds a 64-bit counter that exists internally in every database and can be seen as @@DBTS. After a modification of any row in any table of the database, the counter is incremented.
As I read your question, I take "TimeStamp" to be a column name of the rowversion data type. I personally prefer the name RowUpdateTimeStamp. In AzManDB (see Microsoft Authorization Manager with the store as DB) I could see such a name. Sometimes ChildUpdateTimeStamp was also used, to trace hierarchical RowUpdateTimeStamp structures (with respect to triggers).
I implemented this approach in my last project and am very happy with it. Generally you do the following:
Add a RowUpdateTimeStamp column of type rowversion to every table of your database (it will be shown in Microsoft SQL Management Studio as timestamp, which is the same).
You should construct all your SQL SELECT queries that send results to the client so that they send an additional RowVersion value together with the main data. If you have a SELECT with JOINs, you should send the maximum RowUpdateTimeStamp value from the joined tables as the RowVersion, like
SELECT s.Id AS Id
,s.Name AS SoftwareName
,m.Name AS ManufacturerName
,CASE WHEN s.RowUpdateTimeStamp > m.RowUpdateTimeStamp
THEN s.RowUpdateTimeStamp
ELSE m.RowUpdateTimeStamp
END AS RowUpdateTimeStamp
FROM dbo.Software AS s
INNER JOIN dbo.Manufacturer AS m ON s.Manufacturer_Id=m.Id
Or cast the data like the following
SELECT s.Id AS Id
,s.Name AS SoftwareName
,m.Name AS ManufacturerName
,CASE WHEN s.RowUpdateTimeStamp > m.RowUpdateTimeStamp
THEN CAST(s.RowUpdateTimeStamp AS bigint)
ELSE CAST(m.RowUpdateTimeStamp AS bigint)
END AS RowUpdateTimeStamp
FROM dbo.Software AS s
INNER JOIN dbo.Manufacturer AS m ON s.Manufacturer_Id=m.Id
to hold RowUpdateTimeStamp as bigint, which corresponds to the ulong data type of C#. If you use OUTER JOINs or JOINs over many tables, the MAX(RowUpdateTimeStamp) construct over all the tables becomes a little more complex. Because MS SQL doesn't support a function like MAX(a,b,c,d,e), the corresponding construct could look like the following:
(SELECT MAX(rv)
FROM (SELECT table1.RowUpdateTimeStamp AS rv
UNION ALL SELECT table2.RowUpdateTimeStamp
UNION ALL SELECT table3.RowUpdateTimeStamp
UNION ALL SELECT table4.RowUpdateTimeStamp
UNION ALL SELECT table5.RowUpdateTimeStamp) AS maxrv) AS RowUpdateTimeStamp
All disconnected clients (web clients) receive and hold not only the rows of data, but also the RowVersion (type ulong) of each data row.
When a disconnected client tries to modify data, it should send the RowVersion corresponding to the original data back to the server. The spSoftwareUpdate stored procedure could look like
CREATE PROCEDURE dbo.spSoftwareUpdate
    @Id int,
    @SoftwareName varchar(100),
    @originalRowUpdateTimeStamp bigint, -- used for optimistic concurrency mechanism
    @NewRowUpdateTimeStamp bigint OUTPUT
AS
BEGIN
    -- SET NOCOUNT ON added to prevent extra result sets from
    -- interfering with SELECT statements.
    -- ExecuteNonQuery() returns -1, but it is not an error;
    -- one should test @NewRowUpdateTimeStamp for DBNull
    SET NOCOUNT ON;

    UPDATE dbo.Software
    SET Name = @SoftwareName
    WHERE Id = @Id AND RowUpdateTimeStamp <= @originalRowUpdateTimeStamp;

    SET @NewRowUpdateTimeStamp = (SELECT RowUpdateTimeStamp
                                  FROM dbo.Software
                                  WHERE (@@ROWCOUNT > 0) AND (Id = @Id));
END
The code of the dbo.spSoftwareDelete stored procedure looks much the same. If you don't switch on NOCOUNT, you can get a DBConcurrencyException automatically generated in a lot of scenarios. Visual Studio gives you the possibility to use optimistic concurrency via the "Use optimistic concurrency" checkbox in the Advanced Options of the TableAdapter or DataAdapter.
If you look at the dbo.spSoftwareUpdate stored procedure carefully, you will find that I use RowUpdateTimeStamp <= @originalRowUpdateTimeStamp in the WHERE clause instead of RowUpdateTimeStamp = @originalRowUpdateTimeStamp. I do so because the value of @originalRowUpdateTimeStamp which the client holds is typically constructed as MAX(RowUpdateTimeStamp) over more than one table. So it can be that RowUpdateTimeStamp < @originalRowUpdateTimeStamp. Either you use strict equality = and reproduce here the same complex JOIN statement as you used in the SELECT statement, or you use the <= construct like me and stay exactly as safe as before.
By the way, one can construct a very good value for an ETag based on RowUpdateTimeStamp, which can be sent in an HTTP header to the client together with the data. With the ETag you can implement intelligent data caching on the client side.
I can't write the whole code here, but you can find a lot of examples on the Internet. I only want to repeat one more time that, in my opinion, optimistic concurrency based on rowversion is the best way for most ASP.NET scenarios.
In SQL Server, a recommended approach for this type of situation is to create a column of type rowversion and use it to check whether any of the fields in that row have changed.
SQL Server guarantees that if any value in the row changes (or a new row is inserted), its rowversion column will automatically be updated to a different value. Letting the database handle this for you is much more reliable than trying to do it yourself.
In your update statements, you simply need to add a WHERE clause to check that the rowversion value is the same as it was when you first retrieved the row. If not, someone else has changed the row (i.e., it's dirty).
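A minimal sketch of that check (table and parameter names are illustrative):
UPDATE dbo.Account
SET Balance = @NewBalance
WHERE Id = @Id
  AND RowVer = @OriginalRowVer;  -- rowversion value captured when the row was first read

IF @@ROWCOUNT = 0
    RAISERROR('Row was changed by another user.', 16, 1);  -- concurrency conflict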
Also, from that page:
The timestamp syntax is deprecated. This feature will be removed in a future version of Microsoft SQL Server. Avoid using this feature in new development work, and plan to modify applications that currently use this feature.
I'm not sure that concurrency should be handled in the database like this. The database itself should be able to manage isolation and transactional behavior, but the threading behavior ought to be handled in code.
The rowversion suggestion is correct, I would say, but it's disappointing to see that timestamp will be deprecated soon. Some of my old applications use it for reasons other than checking concurrency.