How to take a daily snapshot of a table in SQL Server?

One of the tables has to be a hierarchy of product reps and their assigned areas. These reps and their areas change every day, and I need to keep track of exactly what that table looks like every day, so I will need to take snapshots of the table daily. I would like to know what I have to do, or how I have to store the data, to be able to know exactly what the data in the table was at a certain point in time. Is this possible? Please keep in mind that the table will not be more than one megabyte and has an incremental load. I do not want to use any tool for it; I want to build the logic in a stored procedure only.

You can do one of these:
Create a new table each day and copy your table's data into it;
Create one new table with the same structure as your table, plus an additional date column to store the date of the snapshot; then each day copy your table into it along with the current system date;
Make your existing table a temporal table (as also suggested by sticky bit in the comments). Please note, that you need SQL Server 2016 or newer for this.
My personal preference is the last option, but first two may be easier for you.
For the first 2 options you need to create a SQL Server Agent job to run nightly and take the snapshots. The 3rd option works automatically.
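If you go with option 1 or 2, the Agent job can be created through the SSMS UI or with the msdb procedures. A rough sketch follows; the job, schedule, database and procedure names are just examples, and dbo.usp_TakeMyTableSnapshot stands in for whatever stored procedure you build for the copy:
USE msdb;
-- create the job and a T-SQL step that runs your snapshot procedure
EXEC dbo.sp_add_job @job_name = N'Daily MyTable snapshot';
EXEC dbo.sp_add_jobstep @job_name = N'Daily MyTable snapshot',
    @step_name = N'Take snapshot',
    @subsystem = N'TSQL',
    @database_name = N'YourDatabase',
    @command = N'EXEC dbo.usp_TakeMyTableSnapshot;';
-- schedule it nightly at 01:00 and bind the job to the local server
EXEC dbo.sp_add_schedule @schedule_name = N'Nightly 01:00',
    @freq_type = 4,        -- daily
    @freq_interval = 1,
    @active_start_time = 010000;
EXEC dbo.sp_attach_schedule @job_name = N'Daily MyTable snapshot',
    @schedule_name = N'Nightly 01:00';
EXEC dbo.sp_add_jobserver @job_name = N'Daily MyTable snapshot';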
Let's say your table is named MyTable and has a primary key ID int and a field Name varchar(50).
For the first option you need to use dynamic SQL, because each time your new table's name will be different:
declare @sql nvarchar(max) = N'select ID, Name into MyTable_' +
    convert(nvarchar(10), getdate(), 112) + N' from MyTable'
exec (@sql)
When executed, this statement will create a new table with the same structure as your existing table, but named with the current date as suffix, e.g. MyTable_20190116, and copy MyTable to it.
For the second option you need to create one table like below, and copy data to it using a script like this:
create table MyTableDailySnapshots(
SnapshotDate date not null
, ID int not null
, Name varchar(50)
, constraint PK_MyTableDailySnapshots primary key clustered (SnapshotDate, ID)
)
insert into MyTableDailySnapshots(SnapshotDate, ID, Name)
select GETDATE(), ID, Name
from MyTable
If you choose the third option, no actions are needed to maintain the snapshots. Just use query like this, to get the state of the table for a period of time:
select ID, Name from MyTable
for system_time between '2019-01-16 00:00:00.0000000' and '2019-01-16 23:59:59.9999999'
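For completeness, converting an existing MyTable into a system-versioned temporal table could look roughly like this (the period column and history table names are just examples; requires SQL Server 2016 or newer):
alter table MyTable add
    ValidFrom datetime2 generated always as row start hidden
        constraint DF_MyTable_ValidFrom default sysutcdatetime(),
    ValidTo datetime2 generated always as row end hidden
        constraint DF_MyTable_ValidTo default convert(datetime2, '9999-12-31 23:59:59.9999999'),
    period for system_time (ValidFrom, ValidTo);
-- from this point on SQL Server maintains the history table for you
alter table MyTable
    set (system_versioning = on (history_table = dbo.MyTable_History));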
The first option is more flexible if your table's schema changes over time, because each day you can create a table with a different schema. Options 2 and 3 have only one table to store the snapshots, so you may need to be creative if your table's schema needs to change. The disadvantage of the first option, though, is the large number of tables created in your database.
So it is up to you to choose what's the best for your case.

Related

Querying a SQL table and only transferring updated rows to a different database

I have a database table which constantly gets updated. I am looking to query only the changes/additions that have been made on rows with a specific attribute in a column. e.g. get the rows which have been changed/added, the 'description' column of which is "xyz". My end goal is to copy these rows to another table in another database. Is this even possible? The reason for not just querying and overwriting the rows in the other database is to avoid inefficiency.
What have I tried so far?
I am able to run a select query on the table to get the rows, but it gives me all the rows, not only the ones that have been changed or recently added. If I add these rows to the table in the other database, the only option I have is to overwrite the rows.
A log table logs the changes in a table, but I can't add additional filters in SQL to tell me which of these changes are associated with a 'description' value of 'xyz'.
Write your update statements to make use of OUTPUT to capture the before and after values and log them to a table of your choice.
Here is a really simple update example that uses OUTPUT to store the RowID and the before and after values of the ActivityType column:
DECLARE @MyTableVar table (
    SummaryBefore nvarchar(max),
    SummaryAfter nvarchar(max),
    RowID int
);
update DBA.dbo.dtest set ActivityType = 3
OUTPUT deleted.ActivityType,
       inserted.ActivityType,
       inserted.RowID
INTO @MyTableVar;
select * from @MyTableVar;
You can do it in two ways:
Have new date fields/columns like update_time and/or create_time (these can be defaulted if needed). These fields indicate the status of the record. You save your previous_run_time, and your select query then looks for records with update_time/create_time greater than previous_run_time; those are the records you move to the other DB.
Turn on CDC (Change Data Capture) on the source table, which is built into SQL Server, and then move only those records that have been impacted.
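If you go the CDC route, enabling it is roughly this sketch (dbo.MyTable stands in for your source table; SQL Server Agent must be running for the capture job):
-- enable change data capture for the database, then for the table
EXEC sys.sp_cdc_enable_db;
EXEC sys.sp_cdc_enable_table
    @source_schema = N'dbo',
    @source_name = N'MyTable',
    @role_name = NULL;
-- changes can then be read between two LSNs from the generated function
-- cdc.fn_cdc_get_all_changes_dbo_MyTable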

How do I update a SQL table with daily records?

I have a SQL database where some of my tables are updated daily. I want to create another table which is updated daily with records of what tables (table name, modified/updated date) were updated. I also do not want this table to get too big, so I want this table to only keep records for the last 31 days. How would I write the code for this?
I have already created a table (tUpdatedTables), but I would like this table to be updated daily and to keep these records for 31 days.
This is how I created the table:
Select *
Into tUpdatedTables
from sys.tables
order by modify_date desc
I have tried an UPDATE statement to update the table, but I get an error:
update tUpdatedTables
set [name]
,[object_id]
,[principal_id]
,[schema_id]
,[parent_object_id]
,[type]
,[type_desc]
,[create_date]
,[modify_date]
,[is_ms_shipped]
,[is_published]
,[is_schema_published]
,[lob_data_space_id]
,[filestream_data_space_id]
,[max_column_id_used]
,[lock_on_bulk_load]
,[uses_ansi_nulls]
,[is_replicated]
,[has_replication_filter]
,[is_merge_published]
,[is_sync_tran_subscribed]
,[has_unchecked_assembly_data]
,[text_in_row_limit]
,[large_value_types_out_of_row]
,[is_tracked_by_cdc]
,[lock_escalation]
,[lock_escalation_desc]
,[is_filetable]
,[is_memory_optimized]
,[durability]
,[durability_desc]
,[temporal_type]
,[temporal_type_desc]
,[history_table_id]
,[is_remote_data_archive_enabled]
,[is_external]
--Into tUpdatedTables
from sys.tables
where modify_date >= GETDATE()
order by modify_date desc
Msg 2714, Level 16, State 6, Line 4 There is already an object named
'tUpdatedTables' in the database.
I want to create another table which is updated daily with records of what tables (table name, modified/updated date) were updated.
If this is all you want, I would suggest instead simply doing daily backups. You should be doing that anyway.
Beyond that, what you're looking for is an audit log. Most languages and frameworks have libraries to do this for you. For example, paper_trail.
If you want to do this yourself, follow the basic pattern of paper_trail.
id: an autoincrementing primary key
item_type: the table, or perhaps something more abstract
item_id: the primary key of the item
event: are you storing a create, an update, or a delete?
bywho: identifies who made the change
object: a JSON field containing a dump of the data
created_at: when this happened (use a default)
Using JSON is key to making this table generic. Rather than trying to store every possible column of every possible table, and having to keep that up to date as the tables change, you store a JSON dump of the row using FOR JSON. This means the audit table doesn't need to change as other tables change. And it will save a lot of disk space as it avoids the audit table having a lot of unused columns.
For example, here's how you'd record creating ID 5 of some_table by user 23. (I might be a bit off as I don't use SQL Server).
insert into audit_log (item_type, item_id, event, bywho, object)
values(
'some_table', 5, 'create', 23, (
select * from some_table where id = 5 for json auto
)
)
Because the audit table doesn't care about the structure of the thing being recorded, you add insert, update, and delete triggers to each table to record its changes in the audit log. Just change the item_type.
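As a rough, untested sketch (mirroring the insert above; some_table, id and audit_log are still just example names), an insert trigger could look like this:
create trigger trg_some_table_audit_insert
on some_table
after insert
as
begin
    set nocount on;
    insert into audit_log (item_type, item_id, event, bywho, object)
    select
        'some_table',
        i.id,
        'create',
        suser_sname(),  -- or however you identify who made the change
        (select st.* from some_table st where st.id = i.id for json auto)  -- the row as JSON
    from inserted as i;
end
The update and delete triggers are the same idea with event set to 'update' or 'delete' (and the delete trigger reading from the deleted pseudo-table instead).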
As for not getting too big, don't worry about it until it's a problem. Proper indexing means it won't be: a composite index on (item_type, item_id) will make listing the changes to a particular thing fast, and indexing bywho will make searches for changes made by a particular user fast. You shouldn't be referencing this table in production; if you are, that probably requires a different design.
Partitioning the table by month could also stave off scaling issues.
And if it does get too big, you can backup the table and use created_at to delete old entries.
delete from audit_log
where created_at < dateadd(day, -31, getdate())

How to add a new column with existing and new rows to a table?

I have a table that I created with a unique key, and each other column representing one day of December 2014 (e.g. named D20141226 for data from 26/12/2014). So the table consists of 32 columns (key + 31 days). These daily columns indicate whether a customer had a transaction on that specific day; no transaction is indicated by a 0.
Now I want to execute the same query on a daily basis, producing a list of unique keys that had a transaction on that specific day. I used this easy script:
CREATE TABLE C01012015 AS
SELECT DISTINCT CALLING_ISDN AS A_PARTY
FROM CDRICC_012015
WHERE CALL_STA_TIME::date = '2015-01-01'
Now my question is, how can I add the content of the new daily table to the existing table with the 31 days, making it effectively a table with 32 days of data (and then continue to do so on a daily basis to store up to 360 days of data)?
Please note that new customers are doing transactions every day, hence there will be unique keys in the daily table that aren't in the big table holding all the previous days.
It would be ideal if those new rows would automatically get a 0 instead of a NULL but I can work around it if it gets a NULL value (not sure how to make sure it gets a 0 instead).
I thought that a FULL OUTER JOIN would be the solution but that would mean that I have to list all variables in the select statement, which becomes quite large as I add one more column each day. Is there a more elegant way to do this?
Or is SQL just not suited to this, and would a programming language like R be much better at it?
If you have the option to change your schema completely, you should unpivot your table so that your columns are something like CUSTOMER_ID INTEGER, D DATE, DID_TRANSACTION BOOLEAN. There's a post on the Enzee Community website that suggests using a user-defined table function (UDTF) to do this. If you change your schema in this way, a simple insert will work just fine and there will be no need to add columns dynamically.
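As a sketch of that unpivoted layout (table and column names are just examples), the daily job then becomes a plain insert instead of a new column:
CREATE TABLE CUSTOMER_DAILY_TRANSACTIONS (
    CUSTOMER_ID INTEGER NOT NULL,
    D DATE NOT NULL,
    DID_TRANSACTION BOOLEAN NOT NULL DEFAULT FALSE
);
-- one row per customer per day with a transaction
INSERT INTO CUSTOMER_DAILY_TRANSACTIONS (CUSTOMER_ID, D, DID_TRANSACTION)
SELECT DISTINCT CALLING_ISDN, '2015-01-01'::date, TRUE
FROM CDRICC_012015
WHERE CALL_STA_TIME::date = '2015-01-01';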
If you can't change your schema that much but you're still able to add columns, you could add a column for every day of the year up front with a default value of FALSE (assuming it's a boolean column representing whether the customer had a transaction or not on that day). You probably want to script this.
ALTER TABLE table_with_daily_columns MODIFY COLUMN (D20140101 BOOLEAN DEFAULT FALSE);
ALTER TABLE table_with_daily_columns MODIFY COLUMN (D20140102 BOOLEAN DEFAULT FALSE);
-- etc
ALTER TABLE table_with_daily_columns ADD COLUMN (D20150101 BOOLEAN DEFAULT FALSE);
GROOM TABLE table_with_daily_columns;
When you alter a table like this, Netezza creates a new table and an internal view that does a UNION of the new table and the old. You need to GROOM the table to merge the tables back into a single one for improved performance.
If you really must keep one column per day, then you'll have to use the method you described to pivot the data from your daily transaction table. Set the default value for each of your columns to 0 or FALSE as described above, then:
INSERT INTO table_with_daily_columns (cust_id, D20150101)
SELECT
    A_PARTY,
    TRUE
FROM C01012015;

How to add dates to database records for trending analysis

I have a SQL server database table that contain a few thousand records. These records are populated by PowerShell scripts on a weekly basis. These scripts basically overwrite last weeks data so the table only has information pertaining to the previous week. I would like to be able to take a copy of that tables data each week and add a date column with that day's date beside each record. I need this so can can do trend analysis in the future.
Unfortunately, I don't have access to the PowerShell scripts to edit them. Is there any way I can accomplish this using MS SQL server or some other way?
You can do the following. Create a table that will contain the clone plus a date column. Insert the results from your original table, along with the date, into the clone table. From your description you don't need a WHERE clause, because the original table is wiped out and only holds new data. After the initial table creation there is no need to do it again; you'll just simply do the insert piece. Obviously the below is very basic and is just to provide you with the framework.
CREATE TABLE yourTableClone
(
    col1 int,
    col2 varchar(5),
    -- ...the rest of your original table's columns...
    col5 date
)
insert into yourTableClone
select *, getdate()
from yourOriginalTable

Select on Row Version

Can I select rows on row version?
I am querying a database table periodically for new rows.
I want to store the last row version and then read all rows from the previously stored row version.
I cannot add anything to the table, the PK is not generated sequentially, and there is no date field.
Is there any other way to get all the rows that are new since the last query?
I am creating a new table that contains all the primary keys of the rows that have been processed and will join on that table to get new rows, but I would like to know if there is a better way.
EDIT
This is the table structure:
Everything except product_id and stock_code is a field describing the product.
You can cast the rowversion to a bigint; then, when you read the rows again, you cast the column to bigint and compare against your previously stored value. The problem with this approach is the table scan each time you select based on the cast of the rowversion: this could be slow if your source table is large.
I haven't tried a persisted computed column of this, I'd be interested to know if it works well.
Sample code (Tested in SQL Server 2008R2):
DECLARE @TABLE TABLE
(
    Id INT IDENTITY(1,1) NOT NULL PRIMARY KEY,
    Data VARCHAR(10) NOT NULL,
    LastChanged ROWVERSION NOT NULL
)
INSERT INTO @TABLE(Data)
VALUES('Hello'), ('World')
SELECT
    Id,
    Data,
    LastChanged,
    CAST(LastChanged AS BIGINT)
FROM
    @TABLE
DECLARE @Latest BIGINT = (SELECT MAX(CAST(LastChanged AS BIGINT)) FROM @TABLE)
SELECT * FROM @TABLE WHERE CAST(LastChanged AS BIGINT) >= @Latest
EDIT: It seems I've misunderstood, and you don't actually have a ROWVERSION column, you just mentioned row version as a concept. In that case, SQL Server Change Data Capture would be the only thing left I could think of that fits the bill: http://technet.microsoft.com/en-us/library/bb500353(v=sql.105).aspx
Not sure if that fits your needs, as you'd need to be able to store the LSN of "the last time you looked" so you can query the CDC tables properly. It lends itself more to data loads than to typical queries.
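If CDC does fit, the "remember where you last looked" part could be sketched like this (assuming CDC is already enabled on a dbo.Products table with the default capture instance name dbo_Products; the names are examples):
-- first run: start from the earliest captured LSN; later runs: use the LSN you stored last time
DECLARE @from_lsn binary(10) = sys.fn_cdc_get_min_lsn('dbo_Products');
DECLARE @to_lsn binary(10) = sys.fn_cdc_get_max_lsn();
SELECT *
FROM cdc.fn_cdc_get_all_changes_dbo_Products(@from_lsn, @to_lsn, N'all');
-- persist @to_lsn; next time pass sys.fn_cdc_increment_lsn(@to_lsn) as the new @from_lsn
-- so the last change isn't read twice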
Assuming you can create a temporary table, the EXCEPT command seems to be what you need:
Copy your table into a temporary table.
The next time you look, select everything from your table EXCEPT everything from the temporary table, and extract the keys you need from the result.
Make sure your temporary table is up to date again.
Note that your temporary table only needs to contain the keys you need. If this is just one column, you can go for a NOT IN rather than EXCEPT.
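A minimal sketch of the EXCEPT idea, assuming product_id is the key you need and a helper table named processed_keys (both names hypothetical):
-- keys that are new since the last look
SELECT product_id FROM YourTable
EXCEPT
SELECT product_id FROM processed_keys;
-- after processing, bring the helper table up to date
INSERT INTO processed_keys (product_id)
SELECT product_id FROM YourTable
EXCEPT
SELECT product_id FROM processed_keys;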