SQL Server - Syncing two databases - sql

We have a warehouse database that contains a year of data up to now. I want to create a report database that holds the last 3 months of data for reporting purposes, and I want to keep the two databases in sync. Right now, every 10 minutes I execute a package that grabs the most recent rows from the warehouse and adds them to the report db. The problem is that I only get new rows, not updates to existing rows.
I would like to know what are the various ways of solving this scenario.
Thanks

Look into replication, mirroring, or log shipping.

If you are using SQL Server 2000 or below, replication is your best bet. Since you are doing this every ten minutes, you should definitely look at transactional replication.
If you are using SQL Server 2005 or greater, you have more options available to you: database snapshots, log shipping, and mirroring, as SQLMenace suggested above. Their suitability varies depending on your hardware. You will have to do some research to pick the optimal one for your needs.

You should probably read about replication, or ask your DB admin about it.

Is it possible to add columns to this database? You could add a Last_Activity column to the table and then write a trigger that updates the date/timestamp on that row to reflect the latest edit. For any new entries, the date/time would reflect the timestamp when the row was added.
This way, when you grab the last three months, you'd be grabbing the last three months' activity, not just the new stuff.
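A minimal sketch of the Last_Activity idea, using Python's built-in sqlite3 module as a stand-in for the two SQL Server databases. The `orders` table, the `orders_touch` trigger, and the `sync` function are made-up names for illustration, and a T-SQL trigger would be written differently. Note that this picks up inserts and updates but still cannot see deletes.

```python
import sqlite3

# In-memory stand-ins for the warehouse and report databases (hypothetical schema).
warehouse = sqlite3.connect(":memory:")
report = sqlite3.connect(":memory:")

# Millisecond-resolution "now" expression in SQLite.
NOW_MS = "strftime('%Y-%m-%d %H:%M:%f', 'now')"

warehouse.executescript(f"""
CREATE TABLE orders (
    id            INTEGER PRIMARY KEY,
    amount        REAL NOT NULL,
    last_activity TEXT NOT NULL DEFAULT ({NOW_MS})  -- set when the row is added
);
-- Refresh last_activity whenever the data column changes.
CREATE TRIGGER orders_touch AFTER UPDATE OF amount ON orders
BEGIN
    UPDATE orders SET last_activity = {NOW_MS} WHERE id = NEW.id;
END;
""")
report.execute(
    "CREATE TABLE orders (id INTEGER PRIMARY KEY, amount REAL NOT NULL, "
    "last_activity TEXT NOT NULL)"
)


def sync(watermark):
    """Copy rows touched at or after the watermark; return the new watermark."""
    rows = warehouse.execute(
        "SELECT id, amount, last_activity FROM orders WHERE last_activity >= ?",
        (watermark,),
    ).fetchall()
    # INSERT OR REPLACE makes re-copying an already synced row harmless.
    report.executemany("INSERT OR REPLACE INTO orders VALUES (?, ?, ?)", rows)
    report.commit()
    return max((r[2] for r in rows), default=watermark)


warehouse.execute("INSERT INTO orders (id, amount) VALUES (1, 10.0)")
warehouse.execute("INSERT INTO orders (id, amount) VALUES (2, 20.0)")
watermark = sync("")  # first run copies everything

warehouse.execute("UPDATE orders SET amount = 15.0 WHERE id = 1")
watermark = sync(watermark)  # second run picks up the update as well

report_rows = report.execute("SELECT id, amount FROM orders ORDER BY id").fetchall()
print(report_rows)
```

The comparison uses `>=` rather than `>` so that rows sharing the watermark timestamp are not missed; the upsert keeps that re-copy idempotent.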

Related

Azure SQL reporting on last READ records

Is it possible to generate a report from Azure MS SQL Server which shows which records in a table were last read from?
We have a table which we would like to begin cleaning records out of and it would be useful to know which data it contains that is no longer used by the client application. Unfortunately, it does not contain a datetime field which shows when the records were last accessed.
It is not a feature in SQL Server. The reason is that it would make the database a lot slower if we turned every read into a write. Since we have to log everything, we'd generate tons of log write traffic. There is a feature called Temporal Tables which doesn't quite do what you ask but it does have start/end dates for rows. You could track when you don't want to see a row anymore and then it would go into the history table. You can then remove rows from the history table after some period of non-use. The documentation covers both the retention feature and a conceptual overview of temporal tables.

Should I split my database into current year's data and all past year's data?

I am trying to configure a SQL server to run as efficiently as possible, while maintaining clean DB development standards. I have tables with millions of records, and sometimes have to join 5+ tables in a single query for Tableau Reports.
My question is whether it would be operationally more efficient to replicate the DB that I currently use and move all data for past seasons into it, since that data is only rarely used in reports. So my current DB would only have FY2018 data and the historical DB would contain FY2017, FY2016, FY2015, and FY2014.
Would this be worth the hassle and potential maintenance headaches? I don't know if it's worth it, or unneeded in the absence of inefficient queries.
Thanks!
No. What do you do at the end of the year?
What you should do is create an index on year; then when the user puts a criterion on year (e.g. = current year), it will run just as fast as if the data had been split.
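To illustrate the index suggestion, here is a small sketch using Python's sqlite3 module rather than SQL Server. The `sales` table, its columns, and the index name are invented for the example. It only shows that a filter on the year column can be answered through the index instead of a full scan; real performance depends on your data and query shapes.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute(
    "CREATE TABLE sales (id INTEGER PRIMARY KEY, fiscal_year INTEGER, amount REAL)"
)
# 1000 rows for each fiscal year 2014..2018 (made-up data).
db.executemany(
    "INSERT INTO sales (fiscal_year, amount) VALUES (?, ?)",
    [(year, 1.0) for year in range(2014, 2019) for _ in range(1000)],
)
db.execute("CREATE INDEX idx_sales_fiscal_year ON sales (fiscal_year)")

query = "SELECT SUM(amount) FROM sales WHERE fiscal_year = 2018"

# The last column of each EXPLAIN QUERY PLAN row describes the access path.
plan = db.execute("EXPLAIN QUERY PLAN " + query).fetchall()
plan_text = " ".join(row[-1] for row in plan)
total_2018 = db.execute(query).fetchone()[0]

print(plan_text)
print(total_2018)
```

In SQL Server you would check the same thing by looking at the execution plan for the report query.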

How to take a backup of the last ten days' records from all tables?

I want to take a backup of my database xyz.
The tables in this backup should contain only the records from the last ten days.
Is it possible? If yes, how can I achieve it?
You could check the answers posted here.
Or if you specify 10 days because that was the date of the LAST backup operation, you can use MySQL Backup's Incremental backup operations.
If you need to capture some of the DB to synchronize it with a different DB, this SQLyog information might be helpful.

sql server get only updated record

I am using SQL Server 2000. I need to get only the updated records from a remote server and insert those records into my local server on a daily basis. But that table does not have a created date or modified date field.
Use Transactional Replication.
Update
If you cannot do administrative operations on the source then you're going to have to read all the data every day. Since you cannot detect changes (and keep in mind that even if you had a timestamp you still wouldn't be able to detect all changes, because there is no way to detect deletes with a timestamp), you have to read every row every time you sync. And if you read every row, then the simplest solution is to just replace all the data you have with the new snapshot.
You need one of the following:
a column in the table which flags new or updated records in one fashion or another (lastupdate_timestamp, incremental update counter...)
some trigger on Insert and Update, on the table, which produces some side effect such as adding the corresponding row id into a separate table
You can also compare row-by-row the data from the remote server against that of the production server to get the list of new or updated rows... Such a differential update can also be produced by comparing some hash value, one per row, computed from the values of all columns for the row.
Barring one of the above, and barring some MS-SQL built-in replication setup, the only other possibility I can think of is [not pretty]:
parsing the SQL log to identify updates and additions to the table. This requires specialized software; I'm not even sure if the log file format is published/documented, though I have seen these types of tools. Frankly this approach is more one for forensic-type situations...
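For the trigger option listed above (a side effect such as recording the row id in a separate table), a rough sketch could look like the following. It uses Python's sqlite3 module as a stand-in, and the `customer` and `customer_changes` tables are hypothetical. SQL Server triggers use different syntax (the `inserted` pseudo-table), so treat this as an outline of the idea only.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT NOT NULL);

-- One row per customer id that changed since the last sync.
CREATE TABLE customer_changes (customer_id INTEGER PRIMARY KEY);

CREATE TRIGGER customer_ai AFTER INSERT ON customer
BEGIN
    INSERT OR IGNORE INTO customer_changes VALUES (NEW.id);
END;

CREATE TRIGGER customer_au AFTER UPDATE ON customer
BEGIN
    INSERT OR IGNORE INTO customer_changes VALUES (NEW.id);
END;
""")

db.executemany("INSERT INTO customer VALUES (?, ?)", [(1, "Ann"), (2, "Bob"), (3, "Cy")])
# Pretend a first sync has copied everything, then clear the change list.
db.execute("DELETE FROM customer_changes")

db.execute("UPDATE customer SET name = 'Bobby' WHERE id = 2")
db.execute("INSERT INTO customer VALUES (4, 'Dee')")

# The daily job only needs the rows whose ids were recorded by the triggers.
pending = db.execute("""
    SELECT c.id, c.name
    FROM customer AS c
    JOIN customer_changes AS ch ON ch.customer_id = c.id
    ORDER BY c.id
""").fetchall()
print(pending)
```

A delete trigger writing to the same table would let the job see removed rows too, which a timestamp column alone cannot do.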
If you can't change the remote server's database, your best option may be to come up with some sort of hash function on the values of a given row, compare the old and new tables, and pull only the ones where function(oldrow) != function(newrow).
You can also just do a direct comparison of the columns in question, and copy that record over when not all the columns in question are the same between old and new.
This means that you cannot modify values in the new table, or they'll get overwritten daily from the old. If this is an issue, you'll need another table in which to cache the old table's values from the day before; then you'll be able to tell whether old, new, or both were modified in the interim.
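The hash comparison described above can be sketched in plain Python, assuming both tables have already been read into dictionaries keyed by primary key. `row_hash` and `changed_keys` are illustrative names, not part of any library, and the sample rows are made up.

```python
import hashlib


def row_hash(row):
    """Hash all column values of one row; repr() keeps None, '' and 0 distinct."""
    payload = "\x1f".join(repr(value) for value in row)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def changed_keys(local_rows, remote_rows):
    """Return (new_or_updated, deleted) primary keys, comparing per-row hashes.

    Both arguments map primary key -> tuple of column values.
    """
    local_hashes = {key: row_hash(row) for key, row in local_rows.items()}
    new_or_updated = [
        key for key, row in remote_rows.items()
        if local_hashes.get(key) != row_hash(row)
    ]
    deleted = [key for key in local_rows if key not in remote_rows]
    return new_or_updated, deleted


local = {1: ("widget", 9.99), 2: ("gadget", 4.5), 5: ("doohickey", 2.0)}
remote = {1: ("widget", 10.49), 2: ("gadget", 4.5), 3: ("gizmo", 1.0)}

to_copy, to_delete = changed_keys(local, remote)
print(to_copy, to_delete)
```

This still reads every remote row once per sync; the hash only reduces what has to be compared and written locally. Storing yesterday's hashes in a side table gives you the "cache" mentioned above.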
I solved this by using the tablediff utility, which compares the data in two tables for non-convergence and is particularly useful for troubleshooting non-convergence in a replication topology.
See the link.
tablediff utility
To sum up:
You have an older remote db server that you can't modify anything in (such as tables, triggers, etc).
You can't use replication.
The data itself has no indication of date/time it was last modified.
You don't want to pull the entire table down each time.
That leaves us with an impossible situation.
Your only option, if the first 3 items above are true, is to pull the entire table. Even if the table did have a modified date/time column, you wouldn't detect deletes, which leaves us back at square one.
Go talk to your boss and ask for better requirements. Maybe something that can be done this time.

How would you maintain a history in SQL tables?

I am designing a database to store product information, and I want to store several months of historical (price) data for future reference. However, I would like to, after a set period, start overwriting initial entries with minimal effort to find the initial entries. Does anyone have a good idea of how to approach this problem? My initial design is to have a table named historical data, and every day, it pulls the active data and stores it into the historical database with a time stamp. Does anyone have a better idea? Or can see what is wrong with mine?
First, I'd like to comment on your proposed solution. The weak part is that there can be more than one change between your intervals. If the record was changed three times during the day, you only archive the last change.
A better solution is possible, but it must be event-driven. If you have a database server that supports events or triggers (like MS SQL), you should write a trigger that creates an entry in the history table. If your server does not support triggers, you can add the archiving code to your application (during the Save operation).
You could place a trigger on your price table. That way you can archive the old price in another table at each update or delete event.
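As a rough illustration of archiving the old price on every update or delete, here is a sketch using Python's sqlite3 module. The `price` and `price_history` tables and the trigger names are assumptions made for the example; on SQL Server the trigger body would read from the `deleted` pseudo-table instead of `OLD`.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
CREATE TABLE price (product_id INTEGER PRIMARY KEY, price REAL NOT NULL);

CREATE TABLE price_history (
    product_id INTEGER NOT NULL,
    old_price  REAL NOT NULL,
    action     TEXT NOT NULL,
    changed_at TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP
);

-- Archive the previous price whenever it is changed.
CREATE TRIGGER price_au AFTER UPDATE OF price ON price
BEGIN
    INSERT INTO price_history (product_id, old_price, action)
    VALUES (OLD.product_id, OLD.price, 'update');
END;

-- Archive the last known price when the row is removed.
CREATE TRIGGER price_ad AFTER DELETE ON price
BEGIN
    INSERT INTO price_history (product_id, old_price, action)
    VALUES (OLD.product_id, OLD.price, 'delete');
END;
""")

db.execute("INSERT INTO price VALUES (1, 9.99)")
db.execute("UPDATE price SET price = 10.49 WHERE product_id = 1")
db.execute("UPDATE price SET price = 11.00 WHERE product_id = 1")
db.execute("DELETE FROM price WHERE product_id = 1")

history = db.execute(
    "SELECT old_price, action FROM price_history ORDER BY rowid"
).fetchall()
print(history)
```

Every intermediate price is captured, which a once-a-day snapshot would miss.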
It's a much broader topic than it initially seems. Martin Fowler has a nice narrative about "things that change with time".
IMO your approach seems sound if the history you need is a snapshot of the end of the day's data. In the past I have used a similar approach with overnight jobs (stored procedures) that pick up the day's new data, timestamp it, and then run a "delete all data that has a timestamp < today - x", where x is the time period of data I want to keep.
If you need to track all history changes, then you need to look at triggers.
I would like to, after a set period, start overwriting initial entries with minimal effort to find the initial entries
We store data in Archive tables, using a trigger, as others have suggested. Our archive table has an additional AuditDate column and stores the "Deleted" data, i.e. the previous version of the data. The current data is only stored in the actual table.
We prune the Archive table with a business rule along the lines of "Delete all Archive data more than 3 months old where there exists at least one archive record younger than 3 months old; delete all archive data more than 6 months old".
So if there has been no price change in the last 3 months you would still have a price change record from the 3-6 months ago period.
(Ask if you need an example of the self-referencing-join to do the delete, or the Trigger to store changes in the Archive table)
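One possible reading of that pruning rule, sketched with Python's sqlite3 module. This is not the answerer's actual code: the `price_archive` table, its columns, and the assumption that "at least one younger record" is checked per product are all guesses made for illustration.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute(
    "CREATE TABLE price_archive (product_id INTEGER, price REAL, audit_date TEXT)"
)
db.executemany("INSERT INTO price_archive VALUES (?, ?, ?)", [
    (1, 9.0, "2023-01-10"),  # older than 6 months
    (1, 9.5, "2023-04-20"),  # 3-6 months old, product 1 has a newer record
    (1, 9.9, "2023-07-15"),  # younger than 3 months
    (2, 4.0, "2022-12-01"),  # older than 6 months
    (2, 5.0, "2023-04-01"),  # 3-6 months old, no newer record for product 2
])


def prune(conn, today):
    params = {"today": today}
    # Rule 1: drop records older than 3 months when the same product
    # has at least one record younger than 3 months.
    conn.execute("""
        DELETE FROM price_archive
        WHERE audit_date < date(:today, '-3 months')
          AND EXISTS (
              SELECT 1 FROM price_archive AS newer
              WHERE newer.product_id = price_archive.product_id
                AND newer.audit_date >= date(:today, '-3 months')
          )
    """, params)
    # Rule 2: drop everything older than 6 months regardless.
    conn.execute(
        "DELETE FROM price_archive WHERE audit_date < date(:today, '-6 months')",
        params,
    )


prune(db, "2023-08-01")
remaining = db.execute(
    "SELECT product_id, audit_date FROM price_archive "
    "ORDER BY product_id, audit_date"
).fetchall()
print(remaining)
```

Product 2 keeps its 3-6 month old record because nothing newer exists for it, which matches the behaviour described above.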