sql server get only updated record - sql

I am using sql server 2000. I need to get only updated records from remote server and need to insert that record in my local server on daily basis. But that table did not have created date or modified date field.

Use Transactional Replication.
Update
If you cannot do administrative operations on the source then you'll going to have to read all the data every day. Since you cannot detect changes (and keep in mind that even if you'd have a timestamp you still wouldn't be able to detect changes because there is no way to detect deletes with a timestamp) then you have to read every row every time you sync. And if you read every row, then the simplest solution is to just replace all the data you have with the new snapshot.

You need one of the following
a column in the table which flag new or updated records in a fashion or other (lastupdate_timestamp, incremental update counter...)
some trigger on Insert and Update, on the table, which produces some side-effect such as adding the corresponding row id into a separate table
You can also compare row-by-row the data from the remote server against that of the production server to get the list of new or updated rows... Such a differential update can also be produced by comparing some hash value, one per row, computed from the values of all columns for the row.
Barring one the above, and barring some MS-SQL built-in replication setup, the only other possibility I can think of is [not pretty]:
parsing the SQL Log to identify updates and addition to the table. This requires specialized software; I'm not even sure if the Log file format is published/documented, though I have seen this types of tools. Frankly this approach is more one for forensic-type situations...

If you can't change the remote server's database, your best option may be to come up with some sort of hash function on the values of a given row, compare the old and new tables, and pull only the ones where function(oldrow) != function(newrow).
You can also just do a direct comparison of the columns in question, and copy that record over when not all the columns in question are the same between old and new.
This means that you cannot modify values in the new table, or they'll get overwritten daily from the old. If this is an issue, you'll need another table in which to cache the old table's values from the day before; then you'll be able to tell whether old, new, or both were modified in the interim.

I solved this by using tablediff utility which will compare the data in two tables for non-convergence, and is particularly useful for troubleshooting non-convergence in a replication topology.
See the link.
tablediff utility

TO sum up:
You have an older remote db server that you can't modify anything in (such as tables, triggers, etc).
You can't use replication.
The data itself has no indication of date/time it was last modified.
You don't want to pull the entire table down each time.
That leaves us with an impossible situation.
You're only option if the first 3 items above are true is to pull the entire table. Even if they did have a modified date/time column, you wouldn't detect deletes. Which leaves us back at square one.
Go talk to your boss and ask for better requirements. Maybe something that can be done this time.

Related

SQL Server Auditing Data in the Same Table

A project I'm working on requires that a record be digitally "signed" and after that any modifications would create a new "version" of the row. The "signed" record can't be modified for regulatory reasons and new versions shouldn't be modified very often. In the past, done so by creating a separate logging table with the same schema as the main table with some extra columns for tracking who modified it and when.
However, after doing some work with SharePoint where ALL data (including different versions) is put into the same table I thought of a different approach which I can't find any examples of people doing: I could put the new version of the row right in the same table and increment the version number. Then add the version number to the PK.
PROS:
Implementation is easy, just create an "Instead of update" trigger which performs an insert instead of an update of the row is "signed". I could easily add a IsCurrentVersion column to be updated in the trigger.
Querying for older versions is easy, just get all the records with
the ID I want let the user choose from the list.
A trigger is nice because it guarantees that a row CAN'T be updated if signed (for regulatory and audit purposes).
Schema changes to the table don't have to be replicated to the mirror "logging" table.
CONS:
The table could get a bit larger but most of the time the record won't be changed after "signing" it. The client estimated around 100,000 rows/year max at current usage levels. SQL Server can handle hundreds of millions of rows so this doesn't seem too bad.
Indexing and performance could be an issue. SharePoint adds a tp_CalculatedVersion int to the PK where the calculated number is always 0 for the latest version. I could do the same and calculate it based off the Version number. Would that help performance?
There is an extra step in querying the data to make sure you get the latest version but that could be handled in a SP.
What other cons are there in this scenario. Am I missing anything??
I've seen this pattern used in an enterprise system before,and IMO it wasn't successful.
You are Mixing two different concerns here, viz storage of live and audit data. Queries to this table will always need to keep in mind whether they are seeking leaf or audit data (e.g. reports) - new team members may find this non intuitive. You would likely need to encapsulate this complexity with views etc.
As you mentioned performance will always be a concern. Inserting a new record will also need to update the previous record to mark it as inactive.You may also need to consider changing your clustered index to keep all versions on the same page.
Foreign keys to this table are going to be problematic. Do you
reference an exact version record? Do you then fix up the foreign
keys to point to the new live leaf record?
The one benefit I can think of doing this is that the audit table DDL will always be in synch with the live table - often with the 2 table strategy changes are made to the live, and the audit and trigger DDL isn't updated accordingly.
Overall, I would still recommend keeping your audit table separate.
If the requirement is that the signed data not be changed, then you should move it to another table. In fact, I might suggest moving it to another database/schema, where the only operation allowed on the table is inserting and reading records. You can use both permissions and triggers, if you really want to prevent updates.
You don't want to mess around with regulatory requirements. A complex schema that uses a combination of primary key with version, along with triggers, is a sign that there might be a simpler way.
The historical records can affect performance of the current records. If you end up in a situation where every record has changed 100 times, then keeping them in the same table is just going to slow down queries. Of course, you can embark on more complexity, in the form of partitioning the data. In the end, the solution is simpler: keep the data that cannot be changed in another table where it cannot be changed. You don't want to have to upgrade the hardware just because lots of history has accumulated.
I would also suggest including an effective and end date in the history records. This will allow you to reconstruct all the data as of a particular date, something that users might find useful in the future.
That's right. Audit trails can stay in an application for internal reporting/audit but infosec best practice mandates getting audit logs off the system where they are generated into your log management / SIEM solution.

How to find number of rows inserted/deleted in MySQL

Is there a way to find out the number of rows inserted/deleted in a table in MySQL? Is this kind of statistics kept somewhere in the database? If not, what would be the best way to implement something to keep track of these statistics?
When I say how many, I mean within a certain period (last 24 hours, or since server was up, or last week etc)
When I need to keep track of deleted things, I just don't delete.
I change a column value that excludes it from normal user results.
If space is an issue, you can set it's contents you no longer care about to empty.
Inserted you can user COUNT()
The Binary Log contains records of all queries that update or insert data. I don't know if it stores the number of affected rows, however.
There is also a General Query Log, which tracks all queries that were run.
(Information current for MySQL 5.0. If you're using an older version ymmv)
If I want to handle logging my SQL queries, I have 2 possibilities:
Turning the MySQL Log function on
Writting my own 'trace' class
I prefer doing number 2.
Why?
Because it is more controllable. You can easily differ from INSERT DELETE UPDATE and so on queries.
But that is not the only advantage of your own trace class, because creating trace files (so called "logs") makes administrative tasks much more easier.
You can structure the trace output, put it into a separate database, store it into some XML or JSON file.
You can order things as you want them to be.

Database history for client usage

I'm trying to figure out what would be the best way to have a history on a database, to track any Insert/Delete/Update that is done. The history data will need to be coded into the front-end since it will be used by the users. Creating "history tables" (a copy of each table used to store history) is not a good way to do this, since the data is spread across multiple tables.
At this point in time, my best idea is to create a few History tables, which the tables would reflect the output I want to show to the users. Whenever a change is made to specific tables, I would update this history table with the data as well.
I'm trying to figure out what the best way to go about would be. Any suggestions will be appreciated.
I am using Oracle + VB.NET
I have used very successfully a model where every table has an audit copy - the same table with a few additional fields (time stamp, user id, operation type), and 3 triggers on the first table for insert/update/delete.
I think this is a very good way of handling this, because tables and triggers can be generated from a model and there is little overhead from a management perspective.
The application can use the tables to show an audit history to the user (read-only).
We've got that requirement in our systems. We added two tables, one header, one detail called AuditRow and AuditField. The AuditRow contains one row per row changed in any other table, and the AuditField contains one row per column changed with old value and new value.
We have a trigger on every table that writes a header row (AuditRow) and the needed detail rows (one per changed colum) on each insert/update/delete. This system does rely on the fact that we have a guid on every table that can uniquely represent the row. Doesn't have to be the "business" or "primary" key, but it's a unique identifier for that row so we can identify it in the audit tables. Works like a champ. Overkill? Perhaps, but we've never had a problem with auditors. :-)
And yes, the Audit tables are by far the largest tables in the system.
If you are lucky enough to be on Oracle 11g, you could also use the Flashback Data Archive
Personally, I would stay away from triggers. They can be a nightmare when it comes to debugging and not necessarily the best if you are looking to scale out.
If you are using an PL/SQL API to do the INSERT/UPDATE/DELETEs you could manage this in a simple shift in design without the need (up front) for history tables.
All you need are 2 extra columns, DATE_FROM and DATE_THRU. When a record is INSERTed, the DATE_THRU is left NULL. If that record is UPDATEd or DELETEd, just "end date" the record by making DATE_THRU the current date/time (SYSDATE). Showing the history is as simple as selecting from the table, the one record where DATE_THRU is NULL will be your current or active record.
Now if you expect a high volume of changes, writing off the old record to a history table would be preferable, but I still wouldn't manage it with triggers, I'd do it with the API.
Hope that helps.

SQL, selecting and updating

I am trying to select 100s of rows at a DB that contains 100000s of row and update those rows afters.
the problem is I don't want to go to DB twice for this purpose since update only marks those rows as "read".
is there any way I can do this in java using simple jdbc libraries? (hopefully without using stored procedures)
update: ok here is some clarification.
there are a few instance of same application running on different servers, they all need to select 100s of "UNREAD" rows sorted according to creation_date column, read blob data within it, write it to file and ftp that file to some server. (I know prehistoric but requirements are requirements)
The read and update part is for to ensure each instance getting diffent set of data. (in order, tricks like odds and evens wont work :/)
We select data for update. the data transfers through the wire (we wait and wait) and then we update them as "READ". then release lock for reading. this entire thing takes too long. By reading and updating at the same time, I would like to reduce lock time (from time we use select for update to actual update) so that using multiple instances would increase read rows per second.
Still have ideas?
It seems to me there might be more than one way to interpret the question here.
You are selecting the rows for the
sole purpose of updating them and
not reading them.
You are selecting the rows to show
to somebody, and marking them as
read either one at a time or all as a group.
You want to select the rows and mark
them as read at the time you select
them.
Let's take Option 1 first, as that seems to be the easiest. You don't need to select the rows in order to update them, just issue an update with a WHERE clause:
update table_x
set read = 'T'
where date > sysdate-1;
Looking at option 2, you want to mark them as read when a user has read them (or a down stream system has received it, or whatever). For this, you'll probably have to do another update. If you query for the primary key, in addition to the other columns you'll need in the first select, you will probably have an easier time of updating, as the DB won't have to do table or index scans to find the rows.
In JDBC (Java) there is a facility to do a batch update, where you execute a set of updates all at once. That's worked out well when I need to perform a lot of updates that are of the exact same form.
Option 3, where you want to select and update all in one shot. I don't find much use for this, personally, but that doesn't mean others don't. I suppose some kind of stored procedure would reduce the round trips. I'm not sure what db you are working with here and can't really offer specifics.
Going to the DB isn't so bad. If you aren't returning anything 'across the wire' then an update shouldn't do you too much damage and its only a few hundred thousand rows. What is your worry?
If you're doing a SELECT in JDBC and iterating over the ResultSet to UPDATE each row, you're doing it wrong. That's an (n+1) query problem that will never perform well.
Just do an UPDATE with a WHERE clause that determines which of those rows needs to be updated. It's a single network round trip that way.
Don't be too code-centric. Let the database do the job it was designed for.
Can't you just use the same connection without closing it?

How can I determine the last time any record changed in a specific Sql Server 2000 database?

I have a SQL Server 2000 database instance that is rarely updated. I also have a database table which has no columns holding each row's created date or modified date.
Is there any way that I can determine the last time an update or insert was performed on the database as a whole, so that I can at least put a bound on when the specific records in the table may have changed?
Note: I am looking for information about transactions that have already occurred. Triggers may help we should I require this again in the future, but does not address the problem I'm trying to describe.
If it can be done, how can I do it?
The database's log file may have some information that is useful to your quest. AFAIK, the database itself doesn't store a "last updated" date.
Depending on the size of the database and the number of tables you could put a trigger in place that would handle updates/or inserts and log that to another table, potentially logging the table name and a timestamp, it isn't elegant but could work. and Doesn't require any modification to the rest of the db.