Database Replication after Data Load

Database Replication after Data Load - sql

I'm trying to understand the ramifications of database replication (SQL Server or Golden Gate) for situations where the source database is completely repopulated every night. To clarify, all existing tables are dropped and then the database is reloaded with new tables using same name along with all the data.
Based on my understanding i.e. that replication uses a transaction log... I would assume it will also repeat the process of dropping the tables instead of identifying the differences and just adding the new data. Is that correct?

You can set up the replication using OracleGoldenGate so that it will be doing what you want it to do.
the TRUNCATE TABLE command can be replicated or it can be ignored
the populating of the source table (INSERT/bulk operations) can be replicated or it can be ignored
if a row already exists (meaning a row with the same PK exists) on the target and you INSERT it on the source you can either UPDATE the target or DELETE the old one and INSERT the new, or ignore it
Database replication is based on the redo (transaction) log. Only particular events that appear on the source databases, which are logged can be replicated. But the whole replication engine can make some additional transformations as it is replicating the changes.

Related

DB2: Working with concurrent DDL operations

We are working on a data warehouse using IBM DB2 and we wanted to load data by partition exchange. That means we prepare a temporary table with the data we want to load into the target table and then use that entire table as a data partition in the target table. If there was previous data we just discard the old partition.
Basically you just do "ALTER TABLE target_table ATTACH PARTITION pname [starting and ending clauses] FROM temp_table".
It works wonderfully, but only for one operation at a time. If we do multiple loads in parallel or try to attach multiple partitions to the same table it's raining deadlock errors from the database.
From what I understand, the problem isn't necessarily with parallel access to the target table itself (locking it changes nothing), but accesses to system catalog tables in the background.
I have combed through the DB2 documentation but the only reference to the topic of concurrent DDL statements I found at all was to avoid doing them. The answer to this question, can't be to simply not attempt it?
Does anyone know a way to deal with this problem?
I tried to have a global, single synchronization table to lock if you want to attach any partitions, but it didn't help either. Either I'm missing something (implicit commits somewhere?) or some of the data catalog updates even happen asynchronously, which makes the whole problem much worse. If that is the case, is there are any chance at all to query if the attach is safe to perform at any given moment?

Run restore on working database. What happens?

What happens when I runned:
zcat /mnt/Postgres/restoreFile.gz | psql my_db
on the working database and after doing ALTER TABLE and other standard things there were problems with duplicated keys. When I stopped it and tried to insert into database then I got duplicates key error because of sequences and constraints. Seems like all data is in but what about the sequences. What really happend with that database?

A normal Postgres backup consists of table design (like create table) and data (like insert) statements. If you run it twice, most design statements will fail. The insert statements would succeed in so far as the data definition allows for duplicate rows.
So restoring a database to a production server would typically result in a lot of duplicate rows in tables without a primary key. Some design changes made after the backup (like changing the owner of a table) may be undone.

Database replication stops if data is changed on the subscriber

Should the SQL database replication stop if i delete one record in a replicated table on the subscriber end?
I remember having a replication running where the delete on subscriber would be overwritten from publisher effectively preventing you from deleting the data.
But in our new configuration it crashed the replication when we deleted one record.

It depends. If you deleted a row at the subscriber that is subsequently updated at the publisher, replication will break when the update is propagated. Why is this? If you look at how the command is replicated, it calls a stored procedure with the PK column(s), a bit mask of what columns changed, and then the list of new values for columns that changed (I'm glossing over some detail, but you can look for yourself in the subscriber database; the procedures are all there and pretty accessible). Because it doesn't re-replicate the whole row again, if it doesn't find the row indicated by the PK, replication assumes (correctly) that the subscriber is no longer in sync with the publisher and stops. As far as I know, replication has never worked in the way you describe.
TL;DR: you should treat the subscriber database(s) as read-only except by the replication process itself.

Auditing data changes in SQL Server 2008

I am trying to find a highly efficient method of auditing changes to data in a table. Currently I am using a trigger that looks at the INSERTED and DELETED tables to see what rows have changed and inserts these changes into an Audit table.
The problem is this is proving to be very inefficient (obviously!). It's possible that with 3 thousand rows inserted into the database at one time (which wouldn't be unusual) that 215000 rows would have to be inserted in total to audit these rows.
What is a reasonable way to audit all this data without it taking a long time to insert in to the database? It needs to be fast!
Thanks.

A correctly written trigger should be fast enough.
You could also look at Change Data Capture
Auditing in SQL Server 2008
I quite often use AutoAudit:
AutoAudit is a SQL Server (2005, 2008, 2012) Code-Gen utility that creates
Audit Trail Triggers with:
Created, CreatedBy, Modified, ModifiedBy, and RowVersion (incrementing
INT) columns to table
Insert event logged to Audit table
Updates old and new values logged to Audit table Delete logs all
final values to the Audit table
view to reconstruct deleted rows
UDF to reconstruct Row History
Schema Audit Trigger to track schema changes
Re-code-gens triggers when Alter Table changes the table
Update: (Original edit was rejected, but I'm re-adding it):
A major upgrade to version 3.20 was released in November 2013 with these added features:
Handles tables with up to 5 PK columns
Performance improvements up to 90% faster than version 2.00
Improved historical data retrieval UDF
Handles column/table names that need quotename [ ]
Archival process to keep the live Audit tables smaller/faster but retain the older data in archive AutoAudit tables

As others already mentioned - you can use Change Data Capture, Change Tracking, and Audit features in SQL Server, but to keep it simple and use one solution to track all SQL Server activities including these DML operations I suggest trying ApexSQL Comply. You can disable all other, and leave DML auditing option only
It uses a centralized repository for captured information on multiple SQL Server instances and their databases.
It would be best to read this article first, and then decide on using this tool:
http://solutioncenter.apexsql.com/methods-for-auditing-sql-server-data-changes-part-9-the-apexsql-solution/

SQL Server Notifications on insert update delete table change
SqlTableDependency C# componenet provides the low-level implementation to receive database notification creating SQL Server Queue and Service Broker.
Have a look at http://www.sqltabledependency.it/
For any record change, SqlTableDependency's event handler will get a notification containing modified table record values as well as DML - insert, update, delete - change executed on your database table.

You could allow the table to be self auditing by adding additional columns, for example:
For an INSERT - this is a new record and it's existence in the table is the audit itself.
With a DELETE - you can add columns like IsDeleted BIT \ DeletingUserID INT \ DeletingTimestamp DATETIME to your table.
With an UPDATE you add columns like IsLatestVersion BIT \ ParentRecordID INT to track version changes.

SSIS Data Migration Primary Key Identity Conflicts

We have developed a large data migration from one DB schema to the other. We had built it based on the idea that the destination DB would be empty, however months ago we started putting clients on the new application which means their data is being housed in the new schema (the destination DB).
Now we're in a situation where the primary keys could overlap from the source to the destination DB and we're struggling to come up with a solution. The only solution I can think of is to check if the ID exists in the destination, updated the ID in the source to be 1 more than the greatest ID in the destination, and then migrate the record. This seems really cumbersome to have to do for hundreds of tables. Any ideas?

Sorry I don't know anything about SSIS but the following are a few ways to solve the problem using SQL.
When inserting into the destination tables, do not insert identities. As rows are inserted, capture the newly inserted identities and the old identities in a mapping table, see MERGE + OUTPUT INTO. Use the mapping table to update the tables that haven't been inserted, substituting the old identities with the new identities.
Of course for this to work, insertion into tables has to be done in an order that won't cause foreign key or constraint violations.
If you're not into doing all that, and you can lock users out of tables for short periods of time, DBCC CHECK INDENT could be used to 'reserve' identities. These new identities can then be used to update the old data and then insert with SET IDENTITY_INSERT ON.

We Keep Coding

sql objective-c vba vb.net react-native apache vue.js tensorflow api pandas