Run restore on working database. What happens?

What happened when I ran:
zcat /mnt/Postgres/restoreFile.gz | psql my_db
on a working database? After it had run ALTER TABLE and other standard statements, there were problems with duplicate keys. When I stopped it and tried to insert into the database, I got duplicate key errors because of sequences and constraints. It seems like all the data is in, but what about the sequences? What really happened to that database?

A normal Postgres backup consists of table design (like create table) and data (like insert) statements. If you run it twice, most design statements will fail. The insert statements would succeed in so far as the data definition allows for duplicate rows.
So restoring a database to a production server would typically result in a lot of duplicate rows in tables without a primary key. Some design changes made after the backup (like changing the owner of a table) may be undone.
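To make this concrete, here is a small sketch that replays a dump-like script twice against the same database. It uses Python's built-in sqlite3 module as a stand-in for PostgreSQL (an assumption, chosen so the example runs without a server), and the table names and the replay helper are hypothetical. Each statement is run on its own and errors are skipped, which mirrors how psql carries on after a failed statement unless ON_ERROR_STOP is set.

```python
import sqlite3

# A stand-in for a plain-text dump: design statements followed by data statements.
# Table names are hypothetical; SQLite is used here instead of PostgreSQL.
DUMP = [
    "CREATE TABLE with_pk (id INTEGER PRIMARY KEY, name TEXT)",
    "CREATE TABLE no_pk (name TEXT)",
    "INSERT INTO with_pk (id, name) VALUES (1, 'a')",
    "INSERT INTO no_pk (name) VALUES ('a')",
]

def replay(conn, statements):
    """Run each statement on its own, skipping over errors,
    and return the statements that failed."""
    failed = []
    for sql in statements:
        try:
            conn.execute(sql)
        except sqlite3.Error:
            failed.append(sql)
    conn.commit()
    return failed

conn = sqlite3.connect(":memory:")
first_run = replay(conn, DUMP)   # fresh database: everything succeeds
second_run = replay(conn, DUMP)  # "restore" on top of the working database

rows_with_pk = conn.execute("SELECT COUNT(*) FROM with_pk").fetchone()[0]
rows_no_pk = conn.execute("SELECT COUNT(*) FROM no_pk").fetchone()[0]
print(len(first_run), len(second_run), rows_with_pk, rows_no_pk)
```

On the second run both CREATE TABLE statements and the insert into the keyed table fail, while the insert into the table without a primary key goes through and leaves a duplicate row.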

Related

Preserve table data during test data 'refresh'

We have a Postgres database in a test system which we want to 'refresh' with the data from the production system. However, there are some tables with test configuration data that I want to preserve in the test database. Note that the tables I want to preserve are referred to with foreign key constraints in other tables that are not preserved.
To refresh the test database, we usually rename it to '..._old' and then re-create the database from a dump of the production data.
Now there's a few ways to try to preserve the test configuration data, but I'm wondering if anyone has any brilliant ideas that are better/faster. Hoping that we can script this somehow to make it easy each time we do this.
A straight pg_dump/pg_restore won't work, because it will only INSERT not UPDATE the matching records. Or am I missing something?
I had thought about doing it by:
Renaming the tables involved with '..._test'
Using pg_dump to dump just those renamed tables to a file
Re-create the 'refreshed' database
Restore the renamed tables from file into the new database
Perform an UPDATE table_a SET (......) = (SELECT * FROM table_a_test) to overwrite refreshed data with preserved test data
Note that the number of records in the production data and the test data may not be the same.
The content of these tables is not huge, so I had also thought about generating UPDATE scripts for all the records within the preserved data.
Can anyone think of a better way to do this?

DROP TABLE or DELETE TABLE? Which is best practice?

Working on redesigning some databases in my SQL SERVER 2012 instance.
I have databases where I put my raw data (from vendors) and then I have client databases where I will (based on client name) create a view that only shows data for a specific client.
Because this data is volatile (Google Adwords & Google DFA), I typically just delete the last 6 days and insert 7 days every day from the vendor databases. Doing this gives me comfort in knowing that Google has had time to solidify its data.
The question I am trying to answer is:
1. Instead of using views, would it be better to use a 'SELECT INTO' statement and DROP the table every day in the client database?
I'm afraid that automating my process using the 'DROP TABLE' method will not scale well long term. While testing it myself, it seems that performance is improved because it does not have to scan the entire table for the date range. I've also tested this with an index on the 'date' column and performance still seemed better with the 'DROP TABLE' method.
I am looking for best practices here.
NOTE: This is my first post. So I am not too familiar with how to format correctly. :)
Deleting rows from a table is a time-consuming process. All the deleted records get logged, and performance of the server suffers.
Instead, databases offer truncate table. This removes all the rows of the table without logging each row individually, but keeps the structure intact. Also, triggers, indexes, constraints, stored procedures, and so on are not affected by the removal of rows.
In some databases, if you delete all rows from a table, then the operation is really truncate table. However, SQL Server is not one of those databases. In fact the documentation lists truncate as a best practice for deleting all rows:
To delete all the rows in a table, use TRUNCATE TABLE. TRUNCATE TABLE
is faster than DELETE and uses fewer system and transaction log
resources. TRUNCATE TABLE has restrictions, for example, the table
cannot participate in replication. For more information, see TRUNCATE
TABLE (Transact-SQL)
You can drop the table. But then you lose auxiliary metadata as well -- all the things listed above.
I would recommend that you truncate the table and reload the data using insert into or bulk insert.

Truncate All Tables under a Schema in DB2

I want to truncate all tables under a specific schema in DB2, which runs on a Linux server. But I have no right to ALTER TABLE to disable the foreign-key constraints.
Is there any way to do this?
I'm considering performing a topological sort based on the constraints between tables, but it is a little bit complex.
Any good idea on this problem?
You don't say what platform you're on. This answer is specific to DB2 on Linux, UNIX and Windows.
If you have LOAD, INSERT and DELETE privileges on the table(s) you can use the LOAD command with an empty file to truncate the tables, regardless of whether there are foreign key constraints:
LOAD from /dev/null of del replace into yourschema.yourtable nonrecoverable
This will place any dependent tables in check pending state… Once you have truncated all of your tables you would use the SET INTEGRITY statement to take all of the tables out of check pending.
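If LOAD is not available, the topological sort mentioned in the question is less work than it sounds. The sketch below is one possible way to do it, not a DB2 feature: it uses Python's standard graphlib module to order tables so that every child is emptied before the parents it references, and then prints plain DELETE statements. The foreign-key map and table names are hypothetical; in practice they would have to be read from the system catalog.

```python
from graphlib import TopologicalSorter  # Python 3.9+

# Hypothetical foreign keys: child table -> parent tables it references.
FOREIGN_KEYS = {
    "order_items": {"orders", "products"},
    "orders": {"customers"},
    "customers": set(),
    "products": set(),
}

def delete_order(foreign_keys):
    """Return tables ordered so every child comes before the parents it references."""
    # TopologicalSorter expects node -> predecessors. A parent may only be
    # emptied after its children, so the children are the parent's predecessors.
    must_come_first = {table: set() for table in foreign_keys}
    for child, parents in foreign_keys.items():
        for parent in parents:
            must_come_first.setdefault(parent, set()).add(child)
    return list(TopologicalSorter(must_come_first).static_order())

order = delete_order(FOREIGN_KEYS)
statements = [f"DELETE FROM yourschema.{table}" for table in order]
print("\n".join(statements))
```

Cyclic constraints make static_order() raise graphlib.CycleError, so those tables would need separate handling. Unlike the LOAD approach, every deleted row is logged, so this is only reasonable for modest table sizes.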

Automatically dropping PostgreSQL tables once per day

I have a scenario where I have a central server and a node. Both server and node are capable of running PostgreSQL but the storage space on the node is limited. The node collects data at a high speed and writes the data to its local DB.
The server needs to replicate the data from the node. I plan on accomplishing this with Slony-I or Bucardo.
The node needs to be able to delete all records from its tables at a set interval in order to minimize disk space used. Should I use pgAgent with a job consisting of a script like
DELETE FROM tablex, tabley, tablez;
where the actual batch file to run the script would be something like
@echo off
C:\Progra~1\PostgreSQL\9.1\bin\psql -d database -h localhost -p 5432 -U postgres -f C:\deleteFrom.sql
?
I'm just looking for opinions if this is the best way to accomplish this task or if anyone knows of a more efficient way to pull data from a remote DB and clear that remote DB to save space on the remote node. Thanks for your time.
The most efficient command for you is the TRUNCATE command.
With TRUNCATE, you can chain up tables, like your example:
TRUNCATE tablex, tabley, tablez;
Here's the description from the postgres docs:
TRUNCATE quickly removes all rows from a set of tables. It has the same effect as an unqualified DELETE on each table, but since it does not actually scan the tables it is faster. Furthermore, it reclaims disk space immediately, rather than requiring a subsequent VACUUM operation. This is most useful on large tables.
You may also add CASCADE as a parameter:
CASCADE Automatically truncate all tables that have foreign-key references to any of the named tables, or to any tables added to the group due to CASCADE.
The two best options, depending on your exact needs and workflow, would be truncate, as @Bohemian suggested, or to create a new table, rename, then drop.
We use something much like the latter create/rename/drop method in one of our major projects. This has an advantage where you need to be able to delete some data, but not all data, from a table very quickly. The basic workflow is:
Create a new table with a schema identical to the old one
CREATE TABLE new_table (LIKE ...);
In a transaction, rename the old and new tables simultaneously:
BEGIN;
ALTER TABLE mytable RENAME TO old_table;
ALTER TABLE new_table RENAME TO mytable;
COMMIT;
[Optional] Now you can do stuff with the old table, while the new table is happily accepting new inserts. You can dump the data to your centralized server, run queries on it, or whatever.
Delete the old table
DROP TABLE old_table;
This is an especially useful strategy when you want to keep, say, 7 days of data around, and only discard the 8th day's data all at once. Doing a DELETE in this case can be very slow. By storing the data in partitions (one for each day), it is easy to drop an entire day's data at once.
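The create/rename/drop workflow can be tried end to end with a small sketch. It uses Python's built-in sqlite3 module rather than PostgreSQL (an assumption, so that it runs without a server), and the table names readings, readings_new and readings_old are hypothetical. SQLite has no CREATE TABLE ... LIKE, so the column list is simply repeated.

```python
import sqlite3

# isolation_level=None puts the connection in autocommit mode, so the
# BEGIN/COMMIT below are exactly the ones we issue.
conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute("CREATE TABLE readings (id INTEGER PRIMARY KEY, value REAL)")
conn.executemany("INSERT INTO readings (value) VALUES (?)", [(1.0,), (2.0,)])

# 1. Create an empty table with the same columns.
conn.execute("CREATE TABLE readings_new (id INTEGER PRIMARY KEY, value REAL)")

# 2. Swap the names inside one transaction.
conn.execute("BEGIN")
conn.execute("ALTER TABLE readings RENAME TO readings_old")
conn.execute("ALTER TABLE readings_new RENAME TO readings")
conn.execute("COMMIT")

# 3. New inserts land in the fresh table while the old rows stay available.
conn.execute("INSERT INTO readings (value) VALUES (3.0)")
archived = conn.execute("SELECT value FROM readings_old ORDER BY id").fetchall()

# 4. Drop the old table once its rows have been shipped elsewhere.
conn.execute("DROP TABLE readings_old")
current = conn.execute("SELECT value FROM readings ORDER BY id").fetchall()
print(archived, current)
```

Because the two renames commit together, writers never see a moment where the table name is missing; they see either the old table or the new empty one.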

How to figure out which record has been deleted in an efficient way?

I am working on an in-house ETL solution, from db1 (Oracle) to db2 (Sybase). We need to transfer data incrementally (Change Data Capture?) into db2.
I have only read access to tables, so I can't create any table or trigger in Oracle db1.
The challenge I am facing is, how to detect record deletion in Oracle?
The solution which I can think of, is by using additional standalone/embedded db (e.g. derby, h2 etc). This db contains 2 tables, namely old_data, new_data.
old_data contains the primary key field from the table of interest in Oracle.
Every time the ETL process runs, the new_data table will be populated with the primary key field from the Oracle table. After that, I will run the following SQL command to get the deleted rows:
SELECT old_data.id FROM old_data WHERE old_data.id NOT IN (SELECT new_data.id FROM new_data)
I think this will be a very expensive operation when the volume of data becomes very large. Do you have any better idea of doing this?
Thanks.
Which edition of Oracle? If you have Enterprise Edition, look into Oracle Streams.
You can grab the deletes out of the REDO log rather than the database itself.
One approach you could take is using the Oracle flashback capability (if you're using version 9i or later):
http://forums.oracle.com/forums/thread.jspa?messageID=2608773
This will allow you to select from a prior database state.
If there may not always be deleted records, you could be more efficient by:
Storing a row count with each query iteration.
Comparing that row count to the previous row count.
If they are different, you know you have a delete and you have to compare the current set with the historical data set from flashback. If not, then don't bother and you've saved a lot of cycles.
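The row-count shortcut can be sketched in a few lines. This is only an illustration with hypothetical key lists standing in for the two snapshots (the flashback query itself is not shown), and it carries one loud assumption: no rows are inserted between the two snapshots. If inserts can happen, an insert can mask a delete and the counts alone are not enough.

```python
def deleted_ids(previous_ids, current_ids):
    """Return the keys present in the previous snapshot but missing now."""
    return set(previous_ids) - set(current_ids)

def detect_deletes(previous_ids, current_ids):
    # Cheap check first: equal counts are taken to mean "no deletes".
    # ASSUMPTION: no rows were inserted between the two snapshots; otherwise
    # an insert can mask a delete and the full comparison is always needed.
    if len(previous_ids) == len(current_ids):
        return set()
    return deleted_ids(previous_ids, current_ids)

old_snapshot = [101, 102, 103, 104]   # hypothetical primary keys from the last run
new_snapshot = [101, 103, 104]        # key 102 has been deleted since
print(detect_deletes(old_snapshot, new_snapshot))
```

The set difference plays the same role as the NOT IN query from the question, just computed on the client side.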
A quick note on your solution if flashback isn't an option: I don't think your select query is a big deal - it's all those inserts to populate those side tables that will really take a lot of time. Why not just run that query against the Sybase production server before doing your update?