We have a Postgres database in a test system which we want to 'refresh' with the data from the production system. However, there are some tables with test configuration data that I want to preserve in the test database. Note that the tables I want to preserve are referred to with foreign key constraints in other tables that are not preserved.
To refresh the test database, we usually rename it to '..._old' and then re-create the database from a dump of the production data.
Now there are a few ways to try to preserve the test configuration data, but I'm wondering if anyone has any brilliant ideas that are better or faster. I'm hoping we can script this somehow to make it easy each time we do it.
A straight pg_dump/pg_restore won't work, because it will only INSERT not UPDATE the matching records. Or am I missing something?
I had thought about doing it by:
Renaming the tables involved with '..._test'
Using pg_dump to dump just those renamed tables to a file
Re-create the 'refreshed' database
Restore the renamed tables from file into the new database
Perform an UPDATE table_a SET (......) = (SELECT * FROM table_a_test) to overwrite refreshed data with preserved test data
Note that the number of records in the production data and the test data may not be the same.
The content of these tables is not huge, so I had also thought about generating UPDATE scripts for all the records within the preserved data.
Can anyone think of a better way to do this?
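One way to script the final overwrite step, given that the row counts may differ, is an upsert rather than a plain UPDATE. The sketch below only builds the SQL text. It assumes PostgreSQL 9.5 or later (for INSERT ... ON CONFLICT), a unique or primary-key constraint on the key column, and the column names id, name and enabled, which are hypothetical placeholders.

```python
def build_upsert(table, preserved, key_cols, data_cols):
    """Build a PostgreSQL upsert that copies rows from `preserved` into `table`.

    Rows whose key already exists are overwritten; rows that exist only in
    `preserved` are inserted. Assumes a unique constraint on `key_cols`
    and at least one entry in `data_cols`.
    """
    if not key_cols or not data_cols:
        raise ValueError("need at least one key column and one data column")
    columns = ", ".join(list(key_cols) + list(data_cols))
    keys = ", ".join(key_cols)
    assignments = ", ".join(f"{c} = EXCLUDED.{c}" for c in data_cols)
    return (
        f"INSERT INTO {table} ({columns})\n"
        f"SELECT {columns} FROM {preserved}\n"
        f"ON CONFLICT ({keys}) DO UPDATE SET {assignments};"
    )


if __name__ == "__main__":
    # Column names here are hypothetical placeholders.
    print(build_upsert("table_a", "table_a_test", ["id"], ["name", "enabled"]))
```

Rows that exist only in the production copy of table_a are left untouched by this statement, so the tables that reference them through foreign keys stay valid.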
As stated I need help deleting all data from every table in a test database. There are 3477 tables and some of the tables were created by a past employee so I was unable to create a schema of the DB and recreate it empty.
Is there a fast way to delete all of the data and keep all of the tables and their structure? Also, I noticed when deleting data from the DB with DELETE table_name that the data file wasn't decreasing in size. Any reason why? Then I tried to just delete the data file to see what would happen and it erased everything, so I had to restore the test database. Now I'm back at square one.
Any help or guidance would be appreciated.... I've read a lot and everything just says use Delete or Truncate, but rather not do that for 3477 tables.
The TRUNCATE TABLE command deletes the data inside a table, but not the table itself.
You have a lot of tables (more than 3000), so take a look at the following link to truncate all tables:
Truncate all tables in a SQL Server database
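As a rough sketch of what such a script can look like: SQL Server rejects TRUNCATE TABLE on a table that is referenced by a foreign key, so this version falls back to DELETE with constraint checking switched off. It only generates the T-SQL text. The table names in the example are hypothetical, and in practice the list would come from a query on INFORMATION_SCHEMA.TABLES.

```python
def build_clear_script(tables):
    """Return T-SQL that empties every table in `tables` but keeps the schema.

    `tables` is a list of (schema, name) pairs. Constraints are disabled
    first so the DELETEs can run in any order, then re-enabled and re-checked.
    """
    quoted = [f"[{schema}].[{name}]" for schema, name in tables]
    lines = [f"ALTER TABLE {t} NOCHECK CONSTRAINT ALL;" for t in quoted]
    lines += [f"DELETE FROM {t};" for t in quoted]
    lines += [f"ALTER TABLE {t} WITH CHECK CHECK CONSTRAINT ALL;" for t in quoted]
    return "\n".join(lines)


if __name__ == "__main__":
    # Hypothetical table names, for illustration only.
    print(build_clear_script([("dbo", "Orders"), ("dbo", "Customers")]))
```

DELETE is fully logged, so it is slower than TRUNCATE on large tables; for a test database that trade-off may be acceptable.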
I'm new to DBA and not much of a SQL person, so be gentle please.
I'd like to restructure a database that requires adding new columns, tables, and relationships followed by removing old tables, columns, and relationships. A three step process seems to be in order.
Change schema to add new stuff
Run SSIS to hook up new data using some of the old data.
Change schema to drop old stuff.
I'm using a SQL database Project in VS 2015 to maintain the schema, and using schema compare to update the DB schema. I'd like to make it repeatable or automatic, if possible, so I can test it out on a non-production database to get the flow right: change schema->run ETL->change schema. Is there a way to apply schema changes from within ETL or does this require manual operations? Is there a way to store two schemas into files and then apply them, other than VS publish or compare?
There is a SQL TASK that allows you to do what you want to do. You want to alter table (to add columns), move the data from old columns to new columns, then drop the old columns.
1) ALTER TABLE tableA ADD ...
2) UPDATE tableA SET ...
3) ALTER TABLE tableA DROP COLUMN ...
Please test your code carefully before running it.
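To make the three steps concrete, here is a small sketch that produces the statements in the order they would be run, for example from separate Execute SQL Tasks. The table and column names (Customer, FullName, FirstName, LastName) are made up for illustration.

```python
def build_column_migration(table, new_col, new_type, fill_expr, old_cols):
    """Return the add / populate / drop statements (T-SQL) in run order."""
    return [
        # Step 1: add the new column as nullable so existing rows stay valid.
        f"ALTER TABLE {table} ADD {new_col} {new_type} NULL;",
        # Step 2: fill the new column from the old data.
        f"UPDATE {table} SET {new_col} = {fill_expr};",
        # Step 3: remove the columns that are no longer needed.
        f"ALTER TABLE {table} DROP COLUMN {', '.join(old_cols)};",
    ]


if __name__ == "__main__":
    # All names below are hypothetical.
    for stmt in build_column_migration(
        "Customer", "FullName", "nvarchar(200)",
        "FirstName + ' ' + LastName", ["FirstName", "LastName"],
    ):
        print(stmt)
```

Each statement should be sent as its own batch: SQL Server compiles a batch as a whole, so an UPDATE that references a column added earlier in the same batch fails with an invalid column name error.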
It worked! Here is the example of the ETL. Note that it's important to set DelayValidation to true for the data flows and to disable ValidateExternalMetadata for some of the operations within the data flows because the database is not static.
I am working on some scripts that will modify many records in a table if they meet certain conditions.
In case my script is wrong, I need a quick and easy way to revert back to the original values of a table.
Currently, I am doing the following
Right click on the database
Tasks ---> Generate Scripts
Then I select Drop - Create with data or data + schema
This never works because the table I need to backup has foreign key constraints.
I just want to generate a script that wipes the values and inserts the old ones to a table. Does anyone know a simple trick for this?
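One possible approach that sidesteps the foreign key problem is to copy the table before running the script and, if needed, copy the original values back with an UPDATE instead of dropping and recreating the table. The sketch below generates the T-SQL for both halves. The table, key and column names are hypothetical, and it only reverts changed values; it does not undo inserted or deleted rows.

```python
def build_backup(table, backup):
    """Copy every row of `table` into a new table `backup` (SELECT ... INTO)."""
    return f"SELECT * INTO {backup} FROM {table};"


def build_restore(table, backup, key, cols):
    """Overwrite `cols` in `table` with the saved values, matching rows on `key`."""
    if not cols:
        raise ValueError("need at least one column to restore")
    assignments = ", ".join(f"t.{c} = b.{c}" for c in cols)
    return (
        f"UPDATE t SET {assignments}\n"
        f"FROM {table} AS t\n"
        f"JOIN {backup} AS b ON b.{key} = t.{key};"
    )


if __name__ == "__main__":
    # Hypothetical names, for illustration only.
    print(build_backup("dbo.Orders", "dbo.Orders_backup"))
    print(build_restore("dbo.Orders", "dbo.Orders_backup", "OrderId", ["Status", "Total"]))
```

Because the original table is never dropped, the foreign keys pointing at it are not affected.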
We want to update our out-of-sync tables in our database to match a different SQL Server database instance. We want to preserve the data in the database tables but will need to update constraints and column definitions. What is the easiest technique for accomplishing this?
Brute force, but fairly easy to script would be to:
On the current database (the one with the schemas you want), right-click on the DB and select Tasks > Generate Scripts...
Change the relevant parameters for what you need and save the script file (make sure you select the options to script all the indexes, triggers, etc.).
Create a fresh staging DB and run the script there.
Export all the data from the out-of-sync DB to the staging DB.
Drop all the tables on the out-of-sync DB.
Run the script on the out-of-sync DB.
Import all the data into the out-of-sync DB from the staging DB.
Delete the staging DB.
Obviously, you'll need to verify your data at the various steps before you go dropping tables or databases.
I have a scenario where I have a central server and a node. Both server and node are capable of running PostgreSQL but the storage space on the node is limited. The node collects data at a high speed and writes the data to its local DB.
The server needs to replicate the data from the node. I plan on accomplishing this with Slony-I or Bucardo.
The node needs to be able to delete all records from its tables at a set interval in order to minimize disk space used. Should I use pgAgent with a job consisting of a script like
DELETE FROM tablex, tabley, tablez;
where the actual batch file to run the script would be something like
@echo off
C:\Progra~1\PostgreSQL\9.1\bin\psql -d database -h localhost -p 5432 -U postgres -f C:\deleteFrom.sql
?
I'm just looking for opinions if this is the best way to accomplish this task or if anyone knows of a more efficient way to pull data from a remote DB and clear that remote DB to save space on the remote node. Thanks for your time.
The most efficient command for you is the TRUNCATE command.
With TRUNCATE, you can chain up tables, like your example:
TRUNCATE tablex, tabley, tablez;
Here's the description from the postgres docs:
TRUNCATE quickly removes all rows from a set of tables. It has the same effect as an unqualified DELETE on each table, but since it does not actually scan the tables it is faster. Furthermore, it reclaims disk space immediately, rather than requiring a subsequent VACUUM operation. This is most useful on large tables.
You may also add CASCADE as a parameter:
CASCADE Automatically truncate all tables that have foreign-key references to any of the named tables, or to any tables added to the group due to CASCADE.
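If the table list changes over time, the statement that goes into deleteFrom.sql could be generated rather than typed by hand. A minimal sketch, using the table names from the question; the function name is an assumption of this example.

```python
def build_truncate(tables, cascade=False):
    """Return a single PostgreSQL TRUNCATE statement covering all `tables`."""
    if not tables:
        raise ValueError("no tables given")
    sql = "TRUNCATE " + ", ".join(tables)
    if cascade:
        # Also empties tables that reference the named ones via foreign keys.
        sql += " CASCADE"
    return sql + ";"


if __name__ == "__main__":
    print(build_truncate(["tablex", "tabley", "tablez"]))
```

The output can be written to the .sql file that the existing psql batch job already runs.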
The two best options, depending on your exact needs and workflow, would be TRUNCATE, as @Bohemian suggested, or to create a new table, rename, then drop.
We use something much like the latter create/rename/drop method in one of our major projects. This has an advantage where you need to be able to delete some data, but not all data, from a table very quickly. The basic workflow is:
Create a new table with a schema identical to the old one
CREATE TABLE new_table (LIKE ... INCLUDING ALL);
In a transaction, rename the old and new tables simultaneously:
BEGIN;
ALTER TABLE my_table RENAME TO old_table;
ALTER TABLE new_table RENAME TO my_table;
COMMIT;
[Optional] Now you can do stuff with the old table, while the new table is happily accepting new inserts. You can dump the data to your centralized server, run queries on it, or whatever.
Delete the old table
DROP TABLE old_table;
This is an especially useful strategy when you want to keep, say, 7 days of data around, and only discard the 8th day's data all at once. Doing a DELETE in this case can be very slow. By storing the data in partitions (one for each day), it is easy to drop an entire day's data at once.
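A sketch of the one-table-per-day idea: given the names of the existing daily tables, work out which ones fall outside the retention window and emit a DROP TABLE for each. The naming scheme data_YYYYMMDD and the helper names are assumptions made for this example.

```python
from datetime import date, datetime, timedelta


def expired_tables(existing, today, keep_days=7, prefix="data_"):
    """Return the daily tables in `existing` older than the retention window.

    Table names are assumed to look like data_YYYYMMDD (hypothetical scheme).
    The last `keep_days` days, counting `today`, are kept.
    """
    cutoff = today - timedelta(days=keep_days - 1)  # oldest day to keep
    expired = []
    for name in existing:
        if not name.startswith(prefix):
            continue  # not one of the daily tables
        try:
            day = datetime.strptime(name[len(prefix):], "%Y%m%d").date()
        except ValueError:
            continue  # suffix is not a date; leave the table alone
        if day < cutoff:
            expired.append(name)
    return sorted(expired)


def build_drops(tables):
    """Return one DROP TABLE statement per expired table."""
    return [f"DROP TABLE IF EXISTS {t};" for t in tables]


if __name__ == "__main__":
    names = ["data_20240101", "data_20240102", "data_20240108"]
    for stmt in build_drops(expired_tables(names, date(2024, 1, 8))):
        print(stmt)
```

Dropping a whole table this way avoids scanning rows, which is why it stays fast regardless of how much data the expired day holds.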