Postgresql dump with data restriction - sql

I'm working on developing a fast way to make a clone of a database to test an application. My database has some specif tables that are quite big (+50GB), but the big majority of the tables only have a few MBs. On my current server, the dump + restore takes some hours. These bigs tables have date fields.
With the context in mind, my question is: Is possible to use some type of restrictions on table rows to select the data that is being dumped? e.g. On table X only dump the rows that date is Y.
If this is a possible show can I do it? if it's not possible what would be a good alternative?

You can use COPY SELECT whatever FROM yourtable WHERE ... TO '/some/file' to limit what you export.
COPY command

You could use row level security and create a policy that lets the dumping database user see only those rows that you want to dump (make sure that that user is neither a superuser nor owns the tables, because these users are exempt from row level security).
Then dump the database with that user, using the --enable-row-security option of pg_dump.

Related

Backing up portion of data in SQL

I have a huge schema containing billions of records, I want to purge data older than 13 months from it and maintain it as a backup in such a way that it can be recovered again whenever required.
Which is the best way to do it in SQL - can we create a separate copy of this schema and add a delete trigger on all tables so that when trigger fires, purged data gets inserted to this new schema?
Will there be only one record per delete statement if we use triggers? Or all records will be inserted?
Can we somehow use bulk copy?
I would suggest this is a perfect use case for the Stretch Database feature in SQL Server 2016.
More info: https://msdn.microsoft.com/en-gb/library/dn935011.aspx
The cold data can be moved to the cloud with your given date criteria without any applications or users being aware of it when querying the database. No backups required and very easy to setup.
There is no need for triggers, you can use job running every day, that will put outdated data into archive tables.
The best way I guess is to create a copy of current schema. In main part - delete all that is older then 13 months, in archive part - delete all for last 13 month.
Than create SP (or any SPs) that will collect data - put it into archive and delete it from main table. Put this is into daily running job.
The cleanest and fastest way to do this (with billions of rows) is to create a partitioned table probably based on a date column by month. Moving data in a given partition is a meta operation and is extremely fast (if the partition setup and its function is set up properly.) I have managed 300GB tables using partitioning and it has been very effective. Be careful with the partition function so dates at each edge are handled correctly.
Some of the other proposed solutions involve deleting millions of rows which could take a long, long time to execute. Model the different solutions using profiler and/or extended events to see which is the most efficient.
I agree with the above to not create a trigger. Triggers fire with every insert/update/delete making them very slow.
You may be best served with a data archive stored procedure.
Consider using multiple databases. The current database that has your current data. Then an archive or multiple archive databases where you move your records out from your current database to with some sort of say nightly or monthly stored procedure process that moves the data over.
You can use the exact same schema as your production system.
If the data is already in the database no need for a Bulk Copy. From there you can backup your archive database so it is off the sql server. Restore the database if needed to make the data available again. This is much faster and more manageable than bulk copy.
According to Microsoft's documentation on Stretch DB (found here - https://learn.microsoft.com/en-us/azure/sql-server-stretch-database/), you can't update or delete rows that have been migrated to cold storage or rows that are eligible for migration.
So while Stretch DB does look like a capable technology for archive, the implementation in SQL 2016 does not appear to support archive and purge.

Getting data from different database on different server with one SQL Server query

Server1: Prod, hosting DB1
Server2: Dev hosting DB2
Is there a way to query databases living on 2 different server with a same select query? I need to bring all the new rows from Prod to dev, using a query
like below. I will be using SQL Server DTS (import export data utility)to do this thing.
Insert into Dev.db1.table1
Select *
from Prod.db1.table1
where table1.PK not in (Select table1.PK from Dev.db1.table1)
Creating a linked server is the only approach that I am aware of for this to occur. If you are simply trying to add all new rows from prod to dev then why not just create a backup of that one particular table and pull it into the dev environment then write the query from the same server and database?
Granted this is a one time use and a pain for re-occuring instances but if it is a one time thing then I would recommend doing that. Otherwise make a linked server between the two.
To backup a single table in SQL use the SQl Server import and export wizard. Select the prod database as your datasource and then select only the prod table as your source table and make a new table in the dev environment for your destination table.
This should get you what you are looking for.
You say you're using DTS; the modern equivalent would be SSIS.
Typically you'd use a data flow task in an SSIS package to pull all the information from the live system into a staging table on the target, then load it from there. This is a pretty standard operation when data warehousing.
There are plenty of different approaches to save you copying all the data across (e.g. use a timestamp, use rowversion, use Change Data Capture, make use of the fact your primary key only ever gets bigger, etc. etc.) Or you could just do what you want with a lookup flow directly in SSIS...
The best approach will depend on many things: how much data you've got, what data transfer speed you have between the servers, your key types, etc.
When your servers are all in one Active Directory, and when you use Windows Authentification, then all you need is an account which has proper rights on all the databases!
You can then simply reference all tables like server.database.schema.table
For example:
insert into server1.db1.dbo.tblData1 (...)
select ... from server2.db2.dbo.tblData2;

How can I copy and overwrite data of tables from database1 to database2 in SQL Server

I have a database1 which has more than 500 tables and I have database2 which also has the same number of tables and in both the databases the name of tables are same.. some of the tables have different table definitions, for example a table reports in database1 has 9 columns and the table reports in database2 has 10.
I want to copy all the data from database1 to database2 and it should overwrite the same data and append the columns if structure does not match. I have tried the import export wizard in SQL Server 2008 but it gives an error when it comes to the last step of copying rows. I don't have the screen shot of that error right now, it is my office PC. It says that error inserting into the readonly column xyz, some times it says that vs_isbroken, for the read only column error as I mentioned a enabled the identity insert but it did not help..
Please help me. It is an opportunity in my office for me.
SSIS and SQL Server 2008 Wizards can be finicky tools.
If you get a "can't insert into column ABC", then it could be one of the following:
Inserting into a PK column -> when setting up the mappings, you need to indicate to overwrite the value
Inserting into a column with a smaller range -> for example from nvarchar(256) into nvarchar(50)
Inserting into a calculated column (pointed out by #Nick.McDermaid)
You could also get issues with referential integrity if your database uses this (most do).
If you're going to do this more often, then I suggest you build an SSIS package instead of using the wizard tooling. This way you will see warnings on all sorts of issues like the ones I've described above. You can then run your package on demand.
Another suggestion I would make, is that you insert DB1 into "stage" tables in DB2. These tables should have no relational integrity and will allow you to break the process into several steps as follows.
Stage the data from DB1 into DB2
Produce reports/queries on issues pertinent to your database/rules
Merge the data from stage tables into target tables using SQL
That last step is where you can use merge statements, or simple insert/updates depending on a key match. Using SQL here in the local database is then able to use set theory to manage the overlap of the two sets and figure out what is new or to be updated.
SSIS "can" do this, but you will not be able to do a bulk update using SSIS, whereas with SQL you can. SSIS would do what is known as RBAR (row by agonizing row), something slow and to be avoided.
I suggest you inform your seniors that this will take a little longer to ensure it is reliable and the results reportable. Then work step by step, reporting on each stages completion.
Another two small suggestions:
Create _Archive tables of each of the stage tables and add a Tstamp column to each. Merge into these after the stage step which will allow you to quickly see when which rows were introduced into DB2
After stage and before the SQL merge step, create indexes on your stage tables. This will improve the merge performance
Drop those Indexes after each merge, this will increase the bulk insert Performance
Basic on Staging (response to question clarification):
Links:
http://www.codeproject.com/Articles/173918/How-to-Create-your-First-SQL-Server-Integration-Se
http://www.jasonstrate.com/tag/31daysssis/
http://blogs.msdn.com/b/andreasderuiter/archive/2012/12/05/designing-an-etl-process-with-ssis-two-approaches-to-extracting-and-transforming-data.aspx
Staging is the act of moving data from one place to another without any checks.
First you need to create the target tables, the schema should match the source tables.
Open up BIDS and create a new Project and in it a new SSIS package.
In the package, create a connection for the source server and another for the destination.
Then create a data flow step, in the step create a data source for each table you want to copy from.
Connect each source to a new data destination and set the appropriate connection and table.
When done, save and do a test run.
Before the data flow step, you might like to add a SQL step that will truncate all the target tables.
If you're open to using tools then what about using something like Red Gate Sql Compare and Red Gate SQL Data Compare?
First I would use data compare to manage the schema differences, add the new columns you want to your destination database (database2) from the source (database1). Then with data compare you match the contents of the tables any columns it can't match based on names you specify how to handle. Then you can pick and choose what data you want to copy from your destination. So you'll see what data is new and what's different (you can delete data in the destination that's not in the source or ignore it). You can either have the tool do the work or create you a script to run when you want.
There's a 15 day trial if you want to experiment.
Seems like maybe you are looking for Replication technology as is offered by SQL Server Replication.
Well, if i understood your requirement correctly, you need to make database2 a replica of database1. Why not take a full backup of database1 and restore it as database2? Your database2 will be exactly what database1 is at the time of backup.

Updating/Inserting tables in one database from another database

How can I sync two databases and do a manual refresh on the entities on either of the database whenever I want?
Let's say I have two databases DB1(prod) and DB2(dev). I want to update/insert only a few tables from prod DB to dev DB. How could I achieve this? Is this possible instead of DBlink since I do not have privileges to create a database link?
If you only want to do a manual refresh set up an import/export/datapump script to copy the data across if there is not too much data involved. If there is a large amount of data you could write some pl/sql as described above to only move the new/changed rows. This will be easier if your data has fields such as created/updated_on

Using SSIS to create new Database from two separate databases

I am new to SSIS.I got the task have according to the scenario as explained.
Scenario:
I have two databases A and B on different machines and have around 25 tables and 20 columns with relationships and dependencies. My task is to create a database C with selected no of tables and in each table I don't require all the columns but selected some. Conditions to be met are that the relationships should be intact and created automatically in new database.
What I have done:
I have created a package using the transfer SQL Server object task to transfer the tables and relationships.
then I have manually edited the columns that are not required
and then I transferred the data using the data source and destination
My question is: can I achieve all these things in one package? Also after I have transferred the data how can I schedule the package to just transfer the recently inserted rows in the database to the new database?
Please help me
thanks in advance
You can schedule the package by using a SQL Server Agent Job - one of the options for a job step is run SSIS package.
With regard to transferring new rows, I would either:
Track your current "position" in another table, assumes you have either an ascending key or a time stamp column - load the current position into an SSIS variable, use this variable in the WHERE statement of your data source queries.
Transfer all data across into "dump" copies of each table (no relationships/keys etc required just the same schema) & use a T-SQL MERGE statement to load new rows in, then truncate "dump" tables.
Hope this makes sense - its a bit difficult to get across in writing.