Postgres: transfer data from local to remote database

I have two servers (let's say x.x.x.x, the main one, and y.y.y.y, the secondary one).
On the main server I have a Django application running which stores its data in a Postgres database; the secondary one is absolutely empty.
Every minute it writes about 900+ rows to one table, so, eventually, the table grew to over 3M rows, and processing all those objects (filtering, sorting) became really slow because of the sheer volume. However, I only need the rows written within the last 3 days, no more. Still, I cannot simply remove the data because I will need it for analysis in the future, so I need to keep it somewhere.
What I'm thinking of is creating another database on the secondary server and keeping all the extra data there. So I need to transfer all the data that is older than 3 days from the local (primary) server to the remote (secondary) server.
The regularity can be achieved using cron which is a trivial task.
What's not trivial is the command I need to execute in cron. I don't think there is a built-in SQL command to do this, so I'm wondering if this is possible at all.
I think the command should look something like this
INSERT INTO remote_server:table
SELECT * FROM my_table;
It's also worth mentioning that the table I'm having trouble with is constantly being written to, as I described above. So maybe these writes are causing the speed problems when executing some filtering or sorting queries.

You have several options:
If you want to stick with the manual copy, you can set up a foreign server that connects from the secondary to the main server, then create a foreign table to access the main server's table. Maybe access through the foreign table is already fast enough that you don't actually need to physically copy the data. But if you want a "disconnected" copy, you can simply run insert into local_table select * from foreign_table, or create a materialized view that is refreshed through cron.
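A minimal sketch of that setup on the secondary server, using postgres_fdw (the server name, credentials and table/column definitions below are placeholders, not taken from the question):

CREATE EXTENSION IF NOT EXISTS postgres_fdw;

CREATE SERVER main_server
  FOREIGN DATA WRAPPER postgres_fdw
  OPTIONS (host 'x.x.x.x', dbname 'djangodb', port '5432');

CREATE USER MAPPING FOR CURRENT_USER
  SERVER main_server
  OPTIONS (user 'archiver', password 'secret');

-- expose the main server's table locally as a foreign table
CREATE FOREIGN TABLE my_table_remote (
    id         bigint,
    created_at timestamptz,
    payload    text
) SERVER main_server OPTIONS (schema_name 'public', table_name 'my_table');

-- "disconnected" copy: pull everything older than 3 days into a local archive table
INSERT INTO my_table_archive
SELECT * FROM my_table_remote
WHERE created_at < now() - interval '3 days';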
Another solution that is a bit easier to set up (but probably slower) is to use the dblink module to access the remote server.
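With dblink the same copy could look roughly like this (again, connection details and columns are placeholders):

CREATE EXTENSION IF NOT EXISTS dblink;

INSERT INTO my_table_archive
SELECT *
FROM dblink('host=x.x.x.x dbname=djangodb user=archiver password=secret',
            'SELECT id, created_at, payload FROM my_table
              WHERE created_at < now() - interval ''3 days''')
     AS t(id bigint, created_at timestamptz, payload text);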
And finally you have the option to set up logical replication for that table from the main server to the secondary one. Then you don't need any cron job, as any changes on the primary will automatically be applied to the table on the secondary server.
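A sketch of that, assuming both servers run PostgreSQL 10 or later (the publication/subscription names and credentials are placeholders):

-- on the main server (x.x.x.x), with wal_level = logical
CREATE PUBLICATION my_table_pub FOR TABLE my_table;

-- on the secondary server (y.y.y.y), where an identical empty table already exists
CREATE SUBSCRIPTION my_table_sub
    CONNECTION 'host=x.x.x.x dbname=djangodb user=replicator password=secret'
    PUBLICATION my_table_pub;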

Can I use postgres_fdw without foreign tables defined?

I have a production database, "PRODdb1", with a read-only user account. I need to query (SELECT statement) this database and insert the data into a secondary database named "RPTdb1". I originally planned to just create a temp table in PRODdb1 from my SELECT, but permissions are the issue.
I've read about dblink & postgres_fdw, but is either of these a solution for my issue? I wasn't planning on creating foreign tables, because my SELECT joins many tables from PRODdb1, so I'm not sure whether postgres_fdw would still be an option for my use case.
Another option would be any means of getting the results of the SELECT into a .CSV file or something. My main blocker here is that I only have a read-only user to work with, and there is no way around that.
The simple answer is no. You cannot use postgres_fdw without defining a foreign table in your RPTdb1. This should not be much of an issue though, since it is quite easy to create the foreign tables.
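For example (just a sketch; the server, schema and table names are placeholders), once a foreign server and user mapping exist on RPTdb1, the read-only account on PRODdb1 is all you need, and your multi-table SELECT works exactly as before:

-- pull in foreign table definitions for just the tables the report needs
-- (IMPORT FOREIGN SCHEMA needs PostgreSQL 9.5+ locally; on older versions
--  you create each foreign table by hand with CREATE FOREIGN TABLE)
CREATE SCHEMA IF NOT EXISTS prod;
IMPORT FOREIGN SCHEMA public
    LIMIT TO (orders, customers)
    FROM SERVER prod_server INTO prod;

-- the join itself runs on RPTdb1, so only SELECT rights are needed on PRODdb1
INSERT INTO report_table
SELECT o.id, o.order_date, c.name
FROM prod.orders o
JOIN prod.customers c ON c.id = o.customer_id;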
I am in much the same boat as you. We use a 3rd party product (based on Postgres 9.3) for our production database and the user roles we have are very restrictive (i.e. read-only access, no replication, no ability to create triggers/functions/tables/etc).
I believe that postgres_fdw has the functionality you are looking for, with one caveat: your local reporting server needs to be running PostgreSQL version 10 (or 9.6 at a minimum). We currently use 9.3 on our local server, and while simple queries work beautifully, anything more complicated takes forever, because the FDW in 9.3 tries to pull all the data in the table before it is able to do JOINs or even apply the WHERE clause.
version 9.6: Pushes down JOIN to the remote database before returning results.
version 10: Pushes down aggregates such as COUNT and SUM to the remote database before returning results.
(I am not sure which version adds the ability to push down WHERE statements to the remote DB, but I know it was not possible in 9.5).
We are in the process of upgrading our local server to version 10 this week. I can try to keep you updated with our progress, feel free to do the same.

Temporary Tables Quick Guide

I have a structured database and software to handle it, and I wanted to set up a demo version based on a simple template version. I'm reading through some resources on temporary tables, but I have questions.
What is the best way to go about cloning a "temporary" database while keeping a clean list of databases?
From what I've seen, there are two ways to do this - temporary local versions that are terminated at the end of the session, and tables that are stored in the database until deleted by the client or me.
I think I would prefer the second option, because I would like to be able to see what they do with it. However, I do not want to add a ton of throw-away databases and clutter my system.
How can I a) schedule these for deletion after, say, 30 days, and b) if possible, keep them all under one umbrella? In other words, is there a way to keep them out of my main list of databases and grouped by themselves?
To solve b), I've thought about having one database and then serving up the information using a unique ID for the user and 'faux indexes' so that it appears as 1, 2, 3 instead of 556, 557, 558. I'm unsure how I could solve a), other than adding date and protected columns and having a script that runs daily and deletes anything over 30 days old that isn't protected.
I apologize for the open-ended question, but the resources I've found are a bit ambiguous.
These aren't true temp tables in the sense that your DBMS knows them. What you're looking for is a way to have a demo copy of your database, probably with a cut-down data set. It's really no different from having any other non-production copy of your database.
Don't do this on your production database server.
I'll repeat that: do not do this on your production database server.
Script the creation of your database schema. Depending on the DBMS you're using, this may be pretty easy. If you've got a good development/deployment/maintenance process for your system, this should already exist.
Create your database on the non-production server using the script(s) generated in the previous step. Use an easily-identifiable naming convention, like starting the database name with demo.
Load any data required into the tables.
Point the demo version of your app (that's running on your non-production servers) at this new database.
Create a script/process/job which looks at your database server and drops any databases that match your demo DB naming convention and were created more than 30 days ago.
Without details about your actual environment, people can't give concrete examples/sample code/instructions.
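Purely as an illustration (this assumes SQL Server and demo database names starting with demo; adjust for whatever DBMS you actually run), the cleanup job from the last step might look like:

-- drop demo databases created more than 30 days ago
DECLARE @sql nvarchar(max) = N'';

SELECT @sql += N'DROP DATABASE ' + QUOTENAME(name) + N'; '
FROM sys.databases
WHERE name LIKE N'demo%'
  AND create_date < DATEADD(DAY, -30, SYSDATETIME());

-- a real job would also kick out active connections first (ALTER DATABASE ... SET SINGLE_USER)
EXEC sys.sp_executesql @sql;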
If you cannot run a second, independent database server for these demos, then you will have to make do with your production server. This is still a bad idea because of potential security exposures and performance impact on your production database (constrained resources).
Create a complete copy of your database (or at least the schema, with a reduced data set) for each demo.
Create a unique set of credentials for each of these demo databases. This account should have access to only its demo database.
Configure the demo instance(s) of your application to connect to the demo database.
Here's why I'm pushing so hard for separate databases: If you keep copying your "demo" tables within the database, you will have to update your application code to point at those tables each time you do a new demo. Once you start doing this, you're taking a big risk with your demos - the code you keep changing isn't really the application you're running in production anymore. And if you miss one of those changes, you'll get unexpected results at best, and mangling of your production data at worst.

Best way to archive/backup tables and changes in a large database

I have an interesting issue and requirement for a large multi-schema database.
- The database is around 130 GB in size.
- It is a multi-schema database; each customer has a schema.
- We currently have 102,247 tables in the system.
- Microsoft SQL Server 2008 R2.
This is due to customisation requirements of customers, all using a single defined front end.
The issue we have is that our database backups become astronomical, and getting a database restore done to retrieve lost/missing/incorrect data is a nightmare. The initial product did not have defined audit trails, and we don't have 'changes' to data stored; we simply have one version of the data.
Getting lost data back basically means restoring the full 130 GB backup and applying differentials/transaction log files to get at the data.
We want to introduce a 'changeset' for each important table within each schema, essentially holding a snapshot of the data, then any modified/different data as it is saved, every X minutes. This will have to be a SQL job initially, but I want to know what the best method would be.
Essentially, I would run a script to add the 'backup' tables to each schema for the tables we wish to keep backed up.
Then a job would run every X minutes to cycle through each schema and insert the current data, then any new/changed data as it spots a change (based on the ModifiedDate of the row). It would then retain this changelog for around a month before overwriting itself.
We would still have our larger backups, but we won't need as long a retention period. My point is: what is the best and most efficient method of checking for changed data and performing the insert?
My gut feeling would be :
INSERT INTO BACKUP_table (UniqueID, col1, col2, col3)
SELECT UniqueID, col1, col2, col3
FROM source_table
WHERE ModifiedDate >= DATEADD(mi, -90, CURRENT_TIMESTAMP)
*rough SQL
This would have to run in a loop over all schemas. A number of tables won't have any changed data.
Is this even a good method?
What does SO think?
My first response would be to consider keeping each customer in their own database instead of their own schema within a massive database. The key benefits to doing this are:
much less stress on the metadata for a single database
you can perform backups for each customer on whatever schedule you like
when a certain customer has high activity you can move them easily
I managed such a system for several years at my previous job and managing 500 databases was no more complex than managing 10, and the only difference to your applications is the database part of the connection string (which is actually easier to make queries adapt to than a schema prefix).
If you're really committed to keeping everyone in a single database, then what you can consider doing is storing the important tables in each schema in their own filegroup, and moving everything out of the primary filegroup. Now you can back up those filegroups independently and, based solely on the full primary backup and a piecemeal restore of the individual filegroup backup, you can bring just that customer's schema online in another location and retrieve the data you're after (maybe copying it over to the primary database using import/export, BCP, or simple DML queries), without having to completely restore the entire database. Moving all user data out of the primary filegroup minimizes the time it takes to restore that initial backup and get on to restoring the specific customer's filegroup. While this makes your backup/recovery strategy a little more complex, I believe it does achieve what you're after.
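A rough sketch of the T-SQL involved (database, filegroup and file names here are invented for illustration; a real restore usually also needs WITH MOVE clauses and, under the full recovery model, log backups to reach a consistent point):

-- back up one customer's filegroup independently
BACKUP DATABASE BigMultiTenantDb
    FILEGROUP = 'Customer042FG'
    TO DISK = 'D:\Backups\BigMultiTenantDb_Customer042FG.bak';

-- on another server: piecemeal restore of PRIMARY plus just that filegroup
RESTORE DATABASE BigMultiTenantDb_Restore
    FILEGROUP = 'PRIMARY'
    FROM DISK = 'D:\Backups\BigMultiTenantDb_Primary.bak'
    WITH PARTIAL, NORECOVERY;

RESTORE DATABASE BigMultiTenantDb_Restore
    FILEGROUP = 'Customer042FG'
    FROM DISK = 'D:\Backups\BigMultiTenantDb_Customer042FG.bak'
    WITH RECOVERY;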
Another option is to use a custom log shipping implementation with an intentional delay. We did this for a while by shipping our logs to a reporting server, but waiting 12 hours before applying them. This gave us protection from customers shooting themselves in the foot and then requiring a restore - if they contacted us within 12 hours of their mistake, we likely already had the "before-screw-up" data online on the reporting server, making it trivial to fix it on the primary server. It also doubled as a reporting server for reports looking at data older than 12 hours, taking substantial load away from the primary server.
You can also consider change data capture but you will obviously need to test the performance and the impact on the rest of your workload. This solution also will depend on the edition of SQL Server you're using, since it is not available in Standard, Web, Workgroup, etc.
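Enabling change data capture is only a couple of calls per table (sketch only; the schema and table names are placeholders):

-- enable CDC at the database level
EXEC sys.sp_cdc_enable_db;

-- enable CDC for one of the important tables in a customer schema
EXEC sys.sp_cdc_enable_table
    @source_schema = N'Customer042',
    @source_name   = N'Orders',
    @role_name     = NULL;

-- changes then accumulate in cdc.Customer042_Orders_CT and can be read with
-- the generated cdc.fn_cdc_get_all_changes_Customer042_Orders function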

Is there a downside to creating and dropping too many tables on SQL Server?

I would like to know if there is an inherent flaw with the following way of using a database...
I want to create a reporting system with a web front end, whereby I query a database for the relevant data and send the results of the query to a new data table using SELECT INTO. Then the program would query that table to show a "page" of the report. This has the advantage that if there is a lot of data, it can be presented to the user a little at a time, as pages. The same data table can be accessed over and over while the user requests different pages of the report. When the web session ends, the tables can be dropped.
I am prepared to program around issues such as tracking the tables and ensuring they are dropped when needed.
I have a vague concern that over a long period of time the database itself might develop some form of maintenance problem, due to having created and dropped so many tables over time. Even day by day, let's say perhaps 1000 such tables are created and dropped.
Does anyone see any cause for concern?
Thanks for any suggestions/concerns.
Before you start implementing your solution consider using SSAS or simply SQL Server with a good model and properly indexed tables. SQL Server, IIS and the OS all perform caching operations that will be hard to beat.
The cause for concern is that you're trying to write code that will outperform SQL Server and IIS... This is a classic example of premature optimization. Thousands and thousands of programmer hours have been spent on making sure that SQL Server and IIS are as fast and efficient as possible, and it's not likely that your strategy will get better performance.
First of all: +1 to @Paul Sasik's answer.
Now, to answer your question (if you still want to go with your approach).
Possible causes of concern if you use VARBINARY(MAX) columns (from the MSDN): "If you drop a table that contains a VARBINARY(MAX) column with the FILESTREAM attribute, any data stored in the file system will not be removed."
If you do decide to go with your approach, I would use global temporary tables. They should get dropped automatically when there are no more connections using them, but you can still DROP them explicitly.
In your query you can check whether they exist and create them if they don't (any longer).
IF OBJECT_ID('tempdb..##temp') IS NULL
    -- create the temp table and perform your query
This way, you keep most of the logic for performing your queries and managing the temporary tables together, which should make it more maintainable. Plus, temp tables are built to be created and dropped, so it's quite safe to assume SQL Server will not be impacted in any way by creating and dropping a lot of them.
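Fleshed out a little (the table, column and session-specific names below are invented for illustration), the pattern could look like this:

IF OBJECT_ID('tempdb..##report_12345') IS NULL
BEGIN
    -- materialize the report data once per web session
    SELECT o.id, o.order_date, o.total
    INTO ##report_12345
    FROM dbo.orders AS o
    WHERE o.order_date >= '20240101';
END;

-- serve one "page" of the report (rows 51-100)
SELECT id, order_date, total
FROM (
    SELECT *, ROW_NUMBER() OVER (ORDER BY order_date) AS rn
    FROM ##report_12345
) AS paged
WHERE rn BETWEEN 51 AND 100;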
1000 per day should not be a concern if you talk about small tables.
I don't know SQL Server, but in Oracle you have the concept of a global temporary table. The data inserted into this type of table is visible only to the current session; when the session ends, the data disappears. In this case you don't need to drop anything. Every user inserts into the same table, and their data is not visible to others. Advantage: less maintenance.
You may check whether you have something similar in SQL Server.
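For reference, the Oracle version of that idea looks roughly like this (names are placeholders); the table is created once, and each session only ever sees its own rows:

CREATE GLOBAL TEMPORARY TABLE report_page (
    id         NUMBER,
    order_date DATE,
    total      NUMBER
) ON COMMIT PRESERVE ROWS;  -- keep rows for the whole session, not just one transaction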

Best practices for writing SQL scripts for deployment

I was wondering what the best practices are for writing SQL scripts to set up databases for production and/or development, for instance:
Should I include the CREATE DATABASE statement?
Should I create users for the database in the same script?
Is it correct to disable FK checks before executing the body of the script?
May I include the whole script in a transaction?
Is it better to generate one script per database than one script for all of them?
Thanks!
The problem with your question is that it's hard to answer, as it depends on how the scripts are used and what you are trying to achieve. You also don't say which DB server you are using, as there are tools provided that can make some tasks easier.
Taking your points in order, here are some suggestions, which will probably be very different from everyone else's :)
Should I include the CREATE DATABASE statement?
What alternative are you thinking of using? If your question is whether you should put the CREATE DATABASE statement in the same script as the table creation, it depends. When developing a DB I use a separate create-DB script, as I have a script to drop all objects and so I don't need to create the database again.
Should I create users for the database in the same script?
I wouldn't, simply because the users may well change while your schema does not. You might as well manage those changes in a smaller script.
Is it correct to disable FK checks before executing the body of the script?
If you are importing the data in an attempt to recover the database, then you may well have to, if you are using auto-increment IDs and want to keep the same values. Also, you may end up importing the tables "out of order" and not want checks performed.
May I include the whole script in a transaction?
Yes, you can, but again it depends on the type of script you are running. If you are importing data after rebuilding a DB, then the whole import should work or fail. However, your transaction log is going to be huge during the import.
Is it better to generate one script per database than one script for all of them?
Again, for maintenance purposes it's probably better to keep them separate.
This probably depends on what kind of database it is and how it is used and deployed. I am developing an n-tier standard application that is deployed at many different customer sites.
I do not add a CREATE DATABASE statement in the script. Creating the database is part of the installation script, which allows the user to choose the server, database name and collation.
I have no knowledge of the users at my customers' sites, so I don't add CREATE USER statements; besides, the only user that needs access to the database is the one running the middle-tier application.
I do not disable FK checks. I need them to protect the consistency of the database, even if it is I who wrote the body scripts. I use FKs to catch my errors.
I do not include the entire script in one transaction. I require the users to take a backup of the DB before they run any upgrade scripts. For the creation of a new database there is nothing to protect, so running in a transaction is unnecessary. For upgrades there are sometimes extensive changes to the DB. A couple of years ago we switched from varchar to nvarchar in about 250 tables. Not something you would like to do in one transaction.
I would recommend you to generate one script per database and version control the scripts separately.
Direct answers; please ask if you need me to expand on any point.
* Should I include the CREATE DATABASE statement?
Normally I would include it since you are creating and owning the database.
* Should I create users for the database in the same script?
This is also a good idea, especially if your application uses specific users.
* Is it correct to disable FK checks before executing the body of the script?
If the script includes data population, then it helps to disable them so that the order is not too important; otherwise you can end up with complex scripts that insert the row (without the FK value), create the FK record, then update the FK column (see the sketch after this list).
* May I include the whole script in a transaction?
This is normally not a good idea, especially if data population is included, as the transaction can become quite large and unwieldy. Since you are creating the database, just drop it and start again if something goes awry.
* Is it better to generate one script per database than one script for all of them?
One per database is my recommendation so that they are isolated and easier to troubleshoot if the need arises.
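On the FK point above, a minimal sketch assuming SQL Server (table names are placeholders; other DBMSs use different mechanisms):

-- disable FK checking on the child table while loading data out of order
ALTER TABLE dbo.OrderLines NOCHECK CONSTRAINT ALL;

-- ... data population scripts run here ...

-- re-enable and re-validate the constraints afterwards
ALTER TABLE dbo.OrderLines WITH CHECK CHECK CONSTRAINT ALL;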
For development purposes it's a good idea to create one script per database object (one script for each table, stored procedure, etc). If you check them into your source control system that way then developers can check out individual objects and you can easily keep track of versions and know what changed and when.
When you deploy you may want to combine the changes for each release into one single script. Tools like Red Gate SQL compare or Visual Studio Team System will help you do that.
Should I include the CREATE DATABASE statement?
Should I create users for the database in the same script?
That depends on your DBMS and your customer.
In an Oracle environment you will probably never be allowed to do such a thing (mainly because in the Oracle world a "database" is something completely different than e.g. in the PostgreSQL or MySQL world).
Sometimes the customer will have a DBA that won't let you create databases (or schemas or users - depending on the DBMS in use). So you will need to supply that information to the DBA in order for him/her to prepare the environment for your script.
May I include the whole script in a transaction?
That totally depends on the DBMS that you are using.
Some DBMS don't support transactional DDL and will implicitly commit any open transaction when you execute a DDL statement, so you need to consider the order of your installation script.
For populating the tables with data I would definitely try to do that in a single transaction, but again this depends on your DBMS.
Some DBMS are faster if you commit only once or very seldom (Oracle and PostgreSQL fall into this category) but will slow down if you commit more often.
Other DBMS handle smaller but more frequent transactions better and will slow down if the transactions get too big (SQL Server and MySQL tend to lean in that direction).
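As a small sketch of the single-transaction data population (assuming PostgreSQL, whose transactional DDL would even allow the table creation to be included; table names are placeholders):

BEGIN;

-- populate lookup tables; if any statement fails, nothing is committed
INSERT INTO country (code, name) VALUES ('DE', 'Germany'), ('FR', 'France');
INSERT INTO currency (code, name) VALUES ('EUR', 'Euro'), ('USD', 'US Dollar');

COMMIT;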
Best practices will differ considerably depending on whether it is a first-time set-up or a new version being pushed. For the first-time set-up, yes, you need the create database and create table scripts. For a new version, you need to script only the changes from the previous version, so no create database and no create table unless it is a new table. Now you need ALTER TABLE statements, because you don't want to lose the existing data. I usually write stored procs, functions and views with a drop-and-create statement, as dropping those objects doesn't generally affect the underlying data.
I find it best to create all database changes with scripts that are stored in source control under the version. So if a client is new, you run the version 1.0 create scripts, then apply all the other versions in order. If a client is just upgrading from version 1.2 to version 1.3, then you run just the scripts in the version 1.3 source control repository. This would also include scripts to populate or add records to lookup tables.
For transactions you may want to break them up into several chunks so as not to leave a prod database locked in one long transaction.
We also write reversal scripts to return to the old version if need be. This makes life easier if you have a part of a change that causes unanticipated problems on prod (usually performance issues).