Exporting from one schema and importing to another with pg_dump

Exporting from one schema and importing to another with pg_dump - sql

I have a table called units, which exists in two separate schemas within the same database (we'll call them old_schema, and new_schema). The structure of the table in both schemas are identical. The only difference is that the units table in new_schema is presently empty.
I am attempting to export the data from this table in old_schema and import it into new_schema. I used pg_dump to handle the export, like so:
pg_dump -U username -p 5432 my_database -t old_schema.units -a > units.sql
I then attempted to import it using the following:
psql -U username -p 5432 my_database -f units.sql
Unfortunately, this appeared to try and reinsert back in to the old_schema. Looking at the generated sql file, it seems there is a line, which I think is causing this:
SET search_path = mysql_migration, pg_catalog;
I can, in fact, alter this line to read
SET search_path = public;
And this does prove successful, but I don't believe this is the "correct" way to accomplish this.
Question: When importing data via a script generated through pg_dump, how can I specify in to which schema the data should go without altering the generated file?

There are two main issues here based on the scenario you described.
The difference in the schemas, to which you alluded.
The fact that by dumping the whole table via pg_dump, you're dumping the table definition also, which will cause issues if the table is already present in the destination schema.
To dump only the data, if the table already exists in the destination database (which appears to be the case based on your scenario above), you can dump the table using pg_dump with the --data-only flag.
Then, to address the schema issue, I would recommend doing a search/replace (sed would be a quick way to do it) on the output sql file, replacing old_schema with new_schema.
That way, it will apply the data (which is all that would be in the file, not the table definition itself) to the table in new_schema.
If you need a solution on a broader level to support, say, dynamically named schemas, you can use the same search/replace trick with sed, but instead of replacing it with new_schema, replace it with some placeholder text, say, $$placeholder_schema$$ (something highly unlikely to appear as as token elsewhere in the file), and then, when you need to apply that file to a particular schema, use the original file as a template, copy it, and then modify the copy using sed or similar, replacing the placeholder token with the desired on-the-fly schema name.
You can set some options for psql on the command line, such as --set AUTOCOMMIT=off, however, a similar approach with SEARCH_PATH does not appear to have any effect.
Instead, it needs the form \set SEARCH_PATH to <path>, which can be specified with the -c option, but not in combination with -f (it's either or).
Given that, I think modifying the file with sed is probably the best all around option in this case for use with -f.

Related

[pg_dump]Extract only tables and views from a schema

I am trying to extract the ddl of all the tables and views which are present in a schema of a postgres db.
I am able to export, but in the export it is also including create function and other objects.
Is there any way using which we can only extract tables and views from a schema? Either via limiting the access to objects or by editting a file?
Thanks for the help!!

some options maybe redundant.
pg_dump --dbname=test15 --schema=public --schema-only -N '*catalog*' -t 'public.*' > test.sql
main gottcha:
-t pattern
--table=pattern
Dump only tables with names matching pattern.
The manual have many examples. You can follow it through.

Incrementally importing data to a PostgreSQL database

Situation:
I have a PostgreSQL-database that is logging data from sensors in a field-deployed unit (let's call this the source database). The unit has a very limited hard-disk space, meaning that if left untouched, the data-logging will cause the disk where the database is residing to fill up within a week. I have a (very limited) network link to the database (so I want to compress the dump-file), and on the other side of said link I have another PostgreSQL database (let's call that the destination database) that has a lot of free space (let's just, for argument's sake, say that the source is very limited with regard to space, and the destination is unlimited with regard to space).
I need to take incremental backups of the source database, append the rows that have been added since last backup to the destination database, and then clean out the added rows from the source database.
Now the source database might or might not have been cleaned since a backup was last taken, so the destination database needs to be able to only imported the new rows in an automated (scripted) process, but pg_restore fails miserably when trying to restore from a dump that has the same primary key numbers as the destination database.
So the question is:
What is the best way to restore only the rows from a source that are not already in the destination database?
The only solution that I've come up with so far is to pg_dump the database and restore the dump to a new secondary-database on the destination-side with pg_restore, then use simple sql to sort out which rows already exist in my main-destination database. But it seems like there should be a better way...
(extra question: Am I completely wrong in using PostgreSQL in such an application? I'm open to suggestions for other data-collection alternatives...)

A good way to start would probably be to use the --inserts option to pg_dump. From the documentation (emphasis mine) :
Dump data as INSERT commands (rather than COPY). This will make
restoration very slow; it is mainly useful for making dumps that can
be loaded into non-PostgreSQL databases. However, since this option
generates a separate command for each row, an error in reloading a row
causes only that row to be lost rather than the entire table contents.
Note that the restore might fail altogether if you have rearranged
column order. The --column-inserts option is safe against column order
changes, though even slower.
I don't have the means to test it right now with pg_restore, but this might be enough for your case.
You could also use the fact that from the version 9.5, PostgreSQL provides ON CONFLICT DO ... for INSERTs. Use a simple scripting language to add these to the dump and you should be fine. I haven't found an option for pg_dump to add those automatically, unfortunately.

You might google "sporadically connected database synchronization" to see related solutions.
It's not a neatly solved problem as far as I know - there are some common work-arounds, but I am not aware of a database-centric out-of-the-box solution.
The most common way of dealing with this is to use a message bus to move events between your machines. For instance, if your "source database" is just a data store, with no other logic, you might get rid of it, and use a message bus to say "event x has occurred", and point the endpoint of that message bus at your "destination machine", which then writes that to your database.
You might consider Apache ActiveMQ or read "Patterns of enterprise integration".

#!/bin/sh
PSQL=/opt/postgres-9.5/bin/psql
TARGET_HOST=localhost
TARGET_DB=mystuff
TARGET_SCHEMA_IMPORT=copied
TARGET_SCHEMA_FINAL=final
SOURCE_HOST=192.168.0.101
SOURCE_DB=slurpert
SOURCE_SCHEMA=public
########
create_local_stuff()
{
${PSQL} -h ${TARGET_HOST} -U postgres ${TARGET_DB} <<OMG0
CREATE SCHEMA IF NOT EXISTS ${TARGET_SCHEMA_IMPORT};
CREATE SCHEMA IF NOT EXISTS ${TARGET_SCHEMA_FINAL};
CREATE TABLE IF NOT EXISTS ${TARGET_SCHEMA_FINAL}.topic
( topic_id INTEGER NOT NULL PRIMARY KEY
, topic_date TIMESTAMP WITH TIME ZONE
, topic_body text
);
CREATE TABLE IF NOT EXISTS ${TARGET_SCHEMA_IMPORT}.tmp_topic
( topic_id INTEGER NOT NULL PRIMARY KEY
, topic_date TIMESTAMP WITH TIME ZONE
, topic_body text
);
OMG0
}
########
find_highest()
{
${PSQL} -q -t -h ${TARGET_HOST} -U postgres ${TARGET_DB} <<OMG1
SELECT MAX(topic_id) FROM ${TARGET_SCHEMA_IMPORT}.tmp_topic;
OMG1
}
########
fetch_new_data()
{
watermark=${1-0}
echo ${watermark}
${PSQL} -h ${SOURCE_HOST} -U postgres ${SOURCE_DB} <<OMG2
\COPY (SELECT topic_id, topic_date, topic_body FROM ${SOURCE_SCHEMA}.topic WHERE topic_id >${watermark}) TO '/tmp/topic.dat';
OMG2
}
########
insert_new_data()
{
${PSQL} -h ${TARGET_HOST} -U postgres ${TARGET_DB} <<OMG3
DELETE FROM ${TARGET_SCHEMA_IMPORT}.tmp_topic WHERE 1=1;
COPY ${TARGET_SCHEMA_IMPORT}.tmp_topic(topic_id, topic_date, topic_body) FROM '/tmp/topic.dat';
INSERT INTO ${TARGET_SCHEMA_FINAL}.topic(topic_id, topic_date, topic_body)
SELECT topic_id, topic_date, topic_body
FROM ${TARGET_SCHEMA_IMPORT}.tmp_topic src
WHERE NOT EXISTS (
SELECT *
FROM ${TARGET_SCHEMA_FINAL}.topic nx
WHERE nx.topic_id = src.topic_id
);
OMG3
}
########
delete_below_watermark()
{
watermark=${1-0}
echo ${watermark}
${PSQL} -h ${SOURCE_HOST} -U postgres ${SOURCE_DB} <<OMG4
-- delete not yet activated; COUNT(*) instead
-- DELETE
SELECT COUNT(*)
FROM ${SOURCE_SCHEMA}.topic WHERE topic_id <= ${watermark}
;
OMG4
}
######## Main
#create_local_stuff
watermark="`find_highest`"
echo 'Highest:' ${watermark}
fetch_new_data ${watermark}
insert_new_data
echo 'Delete below:' ${watermark}
delete_below_watermark ${watermark}
# Eof
This is just an example. Some notes:
I assume a non-decreasing serial PK for the table; in most cases it could also be a timestamp
for simplicity, all the queries are run as user postgres, you might need to change this
the watermark method will guarantee that only new records will be transmitted, minimising bandwidth usage
the method is atomic, if the script crashes, nothing is lost
only one table is fetched here, but you could add more
because I'm paranoid, I us a different name for the staging table and put it into a separate schema
The whole script does two queries on the remote machine (one for fetch one for delete); you could combine these.
but there is only one script (executing from the local=target machine) involved.
The DELETE is not yet active; it only does a count(*)

Move table from server1 to server2

I have two Postgresql servers (Windows), and I am trying to transfer a table from server1 to server2. This table is around 200 MB size as it contains binary data.
I want to put the table into a usb stick and then move it to the second server. (assume the two servers are not connected by a LAN).
What is the simplest way to do that? Can you describe the way with command.

The easiest way would probably be to use pg_dump.
I haven't used it on Windows so I don't know the actual path to it, but it should be in the Postgres\bin directory and you need to execute it in a shell window (like PowerShell or CMD).
Assuming you have console access to each server, and that the table already exists in the second database:
pg_dump -a -b -Fc -t <tablename> <databasename> > <path to dump file>
Then when you have moved it to the new server.
pg_restore -a -Fc -d <databasename> <path to dump file>
If you don't have direct access to each server, then you need to add the connection parameters to each command:
-h <server> -U <username>
Quick description of the parameters:
-a : dumps only the data and not the schema definition. This should be removed if the table is not already in place on the new server
-b : dumps blobs. You mentioned there are binary data in the table, if they are stored as large objects, this parameter needs to be included, otherwise you can skip it.
-Fc : The format to dump the data as. c stands for Postgres custom format, which is better suited for moving binary data. You could change it to d to use a directory format since you're using 9.2, but I prefer the custom format still. d however is useful when dumping large databases since it stores each table in one file within the specified directory.
-t : Specifies that you want to dump a table and not the entire database.
-d : the database that you want to restore to (this parameter can be used in pg_dump as well, but not needed if specified as above)
There is a possibility that you need to add the -t parameter to the restore as well, but as far as I remember, it should not be necessary since you only have that table in the dump (however, if you had several tables in the dump, for instance if it is a complete dump of the database, this can be used to only restore parts of the database).

Save PL/pgSQL output from PostgreSQL to a CSV file

What is the easiest way to save PL/pgSQL output from a PostgreSQL database to a CSV file?
I'm using PostgreSQL 8.4 with pgAdmin III and PSQL plugin where I run queries from.

Do you want the resulting file on the server, or on the client?
Server side
If you want something easy to re-use or automate, you can use Postgresql's built in COPY command. e.g.
Copy (Select * From foo) To '/tmp/test.csv' With CSV DELIMITER ',' HEADER;
This approach runs entirely on the remote server - it can't write to your local PC. It also needs to be run as a Postgres "superuser" (normally called "root") because Postgres can't stop it doing nasty things with that machine's local filesystem.
That doesn't actually mean you have to be connected as a superuser (automating that would be a security risk of a different kind), because you can use the SECURITY DEFINER option to CREATE FUNCTION to make a function which runs as though you were a superuser.
The crucial part is that your function is there to perform additional checks, not just by-pass the security - so you could write a function which exports the exact data you need, or you could write something which can accept various options as long as they meet a strict whitelist. You need to check two things:
Which files should the user be allowed to read/write on disk? This might be a particular directory, for instance, and the filename might have to have a suitable prefix or extension.
Which tables should the user be able to read/write in the database? This would normally be defined by GRANTs in the database, but the function is now running as a superuser, so tables which would normally be "out of bounds" will be fully accessible. You probably don’t want to let someone invoke your function and add rows on the end of your “users” table…
I've written a blog post expanding on this approach, including some examples of functions that export (or import) files and tables meeting strict conditions.
Client side
The other approach is to do the file handling on the client side, i.e. in your application or script. The Postgres server doesn't need to know what file you're copying to, it just spits out the data and the client puts it somewhere.
The underlying syntax for this is the COPY TO STDOUT command, and graphical tools like pgAdmin will wrap it for you in a nice dialog.
The psql command-line client has a special "meta-command" called \copy, which takes all the same options as the "real" COPY, but is run inside the client:
\copy (Select * From foo) To '/tmp/test.csv' With CSV DELIMITER ',' HEADER
Note that there is no terminating ;, because meta-commands are terminated by newline, unlike SQL commands.
From the docs:
Do not confuse COPY with the psql instruction \copy. \copy invokes COPY FROM STDIN or COPY TO STDOUT, and then fetches/stores the data in a file accessible to the psql client. Thus, file accessibility and access rights depend on the client rather than the server when \copy is used.
Your application programming language may also have support for pushing or fetching the data, but you cannot generally use COPY FROM STDIN/TO STDOUT within a standard SQL statement, because there is no way of connecting the input/output stream. PHP's PostgreSQL handler (not PDO) includes very basic pg_copy_from and pg_copy_to functions which copy to/from a PHP array, which may not be efficient for large data sets.

There are several solutions:
1 psql command
psql -d dbname -t -A -F"," -c "select * from users" > output.csv
This has the big advantage that you can using it via SSH, like ssh postgres#host command - enabling you to get
2 postgres copy command
COPY (SELECT * from users) To '/tmp/output.csv' With CSV;
3 psql interactive (or not)
>psql dbname
psql>\f ','
psql>\a
psql>\o '/tmp/output.csv'
psql>SELECT * from users;
psql>\q
All of them can be used in scripts, but I prefer #1.
4 pgadmin but that's not scriptable.

In terminal (while connected to the db) set output to the cvs file
1) Set field seperator to ',':
\f ','
2) Set output format unaligned:
\a
3) Show only tuples:
\t
4) Set output:
\o '/tmp/yourOutputFile.csv'
5) Execute your query:
:select * from YOUR_TABLE
6) Output:
\o
You will then be able to find your csv file in this location:
cd /tmp
Copy it using the scp command or edit using nano:
nano /tmp/yourOutputFile.csv

CSV Export Unification
This information isn't really well represented. As this is the second time I've needed to derive this, I'll put this here to remind myself if nothing else.
Really the best way to do this (get CSV out of postgres) is to use the COPY ... TO STDOUT command. Though you don't want to do it the way shown in the answers here. The correct way to use the command is:
COPY (select id, name from groups) TO STDOUT WITH CSV HEADER
Remember just one command!
It's great for use over ssh:
$ ssh psqlserver.example.com 'psql -d mydb "COPY (select id, name from groups) TO STDOUT WITH CSV HEADER"' > groups.csv
It's great for use inside docker over ssh:
$ ssh pgserver.example.com 'docker exec -tu postgres postgres psql -d mydb -c "COPY groups TO STDOUT WITH CSV HEADER"' > groups.csv
It's even great on the local machine:
$ psql -d mydb -c 'COPY groups TO STDOUT WITH CSV HEADER' > groups.csv
Or inside docker on the local machine?:
docker exec -tu postgres postgres psql -d mydb -c 'COPY groups TO STDOUT WITH CSV HEADER' > groups.csv
Or on a kubernetes cluster, in docker, over HTTPS??:
kubectl exec -t postgres-2592991581-ws2td 'psql -d mydb -c "COPY groups TO STDOUT WITH CSV HEADER"' > groups.csv
So versatile, much commas!
Do you even?
Yes I did, here are my notes:
The COPYses
Using /copy effectively executes file operations on whatever system the psql command is running on, as the user who is executing it1. If you connect to a remote server, it's simple to copy data files on the system executing psql to/from the remote server.
COPY executes file operations on the server as the backend process user account (default postgres), file paths and permissions are checked and applied accordingly. If using TO STDOUT then file permissions checks are bypassed.
Both of these options require subsequent file movement if psql is not executing on the system where you want the resultant CSV to ultimately reside. This is the most likely case, in my experience, when you mostly work with remote servers.
It is more complex to configure something like a TCP/IP tunnel over ssh to a remote system for simple CSV output, but for other output formats (binary) it may be better to /copy over a tunneled connection, executing a local psql. In a similar vein, for large imports, moving the source file to the server and using COPY is probably the highest-performance option.
PSQL Parameters
With psql parameters you can format the output like CSV but there are downsides like having to remember to disable the pager and not getting headers:
$ psql -P pager=off -d mydb -t -A -F',' -c 'select * from groups;'
2,Technician,Test 2,,,t,,0,,
3,Truck,1,2017-10-02,,t,,0,,
4,Truck,2,2017-10-02,,t,,0,,
Other Tools
No, I just want to get CSV out of my server without compiling and/or installing a tool.

New version - psql 12 - will support --csv.
psql - devel
--csv
Switches to CSV (Comma-Separated Values) output mode. This is equivalent to \pset format csv.
csv_fieldsep
Specifies the field separator to be used in CSV output format. If the separator character appears in a field's value, that field is output within double quotes, following standard CSV rules. The default is a comma.
Usage:
psql -c "SELECT * FROM pg_catalog.pg_tables" --csv postgres
psql -c "SELECT * FROM pg_catalog.pg_tables" --csv -P csv_fieldsep='^' postgres
psql -c "SELECT * FROM pg_catalog.pg_tables" --csv postgres > output.csv

If you're interested in all the columns of a particular table along with headers, you can use
COPY table TO '/some_destdir/mycsv.csv' WITH CSV HEADER;
This is a tiny bit simpler than
COPY (SELECT * FROM table) TO '/some_destdir/mycsv.csv' WITH CSV HEADER;
which, to the best of my knowledge, are equivalent.

I had to use the \COPY because I received the error message:
ERROR: could not open file "/filepath/places.csv" for writing: Permission denied
So I used:
\Copy (Select address, zip From manjadata) To '/filepath/places.csv' With CSV;
and it is functioning

psql can do this for you:
edd#ron:~$ psql -d beancounter -t -A -F"," \
-c "select date, symbol, day_close " \
"from stockprices where symbol like 'I%' " \
"and date >= '2009-10-02'"
2009-10-02,IBM,119.02
2009-10-02,IEF,92.77
2009-10-02,IEV,37.05
2009-10-02,IJH,66.18
2009-10-02,IJR,50.33
2009-10-02,ILF,42.24
2009-10-02,INTC,18.97
2009-10-02,IP,21.39
edd#ron:~$
See man psql for help on the options used here.

I'm working on AWS Redshift, which does not support the COPY TO feature.
My BI tool supports tab-delimited CSVs though, so I used the following:
psql -h dblocation -p port -U user -d dbname -F $'\t' --no-align -c "SELECT * FROM TABLE" > outfile.csv

In pgAdmin III there is an option to export to file from the query window. In the main menu it's Query -> Execute to file or there's a button that does the same thing (it's a green triangle with a blue floppy disk as opposed to the plain green triangle which just runs the query). If you're not running the query from the query window then I'd do what IMSoP suggested and use the copy command.

I tried several things but few of them were able to give me the desired CSV with header details.
Here is what worked for me.
psql -d dbame -U username \
-c "COPY ( SELECT * FROM TABLE ) TO STDOUT WITH CSV HEADER " > \
OUTPUT_CSV_FILE.csv

I've written a little tool called psql2csv that encapsulates the COPY query TO STDOUT pattern, resulting in proper CSV. It's interface is similar to psql.
psql2csv [OPTIONS] < QUERY
psql2csv [OPTIONS] QUERY
The query is assumed to be the contents of STDIN, if present, or the last argument. All other arguments are forwarded to psql except for these:
-h, --help show help, then exit
--encoding=ENCODING use a different encoding than UTF8 (Excel likes LATIN1)
--no-header do not output a header

If you have longer query and you like to use psql then put your query to a file and use the following command:
psql -d my_db_name -t -A -F";" -f input-file.sql -o output-file.csv

To Download CSV file with column names as HEADER use this command:
Copy (Select * From tableName) To '/tmp/fileName.csv' With CSV HEADER;

Since Postgres 12, you can change the output format :
\pset format csv
The following formats are allowed :
aligned, asciidoc, csv, html, latex, latex-longtable, troff-ms, unaligned, wrapped
If you want to export the result of a request, you can use the \o filename feature.
Example :
\pset format csv
\o file.csv
SELECT * FROM table LIMIT 10;
\o
\pset format aligned

I found that psql --csv creates a CSV file with UTF8 characters but it is missing the UTF8 Byte Order Mark (0xEF 0xBB 0xBF). Without taking it into account, the default import of this CSV file will corrupt international characters such as CJK characters.
To fix it, I devised the following script:
# Define a connection to the Postgres database through environment variables
export PGHOST=your.pg.host
export PGPORT=5432
export PGDATABASE=your_pg_database
export PGUSER=your_pg_user
# Place credentials in $HOME/.pgpass with the format:
# ${PGHOST}:${PGPORT}:${PGUSER}:master:${PGPASSWORD}
# Populate long SQL query in a text file:
cat > /tmp/query.sql <<EOF
SELECT item.item_no,item_descrip,
invoice.invoice_no,invoice.sold_qty
FROM item
LEFT JOIN invoice
ON item.item_no=invoice.item_no;
EOF
# Generate CSV report with UTF8 BOM mark
printf '\xEF\xBB\xBF' > report.csv
psql -f /tmp/query.sql --csv | tee -a report.csv
Doing it this way, lets me script the CSV creation process for automation and allows me to succinctly maintain the script in a single source file.

import json
cursor = conn.cursor()
qry = """ SELECT details FROM test_csvfile """
cursor.execute(qry)
rows = cursor.fetchall()
value = json.dumps(rows)
with open("/home/asha/Desktop/Income_output.json","w+") as f:
f.write(value)
print 'Saved to File Successfully'

JackDB, a database client in your web browser, makes this really easy. Especially if you're on Heroku.
It lets you connect to remote databases and run SQL queries on them.
Source
(source: jackdb.com)
Once your DB is connected, you can run a query and export to CSV or TXT (see bottom right).
Note: I'm in no way affiliated with JackDB. I currently use their free services and think it's a great product.

Per the request of #skeller88, I am reposting my comment as an answer so that it doesn't get lost by people who don't read every response...
The problem with DataGrip is that it puts a grip on your wallet. It is not free. Try the community edition of DBeaver at dbeaver.io. It is a FOSS multi-platform database tool for SQL programmers, DBAs and analysts that supports all popular databases: MySQL, PostgreSQL, SQLite, Oracle, DB2, SQL Server, Sybase, MS Access, Teradata, Firebird, Hive, Presto, etc.
DBeaver Community Edition makes it trivial to connect to a database, issue queries to retrieve data, and then download the result set to save it to CSV, JSON, SQL, or other common data formats. It's a viable FOSS competitor to TOAD for Postgres, TOAD for SQL Server, or Toad for Oracle.
I have no affiliation with DBeaver. I love the price and functionality, but I wish they would open up the DBeaver/Eclipse application more and made it easy to add analytics widgets to DBeaver / Eclipse, rather than requiring users to pay for the annual subscription to create graphs and charts directly within the application. My Java coding skills are rusty and I don't feel like taking weeks to relearn how to build Eclipse widgets, only to find that DBeaver has disabled the ability to add third-party widgets to the DBeaver Community Edition.
Do DBeaver users have insight as to the steps to create analytics widgets to add into the Community Edition of DBeaver?

reload a .sql schema without restarting mysqld

Is it possible to reload a schema file without having to restart mysqld? I am working in just one db in a sea of many and would like to have my changes refreshed without doing a cold-restart.

When you say "reload a schema file", I assume you're referring to a file that has all the SQL statements defining your database schema? i.e. creating tables, views, stored procecures, etc.?
The solution is fairly simple - keep a file with all the SQL that creates the tables, etc. in a file, and before all the CREATE statements, add a DELETE/DROP statement to remove what's already there. Then when you want to do a reload, just do:
cat myschemafile.sql | mysql -u userid -p databasename

We Keep Coding

sql objective-c vba vb.net react-native apache vue.js tensorflow api pandas