What is the difference between \l and \d in PostgreSQL?

I would also like to know the details of the template0 and template1 databases.

\l
Lists the databases. \list is the long form of the same command. See the documentation.
\d
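stands for "describe" and lists the tables and other relations in the database you are currently connected to. So you first need to switch to the database of interest, for example by running
\c template1
(or \c yourdb) and then run
\d
to list its relations. Without a pattern, \d shows all visible tables, views, materialized views, sequences and foreign tables, i.e. those in the schemas on your search_path; system objects are left out unless you use \dS.
Note that template0 does not accept connections by default, so \c template0 will normally fail. As for the two templates: CREATE DATABASE works by copying template1 unless told otherwise, so anything added to template1 shows up in every new database. template0 holds the pristine initial contents, should never be modified, and is the template to use when you need a truly empty database (for example, to restore a pg_dump into).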
To see the structure of a particular table, run \d tablename
Running
\dt+ *.*
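will display all tables in all schemas (including the system catalogs), together with extra details such as size and description.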

Related

pgAdmin 4 display source with/without schema

This is possibly more a database configuration problem than a pgAdmin 4 one.
I work with two databases, dev and test.
When I view common objects, one pgAdmin connection displays tables and other objects with a schema prefix and the other does not. For example, comparing two lines from a SELECT ... FROM statement (cf. Schema Diff):
FROM df.pipe <-> FROM pipe
Can anyone point me in the right direction to fix this, i.e. make both always show the schema prefix?

Incrementally importing data to a PostgreSQL database

Situation:
I have a PostgreSQL-database that is logging data from sensors in a field-deployed unit (let's call this the source database). The unit has a very limited hard-disk space, meaning that if left untouched, the data-logging will cause the disk where the database is residing to fill up within a week. I have a (very limited) network link to the database (so I want to compress the dump-file), and on the other side of said link I have another PostgreSQL database (let's call that the destination database) that has a lot of free space (let's just, for argument's sake, say that the source is very limited with regard to space, and the destination is unlimited with regard to space).
I need to take incremental backups of the source database, append the rows that have been added since last backup to the destination database, and then clean out the added rows from the source database.
Now the source database might or might not have been cleaned since a backup was last taken, so the destination database needs to be able to import only the new rows in an automated (scripted) process, but pg_restore fails miserably when trying to restore from a dump that has the same primary key numbers as the destination database.
So the question is:
What is the best way to restore only the rows from a source that are not already in the destination database?
The only solution that I've come up with so far is to pg_dump the database and restore the dump to a new secondary-database on the destination-side with pg_restore, then use simple sql to sort out which rows already exist in my main-destination database. But it seems like there should be a better way...
(extra question: Am I completely wrong in using PostgreSQL in such an application? I'm open to suggestions for other data-collection alternatives...)
A good way to start would probably be to use the --inserts option to pg_dump. From the documentation:
Dump data as INSERT commands (rather than COPY). This will make
restoration very slow; it is mainly useful for making dumps that can
be loaded into non-PostgreSQL databases. However, since this option
generates a separate command for each row, an error in reloading a row
causes only that row to be lost rather than the entire table contents.
Note that the restore might fail altogether if you have rearranged
column order. The --column-inserts option is safe against column order
changes, though even slower.
I don't have the means to test it right now with pg_restore, but this might be enough for your case.
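You could also use the fact that since version 9.5 PostgreSQL supports INSERT ... ON CONFLICT DO ... (for example ON CONFLICT DO NOTHING). Use a simple scripting language to add this clause to the INSERT statements in the dump and you should be fine. Older versions of pg_dump have no option to add it automatically; from PostgreSQL 12 on, pg_dump provides --on-conflict-do-nothing, which works together with --inserts or --column-inserts.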
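A rough sketch of that scripting step, assuming a plain-text, data-only dump made with pg_dump --data-only --inserts, in which every row is a single-line INSERT INTO ... VALUES (...); statement. The helper names are hypothetical. Statements that span several lines (text values containing line breaks, or multi-row INSERTs) do not match and are passed through unchanged, so check your dump before relying on this. The rewritten file is plain SQL and is loaded with psql -f, not pg_restore.

```python
import re
import sys

# Hypothetical helper for a plain-text dump made with
#   pg_dump --data-only --inserts ...
# Assumption: every row is a one-line statement such as
#   INSERT INTO public.topic VALUES (1, '2016-01-01 00:00:00+00', 'abc');
# Multi-line statements (text values containing line breaks, or
# --rows-per-insert output) do not match and are passed through unchanged.
INSERT_LINE = re.compile(r"^(INSERT INTO .*\));\s*$")


def add_on_conflict(line: str) -> str:
    """Turn a one-line INSERT into INSERT ... ON CONFLICT DO NOTHING."""
    body = line.rstrip("\r\n")
    line_ending = line[len(body):]  # keep the original line ending
    match = INSERT_LINE.match(body)
    if match is None:
        return line  # not a one-line INSERT (or already rewritten): leave it
    return match.group(1) + " ON CONFLICT DO NOTHING;" + line_ending


def rewrite_dump(lines):
    """Yield the dump lines with every one-line INSERT made conflict-tolerant."""
    for line in lines:
        yield add_on_conflict(line)


def rewrite_file(src_path: str, dst_path: str) -> None:
    """Rewrite a dump file; load the result with psql -f, not pg_restore."""
    with open(src_path, encoding="utf-8") as src, \
         open(dst_path, "w", encoding="utf-8") as dst:
        dst.writelines(rewrite_dump(src))


if __name__ == "__main__" and len(sys.argv) == 3:
    # Usage: python add_on_conflict.py dump.sql dump_on_conflict.sql
    rewrite_file(sys.argv[1], sys.argv[2])
```

Because rows whose primary key already exists are now skipped instead of raising an error, the same dump can be replayed against the destination repeatedly.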
You might google "sporadically connected database synchronization" to see related solutions.
It's not a neatly solved problem as far as I know - there are some common work-arounds, but I am not aware of a database-centric out-of-the-box solution.
The most common way of dealing with this is to use a message bus to move events between your machines. For instance, if your "source database" is just a data store, with no other logic, you might get rid of it, and use a message bus to say "event x has occurred", and point the endpoint of that message bus at your "destination machine", which then writes that to your database.
You might consider Apache ActiveMQ, or read "Enterprise Integration Patterns".
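To make the message-bus idea concrete, here is a toy sketch in Python. It only illustrates the shape of the design. Python's in-process queue.Queue stands in for a real broker such as ActiveMQ, and an in-memory SQLite database stands in for the destination; both substitutions are assumptions. The hypothetical reading table and the producer/consumer names are illustrative. A real broker adds what this toy lacks: persistence and store-and-forward delivery across the unreliable link.

```python
import queue
import sqlite3
import threading

STOP = None  # sentinel: tells the consumer that no more events will come


def producer(bus, readings):
    """Sensor side: publish 'event x has occurred' instead of storing it locally."""
    for sensor_id, value in readings:
        bus.put({"sensor_id": sensor_id, "value": value})
    bus.put(STOP)


def consumer(bus, db):
    """Destination side: the endpoint of the bus writes each event to the database."""
    while True:
        event = bus.get()
        if event is STOP:
            break
        with db:  # one short transaction per event
            db.execute(
                "INSERT INTO reading (sensor_id, value) VALUES (?, ?)",
                (event["sensor_id"], event["value"]),
            )


def demo(readings):
    bus = queue.Queue()  # stand-in for a real broker such as ActiveMQ
    db = sqlite3.connect(":memory:")  # stand-in for the destination database
    db.execute("CREATE TABLE reading (sensor_id TEXT, value REAL)")
    sender = threading.Thread(target=producer, args=(bus, readings))
    sender.start()
    consumer(bus, db)  # runs in this thread, which owns the connection
    sender.join()
    return db.execute("SELECT sensor_id, value FROM reading ORDER BY rowid").fetchall()
```

With this design the field unit no longer keeps a database that has to be pruned. It only needs to buffer events until the broker has delivered them.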
#!/bin/sh
PSQL=/opt/postgres-9.5/bin/psql
TARGET_HOST=localhost
TARGET_DB=mystuff
TARGET_SCHEMA_IMPORT=copied
TARGET_SCHEMA_FINAL=final
SOURCE_HOST=192.168.0.101
SOURCE_DB=slurpert
SOURCE_SCHEMA=public
########
create_local_stuff()
{
${PSQL} -h ${TARGET_HOST} -U postgres ${TARGET_DB} <<OMG0
CREATE SCHEMA IF NOT EXISTS ${TARGET_SCHEMA_IMPORT};
CREATE SCHEMA IF NOT EXISTS ${TARGET_SCHEMA_FINAL};
CREATE TABLE IF NOT EXISTS ${TARGET_SCHEMA_FINAL}.topic
( topic_id INTEGER NOT NULL PRIMARY KEY
, topic_date TIMESTAMP WITH TIME ZONE
, topic_body text
);
CREATE TABLE IF NOT EXISTS ${TARGET_SCHEMA_IMPORT}.tmp_topic
( topic_id INTEGER NOT NULL PRIMARY KEY
, topic_date TIMESTAMP WITH TIME ZONE
, topic_body text
);
OMG0
}
########
find_highest()
{
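${PSQL} -q -t -h ${TARGET_HOST} -U postgres ${TARGET_DB} <<OMG1
-- highest id already stored in the FINAL table (0 if it is still empty);
-- the staging table is wiped on every run, so it is not a reliable watermark
SELECT COALESCE(MAX(topic_id), 0) FROM ${TARGET_SCHEMA_FINAL}.topic;
OMG1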
}
########
fetch_new_data()
{
watermark=${1-0}
echo ${watermark}
${PSQL} -h ${SOURCE_HOST} -U postgres ${SOURCE_DB} <<OMG2
\COPY (SELECT topic_id, topic_date, topic_body FROM ${SOURCE_SCHEMA}.topic WHERE topic_id >${watermark}) TO '/tmp/topic.dat';
OMG2
}
########
insert_new_data()
{
${PSQL} -h ${TARGET_HOST} -U postgres ${TARGET_DB} <<OMG3
DELETE FROM ${TARGET_SCHEMA_IMPORT}.tmp_topic WHERE 1=1;
-- client-side copy: read the file on this machine, where fetch_new_data wrote it
\COPY ${TARGET_SCHEMA_IMPORT}.tmp_topic(topic_id, topic_date, topic_body) FROM '/tmp/topic.dat'
INSERT INTO ${TARGET_SCHEMA_FINAL}.topic(topic_id, topic_date, topic_body)
SELECT topic_id, topic_date, topic_body
FROM ${TARGET_SCHEMA_IMPORT}.tmp_topic src
WHERE NOT EXISTS (
SELECT *
FROM ${TARGET_SCHEMA_FINAL}.topic nx
WHERE nx.topic_id = src.topic_id
);
OMG3
}
########
delete_below_watermark()
{
watermark=${1-0}
echo ${watermark}
${PSQL} -h ${SOURCE_HOST} -U postgres ${SOURCE_DB} <<OMG4
-- delete not yet activated; COUNT(*) instead
-- DELETE
SELECT COUNT(*)
FROM ${SOURCE_SCHEMA}.topic WHERE topic_id <= ${watermark}
;
OMG4
}
######## Main
#create_local_stuff
watermark="`find_highest`"
echo 'Highest:' ${watermark}
fetch_new_data ${watermark}
insert_new_data
echo 'Delete below:' ${watermark}
delete_below_watermark ${watermark}
# Eof
This is just an example. Some notes:
I assume an ever-increasing serial PK for the table; in most cases it could also be a timestamp
for simplicity, all the queries are run as user postgres, you might need to change this
the watermark method will guarantee that only new records will be transmitted, minimising bandwidth usage
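the method is designed to be restartable: if the script crashes, re-running it should neither lose nor duplicate rows (the NOT EXISTS check skips rows that are already present)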
only one table is fetched here, but you could add more
because I'm paranoid, I use a different name for the staging table and put it in a separate schema
The whole script runs two queries on the remote machine (one to fetch, one to delete); you could combine these. Either way, only one script is involved, executing on the local (target) machine.
The DELETE is not yet active; it only does a count(*)

Merging two databases

Update: where the requirement comes from.
My friend uses Mnemosyne (http://mnemosyne-proj.org/), a Python program that uses SQLite as its database. The issue is that the mobile version works with only one database file, and my friend already has several. So he asked me whether I can merge two databases.
So! I have two sqlite db files with same schema but different data.
Is there an automated way to include data from one file into another? I just need to insert additional values into the dictionary tables and correctly insert values into the other tables based on the new ids.
Unfortunately there are no foreign keys defined so I need probably first specify columns/tables relationship. But in general, if I solve relationship issue, is it possible to merge dbs?
You can open the database you want to merge into, then attach the other database.
ATTACH DATABASE 'foo.database' AS foo;
Then you can access the other database's tables by prefixing it with the database's name and a dot:
INSERT INTO bar (baz) SELECT baz FROM foo.bar;
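A sketch of the whole approach (attach, then insert with id remapping) using Python's built-in sqlite3 module. The schema is hypothetical: a dictionary table category(id, name) with unique names, and a dependent table card whose category_id points at category.id without a declared foreign key, as in the question. As the question anticipates, the relationship has to be known up front. Here the category name serves as the natural key used to translate ids, and the cards' own ids are reassigned by the target file.

```python
import sqlite3

# Hypothetical schema shared by both files: a dictionary table plus a table
# that refers to it by id (no FOREIGN KEY declared, as in the question).
SCHEMA = """
CREATE TABLE IF NOT EXISTS category (id INTEGER PRIMARY KEY, name TEXT UNIQUE);
CREATE TABLE IF NOT EXISTS card (
    id INTEGER PRIMARY KEY,
    category_id INTEGER,  -- logically references category.id
    question TEXT
);
"""


def merge_into(target_path, other_path):
    """Copy rows from other_path into target_path, remapping category ids."""
    conn = sqlite3.connect(target_path)
    try:
        # ATTACH must run outside a transaction.
        conn.execute("ATTACH DATABASE ? AS other", (other_path,))
        with conn:  # both INSERTs commit together or not at all
            # 1. Dictionary table: add only the names the target lacks;
            #    the target assigns fresh ids to the new rows.
            conn.execute(
                "INSERT INTO main.category (name)"
                " SELECT o.name FROM other.category AS o"
                " WHERE NOT EXISTS"
                " (SELECT 1 FROM main.category AS m WHERE m.name = o.name)"
            )
            # 2. Dependent table: translate other's category_id into the
            #    target's id via the shared name; card ids are reassigned too.
            #    Cards whose category_id has no matching category are skipped.
            conn.execute(
                "INSERT INTO main.card (category_id, question)"
                " SELECT m.id, oc.question"
                " FROM other.card AS oc"
                " JOIN other.category AS o ON o.id = oc.category_id"
                " JOIN main.category AS m ON m.name = o.name"
            )
        conn.execute("DETACH DATABASE other")
    finally:
        conn.close()
```

The dictionary tables must be merged before the tables that refer to them. Otherwise the join in step 2 has nothing to map the old ids onto.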
You could try this:
sqlite3 bar.db ".dump t1" | grep -v "^CREATE" | sqlite3 foo.db
That will put the contents of table t1 from bar.db into table t1 in foo.db.

Create database explicitly before restoring to it?

When I set up my PostgreSQL server, one of the first things I will do is import a database from an external source. Which of the following is the right way to do it?
Create a database called "NEWDB" on the PostgreSQL server and then import my external "BACKUPDB" database from my pg_dump into "NEWDB".
Don't create a database on the PostgreSQL server, and import the "NEWDB" database, thereby automatically creating "NEWDB" on the PostgreSQL server.
I guess my question is, if I want to import an existing database to the PostgreSQL server, do I first need to create a database for it to go into?
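It depends on how the dump was made and how you restore it. A single database dumped with plain pg_dump does not include CREATE DATABASE or ALTER DATABASE commands; you are expected to connect to an existing database, so in that case you have to create it first. (With pg_dump -C / --create, or pg_restore -C, the database is created for you, and you start the restore from another database such as postgres.)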
I quote advice from the manual:
If your database cluster has any local additions to the template1
database, be careful to restore the output of pg_dump into a truly
empty database; otherwise you are likely to get errors due to
duplicate definitions of the added objects. To make an empty database
without any local additions, copy from template0 not template1, for
example:
CREATE DATABASE foo WITH TEMPLATE template0;
And also:
The dump file also does not contain any ALTER DATABASE ... SET
commands; these settings are dumped by pg_dumpall, along with database
users and other installation-wide settings.
pg_dumpall, on the other hand, dumps the whole DB cluster including meta-objects like users. It includes CREATE DATABASE statements and connects to each DB when restoring. You can even include DROP DATABASE statements with the -c (--clean) option. Careful with that.
Every instance of PostgreSQL has a default maintenance db named "postgres" that you can connect to - to create databases for instance or start a full restore (from pg_dumpall). But a single-DB dump (from pg_dump) has to be run against its target database.
Finally:
Once restored, it is wise to run ANALYZE on each database so the
optimizer has useful statistics. You can also run vacuumdb -a -z to
analyze all databases.
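The manual's advice above boils down to three steps for a single-database dump: create an empty database from template0, load the dump, then analyze. A small Python sketch that only assembles those command lines and prints them by default (dry run). The helper names restore_plan/run are hypothetical, and connection options such as -h and -U are left out and would need adding for a real server.

```python
import shlex
import subprocess


def restore_plan(dbname, dump_path, plain_sql=False):
    """Hypothetical helper: argv lists for restoring one pg_dump into a new database."""
    # A single-database dump (made without -C) has no CREATE DATABASE, so create
    # the target first; template0 avoids duplicate-definition errors caused by
    # local additions to template1.
    create = ["createdb", "--template=template0", dbname]
    if plain_sql:
        # plain-text dumps (pg_dump's default format) are replayed with psql
        load = ["psql", "--dbname", dbname, "--file", dump_path]
    else:
        # custom/directory/tar archives are restored with pg_restore
        load = ["pg_restore", "--dbname", dbname, dump_path]
    # Give the planner fresh statistics after the load.
    analyze = ["vacuumdb", "--analyze", "--dbname", dbname]
    return [create, load, analyze]


def run(plan, dry_run=True):
    """Print the commands (default) or execute them, stopping at the first failure."""
    for argv in plan:
        if dry_run:
            print(shlex.join(argv))
        else:
            subprocess.run(argv, check=True)
```

For example, run(restore_plan("newdb", "backup.dump")) prints the three commands. Passing dry_run=False executes them, and check=True stops the sequence if any step fails.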

How to read schema of a PostgreSQL database

I installed an application that uses a PostgreSQL server, but I don't know the name of the database or the tables it uses. Is there a command to see the name of the database and the tables of this application?
If you are able to view the database using the psql terminal command:
> psql -h hostname -U username dbname
...then, in the psql shell, \d ("describe") will show you a list of all the relations in the database. You can use \d on specific relations as well, e.g.
db_name=# \d table_name
Table "public.table_name"
Column | Type | Modifiers
---------------+---------+-----------
id | integer | not null
... etc ...
Using psql on Linux, you can use the \l command to list databases, \c dbname to connect to a database and \d to list the tables in it.
Short answer: connect to the default database with psql and list all databases with '\l'.
Then connect to your database of interest and list its tables with '\dt'.
Slightly longer answer: a PostgreSQL server installation usually has one "data directory" (it can have more than one if two server instances are running, but that's quite unusual), which defines what PostgreSQL calls "a cluster". Inside it you can have several databases; you usually have at least the defaults 'postgres', 'template0' and 'template1', plus your own database(s).