Copy a table (including indexes) in postgres - sql

I have a Postgres table. I need to delete some data from it. I was going to create a temporary table, copy the data in, recreate the indexes, and then delete the rows I need. I can't delete data from the original table, because it is the source of data. In one case I need results that depend on deleting X; in another case I'll need to delete Y. So I need all the original data to always be around and available.
However, it seems a bit silly to recreate the table, copy the data again, and recreate the indexes. Is there any way in Postgres to tell it "I want a complete, separate copy of this table, including structure, data and indexes"?
Unfortunately PostgreSQL does not have a "CREATE TABLE ... LIKE x INCLUDING INDEXES".

Newer PostgreSQL (since 8.3, according to the docs) can use INCLUDING INDEXES:
# select version();
version
-------------------------------------------------------------------------------------------------
PostgreSQL 8.3.7 on x86_64-pc-linux-gnu, compiled by GCC cc (GCC) 4.2.4 (Ubuntu 4.2.4-1ubuntu3)
(1 row)
As you can see, I'm testing on 8.3.
Now, let's create a table:
# create table x1 (id serial primary key, x text unique);
NOTICE: CREATE TABLE will create implicit sequence "x1_id_seq" for serial column "x1.id"
NOTICE: CREATE TABLE / PRIMARY KEY will create implicit index "x1_pkey" for table "x1"
NOTICE: CREATE TABLE / UNIQUE will create implicit index "x1_x_key" for table "x1"
CREATE TABLE
And see how it looks:
# \d x1
Table "public.x1"
Column | Type | Modifiers
--------+---------+-------------------------------------------------
id | integer | not null default nextval('x1_id_seq'::regclass)
x | text |
Indexes:
"x1_pkey" PRIMARY KEY, btree (id)
"x1_x_key" UNIQUE, btree (x)
Now we can copy the structure:
# create table x2 ( like x1 INCLUDING DEFAULTS INCLUDING CONSTRAINTS INCLUDING INDEXES );
NOTICE: CREATE TABLE / PRIMARY KEY will create implicit index "x2_pkey" for table "x2"
NOTICE: CREATE TABLE / UNIQUE will create implicit index "x2_x_key" for table "x2"
CREATE TABLE
And check the structure:
# \d x2
Table "public.x2"
Column | Type | Modifiers
--------+---------+-------------------------------------------------
id | integer | not null default nextval('x1_id_seq'::regclass)
x | text |
Indexes:
"x2_pkey" PRIMARY KEY, btree (id)
"x2_x_key" UNIQUE, btree (x)
If you are using PostgreSQL pre-8.3, you can simply use pg_dump with the "-t" option to specify one table, change the table name in the dump, and load it again:
=> pg_dump -t x2 | sed 's/x2/x3/g' | psql
SET
SET
SET
SET
SET
SET
SET
SET
CREATE TABLE
ALTER TABLE
ALTER TABLE
ALTER TABLE
And now the table is:
# \d x3
Table "public.x3"
Column | Type | Modifiers
--------+---------+-------------------------------------------------
id | integer | not null default nextval('x1_id_seq'::regclass)
x | text |
Indexes:
"x3_pkey" PRIMARY KEY, btree (id)
"x3_x_key" UNIQUE, btree (x)

CREATE [ [ GLOBAL | LOCAL ] { TEMPORARY | TEMP } ] TABLE table_name
[ (column_name [, ...] ) ]
[ WITH ( storage_parameter [= value] [, ... ] ) | WITH OIDS | WITHOUT OIDS ]
[ ON COMMIT { PRESERVE ROWS | DELETE ROWS | DROP } ]
[ TABLESPACE tablespace ]
AS query
Here is an example:
CREATE TABLE films_recent AS
SELECT * FROM films WHERE date_prod >= '2002-01-01';
The other way to create a new table from the first is to use
CREATE TABLE films_recent (LIKE films INCLUDING INDEXES);
INSERT INTO films_recent
SELECT *
FROM films
WHERE date_prod >= '2002-01-01';
Note that PostgreSQL has a patch out to fix tablespace issues if the second method is used.

There are many answers on the web; one of them can be found here.
I ended up doing something like this:
create table NEW ( like ORIGINAL including all);
insert into NEW select * from ORIGINAL;
This will copy the schema and the data, including indexes, but not triggers or foreign-key constraints.
Note that the sequence is shared with the original table, so adding a new row to either table will increment the shared counter.
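If the shared counter is a problem, you can give the copy its own sequence afterwards; a minimal sketch, assuming an id column whose copied default still points at the original serial sequence:
create sequence new_id_seq;                                         -- dedicated sequence for the copy
select setval('new_id_seq', (select max(id) from NEW));             -- start it after the copied rows
alter table NEW alter column id set default nextval('new_id_seq');
alter sequence new_id_seq owned by NEW.id;                          -- drop it together with the table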

I have a postgres table. I need to
delete some data from it.
I presume that ...
delete from yourtable
where <condition(s)>
... won't work for some reason. (Care to share that reason?)
I was going to create a temporary
table, copy the data in, recreate the
indexes and the delete the rows I
need.
Look into pg_dump and pg_restore. Using pg_dump with some clever options and perhaps editing the output before pg_restoring might do the trick.
Since you are doing "what if"-type analysis on the data, I wonder if you might be better off using views.
You could define a view for each scenario you want to test based on the negation of what you want to exclude. I.e., define a view based on what you want to INclude. E.g., if you want a "window" on the data where you "deleted" the rows where X=Y, then you would create a view as rows where (X != Y).
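For instance, a minimal sketch with hypothetical table and column names (mytable, x, y):
create view scenario_x_ne_y as
select * from mytable where x != y;   -- only the rows the scenario does NOT "delete"
Querying scenario_x_ne_y then behaves as if the X=Y rows had been deleted, while mytable keeps all the data.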
Views are stored in the database (in the System Catalog) as their defining query. Every time you query the view the database server looks up the underlying query that defines it and executes that (ANDed with any other conditions you used). There are several benefits to this approach:
You never duplicate any portion of your data.
The indexes already in use for the base table (your original, "real" table) will be used (as seen fit by the query optimizer) when you query each view/scenario. There is no need to redefine or copy them.
Since a view is a "window" (NOT a snapshot) on the "real" data in the base table, you can add/update/delete on your base table and simply re-query the view scenarios with no need to recreate anything as the data changes over time.
There is a trade-off, of course. Since a view is a virtual table and not a "real" (base) table, you're actually executing a (perhaps complex) query every time you access it. This may slow things down a bit. But it may not. It depends on many issues (size and nature of the data, quality of the statistics in the System Catalog, speed of the hardware, usage load, and much more). You won't know until you try it. If (and only if) you actually find that the performance is unacceptably slow, then you might look at other options. (Materialized views, copies of tables, ... anything that trades space for time.)
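For example, on PostgreSQL 9.3+ a materialized view gives you exactly that space-for-time trade; a sketch reusing the same hypothetical names:
create materialized view scenario_x_ne_y_mat as
select * from mytable where x != y;
-- the snapshot does not track the base table; refresh it when the data changes:
refresh materialized view scenario_x_ne_y_mat;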

A simple way is to use INCLUDING ALL:
CREATE TABLE new_table (LIKE original_table INCLUDING ALL);
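That copies only the structure; to copy the data too, follow it with an INSERT, as in the earlier answer:
INSERT INTO new_table SELECT * FROM original_table;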

Create a new table using a SELECT to grab the data you want. Then swap the old table with the new one.
create table mynewone as select * from myoldone where ...
Re-create the indexes after the table swap.
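A minimal sketch of the swap itself, using the names above; ALTER TABLE ... RENAME is transactional in Postgres, so other sessions never see the table missing:
begin;
alter table myoldone rename to myoldone_backup;
alter table mynewone rename to myoldone;
commit;
-- then re-create the indexes on the new myoldone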

Related

How can I copy a Redshift table but add a sortkey to a column?

I'm currently working on a project that uses a Redshift table with 51 columns. However, the person who made the table forgot to add a sortkey to our time column, which will hurt performance for our use case if we don't add it.
How can I make a version of the table with our time column as the sortkey? I'm aware that you can't make a column a sortkey if it's a member of an existing table, but I was hoping there's a way to do it that doesn't involve writing out the CREATE TABLE syntax by hand; for example, something like this would be nice:
timecube=# CREATE TABLE foo (like bar) sortkey(time);
ERROR: CREATE TABLE LIKE is not supported with DISTSTYLE, DISTKEY(), or SORTKEY() clauses
but as you can see it's not supported. Is there another way? As we're still developing, we don't need any of the existing data.
Using traditional tools like pg_dump didn't work well because they don't include any of the Redshift extras like encoding.
Redshift supports specifying the DIST and SORT keys as part of CREATE TABLE AS statements, as per the docs.
CREATE TABLE table_name
DISTSTYLE KEY
DISTKEY ( column )
SORTKEY ( column )
AS
(SELECT *
FROM source_table)
;
The first step is to get the CREATE TABLE statement for the existing table. Then create the new table, this time adding the sort key.
Check the encodings on the old table (when you load data with the COPY command, it automatically adds compression encodings):
select "column", type, encoding
from pg_table_def where tablename = 'old_table';
When creating the new table, add the encoding type for each column and create the table with the sort key.
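A hypothetical sketch of that CREATE TABLE, with made-up column names and encodings standing in for whatever pg_table_def reported:
CREATE TABLE new_table (
    id    BIGINT       ENCODE lzo,
    time  TIMESTAMP    ENCODE raw,   -- sort key columns are usually left unencoded (raw)
    label VARCHAR(64)  ENCODE lzo
)
SORTKEY (time);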
Once the new table is created, use the command below:
insert into new_table (select * from old_table order by time asc);

How Do I Deep Copy a Set of Data, and Change FK References to Point to All the Copies?

Suppose I have Table A and Table B. Table B references Table A. I want to deep copy a set of rows in Table A and Table B. I want all of the new Table B rows to reference the new Table A rows.
Note that I'm not copying the rows into any other tables. The rows in table A will be copied into table A, and the rows in table B will be copied into table B.
How can I ensure that the foreign key references get readjusted as part of the copy?
To clarify, I'm trying to find a generic way to do this. The example I'm giving involves two tables, but in practice the dependency graph may be much more complicated. Even a generic way to dynamically generate SQL to do the work would be fine.
UPDATE:
People are asking why this is necessary, so I'll give some background. It may be way too much, but here goes:
I'm working with an old desktop application that's been moved to a client-server model. But, the application still uses a rudimentary in-house binary file format for storing data for its tables. A data file is just a header followed by a series of rows, each of which is just the binary serialized field values, the order of which is determined by a schema text file. The only thing good about it is that it's very fast. It's terrible in every other respect. I'm moving the application to SQL Server and trying not to degrade the performance too badly.
This is a kind of scheduling application; the data's not critical to anybody, and there's no audit tracking, etc. necessary. It's not a supermassive amount of data, and we don't necessarily need to keep very old data around if the database grows too large.
One feature that they are accustomed to is the ability to duplicate entire schedules in order to create "what-if" scenarios that they can muck with. Any user can do this as many times as they want, as often as they want. In the old database, the data files for each schedule are stored in their own data folder, identified by name. So, copying a schedule was as simple as copying the data folder and renaming it.
I must be able to do effectively the same thing with SQL Server or the migration will not work. Maybe you're thinking that I can just only copy the data that actually gets changed in order to avoid redundancy; but that honestly sounds too complicated to be feasible.
To throw another wrench into the mix, there can be a hierarchy of schedule data folders. So, a data folder may contain a data folder, which may contain a data folder. And the copying can occur at any level.
In SQL Server, I'm implementing a nested set hierarchy to mimic this. I have a DATA_SET table like this:
CREATE TABLE dbo.DATA_SET
(
DATA_SET_ID UNIQUEIDENTIFIER PRIMARY KEY,
NAME NVARCHAR(128) NOT NULL,
LFT INT NOT NULL,
RGT INT NOT NULL
)
So, there's a tree structure of data sets. Each data set represents a schedule, and may contain child data sets. Every row in every table has a DATA_SET_ID FK reference, indicating which data set it belongs to. Whenever I copy a data set, I copy all the rows in the table for that data set, and every other data set, into the same table, but referencing new data sets.
So, here's a simple concrete example:
CREATE TABLE FOO
(
FOO_ID BIGINT PRIMARY KEY,
DATA_SET_ID BIGINT FOREIGN KEY REFERENCES DATA_SET(DATA_SET_ID) NOT NULL
)
CREATE TABLE BAR
(
BAR_ID BIGINT PRIMARY KEY,
DATA_SET_ID BIGINT FOREIGN KEY REFERENCES DATA_SET(DATA_SET_ID) NOT NULL,
FOO_ID BIGINT FOREIGN KEY REFERENCES FOO(FOO_ID) NOT NULL
)
INSERT INTO FOO
SELECT 1, 1 UNION ALL
SELECT 2, 1 UNION ALL
SELECT 3, 1
INSERT INTO BAR
SELECT 1, 1, 1 UNION ALL
SELECT 2, 1, 2 UNION ALL
SELECT 3, 1, 3
So, let's say I copy data set 1 into a new data set of ID 2. After I copy, the tables will look like this:
FOO
FOO_ID, DATA_SET_ID
1 1
2 1
3 1
4 2
5 2
6 2
BAR
BAR_ID, DATA_SET_ID, FOO_ID
1 1 1
2 1 2
3 1 3
4 2 4
5 2 5
6 2 6
As you can see, the new BAR rows are referencing the new FOO rows. It's not the rewiring of the DATA_SET_ID's that I'm asking about. I'm asking about rewiring the foreign keys in general.
So, that was surely too much information, but there you go.
I'm sure there are a lot of concerns about performance with the idea of bulk copying the data like this. The tables are not going to be huge. I'm not expecting more than 1000 records in any table, and most of the tables will be much much smaller than that. Old data sets can be deleted outright with no repercussions.
Thanks,
Tedderz
Here is an example with three tables that can probably get you started.
DB schema
CREATE TABLE users
(user_id int auto_increment PRIMARY KEY,
user_name varchar(32));
CREATE TABLE agenda
(agenda_id int auto_increment PRIMARY KEY,
`user_id` int, `agenda_name` varchar(7));
CREATE TABLE events
(event_id int auto_increment PRIMARY KEY,
`agenda_id` int,
`event_name` varchar(8));
An SP to clone a user with his agenda and events records
DELIMITER $$
CREATE PROCEDURE clone_user(IN uid INT)
BEGIN
DECLARE last_user_id INT DEFAULT 0;
INSERT INTO users (user_name)
SELECT user_name
FROM users
WHERE user_id = uid;
SET last_user_id = LAST_INSERT_ID();
INSERT INTO agenda (user_id, agenda_name)
SELECT last_user_id, agenda_name
FROM agenda
WHERE user_id = uid;
INSERT INTO events (agenda_id, event_name)
SELECT a3.agenda_id_new, e.event_name
FROM events e JOIN
(SELECT a1.agenda_id agenda_id_old,
a2.agenda_id agenda_id_new
FROM
(SELECT agenda_id, @n := @n + 1 n
FROM agenda, (SELECT @n := 0) n
WHERE user_id = uid
ORDER BY agenda_id) a1 JOIN
(SELECT agenda_id, @m := @m + 1 m
FROM agenda, (SELECT @m := 0) m
WHERE user_id = last_user_id
ORDER BY agenda_id) a2 ON a1.n = a2.m) a3
ON e.agenda_id = a3.agenda_id_old;
END$$
DELIMITER ;
To clone a user
CALL clone_user(3);
Here is SQLFiddle demo.
I recently found myself needing to solve a similar problem; that is, I needed to copy a set of rows in a table (Table A) as well as all of the rows in related tables which have foreign keys pointing to Table A's primary key. I was using Postgres, so the exact queries may differ, but the overall approach is the same. The biggest benefit of this approach is that it can be used recursively to go infinitely deep.
TLDR: the approach looks like this:
1) Find all the related tables/columns of Table A
2) Copy the necessary data into temporary tables
3) Create a trigger and function to propagate primary key column updates to related foreign key columns in the temporary tables
4) Update the primary key column in the temporary tables to the next value in the auto-increment sequence
5) Re-insert the data back into the source tables, and drop the temporary tables/triggers/function
1) The first step is to query the information schema to find all of the tables and columns which are referencing Table A. In Postgres this might look like the following:
SELECT tc.table_name, kcu.column_name
FROM information_schema.table_constraints tc
JOIN information_schema.key_column_usage kcu
ON tc.constraint_name = kcu.constraint_name
JOIN information_schema.constraint_column_usage ccu
ON ccu.constraint_name = tc.constraint_name
WHERE constraint_type = 'FOREIGN KEY'
AND ccu.table_name='<Table A>'
AND ccu.column_name='<Primary Key>'
2) Next we need to copy the data from Table A, and any other tables which reference Table A - let's say there is one called Table B. To start this process, let's create a temporary table for each of these tables and populate it with the data that we need to copy. This might look like the following:
CREATE TEMP TABLE temp_table_a AS (
SELECT * FROM <Table A> WHERE ...
);
CREATE TEMP TABLE temp_table_b AS (
SELECT * FROM <Table B> WHERE <Foreign Key> IN (
SELECT <Primary Key> FROM temp_table_a
)
);
3) We can now define a function that will cascade primary key column updates out to related foreign key columns, and a trigger that will execute whenever the primary key column changes. For example:
CREATE OR REPLACE FUNCTION cascade_temp_table_a_pk()
RETURNS trigger AS
$$
BEGIN
UPDATE <Temp Table B> SET <Foreign Key> = NEW.<Primary Key>
WHERE <Foreign Key> = OLD.<Primary Key>;
RETURN NEW;
END;
$$ LANGUAGE plpgsql;
CREATE TRIGGER trigger_temp_table_a
AFTER UPDATE
ON <Temp Table A>
FOR EACH ROW
WHEN (OLD.<Primary Key> != NEW.<Primary Key>)
EXECUTE PROCEDURE cascade_temp_table_a_pk();
4) Now we just update the primary key column in <Temp Table A> to the next value of the sequence of the source table (<Table A>). This will activate the trigger, and the updates will be cascaded out to the foreign key columns in <Temp Table B>. In Postgres you can do the following:
UPDATE <Temp Table A>
SET <Primary Key> = nextval(pg_get_serial_sequence('<Table A>', '<Primary Key>'))
5) Insert the data back from the temporary tables back into the source tables. And then drop the temporary tables, triggers, and functions after that.
INSERT INTO <Table A> (SELECT * FROM <Temp Table A>);
INSERT INTO <Table B> (SELECT * FROM <Temp Table B>);
DROP TRIGGER trigger_temp_table_a ON <Temp Table A>;
DROP FUNCTION cascade_temp_table_a_pk();
It is possible to take this general approach and turn it into a script which can be called recursively in order to go infinitely deep. I ended up doing just that using Python (our application was using Django, so I was able to use the Django ORM to make some of this easier).

How to copy structure and contents of a table, but with separate sequence?

I'm trying to setup temporary tables for unit-testing purposes. So far I managed to create a temporary table which copies the structure of an existing table:
CREATE TEMP TABLE t_mytable (LIKE mytable INCLUDING DEFAULTS);
But this lacks the data from the original table. I can copy the data into the temporary table by using a CREATE TABLE AS statement instead:
CREATE TEMP TABLE t_mytable AS SELECT * FROM mytable;
But then the structure of t_mytable will not be identical, e.g. column sizes and default values are different. Is there a single statement which copies everything?
Another problem with the first query using LIKE is that the key column still references the SEQUENCE of the original table, and thus increments it on insertion. Is there an easy way to create the new table with its own sequence, or will I have to set up a new sequence by hand?
I'm using the following code to do it:
CREATE TABLE t_mytable (LIKE mytable INCLUDING ALL);
ALTER TABLE t_mytable ALTER id DROP DEFAULT;
CREATE SEQUENCE t_mytable_id_seq;
INSERT INTO t_mytable SELECT * FROM mytable;
SELECT setval('t_mytable_id_seq', (SELECT max(id) FROM t_mytable), true);
ALTER TABLE t_mytable ALTER id SET DEFAULT nextval('t_mytable_id_seq');
ALTER SEQUENCE t_mytable_id_seq OWNED BY t_mytable.id;
Postgres 10 or later
Postgres 10 introduced IDENTITY columns conforming to the SQL standard (with minor extensions). The ID column of your table would look something like:
id integer PRIMARY KEY GENERATED BY DEFAULT AS IDENTITY
Syntax in the manual.
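A minimal sketch of such a table (the payload column is hypothetical):
CREATE TABLE mytable (
  id      integer PRIMARY KEY GENERATED BY DEFAULT AS IDENTITY,
  payload text
);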
Using this instead of a traditional serial column avoids your problem with sequences. IDENTITY columns use exclusive, dedicated sequences automatically, even when the specification is copied with LIKE. The manual:
Any identity specifications of copied column definitions will only be
copied if INCLUDING IDENTITY is specified. A new sequence is created
for each identity column of the new table, separate from the sequences
associated with the old table.
And:
INCLUDING ALL is an abbreviated form of INCLUDING DEFAULTS INCLUDING IDENTITY INCLUDING CONSTRAINTS INCLUDING INDEXES INCLUDING STORAGE INCLUDING COMMENTS.
The solution is simpler now:
CREATE TEMP TABLE t_mytable (LIKE mytable INCLUDING ALL);
INSERT INTO t_mytable TABLE mytable;
SELECT setval(pg_get_serial_sequence('t_mytable', 'id'), max(id)) FROM t_mytable;
As demonstrated, you can still use setval() to set the sequence's current value. A single SELECT does the trick. pg_get_serial_sequence() gets the name of the sequence.
db<>fiddle here
Related:
How to reset postgres' primary key sequence when it falls out of sync?
Is there a shortcut for SELECT * FROM?
Creating a PostgreSQL sequence to a field (which is not the ID of the record)
Original (old) answer
You can take the create script from a database dump or a GUI like pgAdmin (which reverse-engineers database object creation scripts), create an identical copy (with a separate sequence for the serial column), and then run:
INSERT INTO new_tbl
SELECT * FROM old_tbl;
The copy cannot be 100% identical if both tables reside in the same schema. Obviously, the table name has to be different. Index names would conflict, too. Retrieving serial numbers from the same sequence would probably not be in your best interest, either. So you have to (at least) adjust the names.
Placing the copy in a different schema avoids all of these conflicts. When you create a temporary table from a regular table as you demonstrated, that's automatically the case, since temp tables reside in their own temporary schema.
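A sketch of the non-temp variant, assuming a hypothetical schema named sandbox:
CREATE SCHEMA sandbox;
CREATE TABLE sandbox.mytable (LIKE public.mytable INCLUDING ALL);
INSERT INTO sandbox.mytable SELECT * FROM public.mytable;
-- caveat: a copied old-style serial default still points at the original sequence (see above)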
Or look at Francisco's answer for DDL code to copy directly.

creating table as select is dropping the not null constraints in postgresql

In PostgreSQL, creating a table with CREATE TABLE ... AS SELECT drops the NOT NULL constraints on the table.
For example:
create table A (a char not null);
create table B as select * from A;
select * from B; -- no constraint is copied from table A
Please let me know how to copy the table data as well as the constraints in Postgres.
There is no single-command solution to this.
To create a table based on an existing one, including all constraints, use:
create table B ( like a including constraints);
Once you have done that, you can copy the data from the old one to the new one:
insert into b
select * from a;
If you do this in a single transaction, it looks like an atomic operation to all other sessions connected to the database.
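For example, a sketch of the whole copy as one atomic unit:
begin;
create table b ( like a including constraints);
insert into b select * from a;
commit;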
A very detailed and nicely explained tutorial for the CREATE TABLE command in PostgreSQL 9.1:
http://www.postgresql.org/docs/current/static/sql-createtable.html
NOT NULL constraints are always copied when you create a table by referencing a parent table in the CREATE TABLE command; even with INCLUDING CONSTRAINTS, only CHECK constraints are copied in addition.
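A quick sketch to see this, reusing table a from the question:
create table b2 (like a);                          -- the NOT NULL on the column comes along
create table b3 (like a including constraints);    -- CHECK constraints are copied too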

Practical limitations of expression indexes in PostgreSQL

I have a need to store data using the HSTORE type and index by key.
CREATE INDEX ix_product_size ON product(((data->'Size')::INT))
CREATE INDEX ix_product_color ON product(((data->'Color')))
etc.
What are the practical limitations of using expression indexes? In my case, there could be several hundred different types of data, hence several hundred expression indexes. Every insert, update, and select query will have to process against these indexes in order to pick the correct one.
I've never played with hstore, but I do something similar when I need an EAV column, e.g.:
create index on product_eav (eav_value) where (eav_type = 'int');
The limitation in doing so is that you need to be explicit in your query to make use of it, i.e. this query would not make use of the above index:
select product_id
from product_eav
where eav_name = 'size'
and eav_value = :size;
But this one would:
select product_id
from product_eav
where eav_name = 'size'
and eav_value = :size
and eav_type = 'int';
In your example it should likely be more like:
create index on product ((data->'size')::int) where (data->'size' is not null);
This should avoid adding a reference to the index when there is no size entry. Depending on the PG version you're using the query may need to be modified like so:
select product_id
from product
where data->'size' is not null
and (data->'size')::int = :size;
Another big difference between a regular and a partial index is that the latter cannot enforce a unique constraint in a table definition. This will succeed:
create unique index foo_bar_key on foo (bar) where (cond);
The following won't:
alter table foo add constraint foo_bar_key unique (bar) where (cond);
But this will:
alter table foo add constraint foo_bar_excl exclude (bar with =) where (cond);