Create column with duplicate data in same table psql - sql

Postgres database.
I'm trying to find a faster way to create a new column in a table that is a copy of the table's primary key column. So if I have the following columns in a table named students:
student_id Integer Auto-Increment -- Primary key
name varchar
Then I would like to create a new column named old_student_id which has all the same values as student_id.
To do this I create the column and then execute the following update statement:
update students set old_student_id = student_id;
This works, but on my biggest table it takes over an hour. It feels like some alternative approach should get that down to a few minutes; I just don't know what.
So what I want at the end of the day is something that looks like this:
+------------+------+----------------+
| student_id | name | old_student_id |
+------------+------+----------------+
| 1          | bob  | 1              |
| 2          | tod  | 2              |
| 3          | joe  | 3              |
| 4          | tim  | 4              |
+------------+------+----------------+
To speed things up a bit, before I run the update I drop all the FKs and indexes on the table, then reapply them when it finishes. I'm also on AWS RDS, so I have set up a parameter group with synchronous_commit = off, turned off backups, and increased work_mem a bit for the duration of this update.
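For reference, the session-level equivalents of those settings look roughly like this (a sketch; the values are illustrative, and on RDS they are applied through the parameter group rather than via SET):

-- standard PostgreSQL parameter names for the settings described above
SET synchronous_commit = off;  -- don't wait for WAL flush on commit
SET work_mem = '256MB';        -- illustrative value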
For context, this is actually happening to every table in the database, across three databases. The old ids are used as references by several external systems, so I need to keep track of them in order to update those systems as well. I have an 8-hour downtime window; currently merging the databases takes ~3 hours, and a whole hour of that is spent creating these ids.

If you will not need to update the old_student_id column in the future, you can use generated columns in PostgreSQL (version 12 and later):
CREATE TABLE table2 (
    id serial4 NOT NULL,
    val1 int4 NULL,
    val2 int4 NULL,
    total int4 NULL GENERATED ALWAYS AS (id) STORED
);
During insertion, the total column is set to the same value as the id column. But you cannot update this column afterwards, because it is a generated column.
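Applied to the question's table, that would be roughly the following (a sketch, assuming PostgreSQL 12 or later; note that adding a stored generated column still rewrites the whole table once):

-- old_student_id will always mirror student_id and cannot be updated independently
ALTER TABLE students
ADD COLUMN old_student_id integer GENERATED ALWAYS AS (student_id) STORED;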
An alternative method is to use triggers. In that case you can still update the column. See this example:
First we need to create a trigger function which will be called before each insert.
CREATE OR REPLACE FUNCTION table2_insert()
RETURNS trigger
LANGUAGE plpgsql
AS $function$
BEGIN
    new.total = new.val1 * new.val2;
    RETURN new;
END;
$function$;
After that:
CREATE TABLE table2 (
    id serial4 NOT NULL,
    val1 int4 NULL,
    val2 int4 NULL,
    total int4 NULL
);

CREATE TRIGGER my_trigger
BEFORE INSERT ON table2
FOR EACH ROW EXECUTE FUNCTION table2_insert();
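A quick check of the trigger:

INSERT INTO table2 (val1, val2) VALUES (3, 4);
SELECT id, val1, val2, total FROM table2;  -- total is 12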
With both methods, you don't have to update many records every time.

Auto incremented column scoped to user id

I am really struggling with how to implement a requirement that is best described with an example.
Consider everything below to be pseudocode, although I am interested in solutions for Postgres.
id  id_for_user  note                    created_by
--  -----------  ----------------------  ----------
1   1            Buy milk                1
2   2            Winter tyres            1
3   3            Read for 1h             1
4   1            Clean dishes            2
5   2            Learn how magnets work  2
INSERT INTO notes VALUES (note: 'Learn icelandic', created_by: 1);
id  id_for_user  note                    created_by
--  -----------  ----------------------  ----------
1   1            Buy milk                1
2   2            Winter tyres            1
3   3            Read for 1h             1
4   1            Clean dishes            2
5   2            Learn how magnets work  2
6   4            Learn Icelandic         1
INSERT INTO notes VALUES (note: 'Are birds real?', created_by: 2);
id  id_for_user  note                    created_by
--  -----------  ----------------------  ----------
1   1            Buy milk                1
2   2            Winter tyres            1
3   3            Read for 1h             1
4   1            Clean dishes            2
5   2            Learn how magnets work  2
6   4            Learn Icelandic         1
7   3            Are birds real?         2
I would like to achieve something like this:
CREATE TABLE notes (
    id SERIAL,
    id_for_user INT DEFAULT nextval(created_by), -- dynamic sequence name so every user gets their own
    note VARCHAR,
    created_by INT,
    PRIMARY KEY(id, id_for_user),
    CONSTRAINT fk_notes_created_by
        FOREIGN KEY(created_by)
        REFERENCES users(created_by)
);
So user 1 sees this (notice how id_for_user is presented as id on the front end):
id  note
--  ---------------
1   Buy milk
2   Winter tyres
3   Read for 1h
4   Learn Icelandic
And user 2 sees:
id  note
--  ----------------------
1   Clean dishes
2   Learn how magnets work
3   Are birds real?
Basically, I want an auto-incremented field per user.
I will then probably also query for records by id_for_user, filling created_by on the backend based on which user made the request.
Is something like this even possible? What are my options? I would really like to have this logic at the db level.
https://www.db-fiddle.com/f/6eBvq4VCQPTmmR3W6fCnEm/2
Try using sequences; a sequence object controls the auto-numbering of the ID.
For example:
CREATE SEQUENCE sequence_notes1
    INCREMENT BY 1
    MINVALUE 1
    MAXVALUE 100;

CREATE SEQUENCE sequence_notes2
    INCREMENT BY 1
    MINVALUE 1
    MAXVALUE 100;

CREATE TABLE notes (
    id SERIAL,
    id_for_user INT,
    note VARCHAR,
    created_by INT,
    PRIMARY KEY(id)
);

INSERT INTO notes (id_for_user, note, created_by) VALUES (nextval('sequence_notes1'), 'Foo', 1);
INSERT INTO notes (id_for_user, note, created_by) VALUES (nextval('sequence_notes1'), 'Moo', 1);
INSERT INTO notes (id_for_user, note, created_by) VALUES (nextval('sequence_notes2'), 'Boo', 2);
INSERT INTO notes (id_for_user, note, created_by) VALUES (nextval('sequence_notes2'), 'Loo', 2);
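Note that this needs one sequence per user. To avoid hard-coding the sequence name in every INSERT, you could resolve it dynamically; next_id_for_user below is a hypothetical helper, and it assumes a sequence named sequence_notes<user id> already exists for each user:

-- hypothetical helper: picks the per-user sequence by name
CREATE OR REPLACE FUNCTION next_id_for_user(uid int) RETURNS bigint AS $$
BEGIN
    RETURN nextval(format('sequence_notes%s', uid));
END;
$$ LANGUAGE plpgsql;

INSERT INTO notes (id_for_user, note, created_by)
VALUES (next_id_for_user(1), 'Foo', 1);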
You can have a separate table that stores the next ordinal value for each user. A trigger can then fill in the value and increment the stored counter.
For example:
create table usr (
    id int primary key,
    next_ordinal int default 1
);

create table note (
    id int primary key,
    note varchar(100),
    created_by int references usr (id),
    user_ord int
);

create or replace function add_user_ord() returns trigger as $$
begin
    select next_ordinal into new.user_ord from usr where id = new.created_by;
    update usr set next_ordinal = next_ordinal + 1 where id = new.created_by;
    return new;
end;
$$ language plpgsql;

create trigger trg_note1 before insert on note
for each row execute procedure add_user_ord();
Then, the trigger will add the correct ordinal numbers automatically behind the scenes during INSERTs:
insert into usr (id) values (10), (20);
insert into note (id, note, created_by) values (1, 'Hello', 10);
insert into note (id, note, created_by) values (2, 'Lorem', 20);
insert into note (id, note, created_by) values (3, 'World', 10);
insert into note (id, note, created_by) values (4, 'Ipsum', 20);
Result:
id note created_by user_ord
-- ----- ---------- --------
1 Hello 10 1
2 Lorem 20 1
3 World 10 2
4 Ipsum 20 2
Note: this solution does not address concurrent inserts. If your application needs that, you'll need to add some isolation (pessimistic or optimistic locking).
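One way to close that race (a sketch, not part of the answer above) is to do the read and the increment in a single UPDATE ... RETURNING; the UPDATE row-locks the user's counter row, so concurrent inserts for the same user serialize on it:

create or replace function add_user_ord() returns trigger as $$
begin
    -- increment and read atomically; the row lock taken by the UPDATE
    -- prevents two concurrent inserts from seeing the same value
    update usr
       set next_ordinal = next_ordinal + 1
     where id = new.created_by
    returning next_ordinal - 1 into new.user_ord;
    return new;
end;
$$ language plpgsql;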
In MySQL, this is supported in the MyISAM storage engine.
https://dev.mysql.com/doc/refman/8.0/en/example-auto-increment.html says:
For MyISAM tables, you can specify AUTO_INCREMENT on a secondary
column in a multiple-column index. In this case, the generated value
for the AUTO_INCREMENT column is calculated as
MAX(auto_increment_column) + 1 WHERE prefix=given-prefix. This is
useful when you want to put data into ordered groups.
CREATE TABLE animals (
    grp ENUM('fish','mammal','bird') NOT NULL,
    id MEDIUMINT NOT NULL AUTO_INCREMENT,
    name CHAR(30) NOT NULL,
    PRIMARY KEY (grp,id)
) ENGINE=MyISAM;
INSERT INTO animals (grp,name) VALUES
('mammal','dog'),('mammal','cat'),
('bird','penguin'),('fish','lax'),('mammal','whale'),
('bird','ostrich');
SELECT * FROM animals ORDER BY grp,id;
Which returns:
+--------+----+---------+
| grp | id | name |
+--------+----+---------+
| fish | 1 | lax |
| mammal | 1 | dog |
| mammal | 2 | cat |
| mammal | 3 | whale |
| bird | 1 | penguin |
| bird | 2 | ostrich |
+--------+----+---------+
The reason this works in MyISAM is that MyISAM only supports table-level locking.
In a storage engine with row-level locking, you get race conditions if you try to have a primary key that works like this. This is why others on this thread have commented that implementing this with triggers requires some pessimistic locking. You have to use locking to ensure that only one client at a time is inserting, so they don't allocate the same value.
This will be limiting in a high-traffic application. InnoDB's auto-increment is implemented the way it is to allow applications in which many client threads are executing inserts concurrently.
So you could use MyISAM or you could use InnoDB and invent your own way of allocating new id's per user, but either way it will severely limit your app's scalability.
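In PostgreSQL terms, the pessimistic locking mentioned above could be sketched with an advisory lock keyed on the user id, so only one transaction at a time computes the next per-user value (the table layout follows the question's notes example):

BEGIN;
-- serialize inserts per user; the lock is released at commit
SELECT pg_advisory_xact_lock(1);  -- key = user id
INSERT INTO notes (id_for_user, note, created_by)
SELECT COALESCE(MAX(id_for_user), 0) + 1, 'New note', 1
FROM notes
WHERE created_by = 1;
COMMIT;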

How to UPDATE or INSERT in PostgreSQL

I want to UPDATE or INSERT a row in PostgreSQL, instead of doing INSERT or UPDATE with INSERT ... ON CONFLICT ..., because there will be more updates than inserts. I also have an auto-incrementing id column defined with SERIAL, and it increments the id every time an INSERT is attempted, even when the statement resolves to an UPDATE; that's not what I want. I want the id column to increase only on actual inserts, so that all ids stay in order.
The table is created like this:
CREATE TABLE IF NOT EXISTS table_name (
    id SERIAL PRIMARY KEY,
    user_id varchar(30) NOT NULL,
    item_name varchar(50) NOT NULL,
    code_uses bigint NOT NULL,
    UNIQUE(user_id, item_name)
);
And the query I used was
INSERT INTO table_name
VALUES (DEFAULT, 'some_random_id', 'some_random_name', 1)
ON CONFLICT (user_id, item_name)
DO UPDATE SET code_uses = table_name.code_uses + 1;
Thanks :)
Upserts in PostgreSQL do exactly what you described.
Consider this table and records
CREATE TABLE t (id SERIAL PRIMARY KEY, txt TEXT);
INSERT INTO t (txt) VALUES ('foo'),('bar');
SELECT * FROM t ORDER BY id;
 id | txt
----+-----
  1 | foo
  2 | bar
(2 rows)
Using upserts, the id will only increment if a new record is inserted:
INSERT INTO t VALUES (1,'foo updated'),(3,'new record')
ON CONFLICT (id) DO UPDATE SET txt = EXCLUDED.txt;
SELECT * FROM t ORDER BY id;
 id | txt
----+-------------
  1 | foo updated
  2 | bar
  3 | new record
(3 rows)
EDIT (see comments): this is the expected behaviour of a serial column, since it is nothing but a fancy way to use a sequence. Long story short: with upserts, the gaps are inevitable. If you're worried the value might become too big, use bigserial instead and let PostgreSQL do its job.
Related thread: serial in postgres is being increased even though I added on conflict do nothing
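If you really need to avoid consuming sequence values on conflicts, one common workaround (a sketch, not from the answer above) is to try the UPDATE first and INSERT only when nothing matched. Note it is not atomic: a concurrent insert of the same key can still raise a unique violation, so it belongs inside a retry loop or a serialized transaction:

WITH upd AS (
    UPDATE table_name
       SET code_uses = code_uses + 1
     WHERE user_id = 'some_random_id'
       AND item_name = 'some_random_name'
    RETURNING id
)
INSERT INTO table_name (user_id, item_name, code_uses)
SELECT 'some_random_id', 'some_random_name', 1
WHERE NOT EXISTS (SELECT 1 FROM upd);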

Insert multiple values with foreign key Postgresql

I am having trouble figuring out how to insert multiple values into a table while checking that another table has the needed values stored. I am currently doing this on a PostgreSQL server, but will be implementing it with PreparedStatements in my Java program.
user_id is a foreign key which references the primary key in mock2. I have been trying to check whether mock2 has the values ('foo1', 'bar1') and ('foo2', 'bar2').
After this I am trying to insert new values into mock1, which would have a date and an integer value, and reference the primary key of the row in mock2 via the foreign key in mock1.
The mock1 table looks like this:

=============================
| date | time    | user_id |
| date | integer | integer |

And the table mock2 is:

==============================
| Id      | name | program |
| integer | text | text    |
Id is a primary key for the table and the name is UNIQUE.
I've been playing around with this solution https://dba.stackexchange.com/questions/46410/how-do-i-insert-a-row-which-contains-a-foreign-key
However, I haven't been able to make it work. Could someone please point out the correct syntax for this? I would really appreciate it.
EDIT:
The create table statements are:
CREATE TABLE mock2 (
    id SERIAL PRIMARY KEY UNIQUE,
    name text NOT NULL,
    program text NOT NULL UNIQUE
);

and

CREATE TABLE mock1 (
    date date,
    time_spent INTEGER,
    user_id integer REFERENCES mock2(Id) NOT NULL
);
Ok so I found an answer to my own question.
WITH ins (date, time_spent, id) AS (
    VALUES ('22/08/2012', 170, (SELECT id FROM mock3 WHERE program = 'bar'))
)
INSERT INTO mock4 (date, time_spent, user_id)
SELECT ins.date, ins.time_spent, mock3.id
FROM mock3
JOIN ins ON ins.id = mock3.id;
I was trying to take the two values from the first table, match them, and then insert two new values into the next table, but I realised that I should be using the primary and foreign keys to my advantage.
I instead now JOIN on the id and then just select the key I need by looking it up with (SELECT id FROM mock3 WHERE program = 'bar') in the second line.
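For reference, the same thing can be written as a plain INSERT ... SELECT without the CTE (a sketch using the mock3/mock4 names from the answer above):

-- look up the foreign key inline via the unique program value
INSERT INTO mock4 (date, time_spent, user_id)
SELECT DATE '2012-08-22', 170, id
FROM mock3
WHERE program = 'bar';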

Redshift psql auto increment on even number

I am trying to create a table with an auto-increment column as below. Since Redshift doesn't support SERIAL, I had to use the IDENTITY data type:
IDENTITY(seed, step)
Clause that specifies that the column is an IDENTITY column. An IDENTITY column contains unique auto-generated values. These values start with the value specified as seed and increment by the number specified as step. The data type for an IDENTITY column must be either INT or BIGINT.
My create table statement looks like this:
CREATE TABLE my_table (
    id INT IDENTITY(1,1),
    name CHARACTER VARYING(255) NOT NULL,
    PRIMARY KEY(id)
);
However, when I tried to insert data into my_table, the ids incremented only in even numbers, like below:
 id | name |
----+------+
  2 | anna |
  4 | tom  |
  6 | adam |
  8 | bob  |
 10 | rob  |
My insert statement looks like this:
INSERT INTO my_table ( name )
VALUES ( 'anna' ), ('tom') , ('adam') , ('bob') , ('rob' );
I am also having trouble bringing the id column back to start from 1. There are solutions for the SERIAL data type, but I haven't seen any documentation for IDENTITY.
Any suggestions would be much appreciated!
You have to set your identity as follows:
id INT IDENTITY(0,1)
Source: http://docs.aws.amazon.com/redshift/latest/dg/r_CREATE_TABLE_examples.html
And you can't reset the id to 0. You will have to drop the table and recreate it.
Set your seed value to 1 and your step value to 1.
Create table

CREATE TABLE my_table (
    id bigint identity(1, 1),
    name varchar(100),
    primary key(id)
);

Insert rows

INSERT INTO my_table ( name )
VALUES ('anna'), ('tom'), ('adam'), ('bob'), ('rob');
Results
 id | name |
----+------+
  1 | anna |
  2 | tom  |
  3 | adam |
  4 | bob  |
  5 | rob  |
For some reason, if you set your seed value to 0 and your step value to 1, the ids will increase in steps of 2.
Create table

CREATE TABLE my_table (
    id bigint identity(0, 1),
    name varchar(100),
    primary key(id)
);

Insert rows

INSERT INTO my_table ( name )
VALUES ('anna'), ('tom'), ('adam'), ('bob'), ('rob');
Results
 id | name |
----+------+
  0 | anna |
  2 | tom  |
  4 | adam |
  6 | bob  |
  8 | rob  |
This issue is discussed at length in the AWS forum:
https://forums.aws.amazon.com/message.jspa?messageID=623201
The answer from AWS:
Short answer to your question is seed and step are only honored if you
disable both parallelism and the COMPUPDATE option in your COPY.
Parallelism is disabled if and only if you're loading your data from a
single file, which is what we normally do not recommend, and hence
will be an unlikely scenario for most users.
Parallelism impacts things because in order to ensure that there is no
single point of contention in assigning identity values to rows, there
end up being gaps in the value assignment. When parallelism is
disabled, the load is happening serially, and therefore, there is no
issue with assigning different id values in parallel.
The reason COMPUPDATE impacts things is when it's enabled, the COPY is
actually making 2 passes over your data. During the first pass, it
internally increments the identity values, and as a result, your
initial value starts with a larger value than you'd expect.
We'll update the doc to reflect this.
Multiple nodes also seem to cause this effect with IDENTITY columns. In essence, IDENTITY only guarantees that the generated values are unique, not consecutive.
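Per the quoted guidance, a load that honors seed and step would look roughly like this (a sketch; the bucket and IAM role are hypothetical, and the source must be a single file so the load is not parallelized):

COPY my_table (name)
FROM 's3://my-bucket/single_file.csv'  -- single file: no parallel load
IAM_ROLE 'arn:aws:iam::123456789012:role/MyRedshiftRole'  -- hypothetical role
CSV
COMPUPDATE OFF;  -- skip the two-pass compression analysis described above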

Postgres start table ID from 1000

Before you mark this as a duplicate: I found this answer on another thread and am having difficulty making it work.
From psql I see my table:

\d people

 Column | Type    | Modifiers
--------+---------+-----------------------------------------------------
 id     | integer | not null default nextval('people_id_seq'::regclass)
Code I tried, which seems to do nothing:

ALTER SEQUENCE people_id_seq RESTART 1000;
How do I make the primary key start from 1000?
The following query would set the sequence value to 999. The next time the sequence is accessed, you would get 1000.
SELECT setval('people_id_seq', 999);
Reference:
Sequence Manipulation Functions on PostgreSQL Manual
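For a quick check (assuming the sequence name from the question):

SELECT setval('people_id_seq', 999);             -- set the current value
SELECT nextval('people_id_seq');                 -- returns 1000
-- equivalently:
ALTER SEQUENCE people_id_seq RESTART WITH 1000;  -- next nextval() is also 1000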
Why are you declaring your id like that? I would do the following:
create table people(
    id serial,
    constraint primaryKeyID primary key(id)
);
And now if you want to start your sequence from 1000, your alter query will work:

alter sequence people_id_seq restart 1000;