Remove duplicate rows ERROR: duplicate key value - sql

I want to remove duplicate rows from a table measurement in a PostgreSQL 9.1 database.
Some table information:
select column_name, data_type from information_schema.columns where table_name = 'measurement';
column_name | data_type
-------------+-----------
s_sum | real
s_l3 | real
s_l2 | real
s_l1 | real
q_sum | real
q_l3 | real
q_l2 | real
q_l1 | real
p_sum | real
p_l3 | real
p_l2 | real
p_l1 | real
irms_n | real
irms_l3 | real
irms_l2 | real
irms_l1 | real
urms_l3 | real
urms_l2 | real
urms_l1 | real
timestamp | integer
site | integer
id | integer
(22 rows)
and
select count(*) from measurement;
count
----------
56265678
(1 row)
So what I want to do is to remove duplicate rows where all columns except id are equal. I went ahead and tried this with the approach in this answer.
SET temp_buffers = '1GB';
BEGIN;
CREATE TEMPORARY TABLE t_tmp AS
SELECT DISTINCT site,
timestamp,
urms_l1,
urms_l2,
urms_l3,
irms_l1,
irms_l2,
irms_l3,
irms_n,
p_l1,
p_l2,
p_l3,
p_sum,
q_l1,
q_l2,
q_l3,
q_sum,
s_l1,
s_l2,
s_l3,
s_sum
FROM measurement;
TRUNCATE measurement;
INSERT INTO measurement
SELECT * FROM t_tmp;
COMMIT;
where the echo / error is:
SET
BEGIN
SELECT 56103537
TRUNCATE TABLE
ERROR: duplicate key value violates unique constraint "measurement_pkey"
DETAIL: Key (id)=(1) already exists.
ROLLBACK
So it looks as if it removes the duplicates all right (compare with the row count of the original measurement table above), but then a primary key constraint is violated. I do not really know what is going on here; I assume the INSERT is not operating on the truncated table...
Update:
The requested SQL schema is as follows:
--
-- PostgreSQL database dump
--
SET statement_timeout = 0;
SET client_encoding = 'UTF8';
SET standard_conforming_strings = on;
SET check_function_bodies = false;
SET client_min_messages = warning;
--
-- Name: plpgsql; Type: EXTENSION; Schema: -; Owner: -
--
CREATE EXTENSION IF NOT EXISTS plpgsql WITH SCHEMA pg_catalog;
--
-- Name: EXTENSION plpgsql; Type: COMMENT; Schema: -; Owner: -
--
COMMENT ON EXTENSION plpgsql IS 'PL/pgSQL procedural language';
SET search_path = public, pg_catalog;
SET default_tablespace = '';
SET default_with_oids = false;
--
-- Name: measurement; Type: TABLE; Schema: public; Owner: -; Tablespace:
--
CREATE TABLE measurement (
id integer NOT NULL,
site integer,
"timestamp" integer,
urms_l1 real,
urms_l2 real,
urms_l3 real,
irms_l1 real,
irms_l2 real,
irms_l3 real,
irms_n real,
p_l1 real,
p_l2 real,
p_l3 real,
p_sum real,
q_l1 real,
q_l2 real,
q_l3 real,
q_sum real,
s_l1 real,
s_l2 real,
s_l3 real,
s_sum real
);
--
-- Name: measurement_pkey; Type: CONSTRAINT; Schema: public; Owner: -; Tablespace:
--
ALTER TABLE ONLY measurement
ADD CONSTRAINT measurement_pkey PRIMARY KEY (id);
--
-- Name: public; Type: ACL; Schema: -; Owner: -
--
REVOKE ALL ON SCHEMA public FROM PUBLIC;
REVOKE ALL ON SCHEMA public FROM postgres;
GRANT ALL ON SCHEMA public TO postgres;
GRANT ALL ON SCHEMA public TO PUBLIC;
--
-- PostgreSQL database dump complete
--
And then
SELECT id
FROM measurement
GROUP BY id
HAVING COUNT(*) > 1;
yields
id
----
(0 rows)

The primary key is a unique constraint on a subset of the fields in your measurement table, while your SELECT DISTINCT looks at every field you list in each record, not just the primary key.
That is, you have records which have the same primary key (id, apparently) but different values in the non-key fields.
You can find keys that have duplicate ids by running this:
SELECT id
FROM t_tmp
GROUP BY id
HAVING COUNT(*) > 1;
And you can display the records related to that by doing this:
SELECT *
FROM t_tmp
WHERE id IN (
SELECT id
FROM t_tmp
GROUP BY id
HAVING COUNT(*) > 1
);
[Note that I specify t_tmp above, but if you haven't actually run TRUNCATE TABLE measurement; yet, you can use measurement instead.]
Those are the records with duplicated ids that are causing your key violations, assuming the key is just on id, which is what the error message suggests. You'll need to decide which one to keep and which one to delete, or else consider updating the id field to a new unique value.
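For the last option, here is a minimal sketch that regenerates the ids at insert time (assuming it is acceptable to renumber the rows; the column order matches the schema above):
INSERT INTO measurement
SELECT row_number() OVER (ORDER BY site, "timestamp") AS id,
t_tmp.*
FROM t_tmp;
This also sidesteps the fact that t_tmp has no id column of its own, since the SELECT DISTINCT that built it did not include id.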
It's not clear if id is tied to a sequence or was created as a SERIAL or BIGSERIAL in your new table. You should really just generate a CREATE TABLE script from pgAdmin to give us the full schema. It's also not clear whether there are other unique constraints on the table.

Related

Copy data from one table to another existing table combined by calculated variable and specific columns

The scenario is described in the following steps (this is only an example to illustrate the problem):
There are two tables in the database (user_1, post_1) - no real relation between them
user_1 contains the following fields:
id VARCHAR,
name VARCHAR,
address TEXT,
phone_number VARCHAR,
PRIMARY KEY (id)
post_1 contains the following fields:
id VARCHAR,
user_id VARCHAR,
title VARCHAR,
body TEXT,
PRIMARY KEY (id)
Suppose I added the following data to the two tables above:
INSERT INTO user_1(id, name, address, phone_number) VALUES ('first_u', 'avi', 'some address', '05488789906');
INSERT INTO post_1(id, user_id, title, body) VALUES ('first_p', 'first_u', 'new post', 'This is a good one!');
Now I've created new tables with a few changes and with a link between them:
user_2 contains the following fields:
id uuid DEFAULT uuid_generate_v4(),
name VARCHAR,
address TEXT,
phone_number VARCHAR,
PRIMARY KEY (id)
post_2 contains the following fields:
id uuid DEFAULT uuid_generate_v4(),
user_id uuid,
title VARCHAR,
body TEXT,
PRIMARY KEY (id),
CONSTRAINT fk_user FOREIGN KEY(user_id) REFERENCES user_2(id)
Now the problem is copying the data to the two new tables while establishing the real link (foreign key) between them.
I will explain - it is divided into a few parts:
a. Count the number of rows in the user_1 table (I've created a function for that):
CREATE OR REPLACE FUNCTION rowsnumber() RETURNS INTEGER AS $$
DECLARE
numberOfRows integer;
BEGIN
SELECT COUNT(*) INTO numberOfRows FROM user_1;
RETURN numberOfRows;
END;
$$ LANGUAGE plpgsql;
b. Go through the function result in a for loop and do the following:
(b.1) Extract row number x from the user_1 table according to the current index in the for loop (this can be achieved with LIMIT and OFFSET).
(b.2) Insert the row data into the user_2 table.
(b.3) Extract the newly created uuid and save it in a parameter.
(b.4) Retrieve the appropriate row from the post_1 table by matching the id column of the current row against the user_id column of the post_1 table.
(b.5) Insert the data of the row extracted in step (b.4) into the post_2 table, BUT with the user_id extracted in step (b.3), so that there is a real relationship between the post_2 and user_2 tables.
I would greatly appreciate any help - if someone could write the query I would need to run to solve this problem.
Many thanks.
After several attempts I was able to implement a function that answers the question, following the steps I described above.
CREATE OR REPLACE FUNCTION insertToTables() RETURNS VOID AS $$
DECLARE
numOfRowsInTable integer := rowsnumber();
newRow record;
newUserId uuid;
postRow record;
BEGIN
for r in 0..numOfRowsInTable-1
loop
-- ORDER BY makes the OFFSET deterministic
SELECT * INTO newRow FROM user_1 ORDER BY id LIMIT 1 OFFSET r;
-- RETURNING hands back the generated uuid directly, instead of re-querying user_2
INSERT INTO user_2(name, address, phone_number)
VALUES (newRow.name, newRow.address, newRow.phone_number)
RETURNING id INTO newUserId;
SELECT * INTO postRow FROM post_1 WHERE user_id = newRow.id;
INSERT INTO post_2(user_id, title, body)
VALUES (newUserId, postRow.title, postRow.body);
end loop;
END;
$$ LANGUAGE plpgsql;
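A loop is not strictly needed, though; the same migration can be done set-based. The following is a sketch under the schema above (it assumes uuid_generate_v4() is available, as in the user_2 DDL; the temporary id_map table is an illustrative helper mapping old ids to new uuids):
CREATE TEMPORARY TABLE id_map AS
SELECT id AS old_id, uuid_generate_v4() AS new_id
FROM user_1;
INSERT INTO user_2 (id, name, address, phone_number)
SELECT m.new_id, u.name, u.address, u.phone_number
FROM user_1 u
JOIN id_map m ON m.old_id = u.id;
INSERT INTO post_2 (user_id, title, body)
SELECT m.new_id, p.title, p.body
FROM post_1 p
JOIN id_map m ON m.old_id = p.user_id;
This also copies every post of a user, whereas the loop above picks only one post per user.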

Problem with alias in Sqliteman trying to make unique ids

I'm trying to translate this function from SQL Server to SQLite; how can I do it? I know it's different, but I just couldn't manage it. Any information will be awesome. Thanks
This is the code:
CREATE TABLE "user" (
"numId" INT IDENTITY(1,1) NOT NULL,
"prefix" VARCHAR(50) NOT NULL,
"id" AS ("prefix"+RIGHT('000000'+CAST(ID AS VARCHAR(7)),7))PERSISTED,
Thanks.
Normal SQLite tables are called rowid tables, and have a unique integer as their primary key (an INTEGER PRIMARY KEY column aliases this rowid). So there's the IDENTITY bit taken care of.
While SQLite 3.31 (in beta as of this writing) supports generated columns, until versions with that feature become widely available over the next few years you can either create a VIEW that adds your custom prefix, or do it in each individual SELECT where it matters. Something like:
sqlite> CREATE TABLE actual_user(numId INTEGER PRIMARY KEY, prefix TEXT NOT NULL);
sqlite> CREATE VIEW user(numId, prefix, id) AS
...> SELECT numId, prefix, printf('%s%07d', prefix, numId) FROM actual_user;
sqlite> INSERT INTO actual_user(prefix) VALUES ('TEST'), ('TEST'), ('TESTX');
sqlite> SELECT * FROM user;
numId prefix id
---------- ---------- ------------
1 TEST TEST0000001
2 TEST TEST0000002
3 TESTX TESTX0000003
You might also emulate it with triggers on your table that update the id value on insertion and update. This more closely mimics a persisted (STORED in SQLite lingo) generated column, while the view mimics a virtual column:
sqlite> CREATE TABLE user(numId INTEGER PRIMARY KEY, prefix TEXT NOT NULL, id TEXT);
sqlite> CREATE TRIGGER user_ai AFTER INSERT ON user
...> BEGIN UPDATE user SET id = printf('%s%07d', new.prefix, new.numId) WHERE numId = new.numId; END;
sqlite> CREATE TRIGGER user_au AFTER UPDATE OF numId, prefix ON user
...> BEGIN UPDATE user SET id = printf('%s%07d', new.prefix, new.numId) WHERE numId = new.numId; END;
sqlite> INSERT INTO user(prefix) VALUES ('TEST');
sqlite> SELECT * FROM user;
numId prefix id
---------- ---------- ------------
1 TEST TEST0000001
sqlite> UPDATE user SET prefix='APPLE' WHERE numId = 1;
sqlite> SELECT * FROM user;
numId prefix id
---------- ---------- ------------
1 APPLE APPLE0000001

Multiple autoincrement ids based on table column

I need help with database design.
I have the following tables.
Pseudo code:
Table order_status {
id int[pk, increment]
name varchar
}
Table order_status_update {
id int[pk, increment]
order_id int[ref: > order.id]
order_status_id int[ref: > order_status.id]
updated_at datetime
}
Table order_category {
id int[pk, increment]
name varchar
}
Table file {
id int[pk, increment]
order_id int[ref: > order.id]
key varchar
name varchar
path varchar
}
Table order {
id int [pk] // primary key
order_status_id int [ref: > order_status.id]
order_category_id int [ref: > order_category.id]
notes varchar
attributes json // no of attributes is not fixed, hence needed a json column
}
Everything was okay, but now I need an auto-incrementing id for each value of the order_category_id column.
For example, if I have 2 categories, electronics and toys, then I would need the values electronics-1, toy-1, toy-2, electronics-2, electronics-3, toy-3, toy-4, toy-5 associated with rows of the order table. But that is not possible, as auto-increment advances with each new row, regardless of the value in another column.
In other words, for table order instead of
id order_category_id
---------------------
1 1
2 1
3 1
4 2
5 1
6 2
7 1
I need following,
id order_category_id pretty_ids
----------------------------
1 1 toy-1
2 1 toy-2
3 1 toy-3
4 2 electronics-1
5 1 toy-4
6 2 electronics-2
7 1 toy-5
What I tried:
I created a separate table for each order category (not an ideal solution, but I currently have 6 order categories, so it works for now).
Now I have the tables electronics_order and toys_order. The columns are repetitive, but it works. But now I have another problem: every relationship with the other tables got ruined. Since both electronics_order and toys_order can have the same id, I cannot use the id column to reference the order_status_update, order_status and file tables.
I could create another order_category column in each of these tables, but would that be the right way? I am not experienced in database design, so I would like to know how others do it.
I also have a side question.
Do I need tables for order_category and order_status just to store names? These values will not change much, and I could keep them in code and store them in columns of the order table.
I know separate tables are good for flexibility, but I had to query the database 2 times to fetch order_status and order_category by name before inserting a new row into the order table. And later it will take multiple joins to query the order table.
--
If it helps, I am using flask-sqlalchemy in backend and postgresql as database server.
In order to track the increment id, which is based on the order_category, we can keep this value in another table. Let us call this table order_category_sequence. To show my solution, I created a simplified version of the order table along with order_category.
CREATE TABLE order_category (
id SERIAL PRIMARY KEY,
name VARCHAR(100) NULL
);
CREATE TABLE order_category_sequence (
id SERIAL PRIMARY KEY,
order_category_id int NOT NULL,
current_key int not null
);
Alter Table order_category_sequence Add Constraint "fk_order_category_id" FOREIGN KEY (order_category_id) REFERENCES order_category (id);
Alter Table order_category_sequence Add Constraint "uc_order_category_id" UNIQUE (order_category_id);
CREATE TABLE "order" (
id SERIAL PRIMARY KEY,
order_category_id int NOT NULL,
pretty_id VARCHAR(100) null
);
Alter Table "order" Add Constraint "fk_order_category_id" FOREIGN KEY (order_category_id) REFERENCES order_category (id);
The order_category_id column in the order_category_sequence table references order_category. The current_key column holds the last sequence value used for orders in that category.
When a new order row is added, we can use a trigger to read and increment the value in order_category_sequence and update pretty_id. The following trigger definition achieves this.
--function called every time a new order is added
CREATE OR REPLACE FUNCTION on_order_created()
RETURNS trigger AS
$BODY$
DECLARE
current_pretty_id varchar(100);
BEGIN
-- increment last value of the corresponding order_category_id in the sequence table
Update order_category_sequence
set current_key = (current_key + 1)
where order_category_id = NEW.order_category_id;
--prepare the pretty_id
Select
oc.name || '-' || s.current_key AS current_pretty_id
FROM order_category_sequence AS s
JOIN order_category AS oc on s.order_category_id = oc.id
WHERE s.order_category_id = NEW.order_category_id
INTO current_pretty_id;
--update order table
Update "order"
set pretty_id = current_pretty_id
where id = NEW.id;
RETURN NEW;
END;
$BODY$ LANGUAGE plpgsql;
CREATE TRIGGER order_created
AFTER INSERT
ON "order"
FOR EACH ROW
EXECUTE PROCEDURE on_order_created();
If we want to keep the two tables, order_category and order_category_sequence, synchronized, we can use another trigger to add a row to the latter table every time a new order category is added.
--function called every time a new order_category is added
CREATE OR REPLACE FUNCTION on_order_category_created()
RETURNS trigger AS
$BODY$
BEGIN
--insert a new row for the newly inserted order_category
Insert into order_category_sequence(order_category_id, current_key)
values (NEW.id, 0);
RETURN NEW;
END;
$BODY$ LANGUAGE plpgsql;
CREATE TRIGGER order_category_created
AFTER INSERT
ON order_category
FOR EACH ROW
EXECUTE PROCEDURE on_order_category_created();
Testing query and result:
Insert into order_category(name)
values ('electronics'),('toys');
Insert into "order"(order_category_id)
values (1),(2),(2);
select * from "order";
Regarding your side question: I prefer to store lookup values like order_status and order_category in separate tables. Doing this gives the flexibility described above and makes changes easy.
To answer your side question: yes, you should keep tables with names in them, for a number of reasons. First of all, such tables are small and generally kept in memory by the database, so there is negligible performance benefit to not using the tables. Second, you want to be able to use external tools to query the database and generate reports, and you want these kind of labels available to those tools. Third, you want to minimize the coupling of your software to the actual data so that they can evolve independently. Adding a new category should not require modifying your software.
Now, to the main question, there is no built-in facility for the kind of auto-increment you want. You have to build it yourself.
I suggest you keep the sequence number for each category as a column in the category table. Then you can update it and use the updated sequence number in the order table, like this (which is specific to PostgreSQL):
-- set up the tables
create table orders (
id SERIAL PRIMARY KEY,
order_category_id int,
pretty_id VARCHAR
);
create unique index order_category_pretty_id_idx
on orders (pretty_id);
create table order_category (
id SERIAL PRIMARY KEY,
name varchar NOT NULL,
seq int NOT NULL default 0
);
-- create the categories
insert into order_category
(name) VALUES
('toy'), ('electronics');
-- create orders, specifying the category ID and generating the pretty ID
WITH
new_category_id (id) AS (VALUES (1)), -- 1 here is the category ID for the new order
pretty AS (
UPDATE order_category
SET seq = seq + 1
WHERE id = (SELECT id FROM new_category_id)
RETURNING *
)
INSERT into orders (order_category_id, pretty_id)
SELECT new_category_id.id, concat(pretty.name, '-', pretty.seq)
FROM new_category_id, pretty;
You just plug in your category ID where I have 1 in the example and it will create the new pretty_id for that category. The first toy order will be toy-1, the next toy-2, etc.
| id | order_category_id | pretty_id |
| --- | ----------------- | ------------- |
| 1 | 1 | toy-1 |
| 2 | 1 | toy-2 |
| 3 | 2 | electronics-1 |
| 4 | 1 | toy-3 |
| 5 | 2 | electronics-2 |
In order to get toys-1, toys-2, toys-3 you should repeat the logic of order_status_update; there is no fundamental difference between tracking a status by time and tracking it by count.
It is just simpler in order_status_update: you put now() into updated_at, whereas for, let's say, an order_category_track table you would take the last value + 1. You could instead have a separate sequence per category, but I would not recommend that, because it binds database objects to data in the DB.
I would change the schema accordingly.
This schema might end up in an inconsistent state, but in my opinion your application has three different entities, "order", "order status" and "order category track", which live their own lives.
And still, it is almost impossible to achieve a consistent state for this task without locks, for example. The task is complicated by the condition that each next row depends on the previous one, which runs against SQL's set-based nature.
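For illustration, a minimal sketch of such locking, assuming the orders/order_category schema with a seq counter from the previous answer; the SELECT ... FOR UPDATE makes concurrent inserts for one category queue up without blocking other categories:
BEGIN;
-- take a row lock on the category so concurrent transactions serialize here
SELECT seq FROM order_category WHERE id = 1 FOR UPDATE;
UPDATE order_category SET seq = seq + 1 WHERE id = 1;
INSERT INTO orders (order_category_id, pretty_id)
SELECT id, name || '-' || seq FROM order_category WHERE id = 1;
COMMIT;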
I would suggest splitting category into a 2-level hierarchy: category (toy, electronic) and subcategory (toy-1, toy-2, electronic-1, etc.).
The order_subcategory.full_name column can then contain the compiled "toy-1" value, or you can create a view that builds this field on the fly:
select oc.name || '-' || os.number
from order_category as oc
join order_subcategory as os on oc.id = os.category_id
https://dbdiagram.io/d/5dd6a132edf08a25543e34f8
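For reference, a minimal sketch of the two tables behind that diagram (column names assumed from the view above):
CREATE TABLE order_category (
id serial PRIMARY KEY,
name varchar NOT NULL -- 'toy', 'electronic'
);
CREATE TABLE order_subcategory (
id serial PRIMARY KEY,
category_id int NOT NULL REFERENCES order_category (id),
number int NOT NULL, -- per-category counter: 1, 2, 3, ...
full_name varchar -- optionally the compiled 'toy-1' value
);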
Regarding your question "Do I need tables for order_category and order_status just to store names?":
It is best practice to store this kind of data in separate dictionary tables. It gives you consistency and reliability, and querying those tables is very fast and easy for an RDBMS, so feel free to use them.
I'll focus on only 3 of the tables you showed: order, order_status and order_category.
Creating a new table for each new category is not the right way. From your explanation, I think you are trying to use the order and order_category tables as a many-to-many relationship. If so, what you need is a pivot table.
In my version I added the order_status column to the order table; you can add this column to whichever of these tables suits your needs.
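A minimal sketch of such a pivot table (names assumed, since the original diagram is not reproduced here):
CREATE TABLE order_order_category (
order_id int REFERENCES "order" (id),
order_category_id int REFERENCES order_category (id),
PRIMARY KEY (order_id, order_category_id)
);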
Side question:
For order_status, if the order statuses are fixed (like only ACTIVE and INACTIVE, with no more values coming in the future), it would be better to use a column of ENUM type.
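For example, a sketch in PostgreSQL (the type name order_status_type is hypothetical, chosen to avoid clashing with the existing order_status table):
CREATE TYPE order_status_type AS ENUM ('ACTIVE', 'INACTIVE');
ALTER TABLE "order" ADD COLUMN status order_status_type NOT NULL DEFAULT 'ACTIVE';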
The easy answer would be to respond directly to your question, but I do not think that is a good thing in this case. So I will do otherwise.
I think that maybe the whole conception is wrong.
First things first: clarification of your business needs and assertions.
One order can have multiple categories
One category can concern multiple orders
One order can only have one status at a time but multiple through time
One status can be used by multiple orders
One order corresponds to a file (probably a billing proof)
One file concerns only one order
Second: advice
There is a small set of reserved keywords that you must not use in a production environment (https://www.postgresql.org/docs/current/sql-keywords-appendix.html). So, for example, I replace the word 'order' with 'command'.
A remaining question that absolutely needs an answer before production: why the attributes column in your 'order' table? There is a risk of violating normal forms here (https://www.geeksforgeeks.org/normal-forms-in-dbms/).
Third: conception solution
This is normally enough to give you a good start. But I want to have a little more fun :) So...
Fourth: interrogation of the needed performance
estimation of load per day/month on command (ten million rows per month?)
Fifth: physical solution proposition
Archiving in another tablespace (trigger when cancelled or terminated => archived)
Indexes in another tablespace (your DBA will thank you for that)
Possible partitioning of the command table (https://pgxn.org/dist/pg_partman/doc/pg_partman.html, https://www.postgresql.org/docs/current/ddl-partitioning.html)
Hardware and option choices (high availability? disaster recovery? if so, the elaboration needs further study, but not much)
Data transposition (is it really needed? if so, the elaboration needs further study, but not much)
The finaaaaaal code-down! (with the good music)
-- as a postgres user
CREATE DATABASE command_system;
CREATE SCHEMA in_progress_command;
CREATE SCHEMA archived_command;
--DROP SCHEMA public;
-- create tablespaces on a location other than below
CREATE TABLESPACE command_indexes_tbs LOCATION 'c:/Data/indexes';
CREATE TABLESPACE archived_command_tbs LOCATION 'c:/Data/archive';
CREATE TABLESPACE in_progress_command_tbs LOCATION 'c:/Data/command';
CREATE TABLE in_progress_command.command
(
id bigint /*or bigserial if you use an INSERT RETURNING clause*/ primary key
, notes varchar(500)
, fileULink varchar(500)
)
TABLESPACE in_progress_command_tbs;
CREATE TABLE archived_command.command
(
id bigint /*or bigserial if you use an INSERT RETURNING clause*/ primary key
, notes varchar(500)
, fileULink varchar(500)
)
TABLESPACE archived_command_tbs;
CREATE TABLE in_progress_command.category
(
id int primary key
, designation varchar(45) NOT NULL
)
TABLESPACE in_progress_command_tbs;
INSERT INTO in_progress_command.category
VALUES (1,'Toy'), (2,'Electronic'), (3,'Leather'); --non-exhaustive list
CREATE TABLE in_progress_command.status
(
id int primary key
, designation varchar(45) NOT NULL
)
TABLESPACE in_progress_command_tbs;
INSERT INTO in_progress_command.status
VALUES (1,'Shipping'), (2,'Cancelled'), (3,'Terminated'), (4,'Paid'), (5,'Initialised'); --non-exhaustive list
CREATE TABLE in_progress_command.command_category
(
id bigserial primary key
, idCategory int
, idCommand bigint
)
TABLESPACE in_progress_command_tbs;
ALTER TABLE in_progress_command.command_category
ADD CONSTRAINT fk_command_category_category FOREIGN KEY (idCategory) REFERENCES in_progress_command.category(id);
ALTER TABLE in_progress_command.command_category
ADD CONSTRAINT fk_command_category_command FOREIGN KEY (idCommand) REFERENCES in_progress_command.command(id);
CREATE INDEX idx_command_category_category ON in_progress_command.command_category USING BTREE (idCategory) TABLESPACE command_indexes_tbs;
CREATE INDEX idx_command_category_command ON in_progress_command.command_category USING BTREE (idCommand) TABLESPACE command_indexes_tbs;
CREATE TABLE archived_command.command_category
(
id bigserial primary key
, idCategory int
, idCommand bigint
)
TABLESPACE archived_command_tbs;
ALTER TABLE archived_command.command_category
ADD CONSTRAINT fk_command_category_category FOREIGN KEY (idCategory) REFERENCES in_progress_command.category(id);
ALTER TABLE archived_command.command_category
ADD CONSTRAINT fk_command_category_command FOREIGN KEY (idCommand) REFERENCES archived_command.command(id);
CREATE INDEX idx_command_category_category ON archived_command.command_category USING BTREE (idCategory) TABLESPACE command_indexes_tbs;
CREATE INDEX idx_command_category_command ON archived_command.command_category USING BTREE (idCommand) TABLESPACE command_indexes_tbs;
CREATE TABLE in_progress_command.command_status
(
id bigserial primary key
, idStatus int
, idCommand bigint
, change_timestamp timestamp --anticipate the time-zone problem if you can
)
TABLESPACE in_progress_command_tbs;
ALTER TABLE in_progress_command.command_status
ADD CONSTRAINT fk_command_status_status FOREIGN KEY (idStatus) REFERENCES in_progress_command.status(id);
ALTER TABLE in_progress_command.command_status
ADD CONSTRAINT fk_command_status_command FOREIGN KEY (idCommand) REFERENCES in_progress_command.command(id);
CREATE INDEX idx_command_status_status ON in_progress_command.command_status USING BTREE (idStatus) TABLESPACE command_indexes_tbs;
CREATE INDEX idx_command_status_command ON in_progress_command.command_status USING BTREE (idCommand) TABLESPACE command_indexes_tbs;
CREATE UNIQUE INDEX idxu_command_state ON in_progress_command.command_status USING BTREE (change_timestamp, idStatus, idCommand) TABLESPACE command_indexes_tbs;
-- the archive-side status table must exist before the archiving trigger can fire
CREATE TABLE archived_command.command_status
(
id bigserial primary key
, idStatus int
, idCommand bigint
, change_timestamp timestamp
)
TABLESPACE archived_command_tbs;
ALTER TABLE archived_command.command_status
ADD CONSTRAINT fk_command_status_status FOREIGN KEY (idStatus) REFERENCES in_progress_command.status(id);
ALTER TABLE archived_command.command_status
ADD CONSTRAINT fk_command_status_command FOREIGN KEY (idCommand) REFERENCES archived_command.command(id);
CREATE INDEX idx_command_status_status ON archived_command.command_status USING BTREE (idStatus) TABLESPACE command_indexes_tbs;
CREATE INDEX idx_command_status_command ON archived_command.command_status USING BTREE (idCommand) TABLESPACE command_indexes_tbs;
CREATE UNIQUE INDEX idxu_command_state ON archived_command.command_status USING BTREE (change_timestamp, idStatus, idCommand) TABLESPACE command_indexes_tbs;
CREATE OR REPLACE FUNCTION sp_trg_archiving_command ()
RETURNS TRIGGER
LANGUAGE plpgsql
AS $function$
BEGIN
-- Copy the data
INSERT INTO archived_command.command
SELECT *
FROM in_progress_command.command
WHERE id = new.idCommand;
INSERT INTO archived_command.command_status (idStatus, idCommand, change_timestamp)
SELECT idStatus, idCommand, change_timestamp
FROM in_progress_command.command_status
WHERE idCommand = new.idCommand;
INSERT INTO archived_command.command_category (idCategory, idCommand)
SELECT idCategory, idCommand
FROM in_progress_command.command_category
WHERE idCommand = new.idCommand;
-- Delete the data
DELETE FROM in_progress_command.command_status
WHERE idCommand = new.idCommand;
DELETE FROM in_progress_command.command_category
WHERE idCommand = new.idCommand;
DELETE FROM in_progress_command.command
WHERE id = new.idCommand;
RETURN new;
END;
$function$;
DROP TRIGGER IF EXISTS t_trg_archiving_command ON in_progress_command.command_status;
CREATE TRIGGER t_trg_archiving_command
AFTER INSERT
ON in_progress_command.command_status
FOR EACH ROW
WHEN (new.idStatus = 2 OR new.idStatus = 3)
EXECUTE PROCEDURE sp_trg_archiving_command();
Conclusion:
In many cases, when you are worried about where your keys are, it is because they are not in the right place. The same goes for car keys! :D
Do not take any solution as prophecy: benchmark it.

Insert jsonb into postgresql

I want to create an object and save it into the DB:
Log l = new Log();
l.setTimestamp("creation_date", Util.getCurrentTimestamp());
l.setString("app_name", "name");
l.setString("log_type", "type");
l.setLong("user_id", 9l);
l.setLong("module_element_id", 9);
l.set("info", JsonHelper.toJsonString("{}"));
l.save();
I've tried multiple solutions but always get this error:
ERROR: column "info" is of type jsonb but expression is of type character varying
How to insert a jsonb?
edit (DDL):
-- Table: public.log
-- DROP TABLE public.log;
CREATE TABLE public.log
(
id bigint NOT NULL DEFAULT nextval('log_id_seq'::regclass),
creation_date timestamp without time zone,
app_name text,
log_type text,
user_id bigint,
module_element_type bigint,
info jsonb,
CONSTRAINT log_pkey PRIMARY KEY (id)
)
WITH (
OIDS=FALSE
);
ALTER TABLE public.log
OWNER TO postgres;
This implementation is currently missing in the framework. It was already reported in https://github.com/javalite/activejdbc/issues/640. If you can help track down the PostgreSQL documentation for all the Postgres data types where this syntax is used with PreparedStatement:
UPDATE my_table SET status = ?::status_type WHERE id = ?
I will then be in a position to quickly implement it for the PostgreSQL dialect: https://github.com/javalite/activejdbc/blob/master/activejdbc/src/main/java/org/javalite/activejdbc/dialects/PostgreSQLDialect.java
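In the meantime, the same cast trick can be applied by hand. A sketch in plain SQL against the log table from the DDL above, showing that a string value cast with ::jsonb is accepted where a plain varchar is not:
INSERT INTO public.log (creation_date, app_name, log_type, user_id, info)
VALUES (now(), 'name', 'type', 9, '{}'::jsonb);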
Update Jan 4 2022:
The implementation was added to SNAPSHOT-3.0 and will make it into the next release, 3.0, for Java 16.
You can see the discussion here: https://github.com/javalite/javalite/issues/640

Define foreign key in Postgres to a subset of a target table

Example:
I have:
Table A:
int id
int table_b_id
Table B:
int id
text type
I want to add a constraint on the table_b_id column that verifies it points only to rows in table B whose type value is 'X'.
I can't change the table structure.
I understand it can be done with a CHECK constraint and a Postgres function that runs the specific query, but I've seen people recommend avoiding that.
Any input on the best approach to implement this would be helpful.
What you are referring to is not a FOREIGN KEY, which, in PostgreSQL, refers to a (number of) column(s) in another table where there is a unique index on that/those column(s), and which may have associated automatic actions when the value(s) of that/those column(s) change (ON UPDATE, ON DELETE).
You are trying to enforce a specific kind of referential integrity, similar to what a FOREIGN KEY does. You can do this with a CHECK clause and a function (because the CHECK clause does not allow sub-queries), or with table inheritance and range partitioning (referring to a child table which holds only rows where type = 'X'), but it is probably easiest to do this with a trigger:
CREATE FUNCTION trf_test_type_x() RETURNS trigger AS $$
BEGIN
PERFORM * FROM tableB WHERE id = NEW.table_b_id AND type = 'X';
IF NOT FOUND THEN
-- RAISE NOTICE 'Foreign key violation...';
RETURN NULL;
END IF;
RETURN NEW;
END;
$$ LANGUAGE plpgsql;
CREATE TRIGGER tr_test_type_x
BEFORE INSERT OR UPDATE ON tableA
FOR EACH ROW EXECUTE PROCEDURE trf_test_type_x();
You can create a partial index on tableB to speed things up:
CREATE UNIQUE INDEX idx_type_X ON tableB(id) WHERE type = 'X';
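A quick sanity check, assuming minimal tables matching the question are created first and the function and trigger above are then applied; the violating insert is silently skipped because the trigger returns NULL instead of raising:
CREATE TABLE tableB (id int PRIMARY KEY, type text);
CREATE TABLE tableA (id int PRIMARY KEY, table_b_id int);
INSERT INTO tableB VALUES (1, 'X'), (2, 'Y');
INSERT INTO tableA VALUES (1, 1); -- kept: row 1 in tableB has type 'X'
INSERT INTO tableA VALUES (2, 2); -- skipped: row 2 in tableB has type 'Y'
SELECT * FROM tableA; -- returns only (1, 1)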
The most elegant solution, in my opinion, is to use inheritance to get a subtyping behavior:
PostgreSQL 9.3 Schema Setup with inheritance:
create table B ( id int primary key );
-- Instead to create a 'type' field, inherit from B for
-- each type with custom properties:
create table B_X ( -- some_data varchar(10),
constraint pk primary key (id)
) inherits (B);
-- Sample data:
insert into B_X (id) values ( 1 );
insert into B (id) values ( 2 );
-- Now, instead to reference B, you should reference B_X:
create table A ( id int primary key, B_id int references B_X(id) );
-- Here it is:
insert into A values ( 1, 1 );
--Inserting wrong values will cause a violation:
insert into A values ( 2, 2 );
ERROR: insert or update on table "a" violates foreign key constraint "a_b_id_fkey"
Detail: Key (b_id)=(2) is not present in table "b_x".
Retrieving all data from the base table:
select * from B;
Results:
| id |
|----|
| 2 |
| 1 |
Retrieving data with type:
SELECT p.relname, c.*
FROM B c inner join pg_class p on c.tableoid = p.oid
Results:
| relname | id |
|---------|----|
| b | 2 |
| b_x | 1 |