Serial numbers per group of rows for compound key - sql

I am trying to maintain an address history table:
CREATE TABLE address_history (
  person_id int,
  sequence int,
  timestamp datetime default current_timestamp,
  address text,
  original_address text,
  previous_address text,
  PRIMARY KEY (person_id, sequence),
  FOREIGN KEY (person_id) REFERENCES people (id)
);
I'm wondering if there's an easy way to autonumber/constrain sequence in address_history to automatically count up from 1 for each person_id.
In other words, the first row with person_id = 1 would get sequence = 1; the second row with person_id = 1 would get sequence = 2. The first row with person_id = 2 would get sequence = 1 again, etc.
Also, is there a better / built-in way to maintain a history like this?

Don't. It has been tried many times and it's a pain.
Use a plain serial or IDENTITY column:
Auto increment table column
CREATE TABLE address_history (
address_history_id serial PRIMARY KEY
, person_id int NOT NULL REFERENCES people(id)
, created_at timestamp NOT NULL DEFAULT current_timestamp
, previous_address text
);
Use the window function row_number() to get serial numbers without gaps per person_id. You could create a VIEW to use as a drop-in replacement for your table in queries, so those numbers are ready when you need them:
CREATE VIEW address_history_nr AS
SELECT *, row_number() OVER (PARTITION BY person_id
                             ORDER BY address_history_id) AS adr_nr
FROM   address_history;
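For illustration, querying the view then delivers the per-person numbers on the fly, with nothing extra stored (a sketch; the person_id value is just an example):
SELECT person_id, adr_nr, created_at, previous_address
FROM   address_history_nr
WHERE  person_id = 1
ORDER  BY adr_nr;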
See:
Gap-less sequence where multiple transactions with multiple tables are involved
Or you might want to ORDER BY something else. Maybe created_at? Better created_at, address_history_id to break possible ties. Related answer:
Column with alternate serials
Also, the data type you are looking for is timestamp or timestamptz, not datetime in Postgres:
Ignoring time zones altogether in Rails and PostgreSQL
And you only need to store previous_address (or more details), not address, nor original_address. Both would be redundant in a sane data model.

Related

Prevent Duplicate Entries in SQL Application

I have an application that's running on two different machines.
The use of the application is pretty simple: we scan a product, and it associates the product_id and creates a Unique_ID that auto-increments.
Ex: U00001 then the next is U00002
My problem is that while both machines are running, sometimes the Unique_ID is the same for two different products. It's as if the creation of the Unique_ID happens at the same time, so it duplicates the entry.
What's the best approach for this? Is it a connection problem?
You need a SEQUENCE or IDENTITY column, and then a computed column that concatenates the U onto it:
CREATE TABLE YourTable (
  ID int IDENTITY PRIMARY KEY,
  product_id varchar(30),
  Unique_ID AS FORMAT(ID, '"U"00000')
)
Or
CREATE SEQUENCE YourTable_IDs AS int START WITH 1 INCREMENT BY 1 MAXVALUE 99999;
CREATE TABLE YourTable (
  ID int PRIMARY KEY DEFAULT (NEXT VALUE FOR YourTable_IDs),
  product_id varchar(30),
  Unique_ID AS FORMAT(ID, '"U"00000')
)
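For illustration, usage would then look something like this (a sketch, assuming SQL Server; the sample product values are made up):
-- The computed column is derived from ID automatically
INSERT INTO YourTable (product_id) VALUES ('PROD-001');
INSERT INTO YourTable (product_id) VALUES ('PROD-002');
SELECT ID, product_id, Unique_ID FROM YourTable;  -- 1, PROD-001, U00001 / 2, PROD-002, U00002
Because the database hands out the number, two machines inserting at the same time can no longer end up with the same Unique_ID.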

Auto increment column depending on another column's value

Hi, I'm very new to PostgreSQL and SQL in general, so my terminology will probably be off.
I'm trying to add a version number column to my text table:
name        type       desc
----------  ---------  --------------------------------
text_id     uuid       unique id
post_id     uuid       id of the collection of versions
version_nr  INT        version number of document
created_at  TIMESTAMP  when the document was edited
So basically when I create a new row I want to increment the version number, but I don't want all the rows to share the same "increment". I want all rows sharing the same post_id to have their own "increment".
This is what I've come up with this far:
CREATE TABLE text (
  text_id uuid PRIMARY KEY DEFAULT UUID_GENERATE_V4(),
  post_id uuid NOT NULL,
  version_nr SERIAL, -- <--- I DONT KNOW
  created_at TIMESTAMP WITH TIME ZONE DEFAULT CURRENT_TIMESTAMP
)
I just don't understand how to solve the version_nr part.
Thanks!
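One possible sketch, following the row_number() approach from the first answer above (the text_seq column and the view name are placeholders, not a definitive design): store a plain identity column and derive version_nr per post_id in a view.
CREATE TABLE text (
  text_id    uuid PRIMARY KEY DEFAULT UUID_GENERATE_V4(),
  post_id    uuid NOT NULL,
  text_seq   bigint GENERATED ALWAYS AS IDENTITY,  -- plain table-wide counter
  created_at TIMESTAMP WITH TIME ZONE DEFAULT CURRENT_TIMESTAMP
);
-- version_nr is computed per post_id at query time, so it stays gap-less
CREATE VIEW text_versions AS
SELECT *, row_number() OVER (PARTITION BY post_id ORDER BY text_seq) AS version_nr
FROM   text;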

How to store array of tables in Schema

So I am a student currently learning about PostgreSQL. I am trying to figure out how to randomly seed data. I have 10M users and 100 stocks.
Currently my tables will look like:
CREATE TABLE user (
  user_id INTEGER NOT NULL,
  amount_of_stocks [][] array, -- this is just an assumption
  PRIMARY KEY (user_id)
);
CREATE TABLE stock (
  stock_id INTEGER NOT NULL,
  amount_per_stock INT,
  quantity INT,
  PRIMARY KEY (stock_id)
);
How would I store 100 different stocks for each user?
Sounds like a classical many-to-many relationship. Should not involve arrays at all. Assuming Postgres 10 or later, use something along these lines:
CREATE TABLE users ( -- "user" is a reserved word!
user_id int GENERATED ALWAYS AS IDENTITY PRIMARY KEY
, username text UNIQUE NOT NULL -- or similar
);
CREATE TABLE stock (
stock_id int GENERATED ALWAYS AS IDENTITY PRIMARY KEY
, stock text UNIQUE NOT NULL -- or similar
);
CREATE TABLE user_stock (
user_id int REFERENCES users
, stock_id int REFERENCES stock
, amount int NOT NULL
, PRIMARY KEY (user_id, stock_id)
);
Detailed explanation:
How to implement a many-to-many relationship in PostgreSQL?
Auto increment table column
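For illustration, a typical lookup against this schema could look like this (a sketch; the user_id value is just an example):
-- All stocks held by one user, with the amount per stock
SELECT s.stock, us.amount
FROM   user_stock us
JOIN   stock s USING (stock_id)
WHERE  us.user_id = 1;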
Seed
Postgres provides generate_series() to conveniently generate rows, and random() to generate random numbers:
INSERT INTO users(username)
SELECT 'user_' || g
FROM generate_series(1, 10000000) g; -- 10M (!) - try with just 10 first
INSERT INTO stock(stock)
SELECT 'stock_' || g
FROM generate_series(1, 100) g;
Experiment with a small number of users first. 10M users * 100 stocks generates a billion rows. Takes some time and occupies some space.
How would I store 100 different stocks for each user?
INSERT INTO user_stock
(user_id, stock_id, amount)
SELECT u.user_id, s.stock_id, ceil(random() * 1000)::int
FROM users u, stock s; -- cross join
Every user gets 100 different stocks, though everyone gets the same set in this basic example since you did not specify more closely (a variation is sketched below). I added a random amount per stock between 1 and 1000.
About the cross join to produce the Cartesian product:
What does [FROM x, y] mean in Postgres?
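If you would rather not give everyone the identical set, a random subset per user could be seeded along these lines (a sketch; the 0.5 threshold is arbitrary):
INSERT INTO user_stock (user_id, stock_id, amount)
SELECT u.user_id, s.stock_id, ceil(random() * 1000)::int
FROM   users u, stock s
WHERE  random() < 0.5;  -- keeps roughly half of the 100 stocks per user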
CREATE TABLE user (
user_id INTEGER NOT NULL,
stocks text[],
PRIMARY KEY (user_id)
);
Store a list of the primary keys from your stock table, so you can easily look up their values with a select statement.
If you wanted to, you could make the array two-dimensional and store the value as well, but that violates some principle, I'm sure, as you already have a table for that purpose.
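For illustration, looking the values up from the array then takes an unnest() and a join (a sketch, assuming the table above and integer stock ids stored as text):
-- "user" must be quoted, it is a reserved word
SELECT u.user_id, s.*
FROM   "user" u
CROSS  JOIN unnest(u.stocks) AS st(stock_id)
JOIN   stock s ON s.stock_id = st.stock_id::int;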

Modelling Post and Flag relationship in SQL

I am modeling the data for a web app I am building. I use a PostgreSQL database.
In the app there are posts, like SO posts, and also flags for posts, like GitHub flags or marks, whatever the correct term is. A post can have only one flag at a time. There are plenty of posts, ever increasing, but only four or five flags, and they will not increase.
First approach, normalized: I have modeled this part of my data with three tables; two for the corresponding entities, posts and flags, and one for the relationship, post_flag. Neither entity table references the other; the relationship is recorded only in the post_flag table, which holds just the id pair of a post and a flag.
Table structure in that case would be:
CREATE TABLE posts
(
  id bigserial PRIMARY KEY,
  created_at timestamp without time zone NOT NULL DEFAULT CURRENT_TIMESTAMP,
  title character varying(100),
  text text,
  score integer DEFAULT 0,
  author_id integer NOT NULL REFERENCES users (id),
  product_id integer NOT NULL REFERENCES products (id)
);
CREATE TABLE flags
(
  id bigserial PRIMARY KEY,
  created_at timestamp without time zone NOT NULL DEFAULT CURRENT_TIMESTAMP,
  flag character varying(30) NOT NULL -- planned, in progress, fixed
);
CREATE TABLE post_flag
(
  created_at timestamp without time zone NOT NULL DEFAULT CURRENT_TIMESTAMP,
  post_id integer NOT NULL REFERENCES posts (id),
  flag_id integer NOT NULL REFERENCES flags (id)
);
To get posts flagged as fixed I have to use:
-- homepage posts- fixed posts tab
SELECT
p.*,
f.flag
FROM posts p
JOIN post_flag p_f
ON p.id = p_f.post_id
JOIN flags f
ON p_f.flag_id = f.id
WHERE f.flag = 'fixed'
ORDER BY p_f.created_at DESC
Second approach: I have two tables, posts and flags. The posts table has a flag_id column that references a flag in the flags table.
CREATE TABLE posts
(
  id bigserial PRIMARY KEY,
  created_at timestamp without time zone NOT NULL DEFAULT CURRENT_TIMESTAMP,
  title character varying(100),
  text text,
  score integer DEFAULT 0,
  author_id integer NOT NULL REFERENCES users (id),
  product_id integer NOT NULL REFERENCES products (id),
  flag_id integer DEFAULT NULL REFERENCES flags (id)
);
CREATE TABLE flags
(
  id bigserial PRIMARY KEY,
  created_at timestamp without time zone NOT NULL DEFAULT CURRENT_TIMESTAMP,
  flag character varying(30) NOT NULL -- one of planned, in progress, fixed
);
For the same data:
-- homepage posts- fixed posts tab
SELECT
p.*,
f.flag
FROM posts p
JOIN flags f
ON p.flag_id = f.id
WHERE f.flag = 'fixed'
ORDER BY p.created_at DESC
Third approach, denormalized: I have only one table, posts. The posts table has a flag column to store the flag assigned to the post.
CREATE TABLE posts
(
  id bigserial PRIMARY KEY,
  created_at timestamp without time zone NOT NULL DEFAULT CURRENT_TIMESTAMP,
  title character varying(100),
  text text,
  score integer DEFAULT 0,
  author_id integer NOT NULL REFERENCES users (id),
  product_id integer NOT NULL REFERENCES products (id),
  flag character varying(30)
);
Here, for the same data, I would only have:
-- homepage posts- fixed posts tab
SELECT
p.*
FROM posts p
WHERE p.flag = 'fixed'
ORDER BY p.created_at DESC
I wonder if the first approach is overkill in terms of normalization for an RDBMS like PostgreSQL. For a post-comment relationship the first approach would be great, and indeed I make use of it. But here I have a very small amount of data used as metadata for posts, such as badges, flags, and tags. As you can see, even in the most normalized form, the first approach, I already use columns like product_id that reference another table directly, saving a JOIN; that pattern matches my second approach. Should I use the more denormalized approach, the third one, with a flag column right in the posts table? Which approach is better in terms of performance, expansion, and maintainability?
Use the second approach.
The first is a many-to-many data structure and you say
A post can have only one flag at a time.
So you would then have to build the business logic into the front-end, or set up rules to check that a post never has more than one flag.
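(If you did keep the first approach, that rule could at least be enforced in the database instead of the front-end; a sketch:)
-- At most one flag per post in the link table
ALTER TABLE post_flag
  ADD CONSTRAINT post_flag_one_flag_per_post UNIQUE (post_id);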
The third approach will result in messy data, again unless you implement checks or rules to ensure the flags are not misspelled or new ones added.
Expansion and maintainability are provided in the second approach; it is also self documenting. Worry about performance when it actually becomes a problem, and not before.
Personally, I would make the flag_id field in the posts table nullable, which would allow you to model a post without a flag.
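With a nullable flag_id, unflagged posts are still easy to list together with flagged ones (a sketch):
-- LEFT JOIN keeps posts whose flag_id IS NULL; flag simply comes back as NULL for them
SELECT p.*, f.flag
FROM posts p
LEFT JOIN flags f ON f.id = p.flag_id
ORDER BY p.created_at DESC;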
Blending two approaches
Assuming your flag names are unique, you can use the flag name as a natural key. Your table structures would then be
CREATE TABLE posts
(
  id bigserial PRIMARY KEY,
  ... other fields
  flag character varying(30) REFERENCES flags (flag)
);
CREATE TABLE flags
(
  flag character varying(30) NOT NULL PRIMARY KEY,
  created_at timestamp without time zone NOT NULL DEFAULT CURRENT_TIMESTAMP
);
You then get the benefit of being able to write queries on flag without having to JOIN to the flags table, while still having flag names checked by the foreign key reference.
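For illustration, the homepage query from above then needs no join at all (a sketch against this blended schema):
-- flag is stored on posts, but still validated by the reference to flags
SELECT *
FROM posts
WHERE flag = 'fixed'
ORDER BY created_at DESC;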

Alternative SQL Query for a Badly-Designed Oracle Table

I am pulling data from a legacy table (that I did not design) to convert that data for use in a different application. Here is the truncated table design:
-- Create table
create table order
(
  insert_timestamp TIMESTAMP(6) default systimestamp not null,
  numeric_identity NUMBER not null,
  my_data VARCHAR2(100) not null
);
-- Create/Recreate primary, unique and foreign key constraints
alter table order
  add constraint order_pk primary key (numeric_identity, insert_timestamp);
The idea behind this original structure was that the numeric_identity identified a particular customer. The most current order would be the one with the newest insert timestamp value for the given customer's numeric identity. In this particular case, there are no instances where more than one row has the same insert_timestamp value and numeric_identity value.
I'm tasked with retrieving this legacy data for conversion. I wrote the following query to pull back the latest, unique records, as older records need not be converted:
select * from order t where t.insert_timestamp =
  (select max(w.insert_timestamp) from order w
   where t.numeric_identity = w.numeric_identity);
This pulls back the expected dataset, but could fail if there somehow were more than one row with the same insert_timestamp and numeric_identity. Is there a better query than what I've written to pull back unique records in a table designed in this fashion?
Another way to write this query:
select *
from (select t.*,
             row_number() over (partition by numeric_identity order by insert_timestamp desc) rn
      from order t)
where rn = 1
Also, you can't get a situation where more than one row has the same insert_timestamp and numeric_identity, because you have a primary key on these two columns.
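If you want to double-check that against the legacy data anyway, a quick sanity query would be (a sketch; it should return no rows):
select numeric_identity, insert_timestamp, count(*)
from order
group by numeric_identity, insert_timestamp
having count(*) > 1;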