What is the best way to implement an auto-increment key that is "local" to some other column (i.e. comment id starts from 1 for each blog post)?
For instance, on GitHub, the issue number is local to the repository: issue #1 means that it's the first issue of your repo, and makes life easier for everyone by not having to use longer and seemingly random IDs.
For instance given:
CREATE TABLE post (
id bigserial PRIMARY KEY
, title varchar(255) NOT NULL
);
CREATE TABLE "comment" (
post_id bigint REFERENCES post NOT NULL
, id bigint NOT NULL
, "comment" text NOT NULL
, PRIMARY KEY (id, post_id)
);
One way to solve the problem is to calculate the max id of all comments for a given post_id:
INSERT INTO post (id, title) VALUES (1, 'first post');
INSERT INTO "comment" (post_id, id, "comment") VALUES (
1,
(SELECT COALESCE(MAX(id) + 1, 1) FROM "comment" WHERE post_id = 1 LIMIT 1),
'1st comment of 1st post'
);
^ This feels like a kludge, and I am also worried about possible serialisability issues.
What is the best way to implement this in PostgreSQL? Thanks!
I would say that the simple method is to forget about it. That is, create the table like this:
create table comments (
comment_id bigserial primary key,
post_id bigint REFERENCES post NOT NULL,
comment text NOT NULL
);
And then calculate the value on the fly:
create view v_comments as
select c.*,
row_number() over (partition by post_id order by comment_id) as post_seqnum
from comments c;
Of course, this is not exactly the same thing. For instance, the post_seqnum does not uniquely identify each row over time, because deleting a comment renumbers the later comments of the same post.
However, this still has a unique identifier for each row that can be used for such purposes. Plus, there is a single primary key column, which is generally preferable for foreign key references and debugging.
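To see the renumbering effect concretely, here is a minimal sketch that runs the same view against an in-memory SQLite database through Python's sqlite3 module. SQLite is only a stand-in for PostgreSQL here (it needs SQLite 3.25 or newer for window functions), INTEGER PRIMARY KEY replaces bigserial, and the sample rows and helper names are made up for illustration.

```python
import sqlite3


def build_demo():
    """Create the comments table and the numbering view with sample rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE comments (
            comment_id INTEGER PRIMARY KEY,   -- stand-in for bigserial
            post_id    INTEGER NOT NULL,
            comment    TEXT NOT NULL
        );
        CREATE VIEW v_comments AS
        SELECT c.*,
               row_number() OVER (PARTITION BY post_id
                                  ORDER BY comment_id) AS post_seqnum
        FROM comments c;
        INSERT INTO comments (post_id, comment) VALUES
            (1, 'a'), (2, 'b'), (1, 'c'), (1, 'd');
    """)
    return conn


def comment_numbers(conn):
    """Return (post_id, comment_id, post_seqnum) rows from the view."""
    return conn.execute(
        "SELECT post_id, comment_id, post_seqnum FROM v_comments "
        "ORDER BY post_id, comment_id"
    ).fetchall()


conn = build_demo()
before = comment_numbers(conn)
# Deleting the first comment of post 1 shifts the numbers of the later ones.
conn.execute("DELETE FROM comments WHERE comment_id = 1")
after = comment_numbers(conn)
print(before)
print(after)
```

In this run comment_id 3 is numbered 2 within post 1 before the delete and 1 afterwards, while comment_id itself never changes.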
For a simple example, let's say I have a list table and a list_entry table:
CREATE TABLE list
(
id SERIAL PRIMARY KEY
);
CREATE TABLE list_entry
(
id SERIAL PRIMARY KEY,
list_id INTEGER NOT NULL
REFERENCES list(id)
ON DELETE CASCADE,
position INTEGER NOT NULL,
value TEXT NOT NULL,
CONSTRAINT list_entry__position_in_list_unique
UNIQUE(list_id, position)
);
I now want to add the following constraint: all list entries with the same list_id have position entries that form a contiguous sequence starting at 1.
And I have no idea how.
I first thought about EXCLUDE constraints, but that seems to lead nowhere.
I could of course create a trigger, but I'd prefer not to, if at all possible.
You can't do that with a constraint - you would need to implement the logic in code (e.g. using triggers, stored procedures, application code, etc.)
I'm not aware of a way to do this with constraints. Normally a trigger would be the most straightforward choice, but if you want to avoid triggers, you can compute the next position number for the list_id you're about to insert into. For example, to insert a list_entry with list_id = 1:
INSERT INTO list_entry (list_id,position,value) VALUES
(1,(SELECT coalesce(max(position),0)+1 FROM list_entry WHERE list_id = 1),42);
Demo: db<>fiddle
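As a rough illustration of the max(position) + 1 insert, the sketch below runs it against an in-memory SQLite database (a stand-in for PostgreSQL) and adds a query that reports lists whose positions are not exactly 1..n. The helper names add_entry and non_contiguous_lists are hypothetical, and the parent list table is left out for brevity. The check relies on UNIQUE(list_id, position): with distinct positions, min = 1 and max = count(*) together imply a gap-free sequence. Nothing here guards against two sessions computing the same next position concurrently; the unique constraint would make one of them fail rather than silently collide.

```python
import sqlite3


def add_entry(conn, list_id, value):
    """Append a value at the next free position of the given list."""
    conn.execute(
        "INSERT INTO list_entry (list_id, position, value) "
        "VALUES (?, (SELECT coalesce(max(position), 0) + 1 "
        "            FROM list_entry WHERE list_id = ?), ?)",
        (list_id, list_id, value),
    )


def non_contiguous_lists(conn):
    """Return the list_ids whose positions are not exactly 1..n."""
    rows = conn.execute(
        "SELECT list_id FROM list_entry GROUP BY list_id "
        "HAVING min(position) <> 1 OR max(position) <> count(*) "
        "ORDER BY list_id"
    ).fetchall()
    return [row[0] for row in rows]


conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE list_entry (
        id       INTEGER PRIMARY KEY,
        list_id  INTEGER NOT NULL,   -- FK to the list table omitted here
        position INTEGER NOT NULL,
        value    TEXT NOT NULL,
        UNIQUE (list_id, position)
    )
""")
for list_id, value in [(1, "a"), (1, "b"), (2, "x"), (1, "c")]:
    add_entry(conn, list_id, value)

positions = conn.execute(
    "SELECT list_id, position, value FROM list_entry "
    "ORDER BY list_id, position"
).fetchall()
gaps_before = non_contiguous_lists(conn)
# Deleting a middle entry leaves a hole that the check then reports.
conn.execute("DELETE FROM list_entry WHERE list_id = 1 AND position = 2")
gaps_after = non_contiguous_lists(conn)
```

A check like this can run periodically or inside a trigger; it detects gaps but does not repair them.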
You can use a generated column to reference the previous number in the list, essentially building a linked list. This works in Postgres:
create table list_entry
(
pos integer not null primary key,
val text not null,
prev_pos integer not null
references list_entry (pos)
generated always as (greatest(0, pos-1)) stored
);
In this implementation, the first item (pos=0) points to itself.
I am modeling the data for a web app I am building. I use a PostgreSQL database.
In the app there are posts, like SO posts, and also flags for posts, like GitHub flags or marks, whatever the correct term is. A post can have only one flag at a time. There are plenty of posts and their number keeps growing, but there are only four or five flags, and that number will not grow.
First approach, normalized: I have modeled this part of my data with three tables: two for the entities, posts and flags, and one for the relationship, post_flag. Neither entity table references the other. Every relationship is recorded in post_flag, which holds the id pair of a post and a flag.
Table structure in that case would be:
CREATE TABLE posts
(
id bigserial PRIMARY KEY,
created_at timestamp without time zone NOT NULL DEFAULT CURRENT_TIMESTAMP,
title character varying(100),
text text,
score integer DEFAULT 0,
author_id integer NOT NULL REFERENCES users (id),
product_id integer NOT NULL REFERENCES products (id)
);
CREATE TABLE flags
(
id bigserial PRIMARY KEY,
created_at timestamp without time zone NOT NULL DEFAULT CURRENT_TIMESTAMP,
flag character varying(30) NOT NULL -- planned, in progress, fixed
);
CREATE TABLE post_flag
(
created_at timestamp without time zone NOT NULL DEFAULT CURRENT_TIMESTAMP,
post_id integer NOT NULL REFERENCES posts (id),
flag_id integer NOT NULL REFERENCES flags (id)
);
To get posts flagged as fixed I have to use:
-- homepage posts- fixed posts tab
SELECT
p.*,
f.flag
FROM posts p
JOIN post_flag p_f
ON p.id = p_f.post_id
JOIN flags f
ON p_f.flag_id = f.id
WHERE f.flag = 'fixed'
ORDER BY p_f.created_at DESC
Second approach; I have two tables posts and flags. The table posts has a flag_id column that references a flag in the flags table.
CREATE TABLE posts
(
id bigserial PRIMARY KEY,
created_at timestamp without time zone NOT NULL DEFAULT CURRENT_TIMESTAMP,
title character varying(100),
text text,
score integer DEFAULT 0,
author_id integer NOT NULL REFERENCES users (id),
product_id integer NOT NULL REFERENCES products (id),
flag_id integer DEFAULT NULL REFERENCES flags (id)
);
CREATE TABLE flags
(
id bigserial PRIMARY KEY,
created_at timestamp without time zone NOT NULL DEFAULT CURRENT_TIMESTAMP,
flag character varying(30) NOT NULL -- one of planned, in progress, fixed
);
For the same data:
-- homepage posts- fixed posts tab
SELECT
p.*,
f.flag
FROM posts p
JOIN flags f
ON p.flag_id = f.id
WHERE f.flag = 'fixed'
ORDER BY p.created_at DESC
Third approach denormalized; I have only one table posts. Posts table has a flag column to store the flag assigned to the post.
CREATE TABLE posts
(
id bigserial PRIMARY KEY,
created_at timestamp without time zone NOT NULL DEFAULT CURRENT_TIMESTAMP,
title character varying(100),
text text,
score integer DEFAULT 0,
author_id integer NOT NULL REFERENCES users (id),
product_id integer NOT NULL REFERENCES products (id),
flag character varying(30)
);
Here, for the same data, I would only need:
-- homepage posts- fixed posts tab
SELECT
p.*
FROM posts p
WHERE p.flag = 'fixed'
ORDER BY p.created_at DESC
I wonder whether the first approach is overkill in terms of normalization in an RDBMS like PostgreSQL. For a post-comment relationship the first approach would be great, and indeed I use it there. But I have a few small sets of metadata for posts, such as badges, flags, and tags. As you can see, even in the most normalized form (the first approach) I already keep columns such as product_id directly in posts, which saves a JOIN to a separate relationship table; that matches my second approach. Should I use the more denormalized third approach, with a flag column in the posts table? Which approach is better in terms of performance, expansion, and maintainability?
Use the second approach.
The first is a many-to-many data structure and you say
A post can have only one flag at a time.
So you would then have to build the business logic into the front end, or set up complex rules, to ensure a post never has more than one flag.
The third approach will result in messy data, again unless you implement checks or rules to ensure the flags are not misspelled or new ones added.
Expansion and maintainability are provided in the second approach; it is also self documenting. Worry about performance when it actually becomes a problem, and not before.
Personally, I would make the flag_id column in the posts table nullable, which would allow you to model a post without a flag.
Blending two approaches
Assuming your flag names are unique, you can use the flag name as a natural key. Your table structures would then be
CREATE TABLE posts
(
id bigserial PRIMARY KEY,
... other fields
flag character varying(30) REFERENCES flags (flag)
);
CREATE TABLE flags
(
flag character varying(30) NOT NULL PRIMARY KEY,
created_at timestamp without time zone NOT NULL DEFAULT CURRENT_TIMESTAMP
);
You then get the benefit of being able to write queries for flag without having to JOIN to the flags table while having flag names checked by the table reference.
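A small sketch of that trade-off, using an in-memory SQLite database from Python as a stand-in for PostgreSQL: the flag name lives directly in posts, so the "fixed" query needs no JOIN, while the foreign key still rejects a misspelled flag. The table contents, the trimmed-down columns, and the try_insert_post helper are assumptions for illustration. Note that SQLite only enforces foreign keys after PRAGMA foreign_keys = ON, whereas PostgreSQL always enforces them.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite-specific: FKs are off by default
conn.executescript("""
    CREATE TABLE flags (
        flag TEXT NOT NULL PRIMARY KEY
    );
    CREATE TABLE posts (
        id    INTEGER PRIMARY KEY,
        title TEXT,
        flag  TEXT REFERENCES flags (flag)   -- NULL means the post has no flag
    );
    INSERT INTO flags (flag) VALUES ('planned'), ('in progress'), ('fixed');
    INSERT INTO posts (title, flag) VALUES
        ('p1', 'fixed'), ('p2', NULL), ('p3', 'planned');
""")


def try_insert_post(conn, title, flag):
    """Insert a post; return False if the flag is not a known flag name."""
    try:
        conn.execute(
            "INSERT INTO posts (title, flag) VALUES (?, ?)", (title, flag)
        )
        return True
    except sqlite3.IntegrityError:
        return False


# No JOIN needed: the flag name is stored in posts itself.
fixed_titles = [
    row[0]
    for row in conn.execute(
        "SELECT title FROM posts WHERE flag = 'fixed' ORDER BY id"
    )
]
accepted = try_insert_post(conn, "p4", "in progress")
rejected = not try_insert_post(conn, "p5", "fixd")  # misspelled flag name
```

The cost of the natural key is that renaming a flag has to touch every referencing row (or rely on ON UPDATE CASCADE), which is usually acceptable for a handful of rarely changing flags.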
I am designing a database to capture audits that my company performs. I am having a bit of trouble finding an efficient way to capture all of the audit points without making 60 columns in a single table. Is there an efficient way to capture multiple data points in a single column and still be able to query them without trouble?
Each audit may have anywhere from 0 to 60 unique citations. I will make a reference table to hold every regulatory citation, but how do I design the central table so that the 'citation' column can have , or , or any number of other combinations?
I usually try to keep auditing info in a single table.
In order to do this, I go something like this:
TABLE: Audit
**Id** (PK)
**EntityClass** (the Class, or type, or whatever you want to identify your entities by)
**EntityId** (the id of the entity in its own table)
**PropertyChanged** (the name of the property of the entity that changed)
**OldValue** (the old value of the property)
**NewValue** (the revised value of the property)
**TimeStamp** (moment of the revision)
**RevisionType** (transaction type: Insert, Update, Delete)
This is the simplest schema; you can build on it with additional columns if you wish.
Hope this helps. Cheers!
In this example, I'm assuming, since you refer to a specific number of citations, there is -- or can be -- a taxonomic table holding 60 definitions or references, one for each kind of citation.
The Audits table contains the relevant info about each audit. I'm guessing most of this, but note there is no reference to any citation.
create table Audits(
ID int identity( 1, 1 ),
Started date,
Completed date,
CustomerID int,
AuditorID int, -- More than one possible auditor? Normalize.
VerifierID int,
Details ...,
constraint PK_Audits primary key( ID ),
constraint FK_Audits_Customer foreign key( CustomerID )
references Customers( ID ),
constraint FK_Audits_Auditor foreign key( AuditorID )
references Auditors( ID ),
constraint FK_Audits_Verifier foreign key( VerifierID )
references Auditors( ID ),
constraint CK_Audits_Auditor_Verifier check( AuditorID <> VerifierID )
);
The AuditCitations table contains each citation for each audit, one entry for each citation. Note that the PK will prevent the same audit from having more than one reference to the same citation (if, of course, that is your rule).
create table AuditCitations(
AuditID int,
CitID int,
Details ...,
constraint FK_AuditCitations_Audit foreign key( AuditID )
references Audits( ID ),
constraint FK_AuditCitations_Citation foreign key( CitID )
references Citations( ID ),
constraint PK_AuditCitations primary key( AuditID, CitID )
);
A citation may well have its own auditor and verifier/checker or just about anything that applies to the particular citation. This example mainly just shows the relationship between the two tables.
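To make the one-row-per-citation idea concrete, here is a minimal sketch using an in-memory SQLite database from Python in place of the SQL Server syntax above. The Code column, the sample rows, and the helper functions are hypothetical, and the Audits table is cut down to two columns. It shows an audit with several citations, an audit with none, a lookup in both directions, and the composite primary key rejecting a duplicate citation for the same audit.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite-specific: FKs are off by default
conn.executescript("""
    CREATE TABLE Citations (
        ID   INTEGER PRIMARY KEY,
        Code TEXT NOT NULL            -- hypothetical label for the citation
    );
    CREATE TABLE Audits (
        ID      INTEGER PRIMARY KEY,
        Started TEXT
    );
    CREATE TABLE AuditCitations (
        AuditID INTEGER NOT NULL REFERENCES Audits (ID),
        CitID   INTEGER NOT NULL REFERENCES Citations (ID),
        PRIMARY KEY (AuditID, CitID)
    );
    INSERT INTO Citations (ID, Code) VALUES (1, 'C-1'), (2, 'C-2'), (3, 'C-3');
    INSERT INTO Audits (ID, Started) VALUES
        (10, '2020-01-01'), (11, '2020-02-01'), (12, '2020-03-01');
    INSERT INTO AuditCitations (AuditID, CitID) VALUES (10, 1), (10, 3), (11, 3);
""")


def citations_for(conn, audit_id):
    """All citation ids recorded for one audit (possibly none)."""
    rows = conn.execute(
        "SELECT CitID FROM AuditCitations WHERE AuditID = ? ORDER BY CitID",
        (audit_id,),
    )
    return [row[0] for row in rows]


def audits_citing(conn, cit_id):
    """All audit ids that recorded the given citation."""
    rows = conn.execute(
        "SELECT AuditID FROM AuditCitations WHERE CitID = ? ORDER BY AuditID",
        (cit_id,),
    )
    return [row[0] for row in rows]


def add_citation(conn, audit_id, cit_id):
    """Record a citation; return False if it violates a key constraint."""
    try:
        conn.execute(
            "INSERT INTO AuditCitations (AuditID, CitID) VALUES (?, ?)",
            (audit_id, cit_id),
        )
        return True
    except sqlite3.IntegrityError:
        return False
```

With this layout, 0 to 60 citations per audit is simply 0 to 60 rows in AuditCitations, so no column count has to change when the citation list grows.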
I have the following two tables in my Postgres database:
CREATE TABLE User (
Id serial UNIQUE NOT NULL,
Login varchar(80) UNIQUE NOT NULL,
PRIMARY KEY (Id,Login)
);
CREATE TABLE UserData (
Id serial PRIMARY KEY REFERENCES Users (Id),
Password varchar(255) NOT NULL
);
Say, I add a new user with INSERT INTO Users(Id, Login) VALUES(DEFAULT, 'John') and also want to add VALUES(id, 'john1980') in UserData where id is John's new id.
How do I get that id? Running a query for something just freshly created seems superfluous. I have multiple such situations across the database. Maybe my design is flawed in general?
(I'm obviously not storing passwords like that.)
1) Fix your design
CREATE TABLE usr (
usr_id serial PRIMARY KEY
,login text UNIQUE NOT NULL
);
CREATE TABLE userdata (
usr_id int PRIMARY KEY REFERENCES usr
,password text NOT NULL
);
Start by reading the manual about identifiers and key words.
user is a reserved word. Never use it as an identifier.
Use descriptive identifiers. id is useless.
Avoid mixed case identifiers.
serial is meant for a unique column that can be pk on its own. No need for a multicolumn pk.
The referencing column userdata.usr_id should not be a serial as well. Use a plain integer.
I am just using text instead of varchar(n); that's optional.
You might consider merging the two tables into one ...
2) Query to INSERT in both
The key is the RETURNING clause, available for INSERT, UPDATE, and DELETE, which returns values from the affected row immediately.
It is best used in a data-modifying CTE:
WITH ins1 AS (
INSERT INTO usr(login)
VALUES ('John') -- just omit default columns
RETURNING usr_id -- return automatically generated usr_id
)
INSERT INTO userdata (usr_id, password )
SELECT i.usr_id, 'john1980'
FROM ins1 i;
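For readers who want to try the RETURNING idea from application code, here is a hedged sketch against an in-memory SQLite database using Python's sqlite3 module. It is an analogue, not the PostgreSQL query above: SQLite has no data-modifying CTEs, so the two INSERTs run as separate statements inside one transaction, and RETURNING is only used when the bundled SQLite is 3.35 or newer (otherwise the sketch falls back to the cursor's lastrowid). The create_user helper is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite-specific: FKs are off by default
conn.executescript("""
    CREATE TABLE usr (
        usr_id INTEGER PRIMARY KEY,       -- stand-in for serial
        login  TEXT UNIQUE NOT NULL
    );
    CREATE TABLE userdata (
        usr_id   INTEGER PRIMARY KEY REFERENCES usr (usr_id),
        password TEXT NOT NULL
    );
""")


def create_user(conn, login, password):
    """Insert into usr and userdata in one transaction; return the new usr_id."""
    with conn:  # commits on success, rolls back if either INSERT fails
        if sqlite3.sqlite_version_info >= (3, 35, 0):
            rows = conn.execute(
                "INSERT INTO usr (login) VALUES (?) RETURNING usr_id",
                (login,),
            ).fetchall()  # fetch everything so the statement is finished
            usr_id = rows[0][0]
        else:
            # Older SQLite: no RETURNING, use the driver's last row id instead.
            usr_id = conn.execute(
                "INSERT INTO usr (login) VALUES (?)", (login,)
            ).lastrowid
        conn.execute(
            "INSERT INTO userdata (usr_id, password) VALUES (?, ?)",
            (usr_id, password),
        )
    return usr_id


john_id = create_user(conn, "John", "john1980")
stored = conn.execute(
    "SELECT u.login, d.password FROM usr u JOIN userdata d USING (usr_id) "
    "WHERE u.usr_id = ?",
    (john_id,),
).fetchone()
```

Either way, the generated key comes back from the INSERT itself, so no follow-up SELECT on login is needed.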
You can consider using a trigger. The Id column of the newly inserted row can be accessed by the name NEW.Id.
References:
CREATE TRIGGER documentation on PostgreSQL Manual
Trigger Procedures
Hoping someone can shed some light on this: Do lookup tables need their own ID?
For example, say I have:
Table users: user_id, username
Table categories: category_id, category_name
Table users_categories: user_id, category_id
Would each row in "users_categories" need an additional ID field? What would the primary key of said table be? Thanks.
You have a choice. The primary key can be either:
A new, otherwise meaningless INTEGER column.
A key made up of both user_id and category_id.
I prefer the first solution but I think you'll find a majority of programmers here prefer the second.
You could create a composite key that uses both keys.
Normally, if there is no suitable single-column key in a table, you can create a composite key made up of two or more fields.
For example:
CREATE TABLE topic_replies (
topic_id int unsigned not null,
id int unsigned not null auto_increment,
user_id int unsigned not null,
message text not null,
PRIMARY KEY(topic_id, id));
Therefore, in your case, you could add the primary key as follows:
ALTER TABLE users_categories ADD PRIMARY KEY (user_id, category_id);
Then, whenever you want to reference a certain row, all you need is to pass the two key values from your other tables. To link them, however, each column needs to be declared as a foreign key:
ALTER TABLE users_categories ADD CONSTRAINT fk_1 FOREIGN KEY (category_id) REFERENCES categories (category_id);
If you want to create a new primary key column in your users_categories table instead, that is an option. Just know that it's not always necessary.
If your users_categories table has a unique primary key over (user_id, category_id), then - no, not necessarily.
Only if you
want to refer to single rows of that table from someplace else easily
have more than one equal user_id, category_id combination
you could benefit from a separate ID field.
Every table needs a primary key and unique ID in SQL no matter what. Just make it users_categories_id, you technically never have to use it but it has to be there.