Check if value of referenced row matches value in current row - sql

See the following database structure:
users
+----+----------+
| id | username |
+----+----------+
| 1  | User1    |
| 2  | User2    |
+----+----------+

tags
+----+---------+------+
| id | user_id | tag  |
+----+---------+------+
| 1  | 1       | tech |
| 2  | 1       | news |
+----+---------+------+

messages
+----+---------+---------+-----+
| id | user_id | message | tag |
+----+---------+---------+-----+
| 1  | 1       | Test1   | 1   |
| 2  | 2       | Test2   | 1   |
+----+---------+---------+-----+
tags.user_id and messages.user_id both reference users.id. messages.tag references tags.id.
Each user has tags (rows in tags where tags.user_id = users.id) and messages (rows in messages where messages.user_id = users.id).
The problem is that any tag can be "attached" to the message, instead of only tags that are owned by the user. So I need an extra restriction that ensures that the tag referenced in messages.tag not only exists (foreign key restriction), but is also owned by the same user as the message itself (messages.user_id = tags.user_id).
I have not found a way to achieve this restriction yet, which is why I'm asking for help.
python: 3.8.10
sqlite3.version: 2.6.0
sqlite3.sqlite_version: 3.31.1

From the manual, creating a composite FK in SQLite 3 looks like this:
CREATE TABLE parent(a PRIMARY KEY, c, d, e, f);
CREATE UNIQUE INDEX i1 ON parent(c, d);
CREATE TABLE child3(j, k, FOREIGN KEY(j, k) REFERENCES parent(c, d));
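Applied to the tables in the question, that pattern might look like the sketch below, run through Python's sqlite3 module since that is the asker's environment. It assumes tags gets a UNIQUE(id, user_id) constraint to act as the parent key, so that messages can declare FOREIGN KEY(tag, user_id) REFERENCES tags(id, user_id); the make_db and try_insert helpers are hypothetical names for illustration. SQLite only enforces foreign keys when PRAGMA foreign_keys = ON is set on the connection.

```python
import sqlite3

# The composite FK (tag, user_id) -> tags(id, user_id) means a message's tag
# must exist AND belong to the same user as the message.
SCHEMA = """
CREATE TABLE users (
    id       INTEGER PRIMARY KEY,
    username TEXT NOT NULL
);
CREATE TABLE tags (
    id      INTEGER PRIMARY KEY,
    user_id INTEGER NOT NULL REFERENCES users(id),
    tag     TEXT NOT NULL,
    UNIQUE (id, user_id)  -- parent key required by the composite FK below
);
CREATE TABLE messages (
    id      INTEGER PRIMARY KEY,
    user_id INTEGER NOT NULL REFERENCES users(id),
    message TEXT NOT NULL,
    tag     INTEGER,
    FOREIGN KEY (tag, user_id) REFERENCES tags(id, user_id)
);
"""

def make_db():
    """Hypothetical helper: in-memory DB with the question's users and tags."""
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # FKs are off by default in SQLite
    conn.executescript(SCHEMA)
    conn.executemany("INSERT INTO users VALUES (?, ?)", [(1, "User1"), (2, "User2")])
    conn.executemany("INSERT INTO tags VALUES (?, ?, ?)", [(1, 1, "tech"), (2, 1, "news")])
    return conn

def try_insert(conn, msg_id, user_id, text, tag):
    """Return True if the message was accepted, False if a constraint rejected it."""
    try:
        conn.execute("INSERT INTO messages VALUES (?, ?, ?, ?)", (msg_id, user_id, text, tag))
        return True
    except sqlite3.IntegrityError:
        return False

conn = make_db()
print(try_insert(conn, 1, 1, "Test1", 1))  # True: User1 uses their own tag
print(try_insert(conn, 2, 2, "Test2", 1))  # False: tag 1 belongs to User1
```

A message with a NULL tag is still accepted, because SQLite does not check a composite foreign key when any of its child columns is NULL.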

Related

select values or NULL if it does not exist

I have a table user with columns id (primary key) and name (unique constraint) containing:
+----+------+
| id | name |
+----+------+
| 1 | u1 |
| 2 | u2 |
| 3 | u3 |
+----+------+
I would like to make a query, something like SELECT id, name FROM user WHERE name IN (?,?) with parameters ("u2", "u4"), that I can reuse in a CTE, but which returns:
+------+------+
| id | name |
+------+------+
| 2 | u2 |
| NULL | u4 |
+------+------+
(edit: s/NULL/u4).
Is it possible? I'm interested in MariaDB and PostgreSQL.
One use case would be to insert the resulting ids into another table where NULL is not accepted, so that a single request both inserts the ids and rejects invalid names (no need for two separate requests plus application code).
There are different ways. An easy one in PostgreSQL could be:
select u.id, a."name"
from unnest(ARRAY['u2', 'u4']::text[]) AS a("name")
LEFT JOIN "user" AS u ON u."name" = a."name";
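SQLite has no unnest, but the same LEFT JOIN idea can be sketched with a CTE built from a VALUES list. The example below is a minimal sketch in SQLite via Python's sqlite3, not MariaDB or PostgreSQL, so check your engine's support for VALUES inside a CTE; the picks table and the wanted_cte, lookup and insert_ids helpers are hypothetical. It also covers the insert use case: an unknown name produces a NULL id, which violates the NOT NULL column, and SQLite aborts the whole statement.

```python
import sqlite3

def wanted_cte(n):
    """CTE with one VALUES row per requested name, e.g. VALUES (?), (?)."""
    if n < 1:
        raise ValueError("need at least one name")
    return "WITH wanted(name) AS (VALUES " + ", ".join(["(?)"] * n) + ") "

def lookup(conn, names):
    """Return (id, name) for every requested name; id is None for unknown names."""
    sql = wanted_cte(len(names)) + (
        "SELECT u.id, wanted.name FROM wanted "
        'LEFT JOIN "user" AS u ON u.name = wanted.name '
        "ORDER BY wanted.name"
    )
    return conn.execute(sql, names).fetchall()

def insert_ids(conn, names):
    """Insert the ids of names into picks in a single statement.

    An unknown name yields a NULL id, which violates picks.user_id NOT NULL,
    so SQLite aborts the whole statement and nothing is inserted.
    """
    sql = wanted_cte(len(names)) + (
        "INSERT INTO picks (user_id) SELECT u.id FROM wanted "
        'LEFT JOIN "user" AS u ON u.name = wanted.name'
    )
    try:
        conn.execute(sql, names)
        return True
    except sqlite3.IntegrityError:
        return False

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE "user" (id INTEGER PRIMARY KEY, name TEXT NOT NULL UNIQUE);
CREATE TABLE picks (user_id INTEGER NOT NULL);
INSERT INTO "user" VALUES (1, 'u1'), (2, 'u2'), (3, 'u3');
""")
print(lookup(conn, ["u2", "u4"]))      # [(2, 'u2'), (None, 'u4')]
print(insert_ids(conn, ["u2", "u4"]))  # False: 'u4' has no id, nothing inserted
```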

How should I design a table where a row can have different columns depending on the type of row?

I'm planning to use the Reddit API and store my saved posts in a database. The saves can be of two types, Comments or Posts. Both have a few common columns (author, score, subreddit, etc.) and a few columns unique to each category:
comment - body_text, comment_id, parent_id, etc.
posts - selftext, link_url, is_video, etc.
I decided to split the two categories into their own tables, a Comments table and a Posts table, but I don't know how to link these tables to the master table "saves".
My current solution is to have a kind column for the type of save; the comment_id and post_id columns link the save to its row in the corresponding table. However, this feels messy and a bit cumbersome. A save must have either a comment_id or a post_id (not both, and not neither), and I also have to enforce this constraint.
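For the current design, the "exactly one of comment_id or post_id" rule does not have to live in application code; a CHECK constraint can enforce it. Below is a minimal SQLite sketch via Python's sqlite3, trimmed to the relevant columns (title, post_url and author omitted), with a hypothetical save helper. It also assumes kind should agree with whichever id is set.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE posts    (id INTEGER PRIMARY KEY, selftext TEXT);
CREATE TABLE comments (id INTEGER PRIMARY KEY, body_html TEXT);
CREATE TABLE saves (
    id         INTEGER PRIMARY KEY,
    kind       TEXT NOT NULL,
    comment_id INTEGER REFERENCES comments(id),
    post_id    INTEGER REFERENCES posts(id),
    -- exactly one of comment_id / post_id is set, and it must agree with kind
    CHECK (
        (kind = 'comment' AND comment_id IS NOT NULL AND post_id IS NULL) OR
        (kind = 'post'    AND post_id IS NOT NULL AND comment_id IS NULL)
    )
);
INSERT INTO posts VALUES (1, NULL);
INSERT INTO comments VALUES (1, 'comment text of variable length');
""")

def save(kind, comment_id, post_id):
    """Hypothetical helper: True if the row passes the constraints, else False."""
    try:
        conn.execute(
            "INSERT INTO saves (kind, comment_id, post_id) VALUES (?, ?, ?)",
            (kind, comment_id, post_id),
        )
        return True
    except sqlite3.IntegrityError:
        return False

print(save("comment", 1, None))     # True
print(save("post", None, 1))        # True
print(save("post", 1, 1))           # False: both ids set
print(save("comment", None, None))  # False: neither id set
```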
Saves table:
+----+---------+-------+---------------------------------------------+---------+------------+---------+
| ID | Kind | title | post_url | author | comment_id | post_id |
+----+---------+-------+---------------------------------------------+---------+------------+---------+
| 1 | comment | abc | https://redd.i/redditpostid/redditcommentid | FusionX | 1 | NULL |
| 2 | post | xyz | https://redd.i/redditpostid | XnoisuF | NULL | 1 |
+----+---------+-------+---------------------------------------------+---------+------------+---------+
Post table:
+----+---------+-------------------------------------------+-----------------------+--------------+--------------+
| ID | is_self | selftext | post_url | num_comments | thumbnail |
+----+---------+-------------------------------------------+-----------------------+--------------+--------------+
| 1 | no | NULL | i.imgur.com/xyz.jpg | 1020 | someimageurl |
| 2 | yes | "some random selftext of variable length" | redd.it/redditpostid/ | 10 | |
+----+---------+-------------------------------------------+-----------------------+--------------+--------------+
Comment table:
+----+---------------------------------+---------------------+--------------------+
| ID | body_html | reddit_comment_id | reddit_parent_id |
+----+---------------------------------+---------------------+--------------------+
| 1 | comment text of variable length | <reddit comment id> | <reddit parent id> |
+----+---------------------------------+---------------------+--------------------+
(Reddit IDs are different from my tables' own IDs and are only relevant on Reddit's end.)
Is there a better way to design this database?
I think you should move the owning side of the relationship to the two other tables.
So instead of having comment_id and post_id columns in the saves table, add a saves_id column to the post table and the comment table.
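A sketch of the layout this answer proposes, in SQLite via Python's sqlite3, with columns trimmed and sample values taken from the question's tables: posts and comments each carry a saves_id, and a UNIQUE constraint on that column keeps each link one-to-one. This layout alone does not stop a save from having both a post row and a comment row, or neither; here the kind is derived with LEFT JOINs rather than stored.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE saves (
    id     INTEGER PRIMARY KEY,
    title  TEXT,
    author TEXT
);
-- Each child row owns the link back to its save; UNIQUE keeps it one-to-one.
CREATE TABLE posts (
    id       INTEGER PRIMARY KEY,
    saves_id INTEGER NOT NULL UNIQUE REFERENCES saves(id),
    selftext TEXT
);
CREATE TABLE comments (
    id        INTEGER PRIMARY KEY,
    saves_id  INTEGER NOT NULL UNIQUE REFERENCES saves(id),
    body_html TEXT
);
INSERT INTO saves VALUES (1, 'abc', 'FusionX'), (2, 'xyz', 'XnoisuF');
INSERT INTO comments (saves_id, body_html) VALUES (1, 'comment text of variable length');
INSERT INTO posts (saves_id, selftext) VALUES (2, NULL);
""")

# The kind follows from which child table has a matching row.
rows = conn.execute("""
    SELECT s.id, s.title,
           CASE WHEN c.id IS NOT NULL THEN 'comment'
                WHEN p.id IS NOT NULL THEN 'post' END AS kind
    FROM saves AS s
    LEFT JOIN comments AS c ON c.saves_id = s.id
    LEFT JOIN posts    AS p ON p.saves_id = s.id
    ORDER BY s.id
""").fetchall()
print(rows)  # [(1, 'abc', 'comment'), (2, 'xyz', 'post')]
```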

Which normal form or other formal rule does this database design choice violate?

The project I'm working on is an application that lets you design data entry forms and automagically generates a schema in an underlying PostgreSQL database to persist them, as well as the browsing and editing UI.
The use case I've encountered this with is a store back-office database, but the app itself intends to be somewhat universal. The administrator creates the following entry forms with the given fields:
Customers
  name (text box)
Items
  name (text box)
  stock (number field)
Order
  customer (combo box selecting a customer)
  order lines (a grid showing order lines)
OrderLine
  item (combo box selecting an item)
  count (number field)
When all this is done, the resulting database schema will be equivalent to this:
create table Customers(id serial primary key,
                       name varchar);
create table Items(id serial primary key,
                   name varchar,
                   stock integer);
create table Orders(id serial primary key);
create table OrderLines(id serial primary key,
                        count integer);
create table Links(id serial primary key,
                   fk1 integer references Customers(id),
                   fk2 integer references Items(id),
                   fk3 integer references Orders(id),
                   fk4 integer references OrderLines(id));
Links is a special table that stores all the relationships between entities: every row (usually) has two of the foreign keys set to a value and the rest set to NULL. Whenever a new entry form is added to the application instance, a new foreign key referencing that form's table is added to Links.
So, suppose our shop stocks some widgets, gizmos, and thingeys. A customer named Adam orders two widgets and three gizmos, and Betty orders four gizmos and five thingeys. The database will contain the following data:
Customers
/----+-------\
| ID | NAME |
| 1 | Adam |
| 2 | Betty |
\----+-------/
Items
/----+---------+-------\
| ID | NAME | STOCK |
| 1 | widget | 123 |
| 2 | gizmo | 456 |
| 3 | thingey | 789 |
\----+---------+-------/
Orders
/----\
| ID |
| 1 |
| 2 |
\----/
OrderLines
/----+-------\
| ID | COUNT |
| 1 | 2 |
| 2 | 3 |
| 3 | 4 |
| 4 | 5 |
\----+-------/
Links
/----+------+------+------+------\
| ID | FK1 | FK2 | FK3 | FK4 |
| 1 | 1 | NULL | 1 | NULL |
| 2 | 2 | NULL | 2 | NULL |
| 3 | NULL | NULL | 1 | 1 |
| 4 | NULL | NULL | 1 | 2 |
| 5 | NULL | NULL | 2 | 3 |
| 6 | NULL | NULL | 2 | 4 |
| 7 | NULL | 1 | NULL | 1 |
| 8 | NULL | 2 | NULL | 2 |
| 9 | NULL | 2 | NULL | 3 |
| 10 | NULL | 3 | NULL | 4 |
\----+------+------+------+------/
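To make "a join through this thing" concrete, here is a hedged sketch in SQLite via Python's sqlite3 (serial replaced by INTEGER PRIMARY KEY, audit and soft-deletion timestamps left out) that loads the sample data above and lists each customer's ordered items. Every hop between two entities is another self-join of Links, filtered on which foreign key columns are set.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Customers  (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE Items      (id INTEGER PRIMARY KEY, name TEXT, stock INTEGER);
CREATE TABLE Orders     (id INTEGER PRIMARY KEY);
CREATE TABLE OrderLines (id INTEGER PRIMARY KEY, count INTEGER);
CREATE TABLE Links (
    id  INTEGER PRIMARY KEY,
    fk1 INTEGER REFERENCES Customers(id),
    fk2 INTEGER REFERENCES Items(id),
    fk3 INTEGER REFERENCES Orders(id),
    fk4 INTEGER REFERENCES OrderLines(id)
);
INSERT INTO Customers  VALUES (1, 'Adam'), (2, 'Betty');
INSERT INTO Items      VALUES (1, 'widget', 123), (2, 'gizmo', 456), (3, 'thingey', 789);
INSERT INTO Orders     VALUES (1), (2);
INSERT INTO OrderLines VALUES (1, 2), (2, 3), (3, 4), (4, 5);
INSERT INTO Links (fk1, fk2, fk3, fk4) VALUES
    (1, NULL, 1, NULL), (2, NULL, 2, NULL),      -- customer -> order
    (NULL, NULL, 1, 1), (NULL, NULL, 1, 2),
    (NULL, NULL, 2, 3), (NULL, NULL, 2, 4),      -- order -> order line
    (NULL, 1, NULL, 1), (NULL, 2, NULL, 2),
    (NULL, 2, NULL, 3), (NULL, 3, NULL, 4);      -- item -> order line
""")

rows = conn.execute("""
    SELECT c.name, i.name, ol.count
    FROM Customers AS c
    JOIN Links AS lc ON lc.fk1 = c.id   AND lc.fk3 IS NOT NULL  -- customer -> order
    JOIN Links AS lo ON lo.fk3 = lc.fk3 AND lo.fk4 IS NOT NULL  -- order -> order line
    JOIN OrderLines AS ol ON ol.id = lo.fk4
    JOIN Links AS li ON li.fk4 = ol.id  AND li.fk2 IS NOT NULL  -- order line -> item
    JOIN Items AS i ON i.id = li.fk2
    ORDER BY c.name, ol.id
""").fetchall()
print(rows)
# [('Adam', 'widget', 2), ('Adam', 'gizmo', 3), ('Betty', 'gizmo', 4), ('Betty', 'thingey', 5)]
```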
(The tables also contain a bunch of timestamps for auditing and soft deletion, but I don't think they're relevant here; they just make writing the SQL by the administrator that much messier. The management app is also used to implement a bunch of different use cases, but they're mostly data entry, master-detail views, and either scalar fields or selection boxes.)
Whenever I've had to write a join through this thing, I've grumbled about it to my coworker, who replied, "well, using separate tables for each relationship is one way to do it, this is another..." Leaving aside the (to me) obvious ugliness of the above and the practical issues, I also have a nagging feeling that this must violate some normal form, but it's been a while since college and I'm struggling to figure out which criteria apply here.
Is there something stronger than "well, that's just your opinion" that I can use when critiquing this design?

A good SQL Structure for grouped comments with replies

I have two tables, News and Images. Both can have comments, so I decided to try making a generic comments table. Comments can also have replies. I came up with two possible methods, but I don't know which one to choose in terms of good practice or performance.
Method 1 (which I am using):
News:
| ID | CommentGroup | Content | ...etc
Images:
| ID | CommentGroup | Url | ...etc
Consider the following image:
| 14 | 22 | http://image.gif | ...etc
where the comments could be:
|UserA:
| Comment1
|
|--|UserB -> UserA:
| Comment2
|
|---|UserC -> UserB:
| | Comment4
|
|UserD -> UserA:
| Comment3
Resulting Comments:
| ID | Group | ReplyGroup | Replied | Content | User |
| 13 | 22 | NULL | 1 | Comment1 | UserA |
| 17 | 22 | 13 | 1 | Comment2 | UserB |
| 11 | 22 | 13 | NULL | Comment3 | UserD |
| 15 | 22 | 17 | NULL | Comment4 | UserC |
If a new News item is created after Image 14 has been commented on, I determine its comment group number by taking the maximum of the Group column (22) and adding 1 (23).
News:
| ID | CommentGroup | Content | ...etc
| 14 | 23 | A new | ...etc
Comments:
| ID | Group | ReplyGroup | Replied | Content | User |
| 22 | 23 | NULL | 1 | Comment1 | UserA |
| 30 | 23 | 22 | NULL | Comment2 | UserB |
Method 2
taken from this question:
News:
| ID | Content | ...etc
Images:
| ID | Url | ...etc
Comments:
| ID | Group | Type | ReplyGroup | Replied | Content | User |
where Type distinguishes between a News group and an Images group.
Which do you think is better?
Or what other solutions are possible?
Thanks.
As a basic implementation, I would initially treat everything as 'content', grouping the common attributes in one table.
CREATE TABLE CONTENT (
id int primary key,
created_on datetime,
created_by int
);
Then have more specific tables for each type of content, e.g.
CREATE TABLE NEWS (
content_id int primary key foreign key references content(id),
article nvarchar(max)
);
and
CREATE TABLE IMAGES (
content_id int primary key foreign key references content(id),
url varchar(1000)
);
and
CREATE TABLE COMMENTS (
content_id int primary key foreign key references content(id),
parent_id int foreign key references content(id),
root_id int foreign key references content(id),
level int,
text nvarchar(2000)
);
Each of these would have a 1:1 relationship with CONTENT.
COMMENTS would then reference other content 'directly' via the parent_id, the reference being either an image, news or indeed another comment.
The root_id in COMMENTS would reference the actual image or news content (as would the parent_id of all 'top level' comments). This adds the overhead of maintaining root_id (which shouldn't be too difficult), but makes it easier to select all comments for a given piece of content.
e.g.
-- get the article
SELECT *
FROM content
JOIN news
ON news.content_id = content.id
JOIN users
ON users.id = content.created_by
WHERE content.id = #news_id
-- get the comments
SELECT *
FROM content
JOIN comments
ON comments.content_id = content.id
JOIN users
ON users.id = content.created_by
WHERE comments.root_id = #news_id
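A minimal sketch of how root_id and level could be maintained on insert and then used, in SQLite via Python's sqlite3 rather than SQL Server. It assumes top-level comments have level 1, omits the NEWS table and created_on, and uses hypothetical new_content/add_comment helpers with the comment tree from the question; the uploader name UserX is made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE content (id INTEGER PRIMARY KEY, created_by TEXT);
CREATE TABLE images  (content_id INTEGER PRIMARY KEY REFERENCES content(id), url TEXT);
CREATE TABLE comments (
    content_id INTEGER PRIMARY KEY REFERENCES content(id),
    parent_id  INTEGER NOT NULL REFERENCES content(id),
    root_id    INTEGER NOT NULL REFERENCES content(id),
    level      INTEGER NOT NULL,
    text       TEXT
);
""")

def new_content(user):
    """Create the shared CONTENT row and return its id."""
    return conn.execute("INSERT INTO content (created_by) VALUES (?)", (user,)).lastrowid

def add_comment(user, parent_id, text):
    """Insert a comment under parent_id, deriving root_id and level from the parent."""
    parent = conn.execute(
        "SELECT root_id, level FROM comments WHERE content_id = ?", (parent_id,)
    ).fetchone()
    if parent is None:
        # Parent is an image/news item: it is the root; this is a top-level comment.
        root_id, level = parent_id, 1
    else:
        # Parent is a comment: inherit its root and go one level deeper.
        root_id, level = parent[0], parent[1] + 1
    cid = new_content(user)
    conn.execute(
        "INSERT INTO comments VALUES (?, ?, ?, ?, ?)", (cid, parent_id, root_id, level, text)
    )
    return cid

image = new_content("UserX")  # hypothetical uploader
conn.execute("INSERT INTO images VALUES (?, ?)", (image, "http://image.gif"))
c1 = add_comment("UserA", image, "Comment1")
c2 = add_comment("UserB", c1, "Comment2")
c3 = add_comment("UserD", c1, "Comment3")
c4 = add_comment("UserC", c2, "Comment4")

# Every comment on the image, at any depth, with one non-recursive query on root_id.
rows = conn.execute(
    "SELECT c.text, c.level, content.created_by FROM comments AS c "
    "JOIN content ON content.id = c.content_id "
    "WHERE c.root_id = ? ORDER BY c.content_id",
    (image,),
).fetchall()
print(rows)
# [('Comment1', 1, 'UserA'), ('Comment2', 2, 'UserB'), ('Comment3', 2, 'UserD'), ('Comment4', 3, 'UserC')]
```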

SQL relationships and best practices

Suppose I have a User table and a Roles table.
What is the usual practice/pattern for adding the relationship?
Do I create an extra column in the User table for the RoleID or do people usually create a Relationships table like so:
Relationships Table
RelationshipID | UserID | RoleID |... any other relations a user might have
For the last bit: as a user, you might create an endless number of different types of things that all need to be related to you. Do you instead add the relationship to each individual table created for each individual thing? For example:
Pages Table
PageID | Title | Content | Author (UserID)
and so another table would also be similar to this:
Comments Table
CommentID | Comment | Author (UserID)
In this case, I would need to expand upon the Relationships table if I were to do it that way:
Relationships Table
RelationshipID | UserID | RoleID | CommentID
and I'd probably only want to fill in the UserID and CommentID, as this relationship is not for the Roles (that is governed by another entry). So, for example, the values for a comment relationship might be:
AUTO | 2 | NULL | 16
I could imagine a multi-purpose Revisions table being handy...
Revisions Table
RevisionID | DateCreated | UserID | ActionTypeID | ModelTypeID | Status | RelatedItemID
---------------------------------------------------------------------------------------
1 | <Now> | 3 | 4 (Delete) | 6 (Page) | TRUE | 38
2 | <Now> | 3 | 4 (Delete) | 5 (Comment) | TRUE | 10
3 | <Now> | 3 | 1 (Add) | 5 (Comment) | FALSE | 10
but not for a general Relationships table...
Does this sound correct?
Edit, following the comments:
Commenters stated that a relationships table should be created because this is a many-to-many relationship (in data-model terms).
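A minimal sketch of what a many-to-many relationships table typically looks like for users and roles, in SQLite via Python's sqlite3: a junction table with one row per (user, role) pair, while the page author, being one-to-many, stays a plain foreign key column as in the Pages table above. The names user_roles and author_id and the sample user name are hypothetical; IDs 7 and 23 are borrowed from the example above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE roles (id INTEGER PRIMARY KEY, name TEXT NOT NULL UNIQUE);
-- Junction table: one row per (user, role) pair; the composite PK blocks duplicates.
CREATE TABLE user_roles (
    user_id INTEGER NOT NULL REFERENCES users(id),
    role_id INTEGER NOT NULL REFERENCES roles(id),
    PRIMARY KEY (user_id, role_id)
);
-- One-to-many: each page has exactly one author, so a plain FK column suffices.
CREATE TABLE pages (
    id        INTEGER PRIMARY KEY,
    title     TEXT,
    author_id INTEGER NOT NULL REFERENCES users(id)
);
INSERT INTO users VALUES (7, 'alice');
INSERT INTO roles VALUES (1, 'editor'), (2, 'admin');
INSERT INTO user_roles VALUES (7, 1), (7, 2);
INSERT INTO pages VALUES (23, 'Home', 7);
""")

# All roles held by user 7, via the junction table.
roles = conn.execute("""
    SELECT r.name FROM roles AS r
    JOIN user_roles AS ur ON ur.role_id = r.id
    WHERE ur.user_id = ? ORDER BY r.name
""", (7,)).fetchall()
print(roles)  # [('admin',), ('editor',)]
```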
So let's take my previous example of my possible relationship table:
Relationships Table Old
RelationshipID | UserID | RoleID | CommentID... etc
Should it actually be something more like this:
Relationships Table New
RelationshipID | ItemID | LinkID | ItemType | LinkType | Status
---------------------------------------------------------------------------------
1 | 23(PageID) | 7(UserID) | ("Page") | ("User") | TRUE
2 | 22(CommentID) | 7(UserID) | ("Comment") | ("User") | TRUE
3 | 22(CommentID) | 23(PageID) | ("Comment") | ("Page") | TRUE