A simplified version of Twitter. Understanding many-to-many relationships between tables in the database - sql

I am reading this article about a Twitter-like application. The type of storage where tweets, users, likes, etc. will be stored is a relational database. The database scheme is described here and drawn here.
As an Android developer, I coded my sample using SQLite. This is how I would code it:
create table users (_id integer primary key, username text unique, first_name text, last_name text);
create table tweets (_id integer primary key, content text, created_at integer, user_id integer, foreign key(user_id) references users(_id));
create table connections (_id integer primary key, follower_id integer, followee_id integer, created_at integer, foreign key(follower_id) references users(_id), foreign key (followee_id) references users(_id));
create table favorites (_id integer primary key, user_id integer, tweet_id integer, foreign key (user_id) references users(_id), foreign key (tweet_id) references tweets(_id));
Now let's insert some data.
users:
insert into users values (1, 'user1', 'Lorem', 'Ipsum');
insert into users values (2, 'user2', 'Dolor', 'Sit');
insert into users values (3, 'user3', 'Foo', 'Bar');
insert into users values (4, 'user4', 'Qwerty', 'Trewq');
some tweets:
insert into tweets values(10, '1 Tweet from user1', 1100, 1);
insert into tweets values(11, '2 Tweet from user1', 1101, 1);
insert into tweets values(12, '3 Tweet from user1', 1102, 1);
insert into tweets values(13, '4 Tweet from user1', 1103, 1);
insert into tweets values(14, '1 Tweet from user2', 1103, 2);
insert into tweets values(15, '2 Tweet from user2', 1103, 2);
insert into tweets values(16, '1 Tweet from user3', 1103, 3);
insert into tweets values(17, '2 Tweet from user3', 1103, 3);
insert into tweets values(18, '1 Tweet from user4', 1107, 4);
favorites (the same as likes):
insert into favorites values(1, 2, 11);
insert into favorites values(2, 3, 13);
insert into favorites values(3, 4, 15);
There is a question about the database scheme:
Do you think you could support with our database design the ability
to display a page for a given user with their latest tweets that were
favorited at least once?
Yes, this is my query:
sqlite> select favorites._id, tweets._id as tweet_row_id, tweets.content from favorites join tweets on tweets.user_id=1 and tweets._id = favorites.tweet_id order by tweets._id desc limit 1;
_id         tweet_row_id  content
----------  ------------  ------------------
2           13            4 Tweet from user1
Explanation:
The left dataset is the table favorites. The right dataset is the table tweets. I join the two datasets. Then tweets.user_id=1 and tweets._id = favorites.tweet_id is evaluated for each row of the resulting dataset as a boolean expression. If the result is true, the row is included. order by tweets._id desc is used to get the latest tweets (the greater tweets._id is, the newer the tweet is). limit is used to limit the number of rows. If the user has been using our Twitter-like app for years, we'll show the latest 10 or 20 tweets.
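One caveat with a plain join: a tweet favorited by several users comes back once per favorite, so limit would count duplicates. The following is a minimal sketch of an alternative, using Python's built-in sqlite3 module and an exists subquery so that each tweet is listed at most once. The function name latest_favorited, the reduced sample data, and the tie-break ordering are assumptions made for illustration and are not taken from the article.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
create table users (_id integer primary key, username text unique);
create table tweets (_id integer primary key, content text, created_at integer,
                     user_id integer references users(_id));
create table favorites (_id integer primary key,
                        user_id integer references users(_id),
                        tweet_id integer references tweets(_id));
insert into users values (1, 'user1'), (2, 'user2'), (3, 'user3');
insert into tweets values (10, '1 Tweet from user1', 1100, 1),
                          (11, '2 Tweet from user1', 1101, 1),
                          (13, '4 Tweet from user1', 1103, 1);
-- tweet 11 is favorited twice, tweet 13 once, tweet 10 never
insert into favorites values (1, 2, 11), (2, 3, 11), (3, 3, 13);
""")

def latest_favorited(conn, user_id, limit=10):
    """Return up to `limit` of the user's newest tweets that have at least one favorite."""
    return conn.execute(
        """
        select t._id, t.content
        from tweets t
        where t.user_id = ?
          and exists (select 1 from favorites f where f.tweet_id = t._id)
        order by t.created_at desc, t._id desc
        limit ?
        """,
        (user_id, limit),
    ).fetchall()

# Tweet 11 appears once even though it has two favorites.
print(latest_favorited(conn, 1))
```

Ordering by created_at rather than by _id is a design choice in this sketch: it does not rely on ids being handed out in chronological order.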
My questions.
Is there anything wrong with my database scheme? I omitted not null, unique, and other column constraints for simplicity.
Here the author of the original article says:
The first relation is addressed by sticking the user ID to each tweet.
This is possible because each tweet is created by exactly one user.
It’s a bit more complicated when it comes to following users and
favoriting tweets. The relationship there is many-to-many.
"The first relation" is users-tweets.
Why do we need a many-to-many here? In my scheme I only use a one-to-many.
Update 1

I recently posted an answer here, where the OP, like you in this question, was unsure about 1:n and n:m.
I assume that your final sentence is the actual question you have:
Why do we need a many-to-many here? In my scheme I only use a one-to-many
The relation user-tweets is 1:n...
Think in objects
user (id, name, ...)
tweet (id, author (FK on user), datetime, content, ...)
The like is an object with specific details of its own:
like (id, userid,tweetid,datetime,...)
For this you need a mapping table (you call it favorites)
There is a 1:n-relation from users to this mapping and a 1:n-relation from tweets to this mapping.
These two 1:n-relations form the m:n-relation together.
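Now each tweet can be liked by many users and each user can like many tweets, but one user should (probably) not like the same tweet twice (a unique key, or even a two-column PK, on userid and tweetid enforces this). You might also want a rule that the liking user is not the tweet's author (don't like your own tweets). Since the author is stored in the tweet table and a plain CHECK constraint only sees the columns of its own row, this rule typically needs a trigger or something similar.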
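A minimal sketch of both rules in SQLite, run through Python's sqlite3 module. The two-column primary key rejects a repeated like. The self-like rule is written as a trigger here because the author's id lives in the tweets table; the trigger name, its message, and the helper try_like are hypothetical and only serve as illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
create table users (_id integer primary key, username text unique);
create table tweets (_id integer primary key, content text,
                     user_id integer not null references users(_id));
-- mapping table: the two-column primary key makes a repeated like impossible
create table favorites (
    user_id    integer not null references users(_id),
    tweet_id   integer not null references tweets(_id),
    created_at integer,
    primary key (user_id, tweet_id)
);
-- hypothetical trigger: reject a like when the liker is the tweet's author
create trigger favorites_no_self_like before insert on favorites
begin
    select raise(abort, 'cannot like your own tweet')
    where exists (select 1 from tweets
                  where _id = new.tweet_id and user_id = new.user_id);
end;
insert into users values (1, 'user1'), (2, 'user2');
insert into tweets values (10, 'Tweet from user1', 1);
""")

def try_like(user_id, tweet_id):
    """Insert a like; return False if the primary key or the trigger rejects it."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute("insert into favorites (user_id, tweet_id) values (?, ?)",
                         (user_id, tweet_id))
        return True
    except sqlite3.IntegrityError:
        return False

print(try_like(2, 10))  # user2 likes user1's tweet for the first time
print(try_like(2, 10))  # the same like again is rejected by the primary key
print(try_like(1, 10))  # the author liking their own tweet is rejected by the trigger
```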
As a side note:
Is there anything wrong with my database scheme
You should never create constraints without naming them
CREATE TABLE Dummy
(
ID INT IDENTITY CONSTRAINT PK_Dummy PRIMARY KEY
,UserID INT NOT NULL CONSTRAINT FK_Dummy_UserID FOREIGN KEY REFERENCES User(id)
,...
)
If this database is ever installed on different systems, the unnamed constraints will get different (random) names on each of them, and future upgrade scripts will cause you serious pain...
UPDATE: example for the side note
In your comment you ask what this last sentence is about... Try this:
CREATE DATABASE testDB;
GO
USE testDB;
GO
CREATE TABLE testTbl1(ID INT IDENTITY PRIMARY KEY,SomeValue INT UNIQUE);
CREATE TABLE testTbl2(ID INT IDENTITY PRIMARY KEY,FKtoTbl1 INT NOT NULL FOREIGN KEY REFERENCES testTbl1(ID));
GO
CREATE TABLE testTbl3(ID INT IDENTITY CONSTRAINT PK_3 PRIMARY KEY,SomeValue INT CONSTRAINT UQ_3_SomeValue UNIQUE);
CREATE TABLE testTbl4(ID INT IDENTITY CONSTRAINT PK_4 PRIMARY KEY,FKtoTbl3 INT NOT NULL CONSTRAINT FK_4_FKtoTbl3 FOREIGN KEY REFERENCES testTbl3(ID));
GO
SELECT * FROM INFORMATION_SCHEMA.TABLE_CONSTRAINTS;
GO
USE master;
GO
DROP DATABASE testDB;
GO
One column in your result looks like this:
CONSTRAINT_NAME
------------------------------
PK__testTbl1__3214EC27ABEA2C0C
UQ__testTbl1__0E5C381C04C8AF66
PK__testTbl2__3214EC272784631C
FK__testTbl2__FKtoTb__1367E606
PK_3
UQ_3_SomeValue
PK_4
FK_4_FKtoTbl3
If this script is run twice, the names you defined yourself will stay the same. The unnamed constraints will get a different generated name on each run, like PK__testTbl1__3214EC27ABEA2C0C. Now imagine you need to create an upgrade script for several installed systems where one constraint has to be dropped or modified. How would you do this if you do not know its name?

Related

Securing values for other tables from enumerable table in SQL Server database

English is not my native language, so I might have misused words Enumerator and Enumerable in this context. Please get a feel for what I'm trying to say and correct me if I'm wrong.
I'm looking into not having tables for each enumerator I need in my database.
I "don't want" to add tables for (examples:) service duration type, user type, currency type, etc. and add relations for each of them.
Instead of a table for each of them which values will probably not change a lot, and for which I'd have to create relationships with other tables, I'm looking into having just 2 tables called Enumerator (eg: user type, currency...) and Enumerable (eg: for user type -> manager, ceo, delivery guy... and for currency -> euro, dollar, pound...).
Though here's the kicker. If I implement it like that, I'm losing the rigidity of the foreign key relationships, which currently guarantees that I can't accidentally insert a row in the users table with a user type that is actually a currency, a service duration type, or something else.
Is there another way to resolve the issue of having so many enumerators and enumerables with the benefit of having that rigidity of the foreign key and with the benefit of having all of them in just those 2 tables?
Best I can think of is to create a trigger for BEFORE UPDATE and BEFORE INSERT to check if (for example) the column type of user table is using the id of the enumerable table that belongs to the correct enumerator.
This is a short example in SQL
CREATE TABLE [dbo].[Enumerator]
(
[Id] INT NOT NULL PRIMARY KEY,
[Name] VARCHAR(50)
)
CREATE TABLE [dbo].[Enumerable]
(
[Id] INT NOT NULL PRIMARY KEY,
[EnumeratorId] INT NOT NULL FOREIGN KEY REFERENCES Enumerator(Id),
[Name] VARCHAR(50)
)
INSERT INTO Enumerator (Id, Name)
VALUES (1, 'UserType'),
(2, 'ServiceType');
INSERT INTO Enumerable (Id, EnumeratorId, Name) -- UserType
VALUES (1, 1, 'CEO'),
(2, 1, 'Manager'),
(3, 1, 'DeliveryGuy');
INSERT INTO Enumerable (Id, EnumeratorId, Name) -- ServiceDurationType
VALUES (4, 2, 'Daily'),
(5, 2, 'Weekly'),
(6, 2, 'Monthly');
CREATE TABLE [dbo].[User]
(
[Id] INT NOT NULL PRIMARY KEY IDENTITY (1,1),
[Type] INT NOT NULL FOREIGN KEY REFERENCES Enumerable(Id)
)
CREATE TABLE [dbo].[Service]
(
[Id] INT NOT NULL PRIMARY KEY IDENTITY (1,1),
[Type] INT NOT NULL FOREIGN KEY REFERENCES Enumerable(Id)
)
The questions are:
Is it viable to resolve enumerators and enumerables with 2 tables and with before update and before insert triggers, or is it more trouble than it's worth?
Is there a better way to resolve this other than using before update and before insert triggers?
Is there a better way to resolve enumerators and enumerables that is not using 2 tables and triggers, nor creating a table with relations for each of them?
I ask for your wisdom as I don't have one or more big projects behind me and I didn't get a chance to create a DB like this until now.
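The trigger idea described in the question could be sketched as follows. This is only an illustration in SQLite, run through Python's sqlite3 module; the question targets SQL Server, whose trigger syntax differs, so nothing here carries over verbatim. The table name users, the trigger names, and the helper run are hypothetical, and the UserType enumerator id (1) is hard-coded as an assumption.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
create table enumerator (id integer primary key, name text);
create table enumerable (id integer primary key,
                         enumerator_id integer not null references enumerator(id),
                         name text);
create table users (id integer primary key,
                    type integer not null references enumerable(id));
insert into enumerator values (1, 'UserType'), (2, 'ServiceDurationType');
insert into enumerable values (1, 1, 'CEO'), (2, 1, 'Manager'), (4, 2, 'Daily');

-- reject a users.type that does not belong to the 'UserType' enumerator (id 1)
create trigger users_type_ins before insert on users
begin
    select raise(abort, 'type is not a UserType')
    where not exists (select 1 from enumerable
                      where id = new.type and enumerator_id = 1);
end;
create trigger users_type_upd before update of type on users
begin
    select raise(abort, 'type is not a UserType')
    where not exists (select 1 from enumerable
                      where id = new.type and enumerator_id = 1);
end;
""")

def run(sql, params=()):
    """Execute one statement; return False when a trigger or constraint rejects it."""
    try:
        with conn:
            conn.execute(sql, params)
        return True
    except sqlite3.IntegrityError:
        return False

print(run("insert into users (id, type) values (?, ?)", (1, 2)))  # Manager is a UserType
print(run("insert into users (id, type) values (?, ?)", (2, 4)))  # Daily is not a UserType
print(run("update users set type = ? where id = ?", (4, 1)))      # same check on update
```

The sketch also shows the cost the question worries about: every referencing table needs its own pair of triggers, each tied to one enumerator id.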

Cannot create Access tables through SQL view

I'm trying to create 3 access tables in SQL view on Microsoft Access but whenever I try to execute it, I receive the following error. 'Syntax Error in CREATE TABLE statement'.
Please find my code below.
CREATE TABLE Book (
Book_ID int,
Book_Title varchar (30),
PRIMARY KEY (Book_ID)
);
CREATE TABLE Users (
User_ID int,
User_Name varchar (30),
PRIMARY KEY (User_ID)
);
CREATE TABLE Borrows (
User_ID int,
Book_ID int,
B_ID int,
PRIMARY KEY(B_ID),
FOREIGN KEY(User_ID) REFERENCES Users(User_ID),
FOREIGN KEY(Book_ID) REFERENCES Book(Book_ID)
);
INSERT INTO Book VALUES (101, 'The Hobbit'), (102, 'To Kill a Mockingbird');
INSERT INTO Users VALUES (1, 'Stephen'), (2, 'Tom'), (3,' Eric');
INSERT INTO Borrows VALUES (3, 102, 1), (1, 101, 2);
Appreciate any feedback I can get, have a good day.
Your first CREATE TABLE executed flawlessly from the query designer in Access 2010. However my preference is to include the PRIMARY KEY constraint as part of the field definition:
CREATE TABLE Book (
Book_ID int PRIMARY KEY,
Book_Title varchar (30)
);
That variation also executed successfully.
I suspect you have at least 2 issues to resolve:
Access does not allow you to execute more than one SQL statement at a time (as Heinzi and Albert mentioned). You must execute them one at a time.
In Access, INSERT ... VALUES can only be used to add one row at a time. Revise your inserts accordingly.
IOW, split the first one into 2 statements which you then execute individually:
-- INSERT INTO Book VALUES (101, 'The Hobbit'), (102, 'To Kill a Mockingbird');
INSERT INTO Book VALUES (101, 'The Hobbit');
INSERT INTO Book VALUES (102, 'To Kill a Mockingbird');
Then split and execute the remaining inserts similarly.
Your code example uses SQL Server (T-SQL) syntax, not MS Access syntax.
The syntax for Access' CREATE TABLE statement is documented here:
https://learn.microsoft.com/en-us/office/client-developer/access/desktop-database-reference/create-table-statement-microsoft-access-sql
The most obvious differences seem to be there is no varchar type and that PRIMARY KEY needs a constraint name if specified in an extra line. There might be more, see the article and its examples for details. I also suggest that you submit your statements one-by-one, instead of submitting a complete ;-separated batch; I'm not sure Access queries even support the latter.

Sqlite3 store list in column

I have some data I need to store in a SQLite database that looks like this
[
[user1,[python,java,javascript],21],
[user2,[csharp,python,c,java,php,sql],18],
[user3,[],52],
[user4,[python],73]
]
How do I store the list of programming languages for each user in sqlite3?
The standard solution is to have a table for each thing: users, languages, and a table that holds the many-to-many relationship between them, which in this case is users_languages. user and language are relatively large, variable-sized keys, so a common optimization is to introduce an artificial key, usually an auto-incrementing integer.
create table languages (
language text primary key
);
insert into languages values ('python'), ('java'), ('javascript'), ('csharp'), ('c'), ('php'), ('sql');
create table users (
user text primary key,
age tinyint not null
);
insert into users values ('user1', 21), ('user2', 18), ('user3', 52), ('user4', 73);
create table users_languages (
user text not null,
language text not null,
foreign key (user) references users (user),
foreign key (language) references languages (language),
unique(user, language)
);
insert into users_languages values ('user1', 'python'), ('user1', 'java'), ('user1', 'javascript');
...
-- list of languages for a given user (row per language)
select language from users_languages where user = '...';
-- list of languages for all users (row per user)
select user, group_concat(language) from users_languages group by 1;
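The artificial-key variant mentioned above could look roughly like this. It is a sketch using Python's sqlite3 module, where INTEGER PRIMARY KEY columns act as the integer keys; the column names (id, name, user_id, language_id) and the helper languages_of are assumptions chosen for illustration.

```python
import sqlite3

data = [("user1", ["python", "java", "javascript"], 21),
        ("user2", ["csharp", "python", "c", "java", "php", "sql"], 18),
        ("user3", [], 52),
        ("user4", ["python"], 73)]

conn = sqlite3.connect(":memory:")
conn.executescript("""
create table users (id integer primary key, name text unique not null, age integer not null);
create table languages (id integer primary key, name text unique not null);
create table users_languages (
    user_id     integer not null references users(id),
    language_id integer not null references languages(id),
    primary key (user_id, language_id)
);
""")

with conn:  # one transaction for the whole load
    for name, langs, age in data:
        user_id = conn.execute("insert into users (name, age) values (?, ?)",
                               (name, age)).lastrowid
        for lang in langs:
            # create the language on first use, then look up its integer key
            conn.execute("insert or ignore into languages (name) values (?)", (lang,))
            lang_id = conn.execute("select id from languages where name = ?",
                                   (lang,)).fetchone()[0]
            conn.execute("insert into users_languages values (?, ?)", (user_id, lang_id))

def languages_of(name):
    """Return the alphabetically sorted languages of one user as a Python list."""
    rows = conn.execute("""
        select l.name
        from users u
        join users_languages ul on ul.user_id = u.id
        join languages l on l.id = ul.language_id
        where u.name = ?
        order by l.name""", (name,))
    return [row[0] for row in rows]

print(languages_of("user1"))
print(languages_of("user3"))  # a user without languages yields an empty list
```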

Check if many-to-many relationship exists before insert or delete

I have 3 tables
For example:
Book
id
title
Tag
id
name
BookTag
book_id
tag_id
The goal is to disallow having a Book without a Tag, i.e. when I insert or delete data I need something at the database level to check that each Book has at least one Tag through the many-to-many relationship. If the validation fails, it should throw a constraint violation error or something similar. How should I implement that? Can it be done with a check constraint, or should I create a trigger, and if so, how?
please help me. thanks for your help in advance
You can enforce this at the pure database level by adding a foreign key in the book table that points back to a tag (any tag) in the book_tag table. As of now, your database model looks like:
create table book (
id int primary key not null,
title varchar(50)
);
create table tag (
id int primary key not null,
name varchar(50)
);
create table book_tag (
book_id int not null,
book_tag int not null,
primary key (book_id, book_tag)
);
Now, add the extra foreign key that points back to a tag:
alter table book add column a_tag int not null;
alter table book add constraint fk1 foreign key (id, a_tag)
references book_tag (book_id, book_tag) deferrable initially deferred;
Now when you insert a book, it can temporarily not have a tag, but only while the transaction hasn't finished yet. You need to insert a tag before committing. If you don't the constraint will fail, the transaction will rollback, and the insert won't happen.
Note: This requires the use of deferrable constraints (look at deferrable initially deferred), something that is part of the SQL Standard but seldom implemented. Fortunately, PostgreSQL supports it.
EDIT - Adding an example
Considering the previous modified tables you can try inserting a book without tags (will fail) and with tags (succeeding) as shown below:
insert into tag (id, name) values (10, 'classic');
insert into tag (id, name) values (12, 'action');
insert into tag (id, name) values (13, 'science fiction');
begin transaction;
insert into book (id, title, a_tag) values (1, 'Moby Dick', 123);
commit; -- fails
begin transaction;
insert into book (id, title, a_tag) values (2, 'Frankenstein', 456);
insert into book_tag (book_id, book_tag) values (2, 10);
insert into book_tag (book_id, book_tag) values (2, 13);
update book set a_tag = 10 where id = 2;
commit; -- succeeds
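SQLite also supports deferrable initially deferred on foreign keys, so the same idea can be tried without a PostgreSQL server. The following is a sketch using Python's sqlite3 module, not the answer's exact schema: the link column is called tag_id here, and the helper add_book and the placeholder value -1 are hypothetical.

```python
import sqlite3

# isolation_level=None puts the connection in autocommit mode,
# so the BEGIN/COMMIT/ROLLBACK statements below are issued explicitly.
conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
create table tag (id integer primary key, name text);
create table book_tag (
    book_id integer not null,
    tag_id  integer not null,
    primary key (book_id, tag_id)
);
create table book (
    id    integer primary key,
    title text,
    a_tag integer not null,
    foreign key (id, a_tag) references book_tag (book_id, tag_id)
        deferrable initially deferred
);
insert into tag values (10, 'classic'), (13, 'science fiction');
""")

def add_book(book_id, title, tag_ids):
    """Insert a book and its tags in one transaction; return False if the commit is rejected."""
    conn.execute("begin")
    try:
        a_tag = tag_ids[0] if tag_ids else -1  # -1: placeholder that matches no book_tag row
        conn.execute("insert into book (id, title, a_tag) values (?, ?, ?)",
                     (book_id, title, a_tag))
        for tag_id in tag_ids:
            conn.execute("insert into book_tag (book_id, tag_id) values (?, ?)",
                         (book_id, tag_id))
        conn.execute("commit")  # the deferred foreign key is checked here
        return True
    except sqlite3.IntegrityError:
        conn.execute("rollback")  # a failed commit leaves the transaction open
        return False

print(add_book(1, "Moby Dick", []))             # no tag, so the commit is rejected
print(add_book(2, "Frankenstein", [10, 13]))    # tagged, so the commit goes through
```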

How to deal with an id that needs to be matched to multiple ids in SQLite?

I'm new to databases, so I'll start by showing what I would do if I was using a simple table in a csv file. Presently, I'm building a Shiny (R) app to keep track of people taking part in studies. The idea is to make sure no one is doing more than one study at the same time, and that enough time has passed between studies.
A single table would look like something like this:
study_title  contact_person  tasks  first_name  last_name
MX9345-3     John Doe        OGTT   Michael     Smith
MX9345-3     John Doe        PVT    Michael     Smith
MX9345-3     John Doe        OGTT   Julia       Barnes
MX9345-3     John Doe        PVT    Julia       Barnes
...
...
So each study has a single contact person, but multiple tasks. It is possible other studies will use the same tasks.
Each task should have a description
Each person can be connected to multiple studies (the final database has timestamps to make sure this does not happen at the same time), and consequently repeat the same tasks.
The SQLite code could look something like this:
CREATE TABLE studies (study_title TEXT, contact_person TEXT);
CREATE TABLE tasks (task_name TEXT, description TEXT);
CREATE TABLE participants (first_name TEXT, last_name TEXT);
Now I'm stuck. If I add a primary key and foreign keys (say in studies an ID for each study, and foreign keys for each task and person), the primary keys will repeat, which is not possible. A Study is defined by the tasks it contains (akin to an album and music tracks).
How should I approach this situation in SQLite? And importantly, how are the INSERTs done in these situations? I've seen lots of SELECT examples, but few INSERTs that match all IDs in each table, for example when adding a new person to a running study.
What you do is use tables to map/reference/relate/associate.
The first step would be to utilise aliases of the rowid, so instead of :-
CREATE TABLE studies (study_title TEXT, contact_person TEXT);
CREATE TABLE tasks (task_name TEXT, description TEXT);
CREATE TABLE participants (first_name TEXT, last_name TEXT);
you would use :-
CREATE TABLE studies (id INTEGER PRIMARY KEY,study_title TEXT, contact_person TEXT);
CREATE TABLE tasks (id INTEGER PRIMARY KEY, task_name TEXT, description TEXT);
CREATE TABLE participants (id INTEGER PRIMARY KEY, first_name TEXT, last_name TEXT);
With SQLite, INTEGER PRIMARY KEY (or INTEGER PRIMARY KEY AUTOINCREMENT) makes the column (id in the above, although it can have any valid column name) an alias of the rowid (at most one per table), which uniquely identifies each row.
For why not to use AUTOINCREMENT, and more, see SQLite Autoincrement.
Insert some data for demonstration :-
INSERT INTO studies (study_title, contact_person)
VALUES ('Maths','Mr Smith'),('English','Mrs Taylor'),('Geography','Mary White'),('Physics','Mr Smith');
INSERT INTO tasks (task_name,description)
VALUES ('Task1','Do task 1'),('Task2','Do task 2'),('Task3','Do task 3'),('Task4','Do task 4'),('Mark','Mark the studies');
INSERT INTO participants (first_name,last_name)
VALUES ('Tom','Jones'),('Susan','Smythe'),('Sarah','Toms'),('Alan','Francis'),('Julian','MacDonald'),('Fred','Bloggs'),('Rory','Belcher');
First mapping/reference... Table :-
CREATE TABLE IF NOT EXISTS study_task_relationship (study_reference INTEGER, task_reference INTEGER, PRIMARY KEY (study_reference,task_reference));
Map/relate Studies with Tasks (many-many possible)
Do some mapping (INSERT some data) :-
INSERT INTO study_task_relationship
VALUES
(1,2), -- Maths Has Task2
(1,5), -- Maths has Mark Questions
(2,1), -- English has Task1
(2,4), -- English has Task4
(2,5), -- English has Mark questions
(3,3), -- Geography has Task3
(3,1), -- Geography has Task1
(3,2), -- Geography has Task2
(3,5), -- Geography has Mark Questions
(4,4) -- Physics has Task4
;
- See comments on each line
List the Studies along with the tasks
SELECT study_title, task_name -- (just want the Study title and task name)
FROM study_task_relationship -- use the mapping table as the main table
JOIN studies ON study_reference = studies.id -- get the related studies
JOIN tasks ON task_reference = tasks.id -- get the related tasks
ORDER BY study_title -- Order by Study title
List each study with all its tasks
SELECT study_title, group_concat(task_name,'...') AS tasklist
FROM study_task_relationship
JOIN studies ON study_reference = studies.id
JOIN tasks ON task_reference = tasks.id
GROUP BY studies.id
ORDER by study_title;
Add study-participants associative table and populate it :-
CREATE TABLE IF NOT EXISTS study_participants_relationship (study_reference INTEGER, participant_reference INTEGER, PRIMARY KEY (study_reference,participant_reference));
INSERT INTO study_participants_relationship
VALUES
(1,1), -- Maths has Tom Jones
(1,5), -- Maths has Julian MacDonald
(1,6), -- Maths has Fred Bloggs
(2,4), -- English has Alan Francis
(2,7), -- English has Rory Belcher
(3,3), -- Geography has Sarah Toms
(3,2) -- Susan Smythe
;
You can now, as an example, get a list of participants and their tasks along with the study title :-
SELECT study_title, task_name, participants.first_name ||' '||participants.last_name AS fullname
FROM study_task_relationship
JOIN tasks ON study_task_relationship.task_reference = tasks.id
JOIN studies On study_task_relationship.study_reference = studies.id
JOIN study_participants_relationship ON study_task_relationship.study_reference = study_participants_relationship.study_reference
JOIN participants ON study_participants_relationship.participant_reference = participants.id
ORDER BY fullname, study_title
FOREIGN KEYS
As you can see there is no actual need for defining FOREIGN KEYS. They are really just an aid to stop you inadvertently doing something like :-
INSERT INTO study_participants_relationship VALUES(30,25);
(there is no such study and no such participant)
To utilise FOREIGN KEYS you have to ensure that they are enabled; the simplest way is to issue the command to turn them on (as if it were a normal SQL statement).
PRAGMA foreign_keys=1
A FOREIGN KEY is a constraint, it stops you INSERTING, UPDATING or DELETING a row that would violate the constraint/rule.
Basically the rule is that the column to which the FOREIGN key is defined (the child) must have a value that is in the referenced table/column the parent.
So assuming that FOREIGN KEYS are turned on, then coding :-
CREATE TABLE IF NOT EXISTS study_participants_relationship
(
study_reference INTEGER REFERENCES studies(id), -- column foreign key constraint
participant_reference INTEGER,
FOREIGN KEY (participant_reference) REFERENCES participants(id), -- table foreign key constraint
PRIMARY KEY (study_reference,participant_reference)
);
Would result in INSERT INTO study_participants_relationship VALUES(30,25); failing e.g.
FOREIGN KEY constraint failed: INSERT INTO study_participants_relationship VALUES(30,25);
It fails as there is no row in studies with an id whose value is 30 (the first column foreign key constraint). If the value 30 did exist, then the second constraint would kick in, as there is no row in participants with an id of 25.
There is no difference between a column Foreign key constraint and a table Foreign key constraint other than where and how they are coded.
However, the above wouldn't stop you deleting all rows from the study_participants_relationship table, but it would stop you deleting a row from the studies or participants table if it were referenced by the study_participants_relationship table.
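For the INSERT side of the question (adding a new person to a running study), one possible pattern is to insert the participant, read back the generated id, look up the study's id by its title, and then insert the pair into the mapping table. A minimal sketch with Python's sqlite3 module follows; the helper enroll and the lookup by study_title are assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE studies (id INTEGER PRIMARY KEY, study_title TEXT, contact_person TEXT);
CREATE TABLE participants (id INTEGER PRIMARY KEY, first_name TEXT, last_name TEXT);
CREATE TABLE study_participants_relationship (
    study_reference INTEGER REFERENCES studies(id),
    participant_reference INTEGER REFERENCES participants(id),
    PRIMARY KEY (study_reference, participant_reference)
);
INSERT INTO studies (study_title, contact_person) VALUES ('MX9345-3', 'John Doe');
""")

def enroll(first_name, last_name, study_title):
    """Create a participant and link them to an existing study; return the new participant id."""
    with conn:  # both inserts succeed together or are rolled back together
        study = conn.execute("SELECT id FROM studies WHERE study_title = ?",
                             (study_title,)).fetchone()
        if study is None:
            raise ValueError("unknown study: " + study_title)
        participant_id = conn.execute(
            "INSERT INTO participants (first_name, last_name) VALUES (?, ?)",
            (first_name, last_name)).lastrowid  # id generated by SQLite
        conn.execute("INSERT INTO study_participants_relationship VALUES (?, ?)",
                     (study[0], participant_id))
    return participant_id

new_id = enroll("Michael", "Smith", "MX9345-3")
print(conn.execute("""
    SELECT study_title, first_name, last_name
    FROM study_participants_relationship
    JOIN studies ON study_reference = studies.id
    JOIN participants ON participant_reference = participants.id
""").fetchall())
```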
"deal with an id that needs to be matched to multiple ids in SQLite?"
For many-to-many couplings, make extra coupling tables, like the study_task and participant_task tables below. The study-task coupling is many-to-many, since a task can be in many studies and a study can have many tasks.
"make sure no one is doing more than one study at the same time"
That could be handled by letting each participant only have a column for current study (no place for more than one then).
PRAGMA foreign_keys = ON;
CREATE TABLE study (id INTEGER PRIMARY KEY, study_title TEXT, contact_person TEXT);
CREATE TABLE task (id INTEGER PRIMARY KEY, task_name TEXT, description TEXT);
CREATE TABLE participant (
id INTEGER PRIMARY KEY,
first_name TEXT,
last_name TEXT,
id_current_study INTEGER references study(id),
started_current_study DATE
);
CREATE TABLE study_task (
id_study INTEGER NOT NULL references study(id),
id_task INTEGER NOT NULL references task(id),
primary key (id_study,id_task)
);
CREATE TABLE participant_task (
id_participant INTEGER NOT NULL references participant(id),
id_task INTEGER NOT NULL references task(id),
status TEXT check (status in ('STARTED', 'DELIVERED', 'PASSED', 'FAILED')),
primary key (id_participant,id_task)
);
insert into study values (1, 'MX9345-3', 'John Doe');
insert into study values (2, 'MX9300-2', 'Jane Doe');
insert into participant values (1001, 'Michael', 'Smith', 1,'2018-04-21');
insert into participant values (1002, 'Julia', 'Barnes', 1, '2018-04-10');
insert into task values (51, 'OGTT', 'Make a ...');
insert into task values (52, 'PVT', 'Inspect the ...');
insert into study_task values (1,51);
insert into study_task values (1,52);
insert into study_task values (2,51);
--insert into study_task values (2,66); --would fail since 66 doesnt exists (controlled and enforced by foreign key)
The PRAGMA on the first line is needed to make SQLite (version 3.6.19 or later, released in 2009) enforce foreign keys. Without it, SQLite just accepts the foreign key syntax but does not enforce it.
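The effect of the PRAGMA can be checked with a small experiment. This is a sketch using Python's sqlite3 module and a trimmed-down copy of the schema; the helper insert_dangling_row is hypothetical. The setting is switched off explicitly in one run, because the default depends on how SQLite was compiled.

```python
import sqlite3

SCHEMA = """
CREATE TABLE study (id INTEGER PRIMARY KEY, study_title TEXT);
CREATE TABLE task (id INTEGER PRIMARY KEY, task_name TEXT);
CREATE TABLE study_task (
    id_study INTEGER NOT NULL REFERENCES study(id),
    id_task  INTEGER NOT NULL REFERENCES task(id),
    PRIMARY KEY (id_study, id_task)
);
INSERT INTO study VALUES (2, 'MX9300-2');
INSERT INTO task VALUES (51, 'OGTT');
"""

def insert_dangling_row(enforce):
    """Try to link study 2 to the non-existent task 66; return True if SQLite accepts the row."""
    conn = sqlite3.connect(":memory:")
    # the setting is per connection and has to be issued outside a transaction
    conn.execute("PRAGMA foreign_keys = ON" if enforce else "PRAGMA foreign_keys = OFF")
    conn.executescript(SCHEMA)
    try:
        conn.execute("INSERT INTO study_task VALUES (2, 66)")
        return True
    except sqlite3.IntegrityError:
        return False
    finally:
        conn.close()

print(insert_dangling_row(enforce=False))  # accepted: the foreign key is only parsed
print(insert_dangling_row(enforce=True))   # rejected: FOREIGN KEY constraint failed
```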