Database design for email-style application - sql

I'm looking for some advice on how to design a database for an email-style application, specifically how to handle sending a message to multiple users, and displaying what messages were sent to what users.
This is what I have so far:
Messages (Primary Key is Id)
Id (Identity)
SenderId (Foreign Key to Users.Id)
<message data>
ReceivedMessages (Primary key is MessageId + RecipientId)
MessageId (Foreign Key to Messages.Id)
RecipientId (Foreign Key to Users.Id)
IsRead
So for every message sent, there would be one row in Messages, with the data, and then one row for each recipient in ReceivedMessages.
But what if I want to view all the messages sent by a user, and who they were sent to? For each message, I'd need to find all the ReceivedMessages rows for that message, and join all of those with the user table, and then somehow concat all the names (something like this: Concatenate many rows into a single text string?). Might this cause scaling issues, or is it not really anything to worry about?
Any thoughts/suggestions? Thanks.

I see no problem with your design, and would not anticipate any scalability issues with proper indexing on your tables (unless you are talking about massive scale, e.g. Gmail, Yahoo Mail, etc.).
As far as concatenating the recipient names, I would recommend you do this on the application side and not in SQL, or determine whether you need to do it at all (you might want to show a list and not a concatenated string).
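As a rough illustration of the "proper indexing" point, here is a minimal sketch using Python's built-in sqlite3 module. The table and column names follow the question; the `Users.Name` and `Messages.Body` columns, the index names, and the sample rows are assumptions made up for the example, and the right indexes for a real system depend on its actual queries.

```python
import sqlite3

# Tables follow the question; Name, Body, index names and data are illustrative.
SCHEMA = """
CREATE TABLE Users (Id INTEGER PRIMARY KEY, Name TEXT NOT NULL);
CREATE TABLE Messages (
    Id INTEGER PRIMARY KEY,
    SenderId INTEGER NOT NULL REFERENCES Users(Id),
    Body TEXT NOT NULL
);
CREATE TABLE ReceivedMessages (
    MessageId INTEGER NOT NULL REFERENCES Messages(Id),
    RecipientId INTEGER NOT NULL REFERENCES Users(Id),
    IsRead INTEGER NOT NULL DEFAULT 0,
    PRIMARY KEY (MessageId, RecipientId)
);
-- The primary key covers lookups by message; these two cover
-- "inbox of a user" and "messages sent by a user".
CREATE INDEX IX_ReceivedMessages_Recipient
    ON ReceivedMessages (RecipientId, IsRead);
CREATE INDEX IX_Messages_Sender ON Messages (SenderId);
"""


def make_db():
    """Create an in-memory database with a few sample rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany("INSERT INTO Users VALUES (?, ?)",
                     [(1, "alice"), (2, "bob"), (3, "carol")])
    conn.executemany("INSERT INTO Messages VALUES (?, ?, ?)",
                     [(1, 1, "lunch?"), (2, 1, "status report")])
    conn.executemany(
        "INSERT INTO ReceivedMessages (MessageId, RecipientId) VALUES (?, ?)",
        [(1, 2), (1, 3), (2, 2)])
    return conn


def unread_bodies(conn, user_id):
    """Return the bodies of the unread messages for one recipient."""
    rows = conn.execute(
        """SELECT m.Body
           FROM ReceivedMessages r
           JOIN Messages m ON m.Id = r.MessageId
           WHERE r.RecipientId = ? AND r.IsRead = 0
           ORDER BY m.Id""", (user_id,))
    return [body for (body,) in rows]


def mark_read(conn, message_id, user_id):
    """Flip IsRead for one (message, recipient) pair only."""
    conn.execute(
        "UPDATE ReceivedMessages SET IsRead = 1 "
        "WHERE MessageId = ? AND RecipientId = ?", (message_id, user_id))
```

Because IsRead lives on the ReceivedMessages row, marking a message as read for one recipient leaves the other recipients' rows untouched.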

all the messages sent by a user, and who they were sent to
You can do it as an aggregate query, something like:
SELECT u1.user_name, m.message, GROUP_CONCAT(DISTINCT u2.user_name)
FROM messages m JOIN users u1 ON (m.senderID=u1.user_id)
JOIN receivedmessages r ON (m.id=r.messageId)
JOIN users u2 ON (r.RecipientId=u2.user_id)
GROUP BY m.id, u1.user_name, m.message;
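But because the number of recipients per message is essentially unlimited, you may run up against the length limit on GROUP_CONCAT (in MySQL the result is truncated at group_concat_max_len, which defaults to 1024 bytes).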
So it's likely better to do an unaggregated select and process the records for display in your application layer:
SELECT DISTINCT u1.user_name, m.message, m.sent_date, u2.user_name
FROM messages m JOIN users u1 ON (m.senderID=u1.user_id)
JOIN receivedmessages r ON (m.id=r.messageId)
JOIN users u2 ON (r.RecipientId=u2.user_id)
ORDER BY u1.user_name, m.sent_date, u2.user_name;
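Processing the flat rows in the application layer could look roughly like the sketch below (Python with sqlite3). The table and column names are taken from the query above; the sample data, the extra `m.id` in the SELECT and ORDER BY, and the function names are assumptions for illustration. `itertools.groupby` only merges adjacent rows, so it relies on the ORDER BY keeping each message's rows together.

```python
import sqlite3
from itertools import groupby


def demo_connection():
    """In-memory database with made-up sample data."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE users (user_id INTEGER PRIMARY KEY, user_name TEXT);
        CREATE TABLE messages (id INTEGER PRIMARY KEY, senderID INTEGER,
                               message TEXT, sent_date TEXT);
        CREATE TABLE receivedmessages (messageId INTEGER, RecipientId INTEGER,
                                       PRIMARY KEY (messageId, RecipientId));
        INSERT INTO users VALUES (1, 'alice'), (2, 'bob'), (3, 'carol');
        INSERT INTO messages VALUES (1, 1, 'hi', '2024-01-01'),
                                    (2, 2, 're: hi', '2024-01-02');
        INSERT INTO receivedmessages VALUES (1, 2), (1, 3), (2, 1);
    """)
    return conn


def sent_with_recipients(conn):
    """Return [(sender, message, [recipient names])], one entry per message."""
    rows = conn.execute("""
        SELECT m.id, u1.user_name, m.message, u2.user_name
        FROM messages m
        JOIN users u1 ON m.senderID = u1.user_id
        JOIN receivedmessages r ON m.id = r.messageId
        JOIN users u2 ON r.RecipientId = u2.user_id
        ORDER BY u1.user_name, m.sent_date, m.id, u2.user_name
    """).fetchall()
    result = []
    # Group on (message id, sender, text); the recipient is the 4th column.
    for (_msg_id, sender, text), group in groupby(rows, key=lambda r: r[:3]):
        result.append((sender, text, [row[3] for row in group]))
    return result
```

The caller can then render each recipient list however the UI needs it (a joined string, a truncated "and 12 others", or a proper list).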

Users change name and email, but not their login. Consider mimicking what happens in a unix-based mailbox (e.g. pine) instead:
received_messages (
user_id,
message_id,
message_date,
message_title,
message_content,
message_sender,
message_recipient,
message_is_read
)
and sent_messages along the same lines, i.e. two "files" per user.
Or even merging the latter two with a sent/received flag.
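A minimal sketch of the merged variant, assuming a single `mailbox` table with a `folder` column as the sent/received flag (both names are made up here, as are the helper functions). Each user gets their own copy of a message, with sender and recipients stored as plain text snapshots.

```python
import sqlite3


def make_db():
    """One table holds every user's copies; 'folder' is the sent/received flag."""
    conn = sqlite3.connect(":memory:")
    conn.execute("""
        CREATE TABLE mailbox (
            user_id TEXT NOT NULL,          -- login of the owner of this copy
            folder TEXT NOT NULL CHECK (folder IN ('sent', 'received')),
            message_id INTEGER NOT NULL,
            message_date TEXT NOT NULL,
            message_title TEXT,
            message_content TEXT,
            message_sender TEXT NOT NULL,
            message_recipient TEXT NOT NULL, -- recipient list as sent
            message_is_read INTEGER NOT NULL DEFAULT 0
        )""")
    return conn


def send(conn, message_id, sender, recipients, title, content, date):
    """Store one 'sent' copy for the sender and one 'received' copy per recipient."""
    to_line = ", ".join(recipients)
    rows = [(sender, "sent", message_id, date, title, content,
             sender, to_line, 1)]
    rows += [(login, "received", message_id, date, title, content,
              sender, to_line, 0) for login in recipients]
    conn.executemany(
        "INSERT INTO mailbox VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?)", rows)


def list_folder(conn, user_id, folder):
    """Return (title, recipient line) for one user's folder, oldest first."""
    return conn.execute(
        "SELECT message_title, message_recipient FROM mailbox "
        "WHERE user_id = ? AND folder = ? ORDER BY message_date",
        (user_id, folder)).fetchall()
```

The trade-off is duplicated content (one copy per mailbox) in exchange for listing a folder without any joins.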

I am facing the same challenge of creating an email or messaging system for a website... You guys forget one thing... IsRead, IsDraft, IsFlagged, IsReply, IsTrash, etc... need to be in a separate table since the same message will be flagged, read or unread by two or more people!! So, we must have a Status table as shown below...
StatusID int
MessageID int
MemberID int
DateTime datetime
IPAddress varchar(65)
IsRead char(1)
IsDraft char(1)
IsFlagged char(1)
IsForwarded char(1)
IsReply char(1)
IsTrash char(1)
You will need several tables besides the member or user table:
mail
folders
status
attachment
log
If this is for an existing website... I would separate the mail system into a separate database if you expect this mail system to have a lot of activity.
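A sketch of the per-member status idea, using Python's sqlite3. Only a few of the flag columns above are included, integers stand in for the char(1) flags, and the UNIQUE constraint and helper functions are assumptions added for the example.

```python
import sqlite3

# Flags that callers may set; checked before building the UPDATE statement.
FLAGS = ("IsRead", "IsFlagged", "IsTrash")


def make_db():
    conn = sqlite3.connect(":memory:")
    conn.execute("""
        CREATE TABLE status (
            StatusID INTEGER PRIMARY KEY,
            MessageID INTEGER NOT NULL,
            MemberID INTEGER NOT NULL,
            IsRead INTEGER NOT NULL DEFAULT 0,
            IsFlagged INTEGER NOT NULL DEFAULT 0,
            IsTrash INTEGER NOT NULL DEFAULT 0,
            UNIQUE (MessageID, MemberID)   -- one status row per member/message
        )""")
    return conn


def set_flag(conn, message_id, member_id, flag, value=True):
    """Set one flag for one member without touching other members' rows."""
    if flag not in FLAGS:
        raise ValueError("unknown flag: %r" % (flag,))
    # Create the status row on first use, then update the requested flag.
    conn.execute(
        "INSERT OR IGNORE INTO status (MessageID, MemberID) VALUES (?, ?)",
        (message_id, member_id))
    conn.execute(
        "UPDATE status SET %s = ? WHERE MessageID = ? AND MemberID = ?" % flag,
        (int(value), message_id, member_id))


def get_flags(conn, message_id, member_id):
    """Return (IsRead, IsFlagged, IsTrash); all zero if no row exists yet."""
    row = conn.execute(
        "SELECT IsRead, IsFlagged, IsTrash FROM status "
        "WHERE MessageID = ? AND MemberID = ?",
        (message_id, member_id)).fetchone()
    return row or (0, 0, 0)
```

Since the column name is interpolated into the SQL text, it is checked against the fixed `FLAGS` tuple first; the values themselves are still passed as bound parameters.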

Related

SQL stratascratch facebook interview question

SMS Confirmations From Users
Facebook sends SMS texts when users attempt to 2FA (2-factor authenticate) to log into the platform. In order to successfully 2FA they must confirm they received the SMS text. Confirmation texts are only valid on the date they were sent. Unfortunately, there was an ETL problem where friend requests and invalid confirmation records were inserted into the logs which are stored in the 'fb_sms_sends' table. Fortunately, the 'fb_confirmers' table contains valid confirmation records so you can use this table to identify confirmed SMS texts.
Calculate the percentage of confirmed SMS texts for August 4, 2020.
fb_sms_sends
ds datetime
country varchar
carrier varchar
phone_number int
type varchar
fb_confirmers
date datetime
phone_number int
My solution -
Select s.ds, (count(c.phone_number)::Float/count(s.phone_number)::Float)*100 as perc
from fb_sms_sends s
left join fb_confirmers c
on s.phone_number = c.phone_number
where s.ds = c.date
group by s.ds
Not sure what is wrong here. Can someone please explain?
I think what you're not handling is the scenario
Unfortunately, there was an ETL problem where friend requests and invalid confirmation records were inserted into the logs which are stored in the 'fb_sms_sends' table. Fortunately, the 'fb_confirmers' table contains valid confirmation records so you can use this table to identify confirmed SMS texts.
You need to remove the friend requests and invalid confirmation records from the table.
If you add
type NOT IN ('confirmation', 'friend_request')
to your WHERE clause, you should get the right answer.
Instead of putting the date comparison in the WHERE clause, put it in the join condition, so that all the records from the left table are kept. A WHERE condition on a column of the right table discards the unmatched rows and effectively turns the LEFT JOIN into an inner join. I slightly modified your solution:
Select s.ds, (count(c.phone_number)::Float/count(s.phone_number)::Float)*100 as perc
from fb_sms_sends s
left join fb_confirmers c
on s.phone_number = c.phone_number and s.ds = c.date
where type <> 'friend_request'
and s.ds = '2020-08-04'
group by 1
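The difference between the two placements can be checked on a tiny data set. This sketch uses Python's sqlite3 with made-up rows (four SMS sends and one friend request on August 4, one valid confirmation, and one confirmation dated a day late); the `'message'` type value and the helper names are assumptions.

```python
import sqlite3


def make_db():
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE fb_sms_sends (ds TEXT, phone_number INTEGER, type TEXT);
        CREATE TABLE fb_confirmers (date TEXT, phone_number INTEGER);
    """)
    conn.executemany("INSERT INTO fb_sms_sends VALUES (?, ?, ?)", [
        ("2020-08-04", 1, "message"),
        ("2020-08-04", 2, "message"),
        ("2020-08-04", 3, "message"),
        ("2020-08-04", 4, "message"),
        ("2020-08-04", 5, "friend_request"),
    ])
    conn.executemany("INSERT INTO fb_confirmers VALUES (?, ?)", [
        ("2020-08-04", 1),   # valid: confirmed on the day it was sent
        ("2020-08-05", 2),   # invalid: confirmed a day late
    ])
    return conn


# Date comparison in the ON clause: unconfirmed sends stay in the result.
DATE_IN_ON = """
    SELECT COUNT(c.phone_number) * 100.0 / COUNT(s.phone_number)
    FROM fb_sms_sends s
    LEFT JOIN fb_confirmers c
      ON s.phone_number = c.phone_number AND s.ds = c.date
    WHERE s.type <> 'friend_request' AND s.ds = '2020-08-04'
"""

# Date comparison in the WHERE clause: rows where c.date is NULL are dropped.
DATE_IN_WHERE = """
    SELECT COUNT(c.phone_number) * 100.0 / COUNT(s.phone_number)
    FROM fb_sms_sends s
    LEFT JOIN fb_confirmers c
      ON s.phone_number = c.phone_number
    WHERE s.ds = c.date
      AND s.type <> 'friend_request' AND s.ds = '2020-08-04'
"""


def confirmed_pct(conn, sql):
    """Run one of the queries above and return the percentage."""
    return conn.execute(sql).fetchone()[0]
```

With this data the ON version yields 25.0 (one confirmation out of four texts), while the WHERE version yields 100.0 because only the single matched row survives the filter.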

How to tightly contain an SQL query result

I'm writing an application that implements a message system through a 'memos' table in a database. The table has several fields that look like this:
id, date_sent, subject, senderid, recipients,message, status
When someone sends a new memo, it will be entered into the memos table. A memo can be sent to multiple people at the same time and the recipients userid's will be inserted into the 'recipients' field as comma separated values.
It would seem that an SQL query like this would work to see if a specific userid is included in a memo:
SELECT * FROM memos WHERE recipients LIKE '%15%'
But I'm not sure this is the right solution. If I use the SQL statement above, won't that return everything that "contains" 15? For example, using the above statement, user 15, 1550, 1564, 2015, would all be included in the result set (and those users might not actually be on the recipient list).
What is the best way to resolve this so that ONLY the user 15 is pulled in if they are in the recipient field instead of everything containing a 15? Am I misunderstanding the LIKE statement?
I think you would be better off having your recipients in a child table of the memos table. Your memos table has a memo ID, which the child table references:
MemoRecipients
-----
MemoRecipientId INT PRIMARY KEY, IDENTITY,
MemoId INT FK to memos NOT NULL
UserId INT NOT NULL
To query the memos sent to a specific user, you would do something like
SELECT *
FROM MEMOS m
INNER JOIN memoRecipients mr on m.Id = mr.memoId
WHERE userId = 15
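To make the difference concrete, here is a small sqlite3 sketch in Python. It keeps the comma-separated `recipients` column next to the `memoRecipients` child table purely for comparison; the sample memos and function names are made up.

```python
import sqlite3


def make_db():
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE memos (id INTEGER PRIMARY KEY, subject TEXT,
                            recipients TEXT);
        CREATE TABLE memoRecipients (
            MemoRecipientId INTEGER PRIMARY KEY,
            MemoId INTEGER NOT NULL REFERENCES memos(id),
            UserId INTEGER NOT NULL
        );
    """)
    memos = [(1, "budget", "15,20"), (2, "party", "1550"), (3, "audit", "2015,7")]
    conn.executemany("INSERT INTO memos VALUES (?, ?, ?)", memos)
    # Populate the child table with one row per (memo, recipient) pair.
    for memo_id, _subject, csv in memos:
        conn.executemany(
            "INSERT INTO memoRecipients (MemoId, UserId) VALUES (?, ?)",
            [(memo_id, int(uid)) for uid in csv.split(",")])
    return conn


def memos_by_like(conn, user_id):
    """Substring match on the CSV column: also matches 1550, 2015, ..."""
    rows = conn.execute(
        "SELECT id FROM memos WHERE recipients LIKE ? ORDER BY id",
        ("%" + str(user_id) + "%",))
    return [memo_id for (memo_id,) in rows]


def memos_by_child_table(conn, user_id):
    """Exact match on the child table: only memos really sent to the user."""
    rows = conn.execute(
        """SELECT m.id
           FROM memos m
           INNER JOIN memoRecipients mr ON m.id = mr.MemoId
           WHERE mr.UserId = ?
           ORDER BY m.id""", (user_id,))
    return [memo_id for (memo_id,) in rows]
```

For user 15 the LIKE version returns all three memos, while the join on the child table returns only memo 1.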
No, you haven't misunderstood; that's how LIKE works. But to achieve what you want, it would be better not to combine the recipients into one field. Instead, create a separate table that stores the recipient list for each memo.
For me I will use below schema, for your need:
Table_Memo
id, date_sent, subject, senderid, message, status
Table_Recipient
id_memo FK Table_Memo(id), recipient
By doing so, if you want to get specific recipients from a memo, you can do such query:
SELECT a.* FROM Table_Memo a, Table_Recipient b
WHERE a.id = :memo_id AND a.id = b.id_memo AND b.recipient = 15
I am not sure exactly how your application pulls these messages, but I imagine a better way would be to create a table message_recepient, which represents the many-to-many relationship between recipients and memos:
id, memoId, recepientId
Then your application could pull messages like this
SELECT m.*
FROM memos m inner join message_recepient mr on m.id = mr.memoId
WHERE recepientId = 15
This way you will get the messages for the specific user. Again, I don't know what your status field is for, but if it tracks new/read/unread, you could add this to your WHERE clause:
and m.status = 'new'
Order by date_sent desc
This way you fetch only the new messages, newest first.

Stored Procedure to get the count using 2 tables

I am trying to write a complicated stored procedure for the first time. My goal is to get the count with some condition from 2 tables.
Consider Merchant table and Email table.
Email table saves the Email invitations sent by the Merchant. Merchant table has all the Merchant Info along with Email IDs.
My goal is to get the count of EmailIDs that are in the Merchant table, by checking whether the recipients of the Email invitations sent by a Merchant have signed up.
I have tried to make this question clear... Hope i am clear.
Thanks in advance..
Why a stored procedure? It sounds like it can be done in a single SQL query.
Let's see if I understand your question correctly: Merchants invite other people to become a Merchant as well and you want a list with the number of accepted invitations per merchant?
Something along those lines:
select MerchantName, count(1)
from Merchants, Emails
where Merchants.Id = Emails.Id
and Emails.SignedUp = 'YES!'
group by MerchantName;
It sounds like there is some confusion in that schema you're describing. Based on what I think you're trying to do I'd suggest you have a Merchant table, an Email table and a MerchantEmail table which links Merchants to email invitations sent.
The way it appears otherwise with a Merchant table that has an Email ID is a many-to-one relationship so that several Merchants could be the recipients of an email. In that case the Signed Up flag would appear in the Merchants table and not in the Email table.
CREATE PROCEDURE GetMerchantsSignedUp
@EmailId INT
AS
SELECT COUNT(*) AS MerchantSignedUp
FROM Merchant, Email
WHERE Merchant.EmailId = Email.Id
AND Merchant.SignedUp = 1
AND Email.Id = @EmailId
Please note that there is some redundancy above which suggests that you need not even include the Email table in the query. Here it acts only to indicate a foreign key relationship.

SQL multiple join on many to many tables + comma separation

I have these tables:
media
id (int primary key)
uri (varchar).
media_to_people
media_id (int primary key)
people_id (int primary key)
people
id (int primary key)
name (varchar)
role (int) -- role specifies whether the person is an artist, publisher, writer, actor, etc. relative to the media and has range(1-10)
This is a many to many relation
I want to fetch a media and all its associated people in a select. So if a media has 10 people associated with it, all 10 must come.
Furthermore, if multiple people with the same role exist for a given media, they must come as comma separated values under a column for that role.
Result headings must look like: media.id, media.uri, people.name(actor), people.name(artist), people.name(publisher) and so on.
I'm using sqlite.
SQLite doesn't have the "pivot" functionality you'd need for starters, and the "comma separated values" part is definitely a presentation issue that it would be absurd (and possibly unfeasible) to try to push into any database layer, whatever dialect of SQL may be involved -- it's definitely a part of the job you'd do in the client, e.g. a reporting facility or programming language.
Use SQL for data access, and leave presentation to other layers.
How you get your data is
SELECT media.id, media.uri, people.name, people.role
FROM media
JOIN media_to_people ON (media.id = media_to_people.media_id)
JOIN people ON (media_to_people.people_id = people.id)
WHERE media.id = ?
ORDER BY people.role, people.name
(the ? is one way to indicate a parameter in SQLite, to be bound to the specific media id you're looking for in ways that depend on your client); the data will come from the DB to your client code in several rows, and your client code can easily put them into the single column form that you want.
It's hard for us to say how to code the client-side part w/o knowing anything about the environment or language you're using as the client. But in Python for example:
import collections

def showit(dataset):
    # Collect the names for each role; every row belongs to the same media.
    by_role = collections.defaultdict(list)
    for mediaid, mediauri, name, role in dataset:
        by_role[role].append(name)
    headers = ['mediaid', 'mediauri']
    result = [mediaid, mediauri]
    for role in sorted(by_role):
        headers.append('people(%s)' % role)
        result.append(','.join(by_role[role]))
    # mediaid is an int, so convert every field to str before joining.
    return ' '.join(headers) + '\n' + ' '.join(str(x) for x in result)
even this won't quite match your spec -- you ask for headers such as 'people(artist)' while you also specify that the role's encoded as an int, and mention no way to go from the int to the string 'artist', so it's obviously impossible to match your spec exactly... but it's as close as my ingenuity can get;-).
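If the application does know what each role number means, the same idea can produce the requested headers. The sketch below runs the query above against an in-memory SQLite database; the `ROLE_NAMES` mapping is purely hypothetical (the question never says which int means what), and the composite primary key, sample data and function names are assumptions.

```python
import sqlite3
from collections import defaultdict

# Hypothetical mapping; the question does not define the role numbers.
ROLE_NAMES = {1: "actor", 2: "artist", 3: "publisher"}


def demo_connection():
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE media (id INTEGER PRIMARY KEY, uri TEXT);
        CREATE TABLE people (id INTEGER PRIMARY KEY, name TEXT, role INTEGER);
        CREATE TABLE media_to_people (media_id INTEGER, people_id INTEGER,
                                      PRIMARY KEY (media_id, people_id));
        INSERT INTO media VALUES (1, 'a.mp4');
        INSERT INTO people VALUES (1, 'Ann', 1), (2, 'Bob', 1), (3, 'Cy', 3);
        INSERT INTO media_to_people VALUES (1, 1), (1, 2), (1, 3);
    """)
    return conn


def media_record(conn, media_id):
    """Return one dict per media with a comma-separated column per role.

    Returns None when the media has no associated people (inner joins).
    """
    rows = conn.execute("""
        SELECT media.id, media.uri, people.name, people.role
        FROM media
        JOIN media_to_people ON media.id = media_to_people.media_id
        JOIN people ON media_to_people.people_id = people.id
        WHERE media.id = ?
        ORDER BY people.role, people.name""", (media_id,)).fetchall()
    if not rows:
        return None
    by_role = defaultdict(list)
    for _id, _uri, name, role in rows:
        by_role[role].append(name)
    record = {"id": rows[0][0], "uri": rows[0][1]}
    for role in sorted(by_role):
        # Fall back to the raw number for roles missing from the mapping.
        label = ROLE_NAMES.get(role, "role %d" % role)
        record["people(%s)" % label] = ",".join(by_role[role])
    return record
```

A dict keeps the example short; a reporting layer could just as well emit the keys as column headings.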
I agree with Alex Martelli's answer, that you should get the data in multiple rows and do some processing in your application.
If you try to do this with just joins, you need to join to the people table for each role type, and if there are multiple people in each role, your query will have Cartesian products between these roles.
So you need to do this with GROUP_CONCAT() and produce a scalar subquery in your select-list for each role:
SELECT m.id, m.uri,
(SELECT GROUP_CONCAT(name)
FROM media_to_people JOIN people ON (people_id = id)
WHERE media_id = m.id AND role = 1) AS Actors,
(SELECT GROUP_CONCAT(name)
FROM media_to_people JOIN people ON (people_id = id)
WHERE media_id = m.id AND role = 2) AS Artists,
(SELECT GROUP_CONCAT(name)
FROM media_to_people JOIN people ON (people_id = id)
WHERE media_id = m.id AND role = 3) AS Publishers
FROM media m;
This is truly ugly! Don't try this at home!
Take our advice, and don't try to format the pivot table using only SQL.

Fetch unread messages, by user

I want to maintain a list of global messages that will be displayed to all users of a web app. I want each user to be able to mark these messages as read individually. I've created 2 tables; messages (id, body) and messages_read (user_id, message_id).
Can you provide an sql statement that selects the unread messages for a single user? Or do you have any suggestions for a better way to handle this?
Thanks!
Well, you could use
SELECT id FROM messages m WHERE m.id NOT IN(
SELECT message_id FROM messages_read WHERE user_id = ?)
Where ? is passed in by your app.
If the table definitions you mentioned are complete, you might want to include a date for each message, so you can order them by date.
Also, this might be a slightly more efficient way to do the select:
SELECT messages.id, messages.body
FROM messages
LEFT JOIN messages_read
ON messages_read.message_id = messages.id
AND messages_read.[user_id] = @user_id
WHERE
messages_read.message_id IS NULL
Something like:
SELECT id, body FROM messages LEFT JOIN
(SELECT message_id FROM messages_read WHERE user_id = ?) AS r
ON id=message_id WHERE message_id IS NULL
Slightly tricky and I'm not sure how the performance will scale up, but it should work.
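The NOT IN and LEFT JOIN variants can be compared side by side with a small sqlite3 sketch in Python. The two tables follow the question; the primary key, the NOT NULL constraints, the sample data and the function names are assumptions. One caveat worth knowing: NOT IN returns no rows at all if the subquery produces a NULL, which is why message_id is declared NOT NULL here.

```python
import sqlite3


def make_db():
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE messages (id INTEGER PRIMARY KEY, body TEXT);
        CREATE TABLE messages_read (
            user_id INTEGER NOT NULL,
            message_id INTEGER NOT NULL REFERENCES messages(id),
            PRIMARY KEY (user_id, message_id)
        );
        INSERT INTO messages VALUES (1, 'welcome'), (2, 'maintenance'),
                                    (3, 'new feature');
        INSERT INTO messages_read VALUES (7, 1), (8, 2), (8, 3);
    """)
    return conn


def unread_not_in(conn, user_id):
    """Unread message ids via NOT IN."""
    rows = conn.execute(
        """SELECT id FROM messages m
           WHERE m.id NOT IN (
               SELECT message_id FROM messages_read WHERE user_id = ?)
           ORDER BY id""", (user_id,))
    return [message_id for (message_id,) in rows]


def unread_left_join(conn, user_id):
    """Unread message ids via LEFT JOIN ... IS NULL (an anti-join)."""
    rows = conn.execute(
        """SELECT messages.id
           FROM messages
           LEFT JOIN messages_read
             ON messages_read.message_id = messages.id
            AND messages_read.user_id = ?
           WHERE messages_read.message_id IS NULL
           ORDER BY messages.id""", (user_id,))
    return [message_id for (message_id,) in rows]
```

Both functions return the same ids for a given user; which one performs better depends on the database engine and its indexes, so it is worth measuring on real data.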