Database design for email messaging system

Database design for email messaging system - sql

I want to make an email messaging system like gmail have. I would like to have following option: Starred, Trash, Spam, Draft, Read, Unread. Right now I have the below following structure in my database :
CREATE TABLE [MyInbox](
[InboxID] [int] IDENTITY(1,1) NOT NULL,
[FromUserID] [int] NOT NULL,
[ToUserID] [int] NOT NULL,
[Created] [datetime] NOT NULL,
[Subject] [nvarchar](255) COLLATE SQL_Latin1_General_CP1_CI_AS NOT NULL,
[Body] [nvarchar](max) COLLATE SQL_Latin1_General_CP1_CI_AS NOT NULL,
[IsRead] [bit] NOT NULL,
[IsReceived] [bit] NOT NULL,
[IsSent] [bit] NOT NULL,
[IsStar] [bit] NOT NULL CONSTRAINT [DF_MyInbox_IsStarred] DEFAULT ((0)),
[IsTrash] [bit] NOT NULL CONSTRAINT [DF_MyInbox_IsTrashed] DEFAULT ((0)),
[IsDraft] [bit] NOT NULL CONSTRAINT [DF_MyInbox_Isdrafted] DEFAULT ((0))
) ON [PRIMARY]
But I am facing some issues with the above structure. Right now if a user A sends a msessage to user B I am storing a row in this table But if user B deletes the that message it gets deleted frm user's A sent message too. This is wrong, I want exactly as normal email messaging system does. If A deletes message from his sent item then B should not get deleted from his inbox. I am thinking on other problem here which will come suppose a user A sent a mail to 500 users at once so as per my design I will have 500 rows with duplicate bodies i.e not a memory efficent way to store it. Could you guys please help me in makeing the design for a messaging system ?

You need to split your table for it. You could have following schema and structure
CREATE TABLE [Users]
(
[UserID] INT ,
[UserName] NVARCHAR(50) ,
[FirstName] NVARCHAR(50) ,
[LastName] NVARCHAR(50)
)
CREATE TABLE [Messages]
(
[MessageID] INT ,
[Subject] NVARCHAR(MAX) ,
[Body] NVARCHAR(MAX) ,
[Date] DATETIME,
[AuthorID] INT,
)
CREATE TABLE [MessagePlaceHolders]
(
[PlaceHolderID] INT ,
[PlaceHolder] NVARCHAR(255)--For example: InBox, SentItems, Draft, Trash, Spam
)
CREATE TABLE [Users_Messages_Mapped]
(
[MessageID] INT ,
[UserID] INT ,
[PlaceHolderID] INT,
[IsRead] BIT ,
[IsStarred] BIT
)
In users table you can have users."Messages" denotes the table for messages. "MessagePlaceHolders" denotes the table for placeholders for messages. Placeholders can be inbox, sent item, draft, spam or trash. "Users_Messages_Mapped" denotes the mapping table for users and messages. The "UserID" and "PlaceHolderID" are the foreign keys."IsRead" and "IsStarred" signifies what their name stands for.
If there is no record found for a particular messageid in "Users_Messages_Mapped" table that record will be deleted from Messages table since we no longer need it.

If you're doing document-orientated work, I suggest taking a look at CouchDB. It is schema-less, meaning issues like this disappear.
Let's take a look at the example: A sends a message to B, and it's deleted by B.
You would have a single instance of the document, with recipients listed as an attribute of the email. As users delete messages, you either remove them from the recipients list or add them to a list of deleted_by or whatever you choose.
It's a much different approach to data than what you're used to, but may be highly beneficial to take some time to consider.

I think you need to decompose your schema some more. Store emails seperately, and map inboxes to the messages they contain.

If I were you I would set two flags one for sender and other one for receiver if both flags are true then message should be deleted from database otherwise keep that in database but hide it from who deleted it.
Do same thing for trash. You may want to run cron or check manually if both sender and receiver delete the message then remove it from database.

A message can only be in one folder at a time, so you want a folders table (containing folders 'Trash', 'Inbox', 'Archive', etc.) and a foreign key from messages to folders.
For labels, you have a many-to-many relation, so you need a labels table and also a link table (messages_labels).
For starring, a simple bit column should do, same for 'unread'.

CREATE TABLE `mails` (
`message_id` bigint(20) unsigned NOT NULL AUTO_INCREMENT,
`message` varchar(10000) NOT NULL DEFAULT '',
`file` longblob,
`mailingdate` varchar(40) DEFAULT NULL,
`starred_status` int(10) unsigned NOT NULL DEFAULT '0',
`sender_email` varchar(200) NOT NULL DEFAULT '',
`reciever_email` varchar(200) NOT NULL DEFAULT '',
`inbox_status` int(10) unsigned NOT NULL DEFAULT '0',
`sent_status` int(10) unsigned NOT NULL DEFAULT '0',
`draft_status` int(10) unsigned NOT NULL DEFAULT '0',
`trash_status` int(10) unsigned NOT NULL DEFAULT '0',
`subject` varchar(200) DEFAULT NULL,
`read_status` int(10) unsigned NOT NULL DEFAULT '0',
`delete_status` int(10) unsigned NOT NULL DEFAULT '0',
PRIMARY KEY (`message_id`)
)
You can use this table for storing the mails and manipulate the queries according to mail boxes. I am avoiding rest of the tables like user details and login details table. You can make them according to your need.

You could create a table for MessageContacts which joins each message to the people who have it in their mailboxes. When a user deletes a message then a row gets deleted from MessageContacts but the original message is preserved.
You could do that... but I suggest you don't. Unless it's an academic exercise set by your tutor then it is surely a complete waste of time to develop your own messaging system. If it is homework then you ought to say so. If not, then go do something more useful instead.

WHY DELETE? I think there is no need to delete anything. Just hide it, from users when deleted. Because, it will problem to check both sides, when sender send same message to many recipients. Then you have to check and flag all recipients. If all OK, then delete...
I think there is no need to delete anything.

in my structure, I set "deleted: bool" flag and depend on its value show message or hide.

Related

Exclude the user on the count if the post is created by the same user

I have the following scenario in my system:
a member:
CREATE TABLE `member` (
`memberid` int(11) NOT NULL,
`email` text
);
creates the protocols:
CREATE TABLE `protocol` (
`protocolid` int(11) NOT NULL,
`createdby` int(11) NOT NULL,
`status` varchar(256) DEFAULT NULL
) ;
member can create a feedback post on the protocols
CREATE TABLE `protocolpost` (
`protocolid` int(11) NOT NULL,
`protocolpostid` int(11) NOT NULL,
`createdby` text
) ;
member can reply to the feedback
CREATE TABLE `protocolpostcomment` (
`protocolpostcommentid` int(11) NOT NULL,
`protocolpostid` int(11) NOT NULL,
`commentedby` varchar(256) DEFAULT NULL,
`hasfeedbackreplyviewed` tinyint(1) DEFAULT NULL
) ;
I wanted to get the total count of replies from all post comments made on a protocol created by a member, excluding counts of a user who created that protocol, and comments made by the author of the post.
I have written this query so far, but this query returns all the post comments, I wanted to exclude the reply done by the feedback creator.
SELECT
protocols.*,
protocolFeedbackReply.*,
protocolfeedback.*
FROM protocolpost AS protocolfeedback
JOIN protocol AS protocols
ON protocols.protocolid = protocolfeedback.protocolid
JOIN protocolpostcomment AS protocolFeedbackReply
ON protocolfeedback.protocolpostid =
protocolFeedbackReply.protocolpostid
WHERE protocols.createdby = 1038
AND protocols.status = "published"
AND protocolFeedbackReply.hasfeedbackreplyviewed = 0
AND protocolfeedback.createdby NOT LIKE Concat('%', (SELECT email
FROM member
WHERE
memberid = 1038),
'%');
I have attached a dbfiddle here:
In the dbfiddle example only the comment that is done by the user nwxaofrc#tempemail.com, , should be on the count.

Thank you for your very good description. Your query is difficult to read and might be simplified if possible. Anyway, within your NOT LIKE condition, it seems you need to check protocolFeedbackReply.commentedby instead of protocolfeedback.createdby, see db<>fiddle

Why does adding a nullable default constraint to an existing column take so long?

I have an existing table with approximately 400 million rows. That table includes a set of bit columns named IsModified, IsDeleted, and IsExpired.
CREATE TABLE [dbo].[ActivityAccumulator](
[ActivityAccumulator_SK] [int] IDENTITY(1,1) NOT NULL,
[ActivityAccumulatorPK1] [int] NULL,
[UserPK1] [int] NULL,
[Data] [varchar](510) NULL,
[CoursePK1] [int] NULL,
[TimeStamp] [datetime] NULL,
[SessionID] [int] NULL,
[Status] [varchar](50) NULL,
[EventType] [varchar](40) NULL,
[DWCreated] [datetime] NULL,
[DWModified] [datetime] NULL,
[IsModified] [bit] NULL,
[DWDeleted] [datetime] NULL,
[IsDeleted] [bit] NULL,
[ActivityAccumulatorKey] [bigint] NULL,
[ContentPK1] [bigint] NULL
) ON [PRIMARY]
I would like to add a default constraint to the table that, for all future inserted rows, will default those bit columns to 0. I'm trying to do this via the following command:
ALTER TABLE ActivityAccumulator
ADD CONSTRAINT DF_ActivityAccumulatorIsExpired DEFAULT (0) FOR IsExpired
ALTER TABLE ActivityAccumulator
ADD CONSTRAINT DF_ActivityAccumulatorIsDeleted DEFAULT (0) FOR IsDeleted
ALTER TABLE ActivityAccumulator
ADD CONSTRAINT DF_ActivityAccumulatorIsModified DEFAULT (0) FOR IsModified
I'd eventually like to go back and clean up the existing data to put the zero value in wherever there are NULL values, but I don't really need to do so right now.
Just trying to run the first ADD CONSTRAINT command has been executing for over an hour now. Given that I'm not trying to change any existing values, why is this taking so long?

One possibility may be that you have another process on your server that's locking this table.
Imagine I have two SSMS windows open, and in the first one I execute these commands:
-- Session 1
CREATE TABLE Foo(IsTrue BIT)
INSERT INTO Foo VALUES (1),(1),(0)
BEGIN TRANSACTION
UPDATE Foo SET IsTrue = 1 - IsTrue
And then leave the SSMS window open so that the transaction never closes, trying to execute this simple constraint command in the other SSMS session will hang forever:
-- Session 2
ALTER TABLE Foo ADD CONSTRAINT FooDefault DEFAULT(0) FOR IsTrue
Note that in this example, the size or complexity of the table is irrelevant; I'm forced to wait for the transaction to complete. My alter instruction in session 2 won't complete until I release the lock on Foo either by COMMITing the transaction or closing session 1.
How can you tell if this is your problem? Have a look at the "Processes" list in the SSMS activity monitor. If your ALTER instruction is waiting for something else to complete, there'll be a number in the "Blocked By" column indicating the Session ID of the command that's causing your problem.
That session may in turn be waiting on another and so forth. If you follow these references, you eventually find a process with a 1 in the "Head Blocker" column. From there you can decide whether the appropriate action is to kill the offending process, or just wait it out.

recreate the object with all the constrains
dump the data
lock the original object
switch the object names
this way is the fastest if you want to optimize, re-index and avoid conflicts like the one mentioned by Dan

SQL Table handle large amounts of records

I need to make sure that a table of mine can handle in excess of 1,000,000 records.
Can I have some advice on my table code to determine if it can indeed handle this amount of records.
Here is my code:
USE [db_person_cdtest]
GO
SET ANSI_NULLS ON
GO
SET QUOTED_IDENTIFIER ON
GO
CREATE TABLE [Person](
[PersonID] [numeric](18, 0) IDENTITY(1,1) NOT NULL,
[ID] [varchar](20),
[FirstName] [varchar](50) NOT NULL,
[LastName] [varchar](50) NOT NULL,
[AddressLine1] [varchar](50),
[AddressLine2] [varchar](50),
[AddressLine3] [varchar](50),
[MobilePhone] [varchar](20),
[HomePhone] [varchar](20),
[Description] [varchar](10),
[DateModified] [datetime],
[PersonCategory] [varchar](30) NOT NULL,
[Comment] [varchar](max),
CONSTRAINT [PK_Person] PRIMARY KEY CLUSTERED
(
[PersonID] DESC
)WITH (IGNORE_DUP_KEY = OFF) ON [PRIMARY]
) ON [PRIMARY];

Almost any table structure in almost any database can handle a million records. That is not a large number of records for a modern computer running modern software.
Your structure looks reasonable. One question is whether the fields are always large enough to hold the value in the data. It looks like you are using SQL Server. There is no difference in storage or performance to declaring a varchar(50) versus a varchar(8000). "50" seems on the low side to me.
Another comment is that you have a DateModified column. I would suggest that you also keep a history table of the modifications. It is often important to know what changed, when it changed, and what the values were before the change.
In more advanced systems, you would not be storing a person's address and telephone number in the same table as their unique ids. A person could have more than one address (shipping address, billing address, home address, etc.). A person could have many telephone numbers (landline number, mobile number, work number, work mobile, etc.). And, you have no fields for email address, Facebook id, and so on. Contact information is more complex than a few fields in a table.
Finally, as a matter of habit, I almost always include the following fields at the end of every table:
CreatedBy varchar(255) default system_user,
CreataedAt datetime not null default getdate()
This let's me know who and when a row was created.

SQL Not Auto-incrementing

I have the following SQL I trigger in a C# app.
All works well but the ID table doesn't auto increment. It creates the value of 1 for the first entry then will not allow other inserts due to not being able to create a unquie ID.
Here is the SQL:
CREATE TABLE of_mapplist_raw (
id integer PRIMARY KEY NOT NULL,
form_name varchar(200) NOT NULL,
form_revi varchar(200) NOT NULL,
source_map varchar(200),
page_num varchar(200) NOT NULL,
fid varchar(200) NOT NULL,
fdesc varchar(200) NOT NULL
)";
I'm sure its a schoolboy error at play here.

you need to specify its seed and increment.( plus , i dont think there is integer keyword ....)
id [int] IDENTITY(1,1) NOT NULL,
the first value is the seed
the second one is the delta between increases
A Question you might ask :
delta between increases ? why do i need that ? its always 1 ....??
well - yes and no. sometimes you want to leave a gap between rows - so you can later insert rows between... specially if its clustered index by that key....and speed is important... so you can pre-design it to leave gaps.
p.s. ill be glad to hear other scenarios from watchers.

You need to mention the Identity.
id int IDENTITY(1,1) NOT NULL

How do you setup Post Revisions/History Tracking with ORM?

I am trying to figure out how to setup a revisions system for posts and other content. I figured that would mean it would need to work with a basic belongs_to/has_one/has_many/has_many_though ORM (any good ORM should support this).
I was thinking a that I could have some tables like (with matching models)
[[POST]] (has_many (text) through (revisions)
id
title
[[Revisions]] (belongs_to posts/text)
id
post_id
text_id
date
[[TEXT]]
id
body
user_id
Where I could join THROUGH the revisions table to get the latest TEXT body. But I'm kind of foggy on how it will all work. Has anyone setup something like this?
Basically, I need to be able to load an article and request the latest content entry.
// Get the post row
$post = new Model_Post($id);
// Get the latest revision (JOIN through revisions to TEXT) and print that body.
$post->text->body;
Having the ability to shuffle back in time to previous revisions and removing revisions would also be a big help.
At any rate, these are just ideas of how I think that some kind of history tracking would work. I'm open to any form of tracking I just want to know what the best-practice is.
:EDIT:
It seems that moving forward, two tables seems to make the most sense. Since I plan to store two copies of text this will also help to save space. The first table posts will store the data of the current revision for fast reads without any joins. The posts body will be the value of the matching revision's text field - but processed through markdown/bbcode/tidy/etc. This will allow me to retain the original text (for the next edit) without having to store that text twice in one revision row (or having to re-parse it each time I display it).
So fetching will be be ORM friendly. Then for creates/updates I will have to handle revisions separately and then just update the post object with the new current revision values.
CREATE TABLE IF NOT EXISTS `posts` (
`id` int(10) unsigned NOT NULL AUTO_INCREMENT,
`published` tinyint(1) unsigned DEFAULT NULL,
`allow_comments` tinyint(1) unsigned DEFAULT NULL,
`user_id` int(11) NOT NULL,
`title` varchar(100) NOT NULL,
`body` text NOT NULL,
`created` datetime NOT NULL,
PRIMARY KEY (`id`),
KEY `user_id` (`user_id`),
KEY `published` (`published`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8 ;
CREATE TABLE IF NOT EXISTS `postsrevisions` (
`id` bigint(20) unsigned NOT NULL AUTO_INCREMENT,
`post_id` int(10) unsigned NOT NULL,
`user_id` int(10) unsigned NOT NULL,
`is_current` tinyint(1) unsigned DEFAULT NULL,
`date` datetime NOT NULL,
`title` varchar(100) NOT NULL,
`text` text NOT NULL,
`image` varchar(200) NOT NULL,
PRIMARY KEY (`id`),
KEY `post_id` (`post_id`),
KEY `user_id` (`user_id`),
KEY `is_current` (`is_current`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8 ;

Your Revisions table as you have shown it models a many-to-many relationship between Posts and Text. This is probably not what you want, unless a given row in Text may provide the content for multiple rows in Posts. This is not how most CMS architectures work.
You certainly don't need three tables. I have no idea why you think this is needed for 3NF. The point of 3NF is that an attribute should not depend on a non-key attribute, it doesn't say you should split into multiple tables needlessly.
So you might only need a one-to-many relationship between two tables: Posts and Revisions. That is, for each post, there can be multiple revisions, but a given revision applies to only one post. Others have suggested two alternatives for finding the current post:
A flag column in Revisions to note the current revision. Changing the current revision is as simple as changing the flag to true in the desired revision and to false to the formerly current revision.
A foreign key in Posts to the revision that is current for the given post. This is even simpler, because you can change the current revision in one update instead of two. But circular foreign key references can cause problems vis-a-vis backup & restore, cascading updates, etc.
You could even implement the revision system using a single table:
CREATE TABLE PostRevisions (
post_revision_id SERIAL PRIMARY KEY,
post_id INT NOT NULL,
is_current TINYINT NULL,
date DATE,
title VARCHAR(80) NOT NULL,
text TEXT NOT NULL,
UNIQUE KEY (post_id, is_current)
);
I'm not sure it's duplication to store the title with each revision, because the title could be revised as much as the text, couldn't it?
The column is_current should be either 1 or NULL. A unique constraint doesn't count NULLs, so you can have only one row where is_current is 1 and an unlimited number of rows where it's NULL.
This does require updating two rows to make a revision current, but you gain some simplicity by reducing the model to a single table. This is a great advantage when you're using an ORM.
You can create a view to simplify the common case of querying current posts:
CREATE VIEW Posts AS SELECT * FROM PostRevisions WHERE is_current = 1;
update: Re your updated question: I agree that proper relational design would encourage two tables so that you could make a few attributes of a Post invariant for all that post's revisions. But most ORM tools assume an entity exists in a single table, and ORM's are clumsy at joining rows from multiple tables to constitute a given entity. So I would say if using an ORM is a priority, you should store the posts and revisions in a single table. Sacrifice a little bit of relational correctness to support the assumptions of the ORM paradigm.
Another suggestion is to consider Dimensional Modeling. This is a school of database design to support OLAP and data warehousing. It uses denormalization judiciously, so you can usually organize data in a Star Schema. The main entity (the "Fact Table") is represented by a single table, so this would be a win for an ORM-centric application design.

You'd probably be better off in this case to put a CurrentTextID on your Post table to avoid having to figure out which revision is current (an alternative would be a flag on Revision, but I think a CurrentTextID on the post will give you easier queries).
With the CurrentTextID on the Post, your ORM should place a single property (CurrentText) on your Post class which would allow you to access the current text with essentially the statement you provided.
Your ORM should also give you some way to load the Revisions based on the Post; If you want more details about that then you should include information about which ORM you are using and how you have it configured.

I think two tables would suffice here. A post table and it's revisions. If you're not worried about duplicating data, a single table (de-normalized) could also work.

For anyone interested, here is how wordpress handles revisions using a single MySQL posts table.
CREATE TABLE IF NOT EXISTS `wp_posts` (
`ID` bigint(20) unsigned NOT NULL AUTO_INCREMENT,
`post_author` bigint(20) unsigned NOT NULL DEFAULT '0',
`post_date` datetime NOT NULL DEFAULT '0000-00-00 00:00:00',
`post_date_gmt` datetime NOT NULL DEFAULT '0000-00-00 00:00:00',
`post_content` longtext NOT NULL,
`post_title` text NOT NULL,
`post_excerpt` text NOT NULL,
`post_status` varchar(20) NOT NULL DEFAULT 'publish',
`comment_status` varchar(20) NOT NULL DEFAULT 'open',
`ping_status` varchar(20) NOT NULL DEFAULT 'open',
`post_password` varchar(20) NOT NULL DEFAULT '',
`post_name` varchar(200) NOT NULL DEFAULT '',
`to_ping` text NOT NULL,
`pinged` text NOT NULL,
`post_modified` datetime NOT NULL DEFAULT '0000-00-00 00:00:00',
`post_modified_gmt` datetime NOT NULL DEFAULT '0000-00-00 00:00:00',
`post_content_filtered` text NOT NULL,
`post_parent` bigint(20) unsigned NOT NULL DEFAULT '0',
`guid` varchar(255) NOT NULL DEFAULT '',
`menu_order` int(11) NOT NULL DEFAULT '0',
`post_type` varchar(20) NOT NULL DEFAULT 'post',
`post_mime_type` varchar(100) NOT NULL DEFAULT '',
`comment_count` bigint(20) NOT NULL DEFAULT '0',
PRIMARY KEY (`ID`),
KEY `post_name` (`post_name`),
KEY `type_status_date` (`post_type`,`post_status`,`post_date`,`ID`),
KEY `post_parent` (`post_parent`)
) ENGINE=MyISAM DEFAULT CHARSET=utf8 ;

We Keep Coding

sql objective-c vba vb.net react-native apache vue.js tensorflow api pandas