How do you set up Post Revisions/History Tracking with ORM? - orm

I am trying to figure out how to set up a revisions system for posts and other content. I figured that it would need to work with a basic belongs_to/has_one/has_many/has_many_through ORM (any good ORM should support these).
I was thinking that I could have some tables like these (with matching models):
[[POST]] (has_many text through revisions)
id
title
[[Revisions]] (belongs_to posts/text)
id
post_id
text_id
date
[[TEXT]]
id
body
user_id
Where I could join THROUGH the revisions table to get the latest TEXT body. But I'm kind of foggy on how it will all work. Has anyone set up something like this?
Basically, I need to be able to load an article and request the latest content entry.
// Get the post row
$post = new Model_Post($id);
// Get the latest revision (JOIN through revisions to TEXT) and print that body.
$post->text->body;
Having the ability to shuffle back in time to previous revisions, and to remove revisions, would also be a big help.
At any rate, these are just ideas of how I think some kind of history tracking would work. I'm open to any form of tracking; I just want to know what the best practice is.
:EDIT:
It seems that moving forward, two tables make the most sense. Since I plan to store two copies of the text, this will also help to save space. The first table, posts, will store the data of the current revision for fast reads without any joins. The post's body will be the value of the matching revision's text field, but processed through markdown/bbcode/tidy/etc. This will allow me to retain the original text (for the next edit) without having to store that text twice in one revision row (or having to re-parse it each time I display it).
So fetching will be ORM-friendly. Then, for creates/updates, I will handle revisions separately and just update the post object with the new current-revision values.
CREATE TABLE IF NOT EXISTS `posts` (
`id` int(10) unsigned NOT NULL AUTO_INCREMENT,
`published` tinyint(1) unsigned DEFAULT NULL,
`allow_comments` tinyint(1) unsigned DEFAULT NULL,
`user_id` int(11) NOT NULL,
`title` varchar(100) NOT NULL,
`body` text NOT NULL,
`created` datetime NOT NULL,
PRIMARY KEY (`id`),
KEY `user_id` (`user_id`),
KEY `published` (`published`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8 ;
CREATE TABLE IF NOT EXISTS `postsrevisions` (
`id` bigint(20) unsigned NOT NULL AUTO_INCREMENT,
`post_id` int(10) unsigned NOT NULL,
`user_id` int(10) unsigned NOT NULL,
`is_current` tinyint(1) unsigned DEFAULT NULL,
`date` datetime NOT NULL,
`title` varchar(100) NOT NULL,
`text` text NOT NULL,
`image` varchar(200) NOT NULL,
PRIMARY KEY (`id`),
KEY `post_id` (`post_id`),
KEY `user_id` (`user_id`),
KEY `is_current` (`is_current`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8 ;

Your Revisions table as you have shown it models a many-to-many relationship between Posts and Text. This is probably not what you want, unless a given row in Text may provide the content for multiple rows in Posts. This is not how most CMS architectures work.
You certainly don't need three tables. I have no idea why you think this is needed for 3NF. The point of 3NF is that a non-key attribute should not depend on another non-key attribute; it doesn't say you should split into multiple tables needlessly.
So you might only need a one-to-many relationship between two tables: Posts and Revisions. That is, for each post, there can be multiple revisions, but a given revision applies to only one post. Others have suggested two alternatives for finding the current post:
A flag column in Revisions to mark the current revision. Changing the current revision is as simple as setting the flag to true on the desired revision and to false on the formerly current revision.
A foreign key in Posts to the revision that is current for the given post. This is even simpler, because you can change the current revision in one update instead of two. But circular foreign key references can cause problems vis-a-vis backup & restore, cascading updates, etc.
You could even implement the revision system using a single table:
CREATE TABLE PostRevisions (
post_revision_id SERIAL PRIMARY KEY,
post_id INT NOT NULL,
is_current TINYINT NULL,
date DATE,
title VARCHAR(80) NOT NULL,
text TEXT NOT NULL,
UNIQUE KEY (post_id, is_current)
);
I'm not sure it's duplication to store the title with each revision, because the title could be revised as much as the text, couldn't it?
The column is_current should be either 1 or NULL. A unique constraint doesn't count NULLs, so you can have only one row where is_current is 1 and an unlimited number of rows where it's NULL.
This does require updating two rows to make a revision current, but you gain some simplicity by reducing the model to a single table. This is a great advantage when you're using an ORM.
You can create a view to simplify the common case of querying current posts:
CREATE VIEW Posts AS SELECT * FROM PostRevisions WHERE is_current = 1;
update: Re your updated question: I agree that proper relational design would encourage two tables so that you could make a few attributes of a Post invariant for all that post's revisions. But most ORM tools assume an entity exists in a single table, and ORMs are clumsy at joining rows from multiple tables to constitute a given entity. So I would say that if using an ORM is a priority, you should store the posts and revisions in a single table. Sacrifice a little bit of relational correctness to support the assumptions of the ORM paradigm.
Another suggestion is to consider Dimensional Modeling. This is a school of database design to support OLAP and data warehousing. It uses denormalization judiciously, so you can usually organize data in a Star Schema. The main entity (the "Fact Table") is represented by a single table, so this would be a win for an ORM-centric application design.

You'd probably be better off in this case putting a CurrentTextID on your Post table to avoid having to figure out which revision is current (an alternative would be a flag on Revision, but I think a CurrentTextID on the post will give you easier queries).
With the CurrentTextID on the Post, your ORM should place a single property (CurrentText) on your Post class which would allow you to access the current text with essentially the statement you provided.
Your ORM should also give you some way to load the Revisions based on the Post; if you want more details about that, you should include information about which ORM you are using and how you have it configured.

I think two tables would suffice here: a posts table and its revisions. If you're not worried about duplicating data, a single (de-normalized) table could also work.

For anyone interested, here is how WordPress handles revisions using a single MySQL posts table.
CREATE TABLE IF NOT EXISTS `wp_posts` (
`ID` bigint(20) unsigned NOT NULL AUTO_INCREMENT,
`post_author` bigint(20) unsigned NOT NULL DEFAULT '0',
`post_date` datetime NOT NULL DEFAULT '0000-00-00 00:00:00',
`post_date_gmt` datetime NOT NULL DEFAULT '0000-00-00 00:00:00',
`post_content` longtext NOT NULL,
`post_title` text NOT NULL,
`post_excerpt` text NOT NULL,
`post_status` varchar(20) NOT NULL DEFAULT 'publish',
`comment_status` varchar(20) NOT NULL DEFAULT 'open',
`ping_status` varchar(20) NOT NULL DEFAULT 'open',
`post_password` varchar(20) NOT NULL DEFAULT '',
`post_name` varchar(200) NOT NULL DEFAULT '',
`to_ping` text NOT NULL,
`pinged` text NOT NULL,
`post_modified` datetime NOT NULL DEFAULT '0000-00-00 00:00:00',
`post_modified_gmt` datetime NOT NULL DEFAULT '0000-00-00 00:00:00',
`post_content_filtered` text NOT NULL,
`post_parent` bigint(20) unsigned NOT NULL DEFAULT '0',
`guid` varchar(255) NOT NULL DEFAULT '',
`menu_order` int(11) NOT NULL DEFAULT '0',
`post_type` varchar(20) NOT NULL DEFAULT 'post',
`post_mime_type` varchar(100) NOT NULL DEFAULT '',
`comment_count` bigint(20) NOT NULL DEFAULT '0',
PRIMARY KEY (`ID`),
KEY `post_name` (`post_name`),
KEY `type_status_date` (`post_type`,`post_status`,`post_date`,`ID`),
KEY `post_parent` (`post_parent`)
) ENGINE=MyISAM DEFAULT CHARSET=utf8 ;

Related

One to many to many relationship, with composite keys

In my game, an archetype is a collection of associated traits, an attack type, a damage type, and a resource type. Each piece of data is unique to each
archetype. For example, the Mage archetype might look like the following:
archetype: Mage
attack: Targeted Area Effect
damage: Shock
resource: Mana
trait_defense: Willpower
trait_offense: Intelligence
This is the archetype table in SQLite syntax:
create table archetype
(
archetype_id varchar(16) not null,
attack_id varchar(16) not null,
damage_id varchar(16) not null,
resource_type_id varchar(16) not null,
trait_defense_id varchar(16) not null,
trait_offense_id varchar(16) not null,
archetype_description varchar(128),
constraint pk_archetype primary key (archetype_id),
constraint uk_archetype unique (attack_id, damage_id,
resource_type_id,
trait_defense_id,
trait_offense_id)
);
The primary key should be the complete composite, but I do not want to pass
all the data around to other tables unless necessary. For example, there are
crafting skills associated with each archetype which do not need to know any
other archetype data.
An effect is a combat outcome that can be applied to a friend or foe. An effect has an application type (instant, overtime), a type (buff, debuff, harm, heal, etc.) and a detail describing to which stat the effect applies. It also has most of the archetype data to make each effect unique. Also included is the associated trait used for progress and skill checks. For example, an effect might look like:
apply: Instant
type: Harm
detail: Health
archetype: Mage
attack_id: Targeted Area Effect
damage_id: Shock
resource: Mana
trait_id: Intelligence
This is the effect table in SQLite syntax:
create table effect
(
effect_apply_id varchar(16) not null,
effect_type_id varchar(16) not null,
effect_detail_id varchar(16) not null,
archetype_id varchar(16) not null,
attack_id varchar(16) not null,
damage_id varchar(16) not null,
resource_type_id varchar(16) not null,
trait_id varchar(16),
constraint pk_effect primary key(archetype_id, effect_type_id,
effect_detail_id, effect_apply_id,
attack_id, damage_id, resource_type_id),
constraint fk_effect_archetype_id foreign key(archetype_id, attack_id,
damage_id, resource_type_id)
references archetype (archetype_id, attack_id,
damage_id, resource_type_id)
);
An ability is a container that can hold multiple effects. There is no limit to
the kinds of effects it can hold, e.g. having both Mage and Warrior effects in
the same ability, or even having two of the same effects, is fine. Each effect
in the ability is going to have the archetype data, and the effect data.
Again.
Ability tables in SQLite syntax:
create table ability
(
ability_id varchar(64),
ability_description varchar(128),
constraint pk_ability primary key (ability_id)
);
create table ability_effect
(
ability_effect_id integer primary key autoincrement,
ability_id varchar(64) not null,
archetype_id varchar(16) not null,
effect_type_id varchar(16) not null,
effect_detail_id varchar(16) not null,
effect_apply_id varchar(16) not null,
attack_id varchar(16) not null,
damage_id varchar(16) not null,
resource_type_id varchar(16) not null,
trait_id varchar(16),
constraint fk_ability_effect_ability_id foreign key (ability_id)
references ability (ability_id),
constraint fk_ability_effect_effect_id foreign key (archetype_id,
effect_type_id,
effect_detail_id,
effect_apply_id)
references effect (archetype_id,
effect_type_id,
effect_detail_id,
effect_apply_id)
);
This is basically a one-to-many-to-many relationship, so I needed a technical key to allow duplicate effects in the ability_effect table.
Questions:
1) Is there a better way to design these tables to avoid the duplication of
data over these three tables?
2) Should these tables be broken down further?
3) Is it better to perform multiple table lookups to collect all the data? For example, just passing around the archetype_id and doing lookups for the data when necessary (which will be often).
UPDATE:
I actually do have parent tables for attacks, damage, etc. I removed those
tables and their related indexes from the sample to make the question clean,
concise, and focused on my duplicate data issue.
I was trying to avoid each table having both an id and a name, as both would be candidate keys and having both would be wasted space. I was trying to keep the SQLite database as small as possible (hence the many varchar(16) declarations, which I now know SQLite ignores). It seems that in SQLite having both values is unavoidable, unless being twice as slow is somehow OK when using the WITHOUT ROWID option during table creation. So, I will rewrite my database to use ids and names via the rowid implementation.
Thanks for your input guys!
1) Is there a better way to design these tables to avoid the
duplication of data over these three tables?
and also
2) Should these tables be broken down further?
It would appear so.
It would appear that Mage is a unique archetype, as is Warrior (based upon "For example, the Mage archetype might look like the following:").
As such, why not make archtype_id a primary key and then reference the attack type, damage, etc. from tables of their own, i.e. have an attack table and a damage table?
So you could, for example, have something like (simplified for demonstration) :-
DROP TABLE IF EXISTS archtype;
DROP TABLE IF EXISTS attack;
DROP TABLE IF EXISTS damage;
CREATE TABLE IF NOT EXISTS attack (attack_id INTEGER PRIMARY KEY, attack_name TEXT, a_more_columns TEXT);
INSERT INTO attack (attack_name, a_more_columns) VALUES
('Targetted Affect','ta blah'), -- id 1
('AOE','aoe blah'), -- id 2
('Bounce Effect','bounce blah') -- id 3
;
CREATE TABLE IF NOT EXISTS damage (damage_id INTEGER PRIMARY KEY, damage_name TEXT, d_more_columns TEXT);
INSERT INTO damage (damage_name,d_more_columns) VALUES
('Shock','shock blah'), -- id 1
('Freeze','freeze blah'), -- id 2
('Fire','fire blah'), -- id 3
('Hit','hit blah')
;
CREATE TABLE IF NOT EXISTS archtype (id INTEGER PRIMARY KEY, archtype_name TEXT, attack_id_ref INTEGER, damage_id_ref INTEGER, at_more_columns TEXT);
INSERT INTO archtype (archtype_name,attack_id_ref,damage_id_ref,at_more_columns) VALUES
('Mage',1,1,'Mage blah'),
('Warrior',3,4,'Warrior Blah'),
('Dragon',2,3,'Dragon blah'),
('Iceman',2,2,'Iceman blah')
;
SELECT archtype_name, damage_name, attack_name FROM archtype JOIN damage ON damage_id_ref = damage_id JOIN attack ON attack_id_ref = attack_id;
Note that aliases of the rowid have been used for the ids rather than the names, as these are generally the most efficient.
The data for rowid tables is stored as a B-Tree structure containing one entry for each table row, using the rowid value as the key. This means that retrieving or sorting records by rowid is fast. Searching for a record with a specific rowid, or for all records with rowids within a specified range is around twice as fast as a similar search made by specifying any other PRIMARY KEY or indexed value. SQL As Understood By SQLite - CREATE TABLE- ROWIDs and the INTEGER PRIMARY KEY
A rowid is generated for all rows (unless WITHOUT ROWID is specified); by specifying a column as INTEGER PRIMARY KEY, that column becomes an alias of the rowid.
Beware of using AUTOINCREMENT. Unlike other RDBMSs, which use this to automatically generate unique ids for rows, SQLite by default creates a unique id (the rowid). The AUTOINCREMENT keyword adds a constraint that ensures the generated id is larger than any that has existed; to do this it requires an additional table, sqlite_sequence, that has to be maintained and interrogated, and as such it has overheads. The AUTOINCREMENT keyword imposes extra CPU, memory, disk space, and disk I/O overhead and should be avoided if not strictly needed. It is usually not needed. SQLite Autoincrement
The query at the end results in one row per archetype, pairing it with its damage and attack names: Mage | Shock | Targetted Affect, Warrior | Hit | Bounce Effect, Dragon | Fire | AOE, Iceman | Freeze | AOE.
Now say you wanted types to have multiple attacks and damages per type: the above could easily be adapted by using many-to-many relationships, introducing reference/mapping/link tables (all just different names for the same thing). Such a table has two columns (sometimes more, for data specific to the particular link): one referencing the parent (archtype) and the other referencing the child (attack/damage).
e.g. the following could be added :-
DROP TABLE IF EXISTS archtype_attack_reference;
CREATE TABLE IF NOT EXISTS archtype_attack_reference
(aar_archtype_id INTEGER NOT NULL, aar_attack_id INTEGER NOT NULL,
PRIMARY KEY(aar_archtype_id,aar_attack_id))
WITHOUT ROWID;
DROP TABLE IF EXISTS archtype_damage_reference;
CREATE TABLE IF NOT EXISTS archtype_damage_reference
(adr_archtype_id INTEGER NOT NULL, adr_damage_id INTEGER NOT NULL,
PRIMARY KEY(adr_archtype_id,adr_damage_id))
WITHOUT ROWID
;
INSERT INTO archtype_attack_reference VALUES
(1,1), -- Mage has attack Targetted
(1,3), -- Mage has attack Bounce
(3,2), -- Dragon has attack AOE
(2,1), -- Warrior has attack targetted
(2,2), -- Warrior has attack AOE
(4,2), -- Iceman has attack AOE
(4,3) -- Iceman has attack Bounce
;
INSERT INTO archtype_damage_reference VALUES
(1,1),(1,3), -- Mage can damage with Shock and Fire
(2,4), -- Warrior can damage with Hit
(3,3),(3,4), -- Dragon can damage with Fire and Hit
(4,2),(4,4) -- Iceman can damage with Freeze and Hit
;
SELECT archtype_name, attack_name,damage_name FROM archtype
JOIN archtype_attack_reference ON archtype_id = aar_archtype_id
JOIN archtype_damage_reference ON archtype_id = adr_archtype_id
JOIN attack ON aar_attack_id = attack_id
JOIN damage ON adr_damage_id = damage_id
;
The query results in every attack/damage combination for each archetype: twelve rows for the sample data (Mage alone gets four, one for each pairing of his two attacks with his two damage types).
With a slight change the above query could even be used to perform a random attack e.g. :-
SELECT archtype_name, attack_name,damage_name FROM archtype
JOIN archtype_attack_reference ON archtype_id = aar_archtype_id
JOIN archtype_damage_reference ON archtype_id = adr_archtype_id
JOIN attack ON aar_attack_id = attack_id
JOIN damage ON adr_damage_id = damage_id
ORDER BY random() LIMIT 1 -- ADDED THIS LINE
;
You could get a single random combination such as Mage | Targetted Affect | Shock; another time you might get, say, Iceman | Bounce Effect | Freeze.
3) Is it better to perform multiple table lookups to collect all the
data? For example, just passing around the archetype_id and doing
lookups for the data when necessary (which will be often).
That's pretty hard to say. You may initially think it best to gather all the data once and keep it in memory, say as an object. However, at times the underlying data may well already be in memory due to caching, so perhaps it would be better to use a mix of both. I believe the answer is that you will need to test the various scenarios.
I would probably avoid those composite primary keys and use the more commonly used integer with an autoincrement. Then add the unique or non-unique composite indexes where needed.
Although IMHO it's not always a bad idea to use a short CHAR or VARCHAR as the primary key in some cases, mostly when easy-to-understand abbreviations can be used.
An example: suppose you have a reference table for countries, with a primary key on the 2-character CountryCode. Then when querying a table with a foreign key on that CountryCode, it's far easier for the human mind to understand 'US' than some integer. Even without joining to Countries you'll probably know which country is referenced.
So here are your tables with a slightly different layout.
create table archetype
(
archetype_id integer primary key autoincrement,
attack_id varchar(16) not null,
damage_id varchar(16) not null,
resource_type_id varchar(16) not null,
trait_defense_id varchar(16) not null,
trait_offense_id varchar(16) not null,
archetype_description varchar(128),
constraint uk_archetype unique (attack_id, damage_id,
resource_type_id,
trait_defense_id,
trait_offense_id)
);
create table effect
(
effect_id integer primary key autoincrement,
archetype_id integer not null, -- FK archetype
effect_apply_id varchar(16) not null,
effect_type_id varchar(16) not null,
effect_detail_id varchar(16) not null,
attack_id varchar(16) not null,
damage_id varchar(16) not null,
resource_type_id varchar(16) not null,
trait_id varchar(16),
constraint pk_effect unique(archetype_id, effect_type_id,
effect_detail_id, effect_apply_id,
attack_id, damage_id, resource_type_id),
constraint fk_effect_archetype_id foreign key(archetype_id)
references archetype (archetype_id)
);
create table ability
(
ability_id integer primary key autoincrement,
ability_description varchar(128)
);
create table ability_effect
(
ability_effect_id integer primary key autoincrement,
ability_id integer not null, -- FK ability
effect_id integer not null, -- FK effect
attack_id varchar(16) not null,
damage_id varchar(16) not null,
resource_type_id varchar(16) not null,
trait_id varchar(16),
constraint fk_ability_effect_ability_id foreign key (ability_id)
references ability (ability_id),
constraint fk_ability_effect_effect_id foreign key (effect_id)
references effect (effect_id)
);

Doctrine: Models with column_aggregation inheritance appear twice in SQL

Has anyone noticed this?
Whenever a model uses column_aggregation (inheritance), the generated schema.sql has two CREATE TABLE commands for the same table: one creates the basic table, and the other (apart from the fields) also adds an index on the inheritance column.
CREATE TABLE Prop (id INT AUTO_INCREMENT, opt_property_type SMALLINT UNSIGNED DEFAULT 251 NOT NULL, property_nature VARCHAR(255), INDEX opt_property_type_idx (opt_property_type), PRIMARY KEY(id)) DEFAULT CHARACTER SET utf8 COLLATE utf8_unicode_ci ENGINE = InnoDB;
CREATE TABLE Prop (id INT AUTO_INCREMENT, opt_property_type SMALLINT UNSIGNED DEFAULT 251 NOT NULL, property_nature VARCHAR(255), INDEX Prop_property_nature_idx (property_nature), INDEX opt_property_type_idx (opt_property_type), PRIMARY KEY(id)) DEFAULT CHARACTER SET utf8 COLLATE utf8_unicode_ci ENGINE = InnoDB;
Note the inclusion of INDEX Prop_property_nature_idx (property_nature) in the second statement.
If anyone else is facing this, I will log a bug. Thanks
I just came across this myself. It seems like doctrine:build-sql is buggy.
One of the crazier things I discovered while investigating this is that doctrine:insert-sql doesn't even use schema.sql. It dynamically generates and runs the SQL based on the model definitions.
Looks like this is a known bug and won't be fixed in Doctrine 1:
http://www.doctrine-project.org/jira/browse/DC-123
http://www.doctrine-project.org/jira/browse/DC-536

Database design for email messaging system

I want to build an email messaging system like Gmail has. I would like to have the following options: Starred, Trash, Spam, Draft, Read, Unread. Right now I have the following structure in my database:
CREATE TABLE [MyInbox](
[InboxID] [int] IDENTITY(1,1) NOT NULL,
[FromUserID] [int] NOT NULL,
[ToUserID] [int] NOT NULL,
[Created] [datetime] NOT NULL,
[Subject] [nvarchar](255) COLLATE SQL_Latin1_General_CP1_CI_AS NOT NULL,
[Body] [nvarchar](max) COLLATE SQL_Latin1_General_CP1_CI_AS NOT NULL,
[IsRead] [bit] NOT NULL,
[IsReceived] [bit] NOT NULL,
[IsSent] [bit] NOT NULL,
[IsStar] [bit] NOT NULL CONSTRAINT [DF_MyInbox_IsStarred] DEFAULT ((0)),
[IsTrash] [bit] NOT NULL CONSTRAINT [DF_MyInbox_IsTrashed] DEFAULT ((0)),
[IsDraft] [bit] NOT NULL CONSTRAINT [DF_MyInbox_Isdrafted] DEFAULT ((0))
) ON [PRIMARY]
But I am facing some issues with this structure. Right now, if user A sends a message to user B, I store a single row in this table. But if user B deletes that message, it gets deleted from user A's sent messages too. This is wrong; I want it to behave exactly as a normal email system does: if A deletes a message from his sent items, it should not be deleted from B's inbox. I am also thinking of another problem: suppose user A sends a mail to 500 users at once. As per my design I will have 500 rows with duplicate bodies, which is not a memory-efficient way to store it. Could you guys please help me with the design for a messaging system?
You need to split your table up. You could have the following schema and structure:
CREATE TABLE [Users]
(
[UserID] INT ,
[UserName] NVARCHAR(50) ,
[FirstName] NVARCHAR(50) ,
[LastName] NVARCHAR(50)
)
CREATE TABLE [Messages]
(
[MessageID] INT ,
[Subject] NVARCHAR(MAX) ,
[Body] NVARCHAR(MAX) ,
[Date] DATETIME,
[AuthorID] INT,
)
CREATE TABLE [MessagePlaceHolders]
(
[PlaceHolderID] INT ,
[PlaceHolder] NVARCHAR(255)--For example: InBox, SentItems, Draft, Trash, Spam
)
CREATE TABLE [Users_Messages_Mapped]
(
[MessageID] INT ,
[UserID] INT ,
[PlaceHolderID] INT,
[IsRead] BIT ,
[IsStarred] BIT
)
The Users table holds the users. Messages holds the messages themselves. MessagePlaceHolders holds the placeholders for messages: for example Inbox, Sent Items, Draft, Trash, Spam. Users_Messages_Mapped is the mapping table between users and messages; UserID and PlaceHolderID are foreign keys, and IsRead and IsStarred signify what their names stand for.
If no record is found for a particular MessageID in the Users_Messages_Mapped table, the message can be deleted from the Messages table, since it is no longer needed.
If you're doing document-oriented work, I suggest taking a look at CouchDB. It is schema-less, meaning issues like this disappear.
Let's take a look at the example: A sends a message to B, and it's deleted by B.
You would have a single instance of the document, with the recipients listed as an attribute of the email. As users delete the message, you either remove them from the recipients list or add them to a deleted_by list, whichever you choose.
It's a much different approach to data than what you're used to, but it may be well worth taking some time to consider.
I think you need to decompose your schema some more. Store emails separately, and map inboxes to the messages they contain.
If I were you, I would set two flags, one for the sender and one for the receiver. If both flags are true, the message can be deleted from the database; otherwise, keep it in the database but hide it from whoever deleted it.
Do the same thing for trash. You may want to run a cron job (or check manually) so that once both sender and receiver have deleted a message, it is removed from the database.
A message can only be in one folder at a time, so you want a folders table (containing folders 'Trash', 'Inbox', 'Archive', etc.) and a foreign key from messages to folders.
For labels, you have a many-to-many relation, so you need a labels table and also a link table (messages_labels).
For starring, a simple bit column should do, same for 'unread'.
CREATE TABLE `mails` (
`message_id` bigint(20) unsigned NOT NULL AUTO_INCREMENT,
`message` varchar(10000) NOT NULL DEFAULT '',
`file` longblob,
`mailingdate` varchar(40) DEFAULT NULL,
`starred_status` int(10) unsigned NOT NULL DEFAULT '0',
`sender_email` varchar(200) NOT NULL DEFAULT '',
`reciever_email` varchar(200) NOT NULL DEFAULT '',
`inbox_status` int(10) unsigned NOT NULL DEFAULT '0',
`sent_status` int(10) unsigned NOT NULL DEFAULT '0',
`draft_status` int(10) unsigned NOT NULL DEFAULT '0',
`trash_status` int(10) unsigned NOT NULL DEFAULT '0',
`subject` varchar(200) DEFAULT NULL,
`read_status` int(10) unsigned NOT NULL DEFAULT '0',
`delete_status` int(10) unsigned NOT NULL DEFAULT '0',
PRIMARY KEY (`message_id`)
)
You can use this table for storing the mails and adapt the queries to the different mail boxes. I am leaving out the rest of the tables, such as the user details and login details tables; you can create them according to your needs.
You could create a table for MessageContacts which joins each message to the people who have it in their mailboxes. When a user deletes a message then a row gets deleted from MessageContacts but the original message is preserved.
You could do that... but I suggest you don't. Unless it's an academic exercise set by your tutor, it is surely a complete waste of time to develop your own messaging system. If it is homework then you ought to say so. If not, then go do something more useful instead.
WHY DELETE? I think there is no need to delete anything; just hide messages from users who have deleted them. Checking both sides becomes a real problem when a sender sends the same message to many recipients: you would have to check and flag all recipients, and only delete once all of them have.
In my structure, I set a "deleted" boolean flag and, depending on its value, show or hide the message.

SQL Query always uses filesort in order by clause

I am trying to optimize an SQL query which uses an ORDER BY clause. When I use EXPLAIN, the query always displays "Using filesort". I am using this query in a group discussion forum where users attach tags to posts.
Here are the 3 tables I am using: users, user_tag, tags
user_tag is the association mapping table for users and their tags.
CREATE TABLE `usertable` (
`user_id` int(11) unsigned NOT NULL AUTO_INCREMENT,
`user_name` varchar(20) CHARACTER SET utf8 COLLATE utf8_bin NOT NULL,
PRIMARY KEY (`user_name`),
KEY `user_id` (`user_id`)
) ENGINE=MyISAM DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci;
CREATE TABLE `user_tag` (
`id` int(11) unsigned NOT NULL AUTO_INCREMENT,
`user_id` int(11) unsigned NOT NULL,
`tag_id` int(11) unsigned NOT NULL,
`usage_count` int(11) unsigned NOT NULL,
PRIMARY KEY (`id`),
KEY `tag_id` (`tag_id`),
KEY `usage_count` (`usage_count`),
KEY `user_id` (`user_id`)
) ENGINE=MyISAM DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci;
I update the usage_count on the server side programmatically. Here is the query that's giving me a problem; it is meant to find the tag_id and usage_count for a particular username, sorted by usage count in descending order:
select user_tag.tag_id, user_tag.usage_count
from user_tag inner join usertable on usertable.user_id = user_tag.user_id
where user_name="abc" order by usage_count DESC;
Here is the explain output:
mysql> explain select user_tag.tag_id, user_tag.usage_count
    -> from user_tag inner join usertable on user_tag.user_id = usertable.user_id
    -> where user_name="abc" order by user_tag.usage_count desc;
(The EXPLAIN output itself is not reproduced here; its Extra column shows "Using filesort".)
What should I change to get rid of that "Using filesort"?
I'm rather rusty with this, but here goes.
The key used to fetch the rows is not the same as the one used in the ORDER BY:
http://dev.mysql.com/doc/refman/5.1/en/order-by-optimization.html
As mentioned by OMG Ponies, an index on user_id, usage_count may resolve the filesort.
KEY `user_id_usage_count` (`user_id`,`usage_count`)
"Using filesort" is not necessarily bad; in many cases it doesn't actually matter.
Also, its name is somewhat confusing. The filesort() function does not necessarily use temporary files to perform the sort. For small data sets, the data are sorted in memory which is pretty fast.
Unless you think it's a specific problem (for example, after profiling your application on production-grade hardware in the lab, removing the ORDER BY solves a specific performance issue), or your data set is large, you should probably not worry about it.

Large MyISAM table slow even for non-concurrent inserts/updates

I have a MyISAM table with ~50'000'000 records (tasks for web crawler):
CREATE TABLE `tasks2` (
`id` int(11) NOT NULL auto_increment,
`url` varchar(760) character set latin1 NOT NULL,
`state` varchar(10) collate utf8_bin default NULL,
`links_depth` int(11) NOT NULL,
`sites_depth` int(11) NOT NULL,
`error_text` text character set latin1,
`parent` int(11) default NULL,
`seed` int(11) NOT NULL,
`random` int(11) NOT NULL default '0',
PRIMARY KEY (`id`),
UNIQUE KEY `URL_UNIQUE` (`url`),
KEY `next_random_task` (`state`,`random`)
) ENGINE=MyISAM AUTO_INCREMENT=61211954 DEFAULT CHARSET=utf8 COLLATE=utf8_bin
Once every few seconds one of the following operations occur (but never simultaneously):
INSERT ... VALUES (500 rows) - inserts new tasks
UPDATE ... WHERE id IN (up to 10 ids) - updates state for batch of tasks
SELECT ... WHERE (by next_random_task index) - loads batch of tasks for processing
My problem is that inserts and updates are very slow, running on the order of tens of seconds, sometimes over a minute. Selects are fast, though. Why could this be happening, and how can I improve performance?
~50M records on regular hardware is a decent number.
Please go through this question on SF (even though it is written for InnoDB, there are similar parameters for MyISAM).
After that you should start the cycle of
identifying (logging) slow queries to understand your patterns (or confirm your assumptions)
tweaking my.cnf or adding/removing indexes (depending on the patterns)
measuring improvements
EXPLAIN a sample UPDATE against the full table to ensure the primary key index is being used.
Consider changing state to a TINYINT or ENUM to make its index smaller. (ENUM might not actually do this).
Do you need the unique key on url? This will slow down inserts.