In my game, an archetype is a collection of associated traits, an attack type, a damage type, and a resource type. Each piece of data is unique to each
archetype. For example, the Mage archetype might look like the following:
archetype: Mage
attack: Targeted Area Effect
damage: Shock
resource: Mana
trait_defense: Willpower
trait_offense: Intelligence
This is the archetype table in SQLite syntax:
create table archetype
(
archetype_id varchar(16) not null,
attack_id varchar(16) not null,
damage_id varchar(16) not null,
resource_type_id varchar(16) not null,
trait_defense_id varchar(16) not null,
trait_offense_id varchar(16) not null,
archetype_description varchar(128),
constraint pk_archetype primary key (archetype_id),
constraint uk_archetype unique (attack_id, damage_id,
resource_type_id,
trait_defense_id,
trait_offense_id)
);
The primary key should be the complete composite, but I do not want to pass
all the data around to other tables unless necessary. For example, there are
crafting skills associated with each archetype which do not need to know any
other archetype data.
An effect is a combat outcome that can be applied to a friend or foe. An effect has an application type (instant, over time), a type (buff, debuff, harm, heal, etc.) and a detail describing which stat the effect applies to. It also carries most of the archetype data, to make each effect unique, plus the associated trait used for progress and skill checks. For example, an effect might look like:
apply: Instant
type: Harm
detail: Health
archetype: Mage
attack_id: Targeted Area Effect
damage_id: Shock
resource: Mana
trait_id: Intelligence
This is the effect table in SQLite syntax:
create table effect
(
effect_apply_id varchar(16) not null,
effect_type_id varchar(16) not null,
effect_detail_id varchar(16) not null,
archetype_id varchar(16) not null,
attack_id varchar(16) not null,
damage_id varchar(16) not null,
resource_type_id varchar(16) not null,
trait_id varchar(16),
constraint pk_effect primary key(archetype_id, effect_type_id,
effect_detail_id, effect_apply_id,
attack_id, damage_id, resource_type_id),
constraint fk_effect_archetype_id foreign key(archetype_id, attack_id,
damage_id, resource_type_id)
references archetype (archetype_id, attack_id,
damage_id, resource_type_id)
);
An ability is a container that can hold multiple effects. There is no limit to
the kinds of effects it can hold, e.g. having both Mage and Warrior effects in
the same ability, or even having two of the same effects, is fine. Each effect
in the ability is going to have the archetype data, and the effect data.
Again.
Ability tables in SQLite syntax:
create table ability
(
ability_id varchar(64),
ability_description varchar(128),
constraint pk_ability primary key (ability_id)
);
create table ability_effect
(
ability_effect_id integer primary key autoincrement,
ability_id varchar(64) not null,
archetype_id varchar(16) not null,
effect_type_id varchar(16) not null,
effect_detail_id varchar(16) not null,
effect_apply_id varchar(16) not null,
attack_id varchar(16) not null,
damage_id varchar(16) not null,
resource_type_id varchar(16) not null,
trait_id varchar(16),
constraint fk_ability_effect_ability_id foreign key (ability_id)
references ability (ability_id),
constraint fk_ability_effect_effect_id foreign key (archetype_id,
effect_type_id,
effect_detail_id,
effect_apply_id)
references effect (archetype_id,
effect_type_id,
effect_detail_id,
effect_apply_id)
);
This is basically a one to many to many relationship, so I needed a technical
key to have duplicate effects in the ability_effect table.
Questions:
1) Is there a better way to design these tables to avoid the duplication of
data over these three tables?
2) Should these tables be broken down further?
3) Is it better to perform multiple table lookups to collect all the data? For example, just passing around the archetype_id and doing lookups for the data when necessary (which will be often).
UPDATE:
I actually do have parent tables for attacks, damage, etc. I removed those
tables and their related indexes from the sample to make the question clean,
concise, and focused on my duplicate data issue.
I was trying to avoid each table having both an id and a name, as both would be candidate keys and so having both would be wasted space. I was trying to keep the SQLite database as small as possible. (Hence, the many "varchar(16)"
declarations, which I now know SQLite ignores.) It seems in SQLite having both
values is unavoidable, unless being twice as slow is somehow ok when using the
WITHOUT ROWID option during table creation. So, I will rewrite my database to
use ids and names via the rowid implementation.
Thanks for your input guys!
1) Is there a better way to design these tables to avoid the
duplication of data over these three tables?
and also
2) Should these tables be broken down further?
It would appear so.
It would appear that Mage is a unique archetype, as is Warrior (based upon "For example, the Mage archetype might look like the following:").
As such, why not make the archetype id the primary key and then reference the attack type, damage type, etc. from their own tables, i.e. have an attack table and a damage table?
So you could, for example, have something like this (simplified for demonstration) :-
DROP TABLE IF EXISTS archtype;
DROP TABLE IF EXISTS attack;
DROP TABLE IF EXISTS damage;
CREATE TABLE IF NOT EXISTS attack (attack_id INTEGER PRIMARY KEY, attack_name TEXT, a_more_columns TEXT);
INSERT INTO attack (attack_name, a_more_columns) VALUES
('Targetted Affect','ta blah'), -- id 1
('AOE','aoe blah'), -- id 2
('Bounce Effect','bounce blah') -- id 3
;
CREATE TABLE IF NOT EXISTS damage (damage_id INTEGER PRIMARY KEY, damage_name TEXT, d_more_columns TEXT);
INSERT INTO damage (damage_name,d_more_columns) VALUES
('Shock','shock blah'), -- id 1
('Freeze','freeze blah'), -- id 2
('Fire','fire blah'), -- id 3
('Hit','hit blah') -- id 4
;
CREATE TABLE IF NOT EXISTS archtype (id INTEGER PRIMARY KEY, archtype_name TEXT, attack_id_ref INTEGER, damage_id_ref INTEGER, at_more_columns TEXT);
INSERT INTO archtype (archtype_name,attack_id_ref,damage_id_ref,at_more_columns) VALUES
('Mage',1,1,'Mage blah'),
('Warrior',3,4,'Warrior Blah'),
('Dragon',2,3,'Dragon blah'),
('Iceman',2,2,'Iceman blah')
;
SELECT archtype_name, damage_name, attack_name FROM archtype JOIN damage ON damage_id_ref = damage_id JOIN attack ON attack_id_ref = attack_id;
Note that aliases of the rowid have been used for the ids, rather than the names, as these are generally the most efficient:
"The data for rowid tables is stored as a B-Tree structure containing one entry for each table row, using the rowid value as the key. This means that retrieving or sorting records by rowid is fast. Searching for a record with a specific rowid, or for all records with rowids within a specified range is around twice as fast as a similar search made by specifying any other PRIMARY KEY or indexed value." (SQL As Understood By SQLite - CREATE TABLE - ROWIDs and the INTEGER PRIMARY KEY)
A rowid is generated for every row (unless WITHOUT ROWID is specified). Declaring a column as INTEGER PRIMARY KEY makes that column an alias of the rowid.
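As a quick check of the alias behaviour, here is a minimal sketch using Python's built-in sqlite3 module. The table is a cut-down version of the attack table above, and the row values are only illustrative.

```python
import sqlite3

# In-memory database; the attack table mirrors the simplified example above.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE attack (attack_id INTEGER PRIMARY KEY, attack_name TEXT)")
conn.executemany(
    "INSERT INTO attack (attack_name) VALUES (?)",
    [("Targeted",), ("AOE",), ("Bounce",)],
)

# Select the hidden rowid next to the declared key: the two columns hold the same value.
rows = conn.execute(
    "SELECT rowid, attack_id, attack_name FROM attack ORDER BY rowid"
).fetchall()
for rowid, attack_id, name in rows:
    print(rowid, attack_id, name)
```

No id was supplied on insert, yet attack_id is populated, because it is simply another name for the rowid.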
Beware of using AUTOINCREMENT. Unlike other RDBMSs, which use it to generate unique ids for rows automatically, SQLite creates a unique id (the rowid) by default. The AUTOINCREMENT keyword adds a constraint ensuring that the generated id is larger than any id previously used in the table. This requires an additional table, sqlite_sequence, which has to be maintained and interrogated, and as such has overheads: "The AUTOINCREMENT keyword imposes extra CPU, memory, disk space, and disk I/O overhead and should be avoided if not strictly needed. It is usually not needed." (SQLite Autoincrement)
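The difference can be seen with a small experiment. This is a sketch using Python's sqlite3 module; the table t and the helper name next_id_after_deleting_max are made up for the demonstration.

```python
import sqlite3

def next_id_after_deleting_max(ddl):
    """Insert three rows, delete the one with the highest id, insert again.

    Returns the id given to the new row and whether sqlite_sequence exists.
    """
    conn = sqlite3.connect(":memory:")
    conn.execute(ddl)
    conn.executemany("INSERT INTO t (name) VALUES (?)", [("a",), ("b",), ("c",)])
    conn.execute("DELETE FROM t WHERE id = 3")
    cur = conn.execute("INSERT INTO t (name) VALUES ('d')")
    has_sequence_table = conn.execute(
        "SELECT count(*) FROM sqlite_master WHERE name = 'sqlite_sequence'"
    ).fetchone()[0] == 1
    return cur.lastrowid, has_sequence_table

# Plain rowid alias: the freed id can be handed out again, no bookkeeping table.
plain = next_id_after_deleting_max(
    "CREATE TABLE t (id INTEGER PRIMARY KEY, name TEXT)"
)
# AUTOINCREMENT: ids are never reused, tracked through sqlite_sequence.
auto = next_id_after_deleting_max(
    "CREATE TABLE t (id INTEGER PRIMARY KEY AUTOINCREMENT, name TEXT)"
)
print(plain, auto)
```

So AUTOINCREMENT is only worth its overhead if an id must never be reused after its row has been deleted.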
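The query at the end returns one row per archetype, giving its name together with its damage and attack names (for example Mage, Shock, Targetted Affect).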
Now say you wanted archetypes to have multiple attacks and damages each. The above could easily be adapted to many-to-many relationships by introducing reference/mapping/link tables (all just different names for the same thing). Such a table has two columns (sometimes more, for data specific to the individual reference/map/link): one referencing the parent (archtype) and the other referencing the child (attack/damage).
E.g. the following could be added :-
DROP TABLE IF EXISTS archtype_attack_reference;
CREATE TABLE IF NOT EXISTS archtype_attack_reference
(aar_archtype_id INTEGER NOT NULL, aar_attack_id INTEGER NOT NULL,
PRIMARY KEY(aar_archtype_id,aar_attack_id))
WITHOUT ROWID;
DROP TABLE IF EXISTS archtype_damage_reference;
CREATE TABLE IF NOT EXISTS archtype_damage_reference
(adr_archtype_id INTEGER NOT NULL, adr_damage_id INTEGER NOT NULL,
PRIMARY KEY(adr_archtype_id,adr_damage_id))
WITHOUT ROWID
;
INSERT INTO archtype_attack_reference VALUES
(1,1), -- Mage has attack Targetted
(1,3), -- Mage has attack Bounce
(3,2), -- Dragon has attack AOE
(2,1), -- Warrior has attack targetted
(2,2), -- Warrior has attack AOE
(4,2), -- Iceman has attack AOE
(4,3) -- Iceman has attack Bounce
;
INSERT INTO archtype_damage_reference VALUES
(1,1),(1,3), -- Mage can damage with Shock and Fire
(2,4), -- Warrior can damage with Hit
(3,3),(3,4), -- Dragon can damage with Fire and Hit
(4,2),(4,4) -- Iceman can damage with Freeze and Hit
;
SELECT archtype_name, attack_name,damage_name FROM archtype
JOIN archtype_attack_reference ON archtype.id = aar_archtype_id
JOIN archtype_damage_reference ON archtype.id = adr_archtype_id
JOIN attack ON aar_attack_id = attack_id
JOIN damage ON adr_damage_id = damage_id
;
The query returns one row for every attack/damage combination available to each archetype (12 rows for the data above).
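Because the two link tables are joined independently, each archetype produces every attack/damage pairing. If one row per archetype is wanted instead, one option is to aggregate each side in its own subquery. The following is a sketch in Python's sqlite3 using trimmed-down copies of the tables above; the use of group_concat and the column aliases attacks and damages are my own choice, not part of the original design.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE attack (attack_id INTEGER PRIMARY KEY, attack_name TEXT);
INSERT INTO attack (attack_name) VALUES ('Targeted'), ('AOE'), ('Bounce');
CREATE TABLE damage (damage_id INTEGER PRIMARY KEY, damage_name TEXT);
INSERT INTO damage (damage_name) VALUES ('Shock'), ('Freeze'), ('Fire'), ('Hit');
CREATE TABLE archtype (id INTEGER PRIMARY KEY, archtype_name TEXT);
INSERT INTO archtype (archtype_name) VALUES ('Mage'), ('Warrior'), ('Dragon'), ('Iceman');
CREATE TABLE archtype_attack_reference (
    aar_archtype_id INTEGER NOT NULL, aar_attack_id INTEGER NOT NULL,
    PRIMARY KEY (aar_archtype_id, aar_attack_id)) WITHOUT ROWID;
CREATE TABLE archtype_damage_reference (
    adr_archtype_id INTEGER NOT NULL, adr_damage_id INTEGER NOT NULL,
    PRIMARY KEY (adr_archtype_id, adr_damage_id)) WITHOUT ROWID;
INSERT INTO archtype_attack_reference VALUES (1,1),(1,3),(3,2),(2,1),(2,2),(4,2),(4,3);
INSERT INTO archtype_damage_reference VALUES (1,1),(1,3),(2,4),(3,3),(3,4),(4,2),(4,4);
""")

# Joining both link tables at once: one row per attack/damage combination.
combos = conn.execute("""
    SELECT archtype_name, attack_name, damage_name FROM archtype
    JOIN archtype_attack_reference ON archtype.id = aar_archtype_id
    JOIN archtype_damage_reference ON archtype.id = adr_archtype_id
    JOIN attack ON aar_attack_id = attack_id
    JOIN damage ON adr_damage_id = damage_id
""").fetchall()

# Aggregating each side separately: one row per archetype.
summary = conn.execute("""
    SELECT archtype_name,
           (SELECT group_concat(attack_name, ', ')
              FROM archtype_attack_reference
              JOIN attack ON aar_attack_id = attack_id
             WHERE aar_archtype_id = archtype.id) AS attacks,
           (SELECT group_concat(damage_name, ', ')
              FROM archtype_damage_reference
              JOIN damage ON adr_damage_id = damage_id
             WHERE adr_archtype_id = archtype.id) AS damages
      FROM archtype
     ORDER BY archtype.id
""").fetchall()

print(len(combos), "combination rows")
for row in summary:
    print(row)
```

The order of names inside each concatenated list is not guaranteed, so it should not be relied upon.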
With a slight change the above query could even be used to perform a random attack e.g. :-
SELECT archtype_name, attack_name,damage_name FROM archtype
JOIN archtype_attack_reference ON archtype.id = aar_archtype_id
JOIN archtype_damage_reference ON archtype.id = adr_archtype_id
JOIN attack ON aar_attack_id = attack_id
JOIN damage ON adr_damage_id = damage_id
ORDER BY random() LIMIT 1 -- ADDED THIS LINE
;
Each run returns a single, randomly chosen archetype/attack/damage combination, so you may get a different row every time.
3) Is it better to perform multiple table lookups to collect all the
data? For example, just passing around the archetype_id and doing
lookups for the data when necessary (which will be often).
That's pretty hard to say. You may initially think it best to gather all the data once and keep it in memory, say as an object. However, the underlying data may well already be in memory because it has been cached. Perhaps a mix of both approaches would work best. So I believe the answer is that you will need to test various scenarios.
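One scenario worth including in such a test is passing only the id around and memoising the lookup in the application. Below is a rough sketch with Python's sqlite3 and functools.lru_cache. The flattened archetype table, the load_archetype helper and the query_count counter are all hypothetical and exist only to show the idea.

```python
import sqlite3
from functools import lru_cache

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE archetype (
    archetype_id   INTEGER PRIMARY KEY,
    archetype_name TEXT,
    attack_name    TEXT,
    damage_name    TEXT
);
INSERT INTO archetype (archetype_name, attack_name, damage_name)
VALUES ('Mage', 'Targeted Area Effect', 'Shock');
""")

query_count = 0  # counts how often the database is actually queried

@lru_cache(maxsize=None)
def load_archetype(archetype_id):
    """Look up an archetype by id; repeated calls are served from the cache."""
    global query_count
    query_count += 1
    return conn.execute(
        "SELECT archetype_name, attack_name, damage_name "
        "FROM archetype WHERE archetype_id = ?",
        (archetype_id,),
    ).fetchone()

first = load_archetype(1)
second = load_archetype(1)  # no second query is issued
print(first, query_count)
```

If archetype rows can change while the game is running, the cache has to be invalidated (load_archetype.cache_clear()), which is part of what makes "test it" the honest answer.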
I would probably avoid those composite primary keys and use the more common integer with an autoincrement instead.
Then add unique or non-unique composite indexes where needed.
Although, IMHO, it's not always a bad idea to use a short CHAR or VARCHAR as the primary key. This mostly applies when easy-to-understand abbreviations can be used.
An example: suppose you have a reference table for Countries, with a primary key on the 2-character CountryCode. When querying a table with a foreign key on that CountryCode, 'US' is far easier for the human mind to understand than some integer. Even without joining to Countries, you'll probably know which country is referenced.
So here are your tables with a slightly different layout.
create table archetype
(
archetype_id integer primary key autoincrement,
attack_id varchar(16) not null,
damage_id varchar(16) not null,
resource_type_id varchar(16) not null,
trait_defense_id varchar(16) not null,
trait_offense_id varchar(16) not null,
archetype_description varchar(128),
constraint uk_archetype unique (attack_id, damage_id,
resource_type_id,
trait_defense_id,
trait_offense_id)
);
create table effect
(
effect_id integer primary key autoincrement,
archetype_id integer not null, -- FK archetype
effect_apply_id varchar(16) not null,
effect_type_id varchar(16) not null,
effect_detail_id varchar(16) not null,
attack_id varchar(16) not null,
damage_id varchar(16) not null,
resource_type_id varchar(16) not null,
trait_id varchar(16),
constraint pk_effect unique(archetype_id, effect_type_id,
effect_detail_id, effect_apply_id,
attack_id, damage_id, resource_type_id),
constraint fk_effect_archetype_id foreign key(archetype_id)
references archetype (archetype_id)
);
create table ability
(
ability_id integer primary key autoincrement,
ability_description varchar(128)
);
create table ability_effect
(
ability_effect_id integer primary key autoincrement,
ability_id integer not null, -- FK ability
effect_id integer not null, -- FK effect
attack_id varchar(16) not null,
damage_id varchar(16) not null,
resource_type_id varchar(16) not null,
trait_id varchar(16),
constraint fk_ability_effect_ability_id foreign key (ability_id)
references ability (ability_id),
constraint fk_ability_effect_effect_id foreign key (effect_id)
references effect (effect_id)
);
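Going one step further than the layout above, the columns that ability_effect and effect repeat from their parents could be dropped altogether and recovered with joins. The following is only a sketch of that variation, run through Python's sqlite3. The archetype_name column and the sample rows are assumptions, and plain INTEGER PRIMARY KEY is used instead of autoincrement to keep it short.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE archetype (
    archetype_id     INTEGER PRIMARY KEY,
    archetype_name   TEXT NOT NULL UNIQUE,   -- assumed column, not in the original
    attack_id        TEXT NOT NULL,
    damage_id        TEXT NOT NULL,
    resource_type_id TEXT NOT NULL
);
CREATE TABLE effect (
    effect_id        INTEGER PRIMARY KEY,
    archetype_id     INTEGER NOT NULL REFERENCES archetype (archetype_id),
    effect_apply_id  TEXT NOT NULL,
    effect_type_id   TEXT NOT NULL,
    effect_detail_id TEXT NOT NULL,
    trait_id         TEXT,
    UNIQUE (archetype_id, effect_type_id, effect_detail_id, effect_apply_id)
);
CREATE TABLE ability (
    ability_id          INTEGER PRIMARY KEY,
    ability_description TEXT
);
-- Only the two references remain; the surrogate key allows duplicate effects.
CREATE TABLE ability_effect (
    ability_effect_id INTEGER PRIMARY KEY,
    ability_id        INTEGER NOT NULL REFERENCES ability (ability_id),
    effect_id         INTEGER NOT NULL REFERENCES effect (effect_id)
);
INSERT INTO archetype VALUES (1, 'Mage', 'Targeted Area Effect', 'Shock', 'Mana');
INSERT INTO effect VALUES (1, 1, 'Instant', 'Harm', 'Health', 'Intelligence');
INSERT INTO ability VALUES (1, 'Sample ability');
INSERT INTO ability_effect (ability_id, effect_id) VALUES (1, 1), (1, 1);
""")

# Everything an ability needs is reachable from ability_id alone.
rows = conn.execute("""
    SELECT ab.ability_description, e.effect_type_id, e.effect_detail_id,
           ar.attack_id, ar.damage_id, ar.resource_type_id
      FROM ability_effect ae
      JOIN ability   ab ON ab.ability_id   = ae.ability_id
      JOIN effect    e  ON e.effect_id     = ae.effect_id
      JOIN archetype ar ON ar.archetype_id = e.archetype_id
     WHERE ae.ability_id = ?
""", (1,)).fetchall()
for row in rows:
    print(row)
```

The same effect appears twice in the ability, as the question requires, while the attack, damage and resource values are stored only once, in archetype.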
I want to build an email messaging system like Gmail's, with the following options: Starred, Trash, Spam, Draft, Read, Unread. Right now I have the following structure in my database:
CREATE TABLE [MyInbox](
[InboxID] [int] IDENTITY(1,1) NOT NULL,
[FromUserID] [int] NOT NULL,
[ToUserID] [int] NOT NULL,
[Created] [datetime] NOT NULL,
[Subject] [nvarchar](255) COLLATE SQL_Latin1_General_CP1_CI_AS NOT NULL,
[Body] [nvarchar](max) COLLATE SQL_Latin1_General_CP1_CI_AS NOT NULL,
[IsRead] [bit] NOT NULL,
[IsReceived] [bit] NOT NULL,
[IsSent] [bit] NOT NULL,
[IsStar] [bit] NOT NULL CONSTRAINT [DF_MyInbox_IsStarred] DEFAULT ((0)),
[IsTrash] [bit] NOT NULL CONSTRAINT [DF_MyInbox_IsTrashed] DEFAULT ((0)),
[IsDraft] [bit] NOT NULL CONSTRAINT [DF_MyInbox_Isdrafted] DEFAULT ((0))
) ON [PRIMARY]
But I am facing some issues with this structure. Right now, if user A sends a message to user B, I store one row in this table. But if user B deletes that message, it is also deleted from user A's sent messages. This is wrong; I want it to behave exactly as a normal email system does: if A deletes a message from his sent items, it should not be deleted from B's inbox. There is another problem I can foresee: suppose user A sends a mail to 500 users at once. With my design I will have 500 rows with duplicate bodies, which is not a memory-efficient way to store it. Could you please help me design a messaging system?
You need to split your table. You could use the following schema and structure:
CREATE TABLE [Users]
(
[UserID] INT ,
[UserName] NVARCHAR(50) ,
[FirstName] NVARCHAR(50) ,
[LastName] NVARCHAR(50)
)
CREATE TABLE [Messages]
(
[MessageID] INT ,
[Subject] NVARCHAR(MAX) ,
[Body] NVARCHAR(MAX) ,
[Date] DATETIME,
[AuthorID] INT
)
CREATE TABLE [MessagePlaceHolders]
(
[PlaceHolderID] INT ,
[PlaceHolder] NVARCHAR(255)--For example: InBox, SentItems, Draft, Trash, Spam
)
CREATE TABLE [Users_Messages_Mapped]
(
[MessageID] INT ,
[UserID] INT ,
[PlaceHolderID] INT,
[IsRead] BIT ,
[IsStarred] BIT
)
The "Users" table holds the users. "Messages" is the table for messages. "MessagePlaceHolders" is the table for message placeholders; a placeholder can be Inbox, Sent Items, Draft, Spam or Trash. "Users_Messages_Mapped" is the mapping table between users and messages. "UserID" and "PlaceHolderID" are foreign keys. "IsRead" and "IsStarred" mean what their names say.
If no record remains in "Users_Messages_Mapped" for a particular MessageID, that message can be deleted from the Messages table, since it is no longer needed.
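The delete rule can be sketched as follows. This is an adaptation to SQLite via Python's sqlite3, not T-SQL. The table and column names are lower-cased variants of the ones above, the placeholder is stored as text for brevity, and the helpers send and delete_for_user are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE messages (
    message_id INTEGER PRIMARY KEY,
    subject    TEXT,
    body       TEXT,
    author_id  INTEGER
);
CREATE TABLE users_messages_mapped (
    message_id  INTEGER NOT NULL REFERENCES messages (message_id),
    user_id     INTEGER NOT NULL,
    placeholder TEXT NOT NULL,           -- e.g. InBox, SentItems
    is_read     INTEGER NOT NULL DEFAULT 0,
    is_starred  INTEGER NOT NULL DEFAULT 0,
    PRIMARY KEY (message_id, user_id)    -- assumes one entry per user and message
);
""")

def send(author_id, recipient_ids, subject, body):
    """Store the body once, then add one mapping row per mailbox."""
    cur = conn.execute(
        "INSERT INTO messages (subject, body, author_id) VALUES (?, ?, ?)",
        (subject, body, author_id),
    )
    message_id = cur.lastrowid
    rows = [(message_id, author_id, "SentItems")]
    rows += [(message_id, uid, "InBox") for uid in recipient_ids]
    conn.executemany(
        "INSERT INTO users_messages_mapped (message_id, user_id, placeholder) "
        "VALUES (?, ?, ?)",
        rows,
    )
    return message_id

def delete_for_user(user_id, message_id):
    """Remove one user's copy; drop the message once nobody references it."""
    conn.execute(
        "DELETE FROM users_messages_mapped WHERE message_id = ? AND user_id = ?",
        (message_id, user_id),
    )
    conn.execute(
        "DELETE FROM messages WHERE message_id = ? AND NOT EXISTS "
        "(SELECT 1 FROM users_messages_mapped WHERE message_id = ?)",
        (message_id, message_id),
    )
```

For example, after mid = send(1, [2, 3], "Hi", "Hello"), calling delete_for_user(2, mid) removes only user 2's copy; the single stored body disappears only after users 1 and 3 have deleted theirs as well.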
If you're doing document-orientated work, I suggest taking a look at CouchDB. It is schema-less, meaning issues like this disappear.
Let's take a look at the example: A sends a message to B, and it's deleted by B.
You would have a single instance of the document, with recipients listed as an attribute of the email. As users delete messages, you either remove them from the recipients list or add them to a list of deleted_by or whatever you choose.
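To make the idea concrete without tying it to any particular CouchDB client, here is a sketch with a plain Python dict standing in for the stored document. The field names (sender, recipients, deleted_by) and both helper functions are assumptions for illustration only.

```python
# A plain dict stands in for the JSON document that would be stored.
message = {
    "_id": "msg-1",
    "sender": "A",
    "recipients": ["B", "C"],
    "subject": "Hello",
    "body": "One copy of the body, however many recipients there are.",
    "deleted_by": [],
}

def delete_for(doc, user):
    """Record that a user deleted the message, without touching other users."""
    if user not in doc["deleted_by"]:
        doc["deleted_by"].append(user)

def visible_to(doc, user):
    """A user sees the message if they take part in it and have not deleted it."""
    participants = {doc["sender"], *doc["recipients"]}
    return user in participants and user not in doc["deleted_by"]

delete_for(message, "B")
print(visible_to(message, "A"), visible_to(message, "B"))
```

Here B's delete hides the message from B only; A still sees it among the sent items.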
It's a much different approach to data than what you're used to, but may be highly beneficial to take some time to consider.
I think you need to decompose your schema some more. Store emails separately, and map inboxes to the messages they contain.
If I were you, I would set two flags, one for the sender and one for the receiver. If both flags are true, the message should be deleted from the database; otherwise keep it in the database but hide it from whoever deleted it.
Do the same thing for trash. You may want to run a cron job, or check manually, to remove messages from the database once both sender and receiver have deleted them.
A message can only be in one folder at a time, so you want a folders table (containing folders 'Trash', 'Inbox', 'Archive', etc.) and a foreign key from messages to folders.
For labels, you have a many-to-many relation, so you need a labels table and also a link table (messages_labels).
For starring, a simple bit column should do, same for 'unread'.
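A minimal sketch of that layout, using SQLite through Python's sqlite3. All table and column names are my own guesses at what was meant, and per-user mailboxes are left out to keep the example short.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE folders (
    folder_id   INTEGER PRIMARY KEY,
    folder_name TEXT NOT NULL UNIQUE
);
INSERT INTO folders (folder_name) VALUES ('Inbox'), ('Trash'), ('Archive');

-- One folder per message, plus simple flag columns.
CREATE TABLE messages (
    message_id INTEGER PRIMARY KEY,
    subject    TEXT,
    folder_id  INTEGER NOT NULL REFERENCES folders (folder_id),
    is_starred INTEGER NOT NULL DEFAULT 0,
    is_unread  INTEGER NOT NULL DEFAULT 1
);

-- Labels are many-to-many, so they need a link table.
CREATE TABLE labels (
    label_id   INTEGER PRIMARY KEY,
    label_name TEXT NOT NULL UNIQUE
);
CREATE TABLE messages_labels (
    message_id INTEGER NOT NULL REFERENCES messages (message_id),
    label_id   INTEGER NOT NULL REFERENCES labels (label_id),
    PRIMARY KEY (message_id, label_id)
);

INSERT INTO messages (subject, folder_id) VALUES ('Hello', 1);
INSERT INTO labels (label_name) VALUES ('work'), ('urgent');
INSERT INTO messages_labels VALUES (1, 1), (1, 2);
""")

def move_to_folder(message_id, folder_name):
    """Moving a message is a single-column update."""
    conn.execute(
        "UPDATE messages SET folder_id = "
        "(SELECT folder_id FROM folders WHERE folder_name = ?) "
        "WHERE message_id = ?",
        (folder_name, message_id),
    )

def labels_of(message_id):
    """Return the label names attached to a message, sorted by name."""
    rows = conn.execute(
        "SELECT label_name FROM labels JOIN messages_labels USING (label_id) "
        "WHERE message_id = ? ORDER BY label_name",
        (message_id,),
    )
    return [r[0] for r in rows]

move_to_folder(1, "Trash")
print(labels_of(1))
```

Moving a message to Trash leaves its labels and its starred/unread flags untouched, because they live in separate columns and tables.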
CREATE TABLE `mails` (
`message_id` bigint(20) unsigned NOT NULL AUTO_INCREMENT,
`message` varchar(10000) NOT NULL DEFAULT '',
`file` longblob,
`mailingdate` varchar(40) DEFAULT NULL,
`starred_status` int(10) unsigned NOT NULL DEFAULT '0',
`sender_email` varchar(200) NOT NULL DEFAULT '',
`reciever_email` varchar(200) NOT NULL DEFAULT '',
`inbox_status` int(10) unsigned NOT NULL DEFAULT '0',
`sent_status` int(10) unsigned NOT NULL DEFAULT '0',
`draft_status` int(10) unsigned NOT NULL DEFAULT '0',
`trash_status` int(10) unsigned NOT NULL DEFAULT '0',
`subject` varchar(200) DEFAULT NULL,
`read_status` int(10) unsigned NOT NULL DEFAULT '0',
`delete_status` int(10) unsigned NOT NULL DEFAULT '0',
PRIMARY KEY (`message_id`)
)
You can use this table to store the mails and adapt the queries to each mailbox. I have left out the other tables, such as user details and login details; you can create them as needed.
You could create a table for MessageContacts which joins each message to the people who have it in their mailboxes. When a user deletes a message then a row gets deleted from MessageContacts but the original message is preserved.
You could do that... but I suggest you don't. Unless it's an academic exercise set by your tutor then it is surely a complete waste of time to develop your own messaging system. If it is homework then you ought to say so. If not, then go do something more useful instead.
Why delete? I think there is no need to delete anything. Just hide the message from a user once that user deletes it. Otherwise it becomes a problem to check both sides when a sender sends the same message to many recipients: you have to check and flag all recipients and, only if all of them have deleted it, delete the message.
I think there is no need to delete anything.
In my structure, I set a "deleted: bool" flag and show or hide the message depending on its value.
I am trying to optimize a SQL query that uses an ORDER BY clause. When I run EXPLAIN, the query always shows "Using filesort". The query is for a group discussion forum where users attach tags to posts.
Here are the 3 tables I am using: users, user_tag, tags
user_tag is the association mapping table for users and their tags.
CREATE TABLE `usertable` (
`user_id` int(11) unsigned NOT NULL AUTO_INCREMENT,
`user_name` varchar(20) CHARACTER SET utf8 COLLATE utf8_bin NOT NULL,
PRIMARY KEY (`user_name`),
KEY `user_id` (`user_id`)
) ENGINE=MyISAM DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci;
CREATE TABLE `user_tag` (
`id` int(11) unsigned NOT NULL AUTO_INCREMENT,
`user_id` int(11) unsigned NOT NULL,
`tag_id` int(11) unsigned NOT NULL,
`usage_count` int(11) unsigned NOT NULL,
PRIMARY KEY (`id`),
KEY `tag_id` (`tag_id`),
KEY `usage_count` (`usage_count`),
KEY `user_id` (`user_id`)
) ENGINE=MyISAM DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci;
I update usage_count in server-side code. Here is the query that is giving me problems. It finds the tag_id and usage_count for a particular user name, sorted by usage count in descending order:
select user_tag.tag_id, user_tag.usage_count
from user_tag inner join usertable on usertable.user_id = user_tag.user_id
where user_name="abc" order by usage_count DESC;
Here is the explain output:
mysql> explain select
user_tag.tag_id,
user_tag.usage_count from user_tag
inner join usertable on
user_tag.user_id = usertable.user_id
where user_name="abc" order by
user_tag.usage_count desc;
Explain output here
What should I change to get rid of that "Using filesort"?
I'm rather rusty with this, but here goes.
The key used to fetch the rows is not the same as the one used in the ORDER BY:
http://dev.mysql.com/doc/refman/5.1/en/order-by-optimization.html
As mentioned by OMG Ponies, an index on user_id, usage_count may resolve the filesort.
KEY `user_id_usage_count` (`user_id`,`usage_count`)
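The effect of such an index can be illustrated outside MySQL. The sketch below uses SQLite through Python's sqlite3, where the rough counterpart of "Using filesort" is a "USE TEMP B-TREE FOR ORDER BY" step in EXPLAIN QUERY PLAN. It is an analogy under simplifying assumptions (a single table, a literal user_id, no join to usertable), not a reproduction of MySQL's optimizer, and the helper name plan_details is made up.

```python
import sqlite3

QUERY = (
    "SELECT tag_id, usage_count FROM user_tag "
    "WHERE user_id = 1 ORDER BY usage_count DESC"
)

def plan_details(index_ddl):
    """Create user_tag with one index and return the query plan descriptions."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE user_tag ("
        "id INTEGER PRIMARY KEY, user_id INTEGER NOT NULL, "
        "tag_id INTEGER NOT NULL, usage_count INTEGER NOT NULL)"
    )
    conn.execute(index_ddl)
    # The human-readable description is the last column of each plan row.
    return [row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + QUERY)]

# Index on user_id only: rows are found by index, then sorted separately.
single = plan_details("CREATE INDEX idx_user_id ON user_tag (user_id)")
# Composite index: rows for one user are already stored in usage_count order.
composite = plan_details(
    "CREATE INDEX idx_user_id_usage_count ON user_tag (user_id, usage_count)"
)

print(single)
print(composite)
```

With the composite index the sort step disappears from the plan, because the index entries for a given user_id can be read in usage_count order (backwards, for DESC).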
"Using filesort" is not necessarily bad; in many cases it doesn't actually matter.
Also, its name is somewhat confusing. The filesort() function does not necessarily use temporary files to perform the sort. For small data sets, the data are sorted in memory which is pretty fast.
Unless you think it's a specific problem (for example, after profiling your application on production-grade hardware in the lab, removing the ORDER BY solves a specific performance issue), or your data set is large, you should probably not worry about it.
I have a MyISAM table with ~50'000'000 records (tasks for web crawler):
CREATE TABLE `tasks2` (
`id` int(11) NOT NULL auto_increment,
`url` varchar(760) character set latin1 NOT NULL,
`state` varchar(10) collate utf8_bin default NULL,
`links_depth` int(11) NOT NULL,
`sites_depth` int(11) NOT NULL,
`error_text` text character set latin1,
`parent` int(11) default NULL,
`seed` int(11) NOT NULL,
`random` int(11) NOT NULL default '0',
PRIMARY KEY (`id`),
UNIQUE KEY `URL_UNIQUE` (`url`),
KEY `next_random_task` (`state`,`random`)
) ENGINE=MyISAM AUTO_INCREMENT=61211954 DEFAULT CHARSET=utf8 COLLATE=utf8_bin
Once every few seconds one of the following operations occur (but never simultaneously):
INSERT ... VALUES (500 rows) - inserts new tasks
UPDATE ... WHERE id IN (up to 10 ids) - updates state for batch of tasks
SELECT ... WHERE (by next_random_task index) - loads batch of tasks for processing
My problem is that inserts and updates are very slow - running on the order of tens of seconds, sometimes over a minute. Selects are fast, though. Why could this happen and how to improve performance?
~50M rows on regular hardware is a decent number.
Please go through this question on Server Fault (even though it is written for InnoDB, there are similar parameters for MyISAM).
After that you should start the cycle of:
identifying (logging) slow queries to understand your patterns (or confirm your assumptions)
tweaking my.cnf or adding/removing indexes (depending on the patterns)
measuring improvements
EXPLAIN a sample UPDATE against the full table to ensure the primary key index is being used.
Consider changing state to a TINYINT or ENUM to make its index smaller. (ENUM might not actually do this).
Do you need the unique key on url? This will slow down inserts.