Best database structure for multiple value attributes - sql

I'm using PostgreSQL and I currently have 2 models.
Users (user_id, user_name),
Players (player_id, player_title, player_description)
I want to add the feature for users to add characteristics to players and other users. I was thinking of making a new table called Characteristics (characteristic_id, characteristic_description, user_id, player_id).
Each user can add multiple characteristics to multiple players. I read that storing multiple values in one row is not good database practice, so my second option is to store one characteristic per row in the Characteristics table. But I still find this redundant, because the values are repeated.
Is there a better way to structure my database?
Example:
The Characteristics table might look like this:
characteristic_id | characteristic_description | user_id | player_id
------------------+----------------------------+---------+-----------
1 | good | 1 | 1
2 | bad | 1 | 2
1 | good | 2 | 1
1 | good | 2 | 2
I find this redundant. Is there a better way? I also don't know how to add the feature for users to add characteristics to other users.
Thanks
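For concreteness, the normalized layout that the related answers below keep recommending (a lookup table plus junction tables) might look roughly like this in PostgreSQL. Every name beyond the tables and columns given in the question is an illustrative assumption, not a prescription:

-- Characteristic definitions are stored once...
CREATE TABLE characteristics (
    characteristic_id          serial PRIMARY KEY,
    characteristic_description text NOT NULL UNIQUE
);

-- ...and each "user X says player Y is Z" fact becomes one row here.
CREATE TABLE player_characteristics (
    user_id           integer NOT NULL REFERENCES users (user_id),
    player_id         integer NOT NULL REFERENCES players (player_id),
    characteristic_id integer NOT NULL REFERENCES characteristics (characteristic_id),
    PRIMARY KEY (user_id, player_id, characteristic_id)
);

-- A parallel table would let users attach characteristics to other users.
CREATE TABLE user_characteristics (
    user_id           integer NOT NULL REFERENCES users (user_id),
    target_user_id    integer NOT NULL REFERENCES users (user_id),
    characteristic_id integer NOT NULL REFERENCES characteristics (characteristic_id),
    PRIMARY KEY (user_id, target_user_id, characteristic_id)
);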

Related

Is it good practice to have two SQL tables with bijective row correspondence?

I have a table of tasks,
id | name
----+-------------
1 | brush teeth
2 | do laundry
and a table of states.
taskid | state
--------+-------------
1 | completed
2 | uncompleted
There is a bijective correspondence between the tables, i.e.
each row in the task table corresponds to exactly one row in the state table.
Another way of implementing this would be to place a state row in the task table.
id | name | state
----+-------------+-------------
1 | brush teeth | completed
2 | do laundry | uncompleted
The main reason why I have chosen to use two tables instead of this one is that updating the state would then cause a change in the task id.
I have other tables referencing the task(id) column, and do not want to have to update all those other tables too when altering a task's state.
I have two questions about this.
Is it good practice to have two tables in bijective row-row correspondence?
Is there a way I can ensure a constraint that there is exactly one row in the state table corresponding to each row in the task table?
The system I am using is PostgreSQL.
You can ensure the 1-1 correspondence by making the id in each table a primary key and a foreign key that references the id in the other table. This is allowed and it guarantees 1-1'ness.
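A minimal sketch of that in PostgreSQL DDL, with the table names assumed from the description and the columns taken from the example above:

CREATE TABLE tasks (
    id   integer PRIMARY KEY,
    name text NOT NULL
);

CREATE TABLE states (
    taskid integer PRIMARY KEY REFERENCES tasks (id),
    state  text NOT NULL
);

-- The reverse reference is added once both tables exist, and is deferred so
-- that a task and its state can be inserted in the same transaction.
ALTER TABLE tasks
    ADD CONSTRAINT tasks_states_fk
    FOREIGN KEY (id) REFERENCES states (taskid)
    DEFERRABLE INITIALLY DEFERRED;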
Sometimes, you want such tables, but one table has fewer rows than the other. This occurs when there is a subsetting relationship, and you don't want the additional columns on all rows.
Another purpose is to store separate columns in different places. When I learned about databases, this approach was called vertical partitioning. Nowadays, columnar databases are relatively common; these take the notion to the extreme -- a separate "store" for each column (although the "store" is not exactly a "table").
Why would you do this? Here are some reasons:
You have infrequently used columns that you do not want to load for every query on the more frequent columns.
You have frequently updated columns and you do not want to lock the rest of the columns.
You have too many columns to store in one row.
You have different security requirements on different columns.
Postgres does offer other mechanisms that you might find relevant. In particular, table inheritance might be useful in your situation.
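For reference, inheritance in Postgres is declared with INHERITS; a rough sketch (the names here are invented for illustration, and note that primary key and unique constraints are not inherited by child tables):

CREATE TABLE task_base (
    id   integer PRIMARY KEY,
    name text NOT NULL
);

CREATE TABLE archived_task (
    archived_at timestamptz NOT NULL DEFAULT now()
) INHERITS (task_base);

-- SELECT ... FROM task_base also returns rows of archived_task;
-- SELECT ... FROM ONLY task_base excludes child tables.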
All that said, you would not normally design a database like this. There are good reasons for doing so, but it is more typical to put all columns related to an entity in the same table.

Storing list of strings in MySql column

I have a users table which contains data for registered users; each row represents a user. One column in particular should contain a list of groups the user is part of. At the moment that column is of TEXT type, and I'm storing the list as a string in which groups are separated by semicolons, something like:
admin;moderators;devteam
And I was wondering: "Is this a good idea?", is there a better/safer way to do this that doesn't require a lot of effort to implement or is this "ok"?
And I was wondering: "Is this a good idea?"
Short answer: probably not.
Why
If you ever need to do any manipulation on that column, you will find yourself in big trouble. Simply selecting all users in a group will require operations on a string (usually not performance-friendly). The same holds true for sorting, joining, and all the other operations SQL is great for.
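For instance, with a semicolon-separated column (called groups here, since the real column name isn't shown), finding everyone in the moderators group degenerates into string matching along these lines, which has to scan and parse every row and cannot use an ordinary index:

SELECT *
FROM users
WHERE CONCAT(';', groups, ';') LIKE '%;moderators;%';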
Solution
What you describe is a typical example of N:N relationship, where each user can belong to multiple groups and each group can have multiple users in it.
The 'standard' way of modeling this relationship is to create a new table in which each row represents one user belonging to one group. The columns will be group and userID.
With data from your example
userID | group
-------+-------------
1 | admin
1 | moderators
1 | devteam
This allows you to have one row for each user in the users table, and getting the groups of a specific user is as simple as:
select `group`          -- GROUP is a reserved word in MySQL, hence the backticks
from user_groups
where userID = 1;
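A minimal sketch of that table in MySQL, assuming the users table has an integer id primary key (the column names follow the example; group needs backticks because GROUP is a reserved word):

CREATE TABLE user_groups (
    userID  INT NOT NULL,
    `group` VARCHAR(64) NOT NULL,
    PRIMARY KEY (userID, `group`),
    FOREIGN KEY (userID) REFERENCES users (id)
);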

Does it follow best-practice DB design to mix staff and customer details in 1 table?

I have a table called Users which is currently holding data on both Customers and Staff. It has their names, emails, passwords, etc. It also has a field called TypeOfUserID which holds a value to say what type of user they are, e.g. Customer or Staff.
Would it be better to have two separate tables: Customers and Staff?
It seems like duplication because the fields are the same for both types of user. The only field I can get rid of is the TypeOfUserID column.
However, having them both in one table called Users means that in my front-end application I have to keep adding a clause to check what type of user they are. If for any reason I need to allow a different type of user access, e.g. External Supplier, then I have to manage the addition of TypeOfUserID in multiple places in the WHERE clauses.
Short Answer:
It depends. If your current needs are met, and you don't foresee this model needing to be changed for a long time / it would be easy to change if you had to, stick with it.
Longer answer:
If staff members are just a special case of user, I don't see any reason you'd want to change anything about the database structure. Yes, for staff-specific stuff you'd need to be sure the person was staff, but I don't really see any way around that- you always have to know they're staff, first.
If, however, you want finer-grained permissions than binary (a person can belong to the 'staff' group but that doesn't necessarily say whether or not they're in the users' group, for instance), you might want to change the database.
The easiest way to do that, of course, would be to have a unique ID associated with each user, and use that key to look up their group permissions in a different table.
Something like:
uid | group
------------
1 | users
1 | staff
2 | users
3 | staff
4 | users
5 | admin
Although you may or may not want an actual string for each group; most likely you'd want another level of indirection by having a 'groups' table. So, that table above would be a 'group_membership' table, and it could look more like:
uid | gid
------------
1 | 1
1 | 2
2 | 1
3 | 2
4 | 1
5 | 3
To go along with it, you'd have the 'groups' table, which would be:
gid | group
-------------
1 | users
2 | staff
3 | admin
But, again, that's only if you're imagining a larger number of roles and you want more flexibility. If you only ever plan on having 'users' and 'staff' and staff are just highly privileged users, all of that extra stuff would be a waste of your time.
However, if you want really fine grained permissions, with maximum flexibility, you can use the above to make them happen via a 'permissions' table:
gid | can_create_user | can_fire_people | can_ban_user
-------------------------------------------------------
1 | false | false | false
2 | true | false | true
3 | true | true | true
Some Example Code
Here's a working PostgreSQL example of getting permissions can_create_user and can_fire_people for a user with uid 1:
SELECT bool_or(can_create_user) AS can_create_user,
       bool_or(can_fire_people) AS can_fire_people
FROM permissions
WHERE gid IN (SELECT gid FROM group_membership WHERE uid = 1);
Which would return:
can_create_user | can_fire_people
----------------------------------
true | false
because user 1 is in groups 1 and 2, and group 2 has the can_create_user permission, but neither group has the can_fire_people permission.
(I know you're using SQL Server, but I only have access to a PostgreSQL server at the moment. Sorry about that. The difference should be minor, though.)
Notes
You'll want to make sure that uid and gid are primary keys in the users and groups tables, and that there are foreign key constraints on those values in every other table which uses them; you don't want nonexistent groups to have permissions, or nonexistent users to be accidentally added to groups.
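A sketch of those tables with the constraints described, in PostgreSQL syntax; the group name column is called group_name here because GROUP is a reserved word, and the non-key columns are assumptions:

CREATE TABLE users (
    uid  integer PRIMARY KEY,
    name text NOT NULL
);

CREATE TABLE groups (
    gid        integer PRIMARY KEY,
    group_name text NOT NULL UNIQUE
);

CREATE TABLE group_membership (
    uid integer NOT NULL REFERENCES users (uid),
    gid integer NOT NULL REFERENCES groups (gid),
    PRIMARY KEY (uid, gid)
);

CREATE TABLE permissions (
    gid             integer PRIMARY KEY REFERENCES groups (gid),
    can_create_user boolean NOT NULL DEFAULT false,
    can_fire_people boolean NOT NULL DEFAULT false,
    can_ban_user    boolean NOT NULL DEFAULT false
);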
Alternatively
A graph database solves this problem pretty elegantly; you'd simply create edges linking users to groups, and edges linking groups to permissions. If you want to work with a technology that's currently sexy / buzzword compliant, you might want to give that a try, depending on how enormous of a change that'd be.
Further information
The phrase you'll want to google is "access control". You'll probably want to implement access control lists (as outlined above) or something similar. Since this is primarily a security-related topic, you might also want to ask this question on sec.se, or at least look around there for related answers.
Even though they look similar, they are logically from different areas. You will never need a union between those tables. But as your application develops, you will need to add more and more specific fields to these tables, and they will become more different than similar.
You could have a separate table for Staff holding only the id from the Users table as the foreign key. If you do that, then any functionality related only to the staff member can query the Staff table joining to the Users table. This solution will also give you flexibility for future extension, as any data related only to the staff member (for example, the department they work in) can be placed in the Staff table.
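A sketch of that layout (the user_id, user_name and department names are assumptions; the join works the same way in SQL Server and PostgreSQL):

-- Staff-only attributes live in their own table, keyed by the user's id.
CREATE TABLE Staff (
    user_id    int PRIMARY KEY REFERENCES Users (user_id),
    department varchar(100)
);

-- Staff-only functionality joins back to the Users table:
SELECT u.user_name, s.department
FROM Staff s
JOIN Users u ON u.user_id = s.user_id;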

Uniqueness in many-to-many

I couldn't figure out what terms to google, so help tagging this question or just pointing me in the way of a related question would be helpful.
I believe that I have a typical many-to-many relationship:
CREATE TABLE groups (
    id integer PRIMARY KEY);

CREATE TABLE elements (
    id integer PRIMARY KEY);

CREATE TABLE groups_elements (
    groups_id integer REFERENCES groups,
    elements_id integer REFERENCES elements,
    PRIMARY KEY (groups_id, elements_id));
I want to have a constraint that there can only be one groups_id for a given set of elements_ids.
For example, the following is valid:
groups_id | elements_id
1 | 1
1 | 2
2 | 2
2 | 3
The following is not valid, because then groups 1 and 2 would be equivalent.
groups_id | elements_id
1 | 1
1 | 2
2 | 2
2 | 1
Not every subset of elements must have a group (this is not the power set), but new subsets may be formed. I suspect that my design is incorrect since I'm really talking about adding a group as a single entity.
How can I create identifiers for subsets of elements without risk of duplicating subsets?
That is an interesting problem.
One solution, albeit a clunky one, would be to store a concatenation of groups_id and elements_id in the groups table (e.g. 1-1-2) and make it a unique index.
Trying to do a search for duplicate groups before inserting a new row would be an enormous performance hit.
The following query would spit out offending group ids:
with group_elements_arr as (
    select groups_id, array_agg(elements_id order by elements_id) as elements
    from groups_elements
    group by groups_id )
select elements, count(*), array_agg(groups_id) as offending_groups
from group_elements_arr
group by elements
having count(*) > 1;
Depending on the size of groups_elements and its change rate, you might get away with stuffing something along these lines into a trigger watching groups_elements. If that's not fast enough, you can materialize group_elements_arr into a real table managed by triggers.
And I think the trigger should be FOR EACH STATEMENT and INITIALLY DEFERRED, to make building up a new group easy.
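A rough sketch of such a trigger, assuming a full re-check of the table at commit time is acceptable; note that PostgreSQL only allows DEFERRABLE on constraint triggers, which have to be FOR EACH ROW:

CREATE FUNCTION check_unique_group_sets() RETURNS trigger AS $$
BEGIN
    IF EXISTS (
        SELECT 1
        FROM (SELECT groups_id,
                     array_agg(elements_id ORDER BY elements_id) AS elements
              FROM groups_elements
              GROUP BY groups_id) g
        GROUP BY elements
        HAVING count(*) > 1
    ) THEN
        RAISE EXCEPTION 'duplicate element set across groups';
    END IF;
    RETURN NULL;  -- return value is ignored for AFTER triggers
END;
$$ LANGUAGE plpgsql;

CREATE CONSTRAINT TRIGGER groups_elements_unique_sets
    AFTER INSERT OR UPDATE OR DELETE ON groups_elements
    DEFERRABLE INITIALLY DEFERRED
    FOR EACH ROW EXECUTE FUNCTION check_unique_group_sets();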
This link from user ypercube was most helpful: unique constraint on a set. In short, a bit of what everyone is saying is correct.
It's a question of tradeoffs, but here are the best options:
a) Add a hash or some other combination of element values to the groups table and make it unique, then populate the groups_elements table off of it using triggers. Pros of this method are that it preserves querying ability and enforces the constraint so long as you deny naked updates to groups_elements. Cons are that it adds complexity and you've now introduced logic like "how do you uniquely represent a set of elements" into your database.
b) Leave the tables as-is and control the access to groups_elements with your access layer, be it a stored procedure or otherwise. This has the advantage of preserving querying ability and keeps the database itself simple. However, it means that you are moving an analytic constraint into your access layer, which necessarily means that your access layer will need to be more complex. Another point is that it separates what the data should be from the data itself, which has both pros and cons. If you need faster access to whether or not a set already exists, you can attack that problem separately.
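As a concrete illustration of option (a), the fingerprint could be a hash of the sorted element ids; the column name and the choice of md5 are assumptions:

-- A unique fingerprint on groups means two groups cannot share the same
-- element set, as long as triggers keep it in sync with groups_elements.
ALTER TABLE groups ADD COLUMN elements_hash text UNIQUE;

-- The value for a single group could be computed as:
SELECT md5(array_to_string(array_agg(elements_id ORDER BY elements_id), ','))
FROM groups_elements
WHERE groups_id = 1;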

Architecture of SQL tables

I am wondering whether it is more useful and practical (in terms of DB size) to create multiple tables in SQL with two columns each (one column containing a foreign key and one column containing the data), or to merge them and create one table containing multiple columns. I am asking this because in my scenario one product holding the primary key could have applicable data for only one column, while the other columns would be empty.
example a. one table
productID | productname | weight | no_of_pages
----------+-------------+--------+-------------
1 | book | 130 | 500
2 | watch | 50 | null
3 | ring | null | null
example b. three tables
productID | productname
----------+-------------
1 | book
2 | watch
3 | ring

productID | weight
----------+--------
1 | 130
2 | 50

productID | no_of_pages
----------+-------------
1 | 500
The multi-table approach is more "normal" (in database terms) because it avoids columns that commonly store NULLs. It's also something of a pain in programming terms because you have to JOIN a bunch of tables to get your original entity back.
I suggest adopting a middle way. Weight seems to be a property of most products, if not all (indeed, a ring has a weight, even if a small one, and you'll probably want to know it for shipping purposes), so I'd leave that in the Products table. But number of pages applies only to a book, as do a slew of other unmentioned properties (author, ISBN, etc.). In this example, I'd use a Products table and a Books table. The Books table would extend the Products table in a fashion similar to class inheritance in object-oriented programming.
All book-specific properties go into the Books table, and you join only Products and Books to get a complete description of a book.
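A sketch of that split, in PostgreSQL-style DDL; the extra book columns are only examples:

CREATE TABLE products (
    product_id   integer PRIMARY KEY,
    product_name text NOT NULL,
    weight       numeric          -- a property of (nearly) every product
);

-- Books extend products, one row per book, keyed by the same id.
CREATE TABLE books (
    product_id  integer PRIMARY KEY REFERENCES products (product_id),
    no_of_pages integer,
    author      text,
    isbn        text
);

-- A complete description of a book needs only one join:
SELECT p.product_name, p.weight, b.no_of_pages
FROM products p
JOIN books b ON b.product_id = p.product_id;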
I think this all depends on how the tables will be used. Maybe your examples are oversimplifying things too much, but it seems to me that the first option should be good enough.
You'd really use the second example if you're going to be doing extremely CPU-intensive stuff with the first table and will only need the second and third tables when more information about a product is needed.
If you're going to need the information in the second and third tables most times you query the table, then there's no reason to do the join every time, and you should just keep it in one table.
I would suggest example a if there is a defined set of attributes for a product, and example c if you need a variable number of attributes (new attributes keep coming every now and then):
example c
productID | productName
----------+-------------
1 | book
2 | watch
3 | ring

attrID | productID | attrType | attrValue
-------+-----------+-------------+-----------
1 | 1 | weight | 130
2 | 1 | no_of_pages | 500
3 | 2 | weight | 50
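Expressed as DDL, example c is an entity-attribute-value layout along these lines (a sketch; the types are assumptions):

CREATE TABLE product (
    productID   integer PRIMARY KEY,
    productName text NOT NULL
);

-- One row per attribute per product; new attribute types need no schema
-- change, but every value is stored as text and must be cast when queried.
CREATE TABLE product_attribute (
    attrID    integer PRIMARY KEY,
    productID integer NOT NULL REFERENCES product (productID),
    attrType  text NOT NULL,
    attrValue text
);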
The table structure you have shown in example b is not normalized; separate id columns would be required in the second and third tables, since productID would be an FK and not a PK there.
It depends on how many rows you are expecting in your PRODUCTS table. I would say that it would not make sense to normalize your tables to 3NF in this case, because product name, weight, and no_of_pages each describe the products. If you had repeating data such as manufacturers, it would make more sense to normalize your tables at that point.
Without knowing the background (data model), there is no way to tell which variant is more "correct". Both are fine in certain scenarios.
You want three tables, full stop. That's best because there's no chance of watches winding up with pages (no pun intended) and some books without. If you normalize, the server works for you. If you don't, you do the work instead, just not as well. Up to you.
I am asking this because in my scenario one product holding primary key could have sufficient/applicable data for only one column while other columns would be empty.
That's always true of nullable columns. Here's the rule: a nullable column has an optional relationship to the key. A nullable column can always be, and usually should be, in a separate table where it can be non-null.