Relationship redundant? - sql

I'm designing a database with a user table and a group table that holds the users' groups.
Each group has an owner (the user who created it) and a set of users who are members of the group (like a WhatsApp group).
To represent this I have this design:
Do you think the owner column on the Group table is necessary? My thinking is that with an owner column on the Group table I can easily find a group's owner.

If you don't add the owner in the group then where are you going to add it? The only way I see apart from this is adding a boolean isowner to the usergroup. Anyway, this would not make sense if there will only be 1 owner. If there can be N owners then that would be the way to go.

You are on the right track, but you'll need one more step to ensure an owner must actually belong to the group she owns:
There is a FOREIGN KEY in Group {groupID, owner} that references UserGroup {groupID, userID}.
If your DBMS supports deferred foreign keys, you can make owner NOT NULL, to ensure a group cannot be owner-less. Otherwise you can leave it NULL-able (and you will still be able to break the "chicken-and-egg" problem with circular references if your DBMS supports MATCH SIMPLE FKs - almost all do).
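A minimal sketch of that circular reference, using Python's built-in sqlite3 module because SQLite supports deferred foreign keys. The column types, the sample rows and the assumption that UserGroup's primary key is {groupID, userID} are mine, not the original diagram's:

```python
import sqlite3

# Autocommit mode, so the explicit BEGIN/COMMIT statements below are the only transactions.
conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when asked
conn.executescript("""
CREATE TABLE "User" (userID INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE "Group" (
    groupID INTEGER PRIMARY KEY,
    name    TEXT NOT NULL,
    owner   INTEGER NOT NULL,
    -- The owner must be a member of the group she owns.
    FOREIGN KEY (groupID, owner) REFERENCES UserGroup (groupID, userID)
        DEFERRABLE INITIALLY DEFERRED
);
CREATE TABLE UserGroup (
    groupID INTEGER NOT NULL REFERENCES "Group" (groupID)
        DEFERRABLE INITIALLY DEFERRED,
    userID  INTEGER NOT NULL REFERENCES "User" (userID),
    PRIMARY KEY (groupID, userID)
);
""")
conn.execute('INSERT INTO "User" VALUES (1, ?)', ("alice",))
conn.execute('INSERT INTO "User" VALUES (2, ?)', ("bob",))

# The group and its owner's membership row are created in one transaction;
# the deferred FKs are only checked at COMMIT, which breaks the chicken-and-egg problem.
conn.execute("BEGIN")
conn.execute('INSERT INTO "Group" VALUES (10, ?, 1)', ("friends",))
conn.execute("INSERT INTO UserGroup VALUES (10, 1)")
conn.execute("INSERT INTO UserGroup VALUES (10, 2)")
conn.execute("COMMIT")

# Making a non-member the owner is rejected when the transaction commits.
owner_rejected = False
conn.execute("BEGIN")
conn.execute('INSERT INTO "User" VALUES (3, ?)', ("carol",))
conn.execute('UPDATE "Group" SET owner = 3 WHERE groupID = 10')
try:
    conn.execute("COMMIT")
except sqlite3.IntegrityError:
    conn.execute("ROLLBACK")
    owner_rejected = True
```

In a DBMS without deferred constraints, the same tables would need a NULL-able owner that is filled in after the membership row exists.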

You need 4 tables:
User
UserGroup
Group
UserRole (associated with UserGroup) - Shows the role of a user in a group (admin/owner, etc.) - If your roles are Admin and Ordinary user, you could use a Binary column on UserGroup instead.

I know a solution has already been proposed, but I am so convinced there is a better one ...
At the higher level, the concept of owner can be seen as a property of the relation existing between users and groups. It should then ideally be set as a field in the UserGroup table.
It could be either a boolean field or, even better, a generalized userGroupNatureOfRelation field, that could hold values such as 'owner', 'participant', 'user', or whatever could be the status.
Of course, such a solution allows you to implement any specific business rule, like 'there is only one owner per group'. It will let you implement any other more sophisticated business rule when needed, and it will even allow you to add a level of complexity by adding fields such as:
userGroupRelationStartDate
userGroupRelationEndDate
where you'll be able to follow the nature of the relation between a group and a person through time ...
Of course you could say 'I don't need it'. But implementing such an 'open' model does not cost anything more than what you are thinking of now. Then, if for any reason, in a near or far future, business rules have to be changed or improved, your model stays valid and efficient ...
That said, building views and manipulating data should be a lot easier with this model. So this is a good and immediate reason to adopt it!
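As a rough illustration of this answer, here is a sketch that stores the relation type on UserGroup and enforces the 'only one owner per group' rule with a partial unique index. It runs on SQLite through Python's sqlite3 module; the index name, the allowed values and the sample rows are assumptions, and not every DBMS supports partial (filtered) unique indexes:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE UserGroup (
    groupID INTEGER NOT NULL,
    userID  INTEGER NOT NULL,
    userGroupNatureOfRelation TEXT NOT NULL DEFAULT 'participant'
        CHECK (userGroupNatureOfRelation IN ('owner', 'participant')),
    PRIMARY KEY (groupID, userID)
);
-- Business rule: at most one 'owner' row per group.
CREATE UNIQUE INDEX one_owner_per_group
    ON UserGroup (groupID)
    WHERE userGroupNatureOfRelation = 'owner';
""")
conn.execute("INSERT INTO UserGroup VALUES (10, 1, 'owner')")
conn.execute("INSERT INTO UserGroup VALUES (10, 2, 'participant')")
conn.execute("INSERT INTO UserGroup VALUES (10, 3, 'participant')")

# A second owner for the same group violates the partial unique index.
try:
    conn.execute("INSERT INTO UserGroup VALUES (10, 4, 'owner')")
    second_owner_rejected = False
except sqlite3.IntegrityError:
    second_owner_rejected = True

owners = [row[0] for row in conn.execute(
    "SELECT userID FROM UserGroup "
    "WHERE groupID = ? AND userGroupNatureOfRelation = 'owner'", (10,))]
```

Note that this design does not by itself guarantee that every group has an owner; it only caps the number at one.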

Related

The correct way to model User and Roles in SQL

I'm designing a Java application and the model data is stored in Oracle SQL Server. I'm trying to design the best user/role model according to what is necessary.
Because of business rules all users have basic common information:
Identification ID
Name
Surname
Email
IsActiveUser
But then depending on the role, the user will have extra fields like:
Client Role:
Birth Date
Address
Lawyer Role:
Specialty
Professional Registration ID
Expert Role:
Occupation
Manager Role:
Region
I can think of two possible solutions:
User table will have all the common fields and the optional fields that will be filled depending on the role.
User table will only have the common fields, and then I create a Detail_User table to save the optional fields that vary with the role.
Do you think these possible solutions are good? Is there a better alternative?
Answering in the Relational database paradigm, as you have tagged it.
"Do you think these possible solutions are good? Is there a better alternative?"
No. This is a classic case for Subtypes.
"I have a Role table and the User table has a FK to this, because every user will have only one role."
That won't solve your problem. You need to store the values for each instance of a Role, each instance of a User.
Further, you will appreciate the correct solution only when you wish to constrain some child table (eg. Portfolio.LawyerId) to Lawyer, not User.
Data Model
The Data Model in IDEF1X/ER Level (not ERD) is:
Note
The Standard for Relational Data Modelling since 1983 is IDEF1X. For those unfamiliar with the Standard, refer to the short IDEF1X Introduction.
For full definition and usage considerations re Subtypes, refer to Subtype Definition.
Unless you are going to have zillions of rows, there is no need to split the tables in the two that you specify.
You might consider a separate table for each role -- or more specifically for each role that has bespoke columns. This would give you a little efficiency in storage space in many cases (versus NULL) in a wider record, although that depends on the database and data types being used.
A more important reason to split them is for foreign key references. If you have other tables where "lawyer" would be a foreign key, then you need a lawyers table for that. Voila! Having a separate table for different roles allows such specialized relationships, as well as general purpose relationships for all users.
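A small sketch of the subtype idea both answers point at, run on SQLite through Python's sqlite3 module. AppUser stands in for the User table, and all column types, key names and sample rows are assumptions. Each role table shares its primary key with the common table, so a child table such as Portfolio can be constrained to Lawyer rather than to any user:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when asked
conn.executescript("""
-- Common fields shared by every user.
CREATE TABLE AppUser (
    user_id   INTEGER PRIMARY KEY,
    name      TEXT NOT NULL,
    surname   TEXT NOT NULL,
    email     TEXT NOT NULL,
    is_active INTEGER NOT NULL DEFAULT 1
);
-- One subtype table per role; its key is also a FK to the common table.
CREATE TABLE Lawyer (
    lawyer_id INTEGER PRIMARY KEY REFERENCES AppUser (user_id),
    specialty TEXT NOT NULL,
    professional_registration_id TEXT NOT NULL
);
CREATE TABLE Client (
    client_id  INTEGER PRIMARY KEY REFERENCES AppUser (user_id),
    birth_date TEXT NOT NULL,
    address    TEXT NOT NULL
);
-- A child table that may only point at lawyers.
CREATE TABLE Portfolio (
    portfolio_id INTEGER PRIMARY KEY,
    lawyer_id    INTEGER NOT NULL REFERENCES Lawyer (lawyer_id),
    title        TEXT NOT NULL
);
""")
conn.execute("INSERT INTO AppUser VALUES (1, 'Ada', 'Lopez', 'ada@example.com', 1)")
conn.execute("INSERT INTO AppUser VALUES (2, 'Ben', 'Ortiz', 'ben@example.com', 1)")
conn.execute("INSERT INTO Lawyer VALUES (1, 'Tax law', 'REG-001')")
conn.execute("INSERT INTO Client VALUES (2, '1990-01-01', '1 Main St')")
conn.execute("INSERT INTO Portfolio VALUES (100, 1, 'Estate cases')")

# User 2 is a client, not a lawyer, so this row is rejected by the FK.
try:
    conn.execute("INSERT INTO Portfolio VALUES (101, 2, 'Should fail')")
    client_portfolio_rejected = False
except sqlite3.IntegrityError:
    client_portfolio_rejected = True
```

This sketch does not enforce that a user has exactly one role; that rule would need an additional discriminator column or constraint.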

Is it a good practice to map UserAccount Table with all other tables in SQL Server?

I have a UserAccount table and other tables like Employee, Student, etc. I want an audit trail showing who created a given student or employee record. Is it good practice to have UserAccountId as a foreign key in all the other tables (Employee, Student, etc.)? I am using Hibernate, and if I map it this way I have to maintain a one-to-many relationship between UserAccount and every other class, so the amount of code grows, which is a burden for me.
Well, it breaks all normalisation rules. Have a link table instead: UserAccountID, EmployeeID (NULL), StudentID (NULL). Have one big link table like this. Every foreign key except UserAccountID (which is both primary key and foreign key) needs to be nullable.
"Good habit/practice" is subjective.
If the business domain includes the fact that the person who created an entity is a meaningful piece of information, and that this is likely to be a regular request by end users, then adding a "createdBy" attribute to your tables/classes is, indeed, good practice.
The best way to know whether this is true is to ask the product owner whether they would need a screen showing "all employees created by user x". If they say "no, only if something goes wrong", you have an audit requirement; if they say "yes, we'll use that regularly", it's an integral part of your business domain.
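If the answer is "yes, we'll use that regularly", the createdBy attribute could look roughly like the sketch below (SQLite through Python's sqlite3 module). The column names and sample rows are assumptions for illustration only:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE UserAccount (
    UserAccountId INTEGER PRIMARY KEY,
    Login         TEXT NOT NULL
);
CREATE TABLE Employee (
    EmployeeId INTEGER PRIMARY KEY,
    Name       TEXT NOT NULL,
    -- Who created this row; part of the business domain, not just an audit log.
    CreatedBy  INTEGER NOT NULL REFERENCES UserAccount (UserAccountId)
);
INSERT INTO UserAccount VALUES (1, 'admin'), (2, 'clerk');
INSERT INTO Employee VALUES (10, 'Dana', 1), (11, 'Eli', 2), (12, 'Fay', 1);
""")

# The "all employees created by user x" screen becomes a single indexed lookup.
employees_created_by_admin = [name for (name,) in conn.execute(
    "SELECT Name FROM Employee WHERE CreatedBy = ? ORDER BY Name", (1,))]
```

On the application side the column can be mapped as a plain identifier rather than a full object relationship if the extra mapping code is a concern.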
You may find that your users want to know not just who created a row, but also who modified it. In that case, there are similar questions on SO.

How should I record in the database that an item/product is visible by 'all' groups?

A user can be in groups. And an item/product is assigned groups that can see the item. Users can see that item if they are in one of the assigned groups.
I want neither public (anonymous users in no groups) nor groupless users (logged in users not in any groups) to see the item. But I want the interface to allow assigning the item an 'all/any groups' attribute so that users that are in any group at all can see the item.
Where/How should I store this assignment?
p.s. I expect the technique to also be extended to other entities, for example I'd assign a file to a category, and groups are linked to categories. so when a file is marked as visible by the 'all/any category' then if the user (thru groups and group-categories) is linked to at least one category then the file is visible to them.
Decision:
It seemed the choice was whether to implement it as a row in an entity-groups table or as fields in the entity table. The chosen answer used the former.
And either managing the group membership in a table or adding JOIN conditions. The chosen answer used the former, but I'm going to use the latter. I'm putting an indirection between the query and usage so if (when) performance is a problem I should be able to change to a managed table underneath (as suggested) without changing usage.
I've other special groups like 'admin', 'users', etc. which can also fit into the same concept (the basis simply being a list of groups) more easily than special and variable field handling for each entity.
thanks all.
I'd put it in the items table as a boolean/bit column IsVisibleToAllGroups.
It does make queries to get all items for a user a bit less straightforward. The alternative would be to expand out "all groups" by adding a permission row for each individual group, but that can lead to a huge expansion in the number of rows. You would also have to keep the rows up to date whenever a group is added later, and you would need some way to distinguish a permission that was granted explicitly to all (current and future) groups from one that just happened to be granted to every group currently in existence.
Edit You don't mention the RDBMS you are using. One other approach you could take would be to have a hierarchy of groups.
GroupId     ParentGroupId Name
----------- ------------- ----------
0           NULL          Base Group
1           0             Group 1
2           0             Group 2
You could then assign your "all" permissions to GroupId=0 and use (SQL Server approach below)
WITH GroupsForUser
     AS (-- Anchor: the groups the user belongs to directly
         SELECT G.GroupId,
                G.ParentGroupId
         FROM   UserGroups UG
                JOIN Groups G
                  ON G.GroupId = UG.GroupId
         WHERE  UG.UserId = @UserId
         UNION ALL
         -- Recursive step: walk up to each parent group
         SELECT G.GroupId,
                G.ParentGroupId
         FROM   Groups G
                JOIN GroupsForUser GU
                  ON G.GroupId = GU.ParentGroupId)
SELECT IG.ItemId
FROM   GroupsForUser GU
       JOIN ItemGroups IG
         ON IG.GroupId = GU.GroupId
As mentioned by both Martin Smith and Mikael Eriksson, making this a property of the entity is a very tidy and straight forward approach. Purely in terms of data representation, this has a very nice feel to it.
I would, however, also consider the queries that you are likely to make against the data. For example, based on your description, you seem most likely to have queries that start with a single user, find the groups they are a member of, and then find the entities they are associated with. Possibly something like this...
SELECT DISTINCT  -- If both user and entity relate to multiple groups, de-dupe them
  entity.*
FROM
  user
INNER JOIN
  user_link_group
    ON user.id = user_link_group.user_id
INNER JOIN
  group_link_entity
    ON group_link_entity.group_id = user_link_group.group_id
INNER JOIN
  entity
    ON entity.id = group_link_entity.entity_id
WHERE
  user.id = @user_id
If you were to use this format, and the idea of a property in the entity table, you would need something much less elegant, and I think the following UNION approach is possibly the most efficient...
<ORIGINAL QUERY>
UNION  -- Not UNION ALL, as the next query may duplicate results from above
SELECT
  entity.*
FROM
  entity
WHERE
  EXISTS (SELECT * FROM user_link_group WHERE user_id = @user_id)
  AND isVisibleToAllGroups != 0
-- NOTE: This also implies the need for an additional index on [isVisibleToAllGroups]
Rather than create the corner case in the "what entity can I see" query, it is instead an option to create the corner case in the maintenance of the link tables...
Create a GLOBAL group
If an entity is visible to all groups, map it to the GLOBAL group
If a user is added to a group, ensure they are also linked to the GLOBAL group
If a user is removed from all groups, ensure they are also removed from the GLOBAL group
In this way, the original simple query works without modification. This means that no UNION is needed, with its overhead of sorting and de-duplication, and neither is the INDEX on isVisibleToAllGroups. Instead, the overhead moves to maintaining which groups a user is linked to: a one-time cost per membership change rather than a cost on every query.
This assumes that the question "what entities can I see" is asked more often than group memberships change. It also adds a behaviour that is defined by the DATA and not by the SCHEMA, which necessitates good documentation and understanding. As such, I do see this as a powerful type of optimisation, but I also see it as a trade-off that needs accounting for in the database design.
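The GLOBAL-group maintenance rules can be sketched as follows, using SQLite through Python's sqlite3 module. The helper functions, the reserved id 0 for the GLOBAL group and the sample ids are all hypothetical; only the table names come from the query earlier in this answer:

```python
import sqlite3

GLOBAL_GROUP_ID = 0  # assumption: id reserved for the GLOBAL group

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE user_link_group (
    user_id  INTEGER NOT NULL,
    group_id INTEGER NOT NULL,
    PRIMARY KEY (user_id, group_id)
);
CREATE TABLE group_link_entity (
    group_id  INTEGER NOT NULL,
    entity_id INTEGER NOT NULL,
    PRIMARY KEY (group_id, entity_id)
);
""")

def add_user_to_group(user_id, group_id):
    """Link the user to the group and make sure the GLOBAL link exists too."""
    for gid in (group_id, GLOBAL_GROUP_ID):
        conn.execute("INSERT OR IGNORE INTO user_link_group VALUES (?, ?)",
                     (user_id, gid))

def remove_user_from_group(user_id, group_id):
    """Unlink the user; drop the GLOBAL link once no real group remains."""
    conn.execute("DELETE FROM user_link_group WHERE user_id = ? AND group_id = ?",
                 (user_id, group_id))
    remaining = conn.execute(
        "SELECT COUNT(*) FROM user_link_group WHERE user_id = ? AND group_id <> ?",
        (user_id, GLOBAL_GROUP_ID)).fetchone()[0]
    if remaining == 0:
        conn.execute("DELETE FROM user_link_group WHERE user_id = ? AND group_id = ?",
                     (user_id, GLOBAL_GROUP_ID))

def visible_entities(user_id):
    """The original simple join, with no special case for 'all groups'."""
    rows = conn.execute("""
        SELECT DISTINCT gle.entity_id
        FROM user_link_group ulg
        JOIN group_link_entity gle ON gle.group_id = ulg.group_id
        WHERE ulg.user_id = ?
        ORDER BY gle.entity_id""", (user_id,))
    return [row[0] for row in rows]

# Entity 100 is visible to group 1 only; entity 200 is visible to all groups.
conn.execute("INSERT INTO group_link_entity VALUES (1, 100)")
conn.execute("INSERT INTO group_link_entity VALUES (?, 200)", (GLOBAL_GROUP_ID,))
add_user_to_group(7, 2)
add_user_to_group(8, 1)
seen_by_7 = visible_entities(7)
seen_by_8 = visible_entities(8)
remove_user_from_group(7, 2)
seen_by_7_after_removal = visible_entities(7)
```

In a real system these rules would more likely live in triggers or in a single data-access layer, so that no code path can bypass them.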
Instead of a boolean, which needs additional logic in every query, I'd add a column 'needs_group' which contains the name (or number) of the group that is required for the item. Whether a NULL field means 'nobody' or 'everybody' is only an (allow/deny) design decision. Creating one 'public' group and putting everybody in it is also a design decision. YMMV
This concept should get you going:
The user can see the product if:
the corresponding row exists in USER_GROUP_PRODUCT
or PRODUCT.PUBLIC is TRUE (and user is in at least one group, if I understand your question correctly).
There are 2 key points to consider about this model:
Liberal usage of identifying relationships - primary keys of parents are "migrated" within primary keys of children, which enables "merging" of GROUP_ID at the bottom USER_GROUP_PRODUCT. This is what allows the DBMS to enforce the constraint that both user and product have to belong to the same group to be mutually visible. Usage of non-identifying relationships and surrogate keys would prevent the DBMS from being able to enforce that directly (you'd have to write custom triggers).
Usage of PRODUCT.PUBLIC - you'll have to treat this field as "magic" in your client code. The alternative is to simply fill the USER_GROUP_PRODUCT with all the possible combinations, but this approach is fragile in case a new user is added - it would not automatically see the product unless you update the USER_GROUP_PRODUCT as well, but how would you know you need to update it unless you have a field such as PRODUCT.PUBLIC? So if you can't avoid PRODUCT.PUBLIC anyway, why not treat it specially and save some storage space in the database?

What to do if 2 (or more) relationship tables would have the same name?

So I know the convention for naming M-M relationship tables in SQL is to have something like so:
For tables User and Data the relationship table would be called
UserData
User_Data
or something similar (from here)
What happens then if you need to have multiple relationships between User and Data, representing each in its own table? I have a site I'm working on where I have two primary items and multiple independent M-M relationships between them. I know I could just use a single relationship table and have a field which determines the relationship type, but I'm not sure whether this is a good solution. Assuming I don't go that route, what naming convention should I follow to work around my original problem?
To make it more clear, say my site is an auction site (it isn't but the principle is similar). I have registered users and I have items, a user does not have to be registered to post an item but they do need to be to do anything else. I have table User which has info on registered users and Items which has info on posted items. Now a user can bid on an item, but they can also report a item (spam, etc.), both of these are M-M relationships. All that happens when either event occurs is that an email is generated, in my scenario I have no reason to keep track of the actual "report" or "bid" other than to know who bid/reported on what.
I think you should name tables after their function. Let's say we have Cars and People tables. A car has owners and a car has assigned drivers. A driver can have more than one car. You could call one of the tables CarsDrivers and the second CarsOwners.
EDIT
In your situation I think you should have two tables: AuctionsBids and AuctionsReports. I believe that a report requires an additional dictionary (spam, illegal item, ...) and a bid requires other parameters like price and bid date. So having two tables is justified. You will probably be accessing bids more often than reports. Sending email will be slightly more complicated than when this data is stored in one table, but it is not really a big problem.
I don't really see this as a true M-M mapping table. Those usually are JUST a mapping. From your example most of these will have additional information as well. For example, a table of bids, which would have a User and an Item, will probably have info on what the bid was, when it was placed, etc. I would call this table... wait for it... Bids.
For reporting items you might want what was offensive about it, when it was placed, etc. Call this table OffenseReports or something.
You can name tables whatever you want. I would just name them something that makes sense. I think the convention of naming them Table1Table2 is just because sometimes the relationships don't make a lot of sense to an outside observer.
There's no official or unofficial convention on relations or tables names. You can name them as you want, the way you like.
If you have multiple user_data relationships with the same keys that makes absolutely no sense. If you have different keys, name the relation in a descriptive way like: stores_products_manufacturers or stores_products_paymentMethods
I think you're only confused because the join tables are currently simple. Once you add more information, I think it will be obvious that you should append a functional suffix. For example:
Table User
    UserID
    EmailAddress
Table Item
    ItemID
    ItemDescription
Table UserItem_SpamReport
    UserID
    ItemID
    ReportDate
Table UserItem_Post
    UserID -- can be (NULL, -1, '', ...)
    ItemID
    PostDate
Table UserItem_Bid
    UserId
    ItemId
    BidDate
    BidAmount
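As a sketch of how two of these suffixed tables sit side by side between the same User and Item tables, here is a version that runs on SQLite through Python's sqlite3 module. The column types, the primary keys and the sample rows are assumptions, not part of the outline above:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE "User" (UserID INTEGER PRIMARY KEY, EmailAddress TEXT NOT NULL);
CREATE TABLE Item (ItemID INTEGER PRIMARY KEY, ItemDescription TEXT NOT NULL);
-- Relationship 1: a user reported an item.
CREATE TABLE UserItem_SpamReport (
    UserID     INTEGER NOT NULL REFERENCES "User" (UserID),
    ItemID     INTEGER NOT NULL REFERENCES Item (ItemID),
    ReportDate TEXT NOT NULL,
    PRIMARY KEY (UserID, ItemID)
);
-- Relationship 2: a user bid on an item (possibly several times).
CREATE TABLE UserItem_Bid (
    UserID    INTEGER NOT NULL REFERENCES "User" (UserID),
    ItemID    INTEGER NOT NULL REFERENCES Item (ItemID),
    BidDate   TEXT NOT NULL,
    BidAmount NUMERIC NOT NULL,
    PRIMARY KEY (UserID, ItemID, BidDate)
);
INSERT INTO "User" VALUES (1, 'a@example.com'), (2, 'b@example.com');
INSERT INTO Item VALUES (5, 'Old lamp');
INSERT INTO UserItem_Bid VALUES
    (1, 5, '2024-01-01', 10), (1, 5, '2024-01-02', 12), (2, 5, '2024-01-02', 11);
INSERT INTO UserItem_SpamReport VALUES (2, 5, '2024-01-03');
""")

def emails_for(link_table, item_id):
    """Who is linked to the item through the given relationship table."""
    # link_table is interpolated, so it is restricted to the two known table names.
    assert link_table in ("UserItem_Bid", "UserItem_SpamReport")
    rows = conn.execute(f"""
        SELECT DISTINCT u.EmailAddress
        FROM {link_table} t
        JOIN "User" u ON u.UserID = t.UserID
        WHERE t.ItemID = ?
        ORDER BY u.EmailAddress""", (item_id,))
    return [row[0] for row in rows]

bidders = emails_for("UserItem_Bid", 5)
reporters = emails_for("UserItem_SpamReport", 5)
```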
Then the relation will have a Role. For instance a stock has 2 companies associated: an issuer and a buyer. The relationship is defined by the role the parent and child play to each other.
You could either put each role in a separate table that you name after the role (e.g. Stock_Issuer and Stock_Buyer, each a one-to-many relationship from company to stock), or keep a single relationship table with a relation-type column.
The stock example is pretty fixed, so two tables would be fine. When there are multiple types of relations possible and you can't foresee them now, normalizing it into a relationtype column would seem the better option.
This also depends on the quality of the developers having to work with your model. The column approach is a bit more abstract... but if they don't get it maybe they'd better stay away from databases altogether..
Both will work fine I guess.
Good luck, GJ

MySQL - Structure for Permissions to Objects

What would be an ideal structure for users > permissions of objects.
I've seen many related posts for general permissions, or what sections a user can access, which consists of a users, userGroups and userGroupRelations or something of that nature.
In my system there are many different objects that can get created, and each one has to be able to be turned on or off. For instance, take a password manager that has groups and sub groups.
Group 1
Group 2
Group 3
Group 4
Group 5
Group 6
Group 7
Group 8
Group 9
Group 10
Each group can contain a set of passwords. A user can be given read, write, edit and delete permissions to any group. More groups can get created at any point in time.
If someone has permission to a group, I should be able to make him have permissions to all sub groups OR restrict it to just that group.
My current thought is to have a users table, and then a permissions table with columns like:
permission_id (int) PRIMARY_KEY
user_id (int) INDEX
object_id (int) INDEX
type (varchar) INDEX
admin (bool)
read (bool)
write (bool)
edit (bool)
delete (bool)
This has worked in the past, but the new system I'm building needs to be able to scale rapidly, and I am unsure if this is the best structure. It also makes the idea of having someone with all subgroup permissions of a group more difficult.
There will be a separate table for roles of users/admins, which means they can change the permissions on users below groups they can control.
So, as a question, should I use the above structure? Or can someone point me in the direction of a better one?
EDIT
Alternative is to create a permission table for every type of object.
I suggest you add a "last_update" timestamp and a "last_updated_by_user" column so you have some hope of tracking changes to this table in your running system.
You could consider adding a permission -- grant. A user having the grant permission for an object would be able to grant access to other users to the object in question.
Be careful with "needs to scale rapidly." It's hard to guess without real-world production experience what a scaled-up system really needs.
Also, be careful not to over-complicate a permissions system, because an overly complex system will be hard to verify and therefore easier to crack. A simple system will be much easier to refactor for scaleup than a more complex one.
Your schema seems to relate users to objects. Do you want your primary key and your unique index to be (user_id, object_id)? That is, do you want each user to have either zero or one permission entry for each object? If so, use the primary key to enforce that, rather than using the surrogate permission_id key you propose.
For your objects that exist in hierarchies, you should make one of two choices systemwide:
a grant to an object with subobjects implicitly grants access to only the object, or...
it also grants access to all subobjects.
The second choice reduces the burden of explicit permission granting when new subobjects are created. The first choice is more secure.
The second choice makes it harder to determine whether a user has access to a particular object, because you have to walk the object hierarchy toward the root of the tree looking for access grants on parent objects when verifying whether a user has access. That performance issue should dominate your decision making. Will your users create a few objects and access them often? Or will they create many objects and subobjects and access them rarely? If access is more frequent than creation, you want the first choice. Take the permission-granting overhead hit at object creation time, rather than a permission-searching hit at object access time.
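To make the cost difference concrete, here is a sketch of both access checks on SQLite through Python's sqlite3 module. The objects table with a parent_id column, the can_read column name and the sample ids are assumptions. The first choice is a single primary-key lookup, while the second has to walk toward the root with a recursive query:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE objects (
    object_id INTEGER PRIMARY KEY,
    parent_id INTEGER REFERENCES objects (object_id)
);
CREATE TABLE permissions (
    user_id   INTEGER NOT NULL,
    object_id INTEGER NOT NULL,
    can_read  INTEGER NOT NULL DEFAULT 0,
    PRIMARY KEY (user_id, object_id)
);
-- Hierarchy 1 -> 2 -> 3, plus an unrelated root object 4.
INSERT INTO objects VALUES (1, NULL), (2, 1), (3, 2), (4, NULL);
-- User 42 has a read grant on the top-level object only.
INSERT INTO permissions VALUES (42, 1, 1);
""")

def can_read_explicit(user_id, object_id):
    """First choice: a grant covers only the object it names."""
    row = conn.execute("""
        SELECT EXISTS (
            SELECT 1 FROM permissions
            WHERE user_id = ? AND object_id = ? AND can_read = 1)""",
        (user_id, object_id)).fetchone()
    return bool(row[0])

def can_read_inherited(user_id, object_id):
    """Second choice: a grant on any ancestor also covers the object."""
    row = conn.execute("""
        WITH RECURSIVE ancestors (object_id, parent_id) AS (
            SELECT object_id, parent_id FROM objects WHERE object_id = ?
            UNION ALL
            SELECT o.object_id, o.parent_id
            FROM objects o
            JOIN ancestors a ON o.object_id = a.parent_id
        )
        SELECT EXISTS (
            SELECT 1
            FROM ancestors a
            JOIN permissions p ON p.object_id = a.object_id
            WHERE p.user_id = ? AND p.can_read = 1)""",
        (object_id, user_id)).fetchone()
    return bool(row[0])

explicit_on_leaf = can_read_explicit(42, 3)
inherited_on_leaf = can_read_inherited(42, 3)
```

With the first choice, the grant on object 1 would have to be copied to objects 2 and 3 when they are created, which is the creation-time overhead described above.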
I think the first choice is probably superior. I suggest this table layout:
user_id (int)
object_id (int)
type (varchar) (not sure what you have this column for)
admin (bool)
read (bool)
write (bool)
edit (bool)
grant (bool)
delete (bool)
last_update (timestamp)
last_updated_by_user_id (int)
primary key = user_id, object_id.
You could also use this table layout, and have a row in the table for each distinct permission granted to each user for each object. This one scales up more easily if you add more types of permissions.
user_id (int)
object_id (int)
permission_enum (admin/read/write/edit/grant/delete)
type (varchar) (not sure what you have this column for)
last_update (timestamp)
last_updated_by_user_id (int)
primary key = user_id, object_id, permission_enum
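A sketch of the second layout, with one row per granted permission, on SQLite through Python's sqlite3 module. The column types, the CHECK constraint standing in for an enum, the grant helper and the sample ids are assumptions; the type column is left out because its purpose is unclear:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE permissions (
    user_id         INTEGER NOT NULL,
    object_id       INTEGER NOT NULL,
    permission_enum TEXT NOT NULL
        CHECK (permission_enum IN
               ('admin', 'read', 'write', 'edit', 'grant', 'delete')),
    last_update             TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP,
    last_updated_by_user_id INTEGER NOT NULL,
    -- The natural key replaces a surrogate permission_id.
    PRIMARY KEY (user_id, object_id, permission_enum)
);
""")

def grant(user_id, object_id, permission, granted_by):
    """Record one permission for one user on one object."""
    conn.execute(
        "INSERT INTO permissions "
        "(user_id, object_id, permission_enum, last_updated_by_user_id) "
        "VALUES (?, ?, ?, ?)",
        (user_id, object_id, permission, granted_by))

grant(7, 100, "read", 1)
grant(7, 100, "write", 1)

# Granting the same permission twice violates the composite primary key.
try:
    grant(7, 100, "read", 1)
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

granted = sorted(p for (p,) in conn.execute(
    "SELECT permission_enum FROM permissions WHERE user_id = ? AND object_id = ?",
    (7, 100)))
```

Adding a new kind of permission then only means widening the allowed values, not adding a column.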