How should I record in the database that an item/product is visible by 'all' groups? - sql

A user can be in groups. And an item/product is assigned groups that can see the item. Users can see that item if they are in one of the assigned groups.
I want neither public (anonymous users in no groups) nor groupless users (logged in users not in any groups) to see the item. But I want the interface to allow assigning the item an 'all/any groups' attribute so that users that are in any group at all can see the item.
Where/How should I store this assignment?
p.s. I expect the technique to also be extended to other entities, for example I'd assign a file to a category, and groups are linked to categories. so when a file is marked as visible by the 'all/any category' then if the user (thru groups and group-categories) is linked to at least one category then the file is visible to them.
Decision:
It seemed the choice was whether to implement as a row in a entity-groups table or as fields in the entity table. The chosen answer used the former.
And either managing the group membership in a table or adding JOIN conditions. The chosen answer used the former, but I'm going to use the latter. I'm putting an indirection between the query and usage so if (when) performance is a problem I should be able to change to a managed table underneath (as suggested) without changing usage.
I've other special groups like 'admin', 'users', etc. which can also fit into the same concept (the basis simply being a list of groups) more easily than special and variable field handling for each entity.
thanks all.

I'd put it in the items table as a boolean/bit column IsVisibleToAllGroups.
It does make queries to get all items for a user a bit less straightforward but the other alternative would be to expand out "all groups" so you add a permission row for each individual group but this can potentially lead to a huge expansion in the number of rows and you still have to keep this up-to-date if an additional group is added later and somehow distinguish between a permission that was granted explicitly to all (current and future) groups and one that just happened to be granted to all groups currently in existence.
Edit You don't mention the RDBMS you are using. One other approach you could take would be to have a hierarchy of groups.
GroupId ParentGroupId Name
----------- ------------- ----------
0 NULL Base Group
1 0 Group 1
2 0 Group 2
You could then assign your "all" permissions to GroupId=0 and use (SQL Server approach below)
WITH GroupsForUser
AS (SELECT G.GroupId,
G.ParentGroupId
FROM UserGroups UG
JOIN Groups G
ON G.GroupId = UG.GroupId
WHERE UserId = #UserId
UNION ALL
SELECT G.GroupId,
G.ParentGroupId
FROM Groups G
JOIN GroupsForUser GU
ON G.GroupId = GU.ParentGroupId)
SELECT IG.ItemId
FROM GroupsForUser GU
JOIN ItemGroups IG
ON IG.GroupId = GU.GroupId

As mentioned by both Martin Smith and Mikael Eriksson, making this a property of the entity is a very tidy and straight forward approach. Purely in terms of data representation, this has a very nice feel to it.
I would, however, also consider the queries that you are likely to make against the data. For example, based on your description, you seem most likely to have queries that start with a single user, find the groups they are a member of, and then find the entities they are associated to. Possibly something lke this...
SELECT DISTINCT -- If both user and entity relate to multiple groups, de-dupe them
entity.*
FROM
user
INNER JOIN
user_link_group
ON user.id = user_link_group.user_id
INNER JOIN
group_link_entity
ON group_link_entity.group_id = user_link_group.group_id
INNER JOIN
entity
ON entity.id = group_link_entity.entity_id
WHERE
user.id = #user_id
If you were to use this format, and the idea of a property in the entity table, you would need something much less elegant, and I think the following UNION approach is possibly the most efficient...
<ORIGINAL QUERY>
UNION -- Not UNION ALL, as the next query may duplicate results from above
SELECT
entity.*
FROM
entity
WHERE
EXISTS (SELECT * FROM user_link_group WHERE user_id = #user_id)
AND isVisibleToAllGroups != 0
-- NOTE: This also implies the need for an additional index on [isVisibleToAllGroups]
Rather than create the corner case in the "what entity can I see" query, it is instead an option to create the corner case in the maintenance of the link tables...
Create a GLOBAL group
If an enitity is visible to all groups, map them to the GLOBAL group
If a user is added to a group, ensure they are also linked to the GLOBAL group
If a user is removed from all groups, ensure they are also removed from the GLOBAL group
In this way, the original simple query works without modification. This means that no UNION is needed, with it's overhead of sorting and de-duplication, and neither is the INDEX on isVisibleToAllGroups needed. Instead, the overhead is moved to maintaining which groups a user is linked to; a one time overhead instead.
This assumes that the question "what entities can I see" is more common than changing groups. It also adds a behaviour that is defined by the DATA and not by the SCHEMA, which necessitates good documentation and understanding. As such, I do see this as a powerful type of optimisation, but I also see it as a trades-and-balances type of compromise that needs accounting for in the database design.

Instead of a boolean, which needs additional logic in every query, I'd add a column 'needs_group' which contains the name (or number) of the group that is required for the item. Whether a NULL field means 'nobody' or 'everybody' is only a (allow/deny) design-decision. Creating one 'public' group and putting everybody in it is also a design decision. YMMV

This concept should get you going:
The user can see the product if:
the corresponding row exists in USER_GROUP_PRODUCT
or PRODUCT.PUBLIC is TRUE (and user is in at least one group, if I understand your question correctly).
There are 2 key points to consider about this model:
Liberal usage of identifying relationships - primary keys of parents are "migrated" within primary keys of children, which enables "merging" of GROUP_ID at the bottom USER_GROUP_PRODUCT. This is what allows the DBMS to enforce the constraint that both user and product have to belong to the same group to be mutually visible. Usage of non-identifying relationships and surrogate keys would prevent the DBMS from being able to enforce that directly (you'd have to write custom triggers).
Usage of PRODUCT.PUBLIC - you'll have to treat this field as "magic" in your client code. The alternative is to simply fill the USER_GROUP_PRODUCT with all the possible combinations, but this approach is fragile in case a new user is added - it would not automatically see the product unless you update the USER_GROUP_PRODUCT as well, but how would you know you need to update it unless you have a field such as PRODUCT.PUBLIC? So if you can't avoid PRODUCT.PUBLIC anyway, why not treat it specially and save some storage space in the database?

Related

Get all Many:Many relationships from reference/join-table

I am having difficulty querying for possible relationships in a Many:Many scenario.
I present my schema:
What I do know how to query with this schema is:
All Bands that a given User belongs to.
All Users that belong to a given Band.
What I am trying to do is:
Get all Band Members across all Bands that a given User belongs to.
ie, say I am in 5 bands, I want to know who all of my bandmates are.
My first questions are:
Is there a name for this type of query? Where I am more interested in the joined relationships than what I am joined to (just saying that made me want to put this whole system into a Graph DB :/ )? I'd like to learn proper terminology to help me google for problems down the road.
Is this a terrible idea in RDBMS land in general? I feel like this should be a common use case but I want to know if I'm totally approaching this wrong.
To recap:
I am looking to query the above schema with the expected output being one row per User as Band Members that a given User shares a Band with.
Terminology
Your terminology seems to be correct - "many to many", often written as "many:many" with a colon. Sometimes the middle table (band_members) is called the "bridge table".
You can probably drop band_members.id, since the two foreign keys also make up a composite primary key (and the primary key can actually be defined that way, since normally a User cannot be a member of the same Band twice. The only exception to that is if a User could have more than one role in the same Band).
Solution
On the surface of it, this sounds easy - we can see the relationships of the tables, and one would normally just use an INNER JOIN between them. There are three tables, so that would be two joins.
However, we have to conceptualise the problem correctly first. The problem we have is that the join between Users and Band Members (user ID) is actually to be used for two things:
which User is in what Band
filtering by User
So to do this we need to introduce one table with multiple purposes:
SELECT
Users.first_name, Users.last_name
FROM Users
INNER JOIN Band_Members Band_Members1 ON (Band_Members1.user_id = Users.Id)
INNER JOIN Band_Members Band_Members2 ON (Band_Members1.band_id = Band_Members2.band_id)
WHERE
Band_Members2.user_id = 1
You can see here that I have joined Band_Members twice, and when one does that, one has to alias them differently, so they can be separately referenced. The first instance does the obvious join between the Users table and the bridge table, and the second one does a link between "Users who are in Bands" and "Bands that I am in".
Of course, this solution requires that you know your User ID. If you had wanted to do a similar query but filter based on your name, then you would have to join to another (re-aliased) copy of the User table, so that you can differentiate between the two different purposes: "Users who are in bands" and "your User".

How do I give different users access to different rows without creating separate views in BigQuery?

In this question: How do I use row-level permissions in BigQuery? it describes how to use an authorized view to grant access to only a portion of a table. But I'd like to give different users access to different rows. Does this mean I need to create separate views for each user? Is there an easier way?
Happily, if you want to give different users access to different rows in your table, you don't need to create separate views for each one. You have a couple of options.
These options all make use of the SESSION_USER() function in BigQuery, which returns the e-mail address of the currently running user. For example, if I run:
SELECT SESSION_USER()
I get back tigani#google.com.
The simplest option, then, for displaying different rows to different users, is to add another column to your table that is the user who is allowed to see the row. For example, the schema: {customer:string, id:integer} would become {customer:string, id:integer, allowed_viewer: string}. Then you can define a view:
#standardSQL
SELECT customer, id
FROM private.customers
WHERE allowed_viewer = SESSION_USER()
(note, don't forget to authorize the view as described here).
Then I'd be able to see only the fields where tigani#google.com was the value in the allowed_viewer column.
This approach has its own drawbacks, however; You can only grant access to a single user at a time. One option would be to make the allowed_viewer column a repeated field; this would let you provide a list of users for each row.
However, this is still pretty restrictive, and requires a lot of bookkeeping about which users should have access to which row. Chances are, what you'd really like to do is specify a group. So your schema would look like: {customer:string, id:integer, allowed_group: string}, and anyone in the allowed_group would be able to see your table.
You can make this work by having another table that has your group mappings. That table would look like: {group:string, user_name:string}. The rows might look like:
{engineers, tigani#google.com}
{engineers, some_engineer#google.com}
{administrators, some_admin#google.com}
{sales, some_salesperson#google.com}
...
Let's call this table private.access_control. Then we can change our view definition:
#standardSQL
SELECT c.customer, c.id
FROM private.customers c
INNER JOIN (
SELECT group
FROM private.access_control
WHERE SESSION_USER() = user_name) g
ON c.allowed_group = g.group
(note you will want to make sure that there are no duplicates in private.access_control, otherwise it could records to repeat in the results).
In this way, you can manage the groups in the private.access_control separately from the data table (private.customers).
There is still one piece missing that you might want; the ability for groups to contain other groups. You can get this by doing a more complex join to expand the groups in the access control table (you might want to consider doing this only once and saving the results, to save the work each time the main table is queried).

Relationship redundant?

I'm designing a database and I have a user table with users, and a group table with user's group.
These groups will have a owner (a user that has created it), and a set of users that are part of the group (like a Whatsapp group).
To represent this I have this design:
Do you think the owner column on Group table is necessary? Maybe I can add an owner column on Group table I can know easily the group's owner.
If you don't add the owner in the group then where are you going to add it? The only way I see apart from this is adding a boolean isowner to the usergroup. Anyway, this would not make sense if there will only be 1 owner. If there can be N owners then that would be the way to go.
You are on the right track, but you'll need one more step to ensure an owner must actually belong to the group she owns:
There is a FOREIGN KEY in Group {groupID, owner} that references UserGroup {groupID, userID}.
If your DBMS supports deferred foreign keys, you can make owner NOT NULL, to ensure a group cannot be owner-less. Otherwise you can leave it NULL-able (and you will still be able to break the "chicken-and-egg" problem with circular references if your DBMS supports MATCH SIMPLE FKs - almost all do).
You need 4 tables:
User
UserGroup
Group
UserRole (associated with UserGroup) - Shows the role of a user in a group (admin/owner, etc.) - If your roles are Admin and Ordinary user, you could use a Binary column on UserGroup instead.
I know a solution has already been proposed, but I am so convinced there is a better one ...
At the higher level, the concept of owner can be seen as a property of the relation existing between users and groups. It should then ideally be set as a field in the UserGroup table.
It could be either a boolean field or, even better, a generalized userGroupNatureOfRelation field, that could hold values such as 'owner', 'participant', 'user', or whatever could be the status.
Of course, such a solution allows you to implement any specific business rule, like 'there is only one owner per group'. It will let you implement any other more sophisticated business rule when needed, and it will even allow you to add a level of complexity by adding fields such as:
userGroupRelationStartDate
userGroupRelationEndDate
where you'll be able to follow the nature of the relation between a group and a person through time ...
Of course you could say 'I don't need it'. But implementing such an 'open' model does not cost anything more than what you are thinking of now. Then, if for any reason, in a near or far future, business rules have to be changed or improved, your model stays valid and efficient ...
This said, building views and manipulating data should be a lot easier with this model. So this a good and immediate reason to adopt it!

Exclude ambiguous columns in SELECT statement (actually CREATE VIEW statement)

After reading around, I've realized that SQL (MySQL in my case) does not support column exclusion.
SELECT *, NOT excluded_column FROM table; /* shame it doesn't work */
Anyways, while I've come to accept that, I'm wondering if there's any decent workarounds to achieve this sort of behavior. Reason being, I'm creating a view to consolidate information across a few tables.
I've normalized some user data to tables user and user_profile among others; purpose being that user stores data critical to user operations, and user_profile stores non-critical data. Application requirements are still being realized, so columns are being added/removed from user_profile as necessary, and further tables may be supported down the line which would be included in the view.
Problem is, when I create the view, I get Error 1060: Duplicate Column Name because user_id is present in both tables.
Now, the solution I've come up with so far, is basically:
/* exclude user_id from user */
SELECT user.critical_field, user.other_critical_field,
user_profile.*
FROM user
LEFT JOIN user_profile
ON user.user_id = user_profile.user_id;
Since the user table is going to remain unchanged throughout the application lifecycle (hopefully) this could suffice, but I was just curious if a more dynamic approach exists.
(Table names were not copypasta'd, I know user is often a poor choice of naming convention on it's own, I use prefixes.)
Typically, I'll define which fields I want to be in my view
Using your example:
SELECT user.critical_field, user.other_critical_field,
user_profile.User_Id, user_profile.MyOtherOfield
FROM user
LEFT JOIN user_profile
ON user.user_id = user_profile.user_id;
Now, additionally, I'll make sure that I alias things properly:
SELECT u.critical_field, u.other_critical_field,
up.User_Id, up.MyOtherOfield, u.KeyField AS userKey, up.KeyField as ProfileKey
FROM user as u
LEFT JOIN user_profile as up
ON u.user_id = up.user_id;
This allows me to ensure I know what's in my view, and that the columns are named intelligently, but it does mean that I'll need to touch that view when I make changes to the underlying table structures.

What is the best way to add users to multiple groups in a database?

In an application where users can belong to multiple groups, I'm currently storing their groups in a column called groups as a binary. Every four bytes is a 32 bit integer which is the GroupID. However, this means that to enumerate all the users in a group I have to programatically select all users, and manually find out if they contain that group.
Another method was to use a unicode string, where each character is the integer denoting a group, and this makes searching easy, but is a bit of a fudge.
Another method is to create a separate table, linking users to groups. One column called UserID and another called GroupID.
Which of these ways would be the best to do it? Or is there a better way?
You have a many-to-many relationship between users and groups. This calls for a separate table to combine users with groups:
User: (UserId[PrimaryKey], UserName etc.)
Group: (GroupId[PrimaryKey], GroupName etc.)
UserInGroup: (UserId[ForeignKey], GroupId[ForeignKey])
To find all users in a given group, you just say:
select * from User join UserInGroup on UserId Where GroupId=<the GroupId you want>
Rule of thumb: If you feel like you need to encode multiple values in the same field, you probably need a foreign key to a separate table. Your tricks with byte-blocks or Unicode chars are just clever tricks to encode multiple values in one field. Database design should not use clever tricks - save that for application code ;-)
I'd definitely go for the separate table - certainly the best relational view of data. If you have indexes on both UserID and GroupID you have a quick way of getting users per group and groups per user.
The more standard, usable and comprehensible way is the join table. It's easily supported by many ORMs, in addition to being reasonably performant for most cases. Only enter in "clever" ways if you have a reason to, say a million of users and having to answer that question every half a second.
I would make 3 tables. users, groups and usersgroups which is used as cross-reference table to link users and groups. In usersgroups table I would add userId and groupId columns and make them as primary key. BTW. What naming conventions there are to name those xref tables?
It depends what you're trying to do, but if your database supports it, you might consider using roles. The advantage of this is that the database provides security around roles, and you don't have to create any tables.