Exclude ambiguous columns in SELECT statement (actually CREATE VIEW statement) - sql

After reading around, I've realized that SQL (MySQL in my case) does not support column exclusion.
SELECT *, NOT excluded_column FROM table; /* shame it doesn't work */
Anyways, while I've come to accept that, I'm wondering if there's any decent workarounds to achieve this sort of behavior. Reason being, I'm creating a view to consolidate information across a few tables.
I've normalized some user data to tables user and user_profile among others; purpose being that user stores data critical to user operations, and user_profile stores non-critical data. Application requirements are still being realized, so columns are being added/removed from user_profile as necessary, and further tables may be supported down the line which would be included in the view.
Problem is, when I create the view, I get Error 1060: Duplicate Column Name because user_id is present in both tables.
Now, the solution I've come up with so far, is basically:
/* exclude user_id from user */
SELECT user.critical_field, user.other_critical_field,
user_profile.*
FROM user
LEFT JOIN user_profile
ON user.user_id = user_profile.user_id;
Since the user table is going to remain unchanged throughout the application lifecycle (hopefully) this could suffice, but I was just curious if a more dynamic approach exists.
(Table names were not copypasta'd, I know user is often a poor choice of naming convention on it's own, I use prefixes.)

Typically, I'll define which fields I want to be in my view
Using your example:
SELECT user.critical_field, user.other_critical_field,
user_profile.User_Id, user_profile.MyOtherOfield
FROM user
LEFT JOIN user_profile
ON user.user_id = user_profile.user_id;
Now, additionally, I'll make sure that I alias things properly:
SELECT u.critical_field, u.other_critical_field,
up.User_Id, up.MyOtherOfield, u.KeyField AS userKey, up.KeyField as ProfileKey
FROM user as u
LEFT JOIN user_profile as up
ON u.user_id = up.user_id;
This allows me to ensure I know what's in my view, and that the columns are named intelligently, but it does mean that I'll need to touch that view when I make changes to the underlying table structures.

Related

Get all Many:Many relationships from reference/join-table

I am having difficulty querying for possible relationships in a Many:Many scenario.
I present my schema:
What I do know how to query with this schema is:
All Bands that a given User belongs to.
All Users that belong to a given Band.
What I am trying to do is:
Get all Band Members across all Bands that a given User belongs to.
ie, say I am in 5 bands, I want to know who all of my bandmates are.
My first questions are:
Is there a name for this type of query? Where I am more interested in the joined relationships than what I am joined to (just saying that made me want to put this whole system into a Graph DB :/ )? I'd like to learn proper terminology to help me google for problems down the road.
Is this a terrible idea in RDBMS land in general? I feel like this should be a common use case but I want to know if I'm totally approaching this wrong.
To recap:
I am looking to query the above schema with the expected output being one row per User as Band Members that a given User shares a Band with.
Terminology
Your terminology seems to be correct - "many to many", often written as "many:many" with a colon. Sometimes the middle table (band_members) is called the "bridge table".
You can probably drop band_members.id, since the two foreign keys also make up a composite primary key (and the primary key can actually be defined that way, since normally a User cannot be a member of the same Band twice. The only exception to that is if a User could have more than one role in the same Band).
Solution
On the surface of it, this sounds easy - we can see the relationships of the tables, and one would normally just use an INNER JOIN between them. There are three tables, so that would be two joins.
However, we have to conceptualise the problem correctly first. The problem we have is that the join between Users and Band Members (user ID) is actually to be used for two things:
which User is in what Band
filtering by User
So to do this we need to introduce one table with multiple purposes:
SELECT
Users.first_name, Users.last_name
FROM Users
INNER JOIN Band_Members Band_Members1 ON (Band_Members1.user_id = Users.Id)
INNER JOIN Band_Members Band_Members2 ON (Band_Members1.band_id = Band_Members2.band_id)
WHERE
Band_Members2.user_id = 1
You can see here that I have joined Band_Members twice, and when one does that, one has to alias them differently, so they can be separately referenced. The first instance does the obvious join between the Users table and the bridge table, and the second one does a link between "Users who are in Bands" and "Bands that I am in".
Of course, this solution requires that you know your User ID. If you had wanted to do a similar query but filter based on your name, then you would have to join to another (re-aliased) copy of the User table, so that you can differentiate between the two different purposes: "Users who are in bands" and "your User".

How do I give different users access to different rows without creating separate views in BigQuery?

In this question: How do I use row-level permissions in BigQuery? it describes how to use an authorized view to grant access to only a portion of a table. But I'd like to give different users access to different rows. Does this mean I need to create separate views for each user? Is there an easier way?
Happily, if you want to give different users access to different rows in your table, you don't need to create separate views for each one. You have a couple of options.
These options all make use of the SESSION_USER() function in BigQuery, which returns the e-mail address of the currently running user. For example, if I run:
SELECT SESSION_USER()
I get back tigani#google.com.
The simplest option, then, for displaying different rows to different users, is to add another column to your table that is the user who is allowed to see the row. For example, the schema: {customer:string, id:integer} would become {customer:string, id:integer, allowed_viewer: string}. Then you can define a view:
#standardSQL
SELECT customer, id
FROM private.customers
WHERE allowed_viewer = SESSION_USER()
(note, don't forget to authorize the view as described here).
Then I'd be able to see only the fields where tigani#google.com was the value in the allowed_viewer column.
This approach has its own drawbacks, however; You can only grant access to a single user at a time. One option would be to make the allowed_viewer column a repeated field; this would let you provide a list of users for each row.
However, this is still pretty restrictive, and requires a lot of bookkeeping about which users should have access to which row. Chances are, what you'd really like to do is specify a group. So your schema would look like: {customer:string, id:integer, allowed_group: string}, and anyone in the allowed_group would be able to see your table.
You can make this work by having another table that has your group mappings. That table would look like: {group:string, user_name:string}. The rows might look like:
{engineers, tigani#google.com}
{engineers, some_engineer#google.com}
{administrators, some_admin#google.com}
{sales, some_salesperson#google.com}
...
Let's call this table private.access_control. Then we can change our view definition:
#standardSQL
SELECT c.customer, c.id
FROM private.customers c
INNER JOIN (
SELECT group
FROM private.access_control
WHERE SESSION_USER() = user_name) g
ON c.allowed_group = g.group
(note you will want to make sure that there are no duplicates in private.access_control, otherwise it could records to repeat in the results).
In this way, you can manage the groups in the private.access_control separately from the data table (private.customers).
There is still one piece missing that you might want; the ability for groups to contain other groups. You can get this by doing a more complex join to expand the groups in the access control table (you might want to consider doing this only once and saving the results, to save the work each time the main table is queried).

How should I record in the database that an item/product is visible by 'all' groups?

A user can be in groups. And an item/product is assigned groups that can see the item. Users can see that item if they are in one of the assigned groups.
I want neither public (anonymous users in no groups) nor groupless users (logged in users not in any groups) to see the item. But I want the interface to allow assigning the item an 'all/any groups' attribute so that users that are in any group at all can see the item.
Where/How should I store this assignment?
p.s. I expect the technique to also be extended to other entities, for example I'd assign a file to a category, and groups are linked to categories. so when a file is marked as visible by the 'all/any category' then if the user (thru groups and group-categories) is linked to at least one category then the file is visible to them.
Decision:
It seemed the choice was whether to implement as a row in a entity-groups table or as fields in the entity table. The chosen answer used the former.
And either managing the group membership in a table or adding JOIN conditions. The chosen answer used the former, but I'm going to use the latter. I'm putting an indirection between the query and usage so if (when) performance is a problem I should be able to change to a managed table underneath (as suggested) without changing usage.
I've other special groups like 'admin', 'users', etc. which can also fit into the same concept (the basis simply being a list of groups) more easily than special and variable field handling for each entity.
thanks all.
I'd put it in the items table as a boolean/bit column IsVisibleToAllGroups.
It does make queries to get all items for a user a bit less straightforward but the other alternative would be to expand out "all groups" so you add a permission row for each individual group but this can potentially lead to a huge expansion in the number of rows and you still have to keep this up-to-date if an additional group is added later and somehow distinguish between a permission that was granted explicitly to all (current and future) groups and one that just happened to be granted to all groups currently in existence.
Edit You don't mention the RDBMS you are using. One other approach you could take would be to have a hierarchy of groups.
GroupId ParentGroupId Name
----------- ------------- ----------
0 NULL Base Group
1 0 Group 1
2 0 Group 2
You could then assign your "all" permissions to GroupId=0 and use (SQL Server approach below)
WITH GroupsForUser
AS (SELECT G.GroupId,
G.ParentGroupId
FROM UserGroups UG
JOIN Groups G
ON G.GroupId = UG.GroupId
WHERE UserId = #UserId
UNION ALL
SELECT G.GroupId,
G.ParentGroupId
FROM Groups G
JOIN GroupsForUser GU
ON G.GroupId = GU.ParentGroupId)
SELECT IG.ItemId
FROM GroupsForUser GU
JOIN ItemGroups IG
ON IG.GroupId = GU.GroupId
As mentioned by both Martin Smith and Mikael Eriksson, making this a property of the entity is a very tidy and straight forward approach. Purely in terms of data representation, this has a very nice feel to it.
I would, however, also consider the queries that you are likely to make against the data. For example, based on your description, you seem most likely to have queries that start with a single user, find the groups they are a member of, and then find the entities they are associated to. Possibly something lke this...
SELECT DISTINCT -- If both user and entity relate to multiple groups, de-dupe them
entity.*
FROM
user
INNER JOIN
user_link_group
ON user.id = user_link_group.user_id
INNER JOIN
group_link_entity
ON group_link_entity.group_id = user_link_group.group_id
INNER JOIN
entity
ON entity.id = group_link_entity.entity_id
WHERE
user.id = #user_id
If you were to use this format, and the idea of a property in the entity table, you would need something much less elegant, and I think the following UNION approach is possibly the most efficient...
<ORIGINAL QUERY>
UNION -- Not UNION ALL, as the next query may duplicate results from above
SELECT
entity.*
FROM
entity
WHERE
EXISTS (SELECT * FROM user_link_group WHERE user_id = #user_id)
AND isVisibleToAllGroups != 0
-- NOTE: This also implies the need for an additional index on [isVisibleToAllGroups]
Rather than create the corner case in the "what entity can I see" query, it is instead an option to create the corner case in the maintenance of the link tables...
Create a GLOBAL group
If an enitity is visible to all groups, map them to the GLOBAL group
If a user is added to a group, ensure they are also linked to the GLOBAL group
If a user is removed from all groups, ensure they are also removed from the GLOBAL group
In this way, the original simple query works without modification. This means that no UNION is needed, with it's overhead of sorting and de-duplication, and neither is the INDEX on isVisibleToAllGroups needed. Instead, the overhead is moved to maintaining which groups a user is linked to; a one time overhead instead.
This assumes that the question "what entities can I see" is more common than changing groups. It also adds a behaviour that is defined by the DATA and not by the SCHEMA, which necessitates good documentation and understanding. As such, I do see this as a powerful type of optimisation, but I also see it as a trades-and-balances type of compromise that needs accounting for in the database design.
Instead of a boolean, which needs additional logic in every query, I'd add a column 'needs_group' which contains the name (or number) of the group that is required for the item. Whether a NULL field means 'nobody' or 'everybody' is only a (allow/deny) design-decision. Creating one 'public' group and putting everybody in it is also a design decision. YMMV
This concept should get you going:
The user can see the product if:
the corresponding row exists in USER_GROUP_PRODUCT
or PRODUCT.PUBLIC is TRUE (and user is in at least one group, if I understand your question correctly).
There are 2 key points to consider about this model:
Liberal usage of identifying relationships - primary keys of parents are "migrated" within primary keys of children, which enables "merging" of GROUP_ID at the bottom USER_GROUP_PRODUCT. This is what allows the DBMS to enforce the constraint that both user and product have to belong to the same group to be mutually visible. Usage of non-identifying relationships and surrogate keys would prevent the DBMS from being able to enforce that directly (you'd have to write custom triggers).
Usage of PRODUCT.PUBLIC - you'll have to treat this field as "magic" in your client code. The alternative is to simply fill the USER_GROUP_PRODUCT with all the possible combinations, but this approach is fragile in case a new user is added - it would not automatically see the product unless you update the USER_GROUP_PRODUCT as well, but how would you know you need to update it unless you have a field such as PRODUCT.PUBLIC? So if you can't avoid PRODUCT.PUBLIC anyway, why not treat it specially and save some storage space in the database?

SQL Server table with different user profiles

I am designing my db tables in SQL Server 2005 and have come across a small design/architecture issue... I have my main Users table (username, password, lastlogin, etc.), but I also need to store 2 different user profiles, i.e. the profile data stored will be different between the two. I've put all the common user data into the Users table.
Do I create separate tables for Consumers and Marketers? And if so, should the primary key in these tables be [table-name]_UserID with a 1:1 relationship on Users_UserID?
Basically, upon registering, the user will be given the choice to register as a Consumer or Marketer. When a user logs in, the Users table will be queried, and their accompanying profile will be queried from either table.
I know this approach is messy, which is why I've come here to ask how best this can be achieved.
Thanks!
EDIT: Additionally, in the Users table I have a Users_UserType flag that will allow me to distinguish between users when they log in, hence knowing which Profile Table to query.
Your gut feeling is correct. You want to normalize your data. Using separate tables reduces data duplication, or empty/null columns.
Unfortunantly, with a reverse relationship like this, you won't have a nice clean foreign key from User to Consumer or Marketer because it could be one table or another.
You would want to map a User_Id from the Consumer/Marketers table back to User though.
You could query it in a single query using left joins:
Select
u.*,
c.*,
m.*
From Users u
left join Consumers c on c.User_Id = u.ID
left join Marketers m on m.User_Id = u.ID
Where
u.ID = #UserId

Recommended Table Set up for one to many/many to one situation

I need to create a script where someone will post an opening for a position, and anyone who is eligible will see the opening but anyone who is not (or opts out) will not see the opening. So two people could go to the same page and see different content, some potentially the same, some totally unique. I'm not sure the best way to arrange that data in a MySQL DB/table.
For instance, I could have it arranged by the posting, but that would look sort of like:
PostID VisibleTo
PostingA user1,user2
And that seems wrong (the CSV style in the column). Or I could go with by person:
User VisiblePosts
user1 posting1, posting2
But it's the same problem. Is there a way to make the user's unique, the posting unique, and have them join only where they match?
The decision is initially made by doing a series of queries to another set of tables, but once that is run, it seems inefficient to have that some chunk of code run again and again when it won't change after the user posts the position.
...On second thought, it MIGHT change, but if we assume it doesn't (as it is unlikely, and as little consequence if a user sees something that they are no longer eligible for), is there a standard solution for this scenario?
Three tables...
User:
[UserId]
[OtherField]
Post:
[PostId]
[OtherFields]
UserPost:
[UserId]
[PostId]
User.UserId Joins to UserPost.UserId,
Post.PostId Joins to UserPost.PostId
Then look up the table UserPost, joining to Post when you are selecting which posts to show
This is a many-to-many relationship or n:m relationship.
You would create an additional table, say PostVisibility, with a column PostID and UserID. If a combination of PostID and UserID is present in the table, that post is visible to that user.
Edit: Sorry, I think you are speaking in Posting-User terms, which is many-to-many. I was thinking of this in terms of posting-"viewing rights" terms, which is one-to-many.
Unless I am missing something, this is a one-to-many situation, which requires two tables. E.g., each posting has n users who can view it. Postings are unique to an individual user, so you don't need to do the reverse.
PostingTable with PostingID (and other data)
PostingVisibilityTable with PostingID and UserID
UserTable with UserID and user data
Create the postings independently of their visibility rights, and then separately add/remove PostingID/UserID pairs against the Visibility table.
To select all postings visible to the current user:
SELECT * FROM PostingTable A INNER JOIN PostingVisibilityTable B ON A.PostingID = B.PostingID WHERE B.UserID = "currentUserID"