Best practices on multiple SQL calls to avoid repeated results - sql

Should multiple SQL calls be used when it can be done in one call, if the one call returns repeated data?
Like say, and this is a really oversimplified version of what I'm trying to do, we want to get a user info and their favorite bands. If I did it in one call, I could do:
SELECT userName, userAge, userGender, bandName FROM users NATURAL JOIN userBands NATURAL JOIN bands WHERE userID = 47
That will return to me a list of all the bands that user likes, but with every single one it will return the user name, age, and gender.
I could do two calls…
SELECT userName, userAge, userGender FROM users WHERE userID = 47
SELECT bandName FROM userBands NATURAL JOIN bands WHERE userID = 47
where the first just returns the basic info once, and the second just returns the list of bands. Is there any best practice with this? Is there another way to approach this that I'm not realizing? And what if (as it will be in my my real-world case) that it's not just two calls to separate the data, but 4 or 5? So it's a lot more calls, but it's also a lot more extraneous data returned to do it in one call.
I'm using PHP PDO if that makes a difference in the answer.

There is another answer, which is to combine the bands into a single column using group_concat():
SELECT u.userName, u.userAge, u.userGender, group_concat(b.bandName)
FROM users u NATURAL JOIN
userBands ub NATURAL JOIN
bands b
WHERE userID = 47
GROUP BY u.userName, u.userAge, u.userGender;
This gives you one row per user with the list of bands.
By the way, I would discourage you from using NATURAL JOIN. The query then depends on the metadata for the join keys -- and a small change to the table structures could have a big impact on lots of queries. Use either an explicit on clause or a using clause.

This makes sense when sending data across a network to a remote user. Why waste bandwidth sending the same data over and over again? For that purpose, a good method is to place the the data in some sort of XML structure.
<users>
<user>
<userdata1>...</userdata1>
<...>...</...>
<bands>
<band>...</band>
<band>...</band>
...
</bands>
</user>
</users>
No duplicated data is sent. Whether you form this structure from a single result set with the duplicated data or from the two result sets is up to you and your team. Play around with both and decide which is best under your specific situation.

Related

How to implement One-To-Many relationship with the same table?

I have a table called user. Now I am building an app where a user can have many users as friends. So I think I should create a new table called friends_list and implement one-to-many relationship where (user) is one and (friends_list) is many. Then to get the list of friends the user has, I could do select * from friends where userId = XXXx.
Is this the best approach? Or is there another better way to create a relationship with the same table?
Your approach is the best approach. You want to represent an n-m relationship (one user has many friends and vice versa).
There are some considerations. If "friendship" is symmetric (that is A friends B automatically means that B friends A), then you probably want to include both in the table when one is inserted. You may also want to prevent a user from self-friending.
If you want to retrieve user's friends as a list, you can do so using a string concatenation function. In Standard SQL, this is:
select listagg(friendid, ',') within group (order by friendid) as friendids
from friends
where userid = XXX;
Different databases have different function names for listagg().
An approach, and clearly NOT the best, would be to use an extra column to store a comma separated value of friend's user_ids.
If the answer to the question "who are my friends?" would not be too difficult to get, you'll need to rely on LIKE %XXXx% conditions and shouldnt expect very fast response times.
Another drawback would be some complexity with the relationship maintenance while editing the friend list.
Therefore, the two tables schema is both the most semantically correct and reliable one.

Get all Many:Many relationships from reference/join-table

I am having difficulty querying for possible relationships in a Many:Many scenario.
I present my schema:
What I do know how to query with this schema is:
All Bands that a given User belongs to.
All Users that belong to a given Band.
What I am trying to do is:
Get all Band Members across all Bands that a given User belongs to.
ie, say I am in 5 bands, I want to know who all of my bandmates are.
My first questions are:
Is there a name for this type of query? Where I am more interested in the joined relationships than what I am joined to (just saying that made me want to put this whole system into a Graph DB :/ )? I'd like to learn proper terminology to help me google for problems down the road.
Is this a terrible idea in RDBMS land in general? I feel like this should be a common use case but I want to know if I'm totally approaching this wrong.
To recap:
I am looking to query the above schema with the expected output being one row per User as Band Members that a given User shares a Band with.
Terminology
Your terminology seems to be correct - "many to many", often written as "many:many" with a colon. Sometimes the middle table (band_members) is called the "bridge table".
You can probably drop band_members.id, since the two foreign keys also make up a composite primary key (and the primary key can actually be defined that way, since normally a User cannot be a member of the same Band twice. The only exception to that is if a User could have more than one role in the same Band).
Solution
On the surface of it, this sounds easy - we can see the relationships of the tables, and one would normally just use an INNER JOIN between them. There are three tables, so that would be two joins.
However, we have to conceptualise the problem correctly first. The problem we have is that the join between Users and Band Members (user ID) is actually to be used for two things:
which User is in what Band
filtering by User
So to do this we need to introduce one table with multiple purposes:
SELECT
Users.first_name, Users.last_name
FROM Users
INNER JOIN Band_Members Band_Members1 ON (Band_Members1.user_id = Users.Id)
INNER JOIN Band_Members Band_Members2 ON (Band_Members1.band_id = Band_Members2.band_id)
WHERE
Band_Members2.user_id = 1
You can see here that I have joined Band_Members twice, and when one does that, one has to alias them differently, so they can be separately referenced. The first instance does the obvious join between the Users table and the bridge table, and the second one does a link between "Users who are in Bands" and "Bands that I am in".
Of course, this solution requires that you know your User ID. If you had wanted to do a similar query but filter based on your name, then you would have to join to another (re-aliased) copy of the User table, so that you can differentiate between the two different purposes: "Users who are in bands" and "your User".

How do I give different users access to different rows without creating separate views in BigQuery?

In this question: How do I use row-level permissions in BigQuery? it describes how to use an authorized view to grant access to only a portion of a table. But I'd like to give different users access to different rows. Does this mean I need to create separate views for each user? Is there an easier way?
Happily, if you want to give different users access to different rows in your table, you don't need to create separate views for each one. You have a couple of options.
These options all make use of the SESSION_USER() function in BigQuery, which returns the e-mail address of the currently running user. For example, if I run:
SELECT SESSION_USER()
I get back tigani#google.com.
The simplest option, then, for displaying different rows to different users, is to add another column to your table that is the user who is allowed to see the row. For example, the schema: {customer:string, id:integer} would become {customer:string, id:integer, allowed_viewer: string}. Then you can define a view:
#standardSQL
SELECT customer, id
FROM private.customers
WHERE allowed_viewer = SESSION_USER()
(note, don't forget to authorize the view as described here).
Then I'd be able to see only the fields where tigani#google.com was the value in the allowed_viewer column.
This approach has its own drawbacks, however; You can only grant access to a single user at a time. One option would be to make the allowed_viewer column a repeated field; this would let you provide a list of users for each row.
However, this is still pretty restrictive, and requires a lot of bookkeeping about which users should have access to which row. Chances are, what you'd really like to do is specify a group. So your schema would look like: {customer:string, id:integer, allowed_group: string}, and anyone in the allowed_group would be able to see your table.
You can make this work by having another table that has your group mappings. That table would look like: {group:string, user_name:string}. The rows might look like:
{engineers, tigani#google.com}
{engineers, some_engineer#google.com}
{administrators, some_admin#google.com}
{sales, some_salesperson#google.com}
...
Let's call this table private.access_control. Then we can change our view definition:
#standardSQL
SELECT c.customer, c.id
FROM private.customers c
INNER JOIN (
SELECT group
FROM private.access_control
WHERE SESSION_USER() = user_name) g
ON c.allowed_group = g.group
(note you will want to make sure that there are no duplicates in private.access_control, otherwise it could records to repeat in the results).
In this way, you can manage the groups in the private.access_control separately from the data table (private.customers).
There is still one piece missing that you might want; the ability for groups to contain other groups. You can get this by doing a more complex join to expand the groups in the access control table (you might want to consider doing this only once and saving the results, to save the work each time the main table is queried).

How should I record in the database that an item/product is visible by 'all' groups?

A user can be in groups. And an item/product is assigned groups that can see the item. Users can see that item if they are in one of the assigned groups.
I want neither public (anonymous users in no groups) nor groupless users (logged in users not in any groups) to see the item. But I want the interface to allow assigning the item an 'all/any groups' attribute so that users that are in any group at all can see the item.
Where/How should I store this assignment?
p.s. I expect the technique to also be extended to other entities, for example I'd assign a file to a category, and groups are linked to categories. so when a file is marked as visible by the 'all/any category' then if the user (thru groups and group-categories) is linked to at least one category then the file is visible to them.
Decision:
It seemed the choice was whether to implement as a row in a entity-groups table or as fields in the entity table. The chosen answer used the former.
And either managing the group membership in a table or adding JOIN conditions. The chosen answer used the former, but I'm going to use the latter. I'm putting an indirection between the query and usage so if (when) performance is a problem I should be able to change to a managed table underneath (as suggested) without changing usage.
I've other special groups like 'admin', 'users', etc. which can also fit into the same concept (the basis simply being a list of groups) more easily than special and variable field handling for each entity.
thanks all.
I'd put it in the items table as a boolean/bit column IsVisibleToAllGroups.
It does make queries to get all items for a user a bit less straightforward but the other alternative would be to expand out "all groups" so you add a permission row for each individual group but this can potentially lead to a huge expansion in the number of rows and you still have to keep this up-to-date if an additional group is added later and somehow distinguish between a permission that was granted explicitly to all (current and future) groups and one that just happened to be granted to all groups currently in existence.
Edit You don't mention the RDBMS you are using. One other approach you could take would be to have a hierarchy of groups.
GroupId ParentGroupId Name
----------- ------------- ----------
0 NULL Base Group
1 0 Group 1
2 0 Group 2
You could then assign your "all" permissions to GroupId=0 and use (SQL Server approach below)
WITH GroupsForUser
AS (SELECT G.GroupId,
G.ParentGroupId
FROM UserGroups UG
JOIN Groups G
ON G.GroupId = UG.GroupId
WHERE UserId = #UserId
UNION ALL
SELECT G.GroupId,
G.ParentGroupId
FROM Groups G
JOIN GroupsForUser GU
ON G.GroupId = GU.ParentGroupId)
SELECT IG.ItemId
FROM GroupsForUser GU
JOIN ItemGroups IG
ON IG.GroupId = GU.GroupId
As mentioned by both Martin Smith and Mikael Eriksson, making this a property of the entity is a very tidy and straight forward approach. Purely in terms of data representation, this has a very nice feel to it.
I would, however, also consider the queries that you are likely to make against the data. For example, based on your description, you seem most likely to have queries that start with a single user, find the groups they are a member of, and then find the entities they are associated to. Possibly something lke this...
SELECT DISTINCT -- If both user and entity relate to multiple groups, de-dupe them
entity.*
FROM
user
INNER JOIN
user_link_group
ON user.id = user_link_group.user_id
INNER JOIN
group_link_entity
ON group_link_entity.group_id = user_link_group.group_id
INNER JOIN
entity
ON entity.id = group_link_entity.entity_id
WHERE
user.id = #user_id
If you were to use this format, and the idea of a property in the entity table, you would need something much less elegant, and I think the following UNION approach is possibly the most efficient...
<ORIGINAL QUERY>
UNION -- Not UNION ALL, as the next query may duplicate results from above
SELECT
entity.*
FROM
entity
WHERE
EXISTS (SELECT * FROM user_link_group WHERE user_id = #user_id)
AND isVisibleToAllGroups != 0
-- NOTE: This also implies the need for an additional index on [isVisibleToAllGroups]
Rather than create the corner case in the "what entity can I see" query, it is instead an option to create the corner case in the maintenance of the link tables...
Create a GLOBAL group
If an enitity is visible to all groups, map them to the GLOBAL group
If a user is added to a group, ensure they are also linked to the GLOBAL group
If a user is removed from all groups, ensure they are also removed from the GLOBAL group
In this way, the original simple query works without modification. This means that no UNION is needed, with it's overhead of sorting and de-duplication, and neither is the INDEX on isVisibleToAllGroups needed. Instead, the overhead is moved to maintaining which groups a user is linked to; a one time overhead instead.
This assumes that the question "what entities can I see" is more common than changing groups. It also adds a behaviour that is defined by the DATA and not by the SCHEMA, which necessitates good documentation and understanding. As such, I do see this as a powerful type of optimisation, but I also see it as a trades-and-balances type of compromise that needs accounting for in the database design.
Instead of a boolean, which needs additional logic in every query, I'd add a column 'needs_group' which contains the name (or number) of the group that is required for the item. Whether a NULL field means 'nobody' or 'everybody' is only a (allow/deny) design-decision. Creating one 'public' group and putting everybody in it is also a design decision. YMMV
This concept should get you going:
The user can see the product if:
the corresponding row exists in USER_GROUP_PRODUCT
or PRODUCT.PUBLIC is TRUE (and user is in at least one group, if I understand your question correctly).
There are 2 key points to consider about this model:
Liberal usage of identifying relationships - primary keys of parents are "migrated" within primary keys of children, which enables "merging" of GROUP_ID at the bottom USER_GROUP_PRODUCT. This is what allows the DBMS to enforce the constraint that both user and product have to belong to the same group to be mutually visible. Usage of non-identifying relationships and surrogate keys would prevent the DBMS from being able to enforce that directly (you'd have to write custom triggers).
Usage of PRODUCT.PUBLIC - you'll have to treat this field as "magic" in your client code. The alternative is to simply fill the USER_GROUP_PRODUCT with all the possible combinations, but this approach is fragile in case a new user is added - it would not automatically see the product unless you update the USER_GROUP_PRODUCT as well, but how would you know you need to update it unless you have a field such as PRODUCT.PUBLIC? So if you can't avoid PRODUCT.PUBLIC anyway, why not treat it specially and save some storage space in the database?

What is the best way to add users to multiple groups in a database?

In an application where users can belong to multiple groups, I'm currently storing their groups in a column called groups as a binary. Every four bytes is a 32 bit integer which is the GroupID. However, this means that to enumerate all the users in a group I have to programatically select all users, and manually find out if they contain that group.
Another method was to use a unicode string, where each character is the integer denoting a group, and this makes searching easy, but is a bit of a fudge.
Another method is to create a separate table, linking users to groups. One column called UserID and another called GroupID.
Which of these ways would be the best to do it? Or is there a better way?
You have a many-to-many relationship between users and groups. This calls for a separate table to combine users with groups:
User: (UserId[PrimaryKey], UserName etc.)
Group: (GroupId[PrimaryKey], GroupName etc.)
UserInGroup: (UserId[ForeignKey], GroupId[ForeignKey])
To find all users in a given group, you just say:
select * from User join UserInGroup on UserId Where GroupId=<the GroupId you want>
Rule of thumb: If you feel like you need to encode multiple values in the same field, you probably need a foreign key to a separate table. Your tricks with byte-blocks or Unicode chars are just clever tricks to encode multiple values in one field. Database design should not use clever tricks - save that for application code ;-)
I'd definitely go for the separate table - certainly the best relational view of data. If you have indexes on both UserID and GroupID you have a quick way of getting users per group and groups per user.
The more standard, usable and comprehensible way is the join table. It's easily supported by many ORMs, in addition to being reasonably performant for most cases. Only enter in "clever" ways if you have a reason to, say a million of users and having to answer that question every half a second.
I would make 3 tables. users, groups and usersgroups which is used as cross-reference table to link users and groups. In usersgroups table I would add userId and groupId columns and make them as primary key. BTW. What naming conventions there are to name those xref tables?
It depends what you're trying to do, but if your database supports it, you might consider using roles. The advantage of this is that the database provides security around roles, and you don't have to create any tables.