How can I do access control via an SQL table? - sql

I'm trying to create an access control system.
Here's a stripped down example of what the table I'm trying to control access to looks like:
things table:
id group_id name
1 1 thing 1
2 1 thing 2
3 1 thing 3
4 1 thing 4
5 2 thing 5
And the access control table looks like this:
access table:
user_id type object_id access
1 group 1 50
1 thing 1 10
1 thing 2 100
Access can be granted either by specifying the id of the 'thing' directly, or granted for an entire group of things by specifying a group id. In the above example, user 1 has been granted an access level of 50 to group 1, which should apply unless there are any other rules granting more specific access to an individual thing.
I need a query that returns a list of things (ids only is okay) along with the access level for a specific user. So using the example above I'd want something like this for user id 1:
desired result:
thing_id access
1 10
2 100
3 50 (things 3 and 4 have no specific access rule,
4 50 so this '50' is from the group rule)
5 (thing 5 has no rules at all, so although I
still want it in the output, there's no access
level for it)
The closest I can come up with is this:
SELECT *
FROM things
LEFT JOIN access ON
user_id = 1
AND (
(access.type = 'group' AND access.object_id = things.group_id)
OR (access.type = 'thing' AND access.object_id = things.id)
)
But that returns multiple rows, when I only want one for each row in the 'things' table. I'm not sure how to get down to a single row for each 'thing', or how to prioritise 'thing' rules over 'group' rules.
If it helps, the database I'm using is PostgreSQL.
Please feel free to leave a comment if there's any information I've missed out.
Thanks in advance!

I don't know the Postgres SQL dialect, but maybe something like:
select things.*, coalesce ( ( select access
                              from access
                              where user_id = 1
                              and type = 'thing'
                              and object_id = things.id
                            ),
                            ( select access
                              from access
                              where user_id = 1
                              and type = 'group'
                              and object_id = things.group_id
                            )
                          )
from things
Incidentally, I don't like the design. I would prefer the access table to be split into two:
thing_access (user_id, thing_id, access)
group_access (user_id, group_id, access)
My query then becomes:
select things.*, coalesce ( ( select access
                              from thing_access
                              where user_id = 1
                              and thing_id = things.id
                            ),
                            ( select access
                              from group_access
                              where user_id = 1
                              and group_id = things.group_id
                            )
                          )
from things
I prefer this because foreign keys can now be used in the access tables.
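To make the foreign-key point concrete, here is a minimal sketch. It uses Python's built-in sqlite3 module with an in-memory SQLite database as a stand-in for PostgreSQL, and the thing_groups table is a hypothetical addition (the question never shows a groups table). With the split design, a rule that points at a non-existent thing is rejected by the database itself:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite only enforces foreign keys when asked; PostgreSQL always enforces them.
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE thing_groups (id INTEGER PRIMARY KEY);  -- hypothetical table
CREATE TABLE things (
    id       INTEGER PRIMARY KEY,
    group_id INTEGER REFERENCES thing_groups(id),
    name     TEXT
);
CREATE TABLE thing_access (
    user_id  INTEGER,
    thing_id INTEGER REFERENCES things(id),
    access   INTEGER
);
CREATE TABLE group_access (
    user_id  INTEGER,
    group_id INTEGER REFERENCES thing_groups(id),
    access   INTEGER
);
INSERT INTO thing_groups VALUES (1);
INSERT INTO things VALUES (1, 1, 'thing 1');
INSERT INTO thing_access VALUES (1, 1, 10);
""")

try:
    # Thing 999 does not exist, so the foreign key should reject this rule.
    conn.execute("INSERT INTO thing_access VALUES (1, 999, 10)")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True

print(rejected)
```

With the single access table, object_id can refer to either a thing or a group, so no single foreign key can cover it.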

I just read a paper last night on this. It has some ideas on how to do this. If you can't use the link on the title try using Google Scholar on Limiting Disclosure in Hippocratic Databases.

While there are several good answers, the most efficient would probably be something like this:
SELECT things.id, things.group_id, things.name, max(access)
FROM things
LEFT JOIN access ON
user_id = 1
AND (
(access.type = 'group' AND access.object_id = things.group_id)
OR (access.type = 'thing' AND access.object_id = things.id)
)
group by things.id, things.group_id, things.name
This simply adds aggregation to your query to get what you're looking for.

Tony:
Not a bad solution, I like it, seems to work. Here's your query after minor tweaking:
SELECT
things.*,
coalesce (
( SELECT access
FROM access
WHERE user_id = 1
AND type = 'thing'
AND object_id = things.id
),
( SELECT access
FROM access
WHERE user_id = 1
AND type = 'group'
AND object_id = things.group_id
)
) AS access
FROM things;
And the results look correct:
id | group_id | name | access
----+----------+---------+--------
1 | 1 | thing 1 | 10
2 | 1 | thing 2 | 100
3 | 1 | thing 3 | 50
4 | 1 | thing 4 | 50
5 | 2 | thing 5 |
I do completely take the point about it not being an ideal schema. However, I am stuck with it to some extent.
Josef:
Your solution is very similar to the stuff I was playing with, and my instincts (such as they are) tell me that it should be possible to do it that way. Unfortunately it doesn't produce completely correct results:
id | group_id | name | max
----+----------+---------+-----
1 | 1 | thing 1 | 50
2 | 1 | thing 2 | 100
3 | 1 | thing 3 | 50
4 | 1 | thing 4 | 50
5 | 2 | thing 5 |
The access level for 'thing 1' has taken the higher 'group' access value, rather than the more specific 'thing' access value of 10, which is what I'm after. I don't think there's a way to fix that within a GROUP BY, but if anyone has any suggestions I'm more than happy to be proven incorrect on that point.
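For what it's worth, one possible way to keep the GROUP BY and still prefer 'thing' rules is to aggregate the two rule types separately and pick between them with COALESCE. The sketch below is only that, a sketch: it runs against an in-memory SQLite database through Python's sqlite3 module (an assumption made so the example is self-contained; the SQL itself is plain and should carry over to PostgreSQL), using the sample data from the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE things (id INTEGER PRIMARY KEY, group_id INTEGER, name TEXT);
CREATE TABLE access (user_id INTEGER, type TEXT, object_id INTEGER, access INTEGER);
INSERT INTO things VALUES (1,1,'thing 1'),(2,1,'thing 2'),(3,1,'thing 3'),
                          (4,1,'thing 4'),(5,2,'thing 5');
INSERT INTO access VALUES (1,'group',1,50),(1,'thing',1,10),(1,'thing',2,100);
""")

query = """
SELECT things.id,
       COALESCE(
           -- a rule on the thing itself wins if there is one ...
           MAX(CASE WHEN access.type = 'thing' THEN access.access END),
           -- ... otherwise fall back to the rule on its group
           MAX(CASE WHEN access.type = 'group' THEN access.access END)
       ) AS access
FROM things
LEFT JOIN access
       ON access.user_id = 1
      AND ((access.type = 'group' AND access.object_id = things.group_id)
        OR (access.type = 'thing' AND access.object_id = things.id))
GROUP BY things.id
ORDER BY things.id
"""
rows = conn.execute(query).fetchall()
print(rows)  # [(1, 10), (2, 100), (3, 50), (4, 50), (5, None)]
```

Thing 1 now gets 10 rather than 50, because the group value is only consulted when no 'thing' rule exists.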

Related

PostgreSQL - How to find the row with a record that matches it with a value higher than given value?

Let's say I have two tables with a 1-to-many relation.
Table A (user):
id INT,
name TEXT
Table B (skill):
id INT,
user_id INT,
skill_name TEXT,
skill_level INT
Each user may have multiple skills. Now I want to find the users that have a certain skill at or above a certain level, and possibly the users that have all the skills matching a set of conditions.
For example, let's say I have the following data in my database:
User:
id | name
1 | Merlin
2 | Morgan
Skill:
id | user_id | skill_name | skill_level
1 | 1 | Fireball | 2
2 | 1 | Thunderbolt | 3
2 | 2 | Thunderbolt | 2
2 | 2 | Firestorm | 1
2 | 2 | Curse | 3
If I search for users who have Thunderbolt at level 2 or more, I should get both Merlin and Morgan; if I search for users who know both Thunderbolt at level 1 or more and Curse at level 2 or more, then only Morgan should appear.
Also, I am hoping the result could contain all the skills each user has. For now I am using
ARRAY_AGG(JSON_BUILD_OBJECT('skill_name', skill_name, 'skill_level', skill_level)) to gather all skills by user id. But I don't know how to filter the data based on those skills.
One approach uses aggregation, and a having clause to filter on skill names and levels. Boolean aggregate functions come in handy here:
select u.*,
jsonb_agg(jsonb_build_object('skill_name', skill_name, 'skill_level', skill_level)) as skills
from users u
inner join skills s on s.user_id = u.id
group by u.id
having bool_or(s.skill_name = 'Thunderbolt' and s.skill_level >= 1)
and bool_or(s.skill_name = 'Curse' and s.skill_level >= 2)
This also adds a column to the result set, called skills, that is a JSONB array containing one object for each skill name and level, as requested in your question. Note that it makes more sense to generate a JSON(B) array rather than an array of JSON objects, as you originally intended.
One method uses aggregation. For one skill:
select user_id
from skills
group by user_id
having count(*) filter (where skill_name = 'Thunderbolt' and skill_level >= 2) > 0;
And for multiple skills, just add more clauses:
select user_id
from skills
group by user_id
having count(*) filter (where skill_name = 'Thunderbolt' and skill_level >= 1) > 0 and
       count(*) filter (where skill_name = 'Curse' and skill_level >= 2) > 0;
This returns the user_id. You can join back to the users table to get the name.
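As a rough illustration of that join back, the sketch below wraps the aggregate query in a subquery and joins it to users. It runs on an in-memory SQLite database via Python's sqlite3 module (an assumption for the sake of a self-contained example), so the PostgreSQL-specific FILTER clause is swapped for the more portable SUM(CASE ...) form. The skills.id column is left out because it plays no part here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE skills (user_id INTEGER, skill_name TEXT, skill_level INTEGER);
INSERT INTO users VALUES (1,'Merlin'),(2,'Morgan');
INSERT INTO skills VALUES (1,'Fireball',2),(1,'Thunderbolt',3),
                          (2,'Thunderbolt',2),(2,'Firestorm',1),(2,'Curse',3);
""")

query = """
SELECT u.name
FROM users u
JOIN (
    -- one row per user that satisfies every skill condition
    SELECT user_id
    FROM skills
    GROUP BY user_id
    HAVING SUM(CASE WHEN skill_name = 'Thunderbolt' AND skill_level >= 1
                    THEN 1 ELSE 0 END) > 0
       AND SUM(CASE WHEN skill_name = 'Curse' AND skill_level >= 2
                    THEN 1 ELSE 0 END) > 0
) q ON q.user_id = u.id
ORDER BY u.name
"""
names = [row[0] for row in conn.execute(query)]
print(names)  # ['Morgan']
```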

SQL - Representing SUM's after using CASE to transform STR -> INT

I am sorry for what may be a long post in advance.
Background:
I am using Rational Team Concert (RTC) which stores work item data in conjunction with Jazz Reporting Service to create reports. Using the Report Builder tool, it allows you to write your own queries to pull data as a table, and has its own interface to represent the table as a graph.
There are not many options for graphing; the chart type defaults to a count, unless you specify it to show a sum. In order to graph by sum, the data must be a number rather than a string. By default, the Report Builder assumes all variables in the SELECT statement are strings.
The data which I will be using are a bunch of work items. Each work item is associated to a team (A, B) and has a work estimation number (count1, count2).
Item # | Team | Work |
------------------------
123 | A | count1 |
------------------------
124 | A | count2 |
------------------------
125 | B | count2 |
------------------------
....
Problem:
Since the work estimation is entered as a Tag, the first step was to use a CASE WHEN expression in the SELECT to transform count1 -> 1, and count2 -> 2 (the string tag to an actual number which can be summed). This resulted in a table with numbers 1 and 2 in place of the typed tag (good so far).
Item # | Team | Work |
------------------------
123 | A | 1 |
------------------------
124 | A | 2 |
------------------------
125 | B | 2 |
------------------------
....
The problem is that I am trying to graph by sum, which means getting the tool to identify the variables in the SELECT statement as numbers, except for some reason any variable I declare in a SELECT statement is always viewed as a string. (The tool has a table of the current columns, i.e. variables in the SELECT, along with what the tool identifies as each variable's type.)
Attempted Solutions:
The first query I did was to return a table of each work item with its team name and work estimate
SELECT T1.NAME,
(CASE WHEN T1.TAGs='count1' THEN 1 ELSE 2 END) AS WORK
FROM RIDW.VW_REQUEST T1
WHERE T1.PROJECT_ID = 73
Which resulted in
Team | Work |
----------------
A | 1 |
----------------
A | 2 |
----------------
B | 2 |
----------------
....
but the tool still sees the numbers as strings. I then tried explicitly casting the CASE to an integer, but resulted in the same issue
...
CAST(CASE WHEN T1.TAGs='count1' THEN 1 ELSE 2 END AS Integer) AS WORK
...
Which again the tool still represents as a string.
Current Goal:
As I cannot confirm whether the tool has an underlying problem, compatibility issues with queries, etc., what I believe will work now is to return a table with 2 rows: the sum of the work for each team.
|Sum of 1's and 2's |
-----------------------------
Team A | SUM(1) + SUM(2) |
-----------------------------
Team B | SUM(1) + SUM(2) |
-----------------------------
What I am having trouble with is using sub queries to use SUM to sum the data. When I try
SUM(CASE WHEN ... END) AS TIME2 I get an error that "Column modifiers AVG and SUM apply only to number attributes". This has me thinking that I need to have a sub query which returns the column after the CASE, and then SUM that, but I am sailing into uncharted waters and can't seem to get the syntax to work.
I understand that a post like this would be better off on the product help forum. I have tried asking around but cannot get any help. The solution I am proposing of returning the 2 row/column table should bypass any issues the software may have, but I need help sub-querying the SUM when using a case.
I appreciate your time and help!
EDIT 1:
Below is the full query, which performs the CASE correctly but still has the problem with the type the tool infers:
SELECT
T1.Name,
CAST(CASE WHEN T1.TAGS='|release_points_1|' THEN 1 ELSE (CASE WHEN T1.TAGS='|release_points_2|' THEN 2 ELSE 0 END) END AS Integer) AS TAG
FROM RIDW.VW_REQUEST T1
WHERE T1.PROJECT_ID = 73
AND
(T1.ISSOFTDELETED = 0) AND
(T1.REQUEST_ID <> -1 AND T1.REQUEST_ID IS NOT NULL)
This small adjustment to your current query should work:
SELECT
T1.Name,
SUM(CAST(CASE WHEN T1.TAGS='|release_points_1|' THEN 1 ELSE (CASE WHEN T1.TAGS='|release_points_2|' THEN 2 ELSE 0 END) END AS Integer)) AS TAG
FROM RIDW.VW_REQUEST T1
WHERE T1.PROJECT_ID = 73
AND
(T1.ISSOFTDELETED = 0) AND
(T1.REQUEST_ID <> -1 AND T1.REQUEST_ID IS NOT NULL)
GROUP BY T1.Name
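To show the SUM-over-CASE pattern in isolation, here is a minimal sketch on an in-memory SQLite database through Python's sqlite3 module. The work_items table and its column names are hypothetical stand-ins for RIDW.VW_REQUEST, filled with the three sample rows from the question. It only demonstrates what a plain SQL engine returns; whether Report Builder then treats the column as a number is a separate matter that this sketch cannot settle.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- hypothetical stand-in for the real view
CREATE TABLE work_items (item INTEGER, team TEXT, tag TEXT);
INSERT INTO work_items VALUES (123,'A','count1'),(124,'A','count2'),(125,'B','count2');
""")

totals = conn.execute("""
SELECT team,
       -- map each tag to its numeric weight, then add the weights up per team
       SUM(CASE tag WHEN 'count1' THEN 1 WHEN 'count2' THEN 2 ELSE 0 END) AS work
FROM work_items
GROUP BY team
ORDER BY team
""").fetchall()
print(totals)  # [('A', 3), ('B', 2)]
```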

Find spectators that have seen the same shows (match multiple rows for each)

For an assignment I have to write several SQL queries for a database stored on a server running PostgreSQL 9.3.0. However, I find myself stuck on the last query. The database models a reservation system for an opera house. The query is about associating a spectator with the other spectators who attend the same events every time.
The model looks like this:
Reservations table
id_res | create_date | tickets_presented | id_show | id_spectator | price | category
-------+---------------------+---------------------+---------+--------------+-------+----------
1 | 2015-08-05 17:45:03 | | 1 | 1 | 195 | 1
2 | 2014-03-15 14:51:08 | 2014-11-30 14:17:00 | 11 | 1 | 150 | 2
Spectators table
id_spectator | last_name | first_name | email | create_time | age
---------------+------------+------------+----------------------------------------+---------------------+-----
1 | gonzalez | colin | colin.gonzalez@gmail.com | 2014-03-15 14:21:30 | 22
2 | bequet | camille | bequet.camille@gmail.com | 2014-12-10 15:22:31 | 22
Shows table
id_show | name | kind | presentation_date | start_time | end_time | id_season | capacity_cat1 | capacity_cat2 | capacity_cat3 | price_cat1 | price_cat2 | price_cat3
---------+------------------------+--------+-------------------+------------+----------+-----------+---------------+---------------+---------------+------------+------------+------------
1 | madama butterfly | opera | 2015-09-05 | 19:30:00 | 21:30:00 | 2 | 315 | 630 | 945 | 195 | 150 | 100
2 | don giovanni | opera | 2015-09-12 | 19:30:00 | 21:45:00 | 2 | 315 | 630 | 945 | 195 | 150 | 100
So far I've started by writing a query to get the id of the spectator and the date of the show he's attending to, the query looks like this.
SELECT Reservations.id_spectator, Shows.presentation_date
FROM Reservations
LEFT JOIN Shows ON Reservations.id_show = Shows.id_show;
Could someone help me understand the problem better and point me towards a solution? Thanks in advance.
So the result I'm expecting should be something like this
id_spectator | other_id_spectators
-------------+--------------------
1| 2,3
Meaning that every time spectator with id 1 went to a show, spectators 2 and 3 did too.
Note based on comments: Wanted to make clear that this answer may be of limited use as it was answered in the context of SQL-Server (tag was present at the time)
There is probably a better way to do it, but you could do it with the STUFF function. The only drawback is that, since your ids are ints, placing a comma between values requires a workaround (each id needs to be cast to a string). Below is the method I can think of using that workaround.
SELECT [id_spectator], [id_show]
, STUFF((SELECT ',' + CAST(A.[id_spectator] as NVARCHAR(10))
FROM reservations A
Where A.[id_show]=B.[id_show] AND a.[id_spectator] != b.[id_spectator] FOR XML PATH('')),1,1,'') As [other_id_spectators]
From reservations B
Group By [id_spectator], [id_show]
This will show you all other spectators that attended the same shows.
Meaning that every time spectator with id 1 went to a show, spectators 2 and 3 did too.
In other words, you want a list of ...
all spectators that have seen all the shows that a given spectator has seen (and possibly more than the given one)
This is a special case of relational division. We have assembled an arsenal of basic techniques here:
How to filter SQL results in a has-many-through relation
It is special because the list of shows each spectator has to have attended is dynamically determined by the given prime spectator.
This assumes that (id_spectator, id_show) is unique in reservations, which has not been clarified.
A UNIQUE constraint on those two columns (in that order) also provides the most important index.
For best performance in query 2 and 3 below also create an index with leading id_show.
1. Brute force
The primitive approach would be to form a sorted array of shows the given user has seen and compare the same array of others:
SELECT 1 AS id_spectator, array_agg(sub.id_spectator) AS id_other_spectators
FROM (
SELECT id_spectator
FROM reservations r
WHERE id_spectator <> 1
GROUP BY 1
HAVING array_agg(id_show ORDER BY id_show)
@> (SELECT array_agg(id_show ORDER BY id_show)
FROM reservations
WHERE id_spectator = 1)
) sub;
But this is potentially very expensive for big tables. The whole table has to be processed, and in a rather expensive way, too.
2. Smarter
Use a CTE to determine relevant shows, then only consider those
WITH shows AS ( -- all shows of id 1; 1 row per show
SELECT id_spectator, id_show
FROM reservations
WHERE id_spectator = 1 -- your prime spectator here
)
SELECT sub.id_spectator, array_agg(sub.other) AS id_other_spectators
FROM (
SELECT s.id_spectator, r.id_spectator AS other
FROM shows s
JOIN reservations r USING (id_show)
WHERE r.id_spectator <> s.id_spectator
GROUP BY 1,2
HAVING count(*) = (SELECT count(*) FROM shows)
) sub
GROUP BY 1;
@> (used in query 1) is the "contains" operator for arrays, so we get all spectators that have seen at least the same shows.
Query 2 is faster than 1. because only relevant shows are considered.
3. Real smart
To also exclude spectators that are not going to qualify early from the query, use a recursive CTE:
WITH RECURSIVE shows AS ( -- produces exactly 1 row
SELECT id_spectator, array_agg(id_show) AS shows, count(*) AS ct
FROM reservations
WHERE id_spectator = 1 -- your prime spectator here
GROUP BY 1
)
, cte AS (
SELECT r.id_spectator, 1 AS idx
FROM shows s
JOIN reservations r ON r.id_show = s.shows[1]
WHERE r.id_spectator <> s.id_spectator
UNION ALL
SELECT r.id_spectator, idx + 1
FROM cte c
JOIN reservations r USING (id_spectator)
JOIN shows s ON s.shows[c.idx + 1] = r.id_show
)
SELECT s.id_spectator, array_agg(c.id_spectator) AS id_other_spectators
FROM shows s
JOIN cte c ON c.idx = s.ct -- has an entry for every show
GROUP BY 1;
Note that the first CTE is non-recursive. Only the second part is recursive (iterative really).
This should be fastest for small selections from big tables. Rows that don't qualify are excluded early. The two indices I mentioned are essential.
SQL Fiddle demonstrating all three.
It sounds like you have one half of the total question--determining which id_shows a particular id_spectator attended.
What you want to ask yourself is how you can determine which id_spectators attended an id_show, given an id_show. Once you have that, combine the two answers to get the full result.
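A rough sketch of how the two halves could be combined, assuming that (id_spectator, id_show) pairs may repeat and using five made-up reservation rows: join the prime spectator's shows to everyone else's reservations, then keep the spectators whose number of matching shows equals the prime spectator's total. It runs on an in-memory SQLite database via Python's sqlite3 module purely so it is self-contained; the SQL is plain enough that it should work on PostgreSQL as well.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE reservations (id_spectator INTEGER, id_show INTEGER);
-- made-up sample: spectator 1 saw shows 1 and 14
INSERT INTO reservations VALUES (1,1),(2,1),(9,1),(1,14),(2,14);
""")

query = """
SELECT r.id_spectator
FROM reservations r
JOIN reservations mine                      -- half 1: shows of the prime spectator
  ON mine.id_show = r.id_show
 AND mine.id_spectator = :prime
WHERE r.id_spectator <> :prime              -- half 2: everyone else at those shows
GROUP BY r.id_spectator
HAVING COUNT(DISTINCT r.id_show) =
       (SELECT COUNT(DISTINCT id_show)
        FROM reservations
        WHERE id_spectator = :prime)        -- must match on every show
ORDER BY r.id_spectator
"""
others = [row[0] for row in conn.execute(query, {"prime": 1})]
print(others)  # [2]
```

Spectator 9 drops out because they only attended show 1, not show 14.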
So the final answer I got, looks like this :
SELECT id_spectator, id_show,(
SELECT string_agg(to_char(A.id_spectator, '999'), ',')
FROM Reservations A
WHERE A.id_show=B.id_show
) AS other_id_spectators
FROM Reservations B
GROUP By id_spectator, id_show
ORDER BY id_spectator ASC;
Which prints something like this:
id_spectator | id_show | other_id_spectators
-------------+---------+---------------------
1 | 1 | 1, 2, 9
1 | 14 | 1, 2
Which suits my needs, however if you have any improvements to offer, please share :) Thanks again everybody!

NOT LIKE search on Link Table

I have a Model table with an ID and A Text Column:
ID | Description
=======================
1 | Model A
2 | Model B
3 | Model C
I have an Items table with an ID and lots of other columns. These two tables are linked via an intermediary table called ItemModels with the following data:
ID | ItemID | ModelID
==================================
1 | 1 | 1
2 | 1 | 2
3 | 2 | 1
4 | 2 | 2
5 | 2 | 3
6 | 3 | 2
I want to search using the standard "Contains, Does Not Contain, Starts With, Ends With" methods.
If I do a "Contains", "Starts With" or "Ends With" search using the LIKE operator this works fine and I always get the correct results, however I have a problem when using the NOT LIKE operator:
If I want to return all items where the model description does not contain "C" (case insensitive) I thought simply of doing the following:
SELECT ItemID FROM ItemModels INNER JOIN Model ON ItemModels.ModelID = Model.ID WHERE Description NOT LIKE '%C%'
I want this query to return Items 1 and 3 as neither of them have any models that contain 'C' however this query will also return item 2 as it will hit the record with ItemModel.ID = 3 and say "That does not contain C so we want to return that!" which of course is undesired behaviour.
So my question is:
How can I do a NOT LIKE search that encompasses all records in a Link table?
ps. I hope I have made this clear as it took me hours to track this issue down and work out why it was happening. And even more hours trying to work out how the hell to fix it!
You don't want any of the items to match your condition. Think in terms of aggregation and a having clause:
SELECT im.ItemID
FROM ItemModels im INNER JOIN
     Model m
     ON im.ModelID = m.ID
GROUP BY im.ItemID
HAVING SUM(CASE WHEN m.Description LIKE '%C%' THEN 1 ELSE 0 END) = 0;
This query counts, for each item, the number of linked models whose description matches the pattern. The = 0 says that there are none. I like this approach because it is quite flexible. Using AND and OR you can put together complicated conditions, such as like '%a%' and '%b%' but not like '%c%'.
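To illustrate combining conditions, the sketch below runs two variants against the sample data from the question: "no model contains C", and "some model contains A and no model contains C". It uses an in-memory SQLite database through Python's sqlite3 module, which is an assumption made for the sake of a runnable example; the question does not say which database is in use, and LIKE case sensitivity differs between engines.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Model (ID INTEGER PRIMARY KEY, Description TEXT);
CREATE TABLE ItemModels (ID INTEGER PRIMARY KEY, ItemID INTEGER, ModelID INTEGER);
INSERT INTO Model VALUES (1,'Model A'),(2,'Model B'),(3,'Model C');
INSERT INTO ItemModels VALUES (1,1,1),(2,1,2),(3,2,1),(4,2,2),(5,2,3),(6,3,2);
""")

base = """
SELECT im.ItemID
FROM ItemModels im
JOIN Model m ON im.ModelID = m.ID
GROUP BY im.ItemID
HAVING {conditions}
ORDER BY im.ItemID
"""
# One counter per pattern; compare each to 0 to express "none" or "at least one".
no_c = "SUM(CASE WHEN m.Description LIKE '%C%' THEN 1 ELSE 0 END) = 0"
has_a = "SUM(CASE WHEN m.Description LIKE '%A%' THEN 1 ELSE 0 END) > 0"

without_c = [r[0] for r in conn.execute(base.format(conditions=no_c))]
a_but_not_c = [r[0] for r in conn.execute(base.format(conditions=has_a + " AND " + no_c))]
print(without_c)    # [1, 3]
print(a_but_not_c)  # [1]
```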

MySQL conditional SELECT statement

If there are records that have a field containing "X", return them, else return a random record.
How the heck do you do this?
This is best done with 2 queries. The first returns the records where field='x'. If that's empty, then do a query for a random record with field!='x'. Getting a random record can be very inefficient as you'll see from the number of "get random record" questions on SO. Because of this, you really only want to do it if you absolutely have to.
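A minimal sketch of the two-query approach, written in Python against an in-memory SQLite database so that it runs on its own. The records table, its columns, and the fetch_x_or_random helper are all hypothetical names; the match is an exact comparison as in the answer above, and SQLite's RANDOM() would be RAND() in MySQL.

```python
import sqlite3

def fetch_x_or_random(conn):
    """Return every row whose field is 'X', or one random row if none match."""
    rows = conn.execute(
        "SELECT id, field FROM records WHERE field = 'X' ORDER BY id"
    ).fetchall()
    if rows:
        return rows
    # Only pay for the random pick when the first query came back empty.
    return conn.execute(
        "SELECT id, field FROM records ORDER BY RANDOM() LIMIT 1"
    ).fetchall()

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE records (id INTEGER PRIMARY KEY, field TEXT)")
conn.executemany("INSERT INTO records VALUES (?, ?)", [(1, "A"), (2, "X"), (3, "X")])

with_x = fetch_x_or_random(conn)      # matching rows exist
conn.execute("DELETE FROM records WHERE field = 'X'")
without_x = fetch_x_or_random(conn)   # falls back to a random row
print(with_x, without_x)
```

Keeping the fallback in application code means the expensive random ordering is skipped entirely whenever a match exists.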
Selecting a random record on its own is difficult and highly inefficient on large tables in MySQL. On this website you can find a script to do it; it should be trivial to add your condition for 'x' and get the functionality you need.
Well, here is my example based on mysql.users table:
First, non existing records:
mysql> SELECT * FROM (select user, 1 as q from user where user like '%z' union all (select user, 0 from user limit 1)) b WHERE q=(SELECT CASE WHEN EXISTS(select user, 1 as q from user where user like '%z' ) THEN 1 ELSE 0 END);
+--------+---+
| user | q |
+--------+---+
| drupal | 0 |
+--------+---+
1 row in set (0.00 sec)
Then, existing:
mysql> SELECT * FROM (select user, 1 as q from user where user like '%t' union all (select user, 0 from user limit 1)) b WHERE q=(SELECT CASE WHEN EXISTS(select user, 1 as q from user where user like '%t' ) THEN 1 ELSE 0 END);
+------------------+---+
| user | q |
+------------------+---+
| root | 1 |
| root | 1 |
| debian-sys-maint | 1 |
| root | 1 |
+------------------+---+
4 rows in set (0.00 sec)
Maybe it will be useful, or maybe someone will be able to rewrite it in better way.