Select rows based on hierarchical permissions - sql

I have a tree/hierarchy of groups and a SQL table of items, each associated with a group (i.e. each item belongs to a group). I need to select only the rows associated with a given group, or with any of the groups below it.
E.g. say this is the group tree:
A
=> B
   => D
=> C
   => E
   => F
Selecting items for group A will return all rows, while selecting for group C will select items belonging in C,E and F (descendants of C).
So far, I am thinking I can implement this in one of two ways:
1. IN list
SELECT * FROM table WHERE Group IN ('C','E','F')
programmatically determining the list of descendants before querying
2. BITWISE operator
SELECT * FROM table WHERE (GroupBitMask & 52) != 0
(i.e. bitwise 'C' + 'E' + 'F' == bit 3 + bit 5 + bit 6 == 110100 == 52)
again, this 52 will need to be computed before the query by parsing the group tree.
I guess I can probably enforce a limit of 64 groups max. and use a 64-bit mask for this.
I'm not sure if the database will use an index for this or simply scan all rows to determine the bitwise result?
Are there any other (better?) methods of selecting the rows I need?
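For reference, the pre-query step that both options rely on could be sketched as follows. This is only an illustration: the `CHILDREN` dictionary, the `items` table and the `group_name` column are hypothetical names, and the bit numbering assumes group A is bit 1, B is bit 2, and so on, as in the example above.

```python
# Hypothetical in-memory tree: parent -> children (mirrors the example tree).
CHILDREN = {"A": ["B", "C"], "B": ["D"], "C": ["E", "F"]}


def descendants(group):
    """Return the group itself plus every group below it, depth-first."""
    result = [group]
    for child in CHILDREN.get(group, []):
        result.extend(descendants(child))
    return result


def bit_mask(groups):
    """Option 2: A is bit 1, B is bit 2, ... so bit n has value 1 << (n - 1)."""
    mask = 0
    for g in groups:
        mask |= 1 << (ord(g) - ord("A"))
    return mask


groups = descendants("C")

# Option 1: one placeholder per descendant, values passed as parameters.
placeholders = ", ".join("?" for _ in groups)
in_query = f"SELECT * FROM items WHERE group_name IN ({placeholders})"

# Option 2: a single integer to compare against the stored bit mask.
mask = bit_mask(groups)
```

Either value would then be passed to the query as a parameter rather than concatenated into the SQL string.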

A simple solution is to store the ancestry as part of the row:
Group  Path  Other columns
A      A     ...
B      AB    ...
C      AC    ...
D      ABD   ...
E      ACE   ...
F      ACF   ...
You can retrieve the base path with a single query:
select Path from YourTable where Group = 'C'
Then you can query all descendants like:
select * from YourTable where path like 'AC%'
This performs very well with a primary key on (Group) and an index on (Path).
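A minimal sketch of the two-step lookup, using Python's built-in sqlite3 module as a stand-in database. The table name `groups`, the column names and the helper `groups_under` are assumptions made for the illustration. Note that with multi-character group names the path would need a separator (for example '/') so that one name cannot be a prefix of another.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical schema: one row per group, with its full ancestry in `path`.
conn.execute("CREATE TABLE groups (group_name TEXT PRIMARY KEY, path TEXT NOT NULL)")
conn.execute("CREATE INDEX idx_groups_path ON groups (path)")
conn.executemany(
    "INSERT INTO groups VALUES (?, ?)",
    [("A", "A"), ("B", "AB"), ("C", "AC"), ("D", "ABD"), ("E", "ACE"), ("F", "ACF")],
)


def groups_under(conn, group_name):
    """Return the given group and all of its descendants, ordered by path."""
    # Step 1: look up the base path of the requested group.
    row = conn.execute(
        "SELECT path FROM groups WHERE group_name = ?", (group_name,)
    ).fetchone()
    if row is None:
        return []
    # Step 2: every descendant's path starts with the base path.
    rows = conn.execute(
        "SELECT group_name FROM groups WHERE path LIKE ? ORDER BY path",
        (row[0] + "%",),
    )
    return [r[0] for r in rows]
```

For example, `groups_under(conn, "C")` returns the groups C, E and F.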

Related

Filter with SQL Server by Group ID

I have two tables and I need to filter the data by filter ID, depending on which filter group each filter ID belongs to.
For example, I have these two tables:
Table 1:
ItemID  FilterID
3       122
3       123
3       4
17      123
Table 2:
FilterID  FilterGroupID
122       5
123       5
4         1
If I search by filter ID = 123, then all item IDs with this filter need to be returned.
If I search for two or more filter IDs that belong to different groups, I need to get only the item IDs that have a matching filter in each of those groups.
Desired output:
First input: 123 -> return item ID 3 and item ID 17.
Second input: 123,4 -> return item ID 3, because filter ID 123 belongs to group ID 5, filter ID 4 belongs to group ID 1, and item ID 3 is the only one that has both of these filters.
Third input: 122,123 -> return item ID 3 and item ID 17, because both filter IDs belong to the same group.
I am getting a little lost with this query and I will be glad to get some help.
I'll try to simplify it: let's say we have a filter group for size and a filter group for color. If I filter by size S or M, then I need to get all items with these sizes. If I then add a color such as blue, the result is narrowed to items with size S or M and color blue. So filters from different groups may cut down the results.
It seems that you want to get every ItemID which has at least one matching filter from each FilterGroupID within your filter input. So within each group you have OR logic, and between groups you have AND logic.
If you store your input in a table variable or Table-Valued Parameter, then you can just use normal relational division techniques.
This then becomes a question of Relational Division With Remainder, with multiple divisors.
There are many ways to slice this cake. Here is one option:
Join the filter input to the groups, to get each filter's group ID.
Use a combination of DENSE_RANK and MAX to get the total number of distinct groups (you can't use COUNT(DISTINCT ...) in a window function, so we need to hack it).
You can change this step to use a subquery instead of window functions. It may be faster or slower.
Join the main table, and filter out any ItemIDs whose number of distinct groups is not the same as the overall total.
SELECT
  t1.ItemID
FROM (
    SELECT *,
      TotalGroups = MAX(dr) OVER ()   -- highest dense rank = number of distinct groups
    FROM (
        SELECT
          fi.FilterID,
          t2.FilterGroupID,
          dr = DENSE_RANK() OVER (ORDER BY t2.FilterGroupID)
        FROM #Filters fi
        JOIN Table2 t2 ON t2.FilterID = fi.FilterID
    ) fi
) fi
JOIN Table1 t1 ON t1.FilterID = fi.FilterID
GROUP BY
  t1.ItemID
HAVING COUNT(DISTINCT fi.FilterGroupID) = MAX(fi.TotalGroups);
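The subquery variant mentioned in the steps can be tried out with the sample data from the question. The sketch below uses Python's built-in sqlite3 module instead of SQL Server, passes the filter input as query parameters rather than a #Filters table, and the helper name `items_for` is made up for the illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Table1 (ItemID INTEGER, FilterID INTEGER);
CREATE TABLE Table2 (FilterID INTEGER, FilterGroupID INTEGER);
INSERT INTO Table1 VALUES (3, 122), (3, 123), (3, 4), (17, 123);
INSERT INTO Table2 VALUES (122, 5), (123, 5), (4, 1);
""")


def items_for(filter_ids):
    """Items with at least one matching filter in every group of the input."""
    if not filter_ids:
        return []
    marks = ", ".join("?" for _ in filter_ids)
    sql = f"""
        SELECT t1.ItemID
        FROM Table1 t1
        JOIN Table2 t2 ON t2.FilterID = t1.FilterID
        WHERE t1.FilterID IN ({marks})
        GROUP BY t1.ItemID
        -- groups covered by this item must equal groups covered by the input
        HAVING COUNT(DISTINCT t2.FilterGroupID) = (
            SELECT COUNT(DISTINCT FilterGroupID)
            FROM Table2
            WHERE FilterID IN ({marks})
        )
        ORDER BY t1.ItemID
    """
    # The placeholder list appears twice, so the parameters are passed twice.
    return [row[0] for row in conn.execute(sql, list(filter_ids) * 2)]
```

Calling `items_for([123, 4])` keeps only item 3, while `items_for([122, 123])` keeps items 3 and 17 because both filters sit in the same group.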

sql how to convert multi select field to rows with totals

I have a table that has a field whose contents are a concatenated list of selections from a multi-select form. I would like to convert the data in this field into another table where each row has the text of the selection and a count of the number of times this selection was made.
E.g.
Original table:
id  selections
1   A;B
2   B;D
3   A;B;D
4   C
I would like to get the following out:
selection  count
A          2
B          3
C          1
D          2
I could easily do this with split and maps in JavaScript etc., but I am not sure how to approach it in SQL (I use PostgreSQL). The goal is to use the second table to plot a graph in Google Data Studio.
A much simpler solution:
select regexp_split_to_table(selections, ';'), count(*)
from test_table
group by 1
order by 1;
You can use a lateral join and the handy set-returning function regexp_split_to_table() to unnest the strings to rows, then aggregate and count:
select x.selection, count(*) cnt
from mytable t
cross join lateral regexp_split_to_table(t.selections, ';') x(selection)
group by x.selection;
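To make the logic of both queries concrete, here is the same split-then-count step written in plain Python against the sample rows from the question. It is only an illustration of what the SQL computes, not a replacement for running it in the database, and the variable names are made up.

```python
from collections import Counter

# Sample rows from the question: (id, selections)
rows = [(1, "A;B"), (2, "B;D"), (3, "A;B;D"), (4, "C")]

# Unnest each delimited string into individual selections, then count them.
counts = Counter(
    selection
    for _id, selections in rows
    for selection in selections.split(";")
)

# Sorted (selection, count) pairs, like "order by 1" in the first query.
result = sorted(counts.items())
```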

How to get unique list from two column in Entity Framework core?

I have a Table in the database with 2 Columns containing userIds.
Column A  Column B
1         4
2         2
3         6
4         1
5         7
Now I want to get a list/array containing the distinct Ids.
The expected result will be
[1,2,3,4,5,6,7]
Any idea how to do it?
I am looking for an EF Core lambda/LINQ expression that runs on the database side, so that I do not have to fetch the results into memory and then find the distinct list, as that would be a costly operation.
You can try this:
var ids = Table1.Select( i => i.ColumnA )
    .Union( Table2.Select( j => j.ColumnB ) )
    .ToList();
Use union:
select col1
from t
union -- on purpose to remove duplicates
select col2
from t;
You would then read the results of the query into your application.
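The de-duplicating behaviour of UNION can be checked against the sample data from the question. This sketch uses Python's built-in sqlite3 module as a stand-in database, with the table `t` and columns `col1` and `col2` named as in the query above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (col1 INTEGER, col2 INTEGER)")
conn.executemany(
    "INSERT INTO t VALUES (?, ?)",
    [(1, 4), (2, 2), (3, 6), (4, 1), (5, 7)],
)

# UNION (unlike UNION ALL) removes duplicates across both columns.
ids = [
    row[0]
    for row in conn.execute(
        "SELECT col1 AS id FROM t UNION SELECT col2 FROM t ORDER BY id"
    )
]
```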
Posting as an answer for further reference:
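// context.YourTable, ColumnA and ColumnB are placeholder names for your DbSet and its two columns
IList<string> ids = (from a in context.YourTable select a.ColumnA).Union(from b in context.YourTable select b.ColumnB).ToList();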

Find rows that contain all words in any order

My application is built in VB.NET with SQL Server Compact as the database, so I'm unable to use a full-text index.
Here's my data...
MainTable field1
A B C
B G C
X Y Z
C P B
Search term = B C
Expected Results = any combination of the search term = Rows 1, 2, 4
Here's what I'm currently doing...
I'm permuting the search term B C into an array containing %B%C% and %C%B% and inserting those values into field1 of tempTable.
So my SQL looks like this:
SELECT * FROM MainTable INNER JOIN tempTable ON MainTable.field1 LIKE tempTable.field1
In this simple example, it does return the expected results correctly. However, my search term can contain more values. For example, the 6 search terms B C D E F G have 720 permutations, and as more search terms are used the number of permutations grows factorially, which is not good.
Is there a better way to do this?
The following will work for your example above:
Select * from table where field1 like '%[BC]%'
But it will also return strings that contain ONLY "B" or "C". Do you need both characters in any order or one or more?
EDIT: Then the following would work:
Select * from test_data where col1 LIKE '%Apple%' and col1 like '%Dog%'
See the demo here: http://rextester.com/edit/LNDQ49764
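For an arbitrary number of search terms, the AND-ed LIKE conditions can be generated instead of permuting the terms. The sketch below uses Python's built-in sqlite3 module as a stand-in for SQL Server Compact, and the helper `find_rows` is a made-up name. Note that LIKE '%B%' matches substrings, so a term such as B would also match a longer word containing B.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE MainTable (field1 TEXT)")
conn.executemany(
    "INSERT INTO MainTable VALUES (?)",
    [("A B C",), ("B G C",), ("X Y Z",), ("C P B",)],
)


def find_rows(conn, terms):
    """Return rows whose field1 contains every term, in any order."""
    if not terms:
        return []
    # One LIKE condition per term: n conditions instead of n! patterns.
    where = " AND ".join("field1 LIKE ?" for _ in terms)
    params = [f"%{term}%" for term in terms]
    sql = f"SELECT field1 FROM MainTable WHERE {where} ORDER BY rowid"
    return [row[0] for row in conn.execute(sql, params)]
```

With the sample data, `find_rows(conn, ["B", "C"])` returns rows 1, 2 and 4.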

SQL: Most efficient way to select sequences of rows from a table

I have a tagged textual corpus stored in an SQL table like the following:
id  tag1  tag2  token  sentence_id
0   a     e     five   1
1   b     f     score  1
2   c     g     years  1
3   d     h     ago    1
My task is to search the table for sequences of tokens that meet certain criteria, sometimes with gaps between each token.
For example:
I want to be able to search for a sequence similar to the following:
the first token has the value a in the tag1 column, and
the second token is one to two rows away from the first, and has the value g in tag2 or b in tag1, and
the third token is at least three rows away from the first, and has ago in the token column.
In SQL, this would be something like the following:
SELECT * FROM my_table t1
JOIN my_table t2 ON t1.sentence_id = t2.sentence_id
JOIN my_table t3 ON t3.sentence_id = t1.sentence_id
WHERE t1.tag1 = 'a' AND (t2.id = t1.id + 1 OR t2.id = t1.id + 2)
AND (t2.tag2 = 'g' OR t2.tag1 = 'b')
AND t3.id >= t1.id + 3 AND t3.token = 'ago'
So far I have only been able to achieve this by joining the table by itself each time I specify a new token in the sequence (e.g. JOIN my_table t4), but with millions of rows this gets quite slow. Is there a more efficient way to do this?
You could try this staged approach:
Apply each condition (other than the various distance conditions) as a subquery.
Calculate the distances between the tokens which meet the conditions.
Apply all the distance conditions separately.
This might improve things, if you have indexes on the tag1, tag2 and token columns:
SELECT DISTINCT sentence_id FROM
(
    -- 2. Here we calculate the distances
    SELECT cond1.sentence_id,
        (cond2.id - cond1.id) as cond2_distance,
        (cond3.id - cond1.id) as cond3_distance
    FROM
    -- 1. These are all the non-distance conditions
    (
        SELECT * FROM my_table WHERE tag1 = 'a'
    ) cond1
    INNER JOIN
    (
        SELECT * FROM my_table WHERE
        (tag1 = 'b' OR tag2 = 'g')
    ) cond2
    ON cond1.sentence_id = cond2.sentence_id
    INNER JOIN
    (
        SELECT * FROM my_table WHERE token = 'ago'
    ) cond3
    ON cond1.sentence_id = cond3.sentence_id
) conditions
-- 3. Now apply the distance conditions ("one to two rows away" means 1 to 2)
WHERE cond2_distance BETWEEN 1 AND 2
AND cond3_distance >= 3
ORDER BY sentence_id;
If you apply this query to this SQL fiddle you get:
| sentence_id |
|-------------|
| 1 |
| 4 |
Which is what you want. Now whether it's any faster or not, only you (with your million-row database) can really tell, but from the perspective of having to actually write these queries, you'll find they're much easier to read, understand and maintain.
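The staged query can also be tried locally on the four sample rows from the question. The sketch below uses Python's built-in sqlite3 module as a stand-in database and applies the "one to two rows away" condition as a distance of 1 to 2. Sentence 2 is invented for the illustration: its "ago" token is only one row away from the first token, so it should not match.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE my_table (id INTEGER PRIMARY KEY, tag1 TEXT, tag2 TEXT,"
    " token TEXT, sentence_id INTEGER)"
)
conn.executemany(
    "INSERT INTO my_table VALUES (?, ?, ?, ?, ?)",
    [
        (0, "a", "e", "five", 1),
        (1, "b", "f", "score", 1),
        (2, "c", "g", "years", 1),
        (3, "d", "h", "ago", 1),
        # Hypothetical extra sentence: 'ago' is too close to the first token.
        (4, "a", "e", "long", 2),
        (5, "b", "f", "ago", 2),
    ],
)

STAGED_QUERY = """
SELECT DISTINCT sentence_id FROM (
    SELECT cond1.sentence_id,
           cond2.id - cond1.id AS cond2_distance,
           cond3.id - cond1.id AS cond3_distance
    FROM (SELECT * FROM my_table WHERE tag1 = 'a') cond1
    JOIN (SELECT * FROM my_table WHERE tag1 = 'b' OR tag2 = 'g') cond2
      ON cond1.sentence_id = cond2.sentence_id
    JOIN (SELECT * FROM my_table WHERE token = 'ago') cond3
      ON cond1.sentence_id = cond3.sentence_id
) conditions
WHERE cond2_distance BETWEEN 1 AND 2
  AND cond3_distance >= 3
ORDER BY sentence_id
"""

matching_sentences = [row[0] for row in conn.execute(STAGED_QUERY)]
```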
You need to edit your question and give more details on how these sequences of tokens work (for instance, what does "each time I specify a new token in the sequence" mean in practice?).
In PostgreSQL you can solve this class of queries with a window function. Following your exact specification above:
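SELECT *
FROM (
    -- Window functions are evaluated after WHERE and a column alias cannot be
    -- referenced in the same query level, so compute next_token in a subquery.
    SELECT *,
        CASE
            WHEN lead(tag2, 2) OVER w = 'g' THEN lead(token, 2) OVER w
            WHEN lead(tag1) OVER w = 'b' THEN lead(token) OVER w
            ELSE NULL::text
        END AS next_token
    FROM my_table
    WINDOW w AS (PARTITION BY sentence_id ORDER BY id)
) s
WHERE tag1 = 'a'
AND next_token IS NOT NULL;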
The lead() function looks ahead a number of rows (default is 1, when not specified) from the current row in the window frame, in this case all rows with the same sentence_id as specified in the partition of the window definition. So, lead(tag2, 2) looks at the value of tag2 two rows ahead to compare against your condition, and lead(token, 2) returns the token from two rows ahead, within the same sentence_id, as column next_token in the current row. If the first CASE condition fails, the second is evaluated; if that fails too, NULL is returned. Note that the order of the conditions in the CASE clause is significant: different ordering gives different results.
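To illustrate what lead() does, here is a small pure-Python emulation applied to the sample rows from the question. It is a sketch, not PostgreSQL's implementation: the helper `lead` is written for this example and assumes the rows are already sorted by sentence_id and then id, mirroring the PARTITION BY and ORDER BY of the window.

```python
# Sample rows from the question, sorted by (sentence_id, id).
rows = [
    {"id": 0, "tag1": "a", "tag2": "e", "token": "five", "sentence_id": 1},
    {"id": 1, "tag1": "b", "tag2": "f", "token": "score", "sentence_id": 1},
    {"id": 2, "tag1": "c", "tag2": "g", "token": "years", "sentence_id": 1},
    {"id": 3, "tag1": "d", "tag2": "h", "token": "ago", "sentence_id": 1},
]


def lead(rows, i, column, offset=1):
    """Value of `column` `offset` rows ahead in the same sentence, else None."""
    j = i + offset
    if j < len(rows) and rows[j]["sentence_id"] == rows[i]["sentence_id"]:
        return rows[j][column]
    return None


matches = []
for i, row in enumerate(rows):
    # Same order of checks as the CASE expression: two rows ahead first.
    if lead(rows, i, "tag2", 2) == "g":
        next_token = lead(rows, i, "token", 2)
    elif lead(rows, i, "tag1") == "b":
        next_token = lead(rows, i, "token")
    else:
        next_token = None
    if row["tag1"] == "a" and next_token is not None:
        matches.append((row["id"], next_token))
```

Because the two-rows-ahead check comes first, row 0 picks up the token from row 2 rather than row 1, which is the ordering effect described above.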
Obviously, if you keep on adding conditions for subsequent tokens the query becomes very complex and you may have to put individual search conditions in separate stored procedures and then call these depending on your requirements.