Grouping records by subsets SQL


I have a database with PermitHolders (PermitNum = PK) and the detailed facilities of each permit holder. The tblPermitDetails table has 2 columns:
PermitNum (foreign key)
FacilityID (integer foreign key, lookup to the Facility table).
A permittee can have 1 to 29 items on their permit, e.g. Permit 50 can have a Boat Dock (FacID 4), a Paved Walkway (FacID 17), a Retaining Wall (FacID 20), etc. I need SQL that will filter/display ALL permit numbers that have ONLY FacIDs 19, 20, or 28, NOT ones that have those plus "x" others... just that subset. I've worked on this for 4 days, would someone PLEASE help me? I HAVE posted to other boards but have not received any helpful suggestions.
As Oded suggested, here are more details.
There is no PK for the tblPermitDetails table.
Let's say that we have Permittees 1 - 10. Permit 1 is John Doe; he has a Boat Dock (FacID 1), a Walkway (FacID 4), a Buoy (FacID 7), and Underbrushing (FacID 19)... those are 4 records for Permit 1. Permit 2 is Sus Brown; she has ONLY Underbrushing (FacID 19). Permit 3 is Steve Toni; he has a Boat Dock (FacID 1), a Walkway (FacID 4), a Buoy (FacID 7), and a Retaining Wall (FacID 20). Permit 4 is Jill Jack; she has Underbrushing (FacID 19) and a Retaining Wall (FacID 20). I could go on, but I hope you follow me. I want SQL (for MS Access) that will show me ONLY Permits 2 & 4, because they have a combination of FacIDs 19 & 20 [either both, or one or the other], BUT NOT ANYTHING ELSE, unlike Permit 1, who has #19 but also has 1, 4 & 7.
I hope that helps, please say so if not.
Oh yeah, I DO know the difference between i.e. and e.g., since I'm in my 40s and have written over 3000 pages of archaeological field reports and an MA thesis, but I'm really stressed out here from struggling with this SQL and couldn't care less about consulting the Chicago Manual of Style before banging out a plea for help. SO, DON'T be coy about my composition errors! Thank you!

Untested, but how about something like this? It returns permits that have all three of 19, 20 and 28 and nothing else. Each subquery is matched on PermitNum so that it looks at the other rows of the same permit rather than at the current row only.
SELECT DISTINCT p.PermitNum
FROM tblPermitDetails p
WHERE EXISTS
(SELECT '+'
FROM tblPermitDetails d
WHERE d.PermitNum = p.PermitNum
AND d.FacilityID = 19 )
AND EXISTS
(SELECT '+'
FROM tblPermitDetails d
WHERE d.PermitNum = p.PermitNum
AND d.FacilityID = 20 )
AND EXISTS
(SELECT '+'
FROM tblPermitDetails d
WHERE d.PermitNum = p.PermitNum
AND d.FacilityID = 28 )
AND NOT EXISTS
(SELECT '+'
FROM tblPermitDetails d
WHERE d.PermitNum = p.PermitNum
AND d.FacilityID NOT IN (19,20,28) )

SELECT PermitNum
FROM tblPermitDetails
GROUP BY PermitNum
HAVING SUM(IIf(FacilityID IN (19, 20, 28), 0, 1)) = 0

I wasn't sure if you wanted ALL of 19, 20, 28 or ANY of 19, 20, 28. This is untested, but if you want the "any of" solution it should be fairly close:
Select distinct
allowed.PermitNum
from
tblPermitDetails as allowed
left join (select PermitNum from tblPermitDetails where FacilityID not in (19, 20, 28)) as disallowed
on allowed.PermitNum = disallowed.PermitNum
where
allowed.FacilityID in (19, 20, 28)
and disallowed.PermitNum is null

SELECT DISTINCT PermitNum FROM tblPermitDetails t1
WHERE FacilityID IN (19, 20, 28)
AND NOT EXISTS (SELECT 1 FROM tblPermitDetails t2
WHERE t2.PermitNum = t1.PermitNum
AND FacilityId NOT IN (19, 20, 28));
Or, in prose: get the list of PermitNums that have any of the requested FacilityIDs, as long as no row exists for that PermitNum with a FacilityID that isn't in the requested list.
A more optimized version of the same query would be the following:
SELECT PermitNum FROM (SELECT DISTINCT PermitNum FROM tblPermitDetails
WHERE FacilityID IN (19, 20, 28)) AS t1
WHERE NOT EXISTS (SELECT 1 FROM tblPermitDetails t2
WHERE t2.PermitNum = t1.PermitNum
AND FacilityID NOT IN (19, 20, 28));
It's a little harder to read, but it will involve fewer "NOT EXISTS" subqueries by doing the "DISTINCT" part first.
Update:
David-W-Fenton mentions that NOT EXISTS should be avoided for optimization reasons. For a small table, this probably won't matter much, but you could also do the query using COUNT(*) if you needed to avoid NOT EXISTS:
SELECT DISTINCT PermitNum FROM tblPermitDetails t1
WHERE (SELECT COUNT(*) FROM tblPermitDetails t2
WHERE t1.PermitNum = t2.PermitNum
AND FacilityID IN (19, 20, 28))
=
(SELECT COUNT(*) FROM tblPermitDetails t3
WHERE t1.PermitNum = t3.PermitNum)
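For anyone who wants to check these queries before trying them in Access, here is a minimal sketch using Python's built-in sqlite3 module. SQLite is only a stand-in for Access here (an assumption, the dialects differ in places), and the rows are the four sample permits from the question. It runs the NOT EXISTS version and the COUNT(*) version side by side.

```python
import sqlite3

# In-memory SQLite stand-in for the Access table; rows are the question's sample permits.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tblPermitDetails (PermitNum INTEGER, FacilityID INTEGER)")
conn.executemany(
    "INSERT INTO tblPermitDetails VALUES (?, ?)",
    [(1, 1), (1, 4), (1, 7), (1, 19),
     (2, 19),
     (3, 1), (3, 4), (3, 7), (3, 20),
     (4, 19), (4, 20)],
)

# Variant 1: at least one allowed facility, and no row outside the allowed set.
NOT_EXISTS_SQL = """
SELECT DISTINCT PermitNum FROM tblPermitDetails t1
WHERE FacilityID IN (19, 20, 28)
  AND NOT EXISTS (SELECT 1 FROM tblPermitDetails t2
                  WHERE t2.PermitNum = t1.PermitNum
                    AND t2.FacilityID NOT IN (19, 20, 28))
ORDER BY PermitNum
"""

# Variant 2: the count of allowed rows equals the count of all rows for the permit.
COUNT_SQL = """
SELECT DISTINCT PermitNum FROM tblPermitDetails t1
WHERE (SELECT COUNT(*) FROM tblPermitDetails t2
       WHERE t1.PermitNum = t2.PermitNum
         AND t2.FacilityID IN (19, 20, 28))
    = (SELECT COUNT(*) FROM tblPermitDetails t3
       WHERE t1.PermitNum = t3.PermitNum)
ORDER BY PermitNum
"""

not_exists_permits = [row[0] for row in conn.execute(NOT_EXISTS_SQL)]
count_permits = [row[0] for row in conn.execute(COUNT_SQL)]
print(not_exists_permits, count_permits)  # both list permits 2 and 4
```

On the sample data both variants keep permits 2 and 4 and drop 1 and 3, which is what the question asks for.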

What about this (untested)?
select distinct t1.permitnum
from tblPermitDetails t1
left outer join
(Select distinct permitnum from tblPermitDetails where facilityId not in (19, 20, 28)) t2
on t1.permitnum=t2.permitnum
where t2.permitnum is null
I.e. we find all the permits that cannot match your criteria (they have at least one detail outside those you list), then we find all the permits that are left, via a left join and the where criteria.
With indexes set up properly, this should be pretty quick.

Quick way might be to only look at the ones with exactly three matches (with an inner query), and then among those only include the ones that have 19, 20, and 28.
Of course, that is sort of a brute force method, and not very elegant. But it has the small benefit of being understandable. None of the approaches I can think of will be easy to customize to various other sets of values.
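On the point about customizing to other sets of values: one possible way around it is to keep the allowed FacilityIDs in their own table and compare counts, so that only the table contents change. The sketch below is an assumption-laden illustration, not something from the original thread: it uses Python's sqlite3 as a stand-in for Access, and the `allowed` table and the `permits_within` helper are made-up names. It implements the "only these facilities, any combination" reading of the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tblPermitDetails (PermitNum INTEGER, FacilityID INTEGER)")
conn.executemany(
    "INSERT INTO tblPermitDetails VALUES (?, ?)",
    [(1, 1), (1, 4), (1, 7), (1, 19), (2, 19),
     (3, 1), (3, 4), (3, 7), (3, 20), (4, 19), (4, 20)],
)

def permits_within(conn, allowed_ids):
    """Return permits whose facilities all fall inside allowed_ids (hypothetical helper)."""
    conn.execute("CREATE TEMP TABLE IF NOT EXISTS allowed (FacilityID INTEGER PRIMARY KEY)")
    conn.execute("DELETE FROM allowed")
    conn.executemany("INSERT INTO allowed VALUES (?)", [(i,) for i in set(allowed_ids)])
    sql = """
    SELECT d.PermitNum
    FROM tblPermitDetails d
    LEFT JOIN allowed a ON a.FacilityID = d.FacilityID
    GROUP BY d.PermitNum
    HAVING COUNT(*) = COUNT(a.FacilityID)  -- every detail row matched the allowed set
    ORDER BY d.PermitNum
    """
    return [row[0] for row in conn.execute(sql)]

subset_a = permits_within(conn, [19, 20, 28])
subset_b = permits_within(conn, [1, 4, 7, 19])
print(subset_a, subset_b)
```

COUNT(a.FacilityID) skips the NULLs produced by unmatched rows, so the two counts only agree when a permit has nothing outside the allowed table.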

Ok, it seems i didn't understand the problem at first. So, again:
I will recreate the example by Stacy here:
DECLARE @PermitHolders TABLE
(PermitNum INT NOT NULL,
PermitHolder VARCHAR(20))
DECLARE @tblPermitDetails TABLE
(PermitNum INT,
FacilityID INT)
INSERT INTO @PermitHolders VALUES (1, 'John Doe')
INSERT INTO @PermitHolders VALUES (2, 'Sus Brown')
INSERT INTO @PermitHolders VALUES (3, 'Steve Toni')
INSERT INTO @PermitHolders VALUES (4, 'Jill Jack')
INSERT INTO @tblPermitDetails VALUES (1, 1)
INSERT INTO @tblPermitDetails VALUES (1, 4)
INSERT INTO @tblPermitDetails VALUES (1, 7)
INSERT INTO @tblPermitDetails VALUES (1, 19)
INSERT INTO @tblPermitDetails VALUES (2, 19)
INSERT INTO @tblPermitDetails VALUES (3, 1)
INSERT INTO @tblPermitDetails VALUES (3, 4)
INSERT INTO @tblPermitDetails VALUES (3, 7)
INSERT INTO @tblPermitDetails VALUES (3, 20)
INSERT INTO @tblPermitDetails VALUES (4, 19)
INSERT INTO @tblPermitDetails VALUES (4, 20)
And this is the solution:
SELECT * FROM @PermitHolders
WHERE (PermitNum IN (SELECT PermitNum FROM @tblPermitDetails WHERE FacilityID IN (19, 20, 28)))
AND (PermitNum NOT IN (SELECT PermitNum FROM @tblPermitDetails WHERE FacilityID NOT IN (19, 20, 28)))
I have one observation on the side:
You didn't mention any PK for tblPermitDetails. If none exists, this may not be good for performance. I recommend that you create a PK using both PermitNum and FacilityID (composite key), because this will serve as both your PK and a useful index for the expected queries.

Related

Postgres optimize several left joins on one table

I have a postgres schema like this:
CREATE TABLE rows
(
id bigint NOT NULL,
start_year integer
);
CREATE TABLE calculations
(
id bigint NOT NULL,
row_id bigint NOT NULL,
year integer,
calculation numeric(23,7)
);
INSERT INTO rows (id, start_year)
VALUES
(1, 2020),
(2, 2021);
INSERT INTO calculations (id, row_id, year, calculation)
VALUES
(1, 1, 2019, 0),
(2, 1, 2020, 100),
(3, 1, 2021, 900),
(4, 1, 2022, 300),
(5, 1, 2023, 500),
(6, 2, 2019, 220),
(7, 2, 2020, 111),
(8, 2, 2021, 222),
(9, 2, 2024, 333),
(10, 2, 2025, 444);
And an SQL view with a select like this:
SELECT
row.id,
calc1.calculation as calc1,
calc2.calculation as calc2,
calc3.calculation as calc3
FROM
rows row
LEFT JOIN calculations calc1 on calc1.row_id = row.id and calc1.year = row.start_year
LEFT JOIN calculations calc2 on calc2.row_id = row.id and calc2.year = row.start_year + 1
LEFT JOIN calculations calc3 on calc3.row_id = row.id and calc3.year = row.start_year + 2;
Actually, both tables are far larger. The query takes about 10 seconds to execute, and most of that time goes to the calculations table. The only optimization I've managed so far is:
SELECT
row.id,
calc.calculation->(row.start_year)::text as calc1,
calc.calculation->(row.start_year+1)::text as calc2,
calc.calculation->(row.start_year+2)::text as calc3
FROM
rows row
LEFT JOIN (select row_id, json_object_agg(year, calculation) as calculation
from calculations
group by row_id) calc on calc.row_id = row.id
This is about twice as fast, but it is not enough, because it still reads year values that are not needed. When I replaced this query with one that takes only the first, second and third year, it ran much faster, so I wonder whether there is another way to merge these JOINs into one with a performance boost.
http://sqlfiddle.com/#!17/8ff004/4

You may try adding the following index to the calculations table:
CREATE INDEX idx_calc ON calculations (row_id, year, calculation);
This index, if used, has the ability to speed up the multiple joins to the calculations table.
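As for merging the three joins into one: a possible alternative is a single join restricted to the three-year window, pivoted with conditional aggregation. The sketch below only checks the logic, using Python's sqlite3 as a stand-in for Postgres (an assumption), with the sample data from the question. It assumes at most one calculation per (row_id, year), since MAX would otherwise pick just one value. Whether it is actually faster on the real tables depends on the planner and on indexes such as the one above, so it is worth comparing with EXPLAIN ANALYZE.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE "rows" (id INTEGER NOT NULL, start_year INTEGER);
CREATE TABLE calculations (id INTEGER NOT NULL, row_id INTEGER NOT NULL,
                           year INTEGER, calculation NUMERIC);
INSERT INTO "rows" VALUES (1, 2020), (2, 2021);
INSERT INTO calculations VALUES
  (1, 1, 2019, 0), (2, 1, 2020, 100), (3, 1, 2021, 900), (4, 1, 2022, 300),
  (5, 1, 2023, 500), (6, 2, 2019, 220), (7, 2, 2020, 111), (8, 2, 2021, 222),
  (9, 2, 2024, 333), (10, 2, 2025, 444);
""")

# One join limited to start_year .. start_year + 2, then one column per year offset.
ONE_JOIN_SQL = """
SELECT r.id,
       MAX(CASE WHEN c.year = r.start_year     THEN c.calculation END) AS calc1,
       MAX(CASE WHEN c.year = r.start_year + 1 THEN c.calculation END) AS calc2,
       MAX(CASE WHEN c.year = r.start_year + 2 THEN c.calculation END) AS calc3
FROM "rows" r
LEFT JOIN calculations c
       ON c.row_id = r.id
      AND c.year BETWEEN r.start_year AND r.start_year + 2
GROUP BY r.id
ORDER BY r.id
"""

pivoted = conn.execute(ONE_JOIN_SQL).fetchall()
print(pivoted)
```

Years with no calculation come back as NULL (None in Python), the same as with the three LEFT JOINs.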

Recursive member of a common table expression 'cte' has multiple recursive references?

I have the following two tables, E and G.
create table E(K1 int, K2 int primary key (K1, K2))
insert E
values (1, 11), (1, 20), (2, 10), (2, 30), (3, 10), (3, 30),
(4, 100), (5, 200), (6, 200),
(7, 300), (8, 300), (9, 310), (10, 310), (10, 320), (11, 320), (12, 330)
create table G(GroupID varchar(10), K1 int primary key)
insert G
values ('Group 1', 1), ('Group 1', 2), ('Group 2', 4), ('Group 2', 5),
('Group 3', 8), ('Group 3', 9), ('Group 3', 12)
I need a view that, given a K2 number, finds all related K1s. "Related K1s" are defined as follows:
K1s that have the same K2 in table E. For example, 2 and 3 in E are related because both records have a K2 of 10: (2, 10), (3, 10).
K1s that have the same GroupID in table G. For example, the K1s 1 and 2 are both in group Group 1.
So querying the following view
select K1 from GroupByK2 where K2 = 200 -- or 100
should return
4
5
6
because both (5, 200) and (6, 200) have the same K2. And the 4 and 5 of (4, 100) and (5, 200) are both in 'Group 2'.
And select K1 from GroupByK2 where K2 = 300 -- or 310, 320, 330 should return 7, 8, 9, 10, 11, 12.
View:
create view GroupByK2
as
with cte as (
select E.*, K2 K2x from E
union all
select E.K1, E.K2, cte.K2x
from cte
join G on cte.K1 = G.K1
join G h on h.GroupID = G.GroupID
join E on E.K1 = h.K1 and E.K1 <> cte.K1
where not exists (select * from cte x where x.k1 = G.k1 and x.K2 = G.K2) -- error
)
select *
from cte;
However, the SQL has the error of
Recursive member of a common table expression 'cte' has multiple recursive references?

Scratched my head over this one a bit, but here is a working, although highly inefficient solution...
You correctly tried to eliminate joining the original rows back to avoid the cyclic recursion, but it won't work for two reasons:
As the error states, you can't reference the recursive member more than once.
Even if you could, at each recursion the recursive set consists only of the output of the previous recursion, so you wouldn't be able to eliminate the cycles from earlier recursions anyway.
My solution avoids that in a "less than optimal" way: it simply includes all the rows with the cycles, but limits the recursion level to a hard number (5 in the example, but you can parameterize it as well) to avoid endless recursion, and only eliminates the duplicates in the final query, with a GROUP BY.
This may or may not work for you, depending on the depth of the hierarchy. It creates tons of redundant work, and I doubt it will scale, but YMMV. I addressed it as a logical puzzle :-)
This is one of the (rare) cases where I will definitely consider an iterative solution instead of a set based one. You will need to create a table valued function so you can parameterize it, which you won't be able to do properly with a view. Within the function create a temporary table or table variable, populate it with the output sets one by one, and loop until you are done. This way you will be able to eliminate the cycles at the root by checking the content of the temporary table and only inserting new rows.
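To make the iterative idea concrete, here is a rough sketch of the loop. It is not the table valued function itself: it uses Python's sqlite3 as a stand-in for SQL Server (an assumption), and the `found` temp table and `related_k1` function are names I made up. It seeds the working table from the given K2, then keeps adding K1s that share a K2 or a GroupID with something already found, and stops when a pass adds nothing new.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE E (K1 INTEGER, K2 INTEGER, PRIMARY KEY (K1, K2));
CREATE TABLE G (GroupID TEXT, K1 INTEGER PRIMARY KEY);
INSERT INTO E VALUES (1, 11), (1, 20), (2, 10), (2, 30), (3, 10), (3, 30),
  (4, 100), (5, 200), (6, 200), (7, 300), (8, 300), (9, 310), (10, 310),
  (10, 320), (11, 320), (12, 330);
INSERT INTO G VALUES ('Group 1', 1), ('Group 1', 2), ('Group 2', 4), ('Group 2', 5),
  ('Group 3', 8), ('Group 3', 9), ('Group 3', 12);
CREATE TEMP TABLE found (K1 INTEGER PRIMARY KEY);  -- working set of related K1s
""")

# K1s reachable in one step from the working set: same K2 in E, or same GroupID in G.
NEIGHBOURS_SQL = """
SELECT e2.K1 FROM found f JOIN E e1 ON e1.K1 = f.K1 JOIN E e2 ON e2.K2 = e1.K2
UNION
SELECT g2.K1 FROM found f JOIN G g1 ON g1.K1 = f.K1 JOIN G g2 ON g2.GroupID = g1.GroupID
"""

def related_k1(k2):
    """Loop until no new K1 is added; the primary key on found rejects repeats."""
    conn.execute("DELETE FROM found")
    conn.execute("INSERT OR IGNORE INTO found SELECT DISTINCT K1 FROM E WHERE K2 = ?", (k2,))
    while True:
        before = conn.execute("SELECT COUNT(*) FROM found").fetchone()[0]
        candidates = conn.execute(NEIGHBOURS_SQL).fetchall()
        conn.executemany("INSERT OR IGNORE INTO found VALUES (?)", candidates)
        after = conn.execute("SELECT COUNT(*) FROM found").fetchone()[0]
        if after == before:
            break
    return [row[0] for row in conn.execute("SELECT K1 FROM found ORDER BY K1")]

print(related_k1(300), related_k1(200))
```

Because already-found keys are never re-inserted, cycles cannot cause endless looping, and no hard recursion limit is needed.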
Anyway, here goes:
;WITH KeyGroups AS
(
SELECT E.*, G.GroupID
FROM E
LEFT OUTER JOIN
G
ON E.K1 = G.K1
),
Recursive AS
(
SELECT K.K1, K.K2, K.GroupID, 0 AS lvl
FROM KeyGroups AS K
WHERE K.K2 = 300
UNION ALL
SELECT K.K1, K.K2, K.GroupID, lvl + 1
FROM Recursive AS R
INNER JOIN
KeyGroups AS K
ON R.GroupID = K.GroupID
OR
R.K2 = K.K2
OR
R.K1 = K.K1
WHERE lvl < 5
)
SELECT MIN(lvl) AS lvl, K1, K2, GroupID
FROM Recursive
GROUP BY GroupID, K1, K2
ORDER BY lvl, K1, K2, GroupID;
Also see DBFiddle.
I'll give this some more thought tomorrow if I have time, and update here if I find a better solution.
Thanks for the interesting challenge and well formulated post.
HTH

Search for specific values in data using a lookup table

I have a lookup table with values I want to check for in the data.
The problem is somewhat like this:
-- Data with an ID, a group (which is a number) and some letters which belong to that group.
select *
into #data
from (values
(1, 45, 'A'),
(1, 45, 'B'),
(1, 45, 'C'),
(2, 45, 'D'))
as data(id, number, letter)
-- The various letters that I expect for each ID in a specific group
select *
into #expected_letters
from (values
(45, 'A'),
(45, 'D'),
(45, 'E'),
(123, 'A'),
(123, 'Q'))
as expected_letters(number, letter)
The results that I expect from a query are all letters (from all ids from #data) that I expect belonging to that group, but are not there. So these results actually:
(1, 45, D)
(1, 45, E)
(2, 45, A)
(2, 45, E)
In my problem the list is a lot longer with more groups and more id's. I've tried a lot with different joins and set operators, but I can't seem to get my head around this problem.
Some help would be much appreciated.

This is my version, which is very similar but uses an outer apply instead of multiple joins:
select distinct d.id, aa.number,aa.letter from #data d
outer apply (select * from #expected_letters el where el.number=d.number and el.letter not in
(select letter from #data dt where dt.number=d.number and dt.id=d.id)
) aa

Here's what I tried, and it seems to work. The last inner join aliased "nums" is to remove number 123 from your results, since it doesn't exist for any ID in #data.
select e.*, ids.id from #expected_letters e
cross join (select distinct id from #data) ids
full join #data d on e.number = d.number and e.letter = d.letter and d.id = ids.id
inner join (select distinct number from #data) nums on e.number = nums.number
where
d.id is null
--result:
number letter id
45 A 2
45 D 1
45 E 1
45 E 2
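Another way to read the problem is "every (id, number) pair in the data, crossed with the letters expected for that number, minus the letters that are actually there". A minimal sketch of that, assuming Python's sqlite3 as a stand-in for SQL Server and plain tables named data and expected_letters in place of the # temp tables:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE data (id INTEGER, number INTEGER, letter TEXT);
CREATE TABLE expected_letters (number INTEGER, letter TEXT);
INSERT INTO data VALUES (1, 45, 'A'), (1, 45, 'B'), (1, 45, 'C'), (2, 45, 'D');
INSERT INTO expected_letters VALUES
  (45, 'A'), (45, 'D'), (45, 'E'), (123, 'A'), (123, 'Q');
""")

# For each (id, number) present in the data, list expected letters with no matching row.
MISSING_SQL = """
SELECT i.id, e.number, e.letter
FROM (SELECT DISTINCT id, number FROM data) i
JOIN expected_letters e ON e.number = i.number
WHERE NOT EXISTS (SELECT 1 FROM data d
                  WHERE d.id = i.id
                    AND d.number = e.number
                    AND d.letter = e.letter)
ORDER BY i.id, e.letter
"""

missing = conn.execute(MISSING_SQL).fetchall()
print(missing)
```

Joining on number means group 123 never shows up, since no id in the data belongs to it.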

Alternative to NOT IN in SSMS

I have my table in this structure. I am trying to find all the unique IDs none of whose words appear in the list. How can I achieve this in MS SQL Server?
id word
1 hello
2 friends
2 world
3 cat
3 dog
2 country
1 phone
4 eyes
I have a list of words
**List**
phone
eyes
hair
body
Expected Output
Excluding the words from the list, I need all the unique IDs. In this case it is:
2
3
1 and 4 are not in the output because their words appear in the list.
I tried the below code
Select count(distinct ID)
from Table1
where word not in ('phone','eyes','hair','body')
I tried Not Exists also which did not work

You can also use GROUP BY
SELECT id
FROM Table1
GROUP BY id
HAVING MAX(CASE WHEN word IN('phone', 'eyes', 'hair', 'body') THEN 1 ELSE 0 END) = 0

One way to do it is to use not exists, where the inner query is linked to the outer query by id and is filtered by the search words.
First, create and populate sample table (Please save us this step in your future questions):
DECLARE @T AS TABLE (
id int,
word varchar(20)
)
INSERT INTO @T VALUES
(1, 'hello'),
(2, 'friends'),
(2, 'world'),
(3, 'cat'),
(3, 'dog'),
(2, 'country'),
(1, 'phone'),
(4, 'eyes')
The query:
SELECT DISTINCT id
FROM @T t0
WHERE NOT EXISTS
(
SELECT 1
FROM @T t1
WHERE word IN('phone', 'eyes', 'hair', 'body')
AND t0.Id = t1.Id
)
Result:
id
2
3

SELECT t.id FROM dbo.Table1 AS t
WHERE NOT EXISTS (SELECT 1 FROM dbo.Table1 AS t2
INNER JOIN
(VALUES('phone'),('eyes'),('hair'),('body')) AS lw(word)
ON t2.word = lw.word
AND t2.id = t.id)
GROUP BY t.id;

You can try this as well; it keeps the excluded words in a table variable, so the list can change:
DECLARE @T AS TABLE (id int, word varchar(20))
INSERT INTO @T VALUES
(1, 'hello'),
(2, 'friends'),
(2, 'world'),
(3, 'cat'),
(3, 'dog'),
(2, 'country'),
(1, 'phone'),
(4, 'eyes')
DECLARE @tblNotUsed AS TABLE ( id int, word varchar(20))
DECLARE @tblNotUsedIds AS TABLE (id int)
INSERT INTO @tblNotUsed VALUES
(1, 'phone'),
(2, 'eyes'),
(3, 'hair'),
(4, 'body')
INSERT INTO @tblNotUsedIds (id)
SELECT t.id FROM @T AS t INNER JOIN @tblNotUsed AS n ON n.word = t.word
SELECT DISTINCT id FROM @T
WHERE id NOT IN (SELECT id FROM @tblNotUsedIds)

The nice thing about SQL is that there are often many ways to do things. One way is to place your list of known values into a #temp table and then run something like this (both SELECTs must return the same columns for EXCEPT to work):
Select * from dbo.maintable
EXCEPT
Select * from #tempExcludeValues
The results will give you all records that aren't in your predefined list. A second way is to do the join like Larnu has mentioned in the comment above. NOT IN is typically not the fastest way to do things on larger datasets. JOINs are often a more efficient way of filtering data than an IN or NOT IN clause, although this depends on the data and indexes, so compare the execution plans.
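The join approach can be sketched roughly as follows. This is my reading of it, not necessarily what the comment had in mind: Python's sqlite3 stands in for SQL Server, and the word_list table name is made up. The words are left-joined to the list, and only ids with zero matches are kept.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Table1 (id INTEGER, word TEXT);
CREATE TABLE word_list (word TEXT PRIMARY KEY);
INSERT INTO Table1 VALUES (1, 'hello'), (2, 'friends'), (2, 'world'), (3, 'cat'),
  (3, 'dog'), (2, 'country'), (1, 'phone'), (4, 'eyes');
INSERT INTO word_list VALUES ('phone'), ('eyes'), ('hair'), ('body');
""")

# COUNT(l.word) counts only rows that matched the list; zero means the id is clean.
ANTI_JOIN_SQL = """
SELECT t.id
FROM Table1 t
LEFT JOIN word_list l ON l.word = t.word
GROUP BY t.id
HAVING COUNT(l.word) = 0
ORDER BY t.id
"""

clean_ids = [row[0] for row in conn.execute(ANTI_JOIN_SQL)]
print(clean_ids)
```

Filtering with a plain WHERE l.word IS NULL instead of the HAVING clause would bring back id 1, because its 'hello' row has no match even though its 'phone' row does.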

How to Get Sum of One Column Based On Other Table in Sql Server

I have 2 tables in my database (like this):
tblCustomers:
id CustomerName
1 aaa
2 bbb
3 ccc
4 ddd
5 eee
6 fff
tblPurchases:
id CustomerID Price
1 1 300
2 2 100
3 3 500
4 1 150
5 4 50
6 3 250
7 6 700
8 2 30
9 1 310
10 4 25
Now I want a stored procedure that returns a new table giving the sum of Price for each customer, exactly like the result below.
How can I do that?
Procedures Result:
id CustomerName SumPrice
1 aaa 760
2 bbb 130
3 ccc 750
4 ddd 75
5 eee 0
6 fff 700

select c.id, c.customername, sum(isnull(p.price, 0)) as sumprice
from tblcustomers c
left join tblpurchases p
on c.id = p.customerid
group by c.id, c.customername
SQL Fiddle test: http://sqlfiddle.com/#!3/9b573/1/0
Note the need for an outer join because your desired result includes customers with no purchases.
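To see the difference the outer join makes, here is a small check on the question's sample data. It is a sketch that assumes Python's sqlite3 as a stand-in for SQL Server, with COALESCE in place of ISNULL. The LEFT JOIN keeps customer 5 (eee) with a sum of 0, while an INNER JOIN drops that row.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE tblCustomers (id INTEGER PRIMARY KEY, CustomerName TEXT);
CREATE TABLE tblPurchases (id INTEGER PRIMARY KEY, CustomerID INTEGER, Price INTEGER);
INSERT INTO tblCustomers VALUES (1, 'aaa'), (2, 'bbb'), (3, 'ccc'),
  (4, 'ddd'), (5, 'eee'), (6, 'fff');
INSERT INTO tblPurchases VALUES (1, 1, 300), (2, 2, 100), (3, 3, 500), (4, 1, 150),
  (5, 4, 50), (6, 3, 250), (7, 6, 700), (8, 2, 30), (9, 1, 310), (10, 4, 25);
""")

# Outer join: customers without purchases survive, and their NULL sum becomes 0.
LEFT_SQL = """
SELECT c.id, c.CustomerName, COALESCE(SUM(p.Price), 0) AS SumPrice
FROM tblCustomers c
LEFT JOIN tblPurchases p ON p.CustomerID = c.id
GROUP BY c.id, c.CustomerName
ORDER BY c.id
"""

# Inner join, for comparison: customers without purchases disappear.
INNER_SQL = LEFT_SQL.replace("LEFT JOIN", "INNER JOIN")

left_rows = conn.execute(LEFT_SQL).fetchall()
inner_ids = [row[0] for row in conn.execute(INNER_SQL)]
print(left_rows)
print(inner_ids)
```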

You can use the query below to get the result. The left join keeps customers who have no purchases:
select id,CustomerName,sum(isnull(price, 0)) as TotalPrice
from
(
select tc.id,tc.CustomerName,tp.price
from tblCustomers tc
left join
tblPurchases tp on tc.id = tp.CustomerID
) tab
group by id,CustomerName

Although the other answers here do work, they don't appear to be what I would consider standard practice, or optimal.
The simplest solution (standard, but not always optimal) requires no sub-query of any variety.
SELECT
cust.id,
cust.CustomerName,
ISNULL(SUM(prch.price), 0) AS SumPrice
FROM
tblCustomers AS cust
LEFT JOIN
tblPurchases AS prch
ON cust.id = prch.CustomerID
GROUP BY
cust.id,
cust.CustomerName
The only reason that this is not necessarily optimal is that it involves grouping by two fields, one of which is a string. This involves creating 'counters' in memory that are identified by the composite of an id and a string, which can be inefficient because you only really need the id to uniquely identify the counter. (The identifier is then a single small item (probably only 4 bytes), rather than multiple items, one of which is long (potentially many bytes).)
This means that you can do the following as a possible optimisation. Depending on your data this may be a premature optimisation, but it is unlikely to hurt performance and is good to know about...
SELECT
cust.id,
cust.CustomerName,
ISNULL(prch.SumPrice, 0) AS SumPrice
FROM
tblCustomers AS cust
LEFT JOIN
(
SELECT
CustomerID,
SUM(price) AS SumPrice
FROM
tblPurchases
GROUP BY
CustomerID
) AS prch
ON cust.id = prch.CustomerID
This makes the in-memory aggregation as simple as possible, and so as quick as possible.
In both cases you should get the best possible efficiency by ensuring that you have indexes on tblCustomers(id) and on tblPurchases(CustomerID).

DECLARE @tblcustomers table (id int, customername varchar(10));
insert into @tblcustomers values (1, 'aaa');
insert into @tblcustomers values (2, 'bbb');
insert into @tblcustomers values (3, 'ccc');
insert into @tblcustomers values (4, 'ddd');
insert into @tblcustomers values (5, 'eee');
insert into @tblcustomers values (6, 'fff');
DECLARE @tblpurchases table (id int, customerid int, price int);
insert into @tblpurchases values (1, 1, 300);
insert into @tblpurchases values (2, 2, 100);
insert into @tblpurchases values (3, 3, 500);
insert into @tblpurchases values (4, 1, 150);
insert into @tblpurchases values (5, 4, 50);
insert into @tblpurchases values (6, 3, 250);
insert into @tblpurchases values (7, 6, 700);
insert into @tblpurchases values (8, 2, 30);
insert into @tblpurchases values (9, 1, 310);
insert into @tblpurchases values (10, 4, 25);
WITH CTE AS(
select c.id,c.customername from @tblcustomers c
)
Select c.id,c.customername,(Select ISNULL(SUM(P.price),0) from @tblpurchases P
WHERE P.customerid = C.id) AS Price from CTE c
