Search for specific values in data using a lookup table - sql

I have a lookup table with values I want to check for in the data.
The problem is somewhat like this:
-- Data with an ID, a group (which is a number) and some letters which belong to that group.
select *
into #data
from (values
(1, 45, 'A'),
(1, 45, 'B'),
(1, 45, 'C'),
(2, 45, 'D'))
as data(id, number, letter)
-- The various letters that I expect for each ID in a specific group
select *
into #expected_letters
from (values
(45, 'A'),
(45, 'D'),
(45, 'E'),
(123, 'A'),
(123, 'Q'))
as expected_letters(number, letter)
For every id in #data, the query should return each letter that is expected for that id's group but is missing from the data. For the sample above, that means these rows:
(1, 45, D)
(1, 45, E)
(2, 45, A)
(2, 45, E)
In my real problem the list is much longer, with more groups and more IDs. I've tried various joins and set operators, but I can't get my head around this problem.
Some help would be much appreciated.

This is my version, which is very similar but uses an outer apply instead of multiple joins:
select distinct d.id, aa.number,aa.letter from #data d
outer apply (select * from #expected_letters el where el.number=d.number and el.letter not in
(select letter from #data dt where dt.number=d.number and dt.id=d.id)
) aa

Here's what I tried, and it seems to work. The last inner join aliased "nums" is to remove number 123 from your results, since it doesn't exist for any ID in #data.
select e.*, ids.id from #expected_letters e
cross join (select distinct id from #data) ids
full join #data d on e.number = d.number and e.letter = d.letter and d.id = ids.id
inner join (select distinct number from #data) nums on e.number = nums.number
where
d.id is null
--result:
number letter id
45 A 2
45 D 1
45 E 1
45 E 2
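The same idea can be sketched with a plain join plus NOT EXISTS: pair every (id, number) that occurs in the data with the letters expected for that number, then keep only the pairs that have no matching row. This is a minimal sketch, not the posters' code. It runs the SQL through Python's sqlite3 module purely so it can be executed anywhere, and the table names data and expected_letters stand in for the #data and #expected_letters temp tables.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE data (id INTEGER, number INTEGER, letter TEXT);
    INSERT INTO data VALUES (1, 45, 'A'), (1, 45, 'B'), (1, 45, 'C'), (2, 45, 'D');

    CREATE TABLE expected_letters (number INTEGER, letter TEXT);
    INSERT INTO expected_letters VALUES
        (45, 'A'), (45, 'D'), (45, 'E'), (123, 'A'), (123, 'Q');
""")

# For each (id, number) present in the data, list every expected letter
# of that number that has no matching row for the same id.
missing = conn.execute("""
    SELECT g.id, e.number, e.letter
    FROM (SELECT DISTINCT id, number FROM data) AS g
    JOIN expected_letters AS e ON e.number = g.number
    WHERE NOT EXISTS (
        SELECT 1
        FROM data AS d
        WHERE d.id = g.id
          AND d.number = e.number
          AND d.letter = e.letter
    )
    ORDER BY g.id, e.letter
""").fetchall()

print(missing)
```

Because the join starts from the (id, number) pairs that actually occur in the data, group 123 drops out without a separate filter.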

Related

Programmatically assign NULL to certain columns for certain rows when unioning datasets

I'm trying to figure out a way to programmatically assign NULL to certain columns for certain rows when unioning 2 datasets together. This is most easily explained using an example. The rows in #stage2 need to display NULL in columns cost_center3, cost_center14 in the final dataset. The code below works but it is a manual approach and not dynamic if more cost_center columns need to be added.
select *
into #stage1
from
(
values
(42, 170, 44, 827),
(43, 170, 68, 880),
(44, 190, 31, 745)
) d (work_center, plant, cost_center3, cost_center14);
select *
into #stage2
from
(
values
(10, 200),
(11, 200),
(12, 200)
) d (work_center, plant);
--manual approach - need to find a programmatic way to do this
select * from #stage1
union
select *, NULL, NULL from #stage2;
In the actual business use case, there are several more cost_center columns than are shown in this example - thus the need to find a way to programmatically do this task.
I have experimented with CROSS APPLY like this
select s1.*, s2.*
from #stage1 s1
cross apply #stage2 s2;
but it is essentially cross joining the datasets and that is not the desired outcome.
Can this task be done programmatically and concisely?
Here's what I ended up using, even though the number of NULLs is not dynamic:
select * from #stage1
union
select * from #stage2 cross join (values (null, null)) d (cost_center3, cost_center14);
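One way to make the NULL padding programmatic is to read the column lists from the catalog and generate the UNION statement. The sketch below is only an illustration of that idea in SQLite, driven from Python's sqlite3 module. The tables stage1 and stage2 stand in for #stage1 and #stage2, the helper column_names is hypothetical, and it assumes every column of the narrower table also exists in the wider one. In SQL Server the same approach would need the column metadata from the system catalog plus dynamic SQL, which is not shown here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE stage1 (work_center INTEGER, plant INTEGER,
                         cost_center3 INTEGER, cost_center14 INTEGER);
    INSERT INTO stage1 VALUES (42, 170, 44, 827), (43, 170, 68, 880), (44, 190, 31, 745);

    CREATE TABLE stage2 (work_center INTEGER, plant INTEGER);
    INSERT INTO stage2 VALUES (10, 200), (11, 200), (12, 200);
""")

def column_names(table):
    """Return the column names of a table (hypothetical helper, SQLite only)."""
    # PRAGMA table_info yields one row per column; index 1 is the column name.
    return [row[1] for row in conn.execute(f"PRAGMA table_info({table})")]

wide_cols = column_names("stage1")
narrow_cols = set(column_names("stage2"))

# Keep columns that stage2 has; pad every other stage1 column with NULL.
select_list = ", ".join(
    col if col in narrow_cols else f"NULL AS {col}" for col in wide_cols
)

sql = (
    f"SELECT {', '.join(wide_cols)} FROM stage1 "
    f"UNION SELECT {select_list} FROM stage2 "
    "ORDER BY work_center"
)
rows = conn.execute(sql).fetchall()
print(select_list)
print(rows)
```

Adding another cost_center column to stage1 changes the generated select list without any edit to the query text.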

JOIN exclude records in second table

In my system one order can have any number of associated documents, and each document has a document code.
One example of my schema with data:
Order  Document  Code Doc
1      101       5E
1      102       5E
1      103       1DE
2      201       5E
The orders are stored in the table PDOCAS and the documents in the table DOCCAB.
When I join the orders with the documents, I want an order to be left out entirely if any of its documents has the code 1DE.
select p.DocCabIdDeb as 'Order', d.DocCabId as 'Document', d.DocCod as 'Code Doc'
from PDOCAS p
JOIN DOCCAB d on p.DocCabIdHab=d.DocCabId
WHERE NOT EXISTS(select * from DOCCAB ds where ds.DocCabId=d.DocCabId and doccod='1DE')
and p.DocCabIdDeb in (1, 2)
In this case it still returns order 1 with its 5E documents, and I don't want that, because one of the documents of order 1 is 1DE.
Should I join the tables in another way?
Thanks.
I made a few assumptions about the data.
You need a subquery or CTE that identifies the orders containing a blacklisted document code.
So first select the distinct orders that have a document with code '1DE',
and then filter those orders out of the original result with a NOT IN condition on the order id.
Below is an example query using CTEs:
WITH
-- Setting up the data
PDOCAS AS (
SELECT * FROM (
VALUES
(1, 101),
(1, 102),
(1, 103),
(2, 201)
) t(DocCabIdDeb, DocCabIdHab)
),
DOCCAB AS (
SELECT * FROM (
VALUES
(101, '5E'),
(102, '5E'),
(103, '1DE'),
(201, '5E')
) t(DocCabId, DocCod)
),
-- Data setup ends
-- Your actual query starts from here
ORDERS AS (
select
p.DocCabIdDeb as "OrderId",
d.DocCabId as "Document",
d.DocCod as "CodeDoc"
from
PDOCAS p
JOIN DOCCAB d on p.DocCabIdHab=d.DocCabId
WHERE
p.DocCabIdDeb in (1, 2)
),
ORDERS_WITH_BLACKLISTED_DOCCOD AS (
SELECT DISTINCT
o.OrderId
FROM
ORDERS o
WHERE
o.CodeDoc in ('1DE')
),
FINAL_ORDERS AS (
SELECT
*
FROM
ORDERS o
WHERE
o.OrderId NOT IN (SELECT * FROM ORDERS_WITH_BLACKLISTED_DOCCOD)
)
SELECT * FROM FINAL_ORDERS
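The original query fails because its NOT EXISTS is correlated on the document, so it only drops the 1DE row itself. A shorter variant of the CTE answer correlates the subquery on the order instead. The following is a minimal sketch of that variant, run in SQLite through Python's sqlite3 module only to make it executable; the table and column names follow the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE PDOCAS (DocCabIdDeb INTEGER, DocCabIdHab INTEGER);
    INSERT INTO PDOCAS VALUES (1, 101), (1, 102), (1, 103), (2, 201);

    CREATE TABLE DOCCAB (DocCabId INTEGER, DocCod TEXT);
    INSERT INTO DOCCAB VALUES (101, '5E'), (102, '5E'), (103, '1DE'), (201, '5E');
""")

# The subquery is tied to the ORDER (DocCabIdDeb), not to the current
# document, so a single 1DE document removes every row of that order.
orders = conn.execute("""
    SELECT p.DocCabIdDeb AS OrderId, d.DocCabId AS Document, d.DocCod AS CodeDoc
    FROM PDOCAS p
    JOIN DOCCAB d ON p.DocCabIdHab = d.DocCabId
    WHERE p.DocCabIdDeb IN (1, 2)
      AND NOT EXISTS (
          SELECT 1
          FROM PDOCAS p2
          JOIN DOCCAB d2 ON p2.DocCabIdHab = d2.DocCabId
          WHERE p2.DocCabIdDeb = p.DocCabIdDeb
            AND d2.DocCod = '1DE'
      )
    ORDER BY OrderId, Document
""").fetchall()

print(orders)
```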

Find Max value and assign the value by group by id for non numeric field

I am trying to assign the max value by grouping id field. If the id has L and M the result should be M, if the id field has L, M and H the result should be H. If it has only one value, the same value should return (L for L, M for M and H for H).
This is the code I tried:
select x2.id, x2.code, x1.output
from
(
select id, max(code) as output
from
table
group by id
)x1,
(
select id,code
from
table
)x2
where x1.id = x2.id
order by x2.id
The result set is not as expected. Where am I going wrong?
Here is one option using a window min():
select t.*,
case min(case code
when 'H' then 1
when 'M' then 2
when 'L' then 3
end) over(partition by id)
when 1 then 'H'
when 2 then 'M'
when 3 then 'L'
end res
from mytable t
Something like this. The first CTE defines a hierarchy table.
Data
drop table if exists #tTable;
go
create table #tTable(
id int not null,
code varchar(10) not null);
Insert into #tTable values
(1, 'h'),
(2, 'l'),
(3, 'm'),
(10001, 'l'),
(10001, 'l'),
(10001, 'm'),
(10001, 'l'),
(10002, 'l'),
(10002, 'l'),
(10002, 'h'),
(10002, 'l'),
(10002, 'h'),
(10002, 'm'),
(10002, 'm');
Query
with
v_cte(h, code) as (
select *
from (values (1, 'H'),
(2, 'M'),
(3, 'L')) v(h, code)),
max_cte as (
select t.id, t.code, v.h, min(v.h) over (partition by t.id) min_h
from #tTable t
join v_cte v on t.code=v.code)
select mc.id, mc.code, v.code [output]
from max_cte mc
join v_cte v on mc.min_h=v.h;
Output
id code output
1 h H
2 l L
3 m M
10001 l M
10001 l M
10001 m M
10001 l M
10002 l H
10002 l H
10002 h H
10002 l H
10002 h H
10002 m H
10002 m H
It appears the code represents High, Medium, Low?
To answer your question directly, I believe where you're going wrong is you expect MAX(code) to return H. But SQL doesn't know anything about the meanings of those codes; it's returning the code with the highest character/lexicographic value, which in ASCII/Unicode would be 'M'.
So you need a lookup table (as suggested by @SteveC) or a lookup expression (as suggested by @GMB) to map the codes to numeric values (in the proper order), and use that numeric value when you call MAX. I'd personally go with the table, as (in my experience) you'll probably end up with multiple queries that depend on this kind of ordering. A lookup table is a very common mechanism for encoding priority levels, status, that sort of thing.
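A grouped version of the lookup-table approach might look like the sketch below. It is an illustration under assumptions, not any poster's code: the table names readings and code_priority and the priority values are made up, and the SQL is run in SQLite through Python's sqlite3 module only so that it is executable. It also shows the lexicographic MAX(code) result for comparison.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE readings (id INTEGER, code TEXT);
    INSERT INTO readings VALUES
        (1, 'H'), (2, 'L'), (3, 'M'),
        (10001, 'L'), (10001, 'M'),
        (10002, 'L'), (10002, 'H'), (10002, 'M');

    -- Hypothetical lookup table: a higher priority means a "larger" code.
    CREATE TABLE code_priority (code TEXT PRIMARY KEY, priority INTEGER);
    INSERT INTO code_priority VALUES ('L', 1), ('M', 2), ('H', 3);
""")

# Plain MAX compares the letters themselves, so 'M' beats 'H'.
naive = conn.execute(
    "SELECT MAX(code) FROM readings WHERE id = 10002"
).fetchone()[0]

# Take MAX over the numeric priority, then map it back to its code.
ranked = conn.execute("""
    SELECT r.id, best.code AS output
    FROM (
        SELECT t.id, MAX(cp.priority) AS max_priority
        FROM readings t
        JOIN code_priority cp ON cp.code = t.code
        GROUP BY t.id
    ) AS r
    JOIN code_priority best ON best.priority = r.max_priority
    ORDER BY r.id
""").fetchall()

print(naive)
print(ranked)
```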

Need solution to avoid repeated scanning in huge table

I have an event table with 40 columns that holds up to 2 billion records. I would like to query that table for combinations of events, e.g. Event A together with Event B. Sometimes I want larger combinations, such as Event A with B and C. It may go up to 5 or 6 events per combination.
I don't want to scan the table once for every event in the combination (one scan for event A, another scan for event B), and I need a generic approach that also works for larger combinations.
Note: the 2 billion records are partitioned by event date, and the data is split evenly across the partitions.
E.g.:
I need to find the ids that have events A, B and C, and the ids that have only A and B.
The number of events in a combination is dynamic. I don't want to scan the table for each event and then intersect the results.
There may be some mileage in using a sql server equivalent of the mysql group_concat function.
For example
drop table t
create table t (id int, dt date, event varchar(1))
insert into t values
(1,'2017-01-01','a'),(1,'2017-01-01','b'),(1,'2017-01-01','c'),(1,'2017-01-02','c'),(1,'2017-01-03','d'),
(2,'2017-02-01','a'),(2,'2017-02-01','b')
select id,
stuff(
(
select cast(',' as varchar(max)) + t1.event
from t as t1
WHERE t1.id = t.id
order by t1.event
for xml path('')
), 1, 1, '') AS groupconcat
from t
group by t.id
Results in
id groupconcat
----------- -----------
1 a,b,c,c,d
2 a,b
If you then add a patindex
select * from
(
select id,
stuff(
(
select cast(',' as varchar(max)) + t1.event
from t as t1
WHERE t1.id = t.id
order by t1.event
for xml path('')
), 1, 1, '') AS groupconcat
from t
group by t.id
) s
where patindex('a,b,c%',groupconcat) > 0
you get this
id groupconcat
----------- ------------
1 a,b,c,c,d
SELECT * from table as A
JOIN table AS B
ON A.Id = B.Id AND A.Date = B.Date
WHERE A.Date = '1-Jan'
AND A.Event = 'A'
AND B.Event = 'B'
This gives you the rows where Date is '1-Jan' and both events share the same Id.
You can join the table again for each additional event you want to filter by.
The having clause allows you to filter using the result of an aggregate function. I've used a regular count but you may need a distinct count, depending on your table design.
Example:
-- Returns ids with 3 or more events.
SELECT
x.Id,
COUNT(*) AS EventCount
FROM
(
VALUES
(1, '2017-01-01', 'A'),
(1, '2017-01-01', 'B'),
(1, '2017-01-03', 'C'),
(1, '2017-01-04', 'C'),
(1, '2017-01-05', 'E'),
(2, '2017-01-01', 'A'),
(2, '2017-01-01', 'B'),
(3, '2017-01-01', 'A')
) AS x(Id, [Date], [Event])
GROUP BY
x.Id
HAVING
COUNT(*) > 2
;
Returns
Id EventCount
1 5
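Building on the HAVING idea, a distinct count over a CASE expression can test for a specific combination of events in a single grouped pass, and the list of events can be passed in as parameters. The sketch below is an assumption-laden illustration: the table events, the helper ids_with_events and its exact flag are hypothetical, the wanted events are assumed to be distinct, and the SQL runs in SQLite through Python's sqlite3 module. It says nothing about how the plan would behave on a partitioned 2-billion-row table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE events (id INTEGER, dt TEXT, event TEXT);
    INSERT INTO events VALUES
        (1, '2017-01-01', 'A'), (1, '2017-01-01', 'B'), (1, '2017-01-03', 'C'),
        (1, '2017-01-04', 'C'), (1, '2017-01-05', 'E'),
        (2, '2017-01-01', 'A'), (2, '2017-01-01', 'B'),
        (3, '2017-01-01', 'A');
""")

def ids_with_events(wanted, exact=False):
    """Return ids that have every event in `wanted` (hypothetical helper).

    With exact=True the id must have no other kind of event at all.
    """
    marks = ", ".join("?" for _ in wanted)
    # Count how many of the wanted events occur at least once per id.
    sql = f"""
        SELECT id
        FROM events
        GROUP BY id
        HAVING COUNT(DISTINCT CASE WHEN event IN ({marks}) THEN event END) = ?
    """
    params = list(wanted) + [len(wanted)]
    if exact:
        # No extra event types: total distinct events equals the wanted count.
        sql += " AND COUNT(DISTINCT event) = ?"
        params.append(len(wanted))
    sql += " ORDER BY id"
    return [row[0] for row in conn.execute(sql, params)]

has_abc = ids_with_events(["A", "B", "C"])
has_ab = ids_with_events(["A", "B"])
only_ab = ids_with_events(["A", "B"], exact=True)
print(has_abc, has_ab, only_ab)
```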

Grouping records by subsets SQL

I have a database with PermitHolders (PermitNum = PK) and DetailedFacilities of each Permit Holder. In the tblPermitDetails table there are 2 columns
PermitNum (foreign Key)
FacilityID (integer Foreign Key Lookup to Facility table).
A permitee can have 1 - 29 items on their permit, e.i. Permit 50 can have a Boat Dock (FacID 4), a Paved walkway (FacID 17) a Retaining Wall (FacID 20) etc. I need an SQL filter/display whatever, ALL PERMIT #s that have ONLY FacIDs 19, 20, or 28, NOT ones that have those plus "x" others,....just that subset. I've worked on this for 4 days, would someone PLEASE help me? I HAVE posted to other BB but have not received any helpful suggestions.
As Oded suggested, here are more details.
There is no PK for the tblPermitDetails table.
Let's say that we have Permittees 1 - 10; Permit 1 is John Doe, he has a Boat Dock (FacID 1), a Walkway (FacID 4), a buoy (FacID 7), and Underbrushing (FacID 19)...those are 4 records for Permit 1. Permit 2 is Sus Brown, she has ONLY underbrushing (FacID 19). Permit 3 is Steve Toni, he has a Boat Dock (FacID 1), a Walkway (FacID 4), a buoy (FacID 7), and a Retaining Wall (FacID 20). Permit 4 is Jill Jack, she has Underbrushing (FacID 19), and a Retaining Wall (FacID 20). I could go on but I hope you follow me. I want an SQL (for MS Access) that will show me ONLY Permits 2 & 4 because they have a combination of FacIDs 19 & 20 [either both, or one or the other], BUT NOT ANYTHING ELSE such as Permit 1 who has #19, but also has 4 & 7.
I hope that helps, please say so if not.
Oh yea, I DO know the difference between i.e. and e.g. since I'm in my 40's have written over 3000 pages of archaeological field reports and an MA thesis, but I'm really stressed out here from struggling with this SQL and could care less about consulting the Chicago Manual of Style before banging out a plea for help. SO, DON'T be coy about my composition errors! Thank you!
Untested, but how about something like this?
SELECT DISTINCT p.PermitNum
FROM tblPermitDetails p
WHERE EXISTS
(SELECT '+'
FROM tblPermitDetails d
WHERE d.PermitNum = p.PermitNum
AND d.FacilityID = 19 )
AND EXISTS
(SELECT '+'
FROM tblPermitDetails d
WHERE d.PermitNum = p.PermitNum
AND d.FacilityID = 20 )
AND EXISTS
(SELECT '+'
FROM tblPermitDetails d
WHERE d.PermitNum = p.PermitNum
AND d.FacilityID = 28 )
AND NOT EXISTS
(SELECT '+'
FROM tblPermitDetails d
WHERE d.PermitNum = p.PermitNum
AND d.FacilityID NOT IN (19,20,28) )
SELECT PermitNum
FROM tblPermitDetails
WHERE FacilityID IN (19, 20, 28)
GROUP BY PermitNum
HAVING COUNT(PermitNum)=3
I wasn't sure if you wanted ALL of 19,20,28 or ANY of 19,20,28... also, this is untested, but if you want the any of solution it should be fairly close
Select distinct
allowed.PermitNum
from
DetailedFacilities allowed
left join DetailedFacilities disallowed
on allowed.PermitNum = disallowed.PermitNum
and disallowed.FacilityID not in (19, 20, 28)
where
allowed.FacilityID in (19, 20, 28)
and disallowed.PermitNum is null
SELECT DISTINCT PermitNum FROM tblPermitDetails t1
WHERE FacilityID IN (19, 20, 28)
AND NOT EXISTS (SELECT 1 FROM tblPermitDetails t2
WHERE t2.PermitNum = t1.PermitNum
AND FacilityId NOT IN (19, 20, 28));
Or, in prose, get the list of PermitNums that have any of the requested facility IDs, as long as no row exists for that PermitNum with a facility ID outside the requested list.
A more optimized version of the same query would be the following:
SELECT PermitNum FROM (SELECT DISTINCT PermitNum FROM tblPermitDetails
WHERE FacilityID IN (19, 20, 28)) AS t1
WHERE NOT EXISTS (SELECT 1 FROM tblPermitDetails t2
WHERE t2.PermitNum = t1.PermitNum
AND FacilityID NOT IN (19, 20, 28));
It's a little harder to read, but it will involve fewer "NOT EXISTS" subqueries by doing the "DISTINCT" part first.
Update:
David-W-Fenton mentions that NOT EXISTS should be avoided for optimization reasons. For a small table, this probably won't matter much, but you could also do the query using COUNT(*) if you needed to avoid NOT EXISTS:
SELECT DISTINCT PermitNum FROM tblPermitDetails t1
WHERE (SELECT COUNT(*) FROM tblPermitDetails t2
WHERE t1.PermitNum = t2.PermitNum
AND FacilityID IN (19, 20, 28))
=
(SELECT COUNT(*) FROM tblPermitDetails t3
WHERE t1.PermitNum = t3.PermitNum)
What about (untested)
select distinct t1.permitnum
from tblPermitDetails t1
left outer join
(Select distinct permitnum from tblPermitDetails where facilityId not in (19, 20, 28)) t2
on t1.permitnum=t2.permitnum
where t2.permitnum is null
i.e. we find all the permits that cannot match your criteria (they have at least one detail outside those you list), then we find all the permits that are left, via a left join and where criteria.
with indexes set up properly, this should be pretty quick.
Quick way might be to only look at the ones with exactly three matches (with an inner query), and then among those only include the ones that have 19, 20, and 28.
Of course, that is sort of a brute force method, and not very elegant. But it has the small benefit of being understandable. None of the approaches I can think of will be easy to customize to various other sets of values.
OK, it seems I didn't understand the problem at first. So, again:
I will recreate the example by Stacy here:
DECLARE @PermitHolders TABLE
(PermitNum INT NOT NULL,
PermitHolder VARCHAR(20))
DECLARE @tblPermitDetails TABLE
(PermitNum INT,
FacilityID INT)
INSERT INTO @PermitHolders VALUES (1, 'John Doe')
INSERT INTO @PermitHolders VALUES (2, 'Sus Brown')
INSERT INTO @PermitHolders VALUES (3, 'Steve Toni')
INSERT INTO @PermitHolders VALUES (4, 'Jill Jack')
INSERT INTO @tblPermitDetails VALUES (1, 1)
INSERT INTO @tblPermitDetails VALUES (1, 4)
INSERT INTO @tblPermitDetails VALUES (1, 7)
INSERT INTO @tblPermitDetails VALUES (1, 19)
INSERT INTO @tblPermitDetails VALUES (2, 19)
INSERT INTO @tblPermitDetails VALUES (3, 1)
INSERT INTO @tblPermitDetails VALUES (3, 4)
INSERT INTO @tblPermitDetails VALUES (3, 7)
INSERT INTO @tblPermitDetails VALUES (3, 20)
INSERT INTO @tblPermitDetails VALUES (4, 19)
INSERT INTO @tblPermitDetails VALUES (4, 20)
And this is the solution:
SELECT * FROM @PermitHolders
WHERE (PermitNum IN (SELECT PermitNum FROM @tblPermitDetails WHERE FacilityID IN (19, 20, 28)))
AND (PermitNum NOT IN (SELECT PermitNum FROM @tblPermitDetails WHERE FacilityID NOT IN (19, 20, 28)))
I have one observation on the side:
You didn't mention any PK for tblPermitDetails. If none exists, this may not be good for performance. I recommend that you create a PK using both PermitNum and FacilityID (composite key) because this will serve as both your PK and a useful index for the expected queries.
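The "only these facilities and nothing else" rule can also be written as a single GROUP BY with a conditional sum, which avoids the subqueries altogether. The sketch below is a minimal illustration using the sample data from this answer, run in SQLite through Python's sqlite3 module only to make it executable. For MS Access the CASE expression would presumably be written with IIf instead, which is an untested assumption here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE tblPermitDetails (PermitNum INTEGER, FacilityID INTEGER);
    INSERT INTO tblPermitDetails VALUES
        (1, 1), (1, 4), (1, 7), (1, 19),
        (2, 19),
        (3, 1), (3, 4), (3, 7), (3, 20),
        (4, 19), (4, 20);
""")

# Count, per permit, the rows whose facility is OUTSIDE the allowed set.
# A permit qualifies only when that count is zero.
permits = [
    row[0]
    for row in conn.execute("""
        SELECT PermitNum
        FROM tblPermitDetails
        GROUP BY PermitNum
        HAVING SUM(CASE WHEN FacilityID IN (19, 20, 28) THEN 0 ELSE 1 END) = 0
        ORDER BY PermitNum
    """)
]

print(permits)
```

Every permit in the detail table has at least one row, so a zero count of outside facilities already implies that the permit has at least one of 19, 20 or 28.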