Optimized SQL select for selecting across multiple tables - sql

I have the following tables (testing with SQLite).
create table group_header (id int, minCount int, maxCount int);
create table group_items(id int, group_id int, product varchar(10), cons varchar(10));
The group_header has the following records:
id|minCount|maxCount
1|2|2
The group_items has the following records:
id|group_id|product|cons
1|1|A|optional
2|1|B|optional
3|1|C|required
4|1|D|optional
The SQL query should return group_header id that satisfies the following conditions:
Input will be one or more products (e.g. 'A' and 'C')
SQL query should check for the following criteria:
the minCount should be fulfilled for the optional products (i.e. the count of optional items in the input should be >= the minCount value)
all the required products in the group_item table should exist in the input product list
A brute-force way to do this is:
select * from group_items where group_id = 1 and (product in ('A') or cons = 'required');
and then to evaluate the minCount and required conditions separately outside SQL. Is there a more optimized way to do this in SQL?

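If I understand correctly, this is aggregation with a having clause: no required product may be missing from the input, and the number of optional products present in the input must reach minCount. With the input 'A' and 'C':
select gh.id
from group_header gh left join
group_items gi
on gh.id = gi.group_id
group by gh.id, gh.minCount
having sum(case when gi.cons = 'required' and gi.product not in ('A', 'C') then 1 else 0 end) = 0 and
sum(case when gi.cons = 'optional' and gi.product in ('A', 'C') then 1 else 0 end) >= gh.minCount;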
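To try the two conditions against the sample data, here is a minimal sketch using Python's built-in sqlite3 module (the question mentions testing with SQLite). The helper name `matching_groups` and the way the product list is bound as parameters are my own assumptions, not part of the question. It counts required products that are missing from the input and optional products that are present in it.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
create table group_header (id int, minCount int, maxCount int);
create table group_items (id int, group_id int, product varchar(10), cons varchar(10));
insert into group_header values (1, 2, 2);
insert into group_items values
  (1, 1, 'A', 'optional'), (2, 1, 'B', 'optional'),
  (3, 1, 'C', 'required'), (4, 1, 'D', 'optional');
""")

def matching_groups(products):
    # Hypothetical helper: one "?" placeholder per input product.
    marks = ",".join("?" for _ in products)
    sql = f"""
        select gh.id
        from group_header gh
        left join group_items gi on gh.id = gi.group_id
        group by gh.id, gh.minCount
        having sum(case when gi.cons = 'required'
                         and gi.product not in ({marks}) then 1 else 0 end) = 0
           and sum(case when gi.cons = 'optional'
                         and gi.product in ({marks}) then 1 else 0 end) >= gh.minCount
    """
    # The product list appears twice in the statement, so bind it twice.
    return [row[0] for row in conn.execute(sql, list(products) * 2)]

print(matching_groups(["A", "C"]))       # only one optional product supplied
print(matching_groups(["A", "B", "C"]))  # two optional products plus required C
print(matching_groups(["A", "B"]))       # required product C is missing
```

Only the second call returns group 1, because it is the only input that contains the required product and at least minCount optional products.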

Related

SQL query is saying too many columns in sub query

I have two tables: a candidates table with the candidate ID as its primary key, and an educations table linking each candidate ID to the schools they attended.
I want to filter for schools that 50 or more candidates attended. I also want the candidate names.
select candidates.first_name, candidates.last_name
from candidates
where candidates.id IN (select e.candidate_id, e.school_name, count(e.school_name)
from educations e
group by e.candidate_id, e.school_name
having count(e.school_name) >= 50)
I'm getting an error that says:
Subquery has too many columns
When you use a subquery inside an IN condition, the subquery can only return a single column.
As Stu already said in the comment, an EXISTS is often faster than an IN clause.
With IN, the subselect can only return as many columns as are listed before the IN.
This example query is for MySQL, but it should work on any database system, and of course it is simplified.
CREATE TABLE candidates (id int, first_name varchar(10), last_name varchar(10));
INSERT INTO candidates VALUES (1,'a','an'),(2,'b','bn');
CREATE TABLE educations (id int, candidate_id int, school_name varchar(10));
INSERT INTO educations VALUES (1,1,'school A'),(2,1,'school B'),(3,1,'school C'),(4,1,'school D')
,(5,1,'school E'),(6,2,'school A'),(7,2,'school B'),(9,2,'school C');
select candidates.first_name, candidates.last_name
from candidates
where EXISTS (select 1
from educations e
WHERE e.candidate_id = candidates.id
having count(e.school_name) >= 5)
first_name|last_name
a|an
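Note that the EXISTS example counts schools per candidate. If the goal is the one stated in the question (schools attended by at least N candidates, plus the candidate names), a sketch along these lines may be closer. It uses Python's sqlite3 module only so that it is runnable; the third candidate, the sample rows, and the threshold of 2 (standing in for 50) are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
create table candidates (id int, first_name varchar(10), last_name varchar(10));
create table educations (id int, candidate_id int, school_name varchar(10));
insert into candidates values (1, 'a', 'an'), (2, 'b', 'bn'), (3, 'c', 'cn');
insert into educations values
  (1, 1, 'school A'), (2, 2, 'school A'),
  (3, 3, 'school B'), (4, 1, 'school B'),
  (5, 3, 'school C');
""")

MIN_CANDIDATES = 2  # illustrative stand-in for the 50 in the question

rows = conn.execute("""
    select c.first_name, c.last_name, e.school_name
    from candidates c
    join educations e on e.candidate_id = c.id
    where e.school_name in (
        select school_name            -- exactly one column, as IN requires
        from educations
        group by school_name
        having count(distinct candidate_id) >= ?
    )
    order by e.school_name, c.last_name
""", (MIN_CANDIDATES,)).fetchall()

for row in rows:
    print(row)
```

The subquery groups by school only and returns just school_name, which is what avoids the "too many columns" error.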

How to not display an item in select query?

I feel a little stupid asking this because I feel like this is very easy, but for some reason I'm not able to update a query to not select a specific item based on two criteria.
Let's say I have data like this:
ID Name Variant Count1
110 Bob Type1 0
110 Bob Type2 1
120 John Type1 1
So as you can see we have two BOB rows with same ID but different variant (type1 and type2). I want to be able to only see one of the Bob's.
Desired result:
110 Bob Type2
120 John Type1
So what I've been doing is something like
Select ID, Name, Variant, sum(count1) from tbl1
where (id not in (110) and Variant <> 'type1')
Group by Id,name,variant
Please don't use COUNT as a criterion, because in my example it just so happens that Count=0 for the row that I don't want to see. It can vary.
I have many rows where I can have multiple instances of the same id with a variety of different VARIANTS. I'm looking to exclude certain instances of ID based on Variant value.
UPDATE:
It has nothing to do with the latest variant; it has to do with a specific variant. So I'm basically looking for a clause that uses the ID and VARIANT together to remove that particular row.
Aggregating (grouping) the data like you're doing is one way to do it, although the where condition is a little overkill. If all you want to do is see the unique combinations of ID and Name, then another approach is just to use the DISTINCT keyword.
select distinct Id, Name
from tbl1
If you always want to see data from a specific Variant then just include that condition in your where clause and you don't need to worry about using distinct or aggregates.
select *
from tbl1
where Variant = 'Type1'
If you always want to see the record associated with the latest Variant, then you can use a window function to do so.
select a.Id, a.Name, a.Variant
from
(
select *, row_number() over (partition by Id order by Variant desc) as RowRank
from tbl1
) a
where RowRank = 1
;
If there is not a predictable pattern for exclusion then you will have to maintain an exclusion list. It's not ideal but if you want to maintain this in the SQL itself then you could have a query like the one below.
select *
from tbl1
-- Define rows to exclude
where not (Id = 110 and Variant = 'Type1') -- Your example
and not (Id = 110 and Variant = 'Type3') -- Theoretical example
;
A better solution would be to create an exclusion reference table to maintain all exclusions within. Then you could simply negative join to that table to retrieve your desired results.
Have you considered using an exclusion table where you can place the ID and Variant combinations that you want to exclude? (I just used temp tables for this example; you can always use user tables so your exclusion table will always be available.)
Here is an example of what I mean based on your example:
if object_id('tempdb..#temp') is not null
drop table #temp
create table #temp (
ID int,
Name varchar(20),
Variant varchar(20),
Count1 int
)
if object_id('tempdb..#tempExclude') is not null
drop table #tempExclude
create table #tempExclude (
ID int,
Variant varchar(20)
)
insert into #temp values
(110,'Bob','Type1',0),
(110,'Bob','Type2',1),
(120,'John','Type1',1),
(120,'John','Type2',1),
(120,'John','Type2',1),
(120,'John','Type2',1),
(120,'John','Type3',1)
insert into #tempExclude values (110,'Type1')
select
t.ID,
t.Name
,t.Variant
,sum(t.Count1) as TotalCount
from
#temp t
left join
#tempExclude te
on t.ID = te.ID
and t.Variant = te.Variant
where
te.id is null
group by
t.ID,
t.Name
,t.Variant
Here are the results:
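The result set can be reproduced by running the same data and anti-join elsewhere. The sketch below is a port to SQLite through Python's sqlite3 module, which is my assumption (the answer itself uses SQL Server temp tables); the table names `tbl1` and `exclusions` stand in for `#temp` and `#tempExclude`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
create table tbl1 (ID int, Name varchar(20), Variant varchar(20), Count1 int);
create table exclusions (ID int, Variant varchar(20));
insert into tbl1 values
  (110, 'Bob',  'Type1', 0), (110, 'Bob',  'Type2', 1),
  (120, 'John', 'Type1', 1), (120, 'John', 'Type2', 1),
  (120, 'John', 'Type2', 1), (120, 'John', 'Type2', 1),
  (120, 'John', 'Type3', 1);
insert into exclusions values (110, 'Type1');
""")

rows = conn.execute("""
    select t.ID, t.Name, t.Variant, sum(t.Count1) as TotalCount
    from tbl1 t
    left join exclusions x
           on t.ID = x.ID and t.Variant = x.Variant
    where x.ID is null          -- keep rows that matched no exclusion
    group by t.ID, t.Name, t.Variant
    order by t.ID, t.Variant
""").fetchall()

for row in rows:
    print(row)
# (110, 'Bob', 'Type2', 1)
# (120, 'John', 'Type1', 1)
# (120, 'John', 'Type2', 3)
# (120, 'John', 'Type3', 1)
```

Bob's Type1 row is the only one dropped, and the three John/Type2 rows are summed into one.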
I think the logic you want is something like:
Select ID, Name, Variant, sum(count1)
from tbl1
where not (id = 110 and variant = 'type1')
Group by Id, name, variant;
For the second condition, just keep adding:
where not (id = 110 and variant = 'type1') and
not (id = 314 and variant = 'popsicle')
You can also express this using a list of exclusions:
select t.ID, t.Name, t.Variant, sum(t.count1)
from tbl1 t left join
(values (110, 'type1'),
(314, 'popsicle')
) v(id, excluded_variant)
on t.id = v.id and
t.variant = v.excluded_variant
where v.id is null -- doesn't match an exclusion criterion
group by t.ID, t.Name, t.Variant;

Select rows based on subset

I have a scenario where I need to write a SQL query based on the result of another query.
Consider the table data:
id attribute
1 a
1 b
2 a
3 a
3 b
3 c
I want to write a query that selects ids based on an attribute set.
That is, I first need to get the attributes of id 1 using this query:
select attribute from table where id = 1
Then, based on this result, I need to select the ids whose attributes contain that set. In our case 1(a,b) is a subset of 3(a,b,c), so my query should return 3.
And if I check based on 2(a), which is a subset of 1(a,b) and 3(a,b,c), it should return 1 and 3.
I hope that's understandable. :)
You could use this query.
The logic is simple: if there is no item that is in A but not in B, then A is a subset of B.
DECLARE @SampleData AS TABLE
(
Id int, attribute varchar(5)
)
INSERT INTO @SampleData
VALUES (1,'a'), (1,'b'),
(2,'a'),
(3,'a'),(3,'b'),(3,'c')
DECLARE @FilterId int = 1
;WITH temp AS
(
SELECT DISTINCT sd.Id FROM @SampleData sd
)
SELECT * FROM temp t
WHERE t.Id <> @FilterId
AND NOT EXISTS (
SELECT sd2.attribute FROM @SampleData sd2
WHERE sd2.Id = @FilterId
AND NOT EXISTS (SELECT * FROM @SampleData sd WHERE sd.Id = t.Id AND sd.attribute = sd2.attribute)
)
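For anyone who wants to run the double NOT EXISTS idea without SQL Server, here is a rough port to SQLite via Python's sqlite3 module. The table name `sample_data`, the aliases `need` and `have`, and the helper `supersets_of` are names I made up for this sketch.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
create table sample_data (id int, attribute varchar(5));
insert into sample_data values
  (1, 'a'), (1, 'b'), (2, 'a'), (3, 'a'), (3, 'b'), (3, 'c');
""")

def supersets_of(filter_id):
    # Return ids whose attribute set contains every attribute of filter_id.
    sql = """
        select distinct t.id
        from sample_data t
        where t.id <> :fid
          and not exists (
              -- an attribute the filter id needs ...
              select 1 from sample_data need
              where need.id = :fid
                and not exists (
                    -- ... that the candidate id does not have
                    select 1 from sample_data have
                    where have.id = t.id
                      and have.attribute = need.attribute))
        order by t.id
    """
    return [row[0] for row in conn.execute(sql, {"fid": filter_id})]

print(supersets_of(1))  # [3]
print(supersets_of(2))  # [1, 3]
```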
I would compose a query for that in three steps: first I'd get the attributes of the desired id, and this is the query you wrote
select attribute from table where id = 1
Then I would get the number of attributes for the required id
select count(distinct attribute) from table where id = 1
Finally I would use the above results as filters
select id
from table
where id <> 1 and
attribute in (
select attribute from table where id = 1 /* Step 1 */
)
group by id
having count(distinct attribute) = (
select count(distinct attribute) from table where id = 1 /* Step 2 */
)
This returns every id that has all the attributes of the initially provided id: after filtering to those attributes, its count of distinct attributes equals the count the initial id has.
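A quick way to check the counting approach is to run it on the sample data. The sketch below does that with Python's sqlite3 module; the table name `attrs` (the answer's `table` is a reserved word) and the helper `supersets_by_count` are assumptions made for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
create table attrs (id int, attribute varchar(5));
insert into attrs values
  (1, 'a'), (1, 'b'), (2, 'a'), (3, 'a'), (3, 'b'), (3, 'c');
""")

def supersets_by_count(filter_id):
    sql = """
        select id
        from attrs
        where id <> :fid
          and attribute in (select attribute from attrs where id = :fid)  -- step 1
        group by id
        having count(distinct attribute) =
               (select count(distinct attribute) from attrs where id = :fid)  -- step 2
        order by id
    """
    return [row[0] for row in conn.execute(sql, {"fid": filter_id})]

print(supersets_by_count(1))  # [3]
print(supersets_by_count(2))  # [1, 3]
```

Id 2 is not returned for filter id 1 because only one of its attributes survives the step 1 filter, while id 1 has two.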

SQL 'GROUP BY' to filter an array of 'text' data type

I am new to SQL and I am trying to understand the GROUP BY statement.
I have inserted the following data in SQL:
CREATE TABLE customers (id integer, type text);
INSERT INTO customers VALUES (1,'start');
INSERT INTO customers VALUES (2,'start');
INSERT INTO customers VALUES (2,'complete');
INSERT INTO customers VALUES (3,'complete');
INSERT INTO customers VALUES (3,'start');
INSERT INTO customers VALUES (4,'start');
I want to select those IDs that do not have a type 'complete'. For this example I should get IDs 1, 4.
I have tried multiple GROUP BY - HAVING combinations. My best approach is:
SELECT id from customers group by type having type!='complete';
but the resulting IDs are 4,3,2.
Could anyone give me a hint about what I am doing wrong?
You are close. The having clause needs an aggregation function, and you need to aggregate by id:
select id
from customers t
group by id
having sum(case when type = 'complete' then 1 else 0 end) = 0;
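Here is a small runnable check of the conditional-aggregation approach, sketched with Python's sqlite3 module (my choice of engine, since the question does not name one). The table is called `customers` here to match the query in the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
create table customers (id integer, type text);
insert into customers values
  (1, 'start'), (2, 'start'), (2, 'complete'),
  (3, 'complete'), (3, 'start'), (4, 'start');
""")

# Group per id, then keep the ids whose rows contain no 'complete' at all.
ids = [row[0] for row in conn.execute("""
    select id
    from customers
    group by id
    having sum(case when type = 'complete' then 1 else 0 end) = 0
    order by id
""")]

print(ids)  # [1, 4]
```

Grouping by type, as in the original attempt, produces one group per type rather than per id, which is why it cannot answer a per-id question.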
Normally, if you have something called an id, you would also have a table with that as primary key. If so, you can also do:
select it.id
from idtable it
where not exists (select 1
from customers t
where t.type = 'complete' and it.id = t.id
);

More efficient way of doing multiple joins to the same table and a "case when" in the select

At my organization clients can be enrolled in multiple programs at one time. I have a table that lists, as unique rows, all of the programs a client has been enrolled in and the dates they were enrolled in each program.
Using an outer join I can take any client name and a date from a table (say a table of tests that the clients have completed) and have it return all of the programs that client was in on that particular date. If a client was in multiple programs on that date, it duplicates the data from that table for each program they were in on that date.
The problem I have is that I am looking for it to only return one program as their "Primary Program" for each client and date even if they were in multiple programs on that date. I have created a hierarchy for which program should be selected as their primary program and returned.
For Example:
1.)Inpatient
2.)Outpatient Clinical
3.)Outpatient Vocational
4.)Outpatient Recreational
So if a client was enrolled in Outpatient Clinical, Outpatient Vocational, Outpatient Recreational at the same time on that date it would only return "Outpatient Clinical" as the program.
My way of thinking for doing this would be to join to the table with the previous programs multiple times like this:
FROM dbo.TestTable as TestTable
LEFT OUTER JOIN dbo.PreviousPrograms as PreviousPrograms1
ON TestTable.date = PreviousPrograms1.date AND PreviousPrograms1.type = 'Inpatient'
LEFT OUTER JOIN dbo.PreviousPrograms as PreviousPrograms2
ON TestTable.date = PreviousPrograms2.date AND PreviousPrograms2.type = 'Outpatient Clinical'
LEFT OUTER JOIN dbo.PreviousPrograms as PreviousPrograms3
ON TestTable.date = PreviousPrograms3.date AND PreviousPrograms3.type = 'Outpatient Vocational'
LEFT OUTER JOIN dbo.PreviousPrograms as PreviousPrograms4
ON TestTable.date = PreviousPrograms4.date AND PreviousPrograms4.type = 'Outpatient Recreational'
and then use a conditional CASE WHEN in the SELECT statement, like this:
SELECT
CASE
WHEN PreviousPrograms1.name IS NOT NULL
THEN PreviousPrograms1.name
WHEN PreviousPrograms1.name IS NULL AND PreviousPrograms2.name IS NOT NULL
THEN PreviousPrograms2.name
WHEN PreviousPrograms1.name IS NULL AND PreviousPrograms2.name IS NULL AND PreviousPrograms3.name IS NOT NULL
THEN PreviousPrograms3.name
WHEN PreviousPrograms1.name IS NULL AND PreviousPrograms2.name IS NULL AND PreviousPrograms3.name IS NULL AND PreviousPrograms4.name IS NOT NULL
THEN PreviousPrograms4.name
ELSE NULL
END as PrimaryProgram
The bigger problem is that in my actual table there are a lot more than just four possible programs it could be and the CASE WHEN select statement and the JOINs are already cumbersome enough.
Is there a more efficient way to do either the SELECT part or the JOIN part? Or possibly a better way to do it altogether?
I'm using SQL Server 2008.
You can simplify (replace) your CASE by using COALESCE() instead:
SELECT
COALESCE(PreviousPrograms1.name, PreviousPrograms2.name,
PreviousPrograms3.name, PreviousPrograms4.name) AS PrimaryProgram
COALESCE() returns the first non-null value.
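As a quick illustration of that behaviour, here is a sketch run through Python's sqlite3 module (an assumption for convenience; COALESCE works the same way in SQL Server). The literal values are made up to mirror the program names in the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# The first argument is NULL, so the second one is returned.
first = conn.execute(
    "select coalesce(null, 'Outpatient Clinical', 'Outpatient Vocational')"
).fetchone()[0]

# If every argument is NULL, the result is NULL (None in Python).
none = conn.execute("select coalesce(null, null)").fetchone()[0]

print(first)  # Outpatient Clinical
print(none)   # None
```

This is why the argument order has to follow the program hierarchy: the highest-priority join goes first.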
Due to your design, you still need the JOINs, but it would be much easier to read if you used very short aliases, for example PP1 instead of PreviousPrograms1 - it's just a lot less code noise.
You can simplify the join by using a bridge table containing all the program types and their priority (my SQL Server syntax is a bit rusty):
create table BridgeTable (
programType varchar(30),
programPriority smallint
);
This table will hold all the program types and the program priority will reflect the priority you've specified in your question.
As for the CASE part, that will depend on the number of records involved. One of the tricks that I usually use is this (assuming programPriority is a number between 10 and 99, so it is always two characters wide, and no type can have more than 30 bytes, because I'm being lazy). The priority is prefixed to the type, the minimum of that string is taken, and the two-character prefix is stripped off again:
Select TestTable.patient, TestTable.date,
substring(min(cast(BridgeTable.programPriority as varchar(2)) + PreviousPrograms.type), 3, 30) as PrimaryProgram
From dbo.TestTable as TestTable
Left Outer Join dbo.PreviousPrograms as PreviousPrograms
on TestTable.date = PreviousPrograms.date
Left Outer Join dbo.BridgeTable as BridgeTable
on PreviousPrograms.type = BridgeTable.programType
Group by TestTable.patient, TestTable.date
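To see the prefix-and-strip trick in action, here is a sketch run in SQLite through Python's sqlite3 module. Everything beyond the BridgeTable definition is an assumption for illustration: the `patient` columns, the sample rows, the priority values 10 to 40, and the use of SQLite's `||` and `substr` in place of SQL Server's `+` and `substring`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
create table TestTable (patient text, date text);
create table PreviousPrograms (patient text, date text, type text);
create table BridgeTable (programType varchar(30), programPriority smallint);
insert into BridgeTable values
  ('Inpatient', 10), ('Outpatient Clinical', 20),
  ('Outpatient Vocational', 30), ('Outpatient Recreational', 40);
insert into TestTable values ('p1', '2013-08-07'), ('p2', '2013-08-08');
insert into PreviousPrograms values
  ('p1', '2013-08-07', 'Outpatient Recreational'),
  ('p1', '2013-08-07', 'Outpatient Clinical'),
  ('p1', '2013-08-07', 'Outpatient Vocational'),
  ('p2', '2013-08-08', 'Inpatient');
""")

rows = conn.execute("""
    select t.patient, t.date,
           -- '20Outpatient Clinical' sorts lowest; drop the 2-char priority
           substr(min(cast(b.programPriority as text) || b.programType), 3, 30)
               as PrimaryProgram
    from TestTable t
    left join PreviousPrograms p
           on p.patient = t.patient and p.date = t.date
    left join BridgeTable b
           on b.programType = p.type
    group by t.patient, t.date
    order by t.patient
""").fetchall()

for row in rows:
    print(row)
# ('p1', '2013-08-07', 'Outpatient Clinical')
# ('p2', '2013-08-08', 'Inpatient')
```

Adding a new program then only needs a new row in BridgeTable rather than another join and another CASE branch.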
You can achieve this using sub-queries, or you could refactor it to use CTEs. Take a look at the following and see if it makes sense:
DECLARE @testTable TABLE
(
[id] INT IDENTITY(1, 1),
[date] datetime
)
DECLARE @previousPrograms TABLE
(
[id] INT IDENTITY(1,1),
[date] datetime,
[type] varchar(50)
)
INSERT INTO @testTable ([date])
SELECT '2013-08-08'
UNION ALL SELECT '2013-08-07'
UNION ALL SELECT '2013-08-06'
INSERT INTO @previousPrograms ([date], [type])
-- a sample user as an inpatient
SELECT '2013-08-08', 'Inpatient'
-- your use case of someone being enrolled in all 3 outpatient programs
UNION ALL SELECT '2013-08-07', 'Outpatient Recreational'
UNION ALL SELECT '2013-08-07', 'Outpatient Clinical'
UNION ALL SELECT '2013-08-07', 'Outpatient Vocational'
-- showing our workings, this is what we'll join to
SELECT
PPP.[date],
PPP.[type],
ROW_NUMBER() OVER (PARTITION BY PPP.[date] ORDER BY PPP.[Priority]) AS [RowNumber]
FROM (
SELECT
[type],
[date],
CASE
WHEN [type] = 'Inpatient' THEN 1
WHEN [type] = 'Outpatient Clinical' THEN 2
WHEN [type] = 'Outpatient Vocational' THEN 3
WHEN [type] = 'Outpatient Recreational' THEN 4
ELSE 999
END AS [Priority]
FROM @previousPrograms
) PPP -- Previous Programs w/ Priority
SELECT
T.[date],
PPPO.[type]
FROM @testTable T
LEFT JOIN (
SELECT
PPP.[date],
PPP.[type],
ROW_NUMBER() OVER (PARTITION BY PPP.[date] ORDER BY PPP.[Priority]) AS [RowNumber]
FROM (
SELECT
[type],
[date],
CASE
WHEN [type] = 'Inpatient' THEN 1
WHEN [type] = 'Outpatient Clinical' THEN 2
WHEN [type] = 'Outpatient Vocational' THEN 3
WHEN [type] = 'Outpatient Recreational' THEN 4
ELSE 999
END AS [Priority]
FROM @previousPrograms
) PPP -- Previous Programs w/ Priority
) PPPO -- Previous Programs w/ Priority + Order
ON T.[date] = PPPO.[date] AND PPPO.[RowNumber] = 1
Basically we have our deepest sub-select giving all PreviousPrograms a priority based on type, then our wrapping sub-select gives them row numbers per date so we can select only the ones with a row number of 1.
I am guessing you would need to include a UR Number or some other patient identifier. If so, simply add it as an output of both sub-selects, include it in the PARTITION BY, and add it to the join condition.
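A sketch of that patient-aware variant, written as a CTE and run in SQLite via Python's sqlite3 module, might look like the following. The `patient_id` column, the table and column names, and the sample rows are assumptions for illustration, and window functions need SQLite 3.25 or newer.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
create table tests (patient_id int, test_date text);
create table previous_programs (patient_id int, program_date text, type text);
insert into tests values (1, '2013-08-08'), (2, '2013-08-07'), (3, '2013-08-06');
insert into previous_programs values
  (1, '2013-08-08', 'Inpatient'),
  (2, '2013-08-07', 'Outpatient Recreational'),
  (2, '2013-08-07', 'Outpatient Clinical'),
  (2, '2013-08-07', 'Outpatient Vocational');
""")

rows = conn.execute("""
    with ranked as (
        select patient_id, program_date, type,
               row_number() over (
                   partition by patient_id, program_date
                   order by case type
                                when 'Inpatient' then 1
                                when 'Outpatient Clinical' then 2
                                when 'Outpatient Vocational' then 3
                                when 'Outpatient Recreational' then 4
                                else 999
                            end
               ) as rn
        from previous_programs
    )
    select t.patient_id, t.test_date, r.type
    from tests t
    left join ranked r
           on r.patient_id = t.patient_id
          and r.program_date = t.test_date
          and r.rn = 1             -- keep only the highest-priority program
    order by t.patient_id
""").fetchall()

for row in rows:
    print(row)
# (1, '2013-08-08', 'Inpatient')
# (2, '2013-08-07', 'Outpatient Clinical')
# (3, '2013-08-06', None)
```

Patient 3 has no program on the test date, so the left join leaves the program as NULL instead of dropping the test row.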