Efficient way to check if column has all certain values - sql

I am using the following query to show only those inspectors who have both qualifications.
DECLARE @CertType QualificationType; --2,3
select i.InspectorID from Inspectors i
INNER JOIN (
SELECT _id.InspectorID
FROM InspectorDocs _id
WHERE _id.QualificationTypeID IN (select [QualificationTypeID] from @CertType)
GROUP BY _id.InspectorID
HAVING COUNT(DISTINCT _id.QualificationTypeID) = (select count(*) from @CertType)
) as id on id.inspectorid = i.inspectorid
Is there any better way to find if column has all given values?
Schema
Inspectors: InspectorID (PK)
InspectorDocs : DocID (PK) , InspectorID (FK), QualificationTypeID (FK)
QualificationType: QualificationTypeID (PK)

If I'm not mistaken, what you're doing is known as relational division. Another (slightly non-intuitive) way to express it is the following query, which may perform better:
select * from Inspectors i
where not exists (
select * from QualificationType c
where QualificationTypeID IN (2,3)
and not exists (
select * from InspectorDocs id
where c.QualificationTypeID = id.QualificationTypeID
and id.InspectorID = i.InspectorID))
If you want to dig deeper into this subject I recommend reading Divided We Stand: The SQL of Relational Division by Joe Celko.
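To check that the two formulations agree, here is a minimal sketch using Python's built-in sqlite3 module rather than SQL Server. The sample rows and the Required table (standing in for the table variable of required qualification types) are assumptions made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Inspectors (InspectorID INTEGER PRIMARY KEY);
CREATE TABLE InspectorDocs (
    DocID INTEGER PRIMARY KEY,
    InspectorID INTEGER,
    QualificationTypeID INTEGER
);
-- Hypothetical stand-in for the table variable holding the required types.
CREATE TABLE Required (QualificationTypeID INTEGER PRIMARY KEY);

INSERT INTO Inspectors VALUES (1), (2), (3);
INSERT INTO Required VALUES (2), (3);
-- Inspector 1 holds types 2 and 3, inspector 2 holds type 2 twice,
-- inspector 3 holds types 1, 2 and 3.
INSERT INTO InspectorDocs (InspectorID, QualificationTypeID) VALUES
    (1, 2), (1, 3), (2, 2), (2, 2), (3, 1), (3, 2), (3, 3);
""")

# Counting formulation: matched distinct types must equal the required count.
count_sql = """
SELECT d.InspectorID
FROM InspectorDocs d
WHERE d.QualificationTypeID IN (SELECT QualificationTypeID FROM Required)
GROUP BY d.InspectorID
HAVING COUNT(DISTINCT d.QualificationTypeID) = (SELECT COUNT(*) FROM Required)
ORDER BY d.InspectorID
"""

# Double NOT EXISTS formulation: no required type is missing for the inspector.
not_exists_sql = """
SELECT i.InspectorID
FROM Inspectors i
WHERE NOT EXISTS (
    SELECT 1 FROM Required c
    WHERE NOT EXISTS (
        SELECT 1 FROM InspectorDocs d
        WHERE d.QualificationTypeID = c.QualificationTypeID
          AND d.InspectorID = i.InspectorID))
ORDER BY i.InspectorID
"""

by_count = [row[0] for row in conn.execute(count_sql)]
by_not_exists = [row[0] for row in conn.execute(not_exists_sql)]
print(by_count, by_not_exists)
```

Both queries return inspectors 1 and 3 on this data. Which one is faster depends on the engine, the indexes, and the data, so it is worth comparing the actual execution plans.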

Related

SQL - How to get values from multiple tables without being ambiguous

Apologies if this question has been asked before (it probably has). I have never used SQL before, and the answers I've found only left me more confused.
I need to find out if an ID exists on different tables and get the total number from all tables.
Here is my query:
select * from public.ui1, public.ui2, public.ui3 where id = '123'
So if id 123 doesn't exist in ui1 and ui2 but does exist in ui3, I'd still like to get it. (I would obviously like to get it if it exists in the other tables)
I am currently getting an "ambiguous" error message because id exists in all tables, but I am not sure how to construct this query properly. I tried a join but failed miserably. Any help on how to restructure it, along with a fool-proof explanation, would be highly appreciated!
EDIT: What I would finally like to find out is if id = 123 exists in any of the tables.
It's a bit unclear what result you expect. If you want the count per table, you can use UNION ALL:
select 'ui1' as source_table,
count(*) as num_rows
from public.ui1
where id = 123
union all
select 'ui2',
count(*)
from public.ui2
where id = 123
union all
select 'ui3',
count(*)
from public.ui3
where id = 123
If you only want to know whether the id exists in at least one of the tables (so a true/false result), you can use:
select exists (select id from ui1 where id = 123
union all
select id from ui2 where id = 123
union all
select id from ui3 where id = 123)
What I would finally like to find out is if id = 123 exists in any of the tables.
The best way to do this is probably just using exists:
select v.id,
(exists (select 1 from public.ui1 t where t.id = v.id) or
exists (select 1 from public.ui2 t where t.id = v.id) or
exists (select 1 from public.ui3 t where t.id = v.id)
) as exists_flag
from (values (123)) v(id);
As written, this returns one row per id defined in values(), along with a flag of whether or not the id exists -- the question you are asking.
This can easily be tweaked if you want additional information, such as which tables the id exists in, or the number of times each appears.
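As one possible tweak, the sketch below reports which tables contain the id by selecting one EXISTS flag per table. It uses Python's built-in sqlite3 module instead of PostgreSQL, and the sample rows and the helper name tables_containing are assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE ui1 (id INTEGER);
CREATE TABLE ui2 (id INTEGER);
CREATE TABLE ui3 (id INTEGER);
INSERT INTO ui1 VALUES (1);
INSERT INTO ui2 VALUES (2);
INSERT INTO ui3 VALUES (123), (123);
""")

# One EXISTS flag per table; every column is qualified by its own subquery,
# so there is no ambiguity about which id is meant.
FLAGS_SQL = """
SELECT
    EXISTS (SELECT 1 FROM ui1 WHERE id = :id) AS in_ui1,
    EXISTS (SELECT 1 FROM ui2 WHERE id = :id) AS in_ui2,
    EXISTS (SELECT 1 FROM ui3 WHERE id = :id) AS in_ui3
"""


def tables_containing(wanted_id):
    """Return the names of the tables that hold at least one row with this id."""
    row = conn.execute(FLAGS_SQL, {"id": wanted_id}).fetchone()
    return [name for name, flag in zip(("ui1", "ui2", "ui3"), row) if flag]


print(tables_containing(123))  # ['ui3']
print(tables_containing(999))  # []
```

An empty list means the id exists in none of the tables, which answers the true/false question as well.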

Intersection of Records in Postgres

Suppose I have labels with multiple stores associated with them like so:
label_id | store_id
--------------------
label_1 | store_1
label_1 | store_2
label_1 | store_3
label_2 | store_2
label_2 | store_3
label_3 | store_1
label_3 | store_2
Is there any good way in SQL (or jooq) to get all the store ids in the intersection of the labels? Meaning just return store_2 in the example above because store_2 is associated with label_1, label_2, and label_3? I would like a general method to handle the case where I have n labels.
This is a relational division problem, where you want the stores that have all possible labels. Here is an approach using aggregation:
select store_id
from mytable
group by store_id
having count(*) = (select count(distinct label_id) from mytable)
Note that this assumes no duplicate (store_id, label_id) tuples. Otherwise, you need to change the having clause to:
having count(distinct label_id) = (select count(distinct label_id) from mytable)
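A small sketch of why the duplicate caveat matters, using Python's built-in sqlite3 module. The data is the question's sample plus one duplicated (label_1, store_3) row, which is an assumption added to trigger the problem.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE mytable (label_id TEXT, store_id TEXT);
INSERT INTO mytable VALUES
    ('label_1', 'store_1'), ('label_1', 'store_2'), ('label_1', 'store_3'),
    ('label_2', 'store_2'), ('label_2', 'store_3'),
    ('label_3', 'store_1'), ('label_3', 'store_2'),
    ('label_1', 'store_3');  -- duplicate row, added for this illustration
""")

# count(*) counts rows, so the duplicate makes store_3 look complete.
naive_sql = """
SELECT store_id FROM mytable
GROUP BY store_id
HAVING count(*) = (SELECT count(DISTINCT label_id) FROM mytable)
ORDER BY store_id
"""

# count(DISTINCT label_id) counts labels, so duplicates do not matter.
distinct_sql = """
SELECT store_id FROM mytable
GROUP BY store_id
HAVING count(DISTINCT label_id) = (SELECT count(DISTINCT label_id) FROM mytable)
ORDER BY store_id
"""

naive = [row[0] for row in conn.execute(naive_sql)]
robust = [row[0] for row in conn.execute(distinct_sql)]
print(naive)   # ['store_2', 'store_3']
print(robust)  # ['store_2']
```

With the duplicate present, store_3 has three rows but only two distinct labels, so only the DISTINCT version returns the correct answer.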
Since you're also looking for a jOOQ solution, jOOQ supports a synthetic relational division operator, which produces a more academic approach to relational division, using relational algebra operators only:
// Using jOOQ
T t1 = T.as("t1");
T t2 = T.as("t2");
ctx.select()
.from(t1.divideBy(t2).on(t1.LABEL_ID.eq(t2.LABEL_ID)).returning(t1.STORE_ID).as("t"))
.fetch();
This produces something like the following query:
select t.store_id
from (
select distinct dividend.store_id
from t dividend
where not exists (
select 1
from t t2
where not exists (
select 1
from t t1
where dividend.store_id = t1.store_id
and t1.label_id = t2.label_id
)
)
) t
In plain English:
Get me all the stores (dividend), for which there exists no label (t2) for which that store (dividend) has no entry (t1)
Or in other words
If there was a label (t2) that a store (dividend) does not have (t1), then that store (dividend) would not have all the available labels.
This isn't necessarily more readable or faster than GROUP BY / HAVING COUNT(*) based implementations of relational division (as seen in other answers). In fact, the GROUP BY / HAVING based solutions are probably preferable here, especially since only one table is involved. A future version of jOOQ might use the GROUP BY / HAVING approach instead: #10450
But in jOOQ, it might be quite convenient to write this way, and you asked for a jOOQ solution :)
Then convert the query by @GMB into an SQL function that takes an array and returns a table of store_ids.
create or replace
function stores_with_all_labels( label_list text[] )
returns table (store_id text)
language sql
as $$
select store_id
from label_store
where label_id = any (label_list)
group by store_id
having count(*) = array_length(label_list,1);
$$;
Then all that's needed is a simple select. See complete example here.
If there are three particular labels you want, you can use:
select store_id
from t
where label in (1, 2, 3)
group by store_id
having count(*) = 3;
If you want only those three labels and nothing else, then:
select store_id
from t
group by store_id
having count(*) = 3 and
count(*) filter (where label in (1, 2, 3)) = count(*);

SQL Views (id + count_table1_column1 + count_table2_column_1)

I'm running the following query to select a serial number from the Alerts table and count how many alerts there are for that serial number, together with how many measurements there are for it. Measurements are stored in another table. (The first 2 queries are just there to show you the data for better understanding.)
SELECT InstrumentSerialNumber FROM [dbo].[CloudMeasurements]
SELECT InstrumentSerialNumber FROM [dbo].[CloudAlerts]
SELECT
DISTINCT InstrumentSerialNumber,
(SELECT COUNT(*) FROM [CloudAlerts] WHERE [CloudAlerts].InstrumentSerialNumber = InstrumentSerialNumber) AS Alerts,
(SELECT COUNT(*) FROM [CloudMeasurements] WHERE [CloudMeasurements].InstrumentSerialNumber = InstrumentSerialNumber) AS Measurements
FROM [CloudAlerts]
Result
See picture for result of the query.
I assume it responds with the total COUNT(*) on every row, which is wrong from my perspective. How do I write this?
Greetings
Try joining the results of their groups:
SELECT
A.InstrumentSerialNumber,
A.TotalAlerts,
ISNULL(M.TotalMeasurements, 0) TotalMeasurements
FROM
(SELECT InstrumentSerialNumber, COUNT(*) TotalAlerts FROM [CloudAlerts] GROUP BY InstrumentSerialNumber) AS A
LEFT JOIN (SELECT InstrumentSerialNumber, COUNT(*) TotalMeasurements FROM [CloudMeasurements] GROUP BY InstrumentSerialNumber)
AS M ON M.InstrumentSerialNumber = A.InstrumentSerialNumber
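The sketch below runs the grouped-join idea with Python's built-in sqlite3 module, using made-up rows and COALESCE in place of SQL Server's ISNULL. It also shows an alternative that keeps the correlated subqueries but gives the outer table an alias. The likely cause of the wrong totals in the question is that the unqualified InstrumentSerialNumber inside each subquery resolves to the subquery's own table, so the comparison is always true.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE CloudAlerts (InstrumentSerialNumber TEXT);
CREATE TABLE CloudMeasurements (InstrumentSerialNumber TEXT);
-- Made-up sample data: A has 2 alerts and 3 measurements, B has 1 alert.
INSERT INTO CloudAlerts VALUES ('A'), ('A'), ('B');
INSERT INTO CloudMeasurements VALUES ('A'), ('A'), ('A'), ('C');
""")

# Aggregate each table on its own, then join the per-serial totals.
grouped_sql = """
SELECT a.InstrumentSerialNumber,
       a.TotalAlerts,
       COALESCE(m.TotalMeasurements, 0) AS TotalMeasurements
FROM (SELECT InstrumentSerialNumber, COUNT(*) AS TotalAlerts
      FROM CloudAlerts GROUP BY InstrumentSerialNumber) AS a
LEFT JOIN (SELECT InstrumentSerialNumber, COUNT(*) AS TotalMeasurements
           FROM CloudMeasurements GROUP BY InstrumentSerialNumber) AS m
       ON m.InstrumentSerialNumber = a.InstrumentSerialNumber
ORDER BY a.InstrumentSerialNumber
"""

# Same result with correlated subqueries, once the outer table is aliased
# so that the inner comparison really refers to the outer row.
correlated_sql = """
SELECT DISTINCT o.InstrumentSerialNumber,
       (SELECT COUNT(*) FROM CloudAlerts x
        WHERE x.InstrumentSerialNumber = o.InstrumentSerialNumber),
       (SELECT COUNT(*) FROM CloudMeasurements y
        WHERE y.InstrumentSerialNumber = o.InstrumentSerialNumber)
FROM CloudAlerts o
ORDER BY o.InstrumentSerialNumber
"""

grouped = conn.execute(grouped_sql).fetchall()
correlated = conn.execute(correlated_sql).fetchall()
print(grouped)  # [('A', 2, 3), ('B', 1, 0)]
```

Serial number C has measurements but no alerts, so it does not appear; that matches the question, which starts from the alerts table.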

How Make a Hierarchical Selection with SQL Query

I have a problem creating a SQL query, as follows:
I have 2 tables with the following specification and data:
http://dc699.4shared.com/img/lgtP3N_4ce/s3/144c7252ff8/SQL1.jpg
I want to create a SQL SELECT query that returns a hierarchical model like this.
For example, if the SID is 3, it should return this:
http://dc699.4shared.com/img/8UufpK2-ce/s3/144c7255af0/SQL2.jpg
This is because 3 in the Structure table is related to 7, 8, and 9, and 9 is related to 10 and 11. (Note that 9 is related to 3; in other words, 9 is a subset of 3.)
Can anyone help me create this query? I have tried for 2 weeks but failed :(
Thanks so much
You can also try a RANK solution like this one:
WITH Personel_Structure AS
(
SELECT [SID],MID, RANK() OVER(PARTITION BY [SID] ORDER BY MID ASC) AS POS
FROM Structure
WHERE [SID] = 3
)
SELECT [SID],MID
FROM Personel_Structure
ORDER BY POS ASC
I have scripted this against the Structure table only. If you need to join to the personnel table, that should be easy from here: just join the tables in the CTE.
Untested answer, and it does not include the root member for readability and because your examples in the question and comments are inconsistent. This should get you going.
I made the query starting with root = 1
WITH members (id)
AS
(SELECT MID as id FROM structure WHERE SID = 1
UNION ALL
SELECT MID as id
FROM members
INNER JOIN structure ON (members.id = structure.SID)
)
SELECT members.ID FROM members;
members is the intermediary table created by the CTE (WITH...)
sqlfiddle
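For a quick local check of the recursive approach, here is a sketch using Python's built-in sqlite3 module. SQLite needs the RECURSIVE keyword, which SQL Server omits. The rows follow the relationships described in the question (3 to 7, 8, 9 and 9 to 10, 11); the extra (1, 2) row and the function name descendants are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE structure (SID INTEGER, MID INTEGER);
INSERT INTO structure VALUES
    (3, 7), (3, 8), (3, 9),   -- direct members of 3
    (9, 10), (9, 11),         -- members of 9, hence indirect members of 3
    (1, 2);                   -- unrelated branch, added for illustration
""")

# Anchor part: direct members of the root.
# Recursive part: members of anything already found.
DESCENDANTS_SQL = """
WITH RECURSIVE members(id) AS (
    SELECT MID FROM structure WHERE SID = ?
    UNION ALL
    SELECT s.MID
    FROM members m
    JOIN structure s ON s.SID = m.id
)
SELECT id FROM members ORDER BY id
"""


def descendants(root):
    """Return all direct and indirect members below the given SID."""
    return [row[0] for row in conn.execute(DESCENDANTS_SQL, (root,))]


print(descendants(3))  # [7, 8, 9, 10, 11]
```

If the data could contain cycles, UNION ALL would recurse without end; switching to UNION in SQLite drops rows already seen and lets the recursion stop.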

Remove duplicates (1 to many) or write a subquery that solves my problem

Referring to the diagram below the records table has unique Records. Each record is updated, via comments through an Update Table. When I join the two I get lots of duplicates.
How to remove duplicates? Group By does not work for me as I have more than 10 fields in select query and some of them are functions.
Write a sub query which pulls the last updates in the Update table for each record that is updated in a particular month. Joining with this sub query will solve my problem.
Thanks!
Edit
Table structure that is of interest is
create table Records(
recordID int,
90more_fields various
)
create table Updates(
update_id int,
record_id int,
comment text,
byUser varchar(25),
datecreate datetime
)
Here's one way.
SELECT * /*But list columns explicitly*/
FROM Orange o
CROSS APPLY (SELECT TOP 1 *
FROM Blue b
WHERE b.datecreate >= '20110901'
AND b.datecreate < '20111001'
AND o.RecordID = b.Record_ID2
ORDER BY b.datecreate DESC) b
Based on the limited information available...
WITH cteLastUpdate AS (
SELECT Record_ID2, UpdateDateTime,
ROW_NUMBER() OVER(PARTITION BY Record_ID2 ORDER BY UpdateDateTime DESC) AS RowNum
FROM BlueTable
/* Add WHERE clause if needed to restrict date range */
)
SELECT *
FROM cteLastUpdate lu
INNER JOIN OrangeTable o
ON lu.Record_ID2 = o.RecordID
WHERE lu.RowNum = 1
Last updates per record and month:
SELECT *
FROM UPDATES outerUpd
WHERE exists
(
-- Magic part
SELECT 1
FROM UPDATES innerUpd
WHERE innerUpd.RecordId = outerUpd.RecordId
GROUP BY RecordId
, date_part('year', innerUpd.datecolumn)
, date_part('month', innerUpd.datecolumn)
HAVING max(innerUpd.datecolumn) = outerUpd.datecolumn
)
(Works on PostgreSQL, date_part is different in other RDBMS)
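The same "latest update per record and month" idea can be tried locally with Python's built-in sqlite3 module. This sketch swaps the GROUP BY / HAVING subquery for a correlated MAX and uses SQLite's strftime in place of date_part. The column names follow the question's Updates table, and the rows are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Updates (
    update_id INTEGER PRIMARY KEY,
    record_id INTEGER,
    comment TEXT,
    datecreate TEXT  -- ISO 8601 text, so string comparison follows time order
);
INSERT INTO Updates VALUES
    (1, 1, 'first',   '2011-09-01 10:00:00'),
    (2, 1, 'second',  '2011-09-15 09:00:00'),
    (3, 1, 'october', '2011-10-02 08:00:00'),
    (4, 2, 'only',    '2011-09-20 12:00:00');
""")

# Keep a row only if it is the newest one for its record within its month.
LATEST_SQL = """
SELECT u.update_id
FROM Updates u
WHERE u.datecreate = (
    SELECT MAX(i.datecreate)
    FROM Updates i
    WHERE i.record_id = u.record_id
      AND strftime('%Y-%m', i.datecreate) = strftime('%Y-%m', u.datecreate))
ORDER BY u.update_id
"""

latest_ids = [row[0] for row in conn.execute(LATEST_SQL)]
print(latest_ids)  # [2, 3, 4]
```

Joining the Records table to this result then yields at most one update row per record and month, provided no two updates of a record share the exact same timestamp.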