Select distinct rows that contain a given set of data

I have the following table:
bid | data
1 | a
1 | b
1 | c
2 | a
3 | c
3 | a
I want to select all bids that contain a given set of data values.
For example, all bids that contain both "a" and "b" (the result should be bid 1), or all bids that contain both "a" and "c" (bids 1 and 3).
The only solution I could think of is rather clumsy, so I would appreciate some help or suggestions.
My first try:
select bid from my_table as t1 where
exists (select * from my_table t2 where
t2.bid = t1.bid and
t2.data='a'
)
and
exists (select * from my_table t2 where
t2.bid = t1.bid and
t2.data='b'
)
group by bid;
Thanks.

select t1.bid
from table_1 t1
inner join table_1 t2 on t1.bid = t2.bid
where t1.data = 'a' and t2.data = 'c'
By the way:
all bids that 'contains' data "a" and "b" (result should be bid 1)
--> bid 2 also contains data 'a' and 'b'

I would not recommend this solution for only two lookup values, but its query cost grows very slowly as more values are matched, compared with adding an inner join for each value. As a disclaimer, this breaks if the pipe character can appear in the data or if the data contains characters that FOR XML entity-encodes (such as &, < or >).
select e.bid
from myTable e
cross apply ( select '|'+ i.data + '|'
from myTable i
where e.bid = i.bid
for xml path('')) T(v)
where v like '%|A|%' and v like '%|B|%' --and v like '%|C|%'.....
group by e.bid
As a side note on other options, your query could be simplified to:
select bid from my_table as t1 where
exists (select * from my_table t2 where
t2.bid = t1.bid and
t2.data='a'
)
and t1.data = 'c'
group by bid;
This is roughly equivalent to christian's answer. The optimizer will most likely treat these the same.
select distinct t1.bid
from table_1 t1
inner join table_1 t2 on t1.bid = t2.bid
where t1.data = 'a' and t2.data = 'c'

With a subquery, count how many of the required values each bid has in the table.
SELECT DISTINCT m.bid
FROM myTable m
WHERE (
SELECT COUNT(1)
FROM myTable m2
WHERE (m2.data = 'a'
OR m2.data = 'b')
AND m.bid = m2.bid
) = 2
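The counting idea above can be tried out with a small, self-contained sketch. It assumes SQLite through Python's sqlite3 module rather than SQL Server, and the helper name bids_containing is hypothetical. It also counts DISTINCT values, a small change from the query above, so that a duplicated (bid, data) row cannot inflate the count.

```python
import sqlite3


def bids_containing(conn, wanted):
    """Return the bids whose rows cover every value in `wanted` (hypothetical helper)."""
    wanted = sorted(set(wanted))
    placeholders = ", ".join("?" for _ in wanted)
    sql = f"""
        SELECT DISTINCT m.bid
        FROM my_table AS m
        WHERE (
            SELECT COUNT(DISTINCT m2.data)   -- DISTINCT guards against duplicate rows
            FROM my_table AS m2
            WHERE m2.bid = m.bid
              AND m2.data IN ({placeholders})
        ) = ?
        ORDER BY m.bid
    """
    # Bind the wanted values first, then the number of values that must match.
    return [row[0] for row in conn.execute(sql, (*wanted, len(wanted)))]


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE my_table (bid INTEGER, data TEXT)")
# Sample rows copied from the question.
conn.executemany(
    "INSERT INTO my_table VALUES (?, ?)",
    [(1, "a"), (1, "b"), (1, "c"), (2, "a"), (3, "c"), (3, "a")],
)

ab = bids_containing(conn, ["a", "b"])
ac = bids_containing(conn, ["a", "c"])
print(ab, ac)
```

Because the list of wanted values is bound as parameters, the same query shape works for two values or twenty without adding a join per value.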

Maybe not the best answer but:
select bid from mytable where data = 'a'
intersect
select bid from mytable where data = 'c'

Uses exists:
declare @t table(bid int, data char)
insert @t values(1,'a'),(1,'b'),(1,'c'),(2,'b'),(2,'a'),(3,'c'),(3,'a')
select distinct t1.bid
from @t t1
where exists(
select 1
from @t t2
where t2.bid = t1.bid and t2.data = 'a'
)
and exists(
select 1
from @t t2
where t2.bid = t1.bid and t2.data = 'b'
)
XML PATH and XQuery version:
select distinct t.bid
from
(
select *
, (
select *
from @t t2
where t2.bid = t1.bid
for xml path, root('root'), type
) [x]
from @t t1
) t
where t.x.exist('root[*/data[text() = "a"] and */data[. = "b"]]') = 1

Related

Matching multiple columns in one join

I have two tables:
Table 1
item_name | assocID_1 | assocID_2 | assocID_3
ball 123 456 789
Table 2
assoc_key assoc_value
123 red
456 white
789 blue
Am I able to create an output of:
ball red white blue
With only one join? I understand I can just join the tables multiple times to easily get this result, but in my actual tables there are much more than 3 columns, and the app I'm using can only support 4 joins per query apparently.
Many thanks for any help.

If you don't care about performance, you can do:
select t1.item_name,
max(case when t2.assoc_key = t1.assocID_1 then t2.assoc_value end),
max(case when t2.assoc_key = t1.assocID_2 then t2.assoc_value end),
max(case when t2.assoc_key = t1.assocID_3 then t2.assoc_value end)
from table1 t1 join
table2 t2
on t2.assoc_key in (t1.assocID_1, t1.assocID_2, t1.assocID_3)
group by t1.item_name;
You can also use subqueries. If we assume that there is only one matching row in table2:
select t1.item_name,
(select t2.assoc_value from table2 t2 where t2.assoc_key = t1.assocID_1),
(select t2.assoc_value from table2 t2 where t2.assoc_key = t1.assocID_2),
(select t2.assoc_value from table2 t2 where t2.assoc_key = t1.assocID_3)
from table1 t1;
If there can be more than one match, you can arbitrarily choose one of them using aggregation functions:
select t1.item_name,
(select max(t2.assoc_value) from table2 t2 where t2.assoc_key = t1.assocID_1),
(select max(t2.assoc_value) from table2 t2 where t2.assoc_key = t1.assocID_2),
(select max(t2.assoc_value) from table2 t2 where t2.assoc_key = t1.assocID_3)
from table1 t1;
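As a quick check of the single-join approach above, here is a minimal sketch that runs the conditional-aggregation query against the sample rows from the question. It assumes SQLite through Python's sqlite3 module, and the output column names value_1 to value_3 are made up for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE table1 (item_name TEXT, assocID_1 INTEGER, assocID_2 INTEGER, assocID_3 INTEGER);
CREATE TABLE table2 (assoc_key INTEGER, assoc_value TEXT);
INSERT INTO table1 VALUES ('ball', 123, 456, 789);
INSERT INTO table2 VALUES (123, 'red'), (456, 'white'), (789, 'blue');
""")

# One join matches table2 against any of the three ID columns; the CASE
# expressions then route each matched value to the column it belongs to.
rows = conn.execute("""
    SELECT t1.item_name,
           MAX(CASE WHEN t2.assoc_key = t1.assocID_1 THEN t2.assoc_value END) AS value_1,
           MAX(CASE WHEN t2.assoc_key = t1.assocID_2 THEN t2.assoc_value END) AS value_2,
           MAX(CASE WHEN t2.assoc_key = t1.assocID_3 THEN t2.assoc_value END) AS value_3
    FROM table1 AS t1
    JOIN table2 AS t2
      ON t2.assoc_key IN (t1.assocID_1, t1.assocID_2, t1.assocID_3)
    GROUP BY t1.item_name
""").fetchall()
print(rows)
```

Note that grouping by item_name alone assumes item names are unique. If they are not, group by a key column instead.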

I do not think you need a join here. You just need a lookup, which you can do directly in the SELECT list. Here is an implementation in SQL Server. (In the sample data preparation code, if you are using a version older than SQL Server 2016, replace DROP TABLE IF EXISTS with the older way of doing the same thing.)
DDL and Test Data:
DROP TABLE IF EXISTS Table1
SELECT item_name = 'ball'
,assocID_1 = 123
,assocID_2 = 456
,assocID_3 = 789
INTO Table1
DROP TABLE IF EXISTS Table2
SELECT assoc_key = 123
,assoc_value = 'red'
INTO Table2
UNION ALL
SELECT assoc_key = 456
,assoc_value = 'white'
UNION ALL
SELECT assoc_key = 789
,assoc_value = 'blue'
SELECT * FROM Table1
SELECT * FROM Table2
1. Brute Force Approach:
SELECT item_name = T1.item_name
,(SELECT TOP 1 assoc_value FROM Table2 WHERE assoc_key = T1.assocID_1)
,(SELECT TOP 1 assoc_value FROM Table2 WHERE assoc_key = T1.assocID_2)
,(SELECT TOP 1 assoc_value FROM Table2 WHERE assoc_key = T1.assocID_3)
FROM Table1 T1
2. Building the query dynamically and then executing it. With this approach the number of columns is not a concern:
DECLARE @SQL NVARCHAR(MAX) = 'SELECT item_name = T1.item_name '
SELECT @SQL += '
,(SELECT TOP 1 assoc_value FROM Table2 WHERE assoc_key = T1.'+COLUMN_NAME+')'
FROM INFORMATION_SCHEMA.COLUMNS
WHERE TABLE_SCHEMA = 'dbo' -- provide your proper schema name here
AND TABLE_NAME = 'Table1'
AND COLUMN_NAME <> 'item_name' -- provide the columns you want to avoid doing lookups
ORDER BY ORDINAL_POSITION
SET @SQL+='
FROM Table1 T1 '
PRINT @SQL
EXEC sp_executesql @statement=@SQL
3. Combination of UNPIVOT, JOIN and PIVOT
SELECT item_name, [assocID_1], [assocID_2], [assocID_3] -- you can dynamically build the select list like above example if you need
FROM
(
SELECT IQ.item_name, IQ.assocId, T2.assoc_value
FROM (
SELECT UNP.item_name, UNP.assocId, UNP.Value
FROM Table1 T1
UNPIVOT
(
Value FOR assocId IN ([assocId_1], [assocId_2], [assocId_3]) -- you can dynamically build this column list like above example if you need
) UNP
) IQ
INNER JOIN Table2 T2
ON IQ.Value = T2.assoc_key
) OQ
PIVOT
(
MAX(assoc_value)
FOR associd IN ([assocID_1], [assocID_2], [assocID_3]) -- you can dynamically build this column list like above example if you need
) PV

select item_name, decode(ASSOCID_1,(select assocID_1 from t1 ), (select assoc_value from t2 where assoc_key =aa.assocID_1),null ) ,
decode(ASSOCID_2,(select assocID_2 from t1 ) , (select assoc_value from t2 where assoc_key =aa.assocID_2),null ),
decode(ASSOCID_3,(select assocID_3 from t1 ), (select assoc_value from t2 where assoc_key =aa.assocID_3),null ) from t1 aa

How to avoid repetition of conditions in EXISTS clause in a UNION

I have this SQL in DB2 and want to avoid repeating the conditions in the EXISTS clause in the second UNION, as the conditions can be fairly large. How do I do that? Any help is greatly appreciated.
select id from table t where t.given_name = 'good' and t.time = 1 and exists
(select 1 from table t1 where t1.id = t.id and t1.surname = 'OK') union
select id from table t where t.given_name = 'good' and t.time = 2 and not exists
(select 1 from table t1 where t1.id = t.id and t1.surname = 'OK')

I think this could also be achieved with a WHERE clause alone, provided the surname being tested is on the same row:
where given_name = 'good' and
((time = 1 and surname = 'OK') or
(time = 2 and surname <> 'OK'))

Why are you using union? How about just doing this?
select id
from table t
where t.given_name = 'good' and
t.time in (1, 2) and
exists (select 1 from table t1 where t1.id = t.id and t1.surname = 'OK');
If id could have duplicates, use select distinct in the outer query.
EDIT:
I think I misread the original query. The logic would be:
select id
from table t
where t.given_name = 'good' and
( (t.time = 1 and exists (select 1 from table t1 where t1.id = t.id and t1.surname = 'OK')
) or
(t.time = 2 and not exists (select 1 from table t1 where t1.id = t.id and t1.surname = 'OK')
)
)
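To check that the OR form above really returns the same ids as the original UNION, here is a small sketch. It assumes SQLite through Python's sqlite3 module instead of DB2, uses a hypothetical table name people (table is a reserved word), and the sample rows are invented for the test. The EXISTS condition is kept in one Python string, which is one client-side way to avoid writing it out twice.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE people (id INTEGER, given_name TEXT, time INTEGER, surname TEXT);
-- Invented sample rows; id 4 has its 'OK' surname on a different row.
INSERT INTO people VALUES
    (1, 'good', 1, 'OK'),
    (2, 'good', 2, 'Smith'),
    (3, 'good', 1, 'Smith'),
    (4, 'good', 2, 'Jones'),
    (4, 'bad',  5, 'OK'),
    (5, 'bad',  1, 'OK');
""")

# The potentially large condition is written once and reused in both queries.
HAS_OK = "EXISTS (SELECT 1 FROM people t1 WHERE t1.id = t.id AND t1.surname = 'OK')"

union_sql = f"""
    SELECT id FROM people t
    WHERE t.given_name = 'good' AND t.time = 1 AND {HAS_OK}
    UNION
    SELECT id FROM people t
    WHERE t.given_name = 'good' AND t.time = 2 AND NOT {HAS_OK}
"""

or_sql = f"""
    SELECT DISTINCT id FROM people t
    WHERE t.given_name = 'good'
      AND ((t.time = 1 AND {HAS_OK}) OR (t.time = 2 AND NOT {HAS_OK}))
"""

union_ids = sorted(row[0] for row in conn.execute(union_sql))
or_ids = sorted(row[0] for row in conn.execute(or_sql))
print(union_ids, or_ids)
```

Id 4 is excluded by both queries because another row with the same id has surname 'OK'. A plain same-row test such as surname <> 'OK' would not catch that case.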

Use a WITH clause to remove the redundancy:
with t2 as (select * from table where surname = 'OK')
select id from table t where t.given_name = 'good' and t.time = 1 and exists
(select 1 from t2 where t2.id = t.id) union
select id from table t where t.given_name = 'good' and t.time = 2 and not exists
(select 1 from t2 where t2.id = t.id)
;
You can do the same for the outer table too if needed:
with t2 as (select * from table where surname = 'OK')
, tt as (select * from table where given_name = 'good')
select id from tt where tt.time = 1 and exists
(select 1 from t2 where t2.id = tt.id) union
select id from tt where tt.time = 2 and not exists
(select 1 from t2 where t2.id = tt.id)
;

SQL: Select rows in a table by filtering multiple columns from the same table by a 3 column select result

I have a table in which I want to select all rows whose Code, Life and TC equal those of the row returned by a select query on the same table filtered by ID.
ID | Code | Life | TC | PORT
62 | XX101 | 1 | 1 | 1
63 | XX101 | 1 | 1 | 2
64 | AB123 | 1 | 1 | 1
65 | AB123 | 1 | 1 | 2
66 | AB123 | 1 | 1 | 3
67 | CD321 | 1 | 1 | 1
68 | CD321 | 1 | 1 | 2
This is the best I have come up with but it doesn't seem to be very efficient.
select ID from #table
where Code = (Select Code from #table where ID = @Port1) and
Life = (Select Life from #table where ID = @Port1) and
TC = (Select TC from #table where ID = @Port1)

Here is the query you need:
select t2.*
from #table t1
join #table t2 on t1.Code = t2.Code and
t1.Life = t2.Life and
t1.TC = t2.TC and
t1.PORT = t2.PORT
where t1.id = @Port1
With cross apply:
select ca.*
from #table t1
cross apply (select * from #table t2 where t1.Code = t2.Code and
t1.Life = t2.Life and
t1.TC = t2.TC and
t1.PORT = t2.PORT) ca
where t1.id = @Port1
With cte:
with cte as(select * from #table where id = @Port1)
select t.*
from #table t
join cte c on t.Code = c.Code and
t.Life = c.Life and
t.TC = c.TC and
t.PORT = c.PORT

You could use an EXISTS predicate for this scenario:
SELECT
ID
FROM
#table t1
WHERE
EXISTS ( SELECT
*
FROM
#table t2
WHERE
t2.ID = @Port1
AND t2.Code = t1.Code
AND t2.Life = t1.Life
AND t2.TC = t1.TC )
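The EXISTS query above can be exercised against the sample rows from the question with a short sketch. It assumes SQLite through Python's sqlite3 module rather than SQL Server, so the table gets a hypothetical name ports and the variable becomes a named parameter :port1. The helper name same_group_ids is also made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE ports (ID INTEGER PRIMARY KEY, Code TEXT, Life INTEGER, TC INTEGER, PORT INTEGER);
-- Sample rows copied from the question.
INSERT INTO ports VALUES
    (62, 'XX101', 1, 1, 1),
    (63, 'XX101', 1, 1, 2),
    (64, 'AB123', 1, 1, 1),
    (65, 'AB123', 1, 1, 2),
    (66, 'AB123', 1, 1, 3),
    (67, 'CD321', 1, 1, 1),
    (68, 'CD321', 1, 1, 2);
""")


def same_group_ids(conn, port1):
    """Return IDs of all rows sharing Code, Life and TC with the row whose ID is port1."""
    sql = """
        SELECT t1.ID
        FROM ports AS t1
        WHERE EXISTS (
            SELECT 1
            FROM ports AS t2
            WHERE t2.ID = :port1          -- the reference row is looked up once
              AND t2.Code = t1.Code
              AND t2.Life = t1.Life
              AND t2.TC = t1.TC
        )
        ORDER BY t1.ID
    """
    return [row[0] for row in conn.execute(sql, {"port1": port1})]


group_of_64 = same_group_ids(conn, 64)
group_of_62 = same_group_ids(conn, 62)
print(group_of_64, group_of_62)
```

An ID that does not exist simply yields an empty list, since the EXISTS subquery never finds a reference row.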

Your code appears to give the same result as
SELECT tbl1.ID
FROM #table AS tbl1
INNER JOIN #table AS tbl2 ON
tbl2.ID = @Port1 AND
tbl1.Code = tbl2.Code AND
tbl1.Life = tbl2.Life AND
tbl1.TC = tbl2.TC
but it is more expensive.
Each subquery in your WHERE clause fetches the same record,
and each time you pick a different field from it to match.
Also be careful: if more than one record has that ID, your query raises an error, because the = operator expects the subquery to return a single value.

Using window functions:
;WITH CTE AS (
SELECT *, RANK() OVER (ORDER BY [Code], [Life], [TC]) AS grp
FROM mytable
), CTE2 AS (SELECT grp FROM CTE WHERE ID = @Port1)
SELECT *
FROM CTE
WHERE grp = (SELECT grp FROM CTE2)
The above query finds the [Code], [Life], [TC] partition to which the row with ID = @Port1 belongs and then selects all rows of that partition.

SQL WHERE Subquery in Field List

I have a query like:
SELECT field
FROM table
WHERE
(
SELECT COUNT(*)
FROM table2
WHERE table2.field = table.field
)
!=
(
SELECT COUNT(*)
FROM table3
WHERE table3.field = table.field
)
Now I want to have those WHERE subqueries in my field list like:
SELECT field, count1, count2
FROM table
WHERE
(
SELECT COUNT(*)
FROM table2
WHERE table2.field = table.field
) AS Count1
!=
(
SELECT COUNT(*)
FROM table3
WHERE table3.field = table.field
) AS Count2
Is this possible? Of course I could put those subqueries in the field list, but then I can't compare them.
Any ideas?

You can do this if you use SQL Server:
SELECT field, ca2.c2, ca3.c3
FROM table t
cross apply(SELECT COUNT(*) c2
FROM table2 t2
WHERE t2.field = t.field)ca2
cross apply(SELECT COUNT(*) c3
FROM table3 t3
WHERE t3.field = t.field)ca3
where ca2.c2 <> ca3.c3

Use correlated sub-selects to count. Wrap up in a derived table:
select dt.* from
(
SELECT field,
(SELECT COUNT(*)
FROM table2
WHERE table2.field = table.field) as cnt1,
(SELECT COUNT(*)
FROM table3
WHERE table3.field = table.field) as cnt2
FROM table
) dt
where dt.cnt1 <> dt.cnt2

You just need to use a Derived Table:
select *
from
(
SELECT field,
(
SELECT COUNT(*)
FROM table2
WHERE table2.field = table.field
) AS Count1,
(
SELECT COUNT(*)
FROM table3
WHERE table3.field = table.field
) AS Count2
FROM table
) dt
WHERE Count1 <> Count2
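A runnable sketch of the derived-table pattern above, assuming SQLite through Python's sqlite3 module. The main table is given the hypothetical name items because table is a reserved word, and the sample rows are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE items  (field TEXT);
CREATE TABLE table2 (field TEXT);
CREATE TABLE table3 (field TEXT);
-- Invented sample rows: 'x' has 2 vs 1 matches, 'y' has 1 vs 1, 'z' has 0 vs 0.
INSERT INTO items  VALUES ('x'), ('y'), ('z');
INSERT INTO table2 VALUES ('x'), ('x'), ('y');
INSERT INTO table3 VALUES ('x'), ('y');
""")

# The inner query computes both counts as named columns; the outer query can
# then compare those names in its WHERE clause and also return them.
rows = conn.execute("""
    SELECT dt.field, dt.Count1, dt.Count2
    FROM (
        SELECT i.field,
               (SELECT COUNT(*) FROM table2 WHERE table2.field = i.field) AS Count1,
               (SELECT COUNT(*) FROM table3 WHERE table3.field = i.field) AS Count2
        FROM items AS i
    ) AS dt
    WHERE dt.Count1 <> dt.Count2
    ORDER BY dt.field
""").fetchall()
print(rows)
```

Only the row whose two counts differ comes back, together with both counts.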

alternative solution to too many JOINs

There is a table containing all names:
CREATE TABLE Names(
Name VARCHAR(20)
)
And there are multiple tables with similar schema.
Let's say:
CREATE TABLE T1
(
Name VARCHAR(20),
Description VARCHAR(30),
Version INT
)
CREATE TABLE T2
(
Name VARCHAR(20),
Description VARCHAR(30),
Version INT
)
I need to query the description for each name, using the following priority:
any records in T1 with matching name and version = 1
any records in T1 with matching name and version = 2
any records in T2 with matching name and version = 1
any records in T2 with matching name and version = 2
I want a result from a lower-priority source only if there is no result from a higher-priority source.
So far this is what I've got:
SELECT
N.Name AS Name, Description =
CASE
WHEN (T11.Description IS NOT NULL) THEN T11.Description
WHEN (T12.Description IS NOT NULL) THEN T12.Description
WHEN (T21.Description IS NOT NULL) THEN T21.Description
WHEN (T22.Description IS NOT NULL) THEN T22.Description
ELSE NULL
END
FROM Names AS N
LEFT JOIN T1 AS T11 ON T11.Name = N.Name AND T11.Version = 1
LEFT JOIN T1 AS T12 ON T12.Name = N.Name AND T12.Version = 2
LEFT JOIN T2 AS T21 ON T21.Name = N.Name AND T21.Version = 1
LEFT JOIN T2 AS T22 ON T22.Name = N.Name AND T22.Version = 2
It works, but are there too many JOINs here? Is there a better approach?
Sample Input:
INSERT INTO Names VALUES('name1')
INSERT INTO Names VALUES('name2')
INSERT INTO Names VALUES('name3')
INSERT INTO Names VALUES('name4')
INSERT INTO Names VALUES('name5')
INSERT INTO Names VALUES('name6')
INSERT INTO T1 VALUES ('name1','name1_T1_1', 1)
INSERT INTO T1 VALUES ('name2','name2_T1_1', 1)
INSERT INTO T1 VALUES ('name3','name3_T1_1', 1)
INSERT INTO T1 VALUES ('name3','name3_T1_2', 2)
INSERT INTO T1 VALUES ('name5','name5_T1_2', 2)
INSERT INTO T2 VALUES ('name1','name1_T2_1', 1)
INSERT INTO T2 VALUES ('name4','name4_T2_1', 1)
Expected result:
--
-- Expected result:
-- Name Description
-- name1 name1_T1_1
-- name2 name2_T1_1
-- name3 name3_T1_1
-- name4 name4_T2_1
-- name5 name5_T1_2
-- name6 NULL

Here is a solution that eliminates the CASE expression and minimizes the repetitive part of the query. It requires some joins of its own, of course, so you would need quite a few tables and/or versions to get any real benefit from it:
;WITH
AllDescriptions AS
(
SELECT 1 AS Rank, * FROM T1
UNION ALL SELECT 2 AS Rank, * FROM T2
-- UNION ALL SELECT 3 AS Rank, * FROM T3
-- UNION ALL SELECT 4 AS Rank, * FROM T4
-- etc
),
Ranks AS
(
SELECT
AllDescriptions.Name,
MIN(AllDescriptions.Rank) AS Rank
FROM
AllDescriptions
GROUP BY
Name
),
Versions AS
(
SELECT
AllDescriptions.Name,
AllDescriptions.Rank,
MIN(AllDescriptions.Version) AS Version
FROM
AllDescriptions
INNER JOIN Ranks
ON Ranks.Name = AllDescriptions.Name
AND Ranks.Rank = AllDescriptions.Rank
GROUP BY
AllDescriptions.Name,
AllDescriptions.Rank
),
Descriptions AS
(
SELECT
AllDescriptions.Name,
AllDescriptions.Description
FROM
AllDescriptions
INNER JOIN Versions
ON Versions.Name = AllDescriptions.Name
AND Versions.Rank = AllDescriptions.Rank
AND Versions.Version = AllDescriptions.Version
)
SELECT
Names.*,
Descriptions.Description
FROM
Names
LEFT OUTER JOIN Descriptions
ON Descriptions.Name = Names.Name

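Try this query. Note that it returns one row per matching version, so a name that has both versions in the same table (such as name3 in the sample data) appears more than once.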
SELECT N.name AS Name,
Description =
CASE
WHEN ( t1.description IS NOT NULL ) THEN t1.description
WHEN ( t2.description IS NOT NULL ) THEN t2.description
ELSE NULL
END
FROM names AS N
LEFT JOIN t1
ON t1.name = N.name
AND t1.version IN( 1, 2 )
LEFT JOIN t2
ON t2.name = N.name
AND t2.version IN ( 1, 2 )

select n.name, isnull(d.description,d1.Description) description
from Names n
outer apply (select top 1 t1.Name, t1.Description
from T1
WHERE t1.Name = n.name
order by Version asc
) d
outer apply (select top 1 t2.Name, t2.Description
from T2
WHERE t2.Name = n.name
order by Version asc
) d1
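The priority lookup can also be checked end to end with a small sketch. It assumes SQLite through Python's sqlite3 module rather than SQL Server, so TOP 1 and OUTER APPLY are swapped for a correlated subquery with ORDER BY and LIMIT 1. The src column is a hypothetical priority number (1 for T1, 2 for T2), and the data is the sample input from the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Names (Name TEXT);
CREATE TABLE T1 (Name TEXT, Description TEXT, Version INTEGER);
CREATE TABLE T2 (Name TEXT, Description TEXT, Version INTEGER);
INSERT INTO Names VALUES ('name1'), ('name2'), ('name3'), ('name4'), ('name5'), ('name6');
INSERT INTO T1 VALUES ('name1', 'name1_T1_1', 1), ('name2', 'name2_T1_1', 1),
                      ('name3', 'name3_T1_1', 1), ('name3', 'name3_T1_2', 2),
                      ('name5', 'name5_T1_2', 2);
INSERT INTO T2 VALUES ('name1', 'name1_T2_1', 1), ('name4', 'name4_T2_1', 1);
""")

# All candidate rows are stacked with a source priority, then the best one per
# name is picked by ordering on (source, version) and keeping the first row.
rows = conn.execute("""
    SELECT n.Name,
           (SELECT d.Description
            FROM (SELECT 1 AS src, Name, Description, Version FROM T1
                  UNION ALL
                  SELECT 2 AS src, Name, Description, Version FROM T2) AS d
            WHERE d.Name = n.Name
              AND d.Version IN (1, 2)
            ORDER BY d.src, d.Version
            LIMIT 1) AS Description
    FROM Names AS n
    ORDER BY n.Name
""").fetchall()
for row in rows:
    print(row)
```

Adding a third source table only takes one more UNION ALL branch with the next priority number, instead of two more joins.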
