How to find equal subsets? - sql

I have a table with subsets. How can I find the reader IDs that have the same subset as a given ID? For example:
Input reader = 4
The expected output: reader 1 and 5.
The subset size is not always 3 as in the example; it can vary. What is the correct SQL query?
create table #t (
reader int not null,
book int,
pages int
)
insert into #t (reader, book, pages)
select 1, 1, 100 union
select 1, 2, 201 union
select 1, 3, 301 union
select 2, 1, 100 union
select 2, 3, 101 union
select 2, 3, 301 union
select 3, 1, 100 union
select 3, 2, 101 union
select 3, 3, 301 union
select 4, 1, 100 union
select 4, 2, 201 union
select 4, 3, 301 union
select 5, 1, 100 union
select 5, 2, 201 union
select 5, 3, 301
select * from #t

This is a bit of a pain, but you can use a self-join:
with t as (
select t.*, count(*) over (partition by reader) as cnt
from #t t
)
select t.reader
from t left join
t t2
on t2.book = t.book and
t2.pages = t.pages and
t2.cnt = t.cnt and
t2.reader = 4
group by t.reader, t.cnt
having count(*) = t.cnt and
count(*) = count(t2.reader);
The left join is needed to avoid a subsetting relationship. That is, having all the books for "4" plus additional books.

This is a generic approach to handle relational division. It checks if set x contains all elements from set y (and perhaps more):
with reqd as (
select book, pages
from #t
where reader = 1
)
select t.reader
from #t as t
inner join reqd on t.book = reqd.book and t.pages = reqd.pages
group by t.reader
having count(reqd.book) = (select count(*) from reqd)
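The relational-division query above accepts readers who have extra rows. A minimal sketch of one way to tighten it to exact equality follows, run against SQLite through Python's `sqlite3` module rather than SQL Server. The table name `t`, the function `readers_with_same_set`, and the `:target` parameter are illustrative names, and the sketch assumes each (book, pages) pair appears at most once per reader.

```python
import sqlite3

# Sample rows copied from the question: (reader, book, pages).
ROWS = [
    (1, 1, 100), (1, 2, 201), (1, 3, 301),
    (2, 1, 100), (2, 3, 101), (2, 3, 301),
    (3, 1, 100), (3, 2, 101), (3, 3, 301),
    (4, 1, 100), (4, 2, 201), (4, 3, 301),
    (5, 1, 100), (5, 2, 201), (5, 3, 301),
]

QUERY = """
with reqd as (
    select book, pages from t where reader = :target
)
select t.reader
from t
left join reqd on reqd.book = t.book and reqd.pages = t.pages
where t.reader <> :target
group by t.reader
having count(*) = (select count(*) from reqd)          -- same number of rows
   and count(reqd.book) = (select count(*) from reqd)  -- every row matched
order by t.reader
"""


def readers_with_same_set(target):
    """Return readers whose (book, pages) rows equal those of `target`."""
    conn = sqlite3.connect(":memory:")
    conn.execute("create table t (reader int not null, book int, pages int)")
    conn.executemany("insert into t values (?, ?, ?)", ROWS)
    result = [row[0] for row in conn.execute(QUERY, {"target": target})]
    conn.close()
    return result


print(readers_with_same_set(4))  # [1, 5]
```

The left join keeps unmatched rows, so `count(*)` counts all of a reader's rows while `count(reqd.book)` counts only the matched ones; requiring both to equal the size of the target set rules out subsets and supersets alike.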

Related

Oracle SQL Query - Element containing every element in subquery

I have 3 tables like so :
Document(ID:integer, Title:string)
Keywords(ID:integer, Name:string)
Document_Keywords(DocumentID:integer, KeywordID:integer)
Document_Keywords.DocumentID referencing Document.ID
Document_Keywords.KeywordID referencing Keywords.ID
A document contains [0, n] keywords.
I want to get every Document whose Keywords contain at least all of another Document's Keywords. For example:
Foo, Bar and Fred-> Documents
Foo's keywords: {1, 2, 3}
Bar's keywords: {1, 2, 3, 4}
Fred's keywords: {1, 3, 5}
If we search for all documents whose keywords contain Foo's keywords, we get Bar but not Fred.
Here is the query I have so far:
SELECT KeywordID
FROM Document_Keywords DK
JOIN Document D ON D.ID = DK.DocumentID
WHERE D.title = 'Foo'
MINUS
SELECT KeywordID
FROM Document_Keywords
WHERE DocumentID = 1;
It returns an empty table if the keywords of the Document with ID = 1 contain at least every one of Foo's keywords.
I can't find any other way to solve this problem, as I can only use Oracle SQL to answer it.
If you want to get keywords with documents:
SELECT KeywordID, D1.ID DOC_ID, D1.Title
FROM Document_Keywords DK1
JOIN Document D1
on DK1.DocumentID = D1.ID
WHERE exists
(select 1
from Document D2
join Document_Keywords DK2
on D2.ID = DK2.DocumentID
where D2.title = 'Foo'
and DK1.KeywordID=DK2.KeywordID
and D1.ID!= D2.ID
);
Full test case with test data and results:
with
Document(ID, Title) as (
select 1, 'Foo' from dual union all
select 2, 'Bar' from dual union all
select 3, 'Fred' from dual
)
,Keywords(ID, Name) as (
select level, 'Key'||level from dual connect by level<=5
)
,Document_Keywords(DocumentID, KeywordID) as (
select 1, column_value from table(sys.odcinumberlist(1,2,3)) union all -- Foo's keywords: {1, 2, 3}
select 2, column_value from table(sys.odcinumberlist(1,2,3,4)) union all -- Bar's keywords: {1, 2, 3, 4}
select 3, column_value from table(sys.odcinumberlist(1,3,5)) -- Fred's keywords: {1, 3, 5}
)
SELECT KeywordID, D1.ID DOC_ID, D1.Title
FROM Document_Keywords DK1
JOIN Document D1
on DK1.DocumentID = D1.ID
WHERE exists
(select 1
from Document D2
join Document_Keywords DK2
on D2.ID = DK2.DocumentID
where D2.title = 'Foo'
and DK1.KeywordID=DK2.KeywordID
and D1.ID!= D2.ID
);
KEYWORDID DOC_ID TITLE
---------- ---------- -----
1 2 Bar
1 3 Fred
2 2 Bar
3 2 Bar
3 3 Fred
If you want just the list of keywords, without documents:
SELECT distinct KeywordID
FROM Document_Keywords DK1
WHERE exists
(select 1
from Document D2
join Document_Keywords DK2
on D2.ID = DK2.DocumentID
where D2.title = 'Foo'
and DK1.KeywordID=DK2.KeywordID
and DK1.DocumentID!= D2.ID
);
Full test case with the results:
with
Document(ID, Title) as (
select 1, 'Foo' from dual union all
select 2, 'Bar' from dual union all
select 3, 'Fred' from dual
)
,Keywords(ID, Name) as (
select level, 'Key'||level from dual connect by level<=5
)
,Document_Keywords(DocumentID, KeywordID) as (
select 1, column_value from table(sys.odcinumberlist(1,2,3)) union all -- Foo's keywords: {1, 2, 3}
select 2, column_value from table(sys.odcinumberlist(1,2,3,4)) union all -- Bar's keywords: {1, 2, 3, 4}
select 3, column_value from table(sys.odcinumberlist(1,3,5)) -- Fred's keywords: {1, 3, 5}
)
SELECT distinct KeywordID
FROM Document_Keywords DK1
WHERE exists
(select 1
from Document D2
join Document_Keywords DK2
on D2.ID = DK2.DocumentID
where D2.title = 'Foo'
and DK1.KeywordID=DK2.KeywordID
and DK1.DocumentID!= D2.ID
);
KEYWORDID
----------
1
2
3
If I have this right, you want documents whose keywords contain all of Foo's keywords as a submultiset.
Setup (building on Sayan's example):
create or replace type number_tt as table of number;
create table documents(id, title) as
select 1, 'Foo' from dual union all
select 2, 'Bar' from dual union all
select 3, 'Fred' from dual;
create table document_keywords(documentid, keywordid) as
select 1, column_value from table(number_tt(1,2,3)) union all
select 2, column_value from table(number_tt(1,2,3,4)) union all
select 3, column_value from table(number_tt(1,3,5));
Query:
with document_keywords_agg(documentid, title, keywordlist, keywordids) as (
select d.id, d.title
, listagg(dk.keywordid, ', ') within group (order by dk.keywordid)
, cast(collect(dk.keywordid) as number_tt)
from documents d
join document_keywords dk on dk.documentid = d.id
group by d.id, d.title
)
select dk1.documentid, dk1.title, dk1.keywordlist
, dk2.title as subset_title
, dk2.keywordlist as subset_keywords
from document_keywords_agg dk1
join document_keywords_agg dk2
on dk2.keywordids submultiset of dk1.keywordids
where dk2.documentid <> dk1.documentid;
Results:
DOCUMENTID TITLE KEYWORDLIST SUBSET_TITLE SUBSET_KEYWORDS
---------- ----- ----------- ------------ ---------------
         2 Bar   1, 2, 3, 4  Foo          1, 2, 3
To extend the example a little, let's add another document 'Dino' containing keywords {1,3,5,9}:
insert all
when rownum = 1 then into documents values (docid, 'Dino')
when 1=1 then into document_keywords values (docid, kw)
select 4 as docid, column_value as kw from table(number_tt(1,3,5,9));
Now the results are:
DOCUMENTID TITLE KEYWORDLIST SUBSET_TITLE SUBSET_KEYWORDS
---------- ----- ----------- ------------ ---------------
         2 Bar   1, 2, 3, 4  Foo          1, 2, 3
         4 Dino  1, 3, 5, 9  Fred         1, 3, 5
(Add a filter to the where clause if you just want to check one document.)
So, inner joining Document_Keywords to itself on KeywordID gives you the raw materials for what you are looking for, no?
. . .
From Document_Keywords A Inner Join Document_Keywords B On A.KeywordID=B.KeywordID
And A.DocumentID<>B.DocumentID
. . .
Granted, if the same Keyword is in multiple other documents you will get multiple occurrences of A.*, but you can summarize those out with a Group By, or possibly a Distinct clause.
If you need text-y results, you can add Document and Keywords table joins to this on the table A keys.
A query that delivers results in the format you specified above would be:
Select Title, ListAgg(KeywordID,',') Within Group (Order By KeywordID) as KeyWord_IDs
From (
Select D.Title,D.ID,A.KeywordID
From Document_Keywords A Inner Join Document_Keywords B On A.KeywordID=B.KeywordID
And A.DocumentID<>B.DocumentID
Inner Join Document D on D.ID=A.DocumentID
Group By D.Title,D.ID,A.KeywordID
)
Group By Title,ID
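For the original "contains at least all of Foo's keywords" question, one common pattern is to join each document's keywords to the reference document's keywords and compare the match count with the size of the reference set. The sketch below is a hedged illustration in SQLite via Python's `sqlite3`, not Oracle. The function name `documents_containing` and the `:title` parameter are made up for this example, and it assumes titles are unique and (DocumentID, KeywordID) pairs are not repeated.

```python
import sqlite3

# Test data from the question: Foo {1,2,3}, Bar {1,2,3,4}, Fred {1,3,5}.
DOCS = [(1, "Foo"), (2, "Bar"), (3, "Fred")]
DOC_KEYWORDS = [
    (1, 1), (1, 2), (1, 3),
    (2, 1), (2, 2), (2, 3), (2, 4),
    (3, 1), (3, 3), (3, 5),
]

QUERY = """
with ref as (
    select dk.keywordid, d.id as refid
    from document_keywords dk
    join document d on d.id = dk.documentid
    where d.title = :title
)
select d.title
from document_keywords dk
join document d on d.id = dk.documentid
join ref on ref.keywordid = dk.keywordid
where d.id <> ref.refid
group by d.id, d.title
having count(*) = (select count(*) from ref)  -- every reference keyword matched
order by d.title
"""


def documents_containing(title):
    """Return titles of other documents whose keywords include all of `title`'s."""
    conn = sqlite3.connect(":memory:")
    conn.execute("create table document (id int, title text)")
    conn.execute("create table document_keywords (documentid int, keywordid int)")
    conn.executemany("insert into document values (?, ?)", DOCS)
    conn.executemany("insert into document_keywords values (?, ?)", DOC_KEYWORDS)
    result = [row[0] for row in conn.execute(QUERY, {"title": title})]
    conn.close()
    return result


print(documents_containing("Foo"))  # ['Bar']
```

Note that a reference document with no keywords yields an empty result here, because the inner join finds nothing to match; whether that edge case should instead return every document is a design decision the question leaves open.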

Should I keep with 2 views, or can I combine into 1?

I currently have 2 views: t_sdet_part and t_sdet_part_all. t_sdet_part is pulling data from view table_mv, along with creating rows that don't already exist in table_mv. Then that data created is joined into another view to show all records (t_sdet_part_all). Below is my current code:
-- view created: t_sdet_part
WITH v as (
SELECT v.*
FROM table_mv v
)
SELECT d.s_date, ig.part_no, ig.i_group, l.s_level, ig.p_category,
COALESCE(v.qty_ordered, 0) as qty_ordered
FROM (SELECT DISTINCT s_date FROM v) d CROSS JOIN
(SELECT DISTINCT part_no, i_group, p_category FROM v) ig CROSS JOIN
(SELECT '80' as s_level FROM DUAL UNION ALL
SELECT '81' FROM DUAL UNION ALL
SELECT '95' FROM DUAL UNION ALL
SELECT '101' FROM DUAL UNION ALL
SELECT '100' FROM DUAL UNION ALL
SELECT 'Late' FROM DUAL
) l LEFT JOIN
v
ON v.s_date = d.s_date AND v.part_no = ig.part_no AND
v.i_group = ig.i_group AND v.s_level = l.s_level
ORDER BY s_date, part_no, i_group,
DECODE(s_level, '80', 1, '81', 2, '95', 3, '101', 4, '100', 5, 'Late', 6)
-- view created t_sdet_part_all
SELECT DISTINCT
t.s_date,
t.part_no,
t.i_group,
t.s_level,
t.p_category,
t.qty_ordered,
v.bucket,
v.relief_amt,
v.extreme_amt,
v.curr_mth_note,
v.carryover_note
FROM
t_sdet_part t
LEFT JOIN table_mv v ON t.s_date = v.s_date
AND t.part_no = v.part_no
ORDER BY
s_date,
part_no,
i_group,
DECODE(s_level, '80', 1, '81', 2, '95', 3, '101', 4, '100', 5, 'Late', 6)
They are both pulling and joining data from table_mv. I'm trying to find a way (if it's even possible) to combine both of the above views so I only have to create 1 new view instead of 2.
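One possible way to merge the two definitions is to turn the first view's query into a common table expression and select from it in the same statement. The following is only a sketch under assumptions: it runs on SQLite through Python's `sqlite3` instead of Oracle, `table_mv` is reduced to a hypothetical seven-column table with made-up rows, only `bucket` stands in for the extra columns, and a `levels` CTE with a `sort_key` column replaces the `DECODE` ordering.

```python
import sqlite3

# Hypothetical, simplified stand-in for table_mv:
# (s_date, part_no, i_group, s_level, p_category, qty_ordered, bucket)
MV_ROWS = [
    ("2020-01", "P1", "G1", "80", "C1", 5, "B1"),
    ("2020-02", "P1", "G1", "95", "C1", 7, "B2"),
]

QUERY = """
with v as (
    select * from table_mv
),
levels(s_level, sort_key) as (
    values ('80', 1), ('81', 2), ('95', 3), ('101', 4), ('100', 5), ('Late', 6)
),
part as (  -- body of the former t_sdet_part view
    select d.s_date, ig.part_no, ig.i_group, l.s_level, l.sort_key,
           ig.p_category, coalesce(v.qty_ordered, 0) as qty_ordered
    from (select distinct s_date from v) d
    cross join (select distinct part_no, i_group, p_category from v) ig
    cross join levels l
    left join v
      on v.s_date = d.s_date and v.part_no = ig.part_no
     and v.i_group = ig.i_group and v.s_level = l.s_level
),
combined as (  -- body of the former t_sdet_part_all view
    select distinct t.s_date, t.part_no, t.i_group, t.s_level, t.sort_key,
           t.p_category, t.qty_ordered, m.bucket
    from part t
    left join v m on m.s_date = t.s_date and m.part_no = t.part_no
)
select s_date, part_no, i_group, s_level, p_category, qty_ordered, bucket
from combined
order by s_date, part_no, i_group, sort_key
"""


def combined_rows():
    """Build the sample table and return the rows of the single combined query."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "create table table_mv (s_date text, part_no text, i_group text, "
        "s_level text, p_category text, qty_ordered int, bucket text)"
    )
    conn.executemany("insert into table_mv values (?, ?, ?, ?, ?, ?, ?)", MV_ROWS)
    rows = conn.execute(QUERY).fetchall()
    conn.close()
    return rows


for row in combined_rows():
    print(row)
```

The join conditions are copied from the two original views, so the combined query should return what `t_sdet_part_all` returns; whether a single view performs better than two on the real data would need to be measured.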

Select one line of each code

I've got a Table that stores messages
like this:
codMsg, message, anotherCod
1, 'hi', 1
2, 'hello', 1
3, 'wasup', 1
4, 'yo', 2
5, 'yeah', 2
6, 'gogogo', 3
I was wondering if it is possible to select the top 1 row for each anotherCod.
What I expect:
1, 'hi', 1
4, 'yo', 2
6, 'gogogo', 3
I want the whole row, not just the anotherCod value, so a plain GROUP BY will not work.
select mytable.*
from mytable
join (select min(codMsg) as codMsg, anotherCod from mytable group by anotherCod) x
on mytable.codMsg = x.codMsg
SQL Server 2005+, Oracle :
SELECT codMsg,
message,
anotherCod
FROM
(
SELECT codMsg,
message,
anotherCod,
RANK() OVER (PARTITION BY anotherCod ORDER BY codMsg ASC) AS Rank
FROM mytable
) tmp
WHERE Rank = 1
SELECT
*
FROM
myTable
WHERE
codMSG = (SELECT MIN(codMsg) FROM myTable AS lookup WHERE anotherCod = myTable.anotherCod)
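As a quick check of the "join back to the per-group minimum" idea used in these answers, here is a small sketch that runs it against the sample rows in SQLite via Python's `sqlite3`. The function name `first_message_per_group` is illustrative, and the sketch assumes `codMsg` is unique, so each group's minimum identifies exactly one row.

```python
import sqlite3

# Sample rows from the question: (codMsg, message, anotherCod).
ROWS = [
    (1, "hi", 1), (2, "hello", 1), (3, "wasup", 1),
    (4, "yo", 2), (5, "yeah", 2),
    (6, "gogogo", 3),
]

QUERY = """
select m.codMsg, m.message, m.anotherCod
from mytable m
join (
    select anotherCod, min(codMsg) as codMsg
    from mytable
    group by anotherCod
) x on x.codMsg = m.codMsg
order by m.codMsg
"""


def first_message_per_group():
    """Return the full row with the smallest codMsg for each anotherCod."""
    conn = sqlite3.connect(":memory:")
    conn.execute("create table mytable (codMsg int, message text, anotherCod int)")
    conn.executemany("insert into mytable values (?, ?, ?)", ROWS)
    rows = conn.execute(QUERY).fetchall()
    conn.close()
    return rows


print(first_message_per_group())
# [(1, 'hi', 1), (4, 'yo', 2), (6, 'gogogo', 3)]
```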

concatenating values from string column in aggregate query in sql server [duplicate]

In my SQL Server 2005 database, using an SQL query, does anyone know the best way to group records together by one field, and get a comma-separated list of the values from another?
So if I have:
UserID Code
1 A
1 C5
1 X
2 V3
3 B
3 D
3 NULL
3 F4
4 NULL
I'd get:
UserID Code
1 A,C5,X
2 V3
3 B,D,F4
4 NULL
Thanks for any help.
WITH Data AS (
SELECT 1 UserId, 'A' Code
UNION ALL
SELECT 1, 'C5'
UNION ALL
SELECT 1, 'X'
UNION ALL
SELECT 2, 'V3'
UNION ALL
SELECT 3, 'B'
UNION ALL
SELECT 3, 'D'
UNION ALL
SELECT 3, NULL
UNION ALL
SELECT 3, 'F4'
UNION ALL
SELECT 4, NULL
)
SELECT U.UserId, STUFF((
SELECT ','+Code FROM Data WHERE Data.UserID = U.UserID FOR XML PATH('')
), 1, 1, '') Code
FROM (SELECT DISTINCT UserID FROM Data) U
Just replace the Data CTE with your table name and you're done.
Here is a complete review of the ways to do that in T-SQL:
http://www.simple-talk.com/sql/t-sql-programming/concatenating-row-values-in-transact-sql/
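For comparison only, engines with a built-in string aggregate can express the same grouping directly. The sketch below uses SQLite's `group_concat` through Python's `sqlite3`; it is not SQL Server syntax, and the function name `codes_per_user` is made up for illustration. `group_concat` skips NULL values and returns NULL for a group with no non-NULL values, which matches the desired output for users 3 and 4. SQLite does not guarantee the order of the concatenated values, so none is shown here.

```python
import sqlite3

# Sample rows from the question: (UserID, Code); None stands for NULL.
ROWS = [
    (1, "A"), (1, "C5"), (1, "X"),
    (2, "V3"),
    (3, "B"), (3, "D"), (3, None), (3, "F4"),
    (4, None),
]


def codes_per_user():
    """Return (UserID, comma-separated codes) pairs, one per user."""
    conn = sqlite3.connect(":memory:")
    conn.execute("create table data (UserID int, Code text)")
    conn.executemany("insert into data values (?, ?)", ROWS)
    rows = conn.execute(
        "select UserID, group_concat(Code, ',') "
        "from data group by UserID order by UserID"
    ).fetchall()
    conn.close()
    return rows


for user_id, codes in codes_per_user():
    print(user_id, codes)
```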

How to do equivalent of "limit distinct"?

How can I limit a result set to n distinct values of a given column(s), where the actual number of rows may be higher?
Input table:
client_id, employer_id, other_value
1, 2, abc
1, 3, defg
2, 3, dkfjh
3, 1, ldkfjkj
4, 4, dlkfjk
4, 5, 342
4, 6, dkj
5, 1, dlkfj
6, 1, 34kjf
7, 7, 34kjf
8, 6, lkjkj
8, 7, 23kj
desired output, where limit distinct=5 distinct values of client_id:
1, 2, abc
1, 3, defg
2, 3, dkfjh
3, 1, ldkfjkj
4, 4, dlkfjk
4, 5, 342
4, 6, dkj
5, 1, dlkfj
Platform this is intended for is MySQL.
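You can use a subselect. MySQL does not accept LIMIT directly inside an IN (...) subquery, so join to the subselect as a derived table (mytable stands for your table name):
select t.*
from mytable t
join (select distinct client_id from mytable order by client_id limit 5) c
on c.client_id = t.client_id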
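The subselect idea can be tried out on the sample data with the sketch below, which uses SQLite through Python's `sqlite3` as a stand-in for MySQL. The table name `clients`, the function `limit_distinct`, and the `:n` parameter are illustrative choices, not part of the question.

```python
import sqlite3

# Sample rows from the question: (client_id, employer_id, other_value).
ROWS = [
    (1, 2, "abc"), (1, 3, "defg"), (2, 3, "dkfjh"), (3, 1, "ldkfjkj"),
    (4, 4, "dlkfjk"), (4, 5, "342"), (4, 6, "dkj"), (5, 1, "dlkfj"),
    (6, 1, "34kjf"), (7, 7, "34kjf"), (8, 6, "lkjkj"), (8, 7, "23kj"),
]

QUERY = """
select t.client_id, t.employer_id, t.other_value
from clients t
join (
    select distinct client_id
    from clients
    order by client_id
    limit :n
) c on c.client_id = t.client_id
order by t.client_id, t.employer_id
"""


def limit_distinct(n):
    """Return every row belonging to the first `n` distinct client_id values."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "create table clients (client_id int, employer_id int, other_value text)"
    )
    conn.executemany("insert into clients values (?, ?, ?)", ROWS)
    rows = conn.execute(QUERY, {"n": n}).fetchall()
    conn.close()
    return rows


rows = limit_distinct(5)
print(len(rows))                    # 8
print(sorted({r[0] for r in rows})) # [1, 2, 3, 4, 5]
```

The limit applies to the distinct client IDs in the derived table, so the outer query returns all eight rows for clients 1 through 5, as in the desired output.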
This is for SQL Server. I can't remember, MySQL may use a LIMIT keyword instead of TOP. That may make the query more efficient if you can get rid of the inner most subquery by using the LIMIT and DISTINCT in the same subquery. (It looks like Vinko used this method and that LIMIT is correct. I'll leave this here for the second possible answer though.)
SELECT
client_id,
employer_id,
other_value
FROM
MyTable
WHERE
client_id IN
(
SELECT TOP 5
client_id
FROM
(
SELECT DISTINCT
client_id
FROM
MyTable
) SQ
ORDER BY
client_id
)
Of course, add in your own WHERE clause and ORDER BY clause in the subquery.
Another possibility (compare performance and see which works out better) is:
SELECT
client_id,
employer_id,
other_value
FROM
MyTable T1
WHERE
T1.client_id IN
(
SELECT
T2.client_id
FROM
MyTable T2
WHERE
(SELECT COUNT(DISTINCT T3.client_id) FROM MyTable T3 WHERE T3.client_id < T2.client_id) < 5
)
-- Using Common Table Expression in Microsoft SQL Server.
-- The LIMIT clause does not exist in MS SQL; TOP is used instead.
WITH CTE
AS
(SELECT DISTINCT([COLUMN_NAME])
FROM [TABLE_NAME])
SELECT TOP (5) [COLUMN_NAME]
FROM CTE;
This works for MS SQL if anyone is on that platform:
SET ROWCOUNT 10;
SELECT DISTINCT
column1, column2, column3,...
FROM
Table1
WHERE ...