Check for multiple distinct occurrences by group - sql

Input Data
Some_Table
ST_Field1 ST_Field2
Apple A
Apple A
Apple D
Orange D
Orange E
Orange Z
Pear D
Pear G
Pear C
Reference_Table
RT_Field1 RT_Field2
1 A
1 B
1 C
2 D
2 E
2 F
3 G
Expected Result:
ST_Field1 ST_Field2
Orange D
Orange E
CREATE TABLE SOME_TABLE
( ST_Field1 VARCHAR(100),
ST_Field2 VARCHAR(100)
);
INSERT INTO SOME_TABLE VALUES ('Apple','A');
INSERT INTO SOME_TABLE VALUES ('Apple','A');
INSERT INTO SOME_TABLE VALUES ('Apple','D');
INSERT INTO SOME_TABLE VALUES ('Orange','D');
INSERT INTO SOME_TABLE VALUES ('Orange','E');
INSERT INTO SOME_TABLE VALUES ('Orange','Z');
INSERT INTO SOME_TABLE VALUES ('Pear','D');
INSERT INTO SOME_TABLE VALUES ('Pear','G');
INSERT INTO SOME_TABLE VALUES ('Pear','C');
CREATE TABLE REFERENCE_TABLE
( RT_Field1 INTEGER,
RT_Field2 VARCHAR(100)
);
INSERT INTO REFERENCE_TABLE VALUES (1,'A');
INSERT INTO REFERENCE_TABLE VALUES (1,'B');
INSERT INTO REFERENCE_TABLE VALUES (1,'C');
INSERT INTO REFERENCE_TABLE VALUES (2,'D');
INSERT INTO REFERENCE_TABLE VALUES (2,'E');
INSERT INTO REFERENCE_TABLE VALUES (2,'F');
INSERT INTO REFERENCE_TABLE VALUES (3,'G');
It can be assumed that RT_Field2 is unique.
I'm looking to get the records from Some_Table that, for a given ST_Field1, contain more than one distinct RT_Field2 value belonging to the same RT_Field1 group.
So in the reference table, {A,B,C} is one grouping. I want to see whether a given ST_Field1 has any of the pairs {A,B}, {B,C}, or {A,C}. None does: A and C are both present, but split across Apple and Pear.
The only success is Orange, where I'm looking for {D,E}, {D,F}, or {E,F} and find both D and E for Orange.
I have:
WITH DUP_VALUES_RTF2 AS
( SELECT *
FROM (SELECT DST.ST_Field1,
DST.ST_Field2,
COUNT(1) OVER (PARTITION BY RT.RT_Field1) cnt_RTF1
FROM (SELECT DISTINCT
ST_Field1,
ST_Field2
FROM Some_Table
) DST
INNER
JOIN REFERENCE_TABLE RT
ON DST.ST_Field2 = RT.RT_Field2
) TMP
WHERE cnt_RTF1 > 1
)
SELECT *
FROM SOME_TABLE ST
WHERE EXISTS
( SELECT 1
FROM DUP_VALUES_RTF2 DVR
WHERE ST.ST_Field1 = DVR.ST_Field1
AND ST.ST_Field2 = DVR.ST_Field2
);
Which doesn't even really come close because it doesn't handle the grouping at all correctly and is really ugly. Maybe I'm just going brain dead after 5pm.

If I understand this correctly, you want to group by st_field1 and rt_field1 and look for groups with more than one distinct rt_field2. You can use window functions for this:
select s.st_field1, s.st_field2
from (select s.*, r.rt_field1, r.rt_field2,
min(r.rt_field2) over (partition by s.st_field1, r.rt_field1) as min_rt2,
max(r.rt_field2) over (partition by s.st_field1, r.rt_field1) as max_rt2
from some_table s join
reference_table r
on s.st_field2 = r.rt_field2
) s
where min_rt2 <> max_rt2;
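As a cross-check, here is a minimal sketch that loads the question's sample data into an in-memory SQLite database from Python and applies the same idea with GROUP BY ... HAVING COUNT(DISTINCT ...) instead of window functions. SQLite, the Python wrapper, and the variable names are assumptions for illustration; the question does not name a database.

```python
import sqlite3

# In-memory database holding the sample data from the question.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE some_table (st_field1 TEXT, st_field2 TEXT);
INSERT INTO some_table VALUES
  ('Apple','A'), ('Apple','A'), ('Apple','D'),
  ('Orange','D'), ('Orange','E'), ('Orange','Z'),
  ('Pear','D'), ('Pear','G'), ('Pear','C');
CREATE TABLE reference_table (rt_field1 INTEGER, rt_field2 TEXT);
INSERT INTO reference_table VALUES
  (1,'A'), (1,'B'), (1,'C'), (2,'D'), (2,'E'), (2,'F'), (3,'G');
""")

# Inner query: (st_field1, rt_field1) pairs with 2+ distinct rt_field2 values.
# Outer query: the some_table rows that belong to one of those pairs.
rows = conn.execute("""
SELECT s.st_field1, s.st_field2
FROM some_table s
JOIN reference_table r ON s.st_field2 = r.rt_field2
JOIN (
    SELECT s2.st_field1, r2.rt_field1
    FROM some_table s2
    JOIN reference_table r2 ON s2.st_field2 = r2.rt_field2
    GROUP BY s2.st_field1, r2.rt_field1
    HAVING COUNT(DISTINCT r2.rt_field2) > 1
) g ON g.st_field1 = s.st_field1 AND g.rt_field1 = r.rt_field1
ORDER BY s.st_field1, s.st_field2
""").fetchall()

print(rows)
```

On the sample data only the (Orange, 2) pair qualifies. The duplicated Apple/A row does not count, because COUNT(DISTINCT ...) sees a single value there.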

You can try something like the following:
; with distinctSet as
(select distinct s.*,RT_Field1 from SOME_TABLE s join REFERENCE_TABLE r on s.ST_Field2=r.RT_Field2
)
,
uniqueSet as
(
select RT_Field1,ST_Field1
from distinctSet
group by RT_Field1,ST_Field1
having count(1)>1
),
resultSet as
(
select
distinct
s.*
from SOME_TABLE s
join REFERENCE_TABLE r
on s.ST_Field2=r.RT_Field2
join uniqueSet u
on u.RT_Field1=r.RT_Field1
and u.ST_Field1=s.ST_Field1
)
select * from resultSet

Number of records created per day

In my PostgreSQL database I have the following schema:
CREATE TABLE programs (
id integer,
description text
);
CREATE TABLE public.messages (
id integer,
program_id integer,
text text,
message_template_id integer
);
CREATE TABLE public.message_templates (
id integer,
deliver_day integer
);
INSERT INTO programs VALUES(1, 'Test program');
INSERT INTO messages VALUES(1,1, 'Test message 1', 1);
INSERT INTO message_templates VALUES(1, 1);
INSERT INTO messages VALUES(2,1, 'Test message 2', 2);
INSERT INTO message_templates VALUES(2, 3);
INSERT INTO messages VALUES(3,1, 'Test message 3', 3);
INSERT INTO message_templates VALUES(3, 5);
Now I want to get the number of messages sent per day throughout the life of the program. The query result should look like this:
day count
--------|----------
1 1
2 0
3 1
4 0
5 1
Is there any way of doing that in PostgreSQL?
https://www.db-fiddle.com/f/gvxijmp8u6wr6mYcSoAeVV/2
I decided to use generate_series:
SELECT d AS "Day", count(mt.id) FROM generate_series(
(SELECT min(deliver_day) from message_templates),
(SELECT max(deliver_day) from message_templates)
) d
left join message_templates mt on mt.deliver_day = d
group by d.d
order by d.d
The query works fine. Maybe there is a better way of doing this?
You could use this:
WITH tmp AS
(
SELECT m.program_id, a.n AS d
FROM generate_series(1,
(SELECT MAX(deliver_day) FROM message_templates)
) AS a(n)
CROSS JOIN
(
SELECT DISTINCT program_id
FROM messages
) m
)
SELECT t.program_id,
t.d AS "day",
COUNT(m.program_id) AS "count" -- COUNT(m.id)
FROM tmp t
LEFT JOIN message_templates mt
ON t.d = mt.deliver_day
LEFT JOIN messages m
ON m.message_template_id = mt.id AND t.program_id = m.program_id
GROUP BY t.program_id, t.d
ORDER BY t.program_id, t.d;
Tested in db-fiddle
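For readers without a PostgreSQL instance at hand, the gap-filling idea can be sketched in Python with an in-memory SQLite database. This is an illustration under assumptions: generate_series is generally not available in Python's sqlite3 module, so a recursive CTE (named days here, a hypothetical name) stands in for it, and the per-program split from the answer above is left out.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE message_templates (id INTEGER, deliver_day INTEGER);
INSERT INTO message_templates VALUES (1, 1), (2, 3), (3, 5);
CREATE TABLE messages (
    id INTEGER, program_id INTEGER, text TEXT, message_template_id INTEGER
);
INSERT INTO messages VALUES
  (1, 1, 'Test message 1', 1),
  (2, 1, 'Test message 2', 2),
  (3, 1, 'Test message 3', 3);
""")

# days(d) yields 1..max(deliver_day); the LEFT JOINs keep days with no
# messages, and COUNT(m.id) ignores the NULLs those days produce.
per_day = conn.execute("""
WITH RECURSIVE days(d) AS (
    SELECT 1
    UNION ALL
    SELECT d + 1 FROM days
    WHERE d < (SELECT MAX(deliver_day) FROM message_templates)
)
SELECT days.d, COUNT(m.id)
FROM days
LEFT JOIN message_templates mt ON mt.deliver_day = days.d
LEFT JOIN messages m ON m.message_template_id = mt.id
GROUP BY days.d
ORDER BY days.d
""").fetchall()

print(per_day)
```

Counting m.id rather than COUNT(*) is what turns the empty days into 0 instead of 1.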

How to get the result set from the table below

Table 1 contains a certain set of data. I need to get the following result set from Table 1.
Table1
Id Desc ParentId
1 Cloths 0
2 Mens 1
3 Womens 1
4 T-Shirt_M 2
5 Casual Shirts_M 2
6 T-Shirt_F 3
7 Education 8
If I pass a parameter as "Casual Shirts_M" I should get the below result set.
Result Set
Id Desc ParentId
1 Cloths 0
2 Mens 1
5 Casual Shirts_M 2
As mentioned in comments, there are plenty of Recursive Common Table Expressions examples for this, here's another one
DECLARE @Desc NVARCHAR(50) = 'Casual Shirts_M'
;WITH cteX
AS
( SELECT
B.Id, B.[Desc], B.ParentId
FROM
Table1 B
WHERE
B.[Desc] = @Desc
UNION ALL
SELECT
E.Id, E.[Desc], E.ParentId
FROM
Table1 E
INNER JOIN
cteX r ON E.Id = r.ParentId
)
SELECT * FROM cteX ORDER BY Id ASC
SQL Fiddle provided by @WhatsThePoint
The question comes under the concept of Building hierarchy using Recursive CTE:
CREATE TABLE cloths
(
id INT,
descr VARCHAR(100),
parentid INT
);
insert into cloths values (1,'Cloths',0);
insert into cloths values (2,'Mens',1);
insert into cloths values (3,'Womens',1);
insert into cloths values (4,'T-Shirt_M',2);
insert into cloths values (5,'Casual Shirts_M',2);
insert into cloths values (6,'T-Shirt_F',3);
insert into cloths values (7,'Education',8);
DECLARE @variety VARCHAR(100) = 'Casual Shirts_M';
WITH
cte1 (id, descr, parentid)
AS (SELECT *
FROM cloths
WHERE descr = @variety
UNION ALL
SELECT c.id,
c.descr,
c.parentid
FROM cloths c
INNER JOIN cte1 r
ON c.id = r.parentid)
SELECT *
FROM cte1
ORDER BY parentid ASC;
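The upward walk from a leaf to its ancestors can also be tried outside SQL Server. The following is a rough sketch using Python's sqlite3 module with the cloths data above; the CTE name chain and the bound parameter are illustrative choices, not part of either answer.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE cloths (id INTEGER, descr TEXT, parentid INTEGER);
INSERT INTO cloths VALUES
  (1, 'Cloths', 0), (2, 'Mens', 1), (3, 'Womens', 1),
  (4, 'T-Shirt_M', 2), (5, 'Casual Shirts_M', 2),
  (6, 'T-Shirt_F', 3), (7, 'Education', 8);
""")

# Anchor member: the row matching the requested description.
# Recursive member: the parent of whatever row was found last.
# The walk stops when a parentid (here 0) matches no id.
ancestors = conn.execute("""
WITH RECURSIVE chain(id, descr, parentid) AS (
    SELECT id, descr, parentid FROM cloths WHERE descr = ?
    UNION ALL
    SELECT c.id, c.descr, c.parentid
    FROM cloths c
    JOIN chain ON c.id = chain.parentid
)
SELECT id, descr, parentid FROM chain ORDER BY id
""", ("Casual Shirts_M",)).fetchall()

print(ancestors)
```

The join direction (c.id = chain.parentid) is what makes the recursion climb towards the root; swapping it to c.parentid = chain.id would list descendants instead.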

SQL query to get the value that appeared the most for each category

I have a table like this:
Category Reply
---------+---------------+
M 1
F 2
M 1
M 3
M 1
M 3
F 2
F 1
F 2
F 5
F 2
I'm looking for an SQL query to return the following results:
Category Total Number Best Reply Number
---------+---------------+------------------+---------------+
M 5 1 3
F 6 2 4
Total Number: the number of appearances of that category (I know how to get this)
Best Reply: the Reply that was chosen most often for that category
Number: the number of times the "Best Reply" was chosen
You don't specify your database, so I avoided using common table expressions which would make this clearer. It could still be cleaned up a bit. I did my work on SQL Server 2008.
select rsTotalRepliesByCategory.Category,
TotalRepliesByCategory,
rsCategoryReplyCount.Reply,
rsMaxReplies.MaxReplies
from
(
--calc total replies
select Category, COUNT(*) as TotalRepliesByCategory
from CategoryReply
group by Category
) rsTotalRepliesByCategory
INNER JOIN
(
--calc number of replies by category and reply
select Category, Reply, COUNT(*) as CategoryReplyCount
from CategoryReply
group by Category, Reply
) rsCategoryReplyCount on rsCategoryReplyCount.Category = rsTotalRepliesByCategory.Category
INNER JOIN
(
--calc the max replies
select Category, MAX(CategoryReplyCount) as MaxReplies
from
(
select Category, Reply, COUNT(*) as CategoryReplyCount
from CategoryReply
group by Category, Reply
) rsCategoryReplyCount2
group by Category
) rsMaxReplies on rsMaxReplies.Category = rsTotalRepliesByCategory.Category and rsMaxReplies.MaxReplies = rsCategoryReplyCount.CategoryReplyCount
Here is the setup I used to play around with this:
create table CategoryReply
(
Category char(1),
Reply int
)
insert into CategoryReply values ('M',1)
insert into CategoryReply values ('F',2)
insert into CategoryReply values ('M',1)
insert into CategoryReply values ('M',3)
insert into CategoryReply values ('M',1)
insert into CategoryReply values ('M',3)
insert into CategoryReply values ('F',2)
insert into CategoryReply values ('F',1)
insert into CategoryReply values ('F',2)
insert into CategoryReply values ('F',5)
insert into CategoryReply values ('F',2)
And finally, the output:
Category TotalRepliesByCategory Reply MaxReplies
F 6 2 4
M 5 1 3
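The same three ingredients (total per category, count per category and reply, maximum count per category) can be checked with a short Python script against an in-memory SQLite database. This is only a sketch: it uses a common table expression, which the answer above avoided, and the names counts and best are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE CategoryReply (Category TEXT, Reply INTEGER);
INSERT INTO CategoryReply VALUES
  ('M',1), ('F',2), ('M',1), ('M',3), ('M',1), ('M',3),
  ('F',2), ('F',1), ('F',2), ('F',5), ('F',2);
""")

# counts: one row per (Category, Reply) with its frequency n.
# The outer query keeps the most frequent reply of each category and
# adds the category total by summing the frequencies.
best = conn.execute("""
WITH counts AS (
    SELECT Category, Reply, COUNT(*) AS n
    FROM CategoryReply
    GROUP BY Category, Reply
)
SELECT c.Category,
       (SELECT SUM(t.n) FROM counts t WHERE t.Category = c.Category) AS total,
       c.Reply,
       c.n
FROM counts c
WHERE c.n = (SELECT MAX(x.n) FROM counts x WHERE x.Category = c.Category)
ORDER BY c.Category
""").fetchall()

print(best)
```

If two replies tie for first place in a category, this formulation returns both rows, just as the joined query above does.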
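SELECT Category, TotalNumber, Reply AS BestReply, Number
FROM(
SELECT Category, Reply, Count(*) as Number,
Sum(Count(*)) over (partition by Category) as TotalNumber,
Row_Number() over (partition by Category order by Count(*) desc) as rn
From CategoryReply
Group By Category, Reply) as temp
WHERE rn = 1
It would be something like that.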

SQL if statement with two tables

Given the following tables:
table objects
id Name rating
1 Megan 9
2 Irina 10
3 Vanessa 7
4 Samantha 9
5 Roxanne 1
6 Sonia 8
swap table
id swap_proposalid counterpartyid
1 4 2
2 3 2
Everyone wants the ten. I would like to make a list for Irina of possible swaps where id 4 and 3 don't appear because the propositions are already there.
output1
id Name rating
1 Megan 9
5 Roxanne 1
6 Sonia 8
Thanks
This should do the trick:
SELECT o.id, o.Name, o.rating
FROM objects o
LEFT JOIN swap s on o.id = s.swap_proposalid
WHERE s.id IS NULL
AND o.Name != 'Irina'
This works
SELECT mt2.ID, mt2.Name, mt2.Rating
FROM [MyTable] mt2 -- Other Candidates
, [MyTable] mt1 -- Candidate / Subject (Irina)
WHERE mt2.ID NOT IN
(
SELECT st.swap_proposalid
FROM SwapTable st
WHERE
st.counterpartyid = mt1.ID
)
AND mt1.ID <> mt2.ID -- Don't match Irina with Irina
AND mt1.Name = 'Irina' -- Find other swaps for Irina
-- Test Data
CREATE TABLE MyTable
(
ID INT,
Name VARCHAR(100),
Rating INT
)
GO
CREATE TABLE SwapTable
(
ID INT,
swap_proposalid INT,
counterpartyid INT
)
GO
INSERT INTO MyTable VALUES(1 ,'Megan', 9)
INSERT INTO MyTable VALUES(2 ,'Irina', 10)
INSERT INTO MyTable VALUES(3 ,'Vanessa', 7)
INSERT INTO MyTable VALUES(4 ,'Samantha', 9)
INSERT INTO MyTable VALUES(5 ,'Roxanne', 1)
INSERT INTO MyTable VALUES(6 ,'Sonia', 8)
INSERT INTO SwapTable(ID, swap_proposalid, counterpartyid)
VALUES (1, 4, 2)
INSERT INTO SwapTable(ID, swap_proposalid, counterpartyid)
VALUES (2, 3, 2)
Guessing that the logic involves taking all objects EXCEPT the highest-rated object EXCEPT the objects that already have a proposition with the highest-rated object, e.g. (using sample DDL and data kindly posted by @nonnb):
WITH ObjectHighestRated
AS
(
SELECT ID
FROM MyTable
WHERE Rating = (
SELECT MAX(T.Rating)
FROM MyTable T
)
),
PropositionsForHighestRated
AS
(
SELECT swap_proposalid AS ID
FROM SwapTable
WHERE counterpartyid IN (SELECT ID FROM ObjectHighestRated)
),
CandidateSwappersForHighestRated
AS
(
SELECT ID
FROM MyTable
EXCEPT
SELECT ID
FROM ObjectHighestRated
EXCEPT
SELECT ID
FROM PropositionsForHighestRated
)
SELECT *
FROM MyTable
WHERE ID IN (SELECT ID FROM CandidateSwappersForHighestRated);
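A compact way to test the exclusion logic is a NOT EXISTS filter that also checks the counterparty, so that proposals made to someone other than Irina do not hide a candidate. The sketch below runs it from Python on an in-memory SQLite database with the question's objects and swap tables; the :subject parameter (Irina's id, 2) is an illustrative assumption.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE objects (id INTEGER, name TEXT, rating INTEGER);
INSERT INTO objects VALUES
  (1, 'Megan', 9), (2, 'Irina', 10), (3, 'Vanessa', 7),
  (4, 'Samantha', 9), (5, 'Roxanne', 1), (6, 'Sonia', 8);
CREATE TABLE swap (id INTEGER, swap_proposalid INTEGER, counterpartyid INTEGER);
INSERT INTO swap VALUES (1, 4, 2), (2, 3, 2);
""")

# Keep every object except the subject herself and anyone who has
# already proposed a swap to her.
candidates = conn.execute("""
SELECT o.id, o.name, o.rating
FROM objects o
WHERE o.id <> :subject
  AND NOT EXISTS (
      SELECT 1
      FROM swap s
      WHERE s.swap_proposalid = o.id
        AND s.counterpartyid = :subject
  )
ORDER BY o.id
""", {"subject": 2}).fetchall()

print(candidates)
```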

Get max column with group by

I have a table for contents on a page. The page is divided into sections.
I want to get the last version for each page-section.
Id (int)
Version (int)
SectionID
Id Version SectionID Content
1 1 1 AAA
2 2 1 BBB
3 1 2 CCC
4 2 2 DDD
5 3 2 EEE
I want to get:
Id Version SectionID Content
2 2 1 BBB
5 3 2 EEE
You could use an exclusive self join:
select last.*
from YourTable last
left join
YourTable new
on new.SectionID = last.SectionID
and new.Version > last.Version
where new.Id is null
The where clause basically says: keep a row only when there is no newer version of it.
Slightly more readable, but often slower, is a not exists condition:
select *
from YourTable yt
where not exists
(
select *
from YourTable yt2
where yt2.SectionID = yt.SectionID
and yt2.Version > yt.Version
)
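To confirm that the two formulations agree, here is a small sketch that runs both against the question's sample rows in an in-memory SQLite database from Python. The aliases cur and newer are my own choices, and SQLite is only a stand-in for whatever database the question targets.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE YourTable (Id INTEGER, Version INTEGER, SectionID INTEGER, Content TEXT);
INSERT INTO YourTable VALUES
  (1, 1, 1, 'AAA'), (2, 2, 1, 'BBB'),
  (3, 1, 2, 'CCC'), (4, 2, 2, 'DDD'), (5, 3, 2, 'EEE');
""")

# Exclusive self join: rows for which no newer row could be joined.
self_join = conn.execute("""
SELECT cur.Id, cur.Version, cur.SectionID, cur.Content
FROM YourTable AS cur
LEFT JOIN YourTable AS newer
  ON newer.SectionID = cur.SectionID
 AND newer.Version > cur.Version
WHERE newer.Id IS NULL
ORDER BY cur.SectionID
""").fetchall()

# NOT EXISTS: the same condition expressed as a correlated subquery.
not_exists = conn.execute("""
SELECT yt.Id, yt.Version, yt.SectionID, yt.Content
FROM YourTable AS yt
WHERE NOT EXISTS (
    SELECT 1
    FROM YourTable AS yt2
    WHERE yt2.SectionID = yt.SectionID
      AND yt2.Version > yt.Version
)
ORDER BY yt.SectionID
""").fetchall()

print(self_join)
print(not_exists)
```

Both queries return one row per section as long as the highest Version within a section is unique; duplicates of the top version would all be returned.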
Example table definition:
declare @t table(Id int, [Version] int, [SectionID] int, Content varchar(50))
insert into @t values (1,1,1,'AAA');
insert into @t values (2,2,1,'BBB');
insert into @t values (3,1,2,'CCC');
insert into @t values (4,2,2,'DDD');
insert into @t values (5,3,2,'EEE');
Working solution:
select A.Id, A.[Version], A.SectionID, A.Content
from @t as A
join (
select max(C.[Version]) [Version], C.SectionID
from @t C
group by C.SectionID
) as B on A.[Version] = B.[Version] and A.SectionID = B.SectionID
order by A.SectionID
A simpler and more readable solution:
select A.Id, A.[Version], A.SectionID, A.Content
from @t as A
where A.[Version] = (
select max(B.[Version])
from @t B
where A.SectionID = B.SectionID
)
I just saw that there was a very similar question for Oracle with an accepted answer based on performance.
If your table is big and performance is an issue, you can give this a try to see whether SQL Server also performs better with it:
select Id, Version, SectionID, Content
from (
select Id, Version, SectionID, Content,
max(Version) over (partition by SectionID) max_Version
from @t
) A
where Version = max_Version