Generating combinations in SQL Server - sql

I have a table that contains groups ('G1', 'G2', etc.), a table that contains persons ('P1', 'P2', etc.), and an m:n relationship between them, so one person can belong to several groups and one group can consist of several persons.
I have a rule that is satisfied only if a certain number of members of each group is present (e.g. at least 2 members of G1 and at least 1 member of G2 must be present), and I have a list of persons who are present. One person cannot fulfil more than one requirement, so if P1 and P2 are members of both G1 and G2, the rule still needs a third person, who can be a member of either G1 or G2.
Any ideas how this can be done in SQL Server?
Creation scripts:
create table Groups (GroupID int, Name nvarchar(100))
insert into Groups values (1, 'First')
insert into Groups values (2, 'Second')
insert into Groups values (3, 'Third')
create table Persons (PersonID int, Name nvarchar(100))
insert into Persons values (1, 'One')
insert into Persons values (2, 'Two')
insert into Persons values (3, 'Three')
insert into Persons values (4, 'Four')
insert into Persons values (5, 'Five')
insert into Persons values (6, 'Six')
create table PersonGroups (PersonID int, GroupID int)
-- p1 and p2 are members of g1
insert into PersonGroups values (1, 1)
insert into PersonGroups values (2, 1)
-- p2, p3 and p4 are members of g2
insert into PersonGroups values (2, 2)
insert into PersonGroups values (3, 2)
insert into PersonGroups values (4, 2)
-- p2, p4, p5 and p6 are members of g3
insert into PersonGroups values (2, 3)
insert into PersonGroups values (4, 3)
insert into PersonGroups values (5, 3)
insert into PersonGroups values (6, 3)
So, if a rule needs one person from each group to be present, then (1, 3, 5), (1, 2, 3) and (2, 3, 4) would be valid, and (3, 5, 6) would not be valid.

Create a header table for the rules:
create table #ruleset (Id int, name varchar(100))
insert into #ruleset
select 1,'At least 1 person from each group'
Create a child table that holds one entry per group for each rule:
-- drop the temp table only if it is left over from an earlier run
if object_id('tempdb..#ruleset_Grouprules') is not null drop table #ruleset_Grouprules
create table #ruleset_Grouprules(Id int identity(1,1), RuleId int,
GroupID int, MinUsers int, MaxUsers int)
insert into #ruleset_Grouprules (RuleId, groupId, MinUsers, MaxUsers)
select 1,1,1,null
union all
select 1,2,1,null
union all
select 1,3,1,null
You can use NULL in the MinUsers column to represent no minimum amount.
You can use NULL in the MaxUsers column to represent no maximum amount.
This query will show you whether each group rule has passed or not:
select r.id, r.Name, gr.GroupId,
case when x.GroupQty>=isnull(gr.MinUsers, x.GroupQty)
and x.GroupQty<=isnull(gr.MaxUsers, x.GroupQty)
then 1 else 0 end as GroupValid
from #ruleset r
join #ruleset_Grouprules gr on gr.RuleId=r.Id
join (
select g.groupID, count(*) GroupQty
from Groups g
join PersonGroups pg on pg.GroupID=g.GroupID
join Persons p on p.PersonID=pg.PersonID
group by g.GroupID
)x on x.GroupID=gr.GroupID
You can then aggregate over this query, grouping by r.id and comparing sum(GroupValid) with count(*), to check whether the entire rule is valid. I left it unaggregated so you can see the working data.
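Counting members per group does not by itself enforce the question's constraint that one person can fulfil only one requirement. That part is an assignment problem, and one way to check it is a bipartite matching between required "seats" and the persons who are present. The sketch below shows the idea in Python rather than T-SQL, purely as an illustration; the function name rule_satisfied and the dictionary layout are assumptions, not part of the original schema.

```python
def rule_satisfied(requirements, membership, present):
    """Return True if every required seat can be given to a distinct present person.

    requirements: {group_id: minimum number of members needed}  (hypothetical layout)
    membership:   {group_id: set of person_ids in that group}
    present:      iterable of person_ids that are present
    """
    present = set(present)
    # Expand each requirement into individual seats, e.g. {1: 2} -> [1, 1].
    seats = [g for g, needed in requirements.items() for _ in range(needed)]
    seat_of = {}  # person_id -> index of the seat currently assigned to them

    def fill(seat, visited):
        # Try every present member of the seat's group (augmenting-path search).
        for person in membership.get(seats[seat], set()) & present:
            if person in visited:
                continue
            visited.add(person)
            # Use the person if free, or if their current seat can be refilled.
            if person not in seat_of or fill(seat_of[person], visited):
                seat_of[person] = seat
                return True
        return False

    return all(fill(i, set()) for i in range(len(seats)))


# Sample data from the question's creation scripts.
membership = {1: {1, 2}, 2: {2, 3, 4}, 3: {2, 4, 5, 6}}
one_each = {1: 1, 2: 1, 3: 1}
print(rule_satisfied(one_each, membership, [1, 3, 5]))  # True
print(rule_satisfied(one_each, membership, [3, 5, 6]))  # False
```

The same search could be driven from the rows of #ruleset_Grouprules by reading MinUsers per group; maximum limits are not handled in this sketch.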

Related

sql query to join two tables and a boolean flag to indicate whether it contains any words from third table

I have 3 tables with the following schema
create table main (
main_id int PRIMARY KEY,
secondary_id int NOT NULL
);
create table secondary (
secondary_id int NOT NULL,
tags varchar(100)
);
create table bad_words (
words varchar(100) NOT NULL
);
insert into main values (1, 1001);
insert into main values (2, 1002);
insert into main values (3, 1003);
insert into main values (4, 1004);
insert into secondary values (1001, 'good word');
insert into secondary values (1002, 'bad word');
insert into secondary values (1002, 'good word');
insert into secondary values (1002, 'other word');
insert into secondary values (1003, 'ugly');
insert into secondary values (1003, 'bad word');
insert into secondary values (1004, 'pleasant');
insert into secondary values (1004, 'nice');
insert into bad_words values ('bad word');
insert into bad_words values ('ugly');
insert into bad_words values ('worst');
expected output
----------------
1, 1001, good word, 0 (boolean flag indicating whether the tags contain any one of the words from the bad_words table)
2, 1002, bad word,good word,other word, 1
3, 1003, ugly,bad word, 1
4, 1004, pleasant,nice, 0
I am trying to use CASE to select 1 or 0 for the last column and a join to combine the main and secondary tables, but I am getting confused and stuck. Can someone please help me with a query? These tables are stored in Redshift, and I want a query compatible with Redshift.
You can use the above schema to try your query in SQL Fiddle.
EDIT: I have updated the schema and expected output by removing the PRIMARY KEY in the secondary table so that it is easier to join with the bad_words table.
You can use EXISTS and a regex comparison with \m and \M (markers for beginning and end of a word, respectively):
with
main(main_id, secondary_id) as (values (1, 1000), (2, 1001), (3, 1002), (4, 1003)),
secondary(secondary_id, tags) as (values (1000, 'very good words'), (1001, 'good and bad words'), (1002, 'ugly'),(1003, 'pleasant')),
bad_words(words) as (values ('bad'), ('ugly'), ('worst'))
select *, exists (select 1 from bad_words where s.tags ~* ('\m'||words||'\M'))::int as flag
from main m
join secondary s using (secondary_id)
select main_id, a.secondary_id, tags, case when c.words is not null then 1 else 0 end
from main a
join secondary b on b.secondary_id = a.secondary_id
left outer join bad_words c on c.words like b.tags
SELECT m.main_id, m.secondary_id, t.tags, t.is_bad_word
FROM srini.main m
JOIN (
SELECT st.secondary_id, st.tags, exists (select 1 from srini.bad_words b where st.tags like '%'+b.words+'%') is_bad_word
FROM
( SELECT secondary_id, LISTAGG(tags, ',') as tags
FROM srini.secondary
GROUP BY secondary_id ) st
) t on t.secondary_id = m.secondary_id;
This worked for me in Redshift and produced the following output with the schema above:
1 1001 good word false
3 1003 ugly,bad word true
2 1002 good word,other word,bad word true
4 1004 pleasant,nice false
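For checking the expected flags outside the database, the same logic (collect the tags per secondary_id, then test the joined string for any bad word at word boundaries) can be sketched in Python. This is only an illustration: the function name flag_bad_tags is hypothetical, and Python's \b word boundary stands in for the \m and \M markers used in the regex answer.

```python
import re
from collections import defaultdict


def flag_bad_tags(main, secondary, bad_words):
    """Return (main_id, secondary_id, joined_tags, flag) rows; flag is 1 if any bad word occurs."""
    tags_by_id = defaultdict(list)
    for secondary_id, tag in secondary:
        tags_by_id[secondary_id].append(tag)

    # \b marks a word boundary, so 'bad' would not match inside 'badge'.
    pattern = None
    if bad_words:
        alternatives = "|".join(re.escape(word) for word in bad_words)
        pattern = re.compile(r"\b(?:" + alternatives + r")\b", re.IGNORECASE)

    rows = []
    for main_id, secondary_id in main:
        joined = ",".join(tags_by_id[secondary_id])
        flag = int(bool(pattern and pattern.search(joined)))
        rows.append((main_id, secondary_id, joined, flag))
    return rows


# Sample rows from the question's insert statements.
main = [(1, 1001), (2, 1002), (3, 1003), (4, 1004)]
secondary = [(1001, "good word"), (1002, "bad word"), (1002, "good word"),
             (1002, "other word"), (1003, "ugly"), (1003, "bad word"),
             (1004, "pleasant"), (1004, "nice")]
bad_words = ["bad word", "ugly", "worst"]

for row in flag_bad_tags(main, secondary, bad_words):
    print(row)
```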

Getting a single result on one table from criteria on multiple rows in another table

Imagine a Student table with the name and id of students at a school, and a Grades table that has grades in the form:
grade_id | student_id.
What I want to do is find all the students that match arbitrary criteria such as "find all students that have grade A and grade B, but not C or D".
In a school situation a student could have several A's and B's, but for my particular problem they will always have one or none of each grade.
Also, the tables I'm working on are huge (several million rows in each), but I only need to find, say, 10-20 on each query (the purpose of this is to find test data).
Thanks!
Change the table variables to your physical tables and this should help:
DECLARE @Students TABLE (
StudentId INT,
StudentName VARCHAR(50));
INSERT INTO @Students VALUES (1, 'Tom');
INSERT INTO @Students VALUES (2, 'Dick');
INSERT INTO @Students VALUES (3, 'Harry');
DECLARE @StudentGrades TABLE (
StudentId INT,
GradeId INT);
INSERT INTO @StudentGrades VALUES (1, 1);
INSERT INTO @StudentGrades VALUES (1, 1);
INSERT INTO @StudentGrades VALUES (1, 2);
INSERT INTO @StudentGrades VALUES (1, 3);
INSERT INTO @StudentGrades VALUES (2, 1);
INSERT INTO @StudentGrades VALUES (2, 2);
INSERT INTO @StudentGrades VALUES (3, 1);
INSERT INTO @StudentGrades VALUES (3, 1);
INSERT INTO @StudentGrades VALUES (3, 3);
INSERT INTO @StudentGrades VALUES (3, 4);
INSERT INTO @StudentGrades VALUES (3, 4);
DECLARE @Grades TABLE (
GradeId INT,
GradeName VARCHAR(10));
INSERT INTO @Grades VALUES (1, 'A');
INSERT INTO @Grades VALUES (2, 'B');
INSERT INTO @Grades VALUES (3, 'C');
INSERT INTO @Grades VALUES (4, 'D');
--Student/ Grade Summary
SELECT
s.StudentId,
s.StudentName,
g.GradeName,
COUNT(sg.GradeId) AS GradeCount
FROM
@Students s
CROSS JOIN @Grades g
LEFT JOIN @StudentGrades sg ON sg.StudentId = s.StudentId AND sg.GradeId = g.GradeId
GROUP BY
s.StudentId,
s.StudentName,
g.GradeName;
--Find ten students with A and B but not C or D
SELECT TOP 10
*
FROM
@Students s
WHERE
EXISTS (SELECT * FROM @StudentGrades sg WHERE sg.StudentId = s.StudentId AND sg.GradeId = 1) --Got an A
AND EXISTS (SELECT * FROM @StudentGrades sg WHERE sg.StudentId = s.StudentId AND sg.GradeId = 2) --Got a B
AND NOT EXISTS (SELECT * FROM @StudentGrades sg WHERE sg.StudentId = s.StudentId AND sg.GradeId IN (3, 4)); --Didn't get a C or D
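Make sure all your id fields are indexed.
select *
from students s
where
(
-- both required grades (A = 1, B = 2) must be present
select count(distinct g.grade_id)
from grades g
where g.grade_id in (1, 2)
and g.student_id = s.student_id
) = 2
and not exists
(
-- none of the excluded grades (C = 3, D = 4)
select *
from grades g
where g.grade_id in (3, 4)
and g.student_id = s.student_id
)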
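Another way to express "has A and B, but not C or D" is conditional aggregation over the grades table alone. The sketch below runs that query against an in-memory SQLite database from Python, reusing the sample rows from the table-variable answer. SQLite stands in for SQL Server here, so LIMIT replaces TOP, and the table layout (grade_id, student_id) is taken from the question; treat it as an illustration rather than a tuned solution.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE grades (grade_id INTEGER, student_id INTEGER)")
# Same sample rows as the table-variable answer: (student_id, grade_id) pairs.
sample = [(1, 1), (1, 1), (1, 2), (1, 3), (2, 1), (2, 2),
          (3, 1), (3, 1), (3, 3), (3, 4), (3, 4)]
conn.executemany("INSERT INTO grades (student_id, grade_id) VALUES (?, ?)", sample)

QUERY = """
SELECT student_id
FROM grades
GROUP BY student_id
HAVING SUM(CASE WHEN grade_id = 1 THEN 1 ELSE 0 END) > 0        -- has an A
   AND SUM(CASE WHEN grade_id = 2 THEN 1 ELSE 0 END) > 0        -- has a B
   AND SUM(CASE WHEN grade_id IN (3, 4) THEN 1 ELSE 0 END) = 0  -- no C or D
ORDER BY student_id
LIMIT 10
"""

matching_students = [row[0] for row in conn.execute(QUERY)]
print(matching_students)  # [2]
```

Because the GROUP BY has to read every grade row, the EXISTS form with TOP may do less work on tables with millions of rows when only 10-20 matches are needed.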

Display 2 columns for each header

In SQL Server 2008 I have a table People (Id, Gender, Name).
Gender is either Male or Female. There can be many people with the same name.
I would like to write a query that displays for each gender the top 2 names
by count and their count, like this:
Male Female
Adam 23 Rose 34
Max 20 Jenny 15
I think that PIVOT might be used but all the examples I have seen display only one column for each header.
Here is an example on SQL Fiddle -- http://sqlfiddle.com/#!3/b3477/1
This uses a couple of common table expressions to separate the genders.
create table People
(
Id int,
Gender varchar(50),
Name varchar(50)
)
;
insert into People values (1, 'Male', 'Bob');
insert into People values (2, 'Male', 'Bob');
insert into People values (3, 'Male', 'Bill');
insert into People values (4, 'Male', 'Chuck');
insert into People values (5, 'Female', 'Anne');
insert into People values (6, 'Female', 'Anne');
insert into People values (7, 'Female', 'Bobbi');
insert into People values (8, 'Female', 'Jane');
with cteMale as
(
select Name as 'MaleName', Count(*) as Num, ROW_NUMBER() over(order by count(*) desc, Name) RowNum
from People
where Gender = 'Male'
group by Name
)
,
cteFemale as
(
select Name as 'FemaleName', Count(*) as Num, ROW_NUMBER() over(order by count(*) desc, Name) RowNum
from People
where Gender = 'Female'
group by Name
)
select a.MaleName, a.Num as MaleNum, b.femaleName, b.Num as FemaleNum
from cteMale a
join cteFemale b on
a.RowNum = b.RowNum
where a.RowNum <= 2
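The ranking-then-pairing idea behind the two CTEs (rank names within each gender by count, then join rank 1 with rank 1 and rank 2 with rank 2) can be sketched outside SQL as well. The Python below is only an illustration using the sample rows above; the function name top_names_side_by_side is hypothetical.

```python
from collections import Counter, defaultdict


def top_names_side_by_side(people, n=2):
    """people: iterable of (gender, name). Returns rows of (male, count, female, count)."""
    counts = defaultdict(Counter)
    for gender, name in people:
        counts[gender][name] += 1

    def ranked(counter):
        # Highest count first; ties broken alphabetically, like the ORDER BY in the CTEs.
        return sorted(counter.items(), key=lambda item: (-item[1], item[0]))[:n]

    males, females = ranked(counts["Male"]), ranked(counts["Female"])
    # zip pairs rank 1 with rank 1 and rank 2 with rank 2 (the join on RowNum).
    return [male + female for male, female in zip(males, females)]


# Sample rows from the insert statements above, as (gender, name).
people = [("Male", "Bob"), ("Male", "Bob"), ("Male", "Bill"), ("Male", "Chuck"),
          ("Female", "Anne"), ("Female", "Anne"), ("Female", "Bobbi"), ("Female", "Jane")]

for row in top_names_side_by_side(people):
    print(row)
# ('Bob', 2, 'Anne', 2)
# ('Bill', 1, 'Bobbi', 1)
```

Like the inner join on RowNum, zip stops at the shorter list, so a gender with fewer than two distinct names shortens the result.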
Use a windowing function. Below is a complete solution using a temporary table #people.
-- use temp db
use tempdb;
go
-- drop test table
--drop table #people;
--go
-- create test table
create table #people (my_id int, my_gender char(1), my_name varchar(25));
go
-- clear test table
delete from #people;
-- three count
insert into #people values
(23, 'M', 'Adam'),
(34, 'F', 'Rose');
go 3
-- two count
insert into #people values
(20, 'M', 'Max'),
(15, 'F', 'Jenny');
go 2
-- one count
insert into #people values
(20, 'M', 'John'),
(15, 'F', 'Julie');
go
-- grab top two by gender
;
with cte_Get_Top_Two as
(
select ROW_NUMBER() OVER(PARTITION BY my_gender ORDER BY count(*) DESC) AS my_window,
my_gender, my_name, count(*) as total
from #people
group by my_gender, my_name
)
select * from cte_Get_Top_Two where my_window in (1, 2)
go
PS: You can drop my_id from the table, since it does not relate to your problem; removing it does not change the solution.

convert marks into percentage

How can I convert the marks obtained by a student into a percentage?
That is, there are two exams. I want to take a certain percentage of the marks from each exam (say x% and y%) so that the two weights total 100%.
Based on the limited info that you have provided, I think you might be asking for the following:
create table student
(
id int,
s_name varchar(10)
)
insert into student values (1, 'Jim')
insert into student values (2, 'Bob')
insert into student values (3, 'Jane')
create table exams
(
id int,
e_name varchar(10)
)
insert into exams values (1, 'Test 1')
insert into exams values (2, 'Test 2')
insert into exams values (3, 'Test 3')
insert into exams values (4, 'Test 4')
create table exam_student
(
e_id int,
s_id int,
dt datetime,
score decimal(5,2)
)
insert into exam_student values(1, 1, '2012-08-01', 65.0)
insert into exam_student values(1, 2, '2012-08-01', 85.0)
insert into exam_student values(2, 1, '2012-08-02', 75.0)
insert into exam_student values(2, 2, '2012-08-02', 42.0)
select avg(es.score) as ScorePct, s_id, s.s_name
from exam_student es
inner join exams e
on es.e_id = e.id
inner join student s
on es.s_id = s.id
group by s_id, s_name
If you provide more details on exactly what you are looking for that would be helpful in answering your question.
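If the goal is a weighted total rather than a plain average, the arithmetic is: for each exam, divide the marks obtained by the maximum marks, multiply by that exam's weight, and add the results. A minimal Python sketch of that calculation follows; the function name, the 40/60 weights, and the assumption that each exam is marked out of 100 are all hypothetical, since the question does not state them.

```python
def weighted_percentage(scores, max_marks, weights):
    """Combine exam scores into one percentage using weights that total 100."""
    if not (len(scores) == len(max_marks) == len(weights)):
        raise ValueError("scores, max_marks and weights must have the same length")
    if abs(sum(weights) - 100) > 1e-9:
        raise ValueError("weights must add up to 100")
    # Each exam contributes (fraction of marks obtained) * (its weight).
    return sum(s / m * w for s, m, w in zip(scores, max_marks, weights))


# Jim's scores from the sample data (65 and 75), assuming each exam is out of 100
# and hypothetical weights of 40% and 60%.
print(round(weighted_percentage([65, 75], [100, 100], [40, 60]), 2))  # 71.0
```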

Check if matching child records exist before saving parent records

We basically have a set of child records which we will use to create new parent/child records, but we first need to verify that a parent record containing the same child records doesn't already exist. Here are the details:
We have 3 tables; one is a linking table between the parent and child records.
Table A (parent table)
Id
Name
Desc
Table B (linking table between tables A and C)
Id
TableAId
TableCId
Table C (child table)
Id
StartPosition
EndPosition
Percentage
With that structure, here is an example of a complete record (the parent table is in a one-to-many relation with the child table):
Table A
(1, 'Sample', 'N/A')
Table B
(1, 1, 1)
(2, 1, 2)
(3, 1, 3)
Table C
(1, 1, 3, 0.50)
(2, 4, 5, 0.30)
(3, 6, 9, 0.20)
We then pass in an XML string, which we parse and load into a temp table. The contents of the temp table are those of Table C, without the specific Id.
Before we save any new records, we need to check whether there is an existing Table A record that has the same number of child records and whose child records match the 3 columns in our temp table (no ID match is possible).
Hopefully this is explained well enough. I have done many searches and can't find anything specific to this issue.
What you're looking for is called a relational division. The article "Divided We Stand: The SQL of Relational Division" provides a nice summary of various techniques for using SQL to perform a relational division. For your case, you want the technique listed under "Exact Division":
CREATE TABLE tableA (
Id int PRIMARY KEY,
Name varchar(25),
[Desc] varchar(255)
);
INSERT INTO tableA
(Id, Name, [Desc])
VALUES
(1, 'Sample 1', 'Should match the XML'),
(2, 'Sample 2', 'Partial match (should be excluded)'),
(3, 'Sample 3', 'Has extra matches (should be excluded)');
GO
CREATE TABLE tableB (
Id int PRIMARY KEY,
TableAId int,
TableCId int
);
INSERT INTO tableB
(Id, TableAId, TableCId)
VALUES
(1, 1, 1),
(2, 1, 2),
(3, 1, 3),
(4, 2, 1),
(5, 2, 2),
(6, 3, 1),
(7, 3, 2),
(8, 3, 3),
(9, 3, 4);
GO
CREATE TABLE tableC (
Id int PRIMARY KEY,
StartPosition int,
EndPosition int,
Percentage decimal(3,2)
);
INSERT INTO tableC
(Id, StartPosition, EndPosition, Percentage)
VALUES
(1, 1, 3, 0.50),
(2, 4, 5, 0.30),
(3, 6, 9, 0.20),
(4, 10, 12, 0.10);
GO
-- this represents the temp table holding the XML data
-- we want to match Sample 1
CREATE TABLE xmlData (
StartPosition int,
EndPosition int,
Percentage decimal(3,2)
);
INSERT INTO xmlData
(StartPosition, EndPosition, Percentage)
VALUES
(1, 3, 0.50),
(4, 5, 0.30),
(6, 9, 0.20);
GO
SELECT
b.TableAId
FROM
tableB AS b
INNER JOIN
tableC AS c
ON
b.TableCId = c.Id
LEFT OUTER JOIN
xmlData AS x
ON
c.StartPosition = x.StartPosition AND
c.EndPosition = x.EndPosition AND
c.Percentage = x.Percentage
GROUP BY
b.TableAId
HAVING
COUNT(c.Id) = (SELECT COUNT(*) FROM xmlData) AND
COUNT(x.StartPosition) = (SELECT COUNT(*) FROM xmlData);
GO
DROP TABLE xmlData;
DROP TABLE tableC;
DROP TABLE tableB;
DROP TABLE tableA;
GO
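To see the exact-division query behave as intended on the sample rows, it can be run against an in-memory SQLite database from Python. This is a sketch under assumptions: SQLite stands in for SQL Server, the decimal(3,2) columns become REAL, and tableA is left out because the query never reads it.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE tableB (Id INTEGER PRIMARY KEY, TableAId INTEGER, TableCId INTEGER);
CREATE TABLE tableC (Id INTEGER PRIMARY KEY, StartPosition INTEGER,
                     EndPosition INTEGER, Percentage REAL);
CREATE TABLE xmlData (StartPosition INTEGER, EndPosition INTEGER, Percentage REAL);
INSERT INTO tableB VALUES (1,1,1),(2,1,2),(3,1,3),(4,2,1),(5,2,2),
                          (6,3,1),(7,3,2),(8,3,3),(9,3,4);
INSERT INTO tableC VALUES (1,1,3,0.50),(2,4,5,0.30),(3,6,9,0.20),(4,10,12,0.10);
INSERT INTO xmlData VALUES (1,3,0.50),(4,5,0.30),(6,9,0.20);
""")

EXACT_DIVISION = """
SELECT b.TableAId
FROM tableB AS b
JOIN tableC AS c ON b.TableCId = c.Id
LEFT JOIN xmlData AS x
       ON c.StartPosition = x.StartPosition
      AND c.EndPosition = x.EndPosition
      AND c.Percentage = x.Percentage
GROUP BY b.TableAId
HAVING COUNT(c.Id) = (SELECT COUNT(*) FROM xmlData)             -- same number of children
   AND COUNT(x.StartPosition) = (SELECT COUNT(*) FROM xmlData)  -- every child matched
"""

matching_parents = [row[0] for row in conn.execute(EXACT_DIVISION)]
print(matching_parents)  # [1]
```

Parent 2 is rejected by the first condition (two children instead of three) and parent 3 by the same condition (four children), which leaves only parent 1. The floating-point comparison works here because identical literals are stored on both sides; with real data a decimal type, as in the original script, is the safer choice.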