Count condition met - sql

I have a table (stu_grades) that stores students and the grades they got at the centers they attended.
I want to find out how many times each student in that table got, e.g., 'A', 'B', etc. at any center.
stu_grades
stu_ID|grade1|grade2|grade3|center
1|A|A|C|1
2|B|B|B|2
3|C|C|A|1
1|C|A|C|2
The same student can occur more than once in the table, with the same or different grades and at the same or a different center.
I especially want to check where a grade has appeared three or more times, and at how many centers it appears.
So the final output should look like this:
Stu_ID|Grade|Count|centercount
1|A|3|2 (they acquired 'A' at 2 centers)
1|C|3|2
2|B|3|1 (they only appear at 1 center)
3|C|2|1
3|A|1|1

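select
  stu_id,
  grade,
  sum(cnt) as count,
  count(distinct center) as centercount
from (
  -- unpivot grade1..grade3 into one row per grade,
  -- then count occurrences per student, grade and center
  select stu_id, grade, center, count(*) as cnt
  from stu_grades,
  lateral unnest(array[grade1, grade2, grade3]) as grade
  group by 1, 2, 3
) s
group by 1, 2
order by 1, 2;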
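The query above relies on PostgreSQL's `unnest` and `lateral`. To check the logic against the sample data, here is a minimal Python sketch that uses the standard-library `sqlite3` module. SQLite has no `unnest`, so this sketch substitutes a `UNION ALL` unpivot, which is an assumption of the sketch and not part of the answer. Summing the per-center counts is the same as counting the unpivoted rows directly, so a single `GROUP BY` is enough here. The helper name `grade_counts` is hypothetical.

```python
import sqlite3

# Sample rows copied from the question's stu_grades table.
ROWS = [
    (1, "A", "A", "C", 1),
    (2, "B", "B", "B", 2),
    (3, "C", "C", "A", 1),
    (1, "C", "A", "C", 2),
]

QUERY = """
WITH unpivoted AS (
    -- One row per grade column; UNION ALL keeps repeated grades.
    SELECT stu_id, grade1 AS grade, center FROM stu_grades
    UNION ALL
    SELECT stu_id, grade2, center FROM stu_grades
    UNION ALL
    SELECT stu_id, grade3, center FROM stu_grades
)
SELECT stu_id,
       grade,
       COUNT(*) AS cnt,                       -- times this grade was received
       COUNT(DISTINCT center) AS centercount  -- centers where it was received
FROM unpivoted
GROUP BY stu_id, grade
ORDER BY stu_id, grade
"""


def grade_counts(rows):
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE stu_grades (stu_id INTEGER, grade1 TEXT, grade2 TEXT,"
        " grade3 TEXT, center INTEGER)"
    )
    conn.executemany("INSERT INTO stu_grades VALUES (?, ?, ?, ?, ?)", rows)
    result = conn.execute(QUERY).fetchall()
    conn.close()
    return result


for row in grade_counts(ROWS):
    print(row)
```

Adding `HAVING COUNT(*) >= 3` after the `GROUP BY` would keep only the grades a student received three or more times.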


Return top 1 million based off of two criteria (SQL Query)

I’d like to build a query that ranks 10 million customers on two criteria and returns the top 1 million.
Criterion 1 being a grade assigned to each customer from 1 to 5, 1 being the best
Criterion 2 being a grade assigned to each customer from A to E, A being the best
Criterion 2 outweighs Criterion 1: before you move to B (Criterion 2), you must first go from 1 to 5 (Criterion 1) within the A band (Criterion 2). That is, a customer who scores a 5 (Criterion 1) and an A (Criterion 2) is a better customer than one who scores a 1 (Criterion 1) and a B (Criterion 2).
I’d like the query to return the top 1 million customers, stopping at the end of the band that contains the millionth customer. E.g. if customer 1,000,000 is in the 4C band, don’t return any customers beyond 4C. It’s OK if the result is just over 1 million, so that every customer in the 4C band is included.
This is my attempt, but it doesn’t account for the ordering:
SELECT *
FROM CUSTOMER_POPULATION
WHERE Criterion1 IN (5,4,3,2,1)
AND Criterion2 IN ('A','B','C','D','E')
LIMIT 1000000
TIA.
WITH CUSTOMER_POPULATION (NAME, CRITERION1, CRITERION2) AS
(SELECT * FROM VALUES
('Alice',3,'A'),('Bob',4,'C'),('Carol',5,'E'),('Dave',2,'C')
,('Esther',2,'E'),('Fred',5,'C'),('Gladys',3,'E'),('Harvey',2,'E')
,('Iona',3,'C'),('John',1,'A'),('Kate',4,'E'),('Leo',3,'B')
,('Mary',2,'C'),('Nora',3,'A'),('Oscar',1,'D'),('Penny',3,'C')
,('Quincy',3,'A'),('Ruth',5,'E'),('Sam',4,'B'),('Tina',2,'C')
,('Ulrich',1,'B'),('Velma',5,'B'),('Wayne',2,'C'),('Xena',5,'B')
,('Yale',1,'D'),('Zoe',5,'C')
)
SELECT *
FROM CUSTOMER_POPULATION
WHERE Criterion1 IN (5,4,3,2,1)
AND Criterion2 IN ('A','B','C','D','E')
ORDER BY CONCAT(Criterion2, Criterion1)
LIMIT 1000000
NAME|CRITERION1|CRITERION2
John|1|A
Alice|3|A
Nora|3|A
Quincy|3|A
Ulrich|1|B
Leo|3|B
Sam|4|B
Xena|5|B
Velma|5|B
Dave|2|C
Wayne|2|C
Tina|2|C
Mary|2|C
Iona|3|C
Penny|3|C
Bob|4|C
Fred|5|C
Zoe|5|C
Oscar|1|D
Yale|1|D
Harvey|2|E
Esther|2|E
Gladys|3|E
Kate|4|E
Ruth|5|E
Carol|5|E
rank() lets you number the bands according to their ordering by the two criteria. Because tied rows get the same rank, you won't cut off the results in the middle of a band at exactly the one millionth row: a whole band is kept as long as its first row falls within the first million.
with data as (
select *, rank() over (order by Criterion2, Criterion1) as rnk
from CUSTOMER_POPULATION
where Criterion1 IN (5,4,3,2,1) and Criterion2 in ('A','B','C','D','E')
)
select * from data where rnk <= 1000000;
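Here is a hedged Python sketch that shows the tie behavior on a small scale. It runs the `rank()` query against the sample customers from the previous answer using the standard-library `sqlite3` module (SQLite 3.25+ supports window functions). A cutoff of 8 stands in for 1,000,000, and `top_bands` is a hypothetical helper name.

```python
import sqlite3

# The sample customers from the earlier answer: (name, criterion1, criterion2).
CUSTOMERS = [
    ("Alice", 3, "A"), ("Bob", 4, "C"), ("Carol", 5, "E"), ("Dave", 2, "C"),
    ("Esther", 2, "E"), ("Fred", 5, "C"), ("Gladys", 3, "E"), ("Harvey", 2, "E"),
    ("Iona", 3, "C"), ("John", 1, "A"), ("Kate", 4, "E"), ("Leo", 3, "B"),
    ("Mary", 2, "C"), ("Nora", 3, "A"), ("Oscar", 1, "D"), ("Penny", 3, "C"),
    ("Quincy", 3, "A"), ("Ruth", 5, "E"), ("Sam", 4, "B"), ("Tina", 2, "C"),
    ("Ulrich", 1, "B"), ("Velma", 5, "B"), ("Wayne", 2, "C"), ("Xena", 5, "B"),
    ("Yale", 1, "D"), ("Zoe", 5, "C"),
]


def top_bands(customers, cutoff):
    """Return every customer whose band's rank is at most `cutoff`."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE customer_population"
        " (name TEXT, criterion1 INTEGER, criterion2 TEXT)"
    )
    conn.executemany("INSERT INTO customer_population VALUES (?, ?, ?)", customers)
    rows = conn.execute(
        """
        WITH data AS (
            -- Rows in the same band (same letter and number) share a rank.
            SELECT name, criterion1, criterion2,
                   RANK() OVER (ORDER BY criterion2, criterion1) AS rnk
            FROM customer_population
        )
        SELECT name, criterion1, criterion2, rnk
        FROM data
        WHERE rnk <= ?
        ORDER BY rnk, name
        """,
        (cutoff,),
    ).fetchall()
    conn.close()
    return rows


# Asking for 8 rows returns 9, because Velma and Xena share the 5B band at rank 8.
print(len(top_bands(CUSTOMERS, 8)))  # → 9
```

A plain `ORDER BY ... LIMIT 8` would keep only one of Velma and Xena. The `rank()` filter keeps both.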

How to count distinct a field cumulatively using recursive cte or other method in SQL?

Using the example below, Day 1 will have 1, 1 and 3 distinct names for houses A, B and C respectively.
When calculating distinct names for each house on Day 2, data up to and including Day 2 is used.
When calculating distinct names for each house on Day 3, data up to and including Day 3 is used.
Can a recursive CTE be used?
Data:
Day|House|Name
1|A|Jack
1|B|Pop
1|C|Anna
1|C|Dew
1|C|Franco
2|A|Jon
2|B|May
2|C|Anna
3|A|Jon
3|B|Ken
3|C|Dew
3|C|Dew
Result:
Day|House|Distinct names
1|A|1
1|B|1
1|C|3
2|A|2 (Jack and Jon)
2|B|2
2|C|3
3|A|2 (Jack and Jon)
3|B|3
3|C|3
Without knowing the requirements and the size of the data, it's hard to give an ideal solution. Assuming a small dataset and a quick-and-dirty calculation, just use a correlated subquery like this:
SELECT p.[Day]
, p.House
, (SELECT COUNT(DISTINCT([Name]))
FROM #Bing
WHERE [Day]<= p.[Day] AND House = p.House) DistinctNames
FROM #Bing p
GROUP BY [Day], House
ORDER BY 1
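Here is a minimal sketch of the same correlated subquery, assuming SQLite through Python's `sqlite3` in place of SQL Server. The table name `visits` and the helper name are hypothetical stand-ins for `#Bing`. The subquery re-scans every earlier-or-same day for each (day, house) group, so its cost grows with the number of groups times the number of rows. That is consistent with the answer's "small dataset" caveat.

```python
import sqlite3

# Sample rows from the question: (day, house, name).
VISITS = [
    (1, "A", "Jack"), (1, "B", "Pop"), (1, "C", "Anna"), (1, "C", "Dew"),
    (1, "C", "Franco"), (2, "A", "Jon"), (2, "B", "May"), (2, "C", "Anna"),
    (3, "A", "Jon"), (3, "B", "Ken"), (3, "C", "Dew"), (3, "C", "Dew"),
]


def distinct_names_subquery(rows):
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE visits (day INTEGER, house TEXT, name TEXT)")
    conn.executemany("INSERT INTO visits VALUES (?, ?, ?)", rows)
    result = conn.execute(
        """
        SELECT p.day,
               p.house,
               -- Count distinct names over all days up to p.day for this house.
               (SELECT COUNT(DISTINCT v.name)
                FROM visits v
                WHERE v.day <= p.day AND v.house = p.house) AS distinct_names
        FROM visits p
        GROUP BY p.day, p.house
        ORDER BY p.day, p.house
        """
    ).fetchall()
    conn.close()
    return result


for row in distinct_names_subquery(VISITS):
    print(row)
```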
There is no need for a recursive CTE. Just mark the first time a name is seen in a house and use a cumulative sum:
select day, house,
sum(sum(case when seqnum = 1 then 1 else 0 end)) over (partition by house order by day) as num_unique_names
from (select t.*,
row_number() over (partition by house, name order by day) as seqnum
from t
) t
group by day, house
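The answer's nested `sum(sum(...)) over (...)` applies the window to the already grouped rows. The sketch below, assuming SQLite through Python's `sqlite3`, spells that out as an explicit per-day subquery. This should be equivalent and is a little easier to read. The table name `visits` and the helper name are hypothetical. Unlike the correlated subquery, this approach does not re-scan earlier days for every group.

```python
import sqlite3

# Sample rows from the question: (day, house, name).
VISITS = [
    (1, "A", "Jack"), (1, "B", "Pop"), (1, "C", "Anna"), (1, "C", "Dew"),
    (1, "C", "Franco"), (2, "A", "Jon"), (2, "B", "May"), (2, "C", "Anna"),
    (3, "A", "Jon"), (3, "B", "Ken"), (3, "C", "Dew"), (3, "C", "Dew"),
]


def distinct_names_running_sum(rows):
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE visits (day INTEGER, house TEXT, name TEXT)")
    conn.executemany("INSERT INTO visits VALUES (?, ?, ?)", rows)
    result = conn.execute(
        """
        SELECT day, house,
               -- Running total of first-time names per house, ordered by day.
               SUM(new_names) OVER (PARTITION BY house ORDER BY day) AS num_unique_names
        FROM (
            -- Per day and house, count names seen in that house for the first time.
            SELECT day, house,
                   SUM(CASE WHEN seqnum = 1 THEN 1 ELSE 0 END) AS new_names
            FROM (
                -- seqnum = 1 marks a name's first appearance in a house.
                SELECT day, house, name,
                       ROW_NUMBER() OVER (PARTITION BY house, name ORDER BY day) AS seqnum
                FROM visits
            ) AS ranked
            GROUP BY day, house
        ) AS per_day
        ORDER BY day, house
        """
    ).fetchall()
    conn.close()
    return result


for row in distinct_names_running_sum(VISITS):
    print(row)
```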

Best way to rank by column and aggregation on another column

I want to create a rank column using an existing rank column and a binary column. Suppose, for example, a table with ID, RISK, CONTACT, DATE. The existing rank is RISK, say 1, 2, 3 or NULL, with 3 being the highest. The binary-valued column is CONTACT, with values 0/1 or FAILURE/SUCCESS. I want to create a new RANK that orders by RISK once a certain number of successful contacts has been reached.
For example, suppose the constraint is a minimum of 2 successful contacts. Then the rank should be created as follows in the two instances below:
Instance 1. Three IDs, all with at least two successful contacts. In that case the rank mirrors the risk:
ID risk contact date rank
1 3 S 1 3
1 3 S 2 3
1 3 F 3 3
1 3 F 4 3
2 2 S 1 2
2 2 S 2 2
2 2 F 3 2
2 2 F 4 2
3 1 S 1 1
3 1 S 2 1
3 1 S 3 1
Instance 2. Suppose ID=1 has only one successful contact. In that case it is relegated to the lowest rank, rank=1, while ID=2 gets the highest value, rank=3, and ID=3 maps to rank=2 because it satisfies the constraint but has a lower risk value than ID=2:
ID risk contact date rank
1 3 S 1 1
1 3 F 2 1
1 3 F 3 1
1 3 F 4 1
2 2 S 1 3
2 2 S 2 3
2 2 F 3 3
2 2 F 4 3
3 1 S 1 2
3 1 S 2 2
3 1 S 3 2
This is SQL, specifically Hive. Thanks in advance.
Edit: I think Gordon Linoff's code does it correctly. In the end, I used three interim tables. The code looks like this:
First,
--numerize risk, contact
select A.* ,
case when A.risk = 'H' then 3
when A.risk = 'M' then 2
when A.risk = 'L' then 1
when A.risk is NULL then NULL
when A.risk = 'NULL' then NULL
else -999 end as RISK_RANK,
case when A.contact = 'Successful' then 1
else NULL end as success
Second,
-- sum_successes_by_risk
select A.* ,
B.sum_successes_by_risk
from T as A
inner join
(select A.person, A.program, A.risk, sum(a.success) as sum_successes_by_risk
from T as A
group by A.person, A.program, A.risk
) as B
on A.program = B.program
and A.person = B.person
and A.risk = B.risk
Third,
--Create table that contains only max risk category
select A.* ,
B.max_risk_rank
from T as A
inner join
(select A.person, max(A.risk_rank) as max_risk_rank
from T as A
group by A.person
) as B
on A.person = B.person
and A.risk_rank = B.max_risk_rank
This is hard to follow, but I think you just want window functions:
select t.*,
(case when sum(case when contact = 'S' then 1 else 0 end) over (partition by id) >= 2
then risk
else 1
end) as new_risk
from t;
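Note that the query above only demotes IDs below the threshold. On Instance 2 it yields 1, 2 and 1 for IDs 1, 2 and 3, whereas Instance 2 as written also renumbers the qualifying IDs to 3 and 2. Here is a minimal sketch that reproduces both instances. It assumes `risk` is a non-null integer and uses SQLite through Python's `sqlite3` in place of Hive. The sketch flags each ID with a windowed count of successes, then applies `DENSE_RANK()` over the flag and the risk. The table `t`, the column name `contact_date`, and the helper `new_ranks` are hypothetical. Hive also has `DENSE_RANK`, but this exact query has not been checked there.

```python
import sqlite3

# Instances from the question: (id, risk, contact, contact_date).
INSTANCE_1 = [
    (1, 3, "S", 1), (1, 3, "S", 2), (1, 3, "F", 3), (1, 3, "F", 4),
    (2, 2, "S", 1), (2, 2, "S", 2), (2, 2, "F", 3), (2, 2, "F", 4),
    (3, 1, "S", 1), (3, 1, "S", 2), (3, 1, "S", 3),
]
INSTANCE_2 = [
    (1, 3, "S", 1), (1, 3, "F", 2), (1, 3, "F", 3), (1, 3, "F", 4),
    (2, 2, "S", 1), (2, 2, "S", 2), (2, 2, "F", 3), (2, 2, "F", 4),
    (3, 1, "S", 1), (3, 1, "S", 2), (3, 1, "S", 3),
]


def new_ranks(rows, min_successes=2):
    """Return {id: new_rank} for the given rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE t (id INTEGER, risk INTEGER, contact TEXT, contact_date INTEGER)"
    )
    conn.executemany("INSERT INTO t VALUES (?, ?, ?, ?)", rows)
    result = conn.execute(
        """
        WITH flagged AS (
            -- qualifies = 1 when the id has at least min_successes 'S' contacts.
            SELECT id, risk,
                   CASE WHEN SUM(CASE WHEN contact = 'S' THEN 1 ELSE 0 END)
                             OVER (PARTITION BY id) >= ? THEN 1 ELSE 0 END AS qualifies
            FROM t
        )
        SELECT id,
               -- Non-qualifying ids all get key (0, 0), i.e. the lowest rank;
               -- qualifying ids follow, ordered by risk.
               DENSE_RANK() OVER (
                   ORDER BY qualifies, CASE WHEN qualifies = 1 THEN risk ELSE 0 END
               ) AS new_rank
        FROM flagged
        """,
        (min_successes,),
    ).fetchall()
    conn.close()
    # Every row of an id carries the same rank, so the dict keeps one per id.
    return dict(result)


print(new_ranks(INSTANCE_1))
print(new_ranks(INSTANCE_2))
```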

How to compare two columns in SQL for multiple rows?

I have a data set with four columns (author, document, rating 1, rating 2)
How do I pick authors who have written a document rated higher in rating 1 than in rating 2, and who have also written another document rated higher in rating 2 than in rating 1?
Basically:
AUTHOR|DOCUMENT|RATING 1|RATING 2
A|1|1|2
B|2|1|2
B|3|3|1
C|4|2|2
C|5|3|4
C|6|1|3
D|7|1|2
D|8|1|2
So my desired query will give me B and C, because they have written docs rated both higher and lower in rating 1 than in rating 2.
What I have:
SELECT DISTINCT author
FROM(
(SELECT author
FROM table_name
WHERE rating1 < rating2)
UNION
(SELECT author
FROM table_name
WHERE rating1 > rating2)
)
AS a
What I can't figure out is how to group the rows by author, test whether rating 1 is both higher and lower than rating 2 within that group, output the name, and then move on to the next author. The query above just prints the distinct names that have either a higher or a lower rating 1, so it would print D as well, for example.
What is my SQL code missing to satisfy the criteria above?
Try this:
select distinct t1.author
from myTable as t1
inner join myTable as t2
on t1.author = t2.author
and t2.rating1 < t2.rating2
where t1.rating1 > t1.rating2
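Here is a sketch that checks the self-join against the question's sample rows, assuming SQLite through Python's `sqlite3`. It also shows a `GROUP BY ... HAVING` variant with conditional counts. That variant is a swapped-in alternative, not part of the answer. The table name `my_table` is hypothetical. On the sample data as listed, only B meets both conditions, because none of C's documents has rating 1 above rating 2.

```python
import sqlite3

# Sample rows from the question: (author, document, rating1, rating2).
RATINGS = [
    ("A", 1, 1, 2), ("B", 2, 1, 2), ("B", 3, 3, 1), ("C", 4, 2, 2),
    ("C", 5, 3, 4), ("C", 6, 1, 3), ("D", 7, 1, 2), ("D", 8, 1, 2),
]

# t1 must have rating1 > rating2, t2 (same author) must have rating1 < rating2.
SELF_JOIN = """
    SELECT DISTINCT t1.author
    FROM my_table AS t1
    JOIN my_table AS t2
      ON t1.author = t2.author AND t2.rating1 < t2.rating2
    WHERE t1.rating1 > t1.rating2
    ORDER BY t1.author
"""

# Alternative: one pass per author, requiring at least one document in each direction.
GROUPED = """
    SELECT author
    FROM my_table
    GROUP BY author
    HAVING SUM(CASE WHEN rating1 > rating2 THEN 1 ELSE 0 END) > 0
       AND SUM(CASE WHEN rating1 < rating2 THEN 1 ELSE 0 END) > 0
    ORDER BY author
"""

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE my_table (author TEXT, document INTEGER, rating1 INTEGER, rating2 INTEGER)"
)
conn.executemany("INSERT INTO my_table VALUES (?, ?, ?, ?)", RATINGS)
self_join_authors = [r[0] for r in conn.execute(SELF_JOIN)]
grouped_authors = [r[0] for r in conn.execute(GROUPED)]
conn.close()
print(self_join_authors, grouped_authors)  # → ['B'] ['B']
```

The `HAVING` form reads each author's rows once and needs no `DISTINCT`. The self-join can produce one pair per qualifying document combination before `DISTINCT` collapses them.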

SQL for finding the counts per user

Let's say I have the following table:
Student Course University
1 a x
1 b x
1 c x
1 a y
2 a x
2 a y
2 a z
3 a x
For each student, I am trying to find the number of unique courses and universities that they are enrolled in.
The output would be as follows:
Student|No. of Courses|No. of Universities
1|3|2
2|1|3
3|1|1
How would I construct the SQL for this?
SELECT Student,
COUNT(DISTINCT Course) AS NumberOfCourses,
COUNT(DISTINCT University) AS NumberOfUniversities
FROM YourTable
GROUP BY Student
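Here is a quick check of the query against the sample table, assuming SQLite through Python's `sqlite3`. The table name `enrollments` is a hypothetical stand-in for `YourTable`. Each `COUNT(DISTINCT ...)` is evaluated independently within a student's group, which is why a single `GROUP BY` can produce both counts.

```python
import sqlite3

# Sample rows from the question: (student, course, university).
ENROLLMENTS = [
    (1, "a", "x"), (1, "b", "x"), (1, "c", "x"), (1, "a", "y"),
    (2, "a", "x"), (2, "a", "y"), (2, "a", "z"), (3, "a", "x"),
]


def counts_per_student(rows):
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE enrollments (student INTEGER, course TEXT, university TEXT)")
    conn.executemany("INSERT INTO enrollments VALUES (?, ?, ?)", rows)
    result = conn.execute(
        """
        SELECT student,
               COUNT(DISTINCT course) AS number_of_courses,
               COUNT(DISTINCT university) AS number_of_universities
        FROM enrollments
        GROUP BY student
        ORDER BY student
        """
    ).fetchall()
    conn.close()
    return result


print(counts_per_student(ENROLLMENTS))  # → [(1, 3, 2), (2, 1, 3), (3, 1, 1)]
```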