SQL: Find consistent failures

I have a table of student IDs, subjects, and grades:
ID  SUBJECT  GRADE  DATE
01  math     A      23/1/2013
02  eng      C      22/2/2013
02  math     D      24/3/2012
03  social   B-     1/3/2012
03  math     E      14/5/2014
......
For most subjects a fail grade is C, D or E
For math a fail grade is B-, C, D or E
I want to find students who have had a run of 5 consecutive failed grades within a cycle of 15 grades, and I want to be alerted immediately after the fifth failure happens. So for example:
A A A B C C C C C A A A A B A
or
D E E E B- (maths)
After 11 passes I am not interested
D D D D A A A A A A A B B B B
I am using postgresql and am guessing that a window or aggregate function would help here?

You might like to take an approach of assigning an integer to the grades, such that 0 is a fail and 1 is a pass (depending on subject of course, and preferably looked up from a table that correlates grades and subjects to passes and fails).
The problem then reduces to, "In a series of 5 integers, is the sum equal to 0?".
Something like:
Sum(pass_fail_integer) over (partition by student
order by date
rows between 4 preceding and current row)
If a sequence of 5 fails defines the condition that you want to be warned of, I'm not clear on the significance of the cycle of 15 grades. Possibly you'd be looking for a series of 15 integers with a sum of less than 11?
Edit: if you want to confine the search to the most recent 15 grades, then a subquery that assigns a row_number to the grades per student, in date descending order, allows you to filter for the most recent 15, in which you would then apply the above logic to determine whether there are five consecutive failures.
So the general structure of the query would be:
select distinct student
from (
    select ...,
           sum(pass_fail_integer) over
               (partition by student
                order by date
                rows between 4 preceding and
                current row) as consecutive_fail_sum
    from (
        select ...,
               row_number() over (partition by student
                                  order by date desc) as rn
        from ...) recent
    where rn <= 15) windowed
where consecutive_fail_sum = 0  -- 0 means five consecutive fails, given the 0-for-fail coding
You might leverage that inner query to also check whether 5 failures occur anywhere in the 15-grade window, so you can cheaply eliminate students for whom the consecutive-failure check cannot possibly succeed.
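As a cross-check of the window logic above, here is a pure-Python sketch of the same test. The 0-for-fail / 1-for-pass coding follows the answer's convention; the function name and sample inputs are mine:

```python
# 0 = fail, 1 = pass (per subject), as suggested above.
# A student triggers the alert if any 5-grade window inside their most
# recent 15 grades sums to 0, i.e. five consecutive fails.
def has_five_consecutive_fails(pass_fail, window=5, recent=15):
    recent_grades = pass_fail[-recent:]          # most recent 15 grades only
    return any(sum(recent_grades[i:i + window]) == 0
               for i in range(len(recent_grades) - window + 1))

# A A A B C C C C C A A A A B A  -> five consecutive fails (the Cs): alert
print(has_five_consecutive_fails([1,1,1,1,0,0,0,0,0,1,1,1,1,1,1]))  # True
# D D D D A A A A A A A B B B B  -> only four fails in a row: no alert
print(has_five_consecutive_fails([0,0,0,0,1,1,1,1,1,1,1,1,1,1,1]))  # False
```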


SQL query to find the average time differences between two statuses

I am trying to find the time between status changes for tickets. The statuses are A,B,C,D,E. I need to identify where the bottlenecks are in the system. The table looks something like this:
ticket_no  created_at  current_status  next_status
1          12/2/2022   A               B
1          12/3/2022   B               C
1          12/3/2022   C               B
1          12/4/2022   B               C
1          12/4/2022   C               E
2          12/4/2022   A               C
2          12/5/2022   C               D
2          12/7/2022   D               E
As you can see for ticket 1, it cycled between statuses B and C before finally ending at E. I want to calculate the average time tickets take to move between specific statuses (say A->C, C->E). It’s a bit confusing because tickets can return to previous statuses and tickets don’t need to move through every status. There is an order to the statuses but you can return to a previous state.
Any ideas?
I’ve tried a bunch of things, like lagging (only looks at previous/next), or even pivoting with case statements and subtracting but it doesn't seem to work.
Again the ask is to find the time spent (on average) to go between 2 specific statuses, such as A->C or C->E
Here's my query so far. The idea is to pivot things and just subtract, but I'm really not sure this is valid:
with pv_times as (
    select ticket_no,
           max(case when current_status = 'A' and next_status = 'B' then created_at else null end) as ab_time,
           max(case when current_status = 'A' and next_status = 'C' then created_at else null end) as ac_time
    from statuses
    group by 1
)
select * from pv_times
-- subtract times to find diff... but is this even valid?
"time spent to go between 2 specific statuses"
Enumerate all such status pairs; they form the lower triangle of a 5 × 5 matrix. Then do a JOIN (.merge) to aggregate all observed transitions against that vector of possibilities, .count()'ing the number of them we observed.
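A rough Python sketch of that idea, using the sample rows from the question. Note that "time from first being in the start status to first reaching the end status" is my assumption about the intended semantics, since tickets can revisit statuses:

```python
from datetime import datetime
from itertools import combinations

# Sample data from the question: (ticket_no, created_at, current_status, next_status)
rows = [
    (1, "12/2/2022", "A", "B"), (1, "12/3/2022", "B", "C"),
    (1, "12/3/2022", "C", "B"), (1, "12/4/2022", "B", "C"),
    (1, "12/4/2022", "C", "E"), (2, "12/4/2022", "A", "C"),
    (2, "12/5/2022", "C", "D"), (2, "12/7/2022", "D", "E"),
]

# The lower triangle of the 5x5 matrix: every forward pair X -> Y.
pairs = list(combinations("ABCDE", 2))   # 10 possible transitions

def parse(d):
    return datetime.strptime(d, "%m/%d/%Y")

def avg_days(start, end):
    """Average days from first being in `start` to first reaching `end`."""
    diffs = []
    for t in {r[0] for r in rows}:
        entered = min((parse(d) for tt, d, cur, _ in rows
                       if tt == t and cur == start), default=None)
        reached = min((parse(d) for tt, d, _, nxt in rows
                       if tt == t and nxt == end), default=None)
        if entered is not None and reached is not None and reached >= entered:
            diffs.append((reached - entered).days)
    return sum(diffs) / len(diffs) if diffs else None
```

With the sample data, `avg_days("A", "C")` averages ticket 1 (1 day) and ticket 2 (0 days), and `avg_days("C", "E")` averages 1 and 2 days.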

Error in getting the value from the database table

I've sorted out the people who got a full score for challenges of different difficulty levels. However, the question asks to query the hacker_id and name of the people who got a full score more than once. I'm encountering a problem with COUNT: I tried to count how often each name appears in the table, but it wouldn't allow me to. I suspect there is something wrong with my GROUP BY syntax. Could anybody help me?
Previous Code
SELECT s.challenge_id, s.hacker_id, h.name, s.submission_id, c.difficulty_level, s.score
FROM Hackers AS h
JOIN Submission AS s ON h.hacker_id = s.hacker_id
JOIN Challenges AS c ON c.challenge_id = s.challenge_id
JOIN Difficulty AS d ON d.difficulty_level = c.difficulty_level
WHERE c.difficulty_level = d.difficulty_level AND s.score = d.score
Result
challenge_id | hacker_id | name | submission_id | difficulty_level | score
71055        | 86870     | Todd | 94613         | 2                | 30
66730        | 90411     | Joe  | 97397         | 6                | 100
71055        | 90411     | Joe  | 97431         | 2                | 30
Problem
Select g.hacker_id,g.name,COUNT(g.name)
FROM (Select s.challenge_id,s.hacker_id,h.name,s.submission_id,c.difficulty_level,s.score
FROM (((Hackers AS h JOIN Submission AS s ON h.hacker_id=s.hacker_id)JOIN Challenges AS c ON c.challenge_id=s.challenge_id)JOIN Difficulty AS d ON d.difficulty_level=c.difficulty_level)
WHERE c.difficulty_level=d.difficulty_level and s.score=d.score) AS g
WHERE COUNT(g.name)>1
GROUBY g.hacker_id,g.name;
If you need the COUNT of each name that appeared more than once in your first query, then you can use the following:
SELECT hacker_id, name, COUNT(name)
FROM (
    SELECT s.challenge_id, s.hacker_id, h.name, s.submission_id, c.difficulty_level, s.score
    FROM Hackers AS h
    JOIN Submission AS s ON h.hacker_id = s.hacker_id
    JOIN Challenges AS c ON c.challenge_id = s.challenge_id
    JOIN Difficulty AS d ON d.difficulty_level = c.difficulty_level
    WHERE c.difficulty_level = d.difficulty_level AND s.score = d.score
) AS T
GROUP BY hacker_id, name
HAVING COUNT(name) > 1
HAVING is used to filter on the aggregated result; WHERE runs before aggregation, so it cannot reference COUNT. (Also note the keyword is GROUP BY, not GROUBY, and it must come before HAVING.)
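A runnable illustration of the fix, with a made-up table name and data (SQLite standing in for the original engine):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE full_scores (hacker_id INTEGER, name TEXT)")
conn.executemany("INSERT INTO full_scores VALUES (?, ?)",
                 [(86870, "Todd"), (90411, "Joe"), (90411, "Joe")])

# The aggregate filter lives in HAVING, which runs after GROUP BY;
# putting COUNT(...) in WHERE fails because WHERE runs before grouping.
rows = conn.execute("""
    SELECT hacker_id, name, COUNT(name)
    FROM full_scores
    GROUP BY hacker_id, name
    HAVING COUNT(name) > 1
""").fetchall()
print(rows)  # only the name appearing more than once survives
```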

How to filter the max value and write to row?

Postgres 9.3.5, PostGIS 2.1.4.
I have two tables (polygons and points) in a database.
I want to find out how many points are in each polygon; there may be anywhere from 0 to more than 200,000 points per polygon. The little hiccup is the following.
My point table looks like the following:
x   y   lan
10  11  en
10  11  fr
10  11  en
10  11  es
10  11  en
-   # just for demonstration/clarification purposes
13  14  fr
13  14  fr
13  14  es
-
15  16  ar
15  16  ar
15  16  ps
I do not simply want to count the number of points per polygon; I want to know the most often occurring lan in each polygon. So, assuming each - indicates that the points fall into a new polygon, my results would look like the following:
Polygon table:
polygon  Count  lan
1        3      en
2        2      fr
3        2      ar
This is what I got so far.
SELECT count(*), count.language AS language, hexagons.gid AS hexagonsWhere
FROM hexagonslan AS hexagons,
points_count AS france
WHERE ST_Within(count.geom, hexagons.geom)
GROUP BY language, hexagonsWhere
ORDER BY hexagons DESC;
It gives me the following:
Polygon  Count  language
1        3      en
1        1      fr
1        1      es
2        2      fr
2        1      es
3        2      ar
3        1      ps
Two things remain unclear:
1. How do I get only the max value?
2. How are cases treated where the max values happen to be identical?
Answer to 1.
To get the most common language and its count per Polygon, you could use a simple DISTINCT ON query:
SELECT DISTINCT ON (h.gid)
h.gid AS polygon, count(c.geom) AS ct, c.language
FROM hexagonslan h
LEFT JOIN points_count c ON ST_Within(c.geom, h.geom)
GROUP BY h.gid, c.language
ORDER BY h.gid, count(c.geom) DESC, c.language; -- language name is tiebreaker
But for the data distribution you describe (up to 200,000 points per polygon), this should be substantially faster, hoping to make better use of an index on c.geom:
SELECT h.gid AS polygon, c.ct, c.language
FROM hexagonslan h
LEFT JOIN LATERAL (
SELECT c.language, count(*) AS ct
FROM points_count c
WHERE ST_Within(c.geom, h.geom)
GROUP BY 1
ORDER BY 2 DESC, 1 -- again, language name is tiebreaker
LIMIT 1
) c ON true
ORDER BY 1;
LEFT JOIN LATERAL .. ON true preserves polygons not containing any points.
Answer to 2.
In cases where the max values happen to be identical, the alphabetically first language is picked in the example, by way of the added ORDER BY item. If you want all languages that happen to share the maximum count, you have to do more:
SELECT h.gid AS polygon, c.ct, c.language
FROM hexagonslan h
LEFT JOIN LATERAL (
SELECT c.language, count(*) AS ct
, rank() OVER (ORDER BY count(*) DESC) AS rnk
FROM points_count c
WHERE ST_Within(c.geom, h.geom)
GROUP BY 1
) c ON c.rnk = 1
ORDER BY 1, 3 -- language only as an additional sort criterion
Using the window function rank() here (not row_number()!), we can get the count of points and the ranking of that count in a single SELECT.
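A pure-Python sketch of the rank() = 1 semantics, with invented sample data (the real counting happens in the query above):

```python
from collections import Counter

# polygon id -> languages of the points falling inside it (made-up sample)
points = {
    1: ["en", "fr", "en", "es", "en"],
    2: ["fr", "fr", "es"],
    3: ["ar", "ar", "ps", "ps"],  # a tie between ar and ps
    4: [],                        # a polygon containing no points
}

def top_languages(langs):
    """All languages sharing the maximum count: rank() = 1, not row_number()."""
    if not langs:
        return []
    counts = Counter(langs)
    best = max(counts.values())
    return sorted(lan for lan, c in counts.items() if c == best)

result = {gid: top_languages(ls) for gid, ls in points.items()}
print(result)  # polygon 3 keeps both tied languages; polygon 4 stays empty
```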

Tricky aggregation Oracle 11G

I am using Oracle 11G.
Here is my data in table ClassGrades:
ID  Name     APlusCount  TotalStudents  PctAplus
0   All      44          95             46.31
1   Grade1A  13          24             54.16
2   Grade1B  11          25             44.00
3   Grade1C  8           23             34.78
4   Grade1D  12          23             52.17
The data (APlusCount,TotalStudents) for ID 0 is the sum of data for all classes.
I want to calculate how each class compares to other classes except itself.
Example:
Take Grade1A that has PctAplus = 54.16.
I want to add up the values for Grade1B, Grade1C and Grade1D, which is:
(Sum of APlusCount for Grade1B, 1C, 1D) / (Sum of TotalStudents for Grade1B, 1C, 1D) * 100
= (31 / 71) * 100 = 43.66%
So Grade1A (54.16%) is doing much better compared to its peers (43.66%).
I want to calculate Peers collective percentage for each Grade.
How do I do this?
Another approach might be to leverage the All record for totals (self cross join, as mentioned in the comments), i.e.:
WITH g1 AS (
SELECT apluscount, totalstudents
FROM grades
WHERE name = 'All'
)
SELECT g.name, 100*(g1.apluscount - g.apluscount)/(g1.totalstudents - g.totalstudents)
FROM grades g, g1
WHERE g.name != 'All';
However, I think that @Wernfried's solution is better, as it doesn't depend on the existence of an All record.
UPDATE
Alternately, one could compute the totals with aggregates in the WITH clause:
WITH g1 AS (
SELECT SUM(apluscount) AS apluscount, SUM(totalstudents) AS totalstudents
FROM grades
WHERE name != 'All'
)
SELECT g.name, 100*(g1.apluscount - g.apluscount)/(g1.totalstudents - g.totalstudents)
FROM grades g, g1
WHERE g.name != 'All';
Hope this helps. Again, though, the solution using window functions is probably the best.
I don't know how to deal with the "All" record, but for the others this is an approach:
select Name,
100*(sum(APlusCount) over () - APlusCount) /
(sum(TotalStudents) over () - TotalStudents) as result
from grades
where name <> 'All';
NAME     RESULT
=========================
Grade1A  43.661971830986
Grade1B  47.142857142857
Grade1C  50
Grade1D  44.444444444444
See example in SQL Fiddle
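The arithmetic behind those results can be double-checked in a few lines of Python (class figures taken from the sample table):

```python
# (APlusCount, TotalStudents) per class, from the sample table
grades = {
    "Grade1A": (13, 24),
    "Grade1B": (11, 25),
    "Grade1C": (8, 23),
    "Grade1D": (12, 23),
}
total_aplus = sum(a for a, _ in grades.values())     # 44, matching the 'All' row
total_students = sum(t for _, t in grades.values())  # 95

# Peer percentage = everyone else's A+ count over everyone else's student count
peer_pct = {name: 100 * (total_aplus - a) / (total_students - t)
            for name, (a, t) in grades.items()}
print(round(peer_pct["Grade1A"], 2))  # 43.66, matching the output above
```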

Same entity from different tables/procedures

I have 2 procedures (say A and B). They both return data with a similar column set (Id, Name, Count). To be more concrete, example results from the procedures are listed below:
A:
Id  Name  Count
1   A     10
2   B     11
B:
Id  Name  Count
1   E     14
2   F     15
3   G     16
4   H     17
The Ids are generated with ROW_NUMBER(), as I don't have my own identifiers for these records because they are aggregated values.
In code I query over the both results using the same class NameAndCountView.
And finally my problem. When I look into results after executing both procedures sequentially I get the following:
A:
Id Name Count
1 A 10 ->|
2 B 11 ->|
|
B: |
Id Name Count |
1 A 10 <-|
2 B 11 <-|
3 G 16
4 H 17
As you can see, results in the second set are replaced with the results having the same Ids from the first. Of course, the problem takes place because I use the same class for retrieving data, right?
The question is how to make this work without creating additional NameAndCountView2-like class?
If possible, and if you don't mind losing the original Id values, maybe you can try having the first query return even Ids:
ROW_NUMBER() over (order by .... )*2
while the second returns odd Ids:
ROW_NUMBER() over (order by .... )*2+1
This would also allow you to know where the Ids come from.
I guess this would be repeatable with N queries by having query number i (for i = 0 … N-1) select:
ROW_NUMBER() over (order by .... )*N+i
Hope this will help
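The scheme can be sanity-checked in a few lines of Python (function name is mine):

```python
# Query i of n maps its ROW_NUMBER() values r = 1, 2, ... to r * n + i,
# so different queries can never produce the same Id.
def interleaved_ids(query_index, n_queries, n_rows):
    return [r * n_queries + query_index for r in range(1, n_rows + 1)]

a = interleaved_ids(0, 2, 3)  # first query (even Ids):  [2, 4, 6]
b = interleaved_ids(1, 2, 4)  # second query (odd Ids):  [3, 5, 7, 9]
print(set(a) & set(b))        # empty set: the Id ranges never collide
```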