Cumulative Sum and Percentage - sql

Can someone help me get the cumulative sum and percentage? I have three fields in the table "capability":
vertical|Defects(F)|Defects(NF)|
Billing | 193 |678
Provi |200 |906
Billing |232 |111
Analyt |67 |0
Provi |121 |690
I want the final output to be:
Vertical|Total Defects|Cumulative Defects|Cumulative%
Billing |1214 | 1214 |37.96%
Provi |1917 | 3131 |97.90%
Analyt |67 | 3198 |100.00%
Please note that I have around 3 million rows, and the data keeps growing every day.

I find the easiest way to get the cumulative sum is a correlated subquery, unless you have the full power of window functions, as in SQL Server 2012.
However, this poses several challenges. First, you need to summarize by vertical. Then you need to put the verticals in the right order. Finally, you need to output the results the way you want them.
This should come close to what you want:
with d as
(select vertical, sum(Defects_F + Defects_NF) as Defects,
(case when vertical = 'Billing' then 1
when vertical = 'Provi' then 2
else 3
end) as num
from t
group by vertical
)
select vertical, defects as TotalDefects, cumDefects,
100 * cast(cumDefects as float) / fulldefects as DefectPercent
from (select d.*,
(select sum(defects) from d d2 where d2.num <= d.num
) as cumdefects,
SUM(defects) over () as fulldefects
from d
) d
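To sanity-check the logic against the sample data, here is a minimal sketch that runs the same idea (summarize per vertical, then a correlated subquery for the running total) in an in-memory SQLite database through Python's sqlite3 module. SQLite and the column names defects_f and defects_nf are assumptions made for illustration; the query above targets SQL Server, where the real column names would be used instead.

```python
import sqlite3

# Assumption: SQLite stands in for SQL Server; defects_f / defects_nf are
# illustrative names for the Defects(F) / Defects(NF) columns.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE capability (vertical TEXT, defects_f INTEGER, defects_nf INTEGER)"
)
conn.executemany(
    "INSERT INTO capability VALUES (?, ?, ?)",
    [
        ("Billing", 193, 678),
        ("Provi", 200, 906),
        ("Billing", 232, 111),
        ("Analyt", 67, 0),
        ("Provi", 121, 690),
    ],
)

query = """
WITH d AS (
    -- one row per vertical, plus an explicit sort key
    SELECT vertical,
           SUM(defects_f + defects_nf) AS defects,
           CASE vertical WHEN 'Billing' THEN 1 WHEN 'Provi' THEN 2 ELSE 3 END AS num
    FROM capability
    GROUP BY vertical
)
SELECT d.vertical,
       d.defects,
       -- running total: sum over all rows up to and including this one
       (SELECT SUM(d2.defects) FROM d AS d2 WHERE d2.num <= d.num) AS cum_defects,
       ROUND(100.0 * (SELECT SUM(d2.defects) FROM d AS d2 WHERE d2.num <= d.num)
             / (SELECT SUM(defects) FROM d), 2) AS cum_pct
FROM d
ORDER BY d.num
"""

rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

With the five sample rows this should reproduce the totals 1214, 1917 and 67 and the cumulative values 1214, 3131 and 3198 from the question.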

OK, maybe this will help someone trying to get similar output. Here is what I did to get the desired output. Thanks a lot, Gordon; your suggestion actually helped me make this possible:
with d as
(SELECT ROW_NUMBER () OVER
(ORDER BY SUM([DEFECTS(F)]+[DEFECTS(NF)]) asc) NUM,
VERTICAL,
SUM([DEFECTS(F)]+[DEFECTS(NF)]) AS [DEFECTS]
FROM Capability
GROUP BY VERTICAL
)
select left(vertical,5) as Vertical, defects as TotalDefects, cumDefects,
cast(cumDefects as float) / fulldefects as DefectPercent
from (select d.*,
(select sum(defects) from d d2 where d2.num <= d.num
) as cumdefects,
SUM(defects) over () as fulldefects
from d
) d

Related

SQL query for all possible combinations from table

I have a table, the result of some calculations on a SQL database, that looks like this:
[ID] [PAR1] [PAR2]
[A] [110] [0.5]
[B] [105] [1.5]
[C] [120] [2.0]
[D] [130] [3.0]
[E] [115] [5.5]
[F] [130] [6.5]
[G] [120] [7.0]
[H] [110] [7.5]
[I] [105] [8.0]
[J] [120] [9.0]
[K] [110] [9.5]
It's sorted by PAR2; lower means a better result.
I need to find the best (lowest) sum of PAR2 over 3 rows where the sum of PAR1 is at least 350. For example:
the combination A+B+C gives the best sum of PAR2 (0.5+1.5+2.0=4.0), but the sum of PAR1 is 110+105+120=335 < 350, so the condition fails and the result can't be used,
the combination A+B+D gives a PAR2 sum of 0.5+1.5+3.0=5.0, but the sum of PAR1 is 110+105+130=345 < 350, so the condition fails and the result can't be used,
the combination A+B+E gives a PAR2 sum of 0.5+1.5+5.5=7.5, but the sum of PAR1 is 110+105+115=330 < 350, so the condition fails and the result can't be used,
the combination A+B+F gives a PAR2 sum of 0.5+1.5+6.5=8.5, but the sum of PAR1 is 110+105+130=345 < 350, so the condition fails and the result can't be used,
(...)
the combination B+C+D gives a PAR2 sum of 1.5+2.0+3.0=6.5, and the sum of PAR1 is 105+120+130=355 > 350, so the condition holds and we have a winner with the best result, 6.5.
It is an ASP.NET application, so I tried fetching the table from the database and computing the result in the VB code-behind, but that is manual work with FOR..NEXT loops and takes time. It's neither a clean option for calculations like this nor fast enough.
I am wondering whether there is a smarter SQL query that returns the result directly. Maybe some advanced math functions? Any ideas?
Thanks in advance.
I ran some tests with forpas's solution, and yes, it works very well. But it takes too much time once I add a lot of WHERE conditions, because the original table is very large. So I will try to find a solution that uses temp tables in a function (not a procedure). Thank you all for your answers.
forpas, special thanks for the example and explanation as well; they let me quickly understand your idea. This is master level ;)
You can use a double inner self-join like this:
select top 1 * from tablename t1
inner join tablename t2 on t2.id > t1.id
inner join tablename t3 on t3.id > t2.id
where t1.par1 + t2.par1 + t3.par1 >= 350
order by t1.par2 + t2.par2 + t3.par2
See the demo.
Results:
> ID | PAR1 | PAR2 | ID | PAR1 | PAR2 | ID | PAR1 | PAR2
> :- | ---: | :--- | :- | ---: | :--- | :- | ---: | :---
> A | 110 | 0.5 | C | 120 | 2.0 | D | 130 | 3.0
So the winner is A+C+D because:
110 + 120 + 130 = 360 >= 350
and the sum of PAR2 is
0.5 + 2.0 + 3.0 = 5.5
which is the minimum
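As a cross-check of that result outside the database, here is a small brute-force sketch in Python. It is not part of the query above; it just enumerates every 3-row combination of the sample data with itertools.combinations, which plays the same role as the t2.id > t1.id and t3.id > t2.id join conditions.

```python
from itertools import combinations

# Sample rows from the question: (ID, PAR1, PAR2)
rows = [
    ("A", 110, 0.5), ("B", 105, 1.5), ("C", 120, 2.0), ("D", 130, 3.0),
    ("E", 115, 5.5), ("F", 130, 6.5), ("G", 120, 7.0), ("H", 110, 7.5),
    ("I", 105, 8.0), ("J", 120, 9.0), ("K", 110, 9.5),
]

# Keep only the triples whose PAR1 sum reaches the threshold of 350.
valid = [c for c in combinations(rows, 3) if sum(r[1] for r in c) >= 350]

# The winner is the valid triple with the smallest PAR2 sum.
best = min(valid, key=lambda c: sum(r[2] for r in c))

best_ids = "+".join(r[0] for r in best)
best_par1 = sum(r[1] for r in best)
best_par2 = sum(r[2] for r in best)
print(best_ids, best_par1, best_par2)  # A+C+D 360 5.5
```

For 11 rows that is only 165 combinations; the count grows roughly with the cube of the table size, which is the same reason the self-join gets slow on a large table.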
Check this. I feel it's accurate, or close to your requirement:
WITH CTE (ID,PAR1,PAR2)
AS
(
SELECT 'A',110,0.5 UNION ALL
SELECT 'B',105,1.5 UNION ALL
SELECT 'C',120,2.0 UNION ALL
SELECT 'D',130,3.0 UNION ALL
SELECT 'E',115,5.5 UNION ALL
SELECT 'F',130,6.5 UNION ALL
SELECT 'G',120,7.0 UNION ALL
SELECT 'H',110,7.5 UNION ALL
SELECT 'I',105,8.0 UNION ALL
SELECT 'J',120,9.0 UNION ALL
SELECT 'K',110,9.5
)
SELECT B.AID,B.BID,B.CID,SUM_P2,SUM_P1
FROM
(
SELECT * , ROW_NUMBER() OVER (PARTITION BY CHAR_SUM ORDER BY CHAR_SUM) CS
FROM
(
SELECT ASCII(A.ID) + ASCII(B.ID)+ASCII(C.ID) CHAR_SUM,
A.ID AID,B.ID BID,C.ID CID,
(A.PAR2+B.PAR2+C.PAR2) AS SUM_P2,
(A.PAR1+B.PAR1+C.PAR1) AS SUM_P1
FROM CTE A
CROSS APPLY CTE B
CROSS APPLY CTE C
WHERE A.ID <> B.ID AND A.ID <> C.ID AND B.ID <> C.ID
AND (A.PAR1+B.PAR1+C.PAR1) >= 350
) A
)B
WHERE CS = 1
You might try cross joining the table with itself twice, so that three copies of it take part. That puts every combination of three rows on a single row, letting you apply the required conditions and pick the minimum value.
select t1.ID, t2.ID, t3.ID, t1.PAR2 + t2.PAR2 + t3.PAR2
from yourTable t1
cross join
yourTable t2
cross join
yourTable t3
where t1.ID < t2.ID and t2.ID < t3.ID and
t1.PAR1 + t2.PAR1 + t3.PAR1 >= 350
order by t1.PAR2 + t2.PAR2 + t3.PAR2 ASC
While this solution should technically work, cross joining tables is not ideal performance-wise, even more so when done multiple times. If the table is going to grow over time and you have the option of doing the calculation at the code level, I think it would be advisable to do so.
Edit
Changed the where clause to include Serg's suggestion

SQL Cross join to use max value for calculation in a case statement without group by

I have the following table:
|BoreholeID|Mins|
-----------------
|BH1 |0.5 |
|BH1 |1 |
|BH1 |1.5 |
and I want to select a third column called timeline, computed by a case statement that returns 1 if the mins value is greater than 80% of the max mins value AND greater than the max mins value minus 5, and 0 otherwise. I have the following query to do this:
select boreholeid, mins,
(case when mins < (max(max_query.maxts)*0.8) and
mins<(max(max_query.maxts)-5) then 1 else 0 end) as Timeline
from maxrawcalcs
cross join
(select max(maxrawcalcs.mins) maxts from maxrawcalcs) as max_query
;
I have used a cross join in the past to bring in a max value like this and it worked without a problem, but this query tells me that I need a group by for the other two selected fields, which I do not want. How can I avoid the group by?
I've used a nested query and kept your logic, as follows:
select q.boreholeid, q.mins,
(case when q.mins > ((q.maxts)*0.8) and
q.mins > ((q.maxts)-5)
then 1 else 0
end ) as Timeline
from
(
select boreholeid, mins,
(select max(mins) maxts from maxrawcalcs) as maxts
from maxrawcalcs
) q;
boreholeid mins Timeline
BH1 0.5 0
BH1 1 0
BH1 1.5 1
SQL Fiddle Demo
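For what it's worth, the cross join from the question also seems to work once the outer max() around max_query.maxts is dropped, since that aggregate in the select list is what triggers the group by error. A minimal sketch, assuming SQLite via Python's sqlite3 module purely for illustration, and using the greater-than comparisons described in the question:

```python
import sqlite3

# Assumption: SQLite is used only so the example is self-contained.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE maxrawcalcs (boreholeid TEXT, mins REAL)")
conn.executemany(
    "INSERT INTO maxrawcalcs VALUES (?, ?)",
    [("BH1", 0.5), ("BH1", 1.0), ("BH1", 1.5)],
)

query = """
SELECT r.boreholeid,
       r.mins,
       -- m.maxts is already a single value, so no aggregate is needed here
       CASE WHEN r.mins > m.maxts * 0.8 AND r.mins > m.maxts - 5
            THEN 1 ELSE 0 END AS timeline
FROM maxrawcalcs AS r
CROSS JOIN (SELECT MAX(mins) AS maxts FROM maxrawcalcs) AS m
ORDER BY r.mins
"""

timeline_rows = conn.execute(query).fetchall()
for row in timeline_rows:
    print(row)
```

The one-row subquery is joined to every row of maxrawcalcs, so each row can be compared against the overall maximum without grouping.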

Top 3 values per group query MS Access

I'm new to MS Access, and I'm trying to write a query that pulls up the top 3 people by points in each of 3 categories, i.e. the desired outcome is:
Child's name | Membership Type | Total Points
=============================================
Jon Snow | Senior | 12
Hodor | Senior | 13
Bran Stark | Senior | 67
Cersei | Intermediate | 14
Joffery | Intermediate | 19
Ramsay Bolton| Intermediate | 25
Wun-Wun | Junior | 14
Arya Stark | Junior | 64
Ned Stark | Junior | 125
I've found bits of code like this, which I /think/ does it:
SELECT StudentID, TestID, TestScore
FROM MyTable t
WHERE TestID IN
(
SELECT TOP 3 TestID
FROM MyTable
WHERE StudentID = t.StudentID
ORDER BY TestScore DESC, TestID
)
ORDER BY StudentID, TestScore DESC, TestID;
But I have no idea what this means, let alone how to adapt it to fit my needs.
Does anyone out there have an idea of how to get the desired outcome?
EDIT: subbed in the version that raises a syntax error.
SELECT [Members.Childs Name], [Members.Membership Type], [Results.Total Points]
FROM
(SELECT [Members.Childs Name], [Members.Membership Type], [Results.Total Points],
(SELECT Count(*) FROM [Results], [Members] sub
WHERE sub.Total Points <= Results.Total Points
AND sub.Membership Type = Members.Membership Type) As GroupRank
FROM Members, Results t) As main
WHERE main.GroupRank <= 3
ORDER BY [main.Membership Type],[main.Total Points DESC]
P.S. Unrelated, but the finale was amazing :)
Consider a correlated subquery that calculates an ordinal rank count, which you can then use as a derived table to select the top three:
SELECT main.StudentID, main.MembershipType, main.TestScore
FROM
(SELECT t.StudentID, t.MembershipType, t.TestScore,
(SELECT Count(*) FROM MyTable sub
WHERE sub.TestScore >= t.TestScore
AND sub.MembershipType = t.MembershipType) As GroupRank
FROM MyTable t) As main
WHERE main.GroupRank <= 3
ORDER BY main.MembershipType, main.TestScore DESC
To explain: GroupRank is calculated by a correlated subquery (a nested select in the column list) that, for each row of the outer query, counts the rows of the same MembershipType whose TestScore is greater than or equal to that row's TestScore. That alone is not enough, because you want to filter on the calculated GroupRank to pick the top three. So the entire query is nested inside a FROM clause as a derived table, and the outer query filters it for the top 3 and then orders by TestScore within each MembershipType.
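To see the rank count at work, here is a minimal sketch that runs the same query shape against a handful of made-up rows. The data is hypothetical, and SQLite (through Python's sqlite3 module) stands in for MS Access only so that the example is self-contained.

```python
import sqlite3

# Hypothetical sample data; SQLite is an assumption used in place of Access.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE MyTable (StudentID INTEGER, MembershipType TEXT, TestScore INTEGER)"
)
conn.executemany(
    "INSERT INTO MyTable VALUES (?, ?, ?)",
    [
        (1, "Senior", 67), (2, "Senior", 13), (3, "Senior", 12), (4, "Senior", 5),
        (5, "Junior", 125), (6, "Junior", 64), (7, "Junior", 14), (8, "Junior", 9),
    ],
)

query = """
SELECT main.StudentID, main.MembershipType, main.TestScore
FROM (SELECT t.StudentID, t.MembershipType, t.TestScore,
             -- rank = how many rows in the same group score at least as high
             (SELECT COUNT(*) FROM MyTable sub
              WHERE sub.TestScore >= t.TestScore
                AND sub.MembershipType = t.MembershipType) AS GroupRank
      FROM MyTable t) AS main
WHERE main.GroupRank <= 3
ORDER BY main.MembershipType, main.TestScore DESC
"""

top3 = conn.execute(query).fetchall()
for row in top3:
    print(row)
```

The lowest scorer in each group (StudentID 4 and 8 here) gets GroupRank 4 and is filtered out.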
In MS Access, you can save the entire FROM clause query as its own stored query and then filter that query for the top three:
SELECT g.StudentID, g.MembershipType, g.TestScore
FROM GroupRankQuery g
WHERE g.GroupRank <= 3
ORDER BY g.MembershipType, g.TestScore DESC
For multiple tables, use table aliases, which temporarily rename the table sources for easier referencing:
SELECT main.[Childs Name], main.[Membership Type], main.[Total Points]
FROM
(SELECT m.[Childs Name], m.[Membership Type], r.[Total Points],
(SELECT Count(*) FROM [Results] subR
INNER JOIN [Members] subM ON subR.StudentID = subM.StudentID
WHERE subR.[Total Points] >= r.[Total Points]
AND subM.[Membership Type] = m.[Membership Type]) As GroupRank
FROM Results r
INNER JOIN Members m ON r.StudentID = m.StudentID) As main
WHERE main.GroupRank <= 3
ORDER BY main.[Membership Type], main.[Total Points] DESC

Find gaps of a sequence in SQL without creating additional tables

I have a table invoices with a field invoice_number. This is what I get when I execute select invoice_number from invoices:
invoice_number
--------------
1
2
3
5
6
10
11
I want a SQL query that gives me the following result:
gap_start | gap_end
4 | 4
7 | 9
How can I write a SQL query that does this?
I am using PostgreSQL.
With modern SQL, this can easily be done using window functions:
select invoice_number + 1 as gap_start,
next_nr - 1 as gap_end
from (
select invoice_number,
lead(invoice_number) over (order by invoice_number) as next_nr
from invoices
) nr
where invoice_number + 1 <> next_nr;
SQLFiddle: http://sqlfiddle.com/#!15/1e807/1
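The idea behind lead() can be mirrored in a few lines of Python, which may help when checking the query's output by hand. This is only an illustrative sketch (the function name find_gaps is made up): each number is paired with the next one in sorted order, and a gap is reported wherever the two are not consecutive.

```python
def find_gaps(numbers):
    """Return (gap_start, gap_end) pairs for missing runs in a set of integers."""
    ordered = sorted(set(numbers))
    # zip(ordered, ordered[1:]) pairs each value with its successor,
    # which is what lead(invoice_number) over (order by invoice_number) provides.
    return [
        (current + 1, following - 1)
        for current, following in zip(ordered, ordered[1:])
        if following - current > 1
    ]


gaps = find_gaps([1, 2, 3, 5, 6, 10, 11])
print(gaps)  # [(4, 4), (7, 9)]
```

As in the SQL version, the last row has no successor and so never produces a gap.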
We can use a simpler technique to get all missing values first, by joining on a generated sequence column like so:
select series
from generate_series(1, 11, 1) series
left join invoices on series = invoices.invoice_number
where invoice_number is null;
This gets us the series of missing numbers, which can be useful on its own in some cases.
To get the gap start/end range, we can instead join the source table with itself:
select invoices.invoice_number + 1 as start,
min(fr.invoice_number) - 1 as stop
from invoices
left join invoices r on invoices.invoice_number = r.invoice_number - 1
left join invoices fr on invoices.invoice_number < fr.invoice_number
where r.invoice_number is null
and fr.invoice_number is not null
group by invoices.invoice_number,
r.invoice_number;
dbfiddle: https://dbfiddle.uk/?rdbms=postgres_14&fiddle=32c5f3c021b0f1a876305a2bd3afafc9
This is probably less optimised than the solutions above, but it could be useful on database servers that don't support the lead() function.
Full credit goes to this excellent page in SILOTA docs:
http://www.silota.com/docs/recipes/sql-gap-analysis-missing-values-sequence.html
I highly recommend reading it, as it explains the solution step by step.
I found another query:
select invoice_number + lag gap_start,
invoice_number + lead - 1 gap_end
from (select invoice_number,
invoice_number - lag(invoice_number) over w lag,
lead(invoice_number) over w - invoice_number lead
from invoices window w as (order by invoice_number)) x
where lag = 1 and lead > 1;

Querying Data Backward in ORACLE SQL

I have a simple question regarding Oracle SQL. I have this table:
WEEKNUM DATA
1 10
2 4
3 6
4 7
I want to make a view that shows this:
WEEKNUM DATA ACCUM_DATE
1 10 10
2 4 14
3 6 20
4 7 27
I spent hours on this simple one but had no luck.
Thanks a lot.
SELECT weeknum,
data,
sum(data) over (order by weeknum) accum_data
FROM your_table_name
should work. I'm using the sum analytic function here and assuming that you want to start with the smallest weeknum value and keep increasing the running total as the weeknum values increase. I'm also assuming that you never want to reset the accumulated sum. If you're trying to do something like generating an accumulated sum that restarts each year, you'd want to add a partition by to the analytic function.
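To make the two cases concrete, here is a small Python sketch of what the analytic sum computes, with and without a restart per partition. It is an illustration only: the function names and the year column in the second example are hypothetical and do not come from the question.

```python
from itertools import accumulate, groupby


def running_totals(rows):
    """rows: (weeknum, data) pairs -> (weeknum, data, accum) in weeknum order."""
    ordered = sorted(rows)
    totals = accumulate(data for _, data in ordered)
    return [(week, data, total) for (week, data), total in zip(ordered, totals)]


def running_totals_by_year(rows):
    """rows: (year, weeknum, data); the total restarts for each year,
    similar to adding PARTITION BY year to the analytic function."""
    result = []
    for _, group in groupby(sorted(rows), key=lambda r: r[0]):
        group = list(group)
        totals = accumulate(r[2] for r in group)
        result.extend((y, w, d, t) for (y, w, d), t in zip(group, totals))
    return result


weekly = running_totals([(1, 10), (2, 4), (3, 6), (4, 7)])
print(weekly)  # [(1, 10, 10), (2, 4, 14), (3, 6, 20), (4, 7, 27)]

yearly = running_totals_by_year(
    [(2020, 1, 10), (2020, 2, 4), (2021, 1, 6), (2021, 2, 7)]
)
print(yearly)
```

The first call matches the ACCUM_DATE column the question asks for; the second shows the total starting over at the first week of each hypothetical year.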
You could use a cross join in this case.
Query:
select
A.WEEKNUM
, A.DATA
, SUM(B.DATA) DA
from table1 A
cross join table1 B
WHERE A.WEEKNUM>=B.WeekNUM
GROUP BY A.WEEKNUM
, A.DATA
order by A.WEEKNUM
Result:
WEEKNUM DATA DA
1 10 10
2 4 14
3 6 20
4 7 27
Thanks guys, but I just found out this method works perfectly:
OVER (ORDER BY WEEKNUM ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS CUMULATIVE_WEIGHT
Or use a sub-select to calculate:
select WEEKNUM, DATA, (select sum(DATA) from tablename t2
where t2.weeknum <= t1.weeknum) as ACCUM_DATE
from tablename t1