SQL Server performance issue with while loops

I need advice on how to speed up my code. On a yearly basis, I'm supposed to count how the grades of some students change and express that as a percentage. Keep in mind that I have around 100k-150k records per year.
The end result should look like this: at 20150131, 2% of the students who had grade A finished with grade B, and so on.
Grade | Date     | B   | C
------+----------+-----+-----
A     | 20150131 | 2%  | 3%
B     | 20150131 | 88% | 85%
C     | 20150131 | 10% | 12%
A     | 20140131 | 2%  | 3%
B     | 20140131 | 88% | 85%
C     | 20140131 | 10% | 12%
A     | 20130131 | 2%  | 3%
B     | 20130131 | 88% | 85%
C     | 20130131 | 10% | 12%
The input is just each student and their grade on a certain date:
Student | Date     | Grade
--------+----------+------
1       | 20150131 | A
2       | 20150131 | C
3       | 20150131 | A
1       | 20140131 | B
2       | 20140131 | B
3       | 20140131 | A
My code looks like this:
WHILE @StartDateInt > @PeriodSpan
BEGIN
    while @y <= @CategoriesCount
    BEGIN
        set @CurrentGr = (Select Grade from #Categories where RowID = @y)
        set @CurrentGrCount = (Select COUNT(Students) from #TempTable where Period = @PeriodSpan and Grade = @CurrentGr)
        set @DefaultCurrentGr = (Select Grade from #Categories where RowID = @y)

        insert into Grade_MTRX (Student, Period, Grades_B, SessionID)
        select temp1.Grade, @PeriodNextSpan as Period, COUNT(Grades_B)/@CurrentGrCount as 'Grades_B', @SessionID
        from #TempTable temp1
        join #TempTable temp2 on temp1.Student = temp2.Student and temp1.Period + 10000 = temp2.Period
        where temp1.Grade = @CurrentGr and temp2.Grade = 'C' and temp1.Period = @PeriodSpan
        group by temp1.Grade, temp1.Period

        update Grade_MTRX set Grades_C = (
            select COUNT(Grades_C)/@CurrentGrCount
            from #TempTable
            where Grade = 'C' and Period = @PeriodNextSpan)
        where Category = @CurrentGr and Period = @PeriodNextSpan
    end
end
I understand SQL Server doesn't like WHILE loops and that they kill its performance, but I'm using a WHILE loop inside another WHILE loop: going over the years, then over each grade, and just counting. First I insert one row for the current grade, and then I keep updating that row until it's fully populated.
I do understand this is really bad; that's why I'm here, to learn a better way to accomplish this.
Thank you in advance!

150,000 records per year is really nothing. Let's say you had this Grade table:
CREATE TABLE Grade(
student_id INT,
date INT,
grade CHAR);
With this info:
student_id | date | grade
-----------+------+------
1          | 2013 | A
1          | 2014 | A
1          | 2015 | B
2          | 2013 | B
2          | 2014 | A
2          | 2015 | C
3          | 2013 | C
3          | 2014 | A
3          | 2015 | B
Then if you just run a query like:
SELECT this_year.date, last_year.grade AS last_year, this_year.grade AS this_year, COUNT(*) AS total,
(100.0 * COUNT(*)) / (SELECT COUNT(*) FROM Grade WHERE date = this_year.date) AS percent
FROM Grade AS this_year
INNER JOIN Grade AS last_year ON this_year.date = last_year.date + 1
AND this_year.student_id = last_year.student_id
GROUP BY this_year.date, this_year.grade, last_year.grade
ORDER BY 1, 2, 3;
you end up with these results:
date | last_year | this_year | total | percent
------+-----------+-----------+-------+---------------------
2014 | A | A | 1 | 33.3333333333333333
2014 | B | A | 1 | 33.3333333333333333
2014 | C | A | 1 | 33.3333333333333333
2015 | A | B | 2 | 66.6666666666666667
2015 | A | C | 1 | 33.3333333333333333
(5 rows)
A few million rows, or even tens of millions, shouldn't be any real trouble for this kind of query. If you need it to be faster still, look at window functions, which Postgres, Oracle, and SQL Server all support.
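To make the window-function suggestion concrete, here is a minimal sketch. It uses Python's built-in sqlite3 module (window functions need SQLite 3.25 or later) only so that it runs without a server; the SQL itself should carry over to SQL Server 2012+ with little or no change. LAG pairs each row with the same student's previous row, and the outer SUM(...) OVER computes each percentage relative to the students who held a given grade the year before, which is closer to the matrix the question asks for than a percentage of all students. The function name grade_transitions is illustrative.

```python
import sqlite3

# Sample data from the answer above: (student_id, date, grade).
ROWS = [
    (1, 2013, "A"), (1, 2014, "A"), (1, 2015, "B"),
    (2, 2013, "B"), (2, 2014, "A"), (2, 2015, "C"),
    (3, 2013, "C"), (3, 2014, "A"), (3, 2015, "B"),
]

QUERY = """
WITH trans AS (
    -- LAG pairs each row with the same student's previous row.
    SELECT student_id, date, grade,
           LAG(grade) OVER (PARTITION BY student_id ORDER BY date) AS prev_grade,
           LAG(date)  OVER (PARTITION BY student_id ORDER BY date) AS prev_date
    FROM Grade
),
counts AS (
    -- Keep only year-over-year pairs and count each (from, to) transition.
    SELECT date, prev_grade, grade, COUNT(*) AS total
    FROM trans
    WHERE prev_date = date - 1
    GROUP BY date, prev_grade, grade
)
-- Share of students with a given previous grade who moved to each grade.
SELECT date, prev_grade, grade, total,
       100.0 * total / SUM(total) OVER (PARTITION BY date, prev_grade) AS pct
FROM counts
ORDER BY date, prev_grade, grade
"""

def grade_transitions(rows):
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE Grade (student_id INTEGER, date INTEGER, grade TEXT)")
    conn.executemany("INSERT INTO Grade VALUES (?, ?, ?)", rows)
    result = conn.execute(QUERY).fetchall()
    conn.close()
    return result

if __name__ == "__main__":
    for row in grade_transitions(ROWS):
        print(row)
```

Everything is computed in one set-based statement, so there is no per-year or per-grade loop and no insert-then-update step.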

Related

Issue with SQL Group By and COALESCE on sqlite

I have the table below in an SQLite database. I want to create a line chart showing usage by product group.
Table: ProductUsageData
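UserID | ProductName | ProductGroup | Qty | RecordID
-------+-------------+--------------+-----+---------
1      | A1          | A            | 12  | 1
2      | A1          | A            | 12  | 1
1      | A2          | A            | 15  | 1
3      | A1          | A            | 12  | 2
2      | B1          | B            | 12  | 2
5      | B2          | B            | 5   | 2
1      | A1          | A            | 12  | 3
1      | A2          | A            | 15  | 3
4      | A1          | A            | 12  | 3
3      | C1          | C            | 12  | 3
2      | C2          | C            | 15  | 3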
Since I want a separate line for each ProductGroup, I am using the query below:
SELECT
SUM(Qty) as UsedQty,
ProductGroup,
RecordID
FROM ProductUsageData
GROUP BY ProductGroup, RecordID
ORDER BY RecordID ASC;
I get three records for A (one for each RecordID), but only one record each for B and C, because they are not used in every RecordID.
The problem is that when I put one line for each ProductGroup in the chart, the points for B and C are shown as per Qty in the first
My output is:
A 39 1
A 12 2
B 17 2
A 39 3
C 27 3
So the graph looks like this [chart image not preserved] instead of [expected chart image not preserved].
To fix this, I changed the query to use COALESCE so that Qty would be 0 when a ProductGroup is not used in a given recording:
SELECT
COALESCE(SUM(Qty), 0) as UsedQty,
ProductGroup,
RecordID
FROM ProductUsageData
GROUP BY ProductGroup, RecordID
ORDER BY RecordID ASC;
I was expecting the following output:
A 39 1
B 0 1
C 0 1
A 12 2
B 17 2
C 0 2
A 39 3
B 0 3
C 27 3
But I am getting the same output as before.
How can I correct the query to get the desired output?
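COALESCE alone cannot help here, because GROUP BY only produces a row for each combination that actually occurs in the table; a ProductGroup that is absent from a RecordID yields no row at all, rather than a row with a NULL sum. A typical solution is to first cross join two queries that select the distinct product groups and record IDs from the table; this gives you all possible combinations of productGroup and recordID.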
Then, you can bring in the original table with a left join, and aggregate:
select
g.productGroup,
coalesce(sum(p.qty), 0) qty,
r.recordID
from (select distinct productGroup from productUsageData) g
cross join (select distinct recordID from productUsageData) r
left join productUsageData p
on p.productGroup = g.productGroup
and p.recordID = r.recordID
group by r.recordID, g.productGroup
order by r.recordID, g.productGroup
In the real world, you might have separate reference tables for product groups and record IDs, which would make the query simpler and more efficient (it would avoid the SELECT DISTINCT subqueries).
Demo on DB Fiddle:
productGroup | qty | recordID
:----------- | :-- | :-------
A | 39 | 1
B | 0 | 1
C | 0 | 1
A | 12 | 2
B | 17 | 2
C | 0 | 2
A | 39 | 3
B | 0 | 3
C | 27 | 3
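A minimal sketch that reproduces this with Python's built-in sqlite3 module on the sample data from the question, running the original GROUP BY next to the cross-join version so that the missing combinations are visible. The run helper and the PLAIN/DENSE names are just for illustration.

```python
import sqlite3

# Sample rows from the question: (UserID, ProductName, ProductGroup, Qty, RecordID)
ROWS = [
    (1, "A1", "A", 12, 1), (2, "A1", "A", 12, 1), (1, "A2", "A", 15, 1),
    (3, "A1", "A", 12, 2), (2, "B1", "B", 12, 2), (5, "B2", "B", 5, 2),
    (1, "A1", "A", 12, 3), (1, "A2", "A", 15, 3), (4, "A1", "A", 12, 3),
    (3, "C1", "C", 12, 3), (2, "C2", "C", 15, 3),
]

# Plain GROUP BY: only combinations present in the table produce a row,
# so COALESCE never sees a NULL to replace.
PLAIN = """
SELECT ProductGroup, COALESCE(SUM(Qty), 0), RecordID
FROM ProductUsageData
GROUP BY ProductGroup, RecordID
ORDER BY RecordID, ProductGroup
"""

# Every (group, record) pair first, then LEFT JOIN the data back in.
DENSE = """
SELECT g.ProductGroup, COALESCE(SUM(p.Qty), 0), r.RecordID
FROM (SELECT DISTINCT ProductGroup FROM ProductUsageData) g
CROSS JOIN (SELECT DISTINCT RecordID FROM ProductUsageData) r
LEFT JOIN ProductUsageData p
  ON p.ProductGroup = g.ProductGroup AND p.RecordID = r.RecordID
GROUP BY r.RecordID, g.ProductGroup
ORDER BY r.RecordID, g.ProductGroup
"""

def run(query):
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE ProductUsageData (UserID INTEGER, ProductName TEXT, "
                 "ProductGroup TEXT, Qty INTEGER, RecordID INTEGER)")
    conn.executemany("INSERT INTO ProductUsageData VALUES (?, ?, ?, ?, ?)", ROWS)
    result = conn.execute(query).fetchall()
    conn.close()
    return result

if __name__ == "__main__":
    print(run(PLAIN))   # 5 rows: gaps for B and C
    print(run(DENSE))   # 9 rows: one per group per record
```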

To join the same table to match records from last year

I want to create a join that joins records of last year to the same period of the current year. All the data is in the same table.
Input table:
A | B    | C | D
a | 2017 | 1 | 10
a | 2017 | 2 | 20
a | 2017 | 3 | 5
a | 2016 | 1 | 100
a | 2016 | 2 | 50
a | 2016 | 3 | 1
Output table:
A | B    | C | D   | E
a | 2017 | 1 | 10  | 100
a | 2017 | 2 | 20  | 50
a | 2017 | 3 | 5   | 1
a | 2016 | 1 | 100 | NULL
a | 2016 | 2 | 50  | NULL
a | 2016 | 3 | 1   | NULL
There are several ways of doing this. One is a left join:
select t.*, tprev.d as e
from t left join
t tprev
on t.a = tprev.a and t.c = tprev.c and tprev.b = t.b - 1;
Another way is to use window functions:
select t.*, lag(t.d) over (partition by t.a, t.c order by b) as e
from t;
These return the same results on your data, but they are subtly different: the first looks specifically at the data from exactly one year before, while the second looks at the most recent earlier year that has data.
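The difference can be seen in a small sketch using Python's sqlite3 module (SQLite 3.25+ is needed for LAG). On the question's data both queries agree; after adding one hypothetical 2014 row (with no 2015 data for that period), the self-join returns NULL for 2016 while LAG reaches back to 2014. The run helper is illustrative.

```python
import sqlite3

# Rows from the question: a, b (year), c (period), d (value).
ROWS = [("a", 2017, 1, 10), ("a", 2017, 2, 20), ("a", 2017, 3, 5),
        ("a", 2016, 1, 100), ("a", 2016, 2, 50), ("a", 2016, 3, 1)]

# Exactly one year back: no match means NULL.
SELF_JOIN = """
SELECT t.a, t.b, t.c, t.d, tprev.d AS e
FROM t LEFT JOIN t tprev
  ON t.a = tprev.a AND t.c = tprev.c AND tprev.b = t.b - 1
ORDER BY t.b DESC, t.c
"""

# Most recent earlier year that has a row, however far back.
WINDOW = """
SELECT t.a, t.b, t.c, t.d,
       LAG(t.d) OVER (PARTITION BY t.a, t.c ORDER BY t.b) AS e
FROM t
ORDER BY t.b DESC, t.c
"""

def run(query, rows):
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE t (a TEXT, b INTEGER, c INTEGER, d INTEGER)")
    conn.executemany("INSERT INTO t VALUES (?, ?, ?, ?)", rows)
    result = conn.execute(query).fetchall()
    conn.close()
    return result

if __name__ == "__main__":
    gap = ROWS + [("a", 2014, 1, 7)]  # hypothetical row; 2015 is missing
    print(run(SELF_JOIN, gap)[3])     # 2016/1 has no 2015 match
    print(run(WINDOW, gap)[3])        # 2016/1 picks up the 2014 value
```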

Complex SQL QUERY in MS ACCESS across three tables with sum()

I'm working on a large financial project, building SQL queries in MS Access.
These are my tables.
CE
anomes | cod_c | Name | NIP | Nip_Grupo
201706 | 1     | ABC  | 10  | 50
201706 | 1     | DDD  | 12  | 50
201706 | 2     | CCC  | 11  | 50
O
anomes | cod_c | ID_O   | Nip_1 | val_1 | val_2
201706 | 1     | ACA_00 | 10    | 500   | 200
201706 | 1     | ACB_01 | 12    | 100   | 300
201706 | 2     | ACC_07 | 11    | 50    | 400
OS
anomes | cod_c | ID_O   | Stage
201706 | 1     | ACA_00 | 1
201706 | 1     | ACB_01 | 2
201706 | 2     | ACC_07 | 3
What I need is a list like this
Name | Sum (val1 + val2) where stage =1 | Sum (val1 + val2) where stage =2 |
ABC | x | x
DDD | x | x
CCC | x | x
The list should be produced by entering only Nip_Grupo (which links together the companies in table CE) and AnoMes, a year-month (yyyymm) time code.
The second table (O) holds operations and their participants. I want Nip_1 to match NIP in CE, and then to link each of that company's operations to its stage in OS, so that I can sum the total operation values per company (CE) in a group, per stage.
It seems pretty straightforward, but I don't always have records in table OS that link a stage to an entry in table O; in that case, I need the result to show zero.
This is my query so far (a simplified version to fit my example):
SELECT CE.Name, (Sum([O].[val1])+Sum(val2))
FROM (CE
INNER JOIN O ON (CE.Cod_Contraparte = O.Cod_Contraparte) AND
(CE.AnoMes = O.AnoMes) AND (CE.Nip = O.Nip_1Titular))
LEFT JOIN OPERACOES_STAGING_lnk AS OS ON (O.AnoMes = OS.AnoMes) AND
(O.ID_Operacao = OS.ID_Operacao)
WHERE (((CE.Nip_Grupo)=[enter nip:]) AND ((CE.anomes)=[enter anomes:]) AND
((CE.Nip)=[O].[Nip_1])) AND ((OS.Stage)=2)
GROUP BY CE.Name
ORDER BY CE.Name
This query returns the sum only for stage 2, and only when there are records in table OS. Since many of my operations are not linked to a stage, I need those to show zero, and I need a full list of the companies in the group (Nip_Grupo).
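A conditional aggregate may help. The stage test moves out of the WHERE clause and into each SUM (IIF returns 0 when the stage does not match), and OS is joined with a LEFT JOIN so that operations without a staging record are kept and simply contribute 0: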
SELECT CE.name,
SUM( IIF( OS.stage=1, O.val1+O.val2,0)) as stage1,
SUM( IIF( OS.stage=2, O.val1+O.val2,0)) as stage2
FROM (CE
INNER JOIN O ON (CE.Cod_Contraparte = O.Cod_Contraparte) AND (CE.AnoMes = O.AnoMes) AND (CE.Nip = O.Nip_1Titular) AND (CE.Nip = O.Nip_1))
LEFT JOIN OPERACOES_STAGING_lnk AS OS ON (O.AnoMes = OS.AnoMes) AND (O.ID_Operacao = OS.ID_Operacao)
WHERE ((CE.Nip_Grupo)=[enter nip:]) AND ((CE.anomes)=[enter anomes:])
GROUP BY CE.name
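The pattern can be tried out in a minimal sketch with Python's sqlite3, using CASE WHEN in place of Access's IIF and the simplified column names from the question's sample tables. The company EEE and its operation XYZ_01 are hypothetical rows added to show an operation with no OS record: with the LEFT JOIN it still appears with zero totals, and with an INNER JOIN on OS it drops out. The stage_totals helper is illustrative.

```python
import sqlite3

CE = [  # (anomes, cod_c, name, nip, nip_grupo)
    (201706, 1, "ABC", 10, 50), (201706, 1, "DDD", 12, 50), (201706, 2, "CCC", 11, 50),
    (201706, 3, "EEE", 13, 50),  # hypothetical company, for illustration
]
O = [  # (anomes, cod_c, id_o, nip_1, val_1, val_2)
    (201706, 1, "ACA_00", 10, 500, 200), (201706, 1, "ACB_01", 12, 100, 300),
    (201706, 2, "ACC_07", 11, 50, 400),
    (201706, 3, "XYZ_01", 13, 10, 20),  # hypothetical operation with no OS row
]
OS = [  # (anomes, cod_c, id_o, stage)
    (201706, 1, "ACA_00", 1), (201706, 1, "ACB_01", 2), (201706, 2, "ACC_07", 3),
]

# The stage test lives inside each SUM, not in WHERE; OS is LEFT JOINed,
# so a missing stage gives OS.stage = NULL and the CASE falls through to 0.
QUERY = """
SELECT CE.name,
       SUM(CASE WHEN OS.stage = 1 THEN O.val_1 + O.val_2 ELSE 0 END) AS stage1,
       SUM(CASE WHEN OS.stage = 2 THEN O.val_1 + O.val_2 ELSE 0 END) AS stage2
FROM CE
JOIN O ON CE.cod_c = O.cod_c AND CE.anomes = O.anomes AND CE.nip = O.nip_1
LEFT JOIN OS ON O.anomes = OS.anomes AND O.id_o = OS.id_o
WHERE CE.nip_grupo = :nip_grupo AND CE.anomes = :anomes
GROUP BY CE.name
ORDER BY CE.name
"""

def stage_totals(nip_grupo, anomes, query=QUERY):
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE CE (anomes INTEGER, cod_c INTEGER, name TEXT, nip INTEGER, nip_grupo INTEGER)")
    conn.execute("CREATE TABLE O (anomes INTEGER, cod_c INTEGER, id_o TEXT, nip_1 INTEGER, val_1 INTEGER, val_2 INTEGER)")
    conn.execute("CREATE TABLE OS (anomes INTEGER, cod_c INTEGER, id_o TEXT, stage INTEGER)")
    conn.executemany("INSERT INTO CE VALUES (?, ?, ?, ?, ?)", CE)
    conn.executemany("INSERT INTO O VALUES (?, ?, ?, ?, ?, ?)", O)
    conn.executemany("INSERT INTO OS VALUES (?, ?, ?, ?)", OS)
    rows = conn.execute(query, {"nip_grupo": nip_grupo, "anomes": anomes}).fetchall()
    conn.close()
    return rows

if __name__ == "__main__":
    print(stage_totals(50, 201706))
```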

Rows in table with specific sum of a column

I have a table containing some payments looking something like this:
id | from | to | amount
--------------------------
1 | 125 | 135 | 2.4
2 | 123 | 134 | 1.7
3 | 124 | 138 | 4.8
4 | 118 | 119 | 3.9
5 | 56 | 254 | 23.5
...
I need to know whether there is a way to write an SQL query that tells me if there is a series of consecutive rows whose amounts sum to a certain value. For example, for 6.5 it would return rows 2 to 3; for 12.8, rows 1 to 4; and so on.
I am absolutely stuck and would appreciate some help.
I would approach this as follows. First, calculate the cumulative sum. Then, the condition that a run of consecutive rows has a particular sum is equivalent to saying that the difference between two cumulative sums (the running total at the last row of the run minus the running total just before its first row) equals that value.
with p as (
      select p.*, sum(amount) over (order by id) as cumamount
      from payments p
     )
select p1.id as start_id, p2.id as end_id
from p p1 join
     p p2
     on p1.id <= p2.id and
        ( p2.cumamount - (p1.cumamount - p1.amount) ) = 6.5;
Note: this will probably not work if amount is stored as a floating-point number, because of tiny rounding errors. If amount were an integer it would be fine, but it clearly is not; a fixed-point type such as DECIMAL should be OK.
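Here is a minimal sketch of the cumulative-sum idea in Python's sqlite3 (SQLite 3.25+ for SUM ... OVER), following the note above by storing amounts as integer cents so that equality is exact. The sum of rows i through j is the running total at j minus the running total just before i, that is cum(j) - cum(i) + amount(i). The column name amount_cents and the find_runs helper are assumptions for the example.

```python
import sqlite3

# Payments from the question, with amount stored as integer cents (fixed point).
PAYMENTS = [(1, 125, 135, 240), (2, 123, 134, 170), (3, 124, 138, 480),
            (4, 118, 119, 390), (5, 56, 254, 2350)]

QUERY = """
WITH p AS (
    SELECT id, amount_cents,
           SUM(amount_cents) OVER (ORDER BY id) AS cum
    FROM payments
)
-- Rows p1.id..p2.id sum to cum(p2) minus the running total just before p1.
SELECT p1.id AS start_id, p2.id AS end_id
FROM p AS p1
JOIN p AS p2
  ON p1.id <= p2.id
 AND p2.cum - (p1.cum - p1.amount_cents) = ?
ORDER BY p1.id
"""

def find_runs(target_cents):
    conn = sqlite3.connect(":memory:")
    conn.execute('CREATE TABLE payments (id INTEGER PRIMARY KEY, "from" INTEGER, '
                 '"to" INTEGER, amount_cents INTEGER)')
    conn.executemany("INSERT INTO payments VALUES (?, ?, ?, ?)", PAYMENTS)
    rows = conn.execute(QUERY, (target_cents,)).fetchall()
    conn.close()
    return rows

if __name__ == "__main__":
    print(find_runs(650))    # 6.50
    print(find_runs(1280))   # 12.80
```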
;with numbers as (select number from master..spt_values where type='p' and number between 1 and (Select MAX(id) from yourtable)),
ranges as ( select n1.number as start, n2.number as finish from numbers n1 cross join numbers n2 where n1.number<=n2.number)
select yourtable.* from yourtable
inner join
(
select start, finish
from ranges
inner join yourtable on id between start and finish
group by start, finish
having SUM(amount)=12.8
) results
on yourtable.id between start and finish
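SQLite has no master..spt_values, so this minimal sketch of the same brute-force idea swaps in a recursive CTE as the numbers table (Python's sqlite3, amounts as integer cents). It enumerates every (start, finish) range and keeps those whose total matches; because it examines every pair of ids, the work grows quickly with table size, so the cumulative-sum approach is likely to scale better. The find_ranges helper and column names are illustrative.

```python
import sqlite3

# Payments from the question, with amount stored as integer cents.
PAYMENTS = [(1, 125, 135, 240), (2, 123, 134, 170), (3, 124, 138, 480),
            (4, 118, 119, 390), (5, 56, 254, 2350)]

QUERY = """
WITH RECURSIVE numbers(n) AS (
    -- Stand-in for master..spt_values: 1 .. MAX(id).
    SELECT 1
    UNION ALL
    SELECT n + 1 FROM numbers WHERE n < (SELECT MAX(id) FROM payments)
),
ranges AS (
    -- Every candidate (start, finish) pair.
    SELECT a.n AS start_id, b.n AS end_id
    FROM numbers a JOIN numbers b ON a.n <= b.n
)
SELECT start_id, end_id
FROM ranges
JOIN payments ON payments.id BETWEEN start_id AND end_id
GROUP BY start_id, end_id
HAVING SUM(amount_cents) = ?
ORDER BY start_id, end_id
"""

def find_ranges(target_cents):
    conn = sqlite3.connect(":memory:")
    conn.execute('CREATE TABLE payments (id INTEGER PRIMARY KEY, "from" INTEGER, '
                 '"to" INTEGER, amount_cents INTEGER)')
    conn.executemany("INSERT INTO payments VALUES (?, ?, ?, ?)", PAYMENTS)
    rows = conn.execute(QUERY, (target_cents,)).fetchall()
    conn.close()
    return rows

if __name__ == "__main__":
    print(find_ranges(1280))
```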

SQL - Check for price changes over multiple periods

I am using an MS Access database and am trying to build a query that gives an overview of securities whose price changed by more than XX% in each of the last XY consecutive months. I have tried all kinds of subqueries but cannot get my head around this.
Below is a simplified example. PriceTable contains three attributes: a period, a security ID, and the price of the security in that period. For the last period (in this case 201210), I am looking for all securities with a price change of more than plus or minus XX% (in this case 3%) in each of the last XY (in this case 3) months. The three columns on the right contain supporting calculations to clarify this:
Delta: the price change from one period to the next, $\Delta_t = (P_t - P_{t-1}) / P_{t-1}$
Delta>Threshold: checks whether the change is larger than (plus or minus) 3% (parameter XX)
Counter: checks whether the price change is larger than 3% for 3 (parameter XY) consecutive months
In the example below, the query should show only SecID 1.
PriceTable Supporting calculations
+--------+------+-------+--------+-----------------+---------+
| Period | SecID | Price | Delta% | Delta>Threshold | Counter |
+--------+------+-------+--------+-----------------+---------+
| 201206 | 1 | 105 | 0% | N | 0 |
| 201207 | 1 | 100 | -4.76% | Y | 1 |
| 201208 | 1 | 95 | -5% | Y | 2 |
| 201209 | 1 | 90 | -5.26% | Y | 3 |
| 201210 | 1 | 85 | -5.56% | Y | 4 |
| 201207 | 2 | 95 | 0% | N | 0 |
| 201208 | 2 | 100 | 5.26% | Y | 1 |
| 201209 | 2 | 103 | 3% | N | 0 |
| 201210 | 2 | 99 | -3.88% | Y | 1 |
+--------+------+-------+--------+-----------------+---------+
I hope someone can help me out!
Thanks in advance,
Paul
I don't have Access to hand, but here's a query for SQL Server:
The inner derived table h is essentially your helper table. The outer part joins on 3 periods and returns a row when the count with threshold 'Y' is 3.
With this approach you also need functions that work out the next period and the number of periods between two endpoints. These should be fairly easy to write in VBA. Alternatively, you could create a period table with a sequence number to avoid them. Here are the SQL Server functions:
-- Function that works out the next period
-- i.e. if you supply 201112, it will return 201201
Create Function dbo.NextPeriod(@Period As Int) Returns Int As
Begin
    Declare
        @Month int,
        @Ret int = Null
    If @Period Is Not Null
    Begin
        Set @Month = @Period - 100 * (@Period / 100)
        If @Month < 12
            Set @Ret = @Period + 1
        Else
            Set @Ret = @Period - @Month + 101
    End
    Return @Ret
End;
Go
-- Function that works out how many periods between the two endpoints
-- dbo.PeriodCount(201112, 201201) = 1
Create Function dbo.PeriodCount(@StartPeriod As Int, @EndPeriod As Int) Returns Int As
Begin
    Declare
        @StartMonth int,
        @EndMonth int,
        @StartYear int,
        @EndYear int,
        @Ret int = Null
    If @StartPeriod Is Not Null And @EndPeriod Is Not Null
    Begin
        Set @StartMonth = @StartPeriod - 100 * (@StartPeriod / 100)
        Set @StartYear = (@StartPeriod - @StartMonth) / 100
        Set @EndMonth = @EndPeriod - 100 * (@EndPeriod / 100)
        Set @EndYear = (@EndPeriod - @EndMonth) / 100
        Set @Ret = (12 * @EndYear + @EndMonth) - (12 * @StartYear + @StartMonth)
    End
    Return @Ret
End;
Go
-- Show periods that are the start of a run
-- of @Periods periods with threshold
-- of at least @Threshold
Declare @Threshold Decimal(10, 2) = 3
Declare @Periods int = 3
Select
    p0.SecurityID,
    p0.Period
From
    PriceTable p0
        Inner Join (
            Select
                p1.*,
                100.0 * (p1.Price - p2.Price) / p2.Price As Delta,
                Case When Abs(100.0 * (p1.Price - p2.Price) / p2.Price) > @Threshold Then 'Y' Else 'N' End As OverThreshold
            From
                PriceTable p1
                    Left Outer Join
                PriceTable p2
                    On p1.SecurityID = p2.SecurityID And
                       p1.Period = dbo.NextPeriod(p2.Period)
        ) h
            On p0.SecurityID = h.SecurityID And
               dbo.PeriodCount(p0.Period, h.Period) Between 0 And (@Periods - 1) And
               h.OverThreshold = 'Y'
Group By
    p0.SecurityID,
    p0.Period
Having
    Count(*) = @Periods
Order By
    p0.SecurityID,
    p0.Period;
This shows how the method works; you can simplify it like so:
Declare @Threshold Decimal(10, 2) = 3
Declare @Periods int = 3
Select
    p0.SecurityID,
    p0.Period
From
    PriceTable p0
        Inner Join
    PriceTable p1
        On p0.SecurityID = p1.SecurityID And
           dbo.PeriodCount(p0.Period, p1.Period) Between 0 And (@Periods - 1)
        Inner Join
    PriceTable p2
        On p1.SecurityID = p2.SecurityID And
           p1.Period = dbo.NextPeriod(p2.Period)
Where
    Abs(100.0 * (p1.Price - p2.Price) / p2.Price) > @Threshold
Group By
    p0.SecurityID,
    p0.Period
Having
    Count(*) = @Periods
Order By
    p0.SecurityID,
    p0.Period;
http://sqlfiddle.com/#!3/8eff9/2
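Access can call VBA functions from a query, and SQLite can similarly call Python functions registered with sqlite3's create_function, so this minimal sketch ports NextPeriod and PeriodCount to Python and runs the simplified query on the sample data. The snake_case names and run_starts are illustrative; 100.0 is used so that the percentage is computed with floating-point rather than integer division. Like the SQL above, it returns the start period of each qualifying run, not just the last period.

```python
import sqlite3

def next_period(period):
    """Next yyyymm period, e.g. 201112 -> 201201 (mirrors dbo.NextPeriod)."""
    if period is None:
        return None
    month = period % 100
    return period + 1 if month < 12 else period - month + 101

def period_count(start, end):
    """Months from start to end, e.g. (201112, 201201) -> 1 (mirrors dbo.PeriodCount)."""
    if start is None or end is None:
        return None
    start_year, start_month = divmod(start, 100)
    end_year, end_month = divmod(end, 100)
    return (12 * end_year + end_month) - (12 * start_year + start_month)

# Sample data from the question: (Period, SecurityID, Price).
PRICES = [(201206, 1, 105), (201207, 1, 100), (201208, 1, 95), (201209, 1, 90),
          (201210, 1, 85), (201207, 2, 95), (201208, 2, 100), (201209, 2, 103),
          (201210, 2, 99)]

# The simplified query, with threshold and run length as named parameters.
QUERY = """
SELECT p0.SecurityID, p0.Period
FROM PriceTable p0
JOIN PriceTable p1
  ON p0.SecurityID = p1.SecurityID
 AND period_count(p0.Period, p1.Period) BETWEEN 0 AND :periods - 1
JOIN PriceTable p2
  ON p1.SecurityID = p2.SecurityID
 AND p1.Period = next_period(p2.Period)
WHERE ABS(100.0 * (p1.Price - p2.Price) / p2.Price) > :threshold
GROUP BY p0.SecurityID, p0.Period
HAVING COUNT(*) = :periods
ORDER BY p0.SecurityID, p0.Period
"""

def run_starts(threshold=3, periods=3):
    conn = sqlite3.connect(":memory:")
    conn.create_function("next_period", 1, next_period)
    conn.create_function("period_count", 2, period_count)
    conn.execute("CREATE TABLE PriceTable (Period INTEGER, SecurityID INTEGER, Price REAL)")
    conn.executemany("INSERT INTO PriceTable VALUES (?, ?, ?)", PRICES)
    rows = conn.execute(QUERY, {"threshold": threshold, "periods": periods}).fetchall()
    conn.close()
    return rows

if __name__ == "__main__":
    print(run_starts())
```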
@Laurence: please find the code below.
Public Function NextPer(Nperiod As Long) As Long
    Dim PMonth As Long
    ' Month part of a yyyymm period, e.g. 201207 -> 7
    PMonth = Nperiod Mod 100
    If PMonth < 12 Then
        NextPer = Nperiod + 1
    Else
        ' December rolls over to January of the next year
        NextPer = Nperiod - PMonth + 101
    End If
End Function
Public Function PCount(SPeriod As Long, EPeriod As Long) As Long
    ' Months from SPeriod to EPeriod, e.g. PCount(201112, 201201) = 1
    Dim SMonth As Long
    Dim EMonth As Long
    Dim SYear As Long
    Dim EYear As Long
    SMonth = SPeriod Mod 100
    SYear = SPeriod \ 100
    EMonth = EPeriod Mod 100
    EYear = EPeriod \ 100
    PCount = (12 * EYear + EMonth) - (12 * SYear + SMonth)
End Function
And the query (the parameters are hardcoded for the moment):
SELECT p0.SecurityID, p0.Period
FROM (PriceTable AS p0
INNER JOIN PriceTable AS p1 ON (p0.SecurityID = p1.SecurityID)
AND (PCount(p0.Period,p1.Period)>=0) AND (PCount(p0.Period,p1.Period)<=2))
INNER JOIN PriceTable AS p2 ON (p1.SecurityID = p2.SecurityID)
AND (p1.Period = NextPer(p2.Period))
WHERE Abs(100*(p1.Price-p2.Price)/p2.Price)>3
GROUP BY p0.SecurityID, p0.Period
HAVING Count(*) = 3
ORDER BY p0.SecurityID asc , p0.Period asc;
+1 for trying to do this in the query itself, without UDFs. Out of interest, I put some effort into finding a solution. I admit the following code is not the most efficient (with all those IIFs, performance is not great).
Getting the first five columns of your table above is pretty straightforward; I have saved that as qryDelta. The tricky part of the question is getting Counter into the same result table. The second query, qryCounter, gives you the final table you expected.
qryDelta
SELECT a.period, a.secid, a.price,
iif(isnull(ROUND((a.price-b.price)/b.price*100,2)),0,
ROUND((a.price-b.price)/b.price*100,2)) AS Delta,
iif(abs((a.price-b.price)/b.price)*100>3,"Y","N") AS Threshold,
SUM(iif(abs((a.price-b.price)/b.price)*100>3,1,0)) AS [Counter]
FROM tbldelta AS a LEFT JOIN tbldelta AS b
ON (a.secid = b.secid) AND (a.period = b.period + 1)
GROUP BY a.period, a.secid, a.price,
iif(isnull(ROUND((a.price-b.price)/b.price*100,2)),0,
ROUND((a.price-b.price)/b.price*100,2)),
iif(abs((a.price-b.price)/b.price)*100>3,"Y","N")
ORDER BY a.secid, a.period;
qryCounter
SELECT q.period, q.secid, q.price, q.delta, q.threshold,
SUM(iif(q.counter=0,0,1)) AS Counter
FROM qryDelta q
LEFT JOIN tblDelta t
ON q.secid = t.secid
AND (t.period < q.period)
GROUP BY q.secid, q.period, q.price, q.delta, q.threshold
However, I too ran into the issue with SecID = 2, Period = 201208 showing a total of 2, so I changed my query conditions. Now the results seem to show the cumulative periodic count correctly, except for SecID = 2, Period = 201210, which shows a total of 3. Perhaps you could shed some light on this. From most of the experiments I have done, it looks more or less like a problem with the JOIN and the period conditions we are trying to use here.
PS: If you decide to build user-defined functions (UDFs), consider whether you are using Excel or Access as the front end. With Excel, you have to make the necessary arrangements to call your Access UDF and query from Excel. If you are using Access as both front end and back end, then of course a UDF is much easier to handle.
I solved it using just SQL. Here's how I did it.
First of all, we need a query that, for each row, shows the distance in rows from the last period:
Period SecID Price Row
===============================
201206 1 105 4
201207 1 100 3
201208 1 95 2
201209 1 90 1
201210 1 85 0
201207 2 95 3
201208 2 100 2
201209 2 103 1
201210 2 99 0
We will call it PriceTable_Ordered:
SELECT
PriceTable.Period,
PriceTable.SecID,
PriceTable.Price,
(select count(*) from PriceTable PriceTable_1
where PriceTable_1.SecID = PriceTable.SecID
AND PriceTable_1.Period > PriceTable.Period) AS Row
FROM PriceTable;
Now, to calculate Delta and show whether it is above the threshold, we can use this query, which we will call PriceTable_Total1:
SELECT
PriceTable_Ordered.*,
PriceTable_Ordered_1.Price,
(PriceTable_Ordered.Price-PriceTable_Ordered_1.Price)/(PriceTable_Ordered_1.Price) AS Delta,
iif((ABS(Delta*100)>3),"Y","N") AS DeltaThreesold
FROM
PriceTable_Ordered LEFT JOIN PriceTable_Ordered AS PriceTable_Ordered_1
ON (PriceTable_Ordered.SecID = PriceTable_Ordered_1.SecID)
AND (PriceTable_Ordered.[Row]=PriceTable_Ordered_1.[Row]-1);
And this returns:
Period SecID Price1 Row Price2 Delta DeltaThreesold
=========================================================
201206 1 105 4 N
201207 1 100 3 105 -4,76 Y
201208 1 95 2 100 -0,05 Y
201209 1 90 1 95 -5,26 Y
201210 1 85 0 90 -5,55 Y
201207 2 95 3 N
201208 2 100 2 95 5,26 Y
201209 2 103 1 100 0,03 N
201210 2 99 0 103 -3,88 Y
Now we can create PriceTable_Total2 based on PriceTable_Total1:
SELECT
PriceTable_Total1.Period,
PriceTable_Total1.SecID,
PriceTable_Total1.PriceTable_Ordered.Price,
PriceTable_Total1.Delta,
PriceTable_Total1.DeltaThreesold,
PriceTable_Total1.Row,
(select min(row) from PriceTable_Total1 PriceTable_Total1_1
where PriceTable_Total1.SecID = PriceTable_Total1_1.SecId
and PriceTable_Total1.Row < PriceTable_Total1_1.Row
and PriceTable_Total1_1.DeltaThreesold="N") AS MinN,
IIf([DeltaThreesold]="Y",[MinN]-[row],0) AS CountRows
FROM PriceTable_Total1;
We select all the columns of PriceTable_Total1; then, for each row, we find the minimum row number greater than the current row where the threshold flag is "N" (the nearest earlier period that was not over the threshold). If the current row is over the threshold, the count we need is just the difference between the two row numbers; otherwise it is 0. Here's the result:
Period SecID Price Delta DelTh Row MinN CountRows
========================================================
201206 1 105 N 4 0
201207 1 100 -4,76 Y 3 4 1
201208 1 95 -0,05 Y 2 4 2
201209 1 90 -5,26 Y 1 4 3
201210 1 85 -5,55 Y 0 4 4
201207 2 95 N 3 0
201208 2 100 5,26 Y 2 3 1
201209 2 103 0,03 N 1 3 0
201210 2 99 -3,88 Y 0 1 1
You can then hide the columns you don't need. This query should work even across a year boundary, and even if some periods are missing.
SELECT PriceTable_Total2.Period, PriceTable_Total2.SecID
FROM PriceTable_Total2
WHERE (PriceTable_Total2.Period=
(select max(period)
from PriceTable
where PriceTable.SecID=PriceTable_Total2.SecID)
AND (PriceTable_Total2.[CountRows])>=3);
This returns:
Period SecID
201210 1
That means only SecID 1 has been over the threshold for at least 3 consecutive months as of the last period.
I hope this answer is correct; it was fun to try to solve it!
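The chain of saved queries can be reproduced as SQLite views in a minimal sketch using Python's sqlite3. Because SQLite does not resolve an alias defined in the same SELECT list (as Access does with Delta and MinN), the expressions are repeated or moved into a subquery, and the Row column is renamed rn to stay clear of SQLite's ROW keyword. View and function names are illustrative.

```python
import sqlite3

# Sample data from the question: (Period, SecID, Price).
PRICES = [(201206, 1, 105), (201207, 1, 100), (201208, 1, 95), (201209, 1, 90),
          (201210, 1, 85), (201207, 2, 95), (201208, 2, 100), (201209, 2, 103),
          (201210, 2, 99)]

VIEWS = """
-- Like PriceTable_Ordered: rn = number of later periods for the same security.
CREATE VIEW ordered AS
SELECT p.Period, p.SecID, p.Price,
       (SELECT COUNT(*) FROM PriceTable p1
        WHERE p1.SecID = p.SecID AND p1.Period > p.Period) AS rn
FROM PriceTable p;

-- Like PriceTable_Total1: 'Y' when the change from the previous row exceeds 3%.
CREATE VIEW total1 AS
SELECT o.Period, o.SecID, o.rn,
       CASE WHEN ABS(100.0 * (o.Price - prev.Price) / prev.Price) > 3
            THEN 'Y' ELSE 'N' END AS flag
FROM ordered o
LEFT JOIN ordered prev ON o.SecID = prev.SecID AND o.rn = prev.rn - 1;

-- Like PriceTable_Total2: distance back to the nearest earlier 'N' row.
CREATE VIEW total2 AS
SELECT s.Period, s.SecID, s.rn, s.flag,
       CASE WHEN s.flag = 'Y' THEN s.min_n - s.rn ELSE 0 END AS count_rows
FROM (SELECT t.*,
             (SELECT MIN(t1.rn) FROM total1 t1
              WHERE t1.SecID = t.SecID AND t1.rn > t.rn AND t1.flag = 'N') AS min_n
      FROM total1 t) AS s;
"""

# Final step: keep each security's last period if its run is long enough.
FINAL = """
SELECT Period, SecID FROM total2
WHERE Period = (SELECT MAX(Period) FROM PriceTable
                WHERE PriceTable.SecID = total2.SecID)
  AND count_rows >= ?
"""

def open_db():
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE PriceTable (Period INTEGER, SecID INTEGER, Price REAL)")
    conn.executemany("INSERT INTO PriceTable VALUES (?, ?, ?)", PRICES)
    conn.executescript(VIEWS)
    return conn

def counters(conn):
    return conn.execute(
        "SELECT SecID, Period, count_rows FROM total2 ORDER BY SecID, Period").fetchall()

def over_threshold_runs(conn, periods=3):
    return conn.execute(FINAL, (periods,)).fetchall()

if __name__ == "__main__":
    db = open_db()
    print(counters(db))
    print(over_threshold_runs(db))
```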