SQL Complex Summation


YRMO, ID, Value
201301, 1, 2
201302, 1, 3
201303, 1, 4
201301, 2, 2
201302, 2, 3
201303, 2, 4
201304, 2, 5
201305, 2, 6
201306, 2, 7
201307, 2, 8
201308, 2, 9
201309, 2, 10
201310, 2, 11
201311, 2, 12
201312, 2, 13
I have to calculate Exp1 for every ID. For ID = 2, for example:
Exp1 = (sum of Value from 201307 to 201312)/6 - (sum of Value from 201301 to 201306)/6
Some IDs do not have a value for every month, and some have only one value; missing months count as 0.
Is this possible in SQL?
For ID 2: Exp1 = (13+12+11+10+9+8)/6 - (7+6+5+4+3+2)/6
For ID 1: Exp1 = (0+0+0+0+0+0)/6 - (2+3+4+0+0+0)/6
This has to be done for all IDs.

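select
    ID,
    -- divide by 6.0 so that engines with integer division do not truncate the averages
    sum(case when YRMO between 201307 and 201312 then value else 0 end) / 6.0
  - sum(case when YRMO between 201301 and 201306 then value else 0 end) / 6.0 as EXP1
from TABLE -- placeholder: substitute your own table name
group by ID;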

select
    id,
    sum(value) / 6 exp1
from (
    select
        id,
        case when YRMO between '201301' and '201306' then -value else value end value
    from `table`
    where YRMO between '201301' and '201312'
) q
group by id
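For a quick sanity check of the conditional-aggregation idea, here is a minimal sketch that runs the same kind of query against an in-memory SQLite database from Python. The table name monthly is made up for the example, and the sample rows assume the months run from 201301 to 201312, as the worked arithmetic in the question implies. The division uses 6.0 on purpose, because SQLite truncates integer division.

```python
import sqlite3

# Sample rows as (yrmo, id, value); months assumed to be 201301..201312.
rows = [(201301, 1, 2), (201302, 1, 3), (201303, 1, 4)]
rows += [(201300 + m, 2, m + 1) for m in range(1, 13)]  # ID 2: values 2..13

conn = sqlite3.connect(":memory:")
# "monthly" is a hypothetical table name used only for this sketch.
conn.execute("CREATE TABLE monthly (yrmo INTEGER, id INTEGER, value INTEGER)")
conn.executemany("INSERT INTO monthly VALUES (?, ?, ?)", rows)

query = """
SELECT id,
       SUM(CASE WHEN yrmo BETWEEN 201307 AND 201312 THEN value ELSE 0 END) / 6.0
     - SUM(CASE WHEN yrmo BETWEEN 201301 AND 201306 THEN value ELSE 0 END) / 6.0 AS exp1
FROM monthly
GROUP BY id
ORDER BY id
"""
result = dict(conn.execute(query).fetchall())
print(result)  # ID 1 -> -1.5, ID 2 -> 6.0
```

Months with no row simply contribute nothing to either sum, which is the same as counting them as 0.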

Related

Calculating distance using geometry of x and y location in SQL

I'm using SQL Server and I need to calculate the distance between the x and y of a frame and the previous x and y of a frame where the day, team, and member are all the same. Currently, I have this code that works but doesn't accomplish what I need. I'm getting every distance permutation of the x and y location where the day, team, and member are all the same.
I need help to incorporate frames into the query so that I get the N+1 Frame x and y location minus the N Frame x and y location.
CREATE TABLE TestTable (
Day int NULL,
Frame int NULL,
Team int NULL,
Member int NULL,
x float NULL,
y float NULL
);
INSERT INTO TestTable VALUES
(1, 1, 1, 1, 1486.64, 2017.55),
(1, 1, 1, 2, 1754.55, 1495.81),
(1, 1, 2, 1, 2049.15, 876.349),
(1, 2, 1, 1, 1707.59, 1171.22),
(1, 2, 1, 2, 1432.56, 1459.99),
(1, 2, 2, 1, 1470.27, 1086.22),
(1, 3, 1, 1, 3639.19, 1281.36),
(1, 3, 1, 2, 2751.37, 976.348),
(1, 3, 2, 1, 2496.69, 1283.29),
(1, 4, 1, 1, 2347.26, 984.255),
(1, 4, 1, 2, 2044.92, 711.154),
(1, 4, 2, 1, 2473.65, 1816.23);
Select A.Day, A.Frame, A.Team, A.Member,
GEOMETRY::Point(A.[x], A.[y], 0).STDistance(GEOMETRY::Point(B.[x], B.[y], 0)) As Distance
From TestTable A
Join TestTable B
ON A.Day = B.Day and A.Team = B.Team and A.Member = B.Member
I may also have to deal with NULL x and y values, so it would help if the query could handle those too, for example with a filter like:
Where A.x IS NOT NULL and A.y IS NOT NULL
Ultimately I want to track the distance covered by every member throughout the day, frame by frame. Later, I'll add up each member's total distance for the day.

;WITH CTE1 AS
(
SELECT
[day], team, member, frame, x, y,
LAG(x) OVER (PARTITION BY [day], team, member ORDER BY frame) AS PervFrameX,
LAG(y) OVER (PARTITION BY [day], team, member ORDER BY frame) AS PervFrameY
FROM
TestTable
WHERE
X IS NOT NULL AND Y IS NOT NULL
),
CTE2 AS
(
SELECT
[day], team, member, frame, x, y, PervFrameX, PervFrameY,
IIF(PervFrameX IS NULL OR PervFrameY IS NULL, 0,
GEOMETRY::Point(x, y, 0).STDistance(GEOMETRY::Point(PervFrameX, PervFrameY, 0))) As Distance
FROM
CTE1
)
SELECT
*,
SUM(Distance) OVER (PARTITION BY [day], team, member) AS MemberTotalDistance,
SUM(Distance) OVER (PARTITION BY [day]) AS DailyTotalDistance
FROM
CTE2
ORDER BY
[day], team, member, frame
CTE1 and CTE2 are used to improve readability of the query.
Output:
day team member frame x y PervFrameX PervFrameY Distance MemberTotalDistance DailyTotalDistance
1 1 1 1 1486.64 2017.55 NULL NULL 0.000 4135.086 8812.698
1 1 1 2 1707.59 1171.22 1486.64 2017.55 874.696 4135.086 8812.698
1 1 1 3 3639.19 1281.36 1707.59 1171.22 1934.738 4135.086 8812.698
1 1 1 4 2347.26 984.255 3639.19 1281.36 1325.652 4135.086 8812.698
1 1 2 1 1754.55 1495.81 NULL NULL 0.000 2483.257 8812.698
1 1 2 2 1432.56 1459.99 1754.55 1495.81 323.976 2483.257 8812.698
1 1 2 3 2751.37 976.348 1432.56 1459.99 1404.695 2483.257 8812.698
1 1 2 4 2044.92 711.154 2751.37 976.348 754.586 2483.257 8812.698
1 2 1 1 2049.15 876.349 NULL NULL 0.000 2194.355 8812.698
1 2 1 2 1470.27 1086.22 2049.15 876.349 615.750 2194.355 8812.698
1 2 1 3 2496.69 1283.29 1470.27 1086.22 1045.167 2194.355 8812.698
1 2 1 4 2473.65 1816.23 2496.69 1283.29 533.438 2194.355 8812.698
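If you want to cross-check the per-member totals outside SQL Server, the following is a rough Python sketch of the same idea: sort each (day, team, member) group by frame and add up the straight-line distance between consecutive positions. The function name frame_distances is made up for this example, and it uses plain Euclidean distance via math.hypot rather than the GEOMETRY type.

```python
import math
from itertools import groupby

# (day, frame, team, member, x, y), copied from the sample INSERT in the question.
rows = [
    (1, 1, 1, 1, 1486.64, 2017.55), (1, 1, 1, 2, 1754.55, 1495.81),
    (1, 1, 2, 1, 2049.15, 876.349), (1, 2, 1, 1, 1707.59, 1171.22),
    (1, 2, 1, 2, 1432.56, 1459.99), (1, 2, 2, 1, 1470.27, 1086.22),
    (1, 3, 1, 1, 3639.19, 1281.36), (1, 3, 1, 2, 2751.37, 976.348),
    (1, 3, 2, 1, 2496.69, 1283.29), (1, 4, 1, 1, 2347.26, 984.255),
    (1, 4, 1, 2, 2044.92, 711.154), (1, 4, 2, 1, 2473.65, 1816.23),
]

def frame_distances(rows):
    """Total distance per (day, team, member), walking frames in order."""
    def member_key(r):
        return (r[0], r[2], r[3])

    ordered = sorted(rows, key=lambda r: (member_key(r), r[1]))
    totals = {}
    for key, group in groupby(ordered, key=member_key):
        prev, total = None, 0.0
        for _, _, _, _, x, y in group:
            if x is None or y is None:
                continue  # mirrors the WHERE x IS NOT NULL AND y IS NOT NULL filter
            if prev is not None:
                total += math.hypot(x - prev[0], y - prev[1])
            prev = (x, y)
        totals[key] = total
    return totals

totals = frame_distances(rows)
for key, total in sorted(totals.items()):
    print(key, round(total, 3))
```

The keys are (day, team, member) tuples, so the three totals should line up with the MemberTotalDistance column in the output above.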

Running Total of all Previous Rows BigQuery

I have a BigQuery Table which looks like Below:
ID SessionNumber CountOfAction Category
1 1 1 B
1 2 3 A
1 3 1 A
1 4 4 B
1 5 5 B
I am trying to get, for each row, the running total of CountOfAction over all previous rows where Category = 'A'. The final output should be:
ID SessionNumber CountOfAction
1 1 0 --no previous rows have countofAction for category = A
1 2 0 --no previous rows have countofAction for category = A
1 3 3 --previous row (Row 2) has countofAction = 3 for category = A
1 4 4 --previous rows (Row 2 and 3) have countofAction = 3 and 1 for category = A
1 5 4 --previous rows (Row 2 and 3) have countofAction = 3 and 1 for category = A
Below is the query I have written, but it doesn't give me the desired output:
select
ID,
SessionNumber,
SUM(CountofAction) OVER(Partition by ID ORDER BY SessionNumber ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING) as CumulativeCountofAction
From TAble1 where category = 'A'
I would really appreciate any help on this! Thanks in advance

Filtering on category in the where clause removes the (id, sessionNumber) rows where category 'A' does not appear, which is not what you want.
Instead, you can use aggregation and a conditional sum():
select
    id,
    sessionNumber,
    -- the frame is empty for the first session, so sum() returns null there; coalesce turns that into 0
    coalesce(sum(sum(if(category = 'A', countOfAction, 0))) over(
        partition by id
        order by sessionNumber
        rows between unbounded preceding and 1 preceding
    ), 0) CumulativeCountofAction
from mytable t
group by id, sessionNumber
order by id, sessionNumber

Below is for BigQuery Standard SQL
#standardSQL
SELECT ID, SessionNumber,
IFNULL(SUM(IF(category = 'A', CountOfAction, 0)) OVER(win), 0) AS CountOfAction
FROM `project.dataset.table`
WINDOW win AS (PARTITION BY ID ORDER BY SessionNumber ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING)
Applied to the sample data from your question:
#standardSQL
WITH `project.dataset.table` AS (
SELECT 1 ID, 1 SessionNumber, 1 CountOfAction, 'B' Category UNION ALL
SELECT 1, 2, 3, 'A' UNION ALL
SELECT 1, 3, 1, 'A' UNION ALL
SELECT 1, 4, 4, 'B' UNION ALL
SELECT 1, 5, 5, 'B'
)
SELECT ID, SessionNumber,
IFNULL(SUM(IF(category = 'A', CountOfAction, 0)) OVER(win), 0) AS CountOfAction
FROM `project.dataset.table`
WINDOW win AS (PARTITION BY ID ORDER BY SessionNumber ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING)
The result is:
Row ID SessionNumber CountOfAction
1 1 1 0
2 1 2 0
3 1 3 3
4 1 4 4
5 1 5 4
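To make the frame "UNBOUNDED PRECEDING AND 1 PRECEDING" more concrete, here is a small Python sketch of the same logic: for each row, report the total accumulated so far for category A, and only then add the current row. The function name prior_category_totals is made up for the example; it is an illustration of the windowing, not BigQuery code.

```python
from collections import defaultdict

# (ID, SessionNumber, CountOfAction, Category) from the question's sample table.
rows = [
    (1, 1, 1, "B"),
    (1, 2, 3, "A"),
    (1, 3, 1, "A"),
    (1, 4, 4, "B"),
    (1, 5, 5, "B"),
]

def prior_category_totals(rows, category="A"):
    """For each row, sum CountOfAction of earlier rows of the same ID in `category`."""
    running = defaultdict(int)  # ID -> total seen so far for the category
    out = []
    for id_, session, count, cat in sorted(rows, key=lambda r: (r[0], r[1])):
        out.append((id_, session, running[id_]))  # current row is excluded
        if cat == category:
            running[id_] += count                 # only visible to later rows
    return out

totals = prior_category_totals(rows)
for row in totals:
    print(row)
```

Because the running total starts at 0 per ID, the first session reports 0 directly, which plays the role of IFNULL in the query.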

Calculating date intervals from a daily-grained fact table

I have the data for student absence which I got after some transformations. The data is day by day:
WITH datasample AS (
SELECT 1 AS StudentID, 20180101 AS DateID, 0 AS AbsentToday, 0 AS AbsentYesterday UNION ALL
SELECT 1, 20180102, 1, 0 UNION ALL
SELECT 1, 20180103, 1, 1 UNION ALL
SELECT 1, 20180104, 1, 1 UNION ALL
SELECT 1, 20180105, 1, 1 UNION ALL
SELECT 1, 20180106, 0, 1 UNION ALL
SELECT 2, 20180101, 0, 0 UNION ALL
SELECT 2, 20180102, 1, 0 UNION ALL
SELECT 2, 20180103, 1, 1 UNION ALL
SELECT 2, 20180104, 0, 1 UNION ALL
SELECT 2, 20180105, 1, 0 UNION ALL
SELECT 2, 20180106, 1, 1 UNION ALL
SELECT 2, 20180107, 0, 1
)
SELECT *
FROM datasample
ORDER BY StudentID, DateID
I need to add a column (AbsencePeriodInMonth) which would calculate the student's absence period during the month.
For example, StudentID=1 was absent in one consecutive period during the month and StudentID=2 had two periods, something like this:
StudentID DateID AbsentToday AbsentYesterday AbsencePeriodInMonth
1 20180101 0 0 0
1 20180102 1 0 1
1 20180103 1 1 1
1 20180104 1 1 1
1 20180105 1 1 1
1 20180106 0 1 0
2 20180101 0 0 0
2 20180102 1 0 1
2 20180103 1 1 1
2 20180104 0 1 0
2 20180105 1 0 2
2 20180106 1 1 2
2 20180107 0 1 0
My goal is actually to calculate the consecutive absent days prior to each day in the fact table. I think I can do it once I have the AbsencePeriodInMonth column, by adding this to my query after the *:
,CASE WHEN AbsentToday = 1 THEN DENSE_RANK() OVER(PARTITION BY StudentID, AbsencePeriodInMonth ORDER BY DateID)
ELSE 0
END AS DaysAbsent
Any idea on how I can add that AbsencePeriodInMonth or maybe calculate the consecutive absent days in some other way?

You can identify each period by counting the number of 0s that come before it. Then you can enumerate the periods using dense_rank().
select ds.*,
       (case when absenttoday = 1
             -- partition by absenttoday as well, so only absent rows are ranked
             -- and runs of present days do not inflate the period number
             then dense_rank() over (partition by studentid, absenttoday order by grp)
             else 0
        end) as AbsencePeriodInMonth
from (select ds.*,
             sum(case when absenttoday = 0 then 1 else 0 end) over (partition by studentid order by dateid) as grp
      from datasample ds
     ) ds
order by StudentID, DateID;
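As a cross-check on the expected AbsencePeriodInMonth column, here is a minimal Python sketch of the same gaps-and-islands idea: a new period starts whenever a student is absent on a row and was not absent on the previous row. The function name absence_periods is made up, and the sketch assumes there is one row per student per day, so "previous row" means "yesterday".

```python
# (StudentID, DateID, AbsentToday) from the question's sample data.
rows = [
    (1, 20180101, 0), (1, 20180102, 1), (1, 20180103, 1),
    (1, 20180104, 1), (1, 20180105, 1), (1, 20180106, 0),
    (2, 20180101, 0), (2, 20180102, 1), (2, 20180103, 1),
    (2, 20180104, 0), (2, 20180105, 1), (2, 20180106, 1),
    (2, 20180107, 0),
]

def absence_periods(rows):
    """Return (student, date, period); period is 0 on days the student is present."""
    periods = {}      # student -> number of absence periods started so far
    prev_absent = {}  # student -> AbsentToday value of the previous row
    out = []
    for student, date_id, absent in sorted(rows):
        if absent and not prev_absent.get(student, 0):
            periods[student] = periods.get(student, 0) + 1  # a new period begins
        out.append((student, date_id, periods.get(student, 0) if absent else 0))
        prev_absent[student] = absent
    return out

result = absence_periods(rows)
print([period for _, _, period in result])
```

Counting period starts like this does not depend on how many present days separate two absences.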

Using Recursive CTE and Dense_Rank
WITH datasample AS (
SELECT 1 AS StudentID, 20180101 AS DateID, 0 AS AbsentToday, 0 AS AbsentYesterday UNION ALL
SELECT 1, 20180102, 1, 0 UNION ALL
SELECT 1, 20180103, 1, 1 UNION ALL
SELECT 1, 20180104, 1, 1 UNION ALL
SELECT 1, 20180105, 1, 1 UNION ALL
SELECT 1, 20180106, 0, 1 UNION ALL
SELECT 2, 20180101, 0, 0 UNION ALL
SELECT 2, 20180102, 1, 0 UNION ALL
SELECT 2, 20180103, 1, 1 UNION ALL
SELECT 2, 20180104, 0, 1 UNION ALL
SELECT 2, 20180105, 1, 0 UNION ALL
SELECT 2, 20180106, 1, 1 UNION ALL
SELECT 2, 20180107, 0, 1
), cte as
(Select *,DateID as dd
from datasample
where AbsentToday = 1 and AbsentYesterday = 0
union all
Select d.*, c.dd
from datasample d
join cte c
on d.StudentID = c.StudentID and d.DateID = c.DateID + 1
where d.AbsentToday = 1
), cte1 as
(
Select *, DENSE_RANK() over (partition by StudentId order by dd) as de
from cte
)
Select d.*, IsNull(c.de,0) as AbsencePeriodInMonth
from cte1 c
right join datasample d
on d.StudentID = c.StudentID and c.DateID = d.DateID
order by d.StudentID, d.DateID

Adjusting table based on previous values in BigQuery

I have a table that looks like below:
ID|Date |X| Flag |
1 |1/1/16|2| 0
2 |1/1/16|0| 0
3 |1/1/16|0| 0
1 |2/1/16|0| 0
2 |2/1/16|1| 0
3 |2/1/16|2| 0
1 |3/1/16|2| 0
2 |3/1/16|1| 0
3 |3/1/16|2| 0
I'm trying to make it so that flag is populated if X=2 in the PREVIOUS month. As such, it should look like this:
ID|Date |X| Flag |
1 |1/1/16|2| 0
2 |1/1/16|0| 0
3 |1/1/16|0| 0
1 |2/1/16|2| 1
2 |2/1/16|1| 0
3 |2/1/16|2| 0
1 |3/1/16|2| 1
2 |3/1/16|1| 0
3 |3/1/16|2| 1
I use this in SQL:
`select ID, date, X, flag into Work_Table from t
(
Select ID, date, X, flag,
Lag(X) Over (Partition By ID Order By date Asc) As Prev into Flag_table
From Work_Table
)
Update [dbo].[Flag_table]
Set flag = 1
where prev = '2'
UPDATE t
Set t.flag = [dbo].[Flag_table].flag FROM T
JOIN [dbo].[Flag_table]
ON t.ID= [dbo].[Flag_table].ID where T.date = [dbo].[Flag_table].date`
However I cannot do this in Bigquery. Any ideas?

Below is for BigQuery Standard SQL
#standardSQL
SELECT id, dt, x,
IF(LAG(x = 2) OVER(PARTITION BY id ORDER BY dt), 1, 0) flag
FROM `project.dataset.work_table`
You can test it using the sample data from your question:
#standardSQL
WITH `project.dataset.work_table` AS (
SELECT 1 id, '1/1/16' dt, 2 x, 0 flag UNION ALL
SELECT 2, '1/1/16', 0, 0 UNION ALL
SELECT 3, '1/1/16', 0, 0 UNION ALL
SELECT 1, '2/1/16', 0, 0 UNION ALL
SELECT 2, '2/1/16', 1, 0 UNION ALL
SELECT 3, '2/1/16', 2, 0 UNION ALL
SELECT 1, '3/1/16', 2, 0 UNION ALL
SELECT 2, '3/1/16', 1, 0 UNION ALL
SELECT 3, '3/1/16', 2, 0
)
SELECT id, dt, x,
IF(LAG(x = 2) OVER(PARTITION BY id ORDER BY dt), 1, 0) flag
FROM `project.dataset.work_table`
ORDER BY dt, id
with the result:
Row id dt x flag
1 1 1/1/16 2 0
2 2 1/1/16 0 0
3 3 1/1/16 0 0
4 1 2/1/16 0 1
5 2 2/1/16 1 0
6 3 2/1/16 2 0
7 1 3/1/16 2 0
8 2 3/1/16 1 0
9 3 3/1/16 2 1
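For comparison, here is a small Python sketch of what LAG does in this query: for each id, remember the x value of the previous date and set the flag when it equals 2. The function name flag_previous is made up, and the sketch assumes the dt strings are month/day/year. It parses them into real dates before sorting, since ordering the raw strings would stop working once a month like 10/1/16 appears.

```python
from datetime import datetime

# (id, dt, x) rows from the question; dt is assumed to be month/day/year.
rows = [
    (1, "1/1/16", 2), (2, "1/1/16", 0), (3, "1/1/16", 0),
    (1, "2/1/16", 0), (2, "2/1/16", 1), (3, "2/1/16", 2),
    (1, "3/1/16", 2), (2, "3/1/16", 1), (3, "3/1/16", 2),
]

def flag_previous(rows, target=2):
    """Return (id, dt, x, flag); flag is 1 if the previous row of that id had x == target."""
    def parsed(dt):
        return datetime.strptime(dt, "%m/%d/%y")

    last_x = {}  # id -> x of the most recent earlier row
    flagged = []
    for id_, dt, x in sorted(rows, key=lambda r: (parsed(r[1]), r[0])):
        flagged.append((id_, dt, x, 1 if last_x.get(id_) == target else 0))
        last_x[id_] = x
    return flagged

flags = [row[3] for row in flag_previous(rows)]
print(flags)
```

The first row of each id has no predecessor, so its flag is 0, matching how IF treats the NULL returned by LAG.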

Finding repeated occurrences with ranking functions

Please help me generate the following query, which I've been struggling with for some time now. Let's say I have a simple table with the month number and a flag showing whether there were any failed events in that particular month.
Below is a script to generate sample data:
WITH DATA(Month, Success) AS
(
SELECT 1, 0 UNION ALL
SELECT 2, 0 UNION ALL
SELECT 3, 0 UNION ALL
SELECT 4, 1 UNION ALL
SELECT 5, 1 UNION ALL
SELECT 6, 0 UNION ALL
SELECT 7, 0 UNION ALL
SELECT 8, 1 UNION ALL
SELECT 9, 0 UNION ALL
SELECT 10, 1 UNION ALL
SELECT 11, 0 UNION ALL
SELECT 12, 1 UNION ALL
SELECT 13, 0 UNION ALL
SELECT 14, 1 UNION ALL
SELECT 15, 0 UNION ALL
SELECT 16, 1 UNION ALL
SELECT 17, 0 UNION ALL
SELECT 18, 0
)
Given the definition of a "repeated failure":
When event failures occur in at least 4 months of any 6-month period, the last month with such a failure is a "repeated failure". My query should return the following output:
Month Success RepeatedFailure
1 0
2 0
3 0
4 1
5 1
6 0 R1
7 0 R2
8 1
9 0
10 1
11 0 R3
12 1
13 0
14 1
15 0
16 1
17 0
18 0 R1
where:
R1 - 1st repeated failure, in month 6 (4 failures in the last 6 months).
R2 - 2nd repeated failure, in month 7 (4 failures in the last 6 months).
R3 - 3rd repeated failure, in month 11 (4 failures in the last 6 months).
R1 - again the 1st repeated failure, in month 18, because repeated failures are numbered from the beginning again when a repeated failure occurs for the first time in the last 6 reporting periods.
Repeated failures are numbered consecutively because I must apply a multiplier based on the number:
1st repeated failure - X2
2nd repeated failure - X4
3rd and later repeated failures - X5

I'm sure this can be improved, but it works. We essentially do two passes - the first to establish repeated failures, the second to establish what kind of repeated failure each is. Note that Intermediate2 can definitely be done away with, I've only separated it out for clarity. All the code is one statement, my explanation is interleaved:
;WITH DATA(Month, Success) AS
-- assuming your data as defined (with my edit)
,Intermediate AS
(
SELECT
Month,
Success,
-- next column for illustration only
(SELECT SUM(Success)
FROM DATA hist
WHERE curr.Month - hist.Month BETWEEN 0 AND 5)
AS SuccessesInLastSixMonths,
-- next column for illustration only
6 - (SELECT SUM(Success)
FROM DATA hist
WHERE curr.Month - hist.Month BETWEEN 0 AND 5)
AS FailuresInLastSixMonths,
CASE WHEN
(6 - (SELECT SUM(Success)
FROM DATA hist
WHERE curr.Month - hist.Month BETWEEN 0 AND 5))
>= 4
THEN 1
ELSE 0
END AS IsRepeatedFailure
FROM DATA curr
-- No real data until month 6
WHERE curr.Month > 5
)
At this point we have established, for each month, whether it's a repeated failure, by counting the failures in the six months up to and including it.
,Intermediate2 AS
(
SELECT
Month,
Success,
IsRepeatedFailure,
(SELECT SUM(IsRepeatedFailure)
FROM Intermediate hist
WHERE curr.Month - hist.Month BETWEEN 0 AND 5)
AS RepeatedFailuresInLastSixMonths
FROM Intermediate curr
)
Now we have counted the number of repeated failures in the six months leading up to now
SELECT
Month,
Success,
CASE IsRepeatedFailure
WHEN 1 THEN 'R' + CONVERT(varchar, RepeatedFailuresInLastSixMonths)
ELSE '' END
AS RepeatedFailureText
FROM Intermediate2
so we can say, if this month is a repeated failure, what cardinality of repeated failure it is.
Result:
Month Success RepeatedFailureText
----------- ----------- -------------------------------
6 0 R1
7 0 R2
8 1
9 0
10 1
11 0 R3
12 1
13 0
14 1
15 0
16 1
17 0
18 0 R1
(13 row(s) affected)
Performance considerations will depend on how much data you actually have.
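The two passes can also be sketched in a few lines of Python, which may help when checking the numbering by hand. The function name repeated_failures is made up for this example. Like the query above, it flags a month when the six months ending with it contain at least 4 failures, without separately checking that the month itself failed, and it only starts at month 6.

```python
# Success flags for months 1..18 from the question (0 = the month had a failure).
success = [0, 0, 0, 1, 1, 0, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 0]

def repeated_failures(success, window=6, threshold=4):
    """Return {month: 'Rn'} for months flagged as repeated failures."""
    # Pass 1: is each month (from `window` onward) a repeated failure?
    flagged = {}
    for month in range(window, len(success) + 1):
        last_six = success[month - window:month]  # months month-5 .. month
        flagged[month] = (window - sum(last_six)) >= threshold

    # Pass 2: number each repeated failure by counting the flagged months
    # in the six months ending with it.
    labels = {}
    for month, is_repeated in flagged.items():
        if is_repeated:
            recent = range(month - window + 1, month + 1)
            count = sum(flagged.get(m, False) for m in recent)
            labels[month] = f"R{count}"
    return labels

labels = repeated_failures(success)
print(labels)
```

The numbering restarts at R1 in month 18 because no other flagged month falls inside months 13 to 18.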

;WITH DATA(Month, Success) AS
(
SELECT 1, 0 UNION ALL
SELECT 2, 0 UNION ALL
SELECT 3, 0 UNION ALL
SELECT 4, 1 UNION ALL
SELECT 5, 1 UNION ALL
SELECT 6, 0 UNION ALL
SELECT 7, 0 UNION ALL
SELECT 8, 1 UNION ALL
SELECT 9, 0 UNION ALL
SELECT 10, 1 UNION ALL
SELECT 11, 0 UNION ALL
SELECT 12, 1 UNION ALL
SELECT 13, 0 UNION ALL
SELECT 14, 1 UNION ALL
SELECT 15, 0 UNION ALL
SELECT 16, 1 UNION ALL
SELECT 17, 0 UNION ALL
SELECT 18, 0
)
SELECT DATA.Month,DATA.Success,Isnull(convert(Varchar(10),b.result),'') +
Isnull(CONVERT(varchar(10),b.num),'') RepeatedFailure
FROM (
SELECT *, ROW_NUMBER() over (order by Month) num FROM
( Select * ,(case when (select sum(Success)
from DATA where MONTH>(o.MONTH-6) and MONTH<=(o.MONTH) ) <= 2
and o.MONTH>=6 then 'R' else '' end) result
from DATA o
) a where result='R'
) b
right join DATA on DATA.Month = b.Month
order by DATA.Month
