Get Monthly Totals from Running Totals - sql

I have a table in a SQL Server 2008 database with two columns that hold running totals called Hours and Starts. Another column, Date, holds the date of a record. The dates are sporadic throughout any given month, but there's always a record for the last hour of the month.
For example:
ContainerID | Date | Hours | Starts
1 | 2010-12-31 23:59 | 20 | 6
1 | 2011-01-15 00:59 | 23 | 6
1 | 2011-01-31 23:59 | 30 | 8
2 | 2010-12-31 23:59 | 14 | 2
2 | 2011-01-18 12:59 | 14 | 2
2 | 2011-01-31 23:59 | 19 | 3
How can I query the table to get the total number of hours and starts for each month between two specified years? (In this case 2011 and 2013.) I know that I need to take the values from the last record of one month and subtract the values from the last record of the previous month. However, I'm having a hard time coming up with a good way to do this in SQL.
As requested, here are the expected results:
ContainerID | Date | MonthlyHours | MonthlyStarts
1 | 2011-01-31 23:59 | 10 | 2
2 | 2011-01-31 23:59 | 5 | 1

Try this:
SELECT c1.ContainerID,
c1.Date,
c1.Hours-c3.Hours AS "MonthlyHours",
c1.Starts - c3.Starts AS "MonthlyStarts"
FROM Containers c1
LEFT OUTER JOIN Containers c2 ON
c1.ContainerID = c2.ContainerID
AND datediff(MONTH, c1.Date, c2.Date)=0
AND c2.Date > c1.Date
LEFT OUTER JOIN Containers c3 ON
c1.ContainerID = c3.ContainerID
AND datediff(MONTH, c1.Date, c3.Date)=-1
LEFT OUTER JOIN Containers c4 ON
c3.ContainerID = c4.ContainerID
AND datediff(MONTH, c3.Date, c4.Date)=0
AND c4.Date > c3.Date
WHERE
c2.ContainerID is null
AND c4.ContainerID is null
AND c3.ContainerID is not null
ORDER BY c1.ContainerID, c1.Date

Using a recursive CTE and a somewhat 'creative' JOIN condition, you can step from each month's last row to the next month's last row for each ContainerID:
WITH CTE_PREP AS
(
--RN will be 1 for last row in each month for each container
--MonthRank will be sequential number for each subsequent month (to increment easier)
SELECT
*
,ROW_NUMBER() OVER (PARTITION BY ContainerID, YEAR(Date), MONTH(DATE) ORDER BY Date DESC) RN
,DENSE_RANK() OVER (ORDER BY YEAR(Date),MONTH(Date)) MonthRank
FROM Table1
)
, RCTE AS
(
--"Zero row": last row in December 2010 for each container
SELECT *, Hours AS MonthlyHours, Starts AS MonthlyStarts
FROM CTE_Prep
WHERE YEAR(date) = 2010 AND MONTH(date) = 12 AND RN = 1
UNION ALL
--for each next row just join on MonthRank + 1
SELECT t.*, t.Hours - r.Hours, t.Starts - r.Starts
FROM RCTE r
INNER JOIN CTE_Prep t ON r.ContainerID = t.ContainerID AND r.MonthRank + 1 = t.MonthRank AND t.Rn = 1
)
SELECT ContainerID, Date, MonthlyHours, MonthlyStarts
FROM RCTE
WHERE Date >= '2011-01-01' --to eliminate "zero row"
ORDER BY ContainerID
SQLFiddle DEMO (I have added some data for February and March in order to test on different lengths of months)
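For readers on a newer engine, the same subtraction can be sketched with window functions instead of recursion. This is only an illustration and not part of either answer above: it runs the SQL through Python's sqlite3 module against the question's sample rows, so the table name Containers, the helper monthly_totals and the use of substr to extract the month are assumptions of the sketch. It needs window-function support (SQLite 3.25 or later; LAG is not available in SQL Server 2008, it was added in SQL Server 2012). It also relies on every month having a record, as the question states, because LAG simply looks at the previous month-end row that exists.

```python
import sqlite3

# Sample rows from the question: (ContainerID, Date, Hours, Starts).
SAMPLE = [
    (1, "2010-12-31 23:59", 20, 6),
    (1, "2011-01-15 00:59", 23, 6),
    (1, "2011-01-31 23:59", 30, 8),
    (2, "2010-12-31 23:59", 14, 2),
    (2, "2011-01-18 12:59", 14, 2),
    (2, "2011-01-31 23:59", 19, 3),
]

QUERY = """
WITH month_end AS (
    -- rn = 1 marks the last record of each month for each container
    SELECT ContainerID, Date, Hours, Starts,
           ROW_NUMBER() OVER (
               PARTITION BY ContainerID, substr(Date, 1, 7)
               ORDER BY Date DESC
           ) AS rn
    FROM Containers
),
diffs AS (
    -- subtract the previous month-end running total from the current one
    SELECT ContainerID, Date,
           Hours - LAG(Hours) OVER (PARTITION BY ContainerID ORDER BY Date)
               AS MonthlyHours,
           Starts - LAG(Starts) OVER (PARTITION BY ContainerID ORDER BY Date)
               AS MonthlyStarts
    FROM month_end
    WHERE rn = 1
)
SELECT ContainerID, Date, MonthlyHours, MonthlyStarts
FROM diffs
WHERE MonthlyHours IS NOT NULL   -- drops the baseline row (no previous month)
ORDER BY ContainerID, Date
"""


def monthly_totals(rows):
    """Load the rows into an in-memory table and return the monthly deltas."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE Containers "
        "(ContainerID INTEGER, Date TEXT, Hours INTEGER, Starts INTEGER)"
    )
    conn.executemany("INSERT INTO Containers VALUES (?, ?, ?, ?)", rows)
    result = conn.execute(QUERY).fetchall()
    conn.close()
    return result


monthly = monthly_totals(SAMPLE)
for row in monthly:
    print(row)
```

With the sample data this yields one row per container for January 2011, matching the expected results in the question.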

Related

Redshift: Add Row for each hour in a day

I have a table that contains item-wise quantities at different hours of a date. I am trying to add a row for each hour (24 entries per day), filled with the quantity of the most recent earlier hour that has data. For example, for the hours between 2 and 10, it will be 5.
I created a table with the hour entries (1-24) and did a full join with the table shown below.
How can I add the previous available entry? I need suggestions.
item_id| date | hour| quantity
101 | 2022-04-25 | 2 | 5
101 | 2022-04-25 | 10 | 13
101 | 2022-04-25 | 18 | 67
101 | 2022-04-25 | 23 | 27
You can use generate_series to generate the hour numbers and use them as the base table of the join.
Then use a correlated subquery to get the expected quantity column:
SELECT t1.*,
(SELECT quantity
FROM T tt
WHERE t1.item_id = tt.item_id
AND t1.date = tt.date
AND t1.hour >= tt.hour
ORDER BY tt.hour desc
LIMIT 1) quantity
FROM (
SELECT DISTINCT item_id,date,v.hour
FROM generate_series(1,24) v(hour)
CROSS JOIN T
) t1
ORDER BY t1.hour
Provided the table of integers 1 .. 24 is all24(hour), you can use lead and a join:
select t.item_id, t.date, all24.hour, t.quantity
from all24
join (
select *,
lead(hour, 1, 25) over(partition by item_id, date order by hour) - 1 nxt_h
from tbl
) t on all24.hour between t.hour and t.nxt_h
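The correlated-subquery idea can be checked end to end with a small sketch. It is not Redshift code: it runs through Python's sqlite3 module, uses a recursive CTE as a stand-in for generate_series, and the names item_days and fill_hours are made up for the illustration. The table name T follows the first answer.

```python
import sqlite3

# Sample rows from the question: (item_id, date, hour, quantity).
SAMPLE = [
    (101, "2022-04-25", 2, 5),
    (101, "2022-04-25", 10, 13),
    (101, "2022-04-25", 18, 67),
    (101, "2022-04-25", 23, 27),
]

QUERY = """
WITH RECURSIVE hours(hour) AS (
    -- hours 1..24, standing in for generate_series(1, 24)
    SELECT 1
    UNION ALL
    SELECT hour + 1 FROM hours WHERE hour < 24
),
item_days AS (
    SELECT DISTINCT item_id, date FROM T
)
SELECT d.item_id, d.date, h.hour,
       -- latest known quantity at or before this hour (NULL if none yet)
       (SELECT tt.quantity
        FROM T tt
        WHERE tt.item_id = d.item_id
          AND tt.date = d.date
          AND tt.hour <= h.hour
        ORDER BY tt.hour DESC
        LIMIT 1) AS quantity
FROM item_days d
CROSS JOIN hours h
ORDER BY d.item_id, d.date, h.hour
"""


def fill_hours(rows):
    """Return one row per item, date and hour with the carried-forward quantity."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE T (item_id INTEGER, date TEXT, hour INTEGER, quantity INTEGER)"
    )
    conn.executemany("INSERT INTO T VALUES (?, ?, ?, ?)", rows)
    result = conn.execute(QUERY).fetchall()
    conn.close()
    return result


filled = fill_hours(SAMPLE)
for row in filled:
    print(row)
```

Hour 1 has no earlier entry, so its quantity stays NULL (None in Python); whether that should be 0 or something else is a decision the question leaves open.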

How to calculate occurrence depending on months/years

My table looks like that:
ID | Start | End
1 | 2010-01-02 | 2010-01-04
1 | 2010-01-22 | 2010-01-24
1 | 2011-01-31 | 2011-02-02
2 | 2012-05-02 | 2012-05-08
3 | 2013-01-02 | 2013-01-03
4 | 2010-09-15 | 2010-09-20
4 | 2010-09-30 | 2010-10-05
I'm looking for a way to count the number of occurrences for each ID in a Year per Month.
But what is important, If some record has a Start date in the following month compared to the End date (of course from the same year) then occurrence should be counted for both months [e.g. ID 1 in the 3rd row has a situation like that. So in this situation, the occurrence for this ID should be +1 for January and +1 for February].
So I'd like to have it in this way:
Year | Month | Id | Occurrence
2010 | 01 | 1 | 2
2010 | 09 | 4 | 2
2010 | 10 | 4 | 1
2011 | 01 | 1 | 1
2011 | 02 | 1 | 1
2012 | 05 | 2 | 1
2013 | 01 | 3 | 1
I created only this for now...
CREATE TABLE IF NOT EXISTS counts AS
(SELECT
id,
YEAR (CAST(Start AS DATE)) AS Year_St,
MONTH (CAST(Start AS DATE)) AS Month_St,
YEAR (CAST(End AS DATE)) AS Year_End,
MONTH (CAST(End AS DATE)) AS Month_End
FROM source)
And I don't know how to move with that further. I'd appreciate your help.
I'm using Spark SQL.
Try the following strategy to achieve this:
Notes:
I have created a few intermediate tables. If you prefer, you can use subqueries or CTEs instead, depending on your permissions.
I have handled the two scenarios you described (counting a record as one occurrence or as two).
Query:
First, create a table with a flag that records whether the start and end dates fall in the same year and month (1 means yes, 2 means same year but different month, 0 means anything else):
/* Creating a table with flags whether to count the occurrences once or twice */
CREATE TABLE flagged as
(
SELECT *,
CASE
WHEN Year_st = Year_end and Month_st = Month_end then 1
WHEN Year_st = Year_end and Month_st <> Month_end then 2
Else 0
end as flag
FROM
(
SELECT
id,
YEAR (CAST(Start AS DATE)) AS Year_St,
MONTH (CAST(Start AS DATE)) AS Month_St,
YEAR (CAST(End AS DATE)) AS Year_End,
MONTH (CAST(End AS DATE)) AS Month_End
FROM source
) as calc
)
The flag in the table above is 1 if the year and month are the same for start and end, and 2 if the year is the same but the month differs. You can add more flag categories if you have more scenarios.
Second, count the occurrences for flag 1. Since the year and month are the same for flag 1, either date can be used; I have used the start date:
/* Counting occurrences only for flag 1 */
CREATE TABLE flg1 as (
SELECT distinct id, year_st, month_st, count(*) as occurrence
FROM flagged
where flag=1
GROUP BY id, year_st, month_st
)
Similarly, count the occurrences for flag 2. Since the month differs between the two dates, we can stack them with UNION ALL before counting, so that both dates end up in the same column:
/* Counting occurrences only for flag 2 */
CREATE TABLE flg2 as
(
SELECT id, year_dt, month_dt, count(*) as occurrence
FROM
(
select ID, year_st as year_dt, month_st as month_dt FROM flagged where flag=2
/* UNION ALL, not UNION: UNION would collapse two records that span the same pair of months into one */
UNION ALL
SELECT ID, year_end as year_dt, month_end as month_dt FROM flagged where flag=2
) as unioned
GROUP BY id, year_dt, month_dt
)
Finally, we just have to SUM the occurrences from both the flags. Note that we use UNION ALL here to combine both the tables. This is very important because we need to count duplicates as well:
/* UNIONING both the final tables and summing the occurrences */
SELECT distinct year, month, id, SUM(occurrence) as occurrence
FROM
(
SELECT distinct id, year_st as year, month_st as month, occurrence
FROM flg1
UNION ALL
SELECT distinct id, year_dt as year, month_dt as month, occurrence
FROM flg2
) as fin_unioned
GROUP BY id, year, month
ORDER BY year, month, id, occurrence desc
The output of the query above matches your expected output. I know this is not optimized, but it works. I will update the answer if I come across a better strategy. Comment if you have questions.
I'm not sure whether this works in Spark SQL.
But if the ranges never span more than one month boundary, you can simply add the extra occurrences to the count via a UNION ALL.
The extra rows are those whose end falls in a later month than the start.
SELECT YearOcc, MonthOcc, Id
, COUNT(*) as Occurrence
FROM
(
SELECT Id
, YEAR(CAST(Start AS DATE)) as YearOcc
, MONTH(CAST(Start AS DATE)) as MonthOcc
FROM source
UNION ALL
SELECT Id
, YEAR(CAST(End AS DATE)) as YearOcc
, MONTH(CAST(End AS DATE)) as MonthOcc
FROM source
WHERE MONTH(CAST(Start AS DATE)) < MONTH(CAST(End AS DATE))
) q
GROUP BY YearOcc, MonthOcc, Id
ORDER BY YearOcc, MonthOcc, Id
YearOcc | MonthOcc | Id | Occurrence
------: | -------: | -: | ---------:
2010 | 1 | 1 | 2
2010 | 9 | 4 | 2
2010 | 10 | 4 | 1
2011 | 1 | 1 | 1
2011 | 2 | 1 | 1
2012 | 5 | 2 | 1
2013 | 1 | 3 | 1
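Both answers above cover ranges that cross at most one month boundary within a year. As a cross-check of the counting rule, here is a small Python sketch (not Spark SQL) that expands each range into every month it touches and counts per year, month and ID. The helper names months_touched and count_occurrences are hypothetical, and unlike the question's rule the sketch also counts ranges that cross a year boundary.

```python
from collections import Counter
from datetime import date

# Sample rows from the question: (ID, Start, End).
SOURCE = [
    (1, date(2010, 1, 2), date(2010, 1, 4)),
    (1, date(2010, 1, 22), date(2010, 1, 24)),
    (1, date(2011, 1, 31), date(2011, 2, 2)),
    (2, date(2012, 5, 2), date(2012, 5, 8)),
    (3, date(2013, 1, 2), date(2013, 1, 3)),
    (4, date(2010, 9, 15), date(2010, 9, 20)),
    (4, date(2010, 9, 30), date(2010, 10, 5)),
]


def months_touched(start, end):
    """Yield (year, month) for every calendar month between start and end."""
    year, month = start.year, start.month
    while (year, month) <= (end.year, end.month):
        yield year, month
        year, month = (year + 1, 1) if month == 12 else (year, month + 1)


def count_occurrences(rows):
    """Count one occurrence per ID for each month a record touches."""
    counts = Counter()
    for row_id, start, end in rows:
        for year, month in months_touched(start, end):
            counts[(year, month, row_id)] += 1
    # Sorted as (Year, Month, Id, Occurrence), like the expected output.
    return [key + (n,) for key, n in sorted(counts.items())]


occurrences = count_occurrences(SOURCE)
for row in occurrences:
    print(row)
```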

How to calculate number of Orders for given date range

I need some help writing a SQL query that returns the number of active orders within a date range, grouped by month. For example, if the user selects 2018-01-01 to 2019-12-31, I have to show the number of active orders per month, i.e. 12 records in total.
I'm querying the Order table, whose schema looks like this:
OrderID CustomerFirstName PurchaseDate OrderEndDate
1 XYZ 2018-01-01 9999-12-31
2 ABC 2018-02-02 2018-06-30
3 PQR 2018-06-01 2018-06-30
4 GHI 2018-01-01 2018-02-28
An OrderEndDate of 9999-12-31 marks a never-ending order. Such orders count as active in every date range.
From my UX, if I select Jan to Dec, the results should be:
Jan ==> 2 orders
Feb ==> 3 orders => order IDs 1, 2, 4
For February, order IDs 1, 2 and 4 are considered active because their end dates fall in or after February.
For example, order ID 1 has the end date 9999-12-31, which is never ending, so it is always active in every date range.
Order ID 2 has the end date 2018-06-30, so it should be considered active for every month until June.
Order ID 4 has the end date 2018-02-28, so it is active for February.
Expected Output
Month NoOfOrders
Jan 2
Feb 3
Create a year-month table (inspired from this answer) and join the Order table against it
DECLARE @DateFrom datetime, @DateTo datetime
SET @DateFrom = '2018-01-01'
SET @DateTo = '2018-12-31'
SELECT YearMonth, COUNT(o.OrderID) AS NoOfOrders
FROM (SELECT CONVERT(CHAR(4),DATEADD(MONTH, x.number, @DateFrom),120) + '-' + CONVERT(CHAR(2),DATEADD(MONTH, x.number, @DateFrom),110) AS YearMonth,
CONVERT(DATE, CONVERT(CHAR(4),DATEADD(MONTH, x.number, @DateFrom),120) + '-' + CONVERT(CHAR(2),DATEADD(MONTH, x.number, @DateFrom),110) + '-01', 23) AS fulldate
FROM master.dbo.spt_values x
WHERE x.type = 'P'
AND x.number <= DATEDIFF(MONTH, @DateFrom, @DateTo)) YearMonthTbl
-- an order is active in a month if it was purchased in or before that month and ends on or after the first of that month
LEFT JOIN Orders o ON DATEDIFF(MONTH, o.PurchaseDate, fulldate) >= 0 AND fulldate <= o.OrderEndDate
GROUP BY YearMonth
I also decided to include the year in the output in case the input range crosses into a new year.
Here is the output for completeness
2018-01 2
2018-02 3
2018-03 2
2018-04 2
2018-05 2
2018-06 3
2018-07 1
2018-08 1
2018-09 1
2018-10 1
2018-11 1
2018-12 1
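The activity rule behind this output can be restated outside SQL as a quick sanity check. The Python sketch below is an illustration only: the function names are hypothetical, and it assumes an order is active in a month when it was purchased in or before that month and its end date is on or after the first day of that month, which is how the join condition above behaves for the sample data.

```python
from datetime import date

# Sample rows from the question: (OrderID, PurchaseDate, OrderEndDate).
ORDERS = [
    (1, date(2018, 1, 1), date(9999, 12, 31)),  # never-ending order
    (2, date(2018, 2, 2), date(2018, 6, 30)),
    (3, date(2018, 6, 1), date(2018, 6, 30)),
    (4, date(2018, 1, 1), date(2018, 2, 28)),
]


def month_starts(date_from, date_to):
    """Yield the first day of every month in the selected range."""
    year, month = date_from.year, date_from.month
    while (year, month) <= (date_to.year, date_to.month):
        yield date(year, month, 1)
        year, month = (year + 1, 1) if month == 12 else (year, month + 1)


def active_orders_per_month(orders, date_from, date_to):
    """Map 'YYYY-MM' to the number of orders active in that month."""
    result = {}
    for first in month_starts(date_from, date_to):
        active = sum(
            1
            for _order_id, purchased, ends in orders
            if (purchased.year, purchased.month) <= (first.year, first.month)
            and ends >= first
        )
        result[first.strftime("%Y-%m")] = active
    return result


per_month = active_orders_per_month(ORDERS, date(2018, 1, 1), date(2018, 12, 31))
for year_month, n in per_month.items():
    print(year_month, n)
```

Comparing (year, month) pairs rather than the month number alone keeps the rule valid when the selected range crosses into a new year.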
First part: handling records with orderenddate = '9999-12-31'
You can try something like the following. Adding an OR condition for orderenddate = '9999-12-31' makes sure that never-ending records appear in all searches, as long as the start date is within the boundary.
SELECT *
FROM [order]
WHERE purchasedate >= @startdate
AND ( orderenddate <= @enddate
OR orderenddate = '9999-12-31' )
Second part:
"sql query to capture number of active Orders between date range on month wise grouping."
For month-wise grouping, you can try something like the following.
;WITH numbersequence( number )
AS (SELECT 1 AS Number
UNION ALL
SELECT number + 1
FROM numbersequence
WHERE number < 12)
SELECT Sum(ct) ActiveOrderCount,
number AS [month]
FROM (SELECT number,
CASE
WHEN c.number >= Month(purchasedate)
AND c.number <= Month(orderenddate) THEN 1
ELSE 0
END ct
FROM #order
CROSS JOIN numbersequence c
WHERE purchasedate >= @startdate
AND ( orderenddate <= @enddate
OR orderenddate = '9999-12-31' )) t
GROUP BY number
Output
+------------------+-------+
| ActiveOrderCount | Month |
+------------------+-------+
| 2 | 1 |
+------------------+-------+
| 3 | 2 |
+------------------+-------+
| 2 | 3 |
+------------------+-------+
| 2 | 4 |
+------------------+-------+
| 2 | 5 |
+------------------+-------+
| 3 | 6 |
+------------------+-------+
| 1 | 7 |
+------------------+-------+
| 1 | 8 |
+------------------+-------+
| 1 | 9 |
+------------------+-------+
| 1 | 10 |
+------------------+-------+
| 1 | 11 |
+------------------+-------+
| 1 | 12 |
+------------------+-------+
Assumption: the start date and end date fall in the same year. Otherwise you also need to add a year condition.

calculate sum based on value of other row in another column

I am trying to figure out how to calculate the number of days on which the customer did not eat any candy.
Assume that the customer eats 1 candy per day.
If the customer purchases more candy, it gets added to the previous stock.
Eg.
Day Candy Puchased
0 30
40 30
65 30
110 30
125 40
170 30
Answer here is 20.
This means that on day 0 the customer bought 30 candies and the next purchase was on day 40, so there was no candy to eat from day 30 to day 39. In the same way, there was no candy to eat from day 100 to day 109.
Can anyone help me write the query? I think I have the wrong logic in my query.
select sum(curr.candy_purchased-(nxt.day-curr.day)) as diff
from candies as curr
left join candies as nxt
on nxt.day=(select min(day) from candies where day > curr.day)
You need a recursive CTE.
First I need a row id, so I use ROW_NUMBER.
Next I need the base case for the recursion:
Day: how many days have passed since the previous purchase (for the first row, the Day value from the table, 0)
PrevD: the absolute day of the purchase, kept so the next step can calculate Day (starts at 0)
Candy Puchased: how many candies were bought (30 from the table)
Remaining: how many candies are in stock after this purchase (starts at the first purchase, 30)
NotEat: how many days there was no candy to eat (starts at 0)
Level: recursion level (starts at 1)
Recursive case:
Day, PrevD and Candy Puchased are easy.
Remaining: if more days have passed than candies were left, nothing carries over; otherwise the leftover is added to the new purchase.
NotEat: keep adding the difference whenever the candy ran out.
WITH Candy as (
SELECT
ROW_NUMBER() over (order by [Day]) as rn,
*
FROM Table1
), EatCandy ([Day], [PrevD], [Candy Puchased], [Remaining], [NotEat], [Level]) as (
SELECT [Day], 0 as [PrevD], [Candy Puchased], [Candy Puchased] as [Remaining], 0 as [NotEat], 1 as [Level]
FROM Candy
WHERE rn = 1
UNION ALL
SELECT c.[Day] - ec.[PrevD],
c.[Day],
c.[Candy Puchased],
c.[Candy Puchased] +
IIF((c.[Day] - ec.[PrevD]) > ec.[Remaining], 0, ec.[Remaining] - (c.[Day] - ec.[PrevD])),
ec.[NotEat] +
IIF((c.[Day] - ec.[PrevD]) > ec.[Remaining], (c.[Day] - ec.[PrevD]) - ec.[Remaining], 0),
ec.[Level] + 1
FROM Candy c
JOIN EatCandy ec
ON c.rn = ec.[level] + 1
)
select * from EatCandy
OUTPUT
| Day | PrevD | Candy Puchased | Remaining | NotEat | Level |
|-----|-------|----------------|-----------|--------|-------|
| 0 | 0 | 30 | 30 | 0 | 1 |
| 40 | 40 | 30 | 30 | 10 | 2 |
| 25 | 65 | 30 | 35 | 10 | 3 |
| 45 | 110 | 30 | 30 | 20 | 4 |
| 15 | 125 | 40 | 55 | 20 | 5 |
| 45 | 170 | 30 | 40 | 20 | 6 |
To get the final answer, replace the last query with SELECT MAX(NotEat) FROM EatCandy.
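The recursion carries two running values from purchase to purchase: the stock left over and the candyless days so far. A plain Python loop over the same sample data makes that bookkeeping easy to follow. This is a sketch for checking the logic, not a replacement for the query, and the function name candyless_days is made up here.

```python
# Sample rows from the question: (Day, Candy Puchased).
PURCHASES = [(0, 30), (40, 30), (65, 30), (110, 30), (125, 40), (170, 30)]


def candyless_days(purchases):
    """Count days without candy, eating one candy per day between purchases."""
    stock = 0        # candies on hand right after the latest purchase
    missed = 0       # days on which there was nothing to eat
    prev_day = None
    for day, bought in sorted(purchases):
        if prev_day is not None:
            elapsed = day - prev_day
            # Days beyond the stock are candyless; the rest empty the stock.
            missed += max(0, elapsed - stock)
            stock = max(0, stock - elapsed)
        stock += bought
        prev_day = day
    return missed


total_missed = candyless_days(PURCHASES)
print(total_missed)
```

For the sample data the loop reproduces the NotEat column step by step (0, 10, 10, 20, 20, 20) and ends at 20.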
Nice question.
Check my answer and try it with different sample data.
If it does not work with other sample data, please let me know.
declare @t table([Day] int, CandyPuchased int)
insert into @t
values (0, 30),(40,30),(65, 30)
,(110, 30),(125,40),(170,30)
select * from @t
;With CTE as
(
select *,ROW_NUMBER()over(order by [day])rn from @t
)
,CTE1 as
(
select [day],[CandyPuchased],rn from CTE c where rn=1
union all
select a.[Day],case when a.Day-b.Day<b.CandyPuchased
then a.CandyPuchased+(b.CandyPuchased-(a.Day-b.Day))
else a.CandyPuchased end CandyPuchased
,a.rn from cte A
inner join CTE B on a.rn=b.rn+1
)
--select * from CTE1
select sum(case when a.Day-b.Day>b.CandyPuchased
then (a.Day-b.Day)-b.CandyPuchased else 0 end)[CandylessDays]
from CTE1 A
inner join CTE1 b on a.rn=b.rn+1
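If you just need the result at the end of the series, you don't really need that join. Note that this shortcut assumes no candies are left in stock when the last purchase is made; with the sample data 10 candies are still left on day 170, so it returns 10 rather than 20.
select max(days) --The highest day in the table (convert these to int first)
- (sum(candies) --Total candies purchased
- (select top 1 candies from MyTable order by days desc)) --Minus the candies purchased on the last day
from MyTable
If you need this as a sort of running total, try SUM with an OVER clause: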
select *, sum(candies) over (order by days) as TotalCandies
from MyTable
order by days desc

How to fill missing dates by groups in a table in sql

I want to know how to use loops to fill in missing dates with the value zero, based on the start and end dates of each group, so that each group has a consecutive time series in SQL. I have two questions:
How do I loop over each group?
How do I use the start and end dates of each group to dynamically fill in the missing dates?
My input and expected output are listed as below.
Input: I have a table A like
date value grp_no
8/06/12 1 1
8/08/12 1 1
8/09/12 0 1
8/07/12 2 2
8/08/12 1 2
8/12/12 3 2
Also I have a table B which can be used to left join with A to fill in missing dates.
date
...
8/05/12
8/06/12
8/07/12
8/08/12
8/09/12
8/10/12
8/11/12
8/12/12
8/13/12
...
How can I use A and B to generate the following output in sql?
Output:
date value grp_no
8/06/12 1 1
8/07/12 0 1
8/08/12 1 1
8/09/12 0 1
8/07/12 2 2
8/08/12 1 2
8/09/12 0 2
8/10/12 0 2
8/11/12 0 2
8/12/12 3 2
Please send me your code and suggestion. Thank you so much in advance!!!
You can do it like this without loops
SELECT p.date, COALESCE(a.value, 0) value, p.grp_no
FROM
(
SELECT grp_no, date
FROM
(
SELECT grp_no, MIN(date) min_date, MAX(date) max_date
FROM tableA
GROUP BY grp_no
) q CROSS JOIN tableb b
WHERE b.date BETWEEN q.min_date AND q.max_date
) p LEFT JOIN TableA a
ON p.grp_no = a.grp_no
AND p.date = a.date
The innermost subquery grabs the min and max dates per group. The cross join with TableB then produces all possible dates within the min-max range of each group. Finally, the outer select uses an outer join with TableA and fills the value column with 0 for dates that are missing in TableA.
Output:
| DATE | VALUE | GRP_NO |
|------------|-------|--------|
| 2012-08-06 | 1 | 1 |
| 2012-08-07 | 0 | 1 |
| 2012-08-08 | 1 | 1 |
| 2012-08-09 | 0 | 1 |
| 2012-08-07 | 2 | 2 |
| 2012-08-08 | 1 | 2 |
| 2012-08-09 | 0 | 2 |
| 2012-08-10 | 0 | 2 |
| 2012-08-11 | 0 | 2 |
| 2012-08-12 | 3 | 2 |
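To try the calendar-join approach without a database server, the sketch below loads the question's data into an in-memory SQLite database through Python's sqlite3 module and runs an equivalent query. It is an illustration under a few assumptions: dates are stored as ISO strings (so BETWEEN compares them correctly as text), tableB is filled with the days 2012-08-05 to 2012-08-13, and the helper name fill_missing_dates is made up.

```python
import sqlite3
from datetime import date, timedelta

# Table A from the question, with dates rewritten as ISO strings.
TABLE_A = [
    ("2012-08-06", 1, 1),
    ("2012-08-08", 1, 1),
    ("2012-08-09", 0, 1),
    ("2012-08-07", 2, 2),
    ("2012-08-08", 1, 2),
    ("2012-08-12", 3, 2),
]
# Table B: a calendar covering 2012-08-05 through 2012-08-13.
TABLE_B = [((date(2012, 8, 5) + timedelta(days=i)).isoformat(),) for i in range(9)]

QUERY = """
SELECT b.date, COALESCE(a.value, 0) AS value, q.grp_no
FROM (
    -- first and last date of each group
    SELECT grp_no, MIN(date) AS min_date, MAX(date) AS max_date
    FROM tableA
    GROUP BY grp_no
) q
JOIN tableB b
  ON b.date BETWEEN q.min_date AND q.max_date   -- every day in the group's range
LEFT JOIN tableA a
  ON a.grp_no = q.grp_no AND a.date = b.date    -- NULL where the day is missing
ORDER BY q.grp_no, b.date
"""


def fill_missing_dates(table_a, table_b):
    """Return (date, value, grp_no) rows with missing days filled with 0."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE tableA (date TEXT, value INTEGER, grp_no INTEGER)")
    conn.execute("CREATE TABLE tableB (date TEXT)")
    conn.executemany("INSERT INTO tableA VALUES (?, ?, ?)", table_a)
    conn.executemany("INSERT INTO tableB VALUES (?)", table_b)
    result = conn.execute(QUERY).fetchall()
    conn.close()
    return result


filled_rows = fill_missing_dates(TABLE_A, TABLE_B)
for row in filled_rows:
    print(row)
```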
I just needed a query that returns all the dates in the period I wanted, without the joins. I thought I'd share it for those who want to use it in their own queries. Just change the 365 to whatever timeframe you want.
DECLARE @s DATE = GETDATE()-365, @e DATE = GETDATE();
SELECT TOP (DATEDIFF(DAY, @s, @e)+1)
DATEADD(DAY, ROW_NUMBER() OVER (ORDER BY number)-1, @s)
FROM [master].dbo.spt_values
WHERE [type] = N'P' ORDER BY number
The following query does a union with tableA and tableB. It then uses group by to merge the rows from tableA and tableB so that all of the dates from tableB are in the result. If a date is not in tableA, then the row has 0 for value and grp_no. Otherwise, the row has the actual values for value and grp_no.
select
dat,
sum(val),
sum(grp)
from
(
select
date as dat,
value as val,
grp_no as grp
from
tableA
union
select
date,
0,
0
from
tableB
where
date >= date '2012-08-06' and
date <= date '2012-08-13'
)
group by
dat
order by
dat
I find this query easier to understand. It also runs faster: it takes 16 seconds, whereas a similar right join query takes 32 seconds.
This solution only works with numerical data. It also returns one row per date, so when several groups share a date their value and grp_no are summed together.
This solution assumes a fixed date range. With some extra work the query can be adapted to limit the date range to what is found in tableA.