SQL running total

I'm trying to generate a running total by month and year. I've tried a few examples but I can't get it working. This is the SQL I have, and I want to add a running total for the TotalClients column:
Month | Year | TotalClients | Running Total
Jan   | 2014 | 1            | 1
Feb   | 2014 | 4            | 5
Mar   | 2014 | 8            | 13
select Month, Year, TotalClients
From Total
This was the code I was trying to use. I've used a declared table, as the main data comes from a different query, but this should be the bit you need. I also commented out one of the FROM lines, as I was trying it both ways; the commented-out line appeared in a few examples on the net, but I couldn't get it working.
select t1.monthstart, t1.yearstart, t1.TotalClients, sum(t2.TotalClients) as 'RunningTotal'
from #Totals t1 inner join #Totals t2 on t1.monthstart = t2.monthstart and t1.yearstart = t2.yearstart
--from #Totals t1, #Totals t2
WHERE t1.MonthStart <= t2.MonthStart and t1.Yearstart <= t2.Yearstart
GROUP BY t1.Yearstart, t1.MonthStart, t1.TotalClients
ORDER BY t1.yearstart , t1.monthstart

As @xQbert posted in the comments above (I advise reading that article), SQL Server "Windowing Functions" are what you want to use in version 2012+. Windowing functions are flexible and powerful, and far more efficient than self-joins.
As an actual answer, here would be some possible code for you to use:
SELECT YearStart, MonthStart,
ClientCount = SUM(TotalClients) OVER (
-- no PARTITION BY here: partitioning by year and month would leave
-- one row per partition, so nothing would accumulate
ORDER BY YearStart, MonthStart RANGE UNBOUNDED PRECEDING
)
FROM Totals t1
ORDER BY YearStart, MonthStart

I used this in the end. I added a full date column to simplify what I wanted, and it worked. I think the issue was in the join I used: it had the <= the wrong way around.
SELECT
st1.invoicestartdate,
st1.TotalClients,
RunningTotal = SUM(st2.TotalClients)
FROM
#Totals AS st1
INNER JOIN
#Totals AS st2
ON st2.invoicestartdate <= st1.invoicestartdate
GROUP BY st1.invoicestartdate, st1.TotalClients
ORDER BY st1.invoicestartdate;
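A quick way to see why the direction of the <= matters is to run both versions on the three sample rows from the question. This is only a sketch: it uses Python's built-in sqlite3 module as a stand-in for SQL Server, a plain table named Totals instead of #Totals, and made-up first-of-month dates for invoicestartdate.

```python
import sqlite3

# In-memory SQLite database standing in for the SQL Server temp table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Totals (invoicestartdate TEXT, TotalClients INTEGER)")
conn.executemany(
    "INSERT INTO Totals VALUES (?, ?)",
    [("2014-01-01", 1), ("2014-02-01", 4), ("2014-03-01", 8)],  # sample months from the question
)

QUERY = """
    SELECT st1.invoicestartdate,
           st1.TotalClients,
           SUM(st2.TotalClients) AS RunningTotal
    FROM Totals AS st1
    INNER JOIN Totals AS st2
            ON st2.invoicestartdate {op} st1.invoicestartdate
    GROUP BY st1.invoicestartdate, st1.TotalClients
    ORDER BY st1.invoicestartdate
"""

# Correct direction: each row sums itself and every EARLIER row.
running = conn.execute(QUERY.format(op="<=")).fetchall()

# Reversed direction: each row sums itself and every LATER row instead.
reversed_join = conn.execute(QUERY.format(op=">=")).fetchall()

for row in running:
    print(row)
```

With the <= join the RunningTotal column comes out as 1, 5, 13, matching the table in the question; with the comparison flipped it counts down from the grand total instead.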

This query works for SQL Server 2012 and up. I assumed Month is numeric (Jan = 1, Feb = 2, etc.)
SELECT *,
SUM(t.TotalClients) OVER (PARTITION BY t.[Year] ORDER BY t.[Month])
FROM #Totals t
It will reset the client count once the year changes. To keep it going, change the SUM clause to
SUM(t.TotalClients) OVER (ORDER BY t.[Year], t.[Month])
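To make the difference between the two SUM clauses concrete, here is a small sketch that runs both over data spanning a year boundary. It assumes Python's sqlite3 module with an SQLite build of 3.25 or later (needed for window functions) in place of SQL Server, and the four sample rows are invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Totals ([Year] INTEGER, [Month] INTEGER, TotalClients INTEGER)")
conn.executemany(
    "INSERT INTO Totals VALUES (?, ?, ?)",
    [(2014, 11, 3), (2014, 12, 2), (2015, 1, 5), (2015, 2, 1)],  # hypothetical data
)

# Running total that restarts whenever the year changes.
per_year = conn.execute(
    """
    SELECT t.[Year], t.[Month],
           SUM(t.TotalClients) OVER (PARTITION BY t.[Year] ORDER BY t.[Month]) AS RunningTotal
    FROM Totals t
    ORDER BY t.[Year], t.[Month]
    """
).fetchall()

# Running total that keeps accumulating across years.
continuous = conn.execute(
    """
    SELECT t.[Year], t.[Month],
           SUM(t.TotalClients) OVER (ORDER BY t.[Year], t.[Month]) AS RunningTotal
    FROM Totals t
    ORDER BY t.[Year], t.[Month]
    """
).fetchall()

print([r[2] for r in per_year])
print([r[2] for r in continuous])
```

The only change between the two queries is the OVER clause, so switching behaviour later is a one-line edit.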

Related

Teradata spool space issue on running a sub query with Count

I am using the query below to calculate business days between two dates for all the order numbers. Business days are already available in the Teradata table Common_WorkingCalendar. However, I'm hitting a spool space issue when I execute the query, even though I have ample space available in my data lab. I need to optimize the query and would appreciate any input.
SELECT
tx."OrderNumber",
(SELECT COUNT(1) FROM Common_WorkingCalendar
WHERE CalDate between Cast(tx."TimeStamp" as date) and Cast(mf.ShipDate as date)) as BusDays
from StoreFulfillment ff
inner join StoreTransmission tx
on tx.OrderNumber = ff.OrderNumber
inner join StoreMerchandiseFulfillment mf
on mf.OrderNumber = ff.OrderNumber
This is a very inefficient way to get this count which results in a product join.
The recommended approach is adding a sequential number to your calendar which increases only on business days (calculated using SUM(CASE WHEN businessDay THEN 1 ELSE 0 END) OVER (ORDER BY CalDate ROWS UNBOUNDED PRECEDING)), then it's two joins, for the start date and the end date.
If this calculation is needed a lot, you had better add a new column; otherwise you can do it on the fly:
WITH cte AS
(
SELECT CalDate,
-- as this table only contains business days you can use this instead
ROW_NUMBER() OVER (ORDER BY CalDate) AS DayNo
FROM Common_WorkingCalendar
)
SELECT
tx."OrderNumber",
to_dt.DayNo - from_dt.DayNo AS BusDays
FROM StoreFulfillment ff
INNER JOIN StoreTransmission tx
ON tx.OrderNumber = ff.OrderNumber
INNER JOIN StoreMerchandiseFulfillment mf
ON mf.OrderNumber = ff.OrderNumber
JOIN cte AS from_dt
ON from_dt.CalDate = Cast(tx."TimeStamp" AS DATE)
JOIN cte AS to_dt
ON to_dt.CalDate = Cast(mf.ShipDate AS DATE)
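The numbering trick can be tried out on a toy calendar. The sketch below is an assumption-laden stand-in: it uses Python's sqlite3 module (SQLite 3.25+ for ROW_NUMBER) rather than Teradata, collapses the three order tables into one hypothetical Orders table with SentDate and ShipDate columns, and fills Common_WorkingCalendar with the weekdays of January 2024.

```python
import sqlite3
from datetime import date, timedelta

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Common_WorkingCalendar (CalDate TEXT PRIMARY KEY)")
conn.execute("CREATE TABLE Orders (OrderNumber INTEGER, SentDate TEXT, ShipDate TEXT)")

# Calendar holds business days only (Mon-Fri of January 2024; no holidays modelled).
first = date(2024, 1, 1)
weekdays = [
    (d.isoformat(),)
    for d in (first + timedelta(days=i) for i in range(31))
    if d.weekday() < 5
]
conn.executemany("INSERT INTO Common_WorkingCalendar VALUES (?)", weekdays)

# Hypothetical orders: Fri -> Tue, and Tue -> Fri of the following week.
conn.executemany(
    "INSERT INTO Orders VALUES (?, ?, ?)",
    [(1, "2024-01-05", "2024-01-09"), (2, "2024-01-02", "2024-01-12")],
)

bus_days = conn.execute(
    """
    WITH cte AS (
        SELECT CalDate,
               ROW_NUMBER() OVER (ORDER BY CalDate) AS DayNo  -- sequential business-day number
        FROM Common_WorkingCalendar
    )
    SELECT o.OrderNumber,
           to_dt.DayNo - from_dt.DayNo AS BusDays
    FROM Orders o
    JOIN cte AS from_dt ON from_dt.CalDate = o.SentDate
    JOIN cte AS to_dt   ON to_dt.CalDate   = o.ShipDate
    ORDER BY o.OrderNumber
    """
).fetchall()

print(bus_days)
```

Note that the difference of day numbers excludes the start date, whereas the original COUNT with BETWEEN counts both endpoints, so add 1 if the inclusive count is what you need. Because these are inner joins, an order whose start or ship date is not a business day drops out of the result.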

SQL Server 2008 Running Total

I'm aware this has been asked but I'm completely baffled.
Trying to produce a running total by day using SQL Server 2008. I have looked at solutions elsewhere but am still completely perplexed.
The below code shows Daily sales but I cannot make a running total fit. Have looked at the similar solutions here but no luck. Have looked at partition by, order by, CTE etc but I'm just not there yet with SQL.
Would appreciate help, my code is below. I know this only returns the total grouped by day...
SELECT
dim_invoice_date.invoice_date AS 'Invoice Date',
round(SUM(invoice_amount_corp),2) AS 'Sales'
FROM
fact_om_bud_invoice
JOIN
dim_invoice_date ON fact_om_bud_invoice.dim_invoice_date_key = dim_invoice_date.dim_invoice_date_key
WHERE
dim_invoice_date.current_cal_month IN ('Current')
AND fact_om_bud_invoice.budget_code IN ('BUDGET')
GROUP BY
dim_invoice_date.invoice_date
HAVING
ROUND(SUM(invoice_amount_corp), 2) <> 0
ORDER BY
'Invoice Date'
This returns the output:
Invoice Date    Sales
-----------------------
4/10/2016       24,132
5/10/2016       15,849
6/10/2016       24,481
7/10/2016       10,243
10/10/2016      42,398
11/10/2016      24,187
Required format is something like:
Invoice Date    Sales     Running Sales
-------------------------------------------
04/10/2016      24,132    24,132
05/10/2016      15,849    39,981
06/10/2016      24,481    64,462
07/10/2016      10,243    74,705
10/10/2016      42,398    117,103
11/10/2016      24,187    141,290
dim_invoice_date is a numeric field, it's looking up a separate date table to display as date time.
For example, you can use a common table expression (WITH) and join it to itself on the row number:
WITH cte AS
(
SELECT
ROW_NUMBER() OVER(ORDER BY h.[Date]) RowN,
h.[Date],
SUM(s.Quantity) q
FROM
Sales s
JOIN Headers h
ON s.ID_Headers = h.ID
WHERE
h.[Date] > '2016.10.31'
GROUP BY
h.[Date]
)
SELECT
c.[Date],
c.q,
SUM(c1.q)
FROM
cte c
JOIN cte c1
ON c1.RowN <= c.RowN
GROUP BY
C.[Date],
c.q
ORDER BY
c.[Date]
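Here is the same CTE-plus-self-join pattern applied to the daily sales figures from the question, as a rough check that it gives the required output. Treat it as a sketch: it runs on Python's sqlite3 module (SQLite 3.25+) instead of SQL Server 2008, uses a single hypothetical invoices table, and splits the first day into two invoices just to exercise the GROUP BY.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE invoices (invoice_date TEXT, amount INTEGER)")
conn.executemany(
    "INSERT INTO invoices VALUES (?, ?)",
    [
        ("2016-10-04", 20000), ("2016-10-04", 4132),  # two invoices on the same day
        ("2016-10-05", 15849),
        ("2016-10-06", 24481),
        ("2016-10-07", 10243),
        ("2016-10-10", 42398),
        ("2016-10-11", 24187),
    ],
)

daily_running = conn.execute(
    """
    WITH cte AS (
        SELECT ROW_NUMBER() OVER (ORDER BY invoice_date) AS RowN,  -- 1, 2, 3... per day
               invoice_date,
               SUM(amount) AS sales
        FROM invoices
        GROUP BY invoice_date
    )
    SELECT c.invoice_date,
           c.sales,
           SUM(c1.sales) AS running_sales   -- this day plus all earlier days
    FROM cte c
    JOIN cte c1 ON c1.RowN <= c.RowN
    GROUP BY c.invoice_date, c.sales
    ORDER BY c.invoice_date
    """
).fetchall()

for row in daily_running:
    print(row)
```

The self-join compares every day with every earlier day, so the work grows quadratically with the number of days; for one month of daily rows that is unlikely to matter.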

Get rows with difference of dates being one

I have the following table and rows defined in SQLFiddle
I need to select rows from products table where difference between two rows start_date and
nvl(return_date,end_date) is 1. i.e. start_date of current row and nvl(return_date,end_date) of previous row should be one
For example
PRODUCT_NO TSH098 and PRODUCT_REG_NO FLDG, the END_DATE is August, 15 2012 and
PRODUCT_NO TSH128 and PRODUCT_REG_NO FLDG start_date is August, 16 2012, so the difference is only of a day.
How can I get the desired output using SQL?
Any help is highly appreciated.
Thanks
You can use the lag analytic function to access a row at a given physical offset prior to the current position. With your sort order it might look like this (not so elegant, though):
select *
from products p
join (select *
from(select p.Product_no
, p.Product_Reg_No
, case
when (lag(start_date, 1, start_date) over(order by product_reg_no)-
nvl(return_date, end_date)) = 1
then lag(start_date, 1, start_date)
over(order by product_reg_no)
end start_date
, End_Date
, Return_Date
from products p
order by 2,1 desc
)
where start_date is not null
) s
on (p.start_date = s.start_date or p.end_date = s.end_date)
order by 2, 1 desc
SQL FIddle DEMO
In Oracle, date + X adds X days to the date. So you can write:
select *
from products
where start_date + 1 = nvl(end_date, return_date)
If the dates could contain a time part, use trunc to remove the time part:
select *
from products
where trunc(start_date) + 1 = trunc(nvl(end_date, return_date))
Live example at SQL Fiddle.
I am under the impression that you only want dates differing by 1 day when the product reg no matches. So I simply joined on it, and I think this is what you want:
select p1.product_reg_no,
p1.product_no product_no_1,
p2.product_no product_no_2,
p1.start_date start_date_1,
nvl(p2.return_date,p2.end_date) return_or_end_date_2
from products p1
join products p2 on (p1.product_reg_no = p2.product_reg_no)
where p1.start_date-1 = nvl(p2.return_date,p2.end_date)
SQL Fiddle
If I was wrong about the grouping, just drop the join condition; with the given example products table this returns the same result:
select p1.product_reg_no,
p1.product_no product_no_1,
p2.product_no product_no_2,
p1.start_date start_date_1,
nvl(p2.return_date,p2.end_date) return_or_end_date_2
from products p1, products p2
where p1.start_date-1 = nvl(p2.return_date,p2.end_date)
SQL Fiddle 2
You said the difference is 1 day. I assumed that start_date is one day later than nvl(return_date,end_date), and that the dates are always at midnight. To drop both assumptions, you can use trunc and check in both directions:
select p1.product_reg_no,
p1.product_no product_no_1,
p2.product_no product_no_2,
p1.start_date start_date_1,
nvl(p2.return_date,p2.end_date) return_or_end_date_2
from products p1, products p2
where trunc(p1.start_date)-1 = trunc(nvl(p2.return_date,p2.end_date))
or trunc(p1.start_date)+1 = trunc(nvl(p2.return_date,p2.end_date))
SQL Fiddle 3
And this all works because date values (not timestamps) support arithmetic: you can add and subtract whole days.
EDIT: Following your comment you want return_date or end_date to be compared and equal dates are also wanted:
select p1.product_reg_no,
p1.product_no product_no_1,
p2.product_no product_no_2,
p1.start_date start_date_1,
p2.return_date return_date_2,
p2.end_date end_date_2
from products p1, products p2
where trunc(p1.start_date) = trunc(p2.return_date)
or trunc(p1.start_date)-1 = trunc(p2.return_date)
or trunc(p1.start_date)+1 = trunc(p2.return_date)
or trunc(p1.start_date) = trunc(p2.end_date)
or trunc(p1.start_date)-1 = trunc(p2.end_date)
or trunc(p1.start_date)+1 = trunc(p2.end_date)
SQL Fiddle 4
The way to compare the current row with the previous row is to use the LAG() function. Something like this:
select * from
(
select p.*
, lag (end_date) over
(order by start_date )
as prev_end_date
, lag (return_date) over
(order by start_date )
as prev_return_date
from products p
)
where (trunc(start_date) - 1) = trunc(nvl(prev_return_date, prev_end_date))
order by 2,1 desc
However, this will not return the results you desire, because you have not defined a mechanism for defining a sort order. And without a sort order the concept of "previous row" is meaningless.
However, what you can do is this:
select p1.*
, p2.*
from products p1 cross join products p2
where (trunc(p2.start_date) - 1) = trunc(nvl(p1.return_date, p1.end_date))
order by 2, 1 desc
This SQL queries your table twice, filtering on the basis of dates. Each row in the result set contains a record from each copy of the table. If a given start_date matches more than one end_date, or vice versa, you will get records for multiple hits.
You mean like this?
SELECT T2.*
FROM PRODUCTS T1
JOIN PRODUCTS T2 ON (
nvl(T1.end_date, T1.return_date) + 1 = T2.start_date
);
In your SQL Fiddle example, it returns:
PRODUCT_NO  PRODUCT_REG_NO  START_DATE                        END_DATE                          RETURN_DATE
TSH128      FLDG            August, 16 2012 00:00:00-0400     September, 15 2012 00:00:00-0400  (null)
TSH125      SCRW            August, 08 2012 00:00:00-0400     September, 07 2012 00:00:00-0400  (null)
TSH137      SCRW            September, 08 2012 00:00:00-0400  October, 07 2012 00:00:00-0400    (null)
TSH128 is returned for the reasons you already explained.
TSH125 is returned because TSH116 end_date is August, 07 2012.
TSH137 is returned because TSH125 end_date is September, 07 2012.
If you want to compare only rows within the same product_reg_no, it's easy to add that to the JOIN condition. If you want both "directions" of the 1-day difference, it's easy to add that too.
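For what it's worth, here is a sketch of the self-join restricted to the same product_reg_no and using the return date when there is one. It is not Oracle: it runs on Python's sqlite3 module, so COALESCE and date(x, '+1 day') stand in for nvl and date + 1. The end dates for TSH098, TSH116, TSH125 and the rows for TSH128 and TSH137 follow the thread; the remaining start dates and all BOLT rows are invented to have something to match against.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE products (product_no TEXT, product_reg_no TEXT, "
    "start_date TEXT, end_date TEXT, return_date TEXT)"
)
conn.executemany(
    "INSERT INTO products VALUES (?, ?, ?, ?, ?)",
    [
        ("TSH098", "FLDG", "2012-07-16", "2012-08-15", None),          # start date assumed
        ("TSH128", "FLDG", "2012-08-16", "2012-09-15", None),
        ("TSH116", "SCRW", "2012-07-08", "2012-08-07", None),          # start date assumed
        ("TSH125", "SCRW", "2012-08-08", "2012-09-07", None),
        ("TSH137", "SCRW", "2012-09-08", "2012-10-07", None),
        ("TSH200", "BOLT", "2012-08-01", "2012-08-31", "2012-08-20"),  # hypothetical: returned early
        ("TSH201", "BOLT", "2012-08-21", "2012-09-05", None),          # hypothetical
        ("TSH202", "BOLT", "2012-09-10", "2012-10-09", None),          # hypothetical: gap, no match
    ],
)

pairs = conn.execute(
    """
    SELECT p1.product_no AS previous_product,
           p2.product_no AS next_product
    FROM products p1
    JOIN products p2
      ON p1.product_reg_no = p2.product_reg_no
     AND date(COALESCE(p1.return_date, p1.end_date), '+1 day') = p2.start_date
    ORDER BY p2.product_no
    """
).fetchall()

for previous_product, next_product in pairs:
    print(previous_product, "->", next_product)
```

Putting both conditions in the ON clause avoids the cross join, so a product is never paired with one from a different registration number.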

Filling in missing dates DB2 SQL

My initial query looks like this:
select process_date, count(*) batchCount
from T1.log_comments
group by process_date
order by process_date asc;
I need to be able to do some quick analysis for weekends that are missing, but wanted to know if there was a quick way to fill in the missing dates not present in process_date.
I've seen the solution here but am curious if there's any magic hidden in db2 that could do this with only a minor modification to my original query.
Note: I originally framed this based on my exposure to SQL Server/Oracle; it has now been amended and tested on DB2.
WITH MaxDateQry(MaxDate) AS
(
SELECT MAX(process_date) FROM T1.log_comments
),
MinDateQry(MinDate) AS
(
SELECT MIN(process_date) FROM T1.log_comments
),
DatesData(ProcessDate) AS
(
SELECT MinDate from MinDateQry
UNION ALL
SELECT (ProcessDate + 1 DAY) FROM DatesData WHERE ProcessDate < (SELECT MaxDate FROM MaxDateQry)
)
SELECT a.ProcessDate, b.batchCount
FROM DatesData a LEFT JOIN
(
SELECT process_date, COUNT(*) batchCount
FROM T1.log_comments
GROUP BY process_date
) b
ON a.ProcessDate = b.process_date
ORDER BY a.ProcessDate ASC;
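The recursive date generator is easy to try on a couple of rows. The sketch below uses Python's sqlite3 module rather than DB2, so date(d, '+1 day') replaces ProcessDate + 1 DAY and the table is called log_comments without the T1 schema. It also swaps the NULL for missing days with 0 via COALESCE, which is my choice rather than part of the answer above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE log_comments (process_date TEXT)")
conn.executemany(
    "INSERT INTO log_comments VALUES (?)",
    [("2024-03-01",), ("2024-03-01",), ("2024-03-04",)],  # Friday x2, then Monday
)

filled = conn.execute(
    """
    WITH RECURSIVE DatesData(ProcessDate) AS (
        SELECT MIN(process_date) FROM log_comments        -- anchor: first date present
        UNION ALL
        SELECT date(ProcessDate, '+1 day')                -- step: next calendar day
        FROM DatesData
        WHERE ProcessDate < (SELECT MAX(process_date) FROM log_comments)
    )
    SELECT a.ProcessDate,
           COALESCE(b.batchCount, 0) AS batchCount        -- 0 instead of NULL for gaps
    FROM DatesData a
    LEFT JOIN (
        SELECT process_date, COUNT(*) AS batchCount
        FROM log_comments
        GROUP BY process_date
    ) b ON a.ProcessDate = b.process_date
    ORDER BY a.ProcessDate
    """
).fetchall()

for row in filled:
    print(row)
```

The missing weekend shows up as two rows with a count of 0, which is the gap the question wanted to analyse.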

Sql Server - Joining subqueries using calculated fields

I am trying to calculate the percentage change in price between days. As the days are not consecutive, I built into the query a calculated field that tells me what relative day it is (day 1, day 2, etc.). In order to compare today with yesterday, I offset the calculated day number by 1 in a subquery. What I want to do is join the inner and outer query on the calculated relative day. The code I came up with is:
SELECT TOP 11
P.Date,
(AVG(P.SettlementPri) - PriceY) / PriceY as PriceChange,
P.Symbol,
(RANK() OVER (ORDER BY P.Date desc)) as dayrank_Today
FROM OTE P
JOIN (SELECT TOP 11
C.Date,
AVG(SettlementPri) as PriceY,
(RANK() OVER (ORDER BY C.Date desc))+1 as dayrank_Yest
FROM OTE C
WHERE C.ComCode = 'C-'
GROUP BY c.Date) C ON dayrank_Today = C.dayrank_Yest
WHERE P.ComCode = 'C-'
GROUP BY P.Symbol, P.Date
If I try to execute the query, I get an error message indicating dayrank_Today is an invalid column. I have tried renaming it, qualifying it, and yelling obscenities at it, and I get squat. Still an error.
You can't define a calculated column in a SELECT and then use its alias in a join at the same query level. You can use CTEs, which I'm not so familiar with, or you can just select from derived tables like so:
SELECT
P.Date,
(P.AvgPrice - C.PriceY) / C.PriceY as PriceChange,
P.Symbol,
P.dayrank_Today FROM
-- aggregate inside the derived table so dayrank_Today exists before the join
(SELECT TOP 11
Date,
AVG(SettlementPri) as AvgPrice,
Symbol,
(RANK() OVER (ORDER BY Date desc)) as dayrank_Today
FROM OTE WHERE ComCode = 'C-'
GROUP BY Symbol, Date) P
JOIN (SELECT TOP 11
C.Date,
AVG(SettlementPri) as PriceY,
(RANK() OVER (ORDER BY C.Date desc))+1 as dayrank_Yest
FROM OTE C
WHERE C.ComCode = 'C-'
GROUP BY c.Date) C ON P.dayrank_Today = C.dayrank_Yest
If possible consider using a CTE as it makes it very easy. Something like this:
With Raw as
(
SELECT TOP 11 C.Date,
Avg(SettlementPri) As PriceY,
Rank() OVER (ORDER BY C.Date desc) as dayrank
FROM OTE C WHERE C.Comcode = 'C-'
Group by C.Date
)
select today.pricey as todayprice ,
yesterday.pricey as yesterdayprice,
(today.pricey - yesterday.pricey)/yesterday.pricey * 100 as percentchange
from Raw today
-- ranks are in descending date order, so the previous day has the next higher rank
left outer join Raw yesterday on yesterday.dayrank = today.dayrank + 1
Obviously this doesn't include the symbol, but that can be included pretty easily.
If the WITH syntax doesn't suit, you can also use calculated fields with OUTER APPLY: http://technet.microsoft.com/en-us/library/ms175156.aspx
Although the CTE will mean that you only need to write your price calculation once which is a lot cleaner
Cheers
I had the same problem and found this thread and found a solution so I thought I'd post it here.
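Instead of using the column alias in ON, copy the expression that produced the column in the first place. Note that this only works for ordinary expressions: SQL Server allows windowed functions such as RANK() only in the SELECT and ORDER BY clauses, so an expression like the one below still has to be computed in a derived table or CTE first.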
replace:
ON dayrank_Today = C.dayrank_Yest
with:
ON (RANK() OVER (ORDER BY Date desc)) = C.dayrank_Yest
Granted, you're displeasing the Programming Gods by violating DRY, but you could be pragmatic and mention the duplication in the comments, which should appease their wrath to a mild grumbling.
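If the duplication bothers you, the CTE route keeps the rank expression in one place. Below is a minimal sketch of it, with several assumptions: it runs on Python's sqlite3 module (SQLite 3.25+) rather than SQL Server, the table is a cut-down hypothetical OTE with only Date and SettlementPri, the TOP 11 limit is left out, and the prices are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE OTE (Date TEXT, SettlementPri REAL)")
conn.executemany(
    "INSERT INTO OTE VALUES (?, ?)",
    [
        ("2024-01-02", 90.0), ("2024-01-02", 110.0),  # averages to 100
        ("2024-01-03", 110.0),
        ("2024-01-05", 99.0),                         # non-consecutive trading day
    ],
)

changes = conn.execute(
    """
    WITH Raw AS (
        SELECT Date,
               AVG(SettlementPri) AS Price,
               RANK() OVER (ORDER BY Date DESC) AS dayrank   -- 1 = most recent day
        FROM OTE
        GROUP BY Date
    )
    SELECT today.Date,
           (today.Price - yesterday.Price) / yesterday.Price AS PriceChange
    FROM Raw today
    JOIN Raw yesterday
      ON yesterday.dayrank = today.dayrank + 1                -- previous trading day
    ORDER BY today.Date
    """
).fetchall()

for day, change in changes:
    print(day, round(change, 4))
```

Because the rank is computed inside the CTE, the join can refer to dayrank by name on both sides, and the earliest day simply has no row since there is nothing before it to compare with.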