Recursive calculation on CTE - sql-server-2012

I'm looking for a way, how to calculate formula as below:
month 1
result_1 = result_0 + (number_of_open - number_of_close), where result_0 = 0
month 2
result_2 = result_1 + (number_of_open - number_of_close)
month 3
result_3 = result_2 + (number_of_open - number_of_close)
I know how to calculate it when results are shown from the very beginning, means:
but when I have some dates chosen, for example from February 1, 2015, it doesn't count correctly:
It seems to be counted from the taken moment.
Is there any idea how to solve it?

I found a good enough workaround for this. I've been searching how to solve this, and found that I can use SUM() OVER() and it will work as a Running Total. But if I need it to work inside given time-range, I had to use subquery, so I was not able to make it with CTE and had to use temp tables. My code is now:
IF OBJECT_ID('tempdb..#Temp1') IS NOT NULL BEGIN
drop table #Temp1
end
SELECT * into #Temp1
from (
SELECT
datepart(yy, t3.[datestamp]) AS full_year
,datepart(mm, t3.[datestamp]) AS month_number
,count(*) as number_of_activities
,t2.affected_item
FROM [sm70prod].[dbo].[ACTSVCMGTM1] AS t3
JOIN [sm70prod].[dbo].[INCIDENTSM1] AS t2 ON t3.number = t2.incident_id
WHERE
t2.affected_item like 'Service_name'
AND (t3.[type] LIKE 'Open')
GROUP BY t2.affected_item, datepart(yy, t3.[datestamp]), datepart(mm, t3.[datestamp])
)
as databases (full_year, month_number, number_of_activities, affected_item)
;
IF OBJECT_ID('tempdb..#Temp2') IS NOT NULL BEGIN
drop table #Temp2
end
SELECT * into #Temp2
from (
SELECT
datepart(yy, t3.[datestamp]) AS full_year
,datepart(mm, t3.[datestamp]) AS month_number
,count(*) as number_of_activities
,t2.affected_item
FROM [sm70prod].[dbo].[ACTSVCMGTM1] AS t3
JOIN [sm70prod].[dbo].[INCIDENTSM1] AS t2 ON t3.number = t2.incident_id
WHERE
t2.affected_item like 'Service_name'
AND (t3.[type] LIKE 'Closed')
GROUP BY t2.affected_item, datepart(yy, t3.[datestamp]), datepart(mm, t3.[datestamp])
)
as databases (full_year, month_number, number_of_activities, affected_item)
select * from (select sum(o.number_of_activities - c.number_of_activities) over (ORDER BY c.full_year, c.month_number) as [backlog]
from #Temp1 o full join #Temp2 c on o.full_year = c.full_year and o.month_number = c.month_number) sub
--where 'time condition'
and I can put my dates/time condition in 'where' clause.
Now the issue is how to correctly add it to the code where I use CTE and few 'selects' combined with 'union all's. But this is a story for another question, I guess.

Related

Delete the records repeated by date, and keep the oldest

I have this query, and it returns the following result, I need to delete the records repeated by date, and keep the oldest, how could I do this?
select
a.EMP_ID, a.EMP_DATE,
from
EMPLOYES a
inner join
TABLE2 b on a.table2ID = b.table2ID and b.ID_TYPE = 'E'
where
a.ID = 'VJAHAJHSJHDAJHSJDH'
and year(a.DATE) = 2021
and month(a.DATE) = 1
and a.ID <> 31
order by
a.DATE;
Additionally, I would like to fill in the missing days of the month ... and put them empty if I don't have that data, can this be done?
I would appreciate if you could guide me to solve this problem
Thank you!
The other answers miss some of the requirement..
Initial step - do this once only. Make a calendar table. This will come in handy for all sorts of things over the time:
DECLARE #Year INT = '2000';
DECLARE #YearCnt INT = 50 ;
DECLARE #StartDate DATE = DATEFROMPARTS(#Year, '01','01')
DECLARE #EndDate DATE = DATEADD(DAY, -1, DATEADD(YEAR, #YearCnt, #StartDate));
;WITH Cal(n) AS
(
SELECT 0 UNION ALL SELECT n + 1 FROM Cal
WHERE n < DATEDIFF(DAY, #StartDate, #EndDate)
),
FnlDt(d, n) AS
(
SELECT DATEADD(DAY, n, #StartDate), n FROM Cal
),
FinalCte AS
(
SELECT
[D] = CONVERT(DATE,d),
[Dy] = DATEPART(DAY, d),
[Mo] = DATENAME(MONTH, d),
[Yr] = DATEPART(YEAR, d),
[DN] = DATENAME(WEEKDAY, d),
[N] = n
FROM FnlDt
)
SELECT * INTO Cal FROM finalCte
ORDER BY [Date]
OPTION (MAXRECURSION 0);
credit: mostly this site
Now we can write some simple query to stick your data (with one small addition) onto it:
--your query, minus the date bits in the WHERE, and with a ROW_NUMBER
WITH yourQuery AS(
SELECT a.emp_id, a.emp_date,
ROW_NUMBER() OVER(PARTITION BY CAST(a.emp_date AS DATE) ORDER BY a.emp_date) rn
FROM EMPLOYES a
INNER JOIN TABLE2 b on a.table2ID = b.table2ID
WHERE a.emp_id = 'VJAHAJHSJHDAJHSJDH' AND a.id <> 31 AND b.id_type = 'E'
)
--your query, left joined onto the cal table so that you get a row for every day even if there is no emp data for that day
SELECT c.d, yq.*
FROM
Cal c
LEFT JOIN yourQuery yq
ON
c.d = CAST(yq.emp_date AS DATE) AND --cut the time off
yq.rn = 1 --keep only the earliest time per day
WHERE
c.d BETWEEN '2021-01-01' AND EOMONTH('2021-01-01')
We add a rownumbering to your table, it restarts every time the date changes and counts up in order of time. We make this into a CTE (or a subquery, CTE is cleaner) then we simply left join it to the calendar table. This means that for any date you don't have data, you still have the calendar date. For any days you do have data, the rownumber rn being a condition of the join means that only the first datetime from each day is present in the results
Note: something is wonky about your question . You said you SELECT a.emp_id and your results show 'VJAHAJHSJHDAJHSJDH' is the emp id, but your where clause says a.id twice, once as a string and once as a number - this can't be right, so I've guessed at fixing it but I suspect you have translated your query into something for SO, perhaps to hide real column names.. Also your SELECT has a dangling comma that is a syntax error.
If you have translated/obscured your real query, make absolutely sure you understand any answer here when translating it back. It's very frustrating when someone is coming back and saying "hi your query doesn't work" then it turns out that they damaged it trying to translate it back to their own db, because they hid the real column names in the question..
FInally, do not use functions on table data in a where clause; it generally kills indexing. Always try and find a way of leaving table data alone. Want all of january? Do like I did, and say table.datecolumn BETWEEN firstofjan AND endofjan etc - SQLserver at least stands a chance of using an index for this, rather than calling a function on every date in the table, every time the query is run
You can use ROW_NUMBER
WITH CTE AS
(
SELECT a.EMP_ID, a.EMP_DATE,
RN = ROW_NUMBER() OVER (PARTITION BY a.EMP_ID, CAST(a.DATE as Date) ORDER BY a.DATE ASC)
from EMPLOYES a INNER JOIN TABLE2 b
on a.table2ID = b.table2ID
and b.ID_TYPE = 'E'
where a.ID = 'VJAHAJHSJHDAJHSJDH'
and year(a.DATE) = 2021
and MONTH(a.DATE) = 1
and a.ID <> 31
)
SELECT * FROM CTE
WHERE RN = 1
Try with an aggregate function MAX or MIN
create table #tmp(dt datetime, val numeric(4,2))
insert into #tmp values ('2021-01-01 10:30:35', 1)
insert into #tmp values ('2021-01-02 10:30:35', 2)
insert into #tmp values ('2021-01-02 11:30:35', 3)
insert into #tmp values ('2021-01-03 10:35:35', 4)
select * from #tmp
select tmp.*
from #tmp tmp
inner join
(select max(dt) as dt, cast(dt as date) as dt_aux from #tmp group by cast(dt as date)) compressed_rows on
tmp.dt = compressed_rows.dt
drop table #tmp
results:

SQL with as expression shows multiple results

I am writing a SQL query using with as expression. I always get a result in the square of what I required.
This is my query:
DECLARE #MAX_DATE AS INT
SET #MAX_DATE = (SELECT DATEPART(MONTH,FECHA) FROM ALBVENTACAB WHERE NUMALBARAN IN (SELECT DISTINCT MAX(NUMALBARAN) FROM ALBVENTACAB));
;WITH TABLE_LAST AS (
SELECT CONCAT(DATEPART(MONTH,FECHA),'-',DATEPART(YEAR,FECHA)) as LAST_YEAR_MONTH
,SUM(TOTALNETO) AS LAST_YEAR_VALUE
FROM ALBVENTACAB
WHERE DATEPART(YEAR,CURRENT_TIMESTAMP) -1 = DATEPART(YEAR,FECHA) AND NUMSERIE LIKE 'A%'
AND DATEPART(MONTH,FECHA) <= #MAX_DATE
GROUP BY CONCAT(DATEPART(MONTH,FECHA),'-',DATEPART(YEAR,FECHA))
)
,TABLE_CURRENT AS(
SELECT CONCAT(DATEPART(MONTH,FECHA),'-',DATEPART(YEAR,FECHA)) as CURR_YEAR_MONTH
,SUM(TOTALNETO) AS CURR_YEAR_VALUE
FROM ALBVENTACAB
WHERE DATEPART(YEAR,CURRENT_TIMESTAMP) <= DATEPART(YEAR,FECHA) AND NUMSERIE LIKE 'A%'
GROUP BY CONCAT(DATEPART(MONTH,FECHA),'-',DATEPART(YEAR,FECHA))
)
SELECT *
FROM TABLE_CURRENT, TABLE_LAST
When I run the query I get exactly the square of the result.
I want to compare sale monthly with last year.
2-2020 814053.3 2-2019 840295.1
1-2020 1094993.65 2-2019 840295.1
3-2020 293927.3 2-2019 840295.1
2-2020 814053.3 1-2019 1050701.68
1-2020 1094993.65 1-2019 1050701.68
3-2020 293927.3 1-2019 1050701.68
2-2020 814053.3 3-2019 887776.1
1-2020 1094993.65 3-2019 887776.1
3-2020 293927.3 3-2019 887776.1
I should get only 3 rows instead of 9 rows.
You need to properly join your two CTE - the way you're doing it now, you're getting a Cartesian product of each row in either CTE together.
Do something like:
*;WITH TABLE_LAST AS
( ....
),
TABLE_CURRENT AS
( ....
)
SELECT *
FROM TABLE_CURRENT curr
INNER JOIN TABLE_LAST last ON (some join condition here)
What that join condition is going to be - I have no idea, and cannot tell from your question - but you have to define how these two sets of data "connect" ....
It could be something like:
SELECT *
FROM TABLE_CURRENT curr
INNER JOIN TABLE_LAST last ON curr.CURR_YEAR_MONTH = last.LAST_YEAR_MONT
or whatever else makes sense in your situation - but basically, you need to somehow "tie together" these two sets of data and get only those rows that make sense - not just every row from "last" combined with every row from "curr" ....
While you already got the answer on how to join the two results, I thought I'd tell you how to typically approach such problems.
From the same table, you want two sums on different conditions (different years that is). You solve this with conditional aggregation, which does just that: aggregate (sum) based on a condition (year).
select
datepart(month, fecha) as month,
sum(case when datepart(year, fecha) = datepart(year, getdate()) then totalneto end) as this_year,
sum(case when datepart(year, fecha) = datepart(year, getdate()) -1 then totalneto end) as last_year
from albventacab
where numserie like 'A%'
and fecha > dateadd(year, -2, getdate())
group by datepart(month, fecha)
order by datepart(month, fecha);

Calculated fields from queries in CTE are quite slow, how to optimize

I have a query with calculated fields which involves looking up a dataset within a CTE for each of them, but it's quite slow when I get to a couple of these fields.
Here's an idea:
;WITH TRNCTE AS
(
SELECT TRN.PORT_N, TRN.TRADE_DATE, TRN.TRANS_TYPE, TRN.TRANS_SUB_CODE, TRN.SEC_TYPE, TRN.SETTLE_DATE
FROM TRNS_RPT TRN
WHERE TRN.TRADEDT >= '2014-01-01' AND TRN.TRADEDT <= '2014-12-31'
)
SELECT
C.CLIENT_NAME,
C.PORT_N,
C.PHONE_NUMBER,
CASE
WHEN EXISTS(SELECT TOP 1 1 FROM TRNCTE WHERE PORT_N = C.PORT_N AND MONTH(SETTLE_DATE) = 12) THEN 'DECEMBER TRANSACTION'
ELSE 'NOT DECEMBER TRANSACTION'
END AS ALIAS1
FROM CLIENTS C
WHERE EXISTS(SELECT TOP 1 1 FROM TRNCTE WHERE PORT_N = C.PORT_N)
If I had many of these calculated fields, the query can take up to 10 minutes to execute. Gathering the data in the CTE takes about 15 seconds for around 1,000,000 records.
I don't really need JOINS since I'm not really using the data that a JOIN would do, I only want to check for the existence of records in TRNS_RPT with certains criterias and set alias fields to certain values whether I find such records or not.
Can you help me optimize this ? Thanks
From looking at your code I would probably do a join instead to avoid having to include TRNCTE twice in the query. It is not exactly the same if TRNCTE.PORT_N is not unique and you would get duplicate rows.
;WITH TRNCTE AS
(
SELECT TRN.PORT_N, TRN.TRADE_DATE, TRN.TRANS_TYPE, TRN.TRANS_SUB_CODE, TRN.SEC_TYPE, TRN.SETTLE_DATE
FROM TRNS_RPT TRN
WHERE TRN.TRADEDT >= '2014-01-01' AND TRN.TRADEDT <= '2014-12-31'
)
SELECT
C.CLIENT_NAME,
C.PORT_N,
C.PHONE_NUMBER,
CASE
WHEN MONTH(TRNCTE.SETTLE_DATE) = 12)
THEN 'DECEMBER TRANSACTION'
ELSE 'NOT DECEMBER TRANSACTION'
END AS ALIAS1
FROM CLIENTS C
JOIN TRNCTE ON C.PORT_N = TRNCTE.PORT_N
There is a trick what you can do in cases like this. You need to insert the result of the cte into a temp table. This way you will save the cost of the recalculation.
Plese give it a try.
;WITH TRNCTE AS
(
SELECT TRN.PORT_N, TRN.TRADE_DATE, TRN.TRANS_TYPE, TRN.TRANS_SUB_CODE, TRN.SEC_TYPE, TRN.SETTLE_DATE
FROM TRNS_RPT TRN
WHERE TRN.TRADEDT >= '2014-01-01' AND TRN.TRADEDT <= '2014-12-31'
)
SELECT * INTO #temp
FROM TRNCTE
SELECT
C.CLIENT_NAME,
C.PORT_N,
C.PHONE_NUMBER,
CASE
WHEN EXISTS(SELECT TOP 1 1 FROM #temp WHERE PORT_N = C.PORT_N AND MONTH(SETTLE_DATE) = 12) THEN 'DECEMBER TRANSACTION'
ELSE 'NOT DECEMBER TRANSACTION'
END AS ALIAS1
FROM CLIENTS C
WHERE EXISTS(SELECT TOP 1 1 FROM #temp WHERE PORT_N = C.PORT_N)
DROP TABLE #temp

Running Total as SUM () OVER () in SQL when values are missing

I'm trying to calculate Running Total and it works correct but only when values I'm conditioning on are available. When some are unavailable, calculation is going wrong, some NULLs happen and at the end Running Total is incorrect, here's the example of such situation:
and I would like to be set like on the screen shot below (with missing months added), which should give correct Running Total (named backlog here) at the end:
Is there any way to define full_year and month_number columns to be visible with '0' value set when there was no value?
My current query is as below:
IF OBJECT_ID('tempdb..#Temp4') IS NOT NULL BEGIN
drop table #Temp4
end
SELECT * into #Temp4
from (
SELECT
datepart(yy, t3.[datestamp]) AS full_year
,datepart(mm, t3.[datestamp]) AS month_number
,count(*) as number_of_activities
,t2.affected_item
FROM [sm70prod].[dbo].[ACTSVCMGTM1] AS t3
JOIN [sm70prod].[dbo].[INCIDENTSM1] AS t2 ON t3.number = t2.incident_id
WHERE
t2.affected_item like 'service'
AND (t3.[type] LIKE 'Open')
GROUP BY t2.affected_item, datepart(yy, t3.[datestamp]), datepart(mm, t3.[datestamp])
)
as databases (full_year, month_number, number_of_activities, affected_item)
;
IF OBJECT_ID('tempdb..#Temp5') IS NOT NULL BEGIN
drop table #Temp5
end
SELECT * into #Temp5
from (
SELECT
datepart(yy, t3.[datestamp]) AS full_year
,datepart(mm, t3.[datestamp]) AS month_number
,count(*) as number_of_activities
,t2.affected_item
FROM [sm70prod].[dbo].[ACTSVCMGTM1] AS t3
JOIN [sm70prod].[dbo].[INCIDENTSM1] AS t2 ON t3.number = t2.incident_id
WHERE
t2.affected_item like 'service'
AND (t3.[type] LIKE 'Closed')
GROUP BY t2.affected_item, datepart(yy, t3.[datestamp]), datepart(mm, t3.[datestamp])
)
as databases (full_year, month_number, number_of_activities, affected_item)
select * from (select o.full_year
,o.month_number
,o.number_of_activities as [open]
,c.number_of_activities as [close]
,sum(o.number_of_activities - c.number_of_activities) over (ORDER BY c.full_year, c.month_number) as [backlog]
from #Temp4 o full join #Temp5 c on o.full_year = c.full_year and o.month_number = c.month_number) as sub
order by full_year, month_number
https://msdn.microsoft.com/en-gb/library/ms190349%28v=sql.110%29.aspx
Try using the coalesce function:
COALESCE(someattribute, 0);
if the attribute is NULL the value zero will be used instead.
Also note:
When comparing varchars without a regex you should use the = operator and not the LIKE operator.
I found the answer for that. For missing months I can use the code as below:
If(OBJECT_ID('tempdb..#Temp6') Is Not Null)
Begin
Drop Table #Temp6
End
create table #Temp6
(
full_year int
,month_number int
)
; WITH cteStartDate AS
(SELECT StartDate = '2007-01-01'),
cteSequence(SeqNo)
AS (SELECT 0
UNION ALL
SELECT SeqNo + 1
FROM cteSequence
WHERE SeqNo < DATEDIFF(MM,(SELECT StartDate FROM cteStartDate),getdate()))
INSERT INTO #Temp6
SELECT datepart(yy, DATEADD(MM,SeqNo,(SELECT StartDate FROM cteStartDate))) AS full_year
,datepart(mm, DATEADD(MM,SeqNo,(SELECT StartDate FROM cteStartDate))) AS month_number
FROM cteSequence
OPTION (MAXRECURSION 0)
Then I can use full join with #Temp4, add ISNULL(o.number_of_activities, 0), the same for #Temp5 and it will work.

How to self-join table in a way that every record is joined with the "previous" record?

I have a MS SQL table that contains stock data with the following columns: Id, Symbol, Date, Open, High, Low, Close.
I would like to self-join the table, so I can get a day-to-day % change for Close.
I must create a query that will join the table with itself in a way that every record contains also the data from the previous session (be aware, that I cannot use yesterday's date).
My idea is to do something like this:
select * from quotes t1
inner join quotes t2
on t1.symbol = t2.symbol and
t2.date = (select max(date) from quotes where symbol = t1.symbol and date < t1.date)
However I do not know if that's the correct/fastest way. What should I take into account when thinking about performance? (E.g. will putting UNIQUE index on a (Symbol, Date) pair improve performance?)
There will be around 100,000 new records every year in this table. I am using MS SQL Server 2008
One option is to use a recursive cte (if I'm understanding your requirements correctly):
WITH RNCTE AS (
SELECT *, ROW_NUMBER() OVER (PARTITION BY symbol ORDER BY date) rn
FROM quotes
),
CTE AS (
SELECT symbol, date, rn, cast(0 as decimal(10,2)) perc, closed
FROM RNCTE
WHERE rn = 1
UNION ALL
SELECT r.symbol, r.date, r.rn, cast(c.closed/r.closed as decimal(10,2)) perc, r.closed
FROM CTE c
JOIN RNCTE r on c.symbol = r.symbol AND c.rn+1 = r.rn
)
SELECT * FROM CTE
ORDER BY symbol, date
SQL Fiddle Demo
If you need a running total for each symbol to use as the percentage change, then easy enough to add an additional column for that amount -- wasn't completely sure what your intentions were, so the above just divides the current closed amount by the previous closed amount.
Something like this w'd work in SQLite:
SELECT ..
FROM quotes t1, quotes t2
WHERE t1.symbol = t2.symbol
AND t1.date < t2.date
GROUP BY t2.ID
HAVING t2.date = MIN(t2.date)
Given SQLite is a simplest of a kind, maybe in MSSQL this will also work with minimal changes.
Index on (symbol, date)
SELECT *
FROM quotes q_curr
CROSS APPLY (
SELECT TOP(1) *
FROM quotes
WHERE symbol = q_curr.symbol
AND date < q_curr.date
ORDER BY date DESC
) q_prev
You do something like this:
with OrderedQuotes as
(
select
row_number() over(order by Symbol, Date) RowNum,
ID,
Symbol,
Date,
Open,
High,
Low,
Close
from Quotes
)
select
a.Symbol,
a.Date,
a.Open,
a.High,
a.Low,
a.Close,
a.Date PrevDate,
a.Open PrevOpen,
a.High PrevHigh,
a.Low PrevLow,
a.Close PrevClose,
b.Close-a.Close/a.Close PctChange
from OrderedQuotes a
join OrderedQuotes b on a.Symbol = b.Symbol and a.RowNum = b.RowNum + 1
If you change the last join to a left join you get a row for the first date for each symbol, not sure if you need that.
You can use option with CTE and ROW_NUMBER ranking function
;WITH cte AS
(
SELECT symbol, date, [Open], [High], [Low], [Close],
ROW_NUMBER() OVER(PARTITION BY symbol ORDER BY date) AS Id
FROM quotes
)
SELECT c1.Id, c1.symbol, c1.date, c1.[Open], c1.[High], c1.[Low], c1.[Close],
ISNULL(c2.[Close] / c1.[Close], 0) AS perc
FROM cte c1 LEFT JOIN cte c2 ON c1.symbol = c2.symbol AND c1.Id = c2.Id + 1
ORDER BY c1.symbol, c1.date
For improving performance(avoiding sorting and RID Lookup) use this index
CREATE INDEX ix_symbol$date_quotes ON quotes(symbol, date) INCLUDE([Open], [High], [Low], [Close])
Simple demo on SQLFiddle
What you had is fine. I don't know if translating the sub-query into the join will help. However, you asked for it, so the way to do it might be to join the table to itself once more.
select *
from quotes t1
inner join quotes t2
on t1.symbol = t2.symbol and t1.date > t2.date
left outer join quotes t3
on t2.symbol = t3.symbol and t2.date > t3.date
where t3.date is null
You could do something like this:
DECLARE #Today DATETIME
SELECT #Today = DATEADD(DAY, 0, DATEDIFF(DAY, 0, CURRENT_TIMESTAMP))
;WITH today AS
(
SELECT Id ,
Symbol ,
Date ,
[OPEN] ,
High ,
LOW ,
[CLOSE],
DATEADD(DAY, -1, Date) AS yesterday
FROM quotes
WHERE date = #today
)
SELECT *
FROM today
LEFT JOIN quotes yesterday ON today.Symbol = yesterday.Symbol
AND today.yesterday = yesterday.Date
That way you limit your "today" results, if that's an option.
EDIT: The CTEs listed as other questions may work well, but I tend to be hesitant to use ROW_NUMBER when dealing with 100K rows or more. If the previous day may not always be yesterday, I tend to prefer to pull out the check for the previous day in its own query then use it for reference:
DECLARE #Today DATETIME, #PreviousDay DATETIME
SELECT #Today = DATEADD(DAY, 0, DATEDIFF(DAY, 0, CURRENT_TIMESTAMP));
SELECT #PreviousDay = MAX(Date) FROM quotes WHERE Date < #Today;
WITH today AS
(
SELECT Id ,
Symbol ,
Date ,
[OPEN] ,
High ,
LOW ,
[CLOSE]
FROM quotes
WHERE date = #today
)
SELECT *
FROM today
LEFT JOIN quotes AS previousday
ON today.Symbol = previousday.Symbol
AND previousday.Date = #PreviousDay