Moving/Rolling Average in SQL - sql

I am using SQL Server 2005.
Consider the following table with three columns: issueid, date and rate: sqlfiddle.com/#!2/611682
The result I am looking for is:
For issueid 1, the average on 3/31/2014 is the average of the rate values on 01/31/2014, 02/28/2014 and 3/31/2014. In other words, for each security and at each the date, the moving average is the average of the rate values for that month and the two previous months.
I would like the result to be dumped in a new column of the table.
Is there any way to do that in an efficient way?
Thank you for your help!

Try this:
SELECT A.issueid, A.[date], A.rate, AVG(B.rate)
FROM test_table A
OUTER APPLY (SELECT *
FROM test_table
WHERE issueid = A.issueid
AND [date] BETWEEN DATEADD(MONTH,-2,A.[date]) AND A.[date]) B
GROUP BY A.issueid, A.[date], A.rate
ORDER BY A.issueid, A.[date]

This works, if it is always a 3 month rolling, using SQL 2005 can't think of a more efficient way
SELECT r.IssueId, r.date, (r.rate + r1.rate +r2.rate) /3 AS AvgRate
FROM dbo.Rates AS r
LEFT OUTER JOIN ( SELECT Rates.IssueId,
Rates.date,
Rates.rate
FROM Rates ) r1 ON r1.IssueId = r.IssueId AND r1.date = CONVERT(VARCHAR(10), DATEADD(s,-1,DATEADD(mm, DATEDIFF(m,0,r.date),0)),1)
LEFT OUTER JOIN ( SELECT Rates.IssueId,
Rates.date,
Rates.rate
FROM Rates ) r2 ON r2.IssueId = r.IssueId AND r2.date = CONVERT(VARCHAR(10), DATEADD(s,-1,DATEADD(mm, DATEDIFF(m,0,r1.date),0)),1)

Related

Delete the records repeated by date, and keep the oldest

I have this query, and it returns the following result, I need to delete the records repeated by date, and keep the oldest, how could I do this?
select
a.EMP_ID, a.EMP_DATE,
from
EMPLOYES a
inner join
TABLE2 b on a.table2ID = b.table2ID and b.ID_TYPE = 'E'
where
a.ID = 'VJAHAJHSJHDAJHSJDH'
and year(a.DATE) = 2021
and month(a.DATE) = 1
and a.ID <> 31
order by
a.DATE;
Additionally, I would like to fill in the missing days of the month ... and put them empty if I don't have that data, can this be done?
I would appreciate if you could guide me to solve this problem
Thank you!
The other answers miss some of the requirement..
Initial step - do this once only. Make a calendar table. This will come in handy for all sorts of things over the time:
DECLARE #Year INT = '2000';
DECLARE #YearCnt INT = 50 ;
DECLARE #StartDate DATE = DATEFROMPARTS(#Year, '01','01')
DECLARE #EndDate DATE = DATEADD(DAY, -1, DATEADD(YEAR, #YearCnt, #StartDate));
;WITH Cal(n) AS
(
SELECT 0 UNION ALL SELECT n + 1 FROM Cal
WHERE n < DATEDIFF(DAY, #StartDate, #EndDate)
),
FnlDt(d, n) AS
(
SELECT DATEADD(DAY, n, #StartDate), n FROM Cal
),
FinalCte AS
(
SELECT
[D] = CONVERT(DATE,d),
[Dy] = DATEPART(DAY, d),
[Mo] = DATENAME(MONTH, d),
[Yr] = DATEPART(YEAR, d),
[DN] = DATENAME(WEEKDAY, d),
[N] = n
FROM FnlDt
)
SELECT * INTO Cal FROM finalCte
ORDER BY [Date]
OPTION (MAXRECURSION 0);
credit: mostly this site
Now we can write some simple query to stick your data (with one small addition) onto it:
--your query, minus the date bits in the WHERE, and with a ROW_NUMBER
WITH yourQuery AS(
SELECT a.emp_id, a.emp_date,
ROW_NUMBER() OVER(PARTITION BY CAST(a.emp_date AS DATE) ORDER BY a.emp_date) rn
FROM EMPLOYES a
INNER JOIN TABLE2 b on a.table2ID = b.table2ID
WHERE a.emp_id = 'VJAHAJHSJHDAJHSJDH' AND a.id <> 31 AND b.id_type = 'E'
)
--your query, left joined onto the cal table so that you get a row for every day even if there is no emp data for that day
SELECT c.d, yq.*
FROM
Cal c
LEFT JOIN yourQuery yq
ON
c.d = CAST(yq.emp_date AS DATE) AND --cut the time off
yq.rn = 1 --keep only the earliest time per day
WHERE
c.d BETWEEN '2021-01-01' AND EOMONTH('2021-01-01')
We add a rownumbering to your table, it restarts every time the date changes and counts up in order of time. We make this into a CTE (or a subquery, CTE is cleaner) then we simply left join it to the calendar table. This means that for any date you don't have data, you still have the calendar date. For any days you do have data, the rownumber rn being a condition of the join means that only the first datetime from each day is present in the results
Note: something is wonky about your question . You said you SELECT a.emp_id and your results show 'VJAHAJHSJHDAJHSJDH' is the emp id, but your where clause says a.id twice, once as a string and once as a number - this can't be right, so I've guessed at fixing it but I suspect you have translated your query into something for SO, perhaps to hide real column names.. Also your SELECT has a dangling comma that is a syntax error.
If you have translated/obscured your real query, make absolutely sure you understand any answer here when translating it back. It's very frustrating when someone is coming back and saying "hi your query doesn't work" then it turns out that they damaged it trying to translate it back to their own db, because they hid the real column names in the question..
FInally, do not use functions on table data in a where clause; it generally kills indexing. Always try and find a way of leaving table data alone. Want all of january? Do like I did, and say table.datecolumn BETWEEN firstofjan AND endofjan etc - SQLserver at least stands a chance of using an index for this, rather than calling a function on every date in the table, every time the query is run
You can use ROW_NUMBER
WITH CTE AS
(
SELECT a.EMP_ID, a.EMP_DATE,
RN = ROW_NUMBER() OVER (PARTITION BY a.EMP_ID, CAST(a.DATE as Date) ORDER BY a.DATE ASC)
from EMPLOYES a INNER JOIN TABLE2 b
on a.table2ID = b.table2ID
and b.ID_TYPE = 'E'
where a.ID = 'VJAHAJHSJHDAJHSJDH'
and year(a.DATE) = 2021
and MONTH(a.DATE) = 1
and a.ID <> 31
)
SELECT * FROM CTE
WHERE RN = 1
Try with an aggregate function MAX or MIN
create table #tmp(dt datetime, val numeric(4,2))
insert into #tmp values ('2021-01-01 10:30:35', 1)
insert into #tmp values ('2021-01-02 10:30:35', 2)
insert into #tmp values ('2021-01-02 11:30:35', 3)
insert into #tmp values ('2021-01-03 10:35:35', 4)
select * from #tmp
select tmp.*
from #tmp tmp
inner join
(select max(dt) as dt, cast(dt as date) as dt_aux from #tmp group by cast(dt as date)) compressed_rows on
tmp.dt = compressed_rows.dt
drop table #tmp
results:

Showing list of all 24 hours in sql server if there is no data also

I have a query where I need to show 24 hour calls for each day.
But I am getting the hours which I have calls only.
My requirement is I need to get all the hours split and 0 if there are no calls.
Please suggest
Below is my code.
select #TrendStartDate
,isd.Name
,isd.Call_ID
,isd.callType
,DATEPART(HOUR,isd.ArrivalTime)
from [PHONE_CALLS] ISD WITH (NOLOCK)
WHERE CallType = 'Incoming'
and Name not in ('DefaultQueue')
and CAST(ArrivalTime as DATe) between #TrendStartDate and #TrendEndDate
The basic idea is that you use a table containing numbers from 0 to 23, and left join that to your data table:
WITH CTE AS
(
SELECT TOP 24 ROW_NUMBER() OVER(ORDER BY ##SPID) - 1 As TheHour
FROM sys.objects
)
SELECT #TrendStartDate
,isd.Name
,isd.Call_ID
,isd.callType
,TheHour
FROM CTE
LEFT JOIN [PHONE_CALLS] ISD WITH (NOLOCK)
ON DATEPART(HOUR,isd.ArrivalTime) = TheHour
AND CallType = 'Incoming'
AND Name NOT IN ('DefaultQueue')
AND CAST(ArrivalTime as DATe) BETWEEN #TrendStartDate AND #TrendEndDate
If you have a tally table, you should use that. If not, the cte will provide you with numbers from 0 to 23.
If you have a numbers table you can use a query like the following:
SELECT d.Date,
h.Hour,
Calls = COUNT(pc.Call_ID)
FROM ( SELECT [Hour] = Number
FROM dbo.Numbers
WHERE Number >= 0
AND Number < 24
) AS h
CROSS JOIN
( SELECT Date = DATEADD(DAY, Number, #TrendStartDate)
FROM dbo.Numbers
WHERE Number <= DATEDIFF(DAY, #TrendStartDate, #TrendEndDate)
) AS d
LEFT JOIN [PHONE_CALLS] AS pc
ON pc.CallType = 'Incoming'
AND pc.Name NOT IN ('DefaultQueue')
AND CAST(pc.ArrivalTime AS DATE) = d.Date
AND DATEPART(HOUR, pc.ArrivalTime) = h.Hour
GROUP BY d.Date, h.Hour
ORDER BY d.Date, h.Hour;
The key is to get all the hours you need:
SELECT [Hour] = Number
FROM dbo.Numbers
WHERE Number >= 0
AND Number < 24
And all the days that you need in your range:
SELECT Date = DATEADD(DAY, Number, #TrendStartDate)
FROM dbo.Numbers
WHERE Number < DATEDIFF(DAY, #TrendStartDate, #TrendEndDate)
Then cross join the two, so that you are guaranteed to have all 24 hours for each day you want. Finally, you can left join to your call table to get the count of calls.
Example on DB<>Fiddle
You can use SQL SERVER recursivity with CTE to generate the hours between 0 and 23 and then a left outer join with the call table
You also use any other Method mentioned in this link to generate numbers from 0 to 23
Link to SQLFiddle
set dateformat ymd
declare #calls as table(date date,hour int,calls int)
insert into #calls values('2020-01-02',0,66),('2020-01-02',1,888),
('2020-01-02',2,5),('2020-01-02',3,8),
('2020-01-02',4,9),('2020-01-02',5,55),('2020-01-02',6,44),('2020-01-02',7,87),('2020-01-02',8,90),
('2020-01-02',9,34),('2020-01-02',10,22),('2020-01-02',11,65),('2020-01-02',12,54),('2020-01-02',13,78),
('2020-01-02',23,99);
with cte as (select 0 n,date from #calls union all select 1+n,date from cte where 1+n <24)
select distinct(cte.date),cte.n [Hour],isnull(ca.calls,0) calls from cte left outer join #calls ca on cte.n=ca.hour and cte.date=ca.date

How to self-join table in a way that every record is joined with the "previous" record?

I have a MS SQL table that contains stock data with the following columns: Id, Symbol, Date, Open, High, Low, Close.
I would like to self-join the table, so I can get a day-to-day % change for Close.
I must create a query that will join the table with itself in a way that every record contains also the data from the previous session (be aware, that I cannot use yesterday's date).
My idea is to do something like this:
select * from quotes t1
inner join quotes t2
on t1.symbol = t2.symbol and
t2.date = (select max(date) from quotes where symbol = t1.symbol and date < t1.date)
However I do not know if that's the correct/fastest way. What should I take into account when thinking about performance? (E.g. will putting UNIQUE index on a (Symbol, Date) pair improve performance?)
There will be around 100,000 new records every year in this table. I am using MS SQL Server 2008
One option is to use a recursive cte (if I'm understanding your requirements correctly):
WITH RNCTE AS (
SELECT *, ROW_NUMBER() OVER (PARTITION BY symbol ORDER BY date) rn
FROM quotes
),
CTE AS (
SELECT symbol, date, rn, cast(0 as decimal(10,2)) perc, closed
FROM RNCTE
WHERE rn = 1
UNION ALL
SELECT r.symbol, r.date, r.rn, cast(c.closed/r.closed as decimal(10,2)) perc, r.closed
FROM CTE c
JOIN RNCTE r on c.symbol = r.symbol AND c.rn+1 = r.rn
)
SELECT * FROM CTE
ORDER BY symbol, date
SQL Fiddle Demo
If you need a running total for each symbol to use as the percentage change, then easy enough to add an additional column for that amount -- wasn't completely sure what your intentions were, so the above just divides the current closed amount by the previous closed amount.
Something like this w'd work in SQLite:
SELECT ..
FROM quotes t1, quotes t2
WHERE t1.symbol = t2.symbol
AND t1.date < t2.date
GROUP BY t2.ID
HAVING t2.date = MIN(t2.date)
Given SQLite is a simplest of a kind, maybe in MSSQL this will also work with minimal changes.
Index on (symbol, date)
SELECT *
FROM quotes q_curr
CROSS APPLY (
SELECT TOP(1) *
FROM quotes
WHERE symbol = q_curr.symbol
AND date < q_curr.date
ORDER BY date DESC
) q_prev
You do something like this:
with OrderedQuotes as
(
select
row_number() over(order by Symbol, Date) RowNum,
ID,
Symbol,
Date,
Open,
High,
Low,
Close
from Quotes
)
select
a.Symbol,
a.Date,
a.Open,
a.High,
a.Low,
a.Close,
a.Date PrevDate,
a.Open PrevOpen,
a.High PrevHigh,
a.Low PrevLow,
a.Close PrevClose,
b.Close-a.Close/a.Close PctChange
from OrderedQuotes a
join OrderedQuotes b on a.Symbol = b.Symbol and a.RowNum = b.RowNum + 1
If you change the last join to a left join you get a row for the first date for each symbol, not sure if you need that.
You can use option with CTE and ROW_NUMBER ranking function
;WITH cte AS
(
SELECT symbol, date, [Open], [High], [Low], [Close],
ROW_NUMBER() OVER(PARTITION BY symbol ORDER BY date) AS Id
FROM quotes
)
SELECT c1.Id, c1.symbol, c1.date, c1.[Open], c1.[High], c1.[Low], c1.[Close],
ISNULL(c2.[Close] / c1.[Close], 0) AS perc
FROM cte c1 LEFT JOIN cte c2 ON c1.symbol = c2.symbol AND c1.Id = c2.Id + 1
ORDER BY c1.symbol, c1.date
For improving performance(avoiding sorting and RID Lookup) use this index
CREATE INDEX ix_symbol$date_quotes ON quotes(symbol, date) INCLUDE([Open], [High], [Low], [Close])
Simple demo on SQLFiddle
What you had is fine. I don't know if translating the sub-query into the join will help. However, you asked for it, so the way to do it might be to join the table to itself once more.
select *
from quotes t1
inner join quotes t2
on t1.symbol = t2.symbol and t1.date > t2.date
left outer join quotes t3
on t2.symbol = t3.symbol and t2.date > t3.date
where t3.date is null
You could do something like this:
DECLARE #Today DATETIME
SELECT #Today = DATEADD(DAY, 0, DATEDIFF(DAY, 0, CURRENT_TIMESTAMP))
;WITH today AS
(
SELECT Id ,
Symbol ,
Date ,
[OPEN] ,
High ,
LOW ,
[CLOSE],
DATEADD(DAY, -1, Date) AS yesterday
FROM quotes
WHERE date = #today
)
SELECT *
FROM today
LEFT JOIN quotes yesterday ON today.Symbol = yesterday.Symbol
AND today.yesterday = yesterday.Date
That way you limit your "today" results, if that's an option.
EDIT: The CTEs listed as other questions may work well, but I tend to be hesitant to use ROW_NUMBER when dealing with 100K rows or more. If the previous day may not always be yesterday, I tend to prefer to pull out the check for the previous day in its own query then use it for reference:
DECLARE #Today DATETIME, #PreviousDay DATETIME
SELECT #Today = DATEADD(DAY, 0, DATEDIFF(DAY, 0, CURRENT_TIMESTAMP));
SELECT #PreviousDay = MAX(Date) FROM quotes WHERE Date < #Today;
WITH today AS
(
SELECT Id ,
Symbol ,
Date ,
[OPEN] ,
High ,
LOW ,
[CLOSE]
FROM quotes
WHERE date = #today
)
SELECT *
FROM today
LEFT JOIN quotes AS previousday
ON today.Symbol = previousday.Symbol
AND previousday.Date = #PreviousDay

Incorrect sum when I join a second table

This is the first time I ask for your help,
Actually I have to create a query, and did a similar example for it. I have two tables,
Report (ReportID, Date, headCount)
Production(ProdID, ReportID, Quantity)
My question is using this query, I get a wrong result,
SELECT
Report.date,
SUM(Report.HeadCount) AS SumHeadCount,
SUM(Production.Quantity) AS SumQuantity
FROM
Report
INNER JOIN
Production ON Report.ReportID = Production.ReportID
GROUP BY
Date
ORDER BY
Date
I guess some rows are being counted more than once, could you please give me a hand?
EDIT
if i run a query to get a sum of headcount grouped by day, I get:
date Headcount
7/2/2012 1843
7/3/2012 1802
7/4/2012 1858
7/5/2012 1904
also for Production Qty I get:
2012-07-02 8362
2012-07-03 8042
2012-07-04 8272
2012-07-05 9227
but when i combine the both queries i get i false one, i expect on 2 july 8362 qty against 1843, but i get:
day TotalHeadcount totalQty
7/2/2012 6021 8362
7/3/2012 7193 8042
7/4/2012 6988 8272
7/5/2012 7197 9227
This may be helpful
SELECT Report.ReportDate,
Sum(Report.HeadCount) AS SumHeadCount,
ProductionSummary.SumQuantity
FROM Report
INNER JOIN (SELECT ReportID,
Sum(Production.Quantity) AS SumQuantity
FROM Production
GROUP BY ReportID) AS ProductionSummary
ON Report.ReportID = ProductionSummary.ReportID
GROUP BY ReportDate
ORDER BY ReportDate
One way of avoiding this (subject to RDBMS support) would be
WITH R
AS (SELECT *,
Sum(HeadCount) OVER (PARTITION BY date) AS SumHeadCount
FROM Report)
SELECT R.date,
SumHeadCount,
Sum(P.Quantity) AS SumQuantity
FROM R
JOIN Production P
ON R.ReportID = P.ReportID
GROUP BY R.date, SumHeadCount
ORDER BY R.date
Group records per date using following
SELECT ReportSummary.ReportDate, SUM(ReportSummary.SumHeadCount) AS SumHeadCount, SUM(ProductionSummary.SumQuantity) AS SumQuantity
FROM
(
SELECT Report.ReportDate, SUM(Report.HeadCount) AS SumHeadCount
FROM Report
GROUP BY Report.ReportDate
) AS ReportSummary
INNER JOIN
(
SELECT Report.ReportDate, Sum(Production.Quantity) AS SumQuantity
FROM Production
INNER JOIN Report
ON Report.ReportID = Production.ReportID
GROUP BY Report.ReportDate
) AS ProductionSummary
ON ReportSummary.ReportDate = ProductionSummary.ReportDate
GROUP BY ReportSummary.ReportDate
ORDER BY ReportSummary.ReportDate
We have same problem but I solved it. Try this.
SELECT tbl_report.SumHeadCount, tbl_report.date, tbl_production.SumQuantity
FROM
( select date,
SUM(HeadCount) AS SumHeadCount FROM Report GROUP by date)as tbl_report
JOIN
( select SUM(Quantity) AS SumQuantity, date FROM Production GROUP by date)as tbl_production
WHERE tbl_report.date = tbl_production.date

Cannot group SQL results correctly

Hi i have a query which i need to show the number of transactions a user made,per day with the EUR equivalent of each transaction.
The query below does do that (find the eur equivalent by getting an average rate) but because the currencies are different i get the results by currency instead and not by total. what the query returns is:
Numb Transactions,Date, userid,transaction_type,total value (per currency),eur_equiv
1 12/12, 2, test 5 10
2 12/12,2, test 2 2
whereas i want it to return
Numb Transactions,Date, userid,transaction_type,total value (per currency),eur_equiv
1 12/12, 2, test 7 12
the query is shown below
SELECT COUNT(DISTINCT(ot.ID)) AS 'TRANSACTION COUNTER'
,CONVERT(VARCHAR(10) ,ot.CREATED_ON ,103) AS [DD/MM/YYYY]
,lad.ci
,ot.TRA_TYPE
,c.C_CODE
,CASE
WHEN op.CURRENCY_ID='CURRENCY-002' THEN SUM(CAST(op.IT_AMOUNT AS MONEY))
/(
SELECT AVG(CAST(cr.B_RATE AS MONEY)) AS AVG_RATE
FROM C_RATE cr
WHERE cr.CURRENCY_ID = 'CURRENCY-002'
)
WHEN op.CURRENCY_ID='-CURRENCY-005' THEN SUM(CAST(op.IT_AMOUNT AS MONEY))
/(
SELECT AVG(CAST(cr.B_RATE AS MONEY)) AS AVG_RATE
FROM C_RATE cr
WHERE cr.CURRENCY_ID = 'CURRENCY-005'
)
WHEN op.CURRENCY_ID='CURRENCY-006' THEN SUM(CAST(op.IT_AMOUNT AS MONEY))
/(
SELECT AVG(CAST(cr.B_RATE AS MONEY)) AS AVG_RATE
FROM C_RATE cr
WHERE cr.CURRENCY_ID = 'CURRENCY-006'
)
ELSE '0'
END AS EUR_EQUIVAL
FROM TRANSACTION ot
INNER JOIN PAYMENT op
ON op.ID = ot.ID
INNER JOIN CURRENCY c
ON op.CURRENCY_ID = c.ID
INNER JOIN ACCOUNT a
ON a.ID = ot.ACCOUNT_ID
INNER JOIN ACCOUNT_DETAIL lad
ON lad.A_NUMBER = a.A_NUMBER
INNER JOIN CUST cus
ON lad.CI = cus.CI
WHERE ot.TRA_TYPE_ID IN ('INBANK-TYPE'
,'IN-AC-TYPE'
,'DOM-TRANS-TYPE')
AND ot.STATUS_ID = 'COMPLETED'
AND cus.BRANCH IN ('123'
,'456'
,'789'
,'789')
GROUP BY
lad.CI
,CONVERT(VARCHAR(10) ,ot.CREATED_ON ,103)
,c.C_CODE
,op.CURRENCY_ID
,ot.TRAN_TYPE_ID
HAVING SUM(CAST(op.IT_AMOUNT AS MONEY))>'250000.00'
ORDER BY
CONVERT(VARCHAR(10) ,ot.CREATED_ON ,103) ASC
SELECT MIN([Numb Transactions]
, Date
, UserID
, Transaction_type
, SUM([Total Value]
, SUM([Eur Equiv]
FROM (
... -- Your current select (without order by)
) q
GROUP BY
Date
, UserId
, Transaction_type
The problem resides most likely in a double row in your joins. What I do, is Select * first, and see what columns generate double rows. You might need to adjust a JOIN relationship for the double rows to disappear.
Without any resultsets, it's very hard to reproduce the error you are getting.
Check for the following things:
Select * returns only double rows on the data i will merge with an aggregate function. If the answer here is "NO", you will need to alter a JOIN relationship with a Subselect. I am thinking of that Account and Account detail table.
certain joins can create duplicate rows if the join cannot be unique enough. Maybe you will have to join on multiple things here, for example JOIN table1 ON table1.ID = table2.EXT_ID and table1.Contact = table2.Contact
Couple of things:
1) Consider using a function for:
SELECT AVG(CAST(cr.B_RATE AS MONEY)) AS AVG_RATE
FROM C_RATE cr
WHERE cr.CURRENCY_ID = 'CURRENCY-002' with the currencyID as a parameter
2) Would grouping sets work here?
3) was the sum on the case or on the individual whens?
sum(CASE ..) vs sum(cast(op.IT_Amount as money)