Calculating Over Top N Values Per Group - sql

I have an Access table with time series data like this:
loc | date | value
A 2/11/07 50
A 2/12/07 45
A 2/13/07 23
B 2/11/07 34
B 2/12/07 46
B 2/13/07 56
C ....... ...
...
D..........
.....
I want to get the z-score, (value - avg(values)) / stDev(values), of each group over different time periods, so the 20-day z value would consider values over the last 20 days, the 60-day z value the last 60 days, etc. I also want to select the z values on the latest day, so the result would look like this:
loc | date | value | 20Day zValue | 60Day ZValue | 120 day Zvalue
A 2/13/07 23 .04 .09 .6
B 2/13/07 56 .87 .54 .96
C .....................

Try this:
SELECT
a.*,
b.20Day_zValue,
c.60Day_zValue,
d.120Day_zValue
FROM
(
SELECT aa.loc, aa.date, aa.value
FROM tbl aa
INNER JOIN
(
SELECT loc, MAX(date) AS maxdate
FROM tbl
GROUP BY loc
) bb ON aa.loc = bb.loc AND aa.date = bb.maxdate
) a
INNER JOIN
(
SELECT loc, AVG(value)/StDev(value) AS 20Day_zValue
FROM tbl
WHERE date >= DateAdd('d', -20, Date())
GROUP BY loc
) b ON a.loc = b.loc
INNER JOIN
(
SELECT loc, AVG(value)/StDev(value) AS 60Day_zValue
FROM tbl
WHERE date >= DateAdd('d', -60, Date())
GROUP BY loc
) c ON a.loc = c.loc
INNER JOIN
(
SELECT loc, AVG(value)/StDev(value) AS 120Day_zValue
FROM tbl
WHERE date >= DateAdd('d', -120, Date())
GROUP BY loc
) d ON a.loc = d.loc

I came up with a solution; Zane Bien's answer was a good start. The main problem was MS Access's requirement that multiple joins be nested in parentheses. The basic bracketing structure is like this:
Select a,b,c
FROM
(((Table1 Inner Join Table2 ON a = b)
Inner Join Table3 ON a = c)
Inner Join Table4 ON a = d)
And the solution to my problem is:
SELECT A.loc AS Location, A.value AS Value,
(A.value - B.OneMonthAvg) / B.OneMonthStdev AS OneMonthZscore,
(A.value - C.ThreeMonthAvg) / C.ThreeMonthStdev AS ThreeMonthZscore,
(A.value - D.SixMonthAvg) / D.SixMonthStdev AS SixMonthZscore,
(A.value - E.OneYearAvg) / E.OneYearStdev AS OneYearZscore,
(A.value - F.TwoYearAvg) / F.TwoYearStdev AS TwoYearZscore,
(A.value - G.ThreeYearAvg) / G.ThreeYearStdev AS ThreeYearZscore
FROM
((((((tbl AS A
INNER JOIN
(SELECT loc, AVG(value) AS OneMonthAvg, STDEV(value) AS OneMonthStdev
FROM tbl
WHERE date >= DateAdd('m', -1, Date())
GROUP BY loc)
AS B ON A.loc = B.loc)
INNER JOIN
(SELECT loc, AVG(value) AS ThreeMonthAvg, STDEV(value) AS ThreeMonthStdev
FROM tbl
WHERE date >= DateAdd('m', -3, Date())
GROUP BY loc)
AS C ON A.loc = C.loc)
INNER JOIN
(SELECT loc, AVG(value) AS SixMonthAvg, STDEV(value) AS SixMonthStdev
FROM tbl
WHERE date >= DateAdd('m', -6, Date())
GROUP BY loc)
AS D ON A.loc = D.loc)
INNER JOIN
(SELECT loc, AVG(value) AS OneYearAvg, STDEV(value) AS OneYearStdev
FROM tbl
WHERE date >= DateAdd('yyyy', -1, Date())
GROUP BY loc)
AS E ON A.loc = E.loc)
INNER JOIN
(SELECT loc, AVG(value) AS TwoYearAvg, STDEV(value) AS TwoYearStdev
FROM tbl
WHERE date >= DateAdd('yyyy', -2, Date())
GROUP BY loc)
AS F ON A.loc = F.loc)
INNER JOIN
(SELECT loc, AVG(value) AS ThreeYearAvg, STDEV(value) AS ThreeYearStdev
FROM tbl
WHERE date >= DateAdd('yyyy', -3, Date())
GROUP BY loc)
AS G ON A.loc = G.loc)
Where A.date = Date()
I changed the date ranges for which I wanted the z-scores.
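The z-score arithmetic in the query above can be checked outside Access. The following is only a minimal sketch in Python, not the original poster's method: the function name window_z_score and its parameters are hypothetical, and it assumes that the sample standard deviation (n - 1 denominator) is what Access's StDev computes.

```python
from datetime import date, timedelta
from statistics import mean, stdev


def window_z_score(rows, as_of, days):
    """Z-score of the most recent value among rows within `days` days of `as_of`.

    rows is a list of (date, value) pairs for a single loc.
    """
    cutoff = as_of - timedelta(days=days)
    window = [(d, v) for d, v in rows if cutoff <= d <= as_of]
    values = [v for _, v in window]
    latest_value = max(window, key=lambda row: row[0])[1]
    # statistics.stdev is the sample standard deviation (assumed to match StDev).
    return (latest_value - mean(values)) / stdev(values)


# The three rows shown for loc A in the question.
loc_a = [(date(2007, 2, 11), 50), (date(2007, 2, 12), 45), (date(2007, 2, 13), 23)]
z_20 = window_z_score(loc_a, as_of=date(2007, 2, 13), days=20)
```

A window with fewer than two rows has no sample standard deviation, so stdev raises an error there; the Access query would presumably return Null for the same case.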

Related

Return a single set of results from two SQL tables

I have two SQL queries that both return the same columns.
The returned column names are month_name, month_number and total.
How can I return a single set of results from the two tables, summing the "total" field?
Query 1
SELECT DISTINCT
DATENAME(MONTH,DATEADD(MONTH,month([date]),-1 )) as month_name,
MONTH([date]) as month_number,
SUM(pur.total_expenses - pur.vat) as 'total'
FROM (select distinct ID, MONTH([date]) as month_number, DATENAME(MONTH,DATEADD(MONTH,month([date]),-1 )) as month_name from [dbo].[purchase_invoices] WHERE YEAR([date]) = '2020' ) m
LEFT JOIN [dbo].[purchase_invoices] pur ON pur.id = m.id
LEFT JOIN [dbo].[dividends] div on month(dividend_date) = month([date]) AND month(dividend_date) = '2020'
WHERE YEAR([date]) = '2020'
GROUP BY m.month_number, MONTH([date]), m.month_name
ORDER BY month_number
Query 2
SELECT DISTINCT
DATENAME(MONTH,DATEADD(MONTH,month([dividend_date]),-1 )) as month_name,
MONTH([dividend_date]) as month_number,
SUM(div.dividend_value) as 'total'
FROM (select distinct ID, MONTH([dividend_date]) as month_number, DATENAME(MONTH,DATEADD(MONTH,month([dividend_date]),-1 )) as month_name from [dbo].[dividends] WHERE YEAR([dividend_date]) = '2020' ) m
LEFT JOIN [dbo].[dividends] div ON div.id = m.id
WHERE YEAR([dividend_date]) = '2020'
GROUP BY m.month_number, MONTH([dividend_date]), m.month_name
ORDER BY month_number
I know I need a JOIN on the month_name or number field, but I am not sure on how to achieve that.
Any help greatly appreciated.
UPDATE (Expected Output)
|---------------------|------------------|------------------|
| month_name | month_number | total |
|---------------------|------------------|------------------|
| Jan | 1 | 4500 |
|---------------------|------------------|------------------|
| Feb | 2 | 6000 |
|---------------------|------------------|------------------|
| ... | ... | ... |
Try using CTEs:
WITH invoice AS (
SELECT DISTINCT
ID,
MONTH([date]) AS month_number,
DATENAME(MONTH, DATEADD(MONTH, MONTH([date]), -1)) AS month_name
FROM [dbo].[purchase_invoices]
WHERE
YEAR([date]) = '2020'
),
q1 AS (
SELECT DISTINCT
DATENAME(MONTH, DATEADD(MONTH, MONTH([date]), -1)) AS month_name,
MONTH([date]) AS month_number,
SUM(pur.total_expenses - pur.vat) AS 'total'
FROM invoice as m
LEFT JOIN [dbo].[purchase_invoices] pur
ON pur.id = m.id
LEFT JOIN [dbo].[dividends] DIV
ON MONTH(dividend_date) = MONTH([date])
AND MONTH(dividend_date) = '2020'
WHERE
YEAR([date]) = '2020'
GROUP BY
m.month_number,
MONTH([date]),
m.month_name
-- no ORDER BY here: SQL Server does not allow it inside a CTE without TOP or OFFSET
),
dividend AS (
SELECT DISTINCT
ID,
MONTH([dividend_date]) AS month_number,
DATENAME(MONTH, DATEADD(MONTH, MONTH([dividend_date]), -1)) AS month_name
FROM [dbo].[dividends]
WHERE
YEAR([dividend_date]) = '2020'
),
q2 AS (
SELECT DISTINCT
DATENAME(MONTH, DATEADD(MONTH, MONTH([dividend_date]), -1)) AS month_name,
MONTH([dividend_date]) AS month_number,
SUM(DIV.dividend_value) AS 'total'
FROM dividend as m
LEFT JOIN [dbo].[dividends] DIV
ON DIV.id = m.id
WHERE
YEAR([dividend_date]) = '2020'
GROUP BY
m.month_number,
MONTH([dividend_date]),
m.month_name
-- no ORDER BY here: SQL Server does not allow it inside a CTE without TOP or OFFSET
)
SELECT
q1.month_name,
q1.month_number,
q1.total + COALESCE(q2.total, 0) AS total
FROM q1
LEFT JOIN q2
ON q1.month_name = q2.month_name
AND q1.month_number = q2.month_number
ORDER BY q1.month_number
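You can use union all (drop the ORDER BY from each query first, since SQL Server does not allow it in a derived table without TOP or OFFSET):
select month_name, month_number, sum(total) as total
from ((<query1 here>) union all
(<query2 here>)
) q
group by month_name, month_number
order by month_number;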
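To illustrate why UNION ALL followed by GROUP BY keeps months that appear in only one of the two queries, here is a small sketch using Python's built-in sqlite3 module as a stand-in for SQL Server. The tables q1 and q2 and their rows are made up for illustration; they play the role of the two query results.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical stand-ins for the results of Query 1 and Query 2.
    CREATE TABLE q1 (month_name TEXT, month_number INTEGER, total INTEGER);
    CREATE TABLE q2 (month_name TEXT, month_number INTEGER, total INTEGER);
    INSERT INTO q1 VALUES ('January', 1, 4000), ('February', 2, 6000);
    INSERT INTO q2 VALUES ('January', 1, 500), ('March', 3, 250);
""")

# Stack both result sets, then sum per month.
combined = conn.execute("""
    SELECT month_name, month_number, SUM(total) AS total
    FROM (SELECT month_name, month_number, total FROM q1
          UNION ALL
          SELECT month_name, month_number, total FROM q2) AS q
    GROUP BY month_name, month_number
    ORDER BY month_number
""").fetchall()
```

With this data, February (only in q1) and March (only in q2) both survive, whereas a LEFT JOIN from q1 to q2 would drop March.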

SQL Join two tables by unrelated date

I’m looking to join two tables that do not have a common key, but do share a common value (date). I want a table that lists each date and the total number of employees hired and terminated on that day. An example is below:
Table 1
Hire Date Employee Number Employee Name
--------------------------------------------
5/5/2018 10078 Joe
5/5/2018 10077 Adam
5/5/2018 10078 Steve
5/8/2018 10079 Jane
5/8/2018 10080 Mary
Table 2
Termination Date Employee Number Employee Name
----------------------------------------------------
5/5/2018 10010 Tony
5/6/2018 10025 Jonathan
5/6/2018 10035 Mark
5/8/2018 10052 Chris
5/9/2018 10037 Sam
Desired result:
Date Total Hired Total Terminated
--------------------------------------
5/5/2018 3 1
5/6/2018 0 2
5/7/2018 0 0
5/8/2018 2 1
5/9/2018 0 1
Getting the total count is easy; I'm just unsure of the best approach for "adding" a date column.
If you need all dates within some window then you need to join the data to a calendar. You can then left join and sum flags for data points.
DECLARE @StartDate DATETIME = (SELECT MIN(ActionDate) FROM (SELECT ActionDate = MIN(HireDate) FROM Table1 UNION SELECT ActionDate = MIN(TerminationDate) FROM Table2) AS X)
DECLARE @EndDate DATETIME = (SELECT MAX(ActionDate) FROM (SELECT ActionDate = MAX(HireDate) FROM Table1 UNION SELECT ActionDate = MAX(TerminationDate) FROM Table2) AS X)
;WITH AllDates AS
(
SELECT CalendarDate = @StartDate
UNION ALL
SELECT DATEADD(DAY, 1, CalendarDate)
FROM AllDates
WHERE DATEADD(DAY, 1, CalendarDate) <= @EndDate
)
SELECT
D.CalendarDate,
TotalHired = COALESCE(H.TotalHired, 0),
TotalTerminated = COALESCE(T.TotalTerminated, 0)
FROM
AllDates D
-- Count each table per date first, so that joining both tables does not multiply rows
LEFT OUTER JOIN (SELECT HireDate, TotalHired = COUNT(*) FROM Table1 GROUP BY HireDate) H ON H.HireDate = D.CalendarDate
LEFT OUTER JOIN (SELECT TerminationDate, TotalTerminated = COUNT(*) FROM Table2 GROUP BY TerminationDate) T ON T.TerminationDate = D.CalendarDate
/* If you only want dates with data points then uncomment the where clause
WHERE
NOT (H.HireDate IS NULL AND T.TerminationDate IS NULL)
*/
I would do this with a union all and aggregations:
select dte, sum(is_hired) as num_hired, sum(is_termed) as num_termed
from (select hiredate as dte, 1 as is_hired, 0 as is_termed from table1
union all
select terminationdate, 0 as is_hired, 1 as is_termed from table2
) ht
group by dte
order by dte;
This does not include the "missing" dates. If you want those, a calendar or recursive CTE works. For instance:
with ht as (
select dte, sum(is_hired) as num_hired, sum(is_termed) as num_termed
from (select hiredate as dte, 1 as is_hired, 0 as is_termed from table1
union all
select terminationdate, 0 as is_hired, 1 as is_termed from table2
) ht
group by dte
),
d as (
select min(dte) as dte, max(dte) as max_dte
from ht
union all
select dateadd(day, 1, dte), max_dte
from d
where dte < max_dte
)
select d.dte, coalesce(ht.num_hired, 0) as num_hired, coalesce(ht.num_termed, 0) as num_termed
from d left join
ht
on d.dte = ht.dte
order by d.dte;
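The combination of UNION ALL, aggregation, and a recursive calendar can be tried end to end with Python's built-in sqlite3 module. This is a sketch under assumptions rather than the T-SQL above: SQLite needs WITH RECURSIVE and uses date(x, '+1 day') where SQL Server uses DATEADD, the dates are stored as ISO text, and the employee_number column name is assumed. The rows are the sample data from the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table1 (hiredate TEXT, employee_number INTEGER);
    CREATE TABLE table2 (terminationdate TEXT, employee_number INTEGER);
    INSERT INTO table1 VALUES ('2018-05-05', 10078), ('2018-05-05', 10077),
                              ('2018-05-05', 10078), ('2018-05-08', 10079),
                              ('2018-05-08', 10080);
    INSERT INTO table2 VALUES ('2018-05-05', 10010), ('2018-05-06', 10025),
                              ('2018-05-06', 10035), ('2018-05-08', 10052),
                              ('2018-05-09', 10037);
""")

daily_counts = conn.execute("""
    WITH RECURSIVE
    ht AS (  -- one row per date that has at least one event
        SELECT dte, SUM(is_hired) AS num_hired, SUM(is_termed) AS num_termed
        FROM (SELECT hiredate AS dte, 1 AS is_hired, 0 AS is_termed FROM table1
              UNION ALL
              SELECT terminationdate, 0, 1 FROM table2) AS events
        GROUP BY dte
    ),
    d(dte, max_dte) AS (  -- every calendar day from first to last event
        SELECT MIN(dte), MAX(dte) FROM ht
        UNION ALL
        SELECT date(dte, '+1 day'), max_dte FROM d WHERE dte < max_dte
    )
    SELECT d.dte, COALESCE(ht.num_hired, 0), COALESCE(ht.num_termed, 0)
    FROM d LEFT JOIN ht ON d.dte = ht.dte
    ORDER BY d.dte
""").fetchall()
```

Because the events are aggregated per date before the calendar join, a date with several hires and several terminations is not double counted.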
Try this one
SELECT ISNULL(a.THE_DATE, b.THE_DATE) as Date,
ISNULL(a.Total_Hire,0) as Total_Hire,
ISNULL (b.Total_Terminate,0) as Total_terminate
FROM (SELECT Hire_date as the_date, COUNT(1) as Total_Hire
FROM TABLE_HIRE GROUP BY HIRE_DATE) a
FULL OUTER JOIN (SELECT Termination_Date as the_date, COUNT(1) as Total_Terminate
FROM TABLE_TERMINATE GROUP BY Termination_Date) b
ON a.the_date = b.the_date

Finding missing dates compared to date range

I have one table (A) with date ranges and another (B) with just a set date. There are missing months in B that are within the date range of A. I need to identify the missing months.
A
Person StartDate EndDate
123 1/1/2016 5/1/2016
B
Person EffectiveDate
123 1/1/2016
123 2/1/2016
123 4/1/2016
123 5/1/2016
Expected result would be
123 3/1/2016
I'm using SQL Server 2012. Any assistance would be appreciated. Thanks!
One approach is to generate all values between the two dates. Here is an approach using a numbers table:
with n as (
select row_number() over (order by (select null)) - 1 as n
from master.spt_values
)
select a.person, dateadd(day, n.n, a.startdate) as missingdate
from a join
n
on dateadd(day, n.n, a.startdate) <= a.enddate left join
b
on b.person = a.person and b.effectivedate = dateadd(day, n.n, a.startdate)
where b.person is null;
Try this:
CREATE TABLE #A (Person INT, StartDate DATE, EndDate DATE)
INSERT INTO #A
SELECT '123','1/1/2016', '5/1/2016'
CREATE TABLE #B(Person INT, EffectiveDate DATE)
INSERT INTO #B
SELECT 123 ,'1/1/2016' UNION ALL
SELECT 123 ,'2/1/2016' UNION ALL
SELECT 123 ,'4/1/2016' UNION ALL
SELECT 123 ,'5/1/2016'
;WITH A1
AS(
SELECT PERSON , StartDate, EndDate
FROM #A
UNION ALL
SELECT PERSON ,DATEADD(MM,1,STARTDATE), EndDate
FROM A1
WHERE DATEADD(MM,1,STARTDATE) <= EndDate
)
SELECT PERSON , StartDate
FROM A1
WHERE
NOT EXISTS
(
SELECT 1 FROM #B B1
WHERE B1.Person = A1.PERSON
AND YEAR(B1.EffectiveDate) = YEAR(A1.STARTDATE) AND MONTH(B1.EffectiveDate) = MONTH(A1.STARTDATE)
)
This should work if you are interested in getting missing months
;WITH n
AS (SELECT ROW_NUMBER() OVER(ORDER BY
(
SELECT NULL
)) - 1 AS n
FROM master.dbo.spt_values)
SELECT a.person,
DATEADD(MONTH, n.n, a.startdate) AS missingdate
FROM a a
INNER JOIN n ON DATEADD(MONTH, n.n, a.startdate) <= a.enddate
LEFT JOIN b b ON b.person = a.person AND MONTH(DATEADD(MONTH, n.n, a.startdate)) = MONTH(b.effectivedate) AND YEAR(DATEADD(MONTH, n.n, a.startdate)) = YEAR(b.effectivedate)
WHERE b.person IS NULL;
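The month-generation idea can be checked against the sample data with Python's built-in sqlite3 module. This is only a sketch in the SQLite dialect, not SQL Server 2012 syntax: it uses a recursive CTE instead of master.dbo.spt_values, date(x, '+1 month') instead of DATEADD, and ISO text dates. The CTE name months and the column month_start are made up for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE a (person INTEGER, startdate TEXT, enddate TEXT);
    CREATE TABLE b (person INTEGER, effectivedate TEXT);
    INSERT INTO a VALUES (123, '2016-01-01', '2016-05-01');
    INSERT INTO b VALUES (123, '2016-01-01'), (123, '2016-02-01'),
                         (123, '2016-04-01'), (123, '2016-05-01');
""")

missing_months = conn.execute("""
    WITH RECURSIVE months(person, month_start, enddate) AS (
        -- expand each range in a into one row per month
        SELECT person, startdate, enddate FROM a
        UNION ALL
        SELECT person, date(month_start, '+1 month'), enddate
        FROM months
        WHERE date(month_start, '+1 month') <= enddate
    )
    SELECT m.person, m.month_start
    FROM months AS m
    WHERE NOT EXISTS (  -- keep only months with no row in b for that person
        SELECT 1 FROM b
        WHERE b.person = m.person
          AND strftime('%Y-%m', b.effectivedate) = strftime('%Y-%m', m.month_start)
    )
    ORDER BY m.person, m.month_start
""").fetchall()
```

Matching on person as well as on year and month matters once table a holds more than one person; otherwise one person's rows in b would hide another person's gaps.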

Sql select sub query with count

OK, so I have 3 tables:
[AXprod].[dbo].[RMSPOSINVOICE], [AXPROD].[dbo].[discountcard], and [IntegrationProd].[dbo].[POS_KvitoGalva]. I want to find out when a discount card was used more than once in one inventlocation, and the time at which it was used. The table [IntegrationProd].[dbo].[POS_KvitoGalva] has these times. I use this code to get the number of times each card was used each day:
SELECT a.discountcardid,count(a.discountcardid)
FROM [AXprod].[dbo].[RMSPOSINVOICE] a
inner join [AXPROD].[dbo].[discountcard] b
on a.discountcardid = b.discountcardid
inner join [IntegrationProd].[dbo].[POS_KvitoGalva] c
on a.possalesid = c.id
where a.dataareaid = 'ermi' and len(a.discountcardid) > '0' and b.dataareaid = 'ermi' and ('500' = a.inventlocationid )
and (a.invoicedate >= '2015-04-22 00:00:00.000' and a.invoicedate <= '2015-04-22 00:00:00.000')
group by a.discountcardid,a.inventlocationid,a.posnumber
having count(a.discountcardid) > '1'
And I get the following result:
DISCOUNTCARDID COUNT
123456 2
145962 2
And I have a query to find when each card was used (date and time):
SELECT a.discountcardid,a.inventlocationid,a.posnumber,year,month,day,hour,minute,c.id
FROM [AXprod].[dbo].[RMSPOSINVOICE] a
inner join [AXPROD].[dbo].[discountcard] b
on a.discountcardid = b.discountcardid
inner join [IntegrationProd].[dbo].[POS_KvitoGalva] c
on a.possalesid = c.id
where a.dataareaid = 'ermi' and len(a.discountcardid) > '0' and b.dataareaid = 'ermi' and ('500' = a.inventlocationid )
and (a.invoicedate >= '2015-04-22 00:00:00.000' and a.invoicedate <= '2015-04-22 00:00:00.000')
group by a.discountcardid,a.inventlocationid,a.posnumber,year,month,day,hour,minute,c.id
order by DISCOUNTCARDID
And I get the result:
discountcardid inventlocationid posnumber year month day hour minute id
123456 500 7 2015 4 22 12 44 6355302
123456 500 7 2015 4 22 14 24 6355302
145962 500 7 2015 4 22 13 56 6355302
145962 500 7 2015 4 22 13 24 6355302
145555 500 7 2015 4 22 12 11 5465465
The problem:
I don't want to get discount cards that were only used once, so I tried this:
SELECT a.discountcardid,a.inventlocationid,a.posnumber,year,month,day,hour,minute,c.id,
( sELECT count(s.discountcardid)
FROM [AXprod].[dbo].[RMSPOSINVOICE] s
inner join [AXPROD].[dbo].[discountcard] b
on s.discountcardid = b.discountcardid
inner join [IntegrationProd].[dbo].[POS_KvitoGalva] c
on s.possalesid = c.id
where s.dataareaid = 'ermi' and len(s.discountcardid) > '0' and b.dataareaid = 'ermi' and ('500' = s.inventlocationid )
and (s.invoicedate >= '2015-04-22 00:00:00.000' and s.invoicedate <= '2015-04-22 00:00:00.000') and s.DISCOUNTCARDID = a.DISCOUNTCARDID
group by s.discountcardid,s.inventlocationid,s.posnumber
having count(a.discountcardid) > '1')
FROM [AXprod].[dbo].[RMSPOSINVOICE] a
inner join [AXPROD].[dbo].[discountcard] b
on a.discountcardid = b.discountcardid
inner join [IntegrationProd].[dbo].[POS_KvitoGalva] c
on a.possalesid = c.id
where a.dataareaid = 'ermi' and len(a.discountcardid) > '0' and b.dataareaid = 'ermi' and ('500' = a.inventlocationid )
and (a.invoicedate >= '2015-04-22 00:00:00.000' and a.invoicedate <= '2015-04-22 00:00:00.000')
group by a.discountcardid,a.inventlocationid,a.posnumber,year,month,day,hour,minute,c.id
order by DISCOUNTCARDID
But all I get is the same number of rows, with NULL in the last column of every row. I hope I made myself clear ;).
You should be able to run the query once and use a window function to get the count. I don't believe you can use a window function in the WHERE clause, so I wrapped the query in an outer SELECT in order to filter on the count being greater than 1.
SELECT *
FROM (SELECT
a.discountcardid,
a.inventlocationid,
a.posnumber,
year,
month,
day,
hour,
minute,
c.id,
COUNT(*) OVER (PARTITION BY a.discountcardid, a.inventlocationid, a.posnumber) AS CardCount
FROM AXprod.dbo.RMSPOSINVOICE a
JOIN AXprod.dbo.discountcard b
ON b.discountcardid = a.discountcardid
JOIN IntegrationProd.dbo.POS_KvitoGalva c
ON c.id = a.possalesid
WHERE a.dataareaid = 'ermi'
AND len(a.discountcardid) > '0'
AND b.dataareaid = 'ermi'
AND a.inventlocationid = 500
AND a.invoicedate >= '2015-04-22 00:00:00.000'
AND a.invoicedate <= '2015-04-22 00:00:00.000'
) d
WHERE d.CardCount > 1
ORDER BY d.discountcardid
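The count-then-filter pattern can be tried on a small scale with Python's built-in sqlite3 module. This sketch assumes an SQLite build with window function support (3.25 or later) and collapses the three joined tables into one hypothetical table, card_uses, holding a subset of the columns from the question's result set.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical flattened table standing in for the three-way join.
    CREATE TABLE card_uses (discountcardid TEXT, inventlocationid TEXT,
                            posnumber INTEGER, hour INTEGER, minute INTEGER);
    INSERT INTO card_uses VALUES
        ('123456', '500', 7, 12, 44), ('123456', '500', 7, 14, 24),
        ('145962', '500', 7, 13, 56), ('145962', '500', 7, 13, 24),
        ('145555', '500', 7, 12, 11);
""")

# The window count is computed in the inner query and filtered in the outer one.
repeat_uses = conn.execute("""
    SELECT discountcardid, hour, minute, card_count
    FROM (SELECT discountcardid, hour, minute,
                 COUNT(*) OVER (PARTITION BY discountcardid, inventlocationid,
                                posnumber) AS card_count
          FROM card_uses) AS d
    WHERE card_count > 1
    ORDER BY discountcardid, hour, minute
""").fetchall()
```

Card 145555 was used once, so it is filtered out, while every individual use of the other two cards is kept along with its time.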

Copy prior month value and insert into new row

Here is an example of the current table I have:
1) Table name: TotalSales
Name Year Month Sales
------ ---- ----- -----
Alfred 2011 1 100
What I want to do is create a table like this, adding a new column (Prior month sales):
2) Table name: TotalSales
Name Year Month Sales Prior month sales
------ ---- ----- ----- -----------------
Alfred 2011 2 110 100
Not sure how to do this, but this is what I have been working on:
SELECT Name, Year, Month, Sales, Sales as [Prior Month sales]
FROM TotalSales
WHERE
DATEPART(month, [Prior Month sales]) = DATEPART(month, DATEADD(month, -1, getdate()))
Thanks for any help
I believe this should work. You need to join the table to itself on name and prior month, but there are 2 cases for the prior month since year and month are stored separately.
select c.Name, c.Year, c.Month, c.Sales, p.Sales
from TotalSales c
left join TotalSales p
on c.Name = p.Name and (
(c.Month > 1 and c.Year = p.Year and c.Month = p.Month + 1)
or (c.Month = 1 and c.Year = p.Year + 1 and p.Month = 12))
To select the given data you need to join the table to itself:
SELECT
TS.name,
TS.year,
TS.month,
TS.sales,
COALESCE(TS2.sales, 0) AS prior_month_sales
FROM
TotalSales TS
LEFT OUTER JOIN TotalSales TS2 ON
TS2.name = TS.name AND
(
(TS2.year = TS.year AND TS2.month = TS.month - 1) OR
(TS.month = 1 AND TS2.month = 12 AND TS2.year = TS.year - 1)
)
The join is a LEFT OUTER JOIN so that rows are still returned when they didn't have any sales the previous month (or this is their first month with the company).
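The self-join, including the January-to-December rollover, can be verified with Python's built-in sqlite3 module. This is a minimal sketch: the first two rows come from the question, while the December 2011 and January 2012 rows are made up to exercise the year boundary.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE TotalSales (name TEXT, year INTEGER, month INTEGER, sales INTEGER);
    INSERT INTO TotalSales VALUES
        ('Alfred', 2011, 1, 100), ('Alfred', 2011, 2, 110),
        ('Alfred', 2011, 12, 90), ('Alfred', 2012, 1, 95);
""")

with_prior = conn.execute("""
    SELECT ts.name, ts.year, ts.month, ts.sales,
           COALESCE(prev.sales, 0) AS prior_month_sales
    FROM TotalSales AS ts
    LEFT JOIN TotalSales AS prev
      ON prev.name = ts.name
     AND ((prev.year = ts.year AND prev.month = ts.month - 1)
          OR (ts.month = 1 AND prev.month = 12 AND prev.year = ts.year - 1))
    ORDER BY ts.year, ts.month
""").fetchall()
```

Rows with no prior month on record (January 2011 and December 2011 here) fall back to 0 through COALESCE; dropping the COALESCE would leave them as NULL instead.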
Try something like this to just update the table with the values you want...
UPDATE TotalSales
SET PriorMonthSales =
(
SELECT TS.Sales
FROM TotalSales TS
WHERE
TS.Name = TotalSales.Name
AND (
(TotalSales.Month = TS.Month + 1 AND TotalSales.Year = TS.Year)
OR
(TotalSales.Month = 1 AND TS.Month = 12 AND TS.Year = TotalSales.Year - 1)
)
)