Year Over Year (YOY) Distinct Count

Year Over Year (YOY) Distinct Count - sql

EDITED:
I'm working in Sql Server 2005 and I'm trying to get a year over year (YOY) count of distinct users for the current fiscal year (say Jun 1-May 30) and the past 3 years. I'm able to do what I need by running a select statement four times, but I can't seem to find a better way at this point. I'm able to get a distinct count for each year in one query, but I need it to a cumulative distinct count. Below is a mockup of what I have so far:
SELECT [Year], COUNT(DISTINCT UserID)
FROM
(
SELECT u.uID AS UserID,
CASE
WHEN dd.ddEnd BETWEEN #yearOneStart AND #yearOneEnd THEN 'Year1'
WHEN dd.ddEnd BETWEEN #yearTwoStart AND #yearTwoEnd THEN 'Year2'
WHEN dd.ddEnd BETWEEN #yearThreeStart AND #yearThreeEnd THEN 'Year3'
WHEN dd.ddEnd BETWEEN #yearFourStart AND #yearFourEnd THEN 'Year4'
ELSE 'Other'
END AS [Year]
FROM Users AS u
INNER JOIN UserDataIDMatch AS udim
ON u.uID = udim.udim_FK_uID
INNER JOIN DataDump AS dd
ON udim.udimUserSystemID = dd.ddSystemID
) AS Data
WHERE LOWER([Year]) 'other'
GROUP BY
[Year]
I get something like:
Year1 1
Year2 1
Year3 1
Year4 1
But I really need:
Year1 1
Year2 2
Year3 3
Year4 4
Below is a rough schema and set of values (updated for simplicity). I tried to create a SQL Fiddle, but I'm getting a disk space error when I attempt to build the schema.
CREATE TABLE Users
(
uID int identity primary key,
uFirstName varchar(75),
uLastName varchar(75)
);
INSERT INTO Users (uFirstName, uLastName)
VALUES
('User1', 'User1'),
('User2', 'User2')
('User3', 'User3')
('User4', 'User4');
CREATE TABLE UserDataIDMatch
(
udimID int indentity primary key,
udim.udim_FK_uID int foreign key references Users(uID),
udimUserSystemID varchar(75)
);
INSERT INTO UserDataIDMatch (udim_FK_uID, udimUserSystemID)
VALUES
(1, 'SystemID1'),
(2, 'SystemID2'),
(3, 'SystemID3'),
(4, 'SystemID4');
CREATE TABLE DataDump
(
ddID int identity primary key,
ddSystemID varchar(75),
ddEnd datetime
);
INSERT INTO DataDump (ddSystemID, ddEnd)
VALUES
('SystemID1', '10-01-2013'),
('SystemID2', '10-01-2014'),
('SystemID3', '10-01-2015'),
('SystemID4', '10-01-2016');

Unless I'm missing something, you just want to know how many records there are where the date is less than or equal to the current fiscal year.
DECLARE #YearOneStart DATETIME, #YearOneEnd DATETIME,
#YearTwoStart DATETIME, #YearTwoEnd DATETIME,
#YearThreeStart DATETIME, #YearThreeEnd DATETIME,
#YearFourStart DATETIME, #YearFourEnd DATETIME
SELECT #YearOneStart = '06/01/2013', #YearOneEnd = '05/31/2014',
#YearTwoStart = '06/01/2014', #YearTwoEnd = '05/31/2015',
#YearThreeStart = '06/01/2015', #YearThreeEnd = '05/31/2016',
#YearFourStart = '06/01/2016', #YearFourEnd = '05/31/2017'
;WITH cte AS
(
SELECT u.uID AS UserID,
CASE
WHEN dd.ddEnd BETWEEN #yearOneStart AND #yearOneEnd THEN 'Year1'
WHEN dd.ddEnd BETWEEN #yearTwoStart AND #yearTwoEnd THEN 'Year2'
WHEN dd.ddEnd BETWEEN #yearThreeStart AND #yearThreeEnd THEN 'Year3'
WHEN dd.ddEnd BETWEEN #yearFourStart AND #yearFourEnd THEN 'Year4'
ELSE 'Other'
END AS [Year]
FROM Users AS u
INNER JOIN UserDataIDMatch AS udim
ON u.uID = udim.udim_FK_uID
INNER JOIN DataDump AS dd
ON udim.udimUserSystemID = dd.ddSystemID
)
SELECT
DISTINCT [Year],
(SELECT COUNT(*) FROM cte cteInner WHERE cteInner.[Year] <= cteMain.[Year] )
FROM cte cteMain

Concept using an existing query
I have done something similar for finding out the number of distinct customers who bought something in between years, I modified it to use your concept of year, the variables you add would be that start day and start month of the year and the start year and end year.
Technically there is a way to avoid using a loop but this is very clear and you can't go past year 9999 so don't feel like putting clever code to avoid a loop makes sense
Tips for speeding up the query
Also when matching dates make sure you are comparing dates, and not comparing a function evaluation of the column as that would mean running the function on every record set and would make indices useless if they existed on dates (which they should). Use date add on
zero to initiate your target dates subtracting 1900 from the year, one from the month and one from the target date.
Then self join on the table where the dates create a valid range (i.e. yearlessthan to yearmorethan) and use a subquery to create a sum based on that range. Since you want accumulative from the first year to the last limit the results to starting at the first year.
At the end you will be missing the first year as by our definition it does not qualify as a range, to fix this just do a union all on the temp table you created to add the missing year and the number of distinct values in it.
DECLARE #yearStartMonth INT = 6, #yearStartDay INT = 1
DECLARE #yearStart INT = 2008, #yearEnd INT = 2012
DECLARE #firstYearStart DATE =
DATEADD(day,#yearStartDay-1,
DATEADD(month, #yearStartMonth-1,
DATEADD(year, #yearStart- 1900,0)))
DECLARE #lastYearEnd DATE =
DATEADD(day, #yearStartDay-2,
DATEADD(month, #yearStartMonth-1,
DATEADD(year, #yearEnd -1900,0)))
DECLARE #firstdayofcurrentyear DATE = #firstYearStart
DECLARE #lastdayofcurrentyear DATE = DATEADD(day,-1,DATEADD(year,1,#firstdayofcurrentyear))
DECLARE #yearnumber INT = YEAR(#firstdayofcurrentyear)
DECLARE #tempTableYearBounds TABLE
(
startDate DATE NOT NULL,
endDate DATE NOT NULL,
YearNumber INT NOT NULL
)
WHILE #firstdayofcurrentyear < #lastYearEnd
BEGIN
INSERT INTO #tempTableYearBounds
VALUES(#firstdayofcurrentyear,#lastdayofcurrentyear,#yearNumber)
SET #firstdayofcurrentyear = DATEADD(year,1,#firstdayofcurrentyear)
SET #lastdayofcurrentyear = DATEADD(year,1,#lastdayofcurrentyear)
SET #yearNumber = #yearNumber + 1
END
DECLARE #tempTableCustomerCount TABLE
(
[Year] INT NOT NULL,
[CustomerCount] INT NOT NULL
)
INSERT INTO #tempTableCustomerCount
SELECT
YearNumber as [Year],
COUNT(DISTINCT CustomerNumber) as CutomerCount
FROM Ticket
JOIN #tempTableYearBounds ON
TicketDate >= startDate AND TicketDate <=endDate
GROUP BY YearNumber
SELECT * FROM(
SELECT t2.Year as [Year],
(SELECT
SUM(CustomerCount)
FROM #tempTableCustomerCount
WHERE Year>=t1.Year
AND Year <=t2.Year) AS CustomerCount
FROM #tempTableCustomerCount t1 JOIN #tempTableCustomerCount t2
ON t1.Year < t2.Year
WHERE t1.Year = #yearStart
UNION
SELECT [Year], [CustomerCount]
FROM #tempTableCustomerCount
WHERE [YEAR] = #yearStart
) tt
ORDER BY tt.Year
It isn't efficient but at the end the temp table you are dealing with is so small I don't think it really matters, and adds a lot more versatility versus the method you are using.
Update: I updated the query to reflect the result you wanted with my data set, I was basically testing to see if this was faster, it was faster by 10 seconds but the dataset I am dealing with is relatively small. (from 12 seconds to 2 seconds).
Using your data
I changed the tables you gave to temp tables so it didn't effect my environment and I removed the foreign key because they are not supported for temp tables, the logic is the same as the example included but just changed for your dataset.
DECLARE #startYear INT = 2013, #endYear INT = 2016
DECLARE #yearStartMonth INT = 10 , #yearStartDay INT = 1
DECLARE #startDate DATETIME = DATEADD(day,#yearStartDay-1,
DATEADD(month, #yearStartMonth-1,
DATEADD(year,#startYear-1900,0)))
DECLARE #endDate DATETIME = DATEADD(day,#yearStartDay-1,
DATEADD(month,#yearStartMonth-1,
DATEADD(year,#endYear-1899,0)))
DECLARE #tempDateRangeTable TABLE
(
[Year] INT NOT NULL,
StartDate DATETIME NOT NULL,
EndDate DATETIME NOT NULL
)
DECLARE #currentDate DATETIME = #startDate
WHILE #currentDate < #endDate
BEGIN
DECLARE #nextDate DATETIME = DATEADD(YEAR, 1, #currentDate)
INSERT INTO #tempDateRangeTable(Year,StartDate,EndDate)
VALUES(YEAR(#currentDate),#currentDate,#nextDate)
SET #currentDate = #nextDate
END
CREATE TABLE Users
(
uID int identity primary key,
uFirstName varchar(75),
uLastName varchar(75)
);
INSERT INTO Users (uFirstName, uLastName)
VALUES
('User1', 'User1'),
('User2', 'User2'),
('User3', 'User3'),
('User4', 'User4');
CREATE TABLE UserDataIDMatch
(
udimID int indentity primary key,
udim.udim_FK_uID int foreign key references Users(uID),
udimUserSystemID varchar(75)
);
INSERT INTO UserDataIDMatch (udim_FK_uID, udimUserSystemID)
VALUES
(1, 'SystemID1'),
(2, 'SystemID2'),
(3, 'SystemID3'),
(4, 'SystemID4');
CREATE TABLE DataDump
(
ddID int identity primary key,
ddSystemID varchar(75),
ddEnd datetime
);
INSERT INTO DataDump (ddSystemID, ddEnd)
VALUES
('SystemID1', '10-01-2013'),
('SystemID2', '10-01-2014'),
('SystemID3', '10-01-2015'),
('SystemID4', '10-01-2016');
DECLARE #tempIndividCount TABLE
(
[Year] INT NOT NULL,
UserCount INT NOT NULL
)
-- no longer need to filter out other because you are using an
--inclusion statement rather than an exclusion one, this will
--also make your query faster (when using real tables not temp ones)
INSERT INTO #tempIndividCount(Year,UserCount)
SELECT tdr.Year, COUNT(DISTINCT UId) FROM
Users u JOIN UserDataIDMatch um
ON um.udim_FK_uID = u.uID
JOIN DataDump dd ON
um.udimUserSystemID = dd.ddSystemID
JOIN #tempDateRangeTable tdr ON
dd.ddEnd >= tdr.StartDate AND dd.ddEnd < tdr.EndDate
GROUP BY tdr.Year
-- will show you your result
SELECT * FROM #tempIndividCount
--add any ranges that did not have an entry but were in your range
--can easily remove this by taking this part out.
INSERT INTO #tempIndividCount
SELECT t1.Year,0 FROM
#tempDateRangeTable t1 LEFT OUTER JOIN #tempIndividCount t2
ON t1.Year = t2.Year
WHERE t2.Year IS NULL
SELECT YearNumber,UserCount FROM (
SELECT 'Year'+CAST(((t2.Year-t1.Year)+1) AS CHAR) [YearNumber] ,t2.Year,(
SELECT SUM(UserCount)
FROM #tempIndividCount
WHERE Year >= t1.Year AND Year <=t2.Year
) AS UserCount
FROM #tempIndividCount t1
JOIN #tempIndividCount t2
ON t1.Year < t2.Year
WHERE t1.Year = #startYear
UNION ALL
--add the missing first year, union it to include the value
SELECT 'Year1',Year, UserCount FROM #tempIndividCount
WHERE Year = #startYear) tt
ORDER BY tt.Year
Benefits over using a WHEN CASE based approach
More Robust
Do not need to explicitly determine the end and start dates of each year, just like in a logical year just need to know the start and end date. Can easily change what you are looking for with some simple modifications(i.e. say you want all 2 year ranges or 3 year).
Will be faster if the database is indexed properly
Since you are searching based on the same data type you can utilize the indices that should be created on the date columns in the database.
Cons
More Complicated
The query is a lot more complicated to follow, even though it is more robust there is a lot of extra logic in the actual query.
In some circumstance will not provide good boost to execution time
If the dataset is very small, or the number of dates being compared isn't significant then this could not save enough time to be worth it.

In SQL Server once you match a WHEN inside a CASE, it stop evaluating will not going on evaluating next WHEN clauses. Hence you can't accumulate that way.
if I understand you correctly, this would show your results.
;WITH cte AS
(F
SELECT dd.ddEnd [dateEnd], u.uID AS UserID
FROM Users AS u
INNER JOIN UserDataIDMatch AS udim
ON u.uID = udim.udim_FK_uID
INNER JOIN DataDump AS dd
ON udim.udimUserSystemID = dd.ddSystemID
WHERE ddEnd BETWEEN #FiscalYearStart AND #FiscalYearEnd3
)
SELECT datepart(year, #FiscalYearStart) AS [Year], COUNT(DISTINCT UserID) AS CntUserID
FROM cte
WHERE dateEnd BETWEEN #FiscalYearStart AND #FiscalYearEnd1
GROUP BY #FiscalYearStart
UNION
SELECT datepart(year, #FiscalYearEnd1) AS [Year], COUNT(DISTINCT UserID) AS CntUserID
FROM cte
WHERE dateEnd BETWEEN #FiscalYearStart AND #FiscalYearEnd2
GROUP BY #FiscalYearEnd1
UNION
SELECT datepart(year, #FiscalYearEnd3) AS [Year], COUNT(DISTINCT UserID) AS CntUserID
FROM cte
WHERE dateEnd BETWEEN #FiscalYearStart AND #FiscalYearEnd3
GROUP BY #FiscalYearEnd2

Related

Optimize this query without using not exist repeatably, is there a better way to write this query?

For example I have three table where say DataTable1, DataTable2 and DataTable3
and need to filter it from DataRange table, every time I have used NOT exist as shown below,
Is there a better way to write this.
Temp table to hold some daterange which is used for fiter:
Declare #DateRangeTable as Table(
StartDate datetime,
EndDate datetime
)
Some temp table which will hold data on which we need to apply date range filter
INSERT INTO #DateRangeTable values
('07/01/2020','07/04/2020'),
('07/06/2020','07/08/2020');
/*Table 1 which will hold some data*/
Declare #DataTable1 as Table(
Id numeric,
Date datetime
)
INSERT INTO #DataTable1 values
(1,'07/09/2020'),
(2,'07/06/2020');
Declare #DataTable2 as Table(
Id numeric,
Date datetime
)
INSERT INTO #DataTable2 values
(1,'07/10/2020'),
(2,'07/06/2020');
Declare #DataTable3 as Table(
Id numeric,
Date datetime
)
INSERT INTO #DataTable3 values
(1,'07/11/2020'),
(2,'07/06/2020');
Now I want to filter data based on DateRange table, here I need some optimized way so that i don't have to use not exists mutiple times, In real senario, I have mutiple tables where I have to filter based on the daterange table.
Select * from #DataTable1
where NOT EXISTS(
Select 1 from #DateRangeTable
where [Date] between StartDate and EndDate
)
Select * from #DataTable2
where NOT EXISTS(
Select 1 from #DateRangeTable
where [Date] between StartDate and EndDate
)
Select * from #DataTable3
where NOT EXISTS(
Select 1 from #DateRangeTable
where [Date] between StartDate and EndDate
)

Instead of using NOT EXISTS you could join the date range table:
SELECT dt.*
FROM #DataTable1 dt
LEFT JOIN #DateRangeTable dr ON dt.[Date] BETWEEN dr.StartDate and dr.EndDate
WHERE dr.StartDate IS NULL
It may perform better on large tables but you would have to compare the execution plans and make sure you have indexes on the date columns.

I would write the same query... but if you can change table structure I would try to improve performance adding two columns to specify the month as an integer (I suppose is the first couple of figures).
Obviously you have to test with your data and compare the timings.
Declare #DateRangeTable as Table(
StartDate datetime,
EndDate datetime,
StartMonth tinyint,
EndMonth tinyint
)
INSERT INTO #DateRangeTable values
('07/01/2020','07/04/2020', 7, 7),
('07/06/2020','07/08/2020', 7, 7),
('07/25/2020','08/02/2020', 7, 8); // (another record with different months)
Now your queries can use the new column to try to reduce comparisons (is a tinyint, sql server can partition records if you define a secondary index for StartMonth and EndMonth):
Select * from #DataTable1
where NOT EXISTS(
Select 1 from #DateRangeTable
where (DATEPART('month', [Date]) between StartMonth and EndMonth)
and ([Date] between StartDate and EndDate)
)

An aggregate may not appear in the set list of an UPDATE statement T-SQL

In T-SQL I'm attempting to update a stock user field with the number of weeks we expect it to be delivered to us by taking the difference between today and the purchase order due in dates. However the select query can return more than one line of purchase orders if there is more than one purchase order containing that product (obviously). I would like to take the smallest number it returns / minimum value but obviously cannot do this within the update query. Can anyone recommend a workaround? Thanks.
UPDATE [Exchequer].[ASAP01].[STOCK]
SET stUserField7 = DATEDIFF(day,CONVERT(VARCHAR(8), GETDATE(), 112),min(tlLineDate)) / 7 + 1
FROM [Exchequer].[ASAP01].[STOCK]
JOIN [Exchequer].[ASAP01].[CUSTSUPP]
ON stSupplier = acCode
JOIN [Exchequer].[ASAP01].[DETAILS]
ON stCode = tlStockCodeTrans1
WHERE stSupplier <> '' AND stQtyOnOrder > '0' AND stQtyOnOrder > stQtyAllocated
AND tlOurRef like 'POR%' AND (floor(tlQtyDel) + floor(tlQtyWOFF)) < floor(tlQty)
AND tlLineDate >= CONVERT(VARCHAR(8),GETDATE(), 112)

Why are you casting date to varchar for the difference?
This is not date but how you can use a window function in an update
declare #maps table(name varchar(10), isUsed bit, code varchar(10));
insert into #Maps values
('NY', 1, 'NY1')
, ('NY', 0, 'NY2')
, ('FL', 0, 'FL1')
, ('TX', 0, 'TX1')
declare #Results table (id int identity primary key, Name varchar(20), Value int, Code varchar(20), cnt int)
insert into #results values
('FL', 12, 'FL1', null)
, ('TX', 54, 'TX1', null)
, ('TX', 56, 'TX1', null)
, ('CA', 50, 'CA1', null)
, ('NJ', 40, 'NJ1', null)
select * from #results
order by name, Value desc
update r
set r.cnt = tt.cnt
from #results r
join ( select id, max(value) over (partition by name) as cnt
from #Results
) tt
on r.id = tt.id
select * from #results
order by name, value desc

Build a SELECT query with columns for the following:
The primary key of the [Exchequer].[ASAP01].[STOCK] table
The new desired stUserField7
Given the MIN(tlLineDate) expression in the original question, if this SELECT query does not have either a GROUP BY clause or change to use an APPLY instead of a JOIN, you've probably done something wrong.
Once you have that query, use it in an UPDATE statement like this:
UPDATE s
SET s.stUserField7 = t.NewValueFromSelectQuery
FROM [Exchequer].[ASAP01].[STOCK] s
INNER JOIN (
--- your new SELECT query here
) t ON t.<primary key field(s)> = s.<primary key field(s)>

Efficient way of storing date ranges

I need to store simple data - suppose I have some products with codes as a primary key, some properties and validity ranges. So data could look like this:
Products
code value begin_date end_date
10905 13 2005-01-01 2016-12-31
10905 11 2017-01-01 null
Those ranges are not overlapping, so on every date I have a list of unique products and their properties. So to ease the use of it I've created the function:
create function dbo.f_Products
(
#date date
)
returns table
as
return (
select
from dbo.Products as p
where
#date >= p.begin_date and
#date <= p.end_date
)
This is how I'm going to use it:
select
*
from <some table with product codes> as t
left join dbo.f_Products(#date) as p on
p.code = t.product_code
This is all fine, but how I can let optimizer know that those rows are unique to have better execution plan?
I did some googling, and found a couple of really nice articles for DDL which prevents storing overlapping ranges in the table:
Self-maintaining, Contiguous Effective Dates in Temporal Tables
Storing intervals of time with no overlaps
But even if I try those constraint I see that optimizer cannot understand that resulting recordset will return unique codes.
What I'd like to have is certain approach which gives me basically the same performance as if I stored those products list on certain date and selected it with date = #date.
I know that some RDMBS (like PostgreSQL) have special data types for this (Range Types). But SQL Server doesn't have anything like this.
Am I missing something or there're no way to do this properly in SQL Server?

You can create an indexed view that contains a row for each code/date in the range.
ProductDate (indexed view)
code value date
10905 13 2005-01-01
10905 13 2005-01-02
10905 13 ...
10905 13 2016-12-31
10905 11 2017-01-01
10905 11 2017-01-02
10905 11 ...
10905 11 Today
Like this:
create schema digits
go
create table digits.Ones (digit tinyint not null primary key)
insert into digits.Ones (digit) values (0),(1),(2),(3),(4),(5),(6),(7),(8),(9)
create table digits.Tens (digit tinyint not null primary key)
insert into digits.Tens (digit) values (0),(1),(2),(3),(4),(5),(6),(7),(8),(9)
create table digits.Hundreds (digit tinyint not null primary key)
insert into digits.Hundreds (digit) values (0),(1),(2),(3),(4),(5),(6),(7),(8),(9)
create table digits.Thousands (digit tinyint not null primary key)
insert into digits.Thousands (digit) values (0),(1),(2),(3),(4),(5),(6),(7),(8),(9)
create table digits.TenThousands (digit tinyint not null primary key)
insert into digits.TenThousands (digit) values (0),(1),(2),(3),(4),(5),(6),(7),(8),(9)
go
create schema info
go
create table info.Products (code int not null, [value] int not null, begin_date date not null, end_date date null, primary key (code, begin_date))
insert into info.Products (code, [value], begin_date, end_date) values
(10905, 13, '2005-01-01', '2016-12-31'),
(10905, 11, '2017-01-01', null)
create table info.DateRange ([begin] date not null, [end] date not null, [singleton] bit not null default(1) check ([singleton] = 1))
insert into info.DateRange ([begin], [end]) values ((select min(begin_date) from info.Products), getdate())
go
create view info.ProductDate with schemabinding
as
select
p.code,
p.value,
dateadd(day, ones.digit + tens.digit*10 + huns.digit*100 + thos.digit*1000 + tthos.digit*10000, dr.[begin]) as [date]
from
info.DateRange as dr
cross join
digits.Ones as ones
cross join
digits.Tens as tens
cross join
digits.Hundreds as huns
cross join
digits.Thousands as thos
cross join
digits.TenThousands as tthos
join
info.Products as p on
dateadd(day, ones.digit + tens.digit*10 + huns.digit*100 + thos.digit*1000 + tthos.digit*10000, dr.[begin]) between p.begin_date and isnull(p.end_date, datefromparts(9999, 12, 31))
go
create unique clustered index idx_ProductDate on info.ProductDate ([date], code)
go
select *
from info.ProductDate with (noexpand)
where
date = '2014-01-01'
drop view info.ProductDate
drop table info.Products
drop table info.DateRange
drop table digits.Ones
drop table digits.Tens
drop table digits.Hundreds
drop table digits.Thousands
drop table digits.TenThousands
drop schema digits
drop schema info
go

A solution without gaps might be this:
DECLARE #tbl TABLE(ID INT IDENTITY,[start_date] DATE);
INSERT INTO #tbl VALUES({d'2016-10-01'}),({d'2016-09-01'}),({d'2016-08-01'}),({d'2016-07-01'}),({d'2016-06-01'});
SELECT * FROM #tbl;
DECLARE #DateFilter DATE={d'2016-08-13'};
SELECT TOP 1 *
FROM #tbl
WHERE [start_date]<=#DateFilter
ORDER BY [start_date] DESC
Important: Be sure that there is an (unique) index on start_date
UPDATE: for different products
DECLARE #tbl TABLE(ID INT IDENTITY,ProductID INT,[start_date] DATE);
INSERT INTO #tbl VALUES
--product 1
(1,{d'2016-10-01'}),(1,{d'2016-09-01'}),(1,{d'2016-08-01'}),(1,{d'2016-07-01'}),(1,{d'2016-06-01'})
--product 1
,(2,{d'2016-10-17'}),(2,{d'2016-09-16'}),(2,{d'2016-08-15'}),(2,{d'2016-07-10'}),(2,{d'2016-06-11'});
DECLARE #DateFilter DATE={d'2016-08-13'};
WITH PartitionedCount AS
(
SELECT ROW_NUMBER() OVER(PARTITION BY ProductID ORDER BY [start_date] DESC) AS Nr
,*
FROM #tbl
WHERE [start_date]<=#DateFilter
)
SELECT *
FROM PartitionedCount
WHERE Nr=1

First you need to create a unique clustered index for (begin_date, end_date, code)
Then SQL engine will be able to do INDEX SEEK.
Additionally, you can also try to create a view for dbo.Products table to join that table with pre-populated dbo.Dates table.
select p.code, p.val, p.begin_date, p.end_date, d.[date]
from dbo.Product as p
inner join dbo.dates d on p.begin_date <= d.[date] and d.[date] <= p.end_date
Then in your function, you use that view as "where #date = view.date". The result can be either better or slightly worse... it depends on the actual data.
You also can try to make that view indexed (depends on how often it is being updated).
Alternatively, you can have better performance if you populate dbo.Products table for every date in the [begin_date] .. [end_date] range.

Approach with ROW_NUMBER scans the whole Products table once. It is the best method if you have a lot of product codes in the Products table and few validity ranges for each code.
WITH
CTE_rn
AS
(
SELECT
code
,value
,ROW_NUMBER() OVER (PARTITION BY code ORDER BY begin_date DESC) AS rn
FROM Products
WHERE begin_date <= #date
)
SELECT *
FROM
<some table with product codes> as t
LEFT JOIN CTE_rn ON CTE_rn.code = t.product_code AND CTE_rn.rn = 1
;
If you have few product codes and a lot of validity ranges for each code in the Products table, then it is better to seek the Products table for each code using OUTER APPLY.
SELECT *
FROM
<some table with product codes> as t
OUTER APPLY
(
SELECT TOP(1)
Products.value
FROM Products
WHERE
Products.code = t.product_code
AND Products.begin_date <= #date
ORDER BY Products.begin_date DESC
) AS A
;
Both variants need unique index on (code, begin_date DESC) include (value).
Note how the queries don't even look at end_date, because they assume that intervals don't have gaps. They will work in SQL Server 2008.

EDIT: My original answer was using an INNER JOIN, but the questioner wanted a LEFT JOIN.
CREATE TABLE Products
(
[Code] INT NOT NULL
, [Value] VARCHAR(30) NOT NULL
, Begin_Date DATETIME NOT NULL
, End_Date DATETIME NULL
)
/*
Products
code value begin_date end_date
10905 13 2005-01-01 2016-12-31
10905 11 2017-01-01 null
*/
INSERT INTO Products ([Code], [Value], Begin_Date, End_Date) VALUES (10905, 13, '2005-01-01', '2016-12-31')
INSERT INTO Products ([Code], [Value], Begin_Date, End_Date) VALUES (10905, 11, '2017-01-01', NULL)
CREATE NONCLUSTERED INDEX SK_ProductDate ON Products ([Code], Begin_Date, End_Date) INCLUDE ([Value])
CREATE TABLE SomeTableWithProductCodes
(
[CODE] INT NOT NULL
)
INSERT INTO SomeTableWithProductCodes ([Code]) VALUES (10905)
Here is a prototypical query, with a date predicate. Note that there are more optimal ways to do this in a bulletproof fashion, using a "less than" operator on the upper bound, but that's a different discussion.
SELECT
P.[Code]
, P.[Value]
, P.[Begin_Date]
, P.[End_Date]
FROM
SomeTableWithProductCodes ST
LEFT JOIN Products AS P ON
ST.[Code] = P.[Code]
AND '2016-06-30' BETWEEN P.[Begin_Date] AND ISNULL(P.[End_Date], '9999-12-31')
This query will perform an Index Seek on the Product table.
Here is a SQL Fiddle: SQL Fiddle - Products and Dates

select with count for years

I have pickup table which looks like this
create table Pickup
(
PickupID int IDENTITY,
ClientID int ,
PickupDate date ,
PickupProxy varchar (200) ,
PickupHispanic bit default 0,
EthnCode varchar(5) ,
CategCode varchar (2) ,
AgencyID int,
Primary Key (PickupID),
);
and it containe pickups for clients
I need to create report based on this table which should looks like this
I know i need to use CASE but really do not know how to put years and calculate
average pickups for each year. and how to count pickups for specific year.so far i have only this
SELECT
DATEPART(YEAR, PickupDate)as 'Month'
FROM dbo.Pickup
group by DATEPART(YEAR, PickupDate)
WITH ROLLUP
Any ideas?

Best way to structure a query like this is to use an Index table. Ideally create this in your master database so it is available to all dbs on the server but can be created in the local db too.
You can create one like this
create table IndexTable
(
IndexID int NOT NULL,
Primary Key (IndexID),
);
Then fill it with the numbers 1 - n where n is big enough, say 1,000,000.
Like this
INSERT IndexTable
VALUES (1)
WHILE (SELECT MAX(IndexID) FROM IndexTable) < 1000000
INSERT IndexTable
SELECT IndexID + (SELECT MAX(IndexID) FROM IndexTable)
FROM IndexTable
Your query then uses this table to treat months as integers
SELECT DATEADD(month, 0, i.IndexID) Months
,COUNT(p.PickupDate)
,AVERAGE(*Whatever*)
FROM Pickup p
INNER JOIN
IndexTable i ON DATEDIFF(month, p.PickupDate, 0) = i.IndexID
GROUP BY i.IndexID
WITH ROLLUP

Need multiple copies of one resultset in sql without using loop

Following is the sample data. I need to make 3 copies of this data in t sql without using loop and return as one resultset. This is sample data not real.
42 South Yorkshire
43 Lancashire
44 Norfolk
Edit: I need multiple copies and I have no idea in advance that how many copies I need I have to decide this on the basis of dates. Date might be 1st jan to 3rd Jan OR 1st jan to 8th Jan.
Thanks.

Don't know about better but this is definatley more creative! you can use a CROSS JOIN.
EDIT: put some code in to generate a date range, you can change the date range, the rows in the #date are your multiplier.
declare #startdate datetime
, #enddate datetime
create table #data1 ([id] int , [name] nvarchar(100))
create table #dates ([date] datetime)
INSERT #data1 SELECT 42, 'South Yorkshire'
INSERT #data1 SELECT 43, 'Lancashire'
INSERT #data1 SELECT 44, 'Norfolk'
set #startdate = '1Jan2010'
set #enddate = '3Jan2010'
WHILE (#startdate <= #enddate)
BEGIN
INSERT #dates SELECT #startdate
set #startdate=#startdate+1
END
SELECT [id] , [name] from #data1 cross join #dates
drop table #data1
drop table #dates

You could always use a CTE to do the dirty work
Replace the WHERE Counter < 4 with the amount of duplicates you need.
CREATE TABLE City (ID INTEGER PRIMARY KEY, Name VARCHAR(32))
INSERT INTO City VALUES (42, 'South Yorkshire')
INSERT INTO City VALUES (43, 'Lancashire')
INSERT INTO City VALUES (44, 'Norfolk')
/*
The CTE duplicates every row from CTE for the amount
specified by Counter
*/
;WITH CityCTE (ID, Name, Counter) AS
(
SELECT c.ID, c.Name, 0 AS Counter
FROM City c
UNION ALL
SELECT c.ID, c.Name, Counter + 1
FROM City c
INNER JOIN CityCTE cte ON cte.ID = c.ID
WHERE Counter < 4
)
SELECT ID, Name
FROM CityCTE
ORDER BY 1, 2
DROP TABLE City

This may not be the most efficient way of doing it, but it should work.
(select ....)
union all
(select ....)
union all
(select ....)

Assume the table is named CountyPopulation:
SELECT * FROM CountyPopulation
UNION ALL
SELECT * FROM CountyPopulation
UNION ALL
SELECT * FROM CountyPopulation
Share and enjoy.

There is no need to use a cursor. The set-based approach would be to use a Calendar table. So first we make our calendar table which need only be done once and be somewhat permanent:
Create Table dbo.Calendar ( Date datetime not null Primary Key Clustered )
GO
; With Numbers As
(
Select ROW_NUMBER() OVER( ORDER BY S1.object_id ) As [Counter]
From sys.columns As s1
Cross Join sys.columns As s2
)
Insert dbo.Calendar([Date])
Select DateAdd(d, [Counter], '19000101')
From Numbers
Where [Counter] <= 100000
GO
I populated it with a 100K dates which goes into 2300. Obviously you can always expand it. Next we generate our test data:
Create Table dbo.Data(Id int not null, [Name] nvarchar(20) not null)
GO
Insert dbo.Data(Id, [Name]) Values(42,'South Yorkshire')
Insert dbo.Data(Id, [Name]) Values(43, 'Lancashire')
Insert dbo.Data(Id, [Name]) Values(44, 'Norfolk')
GO
Now the problem becomes trivial:
Declare #Start datetime
Declare #End datetime
Set #Start = '2010-01-01'
Set #End = '2010-01-03'
Select Dates.[Date], Id, [Name]
From dbo.Data
Cross Join (
Select [Date]
From dbo.Calendar
Where [Date] >= #Start
And [Date] <= #End
) As Dates

By far the best solution is CROSS JOIN. Most natural.
See my answer here: How to retrieve rows multiple times in SQL Server?
If you have a Numbers table lying around, it's even easier. You can DATEDIFF the dates to give you the filter on the Numbers table

We Keep Coding

sql objective-c vba vb.net react-native apache vue.js tensorflow api pandas

Year Over Year (YOY) Distinct Count - sql

Related

Optimize this query without using not exist repeatably, is there a better way to write this query?

An aggregate may not appear in the set list of an UPDATE statement T-SQL

Efficient way of storing date ranges

select with count for years

Need multiple copies of one resultset in sql without using loop

Categories

Resources