I am creating a report in Tableau for a new product that captures metrics such as previous applications pending, new apps end of day pending, etc. To do this, I need a snapshot of the end-of-day status for each application each day. A decision was made above my pay grade to capture only a rolling seven-day delta of the data. As a result, an application that has not had a status change in the previous seven days stops appearing in the DB until something new happens, which leaves gaps in the dates and throws off the numbers in my report. What I need is a snapshot for each day for each application, so when there is a date gap, I want to take the most recent previous day's record and insert it to fill the gap between the two dates. Also, I join to a credit score table, and we sometimes pull all three bureaus, sometimes two, sometimes one, so there could be up to three rows per application per day.
I have searched this site for similar issues and found some, but none is an exact match for what I am trying to accomplish, and I honestly do not know where to start. Will a correlated subquery accomplish what I need? The code below shows what the data currently looks like.
drop table if exists #date
drop table if exists #test
create table #date
(
calendar_date date
)
insert into #date
values
('2019-08-07'),
('2019-08-08'),
('2019-08-09'),
('2019-08-10'),
('2019-08-11'),
('2019-08-12')
create table #test
(
id int,
period_date date,
decision_status varchar(20),
credit_score int,
expired_flag bit
)
insert into #test (id,period_date,decision_status,credit_score,expired_flag)
values
(1,'2019-08-08','declined',635,null),
(1,'2019-08-08','declined',642,null),
(1,'2019-08-09','declined',635,null),
(1,'2019-08-09','declined',642,null),
(1,'2019-08-10','declined',635,null),
(1,'2019-08-10','declined',642,null),
(1,'2019-08-11','declined',635,null),
(1,'2019-08-11','declined',642,null),
(1,'2019-08-12','declined',635,null),
(1,'2019-08-12','declined',642,null),
(2,'2019-08-08','review',656,null),
(2,'2019-08-08','review',648,null),
(2,'2019-08-09','review',656,null),
(2,'2019-08-09','review',648,null),
(2,'2019-08-12','review',656,null),
(2,'2019-08-12','review',648,null),
(3,'2019-08-08','preapproved',678,null),
(3,'2019-08-08','preapproved',689,null),
(3,'2019-08-08','preapproved',693,null),
(3,'2019-08-09','preapproved',678,null),
(3,'2019-08-09','preapproved',689,null),
(3,'2019-08-09','preapproved',693,null),
(3,'2019-08-11','preapproved',678,1),
(3,'2019-08-11','preapproved',689,1),
(3,'2019-08-11','preapproved',693,1),
(3,'2019-08-12','preapproved',678,1),
(3,'2019-08-12','preapproved',689,1),
(3,'2019-08-12','preapproved',693,1),
(4,'2019-08-08','onboarded',725,null),
(4,'2019-08-09','onboarded',725,null),
(4,'2019-08-10','onboarded',725,null),
(5,'2019-08-08','approved',685,null),
(5,'2019-08-08','approved',675,null),
(5,'2019-08-09','approved',685,null),
(5,'2019-08-09','approved',675,null),
(5,'2019-08-12','approved',685,1),
(5,'2019-08-12','approved',675,1)
And the query:
select id, calendar_date, period_date, decision_status, credit_score, expired_flag
from #date join
#test
on calendar_date=dateadd(day,-1,period_date)
order by id, calendar_date
I just need each application to show for each day.
You may just need a left join:
select t.id, d.calendar_date, t.period_date, t.decision_status, t.credit_score, t.expired_flag
from #date d left join
#test t
on d.calendar_date = dateadd(day, -1, t.period_date)
order by id, d.calendar_date;
If by "application" you mean the id in #test, then use a cross join to generate the rows and an outer apply to fill in the values:
select i.id, d.calendar_date, t.period_date, t.decision_status, t.credit_score, t.expired_flag
from #date d cross join
(select distinct id from #test) i outer apply
(select top (1) t.*
from #test t
where t.id = i.id and t.period_date <= d.calendar_date
order by t.period_date desc
) t
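Since the question mentions up to three bureau rows per application per day, here is a hedged variant of the query above; treat it as a sketch rather than a tested solution. It assumes the #date and #test tables from the question and compares period_date directly with calendar_date (if the one-day offset from the question's join is intended, compare dateadd(day, -1, t.period_date) instead). It uses top (1) with ties so that every row sharing the latest period_date is carried forward, not just one of them.

```sql
-- Sketch: one set of rows per application per calendar day,
-- carrying the most recent known rows forward across date gaps.
select i.id,
       d.calendar_date,
       t.period_date,
       t.decision_status,
       t.credit_score,
       t.expired_flag
from #date d
cross join (select distinct id from #test) i   -- every id on every day
outer apply
    (select top (1) with ties t.*
     from #test t
     where t.id = i.id
       and t.period_date <= d.calendar_date
     order by t.period_date desc               -- ties = all rows on the latest date
    ) t
order by i.id, d.calendar_date, t.credit_score;
```

Days that fall before an application's first period_date still appear, with NULLs in the columns that come from #test, because of the outer apply.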
Update:
After receiving the reply from Gordon, which gave me some inspiration and set me in the right direction, and conducting some additional research, I appear to have found a solution that is working. I wanted to share the solution here in case anyone else runs across this problem. I am posting the code below:
drop table if exists #date
drop table if exists #test
drop table if exists #test1
drop table if exists #row_num
create table #date
(
calendar_date date
)
insert into #date
values
('2019-08-07'),
('2019-08-08'),
('2019-08-09'),
('2019-08-10'),
('2019-08-11')
create table #test
(
id int,
period_date date,
decision_status varchar(20),
credit_score int,
expired_flag bit
)
insert into #test (id,period_date,decision_status,credit_score,expired_flag)
values
(1,'2019-08-08','declined',635,null),
(1,'2019-08-08','declined',642,null),
(1,'2019-08-09','declined',635,null),
(1,'2019-08-09','declined',642,null),
(1,'2019-08-10','declined',635,null),
(1,'2019-08-10','declined',642,null),
(1,'2019-08-11','declined',635,null),
(1,'2019-08-11','declined',642,null),
(1,'2019-08-12','declined',635,null),
(1,'2019-08-12','declined',642,null),
(2,'2019-08-08','review',656,null),
(2,'2019-08-08','review',648,null),
(2,'2019-08-09','review',656,null),
(2,'2019-08-09','review',648,null),
(2,'2019-08-12','review',656,null),
(2,'2019-08-12','review',648,null),
(3,'2019-08-08','preapproved',678,null),
(3,'2019-08-08','preapproved',689,null),
(3,'2019-08-08','preapproved',693,null),
(3,'2019-08-09','preapproved',678,null),
(3,'2019-08-09','preapproved',689,null),
(3,'2019-08-09','preapproved',693,null),
(3,'2019-08-11','preapproved',678,1),
(3,'2019-08-11','preapproved',689,1),
(3,'2019-08-11','preapproved',693,1),
(3,'2019-08-12','preapproved',678,1),
(3,'2019-08-12','preapproved',689,1),
(3,'2019-08-12','preapproved',693,1),
(4,'2019-08-08','onboarded',725,null),
(4,'2019-08-09','onboarded',725,null),
(4,'2019-08-10','onboarded',725,null),
(5,'2019-08-08','approved',685,null),
(5,'2019-08-08','approved',675,null),
(5,'2019-08-09','approved',685,null),
(5,'2019-08-09','approved',675,null),
(5,'2019-08-12','approved',685,1),
(5,'2019-08-12','approved',675,1)
select id,calendar_date,decision_status,credit_score,expired_flag
,ROW_NUMBER() over(partition by id,calendar_date order by calendar_date) as row_id
,cast(ROW_NUMBER() over(partition by id,calendar_date order by calendar_date) as char(1)) as row_num
into #test1
from #date
join #test
on calendar_date=dateadd(day,-1,period_date)
order by id,calendar_date
create table #row_num
(
row_id int,
row_num char(1)
)
insert into #row_num
values
(1,'1'),
(2,'2'),
(3,'3')
select i.id
,d.calendar_date
,coalesce(t.decision_status,t1.decision_status) as decision_status
,coalesce(t.credit_score,t1.credit_score) as credit_score
,coalesce(t.expired_flag,t1.expired_flag) as expired_flag
from #date d
cross join
(select distinct id
from #test1 ) i
cross join #row_num r
left join #test1 t
on t.id=i.id
and t.row_id=r.row_id
and t.calendar_date=d.calendar_date
join
(select id,row_id,decision_status,credit_score,expired_flag
,calendar_date as start_date
,lead(calendar_date,1,dateadd(day,1,(select max(calendar_date) from #date)))
over (partition by id,row_id order by calendar_date) as end_date
from #test1
) t1
on t1.id=i.id
and t1.row_id=r.row_id
and d.calendar_date>=t1.start_date
and d.calendar_date<t1.end_date
order by i.id,d.calendar_date,r.row_id
This gives me what I am looking for, all the daily records for each application for each day.
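For anyone reading the final query, the key step is the derived table t1, which turns each stored row into a half-open date range. A minimal sketch of just that step, assuming the #date and #test1 tables built above and filtering to id 2 purely for illustration:

```sql
-- Sketch: show the [start_date, end_date) range that each #test1 row covers.
-- lead() takes the next calendar_date for the same id and row_id; when there
-- is no next row, the default is the day after the last date in #date.
select id,
       row_id,
       calendar_date as start_date,
       lead(calendar_date, 1, dateadd(day, 1, (select max(calendar_date) from #date)))
           over (partition by id, row_id order by calendar_date) as end_date
from #test1
where id = 2
order by row_id, start_date;
```

With the sample data, each row_id for id 2 should cover 2019-08-07 to 2019-08-08, 2019-08-08 to 2019-08-11, and 2019-08-11 to 2019-08-12, so the missing days 2019-08-09 and 2019-08-10 fall inside the second range and pick up its values in the final join.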
I'm getting the wrong result from my report. Maybe I'm missing something simple.
The report is an inline table-valued function that should count goods movement in our shop and how often these spare parts are claimed (replaced in a repair).
The problem: different spare parts in the shop table (let's call it SP) can be linked to the same spare part in the "repair table" (TSP). I need the goods movement of every spare part in SP and the claim count of every distinct spare part in TSP.
This is a very simplified excerpt of the relevant part:
create table #tsp(id int, name varchar(20),claimed int);
create table #sp(id int, name varchar(20),fiTsp int,ordered int);
insert into #tsp values(1,'1235-6044',300);
insert into #tsp values(2,'1234-5678',400);
insert into #sp values(1,'1235-6044',1,30);
insert into #sp values(2,'1235-6044',1,40);
insert into #sp values(3,'1235-6044',1,50);
insert into #sp values(4,'1234-5678',2,60);
WITH cte AS(
select tsp.id As TspID,tsp.name as TspName,tsp.claimed As Claimed
,sp.id As SpID,sp.name As SpName,sp.ordered As Ordered
from #sp sp inner join #tsp tsp
on sp.fiTsp=tsp.id
)
SELECT TspName, SUM(Claimed) As Claimed, Sum(Ordered) As Ordered
FROM cte
Group By TspName
drop table #tsp;
drop table #sp;
Result:
TspName Claimed Ordered
1234-5678 400 60
1235-6044 900 120
The Ordered count is correct, but the Claimed count should be 300 instead of 900 for TspName='1235-6044'.
I need to group by Tsp.ID for the claim count and by Sp.ID for the order count. But how can I do that in one query?
Edit: Actually, the TVF looks like this (note that getOrdered and getClaimed are scalar-valued functions and that I'm grouping on TSP's category in the outer select):
CREATE FUNCTION [Gambio].[rptReusedStatistics](
@fromDate datetime
,@toDate datetime
,@fromInvoiceDate datetime
,@toInvoiceDate datetime
,@idClaimStatus varchar(50)
,@idSparePartCategories varchar(1000)
,@idSpareParts varchar(1000)
)
RETURNS TABLE AS
RETURN(
WITH ExclusionCat AS(
SELECT idSparePartCategory AS ID From tabSparePartCategory
WHERE idSparePartCategory IN(- 3, - 1, 6, 172,168)
), Report AS(
SELECT Cat.SparePartCategoryName AS Category
,TSP.SparePartDescription AS Part
,TSP.SparePartName AS PartNumber
,SP.Inventory
,Gambio.getGoodsIn(SP.idSparePart,@FromDate,@ToDate) GoodsIn
,Gambio.getOrdered(SP.idSparePart,@FromDate,@ToDate) Ordered
--,CASE WHEN TSP.idSparePart IS NULL THEN 0 ELSE
-- Gambio.getClaimed(TSP.idSparePart,@FromInvoiceDate,@ToInvoiceDate,@idClaimStatus,NULL)END AS Claimed
,CASE WHEN TSP.idSparePart IS NULL THEN 0 ELSE
Gambio.getClaimed(TSP.idSparePart,@FromInvoiceDate,@ToInvoiceDate,@idClaimStatus,1)END AS ClaimedReused
,CASE WHEN TSP.idSparePart IS NULL THEN 0 ELSE
Gambio.getCostSaving(TSP.idSparePart,@FromInvoiceDate,@ToInvoiceDate,@idClaimStatus)END AS Costsaving
FROM Gambio.SparePart AS SP
INNER JOIN tabSparePart AS TSP ON SP.fiTabSparePart = TSP.idSparePart
INNER JOIN tabSparePartCategory AS Cat
ON Cat.idSparePartCategory=TSP.fiSparePartCategory
WHERE Cat.idSparePartCategory NOT IN(SELECT ID FROM ExclusionCat)
AND (@idSparePartCategories IS NULL
OR TSP.fiSparePartCategory IN(
SELECT Item From dbo.Split(@idSparePartCategories,',')
)
)
AND (@idSpareParts IS NULL
OR TSP.idSparePart IN(
SELECT Item From dbo.Split(@idSpareParts,',')
)
)
)
SELECT Category
--, Part
--, PartNumber
, SUM(Inventory)As InventoryCount
, SUM(GoodsIn) As GoodsIn
, SUM(Ordered) As Ordered
--, SUM(Claimed) As Claimed
, SUM(ClaimedReused)AS ClaimedReused
, SUM(Costsaving) As Costsaving
, Count(*) AS PartCount
FROM Report
GROUP BY Category
)
Solution:
Thanks to Aliostad, I've solved it by grouping first and then joining (actual TVF, reduced to a minimum):
WITH Report AS(
SELECT Cat.SparePartCategoryName AS Category
,TSP.SparePartDescription AS Part
,TSP.SparePartName AS PartNumber
,SP.Inventory
,SP.GoodsIn
,SP.Ordered
,Gambio.getClaimed(TSP.idSparePart,@FromInvoiceDate,@ToInvoiceDate,@idClaimStatus,1) AS ClaimedReused
,Gambio.getCostSaving(TSP.idSparePart,@FromInvoiceDate,@ToInvoiceDate,@idClaimStatus) AS Costsaving
FROM (
SELECT GSP.fiTabSparePart
,SUM(GSP.Inventory)AS Inventory
,SUM(Gambio.getGoodsIn(GSP.idSparePart,@FromDate,@ToDate))AS GoodsIn
,SUM(Gambio.getOrdered(GSP.idSparePart,@FromDate,@ToDate))AS Ordered
FROM Gambio.SparePart GSP
GROUP BY GSP.fiTabSparePart
)As SP
INNER JOIN tabSparePart TSP ON SP.fiTabSparePart = TSP.idSparePart
INNER JOIN tabSparePartCategory AS Cat
ON Cat.idSparePartCategory=TSP.fiSparePartCategory
)
SELECT Category
, SUM(Inventory)As InventoryCount
, SUM(GoodsIn) As GoodsIn
, SUM(Ordered) As Ordered
, SUM(ClaimedReused)AS ClaimedReused
, SUM(Costsaving) As Costsaving
, Count(*) AS PartCount
FROM Report
GROUP BY Category
You are JOINing first and then GROUPing by. You need to reverse it, GROUP BY first and then JOIN.
So here in my subquery, I group by first and then join:
select
claimed,
ordered
from
#tsp
inner JOIN
(select
fitsp,
SUM(ordered) as ordered
from
#sp
group by
fitsp) as SUMS
on
SUMS.fiTsp = id;
I think you just need to select Claimed and add it to the Group By in order to get what you are looking for.
WITH cte AS(
select tsp.id As TspID,tsp.name as TspName,tsp.claimed As Claimed
,sp.id As SpID,sp.name As SpName,sp.ordered As Ordered
from #sp sp inner join #tsp tsp
on sp.fiTsp=tsp.id )
SELECT TspName, Claimed, Sum(Ordered) As Ordered
FROM cte
Group By TspName, Claimed
Your cte is an inner join between tsp and sp, which means that the data you're querying looks like this:
SpID Ordered TspID TspName Claimed
1 30 1 1235-6044 300
2 40 1 1235-6044 300
3 50 1 1235-6044 300
4 60 2 1234-5678 400
Notice how TspID, TspName and Claimed all get repeated. Grouping by TspName means that the data gets grouped in two groups, one for 1235-6044 and one for 1234-5678. The first group has 3 rows on which to run the aggregate functions, the second group only one. That's why your sum(Claimed) will get you 300*3=900.
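To see the multiplication directly, a small diagnostic along these lines can help. It is only a sketch and assumes the #tsp and #sp temp tables from the question still exist (that is, it runs before the drop table statements); the column aliases are made up for illustration.

```sql
-- Sketch: count how many joined rows each TspName has and compare
-- the claimed value taken once against the claimed value summed.
select tsp.name         as TspName,
       count(*)         as JoinedRows,     -- one row per matching #sp row
       max(tsp.claimed) as ClaimedOnce,    -- the value as stored in #tsp
       sum(tsp.claimed) as ClaimedSummed   -- repeated once per joined row
from #sp sp
inner join #tsp tsp
    on sp.fiTsp = tsp.id
group by tsp.name
order by tsp.name;

-- With the sample data:
-- TspName    JoinedRows  ClaimedOnce  ClaimedSummed
-- 1234-5678  1           400          400
-- 1235-6044  3           300          900
```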
As Aliostad suggested, you should first group by TspID and do the sum of Ordered and then join to tsp.
No need to join, just subselect:
create table #tsp(id int, name varchar(20),claimed int);
create table #sp(id int, name varchar(20),fiTsp int,ordered int);
insert into #tsp values(1,'1235-6044',300);
insert into #tsp values(2,'1234-5678',400);
insert into #sp values(1,'1235-6044',1,30);
insert into #sp values(2,'1235-6044',1,40);
insert into #sp values(3,'1235-6044',1,50);
insert into #sp values(4,'1234-5678',2,60);
WITH cte AS(
select tsp.id As TspID,tsp.name as TspName,tsp.claimed As Claimed
,sp.id As SpID,sp.name As SpName,sp.ordered As Ordered
from #sp sp inner join #tsp tsp
on sp.fiTsp=tsp.id
)
SELECT id, name, SUM(claimed) as Claimed, (SELECT SUM(ordered) FROM #sp WHERE #sp.fiTsp = #tsp.id GROUP BY #sp.fiTsp) AS Ordered
FROM #tsp
GROUP BY id, name
drop table #tsp;
drop table #sp;
Produces:
id name Claimed Ordered
1 1235-6044 300 120
2 1234-5678 400 60
-- EDIT --
Based on the additional info, this is how I might try to split the CTE to form the data as per the example. I fully admit that Aliostad's approach may yield a cleaner query but here's an attempt (completely blind) using the subselect:
CREATE FUNCTION [Gambio].[rptReusedStatistics](
@fromDate datetime
,@toDate datetime
,@fromInvoiceDate datetime
,@toInvoiceDate datetime
,@idClaimStatus varchar(50)
,@idSparePartCategories varchar(1000)
,@idSpareParts varchar(1000)
)
RETURNS TABLE AS
RETURN(
WITH ExclusionCat AS (
SELECT idSparePartCategory AS ID From tabSparePartCategory
WHERE idSparePartCategory IN(- 3, - 1, 6, 172,168)
), ReportSP AS (
SELECT fiTabSparePart
,Inventory
,Gambio.getGoodsIn(idSparePart,@FromDate,@ToDate) GoodsIn
,Gambio.getOrdered(idSparePart,@FromDate,@ToDate) Ordered
FROM Gambio.SparePart
), ReportTSP AS (
SELECT TSP.idSparePart
,Cat.SparePartCategoryName AS Category
,TSP.SparePartDescription AS Part
,TSP.SparePartName AS PartNumber
,CASE WHEN TSP.idSparePart IS NULL THEN 0 ELSE
Gambio.getClaimed(TSP.idSparePart,@FromInvoiceDate,@ToInvoiceDate,@idClaimStatus,1)END AS ClaimedReused
,CASE WHEN TSP.idSparePart IS NULL THEN 0 ELSE
Gambio.getCostSaving(TSP.idSparePart,@FromInvoiceDate,@ToInvoiceDate,@idClaimStatus)END AS Costsaving
FROM tabSparePart AS TSP
INNER JOIN tabSparePartCategory AS Cat
ON Cat.idSparePartCategory=TSP.fiSparePartCategory
WHERE Cat.idSparePartCategory NOT IN(SELECT ID FROM ExclusionCat)
AND (@idSparePartCategories IS NULL
OR TSP.fiSparePartCategory IN(
SELECT Item From dbo.Split(@idSparePartCategories,',')
)
)
AND (@idSpareParts IS NULL
OR TSP.idSparePart IN(
SELECT Item From dbo.Split(@idSpareParts,',')
)
)
)
SELECT Category
--, Part
--, PartNumber
, (SELECT SUM(Inventory) FROM ReportSP WHERE ReportSP.fiTabSparePart = idSparePart GROUP BY fiTabSparePart) AS Inventory
, (SELECT SUM(GoodsIn) FROM ReportSP WHERE ReportSP.fiTabSparePart = idSparePart GROUP BY fiTabSparePart) AS GoodsIn
, (SELECT SUM(Ordered) FROM ReportSP WHERE ReportSP.fiTabSparePart = idSparePart GROUP BY fiTabSparePart) AS Ordered
, Claimed
, ClaimedReused
, Costsaving
, Count(*) AS PartCount
FROM ReportTSP
GROUP BY Category
)
Without a better understanding of the whole schema it's difficult to cover for all the eventualities but whether this works or not (I suspect PartCount will be 1 for all instances) hopefully it'll give you some fresh thoughts for alternate approaches.
SELECT
tsp.name
,max(tsp.claimed) as claimed
,sum(sp.ordered) as ordered
from #sp sp
inner join #tsp tsp
on sp.fiTsp=tsp.id
GROUP BY tsp.name
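The query above works on the sample data because claimed is constant within each name group, so max() simply picks the repeated value. That is an assumption about the data: if two #tsp rows ever shared a name, max() would keep only the larger claimed value. A hedged sketch that avoids the assumption collapses to one row per tsp.id first and only then sums per name (the alias x is just for illustration):

```sql
-- Sketch: collapse the join to one row per #tsp row, then roll up by name.
select x.name,
       sum(x.claimed) as claimed,
       sum(x.ordered) as ordered
from (
    select tsp.id,
           tsp.name,
           max(tsp.claimed) as claimed,   -- safe here: one claimed value per tsp.id
           sum(sp.ordered)  as ordered
    from #sp sp
    inner join #tsp tsp
        on sp.fiTsp = tsp.id
    group by tsp.id, tsp.name
) x
group by x.name
order by x.name;
```

On the sample tables this gives the same result as the shorter query (300 and 120 for 1235-6044, 400 and 60 for 1234-5678); the two only differ when a name maps to more than one #tsp row.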