Need to Impute Missing Data from Sparsely Populated Table

I am trying to populate a #temp reporting table from an existing sparse table with two key elements: datekeys, based on a date range, and prices as they change. Here is the data that exists in the price change table:
The date range for a report run 2 days ago on a rolling 7-day cycle would include 6/12 - 6/19. The 2 rows in the table include an old price and old price variance from 6/7, which is out of range for that report. However, that price and variance are needed in order to dub them in under the CashPrice column for all datekeys 20190612 through 20190618. On 6/19 there is a new price change/variance, so on that datekey the values should change to the new price/variance.
The set-up data required for reporting would look like this:
Here is the code to build the sampling data:
-- T-SQL script to build the sampling tables
-- use for date-related cross-join later
if object_id( N'tempdb..#Numbers', N'U' ) is not null
drop table #Numbers;
create table #Numbers(
n int
);
insert #Numbers( n ) values( 0 ), ( 1 ), ( 2 ), ( 3 ), ( 4 ), ( 5 ), ( 6 ),
( 7 ), ( 8 ), ( 9 ), ( 10 )
-- select * from #Numbers;
-- creating existing sparse price data
if object_id( N'tempdb..#dt', N'U' ) is not null
drop table #dt;
create table #dt(
StoreNumber int
, City char( 3 )
, State char( 2 )
, Type char( 1 )
, ProductKey int
, DateKey int
, CashPrice money
, DateLastPriceChange datetime
, CashPriceVar money
)
insert #dt values
( 1, 'OKC', 'OK', 'D', 144, 20190607, 2.799, '2019-06-07 11:37', -0.1 )
, ( 1, 'OKC', 'OK', 'D', 144, 20190619, 2.699, '2019-06-19 10:40', -0.1 )
-- select * from #dt;
-- creating temporary working table for reporting
if object_id( N'tempdb..#tt', N'U' ) is not null
drop table #tt;
create table #tt(
StoreNumber int
, City char( 3 )
, State char( 2 )
, Type char( 1 )
, ProductKey int
, DateKey int
-- a couple of extra columns here
, DateKeyDate date
, DataDateKey int
, CashPrice money
, DateLastPriceChange datetime
, CashPriceVar money
)
-- dub in the start date for the report
declare @StartDateKey date = '2019-06-12'
-- populate the temporary working table
insert #tt( StoreNumber, ProductKey, DateKeyDate )
select distinct
dt.StoreNumber
, dt.ProductKey
, dateadd( day, n.n, @StartDateKey )
from #dt dt
cross join #Numbers n
where
n.n <= 7
-- select * from #tt
-- change the added datekeydate to datekey format
update #tt
set DateKey = year( DateKeyDate ) * 10000 + month( DateKeyDate ) * 100 +
day( DateKeyDate )
Here is the code that I have been working on; it limps along. It's not ideal, and in many cases, when run against the full suite of data, it filters out the dates I cross-joined, so that I am left without the imputed data and see only the original known price changes. Please advise.
select
dd.StoreNumber
, dto.City
, dto.State
, dto.Type
, dd.ProductKey
, dd.DateKey
, dto.CashPrice
, dto.DateLastPriceChange
, dto.CashPriceVar
from (
select
tt.StoreNumber
, tt.ProductKey
, tt.DateKey
, max( dt.DateKey ) MaxDateKey
from #tt tt
inner join #dt dt
on dt.StoreNumber = tt.StoreNumber
and dt.ProductKey = tt.ProductKey
and dt.DateKey <= tt.DateKey
group by
tt.StoreNumber
, tt.ProductKey
, tt.DateKey
) dd
inner join #dt dto
on dto.StoreNumber = dd.StoreNumber
and dto.ProductKey = dd.ProductKey
and dto.DateKey = dd.MaxDateKey;
EDIT: Can anyone advise as to whether or not the SQL Server LEAD function might be a better choice--perhaps comparing the value of the current row with the value of the following row?
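One way LEAD could be applied here, as a rough sketch rather than a tested solution: use it to turn each price change into a validity range, then match every report date to the range that contains it. This assumes the #dt and #tt tables built above and SQL Server 2012 or later (LEAD is not available before that). The names PriceRanges and NextDateKey are made up for the example. A left join is used so that report dates with no earlier price change are kept with NULL prices instead of being filtered out.

```sql
-- Sketch only: assumes the #dt and #tt tables from the question.
-- Each price change is valid from its own DateKey up to, but not including,
-- the DateKey of the next change for the same store and product.
;with PriceRanges as (
    select
        dt.StoreNumber
      , dt.City
      , dt.State
      , dt.Type
      , dt.ProductKey
      , dt.DateKey
      , lead( dt.DateKey ) over (
            partition by dt.StoreNumber, dt.ProductKey
            order by dt.DateKey
        ) as NextDateKey   -- null for the most recent price change
      , dt.CashPrice
      , dt.DateLastPriceChange
      , dt.CashPriceVar
    from #dt dt
)
select
    tt.StoreNumber
  , pr.City
  , pr.State
  , pr.Type
  , tt.ProductKey
  , tt.DateKey
  , pr.CashPrice
  , pr.DateLastPriceChange
  , pr.CashPriceVar
from #tt tt
left join PriceRanges pr      -- left join keeps every report date
    on  pr.StoreNumber = tt.StoreNumber
    and pr.ProductKey  = tt.ProductKey
    and pr.DateKey    <= tt.DateKey
    and ( pr.NextDateKey is null or tt.DateKey < pr.NextDateKey )
order by
    tt.StoreNumber
  , tt.ProductKey
  , tt.DateKey;
```

With the sample rows above, this should carry the 2.799 price from 20190607 forward through 20190618 and switch to 2.699 on 20190619.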

Related

Return columns from different tables that are after a given date

I need to write a query that returns product orders that were open during April of 2018 and are still open and also returns product orders that were open during April of 2018 and are no longer open.
The rows in the results need to include the name of the customer that placed the order, the id for the order, and the date the order was filled.
Here is the table info
CREATE TABLE dbo.ProductOrders
(
POID INT NOT NULL IDENTITY(1, 1) PRIMARY KEY ,
ProductId INT NOT NULL
CONSTRAINT FK_ProductOrders_ProductId_ref_Products_ProductId
FOREIGN KEY REFERENCES dbo.Products ( ProductId ) ,
CustomerId INT NOT NULL ,
OrderedQuantity INT ,
Filled BIT NOT NULL
CONSTRAINT DF_ProductOrders_Filled
DEFAULT ( 0 ) ,
DateOrdered DATETIME
CONSTRAINT DF_ProductOrders_DateOrdered
DEFAULT ( GETDATE()) ,
DateFilled DATETIME
CONSTRAINT DF_ProductOrders_DateFilled
DEFAULT ( GETDATE())
);
INSERT dbo.ProductOrders ( ProductId ,
CustomerId ,
OrderedQuantity ,
Filled ,
DateOrdered ,
DateFilled )
VALUES ( 2, 1, 1000, 0, '4/16/18 8:09:13', NULL ) ,
( 2, 1, 500, 1, '3/27/18 17:00:21', '6/24/18 13:29:01' ) ,
( 3, 3, 2000, 1, '12/01/04 13:28:58', '2/19/05 19:41:42' ) ,
( 1, 1, 632, 0, '5/23/18 4:25:52', NULL ) ,
( 4, 4, 901, 0, '3/30/18 21:30:28', NULL );
CREATE TABLE dbo.Customers
(
CustomerId INT NOT NULL IDENTITY(1, 1) PRIMARY KEY ,
CustomerName NVARCHAR(100) ,
Active BIT NOT NULL
CONSTRAINT DF_Customers_Active
DEFAULT ( 1 )
);
INSERT dbo.Customers ( CustomerName ,
Active )
VALUES ( 'Bikes R'' Us', 1 ) ,
( 'Industrial Giant', 1 ) ,
( 'Widget-Works', 0 ) ,
( 'Custom Hangers', 1 );
This is my best attempt at it. I know this is not the right syntax, but I'm not sure if I need a join between these two tables to make this work, or how I would go about selecting orders that start in April 2018 and are also open or closed after that date.
select CustomerName, POID, DataFilled,
From ProductOrders, Customers
Where DateOrdered is >= April 2018

I think you want a join and filtering:
select c.customername, po.poid, po.dateordered, po.datefilled
from productorders po
inner join customers c on c.customerid = po.customerid
where
po.dateordered >= '20180401'
and po.dateordered < '20180501'
and po.datefilled < getdate()
This gives you orders that were ordered in April 2018 and are not open anymore as of now. To get orders that are still open, you would change the last condition to po.datefilled is null.
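The query above filters on orders placed in April 2018. If "open during April 2018" instead means the order was already placed and not yet filled at some point in that month, a single query can return both groups with a status flag. This is only a sketch under that assumed reading; the OrderStatus column name is made up for illustration.

```sql
-- Sketch only: reads "open during April 2018" as "ordered before 2018-05-01
-- and not filled before 2018-04-01". That interpretation is an assumption.
select
    c.CustomerName
  , po.POID
  , po.DateOrdered
  , po.DateFilled
  , case
        when po.DateFilled is null then 'still open'
        else 'no longer open'
    end as OrderStatus
from dbo.ProductOrders po
inner join dbo.Customers c
    on c.CustomerId = po.CustomerId
where po.DateOrdered < '20180501'
  and ( po.DateFilled is null or po.DateFilled >= '20180401' )
order by po.POID;
```

Against the sample rows, the March 2018 order that was filled in June 2018 would be included as 'no longer open', while the 2004 order and the May 2018 order would be excluded.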

How to Auto generate dates between date range using SQL Query?

I just want to generate the dates within a date range using a SQL query.
Source:
Result:
Thanks,
Lawrance A

Here is how to accomplish this by using a tally table to create a calendar table:
declare @source table
(
user_id int not null primary key clustered,
from_date date not null,
to_date date not null
);
insert into @source
values
(1, '02/20/2019', '02/23/2019'),
(2, '02/22/2019', '02/28/2019'),
(3, '03/01/2019', '03/05/2019');
with
rows as
(
select top 1000
n = 1
from sys.messages
),
tally as
(
select n = row_number() over(order by (select null)) - 1
from rows
),
calendar as
(
select
date = dateadd(dd, n, (select min(from_date) from @source))
from tally
)
select
s.user_id,
c.date
from @source s
cross join calendar c
where c.date between s.from_date and s.to_date;
Result set:

Is there any way I could use Alternate for Lead() and Lag() Functions in SQL Server 2008?

I have a table where I need to calculate the difference between each row and the one underneath it and output the result as XML. It's a daily task, so it is a kind of recurring job.
Structure for my current table is as below :
CREATE TABLE #Temp
(
CurrentDateTime DateTime
, ID INT
, ThisYearToDateTotal INT
, ThisYearToDateCBT INT
, ThisYearToDateManual INT
, ThisYearToDateScanned INT
, InProcess INT
, InputRequired INT
)
So far I've written the code as below :
SELECT
Today_CurrentDateTime
, Today_Total
, Today_CBT
, Today_Manual
, Today_Scanned
, Today_InProcess
, Today_InputRequired
, Yesterday_Total
, Yesterday_CBT
, Yesterday_Manual
, Yesterday_Scanned
, Yesterday_InProcess
, Yesterday_InputRequired
, (TD.Today_Total - YD.Yesterday_Total) AS Diff_in_Total
, (TD.Today_CBT - YD.Yesterday_CBT) AS Diff_in_CBT
, (TD.Today_Manual - YD.Yesterday_Manual) AS Diff_in_Manual
, (TD.Today_Scanned - YD.Yesterday_Scanned) AS Diff_in_Scanned
, (TD.Today_InProcess - YD.Yesterday_InProcess) AS Diff_in_InProcess
, (TD.Today_InputRequired - YD.Yesterday_InputRequired) AS Diff_in_InputRequired
FROM #YesterdayData AS YD
INNER JOIN #TodayData AS TD ON TD.Today_ID = YD.Yesterday_ID
and getting the output as below :
Now I have a restriction here: I can't create another permanent table, and that's why I can't calculate the difference for each day throughout a week.
Any help?

If the dates are inserted in order of the identity field ID, then you can join the #Temp table to itself on the previous ID.
CREATE TABLE #Temp
( CurrentDateTime DateTime
, ID INT
, ThisYearToDateTotal INT
, ThisYearToDateCBT INT
, ThisYearToDateManual INT
, ThisYearToDateScanned INT
, InProcess INT
, InputRequired INT
)
INSERT INTO #Temp VALUES
('2017-11-14 07:50:25.230', 1, 400000, 50000, 20000, 30000, 1000, 700)
,('2017-11-15 07:50:25.230', 2, 460000, 53000, 26000, 38000, 2000, 1400)
,('2017-11-16 07:53:01.943', 3, 469692, 53904, 26755, 389033, 2026, 1489)
,('2017-11-17 07:53:01.943', 4, 469692, 53904, 26755, 389033, 2026, 1489)
DELETE FROM #Temp WHERE ID = 3
SELECT T.CurrentDateTime
, T.ThisYearToDateTotal - TPrev.ThisYearToDateTotal [Total Diff]
, T.ThisYearToDateCBT - TPrev.ThisYearToDateCBT [ThisYearToDateCBT Diff]
, T.ThisYearToDateManual - TPrev.ThisYearToDateManual [ThisYearToDateManual Diff]
, T.ThisYearToDateScanned - TPrev.ThisYearToDateScanned [ThisYearToDateScanned Diff]
, T.InProcess - TPrev.InProcess [InProcess Diff]
, T.InputRequired - TPrev.InputRequired [InputRequired Diff]
-- TPrev is the closest earlier row: the highest ID below the current one
FROM #Temp AS T LEFT JOIN #Temp AS TPrev ON TPrev.ID = (SELECT MAX(T2.ID)
FROM #Temp T2
WHERE T2.ID < T.ID)
ORDER BY T.ID
--DROP TABLE #Temp

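It's quite easy to mimic LAG and LEAD using subqueries.
For the #Temp table in your question, you can get the ThisYearToDateTotal value of the previous or next row (ordered by CurrentDateTime) with correlated TOP 1 subqueries. The ID predicate plays the role of PARTITION BY; drop it if ID is unique per row.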
Here is a simple example:
SELECT ID,
CurrentDateTime,
ThisYearToDateTotal,
(
SELECT TOP 1 ThisYearToDateTotal
FROM #Temp as tLag
WHERE tLag.ID = tMain.Id -- partition by
AND tLag.CurrentDateTime < tMain.CurrentDateTime
ORDER BY CurrentDateTime DESC
) As Lag_ThisYearToDateTotal,
(
SELECT TOP 1 ThisYearToDateTotal
FROM #Temp as tLead
WHERE tLead.ID = tMain.Id -- partition by
AND tLead.CurrentDateTime > tMain.CurrentDateTime
ORDER BY CurrentDateTime
) As Lead_ThisYearToDateTotal
FROM #Temp as tMain

SQL Server - from two rows, one column to one row, two columns?

if object_id( 'tempdb.dbo.#ctp', 'u' ) is not null
drop table #ctp ;
create table #ctp( id int, mastername varchar( 16 ) ) ;
insert into #ctp values( 1, 'Big Boy' ) ;
if object_id( 'tempdb.dbo.#client', 'u' ) is not null
drop table #client ;
create table #client( id int, name varchar(16 ), type int ) ;
insert into #client values( 1, 'ABC', 5 ) ;
insert into #client values( 2, 'XYZ', 6 ) ;
if object_id( 'tempdb.dbo.#ctpclient', 'u' ) is not null
drop table #ctpclient ;
create table #ctpclient( id int, ctpfk int, clientfk int ) ;
insert into #ctpclient values( 1, 1, 1 ) ;
insert into #ctpclient values( 2, 1, 2 ) ;
select tp.mastername
, c.name
, c.type
, cc.ctpfk
, cc.clientfk
from #ctp tp
join #ctpclient cc
on tp.id = cc.ctpfk
join #client c
on c.id = cc.clientfk
;
current output
mastername|name|type
Big Boy|ABC|5
Big Boy|XYZ|6
Instead of two rows of output, I would like the output to be as follows:
mastername|nameone|nametwo
Big Boy | ABC | XYZ
What is the optimal way to do this given that I have a many to many table such as #ctpclient?

Assuming you always have 2 rows you can use a crosstab (aka conditional aggregation). It would look something like this.
with SortedValues as
(
select tp.mastername
, c.name
, ROW_NUMBER() over (partition by mastername order by clientfk) as RowNum
from #ctp tp
join #ctpclient cc on tp.id = cc.ctpfk
join #client c on c.id = cc.clientfk
)
select mastername
, MAX(case when RowNum = 1 then name end) as NameOne
, MAX(case when RowNum = 2 then name end) as NameTwo
from SortedValues
group by mastername
If you have a varying number of rows, you can still accomplish this, but it is a bit more complex.
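When the number of clients per master varies and a fixed set of columns is not strictly required, one commonly used alternative is to collapse the names into a single delimited column rather than pivoting them. This is a different technique from the crosstab above, shown only as a sketch against the #ctp, #client and #ctpclient tables from the question. The output column name names is made up for the example.

```sql
-- Sketch only: builds one comma-separated list of client names per master
-- using FOR XML PATH, which also works on older SQL Server versions.
select
    tp.mastername
  , stuff(
        ( select ', ' + c.name
          from #ctpclient cc
          join #client c
              on c.id = cc.clientfk
          where cc.ctpfk = tp.id
          order by cc.clientfk
          for xml path( '' ), type
        ).value( '.', 'varchar(max)' )
      , 1, 2, ''            -- strip the leading ', '
    ) as names
from #ctp tp;
```

For the sample data this should return a single row with mastername Big Boy and names ABC, XYZ. If separate columns are a hard requirement, the column list has to be built with dynamic SQL instead.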

SQL query when result is empty

I have a table like this
USER itemnumber datebought (YYYYmmDD)
a 1 20160101
b 2 20160202
c 3 20160903
d 4 20160101
Now I have to show the total number of items bought by each user after date 20160202 (2 February 2016)
I used
SELECT USER, COUNT(itemnumber)
FROM TABLE
WHERE datebought >= 20160202
GROUP BY USER
It gives me results
b 1
c 1
but I want like this
a 0
b 1
c 1
d 0
Please tell me what the quickest / most efficient method to do that is.

Try like this,
DECLARE @table TABLE
(
[USER] VARCHAR(1),
itemnumber INT,
datebought DATE
)
INSERT INTO @table VALUES
('a',1,'20160101'),
('b',2,'20160202'),
('b',2,'20160202'),
('b',2,'20160202'),
('c',3,'20160903'),
('d',4,'20160101')
SELECT *
FROM @table
SELECT [USER],
Sum(CASE
WHEN datebought >= '20160202' THEN 1
ELSE 0
END) AS ITEMCOUNT
FROM @table
GROUP BY [USER]

Use this
SELECT USER, COUNT(itemnumber)
FROM TABLE
WHERE datebought >= 20160202
GROUP BY USER

You can use this, though it won't be a good idea for a large amount of data:
SELECT USER, COUNT(itemnumber)
FROM TABLE
WHERE datebought >= 20160202
GROUP BY USER
UNION
SELECT DISTINCT USER, 0
FROM TABLE
WHERE datebought < 20160202

USE tempdb
GO
DROP TABLE test1
CREATE TABLE test1(a NVARCHAR(10), ino INT, datebought INT)
INSERT INTO dbo.test1
( a, ino, datebought )
VALUES ( 'a' , 1 , 20160101)
INSERT INTO dbo.test1
( a, ino, datebought )
VALUES ( 'b' , 2 , 20160202)
INSERT INTO dbo.test1
( a, ino, datebought )
VALUES ( 'c' , 3 , 20160903)
INSERT INTO dbo.test1
( a, ino, datebought )
VALUES ( 'd' , 4 , 20160101)
SELECT * FROM dbo.test1
SELECT a, COUNT(ino) OVER(PARTITION BY a) FROM dbo.test1
WHERE datebought>=20160202
UNION ALL
SELECT a, 0 FROM dbo.test1
WHERE datebought<20160202
ORDER BY a
