Dynamically generated SQL pivot with calculated columns: general methodology - sql

I have the following scenario:
Generation of a pivot with dynamic columns, which I can manage by generating dynamic SQL either on the client or in a stored procedure.
I then need to calculate additional values based on the values already calculated by the pivot.
Here is the SQL to generate all the steps in the process:
IF OBJECT_ID('dbo.departmentStats', 'U') IS NOT NULL
DROP TABLE dbo.departmentStats;
CREATE TABLE departmentStats
([Facility] varchar(50), [Department] varchar(50),[Stat] varchar(5),
[January] int, [February] int, [March] int);
INSERT INTO departmentStats
([Facility],[Department],[Stat], [January], [February], [March])
VALUES
('f1','d1','Stat1', 90, 40, 60),
('f1','d1','Stat2', 30, 20, 10);
select 'Original Data' as src,* from departmentStats
IF OBJECT_ID('tempdb.dbo.#pivotstats', 'U') IS NOT NULL
DROP TABLE #pivotstats;
CREATE TABLE #pivotstats
(facility varchar(50), department varchar(50), stat varchar(50), period varchar(50), value int)
insert into #pivotstats
select facility,department,Stat, period, value
from departmentStats
unpivot
(
value
for Period in (January, February, March)
) u
select 'Original after unpivot' as src,* from #pivotstats
IF OBJECT_ID('dbo.departmentStats2', 'U') IS NOT NULL
DROP TABLE dbo.departmentStats2;
CREATE TABLE departmentStats2
([Facility] varchar(50), [Department] varchar(50),[Period] varchar(20),
[stat1] int, [stat2] int, [Sum] numeric(10,2),[Division] numeric(10,2));
insert into departmentStats2
select
facility
,department
,period
,Stat1
,stat2
,Stat1+Stat2 as [sum]
,case when Stat2 <> 0 then cast(Stat1 as numeric(10,2)) / Stat2 else 0.0 end as division -- cast before dividing, otherwise int/int truncates
from #pivotstats
Pivot(
sum(value)
for Stat in ([Stat1],[Stat2])
)
as x
select 'Added formula columns' as src,* from departmentStats2
select 'Union and repivot' as src,* from (
select
facility
,department
,Period
,'Stat1' as Stat
,stat1 as value
from departmentStats2
union all
select
facility
,department
,Period
,'Stat2' as Stat
,stat2 as value
from departmentStats2
union all
select
facility
,department
,Period
,'sum' as Stat
,[SUM] as value
from departmentStats2
union all
select
facility
,department
,Period
,'Division' as Stat
,[Division] as value
from departmentStats2
) as bunion
pivot(
sum(value)
for Period in (January, February, March)
)as P
I have only included three periods for the sample.
I have no doubt that I can generate SQL in this format at will.
I don't foresee any real issues, considering my data source will be static (updated daily) and heavily indexed. I don't yet have a full data set ready to test, but I will update the question when I do.
The reason for the pivot olympics is that the final output is actually a spreadsheet, which uses INDEX/MATCH to look up the values for each stat by facility/department/name to produce graphs and charts.
Opinions?
Obvious flaws?
Is this better done in C# by generating a class at runtime?
Remember the idea is to provide the ability to add stats based on known stats with a minimum of effort.
Here is a better way to do the last step, using CROSS APPLY instead of UNION ALL:
select 'Union and repivot' as src, *
from (
    select
        departmentStats2.Facility,
        departmentStats2.Department, departmentStats2.Period, x.Stat, x.[Value]
    from departmentStats2
    cross apply (
        values (Facility, Department, Period, 'Stat1', stat1),
               (Facility, Department, Period, 'Stat2', stat2),
               (Facility, Department, Period, 'Sum', [Sum]),
               (Facility, Department, Period, 'Division', [Division])
    ) as x(Facility, Department, Period, Stat, [Value])
) as bunion
pivot (
    sum([Value])
    for Period in (January, February, March)
) as P
order by Facility, Department, Stat
From http://sqlsunday.com/2014/03/02/unpivot-using-cross-apply/
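For completeness, here is a sketch of how that final statement could be generated dynamically, with the period list built from the data itself (assuming SQL Server 2017+ for STRING_AGG; FOR XML PATH can build the same list on older versions):
DECLARE @cols NVARCHAR(MAX), @sql NVARCHAR(MAX);
-- Build the pivot column list from the distinct periods, e.g. [January],[February],[March]
SELECT @cols = STRING_AGG(QUOTENAME(period), ',')
FROM (SELECT DISTINCT period FROM #pivotstats) AS p;
SET @sql = N'
select *
from (
    select d.Facility, d.Department, d.Period, x.Stat, x.[Value]
    from departmentStats2 as d
    cross apply (values
        (''Stat1'', d.stat1),
        (''Stat2'', d.stat2),
        (''Sum'', d.[Sum]),
        (''Division'', d.[Division])
    ) as x(Stat, [Value])
) as src
pivot (sum([Value]) for Period in (' + @cols + ')) as p
order by Facility, Department, Stat;';
EXEC sp_executesql @sql;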

Related

Is there a way to Union Tables if one table is not yet created?

A new table is created monthly which holds all the data for transactions within that month. For certain reports, I must union all of the tables through a SQL script as the new tables become available. I would like to be pre-emptive and call out a table (a standardized naming convention is used, i.e. Table2023M9, Table2023M10) that has not yet been created in the DB.
SELECT *
FROM
(SELECT [company]
,[Account]
,[2023M9] as Amount
,'2023M9' as [Period]
From Table2023M9
Union all
SELECT [company]
,[Account]
,[2023M10] as Amount
,'2023M10' as [Period]
From Table2023M10 /*Table doesn't exist yet.*/
UNPIVOT (Amount FOR [Period] in (
[2023M9]
,[2023M10] as 2023Table)
As Transactions
Is there a way to do this?
I tried the following:
Declare @Temp TABLE( Entity nvarchar (100) )
if ('Table2023M9') is not null
insert into @Temp
select Company
from Table23M1;
union all
if ('Table2023M10') is not null
insert into @Temp
select Company
from Table2023M10;
select * from @Temp
However, I still get the error message that the table doesn't exist.
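The attempt fails because the table reference still has to resolve when the statement is compiled, even when the IF branch never runs. A possible workaround (a sketch, assuming the Table<year>M<month> naming convention holds and SQL Server 2017+ for STRING_AGG) is to assemble the UNION dynamically from the monthly tables that actually exist; dynamic SQL is only compiled when executed, so missing tables are never referenced:
DECLARE @sql NVARCHAR(MAX);
-- One SELECT per existing monthly table; the amount column is named after the table suffix
SELECT @sql = STRING_AGG(
    CAST(N'SELECT [company], [Account], '
         + QUOTENAME(REPLACE(t.name, 'Table', '')) + N' AS Amount, '''
         + REPLACE(t.name, 'Table', '') + N''' AS [Period] FROM ' + QUOTENAME(t.name)
         AS NVARCHAR(MAX)),
    N' UNION ALL ')
FROM sys.tables AS t
WHERE t.name LIKE 'Table[0-9][0-9][0-9][0-9]M%';
EXEC sp_executesql @sql;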

Is it possible to unpivot a SQL Server table using headers as a column and values as another column?

I have a table like this:
TableName   dates  ModelName   BaseUnitPerPallet  pallet
Calendar    June   Null        4                  1
Country     June   Null        2                  6
Product     June   DOWNSTREAM  Null               8
ProductBOM  June   DOWNSTREAM  9                  9
and I want a table like this:
Columns    values
TableName  Calendar
TableName  Country
TableName  Product
TableName  ProductBOM
where the Columns field holds the headers of the previous table, and values holds the corresponding values, in unpivoted form.
I have been trying the unpivot logic without success:
SELECT Columns, Values
FROM
(
SELECT TableName, dates, ModelName, BaseUnitPerPallet, pallet
FROM Database
) as source_query
UNPIVOT
(
Values FOR Columns IN ( TableName, dates, ModelName, BaseUnitPerPallet, pallete)
)
as pivot_results
any advice or guidance would be great.
Additionally, is there any resource on doing this dynamically, applying the logic without writing out the column names?
Thanks in advance!
I'd recommend using APPLY to unpivot your table
Unpivot using APPLY
DROP TABLE IF EXISTS #YourTable
CREATE TABLE #YourTable (
ID INT IDENTITY(1,1) PRIMARY KEY
,TableName VARCHAR(100)
,Dates Varchar(25)
,ModelName VARCHAR(100)
,BaseUnitPerPallet TINYINT
,Pallet TINYINT
)
INSERT INTO #YourTable
VALUES
('Calendar','June',NULL,4,1)
,('Country','June',NULL,2,6)
,('Product','June','DOWNSTREAM',NULL,8)
,('ProductBOM','June','DOWNSTREAM',9,9)
SELECT A.ID,B.*
FROM #YourTable AS A
CROSS APPLY
(VALUES
('TableName',A.TableName)
,('Dates',A.Dates)
,('ModelName',A.ModelName)
,('BaseUnitPerPallet',CAST(A.BaseUnitPerPallet AS Varchar(100)))
,('Pallet',CAST(A.Pallet AS Varchar(100)))
) AS B(ColumnName,Val)
--WHERE B.Val IS NOT NULL /*Optional in case you want to ignore NULLs*/
ORDER BY A.ID,B.ColumnName
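On the follow-up about doing this dynamically: the VALUES list can be generated from the catalog views instead of written by hand (a sketch, assuming SQL Server 2017+ for STRING_AGG; a temp table's metadata lives in tempdb, and every column is cast to one type because all unpivoted values must share a type):
DECLARE @cols NVARCHAR(MAX), @sql NVARCHAR(MAX);
-- One ('ColumnName', CAST(A.ColumnName AS VARCHAR(100))) pair per column, excluding ID
SELECT @cols = STRING_AGG(
    CAST(N'(''' + c.name + N''', CAST(A.' + QUOTENAME(c.name) + N' AS VARCHAR(100)))' AS NVARCHAR(MAX)),
    N',')
FROM tempdb.sys.columns AS c
WHERE c.[object_id] = OBJECT_ID('tempdb..#YourTable')
  AND c.name <> 'ID';
SET @sql = N'
SELECT A.ID, B.ColumnName, B.Val
FROM #YourTable AS A
CROSS APPLY (VALUES ' + @cols + N') AS B(ColumnName, Val)
ORDER BY A.ID, B.ColumnName;';
EXEC sp_executesql @sql;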

SCD Type 2 - Handling Intraday changes?

I have a MERGE statement that builds my SCD Type 2 table each night. This table must house all historical changes made in the source system, creating a new row with the date-from/date-to columns populated along with the "islatest" flag. I have come across an issue today that I am not really sure how to handle.
There appear to have been multiple changes to the source table within a 24-hour period:
ID Code PAN EnterDate Cost Created
16155 1012401593331 ENRD 2015-11-05 7706.3 2021-08-17 14:34
16155 1012401593331 ENRD 2015-11-05 8584.4 2021-08-17 16:33
I use a basic MERGE statement to identify my changes; however, what would be the best approach to ensure all changes get picked up correctly? The above gives me an error because it tries to insert/update multiple rows with the same key:
DECLARE @DateNow DATETIME = GETDATE()
IF Object_id('tempdb..#meteridinsert') IS NOT NULL
DROP TABLE #meteridinsert;
CREATE TABLE #meteridinsert
(
meterid INT,
change VARCHAR(10)
);
MERGE
INTO [DIM].[Meters] AS target
using stg_meters AS source
ON target.[ID] = source.[ID]
AND target.latest=1
WHEN matched THEN
UPDATE
SET target.islatest = 0,
target.todate = @DateNow
WHEN NOT matched BY target THEN
INSERT
(
id,
code,
pan,
enterdate,
cost,
created,
[FromDate] ,
[ToDate] ,
[IsLatest]
)
VALUES
(
source.id,
source.code ,
source.pan ,
source.enterdate ,
source.cost ,
source.created ,
@DateNow ,
NULL ,
1
)
output source.id,
$action
INTO #meteridinsert;
INSERT INTO [DIM].[Meters]
(
[id] ,
[code] ,
[pan] ,
[enterdate] ,
[cost] ,
[created] ,
[FromDate] ,
[ToDate] ,
[IsLatest]
)
SELECT [id], [code], [pan], [enterdate], [cost], [created], @DateNow, NULL, 1 FROM stg_meters a
INNER JOIN #meteridinsert cid
ON a.id = cid.meterid
AND cid.change = 'UPDATE'
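One way to keep the MERGE from seeing two source rows for the same target row (a sketch, assuming stg_meters has the columns shown in the sample) is to collapse staging to the latest change per ID before merging; whether the earlier intraday versions also deserve history rows is a business decision:
-- Keep only the latest staged row per meter ID, by Created timestamp
WITH latest AS
(
    SELECT s.*,
           ROW_NUMBER() OVER (PARTITION BY s.ID ORDER BY s.Created DESC) AS rn
    FROM stg_meters AS s
)
SELECT ID, Code, PAN, EnterDate, Cost, Created
INTO #stg_meters_latest
FROM latest
WHERE rn = 1;
-- ...then point the MERGE at it: USING #stg_meters_latest AS source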
Maybe you can do it using a MERGE statement, but I would prefer the typical UPDATE-and-INSERT approach, to make it easier to understand (also, I am not sure MERGE allows you to use the same source record for both an update and an insert...).
First of all, I create the table dimscd2 to represent your dimension table:
create table dimscd2
(naturalkey int, descr varchar(100), startdate datetime, enddate datetime)
And then I insert some records...
insert into dimscd2 values
(1,'A','2019-01-12 00:00:00.000', '2020-01-01 00:00:00.000'),
(1,'B','2020-01-01 00:00:00.000', NULL)
As you can see, the "current" record is the one with descr='B', because it has a NULL enddate. (I do recommend using a surrogate key for each record: just an incremental key for each row of your dimension, which the fact table links to so that each fact reflects the dimension's status at the moment it happened.)
Then, I have created some dummy data to represent the source data with the changes for the same natural key
-- new data (src_data)
select 1 as naturalkey,'C' as descr, cast('2020-01-02 00:00:00.000' as datetime) as dt into src_data
union all
select 1 as naturalkey,'D' as descr, cast('2020-01-03 00:00:00.000' as datetime) as dt
After that, I have created a temp table (##tmp) with this query to set the enddate for each record:
-- tmp table
select naturalkey, descr, dt,
lead(dt,1,0) over (partition by naturalkey order by dt) enddate,
row_number() over (partition by naturalkey order by dt) rn
into ##tmp
from src_data
The LEAD function takes the next start date for the same natural key, ordered by date (dt).
The ROW_NUMBER marks with 1 the oldest record in the source data for the natural key in the dimension.
Then, I proceed to close the "current" record using update
update d
set enddate = t.dt
from dimscd2 d
join ##tmp t
on d.naturalkey = t.naturalkey
and d.enddate is null
and t.rn = 1
And finally I add the new source data to the dimension with insert
insert into dimscd2
select naturalkey, descr, dt,
case enddate when '1900-01-01' then null else enddate end -- LEAD's default of 0 converts to 1900-01-01, meaning "no next row"
from ##tmp
Final result is obtained with the query:
select * from dimscd2
You can test on this db<>fiddle

How to rebuild a record from a change log

I have a table of changes to an entity. I am trying to rebuild the data from the changes.
This is my Changes table:
CREATE TABLE Changes
(
Id INT IDENTITY,
RecordId INT,
Field INT,
Val VARCHAR(MAX),
DateOfChange DATETIME
);
The Field column is a reference to which field changed, Val is the new value, and RecordId is the Id of the record that changed. Ideally the Records table would contain the latest values, but I am not that lucky. There are 10 different fields for which changes are tracked, mostly dates but with some other types thrown in.
This is my Record table:
CREATE TABLE Records
(
Id INT IDENTITY,
AUserGeneratedIdentifer VARCHAR(12)
)
I'd like to have a view to query by the rolled up values.
SELECT
AUserGeneratedIdentifer, DateOpened, DateClosed, etc.
FROM
RecordView
WHERE
AUserGeneratedIdentifer = 'something'
I am trying to implement it with CTEs, but I am wondering if this is the correct way. I am using one CTE per field I am trying to get.
WITH DateOpened AS
(
SELECT
RecordId, Val,
ROW_NUMBER() OVER (PARTITION BY RecordId ORDER BY DateOfChange DESC) AS [Rank]
FROM
Changes
WHERE
Field = @DateOpenedId
) --- ... Repeat for every field
SELECT (my fields)
FROM Records
INNER JOIN <all ctes> on Record Id
But this method feels wrong to me, possibly due to my lack of SQL experience. Is there a better way that I am missing here? What are the performance implications of having multiple CTEs on the same table and joining with them?
Please excuse the hastily thrown together pseudo code, I hope it illustrates my problem accurately
I think you should be able to use a single ROW_NUMBER and pivot it as below.
DECLARE @Records TABLE (Id INT IDENTITY, AUserGeneratedIdentifer VARCHAR(12))
DECLARE @Changes TABLE (Id INT IDENTITY, RecordId INT, Field INT, Val VARCHAR(MAX), DateOfChange DATETIME);
INSERT INTO @Records (AUserGeneratedIdentifer)
VALUES ('qwer'), ('asdf')
INSERT INTO @Changes (RecordId, Field, Val, DateOfChange)
VALUES (1, 1, 'foo', '2021-01-01' ),
(2, 1, 'fooz', '2021-01-01' ),
(2, 2, 'barz', '2021-01-01' ),
(1, 2, 'bar', '2021-01-01' ),
(1, 1, 'foo2', '2021-01-02' ),
(2, 2, 'barz2', '2021-01-02' )
SELECT piv.RecordId, piv.[1], piv.[2]
FROM (
SELECT C.RecordId, C.Field, C.Val, ROW_NUMBER() OVER (PARTITION BY C.RecordId, C.Field ORDER BY C.DateOfChange DESC) RowNum
FROM @Changes C
) sub
PIVOT (
MAX(Val)
FOR Field IN ([1], [2])
) piv
WHERE piv.RowNum = 1
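Since there are 10 tracked fields, the IN list can also be built dynamically instead of typed out (a sketch using STRING_AGG, SQL Server 2017+; it assumes a permanent Changes table, because a table variable from the outer batch is not visible inside dynamic SQL):
DECLARE @cols NVARCHAR(MAX), @sql NVARCHAR(MAX);
-- e.g. [1],[2],...,[10]
SELECT @cols = STRING_AGG(QUOTENAME(CAST(Field AS NVARCHAR(10))), ',')
FROM (SELECT DISTINCT Field FROM Changes) AS f;
SET @sql = N'
SELECT piv.RecordId, ' + @cols + N'
FROM (
    SELECT C.RecordId, C.Field, C.Val,
           ROW_NUMBER() OVER (PARTITION BY C.RecordId, C.Field
                              ORDER BY C.DateOfChange DESC) AS RowNum
    FROM Changes AS C
) AS sub
PIVOT (MAX(Val) FOR Field IN (' + @cols + ')) AS piv
WHERE piv.RowNum = 1;';
EXEC sp_executesql @sql;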

Efficient way of storing date ranges

I need to store simple data - suppose I have some products with codes as a primary key, some properties and validity ranges. So data could look like this:
Products
code value begin_date end_date
10905 13 2005-01-01 2016-12-31
10905 11 2017-01-01 null
Those ranges are not overlapping, so on every date I have a list of unique products and their properties. So to ease the use of it I've created the function:
create function dbo.f_Products
(
@date date
)
returns table
as
return (
select *
from dbo.Products as p
where
@date >= p.begin_date and
@date <= p.end_date
)
This is how I'm going to use it:
select
*
from <some table with product codes> as t
left join dbo.f_Products(@date) as p on
p.code = t.product_code
This is all fine, but how can I let the optimizer know that those rows are unique, to get a better execution plan?
I did some googling and found a couple of really nice articles on DDL that prevents storing overlapping ranges in a table:
Self-maintaining, Contiguous Effective Dates in Temporal Tables
Storing intervals of time with no overlaps
But even if I add those constraints, I see that the optimizer cannot understand that the resulting recordset will return unique codes.
What I'd like is an approach that gives me basically the same performance as if I stored the product list for each date and selected it with date = @date.
I know that some RDBMSs (like PostgreSQL) have special data types for this (range types), but SQL Server doesn't have anything similar.
Am I missing something, or is there no way to do this properly in SQL Server?
You can create an indexed view that contains a row for each code/date in the range.
ProductDate (indexed view)
code value date
10905 13 2005-01-01
10905 13 2005-01-02
10905 13 ...
10905 13 2016-12-31
10905 11 2017-01-01
10905 11 2017-01-02
10905 11 ...
10905 11 Today
Like this:
create schema digits
go
create table digits.Ones (digit tinyint not null primary key)
insert into digits.Ones (digit) values (0),(1),(2),(3),(4),(5),(6),(7),(8),(9)
create table digits.Tens (digit tinyint not null primary key)
insert into digits.Tens (digit) values (0),(1),(2),(3),(4),(5),(6),(7),(8),(9)
create table digits.Hundreds (digit tinyint not null primary key)
insert into digits.Hundreds (digit) values (0),(1),(2),(3),(4),(5),(6),(7),(8),(9)
create table digits.Thousands (digit tinyint not null primary key)
insert into digits.Thousands (digit) values (0),(1),(2),(3),(4),(5),(6),(7),(8),(9)
create table digits.TenThousands (digit tinyint not null primary key)
insert into digits.TenThousands (digit) values (0),(1),(2),(3),(4),(5),(6),(7),(8),(9)
go
create schema info
go
create table info.Products (code int not null, [value] int not null, begin_date date not null, end_date date null, primary key (code, begin_date))
insert into info.Products (code, [value], begin_date, end_date) values
(10905, 13, '2005-01-01', '2016-12-31'),
(10905, 11, '2017-01-01', null)
create table info.DateRange ([begin] date not null, [end] date not null, [singleton] bit not null default(1) check ([singleton] = 1))
insert into info.DateRange ([begin], [end]) values ((select min(begin_date) from info.Products), getdate())
go
create view info.ProductDate with schemabinding
as
select
p.code,
p.value,
dateadd(day, ones.digit + tens.digit*10 + huns.digit*100 + thos.digit*1000 + tthos.digit*10000, dr.[begin]) as [date]
from
info.DateRange as dr
cross join
digits.Ones as ones
cross join
digits.Tens as tens
cross join
digits.Hundreds as huns
cross join
digits.Thousands as thos
cross join
digits.TenThousands as tthos
join
info.Products as p on
dateadd(day, ones.digit + tens.digit*10 + huns.digit*100 + thos.digit*1000 + tthos.digit*10000, dr.[begin]) between p.begin_date and isnull(p.end_date, datefromparts(9999, 12, 31))
go
create unique clustered index idx_ProductDate on info.ProductDate ([date], code)
go
select *
from info.ProductDate with (noexpand)
where
date = '2014-01-01'
drop view info.ProductDate
drop table info.Products
drop table info.DateRange
drop table digits.Ones
drop table digits.Tens
drop table digits.Hundreds
drop table digits.Thousands
drop table digits.TenThousands
drop schema digits
drop schema info
go
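As an aside, on SQL Server 2022+ the hand-rolled digits tables can be replaced by GENERATE_SERIES for ad-hoc queries (a sketch; note that an indexed view cannot reference a table-valued function, so the digits tables are still needed if you want the view materialized):
-- Ad-hoc equivalent of the view body (SQL Server 2022+)
select p.code, p.[value], dateadd(day, s.value, dr.[begin]) as [date]
from info.DateRange as dr
cross apply generate_series(0, datediff(day, dr.[begin], dr.[end])) as s
join info.Products as p
    on dateadd(day, s.value, dr.[begin])
       between p.begin_date and isnull(p.end_date, datefromparts(9999, 12, 31));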
A solution without gaps might be this:
DECLARE @tbl TABLE(ID INT IDENTITY,[start_date] DATE);
INSERT INTO @tbl VALUES({d'2016-10-01'}),({d'2016-09-01'}),({d'2016-08-01'}),({d'2016-07-01'}),({d'2016-06-01'});
SELECT * FROM @tbl;
DECLARE @DateFilter DATE={d'2016-08-13'};
SELECT TOP 1 *
FROM @tbl
WHERE [start_date]<=@DateFilter
ORDER BY [start_date] DESC
Important: Be sure that there is a (unique) index on start_date.
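On a permanent table that could look like the following (table and index names are illustrative; the table-variable demo above cannot take a separate CREATE INDEX):
CREATE UNIQUE NONCLUSTERED INDEX IX_History_StartDate
    ON dbo.YourHistoryTable ([start_date]);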
UPDATE: for different products
DECLARE @tbl TABLE(ID INT IDENTITY,ProductID INT,[start_date] DATE);
INSERT INTO @tbl VALUES
--product 1
(1,{d'2016-10-01'}),(1,{d'2016-09-01'}),(1,{d'2016-08-01'}),(1,{d'2016-07-01'}),(1,{d'2016-06-01'})
--product 2
,(2,{d'2016-10-17'}),(2,{d'2016-09-16'}),(2,{d'2016-08-15'}),(2,{d'2016-07-10'}),(2,{d'2016-06-11'});
DECLARE @DateFilter DATE={d'2016-08-13'};
WITH PartitionedCount AS
(
SELECT ROW_NUMBER() OVER(PARTITION BY ProductID ORDER BY [start_date] DESC) AS Nr
,*
FROM @tbl
WHERE [start_date]<=@DateFilter
)
)
SELECT *
FROM PartitionedCount
WHERE Nr=1
First you need to create a unique clustered index on (begin_date, end_date, code); then the SQL engine will be able to do an index seek.
Additionally, you can try to create a view on the dbo.Products table that joins it with a pre-populated dbo.dates table:
select p.code, p.[value], p.begin_date, p.end_date, d.[date]
from dbo.Products as p
inner join dbo.dates d on p.begin_date <= d.[date] and d.[date] <= p.end_date
Then in your function, you use that view with "where @date = view.date". The result can be either better or slightly worse... it depends on the actual data.
You also can try to make that view indexed (depends on how often it is being updated).
Alternatively, you can have better performance if you populate dbo.Products table for every date in the [begin_date] .. [end_date] range.
The ROW_NUMBER approach scans the whole Products table once. It is the best method if you have a lot of product codes in the Products table and few validity ranges for each code.
WITH
CTE_rn
AS
(
SELECT
code
,value
,ROW_NUMBER() OVER (PARTITION BY code ORDER BY begin_date DESC) AS rn
FROM Products
WHERE begin_date <= @date
)
SELECT *
FROM
<some table with product codes> as t
LEFT JOIN CTE_rn ON CTE_rn.code = t.product_code AND CTE_rn.rn = 1
;
If you have few product codes and a lot of validity ranges for each code in the Products table, then it is better to seek the Products table for each code using OUTER APPLY.
SELECT *
FROM
<some table with product codes> as t
OUTER APPLY
(
SELECT TOP(1)
Products.value
FROM Products
WHERE
Products.code = t.product_code
AND Products.begin_date <= @date
ORDER BY Products.begin_date DESC
) AS A
;
Both variants need a unique index on (code, begin_date DESC) INCLUDE (value); a possible definition is sketched below.
Note how the queries don't even look at end_date, because they assume that intervals don't have gaps. They will work in SQL Server 2008.
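That index might be declared like this (index name illustrative):
CREATE UNIQUE NONCLUSTERED INDEX IX_Products_Code_BeginDate
    ON Products (code, begin_date DESC)
    INCLUDE ([value]);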
EDIT: My original answer was using an INNER JOIN, but the questioner wanted a LEFT JOIN.
CREATE TABLE Products
(
[Code] INT NOT NULL
, [Value] VARCHAR(30) NOT NULL
, Begin_Date DATETIME NOT NULL
, End_Date DATETIME NULL
)
/*
Products
code value begin_date end_date
10905 13 2005-01-01 2016-12-31
10905 11 2017-01-01 null
*/
INSERT INTO Products ([Code], [Value], Begin_Date, End_Date) VALUES (10905, 13, '2005-01-01', '2016-12-31')
INSERT INTO Products ([Code], [Value], Begin_Date, End_Date) VALUES (10905, 11, '2017-01-01', NULL)
CREATE NONCLUSTERED INDEX SK_ProductDate ON Products ([Code], Begin_Date, End_Date) INCLUDE ([Value])
CREATE TABLE SomeTableWithProductCodes
(
[CODE] INT NOT NULL
)
INSERT INTO SomeTableWithProductCodes ([Code]) VALUES (10905)
Here is a prototypical query, with a date predicate. Note that there are more optimal ways to do this in a bulletproof fashion, using a "less than" operator on the upper bound, but that's a different discussion.
SELECT
P.[Code]
, P.[Value]
, P.[Begin_Date]
, P.[End_Date]
FROM
SomeTableWithProductCodes ST
LEFT JOIN Products AS P ON
ST.[Code] = P.[Code]
AND '2016-06-30' BETWEEN P.[Begin_Date] AND ISNULL(P.[End_Date], '9999-12-31')
This query will perform an Index Seek on the Product table.
Here is a SQL Fiddle: SQL Fiddle - Products and Dates