SQL Server advanced Pivot grouped rows to columns - sql

I need to Pivot/rotate data in rows into columns - but a little different from most examples I've seen.
We have customers that will buy things in sets (think a pizza ingredient seller... people will always buy cheese, dough, and sauce; optionally some will buy toppings, but we don't care about that).
What I need to do is sort this row data, by order date into columns. Below are two scripts to fill temp input and temp output table to show what I'm trying to achieve.
SQL Server 2008
CREATE table #myInput
(CustomerID Varchar(10), OrderDate varchar(10), Item varchar(13), ItemColor varchar(20));
CREATE table #myOUTPUT
(
CustomerID Varchar(10),
OrderDate_1 varchar(10),
PartA_1 varchar(20),
PartB_1 varchar(20),
PartC_1 varchar(20),
OrderDate_2 varchar(10),
PartA_2 varchar(20),
PartB_2 varchar(20),
PartC_2 varchar(20),
OrderDate_3 varchar(10),
PartA_3 varchar(20),
PartB_3 varchar(20),
PartC_3 varchar(20)
)
INSERT INTO #myInput
(CustomerID, OrderDate, Item, ItemColor)
VALUES
('abc','5/1/2001','PartA','Silver'),
('abc','5/1/2001','PartB','Red'),
('abc','5/1/2001','PartC','Green'),
('abc','5/20/2002','PartA','Purple'),
('abc','5/20/2002','PartB','Yellow'),
('abc','5/20/2002','PartC','Black'),
('abc','10/1/2002','PartA','Red'),
('abc','10/1/2002','PartB','Silver'),
('abc','10/1/2002','PartC','Blue'),
('def','4/1/2000','PartA','Green'),
('def','4/1/2000','PartB','Red'),
('def','4/1/2000','PartC','White'),
('jkl','5/1/2001','PartA','Black'),
('jkl','5/1/2001','PartB','Yellow'),
('jkl','5/1/2001','PartC','Silver'),
('jkl','10/10/2001','PartA','Green'),
('jkl','10/10/2001','PartB','Black'),
('jkl','10/10/2001','PartC','Silver')
;
And the result:
insert into #myOUTPUT
(CustomerID,OrderDate_1,PartA_1,PartB_1,PartC_1,OrderDate_2,PartA_2,PartB_2,PartC_2,OrderDate_3,PartA_3,PartB_3,PartC_3)
VALUES
('abc','5/1/2001','Silver','Red','Green','5/20/2002','Purple','Yellow','Black','10/1/2002','Red','Silver','Blue'),
('def','4/1/2000','Green','Red','White','','','','','','','',''),
('jkl','5/1/2001','Black','Yellow','Silver','10/10/2001','Green','Black','Silver','','','','');
select * from #myInput
select * from #myOUTPUT
We're looking for 17 or fewer orders. At least at the current moment, we don't have more than a dozen orders for any one customer.
I've tried a couple of different things; PIVOT doesn't seem to produce the output I'm looking for. I was thinking of using dense_rank to determine how many columns we'll need first, and then inserting via a cursor or a CTE, but I'm unable to get exactly the output needed. Note that the source "date" field is stored in the DB as varchar. Also, there's no order number, so uniqueness comes only from customer ID and date.

I would approach this using conditional aggregation. Because OrderDate is stored as varchar, it is converted to a date for the ordering (style 101 is mm/dd/yyyy). If I understand correctly:
select customerid,
max(case when seqnum_co = 1 then orderdate end) as orderdate_1,
max(case when seqnum_co = 1 and item = 'PartA' then itemcolor end) as parta_1,
max(case when seqnum_co = 1 and item = 'PartB' then itemcolor end) as partb_1,
max(case when seqnum_co = 1 and item = 'PartC' then itemcolor end) as partc_1,
max(case when seqnum_co = 2 then orderdate end) as orderdate_2,
max(case when seqnum_co = 2 and item = 'PartA' then itemcolor end) as parta_2,
max(case when seqnum_co = 2 and item = 'PartB' then itemcolor end) as partb_2,
max(case when seqnum_co = 2 and item = 'PartC' then itemcolor end) as partc_2,
. . .
from (select i.*,
dense_rank() over (partition by i.customerid order by convert(date, i.orderdate, 101)) as seqnum_co
from #myInput i
) i
group by customerid;
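Writing out 17 groups of four columns by hand is tedious, so the repeated expressions can be generated instead. The following is only a sketch of that idea in T-SQL, assuming the #myInput table from the question exists in the current session; the variable names (@maxOrders, @n, @s, @cols, @sql) are made up for illustration. It builds the same conditional-aggregation query as a string and runs it with sp_executesql.

```sql
-- Sketch: generate the repeated column expressions in a loop (T-SQL).
-- Assumes the #myInput temp table from the question exists in this session.
DECLARE @maxOrders int, @n int, @s nvarchar(10),
        @cols nvarchar(max), @sql nvarchar(max);
SET @maxOrders = 17;   -- upper bound mentioned in the question
SET @n = 1;
SET @cols = N'';

WHILE @n <= @maxOrders
BEGIN
    SET @s = CAST(@n AS nvarchar(10));
    -- One OrderDate column and three part columns per order sequence number
    SET @cols = @cols
        + N', max(case when seqnum_co = ' + @s + N' then orderdate end) as OrderDate_' + @s
        + N', max(case when seqnum_co = ' + @s + N' and item = ''PartA'' then itemcolor end) as PartA_' + @s
        + N', max(case when seqnum_co = ' + @s + N' and item = ''PartB'' then itemcolor end) as PartB_' + @s
        + N', max(case when seqnum_co = ' + @s + N' and item = ''PartC'' then itemcolor end) as PartC_' + @s;
    SET @n = @n + 1;
END;

SET @sql = N'select customerid' + @cols + N'
from (select i.*,
             dense_rank() over (partition by i.customerid
                                order by convert(date, i.orderdate, 101)) as seqnum_co
      from #myInput i
     ) i
group by customerid;';

EXEC sp_executesql @sql;
```

Orders a customer does not have come back as NULL rather than the empty strings shown in #myOUTPUT; wrapping each expression in ISNULL(..., '') would be one way to match that, if it matters.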

Related

Optimized SQL select for selecting across multiple tables

I have the following tables (testing with SQLite).
create table group_header (id int, minCount int, maxCount int);
create table group_items(id int, group_id int, product varchar(10), cons varchar(10));
The group_header has the following records:
id|minCount|maxCount
1|2|2
The group_items has the following records:
id|group_id|product|cons
1|1|A|optional
2|1|B|optional
3|1|C|required
4|1|D|optional
The SQL query should return group_header id that satisfies the following conditions:
Input will be one or more products (e.g. 'A' and 'C')
SQL query should check for the following criteria:
the minCount should be fulfilled for the optional products (i.e. the count of optional items in the input should be >= the minCount value)
all the required products in the group_item table should exist in the input product list
The brute-force way to do this is:
select * from group_items where group_id = 1 and (product in ('A') or cons = 'required');
and then evaluating the minCount and required conditions separately outside SQL. Any suggestions on whether this can be done in a more optimized way using SQL?
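If I understand correctly, this is aggregation with a having clause. The example uses the input list ('A', 'C'): no required product may be missing from the list, and the number of optional products found in the list must reach minCount.
select gh.id
from group_header gh left join
group_items gi
on gh.id = gi.group_id
group by gh.id, gh.minCount
having sum( case when gi.cons = 'required' and gi.product not in ('A', 'C') then 1 else 0 end) = 0 and
sum( case when gi.cons = 'optional' and gi.product in ('A', 'C') then 1 else 0 end) >= gh.minCount;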
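To see how the two conditions behave, here is a small SQLite sketch built from the sample rows in the question. The input list ('A', 'B', 'C') is an assumption chosen for illustration, and maxCount is not checked because the question does not say how it should be used.

```sql
-- SQLite sketch using the sample rows from the question.
create table group_header (id int, minCount int, maxCount int);
create table group_items (id int, group_id int, product varchar(10), cons varchar(10));

insert into group_header values (1, 2, 2);
insert into group_items values
  (1, 1, 'A', 'optional'),
  (2, 1, 'B', 'optional'),
  (3, 1, 'C', 'required'),
  (4, 1, 'D', 'optional');

-- Assumed input list: ('A', 'B', 'C'), i.e. two optional products plus the required one.
select gh.id
from group_header gh
join group_items gi on gh.id = gi.group_id
group by gh.id, gh.minCount
having
  -- no required product may be missing from the input list
  sum(case when gi.cons = 'required' and gi.product not in ('A', 'B', 'C') then 1 else 0 end) = 0
  -- enough optional products from the input list
  and sum(case when gi.cons = 'optional' and gi.product in ('A', 'B', 'C') then 1 else 0 end) >= gh.minCount;
-- returns id 1
```

With the input ('A', 'C') from the question, only one optional product matches, so minCount = 2 is not reached and no row is returned.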

Converting multiple rows into one row by ID

What I am trying to achieve is to group the rows by id and create a column for the date as well as for the data.
For background, the dataset holds lab results taken by participants, and some tests cannot be taken on the same day because of fasting restrictions and so on. The database I am using is SQL Server.
Below are my dataset and the desired output.
Sample dataset:
create table Sample
(
Id int,
LAB_DATE date,
A_CRE_1 varchar(100),
B_GLUH_1 varchar(100),
C_LDL_1 varchar(100),
D_TG_1 varchar(100),
E_CHOL_1 varchar(100),
F_HDL_1 varchar(100),
G_CRPH_1 varchar(100),
H_HBA1C_1 varchar(100),
I_GLU120_1 varchar(100),
J_GLUF_1 varchar(100),
K_HCR_1 varchar(100)
)
insert into Sample(Id, LAB_DATE,A_CRE_1, B_GLUH_1,C_LDL_1,E_CHOL_1,F_HDL_1,H_HBA1C_1,K_HCR_1)
values (01, '2017-11-21', '74', '6.4', '2.04', '4.17', '1.64', '6.1', '2.54')
insert into sample (Id, LAB_DATE, I_GLU120_1)
values (01, '2017-11-22','8.8')
insert into sample (Id, LAB_DATE, D_TG_1)
values (01, '2017-11-23','0.56')
insert into sample (Id,LAB_DATE,A_CRE_1,B_GLUH_1,C_LDL_1,D_TG_1,E_CHOL_1,F_HDL_1,K_HCR_1)
values (2,'2018-10-02','57','8.91','2.43','1.28','3.99','1.25','3.19')
insert into sample (Id,LAB_DATE,H_HBA1C_1)
values (2,'2018-10-03','8.6')
insert into sample (Id,LAB_DATE,J_GLUF_1)
values (2,'2018-10-04','7.8')
insert into sample (Id,LAB_DATE,A_CRE_1,B_GLUH_1,C_LDL_1,D_TG_1,E_CHOL_1,F_HDL_1,G_CRPH_1,H_HBA1C_1,K_HCR_1)
values (3,'2016-10-01','100','6.13','3.28','0.94','5.07','1.19','0.27','5.8','4.26')
Desired output:
ID|LAB_DATE|A_CRE_1|B_GLUH_1|C_LDL_1|Date_TG_1|D_TG_1|E_CHOL_1|F_HDL_1|G_CRPH_1|H_HBA1C_1|Date_GLU120_1|I_GLU120_1|J_GLUF_1|K_HCR_1
1|2017-11-21|74|6.4|2.04|2017-11-23|0.56|4.17|1.64|||6.1|2017-11-22|8.8|||2.54
2|02/10/2018|57|8.91|2.43||1.28|3.99|1.25||03/10/2018|8.6|||04/10/2018|7.8|3.19
3|01/10/2016|100|6.13|3.28||0.94|5.07|1.19|0.27||5.8|||||4.26
Here's a solution (that cannot cope with multiple rows of the same id/sample type - you haven't said what to do with those)
select * from
-- each subquery keeps only the rows that actually carry its sample type
(select Id, LAB_DATE,A_CRE_1, B_GLUH_1,C_LDL_1,E_CHOL_1,F_HDL_1,H_HBA1C_1,K_HCR_1 from sample where A_CRE_1 is not null) s1
INNER JOIN
(select Id, LAB_DATE as glu120date, I_GLU120_1 from sample where I_GLU120_1 is not null) s2
ON s1.id = s2.id
INNER JOIN
(select Id, LAB_DATE as dtgdate, D_TG_1 from sample where D_TG_1 is not null) s3
ON s1.id = s3.id
Hopefully you get the idea with this pattern; if you have other sample types with their own dates, break them out of s1 and into their own subquery in a similar way (e.g. make an s4 for e_chol_1, an s5 for k_hcr_1, etc.). Note that if any sample type is missing, the whole row disappears from the results. If this is not desired and you can accept NULL for missing samples, use LEFT JOIN instead of INNER JOIN.
If there will be multiple samples for patient 01 and you only want the latest, the pattern becomes:
select * from
(select Id, LAB_DATE,A_CRE_1, B_GLUH_1,C_LDL_1,E_CHOL_1,F_HDL_1,H_HBA1C_1,K_HCR_1,
row_number() over(partition by id order by lab_date desc) rn
from sample
where A_CRE_1 is not null) s1
INNER JOIN
(select Id, LAB_DATE as glu120date, I_GLU120_1,
row_number() over(partition by id order by lab_date desc) rn
from sample
where I_GLU120_1 is not null) s2
ON s1.id = s2.id and s1.rn = s2.rn
WHERE
s1.rn = 1
Note the addition of row_number() over(partition by id order by lab_date desc) rn - this establishes an incrementing counter in descending date order (latest record = 1, older = 2 ...) that restarts from 1 for every different id. Each subquery numbers only the rows that carry its own sample type. We join on rn too, then say where rn = 1 to pick only the latest record for each sample type.
As @Ben suggested, you can group by id and take the min of every column, like this:
DECLARE @Sample as table (
Id int,
LAB_DATE date,
A_CRE_1 varchar(100),
B_GLUH_1 varchar(100),
C_LDL_1 varchar(100),
D_TG_1 varchar(100),
E_CHOL_1 varchar(100),
F_HDL_1 varchar(100),
G_CRPH_1 varchar(100),
H_HBA1C_1 varchar(100),
I_GLU120_1 varchar(100),
J_GLUF_1 varchar(100),
K_HCR_1 varchar(100))
insert into @Sample(Id, LAB_DATE,A_CRE_1,
B_GLUH_1,C_LDL_1,E_CHOL_1,F_HDL_1,H_HBA1C_1,K_HCR_1)
values (01,'2017-11-21','74','6.4','2.04','4.17','1.64','6.1','2.54')
insert into @Sample (Id, LAB_DATE, I_GLU120_1)
values (01, '2017-11-22','8.8')
insert into @Sample (Id, LAB_DATE, D_TG_1)
values (01, '2017-11-23','0.56')
SELECT s.Id
, MIN(s.LAB_DATE) AS LAB_DATE
, MIN(s.A_CRE_1) AS A_CRE_1
, MIN(s.B_GLUH_1) AS B_GLUH_1
, MIN(s.C_LDL_1) AS C_LDL_1
, MIN(s.D_TG_1) AS D_TG_1
, MIN(s.E_CHOL_1) AS E_CHOL_1
, MIN(s.F_HDL_1) AS F_HDL_1
, MIN(s.G_CRPH_1) AS G_CRPH_1
, MIN(s.H_HBA1C_1) AS H_HBA1C_1
, MIN(s.I_GLU120_1) AS I_GLU120_1
, MIN(s.J_GLUF_1) AS J_GLUF_1
, MIN(s.K_HCR_1) AS K_HCR_1
FROM @Sample AS s
GROUP BY s.Id
You can also look at the SQL Server STUFF function. The link below may help:
https://www.mssqltips.com/sqlservertip/2914/rolling-up-multiple-rows-into-a-single-row-and-column-for-sql-server-data/
Following on from my comments about presenting the original data, here's what I think you should do (taking the query you commented)
SELECT
ID,
MAX(CASE WHEN TestID='1' THEN Results END) [Test_1],
MAX(CASE WHEN TestID='2' THEN Results END) [Test_2],
MAX(CASE WHEN TestID='1' THEN Result_Date_Time END) Test12Date,
MAX(CASE WHEN TestID='3' THEN Results END) [Test_3],
MAX(CASE WHEN TestID='3' THEN Result_Date_Time END) Test3Date
FROM [tbBloodSample]
GROUP BY ID
ORDER BY ID
Notes: If TestID is an int, don't use strings like '1' in your query, use ints. You don't need an ELSE NULL in a case- null is the default if the when didn't work out
Here is a query pattern. Test1 and 2 are always done on the same day, hence why I only pivot their date once. Test 3 might be done later, might be same, this means the dates in test12date and test3date might be same, might be different
Convert the strings to dates after you do the pivot, to reduce the number of conversions
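As a rough sketch of that last point, the pivot can be wrapped in a derived table and the conversion applied once per output column on the already-aggregated rows. This assumes Result_Date_Time is stored as a string in a format that CONVERT can parse with the server's default settings; if it is not, a style argument would be needed, and which one depends on the stored format. The alias p is made up for illustration.

```sql
SELECT p.ID,
       p.Test_1,
       p.Test_2,
       CONVERT(datetime, p.Test12Date) AS Test12Date,  -- converted once per ID, after pivoting
       p.Test_3,
       CONVERT(datetime, p.Test3Date) AS Test3Date
FROM (
    -- the pivot itself still works on the raw string values
    SELECT ID,
           MAX(CASE WHEN TestID = '1' THEN Results END) AS Test_1,
           MAX(CASE WHEN TestID = '2' THEN Results END) AS Test_2,
           MAX(CASE WHEN TestID = '1' THEN Result_Date_Time END) AS Test12Date,
           MAX(CASE WHEN TestID = '3' THEN Results END) AS Test_3,
           MAX(CASE WHEN TestID = '3' THEN Result_Date_Time END) AS Test3Date
    FROM [tbBloodSample]
    GROUP BY ID
) p
ORDER BY p.ID
```

One caveat: MAX over date strings compares them as text, so if an ID can have several rows for the same test, the "latest" string is only the latest date when the stored format sorts chronologically (for example yyyy-mm-dd).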

Query to determine cumulative changes to records

Given the following table containing the example rows, I’m looking for a query to give me the aggregate results of changes made to the same record. All changes are made against a base record in another table (results table), so the contents of the results table are not cumulative.
Base Records (from which all changes are made)
Edited Columns highlighted
I’m looking for a query that would give me the cumulative changes (in order by date). This would be the resulting rows:
Any help appreciated!
UPDATE---------------
Let me offer some clarification. The records being edited exist in one table, let's call that [dbo].[Base]. When a person updates a record from [dbo].[Base], his updates go into [dbo].[Updates]. Therefore, a person is always editing from the base table.
At some point, let's say once a day, we need to calculate the sum of changes with the following rule:
For any given record, determine the latest change for each column and take the latest change. If no change was made to a column, take the value from [dbo].[Base]. So, one way of looking at the [dbo].[Updates] table would be to see only the changed columns.
Please let's not discuss the merits of this approach, I realize it's strange. I just need to figure out how to determine the final state of each record.
Thanks!
This is dirty, but you can give this a shot (test here: https://rextester.com/MKSBU15593)
I use a CTE to do an initial CROSS JOIN of the Base and Updates tables and then a second CTE to filter it to only the rows where the IDs match. From there I use FIRST_VALUE() for each column, partitioned by the ID value and ordered by a CASE expression (1 if the Base column value matches the Updates column value, else 0) and then by the DATEMODIFIED column descending, to get the most recent changed version of each column.
It spits out
CREATE TABLE Base
(
ID INT
,FNAME VARCHAR(100)
,LNAME VARCHAR(100)
,ADDRESS VARCHAR(100)
,RATING INT
,[TYPE] VARCHAR(5)
,SUBTYPE VARCHAR(5)
);
INSERT INTO dbo.Base
VALUES
( 100,'John','Doe','123 First',3,'Emp','W2'),
( 200,'Jane','Smith','Wacker Dr.',2,'Emp','W2');
CREATE TABLE Updates
(
ID INT
,DATEMODIFIED DATE
,FNAME VARCHAR(100)
,LNAME VARCHAR(100)
,ADDRESS VARCHAR(100)
,RATING INT
,[TYPE] VARCHAR(5)
,SUBTYPE VARCHAR(5)
);
INSERT INTO dbo.Updates
VALUES
( 100,'1/15/2019','John','Doe','123 First St.',3,'Emp','W2'),
( 200,'1/15/2019','Jane','Smyth','Wacker Dr.',2,'Emp','W2'),
( 100,'1/17/2019','Johnny','Doe','123 First',3,'Emp','W2'),
( 200,'1/19/2019','Jane','Smith','2 Wacker Dr.',2,'Emp','W2'),
( 100,'1/20/2019','Jon','Doe','123 First',3,'Cont','W2');
WITH merged AS
(
SELECT b.ID AS IDOrigin
,'1/1/1900' AS DATEMODIFIEDOrigin
,b.FNAME AS FNAMEOrigin
,b.LNAME AS LNAMEOrigin
,b.ADDRESS AS ADDRESSOrigin
,b.RATING AS RATINGOrigin
,b.[TYPE] AS TYPEOrigin
,b.SUBTYPE AS SUBTYPEOrigin
,u.*
FROM base b
CROSS JOIN
dbo.Updates u
), filtered AS
(
SELECT *
FROM merged
WHERE IDOrigin = ID
)
SELECT distinct
ID
,FNAME = FIRST_VALUE(FNAME) OVER (PARTITION BY ID ORDER BY CASE WHEN FNAME = FNAMEOrigin THEN 1 ELSE 0 end, datemodified desc)
,LNAME = FIRST_VALUE(LNAME) OVER (PARTITION BY ID ORDER BY CASE WHEN LNAME = LNAMEOrigin THEN 1 ELSE 0 end, datemodified desc)
,ADDRESS = FIRST_VALUE(ADDRESS) OVER (PARTITION BY ID ORDER BY CASE WHEN ADDRESS = ADDRESSOrigin THEN 1 ELSE 0 end, datemodified desc)
,RATING = FIRST_VALUE(RATING) OVER (PARTITION BY ID ORDER BY CASE WHEN RATING = RATINGOrigin THEN 1 ELSE 0 end, datemodified desc)
,[TYPE] = FIRST_VALUE([TYPE]) OVER (PARTITION BY ID ORDER BY CASE WHEN [TYPE] = TYPEOrigin THEN 1 ELSE 0 end, datemodified desc)
,SUBTYPE = FIRST_VALUE(SUBTYPE) OVER (PARTITION BY ID ORDER BY CASE WHEN SUBTYPE = SUBTYPEOrigin THEN 1 ELSE 0 end, datemodified desc)
FROM filtered
Don't you just want the last record?
select e.*
from edited e
where e.datemodified = (select max(e2.datemodified)
from edited e2
where e2.id = e.id
);

UNPIVOT/PIVOT to combine multiple records into one

Currently, I have a table that tracks product inventory locations using the following table:
ProductID(PK) Location(PK) BIN1 BIN2
1000 EAST XRZY CCAB
1000 WEST AAAA NULL
I'm attempting to UNPIVOT the data into the following:
ProductID EAST_BIN1 EAST_BIN2 WEST_BIN1 WEST_BIN2
1000 XRZY CCAB AAAA NULL
Note that the location column has been PIVOTed into part of the BIN value field.
However, I've found that if I pivot the data, I'm unable to combine it with the BIN fields. PIVOT simply aggregates (using MAX) the BIN values into one field, while UNPIVOT just transforms the BIN* fields into rows.
What am I missing in terms of transforming the data above?
Any help would be greatly appreciated!
You can do it "by hand" as follows:
SELECT ProductID,
MAX(CASE WHEN Location='EAST' THEN BIN1 ELSE NULL END) AS EAST_BIN1,
MAX(CASE WHEN Location='EAST' THEN BIN2 ELSE NULL END) AS EAST_BIN2,
MAX(CASE WHEN Location='WEST' THEN BIN1 ELSE NULL END) AS WEST_BIN1,
MAX(CASE WHEN Location='WEST' THEN BIN2 ELSE NULL END) AS WEST_BIN2
FROM YOURTABLE
GROUP BY ProductID
This creates multiple rows (as your source table) with the results in the correct column, then smashes them down to one row with a group by. The correct value is taken using the aggregate function MAX.
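To make the two steps visible, here is a small sketch that runs only the first step (the CASE expressions without the GROUP BY) on the two sample rows from the question. The table variable @inv is a stand-in name for the real table.

```sql
DECLARE @inv table (ProductID int, Location varchar(20), BIN1 varchar(4), BIN2 varchar(4));
INSERT INTO @inv VALUES (1000, 'EAST', 'XRZY', 'CCAB'), (1000, 'WEST', 'AAAA', NULL);

-- Step 1 only: each source row fills the columns for its own location
-- and leaves the other location's columns NULL.
SELECT ProductID,
       CASE WHEN Location = 'EAST' THEN BIN1 END AS EAST_BIN1,
       CASE WHEN Location = 'EAST' THEN BIN2 END AS EAST_BIN2,
       CASE WHEN Location = 'WEST' THEN BIN1 END AS WEST_BIN1,
       CASE WHEN Location = 'WEST' THEN BIN2 END AS WEST_BIN2
FROM @inv;
-- EAST row: 1000  XRZY  CCAB  NULL  NULL
-- WEST row: 1000  NULL  NULL  AAAA  NULL
```

MAX ignores NULLs, so grouping these two rows by ProductID keeps the single non-NULL value in each column; WEST_BIN2 stays NULL because no row supplies a value for it.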
To pivot you'll need to get the data into a single BIN column to pivot on. Consider this...
declare @t table (ProductId int, Location varchar(20), BIN1 varchar(4), BIN2 varchar(4));
insert into @t values(1000, 'EAST', 'XRZY', 'CCAB'), (1000, 'WEST', 'AAAA', null);
with cte as (
select ProductId, Col = Location + '_BIN1', BINVal = Bin1 from @t
union all
select ProductId, Col = Location + '_BIN2', BINVal = Bin2 from @t
)
select
*
from
cte
pivot (
max(BINVal)
for Col in ([EAST_BIN1], [EAST_BIN2], [WEST_BIN1], [WEST_BIN2])
) p

How to construct a list of products recursively in SQL?

Given the following type table,
create table products (
productid varchar(10),
make varchar(10),
age varchar(10),
colour varchar(10),
category1 varchar(10),
category2 varchar(10),
caregory3 varchar(10)
)
I would like to select a list of products (all the fields) but there should be only one product per make. The product that gets selected for each make should be determined by applying a set of rules in order. For example,
- If there exists a red product belonging to a particular make, select this product.
- For all makes not yet represented, select a product that is less than two years old.
- For all makes not yet represented, select a product that has a category 1 value of x.
- etc.
You can do this by maintaining a memory/temporary table of selected products, and inserting additional products into this table only when it does not yet contain a product of the same make. By applying one insert ... select per rule, in order, the memory/temporary table is filled up. E.g.
insert into #temp
select productid, make, age, colour, category1, category2, caregory3
from products a
where *rule applies*
and not exists (select 1 from #temp t where t.make = a.make)
This does not seem very elegant however.
Note: This is a simplification of the actual problem. In the actual problem there can only be one valid product on each selection level.
I think this will work for you:
SELECT * FROM
(
SELECT *, RANK() OVER (PARTITION BY p.make ORDER BY
CASE
WHEN colour = 'red' THEN 1
WHEN age < '2' THEN 2
WHEN category1 = 'x' THEN 3
ELSE 4
END) as priority
FROM products p
) ranked
WHERE priority = 1
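One thing to watch with RANK() is ties: if several products of a make land on the same rule, they all get priority 1 and all are returned. A possible variation, sketched below, uses ROW_NUMBER() with a tie-breaker so that exactly one row per make comes back. Two assumptions here are mine and not from the question: age holds a whole number of years stored as text (so it can be cast to int), and productid is an acceptable arbitrary tie-breaker.

```sql
SELECT productid, make, age, colour, category1, category2, caregory3
FROM
(
    SELECT p.*,
           ROW_NUMBER() OVER (PARTITION BY p.make
                              ORDER BY CASE
                                           WHEN p.colour = 'red' THEN 1
                                           WHEN CAST(p.age AS int) < 2 THEN 2  -- assumes age is numeric text
                                           WHEN p.category1 = 'x' THEN 3
                                           ELSE 4
                                       END,
                                       p.productid                             -- assumed tie-breaker
                             ) AS rn
    FROM products p
) ranked
WHERE rn = 1
```

The column name caregory3 is spelled as in the question's table definition. Note that the ELSE 4 branch means a make with no product matching any rule still gets one row; filtering out priority 4 would be needed if such makes should be excluded.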
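Cursors could be used for this, in conjunction with a table variable to store selected products. Only the first rule (red products) is shown:
DECLARE @selected_products table(
productid varchar(10),
make varchar(10),
age varchar(10),
colour varchar(10),
category1 varchar(10),
category2 varchar(10),
caregory3 varchar(10)
)
DECLARE @current_make varchar(10)
DECLARE @red_product_count int
DECLARE make_cursor CURSOR
FOR SELECT DISTINCT make FROM products
OPEN make_cursor
FETCH NEXT FROM make_cursor INTO @current_make
WHILE @@FETCH_STATUS = 0
BEGIN
-- Rule 1: does the current make have a red product?
SELECT @red_product_count = COUNT(productid)
FROM products
WHERE colour = 'red' AND make = @current_make
IF @red_product_count > 0
BEGIN
-- a table variable cannot be the target of SELECT ... INTO, so use INSERT ... SELECT
INSERT INTO @selected_products
SELECT TOP 1 productid, make, age, colour, category1, category2, caregory3
FROM products
WHERE colour = 'red' AND make = @current_make
ORDER BY some_field -- placeholder: replace with the column that decides which product wins
END
-- further rules would go here as ELSE branches
-- advance the cursor, otherwise the loop never ends
FETCH NEXT FROM make_cursor INTO @current_make
END
CLOSE make_cursor
DEALLOCATE make_cursor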
The documentation for cursors can be found at: http://technet.microsoft.com/en-us/library/ms180169.aspx
You can then use additional flow-control logic to apply your rules, and even nest cursors to get the desired results. There are more examples in the documentation.