Counting and grouping challenge in a pivot table with T-SQL - sql

I have a pivot table that converts a vertical database design to a horizontal one:
The source table:
Id ParentId Property Value
---------------------------------
1 1 Date 01-09-2015
2 1 CountValue 2
3 1 TypeA Value1
4 1 TypeB Value2
5 1 TypeC Value2
6 2 Date 15-10-2015
7 2 CountValue 3
8 2 TypeA Value3
9 2 TypeB Value22
10 2 TypeC Value99
After pivoting this looks like:
ParentId Date CountValue TypeA TypeB TypeC
----------------------------------------------------------
1 01-09-2015 2 Value1 Value2 Value2
2 15-10-2015 3 Value3 Value22 Value99
Then, there's a look-up table for valid values in columns TypeA, TypeB and TypeC:
Id Name Value
-----------------
1 TypeA Value1
2 TypeA Value2
3 TypeA Value3
4 TypeB Value20
5 TypeB Value21
6 TypeB Value22
7 TypeC Value1
8 TypeC Value2
So, given the above structure I'm looking for a way to query the pivot table in a way that I'll get a count of all invalid values in TypeA, TypeB and TypeC where Date is a valid date and CountValue is not empty and greater than 0.
How can I achieve a result that is expected and outputted like below:
Count Column
--------------
0 TypeA
1 TypeB
1 TypeC
I've accomplished the result by creating three several queries and glue the results using UNION, but I think it should also be possible using the pivot table, but I'm unsure how. Can the desired result be realized using the pivot table?
Note: the database used is a SQL Server 2005 database.

I would not approach this a PIVOT, otherwise you have to pivot your data, then unpivot it to get the output required. Breaking it down step by step you can get your valid parent IDs using this:
SELECT t.ParentID
FROM #T AS t
GROUP BY t.ParentID
HAVING ISDATE(MAX(CASE WHEN t.Property = 'Date' THEN t.Value END)) = 1
AND MAX(CASE WHEN t.Property = 'CountValue' THEN CONVERT(INT, t.Value) END) > 0;
The two having clauses limit this to your criteria of having a valid date, and a CountValue that is greater than 0
The next step would be to find your invalid properties:
SELECT t.*
FROM #T AS t
WHERE NOT EXISTS
( SELECT 1
FROM #V AS v
WHERE v.Name = t.Property
AND v.Value = t.Value
);
This will include Date, and CountValue, and also won't include TypeA because all the properties are valid, so a bit more work is required, we must find the distinct properties we are interested in:
SELECT DISTINCT Name
FROM #V
Now we can combine this with the invalid properties to get the count, and with the valid parent IDs to get the desired result:
WITH ValidParents AS
( SELECT t.ParentID
FROM #T AS t
GROUP BY t.ParentID
HAVING ISDATE(MAX(CASE WHEN t.Property = 'Date' THEN t.Value END)) = 1
AND MAX(CASE WHEN t.Property = 'CountValue' THEN CONVERT(INT, t.Value) END) > 0
), InvalidProperties AS
( SELECT t.Property
FROM #T AS t
WHERE t.ParentID IN (SELECT vp.ParentID FROM ValidParents AS vp)
AND NOT EXISTS
( SELECT 1
FROM #V AS v
WHERE v.Name = t.Property
AND v.Value = t.Value
)
)
SELECT [Count] = COUNT(t.Property),
[Column] = v.Name
FROM (SELECT DISTINCT Name FROM #V) AS V
LEFT JOIN InvalidProperties AS t
ON t.Property = v.Name
GROUP BY v.Name;
Which gives:
Count Column
--------------
0 TypeA
1 TypeB
1 TypeC
SCHEMA FOR ABOVE QUERIES
For SQL Server 2008+. Apologies, I don't have SQL Server 2005 anymore, and forgot it doesn't support table value constructors.
CREATE TABLE #T (Id INT, ParentId INT, Property VARCHAR(10), Value VARCHAR(10));
INSERT #T (Id, ParentId, Property, Value)
VALUES
(1, 1, 'Date', '01-09-2015'), (2, 1, 'CountValue', '2'), (3, 1, 'TypeA', 'Value1'),
(4, 1, 'TypeB', 'Value2'), (5, 1, 'TypeC', 'Value2'), (6, 2, 'Date', '15-10-2015'),
(7, 2, 'CountValue', '3'), (8, 2, 'TypeA', 'Value3'), (9, 2, 'TypeB', 'Value22'),
(10, 2, 'TypeC', 'Value99');
CREATE TABLE #V (ID INT, Name VARCHAR(5), Value VARCHAR(7));
INSERT #V (Id, Name, Value)
VALUES
(1, 'TypeA', 'Value1'), (2, 'TypeA', 'Value2'), (3, 'TypeA', 'Value3'),
(4, 'TypeB', 'Value20'), (5, 'TypeB', 'Value21'), (6, 'TypeB', 'Value22'),
(7, 'TypeC', 'Value1'), (8, 'TypeC', 'Value2');

Final result without PIVOT:
SELECT [count] = SUM(CASE WHEN l.id IS NULL THEN 1 ELSE 0 END)
,t.Property
FROM #lookup l
RIGHT JOIN #tab t
ON t.Property = l.Name
AND t.Value = l.Value
WHERE t.Property LIKE 'Type%'
GROUP BY t.Property;
LiveDemo
Data:
CREATE TABLE #tab(
Id INTEGER NOT NULL PRIMARY KEY
,ParentId INTEGER NOT NULL
,Property VARCHAR(10) NOT NULL
,Value VARCHAR(10) NOT NULL
);
INSERT INTO #tab(Id,ParentId,Property,Value) VALUES (1,1,'Date','01-09-2015');
INSERT INTO #tab(Id,ParentId,Property,Value) VALUES (2,1,'CountValue','2');
INSERT INTO #tab(Id,ParentId,Property,Value) VALUES (3,1,'TypeA','Value1');
INSERT INTO #tab(Id,ParentId,Property,Value) VALUES (4,1,'TypeB','Value2');
INSERT INTO #tab(Id,ParentId,Property,Value) VALUES (5,1,'TypeC','Value2');
INSERT INTO #tab(Id,ParentId,Property,Value) VALUES (6,2,'Date','15-10-2015');
INSERT INTO #tab(Id,ParentId,Property,Value) VALUES (7,2,'CountValue','3');
INSERT INTO #tab(Id,ParentId,Property,Value) VALUES (8,2,'TypeA','Value3');
INSERT INTO #tab(Id,ParentId,Property,Value) VALUES (9,2,'TypeB','Value22');
INSERT INTO #tab(Id,ParentId,Property,Value) VALUES (10,2,'TypeC','Value99');
CREATE TABLE #lookup(
Id INTEGER NOT NULL PRIMARY KEY
,Name VARCHAR(5) NOT NULL
,Value VARCHAR(7) NOT NULL
);
INSERT INTO #lookup(Id,Name,Value) VALUES (1,'TypeA','Value1');
INSERT INTO #lookup(Id,Name,Value) VALUES (2,'TypeA','Value2');
INSERT INTO #lookup(Id,Name,Value) VALUES (3,'TypeA','Value3');
INSERT INTO #lookup(Id,Name,Value) VALUES (4,'TypeB','Value20');
INSERT INTO #lookup(Id,Name,Value) VALUES (5,'TypeB','Value21');
INSERT INTO #lookup(Id,Name,Value) VALUES (6,'TypeB','Value22');
INSERT INTO #lookup(Id,Name,Value) VALUES (7,'TypeC','Value1');
INSERT INTO #lookup(Id,Name,Value) VALUES (8,'TypeC','Value2');
EDIT:
Adding more criteria:
LiveDemo2
SELECT [count] = SUM(CASE WHEN l.id IS NULL THEN 1 ELSE 0 END)
,t.Property
FROM #lookup l
RIGHT JOIN #tab t
ON t.Property = l.Name
AND t.Value = l.Value
WHERE t.Property LIKE 'Type%'
AND t.ParentId IN (SELECT ParentId FROM #tab WHERE Property = 'Date' AND ISDATE(VALUE) = 1)
AND t.ParentID IN (SELECT ParentId FROM #tab WHERE Property = 'CountValue' AND Value > 0)
GROUP BY t.Property;

Related

SQL Group By specific column with nullable

Let's say I have this data in my table A:
group_id
type
active
1
A
true
1
B
false
1
C
true
2
null
false
3
B
true
3
C
false
I want to create a query which return the A row if exists (without the type column), else return a row with active false.
For this specific table the result will be:
group_id
active
1
true
2
false
3
false
How can I do this ?
I'm assuming I have to use a GROUP BY but I can't find a way to do it.
Thank you
This is a classic row_number problem, generate a row number based on your ordering criteria, then select just the first row in each grouping.
declare #MyTable table (group_id int, [type] char(1), active bit);
insert into #MyTable (group_id, [type], active)
values
(1, 'A', 1),
(1, 'B', 0),
(1, 'C', 1),
(2, null, 0),
(3, 'B', 1),
(3, 'C', 0);
with cte as (
select *
, row_number() over (
partition by group_id
order by case when [type] = 'A' then 1 else 0 end desc, active asc
) rn
from #MyTable
)
select group_id, active
from cte
where rn = 1
order by group_id;
Returns:
group_id
active
1
1
2
0
3
0
Note: Providing the DDL+DML as I have shown makes it much easier for people to assist.
This should do it. We select all the distinct group_ids and then join our table back to that. There is an ISNULL function that will insert the 'false' when 'A' type records are not found.
DECLARE #tableA TABLE (
group_id int
, [type] nchar(1)
, active nvarchar(10)
);
INSERT INTO #tableA (group_id, [type], active)
VALUES
(1, 'A', 'true')
, (1,'B','false')
, (1,'C', 'false')
, (2, null, 'false')
, (3, 'B', 'true')
, (3, 'C', 'false')
;
SELECT
gid.group_id
, ISNULL(a.active,'false') as active
FROM (SELECT DISTINCT group_id FROM #tableA) as gid
LEFT OUTER JOIN #tableA as a
ON a.group_id = gid.group_id
AND a.type = 'A'

SQL Server query to extract all rows

I've two database tables, one called "Headers" and one called "Rows".
The structure is:
Header: IDPK | Description
Row: IDPK | IDPK_Header | Item_ID | Qty
I need to do a query that says: "From a Header, IDPK find another header that have the same number of rows and the same item ID and quantity".
For example:
Header Rows
IDPK Description IDPK Item_ID Qty
1 'Test1' 1 'A' 10
1 'Test1' 2 'B' 20
2 'Test2' 3 'A' 10
2 'Test2' 4 'B' 20
3 'Test3' 5 'A' 5
3 'Test3' 6 'B' 20
4 'Test4' 7 'A' 10
Header Test1 match Test2 but not Test3 and Test4
The problem is that the number of rows must be exactly the same. I try with ALL operator but without luck.
How I can do the query with an eye for the performance? The two tables can be very huge (~500.000 records).
Assuming there are no duplicates:
with r as (
select r.*, count(*) over (partition by idpk_header) as num_items
from rows r
)
select r1.idpk_header, r2.idpk_header
from r r1 join
r r2
on r1.item_id = r1.item_id and r2.qty = r1.qty and r2.num_items = r1.num_items
group by r1.idpk_header, r2.idpk_header, r1.num_items
having count(*) = r1.num_items;
Basically, this does a self-join on the items, so you only get matches. The on validates that the two have the same number of items. And the having guarantees that all match.
Note: This version returns each match of the header to itself. That is a nice check. You can of course filter this out in the on or a where clause.
If you do have duplicate items, you can simply replace r with:
select idpk_header, item_id, sum(qty) as qty,
count(*) over (partition by idpk_header) as num_items
from rows r
group by idpk_header, item_id;
I woul suggest using a forxml query in order to create the list of items per IDPK. Next I would search for matching item lists and quantities. See following example:
DECLARE #Headers TABLE(
IDPK INT,
Description NVARCHAR(100)
)
DECLARE #Rows TABLE(
IDPK INT,
ITEMID NVARCHAR(1),
Qty INT
)
INSERT INTO #Headers VALUES
(1, 'Test1'),
(2, 'Test2'),
(3, 'Test3'),
(4, 'Test4'),
(5, 'Test5')
INSERT INTO #Rows VALUES
(1, 'A', 10),
(1, 'B', 20),
(2, 'A', 10),
(2, 'B', 20),
(3, 'A', 5 ),
(3, 'B', 20),
(4, 'C', 10),
(5, 'A', 10),
(5, 'C', 20)
;
WITH cteHeaderRows AS(
SELECT IDPK
,ItemIDs=STUFF(
(
SELECT ',' + CAST(ITEMID AS VARCHAR(MAX))
FROM #Rows t2
WHERE t2.IDPK = t1.IDPK
ORDER BY ITEMID, QTY
FOR XML PATH('')
),1,1,''
)
,Qtys=STUFF(
(
SELECT ',' + CAST(Qty AS VARCHAR(MAX))
FROM #Rows t2
WHERE t2.IDPK = t1.IDPK
ORDER BY ITEMID, QTY
FOR XML PATH('')
),1,1,''
)
FROM #Rows t1
GROUP BY IDPK
),
cteFilter AS(
SELECT h1.IDPK AS IDPK1, h2.IDPK AS IDPK2
FROM cteHeaderRows h1
JOIN cteHeaderRows h2 ON h1.IDPK != h2.IDPK AND h1.ItemIDs = h2.ItemIDs AND h2.Qtys = h1.Qtys
)
SELECT DISTINCT h.IDPK, h.Description, r.ItemID, r.Qty
FROM #Headers h
JOIN cteFilter f ON f.IDPK1 = h.IDPK
JOIN #Rows r ON r.IDPK = f.IDPK1
ORDER BY 1,3,4

Table transformation logic in SQL

I have a table T :
CREATE TABLE T
(
id INT,
type VARCHAR(200),
type_value VARCHAR(10),
value VARCHAR(200)
);
INSERT INTO T VALUES (1, 'RoomColor', 'room1', 'yellow');
INSERT INTO T VALUES (1, 'RoomColor', 'room2', 'red');
INSERT INTO T VALUES (2, 'RoomColor', 'room1', 'blue');
INSERT INTO T VALUES (2, 'RoomColor', 'room1', 'pink');
INSERT INTO T VALUES (3, 'RoomColor', 'room1', 'white');
INSERT INTO T VALUES (3, 'RoomColor', 'room2', 'grey');
INSERT INTO T VALUES (3, 'RoomColor', 'room2', 'brown');
INSERT INTO T VALUES (4, 'RoomColor', 'room3', 'green');
I need to transform it into :
id BedRoomColor DiningRoomColor
-------------------------------------------
1 yellow red
2 blue pink
3 white grey
4 green null
Logic behind the transformation:
If there are more than two room type_value then discard the third room type_value
For same id if there are more than one room type_value ( for example room1,room1 or room2,room2 or room1,room2) then use first type_value to create as BedRoomColor and second type_value to create DiningRoomColor
If there is only 1 room type_value (for eg. room1 or room2 or room3) for an id then corresponding value ( red,green,yellow etc ) will be placed in BedRoomColor and DiningRoomColor will be null
I am struggling with this logic for couple of days. Can anyone please help me.
Thanks
You can use this script
;WITH CTE AS (
SELECT *, RN = ROW_NUMBER() OVER(PARTITION BY id ORDER BY type_value) FROM T
)
SELECT id, [1] BedRoomColor, [2] DiningRoomColor FROM
(SELECT id,value, RN FROM CTE ) SRC
PIVOT (MAX(value) FOR RN IN ([1], [2]) ) AS PVT
Result:
id BedRoomColor DiningRoomColor
----------- ------------------ ---------------
1 yellow red
2 blue pink
3 white grey
4 green NULL
Another way and with adding type to query is:
;with tt as (
select *,
row_number() over (partition by [type], id order by type_value) rn
-- ^^^^^^ I add type to support other types if there is
from t
)
select id,
max(case when [type] = 'RoomColor' and rn = 1 then [value] end) 'BedRoomColor',
max(case when [type] = 'RoomColor' and rn = 2 then [value] end) 'DiningRoomColor'
from tt
group by id;
SQL Server Fiddle Demo
try this:
with tmp as (
select T.*, rownumber() over(patition by id order by type_value) rang
from T
)
select f1.id, f1.value as BedRoomColor, f2.value as DiningRoomColor
from tmp f1
left outer join tmp f2 on f1.id=f2.id and f2.rang=2
where f1.rang=1

T-SQL Summation

I'm trying to create result set with 3 columns. Each column coming from the summation of 1 Column of Table A but grouped by different ID's. Here's an overview of what I wanted to do..
Table A
ID Val.1
1 4
1 5
1 6
2 7
2 8
2 9
3 10
3 11
3 12
I wanted to create something like..
ROW SUM.VAL.1 SUM.VAL.2 SUM.VAL.3
1 15 21 33
I understand that I can not get this using UNION, I was thinking of using CTE but not quite sure with the logic.
You need conditional Aggregation
select 1 as Row,
sum(case when ID = 1 then Val.1 end),
sum(case when ID = 2 then Val.1 end),
sum(case when ID = 3 then Val.1 end)
From yourtable
You may need dynamic cross tab or pivot if number of ID's are not static
DECLARE #col_list VARCHAR(8000)= Stuff((SELECT ',sum(case when ID = '+ Cast(ID AS VARCHAR(20))+ ' then [Val.1] end) as [val.'+Cast(ID AS VARCHAR(20))+']'
FROM Yourtable
GROUP BY ID
FOR xml path('')), 1, 1, ''),
#sql VARCHAR(8000)
exec('select 1 as Row,'+#col_list +'from Yourtable')
Live Demo
I think pivoting the data table will yield the desired result.
IF OBJECT_ID('tempdb..#TableA') IS NOT NULL
DROP TABLE #TableA
CREATE TABLE #TableA
(
RowNumber INT,
ID INT,
Value INT
)
INSERT #TableA VALUES (1, 1, 4)
INSERT #TableA VALUES (1, 1, 5)
INSERT #TableA VALUES (1, 1, 6)
INSERT #TableA VALUES (1, 2, 7)
INSERT #TableA VALUES (1, 2, 8)
INSERT #TableA VALUES (1, 2, 9)
INSERT #TableA VALUES (1, 3, 10)
INSERT #TableA VALUES (1, 3, 11)
INSERT #TableA VALUES (1, 3, 12)
-- https://msdn.microsoft.com/en-us/library/ms177410.aspx
SELECT RowNumber, [1] AS Sum1, [2] AS Sum2, [3] AS Sum3
FROM
(
SELECT RowNumber, ID, Value
FROM #TableA
) a
PIVOT
(
SUM(Value)
FOR ID IN ([1], [2], [3])
) AS p
This technique works if the ids you are seeking are constant, otherwise I imagine some dyanmic-sql would work as well if changing ids are needed.
https://msdn.microsoft.com/en-us/library/ms177410.aspx

Count number of records returned by temp table - SQL Server

My script is as below
CREATE TABLE #t (Id int, Name varchar(10))
INSERT INTO #t VALUES (1, 'A')
INSERT INTO #t VALUES (1, 'B')
INSERT INTO #t VALUES (1, 'C')
INSERT INTO #t VALUES (1, 'D')
INSERT INTO #t VALUES (2, 'E')
SELECT COUNT(0)FROM (SELECT COUNT(0) FROM #t GROUP BY Id) a
but I am getting an error
Msg 8155, Level 16, State 2, Line 5
No column name was specified for column 1 of 'A'.
When you use a subquery, all the columns need to given names:
SELECT COUNT(0)
FROM (SELECT COUNT(0) as cnt FROM #t GROUP BY Id
) a;
However, a simpler way to write this is:
SELECT COUNT(DISTINCT id)
FROM #t;
Actually, this isn't exactly the same. Your version will count NULL values but this does not. The exact equivalent is:
SELECT COUNT(DISTINCT id) + MAX(CASE WHEN id IS NULL THEN 1 ELSE 0 END)
FROM #t;