Multiple Columns to Row with respective values - sql

Trying to bring multiple columns into rows. The intended result is
Here's sample data with what I tried. I am open to unpivot as well if that's faster overall. The full data has 15 AttributeID, AttributeData columns.
DROP TABLE Attribute;
CREATE TABLE Attribute
(
Producttitle varchar(200),
AttributeID_1 varchar(50),
AttributeData_1 varchar(50),
AttributeID_2 varchar(50),
AttributeData_2 varchar(50),
AttributeID_3 varchar(50),
AttributeData_3 varchar(50)
);
INSERT INTO Attribute
VALUES ('title1', '3145', 'Specific', '30', 'Yes', '40', 'Pink');
INSERT INTO Attribute
VALUES ('title2', '17', 'Stainless', '19', 'smoke', '19', 'Something');
SELECT
Producttitle,
[AttributeID],
[AttributeData]
FROM
Attribute
CROSS APPLY
(SELECT 'Indicator1', [AttributeID_1] UNION ALL
SELECT 'Indicator2', [AttributeID_2] UNION ALL
SELECT 'Indicator3', [AttributeID_3]) c (indicatorname, [AttributeID])
CROSS APPLY
(SELECT 'Indicator1', [AttributeData_1] UNION ALL
SELECT 'Indicator2', [AttributeData_2] UNION ALL
SELECT 'Indicator3', [AttributeData_3]) d (indicatorname, [AttributeData]);

You can use cross apply to unpivot your dataset. It is much simpler with values():
select a.Producttitle, x.*
from attribute a
cross apply (values
(a.attributeId_1, a.attributeData_1),
(a.attributeId_2, a.attributeData_2),
(a.attributeId_3, a.attributeData_3)
) as x(attributeId, attributeData)
Note that this works because the two groups of columns have consistent data types - otherwise additional casting would be required.

GMB's solution is really cool, but basic unions will also work. Note that UNION removes duplicate rows; use UNION ALL if every ID/data pair must be kept:
SELECT Producttitle, AttributeID_1 AttributeID, AttributeData_1 AttributeData
from attribute
union
SELECT Producttitle, AttributeID_2 AttributeID, AttributeData_2 AttributeData
from attribute
union
SELECT Producttitle, AttributeID_3 AttributeID, AttributeData_3 AttributeData
from attribute
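
For anyone who wants to try the unpivot without a SQL Server instance, here is a minimal sketch using Python's built-in sqlite3 module as a stand-in. SQLite has no CROSS APPLY, so this only exercises the UNION ALL form, with the sample rows from the question. The TEXT column types are an assumption made for SQLite.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Attribute (
    Producttitle  TEXT,
    AttributeID_1 TEXT, AttributeData_1 TEXT,
    AttributeID_2 TEXT, AttributeData_2 TEXT,
    AttributeID_3 TEXT, AttributeData_3 TEXT
);
INSERT INTO Attribute VALUES ('title1', '3145', 'Specific', '30', 'Yes', '40', 'Pink');
INSERT INTO Attribute VALUES ('title2', '17', 'Stainless', '19', 'smoke', '19', 'Something');
""")

# One SELECT per (ID, Data) column pair; UNION ALL keeps every pair, even repeats.
unpivoted = conn.execute("""
SELECT Producttitle, AttributeID_1 AS AttributeID, AttributeData_1 AS AttributeData FROM Attribute
UNION ALL
SELECT Producttitle, AttributeID_2, AttributeData_2 FROM Attribute
UNION ALL
SELECT Producttitle, AttributeID_3, AttributeData_3 FROM Attribute
ORDER BY 1, 3
""").fetchall()

for row in unpivoted:
    print(row)
```

Each source row yields three output rows, one per column pair, so the two sample products produce six rows in total.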

Related

PostgreSQL: Select unique rows where distinct values are in list

Say that I have the following table:
with data as (
select 'John' "name", 'A' "tag", 10 "count"
union all select 'John', 'B', 20
union all select 'Jane', 'A', 30
union all select 'Judith', 'A', 40
union all select 'Judith', 'B', 50
union all select 'Judith', 'C', 60
union all select 'Jason', 'D', 70
)
I know there are a number of distinct tag values, namely (A, B, C, D).
I would like to select the unique names that only have the tag A
I can get close by doing
-- wrong!
select
distinct("name")
from data
group by "name"
having count(distinct tag) = 1
However, this will include any name that has exactly 1 distinct tag, regardless of which tag it is.
I am using PostgreSQL, although having more generic solutions would be great.
You're almost there - you already have groups with one tag, now just test if it is the tag you want:
select
distinct("name")
from data
group by "name"
having count(distinct tag) = 1 and max(tag)='A'
(Note that max could just as well be min. SQL simply doesn't have a single() aggregate function, but that's a different story.)
You can use not exists here:
select distinct "name"
from data d
where "tag" = 'A'
and not exists (
select * from data d2
where d2."name" = d."name" and d2."tag" != d."tag"
);
This is one possible way of solving it:
select
distinct("name")
from data
where "name" not in (
-- create list of names we want to exclude
select distinct name from data where "tag" != 'A'
)
But I don't know if it's the best or most efficient one.
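
To compare the three answers side by side, here is a small sketch that loads the question's sample data into an in-memory SQLite database through Python's sqlite3 module. SQLite is only a stand-in for PostgreSQL here; the queries use nothing dialect-specific, and the `queries` and `results` names are just for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE data (name TEXT, tag TEXT, "count" INTEGER);
INSERT INTO data VALUES
    ('John', 'A', 10), ('John', 'B', 20),
    ('Jane', 'A', 30),
    ('Judith', 'A', 40), ('Judith', 'B', 50), ('Judith', 'C', 60),
    ('Jason', 'D', 70);
""")

queries = {
    # Exactly one distinct tag, and that tag is 'A'.
    "having": """
        SELECT name FROM data
        GROUP BY name
        HAVING count(DISTINCT tag) = 1 AND max(tag) = 'A'
    """,
    # Has an 'A' row and no row with a different tag.
    "not_exists": """
        SELECT DISTINCT name FROM data d
        WHERE tag = 'A'
          AND NOT EXISTS (SELECT 1 FROM data d2
                          WHERE d2.name = d.name AND d2.tag != d.tag)
    """,
    # Exclude every name that has any non-'A' tag.
    "not_in": """
        SELECT DISTINCT name FROM data
        WHERE name NOT IN (SELECT name FROM data WHERE tag != 'A')
    """,
}

results = {label: sorted(r[0] for r in conn.execute(sql))
           for label, sql in queries.items()}
print(results)
```

On this sample data all three return only Jane. They can differ once NULLs are involved (NOT IN in particular behaves badly if the subquery returns a NULL name), so it is worth checking against the real table.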

ROW type/constructor in BigQuery

Does BigQuery have the concept of a ROW, for example, similar to MySQL or Postgres or Oracle or Snowflake? I know it sort of implicitly uses it when doing an INSERT ... VALUES (...) , for example:
INSERT dataset.Inventory (product, quantity)
VALUES('top load washer', 10),
('front load washer', 20)
Each of the values would implicitly be a ROW type of the Inventory table, but is this construction allowed elsewhere in BigQuery? Or is this a feature that doesn't exist in BQ?
I think the query below is the simplest, most naive example of such a constructor in BigQuery:
with t1 as (
select 'top load washer' product, 10 quantity, 'a' type, 'x' category union all
select 'front load washer', 20, 'b', 'y'
), t2 as (
select 1 id, 'a' code, 'x' value union all
select 2, 'd', 'z'
)
select *
from t1
where (type, category) = (select as struct code, value from t2 where id = 1)
Besides simple queries, it can also be used in BQ scripts. Another simplistic example:
declare type, category string;
create temp table t2 as (
select 1 id, 'a' code, 'x' value union all
select 2, 'd', 'z'
);
set (type, category) = (select as struct code, value from t2 where id = 1);

SQL - expand dataset into lookup table?

I currently have a legacy table that looks like the one below.
This is a set of rules that our business has stored over the years. The issue is that the "all" and "both" values really should be separated out into rows so they can be queried more efficiently.
For example, the contract length column can only ever be between 1 and 5, the type column can only ever be "gas" or "water" and the sales channel "internal" or "external". Instead of saying all or both, another row should exist with the specific rule and the table should look like the below.
So this will have a row for every variation in the first table.
I didn't think it would be a long task to do manually, but I was wrong :)
Does anyone have any idea on how to achieve this quickly in SQL? I would say what I have tried so far...but I am completely stumped on this one so am wondering if it can even be done at all?
This could be done in a single SQL statement, but doing it in steps, so that you can check interim result sets before you get to the final output, is probably a lot healthier and less risky.
I would approach this with a UNION query, one set of UNIONs for each column that should be split out to more granular rows.
For instance for contractlength:
SELECT Supplier, 1 AS contractLength, Type, SalesChannel FROM yourtable WHERE contractLength in ('1', 'All')
UNION ALL
SELECT Supplier, 2, Type, SalesChannel FROM yourtable WHERE contractLength in ('2', 'All')
UNION ALL
SELECT Supplier, 3, Type, SalesChannel FROM yourtable WHERE contractLength in ('3', 'All')
UNION ALL
SELECT Supplier, 4, Type, SalesChannel FROM yourtable WHERE contractLength in ('4', 'All')
UNION ALL
SELECT Supplier, 5, Type, SalesChannel FROM yourtable WHERE contractLength in ('5', 'All')
You can write those results out to a temp table, and then build your query for type on top of it writing to a new temp table.
SELECT Supplier, contractLength, 'Gas' AS Type, SalesChannel FROM previousTempTable WHERE type in ('Gas','Both')
UNION ALL
SELECT Supplier, contractLength, 'Water', SalesChannel FROM previousTempTable WHERE type in ('Water','Both')
Rinse and repeat for SalesChannel.
There are other, more elegant ways to solve this with some SELECT DISTINCT and cross joins, but your list of values for each column is limited, and the solution I'm proposing feels like a quick, easy way to get your data in shape. It's also easy to understand if this is auditable data or the process needs to be repeated.
You don't need to query your table multiple times, or use temp tables. You can do this pretty elegantly with conditional unpivots, by using CROSS APPLY
SELECT
t.Supplier,
c1.ContractLength,
c2.Type,
c3.SalesChannel
FROM YourTable t
CROSS APPLY (
SELECT t.ContractLength
WHERE t.ContractLength <> 'All'
UNION ALL
SELECT *
FROM (VALUES
('1'),('2'),('3'),('4'),('5') -- strings, to match the character-typed ContractLength column
) v(ContractLength)
WHERE t.ContractLength = 'All'
) c1
CROSS APPLY (
SELECT t.Type
WHERE t.Type <> 'Both'
UNION ALL
SELECT *
FROM (VALUES
('Gas'),('Water')
) v(Type)
WHERE t.Type = 'Both'
) c2
CROSS APPLY (
SELECT t.SalesChannel
WHERE t.SalesChannel <> 'Both'
UNION ALL
SELECT *
FROM (VALUES
('Internal'),('External')
) v(SalesChannel)
WHERE t.SalesChannel = 'Both'
) c3;
A somewhat less efficient, but more compact, version of the same, is to use normal joins against the VALUES clauses
SELECT
t.Supplier,
c1.ContractLength,
c2.Type,
c3.SalesChannel
FROM YourTable t
JOIN (VALUES
('1'),('2'),('3'),('4'),('5') -- strings: comparing an int to 'All' would fail on conversion
) c1(ContractLength)
ON c1.ContractLength = t.ContractLength OR t.ContractLength = 'All'
JOIN (VALUES
('Gas'),('Water')
) c2(Type)
ON c2.Type = t.Type OR t.Type = 'Both'
JOIN (VALUES
('Internal'),('External')
) c3(SalesChannel)
ON c3.SalesChannel = t.SalesChannel OR t.SalesChannel = 'Both';
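
The join-against-VALUES idea can be tried out without SQL Server. The sketch below uses Python's sqlite3 module. The two sample rules are hypothetical (the real table is not shown in the post), and because SQLite does not accept the `(VALUES ...) alias(col)` form, the value lists are written as CTEs instead. The contract lengths are kept as strings, on the assumption that the column holding 'All' is a character column.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Hypothetical sample rows standing in for the legacy rules table.
CREATE TABLE YourTable (Supplier TEXT, ContractLength TEXT, Type TEXT, SalesChannel TEXT);
INSERT INTO YourTable VALUES
    ('SupplierA', 'All', 'Both', 'Both'),
    ('SupplierB', '2',   'Gas',  'Internal');
""")

expanded = conn.execute("""
WITH lengths(ContractLength) AS (VALUES ('1'), ('2'), ('3'), ('4'), ('5')),
     types(Type)             AS (VALUES ('Gas'), ('Water')),
     channels(SalesChannel)  AS (VALUES ('Internal'), ('External'))
SELECT t.Supplier, l.ContractLength, ty.Type, ch.SalesChannel
FROM YourTable t
-- A specific value matches one lookup row; a wildcard matches all of them.
JOIN lengths  l  ON l.ContractLength = t.ContractLength OR t.ContractLength = 'All'
JOIN types    ty ON ty.Type = t.Type OR t.Type = 'Both'
JOIN channels ch ON ch.SalesChannel = t.SalesChannel OR t.SalesChannel = 'Both'
ORDER BY t.Supplier, l.ContractLength, ty.Type, ch.SalesChannel
""").fetchall()

for row in expanded:
    print(row)
```

The fully wildcarded rule expands to 5 x 2 x 2 = 20 rows, while the fully specific rule passes through as a single row.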

Group by absorb NULL unless it's the only value

I'm trying to group by a primary column and a secondary column. I want to ignore NULL in the secondary column unless it's the only value.
CREATE TABLE #tempx1 ( Id INT, [Foo] VARCHAR(10), OtherKeyId INT );
INSERT INTO #tempx1 ([Id],[Foo],[OtherKeyId]) VALUES
(1, 'A', NULL),
(2, 'B', NULL),
(3, 'B', 1),
(4, 'C', NULL),
(5, 'C', 1),
(6, 'C', 2);
I'm trying to get output like
Foo OtherKeyId
A NULL
B 1
C 1
C 2
This question is similar, but takes the MAX of the column I want, so it ignores other non-NULL values and won't work.
I tried to work out something based on this question, but I don't quite understand what that query does and can't get my output to work
-- Doesn't include Foo='A', creates duplicates for 'B' and 'C'
WITH cte AS (
SELECT *,
ROW_NUMBER() OVER (PARTITION BY [Foo] ORDER BY [OtherKeyId]) rn1
FROM #tempx1
)
SELECT c1.[Foo], c1.[OtherKeyId], c1.rn1
FROM cte c1
INNER JOIN cte c2 ON c2.[OtherKeyId] = c1.[OtherKeyId] AND c2.rn1 = c1.rn1
This is for a modern SQL Server: Microsoft SQL Server 2019
You can use a GROUP BY expression with a HAVING clause like the one below:
SELECT [Foo],[OtherKeyId]
FROM #tempx1 t
GROUP BY [Foo],[OtherKeyId]
HAVING SUM(CASE WHEN [OtherKeyId] IS NULL THEN 0 END) IS NULL
OR ( SELECT COUNT(*) FROM #tempx1 WHERE [Foo] = t.[Foo] ) = 1
Hmmm . . . I think you want filtering:
select t.*
from #tempx1 t
where t.otherkeyid is not null or
not exists (select 1
from #tempx1 t2
where t2.foo = t.foo and t2.otherkeyid is not null
);
My actual problem is a bit more complicated than presented here. I ended up using the idea from Barbaros Özhan's solution to count the number of items. This ends up with two inner queries on the data set, each with a different GROUP BY. I'm able to get the results I need on my real dataset using a query like the following:
SELECT
a.[Foo],
b.[OtherKeyId]
FROM (
SELECT
[Foo],
COUNT([OtherKeyId]) [C]
FROM #tempx1 t
GROUP BY [Foo]
) a
JOIN (
SELECT
[Foo],
[OtherKeyId]
FROM #tempx1 t
GROUP BY [Foo], [OtherKeyId]
) b ON b.[Foo] = a.[Foo]
WHERE
(b.[OtherKeyId] IS NULL AND a.[C] = 0)
OR (b.[OtherKeyId] IS NOT NULL AND a.[C] > 0)
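
As a quick sanity check of the NOT EXISTS filtering approach against the expected output in the question, here is a sketch that runs it in SQLite via Python's sqlite3 module. SQLite has no `#temp` tables, so the table is simply named `tempx1` here, and a DISTINCT is added so the result is grouped the way the question asks.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE tempx1 (Id INTEGER, Foo TEXT, OtherKeyId INTEGER);
INSERT INTO tempx1 (Id, Foo, OtherKeyId) VALUES
    (1, 'A', NULL),
    (2, 'B', NULL),
    (3, 'B', 1),
    (4, 'C', NULL),
    (5, 'C', 1),
    (6, 'C', 2);
""")

# Keep non-NULL keys; keep a NULL key only when the Foo group has no non-NULL key.
rows = conn.execute("""
SELECT DISTINCT t.Foo, t.OtherKeyId
FROM tempx1 t
WHERE t.OtherKeyId IS NOT NULL
   OR NOT EXISTS (SELECT 1
                  FROM tempx1 t2
                  WHERE t2.Foo = t.Foo AND t2.OtherKeyId IS NOT NULL)
ORDER BY t.Foo, t.OtherKeyId
""").fetchall()

for row in rows:
    print(row)
```

On the sample data this gives the four rows from the question: A with NULL, B with 1, and C with 1 and 2.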

Sql Server 2008 - PIVOT without Aggregation Function

I know you've got multiple topics touching on this, but I haven't found one that addresses my needs. I need to (on demand) pivot selected deep table data to a wide output table. The gotcha is that I cannot use an aggregate with PIVOT because it eats responses that are needed in the output. I have worked up to a solution, but I don't think it's the best because it will require umpteen left joins to work. I've included all attempts and notes as follows:
-- Sql Server 2008 db.
-- Deep table structure (not subject to modification) contains name/value pairs with a userId as
-- foreign key. In many cases there can be MORE THAN ONE itemValue given by the user for the
-- itemName such as if asked their race, can answer White + Hispanic, etc. Each response is stored
-- as a separate record - this cannot currently be changed.
-- Goal: pivot deep data to wide while also compressing result
-- set down. Account for all items per userId, and duplicating
-- column values (rather than show nulls) as applicable
-- Sample table to store some data of both single and multiple responses
CREATE TABLE #testTable (userId int, itemName varchar(50), itemValue varchar(255))
INSERT INTO #testTable
SELECT 1, 'q01', '1-q01 Answer'
UNION SELECT 1, 'q02', '1-q02 Answer'
UNION SELECT 1, 'q03', '1-q03 Answer 1'
UNION SELECT 1, 'q03', '1-q03 Answer 2'
UNION SELECT 1, 'q03', '1-q03 Answer 3'
UNION SELECT 1, 'q04', '1-q04 Answer'
UNION SELECT 1, 'q05', '1-q05 Answer'
UNION SELECT 2, 'q01', '2-q01 Answer'
UNION SELECT 2, 'q02', '2-q02 Answer'
UNION SELECT 2, 'q03', '2-q03 Answer 1'
UNION SELECT 2, 'q03', '2-q03 Answer 2'
UNION SELECT 2, 'q04', '2-q04 Answer'
UNION SELECT 2, 'q05', '2-q05 Answer'
SELECT 'Raw Data'
SELECT * FROM #TestTable
SELECT 'Using Pivot - shows aggregate result of itemValue per itemName - eats others'
; WITH Data AS (
SELECT
[userId]
, [itemName]
, [itemValue]
FROM
#testTable
)
SELECT
[userId]
, [q02]
, [q03]
, [q05]
FROM
Data
PIVOT
(
MIN(itemValue) -- Aggregate function eats needed values.
FOR itemName in ([q02], [q03], [q05])
) AS PivotTable
SELECT 'Aggregate with Grouping - Causes Null Values'
SELECT
DISTINCT userId
,[q02] = Max(CASE WHEN itemName = 'q02' THEN itemValue END)
,[q03] = Max(CASE WHEN itemName = 'q03' THEN itemValue END)
,[q05] = Max(CASE WHEN itemName = 'q05' THEN itemValue END)
FROM
#testTable
WHERE
itemName in ('q02', 'q03', 'q05') -- Makes it a hair quicker
GROUP BY
userId -- If by userId only, it only gives 1 row PERIOD = BAD!!
, [itemName]
, [itemValue]
SELECT 'Multiple Left Joins - works properly but bad if pivoting 175 columns or so'
; WITH Data AS (
SELECT
userId
,[itemName]
,[itemValue]
FROM
#testTable
WHERE
itemName in ('q02', 'q03', 'q05') -- Makes it a hair quicker
)
SELECT
DISTINCT s1.userId
,[q02] = s2.[itemValue]
,[q03] = s3.[itemValue]
,[q05] = s5.[itemValue]
FROM
Data s1
LEFT JOIN Data s2
ON s2.userId = s1.userId
AND s2.[itemName] = 'q02'
LEFT JOIN Data s3
ON s3.userId = s1.userId
AND s3.[itemName] = 'q03'
LEFT JOIN Data s5
ON s5.userId = s1.userId
AND s5.[itemName] = 'q05'
So the bottom query is the only one (so far) that does what I need it to do, but the LEFT JOINs WILL get out of hand and cause performance issues when I use the actual item names to pivot. Any recommendations are appreciated.
I think you'll have to stick with joins, because joins are exactly the way of producing results like the one you are after. The purpose of a join is to combine row sets together (on a condition or without any), and your target output is nothing else than a combination of subsets of rows.
However, if the majority of questions always have single responses, you could substantially reduce the number of necessary joins. The idea is to join only multiple-response groups as separate row sets. As for the single-response items, they are joined only as part of the entire dataset of target items.
An example should better illustrate what I might poorly describe verbally. Assuming there are two potentially multiple-response groups in the source data, 'q03' and 'q06' (actually, here's the source table:
CREATE TABLE #testTable (
userId int,
itemName varchar(50),
itemValue varchar(255)
);
INSERT INTO #testTable
SELECT 1, 'q01', '1-q01 Answer'
UNION SELECT 1, 'q02', '1-q02 Answer'
UNION SELECT 1, 'q03', '1-q03 Answer 1'
UNION SELECT 1, 'q03', '1-q03 Answer 2'
UNION SELECT 1, 'q03', '1-q03 Answer 3'
UNION SELECT 1, 'q04', '1-q04 Answer'
UNION SELECT 1, 'q05', '1-q05 Answer'
UNION SELECT 1, 'q06', '1-q06 Answer 1'
UNION SELECT 1, 'q06', '1-q06 Answer 2'
UNION SELECT 1, 'q06', '1-q06 Answer 3'
UNION SELECT 2, 'q01', '2-q01 Answer'
UNION SELECT 2, 'q02', '2-q02 Answer'
UNION SELECT 2, 'q03', '2-q03 Answer 1'
UNION SELECT 2, 'q03', '2-q03 Answer 2'
UNION SELECT 2, 'q04', '2-q04 Answer'
UNION SELECT 2, 'q05', '2-q05 Answer'
UNION SELECT 2, 'q06', '2-q06 Answer 1'
UNION SELECT 2, 'q06', '2-q06 Answer 2'
;
which is the same as the table in the original post, but with added 'q06' items), the resulting script could be like this:
WITH ranked AS (
SELECT
*,
rn = ROW_NUMBER() OVER (PARTITION BY userId, itemName ORDER BY itemValue)
FROM #testTable
),
multiplied AS (
SELECT
r.userId,
r.itemName,
r.itemValue,
rn03 = r03.rn,
rn06 = r06.rn
FROM ranked r03
INNER JOIN ranked r06 ON r03.userId = r06.userId AND r06.itemName = 'q06'
INNER JOIN ranked r ON r03.userId = r.userId AND (
r.itemName = 'q03' AND r.rn = r03.rn OR
r.itemName = 'q06' AND r.rn = r06.rn OR
r.itemName NOT IN ('q03', 'q06')
)
WHERE r03.itemName = 'q03'
AND r.itemName IN ('q02', 'q03', 'q05', 'q06')
)
SELECT userId, rn03, rn06, q02, q03, q05, q06
FROM multiplied
PIVOT (
MIN(itemValue)
FOR itemName in (q02, q03, q05, q06)
) AS PivotTable
; WITH SRData AS (
SELECT -- Only query single response items in this block
[userId]
, [q01]
, [q02]
, [q04]
, [q05]
FROM
#testTable
PIVOT
(
MIN(itemValue)
FOR itemName in ([q01], [q02], [q04], [q05])
) AS PivotTable
)
SELECT
sr.[userId]
, sr.[q01]
, sr.[q02]
, [q03] = mr03.[itemValue]
, sr.[q04]
, sr.[q05]
, [q06] = mr06.[itemValue]
FROM
SRData sr
LEFT JOIN #testTable mr03 ON mr03.userId = sr.userId AND mr03.itemName = 'q03' -- Multi Response for q03
LEFT JOIN #testTable mr06 ON mr06.userId = sr.userId AND mr06.itemName = 'q06' -- Multi Response for q06
Not clear what the desired results should look like exactly but one possibility
; WITH Data AS (
SELECT
ROW_NUMBER() OVER (PARTITION BY [userId], [itemName]
ORDER BY [itemValue]) AS RN
, [userId]
, [itemName]
, [itemValue]
FROM
#testTable
)
SELECT
[userId]
, [q02]
, [q03]
, [q05]
FROM
Data
PIVOT
(
MIN(itemValue)
FOR itemName in ([q02], [q03], [q05])
) AS PivotTable
Returns
userId q02 q03 q05
----------- ------------------------------ ------------------------------ ------------------------------
1 1-q02 Answer 1-q03 Answer 1 1-q05 Answer
1 NULL 1-q03 Answer 2 NULL
1 NULL 1-q03 Answer 3 NULL
2 2-q02 Answer 2-q03 Answer 1 2-q05 Answer
2 NULL 2-q03 Answer 2 NULL
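
The ROW_NUMBER trick above can be reproduced outside SQL Server as a rough check. The sketch below uses Python's sqlite3 module and assumes a SQLite build with window function support (3.25 or later). SQLite has no PIVOT, so conditional aggregation grouped by userId and the row number is swapped in; it plays the same role as the PIVOT over the Data CTE. Only the q02, q03 and q05 rows from the question are loaded.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE testTable (userId INTEGER, itemName TEXT, itemValue TEXT);
INSERT INTO testTable VALUES
    (1, 'q02', '1-q02 Answer'),
    (1, 'q03', '1-q03 Answer 1'), (1, 'q03', '1-q03 Answer 2'), (1, 'q03', '1-q03 Answer 3'),
    (1, 'q05', '1-q05 Answer'),
    (2, 'q02', '2-q02 Answer'),
    (2, 'q03', '2-q03 Answer 1'), (2, 'q03', '2-q03 Answer 2'),
    (2, 'q05', '2-q05 Answer');
""")

pivoted = conn.execute("""
WITH ranked AS (
    SELECT userId, itemName, itemValue,
           -- Number each answer within its (user, item) group.
           ROW_NUMBER() OVER (PARTITION BY userId, itemName ORDER BY itemValue) AS rn
    FROM testTable
)
SELECT userId,
       MAX(CASE WHEN itemName = 'q02' THEN itemValue END) AS q02,
       MAX(CASE WHEN itemName = 'q03' THEN itemValue END) AS q03,
       MAX(CASE WHEN itemName = 'q05' THEN itemValue END) AS q05
FROM ranked
GROUP BY userId, rn   -- rn in the grouping is what stops the aggregate eating answers
ORDER BY userId, rn
""").fetchall()

for row in pivoted:
    print(row)
```

Because the row number is part of the grouping, each extra q03 answer lands on its own output row, with NULL in the single-response columns, just as in the result table above.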