New to Snowflake - issue with SQL statement with joins and pivoting values

I'm trying to model products with a Product table linked to a Characteristic table through a many-to-many table, three tables in all,
e.g.:
Product (productId, brand, description, model)
ProductCharacteristic (productId, characteristicId)
Characteristic (characteristicId, type, name, value)
From this structure I'm trying to pivot back into one result set, so that a single select statement shows the product fields together with characteristic.type and characteristic.value.
For example:
select distinct
p.ProductId, p.name, p.description, p.brand, p.model, body, DoorNum
from
Product p
left join
ProductionCharacteristic pc on p.ProductId = pc.ProductId,
(select CharacteristicId, "'Body'" as body, "'DoorNum'" as DoorNum
from Characteristic c
pivot (max(c.name) for c.type IN ('Body' , 'DoorNum'))) temp
where
pc.CharacteristicId = temp.CharacteristicId
Issue 1: Is ANY_VALUE an aggregate function? PIVOT doesn't accept it, so I'm using MAX() instead.
Issue 2: The select above returns 3 rows. Is there a way to flatten it down to one record?

Notes
I must insist that you NEVER mix join styles: if you use the keywords JOIN and LEFT JOIN, don't also use comma joins, and vice versa.
You didn't make it clear what you expected your output to look like so I am going with a comma separated list, but will also be showing how you can use an array.
I put this in as a comment, but worth repeating: ANY_VALUE() is an aggregate function, but doesn't work inside the context of PIVOT
https://community.snowflake.com/s/question/0D50Z00008uVHTYSA4/anyvalue-does-not-work-on-pivot
Query Options
Option 1 - Inline Pivot (Not recommended as it's harder to read and the pivot sub-query can't be reused)
select
p.ProductId
, p.name
, p.description
, p.brand
, p.model
, listagg(temp.body,', ') as body
, listagg(temp.DoorNum,', ') as doornum1
, arrayagg(temp.DoorNum) as doornum2
from
Product p
left join
ProductionCharacteristic pc
on p.ProductId = pc.ProductId
join (
select CharacteristicId, body, doornum
from Characteristic c
pivot (max(c.name) for c.type IN ('Body' , 'DoorNum')) as p (CharacteristicId, value, body, doornum)
) temp
on pc.CharacteristicId = temp.CharacteristicId
group by 1,2,3,4,5
;
Option 2 - Pivot in CTE (Recommended)
with temp as (
select CharacteristicId, body, doornum
from Characteristic c
pivot (max(c.name) for c.type IN ('Body' , 'DoorNum')) as p (CharacteristicId, value, body, doornum)
)
select
p.ProductId
, p.name
, p.description
, p.brand
, p.model
, listagg(temp.body,', ') as body
, listagg(temp.DoorNum,', ') as doornum1
, arrayagg(temp.DoorNum) as doornum2
from
Product p
left join
ProductionCharacteristic pc
on p.ProductId = pc.ProductId
join temp
on pc.CharacteristicId = temp.CharacteristicId
group by 1,2,3,4,5
;
Option 3 - Lateral Join (Like a correlated subquery)
select
p.ProductId
, p.name
, p.description
, p.brand
, p.model
, temp.body
, temp.doornum1
, temp.doornum2
from
Product p
join lateral (
select listagg(iff(type = 'Body',name,null),', ') as body
,listagg(iff(type = 'DoorNum',name,null),', ') as doornum1
,arrayagg(iff(type = 'DoorNum',name,null)) as doornum2
from Characteristic c
join ProductionCharacteristic pc
on pc.CharacteristicId = c.CharacteristicId
where p.ProductId = pc.ProductId
) temp
;
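For anyone who wants to try the idea behind Option 3 without a Snowflake account, here is a minimal sketch using Python's built-in sqlite3 module. It is not Snowflake syntax: SQLite has no PIVOT, IFF or LISTAGG, so the sketch uses MAX over a CASE expression (plain conditional aggregation) to flatten the characteristics into one row per product. The sample rows are made up, the junction table uses the ProductCharacteristic name from the question's table list, and the name and description columns are left out for brevity.

```python
import sqlite3

# In-memory database with made-up sample data (assumption, not from the question).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Product (ProductId INTEGER, brand TEXT, model TEXT);
CREATE TABLE ProductCharacteristic (ProductId INTEGER, CharacteristicId INTEGER);
CREATE TABLE Characteristic (CharacteristicId INTEGER, type TEXT, name TEXT);
INSERT INTO Product VALUES (1, 'Acme', 'Roadster');
INSERT INTO Characteristic VALUES (10, 'Body', 'Sedan'), (11, 'DoorNum', '4');
INSERT INTO ProductCharacteristic VALUES (1, 10), (1, 11);
""")

# Conditional aggregation: each CASE picks the value for one characteristic
# type, and MAX collapses the per-characteristic rows into one row per product.
rows = conn.execute("""
SELECT p.ProductId, p.brand, p.model,
       MAX(CASE WHEN c.type = 'Body'    THEN c.name END) AS body,
       MAX(CASE WHEN c.type = 'DoorNum' THEN c.name END) AS doornum
FROM Product p
LEFT JOIN ProductCharacteristic pc ON pc.ProductId = p.ProductId
LEFT JOIN Characteristic c ON c.CharacteristicId = pc.CharacteristicId
GROUP BY p.ProductId, p.brand, p.model
""").fetchall()

print(rows)
```

With two characteristic rows for the product, the query returns a single flattened row, which is the shape Issue 2 asks for.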
Hope this helps.

Related

SQL: How to get distinct values out of string_Agg() function?

The following code
SELECT
DISTINCT(p.ID) AS ID
, PIT.Code AS Code
, year(PT.Date) AS Year
FROM fact.PreT PT
INNER JOIN dim.ProdIType PIT
ON PIT.ProdITypeSKey = PT.ProdITypeSKey
INNER JOIN dim.Proudct P
ON P.ProductSKey = pt.ProductSKey
WHERE p.ID = '15'
GROUP BY p.ID, PIT.Code, PT.Year
returns the following:
I have changed my script to aggregate the codes and group them by ID and year, but the result contains duplicates. Code and output below:
SELECT
DISTINCT(p.ID) AS ID
, string_agg(PIT.Code, ',') AS Code
, year(PT.Date) AS Year
FROM fact.PreT PT
INNER JOIN dim.ProdIType PIT
ON PIT.ProdITypeSKey = PT.ProdITypeSKey
INNER JOIN dim.Proudct P
ON P.ProductSKey = pt.ProductSKey
WHERE p.ID = '15'
GROUP BY p.ID, PT.Year
Result:
Desired output - distinct and ordered code ascending:
Can someone explain why string_agg is duplicating codes? How should I tackle this issue?
You need to subquery it and group again. Note that DISTINCT is not a function: it acts over the whole result set and is the same as grouping by all columns.
SELECT
ID
, string_agg(Code, ',') WITHIN GROUP (ORDER BY Code) AS Code
, [Year]
FROM (
SELECT
p.ID
, PIT.Code AS Code
, year(PT.Date) AS Year
FROM fact.PreT PT
INNER JOIN dim.ProdIType PIT
ON PIT.ProdITypeSKey = PT.ProdITypeSKey
INNER JOIN dim.Proudct P
ON P.ProductSKey = pt.ProductSKey
WHERE p.ID = '15'
GROUP BY p.ID, year(PT.Date), PIT.Code
) p
GROUP BY p.ID, p.[Year];
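A small way to see why the second grouping step matters, sketched with Python's sqlite3 module. This is only an illustration: SQLite's group_concat stands in for SQL Server's string_agg, and the sales table and its rows are invented for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Hypothetical, already-joined rows: code 'B' appears twice for 2020.
CREATE TABLE sales (ID TEXT, Code TEXT, Year INTEGER);
INSERT INTO sales VALUES
  ('15', 'B', 2020), ('15', 'A', 2020), ('15', 'B', 2020), ('15', 'A', 2021);
""")

# Aggregating directly concatenates every joined row, duplicates included.
naive = conn.execute("""
SELECT ID, Year, group_concat(Code, ',') AS Codes
FROM sales
GROUP BY ID, Year
ORDER BY Year
""").fetchall()

# De-duplicate in a subquery first, then aggregate the distinct codes.
deduped = conn.execute("""
SELECT ID, Year, group_concat(Code, ',') AS Codes
FROM (SELECT DISTINCT ID, Code, Year FROM sales) AS d
GROUP BY ID, Year
ORDER BY Year
""").fetchall()

print(naive)
print(deduped)
```

The first query lists three codes for 2020, the second only two. SQLite does not guarantee the order inside group_concat, which is why the SQL Server version needs an explicit ordering if the codes must come out ascending.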

DISTINCT wrongly returns the same ID twice

This is my SQL query:
SELECT DISTINCT(ItemId), TCode, PartNumber,ModelNumber, ItemUOM
FROM #Results
This query returns:
ItemId TCode Source PartNumber ModelNumber ItemUOM
-----------------------------------------------------------------
1024 1000 NULL NULL EA
1024 1000 FLEX FLEX EA
#Results is a temp table; the query that fills it uses left joins.
Why does SELECT DISTINCT return the same ItemID 1024 twice?
SELECT DISTINCT(I.ItemId),
(DENSE_RANK() OVER(ORDER BY I.ItemId ASC)) AS RowNumber,
(I.TCode), E.Name AS Source,
I.GoldenRecordNumber AS GoldenRecordNo, I.ItemCode AS MMRefNo,
I.ShortDescription AS ShortText, I.LongDescription AS POText,
Suppliers.Description AS Manufacturer, Suppliers.Name AS ManufacturerCode,
Suppliers.Abbreviation AS ManufacturerAbbr,
ItemSuppliers.ReferenceNo AS PartNumber, ItemSuppliers.ReferenceNo AS ModelNumber,
UOM.Name AS ItemUOM, MG.Name AS PSGC,
NM.Noun AS ClassName, NM.LongAbbrevation AS ClassDescription
INTO
#Results
FROM
Items I
LEFT JOIN
ItemSuppliers ON I.ItemId = ItemSuppliers.ItemsId
LEFT JOIN
Suppliers ON ItemSuppliers.ManufacturerId = Suppliers.SupplierId
LEFT JOIN
UnitOfMeasurement UOM ON UOM.UOMId = I.UOMId
LEFT JOIN
MaterialGroup MG ON MG.MaterialGroupId = I.MaterialGroupId
LEFT JOIN
NounModifiers NM ON NM.NounModifierId = I.NounModifierId
LEFT JOIN
AutoClass AC ON AC.ClassName = NM.Noun
LEFT JOIN
ERP E ON E.ERPId = I.ERPName
LEFT JOIN
NounModifierAttributes NMA ON NMA.NounModifierId = NM.NounModifierId
LEFT JOIN
Attributes A ON A.AttributeId = NMA.AttributeId
LEFT JOIN
ItemAttributes IA ON IA.ItemId = I.ItemId
WHERE
(I.ItemCode LIKE '%'+'2001010088'+'%' )
SELECT 'Int' = COUNT(distinct(ItemId))
FROM #Results
WHERE (TCode IS NOT NULL OR MMRefNo IS NOT NULL)
SELECT DISTINCT(ItemId),
TCode, Source, GoldenRecordNo, MMRefNo, ShortText, POText,
Manufacturer, ManufacturerCode, ManufacturerAbbr, PartNumber, ModelNumber,
ItemUOM, PSGC, ClassName, ClassDescription
FROM
#Results
WHERE
(TCode IS NOT NULL OR MMRefNo IS NOT NULL)
AND RowNumber BETWEEN (1-1)*100 + 1 AND (((1-1) * 100 + 1) + 100) - 1
DROP TABLE #Results
If you are convinced that the selected rows can be grouped together, then it should work fine.
1. If the rows contain different data, DISTINCT will not help.
2. Use LTRIM and RTRIM to remove leading and trailing spaces.
Example: distinct(ltrim(rtrim(ItemId)))
This helps if the difference is due to spaces or junk values.
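DISTINCT is working as expected: it applies to the whole row, not just to ItemId, and the two rows differ in PartNumber and ModelNumber. To get one row per item, you could instead use a GROUP BY clause on ItemId, TCode and aggregate the remaining columns: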
SELECT
ItemId, TCode,
MAX(PartNumber) PartNumber, MAX(ModelNumber) ModelNumber,
MAX(ItemUOM), ...
FROM #Results
GROUP BY ItemId, TCode
If GROUP BY does not give you the rows you need, use a ranking function such as ROW_NUMBER() to rank the rows within each ItemId and keep the row with the rank you want.
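To make the difference concrete, here is a small sketch with Python's sqlite3 module, using the two rows from the question. The table is called Results because SQLite has no #temp tables, and the Source column is left out.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Results (ItemId INTEGER, TCode INTEGER,
                      PartNumber TEXT, ModelNumber TEXT, ItemUOM TEXT);
INSERT INTO Results VALUES
  (1024, 1000, NULL,   NULL,   'EA'),
  (1024, 1000, 'FLEX', 'FLEX', 'EA');
""")

# DISTINCT compares whole rows, so both rows survive: they differ in
# PartNumber and ModelNumber even though ItemId is the same.
distinct_rows = conn.execute("""
SELECT DISTINCT ItemId, TCode, PartNumber, ModelNumber, ItemUOM
FROM Results
""").fetchall()

# GROUP BY on the key columns plus MAX on the rest yields one row per item;
# MAX ignores NULLs, so the non-NULL values win.
grouped_rows = conn.execute("""
SELECT ItemId, TCode,
       MAX(PartNumber) AS PartNumber,
       MAX(ModelNumber) AS ModelNumber,
       MAX(ItemUOM) AS ItemUOM
FROM Results
GROUP BY ItemId, TCode
""").fetchall()

print(len(distinct_rows), grouped_rows)
```

The parentheses in DISTINCT(ItemId) change nothing: DISTINCT still applies to every selected column.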

How to optimize without UNION clauses?

Is it possible to write the query below without a UNION clause?
select ProductId,ImageName,ImageType, ROW_NUMBER() over (order by ProductId desc) RowId
from
(
select p.id ProductId ,p.pic_image ImageName,'pic_image' ImageType
from product p
left outer join iimages_edited pe on p.id = pe.[id]
where isnull(p.pic_image,'') <> '' and isnull(pe.pic_image,0)=0
union
select p.id ProductId,p.pic_bimage ImageName,'pic_bimage' ImageType
from product p
left outer join iimages_edited pe on p.id = pe.[id]
where isnull(p.pic_bimage,'') <> '' and isnull(pe.pic_bimage,0)=0
union
select p.id ProductId,p.pic_limage ImageName,'pic_limage' ImageType
from product p
left outer join iimages_edited pe on p.id = pe.[id]
where isnull(p.pic_limage,'') <> '' and isnull(pe.pic_limage,0)=0
union
select p.id ProductId,p.pic_blimage ImageName,'pic_blimage' ImageType
from product p
left outer join iimages_edited pe on p.id = pe.[id]
where isnull(p.pic_blimage,'') <> '' and isnull(pe.pic_blimage,0)=0
union
select p.id ProductId,p.pic_cimage ImageName,'pic_cimage' ImageType
from product p
left outer join iimages_edited pe on p.id = pe.[id]
where isnull(p.pic_cimage,'') <> '' and isnull(pe.pic_cimage,0)=0
)t
Each branch of the query above reads the same tables with a different WHERE condition. Is it
possible to do this in a single query?
Any help will be much appreciated !
Thanks in advance
It seems that you are repeating the same join and filters with a different column each time. You can convert the columns to rows using UNPIVOT, on each table, before the join:
select p.ProductId, p.ImageName, p.ImageType, ROW_NUMBER()
over (order by p.ProductId desc) RowId
from (
-- UNPIVOT takes the value column first (ImageName), then the column that receives the source column names (ImageType)
select id as ProductId, ImageType, ImageName
from product
unpivot (
ImageName for ImageType
in (pic_image, pic_bimage, pic_limage, pic_blimage, pic_cimage)
) t
) as p
left outer join (
select id as ProductId, ImageType, Edited
from iimages_edited
unpivot (
Edited for ImageType
in (pic_image, pic_bimage, pic_limage, pic_blimage, pic_cimage)
) t
where Edited <> 0 -- keep only images already marked as edited (assumes a numeric flag, as in the question)
) as pe
on p.ImageType = pe.ImageType
and p.ProductId = pe.ProductId
where pe.ProductId is null -- anti-join: no edited counterpart exists
and p.ImageName <> ''
UNPIVOT drops NULL values, so the ISNULL checks are probably not necessary. Empty strings and 0 flags are not dropped, so they still need explicit filters.
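If you want to experiment with the columns-to-rows idea outside SQL Server, here is a rough sketch with Python's sqlite3 module. SQLite has no UNPIVOT, so this swaps in a different technique: a CROSS JOIN against a small list of column names plus a CASE expression. It is cut down to two image columns, and the sample rows are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE product (id INTEGER, pic_image TEXT, pic_bimage TEXT);
CREATE TABLE iimages_edited (id INTEGER, pic_image INTEGER, pic_bimage INTEGER);
INSERT INTO product VALUES
  (1, 'a.png', 'a_big.png'), (2, 'b.png', NULL), (3, '', 'c_big.png');
INSERT INTO iimages_edited VALUES (1, 1, 0), (3, NULL, NULL);
""")

rows = conn.execute("""
WITH kinds(ImageType) AS (VALUES ('pic_image'), ('pic_bimage')),
unpivoted AS (
  -- One output row per (product, image column); CASE picks the matching column.
  SELECT p.id AS ProductId,
         k.ImageType,
         CASE k.ImageType WHEN 'pic_image' THEN p.pic_image
                          ELSE p.pic_bimage END AS ImageName,
         CASE k.ImageType WHEN 'pic_image' THEN pe.pic_image
                          ELSE pe.pic_bimage END AS Edited
  FROM product p
  CROSS JOIN kinds k
  LEFT JOIN iimages_edited pe ON pe.id = p.id
)
SELECT ProductId, ImageName, ImageType
FROM unpivoted
WHERE COALESCE(ImageName, '') <> ''   -- image exists
  AND COALESCE(Edited, 0) = 0         -- not yet edited
ORDER BY ProductId DESC, ImageType
""").fetchall()

print(rows)
```

The same filter pair is written once instead of once per UNION branch; adding another image column means one more entry in kinds and one more WHEN in each CASE.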

Get Distinct results of all columns based on MAX DATE of one

Using SQL Server 2012
I have seen a few threads about this topic but I can't find one that involves multiple joins in the query. I can't create a VIEW on this database so the joins are needed.
The Query
SELECT
p.Price
,s.Type
,s.Symbol
, MAX(d.Date) Maxed
FROM AdventDW.dbo.FactPrices p
INNER JOIN dbo.DimSecurityMaster s
ON s.SecurityID = p.SecurityID
INNER JOIN dbo.DimDateTime d
ON
p.DateTimeKey = d.DateTimeKey
GROUP BY p.Price ,
s.Type ,
s.Symbol
ORDER BY s.Symbol
The query works but does not produce distinct results. I am using ORDER BY to validate the results, but it is not required once I get it working. The result set looks like this:
Price      Type  Symbol     Maxed
10.57      bfus  *bbkd      3/31/1989
10.77      bfus  *bbkd      2/28/1990
100.74049  cbus  001397AA6  8/2/2005
100.8161   cbus  001397AA6  7/21/2005
The result set I want is
Price      Type  Symbol     Maxed
10.77      bfus  *bbkd      2/28/1990
100.74049  cbus  001397AA6  8/2/2005
Here are a few other StackOverflow threads I tried but couldn't get to work with my specific query:
How can I SELECT rows with MAX(Column value), DISTINCT by another column in SQL?
SQL Selecting distinct rows from multiple columns based on max value in one column
If you want data for the maximum date, use row_number() rather than group by:
SELECT ts.*
FROM (SELECT p.Price, s.Type, s.Symbol, d.Date,
ROW_NUMBER() OVER (PARTITION BY s.Type, s.Symbol
ORDER BY d.Date DESC
) as seqnum
FROM AdventDW.dbo.FactPrices p INNER JOIN
dbo.DimSecurityMaster s
ON s.SecurityID = p.SecurityID INNER JOIN
dbo.DimDateTime d
ON p.DateTimeKey = d.DateTimeKey
) ts
WHERE seqnum = 1
ORDER BY ts.Symbol;
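Here is the same ROW_NUMBER() pattern as a self-contained sketch with Python's sqlite3 module (window functions need SQLite 3.25 or newer). The three joined tables are collapsed into one hypothetical prices table holding the four rows from the question, and the dates are stored as ISO strings so that they sort correctly as text.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Hypothetical flattened table standing in for the three joined tables.
CREATE TABLE prices (Price REAL, Type TEXT, Symbol TEXT, Date TEXT);
INSERT INTO prices VALUES
  (10.57,     'bfus', '*bbkd',     '1989-03-31'),
  (10.77,     'bfus', '*bbkd',     '1990-02-28'),
  (100.74049, 'cbus', '001397AA6', '2005-08-02'),
  (100.8161,  'cbus', '001397AA6', '2005-07-21');
""")

# Number the rows of each (Type, Symbol) group from newest to oldest,
# then keep only the newest row of every group.
latest = conn.execute("""
SELECT Price, Type, Symbol, Date
FROM (
  SELECT Price, Type, Symbol, Date,
         ROW_NUMBER() OVER (PARTITION BY Type, Symbol
                            ORDER BY Date DESC) AS seqnum
  FROM prices
) AS ts
WHERE seqnum = 1
ORDER BY Symbol
""").fetchall()

print(latest)
```

Price is not part of the PARTITION BY, which is why each symbol ends up with exactly one row carrying the price from its latest date.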
You should use a derived table since you really only want to group the DateTimeKey table to get the MAX date.
SELECT p.Price ,
s.Type ,
s.Symbol ,
tmp.MaxDate
FROM AdventDW.dbo.FactPrices p
INNER JOIN dbo.DimSecurityMaster s ON s.SecurityID = p.SecurityID
INNER JOIN
( SELECT MAX(d.Date) AS MaxDate ,
d.DateTimeKey
FROM dbo.DimDateTime d
GROUP BY d.DateTimeKey ) tmp ON p.DateTimeKey = tmp.DateTimeKey
ORDER BY s.Symbol;
/*
This is your initial select, which is fine because it is based on your original criteria,
so I'll keep it intact and store its result in a temp table.
*/
SELECT
p.Price
, s.Type
, s.Symbol
, MAX(d.Date) Maxed
INTO #tmpT
FROM AdventDW.dbo.FactPrices p
INNER JOIN dbo.DimSecurityMaster s
ON s.SecurityID = p.SecurityID
INNER JOIN dbo.DimDateTime d
ON p.DateTimeKey = d.DateTimeKey
GROUP BY p.Price ,
s.Type ,
s.Symbol
ORDER BY s.Symbol
SELECT innerTable.Price, innerTable.Symbol, innerTable.Type, innerTable.Maxed
FROM (
SELECT
ROW_NUMBER () OVER (PARTITION BY t1.Symbol, t1.Type ORDER BY t1.Maxed DESC) as row
, *
FROM #tmpT AS t1
) AS innerTable
WHERE row = 1
DROP TABLE #tmpT

Changing SQL NOT IN to JOINS

Hello guys,
Our aim is a script that inserts the missing Product - TaxCategory pairs into the intermediate table (ProductTaxCategory).
The following script works correctly, but we are trying to find a way to optimize it:
INSERT ProductTaxCategory
(ProductTaxCategory_TaxCategoryId,ProductTaxCategory_ProductId)
SELECT
TaxCategoryId
,ProductId
FROM Product pr
CROSS JOIN TaxCategory tx
WHERE pr.ProductId NOT IN
(
SELECT ProductTaxCategory_ProductId
FROM ProductTaxCategory
)
OR
pr.ProductId IN
(
SELECT ProductTaxCategory_ProductId
FROM ProductTaxCategory
)
AND
tx.TaxCategoryId NOT IN
(
SELECT ProductTaxCategory_TaxCategoryId
FROM ProductTaxCategory
WHERE ProductTaxCategory_ProductId = pr.ProductId
)
How can we optimize this query?
Try something like (full statement now):
INSERT INTO ProductTaxCategory
(ProductTaxCategory_TaxCategoryId,ProductTaxCategory_ProductId)
SELECT TaxCategoryId, ProductId
FROM Product pr CROSS JOIN TaxCategory tx
WHERE NOT EXISTS
(SELECT 1 FROM ProductTaxCategory
WHERE ProductTaxCategory_ProductId = pr.ProductId
AND ProductTaxCategory_TaxCategoryId = tx.TaxCategoryId)
EXISTS with (SELECT 1 ... WHERE ID=...) is often a better alternative to IN (SELECT ID FROM ... ) constructs.
You can do a LEFT JOIN with ProductTaxCategory and check for NULLs.
Something like this.
INSERT ProductTaxCategory
(
ProductTaxCategory_TaxCategoryId,
ProductTaxCategory_ProductId
)
SELECT p.TaxCategoryId, p.ProductId
FROM
(
SELECT TaxCategoryId, ProductId
FROM Product pr
CROSS JOIN TaxCategory tx
) p
LEFT JOIN ProductTaxCategory ptx
ON P.TaxCategoryId = ptx.ProductTaxCategory_TaxCategoryId
AND P.ProductId = ptx.ProductTaxCategory_ProductId
WHERE ptx.ProductTaxCategory_ProductId IS NULL
Use CROSS JOIN and EXCEPT
INSERT ProductTaxCategory(ProductTaxCategory_ProductId, ProductTaxCategory_TaxCategoryId)
SELECT p.ProductID, tc.TaxCategoryId FROM Product p CROSS JOIN TaxCategory tc
EXCEPT
SELECT ProductTaxCategory_ProductId, ProductTaxCategory_TaxCategoryId FROM ProductTaxCategory
CROSS JOIN generates all possible pairs, and EXCEPT removes the ones that already exist, leaving what's missing. Finally, you can INSERT them into the table.
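The three suggestions in this thread (NOT EXISTS, LEFT JOIN with an IS NULL check, and EXCEPT) should all find the same missing pairs. Here is a quick sketch to check that, using Python's sqlite3 module with the table and column names from the question and made-up sample rows. It says nothing about which form is fastest on SQL Server; that depends on the actual execution plans.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Product (ProductId INTEGER);
CREATE TABLE TaxCategory (TaxCategoryId INTEGER);
CREATE TABLE ProductTaxCategory (
  ProductTaxCategory_TaxCategoryId INTEGER,
  ProductTaxCategory_ProductId INTEGER);
INSERT INTO Product VALUES (1), (2);
INSERT INTO TaxCategory VALUES (10), (20);
INSERT INTO ProductTaxCategory VALUES (10, 1);  -- only one pair exists so far
""")

PAIRS = "SELECT tx.TaxCategoryId, pr.ProductId FROM Product pr CROSS JOIN TaxCategory tx"

not_exists = sorted(conn.execute(PAIRS + """
WHERE NOT EXISTS (
  SELECT 1 FROM ProductTaxCategory ptc
  WHERE ptc.ProductTaxCategory_ProductId = pr.ProductId
    AND ptc.ProductTaxCategory_TaxCategoryId = tx.TaxCategoryId)
""").fetchall())

left_join = sorted(conn.execute(PAIRS + """
LEFT JOIN ProductTaxCategory ptc
  ON ptc.ProductTaxCategory_ProductId = pr.ProductId
 AND ptc.ProductTaxCategory_TaxCategoryId = tx.TaxCategoryId
WHERE ptc.ProductTaxCategory_ProductId IS NULL
""").fetchall())

except_ = sorted(conn.execute(PAIRS + """
EXCEPT
SELECT ProductTaxCategory_TaxCategoryId, ProductTaxCategory_ProductId
FROM ProductTaxCategory
""").fetchall())

# Insert the missing pairs; afterwards every product has every tax category.
conn.executemany(
    "INSERT INTO ProductTaxCategory VALUES (?, ?)", not_exists)
total = conn.execute("SELECT COUNT(*) FROM ProductTaxCategory").fetchone()[0]

print(not_exists, total)
```

With two products and two tax categories, three of the four possible pairs are missing, and all three queries return exactly those three.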