SQL Joining on Field with Nulls

I'm trying to match two tables where one of the tables stores multiple values as a string.
In the example below I need to classify each product ordered from the #Orders table with a #NewProducts.NewProductId.
The issue I'm having is that sometimes we launch a new product like "Black Shirt",
then later we launch an adaptation of that product like "Black Shirt Vneck".
I need to match both correctly to the #Orders table. So if the order has Black and Shirt, but not Vneck, it's considered a "Black Shirt", but if the order has Black and Shirt and Vneck, it's considered a "Black Shirt Vneck".
The code below is an example - the current logic I'm using returns duplicates with the Left Join.
Also, assume we can modify the format of #NewProducts but not #Orders.
IF OBJECT_ID('tempdb.dbo.#NewProducts') IS NOT NULL DROP TABLE #NewProducts
CREATE TABLE #NewProducts
(
ProductType VARCHAR(MAX)
, Attribute_1 VARCHAR(MAX)
, Attribute_2 VARCHAR(MAX)
, NewProductId INT
)
INSERT #NewProducts
VALUES
('shirt', 'black', 'NULL', 1),
('shirt', 'black', 'vneck', 2),
('shirt', 'white', 'NULL', 3)
IF OBJECT_ID('tempdb.dbo.#Orders') IS NOT NULL DROP TABLE #Orders
CREATE TABLE #Orders
(
OrderId INT
, ProductType VARCHAR(MAX)
, Attributes VARCHAR(MAX)
)
INSERT #Orders
VALUES
(1, 'shirt', 'black small circleneck'),
(2, 'shirt', 'black large circleneck'),
(3, 'shirt', 'black small vneck'),
(4, 'shirt', 'black small vneck'),
(5, 'shirt', 'white large circleneck'),
(6, 'shirt', 'white small vneck')
SELECT *
FROM #Orders o
LEFT JOIN #NewProducts np
ON o.ProductType = np.ProductType
AND CHARINDEX(np.Attribute_1, o.Attributes) > 0
AND (
CHARINDEX(np.Attribute_2, o.Attributes) > 0
OR np.Attribute_2 = 'NULL'
)

You seem to want the longest overlap:
SELECT *
FROM #Orders o OUTER APPLY
(SELECT Top (1) np.*
FROM #NewProducts np
WHERE o.ProductType = np.ProductType AND
CHARINDEX(np.Attribute_1, o.Attributes) > 0
ORDER BY ((CASE WHEN CHARINDEX(np.Attribute_1, o.Attributes) > 0 THEN 1 ELSE 0 END) +
(CASE WHEN CHARINDEX(np.Attribute_2, o.Attributes) > 0 THEN 1 ELSE 0 END)
) DESC
) np;
I can't say I'm thrilled with the need to do this. It seems like the Orders should contain numeric ids that reference the actual product. However, I can see how something like this is sometimes necessary.
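One caveat with the query above: its WHERE clause only tests Attribute_1, so an order such as 'black small circleneck' scores 1 for both product 1 and product 2, and TOP (1) may return either. Below is a minimal sketch of one way to break that tie. It assumes the #Orders and #NewProducts temp tables from the question already exist in the current session, and that the literal string 'NULL' marks "no second attribute", as in the sample data.

```sql
-- Sketch only: assumes #Orders and #NewProducts from the question already exist
-- and that Attribute_2 holds the literal string 'NULL' when it is unused.
SELECT o.OrderId, o.Attributes, np.NewProductId
FROM #Orders AS o
OUTER APPLY
(
    SELECT TOP (1) p.NewProductId
    FROM #NewProducts AS p
    WHERE p.ProductType = o.ProductType
      AND CHARINDEX(p.Attribute_1, o.Attributes) > 0
      -- reject products whose second attribute is set but missing from the order
      AND (p.Attribute_2 = 'NULL' OR CHARINDEX(p.Attribute_2, o.Attributes) > 0)
    -- prefer the more specific product (the one that has a second attribute)
    ORDER BY CASE WHEN p.Attribute_2 = 'NULL' THEN 0 ELSE 1 END DESC
) AS np
ORDER BY o.OrderId;
```

With the sample data this should classify orders 1 and 2 as product 1, orders 3 and 4 as product 2, and orders 5 and 6 as product 3.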

I couldn't get Gordon's answer to work, and was part way through my own response when his came in. His idea of taking the biggest overlap helped. I've tweaked your NewProducts table, so that that side of things is "normalised" even if the Orders table cannot be. Code below or at rextester.com/ERIF13021
create table #NewProduct
(
NewProductID int primary key,
ProductType varchar(max),
ProductName varchar(max)
)
create table #Attribute
(
AttributeID int primary key,
AttributeName varchar(max)
)
create table #ProductAttribute
(
NewProductID int,
AttributeID int
)
insert into #NewProduct
values (1, 'shirt', 'black shirt'),
(2, 'shirt', 'black vneck shirt'),
(3, 'shirt', 'white shirt')
insert into #Attribute
values (1, 'black'),
(2, 'white'),
(3, 'vneck')
insert into #ProductAttribute
values (1,1),
(2,1),
(2,3),
(3,2)
select top 1 with ties
*
from
(
select
o.OrderId,
p.NewProductID,
p.ProductType,
p.ProductName,
o.Attributes,
sum(case when charindex(a.AttributeName,o.Attributes)>0 then 1 else 0 end) as Matches
from
#Orders o
JOIN #Attribute a ON
charindex(a.AttributeName,o.Attributes)>0
JOIN #ProductAttribute pa ON
a.AttributeID = pa.AttributeID
JOIN #NewProduct p ON
pa.NewProductID = p.NewProductID AND
o.ProductType = p.ProductType
group by
o.OrderId,
p.NewProductID,
p.ProductType,
p.ProductName,
o.Attributes
) o2
order by
row_number() over (partition by o2.OrderID order by o2.Matches desc)

Related

SQL N To N Relationship Table

I have three tables:
Supplier: stores supplier info
SupplierID  Name
1           Supplier 1
2           Supplier 2
3           Supplier 3
4           Supplier 4
Product: stores product info
ProductID  Name
1          Product 1
2          Product 2
3          Product 3
4          Product 4
5          Product 5
SupplierProduct: stores the products that each supplier can supply
ProductID  SupplierID
2          1
3          1
4          1
2          2
3          2
4          2
3          3
4          3
1          4
2          4
4          4
I want to write a query that takes a set of product IDs and returns the IDs of the suppliers that have all of those products (an N:N relation). For example, given product IDs 2 and 3, it should return just supplier IDs 1 and 2.
This is a question of Relational Division With Remainder, with multiple divisors.
Firstly, to be able to make good solutions for this, you need your input data in tabular form. You can use a table variable or a Table Valued Parameter for this.
There are many solutions. Here is one common one:
Join the input data to the SupplierProduct table. In your case, you only want the Supplier data, so do this in a subquery.
Group it up and check that the count matches the total count of inputs:
DECLARE @ProductInput TABLE (ProductID int);
INSERT @ProductInput (ProductID) VALUES (2),(3);
SELECT *
FROM Supplier s
WHERE (SELECT COUNT(*)
       FROM SupplierProduct sp
       JOIN @ProductInput pi ON pi.ProductID = sp.ProductID
       WHERE sp.SupplierID = s.SupplierID
      ) = (SELECT COUNT(*) FROM @ProductInput)
;
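The same "count the matches" idea can also be written with an explicit GROUP BY and HAVING when only the supplier IDs are needed. The following is a minimal sketch, not a definitive implementation: the table variable @SupplierProduct stands in for the real SupplierProduct table so the batch runs on its own, and it assumes neither the input list nor SupplierProduct contains duplicate rows.

```sql
-- Sketch only: table variables stand in for the real tables.
DECLARE @SupplierProduct TABLE (ProductID int, SupplierID int);
INSERT @SupplierProduct (ProductID, SupplierID) VALUES
    (2,1),(3,1),(4,1),(2,2),(3,2),(4,2),(3,3),(4,3),(1,4),(2,4),(4,4);

-- PRIMARY KEY guards against duplicate inputs, which would break the count check
DECLARE @ProductInput TABLE (ProductID int PRIMARY KEY);
INSERT @ProductInput (ProductID) VALUES (2),(3);

SELECT sp.SupplierID
FROM @SupplierProduct AS sp
JOIN @ProductInput AS inp ON inp.ProductID = sp.ProductID
GROUP BY sp.SupplierID
-- keep only suppliers that matched every requested product
HAVING COUNT(*) = (SELECT COUNT(*) FROM @ProductInput);
```

For the inputs 2 and 3 this should return suppliers 1 and 2, matching the example in the question.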
Another common solution is a double NOT EXISTS. This verifies that there are no inputs which do not have a match. It is generally considered to be less efficient.
DECLARE @ProductInput TABLE (ProductID int);
INSERT @ProductInput (ProductID) VALUES (2),(3);
SELECT *
FROM Supplier s
WHERE NOT EXISTS (SELECT 1
    FROM @ProductInput pi
    WHERE NOT EXISTS (SELECT 1
        FROM SupplierProduct sp
        WHERE pi.ProductID = sp.ProductID
          AND sp.SupplierID = s.SupplierID
    )
);
You can use intersect as follows:
select distinct SupplierID
from SupplierProduct
where ProductID = 2
intersect
select SupplierID
from SupplierProduct
where ProductID = 3
Try this:
DECLARE @Supplier TABLE (SupplierID int, Name varchar(50));
INSERT INTO @Supplier VALUES
  (1, 'Supplier 1')
, (2, 'Supplier 2')
, (3, 'Supplier 3')
, (4, 'Supplier 4')
;
DECLARE @Product TABLE (ProductID int, Name varchar(50));
INSERT INTO @Product VALUES
  (1, 'Product 1')
, (2, 'Product 2')
, (3, 'Product 3')
, (4, 'Product 4')
, (5, 'Product 5')
;
DECLARE @SupplierProduct TABLE (ProductID int, SupplierID int);
INSERT INTO @SupplierProduct VALUES
  (2, 1)
, (3, 1)
, (4, 1)
, (2, 2)
, (3, 2)
, (4, 2)
, (3, 3)
, (4, 3)
, (1, 4)
, (2, 4)
, (4, 4)
;
DECLARE @ProductSelection TABLE (ProductID int);
INSERT INTO @ProductSelection
SELECT
    ProductID
FROM @Product
WHERE 1=1
-- AND ProductID IN (2, 3) -- returns Suppliers 1, 2
-- AND ProductID IN (3, 4) -- returns Suppliers 1, 2, 3
AND ProductID IN (2, 4) -- returns Suppliers 1, 2, 4
;
WITH SupplierList AS
(
    SELECT
        -- numbers each supplier's matched products 1..k
        RowNo = ROW_NUMBER() OVER (PARTITION BY SP.SupplierID ORDER BY SP.SupplierID)
      , S.SupplierID
    FROM @SupplierProduct SP
    JOIN @ProductSelection P ON P.ProductID = SP.ProductID
    JOIN @Supplier S ON S.SupplierID = SP.SupplierID
)
SELECT
    SupplierID
FROM SupplierList
-- a row numbered N exists only if the supplier matched all N selected products
WHERE RowNo = (SELECT SUM(1) FROM @ProductSelection)

SQL Duplicates optimization

I have the following query:
Original query:
SELECT
    cd1.cust_number_id, cd1.cust_number_id, cd1.First_Name, cd1.Last_Name
FROM @Customer_Data cd1
inner join @Customer_Data cd2 on
    cd1.Cd_Id <> cd2.Cd_Id
    and cd2.cust_number_id <> cd1.cust_number_id
    and cd2.First_Name = cd1.First_Name
    and cd2.Last_Name = cd1.Last_Name
inner join @Customer c1 on c1.Cust_id = cd1.cust_number_id
inner join @Customer c2 on c2.cust_id = cd2.cust_number_id
WHERE c1.cust_number <> c2.cust_number
I optimized it as follows, but there is an error in my optimization and I can't find it:
Optimized query:
SELECT cd1.cust_number_id, cd1.cust_number_id, cd1.First_Name, cd1.Last_Name
FROM (
    SELECT cdResult.cust_number_id, cdResult.First_Name, cdResult.Last_Name, COUNT(*) OVER (PARTITION BY cdResult.First_Name, cdResult.Last_Name) as cnt_name_bday
    FROM @Customer_Data cdResult
    WHERE cdResult.First_Name IS NOT NULL
      AND cdResult.Last_Name IS NOT NULL) AS cd1
WHERE cd1.cnt_name_bday > 1;
Test data:
DECLARE @Customer_Data TABLE
(
    Cd_Id INT,
    cust_number_id INT,
    First_Name NVARCHAR(30),
    Last_Name NVARCHAR(30)
)
INSERT @Customer_Data (Cd_Id,cust_number_id,First_Name,Last_Name)
VALUES (1, 22, N'Alex', N'Bor'),
       (2, 22, N'Alex', N'Bor'),
       (3, 23, N'Alex', N'Bor'),
       (4, 24, N'Tom', N'Cruse'),
       (5, 25, N'Tom', N'Cruse')
DECLARE @Customer TABLE
(
    Cust_id INT,
    Cust_number INT
)
INSERT @Customer (Cust_id, Cust_number)
VALUES (22, 022),
       (23, 023),
       (24, 024),
       (25, 025)
The problem is that the original query returns 6 rows (one of the rows appears twice), while the optimized query returns each duplicate row only once. How can I make the optimized query return the repeated row as well?
I would suggest just using window functions:
SELECT CD.cud_customer_id
FROM (SELECT cd.*, COUNT(*) OVER (PARTITION BY cud_name, cud_birthday) as cnt_name_bday FROM dbo.customer_data cd
) cd
WHERE cnt_name_bday > 1;
Your query is finding duplicates for either name or birthday. You want duplicates with both at the same time.
You can use a single EXISTS:
SELECT cd.cud_customer_id
FROM dbo.customer_data AS cd
WHERE EXISTS (SELECT 1
FROM dbo.customer_data AS c
WHERE c.cud_name = cd.cud_name AND c.cud_birthday = cd.cud_birthday AND c.cust_id <> cd.cud_customer_id
);
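To see why the row counts differ: the self-join in the question emits a row once per matching partner, whereas a window count emits each row exactly once. Below is a minimal sketch that computes, for each row, how many times the self-join would return it. It redeclares the question's test data so the batch runs on its own, assumes every cust_number_id maps to a distinct Cust_number in the Customer table, and uses a made-up column name, times_in_self_join, purely for illustration.

```sql
-- Sketch only: same test data as in the question.
DECLARE @Customer_Data TABLE
(
    Cd_Id INT,
    cust_number_id INT,
    First_Name NVARCHAR(30),
    Last_Name NVARCHAR(30)
);
INSERT @Customer_Data (Cd_Id, cust_number_id, First_Name, Last_Name)
VALUES (1, 22, N'Alex', N'Bor'),
       (2, 22, N'Alex', N'Bor'),
       (3, 23, N'Alex', N'Bor'),
       (4, 24, N'Tom', N'Cruse'),
       (5, 25, N'Tom', N'Cruse');

SELECT cd.Cd_Id, cd.cust_number_id, cd.First_Name, cd.Last_Name,
       -- rows sharing the name, minus rows that also share the customer
       COUNT(*) OVER (PARTITION BY cd.First_Name, cd.Last_Name)
     - COUNT(*) OVER (PARTITION BY cd.First_Name, cd.Last_Name, cd.cust_number_id)
       AS times_in_self_join
FROM @Customer_Data AS cd
WHERE cd.First_Name IS NOT NULL
  AND cd.Last_Name IS NOT NULL
ORDER BY cd.Cd_Id;
```

For this data the counts should be 1, 1, 2, 1, 1, which add up to the 6 rows of the original query. Joining this result to a numbers table on that count (and dropping rows where it is 0) would be one way to reproduce the repeated row.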

SQL Update Or Insert By Comparing Dates

I am trying to do an UPDATE or INSERT, but I am not sure if this is possible without using a loop. Here is the example:
Say I have the SQL below, in which I join two tables: tblCompany and tblOrders.
SELECT CompanyID, CompanyName, c.LastSaleDate, a.SalesOrderID, a.SalesPrice
, DATEADD(m, -6, GETDATE()) AS DateLast6MonthFromToday
FROM dbo.tblCompany c
CROSS APPLY (
    SELECT TOP 1 SalesOrderID, SalesPrice
    FROM dbo.tblOrders o
    WHERE c.CompanyID = o.CompanyID
    ORDER BY SalesOrderID DESC
) AS a
WHERE Type = 'End-User'
Sample Result:
CompanyID  SalesOrderID  SalesPrice  LastSalesDate  DateLast6MonthFromToday
101        10001         50          2/01/2016      10/20/2016
102        10002         80          12/01/2016     10/20/2016
103        10003         80          5/01/2016      10/20/2016
What I am trying to do is compare LastSalesDate with DateLast6MonthFromToday. The conditions are:
If LastSalesDate is earlier, then do INSERT INTO tblOrders (CompanyID, Column1, Column2...) VALUES (CompanyIDFromQuery, Column1Value, Column2Value)
Otherwise, do UPDATE tblOrders SET SalesPrice = 1111 WHERE SalesOrderID = a.SalesOrderID
As in the sample result above, the query will only update SalesOrderID 10001 and 10003. For company 102, there is no insert since the LastSaleDate is later; it just gets the UPDATE for its SalesOrderID.
I know it can probably be done if I create a cursor to loop through every record, do the comparison, and then update or insert, but I wonder if there is another way to do this without the loop, since I have around 20K records.
Sorry for the confusion.
I don't know your table structure or your data types. I also know nothing
about duplicates or the join relationship between these two tables.
I only want to show how it works with the following example:
use [your test db];
go
create table dbo.tblCompany
(
companyid int,
companyname varchar(max),
lastsaledate datetime,
[type] varchar(max)
);
create table dbo.tblOrders
(
CompanyID int,
SalesOrderID int,
SalesPrice float
);
insert into dbo.tblCompany
values
(1, 'Avito', '2016-01-01', 'End-User'),
(2, 'BMW', '2016-05-01', 'End-User'),
(3, 'PornHub', '2017-01-01', 'End-User')
insert into dbo.tblOrders
values
(1, 1, 500),
(1, 2, 700),
(1, 3, 900),
(2, 1, 500),
(2, 2, 700),
(2, 3, 900),
(3, 1, 500),
(3, 2, 700),
(3, 3, 900)
declare @column_1_value int = 5;
declare @column_2_value int = 777;
with cte as (
    -- latest order per company
    select
        CompanyID,
        SalesOrderID,
        SalesPrice
    from (
        select
            CompanyID,
            SalesOrderID,
            SalesPrice,
            row_number() over(partition by CompanyID order by SalesOrderId desc) as rn
        from
            dbo.tblOrders
    ) t
    where rn = 1
)
merge cte as target
using (select * from dbo.tblCompany where [type] = 'End-User') as source
on target.companyid = source.companyid
and source.lastsaledate >= dateadd(month, -6, getdate())
when matched
    then update set target.salesprice = 1111
when not matched
    then insert (
        CompanyID,
        SalesOrderID,
        SalesPrice
    )
    values (
        source.CompanyId,
        @column_1_value,
        @column_2_value
    );
select * from dbo.tblOrders
If you give me more information, I can prepare the target and source tables properly.
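If MERGE feels too dense, the same set-based idea can be split into one UPDATE and one INSERT. The following is a minimal sketch under stated assumptions, not a drop-in replacement: it relies on the dbo.tblCompany and dbo.tblOrders tables created above, reuses the placeholder values 5 and 777 for the new rows, and the variable @cutoff is a name introduced here for illustration. Unlike the MERGE, it does not insert anything for companies that have no orders but a recent sale date.

```sql
-- Sketch only: assumes dbo.tblCompany and dbo.tblOrders as created above.
DECLARE @cutoff datetime = DATEADD(month, -6, GETDATE());
DECLARE @column_1_value int = 5;    -- placeholder SalesOrderID for new rows
DECLARE @column_2_value int = 777;  -- placeholder SalesPrice for new rows

-- 1) Recent sale: update only the latest order of each End-User company.
WITH latest AS (
    SELECT CompanyID, SalesPrice,
           ROW_NUMBER() OVER (PARTITION BY CompanyID ORDER BY SalesOrderID DESC) AS rn
    FROM dbo.tblOrders
)
UPDATE latest
SET SalesPrice = 1111
WHERE rn = 1
  AND EXISTS (SELECT 1
              FROM dbo.tblCompany AS c
              WHERE c.companyid = latest.CompanyID
                AND c.[type] = 'End-User'
                AND c.lastsaledate >= @cutoff);

-- 2) Old sale: add one new order row per End-User company.
INSERT INTO dbo.tblOrders (CompanyID, SalesOrderID, SalesPrice)
SELECT c.companyid, @column_1_value, @column_2_value
FROM dbo.tblCompany AS c
WHERE c.[type] = 'End-User'
  AND c.lastsaledate < @cutoff;
```

The two statements touch disjoint sets of companies, so their order does not matter; wrapping them in a single transaction keeps the change atomic.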

SQL return only distinct IDs from LEFT JOIN

I've inherited some fun SQL and am trying to figure out how to eliminate rows with duplicate IDs. Our indexes are stored in a somewhat columnar format and then we pivot all the rows into one with the values as different columns.
The below sample returns three rows of unique data, but the IDs are duplicated. I need just two rows with unique IDs (and the other columns that go along with it). I know I'll be losing some data, but I just need one matching row per ID to the query (first, top, oldest, newest, whatever).
I've tried using DISTINCT, GROUP BY, and ROW_NUMBER, but I keep getting the syntax wrong, or using them in the wrong place.
I'm also open to rewriting the query completely in a way that is reusable as I currently have to generate this on the fly (cardtypes and cardindexes are user defined) and would love to be able to create a stored procedure. Thanks in advance!
declare @cardtypes table ([ID] int, [Name] nvarchar(50))
declare @cards table ([ID] int, [CardTypeID] int, [Name] nvarchar(50))
declare @cardindexes table ([ID] int, [CardID] int, [IndexType] int, [StringVal] nvarchar(255), [DateVal] datetime)
INSERT INTO @cardtypes VALUES (1, 'Funny Cards')
INSERT INTO @cardtypes VALUES (2, 'Sad Cards')
INSERT INTO @cards VALUES (1, 1, 'Bunnies')
INSERT INTO @cards VALUES (2, 1, 'Dogs')
INSERT INTO @cards VALUES (3, 1, 'Cat')
INSERT INTO @cards VALUES (4, 1, 'Cat2')
INSERT INTO @cardindexes VALUES (1, 1, 1, 'Bunnies', null)
INSERT INTO @cardindexes VALUES (2, 1, 1, 'playing', null)
INSERT INTO @cardindexes VALUES (3, 1, 2, null, '2014-09-21')
INSERT INTO @cardindexes VALUES (4, 2, 1, 'Dogs', null)
INSERT INTO @cardindexes VALUES (5, 2, 1, 'playing', null)
INSERT INTO @cardindexes VALUES (6, 2, 1, 'poker', null)
INSERT INTO @cardindexes VALUES (7, 2, 2, null, '2014-09-22')
SELECT TOP(100)
    [ID] = c.[ID],
    [Name] = c.[Name],
    [Keyword] = [colKeyword].[StringVal],
    [DateAdded] = [colDateAdded].[DateVal]
FROM @cards AS c
LEFT JOIN @cardindexes AS [colKeyword] ON [colKeyword].[CardID] = c.ID AND [colKeyword].[IndexType] = 1
LEFT JOIN @cardindexes AS [colDateAdded] ON [colDateAdded].[CardID] = c.ID AND [colDateAdded].[IndexType] = 2
WHERE [colKeyword].[StringVal] LIKE 'p%' AND c.[CardTypeID] = 1
ORDER BY [DateAdded]
Edit:
While both solutions are valid, I ended up using the MAX() solution from @popovitsj as it was easier to implement. The issue of data coming from multiple rows doesn't really factor in for me as all rows are essentially part of the same record. I will most likely use both solutions depending on my needs.
Here's my updated query (as it didn't quite match the answer):
SELECT TOP(100)
    [ID] = c.[ID],
    [Name] = MAX(c.[Name]),
    [Keyword] = MAX([colKeyword].[StringVal]),
    [DateAdded] = MAX([colDateAdded].[DateVal])
FROM @cards AS c
LEFT JOIN @cardindexes AS [colKeyword] ON [colKeyword].[CardID] = c.ID AND [colKeyword].[IndexType] = 1
LEFT JOIN @cardindexes AS [colDateAdded] ON [colDateAdded].[CardID] = c.ID AND [colDateAdded].[IndexType] = 2
WHERE [colKeyword].[StringVal] LIKE 'p%' AND c.[CardTypeID] = 1
GROUP BY c.ID
ORDER BY [DateAdded]
You could use MAX or MIN to 'decide' on what to display for the other columns in the rows that are duplicate.
SELECT ID, MAX(Name), MAX(Keyword), MAX(DateAdded)
(...)
GROUP BY ID;
Using the ROW_NUMBER windowed function along with a CTE will do this pretty well. For example:
;With preResult AS (
    SELECT TOP(100)
        [ID] = c.[ID],
        [Name] = c.[Name],
        [Keyword] = [colKeyword].[StringVal],
        [DateAdded] = [colDateAdded].[DateVal],
        ROW_NUMBER() OVER (PARTITION BY c.ID ORDER BY [colDateAdded].[DateVal]) rn
    FROM @cards AS c
    LEFT JOIN @cardindexes AS [colKeyword] ON [colKeyword].[CardID] = c.ID AND [colKeyword].[IndexType] = 1
    LEFT JOIN @cardindexes AS [colDateAdded] ON [colDateAdded].[CardID] = c.ID AND [colDateAdded].[IndexType] = 2
    WHERE [colKeyword].[StringVal] LIKE 'p%' AND c.[CardTypeID] = 1
    ORDER BY [DateAdded]
)
SELECT * from preResult WHERE rn = 1
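A third option, offered only as a sketch, is to pick one index row per card with APPLY and TOP (1), so that the duplicates never arise in the first place. The sample data is a trimmed copy of the question's (table variables are batch-scoped, so they are redeclared here), and ordering by the index row's ID is an arbitrary tie-breaker chosen for illustration.

```sql
-- Sketch only: a trimmed copy of the question's sample data.
DECLARE @cards TABLE ([ID] int, [CardTypeID] int, [Name] nvarchar(50));
DECLARE @cardindexes TABLE ([ID] int, [CardID] int, [IndexType] int,
                            [StringVal] nvarchar(255), [DateVal] datetime);
INSERT INTO @cards VALUES (1, 1, 'Bunnies'), (2, 1, 'Dogs'), (3, 1, 'Cat');
INSERT INTO @cardindexes VALUES
    (1, 1, 1, 'Bunnies', NULL), (2, 1, 1, 'playing', NULL), (3, 1, 2, NULL, '2014-09-21'),
    (4, 2, 1, 'Dogs', NULL),    (5, 2, 1, 'playing', NULL), (6, 2, 1, 'poker', NULL),
    (7, 2, 2, NULL, '2014-09-22');

SELECT c.[ID], c.[Name], kw.[StringVal] AS [Keyword], da.[DateVal] AS [DateAdded]
FROM @cards AS c
-- CROSS APPLY drops cards with no matching keyword, like the WHERE filter in the question
CROSS APPLY (SELECT TOP (1) ci.[StringVal]
             FROM @cardindexes AS ci
             WHERE ci.[CardID] = c.[ID] AND ci.[IndexType] = 1
               AND ci.[StringVal] LIKE 'p%'
             ORDER BY ci.[ID]) AS kw
-- OUTER APPLY keeps the card even when it has no date row
OUTER APPLY (SELECT TOP (1) ci.[DateVal]
             FROM @cardindexes AS ci
             WHERE ci.[CardID] = c.[ID] AND ci.[IndexType] = 2
             ORDER BY ci.[ID]) AS da
WHERE c.[CardTypeID] = 1
ORDER BY da.[DateVal];
```

With this data it should return one row each for cards 1 and 2, both with the keyword 'playing'. Each APPLY returns at most one row, so every card ID appears at most once without a GROUP BY.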

SQL: JOIN with 'near' match

I need to do a JOIN with a 'near match'. The best way to explain this is with an example:
CREATE TABLE Car
(
Vin int,
Make nvarchar(50),
ColorID int,
)
CREATE TABLE Color
(
ColorID int,
ColorCode nvarchar(10)
)
CREATE TABLE ColorName
(
ColorID int,
Languagecode varchar(12),
ColorName nvarchar(50)
)
INSERT INTO Color Values (1, 'RED CODE')
INSERT INTO Color Values (2, 'GREEN CODE')
INSERT INTO Color Values (3, 'BLUE CODE')
INSERT INTO ColorName Values (1, 'en', 'Red')
INSERT INTO ColorName Values (1, 'en-US', 'Red, my friend')
INSERT INTO ColorName Values (1, 'en-GB', 'Red, my dear')
INSERT INTO ColorName Values (1, 'en-AU', 'Red, mate')
INSERT INTO ColorName Values (1, 'fr', 'Rouge')
INSERT INTO ColorName Values (1, 'fr-BE', 'Rouge, mon ami')
INSERT INTO ColorName Values (1, 'fr-CA', 'Rouge, mon chum')
INSERT INTO Car Values (123, 'Honda', 1)
The SPROC would look like this:
DECLARE @LanguageCode varchar(12) = 'en-US'
SELECT * FROM Car A
JOIN Color B ON (A.ColorID = B.ColorID)
LEFT JOIN ColorName C ON (B.ColorID = C.ColorID AND C.LanguageCode = @LanguageCode)
See http://sqlfiddle.com/#!6/ac24d/24 (thanks to Jake!)
Here is the challenge:
When the SPROC parameter @LanguageCode is an exact match, all is well.
I would like it to also work for partial matches. More specifically: if, for example, @LanguageCode is 'en-NZ', then I would like the SPROC to return the value for language code 'en' (since there is no value for 'en-NZ').
As an extra challenge: if there is no match at all, I would like to return the 'en' value. For example, if @LanguageCode is 'es', then the SPROC would return the 'en' value (since there is no value for 'es').
Try left(@LanguageCode, 2) + '%'
http://sqlfiddle.com/#!6/ac24d/26
About the second part: you have to query the table twice anyway (you can do it in one statement, but it will be like two statements in one). You can also insert the data into a temporary table (or table variable), check whether there are no rows, and then make another query.
I've made a query with a table function
http://sqlfiddle.com/#!6/b7be3/5
So you can write
DECLARE @LanguageCode varchar(12) = 'es'
if not exists (select * from sf_test(@LanguageCode))
    select * from sf_test('en')
else
    select * from sf_test(@LanguageCode)
you also can write
declare @temp table
(
    Vin int,
    Make nvarchar(50),
    ColorCode nvarchar(10)
)
insert into @temp
select * from sf_test(@LanguageCode)
if not exists (select * from @temp)
    select * from sf_test('en')
else
    select * from @temp
As @Roman Pekar has said in his comment, this can indeed be done, including your additional request about falling back to en, in one statement with the help of a ranking function. Here's how you could go about it:
WITH FilteredAndRanked AS (
    SELECT
        *,
        rnk = ROW_NUMBER() OVER (
            PARTITION BY ColorID
            ORDER BY CASE LanguageCode
                WHEN @LanguageCode THEN 1
                WHEN LEFT(@LanguageCode, 2) THEN 2
                WHEN 'en' THEN 3
            END
        )
    FROM ColorName
    WHERE LanguageCode IN (
        @LanguageCode,
        LEFT(@LanguageCode, 2),
        'en'
    )
)
SELECT
    ...
FROM Car A
INNER JOIN Color B ON (A.ColorID = B.ColorID)
LEFT JOIN FilteredAndRanked C ON (B.ColorID = C.ColorID AND C.rnk = 1)
;
That is, the ColorName table is filtered and ranked before being used in the query, and then only the rows with the rankings of 1 are joined:
The filter for ColorName includes only rows with LanguageCode values of @LanguageCode, LEFT(@LanguageCode, 2) and 'en'.
The ranking values are assigned based on which language code each row contains: rows with LEFT(@LanguageCode, 2) are ranked after those with @LanguageCode but before the 'en' ones.
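To make the fallback order concrete, here is a small self-contained sketch of just the filter-and-rank step. The table variable @ColorName and its three rows are a cut-down stand-in for the question's ColorName table, chosen for illustration only.

```sql
-- Sketch only: a cut-down stand-in for the ColorName table.
DECLARE @LanguageCode varchar(12) = 'en-NZ';
DECLARE @ColorName TABLE (ColorID int, LanguageCode varchar(12), ColorName nvarchar(50));
INSERT INTO @ColorName VALUES
    (1, 'en',    N'Red'),
    (1, 'en-US', N'Red, my friend'),
    (1, 'fr',    N'Rouge');

SELECT r.ColorID, r.LanguageCode, r.ColorName
FROM (
    SELECT *,
           ROW_NUMBER() OVER (
               PARTITION BY ColorID
               ORDER BY CASE LanguageCode
                            WHEN @LanguageCode THEN 1          -- exact match
                            WHEN LEFT(@LanguageCode, 2) THEN 2 -- same base language
                            ELSE 3                             -- 'en' fallback
                        END
           ) AS rnk
    FROM @ColorName
    WHERE LanguageCode IN (@LanguageCode, LEFT(@LanguageCode, 2), 'en')
) AS r
WHERE r.rnk = 1;
```

For 'en-NZ' only the 'en' row passes the filter, so the query should return 'Red'. Changing @LanguageCode to 'en-US' should return 'Red, my friend' (exact match wins), and 'es' should again fall back to 'Red'.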