SQL Update Or Insert By Comparing Dates - sql

I am trying to do an UPDATE or an INSERT, but I am not sure whether this is possible without using a loop. Here is an example:
Say I have the SQL below, which joins two tables, tblCompany and tblOrders.
SELECT c.CompanyID, c.CompanyName, c.LastSaleDate, a.SalesOrderID, a.SalesPrice
, DATEADD(m, -6, GETDATE()) AS DateLast6MonthFromToday
FROM dbo.tblCompany c
CROSS APPLY (
SELECT TOP 1 SalesOrderID, SalesPrice
FROM dbo.tblOrders o
WHERE c.CompanyID = o.CompanyID
ORDER BY SalesOrderID DESC
) AS a
WHERE Type = 'End-User'
Sample Result:
CompanyID  SalesOrderID  SalesPrice  LastSalesDate  DateLast6MonthFromToday
101        10001         50          2/01/2016      10/20/2016
102        10002         80          12/01/2016     10/20/2016
103        10003         80          5/01/2016      10/20/2016
What I am trying to do is compare the LastSalesDate with the DateLast6MonthFromToday. The conditions are:
If the LastSalesDate is earlier, then do INSERT INTO tblOrders (CompanyID, Column1, Column2...) VALUES (CompanyIDFromQuery, Column1Value, Column2Value)
Else, do UPDATE tblOrders SET SalesPrice = 1111 WHERE SalesOrderID = a.SalesOrderID
In the sample result above, companies 101 and 103 have a LastSalesDate earlier than the cutoff, so they each get an INSERT. For company 102 there is NO insert, since its LastSalesDate is later; its SalesOrderID 10002 is just updated.
I know this can probably be done by creating a cursor that loops through every record, does the comparison and then the UPDATE or INSERT, but I wonder if there is another way to do this without a loop, since I have around 20K records.
Sorry for the confusion.

I don't know your table structure or your data types. I also know nothing
about duplicates or the join relationship between these two tables.
I only want to show how it works with the following example:
use [your test db];
go
create table dbo.tblCompany
(
companyid int,
companyname varchar(max),
lastsaledate datetime,
[type] varchar(max)
);
create table dbo.tblOrders
(
CompanyID int,
SalesOrderID int,
SalesPrice float
);
insert into dbo.tblCompany
values
(1, 'Avito', '2016-01-01', 'End-User'),
(2, 'BMW', '2016-05-01', 'End-User'),
(3, 'PornHub', '2017-01-01', 'End-User')
insert into dbo.tblOrders
values
(1, 1, 500),
(1, 2, 700),
(1, 3, 900),
(2, 1, 500),
(2, 2, 700),
(2, 3, 900),
(3, 1, 500),
(3, 2, 700),
(3, 3, 900)
declare @column_1_value int = 5;
declare @column_2_value int = 777;
-- cte exposes only the latest order (highest SalesOrderID) of each company
with cte as (
select
CompanyID,
SalesOrderID,
SalesPrice
from (
select
CompanyID,
SalesOrderID,
SalesPrice,
row_number() over(partition by CompanyID order by SalesOrderId desc) as rn
from
dbo.tblOrders
) t
where rn = 1
)
merge cte as target
using (select * from dbo.tblCompany where [type] = 'End-User') as source
on target.companyid = source.companyid
and source.lastsaledate >= dateadd(month, -6, getdate())
-- last sale within the past 6 months: update the latest order
when matched
then update set target.salesprice = 1111
-- otherwise: insert a new order row for the company
when not matched
then insert (
CompanyID,
SalesOrderID,
SalesPrice
)
values (
source.CompanyId,
@column_1_value,
@column_2_value
);
select * from dbo.tblOrders
If you give me more information, I can prepare the target and source tables properly.
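For comparison, the same update-or-insert split can also be written without MERGE, as two plain set-based statements. This is only a sketch against the test tables created above: the temp table name #latest and the variable @cutoff are made up for illustration, and @column_1_value / @column_2_value are the same placeholder values as in the MERGE example.

```sql
-- Sketch only: assumes the dbo.tblCompany / dbo.tblOrders test tables from above.
DECLARE @cutoff datetime = DATEADD(month, -6, GETDATE());
DECLARE @column_1_value int = 5;    -- placeholder SalesOrderID for inserted rows
DECLARE @column_2_value int = 777;  -- placeholder SalesPrice for inserted rows

-- Capture the latest order of every End-User company once,
-- so the UPDATE and the INSERT both work from the same snapshot.
SELECT c.CompanyID, c.LastSaleDate, a.SalesOrderID
INTO #latest
FROM dbo.tblCompany AS c
CROSS APPLY (
    SELECT TOP (1) o.SalesOrderID
    FROM dbo.tblOrders AS o
    WHERE o.CompanyID = c.CompanyID
    ORDER BY o.SalesOrderID DESC
) AS a
WHERE c.[Type] = 'End-User';

-- Last sale on or after the cutoff: update the latest order.
UPDATE o
SET o.SalesPrice = 1111
FROM dbo.tblOrders AS o
JOIN #latest AS l
  ON l.CompanyID = o.CompanyID
 AND l.SalesOrderID = o.SalesOrderID
WHERE l.LastSaleDate >= @cutoff;

-- Last sale before the cutoff: insert a new order row instead.
INSERT INTO dbo.tblOrders (CompanyID, SalesOrderID, SalesPrice)
SELECT l.CompanyID, @column_1_value, @column_2_value
FROM #latest AS l
WHERE l.LastSaleDate < @cutoff;

DROP TABLE #latest;
```

One difference from the MERGE version: because of the CROSS APPLY, companies that have no orders at all are skipped here, whereas the MERGE would insert a row for them.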

Related

SQL Duplicates optimization

I have the following query:
Original query:
SELECT
cd1.cust_number_id, cd1.cust_number_id, cd1.First_Name, cd1.Last_Name
FROM #Customer_Data cd1
inner join #Customer_Data cd2 on
cd1.Cd_Id <> cd2.Cd_Id
and cd2.cust_number_id <> cd1.cust_number_id
and cd2.First_Name = cd1.First_Name
and cd2.Last_Name = cd1.Last_Name
inner join #Customer c1 on c1.Cust_id = cd1.cust_number_id
inner join #Customer c2 on c2.cust_id = cd2.cust_number_id
WHERE c1.cust_number <> c2.cust_number
I optimized it as follows, but there is an error in my optimization and I can't find it:
Optimized query:
SELECT cd1.cust_number_id, cd1.cust_number_id, cd1.First_Name,cd1.Last_Name
FROM (
SELECT cdResult.cust_number_id, cdResult.First_Name,cdResult.Last_Name, COUNT(*) OVER (PARTITION BY cdResult.First_Name, cdResult.Last_Name) as cnt_name_bday
FROM #Customer_Data cdResult
WHERE cdResult.First_Name IS NOT NULL
AND cdResult.Last_Name IS NOT NULL) AS cd1
WHERE cd1.cnt_name_bday > 1;
Test data:
CREATE TABLE #Customer_Data
(
Cd_Id INT,
cust_number_id INT,
First_Name NVARCHAR(30),
Last_Name NVARCHAR(30)
)
INSERT #Customer_Data (Cd_Id,cust_number_id,First_Name,Last_Name)
VALUES (1, 22, N'Alex', N'Bor'),
(2, 22, N'Alex', N'Bor'),
(3, 23, N'Alex', N'Bor'),
(4, 24, N'Tom', N'Cruse'),
(5, 25, N'Tom', N'Cruse')
CREATE TABLE #Customer
(
Cust_id INT,
Cust_number INT
)
INSERT #Customer (Cust_id, Cust_number)
VALUES (22, 022),
(23, 023),
(24, 024),
(25, 025)
The problem is that the original query returns 6 rows (some rows are repeated), while the optimized query returns each duplicate row only once. How can I make the optimized query repeat the rows in the same way?
I would suggest just using window functions:
SELECT CD.cud_customer_id
FROM (SELECT cd.*, COUNT(*) OVER (PARTITION BY cud_name, cud_birthday) as cnt_name_bday FROM dbo.customer_data cd
) cd
WHERE cnt_name_bday > 1;
Your query is finding duplicates for either name or birthday. You want duplicates with both at the same time.
You can also use a single EXISTS:
SELECT cd.cud_customer_id
FROM dbo.customer_data AS cd
WHERE EXISTS (SELECT 1
FROM dbo.customer_data AS c
WHERE c.cud_name = cd.cud_name AND c.cud_birthday = cd.cud_birthday AND c.cust_id <> cd.cud_customer_id
);

SQL Joining on Field with Nulls

I'm trying to match two tables where one of the tables stores multiple values as a string.
In the example below I need to classify each product ordered from the #Orders table with a #NewProduct.NewProductId.
The issue I'm having is sometimes we launch a new product like "Black Shirt",
then later we launch an adaption to that product like "Black Shirt Vneck".
I need to match both changes correctly to the #Orders table. So if the order has Black and Shirt, but not Vneck, it's considered a "Black Shirt", but if the order has Black and Shirt and Vneck, it's considered a "Black Vneck Shirt."
The code below is an example - the current logic I'm using returns duplicates with the Left Join.
Also, assume we can modify the format of #NewProducts but not #Orders.
IF OBJECT_ID('tempdb.dbo.#NewProducts') IS NOT NULL DROP TABLE #NewProducts
CREATE TABLE #NewProducts
(
ProductType VARCHAR(MAX)
, Attribute_1 VARCHAR(MAX)
, Attribute_2 VARCHAR(MAX)
, NewProductId INT
)
INSERT #NewProducts
VALUES
('shirt', 'black', 'NULL', 1),
('shirt', 'black', 'vneck', 2),
('shirt', 'white', 'NULL', 3)
IF OBJECT_ID('tempdb.dbo.#Orders') IS NOT NULL DROP TABLE #Orders
CREATE TABLE #Orders
(
OrderId INT
, ProductType VARCHAR(MAX)
, Attributes VARCHAR(MAX)
)
INSERT #Orders
VALUES
(1, 'shirt', 'black small circleneck'),
(2, 'shirt', 'black large circleneck'),
(3, 'shirt', 'black small vneck'),
(4, 'shirt', 'black small vneck'),
(5, 'shirt', 'white large circleneck'),
(6, 'shirt', 'white small vneck')
SELECT *
FROM #Orders o
LEFT JOIN #NewProducts np
ON o.ProductType = np.ProductType
AND CHARINDEX(np.Attribute_1, o.Attributes) > 0
AND (
CHARINDEX(np.Attribute_2, o.Attributes) > 0
OR np.Attribute_2 = 'NULL'
)
You seem to want the longest overlap:
SELECT *
FROM #Orders o OUTER APPLY
(SELECT Top (1) np.*
FROM #NewProducts np
WHERE o.ProductType = np.ProductType AND
CHARINDEX(np.Attribute_1, o.Attributes) > 0
ORDER BY ((CASE WHEN CHARINDEX(np.Attribute_1, o.Attributes) > 0 THEN 1 ELSE 0 END) +
(CASE WHEN CHARINDEX(np.Attribute_2, o.Attributes) > 0 THEN 1 ELSE 0 END)
) DESC
) np;
I can't say I'm thrilled with the need to do this. It seems like the Orders should contain numeric ids that reference the actual product. However, I can see how something like this is sometimes necessary.
I couldn't get Gordon's answer to work, and was part way through my own response when his came in. His idea of taking the biggest overlap helped. I've tweaked your NewProducts table, so that that side of things is "normalised" even if the Orders table cannot be. Code below or at rextester.com/ERIF13021
create table #NewProduct
(
NewProductID int primary key,
ProductType varchar(max),
ProductName varchar(max)
)
create table #Attribute
(
AttributeID int primary key,
AttributeName varchar(max)
)
create table #ProductAttribute
(
NewProductID int,
AttributeID int
)
insert into #NewProduct
values (1, 'shirt', 'black shirt'),
(2, 'shirt', 'black vneck shirt'),
(3, 'shirt', 'white shirt')
insert into #Attribute
values (1, 'black'),
(2, 'white'),
(3, 'vneck')
insert into #ProductAttribute
values (1,1),
(2,1),
(2,3),
(3,2)
select top 1 with ties
*
from
(
select
o.OrderId,
p.NewProductID,
p.ProductType,
p.ProductName,
o.Attributes,
sum(case when charindex(a.AttributeName,o.Attributes)>0 then 1 else 0 end) as Matches
from
#Orders o
JOIN #Attribute a ON
charindex(a.AttributeName,o.Attributes)>0
JOIN #ProductAttribute pa ON
a.AttributeID = pa.AttributeID
JOIN #NewProduct p ON
pa.NewProductID = p.NewProductID AND
o.ProductType = p.ProductType
group by
o.OrderId,
p.NewProductID,
p.ProductType,
p.ProductName,
o.Attributes
) o2
order by
row_number() over (partition by o2.OrderID order by o2.Matches desc)

How find duplicates in a table with no primary key or ID field?

I've inherited a SQL Server database that has duplicate data in it. I need to find and remove the duplicate rows. But without an id field, I'm not sure how to find the rows.
Normally, I'd compare the table with itself using a LEFT JOIN and check that all fields are the same except the ID field (table1.id <> table2.id), but without that, I don't know how to find duplicate rows without each row also matching itself.
TABLE:
productId int not null,
categoryId int not null,
state varchar(255) not null,
dateDone DATETIME not null
SAMPLE DATA
1, 3, "started", "2016-06-15 04:23:12.000"
2, 3, "started", "2016-06-15 04:21:12.000"
1, 3, "started", "2016-06-15 04:23:12.000"
1, 3, "done", "2016-06-15 04:23:12.000"
In that sample, only rows 1 and 3 are duplicates.
How do I find duplicates?
Use HAVING (with GROUP BY):
select
productId
, categoryId
, state
, dateDone
, count(*)
from your_table
group by productId ,categoryId ,state, dateDone
having count(*) >1
You can do this with windowing functions. For instance
create table #tmp
(
Id INT
)
insert into #tmp
VALUES (1), (1), (2) --so now we have duplicated rows
;WITH CTE AS
(
SELECT
ROW_NUMBER() OVER(PARTITION BY Id ORDER BY Id) AS [DuplicateCounter],
Id
FROM #tmp
)
DELETE FROM CTE
WHERE DuplicateCounter > 1 --duplicated rows have DuplicateCounter > 1
For some reason I thought you wanted to delete them; I guess I read that wrong. Just switch DELETE in my statement to SELECT * and you get all of the duplicates but not the originals. Using DELETE removes all the duplicates and still leaves you one record of each, which I suspect is what you want.
IF OBJECT_ID('tempdb..#TT') IS NOT NULL
BEGIN
DROP TABLE #TT
END
CREATE TABLE #TT (
productId int not null,
categoryId int not null,
state varchar(255) not null,
dateDone DATETIME not null
)
INSERT INTO #TT (productId, categoryId, state, dateDone)
VALUES (1, 3, 'started', '2016-06-15 04:23:12.000')
,(2, 3, 'started', '2016-06-15 04:21:12.000')
,(1, 3, 'started', '2016-06-15 04:23:12.000')
,(1, 3, 'done', '2016-06-15 04:23:12.000')
SELECT *
FROM
#TT
;WITH cte AS (
SELECT
*
,RowNum = ROW_NUMBER() OVER (PARTITION BY productId, categoryId, state, dateDone ORDER BY productId) --note what you order by doesn't matter
FROM
#TT
)
--if you want to delete them just do this otherwise change DELETE TO SELECT
DELETE
FROM
cte
WHERE
RowNum > 1
SELECT *
FROM
#TT
If you want to, and can change the schema, you can always add an identity column after the fact too; it will be populated for the existing records:
ALTER TABLE #TT
ADD Id INTEGER IDENTITY(1,1) NOT NULL
You can try a CTE and then limit the actual selection from the CTE to the rows where RN = 1. Here is the query:
;WITH ACTE
AS
(
SELECT ProductID, categoryID, State, DateDone,
RN = ROW_NUMBER() OVER(PARTITION BY ProductID, CategoryID, State, DateDone
ORDER BY ProductID, CategoryID, State, DateDone)
FROM [Table]
)
SELECT * FROM ACTE WHERE RN = 1
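If the goal is to list every copy of a duplicated row (rows 1 and 3 in the question's sample), a windowed COUNT(*) is one way to sketch it. The temp table name #Dup and the column alias CopyCount are made up for this illustration; the data is the sample from the question.

```sql
-- Self-contained sketch; #Dup is a hypothetical name holding the question's sample rows.
CREATE TABLE #Dup (
    productId  int          NOT NULL,
    categoryId int          NOT NULL,
    state      varchar(255) NOT NULL,
    dateDone   datetime     NOT NULL
);

INSERT INTO #Dup (productId, categoryId, state, dateDone)
VALUES (1, 3, 'started', '2016-06-15T04:23:12'),
       (2, 3, 'started', '2016-06-15T04:21:12'),
       (1, 3, 'started', '2016-06-15T04:23:12'),
       (1, 3, 'done',    '2016-06-15T04:23:12');

-- COUNT(*) OVER puts the size of each group on every row of that group,
-- so filtering on it keeps all copies of a duplicated row.
SELECT productId, categoryId, state, dateDone, CopyCount
FROM (
    SELECT *,
           COUNT(*) OVER (PARTITION BY productId, categoryId, state, dateDone) AS CopyCount
    FROM #Dup
) AS d
WHERE CopyCount > 1;   -- returns the two (1, 3, 'started', ...) rows

DROP TABLE #Dup;
```

Unlike the GROUP BY / HAVING form, which returns one summary row per duplicated group, this returns the individual rows themselves.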

how to loop thru a table to find data set?

I have to find the time difference in minutes across an order's lifetime,
i.e. the time from when the order was received (Activity ID 1) to keyed (2) to printed (3) to delivered (4), for each order.
I am completely lost as to which approach I should take.
Should I use CASE or an IF ... THEN statement? Something like a for-each to loop through each record?
What would be the most efficient way to do it?
I know that once I get the dates into the correct variables I can use DATEDIFF.
declare @received as Datetime, @keyed as DateTime, @printed as Datetime, @Delivered as Datetime, @TurnTime1 as int
Select
IF (tblOrderActivity.ActivityID = 1) SET @received = tblOrderActivity.ActivityDate
---
----
from tblOrderActivity
where OrderID = 1
It should show me @TurnTime1 = 48 mins, as order ID 1 took 48 mins from received (activity ID 1) to keyed (activity ID 2), @TurnTime2 = 29 mins, as it took 29 mins for order 1 from keyed (activity ID 2) to printed (activity ID 3), and so on for each order.
You can do this easily by pivoting the data. It can be done in two ways.
1. Use conditional aggregation to pivot the data. After pivoting, you can find the DATEDIFF between the different stages. Try this:
SELECT orderid,Received,Keyed,Printed,Delivered,
Datediff(minute, Received, Keyed) TurnTime1,
Datediff(minute, Keyed, Printed) TurnTime2,
Datediff(minute, Printed, Delivered) TurnTime3
FROM (SELECT OrderID,
Max(CASE WHEN ActivityID = 1 THEN ActivityDate END) Received,
Max(CASE WHEN ActivityID = 2 THEN ActivityDate END) Keyed,
Max(CASE WHEN ActivityID = 3 THEN ActivityDate END) Printed,
Max(CASE WHEN ActivityID = 4 THEN ActivityDate END) Delivered
FROM Yourtable
GROUP BY OrderID)A
2. Use PIVOT to transpose the data:
SELECT orderid,
[1] AS Received,
[2] AS Keyed,
[3] AS Printed,
[4] AS Delivered,
Datediff(minute, [1], [2]) TurnTime1,
Datediff(minute, [2], [3]) TurnTime2,
Datediff(minute, [3], [4]) TurnTime3
FROM Yourtable
PIVOT (Max(ActivityDate)
FOR ActivityID IN([1],[2],[3],[4]))piv
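To see the conditional-aggregation approach on concrete rows, here is a small self-contained sketch. The table variable @Activity and its timestamps are illustrative values, chosen so that order 1 has the 48- and 29-minute gaps described in the question; order 2 has only been received.

```sql
-- Illustrative data only; @Activity is a hypothetical stand-in for tblOrderActivity.
DECLARE @Activity TABLE (OrderID int, ActivityID int, ActivityDate datetime);

INSERT INTO @Activity (OrderID, ActivityID, ActivityDate)
VALUES (1, 1, '2007-04-16T08:34:00'),
       (1, 2, '2007-04-16T09:22:00'),
       (1, 3, '2007-04-16T09:51:00'),
       (1, 4, '2007-04-16T16:14:00'),
       (2, 1, '2007-04-16T08:34:00');

SELECT OrderID,
       DATEDIFF(minute, Received, Keyed)     AS TurnTime1,
       DATEDIFF(minute, Keyed, Printed)      AS TurnTime2,
       DATEDIFF(minute, Printed, Delivered)  AS TurnTime3
FROM (SELECT OrderID,
             -- one column per activity; missing activities stay NULL
             MAX(CASE WHEN ActivityID = 1 THEN ActivityDate END) AS Received,
             MAX(CASE WHEN ActivityID = 2 THEN ActivityDate END) AS Keyed,
             MAX(CASE WHEN ActivityID = 3 THEN ActivityDate END) AS Printed,
             MAX(CASE WHEN ActivityID = 4 THEN ActivityDate END) AS Delivered
      FROM @Activity
      GROUP BY OrderID) AS p
ORDER BY OrderID;
-- Order 1: TurnTime1 = 48, TurnTime2 = 29, TurnTime3 = 383
-- Order 2: all three are NULL, because DATEDIFF returns NULL when either date is NULL
```

The NULL results for unfinished orders are a design choice of this form; wrap the dates in ISNULL(..., GETDATE()) if a running duration is wanted instead.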
As you mentioned in your question, one possible way is to use a CASE expression inside a loop:
DECLARE @i INT, @max INT
SELECT @i = MIN(OrderId) FROM tblOrderActivity
SELECT @max = MAX(OrderId) FROM tblOrderActivity
WHILE @i <= @max
BEGIN
-- one result set per order; the subqueries assume the dates come from tblOrderActivity
SELECT OrderId
,ActivityID
,ActivityDate
,CASE
WHEN ActivityID = 1 THEN DATEDIFF(MINUTE, ActivityDate, (SELECT ActivityDate FROM tblOrderActivity WHERE ActivityID = 2 AND OrderId = @i))
END AS tokeyed
,CASE
WHEN ActivityID = 2 THEN DATEDIFF(MINUTE, ActivityDate, (SELECT ActivityDate FROM tblOrderActivity WHERE ActivityID = 3 AND OrderId = @i))
END AS toprinted
,CASE
WHEN ActivityID = 3 THEN DATEDIFF(MINUTE, ActivityDate, (SELECT ActivityDate FROM tblOrderActivity WHERE ActivityID = 4 AND OrderId = @i))
END AS todelivered
FROM tblOrderActivity
WHERE OrderId = @i
SET @i = @i + 1
END
First I make a list of all orders (CTE_Orders).
For each order I get four dates, one for each ActivityID using OUTER APPLY. I assume that some activities could be missing (not completed yet), so OUTER APPLY would return NULL there. When I calculate durations I assume that if activity is not in the database, it hasn't happened yet and I calculate duration till the current time. You can handle this case differently if you have other requirements.
I assume that each order can have at most one row for each Activity ID. If you can have two or more rows with the same Order ID and Activity ID, then you need to decide which one to pick by adding ORDER BY to the SELECT inside the OUTER APPLY.
CREATE TABLE #TOrders (OrderID int, ActivityID int, ActivityDate datetime);
INSERT INTO #TOrders (OrderID, ActivityID, ActivityDate) VALUES (1, 1, '2007-04-16T08:34:00');
INSERT INTO #TOrders (OrderID, ActivityID, ActivityDate) VALUES (1, 1, '2007-04-16T08:34:00');
INSERT INTO #TOrders (OrderID, ActivityID, ActivityDate) VALUES (1, 2, '2007-04-16T09:22:00');
INSERT INTO #TOrders (OrderID, ActivityID, ActivityDate) VALUES (1, 3, '2007-04-16T09:51:00');
INSERT INTO #TOrders (OrderID, ActivityID, ActivityDate) VALUES (1, 4, '2007-04-16T16:14:00');
INSERT INTO #TOrders (OrderID, ActivityID, ActivityDate) VALUES (2, 1, '2007-04-16T08:34:00');
INSERT INTO #TOrders (OrderID, ActivityID, ActivityDate) VALUES (3, 1, '2007-04-16T08:34:00');
INSERT INTO #TOrders (OrderID, ActivityID, ActivityDate) VALUES (3, 2, '2007-04-16T09:22:00');
INSERT INTO #TOrders (OrderID, ActivityID, ActivityDate) VALUES (3, 3, '2007-04-16T09:51:00');
INSERT INTO #TOrders (OrderID, ActivityID, ActivityDate) VALUES (3, 4, '2007-04-16T16:14:00');
INSERT INTO #TOrders (OrderID, ActivityID, ActivityDate) VALUES (4, 1, '2007-04-16T08:34:00');
INSERT INTO #TOrders (OrderID, ActivityID, ActivityDate) VALUES (4, 2, '2007-04-16T09:22:00');
INSERT INTO #TOrders (OrderID, ActivityID, ActivityDate) VALUES (4, 3, '2007-04-16T09:51:00');
WITH
CTE_Orders
AS
(
SELECT DISTINCT Orders.OrderID
FROM #TOrders AS Orders
)
SELECT
CTE_Orders.OrderID
,Date1_Received
,Date2_Keyed
,Date3_Printed
,Date4_Delivered
,DATEDIFF(minute, ISNULL(Date1_Received, GETDATE()), ISNULL(Date2_Keyed, GETDATE())) AS Time12
,DATEDIFF(minute, ISNULL(Date2_Keyed, GETDATE()), ISNULL(Date3_Printed, GETDATE())) AS Time23
,DATEDIFF(minute, ISNULL(Date3_Printed, GETDATE()), ISNULL(Date4_Delivered, GETDATE())) AS Time34
FROM
CTE_Orders
OUTER APPLY
(
SELECT TOP(1) Orders.ActivityDate AS Date1_Received
FROM #TOrders AS Orders
WHERE
Orders.OrderID = CTE_Orders.OrderID
AND Orders.ActivityID = 1
) AS OA1_Received
OUTER APPLY
(
SELECT TOP(1) Orders.ActivityDate AS Date2_Keyed
FROM #TOrders AS Orders
WHERE
Orders.OrderID = CTE_Orders.OrderID
AND Orders.ActivityID = 2
) AS OA2_Keyed
OUTER APPLY
(
SELECT TOP(1) Orders.ActivityDate AS Date3_Printed
FROM #TOrders AS Orders
WHERE
Orders.OrderID = CTE_Orders.OrderID
AND Orders.ActivityID = 3
) AS OA3_Printed
OUTER APPLY
(
SELECT TOP(1) Orders.ActivityDate AS Date4_Delivered
FROM #TOrders AS Orders
WHERE
Orders.OrderID = CTE_Orders.OrderID
AND Orders.ActivityID = 4
) AS OA4_Delivered
ORDER BY OrderID;
This is the result set:
OrderID  Date1_Received           Date2_Keyed              Date3_Printed            Date4_Delivered          Time12   Time23  Time34
1        2007-04-16 08:34:00.000  2007-04-16 09:22:00.000  2007-04-16 09:51:00.000  2007-04-16 16:14:00.000  48       29      383
2        2007-04-16 08:34:00.000  NULL                     NULL                     NULL                     4082575  0       0
3        2007-04-16 08:34:00.000  2007-04-16 09:22:00.000  2007-04-16 09:51:00.000  2007-04-16 16:14:00.000  48       29      383
4        2007-04-16 08:34:00.000  2007-04-16 09:22:00.000  2007-04-16 09:51:00.000  NULL                     48       29      4082498
You can easily calculate other durations, like the total time for the order (time 4 - time1).
Once you have several different queries that produce the same correct result that you need you should measure their performance with your real data on your system to decide which is more efficient.
This one should meet your needs, but I would suggest running this calculation when you insert the values into the table and storing the result directly in a new column:
SELECT OrderID,
ActivityID,
ActivityDate,
Datediff(MINUTE, ActivityDate, (SELECT ActivityDate
FROM [TestDB].[dbo].[tblOrderActivity] AS b
WHERE b.OrderID = a.OrderID
AND a.ActivityID + 1 = b.ActivityID))
FROM [TestDB].[dbo].[tblOrderActivity] AS a
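On SQL Server 2012 or later, a similar per-row result can be sketched with the LEAD window function instead of a correlated subquery. Note that this is a different technique from the query above: LEAD takes the next activity that exists for the order, whereas the subquery looks specifically for ActivityID + 1. The table variable @ActivityLog, its rows and the alias MinutesToNextActivity are illustrative assumptions.

```sql
-- Illustrative data only; @ActivityLog is a hypothetical stand-in for tblOrderActivity.
DECLARE @ActivityLog TABLE (OrderID int, ActivityID int, ActivityDate datetime);

INSERT INTO @ActivityLog (OrderID, ActivityID, ActivityDate)
VALUES (1, 1, '2007-04-16T08:34:00'),
       (1, 2, '2007-04-16T09:22:00'),
       (1, 3, '2007-04-16T09:51:00'),
       (1, 4, '2007-04-16T16:14:00');

SELECT OrderID,
       ActivityID,
       ActivityDate,
       -- minutes from this activity to the next one of the same order
       DATEDIFF(minute,
                ActivityDate,
                LEAD(ActivityDate) OVER (PARTITION BY OrderID ORDER BY ActivityID)
       ) AS MinutesToNextActivity
FROM @ActivityLog
ORDER BY OrderID, ActivityID;
-- Order 1 yields 48, 29, 383 and NULL for the last activity
```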

filter data having more than 1 record

I have a table comprising departments and the audit of persons added to or removed from them.
deptid|personid|actionid|lastupdate
3|5678|i|....
3|5765|i|...
3|8796|i|...
3|5463|i|...
3|5678|r|.....
4|5678|i|....
For a particular department, I need to find the audit rows for all persons who have been actioned MORE THAN ONCE in that department.
Note that a person can be allocated to multiple departments.
So for the data above, the expected result is:
3|5678|i|....
3|5678|r|.....
I tried the query below, but I do not know how to filter further:
select personId,actionid,lastUpdate,RN=ROW_NUMBER()
OVER (PARTITION BY personId ORDER BY lastUpdate)
from DeptAudit where deptId=3
Probably this can help:
SELECT personId,actionid,lastUpdate
FROM DeptAudit
WHERE deptId = 3
AND personid IN
(SELECT personId
FROM DeptAudit
WHERE deptId = 3
GROUP BY personId
HAVING COUNT(*) > 1)
CREATE TABLE #Widget
(ID INT,
Widget INT,
PART NVARCHAR(10));
INSERT INTO #Widget VALUES
(3, 56757, 'i'),
(3, 56755, 'i'),
(3, 56759, 'i'),
(3, 56753,'i'),
(3, 5678, 'r');
;WITH CTE AS (
select ID,Widget,PART,RN = ROW_NUMBER()OVER(PARTITION BY PART ORDER BY Widget desc) from #Widget
)select ID,Widget,PART from CTE WHERE RN = 1
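Building on the ROW_NUMBER attempt in the question: ROW_NUMBER only gives each row its position, while a windowed COUNT(*) puts the number of audit rows per person on every row, which can then be filtered. A minimal sketch follows; the table variable @DeptAudit, the alias ActionCount and the timestamps are made up, since the question elides the real lastupdate values.

```sql
-- Sketch with made-up timestamps; @DeptAudit is a hypothetical stand-in for DeptAudit.
DECLARE @DeptAudit TABLE (deptId int, personId int, actionId char(1), lastUpdate datetime);

INSERT INTO @DeptAudit (deptId, personId, actionId, lastUpdate)
VALUES (3, 5678, 'i', '2020-01-01T09:00:00'),
       (3, 5765, 'i', '2020-01-01T09:05:00'),
       (3, 8796, 'i', '2020-01-01T09:10:00'),
       (3, 5463, 'i', '2020-01-01T09:15:00'),
       (3, 5678, 'r', '2020-01-02T09:00:00'),
       (4, 5678, 'i', '2020-01-03T09:00:00');

SELECT deptId, personId, actionId, lastUpdate
FROM (
    SELECT *,
           -- number of audit rows this person has within the department
           COUNT(*) OVER (PARTITION BY deptId, personId) AS ActionCount
    FROM @DeptAudit
    WHERE deptId = 3
) AS a
WHERE ActionCount > 1
ORDER BY personId, lastUpdate;
-- returns the 'i' and 'r' rows of person 5678 in department 3
```

Dropping the deptId = 3 filter gives the same check for every department at once, because deptId is part of the partition.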