Update using self join in SQL Server

I have a very large table; a sample of it looks like this:
+-----------+------------+-----------+-----------+
| Unique_ID | Date | RowNumber | Flag_Date |
+-----------+------------+-----------+-----------+
| 1 | 6/3/2014 | 1 | 6/3/2014 |
| 1 | 5/22/2015 | 2 | NULL |
| 1 | 6/3/2015 | 3 | NULL |
| 1 | 11/20/2015 | 4 | NULL |
| 2 | 2/25/2014 | 1 | 2/25/2014 |
| 2 | 7/31/2014 | 2 | NULL |
| 2 | 8/26/2014 | 3 | NULL |
+-----------+------------+-----------+-----------+
Now I need to check the difference between Date in the 2nd row and Flag_Date in the 1st row. If the difference is more than 180 days, the 2nd row's Flag_Date should be set to the Date of the 2nd row; otherwise it should be set to the Flag_Date of the 1st row. The same rule applies to every following row with the same Unique_ID, each time comparing against the (already updated) Flag_Date of the row before it.
update a
set a.Flag_Date=case when DATEDIFF(dd,b.Flag_Date,a.[Date])>180 then a.[Date] else b.Flag_Date end
from Table1 a
inner join Table1 b
on a.RowNumber=b.RowNumber+1 and a.Unique_ID=b.Unique_ID
When the above update query is executed once, only the second row under each Unique_ID gets updated, and the result looks like this:
+-----------+------------+-----------+------------+
| Unique_ID | Date | RowNumber | Flag_Date |
+-----------+------------+-----------+------------+
| 1 | 2014-06-03 | 1 | 2014-06-03 |
| 1 | 2015-05-22 | 2 | 2015-05-22 |
| 1 | 2015-06-03 | 3 | NULL |
| 1 | 2015-11-20 | 4 | NULL |
| 2 | 2014-02-25 | 1 | 2014-02-25 |
| 2 | 2014-07-31 | 2 | 2014-02-25 |
| 2 | 2014-08-26 | 3 | NULL |
+-----------+------------+-----------+------------+
I need to run it four times to get my desired result:
+-----------+------------+-----------+------------+
| Unique_ID | Date | RowNumber | Flag_Date |
+-----------+------------+-----------+------------+
| 1 | 2014-06-03 | 1 | 2014-06-03 |
| 1 | 2015-05-22 | 2 | 2015-05-22 |
| 1 | 2015-06-03 | 3 | 2015-05-22 |
| 1 | 2015-11-20 | 4 | 2015-11-20 |
| 2 | 2014-02-25 | 1 | 2014-02-25 |
| 2 | 2014-07-31 | 2 | 2014-02-25 |
| 2 | 2014-08-26 | 3 | 2014-08-26 |
+-----------+------------+-----------+------------+
Is there a way to run the update only once so that all the rows are updated?
Thank you!

If you are using SQL Server 2012+, then you can use lag():
with toupdate as (
select t1.*,
lag(flag_date) over (partition by unique_id order by rownumber) as prev_flag_date
from table1 t1
)
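update toupdate
set Flag_Date = (case when DATEDIFF(day, prev_Flag_Date, toupdate.[Date]) > 180
then toupdate.[Date] else prev_Flag_Date
end)
-- skip rows with no previous flag (e.g. the first row of each unique_id),
-- which would otherwise have their Flag_Date overwritten with NULL
where prev_Flag_Date is not null;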
Both this version and your version can take advantage of an index on table1(unique_id, rownumber) or, better yet, table1(unique_id, rownumber, flag_date).
EDIT:
In earlier versions, this might have better performance:
with toupdate as (
select t1.*, t2.flag_date as prev_flag_date
from table1 t1 outer apply
(select top 1 t2.flag_date
from table1 t2
where t2.unique_id = t1.unique_id and
t2.rownumber < t1.rownumber
order by t2.rownumber desc
) t2
)
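update toupdate
set Flag_Date = (case when DATEDIFF(day, prev_Flag_Date, toupdate.[Date]) > 180
then toupdate.[Date] else prev_Flag_Date
end)
-- skip rows with no previous flag (e.g. the first row of each unique_id),
-- which would otherwise have their Flag_Date overwritten with NULL
where prev_Flag_Date is not null;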
The CTE can make use of the same index, and it is important to have the index. The reason for the better performance is that your join on row_number() cannot use an index on that field.
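Because each row's Flag_Date depends on the already-updated Flag_Date of the row before it, a statement that only reads the stored values can move the flag forward by one row per run. One way to do the whole chain in a single statement is a recursive CTE that walks RowNumber 1, 2, 3, ... and carries the flag along. The following is a minimal sketch, not a tested SQL Server solution: it uses SQLite through Python's sqlite3 module as a stand-in, so `WITH RECURSIVE` and `julianday()` are SQLite spellings (SQL Server would use a plain `WITH` and `DATEDIFF`), the dates are stored as ISO strings, and the CTE name `chain` is made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE table1 (unique_id INTEGER, date TEXT, rownumber INTEGER, flag_date TEXT);
INSERT INTO table1 VALUES
  (1, '2014-06-03', 1, '2014-06-03'),
  (1, '2015-05-22', 2, NULL),
  (1, '2015-06-03', 3, NULL),
  (1, '2015-11-20', 4, NULL),
  (2, '2014-02-25', 1, '2014-02-25'),
  (2, '2014-07-31', 2, NULL),
  (2, '2014-08-26', 3, NULL);
""")

# Seed with the first row of each unique_id, then step to rownumber + 1,
# comparing each Date with the flag carried over from the previous step.
conn.execute("""
WITH RECURSIVE chain(unique_id, rownumber, flag_date) AS (
    SELECT unique_id, rownumber, flag_date
    FROM table1
    WHERE rownumber = 1
    UNION ALL
    SELECT t.unique_id, t.rownumber,
           CASE WHEN julianday(t.date) - julianday(c.flag_date) > 180
                THEN t.date ELSE c.flag_date END
    FROM chain AS c
    JOIN table1 AS t
      ON t.unique_id = c.unique_id AND t.rownumber = c.rownumber + 1
)
UPDATE table1
SET flag_date = (SELECT c.flag_date FROM chain AS c
                 WHERE c.unique_id = table1.unique_id
                   AND c.rownumber = table1.rownumber)
""")

rows = conn.execute(
    "SELECT unique_id, date, rownumber, flag_date FROM table1 "
    "ORDER BY unique_id, rownumber"
).fetchall()
for row in rows:
    print(row)
```

On the sample data this single update produces the desired result from the question. Whether a recursive CTE performs acceptably on a very large table would need to be measured; the index on (unique_id, rownumber) mentioned above should help the recursive step as well.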

Related

find next values satisfying a condition in SQL

Suppose that I have a dataframe as:
| ID | Value | Time |
|---------|-------|------|
| 101 | 100 | 1 |
| 101 | 0 | 2 |
| 101 | 200 | 4 |
| 101 | 200 | 7 |
| 101 | 0 | 10 |
| 102 | 100 | 2 |
| 102 | 0 | 3 |
| 102 | 200 | 5 |
For each non-zero Value, I would like to find the next Time that Value=0 for the same ID. So my desired output will be
| ID | Value | Time | NextTime |
|---------|-------|------|----------|
| 101 | 100 | 1 | 2 |
| 101 | 0 | 2 | Null |
| 101 | 200 | 4 | 10 |
| 101 | 200 | 7 | 10 |
| 101 | 0 | 10 | Null |
| 102 | 100 | 2 | 3 |
| 102 | 0 | 3 | Null |
| 102 | 200 | 5 | Null |
I have tried to use the following subquery:
SELECT *, CASE WHEN Value=0 THEN NULL ELSE (SELECT MIN(Time) FROM Table1 sub
WHERE sub.ID = main.ID AND sub.Time > main.Time AND sub.Value=0) END as NextTime
FROM Table1 AS main
ORDER BY
ID,
Time
This query should work, but the problem is that I am working with an extremely large dataframe (millions of records), so it cannot finish in a reasonable time. Could anyone help with a more efficient way to get the desired result? Thanks.
You want a cumulative minimum:
select t.*,
min(case when value = 0 then time end) over
(partition by id
order by time
rows between 1 following and unbounded following
) as next_0_time
from t;
EDIT:
If you want values on the 0 rows to be NULL, then use a case expression:
select t.*,
(case when value <> 0
then min(case when value = 0 then time end) over
(partition by id
order by time
rows between 1 following and unbounded following
)
end) as next_0_time
from t;
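To check the windowed query against the sample data from the question, here is a small sketch that runs it in SQLite through Python's sqlite3 module. The choice of SQLite is an assumption made only so the example is self-contained (it needs SQLite 3.25 or later for window functions); the table name `t` follows the answer.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE t (id INTEGER, value INTEGER, time INTEGER);
INSERT INTO t VALUES
  (101, 100, 1), (101, 0, 2), (101, 200, 4), (101, 200, 7), (101, 0, 10),
  (102, 100, 2), (102, 0, 3), (102, 200, 5);
""")

# For every non-zero row, take the smallest time among the zero-valued rows
# that come strictly after it within the same id.
result = conn.execute("""
SELECT id, value, time,
       CASE WHEN value <> 0 THEN
            MIN(CASE WHEN value = 0 THEN time END) OVER (
                PARTITION BY id
                ORDER BY time
                ROWS BETWEEN 1 FOLLOWING AND UNBOUNDED FOLLOWING)
       END AS next_0_time
FROM t
ORDER BY id, time
""").fetchall()

for row in result:
    print(row)
```

The frame starts at `1 FOLLOWING`, so a row never sees itself, and the last row of each id gets an empty frame and therefore NULL.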

Semi-transposing a table in Oracle

I am having trouble semi-transposing the table below based on the 'LENGTH' column. I am using an Oracle database, sample data:
+-----------+-----------+--------+------+
| PERSON_ID | PERIOD_ID | LENGTH | FLAG |
+-----------+-----------+--------+------+
| 1 | 1 | 4 | 1 |
| 1 | 2 | 3 | 0 |
| 2 | 1 | 4 | 1 |
+-----------+-----------+--------+------+
I would like to lengthen this table based on the LENGTH column, basically duplicating each row once for every value from 1 up to LENGTH.
See the desired output table below:
+-----------+-----------+--------+------+
| PERSON_ID | PERIOD_ID | NUMBER | FLAG |
+-----------+-----------+--------+------+
| 1 | 1 | 1 | 1 |
| 1 | 1 | 2 | 1 |
| 1 | 1 | 3 | 1 |
| 1 | 1 | 4 | 1 |
| 1 | 2 | 1 | 0 |
| 1 | 2 | 2 | 0 |
| 1 | 2 | 3 | 0 |
| 2 | 1 | 1 | 1 |
| 2 | 1 | 2 | 1 |
| 2 | 1 | 3 | 1 |
| 2 | 1 | 4 | 1 |
+-----------+-----------+--------+------+
I typically work in Postgres, so Oracle is new to me.
I've found some solutions using the connect by clause, but they seem overly complicated, particularly when compared to the simple generate_series() function from Postgres.
A recursive CTE that subtracts 1 from length until 1 is reached should work. (The same approach works in Postgres, should you need something cross-platform, although Postgres requires the keyword RECURSIVE after WITH.)
WITH cte (person_id,
period_id,
number_,
flag)
AS
(
SELECT person_id,
period_id,
length number_,
flag
FROM elbat
UNION ALL
SELECT person_id,
period_id,
number_ - 1 number_,
flag
FROM cte
WHERE number_ > 1
)
SELECT *
FROM cte
ORDER BY person_id,
period_id,
number_;
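As a quick check of the recursive CTE on the sample rows, the sketch below runs it in SQLite through Python's sqlite3 module. SQLite is only a stand-in here, not Oracle: it needs the `RECURSIVE` keyword that Oracle omits, and the table name `elbat` is taken from the answer.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE elbat (person_id INTEGER, period_id INTEGER, length INTEGER, flag INTEGER);
INSERT INTO elbat VALUES (1, 1, 4, 1), (1, 2, 3, 0), (2, 1, 4, 1);
""")

# Anchor: one row per source row, starting at number_ = length.
# Recursive step: emit number_ - 1 until 1 is reached.
expanded = conn.execute("""
WITH RECURSIVE cte (person_id, period_id, number_, flag) AS (
    SELECT person_id, period_id, length, flag
    FROM elbat
    UNION ALL
    SELECT person_id, period_id, number_ - 1, flag
    FROM cte
    WHERE number_ > 1
)
SELECT person_id, period_id, number_, flag
FROM cte
ORDER BY person_id, period_id, number_
""").fetchall()

for row in expanded:
    print(row)
```

Each source row yields exactly LENGTH output rows, so the three sample rows expand to 4 + 3 + 4 = 11 rows.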

Insert column into table and increment row by one

Given the following tables, I am trying to set eventid to the highest value, incremented by one. So max(eventid) + 1. I cannot seem to get the right SQL syntax to accomplish this.
What I have right now that works, but does not give me what I need, is the following. How would I get the eventid to show, in this case, 96740?
INSERT INTO stock_history
(lastmodby,
event,
previous_stock,
new_stock,
lastmodified,
productid)
SELECT '160' AS lastmodby,
'SALE' AS event,
stockstatus AS previous_stock,
stockstatus + 1 AS new_stock,
Getdate() AS lastmodified,
products_joined.productid AS productid
FROM products_joined
WHERE productcode = 'abc'
stock_history table
+--------+-----------+----------------+-------+-----------+-------+---------+-----------------------+-----------+
| id | productid | previous_stock | count | new_stock | event | eventid | lastmodified | lastmodby |
+--------+-----------+----------------+-------+-----------+-------+---------+-----------------------+-----------+
| 105619 | 9282 | 9 | 1 | 10 | SALE | | 7/24/2015 5:29:00 PM | 160 |
| 105578 | 9282 | 8 | 1 | 9 | ORDER | 96739 | 7/23/2015 7:30:00 PM | 37655 |
| 89241 | 9282 | 7 | 1 | 8 | ORDER | 96738 | 6/1/2014 6:06:00 PM | 30761 |
| 86773 | 9282 | 6 | 1 | 7 | ORDER | 96737 | 4/12/2014 4:36:00 PM | 29745 |
| 70419 | 9282 | 5 | 1 | 6 | ORDER | 96736 | 5/21/2013 1:17:00 PM | 1754 |
| 69088 | 9200 | 19 | 1 | 20 | EDIT | 96735 | 4/28/2013 10:26:00 AM | 1754 |
| 69050 | 9200 | 18 | 1 | 19 | ORDER | 96734 | 4/27/2013 2:17:00 PM | 23001 |
| 68127 | 9200 | 17 | 1 | 18 | ORDER | 96733 | 4/13/2013 12:34:00 PM | 22674 |
| 67064 | 9200 | 16 | 1 | 17 | ORDER | 96732 | 3/30/2013 9:23:00 AM | 22327 |
+--------+-----------+----------------+-------+-----------+-------+---------+-----------------------+-----------+
products_joined table
+-------------+-----------+-------------+
| productcode | productid | stockstatus |
+-------------+-----------+-------------+
| abc | 9282 | 9 |
| xyz | 9200 | 19 |
+-------------+-----------+-------------+
You're better off using an IDENTITY column: SQL Server will take care of this much more efficiently than you can, in terms of both performance and reliability. This question gives you some options on how to do that, e.g.
ALTER TABLE stock_history DROP COLUMN eventid
ALTER TABLE stock_history ADD eventid INT IDENTITY(1,1)
That said, it is technically possible to do this within the query. It might help to know this pattern for situations where an IDENTITY column is not an option.
INSERT INTO stock_history
(lastmodby,
event,
previous_stock,
new_stock,
lastmodified,
productid,
eventid)
SELECT '160' AS lastmodby,
'SALE' AS event,
stockstatus AS previous_stock,
stockstatus + 1 AS new_stock,
Getdate() AS lastmodified,
products_joined.productid AS productid,
(SELECT MAX(eventid) + 1 FROM stock_history) AS eventid
FROM products_joined
WHERE productcode = 'abc'
Note that this could lead to duplicate eventids if this query is executed several times concurrently.

Sum data from two tables with different number of rows

There are 3 Tables (SorMaster, SorDetail, and InvWarehouse):
SorMaster:
+------------+
| SalesOrder |
+------------+
| 100 |
| 101 |
| 102 |
+------------+
SorDetail:
+------------+------------+---------------+
| SalesOrder | MStockCode | MBackOrderQty |
+------------+------------+---------------+
| 100 | PN-1 | 4 |
| 100 | PN-2 | 9 |
| 100 | PN-3 | 1 |
| 100 | PN-4 | 6 |
| 101 | PN-1 | 6 |
| 101 | PN-3 | 2 |
| 102 | PN-2 | 19 |
| 102 | PN-3 | 14 |
| 102 | PN-4 | 6 |
| 102 | PN-5 | 4 |
+------------+------------+---------------+
InvWarehouse:
+------------+-----------+-----------+
| MStockCode | Warehouse | QtyOnHand |
+------------+-----------+-----------+
| PN-1 | A | 1 |
| PN-2 | B | 9 |
| PN-3 | A | 0 |
| PN-4 | B | 1 |
| PN-1 | A | 0 |
| PN-3 | B | 5 |
| PN-2 | A | 9 |
| PN-3 | B | 4 |
| PN-4 | A | 6 |
| PN-5 | B | 0 |
+------------+-----------+-----------+
Desired Results:
+------------+-----------------+--------------+
| MStockCode | SumBackOrderQty | SumQtyOnHand |
+------------+-----------------+--------------+
| PN-1 | 10 | 10 |
| PN-2 | 28 | 1 |
| PN-3 | 17 | 5 |
| PN-4 | 12 | 13 |
| PN-5 | 11 | 6 |
+------------+-----------------+--------------+
I have been going around in circles with no end in sight. It seems like it should be simple, but I just can't wrap my head around it. The SumBackOrderQty is obviously getting counted twice as the SumQtyOnHand is evaluated. Up to this point I have been doing the calculations in PHP instead of in the select statement, but I would like to clean things up a bit where possible.
Current query statement is:
SELECT SorDetail.MStockCode,
SUM(SorDetail.MBackOrderQty) AS 'SumMBackOrderQty',
SUM(InvWarehouse.QtyOnHand) AS 'SumQtyOnHand'
FROM SysproCompanyJ.dbo.SorMaster SorMaster,
SysproCompanyJ.dbo.SorDetail SorDetail LEFT OUTER JOIN SysproCompanyJ.dbo.InvWarehouse InvWarehouse
ON SorDetail.MStockCode = InvWarehouse.StockCode
WHERE SorMaster.SalesOrder = SorDetail.SalesOrder
AND SorMaster.ActiveFlag != 'N'
AND SorDetail.MBackOrderQty > '0'
AND SorDetail.MPrice > '0'
GROUP BY SorDetail.MStockCode
ORDER BY SorDetail.MStockCode ASC
Without providing the complete picture, in terms of your RDBMS, database schema, a description of the problem you're trying to solve and sample data that matches the aforementioned, the following is just an illustration of what a solution based on Barmar's comment could look like:
SELECT SD.MStockCode,
SD.SumBackOrderQty,
IW.SumQtyOnHand
FROM (SELECT MStockCode,
SUM(MBackOrderQty) AS `SumBackOrderQty`
FROM SorDetail
JOIN SorMaster ON SorDetail.SalesOrder=SorMaster.SalesOrder
WHERE SorMaster.ActiveFlag != 'N'
AND SorDetail.MBackOrderQty > 0
AND SorDetail.MPrice > 0
GROUP BY MStockCode) AS SD
LEFT JOIN (SELECT MStockCode,
SUM(QtyOnHand) AS `SumQtyOnHand`
FROM InvWarehouse
GROUP BY MStockCode) AS IW ON SD.MStockCode=IW.MStockCode
ORDER BY SD.MStockCode;
Here's one approach:
select MStockCode,
(select sum(MBackOrderQty) from sorDetail as T2
where T2.MStockCode = T1.MStockCode ) as SumBackOrderQty,
(select sum(QtyOnHand) from invWarehouse as T3
where T3.MStockCode = T1.MStockCode ) as SumQtyOnHand
from
(
select mstockcode from sorDetail
union
select mstockcode from invWarehouse
) as T1
In a fiddle here: http://sqlfiddle.com/#!9/fdaca/6
Though my SumQtyOnHand values don't match yours (as @Gordon pointed out).
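To illustrate why the sums are inflated and how aggregating before joining avoids it, here is a sketch that loads the sample SorDetail and InvWarehouse rows into SQLite through Python's sqlite3 module. SQLite is an assumption made for a self-contained example, and the ActiveFlag and MPrice filters are left out because the sample data does not include those columns.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE SorDetail (SalesOrder INTEGER, MStockCode TEXT, MBackOrderQty INTEGER);
CREATE TABLE InvWarehouse (MStockCode TEXT, Warehouse TEXT, QtyOnHand INTEGER);
INSERT INTO SorDetail VALUES
  (100,'PN-1',4),(100,'PN-2',9),(100,'PN-3',1),(100,'PN-4',6),
  (101,'PN-1',6),(101,'PN-3',2),
  (102,'PN-2',19),(102,'PN-3',14),(102,'PN-4',6),(102,'PN-5',4);
INSERT INTO InvWarehouse VALUES
  ('PN-1','A',1),('PN-2','B',9),('PN-3','A',0),('PN-4','B',1),('PN-1','A',0),
  ('PN-3','B',5),('PN-2','A',9),('PN-3','B',4),('PN-4','A',6),('PN-5','B',0);
""")

# Joining first pairs every detail row with every warehouse row of the same
# stock code, so both sums are multiplied by the row count of the other table.
naive = conn.execute("""
SELECT d.MStockCode, SUM(d.MBackOrderQty), SUM(w.QtyOnHand)
FROM SorDetail AS d
LEFT JOIN InvWarehouse AS w ON w.MStockCode = d.MStockCode
GROUP BY d.MStockCode
ORDER BY d.MStockCode
""").fetchall()

# Aggregating each table to one row per stock code first keeps the join 1:1.
fixed = conn.execute("""
SELECT sd.MStockCode, sd.SumBackOrderQty, iw.SumQtyOnHand
FROM (SELECT MStockCode, SUM(MBackOrderQty) AS SumBackOrderQty
      FROM SorDetail GROUP BY MStockCode) AS sd
LEFT JOIN (SELECT MStockCode, SUM(QtyOnHand) AS SumQtyOnHand
           FROM InvWarehouse GROUP BY MStockCode) AS iw
       ON iw.MStockCode = sd.MStockCode
ORDER BY sd.MStockCode
""").fetchall()

print("join then sum:", naive)
print("sum then join:", fixed)
```

With the sample rows as posted, the pre-aggregated totals differ from the "Desired Results" table in the question, which is consistent with the remark above that the posted values do not match.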

fetch most recent non null record

I'm trying to fetch the most recent record and find a non-NULL match. The problem is my subquery returns more than one result.
Data set
| ID | DD | SIG_ID | DCRP |
----------------------------------------
| 1 | 2010-06-01 | 1 | Expert |
| 2 | 2010-09-01 | 1 | Expert |
| 3 | 2010-12-01 | 1 | Expert |
| 4 | 2010-12-01 | 1 | Expert II |
| 5 | 2011-03-01 | 1 | Expert II |
| 6 | 2011-06-01 | 1 | (null) |
| 7 | 2010-06-01 | 2 | Senior |
| 8 | 2010-09-01 | 2 | Senior |
| 9 | 2010-09-01 | 2 | Senior |
| 10 | 2010-12-01 | 2 | Senior II |
| 11 | 2011-03-01 | 2 | (null) |
| 12 | 2011-03-01 | 2 | Senior |
| 13 | 2010-06-01 | 3 | (null) |
| 14 | 2010-09-01 | 3 | (null) |
| 15 | 2010-12-01 | 3 | (null) |
Query
SELECT a.sig_id, a.id,
CASE
WHEN b.dcrp IS NULL
THEN
(SELECT dcrp
FROM tbl
WHERE sig_id = a.sig_id
AND id < a.id
AND dcrp IS NOT NULL)
ELSE b.dcrp
END AS dcrp
FROM
(SELECT sig_id, MAX(id) id
FROM tbl
GROUP BY sig_id) a
LEFT JOIN
(SELECT id, dcrp
FROM tbl
WHERE dcrp IS NOT NULL) b ON b.id = a.id
Desired result
Fetch the most recent dcrp for each sig_id:
| ID | DD | SIG_ID | DCRP |
----------------------------------------
| 5 | 2011-03-01 | 1 | Expert II |
| 12 | 2011-03-01 | 2 | Senior |
| 15 | 2010-12-01 | 3 | (null) |
SQL Fiddle
You can use the following:
;WITH CTE AS
(
SELECT *, ROW_NUMBER() OVER(PARTITION BY SIG_ID
ORDER BY CASE WHEN DCRP IS NOT NULL THEN 0 ELSE 1 END,
DD DESC) RN
FROM tbl
)
SELECT *
FROM CTE
WHERE RN = 1
And the fiddle.
;with si as (
select distinct sig_id from tbl
)
select *
from si
cross apply (select top 1 * from tbl where si.sig_id=tbl.sig_id order by case when dcrp is null then 1 else 0 end asc,dd desc) sii
And here it is in a fiddle:
http://sqlfiddle.com/#!3/8e267/2/0
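The query in the SQL Fiddle fails because the subquery returns more than one row.
Adding TOP 1 fixes that; an ORDER BY id DESC makes it pick the most recent non-NULL row rather than an arbitrary one. Please check if it is OK.
THEN
(SELECT TOP 1 dcrp
FROM tbl
WHERE sig_id = a.sig_id
AND id < a.id
AND dcrp IS NOT NULL
ORDER BY id DESC)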
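The ROW_NUMBER() ranking from the first answer above can be checked against the sample data with the following sketch, which runs in SQLite (3.25 or later) through Python's sqlite3 module. SQLite stands in for SQL Server here, and the extra `id DESC` tie-breaker is an addition of this sketch, not part of the original answer; it only makes the choice deterministic when two rows share the same DD.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE tbl (id INTEGER, dd TEXT, sig_id INTEGER, dcrp TEXT);
INSERT INTO tbl VALUES
  (1,'2010-06-01',1,'Expert'),    (2,'2010-09-01',1,'Expert'),
  (3,'2010-12-01',1,'Expert'),    (4,'2010-12-01',1,'Expert II'),
  (5,'2011-03-01',1,'Expert II'), (6,'2011-06-01',1,NULL),
  (7,'2010-06-01',2,'Senior'),    (8,'2010-09-01',2,'Senior'),
  (9,'2010-09-01',2,'Senior'),    (10,'2010-12-01',2,'Senior II'),
  (11,'2011-03-01',2,NULL),       (12,'2011-03-01',2,'Senior'),
  (13,'2010-06-01',3,NULL),       (14,'2010-09-01',3,NULL),
  (15,'2010-12-01',3,NULL);
""")

# Rank rows per sig_id: non-NULL dcrp first, then newest dd, then highest id.
latest = conn.execute("""
WITH cte AS (
    SELECT id, dd, sig_id, dcrp,
           ROW_NUMBER() OVER (
               PARTITION BY sig_id
               ORDER BY CASE WHEN dcrp IS NOT NULL THEN 0 ELSE 1 END,
                        dd DESC,
                        id DESC) AS rn
    FROM tbl
)
SELECT id, dd, sig_id, dcrp
FROM cte
WHERE rn = 1
ORDER BY sig_id
""").fetchall()

for row in latest:
    print(row)
```

A sig_id whose rows are all NULL still returns its newest row, which matches the desired result for SIG_ID 3.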