Update a column in a row based on a value in another row - SQL

I have a historical table with data as below:
In my case, the products with product_id 110, 111 and 112 changed price, so the same product appears again with a value in product_UpdateDate. Sometimes I can also have duplicate rows where no information has changed.
+------+------------+--------------+---------------+-----------------------------+-------------------------+-------------------------+----------------+
| id | product_id | product_name | product_Price | product_addDate | product_UpdateDate | Insertdate_DB | Updatedate_DB |
+------+------------+--------------+---------------+-----------------------------+-------------------------+-------------------------+----------------+
| 1 | 110 | DELL | 1000 | 2017-03-01 08:00:00.000 | NULL | 2017-03-06 10:00:00.000 | NULL |
| 2 | 111 | HP | 900 | 2017-03-01 08:00:00.000 | NULL | 2017-03-06 10:00:00.000 | NULL |
| 3 | 112 | Mac | 1300 | 2017-03-01 08:00:00.000 | NULL | 2017-03-06 10:00:00.000 | NULL |
| 4 | 113 | Lenovo | 950 | 2017-03-01 08:00:00.000 | NULL | 2017-03-06 10:00:00.000 | NULL |
| 5 | 110 | DELL | 900 | 2017-03-04 08:00:00.000 | 2017-03-04 08:00:00.000 | 2017-03-07 10:00:00.000 | NULL |
| 6 | 111 | HP | 800 | 2017-03-04 08:00:00.000 | 2017-03-04 08:00:00.000 | 2017-03-07 10:00:00.000 | NULL |
| 7 | 112 | Mac | 120 | 2017-03-04 08:00:00.000 | 2017-03-04 08:00:00.000 | 2017-03-07 10:00:00.000 | NULL |
+------+------------+--------------+---------------+-----------------------------+-------------------------+-------------------------+----------------+
What I want is an UPDATE query that produces the result below:
+----+------------+--------------+---------------+-------------------------+-------------------------+-------------------------+---------------+
| id | product_id | product_name | product_Price | product_addDate | product_UpdateDate | Insertdate_DB | Updatedate_DB |
+----+------------+--------------+---------------+-------------------------+-------------------------+-------------------------+---------------+
| 1 | 110 | DELL | 1000 | 2017-03-01 08:00:00.000 | 2017-03-04 08:00:00.000 | 2017-03-06 10:00:00.000 | GETDATE() |
| 2 | 111 | HP | 900 | 2017-03-01 08:00:00.000 | 2017-03-04 08:00:00.000 | 2017-03-06 10:00:00.000 | GETDATE() |
| 3 | 112 | Mac | 1300 | 2017-03-01 08:00:00.000 | 2017-03-04 08:00:00.000 | 2017-03-06 10:00:00.000 | GETDATE() |
| 4 | 113 | Lenovo | 950 | 2017-03-01 08:00:00.000 | NULL | 2017-03-06 10:00:00.000 | NULL |
| 5 | 110 | DELL | 900 | 2017-03-04 08:00:00.000 | NULL | 2017-03-07 10:00:00.000 | NULL |
| 6 | 111 | HP | 800 | 2017-03-04 08:00:00.000 | NULL | 2017-03-07 10:00:00.000 | NULL |
| 7 | 112 | Mac | 120 | 2017-03-04 08:00:00.000 | NULL | 2017-03-07 10:00:00.000 | NULL |
+----+------------+--------------+---------------+-------------------------+-------------------------+-------------------------+---------------+
The query below can update product_UpdateDate:
update tablename
set product_UpdateDate = case when tablename.product_UpdateDate is null
then t.product_UpdateDate
else null end
from tablename
join tablename t
on tablename.product_id = t.product_id
and t.ID <> tablename.id
I also want to set Updatedate_DB to GETDATE(). When I try to update this column, every row gets the GETDATE() value, but I only want to change the rows whose value has changed.
Thanks for any help.

I think row_number() might be a simpler method. The logic is hard to follow, but the following should result in the data in your question:
with toupdate as (
select t.*,
row_number() over (partition by product_id order by id) as seqnum,
count(*) over (partition by product_id) as cnt,
lead(product_UpdateDate) over (partition by product_id order by id) as next_UpdateDate
from t
)
update toupdate
set product_UpdateDate = (case when seqnum = 1 then next_UpdateDate end), -- date taken from the following row of the same product
Updatedate_DB = (case when seqnum = 1 then getdate() end)
where cnt > 1;
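Here is a small runnable sketch of the same row_number() idea. It uses SQLite through Python's sqlite3 module as a stand-in for SQL Server, which is an assumption made only so the example can be executed; the table name products and the trimmed column list are hypothetical. SQLite cannot update through a CTE, so the numbering is first saved in a temp table. The new date is taken from the following row of the same product with lead(), which is what the desired output in the question shows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""create table products (
    id integer primary key, product_id integer,
    product_UpdateDate text, Updatedate_DB text)""")
conn.executemany(
    "insert into products (id, product_id, product_UpdateDate) values (?, ?, ?)",
    [(1, 110, None), (2, 111, None), (3, 112, None), (4, 113, None),
     (5, 110, "2017-03-04 08:00:00"), (6, 111, "2017-03-04 08:00:00"),
     (7, 112, "2017-03-04 08:00:00")])

# Snapshot the per-product numbering first, so the update below never
# reads rows that it has already modified.
conn.execute("""
    create temp table ranked as
    select id,
           row_number() over (partition by product_id order by id) as seqnum,
           count(*)     over (partition by product_id)             as cnt,
           lead(product_UpdateDate)
                        over (partition by product_id order by id) as next_update
    from products""")

# Only products that occur more than once are touched (cnt > 1).
# The first row gets the next row's date and a timestamp; later rows are reset to NULL.
conn.execute("""
    update products
    set product_UpdateDate = (select case when r.seqnum = 1 then r.next_update end
                              from ranked r where r.id = products.id),
        Updatedate_DB      = (select case when r.seqnum = 1 then datetime('now') end
                              from ranked r where r.id = products.id)
    where id in (select id from ranked where cnt > 1)""")

result = conn.execute(
    "select id, product_UpdateDate, Updatedate_DB from products order by id").fetchall()
for row in result:
    print(row)
```

Product 113 occurs only once, so its row is left untouched, while ids 5 to 7 end up with NULL in both columns.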

Related

Get the last N periods from a date column, regardless of the period (year, month, day, hour, etc.)

I have different tables with data. In some tables data is loaded quarterly, in others monthly, daily, etc.
Every table has a ReportedDate column. What I would like to do is filter only the last N periods, for example the last 3 days if the period is days. The problem is that I cannot use something like GETDATE() - 3, because data is loaded only on workdays, not on holidays and weekends.
I have tried to use ROW_NUMBER() with PARTITION BY ReportedDate, but it is really slow.
I would appreciate suggestions.
A sample of a table:
+-----------+-----------------------------+
| Indicator | ReportedDate |
+-----------+-----------------------------+
| 0.2917 | 2020-08-12 00:00:00.0000000 |
| 0.261919 | 2020-08-13 00:00:00.0000000 |
| 0.259211 | 2020-08-14 00:00:00.0000000 |
| 0.201075 | 2020-08-17 00:00:00.0000000 |
| 0.250153 | 2020-08-18 00:00:00.0000000 |
| 0.333093 | 2020-08-19 00:00:00.0000000 |
| 0.976495 | 2020-08-20 00:00:00.0000000 |
| 0.759739 | 2020-08-21 00:00:00.0000000 |
| 1.17279 | 2020-08-24 00:00:00.0000000 |
| 0.285365 | 2020-08-25 00:00:00.0000000 |
+-----------+-----------------------------+
SELECT *
FROM (SELECT Indicator, ReportedDate, ROW_NUMBER() OVER(PARTITION BY ReportedDate ORDER BY ReportedDate desc) as periods
FROM indicatorTable) a
where periods <= 2
Another example - table with stock prices:
+--------+--------+-------------------------+
| Ticker | Price | Date |
+--------+--------+-------------------------+
| AAPL | 116.03 | 2020-11-25 00:00:00.000 |
| AAPL | 115.17 | 2020-11-24 00:00:00.000 |
| AAPL | 113.85 | 2020-11-23 00:00:00.000 |
| AAPL | 117.34 | 2020-11-20 00:00:00.000 |
| AAPL | 118.64 | 2020-11-19 00:00:00.000 |
| AAPL | 118.03 | 2020-11-18 00:00:00.000 |
| AAPL | 119.39 | 2020-11-17 00:00:00.000 |
| AAPL | 120.3 | 2020-11-16 00:00:00.000 |
| AAPL | 119.26 | 2020-11-13 00:00:00.000 |
| AAPL | 119.21 | 2020-11-12 00:00:00.000 |
| IBM | 124.2 | 2020-11-25 00:00:00.000 |
| IBM | 124.42 | 2020-11-24 00:00:00.000 |
| IBM | 120.09 | 2020-11-23 00:00:00.000 |
| IBM | 116.94 | 2020-11-20 00:00:00.000 |
| IBM | 117.18 | 2020-11-19 00:00:00.000 |
| IBM | 116.77 | 2020-11-18 00:00:00.000 |
| IBM | 117.7 | 2020-11-17 00:00:00.000 |
| IBM | 118.36 | 2020-11-16 00:00:00.000 |
| IBM | 116.85 | 2020-11-13 00:00:00.000 |
| IBM | 114.5 | 2020-11-12 00:00:00.000 |
| MSFT | 213.87 | 2020-11-25 00:00:00.000 |
| MSFT | 213.86 | 2020-11-24 00:00:00.000 |
| MSFT | 210.11 | 2020-11-23 00:00:00.000 |
| MSFT | 210.39 | 2020-11-20 00:00:00.000 |
| MSFT | 212.42 | 2020-11-19 00:00:00.000 |
| MSFT | 211.08 | 2020-11-18 00:00:00.000 |
| MSFT | 214.46 | 2020-11-17 00:00:00.000 |
| MSFT | 217.23 | 2020-11-16 00:00:00.000 |
| MSFT | 216.51 | 2020-11-13 00:00:00.000 |
| MSFT | 215.44 | 2020-11-12 00:00:00.000 |
+--------+--------+-------------------------+
What I want is to take the results for the last two periods, in this case:
+--------+--------+-------------------------+
| Ticker | Price | Date |
+--------+--------+-------------------------+
| AAPL | 116.03 | 2020-11-25 00:00:00.000 |
| AAPL | 115.17 | 2020-11-24 00:00:00.000 |
| IBM | 124.2 | 2020-11-25 00:00:00.000 |
| IBM | 124.42 | 2020-11-24 00:00:00.000 |
| MSFT | 213.87 | 2020-11-25 00:00:00.000 |
| MSFT | 213.86 | 2020-11-24 00:00:00.000 |
+--------+--------+-------------------------+
Use dense_rank instead of row_number:
SELECT *
FROM (SELECT Indicator, ReportedDate, dense_rank() OVER(PARTITION BY (select 1) ORDER BY ReportedDate desc) as periods
FROM #t) a
where periods <= 2
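To see why dense_rank() fits here, the sketch below runs the same idea on a cut-down copy of the stock table from the question. It uses SQLite via Python's sqlite3 as a stand-in for SQL Server (an assumption for runnability), and the table name prices is hypothetical. Because dense_rank() gives every row with the same date the same rank, all tickers for the two most recent dates are kept, however many calendar days lie between them.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table prices (Ticker text, Price real, Date text)")
conn.executemany("insert into prices values (?, ?, ?)", [
    ("AAPL", 116.03, "2020-11-25"), ("AAPL", 115.17, "2020-11-24"),
    ("AAPL", 113.85, "2020-11-23"), ("AAPL", 117.34, "2020-11-20"),
    ("IBM", 124.20, "2020-11-25"), ("IBM", 124.42, "2020-11-24"),
    ("IBM", 120.09, "2020-11-23"), ("IBM", 116.94, "2020-11-20"),
    ("MSFT", 213.87, "2020-11-25"), ("MSFT", 213.86, "2020-11-24"),
    ("MSFT", 210.11, "2020-11-23"), ("MSFT", 210.39, "2020-11-20"),
])

# Rank the distinct dates from newest to oldest; ties share a rank,
# so period = 1 is the latest date present in the table, period = 2 the one before.
last_two = conn.execute("""
    select Ticker, Price, Date
    from (select Ticker, Price, Date,
                 dense_rank() over (order by Date desc) as period
          from prices) as a
    where period <= 2
    order by Ticker, Date desc""").fetchall()

for row in last_two:
    print(row)
```

With row_number() in place of dense_rank(), the filter would keep only two rows in total instead of two dates.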
What if:
create table #t (Indicator decimal(37,12), ReportedDate datetime)
insert into #t
select 0.2917 , cast('2020-08-12 00:00:00' as datetime)
union
select 0.261919 , cast('2020-08-13 00:00:00' as datetime)
union
select 0.259211 , cast('2020-08-14 00:00:00' as datetime)
union
select 0.201075 , cast('2020-08-17 00:00:00' as datetime)
union
select 0.250153 , cast('2020-08-18 00:00:00' as datetime)
union
select 0.333093 , cast('2020-08-19 00:00:00' as datetime)
union
select 0.976495 , cast('2020-08-20 00:00:00' as datetime)
union
select 0.759739 , cast('2020-08-21 00:00:00' as datetime)
union
select 1.17279 , cast('2020-08-24 00:00:00' as datetime)
union
select 0.285365, cast('2020-08-25 00:00:00' as datetime)
select top 3 * from #t
order by 2 desc

SQL Matching many-to-many dates for ID field

Edit: Fixed Start Date for User 2
I have a list of user ids, each having many start dates and many end dates.
A start date can be recorded many times after the "actual" start date of an "event", same goes for the end date.
The result should be the first start date and the first end date of each user "event".
I hope that makes sense, see the example below.
Thanks!
Assuming the Following tables are given:
Start Table:
+--------+-------------+
| UserID | Start |
+--------+-------------+
| 1 | 2019-01-01 |
| 1 | 2019-01-02 |
| 1 | 2019-01-03 |
| 1 | 2019-04-01 |
| 1 | 2019-04-02 |
| 1 | 2019-04-03 |
| 2 | 2019-06-01 |
| 2 | 2019-06-02 |
| 2 | 2019-10-01 |
| 2 | 2019-10-02 |
+--------+-------------+
End Table:
+--------+------------+
| UserID | End |
+--------+------------+
| 1 | 2019-03-01 |
| 1 | 2019-03-02 |
| 1 | 2019-03-03 |
| 1 | 2019-05-01 |
| 1 | 2019-05-02 |
| 1 | 2019-05-03 |
| 2 | 2019-08-01 |
| 2 | 2019-08-02 |
| 2 | 2019-12-01 |
| 2 | 2019-12-02 |
+--------+------------+
Result:
+--------+------------+------------+
| UserID | Start | End |
+--------+------------+------------+
| 1 | 2019-01-01 | 2019-03-01 |
| 1 | 2019-04-01 | 2019-05-01 |
| 2 | 2019-06-01 | 2019-08-01 |
| 2 | 2019-10-01 | 2019-12-01 |
+--------+------------+------------+
Not sure I agree with your 2019-10-02
Here is one solution
Example
Select UserID
,[Start] = min([Start])
,[End]
From (
Select A.*
,[End] = (Select min([End]) From EndTable Where UserID=A.UserID and [End] >= A.Start )
From StartTable A
) A
Group By UserID,[End]
Returns
UserID Start End
1 2019-01-01 2019-03-01
1 2019-04-01 2019-05-01
2 2019-06-01 2019-08-01
2 2019-10-01 2019-12-01
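The query above can be tried out with the sample data as follows. This is a sketch that uses SQLite through Python's sqlite3 in place of SQL Server (an assumption for runnability); the result alias FirstStart is hypothetical. Each start row is paired with the earliest end on or after it, and grouping by that end collapses the repeated starts of one event into a single row.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table StartTable (UserID integer, [Start] text)")
conn.execute("create table EndTable (UserID integer, [End] text)")
conn.executemany("insert into StartTable values (?, ?)", [
    (1, "2019-01-01"), (1, "2019-01-02"), (1, "2019-01-03"),
    (1, "2019-04-01"), (1, "2019-04-02"), (1, "2019-04-03"),
    (2, "2019-06-01"), (2, "2019-06-02"),
    (2, "2019-10-01"), (2, "2019-10-02"),
])
conn.executemany("insert into EndTable values (?, ?)", [
    (1, "2019-03-01"), (1, "2019-03-02"), (1, "2019-03-03"),
    (1, "2019-05-01"), (1, "2019-05-02"), (1, "2019-05-03"),
    (2, "2019-08-01"), (2, "2019-08-02"),
    (2, "2019-12-01"), (2, "2019-12-02"),
])

# Inner query: for every start, find the first end date that is not before it.
# Outer query: all starts that share that end belong to one event; keep the earliest.
events = conn.execute("""
    select UserID, min([Start]) as FirstStart, [End]
    from (select A.UserID, A.[Start],
                 (select min(E.[End])
                  from EndTable E
                  where E.UserID = A.UserID and E.[End] >= A.[Start]) as [End]
          from StartTable A) as A
    group by UserID, [End]
    order by UserID, [End]""").fetchall()

for row in events:
    print(row)
```

The ISO date strings compare correctly as text, which is why plain text columns are enough for this sketch.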

Set a flag based on the value of another flag in the past hour

I have a table with the following design:
+------+-------------------------+-------------+
| Shop | Date | SafetyEvent |
+------+-------------------------+-------------+
| 1 | 2018-06-25 10:00:00.000 | 0 |
| 1 | 2018-06-25 10:30:00.000 | 1 |
| 1 | 2018-06-25 10:45:00.000 | 0 |
| 2 | 2018-06-25 11:00:00.000 | 0 |
| 2 | 2018-06-25 11:30:00.000 | 0 |
| 2 | 2018-06-25 11:45:00.000 | 0 |
| 3 | 2018-06-25 12:00:00.000 | 1 |
| 3 | 2018-06-25 12:30:00.000 | 0 |
| 3 | 2018-06-25 12:45:00.000 | 0 |
+------+-------------------------+-------------+
Basically at each shop, we track the date/time of a repair and flag if a safety event occurred. I want to add an additional column that tracks if a safety event has occurred in the last 8 hours at each shop. The end result will be like this:
+------+-------------------------+-------------+-------------------+
| Shop | Date | SafetyEvent | SafetyEvent8Hours |
+------+-------------------------+-------------+-------------------+
| 1 | 2018-06-25 10:00:00.000 | 0 | 0 |
| 1 | 2018-06-25 10:30:00.000 | 1 | 1 |
| 1 | 2018-06-25 10:45:00.000 | 0 | 1 |
| 2 | 2018-06-25 11:00:00.000 | 0 | 0 |
| 2 | 2018-06-25 11:30:00.000 | 0 | 0 |
| 2 | 2018-06-25 11:45:00.000 | 0 | 0 |
| 3 | 2018-06-25 12:00:00.000 | 1 | 1 |
| 3 | 2018-06-25 12:30:00.000 | 0 | 1 |
| 3 | 2018-06-25 12:45:00.000 | 0 | 1 |
+------+-------------------------+-------------+-------------------+
I was trying to use DATEDIFF but couldn't figure out how to have it occur for each row.
This isn't particularly efficient, but you can use apply or a correlated subquery:
select t.*, t8.SafetyEvent8Hours
from t outer apply
(select max(SafetyEvent) as SafetyEvent8Hours
from t t2
where t2.shop = t.shop and
t2.date <= t.date and
t2.date > dateadd(hour, -8, t.date)
) t8;
If you can rely on events being logged every 15 minutes, then a more efficient method is to use window functions:
select t.*,
max(SafetyEvent) over (partition by shop order by date rows between 31 preceding and current row) as SafetyEvent8Hours
from t
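The correlated-subquery variant can be checked against the sample data with the sketch below. It runs on SQLite through Python's sqlite3 instead of SQL Server (an assumption for runnability), so dateadd(hour, -8, ...) becomes datetime(..., '-8 hours'); the table name repairs is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table repairs (Shop integer, Date text, SafetyEvent integer)")
conn.executemany("insert into repairs values (?, ?, ?)", [
    (1, "2018-06-25 10:00:00", 0), (1, "2018-06-25 10:30:00", 1),
    (1, "2018-06-25 10:45:00", 0),
    (2, "2018-06-25 11:00:00", 0), (2, "2018-06-25 11:30:00", 0),
    (2, "2018-06-25 11:45:00", 0),
    (3, "2018-06-25 12:00:00", 1), (3, "2018-06-25 12:30:00", 0),
    (3, "2018-06-25 12:45:00", 0),
])

# For each row, look back over the same shop's rows from the previous 8 hours
# (current row included) and take the largest flag seen in that window.
rows = conn.execute("""
    select t.Shop, t.Date, t.SafetyEvent,
           (select max(t2.SafetyEvent)
            from repairs t2
            where t2.Shop = t.Shop
              and t2.Date <= t.Date
              and t2.Date > datetime(t.Date, '-8 hours')) as SafetyEvent8Hours
    from repairs t
    order by t.Shop, t.Date""").fetchall()

flags = [r[3] for r in rows]
print(flags)
```

The subquery always sees at least the current row, so the flag is never NULL.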

How to update column with average weekly value for each day in sql

I have the following table, to which I added a column named WeekValue. I want to fill this column with the weekly average of impressionCnt for the same category on each row.
Like:
+-------------------------+----------+---------------+--------------+
| Date | category | impressioncnt | weekAverage |
+-------------------------+----------+---------------+--------------+
| 2014-02-06 00:00:00.000 | a | 123 | 100 |
| 2014-02-06 00:00:00.000 | b | 121 | 200 |
| 2014-02-06 00:00:00.000 | c | 99 | 300 |
| 2014-02-07 00:00:00.000 | a | 33 | 100 |
| 2014-02-07 00:00:00.000 | b | 456 | 200 |
| 2014-02-07 00:00:00.000 | c | 54 | 300 |
| 2014-02-08 00:00:00.000 | a | 765 | 100 |
| 2014-02-08 00:00:00.000 | b | 78 | 200 |
| 2014-02-08 00:00:00.000 | c | 12 | 300 |
| ..... | | | |
| 2014-03-01 00:00:00.000 | a | 123 | 111 |
| 2014-03-01 00:00:00.000 | b | 121 | 222 |
| 2014-03-01 00:00:00.000 | c | 99 | 333 |
| 2014-03-02 00:00:00.000 | a | 33 | 111 |
| 2014-03-02 00:00:00.000 | b | 456 | 222 |
| 2014-03-02 00:00:00.000 | c | 54 | 333 |
| 2014-03-03 00:00:00.000 | a | 765 | 111 |
| 2014-03-03 00:00:00.000 | b | 78 | 222 |
| 2014-03-03 00:00:00.000 | c | 12 | 333 |
+-------------------------+----------+---------------+--------------+
I tried
update [dbo].[RetailTS]
set Week = datepart(day, dateDiff(day, 0, [Date])/7 *7)/7 +1
to get the week numbers, and then tried to group by week number, date and category, but this doesn't seem to be correct. How do I write the SQL query? Thanks!
Given that you may be adding more data in the future, thus requiring another update, you might want to just select out the weekly averages:
SELECT
Date,
category,
impressioncnt,
AVG(impressioncnt) OVER
(PARTITION BY category, DATEDIFF(d, 0, Date) / 7) AS weekAverage
FROM RetailTS
ORDER BY
Date, category;
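A quick way to check the bucketing is the sketch below, which runs the same windowed average on SQLite through Python's sqlite3 (an assumption for runnability). SQLite has no DATEDIFF, so the day count since 1900-01-01 (day 0 in SQL Server) is computed with julianday(); only category a from the question's data is loaded. Since 1900-01-01 was a Monday, each bucket runs from Monday to Sunday.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table RetailTS ([Date] text, category text, impressioncnt integer)")
conn.executemany("insert into RetailTS values (?, ?, ?)", [
    ("2014-02-06", "a", 123), ("2014-02-07", "a", 33), ("2014-02-08", "a", 765),
    ("2014-03-01", "a", 123), ("2014-03-02", "a", 33), ("2014-03-03", "a", 765),
])

# Integer-divide the number of days since 1900-01-01 by 7 to get a week bucket,
# then average within (category, week bucket) without collapsing the rows.
rows = conn.execute("""
    select [Date], category, impressioncnt,
           avg(impressioncnt) over (
               partition by category,
                            cast(julianday([Date]) - julianday('1900-01-01') as integer) / 7
           ) as weekAverage
    from RetailTS
    order by [Date], category""").fetchall()

averages = [r[3] for r in rows]
print(averages)
```

2014-03-03 is a Monday, so it falls into a new bucket and is averaged on its own. Note that SQLite returns a floating-point average, whereas AVG over an integer column in SQL Server returns an integer.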

Join and fill down NULL values with last non null

I have been attempting a join that I at first believed was relatively simple, but I am now having a bit of trouble getting it exactly right. I have two sets of data which resemble the following:
ID | stmt_dt ID | renewal_dt
-- --
1 |1/31/15 1 | 2/28/15
1 |2/28/15 1 | 4/30/15
1 |3/31/15 2 | 2/28/15
1 |4/30/15 3 | 1/31/15
1 |5/31/15
2 |1/31/15
2 |2/28/15
2 |3/31/15
2 |4/30/15
2 |5/31/15
3 |1/31/15
3 |2/28/15
3 |3/31/15
3 |4/30/15
3 |5/31/15
4 |1/31/15
4 |2/28/15
4 |3/31/15
4 |4/30/15
4 |5/31/15
Here is my desired output
ID | stmt_dt | renewal_dt
--
1 |1/31/15 | NA
1 |2/28/15 | 2/28/15
1 |3/31/15 | 2/28/15
1 |4/30/15 | 4/30/15
1 |5/31/15 | 4/30/15
2 |1/31/15 | NA
2 |2/28/15 | 2/28/15
2 |3/31/15 | 2/28/15
2 |4/30/15 | 2/28/15
2 |5/31/15 | 2/28/15
3 |1/31/15 | 1/31/15
3 |2/28/15 | 1/31/15
3 |3/31/15 | 1/31/15
3 |4/30/15 | 1/31/15
3 |5/31/15 | 1/31/15
4 |1/31/15 | NA
4 |2/28/15 | NA
4 |3/31/15 | NA
4 |4/30/15 | NA
4 |5/31/15 | NA
My biggest issue has been getting the merged values to fill down to the next non null within each group. Any ideas on how to achieve this join? Thanks!
min(...) over (... rows between 1 following and 1 following)* + join
* = LEAD
select s.ID
,s.stmt_dt
,r.renewal_dt
from stmt s
left join (select ID
,renewal_dt
,min (renewal_dt) over
(
partition by ID
order by renewal_dt
rows between 1 following
and 1 following
) as next_renewal_dt
from renewal
) r
on s.ID = r.ID
and s.stmt_dt >= r.renewal_dt
and s.stmt_dt < coalesce (r.next_renewal_dt,date '9999-01-01')
order by s.ID
,s.stmt_dt
+----+------------+------------+
| ID | stmt_dt | renewal_dt |
+----+------------+------------+
| 1 | 2015-01-31 | |
| 1 | 2015-02-28 | 2015-02-28 |
| 1 | 2015-03-31 | 2015-02-28 |
| 1 | 2015-04-30 | 2015-04-30 |
| 1 | 2015-05-31 | 2015-04-30 |
| 2 | 2015-01-31 | |
| 2 | 2015-02-28 | 2015-02-28 |
| 2 | 2015-03-31 | 2015-02-28 |
| 2 | 2015-04-30 | 2015-02-28 |
| 2 | 2015-05-31 | 2015-02-28 |
| 3 | 2015-01-31 | 2015-01-31 |
| 3 | 2015-02-28 | 2015-01-31 |
| 3 | 2015-03-31 | 2015-01-31 |
| 3 | 2015-04-30 | 2015-01-31 |
| 3 | 2015-05-31 | 2015-01-31 |
| 4 | 2015-01-31 | |
| 4 | 2015-02-28 | |
| 4 | 2015-03-31 | |
| 4 | 2015-04-30 | |
| 4 | 2015-05-31 | |
+----+------------+------------+
union all + last_value
select ID
,dt as stmt_dt
,last_value (case when tab = 'R' then dt end ignore nulls) over
(
partition by id
order by dt
,case tab when 'R' then 1 else 2 end
) as renewal_dt
from ( select 'S',ID,stmt_dt from stmt
union all select 'R',ID,renewal_dt from renewal
) as t (tab,ID,dt)
qualify tab = 'S'
order by ID
,stmt_dt
+----+------------+------------+
| ID | stmt_dt | renewal_dt |
+----+------------+------------+
| 1 | 2015-01-31 | |
| 1 | 2015-02-28 | 2015-02-28 |
| 1 | 2015-03-31 | 2015-02-28 |
| 1 | 2015-04-30 | 2015-04-30 |
| 1 | 2015-05-31 | 2015-04-30 |
| 2 | 2015-01-31 | |
| 2 | 2015-02-28 | 2015-02-28 |
| 2 | 2015-03-31 | 2015-02-28 |
| 2 | 2015-04-30 | 2015-02-28 |
| 2 | 2015-05-31 | 2015-02-28 |
| 3 | 2015-01-31 | 2015-01-31 |
| 3 | 2015-02-28 | 2015-01-31 |
| 3 | 2015-03-31 | 2015-01-31 |
| 3 | 2015-04-30 | 2015-01-31 |
| 3 | 2015-05-31 | 2015-01-31 |
| 4 | 2015-01-31 | |
| 4 | 2015-02-28 | |
| 4 | 2015-03-31 | |
| 4 | 2015-04-30 | |
| 4 | 2015-05-31 | |
+----+------------+------------+
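On engines without last_value(... ignore nulls) or qualify, the same union all idea can be sketched with a running max(), which works here because the rows are ordered by date, so the latest renewal seen so far is also the largest. The example below is a sketch on SQLite through Python's sqlite3 (an assumption for runnability), with the sample data written as ISO dates; the derived-table alias u is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table stmt (ID integer, stmt_dt text)")
conn.execute("create table renewal (ID integer, renewal_dt text)")
dates = ["2015-01-31", "2015-02-28", "2015-03-31", "2015-04-30", "2015-05-31"]
conn.executemany("insert into stmt values (?, ?)",
                 [(i, d) for i in range(1, 5) for d in dates])
conn.executemany("insert into renewal values (?, ?)", [
    (1, "2015-02-28"), (1, "2015-04-30"), (2, "2015-02-28"), (3, "2015-01-31"),
])

# Stack both tables, order renewals ('R') before statements ('S') on the same date,
# and carry the latest renewal date forward with a running max per ID.
rows = conn.execute("""
    select ID, dt as stmt_dt, renewal_dt
    from (select tab, ID, dt,
                 max(case when tab = 'R' then dt end) over (
                     partition by ID
                     order by dt, case tab when 'R' then 1 else 2 end
                 ) as renewal_dt
          from (select 'S' as tab, ID, stmt_dt as dt from stmt
                union all
                select 'R', ID, renewal_dt from renewal) as u) as t
    where tab = 'S'
    order by ID, stmt_dt""").fetchall()

filled = {i: [r[2] for r in rows if r[0] == i] for i in range(1, 5)}
print(filled)
```

The outer where tab = 'S' plays the role of qualify: the renewal rows are needed inside the window but dropped from the result.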
SELECT correlated query
select s.ID
,s.stmt_dt
,(
select max (r.renewal_dt)
from renewal r
where r.ID = s.ID
and r.renewal_dt <= s.stmt_dt
) as renewal_dt
from stmt s
order by ID
,stmt_dt
+----+------------+------------+
| ID | stmt_dt | renewal_dt |
+----+------------+------------+
| 1 | 2015-01-31 | |
| 1 | 2015-02-28 | 2015-02-28 |
| 1 | 2015-03-31 | 2015-02-28 |
| 1 | 2015-04-30 | 2015-04-30 |
| 1 | 2015-05-31 | 2015-04-30 |
| 2 | 2015-01-31 | |
| 2 | 2015-02-28 | 2015-02-28 |
| 2 | 2015-03-31 | 2015-02-28 |
| 2 | 2015-04-30 | 2015-02-28 |
| 2 | 2015-05-31 | 2015-02-28 |
| 3 | 2015-01-31 | 2015-01-31 |
| 3 | 2015-02-28 | 2015-01-31 |
| 3 | 2015-03-31 | 2015-01-31 |
| 3 | 2015-04-30 | 2015-01-31 |
| 3 | 2015-05-31 | 2015-01-31 |
| 4 | 2015-01-31 | |
| 4 | 2015-02-28 | |
| 4 | 2015-03-31 | |
| 4 | 2015-04-30 | |
| 4 | 2015-05-31 | |
+----+------------+------------+