Update a Field/Column based on Current and Previous Record Value - sql

I need assistance with updating a field/column "IsLatest" based on the comparison between the current and previous record. I'm using CTE's syntax and I'm able to get the current and previous record but I'm unable updated field/column "IsLatest" which I need based on the field/column "Value" of the current and previous record.
Example
Current Output
Dates Customer Value IsLatest
2010-01-01 00:00:00.000 1 12 1
Dates Customer Value IsLatest
2010-01-01 00:00:00.000 1 12 0
2010-01-02 00:00:00.000 1 30 1
Dates Customer Value IsLatest
2010-01-01 00:00:00.000 1 12 0
2010-01-02 00:00:00.000 1 30 0
2010-01-03 00:00:00.000 1 13 1
Expected Final Output
Dates Customer Value ValueSetId IsLatest
2010-01-01 00:00:00.000 1 12 12 0
2010-01-01 00:00:00.000 1 12 13 0
2010-01-01 00:00:00.000 1 12 14 0
2010-01-02 00:00:00.000 1 30 12 0
2010-01-02 00:00:00.000 1 30 13 0
2010-01-02 00:00:00.000 1 30 14 0
2010-01-03 00:00:00.000 1 13 12 0
2010-01-03 00:00:00.000 1 13 13 0
2010-01-03 00:00:00.000 1 13 14 0
2010-01-04 00:00:00.000 1 14 12 0
2010-01-04 00:00:00.000 1 14 13 0
2010-01-04 00:00:00.000 1 14 14 1

;WITH a AS
(
SELECT
Dates Customer Value,
row_number() over (partition by customer order by Dates desc, ValueSetId desc) rn
FROM #Customers)
SELECT Dates, Customer, Value, case when RN = 1 then 1 else 0 end IsLatest
FROM a

Related

SQL Conditional update column value based on previous row value

I have a table with attendance dates in SQL Workbench/J. I need to define the attendance periods of the people, where any attendance period with gaps less or equal than 90 days are merged into a single attendance period and any gaps larger than that are considered a different attendance period. For example for a single person this is the table I have
id
year
start_date
end_date
prev_att_month
diff
1
2012
2012-08-01
2012-08-31
2012-07-01
31
1
2012
2012-07-01
2012-07-31
2012-04-01
91
1
2012
2012-04-01
2012-04-30
2012-03-01
31
1
2012
2012-03-01
2012-03-31
2012-02-01
29
1
2012
2012-02-01
2012-02-29
2012-01-01
31
1
2012
2012-01-01
2012-01-31
2011-12-01
31
1
2011
2011-12-01
2011-12-31
2011-11-01
30
1
2011
2011-11-01
2011-11-30
2011-10-01
31
1
2011
2011-10-01
2011-10-31
2011-09-01
30
1
2011
2011-09-01
2011-09-30
2011-08-01
31
1
2011
2011-08-01
2011-08-31
2011-07-01
31
1
2011
2011-07-01
2011-07-31
2011-05-01
61
1
2011
2011-05-01
2011-05-31
2011-04-01
30
1
2011
2011-04-01
2011-04-30
2011-03-01
31
1
2011
2011-03-01
2011-03-31
2011-02-01
28
1
2011
2011-02-01
2011-02-28
2010-08-01
184
1
2010
2010-08-01
2010-08-31
2010-07-01
31
1
2010
2010-07-01
2010-07-31
2010-06-01
30
1
2010
2010-06-01
2010-06-30
2010-05-01
31
1
2010
2010-05-01
2010-05-31
2010-04-01
30
1
2010
2010-04-01
2010-04-30
where I defined the previous attendance month column with a lag function and then found the difference between that column and the the start_date in the diff column. This way I can check the gaps between the attendance months
I want this output of the attendance periods with the 90 day rule explained above:
id
start_date
end_date
1
1/04/2010
31/08/2010
1
1/02/2011
30/04/2012
1
1/07/2012
31/08/2012
Does any one have an idea of how to do this?
So far I was just able to define the difference between the attendance months but since this is a large data set I have not been able to find a solution to define the attendance periods without making a row to row analysis.
with [table] as (
select id, year, start_date, end_date,
lag(start_date) over (partition by id order by id, year, start_date) as prev_att_month,
start_date-prev_att_month as diff
from source
)
select *
from [table]
where id = 1
One method would be to use a windowed COUNT to count how many times a value greater than 90 has appeared in the diff column, which provides a unique group number. Then you can just group your data into those groups and get the MIN and MAX values:
WITH Grps AS(
SELECT V.id,
V.year,
V.start_date,
V.end_date,
V.prev_att_month,
V.diff,
COUNT(CASE WHEN diff > 90 THEN 1 END) OVER (PARTITION BY ID ORDER BY V.start_date ASC) AS Grp
FROM (VALUES(1,2012,CONVERT(date,'20120801'),CONVERT(date,'20120831'),CONVERT(date,'2012-07-01'),31),
(1,2012,CONVERT(date,'20120701'),CONVERT(date,'20120731'),CONVERT(date,'2012-04-01'),91),
(1,2012,CONVERT(date,'20120401'),CONVERT(date,'20120430'),CONVERT(date,'2012-03-01'),31),
(1,2012,CONVERT(date,'20120301'),CONVERT(date,'20120331'),CONVERT(date,'2012-02-01'),29),
(1,2012,CONVERT(date,'20120201'),CONVERT(date,'20120229'),CONVERT(date,'2012-01-01'),31),
(1,2012,CONVERT(date,'20120101'),CONVERT(date,'20120131'),CONVERT(date,'2011-12-01'),31),
(1,2011,CONVERT(date,'20111201'),CONVERT(date,'20111231'),CONVERT(date,'2011-11-01'),30),
(1,2011,CONVERT(date,'20111101'),CONVERT(date,'20111130'),CONVERT(date,'2011-10-01'),31),
(1,2011,CONVERT(date,'20111001'),CONVERT(date,'20111031'),CONVERT(date,'2011-09-01'),30),
(1,2011,CONVERT(date,'20110901'),CONVERT(date,'20110930'),CONVERT(date,'2011-08-01'),31),
(1,2011,CONVERT(date,'20110801'),CONVERT(date,'20110831'),CONVERT(date,'2011-07-01'),31),
(1,2011,CONVERT(date,'20110701'),CONVERT(date,'20110731'),CONVERT(date,'2011-05-01'),61),
(1,2011,CONVERT(date,'20110501'),CONVERT(date,'20110531'),CONVERT(date,'2011-04-01'),30),
(1,2011,CONVERT(date,'20110401'),CONVERT(date,'20110430'),CONVERT(date,'2011-03-01'),31),
(1,2011,CONVERT(date,'20110301'),CONVERT(date,'20110331'),CONVERT(date,'2011-02-01'),28),
(1,2011,CONVERT(date,'20110201'),CONVERT(date,'20110228'),CONVERT(date,'2010-08-01'),184),
(1,2010,CONVERT(date,'20100801'),CONVERT(date,'20100831'),CONVERT(date,'2010-07-01'),31),
(1,2010,CONVERT(date,'20100701'),CONVERT(date,'20100731'),CONVERT(date,'2010-06-01'),30),
(1,2010,CONVERT(date,'20100601'),CONVERT(date,'20100630'),CONVERT(date,'2010-05-01'),31),
(1,2010,CONVERT(date,'20100501'),CONVERT(date,'20100531'),CONVERT(date,'2010-04-01'),30),
(1,2010,CONVERT(date,'20100401'),CONVERT(date,'20100430'),NULL,NULL))V(id,year,start_date,end_date,prev_att_month,diff))
SELECT id,
MIN(Start_date) AS Start_date,
MAX(End_Date) AS End_Date
FROM Grps
GROUP BY Id,
Grp
ORDER BY id,
Start_date;

Updating specific rows to values based on the count of rows in another table

I have a table RESERVED_BOOKINGS_OVERRIDDEN
booking_product_id on_site_from_dt on_site_to_dt venue_id
4 2021-08-07 16:00:00.000 2021-08-14 10:00:00.000 12
4 2021-08-07 16:00:00.000 2021-08-10 10:00:00.000 12
6 2021-08-02 16:00:00.000 2021-08-09 10:00:00.000 12
and another table ALLOCATED_PRODUCTS
Date booking_product_id venue_id ReservedQuant
2021-08-05 00:00:00.000 4 12 3
2021-08-06 00:00:00.000 4 12 3
2021-08-07 00:00:00.000 4 12 3
2021-08-08 00:00:00.000 4 12 3
2021-08-05 00:00:00.000 6 12 1
Now I need to update the ReservedQuant column in the ALLOCATED_PRODUCTS table based on the rows in RESERVED_BOOKINGS_OVERRIDDEN
The ReservedQuant must minus by the amount of rows found where the ALLOCATED_PRODUCTS.Date is within the RESERVED_BOOKINGS_OVERRIDDEN.on_site_from_dt and RESERVED_BOOKINGS_OVERRIDDEN.on_site_to_dt and ALLOCATED_PRODUCTS.booking_product_id = RESERVED_BOOKINGS_OVERRIDDEN.booking_product_id.
This should be the state of the data after the update:
Date booking_product_id venue_id ReservedQuant
2021-08-05 00:00:00.000 4 12 3
2021-08-06 00:00:00.000 4 12 3
2021-08-07 00:00:00.000 4 12 1
2021-08-08 00:00:00.000 4 12 1
2021-08-05 00:00:00.000 6 12 0
update a set a.ReservedQuant=ReservedQuant-(select count(1) from RESERVED_BOOKINGS_OVERRIDDEN b where a.booking_product_id=b.booking_product_id
and a.date between cast(b.on_site_from_dt as date) and cast(b.on_site_to_dt as date))
from ALLOCATED_PRODUCTS a

Pandas time difference calculation error

I have two time columns in my dataframe: called date1 and date2.
As far as I always assumed, both are in date_time format. However, I now have to calculate the difference in days between the two and it doesn't work.
I run the following code to analyse the data:
df['month1'] = pd.DatetimeIndex(df['date1']).month
df['month2'] = pd.DatetimeIndex(df['date2']).month
print(df[["date1", "date2", "month1", "month2"]].head(10))
print(df["date1"].dtype)
print(df["date2"].dtype)
The output is:
date1 date2 month1 month2
0 2016-02-29 2017-01-01 1 1
1 2016-11-08 2017-01-01 1 1
2 2017-11-27 2009-06-01 1 6
3 2015-03-09 2014-07-01 1 7
4 2015-06-02 2014-07-01 1 7
5 2015-09-18 2017-01-01 1 1
6 2017-09-06 2017-07-01 1 7
7 2017-04-15 2009-06-01 1 6
8 2017-08-14 2014-07-01 1 7
9 2017-12-06 2014-07-01 1 7
datetime64[ns]
object
As you can see, the month for date1 is not calculated correctly!
The final operation, which does not work is:
df["date_diff"] = (df["date1"]-df["date2"]).astype('timedelta64[D]')
which leads to the following error:
incompatible type [object] for a datetime/timedelta operation
I first thought it might be due to date2, so I tried:
df["date2_new"] = pd.to_datetime(df['date2'] - 315619200, unit = 's')
leading to:
unsupported operand type(s) for -: 'str' and 'int'
Anyone has an idea what I need to change?
Use .dt accessor with days attribute:
df[['date1','date2']] = df[['date1','date2']].apply(pd.to_datetime)
df['date_diff'] = (df['date1'] - df['date2']).dt.days
Output:
date1 date2 month1 month2 date_diff
0 2016-02-29 2017-01-01 1 1 -307
1 2016-11-08 2017-01-01 1 1 -54
2 2017-11-27 2009-06-01 1 6 3101
3 2015-03-09 2014-07-01 1 7 251
4 2015-06-02 2014-07-01 1 7 336
5 2015-09-18 2017-01-01 1 1 -471
6 2017-09-06 2017-07-01 1 7 67
7 2017-04-15 2009-06-01 1 6 2875
8 2017-08-14 2014-07-01 1 7 1140
9 2017-12-06 2014-07-01 1 7 1254

PostgreSQL - rank over rows listed in blocks of 0 and 1

I have a table that looks like:
id code date1 date2 block
--------------------------------------------------
20 1234 2017-07-01 2017-07-31 1
15 1234 2017-06-01 2017-06-30 1
13 1234 2017-05-01 2017-05-31 0
11 1234 2017-03-01 2017-03-31 0
9 1234 2017-02-01 2017-02-28 1
8 1234 2017-01-01 2017-01-31 0
7 1234 2016-11-01 2016-11-31 0
6 1234 2016-10-01 2016-10-31 1
2 1234 2016-09-01 2016-09-31 1
I need to rank the rows according to the blocks of 0's and 1's, like:
id code date1 date2 block desired_rank
-------------------------------------------------------------------
20 1234 2017-07-01 2017-07-31 1 1
15 1234 2017-06-01 2017-06-30 1 1
13 1234 2017-05-01 2017-05-31 0 2
11 1234 2017-03-01 2017-03-31 0 2
9 1234 2017-02-01 2017-02-28 1 3
8 1234 2017-01-01 2017-01-31 0 4
7 1234 2016-11-01 2016-11-31 0 4
6 1234 2016-10-01 2016-10-31 1 5
2 1234 2016-09-01 2016-09-31 1 5
I've tried to use rank() and dense_rank(), but the result I end up with is:
id code date1 date2 block dense_rank()
-------------------------------------------------------------------
20 1234 2017-07-01 2017-07-31 1 1
15 1234 2017-06-01 2017-06-30 1 2
13 1234 2017-05-01 2017-05-31 0 1
11 1234 2017-03-01 2017-03-31 0 2
9 1234 2017-02-01 2017-02-28 1 3
8 1234 2017-01-01 2017-01-31 0 3
7 1234 2016-11-01 2016-11-31 0 4
6 1234 2016-10-01 2016-10-31 1 4
2 1234 2016-09-01 2016-09-31 1 5
In the last table, the rank doesn't care about the rows, it just takes all the 1's and 0's as a unit and sets an ascending count starting at the first 1 and 0.
My query goes like this:
CREATE TEMP TABLE data (id integer,code text, date1 date, date2 date, block integer);
INSERT INTO data VALUES
(20,'1234', '2017-07-01','2017-07-31',1),
(15,'1234', '2017-06-01','2017-06-30',1),
(13,'1234', '2017-05-01','2017-05-31',0),
(11,'1234', '2017-03-01','2017-03-31',0),
(9, '1234', '2017-02-01','2017-02-28',1),
(8, '1234', '2017-01-01','2017-01-31',0),
(7, '1234', '2016-11-01','2016-11-30',0),
(6, '1234', '2016-10-01','2016-10-31',1),
(2, '1234', '2016-09-01','2016-09-30',1);
SELECT *,dense_rank() OVER (PARTITION BY code,block ORDER BY date2 DESC)
FROM data
ORDER BY date2 DESC;
By the way, the database is in postgreSQL.
I hope there's a workaround... Thanks :)
Edit: Note that the blocks of 0's and 1's aren't equal.
There's no way to get this result using a single Window Function:
SELECT *,
Sum(flag) -- now sum the 0/1 to create the "rank"
Over (PARTITION BY code
ORDER BY date2 DESC)
FROM
(
SELECT *,
CASE
WHEN Lag(block) -- check if this is the 1st row of a new block
Over (PARTITION BY code
ORDER BY date2 DESC) = block
THEN 0
ELSE 1
END AS flag
FROM DATA
) AS dt

SQL Date Range Query - Table Comparison

I have two SQL Server tables containing the following information:
Table t_venues:
venue_id is unique
venue_id | start_date | end_date
1 | 01/01/2014 | 02/01/2014
2 | 05/01/2014 | 05/01/2014
3 | 09/01/2014 | 15/01/2014
4 | 20/01/2014 | 30/01/2014
Table t_venueuser:
venue_id is not unique
venue_id | start_date | end_date
1 | 02/01/2014 | 02/01/2014
2 | 05/01/2014 | 05/01/2014
3 | 09/01/2014 | 10/01/2014
4 | 23/01/2014 | 25/01/2014
From these two tables I need to find the dates that haven't been selected for each range, so the output would look like this:
venue_id | start_date | end_date
1 | 01/01/2014 | 01/01/2014
3 | 11/01/2014 | 15/01/2014
4 | 20/01/2014 | 22/01/2014
4 | 26/01/2014 | 30/01/2014
I can compare the two tables and get the date ranges from t_venues to appear in my query using 'except' but I can't get the query to produce the non-selected dates. Any help would be appreciated.
Calendar Table!
Another perfect candidate for a calendar table. If you can't be bothered to search for one, here's one I made earlier.
Setup Data
DECLARE #t_venues table (
venue_id int
, start_date date
, end_date date
);
INSERT INTO #t_venues (venue_id, start_date, end_date)
VALUES (1, '2014-01-01', '2014-01-02')
, (2, '2014-01-05', '2014-01-05')
, (3, '2014-01-09', '2014-01-15')
, (4, '2014-01-20', '2014-01-30')
;
DECLARE #t_venueuser table (
venue_id int
, start_date date
, end_date date
);
INSERT INTO #t_venueuser (venue_id, start_date, end_date)
VALUES (1, '2014-01-02', '2014-01-02')
, (2, '2014-01-05', '2014-01-05')
, (3, '2014-01-09', '2014-01-10')
, (4, '2014-01-23', '2014-01-25')
;
The Query
SELECT t_venues.venue_id
, calendar.the_date
, CASE WHEN t_venueuser.venue_id IS NULL THEN 1 ELSE 0 END As is_available
FROM dbo.calendar /* see: http://gvee.co.uk/files/sql/dbo.numbers%20&%20dbo.calendar.sql for an example */
INNER
JOIN #t_venues As t_venues
ON t_venues.start_date <= calendar.the_date
AND t_venues.end_date >= calendar.the_date
LEFT
JOIN #t_venueuser As t_venueuser
ON t_venueuser.venue_id = t_venues.venue_id
AND t_venueuser.start_date <= calendar.the_date
AND t_venueuser.end_date >= calendar.the_date
ORDER
BY t_venues.venue_id
, calendar.the_date
;
The Result
venue_id the_date is_available
----------- ----------------------- ------------
1 2014-01-01 00:00:00.000 1
1 2014-01-02 00:00:00.000 0
2 2014-01-05 00:00:00.000 0
3 2014-01-09 00:00:00.000 0
3 2014-01-10 00:00:00.000 0
3 2014-01-11 00:00:00.000 1
3 2014-01-12 00:00:00.000 1
3 2014-01-13 00:00:00.000 1
3 2014-01-14 00:00:00.000 1
3 2014-01-15 00:00:00.000 1
4 2014-01-20 00:00:00.000 1
4 2014-01-21 00:00:00.000 1
4 2014-01-22 00:00:00.000 1
4 2014-01-23 00:00:00.000 0
4 2014-01-24 00:00:00.000 0
4 2014-01-25 00:00:00.000 0
4 2014-01-26 00:00:00.000 1
4 2014-01-27 00:00:00.000 1
4 2014-01-28 00:00:00.000 1
4 2014-01-29 00:00:00.000 1
4 2014-01-30 00:00:00.000 1
(21 row(s) affected)
The Explanation
Our calendar tables contains an entry for every date.
We join our t_venues (as an aside, if you have the choice, lose the t_ prefix!) to return every day between our start_date and end_date. Example output for venue_id=4 for just this join:
venue_id the_date
----------- -----------------------
4 2014-01-20 00:00:00.000
4 2014-01-21 00:00:00.000
4 2014-01-22 00:00:00.000
4 2014-01-23 00:00:00.000
4 2014-01-24 00:00:00.000
4 2014-01-25 00:00:00.000
4 2014-01-26 00:00:00.000
4 2014-01-27 00:00:00.000
4 2014-01-28 00:00:00.000
4 2014-01-29 00:00:00.000
4 2014-01-30 00:00:00.000
(11 row(s) affected)
Now we have one row per day, we [outer] join our t_venueuser table. We join this in much the same manner as before, but with one added twist: we need to join based on the venue_id too!
Running this for venue_id=4 gives this result:
venue_id the_date t_venueuser_venue_id
----------- ----------------------- --------------------
4 2014-01-20 00:00:00.000 NULL
4 2014-01-21 00:00:00.000 NULL
4 2014-01-22 00:00:00.000 NULL
4 2014-01-23 00:00:00.000 4
4 2014-01-24 00:00:00.000 4
4 2014-01-25 00:00:00.000 4
4 2014-01-26 00:00:00.000 NULL
4 2014-01-27 00:00:00.000 NULL
4 2014-01-28 00:00:00.000 NULL
4 2014-01-29 00:00:00.000 NULL
4 2014-01-30 00:00:00.000 NULL
(11 row(s) affected)
See how we have a NULL value for rows where there is no t_venueuser record. Genius, no? ;-)
So in my first query I gave you a quick CASE statement that shows availability (1=available, 0=not available). This is for illustration only, but could be useful to you.
You can then either wrap the query up and then apply an extra filter on this calculated column or simply add a where clause in: WHERE t_venueuser.venue_id IS NULL and that will do the same trick.
This is a complete hack, but it gives the results you require, I've only tested it on the data you provided so there may well be gotchas with larger sets.
In general what you are looking at solving here is a variation of gaps and islands problem ,this is (briefly) a sequence where some items are missing. The missing items are referred as gaps and the existing items are referred as islands. If you would like to understand this issue in general check a few of the articles:
Simple talk article
blogs.MSDN article
SO answers tagged gaps-and-islands
Code:
;with dates as
(
SELECT vdates.venue_id,
vdates.vdate
FROM ( SELECT DATEADD(d,sv.number,v.start_date) vdate
, v.venue_id
FROM t_venues v
INNER JOIN master..spt_values sv
ON sv.type='P'
AND sv.number BETWEEN 0 AND datediff(d, v.start_date, v.end_date)) vdates
LEFT JOIN t_venueuser vu
ON vdates.vdate >= vu.start_date
AND vdates.vdate <= vu.end_date
AND vdates.venue_id = vu.venue_id
WHERE ISNULL(vu.venue_id,-1) = -1
)
SELECT venue_id, ISNULL([1],[2]) StartDate, [2] EndDate
FROM (SELECT venue_id, rDate, ROW_NUMBER() OVER (PARTITION BY venue_id, DateType ORDER BY rDate) AS rType, DateType as dType
FROM( SELECT d1.venue_id
,d1.vdate AS rDate
,'1' AS DateType
FROM dates AS d1
LEFT JOIN dates AS d0
ON DATEADD(d,-1,d1.vdate) = d0.vdate
LEFT JOIN dates AS d2
ON DATEADD(d,1,d1.vdate) = d2.vdate
WHERE CASE ISNULL(d2.vdate, '01 Jan 1753') WHEN '01 Jan 1753' THEN '2' ELSE '1' END = 1
AND ISNULL(d0.vdate, '01 Jan 1753') = '01 Jan 1753'
UNION
SELECT d1.venue_id
,ISNULL(d2.vdate,d1.vdate)
,'2'
FROM dates AS d1
LEFT JOIN dates AS d2
ON DATEADD(d,1,d1.vdate) = d2.vdate
WHERE CASE ISNULL(d2.vdate, '01 Jan 1753') WHEN '01 Jan 1753' THEN '2' ELSE '1' END = 2
) res
) src
PIVOT (MIN (rDate)
FOR dType IN
( [1], [2] )
) AS pvt
Results:
venue_id StartDate EndDate
1 2014-01-01 2014-01-01
3 2014-01-11 2014-01-15
4 2014-01-20 2014-01-22
4 2014-01-26 2014-01-30