sql-query that change all validTo dates to the next validFrom date minus one Day - sql

I have to modify a big pricelist table so that there is only one valid price for every article.
Sometimes the sales employees insert new prices and forgot to change the old infinite validTo dates.
So I have to write a sql-query to change all validTo dates to the next validFrom date minus one day, when the validTo date has infinite validity (9999-12-31).
But I have no idea how can i reach this with only SQL (Oracle 12).
anr price validFrom validTo
1 447.1 2015-06-01 9999-12-31 <
1 447.2 2015-06-16 2015-06-16
1 447.3 2015-06-17 2015-06-17
1 447.4 2015-06-22 2015-06-22
1 447.5 2015-07-06 9999-12-31 <
1 395.0 2015-07-20 2015-07-20
1 447.6 2015-08-03 9999-12-31 <
1 447.7 2015-08-17 9999-12-31 <
1 447.8 2015-08-24 9999-12-31 <
1 395.0 2015-09-07 2015-09-07
1 450.9 2015-11-15 9999-12-31 < no change because it is the last entry
after updating the the table, the result should look like
anr price validFrom validTo
1 447.1 2015-06-01 2015-06-15 <
1 447.2 2015-06-16 2015-06-16
1 447.3 2015-06-17 2015-06-17
1 447.4 2015-06-22 2015-06-22
1 447.5 2015-07-06 2015-07-19 <
1 395.0 2015-07-20 2015-07-20
1 447.6 2015-08-03 2015-08-16 <
1 447.7 2015-08-17 2015-08-23 <
1 447.8 2015-08-24 2015-09-06 <
1 395.0 2015-09-07 2015-09-07
1 450.9 2015-11-15 9999-12-31 <

In order to update an end date you can simply select the minimum of all higher start dates.
update mytable upd
set enddate = coalesce(
(
select min(startdate) - 1
from mytable later
where later.startdate > upd.startdate
and later.anr = upd.anr -- same product
), date'9999-12-31') -- coalesce for the case there is no later record
where enddate = date'9999-12-31';
I have taken anr to be the product id. If it isn't then change the statement accordingly.

Oracle provides an analytic function LEAD that references the current-plus-n-th record given a sort criterion. This function may serve the purpose of selecting the proper date value in an update statement as follows ( let test_prices be the table name, ppk its PK ):
update test_prices p
set p.validTo = (
select ps.vtn
from (
select lead ( p1.validFrom, 1 ) over ( order by p1.validFrom ) - 1 vtn
, ppk
from test_prices p1
) ps
where ps.ppk = p.ppk
)
where to_char(p.validTo, 'YYYY') = '9999'
and p.validFrom != ( select max(validFrom) from test_prices )
;

UPDATE VALID_DATES v
SET validTo = (
SELECT validTo
FROM (
SELECT anr,
validFrom,
COALESCE(
LEAD( validFrom - 1, 1 ) OVER ( PARTITION BY anr ORDER BY validFrom ),
validTo
) AS validTo
FROM valid_dates
) u
WHERE v.anr = u.anr
AND v.validFrom = u.validFrom
)
WHERE validTo = DATE '9999-12-31';

There are two possibilities:
1. Explicit time spans
price validFrom validTo
90.99 2016-01-01 9999-12-31
80.00 2016-01-16 2016-01-17
The first price would be valid both before January 16 and after January 17, whereas the second price was only valid on two days in January.
It would then be a very bad idea to change the first validTo.
2. Implicit time spans
price validFrom
90.99 2016-01-01
80.00 2016-01-16
90.99 2016-01-18
This data represents the same as in the explicit time spans example. The first price is valid before January 16, then the second price is valid until January 17, and afterwards the next price (which equals the first price again) is valid. Here you don't need an EndDate, because it's implicit. Of course the first price is only valid until January 15, because from January 16 there is another price valid (record #2).
So: Either remove the EndDate column completely or let it untouched. Don't simply update it, as you have intended. If you updated your records to next date minus one, you would actually hold data redundantly, which might lead to problems later.

Related

how to aggregate one record multiple times based on condition

I have a bunch of records in the table below.
product_id produced_date expired_date
123 2010-02-01 2012-05-31
234 2013-03-01 2014-08-04
345 2012-05-01 2018-02-25
... ... ...
I want the output to display how many unexpired products currently we have at the monthly level. (Say, if a product expires on August 04, we still count it in August stock)
Month n_products
2010-02-01 10
2010-03-01 12
...
2022-07-01 25
2022-08-01 15
How should I do this in Presto or Hive? Thank you!
You can use below SQL.
Here we are using case when to check if a product is expired or not(produced_date >= expired_date ), if its expired, we are summing it to get count of product that has been expired. And then group that data over expiry month.
select
TRUNC(expired_date, 'MM') expired_month,
SUM( case when produced_date >= expired_date then 1 else 0 end) n_products
from mytable
group by 1
We can use unnest and sequence functions to create a derived table; Joining our table with this derived table, should give us the desired result.
Select m.month,count(product_id) as n_products
(Select
(select x
from unnest(sequence(Min(month(produced_date)), Max(month(expired_date)), Interval '1' month)) t(x)
) as month
from table) m
left join table t on m.month >= t.produced_date and m.month <= t.expired_date
group by 1
order by 1

Query to find rows with nearest date in future

I'm trying to display a result set based on a min date value and today's date but can't seem to make it work. It's essentially a date sensitive price list.
Example Data
ID Title Value ExpireDate
1 Fred 10 2019-03-01
2 Barney 15 2019-03-01
3 Fred2 20 2019-06-01
4 Barney2 25 2019-06-01
5 Fred3 30 2019-07-01
6 Barney3 55 2019-07-01
Required Results:
Display records based on minimum date > GetDate()
3 Fred2 20 2019-06-01
4 Barney2 25 2019-06-01
Any assistance would be great - thank you.
Use where clause to filter all future rows and row_number() to find the first row per group:
SELECT *
FROM (
SELECT *, ROW_NUMBER() OVER (PARTITION BY Title ORDER BY ExpireDate) AS rn
FROM t
WHERE ExpireDate >= CAST(CURRENT_TIMESTAMP AS DATE)
) AS x
WHERE rn = 1
Based on your revised question, you can simply do this:
SELECT TOP 1 WITH TIES *
FROM t
WHERE ExpireDate >= CAST(CURRENT_TIMESTAMP AS DATE)
ORDER BY ExpireDate

Teradata SQL: Determine how many accounts had status change in given month

Ok, so I have a table that looks something like this:
Acct_id Eff_dt Expr_dt Prod_cd Open_dt
-------------------------------------------------------
111 2012-05-01 2013-06-01 A 2012-05-01
111 2013-06-02 2014-03-08 A 2012-05-01
111 2014-03-09 9999-12-31 B 2012-05-01
222 2015-07-15 2015-11-11 A 2015-07-15
222 2015-11-12 2016-08-08 B 2015-07-15
222 2016-08-09 9999-12-31 A 2015-07-15
333 2016-01-01 2016-04-15 B 2016-01-01
333 2016-04-16 2016-08-08 B 2016-01-01
333 2016-08-09 9999-12-31 A 2016-01-01
444 2017-02-03 2017-05-15 A 2017-02-03
444 2017-05-16 2017-12-02 A 2017-02-03
444 2017-12-03 9999-12-31 B 2017-02-03
555 2017-12-12 9999-12-31 B 2017-12-12
There are many more columns that I'm not including as they're otherwise not relevant.
What I'm trying to determine is how many accounts had a change in Prod_cd in a given month, but then only in one direction (so from A > B in this example). Sometimes however an account was first opened as B, and then later changed to A. Or it was opened as A, changed to B, and moved back to A. I only want to know the current set of accounts where in a given month the Prod_cd changed from A to B.
Eff_dt is the date when a change was made to an account (could be any change, such as address change, name change, or what I'm looking for, product code change).
Expr_dt is the expiration date of that row, essentially the last day before a new change was made. When the date of that row is 9999-12-31, that's the most current row.
Open_dt is the date the account was created.
I created a query at first that was something like this:
select
count(distinct acct_id)
from table
where prod_cd = 'B'
and expr_dt = '9999-12-31'
and eff_dt between '2017-12-01' and '2017-12-31'
and open_dt < '2017-12-01'
But it's giving me results that don't look right. I want to specifically track the # of conversions that happened, but the count of accounts I'm getting seems way too high.
There is probably a way to create a more reliable query using window functions, but given that the Prod_cd changes can happen in multiple directions, I'm not sure how to write that query. Any help would be appreciated!
If you are specifically looking for the switch A --> B, then the simplest method is to use lag(). But, Teradata requires a slightly different formulation:
select count(distinct acct_id)
from (select t.*,
max(prod_cd) over (partition by acct_id order by effdt rows between 1 preceding and 1 preceding) as prev_prod_cd
from t
) t
where prod_cd = 'B' and prev_prod_cd = 'A' and
expr_dt = '9999-12-31' and
eff_dt between '2017-12-01' and '2017-12-31' and
open_dt < '2017-12-01';
I am guessing that the date conditions go in the outer query -- meaning that they lag() does not use them.
Similar to Gordon's answer, but using a supported window function (instead of LAG) and using Teradata's QUALIFY clause to do the lag-gy lookup:
SELECT DISTINCT acct_id
FROM mytable
QUALIFY
MAX(prod_cd) OVER (PARTITION BY acct_id ORDER BY eff_dt ASC ROWS BETWEEN 1 PRECEDING AND 1 PRECEDING) = 'A'
AND prod_cd = 'B'
AND expr_dt = '9999-12-31'
AND eff_dt between DATE '2013-01-01' and DATE '2017-12-31'
AND open_dt < DATE '2017-12-01'

Available date range by year

I have a table that holds date
ID Dates
1 2014-01-20
2 2014-01-21
...
100 2014-05-20
101 2014-06-01 --Missing a few dates
102 2014-06-02
...
201 2014-10-31
202 2014-12-05 --Missing a few dates
...
349 2015-04-29
350 2015-04-30
I want to find the available date range by year between a from and to date, for example
#StartDate: 2014/04/06
#EndDate: 2015/04/05
The expected result is
Year StartRange EndRange
2014 2014-04-06 2014-05-20
2014 2014-06-01 2014-10-31
2014 2014-12-05 2014-12-31
2015 2015-01-01 2015-04-05
I am trying to find the available date ranges from the Dates column. Lets take the first row in the expected result 2014-04-06 to 2014-05-20 which says I have continuous dates from the 6th April to 20th May then there is a break (I do not have dates from 2014-05-21 to 2014-05-30)
The dates 2014-04-06 (in the first row) and 2015-04-05 (in the last row) are included in the expected result as it is the start and end date (parameter to the query) and I have those dates in the [Dates] column of my table
Thanks
This is an "Islands and Gaps" type of problem. Here is one way to do it:
;
WITH
cteDays As (SELECT *, DATEDIFF(dd,0,Dates) As DayNo From YourTable)
, cteDifs As
(
SELECT *,
DayNo-(ROW_NUMBER() OVER(ORDER BY Dates, ID)) As Dif
FROM cteDays
)
SELECT
Year(Dates) As [Year],
MIN(Dates) As StartRange,
MAX(Dates) As EndRange
FROM cteDifs
GROUP BY Year(Dates), Dif
ORDER BY [Year], StartRange

SQL Server: Finding date given EndDate and # Days, excluding days from specific date ranges

I have a TableA in a database similar to the following:
Id | Status | Start | End
1 | Illness | 2013-04-02 | 2013-04-23
2 | Illness | 2013-05-05 | 2014-01-01
3 | Vacation | 2014-02-01 | 2014-03-01
4 | Illness | 2014-03-08 | 2014-03-09
5 | Vacation | 2014-05-05 | NULL
Imagine it's keeping track of a specific user's "Away" days. Given the following Inputs:
SomeEndDate (Date),
NumDays (Integer)
I want to find the SomeStartDate (Date) that is Numdays non-illness days from EndDate. In other words, say I am given a SomeEndDate value '2014-03-10' and a NumDays value of 60; the matching SomeStartDate would be:
2014-03-10 to 2014-03-09 = 1
2014-03-08 to 2014-01-01 = 57
2013-05-05 to 2013-05-03 = 2
So, at 60 non-illness days, we get a SomeStartDate of '2013-05-03'. IS there any easy way to accomplish this in SQL? I imagine I could loop each day, check whether or not it falls into one of the illness ranges, and increment a counter if not (exiting the loop after counter = #numdays)... but that seems wildly inefficient. Appreciate any help.
Make a Calendar table that has a list of all the dates you will ever care about.
SELECT MIN([date])
FROM (
SELECT TOP(#NumDays) [date]
FROM Calendar c
WHERE c.Date < #SomeEndDate
AND NOT EXISTS (
SELECT 1
FROM TableA a
WHERE c.Date BETWEEN a.Start AND a.END
AND Status = 'Illness'
)
ORDER BY c.Date
) t
The Calendar table method lets you also easily exclude holidays, weekends, etc.
SQL Server 2012:
Try this solution:
DECLARE #NumDays INT = 70, #SomeEndDate DATE = '2014-03-10';
SELECT
[RangeStop],
CASE
WHEN RunningTotal_NumOfDays <= #NumDays THEN [RangeStart]
WHEN RunningTotal_NumOfDays - Current_NumOfDays <= #NumDays THEN DATEADD(DAY, -(#NumDays - (RunningTotal_NumOfDays - Current_NumOfDays))+1, [RangeStop])
END AS [RangeStart]
FROM (
SELECT
y.*,
DATEDIFF(DAY, y.RangeStart, y.RangeStop) AS Current_NumOfDays,
SUM( DATEDIFF(DAY, y.RangeStart, y.RangeStop) ) OVER(ORDER BY y.RangeStart DESC) AS RunningTotal_NumOfDays
FROM (
SELECT LEAD(x.[End]) OVER(ORDER BY x.[End] DESC) AS RangeStart, -- It's previous date because of "ORDER BY x.[End] DESC"
x.[Start] AS RangeStop
FROM (
SELECT #SomeEndDate AS [Start], '9999-12-31' AS [End]
UNION ALL
SELECT x.[Start], x.[End]
FROM #MyTable AS x
WHERE x.[Status] = 'Illness'
AND x.[End] <= #SomeEndDate
) x
) y
) z
WHERE RunningTotal_NumOfDays - Current_NumOfDays <= #NumDays;
/*
Output:
RangeStop RangeStart
---------- ----------
2014-03-10 2014-03-09
2014-03-08 2014-01-01
2013-05-05 2013-05-03
*/
Note #1: LEAD(End) will return the previous End date (previous because of ORDER BY End DESC)
Note #2: DATEDIFF(DAY, RangeStart, RangeStop) computes the num. of days between current start (alias x.RangeStop) and "previous" end (alias x.RangeStar) => Current_NumOfDays
Note #3: SUM( Current_NumOfDays ) computes a running total thus: 1 + 66 + (3)
Note #4: I've used #NumOfDays = 70 (not 60)