Teradata SQL: Determine how many accounts had a status change in a given month

Ok, so I have a table that looks something like this:
Acct_id  Eff_dt      Expr_dt     Prod_cd  Open_dt
--------------------------------------------------
111      2012-05-01  2013-06-01  A        2012-05-01
111      2013-06-02  2014-03-08  A        2012-05-01
111      2014-03-09  9999-12-31  B        2012-05-01
222      2015-07-15  2015-11-11  A        2015-07-15
222      2015-11-12  2016-08-08  B        2015-07-15
222      2016-08-09  9999-12-31  A        2015-07-15
333      2016-01-01  2016-04-15  B        2016-01-01
333      2016-04-16  2016-08-08  B        2016-01-01
333      2016-08-09  9999-12-31  A        2016-01-01
444      2017-02-03  2017-05-15  A        2017-02-03
444      2017-05-16  2017-12-02  A        2017-02-03
444      2017-12-03  9999-12-31  B        2017-02-03
555      2017-12-12  9999-12-31  B        2017-12-12
There are many more columns that I'm not including as they're otherwise not relevant.
What I'm trying to determine is how many accounts had a change in Prod_cd in a given month, but then only in one direction (so from A > B in this example). Sometimes however an account was first opened as B, and then later changed to A. Or it was opened as A, changed to B, and moved back to A. I only want to know the current set of accounts where in a given month the Prod_cd changed from A to B.
Eff_dt is the date when a change was made to an account (could be any change, such as address change, name change, or what I'm looking for, product code change).
Expr_dt is the expiration date of that row, essentially the last day before a new change was made. When the date of that row is 9999-12-31, that's the most current row.
Open_dt is the date the account was created.
I created a query at first that was something like this:
select
count(distinct acct_id)
from table
where prod_cd = 'B'
and expr_dt = '9999-12-31'
and eff_dt between '2017-12-01' and '2017-12-31'
and open_dt < '2017-12-01'
But it's giving me results that don't look right. I want to specifically track the # of conversions that happened, but the count of accounts I'm getting seems way too high.
There is probably a way to create a more reliable query using window functions, but given that the Prod_cd changes can happen in multiple directions, I'm not sure how to write that query. Any help would be appreciated!

If you are specifically looking for the switch A --> B, then the simplest method is to use lag(). But, Teradata requires a slightly different formulation:
select count(distinct acct_id)
from (select t.*,
             max(prod_cd) over (partition by acct_id
                                order by eff_dt
                                rows between 1 preceding and 1 preceding) as prev_prod_cd
      from t
     ) t
where prod_cd = 'B' and prev_prod_cd = 'A' and
      expr_dt = '9999-12-31' and
      eff_dt between '2017-12-01' and '2017-12-31' and
      open_dt < '2017-12-01';
I am guessing that the date conditions go in the outer query -- meaning that the lag() logic does not use them.
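If you happen to be on a newer Teradata release, LAG/LEAD are supported natively (I believe from 16.10 onward), so the same idea can be written with lag() directly. A sketch only, keeping the placeholder table name t and the question's column names:
select count(distinct acct_id)
from (select t.*,
             lag(prod_cd) over (partition by acct_id order by eff_dt) as prev_prod_cd
      from t
     ) dt
where prod_cd = 'B' and prev_prod_cd = 'A' and
      expr_dt = date '9999-12-31' and
      eff_dt between date '2017-12-01' and date '2017-12-31' and
      open_dt < date '2017-12-01';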

Similar to Gordon's answer, but using a supported window function (instead of LAG) and using Teradata's QUALIFY clause to do the lag-gy lookup:
SELECT DISTINCT acct_id
FROM mytable
QUALIFY
MAX(prod_cd) OVER (PARTITION BY acct_id ORDER BY eff_dt ASC ROWS BETWEEN 1 PRECEDING AND 1 PRECEDING) = 'A'
AND prod_cd = 'B'
AND expr_dt = '9999-12-31'
AND eff_dt between DATE '2017-12-01' and DATE '2017-12-31'
AND open_dt < DATE '2017-12-01'

Related

Cannot understand the logic of this query

This query is trying to get the s1ppmp (the price of a product) for each s1ilie (size) and each s1iref (reference), together with s1ydat (the latest date for that price), because one product could have more than one price on different dates (for example a Black Friday price versus the normal price on other days).
The anmoisjour column comes from the calendar table (calendrier), but there is no direct join condition between the calendrier table and the main table msk110, so... I don't understand the logic of this query...
SELECT s1isoc,
s1ilie,
s1iref,
s1ydat,
anmoisjour,
s1ppmp
FROM msk110
INNER JOIN (SELECT s1isoc AS isoc,
s1ilie AS ilie,
s1iref AS iref,
MAX(s1ydat) AS ydat,
anmoisjour
FROM calendrier,
msk110
WHERE s1ydat <= anmoisjour
AND anmoisjour BETWEEN 20100101 AND 20302131
GROUP BY s1isoc,
s1ilie,
s1iref,
anmoisjour) a ON s1isoc = isoc
AND s1ilie = ilie
AND s1iref = iref
AND s1ydat = ydat
WHERE s1isoc = 1
AND anmoisjour BETWEEN 20100101 AND 20302131
ORDER BY anmoisjour,
s1ydat;
s1isoc, s1ilie, s1iref, s1ydat and s1ppmp come from msk110,
and
anmoisjour belongs to the calendar table (calendrier), which is a date table.
I believe the confusion is the way that the calendar table is joined.
If anmoisjour is the day column of the calendar table and that table holds one row per day, the WHERE filter anmoisjour BETWEEN 20100101 AND 20302131 restricts calendrier to one row for each day across roughly 20 years (2010 to 2030).
The way the product prices table msk110 is linked to the calendar table calendrier is not directly by date, but with a maximum date (msk110.s1ydat <= calendrier.anmoisjour). This means that, for example, a msk110.s1ydat of 2015-01-01 will join against every calendar row between 2015-01-01 and 2030-12-31.
The GROUP BY is by the calendar table's date (calendrier.anmoisjour). This means that if a particular product, size and reference only has price rows on a few dates, say 2015-01-01, 2017-01-01 and 2020-01-01, then the result of the GROUP BY would be the following (ordered by calendar date, with NULLs shown just to illustrate):
MAX(s1ydat)  anmoisjour
null         2010-01-01
null         ...
null         2014-12-31
2015-01-01   2015-01-01
2015-01-01   2015-01-02
2015-01-01   ...
2015-01-01   2016-01-01
2015-01-01   ...
2017-01-01   2017-01-01
2017-01-01   2017-01-02
2017-01-01   ...
2017-01-01   2019-12-31
2020-01-01   2020-01-01
2020-01-01   2025-01-01
2020-01-01   ...
What your query is showing is the contents of the product table with, for each day over those 20 years, the last date on which that particular product had that particular price, additionally filtered to s1isoc = 1 (whose meaning I don't know).
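To show the intent another way, here is a rough sketch of an equivalent "price in effect on each calendar day" lookup, written with a correlated subquery instead of the join back to the grouped derived table. It reuses the question's tables, columns and numeric date bounds, and is untested against the real schema:
SELECT c.anmoisjour,
       m.s1isoc, m.s1ilie, m.s1iref, m.s1ydat, m.s1ppmp
FROM calendrier c,
     msk110 m
WHERE m.s1isoc = 1
  AND c.anmoisjour BETWEEN 20100101 AND 20302131      -- same bounds as the original query
  AND m.s1ydat = (SELECT MAX(m2.s1ydat)               -- latest price date on or before this calendar day
                  FROM msk110 m2
                  WHERE m2.s1isoc = m.s1isoc
                    AND m2.s1ilie = m.s1ilie
                    AND m2.s1iref = m.s1iref
                    AND m2.s1ydat <= c.anmoisjour)
ORDER BY c.anmoisjour, m.s1ydat;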

Expanding/changing my query to find more entries using (potentially) IFELSE

My question will use this dataset as an example. I have a query set up (I have changed the variables to more generic names for the sake of posting this on the internet, so the query may not make perfect sense) that picks the most recent date for a given account. So the query returns the values with a reason_type of 1 that have the most recent date. This query filters on effective_date is not null.
account  date        effective_date  value  reason_type
123456   4/20/2017   5/1/2017        5      1
123456   1/20/2017   2/1/2017        10     1
987654   2/5/2018    3/1/2018        15     1
987654   12/31/2017  2/1/2018        20     1
456789   4/27/2018   5/1/2018        50     1
456789   1/24/2018   2/1/2018        60     1
456123   4/25/2017   null            15     2
789123   5/1/2017    null            16     2
666888   2/1/2018    null            31     2
333222   1/1/2018    null            20     2
What I am looking to do now is to basically use that logic so it only applies to reason_type 1 if there is an entry for it, and otherwise have it default to reason_type 2.
I think I should be using an IFELSE, but I'm admittedly not knowledgeable about how I would go about that.
Here is the code that I currently have to return reason_type 1's most recent entry.
I hope my question is clear.
SELECT account, date, effective_date, value, reason_type
from
(
    SELECT account, date, effective_date, value, reason_type,
           ROW_NUMBER() over (partition by account order by date desc) rn
    from mytable
    WHERE value is not null
    AND effective_date is not null
)
WHERE rn = 1
I think you might want something like this (do you really have a column named date by the way? That seems like a bad idea):
SELECT account, date, effective_date, value, reason_type
FROM (
    SELECT account, date, effective_date, value, reason_type
         , ROW_NUMBER() OVER ( PARTITION BY account ORDER BY date DESC ) AS rn
    FROM mytable
    WHERE value IS NOT NULL
)
WHERE rn = 1
-- effective_date IS NULL or is on or before today's date
AND ( effective_date IS NULL OR effective_date < TRUNC(SYSDATE+1) );
Hope this helps.
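Regarding the reason_type fallback asked about in the question: if the intent is to take the most recent reason_type 1 row per account and only fall back to the most recent reason_type 2 row when no type 1 row exists, one option is to fold that preference into the ROW_NUMBER ordering instead of using an IF/ELSE. A sketch only, reusing the same table and column names (the values 1 and 2 are taken from the sample data):
SELECT account, date, effective_date, value, reason_type
FROM (
    SELECT account, date, effective_date, value, reason_type,
           -- reason_type 1 sorts before 2, so a type 1 row wins whenever one exists;
           -- within a type, the most recent date wins
           ROW_NUMBER() OVER (PARTITION BY account
                              ORDER BY reason_type ASC, date DESC) AS rn
    FROM mytable
    WHERE value IS NOT NULL
)
WHERE rn = 1;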

SQL query that changes all validTo dates to the next validFrom date minus one day

I have to modify a big pricelist table so that there is only one valid price for every article.
Sometimes the sales employees insert new prices and forget to change the old infinite validTo dates.
So I have to write an SQL query that changes all validTo dates to the next validFrom date minus one day whenever the validTo date has infinite validity (9999-12-31).
But I have no idea how I can achieve this with SQL alone (Oracle 12).
anr  price  validFrom   validTo
1    447.1  2015-06-01  9999-12-31  <
1    447.2  2015-06-16  2015-06-16
1    447.3  2015-06-17  2015-06-17
1    447.4  2015-06-22  2015-06-22
1    447.5  2015-07-06  9999-12-31  <
1    395.0  2015-07-20  2015-07-20
1    447.6  2015-08-03  9999-12-31  <
1    447.7  2015-08-17  9999-12-31  <
1    447.8  2015-08-24  9999-12-31  <
1    395.0  2015-09-07  2015-09-07
1    450.9  2015-11-15  9999-12-31  < no change because it is the last entry
After updating the table, the result should look like:
anr  price  validFrom   validTo
1    447.1  2015-06-01  2015-06-15  <
1    447.2  2015-06-16  2015-06-16
1    447.3  2015-06-17  2015-06-17
1    447.4  2015-06-22  2015-06-22
1    447.5  2015-07-06  2015-07-19  <
1    395.0  2015-07-20  2015-07-20
1    447.6  2015-08-03  2015-08-16  <
1    447.7  2015-08-17  2015-08-23  <
1    447.8  2015-08-24  2015-09-06  <
1    395.0  2015-09-07  2015-09-07
1    450.9  2015-11-15  9999-12-31  <
In order to update an end date you can simply select the minimum of all later start dates.
update mytable upd
set validTo = coalesce(
        (
          select min(validFrom) - 1
          from mytable later
          where later.validFrom > upd.validFrom
            and later.anr = upd.anr            -- same article
        ),
        date '9999-12-31')                     -- coalesce for the case there is no later record
where validTo = date '9999-12-31';
I have taken anr to be the product id. If it isn't then change the statement accordingly.
Oracle provides an analytic function LEAD that references the current-plus-n-th record given a sort criterion. This function can be used to select the proper date value in an update statement as follows (let test_prices be the table name and ppk its primary key):
update test_prices p
set p.validTo = (
      select ps.vtn
      from (
             select lead(p1.validFrom, 1)
                      over (partition by p1.anr order by p1.validFrom) - 1 as vtn,
                    p1.ppk
             from test_prices p1
           ) ps
      where ps.ppk = p.ppk
    )
where to_char(p.validTo, 'YYYY') = '9999'
  and p.validFrom != ( select max(validFrom) from test_prices t where t.anr = p.anr )
;
UPDATE VALID_DATES v
SET validTo = (
      SELECT validTo
      FROM (
             SELECT anr,
                    validFrom,
                    COALESCE(
                      LEAD( validFrom - 1, 1 ) OVER ( PARTITION BY anr ORDER BY validFrom ),
                      validTo
                    ) AS validTo
             FROM valid_dates
           ) u
      WHERE v.anr = u.anr
        AND v.validFrom = u.validFrom
    )
WHERE validTo = DATE '9999-12-31';
There are two possibilities:
1. Explicit time spans
price  validFrom   validTo
90.99  2016-01-01  9999-12-31
80.00  2016-01-16  2016-01-17
The first price would be valid both before January 16 and after January 17, whereas the second price was only valid on two days in January.
It would then be a very bad idea to change the first validTo.
2. Implicit time spans
price  validFrom
90.99  2016-01-01
80.00  2016-01-16
90.99  2016-01-18
This data represents the same as in the explicit time spans example. The first price is valid before January 16, then the second price is valid until January 17, and afterwards the next price (which equals the first price again) is valid. Here you don't need an EndDate, because it's implicit. Of course the first price is only valid until January 15, because from January 16 there is another price valid (record #2).
So: either remove the EndDate column completely or leave it untouched. Don't simply update it as you intended. If you updated your records to the next date minus one, you would be storing data redundantly, which might lead to problems later.
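If you keep the implicit model and still want an end date for display, a view can derive it on the fly rather than storing it. A rough sketch, reusing the test_prices table name from the answer above (the view name is made up):
CREATE OR REPLACE VIEW test_prices_v AS
SELECT anr,
       price,
       validFrom,
       -- day before the next price starts for the same article,
       -- or 9999-12-31 for the latest price row
       COALESCE(
         LEAD(validFrom) OVER (PARTITION BY anr ORDER BY validFrom) - 1,
         DATE '9999-12-31'
       ) AS validTo
FROM test_prices;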

Select min/max from group defined by one column as subgroup of another - SQL, HPVertica

I'm trying to find the min and max date within a subgroup of another group. Here's example 'data'
ID   Type  Date
1    A     7/1/2015
1    B     1/1/2015
1    A     8/5/2014
22   B     3/1/2015
22   B     9/1/2014
333  A     8/1/2015
333  B     4/1/2015
333  B     3/29/2014
333  B     2/28/2013
333  C     1/1/2013
What I'd like to identify is - within an ID, what is the min/max Date for each block of similar Type? So for ID # 333 I want the below info:
A: min & max = 8/1/2015
B: min = 2/28/2013, max = 4/1/2015
C: min & max = 1/1/2013
I'm having trouble figuring out how to identify only uninterrupted groupings of Type within a grouping of ID. For ID #1, I need to keep the two 'A' Types with separate min/max dates because they were split by a Type 'B', so I can't just pull the min date of all Type A's for ID #1, it has to be two separate instances.
What I've tried is something like the below two lines, but neither of these accurately captures the case mentioned above for ID #1 where Type B interrupts Type A.
Max(Date) OVER (Partition By ID, Type)
or this:
Row_Number() OVER (Partition By ID, Type ORDER BY Date DESC)
then selecting row #1 for the max date, and ordering date ASC with row #1 for the min date.
Thank you for any insight you can provide!
If I understand right, you want the min/max dates for each id/type, but the catch is that you want them per uninterrupted cluster of that type within the id, based on the time ordering.
What you can do is use CONDITIONAL_CHANGE_EVENT to tag the rows on change of type, then use that in your GROUP BY on a standard min/max aggregation.
This would be the intermediate step towards getting to what you want:
select ID, Type, Date,
CONDITIONAL_CHANGE_EVENT(Type) OVER( PARTITION BY ID ORDER BY Date desc) cce
from mytable
group by ID, Type, Date
order by ID, Date desc, Type
ID   Type  Date                 cce
1    A     2015-07-01 00:00:00  0
1    B     2015-01-01 00:00:00  1
1    A     2014-08-05 00:00:00  2
22   B     2015-03-01 00:00:00  0
22   B     2014-09-01 00:00:00  0
333  A     2015-08-01 00:00:00  0
333  B     2015-04-01 00:00:00  1
333  B     2014-03-29 00:00:00  1
333  B     2013-02-28 00:00:00  1
333  C     2013-01-01 00:00:00  2
Once you have them grouped using CCE, you can run an aggregate on this, grouping on cce, to get the min/max you are looking for. You can play with the ORDER BY at the bottom; this ordering seems to make the most sense to me.
select id, type, min(date), max(date)
from (
select ID, Type, Date,
CONDITIONAL_CHANGE_EVENT(Type) OVER( PARTITION BY ID ORDER BY Date desc) cce
from mytable
group by ID, Type, Date
) x
group by id, type, cce
order by id, 3 desc, 4 desc;
id   type  min                  max
1    A     2015-07-01 00:00:00  2015-07-01 00:00:00
1    B     2015-01-01 00:00:00  2015-01-01 00:00:00
1    A     2014-08-05 00:00:00  2014-08-05 00:00:00
22   B     2014-09-01 00:00:00  2015-03-01 00:00:00
333  A     2015-08-01 00:00:00  2015-08-01 00:00:00
333  B     2013-02-28 00:00:00  2015-04-01 00:00:00
333  C     2013-01-01 00:00:00  2013-01-01 00:00:00
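For what it's worth, the same "uninterrupted run of the same Type" grouping can also be built without CONDITIONAL_CHANGE_EVENT, using the classic difference-of-row-numbers trick. A sketch against the same mytable (plain standard SQL, so it should produce the same groups):
select id, type, min(date) as min_date, max(date) as max_date
from (
    select ID, Type, Date,
           -- rows in the same uninterrupted run of a Type share the same grp value
           row_number() over (partition by ID order by Date desc)
         - row_number() over (partition by ID, Type order by Date desc) as grp
    from mytable
) x
group by id, type, grp
order by id, min_date desc;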

PostgreSQL query for multiple update

I have a table in which I have 4 columns: emp_no, desig_name, from_date and to_date:
emp_no  desig_name      from_date            to_date
1001    engineer        2004-08-01 00:00:00
1001    sr.engineer     2010-08-01 00:00:00
1001    chief.engineer  2013-08-01 00:00:00
So my question is: how do I update the first row's to_date column to one day before the from_date of the second row, and do the same for the second row as well?
After the update it should look like:
emp_no  desig_name      from_date            to_date
1001    engineer        2004-08-01 00:00:00  2010-07-31 00:00:00
1001    sr.engineer     2010-08-01 00:00:00  2013-07-31 00:00:00
1001    chief.engineer  2013-08-01 00:00:00
You can calculate the "next" date using the lead() function.
This calculated value can then be used to update the table:
with calc as (
    select promotion_id,
           emp_no,
           from_date,
           lead(from_date) over (partition by emp_no order by from_date) as next_date
    from emp
)
update emp
set to_date = c.next_date - interval '1' day
from calc c
where c.promotion_id = emp.promotion_id;
As you can see, getting that value is quite easy, and storing derived information is very often not a good idea. You might want to consider a view that calculates this information on the fly, so you don't need to update your table each time you insert a new row.
SQLFiddle example: http://sqlfiddle.com/#!15/31665/1
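A rough sketch of such a view, reusing the emp table from the statement above (the view name is made up):
create or replace view emp_with_to_date as
select emp_no,
       desig_name,
       from_date,
       -- day before the next designation starts; null for the current designation
       lead(from_date) over (partition by emp_no order by from_date) - interval '1' day as to_date
from emp;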