SQL Select with grouping and replacing a column - sql

I need a SELECT query that returns, for records with the same key (CARD_NBR in this case), an END_DATE equal to the next record's EFFECTIVE_DATE minus 1 day.
I have tried GROUP BY, but I am not able to get the desired output. Could someone please guide me? The record with the most recent EFFECTIVE_DATE should keep END_DATE as 9999-12-31.
Table:
CARD_NBR  SERIEL_NO  EFFECTIVE_DATE  END_DATE
12345     1          2021-01-01      9999-12-31
12345     2          2021-01-25      9999-12-31
12345     3          2021-02-15      9999-12-31
67899     1          2021-03-01      9999-12-31
67899     2          2021-04-02      9999-12-31
67899     3          2021-05-24      9999-12-31
Output:
CARD_NBR  SERIEL_NO  EFFECTIVE_DATE  END_DATE
12345     1          2021-01-01      2021-01-24
12345     2          2021-01-25      2021-02-14
12345     3          2021-02-15      9999-12-31
67899     1          2021-03-01      2021-04-01
67899     2          2021-04-02      2021-05-24
67899     3          2021-05-24      9999-12-31

You can use lead():
select t.*,
       lead(effective_date - interval '1 day', 1, end_date) over (partition by card_nbr order by effective_date) as imputed_end_date
from t;
The third argument is the default used when there is no later row for the card, so the most recent record keeps its own end_date (9999-12-31). Date manipulations are highly database-dependent, so this uses Standard SQL syntax. You can incorporate this into an update, but the best approach also depends on the database.
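As a runnable sketch of the lead() approach on the question's sample data, the snippet below uses Python's built-in sqlite3 module as a stand-in engine. That choice is an assumption, since the question does not name a database, and it needs SQLite 3.25 or later for window functions. SQLite has no interval arithmetic, so DATE(x, '-1 day') replaces the interval subtraction, and the table name card_history is made up for the example. The third argument of LEAD is the fallback for the last row of each card, here the row's own END_DATE.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE card_history ("  # hypothetical table name
    " card_nbr INTEGER, seriel_no INTEGER,"
    " effective_date TEXT, end_date TEXT)"
)
conn.executemany(
    "INSERT INTO card_history VALUES (?, ?, ?, ?)",
    [
        (12345, 1, "2021-01-01", "9999-12-31"),
        (12345, 2, "2021-01-25", "9999-12-31"),
        (12345, 3, "2021-02-15", "9999-12-31"),
        (67899, 1, "2021-03-01", "9999-12-31"),
        (67899, 2, "2021-04-02", "9999-12-31"),
        (67899, 3, "2021-05-24", "9999-12-31"),
    ],
)

# For each card, look one row ahead (by effective_date) and take that row's
# effective_date minus one day; rows with no successor keep their end_date.
query = """
SELECT card_nbr,
       seriel_no,
       effective_date,
       LEAD(DATE(effective_date, '-1 day'), 1, end_date)
           OVER (PARTITION BY card_nbr ORDER BY effective_date) AS imputed_end_date
FROM card_history
ORDER BY card_nbr, effective_date
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

On this data the second 67899 row gets 2021-05-23, one day before the next EFFECTIVE_DATE of 2021-05-24; the 2021-05-24 shown for that row in the question's expected output looks like a typo.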

SQLite 3.25 and later support window functions, so you can use the code below to get your result. Each row is joined to the next row of the same card, and its end date becomes that next row's start date minus one day.
SELECT A.CARD_NBR,
       A.SRL_NO,
       A.START_DT,
       COALESCE(DATE(B.START_DT, '-1 day'), A.END_DT) AS END_DT
FROM
(
    SELECT A.CARD_NBR,
           A.SRL_NO,
           A.START_DT,
           A.END_DT,
           ROW_NUMBER() OVER(PARTITION BY A.CARD_NBR ORDER BY A.SRL_NO ASC) RNUM1
    FROM T1 A
) A
LEFT JOIN
(
    SELECT B.CARD_NBR,
           B.SRL_NO,
           B.START_DT,
           B.END_DT,
           ROW_NUMBER() OVER(PARTITION BY B.CARD_NBR ORDER BY B.SRL_NO ASC) RNUM1
    FROM T1 B
) B
ON A.CARD_NBR = B.CARD_NBR
AND A.RNUM1 + 1 = B.RNUM1;

Related

How to order rows by the greatest date of each row, for a table with 8 date columns?

This is very different from doing an SQL ORDER BY on 2 date columns (or from the proper way to sort SQL columns, which covers only 1 column). There, we would do something like:
ORDER BY CASE WHEN date_1 > date_2
THEN date_2 ELSE date_1 END
FYI, I'm using YYYY-MM-DD in this example for brevity, but I also need it to work for TIMESTAMP (YYYY-MM-DD HH:MI:SS).
I have this table:
id  name  date_1      date_2      date_3      date_4      date_5      date_6      date_7      date_8
1   John  2008-08-11  2008-08-12  2009-08-11  2009-08-21  2009-09-11  2017-08-11  2017-09-12  2017-09-30
2   Bill  2008-09-12  2008-09-12  2008-10-12  2011-09-12  2008-09-13  2022-05-20  2022-05-21  2022-05-22
3   Andy  2008-10-13  2008-10-13  2008-10-14  2008-10-15  2008-11-01  2008-11-02  2008-11-03  2008-11-04
4   Hank  2008-11-14  2008-11-15  2008-11-16  2008-11-17  2008-12-31  2009-01-01  2009-01-02  2009-01-02
5   Alex  2008-12-15  2018-12-15  2018-12-15  2018-12-16  2018-12-17  2018-12-18  2018-12-25  2008-12-31
... But the permutations of that CASE expression for 8 columns give me a headache just to think about.
This Answer had more of a "general solution", but that was for SELECT, not for ORDER BY:
SELECT MAX(date_col)
FROM(
SELECT MAX(date_col1) AS date_col FROM some_table
UNION
SELECT MAX(date_col2) AS date_col FROM some_table
UNION
SELECT MAX(date_col3) AS date_col FROM some_table
...
)
Is there something more like that, which could be generated by iterating a loop in, say, PHP or Node.js? I need a scalable solution.
I only need to list each row once.
I want to order the rows by whichever of the listed columns holds the most recent date on that row.
Something like:
SELECT * FROM some_table WHERE
(
GREATEST OF date_1
OR date_2
OR date_3
OR date_4
OR date_5
OR date_6
OR date_7
OR date_8
)
You can use the GREATEST function to achieve it.
SELECT GREATEST(date_1,date_2,date_3,date_4,date_5,date_6,date_7,date_8) max_date,t.*
FROM Tab t
ORDER BY GREATEST(date_1,date_2,date_3,date_4,date_5,date_6,date_7,date_8) Desc;
DB Fiddle: Try it here
max_date    id  name  date_1      date_2      date_3      date_4      date_5      date_6      date_7      date_8
2022-05-22  2   Bill  2008-09-12  2008-09-12  2008-10-12  2011-09-12  2008-09-13  2022-05-20  2022-05-21  2022-05-22
2018-12-25  5   Alex  2008-12-15  2018-12-15  2018-12-15  2018-12-16  2018-12-17  2018-12-18  2018-12-25  2008-12-31
2017-09-30  1   John  2008-08-11  2008-08-12  2009-08-11  2009-08-21  2009-09-11  2017-08-11  2017-09-12  2017-09-30
2009-01-02  4   Hank  2008-11-14  2008-11-15  2008-11-16  2008-11-17  2008-12-31  2009-01-01  2009-01-02  2009-01-02
2008-11-04  3   Andy  2008-10-13  2008-10-13  2008-10-14  2008-10-15  2008-11-01  2008-11-02  2008-11-03  2008-11-04
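Since the question asks for something that can be generated in a loop, here is a minimal sketch that builds the GREATEST expression from a list of column names. It is written in Python rather than PHP or Node.js, and it runs against SQLite through the sqlite3 module purely for demonstration. SQLite has no GREATEST, so its multi-argument scalar max() stands in for it here. The helper name greatest_expr and the table name some_table are illustrative, not part of any library.

```python
import sqlite3

DATE_COLUMNS = [f"date_{i}" for i in range(1, 9)]  # date_1 .. date_8


def greatest_expr(columns, func="GREATEST"):
    """Build e.g. GREATEST(date_1, date_2, ...).

    Column names are interpolated into SQL, so they must come from a
    trusted, hard-coded list, never from user input.
    """
    return f"{func}({', '.join(columns)})"


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE some_table (id INTEGER, name TEXT, "
    + ", ".join(f"{col} TEXT" for col in DATE_COLUMNS)
    + ")"
)
conn.executemany(
    "INSERT INTO some_table VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?, ?)",
    [
        (1, "John", "2008-08-11", "2008-08-12", "2009-08-11", "2009-08-21",
         "2009-09-11", "2017-08-11", "2017-09-12", "2017-09-30"),
        (2, "Bill", "2008-09-12", "2008-09-12", "2008-10-12", "2011-09-12",
         "2008-09-13", "2022-05-20", "2022-05-21", "2022-05-22"),
        (3, "Andy", "2008-10-13", "2008-10-13", "2008-10-14", "2008-10-15",
         "2008-11-01", "2008-11-02", "2008-11-03", "2008-11-04"),
        (4, "Hank", "2008-11-14", "2008-11-15", "2008-11-16", "2008-11-17",
         "2008-12-31", "2009-01-01", "2009-01-02", "2009-01-02"),
        (5, "Alex", "2008-12-15", "2018-12-15", "2018-12-15", "2018-12-16",
         "2018-12-17", "2018-12-18", "2018-12-25", "2008-12-31"),
    ],
)

# In SQLite, scalar max(a, b, ...) plays the role of GREATEST(a, b, ...).
expr = greatest_expr(DATE_COLUMNS, func="max")
query = f"SELECT id, name, {expr} AS max_date FROM some_table ORDER BY {expr} DESC"
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

Adding a ninth date column then only means extending DATE_COLUMNS; on MySQL, PostgreSQL or Oracle the default func="GREATEST" would be used instead of max.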
If any of the columns is NULL, GREATEST can throw off the ORDER: in MySQL, for example, it returns NULL as soon as one argument is NULL.
Based on the accepted Answer to a Question about how GREATEST handles NULL, the query for this table would become:
SELECT COALESCE (
GREATEST(date_1,date_2,date_3,date_4,date_5,date_6,date_7,date_8),
date_1,date_2,date_3,date_4,date_5,date_6,date_7,date_8
) max_date,t.*
FROM TAB t
ORDER BY COALESCE (
GREATEST(date_1,date_2,date_3,date_4,date_5,date_6,date_7,date_8),
date_1,date_2,date_3,date_4,date_5,date_6,date_7,date_8
) DESC;
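One caveat with the COALESCE(GREATEST(...), date_1, ...) form: when GREATEST returns NULL, it falls back to the first non-NULL column, which is not necessarily the latest one. A possible alternative, which is a different technique from the query above, is to wrap each column in COALESCE with a sentinel that sorts below every real date. The sketch below compares the two on a made-up three-column table t in SQLite via Python's sqlite3, where the multi-argument max() stands in for a GREATEST that returns NULL on any NULL argument.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (id INTEGER, date_1 TEXT, date_2 TEXT, date_3 TEXT)")
conn.executemany(
    "INSERT INTO t VALUES (?, ?, ?, ?)",
    [
        (1, None, "2020-01-01", "2021-06-01"),          # one NULL column
        (2, "2019-05-05", "2019-07-07", "2019-06-06"),  # no NULLs
        (3, None, None, None),                          # all NULL
    ],
)

query = """
SELECT id,
       -- Form from the answer: first non-NULL column when max() yields NULL.
       COALESCE(max(date_1, date_2, date_3), date_1, date_2, date_3)
           AS fallback_first,
       -- Alternative: neutralise NULLs per column; '' sorts before any date
       -- string, and NULLIF turns an all-NULL row back into NULL.
       NULLIF(max(COALESCE(date_1, ''),
                  COALESCE(date_2, ''),
                  COALESCE(date_3, '')), '')
           AS greatest_non_null
FROM t
ORDER BY id
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

For row 1 the two columns differ: the fallback picks 2020-01-01 while the per-column version picks 2021-06-01. With real DATE or TIMESTAMP columns the sentinel would presumably be a very early date literal rather than an empty string.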

How to generate series using start and end date and quarters on postgres

I have a table like the one shown below. I want to use the start and end dates (last two columns) to distribute each row's value evenly over the 3 months of its quarter, for every quarter between the start and end dates.
I am familiar with generate_series and intervals in Postgres, but I am having a hard time getting what I want.
My table has an ID column that groups rows together, a quarter column that indicates which quarter the row references for the ID, a value column that is the value for the whole quarter (and every quarter in the date range), and start_date and end_date columns indicating the date range. Here is a sample:
ID quarter value start_date end_date
1 2 152 2019-11-07 2050-12-30
1 1 785 2019-11-07 2050-12-30
2 2 152 2019-03-05 2050-12-30
2 1 785 2019-03-05 2050-12-30
3 4 41 2018-06-12 2050-12-30
3 3 50 2018-06-12 2050-12-30
3 2 88 2018-06-12 2050-12-30
3 1 29 2018-06-12 2050-12-30
4 2 1607 2018-12-17 2050-12-30
4 1 4803 2018-12-17 2050-12-30
Here is my desired output (for ID 1):
ID quarter value start_date end_date
1 2 152/3 2020-04-01 2020-07-01
1 1 785/3 2020-01-01 2020-04-01
1 2 152/3 2021-04-01 2021-07-01
1 1 785/3 2021-01-01 2021-04-01
The start_date in the output is the start of the next quarter after the start_date in the first table. I need the series to be generated from the start_date to the end_date of the first table.
You can do this by using the GENERATE_SERIES function and passing in the start and end date for each unique (by ID) row and setting the interval to 3 months. Then join the result back with your original table on both ID and quarter.
Here's an example (note original_data is what I've called your first table):
WITH
quarters_table AS (
SELECT
t.ID,
(EXTRACT('month' FROM t.quarter_date) - 1)::INT / 3 + 1 AS quarter,
t.quarter_date::DATE AS start_date,
COALESCE(
LEAD(t.quarter_date) OVER (PARTITION BY t.ID ORDER BY t.quarter_date),
DATE_TRUNC('quarter', t.original_end_date) + INTERVAL '3 months'
)::DATE AS end_date
FROM (
SELECT
original_record.ID,
original_record.end_date AS original_end_date,
GENERATE_SERIES(
DATE_TRUNC('quarter', original_record.start_date),
DATE_TRUNC('quarter', original_record.end_date),
INTERVAL '3 months'
) AS quarter_date
FROM (
SELECT DISTINCT ON (original_data.ID)
original_data.ID,
original_data.start_date,
original_data.end_date
FROM
original_data
ORDER BY
original_data.ID
) AS original_record
) AS t
)
SELECT
quarters_table.ID,
quarters_table.quarter,
original_data.value::DOUBLE PRECISION / 3 AS value,
quarters_table.start_date,
quarters_table.end_date
FROM
quarters_table
INNER JOIN
original_data
ON
quarters_table.ID = original_data.ID
AND quarters_table.quarter = original_data.quarter;
Sample output:
id | quarter | value | start_date | end_date
----+---------+------------------+------------+------------
1 | 1 | 261.666666666667 | 2020-01-01 | 2020-04-01
1 | 2 | 50.6666666666667 | 2020-04-01 | 2020-07-01
1 | 1 | 261.666666666667 | 2021-01-01 | 2021-04-01
1 | 2 | 50.6666666666667 | 2021-04-01 | 2021-07-01
For completeness, here's the original_data table I've used in testing:
WITH
original_data AS (
SELECT
1 AS ID,
2 AS quarter,
152 AS value,
'2019-11-07'::DATE AS start_date,
'2050-12-30'::DATE AS end_date
UNION ALL
SELECT
1 AS ID,
1 AS quarter,
785 AS value,
'2019-11-07'::DATE AS start_date,
'2050-12-30'::DATE AS end_date
UNION ALL
SELECT
2 AS ID,
2 AS quarter,
152 AS value,
'2019-03-05'::DATE AS start_date,
'2050-12-30'::DATE AS end_date
-- ...
)
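To try the quarter-series idea without a Postgres instance, here is a rough equivalent that runs in SQLite through Python's sqlite3 module. It is a sketch under stated substitutions, not the query above: a recursive CTE replaces GENERATE_SERIES, and strftime arithmetic replaces DATE_TRUNC('quarter', ...). Only the two ID 1 rows from the sample are loaded, and like the DISTINCT ON step it assumes all rows of an ID share the same start_date and end_date.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE original_data ("
    " id INTEGER, quarter INTEGER, value INTEGER,"
    " start_date TEXT, end_date TEXT)"
)
conn.executemany(
    "INSERT INTO original_data VALUES (?, ?, ?, ?, ?)",
    [
        (1, 2, 152, "2019-11-07", "2050-12-30"),
        (1, 1, 785, "2019-11-07", "2050-12-30"),
    ],
)

query = """
WITH RECURSIVE
bounds AS (
    -- One row per ID: first day of the quarter containing each boundary date.
    SELECT DISTINCT
           id,
           DATE(start_date, 'start of month',
                '-' || ((CAST(strftime('%m', start_date) AS INTEGER) - 1) % 3)
                    || ' months') AS first_q,
           DATE(end_date, 'start of month',
                '-' || ((CAST(strftime('%m', end_date) AS INTEGER) - 1) % 3)
                    || ' months') AS last_q
    FROM original_data
),
quarters(id, q_start, last_q) AS (
    -- Step forward three months at a time until the last quarter is reached.
    SELECT id, first_q, last_q FROM bounds
    UNION ALL
    SELECT id, DATE(q_start, '+3 months'), last_q
    FROM quarters
    WHERE q_start < last_q
)
SELECT q.id,
       (CAST(strftime('%m', q.q_start) AS INTEGER) - 1) / 3 + 1 AS quarter,
       o.value / 3.0 AS value,
       q.q_start AS start_date,
       DATE(q.q_start, '+3 months') AS end_date
FROM quarters AS q
JOIN original_data AS o
  ON o.id = q.id
 AND o.quarter = (CAST(strftime('%m', q.q_start) AS INTEGER) - 1) / 3 + 1
ORDER BY q.id, q.q_start
"""
rows = conn.execute(query).fetchall()
for row in rows[:4]:
    print(row)
```

Because ID 1 only has rows for quarters 1 and 2, the generated Q3 and Q4 entries drop out in the join, which is why the result starts at 2020-01-01 rather than at the 2019 Q4 start.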
This is one way to go about it, showing an example based on the output you've outlined. You can then add more conditions to the CASE/WHEN for additional quarters.
SELECT
ID,
Quarter,
Value/3 AS "Value",
CASE
WHEN Quarter = 1 THEN '2020-01-01'
WHEN Quarter = 2 THEN '2020-04-01'
END AS "Start_Date",
CASE
WHEN Quarter = 1 THEN '2020-04-01'
WHEN Quarter = 2 THEN '2020-07-01'
END AS "End_Date"
FROM
Table

Need a join between different rows of a table

I have a table named projects. It has 3 columns: task_id, start_date and end_date.
It is guaranteed that the difference between the End_Date and the Start_Date is equal to 1 day for each row in the table.
If the End_Dates of tasks are consecutive, then those tasks are part of the same project.
I need the start and end dates of projects, listed by the number of days it took to complete the project in ascending order. If more than one project has the same number of completion days, then order by the start date of the project.
So far I have only extracted one project with a triple join, but I cannot list the other projects. Any idea how to use a more general JOIN here?
input:
Task_ID Start_Date End_Date
----------- ---------- ----------
1 2015-10-01 2015-10-02
2 2015-10-02 2015-10-03
3 2015-10-03 2015-10-04
4 2015-10-13 2015-10-14
5 2015-10-14 2015-10-15
6 2015-10-28 2015-10-29
7 2015-10-30 2015-10-31
output:
start_date end_date
---------- ----------
2015-10-28 2015-10-29
2015-10-30 2015-10-31
2015-10-13 2015-10-15
2015-10-01 2015-10-04
my query:
select p3.start_date,p1.end_date
from projects p1,projects p2, projects p3
where p1.start_Date=p2.end_date and p2.start_date=p3.end_date
my query output:
start_date end_date
---------- ----------
2015-10-01 2015-10-04
This is a type of gaps-and-islands problem. You can solve it by identifying when the islands start -- and that can use lag():
select min(start_date), max(end_date)
from (select t.*,
sum(case when prev_end_date = start_date then 0 else 1 end) over (order by start_date) as grp
from (select t.*,
lag(end_date) over (order by start_date) as prev_end_date
from t
) t
) t
group by grp
order by min(start_date);
The middle subquery works out where each "island" starts: a new island begins whenever the previous row's end_date is not equal to the current row's start_date. The running sum of those start flags then numbers the islands.
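To check the lag() approach against the question's sample data, here is a small sketch run in SQLite through Python's sqlite3 module (an assumed engine; the question does not say which database is used). It reads from the projects table directly and adds the ordering the question asks for, duration first and then start date. The julianday() subtraction is SQLite's way of getting a day count and would be written differently elsewhere.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE projects (task_id INTEGER, start_date TEXT, end_date TEXT)")
conn.executemany(
    "INSERT INTO projects VALUES (?, ?, ?)",
    [
        (1, "2015-10-01", "2015-10-02"),
        (2, "2015-10-02", "2015-10-03"),
        (3, "2015-10-03", "2015-10-04"),
        (4, "2015-10-13", "2015-10-14"),
        (5, "2015-10-14", "2015-10-15"),
        (6, "2015-10-28", "2015-10-29"),
        (7, "2015-10-30", "2015-10-31"),
    ],
)

query = """
SELECT MIN(start_date) AS project_start,
       MAX(end_date)   AS project_end
FROM (
    SELECT p.*,
           -- Running count of island starts = island number.
           SUM(CASE WHEN prev_end_date = start_date THEN 0 ELSE 1 END)
               OVER (ORDER BY start_date) AS grp
    FROM (
        SELECT projects.*,
               LAG(end_date) OVER (ORDER BY start_date) AS prev_end_date
        FROM projects
    ) AS p
) AS g
GROUP BY grp
ORDER BY julianday(MAX(end_date)) - julianday(MIN(start_date)),  -- duration in days
         MIN(start_date)
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

The first task has no previous row, so prev_end_date is NULL, the comparison is not true, and the CASE falls through to 1, which correctly opens the first island.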

Create interval from discrete dates

I have a function which saves the current status of several objects and writes it in a table, which looks like something like this:
ObjectId StatusId Date
1 10 2020-04-04 00:00:00.000
2 10 2020-04-04 00:00:00.000
1 11 2020-04-05 00:00:00.000
2 10 2020-04-05 00:00:00.000
1 10 2020-04-06 00:00:00.000
2 10 2020-04-06 00:00:00.000
I would like to make it an interval grouped by ObjectId and StatusId.
So for the above the preferred output would look like this:
ObjectId StatusId StartDate EndDate
1 10 2020-04-04 00:00:00.000 2020-04-04 00:00:00.000
1 11 2020-04-05 00:00:00.000 2020-04-05 00:00:00.000
1 10 2020-04-06 00:00:00.000 2020-04-06 00:00:00.000
2 10 2020-04-04 00:00:00.000 2020-04-06 00:00:00.000
Note one object can have the same status on multiple occasions but if it had a different status it needs to be in a separate interval. So simple group by and max(Date) doesn't work in my case.
Thanks in advance.
This is a form of gaps-and-islands. For this purpose, the difference of row numbers is probably the simplest method:
select objectid, statusid, min(date) as startdate, max(date) as enddate
from (select t.*,
             row_number() over (partition by objectid order by date) as seqnum,
             row_number() over (partition by objectid, statusid order by date) as seqnum_2
      from t
     ) t
group by objectid, statusid, (seqnum - seqnum_2);
Why this works can be a little cumbersome to explain. However, if you look at the results of the subquery, you will see how the difference is constant for the groups you want to identify.
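Looking at the subquery output is indeed the quickest way to see why the difference works, so here is a sketch that prints both the intermediate row numbers and the final intervals for the sample data. It runs in SQLite via Python's sqlite3 module as an assumed stand-in engine, with the made-up table name status_log and dates shortened to YYYY-MM-DD.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE status_log (objectid INTEGER, statusid INTEGER, date TEXT)")
conn.executemany(
    "INSERT INTO status_log VALUES (?, ?, ?)",
    [
        (1, 10, "2020-04-04"),
        (2, 10, "2020-04-04"),
        (1, 11, "2020-04-05"),
        (2, 10, "2020-04-05"),
        (1, 10, "2020-04-06"),
        (2, 10, "2020-04-06"),
    ],
)

# seqnum counts rows per object; seqnum_2 counts rows per object and status.
# Within an unbroken run of one status both advance together, so their
# difference stays constant; a status change in between shifts it.
inner_sql = """
SELECT objectid, statusid, date,
       ROW_NUMBER() OVER (PARTITION BY objectid ORDER BY date) AS seqnum,
       ROW_NUMBER() OVER (PARTITION BY objectid, statusid ORDER BY date) AS seqnum_2
FROM status_log
"""
inner_rows = conn.execute(inner_sql + " ORDER BY objectid, date").fetchall()
for row in inner_rows:
    print(row, "diff =", row[3] - row[4])

outer_sql = f"""
SELECT objectid, statusid, MIN(date) AS startdate, MAX(date) AS enddate
FROM ({inner_sql}) AS s
GROUP BY objectid, statusid, (seqnum - seqnum_2)
ORDER BY objectid, MIN(date)
"""
intervals = conn.execute(outer_sql).fetchall()
for row in intervals:
    print(row)
```

For object 1 the two status-10 rows get differences 0 and 1, so they land in separate groups, while the three status-10 rows of object 2 all have difference 0 and collapse into one interval.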

Teradata SQL: Determine how many accounts had status change in given month

Ok, so I have a table that looks something like this:
Acct_id Eff_dt Expr_dt Prod_cd Open_dt
-------------------------------------------------------
111 2012-05-01 2013-06-01 A 2012-05-01
111 2013-06-02 2014-03-08 A 2012-05-01
111 2014-03-09 9999-12-31 B 2012-05-01
222 2015-07-15 2015-11-11 A 2015-07-15
222 2015-11-12 2016-08-08 B 2015-07-15
222 2016-08-09 9999-12-31 A 2015-07-15
333 2016-01-01 2016-04-15 B 2016-01-01
333 2016-04-16 2016-08-08 B 2016-01-01
333 2016-08-09 9999-12-31 A 2016-01-01
444 2017-02-03 2017-05-15 A 2017-02-03
444 2017-05-16 2017-12-02 A 2017-02-03
444 2017-12-03 9999-12-31 B 2017-02-03
555 2017-12-12 9999-12-31 B 2017-12-12
There are many more columns that I'm not including as they're otherwise not relevant.
What I'm trying to determine is how many accounts had a change in Prod_cd in a given month, but then only in one direction (so from A > B in this example). Sometimes however an account was first opened as B, and then later changed to A. Or it was opened as A, changed to B, and moved back to A. I only want to know the current set of accounts where in a given month the Prod_cd changed from A to B.
Eff_dt is the date when a change was made to an account (could be any change, such as address change, name change, or what I'm looking for, product code change).
Expr_dt is the expiration date of that row, essentially the last day before a new change was made. When the date of that row is 9999-12-31, that's the most current row.
Open_dt is the date the account was created.
I created a query at first that was something like this:
select
count(distinct acct_id)
from table
where prod_cd = 'B'
and expr_dt = '9999-12-31'
and eff_dt between '2017-12-01' and '2017-12-31'
and open_dt < '2017-12-01'
But it's giving me results that don't look right. I want to specifically track the # of conversions that happened, but the count of accounts I'm getting seems way too high.
There is probably a way to create a more reliable query using window functions, but given that the Prod_cd changes can happen in multiple directions, I'm not sure how to write that query. Any help would be appreciated!
If you are specifically looking for the switch A --> B, then the simplest method is to use lag(). But, Teradata requires a slightly different formulation:
select count(distinct acct_id)
from (select t.*,
max(prod_cd) over (partition by acct_id order by eff_dt rows between 1 preceding and 1 preceding) as prev_prod_cd
from t
) t
where prod_cd = 'B' and prev_prod_cd = 'A' and
expr_dt = '9999-12-31' and
eff_dt between '2017-12-01' and '2017-12-31' and
open_dt < '2017-12-01';
I am guessing that the date conditions go in the outer query, meaning that the lag() does not use them.
Similar to Gordon's answer, but using a supported window function (instead of LAG) and using Teradata's QUALIFY clause to do the lag-gy lookup:
SELECT DISTINCT acct_id
FROM mytable
QUALIFY
MAX(prod_cd) OVER (PARTITION BY acct_id ORDER BY eff_dt ASC ROWS BETWEEN 1 PRECEDING AND 1 PRECEDING) = 'A'
AND prod_cd = 'B'
AND expr_dt = '9999-12-31'
AND eff_dt between DATE '2017-12-01' and DATE '2017-12-31'
AND open_dt < DATE '2017-12-01'
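The one-row frame used in both answers can be checked against the question's sample rows. The sketch below does so in SQLite through Python's sqlite3 module, which is only a stand-in for Teradata: SQLite has no QUALIFY, so the filter sits in an outer WHERE as in the first answer, and the table name accounts is made up. It runs the MAX(...) ROWS BETWEEN 1 PRECEDING AND 1 PRECEDING form and a plain LAG side by side for December 2017.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE accounts ("
    " acct_id INTEGER, eff_dt TEXT, expr_dt TEXT, prod_cd TEXT, open_dt TEXT)"
)
conn.executemany(
    "INSERT INTO accounts VALUES (?, ?, ?, ?, ?)",
    [
        (111, "2012-05-01", "2013-06-01", "A", "2012-05-01"),
        (111, "2013-06-02", "2014-03-08", "A", "2012-05-01"),
        (111, "2014-03-09", "9999-12-31", "B", "2012-05-01"),
        (222, "2015-07-15", "2015-11-11", "A", "2015-07-15"),
        (222, "2015-11-12", "2016-08-08", "B", "2015-07-15"),
        (222, "2016-08-09", "9999-12-31", "A", "2015-07-15"),
        (333, "2016-01-01", "2016-04-15", "B", "2016-01-01"),
        (333, "2016-04-16", "2016-08-08", "B", "2016-01-01"),
        (333, "2016-08-09", "9999-12-31", "A", "2016-01-01"),
        (444, "2017-02-03", "2017-05-15", "A", "2017-02-03"),
        (444, "2017-05-16", "2017-12-02", "A", "2017-02-03"),
        (444, "2017-12-03", "9999-12-31", "B", "2017-02-03"),
        (555, "2017-12-12", "9999-12-31", "B", "2017-12-12"),
    ],
)

# The previous-row lookup is computed over all rows first; the date filters
# are applied afterwards so they cannot hide the preceding 'A' row.
query_template = """
SELECT DISTINCT acct_id
FROM (
    SELECT a.*, {prev_expr} AS prev_prod_cd
    FROM accounts AS a
) AS t
WHERE prod_cd = 'B' AND prev_prod_cd = 'A'
  AND expr_dt = '9999-12-31'
  AND eff_dt BETWEEN '2017-12-01' AND '2017-12-31'
  AND open_dt < '2017-12-01'
ORDER BY acct_id
"""
frame_form = (
    "MAX(prod_cd) OVER (PARTITION BY acct_id ORDER BY eff_dt "
    "ROWS BETWEEN 1 PRECEDING AND 1 PRECEDING)"
)
lag_form = "LAG(prod_cd) OVER (PARTITION BY acct_id ORDER BY eff_dt)"

via_frame = [r[0] for r in conn.execute(query_template.format(prev_expr=frame_form))]
via_lag = [r[0] for r in conn.execute(query_template.format(prev_expr=lag_form))]
print(via_frame, via_lag)
```

Only account 444 qualifies: its current row is B, the row before it is A, and the change took effect on 2017-12-03. Account 555 is also B in December 2017 but has no previous row, so it is a newly opened account rather than a conversion.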