How to order rows by the greatest date of each row, for a table with 8 date columns? - sql

This is very different from doing an SQL order by 2 date columns (or for proper way to sort sql columns, which is only for 1 column). There, we would do something like:
ORDER BY CASE WHEN date_1 > date_2
THEN date_2 ELSE date_1 END
FYI, I'm using YYYY-MM-DD in this example for brevity, but I also need it to work for TIMESTAMP (YYYY-MM-DD HH:MI:SS).
I have this table:
id  name  date_1      date_2      date_3      date_4      date_5      date_6      date_7      date_8
1   John  2008-08-11  2008-08-12  2009-08-11  2009-08-21  2009-09-11  2017-08-11  2017-09-12  2017-09-30
2   Bill  2008-09-12  2008-09-12  2008-10-12  2011-09-12  2008-09-13  2022-05-20  2022-05-21  2022-05-22
3   Andy  2008-10-13  2008-10-13  2008-10-14  2008-10-15  2008-11-01  2008-11-02  2008-11-03  2008-11-04
4   Hank  2008-11-14  2008-11-15  2008-11-16  2008-11-17  2008-12-31  2009-01-01  2009-01-02  2009-01-02
5   Alex  2008-12-15  2018-12-15  2018-12-15  2018-12-16  2018-12-17  2018-12-18  2018-12-25  2008-12-31
... But, the permutations of that give me a headache, just to think about them.
This Answer had more of a "general solution", but that was to SELECT, not to ORDER BY...
SELECT MAX(date_col)
FROM (
    SELECT MAX(date_col1) AS date_col FROM some_table
    UNION
    SELECT MAX(date_col2) AS date_col FROM some_table
    UNION
    SELECT MAX(date_col3) AS date_col FROM some_table
    ...
)
Is there something more like that, such as could be created by iterating a loop in, say, PHP or Node.js? I need a scalable solution.
I only need to list each row once.
I want to order them each by whichever col has the most recent date of those I list on that row.
Something like:
SELECT * FROM some_table WHERE
(
GREATEST OF date_1
OR date_2
OR date_3
OR date_4
OR date_5
OR date_6
OR date_7
OR date_8
)

You can use the GREATEST function to achieve it.
SELECT GREATEST(date_1, date_2, date_3, date_4, date_5, date_6, date_7, date_8) max_date, t.*
FROM Tab t
ORDER BY GREATEST(date_1, date_2, date_3, date_4, date_5, date_6, date_7, date_8) DESC;
max_date    id  name  date_1      date_2      date_3      date_4      date_5      date_6      date_7      date_8
2022-05-22  2   Bill  2008-09-12  2008-09-12  2008-10-12  2011-09-12  2008-09-13  2022-05-20  2022-05-21  2022-05-22
2018-12-25  5   Alex  2008-12-15  2018-12-15  2018-12-15  2018-12-16  2018-12-17  2018-12-18  2018-12-25  2008-12-31
2017-09-30  1   John  2008-08-11  2008-08-12  2009-08-11  2009-08-21  2009-09-11  2017-08-11  2017-09-12  2017-09-30
2009-01-02  4   Hank  2008-11-14  2008-11-15  2008-11-16  2008-11-17  2008-12-31  2009-01-01  2009-01-02  2009-01-02
2008-11-04  3   Andy  2008-10-13  2008-10-13  2008-10-14  2008-10-15  2008-11-01  2008-11-02  2008-11-03  2008-11-04
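Because GREATEST takes any number of arguments, the expression can be generated from a list of column names, which is the kind of loop the question asks about. The following is a minimal Python sketch, not the answerer's code. It assumes an in-memory SQLite database, where the multi-argument scalar max() stands in for GREATEST; the table name some_table and the helper build_order_query are illustrative.

```python
import sqlite3

DATE_COLUMNS = [f"date_{i}" for i in range(1, 9)]  # date_1 .. date_8


def build_order_query(table, columns):
    """Build a query that sorts rows by the greatest of the given columns.

    Table and column names are interpolated into the SQL text, so they must
    come from trusted code, never from user input.
    """
    # SQLite has no GREATEST(); its multi-argument max() is the stand-in here.
    greatest = "max(" + ", ".join(columns) + ")"
    return (
        f"SELECT {greatest} AS max_date, t.* "
        f"FROM {table} t ORDER BY max_date DESC"
    )


# Sample rows copied from the question.
rows = [
    (1, "John", "2008-08-11", "2008-08-12", "2009-08-11", "2009-08-21",
     "2009-09-11", "2017-08-11", "2017-09-12", "2017-09-30"),
    (2, "Bill", "2008-09-12", "2008-09-12", "2008-10-12", "2011-09-12",
     "2008-09-13", "2022-05-20", "2022-05-21", "2022-05-22"),
    (3, "Andy", "2008-10-13", "2008-10-13", "2008-10-14", "2008-10-15",
     "2008-11-01", "2008-11-02", "2008-11-03", "2008-11-04"),
    (4, "Hank", "2008-11-14", "2008-11-15", "2008-11-16", "2008-11-17",
     "2008-12-31", "2009-01-01", "2009-01-02", "2009-01-02"),
    (5, "Alex", "2008-12-15", "2018-12-15", "2018-12-15", "2018-12-16",
     "2018-12-17", "2018-12-18", "2018-12-25", "2008-12-31"),
]

conn = sqlite3.connect(":memory:")
column_defs = ", ".join(f"{c} TEXT" for c in DATE_COLUMNS)
conn.execute(f"CREATE TABLE some_table (id INTEGER, name TEXT, {column_defs})")
placeholders = ", ".join("?" * 10)
conn.executemany(f"INSERT INTO some_table VALUES ({placeholders})", rows)

ordered = conn.execute(build_order_query("some_table", DATE_COLUMNS)).fetchall()
ordered_ids = [row[1] for row in ordered]  # row[0] is max_date, row[1] is id
print(ordered_ids)  # [2, 5, 1, 4, 3]
```

Adding a ninth date column then only means extending the list. On a database that has GREATEST, the same helper would emit GREATEST(...) instead of max(...).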

If any of the columns is NULL, GREATEST can throw off the order: in MySQL, for example, it returns NULL as soon as one argument is NULL.
Based on this Answer to a Question about how GREATEST handles NULL, the approved Answer above can be adapted to this table as follows:
SELECT COALESCE(
           GREATEST(date_1, date_2, date_3, date_4, date_5, date_6, date_7, date_8),
           date_1, date_2, date_3, date_4, date_5, date_6, date_7, date_8
       ) max_date, t.*
FROM TAB t
ORDER BY COALESCE(
             GREATEST(date_1, date_2, date_3, date_4, date_5, date_6, date_7, date_8),
             date_1, date_2, date_3, date_4, date_5, date_6, date_7, date_8
         ) DESC;
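One caveat with this COALESCE fallback: when GREATEST returns NULL, it yields the first non-NULL column in the list, which is not necessarily the latest one. A possible alternative, sketched here under assumptions, is to replace each NULL with a very early sentinel date before comparing. The sketch uses Python with in-memory SQLite, whose multi-argument max() also returns NULL if any argument is NULL. The three-column table and the sentinel '0001-01-01' are hypothetical.

```python
import sqlite3

COLUMNS = ["date_1", "date_2", "date_3"]
SENTINEL = "0001-01-01"  # assumed to be earlier than any real date

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tab (id INTEGER, date_1 TEXT, date_2 TEXT, date_3 TEXT)")
conn.executemany(
    "INSERT INTO tab VALUES (?, ?, ?, ?)",
    [
        (1, "2008-08-11", None, "2017-09-30"),  # one NULL, latest date is last
        (2, "2010-01-01", "2012-05-20", "2011-05-22"),
        (3, None, None, None),  # no dates at all
    ],
)

column_list = ", ".join(COLUMNS)
# Fallback from the answer: first non-NULL column whenever max() is NULL.
fallback = f"COALESCE(max({column_list}), {column_list})"
# Alternative: neutralise each NULL, then map the sentinel back to NULL.
per_column = ", ".join(f"COALESCE({c}, '{SENTINEL}')" for c in COLUMNS)
sentinel = f"NULLIF(max({per_column}), '{SENTINEL}')"


def max_dates(expr):
    """Return (id, max_date) pairs ordered by the given expression."""
    sql = f"SELECT id, {expr} AS max_date FROM tab ORDER BY max_date DESC"
    return conn.execute(sql).fetchall()


fallback_rows = max_dates(fallback)
sentinel_rows = max_dates(sentinel)
print(fallback_rows)  # row 1 is ranked by 2008-08-11, its first non-NULL date
print(sentinel_rows)  # row 1 is ranked by 2017-09-30, its true latest date
```

Where GREATEST already ignores NULL arguments, as in PostgreSQL, neither workaround is needed.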

Related

SQL Select with grouping and replacing a column

I need a SELECT query that returns END_DATE as the next EFFECTIVE_DATE minus 1 day for records with the same key (CARD_NBR in this case).
I have tried using GROUP BY but was not able to get the desired output. Could someone please guide me? The record with the most recent effective date should keep END_DATE as 9999-12-31.
Table:
CARD_NBR  SERIEL_NO  EFFECTIVE_DATE  END_DATE
12345     1          2021-01-01      9999-12-31
12345     2          2021-01-25      9999-12-31
12345     3          2021-02-15      9999-12-31
67899     1          2021-03-01      9999-12-31
67899     2          2021-04-02      9999-12-31
67899     3          2021-05-24      9999-12-31
Output:
CARD_NBR  SERIEL_NO  EFFECTIVE_DATE  END_DATE
12345     1          2021-01-01      2021-01-24
12345     2          2021-01-25      2021-02-14
12345     3          2021-02-15      9999-12-31
67899     1          2021-03-01      2021-04-01
67899     2          2021-04-02      2021-05-23
67899     3          2021-05-24      9999-12-31
You can use lead():
select t.*,
       lead(effective_date - interval '1 day', 1, end_date) over (partition by card_nbr order by effective_date) as imputed_end_date
from t;
Date manipulations are highly database-dependent, so this uses standard SQL syntax. The third argument to lead() is the default for the last row of each card, which has no following row and so keeps its existing end_date of 9999-12-31. You can incorporate this into an update, but the best approach also depends on the database.
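To make the lead() approach concrete, here is a small Python sketch that runs it against the question's sample data in an in-memory SQLite database. It is an illustration under assumptions: SQLite has no interval arithmetic, so date(effective_date, '-1 day') stands in for the standard-SQL expression, and the table name t follows the answer.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE t (card_nbr INTEGER, seriel_no INTEGER, "
    "effective_date TEXT, end_date TEXT)"
)
conn.executemany(
    "INSERT INTO t VALUES (?, ?, ?, '9999-12-31')",
    [
        (12345, 1, "2021-01-01"),
        (12345, 2, "2021-01-25"),
        (12345, 3, "2021-02-15"),
        (67899, 1, "2021-03-01"),
        (67899, 2, "2021-04-02"),
        (67899, 3, "2021-05-24"),
    ],
)

# lead() looks one row ahead within the same card; when there is no next row,
# the third argument (the row's own end_date) is returned instead.
query = """
SELECT card_nbr,
       seriel_no,
       effective_date,
       lead(date(effective_date, '-1 day'), 1, end_date)
           OVER (PARTITION BY card_nbr ORDER BY effective_date) AS imputed_end_date
FROM t
ORDER BY card_nbr, effective_date
"""
result = conn.execute(query).fetchall()
for row in result:
    print(row)
```

Window functions need SQLite 3.25 or later, which recent Python builds normally bundle.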
SQLite 3.25 and later support window functions, so you can use the code below to get your result. Here T1 appears to use the column names SRL_NO, START_DT and END_DT for the question's SERIEL_NO, EFFECTIVE_DATE and END_DATE.
SELECT A.CARD_NBR,
       A.SRL_NO,
       A.START_DT,
       -- next row's start date minus one day; the last row keeps its END_DT
       COALESCE(DATE(B.START_DT, '-1 day'), A.END_DT) AS END_DT
FROM
(
    SELECT A.CARD_NBR,
           A.SRL_NO,
           A.START_DT,
           A.END_DT,
           ROW_NUMBER() OVER (PARTITION BY A.CARD_NBR ORDER BY A.SRL_NO ASC) RNUM1
    FROM T1 A
) A
LEFT JOIN
(
    SELECT B.CARD_NBR,
           B.SRL_NO,
           B.START_DT,
           B.END_DT,
           ROW_NUMBER() OVER (PARTITION BY B.CARD_NBR ORDER BY B.SRL_NO ASC) RNUM1
    FROM T1 B
) B
    ON A.CARD_NBR = B.CARD_NBR
   AND A.RNUM1 + 1 = B.RNUM1

Create interval from discrete dates

I have a function which saves the current status of several objects and writes it to a table, which looks something like this:
ObjectId StatusId Date
1 10 2020-04-04 00:00:00.000
2 10 2020-04-04 00:00:00.000
1 11 2020-04-05 00:00:00.000
2 10 2020-04-05 00:00:00.000
1 10 2020-04-06 00:00:00.000
2 10 2020-04-06 00:00:00.000
I would like to make it an interval grouped by ObjectId and StatusId.
So for the above the preferred output would look like this:
ObjectId StatusId StartDate EndDate
1 10 2020-04-04 00:00:00.000 2020-04-04 00:00:00.000
1 11 2020-04-05 00:00:00.000 2020-04-05 00:00:00.000
1 10 2020-04-06 00:00:00.000 2020-04-06 00:00:00.000
2 10 2020-04-04 00:00:00.000 2020-04-06 00:00:00.000
Note that one object can have the same status on multiple occasions, but if it had a different status in between, each run needs to be a separate interval. So a simple GROUP BY with max(Date) doesn't work in my case.
Thanks in advance.
This is a form of gaps-and-islands. For this purpose, the difference of row numbers is probably the simplest method:
select objectid, statusid, min(date) as startdate, max(date) as enddate
from (select t.*,
             row_number() over (partition by objectid order by date) as seqnum,
             row_number() over (partition by objectid, statusid order by date) as seqnum_2
      from t
     ) t
group by objectid, statusid, (seqnum - seqnum_2);
Why this works can be a little cumbersome to explain. However, if you look at the results of the subquery, you will see how the difference is constant for the groups you want to identify.
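To see the constant difference for yourself, the sketch below runs the subquery and then the grouped query on the question's sample rows. It assumes Python with an in-memory SQLite database (3.25 or later for window functions) and lower-case column names objectid, statusid and date. The dates are shortened to YYYY-MM-DD for readability.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (objectid INTEGER, statusid INTEGER, date TEXT)")
conn.executemany(
    "INSERT INTO t VALUES (?, ?, ?)",
    [
        (1, 10, "2020-04-04"), (2, 10, "2020-04-04"),
        (1, 11, "2020-04-05"), (2, 10, "2020-04-05"),
        (1, 10, "2020-04-06"), (2, 10, "2020-04-06"),
    ],
)

# Two row numbers per row: one over all rows of the object, one over the
# rows of the object that share the same status.
numbered = """
SELECT t.*,
       row_number() OVER (PARTITION BY objectid ORDER BY date) AS seqnum,
       row_number() OVER (PARTITION BY objectid, statusid ORDER BY date) AS seqnum_2
FROM t
"""

# Step 1: the difference stays constant within each unbroken run of a status.
inner_rows = conn.execute(
    f"SELECT objectid, statusid, date, seqnum - seqnum_2 AS grp "
    f"FROM ({numbered}) ORDER BY objectid, date"
).fetchall()
for row in inner_rows:
    print(row)

# Step 2: grouping by that difference yields one row per run (island).
islands = conn.execute(
    f"SELECT objectid, statusid, min(date), max(date) "
    f"FROM ({numbered}) "
    f"GROUP BY objectid, statusid, seqnum - seqnum_2 "
    f"ORDER BY objectid, min(date)"
).fetchall()
for row in islands:
    print(row)
```

For object 1, the two status-10 rows get differences 0 and 1 because the status-11 row sits between them, so they land in separate groups. Object 2 never changes status, so all three rows share difference 0.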

Change Dates using Lag based on condition

Input Table :
date_1 date_2 ID
2019-01-01 2019-06-30 1
2019-05-01 2019-05-31 1
2019-06-01 2019-07-30 1
2019-01-02 2019-02-28 2
2019-03-01 2019-08-30 2
2019-01-02 2019-02-28 3
2019-02-06 2019-08-30 3
I am working on a complex Hive problem involving dates.
I need to change the dates in the date_1 column based on the date_2 column for the same ID.
I want to copy the date_2 value of a row into date_1 of the next row when a condition holds, and I have to do this for each ID, i.e. partition by ID.
Note: data is sorted by ID asc, date_1 asc, date_2 asc.
For example:
Consider the 2nd row: its date_1 is '2019-05-01', and the previous row for the same ID 1 has date_2 '2019-06-30'.
The check is whether the previous row's date_2 is greater than the current row's date_1, which is true for the second row of ID 1.
When it is true, replace the current row's date_1 with the previous row's date_2, i.e. change 2019-05-01 to 2019-06-30; otherwise keep it as it is. Do the same for the 3rd row and so on.
For the 3rd row, compare against the 2nd row, and likewise for the other rows.
Now consider the 2nd row of ID 2.
Here 2019-02-28 is not greater than 2019-03-01, so keep it as it is.
Expected Output :
date_1 date_2 ID
2019-01-01 2019-06-30 1
2019-06-30 2019-05-31 1
2019-06-01 2019-07-30 1
2019-01-02 2019-02-28 2
2019-03-01 2019-08-30 2
2019-01-02 2019-02-28 3
2019-02-28 2019-08-30 3
I think you want lag() like this:
select greatest(date_1,
                lag(date_2, 1, date_1) over (partition by id order by date_1, date_2)) as date_1,
       date_2,
       id
from t;
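As a cross-check against the expected output, the following Python sketch applies the question's condition to its sample rows. It spells the rule out with CASE, taking the previous row's date_2 only when it is greater than the current date_1. The sketch assumes an in-memory SQLite database rather than Hive; the alias prev_date_2 is illustrative.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (date_1 TEXT, date_2 TEXT, id INTEGER)")
conn.executemany(
    "INSERT INTO t VALUES (?, ?, ?)",
    [
        ("2019-01-01", "2019-06-30", 1),
        ("2019-05-01", "2019-05-31", 1),
        ("2019-06-01", "2019-07-30", 1),
        ("2019-01-02", "2019-02-28", 2),
        ("2019-03-01", "2019-08-30", 2),
        ("2019-01-02", "2019-02-28", 3),
        ("2019-02-06", "2019-08-30", 3),
    ],
)

# The subquery fetches the previous row's date_2 within each id. For the
# first row of an id, lag() returns NULL, the comparison is not true, and
# the CASE falls through to the row's own date_1.
query = """
SELECT CASE WHEN prev_date_2 > date_1 THEN prev_date_2 ELSE date_1 END AS new_date_1,
       date_2,
       id
FROM (
    SELECT date_1,
           date_2,
           id,
           lag(date_2) OVER (PARTITION BY id ORDER BY date_1, date_2) AS prev_date_2
    FROM t
)
ORDER BY id, date_1, date_2
"""
result = conn.execute(query).fetchall()
for row in result:
    print(row)
```

The comparison works on plain strings here because ISO dates (YYYY-MM-DD) sort in the same order as text and as dates.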

SQL query to check if the next row value is same or different

I am joining two tables on a common date column. However, the column I am trying to get from one of the tables (cmg in this case) should return the next row's value only if it is different from the previous row's value.
Table A
Date comp.no
-----------------------
2019-03-08 5
2019-02-26 5
2019-01-17 5
2019-01-10 5
2018-12-27 5
Table B
Date cmg
-----------------
2019-07-17 NULL
2019-04-20 NULL
2019-02-26 RHB
2019-01-19 NULL
2019-01-17 RHB
2019-01-10 RMB
2018-12-28 NULL
2018-12-27 RHB
2018-12-12 RUB
2018-11-28 RUB
2018-10-20 NULL
2018-07-21 NULL
2018-04-21 NULL
2018-01-20 NULL
2017-10-21 NULL
2017-07-29 NULL
2017-05-07 NULL
2017-02-13 NULL
2016-11-22 NULL
2016-08-29 NULL
2016-06-07 NULL
2016-04-06 RUB
2016-03-21 RUB
2016-03-07 RUB
You can use the lag() function to compare each value with the previous one. For the first row you'll need an isnull() check, since the first row has no previous value.
;with cte as (
    select case
               when isnull(lag(t2.cmg) over (order by t2.date), '') <> t2.cmg then 1
               else 0
           end as isresult,
           t2.date,
           t2.cmg
    from TableA t1
    inner join TableB t2
        on t1.date = t2.date
)
select date, cmg from cte where isresult = 1
Use lag():
select date, cmg
from (select b.date, b.cmg, lag(b.cmg) over (order by b.date) as prev_cmg
      from a join
           b
           on a.date = b.date
     ) b
where prev_cmg is null or prev_cmg <> cmg
order by date;
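To show what the lag() comparison keeps, here is a minimal Python sketch that runs it on Table A and the most recent rows of Table B from the question. It assumes an in-memory SQLite database; the tables are named a and b as in the query above, and the column comp_no is an illustrative rename of comp.no.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE a (date TEXT, comp_no INTEGER)")
conn.execute("CREATE TABLE b (date TEXT, cmg TEXT)")
conn.executemany(
    "INSERT INTO a VALUES (?, 5)",
    [("2019-03-08",), ("2019-02-26",), ("2019-01-17",), ("2019-01-10",), ("2018-12-27",)],
)
# Only the most recent rows of Table B; older rows have no match in Table A.
conn.executemany(
    "INSERT INTO b VALUES (?, ?)",
    [
        ("2019-07-17", None), ("2019-04-20", None), ("2019-02-26", "RHB"),
        ("2019-01-19", None), ("2019-01-17", "RHB"), ("2019-01-10", "RMB"),
        ("2018-12-28", None), ("2018-12-27", "RHB"), ("2018-12-12", "RUB"),
    ],
)

# Keep a joined row when it is the first one (no previous value) or when
# its cmg differs from the cmg of the previous joined row by date.
query = """
SELECT date, cmg
FROM (
    SELECT b.date AS date,
           b.cmg AS cmg,
           lag(b.cmg) OVER (ORDER BY b.date) AS prev_cmg
    FROM a
    JOIN b ON a.date = b.date
)
WHERE prev_cmg IS NULL OR prev_cmg <> cmg
ORDER BY date
"""
result = conn.execute(query).fetchall()
for row in result:
    print(row)
```

The join matches four dates. The row for 2019-02-26 is dropped because its cmg (RHB) repeats the value from 2019-01-17.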

SQL query to retrieve only one occurrence for each id

This is my (simplified) table:
id eventName seriesKey eventStart
1 Event1 5000 2012-01-01 14:00:00
2 Event2 5001 2012-01-01 14:30:00
3 Event3 5000 2012-01-01 14:50:00
4 Event4 5002 2012-01-01 14:55:00
5 Event5 5001 2012-01-01 15:00:00
6 Event6 5002 2012-01-01 15:30:00
7 Event7 5002 2012-01-01 16:00:00
I have to build a query that orders the table by eventStart (ASC) but for each seriesKey, I need only one occurrence.
Thank you very much
Try aggregating with GROUP BY and using aggregate functions like MIN().
SELECT seriesKey,
       MIN(eventStart) eventStart
FROM events
GROUP BY seriesKey;
This results in:
5000 2012-01-01 14:00:00.000
5001 2012-01-01 14:30:00.000
5002 2012-01-01 14:55:00.000
If you're interested in all columns from the events table, not just the two columns chosen above, some databases (e.g. SQL Server) offer a TOP 1 construct that may help:
SELECT *
FROM events e1
WHERE e1.ID IN
(
    SELECT TOP 1 e2.ID
    FROM events e2
    WHERE e2.seriesKey = e1.seriesKey
    ORDER BY e2.eventStart
);
Resulting in:
1 Event1 5000 2012-01-01 14:00:00.000
2 Event2 5001 2012-01-01 14:30:00.000
4 Event4 5002 2012-01-01 14:55:00.000
If you also need the other columns associated with the key, you have two options:
select *
from (
    select id,
           eventName,
           seriesKey,
           eventStart,
           row_number() over (partition by seriesKey order by eventStart) as rn
    from the_event_table
) t
where rn = 1
order by eventStart
or for older DBMS that do not support windowing functions:
select t1.id,
       t1.eventName,
       t1.seriesKey,
       t1.eventStart
from the_event_table t1
where t1.eventStart = (select min(t2.eventStart)
                       from the_event_table t2
                       where t2.seriesKey = t1.seriesKey)
order by eventStart
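Both options can be tried on the question's sample rows. The Python sketch below assumes an in-memory SQLite database (3.25 or later for the window function) and reuses the placeholder table name the_event_table from the queries above. It checks that the two variants return the same rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE the_event_table "
    "(id INTEGER, eventName TEXT, seriesKey INTEGER, eventStart TEXT)"
)
conn.executemany(
    "INSERT INTO the_event_table VALUES (?, ?, ?, ?)",
    [
        (1, "Event1", 5000, "2012-01-01 14:00:00"),
        (2, "Event2", 5001, "2012-01-01 14:30:00"),
        (3, "Event3", 5000, "2012-01-01 14:50:00"),
        (4, "Event4", 5002, "2012-01-01 14:55:00"),
        (5, "Event5", 5001, "2012-01-01 15:00:00"),
        (6, "Event6", 5002, "2012-01-01 15:30:00"),
        (7, "Event7", 5002, "2012-01-01 16:00:00"),
    ],
)

# Option 1: number the rows of each series by start time and keep the first.
with_window = conn.execute("""
    SELECT id, eventName, seriesKey, eventStart
    FROM (
        SELECT id, eventName, seriesKey, eventStart,
               row_number() OVER (PARTITION BY seriesKey ORDER BY eventStart) AS rn
        FROM the_event_table
    ) t
    WHERE rn = 1
    ORDER BY eventStart
""").fetchall()

# Option 2: correlated subquery, for databases without window functions.
with_subquery = conn.execute("""
    SELECT t1.id, t1.eventName, t1.seriesKey, t1.eventStart
    FROM the_event_table t1
    WHERE t1.eventStart = (SELECT min(t2.eventStart)
                           FROM the_event_table t2
                           WHERE t2.seriesKey = t1.seriesKey)
    ORDER BY t1.eventStart
""").fetchall()

for row in with_window:
    print(row)
```

One difference to keep in mind: if two events in a series share the earliest start time, the correlated subquery returns both, while row_number() keeps exactly one of them.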
You can get the earliest date for each seriesKey:
select *
from (
    select seriesKey, min(eventStart) as mindate
    from events  -- table name assumed; the question does not give one
    group by seriesKey
) t
order by mindate