Rank the closest date, but not if date has already been used in previous row - sql

The issue is hard to summarize in the title or in text, so see the image for a visual explanation.
I have a problem joining two tables. Each date in the first table (startdateTable) should get the next (closest later) date in the other table (enddateTable). In 99% of cases this is easily done with rank(), because most rows have a startdate that can find the next enddate, and a new startdate occurs before another enddate becomes available.
However, if there are two startdates and only one enddate available, both startdates join to the same enddate.
What I want is that if an enddate has already been used by the previous row, it is not used again in the next row.
The rows I want are highlighted.
The SQL I started out with looked like
select *
from (
select rank() over (partition by tid order by enddate asc) as rnk
, id, startdate, enddate
from startdateTable
inner join enddateTable on startdateTable.ID = enddateTable.id
and enddateTable.enddate > startdateTable.startdate
) q
where q.rnk = 1
This gets me the following result. The last row should instead get the 2100 date, since the 2020-06 date has been used in the previous row.

If you have two tables and you want to align them, then you can use row_number():
select s.id, s.startdate, e.enddate
from (select s.*, row_number() over (partition by id order by startdate) as seqnum
      from startdateTable s
     ) s join
     (select e.*, row_number() over (partition by id order by enddate) as seqnum
      from enddateTable e
     ) e
     on e.id = s.id and e.seqnum = s.seqnum
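To see what this positional alignment does, here is a minimal Python sketch of the same idea: number the start dates and the end dates separately within each id, then pair equal positions. The function name and the sample tuples are made up for illustration, and dates are assumed to be ISO-formatted strings so that they sort chronologically.

```python
from collections import defaultdict

def align_by_sequence(starts, ends):
    """Pair the k-th earliest start with the k-th earliest end for each id.

    starts, ends: lists of (id, iso_date_string) tuples (hypothetical shape).
    """
    starts_by_id = defaultdict(list)
    ends_by_id = defaultdict(list)
    for id_, d in starts:
        starts_by_id[id_].append(d)
    for id_, d in ends:
        ends_by_id[id_].append(d)

    pairs = []
    for id_ in sorted(starts_by_id):
        s = sorted(starts_by_id[id_])        # seqnum over startdate
        e = sorted(ends_by_id.get(id_, []))  # seqnum over enddate
        # zip stops at the shorter list, like the inner join on seqnum
        pairs.extend((id_, sd, ed) for sd, ed in zip(s, e))
    return pairs
```

With starts 2020-01-01, 2020-03-01, 2020-05-01 and ends 2020-02-01, 2020-06-01, 2100-01-01 for one id, the third start is paired with the 2100 date rather than reusing 2020-06-01. Note that this pairing is purely positional: it does not check that each enddate is later than its startdate, so it only fits data where the two sequences really do line up.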

Related

Take data within a month, if not, then in the previous one

In my application, I use this select to get data in a month:
select e.id, e.date, e.local_currency_id, e.rate, e.downloaddate
from exchange_rates e
join (select local_currency_id, max(date) as max_date
from exchange_rates group by local_currency_id, date_trunc('month', date)) as m_date on e.date = m_date.max_date
where e.date >=:currentDate and e.local_currency_id = :localCurrencyId
order by date_trunc('month', e.downloaddate) desc limit 1
What I want is the data for the given month (per local_currency), sorted in reverse order from the end of the month; if there is no data in that month, I need to take the data from the previous one.
Currently my select returns data later than my date, but I need the data as described above.
I'm not totally following your logic, but how about using row_number() to do a ranking?
Partition by the currency id and order by the date descending, then pick the row with rank 1 (the most recent).
with t as (select e.*,
row_number() over (partition by e.local_currency_id order by date_trunc('month', e.downloaddate) desc) as rn
from exchange_rates e
where e.local_currency_id = :localCurrencyId)
select id, date, local_currency_id, rate, downloaddate
from t
where rn = 1

How can I group rows in SQL based on a condition

I am using Redshift SQL and would like to group users who have overlapping voucher periods into a single row (showing the minimum start date and maximum end date).
For example, if I have these records,
I would like to achieve this result using Redshift.
The explanation is that since row 1 and row 2 have overlapping dates, I would like to combine them and get the min(Start_date) and max(End_Date).
I do not really know where to start. I tried using row_number to partition them, but it does not seem to work well. This is what I tried.
select
id,
start_date,
end_date,
lag(end_date, 1) over (partition by id order by start_date) as prev_end_date,
row_number() over (partition by id, (case when prev_end_date >= start_date then 1 else 0 end) order by start_date) as rn
from users
Are there any suggestions out there? Thank you kind sirs.
This is a type of gaps-and-islands problem. Because the dates are arbitrary, let me suggest the following approach:
Use a cumulative max to get the maximum end_date among the rows preceding the current one.
Use logic to determine when there is no overlap (i.e. a new period starts).
A cumulative sum of the starts provides an identifier for the group.
Then aggregate.
As SQL:
select id, min(start_date), max(end_date)
from (select u.*,
sum(case when prev_end_date >= start_date then 0 else 1
end) over (partition by id
order by start_date, voucher_code
rows between unbounded preceding and current row
) as grp
from (select u.*,
max(end_date) over (partition by id
order by start_date, voucher_code
rows between unbounded preceding and 1 preceding
) as prev_end_date
from users u
) u
) u
group by id, grp;
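The same four steps can be traced outside the database. The following is a rough Python sketch, not Redshift code; the function name and the tuple layout are assumptions made for illustration, and dates are assumed to be ISO strings so that string comparison matches date order.

```python
def merge_overlapping(periods):
    """Merge overlapping periods per id.

    periods: list of (id, start_date, end_date) tuples (hypothetical shape).
    Returns one (id, min_start, max_end) tuple per island.
    """
    merged = []
    for id_, start, end in sorted(periods):  # order by id, start_date
        same_id = bool(merged) and merged[-1][0] == id_
        # merged[-1][2] plays the role of prev_end_date: the running
        # maximum end_date of the rows seen so far in this island
        if same_id and merged[-1][2] >= start:
            # overlap: stay in the current group and extend its end
            merged[-1][2] = max(merged[-1][2], end)
        else:
            # no overlap: a new period (a new grp value) starts here
            merged.append([id_, start, end])
    return [tuple(row) for row in merged]
```

The running maximum matters when one long period contains several short ones: comparing against only the immediately preceding row's end_date (as lag() would) could split a group that the cumulative max keeps together.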
Another approach would be using recursive CTE:
Divide all rows into numbered partitions grouped by id and ordered by start_date and end_date
Iterate over them calculating group_start_date for each row (rows which have to be merged in final result would have the same group_start_date)
Finally you need to group the CTE by id and group_start_date taking max end_date from each group.
Here is corresponding sqlfiddle: http://sqlfiddle.com/#!18/7059b/2
And the SQL, just in case:
WITH cteSequencing AS (
-- Get Values Order
SELECT *, start_date AS group_start_date,
ROW_NUMBER() OVER (PARTITION BY id ORDER BY start_date, end_date) AS iSequence
FROM users),
Recursion AS (
-- Anchor - the first value in groups
SELECT *
FROM cteSequencing
WHERE iSequence = 1
UNION ALL
-- Remaining items
SELECT b.id, b.start_date, b.end_date,
CASE WHEN a.end_date > b.start_date THEN a.group_start_date
ELSE b.start_date
END
AS groupStartDate,
b.iSequence
FROM Recursion AS a
INNER JOIN cteSequencing AS b ON a.iSequence + 1 = b.iSequence AND a.id = b.id)
SELECT id, group_start_date AS start_date, MAX(end_date) AS end_date
FROM Recursion
GROUP BY id, group_start_date
ORDER BY id, group_start_date

Finding the commencement date of a new project

Interested in a challenging SQL problem, read ahead:
For the data set below, I'm trying to find a logic which identifies the commencement date of a new project for each employee.
Data Set
The logic to identify commencement date of new project is that:
An employee will not have any date record prior to the present one in a 14 day time frame.
Project windows only last 14 days after the commencement. The first record falling outside such a window will be counted as the start of the next project.
What is needed
Both Redshift/ Postgres solutions accepted.
Please note Redshift doesn't support recursive CTEs or RANGE keyword in window frame.
Thanks for reading.
For Postgresql, including the CTE (DataSet) for the dataset, here you go:
WITH RECURSIVE TimeLine(Employee, ProjectID, ProjectStartDate, Date, DateRank) AS (
SELECT Employee, 1, Date, Date, DateRank
FROM DataSetWithRank
WHERE DateRank = 1
UNION ALL
SELECT T.Employee,
T.ProjectID + CASE When D.Date >= T.ProjectStartDate+14 THEN 1 Else 0 END,
CASE When D.Date >= T.ProjectStartDate+14 THEN D.Date Else T.ProjectStartDate END,
D.Date, D.DateRank
FROM TimeLine T
JOIN DataSetWithRank D ON D.Employee = T.Employee AND D.DateRank = T.DateRank + 1
), DataSet(Employee,Date) AS (
SELECT UNNEST(ARRAY['Employee1','Employee1','Employee1','Employee1','Employee1','Employee1','Employee1','Employee1','Employee1','Employee1','Employee1','Employee1','Employee1','Employee1','Employee1']),
UNNEST(ARRAY['2018-01-01','2018-01-03','2018-01-05','2018-01-08','2018-01-11','2018-01-13','2018-01-14','2018-01-16','2018-01-18','2018-01-21','2018-01-22','2018-01-24','2018-01-25','2018-01-27','2018-01-29']::date[])
UNION
SELECT UNNEST(ARRAY['Employee2','Employee2','Employee2','Employee2','Employee2','Employee2','Employee2','Employee2','Employee2','Employee2','Employee2','Employee2','Employee2','Employee2','Employee2']),
UNNEST(ARRAY['2018-01-03','2018-01-05','2018-01-07','2018-01-10','2018-01-13','2018-01-15','2018-01-16','2018-01-18','2018-01-20','2018-01-23','2018-01-24','2018-01-26','2018-01-27','2018-01-29','2018-01-31']::date[])
), DataSetWithRank AS (
SELECT *, DENSE_RANK() OVER (PARTITION BY Employee ORDER BY Date) AS DateRank
FROM DataSet
)
SELECT Employee,
'Project ' || ProjectID AS "Project #",
Date,
DENSE_RANK() OVER (PARTITION BY Employee, ProjectID ORDER BY Date) AS Rank,
CASE WHEN Date = ProjectStartDate THEN 'Y' ELSE NULL END AS Is_New
FROM TimeLine
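
The rule the recursive CTE applies can be checked against a plain loop. This is a minimal Python sketch for one employee's dates, assuming the same 14-day window as in the query; the function name and the window_days parameter are hypothetical.

```python
from datetime import date, timedelta

def project_starts(dates, window_days=14):
    """Return the commencement dates for one employee.

    A date starts a new project when it falls on or after
    (current project start + window_days), mirroring the
    D.Date >= T.ProjectStartDate+14 test in the query.
    """
    starts = []
    current_start = None
    for d in sorted(set(dates)):
        if current_start is None or d >= current_start + timedelta(days=window_days):
            current_start = d
            starts.append(d)
    return starts
```

For the Employee1 dates in the DataSet CTE this yields 2018-01-01 and 2018-01-16, and for Employee2 it yields 2018-01-03 and 2018-01-18. Each step depends on the start chosen by the previous step, which is why the SQL version needs recursion rather than a single window function.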

SQL: transposing a time series table into a start-end time table if an event occur

I am trying to use a select statement to create a view that transposes a table of datetime readings into a table with one row per run, giving the start and end time of each run of consecutive rows (ordered by time, partitioned by station) in which the 'record' field is not 0.
Here is a sample of the initial table.
And how it should look like after transposing.
Can anyone help?
You can use the conditional_change_event analytical function to create a special grouping identifier to split these out in a simple query:
select row_number() over () unique_id,
station,
min(datetime) startdate,
max(datetime) enddate
from (
select t.*, CONDITIONAL_CHANGE_EVENT(decode(recording,0,0,1))
over (partition by station order by datetime) chg
from mytable t
) x
where recording > 0
group by station, chg
order by 1, 2
The decode is just to set up your islands and gaps (where gaps are recording <= 0 and islands are recording > 0). Then the change event on that will generate a new identifier for grouping. Also note that I am grouping on the change event even though it isn't part of the output.
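For readers without access to conditional_change_event, the grouping idea can be sketched in Python with itertools.groupby, which also starts a new group whenever the key changes between consecutive rows. The function name and the (station, datetime, recording) tuple layout are assumptions for illustration only.

```python
from itertools import groupby

def recording_runs(rows):
    """Return (station, startdate, enddate) for each run of recording > 0.

    rows: iterable of (station, datetime, recording) tuples (hypothetical
    shape); datetime values only need to be sortable.
    """
    ordered = sorted(rows, key=lambda r: (r[0], r[1]))
    runs = []
    # A new group begins whenever the station or the on/off flag changes,
    # which is the role the change-event counter plays in the query.
    for (station, is_on), grp in groupby(ordered, key=lambda r: (r[0], r[2] > 0)):
        if is_on:  # keep the islands, drop the gaps
            grp = list(grp)
            runs.append((station, grp[0][1], grp[-1][1]))
    return runs
```

As in the query, the grouping key itself never appears in the output; it only decides where one run ends and the next begins.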
ROW_NUMBER() is the best for partitioning. Next, you can do a self join on the partitioned tables to see if the difference between times is greater than five minutes. I think the best solution is to partition on the rolling sum of the timestamp difference, offset by 5 minutes based on your pattern. If the five minutes is not a regular pattern then there is probably a generalized approach that can be used with the zeroes.
Solution written as a CTE below for easy view creation (it's a slow view though).
WITH partitioned as (
SELECT datetime, station, recording,
ROW_NUMBER() OVER(PARTITION BY station
ORDER BY datetime ASC) rn
FROM table --Not sure what the tablename is
WHERE recording != 0),
diffed as (
SELECT a.datetime, a.station,
DATEDIFF(mi,ISNULL(b.datetime,a.datetime),a.datetime)-5 Difference
--The ISNULL logic is for when a.datetime is the beginning of the block,
--we want a 0
FROM partitioned a
LEFT JOIN partitioned b on a.rn = b.rn + 1 and a.station=b.station
GROUP BY a.datetime,a.station),
cumulative as (
SELECT a.datetime, a.station, SUM(b.difference) offset_grouping
FROM diffed a
LEFT JOIN diffed b on a.datetime >= b.datetime and a.station = b.station
--SUM needs a GROUP BY to produce one running total per row
GROUP BY a.datetime, a.station),
ordered as (SELECT datetime,station,offset_grouping,
ROW_NUMBER() OVER(PARTITION BY station,offset_grouping ORDER BY datetime asc) starter,
ROW_NUMBER() OVER(PARTITION BY station,offset_grouping ORDER BY datetime desc) ender
FROM cumulative)
SELECT ROW_NUMBER() OVER(ORDER BY a.datetime) unique_id,a.station,a.datetime startdate, b.datetime enddate
FROM ordered a
--match each block's first row with the last row of the same block
JOIN ordered b on a.starter = b.ender and a.station=b.station
and a.offset_grouping = b.offset_grouping and a.starter=1
This is the only solution I can think of but again, it's slow depending on the amount of data you have.

Oracle - select rows with minimal value in a subset

I have the following table of dates:
dateID INT (PK),
personID INT (FK),
date DATE,
starttime VARCHAR, --Always in a format of 'HH:MM'
I want to pull the row (all columns, including the PK) with the lowest date (primary condition) and lowest starttime (secondary condition) for every person. For example, if we have
row1(date = '2013-04-01' and starttime = '14:00')
and
row2(date = '2013-04-02' and starttime = '08:00')
row1 will be retrieved, along with all other columns.
So far I have come up with gradually filtering the table, but it's quite a mess. Is there a more efficient way of doing this?
Here is what I made so far:
SELECT
D.id
, D.personid
, D.date
, D.starttime
FROM table D
JOIN (
SELECT --Select lowest time from the subset of lowest dates
A.personid,
B.startdate,
MIN(A.starttime) AS starttime
FROM table A
JOIN (
SELECT --Select lowest date for every person to exclude them from outer table
personid
, MIN(date) AS startdate
FROM table
GROUP BY personid
) B
ON A.personid = B.personid
AND A.date = B.startdate
GROUP BY
A.personid,
B.startdate
) C
ON C.personid = D.personid
AND C.startdate = D.date
AND C.starttime = D.starttime
It works, but I think there is a cleaner, more efficient way to do this. Any ideas?
EDIT: Let me expand the question - I also need to extract the maximum date (only the date, without time) for each person.
The result should look like this:
id
personid
max(date) for each person
min(date) for each person
min(starttime) for min(date) for each person
It is part of a much larger query (the resulting table is joined with it), and the resulting table must be lightweight enough that the query won't run for too long. With a single join against this table (just using min and max for each field I wanted) the query took about 3 seconds, and I would like the final query not to take longer than 2-3 times that.
You should be able to do this like:
select a.dateID, a.personID, a.date, a.max_date, a.starttime
from (select t.*,
max(t.date) over (partition by t.personID) max_date,
row_number() over (partition by t.personID
order by t.date, t.starttime) rn
from table t) a
where a.rn = 1;
sample data added to fiddle: http://sqlfiddle.com/#!4/63c45/1
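As a cross-check of what rn = 1 selects, here is a small Python sketch of the same per-person logic. The function name and the row layout (dateID, personID, date, starttime) are assumptions for illustration; it relies on ISO date strings and on the 'HH:MM' starttime format stated in the question, since both sort correctly as plain strings.

```python
def earliest_per_person(rows):
    """Return (dateID, personID, min_date, max_date, starttime) per person.

    rows: list of (dateID, personID, date, starttime) tuples
    (hypothetical shape).
    """
    by_person = {}
    for row in rows:
        by_person.setdefault(row[1], []).append(row)  # partition by personID

    result = []
    for person_id in sorted(by_person):
        group = by_person[person_id]
        # rn = 1: earliest date first, starttime as the tie-breaker
        first = min(group, key=lambda r: (r[2], r[3]))
        # max(date) over (partition by personID)
        max_date = max(r[2] for r in group)
        result.append((first[0], person_id, first[2], max_date, first[3]))
    return result
```

The order of the sort key matters: sorting by (date, starttime) returns row1 from the question's example, whereas (starttime, date) would return row2.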
This is the query you can use; there is no need to incorporate it into your query. You can also use @Dazzal's query as a standalone query.
SELECT ID, PERSONID, DATE, STARTTIME
FROM (
SELECT ID, PERSONID, DATE, STARTTIME, ROW_NUMBER() OVER(PARTITION BY personid ORDER BY DATE, STARTTIME) AS RN
FROM TABLE
) A
WHERE
RN = 1
select a.id,a.accomp, a.accomp_name, a.start_year,a.end_year, a.company
from (select t.*,
min(t.start_year) over (partition by t.company) min_date,
max(t.end_year) over (partition by t.company) max_date,
row_number() over (partition by t.company
order by t.end_year desc) rn
from temp_123 t) a
where a.rn = 1;