Efficient join with a "correlated" subquery - sql

Given three tables in Oracle, Dates(aDate date, doUse boolean), Days(rangeId int, day int, qty int) and Range(rangeId int, startDate date),
I want to join them so that each Range is joined with the rows of Dates from aDate = startDate onward, matching each day in Days to the n-th date where doUse = 1.
For a single range it can be done something like this:
SELECT rangeId, aDate, CASE WHEN doUse = 1 THEN qty ELSE 0 END AS qty
FROM (
SELECT aDate, doUse, SUM(doUse) OVER (ORDER BY aDate) day
FROM Dates
WHERE aDate >= :startDate
) INNER JOIN (
SELECT rangeId, day,qty
FROM Days
WHERE rangeId = :rangeId
) USING (day)
ORDER BY day ASC
What I want to do is run the query for all ranges in Range, not just one.
The problem is that the join value "day" depends on the range's startDate, which makes the query hard to formulate.
Keep in mind that the Dates table is pretty huge, so I would like to avoid calculating the day value from the first date in the table; the Days of each Range shouldn't span more than 100 days or so.
Edit: Sample data
Dates                 Days
aDate       doUse     rangeId  day  qty
2008-01-01  1         1        1    1
2008-01-02  1         1        2    10
2008-01-03  0         1        3    8
2008-01-04  1         2        1    2
2008-01-05  1         2        2    5
Ranges
rangeId  startDate
1        2008-01-02
2        2008-01-03
Result
rangeId  aDate       qty
1        2008-01-02  1
1        2008-01-03  0
1        2008-01-04  10
1        2008-01-05  8
2        2008-01-03  0
2        2008-01-04  2
2        2008-01-05  5

Try this:
SELECT rt.rangeId, aDate, CASE WHEN doUse = 1 THEN qty ELSE 0 END AS qty
FROM (
SELECT *
FROM (
SELECT r.*, t.*, SUM(doUse) OVER (PARTITION BY rangeId ORDER BY aDate) AS span
FROM (
SELECT r.rangeId, startDate, MAX(day) AS dm
FROM Range r, Days d
WHERE d.rangeid = r.rangeid
GROUP BY
r.rangeId, startDate
) r, Dates t
WHERE t.adate >= startDate
ORDER BY
rangeId, t.adate
)
WHERE
span <= dm
) rt, Days d
WHERE d.rangeId = rt.rangeID
AND d.day = GREATEST(rt.span, 1)
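To make the logic of the query easier to check, here is a rough Python sketch of the same idea, run against the sample data from the question. It is not the Oracle query itself, only an illustration: the function name expand_ranges and the in-memory data layout are made up for this example. For each range it walks the dates from startDate, keeps a running count of doUse (the "span"), and stops once the count passes the largest day of that range.

```python
# Sample data from the question. ISO date strings compare correctly as text.
dates = [("2008-01-01", 1), ("2008-01-02", 1), ("2008-01-03", 0),
         ("2008-01-04", 1), ("2008-01-05", 1)]          # (aDate, doUse), sorted
days = {1: {1: 1, 2: 10, 3: 8}, 2: {1: 2, 2: 5}}       # rangeId -> {day: qty}
ranges = [(1, "2008-01-02"), (2, "2008-01-03")]        # (rangeId, startDate)


def expand_ranges(ranges, dates, days):
    """Hypothetical helper: returns (rangeId, aDate, qty) rows for every range."""
    rows = []
    for range_id, start_date in ranges:
        qty_by_day = days[range_id]
        last_day = max(qty_by_day)          # plays the role of dm in the query
        span = 0                            # running SUM(doUse) per range
        for a_date, do_use in dates:
            if a_date < start_date:
                continue                    # only dates from startDate onward
            span += do_use
            if span > last_day:
                break                       # same cut-off as span <= dm
            # Unused dates are reported with qty 0, used dates with the day's qty.
            rows.append((range_id, a_date, qty_by_day[span] if do_use else 0))
    return rows


for row in expand_ranges(ranges, dates, days):
    print(row)
```

Because the running count restarts at each range's startDate, nothing is computed from the first date in the table, which is the concern raised in the question.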
P. S. It seems to me that the only reason to keep all these Dates in the database is to have a continuous calendar with holidays marked.
You can generate a calendar of arbitrary length in Oracle using the following construction:
SELECT :startDate + ROWNUM
FROM dual
CONNECT BY
LEVEL < :length
and keep only holidays in Dates. A simple join will show you which Dates are holidays and which are not.

Ok, so maybe I've found a way. Something like this:
SELECT irangeId, aDate + sum(case when doUse = 1 then 0 else 1 end) over (partition by rangeId order by aDate) as aDate, qty
FROM Days INNER JOIN (
select irangeId, startDate + day - 1 as aDate, qty
from Range inner join Days using (irangeid)
) USING (aDate)
Now I just need a way to fill in the missing dates...
Edit: Nah, this way means that I'll miss the doUse value of the last dates...

Related

MS-SQL how to add missing month in a table values

I have a table with the following entries,
ID  date          Frequency
1   '2012-04-30'  5
1   '2012-06-30'  4
1   '2012-07-31'  25
2   '2012-04-30'  7
2   '2012-05-31'  4
2   '2012-06-30'  1
2   '2012-07-31'  6
I need to add the missing months. Each added row should carry the last date of that month, with a frequency value of 0.
The expected output is
ID  date          Frequency
1   '2012-04-30'  5
1   '2012-05-31'  0
1   '2012-06-30'  4
1   '2012-07-31'  25
2   '2012-04-30'  7
2   '2012-05-31'  4
2   '2012-06-30'  1
2   '2012-07-31'  6
I would suggest recursive CTEs:
with cte as (
select id, date, frequency,
lead(date) over (partition by id order by date) as next_date
from t
union all
select id, eomonth(date, 1), 0, next_date
from cte
where eomonth(date, 1) < dateadd(day, -1, next_date)
)
select id, date, frequency
from cte
order by id, date;
The anchor part of the CTE calculates the end date for a given row. The recursive part then just keeps adding months to fill in the missing rows (and none if there are none). The use of eomonth(date, 1) is just a handy way of getting the last day of the next month.
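If it helps to see the recursion unrolled, the same fill logic can be sketched in plain Python. This is only an illustration under my own assumptions: end_of_next_month stands in for eomonth(date, 1), and fill_missing_months is a made-up name, not part of any library.

```python
import calendar
from datetime import date, timedelta


def end_of_next_month(d):
    """Last day of the month after d (stand-in for eomonth(date, 1))."""
    year = d.year + d.month // 12
    month = d.month % 12 + 1
    return date(year, month, calendar.monthrange(year, month)[1])


def fill_missing_months(rows):
    """rows: (id, date, frequency) tuples. Adds zero-frequency month-end rows."""
    rows = sorted(rows)
    out = []
    for i, (row_id, d, freq) in enumerate(rows):
        out.append((row_id, d, freq))
        nxt = rows[i + 1] if i + 1 < len(rows) else None
        if nxt is None or nxt[0] != row_id:
            continue                      # no next_date within this id
        candidate = end_of_next_month(d)
        # Same stop condition as the recursive member of the CTE.
        while candidate < nxt[1] - timedelta(days=1):
            out.append((row_id, candidate, 0))
            candidate = end_of_next_month(candidate)
    return out


sample = [
    (1, date(2012, 4, 30), 5), (1, date(2012, 6, 30), 4), (1, date(2012, 7, 31), 25),
    (2, date(2012, 4, 30), 7), (2, date(2012, 5, 31), 4),
    (2, date(2012, 6, 30), 1), (2, date(2012, 7, 31), 6),
]
filled = fill_missing_months(sample)
for row in filled:
    print(row)
```

With the sample data from the question, the only row added is the 2012-05-31 one for ID 1.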
Here is a db<>fiddle.
If you have all dates in the table, you can also use cross join to generate the rows and then left join to bring in the existing data:
select i.id, d.date, coalesce(t.frequency, 0) as frequency
from (select distinct id from t) i cross join
(select distinct date from t) d left join
t
on i.id = t.id and d.date = t.date
order by i.id, d.date;
If you have a large amount of data, you can compare performance. This may be a case where a recursive CTE is faster than alternative methods.

Calculate Experience without overlapping

I'm trying to come up with the correct query to calculate employment experience time, but I can't get it right. Here's the data I have:
Case 1:
EmployeeID  PositionID  StartDate  EndDate
1           15          5/22/2017  5/22/2018
1           17          7/14/2018  8/10/2019
Case 2:
EmployeeID  PositionID  StartDate  EndDate
1           15          5/22/2017  8/10/2019
1           17          3/8/2019   8/10/2019
Case 3:
EmployeeID  PositionID  StartDate  EndDate
1           15          5/22/2017  NULL
1           17          3/8/2019   NULL
In the first case, my expected result would be 27 months for both positions.
In the second case, my expected result would be 27 months for PositionID 15 and 0 months for PositionID 17, because PositionID 17 falls within the date range of the first position and therefore earns the employee no additional experience.
In the third case, my expected result would be 30 months for PositionID 15, using today's date as the end date, and 0 months for PositionID 17, again because PositionID 17 falls within the date range of the first position.
You don't have any gaps, so I think this does what you want:
select employeeid,
datediff(month, min(startdate), coalesce(max(enddate), getdate())) as months
from t
group by employeeid;
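For checking the numbers by hand: DATEDIFF(month, ...) in SQL Server counts the month boundaries crossed between the two dates, not full 30-day periods. Below is a small Python sketch of that arithmetic applied to the three cases. The function names are made up, and the "today" value for Case 3 is a hypothetical date chosen only so that the result matches the 30 months mentioned in the question.

```python
from datetime import date


def datediff_month(start, end):
    """Month boundaries crossed, like DATEDIFF(month, start, end)."""
    return (end.year - start.year) * 12 + (end.month - start.month)


def experience_months(rows, today):
    """rows: (startdate, enddate) pairs for one employee; enddate may be None."""
    first_start = min(start for start, _ in rows)
    ends = [end for _, end in rows if end is not None]
    # MAX() ignores NULLs; fall back to today only when every enddate is NULL.
    last_end = max(ends) if ends else today
    return datediff_month(first_start, last_end)


case1 = [(date(2017, 5, 22), date(2018, 5, 22)), (date(2018, 7, 14), date(2019, 8, 10))]
case2 = [(date(2017, 5, 22), date(2019, 8, 10)), (date(2019, 3, 8), date(2019, 8, 10))]
case3 = [(date(2017, 5, 22), None), (date(2019, 3, 8), None)]
today = date(2019, 11, 1)   # hypothetical "today", not taken from the question

print(experience_months(case1, today))  # 27
print(experience_months(case2, today))  # 27
print(experience_months(case3, today))  # 30
```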
This is what I have:
Your table 1:
select 1 as EmployeeID , 15 as PositonID , cast('5/22/2017' as date) as StartDate, cast('5/22/2018' as date) as EndDate into t2
union select 1, 17, '7/14/2018', '8/10/2019'
And the query to get the result
with a as
(
select EmployeeID, isnull(StartDate, cast(getdate() as date)) as sedate from t2
union
select EmployeeID, isnull(EndDate, cast(getdate() as date)) from t2
)
select a1.*, a2.sedate, case when datediff(month,a1.sedate, a2.sedate)< 0 then 0 else isnull(datediff(month,a1.sedate, a2.sedate), 0) end as months from a a1 left join a a2 on a1.EmployeeID = a2.EmployeeID and a1.sedate < a2.sedate
and not exists(select 1 from a a3 where a3.EmployeeID = a2.EmployeeID and a3.sedate > a1.sedate and a3.sedate < a2.sedate )
I changed the table to the values of Case2 and Case 3 and it seemed to work.
Let us know if that helps

SQL Server: Count days difference between previous date and current date

I've been trying to find a way to count the difference in days between the dates of the previous and current rows, counting only business days.
Example data and criteria here.
ID StartDate EndDate NewDate DaysDifference
========================================================================
0 04/05/2017 null
1 12/06/2017 16/06/2017 12/06/2017 29
2 03/07/2017 04/07/2017 16/06/2017 13
3 07/07/2017 10/07/2017 04/07/2017 5
4 12/07/2017 26/07/2017 10/07/2017 13
My end goal:
I want two new columns, NewDate and DaysDifference.
The NewDate column comes from the EndDate of the previous row. For example, the NewDate of ID 2 is 16/06/2017, which comes from the EndDate of ID 1. But if the EndDate of the previous row is null, use its StartDate instead (the ID 1 case).
The DaysDifference column counts only the business days between the EndDate and NewDate columns.
Here is the script that I am using at the moment.
select distinct
c.ID
,c.EndDate
,isnull(p.EndDate,c.StartDate) as NewDate
,count(distinct cast(l.CalendarDate as date)) as DaysDifference
from
(select *
from table) c
full join
(select *
from table) p
on c.level = p.level
and c.id-1 = p.id
left join Calendar l
on (cast(l.CalendarDate as date) between cast(p.EndDate as date) and cast(c.EndDate as date)
or
cast(l.CalendarDate as date) between cast(p.EndDate as date) and cast(c.StartDate as date))
and l.Day not in ('Sat','Sun') and l.Holiday <> 'Y'
where c.ID <> 0
group by
c.ID
,c.EndDate
,isnull(p.EndDate,c.StartDate)
And this is the current result:
ID EndDate NewDate DaysDifference
=========================================================
1 16/06/2017 12/06/2017 0
2 04/07/2017 16/06/2017 13
3 10/07/2017 04/07/2017 5
4 26/07/2017 10/07/2017 13
In the real data I get the correct DaysDifference for IDs 2, 3 and 4, but not for ID 1: its previous row (ID 0) has a null EndDate, so StartDate is used instead, and the count comes out wrong.
Hope I've provided enough info. :)
Could you please point me to a way to count DaysDifference correctly?
Thanks in advance!
I think you can use this logic to get the previous date:
select t.*,
lag(coalesce(enddate, startdate), 1) over (order by id) as newdate
from t;
Then for the difference:
select id, enddate, newdate,
sum(case when c.day not in ('Sat', 'Sun') and c.holiday <> 'Y' then 1 else 0 end) as diff
from (select t.*,
lag(coalesce(enddate, startdate), 1) over (order by id) as newdate
from t
) t join
calendar c
on c.calendardate >= newdate and c.calendardate <= startdate
group by id, enddate, newdate;
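To sanity-check the expected counts without a Calendar table, here is a rough Python sketch of the same two steps: take the previous row's coalesce(enddate, startdate), then count weekdays. It rests on assumptions of mine: there are no holidays in the sample period, both ends of the interval are included, and the count runs up to EndDate (as the expected output in the question does), falling back to StartDate only when EndDate is null. The function names are made up.

```python
from datetime import date, timedelta

rows = [  # (id, startdate, enddate) from the question's sample
    (0, date(2017, 5, 4), None),
    (1, date(2017, 6, 12), date(2017, 6, 16)),
    (2, date(2017, 7, 3), date(2017, 7, 4)),
    (3, date(2017, 7, 7), date(2017, 7, 10)),
    (4, date(2017, 7, 12), date(2017, 7, 26)),
]


def business_days(first, last, holidays=frozenset()):
    """Weekdays from first to last inclusive, skipping the given holidays."""
    count = 0
    day = first
    while day <= last:
        if day.weekday() < 5 and day not in holidays:   # 0-4 is Mon-Fri
            count += 1
        day += timedelta(days=1)
    return count


def days_difference(rows, holidays=frozenset()):
    """Returns (id, enddate, newdate, diff) for every row that has a predecessor."""
    result = []
    previous = None
    for row_id, start, end in sorted(rows, key=lambda r: r[0]):
        if previous is not None:
            new_date = previous[2] or previous[1]   # coalesce(enddate, startdate)
            upper = end if end is not None else start
            result.append((row_id, end, new_date,
                           business_days(new_date, upper, holidays)))
        previous = (row_id, start, end)
    return result


for row in days_difference(rows):
    print(row)
```

For IDs 2, 3 and 4 this gives 13, 5 and 13, matching the table in the question.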

db2 compare year and month side by side

I need to compare the companies' values side by side: current year vs. last year, and current month vs. the same month of the previous year.
I use this query to get the values
SELECT STORE, SUM(TOTAL) as VAL, DATE FROM MYTABLE
WHERE DATE=CURRENT_DATE GROUP BY STORE ORDER BY STORE
Below are the results:
STORE | VAL | DATE
1 10 CURRENT_DATE (2018-27-03)
1 20 2018-26-03
1 30 2018-25-03
2 20 CURRENT_DATE (2018-27-03)
2 20 2018-26-02
and I need this:
STORE | VALUE CURRENT YEAR | VALUE LAST YEAR
1 60 30 (CALCULATED)
2 40 50 (CALCULATED)
STORE | VALUE CURRENT MONTH | VALUE SAME MONTH OF LAST YEAR
1 60 30 (CALCULATED)
2 20 50 (CALCULATED)
Thank you
You could just join two sub-selects together.
E.g with this DDL and Data
CREATE TABLE MYTABLE (STORE int, VAL int, D DATE);
INSERT INTO MYTABLE VALUES
( 1, 10, '2018-03-27')
,( 1, 20, '2018-03-26')
,( 1, 10, '2018-02-25')
,( 1, 35, '2017-03-25')
,( 2, 20, '2018-03-27')
,( 2, 15, '2017-03-26');
This will get you the values for the current month and for the same month last year:
SELECT C.*, LY.VAL_CURR_MONTH_LY
FROM (
SELECT STORE, SUM(VAL) as VAL_CURR_MONTH
FROM MYTABLE WHERE INT(D)/100=INT(CURRENT_DATE)/100
GROUP BY STORE ) AS C
LEFT JOIN
(SELECT STORE
, SUM(VAL) AS VAL_CURR_MONTH_LY
FROM MYTABLE
WHERE INT(D)/100 = INT(CURRENT_DATE)/100 -100
GROUP BY STORE ) LY
ON
C.STORE = LY.STORE
Then this for years
SELECT C.*, LY.VAL_LY
FROM (
SELECT STORE, SUM(VAL) as VAL_CURR_YEAR
FROM MYTABLE WHERE INT(D)/10000=INT(CURRENT_DATE)/10000
GROUP BY STORE ) AS C
LEFT JOIN
(SELECT STORE
, SUM(VAL) AS VAL_LY
FROM MYTABLE
WHERE INT(D)/10000 = INT(CURRENT_DATE)/10000 -1
GROUP BY STORE ) LY
ON
C.STORE = LY.STORE
P.S. there are many other ways to manipulate dates, but casting to INT is maybe one of the easier ways
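In case the INT trick looks opaque: INT(D) on a DB2 date gives the number yyyymmdd, so integer division by 100 leaves yyyymm and subtracting 100 from that moves back exactly one year. A small Python sketch of the month comparison, using the sample rows above, might look like this. The function names are made up, and "today" is passed in explicitly (here 2018-03-27, the CURRENT_DATE from the question) instead of being read from the clock.

```python
from datetime import date

rows = [  # (store, val, d) as in the INSERT above
    (1, 10, date(2018, 3, 27)), (1, 20, date(2018, 3, 26)),
    (1, 10, date(2018, 2, 25)), (1, 35, date(2017, 3, 25)),
    (2, 20, date(2018, 3, 27)), (2, 15, date(2017, 3, 26)),
]


def year_month(d):
    """Same value as INT(D)/100 in the queries: yyyymm as an integer."""
    return d.year * 100 + d.month


def month_vs_last_year(rows, today):
    """store -> (current month total, same month last year total or None)."""
    current_key = year_month(today)
    last_year_key = current_key - 100       # one year earlier, same month
    current, last_year = {}, {}
    for store, val, d in rows:
        key = year_month(d)
        if key == current_key:
            current[store] = current.get(store, 0) + val
        elif key == last_year_key:
            last_year[store] = last_year.get(store, 0) + val
    # Like the LEFT JOIN: stores with no rows last year get None.
    return {store: (total, last_year.get(store))
            for store, total in sorted(current.items())}


print(month_vs_last_year(rows, date(2018, 3, 27)))
```

The year comparison works the same way with d.year as the key and a subtraction of 1.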
Also, here is a more flexible way to get the "Same Month of Last Year" value. A similar method can get "last Year" values.
SELECT T.*
, AVG(VAL) OVER(
PARTITION BY STORE
ORDER BY YEAR_MONTH
RANGE BETWEEN 101 PRECEDING AND 100 PRECEDING
) AS SAME_MONTH_PREV_YEAR
FROM
( SELECT STORE
, INTEGER(D)/100 AS YEAR_MONTH
, SUM(VAL) AS VAL
FROM
MYTABLE T
GROUP BY
STORE
, INTEGER(D)/100
) AS T
;
Gives
STORE YEAR_MONTH VAL SAME_MONTH_PREV_YEAR
----- ---------- --- --------------------
1 201703 35 NULL
1 201802 10 NULL
1 201803 30 35
2 201703 15 NULL
2 201803 20 15
It is better to avoid functions on table columns in WHERE clauses. Check the following SQL statements, which are based on P. Vernon's sample table.
Note: These statements are for DB2 LUW 11.1
For month:
SELECT STORE,
SUM(CASE WHEN YEAR(D) = year(current date) THEN val
ELSE 0 END) as VAL_CURR_MONTH,
SUM(CASE WHEN YEAR(D) = year(current date) - 1 THEN vaL
ELSE 0 END) as VAL_CURR_MONTH_LY
FROM MYTABLE
WHERE D between first_day(current date) and last_day(current date)
or D between first_day(current date - 1 year) and last_day(current date - 1 year)
GROUP BY STORE
ORDER BY STORE
For year:
SELECT STORE, SUM(CASE WHEN YEAR(D) = year(current date) THEN val
ELSE 0 END) as VAL_CY,
SUM(CASE WHEN YEAR(D) = year(current date) - 1 THEN vaL
ELSE 0 END) as VAL_LY
FROM MYTABLE
WHERE D between first_day(current date - (month(current date) - 1) months)
and last_day(current date + (12 - month(current date)) months)
or D between first_day(current date - (month(current date) - 1) months - 1 year)
and last_day(current date + (12 - month(current date)) months - 1 year)
GROUP BY STORE
ORDER BY STORE

Find From/To Dates across multiple rows - SQL Postgres

I want to be able to "book" within a range of dates, but you can't book across gaps of days. So booking across multiple rates is fine as long as they are contiguous.
I am happy to change data structure/index, if there are better ways of storing start/end ranges.
So far I have a "rates" table which contains Start/End Periods of time with a daily rate.
e.g. Rates Table.
ID Price From To
1 75.00 2015-04-12 2016-04-15
2 100.00 2016-04-16 2016-04-17
3 50.00 2016-04-18 2016-04-30
For the above data I would want to return:
From To
2015-04-12 2016-04-30
For simplicity's sake it is safe to assume that the date ranges never overlap. For contiguous rows, the To of one row is always 1 day before the From of the next.
If there is only 1 row, I would want it to return the From/To of that single row.
Also to clarify if I had the following data:
ID Price From To
1 75.00 2015-04-12 2016-04-15
2 100.00 2016-04-17 2016-04-18
3 50.00 2016-04-19 2016-04-30
4 50.00 2016-05-01 2016-05-21
Meaning where there is a gap >= 1 day it would count as a separate range.
In which case I would expect the following:
From To
2015-04-12 2016-04-15
2016-04-17 2016-05-21
Edit 1
After playing around I have come up with the following SQL, which seems to work, although I'm not sure whether there are issues with it or better ways to do it.
WITH grouped_rates AS
(SELECT
from_date,
to_date,
SUM(grp_start) OVER (ORDER BY from_date, to_date) grp
FROM (SELECT
gite_id,
from_date,
to_date,
CASE WHEN (from_date - INTERVAL '1 DAY') = lag(to_date)
OVER (ORDER BY from_date, to_date)
THEN 0
ELSE 1
END grp_start
FROM rates
GROUP BY from_date, to_date) AS start_groups)
SELECT
min(from_date) from_date,
max(to_date) to_date
FROM grouped_rates
GROUP BY grp;
This is identifying contiguous overlapping groups in the data. One approach is to find where each group begins and then do a cumulative sum. The following query adds a flag indicating if a row starts a group:
select r.*,
(case when not exists (select 1
from rates r2
where r2.from < r.from and r2.to >= r.to or
(r2.from = r.from and r2.id < r.id)
)
then 1 else 0 end) as StartFlag
from rates r;
The or in the correlation condition is to handle the situation where intervals that define a group overlap on the start date for the interval.
You can then do a cumulative sum on this flag and aggregate by that sum:
with r as (
select r.*,
(case when not exists (select 1
from rates r2
where (r2.from < r.from and r2.to >= r.to) or
(r2.from = r.from and r2.id < r.id)
)
then 1 else 0 end) as StartFlag
from rates r
)
select min(r.from), max(r.to)
from (select r.*,
sum(r.StartFlag) over (order by r.from) as grp
from r
) r
group by grp;
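The "flag the start of each group, then take a cumulative sum" idea can also be sketched outside SQL. The Python below is only an illustration and makes one assumption worth stating: it uses the adjacency test from the question's own query (a row starts a new group unless its From is exactly one day after the previous row's To), not the NOT EXISTS condition above. The name merge_contiguous is made up.

```python
from datetime import date, timedelta


def merge_contiguous(rates):
    """rates: (from_date, to_date) pairs, both inclusive. Returns merged ranges."""
    groups = []
    previous_to = None
    for start, end in sorted(rates):
        # StartFlag = 1 when this row does not directly follow the previous one.
        starts_group = previous_to is None or start - timedelta(days=1) != previous_to
        if starts_group:
            groups.append([start, end])              # new grp value
        else:
            groups[-1][1] = max(groups[-1][1], end)  # extend the current grp
        previous_to = end
    return [tuple(group) for group in groups]


with_gap = [
    (date(2015, 4, 12), date(2016, 4, 15)),
    (date(2016, 4, 17), date(2016, 4, 18)),
    (date(2016, 4, 19), date(2016, 4, 30)),
    (date(2016, 5, 1), date(2016, 5, 21)),
]
for merged in merge_contiguous(with_gap):
    print(merged)
```

Appending a new list plays the role of incrementing the running sum, and extending the last list is the min/max aggregation per grp.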
CREATE TABLE prices( id INTEGER NOT NULL PRIMARY KEY
, price MONEY
, date_from DATE NOT NULL
, date_upto DATE NOT NULL
);
-- some data (upper limit is EXCLUSIVE)
INSERT INTO prices(id, price, date_from, date_upto) VALUES
( 1, 75.00, '2015-04-12', '2016-04-16' )
,( 2, 100.00, '2016-04-17', '2016-04-19' )
,( 3, 50.00, '2016-04-19', '2016-05-01' )
,( 4, 50.00, '2016-05-01', '2016-05-22' )
;
-- SELECT * FROM prices;
-- Recursive query to "connect the dots"
WITH RECURSIVE rrr AS (
SELECT date_from, date_upto
, 1 AS nperiod
FROM prices p0
WHERE NOT EXISTS (SELECT * FROM prices nx WHERE nx.date_upto = p0.date_from) -- no preceding segment
UNION ALL
SELECT r.date_from, p1.date_upto
, 1+r.nperiod AS nperiod
FROM prices p1
JOIN rrr r ON p1.date_from = r.date_upto
)
SELECT * FROM rrr r
WHERE NOT EXISTS (SELECT * FROM prices nx WHERE nx.date_from = r.date_upto) -- no following segment
;
Result:
date_from | date_upto | nperiod
------------+------------+---------
2015-04-12 | 2016-04-16 | 1
2016-04-17 | 2016-05-22 | 3
(2 rows)
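For comparison, the "connect the dots" recursion can be followed step by step in a few lines of Python. This is a sketch under the same convention as the prices table (the upper limit is exclusive, so a segment continues where date_upto equals the next date_from), and it additionally assumes that no two segments share a date_from. The name chain_segments is made up.

```python
from datetime import date


def chain_segments(segments):
    """segments: (date_from, date_upto) pairs, upper bound exclusive.

    Returns (date_from, date_upto, nperiod) for every maximal chain.
    """
    upto_by_from = {start: upto for start, upto in segments}
    all_uptos = {upto for _, upto in segments}
    chains = []
    for start, upto in sorted(segments):
        if start in all_uptos:
            continue              # has a preceding segment, so not a chain head
        nperiod = 1
        while upto in upto_by_from:   # follow the chain until nothing continues it
            upto = upto_by_from[upto]
            nperiod += 1
        chains.append((start, upto, nperiod))
    return chains


prices = [
    (date(2015, 4, 12), date(2016, 4, 16)),
    (date(2016, 4, 17), date(2016, 4, 19)),
    (date(2016, 4, 19), date(2016, 5, 1)),
    (date(2016, 5, 1), date(2016, 5, 22)),
]
for chain in chain_segments(prices):
    print(chain)
```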