How to split a single row into multiple rows in DB2?

This is what i have in table xyz
NAME AMOUNT BEGIN_DATE END_DATE
ABC 5.0 2013-05-11 2014-06-20
The following is what I want, using an IBM DB2 database:
NAME AMOUNT BEGIN_DATE END_DATE
ABC 5.0 2013-05-11 2013-12-31
ABC 5.0 2014-01-01 2014-06-30
Instead of just one row from the xyz table, I need to fetch two rows, as in the output above.
How do I split one row into two?

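The following only lists rows whose begin and end dates fall within the same year or in two consecutive calendar years. Rows that cross a year boundary are split at that boundary; rows spanning three or more calendar years are not returned at all.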
SELECT
NAME,
AMOUNT,
BEGIN_DATE,
DATE(YEAR(BEGIN_DATE)||'-12-31') AS END_DATE
FROM xyz
WHERE YEAR(END_DATE)-YEAR(BEGIN_DATE)=1
UNION
SELECT
NAME,
AMOUNT,
DATE(YEAR(END_DATE)||'-01-01') AS BEGIN_DATE,
END_DATE
FROM xyz
WHERE YEAR(END_DATE)-YEAR(BEGIN_DATE)=1
UNION
SELECT
NAME,
AMOUNT,
BEGIN_DATE,
END_DATE
FROM xyz
WHERE YEAR(END_DATE)-YEAR(BEGIN_DATE)=0
ORDER BY BEGIN_DATE

You can make two SQL statements, to select the first, using '2013-12-31' as a constant for the end-date, then to select a second time, using '2014-01-01' as a constant start date. Then use UNION ALL to put them together.
If you also have some records that start and end within 2013, and therefore do not need to be split, you can get those separately, and exclude them from the other two queries. Other variations in your data might require some extra conditions, but this example should get you going:
select NAME, AMOUNT, BEGIN_DATE, END_DATE
from xyz
where END_DATE <= '2013-12-31'
UNION ALL
select NAME, AMOUNT, BEGIN_DATE, '2013-12-31'
from xyz
where END_DATE >= '2014-01-01'
UNION ALL
select NAME, AMOUNT, '2014-01-01', END_DATE
from xyz
where END_DATE >= '2014-01-01'
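Both answers above assume a row crosses at most one year boundary. As a language-neutral illustration of the general rule (cut at every 31 December between the two dates), here is a minimal Python sketch. The function name `split_by_year` and the tuple layout are made up for this example and are not part of either answer.

```python
from datetime import date

def split_by_year(begin, end):
    """Split the inclusive range [begin, end] into one piece per calendar year."""
    pieces = []
    start = begin
    # Close a piece at 31 December for every year boundary that is crossed.
    while start.year < end.year:
        pieces.append((start, date(start.year, 12, 31)))
        start = date(start.year + 1, 1, 1)
    # The last (or only) piece ends at the original end date.
    pieces.append((start, end))
    return pieces

# Sample row from the question: (NAME, AMOUNT, BEGIN_DATE, END_DATE)
rows = [("ABC", 5.0, date(2013, 5, 11), date(2014, 6, 20))]

result = [(name, amount, b, e)
          for name, amount, begin, end in rows
          for b, e in split_by_year(begin, end)]

for row in result:
    print(row)
```

A row that starts and ends in the same year comes back unchanged, and a row spanning three years yields three pieces.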

Related

Get Start and End date from multiple rows of dates, excluding weekends

I'm trying figure out how to return Start Date and End date based on data like in the below table:
Name
Date From
Date To
A
2022-01-03
2022-01-03
A
2021-12-29
2021-12-31
A
2021-12-28
2021-12-28
A
2021-12-27
2021-12-27
A
2021-12-23
2021-12-24
A
2021-11-08
2021-11-09
The result I am after would show like this:
Name
Date From
Date To
A
2021-12-23
2022-01-03
A
2021-11-08
2021-11-09
The dates in first table will sometimes go over weekends with the Date From and Date To, but in cases where the row ends on a Friday and next row starts on following Monday it will need to be classified as the same "block", as presented in the second table. I was hoping to use DATEFIRST setting to cater for the weekends to avoid using a calendar table, as per How do I exclude Weekend days in a SQL Server query?, but if calendar table ends up being the easiest way out I'm happy to look into creating one.
In above example I only have 1 Name, but the table will have multiple names and it will need to be grouped by that.
The only examples of this I am seeing are using only 1 date column for records and I struggled changing their code around to cater for my example. The closest example I found doesn't work for me as it is based on datetime fields and the time differences - find start and stop date for contiguous dates in multiple rows
This is a Gaps & Island problem with the twist that you need to consider weekend continuity.
You can do:
select max(name) as name, min(date_from) as date_from, max(date_to) as date_to
from (
select *, sum(inc) over(order by date_to) as grp
from (
select *,
case when lag(ext_to) over(order by date_to) = date_from
then 0 else 1 end as inc
from (
select *,
case when (datepart(weekday, date_to) = 6)
then dateadd(day, 3, date_to)
else dateadd(day, 1, date_to) end as ext_to
from t
) x
) y
) z
group by grp
Result:
name date_from date_to
---- ---------- ----------
A 2021-11-08 2021-11-09
A 2021-12-23 2022-01-03
See running example at db<>fiddle #1.
Note: Your question doesn't mention it, but you probably want to segment per person. I didn't do it.
EDIT: Adding partition by name
Partitioning by name is quite easy actually. The following query does it:
select name, min(date_from) as date_from, max(date_to) as date_to
from (
select *, sum(inc) over(partition by name order by date_to) as grp
from (
select *,
case when lag(ext_to) over(partition by name order by date_to) = date_from
then 0 else 1 end as inc
from (
select *,
case when (datepart(weekday, date_to) = 6)
then dateadd(day, 3, date_to)
else dateadd(day, 1, date_to) end as ext_to
from t
) x
) y
) z
group by name, grp
order by name, grp
See running query at db<>fiddle #2.
with extended as (
select name,
date_from,
case when datepart(weekday, date_to) = 6
then dateadd(day, 2, date_to) else date_to end as date_to
from t
), adjacent as (
select *,
case when dateadd(day, 1,
lag(date_to) over (partition by name order by date_from)) = date_from
then 0 else 1 end as brk
from extended
), blocked as (
select *, sum(brk) over (partition by name order by date_from) as grp
from adjacent
)
select name, min(date_from), max(date_to) from blocked
group by name, grp;
I'm assuming that ranges do not overlap and that all input dates fall on weekdays. While hammering this out on my cellphone I originally made two mistakes. For some reason I got the to and from dates reversed in my head, and then I was thinking that Friday is 5 (as with @@datefirst) rather than 6. (Of course this could otherwise vary with the regional setting anyway.) One advantage of using table expressions is to modularize and bury certain details in lower levels of the logic. In this case it would be very easy to adjust dates should some of these assumptions prove to be wrong.
https://dbfiddle.uk/?rdbms=sqlserver_2019&fiddle=42e0c452d57d474232bcf991d6d3c43c
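To make the weekend rule easier to check independently of the DATEFIRST setting, here is a small Python sketch of the same gaps-and-islands idea. It assumes, as the answers do, that ranges do not overlap and that every range ends on a weekday. The helper names `next_working_day` and `merge_blocks` are invented for this illustration.

```python
from datetime import date, timedelta

def next_working_day(d):
    # date.weekday() is fixed: Monday=0 ... Friday=4, so no locale setting is involved.
    return d + timedelta(days=3 if d.weekday() == 4 else 1)

def merge_blocks(rows):
    """Merge (name, date_from, date_to) rows that follow each other on working days."""
    merged = []
    for name, d_from, d_to in sorted(rows):
        prev = merged[-1] if merged else None
        if prev and prev[0] == name and next_working_day(prev[2]) == d_from:
            # Same person and no working-day gap: extend the current block.
            merged[-1] = (name, prev[1], d_to)
        else:
            merged.append((name, d_from, d_to))
    return merged

rows = [
    ("A", date(2022, 1, 3), date(2022, 1, 3)),
    ("A", date(2021, 12, 29), date(2021, 12, 31)),
    ("A", date(2021, 12, 28), date(2021, 12, 28)),
    ("A", date(2021, 12, 27), date(2021, 12, 27)),
    ("A", date(2021, 12, 23), date(2021, 12, 24)),
    ("A", date(2021, 11, 8), date(2021, 11, 9)),
]

blocks = merge_blocks(rows)
for block in blocks:
    print(block)
```

For the sample data this gives two blocks for A: 2021-11-08 to 2021-11-09 and 2021-12-23 to 2022-01-03.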

Write a query to output the start and end dates listed by the number of days it took to complete in ascending order

Query for creating table
CREATE TABLE "HR"."PROJECT"
("TASK_ID" NUMBER NOT NULL ENABLE,
"START_DATE" DATE,
"END_DATE" DATE,
CONSTRAINT "CITI_PK" PRIMARY KEY ("TASK_ID")
)
Query for inserting data
Insert into HR.PROJECT (TASK_ID,START_DATE,END_DATE) values (1,to_date('01-11-21','DD-MM-RR'),to_date('02-11-21','DD-MM-RR'));
Insert into HR.PROJECT (TASK_ID,START_DATE,END_DATE) values (2,to_date('02-11-21','DD-MM-RR'),to_date('03-11-21','DD-MM-RR'));
Insert into HR.PROJECT (TASK_ID,START_DATE,END_DATE) values (3,to_date('03-11-21','DD-MM-RR'),to_date('04-11-21','DD-MM-RR'));
Insert into HR.PROJECT (TASK_ID,START_DATE,END_DATE) values (4,to_date('13-11-21','DD-MM-RR'),to_date('14-11-21','DD-MM-RR'));
Insert into HR.PROJECT (TASK_ID,START_DATE,END_DATE) values (5,to_date('14-11-21','DD-MM-RR'),to_date('15-11-21','DD-MM-RR'));
Insert into HR.PROJECT (TASK_ID,START_DATE,END_DATE) values (6,to_date('28-11-21','DD-MM-RR'),to_date('29-11-21','DD-MM-RR'));
Insert into HR.PROJECT (TASK_ID,START_DATE,END_DATE) values (7,to_date('30-11-21','DD-MM-RR'),to_date('01-12-21','DD-MM-RR'));
Here is the Table
Output required
Explanation
Project 1: Tasks 1, 2 and 3 are completed on consecutive days, so these are part of the project. Thus start date of project is 01-11-2021 and end date is 04-11-2021, so it took 3 days to complete the project.
Project 2: Tasks 4 and 5 are completed on consecutive days, so these are part of the project. Thus, the start date of project is 13-11-2021 and end date is 15-11-2021, so it took 2 days to complete the project.
Project 3: Only task 6 is part of the project. Thus, the start date of project is 28-11-2021 and end date is 29-11-2021, so it took 1 day to complete the project.
Project 4: Only task 7 is part of the project. Thus, the start date of project is 30-11-2021 and end date is 01-12-2021, so it took 1 day to complete the project.
Note : If there is more than one project that have the same number of completion days, then order by the start date of the project.
My approach was to use the lead and lag functions, but I am not even close to getting the answer. Is my approach wrong, or is there a better way to solve this?
This is what I have tried:
select * from
(select lag(end_date) over (order by start_date) as nx_dt1 , start_date from project )
where to_date(start_date,'DD-MM-YYYY') <> to_date(nx_dt1,'DD-MM-YYYY')
order by start_date asc;
From Oracle 12, you can use MATCH_RECOGNIZE:
SELECT start_date,
end_date,
end_date - start_date AS days_to_complete
FROM HR.project
MATCH_RECOGNIZE(
ORDER BY start_date
MEASURES
FIRST(start_date) AS start_date,
LAST(end_date) AS end_date
PATTERN (first_date successive_dates*)
DEFINE
successive_dates AS PREV(end_date) = start_date
)
ORDER BY start_date DESC
Which, for your sample data, outputs:
START_DATE
END_DATE
DAYS_TO_COMPLETE
2021-11-30 00:00:00
2021-12-01 00:00:00
1
2021-11-28 00:00:00
2021-11-29 00:00:00
1
2021-11-13 00:00:00
2021-11-15 00:00:00
2
2021-11-01 00:00:00
2021-11-04 00:00:00
3
db<>fiddle here
I gave this a try too. MATCH_RECOGNIZE in the previous answer is interesting and something I have never seen before. I came up with something very different.
with P (TASK_ID,START_DATE,END_DATE) as(
select 1,to_date('01-11-21','DD-MM-RR'),to_date('02-11-21','DD-MM-RR') from dual union all
select 2,to_date('02-11-21','DD-MM-RR'),to_date('03-11-21','DD-MM-RR') from dual union all
select 3,to_date('03-11-21','DD-MM-RR'),to_date('04-11-21','DD-MM-RR') from dual union all
select 4,to_date('13-11-21','DD-MM-RR'),to_date('14-11-21','DD-MM-RR') from dual union all
select 5,to_date('14-11-21','DD-MM-RR'),to_date('15-11-21','DD-MM-RR') from dual union all
select 6,to_date('28-11-21','DD-MM-RR'),to_date('29-11-21','DD-MM-RR') from dual union all
select 7,to_date('30-11-21','DD-MM-RR'),to_date('01-12-21','DD-MM-RR') from dual
)
select task_id, START_DATE,substr(SYS_CONNECT_BY_PATH(end_date, '/'),2,10) end_date,level days_took_to_complete
from p
where
CONNECT_BY_ISLEAF =1
start with end_date not in (select start_date from p)
connect by nocycle end_date = prior start_date
order by level,START_DATE desc
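For comparison, the grouping rule from the question (a task belongs to the current project when its start date equals the previous task's end date) and the requested ordering (days ascending, then start date) can be sketched in a few lines of Python. This is only an illustration of the logic, not a replacement for either SQL answer. The function name `group_projects` is hypothetical.

```python
from datetime import date

tasks = [
    (1, date(2021, 11, 1), date(2021, 11, 2)),
    (2, date(2021, 11, 2), date(2021, 11, 3)),
    (3, date(2021, 11, 3), date(2021, 11, 4)),
    (4, date(2021, 11, 13), date(2021, 11, 14)),
    (5, date(2021, 11, 14), date(2021, 11, 15)),
    (6, date(2021, 11, 28), date(2021, 11, 29)),
    (7, date(2021, 11, 30), date(2021, 12, 1)),
]

def group_projects(tasks):
    """Return (start_date, end_date, days) per project, ordered by days, then start date."""
    projects = []
    for _task_id, start, end in sorted(tasks, key=lambda t: t[1]):
        if projects and projects[-1][1] == start:
            # This task starts when the previous one ended: same project.
            projects[-1][1] = end
        else:
            projects.append([start, end])
    with_days = [(s, e, (e - s).days) for s, e in projects]
    return sorted(with_days, key=lambda p: (p[2], p[0]))

projects = group_projects(tasks)
for start, end, days in projects:
    print(start, end, days)
```

With the sample data, the two one-day projects come first (28 November before 30 November), followed by the two-day and three-day projects.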

How can I find dates ranges with no data from an effective dated table with SQL?

This is a little bit confusing, so I'll try to clarify.
let's say I have an employee table like this
employee
eff_Dt
end_effective_date
1
1900-01-01
2020-12-31
1
2021-01-01
2021-02-01
1
2021-02-02
9999-01-01
2
1900-01-01
9999-01-01
3
1900-01-01
2015-12-31
3
2016-01-01
2020-01-01
4
1900-01-01
2016-01-01
4
2018-01-01
9999-01-01
Employees 1 and 2 are fine. They have a full effective-dated history from 1900-01-01 to 9999-01-01. All of my employee records need that.
The SQL I need is to find records like 3 and 4. In the case of employee 3, we are missing the data from 2020-01-02 to 9999-01-01 and for employee 4 we are missing data from 2016-01-02 to 2017-12-31.
How can I develop a query that will return these records? I am on Oracle SQL. I would prefer an ANSI SQL solution if possible, but if the best solution uses Oracle-specific functions, then it is what it is. I do not have access to create indices or stored procedures. This can only be done via a query.
Thank you in advance.
I'd suggest counting days. 2958100 is the number of days from 1900-01-01 through 9999-01-01, counting both endpoints. Each row covers end_effective_date - eff_dt + 1 days, so an employee with a complete, non-overlapping history sums to exactly that number.
This query will return employees 3 and 4
select employee, sum(end_effective_date - eff_dt + 1)
from test
group by employee
having sum(end_effective_date - eff_dt + 1) < 2958100;
UPD: Same query without hard-coded values
select employee, sum(end_effective_date - eff_dt + 1)
from test
group by employee
having sum(end_effective_date - eff_dt + 1) < (date'9999-01-01' - date'1900-01-01' + 1);
If you want the employees missing dates:
select employee
from t
group by employee
having min(eff_Dt) <> date '1900-01-01' or
max(end_effective_date) <> date '9999-01-01';
If you want the specific missing time periods, use lead() for most of them . . . and then union all to get the first one:
select employee,
end_effective_date + interval '1' day as missing_eff_dt,
next_eff_dt - interval '1' day as missing_end_dt
from (select t.*, lead(eff_dt) over (partition by employee order by eff_dt) as next_eff_dt
from t
) t
where next_eff_dt > end_effective_date + interval '1' day
union all
select employee, date '1900-01-01',
min(eff_dt) - interval '1' day
from t
group by employee
having min(eff_dt) > date '1900-01-01'
If I understood correctly, you need to find records for employees who have gaps between end dates and start dates, and those who don't have 9999-01-01 as the max end_date. The query below will work for that purpose.
select EMPLOYEE, EFF_DT, END_EFFECTIVE_DATE
from (
select tt.*
, count(distinct grp)
over(partition by EMPLOYEE) cnt
, max(END_EFFECTIVE_DATE)
over(partition by EMPLOYEE order by EFF_DT desc) max_END_EFFECTIVE_DATE
from (
select t.*
, case
when EFF_DT != nvl(
lag(END_EFFECTIVE_DATE, 1)over(partition by EMPLOYEE order by EFF_DT)
, date '-4712-01-01'
)+ 1
then row_number()over (partition by EMPLOYEE order by EFF_DT)
else null
end
grp
from your_table
) tt
)ttt
where cnt > 1
or max_END_EFFECTIVE_DATE < date '9999-01-01'
;
The query below uses the Tabibitosan method to stitch together the adjacent time periods. The method itself uses the analytic sum() function and standard aggregation; it works almost unchanged in any SQL dialect that supports basic analytic functions.
The output shows only the employees with incomplete data. It shows uninterrupted periods of "effectivity"; if data is "complete", then there should be only one such interval for the employee, from 1 JAN 1900 to 1 JAN 9999. Those are excluded; the output shows the employees with gaps at the beginning, in the middle, and/or the end, and for those employees it shows the interval (or intervals) of "effectivity".
While you didn't request this, the query could be modified easily to show the "missing" periods for each employee (the periods when they were not effective).
with
t (employee, eff_dt, end_effective_date, grp) as (
select employee, eff_dt, end_effective_date,
end_effective_date - sum(end_effective_date + 1 - eff_dt)
over (partition by employee order by eff_dt)
from sample_data
)
select employee, min(eff_dt) as eff_dt, max(end_effective_date) as end_dt
from t
group by employee, grp
having min(eff_dt) != date '1900-01-01'
or max(end_effective_date) != date '9999-01-01'
order by employee, eff_dt
;
EMPLOYEE EFF_DT END_DT
---------- ---------- ----------
3 1900-01-01 2020-01-01
4 1900-01-01 2016-01-01
4 2018-01-01 9999-01-01
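The question also asks for the missing periods themselves. The following Python sketch shows one way to derive them, assuming that every history should run from 1900-01-01 to 9999-01-01 and that consecutive rows are expected to start one day after the previous row ends. The function name `missing_periods` and the constants `LOW` and `HIGH` are introduced here for illustration only.

```python
from datetime import date, timedelta

LOW, HIGH = date(1900, 1, 1), date(9999, 1, 1)  # assumed required coverage
ONE_DAY = timedelta(days=1)

def missing_periods(rows):
    """Return (employee, missing_from, missing_to) for every uncovered period."""
    by_employee = {}
    for employee, eff_dt, end_dt in rows:
        by_employee.setdefault(employee, []).append((eff_dt, end_dt))

    gaps = []
    for employee in sorted(by_employee):
        expected = LOW  # first day that is not yet covered
        for eff_dt, end_dt in sorted(by_employee[employee]):
            if eff_dt > expected:
                gaps.append((employee, expected, eff_dt - ONE_DAY))
            expected = max(expected, end_dt + ONE_DAY)
        if expected <= HIGH:
            # History stops before the required end date.
            gaps.append((employee, expected, HIGH))
    return gaps

rows = [
    (1, date(1900, 1, 1), date(2020, 12, 31)),
    (1, date(2021, 1, 1), date(2021, 2, 1)),
    (1, date(2021, 2, 2), date(9999, 1, 1)),
    (2, date(1900, 1, 1), date(9999, 1, 1)),
    (3, date(1900, 1, 1), date(2015, 12, 31)),
    (3, date(2016, 1, 1), date(2020, 1, 1)),
    (4, date(1900, 1, 1), date(2016, 1, 1)),
    (4, date(2018, 1, 1), date(9999, 1, 1)),
]

gaps = missing_periods(rows)
for gap in gaps:
    print(gap)
```

For the sample data this reports 2020-01-02 to 9999-01-01 for employee 3 and 2016-01-02 to 2017-12-31 for employee 4, which are the gaps described in the question.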

How to aggregate a measure on a year-month level based on start and end date in SQL?

I have a SQL query that pulls in three columns as below
employee_id start_date end_date hours
123 09-01-2019 09-02-2019 8
123 09-28-2019 10-01-2019 32
I want to rewrite the query so instead of going granular, i just want to know the sum(hrs) an employee has on a year month level like below:
employee_id Year_Month hours
123 201909 32
123 201910 8
The employee has 4 days in September so 4*8=32 and one day in october so 8 hours for the month of October. My issue is when there are start and end dates that cross between adjacent months. I'm not sure how to write a query to get my desired output and I'd really appreciate any help on this
It might be simpler to use a recursive query to generate series of days in each month, then aggregate by month and count:
with
data as (< your existing query here >),
cte (employee_id, dt, max_dt) as (
select employee_id, start_date, end_date from data
union all
select employee_id, dt + 1, max_dt from cte where dt + 1 < max_dt
)
select employee_id, to_char(dt, 'yyyymm') year_months, count(*) * 8 hours
from cte
group by employee_id, to_char(dt, 'yyyymm')
This assumes 8 hours per day, as explained in your question.
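The day-expansion idea can also be tried outside the database. The Python sketch below uses a different assumption from the fixed 8 hours per day: it spreads each row's hours evenly over every calendar day from start_date through end_date inclusive, then sums per month. That reading is a guess at the intended rule, chosen because it reproduces the totals shown in the question. The function name `hours_by_month` is hypothetical.

```python
from collections import defaultdict
from datetime import date, timedelta

def hours_by_month(rows):
    """Sum hours per (employee_id, 'YYYYMM'), spreading each row evenly over its days."""
    totals = defaultdict(float)
    for employee_id, start, end, hours in rows:
        n_days = (end - start).days + 1      # assumption: end date is inclusive
        per_day = hours / n_days             # assumption: hours are spread evenly
        for offset in range(n_days):
            day = start + timedelta(days=offset)
            totals[(employee_id, day.strftime("%Y%m"))] += per_day
    return dict(sorted(totals.items()))

rows = [
    (123, date(2019, 9, 1), date(2019, 9, 2), 8),
    (123, date(2019, 9, 28), date(2019, 10, 1), 32),
]

monthly = hours_by_month(rows)
for (employee_id, year_month), hours in monthly.items():
    print(employee_id, year_month, hours)
```

If your data really means a flat 8 hours for every day in the range, replace `per_day` with the constant 8.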

SELECT records FROM amount_history WHERE amount1 is <= max amount2 in last 3 years

I know basic SQL but am trying to come up with a query that is beyond me.
AMOUNT_HISTORY table is something like this
I can add an input parameter for effective date, say effectiveDate.
The highest Amount1 for RefNo 1 in the last 3 years is 12,000, which is <= the Amount2 at the effectiveDate - that's fine.
The highest Amount1 for RefNo 2 in the last 3 years is 22,000, which is > the Amount2 at the effectiveDate - I need to select the RefNo in that case.
Note the dates go back further, so will need the last 3 dates criteria. There will only be dates on the anniversary of the effectiveDate. Normally I would add the query I have developed thus far, but I didn't get further than a simple Select From Where so not much progress made really. Any help appreciated.
First, you need to select only the rows where the date is between "effective date minus two years" and "effective date". This is done best in the WHERE clause, with a between condition, using add_months() to subtract two years from the effective date. Note - I used dt as column name (date is reserved in Oracle and shouldn't be used as an identifier); and I pass the effective date as a bind variable.
Then, from the remaining rows, group by refno, and compare max(amount1) to the amount2 on the effective date. This is a comparison at the group level, not at the individual row level, so it goes in the HAVING clause, not in the WHERE clause.
Lastly, amount2 on the effective date is at the row level, not at the group level; so we need a little trick. I use a "conditional maximum" - the max() function applied to an expression that is amount2 when the date is the effective date, but null on the other dates. A case expression is perfect for that.
with
test_data ( refno, dt, amount1, amount2 ) as (
select 1, date '2017-01-01', 12000, 12000 from dual union all
select 1, date '2016-01-01', 11000, null from dual union all
select 1, date '2015-01-01', 10500, null from dual union all
select 2, date '2017-01-01', 20000, 10000 from dual union all
select 2, date '2016-01-01', 21000, null from dual union all
select 2, date '2015-01-01', 22000, null from dual
)
-- End of test data (not part of the solution). SQL query begins below this line.
select refno, max(case when dt = :effective_date then amount2 end) as amount2
from test_data
where dt between add_months(:effective_date, -24) and :effective_date
group by refno
having max(amount1) > max(case when dt = :effective_date then amount2 end)
;
REFNO AMOUNT2
----- -------
2 10000
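The three steps described above (filter to the window, take the maximum of amount1 per refno, and pick amount2 only from the row dated at the effective date) can be mirrored in plain Python to check the logic on the test data. This is only a sketch. The function name `refs_exceeding` and its parameters are invented for the example.

```python
from datetime import date

def refs_exceeding(rows, effective_date, years_back=2):
    """Return (refno, amount2) where max(amount1) in the window exceeds amount2."""
    # Simplification: replace() raises ValueError if effective_date is 29 February.
    window_start = effective_date.replace(year=effective_date.year - years_back)
    max_amount1 = {}
    amount2_at_date = {}
    for refno, dt, amount1, amount2 in rows:
        if not (window_start <= dt <= effective_date):
            continue  # same role as the WHERE ... BETWEEN filter
        max_amount1[refno] = max(max_amount1.get(refno, amount1), amount1)
        if dt == effective_date and amount2 is not None:
            # "Conditional maximum": amount2 only counts on the effective date.
            amount2_at_date[refno] = amount2
    return [(refno, amount2_at_date[refno])
            for refno in sorted(max_amount1)
            if refno in amount2_at_date and max_amount1[refno] > amount2_at_date[refno]]

test_data = [
    (1, date(2017, 1, 1), 12000, 12000),
    (1, date(2016, 1, 1), 11000, None),
    (1, date(2015, 1, 1), 10500, None),
    (2, date(2017, 1, 1), 20000, 10000),
    (2, date(2016, 1, 1), 21000, None),
    (2, date(2015, 1, 1), 22000, None),
]

selected = refs_exceeding(test_data, date(2017, 1, 1))
print(selected)
```

With an effective date of 2017-01-01, only refno 2 is selected, because its highest amount1 (22000) exceeds its amount2 (10000).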
I can't give you the direct answer, but these pointers should help you get to it.
Basically, you need to learn:
GROUP BY along with aggregate functions - http://www.w3schools.com/sql/sql_groupby.asp.
The GROUP BY clause groups the results by ref no; then apply the aggregate function MAX.
DATE_ADD - http://www.w3schools.com/sql/func_date_add.asp
sql query for getting data for last 3 months
Use DATE_ADD to add the criterion for the last 3 years. (DATE_ADD is a MySQL function; on Oracle, add_months() does the same job.)