This is a follow-up question to How to get a list of months between 2 given dates using a query?, really. (I suspect it's because I don't quite understand the logic behind CONNECT BY LEVEL clauses!)
What I have is a list of data like so:
ID | START_DATE | END_DATE
1 | 01-JAN-2018 | 20-JAN-2018
2 | 13-FEB-2018 | 20-MAR-2018
3 | 01-MAR-2018 | 07-MAR-2018
and what I want to try and get is a list with all the days between the start and end date for each ID.
So for example I want a list which gives
ID | DATE
1 | 01-JAN-2018
1 | 02-JAN-2018
1 | 03-JAN-2018
...
1 | 19-JAN-2018
1 | 20-JAN-2018
2 | 13-FEB-2018
2 | 14-FEB-2018
2 | 15-FEB-2018
...
etc.
What I've tried to do is adapt one of the answers from the above link, as follows:
select id
, trunc(start_date+((level-1)),'DD')
from (
select id
, start_date
, end_date
from blah
)
connect by level <= ((trunc(end_date,'DD')-trunc(start_date,'DD'))) + 1
which gives me what I want, but also a whole host of duplicate dates, as if it's doing a Cartesian join. Is there something simple I need to add to fix this?
I like recursive CTEs:
with cte (id, dte, end_dte) as (
select id, start_date, end_date
from blah
union all
select id, dte + 1, end_dte
from cte
where dte < end_dte
)
select *
from cte
order by id, dte;
This is ANSI-standard syntax (recursive subquery factoring, supported by Oracle from 11g Release 2) and works in several other databases.
The hierarchical query you were trying to do needs to include id = prior id in the connect-by clause, but as that causes loops with multiple source rows you also need to include a call to a non-deterministic function, such as dbms_random.value:
select id, start_date + level - 1 as day
from blah
connect by level <= end_date - start_date + 1
and prior id = id
and prior dbms_random.value is not null
With your sample data in a CTE, that gets 63 rows back:
with blah (ID, START_DATE, END_DATE) as (
select 1, date '2018-01-01', date '2018-01-20' from dual
union all select 2, date '2018-02-13', date '2018-03-20' from dual
union all select 3, date '2018-03-01', date '2018-03-07' from dual
)
select id, start_date + level - 1 as day
from blah
connect by level <= end_date - start_date + 1
and prior id = id
and prior dbms_random.value is not null;
ID DAY
---------- ----------
1 2018-01-01
1 2018-01-02
1 2018-01-03
...
1 2018-01-19
1 2018-01-20
2 2018-02-13
2 2018-02-14
...
3 2018-03-05
3 2018-03-06
3 2018-03-07
You don't need to trunc() the dates unless they have non-midnight times, which seems unlikely in this case, and even then it might not be necessary if only the end-date has a later time (like 23:59:59).
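If your dates could carry a time of day after all, a minimal sketch of the trunc() variant (the same query as above with nothing else changed; column names taken from the question):
select id, trunc(start_date) + level - 1 as day
from blah
connect by level <= trunc(end_date) - trunc(start_date) + 1
and prior id = id
and prior dbms_random.value is not null;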
A recursive CTE is more intuitive in many ways though, at least once you understand the basic idea behind them, so I'd probably use Gordon's approach too. There can be differences in performance, and in whether they work at all for large amounts of data (or generated rows); but for a lot of data it's worth comparing different approaches to find the most suitable and performant one anyway.
I have a table of EMPLOYEES that contains information about the DATE and WORKTIME per day. For example:
ID | DATE | WORKTIME |
----------------------------------------
1 | 1-Sep-2014 | 4 |
2 | 2-Sep-2014 | 6 |
1 | 3-Sep-2014 | 5.5 |
1 | 4-Sep-2014 | 7 |
2 | 4-Sep-2014 | 4 |
1 | 9-Sep-2014 | 8 |
and so on.
Question: How can I create a query that would allow me to calculate the amount of time worked per week (HOURS_PERWEEK)? I understand that I need a summation of WORKTIME together with grouping on both ID and week, but so far my attempts as well as googling didn't yield any results. Any ideas on this? Thank you in advance!
edit:
Got a solution of
select id, sum (worktime), trunc(date, 'IW') week
from employees
group by id, TRUNC(date, 'IW');
But I will need somehow to connect that output back to the table by DATE, updating a newly created column such as WEEKLY_TIME. Any hints on that?
You can find the start of the ISO week, which will always be a Monday, using TRUNC("DATE", 'IW').
So if, in the query, you GROUP BY the id and the start of the week TRUNC("DATE", 'IW') then you can SELECT the id and aggregate to find the SUM the WORKTIME column for each id.
Since this appears to be a homework question and you haven't attempted a query, I'll leave it at this to point you in the correct direction and you can complete the query.
Update
Now I need to create another column (lets call it WEEKLY_TIME) and populate it with values from the current output, so that Sep 1,3,4 (for ID=1) would all contain value 16.5, specifying that on that day (that is within the certain week) that person worked 16.5 in total. And for ID=2 it would then be a value of 10 for both Sep 2 and 4.
For this, if I understand correctly, you appear to not want to use aggregation functions and want to use the analytic version of the function:
select id,
"DATE",
trunc("DATE", 'IW') week,
worktime,
sum (worktime) OVER (PARTITION BY id, trunc("DATE", 'IW'))
AS weekly_time
from employees;
Which, for the sample data:
CREATE TABLE employees (ID, "DATE", WORKTIME) AS
SELECT 1, DATE '2014-09-01', 4 FROM DUAL UNION ALL
SELECT 2, DATE '2014-09-02', 6 FROM DUAL UNION ALL
SELECT 1, DATE '2014-09-03', 5.5 FROM DUAL UNION ALL
SELECT 1, DATE '2014-09-04', 7 FROM DUAL UNION ALL
SELECT 2, DATE '2014-09-04', 4 FROM DUAL UNION ALL
SELECT 1, DATE '2014-09-09', 8 FROM DUAL;
Outputs:
ID  DATE                 WEEK                 WORKTIME  WEEKLY_TIME
--  -------------------  -------------------  --------  -----------
 1  2014-09-01 00:00:00  2014-09-01 00:00:00       4         16.5
 1  2014-09-03 00:00:00  2014-09-01 00:00:00       5.5       16.5
 1  2014-09-04 00:00:00  2014-09-01 00:00:00       7         16.5
 1  2014-09-09 00:00:00  2014-09-08 00:00:00       8          8
 2  2014-09-04 00:00:00  2014-09-01 00:00:00       4         10
 2  2014-09-02 00:00:00  2014-09-01 00:00:00       6         10
db<>fiddle here
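If you do want to persist the value in a real WEEKLY_TIME column rather than compute it on the fly, a hedged sketch using MERGE might look like the following. It assumes the column has already been added with ALTER TABLE, and that (id, "DATE") uniquely identifies a row; adjust names to match your schema:
-- assumption: ALTER TABLE employees ADD (weekly_time NUMBER); has been run first
merge into employees e
using (
  select id,
         "DATE",
         sum(worktime) over (partition by id, trunc("DATE", 'IW')) as weekly_time
  from   employees
) s
on (e.id = s.id and e."DATE" = s."DATE")
when matched then update set e.weekly_time = s.weekly_time;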
edit: answer submitted without noticing the "Oracle" tag. Otherwise, the question is answered here: Oracle SQL - Sum and group data by week
Select employee_Id,
DATEPART(week, workday) as [Week],
sum (worktime) as [Weekly Hours]
from WORK
group by employee_id, DATEPART(week, workday)
https://dbfiddle.uk/?rdbms=sqlserver_2019&fiddle=238b229156a383fa3c466b6c3c2dee1e
I have this table, and I want to adjust END_DATE to one day prior to the next ST_DATE when there are overlapping dates for a group of ID.
TABLE HAVE
ID ST_DATE END_DATE
1 2020-01-01 2020-02-01
1 2020-05-10 2020-05-20
1 2020-05-18 2020-06-19
1 2020-11-11 2020-12-01
2 1999-03-09 1999-05-10
2 1999-04-09 2000-05-10
3 1999-04-09 2000-05-10
3 2000-06-09 2000-08-16
3 2000-08-17 2009-02-17
Below is what I'm looking for
TABLE WANT
ID ST_DATE END_DATE
1 2020-01-01 2020-02-01
1 2020-05-10 2020-05-17 =====changed to a day less than the next ST_DATE due to some sort of overlap
1 2020-05-18 2020-06-19
1 2020-11-11 2020-12-01
2 1999-03-09 1999-04-08 =====changed to a day less than the next ST_DATE due to some sort of overlap
2 1999-04-09 2000-05-10
3 1999-04-09 2000-05-10
3 2000-06-09 2000-08-16
3 2000-08-17 2009-02-17
Maybe you can use LEAD() for this. Initial idea:
select
id, st_date, end_date
, lead( st_date ) over ( partition by id order by st_date ) nextstart_
from overlap
;
-- result
ID ST_DATE END_DATE NEXTSTART
---------- --------- --------- ---------
1 01-JAN-20 01-FEB-20 10-MAY-20
1 10-MAY-20 20-MAY-20 18-MAY-20
1 18-MAY-20 19-JUN-20 11-NOV-20
1 11-NOV-20 01-DEC-20
2 09-MAR-99 10-MAY-99 09-APR-99
2 09-APR-99 10-MAY-00
3 09-APR-99 10-MAY-00 09-JUN-00
3 09-JUN-00 16-AUG-00 17-AUG-00
3 17-AUG-00 17-FEB-09
Once you have the next start date and the end_date side by side (as it were), you can use CASE ... to adjust the dates as you need them.
select ilv.id, ilv.st_date
, case
when ilv.end_date > ilv.nextstart_ then
to_char( ilv.nextstart_ - 1 ) || ' <- modified end date'
else
to_char( ilv.end_date )
end dt_modified
from (
select
id, st_date, end_date
, lead( st_date ) over ( partition by id order by st_date ) nextstart_
from overlap
) ilv
;
ID ST_DATE DT_MODIFIED
---------- --------- ---------------------------------------
1 01-JAN-20 01-FEB-20
1 10-MAY-20 17-MAY-20 <- modified end date
1 18-MAY-20 19-JUN-20
1 11-NOV-20 01-DEC-20
2 09-MAR-99 08-APR-99 <- modified end date
2 09-APR-99 10-MAY-00
3 09-APR-99 10-MAY-00
3 09-JUN-00 16-AUG-00
3 17-AUG-00 17-FEB-09
DBfiddle here.
If two "windows" for the same id have the same start date, then the problem doesn't make sense. So, let's assume that the problem makes sense - that is, the combination (id, st_date) is unique in the inputs.
Then, the problem can be formulated as follows: for each id, order rows by st_date ascending. Then, for each row, if its end_dt is less than the following st_date, return the row as is. Otherwise replace end_dt with the following st_date, minus 1. This last step can be achieved with the analytic lead() function.
A solution might look like this:
select id, st_date,
least(end_date, lead(st_date, 1, end_date + 1)
over (partition by id order by st_date) - 1) as end_date
from have
;
The bit about end_date + 1 in the lead function handles the last row for each id: for such rows there is no "next" row, so the default application of lead would return null. That default can be overridden by using the third parameter to the function.
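For reference, here is a self-contained version of that query, with the sample rows in a CTE (a sketch; have is the table name from the question):
with have (id, st_date, end_date) as (
  select 1, date '2020-01-01', date '2020-02-01' from dual union all
  select 1, date '2020-05-10', date '2020-05-20' from dual union all
  select 1, date '2020-05-18', date '2020-06-19' from dual union all
  select 1, date '2020-11-11', date '2020-12-01' from dual union all
  select 2, date '1999-03-09', date '1999-05-10' from dual union all
  select 2, date '1999-04-09', date '2000-05-10' from dual union all
  select 3, date '1999-04-09', date '2000-05-10' from dual union all
  select 3, date '2000-06-09', date '2000-08-16' from dual union all
  select 3, date '2000-08-17', date '2009-02-17' from dual
)
select id, st_date,
       least(end_date, lead(st_date, 1, end_date + 1)
             over (partition by id order by st_date) - 1) as end_date
from have
order by id, st_date;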
First of all, I am not very experienced with SQL. I have found similar questions here before, but so far I have not been able to develop a working solution for my specific problem.
I have a table that holds phone call records which has the following fields:
END: Holds the timestamp of when a call ended - Data Type: DATE
LINE: Holds the phone line that was used for a call - Data Type: NUMBER
CALLDURATION: Holds the duration of a call in seconds - Data Type: NUMBER
The table has entries like this:
END LINE CALLDURATION
---------------------- ------------------- -----------------------
25/01/2012 14:05:10 6 65
25/01/2012 14:08:51 7 1142
25/01/2012 14:20:36 5 860
I need to create a query that returns the number of concurrent phone calls based on the data from that table. The query should be capable of calculating that number in fixed intervals, such as every 5 minutes.
To make this more clear, here is an example of what the query should return (based on the example entries from the previous table):
TIMESTAMP CURRENTLYUSEDLINES
---------------------- -------------------
25/01/2012 14:05:00 2
25/01/2012 14:10:00 1
25/01/2012 14:15:00 1
How can I do this with an (Oracle) SQL query? There are currently almost 1 million records in the table, so the query must be as fast as possible; otherwise it would take forever to execute.
One solution would be this one:
WITH t AS
(SELECT TIMESTAMP '2012-01-25 14:00:00' + LEVEL * INTERVAL '5' MINUTE AS TS
FROM dual
CONNECT BY TIMESTAMP '2012-01-25 14:00:00' + LEVEL * INTERVAL '5' MINUTE <= TIMESTAMP '2012-01-25 14:15:00'),
calls AS
(SELECT TIMESTAMP '2012-01-25 14:05:10' AS END_TIME, 6 AS LINE, 65 AS duration FROM dual
UNION ALL SELECT TIMESTAMP '2012-01-25 14:08:51', 7, 1142 FROM dual
UNION ALL SELECT TIMESTAMP '2012-01-25 14:20:36', 5, 860 FROM dual)
SELECT TS, count(distinct line)
FROM t
LEFT OUTER JOIN calls ON ts BETWEEN END_TIME - duration * INTERVAL '1' SECOND AND END_TIME
GROUP BY ts
HAVING count(distinct line) > 0
ORDER BY ts;
TS COUNT(DISTINCTLINE)
-------------------- -------------------
25.01.2012 14:05:00 2
25.01.2012 14:10:00 1
25.01.2012 14:15:00 1
3 rows selected.
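A sketch of how the same idea could be pointed at the real table, deriving the five-minute grid from the data itself rather than hard-coding the boundaries. The names here are assumptions (a table calls with columns end_time, line and duration, since END is a reserved word and would need quoting); adjust them to your schema:
WITH bounds AS
  (SELECT TRUNC(MIN(end_time - duration * INTERVAL '1' SECOND), 'HH') AS min_ts,
          MAX(end_time) AS max_ts
   FROM calls),
t AS
  -- generate one row per five-minute slot between the earliest start and latest end
  (SELECT min_ts + (LEVEL - 1) * INTERVAL '5' MINUTE AS ts
   FROM bounds
   CONNECT BY min_ts + (LEVEL - 1) * INTERVAL '5' MINUTE <= max_ts)
SELECT ts, COUNT(DISTINCT line) AS currentlyusedlines
FROM t
LEFT OUTER JOIN calls
  ON ts BETWEEN end_time - duration * INTERVAL '1' SECOND AND end_time
GROUP BY ts
HAVING COUNT(DISTINCT line) > 0
ORDER BY ts;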
I am trying to join two tables whereby one person can have more than one card and some of them might be canceled.
For example :
**Customer Card**
Cust ID | Cust Acct | Card No | Join Date | Cancel Date
1 | 10001 | E100001 | 20150501 | 20160101
1 | 10001 | E100002 | 20151001 | 0
2 | 10002 | E100003 | 20150101 | 20160601
3 | 10003 | E100004 | 20150201 | 0
4 | 10003 | E100005 | 20160101 | 0
**Customer Account**
Cust ID | Cust Acct
1 | 10001
2 | 10002
3 | 10003
Basically, I want to show all accounts with the first-joined card, even if that card is cancelled. If the first card is cancelled, then it needs to show the second card's joining date.
The expected result :
Cust ID | Cust Acct | Card No | Join Date | Cancel Date
1 | 10001 | E100002 | 20151001 | 0
2 | 10002 | E100003 | 20150101 | 20160601
3 | 10003 | E100004 | 20150201 | 0
Thanks for the assistance! Any ideas?
One method uses row_number():
select ca.*, cc.CardNo, cc.JoinDate, cc.CancelDate
from customeraccount ca join
(select cc.*,
row_number() over (partition by custid order by joindate asc) as seqnum
from customercard cc
) cc
on ca.custid = cc.custid and seqnum = 1;
This can be done in one pass over the data (without requiring a subquery and outer query), using GROUP BY and KEEP(DENSE_RANK FIRST).
First some housekeeping.
Table and column names cannot have spaces in them (unless you use double-quoted names, which is an unnecessary and very poor practice in most cases).
Your date columns seem to be in number format, which is a very poor practice. How can you prevent an input like 20151490 (the 90th day of the 14th month) from being stored in the database? All dates SHOULD be stored as dates. However, storing them in exactly that format does allow correct order comparison (although that is just by accident and shouldn't be relied on). Since that is not the main point of your question, though, I used the data as is.
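(As an aside, purely as an illustration and not part of the solution below: if you are stuck with such NUMBER columns, they can at least be converted to real dates on the fly, e.g.:)
select card_no,
       to_date(to_char(join_date), 'YYYYMMDD') as join_dt,
       -- 0 is used here as a "not cancelled" marker, so it can't be converted
       case when cancel_date <> 0
            then to_date(to_char(cancel_date), 'YYYYMMDD')
       end as cancel_dt
from customer_card;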
Why do you need a join? The first table should not include the cust_id - including it violates the second normal form of database design. If you do, in fact, have that column in the first table, I don't see the need for the second table, or for a join. (If the cust_id is not in the first table, then you do need a join, but I will leave that aside since the question is really about picking the right rows, not about joining - despite the title).
In the first table you have two cust_id, 3 and 4, associated with the same account (and contradicting the second table, too). I assume that's a typo and in fact 4 should be 3 - but this illustrates EXACTLY why second normal form is so important. You SHOULD NOT have cust_id in the first table.
The key to your reformulated requirement is conditional ordering. If for a given account all cards on file are canceled, or if none is canceled, then pick the one with the earliest join_date. However, if an account has a mix of both kinds of cards, then pick the earliest card that is not canceled. In SQL, that can be achieved with a composite ordering (by two expressions, of which the SECOND is join_date). The first criterion is the "conditional" part. In the solution below, I use the expression CASE when cancel_date = 0 then 0 end. That is, a card that has NOT been canceled will have a flag of 0, and one that is canceled will have the flag NULL (the default if there is no ELSE part in the CASE expression). By default, NULL comes last in an ordering (which is ascending by default). So, if all cards are still valid they will all have the flag 0 and the ordering by this flag won't matter. If all are canceled the flag is NULL for all, so ordering by this flag won't matter. But if some are valid and some canceled, then the valid ones will come first, so the earliest date will be picked only from valid cards.
Note that then 0 (the flag value of 0) is irrelevant; I could make it 1, or even a string (then 'a') and the "conditional ordering" would work just the same, and for the same reason. I attach something that is not NULL to valid cards and NULL to canceled cards; that's all that matters.
This is the change that Gordon would need to make his solution work, too. But, in cases like this, I prefer the KEEP(DENSE_RANK FIRST) approach, especially if performance is important (as might be the case when you have a very large number of customers, accounts, and credit cards on file).
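For illustration, that change applied to the row_number() approach might look like this (a sketch, using the column names from the CTE below):
select cust_id, cust_acct, card_no, join_date, cancel_date
from (
  select cc.*,
         row_number() over (partition by cust_acct
                            order by case when cancel_date = 0 then 0 end,
                                     join_date) as seqnum
  from customer_card cc
)
where seqnum = 1;
The KEEP (DENSE_RANK FIRST) version, with the sample data in a CTE: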
with
customer_card ( cust_id , cust_acct , card_no , join_date , cancel_date ) as (
select 1, 10001, 'E100001', 20150501, 20160101 from dual union all
select 1, 10001, 'E100002', 20151001, 0 from dual union all
select 2, 10002, 'E100003', 20150101, 20160601 from dual union all
select 3, 10003, 'E100004', 20150201, 0 from dual union all
select 3, 10003, 'E100005', 20160101, 0 from dual
)
-- end of test data; actual solution begins HERE
select cust_id, cust_acct,
min(card_no) keep (dense_rank first
order by case when cancel_date = 0 then 0 end, join_date) as card_no,
min(join_date) keep (dense_rank first
order by case when cancel_date = 0 then 0 end, join_date) as join_date,
min(cancel_date) keep (dense_rank first
order by case when cancel_date = 0 then 0 end, join_date) as cancel_date
from customer_card
group by cust_id, cust_acct
order by cust_id, cust_acct -- ORDER BY is optional
;
Output:
CUST_ID CUST_ACCT CARD_NO JOIN_DATE CANCEL_DATE
--------- ---------- ------- --------- -----------
1 10001 E100002 20151001 0
2 10002 E100003 20150101 20160601
3 10003 E100004 20150201 0
Try this:
Edit: I'm stealing mathguy's customer_card table creation. I'd imagine his way works too, so here's another solution:
with
customer_card ( cust_id , cust_acct , card_no , join_date , cancel_date ) as (
select 1, 10001, 'E100001', 20150501, 20160101 from dual union all
select 1, 10001, 'E100002', 20151001, 0 from dual union all
select 2, 10002, 'E100003', 20150101, 20160601 from dual union all
select 3, 10003, 'E100004', 20150201, 0 from dual union all
select 3, 10003, 'E100005', 20160101, 0 from dual
)
, allresults as(
select
cust_id,
cust_acct,
card_no,
join_date,
cancel_date,
rank() over(partition by cust_acct order by decode(cancel_date, 0, 1, 2), join_date, rownum) DATE_RANK
from customer_card
)
select
*
from allresults
where DATE_RANK = 1
I asked this question in regard to SQL Server, but what's the answer for an Oracle environment (10g)?
If I have a table containing schedule information that implies particular dates, is there a SQL statement that can be written to convert that information into actual rows, using something like MSSQL's Common Table Expressions, perhaps?
Consider a payment schedule table with these columns:
StartDate - the date the schedule begins (1st payment is due on this date)
Term - the length in months of the schedule
Frequency - the number of months between recurrences
PaymentAmt - the payment amount :-)
SchedID StartDate Term Frequency PaymentAmt
-------------------------------------------------
1 05-Jan-2003 48 12 1000.00
2 20-Dec-2008 42 6 25.00
Is there a single SQL statement to allow me to go from the above to the following?
SchedID  PaymentNum  DueDate      RunningExpectedTotal
-------  ----------  -----------  --------------------
1        1           05-Jan-2003  1000.00
1        2           05-Jan-2004  2000.00
1        3           05-Jan-2005  3000.00
1        4           05-Jan-2006  4000.00
2        1           20-Dec-2008  25.00
2        2           20-Jun-2009  50.00
2        3           20-Dec-2009  75.00
2        4           20-Jun-2010  100.00
2        5           20-Dec-2010  125.00
2        6           20-Jun-2011  150.00
2        7           20-Dec-2011  175.00
Your thoughts are appreciated.
Oracle actually has syntax for hierarchical queries using the CONNECT BY clause. SQL Server's use of the WITH clause looks like a hack in comparison:
SELECT t.SchedId,
CASE LEVEL
WHEN 1 THEN
t.StartDate
ELSE
ADD_MONTHS(t.StartDate, t.frequency)
END AS "DueDate",
CASE LEVEL
WHEN 1 THEN
t.PaymentAmt
ELSE
SUM(t.paymentAmt)
END AS "RunningExpectedTotal"
FROM PaymentScheduleTable t
WHERE t.PaymentNum <= t.Term / t.Frequency
CONNECT BY PRIOR t.startdate = t.startdate
GROUP BY t.schedid, t.startdate, t.frequency, t.paymentamt
ORDER BY t.SchedId, t.PaymentNum
I'm not 100% on that - I'm more confident about using:
SELECT t.SchedId,
t.StartDate AS "DueDate",
t.PaymentAmt AS "RunningExpectedTotal"
FROM PaymentScheduleTable t
WHERE t.PaymentNum <= t.Term / t.Frequency
CONNECT BY PRIOR t.startdate = t.startdate
ORDER BY t.SchedId, t.PaymentNum
...but it doesn't include the logic to handle the 2nd+ entries in the chain, where you need to add months and sum the amounts. The summing could be done with GROUP BY CUBE or ROLLUP, depending on the detail needed.
I don't understand why there are 5 payment days for schedid = 1 and 7 for schedid = 2.
48 / 12 = 4 and 42 / 6 = 7, so I expected 4 payment days for schedid = 1.
Anyway, I use the MODEL clause:
create table PaymentScheduleTable
( schedid number(10)
, startdate date
, term number(3)
, frequency number(3)
, paymentamt number(5)
);
insert into PaymentScheduleTable
values (1,to_date('05-01-2003','dd-mm-yyyy')
, 48
, 12
, 1000);
insert into PaymentScheduleTable
values (2,to_date('20-12-2008','dd-mm-yyyy')
, 42
, 6
, 25);
commit;
And now the select with model clause:
select schedid, to_char(duedate,'dd-mm-yyyy') duedate, expected, i paymentnum
from paymentscheduletable
model
partition by (schedid)
dimension by (1 i)
measures (
startdate duedate
, paymentamt expected
, term
, frequency)
rules
( expected[for i from 1 to term[1]/frequency[1] increment 1]
= nvl(expected[cv()-1],0) + expected[1]
, duedate[for i from 1 to term[1]/frequency[1] increment 1]
= add_months(duedate[1], (cv(i)-1) * frequency[1])
)
order by schedid,i;
This outputs:
SCHEDID DUEDATE EXPECTED PAYMENTNUM
---------- ---------- ---------- ----------
1 05-01-2003 1000 1
1 05-01-2004 2000 2
1 05-01-2005 3000 3
1 05-01-2006 4000 4
2 20-12-2008 25 1
2 20-06-2009 50 2
2 20-12-2009 75 3
2 20-06-2010 100 4
2 20-12-2010 125 5
2 20-06-2011 150 6
2 20-12-2011 175 7
11 rows selected.
I didn't set out to answer my own question, but I'm doing work with Oracle now and I have had to learn some new Oracle-flavored things.
Anyway, the CONNECT BY statement is really nice - yes, much nicer than MSSQL's hierarchical query approach - and using that construct, I was able to produce a very clean query that does what I was looking for:
SELECT DISTINCT
t.SchedID
,level as PaymentNum
,add_months(T.StartDate,(level - 1) * t.Frequency) as DueDate
,(level * t.PaymentAmt) as RunningTotal
FROM SchedTest t
CONNECT BY level <= (t.Term / t.Frequency)
ORDER BY t.SchedID, level
My only remaining issue is that I had to use DISTINCT because I couldn't figure out how to select my rows from DUAL (the affable one-row Oracle table) instead of from my table of schedule data, which has at least 2 rows. If I could do the above with FROM DUAL, then my DISTINCT indicator wouldn't be necessary. Any thoughts?
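One option (a sketch, borrowing the PRIOR trick from the connect-by answer near the top of this page): keep generating rows from the table itself, but stop the cross-row multiplication with PRIOR conditions instead of DISTINCT:
SELECT t.SchedID
      ,level as PaymentNum
      ,add_months(t.StartDate,(level - 1) * t.Frequency) as DueDate
      ,(level * t.PaymentAmt) as RunningTotal
FROM SchedTest t
CONNECT BY level <= (t.Term / t.Frequency)
       AND PRIOR t.SchedID = t.SchedID
       AND PRIOR dbms_random.value IS NOT NULL
ORDER BY t.SchedID, level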
Other than that, I think this is pretty nice. Et tu?