Oracle SQL, Recursion, Spreadsheet calculation - sql

I want to transfer a calculation from an Excel spreadsheet to an Oracle SQL Query.
There are three predefined columns: ID, IncommingDate and ProcessingTime (in seconds).
Now I want to calculate two additional columns, ProcessingStart and ProcessingEnd.
The expected result and the spreadsheet formulas were shown as screenshots in the original question.
From the formulas, the ProcessingStart of each entry should be the maximum of its IncommingDate and the ProcessingEnd of the previous entry.
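The rule is a running calculation over the rows in order, much like dragging the formula down in Excel. The following is a minimal Python sketch of that logic, not a SQL solution. It is only a cross-check and assumes the rows are already sorted in processing order, with the processing time given in seconds.

```python
from datetime import datetime, timedelta

def schedule(rows):
    """rows: (id, incoming, processing_seconds), already in processing order."""
    result = []
    prev_end = None
    for row_id, incoming, seconds in rows:
        # Start at the later of the arrival time and the previous row's end.
        start = incoming if prev_end is None else max(incoming, prev_end)
        end = start + timedelta(seconds=seconds)
        result.append((row_id, incoming, seconds, start, end))
        prev_end = end
    return result

rows = [
    (1, datetime(2018, 1, 1, 0, 0, 0), 60),
    (2, datetime(2018, 1, 1, 0, 5, 0), 60),
    (3, datetime(2018, 1, 1, 0, 5, 30), 60),
    (4, datetime(2018, 1, 1, 0, 10, 0), 60),
]
for r in schedule(rows):
    print(r[0], r[3].time(), r[4].time())
```

Row 3 arrives at 00:05:30 but has to wait until row 2 finishes at 00:06:00. That waiting is the case the "maximum of" rule handles.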
How can I achieve this using SQL?
I have prepared an example query here:
WITH example AS
(
SELECT
1 AS id,
to_date ('01.01.2018 00:00:00','dd.MM.yyyy HH24:mi:ss') AS IncommingDate,
60 AS ProcessingTime -- processing time in seconds
FROM
dual
UNION ALL
SELECT
2,
to_date ('01.01.2018 00:05:00','dd.MM.yyyy HH24:mi:ss'),
60
FROM
dual
UNION ALL
SELECT
3,
to_date ('01.01.2018 00:05:30','dd.MM.yyyy HH24:mi:ss'),
60
FROM
dual
UNION ALL
SELECT
4,
to_date ('01.01.2018 00:10:00','dd.MM.yyyy HH24:mi:ss'),
60
FROM
dual
)
SELECT
*
FROM
example
Does anybody know a way to do this?

It looks like you need to use recursive subquery factoring:
with rcte (id, IncommingDate, ProcessingTime, ProcessingStart, ProcessingEnd) as (
select id,
IncommingDate,
ProcessingTime,
IncommingDate,
IncommingDate + (ProcessingTime/86400)
from example
where id = 1
union all
select e.id,
e.IncommingDate,
e.ProcessingTime,
greatest(e.IncommingDate, r.ProcessingEnd),
greatest(e.IncommingDate, r.ProcessingEnd) + (e.ProcessingTime/86400)
from rcte r
-- assumes IDs are the ordering criteria and are contiguous
join example e on e.id = r.id + 1
)
select * from rcte;
ID INCOMMINGDATE PROCESSINGTIME PROCESSINGSTART PROCESSINGEND
---------- ------------------- -------------- ------------------- -------------------
1 2018-01-01 00:00:00 60 2018-01-01 00:00:00 2018-01-01 00:01:00
2 2018-01-01 00:05:00 60 2018-01-01 00:05:00 2018-01-01 00:06:00
3 2018-01-01 00:05:30 60 2018-01-01 00:06:00 2018-01-01 00:07:00
4 2018-01-01 00:10:00 60 2018-01-01 00:10:00 2018-01-01 00:11:00
The anchor member is ID 1. For that first row, the start is simply the incoming time and the end is the start plus the processing time. ProcessingTime/86400 converts seconds to the fraction of a day that Oracle DATE arithmetic expects.
The recursive member then finds the next original row. It uses greatest() to decide whether to base its calculations on its own incoming time or on the previous row's end time.
This assumes that the ordering is based on the IDs and that they are contiguous. If that isn't how you actually order the rows, it's only a bit more complicated.
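One common way to handle the case where IDs are not contiguous, or ordering is by arrival time, is to number the rows with ROW_NUMBER() first. The recursive member then joins on that number instead of on id + 1.

Below is a runnable sketch of that idea using Python's built-in sqlite3 module. It is written in SQLite dialect, not Oracle: the two-argument max() stands in for GREATEST, and datetime(x, '+N seconds') replaces the /86400 arithmetic. It needs SQLite 3.25+ for window functions. The IDs 10, 20, 35 and 40 are made up to show the gap handling.

```python
import sqlite3

SQL = """
WITH RECURSIVE
numbered AS (
    -- number the rows in processing order; gaps in id no longer matter
    SELECT id, incoming, proc_sec,
           ROW_NUMBER() OVER (ORDER BY incoming, id) AS rn
    FROM example
),
rcte (rn, id, p_start, p_end) AS (
    -- anchor: the first row starts when it arrives
    SELECT rn, id, incoming,
           datetime(incoming, '+' || proc_sec || ' seconds')
    FROM numbered
    WHERE rn = 1
    UNION ALL
    -- recursive step: start at the later of arrival and previous end
    SELECT n.rn, n.id, max(n.incoming, r.p_end),
           datetime(max(n.incoming, r.p_end), '+' || n.proc_sec || ' seconds')
    FROM rcte r
    JOIN numbered n ON n.rn = r.rn + 1
)
SELECT id, p_start, p_end FROM rcte ORDER BY rn
"""

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE example (id INTEGER, incoming TEXT, proc_sec INTEGER)")
conn.executemany(
    "INSERT INTO example VALUES (?, ?, ?)",
    [
        (10, "2018-01-01 00:00:00", 60),  # hypothetical, non-contiguous IDs
        (20, "2018-01-01 00:05:00", 60),
        (35, "2018-01-01 00:05:30", 60),
        (40, "2018-01-01 00:10:00", 60),
    ],
)
result = conn.execute(SQL).fetchall()
for row in result:
    print(row)
```

In Oracle, the equivalent change is a `row_number() over (order by IncommingDate, id)` column in a preceding subquery, used in the join condition in place of `e.id = r.id + 1`.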

Related

SQL - Fuzzy JOIN on Timestamp columns within X amount of time

Say I have two tables:
a:
timestamp | precipitation
2015-08-03 21:00:00 UTC | 3
2015-08-03 22:00:00 UTC | 3
2015-08-04 3:00:00 UTC | 4
2016-02-04 18:00:00 UTC | 4
and b:
timestamp | loc
2015-08-03 21:23:00 UTC | San Francisco
2016-02-04 16:04:00 UTC | New York
I want a fuzzy join in which every row in b is matched to a row in a, using these criteria:
The times must be within 60 minutes of each other. If no match exists within 60 minutes, leave that row out of the output.
If a row in b could join to more than one row in a, pick the one that is closest in time.
Example output:
timestamp | loc | precipitation
2015-08-03 21:00:00 UTC | San Francisco | 3
What you need is an ASOF join. I don't think there is an easy way to do this with BigQuery. Other databases, such as Kinetica (and, I think, ClickHouse), support ASOF functions that can be used to perform "fuzzy" joins.
The syntax for Kinetica would be something like the following.
SELECT *
FROM a
LEFT JOIN b
ON ASOF(a.timestamp, b.timestamp, INTERVAL '0' MINUTES, INTERVAL '60' MINUTES, MIN)
The ASOF function above sets up an interval of 60 minutes within which to look for matches on the right side table. When there are multiple matches, it selects the one that is closest (MAX would pick the one that is farthest away).
Based on my understanding of the data you provided, the query below should work for your use case.
create temporary table a as(
select TIMESTAMP('2015-08-03 21:00:00 UTC') as ts, 3 as precipitation union all
select TIMESTAMP('2015-08-03 22:00:00 UTC'), 3 union all
select TIMESTAMP('2015-08-04 3:00:00 UTC'), 4 union all
select TIMESTAMP('2016-02-04 18:00:00 UTC'), 4
);
create temporary table b as(
select TIMESTAMP('2015-08-03 21:23:00 UTC') as ts, 'San Francisco' as loc union all
select TIMESTAMP('2016-02-04 16:04:00 UTC') as ts, 'New York' as loc
);
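select b_ts, a_ts, loc, precipitation, diff_time_sec
from (
select b.ts as b_ts, a.ts as a_ts, b.loc, a.precipitation,
ABS(TIMESTAMP_DIFF(b.ts, a.ts, SECOND)) as diff_time_sec
from b
-- TIMESTAMP_SUB/TIMESTAMP_ADD, since DATE_SUB/DATE_ADD do not take MINUTE intervals
inner join a on b.ts between timestamp_sub(a.ts, interval 60 MINUTE) and timestamp_add(a.ts, interval 60 MINUTE)
)
where true -- some BigQuery versions require QUALIFY to accompany WHERE, GROUP BY or HAVING
qualify RANK() OVER (partition by b_ts ORDER BY diff_time_sec) = 1;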
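To check the expected output independently of any database, here is a naive Python sketch of the same matching rule. For each row of b, it takes the row of a with the smallest absolute time difference, provided that difference is within 60 minutes. It compares every pair (O(n·m)), so it is a reference for small data only. It assumes all timestamps are UTC.

```python
from datetime import datetime, timedelta

a = [  # (timestamp, precipitation), UTC
    (datetime(2015, 8, 3, 21, 0), 3),
    (datetime(2015, 8, 3, 22, 0), 3),
    (datetime(2015, 8, 4, 3, 0), 4),
    (datetime(2016, 2, 4, 18, 0), 4),
]
b = [  # (timestamp, loc), UTC
    (datetime(2015, 8, 3, 21, 23), "San Francisco"),
    (datetime(2016, 2, 4, 16, 4), "New York"),
]

def fuzzy_join(b_rows, a_rows, window=timedelta(minutes=60)):
    """Match each row of b to the closest row of a within +/- window.
    Rows of b with no match are dropped, like an inner join."""
    out = []
    for b_ts, loc in b_rows:
        candidates = [(abs(b_ts - a_ts), a_ts, precip)
                      for a_ts, precip in a_rows
                      if abs(b_ts - a_ts) <= window]
        if candidates:
            # smallest difference wins; on an exact tie, the earlier row of a
            _, a_ts, precip = min(candidates)
            out.append((a_ts, loc, precip))
    return out

print(fuzzy_join(b, a))
```

San Francisco (21:23) is 23 minutes from the 21:00 row and 37 minutes from the 22:00 row, so 21:00 wins. New York (16:04) is almost two hours from the nearest row, so it drops out. Unlike RANK() in the SQL above, this keeps exactly one match even when two rows of a are equally close.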

How to calculate total worktime per week [SQL]

I have a table, EMPLOYEES, that contains the DATE and the WORKTIME for that day. For example:
ID | DATE | WORKTIME |
----------------------------------------
1 | 1-Sep-2014 | 4 |
2 | 2-Sep-2014 | 6 |
1 | 3-Sep-2014 | 5.5 |
1 | 4-Sep-2014 | 7 |
2 | 4-Sep-2014 | 4 |
1 | 9-Sep-2014 | 8 |
and so on.
Question: How can I create a query that calculates the amount of time worked per week (HOURS_PERWEEK)? I understand that I need to sum WORKTIME, grouping by both ID and week, but so far neither my own attempts nor googling have produced any results. Any ideas on this? Thank you in advance!
edit:
Got a solution of
select id, sum (worktime), trunc(date, 'IW') week
from employees
group by id, TRUNC(date, 'IW');
But I will somehow need to connect that output back to the individual dated rows by populating a newly created column such as WEEKLY_TIME. Any hints on that?
You can find the start of the ISO week, which will always be a Monday, using TRUNC("DATE", 'IW').
So if, in the query, you GROUP BY the id and the start of the week, TRUNC("DATE", 'IW'), then you can SELECT the id and use SUM to aggregate the WORKTIME column for each id and week.
Since this appears to be a homework question and you haven't attempted a query, I'll leave it at this to point you in the correct direction and you can complete the query.
Update
Now I need to create another column (let's call it WEEKLY_TIME) and populate it with values from the current output. Sep 1, 3 and 4 (for ID=1) would then all contain the value 16.5, meaning that in the week containing that day the person worked 16.5 hours in total. For ID=2, the value would be 10 for both Sep 2 and Sep 4.
If I understand correctly, you don't want an aggregate query here but the analytic version of SUM:
select id,
"DATE",
trunc("DATE", 'IW') week,
worktime,
sum (worktime) OVER (PARTITION BY id, trunc("DATE", 'IW'))
AS weekly_time
from employees;
Which, for the sample data:
CREATE TABLE employees (ID, "DATE", WORKTIME) AS
SELECT 1, DATE '2014-09-01', 4 FROM DUAL UNION ALL
SELECT 2, DATE '2014-09-02', 6 FROM DUAL UNION ALL
SELECT 1, DATE '2014-09-03', 5.5 FROM DUAL UNION ALL
SELECT 1, DATE '2014-09-04', 7 FROM DUAL UNION ALL
SELECT 2, DATE '2014-09-04', 4 FROM DUAL UNION ALL
SELECT 1, DATE '2014-09-09', 8 FROM DUAL;
Outputs:
ID | DATE | WEEK | WORKTIME | WEEKLY_TIME
1 | 2014-09-01 00:00:00 | 2014-09-01 00:00:00 | 4 | 16.5
1 | 2014-09-03 00:00:00 | 2014-09-01 00:00:00 | 5.5 | 16.5
1 | 2014-09-04 00:00:00 | 2014-09-01 00:00:00 | 7 | 16.5
1 | 2014-09-09 00:00:00 | 2014-09-08 00:00:00 | 8 | 8
2 | 2014-09-04 00:00:00 | 2014-09-01 00:00:00 | 4 | 10
2 | 2014-09-02 00:00:00 | 2014-09-01 00:00:00 | 6 | 10
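The same two steps, mapping each date to the Monday of its ISO week (as TRUNC("DATE", 'IW') does) and then attaching the per-(id, week) total to every row, can be cross-checked in a few lines of Python. This is a sketch of the logic on the sample data, not a replacement for the SQL.

```python
from collections import defaultdict
from datetime import date, timedelta

rows = [  # (id, date, worktime) from the sample data
    (1, date(2014, 9, 1), 4),
    (2, date(2014, 9, 2), 6),
    (1, date(2014, 9, 3), 5.5),
    (1, date(2014, 9, 4), 7),
    (2, date(2014, 9, 4), 4),
    (1, date(2014, 9, 9), 8),
]

def iso_week_start(d):
    # Monday of d's ISO week, the same day TRUNC(d, 'IW') returns in Oracle.
    return d - timedelta(days=d.weekday())

# GROUP BY id, week: one total per (id, week start).
totals = defaultdict(float)
for emp_id, day, hours in rows:
    totals[(emp_id, iso_week_start(day))] += hours

# Analytic SUM ... OVER (PARTITION BY ...): every row keeps its week's total.
with_weekly = [(emp_id, day, hours, totals[(emp_id, iso_week_start(day))])
               for emp_id, day, hours in rows]
for r in with_weekly:
    print(r)
```

The difference between the aggregate and analytic forms is visible here. `totals` has one entry per group, while `with_weekly` keeps all six input rows.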
edit: answer submitted without noticing the "Oracle" tag. Otherwise, the question is answered here: Oracle SQL - Sum and group data by week
Select employee_Id,
DATEPART(week, workday) as [Week],
sum (worktime) as [Weekly Hours]
from WORK
group by employee_id, DATEPART(week, workday)
https://dbfiddle.uk/?rdbms=sqlserver_2019&fiddle=238b229156a383fa3c466b6c3c2dee1e

Getting the days between two dates for multiple IDs

This is really a follow-up to "How to get a list of months between 2 given dates using a query?" (I suspect I'm stuck because I don't quite understand the logic behind CONNECT BY LEVEL clauses!)
What I have is a list of data like so
ID | START_DATE | END_DATE
1 | 01-JAN-2018 | 20-JAN-2018
2 | 13-FEB-2018 | 20-MAR-2018
3 | 01-MAR-2018 | 07-MAR-2018
and what I want to try and get is a list with all the days between the start and end date for each ID.
So for example I want a list which gives
ID | DATE
1 | 01-JAN-2018
1 | 02-JAN-2018
1 | 03-JAN-2018
...
1 | 19-JAN-2018
1 | 20-JAN-2018
2 | 13-FEB-2018
2 | 14-FEB-2018
2 | 15-FEB-2018
...
etc.
What I've tried to do is adapt one of the answers from the above link as follows
select id
, trunc(start_date+((level-1)),'DD')
from (
select id
, start_date
, end_date
from blah
)
connect by level <= ((trunc(end_date,'DD')-trunc(start_date,'DD'))) + 1
That gives me the dates I want, but also a whole host of duplicates, as if it were a Cartesian join. Is there something simple I need to add to fix this?
I like recursive CTEs:
-- Oracle requires the column alias list on a recursive WITH clause
with cte (id, dte, end_dte) as (
select id, start_date, end_date
from blah
union all
select id, dte + 1, end_dte
from cte
where dte < end_dte
)
select id, dte
from cte
order by id, dte;
This is standard recursive-CTE syntax and works in several other databases too, although some, such as PostgreSQL, require the RECURSIVE keyword (WITH RECURSIVE).
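To see the recursive CTE run outside Oracle, here is a sketch using SQLite through Python's built-in sqlite3 module. The only dialect changes are `WITH RECURSIVE` and `date(dte, '+1 day')` in place of `dte + 1`. Dates are stored as ISO text, which compares correctly as strings. The anchor produces one row per id, and each id then steps forward independently, so no duplicates appear.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE blah (id INTEGER, start_date TEXT, end_date TEXT)")
conn.executemany(
    "INSERT INTO blah VALUES (?, ?, ?)",
    [(1, "2018-01-01", "2018-01-20"),
     (2, "2018-02-13", "2018-03-20"),
     (3, "2018-03-01", "2018-03-07")],
)

days = conn.execute("""
    WITH RECURSIVE cte (id, dte, end_dte) AS (
        SELECT id, start_date, end_date FROM blah   -- one anchor row per id
        UNION ALL
        SELECT id, date(dte, '+1 day'), end_dte     -- step one day within each id
        FROM cte
        WHERE dte < end_dte                         -- stop at that id's end date
    )
    SELECT id, dte FROM cte ORDER BY id, dte
""").fetchall()

print(len(days))
print(days[0], days[-1])
```

This produces 63 rows, the same count the hierarchical query gives: 20 days for ID 1, 36 for ID 2 (crossing the end of February 2018), and 7 for ID 3.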
The hierarchical query you were trying to do needs to include id = prior id in the connect-by clause, but as that causes loops with multiple source rows you also need to include a call to a non-deterministic function, such as dbms_random.value:
select id, start_date + level - 1 as day
from blah
connect by level <= end_date - start_date + 1
and prior id = id
and prior dbms_random.value is not null
With your sample data in a CTE, that gets 63 rows back:
with blah (ID, START_DATE, END_DATE) as (
select 1, date '2018-01-01', date '2018-01-20' from dual
union all select 2, date '2018-02-13', date '2018-03-20' from dual
union all select 3, date '2018-03-01', date '2018-03-07' from dual
)
select id, start_date + level - 1 as day
from blah
connect by level <= end_date - start_date + 1
and prior id = id
and prior dbms_random.value is not null;
ID DAY
---------- ----------
1 2018-01-01
1 2018-01-02
1 2018-01-03
...
1 2018-01-19
1 2018-01-20
2 2018-02-13
2 2018-02-14
...
3 2018-03-05
3 2018-03-06
3 2018-03-07
You don't need to trunc() the dates unless they have non-midnight times, which seems unlikely in this case, and even then it might not be necessary if only the end-date has a later time (like 23:59:59).
A recursive CTE is more intuitive in many ways though, at least once you understand the basic idea of them; so I'd probably use Gordon's approach too. There can be differences in performance and whether they work at all for large amounts of data (or generated rows), but for a lot of data it's worth comparing different approaches to find the most suitable/performant anyway.

Oracle SQL - Calculate number of concurrent phone calls

First of all, I am not very experienced with SQL. I have found similar questions here before but so far I was not able to develop a working solution for my specific problem.
I have a table that holds phone call records which has the following fields:
END: Holds the timestamp of when a call ended - Data Type: DATE
LINE: Holds the phone line that was used for a call - Data Type: NUMBER
CALLDURATION: Holds the duration of a call in seconds - Data Type: NUMBER
The table has entries like this:
END LINE CALLDURATION
---------------------- ------------------- -----------------------
25/01/2012 14:05:10 6 65
25/01/2012 14:08:51 7 1142
25/01/2012 14:20:36 5 860
I need to create a query that returns the number of concurrent phone calls based on the data from that table. The query should be capable of calculating that number in fixed intervals, such as every 5 minutes.
To make this clearer, here is an example of what the query should return (based on the example entries from the previous table):
TIMESTAMP CURRENTLYUSEDLINES
---------------------- -------------------
25/01/2012 14:05:00 2
25/01/2012 14:10:00 1
25/01/2012 14:15:00 1
How can I do this with a (Oracle) SQL query? There are currently almost 1 million records in the table so the query must be as fast as possible because otherwise it would take forever to execute it.
One solution would be this one:
WITH t AS
(SELECT TIMESTAMP '2012-01-25 14:00:00' + LEVEL * INTERVAL '5' MINUTE AS TS
FROM dual
CONNECT BY TIMESTAMP '2012-01-25 14:00:00' + LEVEL * INTERVAL '5' MINUTE <= TIMESTAMP '2012-01-25 14:15:00'),
calls AS
(SELECT TIMESTAMP '2012-01-25 14:05:10' AS END_TIME, 6 AS LINE, 65 AS duration FROM dual
UNION ALL SELECT TIMESTAMP '2012-01-25 14:08:51', 7, 1142 FROM dual
UNION ALL SELECT TIMESTAMP '2012-01-25 14:20:36', 5, 860 FROM dual)
SELECT TS, count(distinct line)
FROM t
LEFT OUTER JOIN calls ON ts BETWEEN END_TIME - duration * INTERVAL '1' SECOND AND END_TIME
GROUP BY ts
HAVING count(distinct line) > 0
ORDER BY ts;
TS COUNT(DISTINCTLINE)
-------------------- -------------------
25.01.2012 14:05:00 2
25.01.2012 14:10:00 1
25.01.2012 14:15:00 1
3 rows selected.
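The query above compares every 5-minute tick against every call. With about a million calls, one alternative worth benchmarking is to sort the call start and end times once and binary-search them for each tick. The number of calls active at time t is the number of calls started at or before t, minus the number that ended before t.

Below is a minimal Python sketch of that counting idea on the sample data; it is not an Oracle query. It counts calls rather than distinct lines, which gives the same result only if a line never carries two overlapping calls.

```python
from bisect import bisect_left, bisect_right
from datetime import datetime, timedelta

calls = [  # (end, line, duration in seconds)
    (datetime(2012, 1, 25, 14, 5, 10), 6, 65),
    (datetime(2012, 1, 25, 14, 8, 51), 7, 1142),
    (datetime(2012, 1, 25, 14, 20, 36), 5, 860),
]

# Sort once; each tick is then answered with two binary searches.
starts = sorted(end - timedelta(seconds=dur) for end, _, dur in calls)
ends = sorted(end for end, _, _ in calls)

def active_at(t):
    # calls with start <= t, minus calls with end < t
    # (same inclusive bounds as BETWEEN start AND end in the SQL)
    return bisect_right(starts, t) - bisect_left(ends, t)

ticks = [datetime(2012, 1, 25, 14, 0) + timedelta(minutes=5 * k) for k in range(1, 4)]
for t in ticks:
    print(t, active_at(t))
```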

PostgreSQL query for multiple update

I have a table with 4 columns: emp_no, desig_name, from_date and to_date:
emp_no desig_name from_date to_date
1001 engineer 2004-08-01 00:00:00
1001 sr.engineer 2010-08-01 00:00:00
1001 chief.engineer 2013-08-01 00:00:00
How can I update the to_date of the first row to one day before the from_date of the second row, and likewise for the second row?
After the update it should look like this:
emp_no desig_name from_date to_date
1001 engineer 2004-08-01 00:00:00 2010-07-31 00:00:00
1001 sr.engineer 2010-08-01 00:00:00 2013-07-31 00:00:00
1001 chief.engineer 2013-08-01 00:00:00
You can calculate the "next" date using the lead() function.
This calculated value can then be used to update the table:
-- assumes emp has a unique key column, promotion_id, to identify each row
with calc as (
select promotion_id,
emp_no,
from_date,
lead(from_date) over (partition by emp_no order by from_date) as next_date
from emp
)
update emp
set to_date = c.next_date - interval '1' day
from calc c
where c.promotion_id = emp.promotion_id;
As you can see, getting that value is quite easy, and storing derived information is often not a good idea. You might want to consider a view that calculates this information on the fly, so you don't need to update the table each time you insert a new row.
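Here is a minimal sketch of the view-based alternative, run through Python's built-in sqlite3 module. It is SQLite dialect rather than PostgreSQL: `date(x, '-1 day')` replaces `- interval '1' day`, and LEAD needs SQLite 3.25+. The view name `emp_history` is hypothetical. The stored table no longer has a to_date column at all; the view derives it with lead().

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE emp (emp_no INTEGER, desig_name TEXT, from_date TEXT);
INSERT INTO emp VALUES
  (1001, 'engineer',       '2004-08-01'),
  (1001, 'sr.engineer',    '2010-08-01'),
  (1001, 'chief.engineer', '2013-08-01');

-- to_date is computed on the fly: one day before the next from_date
-- of the same employee; NULL for the current designation.
CREATE VIEW emp_history AS
SELECT emp_no, desig_name, from_date,
       date(LEAD(from_date) OVER (PARTITION BY emp_no ORDER BY from_date),
            '-1 day') AS to_date
FROM emp;
""")

rows = conn.execute("SELECT * FROM emp_history ORDER BY from_date").fetchall()
for r in rows:
    print(r)
```

Inserting a new promotion row changes the view's output automatically, with no UPDATE needed.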
SQLFiddle example: http://sqlfiddle.com/#!15/31665/1