Summarize values across timeline in SQL - sql

The Problem
I have a PostgreSQL database on which I am trying to summarize the revenue of a cash register over time. The cash register can either have status ACTIVE or INACTIVE, but I only want to summarize the earnings created when it was ACTIVE for a given period of time.
I have two tables; one that marks the revenue and one that marks the cash register status:
CREATE TABLE counters
(
id bigserial NOT NULL,
"timestamp" timestamp with time zone,
total_revenue bigint,
id_of_machine character varying(50),
CONSTRAINT counters_pkey PRIMARY KEY (id)
)
CREATE TABLE machine_lifecycle_events
(
id bigserial NOT NULL,
event_type character varying(50),
"timestamp" timestamp with time zone,
id_of_affected_machine character varying(50),
CONSTRAINT machine_lifecycle_events_pkey PRIMARY KEY (id)
)
A counters entry is added every 1 minute and total_revenue only increases. A machine_lifecycle_events entry is added every time the status of the machine changes.
I have added an image illustrating the problem. It is the revenue during the blue periods which should be summarized.
What I have tried so far
I have created a query which can give me the total revenue in a given instant:
SELECT total_revenue
FROM counters
WHERE timestamp < '2014-03-05 11:00:00'
AND id_of_machine='1'
ORDER BY
timestamp desc
LIMIT 1
The questions
How do I calculate the revenue earned between two timestamps?
How do I determine the start and end timestamps of the blue periods when I have to compare the timestamps in machine_lifecycle_events with the input period?
Any ideas on how to attack this problem?
Update
Example data:
INSERT INTO counters VALUES
(1, '2014-03-01 00:00:00', 100, '1')
, (2, '2014-03-01 12:00:00', 200, '1')
, (3, '2014-03-02 00:00:00', 300, '1')
, (4, '2014-03-02 12:00:00', 400, '1')
, (5, '2014-03-03 00:00:00', 500, '1')
, (6, '2014-03-03 12:00:00', 600, '1')
, (7, '2014-03-04 00:00:00', 700, '1')
, (8, '2014-03-04 12:00:00', 800, '1')
, (9, '2014-03-05 00:00:00', 900, '1')
, (10, '2014-03-05 12:00:00', 1000, '1')
, (11, '2014-03-06 00:00:00', 1100, '1')
, (12, '2014-03-06 12:00:00', 1200, '1')
, (13, '2014-03-07 00:00:00', 1300, '1')
, (14, '2014-03-07 12:00:00', 1400, '1');
INSERT INTO machine_lifecycle_events VALUES
(1, 'ACTIVE', '2014-03-01 08:00:00', '1')
, (2, 'INACTIVE', '2014-03-03 00:00:00', '1')
, (3, 'ACTIVE', '2014-03-05 00:00:00', '1')
, (4, 'INACTIVE', '2014-03-06 12:00:00', '1');
SQL Fiddle with sample data.
Example query:
The revenue between '2014-03-02 08:00:00' and '2014-03-06 08:00:00' is 300. 100 for the first ACTIVE period, and 200 for the second ACTIVE period.

DB design
To make my work easier I sanitized your DB design before I tackled the questions:
CREATE TEMP TABLE counter (
id bigserial PRIMARY KEY
, ts timestamp NOT NULL
, total_revenue bigint NOT NULL
, machine_id int NOT NULL
);
CREATE TEMP TABLE machine_event (
id bigserial PRIMARY KEY
, ts timestamp NOT NULL
, machine_id int NOT NULL
, status_active bool NOT NULL
);
Test case in the fiddle.
Major points
Using ts instead of "timestamp". Never use basic type names as column names.
Simplified & unified the name machine_id and made it out to be integer as it should be, instead of varchar(50).
event_type varchar(50) should be an integer foreign key, too, or an enum. Or even just a boolean for only active / inactive. Simplified to status_active bool.
Simplified and sanitized INSERT statements as well.
Answers
Assumptions
total_revenue only increases (per question).
Borders of the outer time frame are included.
Every "next" row per machine in machine_event has the opposite status_active.
1. How do I calculate the revenue earned between two timestamps?
WITH span AS (
SELECT '2014-03-02 12:00'::timestamp AS s_from -- start of time range
, '2014-03-05 11:00'::timestamp AS s_to -- end of time range
)
SELECT machine_id, s.s_from, s.s_to
, max(total_revenue) - min(total_revenue) AS earned
FROM counter c
, span s
WHERE ts BETWEEN s_from AND s_to -- borders included!
AND machine_id = 1
GROUP BY 1,2,3;
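A quick way to sanity-check the max/min difference used above is to run it against the sample data. The following is only a sketch: it uses Python's standard-library sqlite3 module instead of PostgreSQL, stores timestamps as ISO-formatted text (an assumption of this sketch), and the helper names make_counter_db and revenue_between are made up for illustration.

```python
import sqlite3


def make_counter_db():
    """Build an in-memory copy of the sample counter rows from the question."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE counter (ts TEXT, total_revenue INTEGER, machine_id INTEGER)"
    )
    # 14 readings, one every 12 hours, total_revenue growing by 100 each time.
    rows = [
        (f"2014-03-{1 + i // 2:02d} {'00' if i % 2 == 0 else '12'}:00", 100 * (i + 1), 1)
        for i in range(14)
    ]
    conn.executemany("INSERT INTO counter VALUES (?, ?, ?)", rows)
    return conn


def revenue_between(conn, machine_id, s_from, s_to):
    """Revenue earned in [s_from, s_to]; relies on total_revenue only increasing."""
    row = conn.execute(
        """
        SELECT max(total_revenue) - min(total_revenue)
        FROM counter
        WHERE machine_id = ? AND ts BETWEEN ? AND ?   -- borders included
        """,
        (machine_id, s_from, s_to),
    ).fetchone()
    return row[0]  # None if no counter row falls inside the range


conn = make_counter_db()
print(revenue_between(conn, 1, "2014-03-02 12:00", "2014-03-05 11:00"))  # 500
```

Text comparison works here only because every timestamp uses the same zero-padded format.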
2. How do I determine the start and end timestamps of the blue periods when I have to compare the timestamps in machine_event with the input period?
This query returns the active periods for all machines in the given time frame (span).
Add WHERE machine_id = 1 in the CTE cte to select a specific machine.
WITH span AS (
SELECT '2014-03-02 08:00'::timestamp AS s_from -- start of time range
, '2014-03-06 08:00'::timestamp AS s_to -- end of time range
)
, cte AS (
SELECT machine_id, ts, status_active, s_from
, lead(ts, 1, s_to) OVER w AS period_end
, first_value(ts) OVER w AS first_ts
FROM span s
JOIN machine_event e ON e.ts BETWEEN s.s_from AND s.s_to
WINDOW w AS (PARTITION BY machine_id ORDER BY ts)
)
SELECT machine_id, ts AS period_start, period_end -- start in time frame
FROM cte
WHERE status_active
UNION ALL -- active start before time frame
SELECT machine_id, s_from, ts
FROM cte
WHERE NOT status_active
AND ts = first_ts
AND ts <> s_from
UNION ALL -- active start before time frame, no end in time frame
SELECT machine_id, s_from, s_to
FROM (
SELECT DISTINCT ON (1)
e.machine_id, e.status_active, s.s_from, s.s_to
FROM span s
JOIN machine_event e ON e.ts < s.s_from -- only from before time range
LEFT JOIN cte c USING (machine_id)
WHERE c.machine_id IS NULL -- not in selected time range
ORDER BY e.machine_id, e.ts DESC -- only the latest entry
) sub
WHERE status_active -- only if active
ORDER BY 1, 2;
Result is the list of blue periods in your image.
SQL Fiddle demonstrating both.
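The two answers can be combined: clip each active period to the requested time frame, take the max/min difference inside each clipped period, and add the results up. The sketch below does this with Python's sqlite3 module rather than PostgreSQL. It is not the query from the fiddle: the period end is found with a correlated subquery instead of lead(), timestamps are stored as ISO text, and the names make_db, ACTIVE_REVENUE_SQL and active_revenue are assumptions made for this illustration.

```python
import sqlite3


def make_db():
    """In-memory copy of the sample data, using the simplified table layout."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE counter (ts TEXT, total_revenue INTEGER, machine_id INTEGER);
        CREATE TABLE machine_event (ts TEXT, machine_id INTEGER, status_active INTEGER);
        """
    )
    conn.executemany(
        "INSERT INTO counter VALUES (?, ?, ?)",
        [(f"2014-03-{1 + i // 2:02d} {'00' if i % 2 == 0 else '12'}:00", 100 * (i + 1), 1)
         for i in range(14)],
    )
    conn.executemany(
        "INSERT INTO machine_event VALUES (?, ?, ?)",
        [("2014-03-01 08:00", 1, 1), ("2014-03-03 00:00", 1, 0),
         ("2014-03-05 00:00", 1, 1), ("2014-03-06 12:00", 1, 0)],
    )
    return conn


ACTIVE_REVENUE_SQL = """
WITH period AS (
    -- One row per ACTIVE event, clipped to [s_from, s_to].
    -- The period ends at the next event of the same machine, or at s_to.
    SELECT e.machine_id,
           max(e.ts, :s_from) AS p_start,
           min(coalesce((SELECT min(n.ts) FROM machine_event n
                         WHERE n.machine_id = e.machine_id AND n.ts > e.ts),
                        :s_to), :s_to) AS p_end
    FROM machine_event e
    WHERE e.status_active = 1
)
SELECT coalesce(sum(earned), 0)
FROM (
    SELECT max(c.total_revenue) - min(c.total_revenue) AS earned
    FROM period p
    JOIN counter c ON c.machine_id = p.machine_id
                  AND c.ts BETWEEN p.p_start AND p.p_end
    WHERE p.machine_id = :machine_id
      AND p.p_start < p.p_end          -- drop periods outside the time frame
    GROUP BY p.p_start
)
"""


def active_revenue(conn, machine_id, s_from, s_to):
    params = {"machine_id": machine_id, "s_from": s_from, "s_to": s_to}
    return conn.execute(ACTIVE_REVENUE_SQL, params).fetchone()[0]


print(active_revenue(make_db(), 1, "2014-03-02 08:00", "2014-03-06 08:00"))  # 300
```

For the example in the question this yields 100 for the first ACTIVE period and 200 for the second, matching the expected total of 300.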
Recent similar question:
Sum of time difference between rows

OK, I have an answer, but I had to assume that the id of machine_lifecycle_events can be used to determine successor and predecessor. For my solution to work better, you should have an explicit link between the ACTIVE and INACTIVE events. There might be other ways to solve it, but those would add even more complexity.
First, to get the revenue for all active periods per machine, you can do the following:
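select c.id_of_machine, cycle_id, cycle_start, cycle_end,
max(total_revenue) - min(total_revenue) as revenue -- total_revenue is cumulative, so take the difference, not the sum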
from counters c join (
select e1.id as cycle_id,
e1.timestamp as cycle_start,
e2.timestamp as cycle_end,
e1.id_of_affected_machine as cycle_machine_id
from machine_lifecycle_events e1 join machine_lifecycle_events e2
on e1.id + 1 = e2.id and -- this should be replaced with a specific column to find cycles which belong together
e1.id_of_affected_machine = e2.id_of_affected_machine
where e1.event_type = 'ACTIVE'
) cycle
on c.id_of_machine = cycle_machine_id and
cycle_start <= c.timestamp and c.timestamp <= cycle_end
group by c.id_of_machine, cycle_id, cycle_start, cycle_end
order by c.id_of_machine, cycle_id
You can build on this query and add more WHERE conditions to get the revenue only within a time frame or for specific machines:
select sum(earned) as revenue
from (
select max(c.total_revenue) - min(c.total_revenue) as earned -- total_revenue is cumulative, so take the difference per cycle
from counters c join (
select e1.id as cycle_id,
e1.timestamp as cycle_start,
e2.timestamp as cycle_end,
e1.id_of_affected_machine as cycle_machine_id
from machine_lifecycle_events e1 join machine_lifecycle_events e2
on e1.id + 1 = e2.id and -- this should be replaced with a specific column to find cycles which belong together
e1.id_of_affected_machine = e2.id_of_affected_machine
where e1.event_type = 'ACTIVE'
) cycle
on c.id_of_machine = cycle_machine_id and
cycle_start <= c.timestamp and c.timestamp <= cycle_end
where '2014-03-02 08:00:00' <= c.timestamp and c.timestamp <= '2014-03-06 08:00:00'
and c.id_of_machine = '1'
group by cycle_id
) per_cycle
As mentioned in the beginning, and in the comments, my way of finding connecting events isn't suitable for any more complex examples with multiple machines. The easiest way would be to have another column which would always point to the preceding event. Another way would be to have a function which would find those events but this solution couldn't make use of indices.

Use a self-join to build an intervals table that holds the status of each interval.
with intervals as (
select e1.timestamp time1, e2.timestamp time2, e1.EVENT_TYPE as status
from machine_lifecycle_events e1
left join machine_lifecycle_events e2 on e2.id = e1.id + 1
) select * from counters c
join intervals i on c.timestamp >= i.time1
and (c.timestamp <= i.time2 or i.time2 is null) -- the last interval is open-ended
and i.status = 'ACTIVE';
I left out the aggregation to show the raw result set; adding it should be straightforward. I also left out the machine id to keep the demonstration of this pattern simple.

Related

Left join on single record

I have the following table that stores events:
(simplified structure)
ID   User    Action   Timestamp
----------------------------------------
12   user1   END      2022-01-01 05:00
43   user1   START    2022-01-01 04:00
54   user1   END      2022-01-01 03:00
13   user1   START    2022-01-01 02:00
I need to join 2 events in one row, so any START event is accompanied by the END event that comes after that.
So the result should be the following:
ID1   ID2   User    Start Timestamp    End Timestamp
------------------------------------------------------
13    54    user1   2022-01-01 02:00   2022-01-01 03:00
43    12    user1   2022-01-01 04:00   2022-01-01 05:00
Ideally, it should not have too many performance issues, as there could be a lot of records in the table.
I've tried the following query:
select
s.id as "ID1",
e.id as "ID2",
s.user,
s.time as "Start Time",
e.time as "End Time"
from Events s
left join Events e on s.user = e.user
where s.action = 'START'
and e.action = 'END'
and s.timestamp < e.timestamp
but it will also match the record 13 to record 12.
Is it possible to join the left side to the right side only once (keeping in mind that it should be the next END record time-wise)?
Thanks
One way is a lateral join that picks the smallest "end" timestamp that is greater than the "start" timestamp:
select st.id as id1,
en.id as id2,
st."timestamp" as start_timestamp,
en."timestamp" as end_timestamp
from events st
left join lateral (
select id, "timestamp"
from events e
where e."user" = st."user"
and e.action = 'END'
and e.timestamp >= st.timestamp
order by "timestamp"
fetch first 1 row only
) en on true
where st.action = 'START';
The above is standard ANSI SQL and works (at least) in Postgres.
In Postgres I would create an index on events ("user", "timestamp") where action = 'END' to make the lateral query fast.
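To see the "next END after each START" idea run end to end, here is a small sketch using Python's sqlite3 module. SQLite has no LATERAL join, so the sketch swaps in a correlated scalar subquery with ORDER BY ... LIMIT 1, which picks the same row; the table layout (usr, ts as ISO text) and the name PAIR_SQL are assumptions of this example, not part of the answer above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (id INTEGER, usr TEXT, action TEXT, ts TEXT)")
conn.executemany(
    "INSERT INTO events VALUES (?, ?, ?, ?)",
    [
        (12, "user1", "END", "2022-01-01 05:00"),
        (43, "user1", "START", "2022-01-01 04:00"),
        (54, "user1", "END", "2022-01-01 03:00"),
        (13, "user1", "START", "2022-01-01 02:00"),
    ],
)

# For every START row, look up the id of the earliest END row of the same
# user that is not before the START. A START without a later END gets NULL.
PAIR_SQL = """
SELECT st.id AS id1,
       (SELECT en.id
        FROM events en
        WHERE en.usr = st.usr
          AND en.action = 'END'
          AND en.ts >= st.ts
        ORDER BY en.ts
        LIMIT 1) AS id2
FROM events st
WHERE st.action = 'START'
ORDER BY st.ts
"""

pairs = conn.execute(PAIR_SQL).fetchall()
print(pairs)  # [(13, 54), (43, 12)]
```

Record 13 is matched only with record 54, not with 12, which is the behaviour the question asks for.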
Here is a PostgreSQL solution using a lateral join. It might also work on HANA, as no Postgres-specific features are used. The inner query selects the 'END' action for the same user that occurred soonest after the corresponding 'START'. Events that have started but not finished yet will have NULL values for "ID2" and "End timestamp".
create temporary table the_table(id integer, usr text, action text, ts timestamp);
insert into the_table values
(12,'user1','END','2022-01-01 05:00'),(43,'user1','START','2022-01-01 04:00'),
(54,'user1','END','2022-01-01 03:00'),(13,'user1','START','2022-01-01 02:00');
select tx.id as "ID1", l.id as "ID2", tx.usr as "User",
tx.ts as "Start timestamp", l.ts as "End timestamp"
from the_table as tx
left join lateral
(
select ti.id, ti.ts
from the_table as ti
where ti.action = 'END'
and ti.ts > tx.ts
and ti.usr = tx.usr
order by ti.ts - tx.ts
limit 1
) as l on true
where tx.action = 'START'
order by "Start timestamp";
The issue with your query above is that for each start event, there can be multiple end events, which occur after. However, you would like to choose the one that's 'closest' to the start event. You can achieve this by adding an additional aggregation.
Please find a HANA example (uses no HANA specific functionality):
CREATE TABLE TEST (ID integer, USER NVARCHAR(20), ACTION NVARCHAR(20), TIMESTAMP DATETIME)
INSERT INTO TEST VALUES (12, 'user1', 'END', '2022-01-01 05:00')
INSERT INTO TEST VALUES (43, 'user1', 'START', '2022-01-01 04:00')
INSERT INTO TEST VALUES (54, 'user1', 'END', '2022-01-01 03:00')
INSERT INTO TEST VALUES (13, 'user1', 'START', '2022-01-01 02:00')
INSERT INTO TEST VALUES (13, 'user1', 'START', '2022-01-01 09:00')
SELECT
S.ID ID1,
S.USER,
S.ACTION,
S.TIMESTAMP START_TIME,
MIN(E.TIMESTAMP) END_TIME
FROM TEST S
JOIN TEST E ON (
s.USER = e.USER AND
s.ACTION = 'START' AND
e.ACTION = 'END' AND
e.TIMESTAMP >= s.TIMESTAMP
)
GROUP BY S.ID, S.ACTION, S.USER, S.TIMESTAMP
If you need to have E.ID included, you will need to join it back to the result set. Note, that there may be multiple end events with the same timestamp, which you need to handle when joining back E.ID.
If you additionally would like to include START events without corresponding END event, you can use the following:
INSERT INTO TEST VALUES (13, 'user1', 'START', '2022-01-01 09:00')
SELECT
S.ID ID1,
S.USER,
S.ACTION,
S.TIMESTAMP START_TIME,
MIN(E.TIMESTAMP) END_TIME
FROM TEST S
LEFT JOIN TEST E ON (
s.USER = e.USER AND
e.ACTION = 'END' AND
e.TIMESTAMP >= s.TIMESTAMP
)
WHERE s.ACTION ='START'
GROUP BY S.ID, S.ACTION, S.USER, S.TIMESTAMP
We want to get the nearest timestamp of the END event for each START event.
I would go with the following approach:
Get the minimum greater than zero timestamp difference for each of the START events.
Now find the actual END event using the timedelta.
Assumptions
At most one event has not ended yet.
For every START event, the timestamps are unique. (The same goes for END events.)
WITH closest_to_start AS (
SELECT
s.id,
MIN(TIMESTAMPDIFF(SECOND, s.timestamp, e.timestamp)) AS min_delta
FROM Events AS s
INNER JOIN Events AS e ON s.user = e.user
WHERE s.action = 'START'
AND e.action = 'END'
AND e.timestamp > s.timestamp -- only END events after the START
GROUP BY s.id
)
SELECT s.id AS id1,
e.id AS id2
FROM Events AS s
LEFT JOIN closest_to_start ON closest_to_start.id = s.id
LEFT JOIN Events AS e ON e.user = s.user
AND e.action = 'END'
AND TIMESTAMPDIFF(SECOND, s.timestamp, e.timestamp) = closest_to_start.min_delta
WHERE s.action = 'START' -- a START without a later END keeps id2 = NULL
You can use the window function Lead.
with Daten
as
(
Select 12 as ID, 'user1' as Benutzer, 'END' as action, '05:00' as Time
Union
Select 43 as ID, 'user1' as Benutzer, 'Start' as action, '04:00' as Time
Union
Select 54 as ID, 'user1' as Benutzer, 'END' as action, '03:00' as Time
Union
Select 13 as ID, 'user1' as Benutzer, 'Start' as action, '02:00' as Time
)
Select
*
from
(
Select
*,
lead(ID,1) over (order by number) as ID2,
lead(action,1) over (order by number) as action2,
lead(time,1) over (order by number) as time2
from
(
Select
*,
ROW_NUMBER() OVER(ORDER BY Benutzer,Time,action) as number
from
Daten
) x
) y
where y.action = 'Start'
Solution tested in HANA SQL
Same query but excluding the records that are not the min duration
CREATE TABLE "TESTSCHEMA"."EVENTS" (ID integer, "user" NVARCHAR(20), "action" NVARCHAR(20), "timestamp" SECONDDATE);
INSERT INTO "TESTSCHEMA"."EVENTS" VALUES (12, 'user1', 'END', '2022-01-01 05:00');
INSERT INTO "TESTSCHEMA"."EVENTS" VALUES (43, 'user1', 'START', '2022-01-01 04:00');
INSERT INTO "TESTSCHEMA"."EVENTS" VALUES (54, 'user1', 'END', '2022-01-01 03:00');
INSERT INTO "TESTSCHEMA"."EVENTS" VALUES (13, 'user1', 'START', '2022-01-01 02:00');
INSERT INTO "TESTSCHEMA"."EVENTS" VALUES (13, 'user1', 'START', '2022-01-01 09:00');
SELECT "ID1","ID2","Start Time","End Time" FROM
(
select
ROW_NUMBER() OVER(PARTITION BY s."ID" order by SECONDS_BETWEEN(e."timestamp",s."timestamp") DESC) AS RN,
s."ID" as "ID1",
e."ID" as "ID2",
s."user",
s."timestamp" as "Start Time",
e."timestamp" as "End Time",
SECONDS_BETWEEN(e."timestamp",s."timestamp") AS "Duration"
from "TESTSCHEMA"."EVENTS" s
left join "TESTSCHEMA"."EVENTS" e on s."user" = e."user"
where s."action" = 'START'
and e."action" = 'END'
and s."timestamp" < e."timestamp"
)WHERE RN=1

SQL Server 2012 - Retrieve value where date is between two dates

I have the following table called 'Rates':
Valid From Employee Rate
----------------------------------
01/03/2010 1M 50
01/03/2010 2M 75
01/10/2015 1M 55
01/10/2015 2M 80
I also have the following table called 'Jobs':
ID Employee OpenedDate Rate
100000 1M 05/06/2012
100000 2M 08/09/2018
How do I retrieve the rate from the Rates table into the Jobs table, where the OpenedDate is greater than or equal to the current ValidFrom date and less than or equal to the next ValidFrom date, where the Employee also matches?
So I would end up with:
ID Employee OpenedDate Rate
100000 1M 05/06/2012 50
100000 2M 08/09/2018 80
Hope I explained that okay
Cheers for any and all help!
ps not sure how to display the above data as a table layout in Stack, been looking through the help but I cant see how??
If you can't modify the table to add a "ValidTo" column then you'll have to create one dynamically using LEAD Window function
CREATE TABLE Table1
([Valid From] DATETIME, Employee varchar(2), Rate int)
;
INSERT INTO Table1
([Valid From], Employee, Rate)
VALUES
('2010-01-03 00:00:00', '1M', 50),
('2010-01-03 00:00:00', '2M', 75),
('2015-01-10 00:00:00', '1M', 55),
('2015-01-10 00:00:00', '2M', 80)
;
CREATE TABLE Table2
(ID int, Employee varchar(2), OpenedDate DATETIME, Rate int)
;
INSERT INTO Table2
(ID, Employee, OpenedDate, Rate)
VALUES
(100000, '1M', '2012-05-06 00:00:00', NULL),
(100000, '2M', '2018-08-09 00:00:00', NULL)
;
;WITH cteValidToAdded
AS(
SELECT
T1.[Valid From]
,[ValidTo] = ISNULL(LEAD(T1.[Valid From])OVER(PARTITION BY T1.Employee ORDER BY T1.[Valid From], T1.Employee),'25001212') --Some date in the distant future
,T1.Employee
,T1.Rate
FROM dbo.Table1 T1
)
SELECT
T2.ID
,T2.Employee
,OpenedDate = CONVERT(VARCHAR(12), T2.OpenedDate, 101)
,V.Rate
FROM dbo.Table2 T2
LEFT JOIN cteValidToAdded V ON V.Employee = T2.Employee
AND T2.OpenedDate >= V.[Valid From] AND T2.OpenedDate < V.ValidTo
Output
ID Employee OpenedDate Rate
100000 1M 05/06/2012 50
100000 2M 08/09/2018 80
Just use outer apply:
select j.*, r.rate
from jobs j outer apply
(select top (1) r.*
from rates r
where r.employee = j.employee and
r.valid_from <= j.opened_date
order by r.valid_from desc
) r;
With an index on rates(employee, valid_from) (and maybe including rate), this should be faster than a version that uses window functions.
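The lookup both answers perform is "the latest rate whose valid-from date is not after the opened date". As an illustration of that rule outside the database, here is a sketch in Python using a binary search over each employee's sorted rate history. The RATES structure and the function name rate_on are hypothetical; the dates follow the ISO values used in the first answer's INSERT statements.

```python
from bisect import bisect_right
from datetime import date

# Rate history per employee, sorted by valid-from date.
RATES = {
    "1M": [(date(2010, 1, 3), 50), (date(2015, 1, 10), 55)],
    "2M": [(date(2010, 1, 3), 75), (date(2015, 1, 10), 80)],
}


def rate_on(employee, opened):
    """Return the rate in force on `opened`, or None if none applies yet."""
    history = RATES[employee]
    valid_from = [d for d, _ in history]
    # Number of entries with valid_from <= opened; the last of them applies.
    i = bisect_right(valid_from, opened)
    return history[i - 1][1] if i else None


print(rate_on("1M", date(2012, 5, 6)))  # 50
print(rate_on("2M", date(2018, 8, 9)))  # 80
```

This mirrors the outer apply version: order by valid-from descending and take the first row that is not after the opened date.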

A long query tuning

I have the following query. It selects from the EBS tables, joined with a custom table that has header_id, and populates the data into a custom table, XXREPORT_L1_TBL.
I want to tune this query.
[update] I made changes to the query as below:
split the query into 3 different insert statements
removed the columns which use inline queries for values
added an update statement at the end for these columns.
insert into XX.XXREPORT_L1_TBL ( ORDER_NUMBER
, LINE_NUMBER
, UOM
, CUSTOMER_LENGTH
, THEORETICAL_WEIGHT
, FINISH
, ORDER_QTY_PCS
, ORDER_QTY_KGS
, SALES_VALUE
, TOTAL_VALUE
, ORDERED_QUANTITY
, WIP_ENTITY_ID
, JOB_NAME
, JOB_TYPE
, JOB_STATUS
, JOB_RELEASED_DATE
, DATE_COMPLETED
, DATE_CLOSED
, JOB_CARD_QTY
, ALLOY
, PROFILE
, PROD_QTY_KGS
, COST_KGS_THEORY
, COST_KGS_ACTUAL
)
SELECT
---- Sales Order
xx.order_number
,xx.line_number
,xx.UOM,
xx.customer_length,
xx.theoretical_weight,
xx.finish,
xx.order_qty_pcs,
xx.order_qty_kgs,
xx.sales_value, -- total value / total kgs
xx.total_value, -- line total
xx.ordered_quantity,
-- Production
xx.wip_entity_id,
xx.job_name,
( select case when a.inventory_item_id = 5716770 and a.job_type='NOT CHILD' then 'PARENT'
when a.job_type='CHILD' and a.inventory_item_id is null then 'CHILD'
when a.job_type='NOT CHILD' and a.inventory_item_id is NOT null then 'NOT CHILD' END JOB_TYPE
from ( select disc2.wip_entity_id as wip_entity_id, decode ( nvl(disc2.attribute9,-1) , -1,'NOT CHILD', 'CHILD') job_type, oel.inventory_item_id
from APPS.wip_discrete_jobs disc2, APPS.oe_order_lines_all oel
where oel.line_id(+) = disc2.source_line_id
)a
where a.wip_entity_id = xx.wip_entity_id
) job_type,
( select decode ( xx.status_type, 6, 'Open',
3, 'Open',
4, 'Completed',
LU1.MEANING )
from APPS.FND_LOOKUP_VALUES LU1
where LU1.LOOKUP_TYPE = 'WIP_JOB_STATUS'
AND LU1.LOOKUP_CODE = xx.STATUS_TYPE
) job_status,
xx.job_released_date,
xx.date_completed,
xx.date_closed
,xx.net_quantity as job_card_qty
,xx.alloy
,xx.profile
,xx.prod_qty_kgs
-- Theoretical Order cost
,xx.cost_kgs_theory
-- Actual Order cost
,xx.cost_kgs_actual
from (
select a.*
-- Theoretical Order cost
, DECODE (a.qty_completed * a.customer_length * a.theoretical_weight,0,0,
a.TOT_THEORY_COST_RELIEVED/(a.qty_completed * a.customer_length * a.theoretical_weight) ) as cost_kgs_theory
-- Actual Order cost
, DECODE ( a.qty_completed * a.customer_length * a.theoretical_weight, 0, 0,
a.TOT_ACTUAL_COST_INCURRED/(a.qty_completed * a.customer_length * a.theoretical_weight )) as cost_kgs_actual
from (
select
-- Normal orders, INTERNAL Orders, Crimped Profile (parent jobs)
-- Sales Order
oeh.order_number as order_number
,oel.line_number
,oel.pricing_quantity_uom as UOM
,oel.attribute1 as customer_length
,oel.attribute6 as theoretical_weight
,oel.attribute5 as finish
,oel.attribute18 as order_qty_pcs
,oel.attribute7 as order_qty_kgs
,xx_om.GetLineUnitSellingPrice(oel.line_id) sales_value
,xx_om.GetHeaderUnitSellingPrice(oeh.header_id) total_value
,oel.ordered_quantity ordered_quantity
-- Production
, tbl0.qty_completed as qty_completed
,disc.wip_entity_id as wip_entity_id
,( select wip_entity_name from APPS.wip_entities ent
where ent.wip_entity_id = disc.wip_entity_id) job_name
,disc.status_type
,disc.date_released as job_released_date
, DECODE ( disc.date_completed, NULL, disc.date_completed,
-- my day Definition
to_date(to_char(to_date(TO_CHAR(disc.date_completed- interval '7' hour,'DD-MON-YYYY')||'00:00:00','DD-MON-YYYY HH24:MI:SS'), 'DD-MON-YYYY HH24:MI:SS'), 'DD-MON-YYYY HH24:MI:SS')) as date_completed
, DECODE ( disc.date_closed, NULL, disc.date_closed,
to_date(to_char(to_date(TO_CHAR(disc.date_closed- interval '7' hour,'DD-MON-YYYY')||'00:00:00','DD-MON-YYYY HH24:MI:SS'), 'DD-MON-YYYY HH24:MI:SS'), 'DD-MON-YYYY HH24:MI:SS')) as date_closed
, disc.net_quantity
, ( select opr2.quantity_completed
from APPS.wip_operations opr2
where opr2.wip_entity_id = disc.wip_entity_id
and opr2.operation_seq_num = (select max(opr.operation_seq_num)
from APPS.wip_operations opr, APPS.wip_discrete_jobs disc2
where opr.wip_entity_id = disc2.wip_entity_id
and disc2.wip_entity_id = disc.wip_entity_id))* oel.attribute1 * oel.attribute6 as prod_qty_kgs
,oel.attribute4 as alloy
,oel.attribute2 as profile
-- Theoretical Order cost
,tbl0.TOT_THEORY_COST_RELIEVED
-- Actual Order cost
,tbl0.TOT_ACTUAL_COST_INCURRED
from XX.XXREPORT_Lzero_TBL tbl0
join APPS.oe_order_headers_all oeh on oeh.header_id = tbl0.header_id
join APPS.oe_order_lines_all oel on oeh.org_id = oel.org_id and oeh.header_id = oel.header_id
join APPS.xx_assemblies asm on oel.line_id = asm.line_id
join APPS.wip_discrete_jobs disc on disc.primary_item_id = asm.inventory_item_id
where oel.link_to_line_id is null
union
-- Crimped Child Jobs
select
-- Sales Order
oeh.order_number as order_number
,oel.line_number
,oel.pricing_quantity_uom as UOM
,oel.attribute1 as customer_length
,oel.attribute6 as theoretical_weight
,oel.attribute5 as finish
,oel.attribute18 as order_qty_pcs
,oel.attribute7 as order_qty_kgs
,xx_om.GetLineUnitSellingPrice(oel.line_id) sales_value
,xx_om.GetHeaderUnitSellingPrice(oeh.header_id) total_value
,oel.ordered_quantity ordered_quantity
-- Production
, tbl0.qty_completed as qty_completed
,child_jobs.wip_entity_id as wip_entity_id
,( select wip_entity_name from APPS.wip_entities ent
where ent.wip_entity_id = child_jobs.wip_entity_id) job_name
,disc.status_type
,disc.date_released as job_released_date
, DECODE ( disc.date_completed, NULL, disc.date_completed,
to_date(to_char(to_date(TO_CHAR(disc.date_completed-interval '7' hour,'DD-MON-YYYY')||'00:00:00','DD-MON-YYYY HH24:MI:SS'), 'DD-MON-YYYY HH24:MI:SS'), 'DD-MON-YYYY HH24:MI:SS')) as date_completed
, DECODE ( disc.date_closed, NULL, disc.date_closed,
to_date(to_char(to_date(TO_CHAR(disc.date_closed-interval '7' hour,'DD-MON-YYYY')||'00:00:00','DD-MON-YYYY HH24:MI:SS'), 'DD-MON-YYYY HH24:MI:SS'), 'DD-MON-YYYY HH24:MI:SS')) as date_closed
, disc.net_quantity
, ( select opr2.quantity_completed
from APPS.wip_operations opr2
where opr2.wip_entity_id = disc.wip_entity_id
and opr2.operation_seq_num = (select max(opr.operation_seq_num)
from APPS.wip_operations opr, APPS.wip_discrete_jobs disc2
where opr.wip_entity_id = disc2.wip_entity_id
and disc2.wip_entity_id = disc.wip_entity_id))* oel.attribute1 * oel.attribute6 as prod_qty_kgs
,oel.attribute4 as alloy
,oel.attribute2 as profile
-- Theoretical Order cost
,tbl0.TOT_THEORY_COST_RELIEVED
-- Actual Order cost
,tbl0.TOT_ACTUAL_COST_INCURRED
from XX.XXREPORT_Lzero_TBL tbl0
join APPS.oe_order_headers_all oeh on oeh.header_id = tbl0.header_id
join APPS.oe_order_lines_all oel on oeh.org_id = oel.org_id and oeh.header_id = oel.header_id
join APPS.xx_assemblies asm on oel.line_id = asm.line_id
join APPS.wip_discrete_jobs disc on disc.primary_item_id = asm.inventory_item_id
join ( select wdj2.source_line_id, wdj2.attribute9 child_wip, wdj2.wip_entity_id, wdj2.status_type status_type
from APPS.wip_discrete_jobs wdj2
where attribute9 IS NOT NULL ) child_jobs on child_jobs.child_wip = to_char(disc.wip_entity_id)
where oel.link_to_line_id is null
union
-- Orders with star (*) items need to pick profile and customer length etc from ego_configured_pr_agv view
select
-- Sales Order
oeh.order_number as order_number
,oel.line_number
,oel.pricing_quantity_uom as UOM
,to_char(agv.gx_cp_length) as customer_length
,to_char(agv.gx_cp_th_weight) as theoretical_weight
,agv.gx_cp_surfacetreatment as finish
,oel.attribute18 as order_qty_pcs
, to_char(agv.gx_cp_th_weight * agv.gx_cp_length * oel.ordered_quantity) as order_qty_kgs
,XX.xx_om.GetLineUnitSellingPrice(oel.line_id) sales_value
,XX.xx_om.GetHeaderUnitSellingPrice(oeh.header_id) total_value
,oel.ordered_quantity ordered_quantity
-- Production
, tbl0.qty_completed as qty_completed
,disc.wip_entity_id as wip_entity_id
,( select wip_entity_name from APPS.wip_entities ent
where ent.wip_entity_id = disc.wip_entity_id) job_name
,disc.status_type
,disc.date_released as job_released_date
, DECODE ( disc.date_completed, NULL, disc.date_completed,
to_date(to_char(to_date(TO_CHAR(disc.date_completed-interval '7' hour,'DD-MON-YYYY')||'00:00:00','DD-MON-YYYY HH24:MI:SS'), 'DD-MON-YYYY HH24:MI:SS'), 'DD-MON-YYYY HH24:MI:SS')) as date_completed
, DECODE ( disc.date_closed, NULL, disc.date_closed,
to_date(to_char(to_date(TO_CHAR(disc.date_closed-interval '7' hour,'DD-MON-YYYY')||'00:00:00','DD-MON-YYYY HH24:MI:SS'), 'DD-MON-YYYY HH24:MI:SS'), 'DD-MON-YYYY HH24:MI:SS')) as date_closed
, disc.net_quantity
, ( select opr2.quantity_completed
from APPS.wip_operations opr2
where opr2.wip_entity_id = disc.wip_entity_id
and opr2.operation_seq_num = (select max(opr.operation_seq_num)
from APPS.wip_operations opr, APPS.wip_discrete_jobs disc2
where opr.wip_entity_id = disc2.wip_entity_id
and disc2.wip_entity_id = disc.wip_entity_id))* agv.gx_cp_length * agv.gx_cp_th_weight as prod_qty_kgs
,gx_cp_alloy as alloy
,gx_cp_profile_id as profile
-- Theoretical Order cost
,tbl0.TOT_THEORY_COST_RELIEVED
-- Actual Order cost
,tbl0.TOT_ACTUAL_COST_INCURRED
from XX.XXREPORT_Lzero_TBL tbl0
join APPS.oe_order_headers_all oeh on oeh.header_id = tbl0.header_id
join APPS.oe_order_lines_all oel on oeh.org_id = oel.org_id and oeh.header_id = oel.header_id
join APPS.wip_discrete_jobs disc on oel.line_id = disc.source_line_id
join APPS.ego_gx_configured_pr_agv agv on agv.inventory_item_id= oel.inventory_item_id
where oel.link_to_line_id is null
)a
) xx;
There's almost certainly no short and simple solution to tuning this query. The problem here is its size and complexity. Lack of performance is merely a consequence.
As a first step I would consider taking a break from the keyboard. Grab a pen and paper and in plain English (or whichever "human" language you prefer) write the questions you want answered from your database via this query. Then ask yourself what columns/variables/attributes do you absolutely need to answer those questions? Write them down as well.
Now, do you really need all of those columns, nested joins, selects, and so forth to produce those variables? Maybe, but probably not. The key point here is to focus on only the data/information you need (YAGNI) and from there map out a picture of the bare relationships you need to produce the information that answers your question. In other words, work from the outside in, not the other way around.
I realize that this perhaps sounds a bit abstract and vague, but the whole point is that maintaining clear and simple code is always an ongoing struggle. Keeping your eye on the objective at hand will help keep your head out of the weeds.
Finally, a few more specific thoughts at a glance:
Do you really need that union? Try to do without it if you can.
Nesting sucks. Nested nesting especially sucks. Keep things flat whenever possible and practical.
Is it possible or practical to split this into independent, smaller queries?
Use more descriptive names for your variables, add comments judiciously.
Learn and master the SQL EXPLAIN command.

Finding time difference between two times in different rows

Scenario: Database for Biometric device. It inserts
EmpId, EmpName, DepName, RecTime, RecDate
It gets inserted when User Enters office and swipes finger and then 2nd time when he leaves office. RecTime saves Entry time then Exit time.
Problem:
I want to calculate the total time a person has worked by finding the difference, in hours and minutes, between RecTime in the first record inserted and RecTime in the second record inserted.
I tried DateDiff function, DateSub etc but nothing worked
Not going to solve your problem as there is insufficient data to do so.
The approach I would follow is:
CREATE TABLE #EmpLogging
(
LogID INT,
EmpId INT,
EmpName VARCHAR(50),
DepName VARCHAR(50),
RecTime TIME,
RecDate DATE
)
INSERT INTO #EmpLogging
SELECT 1, 1, 'Fred', 'Legal', '08:00:00', '2013-01-01' UNION ALL
SELECT 2, 2, 'Susan', 'Marketing', '08:03:00', '2013-01-01' UNION ALL
SELECT 3, 1, 'Fred', 'Legal', '17:00:00', '2013-01-01' UNION ALL
SELECT 4, 2, 'Susan', 'Marketing', '17:55:00', '2013-01-01'
;WITH EmpSequence AS
(
SELECT *
,EmpSeq = ROW_NUMBER() OVER (PARTITION BY EmpId ORDER BY RecDate, RecTime)
FROM #EmpLogging
)
,EndTime AS
(
SELECT E1.*
,OutTime = E2.RecTime
,OutDate = E2.RecDate
FROM EmpSequence E1
LEFT
JOIN EmpSequence E2 ON E1.EmpId = E2.EmpId
AND E1.EmpSeq = E2.EmpSeq - 1
)
SELECT *
,MINUTETimeFrame = DATEDIFF(MI, RecTime, OutTime)
FROM EndTime
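The same pairing idea (order each employee's swipes by time, then match each entry with the swipe that follows it) can be sketched outside the database. The Python below is an illustration only: the LOG list reuses the sample rows above with date and time merged into one string, and minutes_worked is a hypothetical helper name. Unlike the SQL, it pairs the 1st swipe with the 2nd and the 3rd with the 4th, so an exit is never treated as the start of a new interval.

```python
from collections import defaultdict
from datetime import datetime

# (EmpId, swipe time) taken from the sample rows above.
LOG = [
    (1, "2013-01-01 08:00:00"),
    (2, "2013-01-01 08:03:00"),
    (1, "2013-01-01 17:00:00"),
    (2, "2013-01-01 17:55:00"),
]


def minutes_worked(log):
    """Total minutes per employee, pairing swipes as (in, out), (in, out), ..."""
    swipes = defaultdict(list)
    for emp_id, stamp in log:
        swipes[emp_id].append(datetime.strptime(stamp, "%Y-%m-%d %H:%M:%S"))

    totals = {}
    for emp_id, times in swipes.items():
        times.sort()
        # Even positions are entries, odd positions are exits;
        # a trailing entry without an exit is ignored by zip().
        pairs = zip(times[0::2], times[1::2])
        totals[emp_id] = sum(
            int((t_out - t_in).total_seconds() // 60) for t_in, t_out in pairs
        )
    return totals


print(minutes_worked(LOG))  # {1: 540, 2: 592}
```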

Finding overlapping dates

I have a set of meeting rooms, and meetings in them that have a start date and an end date. A set of meeting rooms belongs to a building.
The meeting details are kept in the MeetingDetail table, which has a startDate and an endDate.
Now I want to run a report between two points in time, say reportStartDate and reportEndDate, which finds the time slots in which all the meeting rooms are booked for a given building.
Table structure
MEETING_ROOM - ID, ROOMNAME, BUILDING_NO
MEETING_DETAIL - ID, MEETING_ROOM_ID, START_DATE, END_DATE
The query has to be run for reportStartDate and reportEndDate.
Just to clarify further, the aim is to find all the time slots in which all the meeting rooms were booked in a given time period of reportStartDate and reportEndDate
For SQL Server 2005+ you could try the following (see note at the end for mysql)
WITH TIME_POINTS (POINT_P) AS
(SELECT DISTINCT START_DATE FROM MEETING_DETAIL
WHERE START_DATE > @reportStartDate AND START_DATE < @reportEndDate
UNION SELECT DISTINCT END_DATE FROM MEETING_DETAIL
WHERE END_DATE > @reportStartDate AND END_DATE < @reportEndDate
UNION SELECT @reportEndDate
UNION SELECT @reportStartDate),
TIME_SLICE (START_T, END_T) AS
(SELECT A.POINT_P, MIN(B.POINT_P) FROM
TIME_POINTS A
INNER JOIN TIME_POINTS B ON A.POINT_P < B.POINT_P -- next time point after A
GROUP BY A.POINT_P),
SLICE_MEETINGS (START_T, END_T, MEETING_ROOM_ID, BUILDING_NO) AS
(SELECT START_T, END_T, MEETING_ROOM_ID, BUILDING_NO FROM
TIME_SLICE A
INNER JOIN MEETING_DETAIL B ON B.START_DATE <= A.START_T AND B.END_DATE >= A.END_T
INNER JOIN MEETING_ROOM C ON B.MEETING_ROOM_ID = C.ID),
SLICE_COUNT (START_T, END_T, BUILDING_NO, ROOMS_C) AS
(SELECT START_T, END_T, BUILDING_NO, COUNT(DISTINCT MEETING_ROOM_ID) FROM
SLICE_MEETINGS
GROUP BY START_T, END_T, BUILDING_NO),
ROOMS_BUILDING (BUILDING_NO, ROOMS_C) AS
(SELECT BUILDING_NO, COUNT(ID) FROM
MEETING_ROOM
GROUP BY BUILDING_NO)
SELECT B.BUILDING_NO, A.START_T, A.END_T
FROM SLICE_COUNT A
INNER JOIN ROOMS_BUILDING B ON A.BUILDING_NO = B.BUILDING_NO AND B.ROOMS_C = A.ROOMS_C;
What it does (each step corresponds to one CTE definition above):
Get all the time markers, i.e. end or start times
Get all time slices i.e. the smallest unit of time between which there is no other time marker (i.e. no meetings start in a time slice, it's either at the beginning or at the end of a time slice)
Get meetings for each time slice, so now you get something like
10.30 11.00 Room1 BuildingA
10.30 11.00 Room2 BuildingA
11.00 12.00 Room1 BuildingA
Get counts of rooms booked per building per time slice
Keep only the time-slice/building combinations where the number of booked rooms matches the number of rooms in the building
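The five steps above can be sketched in plain Python to make the time-slice idea concrete. This is an illustration under assumptions, not a translation of the SQL: the function name fully_booked_slots and the in-memory data layout (a dict of rooms, a list of meeting tuples) are made up, and the sample rooms and meetings are borrowed from the worked example in the other answer on this page.

```python
from collections import Counter, defaultdict
from datetime import datetime


def fully_booked_slots(rooms, meetings, report_start, report_end):
    """rooms: {room_id: building_no}; meetings: [(room_id, start, end), ...]."""
    # Step 1: all time markers inside the report window, plus its borders.
    points = {report_start, report_end}
    for _, start, end in meetings:
        points.update(p for p in (start, end) if report_start < p < report_end)
    points = sorted(points)

    rooms_per_building = Counter(rooms.values())  # step 5 input
    result = []
    # Step 2: consecutive markers form the time slices.
    for slice_start, slice_end in zip(points, points[1:]):
        # Steps 3 and 4: rooms with a meeting covering the whole slice, per building.
        booked = defaultdict(set)
        for room_id, start, end in meetings:
            if start <= slice_start and end >= slice_end:
                booked[rooms[room_id]].add(room_id)
        # Step 5: keep slices where every room of the building is booked.
        for building, room_ids in sorted(booked.items()):
            if len(room_ids) == rooms_per_building[building]:
                result.append((building, slice_start, slice_end))
    return result


def t(hhmm):
    return datetime.strptime("2010-08-01 " + hhmm, "%Y-%m-%d %H:%M")


ROOMS = {1: 6, 2: 6, 3: 7, 4: 7}
MEETINGS = [
    (1, t("09:00"), t("10:00")),
    (1, t("10:00"), t("11:00")),
    (2, t("10:00"), t("11:00")),
    (3, t("10:00"), t("11:00")),
]

slots = fully_booked_slots(ROOMS, MEETINGS, t("08:00"), t("12:00"))
print(slots)  # building 6 is fully booked from 10:00 to 11:00
```

Using a set of room ids per building means a room with two overlapping bookings is still counted once.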
Edit
Since MySQL versions before 8.0 don't support the WITH clause, there you'll have to construct a view for each of the 5 WITH clauses above. Everything else would remain the same.
After reading your comment, I think I understand the problem a bit better. As a first step I would generate a matrix of meeting rooms and time slots using cross join:
select *
from (
select distinct start_date
, end_date
from @meeting_detail
) ts
cross join
@meeting_room mr
Then, for each cell in the matrix, add meetings in that timeslot:
left join
@meeting_detail md
on mr.id = md.meeting_room_id
and ts.start_date < md.end_date
and md.start_date < ts.end_date
And then demand that there are no free rooms. For example, by saying that the left join must succeed for all rooms and time slots. A left join succeeds if any field is not null:
group by
mr.building_no
, ts.start_date
, ts.end_date
having max(case when md.meeting_room_id is null
then 1 else 0 end) = 0
Here's a complete working example. It's written for SQL Server, and the table variables (@meeting_detail) won't work in MySQL. But the report generating query should work in most databases:
set nocount on
declare @meeting_room table (id int, roomname varchar(50),
building_no int)
declare @meeting_detail table (meeting_room_id int,
start_date datetime, end_date datetime)
insert @meeting_room (id, roomname, building_no)
select 1, 'Kitchen', 6
union all select 2, 'Ballroom', 6
union all select 3, 'Conservatory', 7
union all select 4, 'Dining Room', 7
insert @meeting_detail (meeting_room_id, start_date, end_date)
select 1, '2010-08-01 9:00', '2010-08-01 10:00'
union all select 1, '2010-08-01 10:00', '2010-08-01 11:00'
union all select 2, '2010-08-01 10:00', '2010-08-01 11:00'
union all select 3, '2010-08-01 10:00', '2010-08-01 11:00'
select mr.building_no
, ts.start_date
, ts.end_date
from (
select distinct start_date
, end_date
from @meeting_detail
) ts
cross join
@meeting_room mr
left join
@meeting_detail md
on mr.id = md.meeting_room_id
and ts.start_date < md.end_date
and md.start_date < ts.end_date
group by
mr.building_no
, ts.start_date
, ts.end_date
having max(case when md.meeting_room_id is null
then 1 else 0 end) = 0
This prints:
building_no start end
6 2010-08-01 10:00:00.000 2010-08-01 11:00:00.000