How to do a very special grouping in Oracle SQL

I have a table like this in Oracle SQL:
Could you please shed some light on how to merge consecutive rows with the same activity so that their periods are combined into one row? The result should look like the following in Oracle SQL:
Thanks in advance!

Assuming there are no gaps, you can use lead() and lag(), without aggregation:
select activity, start_date,
coalesce(lead(start_date) over (order by start_date) - interval '1' second,
max_end_date
) as end_date
from (select t.*,
lag(activity) over (order by start_date) as prev_activity,
max(end_date) over () as max_end_date
from t
) t
where prev_activity is null or prev_activity <> activity;
Note: I think it is a very bad idea to have the end time be one second before midnight. I think your data should be structured with dates -- with no time components -- for both the start and end. Then, comparisons would use < for the end time.

It's a gaps-and-islands problem. You can try the following:
select activity, min(start_date) as start_date, max(end_date) as end_date
from
(select t.*, row_number() over(order by start_date) -
row_number() over(partition by activity order by start_date) as grp
from t
) a
group by activity, grp
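To see why the difference of the two row numbers identifies each run of consecutive rows, here is a minimal Python sketch of the same computation. The sample rows and the name merge_islands are made up for illustration, and plain dates are used instead of timestamps.

```python
from datetime import date
from itertools import groupby

# Hypothetical sample rows: (activity, start_date, end_date).
rows = [
    ("Working", date(2020, 1, 1), date(2020, 1, 2)),
    ("Working", date(2020, 1, 3), date(2020, 1, 9)),
    ("Day Off", date(2020, 1, 10), date(2020, 1, 12)),
    ("Working", date(2020, 1, 13), date(2020, 1, 13)),
]

def merge_islands(rows):
    """Collapse consecutive rows with the same activity into one period."""
    ordered = sorted(rows, key=lambda r: r[1])
    overall = 0        # row_number() over (order by start_date)
    per_activity = {}  # row_number() over (partition by activity order by start_date)
    tagged = []
    for activity, start, end in ordered:
        overall += 1
        per_activity[activity] = per_activity.get(activity, 0) + 1
        # The difference stays constant while the same activity repeats.
        grp = overall - per_activity[activity]
        tagged.append((activity, grp, start, end))

    # Equivalent of: group by activity, grp
    key = lambda t: (t[0], t[1])
    result = []
    for (activity, _grp), members in groupby(sorted(tagged, key=key), key=key):
        members = list(members)
        result.append((activity,
                       min(m[2] for m in members),
                       max(m[3] for m in members)))
    return sorted(result, key=lambda r: r[1])

merged = merge_islands(rows)
for row in merged:
    print(row)
```

The first two "Working" rows both get grp 0 and are merged, while the last "Working" row gets grp 1 and stays separate.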

It's difficult to cover all cases based on just 4 sample records. You could also use pattern recognition with MATCH_RECOGNIZE:
WITH t(ACTIVITY, start_date, end_date) AS (
SELECT 'Working', TIMESTAMP '2020-01-01 00:00:00', TIMESTAMP '2020-01-02 23:59:59' FROM dual UNION ALL
SELECT 'Working', TIMESTAMP '2020-01-03 00:00:00', TIMESTAMP '2020-01-10 23:59:59' FROM dual UNION ALL
SELECT 'Day Off', TIMESTAMP '2020-01-10 00:00:00', TIMESTAMP '2020-01-12 23:59:59' FROM dual UNION ALL
SELECT 'Working', TIMESTAMP '2020-01-13 00:00:00', TIMESTAMP '2020-01-13 23:59:59' FROM dual)
SELECT *
FROM t
MATCH_RECOGNIZE (
ORDER BY end_date
MEASURES
FINAL MIN(start_date) AS start_date,
FINAL MAX(end_date) AS end_date,
FINAL LAST(ACTIVITY) AS ACTIVITY
PATTERN (a act*)
DEFINE
act AS PREV(act.ACTIVITY) = act.ACTIVITY
)

Related

Oracle subtract previous row

I've got this query:
SELECT user_id, from_loc_id, to_loc_id, to_char(dstamp, 'hh24:mi:ss')
FROM inventory_transaction
WHERE code = 'Pick'
AND substr(work_group,1,6) = 'BRANCH'
AND dstamp BETWEEN to_date('24/02/2022 17:00:00', 'dd/mm/yyyy hh24:mi:ss') AND
to_date('24/02/2022 18:00:00', 'dd/mm/yyyy hh24:mi:ss')
ORDER BY user_id;
That's the output:
My expected output is:
I was trying to use lag, but it didn't really work.
I've just realized I need to add a second ORDER BY key, so first by user, then by to_char(dstamp, 'hh24:mi:ss').
All solutions are much appreciated. Thank you.
You can pass the difference to the NUMTODSINTERVAL function with the 'day' argument and apply SUBSTR to extract the hours:minutes:seconds portion, since all your data falls within a single date:
SELECT t.user_id,
t.dstamp,
SUBSTR(
NUMTODSINTERVAL(dstamp - LAG(dstamp)
OVER (PARTITION BY user_id ORDER BY dstamp),'day'),
12,8) AS time_diff
FROM t
Edit: The query above assumes the column dstamp is of the DATE data type. If its data type is TIMESTAMP, use the following query, which casts to DATE instead:
SELECT t.user_id,
t.dstamp,
SUBSTR(
NUMTODSINTERVAL(CAST(dstamp AS date) - LAG(CAST(dstamp AS date))
OVER (PARTITION BY user_id ORDER BY CAST(dstamp AS date)),'day'),
12,8) AS time_diff
FROM t
Demo
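As a cross-check of what the LAG-based queries compute, here is a small Python sketch of the same idea: for each user, subtract the previous timestamp and format the difference as hh:mi:ss. The user names, timestamps and the function name diffs_per_user are assumptions for illustration only.

```python
from datetime import datetime

# Hypothetical picks: (user_id, dstamp).
picks = [
    ("alice", datetime(2022, 2, 24, 17, 0, 0)),
    ("bob", datetime(2022, 2, 24, 17, 5, 0)),
    ("alice", datetime(2022, 2, 24, 17, 12, 34)),
    ("alice", datetime(2022, 2, 24, 17, 20, 0)),
]

def diffs_per_user(picks):
    """Return (user, dstamp, time_diff) ordered by user, then dstamp."""
    previous = {}  # last dstamp seen per user, i.e. LAG(...) PARTITION BY user_id
    out = []
    for user, stamp in sorted(picks):
        prev = previous.get(user)
        if prev is None:
            diff = None  # first row of a user has no previous row
        else:
            total = int((stamp - prev).total_seconds())
            hours, rem = divmod(total, 3600)
            minutes, seconds = divmod(rem, 60)
            diff = f"{hours:02d}:{minutes:02d}:{seconds:02d}"
        out.append((user, stamp, diff))
        previous[user] = stamp
    return out

result = diffs_per_user(picks)
for row in result:
    print(row)
```

The first row of each user has no difference, which matches the NULL that LAG returns for the first row of a partition.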

Creating a query with a column of dates that I specify without the ability to create a temporary table

I have read-only access to my database and a table that looks like this:
I want an output that looks like the following
Is this possible? I think I would need to have a table of dates to pass through but I'm not sure how to go about that. Any help would be appreciated.
Use a sub-query factoring clause (WITH) to generate the dates to join:
WITH dates ( dt ) AS (
SELECT DATE '2016-07-01' FROM DUAL UNION ALL
SELECT DATE '2017-07-01' FROM DUAL UNION ALL
SELECT DATE '2019-08-01' FROM DUAL
)
SELECT EXTRACT(MONTH FROM d.dt) AS Month,
EXTRACT(YEAR FROM d.dt) AS Year,
NVL2(t.SomeValue, 'Y', 'N') AS "Change?"
FROM dates d
LEFT OUTER JOIN table_name t
ON ( TRUNC(t.effective_date, 'MM') = d.dt )
My desired output would include every month and year from 1/2012 on.
Then use a recursive sub-query factoring clause:
WITH dates ( dt ) AS (
SELECT DATE '2012-01-01' FROM DUAL
UNION ALL
SELECT ADD_MONTHS( dt, 1 )
FROM dates
WHERE ADD_MONTHS( dt, 1 ) <= SYSDATE
)
SELECT EXTRACT(MONTH FROM d.dt) AS Month,
EXTRACT(YEAR FROM d.dt) AS Year,
NVL2(t.SomeValue, 'Y', 'N') AS "Change?"
FROM dates d
LEFT OUTER JOIN table_name t
ON ( TRUNC(t.effective_date, 'MM') = d.dt )
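The logic of the recursive query can be sketched outside the database as well: generate the first day of every month in a range, then flag the months that contain an effective date. This Python sketch uses made-up effective dates and a fixed end date instead of SYSDATE, and it treats any row in a month as a change (the SQL additionally requires SomeValue to be non-null).

```python
from datetime import date

def month_starts(first, last):
    """First day of every month from first's month through last's month."""
    months = []
    year, month = first.year, first.month
    while (year, month) <= (last.year, last.month):
        months.append(date(year, month, 1))
        month += 1
        if month > 12:
            year, month = year + 1, 1
    return months

# Hypothetical effective dates taken from the table.
effective_dates = [date(2012, 2, 15), date(2012, 4, 3)]

# Equivalent of TRUNC(effective_date, 'MM').
changed_months = {d.replace(day=1) for d in effective_dates}

# Equivalent of the LEFT OUTER JOIN plus the Y/N flag.
report = [
    (m.month, m.year, "Y" if m in changed_months else "N")
    for m in month_starts(date(2012, 1, 1), date(2012, 4, 30))
]
for row in report:
    print(row)
```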
You can use recursion to break the records into whole years...
WITH
fragmented (
window_start,
window_close,
some_value,
interval_start,
interval_close
)
AS
(
SELECT
window_start,
window_close,
some_value,
interval_start,
CASE
WHEN add_months(interval_start, 12) < window_close
THEN add_months(interval_start, 12)
ELSE window_close
END
AS interval_close
FROM
(
SELECT
effective_date AS window_start,
LEAD(effective_date) OVER (ORDER BY effective_date) AS window_close,
some_value,
effective_date AS interval_start
FROM
example
)
lookahead
UNION ALL
SELECT
window_start,
window_close,
some_value,
add_months(interval_start, 12),
CASE
WHEN add_months(interval_start, 24) < window_close
THEN add_months(interval_start, 24)
ELSE window_close
END
FROM
fragmented
WHERE
interval_close < window_close
)
SELECT
*
FROM
fragmented
ORDER BY
window_start,
interval_start
;
Demo : https://dbfiddle.uk/?rdbms=oracle_18&fiddle=ca1fef00069c178c28e09d209db35395

Duplicate row based on date difference

Help needed. I have a set of data imported from Oracle into Tableau for calculation. To do that, I need to duplicate rows as shown in the table below. For example, if there is a date difference between start and end, I need to duplicate the row and assign codes 0, 1, ... depending on how many days the difference spans. The purpose is to use this in Tableau for time interval calculations. Thanks.
Pregenerate codes up to the maximum possible value and join the original table to the code series, so that the number of duplicates of each row is determined by the difference between the dates on that row:
with t (s,e) as (
select timestamp '2020-08-16 18:30:00', timestamp '2020-08-16 20:00:00' from dual union all
select timestamp '2020-08-17 08:00:00', timestamp '2020-08-18 08:00:00' from dual union all
select timestamp '2020-08-19 08:00:00', timestamp '2020-08-19 00:00:00' from dual union all
select timestamp '2020-08-20 10:00:00', timestamp '2020-08-22 03:00:00' from dual
), series (code) as (
select level - 1 from dual connect by level <= (select max(trunc(e) - trunc(s)) + 1 from t)
)
select t.*, series.code
from t
join series on trunc(e) - trunc(s) >= series.code
order by s,code;
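The effect of the join can be illustrated with a short Python sketch: each row is repeated once per code from 0 up to the number of calendar days between its start and end. The function name expand is hypothetical, and the sample periods are the ones from the query.

```python
from datetime import datetime

# Sample periods (s, e), same values as in the query.
periods = [
    (datetime(2020, 8, 16, 18, 30), datetime(2020, 8, 16, 20, 0)),
    (datetime(2020, 8, 17, 8, 0), datetime(2020, 8, 18, 8, 0)),
    (datetime(2020, 8, 19, 8, 0), datetime(2020, 8, 19, 0, 0)),
    (datetime(2020, 8, 20, 10, 0), datetime(2020, 8, 22, 3, 0)),
]

def expand(periods):
    """Duplicate each row once per code 0..(calendar-day difference)."""
    out = []
    for s, e in sorted(periods):
        # Equivalent of trunc(e) - trunc(s): whole calendar days, time ignored.
        day_diff = (e.date() - s.date()).days
        for code in range(day_diff + 1):
            out.append((s, e, code))
    return out

expanded = expand(periods)
for row in expanded:
    print(row)
```

Rows that start and end on the same day appear once with code 0, and the row spanning 20 to 22 August appears three times with codes 0, 1 and 2.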

SQL - How to find missing activity days using start_date and end_date

I have a few fields in a database that look like this:
trip_id
start_date
end_date
start_station_name
end_station_name
I need to write a query that shows all the stations with no activity on a particular day in the year 2015. I wrote the following query but it's not giving the right output:
select
start_station_name,
extract(date from start_date) as dt,
count(*)
from
trips_table
where
(
start_date >= timestamp('2015-01-01')
and
start_date < timestamp('2016-01-01')
)
group by
start_station_name,
dt
order by
count(*)
Can someone help come up with the right query? Thanks in advance!
The following is for BigQuery Standard SQL.
It assumes start_date and end_date are of DATE type.
It also assumes that all days between start_date and end_date are "dedicated" to the station in the start_station_name field. That is most likely not what is expected, but the question is missing details here, hence the assumption.
#standardSQL
WITH days AS (
SELECT day
FROM UNNEST(GENERATE_DATE_ARRAY('2015-01-01', '2015-12-31')) AS day
),
stations AS (
SELECT DISTINCT start_station_name AS station
FROM `trips_table`
)
SELECT s.*
FROM (SELECT * FROM stations CROSS JOIN days) AS s
LEFT JOIN (SELECT * FROM `trips_table`,
UNNEST(GENERATE_DATE_ARRAY(start_date, end_date)) AS day) AS a
ON s.day = a.day AND s.station = a.start_station_name
WHERE a.day IS NULL
You can test it with the simple dummy data below:
#standardSQL
WITH `trips_table` AS (
SELECT 1 AS trip_id, DATE '2015-01-01' AS start_date, DATE '2015-12-01' AS end_date, '111' AS start_station_name UNION ALL
SELECT 2, DATE '2015-12-10', DATE '2015-12-31', '111'
),
days AS (
SELECT day
FROM UNNEST(GENERATE_DATE_ARRAY('2015-01-01', '2015-12-31')) AS day
),
stations AS (
SELECT DISTINCT start_station_name AS station
FROM `trips_table`
)
SELECT s.*
FROM (SELECT * FROM stations CROSS JOIN days) AS s
LEFT JOIN (SELECT * FROM `trips_table`,
UNNEST(GENERATE_DATE_ARRAY(start_date, end_date)) AS day) AS a
ON s.day = a.day AND s.station = a.start_station_name
WHERE a.day IS NULL
ORDER BY station, day
The output looks like this:
station day
111 2015-12-02
111 2015-12-03
111 2015-12-04
111 2015-12-05
111 2015-12-06
111 2015-12-07
111 2015-12-08
111 2015-12-09
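The same anti-join idea, sketched in Python under the same assumption that every day between start_date and end_date counts as activity for the start station: build all (station, day) pairs for 2015 and keep the ones with no matching activity. The helper name date_range is hypothetical, and the trips are the dummy rows from the test query.

```python
from datetime import date, timedelta

def date_range(first, last):
    """All dates from first to last, inclusive (like GENERATE_DATE_ARRAY)."""
    return [first + timedelta(days=i) for i in range((last - first).days + 1)]

# Dummy trips: (start_station_name, start_date, end_date).
trips = [
    ("111", date(2015, 1, 1), date(2015, 12, 1)),
    ("111", date(2015, 12, 10), date(2015, 12, 31)),
]

# Every (station, day) pair that has activity.
active = {(station, day)
          for station, start, end in trips
          for day in date_range(start, end)}

stations = sorted({station for station, _, _ in trips})
all_days = date_range(date(2015, 1, 1), date(2015, 12, 31))

# Equivalent of CROSS JOIN + LEFT JOIN ... WHERE a.day IS NULL.
missing = [(station, day)
           for station in stations
           for day in all_days
           if (station, day) not in active]
for row in missing:
    print(row)
```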
Use recursion for this purpose. Try this in SQL Server:
WITH sample AS (
SELECT CAST('2015-01-01' AS DATETIME) AS dt
UNION ALL
SELECT DATEADD(dd, 1, dt)
FROM sample s
WHERE DATEADD(dd, 1, dt) < CAST('2016-01-01' AS DATETIME)
)
SELECT * FROM sample
Where CAST(sample.dt as date) NOT IN (
SELECT CAST(start_date as date)
FROM tablename
WHERE start_date >= '2015-01-01 00:00:00'
AND start_date < '2016-01-01 00:00:00'
)
Option(maxrecursion 0)
If you want the station data with it, you can use a left join:
WITH sample AS (
SELECT CAST('2015-01-01' AS DATETIME) AS dt
UNION ALL
SELECT DATEADD(dd, 1, dt)
FROM sample s
WHERE DATEADD(dd, 1, dt) < CAST('2016-01-01' AS DATETIME)
)
SELECT * FROM sample
left join tablename
on CAST(sample.dt as date) = CAST(tablename.start_date as date)
where sample.dt >= '2015-01-01 00:00:00' and sample.dt < '2016-01-01 00:00:00'
Option(maxrecursion 0)
For mysql, see this fiddle. I think this would help you....
SQL Fiddle Demo

What is the difference between preceding and following in teradata query

I am confused about why we use this min function. I am not able to understand how the snippet below works. Please guide me.
COALESCE( min((start_Date)) OVER (partition by Seq_id ORDER BY start_Date rows between 1 following and 1 following),cast( '9999-12-31 00:00:00' as timestamp(6)) end_Date FROM table.test1
This is your query:
SELECT COALESCE(min((start_Date)) OVER (partition by Seq_id
ORDER BY start_Date
rows between 1 following and 1 following
),
cast( '9999-12-31 00:00:00' as timestamp(6))
) as end_Date
FROM table.test1
This query does the same as:
SELECT COALESCE(LEAD(Start_Date) OVER (PARTITION BY seq_id ORDER BY start_date),
cast( '9999-12-31 00:00:00' as timestamp(6))
) as end_Date
That is, it is fetching the date value from the "next" row as defined by Start_Date.
I think this construct is used because (some versions of) Teradata do not support LEAD().
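To make the frame semantics concrete, here is a rough Python sketch of what ROWS BETWEEN 1 FOLLOWING AND 1 FOLLOWING selects: a frame holding at most one row, the next one in the partition, so MIN over it simply returns that row's value. The sample rows and the function name end_dates are made up for illustration.

```python
from datetime import datetime

FAR_FUTURE = datetime(9999, 12, 31)  # the COALESCE fallback value

def end_dates(rows):
    """rows: (seq_id, start_date) pairs. Returns (seq_id, start_date, end_date)."""
    by_seq = {}
    for seq_id, start in rows:
        by_seq.setdefault(seq_id, []).append(start)  # PARTITION BY seq_id

    out = []
    for seq_id in sorted(by_seq):
        starts = sorted(by_seq[seq_id])              # ORDER BY start_date
        for i, start in enumerate(starts):
            frame = starts[i + 1:i + 2]              # 1 FOLLOWING .. 1 FOLLOWING
            next_start = min(frame) if frame else None  # empty frame gives NULL
            out.append((seq_id, start,
                        next_start if next_start is not None else FAR_FUTURE))
    return out

rows = [
    (1, datetime(2020, 1, 1)),
    (1, datetime(2020, 2, 1)),
    (2, datetime(2020, 3, 1)),
]
result = end_dates(rows)
for row in result:
    print(row)
```

Because the frame never contains more than one row, MAX or any other aggregate that returns its single input would give the same result as MIN here.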
You will find a nice explanation of Teradata window functions and ROWS BETWEEN ... PRECEDING here:
http://pauldhip.blogspot.dk/2015/04/window-function-rows-between-preceding.html