What is the difference between preceding and following in teradata query - sql

I am confused about why the min function is used here. I am not able to understand how the snippet below works. Please guide.
COALESCE(min((start_Date)) OVER (partition by Seq_id ORDER BY start_Date rows between 1 following and 1 following), cast( '9999-12-31 00:00:00' as timestamp(6))) end_Date FROM table.test1

This is your query:
SELECT COALESCE(min((start_Date)) OVER (partition by Seq_id
ORDER BY start_Date
rows between 1 following and 1 following
),
cast( '9999-12-31 00:00:00' as timestamp(6))
) as end_Date
FROM table.test1
This query is equivalent to:
SELECT COALESCE(LEAD(Start_Date) OVER (PARTITION BY seq_id ORDER BY start_date),
cast( '9999-12-31 00:00:00' as timestamp(6))
) as end_Date
FROM table.test1
That is, it is fetching the date value from the "next" row as defined by Start_Date.
I think this construct is used because (some versions of) Teradata do not support LEAD().
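To see the equivalence in action, here is a minimal sketch that runs both forms side by side. It uses SQLite through Python's sqlite3 module rather than Teradata, and the table contents are made-up sample rows, so treat it as an illustration of the frame semantics only.

```python
import sqlite3

# Hypothetical sample data; SQLite stands in for Teradata here.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE test1 (seq_id INTEGER, start_date TEXT);
    INSERT INTO test1 VALUES
        (1, '2020-01-01 00:00:00'),
        (1, '2020-02-01 00:00:00'),
        (1, '2020-03-01 00:00:00'),
        (2, '2020-05-01 00:00:00');
""")

# The frame holds exactly one row (the next one), so MIN just returns that row's value.
frame_sql = """
    SELECT seq_id, start_date,
           COALESCE(MIN(start_date) OVER (PARTITION BY seq_id ORDER BY start_date
                                          ROWS BETWEEN 1 FOLLOWING AND 1 FOLLOWING),
                    '9999-12-31 00:00:00') AS end_date
    FROM test1
    ORDER BY seq_id, start_date
"""

# The same result expressed with LEAD().
lead_sql = """
    SELECT seq_id, start_date,
           COALESCE(LEAD(start_date) OVER (PARTITION BY seq_id ORDER BY start_date),
                    '9999-12-31 00:00:00') AS end_date
    FROM test1
    ORDER BY seq_id, start_date
"""

frame_rows = conn.execute(frame_sql).fetchall()
lead_rows = conn.execute(lead_sql).fetchall()
for row in frame_rows:
    print(row)
```

The last row of each partition has an empty frame, so MIN returns NULL and COALESCE substitutes the high date.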

You will find a nice explanation of the Teradata window function clause ROWS BETWEEN ... PRECEDING AND ... PRECEDING here:
http://pauldhip.blogspot.dk/2015/04/window-function-rows-between-preceding.html

Related

Oracle substract previous row

I've got this query:
SELECT user_id, from_loc_id, to_loc_id, to_char(dstamp, 'hh24:mi:ss')
FROM inventory_transaction
WHERE code = 'Pick'
AND substr(work_group,1,6) = 'BRANCH'
AND dstamp BETWEEN to_date('24/02/2022 17:00:00', 'dd/mm/yyyy hh24:mi:ss') AND
to_date('24/02/2022 18:00:00', 'dd/mm/yyyy hh24:mi:ss')
ORDER BY user_id;
That's the output:
My expected output is:
I was trying to use lag, but it didn't really work.
I've just realized I need to add a second ORDER BY, so first by user, second by to_char(dstamp, 'hh24:mi:ss').
All solutions are much appreciated. Thank you.
You can use the NUMTODSINTERVAL function with the 'day' argument and apply SUBSTR to extract the hours:minutes:seconds portion, since your data falls within a single date, such as
SELECT t.user_id,
t.dstamp,
SUBSTR(
NUMTODSINTERVAL(dstamp - LAG(dstamp)
OVER (PARTITION BY user_id ORDER BY dstamp),'day'),
12,8) AS time_diff
FROM t
Edit: The query above assumes that the column dstamp is of the date data type. If its data type is timestamp, use the following query, which casts to date, instead
SELECT t.user_id,
t.dstamp,
SUBSTR(
NUMTODSINTERVAL(CAST(dstamp AS date) - LAG(CAST(dstamp AS date))
OVER (PARTITION BY user_id ORDER BY CAST(dstamp AS date)),'day'),
12,8) AS time_diff
FROM t
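The LAG-per-user idea can be tried without an Oracle instance. The sketch below is an assumption-laden stand-in: it uses SQLite via Python's sqlite3, invented sample rows, and SQLite's strftime/time functions in place of NUMTODSINTERVAL and SUBSTR.

```python
import sqlite3

# Hypothetical pick timestamps for two users; not the asker's real data.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE t (user_id TEXT, dstamp TEXT);
    INSERT INTO t VALUES
        ('A', '2022-02-24 17:00:00'),
        ('A', '2022-02-24 17:05:30'),
        ('A', '2022-02-24 17:20:00'),
        ('B', '2022-02-24 17:10:00'),
        ('B', '2022-02-24 17:10:45');
""")

# Difference in seconds to the previous row of the same user,
# rendered as hh:mm:ss. The first row per user has no predecessor, so it is NULL.
rows = conn.execute("""
    SELECT user_id, dstamp,
           time(strftime('%s', dstamp)
                - strftime('%s', LAG(dstamp) OVER (PARTITION BY user_id ORDER BY dstamp)),
                'unixepoch') AS time_diff
    FROM t
    ORDER BY user_id, dstamp
""").fetchall()

for row in rows:
    print(row)
```

PARTITION BY user_id is what keeps one user's first pick from being compared with another user's last pick.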

Window function for average

I have this table timestamp_table and I'm using Presto SQL
timestamp | id
2021-01-01 10:00:00 | 2456
I would like to compute the number of unique IDs in the last 24 and 48 hours and I thought this could be achieved with window functions but I'm struggling. This is my proposed solution, but it needs work
SELECT COUNT(id) OVER (PARTITION BY timestamp ORDER BY timestamp RANGE BETWEEN INTERVAL '24' HOUR PRECEDING AND CURRENT ROW)
You're probably having trouble due to the PARTITION BY clause, since the COUNT will only apply to rows within the same timestamp values.
Try something like this, as a starting point:
SELECT *
, COUNT(id) OVER (ORDER BY timestamp RANGE BETWEEN INTERVAL '24' HOUR PRECEDING AND CURRENT ROW)
, MIN(id) OVER (ORDER BY timestamp RANGE BETWEEN INTERVAL '24' HOUR PRECEDING AND CURRENT ROW)
FROM tbl
;
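As a rough way to experiment with the RANGE frame, the following sketch runs a similar query in SQLite via Python's sqlite3. SQLite has no INTERVAL literals, so the timestamp is converted to epoch seconds and the frame is 86400 seconds wide; the rows are invented, the column is named ts here, and numeric RANGE offsets are assumed to be available (SQLite 3.28 or later). Like the starting point above, it counts rows, not distinct IDs.

```python
import sqlite3

# Hypothetical rows; 'ts' replaces the 'timestamp' column name for clarity.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE tbl (ts TEXT, id INTEGER);
    INSERT INTO tbl VALUES
        ('2021-01-01 10:00:00', 2456),
        ('2021-01-01 22:00:00', 2457),
        ('2021-01-02 09:00:00', 2456),
        ('2021-01-03 08:00:00', 2458);
""")

# No PARTITION BY: every row looks back over all rows in the previous 24 hours.
rows = conn.execute("""
    SELECT ts, id,
           COUNT(id) OVER (ORDER BY CAST(strftime('%s', ts) AS INTEGER)
                           RANGE BETWEEN 86400 PRECEDING AND CURRENT ROW) AS cnt_24h
    FROM tbl
    ORDER BY ts
""").fetchall()

for row in rows:
    print(row)
```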
I think that you can't get data for both time intervals in one table scan, because a row that is in the last 24 hours must be in both groups: 24 hours and 48 hours. So you must run two queries or union them.
select 'h24', count(distinct id)
from timestamp_table
where timestamp < current_timestamp and timestamp >= date_add('day', -1, current_timestamp)
union all
select 'h48', count(distinct id)
from timestamp_table
where timestamp < current_timestamp and timestamp >= date_add('day', -2, current_timestamp)

Monthly data not reflecting properly

Need last four months data:
select count(distinct session_id)
from master_gui partition for (to_date('11-25-2020','MM-DD-YYYY'))
where session_id in (select distinct session_id
from reporting_data partition for (to_date('11-25-2020','MM-DD-YYYY'))
where flow_name in ('BEGIN_STATUS'));
Any suggestions on how to change the above query to include dates for the last 4 months?
Checked the partition key columns with the query below:
SELECT OWNER, NAME, OBJECT_TYPE, COLUMN_NAME, COLUMN_POSITION FROM ALL_PART_KEY_COLUMNS
REPORTING_USER REPORTING_DATA TABLE CREATE_TIME 1
REPORTING_USER MASTER_GUI TABLE SESSION_START_TIME 1
Using the query below to get the last 4 months of records (Aug, Sept, Oct and Nov):
select count(distinct session_id)
from master_gui where SESSION_START_TIME >= add_months(trunc(sysdate), -4)
and session_id in (select distinct session_id from reporting_data where CREATE_TIME>= add_months(trunc(sysdate), -4)
and flow_name in ('BEGIN_STATUS'));
Thanks Experts,
I used the query below after the changes. Is it correct?
Since the count has to come from the master_gui table, I used it with its partition key column SESSION_START_TIME, and the reporting_data table with its partition key column CREATE_TIME.
select count(distinct session_id)
from master_gui where SESSION_START_TIME < trunc(sysdate,'mm')
and SESSION_START_TIME >= add_months( trunc(sysdate, 'mm'),-4)
and session_id in (select distinct session_id from REPORTING_DATA where create_time < trunc(sysdate,'mm')
and create_time >= add_months( trunc(sysdate, 'mm'),-4)
and flow_name in ('BEGIN_STATUS'));
Thanks experts,
Is the query below correct, and will it perform better? I removed the distinct clause from the inner subquery.
select count(distinct session_id)
from master_gui where SESSION_START_TIME < trunc(sysdate,'mm')
and SESSION_START_TIME >= add_months( trunc(sysdate, 'mm'),-4)
and session_id in (select session_id from REPORTING_DATA where create_time < trunc(sysdate,'mm')
and create_time >= add_months( trunc(sysdate, 'mm'),-4)
and flow_name in ('BEGIN_STATUS'));
Thanks Experts,
I need to use the partition clause only, to get faster performance:
select count(distinct session_id)
from master_gui partition for (to_date('11-01-2020','MM-DD-YYYY'))
where session_id in (select distinct session_id from reporting_data partition for (to_date('11-30-2020','MM-DD-YYYY'))
where flow_name in ('BEGIN_STATUS'));
Is the above query correct for 1st Nov 2020 to 30th Nov 2020?
This part of your query means you are selecting records only from the partition which holds values for 25-NOV-2020.
from reporting_data partition for (to_date('11-25-2020','MM-DD-YYYY'))
Therefore if your table is partitioned by daily intervals you will get records only for the 25th. If the partition key is monthly you will get records only for November. Using this syntax you could only get records for the last four months if the partition key is (say) annual.
The solution is simply to omit the partition clause and use a WHERE clause instead.
select count(distinct session_id)
from master_gui
where session_id in (select distinct session_id
from reporting_data
where <<partition_key_column>> >= sysdate - interval '4' month
and flow_name in ('BEGIN_STATUS'))
and <<partition_key_column>> >= sysdate - interval '4' month;
This query will still use partition pruning.
is it correct?
Looks like what I suggested. However, you have refined "last four months" to mean the last four complete months i.e. excluding the current month. My search criteria includes the current month. So maybe what you actually need is something like
select session_id
from reporting_data
where create_time < trunc(sysdate,'mm')
and create_time >= add_months( trunc(sysdate, 'mm'),-4)
This will provide a span from 01-AUG-2020 to 30-NOV-2020.
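The boundaries of that half-open range can be checked with a small sketch. It uses SQLite date modifiers via Python's sqlite3 as a stand-in for Oracle's trunc and add_months, assumes a hypothetical current date of 2020-12-15, and uses invented rows placed right on the edges of the range.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
today = "2020-12-15"  # assumed "sysdate" for this illustration

# Equivalent of add_months(trunc(sysdate, 'mm'), -4) and trunc(sysdate, 'mm').
lower, upper = conn.execute(
    "SELECT date(?, 'start of month', '-4 months'), date(?, 'start of month')",
    (today, today),
).fetchone()
print(lower, upper)

# Hypothetical rows sitting on both edges of the range.
conn.executescript("""
    CREATE TABLE reporting_data (session_id INTEGER, create_time TEXT);
    INSERT INTO reporting_data VALUES
        (1, '2020-07-31 23:59:59'),
        (2, '2020-08-01 00:00:00'),
        (3, '2020-11-30 23:59:59'),
        (4, '2020-12-01 00:00:00');
""")

# Inclusive lower bound, exclusive upper bound.
kept = [r[0] for r in conn.execute(
    "SELECT session_id FROM reporting_data "
    "WHERE create_time >= ? AND create_time < ? ORDER BY session_id",
    (lower, upper),
)]
print(kept)
```

Using `<` on the first day of the current month avoids having to spell out the last second of the previous month.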
Incidentally, you don't need the DISTINCT in the subquery. The IN clause will handle duplicates so DISTINCT just adds unnecessary work, which could matter if you're dealing with large amounts of data.
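A quick sanity check of the DISTINCT point, sketched in SQLite via Python's sqlite3 with invented rows (the table names follow the question, the data does not): the count is the same with and without DISTINCT in the subquery.

```python
import sqlite3

# Hypothetical rows with duplicate session_ids on both sides.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE master_gui (session_id INTEGER);
    CREATE TABLE reporting_data (session_id INTEGER, flow_name TEXT);
    INSERT INTO master_gui VALUES (1), (2), (3), (3);
    INSERT INTO reporting_data VALUES
        (1, 'BEGIN_STATUS'),
        (1, 'BEGIN_STATUS'),
        (3, 'BEGIN_STATUS'),
        (2, 'OTHER');
""")

template = """
    SELECT COUNT(DISTINCT session_id)
    FROM master_gui
    WHERE session_id IN (SELECT {distinct} session_id
                         FROM reporting_data
                         WHERE flow_name IN ('BEGIN_STATUS'))
"""

# IN only tests membership, so duplicates in the subquery do not change the result.
with_distinct = conn.execute(template.format(distinct="DISTINCT")).fetchone()[0]
without_distinct = conn.execute(template.format(distinct="")).fetchone()[0]
print(with_distinct, without_distinct)
```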
There's a DATE datatype column, I presume. If so, include it into the where clause, e.g.
... and date_column >= add_months(trunc(sysdate, 'mm'), -4)

How to do a very special grouping in Oracle SQL

I have a table as this in Oracle SQL
Could you please shed some light on how to connect all activities so that consecutive periods of the same activity are merged into one row, so that the result looks like the following in Oracle SQL:
Thanks in advance!
Assuming there are no gaps, then you can use lead() and lag() -- without aggregation:
select activity, start_date,
coalesce(lead(start_date) over (order by start_date) - interval '1' second,
max_end_date
)
from (select t.*,
lag(activity) over (order by start_date) as prev_activity,
max(end_date) over () as max_end_date
from t
) t
where prev_activity is null or prev_activity <> activity;
Note: I think it is a very bad idea to have the end time be one second before midnight. I think your data should be structured with dates -- with no time components -- for both the start and end. Then, comparisons would use < for the end time.
It's a gap & island problem - You can try the below
select activity, min(startdate) as startdate, max(enddate) as enddate
from
(select t.*, row_number() over (order by startdate) -
row_number() over (partition by activity order by startdate) as grp
from t
) a
group by activity, grp
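The row-number difference trick can be tried on a tiny data set. This is a sketch only: it runs in SQLite via Python's sqlite3, and the table name t, the column names and the four rows are assumptions, since the original table was not shown.

```python
import sqlite3

# Hypothetical contiguous activity periods.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE t (activity TEXT, start_date TEXT, end_date TEXT);
    INSERT INTO t VALUES
        ('Working', '2020-01-01', '2020-01-02'),
        ('Working', '2020-01-03', '2020-01-10'),
        ('Day Off', '2020-01-11', '2020-01-12'),
        ('Working', '2020-01-13', '2020-01-13');
""")

# Rows of one unbroken run share the same difference between the overall
# row number and the per-activity row number, so that difference labels the island.
rows = conn.execute("""
    SELECT activity, MIN(start_date) AS start_date, MAX(end_date) AS end_date
    FROM (SELECT t.*,
                 ROW_NUMBER() OVER (ORDER BY start_date)
                 - ROW_NUMBER() OVER (PARTITION BY activity ORDER BY start_date) AS grp
          FROM t) a
    GROUP BY activity, grp
    ORDER BY MIN(start_date)
""").fetchall()

for row in rows:
    print(row)
```

The two 'Working' rows at the start collapse into one period, while the later 'Working' row stays separate because the 'Day Off' row interrupts the run.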
It's difficult to cover all cases based on just 4 sample records. You could also use pattern recognition with MATCH_RECOGNIZE
WITH t(ACTIVITY, start_date, end_date) AS (
SELECT 'Working', TIMESTAMP '2020-01-01 00:00:00', TIMESTAMP '2020-01-02 23:59:59' FROM dual UNION ALL
SELECT 'Working', TIMESTAMP '2020-01-03 00:00:00', TIMESTAMP '2020-01-10 23:59:59' FROM dual UNION ALL
SELECT 'Day Off', TIMESTAMP '2020-01-10 00:00:00', TIMESTAMP '2020-01-12 23:59:59' FROM dual UNION ALL
SELECT 'Working', TIMESTAMP '2020-01-13 00:00:00', TIMESTAMP '2020-01-13 23:59:59' FROM dual)
SELECT *
FROM t
MATCH_RECOGNIZE (
ORDER BY end_date
MEASURES
FINAL MIN(start_date) AS start_date,
FINAL MAX(end_date) AS end_date,
FINAL LAST(ACTIVITY) AS ACTIVITY
PATTERN (a act*)
DEFINE
act AS PREV(act.ACTIVITY) = act.ACTIVITY
)

How to select a row having a column with max value with a group by

I have a table with the next columns
MSG_ID NOT NULL NUMBER(10)
CREATION_DATE DATE
PORT VARCHAR2(50)
MESSAGE VARCHAR2(1024)
IP_ADDRESS VARCHAR2(50)
PARSED NUMBER(1)
PARSED_ON DATE
Where parse time is parsed_on - creation_date.
I would like to know if it is possible, in a single query, to extract for each hour the message that takes longest to parse, getting the HOUR, PORT, MSG_ID and MINUTES. I am blocked here:
select TO_CHAR(CREATION_DATE, 'HH24') || ':mm' HOUR, PORT, MSG_ID, ROUND(MAX(parsed_on - creation_date)) * 24*60 MINUTES
from T_INCOME_CALLS
where TO_CHAR(CREATION_DATE, 'dd/mm/yyyy') = TO_CHAR(SYSDATE, 'dd/mm/yyyy')
group by TO_CHAR(CREATION_DATE, 'HH24'), PORT, MSG_ID
order by TO_CHAR(CREATION_DATE, 'HH24') ;
You can use window function row_number to find row with largest parse time in each hour like this:
select *
from (
select to_number(to_char(creation_date, 'HH24')) as hour,
port,
msg_id,
round((parsed_on - creation_date) * 24 * 60) as parse_time,
row_number() over (
partition by to_char(creation_date, 'HH24')
order by (parsed_on - creation_date) desc nulls last
) as rn
from t_income_calls t
where creation_date between trunc(sysdate)
and trunc(sysdate + 1) - interval '1' second
) t
where rn = 1;
Also, notice the filter. I used date range instead of to_char on creation_date. The use of to_char on creation_date inhibits the use of index on creation_date if it is present.
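Here is a small sketch of the row_number pattern, partitioned by hour only, which is one reading of "the slowest message for each hour". It runs in SQLite via Python's sqlite3 with invented rows, and uses strftime in place of Oracle's to_char and date arithmetic.

```python
import sqlite3

# Hypothetical messages; msg_id 5 has not been parsed yet.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE t_income_calls (msg_id INTEGER, creation_date TEXT,
                                 port TEXT, parsed_on TEXT);
    INSERT INTO t_income_calls VALUES
        (1, '2024-01-01 09:05:00', 'P1', '2024-01-01 09:07:00'),
        (2, '2024-01-01 09:30:00', 'P2', '2024-01-01 09:40:00'),
        (3, '2024-01-01 10:10:00', 'P1', '2024-01-01 10:11:00'),
        (4, '2024-01-01 10:20:00', 'P1', '2024-01-01 10:25:00'),
        (5, '2024-01-01 10:50:00', 'P2', NULL);
""")

# Rank messages within each hour by parse time, longest first.
# SQLite already sorts NULLs last in DESC order, so no NULLS LAST is needed here.
rows = conn.execute("""
    SELECT hour, port, msg_id, parse_minutes
    FROM (SELECT strftime('%H', creation_date) AS hour,
                 port,
                 msg_id,
                 (strftime('%s', parsed_on) - strftime('%s', creation_date)) / 60.0
                     AS parse_minutes,
                 ROW_NUMBER() OVER (
                     PARTITION BY strftime('%H', creation_date)
                     ORDER BY strftime('%s', parsed_on) - strftime('%s', creation_date) DESC
                 ) AS rn
          FROM t_income_calls)
    WHERE rn = 1
    ORDER BY hour
""").fetchall()

for row in rows:
    print(row)
```

Adding more columns to PARTITION BY narrows each group. If a unique column such as msg_id is included, every row becomes its own group and gets rn = 1.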
I have assumed that the need is for the item that takes most time, per hour, for a grouping of IP_ADDRESS and PORT, which is different to your original query. I am also assuming MSG_ID is unique.
If you want one and only one row per recorded hour then use row_number(); if, however, you want tied values as well, substitute dense_rank() in the query below. The creation_date column has been used as a tie-breaker for sorting.
SELECT
TO_CHAR(CREATION_DATE, 'HH24') || ':mm' HOUR
, PORT, MSG_ID
, ROUND((parsed_on - creation_date) * 24*60) MINUTES
FROM (
SELECT
T_INCOME_CALLS.*
, ROW_NUMBER() OVER(PARTITION BY IP_ADDRESS, port, TO_CHAR(CREATION_DATE, 'HH24')
ORDER BY (parsed_on - creation_date) desc, CREATION_DATE) AS rn
FROM T_INCOME_CALLS
WHERE CREATION_DATE >= TRUNC(SYSDATE) AND CREATION_DATE < TRUNC(SYSDATE) + 1
)
WHERE rn = 1
Please avoid converting dates into strings in your where clause; this is not efficient. Instead, leave creation_date untouched and adjust the criteria to suit that data, which allows indexes to be used for the filtering.
You can also get it without a sub-query by using the FIRST function:
SELECT TO_CHAR(CREATION_DATE, 'HH24') || ':mm' HOUR, PORT, MSG_ID,
MAX(MESSAGE) KEEP (DENSE_RANK FIRST ORDER BY (parsed_on - creation_date) desc, CREATION_DATE)
FROM T_INCOME_CALLS
WHERE CREATION_DATE >= TRUNC(SYSDATE) AND CREATION_DATE < TRUNC(SYSDATE) + 1
GROUP BY TO_CHAR(CREATION_DATE, 'HH24'), PORT, MSG_ID
ORDER BY TO_CHAR(CREATION_DATE, 'HH24');