I'm looking for a way to query the time difference (in seconds) between consecutive records, grouped by operation id.
When given this sample data:
datatable(timestamp:datetime, operation_id:string)
[
datetime(2021-07-15 12:45:37), 'abc',
datetime(2021-07-15 12:45:39), 'abc',
datetime(2021-07-15 13:29:12), 'def',
datetime(2021-07-15 13:29:14), 'def',
datetime(2021-07-15 13:29:17), 'def',
datetime(2021-07-15 13:29:23), 'def',
]
The expected output would be:
| operation_id | diff |
|--------------|------|
| abc          | 2    |
| def          | 2    |
| def          | 3    |
| def          | 6    |
Is this possible?
P.S. It's similar to this question, but I do not want the difference between the min and max; I want it for each record.
You can order the table and then use the prev() function:
datatable(ts:datetime, op_id:string)
[
datetime(2021-07-15 12:45:37), 'abc',
datetime(2021-07-15 12:45:39), 'abc',
datetime(2021-07-15 13:29:12), 'def',
datetime(2021-07-15 13:29:14), 'def',
datetime(2021-07-15 13:29:17), 'def',
datetime(2021-07-15 13:29:23), 'def',
]
| order by op_id asc, ts asc
| extend prev_ts = prev(ts), prev_op_id = prev(op_id)
| project op_id, diff = case(prev_op_id == op_id, (ts - prev_ts)/1s, double(null))
| where isnotnull(diff)
| op_id | diff |
|-------|------|
| abc   | 2    |
| def   | 2    |
| def   | 3    |
| def   | 6    |
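If you need the same per-record difference on a SQL engine instead of Kusto, LAG() is the standard equivalent of prev(). A minimal Postgres-flavoured sketch, assuming a hypothetical events(ts, op_id) table that mirrors the datatable above:
SELECT op_id, diff
FROM (
    SELECT op_id,
           EXTRACT(EPOCH FROM ts - LAG(ts) OVER (PARTITION BY op_id ORDER BY ts)) AS diff
    FROM events  -- hypothetical table mirroring the KQL sample
) t
WHERE diff IS NOT NULL
ORDER BY op_id, diff;
Partitioning by op_id makes the explicit prev_op_id comparison unnecessary: LAG() returns NULL at the start of each partition, and the outer filter drops those rows.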
I have a third party app that writes to Vertica database every 5 minutes. As a result, a sample table looks like this:
CREATE TABLE sample (
item_id int,
metric_val float,
ts timestamp
);
-- Hypothetical sample values in 2nd column; these can be any values
INSERT INTO sample VALUES(1, 11.0, '2022-03-29 00:00:00');
INSERT INTO sample VALUES(1, 11.1, '2022-03-29 00:05:00');
INSERT INTO sample VALUES(1, 11.2, '2022-03-29 00:10:00');
INSERT INTO sample VALUES(1, 11.3, '2022-03-29 00:15:00');
INSERT INTO sample VALUES(1, 11.4, '2022-03-29 00:20:00');
INSERT INTO sample VALUES(1, 11.5, '2022-03-29 00:25:00');
INSERT INTO sample VALUES(1, 11.6, '2022-03-29 00:30:00');
...
...
INSERT INTO sample VALUES(1, 12.1, '2022-03-29 01:00:00');
INSERT INTO sample VALUES(1, 12.2, '2022-03-29 01:05:00');
...
INSERT INTO sample VALUES(1, 13.1, '2022-03-29 02:00:00');
INSERT INTO sample VALUES(1, 13.2, '2022-03-29 02:05:00');
As a result, there are 288 (24 hours * 12 entries each hour) rows for each day for a given item. I want to retrieve the records at the top of each hour i.e.
1, 11.0, 2022-03-29 00:00:00
1, 12.0, 2022-03-29 01:00:00
1, 13.0, 2022-03-29 02:00:00
...
1, 101.0, 2022-03-30 00:00:00
1, 102.0, 2022-03-30 01:00:00
I tried the query below, but the challenge is how to increment the value of 'n':
WITH a AS (
SELECT item_id, metric_val, ts, ROW_NUMBER() OVER (PARTITION BY ts, HOUR(ts) ORDER BY ts) AS n
FROM sample WHERE item_id = 1
)
SELECT * FROM a WHERE n = 1
The Vertica TIME_SLICE function seems promising, but I couldn't make it work even after multiple attempts. Any advice would be appreciated.
SELECT version();
Vertica Analytic Database v10.1.1-0
Seems pretty simple - or am I missing something?
Just filter for the rows whose ts, truncated to the hour ('HH'), is equal to ts ...
WITH sample (item_id, metric_val, ts) AS (
-- Hypothetical sample values in 2nd column; these can be any values
SELECT 1, 11.0, TIMESTAMP '2022-03-29 00:00:00'
UNION ALL SELECT 1, 11.1, TIMESTAMP '2022-03-29 00:05:00'
UNION ALL SELECT 1, 11.2, TIMESTAMP '2022-03-29 00:10:00'
UNION ALL SELECT 1, 11.3, TIMESTAMP '2022-03-29 00:15:00'
UNION ALL SELECT 1, 11.4, TIMESTAMP '2022-03-29 00:20:00'
UNION ALL SELECT 1, 11.5, TIMESTAMP '2022-03-29 00:25:00'
UNION ALL SELECT 1, 11.6, TIMESTAMP '2022-03-29 00:30:00'
UNION ALL SELECT 1, 12.1, TIMESTAMP '2022-03-29 01:00:00'
UNION ALL SELECT 1, 12.2, TIMESTAMP '2022-03-29 01:05:00'
UNION ALL SELECT 1, 13.1, TIMESTAMP '2022-03-29 02:00:00'
UNION ALL SELECT 1, 13.2, TIMESTAMP '2022-03-29 02:05:00'
)
SELECT
*
FROM sample
WHERE TRUNC(ts,'HH') = ts;
-- out item_id | metric_val | ts
-- out ---------+------------+---------------------
-- out 1 | 11.0 | 2022-03-29 00:00:00
-- out 1 | 12.1 | 2022-03-29 01:00:00
-- out 1 | 13.1 | 2022-03-29 02:00:00
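If the data were not aligned exactly on the hour, the TIME_SLICE route from the question can still work: bucket the timestamps into hour slices and keep the earliest row per slice. A sketch, assuming you want the first reading of each hour per item:
WITH a AS (
    SELECT item_id, metric_val, ts,
           -- one bucket per hour; the earliest row in the bucket gets n = 1
           ROW_NUMBER() OVER (
               PARTITION BY item_id, TIME_SLICE(ts, 1, 'HOUR')
               ORDER BY ts
           ) AS n
    FROM sample
    WHERE item_id = 1
)
SELECT item_id, metric_val, ts
FROM a
WHERE n = 1;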
I have a table of phone calls consisting of user_id, call_date, city,
where city can be either A or B.
It looks like this:
| user_id | call_date  | city |
|---------|------------|------|
| 1       | 2021-01-01 | A    |
| 1       | 2021-01-02 | B    |
| 1       | 2021-01-03 | B    |
| 1       | 2021-01-05 | B    |
| 1       | 2021-01-10 | A    |
| 1       | 2021-01-12 | B    |
| 1       | 2021-01-16 | A    |
| 2       | 2021-01-17 | A    |
| 2       | 2021-01-20 | B    |
| 2       | 2021-01-22 | B    |
| 2       | 2021-01-23 | A    |
| 2       | 2021-01-24 | B    |
| 2       | 2021-01-26 | B    |
| 2       | 2021-01-30 | A    |
For this table, we need to select for each user all the periods when he was in city B.
These periods are counted in days and start when the first call is made from city B, and end as soon as the next call is made from city A.
So for user_id = 1 the first period starts on 2021-01-02 and ends on 2021-01-10. There can be several such periods for each user.
The result should be the following table:
| user_id | period_1 | period_2 |
|---------|----------|----------|
| 1       | 8        | 4        |
| 2       | 3        | 6        |
Can you please tell me how I can limit the periods according to the condition of the problem, and then calculate the datediff within each period?
Thank you
This is a typical gaps and islands problem. You need to group consecutive rows first, then find the first call_date of the next group. Sample code for Postgres is below; the same approach can be adapted to another DBMS by applying the appropriate function to calculate the difference in days.
with a (user_id, call_date, city)
as (
select *
from ( values
('1', date '2021-01-01', 'A'),
('1', date '2021-01-02', 'B'),
('1', date '2021-01-03', 'B'),
('1', date '2021-01-05', 'B'),
('1', date '2021-01-10', 'A'),
('1', date '2021-01-12', 'B'),
('1', date '2021-01-16', 'A'),
('2', date '2021-01-17', 'A'),
('2', date '2021-01-20', 'B'),
('2', date '2021-01-22', 'B'),
('2', date '2021-01-23', 'A'),
('2', date '2021-01-24', 'B'),
('2', date '2021-01-26', 'B'),
('2', date '2021-01-30', 'A')
) as t
)
, grp as (
/*Identify groups*/
select a.*,
/*This is a grouping of consecutive rows:
  rows of the same island keep the same difference between
  the two rankings, because the per-user rank and the
  per-user-per-city rank advance in step until the city changes.
*/
dense_rank() over(
partition by user_id
order by call_date asc
) -
dense_rank() over(
partition by user_id, city
order by call_date asc
) as grp,
/*Get next call date*/
lead(call_date, 1, call_date)
over(
partition by user_id
order by call_date asc
) as next_dt
from a
)
select
user_id,
city,
min(call_date) as dt_from,
max(next_dt) as dt_to,
max(next_dt) - min(call_date) as diff
from grp
where city = 'B'
group by user_id, grp, city
order by 1, 3
user_id | city | dt_from | dt_to | diff
:------ | :--- | :--------- | :--------- | ---:
1 | B | 2021-01-02 | 2021-01-10 | 8
1 | B | 2021-01-12 | 2021-01-16 | 4
2 | B | 2021-01-20 | 2021-01-23 | 3
2 | B | 2021-01-24 | 2021-01-30 | 6
db<>fiddle here
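If you also need the pivoted period_1/period_2 shape from the question, number the islands per user and aggregate conditionally. A sketch over the per-period rows produced above (values copied from the output; it assumes at most two periods per user, so extend the filter list if there can be more):
with islands (user_id, dt_from, diff) as (
    values ('1', date '2021-01-02', 8),
           ('1', date '2021-01-12', 4),
           ('2', date '2021-01-20', 3),
           ('2', date '2021-01-24', 6)
)
select user_id,
       max(diff) filter (where period_no = 1) as period_1,
       max(diff) filter (where period_no = 2) as period_2
from (
    select i.*,
           row_number() over (partition by user_id order by dt_from) as period_no
    from islands i
) t
group by user_id
order by user_id;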
I am having some difficulty with writing an accurate view.
I have 2 tables that I am looking to join on different databases.
Table 1 (in database 1) contains 3 columns:
Purchase_date
Item_id
Quantity_purchased
Table 2 (in database 2) contains 3 columns:
Item_id
Price_effective_date
Price
I am trying to determine the price of the item at the purchase date, which is a challenge since the item prices change on price effective dates. Accordingly, table 2 will have multiple instances of the same item_id, but with different prices and price effective dates.
My current code is:
select tb1.*,
tb2.price * tb1.quantity_purchased as total_price
from "Database2"."schema"."Table1" tb1
left join (select item_id,
price
from "Database2"."Schema"."Table2"
) tb2
on tb1.item_id = tb2.item_id
where tb2.price_effective_date <= tb1.purchase_date
I want to limit my results to the price at the most recent price_effective_date that is just before the purchase_date.
Any recommendations?
It's not really Snowflake specific, and luckily it can be addressed with a pretty common pattern in SQL queries.
Let's prepare some data (btw, for the future, it's best to provide the exact setup like this in your questions, it helps investigations tremendously):
create or replace table tb1(purchase_date date, item_id int, quantity int);
insert into tb1 values
('2020-01-01', 101, 1),
('2020-06-30', 101, 1),
('2020-07-01', 101, 1),
('2020-12-31', 101, 1),
('2021-01-01', 101, 1),
('2020-01-01', 102, 1),
('2020-06-30', 102, 1),
('2020-07-01', 102, 1),
('2020-12-31', 102, 1),
('2021-01-01', 102, 1);
create or replace table tb2(item_id int, effective_date date, price decimal);
insert into tb2 values
(101, '2020-01-01', 10),
(101, '2021-01-01', 11),
(102, '2020-01-01', 20),
(102, '2020-07-01', 18),
(102, '2021-01-01', 22);
Now, what you want is to join records from tb1 and tb2 on item_id, but only use the records from tb2 where effective_date is the largest of all the values of effective_date for that item that are at or before purchase_date. Correct? If you phrase it like this, the SQL almost writes itself:
select tb1.*, tb2.effective_date, tb2.price
from tb1 join tb2 on tb1.item_id = tb2.item_id
where tb2.effective_date = (
select max(effective_date)
from tb2 sub
where sub.effective_date <= tb1.purchase_date
and sub.item_id = tb1.item_id
)
order by tb1.item_id, purchase_date;
The result is hopefully what you want:
| PURCHASE_DATE | ITEM_ID | QUANTITY | EFFECTIVE_DATE | PRICE |
|---------------|---------|----------|----------------|-------|
| 2020-01-01    | 101     | 1        | 2020-01-01     | 10    |
| 2020-12-31    | 101     | 1        | 2020-01-01     | 10    |
| 2021-01-01    | 101     | 1        | 2021-01-01     | 11    |
| 2020-01-01    | 102     | 1        | 2020-01-01     | 20    |
| 2020-06-30    | 102     | 1        | 2020-01-01     | 20    |
| 2020-07-01    | 102     | 1        | 2020-07-01     | 18    |
| 2020-12-31    | 102     | 1        | 2020-07-01     | 18    |
| 2021-01-01    | 102     | 1        | 2021-01-01     | 22    |
Note, this query will not handle wrong data, e.g. purchases with no matching items and effective dates.
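As an aside, Snowflake's QUALIFY clause can express the same "latest effective date at or before the purchase" rule without a correlated subquery. A sketch, assuming (item_id, purchase_date) identifies each purchase row uniquely:
select tb1.*, tb2.effective_date, tb2.price,
       tb2.price * tb1.quantity as total_price
from tb1
join tb2
  on tb2.item_id = tb1.item_id
 and tb2.effective_date <= tb1.purchase_date
-- keep only the price row with the latest qualifying effective_date
qualify row_number() over (
    partition by tb1.item_id, tb1.purchase_date
    order by tb2.effective_date desc
) = 1
order by tb1.item_id, tb1.purchase_date;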
EDIT: Handling missing effective_dates
To handle cases where no effective date falls at or before the purchase date, you can identify the "missing" purchases and then fall back to the smallest existing effective_date for those items, e.g. (we add a new item, 103, to the existing tables to showcase this):
insert into tb1 values
('2020-06-01', 103, 11),
('2020-08-01', 103, 12);
insert into tb2 values
(103, '2020-07-01', 30);
with missing as (
select * from tb1 where not exists (
select * from tb2
where tb2.effective_date <= tb1.purchase_date
and tb2.item_id = tb1.item_id)
)
select m.item_id, m.purchase_date, m.quantity,
(select min(effective_date) from tb2 where tb2.item_id = m.item_id) best_date
from missing m;
You can take this query and UNION ALL it with the original query.
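For completeness, a sketch of that UNION ALL, with the fallback branch joined back to tb2 so those rows also carry a price:
select tb1.*, tb2.effective_date, tb2.price
from tb1 join tb2 on tb1.item_id = tb2.item_id
where tb2.effective_date = (
    select max(effective_date)
    from tb2 sub
    where sub.effective_date <= tb1.purchase_date
      and sub.item_id = tb1.item_id
)
union all
-- fallback: no effective date at or before the purchase, use the earliest one
select tb1.*, tb2.effective_date, tb2.price
from tb1 join tb2 on tb1.item_id = tb2.item_id
where not exists (
    select 1 from tb2 sub
    where sub.effective_date <= tb1.purchase_date
      and sub.item_id = tb1.item_id
)
and tb2.effective_date = (
    select min(effective_date)
    from tb2 sub
    where sub.item_id = tb1.item_id
)
order by item_id, purchase_date;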
I have a table as follows:
| Sn no. | t_time             | Value  | rate |
|--------|--------------------|--------|------|
| ABC    | 17-MAY-18 08:00:00 | 100.00 | 3    |
| ABC    | 17-MAY-18 22:00:00 | 200.00 | 1    |
| ABC    | 16-MAY-18 08:00:00 | 100.00 | 1    |
| XYZ    | 14-MAY-18 01:00:00 | 700.00 | 1    |
| XYZ    | 15-MAY-18 10:00:00 | 500.00 | 2    |
| XYZ    | 15-MAY-18 13:00:00 | 100.00 | 2    |
And I want to generate the output as follows:
| Sn no. | New_value |
|--------|-----------|
| ABC    | 150       |
| XYZ    | 450       |
It is grouped by Sn no. New_value takes, for each date, the row with the latest time, multiplies its Value by its rate, and then averages those products per Sn no.
For example, ABC's new_value is the average of (100*1) and (200*1), i.e. 150.
It's a large dataset, so how do I write this query in the most efficient way? Please help.
You can use the analytic function ROW_NUMBER() to achieve the result:
WITH cte_table(Snno, t_time, Value, rate) AS (
    SELECT 'ABC', to_date('2018-05-17 08:00:00', 'YYYY-MM-DD HH24:MI:SS'), 100.00, 3 FROM DUAL UNION ALL
    SELECT 'ABC', to_date('2018-05-17 22:00:00', 'YYYY-MM-DD HH24:MI:SS'), 200.00, 1 FROM DUAL UNION ALL
    SELECT 'ABC', to_date('2018-05-16 08:00:00', 'YYYY-MM-DD HH24:MI:SS'), 100.00, 1 FROM DUAL UNION ALL
    SELECT 'XYZ', to_date('2018-05-14 01:00:00', 'YYYY-MM-DD HH24:MI:SS'), 700.00, 1 FROM DUAL UNION ALL
    SELECT 'XYZ', to_date('2018-05-15 10:00:00', 'YYYY-MM-DD HH24:MI:SS'), 500.00, 2 FROM DUAL UNION ALL
    SELECT 'XYZ', to_date('2018-05-15 13:00:00', 'YYYY-MM-DD HH24:MI:SS'), 100.00, 2 FROM DUAL),
--------------------------------
-- End of data preparation
--------------------------------
rn_table AS (
    -- latest row per Snno per calendar day
    SELECT t.*, row_number() OVER (PARTITION BY Snno, TRUNC(t_time) ORDER BY t_time DESC) AS rn
    FROM cte_table t)
SELECT snno,
       AVG(Value * rate) AS new_value
FROM rn_table
WHERE rn = 1
GROUP BY snno;
Output:
SNNO NEW_VALUE
---- ----------
ABC 150
XYZ 450
Use the ROW_NUMBER (or RANK/DENSE_RANK if it is more appropriate) analytic function in a sub-query and then aggregate in the outer query:
SQL Fiddle
Oracle 11g R2 Schema Setup:
CREATE TABLE table_name ( Snno, t_time, Value, rate ) AS
SELECT 'ABC', TIMESTAMP '2018-05-17 08:00:00', 100.00, 3 FROM DUAL UNION ALL
SELECT 'ABC', TIMESTAMP '2018-05-17 22:00:00', 200.00, 1 FROM DUAL UNION ALL
SELECT 'ABC', TIMESTAMP '2018-05-16 08:00:00', 100.00, 1 FROM DUAL UNION ALL
SELECT 'XYZ', TIMESTAMP '2018-05-14 01:00:00', 700.00, 1 FROM DUAL UNION ALL
SELECT 'XYZ', TIMESTAMP '2018-05-15 10:00:00', 500.00, 2 FROM DUAL UNION ALL
SELECT 'XYZ', TIMESTAMP '2018-05-15 13:00:00', 100.00, 2 FROM DUAL;
Query 1:
SELECT snno,
       AVG( value * rate ) AS new_value
FROM (
  SELECT t.*,
         ROW_NUMBER() OVER (
           PARTITION BY snno, TRUNC( t_time )
           ORDER BY t_time DESC
         ) AS rn
  FROM table_name t
)
WHERE rn = 1
GROUP BY snno
Results:
| SNNO | NEW_VALUE |
|------|-----------|
| ABC  |       150 |
| XYZ  |       450 |
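Another Oracle option is to collapse each day with MAX(...) KEEP (DENSE_RANK LAST ...) instead of filtering on ROW_NUMBER(); a sketch against the same table:
SELECT snno,
       AVG( day_value ) AS new_value
FROM (
    -- value*rate of the latest row per Snno per day
    SELECT snno,
           MAX( value * rate ) KEEP ( DENSE_RANK LAST ORDER BY t_time ) AS day_value
    FROM table_name
    GROUP BY snno, TRUNC( t_time )
)
GROUP BY snno;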
Below is an example table.
DECLARE #Temp TABLE (ID int, Name varchar(50), LiveDate Date, LiveTime time(7), Duration_Seconds int)
INSERT INTO #Temp (ID, Name, LiveDate, LiveTime, Duration_Seconds)
SELECT 1, 'ABC', '2013-08-19', '00:01:00.0000000', 300
UNION ALL
SELECT 2, 'ABC', '2013-08-19', '00:01:00.0000000', 300
UNION ALL
SELECT 3, 'DEF', '2013-08-19', '00:01:00.0000000', 300
UNION ALL
SELECT 4, 'DEF', '2013-08-19', '00:03:00.0000000', 300
UNION ALL
SELECT 5, 'GHI', '2013-08-19', '00:01:00.0000000', 300
UNION ALL
SELECT 6, 'GHI', '2013-08-19', '00:01:00.0000000', 300
UNION ALL
SELECT 7, 'GHI', '2013-08-19', '00:03:00.0000000', 300
UNION ALL
SELECT 8, 'GHI', '2013-08-19', '00:09:00.0000000', 300
UNION ALL
SELECT 9, 'GHI', '2013-08-20', '00:06:00.0000000', 300
UNION ALL
SELECT 10, 'JKL', '2013-08-19', '00:01:00.0000000', 300
UNION ALL
SELECT 11, 'MNO', '2013-08-19', '00:01:00.0000000', 300
SELECT *,
CASE
WHEN COUNT(*) OVER (PARTITION BY Name, LiveDate, LiveTime) > 1 THEN 1
ELSE 0
END AS Duplicate
FROM #Temp
Now, the output that I desire is the following.
/*
Desired Output
ID Name LiveDate Livetime Duration_Seconds Duplicate OverLap
1 ABC 2013-08-19 00:01:00.0000000 300 Yes No
2 ABC 2013-08-19 00:01:00.0000000 300 Yes No
3 DEF 2013-08-19 00:01:00.0000000 300 No Yes
4 DEF 2013-08-19 00:03:00.0000000 300 No Yes
5 GHI 2013-08-19 00:01:00.0000000 300 Yes Yes
6 GHI 2013-08-19 00:01:00.0000000 300 Yes Yes
7 GHI 2013-08-19 00:03:00.0000000 300 No Yes
8 GHI 2013-08-19 00:09:00.0000000 300 No No
9 GHI 2013-08-20 00:06:00.0000000 300 No No
10 JKL 2013-08-19 00:01:00.0000000 300 No No
11 MNO 2013-08-19 00:01:00.0000000 300 No No
*/
How may I go about doing this? Any help would be appreciated.
I am unsure of how to find Overlap.
For Overlap to be Yes/True/1, the Name and Date have to be the same.
Then, we have to look at the time and duration.
Let's say for GHI, time = 12:01 for IDs 5 and 6, and 12:03 for ID 7.
But according to the duration, which is 300 seconds (5 minutes), since 12:03 is within 5 minutes of 12:01, I want to mark Overlap = Yes/True/1 for those three records.
Consider LiveTime as the start time and Duration_Seconds as the total time the record was live.
So GHI IDs 5 & 6 have LiveTime = 12:01 AM and lasted for 300 seconds (5 minutes), so they went live at 12:01 AM and were dead at 12:06 AM.
GHI ID 7 went live at 12:03 AM with the SAME Name and Date, but it should not have, since we already have a record live from 12:01 AM to 12:06 AM with the same Name and Date. Therefore, those three GHI records are marked as Overlap = Yes/True/1.
Hope this helps you understand what I am trying to do.
THX
This will work, there might be a simpler way:
;WITH cte AS ( SELECT *,
CASE
WHEN COUNT(*) OVER (PARTITION BY Name, LiveDate, LiveTime) > 1 THEN 1
ELSE 0
END AS Duplicate
FROM #Temp)
SELECT DISTINCT a.*,CASE WHEN b.ID IS NOT NULL THEN 1 ELSE 0 END 'Overlap'
FROM cte a
LEFT JOIN cte b
ON a.NAME = b.NAME
AND a.LiveDate = b.LiveDate
AND ((b.LiveTime > a.Livetime AND b.LiveTime < DATEADD(SECOND,a.Duration_Seconds,a.LiveTime))
OR (a.LiveTime > b.Livetime AND a.LiveTime < DATEADD(SECOND,b.Duration_Seconds,b.LiveTime)))
You might have to adjust the JOIN criteria if the above doesn't work for all instances of overlap.
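One possible simplification is to phrase the interval test with EXISTS, which avoids the DISTINCT; a sketch using the standard "b starts before a ends, a starts before b ends" overlap test, with exact duplicates excluded to match the desired output:
;WITH cte AS (
    SELECT *,
           CASE WHEN COUNT(*) OVER (PARTITION BY Name, LiveDate, LiveTime) > 1
                THEN 1 ELSE 0 END AS Duplicate
    FROM #Temp
)
SELECT a.*,
       CASE WHEN EXISTS (
           SELECT 1
           FROM cte b
           WHERE b.Name = a.Name
             AND b.LiveDate = a.LiveDate
             AND b.LiveTime <> a.LiveTime  -- exact duplicates are not overlaps
             AND b.LiveTime < DATEADD(SECOND, a.Duration_Seconds, a.LiveTime)
             AND a.LiveTime < DATEADD(SECOND, b.Duration_Seconds, b.LiveTime)
       ) THEN 1 ELSE 0 END AS Overlap
FROM cte a;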