Select a column from a different row - sql

I have the following table on AWS Redshift:
node_id power_source timestamp
----------------------------------------------
108 LINE 2019-09-10 09:15:30
108 BATT 2019-09-10 10:20:15
108 LINE 2019-09-10 13:45:00
108 LINE 2019-09-11 06:00:15
108 BATT 2019-09-12 05:50:15
108 BATT 2019-09-12 12:15:15
108 LINE 2019-09-12 18:45:15
108 LINE 2019-09-13 09:20:15
108 BATT 2019-09-14 11:20:15
108 BATT 2019-09-14 13:30:15
108 BATT 2019-09-14 15:30:15
108 LINE 2019-09-14 16:48:36
108 LINE 2019-09-15 09:20:15
I am trying to figure out how long (cumulative) the node's power_source is on 'BATT'. I am thinking that I could do a datediff on the timestamps, but I would need to get the timestamp of the first 'LINE' row after the 'BATT' row (based on ts). Not really sure how to get that value though. Once I have that, then I could just SUM() the datediff.
Edit:
Here is the expected result
node_id power_source timestamp ts_line_power ts_diff(in mins)
-----------------------------------------------------------------------------------------
108 BATT 2019-09-10 10:20:15 2019-09-10 13:45:00 205
108 BATT 2019-09-12 05:50:15 2019-09-12 18:45:15 775
108 BATT 2019-09-14 11:20:15 2019-09-14 16:48:36 328
Any help/assistance would be appreciated

If I understand correctly, you can use lead():
select node_id,
sum(datediff(minute, timestamp, next_ts)) as diff_in_minutes
from (select t.*,
lead(timestamp) over (partition by node_id order by timestamp) as next_ts
from t
) t
where power_source = 'BATT'
group by node_id;
This gets the timestamp after the BATT record and uses that to define the end time.
EDIT:
The above is the overall total across all "BATT" rows. You have a gaps-and-islands problem. For that, you can assign a group by counting the number of "LINE" records at or after each row. This puts each run of "BATT" records in the same group as the "LINE" record that ends it.
This is all window functions and aggregation:
select node_id, min(timestamp) as batt_start, max(timestamp) as ts_line_power,
       datediff(minute, min(timestamp), max(timestamp)) as diff_in_minutes
from (select t.*,
             -- Redshift requires an explicit frame clause when an aggregate window function has an ORDER BY
             sum( (power_source = 'LINE')::int ) over (partition by node_id order by timestamp desc rows unbounded preceding) as grp
      from t
     ) t
group by node_id, grp
having sum( (power_source = 'BATT')::int) > 0; -- only include groups that have at least one BATT
Note that this assumes that only "LINE" and "BATT" are valid values for the power source.
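To make the grouping step concrete, here is a minimal sketch that runs the same gaps-and-islands idea against the sample rows using Python's built-in sqlite3 module (SQLite 3.25+ for window functions). It is not Redshift: the column is renamed to ts, and datediff(minute, ...) is approximated by subtracting whole epoch minutes, both of which are assumptions made for this illustration.

```python
import sqlite3

# Sample rows copied from the question. The column is called ts here
# (an assumption) instead of timestamp.
ROWS = [
    (108, "LINE", "2019-09-10 09:15:30"),
    (108, "BATT", "2019-09-10 10:20:15"),
    (108, "LINE", "2019-09-10 13:45:00"),
    (108, "LINE", "2019-09-11 06:00:15"),
    (108, "BATT", "2019-09-12 05:50:15"),
    (108, "BATT", "2019-09-12 12:15:15"),
    (108, "LINE", "2019-09-12 18:45:15"),
    (108, "LINE", "2019-09-13 09:20:15"),
    (108, "BATT", "2019-09-14 11:20:15"),
    (108, "BATT", "2019-09-14 13:30:15"),
    (108, "BATT", "2019-09-14 15:30:15"),
    (108, "LINE", "2019-09-14 16:48:36"),
    (108, "LINE", "2019-09-15 09:20:15"),
]

QUERY = """
SELECT node_id,
       MIN(ts) AS batt_start,
       MAX(ts) AS ts_line_power,
       -- whole epoch minutes, mimicking datediff(minute, ...) boundary counting
       CAST(strftime('%s', MAX(ts)) AS INTEGER) / 60
         - CAST(strftime('%s', MIN(ts)) AS INTEGER) / 60 AS diff_in_minutes
FROM (SELECT t.*,
             -- number of LINE rows at or after this row: one group per LINE row
             SUM(power_source = 'LINE') OVER (
                 PARTITION BY node_id ORDER BY ts DESC
                 ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS grp
      FROM t) AS g
GROUP BY node_id, grp
HAVING SUM(power_source = 'BATT') > 0   -- drop groups with no BATT row
ORDER BY batt_start
"""


def batt_intervals(rows=ROWS):
    """Return (node_id, batt_start, ts_line_power, diff_in_minutes) tuples."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE t (node_id INTEGER, power_source TEXT, ts TEXT)")
    con.executemany("INSERT INTO t VALUES (?, ?, ?)", rows)
    result = con.execute(QUERY).fetchall()
    con.close()
    return result


if __name__ == "__main__":
    for row in batt_intervals():
        print(row)
```

On the sample data this yields three intervals of 205, 775 and 328 minutes (05:50:15 to 18:45:15 is 12 h 55 min, i.e. 775 minutes). Summing diff_in_minutes per node_id in an outer query gives the cumulative time on battery.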

Related

Summing column that is grouped - SQL

I have a query:
SELECT
date,
COUNT(o.row_number) FILTER (WHERE o.row_number > 1 AND date_ddr IS NOT NULL AND telephone_number <> 'Anonymous') repeat_calls_24h
FROM
(
SELECT
date,
telephone_number,
date_ddr,
ROW_NUMBER() OVER(PARTITION BY ddr.telephone_number ORDER BY ddr.date) row_number
FROM
table_a ddr
) o
GROUP BY 1
Generating the following table:
date        Repeat calls_24h
17/09/2022  182
18/09/2022  381
19/09/2022  81
20/09/2022  24
21/09/2022  91
22/09/2022  110
23/09/2022  231
What can I add to my query to provide a sum of the previous three days as below?:
date        Repeat calls_24h  Repeat Calls 3d
17/09/2022  182
18/09/2022  381
19/09/2022  81                644
20/09/2022  24                486
21/09/2022  91                196
22/09/2022  110               225
23/09/2022  231               432
Thanks
We can do it using lag.
select "date"
,"Repeat calls_24h"
,"Repeat calls_24h" + lag("Repeat calls_24h") over(order by "date") + lag("Repeat calls_24h", 2) over(order by "date") as "Repeat Calls 3d"
from t
date        Repeat calls_24h  Repeat Calls 3d
2022-09-17  182               null
2022-09-18  381               null
2022-09-19  81                644
2022-09-20  24                486
2022-09-21  91                196
2022-09-22  110               225
2022-09-23  231               432
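The lag() approach can be checked end to end with a small sketch using Python's built-in sqlite3 module (SQLite 3.25+). The table name daily and the column names day, calls_24h and calls_3d are assumptions made for this example; the counts are the ones from the question.

```python
import sqlite3

# Daily counts from the question, with dates in ISO format so they sort correctly.
DAILY = [
    ("2022-09-17", 182),
    ("2022-09-18", 381),
    ("2022-09-19", 81),
    ("2022-09-20", 24),
    ("2022-09-21", 91),
    ("2022-09-22", 110),
    ("2022-09-23", 231),
]

QUERY = """
SELECT day,
       calls_24h,
       -- current day + previous day + the day before that;
       -- NULL while fewer than two earlier rows exist
       calls_24h
         + LAG(calls_24h)    OVER (ORDER BY day)
         + LAG(calls_24h, 2) OVER (ORDER BY day) AS calls_3d
FROM daily
ORDER BY day
"""


def rolling_three_day(rows=DAILY):
    """Return (day, calls_24h, calls_3d) tuples; calls_3d is None for the first two days."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE daily (day TEXT, calls_24h INTEGER)")
    con.executemany("INSERT INTO daily VALUES (?, ?)", rows)
    result = con.execute(QUERY).fetchall()
    con.close()
    return result


if __name__ == "__main__":
    for row in rolling_three_day():
        print(row)
```

Note that this adds up rows, not calendar days, so it assumes one row per day with no missing dates. A frame such as SUM(...) OVER (ORDER BY day ROWS BETWEEN 2 PRECEDING AND CURRENT ROW) is a common alternative, but it returns partial sums for the first two rows instead of null.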

Redshift Min Window Function over multiple dates/ IDs

I have some data that looks like this (truncated):
date listing_id inquiry_id listed_on inquiry_date days_between_list_inquiry
2021-06-08 957 16891 2021-06-08T00:00:00.000Z 2020-12-22 168
2021-06-09 957 17045 2021-06-09T00:00:00.000Z 2020-12-22 169
2021-06-09 957 16985 2021-06-09T00:00:00.000Z 2020-12-22 169
2021-03-04 1117 6869 2021-03-04T00:00:00.000Z 2021-03-01 3
2021-03-05 1117 6933 2021-03-05T00:00:00.000Z 2021-03-01 4
2021-03-08 1117 7212 2021-03-08T00:00:00.000Z 2021-03-01 7
2021-03-11 1117 7449 2021-03-11T00:00:00.000Z 2021-03-01 10
The table captures a daily record of each listing on the day level.
For each listing_id, I'd like to create a column that captures the first_inquiry_date related to that listing. So, for listing_id 957, that would be 2020-12-22; for ID 1117, it would be 2021-03-01.
I tried:
min(date_trunc('day',li.created_at)) over (order by ll.id asc, date asc rows unbounded preceding) as min_inquiry_date,
and
min(date_trunc('day',li.created_at)) over (order by ll.id date rows unbounded preceding) as min_inquiry_date,
and a variety of other order bys but I'm not getting what I'm looking for.
Any help would be greatly appreciated. Thank you!
You need a partition by:
min(date_trunc('day',li.created_at)) over (partition by listing_id) as min_inquiry_date
Alternatively, if you only need one row per listing, you can just aggregate:
select
listing_id,
min(inquiry_date) as first_inquiry_date
from [table name]
group by 1
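To see why the partition by matters, the following sketch compares a running minimum ordered by listing (similar to the attempts in the question) with a partitioned minimum. It uses Python's built-in sqlite3 module (SQLite 3.25+) rather than Redshift, and the table name listings and the column name snapshot_date are assumptions made for this example.

```python
import sqlite3

# (snapshot_date, listing_id, inquiry_id, inquiry_date) taken from the question.
ROWS = [
    ("2021-06-08", 957, 16891, "2020-12-22"),
    ("2021-06-09", 957, 17045, "2020-12-22"),
    ("2021-06-09", 957, 16985, "2020-12-22"),
    ("2021-03-04", 1117, 6869, "2021-03-01"),
    ("2021-03-05", 1117, 6933, "2021-03-01"),
    ("2021-03-08", 1117, 7212, "2021-03-01"),
    ("2021-03-11", 1117, 7449, "2021-03-01"),
]

QUERY = """
SELECT listing_id,
       inquiry_id,
       -- no PARTITION BY: the running minimum carries over from listing 957
       MIN(inquiry_date) OVER (
           ORDER BY listing_id, snapshot_date
           ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS running_min,
       -- PARTITION BY: the minimum is computed separately per listing
       MIN(inquiry_date) OVER (PARTITION BY listing_id) AS first_inquiry_date
FROM listings
ORDER BY listing_id, snapshot_date, inquiry_id
"""


def first_inquiry_dates(rows=ROWS):
    """Return (listing_id, inquiry_id, running_min, first_inquiry_date) tuples."""
    con = sqlite3.connect(":memory:")
    con.execute(
        "CREATE TABLE listings (snapshot_date TEXT, listing_id INTEGER,"
        " inquiry_id INTEGER, inquiry_date TEXT)"
    )
    con.executemany("INSERT INTO listings VALUES (?, ?, ?, ?)", rows)
    result = con.execute(QUERY).fetchall()
    con.close()
    return result


if __name__ == "__main__":
    for row in first_inquiry_dates():
        print(row)
```

For listing 1117 the unpartitioned running minimum comes out as 2020-12-22, leaked from listing 957, while the partitioned column gives 2021-03-01. Unlike the group by version, the window version keeps every daily row.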

Postgresql: Average for each day in interval

I have a table that is structured like this:
item_id first_observed last_observed price
1 2016-10-21 2016-10-27 121
1 2016-10-28 2016-10-31 145
2 2016-10-22 2016-10-28 135
2 2016-10-29 2016-10-30 169
What I want is to get the average price for every day. I obviously cannot just group by first_observed or last_observed. Does Postgres offer a smart way of doing this?
The expected output would be like this:
date avg(price)
2016-10-21 121
2016-10-22 128
2016-10-23 128
2016-10-24 128
2016-10-25 128
2016-10-26 128
2016-10-27 128
2016-10-28 140
2016-10-29 157
2016-10-30 157
2016-10-31 157
It could also be output like this (both are fine):
start end avg(price)
2016-10-21 2016-10-21 121
2016-10-22 2016-10-27 128
2016-10-28 2016-10-28 140
2016-10-29 2016-10-31 157
generate_series allows you to expand date ranges:
First step:
SELECT
generate_series(first_observed, last_observed, interval '1 day')::date as observed,
AVG(price)::int as avg_price
FROM items
GROUP BY observed
ORDER BY observed
expanding the date range
grouping the dates for AVG aggregate
Second step
SELECT
MIN(observed) as start,
MAX(observed) as end,
avg_price
FROM (
-- <first step as subquery>
)s
GROUP BY avg_price
ORDER BY start
Grouping by avg_price to get the MIN/MAX date for it
WITH ObserveDates (ObserveDate) AS (
SELECT * FROM generate_series((SELECT MIN(first_observed) FROM T), (SELECT MAX(last_observed) FROM T), '1 days')
)
SELECT ObserveDate, AVG(Price)
FROM ObserveDates
JOIN T ON ObserveDate BETWEEN first_observed AND last_observed
GROUP BY ObserveDate
ORDER BY ObserveDate
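The calendar-join idea can be tried without a Postgres server. The sketch below uses Python's built-in sqlite3 module and, because generate_series is not guaranteed to be available there, swaps in a recursive CTE to build the list of days. The table name items comes from the first answer; the CTE name days is an assumption.

```python
import sqlite3

# Rows from the question: (item_id, first_observed, last_observed, price).
ITEMS = [
    (1, "2016-10-21", "2016-10-27", 121),
    (1, "2016-10-28", "2016-10-31", 145),
    (2, "2016-10-22", "2016-10-28", 135),
    (2, "2016-10-29", "2016-10-30", 169),
]

QUERY = """
WITH RECURSIVE days(d) AS (
    -- start at the earliest observed day ...
    SELECT MIN(first_observed) FROM items
    UNION ALL
    -- ... and add one day at a time up to the latest observed day
    SELECT date(d, '+1 day') FROM days
    WHERE d < (SELECT MAX(last_observed) FROM items)
)
SELECT d, AVG(price) AS avg_price
FROM days
JOIN items ON d BETWEEN first_observed AND last_observed
GROUP BY d
ORDER BY d
"""


def daily_average(rows=ITEMS):
    """Return (day, average price) tuples, one per calendar day covered."""
    con = sqlite3.connect(":memory:")
    con.execute(
        "CREATE TABLE items (item_id INTEGER, first_observed TEXT,"
        " last_observed TEXT, price INTEGER)"
    )
    con.executemany("INSERT INTO items VALUES (?, ?, ?, ?)", rows)
    result = con.execute(QUERY).fetchall()
    con.close()
    return result


if __name__ == "__main__":
    for day, avg_price in daily_average():
        print(day, avg_price)
```

With the sample rows this gives 121 for 2016-10-21, 128 for 2016-10-22 through 2016-10-27, 140 for 2016-10-28 and 157 for 2016-10-29 and 2016-10-30. For 2016-10-31 it gives 145, because only item 1 has an observation range covering that day.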

SQL order dates sequentially by year

I have a SQL view that produces the following list of Mondays in a specific date range as shown below:
Date Number
16/12/2013 208
23/12/2013 190
30/12/2013 187
15/12/2014 203
22/12/2014 190
29/12/2014 153
14/12/2015 225
21/12/2015 217
28/12/2015 223
Is it possible to order them by the first of each year then the second then the third etc. to give me the results as shown below:
Date Number
16/12/2013 208
15/12/2014 203
14/12/2015 225
23/12/2013 190
22/12/2014 190
21/12/2015 217
30/12/2013 187
29/12/2014 153
28/12/2015 223
Thank you in advance for any help or advice.
I think you should be able to get what you want by using the row_number() over a partition on the year, for example:
Select [Date], [Number],
Row_Number() over (PARTITION BY Year([DATE]) order by [DATE]) as WEEK_IN_YR
from [table]
order by WEEK_IN_YR, [Date]
https://msdn.microsoft.com/en-gb/library/ms186734.aspx
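As a quick check of the ordering, here is a minimal sketch of the same row_number-per-year idea run through Python's built-in sqlite3 module (SQLite 3.25+) instead of SQL Server. The table name mondays and the column names d and number are assumptions, and strftime('%Y', d) stands in for Year([Date]).

```python
import sqlite3

# Mondays from the question, rewritten as ISO dates so they sort chronologically.
MONDAYS = [
    ("2013-12-16", 208), ("2013-12-23", 190), ("2013-12-30", 187),
    ("2014-12-15", 203), ("2014-12-22", 190), ("2014-12-29", 153),
    ("2015-12-14", 225), ("2015-12-21", 217), ("2015-12-28", 223),
]

QUERY = """
SELECT d, number
FROM (SELECT d, number,
             -- 1 for the first Monday of each year, 2 for the second, ...
             ROW_NUMBER() OVER (PARTITION BY strftime('%Y', d) ORDER BY d) AS week_in_yr
      FROM mondays)
ORDER BY week_in_yr, d
"""


def interleaved_by_year(rows=MONDAYS):
    """Return (date, number) tuples: first Mondays of every year, then second, etc."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE mondays (d TEXT, number INTEGER)")
    con.executemany("INSERT INTO mondays VALUES (?, ?)", rows)
    result = con.execute(QUERY).fetchall()
    con.close()
    return result


if __name__ == "__main__":
    for row in interleaved_by_year():
        print(row)
```

The rows come back as 2013-12-16, 2014-12-15, 2015-12-14, then 2013-12-23, 2014-12-22, 2015-12-21, and finally the three end-of-December Mondays, which matches the order requested in the question.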

How to get min date of every month for six months?

I have data like this.
Process_date SEQ_No
------------- ---------
16-MAR-13 733
09-MAR-13 732
02-MAR-13 731
24-FEB-13 730
16-FEB-13 728
09-FEB-13 727
02-FEB-13 726
26-JAN-13 725
21-JAN-13 724
12-JAN-13 723
05-JAN-13 722
29-DEC-12 721
24-DEC-12 720
15-DEC-12 719
08-DEC-12 718
03-DEC-12 717
22-NOV-12 716
17-NOV-12 715
10-NOV-12 714
03-NOV-12 713
29-OCT-12 712
23-OCT-12 711
13-OCT-12 710
05-OCT-12 709
28-SEP-12 708
22-SEP-12 707
15-SEP-12 706
08-SEP-12 705
01-SEP-12 704
Every month the admin refreshes the actual data table, and the table above is automatically updated with a unique seq_no and process_date.
I need to extract the min date of every month (the first refresh of each of the last 6 months, excluding the current month) and the seq_no for that date, so that I can join on seq_no (which is also available in the main table) to pull in the actual data.
I need a result like:
02-MAR-13 731 (I don't need MAR, as the current month's data should be excluded)
So I need the final result to look like this:
02-FEB-13 726
05-JAN-13 722
08-DEC-12 718
03-NOV-12 713
05-OCT-12 709
01-SEP-12 704
Sorry for asking a direct question like this. I am not sure how to do this, which is why I have not prepared or posted any query.
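select Process_date, SEQ_No
from (select Process_date, SEQ_No,
row_number() over (partition by trunc(process_date, 'mm') order by process_date) rn
from yourtab
where Process_date < trunc(sysdate, 'mm') -- exclude the current month
and Process_date >= add_months(trunc(sysdate, 'mm'), -6)) -- keep only the six months before it
where rn = 1;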
will do that
fiddle example: http://sqlfiddle.com/#!4/a5452/1
I didn't understand how seq_no is in another table...
But using the input data:
select
min(process_date),
min(seq_no) keep (dense_rank first order by process_date)
from
your_table
where
process_date >= add_months(trunc(sysdate,'MM'),-6)
and process_date < trunc(sysdate,'MM')
group by
trunc(process_date,'MM');
Try:
SELECT seq_no,process_date FROM my_table
WHERE process_date IN (SELECT min(process_date)
FROM my_table
WHERE process_date >= ADD_MONTHS(TRUNC(SYSDATE,'MM'),-6)
AND process_date < TRUNC(SYSDATE,'MM')
GROUP BY TRUNC(process_date,'MM'))
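The row_number approach can be sketched outside Oracle as follows, using Python's built-in sqlite3 module (SQLite 3.25+). Only the January to March 2013 rows from the question are loaded, the dates are rewritten in ISO format, and a fixed as_of date stands in for sysdate. The table name refreshes, the as_of parameter and the function name are assumptions made for this example.

```python
import sqlite3

# A subset of the question's rows (January to March 2013), as ISO dates.
REFRESHES = [
    ("2013-03-16", 733), ("2013-03-09", 732), ("2013-03-02", 731),
    ("2013-02-24", 730), ("2013-02-16", 728), ("2013-02-09", 727),
    ("2013-02-02", 726), ("2013-01-26", 725), ("2013-01-21", 724),
    ("2013-01-12", 723), ("2013-01-05", 722),
]

QUERY = """
SELECT process_date, seq_no
FROM (SELECT process_date, seq_no,
             -- rank refreshes within each calendar month, earliest first
             ROW_NUMBER() OVER (PARTITION BY strftime('%Y-%m', process_date)
                                ORDER BY process_date) AS rn
      FROM refreshes
      -- the six full months before the month that contains :as_of
      WHERE process_date <  date(:as_of, 'start of month')
        AND process_date >= date(:as_of, 'start of month', '-6 months'))
WHERE rn = 1
ORDER BY process_date DESC
"""


def first_refresh_per_month(as_of, rows=REFRESHES):
    """Return (process_date, seq_no) for the first refresh of each earlier month."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE refreshes (process_date TEXT, seq_no INTEGER)")
    con.executemany("INSERT INTO refreshes VALUES (?, ?)", rows)
    result = con.execute(QUERY, {"as_of": as_of}).fetchall()
    con.close()
    return result


if __name__ == "__main__":
    # Hypothetical "today" inside March 2013, so March itself is excluded.
    for row in first_refresh_per_month("2013-03-20"):
        print(row)
```

With as_of set to 2013-03-20 the March rows are skipped and the result is 2013-02-02 with seq_no 726 and 2013-01-05 with seq_no 722, the same first two rows as in the expected output.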