PostgreSQL: Get maximum value per day with corresponding time - sql

I have the following table:
Date | Time | Value | ReceivedTime
2022-04-01| 00:59:59 | 5 | 00:30:15
2022-04-01| 13:59:59 | 15 | 13:30:00
2022-04-02| 21:59:59 | 5 | 21:30:15
2022-04-02| 22:59:59 | 25 | 22:25:15
2022-04-02| 23:59:59 | 25 | 23:00:15
2022-04-03| 14:59:59 | 50 | 00:30:15
2022-04-03| 15:59:59 | 555 | 00:30:15
2022-04-03| 16:59:59 | 56 | 00:30:15
I want to get each day's maximum value, along with its Date and ReceivedTime.
Expected Result:
Date | Value | ReceivedTime
2022-04-01 | 15 | 13:30:00
2022-04-02 | 25 | 23:00:15
2022-04-03 | 555 | 00:30:15

This answer assumes that, in the event of two or more records being tied on a given day for the same highest value, you want to retain the single record with the most recent ReceivedTime. We can use DISTINCT ON here:
SELECT DISTINCT ON (Date) Date, Value, ReceivedTime
FROM yourTable
ORDER BY Date, Value DESC, ReceivedTime DESC;
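DISTINCT ON is PostgreSQL-specific, but the same "first row per day" pick can be emulated portably with ROW_NUMBER(). A minimal sketch using SQLite via Python's sqlite3 (table and column names taken from the question; the emulation, not DISTINCT ON itself, is what runs here):

```python
import sqlite3

# In-memory stand-in for the Postgres table; SQLite has no DISTINCT ON,
# so ROW_NUMBER() with the same ORDER BY emulates the per-day pick.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE yourTable (Date TEXT, Time TEXT, Value INTEGER, ReceivedTime TEXT)")
con.executemany("INSERT INTO yourTable VALUES (?, ?, ?, ?)", [
    ("2022-04-01", "00:59:59", 5,   "00:30:15"),
    ("2022-04-01", "13:59:59", 15,  "13:30:00"),
    ("2022-04-02", "21:59:59", 5,   "21:30:15"),
    ("2022-04-02", "22:59:59", 25,  "22:25:15"),
    ("2022-04-02", "23:59:59", 25,  "23:00:15"),
    ("2022-04-03", "14:59:59", 50,  "00:30:15"),
    ("2022-04-03", "15:59:59", 555, "00:30:15"),
    ("2022-04-03", "16:59:59", 56,  "00:30:15"),
])

result = con.execute("""
    SELECT Date, Value, ReceivedTime
    FROM (SELECT Date, Value, ReceivedTime,
                 ROW_NUMBER() OVER (PARTITION BY Date
                                    ORDER BY Value DESC, ReceivedTime DESC) AS rn
          FROM yourTable)
    WHERE rn = 1
    ORDER BY Date
""").fetchall()
print(result)
# [('2022-04-01', 15, '13:30:00'), ('2022-04-02', 25, '23:00:15'),
#  ('2022-04-03', 555, '00:30:15')]
```

Note how the 2022-04-02 tie on Value = 25 is broken by ReceivedTime DESC, keeping the 23:00:15 row, exactly as the DISTINCT ON answer intends.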

Related

Logic to read multiple rows in a table where flag = 'Y'

Consider the following scenario. I have a Customer table, which includes RowStart and EndDate logic, thus writing a new row every time a field value is updated.
Relevant fields in this table are:
RowStartDate
RowEndDate
CustomerNumber
EmployeeFlag
For this, I'd like to write a query that returns an employee's period of tenure (EmploymentStartDate and EmploymentEndDate), i.e. the RowStartDate when EmployeeFlag first became 'Y', and then the first RowStartDate where EmployeeFlag changed back to 'N' (ordered, of course, by RowStartDate ascending). There is an additional complexity: the flag value may change between Y and N multiple times for a single person, as they may become staff, resign, and then be employed again at a later date.
Example table structure is:
| CustomerNo | StaffFlag | RowStartDate | RowEndDate |
| ---------- | --------- | ------------ | ---------- |
| 12 | N | 2019-01-01 | 2019-01-14 |
| 12 | N | 2019-01-14 | 2019-03-02 |
| 12 | Y | 2019-03-02 | 2019-10-12 |
| 01 | Y | 2020-03-13 | NULL |
| 12 | N | 2019-10-12 | 2020-01-01 |
| 12 | Y | 2020-01-01 | NULL |
Output could be something like
| CustomerNo | StaffStartDate | StaffEndDate |
| ---------- | -------------- | ------------ |
| 12 | 2019-03-02 | 2019-10-12 |
| 01 | 2020-03-13 | NULL |
| 12 | 2020-01-01 | NULL |
Any ideas on how I might be able to solve this would be really appreciated.
Make sure you order the rows by ID and by dates:
select *
from yourtable
order by CustomerNumber asc,
         RowStartDate asc,
         RowEndDate asc
This gives you a list of all changes over time per employee.
Subsequently, you want to map two rows into a single row with two columns (two dates mapped into overall start and end date). Others have done this using the lead() function. For details please have a look here: Merging every two rows of data in a column in SQL Server
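One hedged sketch of that lead()/lag() pairing, run against SQLite via Python's sqlite3 (table and column names from the question's example; the two-CTE structure is my own illustration, not the linked answer verbatim): keep only the rows where the flag changes, then let lead() supply each 'Y' run's end date.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("""CREATE TABLE Customer (
    CustomerNo TEXT, StaffFlag TEXT, RowStartDate TEXT, RowEndDate TEXT)""")
con.executemany("INSERT INTO Customer VALUES (?, ?, ?, ?)", [
    ("12", "N", "2019-01-01", "2019-01-14"),
    ("12", "N", "2019-01-14", "2019-03-02"),
    ("12", "Y", "2019-03-02", "2019-10-12"),
    ("01", "Y", "2020-03-13", None),
    ("12", "N", "2019-10-12", "2020-01-01"),
    ("12", "Y", "2020-01-01", None),
])

result = con.execute("""
    WITH flagged AS (
        -- Compare each row's flag to the previous row's flag per customer
        SELECT CustomerNo, StaffFlag, RowStartDate,
               LAG(StaffFlag) OVER (PARTITION BY CustomerNo
                                    ORDER BY RowStartDate) AS prev_flag
        FROM Customer
    ),
    changes AS (
        -- Keep only flag transitions; the next transition's start date
        -- is the end of the current run
        SELECT CustomerNo, StaffFlag, RowStartDate,
               LEAD(RowStartDate) OVER (PARTITION BY CustomerNo
                                        ORDER BY RowStartDate) AS next_change
        FROM flagged
        WHERE prev_flag IS NULL OR StaffFlag <> prev_flag
    )
    SELECT CustomerNo, RowStartDate AS StaffStartDate, next_change AS StaffEndDate
    FROM changes
    WHERE StaffFlag = 'Y'
    ORDER BY CustomerNo, StaffStartDate
""").fetchall()
print(result)
# [('01', '2020-03-13', None), ('12', '2019-03-02', '2019-10-12'),
#  ('12', '2020-01-01', None)]
```

This handles the "employed, resigned, re-employed" case: customer 12 gets one row per stint, with NULL as the end date of a still-open stint.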

Delete rows with certain magnitude datetime difference

I have a table that looks something like the following:
| ID | Value | Date |
|----|-------|----------------------|
| 4 | 9 | 4/14/2021 3:00:00 PM |
| 4 | 1 | 4/14/2021 4:00:00 PM |
| 4 | 3 | 4/14/2021 4:03:00 PM |
| 4 | 2 | 4/14/2021 4:05:00 PM |
| 2 | 50 | 4/14/2021 4:00:00 PM |
| 2 | 20 | 4/14/2021 4:10:00 PM |
What I would like to do is delete any rows, for each ID, that are within a certain time magnitude of each other (for this example, let's say 5 minutes), while keeping the most recent record. Using the example table, the expected output would be the following:
| ID | Value | Date |
|----|-------|----------------------|
| 4 | 9 | 4/14/2021 3:00:00 PM |
| 4 | 2 | 4/14/2021 4:05:00 PM |
| 2 | 50 | 4/14/2021 4:00:00 PM |
| 2 | 20 | 4/14/2021 4:10:00 PM |
This is for MS Access.
The following produces the desired output with the given sample.
Query1:
SELECT Table1.*,
       (SELECT TOP 1 Dup.Dte
        FROM Table1 AS Dup
        WHERE Dup.GrpID = Table1.GrpID
          AND Dup.ID > Table1.ID
        ORDER BY Dte) AS NextDte
FROM Table1;
Query2 - If you really want to delete records and not just filter:
DELETE FROM Table1 WHERE ID IN (SELECT ID FROM Query1 WHERE DateDiff("n",[Dte],[NextDte])<=5);
Notice this requires a unique identifier field (or compound unique identifier) - autonumber field should serve. I used ID for that field and GrpID for each group ID. Value and Date are reserved words so I used different names.
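The two Access queries boil down to: within each group, drop any row whose next row (by date) is 5 minutes or less away, so only the last row of each cluster survives. A plain-Python restatement of that logic, using the question's sample data (the `dedupe` helper is hypothetical, not part of either query):

```python
from datetime import datetime, timedelta

# (ID, Value, Date) rows from the question
rows = [
    (4, 9,  datetime(2021, 4, 14, 15, 0)),
    (4, 1,  datetime(2021, 4, 14, 16, 0)),
    (4, 3,  datetime(2021, 4, 14, 16, 3)),
    (4, 2,  datetime(2021, 4, 14, 16, 5)),
    (2, 50, datetime(2021, 4, 14, 16, 0)),
    (2, 20, datetime(2021, 4, 14, 16, 10)),
]

def dedupe(rows, window=timedelta(minutes=5)):
    """Keep a row only if the next row in its ID group is more than
    `window` later (or there is no next row)."""
    by_id = {}
    for r in rows:
        by_id.setdefault(r[0], []).append(r)
    keep = []
    for group in by_id.values():
        group.sort(key=lambda r: r[2])
        for cur, nxt in zip(group, group[1:] + [None]):
            if nxt is None or nxt[2] - cur[2] > window:
                keep.append(cur)
    return keep

kept = dedupe(rows)
print([(r[0], r[1]) for r in kept])
# [(4, 9), (4, 2), (2, 50), (2, 20)]
```

For ID 4, the 4:00 PM and 4:03 PM rows are dropped because a later row follows within 5 minutes; 4:05 PM, the most recent of the cluster, is kept, matching the expected output.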

Get value from previous hour and subtract from current value in SQL

I need the level value from the previous hour's max(date) row so I can subtract it from the current value, but I'm confused about how to get that last record for each hour.
My table records are like this:
SNO | Date | ID | Level
1 | 2021-01-13 00:07:44.190 | 1021 | 56.29
2 | 2021-01-13 00:33:44.190 | 1022 | 84.29
3 | 2021-01-13 00:35:44.190 | 1021 | 54.29
4 | 2021-01-13 00:43:44.190 | 1021 | 53.29
5 | 2021-01-13 00:47:44.190 | 1022 | 82.29
6 | 2021-01-13 01:07:44.190 | 1021 | 52.93
7 | 2021-01-13 01:33:44.190 | 1022 | 82.29
8 | 2021-01-13 01:43:44.190 | 1021 | 47.29
9 | 2021-01-13 01:47:44.190 | 1022 | 79.29
10 | 2021-01-13 02:07:44.190 | 1021 | 44.29
11 | 2021-01-13 02:33:44.190 | 1022 | 77.29
What I need is the max(date) row from each hour, per ID, with results like this:
SNO | Date | ID | Level | Level_2
4 | 2021-01-13 00:43:44.190 | 1021 | 53.29 | <-- Level from previous hour's last row, or 0 -->
5 | 2021-01-13 00:47:44.190 | 1022 | 82.29 | <-- Level from previous hour's last row, or 0 -->
8 | 2021-01-13 01:43:44.190 | 1021 | 47.29 | 53.29
9 | 2021-01-13 01:47:44.190 | 1022 | 79.29 | 82.29
10 | 2021-01-13 02:07:44.190 | 1021 | 44.29 | 47.29
11 | 2021-01-13 02:33:44.190 | 1022 | 77.29 | 79.29
Kindly please share possible results for this condition and you can ask for more information if needed.
To get the results you want, you can filter down to the last row in each hour and then use lag():
select t.*,
       lag(level) over (partition by id order by date) as prev_level
from (select t.*,
             row_number() over (partition by id, convert(date, date), datepart(hour, date)
                                order by date desc
                               ) as seqnum
      from t
     ) t
where seqnum = 1;
Note: This assumes that you have data for every hour.
Another method, which avoids that assumption, is to look at the next date and see whether it is still in the same hour, using lead():
select t.*
from (select t.*,
             lead(date) over (partition by id order by date) as next_date
      from t
     ) t
where next_date is null or
      datediff(hour, date, next_date) > 0;
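A runnable translation of the lag() approach into SQLite via Python's sqlite3 (strftime stands in for convert/datepart; the Date column holds text timestamps, and the data is the question's sample). Window functions in the outer SELECT are evaluated after the WHERE, so lag() only sees the last-row-per-hour survivors:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE t (SNO INTEGER, Date TEXT, ID INTEGER, Level REAL)")
con.executemany("INSERT INTO t VALUES (?, ?, ?, ?)", [
    (1,  "2021-01-13 00:07:44.190", 1021, 56.29),
    (2,  "2021-01-13 00:33:44.190", 1022, 84.29),
    (3,  "2021-01-13 00:35:44.190", 1021, 54.29),
    (4,  "2021-01-13 00:43:44.190", 1021, 53.29),
    (5,  "2021-01-13 00:47:44.190", 1022, 82.29),
    (6,  "2021-01-13 01:07:44.190", 1021, 52.93),
    (7,  "2021-01-13 01:33:44.190", 1022, 82.29),
    (8,  "2021-01-13 01:43:44.190", 1021, 47.29),
    (9,  "2021-01-13 01:47:44.190", 1022, 79.29),
    (10, "2021-01-13 02:07:44.190", 1021, 44.29),
    (11, "2021-01-13 02:33:44.190", 1022, 77.29),
])

result = con.execute("""
    SELECT SNO, ID, Level,
           LAG(Level) OVER (PARTITION BY ID ORDER BY Date) AS prev_level
    FROM (SELECT *,
                 ROW_NUMBER() OVER (PARTITION BY ID, strftime('%Y-%m-%d %H', Date)
                                    ORDER BY Date DESC) AS seqnum
          FROM t)
    WHERE seqnum = 1
    ORDER BY SNO
""").fetchall()
print(result)
```

Note the previous-hour value for ID 1021's 01:43 row comes out as 53.29 (the level of hour 00's last row, SNO 4), since that is what "previous hour's max(date) level" means here.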

Can I put a condition on a window function in Redshift?

I have an events-based table in Redshift. I want to tie all events to the FIRST event in the series, provided that event was in the N-hours preceding this event.
If all I cared about was the very first row, I'd simply do:
SELECT
    event_time,
    first_value(event_time) OVER (ORDER BY event_time
                                  ROWS UNBOUNDED PRECEDING) AS first_time
FROM
    my_table
But because I only want to tie this to the first event in the past N-hours, I want something like:
SELECT
    event_time,
    first_value(event_time) OVER (ORDER BY event_time
                                  ROWS BETWEEN [N-hours ago] AND CURRENT ROW) AS first_time
FROM
    my_table
A little background on my table: it's user actions, so effectively a user jumps on, performs 1-100 actions, and then leaves. Most users appear 1-10 times per day. Sessions rarely last over an hour, so I could set N=1.
If I just set a PARTITION BY date_trunc('hour', event_time), I'll double-count sessions that span an hour boundary.
Assume my_table looks like
id | user_id | event_time
----------------------------------
1 | 123 | 2015-01-01 01:00:00
2 | 123 | 2015-01-01 01:15:00
3 | 123 | 2015-01-01 02:05:00
4 | 123 | 2015-01-01 13:10:00
5 | 123 | 2015-01-01 13:20:00
6 | 123 | 2015-01-01 13:30:00
My goal is to get a result that looks like
id | parent_id | user_id | event_time
----------------------------------
1 | 1 | 123 | 2015-01-01 01:00:00
2 | 1 | 123 | 2015-01-01 01:15:00
3 | 1 | 123 | 2015-01-01 02:05:00
4 | 4 | 123 | 2015-01-01 13:10:00
5 | 4 | 123 | 2015-01-01 13:20:00
6 | 4 | 123 | 2015-01-01 13:30:00
The answer appears to be "no" as of now.
SQL Server has the option of using RANGE instead of ROWS in the window frame, which allows the query to compare values against the current row's value.
https://www.simple-talk.com/sql/learn-sql-server/window-functions-in-sql-server-part-2-the-frame/
When I attempt this syntax in Redshift, I get the error "Range is not yet supported".
Someone update this when that "yet" changes!
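In the meantime, a common workaround (a sketch, not Redshift-specific syntax) is to sessionize with lag() plus a running sum: mark a row as a session start when the gap since the previous event exceeds N hours, number the sessions cumulatively, and take the minimum id per session as parent_id. This defines sessions by inactivity gaps rather than a strict rolling N-hour frame, which fits the usage pattern described above. Demonstrated in SQLite via Python's sqlite3, with N = 1 hour and the question's sample data:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE my_table (id INTEGER, user_id INTEGER, event_time TEXT)")
con.executemany("INSERT INTO my_table VALUES (?, ?, ?)", [
    (1, 123, "2015-01-01 01:00:00"),
    (2, 123, "2015-01-01 01:15:00"),
    (3, 123, "2015-01-01 02:05:00"),
    (4, 123, "2015-01-01 13:10:00"),
    (5, 123, "2015-01-01 13:20:00"),
    (6, 123, "2015-01-01 13:30:00"),
])

result = con.execute("""
    WITH marked AS (
        -- 1 when the gap since the previous event exceeds N = 1 hour
        SELECT id, user_id, event_time,
               CASE WHEN LAG(event_time) OVER w IS NULL
                         OR julianday(event_time)
                            - julianday(LAG(event_time) OVER w) > 1.0 / 24
                    THEN 1 ELSE 0 END AS is_start
        FROM my_table
        WINDOW w AS (PARTITION BY user_id ORDER BY event_time)
    ),
    grouped AS (
        -- Running sum of start markers numbers the sessions
        SELECT *, SUM(is_start) OVER (PARTITION BY user_id
                                      ORDER BY event_time
                                      ROWS UNBOUNDED PRECEDING) AS grp
        FROM marked
    )
    SELECT id,
           MIN(id) OVER (PARTITION BY user_id, grp) AS parent_id,
           user_id, event_time
    FROM grouped
    ORDER BY id
""").fetchall()
print([(r[0], r[1]) for r in result])
# [(1, 1), (2, 1), (3, 1), (4, 4), (5, 4), (6, 4)]
```

Event 3 (02:05) ties back to event 1 because each gap within the run is under an hour, matching the goal table above even though 02:05 is more than an hour after 01:00.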

Postgres 9.1 - Numbering groups of rows

I have some data that represents different 'actions'. These 'actions' collectively comprise an 'event'.
The data looks like this:
EventID | UserID | Action | TimeStamp
--------------+------------+------------+-------------------------
1 | 111 | Start | 2012-01-01 08:00:00
1 | 111 | Stop | 2012-01-01 08:59:59
1 | 999 | Start | 2012-01-01 09:00:00
1 | 999 | Stop | 2012-01-01 09:59:59
1 | 111 | Start | 2012-01-01 10:00:00
1 | 111 | Stop | 2012-01-01 10:30:00
As you can see, each single 'event' is made of one or more 'Actions' (or as I think of them, 'sub events').
I need to identify each 'sub event' and give it an identifier. This is what I am looking for:
EventID | SubeventID | UserID | Action | TimeStamp
--------------+----------------+------------+------------+-------------------------
1 | 1 | 111 | Start | 2012-01-01 08:00:00
1 | 1 | 111 | Stop | 2012-01-01 08:59:59
1 | 2 | 999 | Start | 2012-01-01 09:00:00
1 | 2 | 999 | Stop | 2012-01-01 09:59:59
1 | 3 | 111 | Start | 2012-01-01 10:00:00
1 | 3 | 111 | Stop | 2012-01-01 10:30:00
I need something that can start counting, but only increment when some column has a specific value (like "Action" = 'Start').
I have been trying to use Window Functions for this, but with limited success. I just can't seem to find a solution that I feel will work... any thoughts?
If you have some field you can sort by, you could use the following query (untested):
SELECT
    sum(("Action" = 'Start')::int) OVER (PARTITION BY "EventID"
                                         ORDER BY "TimeStamp"
                                         ROWS UNBOUNDED PRECEDING) AS "SubeventID"
FROM
    events
Note that if the first sub-event does not start with Start, it will have a SubeventID of 0, which might not be what you want.
You could also use COUNT() in place of SUM():
SELECT
EventID
, COUNT(CASE WHEN Action = 'Start' THEN 1 END)
OVER ( PARTITION BY EventID
ORDER BY TimeStamp
ROWS UNBOUNDED PRECEDING )
AS SubeventID
, UserID
, Action
FROM
tableX AS t ;
Tests at SQL-Fiddle: test
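Both answers use the same running-sum-of-flags idea. A sketch in SQLite via Python's sqlite3 (comparisons already yield 0/1 in SQLite, so no ::int cast is needed), using the question's sample data:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE events (EventID INTEGER, UserID INTEGER, Action TEXT, TimeStamp TEXT)")
con.executemany("INSERT INTO events VALUES (?, ?, ?, ?)", [
    (1, 111, "Start", "2012-01-01 08:00:00"),
    (1, 111, "Stop",  "2012-01-01 08:59:59"),
    (1, 999, "Start", "2012-01-01 09:00:00"),
    (1, 999, "Stop",  "2012-01-01 09:59:59"),
    (1, 111, "Start", "2012-01-01 10:00:00"),
    (1, 111, "Stop",  "2012-01-01 10:30:00"),
])

result = con.execute("""
    SELECT EventID,
           -- Running count of 'Start' rows: increments only at each Start
           SUM(Action = 'Start') OVER (PARTITION BY EventID
                                       ORDER BY TimeStamp
                                       ROWS UNBOUNDED PRECEDING) AS SubeventID,
           UserID, Action, TimeStamp
    FROM events
    ORDER BY TimeStamp
""").fetchall()
print([r[1] for r in result])
# [1, 1, 2, 2, 3, 3]
```

Each Start row bumps the running sum, so every Start/Stop pair shares one SubeventID, reproducing the expected output table.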