Can I put a condition on a window function in Redshift? - sql

I have an events-based table in Redshift. I want to tie all events to the FIRST event in the series, provided that event was in the N-hours preceding this event.
If all I cared about was the very first row, I'd simply do:
SELECT
    event_time
    ,first_value(event_time)
        OVER (ORDER BY event_time rows unbounded preceding) as first_time
FROM
    my_table
But because I only want to tie this to the first event in the past N-hours, I want something like:
SELECT
    event_time
    ,first_value(event_time)
        OVER (ORDER BY event_time rows between [N-hours ago] and current row) as first_time
FROM
    my_table
Some background on my table: it holds user actions, so effectively a user jumps on, performs 1-100 actions, and then leaves. Most users visit 1-10 times per day. Sessions rarely last over an hour, so I could set N=1.
If I just use PARTITION BY date_trunc('hour', event_time), any session that spans an hour boundary gets split into two.
Assume my_table looks like
id | user_id | event_time
----------------------------------
1 | 123 | 2015-01-01 01:00:00
2 | 123 | 2015-01-01 01:15:00
3 | 123 | 2015-01-01 02:05:00
4 | 123 | 2015-01-01 13:10:00
5 | 123 | 2015-01-01 13:20:00
6 | 123 | 2015-01-01 13:30:00
My goal is to get a result that looks like
id | parent_id | user_id | event_time
----------------------------------
1 | 1 | 123 | 2015-01-01 01:00:00
2 | 1 | 123 | 2015-01-01 01:15:00
3 | 1 | 123 | 2015-01-01 02:05:00
4 | 4 | 123 | 2015-01-01 13:10:00
5 | 4 | 123 | 2015-01-01 13:20:00
6 | 4 | 123 | 2015-01-01 13:30:00

The answer appears to be "no" as of now.
SQL Server supports RANGE instead of ROWS in the frame clause, which lets the frame be defined by comparing values to the current row's value.
https://www.simple-talk.com/sql/learn-sql-server/window-functions-in-sql-server-part-2-the-frame/
When I attempt this syntax in Redshift, I get the error "Range is not yet supported".
Someone update this when that "yet" changes!
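One possible workaround that avoids RANGE altogether is gap-based sessionization: flag a row as the start of a new session when the previous event for that user is more than N hours old, turn the flags into a session number with a running sum, and take the first id per session. Note that this is a different rule from "first event in the past N hours": it is an assumption chosen because it reproduces the expected output above (row 3 is 65 minutes after row 1 but only 50 minutes after row 2). The sketch below runs on SQLite through Python purely so it can be executed; LAG, SUM ... OVER and FIRST_VALUE exist in Redshift as well, but julianday() is SQLite-specific and the time arithmetic would need the Redshift equivalent.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE my_table (id INTEGER, user_id INTEGER, event_time TEXT)")
conn.executemany(
    "INSERT INTO my_table VALUES (?, ?, ?)",
    [
        (1, 123, "2015-01-01 01:00:00"),
        (2, 123, "2015-01-01 01:15:00"),
        (3, 123, "2015-01-01 02:05:00"),
        (4, 123, "2015-01-01 13:10:00"),
        (5, 123, "2015-01-01 13:20:00"),
        (6, 123, "2015-01-01 13:30:00"),
    ],
)

SESSION_QUERY = """
WITH with_prev AS (
    -- previous event time for the same user (NULL on the user's first event)
    SELECT id, user_id, event_time,
           LAG(event_time) OVER (PARTITION BY user_id ORDER BY event_time) AS prev_time
    FROM my_table
),
flagged AS (
    -- 1 when this row opens a new session, 0 otherwise
    SELECT id, user_id, event_time,
           CASE WHEN prev_time IS NULL
                  OR (julianday(event_time) - julianday(prev_time)) * 24.0 > :n_hours
                THEN 1 ELSE 0 END AS is_new_session
    FROM with_prev
),
numbered AS (
    -- the running sum of the flags numbers the sessions per user
    SELECT id, user_id, event_time,
           SUM(is_new_session) OVER (PARTITION BY user_id ORDER BY event_time
                                     ROWS UNBOUNDED PRECEDING) AS session_num
    FROM flagged
)
SELECT id,
       FIRST_VALUE(id) OVER (PARTITION BY user_id, session_num ORDER BY event_time
                             ROWS UNBOUNDED PRECEDING) AS parent_id,
       user_id,
       event_time
FROM numbered
ORDER BY id
"""

session_rows = conn.execute(SESSION_QUERY, {"n_hours": 1}).fetchall()
for row in session_rows:
    print(row)
```

With N=1 this assigns parent_id 1 to ids 1-3 and parent_id 4 to ids 4-6 on the sample data. A session can grow longer than N hours under this rule, as long as no single gap exceeds N hours.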

Related

Postgresql Get Maximum value per day with corresponding time

I have the following table:
Date | Time | Value | ReceivedTime
2022-04-01| 00:59:59 | 5 | 00:30:15
2022-04-01| 13:59:59 | 15 | 13:30:00
2022-04-02| 21:59:59 | 5 | 21:30:15
2022-04-02| 22:59:59 | 25 | 22:25:15
2022-04-02| 23:59:59 | 25 | 23:00:15
2022-04-03| 14:59:59 | 50 | 00:30:15
2022-04-03| 15:59:59 | 555 | 00:30:15
2022-04-03| 16:59:59 | 56 | 00:30:15
I want to get the maximum value for each date, along with its Date and ReceivedTime.
Expected Result:
Date | Value | ReceivedTime
2022-04-01 | 15 | 13:30:00
2022-04-02 | 25 | 23:00:15
2022-04-03 | 555 | 00:30:15
This answer assumes that, in the event of two or more records being tied on a given day for the same highest value, you want to retain the single record with the most recent ReceivedTime. We can use DISTINCT ON here:
SELECT DISTINCT ON (Date) Date, Value, ReceivedTime
FROM yourTable
ORDER BY Date, Value DESC, ReceivedTime DESC;
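DISTINCT ON is specific to Postgres. As a hedged alternative, the same "top row per day" logic can be sketched with ROW_NUMBER(), which is more widely available. The example below runs on SQLite through Python only so that it is executable; the table name readings and the column names reading_date, reading_time, value and received_time are hypothetical stand-ins for the columns in the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE readings (reading_date TEXT, reading_time TEXT, "
    "value INTEGER, received_time TEXT)"
)
conn.executemany(
    "INSERT INTO readings VALUES (?, ?, ?, ?)",
    [
        ("2022-04-01", "00:59:59", 5, "00:30:15"),
        ("2022-04-01", "13:59:59", 15, "13:30:00"),
        ("2022-04-02", "21:59:59", 5, "21:30:15"),
        ("2022-04-02", "22:59:59", 25, "22:25:15"),
        ("2022-04-02", "23:59:59", 25, "23:00:15"),
        ("2022-04-03", "14:59:59", 50, "00:30:15"),
        ("2022-04-03", "15:59:59", 555, "00:30:15"),
        ("2022-04-03", "16:59:59", 56, "00:30:15"),
    ],
)

TOP_PER_DAY = """
SELECT reading_date, value, received_time
FROM (
    SELECT reading_date, value, received_time,
           -- rank rows within each day: highest value first,
           -- ties broken by the most recent received_time
           ROW_NUMBER() OVER (PARTITION BY reading_date
                              ORDER BY value DESC, received_time DESC) AS rn
    FROM readings
) ranked
WHERE rn = 1
ORDER BY reading_date
"""

daily_max = conn.execute(TOP_PER_DAY).fetchall()
for row in daily_max:
    print(row)
```

The tie-breaking rule is the same assumption as in the answer: when two rows share the day's highest value, the one with the latest received time wins.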

Summarizing data using SQL

I have a problem that I am trying to solve using SQL, and I would like your input on how to approach it.
This is what the input data and expected output look like:
container_edits - This is the input table
container | units | status | move_time
-------------------------------------------------
XYZ | 5 | Start | 2018-01-01 00:00:15
XYZ | 2 | Add | 2018-01-01 00:01:10
XYZ | 3 | Add | 2018-01-01 00:02:00
XYZ | null | Complete | 2018-01-01 00:03:00
XYZ | 5 | Start | 2018-01-01 00:04:15
XYZ | 3 | Add | 2018-01-01 00:05:10
XYZ | 4 | Add | 2018-01-01 00:06:00
XYZ | 5 | Add | 2018-01-01 00:07:10
XYZ | 6 | Add | 2018-01-01 00:08:00
XYZ | null | Complete | 2018-01-01 00:09:00
Expected summarized output
container | loop_num | units | start_time | end_time
------------------------------------------------------------------------
XYZ | 1 | 10 | 2018-01-01 00:00:15 | 2018-01-01 00:03:00
XYZ | 2 | 23 | 2018-01-01 00:04:15 | 2018-01-01 00:09:00
Essentially, I need to partition the data based on the status label, extract the minimum and maximum time within the partition and get the total number of units within that partition. I am aware of the usage of window functions and the partition by clause but I am unclear on how to apply that when I need to partition based on the value of a column ('status' in this case).
Any leads on how to go about solving this would be really helpful. Thank you!
You can assign a group using a cumulative sum of starts, which is your loop_num. The rest is aggregation:
select container, loop_num, sum(units),
       min(move_time), max(move_time)
from (select ce.*,
             sum(case when status = 'Start' then 1 else 0 end)
                 over (partition by container order by move_time) as loop_num
      from container_edits ce
     ) ce
group by container, loop_num;
Here is a db<>fiddle (it happens to use Postgres, but the syntax is standard SQL).
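To check the cumulative-sum grouping against the sample data, here is a small runnable sketch. It uses SQLite through Python as a stand-in engine (an assumption; the question does not say which database is in use), and the output aliases units, start_time and end_time are added only to match the expected output's headers.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE container_edits (container TEXT, units INTEGER, "
    "status TEXT, move_time TEXT)"
)
conn.executemany(
    "INSERT INTO container_edits VALUES (?, ?, ?, ?)",
    [
        ("XYZ", 5, "Start", "2018-01-01 00:00:15"),
        ("XYZ", 2, "Add", "2018-01-01 00:01:10"),
        ("XYZ", 3, "Add", "2018-01-01 00:02:00"),
        ("XYZ", None, "Complete", "2018-01-01 00:03:00"),
        ("XYZ", 5, "Start", "2018-01-01 00:04:15"),
        ("XYZ", 3, "Add", "2018-01-01 00:05:10"),
        ("XYZ", 4, "Add", "2018-01-01 00:06:00"),
        ("XYZ", 5, "Add", "2018-01-01 00:07:10"),
        ("XYZ", 6, "Add", "2018-01-01 00:08:00"),
        ("XYZ", None, "Complete", "2018-01-01 00:09:00"),
    ],
)

LOOP_SUMMARY = """
SELECT container, loop_num,
       SUM(units)     AS units,       -- NULL units on 'Complete' rows are ignored
       MIN(move_time) AS start_time,
       MAX(move_time) AS end_time
FROM (
    SELECT ce.*,
           -- running count of 'Start' rows: every row inherits the number
           -- of the most recent Start for its container
           SUM(CASE WHEN status = 'Start' THEN 1 ELSE 0 END)
               OVER (PARTITION BY container ORDER BY move_time) AS loop_num
    FROM container_edits ce
) ce
GROUP BY container, loop_num
ORDER BY container, loop_num
"""

loops = conn.execute(LOOP_SUMMARY).fetchall()
for row in loops:
    print(row)
```

On the sample rows this yields two loops with 10 and 23 units, matching the expected summary.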

Count the number of events between two (or more) different login periods

I would like to count the number of events that occurred for each user between logins. The logins are stored in one table, and the other events are stored in another.
So if a user logged in at 2019-10-03 10:00:00, any events that occurred after that time are grouped and counted towards that login until a new login occurs (e.g. 2019-11-04 11:00:00), after which we count towards the new login.
This means that for 2019-10-03 10:00:00 we count all events between 2019-10-03 10:00:00 and 2019-11-04 11:00:00, and for 2019-11-04 11:00:00 we count everything after it.
Another way of looking at it:
user_login:
User | Login_Timestamp
1 | 2019-10-03 10:00:00
1 | 2019-11-03 14:44:00
1 | 2019-14-03 08:01:11
user_events:
User | ... | EventTimestamp
1 | ... | 2019-10-03 10:01:00
1 | ... | 2019-10-03 10:10:00
1 | ... | 2019-11-03 13:10:00
1 | ... | 2019-11-03 14:45:11
1 | ... | 2019-11-03 14:46:11
1 | ... | 2019-14-03 10:10:00
The output I would like to get is:
User | LoginTimestamp | NumberOfEvents
1 | 2019-10-03 10:00:00 | 3
1 | 2019-11-03 14:44:00 | 2
1 | 2019-14-03 08:01:11 | 1
Thanks !
Using LEAD to pair each login with the next login of the same user:
WITH cte AS (
    SELECT user,
           Login_Timestamp AS loginTimestampStart,
           -- the next login of the same user closes the current period
           LEAD(Login_Timestamp) OVER (PARTITION BY user
                                       ORDER BY Login_Timestamp) AS loginTimestampEnd
    FROM user_login
)
SELECT c.user, c.loginTimestampStart, COUNT(*) AS NumberOfEvents
FROM cte c
JOIN user_events e
  ON c.user = e.user
 AND e.EventTimestamp >= c.loginTimestampStart
 -- the last login has no successor, so its period is open-ended
 AND (e.EventTimestamp < c.loginTimestampEnd OR c.loginTimestampEnd IS NULL)
GROUP BY c.user, c.loginTimestampStart
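The interval join can be tried end to end with the sample rows from the question. This is only a sketch: it runs on SQLite through Python, renames the columns to user_id, login_ts and event_ts (hypothetical names, chosen to avoid the USER keyword), and keeps the timestamps as text exactly as written in the question, which happens to sort correctly for this sample.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE user_login (user_id INTEGER, login_ts TEXT)")
conn.execute("CREATE TABLE user_events (user_id INTEGER, event_ts TEXT)")
conn.executemany(
    "INSERT INTO user_login VALUES (?, ?)",
    [
        (1, "2019-10-03 10:00:00"),
        (1, "2019-11-03 14:44:00"),
        (1, "2019-14-03 08:01:11"),
    ],
)
conn.executemany(
    "INSERT INTO user_events VALUES (?, ?)",
    [
        (1, "2019-10-03 10:01:00"),
        (1, "2019-10-03 10:10:00"),
        (1, "2019-11-03 13:10:00"),
        (1, "2019-11-03 14:45:11"),
        (1, "2019-11-03 14:46:11"),
        (1, "2019-14-03 10:10:00"),
    ],
)

EVENTS_PER_LOGIN = """
WITH periods AS (
    SELECT user_id,
           login_ts AS period_start,
           -- next login of the same user, NULL for the latest login
           LEAD(login_ts) OVER (PARTITION BY user_id ORDER BY login_ts) AS period_end
    FROM user_login
)
SELECT p.user_id, p.period_start, COUNT(*) AS number_of_events
FROM periods p
JOIN user_events e
  ON e.user_id = p.user_id
 AND e.event_ts >= p.period_start
 AND (e.event_ts < p.period_end OR p.period_end IS NULL)
GROUP BY p.user_id, p.period_start
ORDER BY p.user_id, p.period_start
"""

login_counts = conn.execute(EVENTS_PER_LOGIN).fetchall()
for row in login_counts:
    print(row)
```

Because this is an inner join, a login with no events before the next login does not appear in the result at all. If such logins should be reported with a count of 0, a LEFT JOIN combined with COUNT(e.event_ts) would be one way to keep them.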

How can i create retrospective trend over time using SQL

I am having trouble creating a retrospective trend over time for the following table using just SQL:
user_id | Date of Exam | Exam Name | Result
--------+-----------------+-----------+-------
1 | 2013-01-01 6:00 | Geography | PASS
1 | 2013-01-02 6:00 | Math | FAIL
1 | 2013-01-03 6:00 | Geography | FAIL
1 | 2013-01-04 6:00 | Biology | FAIL
1 | 2013-01-04 7:00 | Biology | PASS
1 | 2013-01-04 6:00 | Math | FAIL
1 | 2013-01-04 7:00 | Math | PASS
2 | 2013-01-04 7:00 | Math | FAIL
I need to get the pass rate for each day in a specific date range. For a given day X, I need the latest result available as of that day for each student (if a student has no result on that day, I take the one from the previous day; if that is also empty, the one from the day before, and so on). If a student has multiple results on one day, only the latest is used in the calculation and the older ones are ignored. I need the pass percentage for each exam on each day. The resulting table should look like this:
Exam Name | 2013-01-01 | 2013-01-02 | 2013-01-03 | 2013-01-04
-----------+------------+------------+------------+------------
Geography | 100% | 100% | 0% | 0%
Math | NULL | 0% | 0% | 50%
Biology | NULL | NULL | NULL | 100%
So far I have only managed to return a separate table for each day, but I think it should be possible to merge them into a single table. This is the query to get the latest results as of a specific day:
select ExamName, COUNT(*) as TotalCount,
       sum(case when Result = 'PASS' then 1 else 0 end) PassCount
from (SELECT
          UserID,
          ExamName,
          Result,
          DateOfExam,
          ROW_NUMBER() OVER (Partition BY UserID, ExamName Order By DateOfExam DESC) AS RowNum
      From dbo.ExamResults
      where DateOfExam <= '2013-01-04 7:00'
     ) T1
where T1.RowNum = 1
group by ExamName
SQLFiddle with some DDL: http://sqlfiddle.com/#!6/6fde8/2/0
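One way the per-day queries might be merged is to join the results against a list of days and apply the same ROW_NUMBER() trick once per day. The sketch below is an illustration under several assumptions, not a tested SQL Server solution: it runs on SQLite through Python, uses hypothetical names (exam_results, exam_time, exam_name, days), zero-pads the hours so the timestamps sort as text, hard-codes the four days, and returns one row per exam and day rather than the pivoted layout.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE exam_results (user_id INTEGER, exam_time TEXT, "
    "exam_name TEXT, result TEXT)"
)
conn.executemany(
    "INSERT INTO exam_results VALUES (?, ?, ?, ?)",
    [
        (1, "2013-01-01 06:00", "Geography", "PASS"),
        (1, "2013-01-02 06:00", "Math", "FAIL"),
        (1, "2013-01-03 06:00", "Geography", "FAIL"),
        (1, "2013-01-04 06:00", "Biology", "FAIL"),
        (1, "2013-01-04 07:00", "Biology", "PASS"),
        (1, "2013-01-04 06:00", "Math", "FAIL"),
        (1, "2013-01-04 07:00", "Math", "PASS"),
        (2, "2013-01-04 07:00", "Math", "FAIL"),
    ],
)

TREND_QUERY = """
WITH days(day) AS (
    VALUES ('2013-01-01'), ('2013-01-02'), ('2013-01-03'), ('2013-01-04')
),
ranked AS (
    -- for every day, rank each student's results per exam up to and
    -- including that day, newest first
    SELECT d.day, r.user_id, r.exam_name, r.result,
           ROW_NUMBER() OVER (PARTITION BY d.day, r.user_id, r.exam_name
                              ORDER BY r.exam_time DESC) AS rn
    FROM days d
    JOIN exam_results r ON date(r.exam_time) <= d.day
)
SELECT exam_name, day,
       100.0 * SUM(CASE WHEN result = 'PASS' THEN 1 ELSE 0 END) / COUNT(*) AS pass_pct
FROM ranked
WHERE rn = 1          -- keep only the latest result as of that day
GROUP BY exam_name, day
ORDER BY exam_name, day
"""

trend = {(exam, day): pct for exam, day, pct in conn.execute(TREND_QUERY)}
for key in sorted(trend):
    print(key, trend[key])
```

Exam and day combinations with no result yet (for example Math on 2013-01-01) simply produce no row, which corresponds to the NULL cells in the desired table. Turning the long result into one column per day would need a pivot step, for example conditional aggregation over the day column.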

Postgres 9.1 - Numbering groups of rows

I have some data that represents different 'actions'. These 'actions' collectively comprise an 'event'.
The data looks like this:
EventID | UserID | Action | TimeStamp
--------------+------------+------------+-------------------------
1 | 111 | Start | 2012-01-01 08:00:00
1 | 111 | Stop | 2012-01-01 08:59:59
1 | 999 | Start | 2012-01-01 09:00:00
1 | 999 | Stop | 2012-01-01 09:59:59
1 | 111 | Start | 2012-01-01 10:00:00
1 | 111 | Stop | 2012-01-01 10:30:00
As you can see, each single 'event' is made of one or more 'Actions' (or as I think of them, 'sub events').
I need to identify each 'sub event' and give it an identifier. This is what I am looking for:
EventID | SubeventID | UserID | Action | TimeStamp
--------------+----------------+------------+------------+-------------------------
1 | 1 | 111 | Start | 2012-01-01 08:00:00
1 | 1 | 111 | Stop | 2012-01-01 08:59:59
1 | 2 | 999 | Start | 2012-01-01 09:00:00
1 | 2 | 999 | Stop | 2012-01-01 09:59:59
1 | 3 | 111 | Start | 2012-01-01 10:00:00
1 | 3 | 111 | Stop | 2012-01-01 10:30:00
I need something that can start counting, but only increment when some column has a specific value (like "Action" = 'Start').
I have been trying to use Window Functions for this, but with limited success. I just can't seem to find a solution that I feel will work... any thoughts?
If you have some field you can sort by, you could use the following query (untested):
SELECT
    -- running count of 'Start' rows within each event
    sum(("Action" = 'Start')::int) OVER (PARTITION BY "EventID" ORDER BY "TimeStamp" ROWS UNBOUNDED PRECEDING)
FROM
    events
Note that if the first sub-event does not begin with a Start action, it will get a sub-event ID of 0, which might not be what you want.
You could also use COUNT() in place of SUM():
SELECT
    EventID
  , COUNT(CASE WHEN Action = 'Start' THEN 1 END)
        OVER ( PARTITION BY EventID
               ORDER BY TimeStamp
               ROWS UNBOUNDED PRECEDING )
    AS SubeventID
  , UserID
  , Action
FROM
    tableX AS t ;
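As a quick check of the running-count idea on the sample data, here is a minimal sketch. It runs on SQLite through Python rather than Postgres 9.1 (an assumption made only so the example is executable), and the table name actions and the snake_case column names are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE actions (event_id INTEGER, user_id INTEGER, "
    "action TEXT, ts TEXT)"
)
conn.executemany(
    "INSERT INTO actions VALUES (?, ?, ?, ?)",
    [
        (1, 111, "Start", "2012-01-01 08:00:00"),
        (1, 111, "Stop", "2012-01-01 08:59:59"),
        (1, 999, "Start", "2012-01-01 09:00:00"),
        (1, 999, "Stop", "2012-01-01 09:59:59"),
        (1, 111, "Start", "2012-01-01 10:00:00"),
        (1, 111, "Stop", "2012-01-01 10:30:00"),
    ],
)

SUBEVENT_QUERY = """
SELECT event_id,
       -- COUNT skips NULLs, so only 'Start' rows advance the counter
       COUNT(CASE WHEN action = 'Start' THEN 1 END)
           OVER (PARTITION BY event_id
                 ORDER BY ts
                 ROWS UNBOUNDED PRECEDING) AS subevent_id,
       user_id, action, ts
FROM actions
ORDER BY event_id, ts
"""

subevents = conn.execute(SUBEVENT_QUERY).fetchall()
for row in subevents:
    print(row)
```

Each Start row bumps the counter and the following Stop row inherits it, so the six sample rows receive sub-event IDs 1, 1, 2, 2, 3, 3.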