Difference between first and last row in grouping - sql

I have this table, produced by a very complex query, so joining it to itself is not practical.
Rows are ordered by time desc.
type, value, time
+---+----+
| 2 | 2 |
| 2 | 7 |
| 3 | 20 |
| 3 | 16 |
+---+----+
I need to calculate the difference between the first and last value within each type group. For the given example this gives:
+---+----+
| 2 | -5 |
| 3 | 4 |
+---+----+
Is it feasible?

One method uses window functions. Something like this works:
select distinct type,
       (first_value(value) over (partition by type order by time desc) -
        first_value(value) over (partition by type order by time asc)
       ) as diff  -- latest value minus earliest value, matching the expected -5 and 4
from t;
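Unfortunately, Postgres has first_value() only as a window function, not as an aggregate function, which is why the query needs select distinct rather than group by.
You could also do this with aggregation, using array_agg():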
select type,
       ((array_agg(value order by time desc))[1] -
        (array_agg(value order by time asc))[1]
       ) as diff
from t
group by type;
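To check the window-function approach against the expected output, here is a minimal sketch that runs it through Python's built-in sqlite3 module as a stand-in for Postgres (SQLite has first_value() but not array_agg()). The time values are made up, since the question only says the rows are listed newest first.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (type INTEGER, value INTEGER, time INTEGER)")
# Hypothetical time values: larger means more recent.
conn.executemany(
    "INSERT INTO t VALUES (?, ?, ?)",
    [(2, 2, 40), (2, 7, 30), (3, 20, 20), (3, 16, 10)],
)

query = """
SELECT DISTINCT type,
       first_value(value) OVER (PARTITION BY type ORDER BY time DESC)
     - first_value(value) OVER (PARTITION BY type ORDER BY time ASC) AS diff
FROM t
ORDER BY type
"""
diffs = conn.execute(query).fetchall()
print(diffs)  # [(2, -5), (3, 4)]
```

The latest value is taken from the descending window and the earliest from the ascending one, so the subtraction follows the row order shown in the question.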

Related

How can I create time range grouping in window function SQL

I'm trying to create a grouping using window functions in SQL. The objective is to tell apart separate runs of the same part when other parts appear in between. See the table below:
Part | time | expected result |
a | 11-29-2022 00:05:00.000 | 1 |
a | 11-29-2022 00:05:00.010 | 1 |
b | 11-29-2022 00:06:00.000 | 2 |
c | 11-29-2022 00:15:00.000 | 3 |
c | 11-29-2022 00:15:00.000 | 3 |
b | 11-29-2022 00:40:00.010 | 4 |
b | 11-29-2022 00:40:00.020 | 4 |
b | 11-29-2022 00:40:00.020 | 4 |
b | 11-29-2022 00:40:00.030 | 4 |
I'm doing something like:
Select part, time, count(*) over(Partition by Part order by time )
Let's focus on part "b": its first occurrence is at minute 6, then different parts appear, and part b appears again at minute 40, so I need something like a time range to create the grouping.
Also notice that consecutive rows for the same part sometimes differ by a few milliseconds (part b); those must belong to the same group.
I was trying to use the Rank window function with 'range between', but wasn't able to get that result.
Thanks!
Just another option via dense_rank()
Select *
,NewValue = dense_rank() over (order by convert(varchar(25),[Time],120))
From YourTable
Please try this sql query.
Select part, time, dense_rank() over(Partition by Part )
or
Select part, time, dense_rank() over(Partition by Part order by time rows between unbounded preceding and unbounded following )
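Another way to get the expected numbering is the usual "gaps and islands" pattern: flag each row whose part differs from the previous row's part, then take a running sum of the flags. The sketch below is only an illustration, run through Python's sqlite3 module rather than SQL Server; the table name parts and the column names is_new and grp are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE parts (part TEXT, time TEXT)")  # hypothetical table name
conn.executemany(
    "INSERT INTO parts VALUES (?, ?)",
    [
        ("a", "2022-11-29 00:05:00.000"),
        ("a", "2022-11-29 00:05:00.010"),
        ("b", "2022-11-29 00:06:00.000"),
        ("c", "2022-11-29 00:15:00.000"),
        ("c", "2022-11-29 00:15:00.000"),
        ("b", "2022-11-29 00:40:00.010"),
        ("b", "2022-11-29 00:40:00.020"),
        ("b", "2022-11-29 00:40:00.020"),
        ("b", "2022-11-29 00:40:00.030"),
    ],
)

query = """
SELECT part, time,
       -- Running sum of the flags; rows with equal times are peers and share a value.
       SUM(is_new) OVER (ORDER BY time) AS grp
FROM (
    SELECT part, time,
           -- 1 when the part changes (or on the very first row), else 0.
           CASE WHEN part = LAG(part) OVER (ORDER BY time) THEN 0 ELSE 1 END AS is_new
    FROM parts
)
ORDER BY time
"""
groups = [row[2] for row in conn.execute(query)]
print(groups)  # [1, 1, 2, 3, 3, 4, 4, 4, 4]
```

This relies on rows with identical timestamps belonging to the same part, as in the sample; if two different parts could share a timestamp, an extra tie-breaking column would be needed in both ORDER BY clauses.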

Aggregate results split by day

I'm trying to write a query that returns summarised data, per day, over many days of data.
For example
| id | user_id | start
|----|---------|------------------------------
| 1 | 1 | 2020-02-01T17:35:37.242+00:00
| 2 | 1 | 2020-02-01T13:25:21.344+00:00
| 3 | 1 | 2020-01-31T16:42:51.344+00:00
| 4 | 1 | 2020-01-30T06:44:55.344+00:00
The outcome I'm hoping for is a function that I can pass the user ID and a timezone (or UTC offset) into, and get out:
| day | count |
|---------|-------|
| 1/2/20 | 2 |
| 31/1/20 | 1 |
| 30/1/20 | 1 |
Where the count is all the rows that have a start time falling between 00:00:00.0000 and 23:59:59.9999 on each day - taking into consideration the supplied UTC offset.
I don't really know where to start writing a query like this, and the fact that I can't even picture where to start feels like a big gap in my SQL thinking. How should I approach something like this?
You can use:
select date_trunc('day', start) as dte, count(*)
from t
where user_id = ?
group by date_trunc('day', start)
order by dte;
If you want to handle an additional offset, build that into the query:
select dte, count(*)
from t cross join lateral
(values (date_trunc('day', start + ? * interval '1 hour'))) v(dte)
where user_id = ?
group by v.dte
order by v.dte;
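As a rough illustration of the offset idea, the sketch below wraps the same shift-then-truncate logic in a small Python function backed by sqlite3. It is a stand-in rather than the Postgres query: SQLite's date() with an hours modifier replaces date_trunc() plus interval arithmetic, timestamps are assumed to be stored as UTC text, and the function name counts_per_day is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (id INTEGER, user_id INTEGER, start TEXT)")
# Sample rows from the question, stored as UTC text (an assumption of this sketch).
conn.executemany(
    "INSERT INTO t VALUES (?, ?, ?)",
    [
        (1, 1, "2020-02-01 17:35:37.242"),
        (2, 1, "2020-02-01 13:25:21.344"),
        (3, 1, "2020-01-31 16:42:51.344"),
        (4, 1, "2020-01-30 06:44:55.344"),
    ],
)

def counts_per_day(user_id, offset_hours):
    """Count rows per local calendar day for a whole-hour UTC offset."""
    modifier = f"{offset_hours:+d} hours"  # e.g. "+9 hours" or "-5 hours"
    return conn.execute(
        """
        SELECT date(start, ?) AS day, COUNT(*)
        FROM t
        WHERE user_id = ?
        GROUP BY day
        ORDER BY day DESC
        """,
        (modifier, user_id),
    ).fetchall()

utc_counts = counts_per_day(1, 0)
plus9_counts = counts_per_day(1, 9)
print(utc_counts)    # [('2020-02-01', 2), ('2020-01-31', 1), ('2020-01-30', 1)]
print(plus9_counts)  # [('2020-02-02', 1), ('2020-02-01', 2), ('2020-01-30', 1)]
```

Shifting the timestamp before truncating is what moves rows across the day boundary; a fixed offset does not account for daylight saving changes, which a named timezone would.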

Column "f.price" must appear in the GROUP BY clause or be used in an aggregate function, but I've already used window function

I have a table with two columns: date and price. Neither is unique.
I need a running total per unique date, in date order (for the first date, the sum of its values; for the next date, its sum plus the previous total; and so on).
I know how to do this with a subquery, but I want to use window functions.
Here is a simple query:
SELECT f.date, SUM(f.price) OVER () FROM f GROUP BY f.date
It returns the error:
column f.price must appear in the GROUP BY clause or be used in an aggregate function
But I've already used aggregate function (SUM).
Can somebody tell me why this happened?
Try dropping over(); this returns the sum for each date (not a running total):
select f.date,
SUM(f.price)
from f
group by f.date
You are mixing window functions and aggregation, which is generally not a good idea. You are getting the error because column f.price is indeed not used in an aggregate function: SUM(f.price) OVER () is a window function, so f.price is neither aggregated nor listed in the GROUP BY clause.
I believe that the following query should give you what you want. It uses a window function, and relies on DISTINCT instead of aggregation.
SELECT DISTINCT fdate, SUM(fprice) OVER(ORDER BY fdate) FROM f ORDER BY fdate;
Demo on DB Fiddle:
Consider the following sample data, that seems to match your spec:
| fdate | fprice |
| ------------------------ | ------ |
| 2018-01-01T00:00:00.000Z | 1 |
| 2018-01-01T00:00:00.000Z | 2 |
| 2018-01-02T00:00:00.000Z | 3 |
| 2018-01-03T00:00:00.000Z | 4 |
| 2018-01-03T00:00:00.000Z | 1 |
The query would return:
| fdate | sum |
| ------------------------ | --- |
| 2018-01-01T00:00:00.000Z | 3 |
| 2018-01-02T00:00:00.000Z | 6 |
| 2018-01-03T00:00:00.000Z | 11 |
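If you would rather keep the aggregation, one alternative is to aggregate per date in a subquery and apply the window function to the per-date totals. The sketch below runs that variant through Python's sqlite3 module on the same sample data; the names day_total and running_total are illustrative, and dates are stored as plain text for simplicity.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE f (fdate TEXT, fprice INTEGER)")
conn.executemany(
    "INSERT INTO f VALUES (?, ?)",
    [
        ("2018-01-01", 1),
        ("2018-01-01", 2),
        ("2018-01-02", 3),
        ("2018-01-03", 4),
        ("2018-01-03", 1),
    ],
)

query = """
SELECT fdate,
       SUM(day_total) OVER (ORDER BY fdate) AS running_total
FROM (
    -- One row per date, so the outer window sees each date exactly once.
    SELECT fdate, SUM(fprice) AS day_total
    FROM f
    GROUP BY fdate
)
ORDER BY fdate
"""
running = conn.execute(query).fetchall()
print(running)  # [('2018-01-01', 3), ('2018-01-02', 6), ('2018-01-03', 11)]
```

Here the GROUP BY and the window function live at different query levels, so the "must appear in the GROUP BY clause" error does not arise.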

How can I group by the difference of a column between rows in SQL?

I have a table of events with a created_at timestamp. I want to divide them into groups of events that are N seconds apart, specifically 130 seconds. Then for each group, I just need to know the lowest timestamp and the highest timestamp.
Here's some sample data (ignore the formatting of the timestamp, it's a datetime field):
------------------------
| id | created_at |
------------------------
| 1 | 2013-1-20-08:00 |
| 2 | 2013-1-20-08:01 |
| 3 | 2013-1-20-08:05 |
| 4 | 2013-1-20-08:07 |
| 5 | 2013-1-20-08:09 |
| 6 | 2013-1-20-08:12 |
| 7 | 2013-1-20-08:20 |
------------------------
And what I would like to get as a result is:
-------------------------------------
| started_at | ended_at |
-------------------------------------
| 2013-1-20-08:00 | 2013-1-20-08:01 |
| 2013-1-20-08:05 | 2013-1-20-08:09 |
| 2013-1-20-08:12 | 2013-1-20-08:12 |
| 2013-1-20-08:20 | 2013-1-20-08:20 |
-------------------------------------
I've googled and searched every possible way of phrasing that question and experimented for some time, but I can't figure it out. I can already do this in Ruby, I'm just trying to figure out if it's possible to move this to the database level. If you're curious or it's easier to visualize, here's what it looks like in Ruby:
groups = SortedSet[*events].divide { |a,b| (a.created_at - b.created_at).abs <= 130 }
groups.map do |group|
{ started_at: group.to_a.first.created_at, ended_at: group.to_a.last.created_at }
end
Does anyone know how to do this in SQL, specifically PostgreSQL?
I think you want to start each new grouping when the difference from the previous is greater than 130 seconds. You can do this with lag and date arithmetic to determine where a grouping starts. Then do a cumulative sum to get the grouping:
select grp, min(created_at) as started_at, max(created_at) as ended_at
from (select t.*, sum(GroupStartFlag) over (order by created_at) as grp
      from (select t.*,
                   lag(created_at) over (order by created_at) as prevca,
                   -- a gap of more than 130 seconds (or no previous row) starts a new group
                   (case when extract(epoch from created_at - lag(created_at) over (order by created_at)) <= 130
                         then 0 else 1
                    end) as GroupStartFlag
            from t
           ) t
     ) t
group by grp
order by grp;
The final step is to aggregate by the group identifier (grp) to get the earliest and latest timestamps.
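To see the lag-plus-running-sum steps work on the sample data, here is a small sketch run through Python's sqlite3 module. SQLite stands in for PostgreSQL, so strftime('%s', ...) replaces extract(epoch from ...); the table name events and the column names starts_group and grp are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (id INTEGER, created_at TEXT)")  # hypothetical table
conn.executemany(
    "INSERT INTO events VALUES (?, ?)",
    [
        (1, "2013-01-20 08:00:00"),
        (2, "2013-01-20 08:01:00"),
        (3, "2013-01-20 08:05:00"),
        (4, "2013-01-20 08:07:00"),
        (5, "2013-01-20 08:09:00"),
        (6, "2013-01-20 08:12:00"),
        (7, "2013-01-20 08:20:00"),
    ],
)

query = """
SELECT MIN(created_at) AS started_at, MAX(created_at) AS ended_at
FROM (
    SELECT created_at,
           SUM(starts_group) OVER (ORDER BY created_at) AS grp
    FROM (
        SELECT created_at,
               -- Seconds since the previous event; NULL for the first row,
               -- which falls through to ELSE and starts group 1.
               CASE WHEN CAST(strftime('%s', created_at) AS INTEGER)
                       - CAST(strftime('%s', LAG(created_at) OVER (ORDER BY created_at)) AS INTEGER)
                       <= 130
                    THEN 0 ELSE 1 END AS starts_group
        FROM events
    )
)
GROUP BY grp
ORDER BY grp
"""
sessions = conn.execute(query).fetchall()
for started_at, ended_at in sessions:
    print(started_at, ended_at)
```

With the seven sample events this yields four groups (08:00 to 08:01, 08:05 to 08:09, 08:12 alone, 08:20 alone), matching the result table in the question.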

make a select distinct sorting the results according to another column

I have this table:
| DAY | TRIMESTER |
Day is an integer value, always increasing (it counts the seconds passing from day 0). TRIMESTER contains a String value ('FIRST','SECOND','THIRD',etc). I need to get the list of trimesters in the right order.
SELECT DISTINCT TRIMESTER FROM table
returns:
| TRIMESTER |
| FIRST |
| THIRD |
| SECOND |
I have assessed that this would solve my problem:
SELECT DISTINCT TRIMESTER, SUM(DAY) FROM table GROUP BY TRIMESTER ORDER BY SUM(DAY)
Is there a nicer solution which would output what I need and that would require less computing done by the database? The database is Oracle 11g and the tables are supposed to become very big.
SAMPLE DATA:
| DAY | TRIMESTER |
| 0 | FIRST |
| 10 | FIRST |
| 12 | FIRST |
| 20 | FIRST |
| 30 | SECOND |
| 35 | SECOND |
| 46 | THIRD |
I need to get in order: 'FIRST','SECOND' and 'THIRD'. Anyway I have no control over the keys in the TRIMESTER column. They are strings and might just be any string, I can't order them by name. I only know that they cover a "range" of DAY values. E.g. if I had values of "DAY" between 31 and 34 in the example, they'd all have a "SECOND" value in the trimester column.
Using GROUP BY:
select TRIMESTER
from MyTable
group by trimester
order by max(DAY)
Using RANK and PARTITION (the outer ORDER BY is needed to get the trimesters in DAY order):
SELECT TRIMESTER
FROM (
    SELECT TRIMESTER, DAY,
           RANK() OVER (partition by TRIMESTER ORDER BY DAY DESC) DAYRANK
    FROM MyTable)
WHERE DAYRANK = 1
ORDER BY DAY;
This should do it:
SELECT TRIMESTER
FROM MY_TABLE
GROUP BY TRIMESTER
ORDER BY MIN (DAY);
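As a quick check of the GROUP BY approach, the sketch below runs it through Python's sqlite3 module on the sample data, inserted out of order on purpose. SQLite stands in for Oracle here, and the table name my_table follows the last answer.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE my_table (day INTEGER, trimester TEXT)")
# Sample rows from the question, deliberately shuffled.
conn.executemany(
    "INSERT INTO my_table VALUES (?, ?)",
    [
        (46, "THIRD"),
        (30, "SECOND"),
        (0, "FIRST"),
        (35, "SECOND"),
        (12, "FIRST"),
        (20, "FIRST"),
        (10, "FIRST"),
    ],
)

query = """
SELECT trimester
FROM my_table
GROUP BY trimester
ORDER BY MIN(day)
"""
ordered = [row[0] for row in conn.execute(query)]
print(ordered)  # ['FIRST', 'SECOND', 'THIRD']
```

Because each trimester covers a contiguous range of DAY values, ordering by MIN(day) or MAX(day) gives the same result.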