Aggregate results split by day - sql

I'm trying to write a query that returns summarised data, per day, over many days of data.
For example:
| id | user_id | start
|----|---------|------------------------------
| 1 | 1 | 2020-02-01T17:35:37.242+00:00
| 2 | 1 | 2020-02-01T13:25:21.344+00:00
| 3 | 1 | 2020-01-31T16:42:51.344+00:00
| 4 | 1 | 2020-01-30T06:44:55.344+00:00
The outcome I'm hoping for is a function that I can pass the user id and a timezone (or UTC offset) into, and get out:
| day | count |
|---------|-------|
| 1/2/20 | 2 |
| 31/1/20 | 1 |
| 30/1/20 | 7 |
Where the count is all the rows that have a start time falling between 00:00:00.0000 and 23:59:59.9999 on each day - taking into consideration the supplied UTC offset.
I don't really know where to start writing a query like this, and the fact that I can't even picture where to start feels like a big gap in my SQL thinking. How should I approach something like this?

You can use:
select date_trunc('day', start) as dte, count(*)
from t
where user_id = ?
group by date_trunc('day', start)
order by dte;
If you want to handle an additional offset, build that into the query:
select dte, count(*)
from t cross join lateral
(values (date_trunc('day', start + ? * interval '1 hour'))) v(dte)
where user_id = ?
group by v.dte
order by v.dte;
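To make the offset logic concrete outside the database, here is a minimal Python sketch of the same bucketing: shift each start value by the supplied UTC offset, keep only the calendar day, and count. The helper name `counts_per_day` and the tuple layout are illustrative assumptions; the timestamps are the four sample rows from the question, which are assumed to be stored in UTC.

```python
from collections import Counter
from datetime import datetime, timedelta

# (id, user_id, start) rows copied from the question's sample table.
ROWS = [
    (1, 1, "2020-02-01T17:35:37.242+00:00"),
    (2, 1, "2020-02-01T13:25:21.344+00:00"),
    (3, 1, "2020-01-31T16:42:51.344+00:00"),
    (4, 1, "2020-01-30T06:44:55.344+00:00"),
]


def counts_per_day(rows, user_id, offset_hours):
    """Count rows per local calendar day for one user (hypothetical helper)."""
    shift = timedelta(hours=offset_hours)
    counts = Counter()
    for _id, uid, start in rows:
        if uid != user_id:
            continue
        # Shift the UTC instant by the offset, then keep only the date part,
        # which mirrors date_trunc('day', start + offset).
        local_day = (datetime.fromisoformat(start) + shift).date()
        counts[local_day] += 1
    # Newest day first, like the desired output in the question.
    return sorted(counts.items(), reverse=True)


if __name__ == "__main__":
    for day, n in counts_per_day(ROWS, user_id=1, offset_hours=11):
        print(day.isoformat(), n)
```

With an offset of 11 hours, both rows from 1 February move to 2 February, which is the kind of shift the question asks for. A fixed offset ignores daylight saving time; in PostgreSQL, converting with AT TIME ZONE and a named zone before truncating is one way to account for that.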

Related

How can I create time range grouping in window function SQL

I'm trying to create a grouping using multiple window functions in SQL. The objective is to tell apart separate occurrences of the same group when other groups appear in between. See the table below:
Part | time | expected result |
a | 11-29-2022 00:05:00.000 | 1 |
a | 11-29-2022 00:05:00.010 | 1 |
b | 11-29-2022 00:06:00.000 | 2 |
c | 11-29-2022 00:15:00.000 | 3 |
c | 11-29-2022 00:15:00.000 | 3 |
b | 11-29-2022 00:40:00.010 | 4 |
b | 11-29-2022 00:40:00.020 | 4 |
b | 11-29-2022 00:40:00.020 | 4 |
b | 11-29-2022 00:40:00.030 | 4 |
I'm doing something like:
Select part, time, count(*) over(Partition by Part order by time )
Let's focus on part "b": its first occurrence is at minute 6, then other parts appear, and part b appears again at minute 40, so I need something like a time range to create the grouping.
Also notice that sometimes the time differs by a few milliseconds even when the parts are consecutive (part b); those rows must belong to the same group.
I tried to use the RANK window function with 'range between' but wasn't able to get that result.
Thanks!
Just another option via dense_rank()
Select *
,NewValue = dense_rank() over (order by convert(varchar(25),[Time],120))
From YourTable
Please try this SQL query:
Select part, time, dense_rank() over(Partition by Part )
or
Select part, time, dense_rank() over(Partition by Part order by time rows between unbounded preceding and unbounded following )
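The expected result column in the question numbers each consecutive run of the same part once the rows are sorted by time, which is the usual "gaps and islands" pattern. As a hedged sketch of that logic only (it is not a translation of the queries above), the Python below applies `itertools.groupby` to the question's sample rows; `number_runs` is a hypothetical helper name. In SQL, the same idea is commonly written by flagging rows whose part differs from LAG(part) and taking a running SUM of those flags.

```python
from itertools import groupby

# (part, time) rows from the question. All rows share one date, so sorting
# the strings also sorts them chronologically.
ROWS = [
    ("a", "11-29-2022 00:05:00.000"),
    ("a", "11-29-2022 00:05:00.010"),
    ("b", "11-29-2022 00:06:00.000"),
    ("c", "11-29-2022 00:15:00.000"),
    ("c", "11-29-2022 00:15:00.000"),
    ("b", "11-29-2022 00:40:00.010"),
    ("b", "11-29-2022 00:40:00.020"),
    ("b", "11-29-2022 00:40:00.020"),
    ("b", "11-29-2022 00:40:00.030"),
]


def number_runs(rows):
    """Give every consecutive run of the same part its own group number."""
    ordered = sorted(rows, key=lambda r: r[1])  # stable sort by time
    numbered = []
    # groupby starts a new run whenever the part changes between neighbours.
    runs = groupby(ordered, key=lambda r: r[0])
    for run_id, (_part, run) in enumerate(runs, start=1):
        numbered.extend((part, time, run_id) for part, time in run)
    return numbered


if __name__ == "__main__":
    for part, time, group in number_runs(ROWS):
        print(part, time, group)
```

Because the run number only changes when the part changes, the millisecond differences inside the second block of "b" rows do not split that group.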

counting values with almost the same time

Good afternoon. Here is the situation: a train carries a geotag that determines its position in space, and the location data is written to a table. I need to count how many times the train was in a certain zone. The problem is that while the train is in a zone, the geotag leaves several records in the table over time. What query can be used to count the number of arrivals?
I created a query that counts how many times the train was at point 270 and at point 289. To do this, I rounded the time to the hour, but if the train arrived at the end of one hour and left at the beginning of the next, the query counts it as two arrivals. Here is the query:
Create temp table tmpTable_1 ON COMMIT DROP as
select addr, zone_id, DATE_PART('hour', time) * 100 as IntTime
from trac_path_rmp
where time between '2022.04.06' and '2022.04.07';
Create temp table tmpTable_2 ON COMMIT DROP as
select addr, zone_id, IntTime
from tmpTable_1
where addr in (12421, 12422, 12423, 12425)
group by addr, zone_id, IntTime;
select addr,
sum(case when zone_id = 289 then 1 else 0 end) as "Zone 289",
sum(case when zone_id = 270 then 1 else 0 end) as "Zone 270"
from tmpTable_2
group by addr
order by addr;
We can use LAG() OVER () to get the timestamp of the previous row and only return the rows where the difference is at least one minute. This threshold is easy to change, for example to 5 minutes.
We also keep the first row where LAG returns null.
We need to use hours and minutes because if we only use minutes we will get 0 time difference when there is exactly an hour between rows.
;WITH CTE AS
(SELECT
*,
time_ - LAG(time_) OVER (ORDER BY id) AS dd
FROM table_name)
SELECT
id,time_,addr,x,y,z,zone_id,type
FROM cte
WHERE 60 * DATE_PART('hours',dd) + DATE_PART('minutes',dd) > 0
OR dd IS null;
id | time_ | addr | x | y | z | zone_id | type
--: | :------------------ | ----: | ------: | ------: | ------: | ------: | ---:
138 | 2022-04-06 19:19:11 | 12421 | 9793.50 | 4884.70 | -125.00 | 270 | 1
141 | 2022-04-06 20:37:23 | 12421 | 9736.00 | 4856.90 | -125.00 | 270 | 1
146 | 2022-04-06 22:58:15 | 12421 | 9736.00 | 4856.90 | -125.00 | 270 | 1
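As a rough illustration of the gap rule described in this answer (keep a row when it is the first one, or when it follows the previous row by at least the threshold), here is a small Python sketch. The first, third and fifth timestamps come from the result table; the other two are made-up readings added only to show repeated records being collapsed, and `arrivals` is a hypothetical helper name.

```python
from datetime import datetime, timedelta

READINGS = [
    "2022-04-06 19:19:11",
    "2022-04-06 19:19:40",  # hypothetical repeat reading, 29 s later
    "2022-04-06 20:37:23",
    "2022-04-06 20:37:50",  # hypothetical repeat reading, 27 s later
    "2022-04-06 22:58:15",
]


def arrivals(times, min_gap=timedelta(minutes=1)):
    """Keep readings that start a new visit, judged by the gap to the previous row."""
    ordered = sorted(datetime.strptime(t, "%Y-%m-%d %H:%M:%S") for t in times)
    kept = []
    previous = None
    for current in ordered:
        # Same test as "dd IS NULL OR dd is at least the threshold":
        # the first row has no predecessor and is always kept.
        if previous is None or current - previous >= min_gap:
            kept.append(current)
        previous = current  # like LAG, always compare with the row just before
    return kept


if __name__ == "__main__":
    for t in arrivals(READINGS):
        print(t)
```

Comparing the whole interval against the threshold, rather than its hour and minute parts separately, also covers gaps longer than a day. To count arrivals per train and zone, the same loop would be run once per (addr, zone_id) pair, which corresponds to adding PARTITION BY addr, zone_id to the window.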

Comparing two tables that are the same and listing out the max date

I was wondering if it's possible to compare dates within the same table for rows with the same ID. The catch is that there is an additional column that displays the status. For instance, here's table A:
The result I would like to see is this:
I know I could use GROUP BY and the MAX aggregate on ID to find the max date; however, I would also like the associated status (Running/Stopped) column to be included. It would help me a lot.
In most databases, the fastest method (assuming the right indexes) is a correlated subquery:
select t.*
from t
where t.date = (select max(t2.date) from t t2 where t2.id = t.id);
Even if not the fastest, this should work in any database.
In Oracle, you can use the KEEP clause like this:
SELECT t.id,
MAX(t.status) KEEP (DENSE_RANK LAST ORDER BY t."DATE") AS corresponding_status,
MAX(t."DATE") AS last_date
FROM tab t
GROUP BY t.id
ORDER BY 1
For this sample data:
+----+---------+------------+
| ID | STATUS | DATE |
+----+---------+------------+
| 1 | Running | 2018-02-03 |
| 1 | Stopped | 2018-04-04 |
| 2 | Running | 2018-03-24 |
| 2 | Stopped | 2018-01-02 |
| 3 | Running | 2018-06-12 |
| 3 | Stopped | 2018-06-12 |
+----+---------+------------+
This would return this result:
+----+----------------------+------------+
| ID | CORRESPONDING_STATUS | LAST_DATE |
+----+----------------------+------------+
| 1 | Stopped | 2018-04-04 |
| 2 | Running | 2018-03-24 |
| 3 | Stopped | 2018-06-12 |
+----+----------------------+------------+
As can be seen in this SQL Fiddle.
When there are multiple entries for the same ID and DATE combination, it chooses one STATUS value: in this case the last one in alphanumerical order, because I've used MAX on STATUS.
The part LAST ORDER BY t."DATE" controls which DATE value in the group is used, i.e. the last DATE in the group.
See this Oracle Docs entry for more details.
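To see how ties on the latest date behave, the sketch below loads the sample rows from this thread into an in-memory SQLite database and runs two queries. SQLite is an assumption made only so the example can be run anywhere (the answers target other engines), and the dates are stored as ISO text so that MAX compares them correctly. `CORRELATED` is the correlated-subquery approach; `ONE_PER_ID` is a hypothetical variant that collapses ties with MAX(status), imitating what the KEEP query returns.

```python
import sqlite3

SAMPLE = [
    (1, "Running", "2018-02-03"),
    (1, "Stopped", "2018-04-04"),
    (2, "Running", "2018-03-24"),
    (2, "Stopped", "2018-01-02"),
    (3, "Running", "2018-06-12"),
    (3, "Stopped", "2018-06-12"),
]

# Every row whose date equals the latest date for its id (ties are all kept).
CORRELATED = """
SELECT t.id, t.status, t."date"
FROM t
WHERE t."date" = (SELECT MAX(t2."date") FROM t t2 WHERE t2.id = t.id)
ORDER BY t.id, t.status
"""

# Same filter, then one row per id by taking the highest status among ties.
ONE_PER_ID = """
SELECT t.id, MAX(t.status) AS corresponding_status, t."date" AS last_date
FROM t
WHERE t."date" = (SELECT MAX(t2."date") FROM t t2 WHERE t2.id = t.id)
GROUP BY t.id, t."date"
ORDER BY t.id
"""


def run(query):
    """Load the sample rows into a fresh in-memory database and run one query."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute('CREATE TABLE t (id INTEGER, status TEXT, "date" TEXT)')
        conn.executemany("INSERT INTO t VALUES (?, ?, ?)", SAMPLE)
        return conn.execute(query).fetchall()
    finally:
        conn.close()


if __name__ == "__main__":
    print(run(CORRELATED))
    print(run(ONE_PER_ID))
```

The first query returns two rows for ID 3 because both of its rows share the latest date; the second returns a single row per ID.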

How can I group by the difference of a column between rows in SQL?

I have a table of events with a created_at timestamp. I want to divide them into groups of events that are N seconds apart, specifically 130 seconds. Then for each group, I just need to know the lowest timestamp and the highest timestamp.
Here's some sample data (ignore the formatting of the timestamp, it's a datetime field):
------------------------
| id | created_at |
------------------------
| 1 | 2013-1-20-08:00 |
| 2 | 2013-1-20-08:01 |
| 3 | 2013-1-20-08:05 |
| 4 | 2013-1-20-08:07 |
| 5 | 2013-1-20-08:09 |
| 6 | 2013-1-20-08:12 |
| 7 | 2013-1-20-08:20 |
------------------------
And what I would like to get as a result is:
-------------------------------------
| started_at | ended_at |
-------------------------------------
| 2013-1-20-08:00 | 2013-1-20-08:01 |
| 2013-1-20-08:05 | 2013-1-20-08:09 |
| 2013-1-20-08:12 | 2013-1-20-08:12 |
| 2013-1-20-08:20 | 2013-1-20-08:20 |
-------------------------------------
I've googled and searched every possible way of phrasing that question and experimented for some time, but I can't figure it out. I can already do this in Ruby, I'm just trying to figure out if it's possible to move this to the database level. If you're curious or it's easier to visualize, here's what it looks like in Ruby:
groups = SortedSet[*events].divide { |a,b| (a.created_at - b.created_at).abs <= 130 }
groups.map do |group|
{ started_at: group.to_a.first.created_at, ended_at: group.to_a.last.created_at }
end
Does anyone know how to do this in SQL, specifically PostgreSQL?
I think you want to start each new grouping when the difference from the previous is greater than 130 seconds. You can do this with lag and date arithmetic to determine where a grouping starts. Then do a cumulative sum to get the grouping:
select grp, min(created_at) as started_at, max(created_at) as ended_at
from (select t.*, sum(GroupStartFlag) over (order by created_at) as grp
      from (select t.*,
                   lag(created_at) over (order by created_at) as prevca,
                   (case when extract(epoch from created_at - lag(created_at) over (order by created_at)) <= 130
                         then 0 else 1
                    end) as GroupStartFlag
            from t
           ) t
     ) t
group by grp
order by grp;
The final step aggregates by the grp identifier to get the earliest and latest timestamps of each group.
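For readers who find the nested windows hard to follow, the same rule can be sketched procedurally: walk the timestamps in order and start a new group whenever the gap to the previous row exceeds 130 seconds. This is only an illustration in Python, not the query itself; `group_by_gap` is a hypothetical helper, and the question's sample values are rewritten in a standard date format so they can be parsed.

```python
from datetime import datetime, timedelta

# created_at values from the question's sample data.
CREATED_AT = [
    "2013-01-20 08:00",
    "2013-01-20 08:01",
    "2013-01-20 08:05",
    "2013-01-20 08:07",
    "2013-01-20 08:09",
    "2013-01-20 08:12",
    "2013-01-20 08:20",
]


def group_by_gap(stamps, max_gap=timedelta(seconds=130)):
    """Return (started_at, ended_at) pairs for runs of events at most max_gap apart."""
    ordered = sorted(datetime.strptime(s, "%Y-%m-%d %H:%M") for s in stamps)
    groups = []  # each entry is [started_at, ended_at]
    for current in ordered:
        # groups[-1][1] is the previous row, i.e. what lag() would return.
        if groups and current - groups[-1][1] <= max_gap:
            groups[-1][1] = current           # flag 0: extend the current group
        else:
            groups.append([current, current])  # flag 1: start a new group
    return [(start, end) for start, end in groups]


if __name__ == "__main__":
    for started_at, ended_at in group_by_gap(CREATED_AT):
        print(started_at, ended_at)
```

On the sample data this produces the four ranges listed in the question: 08:05 to 08:09 stays together because its rows are 120 seconds apart, while 08:12 follows 08:09 by 180 seconds and starts its own group.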

make a select distinct sorting the results according to another column

I have this table:
| DAY | TRIMESTER |
Day is an integer value, always increasing (it counts the seconds passing from day 0). TRIMESTER contains a String value ('FIRST','SECOND','THIRD',etc). I need to get the list of trimesters in the right order.
SELECT DISTINCT TRIMESTER FROM table
returns:
| TRIMESTER |
| FIRST |
| THIRD |
| SECOND |
I have assessed that this would solve my problem:
SELECT DISTINCT TRIMESTER, SUM(DAY) FROM table GROUP BY TRIMESTER ORDER BY SUM(DAY)
Is there a nicer solution which would output what I need and that would require less computing done by the database? The database is Oracle 11g and the tables are supposed to become very big.
SAMPLE DATA:
| DAY | TRIMESTER |
| 0 | FIRST |
| 10 | FIRST |
| 12 | FIRST |
| 20 | FIRST |
| 30 | SECOND |
| 35 | SECOND |
| 46 | THIRD |
I need to get in order: 'FIRST','SECOND' and 'THIRD'. Anyway I have no control over the keys in the TRIMESTER column. They are strings and might just be any string, I can't order them by name. I only know that they cover a "range" of DAY values. E.g. if I had values of "DAY" between 31 and 34 in the example, they'd all have a "SECOND" value in the trimester column.
Using GROUP BY:
select TRIMESTER
from MyTable
group by trimester
order by max(DAY)
Using RANK and PARTITION:
SELECT TRIMESTER
FROM (
SELECT TRIMESTER, DAY,
RANK() OVER (partition by TRIMESTER ORDER BY DAY DESC) DAYRANK
FROM MyTable)
WHERE DAYRANK = 1
ORDER BY DAY;
This should do it:
SELECT TRIMESTER
FROM MY_TABLE
GROUP BY TRIMESTER
ORDER BY MIN (DAY);
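The GROUP BY approach can be checked quickly against the sample data. The sketch below uses Python's built-in sqlite3 module purely as a stand-in engine (an assumption; the question is about Oracle 11g), with a table named my_table created for the example and a hypothetical helper `ordered_trimesters`.

```python
import sqlite3

# (DAY, TRIMESTER) rows from the question's sample data.
SAMPLE = [
    (0, "FIRST"),
    (10, "FIRST"),
    (12, "FIRST"),
    (20, "FIRST"),
    (30, "SECOND"),
    (35, "SECOND"),
    (46, "THIRD"),
]


def ordered_trimesters(rows):
    """Return the distinct trimester labels ordered by their earliest day."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE my_table (day INTEGER, trimester TEXT)")
        conn.executemany("INSERT INTO my_table VALUES (?, ?)", rows)
        cur = conn.execute(
            "SELECT trimester FROM my_table "
            "GROUP BY trimester ORDER BY MIN(day)"
        )
        return [name for (name,) in cur]
    finally:
        conn.close()


if __name__ == "__main__":
    print(ordered_trimesters(SAMPLE))
```

Because each label covers a contiguous range of DAY values, ordering by MIN(DAY) or MAX(DAY) gives the same sequence, and the labels themselves never need to be sortable.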