SQLite group by every specific interval - sql

Let's assume that I have a table of entries, and each entry has a timestamp column (stored as a Long) that tells us when the entry arrived in the table.
Now I want to write a SELECT query that tells me how many entries arrived within a selected interval, grouped at a given frequency.
For example: the interval is from 27.10.2020 to 30.10.2020 and the frequency is 6 hours. The result of the query would tell me how many entries arrived in this interval, in 6-hour groups.
Like:
27.10.2020 00:00:00 - 27.10.2020 06:00:00 : 2 entries
27.10.2020 06:00:00 - 27.10.2020 12:00:00 : 5 entries
27.10.2020 12:00:00 - 27.10.2020 18:00:00 : 0 entries
27.10.2020 18:00:00 - 28.10.2020 00:00:00 : 11 entries
28.10.2020 00:00:00 - 28.10.2020 06:00:00 : 8 entries
etc ...
The frequency parameter can be given in hours, days, weeks ...
Thank you all for your help!

First you need a recursive CTE that returns the time intervals:
with cte as (
select '2020-10-27 00:00:00' datestart,
datetime('2020-10-27 00:00:00', '+6 hour') dateend
union all
select dateend,
min('2020-10-30 00:00:00', datetime(dateend, '+6 hour'))
from cte
where dateend < '2020-10-30 00:00:00'
)
Then you must LEFT join this CTE to the table and aggregate:
with cte as (
select '2020-10-27 00:00:00' datestart,
datetime('2020-10-27 00:00:00', '+6 hour') dateend
union all
select dateend,
min('2020-10-30 00:00:00', datetime(dateend, '+6 hour'))
from cte
where dateend < '2020-10-30 00:00:00'
)
select c.datestart, c.dateend, count(t.datecol) entries
from cte c left join tablename t
on datetime(t.datecol, 'unixepoch') >= c.datestart and datetime(t.datecol, 'unixepoch') < c.dateend
group by c.datestart, c.dateend
Replace tablename and datecol with the names of your table and date column.
If your date column contains milliseconds then change the ON clause to this:
on datetime(t.datecol / 1000, 'unixepoch') >= c.datestart
and datetime(t.datecol / 1000, 'unixepoch') < c.dateend
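For a quick end-to-end check, the recursive CTE and LEFT join above can be run from Python's built-in sqlite3 module. This is only a sketch: the table name entries, the column ts (epoch seconds, UTC), the bound parameter names and the sample rows are all made up for illustration.

```python
import sqlite3
from datetime import datetime, timezone

def epoch(text):
    """Convert 'YYYY-MM-DD HH:MM:SS' (taken as UTC) to epoch seconds."""
    dt = datetime.strptime(text, "%Y-%m-%d %H:%M:%S")
    return int(dt.replace(tzinfo=timezone.utc).timestamp())

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE entries (ts INTEGER)")  # hypothetical table
sample = [
    "2020-10-27 01:00:00",
    "2020-10-27 05:59:59",
    "2020-10-27 06:00:00",
    "2020-10-28 23:00:00",
]
conn.executemany("INSERT INTO entries VALUES (?)", [(epoch(s),) for s in sample])

QUERY = """
with cte as (
  select :start datestart, datetime(:start, :step) dateend
  union all
  select dateend, min(:stop, datetime(dateend, :step))
  from cte
  where dateend < :stop
)
select c.datestart, c.dateend, count(t.ts) entries
from cte c left join entries t
  on datetime(t.ts, 'unixepoch') >= c.datestart
 and datetime(t.ts, 'unixepoch') < c.dateend
group by c.datestart, c.dateend
order by c.datestart
"""

params = {
    "start": "2020-10-27 00:00:00",
    "stop": "2020-10-30 00:00:00",
    "step": "+6 hours",  # any SQLite date modifier, e.g. '+1 day' or '+7 days'
}
rows = conn.execute(QUERY, params).fetchall()
for row in rows:
    print(row)
```

Passing the step as a date modifier string keeps the frequency configurable without editing the SQL; intervals without entries still show up with a count of 0 because of the LEFT join.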

Here is one option:
select
datetime((strftime('%s', ts) / (6 * 60 * 60)) * 6 * 60 * 60, 'unixepoch') newts,
count(*) cnt
from mytable
where ts >= '2020-10-27' and ts < '2020-10-30'
group by newts
order by newts
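ts represents the datetime column in your table. SQLite does not have a long datatype (or a dedicated date type), so this assumes that you have a legitimate date stored as text, in a format that SQLite's date functions understand.
The logic of the query is to turn the date into an epoch timestamp, then round it down to a multiple of 6 hours, which is represented by 6 * 60 * 60. Intervals that contain no rows do not appear in the result.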
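If the timestamps are stored as integer epoch seconds rather than as text, the same rounding can be applied to the column directly. Here is a minimal sketch using Python's sqlite3 module; the integer column ts, the bound parameter :bucket and the sample values are assumptions for illustration.

```python
import sqlite3

BUCKET = 6 * 60 * 60   # six hours, in seconds
BASE = 1603756800      # 2020-10-27 00:00:00 UTC as epoch seconds

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mytable (ts INTEGER)")  # hypothetical layout
conn.executemany(
    "INSERT INTO mytable VALUES (?)",
    [
        (BASE + 1 * 3600,),      # 01:00, first bucket
        (BASE + 2 * 3600,),      # 02:00, first bucket
        (BASE + 6 * 3600 + 1,),  # 06:00:01, second bucket
        (BASE + 18 * 3600,),     # 18:00, fourth bucket
    ],
)

rows = conn.execute(
    """
    select datetime((ts / :bucket) * :bucket, 'unixepoch') newts,
           count(*) cnt
    from mytable
    where ts >= cast(strftime('%s', '2020-10-27') as integer)
      and ts <  cast(strftime('%s', '2020-10-30') as integer)
    group by newts
    order by newts
    """,
    {"bucket": BUCKET},
).fetchall()

for row in rows:
    print(row)
```

Integer division truncates each timestamp to the start of its 6-hour bucket. The 12:00 bucket has no rows in this sample, so it is simply absent from the output.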

Related

BigQuery - Query for each elements

I would like to loop over several elements for a query.
Here is the query :
SELECT
timestamp_trunc(timestamp, DAY) as Day,
count(1) as Number
FROM `table`
WHERE user_id="12345" AND timestamp >= '2021-07-05 00:00:00 UTC' AND timestamp <= '2021-07-08 23:59:59 UTC'
GROUP BY 1
ORDER BY Day
So for the user "12345" I get a row count for each day between two dates, which is perfect.
But I would like to run this query for each user_id in my table.
Thank you very much
SELECT
timestamp_trunc(timestamp, DAY) as Day,
user_id,
count(1) as Number
FROM `table`
WHERE timestamp >= '2021-07-05 00:00:00 UTC' AND timestamp <= '2021-07-08 23:59:59 UTC'
GROUP BY 1, 2
ORDER BY Day
If you know the users, then use conditional aggregation:
SELECT timestamp_trunc(timestamp, DAY) as Day,
COUNTIF(user_id = '12345') as cnt_12345,
COUNTIF(user_id = '67') as cnt_67,
COUNTIF(user_id = '89') as cnt_89
FROM `table`
WHERE timestamp >= '2021-07-05 00:00:00 UTC' AND
timestamp < '2021-07-09 00:00:00 UTC'
GROUP BY 1
ORDER BY 1;
Note the change that I made to the time comparison as well -- so you don't have to worry about fractions of a second before midnight.
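The point about fractions of a second can be illustrated outside BigQuery. Here is a small Python sketch with a made-up event timestamp, comparing an inclusive upper bound of 23:59:59 with an exclusive upper bound at the following midnight:

```python
from datetime import datetime, timezone

# A hypothetical event half a second before midnight on the last day.
event = datetime(2021, 7, 8, 23, 59, 59, 500000, tzinfo=timezone.utc)

start = datetime(2021, 7, 5, tzinfo=timezone.utc)
closed_end = datetime(2021, 7, 8, 23, 59, 59, tzinfo=timezone.utc)
open_end = datetime(2021, 7, 9, tzinfo=timezone.utc)

in_closed = start <= event <= closed_end   # inclusive end at 23:59:59
in_half_open = start <= event < open_end   # exclusive end at the next midnight

print(in_closed, in_half_open)
```

The inclusive bound drops the event, while the half-open range keeps it and still excludes anything stamped exactly at the next midnight.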

How to select the count of entities, where a time series date is between start and end date of that entity

I'd like to select a day series, kind of like this:
SELECT generate_series(
timestamp without time zone '2016-10-16',
timestamp without time zone '2016-10-17',
'1 day')
I also have entities, each of which always has a startdate and an enddate.
For each day in the series above, I would like to select the number (count) of entities for which that day lies between the two dates.
Output would be (kinda) like this:
[image: select output]
Any help is appreciated!
edit: Here is a sample table:
table occurrences (
datestart DATE
dateend DATE
)
Schema and insert statements:
create table entities (id int, startdate date,enddate date);
insert into entities values(1, '2016-10-10','2016-10-18');
insert into entities values(1, '2016-10-10','2016-10-16');
insert into entities values(1, '2016-10-15','2016-10-16');
insert into entities values(1, '2016-10-17','2016-10-18');
Query:
with dateseries as
(
SELECT generate_series(
timestamp without time zone '2016-10-16',
timestamp without time zone '2016-10-20',
'1 day')
)
select generate_series,
(select count(*) from entities e where d.generate_series between e.startdate and e.enddate) entities_count
from dateseries d
Output:
generate_series     | entities_count
--------------------+---------------
2016-10-16 00:00:00 | 3
2016-10-17 00:00:00 | 2
2016-10-18 00:00:00 | 2
2016-10-19 00:00:00 | 0
2016-10-20 00:00:00 | 0
If I understand correctly, you can do something like this:
SELECT gs.dte, COUNT(e.id)
FROM generate_series('2016-10-16'::date, '2016-10-17'::date, '1 day') gs(dte) LEFT JOIN
entities e
ON e.startdate < gs.dte + interval '1 day' AND
e.enddate >= gs.dte
GROUP BY gs.dte;
Dates seem sufficient for the logic you want to implement, but you can, of course, use timestamps instead.
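To sanity-check the counting logic without a database, the same overlap test can be sketched in plain Python. The rows mirror the sample entities inserted above; the helper name day_series is hypothetical and only imitates generate_series with a step of '1 day'.

```python
from datetime import date, timedelta

# Same sample rows as the entities table above: (startdate, enddate).
entities = [
    (date(2016, 10, 10), date(2016, 10, 18)),
    (date(2016, 10, 10), date(2016, 10, 16)),
    (date(2016, 10, 15), date(2016, 10, 16)),
    (date(2016, 10, 17), date(2016, 10, 18)),
]

def day_series(first, last):
    """Yield every date from first to last inclusive."""
    day = first
    while day <= last:
        yield day
        day += timedelta(days=1)

# For each day, count the entities whose [startdate, enddate] range contains it.
counts = {
    day: sum(1 for start, end in entities if start <= day <= end)
    for day in day_series(date(2016, 10, 16), date(2016, 10, 20))
}

for day, n in counts.items():
    print(day, n)
```

Days that no entity covers still appear with a count of 0, which is what the LEFT JOIN (or the correlated subquery) achieves in SQL.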

SQL query SELECT time intervals

I'm trying to SELECT all the rows from a SQL database that fall within a given time-of-day interval, for every day.
The datetime column is called "Dt" and has the following datetime format: 2019-10-17 16:03:43
I'd like to extract all the rows from this table where Dt is between 22:00:00 and 02:00:00, for every day.
SELECT *
FROM MY_TABLE
WHERE "Dt" BETWEEN '*-*- 22:00:00' AND '*-*- 02:00:00';
where * should be any...
Thanks for your support!
EDIT: I forgot to mention: I'm using the integrated SQL interpreter from DB Browser for SQLite
You need to extract the time part of the date and check that it is within the range. Since midnight falls between 22:00 and 02:00, you will need to split the check into two comparisons: time between 22:00 and 00:00, and time between 00:00 and 02:00.
To see how to extract the time take a look at this question.
With Postgres, assuming dt is defined as timestamp you can do the following:
SELECT *
FROM MY_TABLE
WHERE "Dt"::time >= time '22:00:00'
or "Dt"::time <= time '02:00:00'
Or if you want to exclude timestamps at precisely 02:00:00
SELECT *
FROM MY_TABLE
WHERE "Dt"::time >= time '22:00:00'
or "Dt"::time < time '02:00:00'
select DT_time from (
select cast (substr(to_char(Dt,'dd-mm-yyyy HH24:MI:SS'),12,2) as integer ) as DT_time from MY_TABLE )
where DT_time >= 22 or DT_time < 2;
between 22:00:00 and 02:00:00
means:
SELECT *
FROM MY_TABLE
WHERE
substr(Dt, 12) BETWEEN '22:00:00' AND '23:59:59'
OR
substr(Dt, 12) BETWEEN '00:00:00' AND '02:00:00'
This will work:
SELECT *
FROM MY_TABLE
WHERE DATEPART(HOUR, Dt) >= 22
OR DATEPART(HOUR, Dt) < 2
Update:
SELECT *
FROM MY_TABLE
WHERE Dt Between DATEADD (hour,22,DATEADD(day, DATEDIFF(day, 0, Dt), 0)) AND DATEADD (hour,2,DATEADD(day, DATEDIFF(day, -1, Dt), 0))
The first query only looks at the hour of the day, so use it if you don't care about dates and only want the time interval.
The second one anchors the range to a particular date and the consecutive next date: it compares Dt with 22:00 on its own date and 02:00 on the following date, so it matches rows from 22:00 up to midnight.
For SQLite:
SELECT *
FROM MY_TABLE
WHERE strftime('%H', Dt) >= '22'
OR strftime('%H', Dt) < '02'
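A quick way to try a time-of-day filter in SQLite is Python's built-in sqlite3 module. The sketch below assumes Dt is stored as text in the YYYY-MM-DD HH:MM:SS format from the question; the sample rows are made up. It passes the column itself (not a quoted string) to strftime and compares the two-digit hour as text, because strftime returns text.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE MY_TABLE (Dt TEXT)")
conn.executemany(
    "INSERT INTO MY_TABLE VALUES (?)",
    [
        ("2019-10-17 16:03:43",),  # afternoon, outside the window
        ("2019-10-17 22:00:00",),  # start of the window
        ("2019-10-17 23:59:59",),
        ("2019-10-18 01:30:00",),  # after midnight, still inside
        ("2019-10-18 02:00:00",),  # hour '02' is not < '02', so excluded
    ],
)

rows = conn.execute(
    """
    SELECT Dt
    FROM MY_TABLE
    WHERE strftime('%H', Dt) >= '22'
       OR strftime('%H', Dt) < '02'
    ORDER BY Dt
    """
).fetchall()

night = [r[0] for r in rows]
print(night)
```

Because only the hour is compared, a row at exactly 02:00:00 is left out; comparing strftime('%H:%M:%S', Dt) <= '02:00:00' instead would include it.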

How to return zero value from a count with no rows

Here is my query, which gives the count grouped by hour:
SELECT ADD_SECONDS(start_time, - MINUTE(start_time) * 60 - SECOND(start_time)) as time,
to_integer(to_varchar(start_time, 'DD')) as day,
count(*) as count
FROM SYSTEM.TABLE
where start_time >= '2016-01-01 00:00:00' and start_time <= '2016-01-01 23:59:59' and place_id=1
group by ADD_SECONDS(start_time, - MINUTE(start_time) * 60 - SECOND(start_time)), to_integer(to_varchar(start_time, 'DD'))
order by ADD_SECONDS(start_time, - MINUTE(start_time) * 60 - SECOND(start_time))
But for the period from 11:00 pm to 12:00 am I have no rows, so instead of that row being missing I want it returned with a count of 0.
After some searching I found that COALESCE might help, so I tried:
SELECT COALESCE ((
SELECT ADD_SECONDS(start_time, - MINUTE(start_time) * 60 - SECOND(start_time)) as time,
to_integer(to_varchar(start_time, 'DD')) as day,
count(*) as count
FROM SYSTEM.TABLE
where start_time >= '2016-01-01 00:00:00' and start_time <= '2016-01-01 23:59:59' and place_id=1
group by ADD_SECONDS(start_time, - MINUTE(start_time) * 60 - SECOND(start_time)), to_integer(to_varchar(start_time, 'DD'))
order by ADD_SECONDS(start_time, - MINUTE(start_time) * 60 - SECOND(start_time))
), 0);
But that did not work either. Any help is appreciated.
In order to 'have a row' for every hour, you need to create the hour rows independently of your data. This can be done, for example, by using an auxiliary table that has a row for every hour.
SAP HANA provides shared default auxiliary tables that can be used for that purpose (these tables also facilitate date/time conversion via lookup): M_TIME_DIMENSION... (see the SAP HANA documentation on M_TIME_DIMENSION).
Just select the range of time values you are interested in from these tables and outer join it with your aggregated actual values.
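The idea of generating the hour rows first and outer joining the data onto them can be sketched as follows. Note that this uses SQLite through Python's sqlite3 module as a stand-in, not SAP HANA: the table visits, its columns and the sample rows are hypothetical, and a recursive CTE plays the role of the auxiliary time table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE visits (start_time TEXT, place_id INTEGER)")
conn.executemany(
    "INSERT INTO visits VALUES (?, ?)",
    [
        ("2016-01-01 09:15:00", 1),
        ("2016-01-01 09:45:00", 1),
        ("2016-01-01 22:10:00", 1),
        ("2016-01-01 23:05:00", 2),  # other place, must not be counted
    ],
)

rows = conn.execute(
    """
    with recursive hours(h) as (
      select 0
      union all
      select h + 1 from hours where h < 23
    )
    select h, count(v.start_time) as cnt
    from hours
    left join visits v
      on cast(strftime('%H', v.start_time) as integer) = h
     and v.start_time >= '2016-01-01 00:00:00'
     and v.start_time <  '2016-01-02 00:00:00'
     and v.place_id = 1
    group by h
    order by h
    """
).fetchall()

per_hour = dict(rows)
print(per_hour)
```

The filters on date and place_id sit in the ON clause rather than in a WHERE clause; moving them to WHERE would discard the unmatched hour rows and bring back the original problem.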

Postgresql generate_series of months

I'm trying to generate a series in PostgreSQL with the generate_series function. I need a series of months starting from Jan 2008 until the current month + 12 (a year out). I'm restricted to PostgreSQL 8.3.14 (so I don't have the timestamp series options in 8.4).
I know how to get a series of days like:
select generate_series(0,365) + date '2008-01-01'
But I am not sure how to do months.
select DATE '2008-01-01' + (interval '1' month * generate_series(0,11))
Edit
If you need to calculate the number dynamically, the following could help:
select DATE '2008-01-01' + (interval '1' month * generate_series(0,month_count::int))
from (
select extract(year from diff) * 12 + extract(month from diff) + 12 as month_count
from (
select age(current_timestamp, TIMESTAMP '2008-01-01 00:00:00') as diff
) td
) t
This calculates the number of months since 2008-01-01 and then adds 12 on top of it.
But I agree with Scott: you should put this into a set returning function, so that you can do something like select * from calc_months(DATE '2008-01-01')
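The month arithmetic can be sanity-checked outside the database. Here is a small Python sketch; the helper name month_series and its parameters are hypothetical, and it counts whole calendar months the same way the query above does (months since the start date, plus 12).

```python
from datetime import date

def month_series(start, today, extra_months=12):
    """First-of-month dates from start's month through today's month plus extra_months."""
    first = start.year * 12 + (start.month - 1)
    last = today.year * 12 + (today.month - 1) + extra_months
    return [date(idx // 12, idx % 12 + 1, 1) for idx in range(first, last + 1)]

# Series from January 2008 up to one year past the current month.
months = month_series(date(2008, 1, 1), date.today())
print(months[0], months[-1], len(months))

# Fixed reference date, to make the arithmetic easy to follow.
example = month_series(date(2008, 1, 1), date(2009, 3, 15))
print(example[0], example[-1], len(example))
```

Representing each month as year * 12 + month index avoids any day-of-month edge cases, since only the first of each month is ever constructed.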
You can multiply an interval by generate_series like this:
SELECT date '2014-02-01' + interval '1' month * s.a AS date
FROM generate_series(0,3,1) AS s(a);
Which would result in:
date
---------------------
2014-02-01 00:00:00
2014-03-01 00:00:00
2014-04-01 00:00:00
2014-05-01 00:00:00
(4 rows)
You can also join in other tables this way:
SELECT date '2014-02-01' + interval '1' month * s.a AS date, t.date, t.id
FROM generate_series(0,3,1) AS s(a)
LEFT JOIN <other table> t ON t.date=date '2014-02-01' + interval '1' month * s.a;
You can pass an interval step to generate_series like this:
SELECT TO_CHAR(months, 'YYYY-MM') AS "dateMonth"
FROM generate_series(
'2008-01-01' :: DATE,
'2008-06-01' :: DATE ,
'1 month'
) AS months
Which would result in:
dateMonth
-----------
2008-01
2008-02
2008-03
2008-04
2008-05
2008-06
(6 rows)
Well, if you only need months, you could do:
select extract(month from days)
from(
select generate_series(0,365) + date'2008-01-01' as days
)dates
group by 1
order by 1;
and just parse that into a date string...
But since you know you'll end up with months 1,2,..,12, why not just go with select generate_series(1,12);?
In generate_series() you can define the step, which is one month in your case. So you can dynamically define the starting date (e.g. 2008-01-01), the ending date (e.g. 2008-01-01 + 12 months) and the step (e.g. 1 month).
SELECT generate_series('2008-01-01', '2008-01-01'::date + interval '12 month', '1 month')::date AS generated_dates
and you get
1/1/2008
2/1/2008
3/1/2008
4/1/2008
5/1/2008
6/1/2008
7/1/2008
8/1/2008
9/1/2008
10/1/2008
11/1/2008
12/1/2008
1/1/2009