BigQuery - Query for each element

I would like to run a query once for each of several elements.
Here is the query:
SELECT
timestamp_trunc(timestamp, DAY) as Day,
count(1) as Number
FROM `table`
WHERE user_id="12345" AND timestamp >= '2021-07-05 00:00:00 UTC' AND timestamp <= '2021-07-08 23:59:59 UTC'
GROUP BY 1
ORDER BY Day
This gives me, for the user "12345", a row count for each day between two dates, which is exactly what I want.
But I would like to run this query for every user_id in my table.
Thank you very much.

SELECT
timestamp_trunc(timestamp, DAY) as Day,
user_id,
count(1) as Number
FROM `table`
WHERE timestamp >= '2021-07-05 00:00:00 UTC' AND timestamp <= '2021-07-08 23:59:59 UTC'
GROUP BY 1, 2
ORDER BY Day
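The grouping idea in the query above (one output row per day and user_id) can be tried locally. The following is only a sketch: it uses Python's built-in sqlite3 module as a stand-in for BigQuery, so date(ts) replaces timestamp_trunc, and the events table, its columns, and the sample rows are all hypothetical. It also uses a half-open time range (>= start, < end), which is an assumption and not what the query above does.

```python
import sqlite3


def counts_per_user_per_day(rows, start, end):
    """Count rows per (day, user_id) where start <= ts < end."""
    conn = sqlite3.connect(":memory:")
    # Hypothetical layout: user_id as text, ts as 'YYYY-MM-DD HH:MM:SS' text.
    conn.execute("CREATE TABLE events (user_id TEXT, ts TEXT)")
    conn.executemany("INSERT INTO events VALUES (?, ?)", rows)
    result = conn.execute(
        """
        SELECT date(ts) AS day, user_id, COUNT(*) AS number
        FROM events
        WHERE ts >= ? AND ts < ?
        GROUP BY day, user_id
        ORDER BY day, user_id
        """,
        (start, end),
    ).fetchall()
    conn.close()
    return result


sample_events = [
    ("12345", "2021-07-05 08:00:00"),
    ("12345", "2021-07-05 23:59:59.500"),
    ("67", "2021-07-05 09:30:00"),
    ("12345", "2021-07-06 10:00:00"),
    ("67", "2021-07-09 00:00:00"),  # falls outside the half-open range
]

per_user = counts_per_user_per_day(
    sample_events, "2021-07-05 00:00:00", "2021-07-09 00:00:00"
)
for row in per_user:
    print(row)
```

Adding user_id to both the SELECT list and the GROUP BY is the only change needed to go from one user to all users.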

If you know the users, then use conditional aggregation:
SELECT timestamp_trunc(timestamp, DAY) as Day,
COUNTIF(user_id = '12345') as cnt_12345,
COUNTIF(user_id = '67') as cnt_67,
COUNTIF(user_id = '89') as cnt_89
FROM `table`
WHERE timestamp >= '2021-07-05 00:00:00 UTC' AND
timestamp < '2021-07-09 00:00:00 UTC'
GROUP BY 1
ORDER BY 1;
Note that I also changed the time comparison to a half-open range (>= the start, < the following midnight), so you don't have to worry about fractions of a second before midnight. The user IDs are compared as strings, as in the question.
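Conditional aggregation, as described above, can be sketched with Python's built-in sqlite3 module. This is an illustration under assumptions, not BigQuery code: SQLite has no COUNTIF, so SUM(CASE WHEN ... THEN 1 ELSE 0 END) plays that role, and the events table, the pivot_counts helper, and the sample rows are hypothetical.

```python
import sqlite3


def pivot_counts(rows, user_ids, start, end):
    """Return one row per day with one count column per known user."""
    if not user_ids:
        raise ValueError("at least one user id is required")
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE events (user_id TEXT, ts TEXT)")
    conn.executemany("INSERT INTO events VALUES (?, ?)", rows)
    # One SUM(CASE ...) column per user; the ids are bound as parameters.
    columns = ", ".join(
        "SUM(CASE WHEN user_id = ? THEN 1 ELSE 0 END)" for _ in user_ids
    )
    sql = (
        f"SELECT date(ts) AS day, {columns} "
        "FROM events WHERE ts >= ? AND ts < ? "
        "GROUP BY day ORDER BY day"
    )
    result = conn.execute(sql, (*user_ids, start, end)).fetchall()
    conn.close()
    return result


sample_events = [
    ("12345", "2021-07-05 08:00:00"),
    ("12345", "2021-07-05 23:59:59.500"),
    ("67", "2021-07-05 09:30:00"),
    ("12345", "2021-07-06 10:00:00"),
]

pivot = pivot_counts(
    sample_events, ["12345", "67"], "2021-07-05 00:00:00", "2021-07-09 00:00:00"
)
for row in pivot:
    print(row)
```

This shape only works when the list of users is known in advance, because each user becomes a column of the result.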

Related

Window function for average

I have this table, timestamp_table, and I'm using Presto SQL:
timestamp | id
2021-01-01 10:00:00 | 2456
I would like to compute the number of unique IDs in the last 24 and 48 hours. I thought this could be achieved with window functions, but I'm struggling. This is my proposed solution, but it needs work:
SELECT COUNT(id) OVER (PARTITION BY timestamp ORDER BY timestamp RANGE BETWEEN INTERVAL '24' HOUR PRECEDING AND CURRENT ROW)

You're probably having trouble due to the PARTITION BY clause, since the COUNT will only apply to rows within the same timestamp values.
Try something like this, as a starting point:
SELECT *
, COUNT(id) OVER (ORDER BY timestamp RANGE BETWEEN INTERVAL '24' HOUR PRECEDING AND CURRENT ROW)
, MIN(id) OVER (ORDER BY timestamp RANGE BETWEEN INTERVAL '24' HOUR PRECEDING AND CURRENT ROW)
FROM tbl
;

I don't think you can get the data for both time intervals from a single grouped pass over the table, because a row from the last 24 hours must be counted in both groups: 24 hours and 48 hours. So you need two queries, or a UNION ALL of them. Note that Presto's date_add takes the unit as a quoted string:
select 'h24', count(distinct id)
from timestamp_table
where timestamp < current_timestamp and timestamp >= date_add('day', -1, current_timestamp)
union all
select 'h48', count(distinct id)
from timestamp_table
where timestamp < current_timestamp and timestamp >= date_add('day', -2, current_timestamp)
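The UNION ALL approach above can be checked on a small data set. The sketch below uses Python's built-in sqlite3 module instead of Presto, so datetime(:now, '-1 day') stands in for date_add, the current time is passed in as a parameter to keep the result reproducible, and the column name ts and the sample rows are hypothetical.

```python
import sqlite3


def unique_ids_last_windows(rows, now):
    """Count distinct ids seen in the 24 and 48 hours before `now`."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE timestamp_table (ts TEXT, id INTEGER)")
    conn.executemany("INSERT INTO timestamp_table VALUES (?, ?)", rows)
    sql = """
        SELECT 'h24' AS label, COUNT(DISTINCT id) AS unique_ids
        FROM timestamp_table
        WHERE ts < :now AND ts >= datetime(:now, '-1 day')
        UNION ALL
        SELECT 'h48', COUNT(DISTINCT id)
        FROM timestamp_table
        WHERE ts < :now AND ts >= datetime(:now, '-2 day')
    """
    result = dict(conn.execute(sql, {"now": now}).fetchall())
    conn.close()
    return result


sample_rows = [
    ("2021-01-03 09:00:00", 1),
    ("2021-01-03 08:00:00", 1),  # same id again, counted once
    ("2021-01-02 11:00:00", 2),
    ("2021-01-02 09:00:00", 3),  # only inside the 48-hour window
    ("2021-01-01 09:00:00", 4),  # older than 48 hours
]

windows = unique_ids_last_windows(sample_rows, "2021-01-03 10:00:00")
print(windows)
```

Each branch of the UNION ALL scans the table with its own filter, which is why a row from the last 24 hours is counted in both results.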

SQLite group by every specific interval

Let's assume that I have a table of entries, and each entry contains a timestamp column (as Long) that tells us when the entry arrived in the table.
Now I want to write a SELECT query that tells me how many entries arrived within a selected interval, grouped at a given frequency.
For example: the interval is from 27.10.2020 to 30.10.2020 and the frequency is 6 hours. The result of the query would tell me how many entries arrived in this interval, in 6-hour groups.
Like:
27.10.2020 00:00:00 - 27.10.2020 06:00:00 : 2 entries
27.10.2020 06:00:00 - 27.10.2020 12:00:00 : 5 entries
27.10.2020 12:00:00 - 27.10.2020 18:00:00 : 0 entries
27.10.2020 18:00:00 - 28.10.2020 00:00:00 : 11 entries
28.10.2020 00:00:00 - 28.10.2020 06:00:00 : 8 entries
etc ...
The frequency parameter can be given in hours, days, weeks ...
Thank you all for your help!

First you need a recursive CTE like this, which returns the time intervals:
with cte as (
select '2020-10-27 00:00:00' datestart,
datetime('2020-10-27 00:00:00', '+6 hour') dateend
union all
select dateend,
min('2020-10-30 00:00:00', datetime(dateend, '+6 hour'))
from cte
where dateend < '2020-10-30 00:00:00'
)
Then you must LEFT JOIN this CTE to the table and aggregate:
with cte as (
select '2020-10-27 00:00:00' datestart,
datetime('2020-10-27 00:00:00', '+6 hour') dateend
union all
select dateend,
min('2020-10-30 00:00:00', datetime(dateend, '+6 hour'))
from cte
where dateend < '2020-10-30 00:00:00'
)
select c.datestart, c.dateend, count(t.datecol) entries
from cte c left join tablename t
on datetime(t.datecol, 'unixepoch') >= c.datestart and datetime(t.datecol, 'unixepoch') < c.dateend
group by c.datestart, c.dateend
Replace tablename and datecol with the names of your table and date column.
If your date column contains milliseconds then change the ON clause to this:
on datetime(t.datecol / 1000, 'unixepoch') >= c.datestart
and datetime(t.datecol / 1000, 'unixepoch') < c.dateend
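The recursive CTE plus LEFT JOIN above can be run end to end with Python's built-in sqlite3 module. This is a sketch under assumptions: the interval bounds and step are passed as parameters instead of literals, datecol holds epoch seconds, and the bucket_counts and epoch helpers and the sample entries are hypothetical.

```python
import sqlite3
from datetime import datetime, timezone


def epoch(text):
    """Convert 'YYYY-MM-DD HH:MM:SS' (taken as UTC) to epoch seconds."""
    parsed = datetime.strptime(text, "%Y-%m-%d %H:%M:%S")
    return int(parsed.replace(tzinfo=timezone.utc).timestamp())


def bucket_counts(epochs, start, end, step):
    """Count entries per bucket between start and end, keeping empty buckets."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE tablename (datecol INTEGER)")
    conn.executemany("INSERT INTO tablename VALUES (?)", [(e,) for e in epochs])
    sql = """
        WITH RECURSIVE cte AS (
            SELECT :start AS datestart, datetime(:start, :step) AS dateend
            UNION ALL
            SELECT dateend, min(:end, datetime(dateend, :step))
            FROM cte
            WHERE dateend < :end
        )
        SELECT c.datestart, c.dateend, COUNT(t.datecol) AS entries
        FROM cte c
        LEFT JOIN tablename t
          ON datetime(t.datecol, 'unixepoch') >= c.datestart
         AND datetime(t.datecol, 'unixepoch') < c.dateend
        GROUP BY c.datestart, c.dateend
        ORDER BY c.datestart
    """
    params = {"start": start, "end": end, "step": step}
    result = conn.execute(sql, params).fetchall()
    conn.close()
    return result


entries = [
    epoch("2020-10-27 01:00:00"),
    epoch("2020-10-27 05:59:59"),
    epoch("2020-10-27 06:00:00"),
    epoch("2020-10-27 19:30:00"),
]

buckets = bucket_counts(
    entries, "2020-10-27 00:00:00", "2020-10-28 00:00:00", "+6 hours"
)
for row in buckets:
    print(row)
```

Because the buckets come from the CTE and not from the data, a bucket with no entries still appears with a count of 0. Other frequencies can be tried by changing the step modifier, for example '+1 day' or '+7 days'.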

Here is one option:
select
datetime((strftime('%s', ts) / (6 * 60 * 60)) * 6 * 60 * 60, 'unixepoch') newts,
count(*) cnt
from mytable
where ts >= '2020-10-27' and ts < '2020-10-30'
group by newts
order by newts
ts represents the datetime column in your table. SQLite does not have a long datatype, so this assumes that you have a legitimate date stored as text.
The logic of the query is to turn the date to an epoch timestamp, then round it to 6 hours, which is represented by 6 * 60 * 60.
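The rounding logic described above can be sketched with Python's built-in sqlite3 module. The table name mytable and column ts follow the query above, while the rounded_buckets helper, the explicit CAST to INTEGER (added here so the division is clearly an integer division), and the sample rows are assumptions for illustration.

```python
import sqlite3


def rounded_buckets(timestamps, start, end, width_seconds):
    """Group text timestamps into fixed-width buckets by rounding the epoch down."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE mytable (ts TEXT)")
    conn.executemany("INSERT INTO mytable VALUES (?)", [(t,) for t in timestamps])
    sql = """
        SELECT datetime(
                   (CAST(strftime('%s', ts) AS INTEGER) / :width) * :width,
                   'unixepoch'
               ) AS newts,
               COUNT(*) AS cnt
        FROM mytable
        WHERE ts >= :start AND ts < :end
        GROUP BY newts
        ORDER BY newts
    """
    params = {"width": width_seconds, "start": start, "end": end}
    result = conn.execute(sql, params).fetchall()
    conn.close()
    return result


sample_ts = [
    "2020-10-27 01:00:00",
    "2020-10-27 05:00:00",
    "2020-10-27 07:00:00",
    "2020-10-29 23:00:00",
    "2020-10-30 00:00:00",  # excluded by the upper bound
]

six_hour = rounded_buckets(sample_ts, "2020-10-27", "2020-10-30", 6 * 60 * 60)
for row in six_hour:
    print(row)
```

Unlike a join against generated intervals, this form only returns buckets that contain at least one row, so periods with 0 entries are absent from the result.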

Postgresql Distinct Statement

How can I get counts per distinct minute of a timestamp column?
For example, if the table contains 100 records within one minute, I want the number of records for each minute.
For example:
SELECT DISTINCT(timestamp) FROM customers WHERE DATE(timestamp) = CURRENT_DATE
The result should look like this:
timestamp | record
30-12-2019 11:30 | 5
30-12-2019 11:31 | 8

One option is to cast the timestamp column to a date with ::date and use GROUP BY:
SELECT timestamp, count(*)
FROM tab
WHERE timestamp::date = current_date
GROUP BY timestamp
timestamp::date can be replaced with date(timestamp), as in your query.
Update: if the table contains timestamps with precision up to microseconds, consider grouping by the value formatted to the minute instead:
SELECT to_char(timestamp,'YYYY-MM-DD HH24:MI'), count(*)
FROM tab
WHERE date(timestamp) = current_date
GROUP BY to_char(timestamp,'YYYY-MM-DD HH24:MI')

Try something like the following:
SELECT DATE_TRUNC('minute', timestamp) as timestamp, COUNT(*) as record
FROM customers
WHERE DATE(timestamp) = CURRENT_DATE
GROUP BY DATE_TRUNC('minute', timestamp)
ORDER BY DATE_TRUNC('minute', timestamp)
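Grouping by the timestamp truncated to the minute, as in the query above, can be tried with Python's built-in sqlite3 module. This is a sketch, not PostgreSQL: strftime('%Y-%m-%d %H:%M', ts) stands in for DATE_TRUNC('minute', ...), the day is passed as a parameter instead of using CURRENT_DATE, and the counts_per_minute helper and sample rows are hypothetical.

```python
import sqlite3


def counts_per_minute(timestamps, day):
    """Count records per minute for the given day ('YYYY-MM-DD')."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE customers (ts TEXT)")
    conn.executemany("INSERT INTO customers VALUES (?)", [(t,) for t in timestamps])
    result = conn.execute(
        """
        SELECT strftime('%Y-%m-%d %H:%M', ts) AS minute, COUNT(*) AS record
        FROM customers
        WHERE date(ts) = ?
        GROUP BY minute
        ORDER BY minute
        """,
        (day,),
    ).fetchall()
    conn.close()
    return result


sample_ts = [
    "2019-12-30 11:30:05",
    "2019-12-30 11:30:59.123456",  # sub-second precision, same minute
    "2019-12-30 11:31:00",
    "2019-12-29 11:30:00",  # different day, filtered out
]

per_minute = counts_per_minute(sample_ts, "2019-12-30")
for row in per_minute:
    print(row)
```

Truncating before grouping is what merges rows that differ only in seconds or fractions of a second.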

Select Data From Multiple Days Between Certain Times (Spanning 2 days)

I need to know how many entries in my DB from the past 7 days have a timestamp between 23:00 and 01:00.
The issue is that this time window spans two days, and I'm unsure whether this is even possible in a single query.
So far I have come up with the following:
select trunc(timestamp) as DTE, extract(hour from timestamp) as HR, count(COLUMN) as Total
from TABLE
where trunc(timestamp) >= '12-NOV-19' and
extract(hour from timestamp) in ('23','00','01')
group by trunc(timestamp), extract(hour from timestamp)
order by 1,2 desc;
The result I am hoping for is something like this:
DTE | Total
20-NOV-19 | 5
19-NOV-19 | 4
18-NOV-19 | 4
17-NOV-19 | 6
Many thanks

First filter on the day by comparing the timestamp to TRUNC( SYSDATE ) - INTERVAL '7' DAY. Then restrict the hours by comparing the timestamp to itself truncated back to midnight, plus an offset of a number of hours.
select trunc(timestamp) as DTE,
extract(hour from timestamp) as HR,
count(COLUMN) as Total
from TABLE
WHERE timestamp >= TRUNC( SYSDATE ) - INTERVAL '7' DAY
AND ( timestamp <= TRUNC( timestamp ) + INTERVAL '01:00' HOUR TO MINUTE
OR timestamp >= TRUNC( timestamp ) + INTERVAL '23:00' HOUR TO MINUTE
)
group by trunc(timestamp), extract(hour from timestamp)
order by DTE, HR desc;

Subtract or add an hour to derive the date. I'm not sure what date you want to assign to each period, but the idea is:
select trunc(timestamp - interval '1' hour) as DTE,
count(*) as Total
from t
where trunc(timestamp - interval '1' hour) >= DATE '2019-11-12' and
extract(hour from timestamp) in (23, 0)
group by trunc(timestamp - interval '1' hour)
order by 1 desc;
Note: If you want times between 11:00 p.m. and 1:00 a.m., then you want the hour to be 23 or 0.
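The idea of shifting the timestamp by an hour, so that 23:00 to 00:59 lands on a single date, can be sketched with Python's built-in sqlite3 module. This is not Oracle code: date(ts, '-1 hour') stands in for trunc(timestamp - interval '1' hour), and the table t with column ts, the overnight_counts helper, and the sample rows are hypothetical.

```python
import sqlite3


def overnight_counts(timestamps, since):
    """Count entries between 23:00 and 00:59, keyed by the evening's date."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE t (ts TEXT)")
    conn.executemany("INSERT INTO t VALUES (?)", [(x,) for x in timestamps])
    result = conn.execute(
        """
        SELECT date(ts, '-1 hour') AS dte, COUNT(*) AS total
        FROM t
        WHERE date(ts, '-1 hour') >= ?
          AND CAST(strftime('%H', ts) AS INTEGER) IN (23, 0)
        GROUP BY dte
        ORDER BY dte DESC
        """,
        (since,),
    ).fetchall()
    conn.close()
    return result


sample_ts = [
    "2019-11-19 23:10:00",
    "2019-11-20 00:30:00",  # shifted back to 2019-11-19
    "2019-11-20 00:59:59",  # shifted back to 2019-11-19
    "2019-11-20 01:00:00",  # hour 1, outside the window
    "2019-11-20 12:00:00",  # midday, outside the window
    "2019-11-20 23:05:00",
]

overnight = overnight_counts(sample_ts, "2019-11-12")
for row in overnight:
    print(row)
```

After the shift, both halves of the overnight window share one date, so a plain GROUP BY on that date is enough.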

Need to simplify SQL query (Enter date at four locations)

I created a query that calculates the average of several sums over multiple tables. It needs to be run every week, and the way the code is currently written I need to change 4 dates in the query every time. I think this can be done more efficiently, but I'm unsure how.
Select ROUND(
(Select sum (calls)
FROM (SELECT sum(ski.ANSTIME) AS calls
FROM SYNONYMS syn
JOIN SKILL ski on (syn.value = ski.split)
WHERE syn.ITEM_TYPE = 'split'
AND (SELECT (timestamp '1970-01-01 00:00:00 GMT' +numtodsinterval(ski.starttime_utc, 'SECOND'))
at time zone 'Europe/Warsaw'
FROM dual) >= '17-07-17 00:00:00 EUROPE/WARSAW' -- Date to be altered every week
AND (SELECT (timestamp '1970-01-01 00:00:00 GMT' +numtodsinterval(ski.starttime_utc, 'SECOND'))
at time zone 'Europe/Warsaw'
FROM dual) <= '24-07-17 00:00:00 EUROPE/WARSAW' -- Date to be altered every week
UNION ALL
SELECT sum(vdn.ANSTIME) AS calls
FROM SYNONYMS syn
JOIN VDN vdn on (syn.value = vdn.vdn)
WHERE syn.ITEM_TYPE = 'vdn'
AND (SELECT (timestamp '1970-01-01 00:00:00 GMT' +numtodsinterval(vdn.starttime_utc, 'SECOND'))
at time zone 'Europe/Warsaw'
FROM dual) >= '17-07-17 00:00:00 EUROPE/WARSAW' -- Date to be altered every week
AND (SELECT (timestamp '1970-01-01 00:00:00 GMT' +numtodsinterval(vdn.starttime_utc, 'SECOND'))
at time zone 'Europe/Warsaw'
FROM dual) <= '24-07-17 00:00:00 EUROPE/WARSAW')) -- Date to be altered every week
/ -- divided by
(SELECT sum (calltime)
FROM (SELECT sum(ski.acdcalls) AS calltime
FROM SYNONYMS syn
JOIN SKILL ski on (syn.value = ski.split)
WHERE syn.ITEM_TYPE = 'split'
AND (SELECT (timestamp '1970-01-01 00:00:00 GMT' +numtodsinterval(ski.starttime_utc, 'SECOND'))
at time zone 'Europe/Warsaw'
FROM dual) >= '17-07-17 00:00:00 EUROPE/WARSAW' -- Date to be altered every week
AND (SELECT (timestamp '1970-01-01 00:00:00 GMT' +numtodsinterval(ski.starttime_utc, 'SECOND'))
at time zone 'Europe/Warsaw'
FROM dual) <= '24-07-17 00:00:00 EUROPE/WARSAW' -- Date to be altered every week
UNION ALL
SELECT sum(vdn.acdcalls) AS calltime
FROM SYNONYMS syn
JOIN VDN vdn on (syn.value = vdn.vdn)
WHERE syn.ITEM_TYPE = 'vdn'
AND (SELECT (timestamp '1970-01-01 00:00:00 GMT' +numtodsinterval(vdn.starttime_utc, 'SECOND'))
at time zone 'Europe/Warsaw'
FROM dual) >= '17-07-17 00:00:00 EUROPE/WARSAW' -- Date to be altered every week
AND (SELECT (timestamp '1970-01-01 00:00:00 GMT' +numtodsinterval(vdn.starttime_utc, 'SECOND'))
at time zone 'Europe/Warsaw'
FROM dual) <= '24-07-17 00:00:00 EUROPE/WARSAW')) -- Date to be altered every week
,0) AS average
FROM dual

If I understand correctly, you're trying to generate a weekly summary, so instead of entering dates you can use trunc(sysdate) for the second date and trunc(sysdate - 7) for the first.
A second possibility is to create a temporary table (or just a WITH clause) that holds each date in a single place, and join it to your query. Instead of <= '24-07-17 00:00:00 EUROPE/WARSAW' you will have <= temp_date, where temp_date comes from the CTE.
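The second suggestion, keeping the dates in one CTE and joining it wherever they are needed, can be sketched with Python's built-in sqlite3 module. This is a heavily simplified illustration, not the original Oracle query: the SYNONYMS join and the time zone conversion are left out, starttime is stored as text, and the params CTE, the weekly_average helper, and the sample rows are hypothetical.

```python
import sqlite3


def weekly_average(skill_rows, vdn_rows, date_from, date_to):
    """Return ROUND(total anstime / total acdcalls) over both tables."""
    conn = sqlite3.connect(":memory:")
    for table in ("skill", "vdn"):
        conn.execute(
            f"CREATE TABLE {table} (starttime TEXT, anstime INTEGER, acdcalls INTEGER)"
        )
    conn.executemany("INSERT INTO skill VALUES (?, ?, ?)", skill_rows)
    conn.executemany("INSERT INTO vdn VALUES (?, ?, ?)", vdn_rows)
    sql = """
        WITH params AS (
            -- The two dates are supplied once and reused by every branch.
            SELECT :date_from AS date_from, :date_to AS date_to
        ),
        combined AS (
            SELECT anstime, acdcalls
            FROM skill, params
            WHERE starttime >= date_from AND starttime <= date_to
            UNION ALL
            SELECT anstime, acdcalls
            FROM vdn, params
            WHERE starttime >= date_from AND starttime <= date_to
        )
        SELECT ROUND(SUM(anstime) * 1.0 / SUM(acdcalls)) AS average
        FROM combined
    """
    params = {"date_from": date_from, "date_to": date_to}
    (average,) = conn.execute(sql, params).fetchone()
    conn.close()
    return average


skill_rows = [
    ("2017-07-18 10:00:00", 100, 4),
    ("2017-07-30 10:00:00", 999, 9),  # outside the week, ignored
]
vdn_rows = [("2017-07-20 09:00:00", 50, 2)]

average = weekly_average(
    skill_rows, vdn_rows, "2017-07-17 00:00:00", "2017-07-24 00:00:00"
)
print(average)
```

With this shape the weekly change touches two values in one place, and the sum and the divisor are computed from the same filtered rows.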
