I have an application that uses AWS Athena. I have two tables, events and event_transactions. The events table contains event information, and event_transactions contains individual event occurrences, with a column event_date that gives the day on which each event occurred.
I need to calculate the number of occurrences of each event over the last month, the last week, and the last day, counted back from today's date.
Format:
event_name, daily_count, weekly_count, monthly_count
I need to display all 3 counts for each event in the same row.
To calculate weekly_count I use the query below:
select event_name, count(*) as weekly_count from event_transactions where event_name in ('ABC','XYZ')
and (event_date >= CAST(current_date - interval '7' day as varchar)) AND (event_date <= CAST(current_date - interval '1' day as varchar))
group by 1
Output:
event_name  weekly_count
ABC         23
XYZ         14
How can I write a SQL query which will print all 3 counts in a single row?
Use count_if. Something along these lines:
select event_name,
count_if(event_date >= CAST(current_date - interval '7' day as varchar) AND event_date <= CAST(current_date - interval '1' day as varchar)) as weekly_count ,
... -- rest of the counts
from event_transactions
where event_name in ('ABC','XYZ')
group by 1
Also, I would recommend looking into the BETWEEN range operator, and using date_parse on event_date if its data is in a consistent format.
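To make the conditional-count idea concrete outside Athena, here is a minimal Python sketch of the same logic: one pass over the rows, with one counter per window. The function name, the sample rows, and the 30-day length of the month window are assumptions for illustration; the question only shows the 7-day bounds.

```python
from datetime import date, timedelta

def window_counts(rows, today):
    """Count rows per event name over day/week/month windows ending yesterday.

    rows: iterable of (event_name, event_date) pairs, event_date a datetime.date.
    The 30-day month window is an assumption, not taken from the question.
    """
    end = today - timedelta(days=1)
    windows = {
        "daily_count": end,                          # yesterday only
        "weekly_count": today - timedelta(days=7),
        "monthly_count": today - timedelta(days=30),
    }
    result = {}
    for name, event_date in rows:
        counts = result.setdefault(name, {key: 0 for key in windows})
        for key, start in windows.items():
            # Plays the role of count_if(event_date BETWEEN start AND end)
            if start <= event_date <= end:
                counts[key] += 1
    return result

today = date(2024, 3, 15)
rows = [  # made-up sample data
    ("ABC", date(2024, 3, 14)),
    ("ABC", date(2024, 3, 10)),
    ("ABC", date(2024, 2, 20)),
    ("XYZ", date(2024, 3, 9)),
    ("XYZ", date(2024, 1, 1)),
]
counts = window_counts(rows, today)
```

Each row is tested against every window, which is what lets all three counts land in the same output row per event.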
I want to get data from last month, day by day. I can get the last 30 days, but I want the month itself, since it may have fewer or more than 30 days.
This is the query for getting the last 30 days:
SELECT Trunc(timestamp),
Count(*)
FROM table1
WHERE Trunc(timestamp) > Trunc(sysdate - 30)
GROUP BY Trunc(timestamp)
ORDER BY 1;
Also, I am running this from a shell script, so it would help if I could define a variable in the script and use it in the query.
To get data from the start of the current month until today:
SELECT TRUNC(timestamp) AS day,
COUNT(*)
FROM table1
WHERE timestamp >= TRUNC(SYSDATE, 'MM')
AND timestamp < TRUNC(SYSDATE) + INTERVAL '1' DAY
GROUP BY TRUNC(timestamp)
ORDER BY day
To get data from the same day last month until today:
SELECT TRUNC(timestamp) AS day,
COUNT(*)
FROM table1
WHERE timestamp >= ADD_MONTHS(TRUNC(SYSDATE), -1)
AND timestamp < TRUNC(SYSDATE) + INTERVAL '1' DAY
GROUP BY TRUNC(timestamp)
ORDER BY day
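The first query above relies on a half-open range: from the first instant of the month up to, but not including, midnight tomorrow. A small Python sketch of how those two bounds are derived, assuming a hypothetical helper name and a made-up "now":

```python
from datetime import datetime, timedelta

def month_to_date_bounds(now):
    """Return (start, end) for the half-open range [start, end)."""
    start = datetime(now.year, now.month, 1)            # like TRUNC(SYSDATE, 'MM')
    midnight = datetime(now.year, now.month, now.day)   # like TRUNC(SYSDATE)
    return start, midnight + timedelta(days=1)

start, end = month_to_date_bounds(datetime(2024, 2, 29, 17, 30))
```

Comparing the raw column against these bounds, rather than wrapping the column in TRUNC inside the WHERE clause, generally leaves an index on the column usable.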
I have a postgres table "Generation" with half-hourly timestamps spanning 2009 - present with energy data:
I need to aggregate (average) the data across different intervals from specific timepoints, for example data from 2021-01-07T00:00:00.000Z for one year at 7 day intervals, or 3 months at 1 day intervals, or 7 days at 1 h intervals, etc. date_trunc() partly solves this, but it truncates weeks to the preceding Monday, e.g.
SELECT date_trunc('week', "DATETIME") AS week,
count(*),
AVG("GAS") AS gas,
AVG("COAL") AS coal
FROM "Generation"
WHERE "DATETIME" >= '2021-01-07T00:00:00.000Z' AND "DATETIME" <= '2022-01-06T23:59:59.999Z'
GROUP BY week
ORDER BY week ASC
;
returns the first time series interval as 2021-01-04 with an incorrect count:
week count gas coal
"2021-01-04 00:00:00" 192 18291.34375 2321.4427083333335
"2021-01-11 00:00:00" 336 14477.407738095239 2027.547619047619
"2021-01-18 00:00:00" 336 13947.044642857143 1152.047619047619
EDIT: the following returns the correct weekly intervals by checking the start date relative to the nearest Monday (start of week) and adjusting the results accordingly:
WITH vars1 AS (
SELECT '2021-01-07T00:00:00.000Z'::timestamp as start_time,
'2021-01-28T00:00:00.000Z'::timestamp as end_time
),
vars2 AS (
SELECT
((select start_time from vars1)::date - (date_trunc('week', (select start_time from vars1)::timestamp))::date) as diff
)
SELECT date_trunc('week', "DATETIME" - ((select diff from vars2) || ' day')::interval)::date + ((select diff from vars2) || ' day')::interval AS week,
count(*),
AVG("GAS") AS gas,
AVG("COAL") AS coal
FROM "Generation"
WHERE "DATETIME" >= (select start_time from vars1) AND "DATETIME" < (select end_time from vars1)
GROUP BY week
ORDER BY week ASC
returns:
week count gas coal
"2021-01-07 00:00:00" 336 17242.752976190477 2293.8541666666665
"2021-01-14 00:00:00" 336 13481.497023809523 1483.0565476190477
"2021-01-21 00:00:00" 336 15278.854166666666 1592.7916666666667
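The shift, truncate, shift back trick in that query can be sketched in Python as follows. This is an illustration under the assumption that the start time falls on a day boundary; the function name is made up.

```python
from datetime import datetime, time, timedelta

def shifted_week_start(ts, start):
    """Truncate ts to a 7-day bucket whose boundaries line up with start's weekday."""
    diff = timedelta(days=start.weekday())   # days between start and its Monday
    shifted = ts - diff                      # move the custom week onto Monday..Sunday
    monday = datetime.combine(shifted.date(), time.min) - timedelta(days=shifted.weekday())
    return monday + diff                     # move the bucket start back

start = datetime(2021, 1, 7)                 # a Thursday, so diff is 3 days
buckets = [
    shifted_week_start(datetime(2021, 1, 7, 0, 0), start),
    shifted_week_start(datetime(2021, 1, 13, 23, 30), start),
    shifted_week_start(datetime(2021, 1, 14, 0, 0), start),
    shifted_week_start(datetime(2021, 1, 27, 23, 30), start),
]
```

The bucket starts come out as 2021-01-07, 2021-01-07, 2021-01-14 and 2021-01-21, matching the week column in the result above.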
And then for any daily or hourly (swap out day with hour) intervals you can use the following:
SELECT date_trunc('day', "DATETIME") AS day,
count(*),
AVG("GAS") AS gas,
AVG("COAL") AS coal
FROM "Generation"
WHERE "DATETIME" >= '2022-01-07T00:00:00.000Z' AND "DATETIME" < '2022-01-10T23:59:59.999Z'
GROUP BY day
ORDER BY day ASC
;
In order to select the complete week, you should change the WHERE clause to something like:
WHERE "DATETIME" >= date_trunc('week','2021-01-07T00:00:00.000Z'::timestamp)
AND "DATETIME" < (date_trunc('week','2022-01-06T23:59:59.999Z'::timestamp) + interval '7' day)::date
This will effectively get the records from January 4, 2021 up to and including January 9, 2022.
Note: I changed <= to < to stop the end-date being included!
EDIT:
When you want your weeks to start on January 7, you can always group by:
(date_part('day',(d-'2021-01-07'))::int-(date_part('day',(d-'2021-01-07'))::int % 7))/7
(where d is the column containing the datetime-value.)
EDIT:
This will get the list from a given date, and a specified interval.
WITH vars AS (
SELECT
'2021-01-07T00:00:00.000Z'::timestamp AS qstart,
'2022-01-06T23:59:59.999Z'::timestamp AS qend,
7 as qint,
INTERVAL '1 DAY' as qinterval
)
SELECT
(select date(qstart) FROM vars) + (SELECT qinterval from vars) * ((date_part('day',("DATETIME"-(select date(qstart) FROM vars)))::int-(date_part('day',("DATETIME"-(select date(qstart) FROM vars)))::int % (SELECT qint FROM vars)))::int) AS week,
count(*),
AVG("GAS") AS gas,
AVG("COAL") AS coal
FROM "Generation"
WHERE "DATETIME" >= (SELECT qstart FROM vars) AND "DATETIME" <= (SELECT qend FROM vars)
GROUP BY week
ORDER BY week
;
I added the WITH vars clause to hold the variables at the top, so there is no need to touch the rest of the query. (Idea borrowed here)
I only tested with qint=7, qinterval='1 DAY' and qint=14, qinterval='1 DAY' (but other values should work too...)
Using the function EXTRACT you can calculate the difference in days, weeks and hours between your timestamp ts and the start_date as follows:
Difference in Days
extract (day from ts - start_date)
Difference in Weeks
This is the difference in days divided by 7 and truncated:
trunc(extract (day from ts - start_date)/7)
Difference in Hours
This is the difference in days times 24, plus the hour component of the difference:
extract (day from ts - start_date)*24 + extract (hour from ts - start_date)
The difference can be used in GROUP BY directly. E.g. for week grouping the first group is difference 0, i.e. same week, the next group with difference 1, the next week, etc.
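For comparison, here is a minimal Python sketch of the same three differences and of grouping by the week difference. It assumes ts is never earlier than the start date, as the filter in the example below guarantees; the sample rows and names are made up.

```python
from collections import defaultdict
from datetime import datetime

def diffs(ts, start):
    """Return (day_diff, week_diff, hour_diff) of ts relative to start; assumes ts >= start."""
    delta = ts - start
    day_diff = delta.days                              # extract(day from ts - start_date)
    week_diff = day_diff // 7                          # trunc(day_diff / 7)
    hour_diff = day_diff * 24 + delta.seconds // 3600  # days * 24 + hour component
    return day_diff, week_diff, hour_diff

start = datetime(2021, 1, 7)
sample = [  # made-up (ts, value) rows
    (datetime(2021, 1, 7, 0, 30), 10.0),
    (datetime(2021, 1, 13, 23, 30), 20.0),
    (datetime(2021, 1, 16, 5, 30), 40.0),
]

# Group by week difference and average, like GROUP BY week_diff
by_week = defaultdict(list)
for ts, value in sample:
    by_week[diffs(ts, start)[1]].append(value)
weekly_avg = {week: sum(vals) / len(vals) for week, vals in sorted(by_week.items())}
```

The first two rows fall in week 0 and the third in week 1, so the grouping key alone carries the bucket without any calendar-week truncation.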
Sample Example
I'm using a CTE for the start date to avoid multiple copies of the parameter:
with start_time as
(select DATE'2021-01-07' as start_ts),
prep as (
select
ts,
extract (day from ts - (select start_ts from start_time)) day_diff,
trunc(extract (day from ts - (select start_ts from start_time))/7) week_diff,
extract (day from ts - (select start_ts from start_time)) *24 + extract (hour from ts - (select start_ts from start_time)) hour_diff,
value
from test_table
where ts >= (select start_ts from start_time)
)
select week_diff, avg(value)
from prep
group by week_diff order by 1
I've been trying for hours now to write a date_trunc statement to be used in a group by where my week starts on a Friday and ends the following Thursday.
So something like
SELECT
DATE_TRUNC(...) sales_week,
SUM(sales) sales
FROM table
GROUP BY 1
ORDER BY 1 DESC
Which would return the results for the last complete week (by those standards) as 09-13-2019.
You can subtract 4 days and then add 4 days:
SELECT DATE_TRUNC('week', <whatever> - INTERVAL '4 DAY') + INTERVAL '4 DAY' as sales_week,
SUM(sales) as sales
FROM table
GROUP BY 1
ORDER BY 1 DESC
The expression
select current_date - cast(cast(7 - (5 - extract(dow from current_date)) as text) || ' days' as interval);
should always give you the previous Friday's date.
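The arithmetic in that expression can be checked with a short Python sketch. It assumes PostgreSQL's dow numbering (Sunday = 0 through Saturday = 6); the function name is made up.

```python
from datetime import date, timedelta

def friday_from_expression(d):
    """Mirror current_date - (7 - (5 - dow)) days for a given date d."""
    dow = (d.weekday() + 1) % 7          # convert Python's Monday=0 to Sunday=0
    return d - timedelta(days=7 - (5 - dow))

week = [date(2019, 9, 16) + timedelta(days=i) for i in range(7)]  # Mon..Sun
fridays = [friday_from_expression(d) for d in week]
```

Every result is a Friday strictly before the input date. One thing worth knowing: on a Saturday the expression steps back eight days, so it skips the Friday of the day before and lands on the one a week earlier.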
If by any chance you have gaps in the data (for example with more granular breakdowns than just per week), you can generate a set of custom weeks and left join to that:
drop table if exists sales_weeks;
create table sales_weeks as
with
dates as (
select generate_series('2019-01-01'::date,current_date,interval '1 day')::date as date
)
,week_ids as (
select
date
,sum(case when extract('dow' from date)=5 then 1 else 0 end) over (order by date) as week_id
from dates
)
select
week_id
,min(date) as week_start_date
,max(date) as week_end_date
from week_ids
group by 1
order by 1
;
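The running-sum trick in that table definition (the week id goes up by one on every Friday) can be sketched in Python like this. The function name and the shortened date range are assumptions for illustration.

```python
from datetime import date, timedelta
from itertools import groupby

def friday_weeks(first, last):
    """Return (week_id, week_start, week_end) tuples; week_id increments on each Friday."""
    week_id = 0
    labelled = []
    d = first
    while d <= last:
        if d.weekday() == 4:             # Friday in Python's Monday=0 numbering
            week_id += 1
        labelled.append((week_id, d))
        d += timedelta(days=1)
    weeks = []
    for wid, group in groupby(labelled, key=lambda pair: pair[0]):
        days = [day for _, day in group]
        weeks.append((wid, days[0], days[-1]))   # min and max date per week_id
    return weeks

weeks = friday_weeks(date(2019, 1, 1), date(2019, 1, 20))
```

Note that the first and last groups can be partial weeks, because the date range does not have to start on a Friday or end on a Thursday.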
I'm using this query, and I want to fetch the data for the last 24 hours from the events_ table...
Select
CAST(TIMESTAMP_ADD(TIMESTAMP_MICROS(event_timestamp), INTERVAL 330
MINUTE) AS date) AS event_date,
event_name,user.value.string_value as context_device_id,
(event.value.string_value) as id,
(event_param.value.string_value) as contentType
FROM ``,
UNNEST(user_properties) AS user,
UNNEST(event_params) as event,
UNNEST(event_params) as event_param
where user.key="email" and event.key="postID" and
event_param.key="article_type" and
CAST(TIMESTAMP_ADD(TIMESTAMP_MICROS(event_timestamp), INTERVAL 330
MINUTE)AS date) between DATE_SUB(current_date(), INTERVAL 1 DAY) and
DATE_SUB(current_date(),INTERVAL 0 DAY)
But I want the query to return only the last 24 hours of data whenever it runs.
That is, if I run the query at 5pm today, it should fetch the data from 5pm yesterday to 5pm today.
You need to update your table reference to use a wildcard, which can include multiple days, and then add a filter to restrict the tables that it matches. For example, you would want something like:
...
FROM `events_*`,
UNNEST(user_properties) AS user,
UNNEST(event_params) as event,
UNNEST(event_params) as event_param
WHERE _TABLE_SUFFIX >=
FORMAT_DATE('%Y%m%d', DATE_SUB(CURRENT_DATE(), INTERVAL 1 DAY)) AND
user.key="email" and
...
The filter on the _TABLE_SUFFIX pseudo-column restricts the scan to the tables for today and yesterday, and then the filter on the timestamp as in your original query further restricts to a 24 hour span.
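To illustrate the two-level filter, here is a hedged Python sketch that computes the values both filters would compare against: the oldest table suffix that can contain rows from the window, and the window bounds in microseconds (the unit of event_timestamp). The helper names are hypothetical, and it assumes the daily tables are named by UTC date, which may not match your export settings.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)

def to_micros(dt):
    """Convert an aware datetime to integer microseconds since the Unix epoch."""
    return (dt - EPOCH) // timedelta(microseconds=1)

def last_24h(now):
    """Return (start_us, end_us, min_suffix) for the 24 hours ending at now."""
    start = now - timedelta(hours=24)
    # Oldest daily table that can hold rows in the window, as YYYYMMDD (UTC assumed)
    min_suffix = start.astimezone(timezone.utc).strftime("%Y%m%d")
    return to_micros(start), to_micros(now), min_suffix

now = datetime(2024, 3, 15, 17, 0, tzinfo=timezone.utc)
start_us, end_us, min_suffix = last_24h(now)
```

The suffix filter only limits which tables are scanned; the exact 24-hour cut still has to come from comparing the timestamp itself, not its date part.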
In SQL Server you can use this in the WHERE clause:
where event_timestamp>=dateadd (hour , -24 , getdate()) and event_timestamp<getdate()
As an alternative, MSSQL lets you do integer arithmetic on the datetime that GETDATE() returns:
SELECT GETDATE(), GETDATE() - 1
Result at this moment: 2018-08-31 07:38:18.260
2018-08-30 07:38:18.260
So in your case BETWEEN GETDATE() - 1 AND GETDATE() will do the trick as well.
I have a table employee in Postgres:
Query:
SELECT DISTINCT month_last_date,number_of_cases,reopens,csat
FROM employee
WHERE month_last_date >=(date('2017-01-31') - interval '6 month')
AND month_last_date <= date('2017-01-31')
AND agent_id='analyst'
AND name='SAM';
Output:
But if there is no data in the table for a month, I want the column values for that month to be 0.
Generate all dates you are interested in, LEFT JOIN to the table and default to 0 with COALESCE:
SELECT DISTINCT -- see below
i.month_last_date
, COALESCE(number_of_cases, 0) AS number_of_cases -- see below
, COALESCE(reopens, 0) AS reopens
, COALESCE(csat, 0) AS csat
FROM (
SELECT date '2017-01-31' - i * interval '1 mon' AS month_last_date
FROM generate_series(0, 5) i -- see below
) i
LEFT JOIN employee e ON e.month_last_date = i.month_last_date
AND e.agent_id = 'analyst' -- see below
AND e.name = 'SAM';
Notes
If you add or subtract an interval of 1 month and the same day does not exist in the target month, Postgres defaults to the latest existing day of that month. So this works as desired, you get the last day of each month:
SELECT date '2017-12-31' - i * interval '1 mon' -- note 31
FROM generate_series(0,11) i;
But this does not, you'd get the 28th of each month:
SELECT date '2017-02-28' - i * interval '1 mon' -- note 28
FROM generate_series(0,11) i;
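The clamping rule can be imitated in Python to see why the starting day matters. This is a sketch of the rule as described above, not of Postgres internals; the function name is made up.

```python
import calendar
from datetime import date

def minus_months(d, n):
    """Subtract n months, clamping the day to the length of the target month."""
    index = d.year * 12 + (d.month - 1) - n
    year, month0 = divmod(index, 12)
    last_day = calendar.monthrange(year, month0 + 1)[1]
    return date(year, month0 + 1, min(d.day, last_day))

from_31 = [minus_months(date(2017, 12, 31), i) for i in range(3)]
from_28 = [minus_months(date(2017, 2, 28), i) for i in range(3)]
```

Starting from the 31st, the day is clamped down to each month's last day (Dec 31, Nov 30, Oct 31). Starting from the 28th, nothing ever needs clamping, so the results stay on the 28th (Feb 28, Jan 28, Dec 28) and miss the month ends.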
The safe alternative is to subtract 1 day from the first day of the next month, as @Oto demonstrated. Related:
Daily average for the month (needs number of days in month)
Here are two optimized ways to generate a series of last days of the month - up to and including a given month:
1.
SELECT (timestamp '2017-01-01' - i * interval '1 month')::date - 1 AS month_last_date
FROM generate_series(-1, 10) i; -- generate 12 months, off-by-1
Input is the first day of the month - or calculate it from a given date or timestamp with date_trunc():
SELECT date_trunc('month', timestamp '2017-01-17')::date AS this_mon1
Subtracting an interval from a date produces a timestamp. After the cast back to date we can simply subtract an integer to subtract days.
2.
SELECT m::date - 1 AS month_last_date
FROM generate_series(timestamp '2017-02-01' - interval '11 month' -- for 12 months
, timestamp '2017-02-01'
, interval '1 mon') m;
Input is the first day of the next month - or calculate it from any given date or timestamp with:
SELECT date_trunc('month', timestamp '2017-01-17' + interval '1 month')::date AS next_mon1
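Both variants rest on the same idea: take the first day of a month and step back one day. A minimal Python sketch of that idea, with a hypothetical function name:

```python
from datetime import date, timedelta

def month_ends(first_of_next_month, count):
    """Return count month-end dates, newest first, ending just before first_of_next_month."""
    ends = []
    year, month = first_of_next_month.year, first_of_next_month.month
    for _ in range(count):
        ends.append(date(year, month, 1) - timedelta(days=1))  # day before the 1st
        month -= 1
        if month == 0:
            year, month = year - 1, 12
    return ends

ends = month_ends(date(2017, 2, 1), 3)
```

Because every month has a 1st, this never depends on how a 29th, 30th or 31st is clamped.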
Related:
How do I determine the last day of the previous month using PostgreSQL?
Create list with first and last day of month for given period
Not sure you actually need DISTINCT. Typically, (agent_id, month_last_date) would be defined unique, then remove DISTINCT ...
Be sure to use the LEFT JOIN correctly. Join conditions go into the join clause, not the WHERE clause:
Explain JOIN vs. LEFT JOIN and WHERE condition performance suggestion in more detail
Finally, default to 0 with COALESCE where NULL values are filled in by the LEFT JOIN.
Note that COALESCE cannot distinguish between actual NULL values from the right table and NULL values filled in for missing rows. If your columns are not defined NOT NULL, there may be ambiguity to address.
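A small Python sketch of the LEFT JOIN plus COALESCE pattern may help show that ambiguity. The data values and names are made up; None stands in for NULL.

```python
from datetime import date

def fill_missing(months, rows_by_month):
    """Emulate LEFT JOIN + COALESCE: months with no row, or with None values, get 0."""
    filled = []
    for m in months:
        # A missing month behaves like a joined row of NULLs
        cases, reopens, csat = rows_by_month.get(m, (None, None, None))
        filled.append((
            m,
            0 if cases is None else cases,
            0 if reopens is None else reopens,
            0 if csat is None else csat,
        ))
    return filled

months = [date(2017, 1, 31), date(2016, 12, 31), date(2016, 11, 30)]
rows_by_month = {  # hypothetical rows for one agent
    date(2017, 1, 31): (5, 1, 4.5),
    date(2016, 11, 30): (3, None, 4.0),   # a stored NULL in reopens
}
filled = fill_missing(months, rows_by_month)
```

In the output, the 0 for December (no row at all) and the 0 in reopens for November (a stored NULL) look identical, which is the ambiguity described above.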
As I understand it, you need to generate the last day of each of the last 6 months before a certain date (before "2017-01-31" in this case).
If I understand correctly, you can use this query, which generates all of those days:
SELECT (date_trunc('MONTH', mnth) + INTERVAL '1 MONTH - 1 day')::DATE
FROM
generate_series('2017-01-31'::date - interval '6 month', '2017-01-31'::date, '1 month') as mnth;
You just need to LEFT JOIN this query to your existing query to get the desired result.
Please note that this returns 7 records (days), not 6.