SQL - select most 'active' time from db - sql

Very closely related to SQL - Select most 'active' timespan from db, but a different question.
"I have a table of transactions. In this table I store the transaction datetime in UTC. I have a few months of data, about 20,000 transactions a day."
How would I change
select datepart(hour, the_column) as [hour], count(*) as total
from t
group by datepart(hour, the_column)
order by total desc
so that I can select the specific year, month, day, hour, minute, and second that was the most 'active'?
To clarify, I'm not looking for which hour or minute of the day was most active. Rather, which moment in time was the most active.

select
DATEPART(year, the_column) as [year]
,DATEPART(dayofyear, the_column) as [day]
,DATEPART(hh, the_column) as [hour]
,DATEPART(mi, the_column) as [minute]
,DATEPART(ss, the_column) as [second]
,count(*) as total
from t
group by
DATEPART(year, the_column)
,DATEPART(dayofyear, the_column)
,DATEPART(hh, the_column)
,DATEPART(mi, the_column)
,DATEPART(ss, the_column)
order by total desc

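If minute resolution is enough (note that casting to smalldatetime rounds to the nearest minute rather than truncating the seconds):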
select top 1 cast(the_column as smalldatetime) as moment, count(*) as total
from t
group by cast(the_column as smalldatetime)
order by total desc
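To sanity-check the "busiest moment" idea outside SQL Server, here is a minimal sketch using Python's built-in sqlite3 module. The sample rows and the `busiest` helper are made up for illustration, and SQLite's strftime stands in for DATEPART and the smalldatetime cast, so treat it as an analogue rather than the T-SQL itself. Unlike the smalldatetime cast, strftime truncates instead of rounding.

```python
import sqlite3

# Hypothetical sample data: UTC timestamps stored as ISO-8601 text.
rows = [
    ("2013-06-14 08:15:02",),
    ("2013-06-14 08:15:02",),
    ("2013-06-14 08:15:02",),
    ("2013-06-14 08:15:40",),
    ("2013-06-15 08:15:02",),
    ("2013-06-15 09:00:00",),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (the_column TEXT)")
conn.executemany("INSERT INTO t VALUES (?)", rows)

def busiest(fmt):
    """Return (bucket, count) for the most frequent truncated timestamp."""
    return conn.execute(
        "SELECT strftime(?, the_column) AS moment, COUNT(*) AS total "
        "FROM t GROUP BY moment ORDER BY total DESC, moment LIMIT 1",
        (fmt,),
    ).fetchone()

busiest_second = busiest("%Y-%m-%d %H:%M:%S")  # full date down to the second
busiest_minute = busiest("%Y-%m-%d %H:%M")     # full date down to the minute
print(busiest_second)
print(busiest_minute)
```

Because the bucket keeps the full date, the same clock time on another day (the 2013-06-15 row) is counted separately, which is the difference from grouping by hour of day.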

Related

Group by days of a month in CockroachDB

In CockroachDB, I want to run a query like this over a specific month, broken down by day:
select count(*), sum(amount)
from request
where code = 'code_string'
and created_at >= '2022-07-31T20:30:00Z' and created_at < '2022-08-31T20:30:00Z'
The problem is that I want the days to follow my local date. What should I do?
My goal is to get "month, day, count, sum" as result columns for a month.
UPDATE:
I have found a suitable query for this purpose:
select count(amount), sum(amount), extract(month from created_at) as monthTime, extract(day from created_at) as dayTime
from request
where code = 'code_string' and created_at >= '2022-07-31T20:30:00Z' and created_at < '2022-08-31T20:30:00Z'
group by dayTime, monthTime
Thanks to #histocrat for an easier answer :) which replaces
extract(month from created_at) as monthTime, extract(day from created_at) as dayTime
with this:
date_part('month', created_at) as monthTime, date_part('day', created_at) as dayTime
To group results by both month and day, you can use the date_part function.
select date_part('month', created_at) as month, date_part('day', created_at) as day, count(*), sum(amount)
from request
where code = 'code_string'
group by date_part('month', created_at), date_part('day', created_at);
Depending on what type created_at is, you may need to cast or convert it first (for example, group by date_part('month', created_at::timestamptz)).
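The local-date part of the question comes down to shifting each UTC timestamp into the local zone before taking the month and day. In SQL that usually means converting created_at (for example with AT TIME ZONE, if your database supports it) inside date_part. The sketch below shows the same logic in Python. The fixed UTC+03:30 offset is an assumption inferred from the 20:30Z bounds in the query, and the rows are invented.

```python
from collections import defaultdict
from datetime import datetime, timedelta, timezone

# Assumption: local time is a fixed UTC+03:30, matching the 20:30Z bounds.
LOCAL_TZ = timezone(timedelta(hours=3, minutes=30))

# Hypothetical (created_at in UTC, amount) rows.
requests = [
    ("2022-07-31T20:29:59Z", 5),   # still 31 July locally, outside August
    ("2022-07-31T20:30:00Z", 10),  # 1 August 00:00 local
    ("2022-08-01T19:00:00Z", 20),  # 1 August 22:30 local
    ("2022-08-01T21:00:00Z", 7),   # 2 August 00:30 local
]

def parse_utc(text):
    """Parse an ISO timestamp ending in 'Z' as an aware UTC datetime."""
    return datetime.strptime(text, "%Y-%m-%dT%H:%M:%SZ").replace(tzinfo=timezone.utc)

totals = defaultdict(lambda: [0, 0])  # (month, day) -> [count, sum]
for created_at, amount in requests:
    local = parse_utc(created_at).astimezone(LOCAL_TZ)
    if (local.year, local.month) != (2022, 8):
        continue  # keep only rows that fall in local August 2022
    bucket = totals[(local.month, local.day)]
    bucket[0] += 1
    bucket[1] += amount

result = {key: tuple(val) for key, val in sorted(totals.items())}
print(result)
```

The 21:00Z row lands on 2 August locally even though it is still 1 August in UTC, which is exactly the case that grouping on the raw UTC value gets wrong.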

PostgreSQL: Simplifying a SQL query into a shorter query

I have a table called 'daily_prices' where I have 'sale_date', 'last_sale_price', 'symbol' as columns.
I need to calculate how many times 'last_sale_price' has gone up compared to previous day's 'last_sale_price' in 10 weeks.
Currently I have my query like this for 2 weeks:
select count(*) as "timesUp", sum(last_sale_price-prev_price) as "dollarsUp", 'wk1' as "week"
from
(
select last_sale_price, LAG(last_sale_price, 1) OVER (ORDER BY sale_date) as prev_price
from daily_prices
where sale_date <= CAST('2020-09-18' AS DATE) AND sale_date >= CAST('2020-09-14' AS DATE)
and symbol='AAPL'
) nest
where last_sale_price > prev_price
UNION
select count(*) as "timesUp", sum(last_sale_price-prev_price) as "dollarsUp", 'wk2' as "week"
from
(
select last_sale_price, LAG(last_sale_price, 1) OVER (ORDER BY sale_date) as prev_price
from daily_prices
where sale_date <= CAST('2020-09-11' AS DATE) AND sale_date >= CAST('2020-09-07' AS DATE)
and symbol='AAPL'
) nest
where last_sale_price > prev_price
I'm using 'UNION' to combine the weekly data. But as the number of weeks increases, the query is going to be huge.
Is there a simpler way to write this query?
Any help is much appreciated. Thanks in advance.
You can extract the week from sale_date, then apply GROUP BY in the outer query:
-- week is the ISO week number, so pair it with isoyear rather than year
select EXTRACT(isoyear from sale_date) as year, EXTRACT(week FROM sale_date) as week, count(*) as "timesUp", sum(last_sale_price-prev_price) as "dollarsUp"
from (
select
sale_date,
last_sale_price,
LAG(last_sale_price, 1) OVER (ORDER BY sale_date) as prev_price
from daily_prices
where symbol='AAPL'
) nest
where last_sale_price > prev_price
group by EXTRACT(isoyear from sale_date), EXTRACT(week FROM sale_date)
To keep only weekdays, you can add this filter:
EXTRACT(dow FROM sale_date) in (1,2,3,4,5)
PS: in PostgreSQL, dow numbers Sunday as 0 and Saturday as 6, so 1 through 5 are Monday through Friday.
You can filter on the last 8 weeks in the where clause (use '10 week' for the 10 weeks in the question), then group by week and do conditional aggregation:
select extract(isoyear from sale_date) yyyy, extract(week from sale_date) ww,
sum(last_sale_price - lag_last_sale_price) filter(where last_sale_price > lag_last_sale_price) sum_dollars_up,
count(*) filter(where last_sale_price > lag_last_sale_price) cnt_dollars_up
from (
select dp.*,
lag(last_sale_price) over(partition by extract(isoyear from sale_date), extract(week from sale_date) order by sale_date) lag_last_sale_price
from daily_prices dp
where symbol = 'AAPL'
and sale_date >= date_trunc('week', current_date) - '8 week'::interval
) dp
group by 1, 2
Notes:
I am assuming that you don't want to compare the first price of a week to the last price of the previous week; if you do, then just remove the partition by clause from the over() clause of lag()
this dynamically computes the date as of 8 (entire) weeks ago
if there is no price increase during a whole week, the query still gives you a row, with 0 as cnt_dollars_up and null as sum_dollars_up (wrap the sum in coalesce(..., 0) if you want 0 there too)
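The per-week comparison described in these notes can be checked on a handful of rows. This is a rough Python sketch with invented prices, not the query itself: it compares each price with the previous one only inside the same ISO week (the partition by behaviour) and still reports weeks with no increase. Here the empty week's total is 0.0, which is what the SQL gives once the sum is wrapped in coalesce.

```python
from datetime import date

# Hypothetical (sale_date, last_sale_price) rows for one symbol, sorted by date.
prices = [
    (date(2020, 9, 7), 100.0),
    (date(2020, 9, 8), 102.0),   # up 2
    (date(2020, 9, 9), 101.0),
    (date(2020, 9, 10), 105.0),  # up 4
    (date(2020, 9, 14), 104.0),  # first row of its week: nothing to compare with
    (date(2020, 9, 15), 103.0),
    (date(2020, 9, 16), 103.0),
]

weekly = {}        # (iso_year, iso_week) -> [times_up, dollars_up]
prev_key = None
prev_price = None
for sale_date, price in prices:
    iso = sale_date.isocalendar()
    key = (iso[0], iso[1])
    stats = weekly.setdefault(key, [0, 0.0])
    # Like PARTITION BY week: only compare within the same week.
    if key == prev_key and price > prev_price:
        stats[0] += 1
        stats[1] += price - prev_price
    prev_key, prev_price = key, price

weekly = {k: tuple(v) for k, v in weekly.items()}
print(weekly)
```

Dropping the `key == prev_key` check corresponds to removing the partition by clause, so Monday would be compared with the previous Friday.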

Visits per Isoweek in big query

I am trying to pull visits per ISO week from BigQuery.
However, I am failing with the date transformation.
Could you help?
StandardSQL
SELECT count (visitid) as Sessions, date,
EXTRACT (ISOYEAR FROM date) AS isoyear
FROM `xxx_*`
WHERE _TABLE_SUFFIX BETWEEN '201806020' AND '20180630'
GROUP BY date
order by date DESC
Have you tried a query like this?
SELECT EXTRACT(ISOYEAR FROM date) as yyyy,
EXTRACT(ISOWEEK FROM DATE) as ww,
COUNT(*) as Sessions
FROM `xxx_*`
WHERE _TABLE_SUFFIX BETWEEN '201806020' AND '20180630'
GROUP BY yyyy, ww
ORDER BY MIN(date) DESC;
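If the date column is a 'YYYYMMDD' string rather than a DATE (an assumption, based on the table suffix format), it has to be parsed before the ISO year and week can be extracted; in BigQuery that would typically be PARSE_DATE('%Y%m%d', date). The sketch below shows the same parse-then-bucket step in Python on invented dates, including one that shows why ISOYEAR and ISOWEEK belong together.

```python
from collections import Counter
from datetime import datetime

# Hypothetical visit dates in 'YYYYMMDD' string form.
visit_dates = ["20180630", "20180625", "20180624", "20180618", "20181231"]

def iso_year_week(text):
    """Parse 'YYYYMMDD' and return (ISO year, ISO week)."""
    iso = datetime.strptime(text, "%Y%m%d").date().isocalendar()
    return (iso[0], iso[1])

sessions = Counter(iso_year_week(d) for d in visit_dates)
print(sorted(sessions.items(), reverse=True))
```

The last sample date, 31 December 2018, falls in ISO week 1 of ISO year 2019, so pairing the week with the calendar year would put it in the wrong bucket.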

Postgres - Cohort analysis across months sequentially, not if exists in any later month

I'm doing a cohort analysis and can get the group of users to examine, then see whether they transacted in the months following on. But I want it like this:
Of that group in December, who transacted in Jan; of the Jan group from Dec, who transacted in Feb. Basically I'm tracking decay of the customer base.
What I don't want is those that return in any month following Dec, which is this:
WITH start_sample AS (
SELECT
user_fk,
created_at AS start_sample_date
FROM transactions
WHERE created_at >= '2016-11-01' AND created_at < '2016-12-01'
GROUP BY user_fk,
start_sample_date),
start_sample_min AS (
SELECT
user_fk,
MIN(start_sample_date) AS first_transaction
FROM start_sample
GROUP BY user_fk
)
SELECT
DATE_TRUNC('month', created_at) AS transacting_month,
COUNT(DISTINCT user_fk)
FROM transactions t
WHERE created_at >= '2016-11-01'
AND t.user_fk IN(SELECT user_fk FROM start_sample_min)
GROUP BY transacting_month
ORDER BY transacting_month;
Then I made a churn model to see if it would get what I need, but it doesn't:
WITH monthly_users AS (
SELECT
user_fk AS monthly_user_fk,
DATE_TRUNC('month', created_at) AS month
FROM transactions
WHERE created_at >= '2016-11-01' AND created_at < '2017-12-01'
GROUP BY monthly_user_fk, month
ORDER BY monthly_user_fk, month
),
lag_lead AS (
SELECT
monthly_user_fk,
month,
LAG(month,1) OVER (PARTITION BY monthly_user_fk ORDER BY month) AS lag,
LEAD(month,1) OVER (PARTITION BY monthly_user_fk ORDER BY month) AS lead
FROM monthly_users),
lag_lead_with_diffs AS (
SELECT
monthly_user_fk,
month,
lag AS previous_month,
lead AS next_month,
EXTRACT(EPOCH FROM (month - lag)/86400)::INT AS lag_size,
EXTRACT(EPOCH FROM (lead - month)/86400)::INT AS lead_size
FROM lag_lead
),
calculated AS (
SELECT
month,
CASE WHEN previous_month IS NULL THEN 'ACTIVATION'
WHEN lag_size <= 31 THEN 'ACTIVE'
WHEN lag_size > 31 THEN 'RETURN' END AS this_month_values,
CASE WHEN (lead_size > 31 OR lead_size IS NULL) THEN 'CHURN' ELSE NULL END AS next_month_churn,
COUNT(DISTINCT monthly_user_fk) AS c_d_users
FROM lag_lead_with_diffs
GROUP BY month, 2, 3
)
SELECT
month,
this_month_values,
SUM(c_d_users) AS distinct_users
FROM calculated
GROUP BY month, this_month_values
UNION
SELECT month + INTERVAL '1 month',
'CHURN',
SUM(c_d_users)
FROM calculated
WHERE next_month_churn IS NOT NULL
GROUP BY month + INTERVAL '1 month', 2
HAVING (EXTRACT(EPOCH FROM (month + INTERVAL '1 month'))) < 1512086400
ORDER BY month, this_month_values;
However, this is not fixed at the initial group. The Active group rolls from month to month.
I understand that the above is likely more complicated than what I'm asking for, but I can't seem to get my head around it.
Thanks in advance.
Perhaps this is what you are looking for:
with Monthly_Users as (
select user_fk
, date_trunc('month',created_at) as month
, (date_part('year', created_at) - 2016) * 12
+ date_part('month', created_at) - 11 as Months_Between
from transactions
where created_at between date '2016-11-01'
and date '2017-12-01'
group by user_fk, month, months_between
), t2 as (
select Monthly_Users.*
, count(*) over (partition by user_fk
order by month rows between unbounded preceding
and 1 preceding) prev_rec_cnt
from Monthly_Users
)
select month
, count(*)
from t2
where Months_Between = Prev_Rec_Cnt
group by month
order by month;
In this query the Monthly_Users CTE is just like yours, but adds a computation of the number of Months_Between the created_at date and your initial starting date. In the second Common Table Expression, I count the number of occurrences of each user_fk prior to the current month's record. Finally, in the output query I limit the results to only those records where the Months_Between value matches the Prev_Rec_Cnt value. Any missed month will cause the Prev_Rec_Cnt value to stop matching the Months_Between value from then on, so you'll be able to see the fall-off of user_fk values from month to month.
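The matching rule is easier to trust after tracing it on a few users. Below is a small Python sketch of the same idea with invented transactions: a user counts for a month only if the month offset from the cohort month equals the number of earlier months they were active. The names `months_between`, `retained` and `seen` are just for this illustration.

```python
from datetime import date

START = (2016, 11)  # cohort month, as in the queries above

# Hypothetical (user_fk, created_at) transactions.
transactions = [
    (1, date(2016, 11, 3)), (1, date(2016, 12, 9)), (1, date(2017, 1, 20)),
    (2, date(2016, 11, 5)), (2, date(2017, 1, 2)),    # skipped December
    (3, date(2016, 11, 7)), (3, date(2016, 11, 21)), (3, date(2016, 12, 1)),
    (4, date(2016, 12, 15)), (4, date(2017, 1, 15)),  # not in the cohort
]

def months_between(d):
    """Whole months from the cohort month to d's month."""
    return (d.year - START[0]) * 12 + d.month - START[1]

# Distinct (user, month offset) pairs, like the Monthly_Users CTE,
# sorted so each user's months are visited in order.
monthly_users = sorted({(user, months_between(d)) for user, d in transactions})

retained = {}   # month offset -> users active in every month so far
seen = {}       # user -> number of earlier active months (Prev_Rec_Cnt)
for user, offset in monthly_users:
    prev_rec_cnt = seen.get(user, 0)
    if offset == prev_rec_cnt:
        retained[offset] = retained.get(offset, 0) + 1
    seen[user] = prev_rec_cnt + 1

print(retained)
```

User 2 drops out in December and is not counted again in January, and user 4 never counts because their first month is not the cohort month.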

Query data for hour by hour report

I've been tasked with a report that will show a bar chart where the X-axis is the hours of the shift, so 1-8 along the bottom. The bars are the number of transactions accomplished per hour. So the bar chart would easily let you see that in the first hour we've processed 30 orders, hour 2 we've processed 25, and so on, to the end of the shift.
I'm having trouble figuring out how to actually create this report though. Is my only option to do something like this (understand this is just pseudo-code, don't bother commenting on syntax issues):
create table #temp
(
Hour int,
Units int
)
insert into #temp
SELECT 1 as Hour, sum(Units) Units
FROM orders
WHERE DateCreated >= '6/14/2013 08:00:00' AND DateCreated < '6/14/2013 09:00:00'
insert into #temp
SELECT 2 as Hour, sum(Units) Units
FROM orders
WHERE DateCreated >= '6/14/2013 09:00:00' AND DateCreated < '6/14/2013 10:00:00'
insert into #temp
SELECT 3 as Hour, sum(Units) Units
FROM orders
WHERE DateCreated >= '6/14/2013 10:00:00' AND DateCreated < '6/14/2013 11:00:00'
.. and so on ..
select * from #temp
Also, this is in a stored procedure that the report calls.
Is there a better way to do this? Should I be just sending the entire day's data to the report and somehow handling it there? Any insights would be appreciated.
SELECT DATEPART(hh, DateCreated) AS hour, sum(Units) Units
FROM orders
WHERE DateCreated >= '6/14/2013' AND DateCreated < '6/15/2013'
GROUP BY DATEPART(hh, DateCreated)
See the SQL Server documentation for DATEPART() for more details.
You can of course add more groupings for day, week, whatever you want:
SELECT DATEPART(dd, DateCreated) AS day, DATEPART(hh, DateCreated) AS hour, sum(Units) Units
FROM orders
WHERE DateCreated >= '6/10/2013' AND DateCreated < '6/15/2013'
GROUP BY DATEPART(dd, DateCreated), DATEPART(hh, DateCreated)
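One thing a plain GROUP BY will not do is return a row for an hour with no orders, which matters for a bar chart with fixed hours 1 to 8. A possible way to handle that on the report side is sketched below in Python. The 08:00 shift start, the 8-hour length and the order rows are assumptions for the example.

```python
from collections import Counter
from datetime import datetime

SHIFT_START_HOUR = 8   # assumption: shift runs 08:00-16:00
SHIFT_HOURS = 8

# Hypothetical (DateCreated, Units) rows for one day.
orders = [
    (datetime(2013, 6, 14, 8, 5), 10),
    (datetime(2013, 6, 14, 8, 50), 20),
    (datetime(2013, 6, 14, 9, 15), 25),
    (datetime(2013, 6, 14, 11, 30), 5),
]

# Equivalent of GROUP BY DATEPART(hh, DateCreated) with SUM(Units).
units_by_hour = Counter()
for created, units in orders:
    units_by_hour[created.hour] += units

# Shift hours 1..8 for the X axis; hours without orders are filled with 0.
report = [
    (n, units_by_hour.get(SHIFT_START_HOUR + n - 1, 0))
    for n in range(1, SHIFT_HOURS + 1)
]
print(report)
```

In SQL the same gap filling is usually done by left joining the grouped result to a small table or list of the shift's hours.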