Group by days of a month in CockroachDB - sql

In CockroachDB, I want to run a query like this for every day of a specific month:
select count(*), sum(amount)
from request
where code = 'code_string'
and created_at >= '2022-07-31T20:30:00Z' and created_at < '2022-08-31T20:30:00Z'
The problem is that I want the days to follow my local date. What should I do?
My goal is to get "month, day, count, sum" as the result columns for a month.
UPDATE:
I have found a suitable query for this purpose:
select count(amount), sum(amount), extract(month from created_at) as monthTime, extract(day from created_at) as dayTime
from request
where code = 'code_string' and created_at >= '2022-07-31T20:30:00Z' and created_at < '2022-08-31T20:30:00Z'
group by dayTime, monthTime
Thanks to @histocrat for an easier answer :) which replaces
extract(month from created_at) as monthTime, extract(day from created_at) as dayTime
with this:
date_part('month', created_at) as monthTime, date_part('day', created_at) as dayTime

To group results by both month and day, you can use the date_part function.
select date_part('month', created_at) as month, date_part('day', created_at) as day, count(*), sum(amount)
from request
where code = 'code_string'
group by month, day;
Depending on what type created_at is, you may need to cast or convert it first (for example, group by date_part('month', created_at::timestamptz)).
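Since the question asks for days in local time, one possible sketch converts created_at to the local zone before extracting the parts. It assumes created_at is a timestamptz. The zone name 'Asia/Tehran' and the sample rows in the CTE are placeholders, not taken from the question, so substitute your own zone and drop the CTE to run it against the real table.

```sql
-- Sample rows stand in for the real request table (hypothetical data).
WITH request(code, amount, created_at) AS (
  VALUES
    ('code_string', 10, TIMESTAMPTZ '2022-07-31T20:45:00Z'),
    ('code_string', 5,  TIMESTAMPTZ '2022-08-01T10:00:00Z'),
    ('code_string', 7,  TIMESTAMPTZ '2022-08-01T21:00:00Z')
)
SELECT
  -- AT TIME ZONE turns the timestamptz into local wall-clock time,
  -- so month and day follow the local calendar rather than UTC.
  date_part('month', created_at AT TIME ZONE 'Asia/Tehran') AS month,
  date_part('day',   created_at AT TIME ZONE 'Asia/Tehran') AS day,
  count(*)    AS cnt,
  sum(amount) AS total
FROM request
WHERE code = 'code_string'
  AND created_at >= TIMESTAMPTZ '2022-07-31T20:30:00Z'
  AND created_at <  TIMESTAMPTZ '2022-08-31T20:30:00Z'
GROUP BY 1, 2
ORDER BY 1, 2;
```

The range bounds in the WHERE clause are kept as in the question; they should correspond to local midnight in whichever zone you choose.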

Related

PostgreSQL: Simplifying a SQL query into a shorter query

I have a table called 'daily_prices' where I have 'sale_date', 'last_sale_price', 'symbol' as columns.
I need to calculate how many times 'last_sale_price' has gone up compared to previous day's 'last_sale_price' in 10 weeks.
Currently I have my query like this for 2 weeks:
select count(*) as "timesUp", sum(last_sale_price-prev_price) as "dollarsUp", 'wk1' as "week"
from
(
select last_sale_price, LAG(last_sale_price, 1) OVER (ORDER BY sale_date) as prev_price
from daily_prices
where sale_date <= CAST('2020-09-18' AS DATE) AND sale_date >= CAST('2020-09-14' AS DATE)
and symbol='AAPL'
) nest
where last_sale_price > prev_price
UNION
select count(*) as "timesUp", sum(last_sale_price-prev_price) as "dollarsUp", 'wk2' as "week"
from
(
select last_sale_price, LAG(last_sale_price, 1) OVER (ORDER BY sale_date) as prev_price
from daily_prices
where sale_date <= CAST('2020-09-11' AS DATE) AND sale_date >= CAST('2020-09-07' AS DATE)
and symbol='AAPL'
) nest
where last_sale_price > prev_price
I'm using 'UNION' to combine the weekly data. But as the number of weeks increase the query is going to be huge.
Is there a simpler way to write this query?
Any help is much appreciated. Thanks in advance.
You can extract the week from sale_date, then apply GROUP BY in the outer query:
select EXTRACT(year from sale_date) AS year, EXTRACT('week' FROM sale_date) AS week, count(*) as "timesUp", sum(last_sale_price-prev_price) as "dollarsUp"
from (
select
sale_date,
last_sale_price,
LAG(last_sale_price, 1) OVER (ORDER BY sale_date) as prev_price
from daily_prices
where symbol='AAPL'
) nest
where last_sale_price > prev_price
group by EXTRACT(year from sale_date), EXTRACT('week' FROM sale_date)
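To keep only weekdays, you can add this filter:
EXTRACT(dow FROM sale_date) in (1,2,3,4,5)
PS: in PostgreSQL, dow numbers Sunday as 0 and Saturday as 6, so 1 to 5 covers Monday through Friday. EXTRACT(week ...) returns the ISO 8601 week, which starts on Monday even in countries where Sunday is conventionally the first day of the week.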
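As a variation on the query above (not what the answer itself uses), grouping on date_trunc('week', ...) yields one key per week and avoids pairing a calendar year with an ISO week number around New Year. This is a minimal sketch; the sample prices in the CTE are invented for illustration.

```sql
-- Hypothetical sample data standing in for the daily_prices table.
WITH daily_prices(sale_date, symbol, last_sale_price) AS (
  VALUES
    (DATE '2020-09-07', 'AAPL', 100.0),
    (DATE '2020-09-08', 'AAPL', 102.5),
    (DATE '2020-09-09', 'AAPL', 101.0),
    (DATE '2020-09-14', 'AAPL', 103.0),
    (DATE '2020-09-15', 'AAPL', 104.0)
)
SELECT
  -- date_trunc('week', ...) returns the Monday that starts the ISO week.
  date_trunc('week', sale_date)::date AS week_start,
  count(*)                            AS "timesUp",
  sum(last_sale_price - prev_price)   AS "dollarsUp"
FROM (
  SELECT sale_date,
         last_sale_price,
         lag(last_sale_price) OVER (ORDER BY sale_date) AS prev_price
  FROM daily_prices
  WHERE symbol = 'AAPL'
) nest
WHERE last_sale_price > prev_price
GROUP BY 1
ORDER BY 1;
```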
You can filter on the last 8 weeks in the where clause, then group by week and do conditional aggregation:
select extract(year from sale_date) yyyy, extract(week from sale_date) ww,
sum(last_sale_price - lag_last_sale_price) filter(where last_sale_price > lag_last_sale_price) sum_dollars_up,
count(*) filter(where last_sale_price > lag_last_sale_price) cnt_dollars_up
from (
select dp.*,
lag(last_sale_price) over(partition by extract(year from sale_date), extract(week from sale_date) order by sale_date) lag_last_sale_price
from daily_prices dp
where symbol = 'AAPL'
and sale_date >= date_trunc('week', current_date) - '8 week'::interval
) dp
group by 1, 2
Notes:
I am assuming that you don't want to compare the first price of a week to the last price of the previous week; if you do, then just remove the partition by clause from the over() clause of lag()
this dynamically computes the start date as of 8 (entire) weeks ago
if there is no price increase during a whole week, the query still gives you a row, with NULL as sum_dollars_up and 0 as cnt_dollars_up

Can I reduce the number of SQL queries here (Postgresql)?

It's been a while since I've touched SQL.
I'm working on a pretty large database.
In a certain table which has some 30 million rows I'm trying to figure out when the highest number of entries was made for a certain period e.g. a year, down to the detail-level of one hour.
What I do now is something like this:
For the year 2018:
Find month with highest entry number for 2018 (i.e. 12 queries):
select count(*) from sing
where to_char(create_time, 'YYYY-MM-DD') like '2018-01-%'
select count(*) from sing
where to_char(create_time, 'YYYY-MM-DD') like '2018-02-%'
After I find the month with the highest number I must find the day (i.e. up to 31 queries) :
select count(*) from sing
where to_char(create_time, 'YYYY-MM-DD') = '2018-01-01'
select count(*) from sing
where to_char(create_time, 'YYYY-MM-DD') = '2018-01-02'
After I find the day with the highest number I must find the hour (i.e. 24 queries):
select count(*) from sing
where to_char(create_time, 'YYYY-MM-DD HH24:MI:SS') >= '2018-01-02 08:00:00'
and to_char(create_time, 'YYYY-MM-DD HH24:MI:SS') <= '2018-01-02 08:59:59'
As you can see this is a tedious task. So my question is, if and how I can optimize this process?
The database is PostgreSQL, and I'm using pgAdmin.
Thanks in advance.
You can use GROUP BY and the date_part function to simplify things:
SELECT date_part('month', create_time), count(*)
FROM sing
WHERE date_part('year', create_time) = 2018
GROUP BY date_part('month', create_time)
and then for the day
SELECT date_part('day', create_time), count(*)
FROM sing
WHERE date_part('year', create_time) = 2018
AND date_part('month', create_time) = <month from previous query>
GROUP BY date_part('day', create_time)
and so on
For the year 2018 it would be one query:
select count(*) from sing where date_part('year', create_time) = '2018'
So I think date_part is a better fit here than to_char.
https://www.w3resource.com/PostgreSQL/date_part-function.php
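The GROUP BY idea can also be pushed down to the hour in a single pass. The sketch below uses date_trunc('hour', ...) and a plain range filter on create_time, which, unlike wrapping the column in a function, leaves room for an ordinary index on create_time to be used. Note that it returns the single busiest hour of the year, which is not necessarily inside the busiest month or day. The sample rows are invented.

```sql
-- Hypothetical sample rows standing in for the 30-million-row sing table.
WITH sing(create_time) AS (
  VALUES
    (TIMESTAMP '2018-01-02 08:15:00'),
    (TIMESTAMP '2018-01-02 08:40:00'),
    (TIMESTAMP '2018-03-05 14:05:00')
)
SELECT date_trunc('hour', create_time) AS hour_start,
       count(*)                        AS entries
FROM sing
-- Half-open range covering all of 2018.
WHERE create_time >= TIMESTAMP '2018-01-01'
  AND create_time <  TIMESTAMP '2019-01-01'
GROUP BY 1
ORDER BY entries DESC
LIMIT 1;
```

Swapping 'hour' for 'day' or 'month' gives the busiest day or month with the same query shape.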

Visits per Isoweek in big query

I am trying to pull visits per ISO week from BigQuery.
However, I am failing with the date transformation.
Could you help?
StandardSQL
SELECT count (visitid) as Sessions, date,
EXTRACT (ISOYEAR FROM date) AS isoyear
FROM `xxx_*`
WHERE _TABLE_SUFFIX BETWEEN '201806020' AND '20180630'
GROUP BY date
order by date DESC
Have you tried a query like this?
SELECT EXTRACT(ISOYEAR FROM date) as yyyy,
EXTRACT(ISOWEEK FROM DATE) as ww,
COUNT(*) as Sessions
FROM `xxx_*`
WHERE _TABLE_SUFFIX BETWEEN '201806020' AND '20180630'
GROUP BY yyyy, ww
ORDER BY MIN(date) DESC;
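If the date transformation fails because date is stored as a 'YYYYMMDD' string rather than a DATE (an assumption; the question does not show the schema), one possible fix is to parse it first with PARSE_DATE. The CTE name sample_visits and its rows are hypothetical stand-ins for the wildcard table.

```sql
-- BigQuery Standard SQL. Assumes `date` is a STRING such as '20180618'.
WITH sample_visits AS (
  SELECT '20180618' AS date, 1001 AS visitid UNION ALL
  SELECT '20180619', 1002 UNION ALL
  SELECT '20180626', 1003
)
SELECT
  -- Parse the string into a DATE before extracting ISO year and week.
  EXTRACT(ISOYEAR FROM PARSE_DATE('%Y%m%d', date)) AS isoyear,
  EXTRACT(ISOWEEK FROM PARSE_DATE('%Y%m%d', date)) AS isoweek,
  COUNT(visitid) AS session_count
FROM sample_visits
GROUP BY isoyear, isoweek
ORDER BY isoyear DESC, isoweek DESC;
```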

Postgres - Cohort analysis across months sequentially, not if exists in any later month

I'm doing a cohort analysis and can get the group of users to examine, then see whether they transacted in the months following on. But I want it like this:
Of that group in December, who transacted in Jan; of the Jan group from Dec, who transacted in Feb. Basically I'm tracking the decay of the customer base.
What I don't want is those that return in any month following Dec, which is this:
WITH start_sample AS (
SELECT
user_fk,
created_at AS start_sample_date
FROM transactions
WHERE created_at >= '2016-11-01' AND created_at < '2016-12-01'
GROUP BY user_fk,
start_sample_date),
start_sample_min AS (
SELECT
user_fk,
MIN(start_sample_date) AS first_transaction
FROM start_sample
GROUP BY user_fk
)
SELECT
DATE_TRUNC('month', created_at) AS transacting_month,
COUNT(DISTINCT user_fk)
FROM transactions
WHERE created_at >= '2016-11-01'
AND user_fk IN(SELECT user_fk FROM start_sample_min)
GROUP BY transacting_month
ORDER BY transacting_month;
Then I made a churn model to see if it would get what I need, but it doesn't:
WITH monthly_users AS (
SELECT
user_fk AS monthly_user_fk,
DATE_TRUNC('month', created_at) AS month
FROM transactions
WHERE created_at >= '2016-11-01' AND created_at < '2017-12-01'
GROUP BY monthly_user_fk, month
ORDER BY monthly_user_fk, month
),
lag_lead AS (
SELECT
monthly_user_fk,
month,
LAG(month,1) OVER (PARTITION BY monthly_user_fk ORDER BY month) AS lag,
LEAD(month,1) OVER (PARTITION BY monthly_user_fk ORDER BY month) AS lead
FROM monthly_users),
lag_lead_with_diffs AS (
SELECT
monthly_user_fk,
month,
lag AS previous_month,
lead AS next_month,
EXTRACT(EPOCH FROM (month - lag)/86400)::INT AS lag_size,
EXTRACT(EPOCH FROM (lead - month)/86400)::INT AS lead_size
FROM lag_lead
),
calculated AS (
SELECT
month,
CASE WHEN previous_month IS NULL THEN 'ACTIVATION'
WHEN lag_size <= 31 THEN 'ACTIVE'
WHEN lag_size > 31 THEN 'RETURN' END AS this_month_values,
CASE WHEN (lead_size > 31 OR lead_size IS NULL) THEN 'CHURN' ELSE NULL END AS next_month_churn,
COUNT(DISTINCT monthly_user_fk) AS c_d_users
FROM lag_lead_with_diffs
GROUP BY month, 2, 3
)
SELECT
month,
this_month_values,
SUM(c_d_users) AS distinct_users
FROM calculated
GROUP BY month, this_month_values
UNION
SELECT month + INTERVAL '1 month',
'CHURN',
SUM(c_d_users)
FROM calculated
WHERE next_month_churn IS NOT NULL
GROUP BY month + INTERVAL '1 month', 2
HAVING (EXTRACT(EPOCH FROM (month + INTERVAL '1 month'))) < 1512086400
ORDER BY month, this_month_values;
However this is not fixed at the initial group. The Active group rolls from month to month.
I understand that the above is likely more complicated than what I'm asking, but I can't seem to get my head around it.
Thanks in advance
Perhaps this is what you are looking for:
with Monthly_Users as (
select user_fk
, date_trunc('month',created_at) as month
, (date_part('year', created_at) - 2016) * 12
+ date_part('month', created_at) - 11 as Months_Between
from transactions
where created_at between date '2016-11-01'
and date '2017-12-01'
group by user_fk, month, months_between
), t2 as (
select Monthly_Users.*
, count(*) over (partition by user_fk
order by month rows between unbounded preceding
and 1 preceding) prev_rec_cnt
from Monthly_Users
)
select month
, count(*)
from t2
where Months_Between = Prev_Rec_Cnt
group by month
order by month;
In this query the Monthly_Users CTE is just like yours, but adds a computation of the number of Months_Between the created_at date and your initial starting date. In the second common table expression, I count the number of occurrences of each user_fk prior to the current month's record. Finally, in the output query I limit the results to only those records where the Months_Between value matches the Prev_Rec_Cnt value. Any missed month will cause the Prev_Rec_Cnt value to no longer match the Months_Between value, so you'll be able to see the falloff of user_fk values from month to month.
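To see why the comparison works, here is a small sketch that feeds hand-made rows into the same window count. The two users and their months are invented: user 1 skips January 2017, user 2 transacts every month. The still_in_cohort column is added only for illustration.

```sql
-- Hypothetical pre-aggregated rows, shaped like the Monthly_Users CTE above.
WITH monthly_users(user_fk, month, months_between) AS (
  VALUES
    (1, DATE '2016-11-01', 0),
    (1, DATE '2016-12-01', 1),
    (1, DATE '2017-02-01', 3),  -- user 1 missed January
    (2, DATE '2016-11-01', 0),
    (2, DATE '2016-12-01', 1),
    (2, DATE '2017-01-01', 2)
), t2 AS (
  SELECT monthly_users.*,
         -- Number of earlier months in which this user appears.
         count(*) OVER (PARTITION BY user_fk
                        ORDER BY month
                        ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING) AS prev_rec_cnt
  FROM monthly_users
)
SELECT user_fk, month, months_between, prev_rec_cnt,
       months_between = prev_rec_cnt AS still_in_cohort
FROM t2
ORDER BY user_fk, month;
```

For user 1, the February row has months_between = 3 but only 2 earlier records, so it is filtered out; every row for user 2 matches.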

unable to typecast timestamp to date in Group By

I am unable to typecast timestamp to date type in the Group By of my SQL Select statement.
SELECT geography_id,
listed_at::DATE,
EXTRACT(YEAR FROM listed_at) AS year,
EXTRACT(MONTH FROM listed_at) AS month,
EXTRACT(day FROM listed_at) AS day,
Count(*) AS active_listing_count,
SUM(list_price) AS sum_of_listing_price,
Date_part('day', current_date :: timestamp - listed_at :: timestamp) AS days_on_market,
COUNT(num_bathrooms) AS total_bathrooms,
COUNT(num_bedrooms) AS total_bedrooms
FROM properties
WHERE expired_at IS NULL
GROUP BY geography_id,
listed_at::DATE
ORDER BY listed_at::DATE DESC;
I am getting this error:
ERROR: column "properties.listed_at" must appear in the GROUP BY clause or be used in an aggregate function
Each occurrence of listed_at in the select list should be cast to date:
SELECT geography_id,
listed_at::DATE,
EXTRACT(YEAR FROM listed_at::date) AS year,
EXTRACT(MONTH FROM listed_at::date) AS month,
EXTRACT(day FROM listed_at::date) AS day,
count(*) AS active_listing_count,
SUM(list_price) AS sum_of_listing_price,
date_part('day', current_date::timestamp - listed_at::date) AS days_on_market,
COUNT(num_bathrooms) AS total_bathrooms,
COUNT(num_bedrooms) AS total_bedrooms
FROM properties
WHERE expired_at IS NULL
GROUP BY geography_id,
listed_at::DATE
ORDER BY listed_at::DATE DESC;
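As a possible simplification of the days_on_market expression, subtracting two dates in PostgreSQL yields an integer number of days directly, so date_part is not needed there. A minimal sketch with invented sample rows, trimmed to the columns relevant to the grouping:

```sql
-- Hypothetical sample rows standing in for the properties table.
WITH properties(geography_id, listed_at, list_price, expired_at) AS (
  VALUES
    (1, TIMESTAMP '2021-03-01 09:00:00', 250000, NULL::timestamp),
    (1, TIMESTAMP '2021-03-01 17:30:00', 300000, NULL::timestamp),
    (2, TIMESTAMP '2021-03-02 11:00:00', 150000, NULL::timestamp)
)
SELECT geography_id,
       listed_at::date                AS listed_on,
       count(*)                       AS active_listing_count,
       sum(list_price)                AS sum_of_listing_price,
       -- date minus date gives whole days as an integer.
       current_date - listed_at::date AS days_on_market
FROM properties
WHERE expired_at IS NULL
GROUP BY geography_id, listed_at::date
ORDER BY listed_on DESC;
```

Every reference to listed_at appears only inside listed_at::date, which matches the GROUP BY expression, so the original error does not occur.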