I already know how to count how many distinct days I have in my DB:
SELECT
COUNT(DISTINCT DATE(TIME)) AS distinct_days
FROM table;
But when I tried to count distinct weeks or months, the only solution I found is super slow...
For months:
SELECT
COUNT(DISTINCT CONCAT(EXTRACT(YEAR FROM TIME), EXTRACT(MONTH FROM TIME))) AS distinct_months
FROM table;
For weeks:
SELECT
COUNT(DISTINCT CONCAT(EXTRACT(YEAR FROM TIME), EXTRACT(MONTH FROM TIME), EXTRACT(WEEK FROM TIME))) AS distinct_weeks
FROM table;
Do you have any ideas to optimize this?
(update) Notice:
COUNT(DISTINCT DATE_TRUNC('week', time)) AS distinct_weeks
and
COUNT(DISTINCT CONCAT (EXTRACT(YEAR FROM TIME),EXTRACT(MONTH FROM TIME), EXTRACT(WEEK FROM TIME))) AS distinct_weeks
don't have the same result (I want the second one)!
With COUNT(DISTINCT DATE_TRUNC('week', time)) you have 53 possibilities, and with COUNT(DISTINCT CONCAT(EXTRACT(YEAR FROM TIME), EXTRACT(MONTH FROM TIME), EXTRACT(WEEK FROM TIME))) there is no upper bound (e.g. 2014-01 week 1 is different from 2013-01 week 1)...
Finally, I found something at least twice as fast:
For distinct weeks:
SELECT COUNT(distinct_weeks)
FROM (SELECT CONCAT(EXTRACT(YEAR FROM TIME),EXTRACT(MONTH FROM TIME),EXTRACT(WEEK FROM TIME)) AS distinct_weeks
FROM table
GROUP BY EXTRACT(YEAR FROM TIME),
EXTRACT(MONTH FROM TIME),
EXTRACT(WEEK FROM TIME)) t
For distinct months:
SELECT COUNT(distinct_months)
FROM (SELECT CONCAT(EXTRACT(YEAR FROM TIME),EXTRACT(MONTH FROM TIME)) AS distinct_months
FROM table
GROUP BY EXTRACT(YEAR FROM TIME),
EXTRACT(MONTH FROM TIME)) t
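If you are on PostgreSQL, another option worth benchmarking (a sketch, not tested on your data) is to drop the string concatenation entirely and count distinct composite values, which keeps the same year/month/week semantics:
SELECT
  -- the (year, month, week) row value is compared directly, no text conversion needed
  COUNT(DISTINCT (EXTRACT(YEAR FROM TIME), EXTRACT(MONTH FROM TIME), EXTRACT(WEEK FROM TIME))) AS distinct_weeks
FROM table;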
Maybe simply truncating the date is faster, because then you don't need string conversion and concatenation:
SELECT
COUNT(DISTINCT DATE_TRUNC('week', mytime)) AS distinct_weeks,
COUNT(DISTINCT DATE_TRUNC('month', mytime)) AS distinct_months,
COUNT(DISTINCT DATE_TRUNC('year', mytime)) AS distinct_years
FROM mytable;
Related
I have a table called 'daily_prices' with 'sale_date', 'last_sale_price', and 'symbol' as columns.
I need to calculate how many times 'last_sale_price' has gone up compared to the previous day's 'last_sale_price', over 10 weeks.
Currently I have my query like this for 2 weeks:
select count(*) as "timesUp", sum(last_sale_price-prev_price) as "dollarsUp", 'wk1' as "week"
from
(
select last_sale_price, LAG(last_sale_price, 1) OVER (ORDER BY sale_date) as prev_price
from daily_prices
where sale_date <= CAST('2020-09-18' AS DATE) AND sale_date >= CAST('2020-09-14' AS DATE)
and symbol='AAPL'
) nest
where last_sale_price > prev_price
UNION
select count(*) as "timesUp", sum(last_sale_price-prev_price) as "dollarsUp", 'wk2' as "week"
from
(
select last_sale_price, LAG(last_sale_price, 1) OVER (ORDER BY sale_date) as prev_price
from daily_prices
where sale_date <= CAST('2020-09-11' AS DATE) AND sale_date >= CAST('2020-09-07' AS DATE)
and symbol='AAPL'
) nest
where last_sale_price > prev_price
I'm using 'UNION' to combine the weekly data. But as the number of weeks increases, the query is going to be huge.
Is there a simpler way to write this query?
Any help is much appreciated. Thanks in advance.
You can extract the week from sale_date, then apply a GROUP BY in the outer query:
select EXTRACT(year FROM sale_date) as year,
       EXTRACT(week FROM sale_date) as week,
       count(*) as "timesUp",
       sum(last_sale_price - prev_price) as "dollarsUp"
from (
    select
        sale_date,
        last_sale_price,
        LAG(last_sale_price, 1) OVER (ORDER BY sale_date) as prev_price
    from daily_prices
    where symbol = 'AAPL'
) t
where last_sale_price > prev_price
group by EXTRACT(year FROM sale_date), EXTRACT(week FROM sale_date)
To keep only weekdays you can add this filter:
EXTRACT(dow FROM sale_date) in (1,2,3,4,5)
PS: make sure that Monday is the first day of the week. In some countries Sunday is the first day of the week.
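In PostgreSQL, extract(dow ...) numbers days from 0 (Sunday) to 6 (Saturday), so 1-5 are Monday through Friday; a quick way to check what your setup returns (a minimal sketch):
SELECT EXTRACT(dow FROM DATE '2020-09-14') AS dow;  -- 2020-09-14, the Monday from the question's first week, returns 1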
You can filter on the last 8 weeks in the where clause, then group by week and do conditional aggregation:
select extract(year from sale_date) as yyyy, extract(week from sale_date) as ww,
    sum(last_sale_price - lag_last_sale_price) filter(where last_sale_price > lag_last_sale_price) as sum_dollars_up,
    count(*) filter(where last_sale_price > lag_last_sale_price) as cnt_dollars_up
from (
    select dp.*,
        lag(last_sale_price) over(partition by extract(year from sale_date), extract(week from sale_date) order by sale_date) as lag_last_sale_price
    from daily_prices dp
    where symbol = 'AAPL'
    and sale_date >= date_trunc('week', current_date) - '8 week'::interval
) dp
group by 1, 2
Notes:
I am assuming that you don't want to compare the first price of a week to the last price of the previous week; if you do, then just remove the partition by clause from the over() clause of lag()
this dynamically computes the date as of 8 (entire) weeks ago
if there is no price increase during a whole week, the query still gives you a row, with 0 as cnt_dollars_up (and NULL as sum_dollars_up, since there is nothing to sum)
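To see which cutoff that where clause actually produces, you can evaluate it on its own (a minimal sketch, assuming PostgreSQL, where date_trunc('week', ...) returns the Monday of the current ISO week):
SELECT date_trunc('week', current_date) - '8 week'::interval AS cutoff;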
I have the following query:
#standardSQL
SELECT distinct (grand_total/months) AS avg, ((grand_total/days)) AS
avg_day
FROM
(select count(searchint.id) as Total, (DATE_DIFF(DATE ({{DATE_END}}),
DATE ({{DATE_START}}), DAY)+1) AS days, ((12 * YEAR(TIMESTAMP({{DATE_END}})) +
MONTH(TIMESTAMP({{DATE_END}}))) - (12 * YEAR(TIMESTAMP({{DATE_START}}))
+ MONTH(TIMESTAMP({{DATE_START}}))) +1) AS months,
(select count(searchint.id) as Total
from `dbsearch`
where cast(replace(searchint.createdDate,'Z','')as DateTime) >=
cast({{DATE_START}} as DateTime)
and cast(replace(searchint.createdDate,'Z','')as DateTime) <=
cast(DATE_ADD(cast({{DATE_END}} as date), Interval 1 day ) as DateTime)) AS grand_total
from `dbsearch`
where cast(replace(searchint.createdDate,'Z','')as DateTime) >=
cast({{DATE_START}} as DateTime)
and cast(replace(searchint.createdDate,'Z','')as DateTime) <=
cast(DATE_ADD(cast({{DATE_END}} as date), Interval 1 day ) as DateTime)
group by date(cast(replace(searchint.createdDate,'Z','')as DateTime))
ORDER BY 2 DESC) AS groupby
However, when I try to run it in BigQuery, it gives the following error:
Function not found: YEAR at [5:180]
I understand it's because I'm using standard SQL, but how do I compute that difference in months using standard SQL?
To find the difference in months between two dates, you are better off using DATE_DIFF():
DATE_DIFF(DATE_END, DATE_START, MONTH)
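Applied to the query in the question (a sketch; it assumes the {{DATE_START}}/{{DATE_END}} template variables expand to values DATE() accepts, as they already do elsewhere in that query), the whole 12 * YEAR(...) + MONTH(...) arithmetic collapses to:
DATE_DIFF(DATE({{DATE_END}}), DATE({{DATE_START}}), MONTH) + 1 AS months  -- same +1 inclusive logic as the original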
StandardSQL in BigQuery supports the ISO/ANSI-standard function for extracting date parts. This is extract():
You want:
extract(year from <datecol>)
extract(month from <datecol>)
This is explained in the documentation.
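For reference, the failing months expression from the question could also be rewritten with extract() while keeping the original arithmetic (a sketch, same assumption about the template variables):
((12 * EXTRACT(YEAR FROM DATE({{DATE_END}})) + EXTRACT(MONTH FROM DATE({{DATE_END}})))
 - (12 * EXTRACT(YEAR FROM DATE({{DATE_START}})) + EXTRACT(MONTH FROM DATE({{DATE_START}})))
 + 1) AS months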
I am unable to cast a timestamp to the date type in the GROUP BY of my SQL SELECT statement.
SELECT geography_id,
listed_at::DATE,
EXTRACT(YEAR FROM listed_at) AS year,
EXTRACT(MONTH FROM listed_at) AS month,
EXTRACT(day FROM listed_at) AS day,
Count(*) AS active_listing_count,
SUM(list_price) AS sum_of_listing_price,
Date_part('day', current_date :: timestamp - listed_at :: timestamp) AS days_on_market,
COUNT(num_bathrooms) AS total_bathrooms,
COUNT(num_bedrooms) AS total_bedrooms
FROM properties
WHERE expired_at IS NULL
GROUP BY geography_id,
listed_at::DATE
ORDER BY listed_at::DATE DESC;
I am getting this error:
ERROR: column "properties.listed_at" must appear in the GROUP BY clause or be used in an aggregate function
Each occurrence of listed_at in the select list should be cast to date:
SELECT geography_id,
listed_at::DATE,
EXTRACT(YEAR FROM listed_at::date) AS year,
EXTRACT(MONTH FROM listed_at::date) AS month,
EXTRACT(day FROM listed_at::date) AS day,
count(*) AS active_listing_count,
SUM(list_price) AS sum_of_listing_price,
date_part('day', current_date::timestamp - listed_at::date) AS days_on_market,
COUNT(num_bathrooms) AS total_bathrooms,
COUNT(num_bedrooms) AS total_bedrooms
FROM properties
WHERE expired_at IS NULL
GROUP BY geography_id,
listed_at::DATE
ORDER BY listed_at::DATE DESC;
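A minimal illustration of the rule, using a hypothetical table t with a timestamp column ts (not the schema from the question):
-- fails: the raw column ts appears in the select list, but only ts::date is in the GROUP BY
SELECT ts::date, EXTRACT(YEAR FROM ts) FROM t GROUP BY ts::date;
-- works: every select expression is built from the grouped expression ts::date
SELECT ts::date, EXTRACT(YEAR FROM ts::date) FROM t GROUP BY ts::date;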
I have a date, and I want to find all records in the past that have the same month and day.
The problem occurs when there is no such date in a given year, for example the 29th of February.
My goal is to get the nearest earlier date when the exact date does not exist.
This is my current query with the date 2012-02-29:
SELECT date, amount
FROM table_name
WHERE
EXTRACT(MONTH FROM date) = EXTRACT(MONTH FROM DATE('2012-02-29') )
AND EXTRACT(DAY FROM date) = EXTRACT(DAY FROM DATE('2012-02-29') )
AND date < '2012-02-29'
ORDER BY date DESC LIMIT 10;
If I understand correctly, you want one date per year, with the property that its day is nearest to the given date.
I would suggest using distinct on:
select distinct on (date_trunc('year', date)) t.*
from table_name t
order by date_trunc('year', date),
abs(date_part('day', (date -
                      (date '2012-02-29' -
                       (extract(year from date '2012-02-29') - extract(year from date)) * interval '1 year'
                      )
                     )
));
EDIT:
An example of working code:
select distinct on (date_trunc('year', date)) t.*
from table_name t
order by date_trunc('year', date),
abs(date_part('day', date - (date '2012-02-29' -
((extract(year from date '2012-02-29') - extract(year from date)) * interval '1 year')
)
))
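As a side note on how distinct on behaves here: it keeps only the first row for each value of the expression in parentheses, following the ORDER BY, so each year keeps its row with the smallest month/day distance to the reference date. A generic, hypothetical illustration (table and columns made up):
-- one row per customer: the most recent order wins
select distinct on (customer_id) customer_id, order_date, amount
from orders
order by customer_id, order_date desc;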
I have this SELECT:
SELECT month, year, ROUND(AVG(q_overall) OVER (rows BETWEEN 10000 preceding and current row),2) as avg
FROM (
SELECT EXTRACT(Month FROM date) as month, EXTRACT(Year FROM date) as year, ROUND(AVG(q_overall),1) as q_overall
FROM fb_parsed
WHERE business_id = 1
GROUP BY year, month
ORDER BY year, month) a
output:
month year avg
-----------------
12 2012 5
1 2013 4.5
2 2013 4.1
4 2013 4.8
5 2013 4.7
And I have to append the missing values to this table (in this example, the 3rd month of 2013). The avg must be the same as in the previous row, which means I need to append this row to the table:
3 2013 4.1
Can I do this with SELF JOINS and generate_series, or with some UNION select?
You can simplify your select. It doesn't need a subquery:
SELECT EXTRACT(Month FROM date) as month,
EXTRACT(Year FROM date) as year,
ROUND(AVG(q_overall), 1) as q_overall,
ROUND(AVG(AVG(q_overall)) OVER (rows BETWEEN 10000 preceding and current row), 2)
FROM fb_parsed
WHERE business_id = 1
GROUP BY year, month;
The window function needs an ORDER BY. I assume you really intend:
SELECT EXTRACT(Month FROM date) as month,
EXTRACT(Year FROM date) as year,
ROUND(AVG(q_overall), 1) as q_overall,
ROUND(AVG(AVG(q_overall)) OVER (ORDER BY EXTRACT(Year FROM date), EXTRACT(Month FROM date)), 2)
FROM fb_parsed
WHERE business_id = 1
GROUP BY year, month;
Then, to fill in the values you can use generate_series():
SELECT EXTRACT(Month FROM ym.date) as month,
EXTRACT(Year FROM ym.date) as year,
ROUND(AVG(AVG(q_overall)) OVER (ORDER BY EXTRACT(Year FROM ym.date), EXTRACT(Month FROM ym.date)), 2)
FROM (SELECT generate_series(date_trunc('month', min(date)),
date_trunc('month', max(date)),
interval '1 month') as date
FROM fb_parsed
) ym LEFT JOIN
fb_parsed p
ON EXTRACT(year FROM ym.date) = EXTRACT(year FROM p.date) AND
EXTRACT(month FROM ym.date) = EXTRACT(month FROM p.date) AND
p.business_id = 1
GROUP BY year, month;
I think this will do what you want.
Final query:
SELECT EXTRACT(Month FROM ym.date) as month,
EXTRACT(Year FROM ym.date) as year,
ROUND(AVG(AVG(q_overall)) OVER (ORDER BY EXTRACT(Year FROM ym.date), EXTRACT(Month FROM ym.date)), 2)
FROM
(SELECT generate_series(date_trunc('month', min(date)),
date_trunc('month', max(date)),
interval '1 month') as date
FROM fb_parsed WHERE business_id = 1 AND site = 'facebook')
ym LEFT JOIN
fb_parsed p
ON EXTRACT(year FROM ym.date) = EXTRACT(year FROM p.date) AND
EXTRACT(month FROM ym.date) = EXTRACT(month FROM p.date) AND
p.business_id = 1 AND p.site = 'facebook'
GROUP BY year, month;
Can I do this with SELF JOINS and generate_series?
Yep, you're close, but your current query does a cumulative average. The tricky part is to fill the gaps with the previous value (if PostgreSQL supported the IGNORE NULLS option of LAST_VALUE, this would be easier...)
SELECT month,
       year,
       MAX(q_overall)   -- assign the value to all rows within the same group
         OVER (PARTITION BY grp) AS q_overall
FROM
(
  SELECT EXTRACT(Month FROM all_months.date) AS month,
         EXTRACT(Year FROM all_months.date) AS year,
         p.q_overall,
         -- assign a new group number whenever there's a value in q_overall
         SUM(CASE WHEN p.q_overall IS NULL THEN 0 ELSE 1 END)
           OVER (ORDER BY all_months.date
                 ROWS UNBOUNDED PRECEDING) AS grp
  FROM
  ( -- create all months between min and max date
    SELECT generate_series(date_trunc('month', min(date)),
                           date_trunc('month', max(date)),
                           interval '1 month') as date
    FROM fb_parsed
  ) AS all_months
  LEFT JOIN
  ( -- do the average per month calculation
    SELECT EXTRACT(Month FROM date) as month,
           EXTRACT(Year FROM date) as year,
           ROUND(AVG(q_overall), 1) as q_overall
    FROM fb_parsed
    WHERE business_id = 1
    GROUP BY year, month
  ) AS p
  ON EXTRACT(year FROM all_months.date) = p.year
  AND EXTRACT(month FROM all_months.date) = p.month
) AS dt
Edit:
Oops, this was overly complicated. The question asked for a cumulative average, and NULLs will not change that result, so there's no need to fill the gaps.
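A minimal illustration (made-up values) of why the NULL months left by the join don't affect the cumulative average:
SELECT AVG(q) AS avg_q
FROM (VALUES (4.5), (NULL), (4.1)) AS t(q);  -- AVG ignores the NULL row, so this returns 4.3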