Postgresql - Window Function Aggregate - sql

I'm attempting to find the number of new users per month by product type. However, I continue to receive an error requesting cnt to be used in an aggregate function.
SELECT EXTRACT(MONTH FROM date) AS month
FROM (SELECT users.date,
COUNT(*) OVER(PARTITION BY product_type) AS cnt FROM users) AS u
GROUP BY month
ORDER BY cnt DESC;

That seems like a very strange construct. Here is a method that doesn't use window functions:
select date_trunc('month', date) as yyyymm, product_id, count(*)
from (select distinct on (u.userid) u.*
from users u
order by u.userid, u.date
) u
group by date_trunc('month', date), product_id
order by yyyymm, product_id;

Related

First users by categories in BigQuery

How can I count the new and existing users by categories and years?
For instance, during 2015-2020 if someone bought a product in category_A in 2016 first, it will be counted as a new uesr in 2016 in category_A although this user bought a product in category_B in 2015.
Table_1 (Columns: product_name, date, category, sales, user_id)
Want to get the result as bleow
One approach uses two levels of aggregation:
select extract(year from mindate) yr, category, count(*) num_new
from (
select user_id, category, min(date) mindate
from table_1
group by user_id, category
) t
group by extract(year from mindate)
The subquery retrieves the first purchase date of each user by category. Then, the outer query aggregates by the year of that date.
If you want the count of current users as well, then it is a bit different. You can use a window function in the subquery rather than aggregation, then count distinct values in the outer query:
select extract(year from mindate) yr, category,
countdistinctif(user_id, date = mindate) num_new,
countdistinct(user_id) num_total
from (
select date, user_id, category, min(date) over(partition by user_id, category) mindate
from table_1
) t
group by extract(year from mindate)
Below is for BigQuery Standard SQL
#standardSQL
WITH temp AS (
SELECT *,
0 = COUNT(1) OVER(
PARTITION BY user_id, category
ORDER BY date
ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING
) new_user
FROM `project.dataset.table_1`
ORDER BY date, user_id
)
SELECT EXTRACT(YEAR FROM date) AS year,
category,
COUNT(DISTINCT IF(new_user, user_id, NULL)) AS num_new,
COUNT(DISTINCT IF(new_user, NULL, user_id)) AS num_existing
FROM temp
GROUP BY year, category

How to calculate the median in Postgres?

I have created a basic database (picture attached) Database, I am trying to find the following:
"Median total amount spent per user in each calendar month"
I tried the following, but getting errors:
SELECT
user_id,
AVG(total_per_user)
FROM (SELECT user_id,
ROW_NUMBER() over (ORDER BY total_per_user DESC) AS desc_total,
ROW_NUMBER() over (ORDER BY total_per_user ASC) AS asc_total
FROM (SELECT EXTRACT(MONTH FROM created_at) AS calendar_month,
user_id,
SUM(amount) AS total_per_user
FROM transactions
GROUP BY calendar_month, user_id) AS total_amount
ORDER BY user_id) AS a
WHERE asc_total IN (desc_total, desc_total+1, desc_total-1)
GROUP BY user_id
;
In Postgres, you could just use aggregate function percentile_cont():
select
user_id,
percentile_cont(0.5) within group(order by total_per_user) median_total_per_user
from (
select user_id, sum(amount) total_per_user
from transactions
group by date_trunc('month', created_at), user_id
) t
group by user_id
Note that date_trunc() is probably closer to what you want than extract(month from ...) - unless you do want to sum amounts of the same month for different years together, which is not how I understood your requirement.
Just use percentile_cont(). I don't fully understand the question. If you want the median of the monthly spending, then:
SELECT user_id,
PERCENTILE_CONT(0.5) WITHIN GROUP (ORDER BY total_per_user
ROW_NUMBER() over (ORDER BY total_per_user DESC) AS desc_total,
ROW_NUMBER() over (ORDER BY total_per_user ASC) AS asc_total
FROM (SELECT DATE_TRUNC('month', created_at) AS calendar_month,
user_id, SUM(amount) AS total_per_user
FROM transactions t
GROUP BY calendar_month, user_id
) um
GROUP BY user_id;
There is a built-in function for median. No need for fancier processing.

Count occurences in a row using aggregate functions

Consider the following relation
column measured_at holds thousands of different timestamps and column cell_id holds the number of the cell tower used at each timestamp. I want to query for each day saved in measured_at, which cell tower has the most occurences (used the most at that day, here is time irrelevant, only the date is to query). This probably can be done using window functions, but I want to do it using only aggregate functions and simple queries.
an output should look like for example:
cell_id measured_at
27997442 2015-12-22
for the above example because on 22-12-2015 tower number 27997442 has been used the most.
You can use aggregation and distinct on. To get the counts:
select date_trunc(date, measured_at) as dte, cell_id, count(*) as cnt
from t
group by dte, cell_id
And then extend this for only one value:
select distinct on (date_trunc(date, measured_at)) date_trunc(date, measured_at) as dte, cell_id, count(*) as cnt
from t
group by dte, cell_id
order by date_trunc(date, measured_at), count(*) desc;
Of course, you can use window functions as well -- and that is a better approach if you want to get ties as well:
select dte, cell_id, cnt
from (select date_trunc(date, measured_at) as dte, cell_id, count(*) as cnt,
rank() over (partition by date_trunc(date, measured_at) order by count(*) desc) as seqnum
from t
group by dte, cell_id
) dc
where seqnum = 1;

SQL subquery using group by item from main query

I have a table with a created timestamp and id identifier.
I can get number of unique id's per week with:
SELECT date_trunc('week', created)::date AS week, count(distinct id)
FROM my_table
GROUP BY week ORDER BY week;
Now I want to have the accumulated number of created by unique id's per week, something like this:
SELECT date_trunc('week', created)::date AS week, count(distinct id),
(SELECT count(distinct id)
FROM my_table
WHERE date_trunc('week', created)::date <= week) as acc
FROM my_table
GROUP BY week ORDER BY week;
But that doesn't work, as week is not accessible in the sub select (ERROR: column "week" does not exist).
How do I solve this?
I'm using PostgreSQL
Use a cumulative aggregation. But, I don't think you need the distinct, so:
SELECT date_trunc('week', created)::date AS week, count(*) as cnt,
SUM(COUNT(*)) OVER (ORDER BY MIN(created)) as running_cnt
FROM my_table
GROUP BY week
ORDER BY week;
In any case, as you've phrased the problem, you can change cnt to use count(distinct). Your subquery is not using distinct at all.
CTEs or a temp table should fix your problem. Here is an example using CTEs.
WITH abc AS (
SELECT date_trunc('week', created)::date AS week, count(distinct id) as IDCount
FROM my_table
GROUP BY week ORDER BY week;
)
SELECT abc.week, abc.IDcount,
(SELECT count(*)
FROM my_table
WHERE date_trunc('week', created)::date <= adc.week) as acc
FROM abc
GROUP BY week ORDER BY abc.week;
Hope this helps

Unique values per time period

In my table trips , I have two columns: created_at and user_id
My goal is to count unique user_ids per month with a query in postgres. So far, I have written this - but it returns an error
SELECT user_id,
to_char(created_at, 'YYYY-MM') as t COUNT(*)
FROM (SELECT DISTINCT user_id
FROM trips) group by t;
How should I change this query?
The query is much simpler than that:
SELECT to_char(created_at, 'YYYY-MM') as yyyymm, COUNT(DISTINCT user_id)
FROM trips
GROUP BY yyyymm
ORDER BY yyyymm;