First user by category - sql

How can I count the new users for each category who bought in the category for the first by year? For instance, 2015-2020 by year, if someone bought in 2015 for the first it will be counted as a new uesr in 2015 but not in 2016-2020.
Table_1 (Columns: product_name, date, category, sales, user_id)
Want to get the result as bleow

You’ll want to start with a sub query to get the first date each user purchased in the category. This is a pretty straightforward group by problem:
select
user_id,
category,
min(date) as first_category_purchase
from my_table
group by user_id, category;
Next, you can use Postgres’s date_trunc function to group by year and category, using your first query as a sub query:
select
category,
date_trunc('year', first_category_purchase)
count(*)
from (
select
user_id,
category,
min(date) as first_category_purchase
from my_table
group by user_id, category
) a
group by 1, 2;

In Postgres, one method is group by after a distinct on:
select date, count(*) as num_new_users
from (select distinct on (user_id, category) t.*
from t
order by user_id, category, date asc
) d
group by date
order by date;
If date is really a date and not a year, then you need something like to_char() or date_trunc() to convert it to a year.

Related

Bigquery - Select a column with not grouping them in group by clause

I'm having day-wise tables with google analytics data that is split based on device_category(desktop/mobile/tablet) and user_type(new user/returning user).
My requirement is, to query for the top-performing product in the month and just know the type of device and user. I do not want to group them based on device_category, user_type.
When excluding them from my query is gives an error saying - "Query error: SELECT list expression references column device_category which is neither grouped nor aggregated at [3:21]"
QUERY THAT DOES NOT WORK(this is my requirement)
SELECT
month,
year,
device_category,
user_type,
product_name,
round(sum(item_revenue),2) as item_revenue
FROM
`ProjectName.DatasetName.GA_REPORT_3_*`
where
_table_suffix between '20201101' and '20210131'
and channel_grouping = 'Organic Search'
group by
month,
year,
channel_grouping,
product_name
order by
item_revenue desc;
QUERY THAT WORKS
SELECT
month,
year,
device_category,
user_type,
product_name,
round(sum(item_revenue),2) as item_revenue
FROM
`ProjectName.DatasetName.GA_REPORT_3_*`
where
_table_suffix between '20201101' and '20210131'
and channel_grouping = 'Organic Search'
group by
month,
year,
channel_grouping,
product_name,
device_category,
user_type
order by
item_revenue desc;
Sample Data
I know in regular SQL workbenches we can select a Column in SQL not in Group By clause, but the same does not work for my issue on Bigquery.
Could you help me with a workaround for this.
Technically, you can envelope device_category and user_type with ANY_VALUE or MAX or MIN:
SELECT
month,
year,
ANY_VALUE(device_category),
ANY_VALUE(user_type),
product_name,
round(sum(item_revenue),2) as item_revenue
FROM
`ProjectName.DatasetName.GA_REPORT_3_*`
where
_table_suffix between '20201101' and '20210131'
and channel_grouping = 'Organic Search'
group by
month,
year,
channel_grouping,
product_name
order by
item_revenue desc;
You can use a subquery to achieve this:
SELECT
x.month,
x.year,
x.device_category,
x.user_type,
x.product_name,
ROUND(SUM(x.item_revenue),2) as item_revenue
FROM
(SELECT
month,
year,
device_category,
user_type,
product_name,
item_revenue
FROM `ProjectName.DatasetName.GA_REPORT_3_*`
WHERE _table_suffix BETWEEN '20201101' and '20210131'
AND channel_grouping = 'Organic Search'
) x
GROUP BY
x.month,
x.year,
x.product_name,
x.device_category,
x.user_type
ORDER BY ROUND(SUM(x.item_revenue),2) DESC;

First users by categories in BigQuery

How can I count the new and existing users by categories and years?
For instance, during 2015-2020 if someone bought a product in category_A in 2016 first, it will be counted as a new uesr in 2016 in category_A although this user bought a product in category_B in 2015.
Table_1 (Columns: product_name, date, category, sales, user_id)
Want to get the result as bleow
One approach uses two levels of aggregation:
select extract(year from mindate) yr, category, count(*) num_new
from (
select user_id, category, min(date) mindate
from table_1
group by user_id, category
) t
group by extract(year from mindate)
The subquery retrieves the first purchase date of each user by category. Then, the outer query aggregates by the year of that date.
If you want the count of current users as well, then it is a bit different. You can use a window function in the subquery rather than aggregation, then count distinct values in the outer query:
select extract(year from mindate) yr, category,
countdistinctif(user_id, date = mindate) num_new,
countdistinct(user_id) num_total
from (
select date, user_id, category, min(date) over(partition by user_id, category) mindate
from table_1
) t
group by extract(year from mindate)
Below is for BigQuery Standard SQL
#standardSQL
WITH temp AS (
SELECT *,
0 = COUNT(1) OVER(
PARTITION BY user_id, category
ORDER BY date
ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING
) new_user
FROM `project.dataset.table_1`
ORDER BY date, user_id
)
SELECT EXTRACT(YEAR FROM date) AS year,
category,
COUNT(DISTINCT IF(new_user, user_id, NULL)) AS num_new,
COUNT(DISTINCT IF(new_user, NULL, user_id)) AS num_existing
FROM temp
GROUP BY year, category

Unique values per time period

In my table trips , I have two columns: created_at and user_id
My goal is to count unique user_ids per month with a query in postgres. So far, I have written this - but it returns an error
SELECT user_id,
to_char(created_at, 'YYYY-MM') as t COUNT(*)
FROM (SELECT DISTINCT user_id
FROM trips) group by t;
How should I change this query?
The query is much simpler than that:
SELECT to_char(created_at, 'YYYY-MM') as yyyymm, COUNT(DISTINCT user_id)
FROM trips
GROUP BY yyyymm
ORDER BY yyyymm;

Tag first observation for a user over products

I have a data table with the columns user_id, purchase_date and a date column in standard yyyy-mm-dd form.
Users in this table purchase multiple items of the same product (and different products) in the same month, so I needed to be able to capture the first time that they bought a particular product and then count each product by month.
I did it with the following:
SELECT yr, mo, COUNT(DISTINCT(CASE WHEN Product = 'product_a'
THEN user_id)) AS product_a
FROM
(
SELECT YEAR(min(purchase_date)) AS yr, MONTH(min(pruchase_date)) AS mo,
DAY(min(purchase_date)) AS dy, user_id, Product
FROM daily_purchases
GROUP BY user_id, Product
) b
GROUP BY yr, mo
ORDER BY yr, mo
This seems to work fine and capture what I am looking for. Does anyone have any suggestions - or is this the most appropriate way to go about it? Thanks!
I don't have any source data to test ... but here is how I would approach it ... this may need some tweaking as I have not tested it with source data:
Select [yr], [mo], [user_id], [firstBuy], [cntBuys] From
(
SELECT [yr], [mo], [user_id],
ROW_NUMBER() over (partition by user_id order by purchase_date) as 'firstBuy'
,COUNT(*) over (partition by user_id, Product) as 'cntBuys'
FROM daily_purchases
) a
Where a.firstBuy = 1
Group by a.yr, a.mo

SQL: aggregation (group by like) in a column

I have a select that group by customers spending of the past two months by customer id and date. What I need to do is to associate for each row the total amount spent by that customer in the whole first week of the two month time period (of course it would be a repetition for each row of one customer, but for some reason that's ok ). do you know how to do that without using a sub query as a column?
I was thinking using some combination of OVER PARTITION, but could not figure out how...
Thanks a lot in advance.
Raffaele
Query:
select customer_id, date, sum(sales)
from transaction_table
group by customer_id, date
If it's a specific first week (e.g. you always want the first week of the year, and your data set normally includes January and February spending), you could use sum(case...):
select distinct customer_id, date, sum(sales) over (partition by customer_ID, date)
, sum(case when date between '1/1/15' and '1/7/15' then Sales end)
over (partition by customer_id) as FirstWeekSales
from transaction_table
In response to the comments below; I'm not sure if this is what you're looking for, since it involves a subquery, but here's my best shot:
select distinct a.customer_id, date
, sum(sales) over (partition by a.customer_ID, date)
, sum(case when date between mindate and dateadd(DD, 7, mindate)
then Sales end)
over (partition by a.customer_id) as FirstWeekSales
from transaction_table a
left join
(select customer_ID, min(date) as mindate
from transaction_table group by customer_ID) b
on a.customer_ID = b.customer_ID