BigQuery - Select a column without including it in the GROUP BY clause - google-bigquery

I have daily tables of Google Analytics data, split by device_category (desktop/mobile/tablet) and user_type (new user/returning user).
My requirement is to query for the top-performing product in the month and just know the type of device and user. I do not want to group by device_category or user_type.
When I exclude them from the GROUP BY clause, the query gives an error: "Query error: SELECT list expression references column device_category which is neither grouped nor aggregated at [3:21]"
QUERY THAT DOES NOT WORK (this is my requirement)
SELECT
month,
year,
device_category,
user_type,
product_name,
round(sum(item_revenue),2) as item_revenue
FROM
`ProjectName.DatasetName.GA_REPORT_3_*`
where
_table_suffix between '20201101' and '20210131'
and channel_grouping = 'Organic Search'
group by
month,
year,
channel_grouping,
product_name
order by
item_revenue desc;
QUERY THAT WORKS
SELECT
month,
year,
device_category,
user_type,
product_name,
round(sum(item_revenue),2) as item_revenue
FROM
`ProjectName.DatasetName.GA_REPORT_3_*`
where
_table_suffix between '20201101' and '20210131'
and channel_grouping = 'Organic Search'
group by
month,
year,
channel_grouping,
product_name,
device_category,
user_type
order by
item_revenue desc;
I know that in regular SQL workbenches we can select a column that is not in the GROUP BY clause, but the same does not work for my issue on BigQuery.
Could you help me with a workaround for this?

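Technically, you can wrap device_category and user_type in ANY_VALUE, MAX, or MIN. Each of these returns a single value per group (ANY_VALUE an arbitrary one), not necessarily the value from the highest-revenue row: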
SELECT
month,
year,
ANY_VALUE(device_category) AS device_category,
ANY_VALUE(user_type) AS user_type,
product_name,
round(sum(item_revenue),2) as item_revenue
FROM
`ProjectName.DatasetName.GA_REPORT_3_*`
where
_table_suffix between '20201101' and '20210131'
and channel_grouping = 'Organic Search'
group by
month,
year,
channel_grouping,
product_name
order by
item_revenue desc;
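
To see what wrapping a column in an aggregate actually returns, here is a minimal sketch that runs the same query shape against an in-memory SQLite database from Python. SQLite stands in for BigQuery, MAX stands in for ANY_VALUE, and the ga_report table and its rows are made up for illustration.

```python
import sqlite3

# Hypothetical stand-in for the GA_REPORT_3_* tables (names and rows are assumptions).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE ga_report ("
    "product_name TEXT, device_category TEXT, user_type TEXT, item_revenue REAL)"
)
conn.executemany(
    "INSERT INTO ga_report VALUES (?, ?, ?, ?)",
    [
        ("shoes", "desktop", "new user", 10.0),
        ("shoes", "mobile", "returning user", 5.5),
        ("hat", "tablet", "new user", 3.0),
    ],
)

# Group only by product; the other two columns are collapsed by an aggregate.
rows = conn.execute(
    """
    SELECT product_name,
           MAX(device_category) AS device_category,
           MAX(user_type)       AS user_type,
           ROUND(SUM(item_revenue), 2) AS total_revenue
    FROM ga_report
    GROUP BY product_name
    ORDER BY total_revenue DESC
    """
).fetchall()
print(rows)
```

In this sample, shoes comes back with mobile and returning user because MAX picks the alphabetically largest value in each group, not the segment that earned the most revenue.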

You can use a subquery to achieve this:
SELECT
x.month,
x.year,
x.device_category,
x.user_type,
x.product_name,
ROUND(SUM(x.item_revenue),2) as item_revenue
FROM
(SELECT
month,
year,
device_category,
user_type,
product_name,
item_revenue
FROM `ProjectName.DatasetName.GA_REPORT_3_*`
WHERE _table_suffix BETWEEN '20201101' AND '20210131'
AND channel_grouping = 'Organic Search'
) x
GROUP BY
x.month,
x.year,
x.product_name,
x.device_category,
x.user_type
ORDER BY ROUND(SUM(x.item_revenue),2) DESC;
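
If the aim is to report each product's total revenue together with the device_category and user_type of its highest-revenue segment, one possible approach is to aggregate per segment first and then keep the top-ranked segment per product with ROW_NUMBER. The sketch below is an assumption about the intent, not something stated in the question. It uses Python's sqlite3 as a stand-in for BigQuery, and the ga_report table and rows are hypothetical.

```python
import sqlite3

# Hypothetical sample data; window functions need SQLite 3.25 or newer.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE ga_report ("
    "product_name TEXT, device_category TEXT, user_type TEXT, item_revenue REAL)"
)
conn.executemany(
    "INSERT INTO ga_report VALUES (?, ?, ?, ?)",
    [
        ("shoes", "desktop", "new user", 10.0),
        ("shoes", "mobile", "returning user", 5.5),
        ("shoes", "desktop", "new user", 2.0),
        ("hat", "tablet", "new user", 3.0),
    ],
)

top_rows = conn.execute(
    """
    WITH per_segment AS (
        -- Revenue per product and (device, user type) segment.
        SELECT product_name, device_category, user_type,
               SUM(item_revenue) AS segment_revenue
        FROM ga_report
        GROUP BY product_name, device_category, user_type
    ),
    ranked AS (
        -- Product total plus a rank of segments within each product.
        SELECT *,
               SUM(segment_revenue) OVER (PARTITION BY product_name) AS product_revenue,
               ROW_NUMBER() OVER (
                   PARTITION BY product_name ORDER BY segment_revenue DESC
               ) AS rn
        FROM per_segment
    )
    SELECT product_name, device_category, user_type,
           ROUND(product_revenue, 2) AS total_revenue
    FROM ranked
    WHERE rn = 1
    ORDER BY total_revenue DESC
    """
).fetchall()
print(top_rows)
```

Each product appears once, with its total revenue and the segment that contributed the most to it.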

Related

How to aggregate rows on BigQuery

I need to group my dataset by year so that I can see the total number of login_log_id values for each year (BigQuery).
SELECT login_log_id,
DATE(login_time) as login_date,
EXTRACT(YEAR FROM login_time) as login_year,
TIME(login_time) as login_time,
FROM `steel-time-347714.flex.logs`
GROUP BY login_log_id
I want to group by year so that I can see the total number of login_log_id values generated in each year.
My columns are login_log_id and login_time.
I am getting the following error:
SELECT list expression references column login_time which is neither grouped nor aggregated at [2:6]
The error occurs because every column you refer to in the SELECT list needs to be either aggregated or listed in the GROUP BY.
If you want the total logins by year, you can do:
SELECT
EXTRACT(YEAR FROM login_time) as login_year,
COUNT(1) as total_logins,
COUNT(DISTINCT login_log_id) as total_unique_logins
FROM `steel-time-347714.flex.logs`
GROUP BY login_year
But if you want the total by login_log_id and year:
SELECT
login_log_id,
EXTRACT(YEAR FROM login_time) as login_year,
COUNT(1) as total_logins
FROM `steel-time-347714.flex.logs`
GROUP BY login_log_id, login_year
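
As a quick check of the per-year query, here is a small sketch using Python's sqlite3 in place of BigQuery. strftime('%Y', ...) plays the role of EXTRACT(YEAR FROM ...), and the rows, including the deliberately duplicated login_log_id, are invented for illustration.

```python
import sqlite3

# Hypothetical log rows; login_time is stored as ISO-8601 text for SQLite.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE logs (login_log_id INTEGER, login_time TEXT)")
conn.executemany(
    "INSERT INTO logs VALUES (?, ?)",
    [
        (1, "2021-03-01 10:00:00"),
        (2, "2021-07-15 09:30:00"),
        (3, "2022-01-02 08:00:00"),
        (3, "2022-01-02 08:00:00"),  # duplicate id to show COUNT vs COUNT(DISTINCT)
    ],
)

yearly = conn.execute(
    """
    SELECT CAST(strftime('%Y', login_time) AS INTEGER) AS login_year,
           COUNT(*)                     AS total_logins,
           COUNT(DISTINCT login_log_id) AS total_unique_logins
    FROM logs
    GROUP BY login_year
    ORDER BY login_year
    """
).fetchall()
print(yearly)
```

Only the grouped expression and aggregates appear in the SELECT list, which is what avoids the "neither grouped nor aggregated" error.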

First users by categories in BigQuery

How can I count the new and existing users by categories and years?
For instance, during 2015-2020, if someone first bought a product in category_A in 2016, they will be counted as a new user in 2016 in category_A, even though this user bought a product in category_B in 2015.
Table_1 (Columns: product_name, date, category, sales, user_id)
I want to get the result as below
One approach uses two levels of aggregation:
select extract(year from mindate) yr, category, count(*) num_new
from (
select user_id, category, min(date) mindate
from table_1
group by user_id, category
) t
group by extract(year from mindate), category
The subquery retrieves the first purchase date of each user by category. Then, the outer query aggregates by the year of that date.
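
The two-level aggregation can be tried out with a small sketch in Python's sqlite3, used here as a stand-in for BigQuery. The sample rows are hypothetical, dates are stored as ISO text, and the outer query groups by both year and category.

```python
import sqlite3

# Hypothetical purchases: (user_id, category, date).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE table_1 (user_id INTEGER, category TEXT, date TEXT)")
conn.executemany(
    "INSERT INTO table_1 VALUES (?, ?, ?)",
    [
        (1, "A", "2015-03-01"),
        (1, "A", "2016-05-01"),  # repeat purchase, not new in 2016
        (1, "B", "2016-02-01"),  # first purchase in B, so new in B for 2016
        (2, "A", "2016-07-01"),
        (3, "B", "2015-01-10"),
        (4, "A", "2016-01-01"),
    ],
)

new_users = conn.execute(
    """
    SELECT CAST(strftime('%Y', mindate) AS INTEGER) AS yr,
           category,
           COUNT(*) AS num_new
    FROM (
        -- First purchase date of each user within each category.
        SELECT user_id, category, MIN(date) AS mindate
        FROM table_1
        GROUP BY user_id, category
    ) t
    GROUP BY yr, category
    ORDER BY yr, category
    """
).fetchall()
print(new_users)
```

User 1 counts as new in category A for 2015 and as new in category B for 2016, which matches the rule described in the question.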
If you want the count of current users as well, then it is a bit different. You can use a window function in the subquery rather than aggregation, then count distinct values in the outer query:
select extract(year from date) yr, category,
count(distinct if(date = mindate, user_id, null)) num_new,
count(distinct user_id) num_total
from (
select date, user_id, category, min(date) over(partition by user_id, category) mindate
from table_1
) t
group by extract(year from date), category
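
A runnable sketch of the window-function variant, again using Python's sqlite3 as a stand-in for BigQuery. It assumes that each row is attributed to the year of its own purchase date, and that a user is new in the year containing their first purchase in that category. The rows are hypothetical, and a CASE expression inside COUNT(DISTINCT ...) does the conditional counting.

```python
import sqlite3

# Hypothetical purchases: (user_id, category, date); needs SQLite 3.25+ for OVER().
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE table_1 (user_id INTEGER, category TEXT, date TEXT)")
conn.executemany(
    "INSERT INTO table_1 VALUES (?, ?, ?)",
    [
        (1, "A", "2015-03-01"),
        (1, "A", "2016-05-01"),
        (1, "B", "2016-02-01"),
        (2, "A", "2016-07-01"),
        (3, "B", "2015-01-10"),
        (4, "A", "2016-01-01"),
    ],
)

counts = conn.execute(
    """
    SELECT CAST(strftime('%Y', date) AS INTEGER) AS yr,
           category,
           -- Users whose first purchase in the category falls on this row.
           COUNT(DISTINCT CASE WHEN date = mindate THEN user_id END) AS num_new,
           COUNT(DISTINCT user_id) AS num_total
    FROM (
        SELECT date, user_id, category,
               MIN(date) OVER (PARTITION BY user_id, category) AS mindate
        FROM table_1
    ) t
    GROUP BY yr, category
    ORDER BY yr, category
    """
).fetchall()
print(counts)
```

In 2016, category A has three distinct buyers, two of them new; the existing-user count is the difference between the two columns.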
Below is for BigQuery Standard SQL
#standardSQL
WITH temp AS (
SELECT *,
0 = COUNT(1) OVER(
PARTITION BY user_id, category
ORDER BY date
ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING
) new_user
FROM `project.dataset.table_1`
ORDER BY date, user_id
)
SELECT EXTRACT(YEAR FROM date) AS year,
category,
COUNT(DISTINCT IF(new_user, user_id, NULL)) AS num_new,
COUNT(DISTINCT IF(new_user, NULL, user_id)) AS num_existing
FROM temp
GROUP BY year, category

First user by category

How can I count, by year, the new users in each category, meaning those who bought in that category for the first time? For instance, for 2015-2020 by year, if someone bought for the first time in 2015, they will be counted as a new user in 2015 but not in 2016-2020.
Table_1 (Columns: product_name, date, category, sales, user_id)
I want to get the result as below
You’ll want to start with a subquery to get the first date each user purchased in the category. This is a pretty straightforward group by problem:
select
user_id,
category,
min(date) as first_category_purchase
from my_table
group by user_id, category;
Next, you can use Postgres’s date_trunc function to group by year and category, using your first query as a subquery:
select
category,
date_trunc('year', first_category_purchase) as first_year,
count(*) as num_new_users
from (
select
user_id,
category,
min(date) as first_category_purchase
from my_table
group by user_id, category
) a
group by 1, 2;
In Postgres, one method is group by after a distinct on:
select date, count(*) as num_new_users
from (select distinct on (user_id, category) t.*
from t
order by user_id, category, date asc
) d
group by date
order by date;
If date is really a date and not a year, then you need something like to_char() or date_trunc() to convert it to a year.

Unique values per time period

In my table trips , I have two columns: created_at and user_id
My goal is to count unique user_ids per month with a query in Postgres. So far, I have written this, but it returns an error:
SELECT user_id,
to_char(created_at, 'YYYY-MM') as t COUNT(*)
FROM (SELECT DISTINCT user_id
FROM trips) group by t;
How should I change this query?
The query is much simpler than that:
SELECT to_char(created_at, 'YYYY-MM') as yyyymm, COUNT(DISTINCT user_id)
FROM trips
GROUP BY yyyymm
ORDER BY yyyymm;
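
A minimal sketch of the same query shape, run through Python's sqlite3 instead of Postgres. strftime('%Y-%m', ...) takes the place of to_char(created_at, 'YYYY-MM'), and the trips rows are hypothetical.

```python
import sqlite3

# Hypothetical trips: user 1 rides twice in January and once in February.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE trips (created_at TEXT, user_id INTEGER)")
conn.executemany(
    "INSERT INTO trips VALUES (?, ?)",
    [
        ("2021-01-05", 1),
        ("2021-01-20", 1),
        ("2021-01-21", 2),
        ("2021-02-02", 1),
    ],
)

monthly = conn.execute(
    """
    SELECT strftime('%Y-%m', created_at) AS yyyymm,
           COUNT(DISTINCT user_id) AS unique_users
    FROM trips
    GROUP BY yyyymm
    ORDER BY yyyymm
    """
).fetchall()
print(monthly)
```

COUNT(DISTINCT user_id) counts user 1 once for January even though they have two trips that month.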

SQL: Dividing daily data by a monthly index

I have daily transaction data that is a product of this query:
SELECT transaction_date ,
Merchant,
Amount
into transaction.table
FROM source.table
WHERE (DESCRIPTION iLIKE '%Criteria%')
The field transaction_date is in the format of DATE (yyyy-MM-dd).
What I would like to do is take each row/transaction in transaction.table and divide Amount by a value tied to its RESPECTIVE month (this is key) contained in a separate table called Calendar.
The separate table called Calendar is queried from the same source.table as below:
select month,count(*) as distinct_month
into source.Calendar
from
(
select Population, to_char(optimized_transaction_date, 'YYYY-MM') as month
FROM source.table
group by Population, to_char(optimized_transaction_date, 'YYYY-MM')
)
group by month
My goal is to get a value for each day: Amount / distinct_month.
The key part is matching the daily data (transaction_date) in the first query with the monthly data in the second query (month).
Note that month from the second query is a varchar whereas transaction_date in the first query is a DATE.
I think you want something like this:
SELECT transaction_date, Merchant, Amount, newval
FROM (SELECT transaction_date, Merchant, Amount, Description,
(Amount / count(distinct population) over (partition by to_char(transaction_date, 'YYYY-MM'))
) as newval
FROM source.table
) t
WHERE DESCRIPTION iLIKE '%Criteria%';
You only need the subquery because the total is calculated over all the data, without the filter condition.
EDIT:
Oops, I forgot that Postgres doesn't support COUNT(DISTINCT) as a window function. So do:
SELECT transaction_date, Merchant, Amount, newval
FROM (SELECT t.*,
(Amount / SUM( (seqnum = 1)::int) OVER (partition by to_char(transaction_date, 'YYYY-MM') )
) as newval
FROM (SELECT t.*,
ROW_NUMBER() OVER (PARTITION BY to_char(transaction_date, 'YYYY-MM'), population ORDER BY population) as seqnum
FROM source.table t
) t
) t
WHERE DESCRIPTION iLIKE '%Criteria%';
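
The ROW_NUMBER workaround can be checked with a small sketch in Python's sqlite3, used here in place of Postgres. The table name source_table and all rows are hypothetical, a CASE expression replaces the (seqnum = 1)::int cast, and SQLite's LIKE (case-insensitive for ASCII by default) stands in for iLIKE.

```python
import sqlite3

# Hypothetical source rows; needs SQLite 3.25+ for window functions.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE source_table ("
    "transaction_date TEXT, merchant TEXT, amount REAL, "
    "population TEXT, description TEXT)"
)
conn.executemany(
    "INSERT INTO source_table VALUES (?, ?, ?, ?, ?)",
    [
        ("2021-01-05", "m1", 100.0, "p1", "Criteria x"),
        ("2021-01-09", "m2", 50.0, "p2", "other"),
        ("2021-01-20", "m3", 30.0, "p1", "Criteria y"),
        ("2021-02-03", "m1", 80.0, "p1", "Criteria z"),
    ],
)

result = conn.execute(
    """
    SELECT transaction_date, merchant, amount, newval
    FROM (
        -- Summing the "first row" flags per month gives the distinct population count.
        SELECT s.*,
               amount / SUM(CASE WHEN seqnum = 1 THEN 1 ELSE 0 END)
                            OVER (PARTITION BY month) AS newval
        FROM (
            -- seqnum = 1 marks the first row of each (month, population) pair.
            SELECT t.*,
                   strftime('%Y-%m', transaction_date) AS month,
                   ROW_NUMBER() OVER (
                       PARTITION BY strftime('%Y-%m', transaction_date), population
                       ORDER BY transaction_date
                   ) AS seqnum
            FROM source_table t
        ) s
    ) r
    WHERE description LIKE '%Criteria%'
    ORDER BY transaction_date
    """
).fetchall()
print(result)
```

January has two distinct populations (p1 and p2), so its amounts are halved. The row that fails the description filter still contributes to that count because the filter is applied only in the outermost query.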