Group by date but not include time - Oracle SQL - sql

I'm trying to use listagg to group categories by date, but the field is date-time. Categories are appearing on separate lines. Is it possible to group by date only? I've tried CAST as well as DATE in the group by, but it's still not working. Here's the base query:
select ACCOUNT,
ID,
NAME,
TERM,
listagg(CATEGORY, ', ') within group (order by CATEGORY) as cat_by_date,
trunc(TRANSACTION_DATE) short_date
from TABLE
where term= '2022'
and CATEGORY in ('T', 'H', 'P')
group by
ACCOUNT_UID,
ID,
NAME,
TERM,
TRANSACTION_DATE
order by 1

Use TRUNC in the GROUP BY as well, so that it matches the short_date expression in the SELECT list:
...
group by
ACCOUNT_UID,
ID,
NAME,
TERM,
TRUNC(TRANSACTION_DATE)
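To see why the grouping expression matters, here is a minimal sketch that can be run without an Oracle instance. It uses Python's built-in sqlite3 module as a stand-in, so the names are assumptions: the `txn` table and its rows are invented, SQLite's `date()` plays the role of Oracle's `TRUNC()`, and `group_concat` plays the role of `LISTAGG`.

```python
import sqlite3

# Hypothetical table and rows, invented for this illustration only.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE txn (account_uid TEXT, category TEXT, transaction_date TEXT)"
)
conn.executemany(
    "INSERT INTO txn VALUES (?, ?, ?)",
    [
        ("A1", "T", "2022-03-01 09:15:00"),
        ("A1", "H", "2022-03-01 17:40:00"),
        ("A1", "P", "2022-03-02 08:00:00"),
    ],
)

# Grouping on the raw timestamp: every distinct time forms its own group.
by_timestamp = conn.execute(
    "SELECT group_concat(category, ', ') FROM txn "
    "GROUP BY account_uid, transaction_date"
).fetchall()

# Grouping on the date part only: date() stands in for Oracle's TRUNC().
by_day = conn.execute(
    "SELECT date(transaction_date) AS short_date, "
    "       group_concat(category, ', ') AS cat_by_date "
    "FROM txn "
    "GROUP BY account_uid, date(transaction_date) "
    "ORDER BY short_date"
).fetchall()

print(len(by_timestamp), "groups when grouping by timestamp")
print(by_day)
```

The two categories recorded on the same day only end up in one row when the GROUP BY uses the same date-only expression as the SELECT list.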

Related

Bigquery - Select a column with not grouping them in group by clause

I'm having day-wise tables with google analytics data that is split based on device_category(desktop/mobile/tablet) and user_type(new user/returning user).
My requirement is, to query for the top-performing product in the month and just know the type of device and user. I do not want to group them based on device_category, user_type.
When I exclude them from the GROUP BY, the query gives an error saying - "Query error: SELECT list expression references column device_category which is neither grouped nor aggregated at [3:21]"
QUERY THAT DOES NOT WORK(this is my requirement)
SELECT
month,
year,
device_category,
user_type,
product_name,
round(sum(item_revenue),2) as item_revenue
FROM
`ProjectName.DatasetName.GA_REPORT_3_*`
where
_table_suffix between '20201101' and '20210131'
and channel_grouping = 'Organic Search'
group by
month,
year,
channel_grouping,
product_name
order by
item_revenue desc;
QUERY THAT WORKS
SELECT
month,
year,
device_category,
user_type,
product_name,
round(sum(item_revenue),2) as item_revenue
FROM
`ProjectName.DatasetName.GA_REPORT_3_*`
where
_table_suffix between '20201101' and '20210131'
and channel_grouping = 'Organic Search'
group by
month,
year,
channel_grouping,
product_name,
device_category,
user_type
order by
item_revenue desc;
I know that some SQL workbenches let you select a column that is not in the GROUP BY clause, but the same does not work for my issue on BigQuery.
Could you help me with a workaround for this?
Technically, you can wrap device_category and user_type in ANY_VALUE, MAX, or MIN:
SELECT
month,
year,
ANY_VALUE(device_category) as device_category,
ANY_VALUE(user_type) as user_type,
product_name,
round(sum(item_revenue),2) as item_revenue
FROM
`ProjectName.DatasetName.GA_REPORT_3_*`
where
_table_suffix between '20201101' and '20210131'
and channel_grouping = 'Organic Search'
group by
month,
year,
channel_grouping,
product_name
order by
item_revenue desc;
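A caveat worth checking before relying on this: the wrapped column is just some value from the group, not necessarily the one that contributed the most revenue. The sketch below illustrates that with Python's sqlite3 module standing in for BigQuery. The `ga_report` table and its rows are invented, and `MIN` is used because SQLite may not provide `ANY_VALUE`.

```python
import sqlite3

# Hypothetical miniature of the GA report table, invented for illustration.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE ga_report "
    "(product_name TEXT, device_category TEXT, item_revenue REAL)"
)
conn.executemany(
    "INSERT INTO ga_report VALUES (?, ?, ?)",
    [
        ("Mug", "desktop", 10.0),
        ("Mug", "mobile", 90.0),
        ("Pen", "tablet", 5.0),
    ],
)

# One row per product; device_category is aggregated rather than grouped.
result = conn.execute(
    """
    SELECT product_name,
           MIN(device_category)        AS device_category,
           ROUND(SUM(item_revenue), 2) AS revenue
    FROM ga_report
    GROUP BY product_name
    ORDER BY revenue DESC
    """
).fetchall()

for row in result:
    print(row)
```

Here "Mug" earns most of its revenue on mobile, yet the aggregated device column reports "desktop" because MIN picks the alphabetically smallest value. If the device actually needs to match the top revenue, a different query is required.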
You can use a subquery to achieve this:
SELECT
x.month,
x.year,
x.device_category,
x.user_type,
x.product_name,
ROUND(SUM(x.item_revenue),2) as item_revenue
FROM
(SELECT
month,
year,
device_category,
user_type,
product_name,
item_revenue
FROM `ProjectName.DatasetName.GA_REPORT_3_*`
WHERE _table_suffix BETWEEN '20201101' and '20210131'
AND channel_grouping = 'Organic Search'
) x
GROUP BY
x.month,
x.year,
x.product_name,
x.device_category,
x.user_type
ORDER BY ROUND(SUM(x.item_revenue),2) DESC;

First users by categories in BigQuery

How can I count the new and existing users by categories and years?
For instance, during 2015-2020, if someone first bought a product in category_A in 2016, they are counted as a new user in category_A in 2016, even if they bought a product in category_B in 2015.
Table_1 (Columns: product_name, date, category, sales, user_id)
I want to get a result like the one below.
One approach uses two levels of aggregation:
select extract(year from mindate) yr, category, count(*) num_new
from (
select user_id, category, min(date) mindate
from table_1
group by user_id, category
) t
group by extract(year from mindate), category
The subquery retrieves the first purchase date of each user by category. Then, the outer query aggregates by the year of that date.
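As a quick check of the two-level aggregation, here is a small sketch using Python's sqlite3 module in place of BigQuery. The rows are invented, and `strftime('%Y', ...)` is SQLite's stand-in for `extract(year from ...)`; note that the outer query groups by both the year and the category.

```python
import sqlite3

# Hypothetical purchases, invented for illustration (sales column omitted).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE table_1 (user_id INTEGER, category TEXT, date TEXT)")
conn.executemany(
    "INSERT INTO table_1 VALUES (?, ?, ?)",
    [
        (1, "B", "2015-06-01"),
        (1, "A", "2016-02-10"),  # user 1 is new to category A in 2016
        (1, "A", "2017-05-05"),
        (2, "A", "2016-09-30"),
        (2, "A", "2016-11-11"),
    ],
)

new_users = conn.execute(
    """
    SELECT strftime('%Y', mindate) AS yr, category, COUNT(*) AS num_new
    FROM (
        -- inner level: first purchase date per user and category
        SELECT user_id, category, MIN(date) AS mindate
        FROM table_1
        GROUP BY user_id, category
    ) AS t
    -- outer level: count those first purchases per year and category
    GROUP BY yr, category
    ORDER BY yr, category
    """
).fetchall()

print(new_users)
```

User 1 counts as new in category A in 2016 even though they already bought in category B in 2015, which matches the requirement in the question.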
If you want the count of current users as well, then it is a bit different. You can use a window function in the subquery rather than aggregation, then count distinct values in the outer query:
select extract(year from date) yr, category,
count(distinct if(date = mindate, user_id, null)) num_new,
count(distinct user_id) num_total
from (
select date, user_id, category, min(date) over(partition by user_id, category) mindate
from table_1
) t
group by extract(year from date), category
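The window-function variant can be sketched the same way. This is only an illustration under assumptions: Python's sqlite3 module stands in for BigQuery (window functions need SQLite 3.25 or later), the rows are invented, and the outer query groups by the year of each purchase and by category so that returning users show up in later years.

```python
import sqlite3

# Hypothetical purchases, invented for illustration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE table_1 (user_id INTEGER, category TEXT, date TEXT)")
conn.executemany(
    "INSERT INTO table_1 VALUES (?, ?, ?)",
    [
        (1, "B", "2015-06-01"),
        (1, "A", "2016-02-10"),
        (1, "A", "2017-05-05"),  # user 1 returns to category A in 2017
        (2, "A", "2016-09-30"),
        (2, "A", "2016-11-11"),
    ],
)

counts = conn.execute(
    """
    SELECT strftime('%Y', date) AS yr, category,
           COUNT(DISTINCT CASE WHEN date = mindate THEN user_id END) AS num_new,
           COUNT(DISTINCT user_id) AS num_total
    FROM (
        -- keep every purchase row and attach the first date per user/category
        SELECT date, user_id, category,
               MIN(date) OVER (PARTITION BY user_id, category) AS mindate
        FROM table_1
    ) AS t
    GROUP BY yr, category
    ORDER BY yr, category
    """
).fetchall()

for row in counts:
    print(row)
```

Existing users per year and category are then simply num_total minus num_new.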
Below is for BigQuery Standard SQL
#standardSQL
WITH temp AS (
SELECT *,
0 = COUNT(1) OVER(
PARTITION BY user_id, category
ORDER BY date
ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING
) new_user
FROM `project.dataset.table_1`
ORDER BY date, user_id
)
SELECT EXTRACT(YEAR FROM date) AS year,
category,
COUNT(DISTINCT IF(new_user, user_id, NULL)) AS num_new,
COUNT(DISTINCT IF(new_user, NULL, user_id)) AS num_existing
FROM temp
GROUP BY year, category

First user by category

How can I count, by year, the new users in each category, i.e. those who bought in the category for the first time? For instance, over 2015-2020, someone who first bought in 2015 is counted as a new user in 2015 but not in 2016-2020.
Table_1 (Columns: product_name, date, category, sales, user_id)
I want to get a result like the one below.
You’ll want to start with a sub query to get the first date each user purchased in the category. This is a pretty straightforward group by problem:
select
user_id,
category,
min(date) as first_category_purchase
from my_table
group by user_id, category;
Next, you can use Postgres’s date_trunc function to group by year and category, using your first query as a sub query:
select
category,
date_trunc('year', first_category_purchase) as first_purchase_year,
count(*) as num_new_users
from (
select
user_id,
category,
min(date) as first_category_purchase
from my_table
group by user_id, category
) a
group by 1, 2;
In Postgres, one method is group by after a distinct on:
select date, count(*) as num_new_users
from (select distinct on (user_id, category) t.*
from t
order by user_id, category, date asc
) d
group by date
order by date;
If date is really a date and not a year, then you need something like to_char() or date_trunc() to convert it to a year.

SQL Group By for quarterly dates

transaction_date is in a date format.
What I'm actually trying to output is the COUNT DISTINCT of Unique_ID by quarter (i.e., how many times did a Unique_Id appear in a given quarter).
SELECT transaction_date ,
UNIQUE_ID,
FROM panel
WHERE (some criteria = 'x')
GROUP BY UNIQUE_ID
Try this:
SELECT datepart(quarter,transaction_date),
count(distinct UNIQUE_ID) as cnt
FROM panel
WHERE (some criteria = 'x')
GROUP BY datepart(quarter,transaction_date)
COUNT(DISTINCT ...) can be slow on a large table because the values have to be sorted or hashed. An alternative is to deduplicate the quarter and UNIQUE_ID pairs in a subquery first, then do a plain count:
SELECT p.qtr,
count(p.UNIQUE_ID) as cnt
FROM (select distinct datepart(quarter,transaction_date) as qtr, UNIQUE_ID
from panel
WHERE (some criteria = 'x')) as p
GROUP BY p.qtr
I'd use date_trunc:
select
date_trunc ('quarter', transaction_date), count (distinct unique_id)
from panel
where criteria = 'x'
group by 1
This presupposes that when you say "by quarter" that 1Q2015 is different than 1Q2014.
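The difference between a bare quarter number and a year-aware quarter is easy to check on a toy table. The sketch below uses Python's sqlite3 module as a stand-in, since SQLite has neither `datepart` nor `date_trunc`. The rows are invented, and the `QUARTER` expression (month plus 2, integer-divided by 3) is a helper written for this illustration.

```python
import sqlite3

# Hypothetical panel rows, invented for illustration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE panel (unique_id INTEGER, transaction_date TEXT)")
conn.executemany(
    "INSERT INTO panel VALUES (?, ?)",
    [
        (1, "2014-02-01"),
        (1, "2014-03-15"),
        (2, "2014-02-20"),
        (1, "2015-01-10"),
        (3, "2015-05-05"),
    ],
)

# Quarter number 1..4 derived from the month; integer division in SQLite.
QUARTER = "((CAST(strftime('%m', transaction_date) AS INTEGER) + 2) / 3)"

# Quarter number only: Q1 of 2014 and Q1 of 2015 fall into the same group.
by_quarter = conn.execute(
    f"SELECT {QUARTER} AS qtr, COUNT(DISTINCT unique_id) "
    "FROM panel GROUP BY qtr ORDER BY qtr"
).fetchall()

# Year plus quarter: comparable to truncating the date to the quarter.
by_year_quarter = conn.execute(
    f"SELECT strftime('%Y', transaction_date) || '-Q' || {QUARTER} AS yq, "
    "       COUNT(DISTINCT unique_id) "
    "FROM panel GROUP BY yq ORDER BY yq"
).fetchall()

print(by_quarter)
print(by_year_quarter)
```

Which of the two groupings is right depends on whether the same quarter in different years should be merged.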
SELECT DATEPART(QUARTER, transaction_date),
COUNT(DISTINCT UNIQUE_ID)
FROM panel
GROUP BY DATEPART(QUARTER, transaction_date)

SQL Query Early/Late dates

I am trying to create an SQL view, based on results from the earliest and latest dates. I am aware of the min and max functions but I've not been able to implement it correctly. So far I have:
select distinct
name,
study,
group,
ROUND (TLength * POWER (TWidth, 2) * 0.000523, 3) as Volume,
firstDate as firstDate,
lastDate as lastDate
from
(select
name,
study,
group,
min(operation_time) firstDate,
max(operation_time) lastDate,
MAX(DECODE (ACTIVITY,'length', RESULT_VALUE, NULL)) TLength,
MAX(DECODE (ACTIVITY,'width', RESULT_VALUE,NULL)) TWidth
from mx_all_data_vw
where mx_all_data_vw.study_name like '%MT%'
group by name, study, group);
This gives me a single row for either the earliest or latest date, and two columns with earliest and latest dates.
I want two rows: one containing all data for the earliest date and another containing all data for the latest date, rather than two columns separating the early and late dates.
Thanks.
Simplified for readability:
SELECT *
FROM (
SELECT mx_all_data_vw.*,
ROW_NUMBER() OVER (PARTITION BY name, study, "group" ORDER BY operation_time) rna,
ROW_NUMBER() OVER (PARTITION BY name, study, "group" ORDER BY operation_time DESC) rnd,
DECODE(activity, 'length', result_value, NULL) AS TLength,
DECODE(activity, 'width', result_value, NULL) AS TWidth
FROM mx_all_data_vw
WHERE mx_all_data_vw.study_name like '%MT%'
)
WHERE 1 IN (rna, rnd)
Replace * with the columns and computed expressions you need, such as the Volume calculation from the question.
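The ROW_NUMBER pattern can be tried on a toy table. This is a minimal sketch assuming Python's sqlite3 module as a stand-in for Oracle (window functions need SQLite 3.25 or later); the `measurements` table, its columns, and its rows are invented and much simpler than the view in the question.

```python
import sqlite3

# Hypothetical measurements, invented for illustration.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE measurements (name TEXT, operation_time TEXT, result_value REAL)"
)
conn.executemany(
    "INSERT INTO measurements VALUES (?, ?, ?)",
    [
        ("m1", "2020-01-01", 1.0),
        ("m1", "2020-01-05", 2.0),
        ("m1", "2020-01-09", 3.0),
        ("m2", "2020-02-01", 4.0),  # only one row, so it is both first and last
    ],
)

first_and_last = conn.execute(
    """
    SELECT name, operation_time, result_value
    FROM (
        SELECT m.*,
               -- rna = 1 marks the earliest row, rnd = 1 the latest row
               ROW_NUMBER() OVER (PARTITION BY name ORDER BY operation_time) AS rna,
               ROW_NUMBER() OVER (PARTITION BY name ORDER BY operation_time DESC) AS rnd
        FROM measurements AS m
    )
    WHERE 1 IN (rna, rnd)
    ORDER BY name, operation_time
    """
).fetchall()

for row in first_and_last:
    print(row)
```

A group with a single row is returned once, because that row satisfies both conditions at the same time.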