BigQuery - date_trunc in a window function can't be grouped by - SQL

I am trying to use date_trunc() on a date inside a window function in BigQuery. I used to do this previously in Snowflake and everything went smoothly. Unfortunately, BigQuery tells me that the full date needs to be in the GROUP BY, which defeats the purpose of using date_trunc. I want to group by "year-month" and customer_id and give every customer a "rank" based on their orders per "year-month". Here's an example of my script:
select
  id as customer_id,
  date_trunc(date, month) as date,
  count(1) as orders,
  row_number() over (partition by date_trunc(date, month) order by count(1) desc) as customer_order
from table
group by 1, 2
And I get this error:
PARTITION BY expression references column date which is neither grouped nor aggregated
Does anyone know how to prevent this problem in an elegant manner? I know I could use a subquery / CTE to fix it (a sketch follows below), but I'm curious to understand why BigQuery prevents this operation.
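For reference, a minimal sketch of that CTE workaround, assuming the same placeholder table and column names as the script above: aggregate to one row per customer and month first, then rank on top of the aggregated rows.
-- aggregate first, then rank: the window function only sees already-grouped columns
with monthly as (
  select
    id as customer_id,
    date_trunc(date, month) as order_month,
    count(1) as orders
  from table
  group by 1, 2
)
select
  customer_id,
  order_month,
  orders,
  row_number() over (partition by order_month order by orders desc) as customer_order
from monthly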

Related

What else do I need to add to my SQL query to bring related information into other columns when using MIN() and GROUP BY

There is a table with the following column headers: indi_cod, ries_cod, date, time and level. Each ries_cod contains more than one indi_cod, and these indi_cod are random consecutive numbers.
Which SQL query would be appropriate to build if the aim is to find the smallest ID of each ries_cod, and at the same time bring its related information corresponding to date, time and level?
I tried the following query:
SELECT MIN(indi_cod) AS min_indi_cod
FROM `my-project-01-354113.indi_cod.second_step`
GROUP BY ries_cod
ORDER BY ries_cod
And, indeed, it presented me with the minimum value of indi_cod for each group of ries_cod, but I couldn't write the appropriate query to bring me the information from the date, time and level columns corresponding to each indi_cod.
I usually use some kind of ranking for this type of thing. You can use row_number, rank, or dense_rank depending on your RDBMS. Here is an example:
with t as (
  select a.*,
    row_number() over (partition by ries_cod order by indi_cod) as rn
  from mytable a
)
select * from t where rn = 1
In addition, if you are using Oracle you can do this without two queries by using KEEP (a sketch follows after the link).
https://renenyffenegger.ch/notes/development/databases/SQL/select/group-by/keep-dense_rank/index
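For illustration, a minimal sketch of that KEEP (DENSE_RANK FIRST) approach, assuming an Oracle table with the same columns; since date and level are reserved words in Oracle, the sketch uses date_col, time_col and level_col as stand-in column names.
-- for each ries_cod, take the related values from the row with the smallest indi_cod
SELECT ries_cod,
  MIN(indi_cod) AS min_indi_cod,
  MIN(date_col) KEEP (DENSE_RANK FIRST ORDER BY indi_cod) AS date_col,
  MIN(time_col) KEEP (DENSE_RANK FIRST ORDER BY indi_cod) AS time_col,
  MIN(level_col) KEEP (DENSE_RANK FIRST ORDER BY indi_cod) AS level_col
FROM mytable
GROUP BY ries_cod
ORDER BY ries_cod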
I think you just need to group by the other columns as well:
SELECT MIN(indi_cod) AS min_indi_cod, ries_cod, date, time, level
FROM mytable
GROUP BY ries_cod, date, time, level
ORDER BY ries_cod

How to group by SQL data in a sub-query to show evolution

I have an SQLite database which contains URLs and the number of times each has been visited per week. It's stored this way:
uid, url, date, nb_visits,...
I would like to get every single URL with the evolution of the number of visits grouped by date.
Something that could look like:
"my_url_1","45,54,76,36,78"
Here my assumption is that there are 5 dates stored in the db; I don't need the dates themselves, I just want the visit counts ordered from old to recent.
I tried something like this, but it doesn't accept two fields in the second select:
select url, visits,
(select sum(visits) from data d2 where d2.url=d1.url group by date) as evol
from data d1
where date = (select max(date) from data);
This query isn't working; I just wanted to share what I'm trying to do. Thanks for the help.
What you want is the result of GROUP_CONCAT() for each url.
The problem with the aggregate function GROUP_CONCAT() is that it does not support an ORDER BY clause, which you need to sort the visits by date, so the result that you would get is not guaranteed to be correct.
In SQLite there is also a GROUP_CONCAT() window function which supports an ORDER BY clause and can be used to return the result that you want:
SELECT DISTINCT url,
GROUP_CONCAT(visits) OVER (
PARTITION BY url
ORDER BY date
ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING
) visits
FROM data
Are you looking for group_concat()?
select url, group_concat(visits)
from (select d.*
from data d
order by url, date
) d
group by url;
Note: SQLite doesn't support an order by as part of the group_concat() syntax. In theory, sorting in the subquery should have the desired effect of ordering the dates.

Creating a partitioned table from a query in BigQuery does not yield the same result as without partitioning

When creating a table, let's say "orders", with partitioning in the following way, my result gets truncated compared to creating it without partitioning (commenting and uncommenting lines 5 and 6 of the statement below).
I suspect that it might have something to do with BigQuery limits, but I can't figure out which one. The ts is a timestamp field and order_id is a UUID string.
I.e. the count distinct on the last line will yield very different results: when partitioned, it returns far fewer order_ids than without partitioning.
DROP TABLE IF EXISTS
`project.dataset.orders`;
CREATE OR REPLACE TABLE
`project.dataset.orders`
-- PARTITION BY
-- DATE(ts)
AS
SELECT
ts,
order_id,
SUM(order_value) AS order_value
FROM
`project.dataset.raw_orders`
GROUP BY
1, 2;
SELECT COUNT(DISTINCT order_id) FROM `project.dataset.orders`;
(This is not a valid 'answer'; I just need a better place to write SQL than the comment box. I don't mind if a moderator converts this answer into a comment AFTER it serves its purpose.)
What number do you get if you run the query below, and which one does it align with (partitioned or non-partitioned)?
SELECT COUNT(DISTINCT order_id) FROM (
SELECT
ts,
order_id,
SUM(order_value) AS order_value
FROM
`project.dataset.raw_orders`
GROUP BY
1, 2
) t;
It turns out that there's a 60-day partition expiration!
https://cloud.google.com/bigquery/docs/managing-partitioned-tables#partition-expiration
So by updating the partition expiration I could get the full range.
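For reference, a hedged sketch of how the expiration can be adjusted after the fact, assuming the table name from the question; partition_expiration_days is the relevant table option, and setting it to NULL removes the expiration (a larger number of days would also work).
-- remove the partition expiration so older partitions are no longer dropped
ALTER TABLE `project.dataset.orders`
SET OPTIONS (partition_expiration_days = NULL);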

Creating a table report with WTD

How can I create a table report with WTD (week-to-date) totals integrated inside the report?
Some options I could think of are creating a stored procedure that returns a temp table, where a loop inside the SP inserts the WTD totals for each week, or doing it in the reporting service. So far no luck with those.
You can use grouping sets and order by. You don't show what your data looks like, but the idea is:
select date, sum(sales), sum(orders)
from t
group by grouping sets ( (date), (year(date), datepart(week, date)) )
order by max(date), grouping(date);
Note: This leaves the "WTD" out, because that is a string and you seem to want to put it in a date column.
You can convert the date to a string and use coalesce() (or case logic using grouping()):
select coalesce(convert(varchar(255), date), 'WTD'),
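Putting the two pieces together, a rough sketch of the full query under the same assumptions as above (SQL Server syntax, table t with date, sales and orders columns):
select coalesce(convert(varchar(255), date), 'WTD') as report_date,
  sum(sales) as sales,
  sum(orders) as orders
from t
group by grouping sets ( (date), (year(date), datepart(week, date)) )
order by max(date), grouping(date);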

date_trunc per day returns multiple rows for the same date

I am using date_trunc in order to count events per day. I have a subquery that I use date_trunc on. The problem is that the query returns multiple rows for the same date. Any ideas?
select
date_trunc('day',date_) date_,
count(download),
count(subscribe)
from
(select
min(users.redshifted_at) date_,
users.id_for_vendor download,
subs.id_for_vendor subscribe
from Facetune2_device_info_log users
left join Facetune2_usage_store_user_subscribed subs
on users.id_for_vendor=subs.id_for_vendor
group by users.id_for_vendor,subs.id_for_vendor) b
group by date_
order by date_
date_ is confusing, because it is both a column and an alias. Columns get resolved first. So this should fix your problem:
group by date_trunc('day', date_)
You can also fix it by using a different alias name, one not already used for a column, for example:
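A sketch of that second option, keeping the query from the question but renaming the outer alias (day_ is just an illustrative name; since it is not a column of the subquery, the group by resolves to the alias):
select
  date_trunc('day', date_) as day_, -- the alias no longer collides with the inner column
  count(download),
  count(subscribe)
from
  (select
    min(users.redshifted_at) date_,
    users.id_for_vendor download,
    subs.id_for_vendor subscribe
  from Facetune2_device_info_log users
  left join Facetune2_usage_store_user_subscribed subs
    on users.id_for_vendor = subs.id_for_vendor
  group by users.id_for_vendor, subs.id_for_vendor) b
group by day_
order by day_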