Problem solving a question with a window function - SQL

For school, I have to answer the following question using a window function.
For each year, month and product category, indicate what percentage of that category's annual turnover the month's turnover represents.
I tried to use a window function, but it didn't work, because I don't know how to use OVER (PARTITION BY ...). This is what I have so far:
select
catcode,
year(besteldatum) as jaar,
month(besteldatum) as maand,
sum(regelomzet) as omzet,
sum(regelomzet) / (
select sum(regelomzet)
from ##joinall t2
where t2.catcode = t1.catcode
and year(t2.besteldatum) = year(t1.besteldatum)
) * 100 as perc
from ##joinall t1
group by catcode, year(besteldatum), month(besteldatum)
order by catcode, year(besteldatum), month(besteldatum)

One thing to realize about window functions is that they are processed after the GROUP BY.
Hence, it's possible to put an aggregate inside a windowed aggregate, i.e. to sum over a sum.
And the PARTITION BY in an OVER clause is similar to a GROUP BY, except that it doesn't collapse the rows.
SELECT
catcode,
year(besteldatum) as jaar,
month(besteldatum) as maand,
sum(regelomzet) as omzet,
cast(
sum(regelomzet) * 100.0 /
SUM(sum(regelomzet)) OVER (PARTITION BY catcode, year(besteldatum))
as decimal(5,2)) as perc
FROM bestellingen t
GROUP BY catcode, year(besteldatum), month(besteldatum)
ORDER BY 1, 2, 3;
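A minimal, runnable sketch of the SUM(SUM(...)) OVER (PARTITION BY ...) pattern, assuming Python's built-in sqlite3 module (SQLite 3.25 or later, for window functions) as a stand-in for SQL Server. The rows are invented, strftime replaces YEAR()/MONTH(), and the table name bestellingen is taken from the answer above:

```python
import sqlite3

# Invented order lines; column names mirror the question (catcode, besteldatum, regelomzet).
rows = [
    ("A", "2023-01-15", 100.0),
    ("A", "2023-01-20", 50.0),
    ("A", "2023-02-03", 150.0),
    ("B", "2023-01-10", 200.0),
    ("B", "2023-03-05", 600.0),
]

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE bestellingen (catcode TEXT, besteldatum TEXT, regelomzet REAL)")
con.executemany("INSERT INTO bestellingen VALUES (?, ?, ?)", rows)

# GROUP BY produces one row per category/year/month; the windowed SUM then
# adds up those monthly sums within each (category, year) partition.
query = """
SELECT catcode,
       strftime('%Y', besteldatum) AS jaar,
       strftime('%m', besteldatum) AS maand,
       SUM(regelomzet) AS omzet,
       ROUND(SUM(regelomzet) * 100.0
             / SUM(SUM(regelomzet)) OVER (PARTITION BY catcode, strftime('%Y', besteldatum)),
             2) AS perc
FROM bestellingen
GROUP BY catcode, jaar, maand
ORDER BY 1, 2, 3
"""
result = con.execute(query).fetchall()
for row in result:
    print(row)
```

Within one category and year, the perc values should add up to roughly 100 (exactly 100 here), which is a quick sanity check for this kind of query.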

Related

Calculating % of COUNT with groupby function in bigquery

I'm running into some issues figuring out how to add an extra column that gives each group's COUNT as a percentage of the total count. The query I have looks like this:
Select
count(*) AS num_rides,
member_casual
FROM `2020_bikeshare_data`
GROUP BY member_casual
ORDER BY num_rides DESC
And it returns this result:
num_rides  member_casual
2134988    member
1341217    casual
What I'd like to do is add a third column that lists the percentage of the total that each membership type makes up:
num_rides  member_casual  perc_tot
2134988    member         61.4%
1341217    casual         38.6%
thoughts?
Use window functions:
SELECT member_casual,
COUNT(*) AS num_rides,
COUNT(*) * 100.0 / SUM(COUNT(*)) OVER () AS perc_tot
FROM `2020_bikeshare_data`
GROUP BY member_casual
ORDER BY num_rides DESC;
No subquery is needed.
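The same idea, sketched with Python's sqlite3 module and a tiny invented table named bikeshare (a hypothetical stand-in for `2020_bikeshare_data`). Multiplying by 100.0 keeps the division in floating point in engines that truncate integer division:

```python
import sqlite3

# Invented sample: three "member" rides and one "casual" ride.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE bikeshare (ride_id INTEGER, member_casual TEXT)")
con.executemany("INSERT INTO bikeshare VALUES (?, ?)",
                [(1, "member"), (2, "member"), (3, "casual"), (4, "member")])

# SUM(COUNT(*)) OVER () sums the per-group counts across all groups,
# i.e. the grand total, without a subquery.
query = """
SELECT member_casual,
       COUNT(*) AS num_rides,
       ROUND(COUNT(*) * 100.0 / SUM(COUNT(*)) OVER (), 1) AS perc_tot
FROM bikeshare
GROUP BY member_casual
ORDER BY num_rides DESC
"""
result = con.execute(query).fetchall()
print(result)
```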
Consider the approach below:
select distinct member_casual,
count(*) over type as num_rides,
round(count(*) over type * 100.0 / count(*) over(), 2) as perc_tot
from `2020_bikeshare_data`
window type as (partition by member_casual)
# order by num_rides desc
If applied to the sample data in your question, this returns num_rides and perc_tot for each member_casual value.
The simplest way is to use a subquery as part of the column expression to calculate your percentage:
select
count(1) as num_rides,
member_casual,
sum(100) / (select sum(1.0) from `2020_bikeshare_data`) as perc_tot -- return as percentage
from
`2020_bikeshare_data`
group by
member_casual
Using the subquery, get the total number of rows and calculate the percentage accordingly.
Select
count(*) AS num_rides,
member_casual,
Concat(cast(round(count(*) * 100 / max(totalRecord), 1) as string), ' %') as perc_tot
FROM (SELECT *, COUNT(*) OVER () as totalRecord FROM `2020_bikeshare_data`)
GROUP BY member_casual
or
Select
count(*) AS num_rides,
member_casual,
Concat(cast(round(count(*) * 100 / (SELECT COUNT(*) FROM `2020_bikeshare_data`), 1) as string), ' %') as perc_tot
FROM `2020_bikeshare_data`
GROUP BY member_casual
In addition to the other answers, you can also break this down into simple SQL (without window functions) by organizing with CTEs.
with
data as (select * from `2020_bikeshare_data`),
total as (select count(*) as ride_count from data),
by_type as (select member_casual, count(*) as ride_count from data group by 1)
select
member_casual,
by_type.ride_count as num_rides,
100 * by_type.ride_count / total.ride_count as perc_tot
from by_type
cross join total
In my opinion, this makes the perc_tot calculation much easier to see.
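A sketch of the CTE version in SQLite via Python's sqlite3 module, with the same hypothetical bikeshare table and invented rows. Unlike BigQuery, where / always returns FLOAT64, SQLite truncates integer / integer, so the sketch multiplies by 100.0 first:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE bikeshare (ride_id INTEGER, member_casual TEXT)")
con.executemany("INSERT INTO bikeshare VALUES (?, ?)",
                [(1, "member"), (2, "member"), (3, "casual"), (4, "member")])

# total has exactly one row, so CROSS JOIN attaches the grand total to every group.
query = """
WITH total AS (SELECT COUNT(*) AS ride_count FROM bikeshare),
     by_type AS (SELECT member_casual, COUNT(*) AS ride_count
                 FROM bikeshare GROUP BY member_casual)
SELECT by_type.member_casual,
       by_type.ride_count AS num_rides,
       -- 100.0 forces real division; integer / integer truncates in SQLite
       100.0 * by_type.ride_count / total.ride_count AS perc_tot
FROM by_type
CROSS JOIN total
ORDER BY num_rides DESC
"""
result = con.execute(query).fetchall()
print(result)
```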

How to add a new column that summarizes rows

I have two issues:
I used the ROLLUP function to add totals per month and year, and I would like to change 'NULL' into grand_total, as in the attached screenshot.
I don't know how to add a new column that sums the values cumulatively, starting from the second row.
Please see the attached screenshots of the result I need to get and of the output of my code below: https://i.stack.imgur.com/6B70o.png and https://i.stack.imgur.com/E2x8K.png
Select Year(Modifieddate) AS Year,
MONTH(modifieddate) as Month,
Sum(linetotal) as Sum_price
from Sales.SalesOrderDetail
Group by rollup( Year(Modifieddate),MONTH(modifieddate))
Thanks in advance,
I think this will work:
Select Year(Modifieddate) AS Year,
coalesce(convert(varchar(255), month(modifieddate)), 'Grand Total') as Month,
Sum(linetotal) as Sum_price,
sum(sum(linetotal)) over (partition by Year(Modifieddate)
order by coalesce(month(modifieddate), 100)
) as ytd_sum_price
from Sales.SalesOrderDetail
Group by rollup( Year(Modifieddate), month(modifieddate))
The coalesce() in the order by is to put the summary row last for the cumulative sum.
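SQLite has no ROLLUP, so this sketch (Python's sqlite3 module, invented rows, hypothetical table name sales) only reproduces the running-total column: SUM(SUM(linetotal)) OVER (PARTITION BY year ORDER BY month) restarts the cumulative sum in each year:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE sales (modifieddate TEXT, linetotal REAL)")
con.executemany("INSERT INTO sales VALUES (?, ?)", [
    ("2013-01-05", 10.0), ("2013-01-20", 5.0),
    ("2013-02-11", 20.0), ("2013-03-02", 30.0),
    ("2014-01-09", 7.0), ("2014-02-14", 3.0),
])

# Monthly sums first (GROUP BY), then a per-year cumulative sum over those sums.
query = """
SELECT CAST(strftime('%Y', modifieddate) AS INTEGER) AS year,
       CAST(strftime('%m', modifieddate) AS INTEGER) AS month,
       SUM(linetotal) AS sum_price,
       SUM(SUM(linetotal)) OVER (
           PARTITION BY strftime('%Y', modifieddate)
           ORDER BY CAST(strftime('%m', modifieddate) AS INTEGER)
       ) AS ytd_sum_price
FROM sales
GROUP BY year, month
ORDER BY year, month
"""
result = con.execute(query).fetchall()
for row in result:
    print(row)
```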
Like this:
Select Year(Modifieddate) AS Year, convert(varchar(255), MONTH(modifieddate)) as Month, Sum(linetotal) as Sum_price
from Sales.SalesOrderDetail
Group by Year(Modifieddate), MONTH(modifieddate)
UNION ALL
Select Year(Modifieddate) AS Year, 'grand_total' as Month, Sum(linetotal) as Sum_price
from Sales.SalesOrderDetail
Group by Year(Modifieddate)
-- SQL SERVER
SELECT t.OrderYear
, CASE WHEN t.OrderMonth IS NULL THEN 'Grand Total' ELSE CAST(t.OrderMonth AS VARCHAR(20)) END b
, t.MonthlySales
, MAX(t.cum_total) cum_total
FROM (SELECT
YEAR(OrderDate) AS OrderYear,
MONTH(OrderDate) AS OrderMonth,
SUM(SubTotal) AS MonthlySales,
SUM(SUM(SubTotal)) OVER (ORDER BY YEAR(OrderDate), MONTH(OrderDate) ROWS UNBOUNDED PRECEDING) cum_total
FROM Sales.SalesOrderHeader
GROUP BY GROUPING SETS ((YEAR(OrderDate), MONTH(OrderDate)))) t
GROUP BY GROUPING SETS ((t.OrderYear
, t.OrderMonth
, t.MonthlySales), t.OrderYear);
Please check this fiddle: https://dbfiddle.uk/?rdbms=sqlserver_2019&sample=adventureworks&fiddle=e6cd2ba8114bd1d86b8c61b1453cafcf
To build on @GordonLinoff's answer: you are really supposed to use the GROUPING() function to check whether a row is a subtotal row for a given grouping column. This behaves better when the grouped columns are nullable, because a genuine NULL value is then not mistaken for a subtotal.
Select case when grouping(Year(Modifieddate)) = 0
then convert(varchar(255), Year(Modifieddate))
else 'Grand Total' end AS Year,
case when grouping(month(modifieddate)) = 0
then convert(varchar(255), month(modifieddate))
else 'Grand Total' end as Month,
Sum(linetotal) as Sum_price,
sum(sum(linetotal)) over (
partition by
grouping(Year(Modifieddate)),
grouping(month(modifieddate)),
Year(Modifieddate)
order by month(modifieddate)
) as ytd_sum_price
from Sales.SalesOrderDetail
Group by rollup( Year(Modifieddate), month(modifieddate));

Getting the difference between two invoices by ranking and subtracting one from the other

I'm trying to get the difference between invoices.
I attempted using CTEs for ranks 1 and 2, but they contain a subquery and I couldn't get that to work.
The query for rank 1 is below; the second query looks the same, but with rank = 2.
select *
from (
SELECT i.id, i.subtotal/100 as subtotal, i.created_at, i.paid_at
,RANK() OVER (PARTITION BY i.subscription_id ORDER BY i.created_at DESC) AS Rank
From Invoices i
) as r
where r.rank = 1
order by r.created_at desc;
Following the path that you are on (using row_number()/rank()), you can use conditional aggregation. Assuming you want the difference of the subtotal, then:
select i.subscription_id,
sum(case when seqnum = 1 then subtotal
else - subtotal
end) as difference
from (select i.subscription_id, i.subtotal/100 as subtotal,
row_number() over (partition by i.subscription_id order by i.created_at desc) as seqnum
from Invoices i
) i
where seqnum in (1, 2)
group by i.subscription_id;
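A runnable sketch of the conditional-aggregation idea using Python's sqlite3 module. The invoices table and its rows are invented, subtotal is assumed to be stored in cents, and grouping by subscription_id (an assumption about what is wanted) gives one difference per subscription. Dividing by 100.0 avoids the truncation that integer / 100 would cause:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("""CREATE TABLE invoices (
    id INTEGER, subscription_id INTEGER, subtotal INTEGER, created_at TEXT)""")
con.executemany("INSERT INTO invoices VALUES (?, ?, ?, ?)", [
    (1, 10, 5000, "2023-01-01"),
    (2, 10, 6500, "2023-02-01"),
    (3, 10, 7000, "2023-03-01"),
    (4, 20, 9000, "2023-01-15"),
    (5, 20, 8000, "2023-02-15"),
])

# seqnum 1 = latest invoice, 2 = the one before it; the CASE adds the latest
# and subtracts the previous, so SUM yields latest - previous per subscription.
query = """
SELECT subscription_id,
       SUM(CASE WHEN seqnum = 1 THEN subtotal ELSE -subtotal END) AS difference
FROM (SELECT subscription_id,
             subtotal / 100.0 AS subtotal,
             ROW_NUMBER() OVER (PARTITION BY subscription_id
                                ORDER BY created_at DESC) AS seqnum
      FROM invoices) AS ranked
WHERE seqnum IN (1, 2)
GROUP BY subscription_id
ORDER BY subscription_id
"""
result = con.execute(query).fetchall()
print(result)
```

A negative difference means the latest invoice is smaller than the previous one.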

Query for both daily aggregate, and then monthly aggregates in the same query?

I would like to count the number of daily unique active users by subreddit and day, and then aggregate these counts into monthly unique active users by subreddit and month. Doing each one individually is simple enough, but when I try to do both in one combined query, BigQuery tells me that I need to group by date_month_day in my second-level subquery, which would make monthly_unique_authors the same as daily_unique_authors (Error: Expression 'date_month_day' is not present in the GROUP BY list [invalidQuery]).
Here is the query I have so far:
SELECT * FROM
(
SELECT *,
(daily_unique_authors/monthly_unique_authors) * 1.0 AS ratio,
ROW_NUMBER() OVER (PARTITION BY date_month_day ORDER BY ratio DESC) rank
FROM
(
SELECT subreddit,
date_month_day,
daily_unique_authors,
SUM(daily_unique_authors) AS monthly_unique_authors,
LEFT(date_month_day, 7) as date_month
FROM
(
SELECT subreddit,
LEFT(DATE(SEC_TO_TIMESTAMP(created_utc)), 10) as date_month_day,
COUNT(UNIQUE(author)) as daily_unique_authors
FROM TABLE_QUERY([fh-bigquery:reddit_comments], "table_id CONTAINS \'20\' AND LENGTH(table_id)<8")
GROUP EACH BY subreddit, date_month_day
)
GROUP EACH BY subreddit, date_month))
WHERE rank <= 100
ORDER BY date_month ASC
The final output should ideally be something like:
   subreddit  date_month  date_month_day  daily_unique_users  monthly_unique_users  ratio
1  google     2005-12     2005-12-29      77                  600                   0.128
2  google     2005-12     2005-12-31      52                  600                   0.086
3  google     2005-12     2005-12-28      81                  600                   0.135
4  google     2005-12     2005-12-27      73                  600                   0.121
Below is for BigQuery Standard SQL
#standardSQL
SELECT * FROM (
SELECT *,
ROW_NUMBER() OVER(PARTITION BY date_month_day ORDER BY ratio DESC) rank
FROM (
SELECT
daily.subreddit subreddit,
daily.date_month date_month,
date_month_day,
daily_unique_authors,
monthly_unique_authors,
1.0 * daily_unique_authors / monthly_unique_authors AS ratio
FROM (
SELECT subreddit,
DATE(TIMESTAMP_SECONDS(created_utc)) AS date_month_day,
FORMAT_DATE('%Y-%m', DATE(TIMESTAMP_SECONDS(created_utc))) AS date_month,
COUNT(DISTINCT author) AS daily_unique_authors
FROM `fh-bigquery.reddit_comments.2018*`
GROUP BY subreddit, date_month_day, date_month
) daily
JOIN (
SELECT subreddit,
FORMAT_DATE('%Y-%m', DATE(TIMESTAMP_SECONDS(created_utc))) AS date_month,
COUNT(DISTINCT author) AS monthly_unique_authors
FROM `fh-bigquery.reddit_comments.2018*`
GROUP BY subreddit, date_month
) monthly
ON daily.subreddit = monthly.subreddit
AND daily.date_month = monthly.date_month
)
)
WHERE rank <= 100
ORDER BY date_month
Note: I tried to keep the original logic and structure from the question as much as possible, so the OP can map the answer back to the question and make further adjustments if needed :o)
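The same join of a daily and a monthly aggregate, sketched in SQLite through Python's sqlite3 module. The comments table and its rows are invented; date(..., 'unixepoch') and strftime('%Y-%m', ..., 'unixepoch') play the role of DATE(TIMESTAMP_SECONDS(...)) and FORMAT_DATE, and the ROW_NUMBER ranking step is left out for brevity:

```python
import sqlite3
from datetime import datetime, timezone

def epoch(y, m, d):
    # Unix seconds for noon UTC on the given day, like Reddit's created_utc
    return int(datetime(y, m, d, 12, tzinfo=timezone.utc).timestamp())

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE comments (subreddit TEXT, author TEXT, created_utc INTEGER)")
con.executemany("INSERT INTO comments VALUES (?, ?, ?)", [
    ("google", "alice", epoch(2005, 12, 27)),
    ("google", "bob", epoch(2005, 12, 27)),
    ("google", "alice", epoch(2005, 12, 27)),  # repeat author, counted once
    ("google", "carol", epoch(2005, 12, 28)),
])

# Daily and monthly distinct counts are computed separately, then joined on
# (subreddit, month), so the monthly count is not forced to the daily grain.
query = """
SELECT d.subreddit, d.date_month, d.date_month_day,
       d.daily_unique_authors, m.monthly_unique_authors,
       ROUND(1.0 * d.daily_unique_authors / m.monthly_unique_authors, 3) AS ratio
FROM (SELECT subreddit,
             date(created_utc, 'unixepoch') AS date_month_day,
             strftime('%Y-%m', created_utc, 'unixepoch') AS date_month,
             COUNT(DISTINCT author) AS daily_unique_authors
      FROM comments
      GROUP BY subreddit, date_month_day, date_month) AS d
JOIN (SELECT subreddit,
             strftime('%Y-%m', created_utc, 'unixepoch') AS date_month,
             COUNT(DISTINCT author) AS monthly_unique_authors
      FROM comments
      GROUP BY subreddit, date_month) AS m
  ON d.subreddit = m.subreddit AND d.date_month = m.date_month
ORDER BY d.date_month_day
"""
result = con.execute(query).fetchall()
for row in result:
    print(row)
```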

SQL Oracle/Aggregation query

I'm trying to run a query on Oracle. I have a table of settled payments for accounts, and a query that pulls through the last three settled amounts, plus any amount that was written off, for any account I need this info for.
However, some of the accounts are weekly based, and for these I would like to aggregate their weekly settlements into their monthly groups. Here is the code I have so far:
SELECT *
FROM (
SELECT *
FROM (
SELECT gwod.account_id,
gwod.charge_period_start,
SUM(gwod.total_due_on_charge) total_due_on_charge,
SUM(gwod.amount_written_off) amount_written_off,
DENSE_RANK() over (PARTITION BY gwod.account_id
ORDER BY charge_period_start DESC) rownumber
FROM report.accounts_write_off gwod
WHERE account_id IN ('account_number')
GROUP BY gwod.account_id,
gwod.charge_period_start
HAVING SUM (gwod.total_due_on_charge) <> 0) t1
WHERE t1.rownumber <=3)
PIVOT (MAX(charge_period_start) charge_period,
MAX(total_due_on_charge) total_due_on_charge,
MAX(amount_written_off) amount_written_off
FOR rownumber IN (1,2,3))
ORDER BY account_id
This works perfectly, but for the weekly-based accounts, rather than pulling through the last three weekly amounts that were settled (i.e. 25-09-17, 18-09-17, 11-09-2017), I'd like to pull through the aggregated payments for September, August, and July.
I hope all this makes sense.
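Simply change your aggregation from the current unit level (i.e., weekly) to the month level with EXTRACT(month ...) in the inner query's SELECT and GROUP BY, as well as in the window's ORDER BY and the PIVOT clause. Note that EXTRACT(month ...) returns only the month number, so the same month from different years is merged and January sorts below December; if the last three months can span a year boundary, TRUNC(gwod.charge_period_start, 'MM') keeps the year as well: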
SELECT *
FROM (
SELECT *
FROM (
SELECT gwod.account_id,
EXTRACT(month FROM gwod.charge_period_start) charge_period_month,
SUM(gwod.total_due_on_charge) total_due_on_charge,
SUM(gwod.amount_written_off) amount_written_off,
DENSE_RANK() over (PARTITION BY gwod.account_id
ORDER BY EXTRACT(month FROM gwod.charge_period_start) DESC) rownumber
FROM report.accounts_write_off gwod
WHERE account_id IN ('account_number')
GROUP BY gwod.account_id,
EXTRACT(month FROM gwod.charge_period_start)
HAVING SUM (gwod.total_due_on_charge) <> 0) t1
WHERE t1.rownumber <=3)
PIVOT (MAX(charge_period_month) charge_period,
MAX(total_due_on_charge) total_due_on_charge,
MAX(amount_written_off) amount_written_off
FOR rownumber IN (1,2,3))
ORDER BY account_id
DEMO (with random data):
http://rextester.com/UJK84858
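A sketch of monthly aggregation that keeps the year, using Python's sqlite3 module as a stand-in for Oracle. strftime('%Y-%m', ...) plays the role of TRUNC(charge_period_start, 'MM'), and because SQLite has no PIVOT, the pivot is written as conditional aggregation (MAX(CASE ...)). The rows are invented and deliberately span a year boundary; amount_written_off would follow the same pattern:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("""CREATE TABLE accounts_write_off (
    account_id TEXT, charge_period_start TEXT,
    total_due_on_charge REAL, amount_written_off REAL)""")
# Invented weekly settlements for one weekly-based account.
con.executemany("INSERT INTO accounts_write_off VALUES (?, ?, ?, ?)", [
    ("acc1", "2016-11-07", 10.0, 0.0),
    ("acc1", "2016-12-05", 10.0, 0.0),
    ("acc1", "2016-12-12", 10.0, 1.0),
    ("acc1", "2017-01-02", 10.0, 0.0),
    ("acc1", "2017-01-09", 10.0, 0.0),
    ("acc1", "2017-01-16", 10.0, 2.0),
])

# Inner query: weeks rolled up to year-month, ranked newest first.
# Outer query: conditional aggregation turns ranks 1..3 into columns.
query = """
SELECT account_id,
       MAX(CASE WHEN rn = 1 THEN charge_month END) AS charge_period_1,
       MAX(CASE WHEN rn = 1 THEN total_due END) AS total_due_1,
       MAX(CASE WHEN rn = 2 THEN charge_month END) AS charge_period_2,
       MAX(CASE WHEN rn = 2 THEN total_due END) AS total_due_2,
       MAX(CASE WHEN rn = 3 THEN charge_month END) AS charge_period_3,
       MAX(CASE WHEN rn = 3 THEN total_due END) AS total_due_3
FROM (SELECT account_id,
             strftime('%Y-%m', charge_period_start) AS charge_month,
             SUM(total_due_on_charge) AS total_due,
             DENSE_RANK() OVER (PARTITION BY account_id
                                ORDER BY strftime('%Y-%m', charge_period_start) DESC) AS rn
      FROM accounts_write_off
      GROUP BY account_id, charge_month
      HAVING SUM(total_due_on_charge) <> 0) AS monthly
WHERE rn <= 3
GROUP BY account_id
"""
result = con.execute(query).fetchall()
print(result)
```

Here January 2017 ranks first; ordering by the month number alone would rank December (12) and November (11) above January (1).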