This query is too heavy and needs to be refactored. How can I do that?

This query would be too heavy, needs to be refactored. How can I do that?
Please help
SELECT
contract_type, SUM(fte), ROUND(SUM(fte * 100 / t.s ), 0) AS "% of total"
FROM
design_studio_testing.empfinal_tableau
CROSS JOIN
(SELECT SUM(fte) AS s
FROM design_studio_testing.empfinal_tableau) t
GROUP BY
contract_type;
Output should be like this:

Use window functions:
SELECT contract_type,
SUM(fte),
ROUND(SUM(fte) * 100.0 / SUM(SUM(fte)) OVER (), 0) AS "% of total"
FROM design_studio_testing.empfinal_tableau
GROUP BY contract_type;
That said, your original version should not be that much slower than this, unless perhaps empfinal_tableau is a view.
If it is a table, you could speed this up further with an index on empfinal_tableau(contract_type, fte).
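To see the window-function version in action, here is a minimal sketch that runs the same query shape against an in-memory SQLite database from Python. The three sample rows and the aliases total_fte and pct_of_total are made up for illustration, and SQLite (3.25 or newer, for window functions) is only a stand-in for whatever engine you are actually using.

```python
import sqlite3

# Hypothetical sample rows standing in for empfinal_tableau.
rows = [("permanent", 1.0), ("permanent", 0.5), ("temporary", 0.5)]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE empfinal_tableau (contract_type TEXT, fte REAL)")
conn.executemany("INSERT INTO empfinal_tableau VALUES (?, ?)", rows)

# SUM(fte) is the per-group total; SUM(SUM(fte)) OVER () adds those
# group totals up across all groups, giving the grand total in one scan.
query = """
SELECT contract_type,
       SUM(fte) AS total_fte,
       ROUND(SUM(fte) * 100.0 / SUM(SUM(fte)) OVER (), 0) AS pct_of_total
FROM empfinal_tableau
GROUP BY contract_type
ORDER BY contract_type
"""
result = conn.execute(query).fetchall()
for row in result:
    print(row)
```

The table is read once, instead of once for the grouped totals and once more for the grand total in the cross-joined subquery.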

There is no need to sum over the expression:
fte * 100 / t.s
which may slow the process.
Get SUM(fte) and then multiply and divide:
SELECT g.contract_type, g.sum_fte,
ROUND(100.0 * g.sum_fte / t.s, 0) AS [% of total]
FROM (
SELECT
contract_type,
SUM(fte) AS sum_fte
FROM design_studio_testing.empfinal_tableau
GROUP BY contract_type
) AS g CROSS JOIN (SELECT SUM(fte) AS s FROM design_studio_testing.empfinal_tableau) t
Edit for Oracle:
SELECT g.contract_type, g.sum_fte,
ROUND(100.0 * g.sum_fte / t.s, 0) AS "% of total"
FROM (
SELECT
contract_type,
SUM(fte) AS sum_fte
FROM empfinal_tableau
GROUP BY contract_type
) g CROSS JOIN (SELECT SUM(fte) AS s FROM empfinal_tableau) t

Related

Attempting to calculate absolute change and % change in 1 query

I'm having trouble with the SELECT portion of this query. I can calculate the absolute change just fine, but when I want to also find out the percent change I get lost in all the subqueries. Using BigQuery. Thank you!
SELECT
station_name,
ridership_2013,
ridership_2014,
absolute_change_2014 / ridership_2013 * 100 AS percent_change,
(ridership_2014 - ridership_2013) AS absolute_change_2014,
It will probably be beneficial to organize your query with CTEs and descriptive aliases to make things a bit easier. For example...
with
data as (select * from project.dataset.table),
ridership_by_year as (
select
extract(year from ride_date) as yr,
count(*) as rides
from data
group by 1
),
ridership_by_year_and_station as (
select
extract(year from ride_date) as yr,
station_name,
count(*) as rides
from data
group by 1,2
),
yearly_changes as (
select
this_year.yr,
this_year.rides,
prev_year.rides as prev_year_rides,
this_year.rides - coalesce(prev_year.rides,0) as absolute_change_in_rides,
safe_divide(this_year.rides - coalesce(prev_year.rides, 0), prev_year.rides) as relative_change_in_rides
from ridership_by_year this_year
left join ridership_by_year prev_year on this_year.yr = prev_year.yr + 1
),
yearly_station_changes as (
select
this_year.yr,
this_year.station_name,
this_year.rides,
prev_year.rides as prev_year_rides,
this_year.rides - coalesce(prev_year.rides,0) as absolute_change_in_rides,
safe_divide(this_year.rides - coalesce(prev_year.rides, 0), prev_year.rides) as relative_change_in_rides
from ridership_by_year_and_station this_year
left join ridership_by_year_and_station prev_year
on this_year.yr = prev_year.yr + 1
and this_year.station_name = prev_year.station_name
)
select * from yearly_changes
--select * from yearly_station_changes
Yes this is a bit longer, but IMO it is much easier to understand.
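For anyone who wants to try the self-join idea on a small scale, here is a rough sketch using Python's built-in sqlite3. It assumes a pre-aggregated table called ridership with made-up station names and counts (none of that is from the question), and it uses NULLIF in place of BigQuery's safe_divide, which SQLite does not have.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical pre-aggregated data: one row per station and year.
conn.execute(
    "CREATE TABLE ridership (station_name TEXT, yr INTEGER, rides INTEGER)"
)
conn.executemany(
    "INSERT INTO ridership VALUES (?, ?, ?)",
    [
        ("Central", 2013, 200), ("Central", 2014, 250),
        ("Harbor", 2013, 100), ("Harbor", 2014, 80),
    ],
)

# Join each station-year to the same station's previous year, then
# compute both changes from the two ride counts side by side.
query = """
SELECT this_year.station_name,
       this_year.yr,
       this_year.rides - prev_year.rides AS absolute_change,
       (this_year.rides - prev_year.rides) * 100.0
           / NULLIF(prev_year.rides, 0) AS percent_change
FROM ridership AS this_year
JOIN ridership AS prev_year
  ON this_year.station_name = prev_year.station_name
 AND this_year.yr = prev_year.yr + 1
ORDER BY this_year.station_name
"""
changes = conn.execute(query).fetchall()
for row in changes:
    print(row)
```

Both columns are computed from the underlying ride counts, so the percent change never has to refer to the absolute_change alias.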

Calculating % of COUNT with groupby function in bigquery

Running into some issues figuring out how to add in an extra column that will give me the percentage of the total of the aggregate of the count function. The query I have looks like this:
Select
count(*) AS num_rides,
member_casual
FROM `2020_bikeshare_data`
GROUP BY member_casual
ORDER BY num_rides DESC
And returns me this result:
num_rides  member_casual
2134988    member
1341217    casual
And what I'd like to do is add a 3rd column that lists the percent of the total each membership makes up:
num_rides  member_casual  perc_tot
2134988    member         61.4%
1341217    casual         38.6%
Thoughts?
Use window functions:
SELECT member_casual,
COUNT(*) AS num_rides,
COUNT(*) * 1.0 / SUM(COUNT(*)) OVER () AS perc_tot
FROM `2020_bikeshare_data`
GROUP BY member_casual
ORDER BY num_rides DESC;
No subquery is needed.
Consider below approach
select distinct member_casual,
count(*) over type as num_rides,
round(count(*) over type * 100.0 / count(*) over(), 2) as perc_tot
from `2020_bikeshare_data`
window type as (partition by member_casual)
# order by num_rides desc
if applied to sample data in your question - output is
The simplest way is to use a subquery as part of the column expression to calculate your percentage:
select
count(1) as num_rides,
member_casual,
sum(100) / (select sum(1.0) from `2020_bikeshare_data`) as perc_tot -- return as percentage
from
`2020_bikeshare_data`
group by
member_casual
Using the subquery, get the total number of rows and calculate the percentage accordingly.
Select
count(*) AS num_rides,
member_casual,
Concat(cast(round(count(*) * 100 / max(totalRecord), 1) as string), ' %') as perc_tot
FROM (SELECT *, COUNT(*) OVER () as totalRecord FROM `2020_bikeshare_data`)
GROUP BY member_casual
or
Select
count(*) AS num_rides,
member_casual,
Concat(cast(round(count(*) * 100 / (SELECT COUNT(*) FROM `2020_bikeshare_data`), 1) as string), ' %') as perc_tot
FROM `2020_bikeshare_data`
GROUP BY member_casual
In addition to the other answers, you can also break this down into simple SQL (without window functions) by organizing with CTEs.
with
data as (select * from `2020_bikeshare_data`),
total as (select count(*) as ride_count from data),
by_type as (select member_casual, count(*) as ride_count from data group by 1)
select
member_casual,
by_type.ride_count as num_rides,
by_type.ride_count / total.ride_count as perc_tot
from by_type
cross join total
In my opinion, this is much easier to see the perc_tot calculation.

Referencing other columns in a SQL SELECT

I have a SQL query in BigQuery:
SELECT
creator.country,
(SUM(length) / 60) AS total_minutes,
COUNT(DISTINCT creator.id) AS total_users,
(SUM(length) / 60 / COUNT(DISTINCT creator.id)) AS minutes_per_user
FROM
...
You may have noticed that the last column is equivalent to total_minutes / total_users.
I tried this, but it doesn't work:
SELECT
creator.country,
(SUM(length) / 60) AS total_minutes,
COUNT(DISTINCT creator.id) AS total_users,
(total_minutes / total_users) AS minutes_per_user
FROM
...
Is there any way to make this simpler?
Not really. That is, you cannot re-use column aliases in expressions in the same SELECT. If you really want, you can use a subquery or CTE:
SELECT c.*,
total_minutes / total_users
FROM (SELECT creator.country,
(SUM(length) / 60) AS total_minutes,
COUNT(DISTINCT creator.id) AS total_users
FROM
...
) c;
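As a small illustration of the subquery approach, here is a sketch run against an in-memory SQLite database from Python. The videos table, its flattened columns (country, creator_id, length) and the sample rows are hypothetical, since the question elides the FROM clause, and SQLite is only a stand-in for BigQuery.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical flattened table: one row per video, length in seconds.
conn.execute(
    "CREATE TABLE videos (country TEXT, creator_id INTEGER, length INTEGER)"
)
conn.executemany(
    "INSERT INTO videos VALUES (?, ?, ?)",
    [("NL", 1, 120), ("NL", 2, 240), ("NL", 1, 360), ("US", 3, 600)],
)

# The inner query defines the aliases; the outer query can then
# refer to them by name instead of repeating the aggregates.
query = """
SELECT c.*,
       total_minutes / total_users AS minutes_per_user
FROM (SELECT country,
             SUM(length) / 60.0 AS total_minutes,
             COUNT(DISTINCT creator_id) AS total_users
      FROM videos
      GROUP BY country) AS c
ORDER BY country
"""
stats = conn.execute(query).fetchall()
for row in stats:
    print(row)
```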
Another option is to move all business logic of metrics calculation into UDF (temp or permanent depends on usage needs) ...
create temp function custom_stats(arr any type) as ((
select as struct
sum(length) / 60 as total_minutes,
count(distinct id) as total_users,
sum(length) / 60 / count(distinct id) as minutes_per_user
from unnest(arr)
));
... and thus keep query itself simple and least verbose - as in below example
select creator.country,
custom_stats(array_agg(struct(length, creator.id))).*
from `project.dataset.table`
group by country

Two selects based on one select in one query

Can I do one select and then do different selects on the result in one query?
Now I want to do something like that (which is not working)
select
(select count(*), sum(amount) from view where amount > 5),
(select count(*), sum(amount) from view where amount < 5)
from
(select id, amount from warehouse where createDate = '2019-01-01') as view;
I don't want to select view and then select some data with additional filtering based on the view.
You can use conditional aggregation:
select count(*),
sum(amount) filter (where amount > 5),
sum(amount) filter (where amount < 5)
from warehouse
where createdate = date '2019-01-01'
About the general syntax question you could use WITH clause:
with v as
(
select id, amount from warehouse where createDate = '2019-01-01'
)
select *
from
(select count(*) as count1, sum(amount) as sum1 from v where amount > 5) as over5,
(select count(*) as count2, sum(amount) as sum2 from v where amount < 5) as under5;
(I don't mean it will be faster; it's just a way to use an "inline" view).
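If your engine does not support the FILTER clause, the same conditional aggregation can be written with CASE expressions. Below is a minimal sketch of that variant using Python's sqlite3; the four warehouse rows and the column aliases are invented for the example, and CASE is swapped in for FILTER purely for portability.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE warehouse (id INTEGER, amount INTEGER, createDate TEXT)"
)
# Hypothetical rows; the last one falls outside the date filter.
conn.executemany(
    "INSERT INTO warehouse VALUES (?, ?, ?)",
    [
        (1, 10, "2019-01-01"), (2, 3, "2019-01-01"),
        (3, 7, "2019-01-01"), (4, 4, "2019-01-02"),
    ],
)

# One pass over the filtered rows; each CASE decides whether a row
# contributes to the "over 5" or the "under 5" count and sum.
query = """
SELECT SUM(CASE WHEN amount > 5 THEN 1 ELSE 0 END) AS count_over,
       SUM(CASE WHEN amount > 5 THEN amount END)   AS sum_over,
       SUM(CASE WHEN amount < 5 THEN 1 ELSE 0 END) AS count_under,
       SUM(CASE WHEN amount < 5 THEN amount END)   AS sum_under
FROM warehouse
WHERE createDate = '2019-01-01'
"""
summary = conn.execute(query).fetchone()
print(summary)
```

Note that rows with amount exactly 5 fall into neither bucket, just as in the original pair of conditions.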

Problem to fix a question with WindowFunction

For school, I have to answer the following question, using a window function.
For each year, for each month, for each product category, indicate what percentage of that category's annual turnover came from that month.
I tried to use a window function but it didn't work, because I don't know how to use OVER (PARTITION BY ...).
select
catcode,
year(besteldatum) as jaar,
month(besteldatum) as maand,
sum(regelomzet) as omzet,
sum(regelomzet) / (
select sum(regelomzet)
from ##joinall t2
where t2.catcode = t1.catcode
and year(t2.besteldatum) = year(t1.besteldatum)
) * 100 as perc
from ##joinall t1
group by catcode, year(besteldatum), month(besteldatum)
order by catcode, year(besteldatum), month(besteldatum)
There's one thing to realize about window functions:
they get processed after the GROUP BY.
Hence, it's possible to sum over a sum.
And the PARTITION BY in an OVER clause is somewhat similar to GROUP BY.
SELECT
catcode,
year(besteldatum) as jaar,
month(besteldatum) as maand,
sum(regelomzet) as omzet,
cast(
(sum(regelomzet) /
SUM(sum(regelomzet)) OVER (PARTITION BY catcode, year(besteldatum))) * 100
as decimal(5,2)) as perc
FROM bestellingen t
GROUP BY catcode, year(besteldatum), month(besteldatum)
ORDER BY 1, 2, 3;
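To check the "sum over a sum" idea on a handful of rows, here is a rough sketch using Python's sqlite3 (SQLite 3.25 or newer). The order_lines table, with year and month already split into jaar and maand, and its sample values are assumptions made for the example, not the real bestellingen data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical order lines with the year and month already extracted.
conn.execute(
    "CREATE TABLE order_lines "
    "(catcode TEXT, jaar INTEGER, maand INTEGER, regelomzet REAL)"
)
conn.executemany(
    "INSERT INTO order_lines VALUES (?, ?, ?, ?)",
    [
        ("A", 2020, 1, 30.0), ("A", 2020, 1, 20.0), ("A", 2020, 2, 50.0),
        ("A", 2021, 1, 10.0), ("B", 2020, 1, 40.0),
    ],
)

# The inner SUM is the monthly total produced by GROUP BY; the outer
# SUM ... OVER adds up those monthly totals per category and year.
query = """
SELECT catcode, jaar, maand,
       SUM(regelomzet) AS omzet,
       ROUND(SUM(regelomzet) * 100.0
             / SUM(SUM(regelomzet)) OVER (PARTITION BY catcode, jaar), 2) AS perc
FROM order_lines
GROUP BY catcode, jaar, maand
ORDER BY catcode, jaar, maand
"""
monthly = conn.execute(query).fetchall()
for row in monthly:
    print(row)
```

Within each (catcode, jaar) partition the perc values add up to 100, which is a quick sanity check on the partitioning.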