SQL group by: sum of all sums

I have a query (formatted for Oracle):
select sum(inv.quantity * inv.price), spn.salesperson_name
from invoice inv
inner join salesperson spn on spn.spn_id = inv.spn_id
where inv.invoice_date between to_date('05/01/2017', 'MM/dd/YYYY') and to_date('05/31/2017', 'MM/dd/YYYY')
group by spn.salesperson_name
This adds up invoices for the month of May. The result is similar to:
$446,088.62 Bob
$443,439.29 Sally
$275,097.00 Tom
$95,170.00 George
$53,150.00 Jill
But I need to divide each sum by the sum of the sums ($1,312,944.91), so that the result is:
$446,088.62 34% Bob
$443,439.29 34% Sally
$275,097.00 21% Tom
$95,170.00 7% George
$53,150.00 4% Jill
(The sum of the percentage column should be 100%)
Is there a way to accomplish this in the query?

When functions exist that do exactly what you need, it is best to use those functions. In this case, the analytic function RATIO_TO_REPORT (implemented in Oracle and a few other databases, though not in SQL Server) does exactly what you need. https://docs.oracle.com/cd/B19306_01/server.102/b14200/functions124.htm
Specifically, the select clause could be:
select sum(inv.quantity * inv.price) AS TOTAL_REVENUE -- use column aliases!
, ratio_to_report(sum(inv.quantity * inv.price)) over () AS RATIO
, spn.salesperson_name
from ....... (rest of your query goes here)
Note that this solution, like the Accepted Answer, will show the ratio as a decimal, not as a percentage (and not rounded). If you need to attach a percentage sign, you will need to convert to string... and if so, the following trick (it is a trick!) will give you what you need:
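to_char( 100 * ratio_to_report(.....) over (), 'fm990L', 'nls_currency = %' ) AS RATIO, .....
Multiply by 100 first, because ratio_to_report returns a fraction between 0 and 1. The L element in the to_char format model stands for the local currency symbol; the nls_currency parameter redefines that symbol as the percent sign. The three digit positions in 990 let a value of 100 still fit.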

Just use analytic functions:
select spn.salesperson_name, sum(inv.quantity * inv.price),
sum(inv.quantity * inv.price) / sum(sum(inv.quantity * inv.price)) over () as ratio
from invoice inv inner join
salesperson spn
on spn.spn_id = inv.spn_id
where inv.invoice_date between date '2017-05-01' and date '2017-05-31'
group by spn.salesperson_name;
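The `sum(...) / sum(sum(...)) over ()` pattern can be tried end to end with Python's built-in sqlite3 module. This is a minimal sketch under stated assumptions: SQLite (3.25 or later, for window functions) stands in for Oracle, and a hypothetical `invoice` table holds one pre-aggregated row per salesperson with the May totals from the question, so the join and date filter are omitted. SQLite has no RATIO_TO_REPORT, so only the division form is shown.

```python
import sqlite3

# In-memory database; SQLite >= 3.25 is assumed for window-function support.
conn = sqlite3.connect(":memory:")
conn.execute("create table invoice (salesperson_name text, amount real)")
# Hypothetical data: one row per salesperson, using the May totals from the question.
conn.executemany(
    "insert into invoice values (?, ?)",
    [("Bob", 446088.62), ("Sally", 443439.29), ("Tom", 275097.00),
     ("George", 95170.00), ("Jill", 53150.00)],
)

rows = conn.execute(
    """
    select salesperson_name,
           sum(amount) as total,
           -- inner sum() is per group; the outer sum() over () adds up all group sums
           sum(amount) / sum(sum(amount)) over () as ratio
    from invoice
    group by salesperson_name
    order by total desc
    """
).fetchall()

for name, total, ratio in rows:
    print(f"{total:>12,.2f} {ratio:4.0%} {name}")
```

Because the window is applied after GROUP BY, the outer `sum(...) over ()` adds up the five group totals rather than the raw invoice lines.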

Related

SQL - calculate ratio and get max ratio with corresponding user and date details

I have a table with user, date, and one column each for messages sent and messages received.
For each date, I want the maximum ratio of messages sent to messages received, together with the user who has that ratio. This is the output I expect:
Andrew Lean 10/2/2020 10
Andrew Harp 10/1/2020 6
This is my query:
SELECT
ds.date, ds.user_name, max(ds.ratio) from
(select a.user_name, a.date, a.message_sent/ a.message_received as ratio
from messages a
group by a.user_name, a.date) ds
group by ds.date
But the output I get is:
Andrew Lean 10/2/2020 10
Jalinn Kim 10/1/2020 6
In the above output 6 is the correct max ratio for the date grouped but the user is wrong. What am I doing wrong?
With a recent version of most databases, you could do something like this.
This assumes, as in your data, that there is one row per user per day. If you have more rows per user per day, you'll need to provide a little more detail about how to combine them or which rows to ignore. You might want to SUM them; it's hard to know.
WITH cte AS (
select a.user_name, a.date
, a.message_sent / a.message_received AS ratio
, ROW_NUMBER() OVER (PARTITION BY a.date ORDER BY a.message_sent / a.message_received DESC) as rn
from messages a
)
SELECT t.user_name, t.date, t.ratio
FROM cte AS t
WHERE t.rn = 1
;
Note: There's no attempt to handle ties, where more than one user has the same ratio. We could use RANK (or other methods) for that, if your database supports it.
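A runnable sketch of the ROW_NUMBER approach, using Python's sqlite3 module with SQLite 3.25+ as a stand-in database. The rows are hypothetical (the question's table is not reproduced here), and the date column is renamed `msg_date` for the sketch. One assumption worth flagging: if message_sent and message_received are integer columns, SQLite, SQL Server, and PostgreSQL truncate the division, so the sketch multiplies by 1.0 first.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "create table messages (user_name text, msg_date text, "
    "message_sent integer, message_received integer)"
)
# Hypothetical rows (one per user per day), chosen so the expected maxima are 10 and 6.
conn.executemany(
    "insert into messages values (?, ?, ?, ?)",
    [
        ("Andrew Lean", "2020-10-02", 20, 2),
        ("Jalinn Kim", "2020-10-02", 9, 3),
        ("Andrew Harp", "2020-10-01", 12, 2),
        ("Jalinn Kim", "2020-10-01", 5, 1),
    ],
)

top_per_day = conn.execute(
    """
    with cte as (
        select user_name, msg_date,
               -- multiply by 1.0 so integer columns are not divided with truncation
               message_sent * 1.0 / message_received as ratio,
               row_number() over (
                   partition by msg_date
                   order by message_sent * 1.0 / message_received desc
               ) as rn
        from messages
    )
    select user_name, msg_date, ratio
    from cte
    where rn = 1
    order by msg_date desc
    """
).fetchall()

for row in top_per_day:
    print(row)
```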
Here, I am just calculating the ratio for each row in the first CTE.
In the second part, I take the maximum of that ratio for each date. This assumes each user has one row for each date.
Grouping by date ensures that max() returns the highest ratio for each date.
There could be ties between the ratios; for those, we can use ROW_NUMBER() or RANK() to rank each row according to whatever tie-breaking criteria we want, and then filter on the generated rank.
with data as (
select
date,
user_name,
message_sent / message_received as ratio
from [table name]
)
select
date,
max(ratio) as highest_ratio_per_date
from data
group by date
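Both answers mention RANK() for ties. As a hedged sketch (SQLite via Python's sqlite3, hypothetical rows, date column renamed `msg_date`), the version below keeps every user who shares the top ratio on a date, which ROW_NUMBER() would not.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "create table messages (user_name text, msg_date text, "
    "message_sent integer, message_received integer)"
)
# Hypothetical rows: two users tie on 2020-10-01 with a ratio of 6.
conn.executemany(
    "insert into messages values (?, ?, ?, ?)",
    [
        ("Andrew Lean", "2020-10-02", 20, 2),
        ("Jalinn Kim", "2020-10-02", 9, 3),
        ("Andrew Harp", "2020-10-01", 12, 2),
        ("Jalinn Kim", "2020-10-01", 18, 3),
    ],
)

winners = conn.execute(
    """
    with data as (
        select msg_date, user_name,
               message_sent * 1.0 / message_received as ratio
        from messages
    ),
    ranked as (
        select msg_date, user_name, ratio,
               -- rank() gives tied rows the same rank, so all tied users pass rnk = 1
               rank() over (partition by msg_date order by ratio desc) as rnk
        from data
    )
    select msg_date, user_name, ratio
    from ranked
    where rnk = 1
    order by msg_date, user_name
    """
).fetchall()

for row in winners:
    print(row)
```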

SQL - Sum Aggregate function

I am fairly new to SQL, so please bear that in mind.
I have a table whose data is divided between two centers, Toronto and Montreal, for a total of 78 rows (39 per center) and multiple columns. I want to get the national total (sum) of column X for both centers combined over a specific date range (month): for example, the January 2018 total of full-time employees for the Montreal and Toronto centers combined. Logically, I want to add the Toronto and Montreal January 2018 results to get the national total. However, with my beginner skills, I am having trouble writing a SQL query that runs without a syntax error.
select sum(fte), daterange, center
from (
select 'Toronto' as Center,
sum(fte) as total fte
from dbo.example
where daterange = '2015-11-01'
group by total fte
union
select 'Montreal' as Center,
sum(fte) as total fte
from dbo.example
where daterange = '2015-11-01'
group by total fte
)temptable
group by total fte
The above query gives me the error "Incorrect syntax near 'fte'." fte is a column in my table.
Please advise me what I am doing wrong.
Cheers!
The query should look like this:
select Center,
sum(fte) as [total fte]
from dbo.example
where daterange = '2015-11-01'
and Center in ('Toronto','Montreal')
group by Center
This assumes the Center column contains these strings.
select Center,
sum(convert(float,fte)) as total_fte
from dbo.example
where daterange = '2016-11-01'
and Center in ('Toronto','Montreal')
group by Center
This works. Thanks, everyone!
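The accepted query returns one row per center, while the question also asked for the national (combined) total. One way to get both in a single result is a windowed sum over the grouped rows. This is a sketch only: SQLite via Python's sqlite3 stands in for SQL Server, the table is named `example` instead of `dbo.example`, fte is stored as a real number (so no convert(float, ...) is needed), and the rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table example (center text, daterange text, fte real)")
# Hypothetical rows; the December row falls outside the filter.
conn.executemany(
    "insert into example values (?, ?, ?)",
    [
        ("Toronto", "2015-11-01", 10.5), ("Toronto", "2015-11-01", 20.0),
        ("Montreal", "2015-11-01", 7.0), ("Montreal", "2015-11-01", 8.0),
        ("Toronto", "2015-12-01", 99.0),
    ],
)

rows = conn.execute(
    """
    select center,
           sum(fte) as total_fte,
           -- window over the grouped rows: Toronto + Montreal combined
           sum(sum(fte)) over () as national_fte
    from example
    where daterange = '2015-11-01'
      and center in ('Toronto', 'Montreal')
    group by center
    order by center
    """
).fetchall()

for row in rows:
    print(row)
```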

Oracle query for the attached output

I have a scenario where I need to show daily transactions and also the total transactions for that month, along with the date and other fields such as type, product, etc.
Once I have that, the main requirement is to get each day's percentage of the monthly total. Below is an example: there are 3 transactions on Jan 1 and 257 for all of January, so the percentage for Jan 1 is (3/257)*100; similarly, there are 10 on Jan 2, so the percentage is (10/257)*100, and so on.
Can anyone help me with the SQL query?
Date Type Transaction Total_For_month Percentage
1/1/2017 A 3 257 1%
1/2/2017 B 10 257 4%
1/3/2017 A 5 257 2%
1/4/2017 C 8 257 3%
1/5/2017 D 12 257 5%
1/6/2017 D 17 257 7%
Use window functions:
select t.*,
sum(transaction) over (partition by to_char(date, 'YYYY-MM')) as total_for_month,
transaction / sum(transaction) over (partition by to_char(date, 'YYYY-MM')) as ratio
from t;
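The partition-by-month query can be checked with SQLite through Python's sqlite3 module. In this sketch, strftime('%Y-%m', dt) plays the role of Oracle's to_char(date, 'YYYY-MM'), the columns are renamed dt, tp, and txn_count (TRANSACTION is a keyword in SQLite), and the data is the six January rows shown in the question plus two hypothetical February rows. Since only those six January rows are present, the January total comes out as 55, not 257.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Columns renamed: dt (date), tp (type), txn_count (transaction).
conn.execute("create table t (dt text, tp text, txn_count integer)")
conn.executemany(
    "insert into t values (?, ?, ?)",
    [
        # The six January rows from the question.
        ("2017-01-01", "A", 3), ("2017-01-02", "B", 10), ("2017-01-03", "A", 5),
        ("2017-01-04", "C", 8), ("2017-01-05", "D", 12), ("2017-01-06", "D", 17),
        # Hypothetical February rows, to show the per-month partitioning.
        ("2017-02-01", "A", 4), ("2017-02-02", "B", 6),
    ],
)

rows = conn.execute(
    """
    select dt, tp, txn_count,
           sum(txn_count) over (partition by strftime('%Y-%m', dt)) as total_for_month,
           round(100.0 * txn_count
                 / sum(txn_count) over (partition by strftime('%Y-%m', dt)), 1) as pct
    from t
    order by dt
    """
).fetchall()

for row in rows:
    print(row)
```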
DATE and TYPE are Oracle keywords, I hope you are not using them literally as column names. I will use DT and TP below.
You didn't say one way or the other, but it seems like you must filter your data so that the final report is for a single month (rather than for a full year, say). If so, you could do something like this. Notice the analytic function RATIO_TO_REPORT. Note that I multiply the ratio by 100, and I use some non-standard formatting to get the result in the "percentage" format; don't worry too much if you don't understand that part from the first reading.
select dt, tp, transaction, sum(transaction) over () as total_trans_for_month,
to_char(100 * ratio_to_report(transaction) over (), '90.0L',
'nls_currency=%') as pct_of_monthly_trans
from your_table
where dt >= date '2017-01-01' and dt < add_months(date '2017-01-01', 1)
order by dt -- if needed (add more criteria as appropriate).
Notice the analytic clause: over (). We are not partitioning by anything, and we are not ordering by anything either; but since we want every row of input to generate a row in the output, we still need the analytic version of sum, and the analytic function ratio_to_report. The proper way to achieve this is to include the over clause, but leave it empty: over ().
Note also that in the where clause I did not wrap dt within trunc or to_char or any other function. If you are lucky, there is an index on that column, and writing the where conditions as I did allows that index to be used, if the Optimizer finds it should be.
The date '2017-01-01' is arbitrary (chosen to match your example); in production it should probably be a bind variable.
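A sketch of the single-month, over () version with the start date passed as a bind variable, using SQLite through Python's sqlite3. The substitutions are assumptions: date(?, '+1 month') stands in for add_months, and printf('%.1f%%', ...) stands in for the to_char/nls_currency trick. The data is the six January rows from the question plus one hypothetical February row that the filter should exclude.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table your_table (dt text, tp text, txn_count integer)")
conn.executemany(
    "insert into your_table values (?, ?, ?)",
    [
        ("2017-01-01", "A", 3), ("2017-01-02", "B", 10), ("2017-01-03", "A", 5),
        ("2017-01-04", "C", 8), ("2017-01-05", "D", 12), ("2017-01-06", "D", 17),
        ("2017-02-01", "A", 4),  # hypothetical row outside the requested month
    ],
)

month_start = "2017-01-01"  # bound as a parameter, like a bind variable
rows = conn.execute(
    """
    select dt, tp, txn_count,
           -- over () spans every row that survived the WHERE clause, i.e. one month
           sum(txn_count) over () as total_trans_for_month,
           printf('%.1f%%', 100.0 * txn_count / sum(txn_count) over ()) as pct_of_monthly_trans
    from your_table
    -- half-open range on the bare column, so an index on dt remains usable
    where dt >= ? and dt < date(?, '+1 month')
    order by dt
    """,
    (month_start, month_start),
).fetchall()

for row in rows:
    print(row)
```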

SQL percentage syntax

May I ask why I should use CodeA instead of CodeB for calculating the percentage? The results are totally different.
Thank you so much for your help!
CodeA:
select name, round(sum(amount_paid) /
(select sum(amount_paid) from order_items) * 100.0, 2) as pct
from order_items
group by 1
order by 2 desc;
CodeB:
select name, round((amount_paid /
sum(amount_paid)) * 100.0, 2) as pct
from order_items
group by 1
order by 2 desc;
CodeB is wrong because it selects amount_paid without an aggregate function, even though amount_paid is not in the GROUP BY clause.
Strict databases will throw an error if you try to run this query.
CodeA uses the subquery (select sum(amount_paid) from order_items), which calculates the total of amount_paid over the whole table, and then uses that total to calculate the percentage for each name.
It is difficult to be sure without knowing the database and the data you are operating on. I believe, however, that the GROUP BY clause is evaluated before the SELECT list. This means that in CodeB, the records have already been grouped by the name column by the time sum(amount_paid) is evaluated, so it calculates the sum for each group instead of for all the records.
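To see what CodeA computes, here is a sketch with hypothetical order_items rows in SQLite (via Python's sqlite3, 3.25+ for window functions). It also shows an equivalent form that replaces the subquery with a window function over the grouped rows; both return the same percentages.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table order_items (name text, amount_paid real)")
# Hypothetical order lines: total amount_paid is 100.
conn.executemany(
    "insert into order_items values (?, ?)",
    [("Pizza", 10.0), ("Pizza", 15.0), ("Salad", 20.0), ("Soup", 55.0)],
)

# CodeA from the question: each group's sum divided by the scalar subquery's grand total.
code_a = conn.execute(
    """
    select name, round(sum(amount_paid) /
           (select sum(amount_paid) from order_items) * 100.0, 2) as pct
    from order_items
    group by 1
    order by 2 desc
    """
).fetchall()

# Same result with a window function: sum(sum(...)) over () is the sum of all group sums.
code_window = conn.execute(
    """
    select name, round(sum(amount_paid) * 100.0 / sum(sum(amount_paid)) over (), 2) as pct
    from order_items
    group by 1
    order by 2 desc
    """
).fetchall()

print(code_a)
print(code_window)
```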

12 month moving average by person, date

I have a table [production] that contains the following structure:
rep (char(10))
,cyc_date (datetime) ---- already standardized to mm/01/yyyy
,amt (decimal)
I have data for each rep from 1/1/2011 to 8/1/2013. What I want to be able to do is create a 12 month moving average beginning 1/1/2012 for each rep, as follows:
rep cyc_dt 12moAvg
-------------------------
A 1/1/2012 10000.01
A 2/1/2012 13510.05
. ........ ........
A 8/1/2013 22101.32
B 1/1/2012 98328.22
B ........ ........
where each row represents the 12 month moving average for said rep at stated time. I found some examples that were vaguely close and I tried them to no avail. It seems the addition of a group by rep component is the major departure from other examples.
This is about as far as I got:
SELECT
rep,
cyc_date,
(
SELECT Avg([amt])
FROM production Q
WHERE Q.[cyc_date] BETWEEN DateAdd("yyyy",-1,[cyc_date]+1) AND [cyc_date]
) AS 12moavg
FROM production
That query seems to pull an overall average or sum, since there is no grouping in the correlated subquery. When I try to group by, I get an error that it can only return at most one row.
I think it may work with 2 adjustments to the correlated subquery.
Subtract 11 months in the DateAdd() expression.
Include another WHERE condition to limit the average to the same rep as the current row of the parent (containing) query.
SELECT
p.rep,
p.cyc_date,
(
SELECT Avg(Q.amt)
FROM production AS Q
WHERE
Q.rep = p.rep
AND
Q.cyc_date BETWEEN DateAdd("m", -11, p.cyc_date)
AND p.cyc_date
) AS [12moavg]
FROM production AS p;
Correlated subqueries can be slow. Make sure to index rep and cyc_date to limit the pain with this one.
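The corrected correlated subquery can be sketched in SQLite through Python's sqlite3. This is not Access: date(p.cyc_date, '-11 months') stands in for DateAdd("m", -11, p.cyc_date), the alias is renamed moving_avg_12mo because an identifier starting with a digit would need quoting, and the monthly amounts are hypothetical. The index on (rep, cyc_date) follows the advice above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table production (rep text, cyc_date text, amt real)")
conn.execute("create index ix_production_rep_date on production (rep, cyc_date)")

# Hypothetical data: 14 monthly rows per rep, 2011-01-01 through 2012-02-01.
months = [f"{2011 + i // 12}-{i % 12 + 1:02d}-01" for i in range(14)]
conn.executemany(
    "insert into production values (?, ?, ?)",
    [("A", m, float(i + 1)) for i, m in enumerate(months)]  # A: 1, 2, ..., 14
    + [("B", m, 100.0) for m in months],                     # B: constant 100
)

rows = conn.execute(
    """
    select p.rep, p.cyc_date,
           (select avg(q.amt)
            from production as q
            where q.rep = p.rep  -- same rep as the outer row
              and q.cyc_date between date(p.cyc_date, '-11 months') and p.cyc_date
           ) as moving_avg_12mo
    from production as p
    where p.cyc_date >= '2012-01-01'
    order by p.rep, p.cyc_date
    """
).fetchall()

for row in rows:
    print(row)
```

For rep A at 2012-01-01 the window covers 2011-02-01 through 2012-01-01 (amounts 2 to 13), so the average is 7.5.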