SQL percentage syntax - sql

May I ask why I should use CodeA instead of CodeB to calculate the percentage? The results are totally different.
Thank you so much for your help!
CodeA:
select name, round(sum(amount_paid) /
(select sum(amount_paid) from order_items) * 100.0, 2) as pct
from order_items
group by 1
order by 2 desc;
CodeB:
select name, round((amount_paid /
sum(amount_paid)) * 100.0, 2) as pct
from order_items
group by 1
order by 2 desc;

CodeB is wrong because it selects the amount_paid column without an aggregate function even though the query groups by name, so amount_paid is neither grouped nor aggregated.
Strict databases will throw an error if you try to run this query.
CodeA uses the subselect (select sum(amount_paid) from order_items), which calculates the total of amount_paid over the whole table, and then divides each name's sum by that total to get its percentage.

It is difficult to be sure without knowing the database and the data you are operating on. However, the GROUP BY clause in SQL is logically evaluated before the SELECT list. In CodeB the records have therefore already been grouped by the name column by the time sum(amount_paid) is evaluated, so it calculates the sum for each group instead of the sum of all the records.
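To make the difference concrete, here is a minimal sketch in Python using the standard-library sqlite3 module. The rows in order_items are made up for illustration, and SQLite is only a stand-in for whatever database the question used. It runs CodeA and, separately, shows what sum(amount_paid) evaluates to once the rows are grouped by name.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE order_items (name TEXT, amount_paid REAL)")
# Hypothetical sample data: two rows for apple, one for pear, grand total 100.
conn.executemany(
    "INSERT INTO order_items VALUES (?, ?)",
    [("apple", 10.0), ("apple", 30.0), ("pear", 60.0)],
)

# CodeA: each name's sum divided by the grand total from the subselect.
code_a = conn.execute("""
    SELECT name,
           ROUND(SUM(amount_paid)
                 / (SELECT SUM(amount_paid) FROM order_items) * 100.0, 2) AS pct
    FROM order_items
    GROUP BY name
    ORDER BY pct DESC
""").fetchall()

# After GROUP BY, SUM(amount_paid) is the per-name sum, not the grand total.
# This is the denominator CodeB ends up dividing by.
group_sums = conn.execute("""
    SELECT name, SUM(amount_paid)
    FROM order_items
    GROUP BY name
    ORDER BY name
""").fetchall()

print(code_a)
print(group_sums)
```

With this data CodeA reports each name's share of the overall 100, while the per-group sums (40 and 60) show why dividing by sum(amount_paid) inside the grouped query cannot give a share of the total.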

Related

Get count of records of the top 3 rows and compare the counts

In SQL Server 2016, I have a query as such:
SELECT [Report_date], count(distinct indv_id)
FROM
[dbo].[STG_TABLE] group by report_date order by report_date desc
I get the results as below:
Report_date (No column name)
2020-08-21 47918
2020-08-12 968065
2020-07-31 977804
Now I want to compare the difference between the counts in each row. If the difference is more than 10%, then I need to send an email out in the SSIS package.
How can I go through each row and calculate the difference? I want to look at the first row and compare it with the second row.
Your question seems to be about calculating the ratio between consecutive rows. For that, use lag(). To get the ratio:
SELECT [Report_date], COUNT(DISTINCT indv_id),
(COUNT(DISTINCT indv_id) * 1.0 / LAG(COUNT(DISTINCT indv_id)) OVER (ORDER BY report_date))
FROM [dbo].[STG_TABLE]
GROUP BY report_date
ORDER BY report_date DESC;
I'm not sure what results you want, but this is the basic information.
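As a rough illustration of the 10% check, the sketch below uses Python's sqlite3 module with made-up rows (20, 19 and 5 distinct ids on the three report dates). It assumes an SQLite build with window functions (3.25 or later); the table name stg_table, the THRESHOLD constant and the flagged list are illustrative choices, not part of the original SSIS package.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE stg_table (report_date TEXT, indv_id INTEGER)")

# Hypothetical data: 20, 19 and 5 distinct ids on three report dates.
sample = (
    [("2020-07-31", i) for i in range(20)]
    + [("2020-08-12", i) for i in range(19)]
    + [("2020-08-21", i) for i in range(5)]
)
conn.executemany("INSERT INTO stg_table VALUES (?, ?)", sample)

# Count per date first, then pair each count with the previous date's count.
rows = conn.execute("""
    WITH counts AS (
        SELECT report_date, COUNT(DISTINCT indv_id) AS cnt
        FROM stg_table
        GROUP BY report_date
    )
    SELECT report_date, cnt, LAG(cnt) OVER (ORDER BY report_date) AS prev_cnt
    FROM counts
    ORDER BY report_date DESC
""").fetchall()

THRESHOLD = 0.10  # flag changes larger than 10% relative to the previous date
flagged = [
    report_date
    for report_date, cnt, prev_cnt in rows
    if prev_cnt is not None and abs(cnt - prev_cnt) / prev_cnt > THRESHOLD
]
print(flagged)
```

The earliest date has no previous row, so LAG returns NULL there and the comparison is skipped. Whether the change should be measured against the previous or the current count is a design choice the question leaves open.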

Redshift - Find % as compared to total value

I have a table with count by product. I am trying to add a new column that would find % as compared to sum of all rows in that column.
prod_name,count
prod_a,100
prod_b,50
prod_c,150
For example, I want to find % of prod_a as compared to the total count and so on.
Expected output:
prod_name,count,%
prod_a,100,0.33
prod_b,50,0.167
prod_c,150,0.5
Edit on SQL:
select count(*),ratio_to_report(prod_name)
over (partition by count(*))
from sales
group by prod_name;
Using window functions.
select t.*,100.0*cnt_by_prod/sum(cnt_by_prod) over() as pct
from tbl t
Edit: Based on the OP's change to the question, to compute the counts and then the percentage, use
select prod_name,100.0*count(*)/sum(count(*)) over()
from tbl
group by prod_name
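The first query can be tried against the sample data from the question with the sketch below. It uses Python's sqlite3 module rather than Redshift, assumes window-function support (SQLite 3.25 or later), and rounds to three decimals as an illustrative choice to match the expected output.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tbl (prod_name TEXT, cnt_by_prod INTEGER)")
conn.executemany(
    "INSERT INTO tbl VALUES (?, ?)",
    [("prod_a", 100), ("prod_b", 50), ("prod_c", 150)],
)

# SUM(...) OVER () has an empty window, so it sees every row: the grand total.
# Multiplying by 1.0 avoids integer division.
shares = conn.execute("""
    SELECT prod_name,
           cnt_by_prod,
           ROUND(1.0 * cnt_by_prod / SUM(cnt_by_prod) OVER (), 3) AS share
    FROM tbl
    ORDER BY prod_name
""").fetchall()

for prod_name, cnt, share in shares:
    print(prod_name, cnt, share)
```

Multiply by 100.0 instead of 1.0 to get a percentage rather than a fraction.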

sql group by sum of all sums

I have a query (formatted for Oracle):
select sum(inv.quantity * inv.price), spn.salesperson_name
from invoice inv
inner join salesperson spn on spn.spn_id = inv.spn_id
where inv.invoice_date between to_date('05/01/2017', 'MM/dd/YYYY') and to_date('05/31/2017', 'MM/dd/YYYY')
group by spn.salesperson_name
To add up invoices for the month of May. The result is similar to:
$446,088.62 Bob
$443,439.29 Sally
$275,097.00 Tom
$95,170.00 George
$53,150.00 Jill
But, I need to divide each sum to the sum of the sums ($1,312,944.91), so that the result is:
$446,088.62 34% Bob
$443,439.29 34% Sally
$275,097.00 21% Tom
$95,170.00 7% George
$53,150.00 4% Jill
(The sum of the percentage column should be 100%)
Is there a way to accomplish this in the query?
When a function exists that does exactly what you need, it is best to use it. In this case, the analytic function RATIO_TO_REPORT does exactly what you need. It is available in Oracle, but it is not part of every SQL dialect (SQL Server, for example, does not provide it), so check your database's documentation. https://docs.oracle.com/cd/B19306_01/server.102/b14200/functions124.htm
Specifically, the select clause could be:
select sum(inv.quantity * inv.price) AS TOTAL_REVENUE -- use column aliases!
, ratio_to_report(sum(inv.quantity * inv.price)) over () AS RATIO
, spn.salesperson_name
from ....... (rest of your query goes here)
Note that this solution, like the Accepted Answer, will show the ratio as a decimal, not as a percentage (and not rounded). If you need to attach a percentage sign, you will need to convert to string... and if so, the following trick (it is a trick!) will give you what you need:
to_char( ratio_to_report(.....), 'fm99L', 'nls_currency = %' ) AS RATIO, .....
The L element in to_char is for currency symbol; you define the currency symbol to be the percent sign.
Just use analytic functions:
select spn.salesperson_name, sum(inv.quantity * inv.price),
sum(inv.quantity * inv.price) / sum(sum(inv.quantity * inv.price)) over () as ratio
from invoice inv inner join
salesperson spn
on spn.spn_id = inv.spn_id
where inv.invoice_date between date '2017-05-01' and date '2017-05-31'
group by spn.salesperson_name;
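For databases without RATIO_TO_REPORT, the same idea can be written in two steps: aggregate per salesperson first, then divide by a windowed total. The sketch below is one way to do that with Python's sqlite3 module. The rows are invented, dates are stored as ISO strings (an assumption made so that BETWEEN works on text), and window functions require SQLite 3.25 or later.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical data; the last invoice falls outside May and must be ignored.
conn.executescript("""
    CREATE TABLE salesperson (spn_id INTEGER PRIMARY KEY, salesperson_name TEXT);
    CREATE TABLE invoice (spn_id INTEGER, invoice_date TEXT,
                          quantity INTEGER, price REAL);
    INSERT INTO salesperson VALUES (1, 'Bob'), (2, 'Sally');
    INSERT INTO invoice VALUES
        (1, '2017-05-03', 2, 100.0),
        (1, '2017-05-20', 1, 100.0),
        (2, '2017-05-10', 1, 100.0),
        (2, '2017-06-01', 5, 100.0);
""")

# Step 1 (CTE): revenue per salesperson for May.
# Step 2: divide each revenue by the total over all salespeople.
report = conn.execute("""
    WITH totals AS (
        SELECT spn.salesperson_name AS name,
               SUM(inv.quantity * inv.price) AS revenue
        FROM invoice inv
        JOIN salesperson spn ON spn.spn_id = inv.spn_id
        WHERE inv.invoice_date BETWEEN '2017-05-01' AND '2017-05-31'
        GROUP BY spn.salesperson_name
    )
    SELECT name, revenue,
           ROUND(100.0 * revenue / SUM(revenue) OVER (), 1) AS pct
    FROM totals
    ORDER BY revenue DESC
""").fetchall()

print(report)
```

Because the percentages are rounded independently, they are not guaranteed to add up to exactly 100 for arbitrary data, even though the unrounded ratios do.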

20 Day moving average with joins alone

There are questions like this all over the place so let me specify where I specifically need help.
I have seen moving averages in SQL with Oracle analytic functions, MSSQL apply, or a variety of other methods. I have also seen this done with self joins (one join for each day of the average, as in "How do you create a Moving Average Method in SQL?").
I am curious whether there is a way to do this in SQL using only self joins (preferably in Oracle, but since my question is about joins alone this should be possible in any RDBMS). The approach would have to be scalable (to a 20 or 100 day moving average, in contrast to the question mentioned above, which required one join for each day in the moving average).
My thoughts are
select customer, a.tradedate, a.shares, avg(b.shares)
from trades a, trades b
where b.tradedate between a.tradedate-20 and a.tradedate
group by customer, a.tradedate
But when I tried it in the past it didn't work. To be more specific, I am trying a smaller but similar example (a 5 day average instead of a 20 day one) with this fiddle demo and can't find out where I am going wrong. http://sqlfiddle.com/#!6/ed008/41
select a.ticker, a.dt_date, a.volume, avg(b.volume)
from yourtable a, yourtable b
where b.dt_date between a.dt_date-5 and a.dt_date
and a.ticker=b.ticker
group by a.ticker, a.dt_date, a.volume
I don't see anything wrong with your second query. I think the only reason it's not what you're expecting is that the volume field is an integer data type, so the calculated average is also returned as an integer. For an average you have to cast it, because the result won't necessarily be a whole number:
select a.ticker, a.dt_date, a.volume, avg(cast(b.volume as float))
from yourtable a
join yourtable b
on a.ticker = b.ticker
where b.dt_date between a.dt_date - 5 and a.dt_date
group by a.ticker, a.dt_date, a.volume
Fiddle:
http://sqlfiddle.com/#!6/ed008/48/0 (thanks to @DaleM for DDL)
I don't know why you would ever do this vs. an analytic function though, especially since you mention wanting to do this in Oracle (which has analytic functions). It would be different if your preferred database were MySQL or a database without analytic functions.
Just to add to the answer, this is how you would achieve the same result in Oracle using analytic functions (assuming one row per ticker per day with no gaps, since the window below counts rows rather than days). Notice how the PARTITION BY plays the role of the join on ticker: it splits up the results so that rows for the same date on different tickers don't interfere with each other.
select ticker,
dt_date,
volume,
avg(cast(volume as decimal)) over( partition by ticker
order by dt_date
rows between 5 preceding
and current row ) as mov_avg
from yourtable
order by ticker, dt_date, volume
Fiddle:
http://sqlfiddle.com/#!4/0d06b/4/0
Analytic functions will likely run much faster.
http://sqlfiddle.com/#!6/ed008/45 would appear to be what you need.
select a.ticker,
a.dt_date,
a.volume,
(select avg(cast(b.volume as float))
from yourtable b
where b.dt_date between a.dt_date-5 and a.dt_date
and a.ticker=b.ticker)
from yourtable a
order by a.ticker, a.dt_date
Note that this uses a correlated subquery rather than a join.
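One difference between the answers above is worth checking on real data: the self join and the subquery select rows by date range, while ROWS BETWEEN ... PRECEDING counts rows. The sketch below compares the two with Python's sqlite3 module (window functions need SQLite 3.25 or later). The trades table, the integer day column standing in for a real date, the 2-day lookback and the sample values are all assumptions made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE trades (ticker TEXT, day INTEGER, volume INTEGER)")
# Hypothetical data with a gap: no rows between day 3 and day 10.
conn.executemany(
    "INSERT INTO trades VALUES (?, ?, ?)",
    [("A", 1, 10), ("A", 2, 20), ("A", 3, 30), ("A", 10, 40)],
)

# Self join: average over rows whose day lies within the last 2 days.
join_avg = conn.execute("""
    SELECT a.day, AVG(b.volume)
    FROM trades a
    JOIN trades b
      ON a.ticker = b.ticker
     AND b.day BETWEEN a.day - 2 AND a.day
    GROUP BY a.ticker, a.day
    ORDER BY a.day
""").fetchall()

# Window function: average over the current row and the 2 rows before it,
# regardless of how far apart their days are.
rows_avg = conn.execute("""
    SELECT day,
           AVG(volume) OVER (PARTITION BY ticker
                             ORDER BY day
                             ROWS BETWEEN 2 PRECEDING AND CURRENT ROW)
    FROM trades
    ORDER BY day
""").fetchall()

print(join_avg)
print(rows_avg)
```

The two results agree while the days are consecutive and diverge at day 10, where the date-range version averages only that day and the row-count version reaches back across the gap. SQLite's AVG always returns a floating-point value, so the integer-average issue discussed above does not show up here.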

Query return rows whose sum of column value match given sum

I have a table with:
id desc total
1 baskets 25
2 baskets 15
3 baskets 75
4 noodles 10
I would like a query that outputs the rows whose total adds up to 40.
The output would be like:
id desc total
1 baskets 25
2 baskets 15
I believe this will get you a list of the results you're looking for, but not with your example dataset, because no desc group in your example dataset sums to exactly 40.
SELECT id, desc, total
FROM mytable
WHERE desc IN (
SELECT desc
FROM mytable
GROUP BY desc
HAVING SUM(total) = 40
)
Select Desc,SUM(Total) as SumTotal
from Table
group by desc
having SUM(Total) >= 40
Not quite sure what you want, but this may get you started
SELECT `desc`, SUM(Total) Total
FROM TableName
GROUP BY `desc`
HAVING SUM(Total) = 40
From reading your question, it sounds like you want a query that returns any subset of rows that share the same description and whose totals add up to a given target value.
There is no simple way to do this. This moves into algorithmic territory.
Assuming I am correct about what you are after, GROUP BY and aggregate functions will not solve your problem. Plain SQL has no direct way to say that a query should try subsets of the data until it has exhausted all possible combinations and found the ones whose sums match your requirement.
You will have to mix an algorithm into your SQL, e.g. in a stored procedure.
Or simply fetch all the rows that match the desc from the database and then run your algorithm on them in code.
I recall from an algorithms class I took that this is a known problem, the subset sum problem:
I believe you could adapt a working implementation of that algorithm to solve your problem
http://en.wikipedia.org/wiki/Subset_sum_problem
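As a rough sketch of the "do it in code" option, the Python below brute-forces every combination of rows that share a desc and keeps those whose totals add up to the target. The function name subsets_with_sum and the tuple layout (id, desc, total) are illustrative assumptions. The search is exponential in the size of each group, so it is only reasonable for small groups.

```python
from collections import defaultdict
from itertools import combinations


def subsets_with_sum(rows, target):
    """Return every combination of rows with the same desc whose totals sum to target.

    rows is a list of (id, desc, total) tuples. Brute force: tries all
    combinations within each desc group.
    """
    by_desc = defaultdict(list)
    for row in rows:
        by_desc[row[1]].append(row)

    matches = []
    for group in by_desc.values():
        for size in range(1, len(group) + 1):
            for combo in combinations(group, size):
                if sum(r[2] for r in combo) == target:
                    matches.append(list(combo))
    return matches


# The table from the question.
table = [
    (1, "baskets", 25),
    (2, "baskets", 15),
    (3, "baskets", 75),
    (4, "noodles", 10),
]
result = subsets_with_sum(table, 40)
print(result)
```

For the table in the question the only match is the pair of baskets rows with totals 25 and 15. For larger groups, a dynamic-programming variant of subset sum would scale better than trying every combination.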
select desc
from (select desc, sum(total) as ct from mytable group by desc) t
where ct = 40