SQL query - percentage of sub sample

I have a SQL statement:
Select
ID, GroupID, Profit
From table
I now want to add a fourth column: the percentage of group profits.
Therefore the query should sum all the profits for the same group id and then have that number divided by the profit for the unique ID.
Is there a way to do this? The regular sum function does not seem to do the trick.
Thanks

select t1.ID,
t1.GroupID,
(t1.Profit * 1.0) / t2.grp_profit as percentage_profit
from table t1
inner join
(
select GroupID, sum(Profit) as grp_profit
from table
group by GroupID
) t2 on t1.groupid = t2.groupid
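To check the join approach, here is a minimal sketch that runs it against an in-memory SQLite database from Python. The table name `profits` and the sample rows are hypothetical stand-ins for the question's `table`, because TABLE is a reserved word and cannot be used as a real table name.

```python
import sqlite3

# Hypothetical sample data; "profits" stands in for the question's "table".
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE profits (ID INTEGER, GroupID INTEGER, Profit INTEGER);
    INSERT INTO profits VALUES (1, 10, 30), (2, 10, 70), (3, 20, 50);
""")

# Join each row to its group's total; "* 1.0" avoids integer division.
rows = conn.execute("""
    SELECT t1.ID,
           t1.GroupID,
           (t1.Profit * 1.0) / t2.grp_profit AS percentage_profit
    FROM profits t1
    INNER JOIN (
        SELECT GroupID, SUM(Profit) AS grp_profit
        FROM profits
        GROUP BY GroupID
    ) t2 ON t1.GroupID = t2.GroupID
    ORDER BY t1.ID
""").fetchall()

for row in rows:
    print(row)  # (ID, GroupID, share of the group's profit)
```

Rows 1 and 2 share group 10's total of 100, so their shares are 0.3 and 0.7; row 3 is alone in group 20 and gets 1.0. Multiply by 100 if you want a percentage rather than a fraction.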

One more option, using a window function:
select ID, GroupID, Profit * 1. / SUM(Profit) OVER(PARTITION BY GroupID) as percentage_profit
from table
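Here is a minimal sketch of the window-function version, again using Python's sqlite3 module with a hypothetical `profits` table and sample rows. It assumes SQLite 3.25 or later, which added window functions. `SUM(...) OVER (PARTITION BY GroupID)` attaches the group total to every row, so no join is needed.

```python
import sqlite3

# Hypothetical sample data; "profits" stands in for the question's "table".
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE profits (ID INTEGER, GroupID INTEGER, Profit INTEGER);
    INSERT INTO profits VALUES (1, 10, 30), (2, 10, 70), (3, 20, 50);
""")

# The window SUM is computed per GroupID, but every input row is kept.
rows = conn.execute("""
    SELECT ID,
           GroupID,
           Profit * 100.0 / SUM(Profit) OVER (PARTITION BY GroupID) AS pct
    FROM profits
    ORDER BY ID
""").fetchall()

print(rows)
```

Here `100.0` does the same job as the `1.` in the answer. When both operands are integers, SQLite truncates the division, as SQL Server and PostgreSQL do, so one operand must be non-integer.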

An alternative solution uses a scalar subquery:
select t1.ID, t1.GroupID, t1.Profit * 1.0 / (select sum(t2.Profit)
from table t2
where t2.GroupID = t1.GroupID) as percentage_profit
from table t1;
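The correlated scalar subquery can be sketched the same way, using Python's sqlite3 with a hypothetical `profits` table. Conceptually, the subquery is evaluated once for each outer row. This sketch computes each row's share of its group as a percentage. If you want group total divided by individual profit instead, swap the operands of `/`.

```python
import sqlite3

# Hypothetical sample data; "profits" stands in for the question's "table".
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE profits (ID INTEGER, GroupID INTEGER, Profit INTEGER);
    INSERT INTO profits VALUES (1, 10, 30), (2, 10, 70), (3, 20, 50);
""")

# The inner SELECT is correlated on t1.GroupID and returns that group's total.
rows = conn.execute("""
    SELECT t1.ID,
           t1.GroupID,
           t1.Profit * 100.0 / (SELECT SUM(t2.Profit)
                                FROM profits t2
                                WHERE t2.GroupID = t1.GroupID) AS pct
    FROM profits t1
    ORDER BY t1.ID
""").fetchall()

print(rows)
```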

An alternative answer, albeit a less efficient one, is to use a scalar subquery.
SELECT ID, GroupId, Profit, 100.0 * mt.Profit / (SELECT sum(Profit)
FROM my_table
WHERE GroupId = mt.GroupId) as pct
FROM my_table as mt
From the way it reads, I'm not sure whether you want "percentage of group profits" or group_profit / individual profit.
That is how this sentence sounds: "Therefore the query should sum all the profits for the same group id and then have that number divided by the profit for the unique ID"
Either way, just switch the divisor to whichever you want!
Also, if you're using PostgreSQL 8.4 or later, you can use a window function.
SELECT ID, GroupId, Profit, 100.0 * Profit / sum(Profit) OVER (PARTITION BY GroupId) as pct
FROM my_table as mt

Related

How to select all columns with sum function

Assuming there is a table with 100 columns, how can I select all columns with a sum without having to type out all the columns?
For example, something like this:
select *, sum(price) as sales
from table
group by *
order by date
Try this:
select table.*, t.sales from table
inner join (select id, sum(price) as sales from table group by id) t
on table.id = t.id
order by date
But in general it is not recommended to use a star (*) in a SELECT statement;
for example, don't use * in a table-valued function.
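Here is a minimal sketch of the join-back idea using Python's sqlite3. A hypothetical `orders` table with columns `id`, `order_date`, `price` and `item` stands in for the 100-column table. Every original column comes through `o.*`, and the per-id total is joined on as one extra column.

```python
import sqlite3

# Hypothetical table standing in for the wide table in the question.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders (id INTEGER, order_date TEXT, price INTEGER, item TEXT);
    INSERT INTO orders VALUES
        (1, '2024-01-01', 10, 'a'),
        (1, '2024-01-02', 5,  'b'),
        (2, '2024-01-03', 7,  'c');
""")

# o.* keeps every column; the derived table t supplies one total per id.
rows = conn.execute("""
    SELECT o.*, t.sales
    FROM orders o
    INNER JOIN (SELECT id, SUM(price) AS sales FROM orders GROUP BY id) t
        ON o.id = t.id
    ORDER BY o.order_date
""").fetchall()

for row in rows:
    print(row)
```

Each detail row keeps its own columns and repeats its group's total. In this sample, id 1 shows 15 on both of its rows.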

Best way to get count and distinct count of rows in a single query

What is the best way to get count of rows and distinct rows in a single query?
To get the distinct count, we can use a subquery like this:
select count(*) from
(
select distinct * from table
) d
I have 15+ columns and many duplicate rows, and I want to calculate both the row count and the distinct row count in one query.
However, if I use this:
select count(*) as Rowcount, count(distinct *) as DistinctCount from table
it does not work, because count(distinct *) is not supported.
Why don't you just put the subquery inside another query?
select count(*) as row_count,
(select count(*) from (select distinct * from table) d) as distinct_count
from table;
create table tbl
(
col int
);
insert into tbl values(1),(2),(1),(3);
select count(*) as distinct_count, sum(cnt) as all_count
from (
select count(*) as cnt from tbl group by col
) A
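The group-by trick extends to many columns. Grouping by every column yields one group per distinct row, so counting the groups gives the distinct count and summing the group sizes gives the total. Here is a minimal sketch in Python's sqlite3 using a hypothetical two-column table. With the question's 15+ columns, list all of them in GROUP BY. The sketch cross-checks the result against the nested-subquery answer.

```python
import sqlite3

# Hypothetical table with one duplicated row.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE tbl (a INTEGER, b TEXT);
    INSERT INTO tbl VALUES (1, 'x'), (1, 'x'), (2, 'y'), (1, 'y');
""")

# One group per distinct (a, b) row; COUNT(*) counts groups, SUM(cnt) counts rows.
grouped = conn.execute("""
    SELECT COUNT(*) AS distinct_count, SUM(cnt) AS all_count
    FROM (SELECT COUNT(*) AS cnt FROM tbl GROUP BY a, b) AS g
""").fetchone()

# Same numbers via a scalar subquery over SELECT DISTINCT *.
nested = conn.execute("""
    SELECT COUNT(*) AS row_count,
           (SELECT COUNT(*) FROM (SELECT DISTINCT * FROM tbl) AS d) AS distinct_count
    FROM tbl
""").fetchone()

print(grouped)  # (distinct_count, all_count)
print(nested)   # (row_count, distinct_count)
```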
I think I have understood what you are looking for. You need to use a window function. So, your query should look like this:
Select COUNT(*) OVER() YourRowcount ,
COUNT(*) OVER(Partition BY YourColumnofGroup) YourDistinctCount --rows per group (basis for the distinct count)
FROM Yourtable
Update:
select top 1
COUNT(*) OVER() YourRowcount,
DENSE_RANK() OVER(ORDER BY YourColumn) YourDistinctCount
FROM Yourtable ORDER BY YourDistinctCount DESC
Note: This code is written for SQL Server. Please check the code and let me know.
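The DENSE_RANK trick works because the highest dense rank equals the number of distinct values. SQLite has no TOP, so this minimal sketch (Python's sqlite3, hypothetical `tbl` holding 1, 2, 1, 3) uses LIMIT 1 instead. Two caveats apply. First, DENSE_RANK treats NULL as a value, while COUNT(DISTINCT col) ignores it. Second, to count distinct whole rows you would order by every column.

```python
import sqlite3

# Hypothetical single-column table with one duplicate value.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE tbl (col INTEGER);
    INSERT INTO tbl VALUES (1), (2), (1), (3);
""")

# COUNT(*) OVER () is the total row count on every row; the largest
# DENSE_RANK is the number of distinct values. LIMIT 1 replaces TOP 1.
row = conn.execute("""
    SELECT COUNT(*) OVER () AS row_count,
           DENSE_RANK() OVER (ORDER BY col) AS distinct_count
    FROM tbl
    ORDER BY distinct_count DESC
    LIMIT 1
""").fetchone()

print(row)  # (row_count, distinct_count)
```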

How to create an additional column with the percentages related to a count distinct statement

I'm trying to query each distinct medical speciality (e.g. oncologist, pediatrician, etc.) in a table and then count the number of times a claim (claim_id) is linked to it, which I've done using this:
select distinct specialization, count(distinct claim_id) AS Claim_Totals
from table1
group by specialization
order by Claim_Totals DESC
However, I also want to include an additional column which lists the % that each speciality makes up in the table (based on the number of claim_id related to it). So for instance, if there were 100 total claims and "cardiologist" had 25 claim_id records related to it, "oncologist" had 15, "general surgeon" had 10, and so forth, I want the output to look like this:
specialization  | Claim_Totals | PERCENTAGE
----------------+--------------+-----------
cardiologist    | 25           | 25%
oncologist      | 15           | 15%
general surgeon | 10           | 10%
Could you do this? I'm not familiar with Barbaros's syntax. If that works, it's more concise and better.
select specialization, count(distinct claim_id) AS Claim_Totals, count(distinct claim_id)/total_claims AS percentage
from table1
INNER JOIN ( SELECT COUNT(DISTINCT claim_id)*1.0000 AS total_claims
FROM table1 ) TMP
ON 1 = 1
group by specialization, total_claims
order by Claim_Totals DESC
select specialization,
count(distinct claim_id) AS claim_by_spec,
count(distinct claim_id)/
( SELECT COUNT(DISTINCT claim_id)*1.0000
FROM table1 ) AS percentage_calc
from table1
group by specialization
order by claim_by_spec DESC
You can use sum(count(distinct claim_id)) over() to get the overall claims and use it in the denominator to get the percentage.
select specialization
,count(distinct claim_id) AS Claim_Totals
,round(100.0*count(distinct claim_id)/sum(count(distinct claim_id)) over(),3) as percentage
from table1
group by specialization
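To append a % sign, you can use
,concat_ws('',round(100.0*count(distinct claim_id)/sum(count(distinct claim_id)) over(),3),'%') as percentage
or
,concat(round(100.0*count(distinct claim_id)/sum(count(distinct claim_id)) over(),3),'%') as percentage
as the last item of the select list instead.
By the way, distinct before specialization in the select list is redundant, since specialization is already in the group by list.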
Because you are using count(distinct), window functions are less useful: summing the per-specialization distinct counts counts a claim once for every specialization it is linked to. You can try:
select t1.specialization,
count(distinct t1.claim_id) AS Claim_Totals,
count(distinct t1.claim_id) * 1.0 / tt1.num_claims AS percentage
from table1 t1 cross join
(select count(distinct claim_id) as num_claims
from table1
) tt1
group by t1.specialization, tt1.num_claims
order by Claim_Totals DESC
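Here is a minimal sketch in Python's sqlite3 showing why the cross join and the windowed sum can disagree when you use count(distinct). It assumes SQLite 3.25 or later for the window function. The data is hypothetical: claim 2 is linked to both specializations, so there are 3 distinct claims, but the per-specialization distinct counts add up to 4.

```python
import sqlite3

# Hypothetical claims; claim 2 belongs to two specializations.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table1 (specialization TEXT, claim_id INTEGER);
    INSERT INTO table1 VALUES
        ('cardiologist', 1), ('cardiologist', 2),
        ('oncologist', 2), ('oncologist', 3);
""")

# Denominator = distinct claims in the whole table (3).
cross = conn.execute("""
    SELECT t1.specialization,
           COUNT(DISTINCT t1.claim_id) AS claim_totals,
           100.0 * COUNT(DISTINCT t1.claim_id) / tt1.num_claims AS pct
    FROM table1 t1
    CROSS JOIN (SELECT COUNT(DISTINCT claim_id) AS num_claims FROM table1) tt1
    GROUP BY t1.specialization, tt1.num_claims
    ORDER BY t1.specialization
""").fetchall()

# Denominator = sum of the per-group distinct counts (4), so claim 2 counts twice.
windowed = conn.execute("""
    SELECT specialization,
           COUNT(DISTINCT claim_id) AS claim_totals,
           100.0 * COUNT(DISTINCT claim_id)
                 / SUM(COUNT(DISTINCT claim_id)) OVER () AS pct
    FROM table1
    GROUP BY specialization
    ORDER BY specialization
""").fetchall()

print(cross)
print(windowed)
```

When every claim belongs to exactly one specialization, the two queries agree. Otherwise, choose the denominator that matches what "percentage of claims" should mean for your report.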

Calculate percentage of group using GROUP BY

I am doing a GROUP BY and COUNT(*) on a dataset, and I would like to calculate the percentage of each group over the total.
For example, in this query, I would like to know how much the count(*) for each state represents over the total (select count(*) from publicdata:samples.natality):
SELECT state, count(*)
FROM [publicdata:samples.natality]
GROUP by state
There are several ways to do it in SQL, but I haven't found a way to do it in BigQuery. Does anyone know?
Thanks!
Check out ratio_to_report, one of the recently announced window functions:
SELECT state, ratio * 100 AS percent FROM (
SELECT state, count(*) AS total, RATIO_TO_REPORT(total) OVER() AS ratio
FROM [publicdata:samples.natality]
GROUP by state
)
state  percent
AL     1.4201828131159113
AK     0.23521048665998198
AZ     1.3332896746620975
AR     0.7709591206172346
CA     10.008298605982642
Adapting Felipe's answer to BigQuery's standard SQL dialect (instead of the Legacy SQL dialect) looks like this:
select state, 100*(state_count / total) as pct
from (
SELECT state, count(*) AS state_count, sum(count(*)) OVER() AS total
FROM `bigquery-public-data.samples.natality`
GROUP by state
) s
Documentation of the standard SQL BigQuery aggregate analytic functions (aka 'window functions') is here:
https://cloud.google.com/bigquery/docs/reference/standard-sql/analytic-function-concepts
You can use a window function to get the percentage of total by group, without the need for a subquery (improving on evan_b's solution):
SELECT
state
,count(*) / (sum(count(*)) OVER()) as pct
FROM
`bigquery-public-data.samples.natality`
GROUP BY
state
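You can try the same window pattern locally with Python's sqlite3 and a hypothetical ten-row `natality` table. It assumes SQLite 3.25 or later for window functions. One difference matters here. In BigQuery standard SQL, `/` returns FLOAT64 for integer operands, but SQLite truncates integer division. So one operand has to be non-integer (100.0 below), or every share comes out as 0.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE natality (state TEXT)")
# Hypothetical rows: 2 AK, 3 AL, 5 CA (10 in total).
conn.executemany(
    "INSERT INTO natality VALUES (?)",
    [("AK",)] * 2 + [("AL",)] * 3 + [("CA",)] * 5,
)

# SUM(COUNT(*)) OVER () is the grand total, evaluated after GROUP BY.
pct = conn.execute("""
    SELECT state, COUNT(*) * 100.0 / SUM(COUNT(*)) OVER () AS pct
    FROM natality
    GROUP BY state
    ORDER BY state
""").fetchall()

# Without a non-integer operand, SQLite performs integer division.
truncated = conn.execute("""
    SELECT state, COUNT(*) / SUM(COUNT(*)) OVER () AS share
    FROM natality
    GROUP BY state
    ORDER BY state
""").fetchall()

print(pct)
print(truncated)
```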
You can do a self join against the total, using a dummy value as a key. For example:
SELECT
t1.state AS state,
t1.cnt AS cnt,
100 * t1.cnt / t2.total as percent
FROM (
SELECT
state,
COUNT(*) AS cnt,
1 AS key
FROM
[publicdata:samples.natality]
WHERE state is not null
GROUP BY
state) AS t1
JOIN (
SELECT
COUNT(*) AS total,
1 AS key
FROM
[publicdata:samples.natality]) AS t2
ON t1.key = t2.key
ORDER BY percent DESC
Johnny V's solution returns fractions rather than percentages for me. To get actual percentages, I found that simply multiplying by 100 works:
SELECT
sex
,COUNT(*) / (SUM(COUNT(*))OVER()) * 100 AS percentage
FROM `powerful-hall-355408.comic_characters_wikia.dc_comics`
GROUP BY sex

SELECT * WHERE val = MIN ( val )?

I'm fairly new to Oracle SQL, but its logic is already beginning to confuse me. I'm trying to select all columns from a table where a particular column, PRICE, has the minimum value.
This works:
SELECT MIN(PRICE) FROM my_tab;
This returns me the minimum value. But why can't I select all the columns in that row? The following won't work:
SELECT * FROM my_tab WHERE PRICE = MIN( PRICE );
What am I missing here? Cheers folks!
*EDIT*
Here is the full code I'm having trouble with:
SELECT * FROM ( SELECT c.NAME, o.* FROM customers c JOIN customer_orders o ON c.CUST_NBR = o.CUST_NBR ) AS t WHERE t.PRICE = ( SELECT MIN( t.PRICE) FROM t );
SELECT * FROM TABLE WHERE PRICE = (SELECT MIN(PRICE) FROM TABLE)
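Edit: for the query in your edit, name the joined result in a WITH clause so the subquery can see it:
WITH t AS
(SELECT c.NAME, o.* FROM customers c JOIN customer_orders o ON c.CUST_NBR = o.CUST_NBR)
SELECT * FROM t
WHERE t.PRICE = (SELECT MIN(PRICE) FROM t)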
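Here is a minimal sketch of the CTE form applied to the join from the question's edit, run through Python's sqlite3. The `customers` and `customer_orders` rows are hypothetical; only the column names come from the question. Because the joined result is named `t` in a WITH clause, the inner `SELECT MIN(PRICE) FROM t` can refer to it. The derived-table alias in the question's query is not visible to that subquery.

```python
import sqlite3

# Hypothetical rows shaped like the tables in the question's edit.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (CUST_NBR INTEGER, NAME TEXT);
    CREATE TABLE customer_orders (ORDER_NBR INTEGER, CUST_NBR INTEGER, PRICE REAL);
    INSERT INTO customers VALUES (1, 'Ann'), (2, 'Bob');
    INSERT INTO customer_orders VALUES (100, 1, 9.5), (101, 2, 3.0), (102, 1, 7.0);
""")

# The CTE is referenced twice: once as the row source, once in the MIN subquery.
rows = conn.execute("""
    WITH t AS (
        SELECT c.NAME, o.*
        FROM customers c
        JOIN customer_orders o ON c.CUST_NBR = o.CUST_NBR
    )
    SELECT * FROM t
    WHERE t.PRICE = (SELECT MIN(PRICE) FROM t)
""").fetchall()

print(rows)  # (NAME, ORDER_NBR, CUST_NBR, PRICE)
```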
You can also use a subquery to get the result:
select t1.*
from my_tab t1
inner join
(
SELECT MIN(PRICE) MinPrice
FROM my_tab
) t2
on t1.price = t2.minprice
See the previous SO question, especially the answer by "Vash", which is best for your purposes. Note that you probably want to avoid a subselect, since Oracle may be smart enough to use an index on the price, if one is available, to look at only one record.
Most databases have either TOP 1 or LIMIT clauses for questions like these; Oracle had neither before version 12c, which added FETCH FIRST n ROWS ONLY.
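The `= (SELECT MIN(...))` filter and the join differ from a TOP 1 or LIMIT 1 query in one useful way: they return every row that ties for the minimum, while a TOP 1 or LIMIT 1 query returns only one. Here is a minimal sketch with Python's sqlite3 and a hypothetical `my_tab`:

```python
import sqlite3

# Hypothetical rows; two items tie for the lowest price.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE my_tab (id INTEGER, item TEXT, price REAL);
    INSERT INTO my_tab VALUES (1, 'pen', 2.5), (2, 'ink', 1.0), (3, 'cap', 1.0);
""")

# Filter against a scalar subquery: keeps all rows at the minimum price.
by_filter = conn.execute("""
    SELECT * FROM my_tab
    WHERE price = (SELECT MIN(price) FROM my_tab)
    ORDER BY id
""").fetchall()

# Join against the one-row derived table holding the minimum.
by_join = conn.execute("""
    SELECT t1.*
    FROM my_tab t1
    INNER JOIN (SELECT MIN(price) AS minprice FROM my_tab) t2
        ON t1.price = t2.minprice
    ORDER BY t1.id
""").fetchall()

# LIMIT 1 returns only one of the tied rows.
first_only = conn.execute(
    "SELECT * FROM my_tab ORDER BY price LIMIT 1"
).fetchall()

print(by_filter)
print(by_join)
print(first_only)
```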