Calculating percentages across groups using PostgreSQL

There exists a table as below with a letter and a corresponding value.
practice=# select * from table;
letter | value
--------+---------
A | 5000.00
B | 6000.00
C | 6000.00
C | 7000.00
B | 8000.00
A | 9000.00
(6 rows)
I wish to obtain the sum of each letter through use of a GROUP BY clause, and then divide the total sum for each letter by the total value of all entries in the table as a whole - 41,000 as calculated below.
practice=# select sum(value) from table;
sum
----------
41000.00
(1 row)
When I run a GROUP BY clause in conjunction with a subquery, I am only able to calculate the percentage across each letter when I specify the total value of 41,000 in advance. Here is the query and output.
practice=# select letter, cast((group_values/41000)*100 as decimal(4,2)) as percentage from (select letter, sum(value) as group_values from table group by letter order by letter) as subquery;
letter | percentage
--------+------------
A | 34.15
B | 34.15
C | 31.71
(3 rows)
However, when attempting to obtain the total and then calculate the percentage, the query fails. Below is my attempt:
practice=# select letter, cast((group_values/sum(value))*100 as decimal(4,2)) as percentage from (select letter, value, sum(value) as group_values from table group by letter, value order by letter) as subquery;
ERROR: column "subquery.letter" must appear in the GROUP BY clause or be used in an aggregate function
LINE 1: select letter, cast((group_values/sum(value))*100 as decimal...

Since select sum(value) from table returns a scalar value, you can replace the calculated number with it:
select
letter
, cast((group_values/(select sum(value) from table))*100 as decimal(4,2)) as percentage
from (select letter, sum(value) as group_values from table group by letter order by letter) as subquery;
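The scalar-subquery idea can be tried without a PostgreSQL server. The following is a minimal sketch that runs it on an in-memory SQLite database through Python's sqlite3 module. The table name letters is an assumption (table itself is a reserved word), and ROUND(..., 2) stands in for the cast to decimal(4,2).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# "letters" is a hypothetical table name standing in for the question's table
conn.execute("CREATE TABLE letters (letter TEXT, value REAL)")
conn.executemany(
    "INSERT INTO letters VALUES (?, ?)",
    [("A", 5000), ("B", 6000), ("C", 6000), ("C", 7000), ("B", 8000), ("A", 9000)],
)

# The scalar subquery computes the grand total once; each group's sum is
# divided by it, so no hard-coded 41000 is needed.
query = """
SELECT letter,
       ROUND(SUM(value) * 100.0 / (SELECT SUM(value) FROM letters), 2) AS percentage
FROM letters
GROUP BY letter
ORDER BY letter
"""
percentages = conn.execute(query).fetchall()
for letter, pct in percentages:
    print(letter, pct)
```

Here the grouping and the division happen in one query level, so the ORDER BY sits on the outer query, where it actually determines the output order.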

Related

Sum distinct by separate ID column

I have some data of the form:
ID Value
A 2
B 2
C 3
A 2
A 2
C 3
B 2
I want to sum value by distinct IDs.
select sum(distinct value) from table would give the sum of 2 and 3 = 5. I don't want that, I want the sum of value for each ID, i.e. A=2, B=2, C=3, there's 3 distinct IDs so sum(2,2,3) = 7.
In 'sql-ish' I want something like select sum(distinct value by ID) from table. Is this possible?
Get the distinct combinations of ID and Value in a subquery and then the sum of Values:
SELECT SUM(Value) sum_value
FROM (SELECT DISTINCT ID, Value FROM tablename) t
Another way to do it is with SUM() window function:
SELECT DISTINCT SUM(MAX(Value)) OVER() sum_value
FROM tablename
GROUP BY ID
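To see the difference between SUM(DISTINCT Value) and the subquery approach, here is a small sketch on the sample data. It assumes an in-memory SQLite database driven from Python's sqlite3 module, which is not what the original answers used.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tablename (ID TEXT, Value INTEGER)")
conn.executemany(
    "INSERT INTO tablename VALUES (?, ?)",
    [("A", 2), ("B", 2), ("C", 3), ("A", 2), ("A", 2), ("C", 3), ("B", 2)],
)

# SUM(DISTINCT Value) collapses equal values across IDs: 2 + 3
naive = conn.execute("SELECT SUM(DISTINCT Value) FROM tablename").fetchone()[0]

# De-duplicate (ID, Value) pairs first, then sum: 2 + 2 + 3
per_id = conn.execute(
    "SELECT SUM(Value) FROM (SELECT DISTINCT ID, Value FROM tablename) t"
).fetchone()[0]

print(naive, per_id)
```

Note that the subquery approach relies on each ID always carrying the same Value, as in the sample data. If an ID had two different values, both would be counted.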

compare value of one column sum to other column of same table

I have a table with some columns. I would like to take the sum of one column and compare it to a single value in another column. If the sum of the first column equals the last row of the other column, that is OK; otherwise, show the difference.
Example:
Table abc has the columns code, val_1 and val_2:
code val_1 val_2
A_1 200 100
A_1 150 50
A_1 250 25
A_1 50 650
Now, if the sum of val_1 is equal to the last row of val_2 (650), that is OK; if it is not equal, show it.
SQL tables represent unordered sets. There is no "last row".
If you do have an ordering column, you can use window functions:
select t.*
from (select t.*,
sum(val_1) over (partition by code) as sum_val_1,
row_number() over (partition by code order by <orderingcol> desc) as seqnum
from t
) t
where seqnum = 1 and sum_val_1 <> val_2;
If the last value in val_2 is always the maximum, then aggregation suffices:
select code
from t
group by code
having sum(val_1) <> max(val_2);
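The aggregation variant can be sketched as follows, again assuming SQLite through Python's sqlite3 module. The A_2 rows are hypothetical and were added only so that one code fails the check; the difference column is likewise an illustrative addition.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (code TEXT, val_1 INTEGER, val_2 INTEGER)")
conn.executemany(
    "INSERT INTO t VALUES (?, ?, ?)",
    [
        ("A_1", 200, 100), ("A_1", 150, 50), ("A_1", 250, 25), ("A_1", 50, 650),
        # hypothetical rows, not in the original data, that do not balance
        ("A_2", 100, 10), ("A_2", 100, 150),
    ],
)

# Keep only the codes whose val_1 total differs from the largest val_2
mismatches = conn.execute(
    """
    SELECT code, SUM(val_1) - MAX(val_2) AS difference
    FROM t
    GROUP BY code
    HAVING SUM(val_1) <> MAX(val_2)
    """
).fetchall()
print(mismatches)
```

A_1 sums to 650 and matches, so it is filtered out; only the hypothetical A_2 is reported.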

SQL select top rows based on limit

Please help me write the select query below.
Source table
name Amount
-----------
A 2
B 3
C 2
D 7
If the limit is 5, the result table should be
name Amount
-----------
A 2
B 3
If the limit is 8, the result table should be
name Amount
-----------
A 2
B 3
C 2
You can use a window function to achieve this:
select name,
amount
from (
select t.*,
sum(amount) over (
order by name
) s
from your_table t
) t
where s <= 8;
The analytic function sum(amount) over (order by name) computes a running total: for each row, it sums amount over that row and every row that precedes it in name order.
Once you have the running total for each row, a simple where clause keeps the rows whose running total is less than or equal to the given limit.
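As a rough illustration, the running-total filter can be wrapped in a small helper and tried with both limits from the question. This sketch assumes SQLite (3.25 or later, for window functions) via Python's sqlite3 module; rows_within is a hypothetical helper name.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE your_table (name TEXT, amount INTEGER)")
conn.executemany(
    "INSERT INTO your_table VALUES (?, ?)",
    [("A", 2), ("B", 3), ("C", 2), ("D", 7)],
)

def rows_within(limit):
    """Return rows, in name order, whose running total stays within limit."""
    return conn.execute(
        """
        SELECT name, amount
        FROM (SELECT t.*, SUM(amount) OVER (ORDER BY name) AS s
              FROM your_table t) t
        WHERE s <= ?
        ORDER BY name
        """,
        (limit,),
    ).fetchall()

print(rows_within(5))
print(rows_within(8))
```

With the default window frame, rows that tie on the ordering column share the same running total, so the ordering column should be unique (as name is here) for the cut-off to be well defined.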
More on this topic:
The SQL OVER() clause - when and why is it useful?
https://explainextended.com/2009/03/08/analytic-functions-sum-avg-row_number/

PostgreSQL using sum in where clause

I have a table with a numeric column named 'capacity'. I want to select the first rows whose total capacity is no greater than X, something like this query:
select * from table where sum(capacity )<X
But I know I cannot use aggregate functions in the where clause. What other ways exist to solve this problem?
Here is some sample data
id| capacity
1 | 12
2 | 13.5
3 | 15
I want to list the rows, in order of id, whose sum is less than 26, so a query like this
select * from table where sum(capacity )<26 order by id
and it must give me
id| capacity
1 | 12
2 | 13.5
because 12+13.5<26
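A bit late to the party, but for future reference, the following works for a similar problem where the condition applies to each id's own total. Because it groups by id, it does not accumulate capacity across rows, so on the sample data above it returns all three ids: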
SELECT id, sum(capacity)
FROM table
GROUP BY id
HAVING sum(capacity) < 26
ORDER by id ASC;
See the PostgreSQL docs for reference on aggregate functions: https://www.postgresql.org/docs/9.1/tutorial-agg.html
Use a HAVING clause. It has to follow a GROUP BY and come before ORDER BY:
select id, sum(capacity) from table group by id having sum(capacity)<X order by id
You can use the window variant of sum to produce a cumulative sum, and then use it in the where clause. Note that window functions can't be placed directly in the where clause, so you'd need a subquery:
SELECT id, capacity
FROM (SELECT id, capacity, SUM(capacity) OVER (ORDER BY id ASC) AS cum_sum
FROM mytable) t
WHERE cum_sum < 26
ORDER BY id ASC;
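The difference between the GROUP BY ... HAVING answers and the cumulative-sum answer shows up directly on the sample data. The sketch below assumes SQLite (3.25 or later) through Python's sqlite3 module rather than PostgreSQL, and reuses the mytable name from the last query.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mytable (id INTEGER, capacity REAL)")
conn.executemany("INSERT INTO mytable VALUES (?, ?)", [(1, 12), (2, 13.5), (3, 15)])

# Grouping by id: every group holds one row, so each sum is just that
# row's own capacity and all three ids pass the < 26 test.
grouped = conn.execute(
    "SELECT id FROM mytable GROUP BY id HAVING SUM(capacity) < 26 ORDER BY id"
).fetchall()

# Window sum: cum_sum accumulates across rows in id order (12, 25.5, 40.5),
# so only the first two rows stay under 26.
running = conn.execute(
    """
    SELECT id, capacity
    FROM (SELECT id, capacity, SUM(capacity) OVER (ORDER BY id) AS cum_sum
          FROM mytable) t
    WHERE cum_sum < 26
    ORDER BY id
    """
).fetchall()

print(grouped)
print(running)
```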

SQL Query to group text based on numeric column

I have a table 'TEST' as shown below
Number | Seq | Name
-------+-------+------
123 | 1 | Hello
123 | 2 | Hi
123 | 3 | Greetings
234 | 1 | Goodbye
234 | 2 | Bye
I want to write a query, to group the table by 'Number', and select the rows with the maximum sequence number (MAX(Seq)). The output of the query would be
Number | Seq | Name
-------+-------+------
123 | 3 | Greetings
234 | 2 | Bye
How do I go about this?
EDIT: TEST is actually a table that is the result from a long query (joining multiple tables) that I have already written. I already have a (SELECT ...) statement to get the values I need. Is there a way to remove duplicate rows (with the same 'Number' as shown above) and select only the one with maximum 'Seq' value.
I am on Microsoft SQL Server 2008 (SP2)
I was hoping there would be a way to achieve this by
SELECT * FROM (SELECT ...) TEST <condition to group>
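You can use a subselect in an IN clause:
select * from test
where (number, seq) in (select number, max(seq) from test group by number)
Note that SQL Server does not accept a multi-column (row-value) IN like this, so this form only works on databases that support it, such as PostgreSQL or MySQL.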
Another option is to use a windowed ROW_NUMBER() function with a partition on the number:
With Cte As
(
Select *,
Row_Number() Over (Partition By Number Order By Seq Desc) RN
From TEST
)
Select Number, Seq, Name
From Cte
Where RN = 1
SELECT *
FROM (SELECT test.*, MAX(seq) OVER (PARTITION BY num) max_seq
FROM test) t
WHERE seq = max_seq
I changed the column name from number to num because NUMBER is a reserved word in some databases (Oracle, for example), and I gave the derived table an alias (t), which SQL Server requires. This is pretty much the same as the other answers, except that it explicitly gets the maximum sequence number for each NUM.
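You want to use an ANALYTIC function together with a conditional clause to get only the rows of TEST that you desire. A window function's alias cannot be referenced in the WHERE clause of the same SELECT, so the ranking is computed in a second CTE and filtered in the outer query.
WITH TEST AS (
...your really complex query that generates TEST...
),
Ranked AS (
SELECT
Number, Seq, Name,
RANK() OVER (PARTITION BY Number ORDER BY Seq DESC) AS aRank
FROM TEST
)
SELECT Number, Seq, Name, aRank
FROM Ranked
WHERE aRank = 1
;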
This returns the Number, Seq, Name for each Number grouping where the Seq is maximum. Yes, it also returns a column named aRank with all '1' in it...hopefully it can be ignored.
The solution to this is to do a self join on only the MAX(Seq) values.
This answer can be found at SQL Select only rows with Max Value on a Column
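For reference, the ROW_NUMBER() approach from the answers above can be checked against the sample data with the sketch below. It assumes SQLite (3.25 or later) via Python's sqlite3 module instead of SQL Server, and uses num as a hypothetical column name in place of Number.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# "num" is a hypothetical stand-in for the question's Number column
conn.execute("CREATE TABLE test (num INTEGER, seq INTEGER, name TEXT)")
conn.executemany(
    "INSERT INTO test VALUES (?, ?, ?)",
    [(123, 1, "Hello"), (123, 2, "Hi"), (123, 3, "Greetings"),
     (234, 1, "Goodbye"), (234, 2, "Bye")],
)

# Number the rows of each num group from the highest seq downwards,
# then keep only the first row of every group.
latest = conn.execute(
    """
    WITH ranked AS (
        SELECT num, seq, name,
               ROW_NUMBER() OVER (PARTITION BY num ORDER BY seq DESC) AS rn
        FROM test
    )
    SELECT num, seq, name FROM ranked WHERE rn = 1 ORDER BY num
    """
).fetchall()
print(latest)
```

ROW_NUMBER() returns exactly one row per group even if two rows tie on seq, whereas RANK() would return every tied row.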