SQL formatting for percentage - sql

Using the SQL code
select table.column, count(*) * 100.0 / sum(count(*)) over()
from table
group by table.column
I would like to use this query for its efficiency since I am working on a large DB. It generates a percentage for each value, and the two percentages sum to 100. I can't figure out a simple way to return only the percentage for the true value (1), i.e. the number of true bits divided by the number of rows in the column. Is there a simple way to do this, or do I need a different approach entirely?
An example data set would be
N, Bit
0 | 0
1 | 0
2 | 1
3 | 0
4 | 0
5 | 1
The Bit column is a nullable bit, and I am taking the percentage of true bits.
The N just stands for Number.

If you ONLY need to know the percentage of true bits, just do this:
SELECT COUNT(NULLIF(Bit, 0)) / CONVERT(Decimal, COUNT(*))
FROM Table
If you need to know the percentage of true bits by another column (N in this case), you need something like this:
SELECT N, COUNT(NULLIF(Bit, 0)) / t2.C
FROM Table, (SELECT CONVERT(Decimal, COUNT(*)) C FROM Table) t2
GROUP BY N, C

Use SUM in order to count the True Bits.
SELECT
group_column,
100.0 * SUM(CASE WHEN bit_column<>0 THEN 1 ELSE 0 END) / COUNT(*) AS p
FROM myTable
GROUP BY group_column
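As a quick check of the two approaches above against the question's sample data, here is a minimal sketch using Python's built-in sqlite3 module as a stand-in engine. The table name sample and the lowercase column names are assumptions, and SQLite is not the asker's database, so treat it as an illustration only. Both queries return just the share of true bits:

```python
import sqlite3

# In-memory SQLite database standing in for the real server (assumption).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sample (n INTEGER, bit INTEGER)")
conn.executemany(
    "INSERT INTO sample VALUES (?, ?)",
    [(0, 0), (1, 0), (2, 1), (3, 0), (4, 0), (5, 1)],  # data from the question
)

# Conditional SUM: each true bit contributes 1, everything else 0.
pct_sum = conn.execute(
    "SELECT 100.0 * SUM(CASE WHEN bit <> 0 THEN 1 ELSE 0 END) / COUNT(*) FROM sample"
).fetchone()[0]

# NULLIF turns 0 into NULL, and COUNT(expr) skips NULLs.
pct_nullif = conn.execute(
    "SELECT 100.0 * COUNT(NULLIF(bit, 0)) / COUNT(*) FROM sample"
).fetchone()[0]

print(round(pct_sum, 2), round(pct_nullif, 2))  # 33.33 33.33
```

Two of the six rows are true, so both expressions come out at one third. Multiplying by 100.0 first keeps the division out of integer arithmetic.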

Related

Select percentage of another column in postgresql

I would like to select, grouped by avfamily, the amount of records that have livingofftheland value equalling true and return it as the perc value.
Essentially column 3 divided by column 2 times 100.
select
avclassfamily,
count(distinct(malware_id)) as cc,
sum(case when livingofftheland = 'true' then 1 else 0 end),
(100.0 * (sum(case when livingofftheland = 'true' then 1 else 0 end) / (count(*)) ) ) as perc
from malwarehashesandstrings
group by avclassfamily having count(*) > 5000
order by perc desc;
Probably quite simple but my brains drawing a blank here.
select, grouped by avfamily, the amount of records that have livingofftheland value equalling true and return it as the perc value.
You could simply use avg() for this:
select
avclassfamily,
count(distinct(malware_id)) as cc,
avg(livingofftheland::int) * 100 as perc
from malwarehashesandstrings
group by avclassfamily
having count(*) > 5000
order by perc desc
livingofftheland::int turns the boolean value into 0 (false) or 1 (true). The average of this value gives you the ratio of records that satisfy the condition in the group, as a decimal number between 0 and 1, which you can then multiply by 100.
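The averaging idea can be sketched with Python's sqlite3 module. This is an illustration under assumptions: SQLite has no boolean type, so the flag is stored directly as 0/1 (which is what ::int produces in Postgres), and the table name hashes, the column names, and the data are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical miniature of the malware table; flag is already 0 or 1.
conn.execute("CREATE TABLE hashes (family TEXT, flag INTEGER)")
conn.executemany(
    "INSERT INTO hashes VALUES (?, ?)",
    [("a", 1), ("a", 0), ("a", 1), ("a", 1),
     ("b", 0), ("b", 0), ("b", 1), ("b", 0)],
)

# AVG over a 0/1 column is the fraction of rows where the flag is 1.
avg_rows = conn.execute(
    """
    SELECT family, AVG(flag) * 100 AS perc
    FROM hashes
    GROUP BY family
    ORDER BY perc DESC
    """
).fetchall()

print(avg_rows)  # [('a', 75.0), ('b', 25.0)]
```

One difference from the COUNT(*)-based versions: avg() skips NULLs, so rows where the flag is NULL drop out of the denominator instead of counting as false.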
I would express this as:
select avclassfamily,
count(distinct malware_id) as cc,
count(*) filter (where livingofftheland = 'true'),
( count(*) filter (where livingofftheland = 'true') * 100.0 /
count(distinct malware_id)
) as perc
from malwarehashesandstrings
group by avclassfamily
having count(*) > 5000
order by perc desc;
Note that this replaces the conditional aggregation with filter, a SQL standard construct that Postgres supports. It also puts the 100.0 right next to the /, just to be sure Postgres doesn't decide to do integer division.

Get ratio between the length of a table and one of its subsets via SQL

I have a table named A that contains a column named x. What I'm trying to do is to count the number of items that belong to a certain subset of A (more precisely, the ones that satisfy the x > 4 condition) via a single SELECT query, for example:
SELECT COUNT(*)
FROM A
WHERE x > 4;
From thereon, I'd like to calculate the ratio between the size of this particular subset of A and A as a whole, i.e. perform the following division:
size_subset / size_A
My question is - how would I combine all of these pieces into a single SQL SELECT query?
My server is down, so I can't verify the answer below:
SELECT count(case when x > 4 then x else null end) / COUNT(*) FROM A;
This is slightly better because it is just a count, not a sum (NULLs will not be counted),
but I prefer:
select (SELECT count(*) FROM A where x > 4)/(SELECT count(*) FROM A);
as I guess it may run faster.
You want conditional aggregation:
SELECT sum(case when x > 4 then 1 else 0 end) / COUNT(*)
FROM A;
There's probably a less clunky way of doing this, but:
SELECT SUM(CASE WHEN x > 4 THEN 1 ELSE 0 END) / COUNT(*) FROM A
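One thing to watch with all of the ratio queries above is integer division. PostgreSQL, SQL Server and SQLite truncate integer / integer, while MySQL's / operator returns a decimal. The sketch below uses Python's sqlite3 module purely as a stand-in (the lowercase table name a and the data are assumptions) to show the difference a 1.0 factor makes:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE a (x INTEGER)")
conn.executemany("INSERT INTO a VALUES (?)", [(i,) for i in range(1, 11)])  # x = 1..10

# Both operands are integers, so SQLite truncates 6 / 10 to 0.
int_ratio = conn.execute(
    "SELECT SUM(CASE WHEN x > 4 THEN 1 ELSE 0 END) / COUNT(*) FROM a"
).fetchone()[0]

# Multiplying by 1.0 first forces floating-point arithmetic.
real_ratio = conn.execute(
    "SELECT 1.0 * SUM(CASE WHEN x > 4 THEN 1 ELSE 0 END) / COUNT(*) FROM a"
).fetchone()[0]

print(int_ratio, real_ratio)  # 0 0.6
```

If the engine in use truncates, putting 1.0 * (or 100.0 * for a percentage) in front of the numerator is usually enough.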

Compute percents from SUM() in the same SELECT sql query

In the table my_obj there are two integer fields:
(value_a integer, value_b integer);
I am trying to compute how many times value_a = value_b, and I want to express this ratio as a percentage.
This is the code I have tried:
select sum(case when o.value_a = o.value_b then 1 else 0 end) as nb_ok,
sum(case when o.value_a != o.value_b then 1 else 0 end) as nb_not_ok,
compute_percent(nb_ok,nb_not_ok)
from my_obj as o
group by o.property_name;
compute_percent is a stored_procedure that simply does (a * 100) / (a + b)
But PostgreSQL complains that the column nb_ok doesn't exist.
How would you do that properly ?
I use PostgreSQL 9.1 with Ubuntu 12.04.
There is more to this question than it may seem.
Simple version
This is much faster and simpler:
SELECT property_name
,(count(value_a = value_b OR NULL) * 100) / count(*) AS pct
FROM my_obj
GROUP BY 1;
Result:
property_name | pct
--------------+----
prop_1 | 17
prop_2 | 43
How?
You don't need a function for this at all.
Instead of counting value_b (which you don't need to begin with) and calculating the total, use count(*) for the total. Faster, simpler.
This assumes you don't have NULL values. I.e. both columns are defined NOT NULL. The information is missing in your question.
If not, your original query is probably not doing what you think it does. If any of the values is NULL, your version does not count that row at all. You could even provoke a division-by-zero exception this way.
This version works with NULL, too. count(*) produces the count of all rows, regardless of values.
Here's how the count works:
TRUE OR NULL = TRUE
FALSE OR NULL = NULL
count() ignores NULL values. Voilà.
Operator precedence governs that = binds before OR. You could add parentheses to make it clearer:
count((value_a = value_b) OR NULL)
You can do the same with
count(NULLIF(<expression>, FALSE))
The result type of count() is bigint by default.
A division bigint / bigint truncates fractional digits.
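The counting trick and the truncation can both be seen in a small sketch. It uses Python's sqlite3 module as a stand-in for PostgreSQL (an assumption; SQLite follows the same three-valued logic for OR and also truncates integer division), with made-up rows including one NULL:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE my_obj (property_name TEXT, value_a INTEGER, value_b INTEGER)")
conn.executemany(
    "INSERT INTO my_obj VALUES (?, ?, ?)",
    [("prop_1", 1, 1), ("prop_1", 1, 2), ("prop_1", 5, 6),
     ("prop_2", 2, 2), ("prop_2", 3, 3), ("prop_2", 4, 5), ("prop_2", 7, None)],
)

# TRUE OR NULL is TRUE; FALSE OR NULL and NULL OR NULL are NULL,
# so count() only sees the rows where the two values match.
int_rows = conn.execute(
    """
    SELECT property_name,
           count(value_a = value_b OR NULL) * 100 / count(*) AS pct
    FROM my_obj
    GROUP BY property_name
    ORDER BY property_name
    """
).fetchall()

print(int_rows)  # [('prop_1', 33), ('prop_2', 50)]
```

prop_1 has 1 match in 3 rows, and 100 / 3 is cut down to 33 by the integer division. The row with a NULL in prop_2 still counts toward count(*) but not toward the matches.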
Include fractional digits
Use 100.0 (with fractional digit) to force the calculation to be numeric and thereby preserve fractional digits.
You may want to use round() with this:
SELECT property_name
,round((count(value_a = value_b OR NULL) * 100.0) / count(*), 2) AS pct
FROM my_obj
GROUP BY 1;
Result:
property_name | pct
--------------+-------
prop_1 | 17.23
prop_2 | 43.09
As an aside:
I use value_a instead of valueA. Don't use unquoted mixed-case identifiers in PostgreSQL. I have seen too many desperate questions coming from this folly. If you wonder what I am talking about, read the chapter Identifiers and Key Words in the manual.
Probably the easiest way to do this is to just use a WITH clause:
WITH data
AS (SELECT Sum(CASE WHEN o.value_a = o.value_b THEN 1 ELSE 0 END) AS nbOk,
Sum(CASE WHEN o.value_a != o.value_b THEN 1 ELSE 0 END) AS nbNotOk
FROM my_obj AS o
GROUP BY o.property_name)
SELECT nbok,
nbnotok,
Compute_percent(nbok, nbnotok)
FROM data
You might also want to try this version:
WITH total(count) AS (SELECT COUNT(*)
FROM my_obj),
matching(count) AS (SELECT COUNT(*)
FROM my_obj
WHERE value_a = value_b)
SELECT nbOk, nbNotOk, Compute_percent(nbOk, nbNotOk)
FROM (SELECT matching.count AS nbOk, total.count - matching.count AS nbNotOk
FROM total
CROSS JOIN matching) data
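The original query fails because an output alias such as nb_ok is not visible to other expressions in the same SELECT list. Computing the counts in a CTE first turns them into real columns of the outer query. A minimal sketch with Python's sqlite3 module as a stand-in engine, with compute_percent inlined as (a * 100) / (a + b) since the function itself is not shown, and with made-up rows:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE my_obj (property_name TEXT, value_a INTEGER, value_b INTEGER)")
conn.executemany(
    "INSERT INTO my_obj VALUES (?, ?, ?)",
    [("prop_1", 1, 1), ("prop_1", 1, 2), ("prop_1", 5, 6),
     ("prop_2", 2, 2), ("prop_2", 3, 3), ("prop_2", 4, 5)],
)

# The CTE materialises nb_ok / nb_not_ok so the outer SELECT can reuse them.
cte_rows = conn.execute(
    """
    WITH data AS (
        SELECT property_name,
               SUM(CASE WHEN value_a = value_b THEN 1 ELSE 0 END)  AS nb_ok,
               SUM(CASE WHEN value_a <> value_b THEN 1 ELSE 0 END) AS nb_not_ok
        FROM my_obj
        GROUP BY property_name
    )
    SELECT property_name, nb_ok, nb_not_ok,
           nb_ok * 100 / (nb_ok + nb_not_ok) AS pct  -- inlined compute_percent
    FROM data
    ORDER BY property_name
    """
).fetchall()

print(cte_rows)  # [('prop_1', 1, 2, 33), ('prop_2', 2, 1, 66)]
```

The percentages are truncated because every operand is an integer; use 100.0 instead of 100 to keep fractional digits.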

select result set row to columns transformation

I have a table remarks with columns id, story_id and like, where like can be +1 or -1.
I want my select query to return the following columns story_id, total, n_like, n_dislike where total = n_like + n_dislike without sub queries.
I am currently doing a group by on like and selecting like as like_t, count(like) as total which is giving me an output like
-- like_t --+ --- total --
-1 | 2
1 | 6
and returning two rows in result set. But what I want is to get 1 row where n_like is 6 and n_dislike is 2 and total is 8
First, LIKE is a reserved word in PostgreSQL, so you have to double-quote it. Maybe a better name should be picked for this column.
CREATE TABLE testbed (id int4, story_id int4, "like" int2);
INSERT INTO testbed VALUES
(1,1,'+1'),(1,1,'+1'),(1,1,'+1'),
(1,1,'+1'),(1,1,'+1'),(1,1,'+1'),
(1,1,'-1'),(1,1,'-1');
SELECT
story_id,
sum(CASE WHEN "like" > 0 THEN abs("like") ELSE 0 END) AS n_like,
sum(CASE WHEN "like" < 0 THEN abs("like") ELSE 0 END) AS n_dislike,
count(story_id) AS total
-- for cases +2 / -3 in the "like" field, use following construct instead
-- sum(abs("like")) AS total
FROM testbed
GROUP BY story_id;
I used abs("like") for cases when you'll have +2 or -3 in your "like" column.
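To confirm that this returns the single row the question asks for, here is a small sketch that loads the same eight rows with Python's sqlite3 module (SQLite is only a stand-in for PostgreSQL here, and the integer column types are an assumption). It counts rows instead of summing abs("like"), which is equivalent while the column only holds +1 and -1:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# "like" is quoted because LIKE is a keyword.
conn.execute('CREATE TABLE testbed (id INTEGER, story_id INTEGER, "like" INTEGER)')
conn.executemany(
    "INSERT INTO testbed VALUES (?, ?, ?)",
    [(1, 1, 1)] * 6 + [(1, 1, -1)] * 2,  # six likes, two dislikes
)

# One conditional aggregate per output column pivots the rows into columns.
pivot_row = conn.execute(
    """
    SELECT story_id,
           SUM(CASE WHEN "like" > 0 THEN 1 ELSE 0 END) AS n_like,
           SUM(CASE WHEN "like" < 0 THEN 1 ELSE 0 END) AS n_dislike,
           COUNT(*) AS total
    FROM testbed
    GROUP BY story_id
    """
).fetchone()

print(pivot_row)  # (1, 6, 2, 8)
```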

SQL: Having trouble with query that gets percentages using aggregate functions

I'm not an expert in SQL by any means, and am having a hard time getting the data I need from a query. I'm working with a single table, Journal_Entry, that has a number of columns. One column is Status_ID, which is a foreign key to a Status table with three values "Green", "Yellow", and "Red". Also, a journal entry is logged against a particular User (User_ID).
I'm trying to get the number of journal entries logged for each Status, as a percentage of the total number of journal entries logged by a particular user. So far I've got the following for a Status of 1, which is green (and I know this doesn't work):
SELECT CAST((SELECT COUNT(Journal_Entry_ID)
FROM Journal_Entry
WHERE Status_ID = 1 AND User_ID = 3 /
SELECT COUNT(Journal_Entry_ID)
FROM Journal_Entry AND User_ID = 3)) AS FLOAT * 100
I need to continue the query for the other two status ID's, 2 and 3, and ideally would like to end with the selection of three columns as percentages, one for each Status: "Green_Percent", "Yellow_Percent", and "Red_Percent".
This is probably the most disjointed question I've ever asked, so I apologize for any lack of clarity. I'll be happy to clarify as necessary. Also, I'm using SQL Server 2005.
Thanks very much.
Use:
SELECT je.statusid,
COUNT(*) AS num,
(COUNT(*) / (SELECT COUNT(*)+.0
FROM JOURNAL_ENTRY) ) * 100
FROM JOURNAL_ENTRY je
GROUP BY je.statusid
Then it's a matter of formatting the precision you want:
CAST(((COUNT(*) / (SELECT COUNT(*)+.0 FROM JOURNAL_ENTRY)) * 100)
AS DECIMAL(5,2))
...will give two decimal places. Cast the result to INT if you don't want any decimal places.
You could use a CTE to minimize the duplication:
WITH cte AS (
SELECT je.*
FROM JOURNAL_ENTRY je
WHERE je.user_id = 3)
SELECT c.statusid,
COUNT(*) AS num,
(COUNT(*) / (SELECT COUNT(*)+.0
FROM cte) ) * 100
FROM cte c
GROUP BY c.statusid
This should work:
SELECT
user_id,
100.0 * SUM(CASE WHEN status_id = 1 THEN 1 ELSE 0 END) / COUNT(*) AS pct_green,
100.0 * SUM(CASE WHEN status_id = 2 THEN 1 ELSE 0 END) / COUNT(*) AS pct_yellow,
100.0 * SUM(CASE WHEN status_id = 3 THEN 1 ELSE 0 END) / COUNT(*) AS pct_red
FROM
Journal_Entry
WHERE
user_id = 1
GROUP BY
user_id
If you don't need the user_id returned then you could get rid of that and the GROUP BY clause as long as you're only ever returning data for one user (or you want the aggregates for all users in the WHERE clause). If you want it for each user then you can keep the GROUP BY and simply get rid of the WHERE clause.
DECLARE @JournalEntry TABLE
( StatusID INT
);
INSERT INTO @JournalEntry (StatusID) VALUES
(1), (1),(1),(1),(1),(1),(1)
,(2), (2),(2),(2),(2),(2),(2)
,(3), (3),(3),(3),(3),(3),(3);
SELECT
CAST(SUM(CASE WHEN StatusID = 1 THEN 1 ELSE 0 END) AS DECIMAL) / CAST(COUNT(*) AS DECIMAL) Green
,CAST(SUM(CASE WHEN StatusID = 2 THEN 1 ELSE 0 END) AS DECIMAL) / CAST(COUNT(*) AS DECIMAL) Yellow
,CAST(SUM(CASE WHEN StatusID = 3 THEN 1 ELSE 0 END) AS DECIMAL) / CAST(COUNT(*) AS DECIMAL) Red
FROM @JournalEntry;
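The same three-column result can be reproduced outside SQL Server. This sketch uses Python's sqlite3 module as a stand-in (an assumption, as are the lowercase table and column names), loads seven entries per status as in the sample above, and scales the ratios to percentages:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE journal_entry (status_id INTEGER)")
conn.executemany(
    "INSERT INTO journal_entry VALUES (?)",
    [(status,) for status in (1, 2, 3) for _ in range(7)],  # 7 rows per status
)

# One conditional aggregate per status; 100.0 keeps the division fractional.
green, yellow, red = conn.execute(
    """
    SELECT 100.0 * SUM(CASE WHEN status_id = 1 THEN 1 ELSE 0 END) / COUNT(*),
           100.0 * SUM(CASE WHEN status_id = 2 THEN 1 ELSE 0 END) / COUNT(*),
           100.0 * SUM(CASE WHEN status_id = 3 THEN 1 ELSE 0 END) / COUNT(*)
    FROM journal_entry
    """
).fetchone()

print(round(green, 2), round(yellow, 2), round(red, 2))  # 33.33 33.33 33.33
```

To restrict the figures to one user, add a user_id column and a WHERE clause, as in the earlier answer.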