Compute percents from SUM() in the same SELECT sql query

In the table my_obj there are two integer fields:
(value_a integer, value_b integer);
I am trying to compute how many times value_a = value_b, and I want to express this ratio as a percentage.
This is the code I have tried:
select sum(case when o.value_a = o.value_b then 1 else 0 end) as nb_ok,
sum(case when o.value_a != o.value_b then 1 else 0 end) as nb_not_ok,
compute_percent(nb_ok,nb_not_ok)
from my_obj as o
group by o.property_name;
compute_percent is a stored procedure that simply does (a * 100) / (a + b).
But PostgreSQL complains that the column nb_ok doesn't exist.
How would you do that properly?
I use PostgreSQL 9.1 with Ubuntu 12.04.

There is more to this question than it may seem.
Simple version
This is much faster and simpler:
SELECT property_name
,(count(value_a = value_b OR NULL) * 100) / count(*) AS pct
FROM my_obj
GROUP BY 1;
Result:
property_name | pct
--------------+----
prop_1 | 17
prop_2 | 43
How?
You don't need a function for this at all.
Instead of counting the non-matching rows (nb_not_ok, which you don't need to begin with) and calculating the total from both counts, use count(*) for the total. Faster, simpler.
This assumes you don't have NULL values. I.e. both columns are defined NOT NULL. The information is missing in your question.
If not, your original query is probably not doing what you think it does. If any of the values is NULL, your version does not count that row at all. You could even provoke a division-by-zero exception this way.
This version works with NULL, too. count(*) produces the count of all rows, regardless of values.
Here's how the count works:
TRUE OR NULL = TRUE
FALSE OR NULL = NULL
count() ignores NULL values. Voilà.
Operator precedence governs that = binds before OR. You could add parentheses to make it clearer:
count((value_a = value_b) OR NULL)
You can do the same with
count(NULLIF(<expression>, FALSE))
The result type of count() is bigint by default.
A division bigint / bigint truncates fractional digits.
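A minimal sketch to check the counting trick on toy data. It runs the query through Python's sqlite3 module purely so the example is self-contained. SQLite is an assumption here (the answer targets PostgreSQL), but it applies the same three-valued logic to OR and the same truncating integer division. The sample rows are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE my_obj (property_name TEXT, value_a INTEGER, value_b INTEGER)"
)
conn.executemany(
    "INSERT INTO my_obj VALUES (?, ?, ?)",
    [
        ("prop_1", 1, 1),
        ("prop_1", 1, 2),
        ("prop_1", 3, None),  # NULL comparison: in the total, not a match
        ("prop_2", 5, 5),
        ("prop_2", 6, 6),
        ("prop_2", 7, 8),
    ],
)

rows = conn.execute(
    """
    SELECT property_name,
           count(value_a = value_b OR NULL)                    AS nb_ok,
           count(*)                                            AS total,
           (count(value_a = value_b OR NULL) * 100) / count(*) AS pct
    FROM my_obj
    GROUP BY property_name
    ORDER BY property_name
    """
).fetchall()

print(rows)  # [('prop_1', 1, 3, 33), ('prop_2', 2, 3, 66)]
```

Only the rows where the comparison is TRUE are counted, the row with a NULL still contributes to count(*), and the integer division drops the fractional part.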
Include fractional digits
Use 100.0 (with fractional digit) to force the calculation to be numeric and thereby preserve fractional digits.
You may want to use round() with this:
SELECT property_name
,round((count(value_a = value_b OR NULL) * 100.0) / count(*), 2) AS pct
FROM my_obj
GROUP BY 1;
Result:
property_name | pct
--------------+-------
prop_1 | 17.23
prop_2 | 43.09
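To see the effect of 100 versus 100.0 side by side, here is a small sketch on made-up rows. It assumes SQLite through Python's sqlite3 module, where 100.0 makes the expression a floating-point REAL rather than PostgreSQL's numeric, but the truncation versus rounding behaviour it shows is the same.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE my_obj (property_name TEXT, value_a INTEGER, value_b INTEGER)"
)
conn.executemany(
    "INSERT INTO my_obj VALUES (?, ?, ?)",
    [("prop_1", 1, 1), ("prop_1", 1, 2), ("prop_1", 3, 4)],  # 1 match out of 3
)

int_pct, frac_pct = conn.execute(
    """
    SELECT (count(value_a = value_b OR NULL) * 100) / count(*),
           round((count(value_a = value_b OR NULL) * 100.0) / count(*), 2)
    FROM my_obj
    """
).fetchone()

print(int_pct, frac_pct)  # 33 33.33
```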
As an aside:
I use value_a instead of valueA. Don't use unquoted mixed-case identifiers in PostgreSQL. I have seen too many desperate questions coming from this folly. If you wonder what I am talking about, read the chapter Identifiers and Key Words in the manual.

Probably the easiest way to do this is to use a WITH clause:
WITH data
AS (SELECT Sum(CASE WHEN o.value_a = o.value_b THEN 1 ELSE 0 END) AS nbOk,
Sum(CASE WHEN o.value_a != o.value_b THEN 1 ELSE 0 END) AS nbNotOk
FROM my_obj AS o
GROUP BY o.property_name)
SELECT nbok,
nbnotok,
Compute_percent(nbok, nbnotok)
FROM data;
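The reason the WITH clause helps is that column aliases defined in a SELECT list cannot be referenced in that same list, but they are ordinary columns once the aggregation sits in its own named subquery. A minimal sketch, assuming SQLite via Python's sqlite3 module and made-up rows; the compute_percent function from the question is replaced by its stated formula (a * 100) / (a + b) because the function itself is not available here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE my_obj (property_name TEXT, value_a INTEGER, value_b INTEGER)"
)
conn.executemany(
    "INSERT INTO my_obj VALUES (?, ?, ?)",
    [
        ("prop_1", 1, 1), ("prop_1", 1, 2), ("prop_1", 3, 4),
        ("prop_2", 5, 5), ("prop_2", 6, 6), ("prop_2", 7, 8),
    ],
)

rows = conn.execute(
    """
    WITH data AS (
        SELECT property_name,
               sum(CASE WHEN value_a =  value_b THEN 1 ELSE 0 END) AS nb_ok,
               sum(CASE WHEN value_a != value_b THEN 1 ELSE 0 END) AS nb_not_ok
        FROM my_obj
        GROUP BY property_name
    )
    -- nb_ok and nb_not_ok are plain columns of "data" at this level
    SELECT property_name, nb_ok, nb_not_ok,
           (nb_ok * 100) / (nb_ok + nb_not_ok) AS pct
    FROM data
    ORDER BY property_name
    """
).fetchall()

print(rows)  # [('prop_1', 1, 2, 33), ('prop_2', 2, 1, 66)]
```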

You might also want to try this version:
WITH total(count) as (SELECT COUNT(*)
FROM my_obj),
matching(count) as (SELECT COUNT(*)
FROM my_obj
WHERE value_a = value_b)
SELECT nbOk, nbNotOk, Compute_percent(nbOk, nbNotOk)
FROM (SELECT matching.count as nbOk, total.count - matching.count as nbNotOk
FROM total
CROSS JOIN matching) data

Related

Query - display zero where null in one column and select sum of two columns where not null in next column

I need to display a zero where "Silo Wt" is null, and display the sum of the two values in the Total column even if "Silo Wt" is null. The Total may not require any changes if I can get a zero in the Silo column.
SELECT DISTINCT (coffee_type) AS "Coffee_Type",
(SELECT ItemName
FROM [T01_Item_Name_TBL]
WHERE Item = B.Coffee_Type) AS "Description",
(SELECT COUNT(Green_Inventory_ID)
FROM [Green_Inventory] AS A
WHERE A.Coffee_Type = B.Coffee_Type
AND current_Quantity > 0) AS "Current Units",
SUM((Unit_Weight) * (Current_Quantity)) AS "Green Inv Wt",
(SELECT SUM(TGWeight)
FROM [P04_Green_STotal_TBL] AS C
WHERE TGItem = Coffee_type) AS "Silo Wt",
(SUM((Unit_Weight) * (Current_Quantity)) +
(SELECT SUM(TGWeight)
FROM [P04_Green_STotal_TBL] AS C
WHERE TGItem = Coffee_type)) AS Total
FROM
[Green_Inventory] AS B
WHERE
Pallet_Status = 0
GROUP BY
Coffee_Type
You just need to wrap them in ISNULL.
However, your query could do with some serious cleanup and simplification:
DISTINCT makes no sense as you are grouping by that column anyway.
Two of the subqueries can be combined using OUTER APPLY, although this requires moving the grouped Green_Inventory into a derived table.
Another subquery, the self-join on Green_Inventory, can be transformed into conditional aggregation.
Not sure whether I've got the logic right, as the subquery did not have a filter on Pallet_Status, but it looks like you would also need to move that condition into conditional aggregation for the SUM, and use a HAVING. It depends exactly on your requirements.
Don't use quoted table or column names unless you have to.
Use meaningful table aliases, rather than A B C.
Specify table names when referencing columns, especially when using subqueries, or you might get unintended results.
SELECT
gi.Coffee_Type,
(SELECT ItemName
FROM T01_Item_Name_TBL AS n
WHERE n.Item = gi.coffee_Type
) AS Description,
gi.CurrentUnits,
gi.GreenInvWt,
ISNULL(gst.TGWeight, 0) AS SiloWt,
ISNULL(gi.GreenInvWt, 0) + ISNULL(gst.TGWeight, 0) AS Total
FROM (
SELECT
gi.Coffee_Type,
COUNT(CASE WHEN gi.current_Quantity > 0 THEN 1 END) AS CurrentUnits,
SUM(CASE WHEN gi.Pallet_Status = 0 THEN gi.Unit_Weight * gi.Current_Quantity END) AS GreenInvWt
FROM
Green_Inventory AS gi
GROUP BY
gi.Coffee_Type
HAVING
SUM(CASE WHEN gi.Pallet_Status = 0 THEN gi.Unit_Weight * gi.Current_Quantity END) > 0
) AS gi
OUTER APPLY (
SELECT SUM(gst.TGWeight) AS TGWeight
FROM P04_Green_STotal_TBL AS gst
WHERE gst.TGItem = gi.Coffee_Type
) AS gst;
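The core of the fix is that NULL plus anything is NULL, so the missing silo weight has to be replaced by 0 before it is displayed or added. A small sketch of that behaviour, under several assumptions: it runs on SQLite via Python's sqlite3 module, so COALESCE stands in for ISNULL and a LEFT JOIN on a grouped subquery stands in for OUTER APPLY, and the table names, column names, and rows are simplified stand-ins rather than the real schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE green_inventory (coffee_type TEXT, unit_weight REAL, current_quantity INTEGER);
    CREATE TABLE silo (tg_item TEXT, tg_weight REAL);
    INSERT INTO green_inventory VALUES ('A', 10, 2), ('A', 10, 1), ('B', 5, 4);
    INSERT INTO silo VALUES ('A', 7);  -- coffee type 'B' has no silo rows
    """
)

rows = conn.execute(
    """
    SELECT gi.coffee_type,
           gi.green_inv_wt,
           s.silo_wt                                AS raw_silo_wt,  -- NULL for 'B'
           gi.green_inv_wt + s.silo_wt              AS raw_total,    -- NULL for 'B'
           COALESCE(s.silo_wt, 0)                   AS silo_wt,
           gi.green_inv_wt + COALESCE(s.silo_wt, 0) AS total
    FROM (SELECT coffee_type,
                 SUM(unit_weight * current_quantity) AS green_inv_wt
          FROM green_inventory
          GROUP BY coffee_type) AS gi
    LEFT JOIN (SELECT tg_item, SUM(tg_weight) AS silo_wt
               FROM silo
               GROUP BY tg_item) AS s
           ON s.tg_item = gi.coffee_type
    ORDER BY gi.coffee_type
    """
).fetchall()

for row in rows:
    print(row)
```

For coffee type 'B' the unwrapped columns come back as NULL (None in Python), while the wrapped ones show 0 and the green inventory weight alone.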

Why error: 01428. 00000 - "argument '%s' is out of range" SQL Developer

I have the following SQL script,
Select * From
(Select To_Char(Bmret.Pricedate, 'dd-mm-yyyy') As Pricedate, Bmret.Bmval, Bmret.id
, Cast(Exp(Sum(Ln(Cast(Bmret.Bmval As number))) Over (Partition By bmret.id)) As Number) As Twr
, RANK() OVER (PARTITION BY bmret.id ORDER BY bmret.pricedate asc) AS rank
From Tab_A Bmret
Where 1=1
) B
Where 1=1
And B.Rank=1
;
This should give me a column, twr, that contains the product of the elements in column Bmval across pricedates, grouped by id.
However, I obtain the following error: 01428. 00000 - "argument '%s' is out of range".
I am aware that the error stems from the part Cast(Exp(Sum(Ln(Cast(Bmret.Bmval As number))) Over (Partition By bmret.id)) As Number) of the code and in particular that the "parameter passed into the function was not a valid value". Hence, my question is, is there any way to identify the id with values that are not valid?
I am not allowed to share the sample data. I am sorry.
Thank you in advance.
Best regards,
Please check the value of Cast(Bmret.Bmval As number). It must be greater than 0.
For further read:
https://www.techonthenet.com/oracle/functions/ln.php
Oracle / PLSQL: LN Function
This Oracle tutorial explains how to use the Oracle/PLSQL LN function with syntax and examples.
Description: The Oracle/PLSQL LN function returns the natural logarithm of a number.
Syntax: LN( number )
Parameters or Arguments: number: The numeric value used to calculate the natural logarithm. It must be greater than 0.
You need to define what Ln(Cast(Bmret.Bmval As number)) should be when Bmret.Bmval <= 0. If you define it as 0 (which might not be correct for the calculation), then your query would be:
Select * From
(Select To_Char(Bmret.Pricedate, 'dd-mm-yyyy') As Pricedate, Bmret.Bmval, Bmret.id
, Cast(Exp(Sum(case when Cast(Bmret.Bmval As number)>0 then Ln(Cast(Bmret.Bmval As number)) else 0 end) Over (Partition By bmret.id)) As Number) As Twr
, RANK() OVER (PARTITION BY bmret.id ORDER BY bmret.pricedate asc) AS rank
From Tab_A Bmret
Where 1=1
) B
Where 1=1
And B.Rank=1;
As @Kazi said, and as earlier answers had already mentioned, the issue is with using ln() with a negative number or zero. The documentation says:
LN returns the natural logarithm of n, where n is greater than 0.
so you can identify the IDs with out-of-range values with:
select id from tab_a where bmval <= 0
As you want the product of several numbers, you probably still want to include those values; but then having a zero amongst them should make the result zero, one negative number should make the result negative, two should make it positive, etc.
You can use the absolute value of your numbers for the calculation, and at the same time count how many negative values there are - then if that count of negatives is an odd number, multiply the whole result by -1.
Adapting the answer to your previous question, and changing the table and column names to match this question, that would be:
select to_char(a1.pricedate, 'dd-mm-yyyy') as pricedate, b1.bm, a1.bmval,
round(cast(exp(sum(ln(cast(abs(a1.bmval) as binary_double))) over (partition by b1.bmik)) as number))
*
case
when mod(count(case when a1.bmval < 0 then pricedate end) over (partition by b1.bmik), 2) = 0
then 1
else -1
end as product
from tab_a a1
inner join benchmarkdefs b1 on (a1.id = b1.bmik);
db<>fiddle with a group that has two negatives (which cancel out), one negative (which is applied), and one with a zero - where the product ends up as zero, as you'd hopefully expect.
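The arithmetic behind this approach can be checked outside the database. The sketch below is plain Python, not Oracle SQL, and the function name is hypothetical: it computes the magnitude as exp of the summed logarithms of absolute values, applies the sign from the parity of the negative count, and handles zeros explicitly because the logarithm of 0 is undefined.

```python
import math


def product_via_logs(values):
    """Product of numbers using exp(sum(ln(abs(x)))) plus sign tracking."""
    # A single zero makes the whole product zero; ln(0) is undefined,
    # so this case is handled before taking any logarithm.
    if any(v == 0 for v in values):
        return 0
    negatives = sum(1 for v in values if v < 0)
    magnitude = math.exp(sum(math.log(abs(v)) for v in values))
    sign = -1 if negatives % 2 else 1
    # round() removes the tiny floating-point error introduced by exp/ln
    return sign * round(magnitude)


print(product_via_logs([1, 2, 3, 4, 5]))  # 120
print(product_via_logs([2, -3, 4]))       # -24 (one negative)
print(product_via_logs([-2, -3]))         # 6   (two negatives cancel)
print(product_via_logs([5, 0, -1]))       # 0
```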
The point of the cast() calls was to improve performance, as noted in the old question I linked to, by performing the exp/ln part as binary_double; there is no point casting a number to number. If you don't want the binary_double part then you can take the casts out completely; but then you do also have to deal with zeros as well as negative values, e.g. keeping track of whether you have any of those too:
select to_char(a1.pricedate, 'dd-mm-yyyy') as pricedate, b1.bm, a1.bmval,
round(exp(sum(ln(abs(nullif(a1.bmval, 0)))) over (partition by b1.bmik)))
*
case when min(abs(a1.bmval)) over (partition by b1.bmik) = 0 then 0 else 1 end
*
case
when mod(count(case when a1.bmval < 0 then pricedate end) over (partition by b1.bmik), 2) = 0
then 1
else -1
end as product
from tab_a a1
inner join benchmarkdefs b1 on (a1.id = b1.bmik);
db<>fiddle
For this query, which just gets values for the first date and product across all dates, that would translate (without casting) to:
select * from
(
select to_char(bmret.pricedate, 'dd-mm-yyyy') as pricedate, bmret.bmval, bmret.id
, round(exp(sum(ln(abs(nullif(bmret.bmval, 0)))) over (partition by bmret.id)))
*
case when min(abs(bmret.bmval)) over (partition by bmret.id) = 0 then 0 else 1 end
*
case
when mod(count(case when bmret.bmval < 0 then pricedate end) over (partition by bmret.id), 2) = 0
then 1
else -1
end as twr
, rank() over (partition by bmret.id order by bmret.pricedate asc) as rank
from tab_a bmret
) b
where b.rank=1
PRICEDATE  | BMVAL | ID | TWR      | RANK
-----------+-------+----+----------+-----
11-08-2021 |     1 |  1 |      120 |    1
11-08-2021 |    12 |  2 |   524160 |    1
11-08-2021 |    22 |  3 | -7893600 |    1
11-08-2021 |     1 |  4 |        0 |    1
db<>fiddle
As you were told in an old answer, if you don't want to see the (not very interesting) rank column then change select * from to select pricedate, bmval, id, twr from in the outer query.
You could also use aggregation with keep to avoid needing an inline view:
select to_char(min(pricedate), 'dd-mm-yyyy') as pricedate
, min(bmret.bmval) keep (dense_rank first order by pricedate) as bmval
, min(bmret.id) keep (dense_rank first order by pricedate) as id
, round(exp(sum(ln(abs(nullif(bmret.bmval, 0))))))
*
case when min(abs(bmret.bmval)) = 0 then 0 else 1 end
*
case
when mod(count(case when bmret.bmval < 0 then pricedate end), 2) = 0
then 1
else -1
end as twr
from tab_a bmret
group by bmret.id
PRICEDATE  | BMVAL | ID | TWR
-----------+-------+----+----------
11-08-2021 |     1 |  1 |      120
11-08-2021 |    12 |  2 |   524160
11-08-2021 |    22 |  3 | -7893600
11-08-2021 |     1 |  4 |        0
db<>fiddle

SQL: identify if there are multiples (not duplicates) in a column

I am currently struggling to find a way to identify certain patterns in my data using SSMS.
I wish to identify rows that contain multiples (x*2, x*3, or x*4) of an entry within the same column.
I really have no clue on how to even start my "where" statement right now.
SELECT [numbers], [product_ID]
FROM [db].[dbo].[tablename]
WHERE [numbers] = numbers*2
My problem is that with the code above I can obviously only identify zeros.
Google only helps me out with finding duplicates but I can't find a way to identify multiples of a value...
My desired result would be a table that only contains numbers (linked to product_IDs) that are multiples of each other
Anyone can help me out here?
If a column contains multiples, then all are multiples of the smallest non-zero value. Let me assume the values are positive or zero for this purpose.
So, you can determine if this is the case using window functions and modulo arithmetic:
select t.*
from (select t.*,
min(case when number > 0 then number end) over () as min_number
from t
) t
where number % min_number = 0 or min_number = 1;
If you want to know if all numbers meet this criterion, use aggregation:
select (case when max(number % min_number) = 0 then 'all multiples' else 'oops' end)
from (select t.*,
min(case when number > 0 then number end) over () as min_number
from t
) t
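A small sketch of the window-function filter on made-up data. It assumes SQLite (3.25 or later, for window function support) through Python's sqlite3 module instead of SQL Server, and the table t with columns product_id and number is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (product_id INTEGER, number INTEGER)")
conn.executemany(
    "INSERT INTO t VALUES (?, ?)",
    [(1, 3), (2, 6), (3, 7), (4, 12), (5, 0)],
)

rows = conn.execute(
    """
    SELECT product_id, number
    FROM (SELECT t.*,
                 -- smallest strictly positive value across the whole table
                 min(CASE WHEN number > 0 THEN number END) OVER () AS min_number
          FROM t) AS s
    WHERE number % min_number = 0
    ORDER BY product_id
    """
).fetchall()

print(rows)  # [(1, 3), (2, 6), (4, 12), (5, 0)]
```

The smallest positive value is 3, so 7 is filtered out. Note that 0 passes the test as well, because 0 % 3 is 0.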
My desired result would be a table that only contains numbers (linked to product_IDs) that are multiples of each other
You'll need to test all pairs of rows, which means a CROSS JOIN.
Something like this:
with q as
(
SELECT a.[numbers],
a.[product_ID],
cast(a.numbers as float) / nullif(b.numbers, 0) ratio
FROM [tablename] a
CROSS JOIN [tablename] b
)
select *
from q
where ratio = cast(ratio as bigint)
and ratio > 1
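The pairwise idea can be tried on a few made-up rows. This sketch swaps the float ratio for an integer modulo test, which avoids floating-point comparisons, and it assumes SQLite via Python's sqlite3 module instead of SQL Server. The table name and data are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tablename (product_id INTEGER, numbers INTEGER)")
conn.executemany(
    "INSERT INTO tablename VALUES (?, ?)",
    [(1, 3), (2, 6), (3, 7), (4, 12)],
)

rows = conn.execute(
    """
    SELECT a.product_id,
           a.numbers,
           b.product_id          AS base_product_id,
           b.numbers             AS base_numbers,
           a.numbers / b.numbers AS factor
    FROM tablename AS a
    CROSS JOIN tablename AS b
    WHERE b.numbers > 0                -- never divide by zero
      AND a.numbers > b.numbers        -- strict multiples only, no self-pairs
      AND a.numbers % b.numbers = 0    -- a is an exact multiple of b
    ORDER BY a.numbers, b.numbers
    """
).fetchall()

for row in rows:
    print(row)
# (2, 6, 1, 3, 2)
# (4, 12, 1, 3, 4)
# (4, 12, 2, 6, 2)
```

A cross join compares every row with every other row, so the cost grows quadratically with the table size.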

SQL formatting for percentage

Using the SQL code
select table.column, count(*) * 100.0 / sum(count(*)) over()
from table
group by table.column
I would like to use this function for its efficiency since I am working on a large DB. The function generates both values of percentage which sum to 100. I can't figure out a simple way to only generate the true value (1) or value of summed number over number of rows in the column. Is there a simple way I can do this or do I need to use a different function entirely ?
An example data set would be
N, Bit
0 | 0
1 | 0
2 | 1
3 | 0
4 | 0
5 | 1
It is a bit, null table where I am taking the percentage of true bits.
The N just stands for Number.
If you ONLY need to know the percentage of true bits, just do this:
SELECT COUNT(NULLIF(Bit, 0)) / CONVERT(Decimal, COUNT(*))
FROM Table
If you need to know the percentage of true bits by another column (N in this case), you need something like this:
SELECT N, COUNT(NULLIF(Bit, 0)) / t2.C
FROM Table, (SELECT CONVERT(Decimal, COUNT(*)) C FROM Table) t2
GROUP BY N, C
Use SUM in order to count the True Bits.
SELECT
group_column,
100.0 * SUM(CASE WHEN bit_column<>0 THEN 1 ELSE 0 END) / COUNT(*) AS p
FROM myTable
GROUP BY group_column
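Applied to the six sample rows from the question, the SUM-based form can be checked like this. The sketch assumes SQLite via Python's sqlite3 module, stores the bit as an integer, uses a hypothetical table name, and adds round() only to keep the printed value short.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE my_table (n INTEGER, bit_column INTEGER)")
conn.executemany(
    "INSERT INTO my_table VALUES (?, ?)",
    [(0, 0), (1, 0), (2, 1), (3, 0), (4, 0), (5, 1)],
)

pct_true = conn.execute(
    """
    SELECT round(100.0 * sum(CASE WHEN bit_column <> 0 THEN 1 ELSE 0 END) / count(*), 2)
    FROM my_table
    """
).fetchone()[0]

print(pct_true)  # 33.33 (2 true bits out of 6 rows)
```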

Group by multiple criteria

Given the table like
| userid | active | anonymous |
| 1 | t | f |
| 2 | f | f |
| 3 | f | t |
I need to get:
number of users
number of users with 'active' = true
number of users with 'active' = false
number of users with 'anonymous' = true
number of users with 'anonymous' = false
with single query.
So far, I have only come up with a solution using union:
SELECT count(*) FROM mytable
UNION ALL
SELECT count(*) FROM mytable where active
UNION ALL
SELECT count(*) FROM mytable where anonymous
So I can take the first number and find the non-active and non-anonymous users by simple subtraction.
Is there any way to get rid of union and calculate number of records matching these simple conditions with some magic and efficient query in PostgreSQL 9?
You can use an aggregate function with a CASE to get the result in separate columns:
select
count(*) TotalUsers,
sum(case when active = 't' then 1 else 0 end) TotalActiveTrue,
sum(case when active = 'f' then 1 else 0 end) TotalActiveFalse,
sum(case when anonymous = 't' then 1 else 0 end) TotalAnonTrue,
sum(case when anonymous = 'f' then 1 else 0 end) TotalAnonFalse
from mytable;
See SQL Fiddle with Demo
Assuming your columns are boolean NOT NULL, this should be a bit faster:
SELECT total_ct
,active_ct
,(total_ct - active_ct) AS not_active_ct
,anon_ct
,(total_ct - anon_ct) AS not_anon_ct
FROM (
SELECT count(*) AS total_ct
,count(active OR NULL) AS active_ct
,count(anonymous OR NULL) AS anon_ct
FROM tbl
) sub;
Find a detailed explanation for the techniques used in this closely related answer:
Compute percents from SUM() in the same SELECT sql query
Indexes are hardly going to be of any use, since the whole table has to be read anyway. A covering index might be of help if your rows are bigger than in the example. Depends on the specifics of your actual table.
-> SQLfiddle comparing to @bluefeet's version with CASE statements for each value.
SQL Server folks are not used to the proper boolean type of Postgres and tend to go the long way round.
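The subquery version can be checked against the three sample users from the question. This is a sketch under the assumption that SQLite via Python's sqlite3 module is an acceptable stand-in: SQLite has no native boolean type, so the flags are stored as 1 and 0, but count(... OR NULL) behaves the same way.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE tbl (userid INTEGER, active BOOLEAN NOT NULL, anonymous BOOLEAN NOT NULL)"
)
conn.executemany(
    "INSERT INTO tbl VALUES (?, ?, ?)",
    [(1, True, False), (2, False, False), (3, False, True)],
)

counts = conn.execute(
    """
    SELECT total_ct,
           active_ct,
           total_ct - active_ct AS not_active_ct,
           anon_ct,
           total_ct - anon_ct   AS not_anon_ct
    FROM (SELECT count(*)                  AS total_ct,
                 count(active OR NULL)     AS active_ct,
                 count(anonymous OR NULL)  AS anon_ct
          FROM tbl) AS sub
    """
).fetchone()

print(counts)  # (3, 1, 2, 1, 2)
```

The table is scanned once, and the "false" counts come from subtraction, which is only valid because both columns are NOT NULL.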