Division of integers returns 0 - sql

I feel like I'm missing something obvious. I am trying to test out the distribution of random(). Here is the table:
create table test (
id int,
random_float float,
random_int int
);
Here is what I want to do:
truncate table test;
insert into test (id)
values (generate_series(1,1000));
update test
set
random_float = random() * 10 + 1;
update test
set
random_int = trunc(random_float);
select
random_int,
count(random_int) as Count,
cast( count(random_int) / max(id) as float) as Percent
from test
group by random_int
order by random_int;
However, the "Percent" column returns zero for every record. I tried casting it as float and as decimal, and I tried changing the random_int column to decimal instead of integer, always with the same result.
Any insight as to what I am doing wrong?

You should cast before you divide, but also you were missing a subquery to get the total count from the table. Here's the sample.
select
random_int,
count(random_int) as Count,
cast(count(random_int) as decimal(7,2)) / cast((select count(random_int) from test) as decimal(7,2)) as Percent
from test
group by random_int
order by random_int;
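To see why the position of the cast matters, here is a minimal sketch that runs both variants side by side. It uses Python's built-in sqlite3 module as a stand-in for the real database (an assumption; integer/integer division truncates in SQLite just as it does in PostgreSQL), and the ten sample rows are made up for illustration.

```python
import sqlite3

# In-memory SQLite stands in for the real database (assumption).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE test (id INTEGER, random_int INTEGER)")

# Hypothetical sample: ten rows, three in bucket 1 and seven in bucket 2.
sample = [(i, 1 if i <= 3 else 2) for i in range(1, 11)]
conn.executemany("INSERT INTO test VALUES (?, ?)", sample)

query = """
SELECT random_int,
       COUNT(random_int) AS cnt,
       -- cast applied after the integer division: the fraction is already gone
       CAST(COUNT(random_int) / (SELECT COUNT(*) FROM test) AS REAL) AS cast_after,
       -- cast applied to an operand first: the division is done in floating point
       CAST(COUNT(random_int) AS REAL) / (SELECT COUNT(*) FROM test) AS cast_before
FROM test
GROUP BY random_int
ORDER BY random_int
"""
result = conn.execute(query).fetchall()
for row in result:
    print(row)
```

The cast_after column is 0.0 for every bucket, while cast_before holds the actual fractions.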

Try this query instead:
select
random_int,
count(random_int) as Count,
cast( count(random_int) / max(id) as float) as Percent,
(100.0 * count(random_int) / max(id))::numeric(5,2) as pct
from test
group by random_int
order by random_int;
PostgreSQL has a strong type system. In your case, the result type is determined by the count() function, which returns bigint (int8), and the id column, which is integer. Dividing one by the other is therefore integer division, and the cast to float happens only after the fraction has been discarded.
I would recommend using 100.0 as the initial multiplier: that causes the whole expression to be calculated as numeric and also gives real percentages. You might also want to cast to numeric(5,2) at the end to drop the excess decimal places.
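The effect of the multiplier's type and position can be checked with a few one-line queries. This is only a sketch: it uses Python's sqlite3 module rather than PostgreSQL, ROUND(..., 2) stands in for the numeric(5,2) cast (SQLite has no such type), and the numbers 25 and 200 are arbitrary.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

def scalar(sql):
    """Run a query that returns a single value and return that value."""
    return conn.execute(sql).fetchone()[0]

# Integer division first: 25 / 200 is 0, and 0 * 100 stays 0.
division_first = scalar("SELECT 25 / 200 * 100")

# Integer multiplier first: 2500 / 200 is still integer division, giving 12.
int_multiplier = scalar("SELECT 100 * 25 / 200")

# Decimal multiplier first: the whole expression is evaluated with fractions.
dec_multiplier = scalar("SELECT 100.0 * 25 / 200")

# Rounding at the end trims the long tail of decimals.
rounded = scalar("SELECT ROUND(100.0 * 1 / 3, 2)")

print(division_first, int_multiplier, dec_multiplier, rounded)
```

Note that multiplying by the integer 100 first avoids the zero but still loses the fractional part; only the 100.0 form keeps it.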

Related

How to get percentages?

SELECT
date,
location,
total_cases,
total_deaths,
(total_deaths/total_cases)* 100 as Death_percentage
FROM CovidDeaths
ORDER BY 1,2
From my query (to get percentage of COVID deaths per location) death_percentage column returns 0 for every row. What am I doing wrong?
The problem is integer division. Since both operands are integers, SQLite returns an integer result (the quotient with its fractional part discarded).
SELECT ( 2 / 3 ) * 100;
--> 0
Here, since the denominator of the division is always greater than the numerator, the division always yields 0. We can work around this with a little trick, by turning the 100 multiplier to a decimal value and putting it first in the calculation. It is also important to remove the parentheses around the division, so the decimal typing properly "propagates" to the division:
SELECT 100.0 * 2 / 3
--> 66.666667
In your query:
SELECT
date,
location,
total_cases,
total_deaths,
100.0 * total_deaths / nullif(total_cases, 0) as death_percentage
FROM CovidDeaths
ORDER BY 1,2
Note that the expression also addresses the possibility of 0 values in the total_cases column, which would otherwise generate an arithmetic error, as pointed out by Jonas Metzler in the comments.
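A small sketch of the nullif() guard, run through Python's sqlite3 module. The two sample rows are invented for illustration, and the table is trimmed to the columns the expression needs.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE CovidDeaths (location TEXT, total_cases INTEGER, total_deaths INTEGER)"
)
# Hypothetical rows: one normal location and one with no recorded cases.
conn.executemany(
    "INSERT INTO CovidDeaths VALUES (?, ?, ?)",
    [("A", 200, 3), ("B", 0, 0)],
)

rows = conn.execute("""
SELECT location,
       (total_deaths / NULLIF(total_cases, 0)) * 100 AS int_pct,
       100.0 * total_deaths / NULLIF(total_cases, 0) AS death_percentage
FROM CovidDeaths
ORDER BY location
""").fetchall()

for row in rows:
    print(row)
```

For location B, nullif() turns the zero denominator into NULL, so the whole expression evaluates to NULL (None in Python) instead of dividing by zero.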

Is it possible to get up to 3 decimal places in Float in PostgreSQL?

I have a table in PostgreSQL that has a Float column. In my select I use AVG() on that column, so it often gives a number with many decimals. Is there any way to restrict the number of decimals to a maximum of 3, meaning there can be fewer but not more than 3?
This is the Query:
SELECT team, AVG(score) FROM team_score_table GROUP BY team
You can use round():
select round(val::numeric, 3)
You can also convert to a numeric, but you need a precision appropriate for your values:
select val::numeric(20, 3)
I actually prefer the explicit cast() because it sets the data type of the column to a numeric with an explicit scale -- so downstream apps are aware of the number of decimal places intended in the result.
round() returns a numeric value but it is a "generic" numeric, with no specified scale and precision.
You can see the difference in this example.
You can use several functions to do that:
SELECT round(42.43666, 2) -- 42.44
SELECT trunc(42.43666, 2) -- 42.43
or cast:
SELECT cast(42.43666 as numeric(20, 2)) -- 42.44
For your example, that would be:
SELECT team, round(AVG(score)::numeric, 2) FROM team_score_table GROUP BY team
SELECT team, trunc(AVG(score)::numeric, 2) FROM team_score_table GROUP BY team
SELECT team, cast(AVG(score) as numeric(20,2)) FROM team_score_table GROUP BY team
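The difference between rounding and truncating can be reproduced outside the database. The sketch below uses Python's decimal module as an analogy for PostgreSQL's numeric type (an assumption made purely for illustration); ROUND_HALF_UP mirrors round() and ROUND_DOWN mirrors trunc().

```python
from decimal import Decimal, ROUND_DOWN, ROUND_HALF_UP

value = Decimal("42.43666")
two_places = Decimal("0.01")  # target scale: two decimal places

# Analogue of round(42.43666, 2): round half away from zero.
rounded = value.quantize(two_places, rounding=ROUND_HALF_UP)

# Analogue of trunc(42.43666, 2): drop the extra digits.
truncated = value.quantize(two_places, rounding=ROUND_DOWN)

# A value with fewer decimals is padded to the fixed scale.
padded = Decimal("42.4").quantize(two_places)

print(rounded, truncated, padded)
```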

SQL - 1. Round the difference to 2 decimal places

I am trying to create an SQL statement with a subquery in the SELECT attribute list to show the product id, the current price and the difference between the current price and the overall average.
I know that using the ROUND function will round the difference to zero decimals but I want to round the difference to 2 decimal places.
SELECT p_code, p_price, ROUND(p_price - (SELECT AVG(p_price) FROM product)) AS "Difference"
FROM product;
I tried using CAST but it still gave me the same output.
SELECT p_code, p_price, CAST(ROUND(p_price - (SELECT AVG(p_price) FROM Lab6_Product)) as numeric(10,2)) AS "Difference"
FROM lab6_product;
Thank you in advance for your time and help!
round() takes a second argument:
SELECT p_code, p_price,
ROUND(p_price - AVG(p_price) OVER (), 2) AS "Difference"
FROM product;
Note that I also changed the subquery to a window function.
I often recommend converting to a number (or decimal/numeric) instead:
SELECT p_code, p_price,
cast(p_price - AVG(p_price) OVER () as number(10, 2)) AS "Difference"
FROM product;
This ensures that the two decimal places are displayed as well.
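To check that the window function gives the same numbers as the original subquery, here is a rough sketch using Python's sqlite3 module. It assumes the bundled SQLite is version 3.25 or later (when window functions were added), and the three products and their prices are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE product (p_code TEXT, p_price REAL)")
# Hypothetical products; the average price is 18.333...
conn.executemany(
    "INSERT INTO product VALUES (?, ?)",
    [("A", 10.0), ("B", 20.0), ("C", 25.0)],
)

# Window-function form: AVG(...) OVER () is the average over all rows.
windowed = conn.execute("""
SELECT p_code, p_price,
       ROUND(p_price - AVG(p_price) OVER (), 2) AS difference
FROM product
ORDER BY p_code
""").fetchall()

# Subquery form from the question, with the second argument to ROUND added.
subquery = conn.execute("""
SELECT p_code, p_price,
       ROUND(p_price - (SELECT AVG(p_price) FROM product), 2) AS difference
FROM product
ORDER BY p_code
""").fetchall()

for row in windowed:
    print(row)
```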

How to use count(*) and Division Operation in SQL statements

I'm loading a large amount of data into a database,
and I want to know how the process is going.
I use
select count(*) from table
to check how many rows have loaded.
Now I want to get the progress as a percentage.
I tried:
select ( count(*)/20000 ) from table
and
select cast( ( count(*)/20000 ) as DECIMAL(6,4) ) from table
but they both return 0.
How can I do this?
Even better if it can show the result as a percentage.
Thanks.
Integer division returns an integer; the decimal part is truncated. You could divide by 20000.0:
select ( count(*)/20000.0 ) from table
MSDN, / (Divide):
"If an integer dividend is divided by an integer divisor, the result is an integer that has any fractional part of the result truncated."
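Select CONVERT(DECIMAL(10,4),COUNT(*)) / CONVERT(DECIMAL(10,4),20000) FROM TABLE
It's important that you match the types explicitly, because not all numbers are integers and not all decimals are the same. DECIMAL(6,4) is effectively its own data type, which is not the same as DECIMAL(6,3). Note that DECIMAL(6,4) leaves only two digits before the decimal point, so it cannot hold 20000; the conversion needs a wider precision such as DECIMAL(10,4).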
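Because precision and scale are part of the type, it helps to check whether a value fits before converting to it. The sketch below is a hypothetical helper (fits_decimal is not a database function) written with Python's decimal module; it assumes the usual rule that DECIMAL(p, s) allows p - s digits before the decimal point and rounds any excess fractional digits.

```python
from decimal import Decimal, ROUND_HALF_UP

def fits_decimal(value, precision, scale):
    """Return True if value can be stored as DECIMAL(precision, scale).

    Hypothetical helper for illustration: the value is first rounded to
    `scale` decimal places, then checked against the number of digits
    allowed before the decimal point (precision - scale).
    """
    step = Decimal(1).scaleb(-scale)  # e.g. scale 4 gives 0.0001
    rounded = Decimal(str(value)).quantize(step, rounding=ROUND_HALF_UP)
    limit = Decimal(10) ** (precision - scale)
    return abs(rounded) < limit

print(fits_decimal("99.9999", 6, 4))  # largest value DECIMAL(6,4) can hold
print(fits_decimal(20000, 6, 4))      # needs five digits before the point
print(fits_decimal(20000, 10, 4))     # a wider precision has room for it
```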

Why doesn't this sum of percentages add up to 100%?

I have a series of calculation times in a DB2 SQL DB that are stored as float with a default value of 0.0.
The table being updated is as follows:
CREATE TABLE MY_CALC_DATA_TABLE
(
CALCDATE TIMESTAMP,
INDIV_CALC_DURATION_IN_S FLOAT WITH DEFAULT 0.0,
CALC_TIME_PERCENTAGE FLOAT WITH DEFAULT 0.0
)
Using a sproc, I am calculating the sum as follows:
CREATE OR REPLACE PROCEDURE MY_SCHEMA.MY_SPROC (IN P_DATE TIMESTAMP)
LANGUAGE SQL
NO EXTERNAL ACTION
BEGIN
DECLARE V_TOTAL_CALC_TIME_IN_S FLOAT DEFAULT 0.0;
-- other stuff setting up and joining data
-- Calculate the total time taken to perform the
-- individual calculations
SET V_TOTAL_CALC_TIME_IN_S =
(
SELECT
SUM(C.INDIV_CALC_DURATION_IN_S)
FROM
MY_SCHEMA.MY_CALC_DATA_TABLE C
WHERE
C.CALCDATE = P_DATE
);
-- Now calculate each individual calculation's percentage
-- of the total time.
UPDATE
MY_SCHEMA.MY_CALC_DATA_TABLE C
SET
C.CALC_TIME_PERCENTAGE =
(C.INDIV_CALC_DURATION_IN_S / V_TOTAL_CALC_TIME_IN_S) * 100
WHERE
C.CALCDATE = P_DATE;
END#
Trouble is, when I do a sum of all the CALC_TIME_PERCENTAGE values for the specified CALCDATE it is always less than 100%, with the sum being values like 80% or 70% for different CALCDATE values.
We are talking between 35k and 55k calculations here with the maximum individual calculation's percentage of the total, as calculated above, being 11% and lots of calculations in the 0.00000N% range.
To calculate the total percentage I am using the simple query:
SELECT
SUM(C.CALC_TIME_PERCENTAGE)
FROM
MY_SCHEMA.MY_CALC_DATA_TABLE C
WHERE
C.CALCDATE = P_DATE;
Any suggestions?
Update: Rearranging the calc. as suggested fixed the problem. Thanks. BTW In DB2 FLOAT and DOUBLE are the same type. And now to read that suggested paper on floats.
If the field C.INDIV_CALC_DURATION_IN_S were Integer, I would assume it's a rounding error. Reading again, that is not the problem as the datatype is FLOAT.
You can still try using this. I wouldn't be surprised if this yielded (slightly) different results than the previous method:
SET
C.CALC_TIME_PERCENTAGE =
(C.INDIV_CALC_DURATION_IN_S * 100.0 / V_TOTAL_CALC_TIME_IN_S)
But you mention that there are a lot of rows in a calculation for a certain date, so it may be a rounding error due to that. Try with DOUBLE datatype in both fields (or at least the CALC_TIME_PERCENTAGE field) and see if the difference from 100% gets smaller.
I'm not sure if DB2 has DECIMAL(x,y) datatype. It may be more appropriate in this case.
Another problem is how you find the sum of CALC_TIME_PERCENTAGE. I suppose you (and everyone else) would use:
SELECT
CALCDATE, SUM(CALC_TIME_PERCENTAGE)
FROM
MY_SCHEMA.MY_CALC_DATA_TABLE C
GROUP BY CALCDATE
This way, you have no way to determine in what order the summation will be done. It may not be even possible to determine that but you can try:
SELECT
CALCDATE, SUM(CALC_TIME_PERCENTAGE)
FROM
( SELECT
CALCDATE, CALC_TIME_PERCENTAGE
FROM
MY_SCHEMA.MY_CALC_DATA_TABLE C
ORDER BY CALCDATE
, CALC_TIME_PERCENTAGE ASC
) AS tmp
GROUP BY CALCDATE
The optimizer may disregard the interior ORDER BY but it's worth a shot.
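Why the order of summation can matter at all is easy to show with ordinary double-precision floats. The sketch below is plain Python, not DB2; the values are contrived so that the effect is visible, and naive_sum is a hypothetical helper that adds left to right the way a simple accumulator would.

```python
import math

def naive_sum(values):
    """Add values left to right with a plain float accumulator."""
    total = 0.0
    for v in values:
        total += v
    return total

# One huge value followed by ten small ones (contrived for illustration).
big_first = [1e16] + [1.0] * 10

# Large value first: each 1.0 is too small to change the running total.
lost = naive_sum(big_first)

# Ascending order: the small values accumulate to 10.0 before meeting 1e16.
kept = naive_sum(sorted(big_first))

# math.fsum tracks the lost low-order bits and is independent of order.
exact = math.fsum(big_first)

print(lost, kept, exact)
```

Summing the small values first is the idea behind the ascending ORDER BY above, although a database is free to aggregate in whatever order it likes.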
Another possibility for this big difference is that rows are deleted from the table between the UPDATE and the SHOW percent SUM operations.
You can test if that happens by running the calculations (without UPDATE) and summing up:
SELECT
C.CALCDATE
, SUM( C.INDIV_CALC_DURATION_IN_S * 100.0 / T.TOTAL )
AS PERCENT_SUM
FROM
MY_SCHEMA.MY_CALC_DATA_TABLE C
JOIN ( SELECT CALCDATE, SUM(INDIV_CALC_DURATION_IN_S) AS TOTAL
FROM MY_SCHEMA.MY_CALC_DATA_TABLE
GROUP BY CALCDATE
) AS T
ON T.CALCDATE = C.CALCDATE
GROUP BY C.CALCDATE
Might be a rounding problem. Try C.INDIV_CALC_DURATION_IN_S * 100 / V_TOTAL_CALC_TIME_IN_S instead.
If C.INDIV_CALC_DURATION_IN_S is very small but you have a large number of rows (and thus V_TOTAL_CALC_TIME_IN_S becomes large in comparison) then
(C.INDIV_CALC_DURATION_IN_S / V_TOTAL_CALC_TIME_IN_S) * 100
is likely to lose precision, especially if you're using FLOATs.
If this is the case, then changing the calculation (as mentioned elsewhere) to
(C.INDIV_CALC_DURATION_IN_S * 100) / V_TOTAL_CALC_TIME_IN_S
should increase the total, although it may not get you all the way to 100%.
If that's the case and a lot of the measurements are small fractions of a second, I'd consider looking beyond this procedure: could the times be recorded in, say, milli- or micro-seconds? Either would give you some headroom for additional significant digits.