Postgres float column sum delta. Update schema or use cast? - sql

It's not a new thing to discuss, but I just want your opinion.
What is a better solution in this case:
there is a column with float values:
  Column  | Type | Collation | Nullable
----------+------+-----------+----------
 work_res | real |           | not null
When I calculate the sum over the whole table, it gives one value, let's say 100.2123:
select sum(work_res) from work_data
>> 100.2123
But when I group the same data first and sum afterwards, the total is 100.9124:
select sum(gd.work_res)
from (
    select type, sum(work_res) as work_res
    from work_data
    group by type
) as gd
>> 100.9124
The type column here is just an example of a column to group by.
So the problem is rounding. If I cast to double precision inside the sum function, the numbers are identical:
select sum(cast(work_res as double precision)) from work_data
>> 100.9124
Can you please advise which is the better solution for me:
1) Use casting each time
2) Update the schema
Values in the column are in the range (0, 1]. For example: 0.(3), 1, 0.5, 0.(1), and so on.
I think in both cases I will still see a small delta. Right?
Thanks a lot
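
For concreteness, here is a minimal sketch of both options against the work_data table from the question:

-- Option 1: cast at query time (must be repeated in every aggregate)
select sum(work_res::double precision) from work_data;

-- Option 2: one-time schema migration, after which a plain sum()
-- already aggregates in double precision
alter table work_data alter column work_res type double precision;
select sum(work_res) from work_data;

Note that double precision is still a binary floating-point type, so in both cases a small delta against an exact sum remains possible; only a numeric column gives exact addition.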

Related

Use subquery with multiple rows

I have been trying to work out SQL code to clean up a data sheet (more than 200 rows and 50 columns) by adding a leading zero in front of decimal values.
I tried to apply to_char to convert the data into a 0-padded figure, for all values less than 1:
select to_char((select "1980" from imf_population where "1980" <1), '0.999')
from imf_population
However, due to the subquery, to_char cannot perform the conversion: the subquery returns multiple rows from the 1980 column, as there is more than one record whose value is less than 1.
Any tips on how to get around this?
Your to_char must go inside the query, not outside. Once it is inside, the outer select is no longer needed:
select to_char("1980",'0.999') from imf_population where "1980"<1;
"1980" is a column name, right? (SQLite accepted create table imf_population ("1980" number); select "1980" from imf_population;, but it has no to_char, so I guess you're using Oracle.)
Note: only use lowercase letters, numbers, and underscores when naming columns, and use simple, descriptive column names.
CREATE TABLE imf_population (col varchar2(20));
INSERT INTO imf_population (col) VALUES ('0.5289');
select to_char(col,'0.999') from imf_population where col<1;
| TO_CHAR(COL,'0.999') |
| :------------------- |
| 0.529 |
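One nuance, in case it matters: to_char reserves a leading position for the sign, so '0.999' actually produces ' 0.529' with a leading space for positive numbers. The FM modifier trims it:

select to_char(col, 'FM0.999') from imf_population where col < 1;
-- returns '0.529' without the leading sign placeholder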

How do I query a column where a specific number does not exist in any of the rows of that column

I have ID | Name | Salary with types as Integer | String | Integer respectively.
I need to query the avg of all the rows of the Salary column, and then query it again, but with the digit 0 removed from any salary that contains it before calculating the avg.
So if Salary returns 1420, 2006, 500, the next query should work on 142, 26, 5 and take the avg of those numbers.
I tried googling my specific problem but am not finding anything close to a solution. I'm not looking for an answer too much more than a shove in the right direction.
My Thoughts
Maybe I need to convert the integer data type to a varchar or string then remove the '0' digit from there, then convert back?
Maybe I need to create a temporary table from the first tables results, and insert them, just without 0?
Any ideas? Hope I was clear. Thanks!
Sample table data:
ID | Name | Salary
---+----------+-------
1 | Kathleen | 1420
2 | Bobby | 690
3 | Cat | 500
Now I need to query the above table, but with the 0s removed from the salary values:
ID | Name | Salary
---+----------+-------
1 | Kathleen | 142
2 | Bobby | 69
3 | Cat | 5
You want to remove all 0s from your numbers, then take a numeric average of the result. As you suspected, this requires mixing string and numeric operations.
The actual syntax will vary across databases. In MySQL, SQL Server and Oracle, you should be able to do:
select avg(replace(salary, '0', '') + 0) as myavg
from mytable
This involves two steps of implicit conversion: replace() forces string context, and + 0 turns the result back into a number. In SQL Server, you will get an integer result; if you want a decimal average, add a decimal value instead, that is + 0.0 rather than + 0.
In Postgres, where implicit conversion is not happening as easily, you would use explicit casts:
select avg(replace(salary::text, '0', '')::int) as myavg
from mytable
This returns a decimal value.
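As a quick sanity check against the sample data from the question (a sketch; the inline values stand in for mytable):

select avg(replace(salary::text, '0', '')::int) as myavg
from (values (1420), (690), (500)) as t(salary);
-- replace() yields '142', '69' and '5'; cast back and averaged, myavg = 72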
Do you just want conditional aggregation?
select avg(salary), avg(case when salary <> 0 then salary end)
from t;
or do you want division?
select id, name, floor(salary / 10)
from t;
This produces the results you specify but it has nothing to do with "average"s.

How to use a group by but still access every numeric value cleverly

I need to "group by" my data to distinguish between tests (each test has a specific id, name, and temperature) and to calculate their count, standard deviation, etc. But I also need access to every raw data value from each group, for further index calculations that I do in a Python script.
I have found two solutions to this problem, but both seem non-optimal/flawed:
1) Using listagg to store every raw value from a group in a single string row. It does the job, but it is not optimized: I concatenate multiple float values into a giant string that I will immediately de-concatenate and convert back to floats. That seems unnecessary and costly.
2) Removing the group by entirely and doing the count and standard deviation through partitioning. But that seems even worse to me. I don't know whether PL/SQL/Oracle optimizes this; it could be calculating the same count and standard deviation for every line (I don't know how to check this). The query result also becomes messy: since there is no group by anymore, I have to add multiple checks in my Python file to differentiate the data of each test (specific id, name, and temperature).
I think my first solution can be improved, but I don't know how. How can I use a group by but still access every numeric value cleverly?
A function similar to list_agg but with a collection/array output type instead of a string output type could maybe do the trick (a sort of 'array_agg' compatible with oracle), but I don't know any.
EDIT:
The sample data is complex and probably restricted to company-internal viewing, but I can show you my simplified query for option 1):
SELECT
    rav.rav_testid as test_id,
    tte.tte_testname as test_name,
    tsc.tsc_temperature as temperature,
    listagg(rav.rav_value, ' ') WITHIN GROUP (ORDER BY rav.rav_value) as all_specific_test_values,
    COUNT(rav.rav_value) as n,
    STDDEV(rav.rav_value) as sigma
FROM
    ...
    (8 inner joins)
GROUP BY
    rav.rav_testid, tte.tte_testname, tsc.tsc_temperature
ORDER BY
    rav.rav_testid, tte.tte_testname, tsc.tsc_temperature
The result looks like :
test_id | test_name | temperature | all_specific_test_values | n | sigma
-------------------------------------------------------------------------
6001 |VADC_A(...) | -40 | ,8094034194946289 ,8(...)| 58 | 0,54
6001 |VADC_A(...) | 25 | ,5054857852946545 ,6(...)| 56 | 0,24
6001 |VADC_A(...) | 150 | ,8625754277452524 ,4(...)| 56 | 0,26
6002 |VADC_B(...) | -40 | ,9874514651548454 ,5(...)| 57 | 0,44
I think you want analytic functions:
select t.*,
count(*) over (partition by test) as cnt,
avg(value) over (partition by test) as avg_value,
stddev(value) over (partition by test) as stddev_value
from t;
This adds additional columns on each row.
I would suggest going with Gordon Linoff's solution above. That is likely the most standard solution.
If you want to go with a less standard solution, you can have a group by that returns a collection as one of the columns. Presumably, your script could iterate through that collection though it might take a bit of work in the script to do that.
create type num_tbl as table of number;
/
create table foo (
grp integer,
val number
);
insert into foo values( 1, 1.1 );
insert into foo values( 2, 1.2 );
insert into foo values( 1, 1.3 );
insert into foo values( 2, 1.4 );
select grp, avg(val), cast( collect( val ) as num_tbl )
from foo
group by grp;
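
If you go this route, the collection can also be unnested back into rows on the database side via TABLE(), which may save the script some parsing work. A sketch against the foo table above:

select t.grp, c.column_value as val
from (
    select grp, cast(collect(val) as num_tbl) as vals
    from foo
    group by grp
) t, table(t.vals) c;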

Redshift division result does not include decimals

I'm trying to do something really quite basic: calculate a kind of percentage between two columns in Redshift. However, when I run the query with an example, the result is simply zero, because the decimal part is discarded.
code:
select 1701 / 84936;
Output:
0
I tried:
select cast(1701 / 84936 as numeric(10,10));
but the result was 0.0000000000.
How could I solve this silly thing?
It is integer division. Make sure that at least one operand is NUMERIC (an exact data type) or FLOAT (caution: an approximate data type). As the docs put it:
/ division (integer division truncates the result)
select 1701.0 / 84936;
-- or
SELECT 1.0 * 1701 / 84936;
-- or
SELECT CAST(1701 AS NUMERIC(10,4))/84936;
When mixing data types, the order counts
Note that the order of the elements in a math expression determines the data type of the result.
Let's assume that we intend to calculate the percentage unit_sales/total_sales where both columns (or numbers) are integers.
-- Some dummy table
drop table if exists sales;
create table sales as
select 3 as unit_sales, 9 as total_sales;
-- The calculations
select
    unit_sales/total_sales*100,   --> 0 (integer)
    unit_sales/total_sales*100.0, --> 0.0 (float)
    100.0*unit_sales/total_sales  --> 33.3 (float and expected result)
from sales;
The output
0 | 0.0 | 33.33
The first column is 0 (an integer) because 3/9=0 in integer division.
The second column is 0.0 because SQL first computes the integer 0 (3/9) and only then converts it to float to perform the multiplication by 100.0.
The third column is the expected result: the non-integer 100.0 at the beginning of the expression forces a non-integer calculation throughout.
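
Applied to the percentage from the question, using the dummy sales table above (nullif is just a guard so a zero denominator yields NULL instead of a division-by-zero error):

select 100.0 * unit_sales / nullif(total_sales, 0) as pct
from sales;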

Greater than in ActiveRecord query fails with floats?

I need to calculate ranking for values in my Rails app.
I was following the example in this question:
def rank
User.where("points > ?", points).count + 1
end
Initially I verified it with integers and it was working. But I also need to rank floats.
For example, I have the following values:
0.6238564767774734
0.03700210614260772
0.022441047654982744
0.00935025180031852
0.0016195952859973067
0.0010382902478650936
0.0009367068270665785
0.0004916500182958447
0.00016560735047205894
If I call the query
User.where("points > ?", 0.6238564767774734).count + 1
it returns 2. Why is that? Shouldn't it return 0, as there are no values bigger than it? Also, the queries with the fourth and fifth values both return 5.
SQL queries from console as follows:
SELECT `users`.* FROM `users` WHERE (points > 0.623856)
SELECT `users`.* FROM `users` WHERE (points > 0.00935025)
SELECT `users`.* FROM `users` WHERE (points > 0.0016196)
Just in case I also tried length and size instead of count.
What is wrong, and how could I fix it? All help appreciated.
It looks like a problem with the difference between how MySQL rounds floats and how Ruby rounds floats. Using decimal instead of float might be a better idea.
Also take a look at
http://dev.mysql.com/doc/refman/5.0/en/floating-point-types.html
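
If you do switch the column, a sketch of the MySQL side (the precision and scale here are assumptions; size them to your data):

-- DECIMAL stores exact values, so the comparison in the WHERE
-- clause no longer depends on float rounding
ALTER TABLE users MODIFY points DECIMAL(30,20);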
0.6238564767774734 goes beyond the precision of a float.
What you'd get in Postgres (I'm unaware of a pg_typeof() equivalent in MySQL):
denis=# select pg_typeof(0.6238564767774734);
pg_typeof
-----------
numeric
(1 row)
denis=# select 0.6238564767774734::decimal, 0.6238564767774734::float;
numeric | float8
--------------------+-------------------
0.6238564767774734 | 0.623856476777473
(1 row)
On its end, Ruby is using a BigDecimal. The MySQL type that would match it (more or less, since in MySQL you need to specify the precision) is the decimal type:
http://dev.mysql.com/doc/refman/5.7/en/fixed-point-types.html
Be wary that MySQL requires a precision in this case:
mysql> select cast(0.6238564767774734 as decimal);
+-------------------------------------+
| cast(0.6238564767774734 as decimal) |
+-------------------------------------+
| 1 |
+-------------------------------------+
1 row in set (0.00 sec)
mysql> select cast(0.6238564767774734 as decimal(20,20));
+--------------------------------------------+
| cast(0.6238564767774734 as decimal(20,20)) |
+--------------------------------------------+
| 0.62385647677747340000 |
+--------------------------------------------+
1 row in set (0.00 sec)
Lastly, note that if you stick to floats and just adjust your criteria, you'll still get errors due to rounding problems in how floating-point types are represented:
http://en.wikipedia.org/wiki/Floating_point#Representable_numbers.2C_conversion_and_rounding
(I'm guessing you're using decimals internally in there somewhere, but the above set of problems with floats is good to keep in mind when doing comparisons.)