Greater than in ActiveRecord query fails with floats?

I need to calculate a ranking for values in my Rails app.
I was following the example in this question:
def rank
  User.where("points > ?", points).count + 1
end
Initially I verified it with integers and it was working. But I also need to rank floats.
For example, I have the following values:
0.6238564767774734
0.03700210614260772
0.022441047654982744
0.00935025180031852
0.0016195952859973067
0.0010382902478650936
0.0009367068270665785
0.0004916500182958447
0.00016560735047205894
If I call the query
User.where("points > ?", 0.6238564767774734).count + 1
It returns 2. Why is that? Shouldn't it return 1, since there are no values bigger than it (a count of 0, plus 1)? Also, queries with the fourth and fifth values both return 5.
The SQL queries logged in the console are as follows:
SELECT `users`.* FROM `users` WHERE (points > 0.623856)
SELECT `users`.* FROM `users` WHERE (points > 0.00935025)
SELECT `users`.* FROM `users` WHERE (points > 0.0016196)
Just in case, I also tried length and size instead of count.
What is wrong, and how could I fix it? All help appreciated.

It looks like a problem with the difference between how MySQL rounds floats and how Ruby rounds floats. Using decimal instead of float might be a better idea.
Also take a look at
http://dev.mysql.com/doc/refman/5.0/en/floating-point-types.html
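One plausible way to reproduce the numbers in the question, under two assumptions the thread does not confirm: points is a 4-byte MySQL FLOAT column, and the comparison literal reaches MySQL truncated to six significant digits, as the logged SQL above suggests. Python's struct module stands in for the FLOAT column; this is a sketch, not MySQL itself.

```python
import struct

# Values from the question, as double-precision floats.
VALUES = [
    0.6238564767774734, 0.03700210614260772, 0.022441047654982744,
    0.00935025180031852, 0.0016195952859973067, 0.0010382902478650936,
    0.0009367068270665785, 0.0004916500182958447, 0.00016560735047205894,
]

def as_float32(x: float) -> float:
    """Round-trip a double through IEEE 754 single precision.

    Assumption: this approximates what a 4-byte MySQL FLOAT column stores.
    """
    return struct.unpack("<f", struct.pack("<f", x))[0]

def as_logged_literal(x: float) -> float:
    """Keep six significant digits, like the literals in the logged SQL."""
    return float(f"{x:.6g}")

def rank(threshold: float, column: list) -> int:
    """Same logic as the Rails method: count of strictly greater values, plus one."""
    return sum(1 for p in column if p > threshold) + 1

stored = [as_float32(v) for v in VALUES]

for i in (0, 3, 4):
    literal = as_logged_literal(VALUES[i])
    print(f"{literal} -> rank {rank(literal, stored)}")
```

Under these assumptions the sketch gives 2, 5 and 5, matching the question: the row's own stored value is slightly larger than its six-digit literal for the first and fourth values (so the row counts itself), but slightly smaller for the fifth.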

0.6238564767774734 goes beyond the precision of a float.
Here is what you'd get in Postgres (I'm unaware of a pg_typeof() equivalent in MySQL):
denis=# select pg_typeof(0.6238564767774734);
pg_typeof
-----------
numeric
(1 row)
denis=# select 0.6238564767774734::decimal, 0.6238564767774734::float;
numeric | float8
--------------------+-------------------
0.6238564767774734 | 0.623856476777473
(1 row)
On its end, Ruby is using a BigDecimal. The MySQL type that would match it (more or less in MySQL, since you need to specify the precision) would be the decimal type:
http://dev.mysql.com/doc/refman/5.7/en/fixed-point-types.html
Be wary that MySQL requires a precision in this case; a bare DECIMAL defaults to DECIMAL(10,0), so the value is rounded to an integer:
mysql> select cast(0.6238564767774734 as decimal);
+-------------------------------------+
| cast(0.6238564767774734 as decimal) |
+-------------------------------------+
| 1 |
+-------------------------------------+
1 row in set (0.00 sec)
mysql> select cast(0.6238564767774734 as decimal(20,20));
+--------------------------------------------+
| cast(0.6238564767774734 as decimal(20,20)) |
+--------------------------------------------+
| 0.62385647677747340000 |
+--------------------------------------------+
1 row in set (0.00 sec)
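To illustrate why an exact fixed-point type sidesteps the problem, here is a sketch that uses Python's decimal module as a stand-in for a DECIMAL(20,20) column (an analogy, not MySQL itself). Once the stored values and the threshold are exact decimals, each value ranks exactly where it should:

```python
from decimal import Decimal

# Values from the question as strings, so no binary floating point is involved.
RAW = [
    "0.6238564767774734", "0.03700210614260772", "0.022441047654982744",
    "0.00935025180031852", "0.0016195952859973067", "0.0010382902478650936",
    "0.0009367068270665785", "0.0004916500182958447", "0.00016560735047205894",
]

# Mirror DECIMAL(20,20): exactly 20 digits after the decimal point.
SCALE = Decimal("1E-20")
column = [Decimal(s).quantize(SCALE) for s in RAW]

def rank(threshold, values):
    """Count of strictly greater values, plus one."""
    return sum(1 for p in values if p > threshold) + 1

print(column[0])                                # 0.62385647677747340000
print([rank(column[i], column) for i in (0, 3, 4)])  # [1, 4, 5]
```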
Lastly, note that if you stick to floats and merely adjust your criteria, you'll still get errors due to rounding, because of how floating-point numbers are represented:
http://en.wikipedia.org/wiki/Floating_point#Representable_numbers.2C_conversion_and_rounding
(I'm guessing you're using decimals internally in there somewhere, but the above problems with floats are good to keep in mind when doing comparisons.)


Postgres float column sum delta. Update schema or use cast?

It's not a new topic, but I just want your opinion.
Which is the better solution in this case?
There is a column with float values:
Column | Type | Collation | Nullable |
----------------------------------------
work_res | real | | not null |
When I calculate the sum over the whole table, it gives one value, let's say 100.2123:
select sum(work_res) from work_data
>> 100.2123
But when I group the same data first and then sum the group totals, the sum is 100.9124:
select sum(gd.work_res)
from (
select type, sum(work_res) as work_res
from work_data
group by type
) as gd
>> 100.9124
The type column is just an example of a column to group by.
So the problem is rounding. If I cast to double precision inside the sum function, the numbers are identical:
select sum(cast(work_res as double precision)) from work_data
>> 100.9124
Can you please advise which is the better solution for me:
Use casting each time
Update the schema
Values in the column are in the range (0, 1].
For example, it can be: 0.(3), 1, 0.5, 0.(1) and so on
I think in both cases I will still see a small delta. Right?
Thanks a lot
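A sketch of what is likely going on, assuming (as the documented result type real of sum(real) suggests) that sum over a real column accumulates in 4-byte single precision. Python's struct module emulates real values; the rows and the grouping column are hypothetical, with values modelled on 0.(3), 1, 0.5 and 0.(1) from the question:

```python
import struct

def f32(x: float) -> float:
    """Round a double to the nearest 4-byte IEEE 754 value, like a real column."""
    return struct.unpack("<f", struct.pack("<f", x))[0]

def sum_real(values):
    """Accumulate in single precision, rounding after every addition.

    Adding two float32 values in double precision and rounding once to
    float32 gives the correctly rounded float32 sum.
    """
    acc = 0.0
    for v in values:
        acc = f32(acc + v)
    return acc

def sum_double(values):
    """Accumulate in double precision, like sum(cast(work_res as double precision))."""
    acc = 0.0
    for v in values:
        acc += v
    return acc

# Hypothetical work_data rows: (type, work_res), with work_res stored as real.
pattern = [1 / 3, 1.0, 0.5, 1 / 9]
rows = [(i % 3, f32(pattern[i % 4])) for i in range(10_000)]
values = [w for _, w in rows]

def grouped(summer):
    """Sum per type first, then sum the group totals with the same accumulator."""
    groups = {}
    for t, w in rows:
        groups.setdefault(t, []).append(w)
    return summer([summer(ws) for ws in groups.values()])

print("real,   whole table:", sum_real(values))
print("real,   grouped:    ", grouped(sum_real))
print("double, whole table:", sum_double(values))
print("double, grouped:    ", grouped(sum_double))
```

In this sketch the single-precision total drifts away from the double-precision one, while the two double-precision totals agree. Values like 1/3 can't be stored exactly in either type, so double accumulation can still leave tiny deltas in general, but they are far smaller than with real.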

WHILE Window Operation with Different Starting Point Values From Column - SQL Server [duplicate]

In SQL there are aggregation operators, like AVG, SUM, COUNT. Why doesn't it have an operator for multiplication? "MUL" or something.
I was wondering: does it exist in Oracle, MSSQL or MySQL? If not, is there a workaround that would give this behaviour?
By MUL do you mean progressive multiplication of values?
Even with 100 rows of some small size (say 10s), your MUL(column) is going to overflow most data types! With such a high probability of misuse and abuse, and very limited scope for use, it does not need to be part of the SQL standard. As others have shown, there are mathematical ways of working it out, just as there are many, many ways to do tricky calculations in SQL using only standard (and common-use) methods.
Sample data:
Column
1
2
4
8
COUNT : 4 items (1 for each non-null)
SUM : 1 + 2 + 4 + 8 = 15
AVG : 3.75 (SUM/COUNT)
MUL : 1 x 2 x 4 x 8 ? ( =64 )
For completeness, the Oracle, MSSQL, MySQL core implementations *
Oracle : EXP(SUM(LN(column))) or POWER(N,SUM(LOG(N,column)))
MSSQL : EXP(SUM(LOG(column))) or POWER(N,SUM(LOG(column)/LOG(N)))
MySQL : EXP(SUM(LOG(column))) or POW(N,SUM(LOG(N,column)))
Take care when using EXP/LOG in SQL Server, and watch the return type: http://msdn.microsoft.com/en-us/library/ms187592.aspx
The POWER form allows for larger numbers (using bases larger than Euler's number), and in cases where the result grows too large to turn it back using POWER, you can return just the logarithmic value and calculate the actual number outside of the SQL query
* LOG(0) and LOG(-ve) are undefined. The code below shows how to handle this in SQL Server only; equivalents can be found for the other SQL flavours using the same concept.
create table MUL(data int)
insert MUL select 1 yourColumn union all
select 2 union all
select 4 union all
select 8 union all
select -2 union all
select 0
select CASE WHEN MIN(abs(data)) = 0 then 0 ELSE
EXP(SUM(Log(abs(nullif(data,0))))) -- the base mathematics
* round(0.5-count(nullif(sign(sign(data)+0.5),1))%2,0) -- pairs up negatives
END
from MUL
Ingredients:
Taking the abs() of data: if the min is 0, multiplying by whatever else is futile; the result is 0.
When data is 0, NULLIF converts it to null. abs() and log() then both return null, so the row is excluded from sum().
If data is not 0, abs() allows us to multiply a negative number using the LOG method; we keep track of the negativity elsewhere.
Working out the final sign:
sign(data) returns 1 for >0, 0 for 0 and -1 for <0.
We add 0.5 and take the sign() again, so we have now classified 0 and 1 both as 1, and only -1 as -1.
Again, use NULLIF to remove the 1's from COUNT(), since we only need to count the negatives.
% 2 against the count() of negative numbers returns either
--> 1 if there is an odd number of negative numbers
--> 0 if there is an even number of negative numbers
More mathematical tricks: we take 1 or 0 off 0.5, so that the above becomes
--> (0.5-1=-0.5=>round to -1) if there is an odd number of negative numbers
--> (0.5-0= 0.5=>round to 1) if there is an even number of negative numbers
We multiply this final 1/-1 by the EXP(SUM(LOG(...))) product to get the real result.
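The same recipe written out in Python so each ingredient is visible. This is a sketch: None plays the role of NULL, and the function name is made up for illustration. The result comes back as a float, which is also why the SQL version needs care with return types.

```python
import math

def product_via_logs(values):
    """Multiply non-NULL values via exp(sum(log|x|)), fixing up zeros and signs."""
    present = [v for v in values if v is not None]
    if not present:
        return None  # like an aggregate over no non-NULL rows
    if min(abs(v) for v in present) == 0:
        return 0  # any zero makes the whole product zero
    magnitude = math.exp(sum(math.log(abs(v)) for v in present))
    negatives = sum(1 for v in present if v < 0)
    sign = -1 if negatives % 2 else 1  # odd count of negatives -> negative result
    return sign * magnitude

print(product_via_logs([1, 2, 4, 8]))         # close to 64, as a float
print(product_via_logs([1, 2, 4, 8, -2]))     # close to -128
print(product_via_logs([1, 2, 4, 8, -2, 0]))  # 0
```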
No, but you can use Mathematics :)
If yourColumn is always greater than zero:
select EXP(SUM(LOG(yourColumn))) As ColumnProduct from yourTable
I see an Oracle answer is still missing, so here it is:
SQL> with yourTable as
( select 1 yourColumn from dual union all
select 2 from dual union all
select 4 from dual union all
select 8 from dual
)
select EXP(SUM(LN(yourColumn))) As ColumnProduct from yourTable
/
COLUMNPRODUCT
-------------
64
1 row selected.
Regards,
Rob.
With PostgreSQL, you can create your own aggregate functions, see http://www.postgresql.org/docs/8.2/interactive/sql-createaggregate.html
To create an aggregate function in MySQL, you'll need to build an .so (Linux) or .dll (Windows) file. An example is shown here: http://www.codeproject.com/KB/database/mygroupconcat.aspx
I'm not sure about MSSQL and Oracle, but I bet they have options to create custom aggregates as well.
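As a swapped-in illustration of the custom-aggregate idea (SQLite, not PostgreSQL or MySQL), here is a minimal sketch using Python's sqlite3 module, whose Connection.create_aggregate registers a class with step() and finalize() methods as a SQL aggregate. The name mul and the table are hypothetical, and very large products will not fit in SQLite's 64-bit INTEGER.

```python
import sqlite3

class Mul:
    """A MUL() aggregate: multiplies non-NULL values, NULL if there are none."""

    def __init__(self):
        self.product = None

    def step(self, value):
        if value is None:
            return  # skip NULLs, as SUM() does
        self.product = value if self.product is None else self.product * value

    def finalize(self):
        return self.product

conn = sqlite3.connect(":memory:")
conn.create_aggregate("mul", 1, Mul)
conn.execute("CREATE TABLE yourTable (yourColumn INTEGER)")
conn.executemany(
    "INSERT INTO yourTable VALUES (?)", [(1,), (2,), (4,), (8,), (None,)]
)
print(conn.execute("SELECT mul(yourColumn) FROM yourTable").fetchone()[0])  # 64
```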
You'll break any datatype fairly quickly as numbers mount up.
Using LOG/EXP is tricky because LOG fails for numbers <= 0. I wrote a solution in this question that deals with this.
Using a recursive CTE in MS SQL:
CREATE TABLE Foo(Id int, Val int)
INSERT INTO Foo VALUES(1, 2), (2, 3), (3, 4), (4, 5), (5, 6)
;WITH cte AS
(
SELECT Id, Val AS Multiply, row_number() over (order by Id) as rn
FROM Foo
WHERE Id=1
UNION ALL
SELECT ff.Id, cte.multiply*ff.Val as multiply, ff.rn FROM
(SELECT f.Id, f.Val, (row_number() over (order by f.Id)) as rn
FROM Foo f) ff
INNER JOIN cte
ON ff.rn -1= cte.rn
)
SELECT * FROM cte
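For reference, the recursive CTE walks the rows in Id order and carries a running product in the Multiply column, so the last row holds the full product. The same running product in Python, only to show the expected column values:

```python
from itertools import accumulate
from operator import mul

# Same rows as the Foo table above: (Id, Val).
foo = [(1, 2), (2, 3), (3, 4), (4, 5), (5, 6)]

# Running product ordered by Id, i.e. the Multiply column the CTE yields.
ordered_vals = [val for _, val in sorted(foo)]
running = list(accumulate(ordered_vals, mul))
print(running)  # [2, 6, 24, 120, 720]
```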
Not sure about Oracle or sql-server, but in MySQL you can just use * like you normally would.
mysql> select count(id), count(id)*10 from tablename;
+-----------+--------------+
| count(id) | count(id)*10 |
+-----------+--------------+
| 961 | 9610 |
+-----------+--------------+
1 row in set (0.00 sec)

How do I query a column where a specific number does not exist in any of the rows of that column

I have a table with columns ID | Name | Salary, of types Integer | String | Integer respectively.
I need to query the average of the Salary column, and then query the average of the Salary column again, but with every 0 digit removed from each value before calculating the average.
So if Salary returns 1420, 2006, 500, the next query should work with 142, 26, 5. Then I calculate the average of those numbers, which no longer contain 0.
I tried googling my specific problem but am not finding anything close to a solution. I'm not looking for an answer too much more than a shove in the right direction.
My Thoughts
Maybe I need to convert the integer data type to a varchar or string then remove the '0' digit from there, then convert back?
Maybe I need to create a temporary table from the first tables results, and insert them, just without 0?
Any ideas? Hope I was clear. Thanks!
Sample table data:
ID | Name | Salary
---+----------+-------
1 | Kathleen | 1420
2 | Bobby | 690
3 | Cat | 500
Now I need to query the above table but with the 0's removed from the salary rows
ID | Name | Salary
---+----------+-------
1 | Kathleen | 142
2 | Bobby | 69
3 | Cat | 5
You want to remove all 0s from your numbers, then take a numeric average of the result. As you are foreseeing, this requires mixing string and numeric operations.
The actual syntax will vary across databases. In MySQL, SQL Server and Oracle, you should be able to do:
select avg(replace(salary, '0', '') + 0) as myavg
from mytable
This involves two steps of implicit conversion: replace() forces a string context, and + 0 turns the result back into a number. In SQL Server, you will get an integer result; if you want a decimal average, add a decimal value instead, that is, + 0.0 rather than + 0.
In Postgres, where implicit conversion does not happen as easily, you would use explicit casts:
select avg(replace(salary::text, '0', '')::int) as myavg
from mytable
This returns a decimal value.
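The same two steps outside the database, with the string round trip made explicit. This is a sketch on the sample data from the question; the function name is hypothetical, and treating an all-zero value as 0 is an assumption (roughly what MySQL's '' + 0 gives).

```python
def strip_zeros(n: int) -> int:
    """Remove every '0' digit from an integer; an all-zero value becomes 0."""
    digits = str(n).replace("0", "")
    return int(digits) if digits else 0

salaries = [1420, 690, 500]  # sample Salary column from the question
stripped = [strip_zeros(s) for s in salaries]
print(stripped)                       # [142, 69, 5]
print(sum(stripped) / len(stripped))  # 72.0
```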
Do you just want conditional aggregation?
select avg(salary), avg(case when salary <> 0 then salary end)
from t;
or do you want division?
select id, name, floor(salary / 10)
from t;
This produces some of the results you specify (1420 becomes 142 and 690 becomes 69, but 500 becomes 50, not 5), and it has nothing to do with averages.

PostgreSQL: order by column, with specific NON-NULL value LAST

When I discovered NULLS LAST, I kinda hoped it could be generalised to 'X LAST' in a CASE statement in the ORDER BY portion of a query.
Not so, it would seem.
I'm trying to sort a table by two columns (easy), but get the output in a specific order (easy), with one specific value of one column to appear last (got it done... ugly).
Let's say that the columns are zone and status (don't blame me for naming a column zone - I didn't name them). status only takes 2 values ('U' and 'S'), whereas zone can take any of about 100 values.
One subset of zone's values is (in pseudo-regexp) IN[0-7]Z, and those are first in the result. That's easy to do with a CASE.
zone can also take the value 'Future', which should appear LAST in the result.
In my typical kludgy-munge way, I have simply imposed a CASE value of 1000 as follows:
group by zone, status
order by (
case when zone='IN1Z' then 1
when zone='IN2Z' then 2
when zone='IN3Z' then 3
.
. -- other IN[X]Z etc
.
when zone = 'Future' then 1000
else 11 -- [number of defined cases +1]
end), zone, status
This works, but it's obviously a kludge, and I wonder if there might be a one-liner doing the same.
Is there a cleaner way to achieve the same result?
Postgres allows boolean values in the ORDER BY clause, so here is your generalised 'X LAST':
ORDER BY (my_column = 'X')
The expression evaluates to boolean, resulting values sort this way:
FALSE (0)
TRUE (1)
NULL
Since we deal with non-null values, that's all we need. Here is your one-liner:
...
ORDER BY (zone = 'Future'), zone, status;
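The boolean sort key can be tried without a Postgres instance. In this sketch SQLite (via Python's sqlite3, a substitute for Postgres) evaluates zone = 'Future' to 0 or 1, which sorts the same way as FALSE before TRUE. The table and rows are made up:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (zone TEXT, status TEXT)")
conn.executemany(
    "INSERT INTO t VALUES (?, ?)",
    [("Future", "U"), ("IN2Z", "S"), ("Alpha", "U"), ("IN1Z", "U"), ("IN1Z", "S")],
)

# (zone = 'Future') is 0 for every other zone and 1 for 'Future', so those rows sort last.
rows = conn.execute(
    "SELECT zone, status FROM t ORDER BY (zone = 'Future'), zone, status"
).fetchall()
print(rows)
# [('Alpha', 'U'), ('IN1Z', 'S'), ('IN1Z', 'U'), ('IN2Z', 'S'), ('Future', 'U')]
```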
Related:
Sorting null values after all others, except special
Select query but show the result from record number 3
SQL two criteria from one group-by
I'm not familiar with PostgreSQL specifically, but I've worked with similar problems in MS SQL Server. As far as I know, the only "nice" way to solve a problem like this is to create a separate table of zone values and assign each one a sort sequence.
For example, let's call the table ZoneSequence:
Zone | Sequence
------ | --------
IN1Z | 1
IN2Z | 2
IN3Z | 3
Future | 1000
And so on. Then you simply join ZoneSequence into your query, and sort by the Sequence column (make sure to add good indexes!).
The good thing about this method is that it's easy to maintain when new zone codes are created, as they likely will be.
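A minimal sketch of the lookup-table approach, run on SQLite through Python's sqlite3 rather than SQL Server or Postgres. The LEFT JOIN with a COALESCE fallback is an addition here (borrowing the question's else 11); it keeps zones missing from ZoneSequence instead of silently dropping them. Table t and its rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE ZoneSequence (Zone TEXT PRIMARY KEY, Sequence INTEGER NOT NULL);
    INSERT INTO ZoneSequence VALUES ('IN1Z', 1), ('IN2Z', 2), ('IN3Z', 3), ('Future', 1000);
    CREATE TABLE t (zone TEXT, status TEXT);
    INSERT INTO t VALUES ('Future', 'U'), ('IN3Z', 'S'), ('IN1Z', 'U'), ('IN2Z', 'S'), ('B5X', 'U');
    """
)

# Zones without a sequence fall back to 11, between the IN zones and 'Future'.
rows = conn.execute(
    """
    SELECT t.zone, t.status
    FROM t
    LEFT JOIN ZoneSequence zs ON zs.Zone = t.zone
    ORDER BY COALESCE(zs.Sequence, 11), t.zone, t.status
    """
).fetchall()
print(rows)
# [('IN1Z', 'U'), ('IN2Z', 'S'), ('IN3Z', 'S'), ('B5X', 'U'), ('Future', 'U')]
```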

Round up value to nearest whole number in SQL UPDATE

I'm running SQL that needs to round values up to a whole number.
What I need is for 45.01 to round up to 46. Also, 45.49 should round to 46. And 45.99 should round up to 46, too. I want everything rounded up to the next whole number.
How do I achieve this in an UPDATE statement like the following?
Update product SET price=Round
You could use the ceiling function; this SQL:
select ceiling(45.01), ceiling(45.49), ceiling(45.99);
will get you "46" each time.
For your update, then, I'd say:
Update product SET price = ceiling(price)
BTW: on MySQL, CEIL is an alias for CEILING; I'm not sure about other DB systems, so you might have to use one or the other, depending on the DB you are using.
Quoting the documentation:
CEILING(X)
Returns the smallest integer value not
less than X.
And the given example :
mysql> SELECT CEILING(1.23);
-> 2
mysql> SELECT CEILING(-1.23);
-> -1
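Python's math.ceil follows the same definition (the smallest integer not less than x), so it can be used to sanity-check the expected results, including the negative case from the documentation example. This only mirrors the arithmetic of the UPDATE; it doesn't touch a database:

```python
import math

prices = [45.01, 45.49, 45.99]  # examples from the question

# Equivalent of UPDATE product SET price = CEILING(price) on an in-memory list.
rounded = [math.ceil(p) for p in prices]
print(rounded)  # [46, 46, 46]

# CEILING goes toward positive infinity, so negatives move toward zero.
print(math.ceil(1.23), math.ceil(-1.23))  # 2 -1
```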
Try ceiling...
SELECT Ceiling(45.01), Ceiling(45.49), Ceiling(45.99)
http://en.wikipedia.org/wiki/Floor_and_ceiling_functions
For MS SQL CEILING(your number) will round it up.
FLOOR(your number) will round it down
Combine round and ceiling to get a proper round up.
select ceiling(round(984.375000, 0)) => 984
while
select round(984.375000, 0) => 984.000000
and
select ceil (984.375000) => 985
Ceiling is the command you want to use.
Unlike Round, Ceiling only takes one parameter (the value you wish to round up), so if you want to round up to a given number of decimal places, you need to multiply the number by 10 to the power of that many places (100 for two places) first, and divide afterwards.
Example.
I want to round up 1.2345 to 2 decimal places.
CEILING(1.2345*100)/100 AS Cost
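The multiply-then-divide trick works exactly on DECIMAL/NUMERIC values, but it is fragile on binary floating-point values. A small Python sketch of both; the helper name ceil_places is hypothetical, and Decimal.quantize with ROUND_CEILING is a swapped-in alternative to multiplying and dividing:

```python
import math
from decimal import Decimal, ROUND_CEILING

def ceil_places(value: Decimal, places: int) -> Decimal:
    """Round up (toward +infinity) to a fixed number of decimal places."""
    return value.quantize(Decimal(1).scaleb(-places), rounding=ROUND_CEILING)

print(ceil_places(Decimal("1.2345"), 2))  # 1.24

# With binary floats, 1.1 * 100 is slightly above 110, so the
# multiply/ceil/divide trick pushes the result up by a cent.
print(math.ceil(1.1 * 100) / 100)      # 1.11
print(ceil_places(Decimal("1.1"), 2))  # 1.10
```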
If you want to round off, use the round function. Use the ceiling function when you want the smallest integer not less than your argument.
For example, select round(843.4923423423,0) from dual gives you 843, and
select round(843.6923423423,0) from dual gives you 844.
This depends on the database server, but it is often called something like CEIL or CEILING. For example, in MySQL...
mysql> select ceil(10.5);
+------------+
| ceil(10.5) |
+------------+
| 11 |
+------------+
You can then do UPDATE PRODUCT SET price=CEIL(some_other_field);