BigQuery: INTEGER type overflow - google-bigquery

I'm experiencing a problem with the INTEGER type. It overflows and there is no way to prevent it (as it's a 64-bit signed int). Worst of all, it overflows with no error and simply becomes a negative number:
SELECT 9223372036854775807 + 1
Is there any way to overcome this issue (maybe Google has plans to introduce new int types)?

BigQuery will provide an option for SQL to raise an error in such cases (integer overflow, division by zero, etc.).

You can detect such conditions and use e.g. NULL as an error indicator, at the cost of more typing.
Something like (assuming you are adding up two non-negative values):
select if(a + b >= a, a + b, NULL) from
( -- sample data
select 9223372036854775807 as a, 1 as b
)
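The same check can be sketched outside BigQuery. The following Python snippet is only an illustration: it simulates signed 64-bit wraparound by hand (Python integers do not overflow on their own), and the helper names wrap_int64 and checked_add are made up for this example.

```python
INT64_MIN = -2**63
INT64_MAX = 2**63 - 1

def wrap_int64(value: int) -> int:
    """Reduce an arbitrary Python int to a two's-complement signed 64-bit value."""
    return (value - INT64_MIN) % 2**64 + INT64_MIN

def checked_add(a: int, b: int):
    """Mirror of IF(a + b >= a, a + b, NULL); assumes a and b are non-negative."""
    total = wrap_int64(a + b)
    return total if total >= a else None

print(wrap_int64(INT64_MAX + 1))   # the wrapped (negative) result
print(checked_add(INT64_MAX, 1))   # overflow detected, reported as None
print(checked_add(40, 2))          # ordinary addition passes through
```

The comparison only works because both inputs are non-negative: a wrapped sum is then always smaller than either operand.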

Related

SELECT vs UPDATE, Unexpected rounding when using ABS function

Attached is a code sample to run in SQL Server. This seems like unexpected behavior. ABS should just remove the negative sign from the number, but when the same function is used in an UPDATE statement it takes the absolute value and also rounds the number. Why is this?
DECLARE @TEST TABLE (TEST varchar(2048));
INSERT INTO @TEST VALUES (' -29972.95');
SELECT TEST FROM @TEST;
SELECT ABS(TEST) FROM @TEST;
UPDATE @TEST SET TEST = ABS(TEST);
SELECT TEST FROM @TEST;
Below are the results of that code.
-29972.95
29972.95
29973
This seems to be more a "feature" of the CONVERT function than anything to do with SELECT or UPDATE. The only reason the results differ is that the UPDATE implicitly converts the FLOAT returned by ABS(...) back into VARCHAR.
The compute scalar in the update plan contains the expression
[Expr1003] = Scalar Operator(CONVERT_IMPLICIT(varchar(2048),
abs(CONVERT_IMPLICIT(float(53),[TEST],0))
,0) /*<-- style used for convert from float*/
)
Value - Output
0 (default) - A maximum of 6 digits. Use in scientific notation, when appropriate.
1 - Always 8 digits. Always use in scientific notation.
2 - Always 16 digits. Always use in scientific notation.
From MSDN: https://learn.microsoft.com/en-us/sql/t-sql/functions/cast-and-convert-transact-sql?view=sql-server-2017
This can be seen in the example below:
SELECT
[# Digits],
CONVERT(FLOAT(8), CONVERT(VARCHAR(20), N)) AS [FLOAT(VARCHAR(N))],
CONVERT(FLOAT(8), CONVERT(VARCHAR(20), N, 0)) AS [FLOAT(VARCHAR(N, 0))],
CONVERT(FLOAT(8), CONVERT(VARCHAR(20), N, 1)) AS [FLOAT(VARCHAR(N, 1))]
FROM (SELECT '6 digits', ABS('9972.95') UNION ALL SELECT '7 digits', ABS('29972.95')) T ([# Digits], N)
This returns the following results:
# Digits  FLOAT(VARCHAR(N))  FLOAT(VARCHAR(N, 0))  FLOAT(VARCHAR(N, 1))
--------  -----------------  --------------------  --------------------
6 digits  9972.95            9972.95               9972.95
7 digits  29973              29973                 29972.95
This shows that the UPDATE was effectively using CONVERT(VARCHAR, ABS(...)) with the default style of 0, which limits the FLOAT returned by ABS to 6 digits. With one digit fewer (the 6-digit row), the value fits within that limit and survives the implicit conversion intact.
Taking this back to the OP:
The ABS function in this case returns a FLOAT.
The UPDATE then caused an implicit conversion that was effectively CONVERT(VARCHAR(2048), ABS(...), 0), which then overflowed the maximum number of digits of the default style.
To get around this behavior (if this is related to a practical issue), you need to specify the style of 1 or 2 (or even 3 to get 17 digits) to avoid this truncation (but be sure to handle the scientific notation used since it is now always returned in this case)
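The 6-digit limit is easy to reproduce outside SQL Server. The sketch below uses Python's ".6g" format as a stand-in for style 0; that equivalence is an assumption made for illustration, not something the SQL Server documentation states, and six_digit_text is a made-up helper name.

```python
def six_digit_text(value: float) -> str:
    """Format a float with at most 6 significant digits (stand-in for style 0)."""
    return format(value, ".6g")

for number in (9972.95, 29972.95, 29972.94):
    # repr() keeps the full value; the 6-digit format rounds the 7-digit inputs.
    print(repr(number), "->", six_digit_text(number))
```

The 6-digit value comes through unchanged, while the 7-digit values lose their last digit to rounding, which matches the pattern in the results above.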
(some preliminary testing deleted for brevity)
It definitely has to do with silent truncating during INSERT/UPDATEs.
If you change the value insertion to this:
INSERT INTO @TEST SELECT ABS(' -29972.95')
You immediately get the same rounding/truncation without doing an UPDATE.
Meanwhile, SELECT ABS(' -29972.95') produces expected results.
Further testing supports the theory of an implicit float conversion, and indicates that the culprit lies with the conversion back to varchar:
DECLARE @Flt float = ' -29972.95'
SELECT @Flt;
SELECT CAST(@Flt AS varchar(2048))
Produces:
-29972.95
-29972
Probably final edit:
I was sniffing up the same tree as Martin. I found this.
Which made me try this:
DECLARE @Flt float = ' -29972.95'
SELECT @Flt;
SELECT CONVERT(varchar(2048),@Flt,128)
Which produced this:
-29972.95
-29972.95
So I'm gonna call this kinda documented since the 128 style is a legacy style that is deprecated and may go away in a future release. But none of the currently documented styles produce the same result. Very interesting.
ABS() is supposed to operate on numeric values and varchar input is converted to float. Most likely explanation for this behavior is that float has highest precedence among all numeric data types such as decimal, int, bit.
Your SELECT statement simply returns the float result. However the UPDATE statement implicitly converts the float back to varchar producing unexpected results:
SELECT
test,
ABS(test) AS test_abs,
CAST(ABS(test) AS VARCHAR(100)) AS test_abs_str
FROM (VALUES
('-29972.95'),
('-29972.94'),
('-29972.9')
) AS test(test)
test | test_abs | test_abs_str
----------|----------|-------------
-29972.95 | 29972.95 | 29973
-29972.94 | 29972.94 | 29972.9
-29972.9 | 29972.9 | 29972.9
I would suggest that you use explicit conversion and exact numeric datatype to avoid this and other potential problems with implicit conversions / floats:
SELECT
test,
ABS(CAST(test AS DECIMAL(18, 2))) AS test_abs,
CAST(ABS(CAST(test AS DECIMAL(18, 2))) AS VARCHAR(100)) AS test_abs_str
FROM (VALUES
('-29972.95'),
('-29972.94'),
('-29972.9')
) AS test(test)
test | test_abs | test_abs_str
----------|----------|-------------
-29972.95 | 29972.95 | 29972.95
-29972.94 | 29972.94 | 29972.94
-29972.9 | 29972.90 | 29972.90
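The idea of staying in an exact decimal type carries over to other environments. As a rough analogue only (Python's decimal module is not SQL Server's DECIMAL(18, 2), and abs_as_text is a hypothetical helper), the same three inputs can be processed without ever passing through a float:

```python
from decimal import Decimal

def abs_as_text(text: str) -> str:
    """Parse text as an exact decimal, take ABS, and render with 2 decimal places."""
    exact = abs(Decimal(text)).quantize(Decimal("0.01"))
    return str(exact)

for text in ("-29972.95", "-29972.94", "-29972.9"):
    print(text, "->", abs_as_text(text))
```

Because no binary floating-point value is involved, converting back to text cannot round away digits.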
ABS is a mathematical function, which means it is designed to work with numeric values. You cannot expect proper behavior from it when using other data types such as VARCHAR, as in this case. I suggest first doing the required CAST to a numeric data type before applying ABS, as follows:
UPDATE @TEST SET TEST = ABS(CAST(TEST AS DECIMAL(18,2)))
After this your query will output
29972.95
This does not explain how ABS can work fine when selecting but not when updating a value. Maybe it is a bug in SQL Server, but it is also really bad practice to skip casting to the data types that functions require. Maybe an implicit cast occurs when a SELECT is performed but is ignored on UPDATE because Microsoft expects you to do the right thing.

Why does count(*) return an unsigned integer?

The following query is an example where default values (in this example INTEGER(21)) are mixed with computed values (in this example COUNT(*)).
SELECT
dimension,
SUM(metric)
FROM (
SELECT
"dim1" AS dimension,
INTEGER(21) AS metric),
(
SELECT
dimension,
COUNT(*) AS metric
FROM (
SELECT
"dim2" AS dimension,
INTEGER(42) AS metric)
GROUP BY
dimension)
GROUP BY
dimension
When running this query, it gets rejected with the following error message:
Cannot union tables : Incompatible types. 'metric' : TYPE_INT64 'metric' : TYPE_UINT64
In other words, the count operation returns an unsigned integer, whereas an integer created manually is signed. I understand the underlying logic of the count operation, which obviously always returns an integer greater than or equal to 0. I also understand that the error can be avoided by casting COUNT(*), wrapping it in the INTEGER constructor on line 11 of my sample query.
I guess my real question is: why does COUNT(*) return an unsigned integer instead of a signed one (which would allow for cleaner and simpler queries as is the case in other SQL-like environments)?
It was just an unfortunate mistake to make COUNT return unsigned integer type, especially since BigQuery doesn't even support unsigned integers in its metadata. But this (and many other issues) is fixed with standard SQL support in BigQuery, which is available as Alpha. For details how to enable it - check https://cloud.google.com/bigquery/sql-reference/enabling-standard-sql
If you're doing a count, it isn't possible to get a negative number. Making the result an unsigned int therefore expands the range of numbers that can be handled.
There are several reasons why using an unsigned int is advantageous:
Philosophical: As you mentioned, COUNT cannot return negative numbers, only natural numbers, which is what unsigned ints are designed to represent. It's the right tool for the job.
Range: An unsigned int can store roughly twice as many non-negative values as a signed int. This greatly decreases the likelihood that the variable will overflow while representing the output of the function.
Type safety: By using a type that cannot represent invalid data it prevents you, the user, from trying to make invalid comparisons. If you try to compare the output of COUNT with a negative number the analyzer can tell you immediately that the comparison you are doing doesn't make sense and is likely to be wrong, thereby potentially saving you from annoying bugs down the line.
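To put a number on the "Range" point, here is a small sketch of the two 64-bit limits. It is plain arithmetic and assumes nothing about BigQuery internals.

```python
INT64_MAX = 2**63 - 1    # largest signed 64-bit value
UINT64_MAX = 2**64 - 1   # largest unsigned 64-bit value

print(INT64_MAX)
print(UINT64_MAX)
# The unsigned maximum is exactly twice the signed maximum, plus one.
print(UINT64_MAX == 2 * INT64_MAX + 1)
```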

Average Row [SQL]

Actually, I'm a bit confused about what I should write in the subject.
The point is this: I want to average Speed01, Speed02, Speed03 and Speed04:
SELECT
Table01.Test_No,
Table01.Speed01,
Table01.Speed02,
Table01.Speed03,
Table01.Speed04,
I want to create new column that consists of this average -->>
AVG(Table01.Speed01, Table01.Speed02, Table01.Speed03,Table01.Speed04) as "Average"
I have tried this, but it did not work.
From
Table01
The Speed columns may or may not contain data: sometimes Speed02 has no number while the others do, sometimes Speed04 is also missing, and sometimes only one column (for example, only Speed01) has data. It depends on whether the sensor was able to capture the speed of the test material.
It would be a big help if you could find a solution. I'm a newbie here.
THANK YOU ^^
AVG is a SQL aggregate function: it averages one column across rows, not several columns within a row, so it is not applicable here. Simply do the math. The average is the sum divided by the count:
(SPEED01 + SPEED02 + SPEED03 +SPEED04)/4
To deal with missing values, use NULLIF or COALESCE:
(COALESCE(SPEED01, 0) + COALESCE(SPEED02, 0) + COALESCE(SPEED03, 0) + COALESCE(SPEED04, 0))
That leaves the denominator. You need to add 1 for every non null. For example:
(COALESCE(SPEED01/SPEED01,0) + COALESCE(SPEED02/SPEED02,0) + ...)
You can also use CASE, depending on the supported SQL dialect, to avoid the possible divide by 0:
CASE WHEN SPEED01 IS NULL THEN 0 ELSE 1 END
OR you can normalize the data, extract all SPEEDs into a 1:M relation and use the AVG aggregate, avoiding all these issues. Not to mention the possibility to add a 5th measurement, then a 6th and so on and so forth!
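The sum-over-count logic described above can be sketched in a few lines of Python, with None standing in for NULL. The function name row_average is made up for this illustration, and returning None when every reading is missing is an assumption about the desired behavior.

```python
from typing import Optional, Sequence

def row_average(speeds: Sequence[Optional[float]]) -> Optional[float]:
    """Average the readings that are present; None plays the role of NULL."""
    present = [s for s in speeds if s is not None]
    if not present:
        return None  # no readings at all, so there is nothing to average
    return sum(present) / len(present)

print(row_average([10, None, 20, None]))      # only two readings count
print(row_average([None, None, None, None]))  # all sensors missed
print(row_average([0, 4, None, None]))        # a real zero still counts
```

Counting with an explicit "is not None" test, like the CASE expression, treats a genuine reading of 0 as a measurement, which the SPEED01/SPEED01 trick cannot do.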
Just add the columns and divide them by 4. To deal with the "missing" values use coalesce to treat NULL values as zero:
SELECT Test_No,
(coalesce(Speed01,0) + coalesce(Speed02,0) + coalesce(Speed03,0) + coalesce(Speed04,0)) / 4 as "Average"
FROM Table01;
You didn't mention your DBMS (Postgres, Oracle, ...), but the above is ANSI (standard) SQL and should run on nearly every DBMS.
As I understood your question, I supposed that Table01.Speed01, Table01.Speed03, Table01.Speed04 are nullable and of type int whereas Table01.Speed02 is nullable and of type nvarchar:
SELECT
Table01.Test_No,
(
ISNULL(Table01.Speed01, 0) +
CASE ISNUMERIC(Table01.Speed02) WHEN 0 THEN 0 ELSE CAST(Table01.Speed02 AS int) END +
ISNULL(Table01.Speed03, 0) +
ISNULL(Table01.Speed04, 0)
)/4 AS AVG
FROM Table01

SQL Server POWER function

Using SQL Server 2008 R2 when I enter the following query:
SELECT CAST(POWER(2.0, 63.0) AS BIGINT);
Which yields the result:
9223372036854775800
However, using the Windows desktop calculator and raising 2 to the 63 yields:
9223372036854775807
Can someone please explain the difference -- or is there some internal conversion that SQL Server is doing? ... or am I missing something else?
The range of BIGINT in MS SQL Server is:
-2^63 (-9,223,372,036,854,775,808) to 2^63-1 (9,223,372,036,854,775,807)
And your calculator is giving you the wrong number, because 2^63 can't have an odd number for its right-most digit.
The POWER function in SQL Server (http://technet.microsoft.com/en-us/library/ms174276.aspx), returns the same type as its first argument.
The correct way to write this query is:
DECLARE @foo REAL = 2.0
SELECT CAST(POWER( @foo, 63.0 ) AS BIGINT)
This raises the error "Arithmetic overflow error converting expression to data type bigint."
As for why the POWER function returns a wrong number: as @simonatrcl mentioned in his answer, floating-point arithmetic has limited precision, which sometimes produces inexact results. You can read about floating-point numbers and their problems here:
http://www.extremeoptimization.com/resources/Articles/FPDotNetConceptsAndFormats.aspx
You can also check the boundaries for integer types in MS Sql Server here:
http://technet.microsoft.com/en-us/library/ms187745.aspx
POWER will be returning a FLOAT. Floating-point numbers are not accurate beyond certain limits and will drop a bit of accuracy (if you've ever had a negative 0 problem you'll know what I mean!).
That's what you're getting here...
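The loss of accuracy can be seen with ordinary double-precision arithmetic. The sketch below uses Python floats, which are IEEE 754 doubles. Whether SQL Server goes through exactly these steps internally is an assumption, but keeping 17 significant digits of 2^63 does reproduce the number in the question.

```python
from decimal import Decimal

BIGINT_MAX = 2**63 - 1

as_float = 2.0 ** 63                      # exactly 2^63 as a double
seventeen_digits = "%.17g" % as_float     # keep 17 significant digits
rounded = int(Decimal(seventeen_digits))  # back to a whole number

print(int(as_float))                   # the exact value of 2^63
print(float(BIGINT_MAX) == as_float)   # 2^63 - 1 has no double of its own
print(rounded)                         # the digits past the 17th become zeros
```

A double carries 53 bits of precision, so near 2^63 neighbouring representable values are far more than 1 apart, and 2^63 - 1 rounds to 2^63.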
As far as the calculator goes and tested on XP, Win7 and Win8.1:
2^63 = 9223372036854775808 (obviously)
As far as MSSQL goes:
The upper limit of a BIGINT is defined as 2^63-1, meaning 1 less than 2^63
Now if you would like MSSQL to calculate that for you one would be tempted to write something like:
SELECT POWER(CAST(2 AS BIGINT), 63) - 1
The result would be a bigint because you've cast the first argument of the power to a bigint. MSSQL will first calculate the power and then subtract 1. However, since the result of the power would exceed the range of a bigint, this statement will fail: Arithmetic overflow error converting expression to data type bigint.
So let us invoke some math to solve this. I assume everyone agrees with
2^4 = 2 * 2 * 2 * 2 = 2 * (2^3) = 2^3 + 2^3
and thus
2^4-1 = 2 * 2 * 2 * 2 - 1 = 2 * (2^3) - 1 = 2^3 + 2^3 - 1
That's what we're going to make use of...
SELECT POWER(CAST(2 AS BIGINT), 62) + (POWER(CAST(2 AS BIGINT), 62) - 1)
This results in 9223372036854775807 which is indeed the upper limit of a bigint.
Note that the () around the subtraction is really needed. Otherwise the addition of the result of the two powers would be done first, again resulting in an overflow.
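To see why the order of operations matters, here is a sketch that checks every intermediate result against the bigint range. Python integers never overflow, so the range check is simulated; checked and overflows are hypothetical helpers written for this example.

```python
BIGINT_MIN, BIGINT_MAX = -2**63, 2**63 - 1

def checked(value: int) -> int:
    """Raise if an intermediate result would not fit in a bigint."""
    if not BIGINT_MIN <= value <= BIGINT_MAX:
        raise OverflowError("Arithmetic overflow for bigint")
    return value

def naive() -> int:
    # POWER(2, 63) - 1: the power itself is already out of range.
    return checked(checked(2**63) - 1)

def split() -> int:
    # POWER(2, 62) + (POWER(2, 62) - 1): every step stays in range.
    return checked(checked(2**62) + checked(checked(2**62) - 1))

def wrong_order() -> int:
    # Without the parentheses the two powers are added first.
    return checked(checked(checked(2**62) + checked(2**62)) - 1)

def overflows(fn) -> bool:
    try:
        fn()
        return False
    except OverflowError:
        return True

print(split() == BIGINT_MAX, overflows(naive), overflows(wrong_order))
```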

I'm confused about Sqlite comparisons on a text column

I've got an SQLite database where one of the columns is defined as "TEXT NOT NULL". Some of the values are strings, some can be cast to a DOUBLE and some can be cast to INTEGER. Once I've narrowed it down to DOUBLE values, I want to do a query that gets a range of data. Suppose my column is named "Value". Can I do this?
SELECT * FROM Tbl WHERE ... AND Value >= 23 AND Value < 42
Is that going to do some kind of ASCII comparison or a numeric comparison? INTEGER or REAL? Does the BETWEEN operator work the same way?
And what happens if I do this?
SELECT MAX(Value) FROM Tbl WHERE ...
Will it do string or integer or floating-point comparisons?
It is all explained in the Datatypes In SQLite Version 3 article. For example, the answer to the first portion of questions is
An INTEGER or REAL value is less than any TEXT or BLOB value. When an INTEGER or REAL is compared to another INTEGER or REAL, a numerical comparison is performed.
This is why SELECT 9 < '1' and SELECT 9 < '11' both give 1 (true).
The expression "a BETWEEN b AND c" is treated as two separate binary comparisons "a >= b AND a <= c"
The most important point to know is that column type is merely an annotation; SQLite is dynamically typed so each value can have any type.
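These rules can be tried directly with Python's built-in sqlite3 module. The table contents below are invented for illustration. The sketch relies on the documented affinity rule that a TEXT column compared with a bare numeric literal is compared as text, and uses CAST to force numeric comparison.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Tbl (Value TEXT NOT NULL)")
conn.executemany("INSERT INTO Tbl (Value) VALUES (?)",
                 [("3",), ("30",), ("100",), ("abc",)])

def column(sql: str) -> list:
    """Run a query and return its first column as a list."""
    return [row[0] for row in conn.execute(sql)]

# TEXT column vs. numeric literal: the literal is treated as text.
text_range = column(
    "SELECT Value FROM Tbl WHERE Value >= 23 AND Value < 42 ORDER BY rowid")
# Explicit CAST forces a numeric comparison.
numeric_range = column(
    "SELECT Value FROM Tbl WHERE CAST(Value AS REAL) >= 23 "
    "AND CAST(Value AS REAL) < 42 ORDER BY rowid")
text_max = column("SELECT MAX(Value) FROM Tbl")[0]
numeric_max = column("SELECT MAX(CAST(Value AS REAL)) FROM Tbl")[0]

print(text_range, numeric_range, text_max, numeric_max)
```

In this sample the uncast range query also matches '3', because '3' sorts between '23' and '42' as text, and MAX picks 'abc' for the same reason.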
You can't convert text to integer or double, so you won't be able to do what you want.
If the column were varchar you could have a chance by doing:
select *
from Tbl
WHERE ISNUMERIC(Value ) = 1 --condition to avoid a conversion from string to int for example
and cast(value as integer) > 1 --rest of your conditions