Finding non-numeric values in varchar column - sql

Requirement:
A generic query/function to check that the value in a varchar column of a table is actually a number and that its precision does not exceed the allowed precision.
Available values:
Table_Name, Column_Name, Allowed Precision, Allowed Scale
The general advice would be to create a function and use to_number() to validate the value; however, that won't validate the allowed length (precision - scale).
My solution:
Validate the number using a regexp: NOT REGEXP_LIKE(COLUMN_NAME, '^-?[0-9.]+$')
Validate the length of the left component (the part before the decimal point; I have no idea what it's actually called), because for the scale Oracle automatically rounds off if required. As the actual column is varchar, I will use SUBSTR and INSTR to find the component to the left of the decimal point.
As the regexp above allows numbers like 123...123124..55, I will also validate the number of decimal points [if > 1 then error].
Query to find invalid numbers:
Select * From Table_Name
Where
(NOT REGEXP_LIKE(COLUMN_NAME, '^-?[0-9.]+$')
OR
Function_To_Fetch_Left_Component(COLUMN_NAME) > (Precision-Scale)
/* Could use REGEXP_SUBSTR now, but I already had a function for that */
OR
LENGTH(Column_Name) - LENGTH(REPLACE(Column_Name,'.','')) > 1
/* Could use REGEXP_COUNT as well */)
I was happy and satisfied with my solution until a row whose value was only '.' escaped my checks and showed me their limitations. Adding another check for this case would solve my problem, but the solution as a whole looks very inefficient to me.
I would really appreciate a better solution of any kind.
Thanks in advance.

Look for:
One-or-more digits optionally followed by a decimal point and zero-or-more digits; or
A leading decimal point (no preceding unit digit) and then one or more (decimal) digits.
Like this:
Select *
From Table_Name
Where NOT REGEXP_LIKE(COLUMN_NAME, '^[+-]?(\d+(\.\d*)?|\.\d+)$')
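A quick way to sanity-check the pattern outside the database is Python's re module. This is a sketch under the assumption that Python's regex engine treats this simple pattern (anchors, character classes, ?, *, + and alternation) the same way Oracle's REGEXP_LIKE does; is_valid_number is a hypothetical helper, and re.ASCII restricts \d to 0-9.

```python
import re

# Pattern from the answer: optional sign, then digits with an optional
# fractional part, or a leading '.' followed by at least one digit.
NUMBER_PATTERN = re.compile(r'^[+-]?(\d+(\.\d*)?|\.\d+)$', re.ASCII)


def is_valid_number(text: str) -> bool:
    """Hypothetical helper: True if the whole string is a plain decimal number."""
    # fullmatch() so that a trailing newline cannot slip past '$'.
    return NUMBER_PATTERN.fullmatch(text) is not None


for value in ["123", "-0.5", ".5", "5.", ".", "123...123124..55", "1e5", "abc"]:
    print(f"{value!r:>20} -> {is_valid_number(value)}")
```

The lone '.' and the multi-dot value from the question are both rejected, because every alternative requires at least one digit and allows at most one point.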
If you do not want zero-padded values in the number string then:
Select *
From Table_Name
Where NOT REGEXP_LIKE(COLUMN_NAME, '^[+-]?(([1-9]\d*|0)(\.\d*)?|\.\d+)$')
With precision and scale (assuming they behave as for a NUMBER(precision, scale) data type and 0 < scale < precision):
Select *
From Table_Name
Where NOT REGEXP_LIKE(COLUMN_NAME, '^[+-]?(\d{1,'||(precision-scale)||'}(\.\d{0,'||scale||'})?|\.\d{1,'||scale||'})$')
or, for non-zero-padded numbers with precision and scale:
Select *
From Table_Name
Where NOT REGEXP_LIKE(COLUMN_NAME, '^[+-]?(([1-9]\d{0,'||(precision-scale-1)||'}|0)(\.\d{0,'||scale||'})?|\.\d{1,'||scale||'})$')
or, for any precision and scale:
Select *
From Table_Name
Where NOT REGEXP_LIKE(
COLUMN_NAME,
CASE
WHEN scale <= 0
THEN '^[+-]?(\d{1,'||precision||'}0{'||(-scale)||'})$'
WHEN scale < precision
THEN '^[+-]?(\d{1,'||(precision-scale)||'}(\.\d{0,'||scale||'})?|\.\d{1,'||scale||'})$'
WHEN scale >= precision
THEN '^[+-]?(0(\.0{0,'||scale||'})?|0?\.0{'||(scale-precision)||'}\d{1,'||precision||'})$'
END
)
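The CASE expression can be mirrored in Python to see which strings each branch accepts. A minimal sketch, assuming Python's re behaves like Oracle's regex engine for these patterns; number_pattern and fits_number are hypothetical names, and re.ASCII keeps \d to 0-9.

```python
import re


def number_pattern(precision: int, scale: int) -> str:
    """Build the regex from the CASE expression above for NUMBER(precision, scale)."""
    if scale <= 0:
        # Up to `precision` digits followed by exactly -scale zeros.
        return r'^[+-]?(\d{1,%d}0{%d})$' % (precision, -scale)
    if scale < precision:
        # Up to precision - scale integer digits, up to `scale` fractional digits.
        return r'^[+-]?(\d{1,%d}(\.\d{0,%d})?|\.\d{1,%d})$' % (
            precision - scale, scale, scale)
    # scale >= precision: zero, or a fraction with scale - precision leading zeros.
    return r'^[+-]?(0(\.0{0,%d})?|0?\.0{%d}\d{1,%d})$' % (
        scale, scale - precision, precision)


def fits_number(text: str, precision: int, scale: int) -> bool:
    """Hypothetical helper: does `text` match the pattern for NUMBER(precision, scale)?"""
    return re.fullmatch(number_pattern(precision, scale), text, re.ASCII) is not None


print(number_pattern(5, 2))
print(fits_number("123.45", 5, 2), fits_number("1234.5", 5, 2))
```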

The precision means that you want at most allowed_precision digits in the number (strictly speaking, not counting leading zeros, but I'll ignore that). The scale means that at most allowed_scale digits can come after the decimal point.
This suggests a regular expression such as:
^[-]?[0-9]{1,<before>}([.][0-9]{0,<after>})?$
The ^ and $ anchors force the whole value to match, and grouping the fractional digits with the decimal point stops a value without a point from having its trailing digits counted as the fractional part.
You can construct the regular expression:
NOT REGEXP_LIKE(COLUMN_NAME,
REPLACE(REPLACE('^[-]?[0-9]{1,<before>}([.][0-9]{0,<after>})?$', '<before>', allowed_precision - allowed_scale
), '<after>', allowed_scale))
That said, regular expressions built at run time can be inefficient. You can also express the logic with LIKE and other functions. I think the conditions are:
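(column_name not like '%.%.%' and
column_name not like '_%-%' and
translate(column_name, 'x0123456789-.', 'x') is null and
length(translate(column_name, 'x-.', 'x')) <= allowed_precision and
length(translate(column_name, 'x-.', 'x')) >= 1 and
nvl(length(translate(substr(column_name, 1, instr(column_name || '.', '.') - 1), 'x-', 'x')), 0) <= allowed_precision - allowed_scale
)
The dummy 'x' goes first in TRANSLATE's from-string so that it maps to itself; every other listed character has no counterpart in the to-string and is removed. The last condition counts the digits before the decimal point (appending '.' lets INSTR find a point even in whole numbers).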
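These conditions are plain string checks, so their logic can be tried out in Python. A sketch, assuming Oracle's rule that an empty string is NULL (so '' has to fail); passes_like_checks is a hypothetical name, and set/str operations stand in for LIKE and TRANSLATE.

```python
def passes_like_checks(value: str, allowed_precision: int, allowed_scale: int) -> bool:
    """Hypothetical Python mirror of the LIKE/TRANSLATE conditions."""
    # column_name not like '%.%.%'  -> at most one decimal point
    if value.count(".") > 1:
        return False
    # column_name not like '_%-%'  -> a '-' may appear only as the first character
    if "-" in value[1:]:
        return False
    # translate(column_name, 'x0123456789-.', 'x') is null -> no other characters
    if set(value) - set("0123456789-."):
        return False
    # translate(column_name, 'x-.', 'x') leaves only the digits: need 1..allowed_precision
    digits = value.replace("-", "").replace(".", "")
    if not 1 <= len(digits) <= allowed_precision:
        return False
    # digits left of the decimal point must fit in allowed_precision - allowed_scale
    integer_part = value.split(".", 1)[0].replace("-", "")
    return len(integer_part) <= allowed_precision - allowed_scale


for v in ["123.45", ".", "-", "1.2.3", "12345"]:
    print(repr(v), passes_like_checks(v, 5, 2))
```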

Related

Hive casting function

In a Hive table, how can I add a '-' sign to a field, but only for random records? If I use the syntax below, it changes all the records in the field to negative, but I want to change only random records to negative.
This is the syntax I used which changed all the records to negative:
CAST(CAST(-1 AS DECIMAL(1,0)) AS DECIMAL(19,2))
*CAST(regexp_replace(regexp_replace(TRIM(column name),'\\-',''),'-','') as decimal(19,2)),
If you want to change random values to negative, why not use a case expression?
select (case when rand() < 0.5 then - column_name else column_name end)
Despite your query, this assumes that the column is a number of some sort, because negating strings doesn't make much sense.
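The CASE idea does not depend on Hive; it can be tried in SQLite through Python's sqlite3 module. A sketch assuming SQLite as a stand-in: SQLite has no rand(), so random() % 2 = 0 serves as the coin flip, and the table t is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (column_name REAL)")
conn.executemany("INSERT INTO t VALUES (?)", [(v,) for v in (1.5, 2.0, 3.25, 4.0, 5.75)])

# random() returns a random 64-bit integer, so "random() % 2 = 0" is true
# for roughly half of the rows, standing in for Hive's "rand() < 0.5".
rows = conn.execute(
    "SELECT column_name,"
    " CASE WHEN random() % 2 = 0 THEN -column_name ELSE column_name END"
    " FROM t ORDER BY rowid"
).fetchall()

for original, maybe_negated in rows:
    print(original, maybe_negated)
```

Which rows come back negated changes on every run; each output value is either the original or its negation.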

How to check if a value is a number in SQLite

I have a column that contains numbers and other string values (like "?", "???", etc.)
Is it possible to add an "is number" condition to the where clause in SQLite? Something like:
select * from mytable where isnumber(mycolumn)
From the documentation,
The typeof(X) function returns a string that indicates the datatype of the expression X: "null", "integer", "real", "text", or "blob".
You can use where typeof(mycolumn) = 'integer'
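Whether typeof() helps depends on how the value was stored. A small sketch using Python's built-in sqlite3 module with a hypothetical table: a column with no declared type keeps integers and reals as such, while a column declared TEXT converts them to text, so typeof() reports 'text' there.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# `untyped` has no declared type (no type affinity);
# `as_text` is declared TEXT, so numbers stored there become text.
conn.execute("CREATE TABLE mytable (untyped, as_text TEXT)")
conn.executemany(
    "INSERT INTO mytable VALUES (?, ?)",
    [(1, 1), (2.5, 2.5), ("?", "?"), ("???", "???")],
)

for row in conn.execute(
    "SELECT untyped, typeof(untyped), as_text, typeof(as_text) FROM mytable ORDER BY rowid"
):
    print(row)

numbers = conn.execute(
    "SELECT untyped FROM mytable WHERE typeof(untyped) IN ('integer', 'real') ORDER BY rowid"
).fetchall()
text_types = {t for (t,) in conn.execute("SELECT typeof(as_text) FROM mytable")}
print(numbers, text_types)
```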
You could also try something like this:
select * from mytable where printf('%d', field1) = field1;
If your column is text and contains both numbers and strings, this may help in extracting the integer data.
Example:
CREATE TABLE mytable (field1 text);
insert into mytable values (1);
insert into mytable values ('a');
select * from mytable where printf('%d', field1) = field1;
field1
----------
1
SELECT *
FROM mytable
WHERE columnNumeric GLOB '*[0-9]*'
select * from mytable where abs(mycolumn) <> 0.0 or mycolumn = '0'
http://sqlfiddle.com/#!5/f1081/2
Based on this answer
To test whether the column contains exclusively an integer (only the digits 0-9, so no sign, decimal point, or whitespace), use:
NOT myColumn GLOB '*[^0-9]*' AND myColumn LIKE '_%'
I.e., we test whether the column contains anything other than a digit and invert the result. Additionally, we test whether it contains at least one character.
Note that GLOB '*[0-9]*' will find digits nested between other characters as well. The function typeof() will return 'text' for a column typed as TEXT, even if the text represents a number. As @rayzinnz mentioned, the abs() function is not reliable either.
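The GLOB/LIKE test can be checked against a few sample strings with Python's sqlite3 module; the table and values are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mytable (myColumn TEXT)")
values = ["123", "0", "?", "???", "12a", "-5", "1.5", ""]
conn.executemany("INSERT INTO mytable VALUES (?)", [(v,) for v in values])

# NOT binds more loosely than GLOB, so this is NOT (myColumn GLOB ...).
digits_only = [row[0] for row in conn.execute(
    "SELECT myColumn FROM mytable"
    " WHERE NOT myColumn GLOB '*[^0-9]*' AND myColumn LIKE '_%'"
    " ORDER BY rowid"
)]
print(digits_only)  # the empty string fails LIKE '_%'; '-5' and '1.5' contain non-digits
```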
As SQLite and MySQL share much of their syntax and both have loose data types, the query below is also possible:
SELECT
<data>
, (
LENGTH(CAST(<data> AS UNSIGNED))
)
=
CASE WHEN CAST(<data> AS UNSIGNED) = 0
THEN CAST(<data> AS UNSIGNED)
ELSE (LENGTH(<data>)
) END AS is_int;
Note that <data> is a BNF placeholder; you would have to replace it with your own values.
This answer is based on my other answer.
For integer strings, test whether the roundtrip CAST matches the original string:
SELECT * FROM mytable WHERE cast(cast(mycolumn AS INTEGER) AS TEXT) = mycolumn
For consistently-formatted real strings (for example, currency):
SELECT * FROM mytable WHERE printf('%.2f', cast(mycolumn AS REAL)) = mycolumn
Input values:
Can't have leading zeroes
Must format negatives as -number rather than (number).
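Both round-trip checks can be tried on sample strings with Python's sqlite3 module. A sketch with a hypothetical TEXT column; the values illustrate the stated limitations (leading zeros, '12.5' vs '12.50').

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mytable (mycolumn TEXT)")
values = ["42", "-3", "042", "abc", "12abc", "12.50", "12.5", "(7)"]
conn.executemany("INSERT INTO mytable VALUES (?)", [(v,) for v in values])

integers = [r[0] for r in conn.execute(
    "SELECT mycolumn FROM mytable"
    " WHERE cast(cast(mycolumn AS INTEGER) AS TEXT) = mycolumn ORDER BY rowid")]

currency = [r[0] for r in conn.execute(
    "SELECT mycolumn FROM mytable"
    " WHERE printf('%.2f', cast(mycolumn AS REAL)) = mycolumn ORDER BY rowid")]

print(integers)  # leading zeros and trailing junk fail the round trip
print(currency)  # only the value already formatted with two decimals matches
```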
You can use the result of CAST(field AS INTEGER) for numbers greater than zero and a simple LIKE '0' condition for numbers equal to zero:
SELECT *
FROM tableName
WHERE CAST(fieldName AS INTEGER) > 0
UNION
SELECT *
FROM tableName
WHERE fieldName like '0';
This answer is comprehensive and eliminates the shortcomings of all other answers. The only caveat is that it isn't standard SQL... but neither is SQLite. If you manage to break this code, please comment below and I will patch it.
Figured this out accidentally. You can check for equality with the CAST value.
CASE {TEXT_field}
WHEN CAST({TEXT_field} AS INTEGER) THEN 'Integer' -- 'Number'
WHEN CAST({TEXT_field} AS REAL) THEN 'Real' -- 'Number'
ELSE 'Character'
END
OR
CASE
WHEN {TEXT_field} = CAST({TEXT_field} AS INTEGER) THEN 'Integer' --'Number'
WHEN {TEXT_field} = CAST({TEXT_field} AS Real) THEN 'Real' --'Number'
ELSE 'Character'
END
(It's the same thing just different syntax.)
Note the order of evaluation: REAL must come after INTEGER.
Perhaps there is some implicit casting of values prior to checking for equality, so that the right side is re-CAST to TEXT before comparison with the left side.
Updated for comment: @SimonWillison
I have added a check for 'Real' values
'1 frog' evaluated to 'Character' for me; which is correct
'0' evaluated to 'Integer' for me; which is correct
I am using SQLite version 3.31.1 with Python's sqlite3 module version 2.6.0. The Python layer should not affect how a query executes.
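The searched CASE form can be run through Python's sqlite3 module to check the classifications mentioned above. A sketch with a hypothetical TEXT column; it assumes SQLite's comparison-affinity rules, under which the TEXT side is converted to a number only when it looks like a complete number.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (text_field TEXT)")
conn.executemany("INSERT INTO t VALUES (?)", [(v,) for v in ["0", "42", "3.5", "1 frog", "abc"]])

query = """
SELECT text_field,
       CASE
         WHEN text_field = CAST(text_field AS INTEGER) THEN 'Integer'
         WHEN text_field = CAST(text_field AS REAL) THEN 'Real'
         ELSE 'Character'
       END
FROM t
ORDER BY rowid
"""
kinds = dict(conn.execute(query).fetchall())
print(kinds)
```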

Value of real type incorrectly compares

I have a field of type REAL in my database. I use PostgreSQL. The query
SELECT * FROM my_table WHERE my_field = 0.15
does not return rows in which the value of my_field is 0.15.
But for instance the query
SELECT * FROM my_table WHERE my_field > 0.15
works properly.
How can I solve this problem and get the rows with my_field = 0.15 ?
To solve your problem use the data type numeric instead, which is not a floating point type, but an arbitrary precision type.
If you enter the numeric literal 0.15 into a numeric (same word, different meaning) column, the exact amount is stored, unlike with a real or float8 column, where the value is coerced to the nearest possible binary approximation. This may or may not be exact, depending on the number and implementation details. The decimal number 0.15 happens to fall between possible binary representations and is stored with a tiny error.
Note that the result of a calculation can be inexact itself, so be still wary of the = operator in such cases.
It also depends how you test. When comparing, Postgres coerces diverging numeric types to a type that can best hold the result.
Consider this demo:
CREATE TABLE t(num_r real, num_n numeric);
INSERT INTO t VALUES (0.15, 0.15);
SELECT num_r, num_n
, num_r = num_n AS test1 --> FALSE
, num_r = num_n::real AS test2 --> TRUE
, num_r - num_n AS result_nonzero --> float8
, num_r - num_n::real AS result_zero --> real
FROM t;
Therefore, if you have entered 0.15 as numeric literal into your column of data type real, you can find all such rows with:
SELECT * FROM my_table WHERE my_field = real '0.15'
Use numeric columns if you need to store fractional digits exactly.
Your problem originates from IEEE 754.
0.15 is not 0.15 but 0.15000000596046448 (assuming single precision, as in a real column), as it cannot be exactly represented as a binary floating-point number.
Why is this a problem? In this case, most likely because the other side of the comparison uses the exact value 0.15 - through an exact representation, like a numeric type. (Cleared up on suggestion by Eric)
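The effect of storing 0.15 in a single-precision column can be reproduced in Python. A sketch, assuming a 4-byte IEEE 754 float behaves like PostgreSQL's real: struct round-trips a double through single precision, and Decimal shows exact binary values; as_real is a hypothetical helper.

```python
import struct
from decimal import Decimal


def as_real(x: float) -> float:
    """Round a Python float (an IEEE 754 double) to single precision, then widen it back."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


stored = as_real(0.15)
print(stored)           # the single-precision value, printed as a double
print(Decimal(stored))  # its exact binary value
print(Decimal(0.15))    # exact value of the double closest to 0.15

# Comparing the stored value with 0.15 fails; rounding both sides to
# single precision first (like num_r = num_n::real) succeeds.
print(stored == 0.15, stored == as_real(0.15))
```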
So there are two ways:
use a format that actually stores the numbers in decimal format - as Erwin suggested
(or at least use the same type across the board)
use rounding as Jack suggested - which has to be used carefully (by the way this uses a numeric type too, to exactly represent 0.15...)
Recommended reading:
What Every Computer Scientist Should Know About Floating-Point Arithmetic
(Sorry for the terse answer...)
Well, I can't see your data, but I'm guessing that my_field doesn't exactly equal 0.15. Try:
select * from my_table where round(my_field::numeric,2) = 0.15;
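The rounding approach can be sketched in Python with exact decimals. This assumes a 4-byte IEEE 754 float stands in for the real column and that converting it to Decimal approximates ::numeric (PostgreSQL actually converts a real through its text form); as_real and rounded_equals are hypothetical helpers.

```python
import struct
from decimal import Decimal, ROUND_HALF_UP


def as_real(x: float) -> float:
    """Simulate a REAL (single-precision) column by rounding through a 4-byte float."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


def rounded_equals(stored: float, target: str, places: int = 2) -> bool:
    """Sketch of round(my_field::numeric, places) = target using exact decimals."""
    quantum = Decimal(1).scaleb(-places)  # 1E-2, i.e. 0.01, for places=2
    return Decimal(stored).quantize(quantum, rounding=ROUND_HALF_UP) == Decimal(target)


stored = as_real(0.15)
print(stored == 0.15)                  # False: plain equality misses the row
print(rounded_equals(stored, "0.15"))  # True: rounding first finds it
```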
Consider both PPTerka's and Jack's answers.
Approximate numeric data types do not store the exact values specified for many numbers.
Look here for Microsoft's description of real values:
http://technet.microsoft.com/en-us/library/ms187912(v=sql.105).aspx

How do I count decimal places in SQL?

I have a column X which is full of floats with decimal places ranging from 0 (no decimals) to 6 (maximum). I can count on the fact that there are no floats with more than 6 decimal places. Given that, how do I make a new column that tells me how many digits come after the decimal point?
I have seen some threads suggesting that I use CAST to convert the float to a string, then parse the string to count the length of the string that comes after the decimal. Is this the best way to go?
You can use something like this:
declare @v sql_variant
set @v=0.1242311
select SQL_VARIANT_PROPERTY(@v, 'Scale') as Scale
This will return 7.
I tried to make the above query work with a float column but couldn't get it working as expected. It only works with a sql_variant column as you can see here: http://sqlfiddle.com/#!6/5c62c/2
So, I proceeded to find another way and building upon this answer, I got this:
SELECT value,
LEN(
CAST(
CAST(
REVERSE(
CONVERT(VARCHAR(50), value, 128)
) AS float
) AS bigint
)
) as Decimals
FROM Numbers
Here's a SQL Fiddle to test this out: http://sqlfiddle.com/#!6/23d4f/29
The query above does not return 0 when the float value has no decimal part (for example, 5 reverses to 5, so it yields 1). To account for that quirk, here's a modified version that handles this case:
SELECT value,
Decimals = CASE Charindex('.', value)
WHEN 0 THEN 0
ELSE
Len (
Cast(
Cast(
Reverse(CONVERT(VARCHAR(50), value, 128)) AS FLOAT
) AS BIGINT
)
)
END
FROM numbers
Here's the accompanying SQL Fiddle: http://sqlfiddle.com/#!6/10d54/11
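The reverse-and-cast trick can be traced in Python on the text forms of values. A sketch assuming non-negative values already formatted as decimal strings (no scientific notation); count_decimals is a hypothetical name.

```python
def count_decimals(text: str) -> int:
    """Hypothetical mirror of the REVERSE/CAST/LEN query, for non-negative values."""
    if "." not in text:                    # CASE CHARINDEX('.', value) WHEN 0 THEN 0
        return 0
    reversed_number = float(text[::-1])    # REVERSE then CAST AS FLOAT: '10.05' -> 50.01
    # CAST AS BIGINT drops the old integer part (now after the point); LEN counts digits.
    return len(str(int(reversed_number)))


for value in ["1.2345", "10.05", "2.0005", "12"]:
    print(value, count_decimals(value))
```

Trailing zeros of the original become leading zeros after reversal, which is why they drop out of the count.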
This thread is also using CAST, but I found the answer interesting:
http://www.sqlservercentral.com/Forums/Topic314390-8-1.aspx
DECLARE @Places INT
SELECT TOP 1000000 @Places = FLOOR(LOG10(REVERSE(ABS(SomeNumber)+1)))+1
FROM dbo.BigTest
And in Oracle:
SELECT FLOOR(LOG(10,REVERSE(CAST(ABS(.56544)+1 as varchar(50))))) + 1 from DUAL
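The LOG10 variant counts the digits of the reversed number with a logarithm instead of LEN. A Python sketch, assuming the value is given as a decimal string and using Decimal so that ABS(x) + 1 stays exact; count_decimals_log10 is a hypothetical name. Like the SQL, it returns 1 rather than 0 for whole numbers.

```python
import math
from decimal import Decimal


def count_decimals_log10(value: str) -> int:
    """Hypothetical mirror of FLOOR(LOG10(REVERSE(ABS(x)+1)))+1."""
    shifted = abs(Decimal(value)) + 1          # ABS(x) + 1, kept exact
    reversed_text = str(shifted)[::-1]         # REVERSE(...): '1.56544' -> '44565.1'
    # FLOOR(LOG10(n)) + 1 is the number of digits before the point of n.
    return math.floor(math.log10(float(reversed_text))) + 1


for v in [".56544", "0.05", "1.23", "12.5", "7"]:
    print(v, count_decimals_log10(v))
```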
A float just represents a real number. The number of decimal places of a real number has no meaning. In particular, the real number 3 can have six decimal places, 3.000000; it's just that all the decimal places are zero.
You may have a display conversion which is not showing the right most zero values in the decimal.
Note also that the reason there is a maximum of 6 decimal places is that the seventh is imprecise, so the display conversion will not commit to a seventh decimal place value.
Also note that floats are stored in binary, and they actually have binary places to the right of a binary point. The decimal display is an approximation of the binary rational in the float storage which is in turn an approximation of a real number.
So the point is, there really is no sense of how many decimal places a float value has. If you do the conversion to a string (say using the CAST) you could count the decimal places. That really would be the best approach for what you are trying to do.
I answered this before, but I can tell from the comments that it's a little unclear. Over time I found a better way to express this.
Consider pi as
(a) 3.141592653590
This shows pi with 11 decimal places (the trailing 0 adds nothing). However, this was rounded to 12 decimal places, as pi to 16 decimal places is
(b) 3.1415926535897932
A computer or database stores values in binary. For a single precision float, pi would be stored as
(c) 3.1415927410125732421875
This is actually rounded up to the closest value that a single precision can store, just as we rounded in (a). The next lowest number a single precision can store is
(d) 3.141592502593994140625
So, when you are trying to count the number of decimal places, you are trying to find how many decimal places, after which all remaining decimals would be zero. However, since the number may need to be rounded to store it, it does not represent the correct value.
Numbers also introduce rounding error as mathematical operations are done, including converting from decimal to binary when inputting the number, and converting from binary to decimal when displaying the value.
You cannot reliably find the number of decimal places a number in a database has, because it is approximated so that it fits in a limited amount of storage. Even the exact binary value stored in the database is rounded when it is represented in decimal. There could always be more decimal digits that were lost to rounding, so you don't know whether the zeros you see would be followed by more non-zero digits.
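The stored values for pi can be checked in Python. A sketch assuming IEEE 754 single precision: struct rounds math.pi to a 4-byte float, decrementing its bit pattern gives the next lower representable value, and Decimal prints the exact binary values; to_single and previous_single are hypothetical helpers.

```python
import math
import struct
from decimal import Decimal


def to_single(x: float) -> float:
    """Round a double to the nearest single-precision value and widen it back."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


def previous_single(x: float) -> float:
    """Next lower single-precision value, for positive x: decrement the bit pattern."""
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    return struct.unpack("<f", struct.pack("<I", bits - 1))[0]


pi_single = to_single(math.pi)
print(Decimal(pi_single))                   # pi as stored in single precision
print(Decimal(previous_single(pi_single)))  # the next lower single-precision value
```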
Solution for Oracle, but you get the idea. trunc() removes the decimal part in Oracle.
select *
from your_table
where (your_field*1000000 - trunc(your_field*1000000)) <> 0;
The idea of the query: will any decimals be left after you multiply by 1,000,000? If so, the value has more than 6 decimal places.
Another way I found is
SELECT 1.110000 , LEN(PARSENAME(Cast(1.110000 as float),1)) AS Count_AFTER_DECIMAL
I've noticed that Kshitij Manvelikar's answer has a bug. If there are no decimal places, instead of returning 0, it returns the total number of characters in the number.
So improving upon it:
Case When (SomeNumber = Cast(SomeNumber As Integer)) Then 0 Else LEN(PARSENAME(Cast(SomeNumber as float),1)) End
Here's another Oracle example. As I always warn non-Oracle users before they start screaming at me and downvoting: Oracle's SUBSTR and INSTR have close equivalents in other dialects (ANSI SQL defines SUBSTRING and POSITION), so the approach carries over to any SQL. The DUAL table can be replaced with any other table, or created. Here's the link to the SQL Server blog where I copied the DUAL table code from: http://blog.sqlauthority.com/2010/07/20/sql-server-select-from-dual-dual-equivalent/
CREATE TABLE DUAL
(
DUMMY VARCHAR(1)
)
GO
INSERT INTO DUAL (DUMMY)
VALUES ('X')
GO
This query returns the length of the part after the dot (decimal point).
The str can be converted with to_number(str) if required. You can also get the length of the string before the decimal point: change the code to LENGTH(SUBSTR(str, 1, dot_pos))-1 and remove the +1 in the INSTR part.
SELECT str, LENGTH(SUBSTR(str, dot_pos)) str_length_after_dot FROM
(
SELECT '000.000789' as str
, INSTR('000.000789', '.')+1 dot_pos
FROM dual
)
/
SQL>
STR STR_LENGTH_AFTER_DOT
----------------------------------
000.000789 6
You already have answers and examples about casting etc...
This question asks about regular SQL, but I needed a solution for SQLite. SQLite has neither a log10 function nor a built-in reverse string function, so most of the answers here don't work. My solution is similar to Art's answer, and as a matter of fact, similar to what phan describes in the question body. It works by converting the floating point value (in SQLite, a "REAL" value) to text, and then counting the characters after the decimal point.
For a column named "Column" from a table named "Table", the following query will produce the count of each row's decimal places:
select
length(
substr(
cast(Column as text),
instr(cast(Column as text), '.')+1
)
) as "Column-precision" from "Table";
The code will cast the column as text, then get the index of a period (.) in the text, and fetch the substring from that point on to the end of the text. Then, it calculates the length of the result.
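The query can be run through Python's sqlite3 module on a few REAL values. A sketch with hypothetical values chosen to be exactly representable in binary, so their text form is unambiguous; "Table" and "Column" are quoted because both are SQL keywords.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute('CREATE TABLE "Table" ("Column" REAL)')
conn.executemany('INSERT INTO "Table" VALUES (?)', [(1.25,), (10.0,), (0.375,)])

query = """
SELECT "Column",
       length(substr(cast("Column" AS text), instr(cast("Column" AS text), '.') + 1))
         AS "Column-precision"
FROM "Table"
ORDER BY rowid
"""
rows = conn.execute(query).fetchall()
for value, places in rows:
    print(value, places)  # 10.0 reports 1 place
```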
Remember to add LIMIT 100 if you don't want it to run for the entire table!
It's not a perfect solution; for example, it considers "10.0" as having 1 decimal place, even if it's only a 0. However, this is actually what I needed, so it wasn't a concern to me.
Hopefully this is useful to someone :)
Probably doesn't work well for floats, but I used this approach as a quick and dirty way to find the number of significant decimal places in a decimal type in SQL Server. If the last parameter of the ROUND function is not 0, the value is truncated rather than rounded.
CASE
WHEN col = round(col, 1, 1) THEN 1
WHEN col = round(col, 2, 1) THEN 2
WHEN col = round(col, 3, 1) THEN 3
...
ELSE null END
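The same truncation ladder can be written as a loop. A Python sketch using Decimal, assuming exact decimal input like a SQL Server decimal column; decimal_places is hypothetical, and unlike the CASE above it also returns 0 for whole numbers.

```python
from decimal import Decimal, ROUND_DOWN
from typing import Optional


def decimal_places(value: Decimal, max_places: int = 6) -> Optional[int]:
    """Smallest k such that truncating to k places leaves the value unchanged."""
    for places in range(max_places + 1):
        # Like round(col, places, 1) in SQL Server: truncate, don't round.
        truncated = value.quantize(Decimal(1).scaleb(-places), rounding=ROUND_DOWN)
        if truncated == value:
            return places
    return None  # the ELSE null branch


for text in ["123.12345000", "7", "1.500", "0.1234567"]:
    print(text, decimal_places(Decimal(text)))
```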

Determine MAX Decimal Scale Used on a Column

In MS SQL, I need an approach to determine the largest scale being used by the rows for a certain decimal column.
For example Col1 Decimal(19,8) has a scale of 8, but I need to know if all 8 are actually being used, or if only 5, 6, or 7 are being used.
Sample Data:
123.12345000
321.43210000
5255.12340000
5244.12345000
For the data above, I'd need the query to either return 5, or 123.12345000 or 5244.12345000.
I'm not concerned about performance, I'm sure a full table scan will be in order, I just need to run the query once.
Not pretty, but I think it should do the trick:
-- Find the first non-zero character in the reversed string...
-- And then subtract from the scale of the decimal + 1.
SELECT 9 - PATINDEX('%[1-9]%', REVERSE(Col1))
I like @Michael Fredrickson's answer better and am only posting this as an alternative for specific cases where the actual scale is unknown but is certain to be no more than 18:
SELECT LEN(CAST(CAST(REVERSE(Col1) AS float) AS bigint))
Please note that, although there are two explicit CAST calls here, the query actually performs two more implicit conversions:
As the argument of REVERSE, Col1 is converted to a string.
The bigint is cast as a string before being used as the argument of LEN.
SELECT
MAX(CHAR_LENGTH(
SUBSTRING(column_name::text FROM '\.(\d*?)0*$')
)) AS max_scale
FROM table_name;
*? is the non-greedy version of *, so \d*? catches all digits after the decimal point except trailing zeros.
The pattern contains a pair of parentheses, so the portion of the text that matched the first parenthesized subexpression (that is \d*?) is returned.
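The pattern can be tried on the sample data from the question with Python's re module, assuming its lazy quantifier *? and $ anchor behave like PostgreSQL's for this pattern; used_scale is a hypothetical name.

```python
import re
from typing import Optional

# Same pattern as the PostgreSQL query: the lazy \d*? stops before trailing zeros.
SCALE_PATTERN = re.compile(r"\.(\d*?)0*$", re.ASCII)


def used_scale(text: str) -> Optional[int]:
    match = SCALE_PATTERN.search(text)
    return len(match.group(1)) if match else None  # no '.' -> NULL, ignored by MAX


sample = ["123.12345000", "321.43210000", "5255.12340000", "5244.12345000"]
scales = [used_scale(s) for s in sample]
max_scale = max(s for s in scales if s is not None)
print(scales, max_scale)
```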
References:
https://www.postgresql.org/docs/9.6/static/sql-createcast.html
https://www.postgresql.org/docs/9.6/static/functions-matching.html
Note this will scan the entire table:
SELECT TOP 1 [Col1]
FROM [Table]
ORDER BY LEN(PARSENAME(CAST([Col1] AS VARCHAR(40)), 1)) DESC