PostgreSQL text array - query as integer, ignoring non-digits - sql

I have a table of people. Each person can have several regnums (mostly integers but some like M/2344 and W345). To make things a bit more complicated, there are NULLs, empties, and strings like 'NA'. Due to their unpredictable composition, the regnums are stored in a text array field (e.g. {12345,M/2344} and {3459,NA}).
Because most people have regnums that can be treated as integers, I would like to be able to do things with this field like find people with a regnum between, say, 491555 and 491685.
I've tried:
SELECT id,forename,surname,regnum FROM (SELECT *, unnest(regnum) reg FROM people) as TBL WHERE reg BETWEEN '491555' AND '491685';
but results include out-of-range regnums, e.g. 49162. I assume this is because the unnested regnum field is still a text field(?)
I've also tried casting the regnum as an integer field - unnest(regnum::integer[]) - but I get errors:
Error in query: ERROR: invalid input syntax for integer: "NA"
I think I'm on the right track, but I don't get how to ignore non-int-like regnums. Any ideas?

You can test whether a text value consists only of digits by checking it against a regular expression, like this:
SELECT '1234' ~ '^[0-9]+$' -- true
SELECT 'NA' ~ '^[0-9]+$' -- false
So, in your case, you need to cast the value to integer only when it is numeric:
WHERE (CASE WHEN reg ~ '^[0-9]+$' THEN reg::integer ELSE null END) BETWEEN 491555 AND 491685
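Put together with the table from the question, the whole query might look like the sketch below. It assumes the people table and regnum text[] column described above; the EXISTS form and the aliases p and r are my own choices, used so that a person with two matching regnums is returned only once.

```sql
-- Sketch for PostgreSQL, assuming people(id, forename, surname, regnum text[]).
SELECT p.id, p.forename, p.surname, p.regnum
FROM people p
WHERE EXISTS (
    SELECT 1
    FROM unnest(p.regnum) AS r(reg)
    -- Non-numeric entries such as 'NA' or 'M/2344' become NULL and never match.
    WHERE (CASE WHEN r.reg ~ '^[0-9]+$' THEN r.reg::integer END)
          BETWEEN 491555 AND 491685
);
```

The CASE expression matters here: writing reg ~ '^[0-9]+$' AND reg::integer BETWEEN ... is not guaranteed to evaluate the regex test first, so the cast could still fail on 'NA'. The original text comparison returned 49162 because strings are compared character by character, not by numeric value. A digit string too long to fit in an integer would still raise an error with this sketch.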

Related

SQL Decode format numbers only

I want to format amounts to salary format, e.g. 10000 becomes 10,000, so I use to_char(amount, '99,999,99')
SELECT SUM(DECODE(e.element_name,'Basic Salary',to_char(v.screen_entry_value,'99,999,99'),0)) Salary,
SUM(DECODE(e.element_name,'Transportation Allowance',to_char(v.screen_entry_value,'99,999,99'),0)) Transportation,
SUM(DECODE(e.element_name,'GOSI Processing',to_char(v.screen_entry_value,'99,999,99'),0)) GOSI,
SUM(DECODE(e.element_name,'Housing Allowance',to_char(v.screen_entry_value,'99,999,99'),0)) Housing
FROM values v,
values_types vt,
elements e
WHERE vt.value_type = 'Amount'
This gives an "invalid number" error. Not all values are numbers unless value_type is equal to 'Amount', so I guess DECODE checks all values anyway, even though as far as I know execution begins with FROM, then WHERE, then SELECT. What's going wrong here?
You said you added decode(...), but it looks like you might have actually added sum(decode(...)).
You are converting your values to strings with to_char(v.screen_entry_value,'99,999,99'), so your decode() generates a string - the default 0 will be converted to '0' - giving you a value like '1,234,56'. Then you are aggregating those, so sum() has to implicitly convert those strings to numbers - and it is throwing the error when it tries to do that:
select to_number('1,234,56') from dual
will also get "ORA-01722: invalid number", unless you supply a similar format mask so it knows how to interpret it. You could do that, e.g.:
SUM(to_number(DECODE(e.element_name,'Basic Salary',to_char(v.screen_entry_value,'99,999,99'),0),'99,999,99'))
... but written that way it's perhaps more obvious that something is odd, and even if you did, you would end up with a number, not a formatted string.
So instead of doing:
SUM(DECODE(e.element_name,'Basic Salary',to_char(v.screen_entry_value,'99,999,99'),0))
you should format the result after aggregating:
to_char(SUM(DECODE(e.element_name,'Basic Salary',v.screen_entry_value,0)),'99,999,99')
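As a rough illustration of "aggregate first, format last", here is a self-contained Oracle sketch. The entries CTE and its dummy rows are hypothetical stand-ins for the real joined tables, and the '999,999,999' mask is an assumption chosen for conventional thousands grouping rather than the mask used in the question.

```sql
-- Sketch only: dummy rows replace the joined values / elements tables.
WITH entries (element_name, screen_entry_value) AS (
    SELECT 'Basic Salary',             10000 FROM dual UNION ALL
    SELECT 'Basic Salary',              2500 FROM dual UNION ALL
    SELECT 'Transportation Allowance',   800 FROM dual UNION ALL
    SELECT 'Housing Allowance',         3000 FROM dual
)
SELECT
    -- SUM works on plain numbers; to_char is applied once to each total.
    to_char(SUM(DECODE(element_name, 'Basic Salary', screen_entry_value, 0)),
            '999,999,999') AS salary,
    to_char(SUM(DECODE(element_name, 'Transportation Allowance', screen_entry_value, 0)),
            '999,999,999') AS transportation,
    to_char(SUM(DECODE(element_name, 'Housing Allowance', screen_entry_value, 0)),
            '999,999,999') AS housing
FROM entries;
```

Because the formatting happens outside SUM, the aggregate never has to convert a formatted string back into a number.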

Converting column value from varchar to integer in PostgreSql

I would like to return all rows where the sv column, a varchar, is greater than 40. How can I convert sv to an integer on the fly?
The line below returns ERROR: invalid input syntax for integer: "SV"
SELECT namefirst, namelast, yearid FROM pitching JOIN people ON pitching.playerID = people.playerID WHERE CAST(sv AS INTEGER)>40;
Postgres doesn't have a built-in way to avoid conversion errors. One method is to use a case expression:
WHERE (CASE WHEN sv ~ '^[0-9]+$' THEN sv::integer END) > 40
Or, if the integers are zero-padded on the left to a fixed width, then you might be able to use string comparisons:
WHERE sv > '40'
However, this runs the risk of matching non-numeric values (which you seem to have, given the error you are getting).
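If the same safe conversion is needed in many queries, one alternative to repeating the CASE expression is a small PL/pgSQL helper that traps the conversion error. This is a different technique from the regex check above, and try_cast_int is a hypothetical name of my own, not a built-in function.

```sql
-- Hypothetical helper: returns NULL instead of raising when the text is not a valid integer.
CREATE OR REPLACE FUNCTION try_cast_int(p_text text)
RETURNS integer
LANGUAGE plpgsql
IMMUTABLE
AS $$
BEGIN
    RETURN p_text::integer;
EXCEPTION
    -- Raised for input like 'SV', or for digit strings too large for integer.
    WHEN invalid_text_representation OR numeric_value_out_of_range THEN
        RETURN NULL;
END;
$$;

-- Usage with the query from the question:
SELECT namefirst, namelast, yearid
FROM pitching
JOIN people ON pitching.playerID = people.playerID
WHERE try_cast_int(sv) > 40;
```

The exception block adds some overhead for each call, so on large tables the plain CASE expression may well be the faster option.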

Treat TO_NUMBER() invalid format errors as NULL

I have a string column which usually contains integers in two formats... zero-padded, and not:
5
05
I want to sort based on these values numerically. To do that I do something like:
SELECT * FROM things ORDER BY TO_NUMBER(num, '0000');
This works fine, but sometimes there is invalid data, like abc, or !## in this column. Postgres becomes unhappy with me:
ERROR: invalid input syntax for type numeric: " "
What I'd like to do is treat invalid values/failures of TO_NUMBER() as NULL so that they are sorted accordingly. Is this possible? Or, some other alternative?
If you are using PostgreSQL, you can use this query, which sorts invalid values as if they were 0:
SELECT * FROM things ORDER BY
TO_NUMBER((case when num ~ '^[0-9\.]+$' THEN num else '0' end),'0000');
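If invalid values should sort as NULL, as the question asks, a variant along these lines may be closer to what is wanted. It is a sketch that assumes the valid values are plain non-negative integers, so it casts directly instead of calling TO_NUMBER.

```sql
-- Sketch for PostgreSQL: anything that is not all digits becomes NULL.
SELECT *
FROM things
ORDER BY (CASE WHEN num ~ '^[0-9]+$' THEN num::integer END) NULLS LAST;
```

NULLS LAST is already the default for ascending order in PostgreSQL; it is spelled out here only to make the placement of the invalid rows explicit. Use NULLS FIRST to put them at the top instead.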

Hive - how to check if numeric columns have number/decimal values?

I am trying to write a Hive query that takes multiple numeric column names and checks whether each one has numeric values. If the column has numeric values, the output should be (column name, true); if the field has NULL or some string value, the output should be (column name, false).
SELECT distinct (test_nr1,test_nr2) FROM test.abc WHERE (test_nr1,test_nr2) not like '%[^0-9]%';
SELECT distinct test_nr1,test_nr2 from test.abc limit 2;
test_nr1 test_nr2
NULL 81432269
NULL 88868060
the desired output should be :
test_nr1 false
test_nr2 true
Since test_nr1 is a decimal field and it has NULL values, it should output false.
Appreciate valuable suggestions.
You can use the cast function. It returns NULL when the value cannot be cast to a numeric type.
For example:
select case when cast('23ccc' as double) is null then false else true end as IsNumber;
You're trying to use character class pattern matching syntax with LIKE here, and IIRC it doesn't work in every SQL implementation. Regexp matching, however, works in most, if not all, SQL implementations.
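Considering you're using Hive, this should do it. The pattern is anchored so that the whole value must be numeric, and the backslash is doubled because Hive treats a single backslash in a string literal as an escape character:
SELECT ('test_nr1', test_nr1 RLIKE '^[0-9]+(\\.[0-9]+)?$'), ('test_nr2', test_nr2 RLIKE '^[0-9]+(\\.[0-9]+)?$') FROM test.abc;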
Keep in mind, though, that regexp matching can be slow on large tables.
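Neither query above produces the one-row-per-column output the question asks for. A possible sketch of that shape, building on the cast-based check, is shown below. It assumes that a column counts as numeric only when every row casts to a non-NULL double; the output names column_name and is_numeric are my own.

```sql
-- Sketch for Hive: COUNT(expr) skips NULLs, so the two counts match
-- only when every row of the column could be cast to DOUBLE.
SELECT 'test_nr1' AS column_name,
       COUNT(*) = COUNT(CAST(test_nr1 AS DOUBLE)) AS is_numeric
FROM test.abc
UNION ALL
SELECT 'test_nr2' AS column_name,
       COUNT(*) = COUNT(CAST(test_nr2 AS DOUBLE)) AS is_numeric
FROM test.abc;
```

Under this definition a single NULL or non-numeric value makes the column report false, which matches the desired output for test_nr1. An empty table would report true for every column.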

Locate Cause of IBM DB2 CAST Failure

I need to work on an IBM DB2 database.
The LOCATION field is a CHARACTER(8) field of numbers.
To sort the table, the column is cast to an INTEGER:
SELECT LOCATION, PARTNO, INSTOCK
FROM INVENTORY
ORDER BY CAST(LOCATION AS INTEGER)
Currently, this fails with:
ERROR [22018] [IBM][DB2/AIX64] SQL0420N Invalid character found in a character string argument of the function "INTEGER".
Is there a quick way to determine which row is failing?
IBM's solution is to "Insure that the results set for the query item that the cast it being applied to does not contain non numeric SQL constants when casting to a numeric type."
That wasn't really helpful.
Thinking someone inserted a letter O or lower case L, I tried this:
SELECT DISTINCT LOCATION
FROM LOCATIONS
WHERE LOCATION LIKE '%l%' OR LOCATION LIKE '%O%'
ORDER BY LOCATION
Zero records returned.
That wasn't really helpful.
That's IBM error messages and documentation in a nutshell.
One place to start is the TRANSLATE() function.
SELECT LOCATION, PARTNO, INSTOCK
FROM INVENTORY
WHERE TRANSLATE(LOCATION, '', ' 0123456789') <> ''
You can add other characters, like -, ., etc. depending on what you find.
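To see which characters are causing the trouble, the same expression can also be selected. This is a sketch against the INVENTORY table from the question; the OFFENDING_CHARS alias is my own.

```sql
-- TRANSLATE maps every digit and space to a blank, so whatever remains
-- visible in OFFENDING_CHARS is a character that the INTEGER cast rejects.
SELECT DISTINCT LOCATION,
       TRANSLATE(LOCATION, '', ' 0123456789') AS OFFENDING_CHARS
FROM INVENTORY
WHERE TRANSLATE(LOCATION, '', ' 0123456789') <> ''
ORDER BY LOCATION;
```

Values made up only of digits and spaces translate to all blanks, which DB2 treats as equal to the empty string when comparing, so only the problem rows are returned.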