impala cast as decimal errors out for null values - impala

I have a Hive table that I query through Impala, and I have a very basic issue. The table stores a time duration as a string, and sometimes, due to an error, the value is not populated and is empty.
When I use a query like select cast(duration as decimal) from my table and it encounters a null in the sample, it fails with an error like UDF ERROR. STRING TO DECIMAL PARSE FAILED.
So my question is how to handle nulls here. I want Impala to ignore the rows that have a null, or simply treat them as 0, rather than coming to a grinding halt.
I tried running impala-shell with the -c option to ignore the error, but it did not help.
When I try to use coalesce to return 0 instead of null, it gives a different error.
When I try ZEROIFNULL, it gives another error.
Nothing seems to be working.
This seems like a very basic issue, so I assume there should be some good way to handle nulls. Unfortunately I cannot find anything, even after googling for hours. So please help. :)

After lots of research I realized that none of the standard ways to handle null were working because the data I was "thinking" to be null was not actually null but an empty string. It seems Impala treats null and the empty string differently. I found another question on this board that helped me handle the empty string, like below:
trim(duration) not like ''
Using the above I was able to exclude the rows that have an empty string. Thanks for reading.
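A minimal sketch of how that filter could sit in a full query, next to a variant that keeps the rows and returns 0 instead. The table name my_table is a placeholder, DECIMAL(15, 2) is borrowed from another answer here, and both queries assume every non-blank value is a well-formed number:

```sql
-- Option 1: skip rows whose duration is NULL, empty, or only whitespace.
SELECT CAST(duration AS DECIMAL(15, 2)) AS duration_dec
FROM my_table                      -- placeholder table name
WHERE duration IS NOT NULL
  AND trim(duration) != '';

-- Option 2: keep every row; blank strings become NULL via nullif,
-- and zeroifnull then turns NULL into 0.
SELECT zeroifnull(CAST(nullif(trim(duration), '') AS DECIMAL(15, 2))) AS duration_dec
FROM my_table;
```

Neither variant protects against non-numeric text such as 'abc'; that would still need something like a regex check.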

This should work:
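select
-- use "is null": a simple "case col when null" never matches, because col = null is never true
case
when col is null then 0
when col = '' then 0
else cast(col as decimal(15, 2))
end as col
from table;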

I prefer to use regex and nullif for these situations. Regex to pull out the decimal value and nullif to account for the empty string situation.
select cast(nullif(regexp_extract(null, '[\\+-]?\\d+\\.?\\d*', 0), '') as decimal(2,1)) as shouldbenull;
select cast(nullif(regexp_extract('', '[\\+-]?\\d+\\.?\\d*', 0), '') as decimal(2,1)) as shouldbenull;
select cast(nullif(regexp_extract('dirtydata', '[\\+-]?\\d+\\.?\\d*', 0), '') as decimal(2,1)) as shouldbenull;
select cast(nullif(regexp_extract('1.4', '[\\+-]?\\d+\\.?\\d*', 0), '') as decimal) as shouldroundto1;
select cast(nullif(regexp_extract('1.4', '[\\+-]?\\d+\\.?\\d*', 0), '') as decimal(2,1)) as shouldbe1_4;
If you need your final answer to be zero, then just wrap it in a zeroifnull function.
select zeroifnull(cast(nullif(regexp_extract('', '[\\+-]?\\d+\\.?\\d*', 0), '') as decimal(2,1))) as shouldbezero;
select zeroifnull(cast(nullif(regexp_extract('1.4', '[\\+-]?\\d+\\.?\\d*', 0), '') as decimal(2,1))) as shouldbe1_4;

Related

SQL Server CASE statement with multiple THEN clauses

I have seen several similar questions but none cover what I need. I need to put another THEN statement after the first one. My column contains ints. When it returns NULL I need it to display a blank space, but when I try the code below, I just get '0'.
CASE
WHEN Column1 IS NULL
THEN ''
ELSE Column1
END
If I try to put a string after THEN, it tells me that it cannot convert it from int. I need to convert it to varchar and then change its output to a blank space afterwards, such as:
CASE
WHEN Column1 IS NULL
THEN CONVERT(varchar(10), Column1)
THEN ''
ELSE Column1
END
Is there a way of doing this?
Thanks
Rob
A case expression returns a single value -- with a given type. Your version mixes '' and an int, so the result is an int and the '' is implicitly converted to 0. If you want a string result, then you need to be sure that all paths in the case return strings:
CASE WHEN Column1 IS NULL
THEN ''
ELSE CAST(Column1 AS VARCHAR(255))
END
This is more simply written using COALESCE():
COALESCE(CAST(Column1 as VARCHAR(255)), '')
You cannot display an integer as a "blank" (other than using a NULL value).

How many ways can you generate an error converting varchar to numeric that won't be caught by ISNUMERIC()?

I am in the process of loading a bunch of tables into SQL Server and converting them from varchar to specific data types (int, date, etc.). One frustration is how many different ways there are to break the conversion from string to numeric (int, decimal, etc) and that there is not an easy diagnostic tool to find the offending rows (besides ISNUMERIC() which doesn't work all the time).
Here is my list of ways to break the conversion that won't get caught by ISNUMERIC().
The string contains scientific notation (e.g. 3.55E-10)
The string contains a blank ('')
The string contains a non-alphanumeric symbol ('$', '-', ',')
Here's what I'm currently using to compensate:
SELECT
CASE
WHEN [MyColumn] IN ('','-') THEN NULL -- deals with blanks
WHEN [MyColumn] LIKE '%E%' THEN CONVERT(DECIMAL(20, 4), CONVERT(FLOAT(53), [MyColumn])) -- deals with scientific notation
ELSE CAST(REPLACE(REPLACE([MyColumn] , '$', ''), '-', '') AS DECIMAL(20, 4))
END [MyColumn] -- deals with special characters
FROM
MyTable
Does anyone else have others? Or good ways to diagnose?
Don't use ISNUMERIC(). If you are on 2012+ then you could use TRY_CAST or TRY_CONVERT.
If you are on older versions, you could use some syntax like this:
SELECT *
FROM #TableA
WHERE ColA NOT LIKE '%[^0-9]%'
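For the 2012+ route, a rough sketch of using TRY_CONVERT as a diagnostic to list the offending rows. The table variable @Sample and its values are made up for illustration; DECIMAL(20, 4) and MyColumn are taken from the question:

```sql
-- Hypothetical sample data standing in for MyTable.MyColumn
DECLARE @Sample TABLE (MyColumn varchar(50));
INSERT INTO @Sample (MyColumn)
VALUES ('12.5'), ('3.55E-10'), (''), ('$100'), ('-'), ('1,000');

-- TRY_CONVERT returns NULL instead of raising an error, so any row that is
-- not NULL itself but converts to NULL is a value the real conversion
-- would reject.
SELECT MyColumn
FROM @Sample
WHERE MyColumn IS NOT NULL
  AND TRY_CONVERT(DECIMAL(20, 4), MyColumn) IS NULL;
```

Run against the real table, this lists every value you still need to handle before the load.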
You can try to use NOT LIKE '%[^0-9]%' instead of ISNUMERIC()
SELECT col, CASE WHEN col NOT LIKE '%[^0-9]%' and col<>''
THEN 1
ELSE 0
END
FROM T
You can use NOT LIKE to exclude anything that isn't a digit... and REPLACE for commas and periods. Naturally, you can add other nested REPLACE functions for values you want to accept.
declare @var varchar(64) = '55,5646'
SELECT
CASE
WHEN replace(replace(@var,'.',''),',','') NOT LIKE '%[^0-9]%'
THEN 1
ELSE 0
END
This allows you to accept decimals for your decimal / numeric / float conversions.

SQL: How to make a replace on the field ''

I have a very basic but tricky question for you guys. I have a field with spaces and numbers in one of my table columns. The key part is transforming the content into a decimal field. The drawback is that for some rows I could get something like:
' 1584.00 '
' 156546'
'545.00 '
' '
So, to clean up my column, I applied LTRIM and RTRIM to remove the spaces. Now, for the couple of records that contained only spaces, the new content is ''. Finally, I need to convert this result to a decimal.
Issue: for fields that contained just spaces, the new result is '', and I'm not able to apply a REPLACE to it because it's blank; the code below doesn't work:
SELECT REPLACE('','','0')
-- Final current version
SELECT CAST(COALESCE(REPLACE(REPLACE([Gross_Weight],' ','0'),',',''),'0') AS DECIMAL(13,3))
How can I solve this?
Thanks so much
SELECT COALESCE(NULLIF(MyColumn, ''), '0')
This has the side-effect that you will also turn NULL values into 0, which you might not want. If that's a problem then a simple CASE statement should do the trick:
SELECT CASE WHEN MyColumn = '' THEN 0 ELSE CAST(MyColumn AS DECIMAL(10, 4)) END
Obviously you'll also have to incorporate any other manipulations that you're already doing.
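Combined with the trimming from the question, that could look something like the sketch below. The table variable @Weights and its sample rows are invented for illustration; only the Gross_Weight column name and DECIMAL(13,3) come from the question. It assumes every non-blank value is a plain number once the surrounding spaces are gone, and the multi-row VALUES syntax needs SQL Server 2008 or later:

```sql
-- Hypothetical sample data mirroring the values in the question
DECLARE @Weights TABLE (Gross_Weight varchar(30));
INSERT INTO @Weights (Gross_Weight)
VALUES ('  1584.00 '), (' 156546'), ('545.00 '), ('   '), (NULL);

SELECT
    Gross_Weight,
    -- Trim, turn '' into NULL, fall back to the string '0', then cast.
    -- Using '0' (not 0) keeps every COALESCE argument a string.
    CAST(COALESCE(NULLIF(LTRIM(RTRIM(Gross_Weight)), ''), '0') AS DECIMAL(13,3)) AS Gross_Weight_Dec
FROM @Weights;
```

Blank and NULL inputs both come back as zero here; drop the COALESCE if NULL should stay NULL.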
No need for replace, just concatenate a zero to your column, like
SELECT RTRIM('0' + LTRIM(column))
I presume your data is in a table.
Let's call this table 'DATA' and the column 'VALUE'.
Then you might use the query below:
UPDATE DATA SET VALUE = 0 where VALUE = ''
To select the value, do the following (note the '0' is a string, so both branches of the CASE return the same type):
select case ltrim(rtrim([Gross_Weight])) when ''
THEN '0'
ELSE ltrim(rtrim([Gross_Weight])) END
Let me know if I got the requirement wrong.

Looping a CASE WHEN and REPLACE statement in SQL

Apologies for the multiple basic questions - I am very new to SQL and still trying to work things out.
I would like to insert records from my staging table to another table in my database, both removing the double quotes in the source file with a 'replace' function and converting the data from nvarchar (staging table) to datetime2. I can't quite work out how to do this: if I loop the 'case when' within 'replace', as below, then SQL doesn't recognise my data and nulls it out:
CASE WHEN ISDATE (REPLACE([Column1], '"', '')) = 1
THEN CONVERT(datetime2, Column1, 103)
ELSE null END
However if I loop my 'replace' within my 'case when', as below, SQL gives me an error message saying that it is unable to convert nvarchar into datetime2:
LTRIM(REPLACE([Column1], '"', '')
,CASE WHEN ISDATE(Column1) = 1 THEN CONVERT(datetime2, Column1, 103)
ELSE null END
What order / syntax do I need to be using to achieve this? An example of the data field would be:
"16/10/2017"
It uploads to my staging table as nvarchar
"16/10/2017"
and I would like to move it into my table2 as datetime2:
16/10/2017
Instead of isdate(), use try_convert():
TRY_CONVERT(datetime2, LTRIM(REPLACE([Column1], '"', '')), 103)
I think your confusion is that you need to do the string manipulation before the conversion. To do this, the string manipulation needs to be an argument to the conversion.
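As a quick way to try this outside the staging table, here is a small sketch using a variable in place of Column1. The variable name @raw is made up; the value is the sample from the question, and style 103 is the dd/mm/yyyy format:

```sql
-- Hypothetical stand-in for one staging value, double quotes included
DECLARE @raw nvarchar(20) = N'"16/10/2017"';

SELECT
    -- Quotes are stripped first, so the conversion sees 16/10/2017
    TRY_CONVERT(datetime2, LTRIM(REPLACE(@raw, '"', '')), 103) AS Cleaned,
    -- Without the REPLACE the quotes are still there and the result is NULL
    TRY_CONVERT(datetime2, @raw, 103) AS NotCleaned;
```

The same expression can go straight into the INSERT ... SELECT from the staging table, with [Column1] in place of @raw.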
You are doing it right. The problem is that CONVERT needs the value without the double quotes, and so your CONVERT was failing.
Just try this :
select
CASE WHEN ISDATE (REPLACE([Column1], '"', '')) = 1
THEN CONVERT(datetime2, (REPLACE([Column1], '"', '')), 103)
ELSE null END
from #tbl
more details : cast and convert doc

SQL strip text and convert to integer

In my database (SQL 2005) I have a field which holds a comment but in the comment I have an id and I would like to strip out just the id, and IF possible convert it to an int:
activation successful of id 1010101
The line above is the exact structure of the data in the db field.
And no I don't want to do this in the code of the application, I actually don't want to touch it, just in case you were wondering ;-)
This should do the trick:
SELECT SUBSTRING(column, PATINDEX('%[0-9]%', column), 999)
FROM table
Based on your sample data, this assumes that there is only one occurrence of an integer in the string and that it is at the end.
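To also get an int out of it, a sketch along the same lines is below. The table variable @Comments and its second row are invented for illustration. It assumes the digits run to the end of the string and fit in an int, and it sticks to syntax available in SQL 2005:

```sql
-- Hypothetical sample rows; the first is the format from the question
DECLARE @Comments TABLE (comment varchar(255));
INSERT INTO @Comments (comment) VALUES ('activation successful of id 1010101');
INSERT INTO @Comments (comment) VALUES ('no id in this comment');

SELECT
    comment,
    -- PATINDEX returns 0 when there is no digit, so guard the conversion
    -- and leave the id NULL for such rows.
    CASE WHEN PATINDEX('%[0-9]%', comment) > 0
         THEN CONVERT(int, SUBSTRING(comment, PATINDEX('%[0-9]%', comment), 999))
    END AS id
FROM @Comments;
```

If anything can follow the number, the conversion would still fail, so the trailing text would need to be cut off first.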
I don't have a means to test it at the moment, but:
select convert(int, substring(fieldName, len('activation successful of id') + 2, len(fieldName))) from tableName
Would you be open to writing a bit of code? One option, create a CLR User Defined function, then use Regex. You can find more details here. This will handle complex strings.
If your above line is always formatted as 'activation successful of id #######', with your number at the end of the field, then:
declare @myColumn varchar(100)
set @myColumn = 'activation successful of id 1010102'
SELECT
@myColumn as [OriginalColumn]
, CONVERT(int, REVERSE(LEFT(REVERSE(@myColumn), CHARINDEX(' ', REVERSE(@myColumn))))) as [DesiredColumn]
Will give you:
OriginalColumn                           DesiredColumn
---------------------------------------- -------------
activation successful of id 1010102      1010102
(1 row(s) affected)
select cast(right(column_name,charindex(' ',reverse(column_name))) as int)
CAST(REVERSE(LEFT(REVERSE(@Test),CHARINDEX(' ',REVERSE(@Test))-1)) AS INTEGER)
-- Test table, you will probably use some query
DECLARE @testTable TABLE(comment VARCHAR(255))
INSERT INTO @testTable(comment)
VALUES ('activation successful of id 1010101')
-- Use Charindex to find "id " then isolate the numeric part
-- Finally check to make sure the number is numeric before converting
SELECT CASE WHEN ISNUMERIC(JUSTNUMBER)=1 THEN CAST(JUSTNUMBER AS INTEGER) ELSE -1 END
FROM (
select right(comment, len(comment) - charindex('id ', comment)-2) as justnumber
from @testTable) TT
I would also add that this approach is more set based and hence more efficient for a bunch of data values. But it is super easy to do it just for one value as a variable. Instead of using the column comment you can use a variable like @chvComment.
If the comment string is EXACTLY like that you can use replace.
select replace(comment_col, 'activation successful of id ', '') as id from ....
It almost certainly won't be though - what about unsuccessful Activations?
You might end up with nested replace statements
select replace(replace(comment_col, 'activation not successful of id ', ''), 'activation successful of id ', '') as id from ....
[sorry can't tell from this edit screen if that's entirely valid sql]
That starts to get messy; you might consider creating a function and putting the replace statements in that.
If this is a one-off job, it won't really matter. You could also use a regex, but that's quite slow (and in any case means you now have 2 problems).