Matching on Values, but Erroring on New Value in SQL Server - sql

I am comparing data from two different databases (one MariaDB and one SQL Server) within my Node project, and am then doing inserts and updates as necessary depending on the comparison results.
I have a question about this code that I use to iterate through results in Node, going one at a time and passing in values to check against (note - I am more familiar with Node and JS than with SQL, hence this question):
SELECT TOP 1
CASE
WHEN RM00101.CUSTCLAS LIKE ('%CUSR%')
THEN CAST(REPLACE(LEFT(LR301.DOCNUMBR, CHARINDEX('-', LR301.DOCNUMBR)), '-', '') AS INT)
ELSE 0
END AS Id,
CASE
WHEN LR301.RMDTYPAL = 7 THEN LR301.ORTRXAMT * -1
WHEN LR301.RMDTYPAL = 9 THEN LR301.ORTRXAMT * -1
ELSE LR301.ORTRXAMT
END DocumentAmount,
GETDATE() VerifyDate
FROM
CRDB..RM20101 AS LR301
INNER JOIN
CRDB..RM00101 ON LR301.CUSTNMBR = RM00101.CUSTNMBR
WHERE
CONVERT(BIGINT, (REPLACE(LEFT(LR301.DOCNUMBR, CHARINDEX('-', LR301.DOCNUMBR)), '-', ''))) = 589091
Currently, the above works for me for finding records that match. However, if I enter a value that doesn't yet exist - in this line below, like so:
WHERE CONVERT(BIGINT, (REPLACE(LEFT( LR301.DOCNUMBR, CHARINDEX('-', LR301.DOCNUMBR)), '-', ''))) = 789091
I get this error:
Error converting data type varchar to bigint.
I assume the issue is that, if the value isn't found, it can't cast it to an INTEGER, and so it errors out. Sound right?
What I ideally want is for the query to execute successfully, but just return 0 results when a match is not found. In JavaScript I might do something like an OR clause to handle this:
const array = returnResults || [];
But I'm not sure how to handle this with SQL.
By the way, the value in SQL Server that's being matched is of type char(21), and the values look like this: 00000516542-000. The value in MariaDB is of type INT.
So two questions:
Will this error out when I enter a value that doesn't currently match?
If so, how can I handle this so as to just return 0 rows when a match isn't found?
As an added note, someone suggested using TRY_CONVERT, but while this works in SQL Server, it doesn't work when I use it with the Node mssql package.

I think the error occurs because the varchar value does not always consist only of digits, so the conversion to BIGINT fails for some rows. You can do the comparison on the varchar value itself to avoid the conversion:
WHERE (REPLACE(LEFT( LR301.DOCNUMBR, CHARINDEX('-', LR301.DOCNUMBR)), '-', '')) = '789091'
Hope this helps.
Edit: based on the format in the comment, this should do the trick:
WHERE REPLACE(LTRIM(REPLACE(REPLACE(LEFT( LR301.DOCNUMBR, CHARINDEX('-', LR301.DOCNUMBR)),'0',' '),'-','')),' ','0') = '789091'
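To try the string comparison in isolation, here is a minimal sketch. The table variable @Docs, its sample rows (including the non-numeric 'CREDIT-001') and the @Id parameter are made up for illustration; only the WHERE expression comes from the answer above. Since nothing is converted to a number, an unmatched value should simply give zero rows rather than a conversion error.

```sql
-- Hypothetical sample data: @Docs stands in for the real table.
DECLARE @Docs TABLE (DOCNUMBR char(21));
INSERT INTO @Docs (DOCNUMBR)
VALUES ('00000516542-000'), ('00000589091-000'), ('CREDIT-001');

-- Hypothetical INT parameter, as it would arrive from the MariaDB side.
DECLARE @Id int = 789091;

SELECT DOCNUMBR
FROM @Docs
-- Strip the leading zeros and the dash from the prefix, then compare as text.
WHERE REPLACE(LTRIM(REPLACE(REPLACE(LEFT(DOCNUMBR, CHARINDEX('-', DOCNUMBR)), '0', ' '), '-', '')), ' ', '0')
      = CAST(@Id AS varchar(20));
```

Setting @Id to 589091 instead should return the second sample row.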

Related

Splitting and Analyzing data

I have a field that sometimes contains a string like the following: 2/23/2013 12:25:55~45
I need to split the string at the ~ and identify if what's left of the ~ is a valid date time value and what's right of the ~ is a valid integer. Basically what I want to return is a True/False whether those conditions are correct.
Keep in mind that the field could contain nulls, could contain any other type of data, and it could contain multiple tildes. In all cases I need to return false. The only time I need to return true is when the field contains a date/time value, a single tilde, and a whole number.
In SQL Server, you could do:
select (case when col like '%~%'
then (case when isdate(left(col, charindex('~', col) - 1)) = 1 and
isnumeric(substring(col, charindex('~', col)+1, 1000)) = 1 and col not like '%~%.%' and col not like '%~%e%'
then 1
else 0
end)
else 0
end) as IsFunkyFormat, substring(col, charindex('~', col)+1, 1000), left(col, charindex('~', col) - 1)
The nested case is to prevent errors when the separator is not found. The not like expressions are intended to rule out number formats that are not integers.
This question is trickier than it looks because it would be easy to get wrong, or to write it in such a way that it works now, for your given set of data, but fails to work for other sets of data.
If this is the data you are storing in your database, I strongly encourage you to learn about database normalization. One tenet of normalization is that you only store one value in a column. In this case, you are storing a datetime and an integer value in the same column. It would be far better to store the data in multiple columns.
That being said, I know that there are times when you are given some raw data that you need to import into your database. Oftentimes, we cannot control the raw data that we are given, so we must make do with SQL gymnastics. In this particular case, there are several different types of back-flips that will be useful.
Determining the number of ~ characters in your string.
Splitting the data on the tilde.
Making sure one of the values is a datetime.
Making sure the other value is an integer.
Only 1 of the 4 items is built in to SQL Server. There is a function named IsDate that takes a string parameter and returns 1 or 0 indicating whether the string can be converted to a date.
To determine the number of ~ in your string, the trick is to compare the length of the string with the tildes to the length of the string without the tildes. We can determine which rows contain a single tilde by doing this:
When Len(Data) = Len(Replace(Data, '~', '')) + 1
The other tricky problem to solve is to determine if a string represents a whole number. There are multiple ways of doing this, but my favorite method is to concatenate hard-coded values to your data and then test for numeric. For example, the IsNumeric function will return true for the string 1e4 because the e represents scientific notation and 1e4 could be interpreted as 10000. So if you do this:
IsNumeric(Data + 'e0')
This will return false for scientific notation because the data would be something like 1e4, which is concatenated to 'e0' to get '1e4e0' which is not numeric. Similarly, we can concatenate .0 to the string to check for fractional numbers. If your data is 45.2 (which is numeric) and you concatenate .0 to it, you get '45.2.0' which is not numeric. You can also add '-' to the test to check for positive numbers. '-20' is numeric, but '-' + '-20' (which is '--20') is not numeric.
Select YourColumnHere,
Len(Replace(YourColumnHere, '~', '')) + 1,
Case When Len(YourColumnHere) = Len(Replace(YourColumnHere, '~', '')) + 1
Then
Case When IsDate(Left(YourColumnHere, CharIndex('~', YourColumnHere)-1)) = 1
Then
Case When Right(YourColumnHere, Len(YourColumnHere)-CharIndex('~', YourColumnHere)) > ''
Then IsNumeric('-' + Right(YourColumnHere, Len(YourColumnHere)-CharIndex('~', YourColumnHere)) + '.0e0')
Else 0
End
Else 0
End
Else 0
End
From YourTableNameHere
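The two tricks described above (counting tildes by comparing lengths, and concatenating fixed text before calling IsNumeric) can be tried on their own with a small sketch like this. The sample values and the aliases HasSingleTilde and WholeNumberTest are assumptions for illustration, not part of the original answer.

```sql
-- Hypothetical sample values for the tilde count.
SELECT v.Data,
       -- Exactly one tilde: removing it shortens the string by one character.
       CASE WHEN LEN(v.Data) = LEN(REPLACE(v.Data, '~', '')) + 1
            THEN 1 ELSE 0 END AS HasSingleTilde
FROM (VALUES ('2/23/2013 12:25:55~45'),
             ('2/23/2013~12:25:55~45'),
             ('no tilde here')) AS v(Data);

-- Hypothetical sample values for the whole-number test.
SELECT n.Data,
       ISNUMERIC(n.Data) AS PlainIsNumeric,
       -- '-' in front and '.0e0' behind turn signs, decimals and
       -- scientific notation into strings that are no longer numeric.
       ISNUMERIC('-' + n.Data + '.0e0') AS WholeNumberTest
FROM (VALUES ('45'), ('1e4'), ('45.2'), ('-20')) AS n(Data);
```

Following the reasoning above, only '45' should pass the whole-number test, even though the plain IsNumeric call accepts all four values.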

How can I force SQL to only evaluate a join if the value can be converted to an INT?

I've got a query that uses several subqueries. It's about 100 lines, so I'll leave it out. The issue is that I have several rows returned as part of one subquery that need to be joined to an integer value from the main query. Like so:
Select
... columns ...
from
... tables ...
(
select
... column ...
from
... tables ...
INNER JOIN core.Type mt
on m.TypeID = mt.TypeID
where dpt.[DataPointTypeName] = 'TheDataPointType'
and m.TypeID in (100008, 100009, 100738, 100739)
and datediff(d, m.MeasureEntered, GETDATE()) < 365 -- only care about measures from past year
and dp.DataPointValue <> ''
) as subMdp
) as subMeas
on (subMeas.DataPointValue NOT LIKE '%[^0-9]%'
and subMeas.DataPointValue = cast(vcert.IDNumber as varchar(50))) -- THIS LINE
... more tables etc ...
The issue is that if I take out the cast(vcert.IDNumber as varchar(50))) it will attempt to compare a value like 'daffodil' to a number like 3245. Even though the datapoint that contains 'daffodil' is an orphan record that should be filtered out by the INNER JOIN 4 lines above it. It works fine if I try to compare a string to a string but blows up if I try to compare a string to an int -- even though I have a clause in there to only look at things that can be converted to integers: NOT LIKE '%[^0-9]%'. If I specifically filter out the record containing 'daffodil' then it's fine. If I move the NOT LIKE line into the subquery it will still fail. It's like the NOT LIKE is evaluated last no matter what I do.
So the real question is why SQL would be evaluating a JOIN clause before evaluating a WHERE clause contained in a subquery. Also how I can force it to only evaluate the JOIN clause if the value being evaluated is convertible to an INT. Also why it would be evaluating a record that will definitely not be present after an INNER JOIN is applied.
I understand that there's a strong element of query optimizer voodoo going on here. On the other hand I'm telling it to do an INNER JOIN and the optimizer is specifically ignoring it. I'd like to know why.
The problem you are having is discussed in this item of feedback on the Connect site.
Whilst logically you might expect the filter to exclude any DataPointValue values that contain any non-numeric characters, SQL Server appears to be ordering the CAST operation in the execution plan before this filter happens. Hence the error.
Until Denali (SQL Server 2012) comes along with its TRY_CONVERT function, the way around this is to wrap the usage of the column in a CASE expression that repeats the same logic as the filter.
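A minimal sketch of that CASE wrapper follows. The table variables @Points and @Certs and their rows are assumptions standing in for the subquery and the vcert table from the question; the point is only that the CAST sits inside a CASE branch that is reached for digit-only values.

```sql
-- Hypothetical stand-ins for the question's subquery and vcert table.
DECLARE @Points TABLE (DataPointValue varchar(50));
DECLARE @Certs TABLE (IDNumber int);
INSERT INTO @Points VALUES ('3245'), ('daffodil');
INSERT INTO @Certs VALUES (3245), (9999);

SELECT c.IDNumber, p.DataPointValue
FROM @Certs AS c
INNER JOIN @Points AS p
    -- The CASE repeats the filter, so 'daffodil' yields NULL instead of
    -- reaching the CAST. Digit strings too long for int would still overflow.
    ON CASE WHEN p.DataPointValue NOT LIKE '%[^0-9]%' AND p.DataPointValue <> ''
            THEN CAST(p.DataPointValue AS int)
       END = c.IDNumber;
```

The NULL produced for non-numeric values never equals an IDNumber, so those rows drop out of the join without an error.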
So the real question is why SQL would be evaluating a JOIN clause
before evaluating a WHERE clause contained in a subquery.
Because SQL engines are required to behave as if that's what they do. They're required to act like they build a working table from all of the table constructors in the FROM clause; expressions in the WHERE clause are applied to that working table.
Joe Celko wrote about this many times on Usenet. Here's an old version with more details.
First of all,
NOT LIKE '%[^0-9]%'
doesn't work well. Example:
DECLARE @Int nvarchar(20) = ' 454 54'
SELECT CASE WHEN @Int LIKE '%[^0-9]%' THEN 1 ELSE 0 END AS Is_Number
Result: 1
But it is not a number!
To check whether it is a real int value, you should use the ISNUMERIC function. Let's check this:
DECLARE @Int nvarchar(20) = ' 454 54'
SELECT ISNUMERIC(@Int) Is_Int
Result: 0
The result is correct.
So, instead of
NOT LIKE '%[^0-9]%'
try changing this to
ISNUMERIC(subMeas.DataPointValue) = 1
UPDATE
How to check whether a value is an integer?
First, here:
WHERE ISNUMERIC(str) = 1 AND str NOT LIKE '%.%' AND str NOT LIKE '%e%' AND str NOT LIKE '%-%'
Second:
CREATE Function dbo.IsInteger(@Value VarChar(18))
Returns Bit
As
Begin
    Return IsNull(
        (Select Case When CharIndex('.', @Value) > 0
                     Then Case When Convert(int, ParseName(@Value, 1)) <> 0
                               Then 0
                               Else 1
                          End
                     Else 1
                End
         Where IsNumeric(@Value + 'e0') = 1), 0)
End
Filter out the non-numeric records in a subquery or CTE

How does one filter based on whether a field can be converted to a numeric?

I've got a report that has been in use quite a while - in fact, the company's invoice system rests in a large part upon this report (Disclaimer: I didn't write it). The filtering is based upon whether a field of type VarChar(50) falls between two numeric values passed in by the user.
The problem is that the field the data is being filtered on now not only has simple non-numeric values such as '/A', 'TEST' and a slew of other non-numeric data, but also has numeric values that seem to be defying any type of numeric conversion I can think of.
The following (simplified) test query demonstrates the failure:
Declare @StartSummary Int,
        @EndSummary Int
Select @StartSummary = 166285,
       @EndSummary = 166289
Select SummaryInvoice
From Invoice
Where IsNull(SummaryInvoice, '') <> ''
  And IsNumeric(SummaryInvoice) = 1
  And Convert(int, SummaryInvoice) Between @StartSummary And @EndSummary
I've also attempted conversions using bigint, real and float and all give me similar errors:
Msg 8115, Level 16, State 2, Line 7
Arithmetic overflow error converting expression to data type int.
I've tried other larger numeric datatypes such as BigInt with the same error. I've also tried using sub-queries to sidestep the conversion issue by only extracting fields that have numeric data and then converting those in the wrapper query, but then I get other errors which are all variations on a theme indicating that the value stored in the SummaryInvoice field can't be converted to the relevant data type.
Short of extracting only those records with numeric SummaryInvoice fields to a temporary table and then querying against the temporary table, is there any one-step solution that would solve this problem?
Edit: Here's the field data that I suspect is causing the problem:
SummaryInvoice
11111111111111111111111111
IsNumeric states that this field is numeric - which it is. But attempting to convert it to BigInt causes an arithmetic overflow. Any ideas? It doesn't appear to be an isolated incident; there seem to be a number of records populated with data that causes this issue.
It seems that you are going to have problems with the ISNUMERIC function, since it returns 1 if the value can be cast to any numeric type (so it accepts ., ,, e0, etc.). If you have numbers larger than 2^63-1, you can use DECIMAL or NUMERIC. I'm not sure whether PATINDEX suits a regex-style check on SummaryInvoice, but if it does, then you should try this:
SELECT SummaryInvoice
FROM Invoice
WHERE ISNULL(SummaryInvoice, '') <> ''
  AND CASE WHEN PATINDEX('%[^0-9]%', SummaryInvoice) = 0 THEN CONVERT(DECIMAL(30,0), SummaryInvoice) ELSE -1 END
      BETWEEN @StartSummary AND @EndSummary
You can't guarantee in what order the WHERE clause filters will be applied.
One ugly option is to decouple the inner and outer queries:
SELECT
    *
FROM
    (
    Select TOP 2000000000
        SummaryInvoice
    From Invoice
    Where IsNull(SummaryInvoice, '') <> ''
      And IsNumeric(SummaryInvoice) = 1
    ORDER BY SummaryInvoice
    ) foo
WHERE
    Convert(int, SummaryInvoice) Between @StartSummary And @EndSummary
Another, using CASE:
Select SummaryInvoice
From Invoice
Where IsNull(SummaryInvoice, '') <> ''
  And
    CASE WHEN IsNumeric(SummaryInvoice) = 1 THEN Convert(int, SummaryInvoice) ELSE -1 END
    Between @StartSummary And @EndSummary
YMMV
Edit: after question update
use decimal(38,0) not int
Change ISNUMERIC(SummaryInvoice) to ISNUMERIC(SummaryInvoice + '0e0')
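As a rough illustration of the CASE approach combined with decimal(38,0), here is a self-contained sketch. Note that it swaps in a digits-only LIKE test plus a length guard instead of ISNUMERIC, so that nothing non-convertible can reach the CONVERT; that substitution, the table variable @Invoice and its sample rows are my assumptions, not part of the answers above.

```sql
-- Hypothetical sample data standing in for the Invoice table.
DECLARE @Invoice TABLE (SummaryInvoice varchar(50));
INSERT INTO @Invoice VALUES
    ('166286'), ('/A'), ('TEST'), ('11111111111111111111111111'), (NULL);

DECLARE @StartSummary int = 166285,
        @EndSummary int = 166289;

SELECT SummaryInvoice
FROM @Invoice
-- Only digit-only strings of 1 to 38 characters reach the CONVERT;
-- everything else becomes NULL and fails the BETWEEN test.
WHERE CASE WHEN SummaryInvoice NOT LIKE '%[^0-9]%'
                AND LEN(SummaryInvoice) BETWEEN 1 AND 38
           THEN CONVERT(decimal(38, 0), SummaryInvoice)
      END BETWEEN @StartSummary AND @EndSummary;
```

With these sample rows only '166286' should come back; the 26-digit value converts without overflow but falls outside the range.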
ANDing with IsNumeric(SummaryInvoice) = 1 will not short-circuit in SQL Server.
But maybe you can use
AND (CASE WHEN IsNumeric(SummaryInvoice) = 1 THEN Convert(int, SummaryInvoice) ELSE 0 END)
    Between @StartSummary And @EndSummary
Your first issue is to fix your database structure so bad data cannot get into the field. You are putting a band-aid on a wound that needs stitches and wondering why it doesn't heal.
Database refactoring is not fun, but it needs to be done when there is a data integrity problem. I assume you aren't really invoicing someone for 11,111,111,111,111,111,111,111,111 or 'test'. So don't allow those values to ever get entered (if you can't change the structure to the correct data type, consider a trigger to prevent bad data from going in) and delete the ones you do have that are bad.

SQL query - LEFT 1 = char, RIGHT 3-5 = numbers in Name

I need to filter out junk data in a SQL Server 2008 table. I need to identify these records, and pull them out.
Char[0] = A..Z, a..z
Char[1] = 0..9
Char[2] = 0..9
Char[3] = 0..9
Char[4] = 0..9
{No blanks allowed}
Basically, a clean record will look like this:
T1234, U2468, K123, P50054 (4 record examples)
Junk data looks like this:
T12.., .T12, MARK, TP1, SP2, BFGL, BFPL (7 record examples)
Can someone please assist with a SQL query to do a LEFT and RIGHT method and extract those characters, and do a LIKE IN or something?
A function would be great though!
The following should work in a few different systems:
SELECT *
FROM TheTable
WHERE Data LIKE '[A-Za-z][0-9][0-9][0-9][0-9]%'
AND Data NOT LIKE '% %'
This approach will indeed match P2343, P23423JUNK, and other similar text but requires that the format is A0000*.
Now, if the OP implies a format of 1st position is a character and all succeeding positions are numeric, as in A0+, then use the following (in SQL Server and a good deal of other database systems):
SELECT *
FROM TheTable
WHERE SUBSTRING(Data, 1, 1) LIKE '[A-Za-z]'
AND SUBSTRING(Data, 2, LEN(Data) - 1) NOT LIKE '%[^0-9]%'
AND LEN(Data) >= 5
To incorporate this into a SQL Server 2008 function, since this appears to be what you'd like most, you can write:
CREATE FUNCTION ufn_IsProperFormat(@Data VARCHAR(50))
RETURNS BIT
AS
BEGIN
    RETURN
        CASE
            WHEN SUBSTRING(@Data, 1, 1) LIKE '[A-Za-z]'
                 AND SUBSTRING(@Data, 2, LEN(@Data) - 1) NOT LIKE '%[^0-9]%'
                 AND LEN(@Data) >= 5 THEN 1
            ELSE 0
        END
END
...and call into it like so:
SELECT *
FROM TheTable
WHERE dbo.ufn_IsProperFormat(Data) = 1
...this query needs to change for Oracle queries because Oracle doesn't appear to support bracket notation in LIKE clauses:
SELECT *
FROM TheTable
WHERE REGEXP_LIKE(Data, '^[A-Za-z]\d{4,}$')
This is the expansion gbn is doing in his answer, but these versions allow for varying string lengths without the OR conditions.
EDIT: Updated to support examples in SQL Server and Oracle for ensuring the format A0+, so that A1324, A2342388, and P2342 match but A2342JUNK and A234 do not.
The Oracle REGEXP_LIKE code was borrowed from Mark's post but updated to support 4 or more numeric digits.
Added a custom SQL Server 2008 approach which implements these techniques.
Depends on your database. Many have regex functions (note examples not tested so check)
e.g. Oracle
SELECT x
FROM table
WHERE REGEXP_LIKE(x, '^[A-Za-z][[:digit:]]{4}$')
Sybase uses LIKE
Given that you're allowing between 3 and 6 digits for the number in your examples then it's probably better to use the ISNUMERIC() function on the 2nd character onwards:
SELECT *
FROM TheTable
-- start with a letter
WHERE Data LIKE '[A-Za-z]%'
-- everything from 2nd character onwards is a number
AND ISNUMERIC( SUBSTRING( Data, 2, 50 ) ) = 1
-- number doesn't have a decimal place
AND Data NOT LIKE '%.%'
For more information look at the ISNUMERIC function on MSDN.
Also note that:
I've limited the 2nd part with the number to 50 characters maximum, change this to suit your needs.
Strictly speaking you should check for currency symbols etc, as ISNUMERIC allows them, as well as +/- and some others
A better option might be to create a function that checks that each character after the first is between 0 and 9 (or 1 and 0 if you're using ASCII codes).
You can't use Regular Expressions in SQL Server, so you have to use OR. Correcting David Andres' answer...
WHERE
(
Data LIKE '[A-Za-z][0-9][0-9][0-9]'
OR
Data LIKE '[A-Za-z][0-9][0-9][0-9][0-9]'
OR
Data LIKE '[A-Za-z][0-9][0-9][0-9][0-9][0-9]'
)
David's answer allows "D1234junk" through
You also only need "[A-Z]" if you don't have case sensitivity
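For comparison, here is a sketch that accepts a letter followed by 3 to 5 digits without spelling out each length with OR. It uses the sample records from the question; the table variable @TheTable and the exact length limits (4 to 6 characters in total) are my reading of the examples, so treat them as assumptions.

```sql
-- Sample records taken from the question; @TheTable is a hypothetical stand-in.
DECLARE @TheTable TABLE (Data varchar(50));
INSERT INTO @TheTable VALUES
    ('T1234'), ('U2468'), ('K123'), ('P50054'),
    ('T12..'), ('.T12'), ('MARK'), ('TP1'), ('SP2'), ('BFGL'), ('BFPL');

SELECT Data
FROM @TheTable
WHERE Data LIKE '[A-Za-z][0-9][0-9][0-9]%'        -- letter, then at least 3 digits
  AND SUBSTRING(Data, 2, 50) NOT LIKE '%[^0-9]%'  -- nothing but digits after the letter
  AND LEN(Data) BETWEEN 4 AND 6;                  -- 3 to 5 digits in total
```

This should return the four clean records (T1234, U2468, K123, P50054) and none of the junk ones.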

Conditionally branching in SQL based on the type of a variable

I'm selecting a value out of a table that can either be an integer or a nvarchar. It's stored as nvarchar. I want to conditionally call a function that will convert this value if it is an integer (that is, if it can be converted into an integer), otherwise I want to select the nvarchar with no conversion.
This is hitting a SQL Server 2005 database.
select case
when T.Value (is integer) then SomeConversionFunction(T.Value)
else T.Value
end as SomeAlias
from SomeTable T
Note that it is the "(is integer)" part that I'm having trouble with. Thanks in advance.
UPDATE
Check the comment on Ian's answer. It explains the why and the what a little better. Thanks to everyone for their thoughts.
select case
when ISNUMERIC(T.Value) = 1 then SomeConversionFunction(T.Value)
else T.Value
end as SomeAlias
from SomeTable T
Also, have you considered using the sql_variant data type?
Each column in a result set can have only one data type. When a CASE expression mixes int and nvarchar branches, data type precedence makes the whole expression int, so you will get an error as soon as a row contains a non-numeric string:
Msg 245, Level 16, State 1, Line 1
Conversion failed when converting the nvarchar value 'word' to data type int.
try this to see:
create table testing
(
strangevalue nvarchar(10)
)
insert into testing values (1)
insert into testing values ('word')
select * from testing
select
case
when ISNUMERIC(strangevalue)=1 THEN CONVERT(int,strangevalue)
ELSE strangevalue
END
FROM testing
best bet is to return two columns:
select
case
when ISNUMERIC(strangevalue)=1 THEN CONVERT(int,strangevalue)
ELSE NULL
END AS StrangvalueINT
,case
when ISNUMERIC(strangevalue)=1 THEN NULL
ELSE strangevalue
END AS StrangvalueString
FROM testing
or your application can test for numeric and do your special processing.
You can't have a column that is sometimes an integer and sometimes a string. Return the string and check it using int.TryParse() in the client code.
ISNUMERIC. However, this accepts +, - and decimals so more work is needed.
However, you can't have the columns as both datatypes in one go: you'll need 2 columns.
I'd suggest that you deal with this in your client or use an ISNUMERIC replacement
IsNumeric will get you part of the way there. You can then add some further code to check whether it is an integer
for example:
select top 10
case
when isnumeric(mycolumn) = 1 then
case
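-- appending '.0e0' makes decimals and scientific notation fail the test,
-- whereas convert(int, mycolumn) would raise an error on a value like '1.5'
when isnumeric(mycolumn + '.0e0') = 1 then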
'integer'
else
'number but not an integer'
end
else
'not a number'
end
from mytable
To clarify some other answers, your SQL statement can't return different data types in one column (it looks like the other answers are saying you can't store different data types in one column - yours are all string representations).
Therefore, if you use ISNUMERIC or another function, the value will be cast as a string in the table that is returned anyway if there are other strings being selected.
If you are selecting only one value then it could return a string or a number, however your front end code will need to be able to handle the different data types.
Just to add to some of the other comments about not being able to return different data types in the same column... Database columns should know what datatype they are holding. If they don't then that should be a BIG red flag that you have a design problem somewhere, which almost guarantees future headaches (like this one).