SQL: how to join on another field if the first field has letters in it

I have a SQL join that looks like this:
INNER JOIN UPLOAD PH on cast(p.PickTicket_Number as varchar(20))=PH.FIELD004
However, sometimes the field I really want to join on is PH.FIELD003. This happens when PH.FIELD004 holds a different kind of value, one that has letters in it. How do I include a condition in the ON clause so that, if FIELD004 has no letters in it (is numeric), it joins as above, but if it does have letters in it, it joins on PickTicket_Number = FIELD003 instead?

You can express the logic like this:
INNER JOIN UPLOAD PH
ON CAST(p.PickTicket_Number AS varchar(20)) = PH.FIELD004 OR
(TRY_CONVERT(int, PH.FIELD004) IS NULL AND
CAST(p.PickTicket_Number AS varchar(20)) = PH.FIELD003
)
Notes:
This checks that FIELD004 cannot be converted to an INT. This is hopefully close enough to "has letters in it". If not, you can use a CASE expression with LIKE.
OR is a performance killer for JOIN conditions. Function calls (say for conversion) also impede the optimizer.
If performance is an issue, ask a new question.
I am guessing the "number" value is an INT; if not, change to the appropriate type.
Converting numbers to strings for comparison can be dangerous. I've had problems with leading 0's, for example.
Because of the last issue, I would recommend:
INNER JOIN UPLOAD PH
ON TRY_CONVERT(int, PH.FIELD004) = p.PickTicket_Number OR
(TRY_CONVERT(int, PH.FIELD004) IS NULL AND
TRY_CONVERT(int, PH.FIELD003) = p.PickTicket_Number
)
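To check which branch of the recommended ON clause fires for different FIELD003/FIELD004 values, the condition can be modelled in Python. This is a rough sketch under stated assumptions: `try_convert_int` is a hypothetical stand-in for TRY_CONVERT(int, ...) that accepts only optional surrounding spaces, an optional sign and ASCII digits within the 32-bit int range. That approximates SQL Server's conversion rules but does not reproduce them exactly.

```python
import re
from typing import Optional

# Hypothetical approximation of T-SQL TRY_CONVERT(int, value): optional
# surrounding spaces, optional sign, ASCII digits, 32-bit int range.
_INT_TEXT = re.compile(r"\s*[+-]?[0-9]+\s*")
INT_MIN, INT_MAX = -2**31, 2**31 - 1


def try_convert_int(value: Optional[str]) -> Optional[int]:
    """Return the int value, or None where TRY_CONVERT would return NULL."""
    if value is None or not _INT_TEXT.fullmatch(value):
        return None
    number = int(value)
    return number if INT_MIN <= number <= INT_MAX else None


def join_matches(pick_ticket_number: int,
                 field003: Optional[str],
                 field004: Optional[str]) -> bool:
    """Mirror of the recommended ON clause."""
    field004_int = try_convert_int(field004)
    if field004_int is not None:
        # FIELD004 is numeric: join on it and ignore FIELD003
        return field004_int == pick_ticket_number
    # FIELD004 is NULL or has letters: fall back to FIELD003
    return try_convert_int(field003) == pick_ticket_number


print(join_matches(1234, "1234", "AB-77"))  # True
```

Because a numeric FIELD004 that does not match makes the whole condition false, FIELD003 is only consulted when FIELD004 cannot be converted, which is what the OR plus IS NULL combination expresses.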

Related

How to Replace a part of string in Left Join Statement in SQL

I have this SQL statement:
LEFT JOIN SeniorCitizen on FinishedTransaction.SCID = SeniorCitizen.OSCAID
SCID contains 1234
OSCAID contains 1234/102938
How can I remove /102938 so that they match?
Hmmm, one method is to use LIKE:
ON SeniorCitizen.OSCAID LIKE CAST(FinishedTransaction.SCID AS varchar(20)) + '/%'
No guarantees on performance, but this should do the join correctly.
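Here is a small runnable check of the LIKE approach. It uses Python's built-in sqlite3 module rather than SQL Server, so string concatenation is written || instead of +, and the table contents are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE FinishedTransaction (SCID INTEGER);
    CREATE TABLE SeniorCitizen (OSCAID TEXT);
    INSERT INTO FinishedTransaction VALUES (1234), (5678);
    INSERT INTO SeniorCitizen VALUES ('1234/102938'), ('12345/000001');
""")

# The '/' in the pattern stops 1234 from also matching '12345/...'
rows = conn.execute("""
    SELECT ft.SCID, sc.OSCAID
    FROM FinishedTransaction AS ft
    LEFT JOIN SeniorCitizen AS sc
        ON sc.OSCAID LIKE CAST(ft.SCID AS TEXT) || '/%'
    ORDER BY ft.SCID
""").fetchall()
print(rows)  # [(1234, '1234/102938'), (5678, None)]
```

Keeping the / in the pattern matters: with '1234%' alone, SCID 1234 would also pick up '12345/000001'.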
EDIT:
You can do this operation efficiently by using a computed column and then an index on the computed column.
So:
alter table SeniorCitizen
add OSCAIDshort as ( cast(left(OSCAID, CHARINDEX('/', OSCAID + '/') - 1) as int) );
create index idx_SeniorCitizen_OSCAIDshort on SeniorCitizen(OSCAIDshort);
(The cast presumes that the SCID column is an integer.)
Then you can use this in the join as:
LEFT JOIN SeniorCitizen on FinishedTransaction.SCID = SeniorCitizen.OSCAIDshort
This formulation can use the index on the computed column and hence is probably the fastest way to do the join.
If you knew that the length of the numbers you were comparing was always 4, you could use SUBSTRING, like so:
LEFT JOIN SeniorCitizen on FinishedTransaction.SCID = SUBSTRING(SeniorCitizen.OSCAID, 1, 4)
to just grab the first four characters from OSCAID for the comparison.
However, even if you knew the length was always 4, it's safer not to rely on it: the length may grow in the future, and an expression that looks for the / keeps working when it does. To do this, you can combine SUBSTRING and CHARINDEX, like so:
LEFT JOIN SeniorCitizen on FinishedTransaction.SCID = SUBSTRING(SeniorCitizen.OSCAID, 1, CHARINDEX('/', SeniorCitizen.OSCAID, 0) - 1)
This starts at the first character in OSCAID and reads up to, but not including, the first /. So if the string is 1234/102938, it returns 1234; and if it grows to 123456/102938, it returns 123456.
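A small Python model of the two T-SQL functions shows the detail that matters here: CHARINDEX returns the position of the / itself, so the length passed to SUBSTRING has to be CHARINDEX(...) - 1 to leave the / out. The helper names are hypothetical, the model only covers start positions of 1 or more, and it raises ValueError where SQL Server reports an invalid length parameter.

```python
def charindex(needle: str, haystack: str) -> int:
    """1-based position of needle in haystack, 0 if absent (like CHARINDEX)."""
    return haystack.find(needle) + 1


def substring(value: str, start: int, length: int) -> str:
    """Like SUBSTRING(value, start, length), modelled for start >= 1 only."""
    if start < 1:
        raise ValueError("this model only handles start >= 1")
    if length < 0:
        raise ValueError("Invalid length parameter passed to the substring function")
    return value[start - 1:start - 1 + length]


def prefix_before_slash(oscaid: str) -> str:
    """SUBSTRING(OSCAID, 1, CHARINDEX('/', OSCAID) - 1)."""
    return substring(oscaid, 1, charindex("/", oscaid) - 1)


print(substring("1234/102938", 1, charindex("/", "1234/102938")))  # 1234/
print(prefix_before_slash("123456/102938"))                      # 123456
```

A value with no / at all gives a length of -1, which SQL Server rejects, so rows without the separator need separate handling.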
Be sure to check out the docs for each of those functions to get a better understanding of their capabilities:
SUBSTRING: https://msdn.microsoft.com/en-us/library/ms187748.aspx
CHARINDEX: https://msdn.microsoft.com/en-us/library/ms186323.aspx
You can use the LEFT or SUBSTRING functions to do this:
SUBSTRING(SeniorCitizen.OSCAID, 1, 4)
LEFT(SeniorCitizen.OSCAID, 4)
But keep in mind that applying a function such as LEFT or SUBSTRING to the column might make the query non-sargable, so it can no longer seek on an index on OSCAID.

What does LEFT in SQL do when it is not paired with JOIN and why does it cause my query to time out?

I was given the following statement:
LEFT(f.field4, CASE WHEN PATINDEX('%[^0-9]%',f.field4) = 0 THEN LEN(f.field4) ELSE PATINDEX('%[^0-9]%',f.field4) - 1 END)=#DealNumber
and am having trouble contacting the person who wrote it. Could someone explain what that statement does, and whether it is valid SQL? The goal of the statement is to compare the leading numeric characters in f.field4 to DealNumber. DNumber and DealNumber are the same except for a wildcard at the end of DealNumber.
I am trying to use it in the context of the following statement:
SELECT d.Description, d.FileID, d.DateFiled, u.Contact AS UserFiledName, d.Pages, d.Notes
FROM Documents AS d
LEFT JOIN Files AS f ON d.FileID=f.FileID
LEFT JOIN Users AS u ON d.UserFiled=u.UserID
WHERE SUBSTRING(f.Field8, 2, 1) = #LocationIDString
AND f.field4=#DNumber OR LEFT(f.field4, CASE WHEN PATINDEX('%[^0-9]%',f.field4) = 0 THEN LEN(f.field4) ELSE PATINDEX('%[^0-9]%',f.field4) - 1 END)=#DealNumber
but my code keeps timing out when I execute it.
It's the CASE clause which is slowing things down, not LEFT per se (although LEFT may prevent the use of indexes, which will have an effect).
The CASE determines what should be compared with #DealNumber, and I think it does the following...
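If f.field4 contains only digits (PATINDEX finds no non-digit and returns 0), use LEFT(f.field4, LEN(f.field4))=#DealNumber: that's equivalent to f.field4=#DealNumber.
If f.field4 contains a non-digit, use {the digits before the first non-digit}=#DealNumber; that is an empty string when f.field4 does not start with a digit.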
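A Python model of the CASE/PATINDEX/LEFT expression makes the two branches concrete. The function names are hypothetical; re.search stands in for PATINDEX('%[^0-9]%', ...) and string slicing stands in for LEFT.

```python
import re


def patindex_non_digit(value: str) -> int:
    """1-based position of the first non-digit, 0 if none (like PATINDEX('%[^0-9]%', value))."""
    match = re.search(r"[^0-9]", value)
    return match.start() + 1 if match else 0


def leading_digits(value: str) -> str:
    """Mirror of LEFT(value, CASE WHEN PATINDEX(...) = 0 THEN LEN(value) ELSE PATINDEX(...) - 1 END)."""
    position = patindex_non_digit(value)
    length = len(value) if position == 0 else position - 1
    return value[:length]


for sample in ("12345", "123ABC", "ABC123"):
    print(repr(sample), "->", repr(leading_digits(sample)))
```

In every case the result is simply the run of digits at the start of the value, which is what the question says the statement is meant to compare.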
This sort of computation isn't very efficient.
I would attempt the following, which makes the large assumption that a mixed string can be cast as an integer: that is, that if you convert ABC to an integer you get zero, and if you convert 123ABC you get what can be converted, 123. I can't find any documentation which says whether that is possible or not. (In SQL Server, for one, it is not: CAST('123ABC' AS int) raises a conversion error.)
AND f.field4=#DNumber
OR (f.field4=#DealNumber AND integer(f.field4)=0)
OR (integer(f.field4)=#DealNumber)
The first line is the same as your AND. The second line selects f.field4=#DealNumber only if f.field4 does not start with a number. The third line selects where the initial numeric portion of f.field4 is the same as #DealNumber.
As I say, there is an assumption here that integer() will work in this way. You may need to define a CAST function to do that conversion with strings. That's rather beyond me, although I would be confident that even such a function would be faster than a CASE as you currently have.
From the doc:
left(str text, n int)
Return first n characters in the string. When n is negative, return all but last |n| characters.
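Python's slice value[:n] happens to follow the same rule as the left(str, n) description quoted above, including negative n, so it is a handy way to try out that behaviour. This is only an analogy for experimentation, not the database's implementation; the left helper is hypothetical.

```python
def left(value: str, n: int) -> str:
    """First n characters; for negative n, all but the last |n| characters."""
    return value[:n]


for n in (3, 0, -2, 10, -10):
    print(n, repr(left("abcdef", n)))
# 3 'abc'
# 0 ''
# -2 'abcd'
# 10 'abcdef'
# -10 ''
```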

how can I force SQL to only evaluate a join if the value can be converted to an INT?

I've got a query that uses several subqueries. It's about 100 lines, so I'll leave it out. The issue is that I have several rows returned as part of one subquery that need to be joined to an integer value from the main query. Like so:
Select
... columns ...
from
... tables ...
(
select
... column ...
from
... tables ...
INNER JOIN core.Type mt
on m.TypeID = mt.TypeID
where dpt.[DataPointTypeName] = 'TheDataPointType'
and m.TypeID in (100008, 100009, 100738, 100739)
and datediff(d, m.MeasureEntered, GETDATE()) < 365 -- only care about measures from past year
and dp.DataPointValue <> ''
) as subMdp
) as subMeas
on (subMeas.DataPointValue NOT LIKE '%[^0-9]%'
and subMeas.DataPointValue = cast(vcert.IDNumber as varchar(50))) -- THIS LINE
... more tables etc ...
The issue is that if I take out the cast(vcert.IDNumber as varchar(50)), it will attempt to compare a value like 'daffodil' to a number like 3245. This happens even though the datapoint that contains 'daffodil' is an orphan record that should be filtered out by the INNER JOIN 4 lines above it. It works fine if I compare a string to a string, but it blows up if I compare a string to an int, even though I have a clause in there to only look at values that can be converted to integers: NOT LIKE '%[^0-9]%'. If I specifically filter out the record containing 'daffodil', then it's fine. If I move the NOT LIKE line into the subquery, it still fails. It's as if the NOT LIKE is evaluated last no matter what I do.
So the real question is why SQL would be evaluating a JOIN clause before evaluating a WHERE clause contained in a subquery. Also how I can force it to only evaluate the JOIN clause if the value being evaluated is convertible to an INT. Also why it would be evaluating a record that will definitely not be present after an INNER JOIN is applied.
I understand that there's a strong element of query optimizer voodoo going on here. On the other hand I'm telling it to do an INNER JOIN and the optimizer is specifically ignoring it. I'd like to know why.
The problem you are having is discussed in this item of feedback on the Microsoft Connect site.
Whilst logically you might expect the filter to exclude any DataPointValue values that contain non-numeric characters, SQL Server appears to be ordering the CAST operation in the execution plan before this filter happens; hence the error.
Until Denali (SQL Server 2012) comes along with its TRY_CONVERT function, the way around this is to wrap the usage of the column in a CASE expression that repeats the same logic as the filter.
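Why the CASE workaround helps can be sketched in Python: if the conversion is evaluated before the digit filter, one bad value like 'daffodil' raises, whereas a conversion that carries its own guard is safe in any evaluation order. This is a simulation under assumptions, not SQL Server's actual plan; the function names and sample values are hypothetical, and Python's left-to-right `and` stands in for the operator order the optimizer happens to choose.

```python
import re
from typing import List, Optional

_ALL_DIGITS = re.compile(r"[0-9]+")


def guarded_to_int(value: str) -> Optional[int]:
    """Like CASE WHEN value NOT LIKE '%[^0-9]%' THEN CAST(value AS int) END.

    Unlike NOT LIKE, this requires at least one digit, so '' gives None here
    (in SQL Server, '' passes NOT LIKE and CAST('' AS int) returns 0).
    """
    return int(value) if _ALL_DIGITS.fullmatch(value) else None


def join_cast_first(values: List[str], id_number: int) -> List[str]:
    # Models a plan that applies the conversion before the digit filter
    return [v for v in values if int(v) == id_number and _ALL_DIGITS.fullmatch(v)]


def join_guarded(values: List[str], id_number: int) -> List[str]:
    # The guard lives inside the expression, so evaluation order no longer matters
    return [v for v in values if guarded_to_int(v) == id_number and _ALL_DIGITS.fullmatch(v)]


data_point_values = ["3245", "daffodil", "17"]
print(join_guarded(data_point_values, 3245))  # ['3245']
```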
So the real question is why SQL would be evaluating a JOIN clause before evaluating a WHERE clause contained in a subquery.
Because SQL engines are required to behave as if that's what they do. They're required to act like they build a working table from all of the table constructors in the FROM clause; expressions in the WHERE clause are applied to that working table.
Joe Celko wrote about this many times on Usenet. Here's an old version with more details.
First of all,
NOT LIKE '%[^0-9]%'
doesn't work well. Example:
DECLARE @Int nvarchar(20) = ' 454 54'
SELECT CASE WHEN @Int LIKE '%[^0-9]%' THEN 1 ELSE 0 END AS Is_Number
Result: 1
But it is not a number!
To check whether it is a real int value, you should use the ISNUMERIC function. Let's check this:
DECLARE @Int nvarchar(20) = ' 454 54'
SELECT ISNUMERIC(@Int) AS Is_Int
Result: 0
Result is correct.
So, instead of
NOT LIKE '%[^0-9]%'
try to change this to
ISNUMERIC(subMeas.DataPointValue) = 1
UPDATE
How do you check whether a value is an integer?
First, from here:
WHERE ISNUMERIC(str) = 1 AND str NOT LIKE '%.%' AND str NOT LIKE '%e%' AND str NOT LIKE '%-%'
Second:
CREATE Function dbo.IsInteger(@Value VarChar(18))
Returns Bit
As
Begin
Return IsNull(
(Select Case When CharIndex('.', @Value) > 0
Then Case When Convert(int, ParseName(@Value, 1)) <> 0
Then 0
Else 1
End
Else 1
End
Where IsNumeric(@Value + 'e0') = 1), 0)
End
Filter out the non-numeric records in a subquery or CTE

What applications are there for NULLIF()?

I just had a trivial but genuine use for NULLIF(), for the first time in my career in SQL. Is it a widely used tool I've just ignored, or a nearly-forgotten quirk of SQL? It's present in all major database implementations.
If anyone needs a refresher, NULLIF(A, B) returns the first value, unless it's equal to the second in which case it returns NULL. It is equivalent to this CASE statement:
CASE WHEN A <> B OR B IS NULL THEN A END
or, in C-style syntax:
A == B || A == null ? null : A
So far the only non-trivial example I've found is to exclude a specific value from an aggregate function:
SELECT COUNT(NULLIF(Comment, 'Downvoted'))
This has the limitation of only allowing one to skip a single value; a CASE, while more verbose, would let you use an expression.
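The aggregate trick is easy to try with Python's built-in sqlite3 module, which also implements NULLIF. The Posts table and its rows are invented for this sketch.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Posts (Comment TEXT);
    INSERT INTO Posts VALUES ('Great'), ('Downvoted'), ('Thanks'), ('Downvoted'), (NULL);
""")

# COUNT skips NULLs, so turning 'Downvoted' into NULL excludes it from the count
all_comments, kept_comments = conn.execute(
    "SELECT COUNT(Comment), COUNT(NULLIF(Comment, 'Downvoted')) FROM Posts"
).fetchone()
print(all_comments, kept_comments)  # 4 2
```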
For the record, the use I found was to suppress the value of a "most recent change" column if it was equal to the first change:
SELECT Record, FirstChange, NULLIF(LatestChange, FirstChange) AS LatestChange
This was useful only in that it reduced visual clutter for human consumers.
I rather think that
NULLIF(A, B)
is syntactic sugar for
CASE WHEN A = B THEN NULL ELSE A END
But you are correct: it is mere syntactic sugar to aid the human reader.
I often use it where I need to avoid a division-by-zero error:
SELECT
COALESCE(Expression1 / NULLIF(Expression2, 0), 0) AS Result
FROM …
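The division guard can be modelled in Python, where dividing by zero raises ZeroDivisionError much as SQL Server raises its divide-by-zero error. The helpers are hypothetical, None stands in for NULL, and Python's / is true division, unlike SQL integer division.

```python
from typing import Optional


def nullif(a, b):
    """SQL NULLIF: NULL (None) when a equals b, otherwise a."""
    return None if a is not None and b is not None and a == b else a


def safe_ratio(numerator: Optional[float], denominator: Optional[float]) -> float:
    """COALESCE(numerator / NULLIF(denominator, 0), 0)."""
    divisor = nullif(denominator, 0)
    # In SQL, arithmetic involving NULL yields NULL instead of raising
    quotient = None if numerator is None or divisor is None else numerator / divisor
    return quotient if quotient is not None else 0


print(safe_ratio(10, 4), safe_ratio(10, 0))  # 2.5 0
```

NULLIF turns the zero divisor into NULL, the division then yields NULL instead of failing, and COALESCE supplies the fallback.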
Three years later, I found a material use for NULLIF: using NULLIF(Field, '') translates empty strings into NULL, for equivalence with Oracle's peculiar idea about what "NULL" represents.
NULLIF is handy when you're working with legacy data that contains a mixture of null values and empty strings.
Example:
SELECT COALESCE(NULLIF(firstColumn, ''), secondColumn) FROM table WHERE this = that
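A small sqlite3 run shows the difference NULLIF makes on mixed legacy data: plain COALESCE lets the empty string through, while COALESCE over NULLIF(..., '') falls back to the second column. The legacy table and its values are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE legacy (firstColumn TEXT, secondColumn TEXT);
    INSERT INTO legacy VALUES ('Alice', 'fallback-1'), ('', 'fallback-2'), (NULL, 'fallback-3');
""")

plain = [r[0] for r in conn.execute(
    "SELECT COALESCE(firstColumn, secondColumn) FROM legacy ORDER BY rowid")]
cleaned = [r[0] for r in conn.execute(
    "SELECT COALESCE(NULLIF(firstColumn, ''), secondColumn) FROM legacy ORDER BY rowid")]
print(plain)    # ['Alice', '', 'fallback-3']
print(cleaned)  # ['Alice', 'fallback-2', 'fallback-3']
```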
COUNT has the behavior of returning zero rather than NULL when there is nothing to count (SUM, by contrast, returns NULL when every input is NULL). I could see NULLIF being handy when you want to undo that behavior. In fact, this came up in a recent answer I provided. If I had remembered NULLIF, I probably would have written the following:
SELECT student,
NULLIF(coursecount,0) as courseCount
FROM (SELECT cs.student,
COUNT(os.course) coursecount
FROM #CURRENTSCHOOL cs
LEFT JOIN #OTHERSCHOOLS os
ON cs.student = os.student
AND cs.school <> os.school
GROUP BY cs.student) t
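The query can be tried in sqlite3 with ordinary tables standing in for the #CURRENTSCHOOL and #OTHERSCHOOLS temp tables (SQLite has no # temp-table syntax). The rows are invented, and an ORDER BY is added only to make the output deterministic.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE CURRENTSCHOOL (student TEXT, school TEXT);
    CREATE TABLE OTHERSCHOOLS (student TEXT, school TEXT, course TEXT);
    INSERT INTO CURRENTSCHOOL VALUES ('ann', 'A'), ('bob', 'B');
    INSERT INTO OTHERSCHOOLS VALUES ('ann', 'C', 'Math'), ('ann', 'A', 'Art'), ('bob', 'B', 'History');
""")

# bob has no courses at another school, so COUNT gives 0 and NULLIF turns it into NULL
rows = conn.execute("""
    SELECT student, NULLIF(coursecount, 0) AS courseCount
    FROM (SELECT cs.student, COUNT(os.course) AS coursecount
          FROM CURRENTSCHOOL cs
          LEFT JOIN OTHERSCHOOLS os
            ON cs.student = os.student
           AND cs.school <> os.school
          GROUP BY cs.student) t
    ORDER BY student
""").fetchall()
print(rows)  # [('ann', 1), ('bob', None)]
```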

Regex: does a SQL statement include a WHERE clause?

I need a regex that will determine if a given SQL statement has a WHERE clause. My problem is that the passed SQL statements will most likely be complex, so I cannot rely on just the existence of the word WHERE in the statement.
For example, this should match:
SELECT Contacts.ID
, CASE WHEN (Contacts.Firstname IS NULL) THEN ''
ELSE CAST(Contacts.Firstname AS varchar)
END AS Firstname
, CASE WHEN (Contacts.Lastname IS NULL) THEN ''
ELSE CAST(Contacts.Lastname AS varchar)
END AS Lastname
, CASE WHEN (tbl_ContactExtras.Prequalified=-1 OR
tbl_ContactExtras.Prequalified IS NULL) THEN ''
WHEN tbl_ContactExtras.Prequalified=0 THEN 'No'
WHEN tbl_ContactExtras.Prequalified=1 THEN 'Yes - Other'
WHEN tbl_ContactExtras.Prequalified=2 THEN 'Yes'
ELSE CAST(tbl_ContactExtras.Prequalified AS varchar)
END AS Prequalified
FROM contacts
LEFT JOIN tbl_ContactExtras
ON tbl_ContactExtras.ContactID = Contacts.ID
WHERE (Contacts.Firstname LIKE 'Bob%')
and this should not match:
SELECT Contacts.ID
, CASE WHEN (Contacts.Firstname IS NULL) THEN ''
ELSE CAST(Contacts.Firstname AS varchar)
END AS Firstname
, CASE WHEN (Contacts.Lastname IS NULL) THEN ''
ELSE CAST(Contacts.Lastname AS varchar)
END AS Lastname
, CASE WHEN (tbl_ContactExtras.Prequalified=-1 OR
tbl_ContactExtras.Prequalified IS NULL) THEN ''
WHEN tbl_ContactExtras.Prequalified=0 THEN 'No'
WHEN tbl_ContactExtras.Prequalified=1 THEN 'Yes - Other'
WHEN tbl_ContactExtras.Prequalified=2 THEN 'Yes'
ELSE CAST(tbl_ContactExtras.Prequalified AS varchar)
END AS Prequalified
FROM contacts
LEFT JOIN tbl_ContactExtras
ON tbl_ContactExtras.ContactID = Contacts.ID
Those are examples of some of the simpler statements: a statement could have up to 30 CASE statements in it, or it could have none at all.
I need to programmatically add WHERE parameters, but doing this correctly requires knowing whether a WHERE clause is already present.
Any idea on a regex that would work for this? If not, any other ideas on how to tell the two apart?
This is not possible, since a WHERE clause may be arbitrarily nested inside the FROM clause.
This may not catch all cases, but you may find you can catch most of them just by finding the last FROM and the last WHERE in the statement.
If the WHERE is after the FROM, then it has a WHERE clause. If the WHERE is before the FROM (or there is no WHERE at all), then no WHERE clause exists.
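The last-FROM/last-WHERE heuristic fits in a few lines of Python. This is a sketch with hypothetical helper names; it matches whole words case-insensitively, and the final example shows one of the cases it gets wrong (a WHERE inside a derived table).

```python
import re

_FROM = re.compile(r"\bFROM\b", re.IGNORECASE)
_WHERE = re.compile(r"\bWHERE\b", re.IGNORECASE)


def _last_start(pattern: re.Pattern, sql: str) -> int:
    """Start index of the last match, or -1 if there is none."""
    starts = [m.start() for m in pattern.finditer(sql)]
    return starts[-1] if starts else -1


def has_where_heuristic(sql: str) -> bool:
    """True when the last WHERE comes after the last FROM."""
    last_where = _last_start(_WHERE, sql)
    return last_where != -1 and last_where > _last_start(_FROM, sql)


print(has_where_heuristic("SELECT ID FROM contacts WHERE (Firstname LIKE 'Bob%')"))  # True
print(has_where_heuristic("SELECT ID FROM contacts"))                                # False
# Known false positive: the WHERE belongs to the derived table, not the outer query
print(has_where_heuristic("SELECT * FROM (SELECT id FROM t WHERE x = 1) AS s"))      # True
```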
Sometimes, it's okay to leave restrictions or limitations in your code, as long as they're properly documented.
For example, I've worked on a project before that parsed SQL, and we discovered it didn't handle things like BETWEEN:
where recdate between '2010-01-01' and '2010-12-31'
Rather than spend a bucket-load of money fixing the problem (and probably introducing bugs on the way), we simply published it as a restriction and told everyone they had to change it to:
where recdate >= '2010-01-01'
and recdate <= '2010-12-31'
Problem solved. While it's good to keep customers happy, you don't have to cater to every whim :-)
Other than that, you need an SQL parser, and SQL is not a pretty language to parse, trust me on that one.
Are all of the joins the same? If so, you could find the index of all or part of the FROM clause (perhaps using a regex to be tolerant of slight differences in syntax and whitespace) and then look for the occurrence of the word WHERE after that index.
In general you would be better off using a parser. But if this is just a one-off thing and the statements are all fairly similar, then the above approach should be okay.
Regex is not designed to do this. Parsing SQL properly requires matching balanced parentheses (and other matching pairs, such as quotes), something regex is not designed to do (and pure regex isn't even equipped to; PCRE can but it's not pretty).
Instead, just write a basic state machine or something to parse it.
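One way to read "a basic state machine" is a single pass that tracks parenthesis depth and skips over string literals, quoted identifiers and comments, reporting a WHERE only at depth 0. This is a hypothetical sketch, not a real SQL parser: it assumes T-SQL-style quoting ('' inside literals, [brackets] for identifiers) and ignores constructs such as UNION, where a top-level WHERE may belong to just one branch.

```python
def has_top_level_where(sql: str) -> bool:
    """Report whether WHERE appears outside parentheses, string literals,
    quoted identifiers and comments."""
    depth = 0
    i = 0
    n = len(sql)
    while i < n:
        ch = sql[i]
        if ch == "'":
            # String literal; a doubled '' inside it is an escaped quote
            i += 1
            while i < n:
                if sql[i] == "'" and i + 1 < n and sql[i + 1] == "'":
                    i += 2
                elif sql[i] == "'":
                    break
                else:
                    i += 1
            i += 1  # step past the closing quote
        elif ch in "[\"":
            # Bracketed or double-quoted identifier
            close = sql.find("]" if ch == "[" else '"', i + 1)
            i = n if close == -1 else close + 1
        elif sql.startswith("--", i):
            newline = sql.find("\n", i)
            i = n if newline == -1 else newline + 1
        elif sql.startswith("/*", i):
            close = sql.find("*/", i + 2)
            i = n if close == -1 else close + 2
        elif ch == "(":
            depth += 1
            i += 1
        elif ch == ")":
            depth -= 1
            i += 1
        elif ch.isalpha() or ch in "_@#":
            # Read a whole word so names like @where or Firstname are not split
            start = i
            while i < n and (sql[i].isalnum() or sql[i] in "_@#$"):
                i += 1
            if depth == 0 and sql[start:i].upper() == "WHERE":
                return True
        else:
            i += 1
    return False


print(has_top_level_where("SELECT * FROM (SELECT id FROM t WHERE x = 1) AS s"))  # False
```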
What's the problem you're trying to solve? Are you trying to determine if it's safe to add constraints to these existing queries?
For example, if you've got this query
...
where foo = 'bar'
then you know it's safe to add
and bat = 'quux'
but if you don't have a WHERE clause already, then you have to do it as
where bat = 'quux'
Is that the problem you're trying to solve? If so, can you make every SQL query you're working with have a WHERE clause by adding a "WHERE 0=0" to those queries that don't have one? Then you know in your post-process phase that every query already has one.
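Once every query is guaranteed to contain a WHERE (for example WHERE 0=0), adding filters becomes plain string appending. The sketch below assumes the WHERE clause is the last clause in the query (no ORDER BY or GROUP BY after it) and that the conditions come from trusted code rather than user input; add_conditions is a hypothetical helper.

```python
from typing import Iterable


def add_conditions(sql_with_where: str, conditions: Iterable[str]) -> str:
    """Append each condition with AND to a query that already ends in a WHERE clause."""
    # Parenthesize each condition so an OR inside it cannot change how the AND binds
    return sql_with_where + "".join(f" AND ({condition})" for condition in conditions)


print(add_conditions("SELECT * FROM t WHERE 0=0", ["bat = 'quux'", "a = 1 OR b = 2"]))
# SELECT * FROM t WHERE 0=0 AND (bat = 'quux') AND (a = 1 OR b = 2)
```

The parentheses are the design choice worth keeping: without them, appending "AND a = 1 OR b = 2" would let the OR escape every earlier condition.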
This is just a guess, of course. Your question sounded like that might be the larger issue.