decoding a text string to use in a join - sql

I'm trying to extract the number from a text string and join it to another table. Here's what I have so far:
SELECT sect.id,
sect.section_number,
sect.expression,
p.abbreviation
FROM sections sect
JOIN period p ON SUBSTR(sect.expression, 1, (INSTR(sect.expression,'(')-1)) = p.period_number
AND p.schoolid = 73253
AND p.year_id = 20
JOIN courses c ON sect.course_number = c.course_number
WHERE sect.schoolid = 73253
AND sect.termid >= 2000
I read some other threads and figured out how to strip out the number (which always comes before the left parenthesis). The problem is that this only accounts for two of the three styles of data that live in the sect.expression column-
9(A) - check
10(A) - check
but not
5-6(A)
5-6(A) kicks back an Oracle ORA-01722 (invalid number) error.
Is there a way I could modify the substr... line so that for the 5-6(A) data type it would grab the first number (the 5) and join off of that?
It's worth mentioning that I only have read rights to this table so any solution that depends on creating some kind of helper table/column won't work.
Thanks!

You can use REGEXP_REPLACE
1) If you want to extract only numbers:
JOIN period p ON REGEXP_REPLACE(sect.expression, '[^0-9]', '') = p.period_number
2) If you want to match with the digits in the start of the string and ignore the ones that appear later:
JOIN period p ON REGEXP_REPLACE(sect.expression, '^(\d+)(.*)', '\1') = p.period_number

Since this is Oracle 10g, you could use a regex instead:
JOIN period p ON REGEXP_SUBSTR(sect.expression, '^\d+', 1, 1) = p.period_number
Admittedly, the regex I provided needs work - it will get the first number at the start of the string. If you need a more complicated regex, I recommend this site: http://www.regular-expressions.info/tutorial.html
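To see how the two regex approaches differ on the three data styles from the question, here is a small Python sketch. It only mimics the patterns with Python's re module and is not Oracle code; the function names are made up for illustration.

```python
import re

def leading_number(expression):
    # Rough analogue of REGEXP_SUBSTR(expression, '^\d+'):
    # the run of digits at the very start of the string, or None.
    match = re.match(r"\d+", expression)
    return match.group() if match else None

def all_digits(expression):
    # Rough analogue of REGEXP_REPLACE(expression, '[^0-9]', ''):
    # every digit in the string, concatenated.
    return re.sub(r"[^0-9]", "", expression)

samples = ["9(A)", "10(A)", "5-6(A)"]
leading = [leading_number(s) for s in samples]
stripped = [all_digits(s) for s in samples]
print(leading)
print(stripped)
```

For 5-6(A), stripping all non-digits yields 56, while anchoring at the start of the string yields 5, which is what the question asks for.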

Related

Substring in Left Join condition

I want to do substring within the join condition, but it is not working.
SELECT
IF (ps.shop = 'NL',TopCat.Parent_Title, CategoryUID.Parent_Title) as Parent_Title,
IF (ps.shop = 'NL',TopCat.Sub_Title_1, CategoryUID.Sub_Title_1) as Sub_Title_1,
IF (ps.shop = 'NL',TopCat.Sub_Title_2, CategoryUID.Sub_Title_2) as Sub_Title_2,
ps.ean, ps.product_resource_id
FROM `xxlhoreca-bi.PriceSearch.XXL_PriceComparison` ps
LEFT JOIN
`xxlhoreca-bi.DataImport.TopCategories` topCat
ON
ps.product_resource_id = topCat.product_resource_id
LEFT JOIN
`DataImport.CategoryUID` CategoryUID
ON
SAFE_CAST(SUBSTR('DataImport.CategoryMappingWithLocalID.Reporting_ID', 4) AS INT64) = CategoryUID.Category_ID
GROUP BY
1, 2, 3, 4, 5
Is there any way around how I can write substring within LEFT JOIN condition?
I need to change the substring part, but I have not been able to achieve it. Any helps would be really appreciated!
Thanks in advance!
You are on roughly the right track.
I am going to make a few assumptions here, so bear with me, but I think they are educated guesses.
I think this DataImport.CategoryMappingWithLocalID.Reporting_ID is a field (Reporting_ID) from a table (CategoryMappingWithLocalID) you have in your dataset (DataImport).
What you are trying to achieve is to get the categories that are included in your CategoryMappingWithLocalID.
You are trying to get a substring from the Reporting_ID field because the ID you want starts at its 4th character (SUBSTR(value, 4) returns everything from position 4 onward, not the first 4 characters).
Because SUBSTR requires a string, you are trying to turn that dataset.table.field reference into a string by putting it in single quotes, which leads me to think it might actually be a numeric field in the original table.
Now, the solution.
You need to use the table in your query if you want to use it in your JOIN ON clause. Therefore, you need to add an extra JOIN there.
You are on the right track with the SUBSTR part, but what you need to use is CAST(field AS STRING) to convert your numeric value into a string.
Put those two things together in your query and you are ready to go my friend.
JOIN `DataImport.CategoryMappingWithLocalID` AS category_mapping
ON
SAFE_CAST(SUBSTR(CAST(category_mapping.Reporting_ID AS STRING), 4) AS INT64) = CategoryUID.Category_ID
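As a rough illustration of what the CAST / SUBSTR / SAFE_CAST chain computes, here is a Python sketch. The sample Reporting_ID values are hypothetical (the thread does not show any real data), and the helper names are invented for this example.

```python
def safe_cast_int(text):
    # Loose analogue of SAFE_CAST(text AS INT64): None instead of an error.
    try:
        return int(text)
    except ValueError:
        return None

def category_id_from_reporting_id(reporting_id):
    # CAST(... AS STRING), then SUBSTR(value, 4): SQL positions are 1-based,
    # so position 4 onward corresponds to the Python slice [3:].
    return safe_cast_int(str(reporting_id)[3:])

# Hypothetical values, only to show the shape of the transformation.
numeric_result = category_id_from_reporting_id(1001234)
text_result = category_id_from_reporting_id("ABC1234")
failed_result = category_id_from_reporting_id("ABCxyz")
print(numeric_result, text_result, failed_result)
```

A value that cannot be converted produces None here, mirroring the NULL that SAFE_CAST returns, so such rows simply would not match in the join.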

How to Replace a part of string in Left Join Statement in SQL

I have sql statement
LEFT JOIN SeniorCitizen on FinishedTransaction.SCID = SeniorCitizen.OSCAID
SCID has 1234
OSCAID has 1234/102938
How can I remove /102938 so that it matches
Hmmm, one method is to use LIKE:
ON SeniorCitizen.OSCAID LIKE FinishedTransaction.SCID + '/%'
No guarantees on performance, but this should do the join correctly.
EDIT:
You can do this operation efficiently by using a computed column and then an index on the computed column.
So:
alter table SeniorCitizen
add OSCAIDshort as ( cast(left(OSCAID, CHARINDEX('/', OSCAID) - 1) as int) );
create index idx_SeniorCitizen_OSCAIDshort on SeniorCitizen(OSCAIDshort);
(The cast presumes that the SCID column is an integer.)
Then you can use this in the join as:
LEFT JOIN SeniorCitizen on FinishedTransaction.SCID = SeniorCitizen.OSCAIDshort
This formulation can use the index on the computed column and hence is probably the fastest way to do the join.
If you knew that the length of the numbers you were comparing was always 4, you could use SUBSTRING, like so:
LEFT JOIN SeniorCitizen on FinishedTransaction.SCID = SUBSTRING(SeniorCitizen.OSCAID, 1, 4)
to just grab the first four characters from OSCAID for the comparison.
However, even if you knew the length was always 4, it's still safer to assume that you won't know the length, because maybe at some point in the future the length grows. And if it does, your query can scale with it with no issues. To do this, you can use a combination of SUBSTRING and CHARINDEX, like so:
LEFT JOIN SeniorCitizen on FinishedTransaction.SCID = SUBSTRING(SeniorCitizen.OSCAID, 1, CHARINDEX('/', SeniorCitizen.OSCAID, 0) - 1)
This will start at the first character in OSCAID and stop just before the /. The - 1 is needed because CHARINDEX returns the position of the / itself, which would otherwise be included. So if the string is 1234/102938, it'll return 1234. And if it grows to 123456/102938, it'll return 123456.
Be sure to check out the docs for each of those functions to get a better understanding of their capabilities:
SUBSTRING: https://msdn.microsoft.com/en-us/library/ms187748.aspx
CHARINDEX: https://msdn.microsoft.com/en-us/library/ms186323.aspx
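The variable-length approach can be tried end to end with Python's built-in sqlite3 module. This is only a sketch: SQLite's substr and instr stand in for SQL Server's SUBSTRING and CHARINDEX, SCID is assumed to be an integer column, and the sample rows are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE FinishedTransaction (SCID INTEGER);
CREATE TABLE SeniorCitizen (OSCAID TEXT);
INSERT INTO FinishedTransaction VALUES (1234), (123456), (999);
INSERT INTO SeniorCitizen VALUES ('1234/102938'), ('123456/102938');
""")

# Take everything before the '/' (position of '/' minus 1 characters),
# cast it to an integer, and join on that.
rows = conn.execute("""
SELECT ft.SCID, sc.OSCAID
FROM FinishedTransaction ft
LEFT JOIN SeniorCitizen sc
  ON ft.SCID = CAST(substr(sc.OSCAID, 1, instr(sc.OSCAID, '/') - 1) AS INTEGER)
ORDER BY ft.SCID
""").fetchall()
print(rows)
```

The row with SCID 999 has no counterpart and comes back with a NULL OSCAID, as expected from a LEFT JOIN. Rows without a / are not handled in this sketch.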
You can use the LEFT or SUBSTRING functions to do this.
SUBSTRING(SeniorCitizen.OSCAID, 1, 4)
LEFT(SeniorCitizen.OSCAID, 4)
But keep in mind that wrapping the column in a function can make the join predicate non-sargable.

Parsing and Comparing Full Names on a Join between two tables

I want to compare two strings from two different tables that contain a person's full name in the format "Blow, Joe". One table may have the full name like that, while the other table has the same user as "Blow, Joseph", so I want to grab the first two characters of both the first and last name and see if they match. If they do, I want to update the record. I am not sure what I am doing wrong: I was getting an out of range error, and now I am getting incorrect syntax near 'SUBSTRING', which I am looking into now. Does anyone know of a good way to achieve what I am trying to accomplish?
This is what I currently have:
SELECT *
FROM EmployeeMaster e
JOIN EmployeeDivisions d ON SUBSTRING(REPLACE(RTRIM(LTRIM(LEFT(e.FullName,CHARINDEX(',',e.FullName) - 1))),' ',''),1,3) LIKE SUBSTRING(REPLACE(RTRIM(LTRIM(LEFT(d.Name,CHARINDEX(',',d.Name) - 1))),' ',''),1,3)
SUBSTRING(REPLACE(RTRIM(LTRIM(SUBSTRING(e.FullName,CHARINDEX(',',e.FullName) + 1, LEN(e.FullName)))),' ',''),1,3) LIKE SUBSTRING(REPLACE(RTRIM(LTRIM(SUBSTRING(d.Name,CHARINDEX(',',d.Name) + 1, LEN(d.Name)))),' ',''),1,3)
I guess I don't have to point out that this check might match names that are very different. In your example, Blow, Joseph would match not only Blow, Joe but also Black, John and so on...
Maybe you should at least extend the check to include the complete surname together with part of the given name.
But... if you still want to compare the first two letters in the word before the comma, and the first two letters in the word after the comma then use this:
SELECT *
FROM EmployeeMaster e
JOIN EmployeeDivisions d ON
(
SUBSTRING(REPLACE(RTRIM(LTRIM(LEFT(e.FullName,CHARINDEX(',',e.FullName) - 1))),' ',''),1,2)
=
SUBSTRING(REPLACE(RTRIM(LTRIM(LEFT(d.Name,CHARINDEX(',',d.Name) - 1))),' ',''),1,2)
)
AND
(
SUBSTRING(REPLACE(RTRIM(LTRIM(SUBSTRING(e.FullName,CHARINDEX(',',e.FullName) + 1, LEN(e.FullName)))),' ',''),1,2)
=
SUBSTRING(REPLACE(RTRIM(LTRIM(SUBSTRING(d.Name,CHARINDEX(',',d.Name) + 1, LEN(d.Name)))),' ',''),1,2)
)
You might be able to reduce the complexity of the join to this:
LEFT(LTRIM(e.FullName),CHARINDEX(',',e.FullName)-1)
=
LEFT(LTRIM(d.Name),CHARINDEX(',',d.Name)-1)
AND
SUBSTRING(e.FullName,CHARINDEX(',',e.FullName) + 1, 3)
=
SUBSTRING(d.Name,CHARINDEX(',',d.Name) + 1, 3)
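The false-match risk of comparing only two letters per name part is easy to check outside the database. The Python sketch below mirrors the trim, remove-spaces, take-two-characters logic of the longer join condition; name_key is a made-up helper, and unlike many SQL Server collations the comparison here is case-sensitive.

```python
def name_key(full_name):
    # Split "Last, First" at the first comma, strip spaces from each part,
    # and keep the first two characters of each.
    last, _, first = full_name.partition(",")
    last = last.strip().replace(" ", "")
    first = first.strip().replace(" ", "")
    return (last[:2], first[:2])

keys = {name: name_key(name) for name in ["Blow, Joe", "Blow, Joseph", "Black, John"]}
print(keys)
```

All three names produce the same key, ('Bl', 'Jo'), so a join on this key would pair Black, John with Blow, Joe.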

Using the '?' Parameter in SQL LIKE Statement

I'm accessing a Firebird database through Microsoft Query in Excel.
I have a parameter field in Excel that contains a 4 digit number. One of my DB tables has a column (TP.PHASE_CODE) containing a 9 digit phase code, and I need to return any of those 9 digit codes that start with the 4 digit code specified as a parameter.
For example, if my parameter field contains '8000', I need to find and return any phase code in the other table/column that is LIKE '8000%'.
I am wondering how to accomplish this in SQL since it doesn't seem like the '?' representing the parameter can be included in a LIKE statement. (If I write in the 4 digits, the query works fine, but it won't let me use a parameter there.)
The problematic statement is this one: TP.PHASE_CODE like '?%'
Here is my full code:
SELECT C.COSTS_ID, C.AREA_ID, S.SUB_NUMBER, S.SUB_NAME, TP.PHASE_CODE, TP.PHASE_DESC, TI.ITEM_NUMBER, TI.ITEM_DESC,TI.ORDER_UNIT,
C.UNIT_COST, TI.TLPE_ITEMS_ID FROM TLPE_ITEMS TI
INNER JOIN TLPE_PHASES TP ON TI.TLPE_PHASES_ID = TP.TLPE_PHASES_ID
LEFT OUTER JOIN COSTS C ON C.TLPE_ITEMS_ID = TI.TLPE_ITEMS_ID
LEFT OUTER JOIN AREA A ON C.AREA_ID = A.AREA_ID
LEFT OUTER JOIN SUPPLIER S ON C.SUB_NUMBER = S.SUB_NUMBER
WHERE (C.AREA_ID = 1 OR C.AREA_ID = ?) and S.SUB_NUMBER = ? and TI.ITEM_NUMBER = ? and TP.PHASE_CODE like '?%'
ORDER BY TP.PHASE_CODE
Any ideas on alternate ways of accomplishing this query?
If you use LIKE '?%', then the question mark is literal text, not a parameter placeholder.
You can use LIKE ? || '%', or alternatively if your parameter itself never contains a LIKE-pattern: STARTING WITH ? which might be more efficient if the field you're querying is indexed.
You can do
and TP.PHASE_CODE like ?
but when you pass your parameter 8000 to the SQL, you have to add the % behind it, so in this case, you would pass "8000%" to the SQL.
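Both suggestions can be tried with Python's sqlite3 module, which also uses ? placeholders and the || concatenation operator. This is a sketch against SQLite rather than Firebird or Microsoft Query, and the phase codes are invented sample data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE TLPE_PHASES (PHASE_CODE TEXT)")
conn.executemany(
    "INSERT INTO TLPE_PHASES VALUES (?)",
    [("800012345",), ("800099999",), ("810012345",)],
)

prefix = "8000"
base = "SELECT PHASE_CODE FROM TLPE_PHASES WHERE PHASE_CODE LIKE "

# Option 1: concatenate the wildcard in SQL.
concat_rows = conn.execute(base + "? || '%' ORDER BY PHASE_CODE", (prefix,)).fetchall()

# Option 2: append the wildcard to the value before binding it.
preformatted_rows = conn.execute(base + "? ORDER BY PHASE_CODE", (prefix + "%",)).fetchall()

# Inside quotes the ? is literal text, so nothing is bound and nothing matches.
literal_rows = conn.execute(base + "'?%'").fetchall()

print(concat_rows, preformatted_rows, literal_rows)
```

The first two queries return the same two codes starting with 8000, while the quoted form returns no rows because it looks for codes that literally begin with a question mark.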
Try String Functions: Left?
WHERE (C.AREA_ID = 1 OR Left(C.AREA_ID,4) = "8000")

What does LEFT in SQL do when it is not paired with JOIN and why does it cause my query to time out?

I was given the following statement:
LEFT(f.field4, CASE WHEN PATINDEX('%[^0-9]%',f.field4) = 0 THEN LEN(f.field4) ELSE PATINDEX('%[^0-9]%',f.field4) - 1 END)=#DealNumber
and am having trouble contacting the person who wrote it. Could someone explain what that statement does, and whether it is valid SQL? The goal of the statement is to compare the numeric characters in f.field4 to the DealNumber. DNumber and DealNumber are the same except for a wildcard at the end of DealNumber.
I am trying to use it in the context of the following statement:
SELECT d.Description, d.FileID, d.DateFiled, u.Contact AS UserFiledName, d.Pages, d.Notes
FROM Documents AS d
LEFT JOIN Files AS f ON d.FileID=f.FileID
LEFT JOIN Users AS u ON d.UserFiled=u.UserID
WHERE SUBSTRING(f.Field8, 2, 1) = #LocationIDString
AND f.field4=#DNumber OR LEFT(f.field4, CASE WHEN PATINDEX('%[^0-9]%',f.field4) = 0 THEN LEN(f.field4) ELSE PATINDEX('%[^0-9]%',f.field4) - 1 END)=#DealNumber"
but my code keeps timing out when I execute it.
It's the CASE clause which is slowing things down, not LEFT per se (although LEFT may prevent the use of indexes, which will have an effect).
The CASE determines what should be compared with #DealNumber, and I think it does the following...
If f.field4 contains no non-digit characters (PATINDEX returns 0), use LEFT(f.field4, LEN(f.field4))=#DealNumber: that's equivalent to f.field4=#DealNumber.
Otherwise, use {the digits before the first non-digit character}=#DealNumber; if f.field4 starts with a non-digit, that is an empty string.
This sort of computation isn't very efficient.
I would attempt the following, which makes the large assumption that a mixed string can be cast as an integer — that is, that if you convert ABC to an integer you get zero, and if you convert 123ABC you get what can be converted, 123. I can't find any documentation which says whether that is possible or not.
AND f.field4=#DNumber
OR (f.field4=#DealNumber AND integer(f.field4)=0)
OR (integer(f.field4)=#DealNumber)
The first line is the same as your AND. The second line selects f.field4=#DealNumber only if f.field4 does not start with a number. The third line selects where the initial numeric portion of f.field4 is the same as #DealNumber.
As I say, there is an assumption here that integer() will work in this way. You may need to define a CAST function to do that conversion with strings. That's rather beyond me, although I would be confident that even such a function would be faster than a CASE as you currently have.
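To make the behaviour of the original LEFT / CASE / PATINDEX expression concrete, here is a Python sketch that traces it on a few invented values. It mirrors the logic only and ignores collation details; leading_digits is a hypothetical name.

```python
import re

def leading_digits(field4):
    # PATINDEX('%[^0-9]%', field4): position of the first non-digit, 0 if none.
    first_non_digit = re.search(r"[^0-9]", field4)
    if first_non_digit is None:
        # No non-digit found: LEFT(field4, LEN(field4)) is the whole string.
        return field4
    # Otherwise LEFT(field4, position - 1): the characters before it.
    return field4[: first_non_digit.start()]

results = {value: leading_digits(value) for value in ["12345", "123ABC", "ABC"]}
print(results)
```

An all-digit value is compared whole, a value like 123ABC is reduced to 123, and a value that starts with a letter is reduced to an empty string.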
From the doc:
left(str text, n int)
Return first n characters in the string. When n is negative, return all but last |n| characters.