Join SQL Server tables on a like statement


I am hoping this isn't a repeat. I've checked the searches and I can't seem to find a clear answer to this.
I have a table that has its primary key set to be a UniqueIdentifier. I also have another table that has a varchar column that basically contains a URL with a query string that contains GUIDs from my first table.
So my 2 tables are like:
StateTable
StateID StateName
EB06F84C-15B9-4397-98AD-4A63DA2A238E Active
URLTable
URL
page.aspx?id=EB06F84C-15B9-4397-98AD-4A63DA2A238E
What I'm trying to do is join URLTable and StateTable where the value of StateID is contained in the URL column of URLTable. I haven't really figured out the join. I've even tried just selecting from the one table and filtering by the values in StateTable. I've tried doing something like this:
SELECT *
FROM URLTable
WHERE EXISTS
(SELECT *
FROM StateTable
WHERE URL LIKE '%' + StateID + '%')
Even that doesn't work because it says I'm comparing uniqueidentifier and varchar.
Is there any way to join 2 tables using a like command and where the like command isn't comparing 2 incompatible variables?
Thank you!!
UPDATE: Let me add some additional things I should have mentioned. The query is for the purposes of building analytics reports. The tables are part of a CMS analytics package... so updating or changing the table structure is not an option.
Secondly, these tables see a very high amount of traffic since they're capturing site analytics... so performance is very much an issue. The 3rd thing is that in my example, I said id= but there may be multiple values such as id=guid&user=guid&date=date.
UPDATE 2: One more thing I just realized to my horror is that sometimes the query string has the dashes removed from the GUID, and sometimes not, so unless I'm mistaken, I can't cast the substring to uniqueidentifier. Can anyone confirm? Sigh. I did get it to work using
REPLACE(CONVERT(varchar(50), a.AutomationStateId), '-', '')
but now I'm very much worried about performance issues with this since the URLs table is very large. This might be the nature of the beast, though, unless there's anything I can do.

Cast StateID to a compatible type, e.g.
WHERE URL LIKE '%' + CONVERT(varchar(50), StateID) + '%'
or
WHERE URL LIKE N'%' + CONVERT(nvarchar(50), StateID) + N'%'
if URL is nvarchar(...)
EDIT
As pointed out in another answer, this could result in poor performance on large tables.
The leading wildcard in the LIKE pattern, combined with the CONVERT, will result in a table scan. This may not be a problem for small tables, but you should consider splitting the URL into two columns if performance becomes a problem. One column would contain 'page.aspx?id=' and the other the UNIQUEIDENTIFIER. Your query could then be optimized much more easily.
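To make the dashed/undashed problem from the question concrete, here is a minimal sketch using Python's built-in sqlite3 module as a stand-in for SQL Server. The table and column names come from the question; the extra sample rows are made up. SQLite concatenates with || where T-SQL would use + after a CONVERT(varchar(50), ...), so treat this as an illustration of the join shape rather than a drop-in query.

```python
import sqlite3

# In-memory SQLite stand-in for the two tables described in the question.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE StateTable (StateID TEXT PRIMARY KEY, StateName TEXT);
    CREATE TABLE URLTable (URL TEXT);
    INSERT INTO StateTable VALUES ('EB06F84C-15B9-4397-98AD-4A63DA2A238E', 'Active');
    INSERT INTO URLTable VALUES
        ('page.aspx?id=EB06F84C-15B9-4397-98AD-4A63DA2A238E'),
        ('page.aspx?id=EB06F84C15B9439798AD4A63DA2A238E&user=x'),
        ('page.aspx?id=00000000-0000-0000-0000-000000000000');
""")

# Match either the dashed GUID text or the same text with the dashes removed.
rows = conn.execute("""
    SELECT u.URL, s.StateName
    FROM URLTable u
    JOIN StateTable s
      ON u.URL LIKE '%' || s.StateID || '%'
      OR u.URL LIKE '%' || REPLACE(s.StateID, '-', '') || '%'
    ORDER BY u.URL
""").fetchall()

matched_urls = [url for url, _ in rows]
print(matched_urls)
```

Both patterns still start with a wildcard, so every URL row has to be inspected; this sketch only shows that the join itself is expressible, not that it is fast.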

select u.* from urltable u
join statetable s on u.url like '%' + convert(varchar(50), s.stateid) + '%'
Performance is likely to be awful.

If you know that the = is always there and is always followed by a UNIQUEIDENTIFIER, then you can do this:
WHERE CAST(SUBSTRING(URL, CHARINDEX('=',URL)+1,LEN(URL)) AS UNIQUEIDENTIFIER)=StateID
EDIT
As mentioned in the comments, you can also do it with a JOIN, like this:
select
u.*
from
urltable u
join statetable s
on CAST(SUBSTRING(u.URL, CHARINDEX('=',u.URL)+1,LEN(u.URL)) AS UNIQUEIDENTIFIER)=s.StateID

You may get a performance improvement if you build a temp table first, with the option to index the temp table. You could then also modify the schema (of your temp table) which could give you options on your join. Often when joining to BIG tables it helps to extract a subset of data to a temp table first, then join to it. Other times the overhead of the temp table is bigger than using an 'ugly' join
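One way to picture the temp-table idea: pull the GUIDs out of each URL once, store them in a normalized form in an indexed side table, and join on plain equality. The sketch below does the extraction in Python and uses sqlite3 in place of SQL Server. The UrlGuid table, the IX_UrlGuid_Guid index, the extract_guids helper and the sample URLs are all hypothetical names introduced for illustration; a regular in-memory table plays the role of the temp table.

```python
import re
import sqlite3

# GUID with or without dashes: 32 hex digits, dashes optional between groups.
GUID_RE = re.compile(
    r"[0-9A-Fa-f]{8}-?[0-9A-Fa-f]{4}-?[0-9A-Fa-f]{4}-?[0-9A-Fa-f]{4}-?[0-9A-Fa-f]{12}"
)

def extract_guids(url):
    """Return all GUIDs in a URL as 32 uppercase hex digits (dashes removed)."""
    return [m.replace("-", "").upper() for m in GUID_RE.findall(url)]

STATE_ID = "EB06F84C-15B9-4397-98AD-4A63DA2A238E"
urls = [
    "page.aspx?id=" + STATE_ID,
    "page.aspx?id=11111111-2222-3333-4444-555555555555&user=" + STATE_ID.replace("-", ""),
    "page.aspx?id=11111111-2222-3333-4444-555555555555&date=2012-01-01",
]

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE StateTable (StateID TEXT PRIMARY KEY, StateName TEXT);
    CREATE TABLE UrlGuid (URL TEXT, Guid TEXT);   -- plays the role of the temp table
    CREATE INDEX IX_UrlGuid_Guid ON UrlGuid (Guid);
""")
conn.execute("INSERT INTO StateTable VALUES (?, 'Active')", (STATE_ID,))
# One row per (URL, GUID) pair, so URLs with several GUIDs are fully covered.
conn.executemany(
    "INSERT INTO UrlGuid VALUES (?, ?)",
    [(url, guid) for url in urls for guid in extract_guids(url)],
)

# Equality join on the normalized GUID; no wildcard pattern involved.
matches = conn.execute("""
    SELECT g.URL, s.StateName
    FROM UrlGuid g
    JOIN StateTable s ON g.Guid = UPPER(REPLACE(s.StateID, '-', ''))
    ORDER BY g.URL
""").fetchall()
print(matches)
```

The cost moves to building the side table, which is the overhead the paragraph above warns about; whether that trade is worth it depends on how often the report runs against the same extract.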

Related

SQL Server More Efficient Substring

I am running a query in SQL Server where I need to join two tables into one where the full name field matches the partial name field in another after apostrophes have been removed. For a code example the join is happening like this:
from [Data1]
right join [Data2]
on replace([Data2].[PartialName], '''','')=Substring([Data1].[FullName],1,1+LEN(replace([Data2].[PartialName], '''','')))
And it works. But it takes what would be a 10 second execution if we just used where name=name and makes it take around 20 minutes. This is rather unacceptable in terms of run time so I was wondering if anyone had any more efficient alternatives to consider.
By the way, Data1 has about 800 rows and Data2 has about 1.6 million, if it's relevant.
Edit: I've been told I need to give a bit more descriptive information. Basically in this example Data1 is a table from an outside source that contains a name field [FullName] which contains people's full names in the form of 'Last-Name , First-Name Middle-Name(s)' with any apostrophes removed (for example in the name O'Neil it would just be ONeil).
So an example would be 'ONeil , Sarah Conner'
Data2 contains a name field that has names in the form 'Last-Name , First-Name' Middle names are omitted and apostrophes are intact. So for example 'O'Neil , Sarah'
These tables need to be merged together on their name fields, hence the logic above.

DavidG is right, a PERSISTED column is the way to go here. After drinking a little more coffee, I think you need a computed column and then LIKE in your JOIN. The PERSISTED column's SQL would be something like:
ALTER TABLE [Data2] ADD PartialName_na AS REPLACE(PartialName,'''','') PERSISTED;
You may want to add that to an index. Then your new (pseudo) SQL query would be:
SELECT ...
FROM Data2 D2
LEFT JOIN Data1 D1 ON D1.FullName LIKE D2.PartialName_na + '%';
There's no need to use SUBSTRING. LIKE will maintain SARGability here because it doesn't use a leading wildcard.
Edit: Couple of notes. I used the _na suffix to stand for "No Apostrophe"; you can call the column whatever you want. I also changed the query from a RIGHT JOIN to a LEFT JOIN. Personally I feel that LEFT JOINs are much easier to read, however, if you want to swap it back round, feel free.
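A small runnable sketch of the same idea, using Python's sqlite3 instead of SQL Server: the apostrophe-free name is materialized once into its own column, and the join uses a LIKE pattern with only a trailing wildcard. Here an ordinary column filled by an UPDATE stands in for the PERSISTED computed column, and the 'Smith' rows are invented sample data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Data1 (FullName TEXT);
    CREATE TABLE Data2 (PartialName TEXT, PartialName_na TEXT);
""")
conn.executemany("INSERT INTO Data1 VALUES (?)",
                 [("ONeil , Sarah Conner",), ("Smith , John Q",)])
conn.executemany("INSERT INTO Data2 (PartialName) VALUES (?)",
                 [("O'Neil , Sarah",), ("Smith , Jane",)])

# Stand-in for the PERSISTED computed column: strip apostrophes once, up front.
conn.execute("UPDATE Data2 SET PartialName_na = REPLACE(PartialName, '''', '')")

# Prefix match only: the wildcard is at the end of the pattern, never the start.
rows = conn.execute("""
    SELECT D2.PartialName, D1.FullName
    FROM Data2 D2
    LEFT JOIN Data1 D1 ON D1.FullName LIKE D2.PartialName_na || '%'
    ORDER BY D2.PartialName
""").fetchall()
print(rows)
```

Because it is a LEFT JOIN from Data2, the unmatched 'Smith , Jane' row is kept with a NULL (None) FullName.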

difficulty with an sql query using Oracle database

the question:
Find the title of the books whose keyword contains the last 3 characters of the bookgroup of the book which was booked by "Mr. Karim".
I am trying to do it like this:
SELECT Title
FROM Lib_Book
WHERE BookKeywords LIKE '%(SELECT BookGroup FROM Lib_Book WHERE BookId=(SELECT BookId from Lib_Booking, Lib_Borrower WHERE Lib_Booking.BId=Lib_Borrower.BId AND Lib_Borrower.BName = 'Mr. Karim'))%';
The part from after the % up to the end returns an answer, which is 'programming'. So I need to specify the BookKeywords pattern as '%ing%'. How can I do that?
The tables are huge, so I haven't written them out here. If anyone needs them, please let me know. Thanks.

You have the basic concept down, although it's not possible to process a SELECT statement inside a LIKE clause (and I'd probably shoot the RDBMS developer who allowed that - it's a GAPING HUGE HOLE for SQL Injection). Also, you're likely to have problems with multiple results, as your 'query' would fail the moment Mr. Karim had borrowed more than one book.
I'd probably start by attempting to make things run off of joins (oh, and never use implicit join syntax):
SELECT d.title
FROM Lib_Borrower a
JOIN Lib_Booking b
ON b.bId = a.bId
JOIN Lib_Book c
ON c.bookId = b.bookId
JOIN Lib_Book d
ON d.bookKeywords LIKE '%' || SUBSTR(c.bookGroup, -3) || '%'
WHERE a.bName = 'Mr. Karim'
Please note the following caveats:
This will get you all titles for all books with similar keywords to all borrowed books (from all 'Mr. Karim's). You may need to include some sort of restrictive criteria while joining to Lib_Booking.
The column bookKeywords seems potentially like a multi-value column (would need example data). If so, your table structure needs to be revised.
Applying a substring function to bookGroup will prevent the use of indices when joining on that column. There isn't much you can necessarily do about that, given your requirements...
This table is not internationalization safe (because the bookGroup column is language-dependent parsed text). You may find yourself better served by creating a Book_Group table, a Keywords table, and a cross-reference Book_Keywords table, and joining on numerical ids. You may also want language-keyed Book_Group_Description and Keyword_Description tables. This will take more space, and probably take more processing time (increased number of joins, although potentially less textual processing), but give you increased flexibility and 'safety'.
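"The last 3 characters" is easy to get off by one, so it is worth checking the substring expression on a known value before putting it in the join. The sketch below uses SQLite's substr through Python's sqlite3; Oracle's SUBSTR likewise accepts a negative start position that counts from the end of the string. The value 'programming' is the one mentioned in the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Compare counting from the end with computing a start position from the length.
from_end, from_length = conn.execute(
    "SELECT substr(?, -3), substr(?, length(?) - 3)",
    ("programming", "programming", "programming"),
).fetchone()

# Build the LIKE pattern the question is after.
pattern = "%" + from_end + "%"
print(from_end, from_length, pattern)
```

Counting from the end gives exactly three characters, while starting at length - 3 gives four, since substring positions are 1-based.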

LIKE work-around in SQL (Performance issues)

I've been reading around and found that using LIKE causes a big slowdown in queries.
A workmate recommended we use
SELECT Name
FROM mytable a
WHERE a.Name IN (SELECT Name
FROM mytable
WHERE Name LIKE '%' + ISNULL(@Name, N'') + '%'
GROUP BY Name)
in lieu of
SELECT Name
FROM mytable a
WHERE a.Name LIKE '%' + ISNULL(@Name, N'') + '%'
Now I'm no SQL expert and I don't really understand the inner workings of these statements. Is this a better option worth the effort of typing a few extra characters with each like statement? Is there an even better (and easier to type) alternative?

There are a couple of performance issues to address...
Don't Access the Same Table More Than Once, If Possible
Don't use a subquery for criteria that can be done without the need for referencing additional copies of the same table. It's acceptable if you need data from a copy of the table due to using aggregate functions (MAX, MIN, etc), though analytic functions (ROW_NUMBER, RANK, etc) might be more accommodating (assuming supported).
Don't Compare What You Don't Need To
If your parameter is NULL, and that means that you want any value for the columns you are comparing against, don't include filtration criteria. Statements like these:
WHERE a.Name LIKE '%' + ISNULL(@Name, N'') + '%'
...guarantee the optimizer will have to compare values for the name column, wildcarding or not. Worse still in the case with LIKE is that wildcarding the left side of the evaluation ensures that an index can't be used if one is present on the column being searched.
A better performing approach would be:
IF @Name IS NOT NULL
BEGIN
SELECT ...
FROM ...
WHERE a.name LIKE '%' + @Name + '%'
END
ELSE
BEGIN
SELECT ...
FROM ...
END
Well performing SQL is all about tailoring to exactly what you need. Which is why you should be considering dynamic SQL when you have queries with two or more independent criteria.
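The same "only filter when there is something to filter on" idea, sketched from the application side in Python with sqlite3. The find_names helper and the sample rows are hypothetical; the point is that the WHERE clause is only added when a search term is supplied, and the term itself is always passed as a bound parameter rather than concatenated into the SQL text.

```python
import sqlite3

def find_names(conn, name=None):
    """Return names, applying the LIKE filter only when a search term is given."""
    sql = "SELECT Name FROM mytable"
    params = []
    if name is not None:
        # The value is bound as a parameter; only the clause itself is dynamic.
        sql += " WHERE Name LIKE '%' || ? || '%'"
        params.append(name)
    sql += " ORDER BY Name"
    return [row[0] for row in conn.execute(sql, params)]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mytable (Name TEXT)")
conn.executemany("INSERT INTO mytable VALUES (?)",
                 [("Alice",), ("Bob",), ("Alicia",)])

filtered = find_names(conn, "Ali")
unfiltered = find_names(conn)
print(filtered, unfiltered)
```

With more than one optional criterion, the same pattern extends to collecting clauses in a list and joining them with AND.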
Use The Right Tool
The LIKE operator isn't very efficient at searching text when you're checking for the existence of a string within text data. Full Text Search (FTS) technology was designed to address the shortcomings:
IF @Name IS NOT NULL
BEGIN
SELECT ...
FROM ...
WHERE CONTAINS(a.name, @Name)
END
ELSE
BEGIN
SELECT ...
FROM ...
END
Always Test & Compare
I agree with LittleBobbyTables - the solution ultimately relies on checking the query/execution plan for all the alternatives because table design & data can impact optimizer decision & performance. In SQL Server, the one with the lowest subtreecost is the most efficient, but it can change over time if the table statistics and indexes aren't maintained.

Simply compare the execution plans and you should see the difference.
I don't have your exact data, but I ran the following queries against a SQL Server 2005 database of mine (yes, it's nerdy):
SELECT UnitName
FROM Units
WHERE (UnitName LIKE '%Space Marine%')
SELECT UnitName
FROM Units
WHERE UnitName IN (
(SELECT UnitName FROM Units
WHERE UnitName LIKE '%Space Marine%' GROUP BY UnitName)
)
Here were my execution plan results:
Your co-worker's suggestion adds a nested loop and a second clustered index scan to my query as you can see above. Your mileage may vary, but definitely check the execution plans to see how they compare. I can't imagine how it would be more efficient.

Unless IIQR is some smaller table that indexes the names somehow (and is not the original table being queried here from the start), I don't see how that longer version helps at all; it's doing the exact same thing, but just adding in an extra step of creating a set of results which is then used in an IN.
But I'd be dubious even if IIQR is a smaller 'index' table. I'd want to see more about the database in question and what the query plan ends up being for each.
LIKE can have a negative effect on query performance because it often requires a table scan - physically loading each record's relevant field and searching for the text in question. Even if the field is indexed, this is likely the case. But there may be no way around it, if what you need to do is search for partial text at any possible location inside a field.
Depending on the size of the table in question, though, it may really not matter at all.
For you, though, I would suggest that keeping it simple is best. Unless you really do know what the whole effect of complicating a query would be on performance, it can be hard to decide which way to do things.

How do I optimize a database for superstring queries?

So I have a database table in MySQL that has a column containing a string. Given a target string, I want to find all the rows that have a substring contained in the target, ie all the rows for which the target string is a superstring for the column. At the moment I'm using a query along the lines of:
SELECT * FROM table WHERE 'my superstring' LIKE CONCAT('%', column, '%')
My worry is that this won't scale. I'm currently doing some tests to see if this is a problem but I'm wondering if anyone has any suggestions for an alternative approach. I've had a brief look at MySQL's full-text indexing but that also appears to be geared toward finding a substring in the data, rather than finding out if the data exists in a given string.

You could create a temporary table with a full text index and insert 'my superstring' into it. Then you could use MySQL's full text match syntax in a join query with your permanent table. You'll still be doing a full table scan on your permanent table because you'll be checking for a match against every single row (what you want, right?). But at least 'my superstring' will be indexed so it will likely perform better than what you've got now.
Alternatively, you could consider simply selecting column from table and performing the match in a high level language. Depending on how many rows are in table, this approach might make more sense. Offloading heavy tasks to a client server (web server) can often be a win because it reduces load on the database server.
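Doing the match in application code can be as small as the sketch below, assuming the column values have already been fetched into a list. The rows_contained_in helper and the sample values are hypothetical.

```python
def rows_contained_in(target, values):
    """Return the values that occur as substrings of the target string.

    Empty or missing values are skipped, since an empty string is
    trivially contained in every target.
    """
    return [v for v in values if v and v in target]

column_values = ["super", "string", "my", "other", "", None]
hits = rows_contained_in("my superstring", column_values)
print(hits)
```

This still touches every row, just like the LIKE query, but the work happens outside the database server.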

If your superstrings are URLs, and you want to find substrings in them, it would be useful to know if your substrings can be anchored on the dots.
For instance, you have superstrings :
www.mafia.gov.ru
www.mymafia.gov.ru
www.lobbies.whitehouse.gov
If your rules contain "mafia" and you want the first 2 to match, then what I'll say doesn't apply.
Else, you can parse your URLs into things like : [ 'www', 'mafia', 'gov', 'ru' ]
Then, it will be much easier to look up each element in your table.

Well it appears the answer is that you don't. This type of indexing is generally not available and if you want it within your MySQL database you'll need to create your own extensions to MySQL. The alternative I'm pursuing is to do the indexing in my application.
Thanks to everyone that responded!

I created a search solution using views that needed to be robust enough to grow with the customer's needs. For example:
CREATE TABLE tblMyData
(
MyId bigint identity(1,1),
Col01 varchar(50),
Col02 varchar(50),
Col03 varchar(50)
)
CREATE VIEW viewMySearchData
as
SELECT
MyId,
ISNULL(Col01,'') + ' ' +
ISNULL(Col02,'') + ' ' +
ISNULL(Col03,'') + ' ' AS SearchData
FROM tblMyData
SELECT
t1.MyId,
t1.Col01,
t1.Col02,
t1.Col03
FROM tblMyData t1
INNER JOIN viewMySearchData t2
ON t1.MyId = t2.MyId
WHERE t2.SearchData like '%search string%'
If they then decide to add columns to tblMyData and want those columns to be searched, modify viewMySearchData by adding the new columns to the "AS SearchData" section.
If they decide that there are too many columns in the search, just modify viewMySearchData by removing the unwanted columns from the "AS SearchData" section.

Use a LIKE clause in part of an INNER JOIN

Can/Should I use a LIKE criteria as part of an INNER JOIN when building a stored procedure/query? I'm not sure I'm asking the right thing, so let me explain.
I'm creating a procedure that is going to take a list of keywords to be searched for in a column that contains text. If I was sitting at the console, I'd execute it as such:
SELECT Id, Name, Description
FROM dbo.Card
WHERE Description LIKE '%warrior%'
OR
Description LIKE '%fiend%'
OR
Description LIKE '%damage%'
But a trick I picked up a little while ago to do "strongly typed" list parsing in a stored procedure is to parse the list into a table variable/temporary table, converting it to the proper type and then doing an INNER JOIN against that table in my final result set. This works great when sending, say, a list of integer IDs to the procedure. I wind up having a final query that looks like this:
SELECT Id, Name, Description
FROM dbo.Card
INNER JOIN #tblExclusiveCard ON dbo.Card.Id = #tblExclusiveCard.CardId
I want to use this trick with a list of strings. But since I'm looking for a particular keyword, I am going to use the LIKE clause. So ideally I'm thinking I'd have my final query look like this:
SELECT Id, Name, Description
FROM dbo.Card
INNER JOIN #tblKeyword ON dbo.Card.Description LIKE '%' + #tblKeyword.Value + '%'
Is this possible/recommended?
Is there a better way to do something like this?
The reason I'm putting wildcards on both ends of the clause is because there are "archfiend", "beast-warrior", "direct-damage" and "battle-damage" terms that are used in the card texts.
I'm getting the impression that depending on the performance, I can either use the query I specified or use a full-text keyword search to accomplish the same task?
Other than having the server do a text index on the fields I want to text search, is there anything else I need to do?

Try this
select * from Table_1 a
left join Table_2 b on b.type LIKE '%' + a.type + '%'
This practice is not ideal. Use with caution.
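One thing to watch with a LIKE join against a keyword table is that a row matching several keywords comes back several times. The sketch below shows this with Python's sqlite3 standing in for SQL Server; the Keyword table plays the role of the question's keyword table, and the card rows are invented sample data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Card (Id INTEGER PRIMARY KEY, Name TEXT, Description TEXT);
    CREATE TABLE Keyword (Value TEXT);
    INSERT INTO Card VALUES
        (1, 'A', 'An archfiend rises'),
        (2, 'B', 'Beast-warrior that deals direct-damage'),
        (3, 'C', 'Draw two cards');
    INSERT INTO Keyword VALUES ('warrior'), ('fiend'), ('damage');
""")

# Plain join: one output row per (card, matching keyword) pair.
joined_ids = [row[0] for row in conn.execute("""
    SELECT c.Id
    FROM Card c
    JOIN Keyword k ON c.Description LIKE '%' || k.Value || '%'
    ORDER BY c.Id
""")]

# DISTINCT collapses the duplicates when only the matching cards are wanted.
distinct_ids = [row[0] for row in conn.execute("""
    SELECT DISTINCT c.Id
    FROM Card c
    JOIN Keyword k ON c.Description LIKE '%' || k.Value || '%'
    ORDER BY c.Id
""")]
print(joined_ids, distinct_ids)
```

An EXISTS subquery against the keyword table is another way to get each card once; which performs better is something to check in the execution plan.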

Your first query will work but will require a full table scan because any index on that column will be ignored. You will also have to do some dynamic SQL to generate all your LIKE clauses.
Try a full-text search if you're using SQL Server, or check out one of the Lucene implementations. Joel talked about his success with it recently.

Try it...
select * from table11 a inner join table2 b on b.id like (select '%'+a.id+'%') where a.city='abc'
It works for me. :-)

It seems like you are looking for full-text search. Because you want to query a set of keywords against the card description and find any hits? Correct?

Personally, I have done it before, and it has worked out well for me. The only issue I could see is a possible problem with an unindexed column, but I think you would have the same issue with a WHERE clause.
My advice to you is just look at the execution plans between the two. I'm sure that it will differ which one is better depending on the situation, just like all good programming problems.

@Dillie-O
How big is this table?
What is the data type of the Description field?
If either is small, a full-text search will be overkill.
@Dillie-O
Maybe not the answer you were looking for, but I would advocate a schema change...
proposed schema:
create table name(
nameID int identity(1,1)
,name varchar(50))
create table description(
descID int identity(1,1)
,[desc] varchar(50)) -- something reasonable; to make the most of it, always lowercase your values
create table nameDescJunc(
nameID int
,descID int)
This will let you use indexes without having to implement a bolt-on solution, and keeps your data atomic.
related: Recommended SQL database design for tags or tagging

a trick I picked up a little while ago to do "strongly typed" list parsing in a stored procedure is to parse the list into a table variable/temporary table
I think what you might be alluding to here is to put the keywords to include into a table then use relational division to find matches (could also use another table for words to exclude). For a worked example in SQL see Keyword Searches by Joe Celko.

Performance will depend on the actual server that you use, on the schema of the data, and on the amount of data. With current versions of MS SQL Server, that query should run just fine (MS SQL Server 7.0 had issues with that syntax, but it was addressed in SP2).
Have you run that code through a profiler? If the performance is fast enough and the data has the appropriate indexes in place, you should be all set.

LIKE '%fiend%' will never use a seek; LIKE 'fiend%' will. A search with a leading wildcard is simply not sargable.

Try this;
SELECT Id, Name, Description
FROM dbo.Card
INNER JOIN #tblKeyword ON dbo.Card.Description LIKE '%' +
CONCAT(CONCAT('%',#tblKeyword.Value),'%') + '%'
