SQL Server More Efficient Substring - sql

I am running a query in SQL Server where I need to join two tables into one where the full name field matches the partial name field in another after apostrophes have been removed. For a code example the join is happening like this:
from [Data1]
right join [Data2]
on replace([Data2].[PartialName], '''','')=Substring([Data1].[FullName],1,1+LEN(replace([Data2].[PartialName], '''','')))
And it works. But it takes what would be a 10 second execution if we just used where name=name and makes it take around 20 minutes. This is rather unacceptable in terms of run time so I was wondering if anyone had any more efficient alternatives to consider.
Btw Data 1 has about 800 lines and Data2 has about 1.6 million if it's relevant.
Edit: I've been told I need to give a bit more descriptive information. Basically in this example Data1 is a table from an outside source that contains a name field [FullName] which contains people's full names in the form of 'Last-Name , First-Name Middle-Name(s)' with any apostrophes removed (for example in the name O'Neil it would just be ONeil).
So an example would be 'ONeil , Sarah Conner'
Data2 contains a name field that has names in the form 'Last-Name , First-Name' Middle names are omitted and apostrophes are intact. So for example 'O'Neil , Sarah'
These tables need to be merged together on their name fields, hence the logic above.

DavidG is right, a PERSISTED column is the way to go here. After drinking a little more coffee, I think you need a computed column and then LIKE in your JOIN. The PERSISTED column's SQL would be something like:
ALTER TABLE [Data2] ADD PartialName_na AS REPLACE(PartialName,'''','') PERSISTED;
You may want to add that to an index. Then your new (pseudo) SQL query would be:
SELECT ...
FROM Data2 D2
LEFT JOIN Data1 D1 ON D1.FullName = D2.PartialName_na + '%';
There's no need to use SUBSTRING. LIKE will maintain SARGability here, it doesn't use a leading wildcard.
Edit: Couple of notes. I used the _na suffix to stand for "No Apostrophe"; you can call the column whatever you want. I also changed the query from a RIGHT JOIN to a LEFT JOIN. Personally I feel that LEFT JOINs are much easier to read, however, if you want to swap it back round, feel free.

Related

cannot implement this inner join in oracle

I have recently started to learn oracle, and I am having difficulty understanding this inner join on the tables.
INSERT INTO temp_bill_pay_ft
SELECT DISTINCT
ft.ft_id,
ft.ft_credit_acct_no,
ft.ft_debit_acct_no,
ft.ft_stmt_nos,
ft.ft_debit_their_ref,
ft.ft_date_time
FROM
funds_transfer_his ft
INNER JOIN temp_bill_pay_lwday_pl dt
ON ft.ft_id = dt.ac_ste_trans_reference || ';1'
AND ft.ft_credit_acct_no = dt.ac_id;
It is this line specifically which I dont understand, why do we use || here, I suppose it is for concatenation.
ON ft.ft_id = dt.ac_ste_trans_reference||';1'
Can somebody please explain to me this sql query. I would really appreciate it. Thank you.
This is string concatenation. The need is because there is a design error in the database and join keys are not the same in the two tables. So the data might look something like this:
ft_id ac_ste_trans_reference
123;1 123
abc;1 abc
In order for the join to work, the keys need to match. One possibility is to remove the last two characters from ft_id, but I'm guessing those are meaningful.
I can speculate on why this is so. One possibility is that ft_id is really a compound key combined into a single column -- and the 1 is used to indicate the "type" of key. If so, then there are possibly other values after this:
ft_id
123;1
garbled;2
special;3
The "2" and "3" would refer to other reference tables.
If this is the situation, then it would be cleaner to have a separate column with the correct ac_ste_trans_reference. However that occupies additional space, and can require multiple additional columns for each type. So hacks like the one you see are sometimes implemented.
Yes it is used for concatenation.
But only somebody having worked on this database model can explain what table data represent and why this concatenation is needed for this joining condition.

is there a way to search within multiple tables in SQL

Right now i have 100 tables in SQL and i am looking for a specific string value in all tables, and i do not know which column it is in.
select * from table1, table2 where column1 = 'MyLostString' will not work because i do not know which column it has to be in.
Is there a SQL query for that, must i brute force search every table for every column for that 'MyLostString'
If I were to brute-force search across all tables, is there an efficient query for that?
For instance:
select * from table3 where allcolumns = MyLostString
It is the defining feature of a RDBMS (or at least one of them), that the meaning of a value depends on the column it is in. E.g.: The value 17 will have quite different meanings, if it stands in a customer_id column, than in the product_id of a fictional orders table.
This leads to the fact, that RDBMS are not well equipped to search for a value, no matter in which column of which tables it might be used.
My recommendation is to first study the data model to try and find out, which column of which table should be holding the value. If this really fails, you have a problem much worse than a "lost string".
The last ressort is to transform the DB into something better suited for fulltext search ... such as a flat file. You might want to try mydbexportcommand --options | grep -C10 'My lost string' or friends.

difficulty with an sql query using Oracle database

the question:
Find the title of the books whose keyword contains the last 3 characters of the bookgroup of the book which was booked by "Mr. Karim".
I am trying to do it like this:
SELECT Title
FROM Lib_Book
WHERE BookKeywords LIKE '%(SELECT BookGroup FROM Lib_Book WHERE BookId=(SELECT BookId from Lib_Booking, Lib_Borrower WHERE Lib_Booking.BId=Lib_Borrower.BId AND Lib_Borrower.BName = 'Mr. Karim'))%';
from the part after % upto the end returns me an answer which is 'programming'. so i need to indicate the BookKeyword as '%ing%'. How can i do that?
**the tables are huge so i hvnt written those here..if anyone need to those plz lemme know...thnx
You have the basic concept down, although it's not possible to process a SELECT statement inside a LIKE clause (and I'd probably shoot the RDBMS developer who allowed that - it's a GAPING HUGE HOLE for SQL Injection). Also, you're likely to have problems with multiple results, as your 'query' would fail the moment Mr. Karim had borrowed more than one book.
I'd probably start by attempting to make things run off of joins (oh, and never use implicit join syntax):
SELECT d.title
FROM Lib_Borrower as a
JOIN Lib_Booking as b
ON b.bId = a.bId
JOIN Lib_Book as c
ON c.bookId = b.bookId
JOIN Lib_Book as d
ON d.bookKeywords LIKE '%' + SUBSTRING(c.bookGroup, LENGTH(c.bookGroup) - 3) + '%'
WHERE a.bName = 'Mr. Karim'
Please note the following caveats:
This will get you all titles for all books with similar keywords to all borrowed books (from all 'Mr. Karim's). You may need to include some sort of restrictive criteria while joining to Lib_Booking.
The column bookKeywords seems potentially like a multi-value column (would need example data). If so, your table structure needs to be revised.
The use of SUBSTRING() or RIGHT() will invalidate the use of inidicies in joining to the bookGroup column. There isn't much you can necessarily do about that, given your requirements...
This table is not internationalization safe (because the bookGroup column is language-dependant parsed text). You may find yourself better served by creating a Book_Group table, a Keywords table, and a cross-reference Book_Keywords table, and joining on numerical ids. You may also want language-keyed Book_Group_Description and Keyword_Description tables. This will take more space, and probably take more processing time (increased number of joins, although potentially less textual processing), but give you increased flexibility and 'safety'.

Is there a way to rename a similarly named column from two tables when performing a join?

I have two tables that I am joining with the following query...
select *
from Partners p
inner join OrganizationMembers om on p.ParID = om.OrganizationId
where om.EmailAddress = 'my_email#address.com'
and om.deleted = 0
Which works great but some of the columns from Partners I want to be replaced with similarly named columns from OrganizationMembers. The number of columns I want to replace in the joined table are very few, shouldn't be more than 3.
It is possible to get the result I want by selectively choosing the columns I want in the resulting join like so...
select om.MemberID,
p.ParID,
p.Levelz,
p.encryptedSecureToken,
p.PartnerGroupName,
om.EmailAddress,
om.FirstName,
om.LastName
from Partners p
inner join OrganizationMembers om on p.ParID = om.OrganizationId
where om.EmailAddress = 'my_email#address.com'
and om.deleted = 0
But this creates a very long sequence of select p.a, p.b, p.c, p.d, ... etc ... which I am trying to avoid.
In summary I am trying to get several columns from the Partners table and up to 3 columns from the OrganizationMembers table without having a long column specification sequence at the beginning of the query. Is it possible or am I just dreaming?
select om.MemberID as mem
Use th AS keyword. This is called aliasing.
You are dreaming in your implementation.
Also, as a best practice, select * is something that is typically frowned upon by DBA's.
If you want to limit the results or change anything you must explicitly name the results, as a potential "stop gap you could do something like this.
SELECT p.*, om.MemberId, etc..
But this ONLY works if you want ALL columns from the first table, and then selected items.
Try this:
p.*,
om.EmailAddress,
om.FirstName,
om.LastName
You should never use * though. Always specifying the columns you actually need makes it easier to find out what happens.
But this creates a very long sequence
of select p.a, p.b, p.c, p.d, ... etc
... which I am trying to avoid.
Don't avoid it. Embrace it!
There are lots of reasons why it's best practice to explicity list the desired columns.
It's easier to do searches for where a particular column is being used.
The behavior of the query is more obvious to someone who is trying to maintain it.
Adding a column to the table won't automatically change the behavior of your query.
Removing a column from the table will break your query earlier, making bugs appear closer to the source, and easier to find and fix.
And anything that uses the query is going to have to list all the columns anyway, so there's no point being lazy about it!

How do I optimize a database for superstring queries?

So I have a database table in MySQL that has a column containing a string. Given a target string, I want to find all the rows that have a substring contained in the target, ie all the rows for which the target string is a superstring for the column. At the moment I'm using a query along the lines of:
SELECT * FROM table WHERE 'my superstring' LIKE CONCAT('%', column, '%')
My worry is that this won't scale. I'm currently doing some tests to see if this is a problem but I'm wondering if anyone has any suggestions for an alternative approach. I've had a brief look at MySQL's full-text indexing but that also appears to be geared toward finding a substring in the data, rather than finding out if the data exists in a given string.
You could create a temporary table with a full text index and insert 'my superstring' into it. Then you could use MySQL's full text match syntax in a join query with your permanent table. You'll still be doing a full table scan on your permanent table because you'll be checking for a match against every single row (what you want, right?). But at least 'my superstring' will be indexed so it will likely perform better than what you've got now.
Alternatively, you could consider simply selecting column from table and performing the match in a high level language. Depending on how many rows are in table, this approach might make more sense. Offloading heavy tasks to a client server (web server) can often be a win because it reduces load on the database server.
If your superstrings are URLs, and you want to find substrings in them, it would be useful to know if your substrings can be anchored on the dots.
For instance, you have superstrings :
www.mafia.gov.ru
www.mymafia.gov.ru
www.lobbies.whitehouse.gov
If your rules contain "mafia' and you want the first 2 to match, then what I'll say doesn't apply.
Else, you can parse your URLs into things like : [ 'www', 'mafia', 'gov', 'ru' ]
Then, it will be much easier to look up each element in your table.
Well it appears the answer is that you don't. This type of indexing is generally not available and if you want it within your MySQL database you'll need to create your own extensions to MySQL. The alternative I'm pursuing is to do the indexing in my application.
Thanks to everyone that responded!
I created a search solution using views that needed to be robust enought to grow with the customers needs. For Example:
CREATE TABLE tblMyData
(
MyId bigint identity(1,1),
Col01 varchar(50),
Col02 varchar(50),
Col03 varchar(50)
)
CREATE VIEW viewMySearchData
as
SELECT
MyId,
ISNULL(Col01,'') + ' ' +
ISNULL(Col02,'') + ' ' +
ISNULL(Col03,'') + ' ' AS SearchData
FROM tblMyData
SELECT
t1.MyId,
t1.Col01,
t1.Col02,
t1.Col03
FROM tblMyData t1
INNER JOIN viewMySearchData t2
ON t1.MyId = t2.MyId
WHERE t2.SearchData like '%search string%'
If they then decide to add columns to tblMyData and they want those columns to be searched then modify viewMysearchData by adding the new colums to "AS SearchData" section.
If they decide that there are two many columns in the search then just modify the viewMySearchData by removing the unwanted columns from the "AS SearchData" section.