SQL Un-Wizardry: Compare values from one list in another

I have a comparison I'd like to make more efficient in SQL.
The input field (fldInputField) is a comma-separated list, e.g. "1,3,4,5".
The database has a field (fldRoleList) which contains "1,2,3,4,5,6,7,8".
So, for the first value of fldInputField that occurs within fldRoleList, tell us which value it was.
Is there a way to achieve the following in MySQL or a Stored Procedure?
Pseudo-code:
SELECT *
FROM aTable t1
WHERE fldInputField IN t1.fldRoleList
I'm guessing there might be some functions that are best suited for this type of comparison? I couldn't find anything in the search; if someone can direct me, I'll delete the question... Thanks!
UPDATE: This isn't the ideal (or good) way to do things. It's inherited code and we are simply trying to put in a quick fix while we look at building in the logic to deal with this via normalized rows.. Luckily this isn't heavily used code.

I agree with @Ken White's answer that comma-delimited lists have no place in a normalized database design.
The solution would be simpler and perform better if you stored the fldRoleList as multiple rows in a dependent table:
SELECT t1.*, r1.fldRole
FROM aTable t1 JOIN aTableRoles r1 USING (aTable_id)
WHERE FIND_IN_SET(r1.fldRole, fldInputField);
(see the MySQL function FIND_IN_SET())
But that outputs multiple rows if multiple roles match the comma-separated input string. If you need to restrict the result to one row per aTable entry, with the first matching role:
SELECT t1.*, MIN(r1.fldRole) AS First_fldRole
FROM aTable t1 JOIN aTableRoles r1 USING (aTable_id)
WHERE FIND_IN_SET(r1.fldRole, fldInputField)
GROUP BY t1.aTable_id;
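For reference, FIND_IN_SET() returns the 1-based position of its first argument within the comma-separated second argument, or 0 when it is absent; a quick check with the sample values from the question:
SELECT FIND_IN_SET('3', '1,3,4,5'); -- returns 2
SELECT FIND_IN_SET('9', '1,3,4,5'); -- returns 0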

You have a terrible schema design, you know. Comma-delimited lists have no business in a DB.
That being said... You're looking for LIKE.
SELECT * FROM aTable t1 WHERE t1.fldRoleList LIKE CONCAT(fldInputField, '%')
If the content might not always match at the beginning, add another percent sign before fldInputField (note that MySQL needs CONCAT() here; + is numeric addition):
SELECT * FROM aTable t1 WHERE t1.fldRoleList LIKE CONCAT('%', fldInputField, '%')

Related

How can I create a temporary numbers table with SQL?

So I came upon a question where someone asked for a list of unused account numbers. The query I wrote for it works, but it is kind of hacky and relies on the existence of a table with more records than existing accounts:
WITH tmp AS (
    SELECT ROW_NUMBER() OVER (ORDER BY cusno) AS a
    FROM custtable
    FETCH FIRST 999999 ROWS ONLY
)
SELECT tmp.a
FROM tmp
WHERE a NOT IN (SELECT cusno FROM custtable)
This works because customer numbers are reused and there are significantly more records than unique customer numbers. But, like I said, it feels hacky and I'd like to just generate a temporary table with 1 column and x records that are numbered 1 through x. I looked at some recursive solutions, but all of it looked way more involved than the solution I wound up using. Is there an easier way that doesn't rely on existing tables?
I think the simple answer is no. To be able to make a determination of absence, the platform needs to know the expected data set. You can either generate that as a temporary table or data set at runtime - using the method you've used (or a variation thereof) - or you can create a reference table once, and compare against it each time. I'd favour the latter - a table with a single column of integers won't put much of a dent in your disk space and it doesn't make sense to compute an identical result set over and over again.
Here's a really good article from Aaron Bertrand that deals with this very issue:
https://sqlperformance.com/2013/01/t-sql-queries/generate-a-set-1
(Edit: The queries in that article are TSQL specific, but they should be easily adaptable to DB2 - and the underlying analysis is relevant regardless of platform)
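To make the reference-table suggestion concrete, here is a minimal sketch (the numbers table and gen CTE names are my own; the recursive CTE is just one way to seed the table once):
CREATE TABLE numbers (n INT NOT NULL PRIMARY KEY);

INSERT INTO numbers (n)
WITH gen (n) AS (
    VALUES 1
    UNION ALL
    SELECT n + 1 FROM gen WHERE n < 999999
)
SELECT n FROM gen;

-- every subsequent audit is then a plain anti-join:
SELECT n FROM numbers
WHERE n NOT IN (SELECT cusno FROM custtable);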
If you're searching for all unused account numbers, you can do it like this:
with MaxNumber as
(
    select max(cusno) as MaxID from custtable
),
RecurceNumber (id) as
(
    values 1
    union all
    select id + 1 from RecurceNumber cross join MaxNumber
    where id < MaxID -- "<" rather than "<=", so the sequence stops at MaxID itself
)
select f1.*
from RecurceNumber f1
exception join custtable f2 on f1.id = f2.cusno

Difference between two tables, unknown fields

Is there a way in Access using SQL to get the difference between 2 tables?
I'm building an audit function and I want to return all records from table1 where a value (or values) doesn't match the corresponding record in table2. Primary keys will always match between the two tables. They will always contain the exact same number of fields, field names, and types, as each other. However, the number and name of those fields cannot be determined before the query is run.
Please also note, I am looking for an Access SQL solution. I know how to solve this with VBA.
Thanks,
There are several possibilities to compare fields with known names, but there is no way in SQL to access fields without knowing their names, mostly because SQL doesn't consider fields to have a specific order in a table.
So the only way to accomplish what you need in pure Access SQL would be if there were a SQL command for it (kind of like * as a placeholder for all fields). But there isn't; see the Microsoft Access SQL Reference.
What you COULD do is create the SQL clause on the fly in VBA. (I know, you said you didn't want to do it in VBA - but this is doing it in SQL, just using VBA to build the SQL.)
Doing everything in VBA would probably take some time, but building the SQL on the fly is very fast and you can optimize it for the specific table. Executing the generated SQL is then the fastest solution you can get.
Not sure without your table structure, but you can probably get that done using the NOT IN operator or WHERE NOT EXISTS, like:
select * from table1
where some_field not in (select some_other_field from table2);
or:
select * from table1 t1
where not exists (select 1 from table2 where some_other_field = t1.some_field);
SELECT A.*, B.* FROM A FULL JOIN B ON (A.C = B.C) WHERE A.C IS NULL OR B.C IS NULL;
If you have tables A and B, both with column C, this returns the records that are present in one table but not in the other. To get all the differences with a single query, a full join must be used, like above.
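Note that Access SQL doesn't support FULL JOIN directly; a common workaround, sketched here with the same placeholder names A, B, and C, is a UNION of a LEFT and a RIGHT join:
SELECT A.*, B.*
FROM A LEFT JOIN B ON A.C = B.C
WHERE B.C IS NULL
UNION ALL
SELECT A.*, B.*
FROM A RIGHT JOIN B ON A.C = B.C
WHERE A.C IS NULL;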

How do I optimize a database for superstring queries?

So I have a database table in MySQL that has a column containing a string. Given a target string, I want to find all the rows that have a substring contained in the target, i.e. all the rows for which the target string is a superstring of the column value. At the moment I'm using a query along the lines of:
SELECT * FROM table WHERE 'my superstring' LIKE CONCAT('%', column, '%')
My worry is that this won't scale. I'm currently doing some tests to see if this is a problem but I'm wondering if anyone has any suggestions for an alternative approach. I've had a brief look at MySQL's full-text indexing but that also appears to be geared toward finding a substring in the data, rather than finding out if the data exists in a given string.
You could create a temporary table with a full text index and insert 'my superstring' into it. Then you could use MySQL's full text match syntax in a join query with your permanent table. You'll still be doing a full table scan on your permanent table because you'll be checking for a match against every single row (what you want, right?). But at least 'my superstring' will be indexed so it will likely perform better than what you've got now.
Alternatively, you could consider simply selecting the column from the table and performing the match in a high-level language. Depending on how many rows are in the table, this approach might make more sense. Offloading heavy tasks to a client (such as the web server) can often be a win because it reduces load on the database server.
If your superstrings are URLs, and you want to find substrings in them, it would be useful to know if your substrings can be anchored on the dots.
For instance, you have superstrings :
www.mafia.gov.ru
www.mymafia.gov.ru
www.lobbies.whitehouse.gov
If your rules contain "mafia" and you want the first two to match, then what I'll say doesn't apply.
Else, you can parse your URLs into things like : [ 'www', 'mafia', 'gov', 'ru' ]
Then, it will be much easier to look up each element in your table.
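A rough sketch of that lookup (the url_parts and myTable names, the pattern column, and the VARCHAR size are all assumptions): the application splits the target URL on the dots and inserts the pieces, and an indexed equality join then replaces the unanchored LIKE:
CREATE TEMPORARY TABLE url_parts (part VARCHAR(64), INDEX (part));
INSERT INTO url_parts (part) VALUES ('www'), ('mafia'), ('gov'), ('ru');

SELECT t.*
FROM myTable t
JOIN url_parts p ON p.part = t.pattern;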
Well it appears the answer is that you don't. This type of indexing is generally not available and if you want it within your MySQL database you'll need to create your own extensions to MySQL. The alternative I'm pursuing is to do the indexing in my application.
Thanks to everyone that responded!
I created a search solution using views that needed to be robust enough to grow with the customer's needs. For example:
CREATE TABLE tblMyData
(
    MyId bigint identity(1,1),
    Col01 varchar(50),
    Col02 varchar(50),
    Col03 varchar(50)
)
GO

CREATE VIEW viewMySearchData
AS
SELECT
    MyId,
    ISNULL(Col01,'') + ' ' +
    ISNULL(Col02,'') + ' ' +
    ISNULL(Col03,'') + ' ' AS SearchData
FROM tblMyData
GO

SELECT
    t1.MyId,
    t1.Col01,
    t1.Col02,
    t1.Col03
FROM tblMyData t1
INNER JOIN viewMySearchData t2
    ON t1.MyId = t2.MyId
WHERE t2.SearchData LIKE '%search string%'
If they then decide to add columns to tblMyData and they want those columns to be searched, then modify viewMySearchData by adding the new columns to the "AS SearchData" section.
If they decide that there are too many columns in the search, then just modify viewMySearchData by removing the unwanted columns from the "AS SearchData" section.
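For example, if a hypothetical Col04 were later added to tblMyData, the view would be extended like so:
ALTER VIEW viewMySearchData
AS
SELECT
    MyId,
    ISNULL(Col01,'') + ' ' +
    ISNULL(Col02,'') + ' ' +
    ISNULL(Col03,'') + ' ' +
    ISNULL(Col04,'') AS SearchData
FROM tblMyData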

SQL Server - Query Short-Circuiting?

Do T-SQL queries in SQL Server support short-circuiting?
For instance, I have a situation where I have two databases and I'm comparing data between the two tables to match and copy some info across. In one table, the "ID" field will always have leading zeros (such as "000000001234"), and in the other table, the ID field may or may not have leading zeros (might be "000000001234" or "1234").
So my query to match the two is something like:
select * from table1 where table1.ID LIKE '%1234'
To speed things up, I'm thinking of adding an OR before the like that just says:
table1.ID = table2.ID
to handle the case where both ID's have the padded zeros and are equal.
Will doing so speed up the query by matching items on the "=" and not evaluating the LIKE for every single row (will it short circuit and skip the LIKE)?
SQL Server does NOT short-circuit WHERE conditions.
It can't, since it's a cost-based system: see How SQL Server short-circuits WHERE condition evaluation.
You could add a computed column to the table. Then, index the computed column and use that column in the join.
Ex:
Alter Table Table1 Add PaddedId As Right('000000000000' + Id, 12)
Create Index idx_WhateverIndexNameYouWant On Table1(PaddedId)
Then your query would be...
select * from table1 where table1.PaddedID ='000000001234'
This will use the index you just created to quickly return the row.
You want to make sure that at least one of the tables is using its actual data type for the IDs and that it can use an index seek if possible. Which ID gets converted to the other's type depends on the selectivity of your query and the rate of matches, though. If you know that you have to scan through the entire first table, then you can't use a seek anyway and you should convert that ID to the data type of the other table.
To make sure that you can use indexes, also avoid LIKE. As an example, it's much better to have:
WHERE
T1.ID = CAST(T2.ID AS VARCHAR) OR
T1.ID = RIGHT('0000000000' + CAST(T2.ID AS VARCHAR), 10)
than:
WHERE
T1.ID LIKE '%' + CAST(T2.ID AS VARCHAR)
As Steven A. Lowe mentioned, the second query might be inaccurate as well.
If you are going to be using all of the rows from T1 though (in other words a LEFT OUTER JOIN to T2) then you might be better off with:
WHERE
CAST(T1.ID AS INT) = T2.ID
Do some query plans with each method if you're not sure and see what works best.
The absolute best route to go though is as others have suggested and change the data type of the tables to match if that's at all possible. Even if you can't do it before this project is due, put it on your "to do" list for the near future.
How about:
table1WithZero.ID = REPLICATE('0', 12 - LEN(table2.ID)) + table2.ID
In this case, it should be able to use the index on table1.
Just in case it's useful, as the linked page in Mladen Prajdic's answer explains, CASE clauses are short-circuit evaluated.
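A sketch of using CASE as an explicit guard (the two variables are hypothetical stand-ins for the padded and un-padded ID formats; the cheap equality arm is listed first so the LIKE arm only runs when it fails):
DECLARE @paddedId varchar(12) = '000000001234';
DECLARE @shortId varchar(12) = '1234';

SELECT *
FROM table1 t1
WHERE 1 = CASE
              WHEN t1.ID = @paddedId THEN 1         -- tried first
              WHEN t1.ID LIKE '%' + @shortId THEN 1 -- only evaluated if the equality fails
              ELSE 0
          END;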
If the ID is purely numeric (as in your example), I would recommend (if possible) changing that field to a number type instead. If the database is already in use, it might be hard to change the type, though.
Fix the database to be consistent.
select * from table1 where table1.ID LIKE '%1234'
will match '1234', '01234', '00000000001234', but also '999991234'. Using LIKE with a leading wildcard pretty much guarantees an index scan (assuming table1.ID is indexed!). Cleaning up the data will improve performance significantly.
if cleaning up the data is not possible, write a user-defined function (UDF) to strip off leading zeros, e.g.
select * from table1 where dbo.udfStripLeadingZeros(table1.ID) = '1234'
this may not improve performance (since the function will have to run for each row) but it will eliminate false matches and make the intent of the query more obvious
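A minimal sketch of such a UDF (the body is my assumption; any equivalent zero-stripping expression works):
CREATE FUNCTION dbo.udfStripLeadingZeros (@s varchar(50))
RETURNS varchar(50)
AS
BEGIN
    -- PATINDEX finds the first non-zero character; the appended '.' guarantees
    -- a match even for an all-zero input (which then yields an empty string)
    RETURN SUBSTRING(@s, PATINDEX('%[^0]%', @s + '.'), LEN(@s));
END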
EDIT: Tom H's suggestion to CAST to an integer would be best, if that is possible.

Use a LIKE clause in part of an INNER JOIN

Can/Should I use a LIKE criteria as part of an INNER JOIN when building a stored procedure/query? I'm not sure I'm asking the right thing, so let me explain.
I'm creating a procedure that is going to take a list of keywords to be searched for in a column that contains text. If I was sitting at the console, I'd execute it as such:
SELECT Id, Name, Description
FROM dbo.Card
WHERE Description LIKE '%warrior%'
OR
Description LIKE '%fiend%'
OR
Description LIKE '%damage%'
But a trick I picked up a little while ago to do "strongly typed" list parsing in a stored procedure is to parse the list into a table variable/temporary table, converting it to the proper type and then doing an INNER JOIN against that table in my final result set. This works great when sending, say, a list of integer IDs to the procedure. I wind up having a final query that looks like this:
SELECT Id, Name, Description
FROM dbo.Card
INNER JOIN #tblExclusiveCard ON dbo.Card.Id = #tblExclusiveCard.CardId
I want to use this trick with a list of strings. But since I'm looking for a particular keyword, I am going to use the LIKE clause. So ideally I'm thinking I'd have my final query look like this:
SELECT Id, Name, Description
FROM dbo.Card
INNER JOIN #tblKeyword ON dbo.Card.Description LIKE '%' + #tblKeyword.Value + '%'
Is this possible/recommended?
Is there a better way to do something like this?
The reason I'm putting wildcards on both ends of the clause is because there are "archfiend", "beast-warrior", "direct-damage" and "battle-damage" terms that are used in the card texts.
I'm getting the impression that depending on the performance, I can either use the query I specified or use a full-text keyword search to accomplish the same task?
Other than having the server do a text index on the fields I want to text search, is there anything else I need to do?
Try this:
select * from Table_1 a
left join Table_2 b on b.type LIKE '%' + a.type + '%'
This practice is not ideal. Use with caution.
Your first query will work but will require a full table scan because any index on that column will be ignored. You will also have to do some dynamic SQL to generate all your LIKE clauses.
Try a full-text search if you're using SQL Server, or check out one of the Lucene implementations. Joel talked about his success with it recently.
Try it:
select * from table1 a inner join table2 b on b.id like ('%' + a.id + '%') where a.city = 'abc'
It works for me. :-)
It seems like you are looking for full-text search, because you want to query a set of keywords against the card description and find any hits. Correct?
Personally, I have done it before, and it has worked out well for me. The only issue I could see is with an unindexed column, but I think you would have the same issue with a WHERE clause.
My advice to you is just to look at the execution plans between the two. I'm sure which one is better will differ depending on the situation, just like all good programming problems.
@Dillie-O
How big is this table?
What is the data type of the Description field?
If either is small, a full-text search will be overkill.
@Dillie-O
Maybe not the answer you were looking for, but I would advocate a schema change...
proposed schema:
create table name (
    nameID int identity(1,1) primary key, -- identity (or a plain int key)
    name varchar(50)
)

create table description (
    descID int identity(1,1) primary key,
    [desc] varchar(50) -- something reasonable; to make the most of it, always lower-case your values
)

create table nameDescJunc (
    nameID int,
    descID int
)
This will let you use indexes without having to implement a bolt-on solution, and keeps your data atomic.
related: Recommended SQL database design for tags or tagging
a trick I picked up a little while ago to do "strongly typed" list parsing in a stored procedure is to parse the list into a table variable/temporary table
I think what you might be alluding to here is to put the keywords to include into a table then use relational division to find matches (could also use another table for words to exclude). For a worked example in SQL see Keyword Searches by Joe Celko.
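To sketch that relational-division idea against the tables in this question (the HAVING-count formulation is the standard one, not necessarily Celko's exact query), keep only the cards whose description matches every keyword:
SELECT c.Id, c.Name, c.Description
FROM dbo.Card c
INNER JOIN #tblKeyword k
    ON c.Description LIKE '%' + k.Value + '%'
GROUP BY c.Id, c.Name, c.Description
HAVING COUNT(*) = (SELECT COUNT(*) FROM #tblKeyword); -- assumes keyword values are distinct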
Performance will depend on the actual server that you use, on the schema of the data, and on the amount of data. With current versions of MS SQL Server, that query should run just fine (MS SQL Server 7.0 had issues with that syntax, but it was addressed in SP2).
Have you run that code through a profiler? If the performance is fast enough and the data has the appropriate indexes in place, you should be all set.
LIKE '%fiend%' will never use an index seek; LIKE 'fiend%' will. Simply put, a leading-wildcard search is not sargable.
Try this:
SELECT Id, Name, Description
FROM dbo.Card
INNER JOIN #tblKeyword
    ON dbo.Card.Description LIKE '%' + #tblKeyword.Value + '%'