How to do an IN on a subquery with a CSV? - sql

I have the following varchar value '1,2,4,5, ...' in a table column.
Basically, they're just codes in CSV format. I know it's a bad design, but that's what I have to deal with unfortunately.
Now I want to create a subquery to find descriptions that correspond to all codes.
How do I do that?
I've tried a subquery like
SELECT description FROM table WHERE table.CODE IN tablewithcsv.csvcolumn
But to no avail.
Normal IN's expect values like ('1', '2', '3'), but I suspect my value gets passed as '1,2,3'. Do I have to do some replacing here?

To search in a string, you can use:
SELECT description FROM `table`
INNER JOIN `tablewithcsv` ON `tablewithcsv`.csvcolumn LIKE CONCAT('%', `table`.CODE, '%')
This will work if the codes are only one character long, but it will return false positives for longer codes (for example, code 1 would match a list containing 12). If codes can be longer than one character, you can change the join as follows:
SELECT description FROM `table`
INNER JOIN `tablewithcsv` ON `tablewithcsv`.csvcolumn = `table`.CODE
OR `tablewithcsv`.csvcolumn LIKE CONCAT(`table`.CODE, ',%')
OR `tablewithcsv`.csvcolumn LIKE CONCAT('%,', `table`.CODE)
OR `tablewithcsv`.csvcolumn LIKE CONCAT('%,', `table`.CODE, ',%')
This could also be done more elegantly using regular expressions
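To see the difference between the two joins, here is a minimal sketch using Python's built-in sqlite3 module. It is only an illustration under assumptions: the table name codes and the sample rows are made up, and SQLite's || operator stands in for the answer's string concatenation.

```python
import sqlite3

# In-memory database with hypothetical sample data (not from the question).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE codes (code TEXT, description TEXT);
    CREATE TABLE tablewithcsv (csvcolumn TEXT);
    INSERT INTO codes VALUES ('1', 'one'), ('2', 'two'), ('12', 'twelve');
    INSERT INTO tablewithcsv VALUES ('2,12');
""")

# Naive substring match: code '1' wrongly matches because '12' contains '1'.
naive = conn.execute("""
    SELECT description FROM codes
    JOIN tablewithcsv ON csvcolumn LIKE '%' || code || '%'
    ORDER BY description
""").fetchall()

# Four-case match: whole value, start of list, end of list, middle of list.
exact = conn.execute("""
    SELECT description FROM codes
    JOIN tablewithcsv
      ON csvcolumn = code
      OR csvcolumn LIKE code || ',%'
      OR csvcolumn LIKE '%,' || code
      OR csvcolumn LIKE '%,' || code || ',%'
    ORDER BY description
""").fetchall()

print(naive)  # includes the false positive 'one'
print(exact)  # only the codes actually present in '2,12'
```

With the list '2,12', the naive join returns three descriptions while the four-case join returns only the two that are really in the list.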

Related

How to optimize Impala query to combine LIKE with IN (literally or effectively)?

I need to try and optimize a query in Impala SQL that does partial string matches on about 60 different strings, against two columns in a database of 50+ billion rows. The values in these two columns are encrypted and have to be decrypted with a user defined function (in Java) to do the partial string match. So query would look something like:
SELECT decrypt_function(column_A), decrypt_function(column_B) FROM myTable WHERE ((decrypt_function(column_A) LIKE '%' + partial_string_1 + '%') OR (decrypt_function(column_B) LIKE '%' + partial_string_1 + '%')) OR ((decrypt_function(column_A) LIKE '%' + partial_string_2 + '%') OR (decrypt_function(column_B) LIKE '%' + partial_string_2 + '%')) OR ... [up to partial_string_60]
What I really want to do is decrypt the two column values I'm comparing with, once for each row and then compare that value with all the partial strings, then go onto the next row etc (for 55 billion rows). Is that possible somehow? Can there be a subquery that assigns the decrypted column value to a variable before using that to do the string comparison to each of the 60 strings? Then go onto the next row...
Or is some other optimization possible? E.g. using 'IN', so ... WHERE (decrypt_function(column_A) IN ('%' + partial_string_1 + '%', '%' + partial_string_2 + '%', ... , '%' + partial_string_60 + '%')) OR (decrypt_function(column_B) IN ('%' + partial_string_1 + '%', '%' + partial_string_2 + '%', ... , '%' + partial_string_60 + '%'))
Thanks
Decrypt the columns in a subquery and filter on the result. Also, regexp_like accepts a pattern with many alternatives joined by OR (|), so you can check all the substrings in a single regular expression, though you may need to split it into several function calls if the pattern string gets too long:
select colA, ColB
from
(--decrypt in the subquery
SELECT decrypt_function(column_A) as colA, decrypt_function(column_B) as ColB
FROM myTable
) as s
where
--put most frequent substrings first in the regexp
regexp_like(ColA,'partial_string_1|partial_string_2|partial_string_3') --add more
OR
regexp_like(ColB,'partial_string_1|partial_string_2|partial_string_3')
In Hive use this syntax:
where ColA rlike 'partial_string_1|partial_string_2|partial_string_3'
OR ColB rlike 'partial_string_1|partial_string_2|partial_string_3'
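The same idea (decrypt each value once, then test it against one combined pattern) can be sketched outside SQL. This is a rough Python illustration, not Impala code: decrypt below is a hypothetical stand-in that just reverses the string, and the helper names are made up. It also escapes each substring, which matters if any of the partial strings contain regex metacharacters such as a dot.

```python
import re

def build_alternation(substrings):
    # Escape each substring so regex metacharacters are matched literally,
    # then join the alternatives with '|'.
    return "|".join(re.escape(s) for s in substrings)

def decrypt(value):
    # Hypothetical stand-in for decrypt_function: reverses the string.
    return value[::-1]

def matching_rows(rows, substrings):
    # Compile the combined pattern once, outside the row loop.
    pattern = re.compile(build_alternation(substrings))
    for enc_a, enc_b in rows:
        # Decrypt each column exactly once per row.
        a, b = decrypt(enc_a), decrypt(enc_b)
        if pattern.search(a) or pattern.search(b):
            yield a, b

rows = [("olleh", "dlrow"), ("oof", "rab")]
print(list(matching_rows(rows, ["ell", "a.b"])))
```

Only the first row is returned here: "hello" contains "ell", while "bar" does not contain the literal text "a.b".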

Search for specific string inside column field

In my table, the values of a specific column are stored in three different ways, as shown below. A field holds one, two, or three items, separated by commas when there is more than one value: a minimum of one value and a maximum of three. To be clear, I know it's a bad approach (it wasn't done by me), but I have to work with it and can only change this one query. Example showing the three ways of storing values:
MaterialAttributes (column name)
----------------------------------
1,12,32
3,1
9
I have a specific SQL query for checking whether some value exists within the field. It is universal and covers all three ways:
somevalue1
or:
somevalue1,somevalue2
or:
somevalue1,somevalue2,somevalue3
So, for instance, if I search that column in every row of the table for records where somevalue2 appears, this query gives me the correct result.
This is the query:
";WITH spacesdeleted (vater, matatt) as (SELECT vater, REPLACE(MaterialAttributes, ' ', '') MaterialAttributes FROM myTable),
matattrfiltered (vat) as (SELECT vater FROM spacesdeleted WHERE matatt = @matAttrId
OR matatt LIKE @matAttrId +',%'
OR matatt LIKE '%,'+@matAttrId
OR matatt LIKE '%,'+@matAttrId+',%' ),
dictinctVaters (disc_vats) as (SELECT distinct(vat) FROM matattrfiltered)
SELECT ID from T_Artikel WHERE Vater IN (SELECT disc_vats FROM dictinctVaters)"
Note: as a precaution, any spaces next to the commas are removed (just information from another developer).
What is the question:
The problem now is that the logic has changed: instead of a maximum of 3 values, up to 12 can be stored in that column.
OR matatt LIKE @matAttrId +',%'
OR matatt LIKE @matAttrId +',%'
These are the same; should one be '%,' + @matAttrId?
Regardless, I think there are only 4 cases you need:
= @matAttrId
LIKE @matAttrId + ',%'
LIKE '%,' + @matAttrId
LIKE '%,' + @matAttrId + ',%'
Covering
single value, equals
value is start of list
value is end of list
value is in middle of list
Which is what your original query already has.
You can search with
WHERE ',' + matatt + ',' LIKE '%,' + @matAttrId + ',%'
It works like this: matatt is extended to look like this
,1,12,32,
,3,1,
,9,
Now you can always search for an id in the form ,id, by using the search pattern %,id,%, where id is the real id.
This works for any number of values per column.
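A minimal sketch of the delimiter-wrapping trick, run against SQLite from Python. The sample rows and the function name vaters_with_attribute are assumptions for illustration; SQLite uses || for concatenation where the answer uses +, and the space removal from the original query is kept.

```python
import sqlite3

# Hypothetical sample data modeled on the question's column.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE myTable (vater INTEGER, MaterialAttributes TEXT);
    INSERT INTO myTable VALUES
        (1, '1,12,32'), (2, '3, 1'), (3, '9'), (4, '11,21');
""")

def vaters_with_attribute(attr_id):
    # Wrap the stored list in commas so every id appears as ',id,',
    # then search for ',id,' anywhere in the wrapped string.
    rows = conn.execute("""
        SELECT vater FROM myTable
        WHERE ',' || REPLACE(MaterialAttributes, ' ', '') || ','
              LIKE '%,' || ? || ',%'
        ORDER BY vater
    """, (str(attr_id),)).fetchall()
    return [r[0] for r in rows]

print(vaters_with_attribute(1))   # rows 1 and 2, but not row 4 ('11,21')
print(vaters_with_attribute(12))  # row 1 only
```

Because the pattern no longer depends on where the id sits in the list, the same single condition covers 3 values, 12 values, or more.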

Like Query in SQL taking time

So I've looked around to try to find some posts on this, and there are many (Like Query 1 and Like Query 2), but none that address my specific question (that I could find).
I have two tables with around 5000000+ records each, and I return search results from these tables as follows:
SELECT A.ContactFirstName, A.ContactLastName
FROM Customer.CustomerDetails AS A WITH (nolock)
WHERE (A.ContactFirstName + ' ' + A.ContactLastName LIKE '%' + 'a' + '%')
UNION
SELECT C.ContactFirstName, C.ContactLastName
FROM Customer.Contacts AS C WITH (nolock)
WHERE (C.ContactFirstName + ' ' + C.ContactLastName LIKE '%' + 'a' + '%')
My problem is it is taking around 1 minute to execute.
For above query I am expecting result like :
Please suggest me the best practice to improve performance. Thanks in advance.
NOTE : No missing Indexes.
When you use LIKE '%xxx%', indexes are not used, which I think is why your query is slow. When you use LIKE 'xxx%', an index is used (if one exists on the column, of course). Another problem is that you apply LIKE to a concatenated column; I don't know whether an index is used in that case. Also, why do 'xxx' + ' ' + 'yyy' LIKE 'z%'? Just do 'xxx' LIKE 'z%'; it's the same. You can try modifying your query like this:
SELECT A.ContactFirstName, A.ContactLastName
FROM Customer.CustomerDetails AS A WITH (nolock)
WHERE A.ContactFirstName LIKE '%a%' or A.ContactLastName LIKE '%a%'
UNION
SELECT C.ContactFirstName, C.ContactLastName
FROM Customer.Contacts AS C WITH (nolock)
WHERE C.ContactFirstName LIKE 'a%'
Use CHARINDEX, which may improve the performance of the search. It returns the position of the first occurrence of the search string and does not look for any further matches.
DECLARE @Search VARCHAR(10)='a'
SELECT A.ContactFirstName, A.ContactLastName
FROM Customer.CustomerDetails AS A WITH (NOLOCK)
-- CHARINDEX returns 0 when there is no match, so > 0 also keeps matches at position 1
WHERE CHARINDEX(@Search,(A.ContactFirstName + ' ' + A.ContactLastName),0)>0

performing sql select against a full name using wildcards

I have a stored procedure to which I pass a general string variable called @SearchText as a varchar. This variable contains names, either part of a name or a full name. I need to do a select on a table based on this variable, using wildcards. The inbound variable could be anything like (for the name 'john smith'):
'j', 'joh', 'john', 'sm', 'smith', 'john s', 'john smith'... you get the point.
So, the blunt approach I took is
select x from TableA
where FirstName like '%' + @SearchText + '%'
OR LastName like '%' + @SearchText + '%'
Obviously when a space is encountered it screws up the result set. Can someone please help me understand how to tweak this so it can match on any "amount" of the full name?
If this has already been answered, I couldn't find it... a hotlink to an existing solution would be just as appreciated here.
I might suggest something like this:
where FirstName + ' ' + LastName like '%' + replace(@SearchText, ' ', '%') + '%' or
LastName + ' ' + FirstName like '%' + replace(@SearchText, ' ', '%') + '%'
However, if you are trying to do such full text searches, you might consider using a full text index. That generally provides the right level of functionality for these types of queries.
You can do this as follows:
select x from TableA
where FirstName+' '+LastName like '%' + @SearchText + '%'
Basically, you concatenate the first and last name first and apply the LIKE operator on the concatenation.
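Here is a small sketch of the concatenate-then-LIKE approach, including the variant that turns spaces in the search text into % wildcards and also tries the "last first" order. It runs on SQLite via Python, so || replaces +, and the sample names and the function search_names are assumptions. Note that SQLite's LIKE is case-insensitive for ASCII by default, and that any % or _ typed by the user will act as a wildcard.

```python
import sqlite3

# Hypothetical sample data.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE TableA (FirstName TEXT, LastName TEXT);
    INSERT INTO TableA VALUES
        ('John', 'Smith'), ('Jane', 'Jones'), ('Sam', 'Johnson');
""")

def search_names(search_text):
    # Replace spaces with '%' so 'john s' becomes '%john%s%'.
    pattern = "%" + search_text.replace(" ", "%") + "%"
    return conn.execute("""
        SELECT FirstName, LastName FROM TableA
        WHERE FirstName || ' ' || LastName LIKE ?
           OR LastName || ' ' || FirstName LIKE ?
        ORDER BY FirstName
    """, (pattern, pattern)).fetchall()

print(search_names("smith john"))  # matches via the "last first" order
print(search_names("john s"))      # also matches 'Sam Johnson'
```

The second call shows how loose the wildcard version is: 'john s' matches 'Sam Johnson' because "Johnson" contains "john" followed by an "s". If that is too permissive, a full-text index is the more suitable tool.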

how to compare string in SQL without using LIKE

I am using the SQL query shown below to compare AMCcode. But if I compare against AMCcode '1' using the LIKE operator, it matches every entry whose AMCcode contains 1: 1, 10, 11, 12, 13, ..., 19, 21, 31, etc. I want to match only AMCcode 1. Please suggest how I can do it. The code is given below:
ISNULL(CONVERT(VARCHAR(10),PM.PA_AMCCode), '') like
(
CASE
WHEN @AMCCode IS NULL THEN '%'
ELSE '%'+@AMCCode+ '%'
END
)
This is the part of the code where I need to replace the LIKE operator with another operator that returns only AMCcode 1 when I search for AMCcode 1, not 10, 11, 12, and so on. Please help.
I think you are looking for something like this:
where ',' + cast(PM.PA_AMCCode as varchar(255)) + ',' like '%,' + @AMCCodes + ',%'
This includes the delimiters in the comparison.
Note that a better method is to split the string and use a join, something like this:
select t.*
from t cross apply
(select cast(code as int) as code
from dbo.split(@AMCCodes, ',') s(code)
) s
where t.AMCCode = s.code;
This is better because under some circumstances, this version can make use of an index on AMCCode.
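When the list of codes arrives from application code, the split can also happen before the query is built. The following is only a sketch under assumptions: it uses SQLite through Python, the table t and its rows are made up, and rows_for_codes is a hypothetical helper. It splits the comma-separated string into integers and compares with equality via a parameterized IN list, so '1' can never match 10 or 21.

```python
import sqlite3

# Hypothetical sample data.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE t (AMCCode INTEGER, name TEXT);
    INSERT INTO t VALUES (1, 'a'), (10, 'b'), (11, 'c'), (21, 'd');
""")

def rows_for_codes(csv_codes):
    # Split the comma-separated string and convert each piece to an int.
    codes = [int(part) for part in csv_codes.split(",") if part.strip()]
    if not codes:
        return []
    # One '?' placeholder per code keeps the query parameterized.
    placeholders = ",".join("?" for _ in codes)
    sql = (
        "SELECT AMCCode, name FROM t "
        f"WHERE AMCCode IN ({placeholders}) ORDER BY AMCCode"
    )
    return conn.execute(sql, codes).fetchall()

print(rows_for_codes("1"))     # only the row with AMCCode 1
print(rows_for_codes("1,21"))  # rows with AMCCode 1 and 21
```

As with the join on the split result, the comparison is on the integer column itself, which leaves the database free to use an index on AMCCode where one exists.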
If you want to exactly match the value, you don't need to use '%' in your query. You can just use like as below
ISNULL(CONVERT(VARCHAR(10),PM.PA_AMCCode), '') like
(
CASE
WHEN @AMCCode IS NULL THEN '%'
ELSE @AMCCode
END
)
You could possibly remove the CASE expression and query like this:
ISNULL(CONVERT(VARCHAR(10),PM.PA_AMCCode), '') like @AMCCode