Advanced SQL query to identify missing data

I am currently dealing with a SQL Server table 'suburb' which has a suburb_id column and an adjacent_suburb_ids column. The adjacent_suburb_ids column is a comma-separated string of other suburb_ids.
I have found that some of the records are not reciprocal: e.g. "SuburbA" has "SuburbB"'s id in adjacent_suburb_ids, but "SuburbB" does not have "SuburbA"'s id in its adjacent_suburb_ids.
I need to identify all the suburbs whose adjacent suburbs do not reciprocate. Can I do this with a SQL query?
Please do not comment on the data/table structure as it is not in my control and I can't change it.

Assuming I'm understanding your question correctly, you can join the table to itself using the LIKE and NOT LIKE operators:
select s.suburb_id, s2.suburb_id as s2id
from suburb s
join suburb s2 on
    s.suburb_id <> s2.suburb_id
    -- s2 lists s as adjacent...
    and ',' + s2.adjacent_suburb_ids + ',' like
        '%,' + cast(s.suburb_id as varchar(10)) + ',%'
    -- ...but s does not list s2 (isnull so that a NULL list is still reported)
    and ',' + isnull(s.adjacent_suburb_ids, '') + ',' not like
        '%,' + cast(s2.suburb_id as varchar(10)) + ',%'
You need to concatenate a comma before and after both the list and the id so that you only match whole ids. Without the commas, searching for 1 would also match 12. And yes, if you had the chance, you should consider normalizing the data.
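You can check the self-join logic without a SQL Server instance. Below is a minimal Python sketch that runs the same query against an in-memory SQLite database. It assumes SQLite's || concatenation in place of T-SQL's + and coalesce in place of isnull. The sample rows are made up: suburb 3 fails to list 1 back, and suburb 1 fails to list 12 back.

```python
import sqlite3
from typing import List, Optional, Tuple


def non_reciprocal_pairs(rows: List[Tuple[int, Optional[str]]]) -> List[Tuple[int, int]]:
    """Return (s, s2) pairs where s2 lists s as adjacent but s does not list s2."""
    con = sqlite3.connect(":memory:")
    con.execute("create table suburb (suburb_id integer, adjacent_suburb_ids text)")
    con.executemany("insert into suburb values (?, ?)", rows)
    query = """
        select s.suburb_id, s2.suburb_id as s2id
        from suburb s
        join suburb s2
          on s.suburb_id <> s2.suburb_id
         -- s2 lists s: wrap in commas so 1 does not match inside 12
         and ',' || s2.adjacent_suburb_ids || ',' like '%,' || cast(s.suburb_id as text) || ',%'
         -- ...but s does not list s2 (a NULL list is treated as empty)
         and ',' || coalesce(s.adjacent_suburb_ids, '') || ',' not like '%,' || cast(s2.suburb_id as text) || ',%'
        order by s.suburb_id, s2.suburb_id
    """
    result = con.execute(query).fetchall()
    con.close()
    return result


rows = [
    (1, "2,3"),  # 1 lists 2 and 3
    (2, "1"),    # 2 lists 1 back: reciprocal
    (3, ""),     # 3 does not list 1
    (12, "1"),   # 12 lists 1, but 1 does not list 12
]
print(non_reciprocal_pairs(rows))
```

Each pair reads as "s2 lists s, but s does not list s2", so (3, 1) means suburb 1 lists 3 while 3 does not list 1.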

Related

How to optimize Impala query to combine LIKE with IN (literally or effectively)?

I need to try and optimize a query in Impala SQL that does partial string matches on about 60 different strings, against two columns in a database of 50+ billion rows. The values in these two columns are encrypted and have to be decrypted with a user defined function (in Java) to do the partial string match. So the query would look something like:
SELECT decrypt_function(column_A), decrypt_function(column_B)
FROM myTable
WHERE ((decrypt_function(column_A) LIKE '%' + partial_string_1 + '%') OR (decrypt_function(column_B) LIKE '%' + partial_string_1 + '%'))
OR ((decrypt_function(column_A) LIKE '%' + partial_string_2 + '%') OR (decrypt_function(column_B) LIKE '%' + partial_string_2 + '%'))
OR ... [up to partial_string_60]
What I really want to do is decrypt the two column values I'm comparing once per row, compare those values with all the partial strings, and then move on to the next row, and so on (for 55 billion rows). Is that possible somehow? Can there be a subquery that assigns the decrypted column value to a variable before using it to do the string comparison against each of the 60 strings? Then move on to the next row...
Or is some other optimization possible? E.g. using 'IN', so ... WHERE (decrypt_function(column_A) IN ('%' + partial_string_1 + '%', '%' + partial_string_2 + '%', ... , '%' + partial_string_60 + '%')) OR (decrypt_function(column_B) IN ('%' + partial_string_1 + '%', '%' + partial_string_2 + '%', ... , '%' + partial_string_60 + '%'))
Thanks
Use a subquery to decrypt the columns, and use regexp_like. It accepts many patterns combined with the alternation operator (|), so you can check all alternatives in a single regular expression. You may need to split it into several function calls if the pattern string gets too long:
select colA, colB
from
(
    -- decrypt in the subquery
    SELECT decrypt_function(column_A) as colA, decrypt_function(column_B) as colB
    FROM myTable
) as s
where
    -- put the most frequent substrings first in the regexp
    regexp_like(colA, 'partial_string_1|partial_string_2|partial_string_3') -- add more
    OR
    regexp_like(colB, 'partial_string_1|partial_string_2|partial_string_3')
In Hive use this syntax:
where ColA rlike 'partial_string_1|partial_string_2|partial_string_3'
OR ColB rlike 'partial_string_1|partial_string_2|partial_string_3'
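The "decrypt once, test all patterns in one regex" idea can be sketched outside the database. The Python below is an illustration only. decrypt is a hypothetical stand-in passed in as a parameter, and Python's re module takes the place of regexp_like/rlike. It also shows why partial strings that contain regex metacharacters (such as '.') should be escaped before they are joined with |.

```python
import re
from typing import Callable, Iterable, Iterator, List, Tuple


def build_pattern(partials: List[str]) -> "re.Pattern[str]":
    # One alternation (a|b|c) replaces 60 separate LIKE '%...%' tests.
    # re.escape makes each partial string match literally.
    return re.compile("|".join(re.escape(p) for p in partials))


def filter_rows(
    rows: Iterable[Tuple[str, str]],
    decrypt: Callable[[str], str],
    partials: List[str],
) -> Iterator[Tuple[str, str]]:
    pattern = build_pattern(partials)
    for enc_a, enc_b in rows:
        # Decrypt each column exactly once per row...
        col_a, col_b = decrypt(enc_a), decrypt(enc_b)
        # ...then test the decrypted values against all patterns at once.
        if pattern.search(col_a) or pattern.search(col_b):
            yield col_a, col_b


calls: List[str] = []


def fake_decrypt(value: str) -> str:
    # Hypothetical stand-in for decrypt_function: it just reverses the text
    # and records each call so the number of decryptions can be counted.
    calls.append(value)
    return value[::-1]


rows = [("olleh", "dlrow"), ("oof", "rab")]
print(list(filter_rows(rows, fake_decrypt, ["ell", "x.y"])))
```

Keeping the decryption in one place and the patterns in a single expression is what lets each row be decrypted a fixed number of times, regardless of how many partial strings there are.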

Filter rows by whether a text column contains any words in a string in SQL

My SQL Server database table has a column named text which is a long string of text.
The search list is a string of words separated by commas. I want to grab those rows where the text column contains any one of the words in the string.
DECLARE @words_to_search nvarchar(50)
SET @words_to_search = 'apple, pear, orange'
SELECT *
FROM myTbl
WHERE text ??? --how to specify text contains @words_to_search
Thanks a lot in advance.
If you're running SQL Server 2016 or later, you can use STRING_SPLIT to convert the words to search into a single column table, and then JOIN that to your table using LIKE:
DECLARE @words_to_search nvarchar(50)
SET @words_to_search = 'apple,pear,orange'
SELECT *
FROM myTbl
JOIN STRING_SPLIT(@words_to_search, ',') ON text LIKE '%' + value + '%';
Note that as the query is written it will (for example) match apple within Snapple. You can work around that by making the JOIN condition a bit more complex:
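SELECT *
FROM myTbl t
JOIN STRING_SPLIT(@words_to_search, ',') v
    ON t.text LIKE '%[^A-Za-z]' + v.value + '[^A-Za-z]%' -- word in the middle
    OR t.text LIKE v.value + '[^A-Za-z]%'                -- word at the start
    OR t.text LIKE '%[^A-Za-z]' + v.value                -- word at the end
    OR t.text = v.value;                                 -- text is just the word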
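The boundary conditions can be sketched in Python: a word only counts if the characters on either side of it are not letters. This is an illustration with Python's re module, not T-SQL. The lookarounds play the role of the [^A-Za-z] classes, and the case-insensitive flag assumes a case-insensitive collation. The lookarounds also succeed at the start and end of the text, so a text that is exactly one of the words matches too.

```python
import re
from typing import List


def matching_words(text: str, words_csv: str) -> List[str]:
    found = []
    # Split on ',' only, like STRING_SPLIT(@words_to_search, ',').
    for word in words_csv.split(","):
        # The character before and after the word must not be a letter,
        # or must not exist at all (start/end of text). One regex covers
        # the middle, start, end and whole-text cases.
        pattern = r"(?<![A-Za-z])" + re.escape(word) + r"(?![A-Za-z])"
        # IGNORECASE mimics a case-insensitive collation (an assumption).
        if re.search(pattern, text, flags=re.IGNORECASE):
            found.append(word)
    return found


print(matching_words("I only drink Snapple", "apple,pear,orange"))
print(matching_words("Apple pie, then a pear.", "apple,pear,orange"))
print(matching_words("orange", "apple,pear,orange"))
```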
First, I would use EXISTS (so that each row is returned at most once), unless you want to return the matching word.
Second, you can do this with a single LIKE comparison if words are separated by spaces:
select t.*
from t
where exists (select 1
              from string_split(@words_to_search, ',') s
              where ' ' + t.text + ' ' like '% ' + value + ' %'
             );
For more generic separators, you can use:
select t.*
from t
where exists (select 1
              from string_split(@words_to_search, ',') s
              where ' ' + t.text + ' ' like '%[^A-Za-z]' + value + '[^A-Za-z]%'
             );
Or whatever describes your separators.
Note that your list of words is separated by a comma-space, not just a comma. However, based on your description (not the sample data), I have only used a ',' for the separator.
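The padding trick and the comma-space issue can be checked with a small Python sketch (an illustration, not T-SQL). Text and word are wrapped in spaces as in the first EXISTS query. Splitting 'apple, pear, orange' on ',' alone leaves a leading space on every word after the first. Those words never match unless they are trimmed; in T-SQL that would mean trimming value, for example with ltrim.

```python
from typing import List


def split_words(words_csv: str, trim: bool) -> List[str]:
    parts = words_csv.split(",")  # split on ',' only
    # trim=True strips the space that follows each comma in 'apple, pear, orange'
    return [p.strip() for p in parts] if trim else parts


def contains_word(text: str, word: str) -> bool:
    # Pad the text with a space on both sides so the first and last words are
    # also surrounded by spaces; one containment test then replaces the
    # separate start / middle / end checks.
    return (" " + word + " ") in (" " + text + " ")


def rows_with_any_word(texts: List[str], words_csv: str, trim: bool = True) -> List[str]:
    words = split_words(words_csv, trim)
    # Each text appears at most once in the result, similar to EXISTS.
    return [t for t in texts if any(contains_word(t, w) for w in words)]


texts = ["pear tart", "an apple a day", "orangeade", "no fruit here"]
print(rows_with_any_word(texts, "apple, pear, orange"))
print(rows_with_any_word(texts, "apple, pear, orange", trim=False))
```

Without trimming, only "apple" (the first word, which has no leading space) can ever match.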

Search for specific string inside column field

In my table, the values of a specific column are stored in three different ways, as shown below. There can be one, two or three items, separated by commas (if there is more than one value). The minimum is one value and the maximum is 3 values separated by commas. To be clear, I know it's a bad approach (it was not done by me), but I have to work with it and I can only change this query. Example showing the three ways of storing values:
MaterialAttributes (column name)
----------------------------------
1,12,32
3,1
9
I have a specific SQL query that checks whether a value exists within the field. It is universal and handles all three ways:
somevalue1
or:
somevalue1,somevalue2
or:
somevalue1,somevalue2,somevalue3
For instance, if I search the entire table for rows where somevalue2 appears in that column, this query gives me the correct result.
This is the query:
";WITH spacesdeleted (vater, matatt) as (SELECT vater, REPLACE(MaterialAttributes, ' ', '') MaterialAttributes FROM myTable),
matattrfiltered (vat) as (SELECT vater FROM spacesdeleted WHERE matatt = @matAttrId
OR matatt LIKE @matAttrId +',%'
OR matatt LIKE '%,'+@matAttrId
OR matatt LIKE '%,'+@matAttrId+',%' ),
dictinctVaters (disc_vats) as (SELECT distinct(vat) FROM matattrfiltered)
SELECT ID from T_Artikel WHERE Vater IN (SELECT disc_vats FROM dictinctVaters)"
Note: as a precaution, any spaces next to the commas are removed first (this is just information from the other developer).
Here is the question:
The problem now is that the logic has changed: instead of a maximum of 3 values, up to 12 values can now be stored in that column.
OR matatt LIKE @matAttrId +',%'
OR matatt LIKE @matAttrId +',%'
These two lines are the same; should one of them be '%,' + @matAttrId?
Regardless, I think there are only 4 cases you need:
= @matAttrId
LIKE @matAttrId + ',%'
LIKE '%,' + @matAttrId
LIKE '%,' + @matAttrId + ',%'
Covering
single value, equals
value is start of list
value is end of list
value is in middle of list
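Which is what your original query already has. These four cases do not depend on how many values the list holds, so the query works unchanged with up to 12 values.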
You can also search with a single condition:
WHERE ',' + matatt + ',' LIKE '%,' + @matAttrId + ',%'
It works like this: matatt is wrapped in commas, so the values look like this:
,1,12,32,
,3,1,
,9,
Now you can always search for an id in the form ,id, by using the search pattern %,id,%, where id is the real id.
This works for any number of values per column.
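To see why the comma wrapping is safe for any number of values, here is a small Python sketch of the same test. It is an illustration, not the SQL itself. The sample rows, including one with 12 values and spaces after the commas, are made up.

```python
from typing import List


def has_id(csv_value: str, wanted: str) -> bool:
    # Remove spaces first, as the original CTE does with REPLACE(..., ' ', '').
    # Then wrap both the list and the id in commas: ',1,12,32,' contains ',1,'
    # but ',12,32,' does not, so 1 never matches inside 12.
    wrapped = "," + csv_value.replace(" ", "") + ","
    return ("," + wanted + ",") in wrapped


def naive_contains(csv_value: str, wanted: str) -> bool:
    # Plain substring search, i.e. LIKE '%' + id + '%' without the commas.
    return wanted in csv_value


column: List[str] = ["1,12,32", "3,1", "9", "2, 4, 6, 8, 10, 11, 12, 14, 16, 18, 20, 22"]
print([v for v in column if has_id(v, "1")])
print([v for v in column if naive_contains(v, "1")])
print([v for v in column if has_id(v, "12")])
```

The naive version wrongly picks up the 12-value row when searching for 1 (because of 10, 11, 12 and so on). The wrapped version does not.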

Concatenate fields with different data types

I'm trying to concatenate fields with VarChar and Date into one VarChar field for a database of tour dates. So, imagine a table of, say, Madonna tour dates such as:
TourName        TourStDt    TourEndDt
Like A Virgin   3/5/1985    12/1/1985
Material Girl   1/15/1986   10/10/1987
I'm sure the dates aren't accurate, but whatever... Anyway, I want to create a query that concatenates the TourName, TourStDt and TourEndDt fields so that the result looks like this:
Like A Virgin (3/5/1985 to 12/1/1985)
Material Girl (1/15/1986 to 10/10/1987)
I wrote a query like this:
Select DISTINCT
TourID,
TourName + '(' + TourStDt + ' to ' + TourEndDt + ')' AS TourName2
from [tblTours]
ORDER BY [TourID] ASC
When I do this, I get an error:
The data types nvarchar and date are incompatible in the add operator.
Can anyone tell me how to produce the results I outlined above?
You need to convert the values to strings:
Select DISTINCT TourID,
    -- ' (' adds the space shown in the desired output before the parenthesis
    TourName + ' (' + convert(varchar(255), TourStDt) + ' to ' + convert(varchar(255), TourEndDt) + ')' AS TourName2
from [tblTours]
ORDER BY [TourID] ASC ;
I would suggest that you add a third argument to convert() specifying the date format (for example, style 101 gives mm/dd/yyyy).
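The exact format in the question is month/day/year without leading zeros. It can be sketched in Python for comparison. This is only an illustration: it shows that each date must be turned into text explicitly before concatenation, and that the sample output needs a space before the parenthesis. SQL Server's CONVERT style 101 pads month and day with leading zeros (03/05/1985), so it does not produce exactly 3/5/1985. The helpers us_date and tour_label are hypothetical names.

```python
from datetime import date


def us_date(d: date) -> str:
    # month/day/year without leading zeros, e.g. 3/5/1985
    return f"{d.month}/{d.day}/{d.year}"


def tour_label(name: str, start: date, end: date) -> str:
    # Convert each date to text explicitly before concatenating;
    # name + start would raise TypeError, much like the SQL Server error.
    return name + " (" + us_date(start) + " to " + us_date(end) + ")"


tours = [
    ("Like A Virgin", date(1985, 3, 5), date(1985, 12, 1)),
    ("Material Girl", date(1986, 1, 15), date(1987, 10, 10)),
]
for name, start, end in tours:
    print(tour_label(name, start, end))
```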

Remove a substring in a varchar field from multiple rows of a table

I would like advice on the best way to remove a certain substring from a varchar field in every row of a table.
Let's assume I have a single-column table; the column is named "user_list" and it is a varchar field that contains user names separated by ";".
For example:
row1: james;david;moses
row2: mary;moses;terry
row3: ronaldo;messi;zlatan
The lists are not sorted in any way.
I want to create a stored procedure (SP) that takes a username and removes it from every row it appears in.
For instance, if the data is the example above and I get 'moses' as input,
I would like it to look like:
row1: james;david;
row2: mary;terry
row3: ronaldo;messi;zlatan
I want it to be a single UPDATE statement rather than a cursor, and I'm wondering (and now asking you) what the best way to do it is.
Thanks!
You have a very poor data structure. SQL has this great structure for storing lists of things. It is called a "table". In particular, you want a junction table instead of storing values as lists.
That said, you cannot always control how data is structured. The following should help:
update yourtable
-- replace ';name;' with a single ';' (not '') so the neighbouring names stay separated
set usernames = replace(';' + usernames + ';', ';' + @UserName + ';', ';')
where ';' + usernames + ';' like '%;' + @UserName + ';%';
This will put a semicolon at the beginning and the end of the list. If that is a problem, you can remove them using left() or stuff().
EDIT:
To remove the ; at the beginning, use stuff():
update yourtable
set usernames = stuff(replace(';' + usernames + ';', ';' + @UserName + ';', ';'), 1, 1, '')
where ';' + usernames + ';' like '%;' + @UserName + ';%';
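The delimiter-wrapping idea behind this UPDATE can be sketched in Python to check the edge cases. This is an illustration only; remove_user is a hypothetical helper, not part of either answer. Replacing the match with a single ';' keeps neighbouring names apart. Matching ';name;' rather than a bare name leaves a longer name such as 'ramoses' untouched.

```python
from typing import List


def remove_user(user_list: str, name: str) -> str:
    # Wrap the list in ';' so every name, including the first and last,
    # is surrounded by delimiters: 'a;b' -> ';a;b;'
    wrapped = ";" + user_list + ";"
    target = ";" + name + ";"
    # Replace ';name;' with a single ';' (not with '') so the neighbouring
    # names stay separated. Loop because str.replace skips overlapping
    # matches, e.g. two consecutive copies of the same name.
    while target in wrapped:
        wrapped = wrapped.replace(target, ";")
    # Drop the helper ';' added at each end (what stuff()/left() would do).
    return wrapped[1:-1]


rows: List[str] = ["james;david;moses", "mary;moses;terry", "ronaldo;messi;zlatan", "ramoses;james"]
print([remove_user(r, "moses") for r in rows])
```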
Okay, so I took what Gordon suggested, and to resolve the problem I encountered (it can be seen in the comments) I did the following (how didn't I think of it in the first place? :( )
update matan_test
SET usernames = replace(replace(usernames, @UserName + ';', ''), @UserName, '')
where usernames like '%' + @UserName + '%';