SQL Search string from a column in another column

This may have been asked before, but I am not sure how to search for it.
I want to find out whether the string in Column2 is part of Column1, or is not used in Column1 at all.
Column1 | Column2
=======================
ABCDE + JKL | XC
XC - PQ | A
XYZ + A | C
AC + PQ | MA
So the values of Column2 that are never used in Column1 would be:
C
MA

The description of the problem talks about the string in column2. You can do this with some variation of LIKE. In most databases, something like:
select t.*
from t
where t.column1 not like '%' || t.column2 || '%';
Some databases spell || as + or even concat(), but the idea is the same.
However, I'm not sure what the sample data is doing. In no case is the string in column2 in column1.
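The row-by-row check can be sketched against the question's sample data with Python's built-in sqlite3 module. This assumes SQLite syntax, where || concatenates strings, and a table named t as in the answer. Each Column2 value is compared only with the Column1 value on the same row, so all four sample rows are returned. That is the mismatch pointed out above.

```python
import sqlite3

# In-memory SQLite database holding the sample data from the question.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (column1 TEXT, column2 TEXT)")
conn.executemany(
    "INSERT INTO t VALUES (?, ?)",
    [("ABCDE + JKL", "XC"), ("XC - PQ", "A"), ("XYZ + A", "C"), ("AC + PQ", "MA")],
)

# Row-wise check: is column2 absent from column1 of the *same* row?
rows = conn.execute(
    "SELECT column2 FROM t "
    "WHERE column1 NOT LIKE '%' || column2 || '%' "
    "ORDER BY rowid"
).fetchall()
not_contained = [r[0] for r in rows]
print(not_contained)  # ['XC', 'A', 'C', 'MA']
```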

This looks like another regular-expression task in a database with no regex support.
Assuming your expressions contain only letters, separated by non-letter characters such as spaces, + or -, you can write the following query:
CREATE TABLE Expressions
(
Column1 varchar(20),
Column2 varchar(20)
)
INSERT Expressions VALUES
('ABCDE + JKL', 'XC'),
('XC - PQ', 'A'),
('XYZ + A', 'C'),
('AC + PQ', 'MA'),
('A+CF', 'ZZ'),
('BB+ZZ+CF', 'YY')
SELECT E1.Column2
FROM Expressions E1
WHERE NOT EXISTS (
SELECT *
FROM Expressions E2
WHERE E1.Column2=E2.Column1 --Exact match
OR PATINDEX(E1.Column2+'[^A-Z]%', E2.Column1) <> 0 --Starts with
OR PATINDEX('%[^A-Z]'+E1.Column2, E2.Column1) <> 0 --Ends with
OR PATINDEX('%[^A-Z]'+E1.Column2+'[^A-Z]%', E2.Column1) <> 0 --In the middle
)
It returns:
Column2
-------
C
MA
YY
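The same whole-token logic can be tried outside SQL Server in SQLite, via Python's sqlite3 module. In this sketch, SQLite's GLOB operator stands in for PATINDEX. GLOB uses * for "any run of characters" and supports [^A-Z] for "one non-letter". Like the T-SQL version, the sketch assumes the tokens consist of letters only. GLOB is case-sensitive, so it also assumes uppercase data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Expressions (Column1 TEXT, Column2 TEXT)")
conn.executemany(
    "INSERT INTO Expressions VALUES (?, ?)",
    [
        ("ABCDE + JKL", "XC"),
        ("XC - PQ", "A"),
        ("XYZ + A", "C"),
        ("AC + PQ", "MA"),
        ("A+CF", "ZZ"),
        ("BB+ZZ+CF", "YY"),
    ],
)

# GLOB plays the role of PATINDEX: '*' is any run of characters and
# '[^A-Z]' is one non-letter, so each branch tests for a whole token.
query = """
SELECT E1.Column2
FROM Expressions E1
WHERE NOT EXISTS (
    SELECT 1
    FROM Expressions E2
    WHERE E2.Column1 = E1.Column2                              -- exact match
       OR E2.Column1 GLOB E1.Column2 || '[^A-Z]*'              -- starts with
       OR E2.Column1 GLOB '*[^A-Z]' || E1.Column2              -- ends with
       OR E2.Column1 GLOB '*[^A-Z]' || E1.Column2 || '[^A-Z]*' -- in the middle
)
ORDER BY E1.rowid
"""
unused = [row[0] for row in conn.execute(query)]
print(unused)  # ['C', 'MA', 'YY']
```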


Condition check to get output

I have two tables, name and match. Both tables have a column called type.
The columns and data in the name table:
ID| type |
--| ---- |
1| 1ABC |
2| 2DEF |
3| 3DEF |
4| 4IJK |
The columns and data in the match table:
type | DATA |
---- | ---- |
NOT %ABC% AND NOT %DEF% | NOT ABC AND NOT DEF |
%DEF% | DEF ONLY |
NOT %DEF% AND NOT %IJK% | NOT DEF AND NOT IJK |
I have tried using a CASE statement. The first three characters of type in the match table are NOT whenever the condition is negated.
The query below gives me a "missing keyword" error, and I am not sure what I am missing:
SELECT s.id, s.type, m.data
where case when substr(m.type1,3)='NOT' then s.type not in (REPLACE(REPLACE(m.type,'NOT',''),'AND',','))
ELSE s.type in (m.type) end
from source s, match m;
I need to match each type in the source table and display the DATA from the match table.
The output should be:
ID|type|DATA
1 |1ABC|NOT DEF AND NOT IJK
2 |2DEF|DEF ONLY
3 |3DEF|DEF ONLY
4 |4IJK|NOT ABC AND NOT DEF
The biggest problem with your attempted query seems to be that SQL requires the WHERE clause to come after the FROM clause.
But your query is flawed in other ways as well. Although it can have complicated logic within it, including subqueries, a CASE expression must ultimately return a single value, not a condition. Conditions within it are not applied as if they were in the WHERE clause of the main query (as you appear to be trying to do).
My recommendation would be not to store the match table the way you currently do. It would be much better to have a structure that stores each condition you want to evaluate as a separate row. Assuming that's not possible, I suggest a CTE (or even a view) that breaks it down that way first.
This query (based on Nefreo's answer for breaking strings into multiple rows)...
SELECT
data,
regexp_count(m.type, ' AND ') + 1 num,
CASE WHEN REGEXP_SUBSTR(m.type,'(.*?)( AND |$)',1,levels.column_value) like 'NOT %' THEN 1 ELSE 0 END negate,
replace(replace(REGEXP_SUBSTR(m.type,'(.*?)( AND |$)',1,levels.column_value), 'NOT '), ' AND ') match
FROM match m INNER JOIN
table(cast(multiset(select level from dual connect by level <= regexp_count(m.type, ' AND ') + 1) as sys.OdciNumberList)) levels
ON 1=1
... breaks your match table into something more like:
DATA | NUM | NEGATE | MATCH
---- | --- | ------ | -----
NOT ABC AND NOT DEF | 2 | 1 | %ABC%
NOT ABC AND NOT DEF | 2 | 1 | %DEF%
DEF ONLY | 1 | 0 | %DEF%
NOT DEF AND NOT IJK | 2 | 1 | %DEF%
NOT DEF AND NOT IJK | 2 | 1 | %IJK%
So we now know each specific like condition, whether it should be negated, and the number of conditions that need to be matched for each MATCH row. (For simplicity, I am using match.data as essentially a key for this since it is unique for each row in match and is what we want to return anyway, but if you were actually storing the data this way you'd probably use a sequence of some sort and not repeat the human-readable text.)
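The breakdown step can also be sketched in plain Python, which may make the idea easier to follow. This is an illustration only, not a replacement for the Oracle CTE. The function name split_conditions is hypothetical. Like the query, it assumes that conditions are joined only by " AND " and negated only by a leading "NOT ".

```python
def split_conditions(match_type, data):
    """Break a combined condition such as 'NOT %ABC% AND NOT %DEF%'
    into one (data, num, negate, pattern) tuple per AND-ed condition."""
    parts = match_type.split(" AND ")
    rows = []
    for part in parts:
        negate = 1 if part.startswith("NOT ") else 0
        # Drop the leading 'NOT ' so only the LIKE pattern remains.
        pattern = part[len("NOT "):] if negate else part
        rows.append((data, len(parts), negate, pattern))
    return rows


for row in split_conditions("NOT %ABC% AND NOT %DEF%", "NOT ABC AND NOT DEF"):
    print(row)
# ('NOT ABC AND NOT DEF', 2, 1, '%ABC%')
# ('NOT ABC AND NOT DEF', 2, 1, '%DEF%')
```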
That way, your final query can be quite simple:
SELECT name.id, name.type, criteria.data
FROM name INNER JOIN criteria
ON
(criteria.negate = 0 AND name.type LIKE criteria.match)
OR
(criteria.negate = 1 AND name.type NOT LIKE criteria.match)
GROUP BY name.id, name.type, criteria.data
HAVING COUNT(*) = MAX(criteria.num)
ORDER BY name.id
The conditions in the ON do the appropriate LIKE or NOT LIKE (matches one condition from the CRITERIA view/CTE), and the condition in the HAVING makes sure we had the correct number of total matches to return the row (makes sure we matched all the conditions in one row of the MATCH table).
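The join-and-count idea can be run end to end in SQLite through Python's sqlite3 module. This sketch makes two assumptions. First, the CRITERIA rows are hard-coded instead of being produced by the Oracle REGEXP_SUBSTR CTE. Second, the MATCH column is renamed pattern, because MATCH is a keyword in SQLite.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE name (id INTEGER, type TEXT)")
conn.executemany(
    "INSERT INTO name VALUES (?, ?)",
    [(1, "1ABC"), (2, "2DEF"), (3, "3DEF"), (4, "4IJK")],
)

# Hard-coded stand-in for the CRITERIA CTE: one row per LIKE condition.
conn.execute("CREATE TABLE criteria (data TEXT, num INTEGER, negate INTEGER, pattern TEXT)")
conn.executemany(
    "INSERT INTO criteria VALUES (?, ?, ?, ?)",
    [
        ("NOT ABC AND NOT DEF", 2, 1, "%ABC%"),
        ("NOT ABC AND NOT DEF", 2, 1, "%DEF%"),
        ("DEF ONLY", 1, 0, "%DEF%"),
        ("NOT DEF AND NOT IJK", 2, 1, "%DEF%"),
        ("NOT DEF AND NOT IJK", 2, 1, "%IJK%"),
    ],
)

# Each joined row is one satisfied condition; HAVING keeps only the
# (name, data) pairs where every condition of that MATCH row held.
result = conn.execute("""
    SELECT name.id, name.type, criteria.data
    FROM name INNER JOIN criteria
      ON (criteria.negate = 0 AND name.type LIKE criteria.pattern)
      OR (criteria.negate = 1 AND name.type NOT LIKE criteria.pattern)
    GROUP BY name.id, name.type, criteria.data
    HAVING COUNT(*) = MAX(criteria.num)
    ORDER BY name.id
""").fetchall()

for row in result:
    print(row)
```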
You can see the entire thing...
WITH criteria AS
(
SELECT
data,
regexp_count(m.type, ' AND ') + 1 num,
CASE WHEN REGEXP_SUBSTR(m.type,'(.*?)( AND |$)',1,levels.column_value) like 'NOT %' THEN 1 ELSE 0 END negate,
replace(replace(REGEXP_SUBSTR(m.type,'(.*?)( AND |$)',1,levels.column_value), 'NOT '), ' AND ') match
FROM match m INNER JOIN
table(cast(multiset(select level from dual connect by level <= regexp_count(m.type, ' AND ') + 1) as sys.OdciNumberList)) levels
ON 1=1
)
SELECT name.id, name.type, criteria.data
FROM name INNER JOIN criteria
ON
(criteria.negate = 0 AND name.type LIKE criteria.match)
OR
(criteria.negate = 1 AND name.type NOT LIKE criteria.match)
GROUP BY name.id, name.type, criteria.data
HAVING COUNT(*) = MAX(criteria.num)
ORDER BY name.id
... working in this fiddle.
As a one-off, I don't think this is significantly different from the other answer already provided, but I wanted to write it up because I think it is probably more maintainable if the complexity of your conditions changes.
It already handles arbitrary numbers of conditions, mixes of NOT and not-NOT within the same row of MATCH, and allows for the % signs (for the like) to be placed arbitrarily (e.g. startswith%, %endswith, %contains%, start%somewhere%end, exactmatch should all work as expected). If in the future you want to add different types of conditions or handle ORs, I think the general ideas here will apply.
Without knowing the other possible rules for selecting rows, and based only on the data in your question, you could use this:
WITH
tbl_name AS
(
Select 1 "ID", '1ABC' "A_TYPE" From Dual Union All
Select 2 "ID", '2DEF' "A_TYPE" From Dual Union All
Select 3 "ID", '3DEF' "A_TYPE" From Dual Union All
Select 4 "ID", '4IJK' "A_TYPE" From Dual
),
tbl_match AS
(
Select 'NOT %ABC% AND NOT %DEF%' "A_TYPE", 'NOT ABC AND NOT DEF' "DATA" From Dual Union All
Select '%DEF%' "A_TYPE", 'DEF ONLY' "DATA" From Dual Union All
Select 'NOT %DEF% AND NOT %IJK%' "A_TYPE", 'NOT DEF AND NOT IJK' "DATA" From Dual
)
Select
n.ID "ID",
n.A_TYPE,
m.DATA
From
tbl_match m
Inner Join
tbl_name n ON (1=1)
Where
(
INSTR(m.A_TYPE, 'NOT %' || SubStr(n.A_TYPE, 2) || '%', 1, 1) = 0
AND
INSTR(m.A_TYPE, 'NOT %' || SubStr(n.A_TYPE, 2) || '%', 1, 2) = 0
AND
Length(m.A_TYPE) > Length(SubStr(n.A_TYPE, 2)) + 2
)
OR
(
Length(m.A_TYPE) = Length(SubStr(n.A_TYPE, 2)) + 2
AND
'%' || SubStr(n.A_TYPE, 2) || '%' = m.A_TYPE
)
Order By n.ID
Result:
ID | A_TYPE | DATA
-- | ------ | ----
1 | 1ABC | NOT DEF AND NOT IJK
2 | 2DEF | DEF ONLY
3 | 3DEF | DEF ONLY
4 | 4IJK | NOT ABC AND NOT DEF
Any other format of condition should be evaluated separately ...
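Here is the same rule as a SQLite sketch, run through Python's sqlite3 module. One assumption needs stating. SQLite's instr() takes only two arguments, so the Oracle occurrence-2 check is dropped. That check is redundant anyway, because a second occurrence can only exist if a first one does.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
query = """
WITH
tbl_name(ID, A_TYPE) AS (
    VALUES (1, '1ABC'), (2, '2DEF'), (3, '3DEF'), (4, '4IJK')
),
tbl_match(A_TYPE, DATA) AS (
    VALUES ('NOT %ABC% AND NOT %DEF%', 'NOT ABC AND NOT DEF'),
           ('%DEF%', 'DEF ONLY'),
           ('NOT %DEF% AND NOT %IJK%', 'NOT DEF AND NOT IJK')
)
SELECT n.ID, n.A_TYPE, m.DATA
FROM tbl_match m CROSS JOIN tbl_name n
WHERE (
        -- negated rule: 'NOT %<code>%' must not appear in the condition
        instr(m.A_TYPE, 'NOT %' || substr(n.A_TYPE, 2) || '%') = 0
        AND length(m.A_TYPE) > length(substr(n.A_TYPE, 2)) + 2
      )
   OR (
        -- positive rule: the condition is exactly '%<code>%'
        length(m.A_TYPE) = length(substr(n.A_TYPE, 2)) + 2
        AND '%' || substr(n.A_TYPE, 2) || '%' = m.A_TYPE
      )
ORDER BY n.ID
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```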
WITH match_cte AS (
SELECT m.data
,m.type
,decode(instr(m.type,'NOT')
,1 -- found at position 1
,0
,1) should_find_str_1
,substr(m.type
,instr(m.type,'%',1,1) + 1
,instr(m.type,'%',1,2) - instr(m.type,'%',1,1) - 1) str_1
,decode(instr(m.type,'NOT',instr(m.type,'%',1,2))
,0 -- no second NOT
,1
,0) should_find_str_2
,substr(m.type
,instr(m.type,'%',1,3) + 1
,instr(m.type,'%',1,4) - instr(m.type,'%',1,3) - 1) str_2
FROM match m
)
SELECT s.id
,s.type
,m.data
FROM source s
CROSS JOIN match_cte m
WHERE m.should_find_str_1 = sign(instr(s.type,m.str_1))
AND (m.str_2 IS NULL
OR m.should_find_str_2 = sign(instr(s.type, m.str_2))
)
ORDER BY s.id, m.data
MATCH_CTE
|DATA|TYPE|SHOULD_FIND_STR_1|STR_1|SHOULD_FIND_STR_2|STR_2|
|-|-|-|-|-|-|
|NOT ABC AND NOT DEF|NOT %ABC% AND NOT %DEF%|0|ABC|0|DEF|
|DEF|%DEF%|1|DEF|1|NULL|
|NOT DEF AND NOT IJK|NOT %DEF% AND NOT %IJK%|0|DEF|0|IJK|
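The DECODE/INSTR arithmetic in MATCH_CTE is easier to check when restated in plain Python. The following is a hedged re-statement of what MATCH_CTE computes and of how the outer WHERE clause uses it. The function names parse_match and matches are hypothetical. Like the query, the sketch assumes at most two %-delimited strings per condition.

```python
def parse_match(match_type):
    """Mimic MATCH_CTE: take the text between the 1st/2nd and 3rd/4th '%'
    and record whether each piece should be found (1) or not (0)."""
    pct = [i for i, ch in enumerate(match_type) if ch == "%"]
    # DECODE(INSTR(type, 'NOT'), 1, 0, 1): a leading NOT means "must not be found".
    should_find_1 = 0 if match_type.startswith("NOT") else 1
    str_1 = match_type[pct[0] + 1:pct[1]]
    # A 'NOT' at or after the second '%' negates the second piece.
    should_find_2 = 0 if "NOT" in match_type[pct[1]:] else 1
    str_2 = match_type[pct[2] + 1:pct[3]] if len(pct) >= 4 else None
    return should_find_1, str_1, should_find_2, str_2


def matches(source_type, match_type):
    sf1, s1, sf2, s2 = parse_match(match_type)
    # SIGN(INSTR(...)) in the SQL is 1 when the piece is found and 0 when not.
    if sf1 != (1 if s1 in source_type else 0):
        return False
    return s2 is None or sf2 == (1 if s2 in source_type else 0)


print(parse_match("NOT %ABC% AND NOT %DEF%"))  # (0, 'ABC', 0, 'DEF')
print(parse_match("%DEF%"))                    # (1, 'DEF', 1, None)
```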

Check if a value starts with one of the value of a column

I'd like to check whether a value starts with one of the values of another column.
t1 | t2
----------
3253 | 123
1234 | 000
9876 | 932
So here for example I should have True for the value 1234 because it starts with 123.
I should have false for the other values.
I can't find any solutions.
Thank you in advance for your help!
I already tried:
t1 LIKE (t2 || '%')
starts_with(t1,t2)
starts_with(t1, (select t2))
Using t1 LIKE (t2 || '%') should get you close to what I think you need; you may just be missing a bit of query logic.
Ignoring any platform-specific syntax (I don't use BigQuery), this just shows the logic that might help you get the result you want.
With the data as:
create table my_table
(t1 number,
t2 number);
insert into my_table values(3253,123);
insert into my_table values(1234,000);
insert into my_table values(9876,932);
You can use a CASE expression wrapped in SUM to count the matches using LIKE t2 || '%'. Any value of 1 or greater in the resulting column should be read as true, and a value of 0 as false.
SELECT a.t1,
Sum(CASE
WHEN a.t1 LIKE b.t2
|| '%' THEN 1
ELSE 0
END) AS starts_with
FROM my_table a,
my_table b
GROUP BY a.t1
This gives the output
T1 STARTS_WITH
9876 0
1234 1
3253 0
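The cross-join count can be reproduced in SQLite through Python's sqlite3 module. This is a sketch under SQLite rules, where LIKE converts numbers to text. Because t2 is numeric, 000 is stored as 0. If the real codes can have leading zeros, t2 probably needs to be kept as text.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Numeric columns, as in the answer above; note that 000 is stored as 0.
conn.execute("CREATE TABLE my_table (t1 INTEGER, t2 INTEGER)")
conn.executemany("INSERT INTO my_table VALUES (?, ?)", [(3253, 123), (1234, 0), (9876, 932)])

# Compare every t1 with every t2 and count how many prefixes match.
rows = conn.execute("""
    SELECT a.t1,
           SUM(CASE WHEN a.t1 LIKE b.t2 || '%' THEN 1 ELSE 0 END) AS starts_with
    FROM my_table a CROSS JOIN my_table b
    GROUP BY a.t1
    ORDER BY a.t1
""").fetchall()
print(rows)  # [(1234, 1), (3253, 0), (9876, 0)]
```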
That works for me:
-- To match your example
WITH T (t1,t2) AS
(
SELECT * FROM (VALUES (3253,123),(1234,000),(9876,932))t(t1,t2)
)
-- What your query should look like
SELECT TAB1.t1,TAB1.t2, Tab2.t2
FROM T AS Tab1
CROSS JOIN T AS Tab2
WHERE CAST(TAB1.t1 AS STRING) LIKE CONCAT(CAST(TAB2.t2 AS STRING), '%')
Another option to try
select * except(check),
regexp_contains(t1, check)
from (
select *, r'^(' || string_agg(t2, '|') over() || ')' check
from your_table
)
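The regex approach can be sketched with Python's re module, which shares the anchoring behaviour that matters here. The sketch shows why the alternatives need to be grouped after ^. Without a group, ^ binds only to the first alternative, and the other prefixes can match anywhere in the value. Using re.search mirrors the "contains" semantics of REGEXP_CONTAINS. The values are treated as strings, which is an assumption.

```python
import re

t1_values = ["3253", "1234", "9876"]
t2_values = ["123", "000", "932"]

# Group the alternatives so '^' applies to every prefix, and escape them
# in case a value contains regex metacharacters.
pattern = "^(?:" + "|".join(re.escape(v) for v in t2_values) + ")"
starts_with = {t1: bool(re.search(pattern, t1)) for t1 in t1_values}
print(starts_with)  # {'3253': False, '1234': True, '9876': False}

# Without the group, '^' binds only to '123', so '932' matches anywhere.
print(bool(re.search("^123|000|932", "5932")))  # True
print(bool(re.search(pattern, "5932")))          # False
```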

Count frequencies of words separated with multiple spaces

I would like to count the occurrences of all words in a column. The tricky part is that a single row can contain several words separated by runs of multiple spaces.
This is a dummy example:
column_name
aaa bbb ccc ddd
[aaa]
bbb
bbb
So far I have managed to use the following code:
SELECT column_name,
SUM(LEN(column_name) - LEN(REPLACE(column_name, ' ', ''))+1) as counts
FROM
dbo.my_own
GROUP BY
column_name
The code gives me something like this:
column_name counts
aaa bbb ccc ddd 1
[aaa] 1
bbb 2
However, my desired output is:
column_name counts
aaa 1
[aaa] 1
bbb 3
ccc 1
ddd 1
In SQL Server, you would use string_split():
select s.value as word, count(*)
from dbo.my_own o cross apply
string_split(o.column_name, ' ') s
where s.value <> ''
group by s.value;
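SQLite has no string_split(). This sketch swaps in a recursive CTE that peels off one word at a time, run through Python's sqlite3 module. The sample rows with runs of spaces are assumed. As in the SQL Server query, the empty strings produced by consecutive spaces are filtered out before counting.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE my_own (column_name TEXT)")
# Assumed sample rows; the first one has runs of several spaces.
conn.executemany(
    "INSERT INTO my_own VALUES (?)",
    [("aaa   bbb  ccc ddd",), ("[aaa]",), ("bbb",), ("bbb",)],
)

# Each recursion step takes the text before the next space as a word and
# keeps the remainder; a trailing space is appended so the last word ends.
query = """
WITH RECURSIVE split(word, rest) AS (
    SELECT '', column_name || ' ' FROM my_own
    UNION ALL
    SELECT substr(rest, 1, instr(rest, ' ') - 1),
           substr(rest, instr(rest, ' ') + 1)
    FROM split
    WHERE rest <> ''
)
SELECT word, COUNT(*) AS counts
FROM split
WHERE word <> ''
GROUP BY word
ORDER BY word
"""
counts = dict(conn.execute(query).fetchall())
print(counts)  # {'[aaa]': 1, 'aaa': 1, 'bbb': 3, 'ccc': 1, 'ddd': 1}
```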
String manipulation is highly database-dependent. Most databases have some method for doing this, but they can be quite different.
First, take a look at this question to see how to split the words in your column into multiple rows. In that question the words are separated by commas, but of course it works the same with spaces.
For your case, assuming a table tablename with an id and your words in columnname, where you have at most 4 words in the column, it would look like this:
SELECT
tablename.id,
SUBSTRING_INDEX(SUBSTRING_INDEX(tablename.columnname, ' ', numbers.n), ' ', -1) columnname
FROM
(SELECT 1 AS n UNION ALL
SELECT 2 UNION ALL
SELECT 3 UNION ALL
SELECT 4) numbers
INNER JOIN tablename
ON LENGTH(tablename.columnname) - LENGTH(REPLACE(tablename.columnname, ' ', '')) >= numbers.n - 1
ORDER BY
id, n
Then, you can simply count the words:
SELECT columnname, count(*) FROM (
SELECT
tablename.id,
SUBSTRING_INDEX(SUBSTRING_INDEX(tablename.columnname, ' ', numbers.n), ' ', -1) columnname
FROM
(SELECT 1 AS n UNION ALL
SELECT 2 UNION ALL
SELECT 3 UNION ALL
SELECT 4) numbers
INNER JOIN tablename
ON LENGTH(tablename.columnname) - LENGTH(REPLACE(tablename.columnname, ' ', '')) >= numbers.n - 1
ORDER BY
id, n
) normalized
GROUP BY columnname
If you have more than 4 words in your column, you need to expand the select from numbers accordingly.
Edit: Oh, I am late, and I assumed MySQL.
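The nested SUBSTRING_INDEX trick is easier to follow once it is re-implemented in Python. This is an illustrative re-implementation, not MySQL itself. The helper names substring_index and nth_word are hypothetical. The sketch shows how the numbers table picks out the nth word and how the join condition limits n to the word count.

```python
from collections import Counter


def substring_index(s, delim, count):
    """Python re-implementation of MySQL SUBSTRING_INDEX(str, delim, count)."""
    if count == 0:
        return ""
    parts = s.split(delim)
    if count > 0:
        return delim.join(parts[:count])  # left of the count-th delimiter
    return delim.join(parts[count:])      # right of the |count|-th delimiter from the end


def nth_word(s, n):
    # Same nesting as the query: keep the first n words, then take the last of them.
    return substring_index(substring_index(s, " ", n), " ", -1)


rows = ["aaa bbb ccc ddd", "[aaa]", "bbb", "bbb"]
numbers = [1, 2, 3, 4]  # the hard-coded numbers table; extend it for longer rows

# Join condition: a row with k spaces has k + 1 words, so keep n <= k + 1.
words = [nth_word(s, n) for s in rows for n in numbers if s.count(" ") >= n - 1]
print(Counter(words))
```

This assumes single spaces between words. With a run of spaces, the same logic yields empty words (for example, nth_word("aaa  bbb", 2) returns ""). Those would need to be filtered out, as the string_split answer does.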

T-SQL Comma delimited value from resultset to in clause in Subquery

I have an issue where a record returned from my data has a column value that looks like this:
-- query
Select col1 from myTable where id = 23
-- result of col1
111, 104, 34, 45
I want to feed these values to an in clause. So far I have tried:
-- Query 2 -- try 1
Select * from mytableTwo
where myfield in (
SELECT col1
from myTable where id = 23)
-- Query 2 -- try 2
Select * from mytableTwo
where myfield in (
SELECT '''' +
Replace(col1, ',', ''',''') + ''''
from myTable where id = 23)
-- query 2 test -- This works and will return data, so I verify here that data exists
Select * from mytableTwo
where myfield in ('111', '104', '34', '45')
Why aren't query 2 try 1 or 2 working?
You don't want an in clause. You want to use like:
select *
from myTableTwo t2
where exists (select 1
from myTable t
where id = 23 and
', '+t.col1+', ' like '%, '+t2.myfield+', %'
);
This uses like for the comparison in the list. It uses a subquery for the value. You could also phrase this as a join by doing:
select t2.*
from myTableTwo t2 join
myTable t
on t.id = 23 and
', '+t.col1+', ' like '%, '+t2.myfield+', %';
However, this could multiply the number of rows in the output if there is more than one row with id = 23 in myTable.
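The delimiter-wrapping LIKE can be tried in SQLite via Python's sqlite3 module, using || in place of T-SQL's +. The extra values '4' and '10' are assumptions, added to show that numbers appearing only inside other list items are not matched.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE myTable (id INTEGER, col1 TEXT)")
conn.execute("INSERT INTO myTable VALUES (23, '111, 104, 34, 45')")
conn.execute("CREATE TABLE myTableTwo (myfield TEXT)")
# '4' and '10' are extra, assumed values that appear only inside other numbers.
conn.executemany("INSERT INTO myTableTwo VALUES (?)",
                 [("111",), ("104",), ("34",), ("45",), ("4",), ("10",)])

# Wrapping both sides in ', ' means only whole list items can match.
rows = conn.execute("""
    SELECT t2.myfield
    FROM myTableTwo t2
    WHERE EXISTS (SELECT 1
                  FROM myTable t
                  WHERE t.id = 23
                    AND ', ' || t.col1 || ', ' LIKE '%, ' || t2.myfield || ', %')
    ORDER BY t2.rowid
""").fetchall()
print([r[0] for r in rows])  # ['111', '104', '34', '45']
```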
If you look closely, in both Query 2 -- try 1 and Query 2 -- try 2 the subquery returns a single value, so the condition is evaluated like this:
WHERE myfield in ('111, 104, 34, 45')
which is not same as :
WHERE myfield in ('111', '104', '34', '45')
So, if you want to filter the rows of mytableTwo using the values stored in myTable, you need to split the column data into a table variable or table-valued function and filter on that.
I have created a table-valued function that takes a comma-separated string and returns a table.
You can find it here: T-SQL : Comma separated values to table
Final code to filter the data :
DECLARE @filteredIds VARCHAR(100)
-- Get the filter data
SELECT @filteredIds = col1
FROM myTable WHERE id = 23
-- TODO : Get the script for [dbo].[GetDelimitedStringToTable]
-- from the given link and execute before this
SELECT *
FROM mytableTwo T
CROSS APPLY [dbo].[GetDelimitedStringToTable] ( @filteredIds, ',') F
-- LTRIM removes the space that follows each comma in col1
WHERE T.myfield = LTRIM(F.Value)
Please let me know if this helps!
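The difference between "one string in the IN list" and "one value per list item" can be shown with a Python list as a stand-in for the IN list. This is only an analogy. The splitting step plays the role of the table-valued function, whose code is not shown here. On SQL Server 2016 and later, the built-in STRING_SPLIT function can fill the same role.

```python
col1 = "111, 104, 34, 45"  # the single value returned by the subquery

# Try 1 and try 2: the IN list contains one string, so '104' is not in it.
in_list_as_is = [col1]
print("104" in in_list_as_is)  # False

# Splitting the string first gives one element per value,
# so the membership test behaves as intended.
in_list_split = [part.strip() for part in col1.split(",")]
print(in_list_split)           # ['111', '104', '34', '45']
print("104" in in_list_split)  # True
```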
I suppose col1 is a character type whose value looks like '111, 104, 34, 45'. If that is your situation, it's not ideal (the database is denormalized), but you can still relate these tables by using character functions and operators like LIKE or CHARINDEX. The only gotcha is that you must convert the numeric column to character. When SQL Server compares a character value with a numeric one, it converts the character value to the numeric type, and a string like '111, 104, 34, 45' will cause a conversion error.
Since @Gordon responded using LIKE, here is a solution using CHARINDEX:
SELECT *
FROM mytableTwo tb2
WHERE EXISTS (
SELECT 'x'
FROM myTable tb1
WHERE tb1.id = 23
AND CHARINDEX(', ' + CONVERT(VARCHAR(20), tb2.myfield) + ', ', ', ' + tb1.col1 + ', ') > 0 -- delimiters stop '4' from matching inside '104'
)
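A plain substring search like CHARINDEX(value, list) > 0 can return false positives. Python's str.find, which returns -1 where CHARINDEX returns 0, is used here as a stand-in to show why. The helper names are hypothetical.

```python
col1 = "111, 104, 34, 45"


def naive_contains(value, csv):
    # Like CHARINDEX(value, csv) > 0: a plain substring search.
    return csv.find(value) != -1


def delimited_contains(value, csv):
    # Wrap both sides in ', ' so only whole list items can match.
    return (", " + csv + ", ").find(", " + value + ", ") != -1


for v in ["104", "4", "10"]:
    print(v, naive_contains(v, col1), delimited_contains(v, col1))
# 104 True True
# 4 True False
# 10 True False
```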

Remove last 2 digits from column1 and append those 2 digits to column2

I have two string columns named Column1 and Column2. Both columns contain data in the following format:
Column1 Column2
ABCD12 5678ABC
ABCD99 2341KFJ
GDHF33 1233DFG
Now I want to remove the last two digits from Column1 and move them, followed by a space, to the start of Column2, so that my data looks like this:
Column1 Column2
ABCD 12 5678ABC
ABCD 99 2341KFJ
GDHF 33 1233DFG
How can this be done in SQL Server and in SSIS?
If there are always exactly two trailing digits, you can use the LEFT and RIGHT functions:
SELECT LEFT(Column1, LEN(Column1) - 2) Edited_Column1,
RIGHT(Column1, 2) + ' ' + Column2 Edited_Column2
FROM table1
SELECT LEFT(Column1, LEN(Column1) - 2) Column1,
RIGHT(Column1, 2) + ' ' + Column2 Column2
FROM table1
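The LEFT/RIGHT split can be sketched in SQLite through Python's sqlite3 module. SQLite has no LEFT or RIGHT functions, so this sketch swaps in substr() as an equivalent. Like the T-SQL answer, it assumes exactly two trailing digits.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE table1 (Column1 TEXT, Column2 TEXT)")
conn.executemany("INSERT INTO table1 VALUES (?, ?)",
                 [("ABCD12", "5678ABC"), ("ABCD99", "2341KFJ"), ("GDHF33", "1233DFG")])

# substr(x, 1, length(x) - 2) acts like LEFT(x, LEN(x) - 2);
# substr(x, -2) acts like RIGHT(x, 2).
rows = conn.execute("""
    SELECT substr(Column1, 1, length(Column1) - 2) AS Column1,
           substr(Column1, -2) || ' ' || Column2   AS Column2
    FROM table1
    ORDER BY rowid
""").fetchall()
for row in rows:
    print(row)
# ('ABCD', '12 5678ABC')
# ('ABCD', '99 2341KFJ')
# ('GDHF', '33 1233DFG')
```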