How to search whole words in a string that has delimiter ";"? - sql

I have a column that has values like this 'Blood work;MRI;ICC', which can be a string with some words separated by ';'.
Using a LIKE clause, how can I write a query that returns the row when I search for 'Blood work', 'mri', or 'icc', but not when I search for 'blood', 'mr', or 'ic'?

To search for a whole item in a delimited list, one method is to wrap both the column and the search term in the delimiter:
where ';' + mycol + ';' like '%;mri;%'
Demo on DB Fiddle:
with
csv as (select 'Blood work;MRI;ICC' v),
match as (select 'mri' m union all select 'Blood work' union all select 'Blood')
select csv.v, match.m,
case when ';' + csv.v + ';' like '%;' + match.m + ';%'
then 'match'
else 'no match'
end matched
from csv
cross join match
v | m | matched
:----------------- | :--------- | :-------
Blood work;MRI;ICC | mri | match
Blood work;MRI;ICC | Blood work | match
Blood work;MRI;ICC | Blood | no match
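The same delimiter-wrapping trick can be checked outside SQL Server. This is a minimal sketch that assumes Python's built-in sqlite3 module as a stand-in database. SQLite spells concatenation `||` instead of `+`, and its LIKE is case-insensitive for ASCII letters, which mirrors the case-insensitive collation the demo above relies on for 'mri' to match 'MRI'. The helper name is_whole_item is hypothetical.

```python
import sqlite3


def is_whole_item(csv_value: str, term: str) -> bool:
    """Return True if `term` is a complete ';'-separated item of `csv_value`.

    Both sides are wrapped in ';' so only whole items can match:
    ';' || value || ';'  LIKE  '%;' || term || ';%'
    """
    con = sqlite3.connect(":memory:")
    try:
        (matched,) = con.execute(
            "SELECT (';' || ? || ';') LIKE ('%;' || ? || ';%')",
            (csv_value, term),
        ).fetchone()
        return bool(matched)
    finally:
        con.close()


value = "Blood work;MRI;ICC"
for term in ["mri", "Blood work", "Blood", "mr", "icc", "ic"]:
    print(f"{term!r}: {'match' if is_whole_item(value, term) else 'no match'}")
```

One caveat that applies to the SQL version too: `%` and `_` inside the search term still act as wildcards, so escape them if the terms come from user input.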

I would personally use a string splitter:
SELECT {Columns}
FROM dbo.YourTable YT
CROSS APPLY STRING_SPLIT (YT.YourColumn,';') SS
WHERE SS.[Value] = 'mri';
If you're not using SQL Server 2016+ (STRING_SPLIT also requires database compatibility level 130 or higher), you can use a custom splitter, like DelimitedSplit8K_LEAD.
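The splitter approach compares each whole item instead of pattern-matching the whole string. Here is a minimal Python sketch of that logic, assuming a case-insensitive comparison (approximated with str.casefold) as under SQL Server's default collation; the function names and sample rows are hypothetical.

```python
from typing import List


def has_item(csv_value: str, term: str, sep: str = ";") -> bool:
    """Split like STRING_SPLIT, then compare each whole item to the term."""
    target = term.casefold()
    return any(item.casefold() == target for item in csv_value.split(sep))


def rows_with_item(rows: List[str], term: str) -> List[str]:
    """Keep each matching row once, even if the item occurs twice in it."""
    return [row for row in rows if has_item(row, term)]


rows = ["Blood work;MRI;ICC", "MRI;MRI", "Blood test"]
print(rows_with_item(rows, "mri"))    # ['Blood work;MRI;ICC', 'MRI;MRI']
print(rows_with_item(rows, "blood"))  # []
```

Unlike this sketch, the CROSS APPLY query returns one row per matching item, so a value such as 'MRI;MRI' would come back twice; WHERE EXISTS or DISTINCT avoids that if it matters.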

Related

PostgreSQL Substring pattern with spaces

I've been struggling with this query, trying solutions found on this forum, but I can't get any further and need help.
I have a column that stores ship names throughout the ship's life, and I want to split them into three columns.
There are mainly three cases:
a) Only one name
select t2.esp1,t2.espectro,t2.espectro1, t2.id from(
select substring(t.espectro, t.posfin)::varchar as esp1, t.espectro,t.espectro1,t.id from(
select "Id" as id, strpos(shipname, ', ') as posinic, strpos(shipname, ' y ') as posfin,shipname as espectro, shipname1 as espectro1 from ships) t)t2 (esp1, espectro, espectro1, id)
where t2.esp1 not like '% y %';
b) two names
select t2.esp1,t2.espectro,t2.espectro1, t2.id from(
select substring(t.espectro,1, t.posfin)::varchar as esp1, t.espectro,t.espectro1,t.id from(
select "Id" as id, strpos(shipname, ', ') as posinic, strpos(shipname, ' y ') as posfin,shipname as espectro, shipname1 as espectro1 from ships) t)t2 (esp1, espectro, espectro1, id)
where t2.esp1 not like '%, %';
and for the second name:
select t2.esp2,t2.espectro,t2.espectro2, t2.id from(
select substring(t.espectro, t.posfin)::varchar as esp2, t.espectro,t.espectro2,t.id from(
select "Id" as id, strpos(shipname, ', ') as posinic, strpos(shipname, ' y ') as posfin,shipname as espectro, shipname2 as espectro2 from ships) t)t2 (esp2, espectro, espectro2, id)
where t2.esp2 like '% y %' and t2.espectro not like '%, %';
c) Three names: I could get the first name
select substring(t.espectro,1,t.posicion) from(
select strpos(shipname, ',') as posicion,shipname as espectro from ships) t;
and the third
select t2.esp3,t2.espectro,t2.espectro3, t2.id from(
select substring(t.espectro, t.posfin)::varchar as esp3, t.espectro,t.espectro3,t.id from(
select "Id" as id, strpos(shipname, ', ') as posinic, strpos(shipname, ' y ') as posfin,shipname as espectro, shipname3 as espectro3 from ships) t)t2 (esp3, espectro, espectro3, id)
where t2.esp3 like '% y %' and t2.espectro like '%, %';
but not the second.
The three named records look like this:
Nuestra Señora del Rosario, Santo Domingo y San José
I have tried this option:
select substring(t.shipsnames from '%#",_y#"%' for '#') as name2 from ships t
With several changes in the #"pattern#" to find the white spaces and get the second name.
Then I tried this option:
select t2.name2[6:7] from (select regexp_split_to_array(t.shipnames, E'\\s+') as name2 from ships t) t2
But it doesn't work, because not every record has the same number of words: some are solved, like {"Santo","Domingo"}, but others are not, like {"Rosario",","}.
I am not familiar with regex syntax; I found this example in the PostgreSQL documentation. Any hints?
If names should be split wherever they are separated by a comma (with optional whitespace around it) or by a "y" surrounded by mandatory whitespace, the following regular expression will work:
\s*,\s*|\s+y\s+
\s matches a whitespace character, + means one or more, * means zero or more, and | means alternation.
Example SQL utilizing this regular expression:
SELECT Id, ShipNamesArray[1] ShipName1, ShipNamesArray[2] ShipName2, ShipNamesArray[3] ShipName3
FROM (
SELECT Id, regexp_split_to_array(Shipnames, '\s*,\s*|\s+y\s+') ShipNamesArray
FROM (VALUES
(1, 'Nuestra Señora del Rosario, Santo Domingo y San José'),
(2, 'Nuestra Señora del Rosario y Santo Domingo'),
(3, 'Nuestra Señora del Rosario')
) AS ExampleShipNames (Id, ShipNames)
) AS SplitShipNames
The SQL will produce this output:
Id | ShipName1 | ShipName2 | ShipName3
-- | -------------------------- | ------------- | ---------
1 | Nuestra Señora del Rosario | Santo Domingo | San José
2 | Nuestra Señora del Rosario | Santo Domingo |
3 | Nuestra Señora del Rosario | |
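The pattern can be tried out quickly before putting it into SQL. This sketch uses Python's re module as an assumed stand-in for PostgreSQL's regexp_split_to_array; for these inputs Python's \s behaves the same as PostgreSQL's. The function name split_ship_names is hypothetical.

```python
import re
from typing import List, Optional

# Same pattern as the SQL above: a comma with optional surrounding whitespace,
# or a standalone "y" with at least one whitespace character on each side.
SEPARATOR = re.compile(r"\s*,\s*|\s+y\s+")


def split_ship_names(names: str, width: int = 3) -> List[Optional[str]]:
    parts = SEPARATOR.split(names)
    # Pad with None (like PostgreSQL returning NULL for a missing array
    # element) and keep only the first `width` names.
    return (parts + [None] * width)[:width]


for names in [
    "Nuestra Señora del Rosario, Santo Domingo y San José",
    "Nuestra Señora del Rosario y Santo Domingo",
    "Nuestra Señora del Rosario",
]:
    print(split_ship_names(names))
```

Because "y" must be surrounded by whitespace, words that merely contain a "y" are not split; a name that genuinely contains the word " y " would be, though.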

Remove additional comma without knowing the length of the string

My tables
MyTable
+----+-------+---------------+
| Id | Title | DependencyIds |
+----+-------+---------------+
DependencyIds contains values like 14;77;120.
MyDependentTable
+--------------+------+
| DependencyId | Name |
+--------------+------+
Background
I have to select data from MyTable with every dependency from MyDependentTable separated with a comma.
Expected output:
+---------+-------------------------------------+
| Title | Dependencies |
+---------+-------------------------------------+
| Test | ABC, One-two-three, Some Dependency |
+---------+-------------------------------------+
| Example | ABC |
+---------+-------------------------------------+
My query
SELECT t.Title,
(SELECT ISNULL((
SELECT DISTINCT
(
SELECT dt.Name + '',
CASE WHEN DependencyIds LIKE '%;%' THEN ', ' ELSE '' END AS [text()]
FROM MyDependentTable dt
WHERE dt.DependencyId IN (SELECT Value FROM dbo.fSplitIds(t.DependencyIds, ';'))
ORDER BY dt.DependencyId
FOR XML PATH('')
)), '')) Dependencies
FROM dbo.MyTable t
Problem description
The query works, but adds an additional comma when there are multiple dependencies:
+---------+---------------------------------------+
| Title | Dependencies |
+---------+---------------------------------------+
| Test | ABC, One-two-three, Some Dependency, |
+---------+---------------------------------------+
| Example | ABC |
+---------+---------------------------------------+
I can't use SUBSTRING(ISNULL(... because I can't access the length of the string and therefore I'm not able to set the length of the SUBSTRING.
Is there any possibility to get rid of that unnecessary additional comma?
Normally, for group concatenation in SQL Server, people add a leading comma and remove it with the STUFF function, but even that looks ugly.
The OUTER APPLY method looks neater than a correlated subquery here. With it, the concatenation query itself doesn't have to be wrapped in ISNULL or STUFF; the trailing comma is trimmed with LEFT and LEN in the outer SELECT instead.
SELECT DISTINCT t.title,
Isnull(LEFT(dependencies, Len(dependencies) - 1), '')
Dependencies
FROM dbo.mytable t
OUTER apply (SELECT dt.NAME + ','
FROM mydependenttable dt
WHERE dt.dependencyid IN (SELECT value
FROM
dbo.Fsplitids(t.dependencyids,';'))
ORDER BY dt.dependencyid
FOR xml path('')) ou (dependencies)
Here is the method using STUFF.
SELECT t.Title
,STUFF((SELECT ', ' + CAST(dt.Name AS NVARCHAR(MAX)) [text()] -- VARCHAR(10) would truncate 'Some Dependency'
FROM MyDependentTable dt
WHERE dt.DependencyId IN (SELECT Value FROM dbo.fSplitIds(t.DependencyIds, ';'))
ORDER BY dt.DependencyId
FOR XML PATH(''), TYPE).value('.','NVARCHAR(MAX)'),1,2,'') Dependencies -- replace the leading ', ' with '' (not ' ')
FROM dbo.MyTable t
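Both answers use the same idea: put a separator on every item, then cut off the single extra one at the start (STUFF) or at the end (LEFT/LEN). A minimal Python sketch of the two strategies, assuming a hypothetical id-to-name lookup in place of MyDependentTable:

```python
from typing import Dict, List

# Hypothetical lookup standing in for MyDependentTable (DependencyId -> Name).
NAMES_BY_ID: Dict[int, str] = {14: "ABC", 77: "One-two-three", 120: "Some Dependency"}


def matching_ids(ids_csv: str, names_by_id: Dict[int, str]) -> List[int]:
    """Ids from the ';' list that exist in the lookup, ordered by id (IN + ORDER BY)."""
    return sorted({int(x) for x in ids_csv.split(";") if x and int(x) in names_by_id})


def stuff(s: str, start: int, length: int, replacement: str) -> str:
    """Rough Python analogue of T-SQL STUFF() with a 1-based start position."""
    i = start - 1
    return s[:i] + replacement + s[i + length:]


def with_leading_separator(ids_csv: str, names_by_id: Dict[int, str]) -> str:
    text = "".join(", " + names_by_id[i] for i in matching_ids(ids_csv, names_by_id))
    # STUFF(text, 1, 2, '') removes the first ", " (returns '' here instead of NULL).
    return stuff(text, 1, 2, "") if text else ""


def with_trailing_separator(ids_csv: str, names_by_id: Dict[int, str]) -> str:
    text = "".join(names_by_id[i] + "," for i in matching_ids(ids_csv, names_by_id))
    # LEFT(text, LEN(text) - 1) removes the final ','.
    return text[:-1] if text else ""


print(with_leading_separator("14;77;120", NAMES_BY_ID))   # ABC, One-two-three, Some Dependency
print(with_trailing_separator("14;77;120", NAMES_BY_ID))  # ABC,One-two-three,Some Dependency
print(with_leading_separator("14", NAMES_BY_ID))          # ABC
```

In T-SQL, FOR XML PATH over zero rows yields NULL, so the real queries return NULL (or '' where ISNULL is applied) when nothing matches; the sketch returns '' in both cases. The STUFF version also uses TYPE and .value(), which keeps characters such as & from coming back XML-escaped.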

SQL Search string from a column in another column

This may have been asked before, but I am not sure how to search for it.
I want to find whether the string in Column2 is part of Column1, or not used in Column1 at all.
Column1 | Column2
=======================
ABCDE + JKL | XC
XC - PQ | A
XYZ + A | C
AC + PQ | MA
So the Column2 values that are never used in Column1 would be:
C
MA
The description of the problem talks about the string in Column2. You can do this with some variation of LIKE. In most databases, something like:
select t.*
from t
where t.column1 not like '%' || t.column2 || '%';
Some databases spell || as + or even concat(), but the idea is the same.
However, I'm not sure what the sample data is doing. In no case is the string in column2 in column1.
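To see that point about the sample data, here is the row-by-row query run against SQLite via Python's sqlite3 module (an assumed stand-in, chosen because it uses the same || operator). Every row comes back, because no Column2 value appears in its own row's Column1.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE t (column1 TEXT, column2 TEXT)")
con.executemany(
    "INSERT INTO t VALUES (?, ?)",
    [("ABCDE + JKL", "XC"), ("XC - PQ", "A"), ("XYZ + A", "C"), ("AC + PQ", "MA")],
)

# Row-by-row check: is column2 absent from the same row's column1?
rows = con.execute(
    "SELECT column1, column2 FROM t "
    "WHERE column1 NOT LIKE '%' || column2 || '%'"
).fetchall()
print(len(rows))  # 4: all rows are returned
```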
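This looks like another regex task where regex isn't available.
Assuming the operands in the expressions contain only letters, you can write the following query (SQL Server, using PATINDEX) to find Column2 values that never appear as a whole operand in any row's Column1: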
CREATE TABLE Expressions
(
Column1 varchar(20),
Column2 varchar(20)
)
INSERT Expressions VALUES
('ABCDE + JKL', 'XC'),
('XC - PQ', 'A'),
('XYZ + A', 'C'),
('AC + PQ', 'MA'),
('A+CF', 'ZZ'),
('BB+ZZ+CF', 'YY')
SELECT E1.Column2
FROM Expressions E1
WHERE NOT EXISTS (
SELECT *
FROM Expressions E2
WHERE E1.Column2=E2.Column1 --Exact match
OR PATINDEX(E1.Column2+'[^A-Z]%', E2.Column1) <> 0 --Starts with
OR PATINDEX('%[^A-Z]'+E1.Column2, E2.Column1) <> 0 --Ends with
OR PATINDEX('%[^A-Z]'+E1.Column2+'[^A-Z]%', E2.Column1) <> 0 --In the middle
)
It returns:
Column2
-------
C
MA
YY
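Where regex is available, the same whole-operand check reduces to tokenizing Column1. A minimal Python sketch, assuming (as the query does) that operands are runs of letters and that comparison is case-insensitive; unused_operands is a hypothetical name:

```python
import re
from typing import List, Tuple

ROWS: List[Tuple[str, str]] = [
    ("ABCDE + JKL", "XC"),
    ("XC - PQ", "A"),
    ("XYZ + A", "C"),
    ("AC + PQ", "MA"),
    ("A+CF", "ZZ"),
    ("BB+ZZ+CF", "YY"),
]


def unused_operands(rows: List[Tuple[str, str]]) -> List[str]:
    # Every run of letters in any Column1 counts as one whole operand.
    used = {tok.upper() for col1, _ in rows for tok in re.findall(r"[A-Za-z]+", col1)}
    # Keep the Column2 values that never appear as a whole operand.
    return [col2 for _, col2 in rows if col2.upper() not in used]


print(unused_operands(ROWS))  # ['C', 'MA', 'YY']
```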

Merge multiple rows into a single row

I need to merge multiple rows into a single row, with the data of each column concatenated.
These three rows are the result of my query with an INNER JOIN:
Name | SC | Type
----------------------
name1 | 121212 | type1
name2 | 123456 | null
name3 | null | type1
I want to display the result like this:
Name | SC | Type
----------------------
name1; 121212; type1;
name2; 123456; ;
name3; ; type1;
It's a single row, with each column's data concatenated with ; and a \n at the end of each data item.
The final query needs to run in both SQL Server and Oracle.
I honestly doubt you can use the same query in both Oracle and SQL Server, since they have different functions for dealing with NULL values.
For Oracle (where '' is itself treated as NULL, so NVL(x, '') has no effect; the query still works because Oracle's || treats NULL as an empty string):
SELECT NVL(Name,'') || ';' as name,
NVL(SC,'') || ';' as SC,
NVL(type,'') || ';' as type
FROM (YourQueryHere)
For SQL Server:
SELECT isnull(Name,'') + ';' as name,
isnull(SC,'') + ';' as SC,
isnull(type,'') + ';' as type
FROM (YourQueryHere) AS q
Note that, as @jarlh said, you can use CONCAT(value1, value2) for the concatenation, which works in both SQL Server (2012 and later) and Oracle, although Oracle's CONCAT accepts exactly two arguments.
You could simply concatenate the fields:
SELECT ISNULL(Name,'') + ';' as Name,
ISNULL(SC, '') + ';' as SC,
ISNULL(Type, '') + ';' as Type
FROM
(
-- whatever is your query goes here...
) AS q;
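The NULL handling can be sketched with COALESCE, which is standard SQL and also exists in SQL Server and Oracle; only the concatenation operator differs between dialects. This sketch assumes SQLite (via Python's sqlite3) as a stand-in and uses a hypothetical table q holding the three rows from the question.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE q (Name TEXT, SC INTEGER, Type TEXT)")
con.executemany(
    "INSERT INTO q VALUES (?, ?, ?)",
    [("name1", 121212, "type1"), ("name2", 123456, None), ("name3", None, "type1")],
)

# Replace NULL with '' and append ';' to every column value.
rows = con.execute(
    "SELECT COALESCE(Name, '') || ';', "
    "       COALESCE(SC, '') || ';', "
    "       COALESCE(Type, '') || ';' "
    "FROM q ORDER BY Name"
).fetchall()
for row in rows:
    print(row)
```

SQLite converts the numeric SC to text automatically. In SQL Server, ISNULL and COALESCE keep a numeric column's type, so a numeric SC would need CAST(SC AS VARCHAR(20)) before the + ';'.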

SQL Coldfusion - eliminate the word THE when searching for duplicate names

I have a process that creates a list of possible duplicate companies. The problem is that "The ABC Company, Inc." and "ABC Company, Inc.", both in Dallas, TX, are probably duplicates, but I won't find them with my criteria. I've eliminated the first 4 characters if they are "The ", but I also need to check whether the last 5 characters are " Inc.".
I have a view that creates a column, thename, in which the prefix "The " has been stripped:
SELECT CASE WHEN LEFT(name, 4) = 'The ' THEN RIGHT(name, (len(name) - 4)) ELSE name END AS thename, CASE WHEN CHARINDEX(' ', ltrim(rtrim(Name)))
= 0 THEN ltrim(Name) WHEN CHARINDEX(' ', ltrim(Name)) = 1 THEN ltrim('b') ELSE SUBSTRING(ltrim(Name) + ' x', 1, CHARINDEX(' ', ltrim(Name))) END AS subname,
CHARINDEX(' ', LTRIM(Name)) AS wordcheck, Name, Address_Line_1, City AS Company_City, State AS Company_State, Zip, Area_Code, Phone, Status_Flag, ID,
Not_Dupe_Flag, DUNS, Temp_Check_Dupes_Flag, Parent_Company_Number, Special_Display,
CASE WHEN c.parent_company_number = 0 THEN c.id ELSE c.parent_company_number END AS parent
FROM dbo.Companies AS c
Then I use that view in my query to look for duplicates:
<cfquery name="qResults" datasource="#request.dsnlive#" timeout="200">
SELECT b.ID,
Thename,
substring(TheName,1,(CHARINDEX(' ',TheNAME,1))) as subName,
name,
b.address_line_1,
b.zip,
b.company_state,
b.company_city,
b.area_code,
b.phone,
b.Special_Display,
isnull(not_dupe_flag,'False') as not_dupe_flag,
isnull(Temp_Check_Dupes_Flag,'False') as Temp_Check_Dupes_Flag,
b.id as bID,
b.duns
FROM dbo.vw_Comp_Details_withFirstWord as b
WHERE isnull(b.status_flag,'') != 'D'
and b.ID <> #arguments.CompNum#
and isnull(b.Temp_Check_Dupes_Flag,'False') = 'False'
<cfif arguments.IncludeDunsOnly eq 0>
<cfif arguments.FirstWord>
AND b.subName = '#arguments.CompanySubName#'
<cfelse>
AND (substring(dbo.KeepAlphaNumCharacters(Thename),1,#val(arguments.WordLength)#) = substring('#arguments.CompanyName#',1,#val(arguments.WordLength)#)
or difference(soundex(Thename),soundex('#arguments.CompanyName#')) > 2)
</cfif>
AND (
( company_city = '#arguments.City#'
AND Isnull(company_city, '') > '' )
AND ( b.parent != #val(arguments.Parent)#
AND Isnull(b.parent, '0') > 0 )
)
<cfif arguments.IncludeDuns>
AND (
( REPLACE(LTRIM(REPLACE(b.duns, '0', ' ')), ' ', '0') = '#val(arguments.Duns)#'
AND REPLACE(LTRIM(REPLACE(b.duns, '0', ' ')), ' ', '0') > ' '
AND #val(arguments.Duns)# > 0 )
or REPLACE(LTRIM(REPLACE(b.duns, '0', ' ')), ' ', '0') = ' '
)
</cfif>
<cfelse>
and (REPLACE(LTRIM(REPLACE(b.duns, '0', ' ')), ' ', '0') = '#val(arguments.Duns)#')
</cfif>
</cfquery>
Now I need to add code to strip the suffix " Inc." but I can't seem to come up with the logic to end up with a column that contains the name without the prefix "The " and the suffix " Inc."
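The logic is: strip the prefix first, then test the suffix on the already-stripped value (in the view, that would be a second CASE wrapped around the existing one, checking whether its last 5 characters are ' Inc.'). A minimal Python sketch of that order of operations; the function name is hypothetical, and dropping the comma that usually precedes "Inc." is an added assumption.

```python
def strip_the_and_inc(name: str) -> str:
    """Drop a leading 'The ' and a trailing ' Inc.' (case-insensitive)."""
    n = name.strip()
    if n.lower().startswith("the "):
        n = n[4:]   # same idea as RIGHT(name, LEN(name) - 4)
    if n.lower().endswith(" inc."):
        n = n[:-5]  # same idea as LEFT(name, LEN(name) - 5)
    # Assumption: also drop the comma (and spaces) left before the removed "Inc."
    return n.rstrip(" ,")


print(strip_the_and_inc("The ABC Company, Inc."))  # ABC Company
print(strip_the_and_inc("ABC Company, Inc."))      # ABC Company
```

The trailing space in "The " matters: without it, a name like "Theatre Co" would lose its first letters.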
I'd like to share a question of mine from a few days ago. It was done in Postgres, but I'm sure you can find an equivalent way to split a string into rows in your RDBMS.
What you do is split the string into tokens and remove the offending ones, such as The or Inc.
SQL Fiddle Demo
| ID | token |
|----|---------|
| 1 | The |
| 1 | ABC |
| 1 | Company |
| 1 | Inc. |
| 2 | ABC |
| 2 | Company |
| 2 | Inc. |
| 3 | ABC |
| 3 | Company |
Then you go the other way and join the remaining tokens back together: in Postgres use string_agg(), in MS SQL use FOR XML PATH, etc.
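The split, filter, and re-join steps can be sketched in a few lines of Python. The noise-word list below is hypothetical, and stripping a trailing comma from each token is an added assumption so that "Company," compares equal to "Company".

```python
# Hypothetical noise-word list; extend it ("co.", "llc", ...) as needed.
NOISE_WORDS = {"the", "inc", "inc."}


def strip_noise_words(name: str) -> str:
    kept = []
    for token in name.split():        # split the string into tokens (rows)
        word = token.rstrip(",")      # ignore a trailing comma on a token
        if word.lower() not in NOISE_WORDS:
            kept.append(word)         # keep everything that is not noise
    return " ".join(kept)             # re-join, like string_agg(token, ' ')


print(strip_noise_words("The ABC Company, Inc."))  # ABC Company
print(strip_noise_words("ABC Company Inc."))       # ABC Company
```

Unlike the prefix/suffix approach, this removes noise words anywhere in the name ("Bank of The West" becomes "Bank of West"), which is usually acceptable for a comparison key but not for display.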
There are many possible ways to do this. Consider whether you want a full-text index on the field, which can then search for similar names and eliminate noise words like "the". Or you can use an SSIS package to do fuzzy matching (this would also help with abbreviations versus spelling the whole word out). Or you can use Data Quality Services, which is probably your best bet.
https://msdn.microsoft.com/en-us/library/ff877917.aspx