Comparing two strings with a different format but the same meaning - SQL

Is there a way to compare the two strings 12 AD E4 9F and 12:ad:E4:9f and get a result that says they are equivalent? They are stored in different tables, and I would like to create a view by joining the tables using these strings as the join criterion.

You can convert the second string to upper case and replace each ':' with a space before comparing the strings:
Select UPPER(REPLACE('12:ad:E4:9f',':',' ')) from dual;

One way would be to eliminate the characters before doing the comparison:
where replace(col1, ' ', '') = replace(col2, ':', '')
Or do the replacement of one separator to the other:
where replace(col1, ' ', ':') = col2
Whether string comparisons are case-sensitive depends on the database and its collation: SQL Server and MySQL are case-insensitive by default, while Oracle and PostgreSQL are case-sensitive. If your comparison is case-sensitive, wrap both sides of the above in lower() or upper(). The functions replace(), upper(), and lower() are available in most databases (although some might have slightly different names).
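The same normalization can be sketched outside the database. This is only an illustration in Python, not part of any answer above; the function name normalize is made up for the example. It drops both separators and upper-cases the result, which is what wrapping replace() in upper() does in SQL.

```python
def normalize(value: str) -> str:
    """Drop spaces and colons, then upper-case, so both formats compare equal."""
    return value.replace(" ", "").replace(":", "").upper()


# Both spellings from the question reduce to the same key.
print(normalize("12 AD E4 9F") == normalize("12:ad:E4:9f"))  # True
```

Normalizing both sides to one canonical form keeps the join condition symmetric, at the cost of not being able to use a plain index on either column.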

Try this in T-SQL.
Use the REPLACE and UPPER functions. In the example below, col1 holds 12 AD E4 9F and col2 holds 12:ad:E4:9f.
select *
from tab1 t1
join tab2 t2 on upper(replace(t1.col1, ' ', '')) = upper(replace(t2.col2, ':', ''))

Trim the value and strip the separator:
REPLACE(LTRIM(RTRIM(ProductDescription)), ':', '')
Convert it to upper case:
SELECT UPPER(ProductDescription)
Then compare.

Related

SQL SELECT WHERE without underscore and more

I want to compare two strings in a WHERE clause while ignoring
underscores
apostrophes
dashes
Hello!
I want to select a row in my SQL database whose name looks like this:
Chef d'équipe aménagement-finitions
using an original tag that looks like this:
chef-déquipe-aménagement-finitions
Some values in the database also contain a -.
SELECT *
FROM table
WHERE REPLACE(name, '-', ' ') = REPLACE('chef-déquipe-aménagement-finitions', '-', ' ')
This didn't work because of the missing apostrophe (').
A double replace didn't work either.
I want to be able to compare the strings while ignoring
underscores
apostrophes
dashes
and similar characters.
Is this possible?
Thanks for your help.
Have a good day!
It depends on your RDBMS, but here's how I would do it in MySQL 8. If you are using a different version or RDBMS, first determine how to escape the single quote and modify as needed.
with my_data as (
select 'Chef d''équipe aménagement-finitions' as name
)
select name,
lower(replace(replace(name, '\'', ''), ' ', '-')) as name2
from my_data;
name | name2
Chef d'équipe aménagement-finitions | chef-déquipe-aménagement-finitions
SQL Server and Postgres version:
lower(replace(replace(name, '''', ''), ' ', '-')) as name2
After posting this, I re-read the question and noticed you are also looking to replace other characters. You could either keep layering the replace function or look into other functions.
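To make the "ignore a whole set of characters" idea concrete, here is a small Python sketch. It is an illustration only: the set of ignored characters (whitespace, apostrophe, underscore, dash) and the name comparable are assumptions for the example, and in SQL the equivalent would be nested REPLACE calls or a regex replace function where the database offers one.

```python
import re

# Assumed set of characters to ignore; extend it as needed.
IGNORED = re.compile(r"[\s'_\-]")


def comparable(text: str) -> str:
    """Strip the ignored characters and lower-case the rest."""
    return IGNORED.sub("", text).lower()


name = "Chef d'équipe aménagement-finitions"
tag = "chef-déquipe-aménagement-finitions"
print(comparable(name) == comparable(tag))  # True
```

Removing the characters from both sides, rather than converting one format into the other, avoids having to know which separator each side uses.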

In SQL Server, how can I identify "double" strings and correct?

How can I find strings in a column that are doubled up and correct them? I feel like there is an easy answer to this, but I just can't think of it.
Example:
I want to find instances of a repeating string, example "SolonSolon", and then update the column to "Solon".
Update:
They're always the same. No extra characters, but might have a space as part of the repeating value. Other examples would be...
"PlacePlace", "TreeTree", "OrangeOrange", "TravisMemorialHSTravisMemorialHS", "Texas HSTexas HS"
You can check if the string is equal to the first half replicated.
SELECT LEFT(YourCol,LEN(REPLACE(YourCol, ' ', 'x'))/2)
FROM YourTable
WHERE YourCol = REPLICATE(LEFT(YourCol,LEN(REPLACE(YourCol, ' ', 'x'))/2),2)
The reason for the REPLACE of spaces with x before calculating the LEN is that trailing spaces are ignored by this function. You can also use the technique in @lptr's answer for this, but an edge case arises if the string is varchar(8000) and already 8000 characters long, in which case concatenating an extra character won't do anything (LEN(SPACE(8000) + 'x') is 0).
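The "first half replicated" check can be sketched in Python as follows. This is an illustration, not the answer's code; the name undouble is made up. Python's len() counts trailing spaces, so the REPLACE workaround needed for T-SQL's LEN has no counterpart here.

```python
def undouble(value: str) -> str:
    """Return the first half if the value is that half repeated twice, else the value unchanged."""
    half = len(value) // 2
    first = value[:half]
    if half > 0 and first * 2 == value:
        return first
    return value


for sample in ["SolonSolon", "Texas HSTexas HS", "Solon"]:
    print(repr(sample), "->", repr(undouble(sample)))
```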
Replace the first half of the value with an empty string. If there is nothing left, the value consists of two equal parts.
select *, substring(c, 1, (len(c+'.')-1)/2)
from
(
values
('solosolo'), ('yoyo'), ('andand'), ('1212'),(' . .'),
('ababc'), ('onetwoone')
) as t(c)
where replace(c, substring(c, 1, (len(c+'.')-1)/2), '') = '';
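A rough Python model of this replace-based test is shown below, mainly to explore edge cases. The function name is made up, and the guard for an empty half is an assumption of the sketch. One case that seems worth checking against real data is an odd-length run of a single character such as 'aaa': the half is 'a', removing every 'a' leaves nothing, so the test passes even though the value is not two equal halves.

```python
def doubled_by_replace(value: str) -> bool:
    """True if removing every occurrence of the first half leaves nothing."""
    half = value[: len(value) // 2]
    if not half:
        # Guard added for the sketch: an empty half cannot identify a doubled value.
        return False
    return value.replace(half, "") == ""


for sample in ["solosolo", "yoyo", "ababc", "onetwoone", "aaa"]:
    print(sample, doubled_by_replace(sample))
```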
Another alternative: the query removes inner spaces using REPLACE(str_col, ' ', ''), removes leading/trailing spaces using TRIM, and checks that the first half of the string equals the second half.
select left(no_spaces.str_col, v.str_len/2)
from foo f
cross apply (values (replace(trim(f.str_col), ' ', ''))) no_spaces(str_col)
cross apply (values (len(no_spaces.str_col))) v(str_len)
where no_spaces.str_col=replicate(left(no_spaces.str_col, v.str_len/2), 2);

Oracle remove special characters

I have a column in a table ident_nums that contains different types of ids. I need to remove special characters (e.g. [.,/#&$-]) from that column and replace them with a space; however, if a special character is found at the beginning of the string, I need to remove it without inserting a space. I tried to do it in steps: first, I replaced the special characters with spaces (using REGEXP_REPLACE), then I found the records that contain spaces at the beginning of the string and tried to use the TRIM function to remove the white space, but for some reason that is not working.
Here is what I have done
Select regexp_replace(id_num, '[:(),./#*&-]', ' ') from ident_nums
This part works for me: it replaces all the unwanted characters in the column. However, if the string starts with one of those characters, I don't want a space there; I would like to remove just the character, so I tried to use the built-in function TRIM.
update ident_nums
set id_num = TRIM(id_num)
I'm getting an error ORA-01407: can't update ident_nums.id_num to NULL
Any ideas what I am doing wrong here?
It does work if I add a where clause,
update ident_nums
set id_num = TRIM(id_num) where id = 123;
but I need to update all the rows with the white space at the beginning of the string.
Any suggestions are welcome.
Or if it can be done better.
The table has millions of records.
Thank you
Regexp can be slow sometimes so if you can do it by using built-in functions - consider it.
As @Abra suggested, TRIM and TRANSLATE are a good choice, but you may prefer LTRIM, which removes only leading characters from the string (TRIM removes both leading and trailing characters). If you want to remove spaces, you can omit the trim character parameter, since space is the default.
select
-- TRANSLATE maps characters by position, so supply one space per special character
ltrim(translate('#kdjdj:', ':(),./#*&-', rpad(' ', 10)))
from dual;
select
ltrim(translate(original_string, 'special_characters_to_remove', 'one_space_per_special_character'))
from dual;
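Because TRANSLATE pairs characters by position, the length of the third argument matters: a character in the second argument with no counterpart in the third is removed rather than replaced. The Python function below is a rough model of that behaviour, written only to illustrate the point; oracle_style_translate is a made-up name, not an Oracle or Python API.

```python
def oracle_style_translate(text: str, from_chars: str, to_chars: str) -> str:
    """Model of positional translation: map from_chars[i] to to_chars[i], delete unmapped ones."""
    mapping = {}
    for i, ch in enumerate(from_chars):
        if ch in mapping:
            continue  # the first occurrence of a character decides its mapping
        mapping[ch] = to_chars[i] if i < len(to_chars) else ""
    return "".join(mapping.get(ch, ch) for ch in text)


specials = ":(),./#*&-"
# One space per special character: every special character becomes a space.
print(repr(oracle_style_translate("#kdjdj:", specials, " " * len(specials))))
# A single space: only ':' becomes a space, '#' is deleted.
print(repr(oracle_style_translate("#kdjdj:", specials, " ")))
```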
A combination of the Oracle built-in functions TRANSLATE and TRIM worked for me.
select trim(' ' from translate('#$one,$2-zero...', '#$,-.',' ')) as RESULT
from DUAL
Refer to this dbfiddle
I think trim() is the key, but if you want to keep only letters, digits, and spaces, then:
select trim(' ' from regexp_replace(col, '[^a-zA-Z0-9 ]', ' ', 1, 0))
regexp_replace() makes it possible to specify only the characters you want to keep, which could be convenient.
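The keep-list idea (replace everything that is not a letter, digit, or space, then trim) can be tried out quickly in Python. This sketch only mirrors the logic of the expression above; clean_id is a made-up name.

```python
import re


def clean_id(value: str) -> str:
    # Replace every character outside the keep-list with a space...
    spaced = re.sub(r"[^a-zA-Z0-9 ]", " ", value)
    # ...then drop leading and trailing spaces, like trim(' ' from ...).
    return spaced.strip(" ")


print(repr(clean_id("#$one,$2-zero...")))
print(repr(clean_id("#SHARMA")))
```

Note that runs of special characters turn into runs of spaces inside the string; collapsing those would need one more replacement step.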
Thanks, everyone. This query worked for me:
update ident_nums
set id_num = LTRIM(REGEXP_REPLACE(id_num, '[[:space:]]+', ' '))
where REGEXP_LIKE(id_num, '^[ ?]')
This should work for you.
SELECT id_num, length(id_num) length_old, NEW_ID_NUM, length(NEW_ID_NUM) len_NEW_ID_NUM, ltrim(NEW_ID_NUM), length(ltrim(NEW_ID_NUM)) length_after_ltrim
FROM (
SELECT id_num, regexp_replace(id_num, '[:(),./#*&-]', ' ') NEW_ID_NUM FROM
(
SELECT '1234$%45' as id_num from dual UNION
SELECT '#SHARMA' as id_num from dual UNION
SELECT 'JACK TEST' as id_num from dual UNION
SELECT 'XYZ#$' as id_num from dual UNION
SELECT '#ABCDE()' as id_num from dual -- the 1st character becomes a space after the replacement
)
)

Get index of two consecutive upper case characters

I am trying to separate a city/state/zip field into the city, state, and zip. Normally I would do this with charindex of ',' to get the city and state, and isnumeric and right() for the zip.
This will work fine for the zip, but most of the rows in the data I am working with now have no commas: City ST Zip. Is there a way to identify the index of two upper case characters?
If not, does anybody have a better idea than just a case statement checking for each state individually?
EDIT: I found the PATINDEX/COLLATE option to work fairly intermittently. See my answer below.
PATINDEX should work for you:
PATINDEX('% [A-Z][A-Z] %', A COLLATE Latin1_general_cs_as)
So your full extract would be something like:
WITH CTE AS
( SELECT i = PATINDEX('% [A-Z][A-Z] %', A COLLATE Latin1_general_cs_as) + 1,
A
FROM (VALUES
('City ST Zip'),
('Another City ST Zip'),
('City, with comma ST Zip')
) t (A)
)
SELECT City = LEFT(A, i - 2),
State = SUBSTRING(A, i, 2),
Zip = SUBSTRING(A, i + 3, LEN(A))
FROM CTE;
Example on SQL Fiddle
The reason why PATINDEX appears to work intermittently is that you cannot use a character range (i.e. A-Z) to accomplish a case-sensitive search, even if using a case-sensitive collation. The issue is that character ranges work like sorting, and case-sensitive sorting groups the upper-case letters with their lower-case equivalents, just like it would be ordered in a dictionary. Range sorting is really: a,A,b,B,c,C,d,D,etc. Or, depending on the collation, it might be: A,a,B,b,C,c,D,d,etc (there are 31 Collations that sort upper-case first). When doing this in a case-sensitive collation, that merely groups all A entries together, separate from the a entries, whereas in a case-insensitive sort they would be intermixed.
But if you specify each of the letters individually (hence not using a range), then it will work as expected:
PATINDEX(N'%[ABCDEFGHIJKLMNOPQRSTUVWXYZ][ABCDEFGHIJKLMNOPQRSTUVWXYZ]%',
[CityStZip] COLLATE Latin1_General_100_CS_AS)
The reason that PATINDEX and LIKE (both of which allow for a single character class of [A-Z]) work this way is that the [start-end] syntax is not a Regular Expression. Many people claim that PATINDEX and LIKE support "limited" RegEx due to supporting this syntax, but that is not true. It is merely a very similar (and a confusingly similar) syntax to RegEx where [A-Z] would normally not include any lower-case matches.
Of course, if you are guaranteed to only be searching on the US-English letters of A-Z, then a binary collation (i.e. one ending in _BIN2; don't use ones ending in _BIN as they have been deprecated since SQL Server 2005 was introduced, I believe) should work.
PATINDEX(N'%[A-Z][A-Z]%', [CityStZip] COLLATE Latin1_General_100_BIN2)
For more details about case-sensitive matching, especially in regards to including Unicode / NVARCHAR data, please see my related answer on DBA.StackExchange:
How to find values with multiple consecutive upper case characters
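For comparison, here is how the same search looks in a language with real regular expressions. In Python's re module, [A-Z] matches only the ASCII upper-case letters and matching is case-sensitive by default, which is the behaviour the binary-collation PATINDEX aims for. The sketch is illustrative; the function name and the assumption of single spaces around the state are mine.

```python
import re


def split_city_state_zip(value: str):
    """Split on the first ' XX ' where XX is two ASCII upper-case letters; None if absent."""
    match = re.search(r" [A-Z]{2} ", value)
    if match is None:
        return None
    city = value[: match.start()]
    state = value[match.start() + 1 : match.end() - 1]
    zip_code = value[match.end():]
    return city, state, zip_code


for sample in ["City ST Zip", "Another City ST Zip", "City, with comma ST Zip", "no state here"]:
    print(sample, "->", split_city_state_zip(sample))
```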
If you have zip code and state at the end of the string, then this might work:
select right(address, 5) as zip,
left(right(address, 8), 2) as state,
left(address, len(address) - 9) as city
You can start by removing the commas and double spaces from the address.
If you have a table of states (which you should) with a column of the abbreviations, you can do things like this:
SELECT a.* FROM Addresses a
INNER JOIN States s ON
a.CityStateZip Like '% ' + s.UpperCaseAbbreviation + ' %' --space on either side of abbreviation
You can make it work for both commas and spaces:
SELECT a.* FROM Addresses a
INNER JOIN States s ON
Replace(a.CityStateZip, ',' , ' ') Like '% ' + s.UpperCaseAbbreviation + ' %'
I found the PATINDEX/COLLATE option to work fairly intermittently. Here is what I ended up doing:
--get rid of the sparsely used commas
--get rid of the duplicate spaces
update MyTable set
CityStZip=
replace(
replace(
replace(CityStZip,'  ',' '),
'  ',' '),
',','')
select
--check if state and zip are there and then grab the city
case when isNumeric(right(CityStZip,1))=1
then left(CityStZip,len(CityStZip)-charindex(' ',reverse(CityStZip),
charindex(' ',reverse(CityStZip))+1)+1)
--no zip. check for state
when left(right(CityStZip,3),1) = ' '
then left(CityStZip,len(CityStZip)-charIndex(' ',reverse(CityStZip)))
else CityStZip
end as City,
--check if zip is there and then grab the state
case when isNumeric(right(CityStZip,1))=1
then substring(CityStZip,
len(CityStZip)-charindex(' ',reverse(CityStZip),
charindex(' ',reverse(CityStZip))+1)+2,
2)
--no zip. check if 3rd to last char is a space and grab the last two chars
when left(right(CityStZip,3),1) = ' '
then right(CityStZip,2)
end as [State],
--grab everything after the last space if the last character is numeric
case when isNumeric(right(CityStZip,1))=1
then substring(CityStZip,
len(CityStZip)-charindex(' ',reverse(CityStZip))+1,
charindex(' ',reverse(CityStZip)))
end as Zip
from MyTable
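The strategy above (clean up separators, then peel the zip and state off the right-hand end) can be sketched in Python with a token list. This is a loose analogue rather than a translation: it treats commas as spaces instead of deleting them, and it only accepts a state when the token is exactly two characters long. The function name is made up.

```python
def parse_city_st_zip(value: str):
    """Return (city, state, zip); state and zip are None when they cannot be identified."""
    # Treat commas as spaces and collapse repeated whitespace.
    tokens = value.replace(",", " ").split()
    zip_code = None
    state = None
    # A trailing token ending in a digit is taken to be the zip.
    if tokens and tokens[-1][-1].isdigit():
        zip_code = tokens.pop()
    # A trailing two-character token (with a city in front of it) is taken to be the state.
    if len(tokens) > 1 and len(tokens[-1]) == 2:
        state = tokens.pop()
    city = " ".join(tokens)
    return city, state, zip_code


for sample in ["Austin, TX 78701", "San  Antonio TX", "Dallas"]:
    print(repr(sample), "->", parse_city_st_zip(sample))
```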

T-SQL Split String Like Clause

I have declare @a varchar(100) = 'abc bcd cde def'. What I need is to select from a table where a column is like 'abc' or 'bcd' or 'cde' or 'def'. I can use a split function and a while loop to get what I want, but somewhere I saw a smart solution using replace or something similar and I just can't remember it.
I know I can use an xml variable, and parse it that way. However, the value is part of a large procedure, and the best way for me is to use it in string form.
I know I can solve this by building a dynamic sql query, but that is not an option in the domain I'm working in.
Damn, I just can't remember the solution. It's a hack, a dirty little trick that does the job.
Anyway, I'll use the code below (I'm on SQL Server 2008). Is it a good idea? I prefer it over the dirty split. Does it perform better?
declare @w varchar(100) = 'some word'
declare @f xml
set @f = '<word>' + replace(@w, ' ', '</word><word>') + '</word>'
select
template.item.value('.', 'varchar(100)') as word
from @f.nodes('/word') template(item)
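The replace-into-XML trick can be reproduced in Python to see what it does and where it is fragile. This is an illustrative sketch with made-up names. Two differences from the T-SQL are assumptions of the sketch: the input is escaped first, since characters such as & or < would otherwise produce invalid markup, and a wrapping root element is added because ElementTree expects a single root.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape


def split_words(text: str) -> list:
    # Escape &, < and > first so the input cannot break the markup.
    body = "<word>" + escape(text).replace(" ", "</word><word>") + "</word>"
    # ElementTree needs exactly one root element, so wrap the fragment.
    root = ET.fromstring("<words>" + body + "</words>")
    return [node.text or "" for node in root.findall("word")]


print(split_words("abc bcd cde def"))  # ['abc', 'bcd', 'cde', 'def']
```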
Use a function to split the individual items into a table, one record per item. Then you simply join to that table.
insert into #FilterTable (filters)
select Items from dbo.Split(@YourFilterString)
select *
from YourTable yt
join #FilterTable f on f.filters = yt.YourColumn
Of course my example is using equality. It gets more complicated if you truly intend to use "like" with wildcards.
In tsql you can use a pattern col like '[abcd]'
http://msdn.microsoft.com/en-us/library/ms179859.aspx
For matching multiple words (not letters) without dynamic SQL, you'll have to get the values into a temp table. For a split function, try this page http://www.sommarskog.se/arrays-in-sql-2005.html#iterative and look at the List of Strings function iter_charlist_to_table.
Or maybe you are thinking of this little trick Parameterize an SQL IN clause from the SO CEO.
For 4 sections max:
WHERE
PARSENAME(REPLACE(@a, ' ', '.'), 1) = 'xxx'
OR
PARSENAME(REPLACE(@a, ' ', '.'), 2) = 'xxx'
OR
PARSENAME(REPLACE(@a, ' ', '.'), 3) = 'xxx'
OR
PARSENAME(REPLACE(@a, ' ', '.'), 4) = 'xxx'
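PARSENAME is meant for object names of up to four dot-separated parts and numbers them from the right, which is where the four-section limit comes from. The Python functions below are a rough model of that behaviour for experimenting with the idea; parsename and matches_any are made-up names, and details such as quoting and part-length limits are ignored.

```python
def parsename(dotted: str, piece: int):
    """Return the piece-th part counted from the right, or None when it is unavailable."""
    parts = dotted.split(".")
    if len(parts) > 4 or not 1 <= piece <= 4 or piece > len(parts):
        return None
    return parts[-piece]


def matches_any(words: str, target: str) -> bool:
    """Mirror the WHERE clause: compare the target against each of up to four sections."""
    dotted = words.replace(" ", ".")
    return any(parsename(dotted, i) == target for i in range(1, 5))


print(matches_any("abc bcd cde def", "cde"))          # True
print(matches_any("abc bcd cde def extra", "cde"))    # False: five sections, every lookup is None
```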