In SQL Server, how can I identify "double" strings and correct? - sql

How can I find strings in a column that are doubled-up and correct them? I feel like there is an easy answer to this I just can't think of it.
Example:
I want to find instances of a repeating string, example "SolonSolon", and then update the column to "Solon".
Update:
They're always the same. No extra characters, but might have a space as part of the repeating value. Other examples would be...
"PlacePlace", "TreeTree", "OrangeOrange", "TravisMemorialHSTravisMemorialHS", "Texas HSTexas HS"

You can check if the string is equal to the first half replicated.
SELECT LEFT(YourCol,LEN(REPLACE(YourCol, ' ', 'x'))/2)
FROM YourTable
WHERE YourCol = REPLICATE(LEFT(YourCol,LEN(REPLACE(YourCol, ' ', 'x'))/2),2)
The reason for the REPLACE of spaces with x before calculating the LEN is because trailing spaces are ignored by this function. You can also use the technique in #lptr's answer for this but an edge case will be if the string was varchar(8000) and already 8000 characters long in which case concatenating an extra character won't do anything (LEN(SPACE(8000) + 'x') is 0).

..replace the first half of the value with an empty string..if there is nothing left..the value consists of two equal parts
select *, substring(c, 1, (len(c+'.')-1)/2)
from
(
values
('solosolo'), ('yoyo'), ('andand'), ('1212'),(' . .'),
('ababc'), ('onetwoone')
) as t(c)
where replace(c, substring(c, 1, (len(c+'.')-1)/2), '') = '';

Another alternative. The query removes inner spaces using REPLACE(str_col, ' ', ''), removes leading/traling spaces using TRIM, and checks to make sure the first half of the string equals the second half.
select left(no_spaces.str_col, v.str_len/2)
from foo f
cross apply (values (replaced trim(f.str_col), ' ', '')) no_spaces(str_col)
cross apply (values (len(no_spaces.str_col))) v(str_len)
where no_spaces.str_col=replicate(left(f.str_col, v.str_len/2), 2);

Related

how to replace dots from 2nd occurrence

I have column with software versions. I was trying to remove dot from 2nd occurrence in column, like
select REGEXP_REPLACE('12.5.7.8', '.','');
expected out is 12.578
sample data is here
Is it possible to remove dot from 2nd occurrence
One option is to break this into two pieces:
Get the first number.
Get the rest of the numbers as an array.
Then convert the array to a string with no separator and combine with the first:
select (split_part('12.5.7.8', '.', 1) || '.' ||
array_to_string((REGEXP_SPLIT_TO_ARRAY('12.5.7.8', '[.]'))[2:], '')
)
Another option is to replace the first '.' with something else, then get rid of the '.'s and replace the something else with a '|':
select translate(regexp_replace(version, '^([^.]+)[.](.*)$', '\1|\2'), '|.', '.')
from software_version;
Here is a db<>fiddle with the three versions, including the version a_horse_with_no_name mentions in the comment.
I'd just take the left and right:
concat(
left(str, position('.' in str)),
replace(right(str, -position('.' in str)), '.', '')
)
For a str of 12.34.56.78, the left gives 12. and the right using a negative position gives 34.56.78 upon which the replace then removes the dots

Oracle remove special characters

I have a column in a table ident_nums that contains different types of ids. I need to remove special characters(e.g. [.,/#&$-]) from that column and replace them with space; however, if the special characters are found at the beginning of the string, I need to remove it without placing a space. I tried to do it in steps; first, I removed the special characters and replaced them with space (I used
REGEXP_REPLACE) then found the records that contain spaces at the beginning of the string and tried to use the TRIM function to remove the white space, but for some reason is not working that.
Here is what I have done
Select regexp_replace(id_num, '[:(),./#*&-]', ' ') from ident_nums
This part works for me, I remove all the unwanted characters from the column, however, if the string in the column starts with a character I don't want to have space in there, I would like to remove just the character, so I tried to use the built-in function TRIM.
update ident_nums
set id_num = TRIM(id_num)
I'm getting an error ORA-01407: can't update ident_nums.id_num to NULL
Any ideas what I am doing wrong here?
It does work if I add a where clause,
update ident_nums
set id_num = TRIM(id_num) where id = 123;
but I need to update all the rows with the white space at the beginning of the string.
Any suggestions are welcome.
Or if it can be done better.
The table has millions of records.
Thank you
Regexp can be slow sometimes so if you can do it by using built-in functions - consider it.
As #Abra suggested TRIM and TRANSLATE is a good choice, but maybe you would prefer LTRIM - removes only leading spaces from string (TRIM removes both - leading and trailing character ). If you want to remove "space" you can ommit defining the trim character parameter, space is default.
select
ltrim(translate('#kdjdj:', '[:(),./#*&-]', ' '))
from dual;
select
ltrim(translate(orginal_string, 'special_characters_to_remove', ' '))
from dual;
Combination of Oracle built-in functions TRANSLATE and TRIM worked for me.
select trim(' ' from translate('#$one,$2-zero...', '#$,-.',' ')) as RESULT
from DUAL
Refer to this dbfiddle
I think trim() is the key, but if you want to keep only alpha numerics, digits, and spaces, then:
select trim(' ' from regexp_replace(col, '[^a-zA-Z0-9 ]', ' ', 1, 0))
regexp_replace() makes it possible to specify only the characters you want to keep, which could be convenient.
Thanks, everyone, It this query worked for me
update update ident_nums
set id_num = LTRIM(REGEXP_REPLACE(id_num, '[:space:]+', ' ')
where REGEXP_LIKE(id_num, '^[ ?]')
this should work for you.
SELECT id_num, length(id_num) length_old, NEW_ID_NUM, length(NEW_ID_NUM) len_NEW_ID_NUM, ltrim(NEW_ID_NUM), length(ltrim(NEW_ID_NUM)) length_after_ltrim
FROM (
SELECT id_num, regexp_replace(id_num, '[:(),./#*&-#]', ' ') NEW_ID_NUM FROM
(
SELECT '1234$%45' as id_num from dual UNION
SELECT '#SHARMA' as id_num from dual UNION
SELECT 'JACK TEST' as id_num from dual UNION
SELECT 'XYZ#$' as id_num from dual UNION
SELECT '#ABCDE()' as id_num from dual -- THe 1st character is space
)
)

Isolating an alphanumerical ID that can be anywhere in a text based cell

I'm trying to isolate a certain character string from a text cell.
For example, I would like to extract "AB-T120-15" from the string "His server ID was AB-T120-15 and his problem was that he needed a reboot"
AB-T120-15 is an example, but they would all be codes of a max length of 13 characters starting by something like AB-T, CL-R, etc.
The codes can appear anywhere in a text field of the column.
string_split() cannot be used since the DB we are under is older.
I have tried many combinations of Substring and LEFT, but I cannot seem to have it worked.
Any thoughts?
String operations are not the strength of SQL Server -- which I assume you are using.
You can do this with rather painful string manipulation:
select left(stuff(str, 1, patindex('%[A-Z][A-Z]-[A-Z]%', str) - 1, ''),
charindex(' ', stuff(str, 1, patindex('%[A-Z][A-Z]-[A-Z]%', str), '') + ' ')
)
from (values ('His server ID was AB-T120-15 and his problem was that he needed a reboot')) v(str);

Separating strings and retrieving. using spaces

The string contains many words separated by spaces e.g.
employee_first_nm = "John Walker"
I want to retrieve the first part alone ("John"). this part I have done using the following code:
SUBSTR(employee_first_nm, 1, INSTR(employee_first_nm, ' '));
In some cases the string has only one word e.g. "sonia", this is where I got a problem. Here if there is only one word the function doesn't retrieve any value at all. But I want it to get the full string in this case i.e. "sonia".
Please help
Simply ensure there always will be a space;
... INSTR(employee_first_nm + ' ', ' ')
If there is already a space in the string then stuffing another one on the end makes no difference as the 1st one will be found, if there is no existing space adding one makes your logic work.
(Usually you also need to INSTR(..)-1 to strip the trailing space)
SUBSTR(employee_first_nm, 1, INSTR(CONCAT(employee_first_nm,' '), ' '));
Alternatively, you could check that INSTR returns a correct value:
(case when INSTR(employee_first_nm, ' ') > 0
then SUBSTR(employee_first_nm, 1, INSTR(employee_first_nm, ' '))
else employee_first_nm
end)
Why not harness the power of regular expressions here.
REGEXP_SUBSTR(employee_first_nm, '^[a-zA-Z]*\s?')
This should return the substring you desire.
You can try:
SUBSTRING(select employee_first_nm,0, charindex(' ',employee_first_nm)
Tested with:
select substring('John walker',0, charindex(' ','John walker more'))
returns -> 'John'

How can I remove leading and trailing quotes in SQL Server?

I have a table in a SQL Server database with an NTEXT column. This column may contain data that is enclosed with double quotes. When I query for this column, I want to remove these leading and trailing quotes.
For example:
"this is a test message"
should become
this is a test message
I know of the LTRIM and RTRIM functions but these workl only for spaces. Any suggestions on which functions I can use to achieve this.
I have just tested this code in MS SQL 2008 and validated it.
Remove left-most quote:
UPDATE MyTable
SET FieldName = SUBSTRING(FieldName, 2, LEN(FieldName))
WHERE LEFT(FieldName, 1) = '"'
Remove right-most quote: (Revised to avoid error from implicit type conversion to int)
UPDATE MyTable
SET FieldName = SUBSTRING(FieldName, 1, LEN(FieldName)-1)
WHERE RIGHT(FieldName, 1) = '"'
I thought this is a simpler script if you want to remove all quotes
UPDATE Table_Name
SET col_name = REPLACE(col_name, '"', '')
You can simply use the "Replace" function in SQL Server.
like this ::
select REPLACE('this is a test message','"','')
note: second parameter here is "double quotes" inside two single quotes and third parameter is simply a combination of two single quotes. The idea here is to replace the double quotes with a blank.
Very simple and easy to execute !
My solution is to use the difference in the the column values length compared the same column length but with the double quotes replaced with spaces and trimmed in order to calculate the start and length values as parameters in a SUBSTRING function.
The advantage of doing it this way is that you can remove any leading or trailing character even if it occurs multiple times whilst leaving any characters that are contained within the text.
Here is my answer with some test data:
SELECT
x AS before
,SUBSTRING(x
,LEN(x) - (LEN(LTRIM(REPLACE(x, '"', ' ')) + '|') - 1) + 1 --start_pos
,LEN(LTRIM(REPLACE(x, '"', ' '))) --length
) AS after
FROM
(
SELECT 'test' AS x UNION ALL
SELECT '"' AS x UNION ALL
SELECT '"test' AS x UNION ALL
SELECT 'test"' AS x UNION ALL
SELECT '"test"' AS x UNION ALL
SELECT '""test' AS x UNION ALL
SELECT 'test""' AS x UNION ALL
SELECT '""test""' AS x UNION ALL
SELECT '"te"st"' AS x UNION ALL
SELECT 'te"st' AS x
) a
Which produces the following results:
before after
-----------------
test test
"
"test test
test" test
"test" test
""test test
test"" test
""test"" test
"te"st" te"st
te"st te"st
One thing to note that when getting the length I only need to use LTRIM and not LTRIM and RTRIM combined, this is because the LEN function does not count trailing spaces.
I know this is an older question post, but my daughter came to me with the question, and referenced this page as having possible answers. Given that she's hunting an answer for this, it's a safe assumption others might still be as well.
All are great approaches, and as with everything there's about as many way to skin a cat as there are cats to skin.
If you're looking for a left trim and a right trim of a character or string, and your trailing character/string is uniform in length, here's my suggestion:
SELECT SUBSTRING(ColName,VAR, LEN(ColName)-VAR)
Or in this question...
SELECT SUBSTRING('"this is a test message"',2, LEN('"this is a test message"')-2)
With this, you simply adjust the SUBSTRING starting point (2), and LEN position (-2) to whatever value you need to remove from your string.
It's non-iterative and doesn't require explicit case testing and above all it's inline all of which make for a cleaner execution plan.
The following script removes quotation marks only from around the column value if table is called [Messages] and the column is called [Description].
-- If the content is in the form of "anything" (LIKE '"%"')
-- Then take the whole text without the first and last characters
-- (from the 2nd character and the LEN([Description]) - 2th character)
UPDATE [Messages]
SET [Description] = SUBSTRING([Description], 2, LEN([Description]) - 2)
WHERE [Description] LIKE '"%"'
You can use following query which worked for me-
For updating-
UPDATE table SET colName= REPLACE(LTRIM(RTRIM(REPLACE(colName, '"', ''))), '', '"') WHERE...
For selecting-
SELECT REPLACE(LTRIM(RTRIM(REPLACE(colName, '"', ''))), '', '"') FROM TableName
you could replace the quotes with an empty string...
SELECT AllRemoved = REPLACE(CAST(MyColumn AS varchar(max)), '"', ''),
LeadingAndTrailingRemoved = CASE
WHEN MyTest like '"%"' THEN SUBSTRING(Mytest, 2, LEN(CAST(MyTest AS nvarchar(max)))-2)
ELSE MyTest
END
FROM MyTable
Some UDFs for re-usability.
Left Trimming by character (any number)
CREATE FUNCTION [dbo].[LTRIMCHAR] (#Input NVARCHAR(max), #TrimChar CHAR(1) = ',')
RETURNS NVARCHAR(max)
AS
BEGIN
RETURN REPLACE(REPLACE(LTRIM(REPLACE(REPLACE(#Input,' ','¦'), #TrimChar, ' ')), ' ', #TrimChar),'¦',' ')
END
Right Trimming by character (any number)
CREATE FUNCTION [dbo].[RTRIMCHAR] (#Input NVARCHAR(max), #TrimChar CHAR(1) = ',')
RETURNS NVARCHAR(max)
AS
BEGIN
RETURN REPLACE(REPLACE(RTRIM(REPLACE(REPLACE(#Input,' ','¦'), #TrimChar, ' ')), ' ', #TrimChar),'¦',' ')
END
Note the dummy character '¦' (Alt+0166) cannot be present in the data (you may wish to test your input string, first, if unsure or use a different character).
To remove both quotes you could do this
SUBSTRING(fieldName, 2, lEN(fieldName) - 2)
you can either assign or project the resulting value
You can use TRIM('"' FROM '"this "is" a test"') which returns: this "is" a test
CREATE FUNCTION dbo.TRIM(#String VARCHAR(MAX), #Char varchar(5))
RETURNS VARCHAR(MAX)
BEGIN
RETURN SUBSTRING(#String,PATINDEX('%[^' + #Char + ' ]%',#String)
,(DATALENGTH(#String)+2 - (PATINDEX('%[^' + #Char + ' ]%'
,REVERSE(#String)) + PATINDEX('%[^' + #Char + ' ]%',#String)
)))
END
GO
Select dbo.TRIM('"this is a test message"','"')
Reference : http://raresql.com/2013/05/20/sql-server-trim-how-to-remove-leading-and-trailing-charactersspaces-from-string/
I use this:
UPDATE DataImport
SET PRIO =
CASE WHEN LEN(PRIO) < 2
THEN
(CASE PRIO WHEN '""' THEN '' ELSE PRIO END)
ELSE REPLACE(PRIO, '"' + SUBSTRING(PRIO, 2, LEN(PRIO) - 2) + '"',
SUBSTRING(PRIO, 2, LEN(PRIO) - 2))
END
Try this:
SELECT left(right(cast(SampleText as nVarchar),LEN(cast(sampleText as nVarchar))-1),LEN(cast(sampleText as nVarchar))-2)
FROM TableName