How to select specific row in SQL from a badly designed schema? - sql

I have a string in a column of a db schema I did not design, like this:
numbers column
--------------------
First: 1,2,33,34,43,5
Second: 1,2,3,4,5
Although I know this is not best practice, I still want to select the row that contains the value '3' itself, not '33', '34' or '43'.
How could I select only second row?
SELECT *
FROM tblNumbers
WHERE numbers like '%,3,%' OR numbers like '3,%' OR numbers like '%,3'
This query selected both rows. How can I get just the second row?
Thanks.

You should be storing the values in a separate table, with one row per original row and per number.
Sometimes, though, we are stuck with other people's bad data structures. If so, you can do what you want in this rather cumbersome way:
where replace(replace(numbers, '{', ','), '}', ',') like '%,3,%'
That is, turn the braces into commas so that every number in numbers, including the first and the last, is surrounded by delimiters.
Let me repeat, though: the proper way to store this data is in a separate table. If you need to store multiple values in a single column, then do some research on the XML and JSON formats (the xml data type has been available since SQL Server 2005; JSON support was added in SQL Server 2016).
EDIT:
Exactly the same idea applies to the plain comma-separated format; the code is just simpler:
where ',' + numbers + ',' like '%,3,%'
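A minimal, self-contained sketch of the comma-wrapping trick, assuming a table variable @tblNumbers that mirrors the question's two rows (the table-variable name and the Label column are hypothetical stand-ins, not part of the question's schema):

```sql
-- Hypothetical stand-in for the question's tblNumbers table
DECLARE @tblNumbers TABLE (Label VARCHAR(10), numbers VARCHAR(100));
INSERT INTO @tblNumbers (Label, numbers) VALUES
    ('First',  '1,2,33,34,43,5'),
    ('Second', '1,2,3,4,5');

-- Wrapping the column in commas means every value, including the first
-- and the last, is surrounded by delimiters, so a single pattern covers
-- the start, middle and end of the list.
SELECT Label, numbers
FROM @tblNumbers
WHERE ',' + numbers + ',' LIKE '%,3,%';
```

If the stored lists contain spaces after the commas (for example '1, 2, 3'), the pattern would also need REPLACE(numbers, ' ', '') applied before wrapping.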

Did you try it like this?
SELECT *
FROM tblNumbers
WHERE number = '3' OR ReportedGMY = '3'

If you are storing numbers as integers:
SELECT *
FROM tblNumbers
WHERE number = 3
If you are storing them as strings:
SELECT *
FROM tblNumbers
WHERE number LIKE '3'

It is bad practice to store comma-separated values in a column, and it should be avoided as much as possible. If you really need to do it, the lookup can be done with a user-defined function.
CREATE FUNCTION dbo.HasDigit (@String VARCHAR(MAX), @DigitToCheck INT, @Delimiter VARCHAR(10))
RETURNS BIT
AS
BEGIN
DECLARE @DelimiterPosition INT
DECLARE @Digit INT
DECLARE @ContainsDigit BIT = 0
-- Walk the list one element at a time, comparing each element as an integer
WHILE CHARINDEX(@Delimiter, @String) > 0
BEGIN
SELECT @DelimiterPosition = CHARINDEX(@Delimiter, @String)
SELECT @Digit = CAST(SUBSTRING(@String, 1, @DelimiterPosition - 1) AS INT)
IF(@Digit = @DigitToCheck)
BEGIN
SET @ContainsDigit = 1
END
SELECT @String = SUBSTRING(@String, @DelimiterPosition + 1, LEN(@String) - @DelimiterPosition)
END
-- The last element has no trailing delimiter, so check it separately
IF LEN(@String) > 0 AND CAST(@String AS INT) = @DigitToCheck
SET @ContainsDigit = 1
RETURN @ContainsDigit
END;
GO
CREATE TABLE TEST (
Numbers VARCHAR(MAX),
COLUMNNAME VARCHAR(MAX)
)
GO
INSERT INTO TEST VALUES('First:', '1,2,33,34,43,5')
INSERT INTO TEST VALUES('Second:', ' 1,2,3,4,5')
GO
SELECT * FROM TEST WHERE dbo.HasDigit(COLUMNNAME, 3, ',') = 1
Output:
--Numbers COLUMNNAME
--------- ----------------
--Second: 1,2,3,4,5
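On newer versions the loop can be replaced by the built-in STRING_SPLIT. This is a sketch assuming SQL Server 2016 or later with database compatibility level 130+; @Test is a hypothetical table-variable copy of the TEST table above:

```sql
-- Requires SQL Server 2016+ (compatibility level 130 or higher)
DECLARE @Test TABLE (Numbers VARCHAR(20), COLUMNNAME VARCHAR(100));
INSERT INTO @Test (Numbers, COLUMNNAME) VALUES
    ('First:',  '1,2,33,34,43,5'),
    ('Second:', ' 1,2,3,4,5');

SELECT t.Numbers, t.COLUMNNAME
FROM @Test AS t
WHERE EXISTS (
    SELECT 1
    FROM STRING_SPLIT(t.COLUMNNAME, ',') AS s
    -- trim each element so stray spaces (as in ' 1') do not break the match
    WHERE LTRIM(RTRIM(s.value)) = '3'
);
```

Because the comparison is done on whole elements, '33', '34' and '43' never match '3', and no user-defined function is needed.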

Related

How to replace all special characters in string

I have a table with the following columns:
dbo.SomeInfo
- Id
- Name
- InfoCode
Now I need to update the above table's InfoCode as
Update dbo.SomeInfo
Set InfoCode= REPLACE(Replace(RTRIM(LOWER(Name)),' ','-'),':','')
This replaces all spaces with - and lowercases the name.
When I check InfoCode, I see that some Names contain special characters, like
Cathe Friedrich''s Low Impact
coffeyfit-cardio-box-&-burn
Jillian Michaels: Cardio
Then I am manually writing the update sql against this as
Update dbo.SomeInfo
SET InfoCode= 'cathe-friedrichs-low-impact'
where Name ='Cathe Friedrich''s Low Impact '
Now, this solution is not realistic for me. I checked the following links related to Regex & others around it.
UPDATE and REPLACE part of a string
https://www.codeproject.com/Questions/456246/replace-special-characters-in-sql
But none of them is hitting the requirement.
What I need is that any character other than [a-z0-9] is replaced with -, and that InfoCode never contains consecutive hyphens (--).
The above Update sql has set some values of InfoCode as the-dancer's-workout®----starter-package
Some Names have value as
Sleek Technique™
The Dancer's-workout®
How can I write Update sql that could handle all such special characters?
Using NGrams8K (a community-written splitter function, not built into SQL Server) you could split the string into characters and then, rather than replacing every non-acceptable character, retain only certain ones:
SELECT (SELECT '' + CASE WHEN N.token COLLATE Latin1_General_BIN LIKE '[A-Za-z0-9]' THEN token ELSE '-' END
FROM dbo.NGrams8k(V.S,1) N
ORDER BY position
FOR XML PATH(''))
FROM (VALUES('Sleek Technique™'),('The Dancer''s-workout®'))V(S);
I use COLLATE here because, under my instance's default collation, the '™' is not handled as expected by the pattern, so I switch to a binary collation. You may want to use COLLATE again to switch the string back to its original collation outside of the subquery.
This approach is fully inlinable:
First we need a mock-up table with some test data:
DECLARE @SomeInfo TABLE (Id INT IDENTITY, InfoCode VARCHAR(100));
INSERT INTO @SomeInfo (InfoCode) VALUES
('Cathe Friedrich''s Low Impact')
,('coffeyfit-cardio-box-&-burn')
,('Jillian Michaels: Cardio')
,('Sleek Technique™')
,('The Dancer''s-workout®');
--This is the query
WITH cte AS
(
SELECT 1 AS position
,si.Id
,LOWER(si.InfoCode) AS SourceText
,SUBSTRING(LOWER(si.InfoCode),1,1) AS OneChar
FROM @SomeInfo si
UNION ALL
SELECT cte.position +1
,cte.Id
,cte.SourceText
,SUBSTRING(LOWER(cte.SourceText),cte.position+1,1) AS OneChar
FROM cte
WHERE position < DATALENGTH(SourceText)
)
,Cleaned AS
(
SELECT cte.Id
,(
SELECT CASE WHEN ASCII(cte2.OneChar) BETWEEN 65 AND 90 --A-Z
OR ASCII(cte2.OneChar) BETWEEN 97 AND 122--a-z
OR ASCII(cte2.OneChar) BETWEEN 48 AND 57 --0-9
--You can easily add more ranges
THEN cte2.OneChar ELSE '-'
--You can easily nest another CASE to deal with special characters like the single quote in your examples...
END
FROM cte AS cte2
WHERE cte2.Id=cte.Id
ORDER BY cte2.position
FOR XML PATH('')
) AS normalised
FROM cte
GROUP BY cte.Id
)
,NoDoubleHyphens AS
(
SELECT REPLACE(REPLACE(REPLACE(normalised,'-','<>'),'><',''),'<>','-') AS normalised2
FROM Cleaned
)
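SELECT CASE WHEN RIGHT(normalised2,1)='-' THEN SUBSTRING(normalised2,1,LEN(normalised2)-1) ELSE normalised2 END AS FinalResult
FROM NoDoubleHyphens
OPTION (MAXRECURSION 0); -- lift the default limit of 100 recursion levels (one level per character)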
The first CTE will recursively (well, rather iteratively) traverse the string character by character and return a very slim set with one row per character.
The second CTE will then GROUP BY Id. This allows for a correlated sub-query, where the actual check is performed using ASCII ranges. FOR XML PATH('') is used to re-concatenate the string. With SQL Server 2017+ I'd suggest using STRING_AGG() instead.
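For reference, a sketch of the STRING_AGG() variant of the re-concatenation step, assuming SQL Server 2017+; @Chars is a hypothetical stand-in for the one-row-per-character output of the first CTE:

```sql
-- SQL Server 2017+ only
DECLARE @Chars TABLE (Id INT, position INT, OneChar CHAR(1));
INSERT INTO @Chars (Id, position, OneChar) VALUES
    (1, 1, 'a'), (1, 2, '-'), (1, 3, 'b'),
    (2, 1, 'x'), (2, 2, 'y');

-- WITHIN GROUP (ORDER BY position) keeps the characters in their original order,
-- which is what ORDER BY inside the FOR XML PATH('') sub-query does
SELECT Id,
       STRING_AGG(OneChar, '') WITHIN GROUP (ORDER BY position) AS normalised
FROM @Chars
GROUP BY Id;
```

Unlike FOR XML PATH(''), STRING_AGG does not entitize characters such as & or <, so no extra decoding step is needed.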
The third CTE uses a well-known trick to get rid of multiple occurrences of a character. Take any two characters which will never occur in your string; I use < and >. A string like a--b---c will come back as a<><>b<><><>c. After replacing >< with nothing we get a<>b<>c, and replacing <> with - gives a-b-c. Well, that's it...
The final SELECT cuts away a trailing hyphen. If needed, you can add similar logic to get rid of a leading hyphen. With v2017+, TRIM('-' FROM normalised2) makes this easier.
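The hyphen-collapsing trick can be tried on its own with a single variable; @s and @collapsed are illustrative names, and the sketch assumes the input never contains < or > itself:

```sql
DECLARE @s VARCHAR(100) = 'a--b---c';
DECLARE @collapsed VARCHAR(200);

-- 1) every '-' becomes '<>'          : a<><>b<><><>c
-- 2) each '><' pair is removed       : a<>b<>c
-- 3) each surviving '<>' becomes '-' : a-b-c
SET @collapsed = REPLACE(REPLACE(REPLACE(@s, '-', '<>'), '><', ''), '<>', '-');

SELECT @collapsed AS collapsed;
```

The same three nested REPLACE calls are what the NoDoubleHyphens CTE applies to every row.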
The result
cathe-friedrich-s-low-impact
coffeyfit-cardio-box-burn
jillian-michaels-cardio
sleek-technique
the-dancer-s-workout
You can create a user-defined function for something like that and then use it in the update.
CREATE FUNCTION [dbo].LowerDashString (@str varchar(255))
RETURNS varchar(255)
AS
BEGIN
DECLARE @result varchar(255);
DECLARE @chr varchar(1);
DECLARE @pos int;
SET @result = '';
SET @pos = 1;
-- lowercase the input and remove the single-quotes
SET @str = REPLACE(LOWER(@str),'''','');
-- loop through the characters
-- while replacing anything that's not a letter or digit with a dash
WHILE @pos <= LEN(@str)
BEGIN
SET @chr = SUBSTRING(@str, @pos, 1)
IF @chr LIKE '[a-z0-9]' SET @result += @chr;
ELSE SET @result += '-';
SET @pos += 1;
END;
-- SET @result = TRIM('-' FROM @result); -- SqlServer 2017 and beyond
-- multiple dashes to one dash
WHILE @result LIKE '%--%' SET @result = REPLACE(@result,'--','-');
RETURN @result;
END;
GO
Example snippet using the function:
-- using a table variable for demonstration purposes
declare @SomeInfo table (Id int primary key identity(1,1) not null, InfoCode varchar(100) not null);
-- sample data
insert into @SomeInfo (InfoCode) values
('Cathe Friedrich''s Low Impact'),
('coffeyfit-cardio-box-&-burn'),
('Jillian Michaels: Cardio'),
('Sleek Technique™'),
('The Dancer''s-workout®');
update @SomeInfo
set InfoCode = dbo.LowerDashString(InfoCode)
-- rows with disallowed characters, or with uppercase letters (compared case-sensitively)
where (InfoCode LIKE '%[^a-z0-9-]%' OR InfoCode COLLATE Latin1_General_CS_AS != LOWER(InfoCode));
select *
from @SomeInfo;
Result:
Id InfoCode
-- -----------------------------
1 cathe-friedrichs-low-impact
2 coffeyfit-cardio-box-burn
3 jillian-michaels-cardio
4 sleek-technique-
5 the-dancers-workout-
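The trailing dashes in rows 4 and 5 come from the ™ and ® characters at the end of the names. A minimal SQL Server 2008-compatible sketch for stripping them, assuming runs of dashes have already been collapsed to one (as the function does); @code is an illustrative variable, and these lines could equally go just before RETURN in the function:

```sql
DECLARE @code VARCHAR(255) = '-sleek-technique-';

-- strip a single leading dash and a single trailing dash
IF LEFT(@code, 1) = '-' SET @code = STUFF(@code, 1, 1, '');
IF RIGHT(@code, 1) = '-' SET @code = LEFT(@code, LEN(@code) - 1);
-- On SQL Server 2017+ the two lines above can be replaced with:
-- SET @code = TRIM('-' FROM @code);

SELECT @code AS InfoCode;
```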

Extracting specific column values embedded within composite Strings of codes

I am trying to create a piece of code in sql server 2008 that will grab specific values from each distinct string within my dbo table. The ultimate goal is to make a drop down box within Visual Studio so that one can choose all lines from the database that contain a specific product code (see definition of product code below). Example strings:
in_0314_95pf_500_w_0315
in_0314_500_95pf_0315_w
The part of these strings I am wishing to identify is the 3 digit numeric code (in this case let us call it product code) that appears once within each string. There are roughly 300 different product codes.
The problem is that these product code values do not appear in the same position within each unique string. Hence, I am having a hard time determining the product code because I can't use substring, charindex, like, etc.
Any ideas? Any help is MUCH appreciated.
This can be done with PATINDEX:
DECLARE @s NVARCHAR(100) = 'in_0314_95pf_500_w_0315'
SELECT SUBSTRING(@s, PATINDEX('%[_][0-9][0-9][0-9][_]%', @s) + 1, 3)
Output:
500
If there are no underscores then:
SELECT SUBSTRING(@s, PATINDEX('%[^0-9][0-9][0-9][0-9][^0-9]%', @s) + 1, 3)
This means 3 digits between any symbols that are not digits.
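One caveat: the pattern needs a non-digit on both sides, so a code at the very start or end of the string is missed (PATINDEX returns 0 and SUBSTRING then returns the first three characters). A sketch that pads the string first and returns NULL when no code is found; @padded, @pos and @code are illustrative names:

```sql
DECLARE @s VARCHAR(100) = 'in_0314_95pf_w_500';   -- product code at the very end
-- Pad with a non-digit on both ends so a code at the start or end
-- still has a non-digit neighbour on each side.
DECLARE @padded VARCHAR(102) = '_' + @s + '_';
DECLARE @pos INT = PATINDEX('%[^0-9][0-9][0-9][0-9][^0-9]%', @padded);
-- NULL instead of garbage when the pattern is not found
DECLARE @code CHAR(3) = CASE WHEN @pos > 0 THEN SUBSTRING(@padded, @pos + 1, 3) END;

SELECT @code AS ProductCode;
```

For a table, the same expression works with ColumnName in place of @s, for example '_' + ColumnName + '_'.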
EDIT:
Apply to table like:
SELECT SUBSTRING(ColumnName, PATINDEX('%[^0-9][0-9][0-9][0-9][^0-9]%', ColumnName) + 1, 3)
FROM TableName
One approach is to use a string-splitting table function, such as the one below, which breaks the string up into its components. You can then filter the components based on your criteria:
SELECT Name
FROM dbo.splitstring('in_0314_95pf_500_w_0315', '_')
WHERE ISNUMERIC(Name) = 1 AND LEN(Name) = 3;
I've amended the function slightly to accept the delimiter as a parameter.
CREATE FUNCTION dbo.splitstring ( @stringToSplit VARCHAR(MAX), @delimiter VARCHAR(50))
RETURNS
@returnList TABLE ([Name] [nvarchar] (500))
AS
BEGIN
DECLARE @name NVARCHAR(255)
DECLARE @pos INT
WHILE CHARINDEX(@delimiter, @stringToSplit) > 0
BEGIN
SELECT @pos = CHARINDEX(@delimiter, @stringToSplit)
-- everything before the delimiter is the next element
SELECT @name = SUBSTRING(@stringToSplit, 1, @pos - 1)
INSERT INTO @returnList
SELECT @name
-- drop the element and the whole delimiter, then continue with the rest
SELECT @stringToSplit = SUBSTRING(@stringToSplit, @pos + LEN(@delimiter),
LEN(@stringToSplit) - @pos)
END
INSERT INTO @returnList
SELECT @stringToSplit
RETURN
END
To apply this to your table, use CROSS APPLY (Single Delimiter):
SELECT mt.Name, x.Name AS ProductCode
FROM MyTable mt
CROSS APPLY dbo.splitstring(mt.Name, '_') x
WHERE ISNUMERIC(x.Name) = 1 AND LEN(x.Name) = 3
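Be careful with ISNUMERIC here: it returns 1 for anything convertible to any numeric type, so three-character tokens such as '1e5' (float notation) or '$12' (money) also pass the filter above. A sketch comparing it with a strict three-digit LIKE pattern on a few made-up tokens:

```sql
SELECT v.token,
       ISNUMERIC(v.token) AS isnumeric_result,
       CASE WHEN v.token LIKE '[0-9][0-9][0-9]' THEN 1 ELSE 0 END AS is_three_digits
FROM (VALUES ('500'), ('1e5'), ('$12'), ('95pf'), ('0314')) AS v(token);
```

In the CROSS APPLY query, WHERE x.Name LIKE '[0-9][0-9][0-9]' would replace both the ISNUMERIC and the LEN checks.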
Update, Multiple Delimiters
I guess the real underlying problem is that ultimately the product codes need to be normalized out of the composite key (e.g. add a distinct ProductId or ProductCode column to the same table), derived using a query like this, and then stored back in the table via an update. Reverse engineering the product codes out of the string appears to be a trial and error process.
Nonetheless, you can continue to keep passing the split strings through further splitting functions (one per each type of delimiter), before applying your final discriminating filter:
SELECT *
FROM MyTable mt
CROSS APPLY dbo.splitstring(mt.Name, 'test') y -- First alias
CROSS APPLY dbo.splitstring(y.Name, '_') x -- Reference the preceding alias
WHERE ISNUMERIC(x.Name) = 1 AND LEN(x.Name) = 3; -- Must reference the last alias (x)
Note that the splitstring function has again been changed to accommodate multi-character delimiters.
If you have a table (or can generate an inline view) of the product codes, you can join the list of long strings to the product codes with a LIKE clause.
Create Table longcodes (
longcode varchar(20)
)
Create Table products (
prodCode char(3)
)
insert products values('100')
insert products values('111')
insert products values('123')
insert longcodes values ('abc_a_100_test')
insert longcodes values ('asdf_111_bob')
insert longcodes values ('in_0314_123_95pf')
insert longcodes values ('f_100_u')
insert longcodes values ('hihi_111_bye')
insert longcodes values ('in_123_0314_95pf')
insert longcodes values ('a_b__c_d_100_efg')
select *
from products p
join longcodes l on l.longcode like '%[_]' + p.prodCode + '[_]%' -- [_] is a literal underscore; a bare _ matches any character
And they get aligned with the product codes like this:
prodCode longcode
100 abc_a_100_test
100 f_100_u
100 a_b__c_d_100_efg
111 asdf_111_bob
111 hihi_111_bye
123 in_0314_123_95pf
123 in_123_0314_95pf
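In LIKE the underscore is itself a single-character wildcard, so an unescaped pattern such as '%_100_%' can also match a code embedded in a longer number. A quick check on a made-up string shows the difference between _ and the bracketed [_]:

```sql
SELECT CASE WHEN 'in_1000_w' LIKE '%_100_%'     THEN 1 ELSE 0 END AS unescaped,  -- _ matches any character
       CASE WHEN 'in_1000_w' LIKE '%[_]100[_]%' THEN 1 ELSE 0 END AS escaped;    -- [_] matches only '_'
```

If a code can appear at the very start or end of a long code, padding helps: '_' + l.longcode + '_' LIKE '%[_]' + p.prodCode + '[_]%'.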
EDIT: Seeing the developments in the other answer, you can simplify the like clause to
like '%' + p.prodCode + '%'
and just deal with the fact that you have a much greater chance of a single composite string producing multiple matches.

Sort nvarchar in SQL Server 2008

I have a table with this data in SQL Server :
Id
=====
1
12e
5
and I want to order this data like this:
id
====
1
5
12e
My id column is of type nvarchar(50) and I can't convert it to int.
Is this possible that I sort the data in this way?
As a general rule, if you ever find yourself manipulating parts of columns, you're almost certainly doing it wrong.
If your ID is made up of a numeric and alpha component and you need to fiddle with just the numeric bit, make it two columns and save yourself some angst. In that case, you have an integral id_numeric and a varchar id_alpha and your query is simply:
select cast(id_numeric as varchar(20)) + isnull(id_alpha, '') as id
from mytable
order by id_numeric asc
Or, if you really must store that as a single column, create extra columns to hold the individual parts and use those for sorting and selection. But, in order to mitigate the problems in having duplicate data in a row, use triggers to ensure the data remains consistent:
select id
from mytable
order by id_numeric asc
You usually don't want to have to do this splitting on every select since that never scales well. By doing it in an insert/update trigger, you only do the splitting when needed (i.e., when the data changes) and this cost is amortised across all the selects. That's a good idea because, in the vast majority of cases, databases are read far more often than they're written.
And it's perfectly normal practice to revert to lesser levels of normalisation for performance reasons, provided that you understand and mitigate the consequences.
I'd actually use something along the lines of this function, though be warned that it's not going to be super-speedy. I've modified that function to return only the numbers:
CREATE FUNCTION dbo.UDF_ParseNumericChars
(
@string VARCHAR(8000)
)
RETURNS VARCHAR(8000)
WITH SCHEMABINDING
AS
BEGIN
DECLARE @IncorrectCharLoc SMALLINT
-- find the first non-digit, remove it, and repeat until only digits remain
SET @IncorrectCharLoc = PATINDEX('%[^0-9]%', @string)
WHILE @IncorrectCharLoc > 0
BEGIN
SET @string = STUFF(@string, @IncorrectCharLoc, 1, '')
SET @IncorrectCharLoc = PATINDEX('%[^0-9]%', @string)
END
RETURN @string
END
GO
Once you create that function, then you can do your sort like this:
SELECT YourMixedColumn
FROM YourTable
ORDER BY CONVERT(INT, dbo.UDF_ParseNumericChars(YourMixedColumn))
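If a function is not an option, a similar sort can be written inline. This sketch is a variation rather than the same logic: it sorts by the leading digits only (the function above keeps every digit, so '1a2' becomes 12), and it assumes the leading number fits in an INT; @t is a hypothetical table variable holding the question's sample data:

```sql
DECLARE @t TABLE (id NVARCHAR(50));
INSERT INTO @t (id) VALUES (N'1'), (N'12e'), (N'5');

-- PATINDEX finds the first non-digit (appending N'x' guarantees one exists);
-- LEFT keeps everything before it. An id with no leading digits yields ''
-- which CAST turns into 0. The id itself breaks ties.
SELECT id
FROM @t
ORDER BY CAST(LEFT(id, PATINDEX(N'%[^0-9]%', id + N'x') - 1) AS INT), id;
```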
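It can be sorted with the LEN function, using the value itself as a tie-breaker. This works for the sample data, but not in general (for example, '123' would sort before '12e'):
create table #temp (id nvarchar(50) null)
select * from #temp order by LEN(id), id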

What is the simplest/best way to remove substrings at the end of a string?

I have a function that normalizes addresses. What I would like to do now is remove any of the strings in a limited, specified list if they occur at the end of the string. Let's say the strings I want to remove are 'st', 'ave', 'rd', 'dr', 'ct'... If the string ends with any of these strings, I want to remove them. What is the best way to accomplish this, using T-SQL (this will not be part of a select statement)?
Edit:
This is a function that accepts one address and formats it. I would like to inline the code, and the list, but in the simplest way possible. For example, some code that I've been playing with is:
if @address LIKE '%st'
SET @address = substring(@address, 1, PatIndex('%st', @address) - 1)
Is this a good method? How can I put it in some sort of loop so I can repeat this code with different values (other than st)?
Adding the values to be trimmed to a new table allows you to:
- easily add new values
- use this table to clean up addresses
SQL Statement
DECLARE @Input VARCHAR(32)
SET @Input = 'Streetsstaverddrad'
DECLARE @Trim TABLE (Value VARCHAR(32))
INSERT INTO @Trim
SELECT 'st'
UNION ALL SELECT 'ave'
UNION ALL SELECT 'rd'
UNION ALL SELECT 'dr'
UNION ALL SELECT 'ad'
-- keep stripping while the string still ends with one of the values
WHILE EXISTS (
SELECT *
FROM (
SELECT [Adres] = @Input
) i
INNER JOIN @Trim t ON i.Adres LIKE '%' + t.Value
)
BEGIN
SELECT @Input = SUBSTRING(Adres, 1, LEN(Adres) - LEN(t.Value))
FROM (
SELECT [Adres] = @Input
) i
INNER JOIN @Trim t ON i.Adres LIKE '%' + t.Value
END
SELECT @Input
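The loop above keeps stripping as long as any value matches, and it also cuts suffixes out of the middle of a word (an address ending in 'West' would lose its 'st'). A variation, not the method above, that removes at most one suffix and only when it is a separate word; @address and @Suffix are hypothetical names, and matching follows the column collation (case-insensitive by default):

```sql
DECLARE @address VARCHAR(100) = '123 West st';
DECLARE @Suffix TABLE (Value VARCHAR(10));
INSERT INTO @Suffix (Value) VALUES ('st'), ('ave'), ('rd'), ('dr'), ('ct');

-- Only a suffix preceded by a space counts, so 'West' is not cut down to 'We'.
-- The extra -1 also removes that space. If nothing matches, no row is
-- returned and @address keeps its value.
SELECT TOP (1) @address = LEFT(@address, LEN(@address) - LEN(s.Value) - 1)
FROM @Suffix AS s
WHERE @address LIKE '% ' + s.Value
ORDER BY LEN(s.Value) DESC;

SELECT @address AS address;
```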
In SQL Server 2005 and later it is possible to define a CLR user-defined function that enables regular-expression matching. You would need to define a function which strips the trailing strings. A regex to match the scenarios you mention would be something like...
\s+(st|ave|rd|dr|ct)\s*$

SQL Server: any equivalent of strpos()?

I'm dealing with an annoying database where one field contains what really should be stored two separate fields. So the column is stored something like "The first string~#~The second string", where "~#~" is the delimiter. (Again, I didn't design this, I'm just trying to fix it.)
I want a query to move this into two columns, that would look something like this:
UPDATE UserAttributes
SET str1 = SUBSTRING(Data, 1, STRPOS(Data, '~#~')),
str2 = SUBSTRING(Data, STRPOS(Data, '~#~')+3, LEN(Data)-(STRPOS(Data, '~#~')+3))
But I can't find any equivalent to strpos.
Use CHARINDEX:
Select CHARINDEX ('S','MICROSOFT SQL SERVER 2000')
Result: 6
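Applied to the question, a sketch of the UPDATE using CHARINDEX; @UserAttributes is a hypothetical table variable standing in for the real UserAttributes table. Note the - 1 when taking str1, otherwise the first character of the delimiter would be included:

```sql
DECLARE @UserAttributes TABLE (Data VARCHAR(200), str1 VARCHAR(100), str2 VARCHAR(100));
INSERT INTO @UserAttributes (Data) VALUES ('The first string~#~The second string');

UPDATE @UserAttributes
SET str1 = LEFT(Data, CHARINDEX('~#~', Data) - 1),               -- text before the delimiter
    str2 = SUBSTRING(Data, CHARINDEX('~#~', Data) + 3, LEN(Data))  -- text after it (3 = length of '~#~')
WHERE CHARINDEX('~#~', Data) > 0;                                  -- leave rows without a delimiter alone

SELECT str1, str2 FROM @UserAttributes;
```

Passing LEN(Data) as the SUBSTRING length is simply "to the end of the string"; SUBSTRING does not complain when the length runs past the end.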
The PatIndex function should give you the location of the pattern as a part of a string.
PATINDEX ( '%pattern%' , expression )
http://msdn.microsoft.com/en-us/library/ms188395.aspx
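If you need the delimited values as a table (one row per value), here is what I use:
create FUNCTION [dbo].[fncTableFromCommaString] (@strList varchar(8000))
RETURNS @retTable Table (intValue int) AS
BEGIN
DECLARE @intPos int -- int, not tinyint: comma positions can exceed 255 in a varchar(8000)
WHILE CHARINDEX(',',@strList) > 0
BEGIN
SET @intPos=CHARINDEX(',',@strList)
INSERT INTO @retTable (intValue) values (CONVERT(int, LEFT(@strList,@intPos-1)))
SET @strList = RIGHT(@strList, LEN(@strList)-@intPos)
END
IF LEN(@strList)>0
INSERT INTO @retTable (intValue) values (CONVERT(int, @strList))
RETURN
END
Just replace ',' in the function with your delimiter (or even parameterize it). For a multi-character delimiter such as '~#~', the RIGHT() call also has to skip the extra delimiter characters (LEN(@strList)-@intPos-2 for a three-character delimiter), and because every element goes through CONVERT(int, ...), the function only suits numeric lists as written.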