Extracting specific column values embedded within composite Strings of codes - sql

I am trying to create a piece of code in sql server 2008 that will grab specific values from each distinct string within my dbo table. The ultimate goal is to make a drop down box within Visual Studio so that one can choose all lines from the database that contain a specific product code (see definition of product code below). Example strings:
in_0314_95pf_500_w_0315
in_0314_500_95pf_0315_w
The part of these strings I am wishing to identify is the 3 digit numeric code (in this case let us call it product code) that appears once within each string. There are roughly 300 different product codes.
The problem is that these product code values do not appear in the same position within each unique string. Hence, I am having a hard time determining the product code because I can't use substring, charindex, like, etc.
Any ideas? Any help is MUCH appreciated.

This can be done with PATINDEX:
DECLARE #s NVARCHAR(100) = 'in_0314_95pf_500_w_0315'
SELECT SUBSTRING(#s, PATINDEX('%[_][0-9][0-9][0-9][_]%', #s) + 1, 3)
Output:
500
If there are no underscores then:
SELECT SUBSTRING(#s, PATINDEX('%[^0-9][0-9][0-9][0-9][^0-9]%', #s) + 1, 3)
This means 3 digits between any symbols that are not digits.
EDIT:
Apply to table like:
SELECT SUBSTRING(ColumnName, PATINDEX('%[^0-9][0-9][0-9][0-9][^0-9]%', ColumnName) + 1, 3)
FROM TableName

One approach is to use a String splitting table function like this one which breaks the string up into its components. You can then filter the components based on your criteria:
SELECT Name
FROM dbo.splitstring('in_0314_95pf_500_w_0315', '_')
WHERE ISNUMERIC(Name) = 1 AND LEN(Name) = 3;
I've amended the function slightly to accept the delimiter as a parameter.
CREATE FUNCTION dbo.splitstring ( #stringToSplit VARCHAR(MAX), #delimiter VARCHAR(50))
RETURNS
#returnList TABLE ([Name] [nvarchar] (500))
AS
BEGIN
DECLARE #name NVARCHAR(255)
DECLARE #pos INT
WHILE CHARINDEX(#delimiter, #stringToSplit) > 0
BEGIN
SELECT #pos = CHARINDEX(#delimiter, #stringToSplit)
SELECT #name = SUBSTRING(#stringToSplit, len(#delimiter), #pos-len(#delimiter))
INSERT INTO #returnList
SELECT #name
SELECT #stringToSplit = SUBSTRING(#stringToSplit, #pos+LEN(#delimiter),
LEN(#stringToSplit)-#pos)
END
INSERT INTO #returnList
SELECT #stringToSplit
RETURN
END
To apply this to your table, use CROSS APPLY (Single Delimiter):
SELECT mt.Name, x.Name AS ProductCode
FROM MyTable mt
CROSS APPLY dbo.splitstring(mt.Name, '_') x
WHERE ISNUMERIC(x.Name) = 1 AND LEN(x.Name) = 3
Update, Multiple Delimiters
I guess the real underlying problem is that ultimately the product codes need to be normalized out of the composite key (e.g. add a distinct ProductId or ProductCode column to the same table), derived using a query like this, and then stored back in the table via an update. Reverse engineering the product codes out of the string appears to be a trial and error process.
Nonetheless, you can continue to keep passing the split strings through further splitting functions (one per each type of delimiter), before applying your final discriminating filter:
SELECT *
FROM MyTable mt
CROSS APPLY dbo.splitstring(mt.Name, 'test') y -- First alias
CROSS APPLY dbo.splitstring(y.Name, '_') x -- Reference the preceding alias
WHERE ISNUMERIC(x.Name) = 1 AND LEN(x.Name) = 3; -- Must reference the last alias (x)
Note that the stringsplit function has again been changed to accommodate multicharacter delimiters.

If you have a table (or can generate in inline view) of the product codes, you can join the list of long strings to the product codes with a like clause.
Create Table longcodes (
longcode varchar(20)
)
Create Table products (
prodCode char(3)
)
insert products values('100')
insert products values('111')
insert products values('123')
insert longcodes values ('abc_a_100_test')
insert longcodes values ('asdf_111_bob')
insert longcodes values ('in_0314_123_95pf')
insert longcodes values ('f_100_u')
insert longcodes values ('hihi_111_bye')
insert longcodes values ('in_123_0314_95pf')
insert longcodes values ('a_b__c_d_100_efg')
select *
from products p
join longcodes l on l.longcode like '%_' + p.prodCode + '_%'
And they get aligned with the product codes like this:
prodCode longcode
100 abc_a_100_test
100 f_100_u
100 a_b__c_d_100_efg
111 asdf_111_bob
111 hihi_111_bye
123 in_0314_123_95pf
123 in_123_0314_95pf
EDIT: Seeing the developments in the other answer, you can simplify the like clause to
like p.prodCode
and just deal with the fact that you have a much greater chance of a single composite string producing multiple matches.

Related

How to manipulate comma-separated list in SQL Server

I have a list of values such as
1,2,3,4...
that will be passed into my SQL query.
I need to have these values stored in a table variable. So essentially I need something like this:
declare #t (num int)
insert into #t values (1),(2),(3),(4)...
Is it possible to do that formatting in SQL Server? (turning 1,2,3,4... into (1),(2),(3),(4)...
Note: I can not change what those values look like before they get to my SQL script; I'm stuck with that list. also it may not always be 4 values; it could 1 or more.
Edit to show what values look like: under normal circumstances, this is how it would work:
select t.pk
from a_table t
where t.pk in (#place_holder#)
#placeholder# is just a literal place holder. when some one would run the report, #placeholder# is replaced with the literal values from the filter of that report:
select t.pk
from a_table t
where t.pk in (1,2,3,4) -- or whatever the user selects
t.pk is an int
note: doing
declare #t as table (
num int
)
insert into #t values (#Placeholder#)
does not work.
Your description is a bit ridicuolus, but you might give this a try:
Whatever you mean with this
I see what your trying to say; but if I type out '#placeholder#' in the script, I'll end up with '1','2','3','4' and not '1,2,3,4'
I assume this is a string with numbers, each number between single qoutes, separated with a comma:
DECLARE #passedIn VARCHAR(100)='''1'',''2'',''3'',''4'',''5'',''6'',''7''';
SELECT #passedIn; -->: '1','2','3','4','5','6','7'
Now the variable #passedIn holds exactly what you are talking about
I'll use a dynamic SQL-Statement to insert this in a temp-table (declared table variable would not work here...)
CREATE TABLE #tmpTable(ID INT);
DECLARE #cmd VARCHAR(MAX)=
'INSERT INTO #tmpTable(ID) VALUES (' + REPLACE(SUBSTRING(#passedIn,2,LEN(#passedIn)-2),''',''','),(') + ');';
EXEC (#cmd);
SELECT * FROM #tmpTable;
GO
DROP TABLE #tmpTable;
UPDATE 1: no dynamic SQL necessary, all ad-hoc...
You can get the list of numbers as derived table in a CTE easily.
This can be used in a following statement like WHERE SomeID IN(SELECT ID FROM MyIDs) (similar to this: dynamic IN section )
WITH MyIDs(ID) AS
(
SELECT A.B.value('.','int') AS ID
FROM
(
SELECT CAST('<x>' + REPLACE(SUBSTRING(#passedIn,2,LEN(#passedIn)-2),''',''','</x><x>') + '</x>' AS XML) AS AsXml
) as tbl
CROSS APPLY tbl.AsXml.nodes('/x') AS A(B)
)
SELECT * FROM MyIDs
UPDATE 2:
And to answer your question exactly:
With this following the CTE
insert into #t(num)
SELECT ID FROM MyIDs
... you would actually get your declared table variable filled - if you need it later...

I need to use a comma delimited string for SQL AND condition on each word

So I have a parameter. Lets say:
#Tags = 'Red,Large,New'
I have a field in my table called [Tags]
Lets say one particular row that field contains "Red,Large,New,SomethingElse,AndSomethingElse"
How can I break that apart to basically achieve the following logic, which I'll type for understanding but I know is wrong:
SELECT * FROM MyTable WHERE
Tags LIKE 'FirstWordInString'
AND Tags Like 'SecondWordInString'
AND Tags Like 'ThirdWordInString'
But it knows where to stop? Knows if there's just one word? Or two? Or three?
Workflow:
Someone clicks a tag and the dataset is filtered by that tag. They click another and the tag is appended to the search box and the dataset is then filtered by both of those tags, etc.
Thank you!
Update:
This is a product based situation.
When a product is created the creator can enter search tags separated by commas.
When the product is inserted, the search tags are inserted into a separate table called ProductTags (ProductID, TagID)
So, in the Product table, I have a field that has them separated by string for display purposes in the application side, and these same tags are also found in the ProductTag table separated by row based on ProductID.
What I'm not understanding is how to place the condition in the select that delivers results if all the tags in the search exist for that product.
I thought it would be easier to just use the comma separated field, but perhaps I should be using the corresponding table (ProductTags)
One relational division approach using the ProductTags table is
DECLARE #TagsToSearch TABLE (
TagId VARCHAR(50) PRIMARY KEY )
INSERT INTO #TagsToSearch
VALUES ('Red'),
('Large'),
('New')
SELECT PT.ProductID
FROM ProductTags PT
JOIN #TagsToSearch TS
ON TS.TagId = PT.TagId
GROUP BY PT.ProductID
HAVING COUNT(*) = (SELECT COUNT(*)
FROM #TagsToSearch)
The #TagsToSearch table would ideally be a table valued parameter passed directly from your application but could also be populated by using a split function against your comma delimited parameter.
You just need to add wildcards:
SELECT *
FROM MyTable
WHERE ','+Tags+',' LIKE '%,FirstWordInString,%'
AND ','+Tags+','Like '%,SecondWordInString,%'
AND ','+Tags+','Like '%,ThirdWordInString,%'
UPDATE:
I understand now the problem is that you have a variable you are trying to match the tags to. The variable has the string list of tags. If you split the list of user selected tags using a split string function, or can get the user selected tags from input form separately, then you can just use:
SELECT *
FROM MyTable
WHERE ',,'+Tags+',' LIKE '%,'+#tag1+',%'
AND ',,'+Tags+',' LIKE '%,'+#tag2+',%'
AND ',,'+Tags+',' LIKE '%,'+#tag3+',%'
AND ',,'+Tags+',' LIKE '%,'+#tag4+',%'
etc.
This will work for any number of entered tags, as an empty #tag would result in the double comma appended to the tags field matching '%,,%'.
From codeproject:
-- =============================================
-- Author: Md. Marufuzzaman
-- Create date:
-- Description: Split an expression.
-- Note: If you are using SQL Server 2000, You need to change the
-- length (MAX) to your maximum expression length of each datatype.
-- =============================================
/*
SELECT * FROM [dbo].[SPLIT] (';','I love codeProject;!!!;Your development resources')
*/
CREATE FUNCTION [dbo].[SPLIT]
( #DELIMITER VARCHAR(5),
#LIST VARCHAR(MAX)
)
RETURNS #TABLEOFVALUES TABLE
( ROWID SMALLINT IDENTITY(1,1),
[VALUE] VARCHAR(MAX)
)
AS
BEGIN
DECLARE #LENSTRING INT
WHILE LEN( #LIST ) > 0
BEGIN
SELECT #LENSTRING =
(CASE CHARINDEX( #DELIMITER, #LIST )
WHEN 0 THEN LEN( #LIST )
ELSE ( CHARINDEX( #DELIMITER, #LIST ) -1 )
END
)
INSERT INTO #TABLEOFVALUES
SELECT SUBSTRING( #LIST, 1, #LENSTRING )
SELECT #LIST =
(CASE ( LEN( #LIST ) - #LENSTRING )
WHEN 0 THEN ''
ELSE RIGHT( #LIST, LEN( #LIST ) - #LENSTRING - 1 )
END
)
END
RETURN
END
http://www.codeproject.com/Articles/38843/An-Easy-But-Effective-Way-to-Split-a-String-using

SQL How to Split One Column into Multiple Variable Columns

I am working on MSSQL, trying to split one string column into multiple columns. The string column has numbers separated by semicolons, like:
190230943204;190234443204;
However, some rows have more numbers than others, so in the database you can have
190230943204;190234443204;
121340944534;340212343204;134530943204
I've seen some solutions for splitting one column into a specific number of columns, but not variable columns. The columns that have less data (2 series of strings separated by commas instead of 3) will have nulls in the third place.
Ideas? Let me know if I must clarify anything.
Splitting this data into separate columns is a very good start (coma-separated values are an heresy). However, a "variable number of properties" should typically be modeled as a one-to-many relationship.
CREATE TABLE main_entity (
id INT PRIMARY KEY,
other_fields INT
);
CREATE TABLE entity_properties (
main_entity_id INT PRIMARY KEY,
property_value INT,
FOREIGN KEY (main_entity_id) REFERENCES main_entity(id)
);
entity_properties.main_entity_id is a foreign key to main_entity.id.
Congratulations, you are on the right path, this is called normalisation. You are about to reach the First Normal Form.
Beweare, however, these properties should have a sensibly similar nature (ie. all phone numbers, or addresses, etc.). Do not to fall into the dark side (a.k.a. the Entity-Attribute-Value anti-pattern), and be tempted to throw all properties into the same table. If you can identify several types of attributes, store each type in a separate table.
If these are all fixed length strings (as in the question), then you can do the work fairly simply (at least relative to other solutions):
select substring(col, 1+13*(n-1), 12) as val
from t join
(select 1 as n union all select union all select 3
) n
on len(t.col) <= 13*n.n
This is a useful hack if all the entries are the same size (not so easy if they are of different sizes). Do, however, think about the data structure because semi-colon (or comma) separated list is not a very good data structure.
IF I were you, I would create a simple function that is dividing values separated with ';' like this:
IF EXISTS (SELECT * FROM sysobjects WHERE id = object_id(N'fn_Split_List') AND xtype IN (N'FN', N'IF', N'TF'))
BEGIN
DROP FUNCTION [dbo].[fn_Split_List]
END
GO
SET ANSI_NULLS ON
GO
SET QUOTED_IDENTIFIER ON
GO
CREATE FUNCTION [dbo].[fn_Split_List](#List NVARCHAR(512))
RETURNS #ResultRowset TABLE ( [Value] NVARCHAR(128) PRIMARY KEY)
AS
BEGIN
DECLARE #XML xml = N'<r><![CDATA[' + REPLACE(#List, ';', ']]></r><r><![CDATA[') + ']]></r>'
INSERT INTO #ResultRowset ([Value])
SELECT DISTINCT RTRIM(LTRIM(Tbl.Col.value('.', 'NVARCHAR(128)')))
FROM #xml.nodes('//r') Tbl(Col)
RETURN
END
GO
Than simply called in this way:
SET NOCOUNT ON
GO
DECLARE #RawData TABLE( [Value] NVARCHAR(256))
INSERT INTO #RawData ([Value] )
VALUES ('1111111;22222222')
,('3333333;113113131')
,('776767676')
,('89332131;313131312;54545353')
SELECT SL.[Value]
FROM #RawData AS RD
CROSS APPLY [fn_Split_List] ([Value]) as SL
SET NOCOUNT OFF
GO
The result is as the follow:
Value
1111111
22222222
113113131
3333333
776767676
313131312
54545353
89332131
Anyway, the logic in the function is not complicated, so you can easily put it anywhere you need.
Note: There is not limitations of how many values you will have separated with ';', but there are length limitation in the function that you can set to NVARCHAR(MAX) if you need.
EDIT:
As I can see, there are some rows in your example that will caused the function to return empty strings. For example:
number;number;
will return:
number
number
'' (empty string)
To clear them, just add the following where clause to the statement above like this:
SELECT SL.[Value]
FROM #RawData AS RD
CROSS APPLY [fn_Split_List] ([Value]) as SL
WHERE LEN(SL.[Value]) > 0

Splitting delimited values in a SQL column into multiple rows

I would really like some advice here, to give some background info I am working with inserting Message Tracking logs from Exchange 2007 into SQL. As we have millions upon millions of rows per day I am using a Bulk Insert statement to insert the data into a SQL table.
In fact I actually Bulk Insert into a temp table and then from there I MERGE the data into the live table, this is for test parsing issues as certain fields otherwise have quotes and such around the values.
This works well, with the exception of the fact that the recipient-address column is a delimited field seperated by a ; character, and it can be incredibly long sometimes as there can be many email recipients.
I would like to take this column, and split the values into multiple rows which would then be inserted into another table. Problem is anything I am trying is either taking too long or not working the way I want.
Take this example data:
message-id recipient-address
2D5E558D4B5A3D4F962DA5051EE364BE06CF37A3A5#Server.com user1#domain1.com
E52F650C53A275488552FFD49F98E9A6BEA1262E#Server.com user2#domain2.com
4fd70c47.4d600e0a.0a7b.ffff87e1#Server.com user3#domain3.com;user4#domain4.com;user5#domain5.com
I would like this to be formatted as followed in my Recipients table:
message-id recipient-address
2D5E558D4B5A3D4F962DA5051EE364BE06CF37A3A5#Server.com user1#domain1.com
E52F650C53A275488552FFD49F98E9A6BEA1262E#Server.com user2#domain2.com
4fd70c47.4d600e0a.0a7b.ffff87e1#Server.com user3#domain3.com
4fd70c47.4d600e0a.0a7b.ffff87e1#Server.com user4#domain4.com
4fd70c47.4d600e0a.0a7b.ffff87e1#Server.com user5#domain5.com
Does anyone have any ideas about how I can go about doing this?
I know PowerShell pretty well, so I tried in that, but a foreach loop even on 28K records took forever to process, I need something that will run as quickly/efficiently as possible.
Thanks!
If you are on SQL Server 2016+
You can use the new STRING_SPLIT function, which I've blogged about here, and Brent Ozar has blogged about here.
SELECT s.[message-id], f.value
FROM dbo.SourceData AS s
CROSS APPLY STRING_SPLIT(s.[recipient-address], ';') as f;
If you are still on a version prior to SQL Server 2016
Create a split function. This is just one of many examples out there:
CREATE FUNCTION dbo.SplitStrings
(
#List NVARCHAR(MAX),
#Delimiter NVARCHAR(255)
)
RETURNS TABLE
AS
RETURN (SELECT Number = ROW_NUMBER() OVER (ORDER BY Number),
Item FROM (SELECT Number, Item = LTRIM(RTRIM(SUBSTRING(#List, Number,
CHARINDEX(#Delimiter, #List + #Delimiter, Number) - Number)))
FROM (SELECT ROW_NUMBER() OVER (ORDER BY s1.[object_id])
FROM sys.all_objects AS s1 CROSS APPLY sys.all_objects) AS n(Number)
WHERE Number <= CONVERT(INT, LEN(#List))
AND SUBSTRING(#Delimiter + #List, Number, 1) = #Delimiter
) AS y);
GO
I've discussed a few others here, here, and a better approach than splitting in the first place here.
Now you can extrapolate simply by:
SELECT s.[message-id], f.Item
FROM dbo.SourceData AS s
CROSS APPLY dbo.SplitStrings(s.[recipient-address], ';') as f;
Also I suggest not putting dashes in column names. It means you always have to put them in [square brackets].
SQL Server 2016 include a new table function string_split(), similar to the previous solution.
The only requirement is Set compatibility level to 130 (SQL Server 2016)
You may use CROSS APPLY (available in SQL Server 2005 and above) and STRING_SPLIT function (available in SQL Server 2016 and above):
DECLARE #delimiter nvarchar(255) = ';';
-- create tables
CREATE TABLE MessageRecipients (MessageId int, Recipients nvarchar(max));
CREATE TABLE MessageRecipient (MessageId int, Recipient nvarchar(max));
-- insert data
INSERT INTO MessageRecipients VALUES (1, 'user1#domain.com; user2#domain.com; user3#domain.com');
INSERT INTO MessageRecipients VALUES (2, 'user#domain1.com; user#domain2.com');
-- insert into MessageRecipient
INSERT INTO MessageRecipient
SELECT MessageId, ltrim(rtrim(value))
FROM MessageRecipients
CROSS APPLY STRING_SPLIT(Recipients, #delimiter)
-- output results
SELECT * FROM MessageRecipients;
SELECT * FROM MessageRecipient;
-- delete tables
DROP TABLE MessageRecipients;
DROP TABLE MessageRecipient;
Results:
MessageId Recipients
----------- ----------------------------------------------------
1 user1#domain.com; user2#domain.com; user3#domain.com
2 user#domain1.com; user#domain2.com
and
MessageId Recipient
----------- ----------------
1 user1#domain.com
1 user2#domain.com
1 user3#domain.com
2 user#domain1.com
2 user#domain2.com
for table = "yelp_business", split the column categories values separated by ; into rows and display as category column.
SELECT unnest(string_to_array(categories, ';')) AS category
FROM yelp_business;

SQL How to find if all values from one field exist in another field in any order

I am trying to match data from an external source to an in house source. For example one table would have a field with a value of "black blue" and another table would have a field with a value of "blue black". I am trying to figure out how to check if all individual words in the first table are contained in a record the 2nd table in any order. It's not always two words that need to be compared it could be 3 or 4 as well. I know I could use a cursor and build dynamic sql substituting the space with the AND keywod and using the contains function but I'm hoping not to have to do that.
Any help would be much appreciated.
Try doing something like this: Split the data from the first table on the space into a temporary table variable. Then use CHARINDEX to determine if each word is contained in the second table's record. Then just do this for each word in the first record and if the count is the same as the successful checks then you know every word from the first record is used in the second.
Edit: Use a Split function such as:
CREATE FUNCTION dbo.Split (#sep char(1), #s varchar(512))
RETURNS table
AS
RETURN (
WITH Pieces(pn, start, stop) AS (
SELECT 1, 1, CHARINDEX(#sep, #s)
UNION ALL
SELECT pn + 1, stop + 1, CHARINDEX(#sep, #s, stop + 1)
FROM Pieces
WHERE stop > 0
)
SELECT pn,
SUBSTRING(#s, start, CASE WHEN stop > 0 THEN stop-start ELSE 512 END) AS s
FROM Pieces
)
Here's another method you could try, you could sample some simple attributes of your strings such as, length, number of spaces, etc.; then you could use a cross-join to create all of the possible string match combinations.
Then within your where-clause you can sort by matches, the final piece of which in this example is a check using the patindex() function to see if the sampled piece of the first string is in the second string.
-- begin sample table variable set up
declare #s table(
id int identity(1,1)
,string varchar(255)
,numSpace int
,numWord int
,lenString int
,firstPatt varchar(255)
);
declare #t table(
id int identity(1,1)
,string varchar(255)
,numSpace int
,numWord int
,lenString int
);
insert into #t(string)
values ('my name');
insert into #t(string)
values ('your name');
insert into #t(string)
values ('run and jump');
insert into #t(string)
values ('hello my name is');
insert into #s(string)
values ('name my');
insert into #s(string)
values ('name your');
insert into #s(string)
values ('jump and run');
insert into #s(string)
values ('my name is hello');
update #s
set numSpace = len(string)-len(replace(string,' ',''));
update #s
set numWord = len(string)-len(replace(string,' ',''))+1;
update #s
set lenString = len(string);
update #s
set firstPatt = rtrim(substring(string,1,charindex(' ',string,0)));
update #t
set numSpace = len(string)-len(replace(string,' ',''));
update #t
set numWord = len(string)-len(replace(string,' ',''))+1;
update #t
set lenString = len(string);
-- end sample table variable set up
-- select all combinations of strings using a cross join
-- and sort the entries in your where clause
-- the pattern index checks to see if the sampled string
-- from the first table variable is in the second table variable
select *
from
#s s cross join #t t
where
s.numSpace = t.numspace
and s.numWord = t.numWord
and s.lenString = t.lenString
and patindex('%'+s.firstPatt+'%',t.string)>0;