SQL How to Split One Column into Multiple Variable Columns - sql

I am working on MSSQL, trying to split one string column into multiple columns. The string column has numbers separated by semicolons, like:
190230943204;190234443204;
However, some rows have more numbers than others, so in the database you can have
190230943204;190234443204;
121340944534;340212343204;134530943204
I've seen some solutions for splitting one column into a specific number of columns, but not variable columns. The columns that have less data (2 series of strings separated by commas instead of 3) will have nulls in the third place.
Ideas? Let me know if I must clarify anything.

Splitting this data into separate columns is a very good start (coma-separated values are an heresy). However, a "variable number of properties" should typically be modeled as a one-to-many relationship.
CREATE TABLE main_entity (
id INT PRIMARY KEY,
other_fields INT
);
CREATE TABLE entity_properties (
main_entity_id INT PRIMARY KEY,
property_value INT,
FOREIGN KEY (main_entity_id) REFERENCES main_entity(id)
);
entity_properties.main_entity_id is a foreign key to main_entity.id.
Congratulations, you are on the right path, this is called normalisation. You are about to reach the First Normal Form.
Beweare, however, these properties should have a sensibly similar nature (ie. all phone numbers, or addresses, etc.). Do not to fall into the dark side (a.k.a. the Entity-Attribute-Value anti-pattern), and be tempted to throw all properties into the same table. If you can identify several types of attributes, store each type in a separate table.

If these are all fixed length strings (as in the question), then you can do the work fairly simply (at least relative to other solutions):
select substring(col, 1+13*(n-1), 12) as val
from t join
(select 1 as n union all select union all select 3
) n
on len(t.col) <= 13*n.n
This is a useful hack if all the entries are the same size (not so easy if they are of different sizes). Do, however, think about the data structure because semi-colon (or comma) separated list is not a very good data structure.

IF I were you, I would create a simple function that is dividing values separated with ';' like this:
IF EXISTS (SELECT * FROM sysobjects WHERE id = object_id(N'fn_Split_List') AND xtype IN (N'FN', N'IF', N'TF'))
BEGIN
DROP FUNCTION [dbo].[fn_Split_List]
END
GO
SET ANSI_NULLS ON
GO
SET QUOTED_IDENTIFIER ON
GO
CREATE FUNCTION [dbo].[fn_Split_List](#List NVARCHAR(512))
RETURNS #ResultRowset TABLE ( [Value] NVARCHAR(128) PRIMARY KEY)
AS
BEGIN
DECLARE #XML xml = N'<r><![CDATA[' + REPLACE(#List, ';', ']]></r><r><![CDATA[') + ']]></r>'
INSERT INTO #ResultRowset ([Value])
SELECT DISTINCT RTRIM(LTRIM(Tbl.Col.value('.', 'NVARCHAR(128)')))
FROM #xml.nodes('//r') Tbl(Col)
RETURN
END
GO
Than simply called in this way:
SET NOCOUNT ON
GO
DECLARE #RawData TABLE( [Value] NVARCHAR(256))
INSERT INTO #RawData ([Value] )
VALUES ('1111111;22222222')
,('3333333;113113131')
,('776767676')
,('89332131;313131312;54545353')
SELECT SL.[Value]
FROM #RawData AS RD
CROSS APPLY [fn_Split_List] ([Value]) as SL
SET NOCOUNT OFF
GO
The result is as the follow:
Value
1111111
22222222
113113131
3333333
776767676
313131312
54545353
89332131
Anyway, the logic in the function is not complicated, so you can easily put it anywhere you need.
Note: There is not limitations of how many values you will have separated with ';', but there are length limitation in the function that you can set to NVARCHAR(MAX) if you need.
EDIT:
As I can see, there are some rows in your example that will caused the function to return empty strings. For example:
number;number;
will return:
number
number
'' (empty string)
To clear them, just add the following where clause to the statement above like this:
SELECT SL.[Value]
FROM #RawData AS RD
CROSS APPLY [fn_Split_List] ([Value]) as SL
WHERE LEN(SL.[Value]) > 0

Related

Store more character than the assign column in SQL Server

Can I do something like this, column is of type nchar(8), but the string I wanted to store in the table is longer than that.
The reason I am doing this is because I want to convert from one table to another table. Table A is nchar(8) and Table B is nvarchar(100). I want all characters in Table B transfer to Table A without missing any single character.
If the nvarchar(100) contains only latin characters with a length up to 16 chars, then you can squeeze the nvarchar(100) into the nchar(8):
declare #t table
(
col100 nvarchar(100),
col8 nchar(8)
);
insert into #t(col100) values('1234567890123456');
update #t
set col8 = cast(cast(col100 as varchar(100)) as varbinary(100))
select *, cast(cast(cast(col8 as varbinary(100)) as varchar(100)) as nvarchar(100)) as from8to100_16charsmax
from #t;
If you cannot modify A, then you cannot use it to store the data. Create another table for the overflow . . . something like:
create table a_overflow (
a_pk int primary key references a(pk),
column nvarchar(max) -- why stop at 100?
);
Then, you can construct a view to bring in the data from this table when viewing a:
create view vw_a as
select . . . , -- all the other columns
coalesce(ao.column, a.column) as column
from a left join
a_overflow ao
on ao.a_pk = a.pk;
And, if you really want to "hide" the view, you can create an insert trigger on vw_a, which inserts the appropriate values into the two tables.
This is a lot of work. Simply modifying the table is much simpler. That said, this approach is sometimes needed when you need to modify a large table and altering a column would incur too much locking overhead.

Extracting specific column values embedded within composite Strings of codes

I am trying to create a piece of code in sql server 2008 that will grab specific values from each distinct string within my dbo table. The ultimate goal is to make a drop down box within Visual Studio so that one can choose all lines from the database that contain a specific product code (see definition of product code below). Example strings:
in_0314_95pf_500_w_0315
in_0314_500_95pf_0315_w
The part of these strings I am wishing to identify is the 3 digit numeric code (in this case let us call it product code) that appears once within each string. There are roughly 300 different product codes.
The problem is that these product code values do not appear in the same position within each unique string. Hence, I am having a hard time determining the product code because I can't use substring, charindex, like, etc.
Any ideas? Any help is MUCH appreciated.
This can be done with PATINDEX:
DECLARE #s NVARCHAR(100) = 'in_0314_95pf_500_w_0315'
SELECT SUBSTRING(#s, PATINDEX('%[_][0-9][0-9][0-9][_]%', #s) + 1, 3)
Output:
500
If there are no underscores then:
SELECT SUBSTRING(#s, PATINDEX('%[^0-9][0-9][0-9][0-9][^0-9]%', #s) + 1, 3)
This means 3 digits between any symbols that are not digits.
EDIT:
Apply to table like:
SELECT SUBSTRING(ColumnName, PATINDEX('%[^0-9][0-9][0-9][0-9][^0-9]%', ColumnName) + 1, 3)
FROM TableName
One approach is to use a String splitting table function like this one which breaks the string up into its components. You can then filter the components based on your criteria:
SELECT Name
FROM dbo.splitstring('in_0314_95pf_500_w_0315', '_')
WHERE ISNUMERIC(Name) = 1 AND LEN(Name) = 3;
I've amended the function slightly to accept the delimiter as a parameter.
CREATE FUNCTION dbo.splitstring ( #stringToSplit VARCHAR(MAX), #delimiter VARCHAR(50))
RETURNS
#returnList TABLE ([Name] [nvarchar] (500))
AS
BEGIN
DECLARE #name NVARCHAR(255)
DECLARE #pos INT
WHILE CHARINDEX(#delimiter, #stringToSplit) > 0
BEGIN
SELECT #pos = CHARINDEX(#delimiter, #stringToSplit)
SELECT #name = SUBSTRING(#stringToSplit, len(#delimiter), #pos-len(#delimiter))
INSERT INTO #returnList
SELECT #name
SELECT #stringToSplit = SUBSTRING(#stringToSplit, #pos+LEN(#delimiter),
LEN(#stringToSplit)-#pos)
END
INSERT INTO #returnList
SELECT #stringToSplit
RETURN
END
To apply this to your table, use CROSS APPLY (Single Delimiter):
SELECT mt.Name, x.Name AS ProductCode
FROM MyTable mt
CROSS APPLY dbo.splitstring(mt.Name, '_') x
WHERE ISNUMERIC(x.Name) = 1 AND LEN(x.Name) = 3
Update, Multiple Delimiters
I guess the real underlying problem is that ultimately the product codes need to be normalized out of the composite key (e.g. add a distinct ProductId or ProductCode column to the same table), derived using a query like this, and then stored back in the table via an update. Reverse engineering the product codes out of the string appears to be a trial and error process.
Nonetheless, you can continue to keep passing the split strings through further splitting functions (one per each type of delimiter), before applying your final discriminating filter:
SELECT *
FROM MyTable mt
CROSS APPLY dbo.splitstring(mt.Name, 'test') y -- First alias
CROSS APPLY dbo.splitstring(y.Name, '_') x -- Reference the preceding alias
WHERE ISNUMERIC(x.Name) = 1 AND LEN(x.Name) = 3; -- Must reference the last alias (x)
Note that the stringsplit function has again been changed to accommodate multicharacter delimiters.
If you have a table (or can generate in inline view) of the product codes, you can join the list of long strings to the product codes with a like clause.
Create Table longcodes (
longcode varchar(20)
)
Create Table products (
prodCode char(3)
)
insert products values('100')
insert products values('111')
insert products values('123')
insert longcodes values ('abc_a_100_test')
insert longcodes values ('asdf_111_bob')
insert longcodes values ('in_0314_123_95pf')
insert longcodes values ('f_100_u')
insert longcodes values ('hihi_111_bye')
insert longcodes values ('in_123_0314_95pf')
insert longcodes values ('a_b__c_d_100_efg')
select *
from products p
join longcodes l on l.longcode like '%_' + p.prodCode + '_%'
And they get aligned with the product codes like this:
prodCode longcode
100 abc_a_100_test
100 f_100_u
100 a_b__c_d_100_efg
111 asdf_111_bob
111 hihi_111_bye
123 in_0314_123_95pf
123 in_123_0314_95pf
EDIT: Seeing the developments in the other answer, you can simplify the like clause to
like p.prodCode
and just deal with the fact that you have a much greater chance of a single composite string producing multiple matches.

User-defined In-line Table-Valued Functions Called On Each Other In SQL Server 2008

I am using SQL Server 2008, and I am struggling with learning how to correctly call a User-defined In-line Table-Valued Function on a User-defined In-line Table-Valued Function (that is, since each expects a scalar or scalars as input and outputs a table, I want to learn how to correctly call one by passing it another table, whereupon each row is treated as its scalar inputs).
I posted a couple questions related to this recently, but I think I was not clear enough, and did not sufficiently encapsulate the problem to cleanly demonstrate it. I have now prepared the proper statements to provide anyone interested in helping the necessary tables, views, functions, and SELECT outputs to see the problem occur in front of them by executing the query below.
There are several ways I can phrase the core question, and from here and other forums, I can tell I have difficulty clearly expressing it. I am going to phrase it several ways here, but these are all meant to be the same question, phrased differently so people from different backgrounds can more easily understand me.
How do I correctly write the "imageFileNameFromAddress" function below so it works as intended; to wit, the intent is that it takes the same input as "bookAndPageFromAddress" and, using bookAndPageFromAddress and imageFileNameFromBookPage, passing the input to the first, then its output to the second, and returns the second's output?
Why does the third SELECT statement at the bottom below provide different results from the second one, and how do I fix the underlying function(s) to provide identical results, without repeating code from the other functions?
What is the correct syntax for the OUTER APPLY call in imageFileNameFromAddress so that its output is not null?
WARNING: The code below constructs the necessary tables, views, and functions to demonstrate the problem by dropping them first if they exist, so please please please check first to make sure you don't drop anything of your own! The final three SELECTS demonstrate the problem; the final two SELECTS should have identical output, but do not - the first one (of the final two, so the middle of the three) is a three row table of strings, and the final one is a one row table containing only a NULL.
USE [TOM_GIS]
GO
IF OBJECT_ID(N'[dbo].[constant]', N'U') IS NOT NULL
DROP TABLE [dbo].[constant]
CREATE TABLE [dbo].[constant]
(
ID INT IDENTITY(1,1) PRIMARY KEY CLUSTERED,
BOOK varchar(5),
PAGE varchar(5),
DocID numeric(8, 0)
)
INSERT INTO [dbo].[constant]
VALUES(' 4043',' 125', 576030)
GO
IF OBJECT_ID(N'[dbo].[images]', N'U') IS NOT NULL
DROP TABLE [dbo].[images]
CREATE TABLE [dbo].[images]
(
ID INT IDENTITY(1,1) PRIMARY KEY CLUSTERED,
DocID numeric(8, 0),
ImageID numeric(12,0)
)
INSERT INTO [dbo].[images] VALUES(576030, 1589666);
INSERT INTO [dbo].[images] VALUES(576030, 1589667);
INSERT INTO [dbo].[images] VALUES(576030, 1589668);
GO
IF OBJECT_ID(N'[dbo].[addressBookPage]', N'U') IS NOT NULL
DROP TABLE [dbo].[addressBookPage]
CREATE TABLE [dbo].[addressBookPage]
(
ID INT IDENTITY(1,1) PRIMARY KEY CLUSTERED,
PARCEL_ADDRESS nvarchar(50),
BOOK nchar(10),
PAGE nchar(10),
)
INSERT INTO [dbo].[addressBookPage]
VALUES('155 CENTER STREET','4043', '125')
GO
IF OBJECT_ID(N'[dbo].[vw_quindraco]') IS NOT NULL
DROP VIEW [dbo].[vw_quindraco]
GO
CREATE VIEW [dbo].[vw_quindraco]
AS
WITH files AS (SELECT RIGHT('00000000' + LTRIM(STR(c.DocID)), 8) AS PathInfo
,RIGHT('0000000000' + LTRIM(STR(i.ImageID)), 12) AS FileName
,ltrim(c.Book) as Book
,ltrim(c.Page) as Page
FROM [dbo].[constant] AS c INNER JOIN
[dbo].[images] AS i ON c.DocID = i.DocID)
SELECT 'Images/' + SUBSTRING(PathInfo, 1, 2) + '/' + SUBSTRING(PathInfo, 3, 2) + '/' + SUBSTRING(PathInfo, 5, 2)
+ '/' + RIGHT(PathInfo, 8) + '/' + FileName + '.tif' AS FullFileName
,Book
,Page
FROM files AS files_1
GO
IF OBJECT_ID(N'[dbo].[bookAndPageFromAddress]') IS NOT NULL
DROP FUNCTION [dbo].[bookAndPageFromAddress];
GO
CREATE FUNCTION [dbo].[bookAndPageFromAddress] (#address NVARCHAR(max))
RETURNS TABLE AS RETURN(
SELECT PARCEL_ADDRESS AS Address, Book, Page
FROM [dbo].[addressBookPage]
WHERE PARCEL_ADDRESS like '%' + #address + '%'
);
GO
IF OBJECT_ID(N'[dbo].[imageFileNameFromBookPage]') IS NOT NULL
DROP FUNCTION [dbo].[imageFileNameFromBookPage];
GO
CREATE FUNCTION [dbo].[imageFileNameFromBookPage] (#book nvarchar(max), #page nvarchar(max))
RETURNS TABLE AS RETURN(
SELECT i.FullFileName
FROM [dbo].[vw_quindraco] i
WHERE i.Book like #book
AND i.Page like #page
);
GO
IF OBJECT_ID(N'[dbo].[imageFileNameFromAddress]') IS NOT NULL
DROP FUNCTION [dbo].[imageFileNameFromAddress];
GO
CREATE FUNCTION [dbo].[imageFileNameFromAddress] (#address NVARCHAR(max))
RETURNS TABLE AS RETURN(
SELECT *
FROM [dbo].[bookAndPageFromAddress](#address) addresses
OUTER APPLY [dbo].[imageFileNameFromBookPage](addresses.Book, addresses.Page) foo
);
GO
SELECT Book,Page FROM [dbo].[bookAndPageFromAddress]('155 Center Street');
SELECT FullFileName FROM [dbo].[imageFileNameFromBookPage]('4043','125');
SELECT FullFileName FROM [dbo].[imageFileNameFromAddress]('155 Center Street')
You have your table fields as nchars, and you are using Like.
Because it's nchar, the value is padded with spaces to the declared length (10).
Because it's Like, the spaces are considered essential part of a match, whereas the equality operator, =, would ignore trailing spaces.
Because data types in the table and in the function parameters do not match, implicit conversions happen in the background, ultimately causing comparison to fail because of spaces.
Use = instead of Like inside imageFileNameFromBookPage to quickly fix it.
Better yet, use correct data types in all functions and views to avoid any conversions.

Sane/fast method to pass variable parameter lists to SqlServer2008 stored procedure

A fairly comprehensive query of the brain has turned up a thousand and one ways to pass variable length parameter lists that involve such methods as:
CLR based methods for parsing strings to lists of integers
Table valued functions that require the presence of a 'Numbers' table (wtf?)
Passing the data as XML
Our requirements are to pass two variable length lists of integers (~max 20 ints) to a stored procedure. All methods outlined above seem to smell funny.
Is this just the way it has to be done, or is there a better way?
Edit:
I've just found this, which may qualify this question as a dupe
Yes, I'd definitely look at Table Valued Parameters for this. As a side benefit, it may allow you to use a nice, clean set-based implementation for the innards of your procedure directly, without any data massaging required.
Here's another reference as well...
Here is a fairly fast method to split strings using only T-SQL and you input parameter is only a string. You need to have a table and a function (as described below) already set up to use this method.
create this table:
CREATE TABLE Numbers (Number int not null primary key identity(1,1))
DECLARE #n int
SET #n=1
SET IDENTITY_INSERT Numbers ON
WHILE #N<=8000
BEGIN
INSERT INTO Numbers (Number) values (#n)
SET #n=#n+1
END
SET IDENTITY_INSERT Numbers OFF
create this function to split the string array (I have other versions, where empty sections are eliminated and ones that do not return row numbers):
CREATE FUNCTION [dbo].[FN_ListAllToNumberTable]
(
#SplitOn char(1) --REQUIRED, the character to split the #List string on
,#List varchar(8000) --REQUIRED, the list to split apart
)
RETURNS
#ParsedList table
(
RowNumber int --REQUIRED, the list to split apart
,ListValue varchar(500) --OPTIONAL, the character to split the #List string on, defaults to a comma ","
)
AS
BEGIN
--this will return empty rows, and row numbers
INSERT INTO #ParsedList
(RowNumber,ListValue)
SELECT
ROW_NUMBER() OVER(ORDER BY number) AS RowNumber
,LTRIM(RTRIM(SUBSTRING(ListValue, number+1, CHARINDEX(#SplitOn, ListValue, number+1)-number - 1))) AS ListValue
FROM (
SELECT #SplitOn + #List + #SplitOn AS ListValue
) AS InnerQuery
INNER JOIN Numbers n ON n.Number < LEN(InnerQuery.ListValue)
WHERE SUBSTRING(ListValue, number, 1) = #SplitOn
RETURN
END
go
here is an example of how to split the parameter apart:
CREATE PROCEDURE TestPass
(
#ArrayOfInts varchar(255) --pipe "|" separated list of IDs
)
AS
SET NOCOUNT ON
DECLARE #TableIDs TABLE (RowNumber int, IDValue int null)
INSERT INTO #TableIDs (RowNumber, IDValue) SELECT RowNumber,CASE WHEN LEN(ListValue)<1 then NULL ELSE ListValue END FROM dbo.FN_ListAllToNumberTable('|',#ArrayOfInts)
SELECT * FROM #TableIDs
go
this is based on: http://www.sommarskog.se/arrays-in-sql.html

Filtering With Multi-Select Boxes With SQL Server

I need to filter result sets from sql server based on selections from a multi-select list box. I've been through the idea of doing an instring to determine if the row value exists in the selected filter values, but that's prone to partial matches (e.g. Car matches Carpet).
I also went through splitting the string into a table and joining/matching based on that, but I have reservations about how that is going to perform.
Seeing as this is a seemingly common task, I'm looking to the Stack Overflow community for some feedback and maybe a couple suggestions on the most commonly utilized approach to solving this problem.
I solved this one by writing a table-valued function (we're using 2005) which takes a delimited string and returns a table. You can then join to that or use WHERE EXISTS or WHERE x IN. We haven't done full stress testing yet, but with limited use and reasonably small sets of items I think that performance should be ok.
Below is one of the functions as a starting point for you. I also have one written to specifically accept a delimited list of INTs for ID values in lookup tables, etc.
Another possibility is to use LIKE with the delimiters to make sure that partial matches are ignore, but you can't use indexes with that, so performance will be poor for any large table. For example:
SELECT
my_column
FROM
My_Table
WHERE
#my_string LIKE '%|' + my_column + '|%'
.
/*
Name: GetTableFromStringList
Description: Returns a table of values extracted from a delimited list
Parameters:
#StringList - A delimited list of strings
#Delimiter - The delimiter used in the delimited list
History:
Date Name Comments
---------- ------------- ----------------------------------------------------
2008-12-03 T. Hummel Initial Creation
*/
CREATE FUNCTION dbo.GetTableFromStringList
(
#StringList VARCHAR(1000),
#Delimiter CHAR(1) = ','
)
RETURNS #Results TABLE
(
String VARCHAR(1000) NOT NULL
)
AS
BEGIN
DECLARE
#string VARCHAR(1000),
#position SMALLINT
SET #StringList = LTRIM(RTRIM(#StringList)) + #Delimiter
SET #position = CHARINDEX(#Delimiter, #StringList)
WHILE (#position > 0)
BEGIN
SET #string = LTRIM(RTRIM(LEFT(#StringList, #position - 1)))
IF (#string <> '')
BEGIN
INSERT INTO #Results (String) VALUES (#string)
END
SET #StringList = RIGHT(#StringList, LEN(#StringList) - #position)
SET #position = CHARINDEX(#Delimiter, #StringList, 1)
END
RETURN
END
I've been through the idea of doing an
instring to determine if the row value
exists in the selected filter values,
but that's prone to partial matches
(e.g. Car matches Carpet)
It sounds to me like you aren't including a unique ID, or possibly the primary key as part of values in your list box. Ideally each option will have a unique identifier that matches a column in the table you are searching on. If your listbox was like below then you would be able to filter for specifically for cars because you would get the unique value 3.
<option value="3">Car</option>
<option value="4">Carpret</option>
Then you just build a where clause that will allow you to find the values you need.
Updated, to answer comment.
How would I do the related join
considering that the user can select
and arbitrary number of options from
the list box? SELECT * FROM tblTable
JOIN tblOptions ON tblTable.FK = ? The
problem here is that I need to join on
multiple values.
I answered a similar question here.
One method would be to build a temporary table and add each selected option as a row to the temporary table. Then you would simply do a join to your temporary table.
If you want to simply create your sql dynamically you can do something like this.
SELECT * FROM tblTable WHERE option IN (selected_option_1, selected_option_2, selected_option_n)
I've found that a CLR table-valued function which takes your delimited string and calls Split on the string (returning the array as the IEnumerable) is more performant than anything written in T-SQL (it starts to break down when you have around one million items in the delimited list, but that's much further out than the T-SQL solution).
And then, you could join on the table or check with EXISTS.