Split and Concat String in SQL and SSIS

I am trying to split and concat a string.
Example: Data value1: "12abc,34efg,56hij"
Data value2: "12abc"
Expected result:
Numbers Column 1: "12,34,56"
Numbers Column 2: "12"
Alphabets Column 1: "abc,efg,hij"
Alphabets Column 2: "abc"
Several attempts made:
1.
SELECT [String], value, CONCAT(SUBSTRING(value,1,2), ',') AS Numbers, CONCAT(SUBSTRING(value,3,3), ',') AS Alphabets, LEFT(String,LEN(String)-CHARINDEX(',',String))
FROM [Test].[dbo].[TEST]
CROSS APPLY string_split([String],',') value
WHERE String = String
2.
SELECT [String], LEFT(String,LEN(String)-CHARINDEX(',',String)), LEFT(String,2) AS Numbers, RIGHT(STRING,3) AS Alphabets
FROM [Test].[dbo].[TEST]
WHERE String = String
I followed [How to split a string after specific character in SQL Server and update this value to specific column] because it looked similar, but I did not get the results I want, so I do not know how to proceed or where I went wrong.
I am unsure how to concatenate the values from different rows into one column.
Additional info:
I am currently using SQL Server Management Studio v18.9.2.
*Apologies if my explanation is horrible.

Firstly, let's get to the point: your design is flawed. Never store delimited data in your database; it breaks the fundamental rules of normalisation. I strongly suggest that you fix your design and normalise your data.
Next, the assumptions:
You are using SQL Server 2017+
The column String can only contain alphanumeric characters (A-Z, 0-9), apart from the comma delimiter
You are using a case insensitive collation or all characters are lowercase
If this is the case, then you can just use TRANSLATE and REPLACE to remove the characters. You'll need to create some variables (or use the tally inline) to create the replacement strings first.
So, firstly, we get the 2 variables we need, which is one containing the letters a-z, and the other with the numbers 0-9. I use a tally to achieve this:
DECLARE @Alphas varchar(26),
        @Numerics varchar(10);
WITH N AS(
    SELECT N
    FROM (VALUES(NULL),(NULL),(NULL))N(N)),
Tally AS(
    SELECT TOP (26)
           ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) AS I
    FROM N N1, N N2, N N3) --3 x 3 x 3 = 27 rows, enough for 26 letters
SELECT @Alphas = STRING_AGG(CHAR(96 + T.I),''), --CHAR(97) = 'a'
       @Numerics = STRING_AGG(CASE WHEN T.I <= 10 THEN CHAR(47+T.I) END,'') --CHAR(48) = '0'; NULLs are skipped
FROM Tally T;
Now we can use those values to TRANSLATE all those characters to a different character (I'm going to use a pipe (|)) and then REPLACE those pipe characters with nothing:
SELECT YT.String,
       REPLACE(TRANSLATE(YT.String, @Alphas,REPLICATE('|',LEN(@Alphas))),'|','') AS Numerics,
       REPLACE(TRANSLATE(YT.String, @Numerics,REPLICATE('|',LEN(@Numerics))),'|','') AS Alphas
FROM dbo.YourTable YT;
Or, of course, you could just type it out. ;)
SELECT YT.String,
REPLACE(TRANSLATE(YT.String, 'abcdefghijklmnopqrstuvwxyz',REPLICATE('|',LEN('abcdefghijklmnopqrstuvwxyz'))),'|','') AS Numerics,
REPLACE(TRANSLATE(YT.String, '0123456789',REPLICATE('|',LEN('0123456789'))),'|','') AS Alphas
FROM dbo.YourTable YT;
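To try the TRANSLATE/REPLACE approach without a table, here is a minimal sketch that feeds the two sample values from the question through an inline VALUES list. The derived table alias V and its column name are assumptions made for the demo; it needs SQL Server 2017+ for TRANSLATE.

```sql
-- Sketch only: V(String) is a stand-in for the real table.
SELECT V.String,
       -- Turn every letter into '|', then strip the pipes: digits and commas remain.
       REPLACE(TRANSLATE(V.String, 'abcdefghijklmnopqrstuvwxyz', REPLICATE('|', 26)), '|', '') AS Numerics,
       -- Turn every digit into '|', then strip the pipes: letters and commas remain.
       REPLACE(TRANSLATE(V.String, '0123456789', REPLICATE('|', 10)), '|', '') AS Alphas
FROM (VALUES ('12abc,34efg,56hij'), ('12abc')) AS V(String);
```

For the first value this should give 12,34,56 and abc,efg,hij; for the second, 12 and abc. The pipe works as a placeholder only because it never appears in the data.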

You can CROSS APPLY to a STRING_SPLIT and use STRING_AGG (available since SQL Server 2017) to stick the numbers and alphabets back together.
select Numbers, Alphabets
from TEST
cross apply (
select
string_agg(left(value, patindex('%[0-9][^0-9]%', value)), ',') as Numbers
, string_agg(right(value, len(value)-patindex('%[0-9][^0-9]%', value)), ',') as Alphabets
from string_split(String, ',') s
) ca;
GO
Numbers | Alphabets
:------- | :----------
12,34,56 | abc,efg,hij
12 | abc
db<>fiddle here

Related

Get Rightmost Pair of Letters from String in SQL

Given a field with combinations of letters and numbers, is there a way to get the last (Rightmost) pair of letters (2 letters) in SQL?
SAMPLE DATA
RT34-92837DF82982
DRE3-9292928373DO
FOR THOSE, I would want
DF and
DO
For clarity, there will only be numbers after these letters.
Edits
This is for SQL Server.
I would remove any characters that aren't letters, using REGEXP_REPLACE or similar function based on your DBMS.
regexp_replace(col1, '[^a-zA-Z]+', '')
Then use a RIGHT or SUBSTRING function to select the "right-most".
right(regexp_replace(col1, '[^a-zA-Z]+', ''), 2)
substring(regexp_replace(col1, '[^a-zA-Z]+', ''), len(regexp_replace(col1, '[^a-zA-Z]+', '')) - 1, 2)
If you can have single occurrences of letters ('DF1234A124') then could change the regex pattern to remove those also - ([^a-zA-Z][a-zA-Z][^a-zA-Z])|[^a-zA-Z]
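SQL Server versions up to 2022 have no built-in REGEXP_REPLACE, so the same "strip the non-letters, then take the right-most two" idea needs a different tool there. A minimal sketch using TRANSLATE and REPLACE (SQL Server 2017+), assuming the only non-letter characters in the data are digits and the dash, as in the sample rows; the alias V(val) is made up for the demo.

```sql
-- Sketch only: assumes non-letters are limited to 0-9 and '-'.
SELECT V.val,
       RIGHT(
           -- Map each of the 11 unwanted characters to '|', then delete the pipes.
           REPLACE(TRANSLATE(V.val, '0123456789-', REPLICATE('|', 11)), '|', ''),
           2) AS res
FROM (VALUES ('RT34-92837DF82982'), ('DRE3-9292928373DO')) AS V(val);
```

For the two sample rows this should return DF and DO. Any other punctuation in the data would have to be added to the character list (and the REPLICATE length adjusted to match).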
Since, as you said, there will only be numbers after these letters, you can use the Trim and Right functions as follows:
select
Right(Trim('0123456789' from val), 2) as res
from t
Note: This is valid from SQL Server 2017.
For older versions try the following:
select
Left
(
Right(val, PATINDEX('%[A-Z]%', Reverse(val))+1),
2
) as res
from t
See demo

SQL - Split string with multiple delimiters into multiple rows and columns

I am trying to split a string in SQL with the following format:
'John, Mark, Peter|23, 32, 45'.
The idea is to have all the names in the first columns and the ages in the second column.
The query should be "dynamic", the string can have several records depending on user entries.
Does anyone know how to this, and if possible without SQL functions? I have tried the cross apply approach but I wasn't able to make it work.
Any ideas?
This solution uses Jeff Moden's DelimitedSplit8k. Why? Because his solution provides the ordinal position of the item. Ordinal position is something that many other functions, including Microsoft's own STRING_SPLIT, do not provide. It's going to be vitally important for getting this to work correctly.
Once you have that, the solution becomes fairly simple:
DECLARE @NameAges varchar(8000) = 'John, Mark, Peter|23, 32, 45';
WITH Splits AS (
    SELECT S1.ItemNumber AS PipeNumber,
           S2.ItemNumber AS CommaNumber,
           S2.Item
    FROM dbo.DelimitedSplit8K (REPLACE(@NameAges,' ',''), '|') S1 --As you have spaces after the delimiters, I've removed them. Be CAREFUL with that
    CROSS APPLY dbo.DelimitedSplit8K (S1.Item, ',') S2)
SELECT S1.Item AS [Name],
S2.Item AS Age
FROM Splits S1
JOIN Splits S2 ON S1.CommaNumber = S2.CommaNumber
AND S2.PipeNumber = 2
WHERE S1.PipeNumber = 1;
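If installing a splitter function is not an option, one alternative that still yields an ordinal position is OPENJSON, which returns a [key] column holding the array index. The following is only a sketch, not part of the answer above: it assumes SQL Server 2016+ (compatibility level 130 or higher), exactly one pipe in the string, and values that contain no double quotes or backslashes. The variables @Names and @Ages are introduced here for illustration.

```sql
DECLARE @NameAges varchar(8000) = 'John, Mark, Peter|23, 32, 45';

-- Split on the pipe first (assumes exactly one pipe).
DECLARE @Names varchar(8000) = LEFT(@NameAges, CHARINDEX('|', @NameAges) - 1),
        @Ages  varchar(8000) = STUFF(@NameAges, 1, CHARINDEX('|', @NameAges), '');

-- Turn each comma list into a JSON array; OPENJSON's [key] is the 0-based position.
SELECT LTRIM(N.[value]) AS [Name],
       CAST(LTRIM(A.[value]) AS int) AS Age
FROM OPENJSON('["' + REPLACE(@Names, ',', '","') + '"]') AS N
JOIN OPENJSON('["' + REPLACE(@Ages,  ',', '","') + '"]') AS A
  ON A.[key] = N.[key]
ORDER BY CAST(N.[key] AS int);
```

Unlike the REPLACE of every space used above, LTRIM only removes the leading space after each comma, so a name containing an inner space would survive intact.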

SQL Server search using like while ignoring blank spaces

I have a phone column in the database, and the records contain unwanted spaces on the right. I tried to use trim and replace, but it didn't return the correct results.
If I use
phone like '%2581254%'
it returns
customerid
-----------
33470
33472
33473
33474
but I need to use the percent sign (wildcard) at the beginning only, because I want the pattern to match the end of the number.
So if I use it like this
phone like '%2581254'
I get nothing, because of the spaces on the right!
So I tried to use trim and replace, and I get one result only
LTRIM(RTRIM(phone)) LIKE '%2581254'
returns
customerid
-----------
33474
Note that these four ids have same phone number!
Table data
customerid phone
-------------------------------------
33470 96506217601532388254
33472 96506217601532388254
33473 96506217601532388254
33474 96506217601532388254
33475 966508307940
I added many numbers for test purposes
The php function takes last 7 digits and compare them.
For example
01532388254 will be 2581254
and I want to search for all users that have these 7 digits in their phone number
2581254
I can't figure out where's the problem!
It should return 4 ids instead of 1 id
Given the sample data, I suspect you have control characters in your data. For example char(13), char(10)
To confirm this, just run the following
Select customerid,phone
From YourTable
Where CharIndex(CHAR(0),[phone])+CharIndex(CHAR(1),[phone])+CharIndex(CHAR(2),[phone])+CharIndex(CHAR(3),[phone])
+CharIndex(CHAR(4),[phone])+CharIndex(CHAR(5),[phone])+CharIndex(CHAR(6),[phone])+CharIndex(CHAR(7),[phone])
+CharIndex(CHAR(8),[phone])+CharIndex(CHAR(9),[phone])+CharIndex(CHAR(10),[phone])+CharIndex(CHAR(11),[phone])
+CharIndex(CHAR(12),[phone])+CharIndex(CHAR(13),[phone])+CharIndex(CHAR(14),[phone])+CharIndex(CHAR(15),[phone])
+CharIndex(CHAR(16),[phone])+CharIndex(CHAR(17),[phone])+CharIndex(CHAR(18),[phone])+CharIndex(CHAR(19),[phone])
+CharIndex(CHAR(20),[phone])+CharIndex(CHAR(21),[phone])+CharIndex(CHAR(22),[phone])+CharIndex(CHAR(23),[phone])
+CharIndex(CHAR(24),[phone])+CharIndex(CHAR(25),[phone])+CharIndex(CHAR(26),[phone])+CharIndex(CHAR(27),[phone])
+CharIndex(CHAR(28),[phone])+CharIndex(CHAR(29),[phone])+CharIndex(CHAR(30),[phone])+CharIndex(CHAR(31),[phone])
+CharIndex(CHAR(127),[phone]) >0
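A shorter check is possible if one assumes that a valid phone value consists of digits only: any row with a character outside 0-9 (control characters and trailing spaces included) is flagged. This is a sketch under that assumption, using a table variable @t with made-up contents rather than the real table.

```sql
-- Sketch only: @t stands in for the real table; the CR/LF suffix is simulated.
DECLARE @t TABLE (customerid int, phone varchar(50));
INSERT INTO @t (customerid, phone)
VALUES (33470, '96506217601532388254' + CHAR(13) + CHAR(10)),
       (33474, '96506217601532388254');

SELECT customerid,
       phone,
       LEN(phone)        AS CharCount,  -- ignores trailing spaces, but counts CR/LF
       DATALENGTH(phone) AS ByteCount   -- counts every byte, trailing spaces included
FROM @t
WHERE phone LIKE '%[^0-9]%';            -- at least one non-digit character
```

Here only customerid 33470 should come back. Comparing CharCount and ByteCount against the number of visible digits is a quick way to spot hidden characters.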
If the Test Results are Positive
The following UDF can be used to strip the control characters from your data via an update
Update YourTable Set Phone=[dbo].[udf-Str-Strip-Control](Phone)
The UDF if Interested
CREATE FUNCTION [dbo].[udf-Str-Strip-Control](@S varchar(max))
Returns varchar(max)
Begin
    ;with cte1(N) As (Select 1 From (Values(1),(1),(1),(1),(1),(1),(1),(1),(1),(1)) N(N)),
          cte2(C) As (Select Top (32) Char(Row_Number() over (Order By (Select NULL))-1) From cte1 a,cte1 b)
    Select @S = Replace(@S,C,' ')
    From cte2
    Return LTrim(RTrim(Replace(Replace(Replace(@S,' ','><'),'<>',''),'><',' ')))
End
--Select [dbo].[udf-Str-Strip-Control]('Michael '+char(13)+char(10)+'LastName') --Returns: Michael LastName
As promised (and nudged by Bill), the following is a little commentary on the UDF.
We pass a string that we want stripped of control characters.
We create an ad-hoc tally table of ASCII characters 0 - 31.
We then run a global search-and-replace for each character in the tally table. Each character found is replaced with a space.
The final string is stripped of repeating spaces (a little trick Gordon demonstrated several weeks ago - don't have the original link).
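The space-collapsing step can be tried on its own. A minimal sketch with a made-up input string; it assumes the text never contains the characters < or >, since those are used as temporary markers.

```sql
DECLARE @S varchar(100) = 'a    b  c';

-- 1. Each space becomes '><'      -> 'a><><><><b><><c'
-- 2. Every '<>' pair is removed   -> 'a><b><c'
-- 3. Each remaining '><' becomes a single space
SELECT REPLACE(REPLACE(REPLACE(@S, ' ', '><'), '<>', ''), '><', ' ') AS Collapsed;
```

This should return a b c, with every run of spaces reduced to one.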

Get rows that contain only certain characters

I want to get only those rows that contain ONLY certain characters in a column.
Let's say the column name is DATA.
I want to get all rows where in DATA are ONLY (must have all three conditions!):
Numeric characters (1 2 3 4 5 6 7 8 9 0)
Dash (-)
Comma (,)
For instance:
Value "10,20,20-30,30" IS OK
Value "10,20A,20-30,30Z" IS NOT OK
Value "30" IS NOT OK
Value "AAAA" IS NOT OK
Value "30-" IS NOT OK
Value "30," IS NOT OK
Value "-," IS NOT OK
Try patindex:
select * from(
select '10,20,20-30,30' txt union
select '10,20,20-30,40' txt union
select '10,20A,20-30,30Z' txt
)x
where patindex('%[^0-9,-]%', txt)=0
For your table, try:
select
DATA
from
YourTable
where
patindex('%[^0-9,-]%', DATA)=0
As per your new edited question, the query should be like:
select
DATA
from
YourTable
where
PATINDEX('%[^0-9,-]%', DATA)=0 and
PATINDEX('%[0-9]%', LEFT(DATA, 1))=1 and
PATINDEX('%[0-9]%', RIGHT(DATA, 1))=1 and
PATINDEX('%[,-][-,]%', DATA)=0
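Note that the four PATINDEX conditions alone still accept a value such as "30", which the question lists as not OK. The sketch below is a combination, not something taken from any single answer here: it adds two LIKE tests so that at least one dash and one comma must be present, and runs the result against the question's sample values through an inline VALUES list (the alias V is an assumption for the demo).

```sql
SELECT V.DATA
FROM (VALUES ('10,20,20-30,30'), ('10,20A,20-30,30Z'), ('30'),
             ('AAAA'), ('30-'), ('30,'), ('-,')) AS V(DATA)
WHERE PATINDEX('%[^0-9,-]%', V.DATA) = 0          -- only digits, commas, dashes
  AND PATINDEX('%[0-9]%', LEFT(V.DATA, 1)) = 1    -- starts with a digit
  AND PATINDEX('%[0-9]%', RIGHT(V.DATA, 1)) = 1   -- ends with a digit
  AND PATINDEX('%[,-][-,]%', V.DATA) = 0          -- no two separators in a row
  AND V.DATA LIKE '%-%'                           -- contains at least one dash
  AND V.DATA LIKE '%,%';                          -- contains at least one comma
```

Of the seven sample values, only 10,20,20-30,30 should be returned.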
Edit: Your question was edited, so this answer is no longer correct. I won't bother updating it since someone else already has updated theirs. This answer does not fulfil the condition that all three character types must be found.
You can use a LIKE expression for this, although it's slightly convoluted:
where data not like '%[^0123456789,!-]%' escape '!'
Explanation:
[^...] matches any character that is not in the ... part. % matches any number (including zero) of any character. So [^0123456789,-] matches any character outside the set you want to allow.
However: - is a special character inside of [], so we must escape it, which we do by using an escape character, and I've chosen !.
So, you match rows that do not contain (not like) any character that is outside your allowed set.
Another option is to combine PATINDEX with the LIKE operator:
SELECT *
FROM dbo.test70
WHERE PATINDEX('%[A-Z]%', DATA) = 0
AND PATINDEX('%[0-9]%', DATA) > 0
AND DATA LIKE '%-%'
AND DATA LIKE '%,%'
Demo on SQLFiddle
As already mentioned, you can use a LIKE expression, but it only works with some minor modifications; otherwise too many rows will be filtered out.
SELECT * FROM X WHERE T NOT LIKE '%[^0-9!-,]%' ESCAPE '!'
see working example here:
http://sqlfiddle.com/#!3/474f5/6
edit:
to meet all 3 conditions:
SELECT *
FROM X
WHERE T LIKE '%[0-9]%'
AND T LIKE '%-%'
AND T LIKE '%,%'
see: http://sqlfiddle.com/#!3/86328/1
Maybe not the most beautiful but a working solution.

Splitting the full name and writing it to another table in SQL Server 2008

I have a table, say A, in which there is a column FULLNAME. Values stored under this column are in the format of "surname name middle_name" (with one space between each). And I have another table B, in which I have columns SURNAME, NAME and MIDDLENAME. What would be the best way to take all of the FULLNAME cells from the table A, split them accordingly and insert them into the table B?
Thanks
You can combine a function that searches for an occurrence in a string (which normally returns its index) with the SUBSTRING function. You will also need the LEFT and RIGHT functions.
For example in SQL Server you will find the functions:
CHARINDEX ( expressionToFind ,expressionToSearch [ , start_location ] )
SUBSTRING ( expression ,start , length )
LEFT ( character_expression , integer_expression )
RIGHT ( character_expression , integer_expression )
STEPS:
Use LEFT to get the first word (integer_expression = index of the first empty space - 1).
Use SUBSTRING to get the middle word (start is the index of the first empty space + 1; length is the index of the second empty space minus the index of the first, minus 1; to find the second empty space, pass the index of the first + 1 as start_location).
Use RIGHT to get the last word, similar to step 1.
Notice that if any name contains an empty space in the middle (for example a first name like anna maria), this won't work as expected.
This query will split your string into the first word and the remainder.
select left(FULLNAME,CHARINDEX(' ',FULLNAME)), SUBSTRING(FULLNAME,CHARINDEX(' ',FULLNAME)+1,len(FULLNAME)) from tableA
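The LEFT/SUBSTRING steps described above can be put together into a full INSERT ... SELECT. This is a sketch that should run on SQL Server 2008; the table variables @A and @B stand in for tables A and B, and the sample name and column sizes are made up. Appending two spaces before searching is a design choice made here so that both CHARINDEX calls always find a space, which avoids negative lengths when a name has fewer than three parts.

```sql
DECLARE @A TABLE (FULLNAME varchar(200));
DECLARE @B TABLE (SURNAME varchar(100), NAME varchar(100), MIDDLENAME varchar(100));
INSERT INTO @A (FULLNAME) VALUES ('Smith John Paul');

INSERT INTO @B (SURNAME, NAME, MIDDLENAME)
SELECT LEFT(A.FULLNAME, P.s1 - 1),                       -- up to the first space
       SUBSTRING(A.FULLNAME, P.s1 + 1, P.s2 - P.s1 - 1), -- between the two spaces
       SUBSTRING(A.FULLNAME, P.s2 + 1, LEN(A.FULLNAME))  -- everything after the second space
FROM @A AS A
CROSS APPLY (SELECT CHARINDEX(' ', A.FULLNAME + '  ') AS s1,
                    CHARINDEX(' ', A.FULLNAME + '  ',
                              CHARINDEX(' ', A.FULLNAME + '  ') + 1) AS s2) AS P;

SELECT SURNAME, NAME, MIDDLENAME FROM @B;
```

For the sample row this should store Smith, John and Paul. A name with more than three words ends up with the extra words in MIDDLENAME, so the caveat about names containing spaces still applies.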