Split one column with weird string into multiple columns by specific delimiter in single select sql

I already have a working solution, but I am hoping for a simpler one.
I find it hard to accept that this is the only way, so my hope is that an experienced SQL power user has a better idea.
Background:
A simple table looking like this:
weirdstring               ID
A;GHL+BH;BC,NA-NB,[AB]    1
B;GHL+BH;BC,NA-NB,[AB]    2
C;GHL+BH;BC,NA-NB,[AB]    3
CREATE TABLE TESTTABLE (weirdstring varchar(MAX),
ID int);
INSERT INTO TESTTABLE
VALUES ('A;GHL+BH;BC,NA-NB,[AB]', 1);
INSERT INTO TESTTABLE
VALUES ('B;GHL+BH;BC,NA-NB,[AB]', 2);
INSERT INTO TESTTABLE
VALUES ('C;GHL+BH;BC,NA-NB,[AB]', 3);
All I need in the end is the first 3 "letter groups" (1-3 letters each) from weirdstring in separate columns (e.g. for ID 1: A, GHL and BH; the rest of the string is not important for now):
ID  weirdstring               group1  group2  group3
1   A;GHL+BH;BC,NA-NB,[AB]    A       GHL     BH
2   B;GHL+BH;BC,NA-NB,[AB]    B       GHL     BH
3   C;GHL+BH;BC,NA-NB,[AB]    C       GHL     BH
What I have done so far:
1. Change all weird delimiters (; + - and potentially more) in the string to commas and remove the brackets around the "letter groups", using daisy-chained REPLACE calls. This turns A;GHL+BH;BC,NA-NB,[AB] into A,GHL,BH,BC,NA,NB,AB.
2. Split the new string into columns, using the comma as delimiter.
The query used is:
SELECT t1.ID,
t1.weirdstring,
t2.group1,
t2.group2,
t2.group3
FROM TESTTABLE t1
LEFT JOIN (SELECT grp1.ID,
grp1.weirdstring AS group1,
grp2.weirdstring AS group2,
grp3.weirdstring AS group3
FROM (SELECT ID,
weirdstring
FROM (SELECT ID,
weirdstring,
ROW_NUMBER() OVER (PARTITION BY ID ORDER BY (SELECT NULL)) AS ROWNUM
FROM (SELECT ID,
value AS weirdstring
FROM TESTTABLE
CROSS APPLY STRING_SPLIT(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(weirdstring, '[', ''), ']', ''), ';', ','), '+', ','), '-', ','), '.', ','), ',')
WHERE weirdstring IS NOT NULL) splitted ) s1
WHERE ROWNUM = 1) grp1
LEFT JOIN (SELECT ID,
weirdstring
FROM (SELECT ID,
weirdstring,
ROW_NUMBER() OVER (PARTITION BY ID ORDER BY (SELECT NULL)) AS ROWNUM
FROM (SELECT ID,
value AS weirdstring
FROM TESTTABLE
CROSS APPLY STRING_SPLIT(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(weirdstring, '[', ''), ']', ''), ';', ','), '+', ','), '-', ','), '.', ','), ',')
WHERE weirdstring IS NOT NULL) splitted ) s2
WHERE ROWNUM = 2) grp2 ON grp1.ID = grp2.ID
LEFT JOIN (SELECT ID,
weirdstring
FROM (SELECT ID,
weirdstring,
ROW_NUMBER() OVER (PARTITION BY ID ORDER BY (SELECT NULL)) AS ROWNUM
FROM (SELECT ID,
value AS weirdstring
FROM TESTTABLE
CROSS APPLY STRING_SPLIT(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(weirdstring, '[', ''), ']', ''), ';', ','), '+', ','), '-', ','), '.', ','), ',')
WHERE weirdstring IS NOT NULL) splitted ) s3
WHERE ROWNUM = 3) grp3 ON grp3.ID = grp2.ID) t2 ON t1.ID = t2.ID;
But I could not believe how large the query became for such a small task. At least I believe it's small. I am on an older version (14) of SQL Server and therefore cannot use STRING_SPLIT with its third parameter (enable_ordinal). Syntax:
STRING_SPLIT ( string , separator [ , enable_ordinal ] )
Note: https://learn.microsoft.com/en-us/sql/t-sql/functions/string-split-transact-sql?view=sql-server-ver16 : The enable_ordinal argument and ordinal output column are currently supported in Azure SQL Database, Azure SQL Managed Instance, and Azure Synapse Analytics (serverless SQL pool only). Beginning with SQL Server 2022 (16.x) Preview, the argument and output column are available in SQL Server.
Is there another, shorter way to achieve the same result? I know that this topic has been discussed many times, but I could not find a solution to my specific problem. Thanks in advance for any kind of help!

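It seems that you are using SQL Server 2017 (v.14), so a possible option is the following JSON-based approach. The idea is to transform the stored text into a valid JSON array using TRANSLATE() for character replacement and then read the expected parts of the string by position using JSON_VALUE(). Each delimiter becomes an element boundary, so A;GHL+BH;BC,NA-NB,[AB] turns into ["A","GHL","BH","BC","NA","NB","","AB",""]; the empty elements come from the adjacent delimiters ,[ and from the trailing ], and they do not affect the first three positions: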
SELECT
weirdstring,
JSON_VALUE(jsonweirdstring, '$[0]') AS group1,
JSON_VALUE(jsonweirdstring, '$[1]') AS group2,
JSON_VALUE(jsonweirdstring, '$[2]') AS group3
FROM (
SELECT
weirdstring,
CONCAT('["', REPLACE(TRANSLATE(weirdstring, ';+-,[]', '######'), '#', '","'), '"]') AS jsonweirdstring
FROM TESTTABLE
) t
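For readers who want to trace the string transformation step by step, here is a minimal Python sketch that mirrors the TRANSLATE/REPLACE/CONCAT chain and the positional JSON_VALUE lookups. It is an illustration only, not T-SQL; the function names to_json_array and first_groups are made up for this example. Note that adjacent delimiters produce empty array elements, which matters if you ever read positions beyond the first three.

```python
import json

# Delimiters taken from the answer's TRANSLATE() call: ; + - , [ ]
DELIMS = ";+-,[]"


def to_json_array(weirdstring: str) -> str:
    """Mimic CONCAT('["', REPLACE(TRANSLATE(s, ';+-,[]', '######'), '#', '","'), '"]')."""
    translated = weirdstring.translate(str.maketrans(DELIMS, "#" * len(DELIMS)))
    return '["' + translated.replace("#", '","') + '"]'


def first_groups(weirdstring: str, n: int = 3) -> list:
    """Return the first n array elements; positions that do not exist become None,
    similar to JSON_VALUE returning NULL for a missing path (in its default lax mode)."""
    parts = json.loads(to_json_array(weirdstring))
    return [parts[i] if i < len(parts) else None for i in range(n)]


sample = "A;GHL+BH;BC,NA-NB,[AB]"
print(to_json_array(sample))   # shows the empty elements produced by ",[" and "]"
print(first_groups(sample))    # the three groups the question asks for
```

As in the SQL version, a value containing a double quote or a backslash would make the generated text invalid JSON, so this only fits data like the sample.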

Related

Segregate column values in SQL Server

I have a table with 2 columns (Col1 & Col2) and the values are stored like below:
Col1 Col2
A/B/C Red/Orange/Green
D/E Red/Orange
I want the output like below.
Col1 Col2
A Red
B Orange
C Green
D Red
E Orange
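Did you try CROSS APPLY? Replace 'your_table_name' with the name of your table. Note that splitting the two columns independently like this returns every combination of the Col1 and Col2 parts, not only the position-matched pairs shown in the expected output.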
SELECT Col1, value AS Col2 INTO Table_2
FROM your_table_name
CROSS APPLY STRING_SPLIT(Col2, '/');
SELECT Col2, value AS Col1 INTO Table_3
FROM Table_2
CROSS APPLY STRING_SPLIT(Col1, '/');
SELECT * FROM Table_3;
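To see why two independent splits are not enough for this question, here is a small Python sketch (an illustration, not T-SQL) that contrasts "all combinations" with "pair by position" for the first sample row. The variable names are made up for the example.

```python
from itertools import product

col1, col2 = "A/B/C", "Red/Orange/Green"

# Splitting both columns independently yields every combination of parts.
all_pairs = list(product(col1.split("/"), col2.split("/")))

# The expected output instead pairs the i-th part of Col1 with the i-th part of Col2.
positional_pairs = list(zip(col1.split("/"), col2.split("/")))

print(len(all_pairs))      # 9 combinations for a row with 3 parts per column
print(positional_pairs)    # [('A', 'Red'), ('B', 'Orange'), ('C', 'Green')]
```

Pairing by position is what requires an ordinal for each split value, which is the part STRING_SPLIT does not provide on older SQL Server versions.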
Not easy, but doable.
I would do it by "flattening" the table:
SELECT (left bit of column 1), (left bit of column2)
UNION ALL
SELECT (middle bit of column 1), (middle bit of column 2)
where [column 1] like '%/%'
UNION ALL
SELECT (last bit of column 1), (last bit of column 2)
where [column 1] like '%/%/%'
If you have the possibility of more slashes and data, you need to add further UNIONs.
Use CHARINDEX to find the slash and SUBSTRING to extract the bits.
Maybe String split can help?
https://learn.microsoft.com/it-it/sql/t-sql/functions/string-split-transact-sql?view=sql-server-ver15
Look at examples D and E there.
Unfortunately, the built-in string split function in SQL Server does NOT return the position in the string. In my opinion, this is a significant oversight.
Assuming your strings have no duplicate values, you can use row_number() and charindex() to add an enumeration:
select t.*, ss.*
from t cross apply
(select s1.value as value1, s2.value as value2
from (select s1.value,
row_number() over (order by charindex('/' + s1.value + '/', '/' + t.col1 + '/')) as pos
from string_split(t.col1, '/') s1
) s1 join
(select s2.value,
row_number() over (order by charindex('/' + s2.value + '/', '/' + t.col2 + '/')) as pos
from string_split(t.col2, '/') s2
) s2
on s1.pos = s2.pos
) ss;
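The enumeration trick above can be sketched in Python to show both how it works and why the "no duplicate values" assumption matters. This is an illustration only; ordinal_by_search and pair_columns are hypothetical helper names, and str.find plays the role of charindex().

```python
def ordinal_by_search(text: str, sep: str = "/") -> list:
    """Number the split values by where sep+value+sep first occurs in sep+text+sep,
    mirroring row_number() over (order by charindex(...))."""
    padded = sep + text + sep
    values = text.split(sep)
    # sorted() is stable, so ties keep split order here; SQL gives no such guarantee.
    ranked = sorted(values, key=lambda v: padded.find(sep + v + sep))
    return list(enumerate(ranked, start=1))


def pair_columns(col1: str, col2: str, sep: str = "/") -> list:
    """Join the two enumerations on their position, like the s1.pos = s2.pos join."""
    left = ordinal_by_search(col1, sep)
    right = ordinal_by_search(col2, sep)
    return [(a, b) for (_, a), (_, b) in zip(left, right)]


print(pair_columns("A/B/C", "Red/Orange/Green"))

# With a repeated value the search always finds the first occurrence,
# so the reconstructed order no longer matches the original A, B, A.
print(ordinal_by_search("A/B/A"))
```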

How to combine return results of query in one row

I have a table that stores personnel codes.
When I select from this table I get 3 result rows such as:
2129,3394,3508,3534
2129,3508
4056
I want the select to combine the results into one row such as:
2129,3394,3508,3534,2129,3508,4056
or distinct value such as:
2129,3394,3508,3534,4056
You should ideally avoid storing CSV data at all in your tables. That being said, for your first result set we can try using STRING_AGG:
SELECT STRING_AGG(col, ',') AS output
FROM yourTable;
Your second requirement is more tricky, and we can try going through a table to remove duplicates:
WITH cte AS (
SELECT DISTINCT VALUE AS col
FROM yourTable t
CROSS APPLY STRING_SPLIT(t.col, ',')
)
SELECT STRING_AGG(col, ',') WITHIN GROUP (ORDER BY CAST(col AS INT)) AS output
FROM cte;
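The two queries correspond to "join the rows as they are" and "split, de-duplicate, sort numerically, join again". A short Python sketch of the same steps, using the three sample rows from the question (illustration only; the variable names are made up):

```python
rows = ["2129,3394,3508,3534", "2129,3508", "4056"]

# Counterpart of STRING_AGG(col, ','): concatenate the rows unchanged.
combined = ",".join(rows)

# Counterpart of the CTE: split every row, keep distinct values,
# order them as integers, then aggregate again.
distinct_values = sorted({value for row in rows for value in row.split(",")}, key=int)
distinct_combined = ",".join(distinct_values)

print(combined)
print(distinct_combined)
```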
I solved this by using STUFF and FOR XML PATH:
SELECT
STUFF((SELECT ',' + US.remain_uncompleted
FROM Table_request US
WHERE exclusive = 0 AND reqact = 1 AND reqend = 0
FOR XML PATH('')), 1, 1, '')
Thank you Tim

Trimming a string until a value

I've got a column which contains strings of different lengths, looking like:
abcd, efgh, ijkl2, 2345, xyzw
I need to split it into 2 different columns: the value after the last comma and the value just before it, so I will have 2 other columns with:
2345 xyzw
I've tried to get only the first part of the string before the first comma:
RTRIM(LTRIM(RIGHT(A.[column],charindex(',',A.[column]+',')-1))) as 'aa'
RIGHT(A.[column], len(A.[column]) - charindex(',',A.[column])) as 'ab'
But I get it mixed, sometimes I get some of the values after the comma, but incomplete.
Any thoughts?
Thank you, I appreciate it.
Such string functions in SQL Server are really painful. I like to use apply for such operations, because this makes it easier to keep intermediate results.
For this particular problem:
select v.str, v2.lastval, v3.secondlastval
from (values ('abcd, efgh, ijkl2, 2345, xyzw')) v(str) cross apply
(select stuff(str, 1, len(str) - charindex(',', reverse(str)) + 2, ''),
stuff(str, len(str) - charindex(',', reverse(str)) + 1, len(str), '')
) v2(lastval, rest) cross apply
(select stuff(v2.rest, 1, len(v2.rest) - charindex(',', reverse(v2.rest)) + 2, ''),
stuff(v2.rest, len(v2.rest) - charindex(',', reverse(v2.rest)) + 1, len(v2.rest), '')
) v3(secondlastval, rest);
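The query peels values off the right-hand end one at a time. The same idea can be sketched in Python with a right-to-left split (an illustration only; last_two is a hypothetical helper name):

```python
def last_two(text: str, sep: str = ","):
    """Return (second_last, last) with surrounding spaces removed;
    second_last is None when the string has no separator."""
    # rsplit with maxsplit=2 cuts at most twice, starting from the right.
    parts = [p.strip() for p in text.rsplit(sep, 2)]
    last = parts[-1]
    second_last = parts[-2] if len(parts) >= 2 else None
    return second_last, last


print(last_two("abcd, efgh, ijkl2, 2345, xyzw"))
print(last_two("solo"))
```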
Having said that, you should probably reconsider your data structure. Storing lists of values in delimited strings is a poor data structure in SQL because SQL does not have very good string processing capabilities. Instead, you should be using a junction/association table.
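You can use STRING_SPLIT if you are on SQL Server 2016 or later. Be aware that STRING_SPLIT does not guarantee the order of its output rows, so numbering them with ORDER BY (SELECT NULL) may not reflect the position in the string.
Try something like this: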
drop TABLE REC.TestTable;
CREATE TABLE REC.TestTable (id int identity, --Avoid using keywords for column names
str varchar(500));
GO
INSERT INTO REC.TestTable (str)
VALUES ('abcd, efgh, ijkl2, 123, ijk'),
('abcd, efgh, ijkl2, 4567, lmn'),
('abcd, efgh, ijkl2, 89, opqr')
;
GO
with tmp as (
select f1.*, f2.value, row_number() over(partition by id order by (select null)) position
from REC.TestTable f1 cross apply STRING_SPLIT(str, ',') f2
)
select f1.*, f2.value ValPos1, f3.value ValPos2, f4.value ValPos3, f5.value ValPos4, f6.value ValPos5
from REC.TestTable f1
left outer join tmp f2 on f1.id=f2.id and f2.position =1
left outer join tmp f3 on f1.id=f3.id and f3.position =2
left outer join tmp f4 on f1.id=f4.id and f4.position =3
left outer join tmp f5 on f1.id=f5.id and f5.position =4
left outer join tmp f6 on f1.id=f6.id and f6.position =5

remove duplicate values from a oracle sql query's output

I have a situation where I want to remove duplicated values from the result of a SQL query in Oracle 10g. I am using a regular expression to remove the letters from the result.
Original value = 1A,1B,2C,3F,4A,4z,11A,11B
Current Sql query
select REGEXP_REPLACE( tablex.column, '[A-Za-z]' , '' )
from db1
gives me the following output
1,1,2,3,4,4,11,11
How can I remove the duplicates from the output so that it shows only unique values,
i.e.
1,2,3,4,11
Assuming that your table contains strings with values separated by commas, you can try something like this:
select rtrim(xmltype('<r><n>' ||
replace(REGEXP_REPLACE( col, '[A-Za-z]' , '' ), ',', ',</n><n>')||',</n></r>'
).extract('//n[not(preceding::n = .)]/text()').getstringval(), ',')
from tablex;
After applying your REGEXP_REPLACE, it builds an XMLType from the result and then uses XPath to keep only the elements that have no equal preceding element, which gives the desired output.
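The XPath predicate not(preceding::n = .) keeps the first occurrence of each value in its original position. A Python sketch of the same "strip letters, keep first occurrences, optionally sort numerically" logic follows; it is an illustration only, the function name and the sample string are made up, and it is not Oracle SQL.

```python
import re


def strip_letters_and_dedupe(value: str, sort_numeric: bool = False) -> str:
    # Same character class as the REGEXP_REPLACE call in the question.
    digits_only = re.sub(r"[A-Za-z]", "", value)
    # dict.fromkeys keeps the first occurrence of each token, in input order.
    tokens = list(dict.fromkeys(digits_only.split(",")))
    if sort_numeric:
        tokens.sort(key=int)
    return ",".join(tokens)


sample = "4A,1B,4C,11A,1Z"  # hypothetical input, chosen to show the ordering difference
print(strip_letters_and_dedupe(sample))
print(strip_letters_and_dedupe(sample, sort_numeric=True))
```

Without sorting, the output follows the order of first appearance; with numeric sorting, 11 comes after 4 rather than after 1.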
If you also want to sort the values (and still use the xml approach) then you need XSL
select rtrim(xmltype('<r><n>' ||
replace(REGEXP_REPLACE( col, '[A-Za-z]' , '' ), ',', '</n><n>')||'</n></r>'
).extract('//n[not(preceding::n = .)]')
.transform(xmltype('<?xml version="1.0" ?><xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform"><xsl:template match="/"><xsl:for-each select="//n[not(preceding::n = .)]"><xsl:sort select="." data-type="number"/><xsl:value-of select="."/>,</xsl:for-each></xsl:template></xsl:stylesheet>'))
.getstringval(), ',')
from tablex;
But you can also try different approaches, such as splitting the tokens to rows and then recollecting them
select rtrim(xmlagg(xmlelement(e, n || ',') order by to_number(n))
.extract('//text()'), ',')
from(
SELECT distinct rn, trim(regexp_substr(col, '[^,]+', 1, level)) n
FROM (select row_number() over (order by col) rn ,
REGEXP_REPLACE( col, '[A-Za-z]' , '' ) col
from tablex) t
CONNECT BY instr(col, ',', 1, level - 1) > 0
)
group by rn;

How Can I Sort A 'Version Number' Column Generically Using a SQL Server Query

I wonder if the SQL geniuses amongst us could lend me a helping hand.
I have a column VersionNo in a table Versions that contains 'version number' values like
VersionNo
---------
1.2.3.1
1.10.3.1
1.4.7.2
etc.
I am looking to sort this, but unfortunately, when I do a standard order by, it is treated as a string, so the order comes out as
VersionNo
---------
1.10.3.1
1.2.3.1
1.4.7.2
Instead of the following, which is what I am after:
VersionNo
---------
1.2.3.1
1.4.7.2
1.10.3.1
So, what I need to do is to sort by the numbers in reverse order (e.g. in a.b.c.d, I need to sort by d,c,b,a to get the correct sort order).
But I am stuck as to how to achieve this in a GENERIC way. Sure, I can split the string up using the various sql functions (e.g. left, right, substring, len, charindex), but I can't guarantee that there will always be 4 parts to the version number. I may have a list like this:
VersionNo
---------
1.2.3.1
1.3
1.4.7.2
1.7.1
1.10.3.1
1.16.8.0.1
Does anyone have any suggestions? Your help would be much appreciated.
If you are using SQL Server 2008 or later:
select VersionNo from Versions order by cast('/' + replace(VersionNo , '.', '/') + '/' as hierarchyid);
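The cast works because the version is compared part by part as numbers instead of as text, for any number of parts. The same comparison can be sketched in Python with a tuple of integers as the sort key (an illustration only; version_key is a hypothetical name, and the list is the question's sample data in shuffled order):

```python
def version_key(version: str) -> tuple:
    """Turn '1.10.3.1' into (1, 10, 3, 1) so parts compare numerically."""
    return tuple(int(part) for part in version.split("."))


versions = ["1.10.3.1", "1.2.3.1", "1.16.8.0.1", "1.4.7.2", "1.3", "1.7.1"]
for v in sorted(versions, key=version_key):
    print(v)
```

A shorter version sorts before a longer one that starts with the same parts, for example 1.3 before 1.3.1.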
Edit:
Solutions for 2000, 2005, 2008: Solutions to T-SQL Sorting Challenge here.
It depends on the SQL engine. For MySQL, with four-part version numbers, it would be something like this:
SELECT versionNo FROM Versions
ORDER BY
SUBSTRING_INDEX(versionNo, '.', 1) + 0,
SUBSTRING_INDEX(SUBSTRING_INDEX(versionNo, '.', -3), '.', 1) + 0,
SUBSTRING_INDEX(SUBSTRING_INDEX(versionNo, '.', -2), '.', 1) + 0,
SUBSTRING_INDEX(versionNo, '.', -1) + 0;
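For MySQL version 3.23.15 and above (this treats the version like an IPv4 address, so it only fits four-part versions whose parts are between 0 and 255):
SELECT versionNo FROM Versions ORDER BY INET_ATON(versionNo);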
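To make the INET_ATON idea concrete: a dotted quad a.b.c.d is packed into one integer as a*256^3 + b*256^2 + c*256 + d, and sorting by that integer sorts the parts numerically. The Python sketch below shows this packing under the assumption of exactly four parts in the range 0 to 255; inet_aton_like is a hypothetical name, and the sketch does not try to reproduce how MySQL treats shorter or out-of-range inputs.

```python
def inet_aton_like(dotted: str) -> int:
    """Pack 'a.b.c.d' into a single integer, one byte per part."""
    parts = [int(p) for p in dotted.split(".")]
    if len(parts) != 4 or any(p < 0 or p > 255 for p in parts):
        # Assumption for this sketch: anything else is rejected outright.
        raise ValueError("expected four parts, each between 0 and 255: " + dotted)
    a, b, c, d = parts
    return (a << 24) + (b << 16) + (c << 8) + d


versions = ["1.10.3.1", "1.2.3.1", "1.4.7.2"]
print(sorted(versions, key=inet_aton_like))
```

A version such as 1.0.0.505 does not fit this scheme, which is one reason the approach is limited compared with splitting the parts.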
Another way to do it:
Assuming the version has only four parts (a, b, c, d), you can split the data into columns, order by a, b, c, d (all descending) and take the top 1 row.
If you need to scale beyond d to e, f, g and so on, change [1],[2],[3],[4] to [1],[2],[3],[4],[5],[6],[7] and so on in the query.
Query:
create table t (versionnumber varchar(255))
insert into t values
('1.0.0.505')
,('1.0.0.506')
,('1.0.0.507')
,('1.0.0.508')
,('1.0.0.509')
,('1.0.1.2')
; with cte as
(
select
column1=row_number() over (order by (select NULL)) ,
column2=versionnumber
from t
)
select top 1
CONCAT([1],'.',[2],'.',[3],'.',[4])
from
(
select
t.column1,
split_values=SUBSTRING( t.column2, t1.N, ISNULL(NULLIF(CHARINDEX('.',t.column2,t1.N),0)-t1.N,8000)),
r= row_number() over( partition by column1 order by t1.N)
from cte t
join
(
select
t.column2,
1 as N
from cte t
UNION ALL
select
t.column2,
t1.N + 1 as N
from cte t
join
(
select
top 8000
row_number() over(order by (select NULL)) as N
from
sys.objects s1
cross join
sys.objects s2
) t1
on SUBSTRING(t.column2,t1.N,1) = '.'
) t1
on t1.column2=t.column2
)a
pivot
(
max(split_values) for r in ([1],[2],[3],[4])
)p
order by [1] desc,[2] desc,[3] desc,[4] desc
If you can, alter the schema so that the version has 4 columns instead of one. Then sorting is easy.