How to select two split string columns with CROSS APPLY in a single query - sql

CREATE TABLE #StudentClasses
(
ID INT,
Student VARCHAR(100),
Classes VARCHAR(100),
CCode VARCHAR(30)
)
GO
INSERT INTO #StudentClasses
SELECT 1, 'Mark', 'Maths,Science,English', 'US,UK,AUS'
UNION ALL
SELECT 2, 'John', 'Science,English', 'BE,DE'
UNION ALL
SELECT 3, 'Robert', 'Maths,English', 'CA,IN'
GO
SELECT *
FROM #StudentClasses
GO
SELECT ID, Student, value ,value
FROM #StudentClasses
CROSS APPLY STRING_SPLIT(Classes, ',')
CROSS APPLY STRING_SPLIT(CCode, ',')

This must be put in the very first place: Do not store delimited data! If there is any chance to change your table's design, you should use related side-tables to store data of this kind...
Your question is not much better than the one before. Without your expected result, any suggestion is guesswork.
What I guess: you want to transform 'Maths,Science,English', 'US,UK,AUS' so that Maths goes along with US, Science with UK, and English with AUS. Try this:
SELECT sc.ID
,sc.Student
,A.[key] AS Position
,A.[value] AS Class
,B.[value] AS CCode
FROM #StudentClasses sc
CROSS APPLY OPENJSON('["' + REPLACE(Classes,',','","') + '"]') A
CROSS APPLY OPENJSON('["' + REPLACE(CCode,',','","') + '"]') B
WHERE A.[key]=B.[key];
You did not tell us your SQL Server version, but you tagged the question with Azure, so I assume that v2016 is okay for you. With a lower version (or a lower compatibility level of the given database) there is no JSON support.
Why JSON at all? At the moment this is the best way to split CSV data and get the fragments together with their position within the array. Regrettably, STRING_SPLIT() does not guarantee to return the fragments in the expected order. With versions lower than v2016 there are several more or less ugly tricks...
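On newer platforms there is a further option. A minimal sketch, assuming Azure SQL Database or SQL Server 2022, where STRING_SPLIT accepts an optional third argument (enable_ordinal) and then returns an extra ordinal column holding the 1-based position of each fragment. On older versions this call fails, and the OPENJSON query above remains the way to go.

```sql
-- Assumption: Azure SQL Database or SQL Server 2022+, where STRING_SPLIT
-- supports the enable_ordinal argument and the ordinal output column.
SELECT sc.ID
      ,sc.Student
      ,A.ordinal AS Position   -- 1-based, unlike the 0-based OPENJSON key
      ,A.[value] AS Class
      ,B.[value] AS CCode
FROM #StudentClasses sc
CROSS APPLY STRING_SPLIT(sc.Classes, ',', 1) A
CROSS APPLY STRING_SPLIT(sc.CCode, ',', 1) B
WHERE A.ordinal = B.ordinal;   -- pair fragments that share a position
```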
If you need your result side-by-side you should read about conditional aggregation.
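Conditional aggregation here means grouping the positional rows back to one row per student and picking each position with a CASE inside an aggregate. A sketch building on the OPENJSON query above; the three hard-coded slots and the column names Class1, CCode1 and so on are illustrative assumptions, not something the question specified.

```sql
-- One row per student. OPENJSON array keys are zero-based strings.
-- Students with fewer than three entries get NULL in the unused slots.
SELECT sc.ID
      ,sc.Student
      ,MAX(CASE WHEN A.[key] = N'0' THEN A.[value] END) AS Class1
      ,MAX(CASE WHEN A.[key] = N'0' THEN B.[value] END) AS CCode1
      ,MAX(CASE WHEN A.[key] = N'1' THEN A.[value] END) AS Class2
      ,MAX(CASE WHEN A.[key] = N'1' THEN B.[value] END) AS CCode2
      ,MAX(CASE WHEN A.[key] = N'2' THEN A.[value] END) AS Class3
      ,MAX(CASE WHEN A.[key] = N'2' THEN B.[value] END) AS CCode3
FROM #StudentClasses sc
CROSS APPLY OPENJSON('["' + REPLACE(sc.Classes,',','","') + '"]') A
CROSS APPLY OPENJSON('["' + REPLACE(sc.CCode,',','","') + '"]') B
WHERE A.[key] = B.[key]
GROUP BY sc.ID, sc.Student;
```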

Use SELECT * or give each split its own alias:
CREATE TABLE #StudentClasses
(ID INT, Student VARCHAR(100), Classes VARCHAR(100),CCode varchar(30))
INSERT INTO #StudentClasses
SELECT 1, 'Mark', 'Maths,Science,English', 'US,UK,AUS'
UNION ALL
SELECT 2, 'John', 'Science,English', 'BE,DE'
UNION ALL
SELECT 3, 'Robert', 'Maths,English', 'CA,IN'
SELECT *, v1.value AS classes, v2.value AS codes
FROM #StudentClasses
CROSS APPLY STRING_SPLIT(Classes, ',') v1
CROSS APPLY STRING_SPLIT(CCode, ',') v2

Related

Join using a LIKE clause is taking too long

Please see the TSQL below:
create table #IDs (id varchar(100))
insert into #IDs values ('123')
insert into #IDs values ('456')
insert into #IDs values ('789')
insert into #IDs values ('1010')
create table #Notes (Note varchar(500))
insert into #Notes values ('Here is a note for 123')
insert into #Notes values ('A note for 789 here')
insert into #Notes values ('456 has a note here')
I want to find all the IDs that are referenced in the #Notes table. This works:
select #IDs.id from #IDs inner join #Notes on #Notes.note like '%' + #IDs.id + '%'
However, there are hundreds of thousands of records in both tables and the query does not complete. I was thinking about FreeText searching, but I don't think it can be applied here. A cursor takes too long to run as well (I think it will take over one month). Is there anything else I can try? I am using SQL Server 2019.
The size of the input is only one aspect of the solution.
By splitting the text into tokens you do increase the number of records, but at the same time you enable an equality join, which can be implemented as a hash join.
You should get the query results in a few minutes at most: basically the time it takes your system to do a full scan of both tables, plus some processing time.
No need for temp tables.
No need for indexes.
Select id
from #IDS
where id in (select w.value
from #Notes as n
cross apply string_split(n.Note, ' ') as w
)
Fiddle
Per the OP request -
Here is code that handles a more complicated scenario, where an id can contain various characters (as defined by @token_char) and all other characters are potential separators:
declare @token_char varchar(100) = 'abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789'
;
with cte_notes as
(
select Note
,replace(translate(Note,@token_char,space(len(@token_char))),' ','') as non_token_char
from #Notes
)
select id
from #IDS
where id in
(
select w.value
from cte_notes as n
cross apply string_split(translate(n.Note,n.non_token_char,space(len(n.non_token_char))),' ') as w
where w.value != ''
)
The Fiddle data sample was altered accordingly, to reflect the change
If you are going to do this search often, you may want to explore a wonderful (if underused) feature of SQL Server called Full-Text Search. To quote Microsoft:
"A LIKE query against millions of rows of text data can take minutes to return; whereas a full-text query can take only seconds or less against the same data, depending on the number of rows that are returned."
I have seen searches go from minutes to seconds using this feature.
You would need to create a full-text catalog and then create full-text indexes on the tables you want to search. It's not hard and will take you a few minutes to learn how to do this.
This is a good starting point:
https://learn.microsoft.com/en-us/sql/relational-databases/search/get-started-with-full-text-search?view=sql-server-ver15
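A rough sketch of that setup, under several assumptions: the Full-Text Search feature is installed, and the notes live in a permanent table with a unique, single-column, non-nullable key (as far as I know, a full-text index cannot be built on a temp table such as #Notes). The names dbo.Notes, NoteId, PK_Notes and NotesCatalog are hypothetical.

```sql
-- Hypothetical permanent copy of the notes with a primary key,
-- which the full-text index uses as its KEY INDEX.
CREATE TABLE dbo.Notes
(
    NoteId INT IDENTITY(1,1) NOT NULL CONSTRAINT PK_Notes PRIMARY KEY,
    Note   VARCHAR(500) NOT NULL
);

-- A catalog is the logical container for full-text indexes.
CREATE FULLTEXT CATALOG NotesCatalog AS DEFAULT;

-- Index the Note column; population then runs in the background.
CREATE FULLTEXT INDEX ON dbo.Notes (Note)
    KEY INDEX PK_Notes
    ON NotesCatalog;

-- Look up notes containing the word 123.
SELECT NoteId, Note
FROM dbo.Notes
WHERE CONTAINS(Note, '"123"');
```

Note that CONTAINS takes its search term as a literal or variable, so this suits repeated lookups of individual ids better than a single set-based join against #IDs.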
I would use a CTE with string_split to keep only the numeric tokens and then join the #IDs table to the result of the CTE on the id column. The query was tested on a sample of 1 million rows.
With CTE As (
Select T.value As id
From #Notes Cross Apply String_Split(Note,' ') As T
Where Try_Convert(Int, T.value) Is Not Null
)
Select I.id
From #IDs As I Inner Join CTE On (I.id=CTE.id)
If you just want to extract a numeric value from a string, the join is unnecessary:
Select T.value As id, #Notes.Note
From #Notes Cross Apply String_Split(Note,' ') As T
Where Try_Convert(Int, T.value) Is Not Null And T.value Like '%[0-9]%'
id   Note
123  Here is a note for 123
789  A note for 789 here
456  456 has a note here
No matter what, under the given circumstances, I would use join to filter out those numbers that are not represented in #IDs table.
With CTE As (
Select distinct(id) As id
From #IDs
)
Select T.value As id, #Notes.Note
From #Notes Cross Apply String_Split(Note,' ') As T
Inner Join CTE On (T.value=CTE.id)
Where Try_Convert(Int, T.value) Is Not Null
And T.value Like '%[0-9]%'
If the string contains brackets or parentheses instead of spaces, like this:
"456(this is an id number) has a note here" or "456[01/01/2022]"
then, as a last resort (since it degrades performance), you can use TRANSLATE to replace those brackets with spaces as follows:
With CTE As (
Select distinct(id) As id
From #IDs
)
Select T.value As id, #Notes.Note
From #Notes Cross Apply String_Split(TRANSLATE(Note,'[]()',SPACE(4)),' ') As T
Inner Join CTE On (T.value=CTE.id)
Where Try_Convert(Int, T.value) Is Not Null
And T.value Like '%[0-9]%'
db<>fiddle

Alternate approach to WITH CTE and large UNION query

I'd like to rework a script I've been given.
The way it currently works is via a WITH CTE using a large number of UNIONs.
Current setup
We're taking one record from a source table, inserting it into a destination table once with [Name] A then inserting it again with [Name] B. Essentially creating multiple rows in the destination, albeit with different [Name].
An example of one transaction would be to take this row from [Source]:
ID [123] Name [Red and Green]
The results of my current set up in the [Destination] is:
ID [123] Name [Red]
ID [123] Name [Green]
Current logic
Here's a simplified version of the current logic:
WITH CTE
AS
(SELECT ID,
'Red' AS [Name]
FROM [Source_Table]
WHERE [Name] = 'Red and Green'
UNION ALL
SELECT ID,
'Green' AS [Name]
FROM [Source_Table]
WHERE [Name] = 'Red and Green')
INSERT INTO [Destination_Table]
(ID,
[Name])
SELECT ID,
[Name]
FROM CTE;
The reason I'd like to rework this is when we get a new [Name], we have to manually add another portion of code into our (ever increasing) UNION, to make sure it gets picked up.
What I've considered
What I was considering was setting up a WHILE loop (or cursor) running off a control table, where we could store all of the [Name]s. However, I'm not sure if this would be the best approach, and I'm not too familiar yet with loops/cursors. I also wouldn't be too sure how to stop the loop once all [Name]s had been completed.
Any help much appreciated.
You can use cross apply to duplicate the rows:
insert into [destination_table] (id, name)
select x.id, x.name
from source_table s
cross apply (values (s.id, 'Red'), (s.id, 'Green')) x(id, name)
where s.name = 'Red and Green'
Introduce a new table called Color_List which just contains one row for each possible color. Then do this:
with cte as
(
select
s.ID,
c.colorname
from
Source_Table s
inner join
Color_List c
on
CHARINDEX(c.colorname, s.[Name]) > 0
)
insert into Destination_Table
(
ID,
[Name]
)
select
ID,
colorname
from
cte
The benefit of this method is that you aren't hard-coding any color names in the query. All the color names (and presumably there can be many more than two) get maintained in the Color_List table.
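A minimal sketch of that control table, plus one possible refinement. The column type and the stricter join condition are my assumptions: CHARINDEX matches substrings, so 'Red' would also be found inside a longer word, whereas comparing space-delimited words avoids that, provided the names in [Name] are separated by spaces and contain no LIKE wildcards.

```sql
-- Hypothetical control table; the column name matches the query above.
CREATE TABLE Color_List (colorname VARCHAR(50) NOT NULL PRIMARY KEY);

INSERT INTO Color_List (colorname)
VALUES ('Red'), ('Green');

-- Optional stricter match: pad both sides with spaces so only whole
-- words match, e.g. 'Red' matches 'Red and Green' but not 'Redwood'.
SELECT s.ID, c.colorname
FROM Source_Table s
INNER JOIN Color_List c
    ON ' ' + s.[Name] + ' ' LIKE '% ' + c.colorname + ' %';
```

Supporting a new name then only needs one more row in Color_List, with no change to the query.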
You could use string_split to split the values apart. First replace the ' and ' with a pipe '|'. Then do a string split on the vertical pipe.
drop table if exists #tTEST;
go
select * INTO #tTEST from (values
(1, '[123]', 'Name', '[Red and Green]')) V(ID, testCol, nameCol, stringCol);
select ID, testCol, nameCol,
case when left([value], 1)!='[' then concat('[',[value]) else
case when right([value], 1)!=']' then concat([value], ']') else [value] end end valCol
from #tTEST t
cross apply string_split(replace(t.stringCol, ' and ', '|'), '|');
Results
ID testCol nameCol valCol
1 [123] Name [Red]
1 [123] Name [Green]

Select from a comma separated list in a column

I have a table, Table 1,
and I want to select all regions that are neighbours of region 3. What would be my query for this? The NEIGHBOUR column is a CHAR column.
I know the table should not be set up this way, but that is what I have to work with, as I don't have the rights to the database.
Fix your data model! There are numerous reasons why this is broken.
But if you are stuck with it, you can use:
select t.*
from t
where ',' + neighbor + ',' like '%,3,%';
You can also unnest the value using string_split():
select t.*
from t cross apply
string_split(t.neighbor, ',') s
where s.value = '3';
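One caveat, since the question says the column is CHAR: fixed-length values are padded with trailing spaces, so the closing comma in the LIKE version ends up after the padding and a neighbour in last position is missed. A small sketch with made-up sample rows (the table variable and its data are assumptions for illustration) showing RTRIM as a fix:

```sql
-- Made-up sample data; neighbor is fixed-length, hence space-padded.
DECLARE @t TABLE (region INT, neighbor CHAR(20));
INSERT INTO @t VALUES (1, '2,3'), (2, '1,13'), (3, '1,2');

-- Without RTRIM the expression is ',2,3' + padding + ',', which never
-- contains ',3,'. Trimming first restores the match for region 1.
SELECT t.region
FROM @t t
WHERE ',' + RTRIM(t.neighbor) + ',' LIKE '%,3,%';
```

The string_split() version is less sensitive to this, because an equality comparison such as s.value = '3' ignores trailing spaces.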
You can use STRING_SPLIT() as
SELECT *
FROM Data
WHERE Region IN
(
SELECT Value
FROM STRING_SPLIT((SELECT Neighbor FROM Data WHERE Region = 3), ',')
);
The query returns 0 rows because no region in the table is listed as a neighbor of region 3.
If you change (3, 'Name3', '5,8,12') to (3, 'Name3', '1,2'), then it returns regions 1 and 2, because they are neighbors of region 3.
Here is a db<>fiddle.
Another way without using a string splitter
SELECT *
FROM Data D
JOIN (VALUES((SELECT Neighbor FROM Data WHERE Region = 3))) T(V)
ON CONCAT(',', T.V, ',') LIKE CONCAT('%,', D.Region,',%');

Order by with regular expression

There is a column which I want to sort:
C_NUMBER
---------
1718-SI-1
1718-SI-2
1718-SI-10
1718-SI-13
1718-SI-5
1718-SI-6
1718-SI-11
This is the query where I bring my data into one table and apply ORDER BY, but it is not working:
MYTABLE order by MYTABLE.C_NUMBER asc, patindex('%0-9]%',MYTABLE.C_NUMBER),len(MYTABLE.C_NUMBER)
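I think you don't want to order by C_NUMBER first, only by your patindex section (note that the pattern also needs its opening bracket: '%[0-9]%'). Try this, with C_NUMBER last as a tie-breaker among values of the same length:
order by patindex('%[0-9]%',MYTABLE.C_NUMBER),len(MYTABLE.C_NUMBER),MYTABLE.C_NUMBER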
Try this
;With cte(C_NUMBER)
AS
(
SELECT '1718-SI-1' UNION ALL
SELECT '1718-SI-2' UNION ALL
SELECT '1718-SI-10' UNION ALL
SELECT '1718-SI-13' UNION ALL
SELECT '1718-SI-5' UNION ALL
SELECT '1718-SI-6' UNION ALL
SELECT '1718-SI-11'
)
SELECT * FROM cte
Order by CAST(REVERSE(SUBSTRING(REVERSE(C_NUMBER),0, CHARINDEX('-',REVERSE(C_NUMBER)))) AS INT)
Result
C_NUMBER
----------
1718-SI-1
1718-SI-2
1718-SI-5
1718-SI-6
1718-SI-10
1718-SI-11
1718-SI-13
This code will cut your string into three parts and use these values, type-safe, in the ORDER BY clause separately. This will also sort different prefixes such as 1718-SI or 1719-BI or whatever you might have.
DECLARE @mockup TABLE(ID INT IDENTITY, C_NUMBER VARCHAR(100));
INSERT INTO @mockup VALUES
('1718-SI-1')
,('1718-SI-2')
,('1718-SI-10')
,('1718-SI-13')
,('1718-SI-5')
,('1718-SI-6')
,('1718-SI-11');
SELECT * FROM @mockup;
SELECT m.*
FROM @mockup AS m
CROSS APPLY(SELECT CAST('<x>' + REPLACE(m.C_NUMBER,'-','</x><x>') + '</x>' AS XML)) AS Parted(AsXML)
ORDER BY AsXML.value('/x[1]/text()[1]','int')
,AsXML.value('/x[2]/text()[1]','nvarchar(max)')
,AsXML.value('/x[3]/text()[1]','int')
HINT: Your design breaks 1NF (first normal form).
Store the three parts in three different typed columns, apply indexes and create the output format on-the-fly. You should not store more than one value in a cell...
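A sketch of what such a design could look like. The table name dbo.DocNumbers, the column names and the types are assumptions for illustration; the display format is rebuilt by a computed column, and sorting uses the typed parts directly.

```sql
-- Hypothetical table: one typed column per part of the number.
CREATE TABLE dbo.DocNumbers
(
    YearCode INT         NOT NULL,
    TypeCode VARCHAR(10) NOT NULL,
    SeqNo    INT         NOT NULL,
    -- Output format created on the fly from the three parts.
    C_NUMBER AS CONCAT(YearCode, '-', TypeCode, '-', SeqNo)
);

CREATE INDEX IX_DocNumbers_Parts
    ON dbo.DocNumbers (YearCode, TypeCode, SeqNo);

INSERT INTO dbo.DocNumbers (YearCode, TypeCode, SeqNo)
VALUES (1718, 'SI', 1), (1718, 'SI', 10), (1718, 'SI', 2);

-- Numeric sort on SeqNo: 1718-SI-1, 1718-SI-2, 1718-SI-10.
SELECT C_NUMBER
FROM dbo.DocNumbers
ORDER BY YearCode, TypeCode, SeqNo;
```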

Split column data into multiple rows

I have data currently in my table like below under currently section. I need the selected column data which is comma delimited to be converted into the format marked in green (Read and write of a category together)
Any ways to do it in SQL Server?
Please do look at the proposal data carefully....
Maybe I wasn't clear before: this isn't merely the splitting that is the issue but to group all reads and writes of a category together(sometimes they are just merely reads/writes), it's not merely putting comma separated values in multiple rows.
-- script:
use master
Create table prodLines(id int , prodlines varchar(1000))
--drop table prodLines
insert into prodLines values(1, 'Automotive Coatings (Read), Automotive Coatings (Write), Industrial Coatings (Read), S.P.S. (Read), Shared PL''s (Read)')
insert into prodLines values(2, 'Automotive Coatings (Read), Automotive Coatings (Write), Industrial Coatings (Read), S.P.S. (Read), Shared PL''s (Read)')
select * from prodLines
Using Jeff Moden's DelimitedSplit8K splitter function (it has to be created in your database first):
;
with cte as
(
select id, prodlines, ItemNumber, Item = ltrim(Item),
grp = dense_rank() over (partition by id order by replace(replace(ltrim(Item), '(Read)', ''), '(Write)', ''))
from prodLines pl
cross apply dbo.DelimitedSplit8K(prodlines, ',') c
)
select id, prodlines, prod = stuff(prod, 1, 1, '')
from cte c
cross apply
(
select ',' + Item
from cte x
where x.id = c.id
and x.grp = c.grp
order by x.Item
for xml path('')
) i (prod)
Take a look at STRING_SPLIT. You can do something like:
SELECT value
FROM Product
CROSS APPLY STRING_SPLIT(user_access, ',');
or select whatever other columns you care about.
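Splitting alone does not put the reads and writes of one category back together. A sketch that combines STRING_SPLIT with STRING_AGG, assuming SQL Server 2017 or later and the prodLines table from the question's script: each fragment is trimmed, grouped by its category name with the (Read)/(Write) suffix stripped, and re-joined per group.

```sql
-- One output row per id and category, e.g.
-- 'Automotive Coatings (Read), Automotive Coatings (Write)'.
SELECT pl.id
      ,STRING_AGG(LTRIM(s.[value]), ', ')
           WITHIN GROUP (ORDER BY LTRIM(s.[value])) AS prod
FROM prodLines pl
CROSS APPLY STRING_SPLIT(pl.prodlines, ',') s
GROUP BY pl.id
        -- Category key: the fragment without its access suffix.
        ,REPLACE(REPLACE(LTRIM(s.[value]), '(Read)', ''), '(Write)', '');
```

Grouping does not depend on the order in which STRING_SPLIT returns the fragments, so the missing order guarantee is not a problem here.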
In Hive or Spark SQL (this syntax is not available in SQL Server), simply use the LATERAL VIEW EXPLODE function:
select access from Product
LATERAL VIEW explode(split(user_access, '[,]')) usrAccTable AS access;