Split query results with long strings into multiple rows in T-SQL - sql

I have a table in MS SQL Server that contains multiple TEXT fields that can have very long strings (from 0 characters to 100,000+ characters).
I'd like to create a view (or a stored proc that populates a reporting table) that prepares this data for export to Excel, which has a certain character limit allowable per cell (32,767 chars).
It's relatively trivial to write a query to truncate the fields after a certain number of characters, but I'd like to create new rows containing the text that would be truncated.
Example: in row 1, COL1 and COL3 contain text that is wrapped into 2 rows.
ID | COL1   | COL2    | COL3
---|--------|---------|-------
1  | AAAAAA | BBBBBBB | CCCCCC
1  | AAA    |         | CC
2  | XX     | YY      | ZZ

You can try something along these lines:
A mockup table to simulate your issue:
DECLARE @tbl TABLE(ID INT IDENTITY, LongString VARCHAR(1000));
INSERT INTO @tbl VALUES('blah')
,('blah blah')
,('blah bleh blih bloh')
,('blah bleh blih bloh bluuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuhhhhhhhhhhhhhhhhhhhhhhhhhhhhhh');
--We can specify the chunk's length
DECLARE @Chunk INT=6;
SELECT t.ID
,A.Nmbr AS ChunkNmbr
,SUBSTRING(t.LongString,A.Nmbr*@Chunk+1,@Chunk) AS ChunkOfString
FROM @tbl t
CROSS APPLY(SELECT TOP(LEN(t.LongString)/@Chunk + 1)
ROW_NUMBER() OVER(ORDER BY (SELECT NULL))-1
FROM master..spt_values) A(Nmbr);
The idea in short:
We use a trick with APPLY and a computed TOP clause. The source master..spt_values is just a common table with a lot of rows. We don't need its values, just a set to compute a running number using ROW_NUMBER(). APPLY is called row by row, which means that a long string will create more numbers than a short one.
To get your chunks I use a simple SUBSTRING(), where the start of each chunk is computed by a simple multiplication.
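The same chunking arithmetic can be sketched outside the database for clarity. This is a hypothetical Python helper, not part of the answer's T-SQL; each chunk starts at a multiple of the chunk size, exactly like the Nmbr * chunk-size + 1 start position in the SUBSTRING() call:

```python
def chunks(s, n):
    """Split s into consecutive pieces of at most n characters.

    Mirrors SUBSTRING(s, i*n + 1, n) for i = 0, 1, 2, ... in the
    T-SQL above (T-SQL positions are 1-based, Python's are 0-based).
    """
    return [s[i:i + n] for i in range(0, len(s), n)] or [""]

# The 19-character sample string with a chunk size of 6 yields four rows:
print(chunks("blah bleh blih bloh", 6))
# → ['blah b', 'leh bl', 'ih blo', 'h']
```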
UPDATE: More than one column in one go
Try this to use the approach for more than one column:
DECLARE @tbl TABLE(ID INT IDENTITY, LongString1 VARCHAR(1000), LongString2 VARCHAR(1000));
INSERT INTO @tbl VALUES('blah','dsfafadafdsafdsafdsafsadfdsafdsafdsf')
,('blah blah','afdsafdsafd')
,('blah bleh blih bloh','adfdsafdsafdfdsafdsafdafdsaasdfdsafdsafdsafdsafdsafsadfsadfdsafdsafdsafdsafdafdsafdsafadf')
,('blah bleh blih bloh bluuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuhhhhhhhhhhhhhhhhhhhhhhhhhhhhhh','asdfdsaf');
DECLARE @Chunk INT=6;
SELECT t.ID
,A.MaxLen
,B.Nmbr AS ChunkNmbr
,SUBSTRING(t.LongString1,B.Nmbr*@Chunk+1,@Chunk) AS ChunkOfString1
,SUBSTRING(t.LongString2,B.Nmbr*@Chunk+1,@Chunk) AS ChunkOfString2
FROM @tbl t
CROSS APPLY(SELECT MAX(strLen) FROM (VALUES(LEN(t.LongString1)),(LEN(t.LongString2))) vals(strLen)) A(MaxLen)
CROSS APPLY(SELECT TOP(A.MaxLen/@Chunk + 1)
ROW_NUMBER() OVER(ORDER BY (SELECT NULL))-1
FROM master..spt_values) B(Nmbr);
The new idea: we first use an APPLY to find the longest string in each row. The chunk computation only has to run up to this maximum length.

Related

Data type of each character in a varchar (T-SQL)

I'm curious about the data I get from someone. Most of the time I should get three digits, then a space, then eight digits.
The integration created a varchar(20) column ... I don't doubt it works, but it gives me some matching errors.
Because of this, I'd like to know the data type of the characters in each row.
For example: 0 is for a digit, s for a space, a for a letter, * for special characters.
AWB          | data type
-------------|-------------
012 12345678 | 000s00000000
9/5 ab0534   | 0*0saa0000
I'd like to know if there is a function or a formula to get this kind of result.
Right after, I'll be able to group by this column and finally check how good the data quality is.
I don't know if there is a specific word for what I tried to explain, so excuse me if this is a duplicate of a post I didn't find.
Thank you for your feedback.
There's nothing built-in, but you might use an approach like this:
DECLARE @tbl TABLE(ID INT IDENTITY,AWB VARCHAR(100));
INSERT INTO @tbl VALUES
('012 12345678')
,('9/5 ab0534');
WITH cte AS
(
SELECT t.ID
,t.AWB
,A.Nmbr
,C.YourMask
FROM @tbl t
CROSS APPLY (SELECT TOP (DATALENGTH(t.AWB)) ROW_NUMBER() OVER(ORDER BY (SELECT NULL)) FROM master..spt_values) A(Nmbr)
CROSS APPLY (SELECT SUBSTRING(t.AWB,A.Nmbr,1)) B(SingleCharacter)
CROSS APPLY (SELECT CASE WHEN B.SingleCharacter LIKE '[0-9]' THEN '0'
WHEN B.SingleCharacter LIKE '[a-z]' THEN 'a'
WHEN B.SingleCharacter = ' ' THEN 's'
ELSE '*' END) C(YourMask)
)
SELECT ID
,AWB
,(
SELECT YourMask
FROM cte cte2
WHERE cte2.ID=cte.ID
ORDER BY cte2.Nmbr
FOR XML PATH(''),TYPE
).value('.','nvarchar(max)') YourMaskConcatenated
FROM cte
GROUP BY ID,AWB;
The idea in short:
The cte will create a derived set of your table.
The first CROSS APPLY will create a list of numbers as long as the current AWB value.
The second CROSS APPLY will read each character separately.
The third CROSS APPLY will finally use some rather simple logic to translate your values to the mask you expect.
The final SELECT will then use GROUP BY and a correlated sub-query with FOR XML to re-concatenate the mask characters (with SQL Server 2017+ this is easier using STRING_AGG()).
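For comparison, the per-character classification in the third CROSS APPLY can be sketched in Python. mask() is a hypothetical name for illustration only; note that Python's isalpha()/isdigit() cover Unicode, which is slightly broader than the LIKE '[0-9]' / '[a-z]' patterns in the query:

```python
def mask(s):
    """Translate each character to its type code:
    0 = digit, a = letter, s = space, * = anything else."""
    out = []
    for ch in s:
        if ch.isdigit():
            out.append("0")
        elif ch == " ":
            out.append("s")
        elif ch.isalpha():
            out.append("a")
        else:
            out.append("*")
    return "".join(out)

print(mask("012 12345678"))  # → 000s00000000
print(mask("9/5 ab0534"))    # → 0*0saa0000
```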

How do I find the max combination from a given result string in SQL

Here is the output.
ID  | Stack
----|-----------------------------
123 | 307290,303665,307285
123 | 307290,307285,303424,303665
123 | 307290,307285,303800,303665
123 | 307061,307290
I want only the last three rows in the output. The reason: all three numbers in the Stack column of output line 1 also appear in the Stack columns of lines 2 and 3, so I don't need line 1.
Output lines 2, 3 and 4 are all different, so I want those lines in my result.
I have tried doing it with row_number() and charindex but I'm not getting the proper result.
Thank you.
All the comments telling you to change your database's structure are right! You really should avoid comma-separated values. This breaks first normal form (1NF) and will be a pain in the neck forever.
The result of the second CTE might be used to shift all data into a new 1:n related structure.
Something like this?
DECLARE @tbl TABLE(ID INT,Stack VARCHAR(100));
INSERT INTO @tbl VALUES
(123,'307290,303665,307285')
,(123,'307290,307285,303424,303665')
,(123,'307290,307285,303800,303665')
,(123,'307061,307290');
WITH Splitted AS
(
SELECT ID
,Stack
,ROW_NUMBER() OVER(ORDER BY (SELECT NULL)) AS RowIndex
,CAST('<x>' + REPLACE(Stack,',','</x><x>') + '</x>' AS XML) Casted
FROM @tbl
)
,DerivedDistinctValues AS
(
SELECT DISTINCT
ID
,Stack
,RowIndex
,StackNr.value('.','int') AS Nr
FROM Splitted
CROSS APPLY Casted.nodes('/x') AS A(StackNr)
)
SELECT ddv1.ID
,ddv1.Stack
FROM DerivedDistinctValues AS ddv1
FULL OUTER JOIN DerivedDistinctValues AS ddv2 ON ddv1.RowIndex<>ddv2.RowIndex
AND ddv1.Nr=ddv2.Nr
WHERE ddv2.ID IS NULL
GROUP BY ddv1.ID,ddv1.Stack
This will be slow, especially with larger data sets.
Some explanation:
The first CTE transforms the CSV numbers to <x>307290</x><x>303665</x>... This can be cast to XML, which allows us to generate a derived table returning all the numbers as rows. This happens in the second CTE, which calls the XQuery function .nodes().
The last query does a full outer join - each row's numbers against every other row's numbers. All rows that contain at least one number with no corresponding number in any other row are kept.
But I assume that this might not work in each and every situation (e.g. circular data).
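The intended set logic - keep a row only if its numbers are not fully contained in some other row - can be sketched in Python. drop_contained() is a hypothetical helper for illustration, not a translation of the query:

```python
def drop_contained(stacks):
    """Keep only stacks whose value set is not a proper subset
    of another stack's value set."""
    sets = [set(s.split(",")) for s in stacks]
    return [s for s, vals in zip(stacks, sets)
            if not any(vals < other for other in sets)]

stacks = [
    "307290,303665,307285",
    "307290,307285,303424,303665",
    "307290,307285,303800,303665",
    "307061,307290",
]
# The first stack is contained in the second, so only the last three survive:
print(drop_contained(stacks))
```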

how to replace a column value which is separated by comma in SQL

I have a table with a column named CDR.
In the CDR column we have comma-separated values like 20,5,40,10,30.
I just need to replace the last value (here it is 30) with 0 in every row.
Can someone suggest how we can do that?
Thanks
If you are able, first correct the database design, as the table is not in first normal form. It is bad design to store more than one value in one column, as evidenced by your having to ask this question. :-) Having said that, I have to deal with vendor data that has the same issue and is beyond my control to change, so in Oracle 11g I would do this:
update table_name
set CDR = regexp_replace(CDR, '(.*,)\d+$', '\10');
The regex matches and remembers all characters up to and including the last comma before one or more digits at the end of the string. The replacement string is the remembered part referenced by \1 (referring to the first group of characters inside parentheses), plus the 0.
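The same pattern can be checked outside the database; here it is in Python (using \g<1> instead of \1 so the backreference stays unambiguous in front of the literal 0):

```python
import re

# Replace the last comma-separated value with 0: group 1 captures
# everything up to and including the final comma.
cdr = "20,5,40,10,30"
fixed = re.sub(r"(.*,)\d+$", r"\g<1>0", cdr)
print(fixed)  # → 20,5,40,10,0
```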
If you are using SQL Server, this should do for you.
create table #A(id int , cdr varchar(100))
insert into #A values(1,'10,20,30,40'),(2, '20,30,40,50'),(3,'30,40,50,60,70')
DECLARE @tA TABLE(id int, String varchar(10))
insert into @tA
SELECT id,
Split.a.value('.', 'VARCHAR(100)') AS String
FROM (SELECT [id],
CAST ('<M>' + REPLACE([cdr], ',', '</M><M>') + '</M>' AS XML) AS String
FROM #A) AS A CROSS APPLY String.nodes ('/M') AS Split(a);
delete from @tA where [String] = '30'
SELECT distinct id,
ISNULL(STUFF((SELECT ', ' + String
FROM @tA t
WHERE t.id = ta.id
FOR XML PATH('')
), 1, 2, ''), '') AS Str
into #tempA
FROM @tA ta
select * from #tempA
drop table #A, #tempA
UPDATE TableName
SET CDR = REPLACE(CDR, (SUBSTRING( CDR, LEN(CDR) - CHARINDEX(',',REVERSE(CDR)) + 2 , LEN(CDR))),0);
You should think about splitting up your comma-separated list into a separate table. That way you can do other things in SQL. SQL is not the best with string manipulation, and your queries are gonna get obscene and unruly.
table Users
user_id | user_name | job_list
--------|-----------|----------
1       | Billy     | "1,2,3,4"

table Jobs
job_id | job_desc
-------|------------
1      | plumber
2      | carpenter
3      | electrician
4      | programmer
If you do this, you're gonna have some heartaches: when a job goes away or changes, you're gonna have a lot of annoying cleanup, like @jarlh suggests.
If you make a third table to hold the relationships (user_id to job_id), you will have a much better time if you need to do something like delete a job_id from existence. Of course this is all made up based on your limited question, but it should help you out.
table UserJobRelationship
relationship_id | user_id | job_id
----------------|---------|-------
1               | 1       | 1
2               | 1       | 2
3               | 1       | 3
4               | 1       | 4
This gives you much more flexibility and allows you to delete the most recent entry: simply take the max of relationship_id where user_id equals that user, or do it across the whole table.

I am looking to update an integer value of one column based on partial data contained in a VARCHAR Column

Column A (VARCHAR) contains a UNC path, e.g.
lab03-app01\66\2016\10\3\LAB03-REC01\4989\1_6337230127359919046_6337230127366210371.wav
Within the UNC path is an index number, 4989.
I need to be able to update Column B (INT) to equal the index number located in Column A.
Is this possible?
This would be much easier with CLR and regular expressions.
One way of doing what you need in TSQL is below (demo).
DECLARE @T TABLE (A VARCHAR(255), B INT NULL);
INSERT INTO @T(A) VALUES ('lab03-app01\66\2016\10\3\LAB03-REC01\4989\1_6337230127359919046_6337230127366210371.wav')
UPDATE T
SET B = CAST(REVERSE(SUBSTRING(ReverseA, FinalSlash, PenultimateSlash - FinalSlash)) AS INT)
FROM @T T
CROSS APPLY (SELECT REVERSE(A)) ca1(ReverseA)
CROSS APPLY (SELECT 1 + CHARINDEX('\', ReverseA)) ca2(FinalSlash)
CROSS APPLY (SELECT CHARINDEX('\', ReverseA, FinalSlash)) ca3(PenultimateSlash);
SELECT *
FROM @T;
I think charindex() does what you want:
select charindex('4989', a)
To be safe, you might want to include the delimiters:
select charindex('\4989\', a)
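Outside T-SQL, the same "segment between the last two backslashes" extraction is a one-liner. A Python sketch, using the example path from the question:

```python
path = r"lab03-app01\66\2016\10\3\LAB03-REC01\4989\1_6337230127359919046_6337230127366210371.wav"

# Split twice from the right: the second-to-last segment is the index number,
# just like the REVERSE/CHARINDEX dance in the T-SQL answer above.
index_number = int(path.rsplit("\\", 2)[1])
print(index_number)  # → 4989
```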

SQL - create multiple rows from a single row

I'm a longtime SAS programmer, and we're looking at moving our system from SAS to another platform. I have only a very basic knowledge of SQL; the marketing folks talk about using SQL a lot, but I wonder how it might do some things we need done. For instance, we have files with up to 50 million rows of vaccination records, one for each vaccine that was administered to a patient. Some vaccines are actually a combination vaccine that represents 2-4 different types of vaccines. The type of vaccine is based on the value of CVX. Using a do-loop it's fairly simple to do this in SAS, but I've no idea how it might be done in SQL. It'd be safe to assume that we have all the CVX codes in a table with the 1 to 4 vaccine types that need to be generated. But how would you do it in SQL?
Thanks,
Steve
I don't know anything about your schema, but it sounds like you're looking to split multiple columns into rows. You can use either CROSS APPLY or the UNPIVOT operator. Here is an example that takes a contrived test table and splits it into separate rows per key:
create table #Test
(
Test_Key int identity(1,1) primary key clustered,
Test_A int,
Test_B int,
Test_C int
)
declare @n int
set @n = 1
while @n < 10000
begin
insert into #Test (Test_A, Test_B, Test_C)
select @n * 5 + 1, @n * 5 + 2, @n * 5 + 3
set @n = @n + 1
end
select * from #Test
-- this example converts the columns into rows using CROSS APPLY
-- this may be slightly less expensive than the UNPIVOT example below
select
F_Key,
F_Value
from #Test
cross apply
(
values
(Test_Key, Test_A), -- 1st row is Test_A
(Test_Key, Test_B), -- 2nd row is Test_B
(Test_Key, Test_C) -- 3rd row is Test_C
) as F(F_Key, F_Value)
-- this example converts the columns into rows using the UNPIVOT operator
select
Test_Key, TestKey
from #Test
unpivot
(TestKey for Test_Type in (Test_A, Test_B, Test_C)) as C
drop table #Test
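The unpivot operation itself is easy to picture outside SQL. Here is a minimal Python sketch of turning one wide row into three (key, column, value) rows, using hypothetical data shaped like the #Test columns above:

```python
rows = [{"Test_Key": 1, "Test_A": 6, "Test_B": 7, "Test_C": 8}]

# One output row per (key, value-column) pair - the same shape the
# CROSS APPLY (VALUES ...) and UNPIVOT queries above produce.
unpivoted = [(r["Test_Key"], col, r[col])
             for r in rows
             for col in ("Test_A", "Test_B", "Test_C")]
print(unpivoted)
# → [(1, 'Test_A', 6), (1, 'Test_B', 7), (1, 'Test_C', 8)]
```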