Select chunked data with row number using T-SQL - sql

I have a SQL Server table containing a long text (varchar(max)) column and several key columns. I need to load this data into another table, but break the long text into chunks (varchar(4000)), so each row in the source table may become multiple rows in the target table.
Is there a way using T-SQL which can do this in one select statement and also provide a row_number? Sort of like partition over chunk size, if this makes sense.

Recursive CTE to split the string into 4000 character chunks (as a variation on this comma splitting thread: Turning a Comma Separated string into individual rows)
;WITH cte(SomeID, RowNum, DataItem, String) AS
(
SELECT
SomeID,
1,
LEFT(String, 4000),
case when len(String) > 4000 then
RIGHT(String, LEN(String) - 4000)
else
''
end
FROM myTable
UNION all
SELECT
SomeID,
RowNum + 1,
LEFT(String, 4000),
case when len(String) > 4000 then
RIGHT(String, LEN(String) - 4000)
else
''
end
FROM cte
WHERE
String > ''
)
SELECT *
FROM cte

You can use a recursive subquery:
with cte as (
select left(longtext, 4000) as val, 1 as rn,
stuff(longtext, 1, 4000, '') as rest
from t
union all
select left(rest, 4000), rn + 1,
stuff(longtext, 1, 4000, '') as rest
from cte
where longtext > ''
)
select *
from cte;
You probably want to include other columns as well, but your question doesn't mention them.
Also, if your text is really long (say more than a handful of chunks per row), then this is not going to be the most efficient method. There are related methods using recursive CTEs to solve this.

Related

Convert comma separated string into rows

I have a comma separated string.
Now I'd like to separate this string value into each row.
Input:
1,2,3,4,5
Required output:
value
----------
1
2
3
4
5
How can I achieve this in sql?
Thanks in advance.
Use the STRING_SPLIT function if you are on SQL Server
SELECT value
FROM STRING_SPLIT('1,2,3,4,5', ',')
Else you can loop on the SUBSTRING_INDEX() function and insert every string in a temporary table.
If you are using Postgres, you can use string_to_array and unnest:
select *
from unnest(string_to_array('1,2,3,4,5',',') as t(value);
In Postgres, you can also use the 'regexp_split_to_table()' function.
If you're using MariaDB or MySQL you can use a recursive CTE such as:
with recursive itemtable as (
select
trim(substring_index(data, ',', 1)) as value,
right(data, length(data) - locate(',', data, 1)) as data
from (select '1,2,3,4,5' as data) as input
union
select
trim(substring_index(data, ',', 1)) as value,
right(data, length(data) - locate(',', data, 1)) as data
from itemtable
)

Using Oracle REGEXP_SUBSTR to extract uppercase data separated by underscores

sample column data:
Failure on table TOLL_USR_TRXN_HISTORY:
Failure on table DOCUMENT_IMAGES:
Error in CREATE_ACC_STATEMENT() [line 16]
I am looking for a way to extract only the uppercase words (table names) separated by underscores. I want the whole table name, the maximum is 3 underscores and the minimum is 1 underscore. I would like to ignore any capital letters that are initcap.
You can just use regexp_substr():
select regexp_substr(str, '[A-Z_]{3,}', 1, 1, 'c')
from (select 'Failure on table TOLL_USR_TRXN_HISTORY' as str from dual) x;
The pattern says to find substrings with capital letters or underscores, at least 3 characters long. The 1, 1 means start from the first position and return the first match. The 'c' makes the search case-sensitive.
You may use such a SQL Select statement for each substituted individual line
( Failure on table TOLL_USR_TRXN_HISTORY in the below case )
from your text :
select regexp_replace(q.word, '[^a-zA-Z0-9_]+', '') as word
from
(
select substr(str,nvl(lag(spc) over (order by lvl),1)+1*sign(lvl-1),
abs(decode(spc,0,length(str),spc)-nvl(lag(spc) over (order by lvl),1))) word,
nvl(lag(spc) over (order by lvl),1) lg
from
(
with tab as
( select 'Failure on table TOLL_USR_TRXN_HISTORY' str from dual )
select instr(str,' ',1,level) spc, str, level lvl
from tab
connect by level <= 10
)
) q
where lg > 0
and upper(regexp_replace(q.word, '[^a-zA-Z0-9_]+', ''))
= regexp_replace(q.word, '[^a-zA-Z0-9_]+', '')
and ( nvl(length(regexp_substr(q.word,'_',1,1)),0)
+ nvl(length(regexp_substr(q.word,'_',1,2)),0)
+ nvl(length(regexp_substr(q.word,'_',1,3)),0)) > 0
and nvl(length(regexp_substr(q.word,'_',1,4)),0) = 0;
Alternate way to get only table name from below error message , the below query will work only if table_name at end in the mentioned way
with t as( select 'Failure on table TOLL_USR_TRXN_HISTORY:' as data from dual)
SELECT RTRIM(substr(data,instr(data,' ',-1)+1),':') from t
New Query for all messages :
select replace (replace ( 'Failure on table TOLL_USR_TRXN_HISTORY:
Failure on table DOCUMENT_IMAGES:' , 'Failure on table', ' ' ),':',' ') from dual

ORDER BY specific numerical value in string [SQL]

Have a column ID that I would like to ORDER in a specific format. Column has a varchar data type and always has an alphabetic value, typically P in front followed by three to four numeric values. Possibly even followed by an underscore or another alphabetic value. I have tried few options and none are returning what I desire.
SELECT [ID] FROM MYTABLE
ORDER BY
(1) LEN(ID), ID ASC
/ (2) LEFT(ID,2)
OPTIONS TRIED (3) SUBSTRING(ID,2,4) ASC
\ (4) ROW_NUMBER() OVER (ORDER BY SUBSTRING(ID,2,4))
(5) SUBSTRING(ID,PATINDEX('%[0-9]%',ID),LEN(ID))
(6) LEFT(ID, PATINDEX('%[0-9]%', ID)-1)
Option 1 seems to be closest to what I am looking for except when an _ or Alphabetic values follow the numeric value. See results from Option 1 below
P100
P208
P218
P301
P305
P306
P4200
P4510
P4511
P4512
P5011
P1400A
P4125H
P4202A
P4507L
P4706A
P1001_2
P2103_B
P4368_RL
Would like to see..
P100
P208
P218
P301
P305
P306
P1001_2
P1400A
P2103_B
P4125H
P4200
P4202A
P4368_RL
P4507L
P4510
P4511
P4512
P4706A
P5011
ORDER BY
CAST(SUBSTRING(id, 2, 4) AS INT),
SUBSTRING(id, 6, 3)
http://sqlfiddle.com/#!6/9eecb7db59d16c80417c72d1e1f4fbf1/9464
And one that's still less complex than a getOnlyNumbers() UDF, but copes with varying length of numeric part.
CROSS APPLY
(
SELECT
tail_start = PATINDEX('%[0-9][^0-9]%', id + '_')
)
stats
CROSS APPLY
(
SELECT
numeric = CAST(SUBSTRING(id, 2, stats.tail_start-1) AS INT),
alpha = RIGHT(id, LEN(id) - stats.tail_start)
)
id_tuple
ORDER BY
id_tuple.numeric,
id_tuple.alpha
http://sqlfiddle.com/#!6/9eecb7db59d16c80417c72d1e1f4fbf1/9499
Finally, one that can cope with there being no number at all (but still assumes the first character exists and should be ignored).
CROSS APPLY
(
SELECT
tail_start = NULLIF(PATINDEX('%[0-9][^0-9]%', id + '_'), 0)
)
stats
CROSS APPLY
(
SELECT
numeric = CAST(SUBSTRING(id, 2, stats.tail_start-1) AS INT),
alpha = RIGHT(id, LEN(id) - ISNULL(stats.tail_start, 1))
)
id_tuple
ORDER BY
id_tuple.numeric,
id_tuple.alpha
http://sqlfiddle.com/#!6/9eecb7db59d16c80417c72d1e1f4fbf1/9507
This is a rather strange way to sort but now that I understand it I figured out a solution. I am using a table valued function here to strip out only the numbers from a string. Since the function returns all numeric characters I also need to check for the _ and only pass in the part of the string before that.
Here is the function.
create function GetOnlyNumbers
(
#SearchVal varchar(8000)
) returns table as return
with MyValues as
(
select substring(#SearchVal, N, 1) as number
, t.N
from cteTally t
where N <= len(#SearchVal)
and substring(#SearchVal, N, 1) like '[0-9]'
)
select distinct NumValue = STUFF((select number + ''
from MyValues mv2
order by mv2.N
for xml path('')), 1, 0, '')
from MyValues mv
This function is using a tally table. If you have one you can tweak that code slightly to fit. Here is my tally table. I keep it as a view.
create View [dbo].[cteTally] as
WITH
E1(N) AS (select 1 from (values (1),(1),(1),(1),(1),(1),(1),(1),(1),(1))dt(n)),
E2(N) AS (SELECT 1 FROM E1 a, E1 b), --10E+2 or 100 rows
E4(N) AS (SELECT 1 FROM E2 a, E2 b), --10E+4 or 10,000 rows max
cteTally(N) AS
(
SELECT ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) FROM E4
)
select N from cteTally
GO
Next of course we need to have some data to work. In this case I just created a table variable to represent your actual table.
declare #Something table
(
SomeVal varchar(10)
)
insert #Something values
('P100')
, ('P208')
, ('P218')
, ('P301')
, ('P305')
, ('P306')
, ('P4200')
, ('P4510')
, ('P4511')
, ('P4512')
, ('P5011')
, ('P1400A')
, ('P4125H')
, ('P4202A')
, ('P4507L')
, ('P4706A')
, ('P1001_2')
, ('P2103_B')
, ('P4368_RL')
With all the legwork and setup behind us we can get to the actual query needed to accomplish this.
select s.SomeVal
from #Something s
cross apply dbo.GetOnlyNumbers(case when charindex('_', s.SomeVal) = 0 then s.SomeVal else left(s.SomeVal, charindex('_', s.SomeVal) - 1) end) x
order by convert(int, x.NumValue)
This returns the rows in the order you listed them in your question.
You can break down ID in steps to extract the number. Then, order by the number and ID. I like to break down long string manipulation into steps using CROSS APPLY. You can do it inline (it'd be long) or bundle it into an inline TVF.
SELECT t.*
FROM MYTABLE t
CROSS APPLY (SELECT NoP = STUFF(ID, 1, 1, '')) nop
CROSS APPLY (SELECT FindNonNumeric = LEFT(NoP, ISNULL(NULLIF(PATINDEX('%[^0-9]%', NoP)-1, -1), LEN(NoP)))) fnn
CROSS APPLY (SELECT Number = CONVERT(INT, FindNonNumeric)) num
ORDER BY Number
, ID;
I think your best bet is to create a function that strips the numbers out of the string, like this one, and then sort by that. Even better, as #SeanLange suggested, would be to use that function to store the number value in a new column and sort by that.

Query Split string into rows

I have a table that looks like this:
ID Value
1 1,10
2 7,9
I want my result to look like this:
ID Value
1 1
1 2
1 3
1 4
1 5
1 6
1 7
1 8
1 9
1 10
2 7
2 8
2 9
I'm after both a range between 2 numbers with , as the delimiter (there can only be one delimiter in the value) and how to split this into rows.
Splitting the comma separated numbers is a small part of this problem. The parsing should be done in the application and the range stored in separate columns. For more than one reason: Storing numbers as strings is a bad idea. Storing two attributes in a single column is a bad idea. And, actually, storing unsanitized user input in the database is also often a bad idea.
In any case, one way to generate the list of numbers is to use a recursive CTE:
with t as (
select t.*, cast(left(value, charindex(',', value) - 1) as int) as first,
cast(substring(value, charindex(',', value) + 1, 100) as int) as last
from table t
),
cte as (
select t.id, t.first as value, t.last
from t
union all
select cte.id, cte.value + 1, cte.last
from cte
where cte.value < cte.last
)
select id, value
from cte
order by id, value;
You may need to fiddle with the value of MAXRECURSION if the ranges are really big.
Any table that a field with multiple values such as this is a problem in terms of design. The only way to deal with these records as it is is to split the values on the delimiter and put them into a temporary table, implement custom splitting code, integrate a CTE as noted, or redesign the original table to put the comma-delimited fields into separate fields, eg
ID LOWLIMIT HILIMIT
1 1 10
similar with Gordon Linoff variant, but has some difference
--create temp table for data sample
DECLARE #Yourdata AS TABLE ( id INT, VALUE VARCHAR(20) )
INSERT #Yourdata
( id, VALUE )
VALUES ( 1, '1,10' ),
( 2, '7,9' )
--final query
;WITH Tally
AS ( SELECT MIN(CONVERT(INT, SUBSTRING(y.VALUE, 1, CHARINDEX(',', y.value) - 1))) AS MinV ,
MAX(CONVERT(INT, SUBSTRING(y.VALUE, CHARINDEX(',', y.value) + 1, 18))) AS MaxV
FROM #yourdata AS y
UNION ALL
SELECT MinV = MinV + 1 , MaxV
FROM Tally
WHERE MinV < Maxv
)
SELECT y.id , t.minV AS value
FROM #yourdata AS y
JOIN tally AS t ON t.MinV BETWEEN CONVERT(INT, SUBSTRING(y.VALUE, 1, CHARINDEX(',', y.value) - 1))
AND CONVERT(INT, SUBSTRING(y.VALUE, CHARINDEX(',', y.value) + 1, 18))
ORDER BY id, minV
OPTION ( MAXRECURSION 999 ) --change it if required
output

Split on colons in SQL Server

I have a column in SQL table which would have data like this:
"College: Queenstown College" or "University: University of Queensland"
Text before the ":" could be different. So, how can i select only the text before the ":" from the column and text after the ":(space)"?
You should probably consider putting your table into first normal form rather than storing 2 pieces of data in the same column...
;with yourtable as
(
select 'College: Queenstown College' as col UNION ALL
select 'University: University of Queensland'
)
select
left(col,charindex(': ',col)-1) AS InstitutionType,
substring(col, charindex(': ',col)+2, len(col)) AS InstitutionName
from yourtable
Using the charindex and substring functions:
substring(theField, 1, charindex(':', theField) - 1)