Split column value into separate columns based on length - SQL

I have multiple comma-separated values in one column, with a size of up to 20000 characters, and I want to split that column into several new columns based on a 2000-character limit (the first new column takes up to 2000 characters, and if the length is greater than 2000 the rest goes into a second column, and so on).
When a comma-separated value goes into a new column, it should stay meaningful, i.e. the split should happen at a comma while staying within the 2000-character limit.
I have managed to get from a row-level value to column level, but the split still needs to respect the 2000-character limit and the commas.
Could you please help me with this?

siddesh, although this question lacks almost everything, I want to point some things out and help you (as you are an inexperienced SO user):
First I set up a minimal reproducible example. Next time, that is your job.
I'll start with a declared table with some rows inserted.
We on SO can copy and paste this into our environment, which makes it easy to answer.
DECLARE @tbl TABLE(ID INT IDENTITY, YourCSVString VARCHAR(MAX));
INSERT INTO @tbl VALUES('1 this is long text, 2 some second fragment, 3 third fragment, 4 adfjksdahfljsadhfjhadlfhasdjks alsdjfsadhf k, 5 halksjfh asdkf ')
,('1 this is other long text, 2 some second fragment to show that this works with tabular data, 3 again a third fragment, 4 adfjksdahfljsadhfjhadlfhasdjks alsdjfsadhf k, 5 halksjfh asdkf ');
--This is what you actually need:
SELECT fkID = t.ID
,B.fragmentPosition
,B.fragmentContent
,C.fragmentLength
FROM @tbl t
CROSS APPLY OPENJSON(CONCAT(N'["',REPLACE(t.YourCSVString,N',','","'),'"]')) A
CROSS APPLY(VALUES(A.[key],TRIM(A.[value]))) B(fragmentPosition,fragmentContent)
CROSS APPLY(VALUES(LEN(B.fragmentContent))) C(fragmentLength);
The result should be stored within a physical table, where fkID points to the ID of the original row and fragmentPosition stores the order. fkID and fragmentPosition should form a combined unique key.
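A minimal sketch of what that target table could look like (the table and constraint names are my assumptions, not from the answer):

CREATE TABLE dbo.CsvFragment
(
     fkID             INT NOT NULL              --points to the ID of the original row
    ,fragmentPosition INT NOT NULL              --preserves the original order
    ,fragmentContent  NVARCHAR(2000) NULL
    ,CONSTRAINT UQ_CsvFragment UNIQUE (fkID, fragmentPosition)
);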
If you really want to do what you are suggesting in your question (not recommended!), you can try something along these lines:
DECLARE @maxPerColumn INT=75; --You can set the portion's max size, in your case 2000.
WITH cte AS
(
SELECT fkID = t.ID
,B.fragmentPosition
,B.fragmentContent
,C.fragmentLength
FROM @tbl t
CROSS APPLY OPENJSON(CONCAT(N'["',REPLACE(t.YourCSVString,N',','","'),'"]')) A
CROSS APPLY(VALUES(A.[key],TRIM(A.[value]))) B(fragmentPosition,fragmentContent)
CROSS APPLY(VALUES(LEN(B.fragmentContent))) C(fragmentLength)
)
,recCTE AS
(
SELECT *
,countPerColumn = 1
,columnCounter = 1
,sumLength = LEN(fragmentContent)
,growingString = CAST(fragmentContent AS NVARCHAR(MAX))
FROM cte WHERE fragmentPosition=0
UNION ALL
SELECT r.fkID
,cte.fragmentPosition
,cte.fragmentContent
,cte.fragmentLength
,CASE WHEN A.newSumLength>@maxPerColumn THEN 1 ELSE r.countPerColumn + 1 END
,r.columnCounter + CASE WHEN A.newSumLength>@maxPerColumn THEN 1 ELSE 0 END
,CASE WHEN A.newSumLength>@maxPerColumn THEN LEN(cte.fragmentContent) ELSE A.newSumLength END
,CASE WHEN A.newSumLength>@maxPerColumn THEN cte.fragmentContent ELSE CONCAT(r.growingString,N', ',cte.fragmentContent) END
FROM cte
INNER JOIN recCTE r ON r.fkID=cte.fkID AND r.fragmentPosition+1=cte.fragmentPosition
CROSS APPLY(VALUES(r.sumLength+LEN(cte.fragmentContent))) A(newSumLength)
)
SELECT TOP 1 WITH TIES
fkID
,growingString
,LEN(growingString)
FROM recCTE
ORDER BY ROW_NUMBER() OVER(PARTITION BY fkID,columnCounter ORDER BY countPerColumn DESC );
The result:
fkID | pos | Content
1    | 2   | 1 this is long text, 2 some second fragment, 3 third fragment
1    | 4   | 4 adfjksdahfljsadhfjhadlfhasdjks alsdjfsadhf k, 5 halksjfh asdkf
2    | 0   | 1 this is other long text
2    | 1   | 2 some second fragment to show that this works with tabular data
2    | 3   | 3 again a third fragment, 4 adfjksdahfljsadhfjhadlfhasdjks alsdjfsadhf k
2    | 4   | 5 halksjfh asdkf
The idea in short:
The first cte does the splitting (as above).
The recursive cte iterates down the string and does the magic.
The final SELECT uses a hack: TOP 1 WITH TIES together with ORDER BY ROW_NUMBER() OVER(...). This returns only the highest intermediate result per fkID and columnCounter (see the small isolated illustration below).
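As an aside, here is that TOP 1 WITH TIES trick in isolation on toy data (not part of the original query):

SELECT TOP 1 WITH TIES grp, val
FROM (VALUES (1, 10), (1, 20), (2, 5)) t(grp, val)
ORDER BY ROW_NUMBER() OVER (PARTITION BY grp ORDER BY val DESC);
--returns one row per grp, the one with the highest val: (1, 20) and (2, 5)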
Hint: Don't do this...
UPDATE
Just for fun:
You can replace the final SELECT with this
,getPortions AS
(
SELECT TOP 1 WITH TIES
fkID
,fragmentPosition
,growingString
,LEN(growingString) portionLength
FROM recCTE
ORDER BY ROW_NUMBER() OVER(PARTITION BY fkID,columnCounter ORDER BY countPerColumn DESC )
)
SELECT p.*
FROM
(
SELECT fkID
,CONCAT(N'col',ROW_NUMBER() OVER(PARTITION BY fkID ORDER BY fragmentPosition)) AS ColumnName
,growingString
FROM getPortions
) t
PIVOT(MAX(growingString) FOR ColumnName IN(col1,col2,col3,col4,col5)) p;
This will return exactly what you are asking for.
But - as said before - this is against all rules of best practice...

Related

Snowflake: Repeating rows based on column value

How can I repeat rows based on a column value in Snowflake using SQL?
I tried a few methods, such as dual and connect by, but they did not work.
I have two columns: Id and Quantity.
For each ID, there are different values of Quantity.
So if you have a count, you can use a generator:
with ten_rows as (
select row_number() over (order by null) as rn
from table(generator(ROWCOUNT=>10))
), data(id, count) as (
select * from values
(1,2),
(2,4)
)
SELECT
d.*
,r.rn
from data as d
join ten_rows as r
on d.count >= r.rn
order by 1,3;
ID | COUNT | RN
1  | 2     | 1
1  | 2     | 2
2  | 4     | 1
2  | 4     | 2
2  | 4     | 3
2  | 4     | 4
OK, let's start by generating some data. We will create 10 rows, each with a QTY randomly chosen as 1 or 2.
Next we want to duplicate the rows with a QTY of 2 and leave those with QTY = 1 as they are.
Simply stack SPLIT_TO_TABLE() and REPEAT() with a LATERAL() join and voila.
Obviously you can change all parameters below to suit your needs - this solution works super fast and, in my opinion, way better than table generation.
WITH TEN_ROWS AS (
    SELECT ROW_NUMBER() OVER (ORDER BY NULL) SOME_ID
          ,UNIFORM(1, 2, RANDOM()) QTY
    FROM TABLE(GENERATOR(ROWCOUNT => 10))
)
SELECT TEN_ROWS.*
FROM TEN_ROWS
    ,LATERAL SPLIT_TO_TABLE(REPEAT(',', QTY - 1), ',') ALTERNATIVE_APPROACH;
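Applied to the actual question, a minimal sketch might look like this (assuming the source table is named src with columns id and quantity, and quantity >= 1 — both names are assumptions):

SELECT s.id, s.quantity
FROM src AS s
    ,LATERAL SPLIT_TO_TABLE(REPEAT(',', s.quantity - 1), ',') t;
--REPEAT(',', n - 1) produces n - 1 delimiters, so the split yields exactly n (empty) fragments,
--i.e. one output row per unit of quantity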

more efficiently pivot rows

I am trying to join multiple tables together. One of the tables I am trying to join has hundreds of rows per ID. I am trying to pivot about 100 rows for each ID into columns. The value I am trying to use isn't always in the same column. Below is an example (my real table has hundreds of rows per ID). AccNum, for example, may be in the NumV column for ID 1, but in the CharV column for ID 2.
ID | QType  | CharV    | NumV
1  | AccNum |          | 10
1  | EmpNam | John Inc | 0
1  | UW     | Josh     | 0
2  | AccNum | 11       |
2  | EmpNam | CBS      | 0
2  | UW     | Dan      | 0
The original code I used was a SELECT statement with hundreds of lines like the one below:
Max(Case When PM.[QType] = 'AccNum' Then NumV End) as AccNum
This code with hundreds of lines completed in just under 10 minutes. The problem, however, is that it only pulls in values from the column I specify, so I will always lose the data that is in a different column. (In the example above I would get AccNum 10, but not AccNum 11, because it's in the CharV column.)
I updated the code to use a pivot:
;with CTE
As
(
Select [PMID], [QType],
Value=concat(Nullif([CharV],''),Nullif([NumV],0))
From [DBase].[dbo].[PM]
)
Select C.[ID] AS M_ID
,Max(c.[AccNum]) As AcctNum
,Max(c.[EmpNam]) As EmpName
and so on...
I then select all of my hundreds of rows and pivot the data:
from CTE
pivot (max(Value) for [QType] in ([AccNum],[EmpNam],(more rows)))As c
The problem with this code, however, is that it takes almost 2 hours to run.
Is there a different, more efficient solution to what I am trying to accomplish? I need to have the speed of the first code, but the result of the second.
Perhaps you can reduce the Concat/NullIf processing by using a UNION ALL
Select ID,QType,Value=CharV From #YourTable where CharV>''
Union All
Select ID,QType,Value=cast(NumV as varchar(25)) From #YourTable where NumV>0
For the conditional aggregation approach
No need to worry about which field the value came from; just reference Value:
Select [ID]
,[Accnum] = Max(Case When [QType] = 'AccNum' Then Value End)
,[EmpNam] = Max(Case When [QType] = 'EmpNam' Then Value End)
,[UW] = Max(Case When [QType] = 'UW' Then Value End)
From (
Select ID,QType,Value=CharV From #YourTable where CharV>''
Union All
Select ID,QType,Value=cast(NumV as varchar(25)) From #YourTable where NumV>0
) A
Group By ID
For the PIVOT approach
Select [ID],[AccNum],[EmpNam],[UW]
From (
Select ID,QType,Value=CharV From #YourTable where CharV>''
Union All
Select ID,QType,Value=cast(NumV as varchar(25)) From #YourTable where NumV>0
) A
Pivot (max([Value]) For [QType] in ([AccNum],[EmpNam],[UW])) p

Generate SQL rows

Given a number of types and a number of occurrences per type, I would like to generate something like this in T-SQL:
Occurrence | Type
-----------------
0 | A
1 | A
0 | B
1 | B
2 | B
Both the number of types and the number of occurrences per type are presented as values in different tables.
While I can do this with WHILE loops, I'm looking for a better solution.
Thanks!
This works with a numbers table, which I would use.
SELECT Occurrence = ROW_NUMBER() OVER (PARTITION BY Type ORDER BY Type) - 1
, Type
FROM Numbers num
INNER JOIN #temp1 t
ON num.n BETWEEN 1 AND t.Occurrence
Tested with this sample data:
create table #temp1(Type varchar(10),Occurrence int)
insert into #temp1 VALUES('A',2)
insert into #temp1 VALUES('B',3)
How to create a numbers table? http://sqlperformance.com/2013/01/t-sql-queries/generate-a-set-1
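In case the link goes stale, here is a minimal sketch of one common pattern for a persisted numbers table (the table name and the size of 10000 are arbitrary assumptions):

CREATE TABLE Numbers (n INT NOT NULL PRIMARY KEY);

INSERT INTO Numbers (n)
SELECT TOP (10000) ROW_NUMBER() OVER (ORDER BY (SELECT NULL))
FROM sys.all_objects a
CROSS JOIN sys.all_objects b; --cross join just to get enough source rows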
If you have a table with the columns type and num, you have two approaches. One way is to use recursive CTEs:
with CTE as (
select type, 0 as occurrence, num
from table t
union all
select type, 1 + occurrence, num
from cte
where occurrence + 1 < num
)
select cte.*
from cte;
You may have to set the MAXRECURSION option, if the number exceeds 100.
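For example, the final SELECT could carry the option like this (a sketch appended to the query above):

select cte.*
from cte
option (maxrecursion 0); --0 removes the default limit of 100 recursion levels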
The other way is to join in a numbers table. SQL Server has master..spt_values, which is often used for this purpose:
select s.number - 1 as occurrence, t.type
from table t join
     master..spt_values s
     on s.type = 'P' and s.number between 1 and t.num;

Find the longest sequence of consecutive increasing numbers in SQL

For this example say I have a table with two fields, AREA varchar(30) and OrderNumber INT.
The table has the following data
AREA | OrderNumber
Fontana | 32
Fontana | 42
Fontana | 76
Fontana | 12
Fontana | 3
Fontana | 99
RC | 32
RC | 1
RC | 8
RC | 9
RC | 4
I would like to return, for each area, the length of the longest run of consecutive increasing values. For Fontana it is 3 (32, 42, 76). For RC it is 2 (8, 9):
AREA | LongestLength
Fontana | 3
RC | 2
How would I do this on MS SQL Server 2005?
One way is to use a recursive CTE that steps over each row. If the row meets the criteria (increasing order number for the same area), you increase the chain length by one. If it doesn't, you start a new chain:
; with numbered as
(
select row_number() over (order by area, eventtime) rn
, *
from Table1
)
, recurse as
(
select rn
, area
, OrderNumber
, 1 as ChainLength
from numbered
where rn = 1
union all
select cur.rn
, cur.area
, cur.OrderNumber
, case
when cur.area = prev.area
and cur.OrderNumber > prev.OrderNumber
then prev.ChainLength + 1
else 1
end
from recurse prev
join numbered cur
on prev.rn + 1 = cur.rn
)
select area
, max(ChainLength)
from recurse
group by
area
Live example at SQL Fiddle.
An alternative way is to use a query to find "breaks", that is, rows that end a sequence of increasing order numbers for the same area. The number of rows between breaks is the length.
; with numbered as
(
select row_number() over (order by area, eventtime) rn
, *
from Table1 t1
)
-- Select rows that break an increasing chain
, breaks as
(
select row_number() over (order by cur.rn) rn2
, cur.rn
, cur.Area
from numbered cur
left join
numbered prev
on cur.rn = prev.rn + 1
where cur.OrderNumber <= prev.OrderNumber
or cur.Area <> prev.Area
or prev.Area is null
)
-- Add a final break after the last row
, breaks2 as
(
select *
from breaks
union all
select count(*) + 1
, max(rn) + 1
, null
from breaks
)
select series_start.area
, max(series_end.rn - series_start.rn)
from breaks2 series_start
join breaks2 series_end
on series_end.rn2 = series_start.rn2 + 1
group by
series_start.area
Live example at SQL Fiddle.
You do not explain why RC's longest sequence does not include 1 while Fontana's does include 32. I take it the 1 is excluded because it is a decrease: it comes after 32. Fontana's 32, however, is the first item in its group, and I have two ideas as to why it is considered an increase: either exactly because it is the group's first item, or because it is also positive (i.e. as if coming after 0 and, therefore, an increase).
For the purpose of this answer, I'm assuming the latter, i.e. a group's first item is an increase if it is positive. The script below implements the following idea:
1. Enumerate the rows in every AREA group in the order of the eventtime column you nearly forgot to mention.
2. Join the enumerated set to itself to link every row with its predecessor.
3. Get the sign of the difference between the row and its preceding value (defaulting the latter to 0). At this point the problem turns into a gaps-and-islands one.
4. Partition every AREA group by the signs determined in #3 and enumerate every subgroup's rows.
5. Find the difference between the row numbers from #1 and those found in #4. That is a criterion to identify individual streaks (together with AREA).
6. Finally, group the results by AREA, the sign from #3 and the result from #5, count the rows and get the maximum count per AREA.
I implemented the above like this:
WITH enumerated AS (
SELECT
*,
row = ROW_NUMBER() OVER (PARTITION BY AREA ORDER BY eventtime)
FROM atable
),
signed AS (
SELECT
this.eventtime,
this.AREA,
this.row,
sgn = SIGN(this.OrderNumber - COALESCE(last.OrderNumber, 0))
FROM enumerated AS this
LEFT JOIN enumerated AS last
ON this.AREA = last.AREA
AND this.row = last.row + 1
),
partitioned AS (
SELECT
AREA,
sgn,
grp = row - ROW_NUMBER() OVER (PARTITION BY AREA, sgn ORDER BY eventtime)
FROM signed
)
SELECT DISTINCT
AREA,
LongestIncSeq = MAX(COUNT(*)) OVER (PARTITION BY AREA)
FROM partitioned
WHERE sgn = 1
GROUP BY
AREA,
grp
;
A SQL Fiddle demo can be found here.
You can do some math by ROW_NUMBER() to figure out where you have consecutive items.
Here's the code sample:
;WITH rownums AS
(
SELECT [area],
ROW_NUMBER() OVER(PARTITION BY [area] ORDER BY [ordernumber]) AS rid1,
ROW_NUMBER() OVER(PARTITION BY [area] ORDER BY [eventtime]) AS rid2
FROM SomeTable
),
differences AS
(
SELECT [area],
[calc] = rid1 - rid2
FROM rownums
),
summation AS
(
SELECT [area], [calc], COUNT(*) AS lengths
FROM differences
GROUP BY [area], [calc]
)
SELECT summation.[area], MAX(summation.lengths) AS LongestLength
FROM differences
JOIN summation
ON differences.[calc] = summation.[calc]
AND differences.[area] = summation.[area]
GROUP BY summation.[area]
So if I do one set of row numbers ordered by my ordernumber and another set of row numbers by my event time, the difference between those two numbers will always be the same, so long as their order is the same.
You can then get a count grouped by those differences and then pull the largest count to get what you need.
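To make that concrete with the Fontana rows (assuming the eventtime order matches the listing in the question):

OrderNumber | rid1 (by ordernumber) | rid2 (by eventtime) | calc
32          | 3                     | 1                   | 2
42          | 4                     | 2                   | 2
76          | 5                     | 3                   | 2
12          | 2                     | 4                   | -2
3           | 1                     | 5                   | -4
99          | 6                     | 6                   | 0

The streak 32, 42, 76 is the only run where calc stays constant (at 2), and its group count of 3 is the LongestLength for Fontana.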

How to split the string value in one column and return the result table

Assume we have the following table:
id name member
1 jacky a;b;c
2 jason e
3 kate i;j;k
4 alex null
Now I want to use SQL or T-SQL to return the following table:
1 jacky a
1 jacky b
1 jacky c
2 jason e
3 kate i
......
How can I do that?
I'm using MSSQL, MySQL, and Oracle databases.
This is the shortest and most readable string-to-rows splitter one could devise, and it could be faster too.
A use case for choosing a pure CTE instead of a function: e.g. when you're not allowed to create a function on the database :-)
Creating a rows generator via a function (which could be implemented using a loop, or via a CTE too) still requires lateral joins (DB2 and Sybase have this functionality, using the LATERAL keyword; in SQL Server, CROSS APPLY and OUTER APPLY are similar) to ultimately join the split rows generated by the function to the main table.
A pure CTE approach could be faster than the function approach. The speed question lies in profiling though; just check the execution plan of this compared to other solutions to see if it is indeed faster:
with Pieces(theId, pn, start, stop) AS
(
SELECT id, 1, 1, charindex(';', member)
from tbl
UNION ALL
SELECT id, pn + 1, stop + 1, charindex(';', member, stop + 1)
from tbl
join pieces on pieces.theId = tbl.id
WHERE stop > 0
)
select
t.id, t.name,
word =
substring(t.member, p.start,
case WHEN stop > 0 THEN p.stop - p.start
ELSE 512
END)
from tbl t
join pieces p on p.theId = t.id
order by t.id, p.pn
Output:
ID NAME WORD
1 jacky a
1 jacky b
1 jacky c
2 jason e
3 kate i
3 kate j
3 kate k
4 alex (null)
Base logic sourced here: T-SQL: Opposite to string concatenation - how to split string into multiple records
Live test: http://www.sqlfiddle.com/#!3/2355d/1
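As a side note beyond the original answers: on SQL Server 2016 and later there is a built-in STRING_SPLIT function, which would produce the same rows without a custom splitter (a sketch against the tbl table above):

SELECT t.id, t.name, s.value AS word
FROM tbl t
OUTER APPLY STRING_SPLIT(t.member, ';') s
ORDER BY t.id;
--OUTER APPLY (rather than CROSS APPLY) keeps the row for alex, whose member is NULL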
Well... let me first introduce you to Adam Machanic, who taught me about a Numbers table. He's also written a very fast split function using this Numbers table.
http://dataeducation.com/counting-occurrences-of-a-substring-within-a-string/
After you implement a Split function that returns a table, you can then join against it and get the results you want.
IF OBJECT_ID('dbo.Users') IS NOT NULL
DROP TABLE dbo.Users;
CREATE TABLE dbo.Users
(
id INT IDENTITY NOT NULL PRIMARY KEY,
name VARCHAR(50) NOT NULL,
member VARCHAR(1000)
)
GO
INSERT INTO dbo.Users(name, member) VALUES
('jacky', 'a;b;c'),
('jason', 'e'),
('kate', 'i;j;k'),
('alex', NULL);
GO
DECLARE @spliter CHAR(1) = ';';
WITH Base AS
(
SELECT 1 AS n
UNION ALL
SELECT n + 1
FROM Base
WHERE n < CEILING(SQRT(1000)) --generates numbers 1 to 32; the cross join below yields up to 32*32 = 1024 numbers; use a larger value depending on the member column's length
)
, Nums AS --Numbers Common Table Expression, if your database version doesn't support it, just create a physical table.
(
SELECT ROW_NUMBER() OVER(ORDER BY (SELECT 0)) AS n
FROM Base AS B1 CROSS JOIN Base AS B2
)
SELECT id,
SUBSTRING(member, n, CHARINDEX(@spliter, member + @spliter, n) - n) AS element
FROM dbo.Users
JOIN Nums
ON n <= DATALENGTH(member) + 1
AND SUBSTRING(@spliter + member, n, 1) = @spliter
ORDER BY id
OPTION (MAXRECURSION 0); --Nums CTE is generated recursively, we don't want to limit recursion count.