How to split the string value in one column and return the result as a table - SQL

Assume we have the following table:
id  name   member
1   jacky  a;b;c
2   jason  e
3   kate   i;j;k
4   alex   null
Now I want to use SQL or T-SQL to return the following table:
1 jacky a
1 jacky b
1 jacky c
2 jason e
3 kate i
......
How can I do that?
I'm using MSSQL, MySQL, and Oracle databases.

This is the shortest and most readable string-to-rows splitter one could devise, and it may be faster too.
One use case for choosing a pure CTE over a function is when you're not allowed to create a function on the database :-)
Generating the rows via a function (which could itself be implemented with a loop or with a CTE) still requires a lateral join (DB2 and Sybase have this functionality via the LATERAL keyword; in SQL Server the equivalents are CROSS APPLY and OUTER APPLY) to join the split rows produced by the function back to the main table.
The pure CTE approach could also be faster than the function approach. That comes down to profiling, though: check the execution plan of this query against the other solutions to see whether it really is faster:
with Pieces(theId, pn, start, stop) AS
(
    SELECT id, 1, 1, charindex(';', member)
    from tbl
    UNION ALL
    SELECT id, pn + 1, stop + 1, charindex(';', member, stop + 1)
    from tbl
    join Pieces on Pieces.theId = tbl.id
    WHERE stop > 0
)
select
    t.id, t.name,
    word =
        substring(t.member, p.start,
                  case WHEN stop > 0 THEN p.stop - p.start
                       ELSE 512 --512 is just an upper bound for the length of the last fragment
                  END)
from tbl t
join Pieces p on p.theId = t.id
order by t.id, p.pn
Output:
ID NAME WORD
1 jacky a
1 jacky b
1 jacky c
2 jason e
3 kate i
3 kate j
3 kate k
4 alex (null)
Base logic sourced here: T-SQL: Opposite to string concatenation - how to split string into multiple records
Live test: http://www.sqlfiddle.com/#!3/2355d/1

Well... let me first introduce you to Adam Machanic who taught me about a Numbers table. He's also written a very fast split function using this Numbers table.
http://dataeducation.com/counting-occurrences-of-a-substring-within-a-string/
After you implement a Split function that returns a table, you can then join against it and get the results you want.
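For illustration only, here is a minimal sketch of that idea (not the article's exact code): an inline table-valued function that splits on ';' using a prepopulated dbo.Numbers(n) table, joined back to the source rows with CROSS APPLY. The function and table names are made up; use OUTER APPLY instead if you want to keep rows whose member column is NULL.
CREATE FUNCTION dbo.SplitString (@input VARCHAR(8000), @delim CHAR(1))
RETURNS TABLE
AS
RETURN
    SELECT pos  = ROW_NUMBER() OVER (ORDER BY n.n),
           item = SUBSTRING(@input, n.n, CHARINDEX(@delim, @input + @delim, n.n) - n.n)
    FROM dbo.Numbers AS n               -- assumed: a table of integers 1..N, N >= max string length
    WHERE n.n <= LEN(@input)
      AND SUBSTRING(@delim + @input, n.n, 1) = @delim;
GO

SELECT u.id, u.name, s.item
FROM dbo.Users AS u                     -- the sample table from the setup below
CROSS APPLY dbo.SplitString(u.member, ';') AS s
ORDER BY u.id, s.pos;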

IF OBJECT_ID('dbo.Users') IS NOT NULL
    DROP TABLE dbo.Users;
CREATE TABLE dbo.Users
(
    id INT IDENTITY NOT NULL PRIMARY KEY,
    name VARCHAR(50) NOT NULL,
    member VARCHAR(1000)
);
GO
INSERT INTO dbo.Users(name, member) VALUES
    ('jacky', 'a;b;c'),
    ('jason', 'e'),
    ('kate', 'i;j;k'),
    ('alex', NULL);
GO
DECLARE @splitter CHAR(1) = ';';
WITH Base AS
(
    SELECT 1 AS n
    UNION ALL
    SELECT n + 1
    FROM Base
    WHERE n < CEILING(SQRT(1000)) --Base holds 1..32; crossed with itself below, it covers 1..1000+. Enlarge this if the member column can be longer.
)
, Nums AS --Numbers Common Table Expression; if your database version doesn't support it, just create a physical numbers table.
(
    SELECT ROW_NUMBER() OVER(ORDER BY (SELECT 0)) AS n
    FROM Base AS B1 CROSS JOIN Base AS B2
)
SELECT id,
       name,
       SUBSTRING(member, n, CHARINDEX(@splitter, member + @splitter, n) - n) AS element
FROM dbo.Users
JOIN Nums
    ON n <= DATALENGTH(member) + 1
   AND SUBSTRING(@splitter + member, n, 1) = @splitter
ORDER BY id
OPTION (MAXRECURSION 0); --the Nums CTE is generated recursively; we don't want to limit the recursion count.

Related

Split column value into separate columns based on length

I have multiple comma-separated values in one column, with a size of up to 20000 characters, and I want to split that column into several new columns of at most 2000 characters each (the first new column takes up to 2000 characters, and if the length is greater than 2000 the remainder goes into the second column, and so on).
Each new column should still be meaningful: it should break only at a comma and hold at most 2000 characters.
I have managed to go from row-level values to column-level values, but the split should be at 2000 characters and based on commas.
Could you please help me with this?
siddesh, although this question lacks almost everything, I want to point some things out and help you (as you are an inexperienced SO user):
First I set up a minimal reproducible example. This is on you next time.
I'll start with a declared table variable with some rows inserted.
We on SO can copy and paste this into our environment, which makes it easy to answer.
DECLARE @tbl TABLE(ID INT IDENTITY, YourCSVString VARCHAR(MAX));
INSERT INTO @tbl VALUES('1 this is long text, 2 some second fragment, 3 third fragment, 4 adfjksdahfljsadhfjhadlfhasdjks alsdjfsadhf k, 5 halksjfh asdkf ')
                      ,('1 this is other long text, 2 some second fragment to show that this works with tabular data, 3 again a third fragment, 4 adfjksdahfljsadhfjhadlfhasdjks alsdjfsadhf k, 5 halksjfh asdkf ');
--This is what you actually need:
SELECT fkID = t.ID
      ,B.fragmentPosition
      ,B.fragmentContent
      ,C.fragmentLength
FROM @tbl t
CROSS APPLY OPENJSON(CONCAT(N'["',REPLACE(t.YourCSVString,N',','","'),'"]')) A
CROSS APPLY(VALUES(A.[key],TRIM(A.[value]))) B(fragmentPosition,fragmentContent)
CROSS APPLY(VALUES(LEN(B.fragmentContent))) C(fragmentLength);
The result should be stored in a physical table, where fkID points to the ID of the original row and fragmentPosition stores the order. fkID and fragmentPosition should form a combined unique key.
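A sketch of such a target table might look like this (the names are only examples, not a requirement):
CREATE TABLE dbo.CsvFragments
(
    fkID             INT           NOT NULL,  -- points to the ID of the original row
    fragmentPosition INT           NOT NULL,  -- preserves the original order
    fragmentContent  NVARCHAR(MAX) NULL,
    CONSTRAINT UQ_CsvFragments_fkID_pos UNIQUE (fkID, fragmentPosition)  -- the combined unique key
);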
If you really want to do what you are suggesting in your question (not recommended!), you can try something along these lines:
DECLARE @maxPerColumn INT = 75; --You can set the portion's max size; in your case 2000.
WITH cte AS
(
    SELECT fkID = t.ID
          ,B.fragmentPosition
          ,B.fragmentContent
          ,C.fragmentLength
    FROM @tbl t
    CROSS APPLY OPENJSON(CONCAT(N'["',REPLACE(t.YourCSVString,N',','","'),'"]')) A
    CROSS APPLY(VALUES(A.[key],TRIM(A.[value]))) B(fragmentPosition,fragmentContent)
    CROSS APPLY(VALUES(LEN(B.fragmentContent))) C(fragmentLength)
)
,recCTE AS
(
    SELECT *
          ,countPerColumn = 1
          ,columnCounter = 1
          ,sumLength = LEN(fragmentContent)
          ,growingString = CAST(fragmentContent AS NVARCHAR(MAX))
    FROM cte WHERE fragmentPosition = 0
    UNION ALL
    SELECT r.fkID
          ,cte.fragmentPosition
          ,cte.fragmentContent
          ,cte.fragmentLength
          ,CASE WHEN A.newSumLength > @maxPerColumn THEN 1 ELSE r.countPerColumn + 1 END
          ,r.columnCounter + CASE WHEN A.newSumLength > @maxPerColumn THEN 1 ELSE 0 END
          ,CASE WHEN A.newSumLength > @maxPerColumn THEN LEN(cte.fragmentContent) ELSE A.newSumLength END
          ,CASE WHEN A.newSumLength > @maxPerColumn THEN cte.fragmentContent ELSE CONCAT(r.growingString,N', ',cte.fragmentContent) END
    FROM cte
    INNER JOIN recCTE r ON r.fkID = cte.fkID AND r.fragmentPosition + 1 = cte.fragmentPosition
    CROSS APPLY(VALUES(r.sumLength + LEN(cte.fragmentContent))) A(newSumLength)
)
SELECT TOP 1 WITH TIES
       fkID
      ,growingString
      ,LEN(growingString)
FROM recCTE
ORDER BY ROW_NUMBER() OVER(PARTITION BY fkID, columnCounter ORDER BY countPerColumn DESC);
The result
fkID pos Content
1 2 1 this is long text, 2 some second fragment, 3 third fragment
1 4 4 adfjksdahfljsadhfjhadlfhasdjks alsdjfsadhf k, 5 halksjfh asdkf
2 0 1 this is other long text
2 1 2 some second fragment to show that this works with tabular data
2 3 3 again a third fragment, 4 adfjksdahfljsadhfjhadlfhasdjks alsdjfsadhf k
2 4 5 halksjfh asdkf
The idea in short:
The first cte does the splitting (as above)
The recursive cte will iterate down the string and do the magic.
The final SELECT uses a hack with TOP 1 WITH TIES together with an ORDER BY ROW_NUMBER() OVER(...). This will return the highest intermediate result only.
Hint: Don't do this...
UPDATE
Just for fun:
You can replace the final SELECT with this
,getPortions AS
(
    SELECT TOP 1 WITH TIES
           fkID
          ,fragmentPosition
          ,growingString
          ,LEN(growingString) portionLength
    FROM recCTE
    ORDER BY ROW_NUMBER() OVER(PARTITION BY fkID, columnCounter ORDER BY countPerColumn DESC)
)
SELECT p.*
FROM
(
    SELECT fkID
          ,CONCAT(N'col', ROW_NUMBER() OVER(PARTITION BY fkID ORDER BY fragmentPosition)) AS ColumnName
          ,growingString
    FROM getPortions
) t
PIVOT(MAX(growingString) FOR ColumnName IN(col1, col2, col3, col4, col5)) p;
This will return exactly what you are asking for.
But - as said before - this is against all rules of best practice...

SQL Server - Manipulate data and return new results

Is there any way to do something in SQL Server to manipulate data like manipulating arrays in any other programming language?
I have one SQL query that returns 3 columns, "dt_ref" (date), "vlr_venda" (float) and "qt_parcelas" (int)
Basically, I need to do something like this:
- When the field "qt_parcelas" is greater than 1, I need to do a "loop" with that row and generate that many rows.
So I need to divide the field "vlr_venda" by the field "qt_parcelas", use the field "dt_ref" as the starting date, and increment the month by one for each of the "qt_parcelas" generated rows.
For example, if my query returns these structure:
| dt_ref | vlr_venda | qt_parcelas |
-------------------------------------
|20180901 | 3000 | 3 |
I need to do something to return this:
| dt_ref | vlr_venda |
----------------------
|20180901 | 1000 |
|20181001 | 1000 |
|20181101 | 1000 |
Is it possible to do it in SQL Server?
I've searched for something like this but haven't found anything useful...
Any ideas?
You can use a recursive CTE: Sql Fiddle
with cte as (
    select dt_ref, vlr_venda / qt_parcelas as new_val, qt_parcelas, 1 as num
    from t
    union all
    select dateadd(month, 1, dt_ref), new_val, qt_parcelas, num + 1
    from cte
    where num < qt_parcelas
)
select dt_ref, new_val
from cte;
As written, this will work for up to 100 months. You need to add OPTION (MAXRECURSION 0) for longer periods.
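For example, the hint is appended to the statement that consumes the CTE (a sketch of the final SELECT above):
select dt_ref, new_val
from cte
option (maxrecursion 0);  -- 0 removes the default limit of 100 recursion levels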
Instead of using a rCTE you could use a tally table. If you have numbers much larger than 3, this'll probably be far more efficient:
WITH N AS(
    SELECT n
    FROM (VALUES(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL)) N(n)),
Tally AS (
    SELECT ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) I
    FROM N N1 --10 rows
    CROSS JOIN N N2 --100 rows
    --CROSS JOIN N N3 --Keep adding more CROSS JOINs to create more rows
),
VTE AS (
    SELECT CONVERT(date,V.dt_ref) AS dt_ref,
           V.vlr_venda,
           V.qt_parcelas
    FROM (VALUES('20180901',3000,3),
                ('20181001',12000,6)) V(dt_ref,vlr_venda,qt_parcelas))
SELECT DATEADD(MONTH,T.I - 1,V.dt_ref), --T.I - 1 so the first installment falls on dt_ref itself
       CONVERT(decimal(10,4),V.vlr_venda / (V.qt_parcelas * 1.0)) --in case you need decimal points
FROM VTE V
JOIN Tally T ON V.qt_parcelas >= T.I;
I developed a piece of software to generate tickets and had a similar experience to yours. I tried CURSORs and recursive CTEs, and they all took something like 50 minutes when creating tickets for clients.
I used this function to replicate my clients and generate my tickets:
/****** Object: UserDefinedFunction [dbo].[NumbersTable] Script Date: 28/09/2018 10:51:25 ******/
SET ANSI_NULLS ON
GO
SET QUOTED_IDENTIFIER ON
GO
CREATE FUNCTION [dbo].[NumbersTable] (
    @fromNumber int,
    @toNumber int,
    @byStep int
)
RETURNS @NumbersTable TABLE (i int)
AS
BEGIN
    WITH CTE_NumbersTable AS (
        SELECT @fromNumber AS i
        UNION ALL
        SELECT i + @byStep
        FROM CTE_NumbersTable
        WHERE (i + @byStep) <= @toNumber
    )
    INSERT INTO @NumbersTable
    SELECT i FROM CTE_NumbersTable OPTION (MAXRECURSION 0)
    RETURN;
END
GO
Then you can use
CROSS APPLY dbo.NumbersTable(1, qt_parcelas, 1)
to generate your rows.
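As a rough sketch of how that call could look for the example in this question (the table name YourSales is made up, and dt_ref is assumed to be a date column):
SELECT DATEADD(MONTH, n.i - 1, s.dt_ref) AS dt_ref,
       CONVERT(decimal(10,4), s.vlr_venda / (s.qt_parcelas * 1.0)) AS vlr_venda
FROM dbo.YourSales AS s
CROSS APPLY dbo.NumbersTable(1, s.qt_parcelas, 1) AS n;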
Believe me, this way is more efficient: when dealing with a large amount of data (something like 8 to 10 million rows) it takes about 2 minutes instead of 40.

SQL: create sequential list of numbers from various starting points

I'm stuck on this SQL problem.
I have a column that is a list of starting points (prevdoc), and another column that lists how many sequential numbers I need after the starting point (exdiff).
For example, here are the first several rows:
prevdoc | exdiff
----------------
1 | 3
21 | 2
126 | 2
So I need an output to look something like:
2
3
4
22
23
127
128
I'm lost as to where even to start. Can anyone advise me on the SQL code for this solution?
Thanks!
;with a as
(
    select prevdoc + 1 col, exdiff
    from <table> where exdiff > 0
    union all
    select col + 1, exdiff - 1
    from a
    where exdiff > 1
)
select col
from a
order by col
If your exdiff is going to be a small number, you can make up a virtual table of numbers using SELECT..UNION ALL as shown here and join to it:
select prevdoc+number
from doc
join (select 1 number union all
select 2 union all
select 3 union all
select 4 union all
select 5) x on x.number <= doc.exdiff
order by 1;
I have provided for 5 but you can expand as required. You haven't specified your DBMS, but in each one there will be a source of sequential numbers, for example in SQL Server, you could use:
select prevdoc+number
from doc
join master..spt_values v on
v.number <= doc.exdiff and
v.number >= 1 and
v.type = 'p'
order by 1;
The master..spt_values table contains numbers between 0-2047 (when filtered by type='p').
If the numbers are not too large, then you can use the following trick in most databases:
select t.prevdoc + seqnum
from t join
     (select row_number() over (order by column_name) as seqnum
      from INFORMATION_SCHEMA.columns
     ) nums
     on seqnum <= t.exdiff
The use of INFORMATION_SCHEMA.columns in the subquery is arbitrary. The only purpose is to generate a sequence of numbers at least as long as the maximum exdiff value.
This approach will work in any database that supports the ranking functions. Most databases also have a database-specific way of generating a sequence (such as recursive CTEs in SQL Server and CONNECT BY in Oracle).
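For instance, in Oracle the numbers can be generated from dual with CONNECT BY; a sketch, assuming the table is called doc as in the other answer and exdiff never exceeds 100:
SELECT d.prevdoc + nums.n AS doc_no
FROM doc d
JOIN (SELECT LEVEL AS n FROM dual CONNECT BY LEVEL <= 100) nums
  ON nums.n <= d.exdiff
ORDER BY doc_no;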

SQL Query - Display Count & All ID's With Same Name

I'm trying to display the number of table entries with the same name and the unique IDs associated with each of those entries.
So I have a table like so...
Table Names
------------------------------
ID Name
0 John
1 Mike
2 John
3 Mike
4 Adam
5 Mike
I would like the output to be something like:
Name | Count | IDs
---------------------
Mike 3 1,3,5
John 2 0,2
Adam 1 4
I have the following query, which does this except for displaying all the unique IDs:
select name, count(*) as ct from names group by name order by ct desc;
You can use GROUP_CONCAT for that (available in MySQL, for example):
select name,
       count(id) as ct,
       group_concat(id) as IDs
from names
group by name
order by ct desc;
Depending on version of MSSQL you are using (2005+), you can use the FOR XML PATH option.
SELECT
Name,
COUNT(*) AS ct,
STUFF((SELECT ',' + CAST(ID AS varchar(MAX))
FROM names i
WHERE i.Name = n.Name FOR XML PATH(''))
, 1, 1, '') as IDs
FROM names n
GROUP BY Name
ORDER BY ct DESC
This is the closest thing to group_concat you'll get in MSSQL unless you use the SQLCLR option (which I have no experience with). The STUFF function takes care of the leading comma. Also, don't alias the inner SELECT, as that will wrap each selected value in an XML element (an alias of TD causes each element to be returned as <TD>value</TD>).
Given the input above, here's the result I get:
Name ct IDs
Mike 3 1,3,5
John 2 0,2
Adam 1 4
EDIT: DISCLAIMER
This technique will not work as intended for string fields that may contain special characters (such as ampersands &, less than <, greater than >, and other formatting characters), because FOR XML PATH escapes them into entities. It is therefore most useful for simple integer values, although it can still be used for text if you are ABSOLUTELY SURE there are no characters that would need to be escaped. Otherwise, read the solution posted HERE to ensure these characters get properly escaped.
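The usual fix is to add the TYPE directive and read the value back with .value(), so the entities don't leak into the final result; a sketch of the adjusted expression, using the same table and aliases as above:
SELECT Name,
       COUNT(*) AS ct,
       STUFF((SELECT ',' + CAST(ID AS varchar(MAX))
              FROM names i
              WHERE i.Name = n.Name
              FOR XML PATH(''), TYPE).value('.', 'varchar(MAX)')
             , 1, 1, '') AS IDs
FROM names n
GROUP BY Name
ORDER BY ct DESC;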
Here is another SQL Server method, using a recursive CTE:
Link to SQLFiddle
; with MyCTE(name,ids, name_id, seq)
as(
select name, CAST( '' AS VARCHAR(8000) ), -1, 0
from Data
group by name
union all
select d.name,
CAST( ids + CASE WHEN seq = 0 THEN '' ELSE ', ' END + cast(id as varchar) AS VARCHAR(8000) ),
CAST( id AS int),
seq + 1
from MyCTE cte
join Data d
on cte.name = d.name
where d.id > cte.name_id
)
SELECT name, ids
FROM ( SELECT name, ids,
RANK() OVER ( PARTITION BY name ORDER BY seq DESC )
FROM MyCTE ) D ( name, ids, rank )
WHERE rank = 1

SQL return multiple rows from one record

This is the opposite of reducing repeating records.
SQL query to create physical inventory checklists
If widget-xyz has a qty of 1 item return 1 row, but if it has 5, return 5 rows etc.
For all widgets in a particular warehouse.
Previously this was handled with a macro working through a range in excel, checking the qty column. Is there a way to make a single query instead?
The tables are FoxPro dbf files generated by an application and I am outputting this into html
Instead of generating an XML string and using XML parsing functions to generate a counter as Nestor has suggested, you might consider joining on a recursive CTE as a counter, as LukLed has hinted at:
WITH Counter AS
(
    SELECT 0 i
    UNION ALL
    SELECT i + 1
    FROM Counter
    WHERE i < 100
),
Data AS
(
    SELECT 'A' sku, 1 qty
    UNION
    SELECT 'B', 2
    UNION
    SELECT 'C', 3
)
SELECT *
FROM Data
INNER JOIN Counter ON i < qty
According to query analyzer, this query is much faster than the xml pseudo-table. This approach also gives you a recordset with a natural key (sku, i).
There is a default recursion limit of 100 in MSSQL that will restrict your counter. If you have quantities > 100, you can either increase this limit, use nested counters, or create a physical table for counting.
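A physical counting table is a one-time setup along these lines (a sketch; the names and the 10000 limit are arbitrary):
CREATE TABLE dbo.Counter (i INT NOT NULL PRIMARY KEY);

INSERT INTO dbo.Counter (i)
SELECT TOP (10000) ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) - 1  -- 0-based, like the Counter CTE above
FROM sys.all_objects a CROSS JOIN sys.all_objects b;

-- then join it exactly like the CTE: SELECT * FROM Data INNER JOIN dbo.Counter ON i < qty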
For SQL Server 2005/2008, take a look at
CROSS APPLY
What I would do is CROSS APPLY each row with a sub table that has as many rows as qty. A secondary question is how to create that sub table (I'd suggest creating an xml string and then parsing it with the xml operators).
I hope this gives you a starting pointer...
Starting with
declare @table table (sku int, qty int);
insert into @table values (1, 5), (2,4), (3,2);
select * from @table;
sku qty
----------- -----------
1 5
2 4
3 2
You can generate:
with MainT as (
    select *, convert(xml, '<table>' + REPLICATE('<r></r>', qty) + '</table>') as pseudo_table
    from @table
)
select p.sku, p.qty
from MainT p
CROSS APPLY
(
    select p.sku from p.pseudo_table.nodes('/table/r') T(row)
) crossT
sku qty
----------- -----------
1 5
1 5
1 5
1 5
1 5
2 4
2 4
2 4
2 4
3 2
3 2
Is that what you want?
Seriously dude... next time put more effort into writing your question. It's impossible to know exactly what you are looking for.
You can use a table with numbers from 1 to max(quantity) and join your table on quantity <= number. You can do it in many ways, but it depends on the SQL engine.
You can do this using dynamic SQL.