How to split comma delimited data from one column into multiple rows - sql

I'm trying to write a query that will have a column show a specific value depending on another comma delimited column. The codes are meant to denote Regular time/overtime/doubletime/ etc. and they come from the previously mentioned comma delimited column. In the original view, there are columns for each of the different hours accrued separately. For the purposes of this, we can say A = regular time, B = doubletime, C = overtime. However, we have many codes that can represent the same type of time.
What my original view looks like:
Employee_FullName | EmpID | Code | Regular Time | Double Time | Overtime
John Doe          | 123   | A,B  | 7            | 2           | 0
Jane Doe          | 234   | B    | 4            | 0           | 1
What my query outputs:
Employee_FullName | EmpID | Code | Hours
John Doe          | 123   | A, B | 10
John Doe          | 123   | A, B | 5
Jane Doe          | 234   | B    | 5
What I want the output to look like:
Employee_FullName | EmpID | Code | Hours
John Doe          | 123   | A    | 10
John Doe          | 123   | B    | 5
Jane Doe          | 234   | B    | 5
It looks the way it does in the first table because currently it's only pulling from the regular time column. I've tried using a case switch to have it look for a specific code and then pull the number, but I get a variety of errors no matter how I write it. Here's what my query looks like:
SELECT [Employee_FullName],
SUBSTRING(col, 1, CHARINDEX(' ', col + ' ' ) -1)'Code',
hrsValue
FROM
(
SELECT [Employee_FullName], col, hrsValue
FROM myTable
CROSS APPLY
(
VALUES ([Code],[RegularHours])
) C (COL, hrsValue)
) SRC
Any advice on how to fix it or perspective on what to use is appreciated!
Edit: I cannot change the comma delimited data, it is provided that way. I think a case within a cross apply will solve it but I honestly don't know.
Edit 2: I will be using a unique EmployeeID to identify them. In this case yes A is regular time, B is double time, C is overtime. The complication is that there are a variety of different codes and multiple refer to each type of time. There is never a case where A would refer to regular time for one employee and double time for another, etc. I am on SQL Server 2017. Thank you all for your time!

If you are on SQL Server 2016 or later, you can use OPENJSON() to split up the code values instead of cumbersome string operations. The [key] column gives each value's zero-based position in the list:
SELECT t.Employee_FullName,
Code = LTRIM(j.value),
Hours = MAX(CASE j.[key]
WHEN 0 THEN RegularTime
WHEN 1 THEN DoubleTime
WHEN 2 THEN Overtime END)
FROM dbo.MyTable AS t
CROSS APPLY OPENJSON('["' + REPLACE(t.Code,',','","') + '"]') AS j
GROUP BY t.Employee_FullName, LTRIM(j.value);

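You can use the following code to split up the values. Note how NULLIF nulls out the CHARINDEX result when it returns 0 (no comma found). The second half of the second APPLY is conditional on that null, so it only produces a row when there is a second code. As written, this splits on the first comma only, so it handles at most two codes per row.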
SELECT
t.[Employee_FullName],
Code = TRIM(v2.Code),
v2.Hours
FROM myTable t
CROSS APPLY (VALUES( NULLIF(CHARINDEX(',', t.Code), 0) )) v1(comma)
CROSS APPLY (
SELECT Code = ISNULL(LEFT(t.Code, v1.comma - 1), t.Code), Hours = t.RegularTime
UNION ALL
SELECT SUBSTRING(t.Code, v1.comma + 1, LEN(t.Code)), t.DoubleTime
WHERE v1.comma IS NOT NULL
) v2;

You can go for a CROSS APPLY based approach with STRING_SPLIT, as given below.
Thanks to @Charlieface for the insert script.
CREATE TABLE mytable (
"Employee_FullName" VARCHAR(8),
"Code" VARCHAR(3),
"RegularTime" INTEGER,
"DoubleTime" INTEGER,
"Overtime" INTEGER
);
INSERT INTO mytable
("Employee_FullName", "Code", "RegularTime", "DoubleTime", "Overtime")
VALUES
('John Doe', 'A,B', '10', '5', '0'),
('Jane Doe', 'B', '5', '0', '0');
SELECT
t.[Employee_FullName],
c.Code,
CASE WHEN c.code = 'A' THEN t.RegularTime
WHEN c.code = 'B' THEN t.DoubleTime
WHEN c.code = 'C' THEN t.Overtime
END AS Hours
FROM myTable t
CROSS APPLY (select value from string_split(t.code,',')
) c(code)
Employee_FullName | Code | Hours
John Doe          | A    | 10
John Doe          | B    | 5
Jane Doe          | B    | 0
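The question's second edit says that several codes can stand for the same kind of time. To reason about the split-then-map logic outside the database, here is a rough Python sketch (not the asker's actual schema). The lookup table and the row dictionaries are assumptions made for illustration; in SQL the lookup would more likely be a small mapping table joined to the split values.

```python
# Hypothetical lookup: several codes could point at the same hours column.
CODE_TO_COLUMN = {
    "A": "RegularTime",
    "B": "DoubleTime",
    "C": "Overtime",
}

def split_codes(row):
    """Yield one (name, code, hours) tuple per code in the comma-delimited Code field."""
    for raw in row["Code"].split(","):
        code = raw.strip()  # tolerate 'A, B' as well as 'A,B'
        column = CODE_TO_COLUMN.get(code)
        # Unknown code -> None, like a CASE expression with no ELSE branch.
        hours = row[column] if column else None
        yield (row["Employee_FullName"], code, hours)

# Same sample data as the insert script above.
rows = [
    {"Employee_FullName": "John Doe", "Code": "A,B",
     "RegularTime": 10, "DoubleTime": 5, "Overtime": 0},
    {"Employee_FullName": "Jane Doe", "Code": "B",
     "RegularTime": 5, "DoubleTime": 0, "Overtime": 0},
]

result = [out for row in rows for out in split_codes(row)]
```

With this data, Jane Doe's B row gets 0 hours because her 5 hours sit in RegularTime, which matches the STRING_SPLIT output shown above.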

Related

SQL query multiple values in just one cell

Hello I am kinda new to sql. Just wanna know if this is possible via sql:
Table: (Multiple values are in just 1 cell.)
COLUMN 1                                 | COLUMN 2
"2023-01-01", "2023-01-02", "2023-01-03" | "User A, User B, User C"
Needed Output:
COLUMN 1   | COLUMN 2
2023-01-01 | User A
2023-01-02 | User A
2023-01-03 | User A
2023-01-01 | User B
2023-01-02 | User B
2023-01-03 | User B
2023-01-01 | User C
2023-01-02 | User C
2023-01-03 | User C
Basically, each date from the row is assigned to all users in that same row. Any help or tip will be appreciated.
Thank you!
I have no idea yet on how to go around this
You can use the string_to_array function to split a string into the elements of an array, then use the unnest function on that array to get one row per element. The nested replace calls strip the double quotes and the space after each comma before splitting:
select col1,
unnest(string_to_array(replace(replace(COLUMN2,'"',''),', ',','), ',')) as col2
from
(
select unnest(string_to_array(replace(replace(COLUMN1,'"',''),', ',','), ',')) as col1
, COLUMN2
from table_name
) T
order by col1, col2
We can use a combination of STRING_TO_ARRAY with UNNEST and LATERAL JOIN here:
SELECT col1.column1, col2.column2
FROM
(SELECT UNNEST(
STRING_TO_ARRAY(column1,',')
) AS column1 FROM test) col1
LEFT JOIN LATERAL
(SELECT UNNEST(
STRING_TO_ARRAY(column2,',')
) AS column2 FROM test) col2
ON true
ORDER BY col2.column2, col1.column1;
STRING_TO_ARRAY will split the different dates and the different users into separate items.
UNNEST will write those items in separate rows.
LATERAL JOIN will combine the three dates with the three users (or fewer/more, depending on your data) and so creates the nine rows shown in your question. It works similarly to the CROSS APPLY approach on a SQL Server DB.
The ORDER BY clause just creates the same order as shown in your question, we can remove it if not required. The question doesn't really tell us if it's needed.
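To see what the split, unnest and join steps produce together, here is a minimal Python sketch of the same idea: split both cells, then take every combination. The helper name split_list is made up for this illustration, and the sample strings are copied from the question.

```python
from itertools import product

def split_list(cell):
    """Split a cell such as '"a", "b"' into clean items (quotes and spaces removed)."""
    return [item.strip() for item in cell.replace('"', "").split(",")]

column1 = '"2023-01-01", "2023-01-02", "2023-01-03"'
column2 = '"User A, User B, User C"'

# Every date paired with every user, ordered by user and then by date.
pairs = sorted(
    product(split_list(column1), split_list(column2)),
    key=lambda pair: (pair[1], pair[0]),
)
```

Three dates times three users gives nine pairs, which is the same row count the join produces.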
Because implementation details can change between DBMSs, here is an example of how to do it in MySQL (8.0+):
WITH column1 as (
SELECT TRIM(SUBSTRING_INDEX(SUBSTRING_INDEX(column1,',',x),',',-1)) as Value
FROM test
CROSS JOIN (select 1 as x union select 2 union select 3 union select 4) x
WHERE x <= LENGTH(Column1)-LENGTH(REPLACE(Column1,',',''))+1
),
column2 as (
SELECT TRIM(SUBSTRING_INDEX(SUBSTRING_INDEX(column2,',',x),',',-1)) as Value
FROM test
CROSS JOIN (select 1 as x union select 2 union select 3 union select 4) x
WHERE x <= LENGTH(Column2)-LENGTH(REPLACE(Column2,',',''))+1
)
SELECT *
FROM column1, column2;
NOTE:
The CROSS JOIN, with only 4 values, should be expanded when more than 4 items exist.
There is no data type connected to the values that are fetched. This implementation does not know that "2023-01-08" is, sorry, CAN BE, a date. It just sticks to strings.
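The nested SUBSTRING_INDEX call is the least obvious part of the MySQL query, so here is a rough Python model of it. The function substring_index below is only an approximation of the MySQL function for a non-empty delimiter, written for illustration; nth_item is a made-up helper name.

```python
def substring_index(text, delim, count):
    """Rough model of MySQL SUBSTRING_INDEX for a non-empty delimiter."""
    parts = text.split(delim)
    if count > 0:
        return delim.join(parts[:count])   # everything before the count-th delimiter
    if count < 0:
        return delim.join(parts[count:])   # everything after the count-th delimiter from the right
    return ""

def nth_item(text, n):
    # Same nesting as the query: keep the first n items, then take the last of those.
    return substring_index(substring_index(text, ",", n), ",", -1).strip()

cell = "User A, User B, User C"

# Number of commas + 1, the same expression the WHERE clause uses.
item_count = len(cell) - len(cell.replace(",", "")) + 1

items = [nth_item(cell, n) for n in range(1, item_count + 1)]
```

Asking for an item past the end, for example nth_item("a,b,c", 4), returns the last item again in this model, which is why the query needs the WHERE filter on x.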
In SQL Server (2016 or later) this can be done using STRING_SPLIT:
select x.value as date_val,y.value as user_val
from test a
CROSS APPLY string_split(Column1,',')x
CROSS APPLY string_split(Column2,',')y
order by y.value,x.value
date_val user_val
2023-01-01 User A
2023-01-02 User A
2023-01-03 User A
2023-01-03 User B
2023-01-02 User B
2023-01-01 User B
2023-01-01 User C
2023-01-02 User C
2023-01-03 User C
db fiddle link
https://dbfiddle.uk/YNJWDPBq
In MySQL you can do it as follows (this version assumes exactly three items per column):
WITH dates as (
select TRIM(SUBSTRING_INDEX(_date, ',', 1)) AS 'dates'
from _table
union
select TRIM(SUBSTRING_INDEX(SUBSTRING_INDEX(_date, ',', 2), ',', -1)) AS 'dates'
from _table
union
select TRIM(SUBSTRING_INDEX(_date, ',', -1)) AS 'dates'
from _table
),
users as
( select TRIM(SUBSTRING_INDEX(user, ',', 1)) AS 'users'
from _table
union
select TRIM(SUBSTRING_INDEX(SUBSTRING_INDEX(user, ',', 2), ',', -1)) AS 'users'
from _table
union
select TRIM(SUBSTRING_INDEX(user, ',', -1)) AS 'users'
from _table
)
select *
from dates, users
order by dates, users;
check it here : https://dbfiddle.uk/_oGix9PD

Using UNNEST Function in BigQuery

I need help on how to use BigQuery UNNEST function. My query:
I have table as shown in the image and I want to unnest the field "domains" (string type) currently separated by comma, so that I get each comma separated domain into a different row for each "acname". The output needed is also enclosed in the image:
I tried this logic but it did not work:
select acc.acname,acc.amount,acc.domains as accdomains from project.dataset.dummy_account as acc
CROSS JOIN UNNEST(acc.domains)
But this gave the error "Values referenced in UNNEST must be arrays. UNNEST contains expression of type STRING". The error makes complete sense, but I did not understand how to convert a string to an array.
Can someone please help with a solution and also explain a bit how it actually works? Thank you.
Below is for BigQuery Standard SQL
#standardSQL
SELECT acname, amount, domain
FROM `project.dataset.dummy`,
UNNEST(SPLIT(domains)) domain
You can test, play with above using dummy data from your question as in example below
#standardSQL
WITH `project.dataset.dummy` AS (
SELECT 'abc' acname, 100 amount, 'a,b,c' domains UNION ALL
SELECT 'pqr', 300, 'p,q,r' UNION ALL
SELECT 'lmn', 500, 'l,m,n'
)
SELECT acname, amount, domain
FROM `project.dataset.dummy`,
UNNEST(SPLIT(domains)) domain
with output
Row acname amount domain
1 abc 100 a
2 abc 100 b
3 abc 100 c
4 pqr 300 p
5 pqr 300 q
6 pqr 300 r
7 lmn 500 l
8 lmn 500 m
9 lmn 500 n
The source table project.dataset.dummy, which has the field "domains", holds comma-separated values, but there is a space after each comma (e.g. 'a, b, c'). This results in a space before the values b, c, q, r, m, n in the field "domains" in the "Output After Unnest" table. Now I'm joining this table with "salesdomain" as a key, but because of the space before b, c, q, r, m, n, the output is not correct.
To address this, you can simply use the TRIM function to remove all leading and trailing spaces, as in the example below:
#standardSQL
WITH `project.dataset.dummy` AS (
SELECT 'abc' acname, 100 amount, 'a, b, c' domains UNION ALL
SELECT 'pqr', 300, 'p, q, r' UNION ALL
SELECT 'lmn', 500, 'l, m, n'
)
SELECT acname, amount, TRIM(domain, ' ') domain
FROM `project.dataset.dummy`,
UNNEST(SPLIT(domains)) domain
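For anyone who wants to check the split-then-trim behaviour outside BigQuery, the same two steps look roughly like this in Python. This is only a sketch with the dummy rows from the example above; it is not how BigQuery evaluates the query.

```python
rows = [
    ("abc", 100, "a, b, c"),
    ("pqr", 300, "p, q, r"),
    ("lmn", 500, "l, m, n"),
]

# One output row per domain; strip(" ") plays the role of TRIM(domain, ' ').
unnested = [
    (acname, amount, domain.strip(" "))
    for acname, amount, domains in rows
    for domain in domains.split(",")
]

# Without the strip, the second and third items keep their leading space.
untrimmed = [domain for domain in rows[0][2].split(",")]
```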

How to split a cell and create a new row in sql

I have a column which stores multiple comma separated values. I need to split it in a way so that it gets split into as many rows as values in that column along with remaining values in that row.
eg:
John 111 2Jan
Sam 222,333 3Jan
Jame 444,555,666 2Jan
Jen 777 4Jan
Output:
John 111 2Jan
Sam 222 3Jan
Sam 333 3Jan
Jame 444 2Jan
Jame 555 2Jan
Jame 666 2Jan
Jen 777 4Jan
P.S : I have seen multiple questions similar to this, but could not find a way to split in such a way.
This solution is built on Vertica, but it works for every database that offers a function corresponding to SPLIT_PART().
Part of it corresponds to the un-pivoting technique that works with every ANSI compliant database platform that I explain here (just the un-pivoting part of the script):
Pivot sql convert rows to columns
So I would do it like here below. I'm assuming that the minimalistic date representation is part of the second column of a two-column input table. So I'm first splitting that short date literal away, in a first Common Table Expression (and, in a comment, I list that CTE's output), before splitting the comma separated list into tokens.
Here goes:
WITH
-- input
input(name,the_string) AS (
SELECT 'John', '111 2Jan'
UNION ALL SELECT 'Sam' , '222,333 3Jan'
UNION ALL SELECT 'Jame', '444,555,666 2Jan'
UNION ALL SELECT 'Jen' , '777 4Jan'
)
,
-- put the strange date literal into a separate column
the_list_and_the_date(name,list,datestub) AS (
SELECT
name
, SPLIT_PART(the_string,' ',1)
, SPLIT_PART(the_string,' ',2)
FROM input
)
-- debug
-- SELECT * FROM the_list_and_the_date;
-- name|list |datestub
-- John|111 |2Jan
-- Sam |222,333 |3Jan
-- Jame|444,555,666|2Jan
-- Jen |777 |4Jan
,
-- ten integers (too many for this example) to use as pivoting value and as "index"
ten_ints(idx) AS (
SELECT 1
UNION ALL SELECT 2
UNION ALL SELECT 3
UNION ALL SELECT 4
UNION ALL SELECT 5
UNION ALL SELECT 6
UNION ALL SELECT 7
UNION ALL SELECT 8
UNION ALL SELECT 9
UNION ALL SELECT 10
)
-- the final query - pivoting prepared input using a CROSS JOIN with ten_ints
-- and filter out where the SPLIT_PART() expression evaluates to the empty string
SELECT
name
, SPLIT_PART(list,',',idx) AS token
, datestub
FROM the_list_and_the_date
CROSS JOIN ten_ints
WHERE SPLIT_PART(list,',',idx) <> ''
;
name|token|datestub
John|111 |2Jan
Jame|444 |2Jan
Jame|555 |2Jan
Jame|666 |2Jan
Sam |222 |3Jan
Sam |333 |3Jan
Jen |777 |4Jan
Happy playing ...
Marco the Sane
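If it helps to see the index trick in isolation, here is a small Python sketch of the final query. The split_part function below only models the behaviour the query relies on (1-based index, empty string past the last token); it is an illustration, not Vertica's implementation.

```python
def split_part(text, delim, index):
    """Model of SPLIT_PART: 1-based index, empty string when the index is past the last token."""
    parts = text.split(delim)
    return parts[index - 1] if 1 <= index <= len(parts) else ""

# Output of the the_list_and_the_date CTE.
rows = [
    ("John", "111", "2Jan"),
    ("Sam", "222,333", "3Jan"),
    ("Jame", "444,555,666", "2Jan"),
    ("Jen", "777", "4Jan"),
]

tokens = [
    (name, split_part(lst, ",", idx), datestub)
    for name, lst, datestub in rows
    for idx in range(1, 11)                 # the ten_ints CTE
    if split_part(lst, ",", idx) != ""      # the WHERE filter
]
```

Each row is paired with all ten indexes, and only the indexes that hit a real token survive the filter, so the result has seven rows.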

Conditionally append a character in select statement

Functionality I'm trying to add to my DB2 stored procedure:
Select a MIN() date from a joined table column.
IF there was more than one row in this joined table, append a " * " to the date.
Thanks, any help or guidance is much appreciated.
It's not clear which flavor of DB2 is needed nor if any suggestion worked. This works on DB2 for i:
SELECT
T1.joinCol1,
max( T2.somedateColumn ),
count( t2.somedateColumn ),
char(max( T2.somedateColumn )) concat case when count( T2.somedateColumn )>1 then '*' else '' end
FROM joinFile1 t1 join joinFile2 t2
on joinCol1 = joinCol2
GROUP BY T1.joinCol1
ORDER BY T1.joinCol1
The SQL is fairly generic, so it should translate to many environments and versions.
Substitute table and column names as needed. The COUNT() here actually counts rows from the JOIN rather than the number of times the specific date occurs. If a count of duplicate dates is needed, then some changes to this example are also needed.
Hope this helps
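The rule itself (aggregate date, plus a marker when the group has more than one row) can be sketched in a few lines of Python. This version uses min, as in the question, whereas the SQL above uses max. The keys and dates are made-up sample data.

```python
from collections import defaultdict
from datetime import date

# Hypothetical joined rows: (joinCol1, somedateColumn).
joined = [
    ("K1", date(2020, 1, 5)),
    ("K1", date(2020, 1, 2)),
    ("K2", date(2020, 3, 1)),
]

groups = defaultdict(list)
for key, some_date in joined:
    groups[key].append(some_date)

# Earliest date per key, with '*' appended when the key has more than one joined row.
flagged = {
    key: min(dates).isoformat() + ("*" if len(dates) > 1 else "")
    for key, dates in groups.items()
}
```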
Say I have a result set like this:
1 Jeff 1
2 Jeff 333
3 Jeff 77
4 Jeff 1
5 Jeff 14
6 Bob 22
7 Bob 4
8 Bob 5
9 Bob 6
Here the value 1 is repeated twice (in the third column).
So this query gets the count as 2, with the * concatenated to it:
SELECT A.USER_VAL,
DECODE(A.CNT, '1', A.CNT, '0', A.CNT, CONCAT(A.CNT, '*')) AS CNT
FROM (SELECT DISTINCT BT.USER_VAL, CAST(COUNT(*) AS VARCHAR2(2)) AS CNT
FROM SO_BUFFER_TABLE_8 BT
GROUP BY BT.USER_VAL) A

How to split the string value in one column and return the result table

Assume we have the following table:
id name member
1 jacky a;b;c
2 jason e
3 kate i;j;k
4 alex null
Now I want to use the sql or t-sql to return the following table:
1 jacky a
1 jacky b
1 jacky c
2 jason e
3 kate i
......
How to do that?
I'm using the MSSQL, MYSQL and Oracle database.
This is the shortest and most readable string-to-rows splitter one could devise, and it could be faster too.
A use case for choosing a pure CTE instead of a function: when you're not allowed to create a function on the database :-)
Creating a row generator via a function (which could be implemented with a loop or with a CTE too) still needs a lateral join to ultimately join the split rows generated by the function to the main table. DB2 and Sybase have this functionality via the LATERAL keyword; in SQL Server, the equivalent is CROSS APPLY and OUTER APPLY.
The pure CTE approach could be faster than the function approach. The real speed metrics come from profiling, though: check the execution plan of this against the other solutions to see whether it is indeed faster:
with Pieces(theId, pn, start, stop) AS
(
SELECT id, 1, 1, charindex(';', member)
from tbl
UNION ALL
SELECT id, pn + 1, stop + 1, charindex(';', member, stop + 1)
from tbl
join pieces on pieces.theId = tbl.id
WHERE stop > 0
)
select
t.id, t.name,
word =
substring(t.member, p.start,
case WHEN stop > 0 THEN p.stop - p.start
ELSE 512
END)
from tbl t
join pieces p on p.theId = t.id
order by t.id, p.pn
Output:
ID NAME WORD
1 jacky a
1 jacky b
1 jacky c
2 jason e
3 kate i
3 kate j
3 kate k
4 alex (null)
Base logic sourced here: T-SQL: Opposite to string concatenation - how to split string into multiple records
Live test: http://www.sqlfiddle.com/#!3/2355d/1
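To follow what the recursive CTE computes, here is a Python sketch that builds the same (pn, start, stop) rows for one string and then cuts out the words. It mirrors the 1-based positions CHARINDEX returns. The function names are made up, and NULL members are not handled here.

```python
def pieces(member, delim=";"):
    """Return (pn, start, stop) rows with 1-based positions; stop == 0 means no further delimiter."""
    pn, start = 1, 1
    stop = member.find(delim) + 1                   # CHARINDEX: 0 when not found
    rows = [(pn, start, stop)]                      # anchor member of the CTE
    while stop > 0:                                 # recursive member, WHERE stop > 0
        pn, start = pn + 1, stop + 1
        stop = member.find(delim, start - 1) + 1    # search from the new start position
        rows.append((pn, start, stop))
    return rows

def split_words(member, delim=";"):
    # SUBSTRING(member, start, stop - start), or the rest of the string for the last piece.
    return [
        member[start - 1:stop - 1] if stop > 0 else member[start - 1:]
        for _, start, stop in pieces(member, delim)
    ]

words = split_words("a;b;c")
```

For 'a;b;c' the pieces are (1, 1, 2), (2, 3, 4) and (3, 5, 0); the last one has stop = 0, which is why the query falls back to a fixed length of 512.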
Well... let me first introduce you to Adam Machanic who taught me about a Numbers table. He's also written a very fast split function using this Numbers table.
http://dataeducation.com/counting-occurrences-of-a-substring-within-a-string/
After you implement a Split function that returns a table, you can then join against it and get the results you want.
IF OBJECT_ID('dbo.Users') IS NOT NULL
DROP TABLE dbo.Users;
CREATE TABLE dbo.Users
(
id INT IDENTITY NOT NULL PRIMARY KEY,
name VARCHAR(50) NOT NULL,
member VARCHAR(1000)
)
GO
INSERT INTO dbo.Users(name, member) VALUES
('jacky', 'a;b;c'),
('jason', 'e'),
('kate', 'i;j;k'),
('alex', NULL);
GO
DECLARE @spliter CHAR(1) = ';';
WITH Base AS
(
SELECT 1 AS n
UNION ALL
SELECT n + 1
FROM Base
WHERE n < CEILING(SQRT(1000)) --generate numbers from 1 to 1000, you may change it to a larger value depending on the member column's length.
)
, Nums AS --Numbers Common Table Expression, if your database version doesn't support it, just create a physical table.
(
SELECT ROW_NUMBER() OVER(ORDER BY (SELECT 0)) AS n
FROM Base AS B1 CROSS JOIN Base AS B2
)
SELECT id,
SUBSTRING(member, n, CHARINDEX(@spliter, member + @spliter, n) - n) AS element
FROM dbo.Users
JOIN Nums
ON n <= DATALENGTH(member) + 1
AND SUBSTRING(@spliter + member, n, 1) = @spliter
ORDER BY id
OPTION (MAXRECURSION 0); --Nums CTE is generated recursively, we don't want to limit recursion count.
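The join condition in that last query is compact, so here is a Python sketch of what it does for a single string: every position n where the delimiter-prefixed string has a delimiter marks the start of an element. The function name is made up, and the loop over n stands in for the Nums table.

```python
def split_with_numbers(member, delim=";"):
    """Split using the numbers-table idea, with 1-based positions as in T-SQL."""
    padded = delim + member
    elements = []
    for n in range(1, len(member) + 2):                     # n <= DATALENGTH(member) + 1
        if padded[n - 1] == delim:                          # SUBSTRING(delim + member, n, 1) = delim
            end = (member + delim).find(delim, n - 1) + 1   # CHARINDEX(delim, member + delim, n)
            elements.append(member[n - 1:end - 1])          # SUBSTRING(member, n, end - n)
    return elements

elements = split_with_numbers("a;b;c")
```

Prefixing the delimiter makes position 1 count as an element start, and appending it guarantees the last element has a closing delimiter to search for.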