Split comma separated values based on another table

Split comma separated values based on another table - sql

I would like to split comma separated values based on another table
I cannot normalize it since original table has over 8 million rows. It crushed my laptop when I tried it.
How can I put data into relevant columns and create a new column if data is not found.
For example:
TableA,
Type1 Type2
---------------------
A F
B G
C H
D I
E NULL
TableB
ID Country AllTypes
---------------------------------
1 Italy A, B, C
2 USA D, E, A, F
4 Japan I, O, Z
5 UK NULL
By using these two tables, I would like to get the output such as
ID Country AllTypes Type1 Type2 UnCaptured
----------------------------------------------------------------------
1 Italy A, B, C A, B, C NULL NULL
2 USA D, E, G, F D, E G, F NULL
4 Japan I, O, Z NULL I O, Z
5 UK NULL NULL NULL NULL
This is I have done so far
with TableA as (
select 'A' as Type1, 'F' as Type2 union all
select 'B', 'G' union all
select 'C', 'H' union all
select 'D', 'I' union all
select 'E', NULL
),
TableB as (
select 1 as ID, 'Italy' as Country, 'A, B, C' as Alltypes union all
select 2, 'USA', 'D, E, A, F' union all
select 4, 'Japan', 'I', 'O', 'Z' union all
select 5, 'UK', NULL
)
select b.Id, b.Country, b.Alltypes,
String_Agg(v.type1,',') Type1,
String_Agg(v.type2,',') Type2
**String_Agg(v.Type3,',') Uncaptured*** ------- This query
from tableb b
outer apply (
select Trim(value) t,
case when exists
(select * from tablea a where a.type1=Trim(value))
then Trim(value) end type1,
case when exists
(select * from tablea a where a.type2=Trim(value))
then Trim(value) end Type2,
Case when not exists ------------This query
( (select * from tablea a where a.type1=Trim(value)) -------
and ------
(select * from tablea a where a.type2=Trim(value))------
) then Trim(value) end Type3** -------------
from String_Split(alltypes, ',')
)v
group by Id, Country, AllTypes
Without highlighted queries(-----) which are for creating a new column (Uncaptured), it works ok like below.
Id Country Alltypes Type1 Type2
1 Italy A, B, C A,B,C NULL
2 USA D, E, A, F D,E,A F
4 Japan I, O, Z I NULL
5 UK NULL NULL NULL
But if I add those highlighted queries, it shows error. I was also thinking of else but did not work as well.
Could someone help me please?

----------------------- DDL+DML: Should have been provided by the OP !
DROP TABLE IF EXISTS TableA,TableB
GO
create table TableA(Type1 CHAR(1), Type2 char(1))
GO
INSERT TableA (Type1,Type2) VALUES
('A', 'F' ),
('B', 'G' ),
('C', 'H' ),
('D', 'I' ),
('E', NULL )
GO
CREATE TABLE TableB (ID INT, Country NVARCHAR(100), AllTypes NVARCHAR(100))
GO
INSERT TableB (ID,Country,AllTypes)VALUES
(1, 'Italy','A, B, C' ),
(2, 'USA ','D, E, G, F' ),
(4, 'Japan','I, O, Z' ),
(5, 'UK ','NULL' )
GO
----------------------- Solution
;WITH MyCTE AS (
SELECT ID,Country,AllTypes, MyType = TRIM([value])
FROM TableB
CROSS APPLY string_split(AllTypes,',')
)
,MyCTE02 as (
SELECT ID,Country,AllTypes, MyType,a1.Type1,a2.Type2,
UnCaptured = CASE WHEN a1.Type1 IS NULL and a2.Type2 IS NULL THEN MyType END
FROM MyCTE c
LEFT JOIN TableA a1 ON c.MyType = a1.Type1
LEFT JOIN TableA a2 ON c.MyType = a2.Type2
)
SELECT ID,Country,AllTypes--,MyType
,Type1 = STRING_AGG(Type1,','),Type2 = STRING_AGG(Type2,','),UnCaptured = STRING_AGG(UnCaptured,',')
FROM MyCTE02
GROUP BY ID,Country,AllTypes
GO

How about
outer apply (
select Trim(value) t, a1.type1, a2.type2,
CASE WHEN COALESCE(a1.type1, a2.type2) IS NULL THEN Trim(s.value) END unCaptured
from String_Split(alltypes, ',') s
left join tablea a1 where a1.type1=Trim(s.value)
left join tablea a2 where a2.type2=Trim(s.value)
)v

Related

Lookup table from another table and create a new column

I would like to use SQL Server something like lookup table in Excel.
I cannot normalize it since original table has over 8 million rows. It crushed my laptop when I tried it.
How can I put data into relevant columns and create a new column if data is not found.
For example)
tableA,
Type1 Type2
---------------------
A F
B G
C H
D I
E NULL
TableB
ID Country AllTypes
---------------------------------
1 Italy A, B, C
2 USA D, E, A, F
4 Japan I, O, Z
5 UK NULL
By using these two tables, I would like to get the output such as
ID Country AllTypes Type1 Type2 UnCaptured
----------------------------------------------------------------------
1 Italy A, B, C A, B, C NULL NULL
2 USA D, E, G, F D, E G, F NULL
4 Japan I, O, Z NULL I O, Z
5 UK NULL NULL NULL NULL
========================= Thanks to answers before I edited my question, I could do it here so far
with TableA as (
select 'A' as Type1, 'F' as Type2 union all
select 'B', 'G' union all
select 'C', 'H' union all
select 'D', 'I' union all
select 'E', NULL
),
TableB as (
select 1 as ID, 'Italy' as Country, 'A, B, C' as Alltypes union all
select 2, 'USA', 'D, E, A, F' union all
select 4, 'Japan', 'I', 'O', 'Z' union all
select 5, 'UK', NULL
)
select b.Id, b.Country, b.Alltypes,
String_Agg(v.type1,',') Type1,
String_Agg(v.type2,',') Type2,
String_Agg(v.Type3,',') Uncaptured
from tableb b
outer apply (
select Trim(value) t,
case when exists
(select * from tablea a where a.type1=Trim(value))
then Trim(value) end type1,
case when exists
(select * from tablea a where a.type2=Trim(value))
then Trim(value) end Type2,
Case when not exists
( (select * from tablea a where a.type1=Trim(value))
and
(select * from tablea a where a.type2=Trim(value))
) then Trim(value) end Type3
from String_Split(alltypes, ',')
)v
group by Id, Country, AllTypes
But it shows error. I was also thinking of else but did not work as well.
Could you help me please?

Having multiple delimited values in a single column is always going to be problematic, one way is to use a combination of string_split and string_agg if you are using SQL Server 2017+
select b.Id, b.Country, b.Alltypes,
String_Agg(v.type1,',') Type1,
String_Agg(v.type2,',') Type2
from tableb b
outer apply (
select Trim(value) t,
case when exists
(select * from tablea a where a.type1=Trim(value))
then Trim(value) end type1,
case when exists
(select * from tablea a where a.type2=Trim(value))
then Trim(value) end type2
from String_Split(alltypes, ',')
)v
group by Id, Country, AllTypes

Too long for a comment. Any chance you can refactor the DB?
I would suggest
tableA
col1 Type
A 1
B 1
..
F 2
G 2
Country
id Name
1 Italy
2 USA
Country_TableA
countryId aId
1 A
1 B

As commented previously, combining string_split and string_agg allows you to reach what you want. This is my (shorter) version:
select
b.ID, b.Country, b.Alltypes,
string_agg(a.Type1, ',') as Type1,
string_agg(aa.Type2, ',') as Type2
from TableB b
outer apply string_split(b.Alltypes, ',')
left join TableA a
on a.Type1 = ltrim(rtrim(value))
left join TableA aa
on aa.Type2 = ltrim(rtrim(value))
group by b.ID, b.Country, b.Alltypes
You can test on this db<>fiddle

Get Distinct values without null

I have a table like this;
--Table_Name--
A | B | C
-----------------
A1 NULL NULL
A1 NULL NULL
A2 NULL NULL
NULL B1 NULL
NULL B2 NULL
NULL B3 NULL
NULL NULL C1
I want to get like this ;
--Table_Name--
A | B | C
-----------------
A1 B1 C1
A2 B2 NULL
NULL B3 NULL
How should I do that ?

Here's one option:
sample data is from line #1 - 9
the following CTEs (lines #11 - 13) fetch ranked distinct not null values from each column
the final query (line #15 onward) returns desired result by outer joining previous CTEs on ranked value
SQL> with test (a, b, c) as
2 (select 'A1', null, null from dual union all
3 select 'A1', null, null from dual union all
4 select 'A2', null, null from dual union all
5 select null, 'B1', null from dual union all
6 select null, 'B2', null from dual union all
7 select null, 'B3', null from dual union all
8 select null, null, 'C1' from dual
9 ),
10 --
11 ta as (select distinct a, dense_rank() over (order by a) rn from test where a is not null),
12 tb as (select distinct b, dense_rank() over (order by b) rn from test where b is not null),
13 tc as (select distinct c, dense_rank() over (order by c) rn from test where c is not null)
14 --
15 select ta.a, tb.b, tc.c
16 from ta full outer join tb on ta.rn = tb.rn
17 full outer join tc on ta.rn = tc.rn
18 order by a, b, c
19 /
A B C
-- -- --
A1 B1 C1
A2 B2
B3
SQL>

If you have only one value per column, then I think a simpler solution is to enumerate the values and aggregate:
select max(a) as a, max(b) as b, max(c) as c
from (select t.*,
dense_rank() over (partition by (case when a is null then 1 else 2 end),
(case when b is null then 1 else 2 end),
(case when c is null then 1 else 2 end)
order by a, b, c
) as seqnum
from t
) t
group by seqnum;
This only "aggregates" once and only uses one window function, so I think it should have better performance than handling each column individually.
Another approach is to use lateral joins which are available in Oracle 12C -- but this assumes that the types are compatible:
select max(case when which = 'a' then val end) as a,
max(case when which = 'b' then val end) as b,
max(case when which = 'c' then val end) as c
from (select which, val,
dense_rank() over (partition by which order by val) as seqnum
from t cross join lateral
(select 'a' as which, a as val from dual union all
select 'b', b from dual union all
select 'c', c from dual
) x
where val is not null
) t
group by seqnum;
The performance may be comparable, because the subquery removes so many rows.

SQL Statement with 3 select statements

I am trying to combine the data of three tables but running into a minor issue.
Let's say we have 3 tables
Table A
ID | ID2 | ID3 | Name | Age
1 2x 4y John 23
2 7j Mike 27
3 1S1 6HH Steve 67
4 45 O8 Carol 56
Table B
| ID2 | ID3 | Price
2x 4y 23
7j 8uj 27
x4 Q6 56
Table C
|ID | Weight|
1 145
1 210
1 240
2 234
2 110
3 260
3 210
4 82
I want to get every record from table A of everyone who weighs 200 or more but they cannot be in table B. Table A and C are joined by ID. Table A and B are joined by either ID2 or ID3. ID2 and ID3 don't both have to necessarily be populated but at least 1 will. Either can be present or both and they will be unique. So expected result is
3 | 1S1 | 6HH | Steve| 67
Note that a person can have multiple weights but as long as at least one record is 200 or above they get pulled.
What I have so far
Select *
From tableA x
Where
x.id in (Select distinct y.id
From tableA y, tableC z
Where y.id = z.id
And z.weight >= '200'
And y.id not in (Select distinct h.id
From tableA h, tableB k
Where (h.id2 = k.id2 or h.id3 = k.id3)))
When I do this it seems to ignore the check on tableB and I get John, Mike and Steve. Any ideas? Sorry it's convoluted, this is what I have to work with. I am doing this in oracle by the way.

This sounds like exists and not exists. So a direct translation is:
select a.*
from tableA a
where exists (select 1 from tableC c where c.id = a.id and c.weight >= 200) and
not exists (select 1 from tableB b where b.id2 = a.id2 or b.id3 = a.id3);
Splitting the or into two separate subqueries can often improve performance:
select a.*
from tableA a
where exists (select 1 from tableC c where c.id = a.id and c.weight >= 200) and
not exists (select 1 from tableB b where b.id2 = a.id2) and
not exists (select 1 from tableB b where b.id3 = a.id3);

Here's what I came up with.
SELECT DISTINCT
A.ID,
A.ID2,
A.ID3,
A.Name,
A.Age
FROM
A
LEFT OUTER JOIN C ON C.ID = A.ID
LEFT OUTER JOIN B ON
B.ID2 = A.ID2
OR B.ID3 = A.ID3
WHERE
C.Weight >= 200
AND B.Price IS NULL
BELOW is test data
CREATE TABLE A
(
ID INT,
ID2 VARCHAR(3),
ID3 VARCHAR(3),
Name VARCHAR(10),
Age INT
);
INSERT INTO A VALUES (1, '2x', '4y', 'John', 23);
INSERT INTO A VALUES (2, '7j', NULL , 'Mike', 27);
INSERT INTO A VALUES (3, '1S1', '6HH', 'Steve', 67);
INSERT INTO A VALUES (4, '45', 'O8', 'Carol', 56);
CREATE TABLE B
(
ID2 VARCHAR(3),
ID3 VARCHAR(3),
Price INT
);
INSERT INTO B VALUES ('2x', '4y', 23);
INSERT INTO B VALUES ('7j', '8uj', 27);
INSERT INTO B VALUES ('x4', 'Q6', 56);
CREATE TABLE C
(
ID INT,
Weight INT
);
INSERT INTO C VALUES (1, 145);
INSERT INTO C VALUES (1, 210);
INSERT INTO C VALUES (1, 240);
INSERT INTO C VALUES (2, 234);
INSERT INTO C VALUES (2, 110);
INSERT INTO C VALUES (3, 260);
INSERT INTO C VALUES (3, 210);
INSERT INTO C VALUES (4, 82);

Select a.id, a.id2, a.id3
From table_a a
Left join table_c c on a.id = c.id
Where c.weight >=200
And not exists
(Select 1
From table_b b
Where a.id = b.id2
Or a.id = b.id3
);

I was beating to the answers, but I used INNER JOIN on tables a and c and a NOT EXISTS on table b.
--This first section is creating the test data
with Table_A (id, id2, id3, Name, age) as
(select 1, '2x', '4y', 'John', 23 from dual union all
select 2, '7j', null, 'Mike', 27 from dual union all
select 3, '1S1', '6HH', 'Steve', 67 from dual union all
select 4, '45', 'O8', 'Carol', 56 from dual),
Table_B(id2, id3, price) as
(select '2x', '4y', 23 from dual union all
select '7j', '8uj', 27 from dual union all
select 'x4', 'Q6', 56 from dual),
Table_C(id, weight) as
(select 1, 145 from dual union all
select 1, 210 from dual union all
select 1, 240 from dual union all
select 2, 234 from dual union all
select 2, 110 from dual union all
select 3, 260 from dual union all
select 3, 210 from dual union all
select 4, 82 from dual)
--Actual query starts here
select distinct a.*
from table_a a
--join to table c, include the weight filter
inner join table_c c on (a.id = c.id and c.weight >= 200)
where not exists -- The rest is the NOT EXISTS to exclude the values in table b
(select 1 from table_b b
where a.id2 = b.id2
or a.id3 = b.id3);

Select continuous ranges from table

I need to extract continous ranges from a table based on consecutive numbers (column N) and same "category" these numbers relate to (column C below). Graphically it looks like this:
N C D
--------
1 x a C N1 N2 D1 D2
2 x b ------------------
3 x c x 1 4 a d (continuous range with same N)
4 x d ==> x 6 7 e f (new range because "5" is missing)
6 x e y 8 10 g h (new range because C changed to "y")
7 x f
8 y g
9 y h
10 y i
SQL Server is 2005. Thanks.

DECLARE #myTable Table
(
N INT,
C CHAR(1),
D CHAR(1)
)
INSERT INTO #myTable(N,C,D) VALUES(1, 'x', 'a');
INSERT INTO #myTable(N,C,D) VALUES(2, 'x', 'b');
INSERT INTO #myTable(N,C,D) VALUES(3, 'x', 'c');
INSERT INTO #myTable(N,C,D) VALUES(4, 'x', 'd');
INSERT INTO #myTable(N,C,D) VALUES(6, 'x', 'e');
INSERT INTO #myTable(N,C,D) VALUES(7, 'x', 'f');
INSERT INTO #myTable(N,C,D) VALUES(8, 'y', 'g');
INSERT INTO #myTable(N,C,D) VALUES(9, 'y', 'h');
INSERT INTO #myTable(N,C,D) VALUES(10, 'y', 'i');
WITH StartingPoints AS(
SELECT A.*, ROW_NUMBER() OVER(ORDER BY A.N) AS rownum
FROM #myTable AS A
WHERE NOT EXISTS(
SELECT *
FROM #myTable B
WHERE B.C = A.C
AND B.N = A.N - 1
)
),
EndingPoints AS(
SELECT A.*, ROW_NUMBER() OVER(ORDER BY A.N) AS rownum
FROM #myTable AS A
WHERE NOT EXISTS (
SELECT *
FROM #myTable B
WHERE B.C = A.C
AND B.N = A.N + 1
)
)
SELECT StartingPoints.C,
StartingPoints.N AS [N1],
EndingPoints.N AS [N2],
StartingPoints.D AS [D1],
EndingPoints.D AS [D2]
FROM StartingPoints
JOIN EndingPoints ON StartingPoints.rownum = EndingPoints.rownum
Results:
C N1 N2 D1 D2
---- ----------- ----------- ---- ----
x 1 4 a d
x 6 7 e f
y 8 10 g i

The RANK function is a safer bet than ROW_NUMBER, in case any N values are duplicated, as in the following example:
declare #ncd table(N int, C char, D char);
insert into #ncd
select 1,'x','a' union all
select 2,'x','b' union all
select 3,'x','c' union all
select 4,'x','d' union all
select 4,'x','e' union all
select 7,'x','f' union all
select 8,'y','g' union all
select 9,'y','h' union all
select 10,'y','i' union all
select 10,'y','j';
with a as (
select *
, r = N-rank()over(partition by C order by N)
from #ncd
)
select C=MIN(C)
, N1=MIN(N)
, N2=MAX(N)
, D1=MIN(D)
, D2=MAX(D)
from a
group by r;
Result, which correctly withstands the duplicated 4 and 10:
C N1 N2 D1 D2
---- ----------- ----------- ---- ----
x 1 4 a e
x 7 7 f f
y 8 10 g j

Using this answer as a starting point, I ended up with the following:
;
WITH data (N, C, D) AS (
SELECT 1, 'x', 'a' UNION ALL
SELECT 2, 'x', 'b' UNION ALL
SELECT 3, 'x', 'c' UNION ALL
SELECT 4, 'x', 'd' UNION ALL
SELECT 6, 'x', 'e' UNION ALL
SELECT 7, 'x', 'f' UNION ALL
SELECT 8, 'y', 'g' UNION ALL
SELECT 9, 'y', 'h' UNION ALL
SELECT 10, 'y', 'i'
),
ranked AS (
SELECT
curr.*,
Grp = curr.N - ROW_NUMBER() OVER (PARTITION BY curr.C ORDER BY curr.N),
IsStart = CASE WHEN pred.C IS NULL THEN 1 ELSE 0 END,
IsEnd = CASE WHEN succ.C IS NULL THEN 1 ELSE 0 END
FROM data AS curr
LEFT JOIN data AS pred ON curr.C = pred.C AND curr.N = pred.N + 1
LEFT JOIN data AS succ ON curr.C = succ.C AND curr.N = succ.N - 1
)
SELECT
C,
N1 = MIN(N),
N2 = MAX(N),
D1 = MAX(CASE IsStart WHEN 1 THEN D END),
D2 = MAX(CASE IsEnd WHEN 1 THEN D END)
FROM ranked
WHERE 1 IN (IsStart, IsEnd)
GROUP BY C, Grp

Write a stored procedure. It will create and fill a temporary table witch will contain C, N1, N2, D1 and D2 columns.
Create the temporary table
use a cursor to loop on entries in table containing N, C, D ordered by N
use a variable to detect a new range (Ni < N(i-1)-1) and to store N1, N2, D1 and D2
INSERT into the temporary table for each range detected (new range detected or and of the cursor)
Tell me if you need a code example.

How to group by to get rid of duplicates rows

How do I remove duplicates from the table where all the columns are significant apart from PK.
declare #dummy table
(
pk int,
a char(1),
b char(1),
c char(1)
)
insert into #dummy
select 1, 'A', 'B', 'B' union all
select 2, 'A', 'B', 'B' union all
select 3, 'P', 'Q', 'R' union all
select 4, 'P', 'Q', 'R' union all
select 5, 'X', 'Y', 'Z' union all
select 6, 'X', 'Y', 'Z' union all
select 7, 'A', 'B', 'Z'
what I get with out distinction:
select * from #dummy
pk a b c
----------- ---- ---- ----
1 A B B
2 A B B
3 P Q R
4 P Q R
5 X Y Z
6 X Y Z
7 A B Z
What I'd like is:
select ... do magic ....
pk a b c
----------- ---- ---- ----
1 A B B
3 P Q R
5 X Y Z
7 A B Z

Found it:
select min(pk), a, b, c
from #dummy
group by a, b, c

You want something like this, I think:
DELETE FROM f
FROM #dummy AS f INNER JOIN #dummy AS g
ON g.data = f.data
AND f.id < g.id
Check out this article: http://www.simple-talk.com/sql/t-sql-programming/removing-duplicates-from-a-table-in-sql-server/

At first, I thought distinct would do it, but I'm fairly certain what you want is group by:
select * from #dummy group by a,b,c
Since there's a unique primary key, all rows are distinct.

We Keep Coding

sql objective-c vba vb.net react-native apache vue.js tensorflow api pandas

Split comma separated values based on another table - sql

How about outer apply ( select Trim(value) t, a1.type1, a2.type2, CASE WHEN COALESCE(a1.type1, a2.type2) IS NULL THEN Trim(s.value) END unCaptured from String_Split(alltypes, ',') s left join tablea a1 where a1.type1=Trim(s.value) left join tablea a2 where a2.type2=Trim(s.value) )v

Related

Lookup table from another table and create a new column

Get Distinct values without null

SQL Statement with 3 select statements

Select continuous ranges from table

How to group by to get rid of duplicates rows

Categories

Resources