SQL Server: Split Data in a column based on a delimiter and then join with reference table to get ID values associated

I would like to produce a column Expected that lists, for each row, the Student IDs matching the names in the Users column. Could you please help me achieve this on a Synapse data warehouse, using the following table scripts and sample data?
Note:
This is just a sample data set. The original Users table would have millions of rows.
The Users column can contain any number of users, separated by the delimiter ';'.
CREATE TABLE [BTS_Test].[Users]
(
[Date] [date] NOT NULL,
[Users] [varchar](500) NOT NULL
)
WITH
(
DISTRIBUTION = ROUND_ROBIN
);
INSERT INTO [BTS_Test].[Users] VALUES('2023-01-11','Rupesh; Suresh; Yogesh');
INSERT INTO [BTS_Test].[Users] VALUES('2023-01-11','Anne; Prudvi; Mahesh');
INSERT INTO [BTS_Test].[Users] VALUES('2023-01-11','Bobby');
INSERT INTO [BTS_Test].[Users] VALUES('2023-01-11','Crystal; Abella');
INSERT INTO [BTS_Test].[Users] VALUES('2023-01-11','Balaji; Kishan; Silpa; Sindhu Srinivas; Kiran');
INSERT INTO [BTS_Test].[Users] VALUES('2023-01-12','Cindrella');
INSERT INTO [BTS_Test].[Users] VALUES('2023-01-12','Monika; Chandler');
INSERT INTO [BTS_Test].[Users] VALUES('2023-01-13','Niko Paul');
CREATE TABLE [BTS_Test].[Student]
(
[ID] [int] NOT NULL,
[StudentName] [varchar](500) NOT NULL
)
WITH
(
DISTRIBUTION = REPLICATE
);
INSERT INTO [BTS_Test].[Student] VALUES(1,'Rupesh');
INSERT INTO [BTS_Test].[Student] VALUES(2,'Suresh');
INSERT INTO [BTS_Test].[Student] VALUES(3,'Yogesh');
INSERT INTO [BTS_Test].[Student] VALUES(4,'Anne');
INSERT INTO [BTS_Test].[Student] VALUES(5,'Prudvi');
INSERT INTO [BTS_Test].[Student] VALUES(6,'Mahesh');
INSERT INTO [BTS_Test].[Student] VALUES(7,'Bobby');
INSERT INTO [BTS_Test].[Student] VALUES(8,'Crystal');
INSERT INTO [BTS_Test].[Student] VALUES(9,'Abella');
INSERT INTO [BTS_Test].[Student] VALUES(10,'Balaji');
INSERT INTO [BTS_Test].[Student] VALUES(11,'Kishan');
INSERT INTO [BTS_Test].[Student] VALUES(12,'Silpa');
INSERT INTO [BTS_Test].[Student] VALUES(13,'Sindhu Srinivas');
INSERT INTO [BTS_Test].[Student] VALUES(14,'Kiran');
INSERT INTO [BTS_Test].[Student] VALUES(15,'Cindrella');
INSERT INTO [BTS_Test].[Student] VALUES(16,'Monika');
INSERT INTO [BTS_Test].[Student] VALUES(17,'Chandler');
INSERT INTO [BTS_Test].[Student] VALUES(18,'Niko Paul');

Here is an option that uses JSON to preserve the sequence of the names. How it performs over millions of rows is an open question; keep in mind that there are penalties for storing delimited data.
Example
Select *
  From [BTS_Test].[Users] A
 Cross Apply (
        -- [key] is the array index returned as text; cast it so that 10 sorts after 9
        Select Expected = string_agg(ID,';') WITHIN GROUP ( ORDER BY cast([key] as int) )
          From OpenJSON( '["'+replace(string_escape([Users],'json'),';','","')+'"]' )
          Join [BTS_Test].[Student] on trim(Value)=StudentName
       ) B
Results
Date        Users                                          Expected
2023-01-11  Rupesh; Suresh; Yogesh                         1;2;3
2023-01-11  Anne; Prudvi; Mahesh                           4;5;6
2023-01-11  Bobby                                          7
2023-01-11  Crystal; Abella                                8;9
2023-01-11  Balaji; Kishan; Silpa; Sindhu Srinivas; Kiran  10;11;12;13;14
2023-01-12  Cindrella                                      15
2023-01-12  Monika; Chandler                               16;17
2023-01-13  Niko Paul                                      18
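To make the steps of the OPENJSON query easier to follow, here is a minimal Python sketch of the same pipeline: turn the delimited string into a JSON array, read the elements with their position, trim each one, look up its ID, and join the IDs back in position order. The helper names (to_json_array, expected_ids) and the STUDENTS dictionary are hypothetical stand-ins for the query and the Student table. It only illustrates the logic and is not a replacement for the T-SQL.

```python
import json

# Hypothetical stand-in for a few rows of the Student table (StudentName -> ID).
STUDENTS = {"Rupesh": 1, "Suresh": 2, "Yogesh": 3, "Sindhu Srinivas": 13, "Kiran": 14}


def to_json_array(users: str) -> str:
    """Mirror '["' + replace(string_escape(users, 'json'), ';', '","') + '"]'."""
    escaped = json.dumps(users)[1:-1]  # JSON-escape, then drop the outer quotes
    return '["' + escaped.replace(";", '","') + '"]'


def expected_ids(users: str) -> str:
    """Split on ';', trim each name, look up its ID, and re-join in array order."""
    names = json.loads(to_json_array(users))
    ids = []
    for key, value in enumerate(names):  # key plays the role of OPENJSON's [key]
        student_id = STUDENTS.get(value.strip())
        if student_id is not None:  # inner join: unknown names are dropped
            ids.append((key, student_id))
    return ";".join(str(student_id) for _, student_id in sorted(ids))


print(to_json_array("Rupesh; Suresh; Yogesh"))  # ["Rupesh"," Suresh"," Yogesh"]
print(expected_ids("Rupesh; Suresh; Yogesh"))   # 1;2;3
```

Because the position of each element travels with it, the output order does not depend on the order in which the lookup happens to return rows.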

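This produces the result using STRING_SPLIT and FOR XML PATH. Note that STRING_SPLIT does not guarantee the order of the rows it returns, so the IDs are not guaranteed to follow the sequence of the names:
SELECT u.[Date], u.Users,
       STUFF((SELECT ';' + x.y
              FROM (SELECT CAST(s.ID AS VARCHAR(11)) AS y
                    FROM STRING_SPLIT(u.Users, ';') sp
                    INNER JOIN Student s ON s.StudentName = TRIM(sp.value)) x
              FOR XML PATH('')), 1, 1, '') AS Expected
FROM Users u;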

Related

MERGE not working for inserting a record when it doesn't exist

Can I use MERGE to insert a record when it doesn't exist, as below?
MERGE INTO [dbo].[Test] AS [Target]
USING (SELECT DISTINCT [Name] FROM [dbo].[Test]) AS [Source]
    ON [Target].[Name] = [Source].[Name]
WHEN NOT MATCHED THEN
    INSERT ([Id], [Name])
    VALUES (NEWID(), 'Hello');
If a record with the value Hello does not exist in table Test, insert it; otherwise do nothing. With the code above, the record is not inserted even though the table does not contain it, and no error is raised.
I know how to accomplish this using insert ... where not exists (...), but I specifically want to know how to do it using a MERGE statement.

The reason your MERGE statement wasn't working is that you were merging the same table, dbo.Test, back onto itself, so every source row already has a match and there is never a missing record.
You can insert a single missing record as follows, by creating a source query that contains the record(s) you wish to insert:
declare @Test table (id uniqueidentifier, [Name] nvarchar(64));
select * from @Test;
-- Returns
-- id | Name
-- ----------------------------------------------
MERGE INTO @Test AS [Target]
USING (select 'Hello' [Name]) AS [Source]
    ON [Target].[Name] = [Source].[Name]
WHEN NOT MATCHED THEN
    INSERT ([Id], [Name])
    VALUES (NEWID(), [Source].[Name]);
select * from @Test;
-- Returns
-- id | Name
-- ----------------------------------------------
-- C1C87CD5-F745-436D-BD8D-55B2AF431BED | Hello
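The point about merging a table onto itself can be checked with a small sketch. SQLite has no MERGE, so the example below (Python's sqlite3, an assumption made purely for illustration) emulates the rows a WHEN NOT MATCHED branch would see with a LEFT JOIN anti-join. The table contents and the helper unmatched are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE test (id TEXT, name TEXT)")
conn.execute("INSERT INTO test VALUES ('a1', 'World')")

# Rows of the source that have no partner in the target: these are the rows
# that a WHEN NOT MATCHED branch would act on.
NOT_MATCHED = """
    SELECT s.name
    FROM ({source}) AS s
    LEFT JOIN test AS t ON t.name = s.name
    WHERE t.name IS NULL
"""


def unmatched(source_query: str) -> list:
    return conn.execute(NOT_MATCHED.format(source=source_query)).fetchall()


# Source built from the target itself: every row matches, nothing to insert.
self_source = unmatched("SELECT DISTINCT name FROM test")
# Source that carries the wanted value: 'Hello' is missing, so it is unmatched.
literal_source = unmatched("SELECT 'Hello' AS name")

print(self_source)     # []
print(literal_source)  # [('Hello',)]
```

The value to insert has to come from the source; a literal in the INSERT branch never changes which rows count as unmatched.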

I agree with the answer from Dale K; it's correct.
If you simply need to insert the row only when it does not already exist, you can do the following instead of the MERGE (the same pattern works when selecting from a source table):
insert into dbo.Test (id, name)
-- No FROM clause, so the row is inserted even when dbo.Test is empty
select newid(), 'Hello'
where not exists (select 1
                  from dbo.Test b
                  where b.name = 'Hello');

Updating the parent table considering the values of the child table in Oracle

I have a table PARENT_TBL with three columns: H_N, COL58 and TYPE. The first two columns always hold the same value; only TYPE differs.
I also have a child table, CHILD_TBL, where COL58 defines the relationship with the parent; the rest of its columns are specific to that table. H_N is the unique column in both tables.
I need to update TYPE to EXCHANGE in PARENT_TBL whenever the matching CHILD_TBL rows have all of the I_STATUS values S, R and V; otherwise the PARENT_TBL type should remain untouched. How can we do this?
For example, the row with PARENT_TBL.COL58 = 1140 should get the type 'EXCHANGE', because the CHILD_TBL rows with COL58 = 1140 contain every letter, i.e. S, R and V.
Here is the DDL for the samples.
CREATE TABLE PARENT_TBL (
H_N number,
col58 number,
TYPE varchar(100)
);
Insert into PARENT_TBL (H_N,COL58,TYPE) values (2,2,'SALE');
Insert into PARENT_TBL (H_N,COL58,TYPE) values (16,16,'SALE');
Insert into PARENT_TBL (H_N,COL58,TYPE) values (20,20,'SALE');
Insert into PARENT_TBL (H_N,COL58,TYPE) values (34,34,'VOID');
Insert into PARENT_TBL (H_N,COL58,TYPE) values (38,38,'SALE');
Insert into PARENT_TBL (H_N,COL58,TYPE) values (102,102,'SALE');
Insert into PARENT_TBL (H_N,COL58,TYPE) values (111,111,'SALE');
Insert into PARENT_TBL (H_N,COL58,TYPE) values (117,117,'SALE');
Insert into PARENT_TBL (H_N,COL58,TYPE) values (1140,1140,'RETURN');
Insert into PARENT_TBL (H_N,COL58,TYPE) values (131,131,'SALE');
commit;
CREATE TABLE CHILD_TBL
(
I_STATUS varchar(100),
H_n number,
col58 number
);
Insert into CHILD_TBL (I_STATUS,H_N,COL58) values ('S',3,2);
Insert into CHILD_TBL (I_STATUS,H_N,COL58) values ('S',5,2);
Insert into CHILD_TBL (I_STATUS,H_N,COL58) values ('S',7,2);
Insert into CHILD_TBL (I_STATUS,H_N,COL58) values ('S',8,2);
Insert into CHILD_TBL (I_STATUS,H_N,COL58) values ('S',10,2);
Insert into CHILD_TBL (I_STATUS,H_N,COL58) values ('S',1141,1140);
Insert into CHILD_TBL (I_STATUS,H_N,COL58) values ('V',1142,1140);
Insert into CHILD_TBL (I_STATUS,H_N,COL58) values ('R',1143,1140);
Insert into CHILD_TBL (I_STATUS,H_N,COL58) values ('R',1144,1140);
Insert into CHILD_TBL (I_STATUS,H_N,COL58) values ('S',1145,1140);
commit;
EXPECTED OUTPUT:
truncate table PARENT_TBL ;
Insert into PARENT_TBL (H_N,COL58,TYPE) values (2,2,'SALE');
Insert into PARENT_TBL (H_N,COL58,TYPE) values (16,16,'SALE');
Insert into PARENT_TBL (H_N,COL58,TYPE) values (20,20,'SALE');
Insert into PARENT_TBL (H_N,COL58,TYPE) values (34,34,'VOID');
Insert into PARENT_TBL (H_N,COL58,TYPE) values (38,38,'SALE');
Insert into PARENT_TBL (H_N,COL58,TYPE) values (102,102,'SALE');
Insert into PARENT_TBL (H_N,COL58,TYPE) values (111,111,'SALE');
Insert into PARENT_TBL (H_N,COL58,TYPE) values (117,117,'SALE');
Insert into PARENT_TBL (H_N,COL58,TYPE) values (1140,1140,'EXCHANGE');
Insert into PARENT_TBL (H_N,COL58,TYPE) values (131,131,'SALE');

Use this:
update PARENT_TBL p
   set TYPE = 'EXCHANGE'
 where exists
       (select 1
          from child_tbl c
         where c.i_status in ('S','R','V')
           and c.col58 = p.col58
         group by c.col58
        having count(distinct c.i_status) = 3
       );
Explanation:
select col58
  from child_tbl c
 where i_status in ('S','R','V')
 group by col58
having count(distinct i_status) = 3;
This returns each col58 for which count(distinct i_status) = 3 after the filter i_status in ('S','R','V'). The count can be 3 only if the group has at least one row for each of the statuses 'S', 'R' and 'V'. Now use this query in an exists clause and add the condition c.col58 = p.col58 to correlate it with the parent table in the update.
Please try this on your test data first, and run it against the original data without committing. Commit only once you are sure you got the expected result.
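The count(distinct ...) = 3 test can be tried on a cut-down copy of the sample data. The sketch below uses Python's sqlite3 as a stand-in for Oracle (an assumption made for illustration; the HAVING filter reads the same in both, but SQLite types and the unaliased UPDATE differ from the Oracle statement).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE parent_tbl (h_n INTEGER, col58 INTEGER, type TEXT);
    CREATE TABLE child_tbl (i_status TEXT, h_n INTEGER, col58 INTEGER);
    INSERT INTO parent_tbl VALUES (2, 2, 'SALE'), (34, 34, 'VOID'), (1140, 1140, 'RETURN');
    INSERT INTO child_tbl VALUES
        ('S', 3, 2), ('S', 5, 2),
        ('S', 1141, 1140), ('V', 1142, 1140), ('R', 1143, 1140),
        ('R', 1144, 1140), ('S', 1145, 1140);
""")

# A parent row qualifies only when its children cover all three statuses.
conn.execute("""
    UPDATE parent_tbl
    SET type = 'EXCHANGE'
    WHERE EXISTS (
        SELECT 1
        FROM child_tbl AS c
        WHERE c.i_status IN ('S', 'R', 'V')
          AND c.col58 = parent_tbl.col58
        GROUP BY c.col58
        HAVING COUNT(DISTINCT c.i_status) = 3
    )
""")

rows = conn.execute("SELECT col58, type FROM parent_tbl ORDER BY col58").fetchall()
print(rows)  # [(2, 'SALE'), (34, 'VOID'), (1140, 'EXCHANGE')]
```

Parent 2 has children with status S only, and parent 34 has no children at all, so both keep their type.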

Find the rows in the child table (CHILD_TBL) with proper grouping and use MERGE:
merge into parent_tbl p
using (select col58
         from child_tbl
        group by col58
       having count(decode(i_status, 'S', 1)) > 0
          and count(decode(i_status, 'R', 1)) > 0
          and count(decode(i_status, 'V', 1)) > 0) c
   on (p.col58 = c.col58)
 when matched then update set type = 'EXCHANGE';

SQL Server: Insert batch with output clause

I'm trying to do the following:
Insert a number of records into table A using a table-valued parameter (TVP). This TVP has extra column(s) that are not in A.
Get the inserted ids from A together with the corresponding extra columns in the TVP, and add them to another table B.
Here's what I tried:
Type:
CREATE TYPE tvp AS TABLE
(
id int,
otherid int,
name nvarchar(50),
age int
);
Tables:
CREATE TABLE A (
[id_A] [int] IDENTITY(1,1) NOT NULL,
[name] [varchar](50),
[age] [int]
);
CREATE TABLE B (
[id_B] [int] IDENTITY(1,1) NOT NULL,
[id_A] [int],
[otherid] [int]
);
Insert:
DECLARE @a1 AS tvp;
DECLARE @a2 AS tvp;
-- create a tvp (dummy data here - will be passed as a param to an SP)
INSERT INTO @a1 (name, age, otherid) VALUES ('yy', 10, 99999), ('bb', 20, 88888);
INSERT INTO A (name, age)
OUTPUT
    inserted.id_A,
    inserted.name,
    inserted.age,
    a.otherid -- <== isn't accepted here
INTO @a2 (id, name, age, otherid)
SELECT name, age FROM @a1 a;
INSERT INTO B (id_A, otherid) SELECT id, otherid FROM @a2;
However, this fails with the error: The multi-part identifier "a.otherid" could not be bound. I guess this is expected, because columns from other tables are not accepted in the OUTPUT clause of an INSERT statement (https://msdn.microsoft.com/en-au/library/ms177564.aspx):
from_table_name
Is a column prefix that specifies a table included in the FROM clause of a DELETE, UPDATE, or MERGE statement that is used to specify the rows to update or delete.
So is there any other way to achieve this?

In an INSERT statement, the OUTPUT clause cannot reference columns of the source table.
Use the OUTPUT clause of the MERGE command for such cases:
DECLARE @a1 AS tvp;
DECLARE @a2 AS tvp;
INSERT INTO @a1 (name, age, otherid) VALUES ('yy', 10, 99999), ('bb', 20, 88888);
MERGE A a
USING @a1 a1
    ON a1.id = a.[id_A]
WHEN NOT MATCHED THEN
    INSERT (name, age)
    VALUES (a1.name, a1.age)
OUTPUT inserted.id_A,
       a1.otherid,
       inserted.name,
       inserted.age
INTO @a2 (id, otherid, name, age);
INSERT INTO B (id_A, otherid) SELECT id, otherid FROM @a2;

How to use a MERGE statement to split one staging table into two when loading from staging to a relational DB?

I have the following tables. I want to insert values into CompanyGroup and Company from the test1 table. Which would be the better way, a CTE or using MERGE directly, and how can I do that in T-SQL? test1 is on database A, and Company and CompanyGroup are on database B.
create table test1
(
    companyID int identity
   ,CompanyName varchar(50)
   ,[Group] varchar(10)   -- wide enough for 'Android'
);
INSERT INTO test1 (CompanyName, [Group]) values ('Unknown', '0');
INSERT INTO test1 (CompanyName, [Group]) values ('APPLE', 'IOS');
INSERT INTO test1 (CompanyName, [Group]) values ('Google', 'Android');
INSERT INTO test1 (CompanyName, [Group]) values ('Samsung', 'Android');
INSERT INTO test1 (CompanyName, [Group]) values ('Lg', 'IOS');
create table CompanyGroup
(
    Groupkey int identity (0,1) primary key
   ,GroupName varchar(10)   -- wide enough for 'Unknown' and 'Android'
);
INSERT INTO CompanyGroup (GroupName) VALUES ('Unknown');
create table Company
(
    compnayKey int identity (0,1) primary key
   ,CompanyName varchar(50)
   ,companyGroupKey int references CompanyGroup(Groupkey)
);
INSERT INTO Company (CompanyName, companyGroupKey) VALUES ('Unknown', 0);
When I insert the values into Company, how can I convert the group to an int? Any ideas? What would be the best T-SQL for this load?

Try using this:
INSERT INTO CompanyGroup (GroupName)
SELECT DISTINCT t.[Group]   -- DISTINCT so a group shared by several companies is added once
FROM test1 AS t
WHERE NOT EXISTS (SELECT 1
                  FROM CompanyGroup AS g
                  WHERE g.GroupName = t.[Group]);
MERGE Company AS target
USING (SELECT t.CompanyName, g.Groupkey
       FROM test1 AS t
       LEFT JOIN CompanyGroup AS g ON t.[Group] = g.GroupName
      ) AS source
    ON (target.CompanyName = source.CompanyName
        AND target.companyGroupKey = source.Groupkey)
WHEN NOT MATCHED THEN
    INSERT (CompanyName, companyGroupKey)
    VALUES (source.CompanyName, source.Groupkey);
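The two-step load (groups first, then companies with the looked-up integer key) can be sketched end to end. The example below uses Python's sqlite3 as a stand-in for SQL Server, so the MERGE is replaced by INSERT ... WHERE NOT EXISTS and the snake_case table and column names are hypothetical. It is meant only to show the order of the steps and that the load can be repeated safely.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE test1 (company_name TEXT, grp TEXT);
    CREATE TABLE company_group (group_key INTEGER PRIMARY KEY, group_name TEXT);
    CREATE TABLE company (
        company_key INTEGER PRIMARY KEY,
        company_name TEXT,
        company_group_key INTEGER REFERENCES company_group (group_key)
    );
    INSERT INTO test1 VALUES
        ('APPLE', 'IOS'), ('Google', 'Android'), ('Samsung', 'Android'), ('Lg', 'IOS');
""")

LOAD = """
    -- Step 1: add each group that is not in the group table yet, once.
    INSERT INTO company_group (group_name)
    SELECT DISTINCT t.grp
    FROM test1 AS t
    WHERE NOT EXISTS (SELECT 1 FROM company_group AS g WHERE g.group_name = t.grp);

    -- Step 2: add each company with the integer key of its group.
    INSERT INTO company (company_name, company_group_key)
    SELECT t.company_name, g.group_key
    FROM test1 AS t
    JOIN company_group AS g ON g.group_name = t.grp
    WHERE NOT EXISTS (SELECT 1 FROM company AS c
                      WHERE c.company_name = t.company_name
                        AND c.company_group_key = g.group_key);
"""

conn.executescript(LOAD)
conn.executescript(LOAD)  # a second run finds nothing new to insert

groups = conn.execute(
    "SELECT group_name FROM company_group ORDER BY group_name"
).fetchall()
companies = conn.execute("""
    SELECT c.company_name, g.group_name
    FROM company AS c
    JOIN company_group AS g ON g.group_key = c.company_group_key
    ORDER BY c.company_name
""").fetchall()
print(groups)
print(companies)
```

Loading the groups before the companies is what makes the text-to-integer conversion a plain join on the group name.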

insert data to temporary table in sql 2008

CREATE TABLE #tmpt
(
ID INT
,sName varchar(20)
)
INSERT INTO #tmpt VALUES (1,'ran')
INSERT INTO #tmpt VALUES (2,'pan')
INSERT INTO #tmpt VALUES (3,'fan')
INSERT INTO #tmpt VALUES (4,'gan')
This type of insert does not work. Can you help?
