Concatenate and group multiple rows - SQL

I am doing this:
, cte_proc_code (accn,proc_code) as
(SELECT accn_id,
(SELECT proc_code + ','
FROM [XDataFullExtract].[dbo].[accn_billed_procedures]
FOR XML PATH('')
)
FROM [XDataFullExtract].[dbo].[accn_billed_procedures]
group by accn_id)
My data looks like this:
accn_id,proc_code
AA123, 1132
AA123, 5234
AA123, 4524
BB123, 2345
BB123, 4444
The result that I would like is:
accn_id,proc_code
AA123, 1132, 5234, 4524
BB123, 2345, 4444
My solution works, however IT'S WAY TOO SLOW!!
Is there a faster way to do this? I think the XML is slowing me down.

This approach involves adding a staging column to your table, but could run faster:
-- table, with new varchar(max) column cncat added
declare @t table(accn_id varchar(30), proc_code varchar(30), cncat varchar(max));
declare @concat varchar(max)=''; --staging variable

insert into @t values
('AA123','1132','')
, ('AA123','5234','')
, ('AA123','4524','')
, ('BB123','2345','')
, ('BB123','4444','');

-- update cncat
with cte as (select *,r=row_number()over(partition by accn_id order by proc_code) from @t)
update cte set @concat = cncat = case cte.r when 1 then '' else @concat end + ','+proc_code

-- results
select accn_id, cncat=stuff(max(cncat),1,1,'')
from @t
group by accn_id;
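With the sample rows above, the results select should produce something like this (note the concatenation order follows the quirky update's processing order, which SQL Server does not guarantee):
accn_id | cncat
--------+---------------
AA123   | 1132,4524,5234
BB123   | 2345,4444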
 
-- clean up (optional)
update @t set cncat='';
go

In the query you have provided, it is not "the XML that is slowing you down".
You are building a comma-separated string using all values in your table for every row returned.
Your sub-query is missing a WHERE clause that filters the concatenated values down to only the rows matching the current row of the outer query.
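A hedged sketch of the corrected CTE, reusing the table and column names from the question (the aliases a and b are mine):
, cte_proc_code (accn, proc_code) as
(
    SELECT a.accn_id,
           STUFF((SELECT ',' + b.proc_code
                  FROM [XDataFullExtract].[dbo].[accn_billed_procedures] b
                  WHERE b.accn_id = a.accn_id  -- correlate to the current outer row
                  FOR XML PATH('')
                 ), 1, 1, '')
    FROM [XDataFullExtract].[dbo].[accn_billed_procedures] a
    GROUP BY a.accn_id
)
The STUFF(..., 1, 1, '') strips the leading comma instead of leaving a trailing one.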

Related

Dynamic SQL - union all tables (number of tables is dynamically created)

I don't know how to union all tables with dynamic SQL.
The issue is that I'm inserting a number of tables into the db, all having the same structure (only one varchar column, [Line]). I don't know what the number of inserted tables will be - it depends on the project. But I want to automate the process in SQL.
I'm using this query to find those tables; additionally I'm adding a [RowNum] that may serve as an ID of each table:
SELECT
ROW_NUMBER() OVER (ORDER BY Name) AS [RowNum],
[Name] AS [Name]
INTO #all_tables_with_ids
FROM #all_tables
This query returns:
RowNum | Name
------------------------
1 | Table 1
2 | Table 2
3 | Table 3
4 | Table 4
I would like to merge all the tables together. I was trying to write an insert into inside a while loop, but it didn't work. I figured out that I need dynamic SQL.
Can you suggest something? I was trying to find some examples, but all of them fail because the list of tables is not known at the beginning, so it needs to be created dynamically as well.
Demo here:
create table #test
(
RowNum int,
Name varchar(100)
)
insert into #test
select 1,quotename('table1')
union all
select 2,quotename('table2')
declare @sql nvarchar(max)
set @sql='select somecol from tbl union all '
declare @sql1 nvarchar(max)
;with cte
as
(select @sql as ql,name,rplc
from
#test t1
cross apply
(select replace(@sql,'tbl',name) as rplc from #test t2 where t1.rownum=t2.rownum)b
)
select @sql1= stuff(
(select ''+rplc
from cte
for xml path('')
),1,0,'')
set @sql1=substring(@sql1,1,len(@sql1)-10)
print @sql1
--exec(@sql1)
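If the helper tables behave as sketched, the printed statement should come out looking like
select somecol from [table1] union all select somecol from [table2]
and can then be run by uncommenting the exec(@sql1) line. (The substring(...) call trims the trailing ' union all'; len() ignores trailing spaces, which is why removing 10 characters is the right amount here.)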

What is the best way to join two tables which have comma-separated columns

Table1
ID Name Tags
----------------------------------
1 Customer1 Tag1,Tag5,Tag4
2 Customer2 Tag2,Tag6,Tag4,Tag11
3 Customer5 Tag6,Tag5,Tag10
and Table2
ID Name Tags
----------------------------------
1 Product1 Tag1,Tag10,Tag6
2 Product2 Tag2,Tag1,Tag5
3 Product5 Tag1,Tag2,Tag3
what is the best way to join Table1 and Table2 on the Tags column?
For each comma-separated tag in Table1's Tags column, it should look for that tag inside the comma-separated Tags column of Table2.
Note: Tables are not full-text indexed.
The best way is not to have comma-separated values in a column at all. Just use normalized data and you won't have trouble with querying like this - each column is supposed to hold only one value.
Without that, there's no way to use any indices, really. Even a full-text index behaves quite differently from what you might think, and full-text indexes are inherently clunky to use - they're designed for searching text, not meaningful data. In the end, you will not get much better than something like
where (Col like 'txt,%' or Col like '%,txt' or Col like '%,txt,%')
Using an XML column might be another alternative, though it's still quite a bit silly. It would at least allow you to treat the values as a collection.
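For illustration only - a minimal sketch with made-up markup, not the poster's actual schema - an XML-typed column turns the membership test into an exist() query instead of a LIKE pattern:
DECLARE @T TABLE (ID int, Name varchar(20), Tags xml);
INSERT INTO @T VALUES
    (1, 'Customer1', '<t>Tag1</t><t>Tag5</t><t>Tag4</t>');

-- true when the collection contains the exact value 'Tag5'
SELECT Name
FROM @T
WHERE Tags.exist('/t[. = "Tag5"]') = 1;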
I don't think there will ever be an easy and efficient solution to this. As Luaan pointed out, it is a very bad idea to store data like this: you lose most of the power of SQL when you squeeze what should be individual units of data into a single cell.
But you can manage this at the slight cost of creating two user-defined functions. First, use this brilliant recursive technique to split the strings into individual rows based on your delimiter:
CREATE FUNCTION dbo.TestSplit (@sep char(1), @s varchar(512))
RETURNS table
AS
RETURN (
    WITH Pieces(pn, start, stop) AS (
        SELECT 1, 1, CHARINDEX(@sep, @s)
        UNION ALL
        SELECT pn + 1, stop + 1, CHARINDEX(@sep, @s, stop + 1)
        FROM Pieces
        WHERE stop > 0
    )
    SELECT pn AS SplitIndex,
           SUBSTRING(@s, start, CASE WHEN stop > 0 THEN stop - start ELSE 512 END) AS SplitPart
    FROM Pieces
)
Then, make a function that takes two strings and counts the matches:
CREATE FUNCTION dbo.MatchTags (@a varchar(512), @b varchar(512))
RETURNS INT
AS
BEGIN
    RETURN
        (SELECT COUNT(*)
         FROM dbo.TestSplit(',', @a) a
         INNER JOIN dbo.TestSplit(',', @b) b
             ON a.SplitPart = b.SplitPart)
END
And that's it. Here is a test roll with table variables:
DECLARE @A TABLE (Name VARCHAR(20), Tags VARCHAR(100))
DECLARE @B TABLE (Name VARCHAR(20), Tags VARCHAR(100))
INSERT INTO @A ( Name, Tags )
VALUES
( 'Customer1','Tag1,Tag5,Tag4'),
( 'Customer2','Tag2,Tag6,Tag4,Tag11'),
( 'Customer5','Tag6,Tag5,Tag10')
INSERT INTO @B ( Name, Tags )
VALUES
( 'Product1','Tag1,Tag10,Tag6'),
( 'Product2','Tag2,Tag1,Tag5'),
( 'Product5','Tag1,Tag2,Tag3')
SELECT * FROM @A a
INNER JOIN @B b ON dbo.MatchTags(a.Tags, b.Tags) > 0
I developed a solution as follows:
CREATE TABLE [dbo].[Table1](
Id int not null,
Name nvarchar(250) not null,
Tag nvarchar(250) null,
) ON [PRIMARY]
GO
CREATE TABLE [dbo].[Table2](
Id int not null,
Name nvarchar(250) not null,
Tag nvarchar(250) null,
) ON [PRIMARY]
GO
Get sample data for Table1; this will insert 28000 records:
INSERT INTO Table1
SELECT CustomerID,CompanyName, (FirstName + ',' + LastName)
FROM AdventureWorks.SalesLT.Customer
GO 3
Sample data for Table2 - I need the same tags in Table2:
declare @tag1 nvarchar(50) = 'Donna,Carreras'
declare @tag2 nvarchar(50) = 'Johnny,Caprio'
Get sample data for Table2; this will insert 9735 records:
INSERT INTO Table2
SELECT ProductID,Name, (case when(right(ProductID,1)>=5) then @tag1 else @tag2 end)
FROM AdventureWorks.SalesLT.Product
GO 3
My Solution
create TABLE #dt (
Id int IDENTITY(1,1) PRIMARY KEY,
Tag nvarchar(250) NOT NULL
);
I've created a temp table and I will fill it with the distinct tags from Table1:
insert into #dt(Tag)
SELECT distinct Tag
FROM Table1
Now I need a vertical table for the tags:
create TABLE #Tags ( Tag nvarchar(250) NOT NULL );
Now I fill the #Tags table with a WHILE loop; you could use a cursor, but WHILE is faster:
declare @Rows int = 1
declare @Tag nvarchar(1024)
declare @Id int = 0
WHILE @Rows > 0
BEGIN
Select Top 1 @Tag=Tag, @Id=Id from #dt where Id>@Id
set @Rows = @@ROWCOUNT
if @Rows > 0
begin
insert into #Tags(Tag) SELECT Data FROM dbo.StringToTable(@Tag, ',')
end
END
Last step: join Table2 with #Tags
select distinct t.*
from Table2 t
inner join #Tags on (',' + t.Tag + ',') like ('%,' + #Tags.Tag + ',%')
Table1 rowcount = 28000, Table2 rowcount = 9735; the select takes less than 2 seconds.
I use this kind of solution with paths of trees. First put a comma at the very beginning and at the very end of the string. Then you can use
Where col1 like '%,' || col2 || ',%'
Some databases index the column for LIKE as well (Postgres does it partially), so this can also be efficient. I don't know about SQL Server.
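In T-SQL the same predicate would use + instead of || for concatenation - a minimal sketch, assuming col2 holds a single tag value:
WHERE ',' + col1 + ',' LIKE '%,' + col2 + ',%'
Wrapping col1 in commas is what lets the pattern match the first and last items in the list without special cases.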

Multiple Filter in the same column

I have a SQL table with some values and a lot of filter columns:
ID | Name | Filter1 | Filter2 | Filter3 | Filter4 ... and so on...
As of now the filters are set as int, and I am running a query as follows to get the data required:
select Name
from tblABC
where Filter1=1 and Filter2 = 7 and Filter3 = 33 ... and so on...
My issue is that I want a filter column to hold multiple numbers, e.g. row 3 would have the numbers 8 and 13 in its Filter1 cell, so that a query for either 8 or 13 returns that row.
ie I want both the below queries to return the same result.
select... where Filter1=8
select... where Filter1=13
How can this be done? I tried converting the Filter columns to nvarchar and entering the data as '.8.13.', where '.' is used as the separator. After this, running 'select... where Filter1 LIKE '%.8.%'' works for me. But there are about 12 Filter columns, and when such a string search runs over large volumes, wouldn't it make the query slow? What would be a more efficient way of doing this?
I am using Microsoft SQL 2014
A more efficient way, hmm. Separating the filters out of tblABC would be my suggested way to go; even if it's not the most efficient option, it will make up for that in maintenance (and it is certainly more efficient than using LIKE with wildcards).
tblABC
ID | Name
---+----------
1  | Somename
2  | Othername

tblABCFilter
ID | AbcID | Filter
---+-------+-------
1  | 1     | 8
2  | 1     | 13
3  | 1     | 33
4  | 2     | 5
How you query this data depends on your required output of course. One way is to just use the following:
SELECT tblABC.Name FROM tblABC
INNER JOIN tblABCFilter ON tblABC.ID = tblABCFilter.AbcID
WHERE tblABCFilter.Filter = 33
This will return all Name with a Filter of 33.
If you want to query for several Filters:
SELECT tblABC.Name FROM tblABC
INNER JOIN tblABCFilter ON tblABC.ID = tblABCFilter.AbcID
WHERE tblABCFilter.Filter IN (33,7)
This will return all Name with a Filter of either 33 or 7.
I have created a small example fiddle.
I'm going to post a solution I use. I use a split function (there are a lot of SQL Server split functions all over the internet); you can take this one as an example:
CREATE FUNCTION [dbo].[SplitString]
(
    @List NVARCHAR(MAX),
    @Delim VARCHAR(255)
)
RETURNS TABLE
AS
RETURN ( SELECT [Value] FROM
    (
        SELECT
            [Value] = LTRIM(RTRIM(SUBSTRING(@List, [Number],
                CHARINDEX(@Delim, @List + @Delim, [Number]) - [Number])))
        FROM (SELECT Number = ROW_NUMBER() OVER (ORDER BY name)
              FROM sys.all_objects) AS x
        WHERE Number <= LEN(@List)
            AND SUBSTRING(@Delim + @List, [Number], LEN(@Delim)) = @Delim
    ) AS y
);
and run your query like this
select Name
from tblABC
where Filter1 IN (
    SELECT * FROM SplitString(@DatatoFilter,',')) and
Filter2 IN (
    SELECT * FROM SplitString(@DatatoFilter,','))
..and so on.
If you have hundreds of thousands of records it may not perform very well. But it should work.
My personal approach would be a stored procedure and temp tables. Create a temp table with all the values you want to use as a filter:
SELECT *
INTO #Filter1
FROM SplitString(@DatatoFilter,',')
SELECT *
INTO #Filter2
FROM SplitString(@DatatoFilter,',')
then the final select
SELECT * FROM yourtable
WHERE Filter1 IN (SELECT DISTINCT [Value] FROM #Filter1) and
Filter2 IN (SELECT DISTINCT [Value] FROM #Filter2)
I don't think it makes any big difference from the first query, but it is easier to read.
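A hedged sketch of that stored-procedure wrapper - the procedure and parameter names here are made up for illustration, and it assumes the SplitString function above:
CREATE PROCEDURE dbo.FilterByLists
    @Filter1Data NVARCHAR(MAX),
    @Filter2Data NVARCHAR(MAX)
AS
BEGIN
    -- stage the split values once per filter column
    SELECT [Value] INTO #Filter1 FROM dbo.SplitString(@Filter1Data, ',');
    SELECT [Value] INTO #Filter2 FROM dbo.SplitString(@Filter2Data, ',');

    SELECT Name
    FROM tblABC
    WHERE Filter1 IN (SELECT DISTINCT [Value] FROM #Filter1)
      AND Filter2 IN (SELECT DISTINCT [Value] FROM #Filter2);
END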
Another solution you can try is converting the columns to XML. It's better than converting the columns to VARCHAR. You can use .exist() to get only the records matching your criteria. Something like this:
DECLARE @table1 TABLE
(
    [ID] int, [Name] varchar(9), Filter1 XML
)
INSERT INTO @table1
([ID], [Name], Filter1)
VALUES
(1, 'Somename','<Filter>8</Filter>'),
(2, 'Othername','<Filter>8</Filter><Filter>13</Filter>'),
(3, 'Thirdname','<Filter>25</Filter>')
DECLARE @FilterValue INT = 8
SELECT Filter1.query('/Filter'), *
FROM @table1
WHERE Filter1.exist('/Filter[. = sql:variable("@FilterValue")]') = 1
EDIT
You can even use a single XML column to store all 12 of your filters, so that one column holds every filter and its multiple values.
DECLARE @table1 TABLE
(
    [ID] int, [Name] varchar(9), Filter XML
)
INSERT INTO @table1
([ID], [Name], Filter)
VALUES
(1, 'Somename','<Filter ID = "1"><FilterVal>8</FilterVal></Filter><Filter ID = "2"><FilterVal>3</FilterVal><FilterVal>12</FilterVal></Filter>'),
(2, 'Othername','<Filter ID = "1"><FilterVal>8</FilterVal><FilterVal>13</FilterVal></Filter><Filter ID = "2"><FilterVal>8</FilterVal><FilterVal>13</FilterVal></Filter>'),
(3, 'Thirdname','<Filter ID = "2"><FilterVal>12</FilterVal><FilterVal>25</FilterVal></Filter><Filter ID = "3"><FilterVal>33</FilterVal></Filter>')
DECLARE @Filter1Value INT = 8
DECLARE @Filter2Value INT = 12
SELECT *
FROM @table1
WHERE Filter.exist('/Filter[@ID = 1]/FilterVal[. = sql:variable("@Filter1Value")]') = 1
AND Filter.exist('/Filter[@ID = 2]/FilterVal[. = sql:variable("@Filter2Value")]') = 1

Update specific columns in a table iteratively (Do a bulk update)

My Table Schema is as follows:
Gender: char(1), not null
Last Name: varchar(25), null
First Name: varchar(35), not null
The data in the table looks like:
Gender | Last Name | First Name
-------+-----------+-----------
M      | Doe       | John
F      | Marie     | Jane
M      | Jones     | Jameson
F      | Simpson   | Alice
I now am trying to update all the names in the table from the names present in the txt file.
My Query is as follows:
-- Sort out the forenames we'll be using for the data. We make a #Name table because I have yet to figure out
-- inserting specific columns using BULK INSERT without using a format file.
CREATE TABLE #Name (Name VARCHAR(50))
CREATE TABLE #ForeNames (FirstName VARCHAR(50), Gender VARCHAR(1))
-- Move data into the #Name table
BULK INSERT #Name FROM "c:\girlsforenames.txt" WITH (ROWTERMINATOR='\n')
-- Now move it to the forename table and add the gender
INSERT INTO #ForeNames SELECT [Name], 'F' FROM #Name
-- Delete the names from temporary table
TRUNCATE TABLE #Name
-- Same for the boys
BULK INSERT #Name FROM "c:\boysforenames.txt" WITH (ROWTERMINATOR='\n')
INSERT INTO #ForeNames SELECT [Name], 'M' FROM #Name
-- Now do the surnames
TRUNCATE TABLE #Name
BULK INSERT #Name FROM "c:\surnames.txt" WITH (ROWTERMINATOR='\n')
DECLARE @Counter BIGINT
SET @Counter = 4
WHILE (@Counter > 0)
BEGIN
UPDATE TableName
set
[last_name]= (SELECT TOP 1 FirstName from #ForeNames),
[first_name]=(SELECT TOP 1 Name FROM #Name ORDER BY NEWID()),
[gender]= ( SELECT TOP 1 Gender FROM #ForeNames ORDER BY NEWID());
SET @Counter=@Counter-1
END
DROP TABLE #Name
DROP TABLE #ForeNames
SELECT * FROM TableName
What happens is that all the rows in the table are updated with the same values, and each time I execute the query they are all updated with a new set of values.
What I want is to loop through each row, update it, and then update the next row with a different random name. But here the same random name is being applied across all the rows of the table.
Any help would be appreciated.
Each SELECT statement is only executed once in your example (and thus returns one result), and since your UPDATE isn't limited, you're applying that same value to every row.
If you want to update each row with different values, you can use a CTE and the ROW_NUMBER() function to pair each row with its own value.
There's no need to loop, you can do it in one fell swoop:
WITH cte AS (SELECT *, ROW_NUMBER() OVER (ORDER BY (SELECT 1)) AS n1
             FROM TableName
)
UPDATE cte
SET first_name = names.Name
FROM cte
JOIN (SELECT *, ROW_NUMBER() OVER (ORDER BY NEWID()) AS n2
      FROM #Name
     ) names
  ON cte.n1 = names.n2
Demo: SQL Fiddle
This example is just for the first name; the other columns follow the same pattern, as sketched below.
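A hedged extension of the same idea to the gender column (it assumes the #ForeNames temp table from the question is still populated; the COUNT(*) OVER () modulo wrap is my own addition, to recycle the name list when it is shorter than the target table):
WITH cte AS (SELECT *, ROW_NUMBER() OVER (ORDER BY (SELECT 1)) AS n1
             FROM TableName
)
UPDATE cte
SET first_name = f.FirstName,
    gender     = f.Gender
FROM cte
JOIN (SELECT FirstName, Gender,
             ROW_NUMBER() OVER (ORDER BY NEWID()) AS n2,
             COUNT(*) OVER () AS cnt
      FROM #ForeNames
     ) f
  ON ((cte.n1 - 1) % f.cnt) + 1 = f.n2
-- last_name would follow the same join pattern against #Name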

Remove a sentence from a paragraph that has a specific pattern with T-SQL

I have a large number of descriptions that can be anywhere from 5 to 20 sentences each. I am trying to put together a script that will locate and remove any sentence containing a word with numbers before or after it.
before example: Hello world. Todays department has 345 employees. Have a good day.
after example: Hello world. Have a good day.
My main problem right now is identifying the violation.
Here "345 employees" is what causes the sentence to be removed. However, each description will have a different number and possibly a different variation of the word employee.
I would like to avoid having to create a table of all the different variations of employee.
JTB
This would make a good SQL Puzzle.
Disclaimer: there are probably TONS of edge cases that would blow this up
This would take a string, split it out into a table with a row for each sentence, then remove the rows that matched a condition, and then finally join them all back into a string.
CREATE FUNCTION dbo.fn_SplitRemoveJoin(@Val VARCHAR(2000), @FilterCond VARCHAR(100))
RETURNS VARCHAR(2000)
AS
BEGIN
    DECLARE @tbl TABLE (rid INT IDENTITY(1,1), val VARCHAR(2000))
    DECLARE @t VARCHAR(2000)
    -- Split into table @tbl
    WHILE CHARINDEX('.', @Val) > 0
    BEGIN
        SET @t = LEFT(@Val, CHARINDEX('.', @Val))
        INSERT @tbl (val) VALUES (@t)
        SET @Val = RIGHT(@Val, LEN(@Val) - LEN(@t))
    END
    IF (LEN(@Val) > 0)
        INSERT @tbl VALUES (@Val)
    -- Filter out condition
    DELETE FROM @tbl WHERE val LIKE @FilterCond
    -- Join back into 1 string
    DECLARE @i INT, @rv VARCHAR(2000)
    SET @i = 1
    WHILE @i <= (SELECT MAX(rid) FROM @tbl)
    BEGIN
        SELECT @rv = IsNull(@rv,'') + IsNull(val,'') FROM @tbl WHERE rid = @i
        SET @i = @i + 1
    END
    RETURN @rv
END
go
CREATE TABLE #TMP (rid INT IDENTITY(1,1), sentence VARCHAR(2000))
INSERT #tmp (sentence) VALUES ('Hello world. Todays department has 345 employees. Have a good day.')
INSERT #tmp (sentence) VALUES ('Hello world. Todays department has 15 emps. Have a good day. Oh and by the way there are 12 employees somewhere else')
SELECT
rid, sentence, dbo.fn_SplitRemoveJoin(sentence, '%[0-9] Emp%')
FROM #tmp t
returns
rid | sentence | (function result)
1 | Hello world. Todays department has 345 employees. Have a good day. | Hello world. Have a good day.
2 | Hello world. Todays department has 15 emps. Have a good day. Oh and by the way there are 12 employees somewhere else | Hello world. Have a good day.
I've used the split/remove/join technique as well.
The main points are:
This uses a pair of recursive CTEs, rather than a UDF.
This will work with all English sentence endings: . or ! or ?
This removes whitespace before making the "digit then employ" comparison, so you don't have to worry about multiple spaces and such.
Here's the SqlFiddle demo, and the code:
-- Split descriptions into sentences (could use period, exclamation point, or question mark)
-- Delete any sentences that, without whitespace, are like '%[0-9]employ%'
-- Join sentences back into descriptions
;with Splitter as (
select ID
, ltrim(rtrim(Data)) as Data
, cast(null as varchar(max)) as Sentence
, 0 as SentenceNumber
from Descriptions -- Your table here
union all
select ID
, case when Data like '%[.!?]%' then right(Data, len(Data) - patindex('%[.!?]%', Data)) else null end
, case when Data like '%[.!?]%' then left(Data, patindex('%[.!?]%', Data)) else Data end
, SentenceNumber + 1
from Splitter
where Data is not null
), Joiner as (
select ID
, cast('' as varchar(max)) as Data
, 0 as SentenceNumber
from Splitter
group by ID
union all
select j.ID
, j.Data +
-- Don't want "digit+employ" sentences, remove whitespace to search
case when replace(replace(replace(replace(s.Sentence, char(9), ''), char(10), ''), char(13), ''), char(32), '') like '%[0-9]employ%' then '' else s.Sentence end
, s.SentenceNumber
from Joiner j
join Splitter s on j.ID = s.ID and s.SentenceNumber = j.SentenceNumber + 1
)
-- Final Select
select a.ID, a.Data
from Joiner a
join (
-- Only get max SentenceNumber
select ID, max(SentenceNumber) as SentenceNumber
from Joiner
group by ID
) b on a.ID = b.ID and a.SentenceNumber = b.SentenceNumber
order by a.ID, a.SentenceNumber
One way to do this. Please note that it only works if exactly one sentence in the text contains a number.
declare @d VARCHAR(1000) = 'Hello world. Todays department has 345 employees. Have a good day.'
declare @dr VARCHAR(1000)
set @dr = REVERSE(@d)
SELECT REVERSE(RIGHT(@dr, LEN(@dr) - CHARINDEX('.', @dr, PATINDEX('%[0-9]%', @dr))))
    + RIGHT(@d, LEN(@d) - CHARINDEX('.', @d, PATINDEX('%[0-9]%', @d)) + 1)
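Unpacking how this works on the sample string (my reading of the expression, not the author's notes):
-- PATINDEX('%[0-9]%', @d) finds the first digit (the '3' of 345);
-- CHARINDEX('.', @d, ...) then finds the period that ends that sentence,
--   so the second RIGHT() keeps '. Have a good day.' (period included).
-- The same search on @dr, the reversed string, finds the period that
--   precedes the sentence, so RIGHT() keeps the reversed 'Hello world'
--   (without its period) and REVERSE() flips it back.
-- Concatenated result: 'Hello world. Have a good day.'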