SQL Server : Breaking Comma Delimited String and matching values in a table

I am using SQL Server and have a "tags" column that holds a string of comma-separated tags. Is there a way to break the string up, place the tags in another table, and match them so I can easily see which tickets have like tags?
Row 1:
3years_andmore,access_ccc,access_sdl,associate_iii,ccc_tickets,desoto_counter,phone_call__property_tax,ticketing,trainer
Row 2:
3years_andmore,access_ccc,access_sdl,associate_iii,ccc_tickets,desoto_counter,phone_call__dmv,ticketing,trainer
Row 3:
5_minutes,access_ccc,access_sdl,associate_ii,ccc_tickets,desoto_counter,lessthan_3years,phone_call__operations___title_by_mail_inquiry,trainer
Row 4:
access_ccc,access_sdl,associate_ii,ccc_customer_request_manager_other,ccc_tickets,desoto_counter,lessthan_3years,phone_call__associate_requesting_manager__customer_requesting_mr_brierton,trainer
The tags are in no particular order and differ from row to row, but is there a way to at least break them up into a new table and match them to see which tickets have the same tags?
SELECT
[id],
[url],
[external_id],
[type],
[subject],
[description],
[priority],
[status],
[recipient],
[requester_id],
[submitter_id],
[assignee_id],
[organization_id],
[group_id],
[collaborator_ids],
[forum_topic_id],
[problem_id],
[has_incidents],
[due_at],
[tags],
[via],
[custom_fields],
[satisfaction_rating],
[sharing_agreement_ids],
[followup_ids],
[ticket_form_id],
[created_at],
[updated_at],
[channel]
FROM
[Brierton].[dbo].[Tickets]
WHERE
created_at BETWEEN '2017-11-01' AND '2018-08-23'
AND ',' + tags + ',' LIKE '%,' + 'ccc_tickets' + ',%'

Since SQL Server 2016 there's a built-in string_split() function (the database must be at compatibility level 130 or higher). To get the tags per ticket in a "more relational way" you could use:
SELECT t.id,
x.value
FROM tickets t
CROSS APPLY (SELECT value
FROM string_split(t.tags, ',')) x;
From there you could aggregate to get the most used tags.
SELECT x.value,
count(*)
FROM tickets t
CROSS APPLY (SELECT value
FROM string_split(t.tags, ',')) x
GROUP BY x.value
ORDER BY count(*) DESC;
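To get at the "which tickets share tags" part of the question, the split rows can be stored in their own table and self-joined. This is only a sketch: the table name ticket_tags and the column names ticket_id and tag are hypothetical, and it assumes SQL Server 2016 or later for string_split().

```sql
-- Hypothetical helper table: one row per ticket/tag pair.
-- DISTINCT guards against a tag being listed twice on the same ticket.
SELECT DISTINCT
       t.id                  AS ticket_id,
       LTRIM(RTRIM(x.value)) AS tag
INTO ticket_tags
FROM tickets t
CROSS APPLY string_split(t.tags, ',') x
WHERE x.value <> '';

-- Pairs of tickets and how many tags they have in common.
-- a.ticket_id < b.ticket_id keeps each pair once and skips self-matches.
SELECT a.ticket_id AS ticket_a,
       b.ticket_id AS ticket_b,
       COUNT(*)    AS shared_tags
FROM ticket_tags a
JOIN ticket_tags b
  ON a.tag = b.tag
 AND a.ticket_id < b.ticket_id
GROUP BY a.ticket_id, b.ticket_id
ORDER BY shared_tags DESC;
```

On a large table the self-join can produce many rows, so it may be worth filtering to a date range or a minimum number of shared tags first.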

Related

How does Partitioning By a Substring in T-SQL Work?

I found the perfect example while browsing through sites of what I'm looking for. In this code example, all country names that appear in long formatted rows are concatenated together into one result, with a comma and space between each country.
Select CountryName from Application.Countries;
Select SUBSTRING(
(
SELECT ',' + CountryName AS 'data()'
FROM Application.Countries FOR XML PATH('')
), 2 , 9999) As Countries
Source: https://www.mytecbits.com/microsoft/sql-server/concatenate-multiple-rows-into-single-string
My question is: how can you partition these results with a second column that would read as "Continent" in such a way that each country would appear within its respective continent? The theoretical "OVER (PARTITION BY Continent)" in this example would not work without an aggregate function before it. Perhaps there is a better way to accomplish this? Thanks.
Use a continents table (you seem not to have one, so derive one with distinct), and then use the same code in a cross apply using the where as a "join" condition:
select *
from
(
select distinct continent from Application.Countries
) t1
cross apply
(
Select SUBSTRING(
(
SELECT ',' + CountryName AS 'data()'
FROM Application.Countries as c
where c.continent=t1.continent FOR XML PATH('')
), 2 , 9999) As Countries
) t2
Note that it is more usual, and arguably has more finesse, to use stuff(x,1,1,'') instead of substring(x,2,9999) to remove the first comma.
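As a rough illustration, here is the same per-continent list written with stuff(), followed by a STRING_AGG alternative. Both assume that Application.Countries has a Continent column; the second also assumes SQL Server 2017 or later, which the question does not confirm.

```sql
-- STUFF removes the leading comma instead of SUBSTRING(x, 2, 9999).
SELECT t1.Continent,
       STUFF((SELECT ',' + c.CountryName
              FROM Application.Countries AS c
              WHERE c.Continent = t1.Continent
              FOR XML PATH('')), 1, 1, '') AS Countries
FROM (SELECT DISTINCT Continent FROM Application.Countries) AS t1;

-- SQL Server 2017+ only: STRING_AGG with GROUP BY does the "partitioning".
-- The cast to NVARCHAR(MAX) avoids the 8000-byte limit on the result.
SELECT c.Continent,
       STRING_AGG(CAST(c.CountryName AS NVARCHAR(MAX)), ', ')
           WITHIN GROUP (ORDER BY c.CountryName) AS Countries
FROM Application.Countries AS c
GROUP BY c.Continent;
```

Keep in mind that the FOR XML PATH version entitizes characters such as & and <, so names containing them would need the TYPE/.value() treatment.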

How to combine the column values from the same table with condition in sql select query

I want to combine the Currency field by comparing the Config and Product columns. If both fields repeat with the same values but a different currency, then combine the currencies into a single row, as you see in the screenshot.
I tried the code like
Select DISTINCT LC.Config, LC.Product, CONCAT(LC.Currency,',',RC.Currency) as Currencies FROM [t_LimitCurrency] LC INNER JOIN [t_LimitCurrency] RC ON LC.[Config] = RC.[Config] AND LC.Product = RC.Product
Please let me know how to write the select statement for this scenario.
The code below should do the trick. I am using FOR XML PATH, but you can use STRING_AGG in SQL Server 2017 and later.
select distinct Config,Product,
STUFF((SELECT ',' + CAST(Currency AS VARCHAR(max)) [text()]
FROM (
SELECT Currency
FROM Yourtable b
WHERE a.Config=b.Config and a.product=b.product
) ap
FOR XML PATH(''), TYPE)
.value('.','NVARCHAR(MAX)'),1,1,'') Currency
from Yourtable a
EDIT 1: for SQL Server 2017 and later, the code should look like this:
select distinct Config,Product,
(SELECT
STRING_AGG(CONVERT(NVARCHAR(max),Currency), ',')
FROM YourTable b WHERE a.Config=b.Config and a.product=b.product)
Currency
from Yourtable a

Can I use string_split with enforcing combination of labels?

So I have the following table:
Id Name Label
---------------------------------------
1 FirstTicket bike|motorbike
2 SecondTicket bike
3 ThirdTicket e-bike|motorbike
4 FourthTicket car|truck
I want to use string_split function to identify rows that have both bike and motorbike labels.
So the desired output in my example will be just the first row:
Id Name Label
--------------------------------------
1 FirstTicket bike|motorbike
Currently, I am using the following query but it is returning row 1,2 and 3. I only want the first. Is it possible?
SELECT Id, Name, Label FROM tickets
WHERE EXISTS (
SELECT * FROM STRING_SPLIT(Label, '|')
WHERE value IN ('bike', 'motorbike')
)
You can use APPLY & do aggregation :
SELECT t.id, t.Name, t.Label
FROM tickets t CROSS APPLY
STRING_SPLIT(t.Label, '|') t1
WHERE t1.value IN ('bike', 'motorbike')
GROUP BY t.id, t.Name, t.Label
HAVING COUNT(DISTINCT t1.value) = 2;
However, this design breaks normalization rules; you should have a separate table for the ticket labels.
You could just use string functions for this:
select t.*
from mytable t
where
'|' + label + '|' like '%|bike|%'
and '|' + label + '|' like '%|motorbike|%'
I would expect this to be more efficient than other methods that split and aggregate.
Please note, however, that you should really consider fixing your data model. Instead of storing delimited lists, you should have a separate table to represent the relation between tickets and labels, with one row per ticket/label tuple. Storing delimited lists in a database column is a well-known SQL antipattern that should be avoided at all costs (hard to maintain, hard to query, hard to enforce data integrity, inefficient, ...). You can have a look at this famous SO post for more on this topic.
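A minimal sketch of what such a model could look like. The table name ticket_labels, its columns, and the constraint name are hypothetical and not taken from the question.

```sql
-- Hypothetical junction table: one row per ticket/label tuple.
-- The primary key prevents the same label being stored twice for a ticket.
CREATE TABLE ticket_labels (
    ticket_id INT          NOT NULL,
    label     VARCHAR(100) NOT NULL,
    CONSTRAINT PK_ticket_labels PRIMARY KEY (ticket_id, label)
);

-- Tickets that carry both labels: keep only the two labels of interest,
-- then require that both survived the filter.
SELECT tl.ticket_id
FROM ticket_labels AS tl
WHERE tl.label IN ('bike', 'motorbike')
GROUP BY tl.ticket_id
HAVING COUNT(*) = 2;
```

Because the key guarantees each label appears at most once per ticket, COUNT(*) = 2 is enough here and no COUNT(DISTINCT ...) is needed.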
Yogesh beat me to it; my solution is similar but with a HUGE performance improvement worth pointing out. We'll start with this sample data:
SET NOCOUNT ON;
IF OBJECT_ID('tempdb..#tickets','U') IS NOT NULL DROP TABLE #tickets;
CREATE TABLE #tickets (Id INT, [Name] VARCHAR(50), Label VARCHAR(1000));
INSERT #tickets (Id, [Name], Label)
VALUES
(1,'FirstTicket' , 'bike|motorbike'),
(2,'SecondTicket', 'bike'),
(3,'ThirdTicket' , 'e-bike|motorbike'),
(4,'FourthTicket', 'car|truck'),
(5,'FifthTicket', 'motorbike|bike');
Now the original and much improved version:
-- Original
SELECT t.id, t.[Name], t.Label
FROM #tickets AS t
CROSS APPLY STRING_SPLIT(t.Label, '|') t1
WHERE t1.[value] IN ('bike', 'motorbike')
GROUP BY t.id, t.[Name], t.Label
HAVING COUNT(DISTINCT t1.[value]) = 2;
-- Improved Version Leveraging APPLY to avoid a sort
SELECT t.Id, t.[Name], t.Label
FROM #tickets AS t
CROSS APPLY
(
SELECT 1
FROM STRING_SPLIT(t.Label,'|') AS split
WHERE split.[value] IN ('bike','motorbike')
HAVING COUNT(*) = 2
) AS isMatch(TF);
Now the execution plans:
If you compare the costs, the "sortless" version is 4.36 times faster than the original. In reality it's more because, with the first version, we're not just sorting, we are sorting three columns - an int and two (n)varchars. Because sorting costs are N * LOG(N), the original query gets disproportionately slower the more rows you throw at it.

SQL Server : combine two rows into one

I want to write a query which will display the following result
FROM
ID Contract# Market
1 123kjs 40010
1 123kjs 40011
2 121kjs 40098
2 121kjs 40099
TO
ID Contract# Market
1 123kjs 40010,40011
2 121kjs 40098,40099
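Try out this query. I use GROUP_CONCAT to turn the column values into one row field. Note that GROUP_CONCAT is a MySQL function and is not available in SQL Server.
Also notice that you should replace the table name in the FROM clause with the name of your table.
SELECT ID, `Contract#`, GROUP_CONCAT(Market SEPARATOR ',') AS Market
FROM nameOfThatTable GROUP BY ID, `Contract#`;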
Try this out. I used PIVOT to solve it.
SELECT
ID,
Contract#,
ISNULL(CONVERT(varchar,[40010]) + ',' + CONVERT(varchar,[40011]),
CONVERT(varchar,[40098]) + ',' + CONVERT(varchar,[40099])) AS Market FROM
( SELECT * FROM ContractTable) AS A
PIVOT(MIN(Market) FOR Market IN ([40010],[40011],[40098],[40099])) AS PVT
ORDER BY ID
You can use ',' + CAST(Market AS VARCHAR(30)) in a sub-query and join the Id and Contract# of the sub-query with the outer query to get the values of Market as comma-separated values for each Id and Contract#.
SELECT DISTINCT ID,Contract#,
SUBSTRING(
(SELECT ',' + CAST(Market AS VARCHAR(30))
FROM #TEMP T1
WHERE T2.Id=T1.Id AND T2.Contract#=T1.Contract#
FOR XML PATH('')),2,200000) Market
FROM #TEMP T2
Note: if you want to get CSV values for Id only, remove the T2.Contract#=T1.Contract# condition from the sub-query.

Find duplicates in SQL table, list multiple instances

I'm trying to list the multiple instances of tbldoc.[docid] from tbldoc where tbldoc.[filename] occurs more than once. I'd like them separated by commas and grouped by [filename].
this code works great to find duplicates:
SELECT cast([filename] as varchar(max)),
COUNT(cast([filename] as varchar(max)))
FROM tbldoc
GROUP BY cast([filename] as varchar(max))
HAVING ( COUNT(cast([filename] as varchar(max))) > 1 )
but when i try adding [docid] i get an error:
Column 'tbldoc.DocID' is invalid in the select list because it is not
contained in either an aggregate function or the GROUP BY clause.
this is what i am trying:
SELECT [docid], cast([filename] as varchar(max)),
COUNT(cast([filename] as varchar(max)))
FROM tbldoc
GROUP BY cast([filename] as varchar(max))
HAVING ( COUNT(cast([filename] as varchar(max))) > 1 )
I have no idea how to get all of the [docid]s to list separated by commas. I'm a pretty new user when it comes to SQL.
this is the output i would like to see:
[docids]|[filename]|[instances]
12345,12346| excelfile.xls | 3
Thanks ahead of time for the help guys/gals! =)
Iyosha,
You need to join your first result set back to your full table to get the DocIDs. I'll take the CAST() as read to save some typing.
;with CountedFiles as
(
SELECT
filename,
COUNT(filename) as Total
FROM tbldoc
GROUP BY filename
HAVING COUNT(filename) > 1
)
select
cf.filename,
cf.Total,
td.DocID
from CountedFiles as cf
inner join tbldoc as td
on td.filename = cf.filename;
This will return one DocId, one filename and the count per row. You can then follow Adam's link to turn this into a comma list.
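For completeness, here is one way the final comma list could be produced in a single statement. It is a sketch under assumptions: the CTE name files and the alias fname are made up for illustration, DocID is assumed to fit in VARCHAR(20), and FOR XML PATH is used rather than STRING_AGG so it also runs on versions before SQL Server 2017.

```sql
;WITH files AS
(
    -- Apply the CAST once so it does not have to be repeated below.
    SELECT DocID,
           CAST([filename] AS VARCHAR(MAX)) AS fname
    FROM tbldoc
)
SELECT STUFF((SELECT ',' + CAST(f2.DocID AS VARCHAR(20))
              FROM files AS f2
              WHERE f2.fname = f1.fname
              ORDER BY f2.DocID
              FOR XML PATH('')), 1, 1, '') AS docids,   -- strip leading comma
       f1.fname AS [filename],
       COUNT(*) AS instances
FROM files AS f1
GROUP BY f1.fname
HAVING COUNT(*) > 1;
```

Each output row then carries the comma-separated DocIDs, the filename, and the number of times that filename occurs.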