Find duplicates in SQL table, list multiple instances - sql

I'm trying to list the multiple instances of tbldoc.[docid] from tbldoc where tbldoc.[filename] occurs more than once. I'd like them separated by commas and grouped by [filename].
This code works well to find duplicates:
SELECT cast([filename] as varchar(max)),
COUNT(cast([filename] as varchar(max)))
FROM tbldoc
GROUP BY cast([filename] as varchar(max))
HAVING ( COUNT(cast([filename] as varchar(max))) > 1 )
But when I try adding [docid], I get an error:
Column 'tbldoc.DocID' is invalid in the select list because it is not
contained in either an aggregate function or the GROUP BY clause.
This is what I am trying:
SELECT [docid], cast([filename] as varchar(max)),
COUNT(cast([filename] as varchar(max)))
FROM tbldoc
GROUP BY cast([filename] as varchar(max))
HAVING ( COUNT(cast([filename] as varchar(max))) > 1 )
I have no idea how to get all of the [docid]s listed, separated by commas. I'm a pretty new user when it comes to SQL.
This is the output I would like to see:
[docids]|[filename]|[instances]
12345,12346| excelfile.xls | 3
Thanks ahead of time for the help guys/gals! =)

Iyosha,
You need to join your first result set back to your full table to get the DocIDs. I'll take the CAST() as read to save some typing.
;with CountedFiles as
(
SELECT
filename,
COUNT(filename) as Total
FROM tbldoc
GROUP BY filename
HAVING COUNT(filename) > 1
)
select
cf.filename,
cf.Total,
td.DocID
from CountedFiles as cf
inner join tbldoc as td
on td.filename = cf.filename;
This will return one DocId, one filename and the count per row. You can then follow Adam's link to turn this into a comma list.
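If the server is SQL Server 2017 or later, where STRING_AGG is available, the comma list can probably be built in the same query. The following is only a sketch under that assumption. The aliases d, fname, docids and instances are illustrative, and the CAST on [filename] is carried over from the question on the assumption that the column type cannot be grouped directly.

```sql
-- Sketch: assumes SQL Server 2017+ (STRING_AGG) and the tbldoc table from the question.
SELECT
    -- Cast to varchar(max) so the aggregated list is not limited to 8000 bytes.
    STRING_AGG(CAST(d.DocID AS varchar(max)), ',') AS docids,
    d.fname  AS [filename],
    COUNT(*) AS instances
FROM (
    -- Cast once in a derived table so the outer query can group on the result.
    SELECT DocID, CAST([filename] AS varchar(max)) AS fname
    FROM tbldoc
) AS d
GROUP BY d.fname
HAVING COUNT(*) > 1;   -- keep only filenames that occur more than once
```

On older versions the same shape can be produced with a correlated FOR XML PATH('') subquery per filename.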

Related

How does Partitioning By a Substring in T-SQL Work?

I found the perfect example while browsing through sites of what I'm looking for. In this code example, all country names that appear in long formatted rows are concatenated together into one result, with a comma and space between each country.
Select CountryName from Application.Countries;
Select SUBSTRING(
(
SELECT ',' + CountryName AS 'data()'
FROM Application.Countries FOR XML PATH('')
), 2 , 9999) As Countries
Source: https://www.mytecbits.com/microsoft/sql-server/concatenate-multiple-rows-into-single-string
My question is: how can you partition these results with a second column that would read as "Continent" in such a way that each country would appear within its respective continent? The theoretical "OVER (PARTITION BY Continent)" in this example would not work without an aggregate function before it. Perhaps there is a better way to accomplish this? Thanks.
Use a continents table (you seem not to have one, so derive one with distinct), and then use the same code in a cross apply using the where as a "join" condition:
select *
from
(
select distinct continent from Application.Countries
) t1
cross apply
(
Select SUBSTRING(
(
SELECT ',' + CountryName AS 'data()'
FROM Application.Countries as c
where c.continent=t1.continent
FOR XML PATH('')
), 2 , 9999) As Countries
) t2
Note that it is more usual, and arguably neater, to use stuff(x,1,1,'') instead of substring(x,2,9999) to remove the first comma.
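As a sketch of how the STUFF variant might look for the per-continent query: the table and column names are those assumed in the question, and the TYPE directive with .value() is an addition that is not part of the answer above. It is included because a plain FOR XML PATH('') returns characters such as & as XML entities.

```sql
-- Sketch: one row per continent with its countries as a comma-separated list.
SELECT t1.Continent,
       STUFF(
           (SELECT ', ' + c.CountryName
            FROM Application.Countries AS c
            WHERE c.Continent = t1.Continent      -- correlate to the outer row
            ORDER BY c.CountryName
            FOR XML PATH(''), TYPE                -- TYPE returns xml rather than text
           ).value('.', 'nvarchar(max)'),         -- read it back without entity escaping
           1, 2, ''                               -- remove the leading ', ' (2 characters)
       ) AS Countries
FROM (SELECT DISTINCT Continent FROM Application.Countries) AS t1;
```

The third argument to STUFF has to match the length of the separator, which is why it is 2 here rather than 1.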

SQL Query - Copy Data into One Field

I have a SQL database and I am writing a query:
SELECT *
FROM Consignments
INNER JOIN OrderDetail
ON Consignments.consignment_id = OrderDetail.consignment_id
INNER JOIN UserReferences
ON OrderDetail.record_id = UserReferences.record_id
WHERE Consignments.despatch_date = '2020-04-23'
The first column is consignment_id (from the Consignments table). The final column is senders_reference (from the UserReferences table).
The issue I have is that when I run the query to pick up all consignments for a particular date, it displays multiple rows (with a duplicated consignment_id) when there are multiple senders references in the database.
If there is one senders reference number, there is only 1 row.
This makes sense, because in the front-end for the database the user can enter 1 or more senders references.
What I would like to do is amend my query so that the result displays only 1 row per consignment and, if there are multiple senders reference numbers, has them in the one field, separated by commas.
Is this doable at the query stage?
If not, is it possible to develop a bat file to do the same thing after export?
For reference - this is what I mean - this is the result I am getting at the moment:
This is what I need:
You can use the older style with the help of for xml (here "table" stands for your own table name):
select t.consignment_id,
stuff((select ', ' + convert(varchar(255), t1.senders_reference)
from table t1
where t1.consignment_id = t.consignment_id
for xml path('')
), 1, 2, ''
) as senders_reference
from (select distinct consignment_id from table) t;
Edit: You can use a CTE:
with cte as (
<your query>
)
select t.consignment_id,
stuff((select ', ' + convert(varchar(255), t1.senders_reference)
from cte t1
where t1.consignment_id = t.consignment_id
for xml path('')
), 1, 2, ''
) as senders_reference
from (select distinct consignment_id from cte) t;
You seem to want to use the STRING_AGG function.
This answer covers it nicely: "ListAGG in SQLSERVER".
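For illustration, a STRING_AGG version of the question's query might look like the sketch below. It assumes SQL Server 2017 or later. The table aliases c, od and ur are made up for the example, while the table and column names are taken from the question.

```sql
-- Sketch: one row per consignment, with all senders references in one field.
SELECT c.consignment_id,
       STRING_AGG(CONVERT(varchar(max), ur.senders_reference), ', ') AS senders_reference
FROM Consignments AS c
INNER JOIN OrderDetail AS od
        ON c.consignment_id = od.consignment_id
INNER JOIN UserReferences AS ur
        ON od.record_id = ur.record_id
WHERE c.despatch_date = '2020-04-23'
GROUP BY c.consignment_id;   -- every other selected column would need to be grouped or aggregated too
```

Because the original query uses SELECT *, any further columns that should appear in the result have to be added to both the SELECT list and the GROUP BY.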

Pivot one column with multiple rows into to one concatenated row

I need to prepare a list of emails that can be copied and pasted into the email field (2.5k).
I have one column with one email address in each row.
I need the end result to look like:
email1#test.com; email2#test.com; email3#test.com
Select *
from(
Select email, 1 as num
FROM tabl1
WHERE b.stu_cde = 1 ) e
Pivot( max(email)
for num in ([1]) )as pv
Try this. Instead of PIVOT, use STUFF with FOR XML PATH to get your result. For more details about PIVOT, see the PIVOT documentation.
select Stuff( (select ';' + email from tabl1 for xml path('')),1,1,'') as Result
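On SQL Server 2017 or later, a STRING_AGG version is another option. This is a sketch: the filter on stu_cde is copied from the question's attempt and assumed to be a column of tabl1. With roughly 2.5k addresses the result will most likely exceed 8000 bytes, so the input is cast to varchar(max) to avoid the size limit that applies to non-max string types.

```sql
-- Sketch: all addresses in one string, separated by '; ' as in the desired output.
SELECT STRING_AGG(CAST(email AS varchar(max)), '; ') AS Result
FROM tabl1
WHERE stu_cde = 1;   -- assumed filter, taken from the question's query
```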

SQL Server : Breaking Comma Delimited String and matching values in a table

I am using SQL Server and have a "tags" column that holds a string of comma-separated tags. Is there a way to break the string up, place the tags in another table, and match them so I can easily see like tags?
Row 1:
3years_andmore,access_ccc,access_sdl,associate_iii,ccc_tickets,desoto_counter,phone_call__property_tax,ticketing,trainer
Row 2:
3years_andmore,access_ccc,access_sdl,associate_iii,ccc_tickets,desoto_counter,phone_call__dmv,ticketing,trainer
Row 3:
5_minutes,access_ccc,access_sdl,associate_ii,ccc_tickets,desoto_counter,lessthan_3years,phone_call__operations___title_by_mail_inquiry,trainer
Row 4:
access_ccc,access_sdl,associate_ii,ccc_customer_request_manager_other,ccc_tickets,desoto_counter,lessthan_3years,phone_call__associate_requesting_manager__customer_requesting_mr_brierton,trainer
There is no real order, and the fields do not contain the same tags, but is there a way to at least break them up into a new table and match them to see which tickets have the same tags?
SELECT
[id],
[url],
[external_id],
[type],
[subject],
[description],
[priority],
[status],
[recipient],
[requester_id],
[submitter_id],
[assignee_id],
[organization_id],
[group_id],
[collaborator_ids],
[forum_topic_id],
[problem_id],
[has_incidents],
[due_at],
[tags],
[via],
[custom_fields],
[satisfaction_rating],
[sharing_agreement_ids],
[followup_ids],
[ticket_form_id],
[created_at],
[updated_at],
[channel]
FROM
[Brierton].[dbo].[Tickets]
WHERE
created_at BETWEEN '2017-11-01' AND '2018-08-23'
AND ',' + tags + ',' LIKE '%,' + 'ccc_tickets' + ',%'
Since SQL Server 2016 there's a built-in string_split() function. To get the tags per ticket in a "more relational way" you could use:
SELECT t.id,
x.value
FROM tickets t
CROSS APPLY (SELECT value
FROM string_split(t.tags, ',')) x;
From there you could aggregate to get the most used tags.
SELECT x.value,
count(*)
FROM tickets t
CROSS APPLY (SELECT value
FROM string_split(t.tags, ',')) x
GROUP BY x.value
ORDER BY count(*) DESC;
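To see which tickets have tags in common, the split rows can be joined to themselves. The following is a sketch under a few assumptions: string_split() is available (SQL Server 2016+ with compatibility level 130 or higher), the tags column is a varchar or nvarchar type, and no tag is repeated within a single ticket. The names TicketTags, tag, other_ticket_id and shared_tags are illustrative.

```sql
-- Sketch: pairs of tickets ranked by how many tags they share.
WITH TicketTags AS (
    SELECT t.id,
           LTRIM(RTRIM(s.value)) AS tag          -- trim stray spaces around each tag
    FROM tickets AS t
    CROSS APPLY string_split(t.tags, ',') AS s
)
SELECT a.id     AS ticket_id,
       b.id     AS other_ticket_id,
       COUNT(*) AS shared_tags
FROM TicketTags AS a
INNER JOIN TicketTags AS b
        ON a.tag = b.tag
       AND a.id < b.id                            -- count each pair once, skip self-matches
GROUP BY a.id, b.id
ORDER BY shared_tags DESC;
```

On a large table this self-join can be expensive, so storing the split tags in a real table with an index on the tag column may be worth considering.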

SQL Server : combine two rows into one

I want to write a query which will display the following result
FROM
ID Contract# Market
1 123kjs 40010
1 123kjs 40011
2 121kjs 40098
2 121kjs 40099
TO
ID Contract# Market
1 123kjs 40010,40011
2 121kjs 40098,40099
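Try this query. It uses GROUP_CONCAT to combine the column values into a single field per group. Note that GROUP_CONCAT is a MySQL function and is not available in SQL Server.
Also note that you should replace nameOfThatTable in the FROM clause with the name of your table.
SELECT ID, `Contract#`, GROUP_CONCAT(Market SEPARATOR ',') AS Market
FROM nameOfThatTable GROUP BY ID, `Contract#`;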
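For SQL Server, which the question targets, a rough equivalent of the query above can be sketched with STRING_AGG. This assumes SQL Server 2017 or later and keeps nameOfThatTable as the placeholder table name.

```sql
-- Sketch: one row per ID and Contract#, with the Market values joined by commas.
SELECT ID,
       [Contract#],
       STRING_AGG(CAST(Market AS varchar(max)), ',')
           WITHIN GROUP (ORDER BY Market) AS Market   -- list the values in ascending order
FROM nameOfThatTable
GROUP BY ID, [Contract#];
```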
Try this out. I used PIVOT to solve it.
SELECT
ID,
Contract#,
ISNULL(CONVERT(varchar,[40010]) + ',' + CONVERT(varchar,[40011]),
CONVERT(varchar,[40098]) + ',' + CONVERT(varchar,[40099])) AS Market FROM
( SELECT * FROM ContractTable) AS A
PIVOT(MIN(Market) FOR Market IN ([40010],[40011],[40098],[40099])) AS PVT
ORDER BY ID
You can use ', ' + CAST(Market AS VARCHAR(30)) in sub-query and join Id and Contract# of sub-query with outer query to get values of Market as Comma Separated Values for each Id and Contract#.
SELECT DISTINCT ID,Contract#,
SUBSTRING(
(SELECT ', ' + CAST(Market AS VARCHAR(30))
FROM #TEMP T1
WHERE T2.Id=T1.Id AND T2.Contract#=T1.Contract#
FOR XML PATH('')),3,200000) Market
FROM #TEMP T2
Note: if you want to get CSV values for Id only, remove T2.Contract#=T1.Contract# from the sub-query.