Query to combine duplicates


I have a table with titles and descriptions. Most titles have one description, but some have two or more. If a title has more than one description, I need to display "Duplicate" next to the title instead of an actual description.
Titles
______________
ID  Title  Description
-----------------------
1   Test   ABCD
2   Test   FEGH
3   Test2  AVWL
4   Test3  KLMN
5   Test3  ASDF
From the above data my query should return 3 records:
Test Duplicate
Test2 AVWL
Test3 Duplicate
I tried using
SELECT Title, CASE WHEN COUNT(Description) > 1 THEN 'Duplicate' ELSE Description END Title_desc
FROM Titles
GROUP BY Title
But it did not work; it failed with an error saying that Description is not part of the GROUP BY. If I add Description to the GROUP BY, the query no longer removes the duplicates. Is there a way to accomplish what I need without too many subqueries?

You can do:
select
title,
case when cnt = 1 then d else 'Duplicate' end as val
from (
select title, count(*) as cnt, max(description) as d
from t
group by title
) x
Or, without a subquery:
select title,
case when count(*) = 1 then max(description)
else 'Duplicate' end as val
from t
group by title
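
To try the subquery-free version against the sample data from the question, here is a minimal sketch using Python's built-in sqlite3 module. The in-memory database and the helper name combine_duplicates are assumptions made for illustration; the SQL is the second query above, pointed at the question's Titles table.

```python
import sqlite3


def combine_duplicates(rows):
    """Return one (title, description-or-'Duplicate') pair per title.

    Hypothetical helper: loads the rows into an in-memory SQLite table
    named Titles and runs the single-pass GROUP BY query.
    """
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE Titles (ID INTEGER, Title TEXT, Description TEXT)")
    conn.executemany("INSERT INTO Titles VALUES (?, ?, ?)", rows)
    result = conn.execute(
        """
        SELECT Title,
               CASE WHEN COUNT(*) = 1 THEN MAX(Description)
                    ELSE 'Duplicate' END AS val
        FROM Titles
        GROUP BY Title
        ORDER BY Title
        """
    ).fetchall()
    conn.close()
    return result


# Sample data copied from the question.
sample = [
    (1, "Test", "ABCD"),
    (2, "Test", "FEGH"),
    (3, "Test2", "AVWL"),
    (4, "Test3", "KLMN"),
    (5, "Test3", "ASDF"),
]
print(combine_duplicates(sample))
# [('Test', 'Duplicate'), ('Test2', 'AVWL'), ('Test3', 'Duplicate')]
```

MAX(description) is only there to satisfy the GROUP BY rule: when a group has exactly one row, the maximum is that row's description. Note that COUNT(*) counts rows, so a title stored twice with the same description would also show 'Duplicate'.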

Related

Difference in output from two SQL queries

What is the difference between the two SQL queries below, other than Query2 returning an additional field? Are there any possible scenarios where the output of the two queries would be different (other than the additional field in Query2)?
Query1:
SELECT Field1, COUNT(*)
FROM Table1
GROUP BY Field1
HAVING COUNT(*) > 1
Query2:
SELECT Field1, Field2, COUNT(*)
FROM Table1
GROUP BY Field1, Field2
HAVING COUNT(*) > 1

Absolutely, these are different. Query2's Group By clause specifies an extra field. That means when the results are aggregated, they will be aggregated for the combined unique values of Field1 AND Field2. That is, two records are aggregated if and only if both Field1 and Field2 are equal.
For example:
SELECT Profession, Count(*)
FROM People
GROUP BY Profession
HAVING Count(*) > 1
will return a list of professions with associated counts like:
Software Developer, 10
PM, 5
Tester, 2
whereas:
SELECT Profession, Gender, Count(*)
FROM People
GROUP BY Profession, Gender
HAVING Count(*) > 1
will return a list of professions broken out by gender like:
Software Developer, Male, 5
Software Developer, Female, 5
PM, Male, 3
PM, Female, 2
Tester, Male, 2
Edit with additional requested information:
You can retrieve counts of professions with rows for both genders via:
SELECT Profession, Count(*)
FROM People
GROUP BY Profession
HAVING SUM(case Gender when 'Female' then 1 else 0 end) > 0 AND SUM(case Gender when 'Male' then 1 else 0 end) > 0
It gets a bit hairy (need subqueries) if you also need associated gender counts.
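
As a rough illustration of the HAVING filter above, the sketch below runs it with Python's sqlite3 module on a small made-up People table (the names and rows are assumptions, not data from the question). It also places the same conditional sums in the SELECT list, which is one possible way to get the per-gender counts.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE People (Name TEXT, Profession TEXT, Gender TEXT)")
# Hypothetical sample rows: PM has both genders, Tester has only one.
conn.executemany(
    "INSERT INTO People VALUES (?, ?, ?)",
    [
        ("Ann", "PM", "Female"),
        ("Bob", "PM", "Male"),
        ("Cid", "PM", "Male"),
        ("Dee", "Tester", "Male"),
        ("Eli", "Tester", "Male"),
    ],
)

# Keep only professions that have at least one row for each gender.
both_genders = conn.execute(
    """
    SELECT Profession,
           COUNT(*) AS total,
           SUM(CASE Gender WHEN 'Female' THEN 1 ELSE 0 END) AS females,
           SUM(CASE Gender WHEN 'Male' THEN 1 ELSE 0 END) AS males
    FROM People
    GROUP BY Profession
    HAVING SUM(CASE Gender WHEN 'Female' THEN 1 ELSE 0 END) > 0
       AND SUM(CASE Gender WHEN 'Male' THEN 1 ELSE 0 END) > 0
    """
).fetchall()
print(both_genders)  # [('PM', 3, 1, 2)]
```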

The extra GROUP BY column in Query2 changes how the rows are grouped, and so changes which groups survive the HAVING filter. The example below shows how.
test data:
id name
1 a
2 b
3 a
4 a
When I group by name, SQL first finds the distinct values of name. Take the query below:
select name,sum(id)
from test
group by name
--first, find the distinct values of the group by column (here name)
a
b
--next, for each distinct value, collect the rows that fall into that group
a    1 a
     4 a
     3 a
b    2 b
From these groups you can now calculate any aggregate per group. Here it is sum, so the output looks like this:
a 8
b 2
As the output shows, you can calculate any aggregate on each group (here the a and b values), for example count(id), or select an expression over the grouped column such as len(name):
select name,len(name),sum(id)
from test
group by name
The same thing happens when you group by another field as well, for example:
select id,name
from
test
group by id,name
In this case, SQL first finds all distinct combinations of id and name:
1 a
2 b
3 a
4 a
The next step is to collect the rows that fall into each group:
group by columns    --rows that fall into this group
1 a                 1 a
2 b                 2 b
3 a                 3 a
4 a                 4 a
Now you can calculate aggregations on the groups above. I hope this helps in visualizing your GROUP BY. Further, HAVING eliminates groups after the grouping phase, whereas WHERE eliminates rows before the grouping phase.
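
The difference between filtering before and after grouping can be checked on the same four-row test table. This is a minimal sketch using Python's sqlite3 module; the run helper and the specific filter conditions are assumptions chosen for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE test (id INTEGER, name TEXT)")
conn.executemany(
    "INSERT INTO test VALUES (?, ?)",
    [(1, "a"), (2, "b"), (3, "a"), (4, "a")],
)


def run(sql):
    """Hypothetical helper: execute a query and return all rows."""
    return conn.execute(sql).fetchall()


# No filter: every row takes part in its group.
plain = run("SELECT name, SUM(id) FROM test GROUP BY name ORDER BY name")

# WHERE removes rows first, so id 1 never reaches group 'a'.
where_first = run(
    "SELECT name, SUM(id) FROM test WHERE id > 1 GROUP BY name ORDER BY name"
)

# HAVING removes whole groups afterwards; 'b' has a single row and is dropped.
having_after = run(
    "SELECT name, SUM(id) FROM test GROUP BY name "
    "HAVING COUNT(*) > 1 ORDER BY name"
)

print(plain)         # [('a', 8), ('b', 2)]
print(where_first)   # [('a', 7), ('b', 2)]
print(having_after)  # [('a', 8)]
```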

How to perform "Select Count" with complicated "Where" statement to compute co-occurrences?

Here is an example to illustrate my question.
Suppose we have a table (Tags) with two columns, like this:
UserID  Tag
1       SQL
1       Select
1       DB
2       SQL
2       Programming
2       Code
2       Software
3       Code
4       SQL
4       Code
I need to count the DISTINCT co-occurrences of each tag based on UserID, that is, for each tag, how many other distinct tags are used by the users who use it.
So, the output should be like this (with Order by Co-occurrences desc):
Tag          Co-occurrences
---------------------------
SQL          5
Programming  3
Code         3
Software     3
Select       2
DB           2
This is just an example.
How can I make a Select statement that can do this?
I came up with one way but for only ONE specific tag:
SELECT count (distinct (Tag)) - 1 as Co_occurrences
FROM Tags
WHERE Tag is NOT NULL and UserID in
( SELECT UserID
FROM Tags
where tag = 'SQL')
Is it possible to change the above statement to make it general for all tags in the table?

SELECT t2.tag, count (distinct (t1.Tag)) - 1 as Co_occurrences
FROM Tags t1 inner join
Tags t2 on t1.UserId = t2.UserId
GROUP BY t2.tag
ORDER BY count (distinct (t1.Tag)) desc

A GROUP BY is what you are looking for:
SELECT
UserID,
Tag,
COUNT(DISTINCT Tag) - 1 AS Co_occurrences
FROM Tags
GROUP BY UserID, Tag
ORDER BY UserID, Tag
Edit: As mentioned in the comments, the query above does not answer the question. I adapted @OSA-E's answer a bit: instead of subtracting 1 after the count, the WHERE clause excludes the pairing of a tag with itself, which is what the -1 was compensating for.
SELECT
[t1].[Tag],
COUNT(DISTINCT [t2].[Tag]) AS [Co_occurrences]
FROM [Tags] [t1]
INNER JOIN [Tags] [t2] ON [t1].[UserID] = [t2].[UserID]
WHERE [t1].[Tag] <> [t2].[Tag]
GROUP BY [t1].[Tag]
ORDER BY [Co_occurrences] DESC
Here is the Fiddle.
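
The self-join can be run against the sample Tags data from the question. Below is a minimal sketch using Python's sqlite3 module; the square-bracket quoting is dropped and a secondary sort on the tag name is added (an assumption, only to make the order of ties deterministic).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Tags (UserID INTEGER, Tag TEXT)")
# Sample data copied from the question.
conn.executemany(
    "INSERT INTO Tags VALUES (?, ?)",
    [
        (1, "SQL"), (1, "Select"), (1, "DB"),
        (2, "SQL"), (2, "Programming"), (2, "Code"), (2, "Software"),
        (3, "Code"),
        (4, "SQL"), (4, "Code"),
    ],
)

# Pair every tag with every other tag used by the same user,
# then count the distinct partners per tag.
co_occurrences = conn.execute(
    """
    SELECT t1.Tag, COUNT(DISTINCT t2.Tag) AS Co_occurrences
    FROM Tags t1
    INNER JOIN Tags t2 ON t1.UserID = t2.UserID
    WHERE t1.Tag <> t2.Tag
    GROUP BY t1.Tag
    ORDER BY Co_occurrences DESC, t1.Tag
    """
).fetchall()

for tag, count in co_occurrences:
    print(tag, count)
```

One difference from the COUNT(...) - 1 version: a tag that never appears together with any other tag has no matching pairs here, so it is left out of the result instead of being listed with 0.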

Exclude value of a record in a group if another is present v2

In the example table below, I'm trying to figure out a way to sum amount over marks in two situations: the first, when mark 'C' exists within a single id, and the second, when mark 'C' doesn't exist within an id (see id 1 or 2). In the first situation, I want to exclude the amount against mark 'A' within that id (see id 3 in the desired conversion table below). In the second situation, I want to perform no exclusion and take a simple sum of the amounts against the marks.
In other words, for id's containing both mark 'A' and 'C', I want to make the amount against 'A' as zero. For id's that do not contain mark 'C' but contain mark 'A', keep the original amount against mark 'A'.
My desired output is at the bottom. I've considered trying to partition over id or use the EXISTS command, but I'm having trouble conceptualizing the solution. If any of you could take a look and point me in the right direction, it would be greatly appreciated :)
example table:
id mark amount
------------------
1 A 1
2 A 3
2 B 2
3 A 1
3 C 3
desired conversion:
id mark amount
------------------
1 A 1
2 A 3
2 B 2
3 A 0
3 C 3
desired output:
mark sum(amount)
--------------------
A 4
B 2
C 3

You could slightly modify my previous answer and end up with this:
SELECT
mark,
sum(amount) AS sum_amount
FROM atable t
WHERE mark <> 'A'
OR NOT EXISTS (
SELECT *
FROM atable
WHERE id = t.id
AND mark = 'C'
)
GROUP BY
mark
;
There's a live demo at SQL Fiddle.

Try:
select
mark,
sum(amount)
from ( select
id,
mark,
case
when (mark = 'A' and id in (select id from atable where mark = 'C')) then 0
else amount
end as amount
from atable ) t1
group by mark
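
Both approaches (filtering with NOT EXISTS, and zeroing the amount with CASE) can be compared on the example table. This is a minimal sketch using Python's sqlite3 module; the table name atable is taken from the first answer, and the ORDER BY is added only to make the output order predictable.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE atable (id INTEGER, mark TEXT, amount INTEGER)")
# Example table from the question.
conn.executemany(
    "INSERT INTO atable VALUES (?, ?, ?)",
    [(1, "A", 1), (2, "A", 3), (2, "B", 2), (3, "A", 1), (3, "C", 3)],
)

# Approach 1: drop 'A' rows whose id also has a 'C' row.
not_exists = conn.execute(
    """
    SELECT mark, SUM(amount) AS sum_amount
    FROM atable t
    WHERE mark <> 'A'
       OR NOT EXISTS (SELECT 1 FROM atable WHERE id = t.id AND mark = 'C')
    GROUP BY mark
    ORDER BY mark
    """
).fetchall()

# Approach 2: keep every row but count such 'A' rows as 0.
case_zero = conn.execute(
    """
    SELECT mark, SUM(amount) AS sum_amount
    FROM (
        SELECT id, mark,
               CASE WHEN mark = 'A'
                         AND id IN (SELECT id FROM atable WHERE mark = 'C')
                    THEN 0 ELSE amount END AS amount
        FROM atable
    ) t1
    GROUP BY mark
    ORDER BY mark
    """
).fetchall()

print(not_exists)  # [('A', 4), ('B', 2), ('C', 3)]
print(case_zero)   # [('A', 4), ('B', 2), ('C', 3)]
```

The two agree on this data. They would differ if every 'A' row shared its id with a 'C' row: the NOT EXISTS version would return no row for A at all, while the CASE version would return A with a sum of 0.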

SQL - Removing Duplicate without 'hard' coding?

Here's my scenario.
I have a table with 3 columns that I want to return from a stored procedure: email, name and id. id must be 3 or 4, and each email must appear only once, since some users have multiple entries.
I have a Select statement as follows
SELECT
DISTINCT email,
name,
id
from table
where
id = 3
or id = 4
OK, fairly simple, but some users have entries with both 3 and 4, so they appear twice. If they appear twice, I want to keep only the row with an id of 4. I'll give another example below, as it's hard to explain.
Table -
Email             Name   Id
jimmy@domain.com  jimmy  4
brian@domain.com  brian  4
kevin@domain.com  kevin  3
jimmy@domain.com  jimmy  3
So in the above scenario I would want to ignore the jimmy with the id of 3, any way of doing this without hard coding?
Thanks

SELECT
email,
name,
max(id)
from table
where
id in( 3, 4 )
group by email, name
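
A quick way to check this query is to run it on the four sample rows from the question. The sketch below uses Python's sqlite3 module; the table name users is an assumption (the question's placeholder name, table, is a reserved word), and the ORDER BY is added only to fix the output order.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# "users" is a hypothetical table name standing in for the question's table.
conn.execute("CREATE TABLE users (email TEXT, name TEXT, id INTEGER)")
conn.executemany(
    "INSERT INTO users VALUES (?, ?, ?)",
    [
        ("jimmy@domain.com", "jimmy", 4),
        ("brian@domain.com", "brian", 4),
        ("kevin@domain.com", "kevin", 3),
        ("jimmy@domain.com", "jimmy", 3),
    ],
)

# One row per (email, name); MAX(id) prefers 4 over 3 when both exist.
deduplicated = conn.execute(
    """
    SELECT email, name, MAX(id) AS id
    FROM users
    WHERE id IN (3, 4)
    GROUP BY email, name
    ORDER BY email
    """
).fetchall()

for row in deduplicated:
    print(row)
```

This relies on the preferred id being the larger one. If the rule were "prefer 3 over 4", MIN(id) would play the same role.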

Is this what you want to achieve?
SELECT Email, Name, MAX(Id) FROM Table WHERE Id IN (3, 4) GROUP BY Email, Name;

Sometimes HAVING COUNT(*) > 1 is useful for finding duplicated records:
select Email, count(*) from table group by Email having count(*) > 1
or
select Email, count(*) from table group by Email having count(*) > 1 and max(id) > 3
The solution given earlier with MAX(id) sounds good for this case.
This may be an alternative solution.

What RDBMS are you using? This will return only one "Jimmy", using RANK():
SELECT A.email, A.name,A.id
FROM SO_Table A
INNER JOIN(
SELECT
email, name,id,RANK() OVER (Partition BY name ORDER BY ID DESC) AS COUNTER
FROM SO_Table B
) X ON X.ID = A.ID AND X.NAME = A.NAME
WHERE X.COUNTER = 1
Returns:
email             name   id
------------------------------
jimmy@domain.com  jimmy  4
brian@domain.com  brian  4
kevin@domain.com  kevin  3

How to get a proper count in sql server when retrieving a lot of fields?

Here is my scenario,
I have query that returns a lot of fields. One of the fields is called ID and I want to group by ID and show a count in descending order. However, since I am bringing back more fields, it becomes harder to show a true count because I have to group by those other fields. Here is an example of what I am trying to do. If I just have 2 fields (ID, color) and I group by color, I may end up with something like this:
ID COLOR COUNT
== ===== =====
2 red 10
3 blue 5
4 green 24
Let's say I add another field for the name. It is actually the same person, but their name is spelled differently, which throws the count off, so I might have something like this:
ID COLOR NAME COUNT
== ===== ====== =====
2 Red Jim 5
2 Red Jimmy 5
3 Red Bob 3
3 Red Robert 2
4 Red Johnny 12
4 Red John 12
I want to be able to bring back ID, Color, Name, and Count, but display the counts like in the first table. Is there a way to do this using the ID?

If you want a single result set, you would have to omit the name, as in your first example:
SELECT Id, Color, COUNT(*)
FROM YourTable
GROUP By Id, Color
Now, you could get your desired functionality with a subquery, although it is not elegant:
SELECT Id, Color, Name, (SELECT COUNT(*)
FROM YourTable
Where Id = O.Id
AND Color = O.Color
) AS "Count"
FROM YourTable O
GROUP BY Id, Color, Name
This should work as you desire.

Try this:
SELECT DISTINCT a.ID, a.Color, a.Name, b.Count
FROM yourTable a
INNER JOIN (
SELECT ID, Color, Count(1) [Count] FROM yourTable
GROUP BY ID, Color
) b ON a.ID = b.ID AND a.Color = b.Color
ORDER BY [Count] DESC
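
To see the join-back idea in action, here is a minimal sketch using Python's sqlite3 module. The rows are made up for illustration (they are not the question's data), and the count column is aliased as cnt, an assumed name that avoids quoting the word Count.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE yourTable (ID INTEGER, Color TEXT, Name TEXT)")
# Hypothetical rows: the same person appears under two spellings.
conn.executemany(
    "INSERT INTO yourTable VALUES (?, ?, ?)",
    [
        (2, "Red", "Jim"),
        (2, "Red", "Jimmy"),
        (2, "Red", "Jim"),
        (3, "Blue", "Bob"),
        (3, "Blue", "Robert"),
    ],
)

# Count per (ID, Color) in a derived table, then attach that count
# to every distinct (ID, Color, Name) combination.
rows = conn.execute(
    """
    SELECT DISTINCT a.ID, a.Color, a.Name, b.cnt
    FROM yourTable a
    INNER JOIN (
        SELECT ID, Color, COUNT(*) AS cnt
        FROM yourTable
        GROUP BY ID, Color
    ) b ON a.ID = b.ID AND a.Color = b.Color
    ORDER BY b.cnt DESC, a.ID, a.Name
    """
).fetchall()

for row in rows:
    print(row)
```

Each spelling of a name gets its own output row, but the count on every row is the total for its ID and Color, not the count for that spelling.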

Try doing a sub query to get the count.
-- MarkusQ
