Prioritizing Duplicates Based on Criteria - sql

I have a table of data that contains duplicate email addresses. Each email address has a date, a color (values: "Black", "Blue", or "Green"), and a unique ID. A set of duplicates may contain more than two rows (i.e. I may have 10 duplicates of the same email address), and the rows within a set may have the same or different colors.
My objective is to retrieve the IDs for email addresses that have a certain color and the max(date). I would like to prioritize the color (first "Black" then "Blue" then "Green") and then move to the max(date) only if there are two or more email addresses within the same duplicate set that have the same highest desired color.
Example 1
ID Email Color Date
1 xyz#xyz.com Black 01/01/2014
2 xyz#xyz.com Black 01/31/2014
3 xyz#xyz.com Blue 03/31/2015
4 xyz#xyz.com Green 01/01/2014
5 xyz#xyz.com Green 01/01/2014
Example 2
ID Email Color Date
6 abc#abc.com Green 12/31/2014
7 abc#abc.com Green 01/01/2014
8 abc#abc.com Blue 01/31/2014
In Example 1, I would want to choose ID 2 as this is the highest desired color of the set of duplicate email addresses--"Black"--and I am choosing the one with max(date).
In Example 2, I would want to choose ID 8 as this is the highest desired color of the set of duplicate email addresses--"Blue".

You can use ROW_NUMBER() to assign priority numbers to each record within every group of duplicate emails as per your requirements. Then, in an outer query, you can select records from each group with the highest priority:
SELECT ID, Email, Color
FROM (
    SELECT ID, Email, Color,
           ROW_NUMBER() OVER (PARTITION BY email
                              ORDER BY (CASE Color
                                            WHEN 'Black' THEN 1
                                            WHEN 'Blue' THEN 2
                                            ELSE 3
                                        END),
                                       Date DESC) AS rn
    FROM emails ) e
WHERE e.rn = 1
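To check the query against the two examples from the question, here is a minimal sketch that runs it in an in-memory SQLite database from Python. The table name emails comes from the answer; storing the dates as ISO strings (so that text ordering matches date ordering) and using SQLite at all are assumptions made for illustration. Window functions need SQLite 3.25 or later.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Assumption: dates stored as ISO-8601 text so that DESC sorts newest first.
conn.execute("CREATE TABLE emails (ID INTEGER PRIMARY KEY, Email TEXT, Color TEXT, Date TEXT)")
conn.executemany(
    "INSERT INTO emails VALUES (?, ?, ?, ?)",
    [
        (1, "xyz#xyz.com", "Black", "2014-01-01"),
        (2, "xyz#xyz.com", "Black", "2014-01-31"),
        (3, "xyz#xyz.com", "Blue", "2015-03-31"),
        (4, "xyz#xyz.com", "Green", "2014-01-01"),
        (5, "xyz#xyz.com", "Green", "2014-01-01"),
        (6, "abc#abc.com", "Green", "2014-12-31"),
        (7, "abc#abc.com", "Green", "2014-01-01"),
        (8, "abc#abc.com", "Blue", "2014-01-31"),
    ],
)

# Rank rows inside each email group: color priority first, newest date second.
query = """
SELECT ID, Email, Color
FROM (
    SELECT ID, Email, Color,
           ROW_NUMBER() OVER (
               PARTITION BY Email
               ORDER BY CASE Color WHEN 'Black' THEN 1 WHEN 'Blue' THEN 2 ELSE 3 END,
                        Date DESC
           ) AS rn
    FROM emails
) e
WHERE e.rn = 1
ORDER BY ID
"""
chosen = conn.execute(query).fetchall()
print(chosen)  # [(2, 'xyz#xyz.com', 'Black'), (8, 'abc#abc.com', 'Blue')]
```

IDs 2 and 8 come back, matching what the question asks for in Example 1 and Example 2.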

Related

SQLite - Assign a sequential number to each range of rows

In my table every color has an index, color_order, which defines the order of colors within a country. The order may be different in each country.
country  color   color_order  other_data
Canada   green   1
Canada   green   1
Canada   green   1
Canada   red     2
Canada   red     2
Canada   yellow  3
Canada   yellow  3
France   red     1
France   blue    2
France   blue    2
After removing one of the colors (all 'red' rows), I need to renumber color_order for each country.
Expected result:
country  color   color_order  other_data
Canada   green   1
Canada   green   1
Canada   green   1
Canada   yellow  2
Canada   yellow  2
France   blue    1
France   blue    1
It should be something like a nested loop that iterates through country/color. It seems the query should include:
ROW_NUMBER() OVER (PARTITION BY country ORDER BY color_order)
Any ideas please?
This answer uses a Common Table Expression (CTE) to first select the rows that you want to keep (all rows where the color is not 'red'). DENSE_RANK() then assigns a new color_order within each country (partition), ordered by the existing color_order. Unlike ROW_NUMBER(), which would number every row 1, 2, 3, ..., it gives rows that share a color_order the same number, which is what the expected result requires. Finally, an UPDATE statement writes the new values back to the original table. Note that updating through a CTE like this is SQL Server syntax; SQLite does not accept a CTE as the target of an UPDATE, so there the new values have to be written to the base table directly.
WITH cte AS (
    SELECT country, color, color_order, other_data,
           DENSE_RANK() OVER (PARTITION BY country ORDER BY color_order) AS new_order
    FROM your_table
    WHERE color != 'red'
)
UPDATE cte
SET color_order = new_order;
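Since the question is about SQLite, the renumbering can also be tried there. The following is a minimal sketch run from Python, assuming the placeholder table name your_table and the sample rows from the question. It uses DENSE_RANK() so that rows sharing a color_order receive the same new number, and it reads all new values before writing any of them back by rowid, so the update never sees half-renumbered data. This read-then-write approach is one illustrative choice, not the only way to do it.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE your_table (country TEXT, color TEXT, color_order INTEGER, other_data TEXT)"
)
sample = (
    [("Canada", "green", 1)] * 3
    + [("Canada", "red", 2)] * 2
    + [("Canada", "yellow", 3)] * 2
    + [("France", "red", 1)]
    + [("France", "blue", 2)] * 2
)
conn.executemany(
    "INSERT INTO your_table (country, color, color_order) VALUES (?, ?, ?)", sample
)

# Remove one color, which leaves gaps in color_order.
conn.execute("DELETE FROM your_table WHERE color = 'red'")

# Step 1: compute the new order for every remaining row and materialize it.
new_orders = conn.execute(
    """
    SELECT DENSE_RANK() OVER (PARTITION BY country ORDER BY color_order) AS new_order,
           rowid
    FROM your_table
    """
).fetchall()

# Step 2: write the new values back, one row at a time, keyed by rowid.
conn.executemany("UPDATE your_table SET color_order = ? WHERE rowid = ?", new_orders)

result = conn.execute(
    "SELECT country, color, color_order FROM your_table ORDER BY country, color_order, rowid"
).fetchall()
for row in result:
    print(row)
```

After the update, Canada has green = 1 and yellow = 2, and France has blue = 1, as in the expected result.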

Looking to select values grouped by one column but create a hierarchy of the different columns to find "the best" column

Mightn't make much sense but let's try.
I have a dataset that is quite large and I have a few "duplicates" in a column. Within that column, I want to group it but select the corresponding row that is the "best fit" based on the max/sum of other columns. Is this possible within SQL?
Input:
Name  Transactions  Date       Apple #  Orange #
John  10            today      10       10
John  15            Yesterday  10       10
Jack  10            Today      5        5
Output I expect:
Name  Transactions  Date       Apple #  Orange #  Total #
John  15            Yesterday  10       10        20
Jack  10            Today      5        5         10
The hierarchy would be, max(transactions), max(date) and then sum(Apple, Orange).
I want to do it then for every unique name.
If I understand correctly, you can use row_number(). The key is setting up the order by to reflect the conditions you want:
select t.*
from (select t.*,
             row_number() over (partition by name
                                order by transactions desc, date desc, apple + orange desc) as seqnum
      from t
     ) t
where seqnum = 1;
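As a quick check of this ordering, here is a minimal sketch that runs the same idea in an in-memory SQLite database from Python. The column names apple and orange, the concrete ISO dates standing in for "today" and "yesterday", and the extra total column (the "Total #" from the expected output) are assumptions added for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE t (name TEXT, transactions INTEGER, date TEXT, apple INTEGER, orange INTEGER)"
)
# Hypothetical dates: 2024-01-02 plays the role of "today", 2024-01-01 of "yesterday".
conn.executemany(
    "INSERT INTO t VALUES (?, ?, ?, ?, ?)",
    [
        ("John", 10, "2024-01-02", 10, 10),
        ("John", 15, "2024-01-01", 10, 10),
        ("Jack", 10, "2024-01-02", 5, 5),
    ],
)

# Keep the best row per name: most transactions, then latest date, then largest sum.
query = """
SELECT name, transactions, date, apple, orange, apple + orange AS total
FROM (
    SELECT t.*,
           ROW_NUMBER() OVER (
               PARTITION BY name
               ORDER BY transactions DESC, date DESC, apple + orange DESC
           ) AS seqnum
    FROM t
) ranked
WHERE seqnum = 1
ORDER BY name
"""
best = conn.execute(query).fetchall()
print(best)
# [('Jack', 10, '2024-01-02', 5, 5, 10), ('John', 15, '2024-01-01', 10, 10, 20)]
```

John's row with 15 transactions wins even though it is the older one, because transactions comes first in the ORDER BY.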

SQL Join from 2 Tables with Null Values

I have 2 tables and want to pull the results back from them into one.
Now the Name field is a unique ID with multiple data attached to it, i.e. the dates and the times. I've simplified the data somewhat to post here but this is the general gist.
Table 1
Name Date
John 12th
John 13th
John 15th
John 17th
Table 2
Name Colour
John Red
John Blue
John Orange
John Green
Result Needed
Name Date Time
John 12th NULL
John 13th NULL
John 15th NULL
John 17th NULL
John NULL Red
John NULL Blue
John NULL Orange
John NULL Green
I'm currently performing a left join to pull the data; however, it puts the results next to each other, like:
John 12th Red
You want union all:
select name, date, null as colour
from t1
union all
select name, null, colour
from t2;
I took the liberty of naming the second column colour rather than time, simply because that makes more sense in the context of the question.
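The stacking behaviour of union all can be seen with the sample data. This is a minimal sketch using an in-memory SQLite database from Python; the table names t1 and t2 follow the answer, and the TEXT column types are an assumption.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t1 (name TEXT, date TEXT)")
conn.execute("CREATE TABLE t2 (name TEXT, colour TEXT)")
conn.executemany(
    "INSERT INTO t1 VALUES (?, ?)",
    [("John", "12th"), ("John", "13th"), ("John", "15th"), ("John", "17th")],
)
conn.executemany(
    "INSERT INTO t2 VALUES (?, ?)",
    [("John", "Red"), ("John", "Blue"), ("John", "Orange"), ("John", "Green")],
)

# Each branch pads the column it does not have with NULL, then the rows are stacked.
query = """
SELECT name, date, NULL AS colour FROM t1
UNION ALL
SELECT name, NULL, colour FROM t2
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)  # e.g. ('John', '12th', None) or ('John', None, 'Red')
```

The result has eight rows, one per source row, and every row has exactly one of date and colour filled in. A join, by contrast, pairs rows from the two tables side by side.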

Group by in T-SQL for selecting different columns

I have the following table, ContactDetails. It contains both cell phone numbers and email addresses. Rows are added as a user's contact details change, so a user (here Userid 1) can have multiple rows for each of Email and Cell, as shown below. There can be multiple users (2, 3, 4, and so on).
Rows are as below
SrNo Userid ContactType ContactDetail LoadDate
1 1 Email x1.y#gmail.com 2013-01-01
2 1 Cell 12345678 2013-01-01
3 1 Email x2.y#gmail.com 2012-01-01
4 1 Cell 98765432 2012-01-01
5 1 Email x2.y#gmail.com 2011-01-01
6 1 Cell 987654321 2011-01-01
I am looking for recent Email and Cell details of users. I tried running the query as below
Select
Userid,
Max(ContactDetail),
MAX(LoadDate)
from
ContactDetails
group by
Userid, ContactType;
But I understand that this won't work.
Can anyone give some suggestion to pull the latest email and cell in single or sub-queries?
Cheers!
Junni
You can use ROW_NUMBER() to select the most recent row of interest:
;With Ordered as (
    select UserId,ContactType,ContactDetail,LoadDate,
           ROW_NUMBER() OVER (
               PARTITION BY UserID,ContactType
               ORDER BY LoadDate DESC) as rn
    from ContactDetails
)
select * from Ordered where rn = 1
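Here is a minimal sketch of the same query run against the sample rows, using an in-memory SQLite database from Python rather than SQL Server. That substitution is an assumption made so the example is self-contained; the leading semicolon, a T-SQL habit, is dropped, and the column types are guesses.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE ContactDetails ("
    "SrNo INTEGER PRIMARY KEY, UserId INTEGER, ContactType TEXT, "
    "ContactDetail TEXT, LoadDate TEXT)"
)
conn.executemany(
    "INSERT INTO ContactDetails VALUES (?, ?, ?, ?, ?)",
    [
        (1, 1, "Email", "x1.y#gmail.com", "2013-01-01"),
        (2, 1, "Cell", "12345678", "2013-01-01"),
        (3, 1, "Email", "x2.y#gmail.com", "2012-01-01"),
        (4, 1, "Cell", "98765432", "2012-01-01"),
        (5, 1, "Email", "x2.y#gmail.com", "2011-01-01"),
        (6, 1, "Cell", "987654321", "2011-01-01"),
    ],
)

# Number the rows per (user, contact type), newest first, and keep row 1 of each group.
query = """
WITH Ordered AS (
    SELECT UserId, ContactType, ContactDetail, LoadDate,
           ROW_NUMBER() OVER (
               PARTITION BY UserId, ContactType
               ORDER BY LoadDate DESC
           ) AS rn
    FROM ContactDetails
)
SELECT UserId, ContactType, ContactDetail, LoadDate
FROM Ordered
WHERE rn = 1
ORDER BY UserId, ContactType
"""
latest = conn.execute(query).fetchall()
print(latest)
# [(1, 'Cell', '12345678', '2013-01-01'), (1, 'Email', 'x1.y#gmail.com', '2013-01-01')]
```

Unlike Max(ContactDetail) in the original attempt, this returns the detail that belongs to the latest LoadDate rather than the alphabetically largest value.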

SELECT DISTINCT HAVING Count unique conditions

I've searched for an answer on this but can't find quite how to get this distinct recordset based on a condition. I have a table with the following sample data:
Type Color Location Supplier
---- ----- -------- --------
Apple Green New York ABC
Apple Green New York XYZ
Apple Green Los Angeles ABC
Apple Red Chicago ABC
Apple Red Chicago XYZ
Apple Red Chicago DEF
Banana Yellow Miami ABC
Banana Yellow Miami DEF
Banana Yellow Miami XYZ
Banana Yellow Atlanta ABC
I'd like to create a query that shows the count of unique locations for each distinct Type+Color where the number of unique locations is more than 1, e.g.
Type Color UniqueLocations
---- ----- --------
Apple Green 2
Banana Yellow 2
Note that {Apple, Red, 1} doesn't appear because there is only 1 location for red apples (Chicago). I think I've got this one (but perhaps there is a simpler method). I'm using:
SELECT Type, Color, Count(Location) FROM
(SELECT DISTINCT Type, Color, Location FROM MyTable)
GROUP BY Type, Color HAVING Count(Location)>1;
How can I create another query that lists the Type, Color, and Location for each distinct Type,Color when the count of unique locations for that Type,Color is greater than 1? The resulting recordset would look like:
Type Color Location
---- ----- --------
Apple Green New York
Apple Green Los Angeles
Banana Yellow Miami
Banana Yellow Atlanta
Note that Apple, Red, Chicago doesn't appear because there is only 1 location for red apples. Thanks!
Use COUNT(DISTINCT Location) and join against a subquery on Type and Color. The GROUP BY and HAVING clauses, as you have attempted to use them, will do the job.
/* Be sure to use DISTINCT in the outer query to de-dup */
SELECT DISTINCT
    MyTable.Type,
    MyTable.Color,
    Location
FROM
    MyTable
    INNER JOIN (
        /* Joined subquery returns type,color pairs having COUNT(DISTINCT Location) > 1 */
        SELECT
            Type,
            Color,
            /* Don't actually need to select this value - it could just be in the HAVING */
            COUNT(DISTINCT Location) AS UniqueLocations
        FROM
            MyTable
        GROUP BY Type, Color
        /* Note: Some RDBMS won't allow the alias here and you
           would have to use the expanded form
           HAVING COUNT(DISTINCT Location) > 1
        */
        HAVING UniqueLocations > 1
    /* JOIN back against the main table on Type, Color */
    ) subq ON MyTable.Type = subq.Type AND MyTable.Color = subq.Color
You could write your first query as this:
Select Type, Color, Count(Distinct Location) As UniqueLocations
From Table
Group By Type, Color
Having Count(Distinct Location) > 1
If you're using MySQL, you can use the alias UniqueLocations in the HAVING clause. On many other systems the alias is not yet available at that point, because the HAVING clause is evaluated before the SELECT clause, so you have to repeat the COUNT expression in both clauses.
For the second one, there are many different ways to write it; this is one:
Select Distinct Type, Color, Location
From Table
Where
    Exists (
        Select
            *
        From
            Table Table_1
        Where
            Table_1.Type = Table.Type
            and Table_1.Color = Table.Color
        Group By
            Type, Color
        Having
            Count(Distinct Location) > 1
    )
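Both queries can be tried against the sample data. This is a minimal sketch using an in-memory SQLite database from Python; the table name MyTable comes from the question, while the aliases m and m1 and the added ORDER BY clauses (only there to make the output order predictable) are assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE MyTable (Type TEXT, Color TEXT, Location TEXT, Supplier TEXT)")
conn.executemany(
    "INSERT INTO MyTable VALUES (?, ?, ?, ?)",
    [
        ("Apple", "Green", "New York", "ABC"),
        ("Apple", "Green", "New York", "XYZ"),
        ("Apple", "Green", "Los Angeles", "ABC"),
        ("Apple", "Red", "Chicago", "ABC"),
        ("Apple", "Red", "Chicago", "XYZ"),
        ("Apple", "Red", "Chicago", "DEF"),
        ("Banana", "Yellow", "Miami", "ABC"),
        ("Banana", "Yellow", "Miami", "DEF"),
        ("Banana", "Yellow", "Miami", "XYZ"),
        ("Banana", "Yellow", "Atlanta", "ABC"),
    ],
)

# First query: Type/Color pairs with more than one distinct location.
counts = conn.execute(
    """
    SELECT Type, Color, COUNT(DISTINCT Location) AS UniqueLocations
    FROM MyTable
    GROUP BY Type, Color
    HAVING COUNT(DISTINCT Location) > 1
    ORDER BY Type, Color
    """
).fetchall()

# Second query: the distinct locations of exactly those pairs, via a correlated EXISTS.
locations = conn.execute(
    """
    SELECT DISTINCT Type, Color, Location
    FROM MyTable AS m
    WHERE EXISTS (
        SELECT 1
        FROM MyTable AS m1
        WHERE m1.Type = m.Type AND m1.Color = m.Color
        GROUP BY m1.Type, m1.Color
        HAVING COUNT(DISTINCT m1.Location) > 1
    )
    ORDER BY Type, Color, Location
    """
).fetchall()

print(counts)     # [('Apple', 'Green', 2), ('Banana', 'Yellow', 2)]
print(locations)  # Apple/Green and Banana/Yellow locations; no red apples
```

Red apples drop out of both results because all three of their rows share the single location Chicago.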