INNER JOIN on CTE (Common Table Expression) Without PK - sql

I have a CTE in which I am finding duplicate records matching on 5 columns:
;WITH DuplicateCount AS
(
SELECT
FirstName,
LastName,
DateofBirth,
Email,
c1.Status,
Count(*) AS TotalCount
FROM Customer c
INNER JOIN Customer_1 c1 ON c1.customerID = c.customerID
GROUP BY FirstName, LastName, DateofBirth, Email, c1.Status
HAVING COUNT(*) > 1
)
I then select Status and TotalCount from that CTE and join an Enum table to produce readable data:
;WITH DuplicateCount AS
(
SELECT
FirstName,
LastName,
DateofBirth,
Email,
c1.Status,
Count(*) AS TotalCount
FROM Customer c
INNER JOIN Customer_1 c1 ON c1.customerID = c.customerID
GROUP BY FirstName, LastName, DateofBirth, Email, c1.Status
HAVING COUNT(*) > 1
)
SELECT e.Display, dc.TotalCount
FROM DuplicateCount dc
INNER JOIN Enum e ON dc.Status = e.[Index]
In this scenario, I am able to pull back readable data and use Excel to spit out a graph report of duplicates by Status.
Problem
I need to join the Customer_1 table once again to gather one more column: Stage. Here is how I tried to do it:
;WITH DuplicateCount AS
(
SELECT customerID,
FirstName,
LastName,
DateofBirth,
Email,
c1.Status,
Count(*) AS TotalCount
FROM Customer c
INNER JOIN Customer_1 c1 ON c1.customerID = c.customerID
GROUP BY customerID, FirstName, LastName, DateofBirth, Email, c1.Status
HAVING COUNT(*) > 1
)
SELECT e.Display,
CASE
WHEN c1.Stage = 6 THEN 'First'
WHEN c1.Stage = 7 THEN 'Second'
WHEN c1.Stage = 8 THEN 'Third'
WHEN c1.Stage = 11 THEN 'Fourth'
WHEN c1.Stage = 9 THEN 'Fifth'
WHEN c1.Stage = 10 THEN 'Sixth'
WHEN c1.Stage = 12 THEN 'Unknown'
ELSE ''
END AS Stage,
dc.TotalCount
FROM DuplicateCount dc
INNER JOIN Enum e ON dc.Status = e.Index
INNER JOIN Customer_1 c1 ON c1.customerID = dc.customerID
Obviously, that didn't work because none of my records will have duplicate PKs.
Is there a way to join a table to my CTE without a PK? Or somehow add a PK to my CTE without grouping by it?
Edit: This is what I am trying to achieve
| FirstName | LastName | Stage  | Total Count |
| John      | Smith    | First  | 2           |
| John      | Smith    | Third  | 2           |
| Alex      | Smith    | First  | 2           |
| Jane      | Smith    | Third  | 2           |
| Jane      | Smith    | First  | 2           |
| Jack      | Smith    | Second | 2           |
Then, when reporting on this data:
John Smith has 4 total records. Two in First, two in Third
Alex Smith has 2 total records. Two in First
Jane Smith has 4 total records. Two in First and two in Third
Jack Smith has 2 total records. Two in Second.
When graphing this data, I should be able to see:
First: 6 total.
Second: 2 total.
Third: 4 total.
Ideally, I could then also bring in CreatedDate and begin to gather data-over-time reports for:
How many duplicates per Stage.
How many duplicates per Person.
How many duplicates for specific date ranges, events, etc.

The cardinalities of the two data sets don't match. The first set, which identifies the duplicates, is aggregated across a number of customers and no longer identifies any individual customer. You can't then take the separate customer IDs and attribute them back to the aggregated rows.
I think you need to reframe what you are trying to get out of your data and work backwards. Post an example set of the results you are trying to achieve.
UPDATE:
It seems you want a list of customer/stage groups with counts:
SELECT c.customerID,
FirstName,
LastName,
DateofBirth,
Email,
c1.Status,
CASE
WHEN c1.Stage = 6 THEN 'First'
WHEN c1.Stage = 7 THEN 'Second'
WHEN c1.Stage = 8 THEN 'Third'
WHEN c1.Stage = 11 THEN 'Fourth'
WHEN c1.Stage = 9 THEN 'Fifth'
WHEN c1.Stage = 10 THEN 'Sixth'
WHEN c1.Stage = 12 THEN 'Unknown'
ELSE ''
END AS Stage,
Count(*) AS TotalCount
FROM Customer c
INNER JOIN Customer_1 c1 ON c1.customerID = c.customerID
-- customerID exists in both tables, so it must be qualified to avoid an ambiguous column error
GROUP BY c.customerID, FirstName, LastName, DateofBirth, Email, c1.Status, c1.Stage
HAVING COUNT(*) > 1
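If the goal is the table from the question's edit (name, Stage, count), one option is to keep customerID out of the grouping, since grouping by the key puts each customer in its own group. The following is a minimal sketch rather than a tested solution: a windowed COUNT flags rows that share the five matching columns without collapsing them, so Stage and customerID stay available on every row. It assumes FirstName, LastName, DateofBirth, and Email live in Customer and that Status and Stage live in Customer_1, which the question implies but does not state. The names Flagged and DuplicateCount are made up for illustration.

```sql
;WITH Flagged AS
(
    SELECT c.customerID,
           c.FirstName,
           c.LastName,
           c.DateofBirth,
           c.Email,
           c1.Stage,
           -- Count rows that share the five matching columns, keeping every row
           COUNT(*) OVER (PARTITION BY c.FirstName, c.LastName, c.DateofBirth,
                                       c.Email, c1.Status) AS DuplicateCount
    FROM Customer c
    INNER JOIN Customer_1 c1 ON c1.customerID = c.customerID
)
SELECT FirstName, LastName, DateofBirth, Email, Stage,
       COUNT(*) AS TotalCount
FROM Flagged
WHERE DuplicateCount > 1   -- keep only rows that belong to a duplicate set
GROUP BY FirstName, LastName, DateofBirth, Email, Stage;
```

Note that TotalCount here is the number of duplicate rows per person and Stage, so a person whose two duplicates sit in different stages would show 1 for each stage.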

Related

SQL Server: finding duplicates between two parameters in different tables

I am trying to find duplicates in SQL Server where customers with the same forename, surname, and mobile number match. The thing is they are in different tables.
custid  forename  surname  dateofbirth
------  --------  -------  -----------
1       David     John     16-09-1985
2       David     Jon      16-09-1985
3       Sarah     Smith    10-08-2015
4       Peter     Proca    11-06-2011
5       Peter     Broca    11-06-2011

addid  custid  line1
-----  ------  ----------
1      1       0504135846
2      2       0504135846
3      3       0506523145
4      4       0503698521
5      5       0503698521
I am currently able to find duplicates by forename and surname, but if I want to find based on mobile numbers how can I bring it in?
select c.*
from
(select
c.*,
count(*) over (partition by left(surname, 3)) as cnt
from
customers c) c
order by
surname;
Use join:
select c.*
from (select c.*, t2.line1,
count(*) over (partition by surname, forename, line1) as cnt
from customers c join
table2 t2
on t2.custid = c.custid
) c
order by surname;
Here you go, just JOIN on the table. Using HAVING might simplify your query as well:
SELECT COUNT(*), c.forename, c.surname, mn.line1
FROM customers c
INNER JOIN mobilenumber mn ON c.custid=mn.custid
GROUP BY c.forename, c.surname, mn.line1
HAVING COUNT(*)>1
Also, you might need a LEFT JOIN if there is a chance that some customers won't have a row in the mobilenumber table.
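To list the duplicate customer rows themselves rather than one row per group, the LEFT JOIN suggestion can be combined with a windowed count. This is a sketch under assumptions: it reuses the customers and mobilenumber table names from the query above, and the alias d and the column cnt are only illustrative.

```sql
SELECT d.custid, d.forename, d.surname, d.line1
FROM (
    SELECT c.custid, c.forename, c.surname, mn.line1,
           -- Number of rows sharing the same name and mobile number
           COUNT(*) OVER (PARTITION BY c.forename, c.surname, mn.line1) AS cnt
    FROM customers c
    LEFT JOIN mobilenumber mn ON mn.custid = c.custid
) d
WHERE d.cnt > 1
ORDER BY d.surname, d.forename;
```

With the LEFT JOIN, customers that have no mobile number get a NULL line1, and PARTITION BY places NULLs in the same partition, so two customers with the same name and no number would also be reported. Add `AND d.line1 IS NOT NULL` if that is not wanted.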

How to remove duplicate columns from join in SQL

I have the following code
SELECT *
FROM customer
INNER JOIN
(SELECT
customerid, newspapername, enddate, n.publishedby
FROM
newspapersubscription ns, newspaper n
WHERE
publishedby IN (SELECT publishedby
FROM newspaper
WHERE ns.newspapername = n.NewspaperName)
UNION
SELECT
customerid, Magazinename, enddate, m.publishedby
FROM
magazinesubscription ms, magazine m
WHERE
publishedby IN (SELECT publishedby
FROM magazine
WHERE ms.Magazinename = m.MagazineName)) ON customer.customerid = customerid
ORDER BY
customer.customerid;
The customer table has the following:
customerid | customername | customersaddress
This query returns the following result:
customerid | customername | customersaddress | customerid | newspapername | enddate| publishedby
What I actually want is
customerid | customername | customersaddress | newspapername | magazinename | enddate| publishedby
Here, the newspapername field should be blank if the magazinename is present and vice versa. Also, the duplicate field of customerid from the union operations should not be present, while in my result, the value of both the newspapername and the magazinename are put under newspapername title.
How can I do that?
Since you are querying the table with '*', you will always get all the columns in both tables. In order to omit this column, you will have to manually name all columns you DO want to query. To address your other need, you need to simply insert a dummy column to each clause in the union query. Below is an example that should work to allow for what you want -
SELECT customer.customerid, customer.customername, customer.customersaddress, sub.newspapername, sub.magazinename, sub.enddate, sub.publishedby
FROM customer
INNER JOIN
(select customerid, newspapername, null Magazinename, enddate, n.publishedby
from newspapersubscription ns, newspaper n
where publishedby in(select publishedby
from newspaper
where ns.newspapername = n.NewspaperName)
UNION
select customerid, null newspapername, Magazinename, enddate, m.publishedby
from magazinesubscription ms, magazine m
where publishedby in(select publishedby
from magazine
where ms.Magazinename = m.MagazineName)) sub
-- the alias lets the join condition name the subquery's customerid unambiguously
on customer.customerid = sub.customerid
ORDER BY customer.customerid;
To get the projection you want, build sub-queries of the right shape and UNION them to get the result set. UNION ALL is better than UNION here because it skips the duplicate-elimination step (typically a sort): the two branches can't produce the same row, because each one leaves a different name column null.
select * from (
select customer.*
, n.newspapername
, null as magazinename
, ns.enddate
, n.publishedby
from customer
join newspapersubscription ns
on ns.customerid = customer.customerid
join newspaper n
on n.newspapername = ns.newspapername
union all
select customer.*
, null as newspapername
, m.magazinename
, ms.enddate
, m.publishedby
from customer
join magazinesubscription ms
on ms.customerid = customer.customerid
join magazine m
on m.magazinename = ms.magazinename
)
order by customerid, newspapername nulls last, magazinename ;
Here is the output from my toy data set (which lacks publishedby columns):
CUSTOMERID CUSTOMERNAME         NEWSPAPERNAME          MAGAZINENAME           ENDDATE
---------- -------------------- ---------------------- ---------------------- ---------
        10 DAISY-HEAD MAISIE    THE DAILY BUGLE                               30-SEP-17
        30 FOX-IN-SOCKS         THE DAILY BUGLE                               30-SEP-17
        30 FOX-IN-SOCKS         THE WHOVILLE TIMES                            30-SEP-16
        30 FOX-IN-SOCKS                                GREEN NEWS             31-DEC-17
        30 FOX-IN-SOCKS                                TWEETLE BEETLE MONTHLY 31-DEC-16
        40 THE LORAX                                   GREEN NEWS             31-DEC-18
6 rows selected.
SQL>

Aggregate Functions To Pull More Record Field Data

I would like to know what would be the best way to get the data from a specific row when I use a Group By query. The real query is more complex than the example I'm providing here so I'm looking for something other than a sub-select on the Sales table.
I'm using MSSQL 2008 and I would like something that allows me to get the date field from the Sales record that has the max(amount).
Query
select uid, firstName, lastName, AmountFromTagetedRow, DateFromTargetedRow,
from users u inner join
sales s on u.uid = s.custID
group by uid, firstName, lastName
order by uid
USERS
uid  firstName  lastName
1    Sam        Smith
2    Joe        Brown
3    Kim        Young
SALES
sid  Custid  date        amount  ...
1    1       2016-01-02  100
2    3       2016-01-12  485
3    1       2016-01-22  152
4    2       2016-02-01  156
5    1       2016-02-02  12
6    1       2016-03-05  84
7    2       2016-03-10  68
RESULTS
uid  firstName  LastName  amount  date
1    Sam        Smith     152     2016-01-22
2    Joe        Brown     156     2016-02-01
3    Kim        Young     485     2016-01-12
Your posted query doesn't match your expected results, but something like this should point you in the right direction.
with SortedResults as
(
select u.uid
, u.firstName
, u.lastName
, s.amount
, s.[date]
-- number each user's sales from the largest amount to the smallest
, ROW_NUMBER() over (partition by u.uid order by s.amount desc) as RowNum
from users u inner join
sales s on u.uid = s.custID
)
-- no GROUP BY is needed: RowNum = 1 keeps the single top sale per user
select uid, firstName, lastName, amount, [date]
from SortedResults
where RowNum = 1
order by uid
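Another way to pull fields from the row with the largest amount is CROSS APPLY with TOP (1), which is available in SQL Server 2008. This is only a sketch, assuming the users and sales tables and the amount and date columns shown in the question; the alias top_sale is made up for illustration.

```sql
select u.uid, u.firstName, u.lastName, top_sale.amount, top_sale.[date]
from users u
cross apply (
    -- for each user, fetch only the sale with the largest amount
    select top (1) s.amount, s.[date]
    from sales s
    where s.custID = u.uid
    order by s.amount desc
) top_sale
order by u.uid;
```

CROSS APPLY drops users that have no sales at all; OUTER APPLY would keep them with NULL amount and date. If two sales tie for the largest amount, TOP (1) returns an arbitrary one of them unless the ORDER BY is extended with a tie-breaker.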

select only one row that has the highest count in sql

I need to select one row only which has the highest count. How do I do that?
This is my current code:
select firstname, lastname, count(*) as total
from trans
join work
on trans.workid = work.workid
join artist
on work.artistid = artist.artistid
where datesold is not null
group by firstname, lastname;
Example current:
FIRSTNAME | LASTNAME | TOTAL
------------------------------
Tom | Cruise | 3
Angelina | Jolie | 9
Britney | Spears | 5
Ellie | Goulding | 4
I need it to select only this:
FIRSTNAME | LASTNAME | TOTAL
--------------------------------
Angelina | Jolie | 9
You can add ORDER BY total DESC and FETCH FIRST 1 ROW ONLY (available from Oracle 12c R1; on older versions, wrap your query in a subquery and filter on rownum = 1 in the outer WHERE clause). This is enough if no two groups can have the same total. The other way is to add this HAVING clause, which lists everyone who has the maximum total:
having count(*) = (select max(total) from (select count(*) as total from <your_query>) tmp)
or that:
having count(*) = (select count(*) as total from <your_query> order by total desc fetch first 1 row only)
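Written out in full against the tables from the question, the first HAVING variant could look like the sketch below. It repeats the grouped query inside the subquery, which is the part the placeholder stands for; the alias tmp is arbitrary.

```sql
select firstname, lastname, count(*) as total
from trans
join work on trans.workid = work.workid
join artist on work.artistid = artist.artistid
where datesold is not null
group by firstname, lastname
having count(*) = (
    -- the highest per-artist count across all groups
    select max(total)
    from (
        select count(*) as total
        from trans
        join work on trans.workid = work.workid
        join artist on work.artistid = artist.artistid
        where datesold is not null
        group by firstname, lastname
    ) tmp
);
```

Unlike a one-row limit, this returns every artist tied for the highest count.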
In Oracle 12, you can do:
select firstname, lastname, count(*) as total
from trans join
work
on trans.workid = work.workid join
artist
on work.artistid = artist.artistid
where datesold is not null
group by firstname, lastname
order by count(*) desc
fetch first 1 row only;
In older versions, you can do this with a subquery:
select twa.*
from (select firstname, lastname, count(*) as total
from trans join
work
on trans.workid = work.workid join
artist
on work.artistid = artist.artistid
where datesold is not null
group by firstname, lastname
order by count(*) desc
) twa
where rownum = 1;
This will work in SQL Server 2012:
with CTECount (firstname, lastname,total)
as
(
select firstname, lastname, count(1) as total
from trans
join work
on trans.workid = work.workid
join artist
on work.artistid = artist.artistid
where datesold is not null
group by firstname, lastname
)
select top (1) with ties * from CTECount
order by total desc
Thanks
You could do it like this, which is very simple:
select TOP 1 firstname, lastname, count(*) as total from trans
join work on trans.workid=work.workid
join artist on work.artistid=artist.artistid
where datesold is not null
group by firstname, lastname
Order By Total DESC;

Specific Ordering in SQL

I have a SQL Server 2008 database. In this database, I have a result set that looks like the following:
ID Name Department LastOrderDate
-- ---- ---------- -------------
1 Golf Balls Sports 01/01/2015
2 Compact Disc Electronics 02/01/2015
3 Tires Automotive 01/15/2015
4 T-Shirt Clothing 01/10/2015
5 DVD Electronics 01/07/2015
6 Tennis Balls Sports 01/09/2015
7 Sweatshirt Clothing 01/04/2015
...
For some reason, my users want to get the results ordered by department, then last order date. However, not by department name. Instead, the departments will be in a specific order. For example, they want to see the results ordered by Electronics, Automotive, Sports, then Clothing. To throw another kink in the works, I cannot update the table schema.
Is there a way to do this with a SQL Query? If so, how? Currently, I'm stuck at
SELECT *
FROM
vOrders o
ORDER BY
o.LastOrderDate
Thank you!
You can use a CASE expression:
order by case when department = 'Electronics' then 1
when department = 'Automotive' then 2
when department = 'Sports' then 3
when department = 'Clothing' then 4
else 5 end
Create a table for the departments that has the name (or, better, the id) of the department and the display order. Then join to that table and order by the display order column.
Alternatively, you can order by a CASE expression:
ORDER BY CASE WHEN Department = 'Electronics' THEN 1
WHEN Department = 'Automotive' THEN 2
...
END
(This is not recommended for larger tables.)
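Since the question rules out schema changes, the lookup table idea can be sketched with a table variable instead of a permanent table. The name @DepartmentOrder, its column names, and the varchar(50) length are assumptions for illustration only.

```sql
-- Hypothetical lookup of departments and their display order
DECLARE @DepartmentOrder TABLE
(
    Department   varchar(50) PRIMARY KEY,
    DisplayOrder int NOT NULL
);

INSERT INTO @DepartmentOrder (Department, DisplayOrder)
VALUES ('Electronics', 1), ('Automotive', 2), ('Sports', 3), ('Clothing', 4);

SELECT o.*
FROM vOrders o
LEFT JOIN @DepartmentOrder d ON d.Department = o.Department
ORDER BY CASE WHEN d.DisplayOrder IS NULL THEN 1 ELSE 0 END,  -- unlisted departments last
         d.DisplayOrder,
         o.LastOrderDate;
```

The LEFT JOIN keeps rows whose department is not in the lookup and sorts them after the listed ones; an INNER JOIN would drop them instead.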
Here is a solution with a CTE:
with c (iOrder, dept)
as (
Select 1, 'Electronics'
Union all
Select 2, 'Automotive'
Union all
Select 3, 'Sports'
Union all
Select 4, 'Clothing'
)
-- a CTE is only visible to the single statement that follows it
SELECT o.*
FROM
vOrders o join c
on c.dept = o.Department
ORDER BY
c.iOrder, o.LastOrderDate