Find Duplicates using Derived Tables Only - sql

cte can be a memory hog sometimes. The following SQL works great until there are memory issues with other databases.
Any ideas how to reproduce Row_Number Over Partition using derived tables.
Table A holds the Work phone.
Table B holds the Id.
We have to join Table A with Table B in order to find duplicates in Table B; using the phone as the duplicate criteria.
This SQL works. I just want to see suggestions using derived tables instead.
;WITH cte
AS (SELECT Row_number() OVER (PARTITION BY a.WorkPhone ORDER BY b.id DESC ) AS
rownumber
,
a.WorkPhone,
a.id
FROM TableB B
JOIN TableA a
ON b.GroupofLeadsid = a.id
WHERE b.GroupofLeads = #GroupofLeads
AND NOT a.WorkPhone IS NULL
AND a.WorkPhone <> '')
UPDATE b
SET b.deleteflag = 1
FROM TableB b
JOIN cte t
ON b.id = t.id
WHERE b.GroupofLeads = #GroupofLeads
AND rownumber > 1

Update TableB
Set deleteflag = 1
From TableB As B
Join (
Select Row_Number() Over ( Partition By A.WorkPhone Order By B.Id Desc ) As RowNum
, A.WorkPhone, A.Id
, B.Id
From TableB As B1
Join TableA As A1
On A1.id = B1.GroupOfLeadsId
Where B1.GroupOfLeadsId = #GroupOfLeads
And A.WorkPhone <> ''
) As CTE
On CTE.Id = B.Id
Where CTE.RowNum > 1
Another choice:
Update TableB
Set deleteflag = 1
Where Exists (
Select 1
From (
Select Row_Number() Over ( Partition By A.WorkPhone Order By B.Id Desc ) As RowNum
, B.Id
From TableB As B1
Join TableA As A1
On A1.id = B1.GroupOfLeadsId
Where B1.GroupOfLeadsId = #GroupOfLeads
And A.WorkPhone <> ''
) As CTE
Where CTE.RowNum > 1
And CTE.Id = TableB.Id
)

Related

Removing duplicate values from a column in SQL

I have two tables A (group_id, id, subject) and B (id, date). Below is the joint table of tables A and B on id. I have tried using distinct and partition to remove the duplicates in group_id(field) only, but no luck:
My code:
select
a.group_id, a.id, a.subject, b.date
from
A a
inner join
(select
b.*,
row_number() over (partition by group_id order by date asc) as seqnum
from
B b) b on a.id = b.id and seqnum = 1
order by
date desc;
I got this error when I ran the code:
Partitioning can not be used stand-alone in query near 'partition by group_id order by date asc) as seqnum from B' at line 1
This is my expected result:
Thank you in advance!
It looks like you want the earliest date for each row in the table you show. Your question mentions two tables, but you only show one.
I recommend a correlated subquery in most databases:
select b.*
from b
where b.date = (select min(b2.date)
from b b2
where b2.group_id = b.group_id
);
I see. You need to join first and then use row_number():
select ab.*
from (select a.group_id, a.id, a.subject, b.date,
row_number() over (partition by a.group_id order by b.date) as seqnum
from A a join
B b
on a.id = b.id
) ab
where seqnum = 1
order by date desc;
You are almost there. But the column that you try to use to partition (ie group_id) comes from table a, which is not available in the subquery.
You would need to JOIN and assign the row number in a subquery, and then filter in the outer query.
select *
from (
select
a.group_id,
a.id,
a.subject,
b.date,
row_number() over (partition by a.group_id order by b.date asc) as seqnum
from a
inner join b on ON a.id = b.id
)
where seqnum = 1
ORDER BY date desc;
Another way to achieve your goal though it may not be the efficient one
SELECT
A.group_id, A.id, B.Date, A.subject
FROM A
INNER JOIN B
ON A.Id = B.Id
INNER JOIN
(
SELECT
A.Group_id, MIN(B.Date) AS Date
FROM A
INNER JOIN B
ON A.Id = B.Id
GROUP BY A.group_id
) AS supportTable
ON A.group_id = supportTable.group_id
AND B.Date = supportTable.Date

SQL duplicate values in multiple columns

I have a table with duplicate ID's, but other values in the second column.
Instead of removing all the duplicates with DISTINCT, I need 1 row with the ID and several columns with the values from the second column.
Here is what I mean: (has to become result)
Okay I've managed to fix it with this:
WITH cte AS (SELECT top 1000 *, ROW_NUMBER()OVER(PARTITION BY
id ORDER BY id) as RN FROM dbo.books) SELECT top
1000 a.id, a.category
, b.category as category2
, c.category as category3
, d.category as category4
from cte a
LEFT join cte b
on a.id = b.id
and a.RN = b.RN -1
LEFT JOIN cte c
ON a.id = c.id
AND a.RN = c.RN -2
LEFT JOIN cte d
ON a.id = d.id
AND a.RN = d.RN -3
WHERE a.RN = 1

SQL getting the last item by date

I have a query that I had working on one item, then I wiped the dataset and started over and now I can't get it to pull any data at all.
The query is basically:
SELECT *
FROM TABLEA
LEFT JOIN TABLEB
ON TABLEA.ID = TABLEB.ID
WHERE TABLEA.ID = 1
AND TABLEB.DATE = (SELECT MAX(TABLEB.DATE)
FROM TABLEB
WHERE TABLEA.ID = 1)
TableB tracks changes and has hundreds of entries per ID. I want a single row of the last chronological item pertaining to that ID.
If there is a better way to do this then awesome but I'd really like to know why this specific query isn't working. When I run this query:
SELECT MAX(TABLEB.DATE)
FROM TABLEB
LEFT JOIN TABLEB
ON TABLEA.ID = TABLEB.ID
WHERE TABLEA.ID = 1
I get the proper value of the last date in the dataset.
select *
from tableA as a
left outer join tableB as b on b.ID = a.ID
where
b.Date = (select max(t.Date) from tableB as t WHERE t.ID = a.id)
-- and a.ID = 1 if you need it
if you just need date from tableB, you can do
select *
from tableA as a
left outer join (
select max(t.Date) as Date, t.ID from tableB as t group by t.ID
) as b on b.ID = a.ID
-- where a.ID = 1 if you need it
if you can use row_number function (you can change common table expression to subquery):
with cte as (
select *, row_number() over(partition by a.ID order by b.Date desc) as rn
from tableA as a
left outer join tableB as b on b.ID = a.ID
    -- where a.ID = 1 if you need it
)
select *
from cte
where rn = 1
if you're using SQL Server version >= 2005:
select *
from tableA as a
outer apply (
select top 1 t.*
from tableB as t
where t.ID = a.ID
order by t.Date desc
) as b
-- where a.ID = 1 if you need it
Please note using aliases in all subqueries.
About your initial query, I think you has an typo there:
SELECT *
FROM TABLEA
LEFT JOIN TABLEB
ON TABLEA.ID = TABLEB.ID
WHERE TABLEA.ID = 1
AND TABLEB.DATE = (SELECT MAX(TABLEB.DATE)
FROM TABLEB
WHERE TABLEA.ID = 1) -- <-- Do you mean TABLEB.ID = 1 ??
Here is one way:
SELECT *
FROM tableA LEFT JOIN
tableB
ON tableA.ID = tableB.ID
WHERE tableA.ID = 1
ORDER BY tableB.Date desc
LIMIT 1;

Left Join Multiple Tables and Avoid Duplicates

I have two tables with a 1:n relationship to my base table, both of which I want to LEFT JOIN.
-------------------------------
Table A Table B Table C
-------------------------------
|ID|DATA| |ID|DATA| |ID|DATA|
-------------------------------
1 A1 1 B1 1 C1
- - 1 C2
I'm using:
SELECT * FROM TableA a
LEFT JOIN TableB b
ON a.Id = b.Id
LEFT JOIN TableC c
ON a.Id = c.Id
But this is showing duplicates for TableB:
1 A1 B1 C1
1 A1 B1 C2
How can I write this join to ignore the duplicates? Such as:
1 A1 B1 C1
1 A1 null C2
I think you need to do logic to get what you want. You want for any multiple b.ids to eliminate them. You can identify them using row_number() and then use case logic to make subsequent values NULL:
select a.id, a.val,
(case when row_number() over (partition by b.id, b.seqnum order by b.id) = 1 then val
end) as bval
c.val as cval
from TableA a left join
(select b.*, row_number() over (partition by b.id order by b.id) as seqnum
from tableB b
) b
on a.id = b.id left join
tableC c
on a.id = c.id
I don't think you want a full join between B and C, because you will get multiple rows. If B has 2 rows for an id and C has 3, then you will get 6. I suspect that you just want 3. To achieve this, you want to do something like:
select *
from (select b.*, row_number() over (partition by b.id order by b.id) as seqnum
from TableB b
) b
on a.id = b.id full outer join
(select c.*, row_number() over (partition by c.id order by c.id) as seqnum
from TableC c
) c
on b.id = c.id and
b.seqnum = c.seqnum join
TableA a
on a.id = b.id and a.id = c.id
This is enumerating the "B" and "C" lists, and then joining them by position on the list. It uses a full outer join to get the full length of the longer list.
The last join references both tables so TableA can be used as a filter. Extra ids in B and C won't appear in the results.
Do you want to use distinct
SELECT distinct * FROM TableA a
LEFT JOIN TableB b
ON a.Id = b.Id
LEFT JOIN TableC c
ON a.Id = c.Id
Do it as a UNION, i.e.
SELECT TableA.ID, TableB.ID, TableC.Id
FROM TableA a
INNER JOIN TableB b ON a.Id = b.Id
LEFT JOIN TableC c ON a.Id = c.Id
UNION
SELECT TableA.ID, Null, TableC.Id
FROM TableA a
LEFT JOIN TableC c ON a.Id = c.Id
i.e. one SELECT to being back the first row and another to bring back the second row. It's a bit rough because I don't know anything about the data you are trying to read but the principle is sound. You may need to rework it a bit.

Does not Recognize Column in Where Clause when Joining Tables

SELECT * FROM a
JOIN (SELECT * FROM b WHERE b.aId = a.Id) AS c ON c.aId = a.Id
It says does not recognize: a.Id in the Where Clause.
I know its probably cause im using a temp table and a.Id cannot be passed through but is there any way we can do that?
Because here is what actually happens
SELECT *
FROM a
JOIN (SELECT * FROM b
WHERE b.aId = a.Id
ORDER BY b.dateReg DESC
LIMIT 1) AS c ON c.aId = a.Id
I need the ORDER BY b.dateReg DESC LIMIT 1 as it returns me the last row that assosiates with the a Table.. If you require i can post the Create Query
-- find last rows on b
select * from b x
where exists(
select id
from b y
where y.id = b.id
having max(y.dateReg) = x.dateReg
group by id
)
-- then join that b to a, this is the final query:
select * from a
join
(
select * from b x
where exists(
select id
from b y
where y.id = b.id
having max(y.dateReg) = x.dateReg
group by id
)
) as last_rows on last_rows.id = a.id
-- simpler:
select *
from a join b x on a.id = x.id
where exists(
select id
from b y
where y.id = b.id
having max(y.dateReg) = x.dateReg
group by id)
-- or if you will use postgres:
select DISTINCT ON (a.id) *
from a join b x on a.id = x.id
order by a.id, b.dateReg DESC
-- look ma! no group by!
-- nothing beats postgresql's simplicity :-)
Try:
SELECT DISTINCT *
FROM A
JOIN B b ON b.aid = a.id
JOIN (SELECT b.aid,
MAX(b.datereg) 'max_datereg'
FROM B b
GROUP BY b.aid) md ON md.aid = b.aid
AND md.max_datereg = b.datereg
If you do want the first record associated with the associate, use:
SELECT DISTINCT *
FROM A
JOIN B b ON b.aid = a.id
JOIN (SELECT b.aid,
MIN(b.datereg) 'min_datereg'
FROM B b
GROUP BY b.aid) md ON md.aid = b.aid
AND md.min_datereg = b.datereg