Find Duplicates using Derived Tables Only

A CTE can be a memory hog sometimes. The following SQL works great until there are memory issues with other databases.
Any ideas how to reproduce Row_Number() Over (Partition By ...) using derived tables?
Table A holds the work phone.
Table B holds the Id.
We have to join Table A with Table B in order to find duplicates in Table B, using the phone as the duplicate criterion.
This SQL works. I just want to see suggestions using derived tables instead.
;WITH cte
AS (SELECT Row_number() OVER (PARTITION BY a.WorkPhone ORDER BY b.id DESC) AS rownumber,
           a.WorkPhone,
           a.id
    FROM TableB B
    JOIN TableA a
      ON b.GroupofLeadsid = a.id
    WHERE b.GroupofLeads = #GroupofLeads
      AND NOT a.WorkPhone IS NULL
      AND a.WorkPhone <> '')
UPDATE b
SET b.deleteflag = 1
FROM TableB b
JOIN cte t
  ON b.id = t.id
WHERE b.GroupofLeads = #GroupofLeads
  AND rownumber > 1

Update B
Set deleteflag = 1
From TableB As B
Join (
    -- the aliases inside the derived table are A1 and B1; only one Id column is exposed
    Select Row_Number() Over ( Partition By A1.WorkPhone Order By B1.Id Desc ) As RowNum
        , A1.WorkPhone
        , B1.Id
    From TableB As B1
    Join TableA As A1
        On A1.Id = B1.GroupOfLeadsId
    -- filter on GroupOfLeads, as in the question's query
    Where B1.GroupOfLeads = #GroupOfLeads
        And A1.WorkPhone <> ''
    ) As CTE
    On CTE.Id = B.Id
Where CTE.RowNum > 1
Another choice:
Update TableB
Set deleteflag = 1
Where Exists (
    Select 1
    From (
        -- the aliases inside the derived table are A1 and B1
        Select Row_Number() Over ( Partition By A1.WorkPhone Order By B1.Id Desc ) As RowNum
            , B1.Id
        From TableB As B1
        Join TableA As A1
            On A1.Id = B1.GroupOfLeadsId
        -- filter on GroupOfLeads, as in the question's query
        Where B1.GroupOfLeads = #GroupOfLeads
            And A1.WorkPhone <> ''
        ) As CTE
    Where CTE.RowNum > 1
        And CTE.Id = TableB.Id
    )
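
If window functions are not an option at all, the same "keep the highest Id per phone, flag the rest" rule can be expressed with a grouped derived table. The following is only a sketch, run against an in-memory SQLite database through Python's sqlite3 module so it can be tried without a server. The sample rows and the value 7 standing in for #GroupofLeads are made up for illustration, and the syntax may need adjusting for SQL Server.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE TableA (id INTEGER PRIMARY KEY, WorkPhone TEXT);
    CREATE TABLE TableB (id INTEGER PRIMARY KEY, GroupofLeadsid INTEGER,
                         GroupofLeads INTEGER, deleteflag INTEGER DEFAULT 0);
    -- hypothetical sample data: two leads share a phone, one phone is blank
    INSERT INTO TableA VALUES (1, '555-0100'), (2, '555-0100'), (3, '555-0199'), (4, '');
    INSERT INTO TableB (id, GroupofLeadsid, GroupofLeads) VALUES
        (10, 1, 7), (11, 2, 7), (12, 3, 7), (13, 4, 7);
""")

group_of_leads = 7  # hypothetical value standing in for #GroupofLeads

# The derived table "keep" holds the highest TableB.id per phone.
# Every other row with the same phone is a duplicate and gets flagged,
# which matches ROW_NUMBER() ... ORDER BY b.id DESC with rownumber > 1.
conn.execute("""
    UPDATE TableB
    SET deleteflag = 1
    WHERE GroupofLeads = :g
      AND id IN (
        SELECT b1.id
        FROM TableB AS b1
        JOIN TableA AS a1 ON a1.id = b1.GroupofLeadsid
        JOIN (
            SELECT a2.WorkPhone, MAX(b2.id) AS keep_id
            FROM TableB AS b2
            JOIN TableA AS a2 ON a2.id = b2.GroupofLeadsid
            WHERE b2.GroupofLeads = :g AND a2.WorkPhone <> ''
            GROUP BY a2.WorkPhone
        ) AS keep ON keep.WorkPhone = a1.WorkPhone
        WHERE b1.GroupofLeads = :g AND b1.id < keep.keep_id
      )
""", {"g": group_of_leads})

flagged = [row[0] for row in conn.execute(
    "SELECT id FROM TableB WHERE deleteflag = 1 ORDER BY id")]
print(flagged)
```

Only the older of the two rows sharing a phone is flagged; the blank phone never appears in the derived table, so its row is left alone.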

Related

Removing duplicate values from a column in SQL

I have two tables, A (group_id, id, subject) and B (id, date). Below is the joined result of tables A and B on id. I have tried using DISTINCT and PARTITION BY to remove the duplicates in the group_id field only, but no luck:
My code:
select
a.group_id, a.id, a.subject, b.date
from
A a
inner join
(select
b.*,
row_number() over (partition by group_id order by date asc) as seqnum
from
B b) b on a.id = b.id and seqnum = 1
order by
date desc;
I got this error when I ran the code:
Partitioning can not be used stand-alone in query near 'partition by group_id order by date asc) as seqnum from B' at line 1
This is my expected result:
Thank you in advance!

It looks like you want the earliest date for each row in the table you show. Your question mentions two tables, but you only show one.
I recommend a correlated subquery in most databases:
select b.*
from b
where b.date = (select min(b2.date)
from b b2
where b2.group_id = b.group_id
);
I see. You need to join first and then use row_number():
select ab.*
from (select a.group_id, a.id, a.subject, b.date,
row_number() over (partition by a.group_id order by b.date) as seqnum
from A a join
B b
on a.id = b.id
) ab
where seqnum = 1
order by date desc;

You are almost there. But the column that you try to use to partition (ie group_id) comes from table a, which is not available in the subquery.
You would need to JOIN and assign the row number in a subquery, and then filter in the outer query.
select *
from (
    select
        a.group_id,
        a.id,
        a.subject,
        b.date,
        row_number() over (partition by a.group_id order by b.date asc) as seqnum
    from a
    inner join b on a.id = b.id
) ab  -- most databases require an alias on a derived table
where seqnum = 1
order by date desc;

Another way to achieve your goal, though it may not be the most efficient one:
SELECT
A.group_id, A.id, B.Date, A.subject
FROM A
INNER JOIN B
ON A.Id = B.Id
INNER JOIN
(
SELECT
A.Group_id, MIN(B.Date) AS Date
FROM A
INNER JOIN B
ON A.Id = B.Id
GROUP BY A.group_id
) AS supportTable
ON A.group_id = supportTable.group_id
AND B.Date = supportTable.Date
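
To see the join-to-MIN approach above in action, here is a small sketch that runs it against an in-memory SQLite database via Python's sqlite3 module. The sample rows and the alias names firsts and first_date are invented for illustration; only the table and column names come from the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE A (group_id INTEGER, id INTEGER PRIMARY KEY, subject TEXT);
    CREATE TABLE B (id INTEGER PRIMARY KEY, date TEXT);
    -- hypothetical sample data: group 1 has two rows, group 2 has one
    INSERT INTO A VALUES (1, 1, 'math'), (1, 2, 'art'), (2, 3, 'music');
    INSERT INTO B VALUES (1, '2020-03-01'), (2, '2020-01-15'), (3, '2020-02-10');
""")

# The derived table finds the earliest date per group_id;
# joining back on (group_id, date) keeps one row per group.
rows = conn.execute("""
    SELECT a.group_id, a.id, a.subject, b.date
    FROM A AS a
    JOIN B AS b ON b.id = a.id
    JOIN (
        SELECT a2.group_id, MIN(b2.date) AS first_date
        FROM A AS a2
        JOIN B AS b2 ON b2.id = a2.id
        GROUP BY a2.group_id
    ) AS firsts
      ON firsts.group_id = a.group_id AND firsts.first_date = b.date
    ORDER BY b.date DESC
""").fetchall()
print(rows)
```

Note that if two rows in the same group share the earliest date, this approach returns both, whereas row_number() would pick exactly one.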

SQL duplicate values in multiple columns

I have a table with duplicate IDs, but different values in the second column.
Instead of removing all the duplicates with DISTINCT, I need one row per ID, with the values from the second column spread across several columns.
Here is what I mean (the first table has to become the result):

Okay I've managed to fix it with this:
WITH cte AS (
    SELECT top 1000 *, ROW_NUMBER() OVER (PARTITION BY id ORDER BY id) as RN
    FROM dbo.books
)
SELECT top 1000 a.id, a.category
, b.category as category2
, c.category as category3
, d.category as category4
from cte a
LEFT join cte b
    on a.id = b.id
    and a.RN = b.RN - 1
LEFT JOIN cte c
    ON a.id = c.id
    AND a.RN = c.RN - 2
LEFT JOIN cte d
    ON a.id = d.id
    AND a.RN = d.RN - 3
WHERE a.RN = 1

SQL getting the last item by date

I have a query that I had working on one item, then I wiped the dataset and started over and now I can't get it to pull any data at all.
The query is basically:
SELECT *
FROM TABLEA
LEFT JOIN TABLEB
ON TABLEA.ID = TABLEB.ID
WHERE TABLEA.ID = 1
AND TABLEB.DATE = (SELECT MAX(TABLEB.DATE)
FROM TABLEB
WHERE TABLEA.ID = 1)
TableB tracks changes and has hundreds of entries per ID. I want a single row of the last chronological item pertaining to that ID.
If there is a better way to do this then awesome but I'd really like to know why this specific query isn't working. When I run this query:
SELECT MAX(TABLEB.DATE)
FROM TABLEA
LEFT JOIN TABLEB
ON TABLEA.ID = TABLEB.ID
WHERE TABLEA.ID = 1
I get the proper value of the last date in the dataset.

select *
from tableA as a
left outer join tableB as b on b.ID = a.ID
where
b.Date = (select max(t.Date) from tableB as t WHERE t.ID = a.id)
-- and a.ID = 1 if you need it
If you just need the date from tableB, you can do:
select *
from tableA as a
left outer join (
select max(t.Date) as Date, t.ID from tableB as t group by t.ID
) as b on b.ID = a.ID
-- where a.ID = 1 if you need it
If you can use the row_number function (you can change the common table expression to a subquery):
with cte as (
    -- list the columns explicitly: a CTE cannot expose two columns named ID
    select a.ID, b.Date, row_number() over(partition by a.ID order by b.Date desc) as rn
    from tableA as a
    left outer join tableB as b on b.ID = a.ID
    -- where a.ID = 1 if you need it
)
select *
from cte
where rn = 1
If you're using SQL Server 2005 or later:
select *
from tableA as a
outer apply (
select top 1 t.*
from tableB as t
where t.ID = a.ID
order by t.Date desc
) as b
-- where a.ID = 1 if you need it
Please note the use of aliases in all subqueries.
About your initial query, I think you have a typo there:
SELECT *
FROM TABLEA
LEFT JOIN TABLEB
ON TABLEA.ID = TABLEB.ID
WHERE TABLEA.ID = 1
AND TABLEB.DATE = (SELECT MAX(TABLEB.DATE)
FROM TABLEB
WHERE TABLEA.ID = 1) -- <-- Do you mean TABLEB.ID = 1 ??
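
To make the effect of that typo concrete, here is a small sketch using Python's sqlite3 module with an in-memory database. The sample dates are made up. Because the subquery filters on TABLEA.ID (a reference to the outer query) instead of a TABLEB column, MAX runs over every row of TABLEB, and the query only returns data when ID 1 happens to own the latest date in the whole table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE TABLEA (ID INTEGER PRIMARY KEY);
    CREATE TABLE TABLEB (ID INTEGER, DATE TEXT);
    -- hypothetical sample data: ID 2 owns the latest date overall
    INSERT INTO TABLEA VALUES (1), (2);
    INSERT INTO TABLEB VALUES (1, '2021-01-01'), (1, '2021-02-01'), (2, '2021-03-01');
""")

# Original form: the subquery is not restricted to TABLEB rows for ID 1,
# so it returns the global maximum date ('2021-03-01').
original = conn.execute("""
    SELECT TABLEA.ID, TABLEB.DATE
    FROM TABLEA
    LEFT JOIN TABLEB ON TABLEA.ID = TABLEB.ID
    WHERE TABLEA.ID = 1
      AND TABLEB.DATE = (SELECT MAX(TABLEB.DATE) FROM TABLEB WHERE TABLEA.ID = 1)
""").fetchall()

# Corrected form: alias the inner table and correlate it on its own ID.
fixed = conn.execute("""
    SELECT TABLEA.ID, TABLEB.DATE
    FROM TABLEA
    LEFT JOIN TABLEB ON TABLEA.ID = TABLEB.ID
    WHERE TABLEA.ID = 1
      AND TABLEB.DATE = (SELECT MAX(t.DATE) FROM TABLEB AS t WHERE t.ID = TABLEA.ID)
""").fetchall()

print(original)
print(fixed)
```

This would also explain why the query worked on the earlier dataset: with only one ID present, the global maximum and the per-ID maximum are the same date.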

Here is one way:
SELECT *
FROM tableA LEFT JOIN
tableB
ON tableA.ID = tableB.ID
WHERE tableA.ID = 1
ORDER BY tableB.Date desc
LIMIT 1;

Left Join Multiple Tables and Avoid Duplicates

I have two tables with a 1:n relationship to my base table, both of which I want to LEFT JOIN.
-------------------------------------
Table A      Table B      Table C
-------------------------------------
|ID|DATA|    |ID|DATA|    |ID|DATA|
-------------------------------------
 1  A1        1  B1        1  C1
 -            -            1  C2
I'm using:
SELECT * FROM TableA a
LEFT JOIN TableB b
ON a.Id = b.Id
LEFT JOIN TableC c
ON a.Id = c.Id
But this is showing duplicates for TableB:
1 A1 B1 C1
1 A1 B1 C2
How can I write this join to ignore the duplicates? Such as:
1 A1 B1 C1
1 A1 null C2

I think you need some extra logic to get what you want: when a b.id repeats across rows, keep its value on the first row and blank it out on the rest. You can identify the repeats using row_number() and then use case logic to make subsequent values NULL:
select a.id, a.val,
       (case when row_number() over (partition by b.id, b.seqnum order by b.id) = 1
             then b.val
        end) as bval,
       c.val as cval
from TableA a left join
     (select b.*, row_number() over (partition by b.id order by b.id) as seqnum
      from tableB b
     ) b
     on a.id = b.id left join
     tableC c
     on a.id = c.id
I don't think you want a full join between B and C, because you will get multiple rows. If B has 2 rows for an id and C has 3, then you will get 6. I suspect that you just want 3. To achieve this, you want to do something like:
select *
from (select b.*, row_number() over (partition by b.id order by b.id) as seqnum
      from TableB b
     ) b full outer join
     (select c.*, row_number() over (partition by c.id order by c.id) as seqnum
      from TableC c
     ) c
     on b.id = c.id and
        b.seqnum = c.seqnum join
     TableA a
     -- coalesce keeps rows where only one of the two lists has an entry at that position
     on a.id = coalesce(b.id, c.id)
This is enumerating the "B" and "C" lists, and then joining them by position on the list. It uses a full outer join to get the full length of the longer list.
The last join references both tables so TableA can be used as a filter. Extra ids in B and C won't appear in the results.
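
The row multiplication described above (2 rows in B and 3 in C giving 6) is easy to check. The sketch below uses Python's sqlite3 module with made-up sample rows. It also shows positional pairing without window functions or FULL OUTER JOIN, by deriving seqnum from a correlated COUNT over SQLite's rowid and driving the join from TableC. That second query is an adaptation that assumes C is the longer list; it is not the full outer join version above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE TableA (Id INTEGER, Data TEXT);
    CREATE TABLE TableB (Id INTEGER, Data TEXT);
    CREATE TABLE TableC (Id INTEGER, Data TEXT);
    -- hypothetical sample data: 2 rows in B and 3 rows in C for the same Id
    INSERT INTO TableA VALUES (1, 'A1');
    INSERT INTO TableB VALUES (1, 'B1'), (1, 'B2');
    INSERT INTO TableC VALUES (1, 'C1'), (1, 'C2'), (1, 'C3');
""")

# Plain double LEFT JOIN: every B row pairs with every C row.
count = conn.execute("""
    SELECT COUNT(*)
    FROM TableA a
    LEFT JOIN TableB b ON a.Id = b.Id
    LEFT JOIN TableC c ON a.Id = c.Id
""").fetchone()[0]

# Positional pairing: number the rows of each list, then match on (Id, seqnum).
# Assumption: TableC is the longer list, so it drives the LEFT JOIN.
paired = conn.execute("""
    SELECT a.Data, b.Data, c.Data
    FROM TableA a
    JOIN (SELECT c1.*, (SELECT COUNT(*) FROM TableC c2
                        WHERE c2.Id = c1.Id AND c2.rowid <= c1.rowid) AS seqnum
          FROM TableC c1) c ON c.Id = a.Id
    LEFT JOIN (SELECT b1.*, (SELECT COUNT(*) FROM TableB b2
                             WHERE b2.Id = b1.Id AND b2.rowid <= b1.rowid) AS seqnum
               FROM TableB b1) b ON b.Id = c.Id AND b.seqnum = c.seqnum
    ORDER BY c.seqnum
""").fetchall()

print(count)
print(paired)
```

The paired result has as many rows as the longer list, with NULL filling the positions where the shorter list has run out.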

Do you want to use DISTINCT?
SELECT distinct * FROM TableA a
LEFT JOIN TableB b
ON a.Id = b.Id
LEFT JOIN TableC c
ON a.Id = c.Id

Do it as a UNION, i.e.
SELECT a.Id, b.Id, c.Id
FROM TableA a
INNER JOIN TableB b ON a.Id = b.Id
LEFT JOIN TableC c ON a.Id = c.Id
UNION
SELECT a.Id, Null, c.Id
FROM TableA a
LEFT JOIN TableC c ON a.Id = c.Id
That is, one SELECT to bring back the first row and another to bring back the second row. It's a bit rough because I don't know anything about the data you are trying to read, but the principle is sound. You may need to rework it a bit.

Does not Recognize Column in Where Clause when Joining Tables

SELECT * FROM a
JOIN (SELECT * FROM b WHERE b.aId = a.Id) AS c ON c.aId = a.Id
It says it does not recognize a.Id in the WHERE clause.
I know it's probably because I'm using a temp (derived) table and a.Id cannot be passed into it, but is there any way we can do that?
Here is what I actually need:
SELECT *
FROM a
JOIN (SELECT * FROM b
WHERE b.aId = a.Id
ORDER BY b.dateReg DESC
LIMIT 1) AS c ON c.aId = a.Id
I need the ORDER BY b.dateReg DESC LIMIT 1 because it returns the last row associated with each row of table a. If required, I can post the CREATE query.

-- find last rows on b (b.aId is the column that points at a.Id, as in the question)
select * from b x
where exists(
    select y.aId
    from b y
    where y.aId = x.aId
    group by y.aId
    having max(y.dateReg) = x.dateReg
)
-- then join that b to a, this is the final query:
select * from a
join
(
    select * from b x
    where exists(
        select y.aId
        from b y
        where y.aId = x.aId
        group by y.aId
        having max(y.dateReg) = x.dateReg
    )
) as last_rows on last_rows.aId = a.Id
-- simpler:
select *
from a join b x on a.Id = x.aId
where exists(
    select y.aId
    from b y
    where y.aId = x.aId
    group by y.aId
    having max(y.dateReg) = x.dateReg)
-- or if you will use postgres:
select DISTINCT ON (a.Id) *
from a join b x on a.Id = x.aId
order by a.Id, x.dateReg DESC
-- look ma! no group by!
-- nothing beats postgresql's simplicity :-)
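
The underlying point is that a derived table in the FROM clause cannot see a.Id, while a subquery in the WHERE clause can. Below is a minimal sketch of that idea using a correlated scalar subquery, a slightly simpler variant than the EXISTS form above, run in an in-memory SQLite database through Python's sqlite3 module. The sample rows and the name column on table a are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE a (Id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE b (aId INTEGER, dateReg TEXT);
    -- hypothetical sample data: a.Id 1 has two registrations, a.Id 2 has one
    INSERT INTO a VALUES (1, 'first'), (2, 'second');
    INSERT INTO b VALUES (1, '2019-05-01'), (1, '2019-06-01'), (2, '2019-04-01');
""")

# The subquery lives in the WHERE clause, so it may reference a.Id
# from the outer query and pick the latest dateReg for that row of a.
rows = conn.execute("""
    SELECT a.Id, a.name, x.dateReg
    FROM a
    JOIN b AS x ON x.aId = a.Id
    WHERE x.dateReg = (SELECT MAX(y.dateReg) FROM b AS y WHERE y.aId = a.Id)
    ORDER BY a.Id
""").fetchall()
print(rows)
```

Each row of a comes back once, paired with its most recent b row. If two b rows tie on dateReg, both are returned.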

Try:
SELECT DISTINCT *
FROM A a
JOIN B b ON b.aid = a.id
JOIN (SELECT b.aid,
             MAX(b.datereg) AS max_datereg
      FROM B b
      GROUP BY b.aid) md ON md.aid = b.aid
                        AND md.max_datereg = b.datereg
If you instead want the first record associated with each row of A, use:
SELECT DISTINCT *
FROM A a
JOIN B b ON b.aid = a.id
JOIN (SELECT b.aid,
             MIN(b.datereg) AS min_datereg
      FROM B b
      GROUP BY b.aid) md ON md.aid = b.aid
                        AND md.min_datereg = b.datereg
