SQL duplicate values in multiple columns

I have a table with duplicate IDs, but different values in the second column.
Instead of removing the duplicates with DISTINCT, I need one row per ID, with the values from the second column spread across several columns.
Here is what I mean: (has to become result)

Okay, I've managed to fix it with this:
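WITH cte AS (
    -- read the whole table here; a TOP inside the CTE (with no ORDER BY) could drop some of an id's categories
    SELECT *,
           -- number the rows within each id; ordering by category makes the numbering deterministic
           ROW_NUMBER() OVER (PARTITION BY id ORDER BY category) AS RN
    FROM dbo.books
)
SELECT TOP 1000
       a.id,
       a.category,
       b.category AS category2,
       c.category AS category3,
       d.category AS category4
FROM cte a
LEFT JOIN cte b ON a.id = b.id AND b.RN = a.RN + 1
LEFT JOIN cte c ON a.id = c.id AND c.RN = a.RN + 2
LEFT JOIN cte d ON a.id = d.id AND d.RN = a.RN + 3
WHERE a.RN = 1   -- one output row per id
ORDER BY a.id;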
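A hedged alternative to chaining one self-join per extra column is conditional aggregation: number the rows within each id, then pick out each position with MAX(CASE ...) and GROUP BY id. The sketch below runs that idea on an in-memory SQLite database through Python's sqlite3 module, purely so it can be executed end to end. It assumes SQLite 3.25 or later (for ROW_NUMBER()), and the `pivot_categories` helper and the sample rows are hypothetical.

```python
import sqlite3


def pivot_categories(rows):
    """Return one row per id with up to four categories spread across columns.

    `rows` is a list of (id, category) tuples. The table name `books` mirrors
    the question; the data itself is hypothetical. Categories beyond the
    fourth per id are silently dropped, just as in the self-join version.
    """
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE books (id INTEGER, category TEXT)")
    con.executemany("INSERT INTO books VALUES (?, ?)", rows)
    query = """
        WITH numbered AS (
            SELECT id, category,
                   -- deterministic position of each category within its id
                   ROW_NUMBER() OVER (PARTITION BY id ORDER BY category) AS rn
            FROM books
        )
        SELECT id,
               MAX(CASE WHEN rn = 1 THEN category END) AS category,
               MAX(CASE WHEN rn = 2 THEN category END) AS category2,
               MAX(CASE WHEN rn = 3 THEN category END) AS category3,
               MAX(CASE WHEN rn = 4 THEN category END) AS category4
        FROM numbered
        GROUP BY id
        ORDER BY id
    """
    result = con.execute(query).fetchall()
    con.close()
    return result


if __name__ == "__main__":
    sample = [(1, "fantasy"), (1, "drama"), (2, "history"), (1, "crime")]
    for row in pivot_categories(sample):
        print(row)
```

Each extra output column is one more MAX(CASE ...) line instead of one more self-join; the number of columns is still fixed in the query text.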

Related

Removing duplicate values from a column in SQL

I have two tables, A (group_id, id, subject) and B (id, date). Below is the joined table of A and B on id. I have tried using DISTINCT and partitioning to remove the duplicates in the group_id field only, but with no luck:
My code:
select
a.group_id, a.id, a.subject, b.date
from
A a
inner join
(select
b.*,
row_number() over (partition by group_id order by date asc) as seqnum
from
B b) b on a.id = b.id and seqnum = 1
order by
date desc;
I got this error when I ran the code:
Partitioning can not be used stand-alone in query near 'partition by group_id order by date asc) as seqnum from B' at line 1
This is my expected result:
Thank you in advance!
It looks like you want the earliest date for each group_id in the table you show. Your question mentions two tables, but you only show one.
I recommend a correlated subquery in most databases:
select b.*
from b
where b.date = (select min(b2.date)
from b b2
where b2.group_id = b.group_id
);
I see. You need to join first and then use row_number():
select ab.*
from (select a.group_id, a.id, a.subject, b.date,
row_number() over (partition by a.group_id order by b.date) as seqnum
from A a join
B b
on a.id = b.id
) ab
where seqnum = 1
order by date desc;
You are almost there, but the column you are trying to partition by (i.e., group_id) comes from table A, which is not available in the subquery.
You would need to JOIN and assign the row number in a subquery, and then filter in the outer query.
select *
from (
select
a.group_id,
a.id,
a.subject,
b.date,
row_number() over (partition by a.group_id order by b.date asc) as seqnum
from a
inner join b on a.id = b.id
) t  -- SQL Server and MySQL require an alias on a derived table
where seqnum = 1
ORDER BY date desc;
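To check the join-then-number approach, here is a small sketch that builds tables A and B in SQLite (via Python's sqlite3, assuming SQLite 3.25+ for window functions) and keeps the earliest-dated row per group_id. The `earliest_per_group` helper and the sample rows are hypothetical; dates are stored as ISO-8601 strings so that text order matches date order.

```python
import sqlite3


def earliest_per_group(a_rows, b_rows):
    """Keep the earliest-dated row per group_id after joining A and B on id.

    a_rows: (group_id, id, subject) tuples; b_rows: (id, date) tuples with
    ISO-8601 date strings, so ordering the text also orders the dates.
    """
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE A (group_id INTEGER, id INTEGER, subject TEXT)")
    con.execute("CREATE TABLE B (id INTEGER, date TEXT)")
    con.executemany("INSERT INTO A VALUES (?, ?, ?)", a_rows)
    con.executemany("INSERT INTO B VALUES (?, ?)", b_rows)
    query = """
        SELECT group_id, id, subject, date
        FROM (
            -- join first, so group_id is available to the window function
            SELECT a.group_id, a.id, a.subject, b.date,
                   ROW_NUMBER() OVER (PARTITION BY a.group_id
                                      ORDER BY b.date ASC) AS seqnum
            FROM A a
            JOIN B b ON a.id = b.id
        ) t
        WHERE seqnum = 1          -- filter in the outer query
        ORDER BY date DESC
    """
    result = con.execute(query).fetchall()
    con.close()
    return result


if __name__ == "__main__":
    a = [(10, 1, "math"), (10, 2, "art"), (20, 3, "music")]
    b = [(1, "2021-03-01"), (2, "2021-01-15"), (3, "2021-02-10")]
    print(earliest_per_group(a, b))
```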
Another way to achieve your goal, though it may not be the most efficient one:
SELECT
A.group_id, A.id, B.Date, A.subject
FROM A
INNER JOIN B
ON A.Id = B.Id
INNER JOIN
(
SELECT
A.Group_id, MIN(B.Date) AS Date
FROM A
INNER JOIN B
ON A.Id = B.Id
GROUP BY A.group_id
) AS supportTable
ON A.group_id = supportTable.group_id
AND B.Date = supportTable.Date

Getting MIN date

I have a table (A) that looks something like this:
ID Date
1 2012/01/12
2 2012/01/01
3 2012/01/03
4 2012/03/12
If I wanted to grab the MIN date for this query, would I just group by?
select
a.ID,
MIN(a.DATE),
b.name,
c.price
FROM
tablea a inner join tableb b on a.ID = b.ID
inner join tablec c on b.ID = c.ID
You want a window function. The correct expression is:
select a.id,
min(a.date) over () as mindate,
b.name, c.price
. . .
This says to get the min of the date over the data. There is no partition, so it gets it over all the data.
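The difference between an empty OVER () and OVER (PARTITION BY ID) can be seen on the sample table from the question. This sketch uses SQLite through Python's sqlite3 (SQLite 3.25+ assumed) and leaves out tableb and tablec; the `min_dates` helper is hypothetical.

```python
import sqlite3


def min_dates(rows):
    """Return (ID, Date, overall_min, per_id_min) for every row of tablea.

    Dates are 'YYYY/MM/DD' strings, which sort the same way as the dates.
    """
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE tablea (ID INTEGER, Date TEXT)")
    con.executemany("INSERT INTO tablea VALUES (?, ?)", rows)
    query = """
        SELECT ID, Date,
               MIN(Date) OVER () AS overall_min,              -- window = every row
               MIN(Date) OVER (PARTITION BY ID) AS per_id_min -- window = rows with this ID
        FROM tablea
        ORDER BY ID
    """
    result = con.execute(query).fetchall()
    con.close()
    return result


if __name__ == "__main__":
    table_a = [(1, "2012/01/12"), (2, "2012/01/01"), (3, "2012/01/03"), (4, "2012/03/12")]
    for row in min_dates(table_a):
        print(row)
```

Every row keeps its own columns, but overall_min is the same value ('2012/01/01') on all of them; with a PARTITION BY, the minimum is computed separately for each ID instead.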
If you are looking for the rows that have the minimum date for each ID, you can do this:
select
a.ID,
a.DATE,
b.name,
c.price
FROM tablea a
INNER JOIN
(
SELECT Id, MIN(Date) AS MinDate
FROM tablea
GROUP BY Id
) As minA ON a.date = mina.mindate AND a.id = mina.id
inner join tableb b on a.ID = b.ID
inner join tablec c on b.ID = c.ID
Alternatively, using DENSE_RANK() in a CTE:
WITH recordList
as
(
select a.ID,
a.DATE,
b.name,
c.price,
DENSE_RANK() OVER (PARTITION BY a.ID
ORDER BY a.Date ASC) rn
FROM tablea a
inner join tableb b on a.ID = b.ID
inner join tablec c on b.ID = c.ID
)
SELECT ID, DATE, name, Price
FROM recordList
WHERE rn = 1
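One detail worth checking with the DENSE_RANK() version: if an ID has two rows on its earliest date, both get rn = 1 and both are returned, whereas ROW_NUMBER() keeps just one of them (which one is not defined unless the ORDER BY breaks the tie). A minimal SQLite sketch via Python's sqlite3, assuming SQLite 3.25+, with a hypothetical `earliest_rows` helper and made-up rows:

```python
import sqlite3


def earliest_rows(rows, rank_fn="DENSE_RANK"):
    """Return the rows ranked 1 (earliest Date) within each ID.

    rank_fn must be "DENSE_RANK" or "ROW_NUMBER"; it is checked against this
    whitelist before being placed in the SQL text.
    """
    if rank_fn not in ("DENSE_RANK", "ROW_NUMBER"):
        raise ValueError("rank_fn must be DENSE_RANK or ROW_NUMBER")
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE tablea (ID INTEGER, Date TEXT, name TEXT)")
    con.executemany("INSERT INTO tablea VALUES (?, ?, ?)", rows)
    query = f"""
        WITH recordList AS (
            SELECT ID, Date, name,
                   {rank_fn}() OVER (PARTITION BY ID ORDER BY Date ASC) AS rn
            FROM tablea
        )
        SELECT ID, Date, name
        FROM recordList
        WHERE rn = 1
        ORDER BY ID, name
    """
    result = con.execute(query).fetchall()
    con.close()
    return result


if __name__ == "__main__":
    data = [(1, "2012/01/12", "x"), (1, "2012/01/12", "y"),
            (1, "2012/02/01", "z"), (2, "2012/01/01", "w")]
    print(earliest_rows(data, "DENSE_RANK"))
    print(earliest_rows(data, "ROW_NUMBER"))
```

Pick DENSE_RANK() (or RANK()) when every tied row should come back, and ROW_NUMBER() when exactly one row per ID is required.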

Left Join Multiple Tables and Avoid Duplicates

I have two tables with a 1:n relationship to my base table, both of which I want to LEFT JOIN.
Table A:
ID DATA
1 A1
Table B:
ID DATA
1 B1
Table C:
ID DATA
1 C1
1 C2
I'm using:
SELECT * FROM TableA a
LEFT JOIN TableB b
ON a.Id = b.Id
LEFT JOIN TableC c
ON a.Id = c.Id
But this is showing duplicates for TableB:
1 A1 B1 C1
1 A1 B1 C2
How can I write this join so that the duplicates are ignored, giving:
1 A1 B1 C1
1 A1 null C2
I think you need some explicit logic to get what you want: when the same B row appears on several result rows, keep its value only on the first one. You can identify the repeats using row_number() and then use CASE logic to make the subsequent values NULL:
select a.id, a.val,
-- keep B's value only on the first joined row for each B row; later copies become NULL
(case when row_number() over (partition by b.id, b.seqnum order by c.val) = 1 then b.val
end) as bval,
c.val as cval
from TableA a left join
(select b.*, row_number() over (partition by b.id order by b.id) as seqnum
from tableB b
) b
on a.id = b.id left join
tableC c
on a.id = c.id
I don't think you want every combination of B and C rows, because you will get too many rows: if B has 2 rows for an id and C has 3, you will get 6. I suspect that you just want 3. To achieve this, you want to do something like:
select *
from (select b.*, row_number() over (partition by b.id order by b.id) as seqnum
from TableB b
) b full outer join
(select c.*, row_number() over (partition by c.id order by c.id) as seqnum
from TableC c
) c
on b.id = c.id and
b.seqnum = c.seqnum join
TableA a
-- when one list is shorter, b.id or c.id is NULL on the extra rows, so match on whichever is present
on a.id = coalesce(b.id, c.id)
This is enumerating the "B" and "C" lists, and then joining them by position on the list. It uses a full outer join to get the full length of the longer list.
The last join references both tables so TableA can be used as a filter. Extra ids in B and C won't appear in the results.
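The "join by position" idea can also be sketched outside the database. The Python below is an illustration only (the `pair_by_position` helper is hypothetical): it groups the B and C values by id, then pairs the n-th B value with the n-th C value, padding the shorter list with None as the full outer join on (id, seqnum) does. List order stands in for seqnum, and, as with the inner join to TableA, an id with no B and no C rows produces no output.

```python
from collections import defaultdict
from itertools import zip_longest


def pair_by_position(a_rows, b_rows, c_rows):
    """Pair the n-th B value with the n-th C value for each id in A.

    Each *_rows argument is a list of (id, data) tuples. Within an id, the
    order of the rows plays the role of the seqnum from ROW_NUMBER().
    """
    b_by_id = defaultdict(list)
    c_by_id = defaultdict(list)
    for key, val in b_rows:
        b_by_id[key].append(val)
    for key, val in c_rows:
        c_by_id[key].append(val)

    result = []
    for a_id, a_val in a_rows:
        # zip_longest pads the shorter list with None, like the full outer join
        for b_val, c_val in zip_longest(b_by_id[a_id], c_by_id[a_id]):
            result.append((a_id, a_val, b_val, c_val))
    return result


if __name__ == "__main__":
    for row in pair_by_position([(1, "A1")], [(1, "B1")], [(1, "C1"), (1, "C2")]):
        print(row)
```

With 2 B rows and 3 C rows for an id, this yields 3 rows rather than the 6 a plain join would produce.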
Do you want to use DISTINCT?
SELECT distinct * FROM TableA a
LEFT JOIN TableB b
ON a.Id = b.Id
LEFT JOIN TableC c
ON a.Id = c.Id
Do it as a UNION, e.g.:
SELECT a.ID, b.ID, c.Id
FROM TableA a
INNER JOIN TableB b ON a.Id = b.Id
LEFT JOIN TableC c ON a.Id = c.Id
UNION
SELECT a.ID, NULL, c.Id
FROM TableA a
LEFT JOIN TableC c ON a.Id = c.Id
That is, one SELECT brings back the first row and another brings back the second. It's a bit rough because I don't know anything about the data you are trying to read, but the principle is sound. You may need to rework it a bit.

How many B and C rows does each A have?

I have these tables:
A:
id
1
2
B:
id a_id
1 1
2 1
3 1
C:
id a_id
1 1
2 1
3 2
I need this result:
A, CountB, CountC
1, 3, 2
2, 0, 1
This attempt doesn't work correctly:
SELECT
A.id, COUNT(B.id), COUNT(C.id)
FROM
A
LEFT JOIN
B ON A.id = B.a_id
LEFT JOIN
C ON A.id = C.a_id
GROUP BY A.id
How should the SQL statement be written without using correlated subqueries?
The following variation on yours should work:
SELECT A.id, COUNT(distinct B.id), COUNT(distinct C.id)
FROM A LEFT JOIN
B
ON A.id = B.a_id LEFT JOIN
C
ON A.id = C.a_id
GROUP BY A.id
However, there are those (such as myself) who feel that using count distinct is a cop-out. The problem is that the rows from B and from C are interfering with each other, multiplying in the join. So, you can also do each join independently, and then put the results together:
select ab.id, cntB, cntC
from (select a.id, count(B.id) as cntB  -- count(*) would count the all-NULL row from the left join as 1
from A left outer join
B
on A.id = B.a_id
group by a.id
) ab join
(select a.id, count(C.id) as cntC
from A left outer join
C
on A.id = C.a_id
group by a.id
) ac
on ab.id = ac.id
For just counting, the first form is fine. If you need to do other summarizations (say, summing a value), then you generally need to split into the component queries.
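The row multiplication is easy to reproduce with the sample data from the question. The sketch below loads A, B and C into an in-memory SQLite database via Python's sqlite3 and runs the original query, the COUNT(DISTINCT ...) variant and the split version; `run_all` and the query labels are hypothetical names. The split version counts B.id and C.id rather than `*`, so that an A row with no matches counts as 0 instead of 1 (the left join still produces one row of NULLs for it).

```python
import sqlite3

SETUP = """
    CREATE TABLE A (id INTEGER);
    CREATE TABLE B (id INTEGER, a_id INTEGER);
    CREATE TABLE C (id INTEGER, a_id INTEGER);
    INSERT INTO A VALUES (1), (2);
    INSERT INTO B VALUES (1, 1), (2, 1), (3, 1);
    INSERT INTO C VALUES (1, 1), (2, 1), (3, 2);
"""

QUERIES = {
    # original attempt: each B row is repeated once per matching C row
    "naive": """
        SELECT A.id, COUNT(B.id), COUNT(C.id)
        FROM A LEFT JOIN B ON A.id = B.a_id
               LEFT JOIN C ON A.id = C.a_id
        GROUP BY A.id ORDER BY A.id
    """,
    # COUNT(DISTINCT ...) undoes the multiplication for plain counts
    "distinct": """
        SELECT A.id, COUNT(DISTINCT B.id), COUNT(DISTINCT C.id)
        FROM A LEFT JOIN B ON A.id = B.a_id
               LEFT JOIN C ON A.id = C.a_id
        GROUP BY A.id ORDER BY A.id
    """,
    # aggregate each child table separately, then join the per-id results
    "split": """
        SELECT ab.id, cntB, cntC
        FROM (SELECT A.id, COUNT(B.id) AS cntB
              FROM A LEFT JOIN B ON A.id = B.a_id
              GROUP BY A.id) ab
        JOIN (SELECT A.id, COUNT(C.id) AS cntC
              FROM A LEFT JOIN C ON A.id = C.a_id
              GROUP BY A.id) ac
          ON ab.id = ac.id
        ORDER BY ab.id
    """,
}


def run_all():
    """Run every query in QUERIES against the sample data; return name -> rows."""
    con = sqlite3.connect(":memory:")
    con.executescript(SETUP)
    results = {name: con.execute(sql).fetchall() for name, sql in QUERIES.items()}
    con.close()
    return results


if __name__ == "__main__":
    for name, rows in run_all().items():
        print(name, rows)
```

With this data, the original query reports 6 and 6 for id 1 (3 B rows times 2 C rows), while both fixes return the expected (1, 3, 2) and (2, 0, 1).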

Find Duplicates using Derived Tables Only

A CTE can sometimes be a memory hog. The following SQL works well until there are memory issues with other databases.
Any ideas on how to reproduce ROW_NUMBER() OVER (PARTITION BY ...) using derived tables?
Table A holds the Work phone.
Table B holds the Id.
We have to join Table A with Table B to find duplicates in Table B, using the phone as the duplicate criterion.
This SQL works. I just want to see suggestions using derived tables instead.
;WITH cte
AS (SELECT Row_number() OVER (PARTITION BY a.WorkPhone ORDER BY b.id DESC ) AS
rownumber
,
a.WorkPhone,
a.id
FROM TableB B
JOIN TableA a
ON b.GroupofLeadsid = a.id
WHERE b.GroupofLeads = #GroupofLeads
AND NOT a.WorkPhone IS NULL
AND a.WorkPhone <> '')
UPDATE b
SET b.deleteflag = 1
FROM TableB b
JOIN cte t
ON b.id = t.id
WHERE b.GroupofLeads = #GroupofLeads
AND rownumber > 1
Update B
Set deleteflag = 1
From TableB As B
Join (
-- derived table in place of the CTE; only the aliases A1 and B1 are in scope here
Select Row_Number() Over ( Partition By A1.WorkPhone Order By B1.Id Desc ) As RowNum
, A1.WorkPhone
, B1.Id
From TableB As B1
Join TableA As A1
On A1.id = B1.GroupOfLeadsId
Where B1.GroupOfLeadsId = #GroupOfLeads
And A1.WorkPhone <> ''
) As CTE
On CTE.Id = B.Id
Where CTE.RowNum > 1
Another choice:
Update TableB
Set deleteflag = 1
Where Exists (
Select 1
From (
Select Row_Number() Over ( Partition By A1.WorkPhone Order By B1.Id Desc ) As RowNum
, B1.Id
From TableB As B1
Join TableA As A1
On A1.id = B1.GroupOfLeadsId
Where B1.GroupOfLeadsId = #GroupOfLeads
And A1.WorkPhone <> ''
) As CTE
Where CTE.RowNum > 1
And CTE.Id = TableB.Id
)
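Another derived-table shape, sketched here in SQLite through Python's sqlite3 (SQLite 3.25+ assumed), puts the numbered rows in a subquery inside UPDATE ... WHERE Id IN (...), with no CTE. The schema is simplified and hypothetical: `AId` stands in for the question's GroupOfLeadsId link, and the GroupOfLeads filter is left out. Every TableB row except the highest Id per work phone gets deleteflag = 1, and empty or NULL phones are skipped.

```python
import sqlite3


def flag_duplicate_phones(a_rows, b_rows):
    """Set deleteflag = 1 on every TableB row except the newest per work phone.

    a_rows: (id, WorkPhone) tuples; b_rows: (Id, AId) tuples. `AId` is a
    hypothetical foreign key to TableA. Returns the Ids that were flagged.
    """
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE TableA (id INTEGER PRIMARY KEY, WorkPhone TEXT)")
    con.execute("CREATE TABLE TableB (Id INTEGER PRIMARY KEY, AId INTEGER, "
                "deleteflag INTEGER DEFAULT 0)")
    con.executemany("INSERT INTO TableA VALUES (?, ?)", a_rows)
    con.executemany("INSERT INTO TableB (Id, AId) VALUES (?, ?)", b_rows)
    con.execute("""
        UPDATE TableB
        SET deleteflag = 1
        WHERE Id IN (
            SELECT Id
            FROM (
                -- derived table instead of a CTE: number rows per phone, newest Id first
                SELECT B1.Id,
                       ROW_NUMBER() OVER (PARTITION BY A1.WorkPhone
                                          ORDER BY B1.Id DESC) AS RowNum
                FROM TableB AS B1
                JOIN TableA AS A1 ON A1.id = B1.AId
                WHERE A1.WorkPhone <> ''  -- NULL <> '' is not true, so NULLs are skipped too
            ) AS numbered
            WHERE numbered.RowNum > 1
        )
    """)
    flagged = [row[0] for row in con.execute(
        "SELECT Id FROM TableB WHERE deleteflag = 1 ORDER BY Id")]
    con.close()
    return flagged


if __name__ == "__main__":
    a = [(1, "555-0100"), (2, "555-0100"), (3, "555-0199"), (4, "")]
    b = [(10, 1), (11, 2), (12, 3), (13, 4), (14, 1)]
    print(flag_duplicate_phones(a, b))
```

With these rows, Ids 14, 11 and 10 share the phone 555-0100; 14 is kept and 10 and 11 are flagged, while 13 (empty phone) is left alone.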