Removing duplicate values from a column in SQL

I have two tables, A (group_id, id, subject) and B (id, date). Below is the result of joining A and B on id. I tried using DISTINCT and PARTITION BY to remove the duplicates in the group_id column only, but had no luck.
My code:
select
    a.group_id, a.id, a.subject, b.date
from
    A a
inner join
    (select
        b.*,
        row_number() over (partition by group_id order by date asc) as seqnum
    from
        B b) b on a.id = b.id and seqnum = 1
order by
    date desc;
I got this error when I ran the code:
Partitioning can not be used stand-alone in query near 'partition by group_id order by date asc) as seqnum from B' at line 1
This is my expected result:
Thank you in advance!

It looks like you want the earliest date for each row in the table you show. Your question mentions two tables, but you only show one.
I recommend a correlated subquery in most databases:
select b.*
from b
where b.date = (select min(b2.date)
                from b b2
                where b2.group_id = b.group_id
               );
I see. You need to join first and then use row_number():
select ab.*
from (select a.group_id, a.id, a.subject, b.date,
             row_number() over (partition by a.group_id order by b.date) as seqnum
      from A a join
           B b
           on a.id = b.id
     ) ab
where seqnum = 1
order by date desc;

You are almost there. But the column that you are trying to partition by (i.e. group_id) comes from table A, which is not available inside the subquery on B.
You need to JOIN and assign the row number in a subquery, and then filter in the outer query.
select *
from (
    select
        a.group_id,
        a.id,
        a.subject,
        b.date,
        row_number() over (partition by a.group_id order by b.date asc) as seqnum
    from a
    inner join b on a.id = b.id
) ab
where seqnum = 1
order by date desc;
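To see the join-then-number-then-filter pattern run end to end, here is a minimal sketch using Python's built-in sqlite3 module. SQLite only stands in for the asker's database, which the post does not name, and the sample rows are hypothetical because the original tables were shown only as images. It assumes an SQLite build with window functions (3.25 or later).

```python
import sqlite3

# Hypothetical sample data: the original post showed its tables only as images.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE A (group_id INTEGER, id INTEGER, subject TEXT);
    CREATE TABLE B (id INTEGER, date TEXT);
    INSERT INTO A VALUES (1, 10, 'math'), (1, 11, 'art'), (2, 12, 'music');
    INSERT INTO B VALUES (10, '2020-01-05'), (11, '2020-01-02'), (12, '2020-01-03');
""")

query = """
    SELECT group_id, id, subject, date
    FROM (
        -- Join first so that a.group_id is in scope for PARTITION BY.
        SELECT a.group_id, a.id, a.subject, b.date,
               ROW_NUMBER() OVER (PARTITION BY a.group_id ORDER BY b.date) AS seqnum
        FROM A a
        INNER JOIN B b ON a.id = b.id
    ) ab
    WHERE seqnum = 1          -- keep only the earliest row of each group
    ORDER BY date DESC
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

With this data, group 1 keeps only its earliest row (id 11) and group 2 keeps its single row (id 12), so each group_id appears once.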

Another way to achieve your goal, though it may not be the most efficient one:
SELECT
    A.group_id, A.id, B.Date, A.subject
FROM A
INNER JOIN B
    ON A.Id = B.Id
INNER JOIN
(
    SELECT
        A.Group_id, MIN(B.Date) AS Date
    FROM A
    INNER JOIN B
        ON A.Id = B.Id
    GROUP BY A.group_id
) AS supportTable
    ON A.group_id = supportTable.group_id
    AND B.Date = supportTable.Date

Related

Join unrelated table with unequal rows

I would like to join Table A, Table B, and Table C as the expected result in the attached image.

You can enumerate the rows and use that for joining . . . which might be what you want:
select ab.*, c.*
from (select a.*, b.*, -- really list out the columns you want
row_number() over (order by accountid) as seqnum
from a join
b
on a.accountid = b.accountid
) ab join
(select c.*, row_number() over (order by code) as seqnum
from c
) c
on ab.seqnum = c.seqnum

Getting the last entry within a join

I know this has been asked a lot, but I can't seem to get my query working.
I'm trying to get only one row per id from a query like this:
SELECT a.id, b.name
FROM table1 a
LEFT JOIN table2 b ON a.key = b.key
WHERE a.Date =
    (SELECT MAX(a1.date) FROM table1 a1 WHERE a1.primarykey = a.primarykey)
GROUP BY a.id, b.name
I do not need to group by b.name, but I have to since I need to group by id.
Right now I get multiple occurrences of b.name, which duplicates a.id, when I just want the b.name that corresponds to the last date for each a.id.
Can anyone point me to the right way to do this?
Thank you

I guess this condition:
WHERE a1.primarykey = a.primarykey
should be:
WHERE a1.key = a.key
where key is not the primary key of table1. If you really meant the primary key, there would be no point in searching for MAX(date), since there is only one date per primary key value.
If I'm right, try row_number():
SELECT t.id, t.name
FROM (
    SELECT a.id, b.name,
           row_number() over (partition by a.key order by a.date desc) rn
    FROM table1 a LEFT JOIN table2 b
        ON a.key = b.key
) t
WHERE t.rn = 1

It looks like you would get one row per id if you removed b.name from your GROUP BY clause.
I'm not sure why you would need to group by b.name if you already group by a.id.

Try this:
SELECT a.id, b.name
FROM (
    SELECT a1.id, a1.key,
           rank() over (partition by a1.key order by a1.date desc) md
    FROM table1 a1
) a
LEFT JOIN table2 b ON a.key = b.key AND a.md = 1;
However, it is not clear to me whether you need one row per id or per key, so double-check that.
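The answers above use different ranking functions, row_number() in one and rank() in the other, and the choice matters when two rows share the latest date. The sketch below illustrates this with Python's sqlite3 module (SQLite 3.25 or later assumed). The table contents are hypothetical, and the column is named join_key rather than key purely for this illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical data; join_key stands in for the question's "key" column.
    CREATE TABLE table1 (id INTEGER, join_key INTEGER, date TEXT);
    INSERT INTO table1 VALUES
        (1, 100, '2021-01-01'),
        (2, 100, '2021-03-01'),
        (3, 100, '2021-03-01'),  -- ties with id 2 on the latest date
        (4, 200, '2021-02-01');
""")

def latest_ids(func):
    """Return the ids ranked first per join_key using ROW_NUMBER or RANK."""
    if func not in ("ROW_NUMBER", "RANK"):
        raise ValueError("func must be 'ROW_NUMBER' or 'RANK'")
    sql = f"""
        SELECT id FROM (
            SELECT id,
                   {func}() OVER (PARTITION BY join_key ORDER BY date DESC) AS rn
            FROM table1
        ) t
        WHERE rn = 1
        ORDER BY id
    """
    return [r[0] for r in conn.execute(sql)]

print(latest_ids("RANK"))        # both tied rows survive: [2, 3, 4]
print(latest_ids("ROW_NUMBER"))  # exactly one row per join_key; which tied id wins is arbitrary
```

If exactly one row per key is required, row_number() is the safer choice, and adding a tiebreaker column to the ORDER BY makes the winner deterministic.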

Left join combining GETDATE()

I have the tables below, and I am trying to LEFT JOIN from Table A to Table B to get Code and Time. The issue is that I get multiple rows for each code. What I want is one row per Code, with the Time that is less than GETDATE(), taking the first one when ordered descending (that is, the latest such Time).
Tables:
Code:
SELECT
[ID],
Date_Time
FROM Table_A
LEFT JOIN Table_B
ON A.ID = B.Project_Code

You can use apply:
select a.*, b.*
from a cross apply
(select top (1) b.*
from b
where b.code = a.code and b.time < getdate()
order by b.time desc
) b;
This assumes that time is really a datetime. If you just want to compare times, then use convert(time, getdate()).

You need to add an extra join condition so that only records from Table_B before the specified datetime are returned, then simply use MAX to get the latest:
SELECT a.[ID],
Date_Time = MAX(b.Date_Time)
FROM Table_A AS a
LEFT JOIN Table_B AS b
ON b.Project_Code = a.ID
AND b.Date_Time < GETDATE()
GROUP BY a.ID;
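Why the date filter goes in the ON clause rather than in WHERE can be shown with a small sketch using Python's sqlite3 module. The rows are hypothetical, and a fixed cutoff parameter stands in for GETDATE(), which SQLite does not provide, so that the result is reproducible.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical data: project 2 only has a row after the cutoff.
    CREATE TABLE Table_A (ID INTEGER);
    CREATE TABLE Table_B (Project_Code INTEGER, Date_Time TEXT);
    INSERT INTO Table_A VALUES (1), (2);
    INSERT INTO Table_B VALUES
        (1, '2020-01-01 08:00:00'),
        (1, '2020-01-02 08:00:00'),
        (1, '2099-01-01 08:00:00'),
        (2, '2099-06-01 08:00:00');
""")

cutoff = "2021-01-01 00:00:00"  # assumption: stands in for GETDATE()

# Filter in the ON clause: unmatched Table_A rows are kept with NULL.
in_on = conn.execute("""
    SELECT a.ID, MAX(b.Date_Time)
    FROM Table_A AS a
    LEFT JOIN Table_B AS b
           ON b.Project_Code = a.ID
          AND b.Date_Time < ?
    GROUP BY a.ID
    ORDER BY a.ID
""", (cutoff,)).fetchall()

# Filter in WHERE: rows with no earlier match are discarded after the join.
in_where = conn.execute("""
    SELECT a.ID, MAX(b.Date_Time)
    FROM Table_A AS a
    LEFT JOIN Table_B AS b ON b.Project_Code = a.ID
    WHERE b.Date_Time < ?
    GROUP BY a.ID
    ORDER BY a.ID
""", (cutoff,)).fetchall()

print(in_on)     # [(1, '2020-01-02 08:00:00'), (2, None)]
print(in_where)  # [(1, '2020-01-02 08:00:00')]
```

Placing the condition in WHERE effectively turns the LEFT JOIN into an inner join, because the NULL-extended rows fail the comparison.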
If the column Date_time uses the TIME data type (your sample data suggests it does, but your column name suggests it does not), then you will need to convert GETDATE() to a time:
SELECT a.[ID],
Date_Time = MAX(b.Date_Time)
FROM Table_A AS a
LEFT JOIN Table_B AS b
ON b.Project_Code = a.ID
AND b.Date_Time < CONVERT(TIME, GETDATE())
GROUP BY a.ID;
If there are other columns you need to return from Table_B, then you will need to use OUTER APPLY, or a subquery with a ranking function.
OUTER APPLY
SELECT a.[ID],
b.Date_Time,
b.SomeOtherColumn
FROM Table_A AS a
OUTER APPLY
( SELECT TOP 1 b.Date_Time, b.SomeOtherColumn
FROM Table_B AS b
WHERE A.ID = B.Project_Code
AND b.Date_Time < GETDATE()
ORDER BY b.Date_Time DESC
) AS b;
SUBQUERY WITH RANKING FUNCTION
SELECT  a.[ID],
        b.Date_Time,
        b.SomeOtherColumn
FROM    Table_A AS a
        LEFT JOIN
        (   SELECT  b.Project_Code,
                    b.Date_Time,
                    b.SomeOtherColumn,
                    RowNumber = ROW_NUMBER() OVER(PARTITION BY b.Project_Code ORDER BY b.Date_Time DESC)
            FROM    Table_B AS b
            WHERE   b.Date_Time < GETDATE()
            -- No ORDER BY here: SQL Server rejects ORDER BY in a derived table
            -- without TOP/OFFSET, and ROW_NUMBER() already defines the ordering.
        ) AS b
            ON b.Project_Code = a.ID
            AND b.RowNumber = 1;

SQL: Modifying Inner Join to Select One Row

I have two tables, A and B that I want to inner join on location. However, for each row in A, there are many rows in B whose location matches. I want to end up with at most the same number of rows as in A. Specifically, I want to take the row in B where date is earliest. Here's what I have so far:
SELECT *
FROM A
INNER JOIN B ON A.location = B.location
How would I modify this so that each row in A only gets joined with a single row in B (using the earliest date)?
Attempt:
SELECT *
FROM A
INNER JOIN B ON A.location = B.location
AND B.date = (SELECT MIN(date) FROM B)
Is that the right approach?

You can use the ANSI/ISO standard row_number() function:
SELECT *
FROM A INNER JOIN
(SELECT B.*, ROW_NUMBER() OVER (PARTITION BY B.location ORDER BY B.date) as seqnum
FROM B
) B
ON A.location = B.location AND seqnum = 1;
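As for whether the attempt in the question is the right approach: its subquery takes the minimum date over all of B, not per location, so only locations whose earliest date happens to be the global minimum survive the join. The sketch below, using Python's sqlite3 module with hypothetical rows, contrasts that with a subquery correlated on location, which is an alternative to the row_number() query above rather than something taken from the answers.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical data: two locations, two dated rows each.
    CREATE TABLE A (location TEXT);
    CREATE TABLE B (location TEXT, date TEXT);
    INSERT INTO A VALUES ('north'), ('south');
    INSERT INTO B VALUES
        ('north', '2020-01-01'), ('north', '2020-02-01'),
        ('south', '2020-03-01'), ('south', '2020-04-01');
""")

# The question's attempt: MIN(date) is computed once, over the whole of B.
uncorrelated = conn.execute("""
    SELECT A.location, B.date
    FROM A
    INNER JOIN B ON A.location = B.location
                AND B.date = (SELECT MIN(date) FROM B)
    ORDER BY A.location
""").fetchall()

# Correlated variant: MIN(date) is computed per location.
correlated = conn.execute("""
    SELECT A.location, B.date
    FROM A
    INNER JOIN B ON A.location = B.location
                AND B.date = (SELECT MIN(b2.date)
                              FROM B b2
                              WHERE b2.location = A.location)
    ORDER BY A.location
""").fetchall()

print(uncorrelated)  # [('north', '2020-01-01')]
print(correlated)    # [('north', '2020-01-01'), ('south', '2020-03-01')]
```

Note that the correlated form can still return more than one row per location if several rows in B share the earliest date, whereas row_number() always keeps exactly one.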

SELECT TOP(1) * FROM A
INNER JOIN B ON
A.LOCATION=B.LOCATION
ORDER BY B.DATE

Getting MIN date

I have a table(A) that looks something like:
ID Date
1 2012/01/12
2 2012/01/01
3 2012/01/03
4 2012/03/12
If I wanted to grab the MIN date for this query, would I just group by?
select
a.ID,
MIN(a.DATE),
b.name,
c.price
FROM
tablea a inner join tableb b on a.ID = b.ID
inner join tablec c on b.ID = c.ID

You want a window function. The correct expression is:
select a.id,
min(a.date) over () as mindate,
b.name, c.price
. . .
This computes the minimum date across the whole result set. Because there is no PARTITION BY clause, the window covers all rows, so every row carries the same mindate.
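A minimal sketch of that behavior, using Python's sqlite3 module (SQLite 3.25 or later assumed) and the four sample rows from the question. Only tablea is modeled here; the joins to tableb and tablec are left out because the question gives no data for them.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- The sample rows from the question; dates kept as text in the same format.
    CREATE TABLE tablea (ID INTEGER, Date TEXT);
    INSERT INTO tablea VALUES
        (1, '2012/01/12'),
        (2, '2012/01/01'),
        (3, '2012/01/03'),
        (4, '2012/03/12');
""")

rows = conn.execute("""
    SELECT ID, Date, MIN(Date) OVER () AS mindate
    FROM tablea
    ORDER BY ID
""").fetchall()

for row in rows:
    print(row)  # every row shows '2012/01/01' as mindate
```

Unlike GROUP BY, the window function does not collapse rows: all four rows are returned, each with the overall minimum alongside its own date.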

If you are looking for those that had the minimum date, then you can do this:
select
    a.ID,
    a.DATE,
    b.name,
    c.price
FROM tablea a
INNER JOIN
(
    SELECT Id, MIN(Date) AS MinDate
    FROM tablea
    GROUP BY Id
) As minA ON a.date = mina.mindate AND a.id = mina.id
inner join tableb b on a.ID = b.ID
inner join tablec c on b.ID = c.ID

WITH recordList
as
(
    select a.ID,
           a.DATE,
           b.name,
           c.price,
           DENSE_RANK() OVER (PARTITION BY a.ID
                              ORDER BY a.Date ASC) rn
    FROM tablea a
    inner join tableb b on a.ID = b.ID
    inner join tablec c on b.ID = c.ID
)
SELECT ID, DATE, name, Price
FROM recordList
WHERE rn = 1