SQL(ite) Remove (and keep some) duplicates in table - sql

Say I have a table, called tablex, as follows:
name|year
---------
Bob | 2010
Mary| 2011
Sam | 2012
Mary| 2012
Bob | 2013
Names appear at most twice. I want to remove from the table only those names that are repeated and have a difference of one year (in which case I want to keep the newer year).
name|year
---------
Bob | 2010
Sam | 2012
Mary| 2012
Bob | 2013
I have tried:
SELECT a.Name, a.Year, b.Year
FROM tablex AS a
LEFT JOIN tablex AS b
ON a.Name=b.Name AND (a.Year=b.Year OR b.Year-a.Year=1)
ORDER BY a.Name, a.Year
results in:
   Name  YearA  YearB
1  Bob   2010   2010
2  Bob   2013   2013
3  Mary  2011   2011
4  Mary  2011   2012
5  Mary  2012   2012
6  Sam   2012   2012
Bob's and Sam's entries are correct, how can I restrict it further to only include Mary 2012 2012?

From the question it is not clear whether you want to SELECT (suppressing the duplicates) or actually DELETE the "duplicates". The SELECT case:
SELECT a.Name, a.Year
FROM tablex AS a
WHERE NOT EXISTS (
SELECT * FROM tablex AS b
WHERE b.Name = a.Name
AND b.Year = a.Year +1
);
And the delete case:
DELETE
FROM tablex AS a
WHERE EXISTS (
SELECT * FROM tablex AS b
WHERE b.Name = a.Name
AND b.Year = a.Year +1
);
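To check the DELETE case against the sample data from the question, here is a minimal sketch using Python's built-in sqlite3 module. The helper name remove_superseded is made up for illustration, and the statement refers to the target table by its own name rather than an alias, which is an assumption made so that it also parses on older SQLite builds.

```python
import sqlite3


def remove_superseded(rows):
    """Delete each (name, year) row that has a same-name row exactly one year later."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE tablex (name TEXT, year INTEGER)")
    conn.executemany("INSERT INTO tablex VALUES (?, ?)", rows)
    # Inside the subquery the table is aliased as b, so tablex.name and
    # tablex.year refer to the row currently being considered for deletion.
    conn.execute(
        """
        DELETE FROM tablex
        WHERE EXISTS (
            SELECT 1 FROM tablex AS b
            WHERE b.name = tablex.name
              AND b.year = tablex.year + 1
        )
        """
    )
    remaining = conn.execute(
        "SELECT name, year FROM tablex ORDER BY year, name"
    ).fetchall()
    conn.close()
    return remaining


sample = [("Bob", 2010), ("Mary", 2011), ("Sam", 2012), ("Mary", 2012), ("Bob", 2013)]
remaining = remove_superseded(sample)
print(remaining)
```

Only Mary 2011 is removed; both Bob rows survive because they are three years apart.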

DELETE FROM tablex AS t
WHERE year + 1 =
(SELECT MAX(year)
FROM tablex
WHERE name = t.name)
Or, if you don't want to delete anything, but want a query to only give the desired results:
SELECT *
FROM tablex t
WHERE year + 1 !=
(SELECT MAX(year)
FROM tablex
WHERE name = t.name)

You can use:
SELECT a.Name, a.Year, b.Year
FROM tablex AS a
LEFT JOIN tablex AS b
ON a.Name=b.Name AND (a.Year=b.Year )
ORDER BY a.Name, a.Year

This statement deletes every row for which the same name also exists with a newer year. Note that it does not check the size of the gap, so on the sample data it also removes Bob | 2010:
delete from tablex as t1 where year < (select max(year) from tablex where name = t1.name)
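To see what a "delete everything older than the newest year" statement keeps on the question's sample data, here is a small sketch with Python's sqlite3 module. It is written without an alias on the DELETE target (my assumption, for portability across SQLite versions); the logic is otherwise the same comparison against MAX(year).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tablex (name TEXT, year INTEGER)")
conn.executemany(
    "INSERT INTO tablex VALUES (?, ?)",
    [("Bob", 2010), ("Mary", 2011), ("Sam", 2012), ("Mary", 2012), ("Bob", 2013)],
)

# Remove every row that is older than the newest row for the same name.
conn.execute(
    """
    DELETE FROM tablex
    WHERE year < (SELECT MAX(b.year) FROM tablex AS b WHERE b.name = tablex.name)
    """
)

kept = conn.execute("SELECT name, year FROM tablex ORDER BY name").fetchall()
conn.close()
print(kept)
```

Each name ends up with exactly one row, so this fits "keep only the latest row per name" rather than the one-year-gap rule asked for in the question.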

Related

Select data with latest date for each id

I am currently working on Microsoft Access and I am struggling to do what I want.
I have this table A:
Table A
id title name date
123 azer dfgd 1
123 afg qsd 5
123 arr poi 7
123 aur qhg 3
456 aoe aer 3
456 iuy zer 4
And I would like to get the columns id, title and name of the row that has the latest date (highest number) for each id.
With that example, the query would give
id title name date
123 arr poi 7
456 iuy zer 4
I hope you'll be able to help me.
Thanks in advance !
I would recommend a correlated subquery:
select a.*
from a
where a.date = (select max(a2.date) from a as a2 where a2.id = a.id);
For performance, you want an index on a(id, date).
With NOT EXISTS:
select t.*
from tablename t
where not exists (
select 1 from tablename
where id = t.id and date > t.date
)
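Both variants can be tried on the sample rows. The sketch below uses SQLite through Python's sqlite3 module as a stand-in for Access (an assumption, purely to make it runnable); the index name a_id_date is made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE a (id INTEGER, title TEXT, name TEXT, date INTEGER)")
conn.executemany(
    "INSERT INTO a VALUES (?, ?, ?, ?)",
    [
        (123, "azer", "dfgd", 1),
        (123, "afg", "qsd", 5),
        (123, "arr", "poi", 7),
        (123, "aur", "qhg", 3),
        (456, "aoe", "aer", 3),
        (456, "iuy", "zer", 4),
    ],
)
# Composite index that supports the per-id MAX(date) lookup.
conn.execute("CREATE INDEX a_id_date ON a (id, date)")

# Variant 1: correlated subquery comparing each row with its group's maximum.
latest_max = conn.execute(
    """
    SELECT a.id, a.title, a.name, a.date
    FROM a
    WHERE a.date = (SELECT MAX(a2.date) FROM a AS a2 WHERE a2.id = a.id)
    ORDER BY a.id
    """
).fetchall()

# Variant 2: keep a row only if no newer row exists for the same id.
latest_not_exists = conn.execute(
    """
    SELECT t.id, t.title, t.name, t.date
    FROM a AS t
    WHERE NOT EXISTS (
        SELECT 1 FROM a AS newer
        WHERE newer.id = t.id AND newer.date > t.date
    )
    ORDER BY t.id
    """
).fetchall()
conn.close()
print(latest_max)
```

If two rows of the same id share the latest date, both variants return both rows.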

SQL Output with multi join on different rows

I'm trying to join 2 different data sets with different columns and when I make the join I get repeated results.
My input dataset1 with actual data:
Cust_id Year sales
----------------------
1 2016 679862
1 2017 705365
1 2018 195662
1 2019 201234
2 2016 51074
2 2017 50611
2 2018 19070
2 2019 20123
My input dataset2 with estimated data:
Cust_id Year salesest
-------------------------
1 2018 779862
1 2019 125662
2 2017 23456
2 2018 32856
2 2019 26602
Desired output:
Cust_id Year sales salesest
-------------------------------
1 2016 679862 null
1 2017 705365 null
1 2018 195662 779862
1 2019 201234 125662
2 2016 51074 null
2 2017 50611 23456
2 2018 19070 32856
2 2019 20123 26602
This is what I have tried:
select
a.*, b.salesest
from
tab1 a, tab2 b
where
a.Cust_id = b.Cust_id
You want a LEFT JOIN. The correct syntax is:
select a.*, e.salesest
from actuals a left join
estimates e
on a.Cust_id = e.Cust_id and
a.year = e.year;
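You also need to join on the year, and use an outer join for the rows that have no corresponding year in the other table. With Oracle's legacy (+) notation, the marker has to appear on every condition that references tab2, otherwise the result is the same as an inner join:
select a.*, b.salesest
from tab1 a, tab2 b
where
a.Cust_id = b.Cust_id (+)
AND a.YEAR = b.YEAR (+)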
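A quick way to see why the year belongs in the join condition is to run both versions on the sample data. This is a sketch using Python's sqlite3 module; the table names actuals and estimates follow the first answer, and the lower-case column names are my own choice.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE actuals (cust_id INTEGER, year INTEGER, sales INTEGER);
    CREATE TABLE estimates (cust_id INTEGER, year INTEGER, salesest INTEGER);
    """
)
conn.executemany(
    "INSERT INTO actuals VALUES (?, ?, ?)",
    [
        (1, 2016, 679862), (1, 2017, 705365), (1, 2018, 195662), (1, 2019, 201234),
        (2, 2016, 51074), (2, 2017, 50611), (2, 2018, 19070), (2, 2019, 20123),
    ],
)
conn.executemany(
    "INSERT INTO estimates VALUES (?, ?, ?)",
    [(1, 2018, 779862), (1, 2019, 125662), (2, 2017, 23456), (2, 2018, 32856), (2, 2019, 26602)],
)

# Joining on the customer only pairs every actual row with every estimate
# row of that customer, which is where the repeated rows come from.
cust_only_count = conn.execute(
    "SELECT COUNT(*) FROM actuals AS a JOIN estimates AS e ON a.cust_id = e.cust_id"
).fetchone()[0]

# Joining on customer and year, as a LEFT JOIN, keeps one row per actual
# and fills salesest with NULL (None in Python) when no estimate exists.
joined = conn.execute(
    """
    SELECT a.cust_id, a.year, a.sales, e.salesest
    FROM actuals AS a
    LEFT JOIN estimates AS e
      ON a.cust_id = e.cust_id AND a.year = e.year
    ORDER BY a.cust_id, a.year
    """
).fetchall()
conn.close()
print(cust_only_count, len(joined))
```

The customer-only join yields 20 rows, while the LEFT JOIN on both columns yields the 8 rows of the desired output.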

How to join only latest date values from another table and prevent duplication

I'm trying to lookup a unique value from table b and get it into table a.
Table b stores multiple values that are changing by date.
I would like to join but only getting the values with the latest date from table b.
Table a
Unique ID
1
2
Table b
Date Unique ID Price
01/01/2019 1 100
01/02/2019 1 101
01/03/2019 1 102
01/01/2019 2 90
01/02/2019 2 91
01/03/2019 2 92
Expected result
Unique ID Price Date
1 102 01/03/2019
2 92 01/03/2019
Appreciate your help!
Have a sub-query that returns each UniqueID together with its max date, then use IN against that result:
select * from tablename
where (UniqueID, date) in (select UniqueID, max(date)
from tablename
group by UniqueID)
You want a correlated subquery:
select b.*
from tableb b
where b.date = (select max(b1.date) from tableb b1 where b1.UniqueID = b.UniqueID);
If you want to go with a JOIN, you can combine the JOIN with the subquery:
select a.UniqueID , b.Price, b.Date
from tablea a inner join
tableb b
on b.UniqueID = a.UniqueID
where b.date = (select max(b1.date) from tableb b1 where b1.UniqueID = a.UniqueID);
A correlated subquery?
select b.*
from b
where b.date = (select max(b2.date) from b b2 where b2.unique_id = b.unique_id);
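Here is a runnable sketch of the join plus correlated subquery, using Python's sqlite3 module. Two assumptions are mine: the column names (unique_id, price, date), and the dates being stored as ISO strings so that MAX compares them chronologically. The sample dates are read as month/day/year here; the other reading orders them the same way.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE tablea (unique_id INTEGER);
    CREATE TABLE tableb (date TEXT, unique_id INTEGER, price INTEGER);
    """
)
conn.executemany("INSERT INTO tablea VALUES (?)", [(1,), (2,)])
conn.executemany(
    "INSERT INTO tableb VALUES (?, ?, ?)",
    [
        ("2019-01-01", 1, 100), ("2019-01-02", 1, 101), ("2019-01-03", 1, 102),
        ("2019-01-01", 2, 90), ("2019-01-02", 2, 91), ("2019-01-03", 2, 92),
    ],
)

# For each id in tablea, keep only the tableb row carrying the latest date.
latest_prices = conn.execute(
    """
    SELECT a.unique_id, b.price, b.date
    FROM tablea AS a
    JOIN tableb AS b ON b.unique_id = a.unique_id
    WHERE b.date = (
        SELECT MAX(b1.date) FROM tableb AS b1 WHERE b1.unique_id = a.unique_id
    )
    ORDER BY a.unique_id
    """
).fetchall()
conn.close()
print(latest_prices)
```

With text dates in a non-ISO format such as 01/03/2019, MAX would compare strings character by character, so converting to a sortable format first is the safer choice.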

SQL query help

Sorry for posting this question again. I rephrased my question a little bit.
I am trying to write a query that returns rows from Table-A when multiple rows with STATUS = 1 are found in Table-B for the same CID in Table-A.
In this example, CID 100 has two records in Table-B with STATUS = 1, so I want the query to return those rows from Table-A. I know this is a weird table design. Please help.
Here are the tables with sample data.
Table-A
-----------------------------------------
AID Name CID
---------------------------------------
10 test1 100
12 test1 100
13 test2 101
14 test2 101
15 test3 102
Table-B
------------------------------------
bID AID status
-----------------------------------
1 10 1
2 12 1
3 14 1
4 15 1
Try this query:
SELECT TableA.CID
FROM TableA
JOIN TableB ON TableA.AID = TableB.AID
WHERE TableB.status = 1
GROUP BY TableA.CID
HAVING COUNT(*) > 1
It returns 100 for your example data.
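The GROUP BY / HAVING query can be checked on the sample tables, and wrapped in an IN subquery when the full Table-A rows are wanted. This is a sketch with Python's sqlite3 module; the names table_a and table_b (underscores instead of hyphens) are my substitution so that the identifiers need no quoting.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE table_a (aid INTEGER, name TEXT, cid INTEGER);
    CREATE TABLE table_b (bid INTEGER, aid INTEGER, status INTEGER);
    """
)
conn.executemany(
    "INSERT INTO table_a VALUES (?, ?, ?)",
    [(10, "test1", 100), (12, "test1", 100), (13, "test2", 101),
     (14, "test2", 101), (15, "test3", 102)],
)
conn.executemany(
    "INSERT INTO table_b VALUES (?, ?, ?)",
    [(1, 10, 1), (2, 12, 1), (3, 14, 1), (4, 15, 1)],
)

# CIDs that have more than one status = 1 row in table_b.
cids = conn.execute(
    """
    SELECT a.cid
    FROM table_a AS a
    JOIN table_b AS b ON a.aid = b.aid
    WHERE b.status = 1
    GROUP BY a.cid
    HAVING COUNT(*) > 1
    """
).fetchall()

# The same query used as a filter to return the table_a rows themselves.
rows_for_cids = conn.execute(
    """
    SELECT aid, name, cid
    FROM table_a
    WHERE cid IN (
        SELECT a.cid
        FROM table_a AS a
        JOIN table_b AS b ON a.aid = b.aid
        WHERE b.status = 1
        GROUP BY a.cid
        HAVING COUNT(*) > 1
    )
    ORDER BY aid
    """
).fetchall()
conn.close()
print(cids, rows_for_cids)
```

CID 101 is excluded because only AID 14 has a row in table_b, and CID 102 because it has a single row overall.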
Something like this?
select aggregated.aid,
tableB.status
from (select aid,
count(*) as cnt
from tableA
group by aid) as aggregated
left join tableB on tableB.aid = aggregated.aid
where aggregated.cnt > 1
If you're using SQL:
WITH tableBView AS
(
SELECT AID AS xxxAID
FROM [Table-B]
WHERE status = 1
GROUP BY AID
HAVING COUNT(*) > 0
)
SELECT *
FROM [Table-A]
WHERE EXISTS (SELECT * FROM tableBView WHERE xxxAID = AID)
SELECT *
FROM Table-A a
WHERE a.CID IN
(
SELECT a.CID FROM Table-A a JOIN Table-B b USING (AID)
WHERE b.status = 1
GROUP BY a.CID
HAVING count(*) > 1
)
This is a very verbose way to do it.
Selects all columns from Table-A on rows where AID match between Table-A and Table-B and more than one row with the same CID exists in Table-A:
(Btw, I wouldn't use "-" in your table/column names. Use "_" instead.)
select
table_A.AID,
table_A.Name,
table_A.CID
from
Table_A
inner join
(select
table_A.CID,
count(table_A.CID) c
from
Table_A
inner join Table_B on (Table_A.AID = table_B.AID)
group by table_A.CID
) derived_table on (derived_table.CID = table_A.CID)
where
derived_table.c > 1

Tricky SQL - Select non-adjacent numbers

Given this data on SQL Server 2005:
SectionID Name
1 Dan
2 Dan
4 Dan
5 Dan
2 Tom
7 Tom
9 Tom
10 Tom
How would I select records where the sectionID must be +-2 or more from another section for the same name?
The result would be:
1 Dan
4 Dan
2 Tom
7 Tom
9 Tom
Thanks for reading!
SELECT *
FROM mytable a
WHERE NOT EXISTS
(SELECT *
FROM mytable b
WHERE a.Name = b.Name
AND a.SectionID = b.SectionID + 1)
Here's a LEFT JOIN variant of Anthony's answer (it removes consecutive IDs from the results):
SELECT a.*
FROM mytable a
LEFT JOIN mytable b ON a.Name = b.Name AND a.SectionID = b.SectionID + 1
WHERE b.SectionID IS NULL
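Both the NOT EXISTS form and the LEFT JOIN form can be run against the sample data to confirm they agree. The sketch below uses SQLite via Python's sqlite3 module as a stand-in for SQL Server 2005 (an assumption, just to have something runnable); the lower-case column names are mine.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mytable (section_id INTEGER, name TEXT)")
conn.executemany(
    "INSERT INTO mytable VALUES (?, ?)",
    [(1, "Dan"), (2, "Dan"), (4, "Dan"), (5, "Dan"),
     (2, "Tom"), (7, "Tom"), (9, "Tom"), (10, "Tom")],
)

# Keep a row only if the same name has no row with section_id - 1.
not_exists_rows = conn.execute(
    """
    SELECT a.section_id, a.name
    FROM mytable AS a
    WHERE NOT EXISTS (
        SELECT 1 FROM mytable AS b
        WHERE a.name = b.name AND a.section_id = b.section_id + 1
    )
    ORDER BY a.name, a.section_id
    """
).fetchall()

# Same anti-join written as LEFT JOIN ... IS NULL.
left_join_rows = conn.execute(
    """
    SELECT a.section_id, a.name
    FROM mytable AS a
    LEFT JOIN mytable AS b
      ON a.name = b.name AND a.section_id = b.section_id + 1
    WHERE b.section_id IS NULL
    ORDER BY a.name, a.section_id
    """
).fetchall()
conn.close()
print(not_exists_rows)
```

In effect, both queries keep the first row of every run of consecutive IDs per name.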
EDIT: Since there is another interpretation of the question (simply getting results where IDs are more than 1 apart), here is another attempt at an answer:
WITH alternate AS (
SELECT sectionid,
name,
EXISTS(SELECT a.sectionid
FROM mytable b
WHERE a.name = b.name AND
(a.sectionid = b.sectionid-1 or a.sectionid = b.sectionid+1)) as has_neighbour,
row_number() OVER (PARTITION by a.name ORDER BY a.name, a.sectionid) as row_no
FROM mytable a
)
SELECT sectionid, name
FROM alternate
WHERE row_no % 2 = 1 OR NOT(has_neighbour)
ORDER BY name, sectionid;
gives:
sectionid | name
-----------+------
1 | Dan
4 | Dan
2 | Tom
7 | Tom
9 | Tom
Logic: if a record has a neighbor with the same name and id +/- 1, then only odd rows are taken; if it has no such neighbor, the row is kept regardless of whether it is even or odd.
As stated in the comment, the condition is ambiguous: at the start of each new sequence you might start with odd or even rows, and the criteria will still be satisfied with different results (even with a different number of results).
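The ambiguity is easy to demonstrate outside SQL. The following plain-Python sketch compares two possible readings; neither function comes from the answers above, and both names (run_starts, greedy_spaced) are made up for illustration. The first keeps only the start of each consecutive run, the second walks the IDs in order and keeps any ID at least 2 away from the last one it kept.

```python
from collections import defaultdict


def run_starts(rows):
    """Keep rows whose predecessor (section_id - 1) is absent for the same name."""
    present = set(rows)
    kept = [(s, n) for s, n in rows if (s - 1, n) not in present]
    return sorted(kept, key=lambda row: (row[1], row[0]))


def greedy_spaced(rows, min_gap=2):
    """Keep a row if it is at least min_gap above the last kept row of the same name."""
    by_name = defaultdict(list)
    for section_id, name in rows:
        by_name[name].append(section_id)
    kept = []
    for name in sorted(by_name):
        last = None
        for section_id in sorted(by_name[name]):
            if last is None or section_id - last >= min_gap:
                kept.append((section_id, name))
                last = section_id
    return kept


sample = [(1, "Dan"), (2, "Dan"), (4, "Dan"), (5, "Dan"),
          (2, "Tom"), (7, "Tom"), (9, "Tom"), (10, "Tom")]
run_of_three = [(1, "X"), (2, "X"), (3, "X")]

print(run_starts(sample) == greedy_spaced(sample))
print(run_starts(run_of_three), greedy_spaced(run_of_three))
```

On the question's data the two readings agree, but on a run of three consecutive IDs they differ: one keeps only ID 1, the other keeps IDs 1 and 3.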