Find duplicates on the basis of a condition in SQL

So, I want to find duplicate IDs in a table on the basis of a condition.
I have multiple IDs for files with years 2018, 2019, 2020, and 2021. IDs can overlap between the files across years.
I want to find all the IDs present in 2019 that are also present in any of the other years.
So if:
id | year
1 | 2019
1 | 2020
1 | 2021
2 | 2019
3 | 2019
4 | 2018
4 | 2019
I want:
id
1
4
Note: I only want IDs specific to 2019. If an ID is present in 2018 and 2020 but not in 2019, it should go unmatched.
This is what I tried:
select id from table
intersect
select id from table where year='2019'
Thanks in advance!

Assuming you want to grab all of the rows where the year is 2019, you can use the very simple query:
SELECT * FROM TABLE WHERE year = '2019'
If you exclusively want to return IDs 1 and 4 for this year, you can use:
SELECT * FROM TABLE WHERE year = '2019' AND id IN ('1', '4')
If instead you want to return the years that have more than one row, you can use GROUP BY and HAVING:
SELECT year, COUNT(*)
FROM TABLE
WHERE year = '2019'
GROUP BY year
HAVING COUNT(*) > 1
Note that you'll need to replace TABLE with your table name.

Try this: you can achieve exactly what you want by using EXISTS and HAVING as below:
CREATE TABLE #test(id INT, year INT)
INSERT INTO #test(id, year) VALUES
(1, 2019),
(1, 2020),
(1, 2021),
(2, 2019),
(3, 2019),
(4, 2018),
(4, 2019)
SELECT t.id
FROM #test t
WHERE EXISTS(SELECT 1 FROM #test t1 WHERE t.id = t1.id AND t1.year = 2019)
GROUP BY id HAVING COUNT(t.id) > 1

So there are two conditions. The first is a simple WHERE: year = 2019. The second is a condition on the group: if you group by id, it can be written as having count(*) > 1.
However, you cannot put both conditions at the same query level, because where year = 2019 is applied before grouping: only the 2019 rows take part in the grouping, so every count is 1.
A subquery avoids this problem:
select id
from table
where id in (select id from table where year = 2019)
group by id
having count(*) > 1
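To check the subquery approach against the sample data from the question, here is a minimal sketch that runs it in SQLite through Python's sqlite3 module. The table name files is a placeholder, and COUNT(DISTINCT year) is used instead of count(*) on the assumption that the same (id, year) pair might appear more than once.

```python
import sqlite3

# Hypothetical table name `files`; the rows are the sample data from the question.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE files (id INTEGER, year INTEGER)")
conn.executemany(
    "INSERT INTO files (id, year) VALUES (?, ?)",
    [(1, 2019), (1, 2020), (1, 2021), (2, 2019), (3, 2019), (4, 2018), (4, 2019)],
)

# Step 1 (subquery): keep only ids that have a 2019 row.
# Step 2 (GROUP BY / HAVING): of those, keep ids that appear in more than one year.
query = """
    SELECT id
    FROM files
    WHERE id IN (SELECT id FROM files WHERE year = 2019)
    GROUP BY id
    HAVING COUNT(DISTINCT year) > 1
    ORDER BY id
"""
duplicate_ids = [row[0] for row in conn.execute(query)]
print(duplicate_ids)  # [1, 4]
```

An id that exists only in 2018 and 2020 never passes the subquery, so it stays unmatched as the question requires.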

If the requirements are formulated correctly, the query would be:
select id, year, count(*)
from table_name tn
where tn.year = '2019'
and exists (select year
from table_name tn2
where tn2.id = tn.id and tn2.year = '2018')
and exists (select year
from table_name tn2
where tn2.id = tn.id and tn2.year = '2020')
and exists (select year
from table_name tn2
where tn2.id = tn.id and tn2.year = '2021')
group by tn.year
having count(*) > 1

Related

sql query to print id and sales from Table A for rows where sales of current year is greater than last year?

I want to do the following using SQL:
A query to print id and sales from Table a for rows where sales of current year is greater than last year
Let's take the following input for example:
id,sales,year
1,20k,1991
1,21k,1992
2,30k,1992
2,20k,1993
Added create table statement for reference.
CREATE TABLE a(id INT, sales INT, year INT);
INSERT INTO a VALUES(1, 20000, 1991);
INSERT INTO a VALUES(1, 21000, 1992);
INSERT INTO a VALUES(2, 30000, 1992);
INSERT INTO a VALUES(2, 20000, 1993);
Just use lag() . . . twice:
select a.*
from (select a.*,
lag(year) over (partition by id order by year) as prev_year,
lag(sales) over (partition by id order by year) as prev_sales
from a
) a
where prev_year = year - 1 and sales > prev_sales;
You need both lags to handle the case where years might be missing.
Something like this should also get you what you want, using a plain self-join (standard SQL, so it should work on any RDBMS):
select
b.id, b.sales, b.year
from TableA a
join TableA b
on a.id = b.id
and a.year = b.year-1
where a.sales < b.sales
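As a rough check of the lag() approach, the sketch below runs it on the sample rows in SQLite through Python's sqlite3 module. It assumes SQLite 3.25 or later (the first version with window functions), and the derived-table alias t is only illustrative.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE a (id INTEGER, sales INTEGER, year INTEGER)")
conn.executemany(
    "INSERT INTO a VALUES (?, ?, ?)",
    [(1, 20000, 1991), (1, 21000, 1992), (2, 30000, 1992), (2, 20000, 1993)],
)

# LAG needs an ORDER BY inside the window so that "previous row" means
# "previous year for the same id". Comparing prev_year with year - 1
# skips ids that have a gap between consecutive rows.
query = """
    SELECT id, sales, year
    FROM (
        SELECT a.*,
               LAG(year)  OVER (PARTITION BY id ORDER BY year) AS prev_year,
               LAG(sales) OVER (PARTITION BY id ORDER BY year) AS prev_sales
        FROM a
    ) AS t
    WHERE prev_year = year - 1 AND sales > prev_sales
"""
rows = conn.execute(query).fetchall()
print(rows)  # [(1, 21000, 1992)]
```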

SQL Select Case When Count > 1

I have a table that looks like the below.
ParentID | PersonID | Year
----------------------------
1 1 2019
1 2 2020
3 3 2019
3 4 2020
5 5 2019
I'm trying to figure out how to select the current PersonID when a ParentID has more than one record so my results would look like the below.
ParentID | PersonID | Year
----------------------------
1 2 2020
3 4 2020
5 5 2019
I can't select just the max PersonID because we sometimes create Person records for the previous year, in which case the PersonID is greater, and we still want to return this year's record. I also can't select based on year, because if they don't have a record for this year, we still need their most recent record, whichever year that is.
I've tried selecting this subset in half a dozen ways at this point and have only managed to make my brain hurt. Any assistance would be appreciated!
This is a typical greatest-n-per-group problem. To solve it, you need to think filtering rather than aggregation.
A portable solution is to filter with a correlated subquery that returns the latest year per parent_id:
select t.*
from mytable t
where t.year = (
select max(t1.year) from mytable t1 where t1.parent_id = t.parent_id
)
Assuming you are using MSSQL, this can be achieved with ROW_NUMBER. PARTITION BY divides the result set into partitions, and row numbers are assigned within each partition. Partitioning by ParentId and ordering by Year descending numbers each parent's rows from the most recent year down. The RowNo = 1 condition then keeps only the latest row per parent.
Create Table Test(ParentId int, PersonId int, Year int);
INSERT INTO Test values
(1, 1, 2019),
(1, 2, 2020),
(3, 3, 2019),
(3, 4, 2020),
(5, 5, 2019);
SELECT ParentId, PersonId, Year FROM
(
SELECT ROW_NUMBER() OVER(PARTITION BY ParentId
ORDER BY Year /* Use PersonId if it fits correctly */ DESC) AS RowNo,
ParentId, PersonId, Year from Test -- Table Name
) E WHERE ROWNo = 1
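The ROW_NUMBER approach can be tried outside MSSQL as well. The sketch below runs it in SQLite through Python's sqlite3 module, assuming SQLite 3.25 or later for window functions. The PersonId DESC tie-breaker is an addition for the case where a parent has two rows in the same year; it is not part of the original answers.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Test (ParentId INTEGER, PersonId INTEGER, Year INTEGER)")
conn.executemany(
    "INSERT INTO Test VALUES (?, ?, ?)",
    [(1, 1, 2019), (1, 2, 2020), (3, 3, 2019), (3, 4, 2020), (5, 5, 2019)],
)

# Number each parent's rows from the most recent year down, then keep row 1.
# PersonId DESC is an assumed tie-breaker for rows that share a year.
query = """
    SELECT ParentId, PersonId, Year
    FROM (
        SELECT ParentId, PersonId, Year,
               ROW_NUMBER() OVER (
                   PARTITION BY ParentId
                   ORDER BY Year DESC, PersonId DESC
               ) AS RowNo
        FROM Test
    ) AS ranked
    WHERE RowNo = 1
    ORDER BY ParentId
"""
latest_rows = conn.execute(query).fetchall()
print(latest_rows)  # [(1, 2, 2020), (3, 4, 2020), (5, 5, 2019)]
```

Because the ordering is driven by Year first, a previous-year record created later (and so carrying a larger PersonId) does not displace the current-year record.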

SELECT a subset (n=1) of records that have the same class in a category column

For the past few days I have been stuck on how to write a query. From a table like the one below, I want to select the records in which the category first changes its status.
Id Category Month
1 Start Jan
2 Start Feb
3 Middle Mar
4 Middle Apr
5 End May
From that table I want only the records in which the category changed. So I want my table from the SELECT to be like that:
Id Category Month
1 Start Jan
3 Middle Mar
5 End May
Thank you in advance for taking the time to answer my question.
Pull out minimum id for every category and get the month for that record:
select
t1.id, t1.category, t2.month
from (
select category, min(id) as id
from yourtable
group by category
) t1
join yourtable t2 on t1.id = t2.id and t1.category = t2.category -- second condition may not be necessary
If your id column is unique, then the condition on category is not needed.
The query above returns, for every category, the month of the row with the minimum id.
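If I'm understanding correctly, what you're after can be achieved using GROUP BY. Note that MIN(Month) is only correct if Month sorts chronologically; with text month names it returns the alphabetically first name in each category (for example Apr rather than Mar for Middle).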
SELECT MIN(id) as ID, Category, MIN(Month) as Month
FROM MyTable
GROUP BY Category
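The difference between the two answers shows up on the sample data. This is a small sketch in SQLite via Python's sqlite3 module; it assumes id is unique and that each category occupies one contiguous run of ids, as in the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE yourtable (id INTEGER, category TEXT, month TEXT)")
conn.executemany(
    "INSERT INTO yourtable VALUES (?, ?, ?)",
    [(1, "Start", "Jan"), (2, "Start", "Feb"), (3, "Middle", "Mar"),
     (4, "Middle", "Apr"), (5, "End", "May")],
)

# Join back on the minimum id so that month comes from the same row.
join_query = """
    SELECT t2.id, t2.category, t2.month
    FROM (SELECT category, MIN(id) AS id FROM yourtable GROUP BY category) AS t1
    JOIN yourtable AS t2 ON t2.id = t1.id
    ORDER BY t2.id
"""
first_rows = conn.execute(join_query).fetchall()

# Aggregating month separately picks the alphabetically smallest name instead.
min_query = """
    SELECT MIN(id) AS id, category, MIN(month) AS month
    FROM yourtable
    GROUP BY category
    ORDER BY MIN(id)
"""
min_rows = conn.execute(min_query).fetchall()

print(first_rows)  # [(1, 'Start', 'Jan'), (3, 'Middle', 'Mar'), (5, 'End', 'May')]
print(min_rows)    # [(1, 'Start', 'Feb'), (3, 'Middle', 'Apr'), (5, 'End', 'May')]
```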

SQL selecting where all distinct values exist in another column

I have a table in which the first two columns are Company, Year. Each company will have some years, but not necessarily all of them:
ABC | 2010
ABC | 2011
ABC | 2012
BBC | 2011 //does not have all the years, don't want to select it
I'd like to select a list of companies which have ALL the years (not just some of them), but I'm having trouble writing a select query to do that. I imagine this is really easy but I can't figure it out for some reason.
Try:
select company
from your_table
group by company
having count(distinct year) = (select count(distinct year) from your_table)
Select * FROM Company Where CompanyId In(
select CompanyId From Company
group by CompanyId
having count(*) = (select count(distinct Year) from Company)
)
http://www.sqlfiddle.com/#!3/c2f81/11
Note that if you already know how many years there should be, you can simply use that number instead of counting the distinct years:
SELECT Company
FROM Table
GROUP BY Company
HAVING COUNT(Distinct YEAR) = 3
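The count-comparison idea can be verified on the four sample rows. This sketch uses SQLite through Python's sqlite3 module; the table name company_years is a placeholder that does not appear in the question.

```python
import sqlite3

# Hypothetical table name `company_years`; rows are the sample from the question.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE company_years (company TEXT, year INTEGER)")
conn.executemany(
    "INSERT INTO company_years VALUES (?, ?)",
    [("ABC", 2010), ("ABC", 2011), ("ABC", 2012), ("BBC", 2011)],
)

# A company has every year when its distinct-year count equals the
# distinct-year count of the whole table.
query = """
    SELECT company
    FROM company_years
    GROUP BY company
    HAVING COUNT(DISTINCT year) = (SELECT COUNT(DISTINCT year) FROM company_years)
    ORDER BY company
"""
complete_companies = [row[0] for row in conn.execute(query)]
print(complete_companies)  # ['ABC']
```

Using COUNT(DISTINCT year) on both sides keeps the comparison valid even if a (company, year) pair is stored more than once.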

Split column in 3

Note: I tried a couple of the answers below (it's in Teradata, so some of the answers are giving me syntax errors everywhere).
I hit a brick wall here.
I want to compare year by year in different columns
ID, Year, Revenue
1, 2009, 10
1, 2009, 20
1, 2010, 20
2, 2009, 5
2, 2010, 50
2, 2010, 1
How do I separate it by both ID and Year?
At the end I would like it to look like this:
ID, Year, Sum
1, 2009, 30
1, 2010, 20
...
2, 2010, 51
(heavily edited for comprehension)
The best I can give you with the amount of detail you have provided is to break your table into subqueries:
select t1.yr - t2.yr from
(select yr
from the_table where yr = 2010) t1,
(select yr
from the_table where yr = 2009) t2
More detail could be given if we knew which type of database you are using, what the real structure of your table is, etc. but perhaps this will get you started.
Something like this:
select t2009.id, t2009.year, t2010.year, t2010.year - t2009.year diff
from
( select id, year
from mytable
where year = 2009
) t2009
,
( select id, year
from mytable
where year = 2010
) t2010
where t2009.id = t2010.id
You will most likely have to do a self-join
SELECT [what you are comparing] FROM [table] t1
[INNER/LEFT] JOIN [table] t2 ON t1.[someID] = t2.[someID]
WHERE t1.year = 2009 AND t2.year = 2010
The someID column would not necessarily have to be an ID, or even an indexed column, but it should be the column you are looking to compare across the years.
E.g. a table called 'Products' with columns/fields
ID
ProductName
Price
Year
You could do:
SELECT t1.ProductName, (t2.Price - t1.Price) As Price_change FROM Products t1
INNER JOIN Products t2 ON t1.ProductName = t2.ProductName
WHERE t1.year = 2009 AND t2.year = 2010
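To make the self-join concrete, here is a minimal sketch run in SQLite through Python's sqlite3 module. The product names and prices are invented purely for illustration and are not from the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Products (ID INTEGER, ProductName TEXT, Price INTEGER, Year INTEGER)"
)
# Made-up sample rows: Widget exists in both years, Gadget only in 2009.
conn.executemany(
    "INSERT INTO Products VALUES (?, ?, ?, ?)",
    [(1, "Widget", 10, 2009), (2, "Widget", 13, 2010), (3, "Gadget", 20, 2009)],
)

# Pair each product's 2009 row with its 2010 row and subtract the prices.
query = """
    SELECT t1.ProductName, (t2.Price - t1.Price) AS Price_change
    FROM Products AS t1
    INNER JOIN Products AS t2 ON t1.ProductName = t2.ProductName
    WHERE t1.Year = 2009 AND t2.Year = 2010
"""
price_changes = conn.execute(query).fetchall()
print(price_changes)  # [('Widget', 3)]
```

With an INNER JOIN, a product missing from either year (Gadget here) drops out of the result; a LEFT JOIN with the 2010 filter moved into the ON clause would keep it with a NULL change.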
This would be faster if ProductName were a primary key or an indexed column. It would likely also be faster than nested selects, which can be much slower than joins on an indexed column.
By your data and your desired output, I think you simply want this:
select ID, Year, SUM(Revenue)
from YourTable
GROUP BY ID, Year
Update
Now, if your first data sample is already a SELECT query, you need to:
select ID, Year, SUM(Revenue)
from (SELECT...) YourSelect
GROUP BY ID, Year
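The plain GROUP BY can be checked against the revenue rows from the question. The sketch below uses SQLite through Python's sqlite3 module rather than Teradata, so it only illustrates the grouping logic; the alias Total is an illustrative name.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE YourTable (ID INTEGER, Year INTEGER, Revenue INTEGER)")
conn.executemany(
    "INSERT INTO YourTable VALUES (?, ?, ?)",
    [(1, 2009, 10), (1, 2009, 20), (1, 2010, 20),
     (2, 2009, 5), (2, 2010, 50), (2, 2010, 1)],
)

# One output row per (ID, Year) pair, with the revenues of that pair summed.
query = """
    SELECT ID, Year, SUM(Revenue) AS Total
    FROM YourTable
    GROUP BY ID, Year
    ORDER BY ID, Year
"""
totals = conn.execute(query).fetchall()
print(totals)  # [(1, 2009, 30), (1, 2010, 20), (2, 2009, 5), (2, 2010, 51)]
```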
This looks like a good candidate for ROLLUP, a GROUP BY extension. In addition to the per-group sums, it adds subtotal rows for each grouping level and a grand total:
GROUP BY ROLLUP (ID,Year)