SQL Select Case When Count > 1 - sql

I have a table that looks like the below.
ParentID | PersonID | Year
----------------------------
1 1 2019
1 2 2020
3 3 2019
3 4 2020
5 5 2019
I'm trying to figure out how to select the current PersonID when a ParentID has more than one record so my results would look like the below.
ParentID | PersonID | Year
----------------------------
1 2 2020
3 4 2020
5 5 2019
I can't select just the max PersonID because we sometimes create Person records for the previous year, in which case the PersonID is greater, and we still want to return this year's record. I also can't select based on year, because if they don't have a record for this year, we still need their most recent record, whichever year that is.
I've tried selecting this subset in half a dozen ways at this point and have only managed to make my brain hurt. Any assistance would be appreciated!!

This is a typical greatest-n-per-group problem. To solve it, you need to think filtering rather than aggregation.
A portable solution is to filter with a correlated subquery that returns the latest year per parent_id:
select t.*
from mytable t
where t.year = (
select max(t1.year) from mytable t1 where t1.parent_id = t.parent_id
)
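As a quick check of the correlated-subquery approach, here is a minimal sketch that runs it against the question's sample rows using Python's built-in sqlite3 module. The table name mytable and the snake_case column names follow the answer above rather than the question's schema, and SQLite is only a stand-in for whatever database you actually use.

```python
import sqlite3

# In-memory SQLite database loaded with the sample rows from the question.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE mytable (parent_id INTEGER, person_id INTEGER, year INTEGER)"
)
conn.executemany(
    "INSERT INTO mytable VALUES (?, ?, ?)",
    [(1, 1, 2019), (1, 2, 2020), (3, 3, 2019), (3, 4, 2020), (5, 5, 2019)],
)

# Keep only the rows whose year equals the latest year of their parent_id.
latest_rows = conn.execute(
    """
    SELECT t.parent_id, t.person_id, t.year
    FROM mytable t
    WHERE t.year = (
        SELECT MAX(t1.year) FROM mytable t1 WHERE t1.parent_id = t.parent_id
    )
    ORDER BY t.parent_id
    """
).fetchall()

print(latest_rows)
```

Note that if a parent has two rows in its latest year, this filter returns both of them; a ROW_NUMBER-based query with an extra tie-breaker in the ORDER BY is one way to force a single row.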

Assuming you are using MSSQL, this can be achieved with ROW_NUMBER. PARTITION BY divides the result set into partitions, and row numbers are assigned within each partition. Partitioning by ParentId and ordering by Year descending numbers each parent's rows from newest to oldest, so the RowNo = 1 condition keeps only the most recent row per ParentId.
Create Table Test(ParentId int, PersonId int, Year int);
INSERT INTO Test values
(1, 1, 2019),
(1, 2, 2020),
(3, 3, 2019),
(3, 4, 2020),
(5, 5, 2019);
SELECT ParentId, PersonId, Year FROM
(
SELECT ROW_NUMBER() OVER(PARTITION BY ParentId
ORDER BY Year /* Use PersonId if it fits correctly */ DESC) AS RowNo,
ParentId, PersonId, Year from Test -- Table Name
) E WHERE RowNo = 1

Related

Checking conditions per group, and ranking most recent row?

I'm handling a table like so:
Name   | Status | Date
--------------------------
Alfred | 1      | Jan 1 2023
Alfred | 2      | Jan 2 2023
Alfred | 3      | Jan 2 2023
Alfred | 4      | Jan 3 2023
Bob    | 1      | Jan 1 2023
Bob    | 3      | Jan 2 2023
Carl   | 1      | Jan 5 2023
Dan    | 1      | Jan 8 2023
Dan    | 2      | Jan 9 2023
I'm trying to setup a query so I can handle the following:
I'd like to pull the most recent status per Name,
SELECT MAX(Date), Status, Name
FROM test_table
GROUP BY Status, Name
Additionally I'd like in the same query to be able to pull if the user has ever had a status of 2, regardless of if the most recent one is 2 or not
WITH has_2_table AS (
SELECT DISTINCT Name, TRUE as has_2
FROM test_table
WHERE Status = 2 )
And then maybe joining the above on a left join on Name?
But having these as two separate queries and joining them feels clunky to me, especially since I'd like to add additional columns and other checks. Is there a better way to set this up in a single query, or is this the most efficient way?
You said, "I'd like to add additional columns" so I interpret that to mean you would like to Select the entire most recent record and add an 'ever-2' column.
You can either do this by joining two queries, or use window functions. Not knowing Snowflake Cloud Data, I cannot tell you which is more efficient.
Join 2 Queries
Select A.*, Coalesce(B.Ever2, 'No') as Ever2
From (
Select * From test_table x
Where date=(Select max(date) From test_table y
Where x.name=y.name)
) A Left Outer Join (
Select name, 'Yes' as Ever2 From test_table
Where status=2
Group By name
) B On A.name=B.name
The first subquery can also be written as an Inner Join if correlated subqueries are implemented badly on your platform.
Use of Window Functions
Select * From (
Select row_number() Over (Partition by name Order by date desc, status desc) as bestrow,
A.*,
Coalesce(max(Case When status=2 Then 'Yes' End) Over (Partition By name Rows Between Unbounded Preceding And Unbounded Following), 'No') as Ever2
From test_table A
)
Where bestrow=1
This second query type always reads and sorts the entire test_table so it might not be the most efficient.
Given that you have a different partitioning on the two aggregations, you could try going with window functions instead:
SELECT DISTINCT Name,
MAX(Date) OVER(
PARTITION BY Name, Status
) AS lastdate,
MAX(CASE WHEN Status = 2 THEN 1 ELSE 0 END) OVER(
PARTITION BY Name
) AS status2
FROM tab
I'd like to pull the most recent status per name […] Additionally I'd like in the same query to be able to pull if the user has ever had a status of 2.
Snowflake has sophisticated aggregate functions.
Using group by, we can get the latest status with arrays and check for a given status with boolean aggregation:
select name, max(date) max_date,
get(array_agg(status) within group (order by date desc), 0) last_status,
boolor_agg(status = 2) has_status2
from mytable
group by name
We could also use window functions and qualify:
select name, date as max_date,
status as last_status,
boolor_agg(status = 2) over(partition by name) has_status2
from mytable
qualify rank() over(partition by name order by date desc) = 1
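To sanity-check the "latest row plus ever-had-status-2 flag" idea without a Snowflake account, the sketch below runs an equivalent query in SQLite through Python's sqlite3 module. SQLite has no QUALIFY or BOOLOR_AGG, so this version filters on ROW_NUMBER in a derived table and uses MAX over a 0/1 comparison instead. The ISO date strings and the status DESC tie-breaker are assumptions added for the example, and window functions require SQLite 3.25 or later.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE test_table (name TEXT, status INTEGER, date TEXT)")
conn.executemany(
    "INSERT INTO test_table VALUES (?, ?, ?)",
    [
        ("Alfred", 1, "2023-01-01"), ("Alfred", 2, "2023-01-02"),
        ("Alfred", 3, "2023-01-02"), ("Alfred", 4, "2023-01-03"),
        ("Bob", 1, "2023-01-01"), ("Bob", 3, "2023-01-02"),
        ("Carl", 1, "2023-01-05"),
        ("Dan", 1, "2023-01-08"), ("Dan", 2, "2023-01-09"),
    ],
)

# rn = 1 marks the most recent row per name; has_status2 is 1 when any row
# of that name has status 2, regardless of which row is the latest.
rows = conn.execute(
    """
    SELECT name, date AS max_date, status AS last_status, has_status2
    FROM (
        SELECT name, date, status,
               ROW_NUMBER() OVER (
                   PARTITION BY name ORDER BY date DESC, status DESC
               ) AS rn,
               MAX(status = 2) OVER (PARTITION BY name) AS has_status2
        FROM test_table
    ) ranked
    WHERE rn = 1
    ORDER BY name
    """
).fetchall()

for row in rows:
    print(row)
```

Because both window functions partition by name, further per-name checks can be added as extra columns in the inner select without another join.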

Find duplicates on the basis of a condition in sql

So, I want to find duplicate IDs in a table on the basis of the condition.
I have multiple ids for files with year 2018, 2019, 2020, 2021. Ids can overlap between the files across years.
I want to find all the duplicate ids present in 2019 year, which are also present in rest of the years.
So if:
id | year
---------
1  | 2019
1  | 2020
1  | 2021
2  | 2019
3  | 2019
4  | 2018
4  | 2019
I want:
id
1
4
Note: I only want IDs specific to 2019. If an Id is present in 2018 and 2020, that should go unmatched.
this is what I tried:
select id from table
intersect
select id from table where year='2019'
Thanks in advance!
Assuming you want to grab all of the rows where the year is 2019, you can use the very simple query:
SELECT * FROM TABLE WHERE year = '2019'
If you exclusively want to return IDs 1 and 4 for this year, you can use:
SELECT * FROM TABLE WHERE year = '2019' AND id IN ('1', '4')
If instead you want to return the years where there are duplicates you can use GROUP BY and HAVING:
SELECT id, year, COUNT(*)
FROM TABLE
WHERE year = '2019'
GROUP BY id, year
HAVING COUNT(*) > 1
Note that you'll need to replace TABLE with your table name.
Try this: you can achieve exactly what you want by using EXISTS and HAVING as below:
CREATE TABLE #test(id INT, year INT)
INSERT INTO #test(id, year) VALUES
(1, 2019),
(1, 2020),
(1, 2021),
(2, 2019),
(3, 2019),
(4, 2018),
(4, 2019)
SELECT t.id
FROM #test t
WHERE EXISTS(SELECT 1 FROM #test t1 WHERE t.id = t1.id AND t1.year = 2019)
GROUP BY id HAVING COUNT(t.id) > 1
So there are two conditions. The first is a simple row filter, year = 2019. The second is a condition on the group: if you group by id, it can be written as having count(*) > 1.
However, you cannot simply combine where year = 2019 with that having clause, because where is applied before grouping: only the 2019 rows take part in the grouping, so every count is 1.
This can be written with a subquery to avoid the problem:
select id
from table
where id in (select id from table where year = 2019)
group by id
having count(*) > 1
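The difference between the two formulations is easy to see on the sample data. The sketch below, using Python's sqlite3 module, runs both the naive where-plus-having query and the subquery version; the table name files is a hypothetical stand-in for the real table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE files (id INTEGER, year INTEGER)")
conn.executemany(
    "INSERT INTO files VALUES (?, ?)",
    [(1, 2019), (1, 2020), (1, 2021), (2, 2019), (3, 2019), (4, 2018), (4, 2019)],
)

# Naive version: WHERE removes every non-2019 row before grouping,
# so each id has a count of 1 and nothing passes the HAVING filter.
naive_ids = [
    r[0]
    for r in conn.execute(
        "SELECT id FROM files WHERE year = 2019 "
        "GROUP BY id HAVING COUNT(*) > 1 ORDER BY id"
    )
]

# Subquery version: restrict to ids that appear in 2019, but count
# their rows across all years.
duplicate_ids = [
    r[0]
    for r in conn.execute(
        "SELECT id FROM files "
        "WHERE id IN (SELECT id FROM files WHERE year = 2019) "
        "GROUP BY id HAVING COUNT(*) > 1 ORDER BY id"
    )
]

print(naive_ids, duplicate_ids)
```

This assumes each (id, year) pair occurs at most once; if an id could appear twice within 2019 alone, counting distinct years (COUNT(DISTINCT year) > 1) would be closer to the stated requirement.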
If you have formulated the requirements correctly, the query would be:
select id, year, count(*)
from table_name tn
where tn.year = '2019'
and exists (select year
from table_name tn2
where tn2.id = tn.id and tn2.year = '2018')
and exists (select year
from table_name tn2
where tn2.id = tn.id and tn2.year = '2020')
and exists (select year
from table_name tn2
where tn2.id = tn.id and tn2.year = '2021')
group by tn.year
having count(*) > 1

SQL - How to select row by compare date from 2 table

I have 2 tables like this:
Table1:
ID | COMPANY_NAME | Rank | FIRST_REGIST_DATE
1 A 1 2017-09-01
2 B 2 2017-09-05
Table 2:
ID | COMPANY_NAME | RANK | FIRST_REGIST_DATE
1 A 3 2017-09-03
2 C 4 2017-09-04
I need to SELECT each company with its FIRST_REGIST_DATE and RANK.
If a company has two first registration dates, we choose the earlier date and the greater RANK
(e.g. company A above will have the first date 2017-09-01).
The expected result looks like this:
Company A - Rank 3 - Date:2017-09-01
Please help me write the SELECT for this case.
This technically answers the question but avoids the elephant in the room (which ID takes preference?). As both tables have IDs that may overlap (B and C both have an ID of 2), rules need to be defined as to which ID takes preference and what the other table's IDs will be renamed to.
Select COMPANY_NAME
,MIN(FIRST_REGIST_DATE) as REGIST_DATE
from (
SELECT *
FROM #table1
UNION ALL
SELECT *
FROM #table2
) t3
Group by COMPANY_NAME
OP, don't change your question (by adding RANK) after it has been answered.
For your changes (kindly contributed by #toha):
Select COMPANY_NAME
,MIN(FIRST_REGIST_DATE) as REGIST_DATE
,MAX(RANK ) as RANK
from ( SELECT *
FROM #table1
UNION ALL
SELECT *
FROM #table2 ) t3
Group by COMPANY_NAME
If I understand the question correctly you have two tables with data containing company details and first registration date and you want to show the row with the earliest first registration date. The following query will help you.
SELECT company_name, MIN(first_regist_date) AS first_regist_date
FROM (
SELECT company_name, first_regist_date
FROM table1
UNION ALL
SELECT company_name, first_regist_date
FROM table2
) tab1
GROUP BY company_name
The above query will combine the results of the first table and the second table and then show you the details of the company along with the oldest registration date.
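For reference, here is a small sketch of the UNION ALL plus GROUP BY approach run against the question's sample data with Python's sqlite3 module, including the MAX on rank from the edited question. The column is named company_rank here purely to avoid clashing with the RANK function name; that rename is an assumption of this example, not something from the original tables.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
for table in ("table1", "table2"):
    conn.execute(
        f"CREATE TABLE {table} (id INTEGER, company_name TEXT, "
        "company_rank INTEGER, first_regist_date TEXT)"
    )
conn.executemany(
    "INSERT INTO table1 VALUES (?, ?, ?, ?)",
    [(1, "A", 1, "2017-09-01"), (2, "B", 2, "2017-09-05")],
)
conn.executemany(
    "INSERT INTO table2 VALUES (?, ?, ?, ?)",
    [(1, "A", 3, "2017-09-03"), (2, "C", 4, "2017-09-04")],
)

# Stack both tables, then take the earliest date and the greatest rank
# per company. The two aggregates are independent of each other.
companies = conn.execute(
    """
    SELECT company_name,
           MIN(first_regist_date) AS regist_date,
           MAX(company_rank) AS company_rank
    FROM (
        SELECT company_name, company_rank, first_regist_date FROM table1
        UNION ALL
        SELECT company_name, company_rank, first_regist_date FROM table2
    ) combined
    GROUP BY company_name
    ORDER BY company_name
    """
).fetchall()

print(companies)
```

Company A comes out with date 2017-09-01 and rank 3, matching the expected result in the question, even though those two values come from different source rows.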

Finding the first occurrence of an element in a SQL database

I have a table with a column for customer names, a column for purchase amount, and a column for the date of the purchase. Is there an easy way I can find how much first time customers spent on each day?
So I have
Name | Purchase Amount | Date
Joe 10 9/1/2014
Tom 27 9/1/2014
Dave 36 9/1/2014
Tom 7 9/2/2014
Diane 10 9/3/2014
Larry 12 9/3/2014
Dave 14 9/5/2014
Jerry 16 9/6/2014
And I would like something like
Date | Total first Time Purchase
9/1/2014 73
9/3/2014 22
9/6/2014 16
Can anyone help me out with this?
The following is standard SQL and works on any DBMS that supports window functions:
select date,
sum(purchaseamount) as total_first_time_purchase
from (
select date,
purchaseamount,
row_number() over (partition by name order by date) as rn
from the_table
) t
where rn = 1
group by date;
The derived table (the inner select) numbers each customer's purchases by date, the where rn = 1 filter keeps only the "first time" purchases, and the outer query aggregates them by date.
The two key concepts here are aggregates and sub-queries. The details of which DBMS you're using may change the exact implementation, but the basic concept is the same:
1. For each name, determine their first date.
2. Using the results of 1, find each person's first-day purchase amount.
3. Using the results of 2, sum the amounts for each date.
In SQL Server, it could look like this:
select Date, [totalFirstTimePurchases] = sum(PurchaseAmount)
from (
select t.Date, t.PurchaseAmount, t.Name
from table1 t
join (
select Name, [firstDate] = min(Date)
from table1
group by Name
) f on t.Name=f.Name and t.Date=f.firstDate
) ftp
group by Date
If you are using SQL Server you can accomplish this with either sub-queries or CTEs (Common Table Expressions). Since there is already an answer with sub-queries, here is the CTE version.
First the following will identify each row where there is a first time purchase and then get the sum of those values grouped by date:
;WITH cte
AS (
SELECT [Name]
,PurchaseAmount
,[date]
,ROW_NUMBER() OVER (
PARTITION BY [Name] ORDER BY [date] --start at 1 for each name at the earliest date and count up, reset every time the name changes
) AS rn
FROM yourTableName
)
SELECT [date]
,sum(PurchaseAmount) AS TotalFirstTimePurchases
FROM cte
WHERE rn = 1
GROUP BY [date]
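The answers above can be checked against the question's sample data. The sketch below uses Python's sqlite3 module and the MIN-date join variant, since it needs no window-function support. The table name purchases, the snake_case column names, and the ISO date strings are assumptions made for this example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE purchases (name TEXT, purchase_amount INTEGER, purchase_date TEXT)"
)
conn.executemany(
    "INSERT INTO purchases VALUES (?, ?, ?)",
    [
        ("Joe", 10, "2014-09-01"), ("Tom", 27, "2014-09-01"),
        ("Dave", 36, "2014-09-01"), ("Tom", 7, "2014-09-02"),
        ("Diane", 10, "2014-09-03"), ("Larry", 12, "2014-09-03"),
        ("Dave", 14, "2014-09-05"), ("Jerry", 16, "2014-09-06"),
    ],
)

# Step 1: first purchase date per customer (subquery f).
# Step 2: join back to pick up the purchases made on that first date.
# Step 3: sum those purchases per date.
totals = conn.execute(
    """
    SELECT t.purchase_date, SUM(t.purchase_amount) AS total_first_time_purchase
    FROM purchases t
    JOIN (
        SELECT name, MIN(purchase_date) AS first_date
        FROM purchases
        GROUP BY name
    ) f ON t.name = f.name AND t.purchase_date = f.first_date
    GROUP BY t.purchase_date
    ORDER BY t.purchase_date
    """
).fetchall()

print(totals)
```

One difference worth knowing: if a customer makes several purchases on their first day, the join variant counts all of them, while the ROW_NUMBER variants count only one.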

SQL query to select most recent of duplicates

I have a table of values, with a date stored against each entry for example
Name
Age
PaymentAmount
Date
Can someone help me to write a query that would show the most recent payment only of any person within a certain age range.
E.g If I had 5 entries, and wanted the most recent payment of all people aged 20-25
Allan, 45, $1500, 1/1/2014
Tim, 22, $1500, 1/2/2001
John, 25, $2000, 2/3/2001
Tim, 22, $2500, 1/2/2010
John, 25, $3000, 2/3/2010
It would return the bottom 2 rows only
You didn't state your DBMS, so this is ANSI SQL
select *
from (
select name,
age,
PaymentAmount,
Date,
row_number() over (partition by name order by date desc) as rn
from the_table
where age between 20 and 25
) t
where rn = 1;
Another option is to use a correlated subquery:
select name,age,paymentamount,date
from the_table t1
where age between 20 and 25
and date = (select max(date)
from the_table t2
where t2.name = t1.name
and t2.age between 20 and 25)
order by name;
Usually the solution with a window function is faster than the correlated subquery, as only a single access to the table is needed.
SQLFiddle: http://sqlfiddle.com/#!15/17e37/4
Btw: having a column named age is a bit suspicious because you need to update that every year. You should rather store the date of birth and then calculate the age when retrieving the data.
This query returns the records with the most recent payment date among people aged 20 to 25. Limit the result with TOP 2, LIMIT 2, or rownum <= 2, depending on your DB syntax.
SELECT NAME,AGE,PAYMENTAMOUNT,DATE FROM MY_TABLE
WHERE AGE BETWEEN 20 AND 25
AND DATE IN
(
SELECT MAX(DATE)
FROM MY_TABLE
WHERE
AGE BETWEEN 20 AND 25
);
EDIT as per horse_with_no_name:
SELECT NAME,AGE,PAYMENTAMOUNT,DATE
FROM the_table
WHERE AGE BETWEEN 20 AND 25
AND DATE IN
(
SELECT (DATE)
FROM the_table
WHERE
AGE BETWEEN 20 AND 25 order by date desc limit 2
)
limit 2;
Fiddle reference : http://sqlfiddle.com/#!15/17e37/10
Simplest of all, try the following query:
select name,age,paymentamount,date from yourtablename where date in (select max(date) from yourtablename where age between 20 and 25 group by name);
You should create the table with an identity column to make your life easier:
ColumnPrimaryKey IDENTITY (1,1)
Name
Age
PaymentAmount
Date
SELECT TOP 2 * FROM [TableName] WHERE Age BETWEEN 20 AND 25 ORDER BY [ColumnPrimaryKey] DESC
The above query returns the two most recently inserted rows in that age range.
You can use BETWEEN like this:
select * from meta where title='$title' and (date between '$start_date' and '$end_date')
Okay, I know you said SQL-- here's for people with two layers.
VIA SQL:
Order your SQL results by date descending (should be newest to oldest...).
VIA YOUR "BACK END":
Create an empty final set.
As you are iterating through your results, if your result row person is not in your final set, add the data to the final set.
Boom, your final set has the latest of each person.
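The back-end approach above can be sketched in Python as follows. The in-memory list stands in for rows already fetched from the database, the sort mimics an ORDER BY date DESC, and the dates are read as month/day/year, which is an assumption about the question's sample data.

```python
from datetime import date

# Sample rows from the question: (name, age, payment_amount, payment_date).
payments = [
    ("Allan", 45, 1500, date(2014, 1, 1)),
    ("Tim", 22, 1500, date(2001, 1, 2)),
    ("John", 25, 2000, date(2001, 2, 3)),
    ("Tim", 22, 2500, date(2010, 1, 2)),
    ("John", 25, 3000, date(2010, 2, 3)),
]

# Stand-in for the SQL side: filter by age and order newest first.
rows = sorted(
    (p for p in payments if 20 <= p[1] <= 25),
    key=lambda p: p[3],
    reverse=True,
)

# Back-end side: the first row seen for each person is their latest one,
# so later (older) rows for the same name are skipped.
latest = {}
for name, age, amount, paid_on in rows:
    if name not in latest:
        latest[name] = (age, amount, paid_on)

print(latest)
```

This keeps the SQL trivial at the cost of transferring every matching row to the application, so it tends to suit small result sets better than large ones.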