Select entire partition where max row in partition is greater than 1 - sql

I'm partitioning by some non unique identifier, but I'm only concerned in the partitions with at least two results. What would be the way to get out all the instances where there's exactly one of the specified identifier?
Query I'm using:
SELECT ROW_NUMBER() OVER
(PARTITION BY nonUniqueId ORDER BY nonUniqueId, aTimeStamp) as row
,nonUniqueId
,aTimeStamp
FROM myTable
What I'm getting:
row | nonUniqueId | aTimeStamp
---------------------------------
1 | 1234 | 2014-10-08...
2 | 1234 | 2014-10-09...
1 | 1235 | 2014-10-08...
1 | 1236 | 2014-10-08...
2 | 1236 | 2014-10-09...
What I want:
row | nonUniqueId | aTimeStamp
---------------------------------
1 | 1234 | 2014-10-08...
2 | 1234 | 2014-10-09...
1 | 1236 | 2014-10-08...
2 | 1236 | 2014-10-09...
Thanks for any direction :)

Based on syntax, I'm assuming this is SQL Server 2005 or higher. My answer will be meant for that.
You have a couple options.
One, use a CTE:
;WITH CTE AS (
SELECT ROW_NUMBER() OVER
(PARTITION BY nonUniqueId ORDER BY nonUniqueId, aTimeStamp) as row
,nonUniqueId
,aTimeStamp
FROM myTable
)
SELECT *
FROM CTE t
WHERE EXISTS (SELECT 1 FROM CTE WHERE row = 2 and nonUniqueId = t.nonUniqueId);
Or, you can use subqueries:
SELECT ROW_NUMBER() OVER
(PARTITION BY nonUniqueId ORDER BY nonUniqueId, aTimeStamp) as row
,nonUniqueId
,aTimeStamp
FROM myTable t
WHERE EXISTS (SELECT 1 FROM myTable
WHERE nonUniqueId = t.nonUniqueId GROUP BY nonUniqueId, aTimeStamp HAVING COUNT(*) >= 2);

Related

Select the highest value of column 2 per column 1

Given the following table P_PROV
+----+-----------+-----------+
| id | date | person_id |
+----+-----------+-----------+
| 1 |19/06/2019 | 1 |
| 2 |18/07/2010 | 2 |
| 3 |19/06/2020 | 1 |
| 4 |17/06/2020 | 2 |
| 5 |28/06/2020 | 3 |
+----+-----------+-----------+
I want this output
+----+-----------+-----------+
| id | date | person_id |
+----+-----------+-----------+
| 3 |19/06/2020 | 1 |
| 4 |17/06/2020 | 2 |
| 5 |28/06/2020 | 3 |
+----+-----------+-----------+
Putting this in words, I want to return per person the maximum date. I tried something like this
SELECT DISTINCT pp.date, pp.id FROM P_PROV pp
WHERE (SELECT MAX(aa.date)
FROM P_PROV aa) = pp.date;
This one is only returning one row (of course, because the MAX will return the maximum date only), but I really don't know how to approach this issue, any kind of help would be appreciated
ROW_NUMBER provides one way to handle this:
SELECT id, date, person_id
FROM
(
SELECT t.*, ROW_NUMBER() OVER (PARTITION BY person_id ORDER BY date DESC) rn
FROM yourTable t
) t
WHERE rn = 1;
Oracle has a fun way to do this using aggregation:
select max(id) keep (dense_rank first order by date desc) as id,
max(date) as date, person_id
from P_PROV
group by person_id;
Given that your ids are increasing, this probably also does what you want:
select max(id) as id, max(date) as date, person_id
from P_PROV
group by person_id;

SQL query for selecting multiple records for one product for a single id

My table looks like this, what I'm trying to achieve is to pull out all the records for one user for the product that have the earliest date
product |type_id| user | Date |Desired ROW_NUMBER as output |
-------+--------+------+-------+---------------------
1 | 1 | A | 0101 | 1
1 | 1 | A | 0102 | 1
2 | 3 | A | 0105 | 2
2 | 5 | A | 0105 | 2
3 | 7 | B | 0101 | 1
3 | 8 | B | 0104 | 1
So I want to pull all the records with "1" in the desired row_num column, but I haven't figured out hot to get this without doing another group by. Any helps would be appreciated.
You can use window functions:
select t.*
from (select t.*,
rank() over (partition by user order by min_date) as seqnum
from (select t.*,
min(date) over (partition by user, product) as min_date
from t
) t
) t
where seqnum = 1;
Or, with only one subquery:
select t.*
from (select t.*,
min(date) over (partition by user, product) as min_date_up,
min(date) over (partition by user) as min_date_u
from t
) t
where min_date_u = min_date_up;
You can interpret this as "return all rows where the product has the minimum date for the user".
Here is a db<>fiddle.
SELECT * FROM [tableName] WHERE Desired ROW_NUMBER = 1 ORDER BY Date[DESC, ASC]
Pass the Desired ROW_NUMBER value dynamically as a parameter.

Using ROW_NUMBER () to Compare 1st ARRAY to 2nd, 3rd, 4th, etc

I'm using ROW_NUMBER and I'm trying to compare arr in rn 1 to arr in rn 2,3,4,etc
to see if they overlap. I can do this with a subquery / simple join. Is there a way that AVOIDS a join?
rn | id | job | arr |desired_result
---+----+-----+--------+---------
1 | 1 | 100 | {1,2} | {1,2}
2 | 1 | 101 | {2,3} | {1,2}
3 | 1 | 102 | {5,6,8}| {1,2}
4 | 1 | 103 | {2,7} | {1,2}
I made a dbfiddle
--USING JOIN
WITH a AS (
SELECT
ROW_NUMBER() OVER (PARTITION BY id ORDER by job) as rn
,*
FROM a_table
)
SELECT *
FROM (
SELECT id,arr
FROM a
WHERE rn = 1
) x
JOIN a
ON a.id=x.id
You can use first_value():
SELECT a.*, first_value(arr) over (partition by id order by job)
FROM a_table a;
row_number() does not seem necessary.

How to use DISTINCT ON (of PostgreSQL) in Firebird?

I have a TempTable with datas:
------------------------------------
| KEY_1 | KEY 2 | NAME | VALUE |
------------------------------------
| 1 | 0001 | NAME 2 | VALUE 1 |
| 1 | 0002 | NAME 1 | VALUE 3 |
| 1 | 0003 | NAME 3 | VALUE 2 |
| 2 | 0001 | NAME 1 | VALUE 2 |
| 2 | 0001 | NAME 2 | VALUE 1 |
------------------------------------
I want to get the following data:
------------------------------------
| KEY_1 | KEY 2 | NAME | VALUE |
------------------------------------
| 1 | 0001 | NAME 2 | VALUE 1 |
| 2 | 0001 | NAME 1 | VALUE 2 |
------------------------------------
In PostgreSQL, I use a query with DISTINCT ON:
SELECT DISTINCT ON (KEY_1) KEY_1, KEY_2, NAME, VALUE
FROM TempTable
ORDER BY KEY_1, KEY_2
In Firebird, how to get data as above datas?
PostgreSQL's DISTINCT ON takes the first row per stated group key considering the ORDER BY clause. In other DBMS (including later versions of Firebird), you'd use ROW_NUMBER for this. You number the rows per group key in the desired order and stay with those numbered #1.
select key_1, key_2, name, value
from
(
select key_1, key_2, name, value,
row_number() over (partition by key_1 order by key_2) as rn
from temptable
) numbered
where rn = 1
order by key_1, key_2;
In your example you have a tie (key_1 = 2 / key_2 = 0001 occurs twice) and the DBMS picks one of the rows arbitrarily. (You'd have to extend the sortkey both in DISTINCT ON and ROW_NUMBER to decide which to pick.) If you want two rows, i.e. showing all tied rows, you'd use RANK (or DENSE_RANK) instead of ROW_NUMBER, which is something DISTINCT ON is not capable of.
Firebird 3.0 supports window functions, so you can use:
select . . .
from (select t.*,
row_number() over (partition by key_1 order by key_2) as seqnum
from temptable t
) t
where seqnum = 1;
In earlier versions, you can use several methods. Here is a correlated subquery:
select t.*
from temptable t
where t.key_2 = (select max(t2.key_2)
from temptable t2
where t2.key_1 = t.key_1
);
Note: This will still return duplicate values for key_1 because of the duplicates for key_2. Alas . . . getting just one row is tricky unless you have a unique identifier for each row.

How can I remove partial duplicates in my MS SQL Database?

I have a database table which is automatically filled from different sources. Now I have the problem that there are some duplicate entries.
For example:
EID | TID | StartDate | EndDate
--------------------------------------------
1 | 1 | 20.01.2012 | 23.01.2012
1 | 2 | 25.01.2012 | 26.01.2012
1 | 3 | 27.01.2012 | 30.01.2012
2 | 2 | 20.02.2012 | 23.02.2012
2 | 2 | 25.01.2012 | 26.01.2012
3 | 1 | 20.01.2012 | 23.01.2012
As you can see, there are two rows in which EID and TID are the same. What I am trying to achieve is, that one the row, where the date is higher is deleted.
The only workaround I found, is a query where only the lower ones are selected.
SELECT EID, TID, Min(StartDate), Min(EndDate) FROM Table1 GROUP BY EID, TID
You can use a CTE and the ROW_NUMBER function:
WITH CTE AS
(
SELECT EID, TID, StartDate, EndDate,
RN = ROW_NUMBER() OVER (PARTITION BY EID, TID ORDER BY StartDate, EndDate)
FROM Table1
)
DELETE FROM CTE WHERE RN > 1
DEMO