SQL Server partition by gives duplicate records - sql

I have the following table:
Date | ID | firstname
---------+----+------------
20161128 | 1 | Adam
20161128 | 2 | Steve
20161128 | 2 | Steve
20161128 | 3 | Aaron
20161129 | 1 | Adam
20161129 | 2 | Steve
20161129 | 2 | Steve
20161129 | 3 | Aaron
I want to get the first row by ID for one particular date.
So what I had was:
SELECT *
FROM tableA
WHERE Date = 20161128
This, however, returns all records for that date. So I used row_number() with a partition:
SELECT
*,
row_number() over(partition by ID order by Date desc)
FROM tableA
WHERE Date = 20161128
In this case, I get the following result:
Date | ID | firstname | rownum
---------+----+-----------+-------
20161129 | 1 | Adam | 1
20161129 | 1 | Adam | 2
20161129 | 2 | Steve | 1
20161129 | 2 | Steve | 2
20161129 | 2 | Steve | 3
20161129 | 2 | Steve | 4
20161129 | 2 | Steve | 5
20161129 | 2 | Steve | 6
20161129 | 3 | Aaron | 1
20161129 | 3 | Aaron | 2
As you can see, most IDs appear twice (ID 2 even appears 6 times). In other cases, I see a record appear 10 times even though the first query would return it only once.
Any idea why this happens and how it can be fixed? My guess would be the date/WHERE clause, but I don't see how it could affect the result this much.

You need an outer WHERE clause on the row number if you want to keep only the first row per ID:
SELECT a.*
FROM (SELECT a.*,
row_number() over(partition by ID order by Date desc) as seqnum
FROM tableA a
WHERE a.Date = '20161128'
) a
WHERE seqnum = 1;
This returns one row per ID for the selected date.
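To check the behaviour on the sample data from the question, here is a minimal sketch that loads those eight rows into an in-memory SQLite database and runs the same subquery. SQLite (via Python's standard sqlite3 module) is only a stand-in for SQL Server here, and it needs a build with window-function support (3.25 or later); the TEXT type for Date is an assumption, since the question does not show the column type.

```python
import sqlite3

# In-memory SQLite stands in for SQL Server (an assumption for this sketch).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tableA (Date TEXT, ID INTEGER, firstname TEXT)")
conn.executemany(
    "INSERT INTO tableA VALUES (?, ?, ?)",
    [
        ("20161128", 1, "Adam"), ("20161128", 2, "Steve"),
        ("20161128", 2, "Steve"), ("20161128", 3, "Aaron"),
        ("20161129", 1, "Adam"), ("20161129", 2, "Steve"),
        ("20161129", 2, "Steve"), ("20161129", 3, "Aaron"),
    ],
)

# Number the rows inside each ID, then keep only the first one in the outer query.
first_rows = conn.execute(
    """
    SELECT Date, ID, firstname
    FROM (SELECT a.*,
                 row_number() OVER (PARTITION BY ID ORDER BY Date DESC) AS seqnum
          FROM tableA a
          WHERE a.Date = '20161128') a
    WHERE seqnum = 1
    ORDER BY ID
    """
).fetchall()

for row in first_rows:
    print(row)
```

The duplicated Steve row for 20161128 gets seqnum 2 and is dropped by the outer filter, so each ID comes back once.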

You can replace
SELECT *,
row_number() over(partition by ID order by Date desc)
FROM tableA
WHERE Date = 20161128
with
SELECT *
FROM tableA
WHERE ID = (select min(ID) from tableA )

This will only display the first instance.
Select * from
(SELECT *,
rownum=row_number() over(partition by ID order by Date desc)
FROM tableA
WHERE Date = 20161128)x where rownum =1

Related

Get some values from the table by selecting

I have a table:
| id | Number |Address
| -----| ------------|-----------
| 1 | 0 | NULL
| 1 | 1 | NULL
| 1 | 2 | 50
| 1 | 3 | NULL
| 2 | 0 | 10
| 3 | 1 | 30
| 3 | 2 | 20
| 3 | 3 | 20
| 4 | 0 | 75
| 4 | 1 | 22
| 4 | 2 | 30
| 5 | 0 | NULL
I need to get: the NUMBER of the last ADDRESS change for each ID.
I wrote this select:
select dh.id, dh.number from table dh where dh =
(select max(min(t.history)) from table t where t.id = dh.id group by t.address)
But this select does not correctly handle the case where the address first changes and then changes back to a previous value. For example, for id=1 the group by returns:
| Number |
| -------- |
| NULL |
| 50 |
I have been thinking about this select for several days, and I will be happy to receive any help.
You can do this using row_number() -- twice:
select t.id, min(number)
from (select t.*,
row_number() over (partition by id order by number desc) as seqnum1,
row_number() over (partition by id, address order by number desc) as seqnum2
from t
) t
where seqnum1 = seqnum2
group by id;
What this does is enumerate the rows by number in descending order:
Once per id.
Once per id and address.
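The two values are equal only for the rows in the most recent run of the same address: as soon as an older row has a different address, its per-id number runs ahead of its per-id-and-address number. The aggregation then pulls back the earliest row in that run, which is where the last address change happened.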
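A minimal sketch of the double row_number() idea on the sample rows from the question, run through Python's sqlite3 module. SQLite is an assumption here (the question looks like Oracle), as are the table name t and the INTEGER column types; the query text itself is the one given above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (id INTEGER, number INTEGER, address INTEGER)")
conn.executemany(
    "INSERT INTO t VALUES (?, ?, ?)",
    [
        (1, 0, None), (1, 1, None), (1, 2, 50), (1, 3, None),
        (2, 0, 10),
        (3, 1, 30), (3, 2, 20), (3, 3, 20),
        (4, 0, 75), (4, 1, 22), (4, 2, 30),
        (5, 0, None),
    ],
)

# seqnum1 counts rows per id, seqnum2 counts rows per (id, address).
# They agree only on the trailing run of rows that share the latest address.
last_change = dict(
    conn.execute(
        """
        SELECT id, min(number)
        FROM (SELECT t.*,
                     row_number() OVER (PARTITION BY id ORDER BY number DESC) AS seqnum1,
                     row_number() OVER (PARTITION BY id, address ORDER BY number DESC) AS seqnum2
              FROM t) t
        WHERE seqnum1 = seqnum2
        GROUP BY id
        """
    ).fetchall()
)

print(last_change)
```

For id=1 the address goes NULL, NULL, 50, NULL, so only the row with number 3 belongs to the trailing run, which is the case the original GROUP BY attempt got wrong.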
I ended up answering my own question. In case anyone needs it, here is my solution:
select * from table dh1 where dh1.number = (
select max(x.number)
from (
select
dh2.id, dh2.number, dh2.address, lag(dh2.address) over(order by dh2.number asc) as prev
from table dh2 where dh1.id=dh2.id
) x
where NVL(x.address, 0) <> NVL(x.prev, 0)
);

SQL group by a field and only return one joined row for each grouping

Table data
+-----+----------------+--------+----------------+
| ID | Required_by | Name | Another_Field |
+-----+----------------+--------+----------------+
| 1 | 7 August | cat | X |
| 2 | 7 August | cat | Y |
| 3 | 10 August | cat | Z |
| 4 | 11 August | dog | A |
+-----+----------------+--------+----------------+
What I want to do is group by the name, then for each group choose one of the rows with the earliest required by date.
For this data set, I would like to end up with either rows 1 and 4, or rows 2 and 4.
Expected result:
+-----+----------------+--------+----------------+
| ID | Required_by | Name | Another_Field |
+-----+----------------+--------+----------------+
| 1 | 7 August | cat | X |
| 4 | 11 August | dog | A |
+-----+----------------+--------+----------------+
OR
+-----+----------------+--------+----------------+
| ID | Required_by | Name | Another_Field |
+-----+----------------+--------+----------------+
| 2 | 7 August | cat | Y |
| 4 | 11 August | dog | A |
+-----+----------------+--------+----------------+
I have something that returns 1,2 and 4 but I'm not sure how to only pick one from the first group to get the desired result. I'm joining the grouping with the data table so that I can get the ID and another_field back after the grouping.
SELECT d.id, d.name, d.required_by, d.another_field
FROM
(
SELECT min(required_by) as min_date, name
FROM data
GROUP BY name
) agg
INNER JOIN
data d
on d.required_by = agg.min_date AND d.name = agg.name
This is typically solved using window functions:
select d.id, d.name, d.required_by, d.another_field
from (
select id, name, required_by, another_field,
row_number() over (partition by name order by required_by) as rn
from data
) d
where d.rn = 1;
In Postgres using distinct on() is typically faster:
select distinct on (name) *
from data
order by name, required_by
SELECT [id]
,[date]
,[name]
FROM [test].[dbo].[data]
WHERE date IN (SELECT min(date) FROM data GROUP BY name)

Calculate how many rows are ahead of position in column when condition is met

How can I calculate how many people are ahead of Jane on Floor 2 (not including those on floor 1)?
+------+---------+----------+
|Index | Name | Floor |
+------+---------+----------+
| 1 | Sally | 1 |
| 2 | Sue | 1 |
| 3 | Fred | 1 |
| 4 | Wally | 2 |
| 5 | Tommy | 2 |
| 6 | Jane | 2 |
| 7 | Bart | 2 |
| 8 | Sam | 3 |
+------+---------+----------+
The expected result is 2 as there are 2 people (Wally & Tommy) ahead of Jane on floor 2.
I've tried using CHARINDEX to find the row number from a temp table that I've generated but that doesn't seem to work:
SELECT CHARINDEX('Jane', Name) as position
INTO #test
FROM tblExample
WHERE Floor = 2
select ROW_NUMBER() over (order by position) from #test
WHERE position = 1
I think a simple row_number() would do the trick
Select Value = RN-1
From (
Select *
,RN = row_number() over (partition by [floor] order by [index])
From YourTable
Where [Floor]=2
) A
Where [Name]='Jane'
You could do:
select count(*)
from t
where t.floor = 2 and
t.id < (select t2.id from t t2 where t2.name = 'Jane' and t2.floor = 2);
With an index on (floor, name, id), I would expect this to be faster than row_number().
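As a quick cross-check, this sketch runs both approaches against the sample data and compares the results. It uses Python's sqlite3 module as a stand-in for SQL Server (an assumption), names the table tblExample as in the question, and quotes "Index" because it is a reserved word in SQLite.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute('CREATE TABLE tblExample ("Index" INTEGER, Name TEXT, Floor INTEGER)')
conn.executemany(
    "INSERT INTO tblExample VALUES (?, ?, ?)",
    [
        (1, "Sally", 1), (2, "Sue", 1), (3, "Fred", 1),
        (4, "Wally", 2), (5, "Tommy", 2), (6, "Jane", 2), (7, "Bart", 2),
        (8, "Sam", 3),
    ],
)

# Approach 1: position on the floor minus one.
ahead_by_row_number = conn.execute(
    """
    SELECT rn - 1
    FROM (SELECT Name,
                 row_number() OVER (PARTITION BY Floor ORDER BY "Index") AS rn
          FROM tblExample
          WHERE Floor = 2) a
    WHERE Name = 'Jane'
    """
).fetchone()[0]

# Approach 2: count the rows on the same floor with a smaller index.
ahead_by_count = conn.execute(
    """
    SELECT count(*)
    FROM tblExample t
    WHERE t.Floor = 2
      AND t."Index" < (SELECT t2."Index"
                       FROM tblExample t2
                       WHERE t2.Name = 'Jane' AND t2.Floor = 2)
    """
).fetchone()[0]

print(ahead_by_row_number, ahead_by_count)
```

Both queries count Wally and Tommy. The scalar subquery in the second one assumes there is exactly one Jane on floor 2.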

SQL: Select single item per name with multiple criteria

I'm trying to select a single item per value in a "Name" column according to several criteria.
The criteria I want to use look like this:
Only include results where IsEnabled = 1
Return the single result with the lowest priority (we're using 1 to mean "top priority")
In case of a tie, return the result with the newest Timestamp
I've seen several other questions that ask about returning the newest timestamp for a given value, and I've been able to adapt that to return the minimum value of Priority - but I can't figure out how to filter off of both Priority and Timestamp.
Here is the question that's been most helpful in getting me this far.
Sample data:
+------+------------+-----------+----------+
| Name | Timestamp | IsEnabled | Priority |
+------+------------+-----------+----------+
| A | 2018-01-01 | 1 | 1 |
| A | 2018-03-01 | 1 | 5 |
| B | 2018-01-01 | 1 | 1 |
| B | 2018-03-01 | 0 | 1 |
| C | 2018-01-01 | 1 | 1 |
| C | 2018-03-01 | 1 | 1 |
| C | 2018-05-01 | 0 | 1 |
| C | 2018-06-01 | 1 | 5 |
+------+------------+-----------+----------+
Desired output:
+------+------------+-----------+----------+
| Name | Timestamp | IsEnabled | Priority |
+------+------------+-----------+----------+
| A | 2018-01-01 | 1 | 1 |
| B | 2018-01-01 | 1 | 1 |
| C | 2018-03-01 | 1 | 1 |
+------+------------+-----------+----------+
What I've tried so far (this gets me only enabled items with lowest priority, but does not filter for the newest item in case of a tie):
SELECT DATA.Name, DATA.Timestamp, DATA.IsEnabled, DATA.Priority
From MyData AS DATA
INNER JOIN (
SELECT MIN(Priority) Priority, Name
FROM MyData
GROUP BY Name
) AS Temp ON DATA.Name = Temp.Name AND DATA.Priority = TEMP.Priority
WHERE IsEnabled=1
Here is a SQL fiddle as well.
How can I enhance this query to only return the newest result in addition to the existing filters?
Use row_number():
select d.*
from (select d.*,
row_number() over (partition by name order by priority, timestamp desc) as seqnum
from mydata d
where isenabled = 1
) d
where seqnum = 1;
The most effective way that I've found for these problems is using CTEs and ROW_NUMBER():
WITH CTE AS(
SELECT *, ROW_NUMBER() OVER( PARTITION BY Name ORDER BY Priority, TimeStamp DESC) rn
FROM MyData
WHERE IsEnabled = 1
)
SELECT Name, Timestamp, IsEnabled, Priority
From CTE
WHERE rn = 1;
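Here is a small sketch that runs the CTE version against the sample data from the question, to show the tie-break in action. It uses Python's sqlite3 module instead of SQL Server (an assumption), with Timestamp stored as ISO-formatted text so that string ordering matches date ordering.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE MyData (Name TEXT, Timestamp TEXT, IsEnabled INTEGER, Priority INTEGER)"
)
conn.executemany(
    "INSERT INTO MyData VALUES (?, ?, ?, ?)",
    [
        ("A", "2018-01-01", 1, 1), ("A", "2018-03-01", 1, 5),
        ("B", "2018-01-01", 1, 1), ("B", "2018-03-01", 0, 1),
        ("C", "2018-01-01", 1, 1), ("C", "2018-03-01", 1, 1),
        ("C", "2018-05-01", 0, 1), ("C", "2018-06-01", 1, 5),
    ],
)

# Lowest Priority first; among equal priorities the newest Timestamp wins.
best_rows = conn.execute(
    """
    WITH CTE AS (
        SELECT *,
               ROW_NUMBER() OVER (PARTITION BY Name
                                  ORDER BY Priority, Timestamp DESC) AS rn
        FROM MyData
        WHERE IsEnabled = 1
    )
    SELECT Name, Timestamp, IsEnabled, Priority
    FROM CTE
    WHERE rn = 1
    ORDER BY Name
    """
).fetchall()

for row in best_rows:
    print(row)
```

The DESC on Timestamp is what picks 2018-03-01 for C. With an ascending sort the query would return 2018-01-01 instead, which does not match the desired output.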

Sql two table query most duplicated foreign key

I got those two tables sport and student:
First table sport:
|idsport | name |
_______________________
| 1 | bobsled |
| 2 | skating |
| 3 | boarding |
| 4 | iceskating |
| 5 | skiing |
Second table student:
foreign key
|idstudent | name | sport_idsport
__________________________________________
| 1 | john | 3 |
| 2 | pauly | 2 |
| 3 | max | 1 |
| 4 | jane | 2 |
| 5 | nico | 5 |
So far I have the query below. It outputs the sport id that occurs most often, but I can't get it to work with two tables:
SELECT sport_idsport
FROM (SELECT sport_idsport FROM student GROUP BY sport_idsport ORDER BY COUNT(*) desc)
WHERE ROWNUM<=1;
I need to output the name of the most popular sport; in this case it would be skating.
I'm using Oracle SQL.
with counter as (
Select sport_idsport,
count(*) as cnt,
dense_rank() over (order by count(*) desc) as rn
from student
group by sport_idsport
)
select s.*, c.cnt
from sport s
join counter c on c.sport_idsport = s.idsport and c.rn = 1;
SQLFiddle example: http://sqlfiddle.com/#!4/b76e21/1
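The same idea can be tried locally with the sketch below, which uses Python's sqlite3 module rather than Oracle (an assumption). It splits the counting and the ranking into two CTEs so that each step is explicit; the CTE name counts is introduced here for illustration and is not part of the original answer.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sport (idsport INTEGER PRIMARY KEY, name TEXT)")
conn.execute(
    "CREATE TABLE student (idstudent INTEGER PRIMARY KEY, name TEXT, sport_idsport INTEGER)"
)
conn.executemany(
    "INSERT INTO sport VALUES (?, ?)",
    [(1, "bobsled"), (2, "skating"), (3, "boarding"), (4, "iceskating"), (5, "skiing")],
)
conn.executemany(
    "INSERT INTO student VALUES (?, ?, ?)",
    [(1, "john", 3), (2, "pauly", 2), (3, "max", 1), (4, "jane", 2), (5, "nico", 5)],
)

# Step 1: count students per sport. Step 2: rank the counts. Step 3: join to get the name.
most_popular = conn.execute(
    """
    WITH counts AS (
        SELECT sport_idsport, count(*) AS cnt
        FROM student
        GROUP BY sport_idsport
    ),
    counter AS (
        SELECT sport_idsport, cnt,
               dense_rank() OVER (ORDER BY cnt DESC) AS rn
        FROM counts
    )
    SELECT s.name, c.cnt
    FROM sport s
    JOIN counter c ON c.sport_idsport = s.idsport AND c.rn = 1
    ORDER BY s.name
    """
).fetchall()

print(most_popular)
```

Because dense_rank() gives tied counts the same rank, this returns every sport that shares the top count, whereas a rownum = 1 filter keeps a single arbitrary row among the ties.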
select cnt, sport_idsport from (
select count(*) cnt, sport_idsport
from student
group by sport_idsport
order by count(*) desc
)
where rownum = 1