SQL: Return only first occurrence

I seldom use SQL and can't find anything similar in my archive, so I'm asking this simple query question: I need a query that returns each personID with only its first seenTime.
Records:
seenID | personID | seenTime
108 | 3 | 13:34
109 | 2 | 13:56
110 | 3 | 14:22
111 | 3 | 14:31
112 | 4 | 15:04
113 | 2 | 15:52
Wanted result:
personID | seenTime
3 | 13:34
2 | 13:56
4 | 15:04
This is what I tried, and it failed:
SELECT t.seenID, t.seenPersonID, t.seenTime
FROM (SELECT ROW_NUMBER() OVER (PARTITION BY seenID ORDER BY seenID) AS RowNo,
seenID,
seenPersonID,
seenTime
FROM personAttendances) t
WHERE t.RowNo=1
P.S.: Note that I'm using SQL Server Compact (SQL CE) 4.

If your seenTime increases as seenID increases:
select personID, min(seenTime) as seenTime
from personAttendances
group by personID
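The following is a minimal sketch that runs the GROUP BY / MIN query against an in-memory copy of the question's sample rows. It uses Python's standard sqlite3 module. Storing seenTime as 'HH:MM' text is an assumption. It only sorts correctly because the values are zero-padded.

```python
import sqlite3

# In-memory database holding the sample rows from the question.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE personAttendances (seenID INTEGER PRIMARY KEY, personID INTEGER, seenTime TEXT)"
)
conn.executemany(
    "INSERT INTO personAttendances VALUES (?, ?, ?)",
    [(108, 3, "13:34"), (109, 2, "13:56"), (110, 3, "14:22"),
     (111, 3, "14:31"), (112, 4, "15:04"), (113, 2, "15:52")],
)

# One row per person: the smallest seenTime recorded for that personID.
rows = conn.execute(
    """
    SELECT personID, MIN(seenTime) AS seenTime
    FROM personAttendances
    GROUP BY personID
    ORDER BY MIN(seenTime)
    """
).fetchall()
print(rows)  # [(3, '13:34'), (2, '13:56'), (4, '15:04')]
```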
Update for another case:
If this is not the case, and you really want the seenTime that corresponds with the minimum seenID (assuming seenID is unique):
select a.personID, a.seenTime
from personAttendances as a
join (
-- Get the min seenID for each personID
select personID, min(seenID) as seenID
from personAttendances
group by personID
) as b on a.personID = b.personID
where a.seenID = b.seenID
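The sketch below shows how the join differs from MIN(seenTime), using Python's sqlite3 module. It uses the question's rows plus one hypothetical extra row (seenID 114) that was recorded later but carries an earlier time. The join keeps the row with the smallest seenID, while MIN(seenTime) picks the earliest time.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE personAttendances (seenID INTEGER PRIMARY KEY, personID INTEGER, seenTime TEXT)"
)
# Question's rows plus a hypothetical row 114 where ID order and time order disagree.
conn.executemany(
    "INSERT INTO personAttendances VALUES (?, ?, ?)",
    [(108, 3, "13:34"), (109, 2, "13:56"), (110, 3, "14:22"),
     (111, 3, "14:31"), (112, 4, "15:04"), (113, 2, "15:52"),
     (114, 4, "14:50")],
)

# Keep the row whose seenID is the smallest for its personID.
rows = conn.execute(
    """
    SELECT a.personID, a.seenTime
    FROM personAttendances AS a
    JOIN (SELECT personID, MIN(seenID) AS seenID
          FROM personAttendances
          GROUP BY personID) AS b
      ON a.personID = b.personID AND a.seenID = b.seenID
    ORDER BY a.seenID
    """
).fetchall()

# For comparison: the earliest time per person, regardless of seenID.
by_time = dict(conn.execute(
    "SELECT personID, MIN(seenTime) FROM personAttendances GROUP BY personID"
).fetchall())
print(rows)        # [(3, '13:34'), (2, '13:56'), (4, '15:04')]
print(by_time[4])  # 14:50
```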

You're making it way too difficult:
select personID, min(seenTime)
from personAttendances
group by personID

For PostgreSQL, there is DISTINCT ON.

You need to partition by the person and order by seen time, not by seen id:
PARTITION BY seenPersonID ORDER BY seenTime
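Here is a minimal SQLite sketch of the ROW_NUMBER approach, run through Python's sqlite3 module. The partition is on the person column, which is personID in the sample data. It assumes SQLite 3.25 or later, which is the first version with window functions. The second query shows why partitioning by the unique seenID cannot work: every row becomes number 1 of its own partition.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE personAttendances (seenID INTEGER PRIMARY KEY, personID INTEGER, seenTime TEXT)"
)
conn.executemany(
    "INSERT INTO personAttendances VALUES (?, ?, ?)",
    [(108, 3, "13:34"), (109, 2, "13:56"), (110, 3, "14:22"),
     (111, 3, "14:31"), (112, 4, "15:04"), (113, 2, "15:52")],
)

# ROW_NUMBER restarts at 1 for every personID, counting rows in seenTime order.
rows = conn.execute(
    """
    SELECT personID, seenTime
    FROM (SELECT personID, seenTime,
                 ROW_NUMBER() OVER (PARTITION BY personID ORDER BY seenTime) AS RowNo
          FROM personAttendances) AS t
    WHERE t.RowNo = 1
    ORDER BY seenTime
    """
).fetchall()

# Partitioning by the unique seenID instead: each row is alone in its partition.
kept_by_seen_id = conn.execute(
    """
    SELECT COUNT(*)
    FROM (SELECT ROW_NUMBER() OVER (PARTITION BY seenID ORDER BY seenTime) AS RowNo
          FROM personAttendances) AS s
    WHERE s.RowNo = 1
    """
).fetchone()[0]
print(rows, kept_by_seen_id)
```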

Add this to your SQL:
and not exists
(select 1 from personAttendances t2
where t.personID=t2.personID
and t2.seenID < t.seenID)
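A standalone version of the NOT EXISTS filter is sketched below with Python's sqlite3 module and the question's rows. A row is kept only if no other row for the same person has a smaller seenID.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE personAttendances (seenID INTEGER PRIMARY KEY, personID INTEGER, seenTime TEXT)"
)
conn.executemany(
    "INSERT INTO personAttendances VALUES (?, ?, ?)",
    [(108, 3, "13:34"), (109, 2, "13:56"), (110, 3, "14:22"),
     (111, 3, "14:31"), (112, 4, "15:04"), (113, 2, "15:52")],
)

# Anti-join: discard any row that has an earlier seenID for the same person.
rows = conn.execute(
    """
    SELECT t.personID, t.seenTime
    FROM personAttendances AS t
    WHERE NOT EXISTS (SELECT 1
                      FROM personAttendances AS t2
                      WHERE t2.personID = t.personID
                        AND t2.seenID < t.seenID)
    ORDER BY t.seenID
    """
).fetchall()
print(rows)  # [(3, '13:34'), (2, '13:56'), (4, '15:04')]
```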

Related

BigQuery row_number to remove duplicates

I want to keep only the row with the latest timestamp for each ID. Is there a more optimal and efficient way to solve the problem?
A query that I tried:
SELECT * except(row_number)
FROM (
SELECT
*,
ROW_NUMBER()
OVER (PARTITION BY ID)
row_number
FROM employees
)
WHERE row_number = 1
employees table:
ID NAME DEPARTMENT UPDATED_AT
1 James IT 2019-05-21 12:13:14
1 James IT 2019-05-21 12:14:14
1 James IT 2019-05-21 12:18:14
2 Pam HR 2019-05-26 13:18:14
2 Pam HR 2019-05-26 14:18:14
3 David IT 2019-06-22 14:18:14
3 David IT 2019-06-23 12:18:14
Desired result:
ID NAME DEPARTMENT UPDATED_AT
1 James IT 2019-05-21 12:18:14
2 Pam HR 2019-05-26 14:18:14
3 David IT 2019-06-23 12:18:14
You are just missing the ORDER BY inside the OVER clause of your subquery.
WITH
DATA AS (
SELECT
ROW_NUMBER() OVER (PARTITION BY ID ORDER BY UPDATED_AT DESC) AS _row,
*
FROM
employees )
SELECT
* EXCEPT(_row)
FROM
DATA
WHERE
_row = 1
Alternatively, the same filter can be written with QUALIFY:
SELECT *
FROM employees
WHERE TRUE
QUALIFY ROW_NUMBER() OVER (PARTITION BY ID ORDER BY UPDATED_AT DESC) = 1
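SQLite has neither SELECT * EXCEPT nor QUALIFY. The sketch below therefore checks only the ROW_NUMBER() ... ORDER BY UPDATED_AT DESC logic, using Python's sqlite3 module. It lists the output columns explicitly and filters in the outer query. It assumes SQLite 3.25 or later and stores UPDATED_AT as ISO text, which sorts chronologically.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employees (ID INTEGER, NAME TEXT, DEPARTMENT TEXT, UPDATED_AT TEXT)")
conn.executemany(
    "INSERT INTO employees VALUES (?, ?, ?, ?)",
    [(1, "James", "IT", "2019-05-21 12:13:14"),
     (1, "James", "IT", "2019-05-21 12:14:14"),
     (1, "James", "IT", "2019-05-21 12:18:14"),
     (2, "Pam", "HR", "2019-05-26 13:18:14"),
     (2, "Pam", "HR", "2019-05-26 14:18:14"),
     (3, "David", "IT", "2019-06-22 14:18:14"),
     (3, "David", "IT", "2019-06-23 12:18:14")],
)

# Number rows newest-first within each ID, then keep row 1 (the latest update).
rows = conn.execute(
    """
    WITH data AS (
        SELECT e.*,
               ROW_NUMBER() OVER (PARTITION BY ID ORDER BY UPDATED_AT DESC) AS _row
        FROM employees AS e
    )
    SELECT ID, NAME, DEPARTMENT, UPDATED_AT
    FROM data
    WHERE _row = 1
    ORDER BY ID
    """
).fetchall()
print(rows)
```

Without the ORDER BY inside OVER, the database may number the rows of each ID in any order. In that case "row 1" is not guaranteed to be the latest one.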

Select based on max date from another table

I'm trying to write a simple SELECT query that gets the country from another table, based on the MAX Last Update.
Main table:
Order#
1
2
3
4
The other table contains the country and the last update:
Order# | Cntry | Last Update
1 |  | 12/21/2019 9:19 PM
1 | US | 1/10/2020 1:07 AM
2 | JP | 7/29/2020 12:15 PM
3 | CA | 4/12/1992 2:04 PM
3 | GB | 11/6/2001 9:26 AM
3 | DK | 2/1/2005 3:04 AM
4 | CN | 8/20/2013 12:04 AM
4 |  | 10/1/2015 4:04 PM
My desired result:
Order# | Country
1 | US
2 | JP
3 | DK
4 |
I'm not sure what the right solution is. So far I'm stuck with this:
SELECT Main.[Order#], tempTable.Cntry
FROM Main
LEFT JOIN (
SELECT [Order#], Cntry, Max([Last Update]) as LatestDate FROM Country
GROUP BY [Order#], Cntry
) as tempTable ON Main.[Order#] = tempTable.[Order#];
Thanks in advance!
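If you only need the order number and country, you may not need two tables. Note that LAST_VALUE needs a frame that reaches UNBOUNDED FOLLOWING. With the default frame, which ends at the current row, it just returns each row's own country:
SELECT DISTINCT [Order#], Country
FROM
(
SELECT [Order#], LAST_VALUE(Cntry) OVER (PARTITION BY [Order#] ORDER BY [Last Update] ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) AS Country FROM Country
) X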
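This is a minimal SQLite sketch of the LAST_VALUE approach, run through Python's sqlite3 module. It assumes SQLite 3.25 or later. The columns are renamed to the hypothetical order_no, cntry and last_update to avoid the # character and reserved words. The dates are converted to ISO text so they sort chronologically, and the blank countries are stored as NULL. The second query leaves out the explicit frame to show what goes wrong.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Country (order_no INTEGER, cntry TEXT, last_update TEXT)")
conn.executemany(
    "INSERT INTO Country VALUES (?, ?, ?)",
    [(1, None, "2019-12-21 21:19"), (1, "US", "2020-01-10 01:07"),
     (2, "JP", "2020-07-29 12:15"),
     (3, "CA", "1992-04-12 14:04"), (3, "GB", "2001-11-06 09:26"),
     (3, "DK", "2005-02-01 03:04"),
     (4, "CN", "2013-08-20 00:04"), (4, None, "2015-10-01 16:04")],
)

# Explicit full frame: LAST_VALUE sees the whole partition, so it returns the latest country.
rows = conn.execute(
    """
    SELECT DISTINCT order_no,
           LAST_VALUE(cntry) OVER (
               PARTITION BY order_no ORDER BY last_update
               ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) AS country
    FROM Country
    ORDER BY order_no
    """
).fetchall()

# Default frame ends at the current row, so every row keeps its own country.
default_frame = conn.execute(
    """
    SELECT DISTINCT order_no,
           LAST_VALUE(cntry) OVER (PARTITION BY order_no ORDER BY last_update) AS country
    FROM Country
    """
).fetchall()
print(rows)                # [(1, 'US'), (2, 'JP'), (3, 'DK'), (4, None)]
print(len(default_frame))  # 8
```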
In SQL Server, you can use a correlated subquery:
update main
set country = (select top (1) s.country
from secondtable s
where s.order# = main.order#
order by s.lastupdate desc
);
EDIT:
A select would look quite similar:
select m.*,
(select top (1) s.country
from secondtable s
where s.order# = m.order#
order by s.lastupdate desc
) as country
from main m
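The correlated-subquery SELECT can be checked with the minimal sketch below, which uses Python's sqlite3 module. SQLite has no TOP (1), so ORDER BY ... DESC LIMIT 1 inside the scalar subquery plays that role. The table names orders and Country and the columns order_no, cntry and last_update are hypothetical stand-ins for the question's tables. Dates are stored as ISO text.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (order_no INTEGER PRIMARY KEY)")
conn.executemany("INSERT INTO orders VALUES (?)", [(1,), (2,), (3,), (4,)])
conn.execute("CREATE TABLE Country (order_no INTEGER, cntry TEXT, last_update TEXT)")
conn.executemany(
    "INSERT INTO Country VALUES (?, ?, ?)",
    [(1, None, "2019-12-21 21:19"), (1, "US", "2020-01-10 01:07"),
     (2, "JP", "2020-07-29 12:15"),
     (3, "CA", "1992-04-12 14:04"), (3, "GB", "2001-11-06 09:26"),
     (3, "DK", "2005-02-01 03:04"),
     (4, "CN", "2013-08-20 00:04"), (4, None, "2015-10-01 16:04")],
)

# For each order, the scalar subquery returns the country of its most recent update.
rows = conn.execute(
    """
    SELECT m.order_no,
           (SELECT s.cntry
            FROM Country AS s
            WHERE s.order_no = m.order_no
            ORDER BY s.last_update DESC
            LIMIT 1) AS country
    FROM orders AS m
    ORDER BY m.order_no
    """
).fetchall()
print(rows)  # [(1, 'US'), (2, 'JP'), (3, 'DK'), (4, None)]
```

Because the subquery sits in the select list, every order in the main table appears exactly once. An order with no rows in Country would simply get NULL.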
I don't have time to try it with sample data, but is that what you are looking for?
select c.[Order#], c.Cntry
from Country c
where c.[Last Update] =
(select max(c2.[Last Update]) from Country c2 where c2.[Order#] = c.[Order#])

SQL: Take maximum value, but if a field is missing for a particular ID, ignore all values

This is somewhat difficult to explain. (I'm using SQL Assistant for Teradata, which I'm not very familiar with.)
ID creation_date completion_date Difference
123 5/9/2016 5/16/2016 7
123 5/14/2016 5/16/2016 2
456 4/26/2016 4/30/2016 4
456 (null) 4/30/2016 (null)
789 3/25/2016 3/31/2016 6
789 3/1/2016 3/31/2016 30
An ID may have more than one creation_date, but it will always have the same completion_date. If the creation_date is populated for all records for an ID, I want to return the record with the most recent creation_date. However, if ANY creation_date for a given ID is missing, I want to ignore all records associated with this ID.
Given the data above, I would want to return:
ID creation_date completion_date Difference
123 5/14/2016 5/16/2016 2
789 3/25/2016 3/31/2016 6
No records are returned for 456 because the second record has a missing creation_date. The record with the most recent creation_date is returned for 123 and 789.
Any help would be greatly appreciated. Thanks!
Depending on your database, here's one option using row_number to get the max date per group. You can then filter those results with not exists to check against null values:
select *
from (
select *,
row_number() over (partition by id order by creation_date desc) rn
from yourtable
) t
where rn = 1 and not exists (
select 1
from yourtable t2
where t2.creation_date is null and t.id = t2.id
)
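Below is a minimal sketch of the row_number plus NOT EXISTS query, run on the question's data with Python's sqlite3 module rather than Teradata. It assumes SQLite 3.25 or later. Dates are stored as ISO text and the missing values as NULL. The table name yourtable comes from the answer.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE yourtable (id INTEGER, creation_date TEXT, completion_date TEXT, difference INTEGER)"
)
conn.executemany(
    "INSERT INTO yourtable VALUES (?, ?, ?, ?)",
    [(123, "2016-05-09", "2016-05-16", 7), (123, "2016-05-14", "2016-05-16", 2),
     (456, "2016-04-26", "2016-04-30", 4), (456, None, "2016-04-30", None),
     (789, "2016-03-25", "2016-03-31", 6), (789, "2016-03-01", "2016-03-31", 30)],
)

# rn = 1 marks the latest creation_date per id; NOT EXISTS drops ids with any NULL date.
rows = conn.execute(
    """
    SELECT id, creation_date, completion_date, difference
    FROM (SELECT t0.*,
                 ROW_NUMBER() OVER (PARTITION BY id ORDER BY creation_date DESC) AS rn
          FROM yourtable AS t0) AS t
    WHERE rn = 1
      AND NOT EXISTS (SELECT 1
                      FROM yourtable AS t2
                      WHERE t2.creation_date IS NULL AND t2.id = t.id)
    ORDER BY id
    """
).fetchall()
print(rows)
```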
row_number is a window function that is supported in many databases. MySQL only added it in version 8.0; in earlier versions you can achieve the same result using user-defined variables.
Here is a more generic version using conditional aggregation:
select t.*
from yourtable t
join (select id, max(creation_date) max_creation_date
from yourtable
group by id
having count(case when creation_date is null then 1 end) = 0
) t2 on t.id = t2.id and t.creation_date = t2.max_creation_date
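The conditional-aggregation version needs no window functions. The sketch below runs it on the same data with Python's sqlite3 module. COUNT ignores NULLs, so the CASE expression counts only rows whose creation_date is missing, and HAVING rejects any id where that count is non-zero.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE yourtable (id INTEGER, creation_date TEXT, completion_date TEXT, difference INTEGER)"
)
conn.executemany(
    "INSERT INTO yourtable VALUES (?, ?, ?, ?)",
    [(123, "2016-05-09", "2016-05-16", 7), (123, "2016-05-14", "2016-05-16", 2),
     (456, "2016-04-26", "2016-04-30", 4), (456, None, "2016-04-30", None),
     (789, "2016-03-25", "2016-03-31", 6), (789, "2016-03-01", "2016-03-31", 30)],
)

# The derived table keeps only ids with no NULL creation_date, plus their latest date;
# joining back on (id, latest date) recovers the full row.
rows = conn.execute(
    """
    SELECT t.id, t.creation_date, t.completion_date, t.difference
    FROM yourtable AS t
    JOIN (SELECT id, MAX(creation_date) AS max_creation_date
          FROM yourtable
          GROUP BY id
          HAVING COUNT(CASE WHEN creation_date IS NULL THEN 1 END) = 0) AS t2
      ON t.id = t2.id AND t.creation_date = t2.max_creation_date
    ORDER BY t.id
    """
).fetchall()
print(rows)
```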

Getting a row with two group by constraints

I have a table
TIMESTAMP ID Name
5/30/2016 11:45 1 Ben
5/30/2016 11:45 2 Ben
5/30/2016 23:15 2 Ben
5/30/2016 7:30 1 Peter
5/30/2016 6:05 1 Peter
5/30/2016 14:40 2 May
5/30/2016 1:05 1 May
Now, I need to get the MIN timestamp for each distinct Name.
If more than one row has that minimum timestamp, choose the one with the MAX ID.
So the result should be:
TIMESTAMP ID Name
5/30/2016 11:45 2 Ben
5/30/2016 6:05 1 Peter
5/30/2016 1:05 1 May
I tried using the query below:
SELECT MIN(TIMESTAMP),NAME FROM TBLSAMPLE WHERE TIMESTAMP BETWEEN TO_DATE('5/30/2016', 'MM/DD/YYYY' ) AND TO_DATE('5/30/2016', 'MM/DD/YYYY' ) + 1
GROUP BY NAME
This gives me the minimum time, but once I add MAX(ID), the result returns a combination that does not match any actual row.
Your help is really appreciated.
You can do this with row_number():
select t.*
from (select t.*,
row_number() over (partition by name order by timestamp asc, id desc) as seqnum
from tblsample t
) t
where seqnum = 1;
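The tie-breaking row_number() query is sketched below with Python's sqlite3 module and the question's rows. It assumes SQLite 3.25 or later. The TIMESTAMP column is renamed to the hypothetical ts and stored as zero-padded ISO text. Because the id desc tie-breaker sits in the same ORDER BY, the whole row is chosen together, unlike MIN(TIMESTAMP) and MAX(ID) computed independently.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tblsample (ts TEXT, id INTEGER, name TEXT)")
conn.executemany(
    "INSERT INTO tblsample VALUES (?, ?, ?)",
    [("2016-05-30 11:45", 1, "Ben"), ("2016-05-30 11:45", 2, "Ben"),
     ("2016-05-30 23:15", 2, "Ben"), ("2016-05-30 07:30", 1, "Peter"),
     ("2016-05-30 06:05", 1, "Peter"), ("2016-05-30 14:40", 2, "May"),
     ("2016-05-30 01:05", 1, "May")],
)

# Earliest ts first; among equal timestamps, the highest id wins.
rows = conn.execute(
    """
    SELECT s.ts, s.id, s.name
    FROM (SELECT t.*,
                 ROW_NUMBER() OVER (PARTITION BY name ORDER BY ts ASC, id DESC) AS seqnum
          FROM tblsample AS t) AS s
    WHERE s.seqnum = 1
    ORDER BY s.ts
    """
).fetchall()
print(rows)
```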
Your question doesn't specify a condition on the dates. But if you want to add a where clause, then add it to the subquery.

SQL query to group by data but with order by clause

I have a table booking with the following data:
GUEST_NO HOTEL_NO DATE_FROM DATE_TO ROOM_NO
1 1 2015-05-07 2015-05-08 103
1 1 2015-05-11 2015-05-12 104
1 1 2015-05-14 2015-05-15 103
1 1 2015-05-17 2015-05-20 101
2 2 2015-05-01 2015-05-02 204
2 2 2015-05-04 2015-05-05 203
2 2 2015-05-17 2015-05-22 202
This is the result I want:
It should show Guest_no, Hotel_no, Room_no, and a count column with the number of times that combination of the three columns is repeated.
So the output should look like:
GUEST_NO HOTEL_NO ROOM_NO Count
1 1 103 2
1 1 104 1
1 1 101 1
2 2 204 1
etc. But I want the result ordered, e.g. by bk.date_to desc.
My query is below. It shows the count, but when I use ORDER BY it doesn't work:
select bk.guest_no, bk.hotel_no, bk.room_no,
count(bk.guest_no+bk.hotel_no+bk.room_no) as noOfTimesRoomBooked
from booking bk
group by bk.guest_no, bk.hotel_no, bk.room_no, bk.date_to
order by bk.date_to desc
Adding ORDER BY changes the result: because I order by the date_to column, I have to add that column to the GROUP BY clause too, which gives a different result:
GUEST_NO HOTEL_NO ROOM_NO Count
1 1 103 1
1 1 104 1
1 1 103 1
1 1 101 1
2 2 204 1
Which is not the output I want.
I want these four columns, ordered by date_to descending, with the count being the number of repetitions of the first three columns.
I think a good way to do this would be grouping by guest_no, hotel_no and room_no, and sorting by the maximum (i.e. most recent) booking date in each group.
SELECT
guest_no,
hotel_no,
room_no,
COUNT(1) AS BookingCount
FROM
booking
GROUP BY
guest_no,
hotel_no,
room_no
ORDER BY
MAX(date_to) DESC;
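Ordering by an aggregate keeps date_to out of the GROUP BY. The sketch below runs the query against the question's booking rows with Python's sqlite3 module. Each (guest, hotel, room) group is sorted by its most recent date_to.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE booking (guest_no INTEGER, hotel_no INTEGER, date_from TEXT, date_to TEXT, room_no INTEGER)"
)
conn.executemany(
    "INSERT INTO booking VALUES (?, ?, ?, ?, ?)",
    [(1, 1, "2015-05-07", "2015-05-08", 103), (1, 1, "2015-05-11", "2015-05-12", 104),
     (1, 1, "2015-05-14", "2015-05-15", 103), (1, 1, "2015-05-17", "2015-05-20", 101),
     (2, 2, "2015-05-01", "2015-05-02", 204), (2, 2, "2015-05-04", "2015-05-05", 203),
     (2, 2, "2015-05-17", "2015-05-22", 202)],
)

# Group only by the three identifying columns; sort groups by their latest date_to.
rows = conn.execute(
    """
    SELECT guest_no, hotel_no, room_no, COUNT(1) AS BookingCount
    FROM booking
    GROUP BY guest_no, hotel_no, room_no
    ORDER BY MAX(date_to) DESC
    """
).fetchall()
print(rows)
```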
Maybe this is what you're looking for?
select
guest_no,
hotel_no,
room_no,
count(*) as Count
from
booking
group by
guest_no,
hotel_no,
room_no
order by
min(date_to) desc
Or maybe max() instead of min(). SQL Fiddle: http://sqlfiddle.com/#!6/e684c/3
You could try this.
select t.* from
(
select bk.guest_no, bk.hotel_no, bk.room_no, bk.date_to,
count(*) as noOfTimesBooked from booking bk
group by bk.guest_no, bk.hotel_no, bk.room_no, bk.date_to
) t
order by t.date_to
You will also have to select date_to and then group the result by it.
If you use a GROUP BY clause, SQL Server only lets ORDER BY use grouped columns or aggregate expressions. So you can compute the latest date_to for each group in a subquery and use ORDER BY in the outer query:
SELECT * FROM
(select bk.guest_no,bk.hotel_no,bk.room_no
,count(bk.guest_no+bk.hotel_no+bk.room_no) as noOfTimesRoomBooked,
(SELECT MAX(CK.date_to) FROM booking CK
WHERE CK.guest_no=bk.guest_no AND CK.hotel_no=bk.hotel_no
AND CK.room_no=bk.room_no ) AS DATEBOOK
from booking bk
group by bk.guest_no,bk.hotel_no,bk.room_no) A
ORDER BY DATEBOOK DESC
This might help you.