Use MIN() where you cannot GROUP?

I feel pretty dumb, but I get stuck with an apparently very easy query. I have something like this, where every row is a user that watched a movie:
user_id  date      duration
1        01-01-01  62m
1        03-01-01  95m
2        02-01-01  58m
2        06-01-01  25m
2        08-01-01  95m
3        03-01-01  96m
Now, what I would like to have is a table with the first movie watched by each user and its duration. The problem is that if I use MIN(), I have to GROUP BY both user_id and duration. But if I GROUP BY duration as well, I basically get the same table back. How can I solve the problem?

You can use a ranking function like ROW_NUMBER:
WITH CTE AS
(
SELECT rn = ROW_NUMBER() OVER (PARTITION BY user_id ORDER BY date ASC),
user_id, date, duration
FROM dbo.TableName
)
SELECT user_id, date, duration FROM CTE WHERE rn = 1
The advantage of ROW_NUMBER is that you can change the logic easily. For example, if you want to reverse the logic and get the row of the last watched film per user, you just have to change ORDER BY date ASC to ORDER BY date DESC.
The advantage of the CTE (common table expression) is that you can also use it to delete or update these records. It is often used to identify or delete duplicates, so you can first select to see what will be deleted or updated before you execute it.
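As a rough, runnable check of the ROW_NUMBER approach described above, here is a minimal sketch using Python's built-in sqlite3 module. The table name `watched`, the column name `watch_date`, and the ISO date strings (reading the question's dates as day-month-year) are assumptions made for illustration; they are not from the original post.

```python
import sqlite3

# In-memory database with the question's sample rows.
# Assumption: dates are stored as ISO strings so that text order equals date order.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE watched (user_id INTEGER, watch_date TEXT, duration INTEGER)")
conn.executemany(
    "INSERT INTO watched VALUES (?, ?, ?)",
    [
        (1, "2001-01-01", 62),
        (1, "2001-01-03", 95),
        (2, "2001-01-02", 58),
        (2, "2001-01-06", 25),
        (2, "2001-01-08", 95),
        (3, "2001-01-03", 96),
    ],
)


def pick_per_user(direction):
    """Return one row per user: the first movie for "ASC", the last for "DESC"."""
    assert direction in ("ASC", "DESC")
    sql = f"""
        WITH cte AS (
            SELECT user_id, watch_date, duration,
                   ROW_NUMBER() OVER (
                       PARTITION BY user_id ORDER BY watch_date {direction}
                   ) AS rn
            FROM watched
        )
        SELECT user_id, watch_date, duration
        FROM cte
        WHERE rn = 1
        ORDER BY user_id
    """
    return conn.execute(sql).fetchall()


first_rows = pick_per_user("ASC")
last_rows = pick_per_user("DESC")
print(first_rows)
print(last_rows)
```

Only the sort direction differs between the two calls, which is the flexibility the answer points out. Window functions need SQLite 3.25 or later.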

Try this query. I haven't tested it.
SELECT user_id, date, duration FROM tablename n
WHERE NOT EXISTS(
SELECT date, user_id FROM tablename g
WHERE n.user_id = g.user_id AND g.date < n.date
);

Assuming there can only be a single record per user per date, it'd be something like this:
select t.*
from table t
inner join (
select user_id, min(date) mindate
from table
group by user_id
) t1
on t.user_id = t1.user_id
and t.date = t1.mindate

You can use ROW_NUMBER(), a ranking function that generates a sequential number within each group, ordered by the column you sort on. With ROW_NUMBER(), if there is a tie only one record per user is selected; if you want to select all tied records, use DENSE_RANK() instead of ROW_NUMBER().
SELECT user_id, date, duration
FROM
(
SELECT user_id, date, duration,
ROW_NUMBER() OVER (PARTITION BY user_id ORDER BY date) rn
FROM tableName
) a
WHERE rn = 1
This also assumes that the data type of column date is DATE.
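The tie behaviour mentioned in this answer can be seen with a small sketch, again using Python's sqlite3 module. The table `watched`, the column `watch_date`, and the extra tied row for user 1 are hypothetical and only serve to provoke a tie.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE watched (user_id INTEGER, watch_date TEXT, duration INTEGER)")
conn.executemany(
    "INSERT INTO watched VALUES (?, ?, ?)",
    [
        (1, "2001-01-01", 62),
        (1, "2001-01-01", 40),  # hypothetical second movie on the same first date
        (1, "2001-01-03", 95),
        (2, "2001-01-02", 58),
    ],
)


def first_rows(rank_fn):
    """Rows ranked 1 per user, using either ROW_NUMBER or DENSE_RANK."""
    assert rank_fn in ("ROW_NUMBER", "DENSE_RANK")
    sql = f"""
        SELECT user_id, watch_date, duration
        FROM (
            SELECT user_id, watch_date, duration,
                   {rank_fn}() OVER (PARTITION BY user_id ORDER BY watch_date) AS rn
            FROM watched
        ) a
        WHERE rn = 1
        ORDER BY user_id, duration
    """
    return conn.execute(sql).fetchall()


with_row_number = first_rows("ROW_NUMBER")  # exactly one row per user
with_dense_rank = first_rows("DENSE_RANK")  # keeps every row tied for first
print(with_row_number)
print(with_dense_rank)
```

With ROW_NUMBER, which of the two tied rows survives for user 1 is not defined unless the ORDER BY gets a tie-breaker column.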

If you are using SQL Server 2005 or later, you can use windowing functions.
SELECT *
FROM
(
SELECT user_id, date, duration, MIN(date) OVER(PARTITION BY user_id) AS MIN_DATE
FROM MY_TABLE
) AS RESULTS
WHERE date = MIN_DATE
The OVER clause with PARTITION BY will "group by" the user_id and compute the min date per user_id without eliminating any rows. Then you select the rows where the date equals the min date, and you are left with the first date per user_id. This is a common trick once you know about windowing functions.
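To make the "without eliminating any rows" point concrete, the following sketch runs the inner query on its own and then the filtered version. It uses Python's sqlite3 module; the names `watched`, `watch_date` and `min_date` and the ISO date strings are assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE watched (user_id INTEGER, watch_date TEXT, duration INTEGER)")
conn.executemany(
    "INSERT INTO watched VALUES (?, ?, ?)",
    [
        (1, "2001-01-01", 62),
        (1, "2001-01-03", 95),
        (2, "2001-01-02", 58),
        (2, "2001-01-06", 25),
        (2, "2001-01-08", 95),
        (3, "2001-01-03", 96),
    ],
)

# Step 1: the window function annotates every row with its user's earliest date.
annotated = conn.execute(
    """
    SELECT user_id, watch_date, duration,
           MIN(watch_date) OVER (PARTITION BY user_id) AS min_date
    FROM watched
    ORDER BY user_id, watch_date
    """
).fetchall()

# Step 2: keep only the rows whose own date equals that earliest date.
earliest = conn.execute(
    """
    SELECT user_id, watch_date, duration
    FROM (
        SELECT user_id, watch_date, duration,
               MIN(watch_date) OVER (PARTITION BY user_id) AS min_date
        FROM watched
    ) AS results
    WHERE watch_date = min_date
    ORDER BY user_id
    """
).fetchall()

print(len(annotated))  # all six input rows are still present after step 1
print(earliest)
```

Unlike ROW_NUMBER, this variant returns every row that shares the earliest date, so ties are kept.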

If you want the first watch_date per user, there should be no date before this date for this user:
SELECT *
FROM watched_movies wm
WHERE NOT EXISTS (
SELECT *
FROM watched_movies nx
WHERE nx.user_id = wm.user_id
AND nx.watch_date < wm.watch_date
);
Note: I replaced the date column by watch_date, since date is a reserved word (type name).

This should give you the duration of the first movie watched on the earliest date (it assumes a session_id column to break ties within a date):
SELECT a.user_id, b.date, a.duration
FROM table a
INNER JOIN (SELECT user_id, min(date) date FROM table GROUP BY user_id) b ON a.user_id = b.user_id AND a.date = b.date
INNER JOIN (SELECT user_id, date, min(session_id) session_id FROM table GROUP BY user_id, date) c ON b.user_id = c.user_id AND b.date = c.date AND a.session_id = c.session_id

Try this:
WITH TABLE1
AS (SELECT
'1' AS USER_ID,
'01-01-01' AS DT,
62 AS DURATION
FROM
DUAL
UNION ALL
SELECT
'1' AS USER_ID,
'03-01-01' AS DT,
95 AS DURATION
FROM
DUAL
UNION ALL
SELECT
'2' AS USER_ID,
'02-01-01' AS DT,
58 AS DURATION
FROM
DUAL
UNION ALL
SELECT
'2' AS USER_ID,
'06-01-01' AS DT,
25 AS DURATION
FROM
DUAL
UNION ALL
SELECT
'2' AS USER_ID,
'08-01-01' AS DT,
95 AS DURATION
FROM
DUAL
UNION ALL
SELECT
'3' AS USER_ID,
'03-01-01' AS DT,
96 AS DURATION
FROM
DUAL)
SELECT
*
FROM
(SELECT
USER_ID,
DT,
DURATION,
RANK ( ) OVER (PARTITION BY USER_ID ORDER BY DT ASC) AS ROW_RANK
FROM
TABLE1)
WHERE
ROW_RANK = 1

Use a sub-query to get the min date then join that back to the table to get all other relevant columns.
SELECT T2.user_id
,T2.date
,T2.duration
FROM YourTable T2
INNER JOIN
(
SELECT T1.user_id
,MIN(T1.date) as first_date
FROM YourTable T1
GROUP BY T1.user_id
) SQ
ON T2.user_id = sq.user_id
AND T2.date = sq.first_date

Related

Alternative way of full outer join

I am running this query
select * from
(select name, count(distinct id) as ids, date
from table1
group by name, date ) as tt
full outer join
(select st_name as name,count(distinct id) as ids, date
from table2
group by st_name, date) as ts
on tt.name= ts.name
and tt.ids = ts.ids
It runs successfully but I want to ask if there is an alternative more efficient way to run this query.

I assume that you want to get days when the two numbers are not the same (it seems like the most reasonable thing you want from such a query). So, this addresses that question.
FULL OUTER JOIN should be fine. But an alternative is to try UNION ALL and aggregation:
select name, sum(ids_1), sum(ids_2), date
from ((select name, count(distinct id) as ids_1, NULL as ids_2, date
from table1
group by name, date
)
union all
(select st_name as name, NULL, count(distinct id) as ids_2, date
from table2
group by st_name, date
)
)
group by name, date
having sum(ids_1) = sum(ids_2)

Selecting new distinct values over time (ORACLE SQL)

I want to select new distinct values and track them over time.
I have a table where each row represents a score awarded to a particular person.
- timestamp (when the score was awarded)
- name (which person received the score)
- score (what score the person received)
I want the result to look like:
The above table should be interpreted as how many new distinct names appear in each day.
Because 6-NOV is the first day, all the names are new hence 3 new names.
On 7-NOV Michael is the only new name so the value is 1.
On 8-NOV we have 3 new names (Don, Alex, Tina)
And on 9-NOV 0 new names appear, as Jimmy and Sara have both been scored before.
Thanks for the help

Consider:
select t.timestamp, count(n.name)
from (select distinct timestamp from mytable) t
left join (select name, min(timestamp) timestamp from mytable group by name) n
on n.timestamp = t.timestamp
group by t.timestamp
This works by generating a list of distinct timestamps from the table, and then joining it with an aggregate query that computes the first timestamp of each name. The final step is aggregation in the outer query, counting n.name rather than * so that days with no new names yield 0.
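A small sketch of this distinct-timestamps-plus-first-appearance join, run through Python's sqlite3 module. The question's sample table is not shown, so the rows below are made up to match the counts it describes (3, 1, 3, 0); the table `scores`, the column `ts`, the year, the scores, and the name Kim are all hypothetical. The sketch counts `n.name` so that a day without new names yields 0.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE scores (ts TEXT, name TEXT, score INTEGER)")
conn.executemany(
    "INSERT INTO scores VALUES (?, ?, ?)",
    [
        # Hypothetical rows, chosen to reproduce the counts in the question.
        ("2019-11-06", "Jimmy", 5),
        ("2019-11-06", "Sara", 3),
        ("2019-11-06", "Kim", 4),
        ("2019-11-07", "Michael", 2),
        ("2019-11-07", "Jimmy", 1),
        ("2019-11-08", "Don", 4),
        ("2019-11-08", "Alex", 2),
        ("2019-11-08", "Tina", 5),
        ("2019-11-09", "Jimmy", 3),
        ("2019-11-09", "Sara", 1),
    ],
)

new_names_per_day = conn.execute(
    """
    SELECT t.ts, COUNT(n.name) AS new_names
    FROM (SELECT DISTINCT ts FROM scores) t
    LEFT JOIN (
        SELECT name, MIN(ts) AS first_ts   -- first day each name appears
        FROM scores
        GROUP BY name
    ) n ON n.first_ts = t.ts
    GROUP BY t.ts
    ORDER BY t.ts
    """
).fetchall()

print(new_names_per_day)
```

COUNT over a column skips the NULLs produced by the left join, which is why the last day reports 0 instead of 1.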

Find the minimum timestamp for each name and then count how many names in each timestamp
select timestamp, count(*) as new_names from
(select name, min(timestamp) as timestamp from mytable
group by name)
group by timestamp
order by timestamp
To include all days even without any names
select t.timestamp, nvl(new_names,0) as new_names from
(select timestamp, count(*) as new_names from
(select name, min(timestamp) as timestamp from mytable
group by name)
group by timestamp) c
RIGHT OUTER JOIN (select distinct timestamp from mytable) t
ON c.timestamp = t.timestamp
order by t.timestamp
To include dates that don't appear in the table at all, you need a list of dates from a calendar table, which you then use in place of the subquery on the right side of the RIGHT OUTER JOIN.
You can also generate the dates:
select t.dt as timestamp, nvl(new_names,0) as new_names from
(select timestamp, count(*) as new_names from
(select name, min(timestamp) as timestamp from mytable
group by name)
group by timestamp) c
RIGHT OUTER JOIN (
SELECT TRUNC (SYSDATE - ROWNUM - 1) dt
FROM DUAL CONNECT BY ROWNUM < 366
) t
ON c.timestamp = t.dt
order by t.dt
But you'd have to adjust the -1 and 366 to cover the date range you want, and it's much more standard to use a calendar table that already exists in your database.

With MIN() window function:
select tt.firstdate, count(distinct tt.name) "new names"
from (
select t.*, min(timestamp) over (partition by name) firstdate
from tablename t
) tt
group by tt.firstdate
If you also want the dates where there are not any new names:
select t.timestamp, count(distinct tt.name) "new names"
from tablename t
left join (
select t.*, min(timestamp) over (partition by name) firstdate
from tablename t
) tt on tt.firstdate = t.timestamp
group by t.timestamp

Count only first appearances, use row_number() at first:
select timestamp, sum(frst) as new_names
from (
select timestamp,
case when row_number()
over (partition by name order by timestamp) = 1
then 1 else 0 end frst
from scores)
group by timestamp

Yet another option is a right join between the distinct timestamps and the earliest timestamp for each name. This way, non-matched rows are also returned, with a zero count in the new_names column:
SELECT NVL(t1.timestamp,t2.timestamp) AS timestamp,
SUM(NVL2(t1.timestamp,1,0)) AS new_names
FROM (SELECT name, MIN(timestamp) AS timestamp from t group by name) t1
RIGHT JOIN (SELECT DISTINCT timestamp FROM t) t2
ON t2.timestamp = t1.timestamp
GROUP BY NVL(t1.timestamp,t2.timestamp)
ORDER BY timestamp

Select the Max date time for single User

I have a table like this,
Date User
15-06-2018 A
16-06-2018 A
15-06-2018 B
14-06-2018 C
16-06-2018 C
I want to get the output like this,
Date User
16-06-2018 A
15-06-2018 B
16-06-2018 C
I tried Select Max(date),User from Table group by User

Based on your comment, I assume you have duplicated results in those 80 columns when you group by them. Assuming so, here's one option using row_number to always return 1 row per user:
select *
from (
select *, row_number() over (partition by user order by date desc) rn
from yourtable
) t
where rn = 1

You can use a correlated subquery:
select t.*
from table t
where date = (select max(t1.date)
from table t1
where t1.user = t.user
);
However, I would also recommend row_number():
select top (1) with ties *
from table t
order by row_number() over (partition by user order by date desc);

You can also use a ranking function
SELECT User, Date
FROM
(
SELECT User, Date
, Row_id = Row_Number() OVER (PARTITION BY User ORDER BY Date DESC)
FROM table
)q
WHERE Row_Id = 1

I would suggest this:
Select * from table t where exists
(Select 1 from
(Select user, max(date) as date from table group by user) A
Where A.user = t.user and A.date = t.date )

"Group" some rows together before sorting (Oracle)

I'm using Oracle Database 11g.
I have a query that selects, among other things, an ID and a date from a table. Basically, what I want to do is keep the rows that have the same ID together, and then sort those "groups" of rows by the most recent date in the "group".
So if my original result was this:
ID Date
3 11/26/11
1 1/5/12
2 6/3/13
2 10/15/13
1 7/5/13
The output I'm hoping for is:
ID Date
3 11/26/11 <-- (Using this date for "group" ID = 3)
1 1/5/12
1 7/5/13 <-- (Using this date for "group" ID = 1)
2 6/3/13
2 10/15/13 <-- (Using this date for "group" ID = 2)
Is there any way to do this?

One way to get this is by using analytic functions; I don't have an example of that handy.
This is another way to get the specified result, without using an analytic function (this is ordering first by the most_recent_date for each ID, then by ID, then by Date):
SELECT t.ID
, t.Date
FROM mytable t
JOIN ( SELECT s.ID
, MAX(s.Date) AS most_recent_date
FROM mytable s
WHERE s.Date IS NOT NULL
GROUP BY s.ID
) r
ON r.ID = t.ID
ORDER
BY r.most_recent_date
, t.ID
, t.Date
The "trick" here is to return "most_recent_date" for each ID, and then join that to each row. The result can be ordered by that first, then by whatever else.
(I also think there's a way to get this same ordering using Analytic functions, but I don't have an example of that handy.)
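The join-on-MAX ordering trick from this answer can be checked against the question's sample rows with a short sketch using Python's sqlite3 module. The column name `d` and the ISO date strings (reading the question's dates as month/day/year) are assumptions; SQLite stands in for Oracle here only to show the ordering logic.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mytable (id INTEGER, d TEXT)")
conn.executemany(
    "INSERT INTO mytable VALUES (?, ?)",
    [
        (3, "2011-11-26"),
        (1, "2012-01-05"),
        (2, "2013-06-03"),
        (2, "2013-10-15"),
        (1, "2013-07-05"),
    ],
)

ordered = conn.execute(
    """
    SELECT t.id, t.d
    FROM mytable t
    JOIN (
        SELECT s.id, MAX(s.d) AS most_recent_date   -- one sort key per id
        FROM mytable s
        WHERE s.d IS NOT NULL
        GROUP BY s.id
    ) r ON r.id = t.id
    ORDER BY r.most_recent_date, t.id, t.d
    """
).fetchall()

for row in ordered:
    print(row)
```

Every row of a given id carries the same sort key, so the rows of one id stay together while the groups themselves are ordered by their latest date.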

You can use the MAX ... KEEP function with your aggregate to create your sort key:
with
sample_data as
(select 3 id, to_date('11/26/11','MM/DD/RR') date_col from dual union all
select 1, to_date('1/5/12','MM/DD/RR') date_col from dual union all
select 2, to_date('6/3/13','MM/DD/RR') date_col from dual union all
select 2, to_date('10/15/13','MM/DD/RR') date_col from dual union all
select 1, to_date('7/5/13','MM/DD/RR') date_col from dual)
select
id,
date_col,
-- For illustration purposes, does not need to be selected:
max(date_col) keep (dense_rank last order by date_col) over (partition by id) sort_key
from sample_data
order by max(date_col) keep (dense_rank last order by date_col) over (partition by id);

Here is the query using analytic functions:
select
id
, date_
, max(date_) over (partition by id) as max_date
from table_name
order by max_date, id
;

how do I query sql for a latest record date for each user

I have a table that is a collection of entries recording when a user was logged on.
username, date, value
--------------------------
brad, 1/2/2010, 1.1
fred, 1/3/2010, 1.0
bob, 8/4/2009, 1.5
brad, 2/2/2010, 1.2
fred, 12/2/2009, 1.3
etc..
How do I create a query that would give me the latest date for each user?
Update: I forgot that I needed to have a value that goes along with the latest date.

This is the simple old school approach that works with almost any db engine, but you have to watch out for duplicates:
select t.username, t.date, t.value
from MyTable t
inner join (
select username, max(date) as MaxDate
from MyTable
group by username
) tm on t.username = tm.username and t.date = tm.MaxDate
Using window functions will avoid any possible issues with duplicate records due to duplicate date values, so if your db engine allows it you can do something like this:
select x.username, x.date, x.value
from (
select username, date, value,
row_number() over (partition by username order by date desc) as _rn
from MyTable
) x
where x._rn = 1

Using window functions (works in Oracle, Postgres 8.4, SQL Server 2005, DB2, Sybase, Firebird 3.0, MariaDB 10.3)
select * from (
select
username,
date,
value,
row_number() over(partition by username order by date desc) as rn
from
yourtable
) t
where t.rn = 1

I see most of the developers use an inline query without considering its impact on huge data.
Simply, you can achieve this by:
SELECT a.username, a.date, a.value
FROM myTable a
LEFT OUTER JOIN myTable b
ON a.username = b.username
AND a.date < b.date
WHERE b.username IS NULL
ORDER BY a.date desc;

From my experience the fastest way is to take each row for which there is no newer row in the table.
Another advantage is that the syntax used is very simple, and that the meaning of the query is rather easy to grasp (take all rows such that no newer row exists for the username being considered).
NOT EXISTS
SELECT username, value
FROM t
WHERE NOT EXISTS (
SELECT *
FROM t AS witness
WHERE witness.username = t.username AND witness.date > t.date
);
ROW_NUMBER
SELECT username, value
FROM (
SELECT username, value, row_number() OVER (PARTITION BY username ORDER BY date DESC) AS rn
FROM t
) t2
WHERE rn = 1
INNER JOIN
SELECT t.username, t.value
FROM t
INNER JOIN (
SELECT username, MAX(date) AS date
FROM t
GROUP BY username
) tm ON t.username = tm.username AND t.date = tm.date;
LEFT OUTER JOIN
SELECT username, value
FROM t
LEFT OUTER JOIN t AS w ON t.username = w.username AND t.date < w.date
WHERE w.username IS NULL

To get the whole row containing the max date for the user:
select username, date, value
from tablename where (username, date) in (
select username, max(date) as date
from tablename
group by username
)

SELECT *
FROM MyTable T1
WHERE date = (
SELECT max(date)
FROM MyTable T2
WHERE T1.username=T2.username
)

This one should give you the correct result for your edited question.
The sub-query makes sure to find only rows of the latest date, and the outer GROUP BY will take care of ties. When there are two entries for the same date for the same user, it will return the one with the highest value.
SELECT t.username, t.date, MAX( t.value ) value
FROM your_table t
JOIN (
SELECT username, MAX( date ) date
FROM your_table
GROUP BY username
) x ON ( x.username = t.username AND x.date = t.date )
GROUP BY t.username, t.date

If your database syntax supports it, then TOP 1 WITH TIES can be a lifesaver in combination with ROW_NUMBER.
With the example data you provided, use this query:
SELECT TOP 1 WITH TIES
username, date, value
FROM user_log_in_attempts
ORDER BY ROW_NUMBER() OVER (PARTITION BY username ORDER BY date DESC)
It yields:
username | date | value
-----------------------------
bob | 8/4/2009 | 1.5
brad | 2/2/2010 | 1.2
fred | 12/2/2009 | 1.3
How it works:
ROW_NUMBER() OVER (PARTITION BY... ORDER BY...): for each username, the rows are numbered from the youngest (row number 1) to the oldest (highest row number).
ORDER BY ROW_NUMBER...: sorts the youngest row of each user to the top, followed by the second-youngest row of each user, and so on.
TOP 1 WITH TIES: because each user has a youngest row, those youngest rows are equal in terms of the sorting criteria (all have row number 1), so all of them are returned.
Tested with SQL-Server.

SELECT DISTINCT Username, Dates,value
FROM TableName
WHERE Dates IN (SELECT MAX(Dates) FROM TableName GROUP BY Username)
Username Dates value
bob 2010-02-02 1.2
brad 2010-01-02 1.1
fred 2010-01-03 1.0

This is similar to one of the answers above, but in my opinion it is a lot simpler and tidier. Also, shows a good use for the cross apply statement. For SQL Server 2005 and above...
select
a.username,
a.date,
a.value
from yourtable a
cross apply (select max(date) 'maxdate' from yourtable a1 where a.username=a1.username) b
where a.date=b.maxdate

You could also use analytical Rank Function
with temp as
(
select username, date, RANK() over (partition by username order by date desc) as rnk from t
)
select username, date from temp where rnk = 1

SELECT MAX(DATE) AS dates
FROM assignment
JOIN paper_submission_detail ON assignment.PAPER_SUB_ID =
paper_submission_detail.PAPER_SUB_ID

SELECT Username, date, value
from MyTable mt
inner join (select username, max(date) date
from MyTable
group by username) sub
on sub.username = mt.username
and sub.date = mt.date
Would address the updated problem. It might not work so well on large tables, even with good indexing.

SELECT *
FROM ReportStatus c
inner join ( SELECT
MAX(Date) AS MaxDate
FROM ReportStatus ) m
on c.date = m.maxdate

In Oracle, ROWNUM is assigned before ORDER BY is applied, so sort in a subquery first and then take the first row to get the latest record:
select * from (
select * from mytable
order by date desc
)
where rownum = 1

SELECT t1.username, t1.date, value
FROM MyTable as t1
INNER JOIN (SELECT username, MAX(date) AS date
FROM MyTable
GROUP BY username) as t2 ON t2.username = t1.username AND t2.date = t1.date

Select * from table1 where latest_date=(select Max(latest_date) from table1 where user=yourUserName)
The inner query returns the latest date for the given user; the outer query pulls all the data matching that date.

I used this way to take the last record for each user that I have on my table.
It was a query to get last location for salesman as per recent time detected on PDA devices.
CREATE FUNCTION dbo.UsersLocation()
RETURNS TABLE
AS
RETURN
Select GS.UserID, MAX(GS.UTCDateTime) 'LastDate'
From USERGPS GS
where year(GS.UTCDateTime) = YEAR(GETDATE())
Group By GS.UserID
GO
select gs.UserID, sl.LastDate, gs.Latitude , gs.Longitude
from USERGPS gs
inner join USER s on gs.SalesManNo = s.SalesmanNo
inner join dbo.UsersLocation() sl on gs.UserID= sl.UserID and gs.UTCDateTime = sl.LastDate
order by LastDate desc

My small compilation:
- a self join is better than a nested select
- but GROUP BY doesn't give you the primary key, which is preferable for the join
- this key can be obtained with PARTITION BY in conjunction with FIRST_VALUE
So, here is a query:
select
t.*
from
Table t inner join (
select distinct first_value(ID) over(partition by GroupColumn order by DateColumn desc) as ID
from Table
where FilterColumn = 'value'
) j on t.ID = j.ID
Pros:
- Filter data with a WHERE clause using any column
- Select any columns from the filtered rows
Cons:
- Needs MS SQL Server 2012 or later.

I did something similar for my application. Below is the query:
select distinct i.userId,i.statusCheck, l.userName from internetstatus
as i inner join login as l on i.userID=l.userID
where nowtime in((select max(nowtime) from InternetStatus group by userID));

Here's one way to return only the most recent record for each user in SQL Server:
WITH CTE AS (
SELECT *, ROW_NUMBER() OVER (PARTITION BY user_id ORDER BY date DESC) AS rn
FROM your_table
)
SELECT *
FROM CTE
WHERE rn = 1;
This uses a common table expression (CTE) to assign a unique rn (row number) to each record for each user, based on the user_id and sorted in descending order by date. The final query then selects only the records with rn equal to 1, which represents the most recent record for each user.

SELECT * FROM TABEL1 WHERE DATE= (SELECT MAX(CREATED_DATE) FROM TABEL1)

You would use aggregate function MAX and GROUP BY
SELECT username, MAX(date), value FROM tablename GROUP BY username, value
