Alternative way of full outer join - sql

I am running this query
select * from
(select name, count(distinct id) as ids, date
from table1
group by name, date ) as tt
full outer join
(select st_name as name,count(distinct id) as ids, date
from table2
group by st_name, date) as ts
on tt.name= ts.name
and tt.ids = ts.ids
It runs successfully, but I want to ask whether there is an alternative, more efficient way to run this query.

I assume that you want to get the days when the two numbers are not the same (that seems like the most reasonable thing to want from such a query), so this answer addresses that question.
FULL OUTER JOIN should be fine. But an alternative is to try UNION ALL and aggregation:
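select name, sum(ids_1) as ids_1, sum(ids_2) as ids_2, date
from ((select name, count(distinct id) as ids_1, NULL as ids_2, date
from table1
group by name, date
)
union all
(select st_name as name, NULL, count(distinct id) as ids_2, date
from table2
group by st_name, date
)
) u -- most databases require an alias on a derived table
group by name, date
-- keep pairs whose counts differ or that exist in only one table
having sum(ids_1) is null or sum(ids_2) is null or sum(ids_1) <> sum(ids_2)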
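To see how the UNION ALL version lines the two counts up per (name, date), here is a small sketch run through Python's built-in sqlite3 module. The sample rows are hypothetical, and the HAVING clause assumes the goal stated above: keep pairs whose distinct-id counts differ or that exist in only one table.

```python
import sqlite3

# Hypothetical sample data: (name, id, date) in table1, (st_name, id, date) in table2.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE table1 (name TEXT, id INTEGER, date TEXT);
CREATE TABLE table2 (st_name TEXT, id INTEGER, date TEXT);
INSERT INTO table1 VALUES ('a', 1, 'd1'), ('a', 2, 'd1'), ('b', 1, 'd1'), ('c', 1, 'd2');
INSERT INTO table2 VALUES ('a', 1, 'd1'), ('a', 2, 'd1'), ('b', 1, 'd1'), ('b', 2, 'd1'), ('d', 1, 'd2');
""")

# Each branch puts its count in its own column and NULL in the other,
# so SUM() per (name, date) places the two counts side by side.
query = """
SELECT name, date, SUM(ids_1) AS ids_1, SUM(ids_2) AS ids_2
FROM (
    SELECT name, date, COUNT(DISTINCT id) AS ids_1, NULL AS ids_2
    FROM table1 GROUP BY name, date
    UNION ALL
    SELECT st_name, date, NULL, COUNT(DISTINCT id)
    FROM table2 GROUP BY st_name, date
) AS u
GROUP BY name, date
HAVING SUM(ids_1) IS NULL OR SUM(ids_2) IS NULL OR SUM(ids_1) <> SUM(ids_2)
ORDER BY name, date
"""
mismatches = conn.execute(query).fetchall()
print(mismatches)
```

Because each branch is aggregated before the UNION ALL, the outer GROUP BY combines at most two rows per (name, date). Whether that is actually faster than the FULL OUTER JOIN depends on the engine and the available indexes.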

Related

Selecting new distinct values over time (ORACLE SQL)

I want to select new distinct values and track them over time.
I have a table where each row represents a score awarded to a particular person.
- timestamp (when the score was awarded)
- name (which person received the score)
- score (what score the person received)
I want the result to look like:
timestamp  new_names
6-NOV      3
7-NOV      1
8-NOV      3
9-NOV      0
The above table should be interpreted as the number of new distinct names that appear on each day.
Because 6-NOV is the first day, all the names are new, hence 3 new names.
On 7-NOV, Michael is the only new name, so the value is 1.
On 8-NOV we have 3 new names (Don, Alex, Tina).
And on 9-NOV, 0 new names appear, as Jimmy and Sara have both received a score before.
Thanks for the help
Consider:
select t.timestamp, count(n.name) as new_names
from (select distinct timestamp from mytable) t
left join (select name, min(timestamp) timestamp from mytable group by name) n
on n.timestamp = t.timestamp
group by t.timestamp
This works by generating a list of distinct timestamps from the table and then left joining it with an aggregate query that computes the first timestamp of each name. The final step is aggregation in the outer query; counting n.name rather than * makes days with no new names come out as 0, since COUNT ignores the NULLs produced by the left join.
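A sketch of this approach in SQLite via Python's sqlite3, on hypothetical rows shaped like the question. The third name on 06-NOV ('Ann') is invented, and the column is called ts instead of timestamp. It also shows why the outer count should be COUNT(n.name) rather than COUNT(*): on a day with no new names, the LEFT JOIN still produces one NULL-extended row, which COUNT(*) counts.

```python
import sqlite3

# Hypothetical score rows (ts, name, score); 'Ann' is an invented third name for 06-NOV.
# ts is stored as text like '06-NOV', which sorts correctly within one month.
rows = [
    ("06-NOV", "Jimmy", 10), ("06-NOV", "Sara", 5), ("06-NOV", "Ann", 7),
    ("07-NOV", "Michael", 3), ("07-NOV", "Jimmy", 4),
    ("08-NOV", "Don", 1), ("08-NOV", "Alex", 2), ("08-NOV", "Tina", 3),
    ("09-NOV", "Jimmy", 6), ("09-NOV", "Sara", 8),
]
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mytable (ts TEXT, name TEXT, score INTEGER)")
conn.executemany("INSERT INTO mytable VALUES (?, ?, ?)", rows)

def new_names_per_day(count_expr):
    # Every distinct day, left-joined to the first day each name appears.
    sql = f"""
        SELECT t.ts, {count_expr}
        FROM (SELECT DISTINCT ts FROM mytable) t
        LEFT JOIN (SELECT name, MIN(ts) AS ts FROM mytable GROUP BY name) n
          ON n.ts = t.ts
        GROUP BY t.ts
        ORDER BY t.ts
    """
    return conn.execute(sql).fetchall()

with_name = new_names_per_day("COUNT(n.name)")  # NULLs from the left join are ignored
with_star = new_names_per_day("COUNT(*)")       # counts the NULL-extended row as 1
print(with_name)
print(with_star)
```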
Find the minimum timestamp for each name, and then count how many names first appear at each timestamp:
select timestamp, count(*) as new_names from
(select name, min(timestamp) as timestamp from mytable
group by name)
group by timestamp
order by timestamp
To include all days, even those without any new names:
select t.timestamp, nvl(new_names,0) as new_names from
(select timestamp, count(*) as new_names from
(select name, min(timestamp) as timestamp from mytable
group by name)
group by timestamp) c
RIGHT OUTER JOIN (select distinct timestamp from mytable) t
ON c.timestamp = t.timestamp
order by t.timestamp
To include dates that don't appear in the table at all, you need a list of dates from a calendar somewhere, and then put that table in place of the subquery I have RIGHT OUTER JOINed to.
Alternatively, you can generate the dates on the fly:
select t.timestamp, nvl(new_names,0) as new_names from
(select timestamp, count(*) as new_names from
(select name, min(timestamp) as timestamp from mytable
group by name)
group by timestamp) c
RIGHT OUTER JOIN (
SELECT TRUNC (SYSDATE - ROWNUM - 1) timestamp
FROM DUAL CONNECT BY ROWNUM < 366
) t
ON c.timestamp = t.timestamp
order by t.timestamp
But you'd have to adjust the -1 and 366 to cover the date range you want, and it's much more standard to use a calendar table that already exists in your database.
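As a sketch of the calendar idea outside Oracle: SQLite has no CONNECT BY, but a recursive CTE can generate the date range. The dates (ISO text in 2023) and names below are hypothetical; 2023-11-05 and 2023-11-10 are included to show days absent from the table coming back with 0.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE scores (day TEXT, name TEXT)")
conn.executemany("INSERT INTO scores VALUES (?, ?)", [
    ("2023-11-06", "Jimmy"), ("2023-11-06", "Sara"), ("2023-11-06", "Ann"),
    ("2023-11-07", "Michael"), ("2023-11-07", "Jimmy"),
    ("2023-11-08", "Don"), ("2023-11-08", "Alex"), ("2023-11-08", "Tina"),
    ("2023-11-09", "Jimmy"), ("2023-11-09", "Sara"),
])

sql = """
WITH RECURSIVE calendar(day) AS (
    SELECT '2023-11-05'                     -- first day of the wanted range
    UNION ALL
    SELECT date(day, '+1 day') FROM calendar
    WHERE day < '2023-11-10'                -- stop after the last wanted day
),
first_seen AS (
    SELECT name, MIN(day) AS day FROM scores GROUP BY name
)
SELECT c.day, COUNT(f.name) AS new_names
FROM calendar c
LEFT JOIN first_seen f ON f.day = c.day
GROUP BY c.day
ORDER BY c.day
"""
result = conn.execute(sql).fetchall()
print(result)
```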
With MIN() window function:
select tt.firstdate, count(distinct tt.name) "new names"
from (
select t.*, min(timestamp) over (partition by name) firstdate
from tablename t
) tt
group by tt.firstdate
If you also want the dates where there are not any new names:
select t.timestamp, count(distinct tt.name) "new names"
from tablename t
left join (
select t.*, min(timestamp) over (partition by name) firstdate
from tablename t
) tt on tt.firstdate = t.timestamp
group by t.timestamp
To count only first appearances, use row_number() first:
select timestamp, sum(frst) as new_names
from (
select timestamp,
case when row_number()
over (partition by name order by timestamp) = 1
then 1 else 0 end frst
from scores)
group by timestamp
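The ROW_NUMBER variant can be tried the same way. SQLite has supported window functions since version 3.25, so this sketch assumes a reasonably recent Python build; the rows are the same hypothetical ones (with the invented name 'Ann').

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE scores (ts TEXT, name TEXT)")
conn.executemany("INSERT INTO scores VALUES (?, ?)", [
    ("06-NOV", "Jimmy"), ("06-NOV", "Sara"), ("06-NOV", "Ann"),
    ("07-NOV", "Michael"), ("07-NOV", "Jimmy"),
    ("08-NOV", "Don"), ("08-NOV", "Alex"), ("08-NOV", "Tina"),
    ("09-NOV", "Jimmy"), ("09-NOV", "Sara"),
])

sql = """
SELECT ts, SUM(frst) AS new_names
FROM (
    -- frst is 1 only on the earliest row of each name
    SELECT ts,
           CASE WHEN ROW_NUMBER() OVER (PARTITION BY name ORDER BY ts) = 1
                THEN 1 ELSE 0 END AS frst
    FROM scores
)
GROUP BY ts
ORDER BY ts
"""
result = conn.execute(sql).fetchall()
print(result)
```

Because every day present in the table contributes at least one row, days without new names come back as 0 without an extra join; days missing from the table entirely still need a calendar.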
Yet another option is to right join the distinct timestamps to the minimum timestamp for each name. This way, non-matched timestamps are also returned, with a zero count in the new_names column:
SELECT NVL(t1.timestamp,t2.timestamp) AS timestamp,
SUM(NVL2(t1.timestamp,1,0)) AS new_names
FROM (SELECT name, MIN(timestamp) AS timestamp from t group by name) t1
RIGHT JOIN (SELECT DISTINCT timestamp FROM t) t2
ON t2.timestamp = t1.timestamp
GROUP BY NVL(t1.timestamp,t2.timestamp)
ORDER BY timestamp

Select multiple max values after GROUP BY query

Suppose I have a table that looks like this:
date ID income
0 9/1 C 10.40
1 9/3 A 33.90
2 9/3 B 29.10
3 9/4 C 19.30
4 9/4 B 17.80
5 9/5 B 9.55
6 9/5 C 11.10
7 9/5 A 13.10
8 9/7 A 29.10
9 9/7 B 29.10
I want to find out the ID who made the most income for each date. The most intuitive approach would be writing
SELECT ID, MAX(income) FROM table GROUP BY date
But two IDs made the same maximum income on 9/7, and I want to retain all ties on the same date. With that query I would lose one of the IDs on 9/7 (and 29.1 appears on both 9/3 and 9/7). Is there another approach?
A join based approach doesn't have this problem, and would retain all records tied for the max income on a given date.
SELECT t1.*
FROM yourTable t1
INNER JOIN
(
SELECT date, MAX(income) AS max_income
FROM yourTable
GROUP BY date
) t2
ON t1.date = t2.date AND t1.income = t2.max_income
ORDER BY
t1.date;
The way the above query works is to join the complete original table to a subquery which finds, for each date, the maximum income value. This has the effect of filtering off any record which did not have the max income on a given date. Pay close attention to the join condition, which has two components, the date, and the income.
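Here is the join approach run against the question's own sample rows, as a small SQLite sketch driven from Python (the table name incomes is an assumption). Both tied IDs for 9/7 come back.

```python
import sqlite3

# Sample rows from the question; dates are kept as the text '9/1' ... '9/7',
# which sort correctly here only because every day is a single digit.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE incomes (date TEXT, ID TEXT, income REAL)")
conn.executemany("INSERT INTO incomes VALUES (?, ?, ?)", [
    ("9/1", "C", 10.40), ("9/3", "A", 33.90), ("9/3", "B", 29.10),
    ("9/4", "C", 19.30), ("9/4", "B", 17.80), ("9/5", "B", 9.55),
    ("9/5", "C", 11.10), ("9/5", "A", 13.10), ("9/7", "A", 29.10),
    ("9/7", "B", 29.10),
])

sql = """
SELECT t1.date, t1.ID, t1.income
FROM incomes t1
JOIN (SELECT date, MAX(income) AS max_income
      FROM incomes
      GROUP BY date) t2
  ON t1.date = t2.date AND t1.income = t2.max_income  -- both conditions matter
ORDER BY t1.date, t1.ID
"""
winners = conn.execute(sql).fetchall()
print(winners)
```

Dropping the date condition from the ON clause would let the 9/3 row for B (29.10) match the 9/7 maximum, which is exactly the 29.1 confusion mentioned in the question.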
If your database supports analytic functions, we can also use RANK here:
SELECT date, ID, income
FROM
(
SELECT t.*, RANK() OVER (PARTITION BY date ORDER BY income DESC) rnk
FROM yourTable t
) t
WHERE rnk = 1
ORDER BY date;
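A quick way to see why RANK, rather than ROW_NUMBER, is the right choice here is to run both on the question's sample rows. This is a SQLite sketch from Python, assuming SQLite 3.25 or later for window functions; the table name incomes is an assumption.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE incomes (date TEXT, ID TEXT, income REAL)")
conn.executemany("INSERT INTO incomes VALUES (?, ?, ?)", [
    ("9/1", "C", 10.40), ("9/3", "A", 33.90), ("9/3", "B", 29.10),
    ("9/4", "C", 19.30), ("9/4", "B", 17.80), ("9/5", "B", 9.55),
    ("9/5", "C", 11.10), ("9/5", "A", 13.10), ("9/7", "A", 29.10),
    ("9/7", "B", 29.10),
])

def top_per_date(func):
    # func is RANK or ROW_NUMBER; both number rows within each date, highest income first
    sql = f"""
        SELECT date, ID, income
        FROM (SELECT t.*, {func}() OVER (PARTITION BY date ORDER BY income DESC) AS rnk
              FROM incomes t)
        WHERE rnk = 1
        ORDER BY date, ID
    """
    return conn.execute(sql).fetchall()

ranked = top_per_date("RANK")          # ties share rank 1, so 9/7 keeps A and B
numbered = top_per_date("ROW_NUMBER")  # one row per date; which 9/7 ID survives is arbitrary
print(ranked)
print(numbered)
```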
One approach is the following:
with cte1 as
(
Select t1.*
FROM yourTable t1
INNER JOIN
(
SELECT date, MAX(income) AS max_income
FROM yourTable
GROUP BY date
) t2
ON t1.date = t2.date AND t1.income = t2.max_income
) select min(ID) as ID, date,income from cte1
group by date,income
As you have not mentioned which ID you need when two IDs have the same income on a particular date, I took the minimum ID among them. You could equally use the max() function instead.
Try the query below, which uses a subquery. Since you have a tie on one date, it takes the minimum ID, which gives you one ID for 9/7:
select date, min(ID) as ID, income
from
(SELECT t1.date, t1.ID,t1.income
FROM yourTable t1
INNER JOIN
(
SELECT date, MAX(income) AS mincome
FROM yourTable
GROUP BY date
) t2 ON t1.date = t2.date AND t1.income = t2.mincome
)X group by date,income

Use MIN() where you cannot GROUP?

I feel pretty dumb, but I am stuck on an apparently very easy query. I have something like this, where every row is a user who watched a movie:
user_id date duration
1 01-01-01 62m
1 03-01-01 95m
2 02-01-01 58m
2 06-01-01 25m
2 08-01-01 95m
3 03-01-01 96m
Now, what I would like to have is a table with the first movie watched by each user and its duration. The problem is that if I use MIN(), I have to GROUP BY both user_id and duration. But if I also GROUP BY duration, I basically get the same table back. How can I solve the problem?
You can use a ranking function like ROW_NUMBER:
WITH CTE AS
(
SELECT rn = ROW_NUMBER() OVER (PARTITION BY user_id ORDER BY date ASC),
user_id, date, duration
FROM dbo.TableName
)
SELECT user_id, date, duration FROM CTE WHERE rn = 1
The advantage of ROW_NUMBER is that you can change the logic easily. For example, if you want to reverse the logic and get the row of the last watched film per user, you just have to change ORDER BY date ASC to ORDER BY date DESC.
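The switch between the first and the last watched film can be checked with a small SQLite sketch from Python (SQLite 3.25+ for window functions). The table name watched and the column watch_date are assumptions, and the question's dates were converted to ISO text as noted in the comments.

```python
import sqlite3

# The question's rows, with dates rewritten as ISO text and durations as minutes.
# Assumption: '01-01-01' means 2001-01-01, '03-01-01' means 2001-01-03, and so on.
# Reading them as MM-DD-YY instead would not change the order within any user.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE watched (user_id INTEGER, watch_date TEXT, duration INTEGER)")
conn.executemany("INSERT INTO watched VALUES (?, ?, ?)", [
    (1, "2001-01-01", 62), (1, "2001-01-03", 95),
    (2, "2001-01-02", 58), (2, "2001-01-06", 25), (2, "2001-01-08", 95),
    (3, "2001-01-03", 96),
])

def one_per_user(direction):
    # direction flips between the first (ASC) and the last (DESC) watched film
    if direction not in ("ASC", "DESC"):
        raise ValueError("direction must be ASC or DESC")
    sql = f"""
        SELECT user_id, watch_date, duration
        FROM (SELECT user_id, watch_date, duration,
                     ROW_NUMBER() OVER (PARTITION BY user_id
                                        ORDER BY watch_date {direction}) AS rn
              FROM watched)
        WHERE rn = 1
        ORDER BY user_id
    """
    return conn.execute(sql).fetchall()

first = one_per_user("ASC")
last = one_per_user("DESC")
print(first)
print(last)
```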
The advantage of the CTE (common table expression) is that you can also use it to delete or update these records, which is often done to identify or delete duplicates. You can first select to see what will be deleted or updated before you execute it.
Try this query. I haven't tested it.
SELECT user_id, date, duration FROM tablename n
WHERE NOT EXISTS(
SELECT date, user_id FROM tablename g
WHERE n.user_id = g.user_id AND g.date < n.date
);
Assuming there can only be a single record per user per date, it'd be something like this:
select t.*
from table t
inner join (
select user_id, min(date) mindate
from table
group by user_id
) t1
on t.user_id = t1.user_id
and t.date = t1.mindate
You can use ROW_NUMBER(), a ranking function that generates a sequential number within every group, ordered by the column you want to sort on. With ROW_NUMBER(), if there is a tie, only one record per user is selected; if you want to select all of the tied records, use DENSE_RANK() instead of ROW_NUMBER().
SELECT user_id, date, duration
FROM
(
SELECT user_id, date, duration,
ROW_NUMBER() OVER (PARTITION BY user_id ORDER BY date) rn
FROM tableName
) a
WHERE rn = 1
This also assumes that the data type of the date column is DATE.
If you are using SQL Server 2005 or later, you can use windowing functions.
SELECT *
FROM
(
SELECT user_id, date, duration, MIN(date) OVER(PARTITION BY user_id) AS MIN_DATE
FROM MY_TABLE
) AS RESULTS
WHERE date = MIN_DATE
The OVER clause with PARTITION BY will "group by" the user_id and compute the min date per user_id without eliminating any rows. Then you select the rows where the date equals the min date, and you are left with the first date per user_id. This is a common trick once you know about windowing functions.
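A sketch of the MIN() OVER trick in SQLite from Python (SQLite 3.25+). One hypothetical row is added, user 3 also watching a 40-minute film on their first day, to show that, unlike ROW_NUMBER() = 1, the date = MIN_DATE filter keeps every row tied on the first date.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE watched (user_id INTEGER, watch_date TEXT, duration INTEGER)")
conn.executemany("INSERT INTO watched VALUES (?, ?, ?)", [
    (1, "2001-01-01", 62), (1, "2001-01-03", 95),
    (2, "2001-01-02", 58), (2, "2001-01-06", 25), (2, "2001-01-08", 95),
    (3, "2001-01-03", 96),
    (3, "2001-01-03", 40),  # hypothetical second film on user 3's first day
])

sql = """
SELECT user_id, watch_date, duration
FROM (SELECT user_id, watch_date, duration,
             MIN(watch_date) OVER (PARTITION BY user_id) AS min_date  -- no rows collapsed
      FROM watched) AS results
WHERE watch_date = min_date
ORDER BY user_id, duration
"""
first_rows = conn.execute(sql).fetchall()
print(first_rows)
```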
If you want the first watch_date per user, there should be no date before this date for this user:
SELECT *
FROM watched_movies wm
WHERE NOT EXISTS (
SELECT *
FROM watched_movies nx
WHERE nx.user_id = wm.user_id
AND nx.watch_date < wm.watch_date
);
Note: I replaced the date column by watch_date, since date is a reserved word (type name).
This should give you the duration of the first movie watched on the earliest date, assuming the table also has a session_id column that orders the sessions within a day:
SELECT a.user_id, b.date, a.duration
FROM table a
INNER JOIN (SELECT user_id,min(date) date FROM table GROUP BY user_id) b ON a.user_id = b.user_id AND a.date = b.date
INNER JOIN (SELECT user_id,date,min(session_id) session_id FROM table GROUP BY user_id, date) c ON b.user_id = c.user_id AND b.date = c.date AND a.session_id = c.session_id
Try this:
WITH TABLE1
AS (SELECT
'1' AS USER_ID,
'01-01-01' AS DT,
62 AS DURATION
FROM
DUAL
UNION ALL
SELECT
'1' AS USER_ID,
'03-01-01' AS DT,
95 AS DURATION
FROM
DUAL
UNION ALL
SELECT
'2' AS USER_ID,
'02-01-01' AS DT,
58 AS DURATION
FROM
DUAL
UNION ALL
SELECT
'2' AS USER_ID,
'06-01-01' AS DT,
25 AS DURATION
FROM
DUAL
UNION ALL
SELECT
'2' AS USER_ID,
'08-01-01' AS DT,
95 AS DURATION
FROM
DUAL
UNION ALL
SELECT
'3' AS USER_ID,
'03-01-01' AS DT,
96 AS DURATION
FROM
DUAL)
SELECT
*
FROM
(SELECT
USER_ID,
DT,
DURATION,
RANK ( ) OVER (PARTITION BY USER_ID ORDER BY DT ASC) AS ROW_RANK
FROM
TABLE1)
WHERE
ROW_RANK = 1
Use a sub-query to get the min date then join that back to the table to get all other relevant columns.
SELECT T2.user_id
,T2.date
,T2.duration
FROM YourTable T2
INNER JOIN
(
SELECT T1.user_id
,MIN(T1.date) as first_date
FROM YourTable T1
GROUP BY T1.user_id
) SQ
ON T2.user_id = SQ.user_id
AND T2.date = sq.first_date

PostgreSQL MAX and GROUP BY

I have a table with id, year and count.
I want to get the MAX(count) for each id and keep the year when it happens, so I make this query:
SELECT id, year, MAX(count)
FROM table
GROUP BY id;
Unfortunately, it gives me an error:
ERROR: column "table.year" must appear in the GROUP BY clause or be
used in an aggregate function
So I try:
SELECT id, year, MAX(count)
FROM table
GROUP BY id, year;
But then, it doesn't do MAX(count), it just shows the table as it is. I suppose because when grouping by year and id, it gets the max for the id of that specific year.
So, how can I write that query? I want to get each id's MAX(count) and the year when it happens.
The shortest (and possibly fastest) query would be with DISTINCT ON, a PostgreSQL extension of the SQL standard DISTINCT clause:
SELECT DISTINCT ON (1)
id, count, year
FROM tbl
ORDER BY 1, 2 DESC, 3;
The numbers refer to ordinal positions in the SELECT list. You can spell out column names for clarity:
SELECT DISTINCT ON (id)
id, count, year
FROM tbl
ORDER BY id, count DESC, year;
The result is ordered by id etc. which may or may not be welcome. It's better than "undefined" in any case.
It also breaks ties (when multiple years share the same maximum count) in a well defined way: pick the earliest year. If you don't care, drop year from the ORDER BY. Or pick the latest year with year DESC.
For many rows per id, other query techniques are (much) faster. See:
Select first row in each GROUP BY group?
Optimize GROUP BY query to retrieve latest row per user
select *
from (
select id,
year,
thing,
max(thing) over (partition by id) as max_thing
from the_table
) t
where thing = max_thing
or:
select t1.id,
t1.year,
t1.thing
from the_table t1
where t1.thing = (select max(t2.thing)
from the_table t2
where t2.id = t1.id);
or
select t1.id,
t1.year,
t1.thing
from the_table t1
join (
select id, max(t2.thing) as max_thing
from the_table t2
group by id
) t on t.id = t1.id and t.max_thing = t1.thing
or (same as the previous with a different notation)
with max_stuff as (
select id, max(t2.thing) as max_thing
from the_table t2
group by id
)
select t1.id,
t1.year,
t1.thing
from the_table t1
join max_stuff t2
on t1.id = t2.id
and t1.thing = t2.max_thing
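All four queries above keep every year that shares an id's maximum, whereas DISTINCT ON returns exactly one row per id. DISTINCT ON is PostgreSQL-specific, so this SQLite sketch (run from Python, with hypothetical data and SQLite 3.25+ for the window function) contrasts the correlated-subquery version with a ROW_NUMBER() ordered by thing DESC, year, which mimics the DISTINCT ON tie-break of picking the earliest year.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE the_table (id INTEGER, year INTEGER, thing INTEGER)")
conn.executemany("INSERT INTO the_table VALUES (?, ?, ?)", [
    (1, 2000, 5), (1, 2001, 9), (1, 2002, 9),  # id 1 hits its maximum twice
    (2, 2000, 4), (2, 2001, 3),
])

# Correlated subquery: every row equal to its id's maximum survives.
all_ties = conn.execute("""
    SELECT t1.id, t1.year, t1.thing
    FROM the_table t1
    WHERE t1.thing = (SELECT MAX(t2.thing) FROM the_table t2 WHERE t2.id = t1.id)
    ORDER BY t1.id, t1.year
""").fetchall()

# One row per id; among ties, the earliest year wins.
one_per_id = conn.execute("""
    SELECT id, year, thing
    FROM (SELECT id, year, thing,
                 ROW_NUMBER() OVER (PARTITION BY id ORDER BY thing DESC, year) AS rn
          FROM the_table)
    WHERE rn = 1
    ORDER BY id
""").fetchall()

print(all_ties)
print(one_per_id)
```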

how do I query sql for a latest record date for each user

I have a table that is a collection of entries recording when a user logged on.
username, date, value
--------------------------
brad, 1/2/2010, 1.1
fred, 1/3/2010, 1.0
bob, 8/4/2009, 1.5
brad, 2/2/2010, 1.2
fred, 12/2/2009, 1.3
etc..
How do I create a query that would give me the latest date for each user?
Update: I forgot that I needed to have a value that goes along with the latest date.
This is the simple old school approach that works with almost any db engine, but you have to watch out for duplicates:
select t.username, t.date, t.value
from MyTable t
inner join (
select username, max(date) as MaxDate
from MyTable
group by username
) tm on t.username = tm.username and t.date = tm.MaxDate
Using window functions will avoid any possible issues with duplicate records due to duplicate date values, so if your db engine allows it you can do something like this:
select x.username, x.date, x.value
from (
select username, date, value,
row_number() over (partition by username order by date desc) as _rn
from MyTable
) x
where x._rn = 1
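The duplicate problem is easy to reproduce. This SQLite sketch (via Python, SQLite 3.25+) uses the question's rows with dates converted to ISO text, assuming they are month/day/year, plus one hypothetical extra brad row on his latest date. The join returns both of brad's rows; ROW_NUMBER returns one, chosen arbitrarily among the ties.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE MyTable (username TEXT, date TEXT, value REAL)")
conn.executemany("INSERT INTO MyTable VALUES (?, ?, ?)", [
    ("brad", "2010-01-02", 1.1), ("fred", "2010-01-03", 1.0),
    ("bob", "2009-08-04", 1.5), ("brad", "2010-02-02", 1.2),
    ("fred", "2009-12-02", 1.3),
    ("brad", "2010-02-02", 0.9),  # hypothetical duplicate of brad's latest date
])

joined = conn.execute("""
    SELECT t.username, t.date, t.value
    FROM MyTable t
    JOIN (SELECT username, MAX(date) AS MaxDate FROM MyTable GROUP BY username) tm
      ON t.username = tm.username AND t.date = tm.MaxDate
    ORDER BY t.username
""").fetchall()

numbered = conn.execute("""
    SELECT username, date, value
    FROM (SELECT username, date, value,
                 ROW_NUMBER() OVER (PARTITION BY username ORDER BY date DESC) AS rn
          FROM MyTable)
    WHERE rn = 1
    ORDER BY username
""").fetchall()

print(joined)    # brad appears twice
print(numbered)  # one row per user
```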
Using window functions (works in Oracle, Postgres 8.4, SQL Server 2005, DB2, Sybase, Firebird 3.0, MariaDB 10.3)
select * from (
select
username,
date,
value,
row_number() over(partition by username order by date desc) as rn
from
yourtable
) t
where t.rn = 1
I see that most developers use an inline query without considering its impact on large data sets.
Simply, you can achieve this by:
SELECT a.username, a.date, a.value
FROM myTable a
LEFT OUTER JOIN myTable b
ON a.username = b.username
AND a.date < b.date
WHERE b.username IS NULL
ORDER BY a.date desc;
From my experience the fastest way is to take each row for which there is no newer row in the table.
Another advantage is that the syntax used is very simple, and that the meaning of the query is rather easy to grasp (take all rows such that no newer row exists for the username being considered).
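The anti-join idea can be checked against the NOT EXISTS form using the question's rows (dates converted to ISO text, assuming month/day/year). A minimal SQLite sketch from Python; both queries should return the same three rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE myTable (username TEXT, date TEXT, value REAL)")
conn.executemany("INSERT INTO myTable VALUES (?, ?, ?)", [
    ("brad", "2010-01-02", 1.1), ("fred", "2010-01-03", 1.0),
    ("bob", "2009-08-04", 1.5), ("brad", "2010-02-02", 1.2),
    ("fred", "2009-12-02", 1.3),
])

# Keep a row only if the join finds no newer row for the same user.
left_join = conn.execute("""
    SELECT a.username, a.date, a.value
    FROM myTable a
    LEFT OUTER JOIN myTable b
      ON a.username = b.username AND a.date < b.date
    WHERE b.username IS NULL
    ORDER BY a.username
""").fetchall()

# Same condition written as a correlated NOT EXISTS.
not_exists = conn.execute("""
    SELECT t.username, t.date, t.value
    FROM myTable t
    WHERE NOT EXISTS (SELECT 1 FROM myTable w
                      WHERE w.username = t.username AND w.date > t.date)
    ORDER BY t.username
""").fetchall()

print(left_join)
print(not_exists)
```

Like the plain join, both forms return several rows for a user if two rows share that user's latest date.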
NOT EXISTS
SELECT username, value
FROM t
WHERE NOT EXISTS (
SELECT *
FROM t AS witness
WHERE witness.username = t.username AND witness.date > t.date
);
ROW_NUMBER
SELECT username, value
FROM (
SELECT username, value, row_number() OVER (PARTITION BY username ORDER BY date DESC) AS rn
FROM t
) t2
WHERE rn = 1
INNER JOIN
SELECT t.username, t.value
FROM t
INNER JOIN (
SELECT username, MAX(date) AS date
FROM t
GROUP BY username
) tm ON t.username = tm.username AND t.date = tm.date;
LEFT OUTER JOIN
SELECT username, value
FROM t
LEFT OUTER JOIN t AS w ON t.username = w.username AND t.date < w.date
WHERE w.username IS NULL
To get the whole row containing the max date for the user:
select username, date, value
from tablename where (username, date) in (
select username, max(date) as date
from tablename
group by username
)
SELECT *
FROM MyTable T1
WHERE date = (
SELECT max(date)
FROM MyTable T2
WHERE T1.username=T2.username
)
This one should give you the correct result for your edited question.
The sub-query makes sure to find only rows of the latest date, and the outer GROUP BY will take care of ties. When there are two entries for the same date for the same user, it will return the one with the highest value.
SELECT t.username, t.date, MAX( t.value ) value
FROM your_table t
JOIN (
SELECT username, MAX( date ) date
FROM your_table
GROUP BY username
) x ON ( x.username = t.username AND x.date = t.date )
GROUP BY t.username, t.date
If your database syntax supports it, then TOP 1 WITH TIES can be a lifesaver in combination with ROW_NUMBER().
With the example data you provided, use this query:
SELECT TOP 1 WITH TIES
username, date, value
FROM user_log_in_attempts
ORDER BY ROW_NUMBER() OVER (PARTITION BY username ORDER BY date DESC)
It yields:
username | date | value
-----------------------------
bob | 8/4/2009 | 1.5
brad | 2/2/2010 | 1.2
fred | 1/3/2010  | 1.0
How it works:
ROW_NUMBER() OVER (PARTITION BY... ORDER BY...): for each username, the rows are numbered from the youngest (row number 1) to the oldest (highest row number).
ORDER BY ROW_NUMBER()... sorts the youngest rows of each user to the top, followed by the second-youngest rows of each user, and so on.
TOP 1 WITH TIES Because each user has a youngest row, those youngest rows are equal in the sense of the sorting criteria (all have rownumber=1). All those youngest rows will be returned.
Tested with SQL-Server.
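Which row counts as "youngest" depends on date being a real date type. If the values are stored as M/D/YYYY text, comparisons are character by character: for fred, '12/2/2009' sorts after '1/3/2010' because '2' comes after '/'. A small Python/SQLite sketch of that pitfall, using fred's two dates from the question (the ISO conversion assumes month/day/year):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE logins (username TEXT, us_text TEXT, iso_date TEXT)")
# fred's two rows, stored both as M/D/YYYY text and as ISO-8601 text
conn.executemany("INSERT INTO logins VALUES (?, ?, ?)", [
    ("fred", "1/3/2010", "2010-01-03"),
    ("fred", "12/2/2009", "2009-12-02"),
])

latest_by_text = conn.execute("SELECT MAX(us_text) FROM logins").fetchone()[0]
latest_by_iso = conn.execute("SELECT MAX(iso_date) FROM logins").fetchone()[0]
print(latest_by_text)  # 12/2/2009: character comparison, not chronological
print(latest_by_iso)   # 2010-01-03: ISO text sorts chronologically
```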
SELECT DISTINCT Username, Dates, value
FROM TableName t
WHERE Dates IN (SELECT MAX(Dates) FROM TableName t2 WHERE t2.Username = t.Username)
Username Dates value
bob 2009-08-04 1.5
brad 2010-02-02 1.2
fred 2010-01-03 1.0
This is similar to one of the answers above, but in my opinion it is a lot simpler and tidier. It also shows a good use of CROSS APPLY. For SQL Server 2005 and above:
select
a.username,
a.date,
a.value
from yourtable a
cross apply (select max(date) 'maxdate' from yourtable a1 where a.username=a1.username) b
where a.date=b.maxdate
You could also use the analytic RANK function:
with temp as
(
select username, date, value, RANK() over (partition by username order by date desc) as rnk from t
)
select username, date, value from temp where rnk = 1
SELECT MAX(DATE) AS dates
FROM assignment
JOIN paper_submission_detail ON assignment.PAPER_SUB_ID =
paper_submission_detail.PAPER_SUB_ID
SELECT Username, date, value
from MyTable mt
inner join (select username, max(date) date
from MyTable
group by username) sub
on sub.username = mt.username
and sub.date = mt.date
This would address the updated problem. It might not work so well on large tables, even with good indexing.
SELECT *
FROM ReportStatus c
inner join ( SELECT
MAX(Date) AS MaxDate
FROM ReportStatus ) m
on c.date = m.maxdate
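For Oracle, sort the result set in descending order and take the first record to get the latest record. Because ROWNUM is assigned before ORDER BY is applied, the sort has to happen in a subquery:
select * from (
select * from mytable
order by date desc
)
where rownum = 1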
SELECT t1.username, t1.date, value
FROM MyTable as t1
INNER JOIN (SELECT username, MAX(date) AS date
FROM MyTable
GROUP BY username) as t2 ON t2.username = t1.username AND t2.date = t1.date
Select * from table1 where user=yourUserName and latest_date=(select Max(latest_date) from table1 where user=yourUserName)
The inner query returns the latest date for the given user; the outer query pulls all of that user's data matching the inner query's result.
I used this approach to get the last record for each user in my table.
It was a query to get the last location of each salesman, based on the most recent time detected on their PDA devices.
CREATE FUNCTION dbo.UsersLocation()
RETURNS TABLE
AS
RETURN
Select GS.UserID, MAX(GS.UTCDateTime) 'LastDate'
From USERGPS GS
where year(GS.UTCDateTime) = YEAR(GETDATE())
Group By GS.UserID
GO
select gs.UserID, sl.LastDate, gs.Latitude , gs.Longitude
from USERGPS gs
inner join [USER] s on gs.SalesManNo = s.SalesmanNo
inner join dbo.UsersLocation() sl on gs.UserID= sl.UserID and gs.UTCDateTime = sl.LastDate
order by LastDate desc
My small compilation:
- A self join is better than a nested select.
- But GROUP BY doesn't give you the primary key, which is preferable for the join.
- This key can be obtained with PARTITION BY in conjunction with FIRST_VALUE.
So, here is a query:
select
t.*
from
Table t inner join (
select distinct first_value(ID) over(partition by GroupColumn order by DateColumn desc) as ID
from Table
where FilterColumn = 'value'
) j on t.ID = j.ID
Pros:
Filter data with where statement using any column
select any columns from filtered rows
Cons:
Requires MS SQL Server 2012 or later.
I did something similar for my application.
Below is the query:
select distinct i.userId,i.statusCheck, l.userName from internetstatus
as i inner join login as l on i.userID=l.userID
where i.nowtime = (select max(s.nowtime) from InternetStatus s where s.userID = i.userID);
Here's one way to return only the most recent record for each user in SQL Server:
WITH CTE AS (
SELECT *, ROW_NUMBER() OVER (PARTITION BY user_id ORDER BY date DESC) AS rn
FROM your_table
)
SELECT *
FROM CTE
WHERE rn = 1;
This uses a common table expression (CTE) to assign a unique rn (row number) to each record for each user, based on the user_id and sorted in descending order by date. The final query then selects only the records with rn equal to 1, which represents the most recent record for each user.
SELECT * FROM TABEL1 WHERE DATE= (SELECT MAX(CREATED_DATE) FROM TABEL1)
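You would use the aggregate function MAX with GROUP BY. Grouping by username alone gives the latest date per user (grouping by value as well would return one row per username and value pair):
SELECT username, MAX(date) AS latest_date FROM tablename GROUP BY username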