Custom GROUP BY clause - sql

SAMPLE DATA
Suppose I have table like this:
No Company Vendor Code Date
1 C1 V1 C1 2016-03-08
1 C1 V1 C1 2016-03-07
1 C1 V1 C2 2016-03-06
DESIRED OUTPUT
Desired output should be:
No Company Vendor Code Date
1 C1 V1 C1 2016-03-08
It should take the max Date for each No, Company, Vendor (group by these columns). It should not group by Code; Code has to be taken from the row with that Date.
QUERY
SQL query like:
.....
LEFT JOIN (
SELECT No_, Company, Vendor, Code, MAX(Date)
FROM tbl
GROUP BY No_, Company, Vendor, Code
) t2 ON t1.Company = t2.Company and t1.No_ = t2.No_
.....
OUTPUT FOR NOW
But I got output for now:
No Company Vendor Code Date
1 C1 V1 C1 2016-03-08
1 C1 V1 C2 2016-03-06
That is because the Code values differ, but it should take the C1 code in this case (because No, Company and Vendor match).
WHAT I'VE TRIED
I've tried removing Code from the GROUP BY clause and using SELECT MAX(Code)..., but this is wrong because it takes the highest Code alphabetically.
Do you have any ideas how I can achieve this? If something is not clear, I can explain more.

If you don't have an identity column in your table, then each row is identified by the combination of all its column values. That leads to a somewhat odd ON clause: it includes all the columns we are grouping by plus a date column holding the max date for the given tuple (No_, Company, Vendor).
select t1.No_, t1.Company, t1.Vendor, t1.Code, t1.Date
from tbl t1
join (select No_, Company, Vendor, MAX(Date) as Date
from tbl
group by No_, Company, Vendor) t2
on t1.No_ = t2.No_ and
t1.Company = t2.Company and
t1.Vendor = t2.Vendor and
t1.Date = t2.Date
Take a look at this similar question.
Edit
Thank you for the answer, but this returns duplicates. Suppose there can be rows with equal No, Company, Vendor and Date where some other columns differ, which I don't care about. The inner SELECT is fine and returns distinct values, but the problem occurs when joining to t1, because it has multiple matching rows.
Then you might be interested in T-SQL constructs such as rank or row_number. Take a look at Ullas' answer. Try rank as well, as it can give slightly different output which might fit your needs.
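To make the difference between row_number and rank concrete, here is a minimal sketch. It runs the SQL through Python's built-in sqlite3 module, which is an assumption made only to keep the example self-contained (the question targets SQL Server); window functions need SQLite 3.25 or later. The lower-case column names and the extra row with code C3 are hypothetical, added to create a tie on the latest date.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE tbl (no_ INTEGER, company TEXT, vendor TEXT, code TEXT, date TEXT)"
)
conn.executemany(
    "INSERT INTO tbl VALUES (?, ?, ?, ?, ?)",
    [
        (1, "C1", "V1", "C1", "2016-03-08"),
        (1, "C1", "V1", "C3", "2016-03-08"),  # hypothetical extra row creating a tie
        (1, "C1", "V1", "C1", "2016-03-07"),
        (1, "C1", "V1", "C2", "2016-03-06"),
    ],
)

# ROW_NUMBER numbers tied rows 1, 2, ... so exactly one row per group survives.
# The extra "code" sort key is an assumption that makes the pick deterministic.
row_number_rows = conn.execute(
    """
    SELECT code, date
    FROM (
        SELECT t.*, ROW_NUMBER() OVER (
            PARTITION BY no_, company, vendor
            ORDER BY date DESC, code
        ) AS rn
        FROM tbl t
    )
    WHERE rn = 1
    """
).fetchall()

# RANK gives every row tied on the latest date the value 1, so all ties survive.
rank_rows = conn.execute(
    """
    SELECT code, date
    FROM (
        SELECT t.*, RANK() OVER (
            PARTITION BY no_, company, vendor
            ORDER BY date DESC
        ) AS rnk
        FROM tbl t
    )
    WHERE rnk = 1
    ORDER BY code
    """
).fetchall()

print(row_number_rows)  # a single row for the group
print(rank_rows)        # both rows that share the latest date
```

So row_number fits when exactly one row per group is wanted, while rank behaves like the join on MAX(Date) and keeps every tied row.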

You could assign a row_number partitioned by No, Company and Vendor and ordered by date descending.
Query
;with cte as (
select rn = row_number() over(
partition by [No], Company, Vendor
order by [Date] desc
), *
from tbl
)
select [No], Company, Vendor, Code, [Date] from cte
where rn = 1;

If each date can have only one record per group, you can find the max date for the group first and then filter on it:
select No_, Company, Vendor, Code, Date
FROM tbl t
where Date =
(select MAX(Date) from tbl where No_ = t.No_ and Company = t.Company and Vendor = t.Vendor)
If more than one row can have the same date, you could use a partition instead:
with cte as
(
select *, ROW_NUMBER() over(partition by No_, Company, Vendor order by Date DESC) as rn
from tbl
)
select No_, Company, Vendor, Code, Date
from cte
where rn=1

A Common Table Expression will do it for you.
WITH cte(N,C,V,D)
AS
(
SELECT t1.[No]
,t1.[Company]
,t1.[Vendor]
,MAX(t1.[Date])
FROM [MyTest] t1
GROUP BY t1.[No]
,t1.[Company]
,t1.[Vendor]
)
SELECT N,C,V,t2.Code,D
FROM cte c
INNER JOIN MyTest t2 ON c.N = t2.No AND c.C = t2.Company AND c.V = t2.Vendor AND c.D = t2.Date

Related

Select multiple max values after GROUP BY query

Suppose I have a table look like this:
date ID income
0 9/1 C 10.40
1 9/3 A 33.90
2 9/3 B 29.10
3 9/4 C 19.30
4 9/4 B 17.80
5 9/5 B 9.55
6 9/5 C 11.10
7 9/5 A 13.10
8 9/7 A 29.10
9 9/7 B 29.10
I want to find out the ID who made the most income for each date. The most intuitive approach would be writing
SELECT ID, MAX(income) FROM table GROUP BY date
But two IDs made the same MAX income on 9/7, and I want to retain all ties on the same date. That query would drop one ID on 9/7. Also note that 29.1 appears on both 9/3 and 9/7. Is there any other approach?
A join based approach doesn't have this problem, and would retain all records tied for the max income on a given date.
SELECT t1.*
FROM yourTable t1
INNER JOIN
(
SELECT date, MAX(income) AS max_income
FROM yourTable
GROUP BY date
) t2
ON t1.date = t2.date AND t1.income = t2.max_income
ORDER BY
t1.date;
The way the above query works is to join the complete original table to a subquery which finds, for each date, the maximum income value. This has the effect of filtering off any record which did not have the max income on a given date. Pay close attention to the join condition, which has two components, the date, and the income.
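As a quick check of that behaviour, the sketch below loads the question's sample rows into an in-memory SQLite database and runs the same join. Using Python's sqlite3 module is an assumption made only for a self-contained demo; the dates are stored as the plain text shown in the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE yourTable (date TEXT, ID TEXT, income REAL)")
conn.executemany(
    "INSERT INTO yourTable VALUES (?, ?, ?)",
    [
        ("9/1", "C", 10.40), ("9/3", "A", 33.90), ("9/3", "B", 29.10),
        ("9/4", "C", 19.30), ("9/4", "B", 17.80), ("9/5", "B", 9.55),
        ("9/5", "C", 11.10), ("9/5", "A", 13.10), ("9/7", "A", 29.10),
        ("9/7", "B", 29.10),
    ],
)

# Join each row to the per-date maximum; matching on BOTH date and income
# keeps every row tied for the maximum on its own date.
top_earners = conn.execute(
    """
    SELECT t1.date, t1.ID, t1.income
    FROM yourTable t1
    INNER JOIN (
        SELECT date, MAX(income) AS max_income
        FROM yourTable
        GROUP BY date
    ) t2
        ON t1.date = t2.date AND t1.income = t2.max_income
    ORDER BY t1.date, t1.ID
    """
).fetchall()

for row in top_earners:
    print(row)
```

Both A and B are returned for 9/7, and B's 29.10 on 9/3 is not picked up even though the same amount is a maximum on another date, because the join also matches on the date.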
If your database supports analytic functions, we can also use RANK here:
SELECT date, ID, income
FROM
(
SELECT t.*, RANK() OVER (PARTITION BY date ORDER BY income DESC) rnk
FROM yourTable t
) t
WHERE rnk = 1
ORDER BY date;
One approach can be like the one below:
with cte1 as
(
Select t1.*
FROM yourTable t1
INNER JOIN
(
SELECT date, MAX(income) AS max_income
FROM yourTable
GROUP BY date
) t2
ON t1.date = t2.date AND t1.income = t2.max_income
) select min(ID) as ID, date,income from cte1
group by date,income
Since you did not mention which ID you need when two IDs have the same income on a particular date, I took the minimum ID among them. You may equally use the max() function instead.
Try the query below, which uses a subquery. Since there is a tie on one date, it takes the minimum ID, which gives you a single ID for 9/7:
select date,min(ID),income
from
(SELECT t1.date, t1.ID,t1.income
FROM yourTable t1
INNER JOIN
(
SELECT date, MAX(income) AS mincome
FROM yourTable
GROUP BY date
) t2 ON t1.date = t2.date AND t1.income = t2.mincome
)X group by date,income

Group-by statement doesn't work

So currently, I have the following table:
ID, Name, Code, Date
1 AB x1 01/03/2014
1 AB x2 01/04/2014
1 AB x3 01/05/2014
2 BC x3 01/05/2014
2 BC x5 01/06/2014
3 CD x1 01/06/2014
I want the following output:
ID, Name, Code, Date
1 AB x3 01/05/2014
2 BC x5 01/06/2014
3 CD x1 01/06/2014
So basically, I just want the latest date, without caring for the code.
In my code, I have
select id, name, code, max(date)
group by id, name, code
But the group by does not work, as it also takes the code into consideration, so I don't get just the latest date. Also, I can't leave code out of the group by statement, as that gives me an error.
How do I use a group by without including code?
I'm using PL/SQL developer as IDE.
select id, name, code, date
from (
select id, name, code,
date,
max(date) over (partition by id) as max_date
from the_table
)
where date = max_date;
If you want to pick exactly one of the dates if there are multiple "max dates" you can use row_number() instead:
select id, name, code, date
from (
select id, name, code,
date,
row_number() over (partition by id order by date desc) as rn
from the_table
)
where rn = 1;
Btw: date is a horrible name for a column. For one because it's also the name of a data type but more importantly because it does not document at all what the column contains. An "end date"? A "start date"? A "due date"? ...
What you want is the latest updated record, right?
select t1.*
from table t1
inner join (select id, name, max(date) as latest_date
from table
group by id, name) t2 on t1.date = t2.latest_date
and t1.id = t2.id and t1.name = t2.name
It would be good to have an index on the date column.
I assume you want to get whatever the code is that is on the row that has the max date. If you truly don't care what code gets returned, just use an aggregate function on it like max(code).
Otherwise, you can do this:
SELECT t1.id, t1.name, t2.code, t2.date
FROM (SELECT DISTINCT id, name FROM MyTable) t1
CROSS APPLY (
SELECT TOP 1 code, date
FROM MyTable t3
WHERE t3.id=t1.id
AND t3.name=t1.name
ORDER BY t3.date DESC
) t2
I'm not sure if this is PL/SQL compatible as written (CROSS APPLY and TOP are SQL Server syntax), but you can find the equivalent, I'm sure.

How to join table to itself and select max values in SQL

I have a contracts table:
contractId date price partId
1 20120121 10 1
2 20110130 9 1
3 20130101 15 2
4 20110101 20 2
The contract with the greatest date is the active contract (don't blame me, I blame Infor for creating xpps).
I need to create query to see only active contracts (one contract per part, the contract with highest date).
So the result of the query should be like this:
contractId date price partId
1 20120121 10 1
3 20130101 15 2
I am out of ideas here. I tried self joining the table and I tried aggregate functions, but I can't figure it out. If anyone has any ideas, please share them with me.
This will work on almost all RDBMSs:
SELECT a.*
FROM tableName A
INNER JOIN
(
SELECT partID, MAX(date) maxDate
FROM tableName
GROUP BY partID
) B on a.partID = b.partID AND
a.date = b.maxDate
If your RDBMS supports window functions:
SELECT contractId ,date, price,partId
FROM
(
SELECT contractId ,date, price,partId,
ROW_NUMBER() OVER (PARTITION BY PartID
ORDER BY date DESC) rn
FROM tableName
) s
WHERE rn = 1
SELECT c.*
FROM contracts c
INNER JOIN
(
SELECT partId, MAX([date]) AS MaxDate
FROM contracts
GROUP BY partID
) MaxDate
ON c.partId = MaxDate.partID
AND c.[date] = MaxDate.MaxDate
This is the fast self join:
SELECT c1.* FROM contracts c1
LEFT OUTER JOIN contracts c2 ON c2.partId = c1.partId AND c1.date < c2.date
WHERE c2.contractId IS NULL
Sub selects (nested selects) tend to be slower, although this depends on the database engine and the available indexes.
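The self join above is an anti-join: a row is kept only when no other row for the same part has a later date. The sketch below runs it against the question's sample contracts next to the MAX(date) join, to show that both return the same rows. Python's sqlite3 module is used here only as an assumption for a self-contained demo, and no timing claim is made.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE contracts (contractId INTEGER, date INTEGER, price INTEGER, partId INTEGER)"
)
conn.executemany(
    "INSERT INTO contracts VALUES (?, ?, ?, ?)",
    [
        (1, 20120121, 10, 1),
        (2, 20110130, 9, 1),
        (3, 20130101, 15, 2),
        (4, 20110101, 20, 2),
    ],
)

# Anti-join: c2 matches only LATER contracts for the same part, so a row
# with no match (c2.contractId IS NULL) is the latest one for its part.
anti_join_rows = conn.execute(
    """
    SELECT c1.contractId, c1.date, c1.price, c1.partId
    FROM contracts c1
    LEFT OUTER JOIN contracts c2
        ON c2.partId = c1.partId AND c1.date < c2.date
    WHERE c2.contractId IS NULL
    ORDER BY c1.contractId
    """
).fetchall()

# Same result via a join to the per-part maximum date.
max_join_rows = conn.execute(
    """
    SELECT c.contractId, c.date, c.price, c.partId
    FROM contracts c
    INNER JOIN (
        SELECT partId, MAX(date) AS maxDate
        FROM contracts
        GROUP BY partId
    ) m ON c.partId = m.partId AND c.date = m.maxDate
    ORDER BY c.contractId
    """
).fetchall()

print(anti_join_rows)
print(max_join_rows)
```

Like the MAX(date) join, the anti-join returns every tied row if two contracts for the same part share the latest date.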

SQL Group by one column

so I have this table:
ID INITIAL_DATE TAX
A 18-02-2012 105
A 19-02-2012 95
A 20-02-2012 105
A 21-02-2012 100
B 18-02-2012 135
B 19-02-2012 150
B 20-02-2012 130
B 21-02-2012 140
and what I need is, for each distinct ID, the highest TAX ever. And if that TAX occurs twice I want the record with the highest INITIAL_DATE.
This is the query I have:
SELECT ID, MAX (initial_date) initial_date, tax
FROM t t0
WHERE t0.tax = (SELECT MAX (t1.tax)
FROM t t1
WHERE t1.ID = t0.ID
GROUP BY ID)
GROUP BY ID, tax
ORDER BY id, initial_date, tax
but I want to believe there is a better way of grouping these records.
Is there any way of NOT grouping by all the columns in the SELECT?
Have you tried with analytical functions?:
SELECT t0.ID, t0.INITIAL_DATE, t0.TAX
FROM ( SELECT t.*, ROW_NUMBER() OVER(PARTITION BY ID ORDER BY TAX DESC , INITIAL_DATE DESC) Corr
FROM t) t0
WHERE t0.Corr = 1
As far as I know, all columns in a SELECT that are not aggregated in one way or another must be part of the GROUP BY clause.
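A small sketch of the ROW_NUMBER approach on the question's data shows how the second sort key breaks the tie for ID A, where TAX 105 occurs twice. It assumes Python's sqlite3 module (SQLite 3.25 or later) purely for a self-contained demo, and stores the dates in ISO format, which is also an assumption, so that they sort correctly as text.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (ID TEXT, initial_date TEXT, tax INTEGER)")
conn.executemany(
    "INSERT INTO t VALUES (?, ?, ?)",
    [
        ("A", "2012-02-18", 105), ("A", "2012-02-19", 95),
        ("A", "2012-02-20", 105), ("A", "2012-02-21", 100),
        ("B", "2012-02-18", 135), ("B", "2012-02-19", 150),
        ("B", "2012-02-20", 130), ("B", "2012-02-21", 140),
    ],
)

# Number rows per ID by highest tax first, then latest date first;
# row 1 is the highest tax, with the latest date winning a tie.
best_rows = conn.execute(
    """
    SELECT ID, initial_date, tax
    FROM (
        SELECT t.*, ROW_NUMBER() OVER (
            PARTITION BY ID
            ORDER BY tax DESC, initial_date DESC
        ) AS corr
        FROM t
    )
    WHERE corr = 1
    ORDER BY ID
    """
).fetchall()

print(best_rows)
```

No GROUP BY is involved, so any other column of the winning row can be selected without aggregating it.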
I have tested this, and it is a solution that works:
SELECT t1.ID ,
MAX(t1.initial_date) ,
t1.tax FROM test t1
INNER JOIN ( SELECT ID ,
MAX(tax) as maxtax
FROM test
GROUP BY ID
) t2 ON t1.ID = t2.ID
AND t1.tax = t2.maxtax GROUP BY t1.ID ,t1.tax

how do I query sql for a latest record date for each user

I have a table that is a collection of entries recording when a user was logged on.
username, date, value
--------------------------
brad, 1/2/2010, 1.1
fred, 1/3/2010, 1.0
bob, 8/4/2009, 1.5
brad, 2/2/2010, 1.2
fred, 12/2/2009, 1.3
etc..
How do I create a query that would give me the latest date for each user?
Update: I forgot that I needed to have a value that goes along with the latest date.
This is the simple old school approach that works with almost any db engine, but you have to watch out for duplicates:
select t.username, t.date, t.value
from MyTable t
inner join (
select username, max(date) as MaxDate
from MyTable
group by username
) tm on t.username = tm.username and t.date = tm.MaxDate
Using window functions will avoid any possible issues with duplicate records due to duplicate date values, so if your db engine allows it you can do something like this:
select x.username, x.date, x.value
from (
select username, date, value,
row_number() over (partition by username order by date desc) as _rn
from MyTable
) x
where x._rn = 1
Using window functions (works in Oracle, Postgres 8.4, SQL Server 2005, DB2, Sybase, Firebird 3.0, MariaDB 10.3)
select * from (
select
username,
date,
value,
row_number() over(partition by username order by date desc) as rn
from
yourtable
) t
where t.rn = 1
I see most developers use an inline query without considering its impact on large data sets.
You can achieve this simply with:
SELECT a.username, a.date, a.value
FROM myTable a
LEFT OUTER JOIN myTable b
ON a.username = b.username
AND a.date < b.date
WHERE b.username IS NULL
ORDER BY a.date desc;
From my experience the fastest way is to take each row for which there is no newer row in the table.
Another advantage is that the syntax used is very simple, and that the meaning of the query is rather easy to grasp (take all rows such that no newer row exists for the username being considered).
NOT EXISTS
SELECT username, value
FROM t
WHERE NOT EXISTS (
SELECT *
FROM t AS witness
WHERE witness.username = t.username AND witness.date > t.date
);
ROW_NUMBER
SELECT username, value
FROM (
SELECT username, value, row_number() OVER (PARTITION BY username ORDER BY date DESC) AS rn
FROM t
) t2
WHERE rn = 1
INNER JOIN
SELECT t.username, t.value
FROM t
INNER JOIN (
SELECT username, MAX(date) AS date
FROM t
GROUP BY username
) tm ON t.username = tm.username AND t.date = tm.date;
LEFT OUTER JOIN
SELECT t.username, t.value
FROM t
LEFT OUTER JOIN t AS w ON t.username = w.username AND t.date < w.date
WHERE w.username IS NULL
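The four variants above can be compared side by side. The sketch below runs each one against the question's sample rows and checks that they return the same result. It assumes Python's sqlite3 module (SQLite 3.25 or later for ROW_NUMBER) only to stay self-contained, and it assumes the question's dates are month/day/year, rewritten in ISO format so that they sort correctly as text.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (username TEXT, date TEXT, value REAL)")
conn.executemany(
    "INSERT INTO t VALUES (?, ?, ?)",
    [
        ("brad", "2010-01-02", 1.1),
        ("fred", "2010-01-03", 1.0),
        ("bob", "2009-08-04", 1.5),
        ("brad", "2010-02-02", 1.2),
        ("fred", "2009-12-02", 1.3),
    ],
)

queries = {
    "not_exists": """
        SELECT username, value FROM t
        WHERE NOT EXISTS (
            SELECT * FROM t AS witness
            WHERE witness.username = t.username AND witness.date > t.date
        )""",
    "row_number": """
        SELECT username, value FROM (
            SELECT username, value,
                   ROW_NUMBER() OVER (PARTITION BY username ORDER BY date DESC) AS rn
            FROM t
        ) t2
        WHERE rn = 1""",
    "inner_join": """
        SELECT t.username, t.value FROM t
        INNER JOIN (
            SELECT username, MAX(date) AS date FROM t GROUP BY username
        ) tm ON t.username = tm.username AND t.date = tm.date""",
    "left_outer_join": """
        SELECT t.username, t.value FROM t
        LEFT OUTER JOIN t AS w ON t.username = w.username AND t.date < w.date
        WHERE w.username IS NULL""",
}

# Sort in Python so the comparison does not depend on each query's row order.
results = {name: sorted(conn.execute(sql).fetchall()) for name, sql in queries.items()}

for name, rows in results.items():
    print(name, rows)
```

The variants agree here because no user has two rows on the same date. With such duplicates, only the ROW_NUMBER version is limited to one row per user.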
To get the whole row containing the max date for the user:
select username, date, value
from tablename where (username, date) in (
select username, max(date) as date
from tablename
group by username
)
SELECT *
FROM MyTable T1
WHERE date = (
SELECT max(date)
FROM MyTable T2
WHERE T1.username=T2.username
)
This one should give you the correct result for your edited question.
The sub-query makes sure to find only rows of the latest date, and the outer GROUP BY will take care of ties. When there are two entries for the same date for the same user, it will return the one with the highest value.
SELECT t.username, t.date, MAX( t.value ) value
FROM your_table t
JOIN (
SELECT username, MAX( date ) date
FROM your_table
GROUP BY username
) x ON ( x.username = t.username AND x.date = t.date )
GROUP BY t.username, t.date
If your database syntax supports it, then TOP 1 WITH TIES can be a lifesaver in combination with ROW_NUMBER.
With the example data you provided, use this query:
SELECT TOP 1 WITH TIES
username, date, value
FROM user_log_in_attempts
ORDER BY ROW_NUMBER() OVER (PARTITION BY username ORDER BY date DESC)
It yields:
username | date | value
-----------------------------
bob | 8/4/2009 | 1.5
brad | 2/2/2010 | 1.2
fred | 12/2/2009 | 1.3
How it works:
ROW_NUMBER() OVER (PARTITION BY... ORDER BY...): for each username, the rows are numbered from the youngest (row number 1) to the oldest (highest row number).
ORDER BY ROW_NUMBER...: sorts the youngest row of each user to the top, followed by the second-youngest row of each user, and so on.
TOP 1 WITH TIES: because each user has a youngest row, those youngest rows are equal in terms of the sorting criterion (all have row number 1), so all of them are returned.
Tested with SQL-Server.
SELECT DISTINCT Username, Dates,value
FROM TableName
WHERE Dates IN (SELECT MAX(Dates) FROM TableName GROUP BY Username)
Username Dates value
bob 2010-02-02 1.2
brad 2010-01-02 1.1
fred 2010-01-03 1.0
This is similar to one of the answers above, but in my opinion it is a lot simpler and tidier. Also, shows a good use for the cross apply statement. For SQL Server 2005 and above...
select
a.username,
a.date,
a.value
from yourtable a
cross apply (select max(date) 'maxdate' from yourtable a1 where a.username=a1.username) b
where a.date=b.maxdate
You could also use the analytic RANK function:
with temp as
(
select username, date, RANK() over (partition by username order by date desc) as rnk from t
)
select username, date from temp where rnk = 1
SELECT MAX(DATE) AS dates
FROM assignment
JOIN paper_submission_detail ON assignment.PAPER_SUB_ID =
paper_submission_detail.PAPER_SUB_ID
SELECT Username, date, value
from MyTable mt
inner join (select username, max(date) date
from MyTable
group by username) sub
on sub.username = mt.username
and sub.date = mt.date
Would address the updated problem. It might not work so well on large tables, even with good indexing.
SELECT *
FROM ReportStatus c
inner join ( SELECT
MAX(Date) AS MaxDate
FROM ReportStatus ) m
on c.date = m.maxdate
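For Oracle: sort the result set in descending order in a subquery and take the first record, so you get the latest record. ROWNUM has to be applied outside the sorted subquery, because it is assigned before ORDER BY:
select * from (
select * from mytable
order by date desc
)
where rownum = 1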
SELECT t1.username, t1.date, value
FROM MyTable as t1
INNER JOIN (SELECT username, MAX(date) AS date
FROM MyTable
GROUP BY username) as t2 ON t2.username = t1.username AND t2.date = t1.date
Select * from table1 where latest_date=(select Max(latest_date) from table1 where user=yourUserName)
Inner Query will return the latest date for the current user, Outer query will pull all the data according to the inner query result.
I used this way to take the last record for each user that I have on my table.
It was a query to get last location for salesman as per recent time detected on PDA devices.
CREATE FUNCTION dbo.UsersLocation()
RETURNS TABLE
AS
RETURN
Select GS.UserID, MAX(GS.UTCDateTime) 'LastDate'
From USERGPS GS
where year(GS.UTCDateTime) = YEAR(GETDATE())
Group By GS.UserID
GO
select gs.UserID, sl.LastDate, gs.Latitude , gs.Longitude
from USERGPS gs
inner join USER s on gs.SalesManNo = s.SalesmanNo
inner join dbo.UsersLocation() sl on gs.UserID= sl.UserID and gs.UTCDateTime = sl.LastDate
order by LastDate desc
My small compilation:
A self join is better than a nested select,
but GROUP BY doesn't give you the primary key, which is preferable for the join.
This key can be obtained with PARTITION BY in conjunction with FIRST_VALUE.
So, here is a query:
select
t.*
from
Table t inner join (
select distinct first_value(ID) over(partition by GroupColumn order by DateColumn desc) as ID
from Table
where FilterColumn = 'value'
) j on t.ID = j.ID
Pros:
Filter data with where statement using any column
select any columns from filtered rows
Cons:
Need MS SQL Server starting with 2012.
I did something similar for my application.
Below is the query:
select distinct i.userId,i.statusCheck, l.userName from internetstatus
as i inner join login as l on i.userID=l.userID
where nowtime in((select max(nowtime) from InternetStatus group by userID));
Here's one way to return only the most recent record for each user in SQL Server:
WITH CTE AS (
SELECT *, ROW_NUMBER() OVER (PARTITION BY user_id ORDER BY date DESC) AS rn
FROM your_table
)
SELECT *
FROM CTE
WHERE rn = 1;
This uses a common table expression (CTE) to assign a unique rn (row number) to each record for each user, based on the user_id and sorted in descending order by date. The final query then selects only the records with rn equal to 1, which represents the most recent record for each user.
SELECT * FROM TABEL1 WHERE DATE= (SELECT MAX(CREATED_DATE) FROM TABEL1)
You would use aggregate function MAX and GROUP BY
SELECT username, MAX(date), value FROM tablename GROUP BY username, value