Aggregation of unique rows in SQL - sql

I have the following table of unique rows:
Name change Number_of_Sales
Soby 2.22 8370
Sollerod -1.06 11287
Sonderborg 2.60 6343
Sonderhald 11.43 1623
Sonderhald 10.93 2098
I want to select name and change, excluding duplicate Names, so that Sonderhald only occurs once. I want the Sonderhald with the maximum Number_of_Sales.
How can I do this in SQL Server?
Thanks

SELECT t.name, t.change, t.number_of_sales
FROM your_table t
INNER JOIN (
SELECT tt.name, MAX(tt.number_of_sales) AS max_number_of_sales
FROM your_table tt
GROUP BY tt.name
) tm ON t.name = tm.name AND t.number_of_sales = tm.max_number_of_sales

You can use a common table expression to do this:
;WITH cte
AS ( SELECT Name,
Change,
Number_of_Sales,
ROW_NUMBER() OVER ( PARTITION BY name ORDER BY number_of_sales DESC ) AS RowNum
FROM your_table
)
SELECT Name,
Change,
Number_of_Sales
FROM cte
WHERE RowNum = 1
http://technet.microsoft.com/en-us/library/ms190766%28v=sql.105%29.aspx
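To see the ROW_NUMBER approach pick the right Sonderhald row, here is a minimal sketch run against the sample data from the question. It uses Python's built-in sqlite3 module with an in-memory database purely as a stand-in for SQL Server (an assumption made so the example is runnable anywhere; window functions need SQLite 3.25 or newer). The table name your_table is taken from the answers above.

```python
import sqlite3

# In-memory SQLite database standing in for SQL Server (assumption of this sketch).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE your_table (name TEXT, change REAL, number_of_sales INTEGER)"
)
conn.executemany(
    "INSERT INTO your_table VALUES (?, ?, ?)",
    [
        ("Soby", 2.22, 8370),
        ("Sollerod", -1.06, 11287),
        ("Sonderborg", 2.60, 6343),
        ("Sonderhald", 11.43, 1623),
        ("Sonderhald", 10.93, 2098),
    ],
)

# Number the rows of each name by descending sales and keep only the first one.
query = """
WITH cte AS (
    SELECT name, change, number_of_sales,
           ROW_NUMBER() OVER (
               PARTITION BY name ORDER BY number_of_sales DESC
           ) AS row_num
    FROM your_table
)
SELECT name, change, number_of_sales
FROM cte
WHERE row_num = 1
ORDER BY name
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

Sonderhald appears once, with the row that has 2098 sales.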

Related

Group by column and get max and min id on sql

I have a table with these columns:
ID_REAL,DATE_REAL,NAME_REAL
I want to write a query that groups by the name and returns a result like this:
NAME | MAX(DATE_REAL) | ID_REAL of the MAX(DATE_REAL) | MIN(DATE_REAL) | ID_REAL of the MIN(DATE_REAL)
I don't know how to do it. For the moment I have:
select NAME_REAL,max(DATE_REAL),ID_REAL from MYREALTABLE group by NAME_REAL,ID_REAL
select NAME_REAL,min(DATE_REAL),ID_REAL from MYREALTABLE group by NAME_REAL,ID_REAL
But that is not what I need, and I also need a single query.
Thank you
I think the following should work by finding the records which have the minimum and maximum dates per name and joining those two queries.
select
mn.NAME_REAL,
MIN_DATE_REAL,
ID_REAL_OF_MIN_DATE_REAL,
MAX_DATE_REAL,
ID_REAL_OF_MAX_DATE_REAL
from
(
select NAME_REAL,
DATE_REAL as MIN_DATE_REAL,
ID_REAL as ID_REAL_OF_MIN_DATE_REAL
from (
select
NAME_REAL,
ID_REAL,
DATE_REAL,
row_number() over (partition by NAME_REAL order by DATE_REAL asc) as date_order_asc
from MYREALTABLE
) ranked_asc
where date_order_asc = 1
) mn
inner join
(
select NAME_REAL,
DATE_REAL as MAX_DATE_REAL,
ID_REAL as ID_REAL_OF_MAX_DATE_REAL
from (
select
NAME_REAL,
ID_REAL,
DATE_REAL,
row_number() over (partition by NAME_REAL order by DATE_REAL desc) as date_order_desc
from MYREALTABLE
) ranked_desc
where date_order_desc = 1
) mx
on mn.NAME_REAL = mx.NAME_REAL
You can join the two results into a single query result as follows:
select o.NAME_REAL,o.max_date,o.ID_REAL,t.min_date,t.ID_REAL from (
select NAME_REAL,max(DATE_REAL) as max_date,ID_REAL from MYREALTABLE group by NAME_REAL,ID_REAL)
as o inner join
(select NAME_REAL,min(DATE_REAL) as min_date,ID_REAL from MYREALTABLE group by NAME_REAL,ID_REAL
) as t on o.NAME_REAL=t.NAME_REAL
Try the below -
select NAME_REAL,ID_REAL,max(DATE_REAL) as max_date, min(DATE_REAL) as min_date
from MYREALTABLE
group by NAME_REAL,ID_REAL
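Another way to get one row per name with both dates and their IDs is to rank each name's rows in both directions in one pass and then pick out the rank-1 values with conditional aggregation. This is only a sketch: the sample rows are made up, SQLite (via Python's sqlite3 module) stands in for whatever database the question uses, and ID_REAL is added to the ORDER BY as an assumed tie-breaker.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE MYREALTABLE (ID_REAL INTEGER, DATE_REAL TEXT, NAME_REAL TEXT)"
)
# Hypothetical sample rows; ISO date strings sort chronologically.
conn.executemany(
    "INSERT INTO MYREALTABLE VALUES (?, ?, ?)",
    [
        (1, "2020-01-05", "alpha"),
        (2, "2020-03-01", "alpha"),
        (3, "2019-12-31", "alpha"),
        (4, "2021-06-10", "beta"),
    ],
)

# rn_asc = 1 marks the earliest row per name, rn_desc = 1 the latest one.
# MAX(CASE ...) ignores the NULLs produced for every other row.
query = """
SELECT NAME_REAL,
       MAX(CASE WHEN rn_desc = 1 THEN DATE_REAL END) AS MAX_DATE_REAL,
       MAX(CASE WHEN rn_desc = 1 THEN ID_REAL END)   AS ID_REAL_OF_MAX_DATE_REAL,
       MAX(CASE WHEN rn_asc = 1 THEN DATE_REAL END)  AS MIN_DATE_REAL,
       MAX(CASE WHEN rn_asc = 1 THEN ID_REAL END)    AS ID_REAL_OF_MIN_DATE_REAL
FROM (
    SELECT ID_REAL, DATE_REAL, NAME_REAL,
           ROW_NUMBER() OVER (
               PARTITION BY NAME_REAL ORDER BY DATE_REAL ASC, ID_REAL
           ) AS rn_asc,
           ROW_NUMBER() OVER (
               PARTITION BY NAME_REAL ORDER BY DATE_REAL DESC, ID_REAL
           ) AS rn_desc
    FROM MYREALTABLE
) ranked
GROUP BY NAME_REAL
ORDER BY NAME_REAL
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

A name with a single row (beta here) gets the same date and ID in both the max and min columns.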

Group by with MIN value in same query while presenting all other columns

I have a view called a with this data:
ID tDate name task val
23 2015-06-14
23 2015-06-25
126 2015-06-18
126 2015-06-22
126 2015-06-24
ID is integer and tDate is timestamp.
Basically I want to get for each ID the min value of tDate and present this row.
meaning:
ID tDate name task val
23 2015-06-14
126 2015-06-18
I wrote this query:
select ID, min(tDate)
from a
group by ID
order by ID
This is working BUT it doesn't allow me to present all other columns of a
for example if I do:
select ID, min(tDate), name
from a
group by ID
order by ID
it says that name must be under group by. So I wrote this query:
select ID, MIN(tDate), name, task, val , ....
from a
group by ID, name, task, val , ....
order by ID
And this one doesn't work; it gives wrong results.
How do I solve it?
Postgres has the very convenient distinct on for this type of problem:
select distinct on (id) a.*
from a
order by id, tdate;
This will return one row for each id. The row is the first one determined by the ordering defined in the order by clause.
Do a join from the one table to a sub-query table on just the ID / Min Date
select
YT.ID,
YT.tDate as OriginalDate,
PQ.MinDate,
YT.name,
YT.task,
YT.val
from
YourTable YT
JOIN ( select ID, min( tdate ) as MinDate
from YourTable
group by ID ) as PQ
on YT.ID = PQ.ID
AND YT.tDate = PQ.MinDate
order by
YT.ID
Try something like this:
select a.id, a.tdate , .... from a
join (select id, min(tdate) min_date
from a
group by ID
) b
on a.id=b.id and a.tdate = b.min_date
order by a.id
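To illustrate why grouping by every column gives the wrong result while the join on the per-ID minimum does not, here is a small sketch. The name, task and val values are invented (the question left them blank), and SQLite via Python's sqlite3 module stands in for the asker's database.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE a (ID INTEGER, tDate TEXT, name TEXT, task TEXT, val INTEGER)"
)
# name/task/val are hypothetical filler values.
conn.executemany(
    "INSERT INTO a VALUES (?, ?, ?, ?, ?)",
    [
        (23, "2015-06-14", "n1", "t1", 1),
        (23, "2015-06-25", "n2", "t2", 2),
        (126, "2015-06-18", "n3", "t3", 3),
        (126, "2015-06-22", "n4", "t4", 4),
        (126, "2015-06-24", "n5", "t5", 5),
    ],
)

# Grouping by all columns makes every distinct row its own group,
# so MIN(tDate) is computed per row and nothing is collapsed.
grouped_rows = conn.execute("""
    SELECT ID, MIN(tDate), name, task, val
    FROM a
    GROUP BY ID, name, task, val
    ORDER BY ID
""").fetchall()

# Computing the minimum per ID first and joining back keeps one full row per ID.
min_rows = conn.execute("""
    SELECT a.ID, a.tDate, a.name, a.task, a.val
    FROM a
    JOIN (
        SELECT ID, MIN(tDate) AS min_date
        FROM a
        GROUP BY ID
    ) b ON a.ID = b.ID AND a.tDate = b.min_date
    ORDER BY a.ID
""").fetchall()

print(len(grouped_rows), "rows from GROUP BY on all columns")
for row in min_rows:
    print(row)
```

Note that if two rows of the same ID share the minimum tDate, the join returns both of them.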

SQL: How to find duplicates based on two fields?

I have rows in an Oracle database table which should be unique for a combination of two fields, but the unique constraint is not set up on the table, so I need to find all rows which violate the constraint myself using SQL. Unfortunately my meager SQL skills aren't up to the task.
My table has three columns which are relevant: entity_id, station_id, and obs_year. For each row the combination of station_id and obs_year should be unique, and I want to find out if there are rows which violate this by flushing them out with an SQL query.
I have tried the following SQL (suggested by this previous question) but it doesn't work for me (I get ORA-00918 column ambiguously defined):
SELECT
entity_id, station_id, obs_year
FROM
mytable t1
INNER JOIN (
SELECT entity_id, station_id, obs_year FROM mytable
GROUP BY entity_id, station_id, obs_year HAVING COUNT(*) > 1) dupes
ON
t1.station_id = dupes.station_id AND
t1.obs_year = dupes.obs_year
Can someone suggest what I'm doing wrong, and/or how to solve this?
SELECT *
FROM (
SELECT t.*, ROW_NUMBER() OVER (PARTITION BY station_id, obs_year ORDER BY entity_id) AS rn
FROM mytable t
)
WHERE rn > 1
SELECT entity_id, station_id, obs_year
FROM mytable t1
WHERE EXISTS (SELECT 1 from mytable t2 Where
t1.station_id = t2.station_id
AND t1.obs_year = t2.obs_year
AND t1.RowId <> t2.RowId)
Change the 3 fields in the initial select to be
SELECT
t1.entity_id, t1.station_id, t1.obs_year
Re-write of your query
SELECT
t1.entity_id, t1.station_id, t1.obs_year
FROM
mytable t1
INNER JOIN (
SELECT entity_id, station_id, obs_year FROM mytable
GROUP BY entity_id, station_id, obs_year HAVING COUNT(*) > 1) dupes
ON
t1.station_id = dupes.station_id AND
t1.obs_year = dupes.obs_year
I think the ambiguous column error (ORA-00918) was because you were selecting columns whose names appeared in both the table and the subquery, but you did not specify whether you wanted them from dupes or from mytable (aliased as t1).
Could you not create a new table that includes the unique constraint, and then copy across the data row by row, ignoring failures?
You need to specify the table for the columns in the main select. Also, assuming entity_id is the unique key for mytable and is irrelevant to finding duplicates, you should not be grouping on it in the dupes subquery.
Try:
SELECT t1.entity_id, t1.station_id, t1.obs_year
FROM mytable t1
INNER JOIN (
SELECT station_id, obs_year FROM mytable
GROUP BY station_id, obs_year HAVING COUNT(*) > 1) dupes
ON
t1.station_id = dupes.station_id AND
t1.obs_year = dupes.obs_year
SELECT *
FROM (
SELECT t.*, ROW_NUMBER() OVER (PARTITION BY station_id, obs_year ORDER BY entity_id) AS rn
FROM mytable t
)
WHERE rn > 1
by Quassnoi is the most efficient for large tables.
I had this analysis of cost :
SELECT a.dist_code, a.book_date, a.book_no
FROM trn_refil_book a
WHERE EXISTS (SELECT 1 from trn_refil_book b Where
a.dist_code = b.dist_code and a.book_date = b.book_date and a.book_no = b.book_no
AND a.RowId <> b.RowId)
;
gave a cost of 1322341
SELECT a.dist_code, a.book_date, a.book_no
FROM trn_refil_book a
INNER JOIN (
SELECT b.dist_code, b.book_date, b.book_no FROM trn_refil_book b
GROUP BY b.dist_code, b.book_date, b.book_no HAVING COUNT(*) > 1) c
ON
a.dist_code = c.dist_code and a.book_date = c.book_date and a.book_no = c.book_no
;
gave a cost of 1271699
while
SELECT dist_code, book_date, book_no
FROM (
SELECT t.dist_code, t.book_date, t.book_no, ROW_NUMBER() OVER (PARTITION BY t.book_date, t.book_no
ORDER BY t.dist_code) AS rn
FROM trn_refil_book t
) p
WHERE p.rn > 1
;
gave a cost of 1021984
The table was not indexed....
SELECT entity_id, station_id, obs_year
FROM mytable
GROUP BY entity_id, station_id, obs_year
HAVING COUNT(*) > 1
Specify the fields to find duplicates on both the SELECT and the GROUP BY.
It works by using GROUP BY to find any rows that match any other rows based on the specified Columns.
The HAVING COUNT(*) > 1 says that we are only interested in seeing any rows that occur more than 1 time (and are therefore duplicates)
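As a rough illustration of the GROUP BY / HAVING idea applied to the original question, the sketch below groups on station_id and obs_year only (the two columns that are supposed to be unique together) and then joins back to list the offending rows. The sample rows are invented, and SQLite via Python's sqlite3 module is used in place of Oracle so that the example is self-contained.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE mytable (entity_id INTEGER, station_id INTEGER, obs_year INTEGER)"
)
# Hypothetical rows: (100, 2000) and (200, 2000) each occur twice.
conn.executemany(
    "INSERT INTO mytable VALUES (?, ?, ?)",
    [
        (1, 100, 2000),
        (2, 100, 2001),
        (3, 100, 2000),
        (4, 200, 2000),
        (5, 200, 2000),
        (6, 300, 1999),
    ],
)

# Step 1: which (station_id, obs_year) combinations occur more than once?
dup_keys = conn.execute("""
    SELECT station_id, obs_year, COUNT(*) AS cnt
    FROM mytable
    GROUP BY station_id, obs_year
    HAVING COUNT(*) > 1
    ORDER BY station_id, obs_year
""").fetchall()

# Step 2: join back to the table to see every row involved in a violation.
dup_rows = conn.execute("""
    SELECT t1.entity_id, t1.station_id, t1.obs_year
    FROM mytable t1
    INNER JOIN (
        SELECT station_id, obs_year
        FROM mytable
        GROUP BY station_id, obs_year
        HAVING COUNT(*) > 1
    ) dupes
      ON t1.station_id = dupes.station_id
     AND t1.obs_year = dupes.obs_year
    ORDER BY t1.entity_id
""").fetchall()

print(dup_keys)
print(dup_rows)
```

If entity_id is unique per row, adding it to the GROUP BY would put every row in its own group and no duplicates would be reported.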
I thought a lot of the solutions here were cumbersome and tough to understand since I had a 3 column primary key constraint and needed to find the duplicates. So here's an option
SELECT id, name, value, COUNT(*) FROM db_name.table_name
GROUP BY id, name, value
HAVING COUNT(*) > 1
I'm surprised there aren't any answers here that use a CTE (Common Table Expression)
WITH cte as (
SELECT
ROW_NUMBER()
OVER(
PARTITION BY Last_Name, First_Name order by BIRTHDATE)
AS RN,
Employee_number, First_Name, Last_Name, BirthDate,
SUM(1)
OVER(
PARTITION BY Last_Name, First_Name
ORDER BY BIRTHDATE ROWS BETWEEN UNBOUNDED PRECEDING
AND UNBOUNDED FOLLOWING)
AS CNT
FROM
employment)
select * from cte where cnt > 1
Not only will this find duplicates (on first and last name only), it will tell you how many there are.

how do I query sql for a latest record date for each user

I have a table that is a collection of entries recording when a user was logged on.
username, date, value
--------------------------
brad, 1/2/2010, 1.1
fred, 1/3/2010, 1.0
bob, 8/4/2009, 1.5
brad, 2/2/2010, 1.2
fred, 12/2/2009, 1.3
etc..
How do I create a query that would give me the latest date for each user?
Update: I forgot that I needed to have a value that goes along with the latest date.
This is the simple old school approach that works with almost any db engine, but you have to watch out for duplicates:
select t.username, t.date, t.value
from MyTable t
inner join (
select username, max(date) as MaxDate
from MyTable
group by username
) tm on t.username = tm.username and t.date = tm.MaxDate
Using window functions will avoid any possible issues with duplicate records due to duplicate date values, so if your db engine allows it you can do something like this:
select x.username, x.date, x.value
from (
select username, date, value,
row_number() over (partition by username order by date desc) as _rn
from MyTable
) x
where x._rn = 1
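The difference between the two queries only shows up when a user has two rows with the same latest date. The sketch below adds one such made-up row for brad to the question's data (dates rewritten as ISO strings, assuming the originals are month/day/year) and runs both approaches in SQLite through Python's sqlite3 module. The extra value DESC in the window's ORDER BY is an assumed tie-breaker; without one, which of the tied rows gets row number 1 is not defined.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE MyTable (username TEXT, date TEXT, value REAL)")
conn.executemany(
    "INSERT INTO MyTable VALUES (?, ?, ?)",
    [
        ("brad", "2010-01-02", 1.1),
        ("fred", "2010-01-03", 1.0),
        ("bob", "2009-08-04", 1.5),
        ("brad", "2010-02-02", 1.2),
        ("fred", "2009-12-02", 1.3),
        ("brad", "2010-02-02", 1.4),  # hypothetical duplicate of brad's latest date
    ],
)

# Join on MAX(date): every row that shares the latest date comes back.
join_rows = conn.execute("""
    SELECT t.username, t.date, t.value
    FROM MyTable t
    INNER JOIN (
        SELECT username, MAX(date) AS MaxDate
        FROM MyTable
        GROUP BY username
    ) tm ON t.username = tm.username AND t.date = tm.MaxDate
    ORDER BY t.username, t.value
""").fetchall()

# ROW_NUMBER: exactly one row per user, ties broken by the highest value.
window_rows = conn.execute("""
    SELECT x.username, x.date, x.value
    FROM (
        SELECT username, date, value,
               ROW_NUMBER() OVER (
                   PARTITION BY username ORDER BY date DESC, value DESC
               ) AS rn
        FROM MyTable
    ) x
    WHERE x.rn = 1
    ORDER BY x.username
""").fetchall()

print(join_rows)
print(window_rows)
```

The join returns brad twice, while the window version returns a single brad row.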
Using window functions (works in Oracle, Postgres 8.4, SQL Server 2005, DB2, Sybase, Firebird 3.0, MariaDB 10.3)
select * from (
select
username,
date,
value,
row_number() over(partition by username order by date desc) as rn
from
yourtable
) t
where t.rn = 1
I see most of the developers use an inline query without considering its impact on huge data.
Simply, you can achieve this by:
SELECT a.username, a.date, a.value
FROM myTable a
LEFT OUTER JOIN myTable b
ON a.username = b.username
AND a.date < b.date
WHERE b.username IS NULL
ORDER BY a.date desc;
From my experience the fastest way is to take each row for which there is no newer row in the table.
Another advantage is that the syntax used is very simple, and that the meaning of the query is rather easy to grasp (take all rows such that no newer row exists for the username being considered).
NOT EXISTS
SELECT username, value
FROM t
WHERE NOT EXISTS (
SELECT *
FROM t AS witness
WHERE witness.username = t.username AND witness.date > t.date
);
ROW_NUMBER
SELECT username, value
FROM (
SELECT username, value, row_number() OVER (PARTITION BY username ORDER BY date DESC) AS rn
FROM t
) t2
WHERE rn = 1
INNER JOIN
SELECT t.username, t.value
FROM t
INNER JOIN (
SELECT username, MAX(date) AS date
FROM t
GROUP BY username
) tm ON t.username = tm.username AND t.date = tm.date;
LEFT OUTER JOIN
SELECT t.username, t.value
FROM t
LEFT OUTER JOIN t AS w ON t.username = w.username AND t.date < w.date
WHERE w.username IS NULL
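A quick way to convince yourself that the NOT EXISTS and LEFT OUTER JOIN forms express the same "no newer row exists" condition is to run both on the question's data. This sketch uses SQLite through Python's sqlite3 module, with the dates rewritten as ISO strings (assuming the originals are month/day/year) and with the aliases cur, newer and w chosen only for this example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (username TEXT, date TEXT, value REAL)")
conn.executemany(
    "INSERT INTO t VALUES (?, ?, ?)",
    [
        ("brad", "2010-01-02", 1.1),
        ("fred", "2010-01-03", 1.0),
        ("bob", "2009-08-04", 1.5),
        ("brad", "2010-02-02", 1.2),
        ("fred", "2009-12-02", 1.3),
    ],
)

# Keep a row only if no row with a later date exists for the same user.
not_exists_rows = conn.execute("""
    SELECT cur.username, cur.value
    FROM t AS cur
    WHERE NOT EXISTS (
        SELECT 1
        FROM t AS newer
        WHERE newer.username = cur.username AND newer.date > cur.date
    )
    ORDER BY cur.username
""").fetchall()

# Same condition as an anti-join: a row survives when the join finds no newer match.
left_join_rows = conn.execute("""
    SELECT cur.username, cur.value
    FROM t AS cur
    LEFT OUTER JOIN t AS w
      ON cur.username = w.username AND cur.date < w.date
    WHERE w.username IS NULL
    ORDER BY cur.username
""").fetchall()

print(not_exists_rows)
print(left_join_rows)
```

Both forms keep every row tied for the latest date, so a user with two rows on the same latest date would appear twice.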
To get the whole row containing the max date for the user:
select username, date, value
from tablename where (username, date) in (
select username, max(date) as date
from tablename
group by username
)
SELECT *
FROM MyTable T1
WHERE date = (
SELECT max(date)
FROM MyTable T2
WHERE T1.username=T2.username
)
This one should give you the correct result for your edited question.
The sub-query makes sure to find only rows of the latest date, and the outer GROUP BY will take care of ties. When there are two entries for the same date for the same user, it will return the one with the highest value.
SELECT t.username, t.date, MAX( t.value ) value
FROM your_table t
JOIN (
SELECT username, MAX( date ) date
FROM your_table
GROUP BY username
) x ON ( x.username = t.username AND x.date = t.date )
GROUP BY t.username, t.date
If your database syntax supports it, then TOP 1 WITH TIES can be a lifesaver in combination with ROW_NUMBER.
With the example data you provided, use this query:
SELECT TOP 1 WITH TIES
username, date, value
FROM user_log_in_attempts
ORDER BY ROW_NUMBER() OVER (PARTITION BY username ORDER BY date DESC)
It yields:
username | date | value
-----------------------------
bob | 8/4/2009 | 1.5
brad | 2/2/2010 | 1.2
fred | 12/2/2009 | 1.3
How it works:
ROW_NUMBER() OVER (PARTITION BY... ORDER BY...): for each username, the rows are numbered from the youngest (row number 1) to the oldest (highest row number).
ORDER BY ROW_NUMBER()...: sorts the youngest row of each user to the top, followed by the second-youngest row of each user, and so on.
TOP 1 WITH TIES: because each user has a youngest row, those youngest rows are equal in the sense of the sorting criteria (all have row number 1), so all of them are returned.
Tested with SQL-Server.
SELECT DISTINCT Username, Dates,value
FROM TableName
WHERE Dates IN (SELECT MAX(Dates) FROM TableName GROUP BY Username)
Username Dates value
bob 2010-02-02 1.2
brad 2010-01-02 1.1
fred 2010-01-03 1.0
This is similar to one of the answers above, but in my opinion it is a lot simpler and tidier. Also, shows a good use for the cross apply statement. For SQL Server 2005 and above...
select
a.username,
a.date,
a.value
from yourtable a
cross apply (select max(date) 'maxdate' from yourtable a1 where a.username=a1.username) b
where a.date=b.maxdate
You could also use analytical Rank Function
with temp as
(
select username, date, RANK() over (partition by username order by date desc) as rnk from t
)
select username, date from temp where rnk = 1
SELECT MAX(DATE) AS dates
FROM assignment
JOIN paper_submission_detail ON assignment.PAPER_SUB_ID =
paper_submission_detail.PAPER_SUB_ID
SELECT mt.username, mt.date, mt.value
from MyTable mt
inner join (select username, max(date) date
from MyTable
group by username) sub
on sub.username = mt.username
and sub.date = mt.date
Would address the updated problem. It might not work so well on large tables, even with good indexing.
SELECT *
FROM ReportStatus c
inner join ( SELECT
MAX(Date) AS MaxDate
FROM ReportStatus ) m
on c.date = m.maxdate
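In Oracle, sort the result set in descending order inside a subquery and take the first row, so you get the latest record (ROWNUM is assigned before ORDER BY is applied, so it has to filter the already sorted subquery):
select * from (
select * from mytable
order by date desc
)
where rownum = 1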
SELECT t1.username, t1.date, value
FROM MyTable as t1
INNER JOIN (SELECT username, MAX(date) AS date
FROM MyTable
GROUP BY username) as t2 ON t2.username = t1.username AND t2.date = t1.date
Select * from table1 where user=yourUserName and latest_date=(select Max(latest_date) from table1 where user=yourUserName)
The inner query returns the latest date for the given user; the outer query pulls that user's data for the date returned by the inner query.
I used this way to take the last record for each user that I have on my table.
It was a query to get last location for salesman as per recent time detected on PDA devices.
CREATE FUNCTION dbo.UsersLocation()
RETURNS TABLE
AS
RETURN
Select GS.UserID, MAX(GS.UTCDateTime) 'LastDate'
From USERGPS GS
where year(GS.UTCDateTime) = YEAR(GETDATE())
Group By GS.UserID
GO
select gs.UserID, sl.LastDate, gs.Latitude , gs.Longitude
from USERGPS gs
inner join USER s on gs.SalesManNo = s.SalesmanNo
inner join dbo.UsersLocation() sl on gs.UserID= sl.UserID and gs.UTCDateTime = sl.LastDate
order by LastDate desc
My small compilation
self join better than nested select
but group by doesn't give you primary key which is preferable for join
this key can be given by partition by in conjunction with first_value (docs)
So, here is a query:
select
t.*
from
Table t inner join (
select distinct first_value(ID) over(partition by GroupColumn order by DateColumn desc) as ID
from Table
where FilterColumn = 'value'
) j on t.ID = j.ID
Pros:
Filter data with where statement using any column
select any columns from filtered rows
Cons:
Need MS SQL Server starting with 2012.
I did something similar for my application.
Below is the query:
select distinct i.userId,i.statusCheck, l.userName from internetstatus
as i inner join login as l on i.userID=l.userID
where nowtime in((select max(nowtime) from InternetStatus group by userID));
Here's one way to return only the most recent record for each user in SQL Server:
WITH CTE AS (
SELECT *, ROW_NUMBER() OVER (PARTITION BY user_id ORDER BY date DESC) AS rn
FROM your_table
)
SELECT *
FROM CTE
WHERE rn = 1;
This uses a common table expression (CTE) to assign a unique rn (row number) to each record for each user, based on the user_id and sorted in descending order by date. The final query then selects only the records with rn equal to 1, which represents the most recent record for each user.
SELECT * FROM TABEL1 WHERE DATE= (SELECT MAX(CREATED_DATE) FROM TABEL1)
You would use aggregate function MAX and GROUP BY
SELECT username, MAX(date), value FROM tablename GROUP BY username, value

SQL Group by & Max

I have a following data in a table:
id name alarmId alarmUnit alarmLevel
1 test voltage psu warning
2 test voltage psu ceasing
3 test voltage psu warning
4 test temp rcc warning
5 test temp rcc ceasing
I'd like to show only the most recent information about every column group (alarmId,alarmUnit), so the result should look like this:
3 test voltage psu warning
5 test temp rcc ceasing
I've tried so far:
SELECT MAX(id) as id,name,alarmId,alarmUnit,alarmLevel GROUP BY alarmId,alarmUnit;
Selected IDs seem to be fine but selected rows aren't corresponding to them. Could you help me?
In Oracle, SQL Server 2005+ and PostgreSQL 8.4:
SELECT *
FROM (
SELECT t.*, ROW_NUMBER() OVER (PARTITION BY alarmId, alarmUnit ORDER BY id DESC) AS rn
FROM mytable t
) q
WHERE rn = 1
In MySQL:
SELECT mi.*
FROM (
SELECT alarmId, alarmUnit, MAX(id) AS mid
FROM mytable
GROUP BY
alarmId, alarmUnit
) mo
JOIN mytable mi
ON mi.id = mo.mid
In PostgreSQL 8.3 and below:
SELECT DISTINCT ON (alarmId, alarmUnit) *
FROM mytable
ORDER BY
alarmId, alarmUnit, id DESC
If you want to get the row of the max, you'll probably need a sub-query. Something like:
SELECT *
FROM YourTable
WHERE id IN (
SELECT MAX(id) FROM YourTable GROUP BY alarmId, alarmUnit
)
Try:
SELECT * FROM table WHERE id IN
(SELECT MAX(id) FROM table GROUP BY alarmId, alarmUnit)
Maybe try something like the following:
SELECT id,name,alarmId,alarmUnit,alarmLevel
FROM table
WHERE id IN (SELECT Max(id) FROM table GROUP BY alarmId, alarmUnit)
You may have to include alarmId and alarmUnit in the sub query select.
select table.id, name, alarmID, alarmUnit, alarmLevel
from (select max(id) as id
from table
group by alarmID, alarmUnit) maxID
inner join table
on table.id = maxID.id
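Here is the MAX(id)-per-group join run against the data from the question, as a minimal sketch. The table is called alarms here because table is a reserved word, and SQLite through Python's sqlite3 module is used only to make the example runnable; both are assumptions of this sketch rather than anything stated in the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE alarms ("
    "id INTEGER, name TEXT, alarmId TEXT, alarmUnit TEXT, alarmLevel TEXT)"
)
conn.executemany(
    "INSERT INTO alarms VALUES (?, ?, ?, ?, ?)",
    [
        (1, "test", "voltage", "psu", "warning"),
        (2, "test", "voltage", "psu", "ceasing"),
        (3, "test", "voltage", "psu", "warning"),
        (4, "test", "temp", "rcc", "warning"),
        (5, "test", "temp", "rcc", "ceasing"),
    ],
)

# Find the highest id per (alarmId, alarmUnit), then fetch the full rows for those ids.
query = """
SELECT a.id, a.name, a.alarmId, a.alarmUnit, a.alarmLevel
FROM alarms a
INNER JOIN (
    SELECT MAX(id) AS id
    FROM alarms
    GROUP BY alarmId, alarmUnit
) latest ON a.id = latest.id
ORDER BY a.id
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

Because the other columns are read from the joined row rather than from the grouped query, they always belong to the same row as the selected id.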