Select MAX Value for Each Row - Oracle SQL

I have a question.
I need to find the latest occurrence for a specific list of customers; to simplify, let's say I need it for 3 customers out of 100.
I need to check the last time each of them got a bonus.
The table would be:
EVENT_TBL
Fields: Account ID, EVENT_DATE, BONUS ID, ....
Can you suggest a way to grab the latest (MAX) EVENT_DATE, i.e. one row per account?
I'm using SELECT ... IN to specify the Account IDs, but I'm not sure how to use MAX, GROUP BY, etc. (if they're even needed).

Use the ROW_NUMBER() analytic function:
SELECT *
FROM (
    SELECT t.*,
           ROW_NUMBER() OVER ( PARTITION BY Account_id ORDER BY event_date DESC ) AS rn
    FROM EVENT_TBL t
    WHERE Account_ID IN ( 123, 456, 789 )
)
WHERE rn = 1
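Since you asked about MAX and GROUP BY: as an alternative sketch (column names are assumed from your description and may differ from the real EVENT_TBL definition), Oracle's KEEP (DENSE_RANK LAST) aggregate also returns exactly one row per account:
SELECT Account_ID,
       MAX(event_date) AS last_event_date,
       -- picks the BONUS_ID from the row with the latest EVENT_DATE for this account
       MAX(bonus_id) KEEP (DENSE_RANK LAST ORDER BY event_date) AS last_bonus_id
FROM EVENT_TBL
WHERE Account_ID IN ( 123, 456, 789 )
GROUP BY Account_ID
Note that each extra column you want back needs its own KEEP aggregate, whereas the ROW_NUMBER version above returns whole rows.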

You can try a CTE that computes the maximum date per account and joins it back to the table:
with AccountID_Max_EVENT_DATE as (
select AccountID, max(EVENT_DATE) MAX_D
from EVENT_TBL
group by AccountID
)
SELECT E.*
FROM EVENT_TBL E
INNER JOIN AccountID_Max_EVENT_DATE M
ON (E.AccountID = M.AccountID AND M.MAX_D = E.EVENT_DATE)

Related

Query to select 2 most recent dated records per grouping

Using R, I want to grab the two most recently dated entries for each UserID, assuming there are one or more entries per UserID.
The key elements of my data are an identifier (UserID) and a date column of type date.
Thank you.
In SQL Server, which has the ROW_NUMBER() analytic function, you can try this query:
SELECT t.UserID, t.date, ...other columns
FROM
(
SELECT UserID, date, ...other columns,
ROW_NUMBER() OVER (PARTITION BY UserID ORDER BY date DESC) rn
FROM yourTable
) t
WHERE t.rn <= 2

SQL Server Group By with Max on Date field

I hope I can explain the issue I'm having, and hopefully someone can point me in the right direction.
I'm trying to do a GROUP BY (on Email Address) over a subset of data, and then use MAX() on a date field, but because of differing values in the other fields it brings back more rows than required.
I would just like to return one record per email address: the row with the maximum date, together with the other fields on that same row.
I'm not sure how to write this query.
This is a task for ROW_NUMBER:
select *
from
(
select t.*,
-- assign sequential number starting with 1 for the maximum date
row_number() over (partition by email_address order by datecol desc) as rn
from tab
) as dt
where rn = 1 -- only return the latest row
You can write this query using row_number():
select t.*
from (select t.*,
row_number() over (partition by emailaddress order by date desc) as seqnum
from t
) t
where seqnum = 1;
How about something like this?
select a.*
from baseTable as a
inner join
(select Email,
Max(EmailDate) as EmailDate
from baseTable
group by Email) as b
on a.Email = b.Email
and a.EmailDate = b.EmailDate

How to find duplicate records in PostgreSQL

I have a PostgreSQL database table called "user_links" which currently allows the following duplicate fields:
year, user_id, sid, cid
The unique constraint is currently the first field, called "id". However, I now want to add a constraint to make sure the combination of year, user_id, sid and cid is unique, but I cannot apply the constraint because duplicate values already exist which violate it.
Is there a way to find all duplicates?
The basic idea is to use a nested query with a count aggregate:
select * from yourTable ou
where (select count(*) from yourTable inr
where inr.sid = ou.sid) > 1
You can adjust the where clause in the inner query to narrow the search.
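For example, to treat a row as a duplicate only when all four columns from the question match (a sketch against the user_links table; note that plain = comparisons will not match rows where these columns are NULL):
select *
from user_links ou
where (select count(*)
       from user_links inr
       where inr.year = ou.year
         and inr.user_id = ou.user_id
         and inr.sid = ou.sid
         and inr.cid = ou.cid) > 1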
There is another good solution for this, mentioned in the comments (but not everyone reads them):
select Column1, Column2, count(*)
from yourTable
group by Column1, Column2
HAVING count(*) > 1
Or shorter:
SELECT (yourTable.*)::text, count(*)
FROM yourTable
GROUP BY yourTable.*
HAVING count(*) > 1
From "Find duplicate rows with PostgreSQL" here's smart solution:
select * from (
SELECT id,
ROW_NUMBER() OVER(PARTITION BY column1, column2 ORDER BY id asc) AS Row
FROM tbl
) dups
where
dups.Row > 1
To keep it simple, I assume that you want the unique constraint only on the column year and that the primary key is a column named id.
To find the duplicate values, run:
SELECT year, COUNT(id)
FROM YOUR_TABLE
GROUP BY year
HAVING COUNT(id) > 1
ORDER BY COUNT(id);
The SQL statement above returns all the duplicated years in your table. To delete all the duplicates except the latest duplicate entry, use the following SQL statement:
DELETE
FROM YOUR_TABLE A USING YOUR_TABLE B  -- the same table referenced twice
WHERE A.year=B.year AND A.id<B.id;
You can join the table to itself on the fields that would be duplicated, and then anti-join on the id field. Select the id field from the first table alias (tn1) and apply the array_agg function to the id field of the second table alias. Finally, for array_agg to work properly, group the results by tn1.id. This produces a result set that contains the id of a record and an array of all the ids that fit the join conditions.
select tn1.id,
array_agg(tn2.id) as duplicate_entries
from table_name tn1 join table_name tn2 on
tn1.year = tn2.year
and tn1.sid = tn2.sid
and tn1.user_id = tn2.user_id
and tn1.cid = tn2.cid
and tn1.id <> tn2.id
group by tn1.id;
Obviously, ids that appear in the duplicate_entries array for one id will also have their own entries in the result set. You will have to use this result set to decide which id should become the source of truth, i.e. the one record that shouldn't get deleted. Maybe you could do something like this:
with dupe_set as (
select tn1.id,
array_agg(tn2.id) as duplicate_entries
from table_name tn1 join table_name tn2 on
tn1.year = tn2.year
and tn1.sid = tn2.sid
and tn1.user_id = tn2.user_id
and tn1.cid = tn2.cid
and tn1.id <> tn2.id
group by tn1.id
order by tn1.id asc)
select ds.id from dupe_set ds where not exists
(select de from unnest(ds.duplicate_entries) as de where de < ds.id)
This selects the lowest-numbered ids that have duplicates (assuming the id is an increasing integer primary key). These are the ids you would keep around.
Inspired by Sandro Wiggers, I did something similar to this:
WITH ordered AS (
SELECT id,year, user_id, sid, cid,
rank() OVER (PARTITION BY year, user_id, sid, cid ORDER BY id) AS rnk
FROM user_links
),
to_delete AS (
SELECT id
FROM ordered
WHERE rnk > 1
)
DELETE
FROM user_links
USING to_delete
WHERE user_links.id = to_delete.id;
If you want to test it, change it slightly:
WITH ordered AS (
SELECT id,year, user_id, sid, cid,
rank() OVER (PARTITION BY year, user_id, sid, cid ORDER BY id) AS rnk
FROM user_links
),
to_delete AS (
SELECT id,year,user_id,sid, cid
FROM ordered
WHERE rnk > 1
)
SELECT * FROM to_delete;
This gives an overview of what is going to be deleted (it does no harm to keep year, user_id, sid, cid in the to_delete query when running the deletion, but they are not needed there).
In your case, because of the constraint, you need to delete the duplicated records:
1. Find the duplicated rows
2. Order them by created_at date - in this case I'm keeping the oldest
3. Delete the records, using USING to filter the right rows
WITH duplicated AS (
SELECT id,
count(*)
FROM products
GROUP BY id
HAVING count(*) > 1),
ordered AS (
SELECT p.id,
created_at,
rank() OVER (partition BY p.id ORDER BY p.created_at) AS rnk
FROM products p
JOIN duplicated d ON d.id = p.id ),
products_to_delete AS (
SELECT id,
created_at
FROM ordered
WHERE rnk > 1 -- every copy except the oldest
)
DELETE
FROM products
USING products_to_delete
WHERE products.id = products_to_delete.id
AND products.created_at = products_to_delete.created_at;
The following SQL provides better performance when checking for duplicate rows:
SELECT id, count(id)
FROM table1
GROUP BY id
HAVING count(id) > 1
begin;
create table user_links(id serial,year bigint, user_id bigint, sid bigint, cid bigint);
insert into user_links(year, user_id, sid, cid) values (null,null,null,null),
(null,null,null,null), (null,null,null,null),
(1,2,3,4), (1,2,3,4),
(1,2,3,4),(1,1,3,8),
(1,1,3,9),
(1,null,null,null),(1,null,null,null);
commit;
Set operation with DISTINCT and EXCEPT:
(select id, year, user_id, sid, cid from user_links order by 1)
except
select distinct on (year, user_id, sid, cid) id, year, user_id, sid, cid
from user_links order by 1;
EXCEPT ALL also works, since the serial id makes every row unique:
(select id, year, user_id, sid, cid from user_links order by 1)
except all
select distinct on (year, user_id, sid, cid)
id, year, user_id, sid, cid from user_links order by 1;
This works for both null and non-null values.
To delete:
with a as(
(select id, year, user_id, sid, cid from user_links order by 1)
except all
select distinct on (year, user_id, sid, cid)
id, year, user_id, sid, cid from user_links order by 1)
delete from user_links using a where user_links.id = a.id returning *;
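Once the duplicates are removed, the unique constraint from the question can be added. A minimal sketch (the constraint name is arbitrary; note that PostgreSQL treats NULLs as distinct in a unique constraint, so rows with NULLs in these columns can still repeat):
-- constraint name chosen for illustration only
alter table user_links
    add constraint user_links_year_user_sid_cid_key unique (year, user_id, sid, cid);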

How to get the distinct records based on maximum date?

I'm working with SQL Server 2008. I have a table containing the following columns:
Id,
Name,
Date
This table contains more than one record for the same id. I want to get each distinct id with its maximum date. How can I write a SQL query for this?
Use the ROW_NUMBER() function and PARTITION BY clause. Something like this:
SELECT Id, Name, Date FROM (
SELECT *, ROW_NUMBER() OVER (PARTITION BY Id ORDER BY Date desc) AS ROWNUM
FROM [MyTable]
) x WHERE ROWNUM = 1
If you need only the ID column and the other columns are NOT required, then you don't need to go with ROW_NUMBER or MAX or anything else. Just do a GROUP BY on the ID column, because whatever the maximum date is, you will get the same ID:
SELECT ID FROM table GROUP BY ID
--OR
SELECT DISTINCT ID FROM table
If you need the ID and Date columns with the maximum date, then simply GROUP BY the ID column and select the max date:
SELECT ID, Max(Date) AS Date
FROM table
GROUP BY ID
If you need all the columns but only one row per ID (the one with the max date), then you can go with ROW_NUMBER or MAX as mentioned in the other answers:
SELECT *
FROM table AS M
WHERE Exists(
SELECT 1
FROM table
WHERE ID = M.ID
HAVING M.Date = Max(Date)
)
One way, using ROW_NUMBER:
With CTE As
(
SELECT Id, Name, Date, Rn = Row_Number() Over (Partition By Id
Order By Date DESC)
FROM dbo.TableName
)
SELECT Id --, Name, Date
FROM CTE
WHERE Rn = 1
If multiple rows can share the max date and you want all of them, you could use DENSE_RANK instead (sketched below).
Here's an overview of sql-server's ranking function: http://technet.microsoft.com/en-us/library/ms189798.aspx
By the way, CTE stands for common table expression, which is similar to a named subquery. I'm using it so that I can filter by the row number. This approach lets you select all the columns if you want.
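A minimal sketch of the DENSE_RANK variant mentioned above, which returns every row that shares the maximum date per Id:
With CTE As
(
    SELECT Id, Name, Date, Rn = Dense_Rank() Over (Partition By Id
                                                   Order By Date DESC)
    FROM dbo.TableName
)
SELECT Id, Name, Date
FROM CTE
WHERE Rn = 1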
select Id, Max(Date) as "Max Date"
from table
group by Id
order by Id
Try MAX(Date) with a GROUP BY on the other two columns (the ones with repeating data):
SELECT ID, Max(Date) as date, Name
FROM YourTable
GROUP BY ID, Name
You may try this:
DECLARE #T TABLE(ID INT, NAME VARCHAR(50),DATE DATETIME)
INSERT INTO #T VALUES(1,'A','2014-04-20'),(1,'A','2014-04-28')
,(2,'A2','2014-04-22'),(2,'A2','2014-04-24')
,(3,'A3','2014-04-20'),(3,'A3','2014-04-28')
,(4,'A4','2014-04-28'),(4,'A4','2014-04-28')
,(5,'A5','2014-04-28'),(5,'A5','2014-04-28')
SELECT T.ID FROM #T T
WHERE T.DATE=(SELECT MAX(A.DATE)
FROM #T A
WHERE A.ID=T.ID
GROUP BY A.ID )
GROUP BY T.ID
select id, max(date) from NameOfYourTable group by id;

Select Record with Maximum Creation Date

Let us say that I have a database table with the following records:
CACHE_ID BUSINESS_DATE CREATED_DATE
1183 13-09-06 13-09-19 16:38:59.336000000
1169 13-09-06 13-09-24 17:19:05.762000000
1152 13-09-06 13-09-17 14:18:59.336000000
1173 13-09-05 13-09-19 15:48:59.136000000
1139 13-09-05 13-09-24 12:59:05.263000000
1152 13-09-05 13-09-27 13:28:59.332000000
I need to write a query that will return the CACHE_ID for the record which has the most recent CREATED_DATE.
I am having trouble crafting such a query. I can do a GROUP BY based on BUSINESS_DATE and get the MAX(CREATED_DATE)...of course, I won't have the CACHE_ID of the record.
Could someone help with this?
Not positive on Oracle syntax, but use the ROW_NUMBER() function:
SELECT BUSINESS_DATE, CACHE_ID
FROM (SELECT t.*,
ROW_NUMBER() OVER(PARTITION BY BUSINESS_DATE ORDER BY CREATED_DATE DESC) RN
FROM YourTable t
)sub
WHERE RN = 1
The ROW_NUMBER() function assigns a number to each row. PARTITION BY is optional, but it restarts the numbering for each value in that group, i.e. if you PARTITION BY BUSINESS_DATE, then for each unique BUSINESS_DATE value the numbering starts over at 1. ORDER BY defines how the counting should go, and is required in the ROW_NUMBER() function.
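For instance, dropping the PARTITION BY clause numbers the whole result set, so RN = 1 picks the single most recently created row overall (a sketch against the same table):
SELECT CACHE_ID
FROM (SELECT t.*,
             ROW_NUMBER() OVER(ORDER BY CREATED_DATE DESC) RN
      FROM YourTable t
     )sub
WHERE RN = 1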
You want to group on business date, and get the CACHE_ID with the most current created date? Use something like this:
select yt.CACHE_ID, yt.BUSINESS_DATE, yt.CREATED_DATE
from YourTable yt
where yt.CREATED_DATE = (select max(yt1.CREATED_DATE)
from YourTable yt1
where yt1.BUSINESS_DATE = yt.BUSINESS_DATE)
Not sure of the exact syntax, but conceptually, can't you just sort by CREATED_DATE descending and take the first one?
Across all records -
select top 1 CACHE_ID from YourTable order by CREATED_DATE desc
For each BUSINESS_DATE -
select distinct
a.BUSINESS_DATE,
(
select top 1 b.CACHE_ID
from YourTable b where a.BUSINESS_DATE = b.BUSINESS_DATE
order by b.CREATED_DATE desc
) as Last_CACHE_ID
from YourTable a