SQL get latest date record

I have a query that returns the following:
DATE                     ID           Name
-----------------------  -----------  -------------
2012-02-07 11:24:53.000  00001-KK-12  Smith, JEN
2011-12-28 00:00:00.000  00001-KK-12  Bearson, Matt
2012-02-13 10:38:18.000  00003-KJ-12  Wick, Julian
What I need to do is get the latest date for each ID and then show those rows.
So in this case, the result would be:
DATE                     ID           Name
-----------------------  -----------  -------------
2012-02-07 11:24:53.000  00001-KK-12  Smith, JEN
2012-02-13 10:38:18.000  00003-KJ-12  Wick, Julian
I tried to use TOP(1) with a GROUP BY on ID but was not successful.

There are several ways to do this. One way is to use row_number. It's useful if there's a possibility of a tie on date and you want to arbitrarily pick one.
    WITH CTE AS (
        SELECT
            row_number() over (partition by id order by date desc) rn,
            date,
            id,
            name
        FROM
            table)
    SELECT date,
           id,
           name
    FROM CTE WHERE RN = 1
Another option is to use an anti join (no aggregates, no CTE) as follows, but it will return multiple rows if there's a tie for the latest date for a given ID.
    SELECT
        t.date,
        t.id,
        t.name
    FROM
        table t
        LEFT JOIN table t1
            ON t.Id = t1.id
            and t.Date < t1.Date
    WHERE
        t1.Date is null
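To make the tie behaviour of the two approaches concrete, here is a minimal sketch that runs both queries through Python's built-in sqlite3 module. The table name `records`, the use of SQLite (3.25 or later for window functions), and the extra tied row "Doe, Tie" are assumptions added for illustration; they are not part of the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE records (date TEXT, id TEXT, name TEXT)")
conn.executemany(
    "INSERT INTO records VALUES (?, ?, ?)",
    [
        ("2012-02-07 11:24:53", "00001-KK-12", "Smith, JEN"),
        ("2011-12-28 00:00:00", "00001-KK-12", "Bearson, Matt"),
        ("2012-02-13 10:38:18", "00003-KJ-12", "Wick, Julian"),
        # Hypothetical row that ties with "Wick, Julian" on the latest date.
        ("2012-02-13 10:38:18", "00003-KJ-12", "Doe, Tie"),
    ],
)

# row_number(): exactly one row per id, the tie is broken arbitrarily.
row_number_rows = conn.execute("""
    WITH cte AS (
        SELECT ROW_NUMBER() OVER (PARTITION BY id ORDER BY date DESC) AS rn,
               date, id, name
        FROM records
    )
    SELECT date, id, name FROM cte WHERE rn = 1 ORDER BY id
""").fetchall()

# Anti join: keeps every row that has no strictly later row for the same id,
# so both tied rows survive.
anti_join_rows = conn.execute("""
    SELECT t.date, t.id, t.name
    FROM records t
    LEFT JOIN records t1 ON t.id = t1.id AND t.date < t1.date
    WHERE t1.date IS NULL
    ORDER BY t.id, t.name
""").fetchall()

print(len(row_number_rows), len(anti_join_rows))
```

With the tied row present, the row_number version returns one row per ID while the anti join returns both tied rows for 00003-KJ-12.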

You want to use ROW_NUMBER() OVER. I was about to create a sample, but it looks like Conrad already did :)

Related

Postgresql extract last row for each id

Suppose I have the following data:
id  date        another_info
1   2014-02-01  kjkj
1   2014-03-11  ajskj
1   2014-05-13  kgfd
2   2014-02-01  SADA
3   2014-02-01  sfdg
3   2014-06-12  fdsA
For each id, I want to extract the latest row:
id  date        another_info
1   2014-05-13  kgfd
2   2014-02-01  SADA
3   2014-06-12  fdsA
How could I manage that?
The most efficient way is to use Postgres' distinct on operator:
    select distinct on (id) id, date, another_info
    from the_table
    order by id, date desc;
If you want a solution that works across databases (but is less efficient) you can use a window function:
    select id, date, another_info
    from (
        select id, date, another_info,
               row_number() over (partition by id order by date desc) as rn
        from the_table
    ) t
    where rn = 1
    order by id;
The solution with a window function is in most cases faster than using a sub-query.
    select *
    from bar
    where (id, date) in (select id, max(date) from bar group by id)
Tested in PostgreSQL and MySQL.
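As a quick check of the row-value query on the question's sample data, here is a minimal sketch using Python's built-in sqlite3 module. Running it on SQLite is an assumption made for the sake of a self-contained example (the answer only mentions PostgreSQL and MySQL); SQLite accepts row values in `IN` from version 3.15.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE bar (id INTEGER, date TEXT, another_info TEXT)")
conn.executemany(
    "INSERT INTO bar VALUES (?, ?, ?)",
    [
        (1, "2014-02-01", "kjkj"),
        (1, "2014-03-11", "ajskj"),
        (1, "2014-05-13", "kgfd"),
        (2, "2014-02-01", "SADA"),
        (3, "2014-02-01", "sfdg"),
        (3, "2014-06-12", "fdsA"),
    ],
)

# Keep only the rows whose (id, date) pair equals that id's maximum date.
latest_rows = conn.execute("""
    SELECT *
    FROM bar
    WHERE (id, date) IN (SELECT id, MAX(date) FROM bar GROUP BY id)
    ORDER BY id
""").fetchall()

for row in latest_rows:
    print(row)
```

Like the anti join, this form would return several rows for an id if more than one row shares that id's maximum date.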
I found this to be the fastest solution:
    SELECT t1.*
    FROM yourTable t1
    LEFT JOIN yourTable t2 ON t2.tag_id = t1.tag_id AND t2.value_time > t1.value_time
    WHERE t2.tag_id IS NULL
For most scenarios, the most efficient way is to use GROUP BY.
The accepted answer states that distinct on (id) is the most efficient way to solve the problem described in the question, but I believe that is far from accurate.
Sadly, I couldn't find any helpful insights in the Postgres docs, but I did find this article, which references a few others and provides examples where the GROUP BY approach clearly leads to better performance.
We discussed this at work and ran a small experiment on a table that holds data about tags' blinks, with 4,114,692 rows and separate indexes on tag_id and on timestamp.
Here are the queries:
1. Using distinct on:
    select distinct on (tag_id) tag_id, timestamp, some_data
    from blinks
    order by tag_id, timestamp desc;
2. Using CTE + group by + join:
    with blink_last_timestamp as (
        select tag_id, max(timestamp) as max_timestamp
        from blinks
        group by tag_id )
    select bl.tag_id, max_timestamp, some_data
    from blink_last_timestamp bl
    join blinks b on
        b.tag_id = bl.tag_id and
        b.timestamp = bl.max_timestamp
The results were unambiguous and favored the second solution for this scenario (which is pretty generic in my opinion): it was about 10 times (!) faster, 1655.991 ms (00:01.656) vs 16723.346 ms (00:16.723), and of course delivered the same data.
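Group by id and use an aggregate function that matches your criterion for the last record. For example:
    select id, max(date), another_info
    from the_table
    group by id, another_info
Note that because another_info is also in the GROUP BY, this returns one row per distinct (id, another_info) pair, so it yields a single row per id only when another_info is constant within each id.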

Finding the first occurrence of an element in a SQL database

I have a table with a column for customer names, a column for purchase amount, and a column for the date of the purchase. Is there an easy way I can find how much first time customers spent on each day?
So I have
Name   | Purchase Amount | Date
Joe    | 10              | 9/1/2014
Tom    | 27              | 9/1/2014
Dave   | 36              | 9/1/2014
Tom    | 7               | 9/2/2014
Diane  | 10              | 9/3/2014
Larry  | 12              | 9/3/2014
Dave   | 14              | 9/5/2014
Jerry  | 16              | 9/6/2014
And I would like something like
Date     | Total first Time Purchase
9/1/2014 | 73
9/3/2014 | 22
9/6/2014 | 16
Can anyone help me out with this?
The following is standard SQL and works on nearly all DBMS:
    select date,
           sum(purchaseamount) as total_first_time_purchase
    from (
        select date,
               purchaseamount,
               row_number() over (partition by name order by date) as rn
        from the_table
    ) t
    where rn = 1
    group by date;
The derived table (the inner select) selects all "first time" purchases, and the outer query then aggregates them by date.
The two key concepts here are aggregates and sub-queries. The details of which DBMS you're using may change the exact implementation, but the basic concept is the same:
1. For each name, determine their first date
2. Using the results of 1, find each person's first day purchase amount
3. Using the results of 2, sum the amounts for each date
In SQL Server, it could look like this:
    select Date, [totalFirstTimePurchases] = sum(PurchaseAmount)
    from (
        select t.Date, t.PurchaseAmount, t.Name
        from table1 t
        join (
            select Name, [firstDate] = min(Date)
            from table1
            group by Name
        ) f on t.Name=f.Name and t.Date=f.firstDate
    ) ftp
    group by Date
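The three steps can be traced on the question's sample data with a small sketch using Python's built-in sqlite3 module. SQLite, the `AS` aliases (instead of the SQL Server `[alias] = expr` form), and ISO-formatted dates (so that text comparison matches date order) are assumptions made for this example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE table1 (Name TEXT, PurchaseAmount INTEGER, Date TEXT)")
conn.executemany(
    "INSERT INTO table1 VALUES (?, ?, ?)",
    [
        ("Joe", 10, "2014-09-01"),
        ("Tom", 27, "2014-09-01"),
        ("Dave", 36, "2014-09-01"),
        ("Tom", 7, "2014-09-02"),
        ("Diane", 10, "2014-09-03"),
        ("Larry", 12, "2014-09-03"),
        ("Dave", 14, "2014-09-05"),
        ("Jerry", 16, "2014-09-06"),
    ],
)

totals = conn.execute("""
    SELECT t.Date, SUM(t.PurchaseAmount) AS totalFirstTimePurchases
    FROM table1 t
    JOIN (
        -- step 1: first purchase date per customer
        SELECT Name, MIN(Date) AS firstDate
        FROM table1
        GROUP BY Name
    ) f ON t.Name = f.Name AND t.Date = f.firstDate  -- step 2: first-day purchases
    GROUP BY t.Date                                   -- step 3: sum per date
    ORDER BY t.Date
""").fetchall()

for date, total in totals:
    print(date, total)
```

One design point worth noting: if a customer made several purchases on their first day, this join counts all of them, whereas a row_number() filter with rn = 1 would count only one.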
If you are using SQL Server you can accomplish this with either sub-queries or CTEs (Common Table Expressions). Since there is already an answer with sub-queries, here is the CTE version.
The following first identifies each row that is a first-time purchase and then sums those values grouped by date:
    ;WITH cte
    AS (
        SELECT [Name]
            ,PurchaseAmount
            ,[date]
            ,ROW_NUMBER() OVER (
                PARTITION BY [Name] ORDER BY [date] --start at 1 for each name at the earliest date and count up, reset every time the name changes
                ) AS rn
        FROM yourTableName
        )
    SELECT [date]
        ,sum(PurchaseAmount) AS TotalFirstTimePurchases
    FROM cte
    WHERE rn = 1
    GROUP BY [date]

Oracle - With a one to many relationship, select distinct rows based on a min value

This question is the same as In one to many relationship, return distinct rows based on MIN value with the exception that I'd like to see what the answer looks like in other dialects, particularly in Oracle.
Reposting from the original description:
Let's say a patient makes many visits. I want to write a query that returns distinct patient rows based on their earliest visit. For example, consider the following rows.
patients
-------------
id  name
1   Bob
2   Jim
3   Mary
visits
-------------
id  patient_id  visit_date  reference_number
1   1           6/29/14     09f3be26
2   1           7/8/14      34c23a9e
3   2           7/10/14     448dd90a
What I want to see returned by the query is:
id  name  first_visit_date  reference_number
1   Bob   6/29/14           09f3be26
2   Jim   7/10/14           448dd90a
In the other question, using postgresql, the best solution seemed to be to use distinct on, but that is not available in other dialects.
Typically, one uses row_number(), ordering by visit_date ascending so that the earliest visit gets number 1:
    select id, name, visit_date as first_visit_date, reference_number
    from (select p.id, p.name, v.visit_date, v.reference_number,
                 row_number() over (partition by p.id order by v.visit_date) as seqnum
          from visits v join
               patients p
               on v.patient_id = p.id
         ) t
    where seqnum = 1;
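As a sanity check of the row_number() approach for the earliest visit, here is a minimal sketch run through Python's built-in sqlite3 module rather than Oracle. The choice of SQLite (3.25 or later), the ISO date strings, and selecting the patient id with ascending visit_date order are assumptions made so that the output matches the result the question asks for.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE patients (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE visits (
        id INTEGER PRIMARY KEY,
        patient_id INTEGER,
        visit_date TEXT,
        reference_number TEXT
    );
    INSERT INTO patients VALUES (1, 'Bob'), (2, 'Jim'), (3, 'Mary');
    INSERT INTO visits VALUES
        (1, 1, '2014-06-29', '09f3be26'),
        (2, 1, '2014-07-08', '34c23a9e'),
        (3, 2, '2014-07-10', '448dd90a');
""")

# Number each patient's visits from earliest to latest and keep number 1.
first_visits = conn.execute("""
    SELECT id, name, visit_date AS first_visit_date, reference_number
    FROM (
        SELECT p.id, p.name, v.visit_date, v.reference_number,
               ROW_NUMBER() OVER (PARTITION BY p.id ORDER BY v.visit_date) AS seqnum
        FROM visits v
        JOIN patients p ON v.patient_id = p.id
    ) t
    WHERE seqnum = 1
    ORDER BY id
""").fetchall()

for row in first_visits:
    print(row)
```

Mary has no visits, so the inner join leaves her out, which matches the expected result in the question.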

Select particular not grouped column from grouped set

The topic might be a little bit unclear but I couldn't describe in a single sentence what I want to achieve.
Say I have a table with these columns:
    id INT PK
    name VARCHAR
    date DATE
I have a grouping select
    select
        name,
        max(date)
    from table
    group by name
that gives me a name and the latest date.
What is the easiest way to join the id column to the current aggregated result set with the id value where the date was the maximum?
Let me explain what my goal is with an example:
The table is filled with the data as follows
id  name     date
1   david    2012-12-12
2   david    2013-12-02
3   patrick  2014-01-02
4   patrick  2012-11-11
and by my query I'd like to get the following result
id  name     date
2   david    2013-12-02
3   patrick  2014-01-02
Notice that all the records for name = 'david' are aggregated and the maximum date is selected. How to get the row id for this maximum date?
One option is to use ROW_NUMBER():
    SELECT id, name, date
    FROM (
        SELECT id, name, date,
               row_number() over (partition by name order by date desc) rn
        FROM yourtable
    ) t
    WHERE rn = 1
Another option is to join the table back to itself using the MAX() aggregate. This option can return more than one row per name if several rows for that name share the same max date:
    SELECT t.id, t.name, t.date
    FROM yourtable t
    JOIN (SELECT name, max(date) maxdate
          FROM yourtable
          GROUP BY name) t2 on t.name = t2.name AND t.date = t2.maxdate

Select a column with condition with other column as it is

I have not searched much before asking because I find it hard to phrase a search query for this.
I will ask by example instead of by description.
I have a table called user_sale
id  emp_id  emp_name  emp_location  date        sales
------------------------------------------------------
1   111     mr.one    A             2013/07/17  5000
2   111     mr.one    C             2013/07/14  6000
3   222     mr.two    B             2013/06/15  5500
and so on.
In the output I want all fields as they are, except that emp_location should be the latest one within the month.
I am able to get the month and year from date, so I can group by year and month.
Expected output:
id  emp_id  emp_name  emp_location  date        sales
------------------------------------------------------
1   111     mr.one    A             2013/07/17  5000
2   111     mr.one    A             2013/07/14  6000
3   222     mr.two    B             2013/06/15  5500
One solution is to join with the same table, but since the table contains a large amount of data that does not seem like a proper solution.
Use the window function first_value() to get the "first" of one column (emp_location) as defined by another column (date), embedded in otherwise unchanged rows:
    SELECT id, emp_id, emp_name
         , first_value(emp_location) OVER (PARTITION BY emp_id
                                           ORDER BY date DESC) AS emp_location
         , date, sales
    FROM   user_sale
    ORDER  BY id;
Assuming that emp_id is unique per group as you define it.
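To see what first_value() does on the question's three sample rows, here is a minimal sketch using Python's built-in sqlite3 module. Running on SQLite (3.25 or later) and storing the dates as text in the question's yyyy/mm/dd form are assumptions for this example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE user_sale (id INTEGER, emp_id INTEGER, emp_name TEXT, "
    "emp_location TEXT, date TEXT, sales INTEGER)"
)
conn.executemany(
    "INSERT INTO user_sale VALUES (?, ?, ?, ?, ?, ?)",
    [
        (1, 111, "mr.one", "A", "2013/07/17", 5000),
        (2, 111, "mr.one", "C", "2013/07/14", 6000),
        (3, 222, "mr.two", "B", "2013/06/15", 5500),
    ],
)

# Every row keeps its own values, but emp_location is replaced by the
# location of the most recent row for the same emp_id.
rows = conn.execute("""
    SELECT id, emp_id, emp_name,
           first_value(emp_location) OVER (PARTITION BY emp_id
                                           ORDER BY date DESC) AS emp_location,
           date, sales
    FROM user_sale
    ORDER BY id
""").fetchall()

for row in rows:
    print(row)
```

If the grouping should also be per month, as the question suggests, the year and month would have to be added to PARTITION BY; how to extract them depends on the column type and the DBMS.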
Aside: you shouldn't be using date (reserved word in SQL standard) or id (non-descriptive) as column names.
You can use a window function like this to get the latest data for each employee:
    SELECT *
    FROM
      (SELECT *,
              row_number() OVER (PARTITION BY emp_name ORDER BY date DESC) AS pos
       FROM user_sale
      ) AS rankem
    WHERE pos = 1;
I'm not quite clear on exactly what you want, but I imagine you can join to that sub-query to get what you need.