Select a particular non-grouped column from a grouped set - sql

The topic might be a little bit unclear but I couldn't describe in a single sentence what I want to achieve.
Say I have a table with the following columns:
id INT PK
name VARCHAR
date DATE
I have a grouping select
select
name,
max(date)
from table
group by name
that gives me a name and the latest date.
What is the easiest way to join the id column to the current aggregated result set with the id value where the date was the maximum?
Let me explain what my goal is with an example:
The table is filled with the data as follows
id name date
1 david 2012-12-12
2 david 2013-12-02
3 patrick 2014-01-02
4 patrick 2012-11-11
and by my query I'd like to get the following result
id name date
2 david 2013-12-02
3 patrick 2014-01-02
Notice that all the records for name = 'david' are aggregated and the maximum date is selected. How do I get the row id for this maximum date?

One option is to use ROW_NUMBER():
SELECT id, name, date
FROM (
SELECT id, name, date,
row_number() over (partition by name order by date desc) rn
FROM yourtable
) t
WHERE rn = 1
Another option is to join the table back to itself using the MAX() aggregate. Note that this option can return more than one row per name if several rows for that name share the same max date:
SELECT t.id, t.name, t.date
FROM yourtable t
JOIN (SELECT name, max(date) maxdate
FROM yourtable
GROUP BY name) t2 on t.name = t2.name AND t.date = t2.maxdate
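To make the tie behaviour concrete, here is a minimal sketch that runs both approaches from Python's built-in sqlite3 module (assuming a SQLite build with window-function support, 3.25 or later). The row with id 5 is a hypothetical addition to the question's data, and the extra `id DESC` tie-breaker in the window ordering is an assumption, not part of the answer above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE yourtable (id INTEGER PRIMARY KEY, name TEXT, date TEXT)")
conn.executemany(
    "INSERT INTO yourtable VALUES (?, ?, ?)",
    [
        (1, "david", "2012-12-12"),
        (2, "david", "2013-12-02"),
        (3, "patrick", "2014-01-02"),
        (4, "patrick", "2012-11-11"),
        (5, "patrick", "2014-01-02"),  # hypothetical row that ties with id 3
    ],
)

# Self-join on MAX(date): every row that carries the max date is returned.
join_rows = conn.execute("""
    SELECT t.id, t.name, t.date
    FROM yourtable t
    JOIN (SELECT name, MAX(date) AS maxdate
          FROM yourtable
          GROUP BY name) t2
      ON t.name = t2.name AND t.date = t2.maxdate
    ORDER BY t.id
""").fetchall()

# ROW_NUMBER(): exactly one row per name; id DESC breaks the tie (assumed rule).
window_rows = conn.execute("""
    SELECT id, name, date
    FROM (SELECT id, name, date,
                 ROW_NUMBER() OVER (PARTITION BY name
                                    ORDER BY date DESC, id DESC) AS rn
          FROM yourtable) t
    WHERE rn = 1
    ORDER BY id
""").fetchall()

print(join_rows)    # three rows: ids 2, 3 and 5
print(window_rows)  # two rows: ids 2 and 5
```

If ties are possible in your data, the window version with an explicit tie-breaker gives a deterministic single row per name.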

Related

Finding the first occurrence of an element in a SQL database

I have a table with a column for customer names, a column for purchase amount, and a column for the date of the purchase. Is there an easy way I can find how much first time customers spent on each day?
So I have
Name | Purchase Amount | Date
Joe 10 9/1/2014
Tom 27 9/1/2014
Dave 36 9/1/2014
Tom 7 9/2/2014
Diane 10 9/3/2014
Larry 12 9/3/2014
Dave 14 9/5/2014
Jerry 16 9/6/2014
And I would like something like
Date | Total first Time Purchase
9/1/2014 73
9/3/2014 22
9/6/2014 16
Can anyone help me out with this?
The following is standard SQL and works on nearly all DBMSs (any that support window functions):
select date,
sum(purchaseamount) as total_first_time_purchase
from (
select date,
purchaseamount,
row_number() over (partition by name order by date) as rn
from the_table
) t
where rn = 1
group by date;
The derived table (the inner select) selects all "first time" purchases, and the outer query then aggregates them by date.
The two key concepts here are aggregates and sub-queries, and the details of which dbms you're using may change the exact implementation, but the basic concept is the same.
1. For each name, determine their first date
2. Using the results of 1, find each person's first day purchase amount
3. Using the results of 2, sum the amounts for each date
In SQL Server, it could look like this:
select Date, [totalFirstTimePurchases] = sum(PurchaseAmount)
from (
select t.Date, t.PurchaseAmount, t.Name
from table1 t
join (
select Name, [firstDate] = min(Date)
from table1
group by Name
) f on t.Name=f.Name and t.Date=f.firstDate
) ftp
group by Date
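As a quick check of the join-on-first-date approach against the question's data, here is a small sketch using Python's sqlite3 module. The dates are rewritten as ISO strings so they sort correctly as text (an assumption about storage), and the table name `table1` follows the query above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE table1 (Name TEXT, PurchaseAmount INTEGER, Date TEXT)")
conn.executemany(
    "INSERT INTO table1 VALUES (?, ?, ?)",
    [
        ("Joe", 10, "2014-09-01"),
        ("Tom", 27, "2014-09-01"),
        ("Dave", 36, "2014-09-01"),
        ("Tom", 7, "2014-09-02"),
        ("Diane", 10, "2014-09-03"),
        ("Larry", 12, "2014-09-03"),
        ("Dave", 14, "2014-09-05"),
        ("Jerry", 16, "2014-09-06"),
    ],
)

totals = conn.execute("""
    SELECT t.Date, SUM(t.PurchaseAmount) AS totalFirstTimePurchases
    FROM table1 t
    JOIN (SELECT Name, MIN(Date) AS firstDate  -- step 1: first date per name
          FROM table1
          GROUP BY Name) f
      ON t.Name = f.Name AND t.Date = f.firstDate  -- step 2: first-day rows
    GROUP BY t.Date                                -- step 3: sum per date
    ORDER BY t.Date
""").fetchall()

for day, total in totals:
    print(day, total)
# 2014-09-01 73
# 2014-09-03 22
# 2014-09-06 16
```

Note that if a customer makes several purchases on their first day, this join counts all of them, whereas a ROW_NUMBER() filter with rn = 1 would count only one.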
If you are using SQL Server you can accomplish this with either sub-queries or CTEs (Common Table Expressions). Since there is already an answer with sub-queries, here is the CTE version.
First the following will identify each row where there is a first time purchase and then get the sum of those values grouped by date:
;WITH cte
AS (
SELECT [Name]
,PurchaseAmount
,[date]
,ROW_NUMBER() OVER (
PARTITION BY [Name] ORDER BY [date] --start at 1 for each name at the earliest date and count up, reset every time the name changes
) AS rn
FROM yourTableName
)
SELECT [date]
,sum(PurchaseAmount) AS TotalFirstTimePurchases
FROM cte
WHERE rn = 1
GROUP BY [date]

How to select multiple rows in SQL Server while filling one column with the first value

Each of my rows has a date. I want the database to keep the correct date for every row, but in this situation I only want the first date, while still keeping all the other rows. So I would like to fill the date column of my result with that same date.
For example (because I don't think I expressed myself well):
I have this:
name value date
a 10 5/13
b 14 2/13
c 20 1/13
a 11 7/13
a 5 8/13
b 8 9/13
I want it to become like this in the result:
name value date
a 26 5/13
b 22 5/13
c 20 5/13
I searched for this information but I only find the way to select the first row.
For now I'm doing
SELECT name, SUM(value), date FROM table
ORDER BY name
And I'm kind of clueless about what to do next.
Thanks :)
Databases don't have a concept of "first". Here is an attempt, but no guarantees unless you have a way of ordering to determine first:
select name, sum(value), const.date
from table cross join
(select top 1 date from table) const
group by name, const.date
If you only want to do this for a query, to provide this aggregated data for some specific client requirement, then @freshPrince's answer is appropriate. But if you want to actually modify the data in the table itself, and prevent the issue from arising again, then you need to change the schema.
Create Table newTable(
name varChar(30) not null,
date datetime not null,
value decimal(10,2) not null default(0),
primary key (name, date) )
Insert newTable (name, date, value)
Select name, Min(date), SUM(value)
FROM currentTable
Group By Name
Then delete the old table and rename the new table to whatever name you need.
You will also have to modify the process used to insert new rows so that instead of always inserting a new row, it updates the existing row for a specified name and date if it already exists.
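One way the "update if it exists, otherwise insert" step could look is sketched below with Python's sqlite3 module. The helper `add_value` is hypothetical, and the choice to add the new value to the stored one is an assumption about what "updates the existing row" should mean here. In SQL Server the same idea would typically be written as an UPDATE followed by a conditional INSERT inside a transaction.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE newTable (
        name  TEXT NOT NULL,
        date  TEXT NOT NULL,
        value NUMERIC NOT NULL DEFAULT 0,
        PRIMARY KEY (name, date)
    )
""")

def add_value(conn, name, date, value):
    """Add value to the existing (name, date) row, or insert a new row."""
    cur = conn.execute(
        "UPDATE newTable SET value = value + ? WHERE name = ? AND date = ?",
        (value, name, date),
    )
    if cur.rowcount == 0:  # no existing row was updated, so create one
        conn.execute(
            "INSERT INTO newTable (name, date, value) VALUES (?, ?, ?)",
            (name, date, value),
        )

# Hypothetical values; the dates are assumed year-month strings.
add_value(conn, "a", "2013-05", 10)
add_value(conn, "a", "2013-05", 11)  # same key: updates instead of inserting
add_value(conn, "b", "2013-02", 14)

rows = conn.execute(
    "SELECT name, date, value FROM newTable ORDER BY name"
).fetchall()
print(rows)  # one row per (name, date) key
```

With concurrent writers, the update and insert should run in one transaction so that two sessions cannot both see "no existing row".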
Your question is slightly confusing, since your desired result shows a date that does not exist for either b or c, but if that is the result that you want, you could use something similar to the following:
select name, sum(value) value, d.date
from yt
cross join
(
select min(date) date
from yt
where name = (select min(name)
from yt)
) d
group by name, d.date;
But it seems like you actually would want the min(date) for each name:
select name, sum(value) value, min(date)
from yt
group by name;
If the order of the date should be determined by the name then you could use:
select t.name, sum(value) value, d.date
from yt t
cross join
(
select top 1 name, date
from yt
order by name, date
) d
group by t.name, d.date;
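The difference between "min(date) per name" and "one fixed first date for every row" can be seen side by side in the following sketch, run through Python's sqlite3 module. The dates are hypothetical year-month strings standing in for the question's 5/13-style values (read here as month/year, which is an assumption), and SQLite's `LIMIT 1` replaces SQL Server's `TOP 1`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE yt (name TEXT, value INTEGER, date TEXT)")
conn.executemany(
    "INSERT INTO yt VALUES (?, ?, ?)",
    [
        ("a", 10, "2013-05"),
        ("b", 14, "2013-02"),
        ("c", 20, "2013-01"),
        ("a", 11, "2013-07"),
        ("a", 5, "2013-08"),
        ("b", 8, "2013-09"),
    ],
)

# Earliest date within each name.
per_name = conn.execute("""
    SELECT name, SUM(value) AS value, MIN(date) AS date
    FROM yt
    GROUP BY name
    ORDER BY name
""").fetchall()

# One fixed date for all rows: the first row when ordered by name, then date.
fixed_date = conn.execute("""
    SELECT t.name, SUM(t.value) AS value, d.date
    FROM yt t
    CROSS JOIN (SELECT date FROM yt ORDER BY name, date LIMIT 1) d
    GROUP BY t.name, d.date
    ORDER BY t.name
""").fetchall()

print(per_name)    # each name keeps its own earliest date
print(fixed_date)  # every name shows the same date, matching the desired output
```

The explicit `ORDER BY name, date` is what makes "first" well defined in the second query.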

Row with the highest ID

You have three fields: ID, Date and Total. Your table contains multiple rows for the same day, which is valid data; however, for reporting purposes you need to show only one row per day. The row with the highest ID per day should be returned; the rest should be hidden from users (not returned).
To better picture the question, below is sample data and sample output:
ID, Date, Total
1, 2011-12-22, 50
2, 2011-12-22, 150
The correct result is:
2, 2011-12-22, 150
The correct output is a single row for the 2011-12-22 date, and this row was chosen because it has the highest ID (2>1).
Assuming that you have a database that supports window functions, and that the date column is indeed just date (and not datetime), then something like:
SELECT
* --TODO - Pick columns
FROM
(
SELECT ID,[Date],Total,ROW_NUMBER() OVER (PARTITION BY [Date] ORDER BY ID desc) rn
FROM [Table]
) t
WHERE
rn = 1
Should produce one row per day - and the selected row for any given day is that with the highest ID value.
SELECT *
FROM table
WHERE ID IN ( SELECT MAX(ID)
FROM table
GROUP BY Date )
This will work.
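A small sketch of this MAX(ID)-per-date filter, run with Python's sqlite3 module, is shown below. The table name `daily_totals` and the third row are hypothetical, added so that more than one day appears in the result.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE daily_totals (ID INTEGER PRIMARY KEY, Date TEXT, Total INTEGER)")
conn.executemany(
    "INSERT INTO daily_totals VALUES (?, ?, ?)",
    [
        (1, "2011-12-22", 50),
        (2, "2011-12-22", 150),
        (3, "2011-12-23", 75),  # hypothetical extra day
    ],
)

# Keep only the rows whose ID is the largest ID seen for some date.
latest_per_day = conn.execute("""
    SELECT ID, Date, Total
    FROM daily_totals
    WHERE ID IN (SELECT MAX(ID)
                 FROM daily_totals
                 GROUP BY Date)
    ORDER BY ID
""").fetchall()

print(latest_per_day)  # one row per day: IDs 2 and 3
```

This relies on ID being unique across the whole table; if IDs could repeat on different dates, the subquery would also need to match on Date.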
SELECT *
FROM tableName a
INNER JOIN
(
SELECT `DATE`, MAX(ID) maxID
FROM tableName
GROUP BY `DATE`
) b ON a.id = b.MaxID AND
a.`date` = b.`date`
Probably
SELECT * FROM your_table ORDER BY ID DESC LIMIT 1
Select MAX(ID),Data,Total from foo
for MySQL
Another simple way is
SELECT TOP 1 * FROM YourTable ORDER BY ID DESC
And I think this is the simplest way:
SELECT * FROM TABLE_SUM S WHERE S.ID =
(
SELECT MAX(ID) FROM TABLE_SUM
WHERE CDATE = S.CDATE
GROUP BY CDATE
)

SQL get latest date record

I have a query that has the following
DATE ID Name
--- ------------ -----------
2012-02-07 11:24:53.000 00001-KK-12 Smith, JEN
2011-12-28 00:00:00.000 00001-KK-12 Bearson, Matt
2012-02-13 10:38:18.000 00003-KJ-12 Wick, Julian
What I need to do is to get the latest date for a given ID and then show the results
So in this case, it would be:
DATE ID Name
--- ------------ -----------
2012-02-07 11:24:53.000 00001-KK-12 Smith, JEN
2012-02-13 10:38:18.000 00003-KJ-12 Wick, Julian
I tried to use the Top(1) with a group by on ID based but was not successful
There are several ways to do this. One way is to use row_number. It's useful if there's a possibility that there's a tie on date and you want to arbitrarily pick one.
WITH CTE AS (
SELECT
row_number() over (partition by id order by date desc) rn,
date,
id,
name
FROM
table)
SELECT date,
id,
name
FROM CTE WHERE RN = 1
Another option is to use an anti-join (no aggregates, no CTE) as follows, but it will return multiple rows if there's a tie for the latest date for a given ID.
SELECT
t.date,
t.id,
t.name
FROM
table t
LEFT JOIN table t1
ON t.Id = t1.id
and t.Date < t1.Date
WHERE
t1.Date is null
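Here is a minimal runnable sketch of the anti-join on the question's sample rows, using Python's sqlite3 module. The table name `records` is hypothetical (the word `table` is reserved), and the join condition is written in an ON clause. A row survives only when no other row with the same id has a later date.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE records (date TEXT, id TEXT, name TEXT)")
conn.executemany(
    "INSERT INTO records VALUES (?, ?, ?)",
    [
        ("2012-02-07 11:24:53.000", "00001-KK-12", "Smith, JEN"),
        ("2011-12-28 00:00:00.000", "00001-KK-12", "Bearson, Matt"),
        ("2012-02-13 10:38:18.000", "00003-KJ-12", "Wick, Julian"),
    ],
)

latest = conn.execute("""
    SELECT t.date, t.id, t.name
    FROM records t
    LEFT JOIN records t1
      ON t.id = t1.id
     AND t.date < t1.date   -- t1 is a later row for the same id
    WHERE t1.date IS NULL   -- keep rows that have no later row
    ORDER BY t.id
""").fetchall()

for row in latest:
    print(row)
```

Because the comparison is a strict `<`, two rows sharing the latest date for an id would both be returned.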
You want to use ROW_NUMBER() OVER. I was about to create a sample, but it looks like Conrad already did :)

Fetch Max from a date column grouped by a particular field

I have a table similar to this:
LogId RefId Entered
==================================
1 1 2010-12-01
2 1 2010-12-04
3 2 2010-12-01
4 2 2010-12-06
5 3 2010-12-01
6 1 2010-12-10
7 3 2010-12-05
8 4 2010-12-01
Here, LogId is unique. For each RefId, there are multiple entries, each with a timestamp. What I want to extract is the LogId of the latest entry for each RefId.
I tried solutions from this link: http://stackoverflow.com/questions/121387/sql-fetch-the-row-which-has-the-max-value-for-a-column. But it returns multiple rows with the same RefId. The LogId as well as the RefId should be unique.
Can someone help me with this?
Thanks
Vamyip
You need to use a subquery that extracts the latest Entered value for each RefId, and then join your source table to this on RefId, Entered:
SELECT DISTINCT MyTable.LogId, MyTable.Entered FROM MyTable
INNER JOIN (SELECT RefId, MAX(Entered) as Entered FROM MyTable GROUP BY RefId) Latest
ON MyTable.RefId = Latest.RefId AND MyTable.Entered = Latest.Entered
Since LogId appears to be an auto-incrementing ID, the log entries would be date/time stamped in sequential order. So, by grabbing the last LogID per RefID, you'll have the "most recent" one in the "PreQuery" below; then join on that single ID to the original table to get the actual date stamp (or other details) you need from the log.
select PreQuery.RefID,
PreQuery.LastLogEntry,
L.Entered
from
( select RefID,
Max( LogID ) LastLogEntry
from
YourLog
group by
RefID ) PreQuery,
YourLog L
where
PreQuery.LastLogEntry = L.LogID
To handle the duplicates correctly:
SELECT m.*
FROM (
SELECT DISTINCT refid
FROM mytable
) md
JOIN mytable m
ON m.LogID =
(
SELECT LogID
FROM mytable mi
WHERE mi.refid = md.refid
ORDER BY
mi.refid DESC, mi.entered DESC, mi.logid DESC
LIMIT 1
)
Create an index on mytable (refid, entered, logid) for this to work fast.
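The correlated-subquery approach and the suggested index can be tried out with the sketch below, which uses Python's sqlite3 module and the question's sample rows. The row with LogId 9 is a hypothetical duplicate added to show the tie-break, and the index name is made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mytable (LogId INTEGER PRIMARY KEY, RefId INTEGER, Entered TEXT)")
conn.executemany(
    "INSERT INTO mytable VALUES (?, ?, ?)",
    [
        (1, 1, "2010-12-01"), (2, 1, "2010-12-04"),
        (3, 2, "2010-12-01"), (4, 2, "2010-12-06"),
        (5, 3, "2010-12-01"), (6, 1, "2010-12-10"),
        (7, 3, "2010-12-05"), (8, 4, "2010-12-01"),
        (9, 4, "2010-12-01"),  # hypothetical duplicate Entered value for RefId 4
    ],
)
# Composite index covering the lookup used by the correlated subquery.
conn.execute("CREATE INDEX ix_mytable_ref_entered_log ON mytable (RefId, Entered, LogId)")

latest_logs = conn.execute("""
    SELECT m.LogId, m.RefId, m.Entered
    FROM (SELECT DISTINCT RefId FROM mytable) md
    JOIN mytable m
      ON m.LogId = (SELECT mi.LogId
                    FROM mytable mi
                    WHERE mi.RefId = md.RefId
                    ORDER BY mi.Entered DESC, mi.LogId DESC
                    LIMIT 1)
    ORDER BY m.RefId
""").fetchall()

for row in latest_logs:
    print(row)  # exactly one row per RefId, even when Entered values tie
```

The trailing `LogId DESC` in the ORDER BY is what guarantees a single row when two entries share the same Entered value.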