I'm new to SQL and this website so apologies if anything is unclear.
Basically, I have two separate tables:
Table A:
CustomerID | PromoStart | PromoEnd
1 | 2020-05-01 | 2020-05-30
2 | 2020-06-01 | 2020-07-30
3 | 2020-07-01 | 2020-10-15
Table B:
CustomerID | Date | Payment |
1 | 2020-02-15 | 5000 |
1 | 2020-05-04 | 200 |
1 | 2020-05-28 | 100 |
1 | 2020-06-05 | 1000 |
2 | 2020-06-10 | 20 |
2 | 2020-07-25 | 500 |
2 | 2020-08-02 | 1000 |
3 | 2020-09-05 | 580 |
3 | 2020-12-01 | 20 |
What I want is to get the sum of all payments that fall between PromoStart and PromoEnd for each customer.
So the desired result would be:
CustomerID | TotalPayments
1 | 300
2 | 520
3 | 580
I guess this would involve an inner (left?) join and a WHERE clause, but I just can't figure it out.
A LATERAL join would do it:
SELECT a.customer_id, b.total_payments
FROM table_a a
LEFT JOIN LATERAL (
SELECT sum(payment) AS total_payments
FROM table_b
WHERE customer_id = a.customer_id
AND date BETWEEN a.promo_start AND a.promo_end
) b ON true;
This assumes inclusive lower and upper bounds, and that you want to include all rows from table_a, even without any payments in table_b.
You can use a correlated subquery or join with aggregation. The correlated subquery looks like:
select a.*,
(select sum(b.payment)
from b
where b.customerid = a.customerid and
b.date >= a.promostart and
b.date <= a.promoend
) as totalpayments
from a;
You don't mention your database, but this can take advantage of an index on b(customerid, date, payment). By avoiding the outer aggregation, this would often have better performance than an alternative using group by.
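If it helps to see the correlated subquery run end to end, here is a minimal sketch using Python's `sqlite3` with the sample data. The lower-cased table names `a` and `b` and the ISO date strings are assumptions (ISO strings compare correctly as text), not the asker's actual schema:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE a (customerid INT, promostart TEXT, promoend TEXT);
    CREATE TABLE b (customerid INT, date TEXT, payment INT);
    INSERT INTO a VALUES
        (1, '2020-05-01', '2020-05-30'),
        (2, '2020-06-01', '2020-07-30'),
        (3, '2020-07-01', '2020-10-15');
    INSERT INTO b VALUES
        (1, '2020-02-15', 5000), (1, '2020-05-04', 200),
        (1, '2020-05-28', 100),  (1, '2020-06-05', 1000),
        (2, '2020-06-10', 20),   (2, '2020-07-25', 500),
        (2, '2020-08-02', 1000), (3, '2020-09-05', 580),
        (3, '2020-12-01', 20);
""")

# Correlated subquery: one SUM per row of a, restricted to the promo window.
rows = conn.execute("""
    SELECT a.customerid,
           (SELECT SUM(b.payment)
              FROM b
             WHERE b.customerid = a.customerid
               AND b.date >= a.promostart
               AND b.date <= a.promoend) AS totalpayments
      FROM a
     ORDER BY a.customerid
""").fetchall()
print(rows)  # [(1, 300), (2, 520), (3, 580)]
```

The same query text works unchanged on most engines, since it uses no vendor-specific syntax.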
I hope I didn't overlook something important, but it seems to me that a simple join on a range-matching condition should be sufficient:
with a (CustomerID , PromoStart , PromoEnd) as (values
(1 , date '2020-05-01' , date '2020-05-30'),
(2 , date '2020-06-01' , date '2020-07-30'),
(3 , date '2020-07-01' , date '2020-10-15')
), b (CustomerID , d , Payment ) as (values
(1 , date '2020-02-15' , 5000 ),
(1 , date '2020-05-04' , 200 ),
(1 , date '2020-05-28' , 100 ),
(1 , date '2020-06-05' , 1000 ),
(2 , date '2020-06-10' , 20 ),
(2 , date '2020-07-25' , 500 ),
(2 , date '2020-08-02' , 1000 ),
(3 , date '2020-09-05' , 580 ),
(3 , date '2020-12-01' , 20 )
)
select a.CustomerID, sum(b.Payment)
from a
join b on a.CustomerID = b.CustomerID and b.d between a.PromoStart and a.PromoEnd
group by a.CustomerID
Db fiddle here.
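For readers without a fiddle handy, the join-plus-GROUP BY shape can be checked with Python's `sqlite3` (lower-cased names and ISO date strings are my assumptions):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE a (customerid INT, promostart TEXT, promoend TEXT);
    CREATE TABLE b (customerid INT, d TEXT, payment INT);
    INSERT INTO a VALUES
        (1, '2020-05-01', '2020-05-30'),
        (2, '2020-06-01', '2020-07-30'),
        (3, '2020-07-01', '2020-10-15');
    INSERT INTO b VALUES
        (1, '2020-02-15', 5000), (1, '2020-05-04', 200),
        (1, '2020-05-28', 100),  (1, '2020-06-05', 1000),
        (2, '2020-06-10', 20),   (2, '2020-07-25', 500),
        (2, '2020-08-02', 1000), (3, '2020-09-05', 580),
        (3, '2020-12-01', 20);
""")

# Join on the range condition, then aggregate per customer.
rows = conn.execute("""
    SELECT a.customerid, SUM(b.payment)
      FROM a
      JOIN b ON a.customerid = b.customerid
            AND b.d BETWEEN a.promostart AND a.promoend
     GROUP BY a.customerid
     ORDER BY a.customerid
""").fetchall()
print(rows)  # [(1, 300), (2, 520), (3, 580)]
```

Note the inner join drops customers with no payments in the window; switch to a LEFT JOIN if you need them reported with NULL.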
I am working with two tables. The first, purchases, is below (note: this is a clipping of the purchases table):
| ID | Date | Value | Type | Satisfied By |
|:-------:|:---------:|:-----:|:----:|:------------:|
| SALE100 | 1/1/2019 | -5 | OUT | |
| SALE201 | 1/9/2019 | -10 | OUT | |
| SALE203 | 2/22/2019 | -1 | OUT | |
| SALE205 | 3/14/2019 | -1 | OUT | |
I am trying to determine which MAKE items from another table, makes, satisfy these sales.
| ID | Date | Value | Needed For |
|:-------:|:----------:|:-----:|:----------:|
| MAKE300 | 12/24/2018 | 5 | SALE100 |
| MAKE301 | 1/3/2019 | 3 | SALE201 |
| MAKE399 | 1/5/2019 | 5 | SALE201 |
| MAKE401 | 1/7/2019 | 3 | SALE201 |
| MAKE401 | 1/7/2019 | 3 | SALE203 |
| MAKE912 | 2/1/2019 | 1 | SALE205 |
I am trying to write a query that will enable me to determine which ID or IDs from the makes table satisfy my sales.
My end result would look like one of two forms; in the case where they are LISTAGG'd:
| ID | Date | Value | Type | Satisfied By |
|:-------:|:---------:|:-----:|:----:|:-------------------------:|
| SALE100 | 1/1/2019 | -5 | OUT | MAKE300 |
| SALE201 | 1/9/2019 | -10 | OUT | MAKE301, MAKE399, MAKE401 |
| SALE203 | 2/22/2019 | -1 | OUT | MAKE401 |
| SALE205 | 3/14/2019 | -1 | OUT | MAKE912 |
However, the following line of code:
(SELECT LISTAGG(makes.id, ', ') WITHIN GROUP (ORDER BY NULL) FROM makes WHERE purchased.id = needed_for.id) ELSE NULL END AS Satisfied_By
results in an error stating:
ORA-01489: result of string concatenation is too long
01489. 00000 - "result of string concatenation is too long"
I have also tried the following query to obtain results like this (which is ideal):
| ID | Date | Value | Type | Satisfied By |
|:-------:|:---------:|:-----:|:----:|:------------:|
| SALE100 | 1/1/2019 | -5 | OUT | MAKE300 |
| SALE201 | 1/9/2019 | -10 | OUT | MAKE301 |
| SALE201 | 1/9/2019 | -10 | OUT | MAKE399 |
| SALE201 | 1/9/2019 | -10 | OUT | MAKE401 |
| SALE203 | 2/22/2019 | -1 | OUT | MAKE401 |
| SALE205 | 3/14/2019 | -1 | OUT | MAKE912 |
CASE WHEN Type = 'OUT' THEN
(SELECT
makes.id
FROM
makes
WHERE
makes.id IN (
SELECT
makes.id
FROM
makes
WHERE
sales.id = purchases.id
)) ELSE NULL END AS Satisfied_By
Which yields
ORA-01427: single-row subquery returns more than one row
01427. 00000 - "single-row subquery returns more than one row"
I have found many examples of this error on Stack Overflow, which is where I adopted the IN method from, and from this source, but I am still getting the error. Any help is appreciated.
Your 'ideal' result is a simple join:
select p.id, p.dt, p.value, p.type, m.id as satisfied_by
from purchases p
join makes m on m.needed_for = p.id;
You might want to make it a left join in case there are no matches, if that is possible in your data.
Quick demo with your data:
-- CTEs for sample data
with purchases (id, dt, value, type, satisfied_by) as (
select 'SALE100', date '2019-01-01', -5, 'OUT', null from dual
union all select 'SALE201', date '2019-01-09', -10, 'OUT', null from dual
union all select 'SALE203', date '2019-02-22', -1, 'OUT', null from dual
union all select 'SALE205', date '2019-03-14', -1, 'OUT', null from dual
),
makes (id, dt, value, needed_for) as (
select 'MAKE300', date '2018-12-24', 5, 'SALE100' from dual
union all select 'MAKE301', date '2019-01-03', 3, 'SALE201' from dual
union all select 'MAKE399', date '2019-01-05', 5, 'SALE201' from dual
union all select 'MAKE401', date '2019-01-07', 3, 'SALE201' from dual
union all select 'MAKE401', date '2019-01-07', 3, 'SALE203' from dual
union all select 'MAKE912', date '2019-02-01', 1, 'SALE205' from dual
)
-- actual query
select p.id, p.dt, p.value, p.type, m.id as satisfied_by
from purchases p
left join makes m on m.needed_for = p.id;
ID DT VALUE TYP SATISFIED_BY
------- ---------- ---------- --- ------------------------------
SALE100 2019-01-01 -5 OUT MAKE300
SALE201 2019-01-09 -10 OUT MAKE301
SALE201 2019-01-09 -10 OUT MAKE399
SALE201 2019-01-09 -10 OUT MAKE401
SALE203 2019-02-22 -1 OUT MAKE401
SALE205 2019-03-14 -1 OUT MAKE912
The listagg version is also fairly simple:
select p.id, p.dt, p.value, p.type,
listagg(m.id, ', ') within group (order by m.id) as satisfied_by
from purchases p
left join makes m on m.needed_for = p.id
group by p.id, p.dt, p.value, p.type;
ID DT VALUE TYP SATISFIED_BY
------- ---------- ---------- --- ------------------------------
SALE100 2019-01-01 -5 OUT MAKE300
SALE201 2019-01-09 -10 OUT MAKE301, MAKE399, MAKE401
SALE203 2019-02-22 -1 OUT MAKE401
SALE205 2019-03-14 -1 OUT MAKE912
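If you want to check the join shape outside Oracle, here is a small Python `sqlite3` sketch. SQLite has no LISTAGG, so the aggregation into a comma-separated list is done in Python; the lower-cased names are assumptions:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE purchases (id TEXT, dt TEXT, value INT, type TEXT);
    CREATE TABLE makes (id TEXT, dt TEXT, value INT, needed_for TEXT);
    INSERT INTO purchases VALUES
        ('SALE100', '2019-01-01', -5, 'OUT'),
        ('SALE201', '2019-01-09', -10, 'OUT'),
        ('SALE203', '2019-02-22', -1, 'OUT'),
        ('SALE205', '2019-03-14', -1, 'OUT');
    INSERT INTO makes VALUES
        ('MAKE300', '2018-12-24', 5, 'SALE100'),
        ('MAKE301', '2019-01-03', 3, 'SALE201'),
        ('MAKE399', '2019-01-05', 5, 'SALE201'),
        ('MAKE401', '2019-01-07', 3, 'SALE201'),
        ('MAKE401', '2019-01-07', 3, 'SALE203'),
        ('MAKE912', '2019-02-01', 1, 'SALE205');
""")

# One row per (sale, make) pair, like the 'ideal' output...
pairs = conn.execute("""
    SELECT p.id, m.id
      FROM purchases p
      LEFT JOIN makes m ON m.needed_for = p.id
     ORDER BY p.id, m.id
""").fetchall()

# ...then aggregate the make ids per sale, LISTAGG-style.
satisfied = {}
for sale_id, make_id in pairs:
    satisfied.setdefault(sale_id, []).append(make_id)
print(", ".join(satisfied["SALE201"]))  # MAKE301, MAKE399, MAKE401
```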
It isn't clear from your code fragment quite what you are doing wrong, but it looks like you are correlating your subqueries properly; but then you don't really need them. And if you are already correlating the listagg version's subquery properly, then you might just have too many matches in your real data; it's either that, or the subquery is returning more data than it should, and aggregating all of that breaks the size limit.
The "missing" part of the subquery is that I use a CASE WHEN TYPE = 'OUT' THEN, so nothing fancy, but that would limit the amount of records I have
You could maybe include that in the join condition:
from purchases p
left join makes m on (p.type = 'OUT' and m.needed_for = p.id)
You can still use a subquery for the listagg approach:
select p.id, p.dt, p.value, p.type,
(
select listagg(m.id, ', ') within group (order by m.id)
from makes m
where m.needed_for = p.id
-- and p.type = 'OUT'
) as satisfied_by
from purchases p;
which may in fact be what you are doing - it isn't entirely clear if that condition is equivalent to the purchased.id = needed_for.id you showed. If you still get ORA-01489 from that then you will from the non-subquery version too, and you just have too many matches to fit the aggregated list into 4000 bytes. And if they both work then I'm not sure what the advantage of having a subquery would be - at best the Oracle optimiser might make them equivalent, but it seems more likely that performance would be worse. You'd need to test with your real environment and data to be sure though.
The non-listagg subquery won't work, with or without the in() (which just adds another level of subquery with no real effect):
select p.id, p.dt, p.value, p.type,
(
select m.id
from makes m
where m.needed_for = p.id
-- and p.type = 'OUT'
) as satisfied_by
from purchases p;
ORA-01427: single-row subquery returns more than one row
... because you know and expect you will get multiple rows from that subquery, at least for some purchases. With your sample data this does actually work if you exclude SALE201, but that isn't helpful. You're trying to cram multiple values into a single scalar result, which won't work, and is why you needed to look at listagg in the first place.
As well as the xmlagg variant demonstrated by @Tejash, you could also get the combined values as a collection, e.g.:
select p.id, p.dt, p.value, p.type,
cast(multiset(
select m.id
from makes m
where m.needed_for = p.id
-- and p.type = 'OUT'
) as sys.odcivarchar2list) as satisfied_by
from purchases p;
ID DT VALUE TYP SATISFIED_BY
------- ---------- ---------- --- --------------------------------------------------
SALE100 2019-01-01 -5 OUT ODCIVARCHAR2LIST('MAKE300')
SALE201 2019-01-09 -10 OUT ODCIVARCHAR2LIST('MAKE301', 'MAKE399', 'MAKE401')
SALE203 2019-02-22 -1 OUT ODCIVARCHAR2LIST('MAKE401')
SALE205 2019-03-14 -1 OUT ODCIVARCHAR2LIST('MAKE912')
... or as a table-type collection if you have one defined in your schema. That may be even harder to work with, though, and even further from your 'ideal' output. It depends a bit on what will consume your result set, and how.
Your first query returned the following error:
ORA-01489: result of string concatenation is too long 01489. 00000 -
"result of string concatenation is too long"
because the concatenation in the "Satisfied_By" column becomes more than 4000 characters long.
To be safe when concatenating a VARCHAR column, you can use XMLAGG.
You can try the following query:
-- DATA PREPARATION
with purchases (id, dt, value, type, satisfied_by) as (
select 'SALE100', date '2019-01-01', -5, 'OUT', null from dual
union all select 'SALE201', date '2019-01-09', -10, 'OUT', null from dual
union all select 'SALE203', date '2019-02-22', -1, 'OUT', null from dual
union all select 'SALE205', date '2019-03-14', -1, 'OUT', null from dual
),
makes (id, dt, value, needed_for) as (
select 'MAKE300', date '2018-12-24', 5, 'SALE100' from dual
union all select 'MAKE301', date '2019-01-03', 3, 'SALE201' from dual
union all select 'MAKE399', date '2019-01-05', 5, 'SALE201' from dual
union all select 'MAKE401', date '2019-01-07', 3, 'SALE201' from dual
union all select 'MAKE401', date '2019-01-07', 3, 'SALE203' from dual
union all select 'MAKE912', date '2019-02-01', 1, 'SALE205' from dual
)
-- actual query
SELECT
P.ID,
P.DT,
P.VALUE,
P.TYPE,
RTRIM(XMLAGG(XMLELEMENT(E, M.ID, ',').EXTRACT('//text()')
ORDER BY
M.ID
).GETCLOBVAL(), ',') AS SATISFIED_BY
FROM
PURCHASES P
LEFT JOIN MAKES M ON P.ID = M.NEEDED_FOR
GROUP BY
P.ID,
P.DT,
P.VALUE,
P.TYPE;
DB Fiddle demo
Cheers!!
I need to select rows with start and end dates, and if some date ranges overlap, check whether the rest of the row is the same and then merge those rows using MIN(start_date) and MAX(end_date). I think I first need to group rows that overlap, and then I can do a GROUP BY on that.
Each row has an ID, start_date, end_date and some data. Some rows' date ranges overlap and some don't; I want to merge those that have the same ID and data and a date range that overlaps.
When I tried only the top two rows with the suggested answer, I got the three rows shown last in the question.
id valid_from valid_to
900101 06-MAY-13 02-FEB-14
900101 03-FEB-14 23-JUL-14
900102 01-JAN-10 01-DEC-10
900102 01-JAN-11 23-JAN-13
900102 01-AUG-11 23-JAN-15
900102 01-SEP-11 15-DEC-14
After a run it should be:
id valid_from valid_to
900101 06-MAY-13 02-FEB-14
900101 03-FEB-14 23-JUL-14
900102 01-JAN-10 01-DEC-10
900102 01-JAN-11 23-JAN-15
where the three bottom rows are merged.
With only the two top rows the suggested code returned this:
900101 06-MAY-13 02-FEB-14
900101 06-MAY-13 23-JUL-14
900101 03-FEB-14 23-JUL-14
If you are writing tables with start_date and end_date, you would probably benefit from reading Developing Time-Oriented Database Applications in SQL by Richard Snodgrass. People have studied questions like yours for 20+ years, and this is a great intro to the academic literature for working programmers. You can get a used copy on Amazon or read it for free online (in the "Books" section).
Your specific question is addressed in section 6.5. For instance given this table:
ssn | pcn | start_date | end_date
----------+--------+------------+-----------
111223333 | 120033 | 1996-01-01 | 1996-06-01
111223333 | 120033 | 1996-04-01 | 1996-10-01
111223333 | 120033 | 1996-04-01 | 1996-10-01
111223333 | 120033 | 1996-10-01 | 1998-01-01
111223333 | 120033 | 1997-12-01 | 1998-01-01
You can merge adjacent/overlapping time periods and remove duplicates with this SQL (slightly adapted from the book to use a CTE instead of a temp table):
WITH temp AS (
SELECT ssn, pcn, start_date, end_date
FROM incumbents
)
SELECT DISTINCT f.ssn, f.pcn, f.start_date, l.end_date
FROM temp AS f,
temp AS l
WHERE f.start_date < l.end_date
AND f.ssn = l.ssn
AND f.pcn = l.pcn
AND NOT EXISTS (SELECT 1
FROM temp AS m
WHERE m.ssn = f.ssn
AND m.pcn = f.pcn
AND f.end_date < m.start_date
AND m.start_date < l.start_date
AND NOT EXISTS (SELECT 1
FROM temp AS t1
WHERE t1.ssn = f.ssn
AND t1.pcn = f.pcn
AND t1.start_date < m.start_date
AND m.start_date <= t1.end_date))
AND NOT EXISTS (SELECT 1
FROM temp AS t2
WHERE t2.ssn = f.ssn
AND t2.pcn = f.pcn
AND ((t2.start_date < f.start_date
AND f.start_date <= t2.end_date)
OR (t2.start_date <= l.end_date
AND l.end_date < t2.end_date)))
That is in the Postgres dialect but I'm sure you can adapt it to Oracle (or any other database). Also, you should change ssn and pcn to whatever key you're using (possibly id, as long as the same id is allowed to appear in multiple records at different times).
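The intent of that SQL - merge overlapping periods per key into one row with the earliest start and latest end - may be easier to see procedurally. Here is a Python sketch (my own simplification, not the book's algorithm), using the 900102 rows from the question:

```python
from datetime import date

def merge_periods(rows):
    """Merge overlapping (key, start, end) periods.
    rows must be sorted by key, then start date."""
    merged = []
    for key, start, end in rows:
        if merged and merged[-1][0] == key and start <= merged[-1][2]:
            # Overlaps the previous period for this key: extend it.
            k, s, e = merged[-1]
            merged[-1] = (k, s, max(e, end))
        else:
            merged.append((key, start, end))
    return merged

rows = [
    (900102, date(2010, 1, 1), date(2010, 12, 1)),
    (900102, date(2011, 1, 1), date(2013, 1, 23)),
    (900102, date(2011, 8, 1), date(2015, 1, 23)),
    (900102, date(2011, 9, 1), date(2014, 12, 15)),
]
result = merge_periods(rows)
# [(900102, 2010-01-01, 2010-12-01), (900102, 2011-01-01, 2015-01-23)]
```

This matches the desired output in the question: the last three 900102 rows collapse into 01-JAN-11 .. 23-JAN-15, while the first stays separate because it doesn't overlap.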
This will work in Oracle using hierarchical queries, and it queries the original data only twice:
WITH d AS
(
--
SELECT DATE '2016-01-01' effective_start_date, DATE '2016-02-01' - 1 effective_end_date, 1 contract_id
FROM dual
UNION ALL --
SELECT DATE '2016-02-01', DATE '2016-04-01' - 1, 1
FROM dual
UNION ALL --
SELECT DATE '2016-04-01', DATE '2016-04-30', 1
FROM dual
UNION ALL --
SELECT DATE '2016-06-01', DATE '2016-07-01' - 1, 1
FROM dual
UNION ALL -- gap
SELECT DATE '2016-07-01' + 1, DATE '2016-07-31', 1
FROM dual
UNION ALL --
-- other contract
SELECT DATE '2016-02-01', DATE '2016-03-01' - 1, 3
FROM dual
UNION ALL --
SELECT DATE '2016-03-01', DATE '2016-03-31', 3
FROM dual
--
),
q1 AS
(
-- walk the chain backwards and get the "root" start
SELECT d.*, connect_by_root effective_start_date contract_start, LEVEL
FROM d
CONNECT BY PRIOR contract_id = contract_id
AND PRIOR effective_end_date + 1 = effective_start_date),
q2 AS
(
-- walk the chain forward and get the "root" end
SELECT d.*, connect_by_root effective_end_date contract_end, LEVEL
FROM d
CONNECT BY PRIOR contract_id = contract_id
AND PRIOR effective_start_date = effective_end_date + 1)
-- join the forward and backward data to get the contiguous contract start and end
SELECT DISTINCT MIN(a.contract_start) contract_start, MAX(b.contract_end) contract_end, a.contract_id
FROM q1 a
JOIN q2 b
ON a.contract_id = b.contract_id
AND a.effective_start_date = b.effective_start_date
GROUP BY a.effective_start_date, a.effective_end_date, a.contract_id
and it gives the desired result
+-----+----------------+--------------+-------------+
| | CONTRACT_START | CONTRACT_END | CONTRACT_ID |
+-----+----------------+--------------+-------------+
| 1 | 2016-01-01 | 2016-04-30 | 1 |
| 2 | 2016-06-01 | 2016-06-30 | 1 |
| 3 | 2016-07-02 | 2016-07-31 | 1 |
| 4 | 2016-02-01 | 2016-03-31 | 3 |
+-----+----------------+--------------+-------------+
I have a table of player performance:
CREATE TABLE TopTen (
id INT UNSIGNED PRIMARY KEY AUTO_INCREMENT,
home INT UNSIGNED NOT NULL,
`datetime`DATETIME NOT NULL,
player VARCHAR(6) NOT NULL,
resource INT NOT NULL
);
What query will return the rows for each distinct home holding its maximum value of datetime? In other words, how can I filter by the maximum datetime (grouped by home) and still include other non-grouped, non-aggregate columns (such as player) in the result?
For this sample data:
INSERT INTO TopTen
(id, home, `datetime`, player, resource)
VALUES
(1, 10, '04/03/2009', 'john', 399),
(2, 11, '04/03/2009', 'juliet', 244),
(5, 12, '04/03/2009', 'borat', 555),
(3, 10, '03/03/2009', 'john', 300),
(4, 11, '03/03/2009', 'juliet', 200),
(6, 12, '03/03/2009', 'borat', 500),
(7, 13, '24/12/2008', 'borat', 600),
(8, 13, '01/01/2009', 'borat', 700)
;
the result should be:
| id | home | datetime   | player | resource |
|:--:|:----:|:----------:|:------:|:--------:|
| 1  | 10   | 04/03/2009 | john   | 399      |
| 2  | 11   | 04/03/2009 | juliet | 244      |
| 5  | 12   | 04/03/2009 | borat  | 555      |
| 8  | 13   | 01/01/2009 | borat  | 700      |
I tried a subquery getting the maximum datetime for each home:
-- 1 ..by the MySQL manual:
SELECT DISTINCT
home,
id,
datetime AS dt,
player,
resource
FROM TopTen t1
WHERE `datetime` = (SELECT
MAX(t2.datetime)
FROM TopTen t2
GROUP BY home)
GROUP BY `datetime`
ORDER BY `datetime` DESC
The result-set has 130 rows although database holds 187, indicating the result includes some duplicates of home.
Then I tried joining to a subquery that gets the maximum datetime for each row id:
-- 2 ..join
SELECT
s1.id,
s1.home,
s1.datetime,
s1.player,
s1.resource
FROM TopTen s1
JOIN (SELECT
id,
MAX(`datetime`) AS dt
FROM TopTen
GROUP BY id) AS s2
ON s1.id = s2.id
ORDER BY `datetime`
Nope. Gives all the records.
I tried various exotic queries, each with various results, but nothing that got me any closer to solving this problem.
You are so close! All you need to do is select BOTH the home and its max date time, then join back to the topten table on BOTH fields:
SELECT tt.*
FROM topten tt
INNER JOIN
(SELECT home, MAX(datetime) AS MaxDateTime
FROM topten
GROUP BY home) groupedtt
ON tt.home = groupedtt.home
AND tt.datetime = groupedtt.MaxDateTime
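A quick way to check this query is Python's `sqlite3`. I've used ISO dates here (an assumption) so MAX() compares them correctly; the dd/mm strings from the question would not sort as text:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE topten (id INT, home INT, datetime TEXT,
                         player TEXT, resource INT);
    INSERT INTO topten VALUES
        (1, 10, '2009-03-04', 'john', 399),
        (2, 11, '2009-03-04', 'juliet', 244),
        (5, 12, '2009-03-04', 'borat', 555),
        (3, 10, '2009-03-03', 'john', 300),
        (4, 11, '2009-03-03', 'juliet', 200),
        (6, 12, '2009-03-03', 'borat', 500),
        (7, 13, '2008-12-24', 'borat', 600),
        (8, 13, '2009-01-01', 'borat', 700);
""")

# Join back to the grouped max-per-home on BOTH columns.
rows = conn.execute("""
    SELECT tt.id, tt.home, tt.player, tt.resource
      FROM topten tt
      INNER JOIN (SELECT home, MAX(datetime) AS maxdt
                    FROM topten
                   GROUP BY home) groupedtt
        ON tt.home = groupedtt.home
       AND tt.datetime = groupedtt.maxdt
     ORDER BY tt.home
""").fetchall()
print(rows)
```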
The fastest MySQL solution, without inner queries and without GROUP BY:
SELECT m.* -- get the row that contains the max value
FROM topten m -- "m" from "max"
LEFT JOIN topten b -- "b" from "bigger"
ON m.home = b.home -- match "max" row with "bigger" row by `home`
AND m.datetime < b.datetime -- want "bigger" than "max"
WHERE b.datetime IS NULL -- keep only if there is no bigger than max
Explanation:
Join the table with itself using the home column. The use of LEFT JOIN ensures all the rows from table m appear in the result set. Those that don't have a match in table b will have NULLs for the columns of b.
The other condition on the JOIN asks to match only the rows from b that have bigger value on the datetime column than the row from m.
Using the data posted in the question, the LEFT JOIN will produce these pairs:
+------------------------------------------+--------------------------------+
| the row from `m` | the matching row from `b` |
|------------------------------------------|--------------------------------|
| id home datetime player resource | id home datetime ... |
|----|-----|------------|--------|---------|------|------|------------|-----|
| 1 | 10 | 04/03/2009 | john | 399 | NULL | NULL | NULL | ... | *
| 2 | 11 | 04/03/2009 | juliet | 244 | NULL | NULL | NULL | ... | *
| 5 | 12 | 04/03/2009 | borat | 555 | NULL | NULL | NULL | ... | *
| 3 | 10 | 03/03/2009 | john | 300 | 1 | 10 | 04/03/2009 | ... |
| 4 | 11 | 03/03/2009 | juliet | 200 | 2 | 11 | 04/03/2009 | ... |
| 6 | 12 | 03/03/2009 | borat | 500 | 5 | 12 | 04/03/2009 | ... |
| 7 | 13 | 24/12/2008 | borat | 600 | 8 | 13 | 01/01/2009 | ... |
| 8 | 13 | 01/01/2009 | borat | 700 | NULL | NULL | NULL | ... | *
+------------------------------------------+--------------------------------+
Finally, the WHERE clause keeps only the pairs that have NULLs in the columns of b (they are marked with * in the table above); this means, due to the second condition from the JOIN clause, the row selected from m has the biggest value in column datetime.
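Here is a runnable sketch of that null-filtered self-join using Python's `sqlite3`. ISO date strings are an assumption, so that `<` compares chronologically:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE topten (id INT, home INT, datetime TEXT,
                         player TEXT, resource INT);
    INSERT INTO topten VALUES
        (1, 10, '2009-03-04', 'john', 399),
        (2, 11, '2009-03-04', 'juliet', 244),
        (5, 12, '2009-03-04', 'borat', 555),
        (3, 10, '2009-03-03', 'john', 300),
        (4, 11, '2009-03-03', 'juliet', 200),
        (6, 12, '2009-03-03', 'borat', 500),
        (7, 13, '2008-12-24', 'borat', 600),
        (8, 13, '2009-01-01', 'borat', 700);
""")

rows = conn.execute("""
    SELECT m.id, m.home, m.player, m.resource
      FROM topten m
      LEFT JOIN topten b
        ON m.home = b.home
       AND m.datetime < b.datetime   -- pair each row with any "bigger" row
     WHERE b.datetime IS NULL        -- keep rows with no bigger match: the max
     ORDER BY m.home
""").fetchall()
print(rows)
```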
Read the SQL Antipatterns: Avoiding the Pitfalls of Database Programming book for other SQL tips.
Here is the T-SQL version:
-- Test data
DECLARE @TestTable TABLE (id INT, home INT, date DATETIME,
player VARCHAR(20), resource INT)
INSERT INTO @TestTable
SELECT 1, 10, '2009-03-04', 'john', 399 UNION
SELECT 2, 11, '2009-03-04', 'juliet', 244 UNION
SELECT 5, 12, '2009-03-04', 'borat', 555 UNION
SELECT 3, 10, '2009-03-03', 'john', 300 UNION
SELECT 4, 11, '2009-03-03', 'juliet', 200 UNION
SELECT 6, 12, '2009-03-03', 'borat', 500 UNION
SELECT 7, 13, '2008-12-24', 'borat', 600 UNION
SELECT 8, 13, '2009-01-01', 'borat', 700
-- Answer
SELECT id, home, date, player, resource
FROM (SELECT id, home, date, player, resource,
RANK() OVER (PARTITION BY home ORDER BY date DESC) N
FROM @TestTable
) M WHERE N = 1
-- and if you really want only home with max date
SELECT T.id, T.home, T.date, T.player, T.resource
FROM @TestTable T
INNER JOIN
( SELECT TI.id, TI.home, TI.date,
RANK() OVER (PARTITION BY TI.home ORDER BY TI.date) N
FROM @TestTable TI
WHERE TI.date IN (SELECT MAX(TM.date) FROM @TestTable TM)
)TJ ON TJ.N = 1 AND T.id = TJ.id
EDIT
Unfortunately, there is no RANK() OVER function in MySQL.
But it can be emulated, see Emulating Analytic (AKA Ranking) Functions with MySQL.
So this is MySQL version:
SELECT id, home, date, player, resource
FROM TestTable AS t1
WHERE
(SELECT COUNT(*)
FROM TestTable AS t2
WHERE t2.home = t1.home AND t2.date > t1.date
) = 0
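That COUNT(\*) = 0 emulation is easy to verify with Python's `sqlite3` (I've renamed the table back to topten to match the question, and assumed ISO date strings):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE topten (id INT, home INT, datetime TEXT,
                         player TEXT, resource INT);
    INSERT INTO topten VALUES
        (1, 10, '2009-03-04', 'john', 399),
        (2, 11, '2009-03-04', 'juliet', 244),
        (5, 12, '2009-03-04', 'borat', 555),
        (3, 10, '2009-03-03', 'john', 300),
        (4, 11, '2009-03-03', 'juliet', 200),
        (6, 12, '2009-03-03', 'borat', 500),
        (7, 13, '2008-12-24', 'borat', 600),
        (8, 13, '2009-01-01', 'borat', 700);
""")

# A row wins if no other row in its home has a later datetime.
rows = conn.execute("""
    SELECT id, home, player, resource
      FROM topten t1
     WHERE (SELECT COUNT(*)
              FROM topten t2
             WHERE t2.home = t1.home
               AND t2.datetime > t1.datetime) = 0
     ORDER BY home
""").fetchall()
print(rows)
```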
This will work even if you have two or more rows for each home with equal DATETIME's:
SELECT id, home, datetime, player, resource
FROM (
SELECT (
SELECT id
FROM topten ti
WHERE ti.home = t1.home
ORDER BY
ti.datetime DESC
LIMIT 1
) lid
FROM (
SELECT DISTINCT home
FROM topten
) t1
) ro, topten t2
WHERE t2.id = ro.lid
I think this will give you the desired result:
SELECT home, MAX(datetime)
FROM my_table
GROUP BY home
BUT if you need other columns as well, just join with the original table (check Michael La Voie's answer).
Best regards.
Since people seem to keep running into this thread (the comment dates span 1.5 years), isn't this much simpler:
SELECT * FROM (SELECT * FROM topten ORDER BY datetime DESC) tmp GROUP BY home
No aggregation functions needed...
Cheers.
You can also try this one; for large tables, query performance will be better. It works when there are no more than two records for each home and their dates are different. A better general MySQL query is the one from Michael La Voie above.
SELECT t1.id, t1.home, t1.date, t1.player, t1.resource
FROM t_scores_1 t1
INNER JOIN t_scores_1 t2
ON t1.home = t2.home
WHERE t1.date > t2.date
Or in case of Postgres or those dbs that provide analytic functions try
SELECT t.* FROM
(SELECT t1.id, t1.home, t1.date, t1.player, t1.resource
, row_number() over (partition by t1.home order by t1.date desc) rw
FROM topten t1
INNER JOIN topten t2
ON t1.home = t2.home
WHERE t1.date > t2.date
) t
WHERE t.rw = 1
SELECT tt.*
FROM TestTable tt
INNER JOIN
(
SELECT coord, MAX(datetime) AS MaxDateTime
FROM rapsa
GROUP BY
coord
) groupedtt
ON tt.coord = groupedtt.coord
AND tt.datetime = groupedtt.MaxDateTime
This works on Oracle:
with table_max as(
select id
, home
, datetime
, player
, resource
, max(datetime) over (partition by home) maxdatetime
from topten
)
select id
, home
, datetime
, player
, resource
from table_max
where datetime = maxdatetime
Try this for SQL Server:
WITH cte AS (
SELECT home, MAX(year) AS year FROM Table1 GROUP BY home
)
SELECT * FROM Table1 a INNER JOIN cte ON a.home = cte.home AND a.year = cte.year
Here is a MySQL version that prints only one entry per group, even when there are duplicate MAX(datetime) values in a group.
You could test here http://www.sqlfiddle.com/#!2/0a4ae/1
Sample Data
mysql> SELECT * from topten;
+------+------+---------------------+--------+----------+
| id | home | datetime | player | resource |
+------+------+---------------------+--------+----------+
| 1 | 10 | 2009-04-03 00:00:00 | john | 399 |
| 2 | 11 | 2009-04-03 00:00:00 | juliet | 244 |
| 3 | 10 | 2009-03-03 00:00:00 | john | 300 |
| 4 | 11 | 2009-03-03 00:00:00 | juliet | 200 |
| 5 | 12 | 2009-04-03 00:00:00 | borat | 555 |
| 6 | 12 | 2009-03-03 00:00:00 | borat | 500 |
| 7 | 13 | 2008-12-24 00:00:00 | borat | 600 |
| 8 | 13 | 2009-01-01 00:00:00 | borat | 700 |
| 9 | 10 | 2009-04-03 00:00:00 | borat | 700 |
| 10 | 11 | 2009-04-03 00:00:00 | borat | 700 |
| 12 | 12 | 2009-04-03 00:00:00 | borat | 700 |
+------+------+---------------------+--------+----------+
MySQL Version with User variable
SELECT *
FROM (
SELECT ord.*,
IF (@prev_home = ord.home, 0, 1) AS is_first_appear,
@prev_home := ord.home
FROM (
SELECT t1.id, t1.home, t1.player, t1.resource
FROM topten t1
INNER JOIN (
SELECT home, MAX(datetime) AS mx_dt
FROM topten
GROUP BY home
) x ON t1.home = x.home AND t1.datetime = x.mx_dt
ORDER BY home
) ord, (SELECT @prev_home := 0, @seq := 0) init
) y
WHERE is_first_appear = 1;
+------+------+--------+----------+-----------------+------------------------+
| id | home | player | resource | is_first_appear | @prev_home := ord.home |
+------+------+--------+----------+-----------------+------------------------+
| 9 | 10 | borat | 700 | 1 | 10 |
| 10 | 11 | borat | 700 | 1 | 11 |
| 12 | 12 | borat | 700 | 1 | 12 |
| 8 | 13 | borat | 700 | 1 | 13 |
+------+------+--------+----------+-----------------+------------------------+
4 rows in set (0.00 sec)
Accepted answer's output
SELECT tt.*
FROM topten tt
INNER JOIN
(
SELECT home, MAX(datetime) AS MaxDateTime
FROM topten
GROUP BY home
) groupedtt ON tt.home = groupedtt.home AND tt.datetime = groupedtt.MaxDateTime
+------+------+---------------------+--------+----------+
| id | home | datetime | player | resource |
+------+------+---------------------+--------+----------+
| 1 | 10 | 2009-04-03 00:00:00 | john | 399 |
| 2 | 11 | 2009-04-03 00:00:00 | juliet | 244 |
| 5 | 12 | 2009-04-03 00:00:00 | borat | 555 |
| 8 | 13 | 2009-01-01 00:00:00 | borat | 700 |
| 9 | 10 | 2009-04-03 00:00:00 | borat | 700 |
| 10 | 11 | 2009-04-03 00:00:00 | borat | 700 |
| 12 | 12 | 2009-04-03 00:00:00 | borat | 700 |
+------+------+---------------------+--------+----------+
7 rows in set (0.00 sec)
SELECT c1, c2, c3, c4, c5 FROM table1 WHERE c3 = (select max(c3) from table1)
SELECT * FROM table1 WHERE c3 = (select max(c3) from table1)
Another way to get the most recent row per group is a subquery that basically calculates a rank for each row per group; you then filter to your most recent rows, those with rank = 1:
select a.*
from topten a
where (
select count(*)
from topten b
where a.home = b.home
and a.`datetime` < b.`datetime`
) +1 = 1
Reading some comments: what about if there are two rows with the same 'home' and 'datetime' field values?
The query above will fail there and return more than one row for that situation. To cover it, another criterion/parameter/column is needed to decide which row should be taken. Looking at the sample data set, I assume there is a primary key column id set to auto-increment, so we can use it to pick the most recent row, tweaking the same query with the help of a CASE expression:
select a.*
from topten a
where (
select count(*)
from topten b
where a.home = b.home
and case
when a.`datetime` = b.`datetime`
then a.id < b.id
else a.`datetime` < b.`datetime`
end
) + 1 = 1
The query above will pick the row with the highest id among rows with the same datetime value.
Why not use:
SELECT home, MAX(datetime) AS MaxDateTime,player,resource FROM topten GROUP BY home
Did I miss something?
In MySQL 8.0 this can be achieved efficiently by using row_number() window function with common table expression.
(Here row_number() basically generates a unique sequence number for the rows of each home, starting with 1, in descending order of date. So, for every home, the row with sequence number 1 is the one with the latest date. Now all we need to do is select the row with sequence number 1 for each home. That could be done with an outer query around this query, but we used a common table expression instead, since it's more readable.)
Schema:
create TABLE TestTable(id INT, home INT, date DATETIME,
player VARCHAR(20), resource INT);
INSERT INTO TestTable
SELECT 1, 10, '2009-03-04', 'john', 399 UNION
SELECT 2, 11, '2009-03-04', 'juliet', 244 UNION
SELECT 5, 12, '2009-03-04', 'borat', 555 UNION
SELECT 3, 10, '2009-03-03', 'john', 300 UNION
SELECT 4, 11, '2009-03-03', 'juliet', 200 UNION
SELECT 6, 12, '2009-03-03', 'borat', 500 UNION
SELECT 7, 13, '2008-12-24', 'borat', 600 UNION
SELECT 8, 13, '2009-01-01', 'borat', 700
Query:
with cte as
(
select id, home, date , player, resource,
Row_Number()Over(Partition by home order by date desc) rownumber from TestTable
)
select id, home, date , player, resource from cte where rownumber=1
Output:
| id | home | date                | player | resource |
|:--:|:----:|:-------------------:|:------:|:--------:|
| 1  | 10   | 2009-03-04 00:00:00 | john   | 399      |
| 2  | 11   | 2009-03-04 00:00:00 | juliet | 244      |
| 5  | 12   | 2009-03-04 00:00:00 | borat  | 555      |
| 8  | 13   | 2009-01-01 00:00:00 | borat  | 700      |
db<>fiddle here
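The same CTE-plus-row_number() shape runs on any engine with window functions. Here is a sketch using Python's `sqlite3` (this needs SQLite 3.25 or later for window functions, which CPython's bundled build usually satisfies):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE TestTable (id INT, home INT, date TEXT,
                            player TEXT, resource INT);
    INSERT INTO TestTable VALUES
        (1, 10, '2009-03-04', 'john', 399),
        (2, 11, '2009-03-04', 'juliet', 244),
        (5, 12, '2009-03-04', 'borat', 555),
        (3, 10, '2009-03-03', 'john', 300),
        (4, 11, '2009-03-03', 'juliet', 200),
        (6, 12, '2009-03-03', 'borat', 500),
        (7, 13, '2008-12-24', 'borat', 600),
        (8, 13, '2009-01-01', 'borat', 700);
""")

# Rank rows within each home by date descending, then keep rank 1.
rows = conn.execute("""
    WITH cte AS (
        SELECT id, home, date, player, resource,
               ROW_NUMBER() OVER (PARTITION BY home
                                  ORDER BY date DESC) AS rownumber
          FROM TestTable
    )
    SELECT id, home, player, resource
      FROM cte
     WHERE rownumber = 1
     ORDER BY home
""").fetchall()
print(rows)
```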
This works in SQL Server, and it is the only solution I've seen that doesn't require subqueries or CTEs - I think this is the most elegant way to solve this kind of problem.
SELECT TOP 1 WITH TIES *
FROM TopTen
ORDER BY ROW_NUMBER() OVER (PARTITION BY home
ORDER BY [datetime] DESC)
In the ORDER BY clause, it uses a window function to generate and sort by a ROW_NUMBER, assigning a value of 1 to the highest [datetime] for each [home].
SELECT TOP 1 WITH TIES will then select one record with the lowest ROW_NUMBER (which will be 1), as well as all records with a tying ROW_NUMBER (also 1)
As a consequence, you retrieve all data for each of the 1st ranked records - that is, all data for records with the highest [datetime] value with their given [home] value.
Try this
select * from mytable a join
(select home, max(datetime) datetime
from mytable
group by home) b
on a.home = b.home and a.datetime = b.datetime
Regards
K
@Michael The accepted answer works fine in most cases, but it fails in one, as below.
If there are 2 rows with the same home and datetime, the query will return both rows, not the distinct home as required; for that, add DISTINCT to the query as below.
SELECT DISTINCT tt.home, groupedtt.MaxDateTime
FROM topten tt
INNER JOIN
(SELECT home, MAX(datetime) AS MaxDateTime
FROM topten
GROUP BY home) groupedtt
ON tt.home = groupedtt.home
AND tt.datetime = groupedtt.MaxDateTime
This is the query you need:
SELECT b.id, a.home,b.[datetime],b.player,a.resource FROM
(SELECT home,MAX(resource) AS resource FROM tbl_1 GROUP BY home) AS a
LEFT JOIN
(SELECT id,home,[datetime],player,resource FROM tbl_1) AS b
ON a.resource = b.resource WHERE a.home =b.home;
Hope the query below gives the desired output:
SELECT id, home, datetime, player, resource
FROM (SELECT id, home, datetime, player, resource,
             ROW_NUMBER() OVER (PARTITION BY home ORDER BY datetime DESC) AS rownum
      FROM tablename) t
WHERE rownum = 1
(Note: Michael's answer is perfect for a situation where the target column datetime cannot have duplicate values for each distinct home.)
If your table has duplicate rows for home x datetime and you need to select only one row for each distinct home, here is my solution:
Your table needs one unique column (like id). If it doesn't have one, create a view and add a random column to it.
Use this query to select a single row for each unique home value. It selects the lowest id in case of duplicate datetime values.
SELECT tt.*
FROM topten tt
INNER JOIN
(
SELECT min(tt2.id) as min_id, tt2.home
FROM topten tt2
INNER JOIN
(
SELECT home, MAX(datetime) AS MaxDateTime
FROM topten
GROUP BY home) groupedtt2
ON tt2.home = groupedtt2.home
AND tt2.datetime = groupedtt2.MaxDateTime
GROUP BY tt2.home
) as groupedtt
ON tt.id = groupedtt.min_id
The accepted answer doesn't work for me if there are 2 records with the same date and home: it will return 2 records after the join, while I need to select just one (any) of them. This query is used as a joined subquery, so a simple LIMIT 1 is not possible there.
Here is how I reached the desired result. I don't know about performance, however.
select SUBSTRING_INDEX(GROUP_CONCAT(id order by datetime desc separator ','),',',1) as id, home, MAX(datetime) as 'datetime'
from topten
group by (home)
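GROUP_CONCAT's ordering among rows with an equal datetime is arbitrary, so here is the same pick-one-row-per-home idea sketched in plain Python, with a deterministic tie-break to the lowest id (the tie-break rule is my assumption, chosen to make the result reproducible):

```python
# (id, home, datetime) rows, including a duplicate datetime for home 10.
rows = [
    (1, 10, "2009-04-03"), (9, 10, "2009-04-03"),  # same home, same datetime
    (3, 10, "2009-03-03"),
    (7, 13, "2008-12-24"), (8, 13, "2009-01-01"),
]

best = {}  # home -> (id, datetime) of the chosen row
for rid, home, dt in rows:
    cur = best.get(home)
    # Keep the later datetime; on a tie, keep the lower id.
    if cur is None or dt > cur[1] or (dt == cur[1] and rid < cur[0]):
        best[home] = (rid, dt)
print(best)  # {10: (1, '2009-04-03'), 13: (8, '2009-01-01')}
```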