Self join next timestamp - sql

I am trying to pair timestamps from two different rows of the same table, matched on employee and punch card, but MAX or a row limit cannot be used inside the join condition, and if I only use > I get every subsequent row for every day. I want only the next higher value in a self join. I also have to use SQL Server 2008, so LAG and LEAD are not available.
Please help.
SELECT Det.name
,Det.[time]
,Det2.[time]
,Det.[type]
,det2.type
,Det.[detail]
FROM [detail] Det
join [detail] Det2 on
Det2.name = Det.name
and
Det2.time > Det.time Max 1
where det.type <>3
Table detail
NAME | Time | Type | detail
john | 10:30| 1 | On
steve| 10:32| 1 | On
john | 10:34| 2 | break
paul | 10:35| 1 | On
steve| 10:45| 3 | Off
john | 10:49| 2 | on
paul | 10:55| 3 | Off
john | 11:12| 3 | Off
Wanted result
John | 10:30 | 10:34 | 1 | 2 | On
John | 10:34 | 10:49 | 2 | 1 | Break
John | 10:49 | 11:12 | 1 | 3 | on
Steve| 10:32 | 10:45 | 1 | 3 | on
Paul | 10:35 | 10:55 | 1 | 3 | On
Thank you in advance!

You can do it with cross apply:
SELECT Det.name
,Det.[time]
,ca.[time]
,Det.[type]
,ca.type
,Det.[detail]
FROM [detail] Det
Cross Apply(Select Top 1 * From detail det2 where det.Name = det2.Name and det2.Time > det.Time Order By det2.Time) ca
Where det.Type <> 3
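To check the "next higher timestamp per name" logic on the sample data, here is a minimal sketch using Python's sqlite3 module. SQLite has no CROSS APPLY, so a correlated MIN() subquery stands in for it; storing the times as zero-padded HH:MM text is an assumption made only for this illustration.

```python
import sqlite3

# Sample rows from the question; times kept as zero-padded text so they sort correctly.
rows = [
    ("john", "10:30", 1, "On"),
    ("steve", "10:32", 1, "On"),
    ("john", "10:34", 2, "break"),
    ("paul", "10:35", 1, "On"),
    ("steve", "10:45", 3, "Off"),
    ("john", "10:49", 2, "on"),
    ("paul", "10:55", 3, "Off"),
    ("john", "11:12", 3, "Off"),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE detail (name TEXT, time TEXT, type INTEGER, detail TEXT)")
conn.executemany("INSERT INTO detail VALUES (?, ?, ?, ?)", rows)

# For every punch that is not an "Off" row, look up the smallest later time
# recorded for the same name.
next_punch = conn.execute(
    """
    SELECT d.name, d.time,
           (SELECT MIN(d2.time)
              FROM detail d2
             WHERE d2.name = d.name AND d2.time > d.time) AS next_time,
           d.type, d.detail
    FROM detail d
    WHERE d.type <> 3
    ORDER BY d.name, d.time
    """
).fetchall()

for row in next_punch:
    print(row)
```

The important part is the `d2.time > d.time` predicate: without it, the lookup would always return the earliest row for that name.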

As you said, LAG and LEAD won't work for you, but you could use ROW_NUMBER() OVER (PARTITION BY name ORDER BY time) on both sides of a self join and then join on t2.rn = t.rn + 1.
This is just an idea, but I don't see why it shouldn't work.
Query:
;WITH Data (NAME, TIME, type, detail)
AS (
SELECT 'john', CAST('10:30' AS DATETIME2), 1, 'On'
UNION ALL
SELECT 'steve', '10:32', 1, 'On'
UNION ALL
SELECT 'john', '10:34', 2, 'break'
UNION ALL
SELECT 'paul', '10:35', 1, 'On'
UNION ALL
SELECT 'steve', '10:45', 3, 'Off'
UNION ALL
SELECT 'john', '10:49', 2, 'on'
UNION ALL
SELECT 'paul', '10:55', 3, 'Off'
UNION ALL
SELECT 'john', '11:12', 3, 'Off'
)
SELECT t.NAME, LTRIM(RIGHT(CONVERT(VARCHAR(25), t.TIME, 100), 7)) AS time, LTRIM(RIGHT(CONVERT(VARCHAR(25), t2.TIME, 100), 7)) AS time, t.type, t2.type, t.detail
FROM (
SELECT ROW_NUMBER() OVER (PARTITION BY NAME ORDER BY TIME) rn, *
FROM Data
) AS t
INNER JOIN (
SELECT ROW_NUMBER() OVER (PARTITION BY NAME ORDER BY TIME) rn, *
FROM Data
) AS t2
ON t2.NAME = t.NAME
AND t2.rn = t.rn + 1;
Result:
NAME time time type type detail
----------------------------------------------
john 10:30AM 10:34AM 1 2 On
john 10:34AM 10:49AM 2 2 break
john 10:49AM 11:12AM 2 3 on
paul 10:35AM 10:55AM 1 3 On
steve 10:32AM 10:45AM 1 3 On
Any comments, concerns - let me know. :)
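The same row-number pairing can be tried outside SQL Server. The sketch below runs it through Python's sqlite3 module; it assumes an SQLite build with window functions (3.25 or newer), and the HH:MM text times are again just an illustration choice.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE detail (name TEXT, time TEXT, type INTEGER, detail TEXT)")
conn.executemany("INSERT INTO detail VALUES (?, ?, ?, ?)", [
    ("john", "10:30", 1, "On"),
    ("steve", "10:32", 1, "On"),
    ("john", "10:34", 2, "break"),
    ("paul", "10:35", 1, "On"),
    ("steve", "10:45", 3, "Off"),
    ("john", "10:49", 2, "on"),
    ("paul", "10:55", 3, "Off"),
    ("john", "11:12", 3, "Off"),
])

# Number each person's punches by time, then join row n to row n + 1.
paired = conn.execute(
    """
    WITH numbered AS (
        SELECT ROW_NUMBER() OVER (PARTITION BY name ORDER BY time) AS rn, *
        FROM detail
    )
    SELECT t.name, t.time, t2.time, t.type, t2.type, t.detail
    FROM numbered t
    JOIN numbered t2 ON t2.name = t.name AND t2.rn = t.rn + 1
    ORDER BY t.name, t.time
    """
).fetchall()

for row in paired:
    print(row)
```

Because the join is an inner join, the last punch of each person (which has no successor) drops out without needing a filter on type.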

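As @evaldas-buinauskas said,
ROW_NUMBER() with OVER will work for you on SQL Server 2008. LAG and LEAD are only available from SQL Server 2012 onward.
Here is an example of LAG and LEAD for when you can upgrade: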
http://www.databasejournal.com/features/mssql/lead-and-lag-functions-in-sql-server-2012.html

Related

SQL query to find the visitor together with the date time

My visitor log table has id, visitor, department,vtime fields.
id | visitor | Visittime | Department_id
--------------------------------------------------------------
1 1 2019-05-07 13:53:50 1
2 2 2019-05-07 13:56:54 1
3 1 2019-05-07 14:54:10 3
4 2 2019-05-08 13:54:49 1
5 1 2019-05-08 13:58:15 1
6 2 2019-05-08 18:54:30 2
7 1 2019-05-08 18:54:37 2
I already have the following index:
CREATE INDEX Idx_VisitorLog_Visitor_VisitTime_Includes ON VisitorLog
(Visitor, VisitTime) INCLUDE (DepartmentId, ID)
Four filters are passed from the user interface: visitor 1, visitor 2, and the start and end of the visiting period.
I need the rows where visitor 1 and visitor 2 were in the same department with visit times within 5 minutes of each other.
The output should be:
id | visitor | Visittime | Department_id
--------------------------------------------------------------
1 1 2019-05-07 13:53:50 1
2 2 2019-05-07 13:56:54 1
4 2 2019-05-08 13:54:49 1
5 1 2019-05-08 13:58:15 1
For that I had used the following query,
;with CTE1 AS(
Select id,visitor,Visittime,department_id from visitorlog where visitor=1
)
,CTE2 AS(
Select id,visitor,Visittime,department_id from visitorlog where visitor=2
)
select * from CTE2 V2
Inner join CTE1 V1 on V2.department_id=V1.department_id and DATEDIFF(minute,V2.Visittime,V1.Visittime) between -5 and 5
The above query takes too long to respond, because the table holds almost 20 million records.
Could anyone suggest a better approach for my requirement?
Thanks in advance.
This is a completely revised answer, based upon your additional information above.
After reviewing the data file above and the results you desire, this seems like the cleanest way to provide your results. First, we need a different index:
create index idx_POC_visitorlog on visitorlog
(visitor, Department_id, Visittime) include(id);
With this index, we can limit the queries to only the two passed in IDs. To simulate that, I created variables to hold their values. This query returns the data you are looking for.
DECLARE @Visitor1 int = 1,
@Visitor2 int = 2
;with t as (
select Department_id,
dateadd(minute, -5, visittime) as EarlyTime,
dateadd(minute, 5, Visittime) as LateTime,
id
from visitorlog
where visitor = @Visitor1
),
v as (
select v.id,
t.id as tid
from visitorlog v
INNER JOIN t
ON v.visitor = @Visitor2
AND v.Department_id = t.Department_id
and v.Visittime BETWEEN t.EarlyTime and t.LateTime
)
SELECT *
FROM visitorlog vl
WHERE ID IN (
SELECT v.id
FROM v
UNION
SELECT v.tid
FROM v
)
ORDER BY visittime;
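The range-join idea above can be checked on the question's seven sample rows. This is only a sketch with Python's sqlite3 module: SQLite's datetime() modifiers stand in for DATEADD, and the index name idx_poc is made up for the example. Comparing the raw visittime column against a computed window, rather than wrapping it in DATEDIFF, is what gives an index on (visitor, department_id, visittime) a chance to be used.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE visitorlog (id INTEGER PRIMARY KEY, visitor INTEGER, "
    "visittime TEXT, department_id INTEGER)"
)
conn.executemany("INSERT INTO visitorlog VALUES (?, ?, ?, ?)", [
    (1, 1, "2019-05-07 13:53:50", 1),
    (2, 2, "2019-05-07 13:56:54", 1),
    (3, 1, "2019-05-07 14:54:10", 3),
    (4, 2, "2019-05-08 13:54:49", 1),
    (5, 1, "2019-05-08 13:58:15", 1),
    (6, 2, "2019-05-08 18:54:30", 2),
    (7, 1, "2019-05-08 18:54:37", 2),
])
# Hypothetical index name; the column order mirrors the index suggested above.
conn.execute("CREATE INDEX idx_poc ON visitorlog (visitor, department_id, visittime)")

# Pair each visit of visitor 1 with visits of visitor 2 in the same department
# that fall inside a +/- 5 minute window, then return both sides of every pair.
matched_ids = [row[0] for row in conn.execute(
    """
    WITH pairs AS (
        SELECT v1.id AS id1, v2.id AS id2
        FROM visitorlog v1
        JOIN visitorlog v2
          ON v2.visitor = :visitor2
         AND v2.department_id = v1.department_id
         AND v2.visittime BETWEEN datetime(v1.visittime, '-5 minutes')
                              AND datetime(v1.visittime, '+5 minutes')
        WHERE v1.visitor = :visitor1
    )
    SELECT id
    FROM visitorlog
    WHERE id IN (SELECT id1 FROM pairs UNION SELECT id2 FROM pairs)
    ORDER BY visittime
    """,
    {"visitor1": 1, "visitor2": 2},
)]

print(matched_ids)
```

On this data, rows 6 and 7 also qualify, since both visitors were in department 2 seven seconds apart.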
If your version of SQL Server supports the LAG and LEAD functions, try rewriting the query as follows:
with t as (
select
*,
dateadd(minute, 5,
lag(Visittime) over(partition by Department_id order by Visittime)) lag_visit_time,
dateadd(minute, -5,
lead(Visittime) over(partition by Department_id order by Visittime)) lead_visit_time
from visitorlog
where visitor in(1, 2)
)
select
id, visitor, visittime, department_id
from t
where lag_visit_time >= Visittime or lead_visit_time <= Visittime;
This index is called a POC.
Results:
+----+---------+----------------------+---------------+
| id | visitor | visittime | department_id |
+----+---------+----------------------+---------------+
| 1 | 1 | 2019-05-07T13:53:50Z | 1 |
| 2 | 2 | 2019-05-07T13:56:54Z | 1 |
| 4 | 2 | 2019-05-08T13:54:49Z | 1 |
| 5 | 1 | 2019-05-08T13:58:15Z | 1 |
| 6 | 2 | 2019-05-08T18:54:30Z | 2 |
| 7 | 1 | 2019-05-08T18:54:37Z | 2 |
+----+---------+----------------------+---------------+
Demo.

Possible to use a column name in a UDF in SQL?

I have a query in which a series of steps is repeated constantly over different columns, for example:
SELECT DISTINCT
MAX (
CASE
WHEN table_2."GRP1_MINIMUM_DATE" <= cohort."ANCHOR_DATE" THEN 1
ELSE 0
END)
OVER (PARTITION BY cohort."USER_ID")
AS "GRP1_MINIMUM_DATE",
MAX (
CASE
WHEN table_2."GRP2_MINIMUM_DATE" <= cohort."ANCHOR_DATE" THEN 1
ELSE 0
END)
OVER (PARTITION BY cohort."USER_ID")
AS "GRP2_MINIMUM_DATE"
FROM INPUT_COHORT cohort
LEFT JOIN INVOLVE_EVER table_2 ON cohort."USER_ID" = table_2."USER_ID"
I was considering writing a function to accomplish this as doing so would save on space in my query. I have been reading a bit about UDF in SQL but don't yet understand if it is possible to pass a column name in as a parameter (i.e. simply switch out "GRP1_MINIMUM_DATE" for "GRP2_MINIMUM_DATE" etc.). What I would like is a query which looks like this
SELECT DISTINCT
FUNCTION(table_2."GRP1_MINIMUM_DATE") AS "GRP1_MINIMUM_DATE",
FUNCTION(table_2."GRP2_MINIMUM_DATE") AS "GRP2_MINIMUM_DATE",
FUNCTION(table_2."GRP3_MINIMUM_DATE") AS "GRP3_MINIMUM_DATE",
FUNCTION(table_2."GRP4_MINIMUM_DATE") AS "GRP4_MINIMUM_DATE"
FROM INPUT_COHORT cohort
LEFT JOIN INVOLVE_EVER table_2 ON cohort."USER_ID" = table_2."USER_ID"
Can anyone tell me if this is possible/point me to some resource that might help me out here?
Thanks!
There is no direct way to do this, as @Tejash already stated, but it looks like your database model is not ideal: it would be better to have a table with USER_ID and GRP_ID as keys and MINIMUM_DATE as a separate field.
Without changing the table structure, you can use an UNPIVOT query to mimic this design:
WITH INVOLVE_EVER(USER_ID, GRP1_MINIMUM_DATE, GRP2_MINIMUM_DATE, GRP3_MINIMUM_DATE, GRP4_MINIMUM_DATE)
AS (SELECT 1, SYSDATE, SYSDATE, SYSDATE, SYSDATE FROM dual UNION ALL
SELECT 2, SYSDATE-1, SYSDATE-2, SYSDATE-3, SYSDATE-4 FROM dual)
SELECT *
FROM INVOLVE_EVER
unpivot ( minimum_date FOR grp_id IN ( GRP1_MINIMUM_DATE AS 1, GRP2_MINIMUM_DATE AS 2, GRP3_MINIMUM_DATE AS 3, GRP4_MINIMUM_DATE AS 4))
Result:
| USER_ID | GRP_ID | MINIMUM_DATE |
|---------|--------|--------------|
| 1 | 1 | 09/09/19 |
| 1 | 2 | 09/09/19 |
| 1 | 3 | 09/09/19 |
| 1 | 4 | 09/09/19 |
| 2 | 1 | 09/08/19 |
| 2 | 2 | 09/07/19 |
| 2 | 3 | 09/06/19 |
| 2 | 4 | 09/05/19 |
With this you can write your query without further code duplication and, if needed, use the PIVOT syntax to get one line per USER_ID.
The final query could then look like this:
WITH INVOLVE_EVER(USER_ID, GRP1_MINIMUM_DATE, GRP2_MINIMUM_DATE, GRP3_MINIMUM_DATE, GRP4_MINIMUM_DATE)
AS (SELECT 1, SYSDATE, SYSDATE, SYSDATE, SYSDATE FROM dual UNION ALL
SELECT 2, SYSDATE-1, SYSDATE-2, SYSDATE-3, SYSDATE-4 FROM dual)
, INPUT_COHORT(USER_ID, ANCHOR_DATE)
AS (SELECT 1, SYSDATE-1 FROM dual UNION ALL
SELECT 2, SYSDATE-2 FROM dual UNION ALL
SELECT 3, SYSDATE-3 FROM dual)
-- Above is sample data; the query starts from here:
, unpiv AS (SELECT *
FROM INVOLVE_EVER
unpivot ( minimum_date FOR grp_id IN ( GRP1_MINIMUM_DATE AS 1, GRP2_MINIMUM_DATE AS 2, GRP3_MINIMUM_DATE AS 3, GRP4_MINIMUM_DATE AS 4)))
SELECT qcsj_c000000001000000 user_id, GRP1_MINIMUM_DATE, GRP2_MINIMUM_DATE, GRP3_MINIMUM_DATE, GRP4_MINIMUM_DATE
FROM INPUT_COHORT cohort
LEFT JOIN unpiv table_2
ON cohort.USER_ID = table_2.USER_ID
pivot (MAX(CASE WHEN minimum_date <= cohort."ANCHOR_DATE" THEN 1 ELSE 0 END) AS MINIMUM_DATE
FOR grp_id IN (1 AS GRP1,2 AS GRP2,3 AS GRP3,4 AS GRP4))
Result:
| USER_ID | GRP1_MINIMUM_DATE | GRP2_MINIMUM_DATE | GRP3_MINIMUM_DATE | GRP4_MINIMUM_DATE |
|---------|-------------------|-------------------|-------------------|-------------------|
| 3 | | | | |
| 1 | 0 | 0 | 0 | 0 |
| 2 | 0 | 1 | 1 | 1 |
This way you only have to write your calculation logic once (see line starting with pivot).
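For readers without Oracle at hand, the "unpivot first, compare once" idea can be sketched with Python's sqlite3 module. SQLite has no UNPIVOT, so a UNION ALL plays that role here, and fixed dates replace SYSDATE so the output is repeatable; both are assumptions made for this example only.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE involve_ever (user_id INTEGER, grp1_minimum_date TEXT,
        grp2_minimum_date TEXT, grp3_minimum_date TEXT, grp4_minimum_date TEXT);
    CREATE TABLE input_cohort (user_id INTEGER, anchor_date TEXT);
    INSERT INTO involve_ever VALUES
        (1, '2019-09-09', '2019-09-09', '2019-09-09', '2019-09-09'),
        (2, '2019-09-08', '2019-09-07', '2019-09-06', '2019-09-05');
    INSERT INTO input_cohort VALUES
        (1, '2019-09-08'), (2, '2019-09-07'), (3, '2019-09-06');
    """
)

# Turn the four date columns into (user_id, grp_id, minimum_date) rows, so the
# comparison against anchor_date is written exactly once.
flags = conn.execute(
    """
    WITH unpiv(user_id, grp_id, minimum_date) AS (
        SELECT user_id, 1, grp1_minimum_date FROM involve_ever
        UNION ALL SELECT user_id, 2, grp2_minimum_date FROM involve_ever
        UNION ALL SELECT user_id, 3, grp3_minimum_date FROM involve_ever
        UNION ALL SELECT user_id, 4, grp4_minimum_date FROM involve_ever
    )
    SELECT c.user_id, u.grp_id,
           MAX(CASE WHEN u.minimum_date <= c.anchor_date THEN 1 ELSE 0 END) AS flag
    FROM input_cohort c
    LEFT JOIN unpiv u ON u.user_id = c.user_id
    GROUP BY c.user_id, u.grp_id
    ORDER BY c.user_id, u.grp_id
    """
).fetchall()

for row in flags:
    print(row)
```

The result stays in long form (one row per user and group). User 3 has no rows in involve_ever, so it appears once with a NULL grp_id.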

How to create a query with all of dependencies in hierarchical organization?

I've been trying hard to create a query that shows all dependencies in a hierarchical organization, but the only thing I have managed is to retrieve the parent dependency. I have attached an image to show what I need.
Thanks for any clue you can give me.
This is the code I have tried with the production table.
WITH CTE AS
(SELECT
H1.systemuserid,
H1.pes_aprobadorid,
H1.yomifullname,
H1.internalemailaddress
FROM [dbo].[ext_systemuser] H1
WHERE H1.pes_aprobadorid is null
UNION ALL
SELECT
H2.systemuserid,
H2.pes_aprobadorid,
H2.yomifullname,
H2.internalemailaddress
FROM [dbo].[ext_systemuser] H2
INNER JOIN CTE c ON h2.pes_aprobadorid=c.systemuserid)
SELECT *
FROM CTE
OPTION (MAXRECURSION 1000)
You are almost there with your query. You just have to include all rows as a starting point. Also, the join should be cte.parent_id = ext.user_id and not the other way round. I've written an example query in Postgres, but you should be able to adapt it easily to your DBMS.
with recursive st_units as (
select 0 as id, NULL as pid, 'Director' as nm
union all select 1, 0, 'Department 1'
union all select 2, 0, 'Department 2'
union all select 3, 1, 'Unit 1'
union all select 4, 3, 'Unit 1.1'
),
cte AS
(
SELECT id, pid, cast(nm as text) as path, 1 as lvl
FROM st_units
UNION ALL
SELECT c.id, u.pid, cast(path || '->' || u.nm as text), lvl + 1
FROM st_units as u
INNER JOIN cte as c on c.pid = u.id
)
SELECT id, pid, path, lvl
FROM cte
ORDER BY lvl, id
id | pid | path | lvl
-: | ---: | :--------------------------------------- | --:
0 | null | Director | 1
1 | 0 | Department 1 | 1
2 | 0 | Department 2 | 1
3 | 1 | Unit 1 | 1
4 | 3 | Unit 1.1 | 1
1 | null | Department 1->Director | 2
2 | null | Department 2->Director | 2
3 | 0 | Unit 1->Department 1 | 2
4 | 1 | Unit 1.1->Unit 1 | 2
3 | null | Unit 1->Department 1->Director | 3
4 | 0 | Unit 1.1->Unit 1->Department 1 | 3
4 | null | Unit 1.1->Unit 1->Department 1->Director | 4
db<>fiddle here
I've arrived at the following code, which works, but when I run it against a hierarchy table of more than 1800 rows the query never finishes.
With cte AS
(select systemuserid, systemuserid as pes_aprobadorid, internalemailaddress, yomifullname
from #TestTable
union all
SELECT c.systemuserid, u.pes_aprobadorid, u.internalemailaddress, u.yomifullname
FROM #TestTable as u
INNER JOIN cte as c on c.pes_aprobadorid = u.systemuserid
)
select distinct * from cte
where pes_aprobadorid is not null
OPTION (MAXRECURSION 0)
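One possible reason for a recursive query that never ends is a cycle in the approver chain, although nothing in the thread confirms that this is the cause here. As a small sketch, the "every unit with all of its ancestors" query is run below with Python's sqlite3 module on the st_units sample from the answer above. In SQLite a recursive CTE may use UNION instead of UNION ALL, which drops rows it has already produced and so stops on cyclic data; SQL Server only accepts UNION ALL there, so this part does not carry over directly.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE st_units (id INTEGER PRIMARY KEY, pid INTEGER, nm TEXT)")
conn.executemany("INSERT INTO st_units VALUES (?, ?, ?)", [
    (0, None, "Director"),
    (1, 0, "Department 1"),
    (2, 0, "Department 2"),
    (3, 1, "Unit 1"),
    (4, 3, "Unit 1.1"),
])

# Start from every unit's direct parent and keep climbing one level at a time.
# UNION (not UNION ALL) discards (id, ancestor) pairs that were already emitted.
ancestors = conn.execute(
    """
    WITH RECURSIVE chain(id, ancestor) AS (
        SELECT id, pid FROM st_units WHERE pid IS NOT NULL
        UNION
        SELECT c.id, u.pid
        FROM chain c
        JOIN st_units u ON u.id = c.ancestor
        WHERE u.pid IS NOT NULL
    )
    SELECT id, ancestor FROM chain ORDER BY id, ancestor
    """
).fetchall()

print(ancestors)
```

Carrying only the two key columns through the recursion, and joining names and e-mail addresses back on afterwards, also keeps the intermediate result small.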

A very basic SQL issue I'm stuck with [duplicate]

I have a table of player performance:
CREATE TABLE TopTen (
id INT UNSIGNED PRIMARY KEY AUTO_INCREMENT,
home INT UNSIGNED NOT NULL,
`datetime`DATETIME NOT NULL,
player VARCHAR(6) NOT NULL,
resource INT NOT NULL
);
What query will return the rows for each distinct home holding its maximum value of datetime? In other words, how can I filter by the maximum datetime (grouped by home) and still include other non-grouped, non-aggregate columns (such as player) in the result?
For this sample data:
INSERT INTO TopTen
(id, home, `datetime`, player, resource)
VALUES
(1, 10, '04/03/2009', 'john', 399),
(2, 11, '04/03/2009', 'juliet', 244),
(5, 12, '04/03/2009', 'borat', 555),
(3, 10, '03/03/2009', 'john', 300),
(4, 11, '03/03/2009', 'juliet', 200),
(6, 12, '03/03/2009', 'borat', 500),
(7, 13, '24/12/2008', 'borat', 600),
(8, 13, '01/01/2009', 'borat', 700)
;
the result should be:
id | home | datetime | player | resource
1 | 10 | 04/03/2009 | john | 399
2 | 11 | 04/03/2009 | juliet | 244
5 | 12 | 04/03/2009 | borat | 555
8 | 13 | 01/01/2009 | borat | 700
I tried a subquery getting the maximum datetime for each home:
-- 1 ..by the MySQL manual:
SELECT DISTINCT
home,
id,
datetime AS dt,
player,
resource
FROM TopTen t1
WHERE `datetime` = (SELECT
MAX(t2.datetime)
FROM TopTen t2
GROUP BY home)
GROUP BY `datetime`
ORDER BY `datetime` DESC
The result-set has 130 rows although database holds 187, indicating the result includes some duplicates of home.
Then I tried joining to a subquery that gets the maximum datetime for each row id:
-- 2 ..join
SELECT
s1.id,
s1.home,
s1.datetime,
s1.player,
s1.resource
FROM TopTen s1
JOIN (SELECT
id,
MAX(`datetime`) AS dt
FROM TopTen
GROUP BY id) AS s2
ON s1.id = s2.id
ORDER BY `datetime`
Nope. Gives all the records.
I tried various exotic queries, each with various results, but nothing that got me any closer to solving this problem.
You are so close! All you need to do is select BOTH the home and its max date time, then join back to the topten table on BOTH fields:
SELECT tt.*
FROM topten tt
INNER JOIN
(SELECT home, MAX(datetime) AS MaxDateTime
FROM topten
GROUP BY home) groupedtt
ON tt.home = groupedtt.home
AND tt.datetime = groupedtt.MaxDateTime
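A quick way to convince yourself that joining on both home and the maximum datetime returns the expected rows is to run it on the question's data. The sketch below uses Python's sqlite3 module as a stand-in for MySQL, with the dates written in ISO form (2009-03-04 for 04/03/2009) so that text comparison matches date order; that rewrite of the dates is an assumption of the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE topten (id INTEGER PRIMARY KEY, home INTEGER, "
    "datetime TEXT, player TEXT, resource INTEGER)"
)
conn.executemany("INSERT INTO topten VALUES (?, ?, ?, ?, ?)", [
    (1, 10, "2009-03-04", "john", 399),
    (2, 11, "2009-03-04", "juliet", 244),
    (5, 12, "2009-03-04", "borat", 555),
    (3, 10, "2009-03-03", "john", 300),
    (4, 11, "2009-03-03", "juliet", 200),
    (6, 12, "2009-03-03", "borat", 500),
    (7, 13, "2008-12-24", "borat", 600),
    (8, 13, "2009-01-01", "borat", 700),
])

# Find the latest datetime per home, then join back on BOTH columns to pick up
# the remaining fields of those rows.
latest = conn.execute(
    """
    SELECT tt.id, tt.home, tt.datetime, tt.player, tt.resource
    FROM topten tt
    JOIN (SELECT home, MAX(datetime) AS max_dt
          FROM topten
          GROUP BY home) g
      ON tt.home = g.home AND tt.datetime = g.max_dt
    ORDER BY tt.home
    """
).fetchall()

latest_ids = [row[0] for row in latest]
print(latest_ids)
```

If two rows of the same home share the maximum datetime, this join returns both of them.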
The fastest MySQL solution, without inner queries and without GROUP BY:
SELECT m.* -- get the row that contains the max value
FROM topten m -- "m" from "max"
LEFT JOIN topten b -- "b" from "bigger"
ON m.home = b.home -- match "max" row with "bigger" row by `home`
AND m.datetime < b.datetime -- want "bigger" than "max"
WHERE b.datetime IS NULL -- keep only if there is no bigger than max
Explanation:
Join the table with itself using the home column. The use of LEFT JOIN ensures all the rows from table m appear in the result set. Those that don't have a match in table b will have NULLs for the columns of b.
The other condition on the JOIN asks to match only the rows from b that have bigger value on the datetime column than the row from m.
Using the data posted in the question, the LEFT JOIN will produce these pairs:
+------------------------------------------+--------------------------------+
| the row from `m` | the matching row from `b` |
|------------------------------------------|--------------------------------|
| id home datetime player resource | id home datetime ... |
|----|-----|------------|--------|---------|------|------|------------|-----|
| 1 | 10 | 04/03/2009 | john | 399 | NULL | NULL | NULL | ... | *
| 2 | 11 | 04/03/2009 | juliet | 244 | NULL | NULL | NULL | ... | *
| 5 | 12 | 04/03/2009 | borat | 555 | NULL | NULL | NULL | ... | *
| 3 | 10 | 03/03/2009 | john | 300 | 1 | 10 | 04/03/2009 | ... |
| 4 | 11 | 03/03/2009 | juliet | 200 | 2 | 11 | 04/03/2009 | ... |
| 6 | 12 | 03/03/2009 | borat | 500 | 5 | 12 | 04/03/2009 | ... |
| 7 | 13 | 24/12/2008 | borat | 600 | 8 | 13 | 01/01/2009 | ... |
| 8 | 13 | 01/01/2009 | borat | 700 | NULL | NULL | NULL | ... | *
+------------------------------------------+--------------------------------+
Finally, the WHERE clause keeps only the pairs that have NULLs in the columns of b (they are marked with * in the table above); this means, due to the second condition from the JOIN clause, the row selected from m has the biggest value in column datetime.
Read the SQL Antipatterns: Avoiding the Pitfalls of Database Programming book for other SQL tips.
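The pairing described above can be reproduced step by step. This sketch uses Python's sqlite3 module in place of MySQL and ISO-formatted dates (an assumption, so that text comparison follows date order): it first lists the (m, b) pairs the LEFT JOIN produces, then applies the IS NULL filter.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE topten (id INTEGER PRIMARY KEY, home INTEGER, "
    "datetime TEXT, player TEXT, resource INTEGER)"
)
conn.executemany("INSERT INTO topten VALUES (?, ?, ?, ?, ?)", [
    (1, 10, "2009-03-04", "john", 399),
    (2, 11, "2009-03-04", "juliet", 244),
    (5, 12, "2009-03-04", "borat", 555),
    (3, 10, "2009-03-03", "john", 300),
    (4, 11, "2009-03-03", "juliet", 200),
    (6, 12, "2009-03-03", "borat", 500),
    (7, 13, "2008-12-24", "borat", 600),
    (8, 13, "2009-01-01", "borat", 700),
])

join_sql = """
    SELECT m.id, b.id
    FROM topten m
    LEFT JOIN topten b
      ON m.home = b.home AND m.datetime < b.datetime
"""

# Step 1: every row of m, paired with a "bigger" row of b when one exists.
pairs = conn.execute(join_sql + " ORDER BY m.id").fetchall()

# Step 2: keep only the rows of m for which no bigger row was found.
max_ids = [row[0] for row in conn.execute(
    join_sql + " WHERE b.id IS NULL ORDER BY m.id"
)]

print(pairs)
print(max_ids)
```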
Here goes T-SQL version:
-- Test data
DECLARE @TestTable TABLE (id INT, home INT, date DATETIME,
player VARCHAR(20), resource INT)
INSERT INTO @TestTable
SELECT 1, 10, '2009-03-04', 'john', 399 UNION
SELECT 2, 11, '2009-03-04', 'juliet', 244 UNION
SELECT 5, 12, '2009-03-04', 'borat', 555 UNION
SELECT 3, 10, '2009-03-03', 'john', 300 UNION
SELECT 4, 11, '2009-03-03', 'juliet', 200 UNION
SELECT 6, 12, '2009-03-03', 'borat', 500 UNION
SELECT 7, 13, '2008-12-24', 'borat', 600 UNION
SELECT 8, 13, '2009-01-01', 'borat', 700
-- Answer
SELECT id, home, date, player, resource
FROM (SELECT id, home, date, player, resource,
RANK() OVER (PARTITION BY home ORDER BY date DESC) N
FROM @TestTable
)M WHERE N = 1
-- and if you really want only home with max date
SELECT T.id, T.home, T.date, T.player, T.resource
FROM @TestTable T
INNER JOIN
( SELECT TI.id, TI.home, TI.date,
RANK() OVER (PARTITION BY TI.home ORDER BY TI.date) N
FROM @TestTable TI
WHERE TI.date IN (SELECT MAX(TM.date) FROM @TestTable TM)
)TJ ON TJ.N = 1 AND T.id = TJ.id
EDIT
Unfortunately, there is no RANK() OVER function in MySQL before version 8.0.
But it can be emulated, see Emulating Analytic (AKA Ranking) Functions with MySQL.
So this is MySQL version:
SELECT id, home, date, player, resource
FROM TestTable AS t1
WHERE
(SELECT COUNT(*)
FROM TestTable AS t2
WHERE t2.home = t1.home AND t2.date > t1.date
) = 0
This will work even if you have two or more rows for each home with equal DATETIME's:
SELECT id, home, datetime, player, resource
FROM (
SELECT (
SELECT id
FROM topten ti
WHERE ti.home = t1.home
ORDER BY
ti.datetime DESC
LIMIT 1
) lid
FROM (
SELECT DISTINCT home
FROM topten
) t1
) ro, topten t2
WHERE t2.id = ro.lid
I think this will give you the desired result:
SELECT home, MAX(datetime)
FROM my_table
GROUP BY home
BUT if you need other columns as well, just make a join with the original table (check Michael La Voie answer)
Best regards.
Since people seem to keep running into this thread (the comments span 1.5 years), isn't this much simpler:
SELECT * FROM (SELECT * FROM topten ORDER BY datetime DESC) tmp GROUP BY home
No aggregation functions needed...
Cheers.
You can also try this one; for large tables, query performance will be better. It works when there are no more than two records for each home and their dates are different. A better general MySQL query is the one from Michael La Voie above.
SELECT t1.id, t1.home, t1.date, t1.player, t1.resource
FROM t_scores_1 t1
INNER JOIN t_scores_1 t2
ON t1.home = t2.home
WHERE t1.date > t2.date
Or in case of Postgres or those dbs that provide analytic functions try
SELECT t.* FROM
(SELECT t1.id, t1.home, t1.date, t1.player, t1.resource
, row_number() over (partition by t1.home order by t1.date desc) rw
FROM topten t1
INNER JOIN topten t2
ON t1.home = t2.home
WHERE t1.date > t2.date
) t
WHERE t.rw = 1
SELECT tt.*
FROM TestTable tt
INNER JOIN
(
SELECT coord, MAX(datetime) AS MaxDateTime
FROM TestTable
GROUP BY
coord
) groupedtt
ON tt.coord = groupedtt.coord
AND tt.datetime = groupedtt.MaxDateTime
This works on Oracle:
with table_max as(
select id
, home
, datetime
, player
, resource
, max(datetime) over (partition by home) maxdatetime
from topten
)
select id
, home
, datetime
, player
, resource
from table_max
where datetime = maxdatetime
Try this for SQL Server:
WITH cte AS (
SELECT home, MAX(year) AS year FROM Table1 GROUP BY home
)
SELECT * FROM Table1 a INNER JOIN cte ON a.home = cte.home AND a.year = cte.year
Here is a MySQL version that prints only one entry when a group has duplicate MAX(datetime) values.
You could test here http://www.sqlfiddle.com/#!2/0a4ae/1
Sample Data
mysql> SELECT * from topten;
+------+------+---------------------+--------+----------+
| id | home | datetime | player | resource |
+------+------+---------------------+--------+----------+
| 1 | 10 | 2009-04-03 00:00:00 | john | 399 |
| 2 | 11 | 2009-04-03 00:00:00 | juliet | 244 |
| 3 | 10 | 2009-03-03 00:00:00 | john | 300 |
| 4 | 11 | 2009-03-03 00:00:00 | juliet | 200 |
| 5 | 12 | 2009-04-03 00:00:00 | borat | 555 |
| 6 | 12 | 2009-03-03 00:00:00 | borat | 500 |
| 7 | 13 | 2008-12-24 00:00:00 | borat | 600 |
| 8 | 13 | 2009-01-01 00:00:00 | borat | 700 |
| 9 | 10 | 2009-04-03 00:00:00 | borat | 700 |
| 10 | 11 | 2009-04-03 00:00:00 | borat | 700 |
| 12 | 12 | 2009-04-03 00:00:00 | borat | 700 |
+------+------+---------------------+--------+----------+
MySQL Version with User variable
SELECT *
FROM (
SELECT ord.*,
IF (@prev_home = ord.home, 0, 1) AS is_first_appear,
@prev_home := ord.home
FROM (
SELECT t1.id, t1.home, t1.player, t1.resource
FROM topten t1
INNER JOIN (
SELECT home, MAX(datetime) AS mx_dt
FROM topten
GROUP BY home
) x ON t1.home = x.home AND t1.datetime = x.mx_dt
ORDER BY home
) ord, (SELECT @prev_home := 0, @seq := 0) init
) y
WHERE is_first_appear = 1;
+------+------+--------+----------+-----------------+------------------------+
| id | home | player | resource | is_first_appear | @prev_home := ord.home |
+------+------+--------+----------+-----------------+------------------------+
| 9 | 10 | borat | 700 | 1 | 10 |
| 10 | 11 | borat | 700 | 1 | 11 |
| 12 | 12 | borat | 700 | 1 | 12 |
| 8 | 13 | borat | 700 | 1 | 13 |
+------+------+--------+----------+-----------------+------------------------+
4 rows in set (0.00 sec)
Accepted answer's output
SELECT tt.*
FROM topten tt
INNER JOIN
(
SELECT home, MAX(datetime) AS MaxDateTime
FROM topten
GROUP BY home
) groupedtt ON tt.home = groupedtt.home AND tt.datetime = groupedtt.MaxDateTime
+------+------+---------------------+--------+----------+
| id | home | datetime | player | resource |
+------+------+---------------------+--------+----------+
| 1 | 10 | 2009-04-03 00:00:00 | john | 399 |
| 2 | 11 | 2009-04-03 00:00:00 | juliet | 244 |
| 5 | 12 | 2009-04-03 00:00:00 | borat | 555 |
| 8 | 13 | 2009-01-01 00:00:00 | borat | 700 |
| 9 | 10 | 2009-04-03 00:00:00 | borat | 700 |
| 10 | 11 | 2009-04-03 00:00:00 | borat | 700 |
| 12 | 12 | 2009-04-03 00:00:00 | borat | 700 |
+------+------+---------------------+--------+----------+
7 rows in set (0.00 sec)
SELECT c1, c2, c3, c4, c5 FROM table1 WHERE c3 = (select max(c3) from table1)
SELECT * FROM table1 WHERE c3 = (select max(c3) from table1)
Another way to get the most recent row per group is a subquery that calculates a rank for each row within its group and then keeps only the rows with rank = 1:
select a.*
from topten a
where (
select count(*)
from topten b
where a.home = b.home
and a.`datetime` < b.`datetime`
) +1 = 1
DEMO
Here is a visual demo of the rank number for each row, for better understanding.
Reading some of the comments: what if two rows have the same home and datetime values?
The above query will then return more than one row for that home. To handle this, another criterion or column is needed to decide which of the tied rows to take. Looking at the sample data set, I assume there is an auto-increment primary key column id, so we can use it to pick the most recent row by tweaking the same query with a CASE expression:
select a.*
from topten a
where (
select count(*)
from topten b
where a.home = b.home
and case
when a.`datetime` = b.`datetime`
then a.id < b.id
else a.`datetime` < b.`datetime`
end
) + 1 = 1
DEMO
The above query will pick the row with the highest id among rows with the same datetime value.
Visual demo of the rank number for each row
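The tie-break on id can be tried on the eleven-row sample shown earlier in this thread, where homes 10, 11 and 12 each have two rows sharing the latest datetime. The sketch uses Python's sqlite3 module and swaps the COUNT(*) + 1 = 1 test for an equivalent NOT EXISTS, with the CASE written out as a plain OR; this is an illustration, not the exact query from the answer.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE topten (id INTEGER PRIMARY KEY, home INTEGER, "
    "datetime TEXT, player TEXT, resource INTEGER)"
)
conn.executemany("INSERT INTO topten VALUES (?, ?, ?, ?, ?)", [
    (1, 10, "2009-04-03 00:00:00", "john", 399),
    (2, 11, "2009-04-03 00:00:00", "juliet", 244),
    (3, 10, "2009-03-03 00:00:00", "john", 300),
    (4, 11, "2009-03-03 00:00:00", "juliet", 200),
    (5, 12, "2009-04-03 00:00:00", "borat", 555),
    (6, 12, "2009-03-03 00:00:00", "borat", 500),
    (7, 13, "2008-12-24 00:00:00", "borat", 600),
    (8, 13, "2009-01-01 00:00:00", "borat", 700),
    (9, 10, "2009-04-03 00:00:00", "borat", 700),
    (10, 11, "2009-04-03 00:00:00", "borat", 700),
    (12, 12, "2009-04-03 00:00:00", "borat", 700),
])

# A row wins if no other row of the same home is "later", where later means a
# bigger datetime or, on equal datetime, a bigger id.
winners = conn.execute(
    """
    SELECT a.id, a.home
    FROM topten a
    WHERE NOT EXISTS (
        SELECT 1
        FROM topten b
        WHERE b.home = a.home
          AND (b.datetime > a.datetime
               OR (b.datetime = a.datetime AND b.id > a.id))
    )
    ORDER BY a.home
    """
).fetchall()

print(winners)
```

Each home now yields exactly one row, the same ids that the user-variable version above returns.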
Why not use:
SELECT home, MAX(datetime) AS MaxDateTime,player,resource FROM topten GROUP BY home
Did I miss something?
In MySQL 8.0 this can be achieved efficiently by using row_number() window function with common table expression.
(Here row_number() generates a unique sequence number for each row within every home, starting at 1, in descending order of date. So, for every home, the row with sequence number 1 is the one with the latest date. All we need to do then is select the row with sequence number 1 for each home. This could be done by writing an outer query around this query, but we used a common table expression instead since it's more readable.)
Schema:
create TABLE TestTable(id INT, home INT, date DATETIME,
player VARCHAR(20), resource INT);
INSERT INTO TestTable
SELECT 1, 10, '2009-03-04', 'john', 399 UNION
SELECT 2, 11, '2009-03-04', 'juliet', 244 UNION
SELECT 5, 12, '2009-03-04', 'borat', 555 UNION
SELECT 3, 10, '2009-03-03', 'john', 300 UNION
SELECT 4, 11, '2009-03-03', 'juliet', 200 UNION
SELECT 6, 12, '2009-03-03', 'borat', 500 UNION
SELECT 7, 13, '2008-12-24', 'borat', 600 UNION
SELECT 8, 13, '2009-01-01', 'borat', 700
Query:
with cte as
(
select id, home, date , player, resource,
Row_Number()Over(Partition by home order by date desc) rownumber from TestTable
)
select id, home, date , player, resource from cte where rownumber=1
Output:
id | home | date | player | resource
1 | 10 | 2009-03-04 00:00:00 | john | 399
2 | 11 | 2009-03-04 00:00:00 | juliet | 244
5 | 12 | 2009-03-04 00:00:00 | borat | 555
8 | 13 | 2009-01-01 00:00:00 | borat | 700
db<>fiddle here
This works in SQLServer, and is the only solution I've seen that doesn't require subqueries or CTEs - I think this is the most elegant way to solve this kind of problem.
SELECT TOP 1 WITH TIES *
FROM TopTen
ORDER BY ROW_NUMBER() OVER (PARTITION BY home
ORDER BY [datetime] DESC)
In the ORDER BY clause, it uses a window function to generate & sort by a ROW_NUMBER - assigning a 1 value to the highest [datetime] for each [home].
SELECT TOP 1 WITH TIES will then select one record with the lowest ROW_NUMBER (which will be 1), as well as all records with a tying ROW_NUMBER (also 1)
As a consequence, you retrieve all data for each of the 1st ranked records - that is, all data for records with the highest [datetime] value with their given [home] value.
Try this
select * from mytable a join
(select home, max(datetime) datetime
from mytable
group by home) b
on a.home = b.home and a.datetime = b.datetime
Regards
K
@Michae The accepted answer works fine in most cases, but it fails in the following one.
If two rows have the same home and datetime, the query returns both rows rather than one row per home as required. To fix that, add DISTINCT to the query as below.
SELECT DISTINCT tt.home, groupedtt.MaxDateTime
FROM topten tt
INNER JOIN
(SELECT home, MAX(datetime) AS MaxDateTime
FROM topten
GROUP BY home) groupedtt
ON tt.home = groupedtt.home
AND tt.datetime = groupedtt.MaxDateTime
this is the query you need:
SELECT b.id, a.home,b.[datetime],b.player,a.resource FROM
(SELECT home,MAX(resource) AS resource FROM tbl_1 GROUP BY home) AS a
LEFT JOIN
(SELECT id,home,[datetime],player,resource FROM tbl_1) AS b
ON a.resource = b.resource WHERE a.home =b.home;
Hope below query will give the desired output:
Select id, home, datetime, player, resource
from (Select id, home, datetime, player, resource, row_number() over (Partition by home ORDER by datetime desc) as rn from tablename) t
where rn = 1
(NOTE: The answer of Michael is perfect for a situation where the target column datetime cannot have duplicate values for each distinct home.)
If your table has duplicate rows for a home and datetime combination and you need to select only one row for each distinct home, here is my solution:
Your table needs one unique column (like id). If it doesn't, create a view and add a random column to it.
Use this query to select a single row for each unique home value. Selects the lowest id in case of duplicate datetime.
SELECT tt.*
FROM topten tt
INNER JOIN
(
SELECT min(tt2.id) as min_id, tt2.home from topten tt2
INNER JOIN
(
SELECT home, MAX(datetime) AS MaxDateTime
FROM topten
GROUP BY home) groupedtt2
ON tt2.home = groupedtt2.home
AND tt2.datetime = groupedtt2.MaxDateTime
GROUP BY tt2.home
) as groupedtt
ON tt.id = groupedtt.min_id
The accepted answer doesn't work for me if there are 2 records with the same date and home: it returns 2 records after the join, while I need to select any one of them. This query is used as a joined subquery, so just LIMIT 1 is not possible there.
Here is how I reached the desired result. I don't know about performance, however.
select SUBSTRING_INDEX(GROUP_CONCAT(id order by datetime desc separator ','),',',1) as id, home, MAX(datetime) as 'datetime'
from topten
group by (home)

Query for missing elements

I have a table with the following structure:
timestamp | name | value
0 | john | 5
1 | NULL | 3
8 | NULL | 12
12 | john | 3
33 | NULL | 4
54 | pete | 1
180 | NULL | 4
400 | john | 3
401 | NULL | 4
592 | anna | 2
Now what I am looking for is a query that gives me the sum of the values for each name and treats the NULLs in between (ordered by the timestamp) as the first non-null name further down the list, as if the table were as follows:
timestamp | name | value
0 | john | 5
1 | john | 3
8 | john | 12
12 | john | 3
33 | pete | 4
54 | pete | 1
180 | john | 4
400 | john | 3
401 | anna | 4
592 | anna | 2
and I would query SUM(value), name from this table group by name. I have thought and tried, but I can't come up with a proper solution. I have looked at recursive common table expressions, and think the answer may lie in there, but I haven't been able to properly understand those.
These tables are just examples, and I don't know the timestamp values in advance.
Could someone give me a hand? Help would be very much appreciated.
With Inputs As
(
Select 0 As [timestamp], 'john' As Name, 5 As value
Union All Select 1, NULL, 3
Union All Select 8, NULL, 12
Union All Select 12, 'john', 3
Union All Select 33, NULL, 4
Union All Select 54, 'pete', 1
Union All Select 180, NULL, 4
Union All Select 400, 'john', 3
Union All Select 401, NULL, 4
Union All Select 592, 'anna', 2
)
, NamedInputs As
(
Select I.timestamp
, Coalesce (I.Name
, (
Select I3.Name
From Inputs As I3
Where I3.timestamp = (
Select Min(I2.timestamp)
From Inputs As I2
Where I2.timestamp > I.timestamp
And I2.Name Is not Null
)
)
)) As name
, I.value
From Inputs As I
)
Select NI.name, Sum(NI.Value) As Total
From NamedInputs As NI
Group By NI.name
Btw, what would be orders of magnitude faster than any query would be to first correct the data. I.e., update the name column to have the proper value, make it non-nullable and then run a simple Group By to get your totals.
Additional Solution
Select Coalesce(I.Name, I2.Name), Sum(I.value) As Total
From Inputs As I
Left Join (
Select I1.timestamp, MIN(I2.Timestamp) As NextNameTimestamp
From Inputs As I1
Left Join Inputs As I2
On I2.timestamp > I1.timestamp
And I2.Name Is Not Null
Group By I1.timestamp
) As Z
On Z.timestamp = I.timestamp
Left Join Inputs As I2
On I2.timestamp = Z.NextNameTimestamp
Group By Coalesce(I.Name, I2.Name)
You don't need CTE, just a simple subquery.
select t.timestamp, ISNULL(t.name, (
select top(1) i.name
from inputs i
where i.timestamp > t.timestamp
and i.name is not null
order by i.timestamp
)), t.value
from inputs t
And summing from here
select name, SUM(value) as totalValue
from
(
select t.timestamp, ISNULL(t.name, (
select top(1) i.name
from inputs i
where i.timestamp > t.timestamp
and i.name is not null
order by i.timestamp
)) as name, t.value
from inputs t
) N
group by name
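In the question's expected table, each NULL takes the next non-null name in timestamp order (timestamp 33 becomes pete, 180 becomes john, 401 becomes anna). A small sketch with Python's sqlite3 module checks that reading and the resulting totals. LIMIT 1 replaces TOP(1), and the column is called ts here only to keep the example short; both are choices made for this illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE inputs (ts INTEGER, name TEXT, value INTEGER)")
conn.executemany("INSERT INTO inputs VALUES (?, ?, ?)", [
    (0, "john", 5),
    (1, None, 3),
    (8, None, 12),
    (12, "john", 3),
    (33, None, 4),
    (54, "pete", 1),
    (180, None, 4),
    (400, "john", 3),
    (401, None, 4),
    (592, "anna", 2),
])

# Replace each NULL name with the first non-null name found at a later
# timestamp, then aggregate per name.
totals = conn.execute(
    """
    SELECT name, SUM(value) AS total
    FROM (
        SELECT COALESCE(t.name,
                        (SELECT i.name
                           FROM inputs i
                          WHERE i.ts > t.ts AND i.name IS NOT NULL
                          ORDER BY i.ts
                          LIMIT 1)) AS name,
               t.value
        FROM inputs t
    ) AS filled
    GROUP BY name
    ORDER BY name
    """
).fetchall()

print(totals)
```

Looking backwards instead (the latest earlier non-null name) would give john 34, pete 5 and anna 2 on this data, which does not match the table in the question.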
I hope I'm not going to be embarrassed by offering you this little recursive CTE query of mine as a solution to your problem.
;WITH
numbered_table AS (
SELECT
timestamp, name, value,
/* number the rows from the latest timestamp backwards, so each NULL inherits the next non-null name */
rownum = ROW_NUMBER() OVER (ORDER BY timestamp DESC)
FROM your_table
),
filled_table AS (
SELECT
timestamp,
name,
value,
rownum
FROM numbered_table
WHERE rownum = 1
UNION ALL
SELECT
nt.timestamp,
name = ISNULL(nt.name, ft.name),
nt.value,
nt.rownum
FROM numbered_table nt
INNER JOIN filled_table ft ON nt.rownum = ft.rownum + 1
)
SELECT *
FROM filled_table
/* or go ahead aggregating instead */