TSQL: Return row(s) with earliest dates - sql

Given 2 tables called "table1" and "table1_hist" that structurally resemble this:
TABLE1
id status date_this_status
1 open 2008-12-12
2 closed 2009-01-01
3 pending 2009-05-05
4 pending 2009-05-06
5 open 2009-06-01
TABLE1_hist
id status date_this_status
2 open 2008-12-24
2 pending 2008-12-26
3 open 2009-04-24
4 open 2009-05-04
With table1 holding the current status and table1_hist being a history table of table1, how can I return, for each id, the row that has the earliest date? In other words, for each id, I need to know its earliest status and date.
EXAMPLE:
For id 1 earliest status and date is open and 2008-12-12.
For id 2 earliest status and date is open and 2008-12-24.
I've tried using MIN(datetime), unions, dynamic SQL, etc. I've just hit T-SQL writer's block today and I'm stuck.
Edited to add: Ugh. This is for a SQL2000 database, so Alex Martelli's answer won't work. ROW_NUMBER wasn't introduced until SQL2005.

SQL Server 2005 and later support an interesting (relatively recent) aspect of SQL Standards, "ranking/windowing functions", allowing:
WITH AllRows AS (
SELECT id, status, date_this_status,
ROW_NUMBER() OVER(PARTITION BY id ORDER BY date_this_status ASC) AS row
FROM (SELECT * FROM Table1 UNION SELECT * FROM Table1_hist) Both_tables
)
SELECT id, status, date_this_status
FROM AllRows
WHERE row = 1
ORDER BY id;
where I'm also using the nice (and equally "new") WITH syntax to avoid nesting the sub-query in the main SELECT.
This article shows how one could hack the equivalent of ROW_NUMBER (and also RANK and DENSE_RANK, the other two "new" ranking/windowing functions) in SQL Server 2000 -- but that's not necessarily pretty nor especially well-performing, alas.

The following code sample is completely self-sufficient, just copy and paste it into a management studio query and hit F5 =)
-- Table variables holding the sample data
DECLARE @TABLE1 TABLE
(
id INT,
status VARCHAR(50),
date_this_status DATETIME
)
DECLARE @TABLE1_hist TABLE
(
id INT,
status VARCHAR(50),
date_this_status DATETIME
)
--TABLE1
INSERT @TABLE1
SELECT 1, 'open', '2008-12-12' UNION ALL
SELECT 2, 'closed', '2009-01-01' UNION ALL
SELECT 3, 'pending', '2009-05-05' UNION ALL
SELECT 4, 'pending', '2009-05-06' UNION ALL
SELECT 5, 'open', '2009-06-01'
--TABLE1_hist
INSERT @TABLE1_hist
SELECT 2, 'open', '2008-12-24' UNION ALL
SELECT 2, 'pending', '2008-12-26' UNION ALL
SELECT 3, 'open', '2009-04-24' UNION ALL
SELECT 4, 'open', '2009-05-04'
-- Earliest history row per id; fall back to the current row when an id has no history
SELECT x.id,
ISNULL(y.[status], x.[status]) AS [status],
ISNULL(y.date_this_status, x.date_this_status) AS date_this_status
FROM @TABLE1 x
LEFT JOIN (
SELECT a.*
FROM @TABLE1_hist a
INNER JOIN (
SELECT id,
MIN(date_this_status) AS date_this_status
FROM @TABLE1_hist
GROUP BY id
) b
ON a.id = b.id
AND a.date_this_status = b.date_this_status
) y
ON x.id = y.id

SELECT id,
status,
date_this_status
FROM ( SELECT *
FROM Table1
UNION
SELECT *
from TABLE1_hist
) a
WHERE date_this_status = ( SELECT MIN(date_this_status)
FROM ( SELECT *
FROM Table1
UNION
SELECT *
from TABLE1_hist
) t
WHERE id = a.id
)
This is a bit ugly, but seems to work in MS SQL Server 2005.
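For anyone who wants to try the UNION plus correlated MIN idea without a SQL Server instance, here is a minimal sketch that runs the same shape of query against the question's sample data. It assumes SQLite (through Python's sqlite3 module) as a stand-in for SQL Server and stores the dates as ISO text; the names EARLIEST_SQL and earliest_rows are made up for the example.

```python
import sqlite3

# Assumption: an in-memory SQLite database stands in for the SQL Server tables.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE table1      (id INTEGER, status TEXT, date_this_status TEXT);
CREATE TABLE table1_hist (id INTEGER, status TEXT, date_this_status TEXT);
INSERT INTO table1 VALUES
  (1, 'open',    '2008-12-12'),
  (2, 'closed',  '2009-01-01'),
  (3, 'pending', '2009-05-05'),
  (4, 'pending', '2009-05-06'),
  (5, 'open',    '2009-06-01');
INSERT INTO table1_hist VALUES
  (2, 'open',    '2008-12-24'),
  (2, 'pending', '2008-12-26'),
  (3, 'open',    '2009-04-24'),
  (4, 'open',    '2009-05-04');
""")

# Combine both tables, then keep the rows whose date equals the per-id minimum.
EARLIEST_SQL = """
SELECT a.id, a.status, a.date_this_status
FROM (SELECT * FROM table1 UNION SELECT * FROM table1_hist) AS a
WHERE a.date_this_status = (
    SELECT MIN(t.date_this_status)
    FROM (SELECT * FROM table1 UNION SELECT * FROM table1_hist) AS t
    WHERE t.id = a.id
)
ORDER BY a.id
"""

earliest_rows = conn.execute(EARLIEST_SQL).fetchall()
for row in earliest_rows:
    print(row)
```

ISO-formatted date strings sort the same way as the dates they represent, which is why plain text comparison is enough in this sketch.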

You can do this with an exclusive self join. Join on the history table, and then another time on all earlier history entries. In the where statement, you specify that there are not allowed to be any earlier entries.
select t1.id,
isnull(hist.status, t1.status),
isnull(hist.date_this_status, t1.date_this_status)
from table1 t1
left join (
select h1.id, h1.status, h1.date_this_status
from table1_hist h1
left join table1_hist h2
on h2.id = h1.id
and h2.date_this_status < h1.date_this_status
where h2.date_this_status is null
) hist on hist.id = t1.id
A bit of a mind-binder, but fairly flexible and efficient!
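This assumes there are no two history entries for the same id with the exact same date. If there can be, the self join needs a tie-breaker on a column that is unique per history row. The sample schema has no such column, so hist_id below is a hypothetical one (comparing h2.id < h1.id would never match, because the join already requires h2.id = h1.id):
left join table1_hist h2
on h2.id = h1.id
and (
h2.date_this_status < h1.date_this_status
or (h2.date_this_status = h1.date_this_status and h2.hist_id < h1.hist_id)
)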
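The exclusive self join (without any tie-breaker) can be checked against the question's sample data with a small sketch like this one. It assumes SQLite via Python's sqlite3 in place of SQL Server, uses COALESCE where the answer uses ISNULL, and the names ANTI_JOIN_SQL and anti_join_rows are invented for the example.

```python
import sqlite3

# Assumption: in-memory SQLite stands in for SQL Server; dates are ISO text.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE table1      (id INTEGER, status TEXT, date_this_status TEXT);
CREATE TABLE table1_hist (id INTEGER, status TEXT, date_this_status TEXT);
INSERT INTO table1 VALUES
  (1, 'open',    '2008-12-12'),
  (2, 'closed',  '2009-01-01'),
  (3, 'pending', '2009-05-05'),
  (4, 'pending', '2009-05-06'),
  (5, 'open',    '2009-06-01');
INSERT INTO table1_hist VALUES
  (2, 'open',    '2008-12-24'),
  (2, 'pending', '2008-12-26'),
  (3, 'open',    '2009-04-24'),
  (4, 'open',    '2009-05-04');
""")

# h2 looks for an earlier history row with the same id; keeping only the h1 rows
# where no such h2 exists leaves the earliest history row per id.
ANTI_JOIN_SQL = """
SELECT t1.id,
       COALESCE(hist.status, t1.status) AS status,
       COALESCE(hist.date_this_status, t1.date_this_status) AS date_this_status
FROM table1 AS t1
LEFT JOIN (
    SELECT h1.id, h1.status, h1.date_this_status
    FROM table1_hist AS h1
    LEFT JOIN table1_hist AS h2
           ON h2.id = h1.id
          AND h2.date_this_status < h1.date_this_status
    WHERE h2.date_this_status IS NULL
) AS hist ON hist.id = t1.id
ORDER BY t1.id
"""

anti_join_rows = conn.execute(ANTI_JOIN_SQL).fetchall()
for row in anti_join_rows:
    print(row)
```

Ids 1 and 5 have no history, so the outer LEFT JOIN falls back to their current row in table1.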

If I understand the OP correctly, a given ID may appear in TABLE1 or TABLE1_HISTORY or both.
In your result set, you want back each distinct ID and the oldest status/date associated with that ID, regardless of which table the oldest one happens to be in.
So, look in BOTH tables and return any record where there is no record in either table for its ID that has a smaller date_this_status.
Try this:
SELECT ID, status, date_this_status FROM table1 ta WHERE
NOT EXISTS(SELECT null FROM table1 tb WHERE
tb.id = ta.id
AND tb.date_this_status < ta.date_this_status)
AND NOT EXISTS(SELECT null FROM table1_history tbh WHERE
tbh.id = ta.id
AND tbh.date_this_status < ta.date_this_status)
UNION ALL
SELECT ID, status, date_this_status FROM table1_history tah WHERE
NOT EXISTS(SELECT null FROM table1 tb WHERE
tb.id = tah.id
AND tb.date_this_status < tah.date_this_status)
AND NOT EXISTS(SELECT null FROM table1_history tbh WHERE
tbh.id = tah.id
AND tbh.date_this_status < tah.date_this_status)
Three underlying assumptions here:
Every ID you want back will have at least one record in at least one of the tables.
There won't be multiple records for the same ID in the same table with the same date_this_status value (can be mitigated by using DISTINCT)
There won't be records for the same ID in the other table with the same date_this_status value (can be mitigated by using UNION instead of UNION ALL)
There are two slight optimizations we can make:
If an ID has a record in TABLE1_HISTORY, it will always be older than the record in TABLE1 for that ID.
TABLE1 will never contain multiple records for the same ID (but the history table may).
So:
SELECT ID, status, date_this_status FROM table1 ta WHERE
NOT EXISTS(SELECT null FROM table1_history tbh WHERE
tbh.id = ta.id
)
UNION ALL
SELECT ID, status, date_this_status FROM table1_history tah WHERE
NOT EXISTS(SELECT null FROM table1_history tbh WHERE
tbh.id = tah.id
AND tbh.date_this_status < tah.date_this_status)

If that is the actual structure of your tables, you can't get a 100% accurate answer. The issue is that you can have 2 different statuses for the same (earliest) date for any given record, and you would not know which one was entered first, because you don't have a primary key on the history table.

Ignoring the "two tables" issues for a moment, I'd use the following logic...
SELECT
id, status, date
FROM
Table1_hist AS [data]
WHERE
[data].date = (SELECT MIN(date) FROM Table1_hist WHERE id = [data].id)
(EDIT: As per BlackTigerX's comment, this assumes no id can have more than one status with the same datetime.)
The simple way to extrapolate this to two tables is to use breitak67's answer. Replace all instances of "my_table" with subqueries that UNION the two tables together. A potential issue here is that of performance, as you may find that indexes become unusable.
One method of speeding this up could be to use implied knowledge:
1. The main table always has a record for each id.
2. The history table doesn't always have a record.
3. Any record in the history table is always 'older' than the one in main table.
SELECT
[main].id,
ISNULL([hist].status, [main].status),
ISNULL([hist].date, [main].date)
FROM
Table1 AS [main]
LEFT JOIN
(
SELECT
id, status, date
FROM
Table1_hist AS [data]
WHERE
[data].date = (SELECT MIN(date) FROM Table1_hist WHERE id = [data].id)
)
AS [hist]
ON [hist].id = [main].id
Find the oldest status for each id in the history table. (Can use its indexes)
LEFT JOIN that to the main table (which always has exactly one record for each id)
If [hist] contains a value, it's the older by definition
If the [hist] doesn't have a value, use the [main] value

Related

Max and Min records from 2 tables

I have 2 tables.
The 1st table has the columns fileID, createdate.
The 2nd table has userid, fileID, createdate as common fields along with other columns.
I am trying to write a query to find the latest fileid (max) and the first loaded fileid (min), based on createdate, for a specific userid. I join both tables, group by fileid and createdate, and filter the user id in the where clause.
However the result is showing all the rows.
I need a suggestion on how to write a query that returns only 2 records (the max and min fileid records) from these tables, not all the records.
I am using SQL Server to write the query.
Thanks for your help.
To select the fileid with the earliest or latest createdate, and to have them in two separate rows, you can try something like this (LIMIT is MySQL syntax; on SQL Server use TOP 1 instead):
(SELECT fileid, 'earliest' AS type FROM table1 WHERE createdate = (SELECT MIN(createdate) FROM table1) LIMIT 1)
UNION ALL
(SELECT fileid, 'latest' AS type FROM table1 WHERE createdate = (SELECT MAX(createdate) FROM table1) LIMIT 1)
It is not clear, why you want to join it with the second table, but you can do it like this:
SELECT
*
FROM
(
(SELECT fileid, 'earliest' AS type FROM table1 WHERE createdate = (SELECT
MIN(createdate) FROM table1) LIMIT 1)
UNION ALL
(SELECT fileid, 'latest' AS type FROM table1 WHERE createdate = (SELECT
MAX(createdate) FROM table1) LIMIT 1)
) AS subquery1
LEFT JOIN
table2 ON table2.fileid = subquery1.fileid

most efficient way to select duplicate rows with max timestamp

Suppose I have a table called t, which is like
id content time
1 'a' 100
1 'a' 101
1 'b' 102
2 'c' 200
2 'c' 201
The ids are duplicated, and for the same id, content can also be duplicated. Now I want to select, for each id, the row with the max timestamp, which would be
id content time
1 'b' 102
2 'c' 201
And this is my current solution:
select t1.id, t1.content, t1.time
from (
select id, content, time from t
) as t1
right join (
select id, max(time) as time from t group by id
) as t2
on t1.id = t2.id and t1.time = t2.time;
But this looks inefficient to me. Because theoretically when select id, max(time) as time from t group by id is executed, the rows I want have already been located. The right join brings extra O(n^2) time cost, which seems unnecessary.
So is there a more efficient way to do it, or is there anything that I misunderstand?
Use DISTINCT ON:
SELECT DISTINCT ON (id) id, content, time
FROM yourTable
ORDER BY id, time DESC;
On Postgres, this is usually the most performant way to write your query, and it should outperform ROW_NUMBER and other approaches.
The following index might speed up this query:
CREATE INDEX idx ON yourTable (id, time DESC, content);
This index, if used, would let Postgres rapidly find, for each id, the record having the latest time. This index also covers the content column.
Try this
SELECT a.id, a.content, a.time FROM t AS a
INNER JOIN (
SELECT id, MAX(time) AS time FROM t
GROUP BY id
) AS b ON a.id = b.id AND a.time = b.time
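As a quick sanity check of the "join against MAX(time) per id" idea from this thread, the sketch below loads the question's five rows and confirms that grouping by id returns exactly the two expected rows. It assumes SQLite through Python's sqlite3 rather than Postgres, and LATEST_SQL and latest_rows are names made up for the example.

```python
import sqlite3

# Assumption: in-memory SQLite stands in for the real database.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE t (id INTEGER, content TEXT, time INTEGER);
INSERT INTO t VALUES
  (1, 'a', 100),
  (1, 'a', 101),
  (1, 'b', 102),
  (2, 'c', 200),
  (2, 'c', 201);
""")

# The derived table finds the latest time per id; the join pulls back the
# full row (including content) that carries that time.
LATEST_SQL = """
SELECT a.id, a.content, a.time
FROM t AS a
INNER JOIN (
    SELECT id, MAX(time) AS time
    FROM t
    GROUP BY id
) AS b ON a.id = b.id AND a.time = b.time
ORDER BY a.id
"""

latest_rows = conn.execute(LATEST_SQL).fetchall()
for row in latest_rows:
    print(row)
```

Grouping by content instead of id would also return the row (1, 'a', 101), because 'a' and 'b' would each get their own maximum.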

Diff between two tables (using sql) -> incremental changes

I have a need to identify differences between two tables. I have looked at sql query to return differences between two tables but it was a bit too different for me to extrapolate with my current SQL skills.
Table A is a snapshot of a certain group of people where the snapshot was taken yesterday, where each row is a unique person and certain characteristics about the person. Table B is the same snapshot taken 24 hours later. Within the 24 hour period:
New people may have been added.
People from yesterday may have been removed.
People from yesterday may have changed (i.e., original row is there, but one or more column values have changed).
My output should have the following:
a row for each new person added
a row for each person removed
a row for each person who has changed
I would be grateful for any ideas. Thanks!
This type of problem has a very simple and efficient solution that does not use joins (it doesn't even use a union of the results of two MINUS operations) - it just uses one union and a GROUP BY operation. The solution was developed in a thread on AskTom many years ago, it is surprising that it is not more widely known and used. For example (but not only): https://asktom.oracle.com/pls/apex/f?p=100:11:0::::P11_QUESTION_ID:24371552251735
In your case, assuming there is a primary key constraint on PERSON_ID (which makes the solution simpler):
select max(flag) as flag, PERSON_ID, first_name, last_name, (etc. - all the columns)
from ( select 'old' as flag, t1.*
from old_table t1
union all
select 'new' as flag, t2.*
from new_table t2
)
group by PERSON_ID, first_name, last_name, (etc.)
having count(*) = 1
order by PERSON_ID -- optional
;
If for a PERSON_ID all the data is the same in both tables, that will result in a count of 2 for that group. So it won't pass the HAVING condition. The only groups that will have a count of 1 (and therefore will be just one row each!) are rows that are in one table but not the other. If a person was added, that will show only one row, with the flag = 'new'. If a person was deleted, you will get only one row, with the flag 'old'. If there were updates, the same PERSON_ID will appear twice, but since at least one field is different, the two rows (one with flag 'new' and the other with 'old') will be in different groups, they will pass the HAVING filter, and they will BOTH be in the output.
Which is slightly different from what you requested; you will get both the old AND the new information for updates, labeled as 'old' and 'new'. You said you wanted only one of those but didn't state which one. This will give you both (which makes more sense anyway), but if you really only want one, it can be done easily in the query above.
Note - the outer select must have max(flag) rather than flag because flag is not a GROUP BY column; but it's the max() over exactly one row, so it WILL be the flag for that row anyway.
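To see the UNION ALL / GROUP BY / HAVING COUNT(*) = 1 behaviour concretely, here is a small sketch that runs the basic query on the same sample people used in the test data further down in this answer. It assumes SQLite through Python's sqlite3 instead of Oracle, names the inner marker column src to keep the ORDER BY unambiguous, and DIFF_SQL and diff_rows are names invented for the example.

```python
import sqlite3

# Assumption: in-memory SQLite stands in for Oracle.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE old_table (person_id INTEGER PRIMARY KEY, first_name TEXT, last_name TEXT);
CREATE TABLE new_table (person_id INTEGER PRIMARY KEY, first_name TEXT, last_name TEXT);
INSERT INTO old_table VALUES
  (101, 'John', 'Smith'),
  (102, 'Mary', 'Green'),
  (103, 'July', 'Dobbs'),
  (104, 'Will', 'Scott');
INSERT INTO new_table VALUES
  (101, 'Joe',  'Smith'),
  (102, 'Mary', 'Green'),
  (104, 'Will', 'Scott'),
  (105, 'Andy', 'Brown');
""")

# Identical rows from both tables land in one group of size 2 and are filtered out;
# anything added, removed or changed forms a group of size 1 and survives.
DIFF_SQL = """
SELECT MAX(src) AS flag, person_id, first_name, last_name
FROM (
    SELECT 'old' AS src, person_id, first_name, last_name FROM old_table
    UNION ALL
    SELECT 'new' AS src, person_id, first_name, last_name FROM new_table
) AS both_tables
GROUP BY person_id, first_name, last_name
HAVING COUNT(*) = 1
ORDER BY person_id, flag
"""

diff_rows = conn.execute(DIFF_SQL).fetchall()
for row in diff_rows:
    print(row)
```

Person 101 shows up twice (once as 'new', once as 'old') because the first name changed, which is the "both rows for updates" behaviour described above.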
Added - OP indicated he would like to get only the "new" row for a person with updated (changed, modified) data. The approach shown below will change the flag to "changed" in this case.
with old_table ( person_id, first_name, last_name ) as (
select 101, 'John', 'Smith' from dual union all
select 102, 'Mary', 'Green' from dual union all
select 103, 'July', 'Dobbs' from dual union all
select 104, 'Will', 'Scott' from dual
),
new_table ( person_id, first_name, last_name ) as (
select 101, 'Joe' , 'Smith' from dual union all
select 102, 'Mary', 'Green' from dual union all
select 104, 'Will', 'Scott' from dual union all
select 105, 'Andy', 'Brown' from dual
)
-- end of test data; solution (SQL query) begins below this line
select case ct when 1 then flag else 'changed' end as flag,
person_id, first_name, last_name
from (
select max(flag) as flag, person_id, first_name, last_name,
count(*) over (partition by person_id) as ct,
row_number() over (partition by person_id order by max(flag)) as rn
from ( select 'old' as flag, t1.*
from old_table t1
union all
select 'new' as flag, t2.*
from new_table t2
)
group by person_id, first_name, last_name
having count(*) = 1
)
where rn = 1
order by person_id -- ORDER BY clause is optional
;
Output:
FLAG    PERSON_ID FIRST_NAME LAST_NAME
------- --------- ---------- ---------
changed       101 Joe        Smith
old           103 July       Dobbs
new           105 Andy       Brown
The first 2 parts are easy:
select 'New', name from B where not exists (select name from A where A.name=B.name)
union select 'Removed', name from A where not exists (select name from B where B.name = A.name)
The last one is where you need to compare characteristics. How many of them are there? Do you want to list what has changed or only that they have changed?
For argument's sake, let us only say that the characteristics are address and telephone #:
union select 'Phone', A.name from A, B where A.name = B.name and A.telephone != B.telephone
union select 'Address', A.name from A, B where A.name = B.name and A.address != B.address
Note: The question isn't currently tagged with the dbms. I use sql-server, so that's what I used to write the below. There may be slight differences in another dbms.
You can do something along these lines:
select *
from TableA a
right join TableB b on b.ID = a.ID
where a.ID is null -- added since yesterday (present in TableB only)
union
select *
from TableA a
left join TableB b on b.ID = a.ID
where b.ID is null -- removed since yesterday
union
select *
from TableA a
inner join TableB b on b.ID = a.ID -- restrict to records in both tables
where a.SomeValue <> b.SomeValue
or a.SomeOtherValue <> b.SomeOtherValue
--etc
Each select handles one portion of your expected output. In this manner, they'd all be joined into 1 result set. If you drop the union, you'll end up with a separate set for each.
I suggest to use Except to get the changed records. The below query should work if the db is sql server.
-- added since yesterday
SELECT B.*
FROM TableA A
RIGHT OUTER JOIN TableB B ON B.ID = A.ID
WHERE A.ID IS NULL
UNION
-- removed since yesterday
SELECT A.*
FROM TableA A
LEFT OUTER JOIN TableB B on B.ID = A.ID
WHERE B.ID IS NULL
UNION
-- changed since yesterday (values from the newer snapshot, TableB)
SELECT B.* FROM TableB B WHERE EXISTS(SELECT A.ID FROM TableA A WHERE A.ID = B.ID)
EXCEPT
SELECT A.* FROM TableA A WHERE EXISTS(SELECT B.ID FROM TableB B WHERE B.ID = A.ID)
Assuming you have a unique id for each person in the table, you can use full outer join:
select coalesce(ty.customerid, tt.customerid) as customerid,
(case when ty.customerid is null then 'New'
when tt.customerid is null then 'Removed'
else 'Modified'
end) as status
from tyesterday ty full outer join
ttoday tt
on ty.customerid= tt.customerid
where ty.customerid is null or
tt.customerid is null or
(tt.col1 <> ty.col1 or tt.col2 <> ty.col2 or . . . ); -- may need to take `NULL`s into account
mathguy provided a successful answer to my initial problem. I asked him for a revision (to make it even better). He provided a revision, but I am getting a "missing keyword" error when executing against my code. Here is my code:
select case when ct = 1 then flag else 'changed' as flag, PERSON_ID, FIRSTNAME, LASTNAME
from (
select max(flag), PERSON_ID, FIRSTNAME, LASTNAME
count() over (partition by PERSON_ID) as ct,
row_number() over (partition by PERSON_ID
order by case when flag = 'new' then 0 end) as rn
from ( select 'old' as flag, t1.*
from YESTERDAY_TABLE t1
union all
select 'new' as flag, t2.*
from TODAY_TABLE t2
)
group by PERSON_ID, FIRSTNAME, LASTNAME
having count(*) = 1
)
where rn = 1
order by PERSON_ID;

Count uid from two tables who look the same sort by tablename

Since I am not very good with more complex SQL SELECT statements, I thought I would just ask here, as it's hard to find something exactly on topic.
I have two tables that have exactly the same structure, like
TABLE A (id (INT(11)), time (VARCHAR(10));)
TABLE B (id (INT(11)), time (VARCHAR(10));)
Now I want a single SELECT to count the entries of a specific id in both tables.
SELECT COUNT(*) FROM TABLE A WHERE id = '1';
SELECT COUNT(*) FROM TABLE B WHERE id = '1';
So I thought it would be much better for database performance if I used one SELECT instead of two.
Thanks for helping out
SELECT COUNT(*) as count, 'tableA' as table_name FROM TABLEA WHERE id = '1'
union all
SELECT COUNT(*), 'tableB' FROM TABLEB WHERE id = '1'
If you want the separate counts in a single row, you can use subqueries
SELECT
(SELECT COUNT(*) FROM TABLEA WHERE id = '1') AS a_count,
(SELECT COUNT(*) FROM TABLEB WHERE id = '1') AS b_count;
You could do it like:
select count(*)
from (
select id from t1 where id = 1
union all
select id from t2 where id = 1
) as t
Another alternative is:
select sum(cnt)
from (
select count(*) as cnt from t1 where id = 1
union all
select count(*) as cnt from t2 where id = 1
) as t
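The scalar-subquery variant is easy to try out in a small script. This is only a sketch under a few assumptions: SQLite through Python's sqlite3 stands in for the real database, the tables are called table_a and table_b, the sample rows are invented, and count_id is a hypothetical helper name.

```python
import sqlite3

# Assumption: in-memory SQLite with invented sample rows.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE table_a (id INTEGER, time TEXT);
CREATE TABLE table_b (id INTEGER, time TEXT);
INSERT INTO table_a VALUES (1, '10:00'), (1, '11:00'), (2, '12:00');
INSERT INTO table_b VALUES (1, '09:00'), (3, '13:00');
""")

def count_id(connection, wanted_id):
    """Return (count in table_a, count in table_b, total) using one SELECT."""
    sql = """
    SELECT
      (SELECT COUNT(*) FROM table_a WHERE id = ?) AS a_count,
      (SELECT COUNT(*) FROM table_b WHERE id = ?) AS b_count
    """
    a_count, b_count = connection.execute(sql, (wanted_id, wanted_id)).fetchone()
    return a_count, b_count, a_count + b_count

print(count_id(conn, 1))
```

Passing the id as a bound parameter (the ? placeholders) avoids building the SQL string by hand.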

SQL - Getting Most Recent Date From Multiple Columns

Assume a rowset containing the following
EntryID Name DateModified DateDeleted
-----------------------------------------------
1 Name1 1/2/2003 NULL
2 Name1 1/3/2005 1/5/2008
3 Name1 1/3/2006 NULL
4 Name1 NULL NULL
5 Name1 3/5/2008 NULL
Clarification:
I need a single value - the largest non-null date from BOTH columns. So the largest of all ten cells in this case.
SELECT MAX(CASE WHEN (DateDeleted IS NULL OR DateModified > DateDeleted)
THEN DateModified ELSE DateDeleted END) AS MaxDate
FROM Table
For MySQL, Postgres or Oracle, use the GREATEST function:
SELECT GREATEST(IFNULL(t.datemodified, '1900-01-01 00:00:00'),
IFNULL(t.datedeleted, '1900-01-01 00:00:00'))
FROM TABLE t
Both Oracle and MySQL will return NULL if a NULL is provided. The example uses MySQL null handling - update accordingly for the appropriate database.
A database agnostic alternative is:
SELECT z.entryid,
MAX(z.dt)
FROM (SELECT x.entryid,
x.datemodified AS dt
FROM TABLE x
UNION ALL
SELECT y.entryid,
y.datedeleted AS dt
FROM TABLE y) z
GROUP BY z.entryid
As a general solution, you could try something like this:
select max(date_col)
from(
select max(date_col1) AS date_col from some_table
union
select max(date_col2) AS date_col from some_table
union
select max(date_col3) AS date_col from some_table
...
)
There might be easier ways, depending on what database you're using.
How about;
SELECT MAX(MX) FROM (
SELECT MAX(DateModified) AS MX FROM Tbl
UNION
SELECT MAX(DateDeleted) FROM Tbl
) T
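The "MAX of the per-column MAXes" query can be checked against the question's rowset with a short sketch. Assumptions here: SQLite through Python's sqlite3 stands in for the real database, the table is called tbl, the question's dates are read as month/day/year and stored as ISO text, and MAX_SQL and latest are names made up for the example.

```python
import sqlite3

# Assumption: in-memory SQLite; dates from the question converted to ISO text (m/d/y reading).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE tbl (EntryID INTEGER, Name TEXT, DateModified TEXT, DateDeleted TEXT);
INSERT INTO tbl VALUES
  (1, 'Name1', '2003-01-02', NULL),
  (2, 'Name1', '2005-01-03', '2008-01-05'),
  (3, 'Name1', '2006-01-03', NULL),
  (4, 'Name1', NULL,         NULL),
  (5, 'Name1', '2008-03-05', NULL);
""")

# Each branch reduces one column to its maximum (aggregate MAX skips NULLs);
# the outer MAX then picks the larger of the two values.
MAX_SQL = """
SELECT MAX(mx) FROM (
    SELECT MAX(DateModified) AS mx FROM tbl
    UNION ALL
    SELECT MAX(DateDeleted) FROM tbl
) AS both_columns
"""

latest = conn.execute(MAX_SQL).fetchone()[0]
print(latest)
```

Because the aggregate MAX ignores NULLs, no ISNULL/COALESCE wrapping is needed in this form.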
The answer depends on what you really want. If you simply want the most recent of the two date values then you can do:
Select Max(DateModified), Max(DateDeleted)
From Table
If you are asking for the largest value from either column, then you can simply do:
Select Case
When Max(DateModified) > Max(DateDeleted) Then Max(DateModified)
Else Max(DateDeleted)
End As MaxOfEitherValue
From Table
The above are all valid answers;
But I'm Not sure if this would work?
select IsNull((
select MAX(DateModified)
from table
)
,
(
select MAX(DateDeleted)
from table
)
) as MaxOfEitherValue
from table
Edit 1:
Whilst in the shower this morning, I had another solution:
Solution 2:
select MAX(v) from (
select MAX(DateModified) as v from table
union all
select MAX(DateDeleted) as v from table
) as SubTable
Edit 3:
Damn it, just spotted this is the same solution as Alex k. sigh...
How to find the Latest Date from the columns from Multiple tables
e.g. if the Firstname is in Table1, Address is in Table2, Phone is in Table3:
When you need this inside a main SELECT statement that also selects other columns, it is best written as:
SELECT Firstname
,Lastname
,Address
,PhoneNumber
,(SELECT max(T.date_col) from(select max(date_col1) AS date_col from Table1 Where ..
union
select max(date_col2) AS date_col from Table2 Where..
union
select max(date_col3) AS date_col from Table3 Where..
) AS T
) AS Last_Updated_Date
FROM Table1 T1
LEFT JOIN Table2 T2 ON T1.Common_Column=T2.Common_Column
LEFT JOIN Table3 T3 ON T1.Common_Column=T3.Common_Column