Selecting most recent and specific version in each group of records, for multiple groups - sql

The problem:
I have a table that records data rows in foo. Each time the row is updated, a new row is inserted along with a revision number. The table looks like:
id rev field
1 1 test1
2 1 fsdfs
3 1 jfds
1 2 test2
Note: the last record is a newer version of the first row.
Is there an efficient way to query for the latest version of a record and for a specific version of a record?
For instance, a query for rev=2 would return the 2, 3 and 4th row (not the replaced 1st row though) while a query for rev=1 yields those rows with rev <= 1 and in case of duplicated ids, the one with the higher revision number is chosen (record: 1, 2, 3).
I would not prefer to return the result in an iterative way.

To get only latest revisions:
SELECT * from t t1
WHERE t1.rev =
(SELECT max(rev) FROM t t2 WHERE t2.id = t1.id)
To get a specific revision, in this case 1 (and if an item doesn't have the revision yet the next smallest revision):
SELECT * from foo t1
WHERE t1.rev =
(SELECT max(rev)
FROM foo t2
WHERE t2.id = t1.id
AND t2.rev <= 1)
It might not be the most efficient way to do this, but right now I cannot figure a better way to do this.

Here's an alternative solution that incurs an update cost but is much more efficient for reading the latest data rows as it avoids computing MAX(rev). It also works when you're doing bulk updates of subsets of the table. I needed this pattern to ensure I could efficiently switch to a new data set that was updated via a long running batch update without any windows of time where we had partially updated data visible.
Aging
Replace the rev column with an age column
Create a view of the current latest data with filter: age = 0
To create a new version of your data ...
INSERT: new rows with age = -1 - This was my slow long running batch process.
UPDATE: UPDATE table-name SET age = age + 1 for all rows in the subset. This switches the view to the new latest data (age = 0) and also ages older data in a single transaction.
DELETE: rows having age > N in the subset - Optionally purge old data
Indexing
Create a composite index with age and then id so the view will be nice and fast and can also be used to look up by id. Although this key is effectively unique, its temporarily non-unique when you're ageing the rows (during UPDATE SET age=age+1) so you'll need to make it non-unique and ideally the clustered index. If you need to find all versions of a given id ordered by age, you may need an additional non-unique index on id then age.
Rollback
Finally ... Lets say you're having a bad day and the batch processing breaks. You can quickly revert to a previous data set version by running:
UPDATE table-name SET age = age - 1 -- Roll back a version
DELETE table-name WHERE age < 0 -- Clean up bad stuff
Existing Table
Suppose you have an existing table that now needs to support aging. You can use this pattern by first renaming the existing table, then add the age column and indexing and then create the view that includes the age = 0 condition with the same name as the original table name.
This strategy may or may not work depending on the nature of technology layers that depended on the original table but in many cases swapping a view for a table should drop in just fine.
Notes
I recommend naming the age column to RowAge in order to indicate this pattern is being used, since it's clearer that its a database related value and it complements SQL Server's RowVersion naming convention. It also won't conflict with a column or view that needs to return a person's age.
Unlike other solutions, this pattern works for non SQL Server databases.
If the subsets you're updating are very large then this might not be a good solution as your final transaction will update not just the current records but all past version of the records in this subset (which could even be the entire table!) so you may end up locking the table.

This is how I would do it. ROW_NUMBER() requires SQL Server 2005 or later
Sample data:
DECLARE #foo TABLE (
id int,
rev int,
field nvarchar(10)
)
INSERT #foo VALUES
( 1, 1, 'test1' ),
( 2, 1, 'fdsfs' ),
( 3, 1, 'jfds' ),
( 1, 2, 'test2' )
The query:
DECLARE #desiredRev int
SET #desiredRev = 2
SELECT * FROM (
SELECT
id,
rev,
field,
ROW_NUMBER() OVER (PARTITION BY id ORDER BY rev DESC) rn
FROM #foo WHERE rev <= #desiredRev
) numbered
WHERE rn = 1
The inner SELECT returns all relevant records, and within each id group (that's the PARTITION BY), computes the row number when ordered by descending rev.
The outer SELECT just selects the first member (so, the one with highest rev) from each id group.
Output when #desiredRev = 2 :
id rev field rn
----------- ----------- ---------- --------------------
1 2 test2 1
2 1 fdsfs 1
3 1 jfds 1
Output when #desiredRev = 1 :
id rev field rn
----------- ----------- ---------- --------------------
1 1 test1 1
2 1 fdsfs 1
3 1 jfds 1

If you want all the latest revisions of each field, you can use
SELECT C.rev, C.fields FROM (
SELECT MAX(A.rev) AS rev, A.id
FROM yourtable A
GROUP BY A.id)
AS B
INNER JOIN yourtable C
ON B.id = C.id AND B.rev = C.rev
In the case of your example, that would return
rev field
1 fsdfs
1 jfds
2 test2

SELECT
MaxRevs.id,
revision.field
FROM
(SELECT
id,
MAX(rev) AS MaxRev
FROM revision
GROUP BY id
) MaxRevs
INNER JOIN revision
ON MaxRevs.id = revision.id AND MaxRevs.MaxRev = revision.rev

SELECT foo.* from foo
left join foo as later
on foo.id=later.id and later.rev>foo.rev
where later.id is null;

How about this?
select id, max(rev), field from foo group by id
For querying specific revision e.g. revision 1,
select id, max(rev), field from foo where rev <= 1 group by id

Related

How to compare records and update one column

It is a little bit tricky for me so I need your help :)
I want to update the column Relevant to 0 WHERE Contract_Status_Code is 10 OR the Date_Contract_start YEAR is the same AND the Ranking_Value is lower than the other one ON all records that have the same VIN.
So I want to compare all records which have the same VIN.
Few examples to illustrate it:
I have there two records with the VIN = 123456. One of them (ID = 6847) has a higher Ranking_Value (7) than the other one (3). The YEAR is the same as well so I want to update the column relevant to 0 where the ID is 8105.
Two records with the VIN = 654321. Both of them have the same Ranking_Value but the record with the id = 11012 has the value 10 for the column Contract_Status_Code so I want to update the relevant column to 0 where the ID = 11012.
The last two records... They have the VIN = 171819. The first one (ID = 11578) has the higher Ranking_Value. But they have a different year where the contract has started. So I don't want to update both.
It is also possible that there are three or four records with the same VIN.
I hope you understand my problem. I'm from Germany so sorry for my English :)
By considering your ID column as unique or Identity column, I can suggest you the below query for your solution:
With cte
As
(Select a.Id, a.VIN From Table a
Join (Select max(Ranking_Value) ranks,VIN From Table Group By VIN, Year(Date_Contract_start)) b
on a.VIN=b.VIN And a.Ranking_Value = b.ranks)
update table
set Relevant = 0
where (Contract_Status_Code = 10) Or
ID Not In (Select id from cte)
update table1
set Relevant = 0
where Contract_Status_Code = 10
or (VIN,Year,Ranking_value) not in(
select VIN,Year,max(Ranking_Value)
from table1
group by VIN,Year
)

How to update with incrementing value

I have a table in PostgreSQL that has an ID column that is supposed to be unique. However, a large number of the rows (around 3 million) currently have an ID of "1".
What I know:
The total number of rows
The current maximum value for the ID column
The number of rows with an (incorrect) ID of "1"
What I need is a query that will pull all the rows with an ID of "1" and assign them a new ID that increments automatically so that every row in the table will have a unique ID. I'd like it to start at the currentMaxId + 1 and assign each row the subsequent ID.
This is the closest I've gotten with a query:
UPDATE table_name
SET id = (
SELECT max(id) FROM table_name
) + 1
WHERE id = '1'
The problem with this is that the inner SELECT only runs the first time, thus setting the ID of the rows in question to the original max(id) + 1, not the new max(id) + 1 every time, giving me the same problem I'm trying to solve.
Any suggestions on how to tweak this query to achieve my desired result or an alternative method would be greatly appreciated!
You may do it step by step with a temporary sequence.
1) creation
create temporary sequence seq_upd;
2) set it to the proper initial value
select setval('seq_upd', (select max(id) from table_name));
3) update
update table_name set id=nextval('seq_upd') where id=1;
If you are going to work with a SEQUENCE, consider the serial pseudo data type for you id. Then you can just draw nextval() from the "owned" (not temporary) sequence, which will then be up to date automatically.
If you don't want that, you can fall back to using the ctid and row_number() for a one-time numbering:
UPDATE tbl t
SET id = x.max_id + u.rn
FROM (SELECT max(id) AS max_id FROM tbl) x
, (SELECT ctid, row_number() OVER (ORDER BY ctid) AS rn
FROM tbl WHERE id = 1) u
WHERE t.ctid = u.ctid;
Related answer on dba.SE:
numbering rows consecutively for a number of tables

Getting the newest revision from a table via SQL

I have a database that looks like this:
ID parent name description _record_status _log_user _log_timestamp _log_type
--------------------------------------------------------------------------------------------------------------------
1 1 This is a rwo! Test content active 1 2012-01-29 15:49:21 create
2 1 This is a row! Test Content active 1 2012-01-29 15:52:14 revision
3 3 Another record! More content active 1 2012-01-29 15:58:43 create
4 4 Deleted Record More content active 1 2012-01-29 15:58:43 create
5 4 Deleted Record More content deleted 1 2012-01-29 15:58:43 destroy
I want to be able to select the newest row for each record, where the record isn't deleted. So for example, the output I'd expect is:
ID name
------------------
2 This is a row!
3 Another Record!
Is there a way to do this via SQL that is efficient, and if not, what might I want to do in PHP to accomplish this?
Would having a separate version of each table for revisions be the way to go here?
You'd get the highest record per parent first, then excluded anything that has been deleted:
SELECT YourTable.ID, YourTable.name
FROM YourTable INNER JOIN (
SELECT parent, MAX(_log_timestamp) MaxLogTS
FROM YourTable
GROUP BY parent) T ON YourTable.parent = T.parent
AND YourTable._log_timestamp = T.MaxLogTS
WHERE YourTable._record_status != 'deleted'
This can be slightly optimized if you know that ID values are always in ascending time order. Then you could base the MAX record comparison on ID rather than a date and time value.
Assuming you are using MySQL, try this:
select * from
(select id, name, _record_status from
(select parent, id, name, _record_status
from your_table
order by parent, _log_timestamp desc) v
group by parent) q
where _record_status <> 'deleted'

Getting the last record in SQL in WHERE condition

i have loanTable that contain two field loan_id and status
loan_id status
==============
1 0
2 9
1 6
5 3
4 5
1 4 <-- How do I select this??
4 6
In this Situation i need to show the last Status of loan_id 1 i.e is status 4. Can please help me in this query.
Since the 'last' row for ID 1 is neither the minimum nor the maximum, you are living in a state of mild confusion. Rows in a table have no order. So, you should be providing another column, possibly the date/time when each row is inserted, to provide the sequencing of the data. Another option could be a separate, automatically incremented column which records the sequence in which the rows are inserted. Then the query can be written.
If the extra column is called status_id, then you could write:
SELECT L1.*
FROM LoanTable AS L1
WHERE L1.Status_ID = (SELECT MAX(Status_ID)
FROM LoanTable AS L2
WHERE L2.Loan_ID = 1);
(The table aliases L1 and L2 could be omitted without confusing the DBMS or experienced SQL programmers.)
As it stands, there is no reliable way of knowing which is the last row, so your query is unanswerable.
Does your table happen to have a primary id or a timestamp? If not then what you want is not really possible.
If yes then:
SELECT TOP 1 status
FROM loanTable
WHERE loan_id = 1
ORDER BY primaryId DESC
-- or
-- ORDER BY yourTimestamp DESC
I assume that with "last status" you mean the record that was inserted most recently? AFAIK there is no way to make such a query unless you add timestamp into your table where you store the date and time when the record was added. RDBMS don't keep any internal order of the records.
But if last = last inserted, that's not possible for current schema, until a PK addition:
select top 1 status, loan_id
from loanTable
where loan_id = 1
order by id desc -- PK
Use a data reader. When it exits the while loop it will be on the last row. As the other posters stated unless you put a sort on the query, the row order could change. Even if there is a clustered index on the table it might not return the rows in that order (without a sort on the clustered index).
SqlDataReader rdr = SQLcmd.ExecuteReader();
while (rdr.Read())
{
}
string lastVal = rdr[0].ToString()
rdr.Close();
You could also use a ROW_NUMBER() but that requires a sort and you cannot use ROW_NUMBER() directly in the Where. But you can fool it by creating a derived table. The rdr solution above is faster.
In oracle database this is very simple.
select * from (select * from loanTable order by rownum desc) where rownum=1
Hi if this has not been solved yet.
To get the last record for any field from a table the easiest way would be to add an ID to each record say pID. Also say that in your table you would like to hhet the last record for each 'Name', run the simple query
SELECT Name, MAX(pID) as LastID
INTO [TableName]
FROM [YourTableName]
GROUP BY [Name]/[Any other field you would like your last records to appear by]
You should now have a table containing the Names in one column and the last available ID for that Name.
Now you can use a join to get the other details from your primary table, say this is some price or date then run the following:
SELECT a.*,b.Price/b.date/b.[Whatever other field you want]
FROM [TableName] a LEFT JOIN [YourTableName]
ON a.Name = b.Name and a.LastID = b.pID
This should then give you the last records for each Name, for the first record run the same queries as above just replace the Max by Min above.
This should be easy to follow and should run quicker as well
If you don't have any identifying columns you could use to get the insert order. You can always do it like this. But it's hacky, and not very pretty.
select
t.row1,
t.row2,
ROW_NUMBER() OVER (ORDER BY t.[count]) AS rownum from (
select
tab.row1,
tab.row2,
1 as [count]
from table tab) t
So basically you get the 'natural order' if you can call it that, and add some column with all the same data. This can be used to sort by the 'natural order', giving you an opportunity to place a row number column on the next query.
Personally, if the system you are using hasn't got a time stamp/identity column, and the current users are using the 'natural order', I would quickly add a column and use this query to create some sort of time stamp/incremental key. Rather than risking having some automation mechanism change the 'natural order', breaking the data needed.
I think this code may help you:
WITH cte_Loans
AS
(
SELECT LoanID
,[Status]
,ROW_NUMBER() OVER(ORDER BY (SELECT 1)) AS RN
FROM LoanTable
)
SELECT LoanID
,[Status]
FROM LoanTable L1
WHERE RN = ( SELECT max(RN)
FROM LoanTable L2
WHERE L2.LoanID = L1.LoanID)

How to get one common value from Database using UNION

2 records in above image are from Db, in above table Constraint are (SID and LINE_ITEM_ID),
SID and LINE_ITEM_ID both column are used to find a unique record.
My issues :
I am looking for a query it should fetch the recored from DB depending on conditions
if i search for PART_NUMBER = 'PAU43-IMB-P6'
1. it should fetch one record from DB if search for PART_NUMBER = 'PAU43-IMB-P6', no mater to which SID that item belong to if there is only one recored either under SID =1 or SID = 2.
2. it should fetch one record which is under SID = 2 only, from DB on search for PART_NUMBER = 'PAU43-IMB-P6', if there are 2 items one in SID=1 and other in SID=2.
i am looking for a query which will search for a given part_number depending on Both SID 1 and 2, and it should return value under SID =2 and it can return value under SID=1 only if the there are no records under SID=2 (query has to withstand a load of Million record search).
Thank you
Select *
from Table
where SID||LINE_ITEM_ID = (
select Max(SID)||Max(LINE_ITEM_ID)
from table
where PART_NUMBER = 'PAU43-IMB-P6'
);
If I understand correctly, for each considered LINE_ITEM_ID you want to return only the one with the largest value for SID. This is a common requirement and, as with most things in SQL, can be written in many different ways; the best performing will depend on many factors, not least of which is the SQL product you are using.
Here's one possible approach:
SELECT DISTINCT * -- use a column list
FROM YourTable AS T1
INNER JOIN (
SELECT T2.LINE_ITEM_ID,
MAX(T2.SID) AS max_SID
FROM YourTable AS T2
GROUP
BY T2.LINE_ITEM_ID
) AS DT1 (LINE_ITEM_ID, max_SID)
ON T1.LINE_ITEM_ID = DT1.LINE_ITEM_ID
AND T1.SID = DT1.max_SID;
That said, I don't recall seeing one that relies on the UNION relational operator. You could easily rewrite the above using the INTERSECT relational operator but it would be more verbose.
Well in my case it worked something like this:
select LINE_ITEM_ID,SID,price_1,part_number from (
(select LINE_ITEM_ID,SID,price_1,part_number from Table where SID = 2)
UNION
(select LINE_ITEM_ID,SID,price_1,part_number from Table SID = 1 and line_item_id NOT IN (select LINE_ITEM_ID,SID,price_1,part_number from Table SID = 2)))
This query solved my issue..........