Getting the newest revision from a table via SQL - sql

I have a table that looks like this:
ID  parent  name             description   _record_status  _log_user  _log_timestamp       _log_type
----------------------------------------------------------------------------------------------------
1   1       This is a rwo!   Test content  active          1          2012-01-29 15:49:21  create
2   1       This is a row!   Test Content  active          1          2012-01-29 15:52:14  revision
3   3       Another record!  More content  active          1          2012-01-29 15:58:43  create
4   4       Deleted Record   More content  active          1          2012-01-29 15:58:43  create
5   4       Deleted Record   More content  deleted         1          2012-01-29 15:58:43  destroy
I want to be able to select the newest row for each record, where the record isn't deleted. So for example, the output I'd expect is:
ID name
------------------
2 This is a row!
3 Another Record!
Is there a way to do this via SQL that is efficient, and if not, what might I want to do in PHP to accomplish this?
Would having a separate version of each table for revisions be the way to go here?

You'd get the highest record per parent first, then exclude anything that has been deleted:
SELECT YourTable.ID, YourTable.name
FROM YourTable INNER JOIN (
SELECT parent, MAX(_log_timestamp) MaxLogTS
FROM YourTable
GROUP BY parent) T ON YourTable.parent = T.parent
AND YourTable._log_timestamp = T.MaxLogTS
WHERE YourTable._record_status != 'deleted'
This can be slightly optimized if you know that ID values are always in ascending time order. Then you could base the MAX record comparison on ID rather than a date and time value.
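As a rough, self-contained check of the ID-based variant mentioned above, the sketch below loads the question's sample rows into an in-memory SQLite database through Python's sqlite3 module. The table name records and the choice of SQLite are assumptions made for illustration only. Comparing on ID also sidesteps a quirk of the sample data: rows 4 and 5 share the same _log_timestamp, so a join on the maximum timestamp would match both and let the active row 4 through.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE records (          -- hypothetical table name
        ID INTEGER PRIMARY KEY,
        parent INTEGER,
        name TEXT,
        description TEXT,
        _record_status TEXT,
        _log_user INTEGER,
        _log_timestamp TEXT,
        _log_type TEXT
    )
    """
)
# Sample rows copied from the question.
rows = [
    (1, 1, "This is a rwo!", "Test content", "active", 1, "2012-01-29 15:49:21", "create"),
    (2, 1, "This is a row!", "Test Content", "active", 1, "2012-01-29 15:52:14", "revision"),
    (3, 3, "Another record!", "More content", "active", 1, "2012-01-29 15:58:43", "create"),
    (4, 4, "Deleted Record", "More content", "active", 1, "2012-01-29 15:58:43", "create"),
    (5, 4, "Deleted Record", "More content", "deleted", 1, "2012-01-29 15:58:43", "destroy"),
]
conn.executemany("INSERT INTO records VALUES (?, ?, ?, ?, ?, ?, ?, ?)", rows)

# Pick the highest ID per parent (assumes IDs ascend with time),
# then drop the groups whose newest row is marked deleted.
query = """
    SELECT r.ID, r.name
    FROM records AS r
    INNER JOIN (
        SELECT parent, MAX(ID) AS MaxID
        FROM records
        GROUP BY parent
    ) AS t ON r.ID = t.MaxID
    WHERE r._record_status != 'deleted'
    ORDER BY r.ID
"""
latest = conn.execute(query).fetchall()
print(latest)
```

With the sample data this leaves rows 2 and 3, which matches the output the question asks for.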

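Assuming you are using MySQL, try this (it relies on MySQL's non-standard GROUP BY extension, which does not guarantee that the first row of each group is the one returned, and it is rejected when ONLY_FULL_GROUP_BY is enabled):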
select * from
(select id, name, _record_status from
(select parent, id, name, _record_status
from your_table
order by parent, _log_timestamp desc) v
group by parent) q
where _record_status <> 'deleted'

Related

In SQL, is there something like COALESCE() to qualify Column values which are NOT NULL?

I have an "Order" table which has a "Status" column. Values under Status = ('Completed','Sold','In-Process','Released','NotStarted').
Even though there is no sequence or hierarchy for that in the table, we can perceive the sequence as below.
1 NotStarted
2 Released
3 In-Process
4 Sold
5 Completed
So 'Completed' is the highest status, and each order goes through these statuses until it is completed. If an order is not completed yet, it should be in one of the other statuses.
When I filter on Completed, I miss the other records. When I include all statuses, I get multiple records for the same order, such as 1 record for Released, 1 record for In-Process, etc. (i.e., the various stages of the order).
select * from OrderTable
where Status = 'Completed'
I want to have the ability to do something like this --
COALESCE(Completed,Sold,In-Process,Released,NotStarted, NULL)
In other words, I want to get the highest record for that status and only 1 record for each order.
Is this possible in Sql?
One option is to use ROW_NUMBER() with a Case expression to establish your ordering.
SELECT *
FROM
(
SELECT OrderNumber,
Status,
ROW_NUMBER() OVER (PARTITION BY OrderNumber ORDER BY
CASE Status
WHEN 'NotStarted' THEN 1
WHEN 'Released' THEN 2
WHEN 'In-Process' THEN 3
WHEN 'Sold' THEN 4
WHEN 'Completed' THEN 5
END DESC) as Order_Status_Rank
FROM OrderTable
) dt
WHERE Order_Status_Rank = 1;
If you used the numbers shown in your list instead of the text, all you would have to do is select Top 1 and sort on the number. You can use another table to store the number/entry data. By the way, this would also prevent erroneous data entry.
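The lookup-table idea can be sketched as follows, using an in-memory SQLite database from Python purely for illustration. The names StatusLookup, Seq and StatusSeq, and the two sample orders, are hypothetical and not taken from the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    -- Hypothetical lookup table: one row per status, numbered in sequence.
    CREATE TABLE StatusLookup (Seq INTEGER PRIMARY KEY, Status TEXT UNIQUE);
    INSERT INTO StatusLookup VALUES
        (1, 'NotStarted'), (2, 'Released'), (3, 'In-Process'),
        (4, 'Sold'), (5, 'Completed');

    -- Orders store the number instead of the text. REFERENCES documents the
    -- link; SQLite enforces it only when PRAGMA foreign_keys is ON.
    CREATE TABLE OrderTable (
        OrderNumber INTEGER,
        StatusSeq INTEGER REFERENCES StatusLookup (Seq)
    );
    -- Made-up sample data: order 100 is completed, order 200 is in process.
    INSERT INTO OrderTable VALUES
        (100, 1), (100, 2), (100, 5),
        (200, 1), (200, 3);
    """
)

# Highest status number per order, translated back to its text.
query = """
    SELECT best.OrderNumber, l.Status
    FROM (
        SELECT OrderNumber, MAX(StatusSeq) AS TopSeq
        FROM OrderTable
        GROUP BY OrderNumber
    ) AS best
    JOIN StatusLookup AS l ON l.Seq = best.TopSeq
    ORDER BY best.OrderNumber
"""
highest = conn.execute(query).fetchall()
print(highest)
```

Because the ordering lives in the data rather than in a CASE expression, adding a new status only needs a new row in the lookup table.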

SQL Server ID reseed after deletion of a middle record

I have a table from which I delete records.
The problem is that when I delete a record, its ID goes with it, so the IDs in the table are no longer consecutive.
What I want is a SQL Server procedure that renumbers the records after one of them is deleted.
Example :
ID                                                   ID                      ID
1                                                    1                       1
2   I delete record 2, I want to have this ===>      2    and NOT this :     3
3                                                    3                       4
4                                                    4                       5
5
You don't want to do this. The id should be a field that has no meaning other than identifying a row. You might have other tables that refer to the id and they would break.
Instead, just recalculate a sequential value when you query the table:
select t.*, row_number() over (order by id) as seqnum
from t;
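A minimal sketch of that idea, run against an in-memory SQLite database from Python: the stored ids keep their gap after a delete, while the sequence number is computed at query time. The name column and the five sample rows are made up, and ROW_NUMBER() assumes SQLite 3.25 or later.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (id INTEGER PRIMARY KEY, name TEXT)")
# Hypothetical rows with ids 1..5.
conn.executemany("INSERT INTO t VALUES (?, ?)", [(i, f"row {i}") for i in range(1, 6)])

# Delete a middle record; the remaining ids are left untouched.
conn.execute("DELETE FROM t WHERE id = 2")

# The gap-free numbering is derived when reading, not stored.
numbered = conn.execute(
    "SELECT id, ROW_NUMBER() OVER (ORDER BY id) AS seqnum FROM t ORDER BY id"
).fetchall()
print(numbered)
```

Other tables can keep referring to the stable id, and only the display layer uses seqnum.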

Selecting most recent and specific version in each group of records, for multiple groups

The problem:
I have a table that records data rows in foo. Each time the row is updated, a new row is inserted along with a revision number. The table looks like:
id rev field
1 1 test1
2 1 fsdfs
3 1 jfds
1 2 test2
Note: the last record is a newer version of the first row.
Is there an efficient way to query for the latest version of a record and for a specific version of a record?
For instance, a query for rev=2 would return the 2nd, 3rd and 4th rows (but not the replaced 1st row), while a query for rev=1 would return the rows with rev <= 1, choosing the row with the highest revision number when an id is duplicated (rows 1, 2 and 3).
I would prefer not to build the result iteratively.
To get only latest revisions:
SELECT * from foo t1
WHERE t1.rev =
(SELECT max(rev) FROM foo t2 WHERE t2.id = t1.id)
To get a specific revision, in this case 1 (and if an item doesn't have the revision yet the next smallest revision):
SELECT * from foo t1
WHERE t1.rev =
(SELECT max(rev)
FROM foo t2
WHERE t2.id = t1.id
AND t2.rev <= 1)
It might not be the most efficient way to do this, but right now I cannot figure a better way to do this.
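To make the behaviour of the correlated subquery concrete, here is a small sketch that runs it against the question's four sample rows in an in-memory SQLite database. The helper name rows_as_of is hypothetical, and SQLite stands in for whatever database is actually in use.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE foo (id INTEGER, rev INTEGER, field TEXT)")
# Sample rows from the question.
conn.executemany(
    "INSERT INTO foo VALUES (?, ?, ?)",
    [(1, 1, "test1"), (2, 1, "fsdfs"), (3, 1, "jfds"), (1, 2, "test2")],
)


def rows_as_of(conn, rev):
    """Return, for each id, the row with the highest revision that is <= rev."""
    query = """
        SELECT t1.id, t1.rev, t1.field
        FROM foo AS t1
        WHERE t1.rev = (
            SELECT MAX(t2.rev)
            FROM foo AS t2
            WHERE t2.id = t1.id
              AND t2.rev <= ?
        )
        ORDER BY t1.id
    """
    return conn.execute(query, (rev,)).fetchall()


print(rows_as_of(conn, 2))  # id 1 resolves to its rev 2 row
print(rows_as_of(conn, 1))  # id 1 falls back to its rev 1 row
```

Passing a revision at least as large as the highest one in the table gives the same result as the "latest revisions" query.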
Here's an alternative solution that incurs an update cost but is much more efficient for reading the latest data rows as it avoids computing MAX(rev). It also works when you're doing bulk updates of subsets of the table. I needed this pattern to ensure I could efficiently switch to a new data set that was updated via a long running batch update without any windows of time where we had partially updated data visible.
Aging
Replace the rev column with an age column
Create a view of the current latest data with filter: age = 0
To create a new version of your data ...
INSERT: new rows with age = -1 - This was my slow long running batch process.
UPDATE: UPDATE table-name SET age = age + 1 for all rows in the subset. This switches the view to the new latest data (age = 0) and also ages older data in a single transaction.
DELETE: rows having age > N in the subset - Optionally purge old data
Indexing
Create a composite index with age and then id so the view will be nice and fast and can also be used to look up by id. Although this key is effectively unique, it's temporarily non-unique when you're ageing the rows (during UPDATE SET age=age+1) so you'll need to make it non-unique and ideally the clustered index. If you need to find all versions of a given id ordered by age, you may need an additional non-unique index on id then age.
Rollback
Finally ... Let's say you're having a bad day and the batch processing breaks. You can quickly revert to a previous data set version by running:
UPDATE table-name SET age = age - 1 -- Roll back a version
DELETE table-name WHERE age < 0 -- Clean up bad stuff
Existing Table
Suppose you have an existing table that now needs to support aging. You can use this pattern by first renaming the existing table, then add the age column and indexing and then create the view that includes the age = 0 condition with the same name as the original table name.
This strategy may or may not work depending on the nature of technology layers that depended on the original table but in many cases swapping a view for a table should drop in just fine.
Notes
I recommend naming the age column RowAge in order to indicate this pattern is being used, since it's clearer that it's a database related value and it complements SQL Server's RowVersion naming convention. It also won't conflict with a column or view that needs to return a person's age.
Unlike other solutions, this pattern works for non SQL Server databases.
If the subsets you're updating are very large then this might not be a good solution, as your final transaction will update not just the current records but all past versions of the records in this subset (which could even be the entire table!) so you may end up locking the table.
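The aging steps above can be walked through end to end with a small sketch. It uses an in-memory SQLite database from Python only as a stand-in; the names foo_versions, ix_foo_versions_age_id and the helper current are hypothetical, and the subset being updated is assumed to be the single id 1.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    -- Base table holds every version; RowAge = 0 marks the current one.
    CREATE TABLE foo_versions (id INTEGER, RowAge INTEGER, field TEXT);
    -- Non-unique composite index on age, then id.
    CREATE INDEX ix_foo_versions_age_id ON foo_versions (RowAge, id);
    -- The view exposes only the current rows.
    CREATE VIEW foo AS SELECT id, field FROM foo_versions WHERE RowAge = 0;
    INSERT INTO foo_versions VALUES (1, 0, 'test1'), (2, 0, 'fsdfs'), (3, 0, 'jfds');
    """
)


def current(conn):
    """Read the current data set through the view."""
    return conn.execute("SELECT id, field FROM foo ORDER BY id").fetchall()


before = current(conn)

# INSERT: stage the new version with RowAge = -1; readers do not see it yet.
conn.execute("INSERT INTO foo_versions VALUES (1, -1, 'test2')")
staged = current(conn)

# UPDATE: age the whole subset in one transaction to switch versions.
with conn:
    conn.execute("UPDATE foo_versions SET RowAge = RowAge + 1 WHERE id = 1")
after_switch = current(conn)

# Rollback: step the subset back one version and discard the bad rows.
with conn:
    conn.execute("UPDATE foo_versions SET RowAge = RowAge - 1 WHERE id = 1")
    conn.execute("DELETE FROM foo_versions WHERE RowAge < 0")
after_rollback = current(conn)

print(before, staged, after_switch, after_rollback, sep="\n")
```

The view never shows a half-loaded version: the staged row stays invisible until the single UPDATE flips it to RowAge = 0.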
This is how I would do it. ROW_NUMBER() requires SQL Server 2005 or later
Sample data:
DECLARE @foo TABLE (
id int,
rev int,
field nvarchar(10)
)
INSERT @foo VALUES
( 1, 1, 'test1' ),
( 2, 1, 'fdsfs' ),
( 3, 1, 'jfds' ),
( 1, 2, 'test2' )
The query:
DECLARE @desiredRev int
SET @desiredRev = 2
SELECT * FROM (
SELECT
id,
rev,
field,
ROW_NUMBER() OVER (PARTITION BY id ORDER BY rev DESC) rn
FROM @foo WHERE rev <= @desiredRev
) numbered
WHERE rn = 1
The inner SELECT returns all relevant records, and within each id group (that's the PARTITION BY), computes the row number when ordered by descending rev.
The outer SELECT just selects the first member (so, the one with highest rev) from each id group.
Output when @desiredRev = 2 :
id rev field rn
----------- ----------- ---------- --------------------
1 2 test2 1
2 1 fdsfs 1
3 1 jfds 1
Output when @desiredRev = 1 :
id rev field rn
----------- ----------- ---------- --------------------
1 1 test1 1
2 1 fdsfs 1
3 1 jfds 1
If you want the latest revision of each record, you can use
SELECT C.rev, C.field FROM (
SELECT MAX(A.rev) AS rev, A.id
FROM yourtable A
GROUP BY A.id)
AS B
INNER JOIN yourtable C
ON B.id = C.id AND B.rev = C.rev
In the case of your example, that would return
rev field
1 fsdfs
1 jfds
2 test2
SELECT
MaxRevs.id,
revision.field
FROM
(SELECT
id,
MAX(rev) AS MaxRev
FROM revision
GROUP BY id
) MaxRevs
INNER JOIN revision
ON MaxRevs.id = revision.id AND MaxRevs.MaxRev = revision.rev
SELECT foo.* from foo
left join foo as later
on foo.id=later.id and later.rev>foo.rev
where later.id is null;
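How about this? (Caveat: field is neither grouped nor aggregated, so most databases reject this query, and MySQL with ONLY_FULL_GROUP_BY disabled may take field from any row of the group.)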
select id, max(rev), field from foo group by id
For querying specific revision e.g. revision 1,
select id, max(rev), field from foo where rev <= 1 group by id

How to find out the duplicate records

Using Sql Server 2000
I want to find out the duplicate record in the table
Table1
ID Transaction Value
001 020102 10
001 020103 20
001 020102 10 (Duplicate Records)
002 020102 10
002 020103 20
002 020102 10 (Duplicate Records)
...
...
Transaction and Value can repeat across different IDs, but should not repeat within the same ID...
Expected Output
Duplicate records are...
ID Transaction Value
001 020102 10
002 020102 10
...
...
How can I write a query to list the duplicate records?
I need help with the query.
You can use
SELECT
ID, [Transaction], Value
FROM
Table1
GROUP BY
ID, [Transaction], Value
HAVING count(ID) > 1
Select Id, [Transaction], Value, Count(id)
from Table1
group by Id, [Transaction], Value
having count(id) > 1
This query also shows how many times each combination of Id, Transaction and Value occurs. If you don't need it you can simply remove the Count(Id) column from the select clause.
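For a quick check of the grouping approach, the sketch below loads the question's six sample rows into an in-memory SQLite database from Python. SQLite is only a stand-in for SQL Server here, so the reserved word Transaction is quoted with double quotes instead of brackets, and the alias Occurrences is made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute('CREATE TABLE Table1 (ID TEXT, "Transaction" TEXT, Value INTEGER)')
# Sample rows from the question; the third row of each ID repeats the first.
conn.executemany(
    "INSERT INTO Table1 VALUES (?, ?, ?)",
    [
        ("001", "020102", 10),
        ("001", "020103", 20),
        ("001", "020102", 10),
        ("002", "020102", 10),
        ("002", "020103", 20),
        ("002", "020102", 10),
    ],
)

# Any group with more than one member is a duplicated record.
query = """
    SELECT ID, "Transaction", Value, COUNT(*) AS Occurrences
    FROM Table1
    GROUP BY ID, "Transaction", Value
    HAVING COUNT(*) > 1
    ORDER BY ID
"""
duplicates = conn.execute(query).fetchall()
print(duplicates)
```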
Self join (with additional PK or Timestamp or...)
I can see that people have provided solutions with grouping, but nobody has offered a self-join solution. The only catch is that you need some other column that is unique for each record, be it a primary key, a timestamp or anything else. Supposing the unique column is named Uniq, this would be the solution:
select distinct r1.ID, r1.[Transaction], r1.Value
from Records r1
join Records r2
on ((r2.ID = r1.ID) and
(r2.[Transaction] = r1.[Transaction]) and
(r2.Value = r1.Value) and
(r2.Uniq != r1.Uniq))
The last join column makes it possible to not join each row to itself but only to other duplicates...
To find out which one works best for you, you can check their execution plan and execute some testing.
You can do this:
SELECT ID, [Transaction], Value
FROM Table1
GROUP BY ID, [Transaction], Value
HAVING COUNT(*) > 1
To delete the duplicates when you have no primary key, select the distinct values into a separate table, delete everything from the original table, then copy the distinct records back:
SELECT ID, [Transaction], Value
INTO #tmpDeduped
FROM Table1
GROUP BY ID, [Transaction], Value
DELETE FROM Table1
INSERT Table1
SELECT * FROM #tmpDeduped
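The copy-out, delete, copy-back sequence can be tried safely on throwaway data first. The sketch below does so in an in-memory SQLite database from Python; it is an illustration under assumptions rather than SQL Server code, so it uses CREATE TEMP TABLE ... AS in place of SELECT ... INTO, and wraps the three steps in one transaction so the table is never left empty if something fails midway.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute('CREATE TABLE Table1 (ID TEXT, "Transaction" TEXT, Value INTEGER)')
conn.executemany(
    "INSERT INTO Table1 VALUES (?, ?, ?)",
    [
        ("001", "020102", 10),
        ("001", "020103", 20),
        ("001", "020102", 10),
        ("002", "020102", 10),
        ("002", "020103", 20),
        ("002", "020102", 10),
    ],
)

conn.executescript(
    """
    BEGIN;
    -- 1. Copy one row per distinct combination into a temporary table.
    CREATE TEMP TABLE tmpDeduped AS
        SELECT ID, "Transaction", Value
        FROM Table1
        GROUP BY ID, "Transaction", Value;
    -- 2. Empty the original table.
    DELETE FROM Table1;
    -- 3. Copy the distinct rows back.
    INSERT INTO Table1 SELECT * FROM tmpDeduped;
    DROP TABLE tmpDeduped;
    COMMIT;
    """
)

remaining = conn.execute(
    'SELECT ID, "Transaction", Value FROM Table1 ORDER BY ID, "Transaction"'
).fetchall()
print(remaining)
```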

how to query with child relations to same table and order this correctly

Take this table:
id name sub_id
---------------------------
1 A (null)
2 B (null)
3 A2 1
4 A3 1
The sub_id column is a self-reference to the id column of the same table.
sub_id --- 0:1 --- id
My problem is writing a SELECT query that lists each child row (one whose sub_id is not null) directly under its parent row. The correct order would be:
1 A (null)
3 A2 1
4 A3 1
2 B (null)
A plain SELECT orders by id. Which clause or keyword would give me this order?
I don't think a JOIN is an option, because I need every row returned separately: the rows are displayed in a GridView (ASP.NET) bound to an EntityDataSource, and each child row must appear directly under its parent.
Thank you.
Look at Managing Hierarchical Data in MySQL.
Since recursion is an expensive operation, because you're basically firing multiple queries at your database, you could consider using the Nested Set Model. In short, you assign each row a left and a right number that define a range enclosing its descendants. It's a long article, but it's worth reading. I used it during my internship to bring 1000+ queries down to 1 query.
The overhead now lies in adding, updating or deleting records, since you then have to update every record with a bigger 'right-value'. But when you're retrieving the data, it all goes in 1 query :)
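As a rough illustration of the Nested Set Model applied to the question's four rows, the sketch below stores a left and a right number per row and reads the tree back in display order with a single query. The table name nested, the column names lft and rgt, and the specific numbers are assumptions chosen for this example; SQLite via Python is used only to make it runnable.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    -- Each node's (lft, rgt) range encloses the ranges of its descendants.
    CREATE TABLE nested (id INTEGER PRIMARY KEY, name TEXT, lft INTEGER, rgt INTEGER);
    INSERT INTO nested VALUES
        (1, 'A',  1, 6),
        (3, 'A2', 2, 3),
        (4, 'A3', 4, 5),
        (2, 'B',  7, 8);
    """
)

# Ordering by lft walks the tree parent-first. Counting the enclosing
# ranges (including the node's own) gives the depth.
query = """
    SELECT node.id, node.name, COUNT(parent.id) - 1 AS depth
    FROM nested AS node
    JOIN nested AS parent ON node.lft BETWEEN parent.lft AND parent.rgt
    GROUP BY node.id, node.name, node.lft
    ORDER BY node.lft
"""
tree = conn.execute(query).fetchall()
print(tree)
```

The read side needs no recursion at any depth; the cost moves to inserts and deletes, which have to renumber lft and rgt for the rows to the right of the change.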
select * from table1 order by name, sub_id will in this case return your desired result, but only because the parent and child names are similar. If you're using SQL Server 2005 or later, a recursive CTE will work:
WITH recurse (id, name, sub_id, depth)
AS
(
SELECT id, name, ISNULL(sub_id, id) AS sub_id, 0 AS depth
FROM table1 WHERE sub_id IS NULL
UNION ALL
SELECT table1.id, table1.name, table1.sub_id, recurse.depth + 1 AS depth FROM table1
JOIN recurse ON table1.sub_id = recurse.id
)
SELECT * FROM recurse ORDER BY sub_id, depth, id
SELECT
*
FROM
table1
ORDER BY
COALESCE(sub_id, id), id
By the way, this only works for one level of nesting; anything deeper requires a recursive query (CTE).
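For hierarchies deeper than one level, one possible approach is a recursive CTE that builds a sortable path for every row. The sketch below tries this on an in-memory SQLite database from Python; the path column, the zero-padding to five digits (which assumes ids below 100000), and the extra grandchild row with id 5 are all hypothetical additions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE table1 (id INTEGER PRIMARY KEY, name TEXT, sub_id INTEGER)")
conn.executemany(
    "INSERT INTO table1 VALUES (?, ?, ?)",
    [
        (1, "A", None),
        (2, "B", None),
        (3, "A2", 1),
        (4, "A3", 1),
        (5, "A2a", 3),  # hypothetical grandchild, not in the question
    ],
)

# Each row's path is its parent's path plus its own zero-padded id,
# so sorting by path lists every child directly under its parent.
query = """
    WITH RECURSIVE tree (id, name, depth, path) AS (
        SELECT id, name, 0, printf('%05d', id)
        FROM table1
        WHERE sub_id IS NULL
        UNION ALL
        SELECT c.id, c.name, tree.depth + 1,
               tree.path || '/' || printf('%05d', c.id)
        FROM table1 AS c
        JOIN tree ON c.sub_id = tree.id
    )
    SELECT id, name, depth FROM tree ORDER BY path
"""
ordered = conn.execute(query).fetchall()
print(ordered)
```

The depth column is returned as well, which a grid could use to indent child rows.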