I don't understand how make task on SQL - sql

There is a table with two fields: Id and Timestamp.
Id is an increasing sequence. Each insertion of a new record into the table leads to the generation of ID(n)=ID(n-1) + 1. Timestamp is a timestamp that, when inserted retroactively, can take any values less than the maximum time of all previous records.
Retroactive insertion is the operation of inserting a record into a table in which
ID(n) > ID(n-1)
Timestamp(n) < max(timestamp(1):timestamp(n-1))
Example of a table:
ID
Timestamp
1
2016.09.11
2
2016.09.12
3
2016.09.13
4
2016.09.14
5
2016.09.09
6
2016.09.12
7
2016.09.15
IDs 5 and 6 were inserted retroactively (their timestamps are lower than later records).
I need a query that will return a list of all ids that fit the definition of insertion retroactively. How can I do this?

It can be rephrased to :
Find every entries for which, in the same table, there is an entry with a lesser id (a previous entry) having a greater timestamp
It can be achieved using a WHERE EXISTS clause :
SELECT t.id, t.timestamp
FROM tbl t
WHERE EXISTS (
SELECT 1
FROM tbl t2
WHERE t.id > t2.id
AND t.timestamp < t2.timestamp
);
Fiddle for MySQL It should work with any DBMS, since it's a standard SQL syntax.

Related

SQL Server Inner Join with Timestamps: is each record only assigned once?

I am working with timestamped records and need to do an inner join based on the timestamp difference. I have been using the DATEDIFF function and it seems to be working well. However, the amount of time between timestamps varies. To clarify, sometimes the record appears in table 2 within the same second as table 1, and sometimes the record in table 2 is up to 15 seconds behind the record in table 1. The records in table 1 are always timestamped before table 2. There is no other common field with which I can join, however there is a register number in each table that I am using to increase accuracy by ensuring that the registers are the same.
My question is: if I increase the timestamp difference to do the inner join (e.g. where the DATEDIFF = 1 or 2 or 3... or 15) will records only be joined once? Or would my table contain duplicate records from table 1 (e.g. where record 1 is joined to record 4 in table 2 where the diff is 4 seconds, and is also joined with record 7 from table 2 where the diff is 11 seconds)?
The reason my statement works now is that no registers have records with less than 6 seconds in between, so even if there are multiple timestamps that would match, the matching of registers eliminates this problem.
My Statement is currently working as:
SELECT *
INTO AtriumSequoiaJoin5
FROM Atrium INNER JOIN Sequoia ON Atrium.Reader = Sequoia.theader_pos_name
WHERE (
((DateDiff(s,[Atrium].[Date2],[Sequoia].[theader_tdatetime]))=0
Or (DateDiff(s,[Atrium].[Date2],[Sequoia].[theader_tdatetime]))=1
Or (DateDiff(s,[Atrium].[Date2],[Sequoia].[theader_tdatetime]))=2
Or (DateDiff(s,[Atrium].[Date2],[Sequoia].[theader_tdatetime]))=3
Or (DateDiff(s,[Atrium].[Date2],[Sequoia].[theader_tdatetime]))=4
Or (Datediff(s,[Atrium].[Date2],[Sequoia].[theader_tdatetime]))=5)
)
ORDER BY Sequoia.theader_id;
you could CROSS APPLY to the closest record in proximity. That's by no means ideal however, what if there are multiple records written at the same time? You perhaps should give the first table an identity field, then update the next table with scopeidentity
SELECT *
INTO AtriumSequoiaJoin5
FROM Atrium CROSS APPLY
(SELECT TOP 1 * FROM Sequoia WHERE
Atrium.Reader = Sequoia.theader_pos_name
ORDER BY Datediff(millisecond,[Atrium].[Date2],[Sequoia].[theader_tdatetime])) DQ
ORDER BY Sequoia.theader_id;

SQL Query in Apex if date not exist return value from previous date

I need to generate interactive report in Oracle Apex by comparing 2 tables. Each table has Columns: Date, Tag, Number.
Table 1: contains History of the changes of Column: Number and writes on which date the column: Number was changed.
Table 2 : Also has column: Number and column: Date (column: Date which shows different values for column: Number for the different dates of the month).
The Report is being generated when comparing columns "Tag_id" between table 1 and table 2
so we know which value from column: number in table 1 is referring to column: number in table 2. (...where table1.tag=table2.tag).
Table 2 contains date in column: number for every day of the month,
and Table 1 contains date that is changed in column: number once every 1,2,3 or more days.
I want the report to show values from column: Number from Table 1 and to show the values from column: Number from Table 2 while making comparison between table 1 and table 2 by column: Date.
The problem is when in Table 1 there is no date in column:Date which is equal to date in column:Date in Table 2, the Database doesnt know which value for column:Number to return in the Generated Report.
What I want to achieve:
When there is no date match in the columns:Date between Table 1 and Table 2 I want in the Generated report in Apex to return the value from column:Number from Table 2 for the
Date for which there is no match* and the value of column:Number from Table 1 for the closest previous date by using only SQL Query.
Interesting question. I don't know about Apex, but if it were Oracle's MySQL, I'd go along the way of:
SELECT
t2.id,
t2.date,
CASE WHEN t1.date IS NULL
THEN (SELECT table1.number FROM table1 WHERE table1.data <= t2.data ORDER BY table1.data DESC LIMIT 1)
ELSE t1.number
END AS numberT1,
t2.number as numberT2
FROM table2 t2
LEFT JOIN table1 t1 ON t1.date = t2.date

Removing duplicate records using another table's oid

Table 1 Table 2
-------- --------
oid oid (J)
sequence trip_id
stop
trip_update_id (J)
(J) = join
Table 1 and Table 2 are updated ever 30 seconds from an api simultaneously.
At the end of each day Table 1 has been filled with 98% duplicate data, this is because the data feed includes both new data generated in the last 30 seconds, and all data generated in previous feeds from the same day. As a result Table 1 is filled with mostly duplicate data (the oid is automatically generated upon insertion, therefore all oid are unique).
Table 2 has all unique records, therefore my question is what is the sql to turn Table 1 into all unique records for each trip_id in Table 2.
I'm not quite sure if I understand what the problem is, but here comes a few suggestions.
To remove rows from table1 with trip_update_id values not found in table2:
delete from table1
where trip_update_id not in (select trip_id from table2 where trip_id is not null)
(The is not null part is very important if trip_id is allowed to have NULL values!!!)
To duplicate remove trip_update_id rows from table 1, keep the one with highest oid:
delete from table1
where oid not in (select max(oid) from table1
group by trip_update_id)

Selecting most recent and specific version in each group of records, for multiple groups

The problem:
I have a table that records data rows in foo. Each time the row is updated, a new row is inserted along with a revision number. The table looks like:
id rev field
1 1 test1
2 1 fsdfs
3 1 jfds
1 2 test2
Note: the last record is a newer version of the first row.
Is there an efficient way to query for the latest version of a record and for a specific version of a record?
For instance, a query for rev=2 would return the 2, 3 and 4th row (not the replaced 1st row though) while a query for rev=1 yields those rows with rev <= 1 and in case of duplicated ids, the one with the higher revision number is chosen (record: 1, 2, 3).
I would not prefer to return the result in an iterative way.
To get only latest revisions:
SELECT * from t t1
WHERE t1.rev =
(SELECT max(rev) FROM t t2 WHERE t2.id = t1.id)
To get a specific revision, in this case 1 (and if an item doesn't have the revision yet the next smallest revision):
SELECT * from foo t1
WHERE t1.rev =
(SELECT max(rev)
FROM foo t2
WHERE t2.id = t1.id
AND t2.rev <= 1)
It might not be the most efficient way to do this, but right now I cannot figure a better way to do this.
Here's an alternative solution that incurs an update cost but is much more efficient for reading the latest data rows as it avoids computing MAX(rev). It also works when you're doing bulk updates of subsets of the table. I needed this pattern to ensure I could efficiently switch to a new data set that was updated via a long running batch update without any windows of time where we had partially updated data visible.
Aging
Replace the rev column with an age column
Create a view of the current latest data with filter: age = 0
To create a new version of your data ...
INSERT: new rows with age = -1 - This was my slow long running batch process.
UPDATE: UPDATE table-name SET age = age + 1 for all rows in the subset. This switches the view to the new latest data (age = 0) and also ages older data in a single transaction.
DELETE: rows having age > N in the subset - Optionally purge old data
Indexing
Create a composite index with age and then id so the view will be nice and fast and can also be used to look up by id. Although this key is effectively unique, its temporarily non-unique when you're ageing the rows (during UPDATE SET age=age+1) so you'll need to make it non-unique and ideally the clustered index. If you need to find all versions of a given id ordered by age, you may need an additional non-unique index on id then age.
Rollback
Finally ... Lets say you're having a bad day and the batch processing breaks. You can quickly revert to a previous data set version by running:
UPDATE table-name SET age = age - 1 -- Roll back a version
DELETE table-name WHERE age < 0 -- Clean up bad stuff
Existing Table
Suppose you have an existing table that now needs to support aging. You can use this pattern by first renaming the existing table, then add the age column and indexing and then create the view that includes the age = 0 condition with the same name as the original table name.
This strategy may or may not work depending on the nature of technology layers that depended on the original table but in many cases swapping a view for a table should drop in just fine.
Notes
I recommend naming the age column to RowAge in order to indicate this pattern is being used, since it's clearer that its a database related value and it complements SQL Server's RowVersion naming convention. It also won't conflict with a column or view that needs to return a person's age.
Unlike other solutions, this pattern works for non SQL Server databases.
If the subsets you're updating are very large then this might not be a good solution as your final transaction will update not just the current records but all past version of the records in this subset (which could even be the entire table!) so you may end up locking the table.
This is how I would do it. ROW_NUMBER() requires SQL Server 2005 or later
Sample data:
DECLARE #foo TABLE (
id int,
rev int,
field nvarchar(10)
)
INSERT #foo VALUES
( 1, 1, 'test1' ),
( 2, 1, 'fdsfs' ),
( 3, 1, 'jfds' ),
( 1, 2, 'test2' )
The query:
DECLARE #desiredRev int
SET #desiredRev = 2
SELECT * FROM (
SELECT
id,
rev,
field,
ROW_NUMBER() OVER (PARTITION BY id ORDER BY rev DESC) rn
FROM #foo WHERE rev <= #desiredRev
) numbered
WHERE rn = 1
The inner SELECT returns all relevant records, and within each id group (that's the PARTITION BY), computes the row number when ordered by descending rev.
The outer SELECT just selects the first member (so, the one with highest rev) from each id group.
Output when #desiredRev = 2 :
id rev field rn
----------- ----------- ---------- --------------------
1 2 test2 1
2 1 fdsfs 1
3 1 jfds 1
Output when #desiredRev = 1 :
id rev field rn
----------- ----------- ---------- --------------------
1 1 test1 1
2 1 fdsfs 1
3 1 jfds 1
If you want all the latest revisions of each field, you can use
SELECT C.rev, C.fields FROM (
SELECT MAX(A.rev) AS rev, A.id
FROM yourtable A
GROUP BY A.id)
AS B
INNER JOIN yourtable C
ON B.id = C.id AND B.rev = C.rev
In the case of your example, that would return
rev field
1 fsdfs
1 jfds
2 test2
SELECT
MaxRevs.id,
revision.field
FROM
(SELECT
id,
MAX(rev) AS MaxRev
FROM revision
GROUP BY id
) MaxRevs
INNER JOIN revision
ON MaxRevs.id = revision.id AND MaxRevs.MaxRev = revision.rev
SELECT foo.* from foo
left join foo as later
on foo.id=later.id and later.rev>foo.rev
where later.id is null;
How about this?
select id, max(rev), field from foo group by id
For querying specific revision e.g. revision 1,
select id, max(rev), field from foo where rev <= 1 group by id

How do I find the most recently dated record for each ID in a set of IDs where any ID can have multiple records each for a particular date?

I've got a data set that looks as follows:
The first column is an auto-increment primary key. The second column is an ID number for whatever, maybe 3 rocks with IDs 1, 2 and 3 respectively. (I probably should have used the standard customer and order example but oh well.) The third column is a date when I threw the rock. I track the date each time the rock is thrown, hence the multiple IDs (the second column) each with a throwing time.
I want a query to return the rock ID and most recent date for each ID. The result of course would have a single record for each ID - the one with the latest access time.
I'm struggling with the possible combination of "DISTINCT", "TOP 1" and "GROUP BY" clauses that gives the result I want.
SELECT id, max(date)
FROM table
GROUP BY id
If you want also the autoincrement row id then
SELECT t1.rowid, t1.id, t1date
FROM table t1
JOIN (SELECT id,max(date) date FROM table1 t2 GROUP BY id) t2 ON t1.id = t2.id AND t1.date = t2.date