How to find exact time of last received transaction on asynchronous mirror (SQL Server 2005)? - sql-server-2005

I need to provide users with the exact time of data integrity in case of a forced service with possible data loss.
I guess that I can find last LSN using:
SELECT [mirroring_failover_lsn]
FROM [master].[sys].[database_mirroring]
But that won't give me the exact time.

Read How to read and interpret the SQL Server log. You'll see that LOP_BEGIN_XACT contains a timestamp. Given an LSN, you can analyze the log and find all pending transactions (that is, all xact_ids that do not have a commit or rollback logged before the given LSN). All the pending transactions will be rolled back in case of failover; this is the data lost if a forced failover occurs. There will be a number of pending transactions that will be undone, and those transactions started at various times. If you want to attach an 'exact time of data integrity', you can say that no data loss will occur for anything earlier than the earliest pending LOP_BEGIN_XACT. E.g., given the following log stream:
+-----+-------------+---------+-----------+
| LSN | Operation   | xact_id | timestamp |
+-----+-------------+---------+-----------+
|  1  | INSERT      |    1    |           |
|  2  | BEGIN_XACT  |    2    |   12:00   |
|  3  | INSERT      |    1    |           |
|  4  | BEGIN_XACT  |    3    |   12:02   |
|  5  | COMMIT_XACT |    1    |           |
|  6  | INSERT      |    2    |           |
|  7  | INSERT      |    3    |           |
|  8  | COMMIT_XACT |    3    |           |
|  9  | COMMIT_XACT |    2    |           |
+-----+-------------+---------+-----------+
Let's say that the mirroring failover LSN is 8. In this case you can say that no data loss will occur earlier than 12:00, because xact_id 2 is not committed at LSN 8 and therefore it will be rolled back. Note that xact_id 3 is committed by LSN 8, so it won't be lost, even though it has a later timestamp. So your timestamp is not absolute; this is why I say 'no data loss will occur earlier than...' rather than 'data after ... will be lost'.
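If you want to pull those timestamps out with T-SQL rather than a log reader tool, a rough sketch using the undocumented fn_dblog function could look like the following (note that fn_dblog reports LSNs as hex strings while mirroring_failover_lsn is a decimal value, so matching the two formats up is left as a manual conversion step):
-- Sketch only: list every logged transaction start with its timestamp,
-- so the earliest one still pending at the failover LSN can be located.
SELECT [Current LSN],
       [Transaction ID],
       [Begin Time]          -- the timestamp written by LOP_BEGIN_XACT
FROM fn_dblog(NULL, NULL)
WHERE Operation = 'LOP_BEGIN_XACT'
ORDER BY [Current LSN];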

Related

Can you delete old entries from a table index?

I made a reminder application that heavily writes and reads records with future datetimes, but rarely touches records with past datetimes. The reminders are indexed by remind_at, so a million records means a million index entries, but that speeds up checking the records that must be reminded in the next hour.
| uuid    | user_id | text        | remind_at           | ... | ... | ... |
| ------- | ------- | ----------- | ------------------- | --- | --- | --- |
| 45c1... | 23      | Buy paint   | 2019-01-01 20:00:00 | ... | ... | ... |
| 23f1... | 924     | Pick up car | 2019-02-01 20:00:00 | ... | ... | ... |
| 2d84... | 650     | Call mom    | 2020-03-01 20:00:00 | ... | ... | ... |
| 3f1a... | 81      | Get shoes   | 2020-04-01 20:00:00 | ... | ... | ... |
The problem is performance. Once the database grows big, retrieving any record becomes relatively slow.
I'm trying to find out which RDBMSs offer a fully or semi-automated way to get better retrieval performance for future datetimes, since past datetimes are rarely retrieved or checked.
A neat solution, if it exists, would be to instruct the RDBMS to prune old entries from the index. I don't know whether any RDBMS allows that, but PostgreSQL, SQL Server, and SQLite support a "partial index" (see the sketch below). But what would happen if I recreate an index on a table with millions of records?
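For reference, a partial index along these lines (PostgreSQL syntax; the table name reminders and the cut-off date are made up for illustration) would look like this:
-- Index only the rows the hot path actually queries.
CREATE INDEX reminders_upcoming_idx
    ON reminders (remind_at)
    WHERE remind_at > '2020-01-01';
The catch is that the fixed cut-off goes stale, so the index would have to be dropped and recreated periodically, which is exactly the concern about rebuilding an index over millions of records.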
Some solutions that didn't fit the bill:
Horizontal scaling: it would just replicate the same problem (n) times.
Vertical scaling: still doesn't fix the problem.
Sharding: Could be, since every instance would hold a part of the database, but the app will have to handle the "sharding key".
Two databases: okay, one fast and one slow. Moving old entries to the "slow instance" (the toaster) would be done manually. Also, the app would have to be heavily modified to check both databases, since it doesn't initially know where a record lives. The logic grows considerably.
Anyway, the whole point is to make the future (or the soonest-upcoming) reminders snappier to retrieve, while disregarding the performance of retrieving older entries.

Database design for partially changing data points, with history and snapshot functionality?

I'm looking for a best practice or solution, on a conceptual level, to a problem I'm working on.
I have a collection of data points (around 500) which are partially changed, by a user, over time. It is important to be able to tell which values were changed at what point in time. The data might look like this:
Data changed over time:
+------------+-------------+-------------+-------------+-------+---------------+
| Date       | Value no. 1 | Value no. 2 | Value no. 3 | ...   | Value no. 500 |
+------------+-------------+-------------+-------------+-------+---------------+
| 1/1/2018   |             |             | 2           |       | 1             |
| 1/3/2018   | 2           | 1           |             |       |               |
| 1/7/2018   |             |             | 4           |       | 8             |
| 1/12/2018  | 5           | 3           |             |       |               |
+------------+-------------+-------------+-------------+-------+---------------+
...
It must be possible to take a snapshot at a certain point in time, to get a complete set of data points, that were valid for that particular point in time, like this:
Snapshot taken 1/3/2018 will yield:
+-----------+-----------+-----------+-------+-------------+
| Value 1   | Value 2   | Value 3   | ...   | Value 500   |
+-----------+-----------+-----------+-------+-------------+
| 2         | 1         | 2         | 0     | 1           |
+-----------+-----------+-----------+-------+-------------+
Snapshot taken 1/9/2018 will yield:
+-----------+-----------+-----------+-------+-------------+
| Value 1   | Value 2   | Value 3   | ...   | Value 500   |
+-----------+-----------+-----------+-------+-------------+
| 2         | 1         | 4         | 0     | 8           |
+-----------+-----------+-----------+-------+-------------+
Snapshot taken 1/13/2018 will yield:
+-----------+-----------+-----------+-------+-------------+
| Value 1   | Value 2   | Value 3   | ...   | Value 500   |
+-----------+-----------+-----------+-------+-------------+
| 5         | 3         | 4         | 0     | 8           |
+-----------+-----------+-----------+-------+-------------+
and so on...
I'm not bound by a particular database technology, so either SQL or NoSQL will do. It is probably not possible to satisfy all the requirements in the DB-domain - some will probably have to be addressed in code. But my main question is what database technology is best suited for this task?
I'm not quite sure this fits a time-series database (TSDB), since only a portion of the values are changed at a given time, and it is important to know which values changed. Maybe I'm wrong?
/Chris
My suggestion would be to model this in a sparse format, something like:
CREATE TABLE DataPoint (
    DataID     int,         /* 1 to 500 in your example, or whatever you need to identify it */
    ValidFrom  timestamp,   /* default value 01/01/1970-00:00:00 or a suitable "Epoch" */
    ValidUntil timestamp,   /* default value 31/12/3999-00:00:00 or again something that is in the far future for your case */
    Value      Number(7,5)  /* again, this may be any data type, or even more than one field if needed, like Price & Currency */
);
What we have just defined is a set of data points and the "interval" in which each one has a specific value, so if you measured DataPoint 1 yesterday and got a value of 89.768, you will insert:
DataId=1
ValidFrom=26/11/2018-14:52:41
ValidUntil=31/12/3999-00:00:00
Value=89.768
Then you measure it again tomorrow and get:
DataId=1
ValidFrom=28/11/2018-14:51:23
ValidUntil=31/12/3999-00:00:00
Value=89.443
(Let's assume you also have logic so that when you record a new value, you update the current value record and set ValidUntil=28/11/2018-14:51:23; this is not strictly needed but it will make the example query simpler.)
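In code, that update-then-insert step might look something like this (a sketch only; the date literals follow the dd/mm/yyyy style of the example, so whether they parse depends on your session's date settings):
-- Close the currently "open" row for data point 1...
UPDATE DataPoint
SET ValidUntil = '28/11/2018 14:51:23'
WHERE DataID = 1
  AND ValidUntil = '31/12/3999 00:00:00';

-- ...and open a new one holding the latest measurement.
INSERT INTO DataPoint (DataID, ValidFrom, ValidUntil, Value)
VALUES (1, '28/11/2018 14:51:23', '31/12/3999 00:00:00', 89.443);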
One month from now you have accumulated more measurements for data #1 and, at various moments, for data #2 to #500.
You now want to find out what the values were at noon today (i.e. one month "ago" from that point), i.e. at 27/11/2018 12:00:00:
SELECT DataID, Value FROM DataPoint WHERE ValidFrom <= '27/11/2018 12:00:00' AND ValidUntil > '27/11/2018 12:00:00'
This will return:
001,89.768
002,45.678
...,...
500,112.809
Regarding logging who did this, or for what reason, you can either log it separately (saving for example DataPoint Id, Timestamp, UserId...) or make it part of the original table, so that whenever you register a new datapoint you also log who measured it.
Have a look at the SQL Server temporal tables engine, which may be a solution in your case. This approach allows you to run the queries mentioned in the question, for example:
SELECT *
FROM my_data
FOR SYSTEM_TIME AS OF '2018-01-01'
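For completeness, a minimal sketch of what such a system-versioned table definition could look like (SQL Server 2016 or later; my_data and its columns are placeholders, not your actual schema):
-- SQL Server keeps the history table up to date automatically;
-- FOR SYSTEM_TIME AS OF then reconstructs the row versions for you.
CREATE TABLE dbo.my_data
(
    DataID    int NOT NULL PRIMARY KEY,
    Value     decimal(7,5) NULL,
    ValidFrom datetime2 GENERATED ALWAYS AS ROW START NOT NULL,
    ValidTo   datetime2 GENERATED ALWAYS AS ROW END NOT NULL,
    PERIOD FOR SYSTEM_TIME (ValidFrom, ValidTo)
)
WITH (SYSTEM_VERSIONING = ON (HISTORY_TABLE = dbo.my_data_History));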
However, the table in the example seems to be very wide (maybe denormalized). I would suggest grouping columns by some technical or functional characteristic (vertical partitioning) to avoid maintenance drawbacks later.

Editing a row in a database table affects all previous records that query that information. How should prior versions be stored/managed?

I’ve been working on a Windows Form App using vb.net that retrieves information from a SQL database. One of the forms, frmContract, queries several tables, such as Addresses, and displays them in various controls, such as Labels and DataGridViews. Every year, the customer’s file is either renewed or expired, and I’m just now realizing that a change committed to any record today will affect the information displayed for the customer in the past. For example, if we update a customer’s mailing address today, this new address will show up in all previous customer profiles. What is the smartest way to avoid this problem without creating separate rows in each table with the same information? Or to put it another way, how can versions of a customer’s profile be preserved?
Another example would be a table that stores customer’s vehicles.
VehicleID | Year | Make | Model  | VIN               | Body
----------+------+------+--------+-------------------+------------
1         | 2005 | Ford | F150   | 11111111111111111 | Pickup
2         | 2001 | Niss | Sentra | 22222222222222222 | Sedan
3         | 2004 | Intl | 4700   | 33333333333333333 | Car Carrier
If today vehicle 1 is changed from a standard pickup to a flatbed, then if I load the customer contract from 2016 it will also show as flatbed even though back then it was a pickup truck.
I have a table for storing individual clients.
ClientID | First  | Last   | DOB
---------+--------+--------+-----------
1        | John   | Doe    | 01/01/1980
2        | Mickey | Mouse  | 11/18/1928
3        | Eric   | Forman | 03/05/1960
I have another table to store yearly contracts.
ContractID | ContractNo | EffectiveDate | ExpirationDate | ClientID (foreign key)
-----------+------------+---------------+----------------+-----------------------
1          | 13579      | 06/15/2013    | 06/15/2014     | 1
2          | 13579      | 06/15/2014    | 06/15/2015     | 1
3          | 24680      | 10/05/2016    | 10/05/2017     | 3
Notice that the contract number can remain the same across different periods. In addition, because the same vehicle can be related to multiple contracts, I use a bridge table to relate individual vehicles to different contracts.
Id | VehicleID | ContractID   <-- both foreign keys
---+-----------+-----------
1  | 1         | 1
2  | 3         | 1
3  | 1         | 2
4  | 3         | 2
5  | 2         | 3
6  | 2         | 2
When frmContract is loaded, it queries the database and displays information about that particular contract year. However, if Vehicle 1 is changed from pickup to flatbed right now, then all the previous contract years will also show it as a flatbed.
I hope this illustrates my predicament. Any guidance will be appreciated.
Some DB systems have built-in temporal features so you can keep audit history of rows. Check to see if your DB has built-in support for this.
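As a sketch of that idea against the Vehicles table from the question (SQL Server 2016+ system-versioned tables assumed; the column types are guesses), versioning the vehicle rows would let each contract year read the vehicle as it was at the time:
-- Hypothetical sketch: make Vehicles system-versioned so past states are preserved.
CREATE TABLE dbo.Vehicles
(
    VehicleID int NOT NULL PRIMARY KEY,
    [Year]    smallint NULL,
    Make      varchar(50) NULL,
    Model     varchar(50) NULL,
    VIN       char(17) NULL,
    Body      varchar(50) NULL,
    ValidFrom datetime2 GENERATED ALWAYS AS ROW START NOT NULL,
    ValidTo   datetime2 GENERATED ALWAYS AS ROW END NOT NULL,
    PERIOD FOR SYSTEM_TIME (ValidFrom, ValidTo)
)
WITH (SYSTEM_VERSIONING = ON (HISTORY_TABLE = dbo.Vehicles_History));

-- What vehicle 1 looked like when the 2016 contract was written:
SELECT *
FROM dbo.Vehicles
FOR SYSTEM_TIME AS OF '2016-10-05'
WHERE VehicleID = 1;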

Detecting rising and falling edge via SQL (loading cycles)

I need to detect rising and falling edges of a loading state in my logs and need to list all loading cycles.
Let's say I have a table LOG:
UTS        | VALUE | STATE
-----------+-------+------
1438392102 | 1000  | 0
1438392104 | 1001  | 1
1438392106 | 1002  | 1
1438392107 | 1003  | 0
1438392201 | 1007  | 1
1438392220 | 1045  | 1
1438392289 | 1073  | 0
1438392305 | 1085  | 1
1438392310 | 1090  | 1
1438392315 | 1095  | 1
I need all cycles where STATE = 1: when each cycle started, how long it lasted, and how much VALUE changed in each cycle. I may also have a situation where the last cycle isn't finished yet.
Do you have an idea how I can do this in SQL in a well-performing way? My logs might return several hundred thousand rows.
Thanks for the help
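One way to approach this, assuming a database with window functions (SQL Server 2012+, PostgreSQL, etc.), is the classic gaps-and-islands pattern: flag each rising edge, turn the running count of rising edges into a cycle number, and then aggregate per cycle. A sketch against the LOG table above:
-- Sketch: each rising edge (0 -> 1) starts a new cycle; aggregate the STATE = 1 rows per cycle.
WITH flagged AS (
    SELECT UTS, VALUE, STATE,
           CASE WHEN STATE = 1
                 AND COALESCE(LAG(STATE) OVER (ORDER BY UTS), 0) = 0
                THEN 1 ELSE 0 END AS is_rising_edge
    FROM LOG
),
grouped AS (
    SELECT UTS, VALUE, STATE,
           SUM(is_rising_edge) OVER (ORDER BY UTS) AS cycle_no
    FROM flagged
)
SELECT cycle_no,
       MIN(UTS)                AS cycle_start,
       MAX(UTS) - MIN(UTS)     AS duration_seconds,   -- an unfinished last cycle simply runs to its latest sample
       MAX(VALUE) - MIN(VALUE) AS value_change
FROM grouped
WHERE STATE = 1
GROUP BY cycle_no
ORDER BY cycle_start;
With an index on UTS this is a couple of sequential passes over the data, which should hold up reasonably well for a few hundred thousand rows.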

SQL - Combining 3 rows per group in a logging scenario

I have reworked our API's logging system to use Azure Table Storage from using SQL storage for cost and performance reasons. I am now migrating our legacy logs to the new system. I am building a SQL query per table that will map the old fields to the new ones, with the intention of exporting to CSV then importing into Azure.
So far, so good. However, one artifact of the previous system is that it logged 3 times per request - call begin, call response and call end - and the new one logs the call as just one log (again, for cost and performance reasons).
Some fields are common to all three related logs, e.g. the Session, which uniquely identifies the call.
For some fields I only want the first log's value, e.g. Date, which may be a few seconds different in the second and third log.
Some fields are shared for the three different purposes, e.g. Parameters gives the Input Model for Call Begin, Output Model for Call Response, and HTTP response (e.g. OK) for Call End.
Some fields are unused for two of the purposes, e.g. ExecutionTime is -1 for Call Begin and Call Response, and a value in ms for Call End.
How can I "roll up" the sets of 3 rows into one row per set? I have tried using DISTINCT and GROUP BY, but the fact that some of the information collides is making it very difficult. I apologize that my SQL isn't really good enough to really explain what I'm asking for - so perhaps an example will make it clearer:
Example of what I have:
SQL:
SELECT * FROM [dbo].[Log]
Results:
+---------+---------------------+-------+------------+---------------+---------------+-----------------+--+
| Session | Date | Level | Context | Message | ExecutionTime | Parameters | |
+---------+---------------------+-------+------------+---------------+---------------+-----------------+--+
| 84248B7 | 2014-07-20 19:16:15 | INFO | GET v1/abc | Call Begin | -1 | {"Input":"xx"} | |
| 84248B7 | 2014-07-20 19:16:15 | INFO | GET v1/abc | Call Response | -1 | {"Output":"yy"} | |
| 84248B7 | 2014-07-20 19:16:15 | INFO | GET v1/abc | Call End | 123 | OK | |
| F76BCBB | 2014-07-20 19:16:17 | ERROR | GET v1/def | Call Begin | -1 | {"Input":"ww"} | |
| F76BCBB | 2014-07-20 19:16:18 | ERROR | GET v1/def | Call Response | -1 | {"Output":"vv"} | |
| F76BCBB | 2014-07-20 19:16:18 | ERROR | GET v1/def | Call End | 456 | BadRequest | |
+---------+---------------------+-------+------------+---------------+---------------+-----------------+--+
Example of what I want:
SQL:
[Need to write this query]
Results:
+---------------------+-------+------------+----------+---------------+----------------+-----------------+--------------+
| Date | Level | Context | Message | ExecutionTime | InputModel | OutputModel | HttpResponse |
+---------------------+-------+------------+----------+---------------+----------------+-----------------+--------------+
| 2014-07-20 19:16:15 | INFO | GET v1/abc | Api Call | 123 | {"Input":"xx"} | {"Output":"yy"} | OK |
| 2014-07-20 19:16:17 | ERROR | GET v1/def | Api Call | 456 | {"Input":"ww"} | {"Output":"vv"} | BadRequest |
+---------------------+-------+------------+----------+---------------+----------------+-----------------+--------------+
SELECT L1.Session, L1.Date, L1.Level, L1.Context, 'Api Call' AS Message,
       L3.ExecutionTime,
       L1.Parameters AS InputModel,
       L2.Parameters AS OutputModel,
       L3.Parameters AS HttpResponse
FROM Log L1
INNER JOIN Log L2 ON L1.Session = L2.Session
INNER JOIN Log L3 ON L1.Session = L3.Session
WHERE L1.Message = 'Call Begin'
  AND L2.Message = 'Call Response'
  AND L3.Message = 'Call End'
This would work with your sample data.
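If the triple self-join turns out to be slow on the full table, an alternative sketch is conditional aggregation, which collapses each Session's three rows in a single pass (assuming exactly one Begin/Response/End triple per Session):
-- Alternative sketch: one pass over the table, pivoting the three messages into columns.
SELECT MIN(L.Date)    AS Date,      -- the earliest of the three, i.e. Call Begin
       MIN(L.Level)   AS Level,
       MIN(L.Context) AS Context,
       'Api Call'     AS Message,
       MAX(CASE WHEN L.Message = 'Call End'      THEN L.ExecutionTime END) AS ExecutionTime,
       MAX(CASE WHEN L.Message = 'Call Begin'    THEN L.Parameters END)    AS InputModel,
       MAX(CASE WHEN L.Message = 'Call Response' THEN L.Parameters END)    AS OutputModel,
       MAX(CASE WHEN L.Message = 'Call End'      THEN L.Parameters END)    AS HttpResponse
FROM [dbo].[Log] L
GROUP BY L.Session;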