I am in the process of migrating a SQL Server 2008 R2 database between software versions (a schema that is 6 years old to the current one). There are a few auditing tables with SQL timestamp columns on them. I am doing this by copying data out of the original tables into the new structure - the change is fairly complex, as you might expect after 6 years.
Is there a way to preserve the fingerprint of the timestamps as I move the data into a new database, or a best-practice way of keeping the audit traceability of this data?
Thanks
You can convert a timestamp to varbinary(8) to preserve it:
select cast([timestamp] as varbinary(8))
But the value of timestamp itself is not particularly useful: it does not translate to a particular time. MSDN notes that the timestamp syntax is deprecated and suggests using the more appropriately named rowversion instead.
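For example (a sketch only - the database, table and column names below are assumptions, not from your schema), you could carry the original value across into a plain varbinary(8) column in the new structure:

-- OldDb.dbo.AuditLog is assumed to have a timestamp/rowversion column [timestamp];
-- NewDb.dbo.AuditLog is assumed to have a varbinary(8) column OriginalRowVersion to preserve it
INSERT INTO NewDb.dbo.AuditLog (AuditId, EventData, OriginalRowVersion)
SELECT AuditId, EventData, CAST([timestamp] AS VARBINARY(8))
FROM OldDb.dbo.AuditLog;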
I have used SQL Server 2014 and it supports rowversion. For your case, you have to convert the timestamp (e.g. to varbinary) and save it in another table for future reference, because a timestamp column is read-only.
(I am guessing, based on "How do I query the streaming buffer in BigQuery if the _PARTITIONTIME field isn't available with Standard SQL", that my question has no simple solution, so I will "enhance" it.)
I stream my data into a BigQuery table that is partitioned and clustered on a timestamp field (not ingestion-time partitioned).
I want a view that always looks at the last hour of data: what is already in the table, plus what is still in the streaming buffer.
Since this table is not an ingestion-time partitioned table, there is no _PARTITIONTIME/_PARTITIONDATE pseudo column, so I can't use it to get at the buffered data.
The only way I've found is by using legacy SQL: SELECT * FROM [dataset.streaming_data$__UNPARTITIONED__]
This is not good enough for me, since even if I save this as a view, I can't refer to a legacy SQL view from a standard SQL query.
Any idea how I can achieve this?
Another idea I am thinking of: BigQuery can query an external data source (using EXTERNAL_QUERY) with standard SQL.
A solution might be some "temporary" table on a separate database (such as PostgreSQL on Cloud SQL) which would only hold 1 hour of data and would not have BigQuery's buffer mechanism.
I think this is a bad solution, but I guess it might work...
What do you think?
Thanks to @Felipe Hoffae I just found out I need to do nothing :-)
Buffered data is already included in the results of a standard SQL query, as long as the WHERE clause covers it...
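For example (a sketch - the dataset, table and column names here are assumptions), a standard SQL view over the last hour would pick up buffered rows as well:

CREATE OR REPLACE VIEW `mydataset.streaming_data_last_hour` AS
SELECT *
FROM `mydataset.streaming_data`
-- event_ts stands in for the timestamp column the table is partitioned on;
-- rows still in the streaming buffer are returned as long as they match the filter
WHERE event_ts >= TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 1 HOUR);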
I'm creating a database which involves many records that have dates, and many records within these tables can share the same date. These will range from 3 years in the past to about 3 years in the future. Would an efficient design use the date datatype built into SQL, or separate tables for the day, month and year? Sorry if this seems like an amateur question - I've only learnt SQL recently for this project.
Thanks
Yes, as you already guessed, the best solution here is to use the date datatype built into SQL.
From the way you have asked the question, it sounds like you want to record aggregated data for each day/month/year. As @edward said, you will definitely want to use the built-in data type for the raw records - your "fact" table - and then you might also build up aggregated data in separate tables for the year or month.
Depending on the volume of data, these might be stored physically or just exposed as views on the fact table.
In general, you never want to remove information as you never know how it might be used in the future, which is why storing with the raw date is the correct option.
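As a sketch (the table and column names are made up for illustration), the raw records keep a plain date column, and monthly figures can be derived without throwing anything away:

-- Fact table stores the raw date using the built-in type
CREATE TABLE sales_fact (
    sale_id   INT PRIMARY KEY,
    sale_date DATE NOT NULL,
    amount    DECIMAL(10, 2) NOT NULL
);

-- Monthly aggregate exposed as a view over the fact table
CREATE VIEW sales_by_month AS
SELECT YEAR(sale_date)  AS sale_year,
       MONTH(sale_date) AS sale_month,
       SUM(amount)      AS total_amount
FROM sales_fact
GROUP BY YEAR(sale_date), MONTH(sale_date);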
Let's say I have a table called Test.
Test has a Column called CreatedDate. This is the DateTimeOffset at the time of creation of the row.
Test also has a Column called ExternalDate. This is an externally provided time through an API.
What I need to do is calculate the difference between CreatedDate and ExternalDate. CreatedDate is a DateTimeOffset, but ExternalDate is always provided as DateTime2; the external system providing this time does not supply any offset or timezone data.
So we can see that the calculation can be off by an hour, depending on whether we are in DST or not.
We are using SQL2008 unfortunately.
I am thinking of creating a table of DST dates and performing a join when migrating the data to figure out this issue, as mentioned in another thread here. So the historical migration is OK. (Other thread: Migrating SQL stored DateTime values to DateTimeOffset best practice?)
The question is: I will continue to have the external system send DateTime2 values with no offset. I am worried that if I do the calculation on the fly with a join against this same DST table, there may be performance implications. I am not too familiar with SQL performance. What does a SQL expert out there think about this? Or is there some other, more efficient way to do this on the fly?
Thanks! Much appreciated!
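As a sketch of the on-the-fly calculation being described (the DstRange table, the Id column and the '+10:00'/'+11:00' offsets are all assumptions, not part of the question), the join could look something like this on SQL Server 2008:

-- Hypothetical lookup table: one row per DST period for the external system's time zone
CREATE TABLE dbo.DstRange (
    DstStart DATETIME2 NOT NULL,
    DstEnd   DATETIME2 NOT NULL
);

-- Difference in minutes, with both values normalised to UTC before comparison
SELECT t.Id,
       DATEDIFF(MINUTE,
                CAST(SWITCHOFFSET(t.CreatedDate, '+00:00') AS DATETIME2),       -- CreatedDate in UTC
                DATEADD(HOUR,
                        CASE WHEN d.DstStart IS NOT NULL THEN -11 ELSE -10 END, -- assumed local offsets
                        t.ExternalDate)                                         -- ExternalDate shifted to UTC
       ) AS MinutesDifference
FROM dbo.Test AS t
LEFT JOIN dbo.DstRange AS d
       ON t.ExternalDate >= d.DstStart
      AND t.ExternalDate <  d.DstEnd;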
I googled around and found no answer. The question concerns an existing SQL table (assume any of H2, MySQL, or Postgres):
Is there a way to get a last-update timestamp for a given table row - that is, without explicitly declaring a new column (altering the table) and/or adding triggers that update a timestamp column?
I'm using a JDBC driver, preparing statements, getting ResultSets and so forth. I need to be able to determine whether the data has changed recently or not, and for this a timestamp would help. If possible I want to avoid adding timestamp columns across all tables in the system.
There is no implicit, standard approach to this problem. The standard way is an explicit column kept up to date by a database trigger or by application logic...
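For illustration, the explicit-column approach looks roughly like this in PostgreSQL (the table, function and trigger names are placeholders):

-- Add a last-update column and keep it current with a trigger (PostgreSQL 11+ syntax)
ALTER TABLE my_table ADD COLUMN last_updated TIMESTAMPTZ NOT NULL DEFAULT now();

CREATE OR REPLACE FUNCTION set_last_updated() RETURNS trigger AS $$
BEGIN
    NEW.last_updated := now();
    RETURN NEW;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER trg_set_last_updated
BEFORE UPDATE ON my_table
FOR EACH ROW
EXECUTE FUNCTION set_last_updated();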
As mentioned, there are ways to do it through the logs, but it's hard and usually won't be very accurate. For example, in Postgres you can enable commit timestamps in postgresql.conf and check the last commit time for a row, but those values are approximate and are not kept for long...
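For example, in PostgreSQL (assuming track_commit_timestamp was already enabled before the rows in question were written; my_table is a placeholder), the approximate last-commit time of each row can be read from the transaction metadata:

-- Requires track_commit_timestamp = on in postgresql.conf (and a server restart);
-- only covers transactions committed after the setting was enabled, and the values
-- are lost once the originating transaction IDs are frozen
SELECT ctid, pg_xact_commit_timestamp(xmin) AS last_commit_time
FROM my_table;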
I want to apply concurrency control on some of the tables in my SQL Server database. To achieve row versioning, I am thinking of adding an integer column named RowVersion and incrementing its value whenever I update a row.
There is another option: using a timestamp column. With a timestamp column, SQL Server automatically generates a new, unique value whenever the row is updated.
I want to know the advantages of these options. I think that adding an int column to store the row version is the more generic approach, while a timestamp column is SQL Server specific.
What is more, an integer row version is more human-readable than a timestamp value.
I want to know other advantages or disadvantages choosing integer column for row version field.
Thanks.
If you let SQL Server do it for you, you have guaranteed atomicity for free. (Concurrent updates on the same table are guaranteed to produce different row versions.) If you roll your own row versioning scheme, you are going to have to do whatever it takes to guarantee atomicity by yourself.
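As a sketch of the built-in option (the table, columns and variables here are hypothetical), an optimistic-concurrency UPDATE compares the rowversion value read earlier and lets SQL Server bump the stored value automatically:

CREATE TABLE dbo.Orders (
    OrderId INT PRIMARY KEY,
    Status  NVARCHAR(20) NOT NULL,
    RowVer  ROWVERSION            -- maintained by SQL Server; changes on every update to the row
);

-- @OriginalRowVer holds the RowVer value read when the row was loaded
UPDATE dbo.Orders
SET Status = 'Shipped'
WHERE OrderId = @OrderId
  AND RowVer  = @OriginalRowVer;

IF @@ROWCOUNT = 0
    RAISERROR('The row was changed by another user.', 16, 1);  -- concurrency conflict detected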