I am currently exploring Change Data Capture as a way to store temporal data. It is great because it stores only the deltas and seems like it may solve my problem. When I enabled CDC, a bunch of tables appeared under System Tables.
When querying cdc.dbo_MyTable, I am able to see all the changes that took place on the table. Now, how would I construct a historical view? For instance, if I wanted to see the state of my table as of a particular date, how would I go about doing that? Is that even possible?
It looks like I need to take the log and replay it over my original table, but I was wondering if there is a built-in way of doing this. Any suggestions?
Some of the use cases I am looking at:
Know the state of the graph at a particular point in time
Given two graphs at different times, know the set of links that are different (this can probably be obtained using an EXCEPT clause after constructing the tables)
It's possible, but not in a built-in way, I'm afraid. You would have to reconstruct the timeline by hand.
Since CDC gives you tran_end_time (in cdc.lsn_time_mapping), which is the time at which a change should be considered persisted, you would have to write a query that fetches all the distinct periods of table states, joins them to the tracked property changes, and then pivots the result so it has the same shape as the table. Don't forget to UNION with the current table state to pick up the values that have not been changed or tracked, for completeness.
The final result, simplified, should look like this:
RN  PK  PropA   PropB   FromDate          ToDate
1   1   'Ver1'  'Ver1'  2012-01-01 09:00  2012-01-02 08:00
2   1   'Ver1'  'Ver2'  2012-01-02 08:00  2012-01-03 07:00
3   1   'Ver2'  'Ver2'  2012-01-03 07:00  getdate()
4   2   'Ver1'  'Ver1'  2012-01-01 05:00  2012-01-02 06:00
5   2   'Ver1'  'Ver2'  2012-01-02 06:00  2012-01-03 01:00
6   2   'Ver2'  'Ver2'  2012-01-03 01:00  getdate()
Note that getdate() is only valid if the row wasn't deleted; otherwise it should be replaced with the deletion date.
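A minimal Python sketch of the replay step, under stated assumptions: each change arrives as a hypothetical (pk, commit_time, operation, values) tuple where values is the full row image after the change (a CDC change table stores the captured columns for every change, and commit times can be looked up via cdc.lsn_time_mapping), and `now` plays the role of getdate(). It only illustrates how FromDate/ToDate are derived, not a CDC API:

```python
from datetime import datetime
from itertools import groupby
from operator import itemgetter


def reconstruct_versions(changes, now):
    """Turn a change log into (pk, values, from_date, to_date) intervals.

    changes: iterable of (pk, commit_time, operation, values) tuples where
    operation is "insert", "update" or "delete" and values is the full row
    image after the change (a hypothetical shape, not a CDC API).
    now: the ToDate for versions that are still current (like getdate()).
    """
    versions = []
    ordered = sorted(changes, key=itemgetter(0, 1))  # by key, then commit time
    for pk, group in groupby(ordered, key=itemgetter(0)):
        rows = list(group)
        for i, (_, start, op, values) in enumerate(rows):
            if op == "delete":
                continue  # a delete only closes the version before it
            # A version ends where the next change to the same key begins
            # (an update or the deletion); otherwise it is still current.
            end = rows[i + 1][1] if i + 1 < len(rows) else now
            versions.append((pk, values, start, end))
    return versions


# Example log modelled on the simplified result above (PK 2 is deleted here).
log = [
    (1, datetime(2012, 1, 1, 9, 0), "insert", {"PropA": "Ver1", "PropB": "Ver1"}),
    (1, datetime(2012, 1, 2, 8, 0), "update", {"PropA": "Ver1", "PropB": "Ver2"}),
    (1, datetime(2012, 1, 3, 7, 0), "update", {"PropA": "Ver2", "PropB": "Ver2"}),
    (2, datetime(2012, 1, 1, 5, 0), "insert", {"PropA": "Ver1", "PropB": "Ver1"}),
    (2, datetime(2012, 1, 2, 6, 0), "delete", {}),
]
versions = reconstruct_versions(log, now=datetime(2012, 1, 4))
```

Because each version ends where the next change for the same key begins, ToDate falls out of a single ordered pass, and a deletion naturally becomes the ToDate of the last surviving version.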
EDIT: addressing the two use cases.
The first point is easily addressed: it's a matter of constructing the temporal object graph and then filtering:
declare @pointInTime datetime = '20120102 10:00';
select * from Reconstructed_TG where FromDate <= @pointInTime and @pointInTime < ToDate
The second point can be handled easily with the EXCEPT clause, as you point out.
Given the query above:
declare @pointInTimeA datetime = '20120102 10:00';
declare @pointInTimeB datetime = '20120103 01:00';
select * from Reconstructed_TG where FromDate <= @pointInTimeA and @pointInTimeA < ToDate
EXCEPT
select * from Reconstructed_TG where FromDate <= @pointInTimeB and @pointInTimeB < ToDate
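However, EXCEPT returns only the rows from the first query that have no exact match in the second, i.e. rows where at least one column differs. It is also one-directional, so to see changes on both sides you would run it the other way round as well. I don't know whether that information is really meaningful to the human eye; depending on your needs, a query that works directly on the CDC data may be more appropriate.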
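To try both queries without SQL Server, here is a hedged sketch using Python's built-in sqlite3 as a stand-in. The table mirrors the simplified result above; '9999-12-31 00:00' replaces getdate() as an open-ended ToDate (an assumption of this sketch), ISO-8601 strings are compared as text, and only the key and property columns are selected so the diff shows values rather than validity periods. EXCEPT is run in both directions:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Reconstructed_TG "
    "(PK INTEGER, PropA TEXT, PropB TEXT, FromDate TEXT, ToDate TEXT)"
)
conn.executemany("INSERT INTO Reconstructed_TG VALUES (?, ?, ?, ?, ?)", [
    (1, "Ver1", "Ver1", "2012-01-01 09:00", "2012-01-02 08:00"),
    (1, "Ver1", "Ver2", "2012-01-02 08:00", "2012-01-03 07:00"),
    (1, "Ver2", "Ver2", "2012-01-03 07:00", "9999-12-31 00:00"),
    (2, "Ver1", "Ver1", "2012-01-01 05:00", "2012-01-02 06:00"),
    (2, "Ver1", "Ver2", "2012-01-02 06:00", "2012-01-03 01:00"),
    (2, "Ver2", "Ver2", "2012-01-03 01:00", "9999-12-31 00:00"),
])

# Half-open validity test: FromDate <= t < ToDate.
AS_OF = ("SELECT PK, PropA, PropB FROM Reconstructed_TG "
         "WHERE FromDate <= ? AND ? < ToDate")


def state_at(t):
    """Rows valid at point in time t (use case 1)."""
    return set(conn.execute(AS_OF, (t, t)).fetchall())


def diff(a, b):
    """Rows only valid at a, and rows only valid at b (use case 2)."""
    only_a = conn.execute(f"{AS_OF} EXCEPT {AS_OF}", (a, a, b, b)).fetchall()
    only_b = conn.execute(f"{AS_OF} EXCEPT {AS_OF}", (b, b, a, a)).fetchall()
    return set(only_a), set(only_b)


removed, added = diff("2012-01-02 10:00", "2012-01-03 01:00")
```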
You may want to check out database snapshots, which have been built into SQL Server since 2005.
These will be most useful to you if you only need a few points in time, but they can help you track all of the tables in a complex database.
They store deltas, so compared to a full copy of a database, snapshots are highly space-efficient: a snapshot requires only enough storage for the pages that change during its lifetime. Generally, snapshots are kept for a limited time, so their size is not a major concern.
I'm not sure about this, since I've never done anything like it, but maybe you can add a "changeset" column to the table to keep track of its changes: on every transaction, get max(changeset) and save the new changes with the next value. Or, if you have a timestamp and want to know the state of your table at a certain time, query for the changes made before the date you want to check.
(Not sure whether I should write this as an answer or a comment... I'm new here.)
Anyway, hope it helps...
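The changeset idea could look roughly like this minimal sqlite3 sketch. The ItemHistory table and both helper functions are hypothetical: every write appends rows tagged with max(changeset) + 1, and the state as of a changeset is the latest row per item at or before it:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical history table: writes append rows instead of updating in place.
conn.execute("CREATE TABLE ItemHistory (ItemId INTEGER, Value TEXT, Changeset INTEGER)")


def save_changes(rows):
    """Append [(item_id, value), ...] under the next changeset number."""
    with conn:  # commit the whole changeset as one transaction
        (current,) = conn.execute(
            "SELECT COALESCE(MAX(Changeset), 0) FROM ItemHistory"
        ).fetchone()
        conn.executemany(
            "INSERT INTO ItemHistory VALUES (?, ?, ?)",
            [(item_id, value, current + 1) for item_id, value in rows],
        )
    return current + 1


def state_as_of(changeset):
    """Latest value of each item among changes at or before `changeset`."""
    return dict(conn.execute("""
        SELECT h.ItemId, h.Value
        FROM ItemHistory AS h
        WHERE h.Changeset = (SELECT MAX(Changeset) FROM ItemHistory
                             WHERE ItemId = h.ItemId AND Changeset <= ?)
    """, (changeset,)).fetchall())


c1 = save_changes([(1, "a"), (2, "b")])
c2 = save_changes([(1, "a2")])
```

Computing max + 1 like this is only safe with a single writer; with concurrent writers a sequence or identity column would be the safer choice. Deletions are not modelled here.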
I'm dealing with a table containing records from questionnaires administered to people after completing an activity. There are several questions on the questionnaire, so each person has multiple records with the same collection date, like so.
PersonID  Question         Result      CollectedDate
-----------------------------------------------------
1001      First activity?  Yes         10/23/2022
1001      Activity date    10/20/2022  10/23/2022
1001      Activity type    Painting    10/23/2022
1002      First activity?  No          10/24/2022
1002      Activity date    10/23/2022  10/24/2022
1002      Activity type    Writing     10/24/2022
Since my end goal is to compare the activity date with the questionnaire collection date and see how much time elapsed between them, I've altered my query a bit so I'm focusing only on each person's question regarding the activity date. It's a super simple query:
SELECT
PersonID,
Question,
Result,
CollectedDate
FROM Questionnaire
WHERE Question LIKE '%date%'
PersonID  Question         Result      CollectedDate
-----------------------------------------------------
1001      Activity date    10/20/2022  10/23/2022
1002      Activity date    10/23/2022  10/24/2022
My main issue is that the Result field is varchar(50) in order to accommodate text answers, so any dates seen there are actually from free text fields in the front-end interface. I've tried using both CAST() and CONVERT() to turn it into an actual date format so the difference between the dates can be calculated. I've seen both of the following errors depending on which function I'm using or which date/time style I'm attempting to apply:
Conversion failed when converting date and/or time from character string
The conversion of a varchar data type to a datetime data type resulted in an out-of-range value
I've tried:
SELECT
PersonID,
Question,
CAST(Result as date),
CollectedDate
FROM Questionnaire
WHERE Question LIKE '%date%'
and...
SELECT
PersonID,
Question,
CONVERT(DATETIME,Result,101) as Result,
CollectedDate
FROM Questionnaire
WHERE Question LIKE '%date%'
...and have tried several different styles. Does anyone have any further suggestions? Is the date itself likely the problem, or is it the fact that the Result field contains a bunch of other stuff too, even though that is currently omitted from the query results?
UPDATE: There are some wonky date formats in this Result field even when I have the other question types filtered out (I hate free text). For example, some are formatted like 05/01/2022 and others like 5/1/2022. Some others look like 5/19/2022 - 5/20/2022, as if the person couldn't remember the exact date of their activity. What's the best way to deal with all of this?
You should be able to get past the error by making sure you reject any value that can't be converted to a date. Largely, that means this:
Result = CASE
WHEN ISDATE(Result) = 1 THEN CONVERT(date, Result, 101) END
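As a sketch of the same reject-what-won't-convert idea outside the database (Python standard library only), assuming month-first dates like style 101. Anything that doesn't parse becomes None, the analogue of the NULL from the CASE expression. Python's strptime accepts both 05/01/2022 and 5/1/2022 for %m/%d/%Y, while ranges such as 5/19/2022 - 5/20/2022 are rejected so they can be reviewed by hand:

```python
from datetime import date, datetime


def parse_us_date(text):
    """Return a date for an mm/dd/yyyy string, or None if it doesn't parse."""
    try:
        return datetime.strptime(text.strip(), "%m/%d/%Y").date()
    except ValueError:  # free text, ranges, impossible dates, ...
        return None


def days_elapsed(result, collected):
    """Days between the activity date in `result` and the collection date."""
    activity = parse_us_date(result)
    return None if activity is None else (collected - activity).days


elapsed = days_elapsed("10/20/2022", date(2022, 10, 23))
```

The month-first assumption carries the same risk as style 101: a value entered day-first will either be rejected or silently parsed as the wrong date.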
You'd think it would be enough to say WHERE Question = 'Activity Date' AND ISDATE(Result) = 1, but:
Someone still might have entered bad data on that row.
SQL Server might try to perform the CONVERT() operation before the filter.
You can identify the ones that have bad data using:
WHERE Question = 'Activity Date' AND ISDATE(Result) = 0
But until you've fixed the structure and stored dates in a separate, properly typed column, cleaning up the data only postpones the problem; it's a matter of time before it happens again.
In the meantime, you might consider just displaying what the user entered as a string, instead of trying to force it to be converted to a date. That is especially wise since style 101 might be a bad guess: what if the user is from the UK or Canada? They may have entered 05/12 and meant December 5th, not May 12th.
I have a BigQuery dataset that updates at irregular times (once or twice a week, or less often). The data is structured as follows:
id  Column1     Column2      data_date (timestamp)
0   Datapoint0  Datapoint00  2022-01-01
1   Datapoint1  Datapoint01  2022-01-01
2   Datapoint2  Datapoint02  2022-01-03
3   Datapoint3  Datapoint03  2022-01-03
4   Datapoint4  Datapoint04  2022-02-01
5   Datapoint5  Datapoint05  2022-02-01
6   Datapoint6  Datapoint06  2022-02-15
7   Datapoint7  Datapoint07  2022-02-15
Timestamp is a string in 'YYYY-MM-DD' format.
I want to make a chart and a pivot table in Google Data Studio that automatically filter by the latest datapoints ('2022-02-15' in the example). All the solutions I've tried are either suboptimal or just don't work:
Creating a support column doesn't work because I need to mix aggregated and non-aggregated fields (data_date and the latest data_date)
Adding a filter to the charts only lets me specify a particular day, so I would need to edit the charts every time the underlying data is updated.
Using a dropdown filter lets me dynamically filter for whatever date I need. However, I consider it suboptimal because I can't have it automatically select the latest date. A date-range filter can make it dynamic, but since the updates are irregular, it may select a range containing multiple timestamps or none at all, so that's also suboptimal.
Honestly, I'm out of ideas. I naively thought it was possible to add a column saying data_date = (select max(data_date) from dataset), but that doesn't seem possible since MAX needs to work on aggregated data.
One possible solution is to create a view that keeps every row carrying the latest date, and reference that view from Data Studio.
CREATE OR REPLACE VIEW `project_id.dataset_id.table_name` AS
SELECT *
FROM `bigquery-public-data.covid19_ecdc_eu.covid_19_geographic_distribution_worldwide`
# Keep every row with the most recent date (use data_date in your own table)
WHERE date = (
  SELECT MAX(date)
  FROM `bigquery-public-data.covid19_ecdc_eu.covid_19_geographic_distribution_worldwide`
)
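The latest-date filter can be checked locally with a small sqlite3 sketch over the question's sample rows; SQLite is only a stand-in for BigQuery here. Because 'YYYY-MM-DD' strings sort in date order, MAX works on the string column directly, and both rows dated 2022-02-15 are kept rather than just one:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE dataset (id INTEGER, Column1 TEXT, Column2 TEXT, data_date TEXT)")
dates = ["2022-01-01", "2022-01-01", "2022-01-03", "2022-01-03",
         "2022-02-01", "2022-02-01", "2022-02-15", "2022-02-15"]
conn.executemany(
    "INSERT INTO dataset VALUES (?, ?, ?, ?)",
    [(i, f"Datapoint{i}", f"Datapoint0{i}", d) for i, d in enumerate(dates)],
)

# Same shape as the view body: every row whose date equals the maximum date.
latest = conn.execute("""
    SELECT id FROM dataset
    WHERE data_date = (SELECT MAX(data_date) FROM dataset)
    ORDER BY id
""").fetchall()
```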
I would like to write an SQL statement to find the number of users using a channel by date and time. Let me give you an example:
Let's call this table Data:
Date Start End
01.01.2020 17:00 17:30
01.01.2020 17:01 17:03
01.01.2020 17:29 18:30
Data is a table that shows when a user opened a connection on a channel and when the connection was closed. A connection can be made at any time, from 00:00 until the next day.
What I am trying to achieve is to count the maximum number of connections open at the same time over a long period of time, let's say 1 February to 1 April.
My idea was to make another table of timestamps in Excel, containing a timestamp for every minute of a given date.
Then I tried to make a statement like:
SELECT *
FROM Data,Timestamps
WHERE Timestamps.Time BETWEEN Data.Start AND Data.End;
Logically, this statement does what it is supposed to do. The problem is performance: with the number of timestamps and the amount of data I have to check, it never finishes.
Could anybody help me with this problem? Any other ideas I can try or how to improve my statement?
Regards!
Why do you create the timestamp table in Excel rather than directly in MS Access? And why not index the timestamp column properly? That will speed it up considerably.
By the way, I think your statement will repeat every user that matches each Start..End period, so the number of rows produced will be enormous. You should rather try:
SELECT Timestamps.Time, COUNT(*)
FROM Data,Timestamps
WHERE Timestamps.Time BETWEEN Data.Start AND Data.End
GROUP BY Timestamps.Time;
Sorry if the syntax differs in MS Access, though.
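A different technique that avoids the per-minute table entirely is an event sweep: turn each connection into a +1 event at its start and a -1 event at its end, sort, and keep a running total. A minimal Python sketch, assuming the day.month.year format shown in the question and that an End earlier than Start means the connection ran past midnight (both are assumptions, not stated in the question):

```python
from datetime import datetime, timedelta


def max_concurrent(sessions, fmt="%d.%m.%Y %H:%M"):
    """Return (peak, first time the peak is reached) for (date, start, end) rows."""
    events = []
    for day, start, end in sessions:
        t0 = datetime.strptime(f"{day} {start}", fmt)
        t1 = datetime.strptime(f"{day} {end}", fmt)
        if t1 < t0:
            t1 += timedelta(days=1)  # assumption: the connection ran past midnight
        # Order 0 sorts starts before ends at the same instant, so touching
        # intervals count as overlapping, like the inclusive BETWEEN.
        events.append((t0, 0, 1))
        events.append((t1, 1, -1))
    peak, current, peak_time = 0, 0, None
    for t, _, delta in sorted(events):
        current += delta
        if current > peak:
            peak, peak_time = current, t
    return peak, peak_time


data = [
    ("01.01.2020", "17:00", "17:30"),
    ("01.01.2020", "17:01", "17:03"),
    ("01.01.2020", "17:29", "18:30"),
]
peak, when = max_concurrent(data)
```

Sorting n connections costs O(n log n), regardless of how many minutes the reporting period spans.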
I run a query every day whose output goes into a file. It concerns the effective dates and term dates of coverage. Occasionally a group will term and then become effective again the next day. I need help with SQL that picks up the original effective date and the latest expiration (term) date. The example I'm giving is a very small part of the table, due to HIPAA regulations. The query I'm currently using is very simple, and I've supplied just the data rows. You'll see that this member has two effective dates and two term dates; I need to display them as one, with 01/01/2018 as the effective date and 12/31/9999 as the term date. I couldn't figure out how to add an attachment, so I'm just copying the two rows.
meme_altid meme_eff meme_trm
S409666X1E 2018-01-01 2018-12-31
S409666X1E 2019-01-01 9999-12-31
Earliest eff and latest term?
Select meme_altid, Min(meme_eff) As eff, Max(meme_trm) As term From #tbl
Group By meme_altid
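Note that MIN/MAX per member also merges coverage periods separated by a real gap. If only back-to-back periods should be merged, a gaps-and-islands pass does that; here is a minimal Python sketch (the function name is hypothetical) that merges spans when the next effective date is at most one day after the previous term date:

```python
from datetime import date


def merge_contiguous(periods):
    """Collapse (member, eff, term) rows into continuous coverage spans."""
    merged = []
    for member, eff, term in sorted(periods):
        last = merged[-1] if merged else None
        # Same member and the new span starts no later than the day after the
        # previous term date (subtracting dates avoids overflowing 9999-12-31).
        if last and last[0] == member and (eff - last[2]).days <= 1:
            merged[-1] = (member, last[1], max(last[2], term))
        else:
            merged.append((member, eff, term))
    return merged


rows = [
    ("S409666X1E", date(2018, 1, 1), date(2018, 12, 31)),
    ("S409666X1E", date(2019, 1, 1), date(9999, 12, 31)),
]
merged = merge_contiguous(rows)
```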
I've got a bit of a weird logic problem that I can't seem to wrap my head around (perhaps from studying it for too long).
Where I work we have a very old piece of software that we're required to use to track the status of equipment that we use. This software provides very little functionality to manipulate these statuses to try and provide a good analysis of downtime. I've been working on a database application in Access (since it's the only tool they make available to me) to import status data from the old system into a format that is more easily manipulated.
The status data the old program spits out is fairly straightforward:
EQUIPNAME STATUS STARTDATETIME ENDDATETIME
It's easy enough to read that text and insert it into the table in Access. The problem I'm having comes from trying to find how many hours a piece of equipment spent in different statuses over different date ranges.
A status can last any length of time, so finding which rows contain the dates is difficult. I've been using BETWEEN in SQL to try to find them, which for the most part works well:
SELECT * FROM Statuses WHERE
(StartDateTime BETWEEN [StartDT] AND [EndDT])
OR
(EndDateTime BETWEEN [StartDT] AND [EndDT])
The real issue is when StartDateTime is BEFORE StartDT and EndDateTime is AFTER EndDT (i.e., the entire range I'm looking for is INSIDE this status's start/end dates). The query simply doesn't find it, which makes sense.
I can't seem to come up with an elegant solution to this. I need to be able to select all rows which contain a status that contains or is contained within the supplied date range. I wouldn't normally come here for such a simple problem, but my brain and Google-fu are failing me.
A little bit of sample data:
EQUIP  STATUS  STARTDATETIME     ENDDATETIME
A123   OPER    01/30/2013 21:30  12/31/1999 00:00
A123   DFM     01/26/2013 10:42  01/30/2013 21:29
A123   OPER    01/01/2013 00:00  01/26/2013 10:41
B123   OPER    01/01/2013 00:00  12/31/1999 00:00
C123   DFU     01/29/2013 12:31  12/31/1999 00:00
C123   OPER    01/01/2013 00:00  01/29/2013 12:30
Any kind of booking collision occurs when:
RequestStartDate <= EndDate
and
RequestEndDate >= StartDate
The above will ALSO return ranges that fully contain the request. So if I query today + tomorrow, and a range runs from the beginning of the year to the end of the year, that range WILL be included.
For example:
Select * from tblEQUIP
where
#01/31/2013# <= ENDDATETIME
and
#02/01/2013# >= StartDateTime
At that point you can "process" each record. You likely have to use something like:
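Dim rs As DAO.Recordset
Dim datePtr As Date
Dim daysTotal As Long
' RequestStartDateTime and RequestEndDateTime are assumed to hold the requested range
Set rs = CurrentDb.OpenRecordset("SELECT * FROM tblEQUIP WHERE #01/31/2013# <= ENDDATETIME AND #02/01/2013# >= StartDateTime")
Do While Not rs.EOF
    ' Step through the requested range one day at a time
    For datePtr = RequestStartDateTime To RequestEndDateTime
        If datePtr >= rs!StartDateTime And datePtr <= rs!EndDateTime Then
            daysTotal = daysTotal + 1
        End If
    Next datePtr
    rs.MoveNext
Loop
rs.Close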
The above is air code, but it shows the basic approach: first grab the overlapping records, then loop over each record and add up the days/time that fall within the given date range.
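Instead of stepping through the range day by day, each overlapping status can be clipped to the requested range and its duration summed. A minimal Python sketch, assuming that ENDDATETIME 12/31/1999 00:00 in the sample data means "still in this status" (an assumption; such rows would otherwise fail the RequestStartDate <= EndDate test as written):

```python
from datetime import datetime

OPEN_ENDED = datetime(1999, 12, 31)  # assumed sentinel for "still in this status"


def minutes_by_status(rows, equip, range_start, range_end):
    """Minutes each status of `equip` overlaps [range_start, range_end]."""
    totals = {}
    for name, status, start, end in rows:
        if name != equip:
            continue
        if end == OPEN_ENDED:
            end = range_end  # an open status lasts at least to the range end
        # Clip the status to the requested range; no overlap if lo >= hi.
        lo, hi = max(start, range_start), min(end, range_end)
        if lo < hi:
            totals[status] = totals.get(status, 0) + int((hi - lo).total_seconds() // 60)
    return totals


fmt = "%m/%d/%Y %H:%M"
sample = [
    (e, s, datetime.strptime(a, fmt), datetime.strptime(b, fmt))
    for e, s, a, b in [
        ("A123", "OPER", "01/30/2013 21:30", "12/31/1999 00:00"),
        ("A123", "DFM", "01/26/2013 10:42", "01/30/2013 21:29"),
        ("A123", "OPER", "01/01/2013 00:00", "01/26/2013 10:41"),
        ("B123", "OPER", "01/01/2013 00:00", "12/31/1999 00:00"),
    ]
]
one_day = minutes_by_status(sample, "A123", datetime(2013, 1, 26), datetime(2013, 1, 27))
inside = minutes_by_status(sample, "A123", datetime(2013, 1, 31), datetime(2013, 2, 1))
```

The second call covers the case from the question where the requested range lies entirely inside one status.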
Interesting question! Sorry I don't have much time to elaborate now, but I would suggest exploring the Partition function for this, and/or a crosstab query with the date as the column heading. More on the Microsoft Office site and here.