storing data ranges - effective representation

I need to store a value for every day in a timeline, i.e. every user in the database should have a status assigned for every day, like this:
from 1.1.2000 to 28.05.2011 - status 1
from 29.05.2011 to 30.01.2012 - status 3
from 1.2.2012 to infinity - status 4
Each day should have exactly one status assigned, and the last status is open-ended (it lasts until another one is given). My question is: what is an efficient representation of this in an SQL database? The obvious solution is to create a row for each change (holding the last day on which the status applies in each range), like this:
uptodate status
28.05.2011 status 1
30.01.2012 status 3
01.01.9999 status 4
This has many problems. If I wanted to add another range, say from 15.02.2012, I would need to alter the last row too:
uptodate status
28.05.2011 status 1
30.01.2012 status 3
14.02.2012 status 4
01.01.9999 status 8
It also requires a lot of checking to make sure there are no overlaps or errors, especially if someone wants to modify ranges in the middle of the list. Inserting a new status from 29.01.2012 to 10.02.2012 is hard to implement (the date ranges of status 3 and status 4 would have to shrink accordingly to make room for the new status). Is there a better solution?
I thought about a completely different solution: storing each day's status in a separate row, so there would be a row for every day in the timeline. This would make updates easy: simply enter the new status for the rows with a date between start and end. Of course this would generate a large amount of redundant data, so it's a bad solution, but it is coherent and easy to manage. I was wondering if there is something in between, but I guess not.
More context: I want a moderator to be able to assign a status freely to any dates, and edit it if needed. Most often the moderator will be adding new status date ranges at the end. I don't really need the last status. After the moderator finishes editing a whole month, I need to generate a report based on the status of each day in that month. But at any time the moderator may want to edit data from months ago (which should be reflected in updated reports), and may set one status for, e.g., one year in advance.

You seem to want to use this table for two things: recording the current status and the history of status changes. You should separate the current status out and move it up to the parent (just like the registered date):

User
===============
Registered Date
Current Status

Status History
===============
Uptodate
Status

Your table structure should include the effective and end dates of the status period. This effectively "tiles" the statuses into groups that don't overlap. The last row should have a dummy end date (as you have above) or NULL. Using a value instead of NULL is useful if you have indexes on the end date.
With this structure, to get the status on any given date, you use the query:
select *
from t
where <date> between effdate and enddate
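As a minimal sketch of that lookup, the snippet below builds the tiled table in an in-memory SQLite database through Python's sqlite3 module. The names status_history and user_id are assumptions made for the example (only effdate and enddate come from the answer), dates are stored as ISO text so that string comparison matches date order, and the second range is ended on 2012-01-31 so the tiles leave no gap.

```python
import sqlite3

# Assumed names: status_history and user_id are made up for this sketch;
# effdate and enddate follow the answer. Dates are ISO text (yyyy-mm-dd).
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE status_history (
        user_id INTEGER NOT NULL,
        effdate TEXT    NOT NULL,  -- first day the status applies
        enddate TEXT    NOT NULL,  -- last day the status applies (inclusive)
        status  INTEGER NOT NULL,
        CHECK (effdate <= enddate)
    )
    """
)
conn.executemany(
    "INSERT INTO status_history VALUES (?, ?, ?, ?)",
    [
        (1, "2000-01-01", "2011-05-28", 1),
        (1, "2011-05-29", "2012-01-31", 3),
        (1, "2012-02-01", "9999-01-01", 4),  # dummy end date: still in effect
    ],
)


def status_on(conn, user_id, day):
    """Return the status in effect for user_id on the given ISO date, or None."""
    row = conn.execute(
        "SELECT status FROM status_history "
        "WHERE user_id = ? AND ? BETWEEN effdate AND enddate",
        (user_id, day),
    ).fetchone()
    return row[0] if row else None


print(status_on(conn, 1, "2011-12-24"))  # prints 3
```

Because both ends of each tile are inclusive, a day on a boundary belongs to exactly one tile as long as the tiles do not overlap.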
To add a new status at the end of the period requires two changes:
Modify the row in the table with the enddate = 01/01/9999 to have an enddate of yesterday.
Insert a new row with the effdate of today and an enddate of 01/01/9999
I would wrap this in a stored procedure.
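SQLite has no stored procedures, so the sketch below stands in for one with a Python function that runs both changes in a single transaction. It is only an illustration under assumptions: status_history, user_id and append_status are hypothetical names, dates are ISO text, and the start day is passed in as a parameter rather than always being today.

```python
import sqlite3
from datetime import date, timedelta

OPEN_END = "9999-01-01"  # dummy end date used for the open-ended last tile


def append_status(conn, user_id, new_status, start):
    """Close the open-ended tile on the day before `start`, then add the new tile."""
    day_before = (date.fromisoformat(start) - timedelta(days=1)).isoformat()
    with conn:  # one transaction: both statements are committed, or neither
        cur = conn.execute(
            "UPDATE status_history SET enddate = ? "
            "WHERE user_id = ? AND enddate = ? AND effdate < ?",
            (day_before, user_id, OPEN_END, start),
        )
        if cur.rowcount != 1:
            # Raising inside the with-block rolls the transaction back.
            raise ValueError("no open-ended tile that starts before the new one")
        conn.execute(
            "INSERT INTO status_history (user_id, effdate, enddate, status) "
            "VALUES (?, ?, ?, ?)",
            (user_id, start, OPEN_END, new_status),
        )


# Hypothetical table, seeded with a single open-ended tile for user 1.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE status_history ("
    "user_id INTEGER, effdate TEXT, enddate TEXT, status INTEGER)"
)
conn.execute("INSERT INTO status_history VALUES (1, '2012-02-01', ?, 4)", (OPEN_END,))
conn.commit()

append_status(conn, 1, 8, "2012-02-15")
rows = conn.execute(
    "SELECT effdate, enddate, status FROM status_history ORDER BY effdate"
).fetchall()
```

The extra condition effdate < start refuses an append that would leave the previous tile with an end date before its own start date.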
To change a status on one date in the past requires splitting one of the historical records in two. Multiple dates may require changing multiple records.
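One way to handle such an edit, sketched here under assumptions rather than as a definitive implementation: remove every tile that overlaps the new range, put back whatever part of each removed tile sticks out on either side, and then insert the new tile. The names status_history, user_id and set_status are hypothetical, dates are ISO text, and the example uses SQLite via Python.

```python
import sqlite3
from datetime import date, timedelta

INSERT_TILE = (
    "INSERT INTO status_history (user_id, effdate, enddate, status) "
    "VALUES (?, ?, ?, ?)"
)


def set_status(conn, user_id, start, end, new_status):
    """Assign new_status to every day in [start, end], trimming or splitting old tiles."""
    before = (date.fromisoformat(start) - timedelta(days=1)).isoformat()
    after = (date.fromisoformat(end) + timedelta(days=1)).isoformat()
    with conn:  # all changes succeed together or are rolled back
        overlapped = conn.execute(
            "SELECT rowid, effdate, enddate, status FROM status_history "
            "WHERE user_id = ? AND effdate <= ? AND enddate >= ?",
            (user_id, end, start),
        ).fetchall()
        for rowid, eff, old_end, old_status in overlapped:
            conn.execute("DELETE FROM status_history WHERE rowid = ?", (rowid,))
            if eff < start:      # part of the old tile survives on the left
                conn.execute(INSERT_TILE, (user_id, eff, before, old_status))
            if old_end > end:    # part of the old tile survives on the right
                conn.execute(INSERT_TILE, (user_id, after, old_end, old_status))
        conn.execute(INSERT_TILE, (user_id, start, end, new_status))


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE status_history ("
    "user_id INTEGER, effdate TEXT, enddate TEXT, status INTEGER)"
)
conn.executemany(
    "INSERT INTO status_history VALUES (?, ?, ?, ?)",
    [
        (1, "2000-01-01", "2011-05-28", 1),
        (1, "2011-05-29", "2012-01-31", 3),
        (1, "2012-02-01", "9999-01-01", 4),
    ],
)
conn.commit()

# The edit from the question: a new status (9 is an arbitrary value)
# from 29.01.2012 to 10.02.2012, overlapping the tiles of status 3 and 4.
set_status(conn, 1, "2012-01-29", "2012-02-10", 9)
tiles = conn.execute(
    "SELECT effdate, enddate, status FROM status_history ORDER BY effdate"
).fetchall()
```

A tile that fully contains the new range is split in two by the same loop, since both of its leftovers are re-inserted. The sketch does not merge neighbouring tiles that end up with the same status.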
If you have a date range, you can get all tiles that overlap a given time period with the query:
select *
from t
where <periodstart> <= enddate and <periodend> >= effdate
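For the per-day report mentioned in the question, the tiles can be expanded back into one row per day at query time instead of being stored that way. The sketch below is one possible approach, assuming SQLite (run through Python), ISO text dates, and the hypothetical names status_history, user_id, tiles_overlapping and daily_statuses; a recursive CTE generates the days of the period and each day is joined to the tile that covers it.

```python
import sqlite3


def tiles_overlapping(conn, user_id, period_start, period_end):
    """All tiles that share at least one day with [period_start, period_end]."""
    return conn.execute(
        "SELECT effdate, enddate, status FROM status_history "
        "WHERE user_id = ? AND ? <= enddate AND ? >= effdate ORDER BY effdate",
        (user_id, period_start, period_end),
    ).fetchall()


def daily_statuses(conn, user_id, period_start, period_end):
    """One (day, status) row per day of the period; status is None if no tile covers it."""
    return conn.execute(
        """
        WITH RECURSIVE days(day) AS (
            SELECT ?
            UNION ALL
            SELECT date(day, '+1 day') FROM days WHERE day < ?
        )
        SELECT d.day, h.status
        FROM days AS d
        LEFT JOIN status_history AS h
               ON h.user_id = ? AND d.day BETWEEN h.effdate AND h.enddate
        ORDER BY d.day
        """,
        (period_start, period_end, user_id),
    ).fetchall()


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE status_history ("
    "user_id INTEGER, effdate TEXT, enddate TEXT, status INTEGER)"
)
conn.executemany(
    "INSERT INTO status_history VALUES (?, ?, ?, ?)",
    [
        (1, "2000-01-01", "2011-05-28", 1),
        (1, "2011-05-29", "2012-01-31", 3),
        (1, "2012-02-01", "9999-01-01", 4),
    ],
)

report = daily_statuses(conn, 1, "2012-01-30", "2012-02-02")
```

Since the report is computed from the tiles each time, an edit to a past month shows up the next time that month's report is generated.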

Related

SQL - Update groups of data based on start and end dates

I have a table with dates of service for various hospital stays and want to update the starting and end dates for each claim to match the length of the entire stay. The table below has seven inpatient stays and dates of service for each of those stays. A min_max flag of 1 or 2 means that the dates in that row cover the entire length of that specific stay (each stay is color-coded).
Current table image here
I need to update the dates for all rows within each colored grouping to match the start and end dates of the row which has a min_max flag of 1 or 2 within the same group, to ultimately find the sum of claims in each stay. I could do this manually here or in Excel, but I need it done on a much larger scale with thousands of hospital stays.
Goal table here
TIA!

Running Total Based on State Change and Date Difference

I'm wanting to do a running total based on State and Date Difference.
I have machines that enter a row of data every few milliseconds providing their current state (Off-line, Stopped, Ready, Active & Error), some machine data and a timestamp.
These machines can run for a few minutes or a few days so using a date range doesn't work for the current status duration.
an example of the data is:-
RowID, MachineID, Status, TimeStamp
1, Machine1, Active, 27/04/2022 10:00:00.050
I want to pick up the current status, which I do by selecting the top 1 entry for the MachineID, ordered by RowID descending.
If the current status is Active, I want to know how long it has been in that state. The machine could have been active for a few minutes or a few days, so using a date range doesn't work for me; I want to perform a date diff between the last entry and the first entry at which the status changed to Active.
All advice is welcome, and thanks for reading my post.

Adjusting Overlapping Dates and Applying Other Rules

I'm trying to design a SELECT query that will modify some dates in a subset of "activity" records for each person in a table of data.
Each person is identified uniquely using "PersonID" and then each activity record using "RecordID".
Each subset of activity records a person has will also have dates against the "Start Date" and "End Date" fields.
For example, data like this (sorted by start date then longest duration):
(I've added the yellow bars to give an idea of some of the overlap and gaps between sets of dates).
Where I work, we have a task that involves adding a maximum of 1 "claim" record to associate with each of these activity records. The claim records have their own Start Date and End Date, but each claim record must:
Not cover a duration outside of the Start Date and End Date of the activity record it's being attached to.
Not overlap with the duration of any other claim record in the person's subset of claim records
Have a duration of at least 1 month, defined as either: starting on the first day of the month and ending on the last day of the month (e.g. 1/12/2018 to 31/12/2018), or starting in a month but ending in a different month (e.g. 31/12/2018 to 1/1/2019).
This is because the claim records are validated against the activity records (by an external validation tool we have no control over).
So, based on the example above, the query might output the following efficient set of dates for each activity record to use as claim records:
A brief run-down of what would happen on each record:
Record 1: For claim record 1, it used the original dates from the activity records, i.e. create a claim record that covers the entire activity period. If working through the records in the sort order described, it would make sense that it simply claims the full activity period for the first claim record for each person's subset of activity records.
Record 2: For claim record 2, NULLs have been supplied as there is no period to claim on this activity that hasn't already been claimed in record 1.
Record 3: For claim record 3, the start date has been set to 1/12/2018, because that is the earliest date to claim from that is both within this activity period and after the end month of the last claim (i.e. record 1's 25/11/2019 end date). The end date has been set to 31/12/2018. You may wonder why it is not set to the activity's original end date of 23/1/2019. If you look ahead to record 5, setting record 3's end date to 23/1/2019 would mean setting record 5's start date to 1/2/2019, and its end date would be 4/2/2019, which is not long enough to make a claim. So it would be more efficient to stop record 3 in December so record 5 can claim both January AND February. This may be hard to script though!
Record 4: For claim record 4, NULLs have been supplied as there is no period to claim on this activity that hasn't already been claimed in previous claim records.
Record 5: For claim record 5, the start date has been set to 1/1/2019. See record 3 for an explanation of why.
Record 6: For claim record 6, it used the original dates from the activity records. This was just to illustrate that not all activity records will overlap.
I'm not too sure how to approach this. I've looked at some CTE examples, but nothing that seems to match what I'm trying to do (perhaps too ambitious.. particularly the record 3 & 5 scenario?)
Any help / examples would be much appreciated.

monitor the time taken for each entry in a sql table and notify using email if the time taken is more than 5 minutes

I have a table which contains product details. If it is a new product, the status will be 1.
Once it has been purchased, the status changes to 2.
My requirement is to send an email to the owner if the product remains in status 1 for more than 5 minutes.
Please help me proceed: what are the ways to do this?

Maybe you can add a field like "LastStatusChangedOn", which is a DateTime (or a DateTimeOffset if you need to account for different time zones).
And then just select all Products where the difference between the current time and LastStatusChangedOn is greater than 5 minutes.
Without the exact database structure, it's impossible to give a complete sample, but something like this?
SELECT * FROM Products WHERE DateDiff(minute, LastStatusChangedOn, getdate()) > 5
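As a rough, runnable illustration of the same idea, here is a sketch in SQLite (through Python) rather than SQL Server, so the date arithmetic uses SQLite's datetime() instead of DateDiff/getdate(). Products and LastStatusChangedOn come from the answer; ProductID, Status, stale_products and the sample rows are assumptions, and the current time is passed in as a parameter so the check is repeatable. The sketch also filters on status 1, as the question requires. Sending the email would be done by whatever job runs this query periodically and is not shown.

```python
import sqlite3


def stale_products(conn, now):
    """IDs of products still in status 1 whose last status change is more than 5 minutes before `now`."""
    rows = conn.execute(
        "SELECT ProductID FROM Products "
        "WHERE Status = 1 AND LastStatusChangedOn < datetime(?, '-5 minutes') "
        "ORDER BY ProductID",
        (now,),
    ).fetchall()
    return [r[0] for r in rows]


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Products ("
    "ProductID INTEGER PRIMARY KEY, Status INTEGER, LastStatusChangedOn TEXT)"
)
conn.executemany(
    "INSERT INTO Products VALUES (?, ?, ?)",
    [
        (1, 1, "2022-04-27 10:00:00"),  # new for 10 minutes: should be reported
        (2, 1, "2022-04-27 10:07:00"),  # new for 3 minutes: not yet
        (3, 2, "2022-04-27 09:00:00"),  # already purchased: ignored
    ],
)

print(stale_products(conn, "2022-04-27 10:10:00"))  # prints [1]
```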

SQL - state machine - reporting on historical data based on changeset

I want to record user states and then be able to report historically based on the record of changes we've kept. I'm trying to do this in SQL (using PostgreSQL) and I have a proposed structure for recording user changes like the following.
CREATE TABLE users (
userid SERIAL NOT NULL PRIMARY KEY,
name VARCHAR(40),
status CHAR NOT NULL
);
CREATE TABLE status_log (
logid SERIAL,
userid INTEGER NOT NULL REFERENCES users(userid),
status CHAR NOT NULL,
logcreated TIMESTAMP
);
That's my proposed table structure, based on the data.
For the status field, 'a' represents an active user and 's' represents a suspended user:
INSERT INTO status_log (userid, status, logcreated) VALUES (1, 's', '2008-01-01');
INSERT INTO status_log (userid, status, logcreated) VALUES (1, 'a', '2008-02-01');
So this user was suspended on 1st Jan and active again on 1st of February.
If I wanted to get a suspended list of customers on 15th January 2008, then userid 1 should show up. If I get a suspended list of customers on 15th February 2008, then userid 1 should not show up.
1) Is this the best way to structure this data for this kind of query?
2) How do I query the data in either this structure or in your proposed modified structure so that I can simply have a date (say 15th January) and find a list of customers that had an active status on that date in SQL only? Is this a job for SQL?

This can be done, but would be a lot more efficient if you stored the end date of each log. With your model you have to do something like:
select l1.userid
from status_log l1
where l1.status='s'
and l1.logcreated = (select max(l2.logcreated)
from status_log l2
where l2.userid = l1.userid
and l2.logcreated <= date '2008-02-15'
);
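To try this start-date-only query out, here is a small sketch against the question's sample rows, using SQLite through Python instead of PostgreSQL (so timestamps are stored as ISO text, which is an assumption of the example, as is the helper name users_with_status).

```python
import sqlite3


def users_with_status(conn, status, day):
    """Users whose most recent log entry on or before `day` has the given status."""
    rows = conn.execute(
        """
        SELECT l1.userid
        FROM status_log l1
        WHERE l1.status = ?
          AND l1.logcreated = (SELECT MAX(l2.logcreated)
                               FROM status_log l2
                               WHERE l2.userid = l1.userid
                                 AND l2.logcreated <= ?)
        ORDER BY l1.userid
        """,
        (status, day),
    ).fetchall()
    return [r[0] for r in rows]


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE status_log ("
    "logid INTEGER PRIMARY KEY, userid INTEGER NOT NULL, "
    "status TEXT NOT NULL, logcreated TEXT)"
)
conn.executemany(
    "INSERT INTO status_log (userid, status, logcreated) VALUES (?, ?, ?)",
    [(1, "s", "2008-01-01"), (1, "a", "2008-02-01")],
)

print(users_with_status(conn, "s", "2008-01-15"))  # prints [1]
print(users_with_status(conn, "s", "2008-02-15"))  # prints []
```

The correlated subquery is evaluated per candidate row, which is the cost the answer refers to when it says storing an end date would be more efficient.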
With the additional column it would be more like:
select userid
from status_log
where status='s'
and logcreated <= date '2008-02-15'
and logsuperseded >= date '2008-02-15';
(Apologies for any syntax errors, I don't know Postgresql.)
To address some further issues raised by Phil:
A user might get moved from active, to suspended, to cancelled, to active again. This is a simplified version, in reality, there are even more states and people can be moved directly from one state to another.
This would appear in the table like this:
userid  from        to          status
FRED    2008-01-01  2008-01-31  s
FRED    2008-02-01  2008-02-07  c
FRED    2008-02-08              a
I used a null for the "to" date of the current record. I could have used a future date like 2999-12-31 but null is preferable in some ways.
Additionally, there would be no "end date" for the current status either, so I think this slightly breaks your query?
Yes, my query would have to be re-written as
select userid
from status_log
where status='s'
and logcreated <= date '2008-02-15'
and (logsuperseded is null or logsuperseded >= date '2008-02-15');
A downside of this design is that whenever the user's status changes you have to end date their current status_log as well as create a new one. However, that isn't difficult, and I think the query advantage probably outweighs this.
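A minimal sketch of that two-step status change, together with the null-aware query, might look like the following. It assumes SQLite (through Python) with ISO text dates, a numeric userid in place of FRED, and the hypothetical helper names change_status and users_with_status; logcreated and logsuperseded are the column names used in the queries above.

```python
import sqlite3
from datetime import date, timedelta


def change_status(conn, userid, new_status, on_day):
    """End-date the user's current log row (if any) and open a new one."""
    day_before = (date.fromisoformat(on_day) - timedelta(days=1)).isoformat()
    with conn:  # both statements in one transaction
        conn.execute(
            "UPDATE status_log SET logsuperseded = ? "
            "WHERE userid = ? AND logsuperseded IS NULL",
            (day_before, userid),
        )
        conn.execute(
            "INSERT INTO status_log (userid, status, logcreated, logsuperseded) "
            "VALUES (?, ?, ?, NULL)",
            (userid, new_status, on_day),
        )


def users_with_status(conn, status, day):
    """Users whose log row covering `day` has the given status."""
    rows = conn.execute(
        "SELECT userid FROM status_log "
        "WHERE status = ? AND logcreated <= ? "
        "AND (logsuperseded IS NULL OR logsuperseded >= ?) ORDER BY userid",
        (status, day, day),
    ).fetchall()
    return [r[0] for r in rows]


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE status_log ("
    "logid INTEGER PRIMARY KEY, userid INTEGER NOT NULL, status TEXT NOT NULL, "
    "logcreated TEXT NOT NULL, logsuperseded TEXT)"
)

# Suspended, then cancelled, then active again, as in the table above.
change_status(conn, 1, "s", "2008-01-01")
change_status(conn, 1, "c", "2008-02-01")
change_status(conn, 1, "a", "2008-02-08")

history = conn.execute(
    "SELECT status, logcreated, logsuperseded FROM status_log ORDER BY logcreated"
).fetchall()
```

The first call for a user updates nothing and simply inserts the opening row, so the same function covers both the initial status and later changes.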

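Does Postgres support analytic queries? If so, this would give the active users on 2008-02-15:
select userid
from
(
-- for each log row on or before the date, also compute that user's latest logcreated
select logid,
userid,
status,
logcreated,
max(logcreated) over (partition by userid) as max_logcreated_by_user
from status_log
where logcreated <= date '2008-02-15'
) latest -- PostgreSQL versions before 16 require an alias on a subquery in FROM
where logcreated = max_logcreated_by_user
and status = 'a';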
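The same window-function approach can be tried out in SQLite (version 3.25 or later, which supports window functions) through Python. This is only a sketch: the second user, the sample dates and the helper name users_with_status_on are assumptions added so the result differs between dates.

```python
import sqlite3


def users_with_status_on(conn, status, day):
    """Users whose latest log row on or before `day` has the given status."""
    rows = conn.execute(
        """
        SELECT userid
        FROM (
            SELECT userid, status, logcreated,
                   MAX(logcreated) OVER (PARTITION BY userid) AS max_logcreated_by_user
            FROM status_log
            WHERE logcreated <= ?
        ) AS latest
        WHERE logcreated = max_logcreated_by_user
          AND status = ?
        ORDER BY userid
        """,
        (day, status),
    ).fetchall()
    return [r[0] for r in rows]


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE status_log ("
    "logid INTEGER PRIMARY KEY, userid INTEGER NOT NULL, "
    "status TEXT NOT NULL, logcreated TEXT)"
)
conn.executemany(
    "INSERT INTO status_log (userid, status, logcreated) VALUES (?, ?, ?)",
    [
        (1, "s", "2008-01-01"),
        (1, "a", "2008-02-01"),
        (2, "a", "2008-01-10"),  # hypothetical second user
        (2, "s", "2008-02-10"),
    ],
)

print(users_with_status_on(conn, "a", "2008-02-15"))  # prints [1]
print(users_with_status_on(conn, "a", "2008-01-15"))  # prints [2]
```

The date filter sits inside the subquery so that the window only sees rows on or before the date; filtering outside would compare against each user's overall latest row instead.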

@Tony the "end" date isn't necessarily applicable.
A user might get moved from active, to suspended, to cancelled, to active again. This is a simplified version, in reality, there are even more states and people can be moved directly from one state to another.
Additionally, there would be no "end date" for the current status either, so I think this slightly breaks your query?

@Phil
I like Tony's solution. It seems to most appropriately model the situation described. Any particular user has a status for a given period of time (a minute, an hour, a day, etc.), but it is for a duration, not an instant in time. Since you want to know who was active during a certain period of time, modeling the information as a duration seems like the best approach.
I am not sure that additional statuses are a problem. If someone is active, then suspended, then cancelled, then active again, each of those statuses would be applicable for a given duration, would they not? It may be a very short duration, such as a few seconds or a minute, but it would still be a length of time.
Are you concerned that a person's status can change multiple times in a given day, but you want to know who was active for a given day? If so, then you just need to define more specifically what it means to be active on a given day. If it is enough that they were active for any part of that day, then Tony's answer works well as is. If they would have to be active for a certain amount of time in a given day, then Tony's solution could be modified to determine the length of time (in hours, or minutes, or days) and to add further restrictions in the WHERE clause to retrieve the proper date, status, and length of time in that status.
As for there being no "end date" for the current status, that is no problem either as long as the end date is nullable. Simply use something like this: "WHERE enddate >= '2008-08-15' or enddate is null".
