table design for time-dependent data

I would like to store some time-dependent data in MS SQL Server, for example employee salary history. The attributes include employee number, effective-from date, salary, and reason for change.
My question is: should I include the effective-to date in the table? What is the best practice for storing this type of time-dependent data (current effective-to = next effective-from - 1), i.e. should I store a value that can be deduced from another record?
If effective-to is to be stored, can I use a trigger to maintain its value?
Michael

What you seem to have is a slowly changing dimension. The best approach depends on how the data will be used. After all, if you are never going to query the data, I can save you a bunch of time and perhaps money by advising you not to use a database at all.
That said, I would encourage you to have two dates on each row:
EffDate
EndDate
You can then determine the value of any attribute at any given point in history. This makes querying the data easier. But when loading the table, you need to remember to update the EndDate of the previously active row. That is one extra step on the load.
If you only have EffDate, then the querying is more difficult, although the load is slightly easier. If the only thing you will ever do is choose an employee and look at the records in order, this might be sufficient.
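To illustrate why querying is harder with only EffDate: the end of each period has to be derived from the next row for the same employee. This is a minimal sketch using Python's built-in sqlite3 module as a stand-in for SQL Server. The table name SalaryHistory, its columns, and the sample rows are assumptions made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE SalaryHistory (
        EmpNo   INTEGER NOT NULL,
        EffDate TEXT    NOT NULL,   -- ISO dates sort correctly as text
        Salary  INTEGER NOT NULL,
        PRIMARY KEY (EmpNo, EffDate)
    )""")
conn.executemany(
    "INSERT INTO SalaryHistory VALUES (?, ?, ?)",
    [(1, "2012-01-01", 50000), (1, "2012-07-01", 55000), (1, "2013-01-01", 60000)],
)

# Derive each row's exclusive end date as the next EffDate for the same employee.
# The current row has no successor, so its derived end is NULL.
derived = conn.execute("""
    SELECT h.EffDate,
           (SELECT MIN(n.EffDate)
              FROM SalaryHistory n
             WHERE n.EmpNo = h.EmpNo AND n.EffDate > h.EffDate) AS NextEffDate,
           h.Salary
      FROM SalaryHistory h
     WHERE h.EmpNo = 1
     ORDER BY h.EffDate""").fetchall()

for row in derived:
    print(row)
```

Every point-in-time query would have to repeat this subquery (or an equivalent window function), which is the cost of not storing the end date.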
As for the nature of the dates: if the dates are "continuous" (meaning they have a time component), then the EndDate of one record should equal the EffDate of the next. Querying for a point in time would use:
where MYDATE >= EffDate and MYDATE < EndDate
If the dates are discrete, then the EndDate is one day less than the next EffDate, and the condition becomes:
where MYDATE >= EffDate and MYDATE <= EndDate
You could also use between for this condition (but be careful when using between on anything that looks like a date).
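A minimal sketch of the two-date design with the half-open condition (MYDATE >= EffDate and MYDATE < EndDate), including the extra load step that closes the previously active row. It uses Python's sqlite3 as a stand-in for SQL Server. The table and function names are hypothetical, and using NULL as the EndDate of the current row is an assumption of this sketch, not something stated above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE SalaryHistory (
        EmpNo   INTEGER NOT NULL,
        EffDate TEXT    NOT NULL,
        EndDate TEXT,               -- NULL marks the currently active row (assumption)
        Salary  INTEGER NOT NULL,
        PRIMARY KEY (EmpNo, EffDate)
    )""")

def add_salary(emp_no, eff_date, salary):
    """Close the active row, then insert the new one (half-open intervals)."""
    with conn:  # one transaction: both statements succeed or neither does
        conn.execute(
            "UPDATE SalaryHistory SET EndDate = ? WHERE EmpNo = ? AND EndDate IS NULL",
            (eff_date, emp_no))
        conn.execute(
            "INSERT INTO SalaryHistory (EmpNo, EffDate, EndDate, Salary) "
            "VALUES (?, ?, NULL, ?)",
            (emp_no, eff_date, salary))

def salary_on(emp_no, my_date):
    """Return the salary in effect on my_date, or None if no row covers it."""
    row = conn.execute("""
        SELECT Salary FROM SalaryHistory
         WHERE EmpNo = ?
           AND ? >= EffDate
           AND (? < EndDate OR EndDate IS NULL)""",
        (emp_no, my_date, my_date)).fetchone()
    return row[0] if row else None

add_salary(1, "2012-01-01", 50000)
add_salary(1, "2012-07-01", 55000)
print(salary_on(1, "2012-06-30"))
print(salary_on(1, "2012-07-01"))
```

In SQL Server the same close-then-insert step could live in a stored procedure or a trigger; which one is appropriate depends on how the table is loaded.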

You need a child table holding the salary history. Its primary key is a composite key (employeeId and a sequence id). This table will be your reference for historical data.
ex:
empX, 1, fromDate 10/10/2012, toDate 11/10/2012, 1,980,000$,
empX, 2, .....
empX, 3,....
Your employee table will hold the current salary and its effective date. This table will be faster to query.
..and this employee is a lucky one ;)

Related

I need help counting char occurrences in a row with SQL (using Firebird server)

I have a table where I have these fields:
id(primary key, auto increment)
car registration number
car model
garage id
and 31 fields, one for each day of the month, in each row.
Each of these fields holds a code of 1 or 2 characters representing the car's status on that date. The field for any day can hold one of the values D, I, R, TA, RZ, BV and LR.
For each row, I need to count how many times each value occurs in that row:
how many I, how many D and so on, for every row in the table.
What would be the best approach here? Also, maybe there is a better way than having a field in the table for each day, because that obviously makes over 30 fields.

There is a better way. You should structure the data so you have another table, with rows such as:
CarId
Date
Status
Then your query would simply be:
select status, count(*)
from CarStatuses
where date >= :month_start and date < :month_end
group by status;
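A minimal sketch of the normalized layout and the count query, run through Python's sqlite3 as a stand-in for Firebird. The column name StatusDate (instead of Date), the sample rows, and the extra grouping by CarId (to get counts per car, as the question asks) are assumptions of this sketch.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE CarStatuses (
        CarId      INTEGER NOT NULL,
        StatusDate TEXT    NOT NULL,   -- ISO yyyy-mm-dd
        Status     TEXT    NOT NULL,
        PRIMARY KEY (CarId, StatusDate)
    )""")
conn.executemany(
    "INSERT INTO CarStatuses VALUES (?, ?, ?)",
    [(1, "2016-03-01", "D"), (1, "2016-03-02", "D"), (1, "2016-03-03", "RZ"),
     (2, "2016-03-01", "I"), (2, "2016-03-02", "D"), (2, "2016-04-01", "TA")],
)

# Count each status per car within one month (half-open date range).
counts = conn.execute("""
    SELECT CarId, Status, COUNT(*)
      FROM CarStatuses
     WHERE StatusDate >= ? AND StatusDate < ?
     GROUP BY CarId, Status
     ORDER BY CarId, Status""",
    ("2016-03-01", "2016-04-01")).fetchall()
print(counts)
```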
For your data model, this is much harder to deal with. You can do something like this:
select status, count(*)
from ((select status_01 as status
from t
) union all
(select status_02
from t
) union all
. . .
(select status_31
from t
)
) s
group by status;
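The UNION ALL approach can be sketched as follows, again with Python's sqlite3 standing in for Firebird. Only three day columns are created to keep the example short, and the id column is carried through so the counts come out per row. The table contents are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
DAYS = ["status_01", "status_02", "status_03"]  # the real table would have 31
conn.execute(
    f"CREATE TABLE t (id INTEGER PRIMARY KEY, {', '.join(c + ' TEXT' for c in DAYS)})")
conn.executemany("INSERT INTO t VALUES (?, ?, ?, ?)",
                 [(1, "D", "D", "RZ"), (2, "I", "D", None)])

# Build one SELECT per day column and glue them together with UNION ALL.
unpivot = " UNION ALL ".join(f"SELECT id, {c} AS status FROM t" for c in DAYS)
counts = conn.execute(f"""
    SELECT id, status, COUNT(*)
      FROM ({unpivot}) s
     WHERE status IS NOT NULL
     GROUP BY id, status
     ORDER BY id, status""").fetchall()
print(counts)
```

Generating the 31 branches from a list of column names avoids typing them by hand, but the query still reads the table once per day column.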

You seem to need to start with the most basic tutorials on relational databases and SQL design. Classic works such as "Martin Gruber - Understanding SQL" may help, or others. At the moment you are missing the basics.
A few hints.
Documents that you print for the user or receive from the user do not represent your internal data structures. They are created and parsed purely as a machine-to-human interface. Internally, your program should structure the data for ease of storing and processing.
You have to add a "dictionary table" for the statuses.
ID / abbreviation / human-readable description
You may have a "business rule" that from the "R" status you can transition to either the "D" status or the "BV" status, but not to any other. In other words, you had better draft the "directed graph" of possible status transitions. You would keep it in extra columns of that dictionary table or in one more specialized helper table: a dictionary of transitions for the dictionary of possible statuses.
Your paper form combines both totals and per-day detail in the same row. That is easy for a human to read, but for a computer it in a sense violates the single responsibility principle. A row should be responsible either for a primary record or for a derived total. You had better have two tables: one for the primary day-by-day records and another for the per-month totals.
A bonus is that when you change values in the primary data table, you can ask the server to recalculate the corresponding month totals automatically. Read about SQL triggers.
Your triggers may also check whether the new state is a valid transition from the previous day's state, as described in the "business rules". They might also have to check that there are no gaps between days. If there is a record for "March 03" and a new record for "March 05" is inserted, then a record for "March 04" should exist, or the server would refuse to add the row. Well, maybe not; that depends on your business processes. The general idea is that the server should reject any data that it can know to be invalid.
Your per-date and per-month tables should have proper UNIQUE CONSTRAINTs prohibiting duplicate rows. It also means the former should have a DATE-type column, and the latter should have either month and year INTEGER-type columns or a DATE-type column whose day part is always "1"; you would want a CHECK CONSTRAINT for that.
If your company has some registry of cars (and it probably does; it does not look like those cars were driven in by random one-time customers passing by), you have to introduce a dictionary table of cars: integer ID (PK), registration plate, engine factory number, vagon factory number, colour and whatever else.
The per-month totals table would not have many columns per every status. It would instead have a special row for every status! The structure would probably be like that: Month / Year / ID of car in the registry / ID of status in the dictionary / count. All columns would be integer type (some may be SmallInt or BigInt, but that is minor nuancing). All the columns together (without count column) should constitute a UNIQUE CONSTRAINT or even better a "compound" Primary Key. Adding a special dedicated PK column here in the totaling table seems redundant to me.
Consequently, your per-day and per-month tables would not have literal (textual and immediate) data for status and car id. Instead they would have integer IDs referencing proper records in the corresponding cars dictionary and status dictionary tables. That you would code as FOREIGN KEY.
Remember the rule of thumb: it is easy to add/delete a row to any table but quite hard to add/delete a column.
With a column-oriented design like yours, what would happen if next year the boss introduced some more statuses? You would have to redesign the table, change the program in many places, and so on.
With the row-oriented design you would just have to add one row to the statuses dictionary and maybe a few rows to the transition rules dictionary, and the rest works without any change.
That way you would not
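The hints above (status dictionary, car registry, per-day table with a uniqueness constraint, per-month totals with a compound primary key, foreign keys, and a trigger that maintains the totals) could be sketched roughly like this. Python's sqlite3 stands in for Firebird, so the trigger syntax differs from Firebird PSQL. All table, column, and trigger names and the sample rows are hypothetical, and the trigger only handles inserts.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled
conn.executescript("""
    CREATE TABLE Statuses (
        StatusId    INTEGER PRIMARY KEY,
        Abbrev      TEXT NOT NULL UNIQUE,
        Description TEXT NOT NULL
    );
    CREATE TABLE Cars (
        CarId        INTEGER PRIMARY KEY,
        Registration TEXT NOT NULL UNIQUE
    );
    CREATE TABLE CarDayStatus (
        CarId      INTEGER NOT NULL REFERENCES Cars(CarId),
        StatusDate TEXT    NOT NULL,
        StatusId   INTEGER NOT NULL REFERENCES Statuses(StatusId),
        PRIMARY KEY (CarId, StatusDate)      -- one status per car per day
    );
    CREATE TABLE CarMonthTotals (
        Year     INTEGER NOT NULL,
        Month    INTEGER NOT NULL CHECK (Month BETWEEN 1 AND 12),
        CarId    INTEGER NOT NULL REFERENCES Cars(CarId),
        StatusId INTEGER NOT NULL REFERENCES Statuses(StatusId),
        Cnt      INTEGER NOT NULL,
        PRIMARY KEY (Year, Month, CarId, StatusId)
    );
    -- Keep the monthly totals in step with inserts into the daily table.
    CREATE TRIGGER CarDayStatus_ai AFTER INSERT ON CarDayStatus
    BEGIN
        INSERT OR IGNORE INTO CarMonthTotals
        VALUES (CAST(strftime('%Y', NEW.StatusDate) AS INTEGER),
                CAST(strftime('%m', NEW.StatusDate) AS INTEGER),
                NEW.CarId, NEW.StatusId, 0);
        UPDATE CarMonthTotals SET Cnt = Cnt + 1
         WHERE Year  = CAST(strftime('%Y', NEW.StatusDate) AS INTEGER)
           AND Month = CAST(strftime('%m', NEW.StatusDate) AS INTEGER)
           AND CarId = NEW.CarId AND StatusId = NEW.StatusId;
    END;
""")
conn.execute("INSERT INTO Statuses VALUES (1, 'D', 'hypothetical description')")
conn.execute("INSERT INTO Cars VALUES (1, 'ABC-123')")
conn.executemany("INSERT INTO CarDayStatus VALUES (1, ?, 1)",
                 [("2016-03-01",), ("2016-03-02",)])
totals = conn.execute("SELECT * FROM CarMonthTotals").fetchall()
print(totals)
```

Updates and deletes on the daily table would need matching triggers, and a new status is just one more row in Statuses.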

How to create a custom primary key using strings and date

I have an order table in sql server and I need for the order number primary key to be like this
OR\20160202\01
OR is just a string
20160202 is the Date
01 is sequence number for that day
for second Order record the same day it would be
OR\20160202\02 and so on..
The backslashes should also be included.
What's the way to go about creating such a field in SQL Server (using version 2016)?
EDIT: to add more context on what the sequence number is: it is just a way for this field, composite or not, to be unique. Without a sequence number I would get duplicate records in the DB, because I could have many records on the same day, so the date would stay the same and the value would be something like
OR\20160202 for all rows for that particular day, which would be a duplicate. Adding a "sequence" number solves this.

The best way is to not create such a column in SQL. You're effectively combining multiple pieces of data into the same column, which shouldn't happen in a relational database for many reasons. A column should hold one piece of data.
Instead, create a composite primary key across all of the necessary columns.

Use a composite PK:
order varchar(20)
orDate DateTime
select *
, row_number() over (partition by cast(orDate as Date) order by orDate) as seq
from table
I will leave it to you how to concatenate the data.
That is a presentation concern; don't make it a problem for the PK.

About "sequence number for that day" (department, year, country, ...).
Almost every time I discussed such a requirement with end users, it turned out to be a misunderstanding of how a shared database works: a vague attempt to repeat old tricks (separate databases, Excel files or even paperwork) on a shared database.
So I second Tom H and others: first try not to do it.
If you nevertheless must do it, for legal or other non-negotiable reasons, then I hope you are on 2012+. Create a SEQUENCE for every day.

A formatted PK is not a good idea. A composite key is a better approach. Use the combination of the day as a date column and the order number as a bigint column. This helps query performance too.
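A minimal sketch of the composite-key idea: the key is (date, per-day sequence), and the OR\yyyymmdd\nn string is built only for presentation. Python's sqlite3 stands in for SQL Server. The table, column, and function names are hypothetical, and the MAX + 1 numbering is one simple option chosen for this sketch.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE Orders (
        OrderDate TEXT    NOT NULL,   -- ISO yyyy-mm-dd
        Seq       INTEGER NOT NULL,   -- restarts at 1 each day
        PRIMARY KEY (OrderDate, Seq)
    )""")

def add_order(order_date):
    """Insert an order with the next sequence number for that day."""
    with conn:
        conn.execute("""
            INSERT INTO Orders (OrderDate, Seq)
            SELECT ?, COALESCE(MAX(Seq), 0) + 1 FROM Orders WHERE OrderDate = ?""",
            (order_date, order_date))

def display_number(order_date, seq):
    """Build the presentation string from the two key columns."""
    return "OR\\{}\\{:02d}".format(order_date.replace("-", ""), seq)

for d in ("2016-02-02", "2016-02-02", "2016-02-03"):
    add_order(d)
labels = [display_number(d, s) for d, s in
          conn.execute("SELECT OrderDate, Seq FROM Orders ORDER BY OrderDate, Seq")]
print(labels)
```

With concurrent writers, two sessions can compute the same MAX + 1; the primary key then rejects the second insert, so real code would need a retry or appropriate locking.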

You might want to explore a 'Date Dimension' table. A date dimension is a commonly used table in data warehousing. It stores all the days of the calendar (for the range of years you choose) along with generated numeric keys for those days. See this post on creating one in SQL Server:
https://www.mssqltips.com/sqlservertip/4054/creating-a-date-dimension-or-calendar-table-in-sql-server/

sql query to find missing records

I am using a transaction table called Student_Details, which contains Batch_No, PF_No, Emp_name, DOB, DOR and DOJ along with other details. There is another master table called Batch_Master, which contains Batch_No, From_date, To_date and Due_date.
I want to get the details of staff whose due date falls between Enter_date1 and Enter_date2, plus those staff whose due date is earlier than Enter_date1 but who have still not come (i.e. there is no record for the same person with a DOJ after the due date).
Please help me design a query in MS Access.

You should look specifically at the SQL WHERE clause. I think this would be most beneficial for what you are trying to achieve. WHERE is used in these types of conditional statements to say: if this is true (or false, depending on the situation), then return this row.
For example:
SELECT *
FROM <location>
WHERE Due_date > Enter_date1 AND Due_date < Enter_date2
However, I am really confused by this statement:
"such of those staff whose due_date is earlier to Enter_date1 but still not come (i.e there is no record for the same person with a DOJ after due_date"
To me this sounds like you just want to return every record with a due date earlier than Enter_date2, while ensuring that only unique records are returned... or so I assume.
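One possible reading of the confusing statement is an anti-join: keep overdue rows only when the same person has no later DOJ. Here is a minimal sketch of that reading using Python's sqlite3 as a stand-in for MS Access. The sample data, the reduced column set, the inclusive BETWEEN, and the interpretation of DOJ and Due_date are all assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Batch_Master (Batch_No INTEGER PRIMARY KEY, Due_date TEXT NOT NULL);
    CREATE TABLE Student_Details (Batch_No INTEGER NOT NULL, PF_No INTEGER NOT NULL,
                                  DOJ TEXT NOT NULL);
    INSERT INTO Batch_Master VALUES (1, '2015-01-10'), (2, '2015-06-15'), (3, '2016-03-01');
    INSERT INTO Student_Details VALUES
        (1, 100, '2014-01-05'),   -- overdue, never came back
        (1, 101, '2014-01-05'),   -- overdue, but has a later record in batch 3
        (3, 101, '2015-03-01'),
        (2, 102, '2014-06-10');   -- falls due inside the window
""")

# Rows due inside the window, plus rows due before it with no later DOJ
# for the same PF_No (NOT EXISTS acts as an anti-join).
due = conn.execute("""
    SELECT s.PF_No, b.Due_date
      FROM Student_Details s
      JOIN Batch_Master b ON b.Batch_No = s.Batch_No
     WHERE b.Due_date BETWEEN :d1 AND :d2
        OR (b.Due_date < :d1
            AND NOT EXISTS (SELECT 1 FROM Student_Details later
                             WHERE later.PF_No = s.PF_No
                               AND later.DOJ > b.Due_date))
     ORDER BY s.PF_No""",
    {"d1": "2015-06-01", "d2": "2015-06-30"}).fetchall()
print(due)
```

MS Access SQL also supports NOT EXISTS subqueries, though the parameter syntax differs from the named placeholders used here.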

sql server updating new column based on another column from same table

I have a table with composite primary key based on three columns.
(Unit, Account, service)
I have two other columns: DOB (this holds the date of birth) and Age (a new int column that needs to be updated based on DOB; the result will be an integer number of years).
I know how to retrieve the result for Age:
select datediff(Year,DOB,GETDATE()) as AGE
But I am not sure how to update the entire table based on each row's DOB.
Columns are Unit, Account, Service, DOB, Age

As per the comments, it isn't wise to persist Age as it varies daily (and possibly more frequently, if you have users in different timezones).
Also, your Age vs DOB algorithm isn't accurate - see here for a better one
As a consequence, IMO this is one scenario where a non-persisted COMPUTED column makes sense, like so:
ALTER TABLE Person ADD Age
AS DATEDIFF(yy, DOB, CURRENT_TIMESTAMP)
- CASE WHEN DATEPART(mm, DOB) < DATEPART(mm, CURRENT_TIMESTAMP)
OR (DATEPART(mm, DOB) = DATEPART(mm, CURRENT_TIMESTAMP)
AND DATEPART(dd, DOB) <= DATEPART(dd, CURRENT_TIMESTAMP))
THEN 0
ELSE 1
END
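To see why a plain year difference is not an age, here is a small sketch in Python (not T-SQL). DATEDIFF with the year datepart counts year boundaries crossed, which year_diff mimics; age subtracts one while this year's birthday is still ahead. Both function names are made up for this example.

```python
from datetime import date

def year_diff(dob, today):
    """Mimics DATEDIFF(year, dob, today): the number of year boundaries crossed."""
    return today.year - dob.year

def age(dob, today):
    """Whole years completed: subtract one if this year's birthday is still ahead."""
    birthday_pending = (today.month, today.day) < (dob.month, dob.day)
    return today.year - dob.year - (1 if birthday_pending else 0)

dob = date(2000, 12, 31)
print(year_diff(dob, date(2001, 1, 1)))  # 1, although the person is one day old
print(age(dob, date(2001, 1, 1)))        # 0
```

Comparing (month, day) as a pair matters: a birthday on January 20 has already passed by March 10, even though 20 is greater than 10.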

To answer your question:
UPDATE dbo.MyTable SET Age = datediff(Year,DOB,GETDATE());
This will update the entire table as per your requirements.
However, I strongly recommend you look at all the other answers here, especially the ones about the calculation error in the formula above.

Get rid of the age column and calculate the age like so:
SELECT DATEDIFF(yy, DOB, getdate()) AS age FROM daffyduck
There are very few cases where you need to store items such as age. Since age changes daily, you would have to update your records often. It is better to store a fixed value, such as the date of birth, from which the age can be calculated. Most DBMSs provide date arithmetic functions for this reason.

What is the best way to structure Days of the week in a db

This is a normalization thing, but I have to hold information about the days of the week. The user is going to select each day and enter a start time and a finish time. I need this info to be stored in a db. I can simply add 14 fields to the table and it will work (MondayStart, MondayFinish, TuesdayStart, etc). This doesn't seem

Do NOT design your database to match the UI.
My time keeping system at my job has a place to enter data for each day of the week. That doesn't mean you store it that way.
You need a table for users and one for times
User_T
User_ID
Time_log_T
User_ID
Start_dt (datetime)
End_dt (Datetime)
Everything can be derived from this.
If you want to allow one check-in per day, create a unique constraint on User_ID, TRUNC(start_DT). This also handles third shifts that wrap past midnight. An RDBMS constraint cannot express that the next start_dt for a given User_ID is > MAX(End_DT) for that user... you'll have to do that in code. Of course, if you allow records from previous days to be entered or corrected, you'll need a more complex check that they do not overlap.
Think of all the queries you'd throw at these tables; this will beat the 14 columns 99% of the time.
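A minimal sketch of the time-log table with the non-overlap rule enforced in code, plus one derived query (hours per weekday). Python's sqlite3 stands in for the actual RDBMS. The helper log_time, the sample rows, and storing timestamps as ISO text are assumptions of this sketch.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE Time_log_T (
        User_ID  INTEGER NOT NULL,
        Start_dt TEXT    NOT NULL,   -- ISO yyyy-mm-dd hh:mm
        End_dt   TEXT    NOT NULL,
        CHECK (End_dt > Start_dt)
    )""")

def log_time(user_id, start_dt, end_dt):
    """Insert a period unless it overlaps an existing one for the same user."""
    with conn:
        clash = conn.execute("""
            SELECT 1 FROM Time_log_T
             WHERE User_ID = ? AND Start_dt < ? AND End_dt > ?""",
            (user_id, end_dt, start_dt)).fetchone()
        if clash:
            raise ValueError("overlaps an existing period")
        conn.execute("INSERT INTO Time_log_T VALUES (?, ?, ?)",
                     (user_id, start_dt, end_dt))

log_time(1, "2016-02-01 09:00", "2016-02-01 17:00")   # a Monday
log_time(1, "2016-02-02 22:00", "2016-02-03 06:00")   # shift that wraps midnight

# Hours per weekday of the start time (strftime('%w'): 0 = Sunday ... 6 = Saturday).
hours = conn.execute("""
    SELECT CAST(strftime('%w', Start_dt) AS INTEGER) AS weekday,
           ROUND(SUM(julianday(End_dt) - julianday(Start_dt)) * 24, 2)
      FROM Time_log_T
     WHERE User_ID = 1
     GROUP BY weekday
     ORDER BY weekday""").fetchall()
print(hours)
```

Two periods overlap exactly when each starts before the other ends, which is what the Start_dt < ? AND End_dt > ? test checks.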

Users
id
...etc...
Days
id
day nvarchar (Monday, Tuesday, etc)
start_time datetime
end_time datetime
user_id
You could also break the day out of Days into a separate day-of-week table, to enforce consistency if you only want to allow specific days. Days would then become:
Days
id
day_of_week_id
...etc...
DaysOfWeek
id
name

I don't think moving the data to another table would accomplish anything. There would still be a one-to-one (main record to 14 fields) relationship. It would be more complex and run slower.
Your instincts are good but in this case I think you would be better off leaving the data in the table. Over-normalization is a bad thing.

You could create a table with 3 columns -- one for the day (this would be the primary key), one for the start time, and one for the finish time.
You would then have one row for each day of the week.
You could extend it with, say, a column for a user id, if you are storing the start and finish time for each user on each day (in this case, the primary key would be user id and day of the week)... or something similar to suit your needs.
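The per-user variant, with a primary key on user id plus day of the week, might look like this. It is a sketch using Python's sqlite3. The table and column names, the 1 = Monday numbering, and the hh:mm text format are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE UserDayHours (
        User_ID     INTEGER NOT NULL,
        Day_of_week INTEGER NOT NULL CHECK (Day_of_week BETWEEN 1 AND 7),  -- 1 = Monday
        Start_time  TEXT    NOT NULL,   -- hh:mm
        Finish_time TEXT    NOT NULL,
        PRIMARY KEY (User_ID, Day_of_week),
        CHECK (Finish_time > Start_time)
    )""")
conn.executemany("INSERT INTO UserDayHours VALUES (?, ?, ?, ?)",
                 [(1, 1, "09:00", "17:00"), (1, 2, "10:00", "16:00")])

# A second Monday row for the same user violates the primary key.
try:
    conn.execute("INSERT INTO UserDayHours VALUES (1, 1, '08:00', '12:00')")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

monday = conn.execute(
    "SELECT Start_time, Finish_time FROM UserDayHours "
    "WHERE User_ID = 1 AND Day_of_week = 1").fetchone()
print(duplicate_rejected, monday)
```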
