How to create a custom primary key using strings and date - sql

I have an order table in sql server and I need for the order number primary key to be like this
OR\20160202\01
OR is just a string
20160202 is the Date
01 is sequence number for that day
for second Order record the same day it would be
OR\20160202\02 and so on..
backlashes should also be included...
Whats the way to go about creating such a field in sql server (using version 2016)
EDIT: to add more context to what sequence number is, its just a way for this field composite or not to be unique. without a sequence number i would get duplicate records in DB because i could have many records the same day so date would remain the same thus it would be something like
OR\20160202 for all rows for that particular day so it would be duplicate. Adding a "sequence" number helps solve this.

The best way is to not create such a column in SQL. You're effectively combining multiple pieces of data into the same column, which shouldn't happen in a relational database for many reasons. A column should hold one piece of data.
Instead, create a composite primary key across all of the necessary columns.

composite pk
order varchar(20)
orDate DateTime
select *
, row_number() over (partition by cast(orDate as Date) order by orDate) as seq
from table
Will leave it to you on how to concatenate the data
That is presentation thing - don't make it a problem for the PK

About "sequence number for that day" (department, year, country, ...).
Almost every time I discussed such a requirement with end users it turned out to be just misunderstanding of how shared database works, a vague attempt to repeat old (separate databases, EXCEL files or even paper work) tricks on shared database.
So i second Tom H and others, first try not to do it.
If nevertheless you must do it, for legal or other unnegotiatable reasons then i hope you are on 2012+. Create SEQUENCE for every day.

Formatted PK is not a good idea.Composite key is a better approach.The combination of day as a date column and order number as a bigint column should be used.This helps in improving the query performance too.

You might want to explore 'Date Dimension' table. Date Dimension is commonly used table in data warehousing. It stores all the days of the calendar(based on your choice of years) and numeric generated keys for these days. Check this post on date dimension. It talks about creating one in SQL SERVER.
https://www.mssqltips.com/sqlservertip/4054/creating-a-date-dimension-or-calendar-table-in-sql-server/

Related

How to produce a reproducible column of random integers in SQL

I have a table of patient, with a unique patientID column. This patientID cannot be shared with study teams, so I need a randomised set of unique patient identifiers to be able to share. The struggle is that there will be several study teams, so every time a randomised identifier is produced, it needs to be different to the identifier produced for other studies. To make it even more complicated, we need to be able to reproduce the same set of random identifiers for a study at any point (if the study needs to re-run the data for example).
I have looked into the RAND() and NEWID() functions but not managed to figure out a solution. I think this may be possible using RAND() with a seed, and a while loop, but I haven't used these before.
Can anyone provide a solution that allows me to share several randomised sets of unique identifiers, that never have the same identifier for the same patient, and which can be re-run to produce the same list?
Thanks in advance to anyone that helps with this!
Your NEWID() should work as long as you have correct datatype.
Using UNIQUEIDENTIFIER as datatype should be unique across entire database/server. See full details from link below:
sqlshack.com/understanding-the-guid-data-type-in-sql-server
DECLARE #UNI UNIQUEIDENTIFIER
SET #UNI = NEWID()
SELECT #UNI
Comments from link:
As mentioned earlier, GUID values are unique across tables, databases, and servers. GUIDs can be considered as global primary keys. Local primary keys are used to uniquely identify records within a table. On the other hand, GUIDs can be used to uniquely identify records across tables, databases, and servers.
One method is to use the patientid as a seed to rand():
select rand(checksum(patientid))
This returns a value between 0 and 1. You can multiply by a large number.
That said, I think you should keep a list of patients in each study -- so you don't have to reproduce the results. Reproducing results seems dangerous, especially for something like a "study" that could have an impact on health.
This is too much for a comment. It's not black and white from your description and comments what you are asking for, but it appears you want to associate a new random ID value for each existing patients' ID, presumably being able to tie it back to the source ID, and produce the same random ID at a later date repeatedly.
It sounds like you'll need an intermediary table to store the randomly produced IDs (otherwise, being random how do you guarantee to get the same value for the same PatientID?)
Could you therefore have a table something like
create table Synonyms (
Id int not null identity(1,1),
PatientId int not null,
RandomId uniqueidentifier not null default newid(),
Createdate datetime not null default getdate()
)
PatientId is the foreign key to the actual Id of the Patent.
Each time you need a new random PatientId, insert the PatientIDs into this table and then join to it when querying out the patient data, supplying the RandomId instead. That way, you can reproduce the same random Id each time it's needed.
You could have a view that always provides the most recent RandomId value for each PatientId, or by some mechanism to track which "version" a report gets.
If you need a new Id for the patient, insert its Id again and you are guaranteed to get the same Id via whatever logic you need - ie you could have a ReportNo column as a sequence partitioned by PatientId or any number of other ways.
If you prefer to avoid a GUID you could make it an int and use a function to generate it by checking it's not already used, possibly a computed column with an inline function that selects top 1 from a numbers table that doesn't already exist as a RandomId... or something like that!
I may have completely misunderstood, hopefully it might give you some ideas though.

I need help counting char occurencies in a row with sql (using firebird server)

I have a table where I have these fields:
id(primary key, auto increment)
car registration number
car model
garage id
and 31 fields for each day of the mont for each row.
In these fields I have char of 1 or 2 characters representing car status on that date. I need to make a query to get number of each possibility for that day, field of any day could have values: D, I, R, TA, RZ, BV and LR.
I need to count in each row, amount of each value in that row.
Like how many I , how many D and so on. And this for every row in table.
What best approach would be here? Also maybe there is better way then having field in database table for each day because it makes over 30 fields obviously.
There is a better way. You should structure the data so you have another table, with rows such as:
CarId
Date
Status
Then your query would simply be:
select status, count(*)
from CarStatuses
where date >= #month_start and date < month_end
group by status;
For your data model, this is much harder to deal with. You can do something like this:
select status, count(*)
from ((select status_01 as status
from t
) union all
(select status_02
from t
) union all
. . .
(select status_31
from t
)
) s
group by status;
You seem to have to start with most basic tutorials about relational databases and SQL design. Some classic works like "Martin Gruber - Understanding SQL" may help. Or others. ATM you miss the basics.
Few hints.
Documents that you print for user or receive from user do not represent your internal data structures. They are created/parsed for that very purpose machine-to-human interface. Inside your program should structure the data for easy of storing/processing.
You have to add a "dictionary table" for the statuses.
ID / abbreviation / human-readable description
You may have a "business rule" that from "R" status you can transition to either "D" status or to "BV" status, but not to any other. In other words you better draft the possible status transitions "directed graph". You would keep it in extra columns of that dictionary table or in one more specialized helper table. Dictionary of transitions for the dictionary of possible statuses.
Your paper blank combines in the same row both totals and per-day detailisation. That is easy for human to look upon, but for computer that in a sense violates single responsibility principle. Row should either be responsible for primary record or for derived total calculation. You better have two tables - one for primary day by day records and another for per-month total summing up.
Bonus point would be that when you would change values in the primary data table you may ask server to automatically recalculate the corresponding month totals. Read about SQL triggers.
Also your triggers may check if the new state properly transits from the previous day state, as described in the "business rules". They would also maybe have to check there is not gaps between day. If there is a record for "march 03" and there is inserted a new the record for "march 05" then a record for "march 04" should exists, or the server would prohibit adding such a row. Well, maybe not, that is dependent upon you business processes. The general idea is that server should reject storing any data that is not valid and server can know it.
you per-date and per-month tables should have proper UNIQUE CONSTRAINTs prohibiting entering duplicate rows. It also means the former should have DATE-type column and the latter should either have month and year INTEGER-type columns or have a DATE-type column with the day part in it always being "1" - you would want a CHECK CONSTRAINT for it.
If your company has some registry of cars (and probably it does, it is not looking like those car were driven in by random one-time customers driving by) you have to introduce a dictionary table of cars. Integer ID (PK), registration plate, engine factory number, vagon factory number, colour and whatever else.
The per-month totals table would not have many columns per every status. It would instead have a special row for every status! The structure would probably be like that: Month / Year / ID of car in the registry / ID of status in the dictionary / count. All columns would be integer type (some may be SmallInt or BigInt, but that is minor nuancing). All the columns together (without count column) should constitute a UNIQUE CONSTRAINT or even better a "compound" Primary Key. Adding a special dedicated PK column here in the totaling table seems redundant to me.
Consequently, your per-day and per-month tables would not have literal (textual and immediate) data for status and car id. Instead they would have integer IDs referencing proper records in the corresponding cars dictionary and status dictionary tables. That you would code as FOREIGN KEY.
Remember the rule of thumb: it is easy to add/delete a row to any table but quite hard to add/delete a column.
With design like yours, column-oriented, what would happen if next year the boss would introduce some more statuses? you would have to redesign the table, the program in many points and so on.
With the rows-oriented design you would just have to add one row in the statuses dictionary and maybe few rows to transition rules dictionary, and the rest works without any change.
That way you would not

Calculate date and time key in fact table using existing date time field

I have date time field in a fact table in the format MM/DD/YY HH:MM:SS (e.g 2/24/2009 11:18:47 AM) and I have seperate date and time dimension
tables. What I would like ask is that how I can create date key and time key in the fact table using the date time field so that I can join the
date and time dimension.
There are alot of reference for creating seperate date and time dimension and their benefit but I could not find how to create date and time keys
in the fact table using existing date time field.
I have also heard that having date time field in fact table has certain benefit. If so, what would you recommend, should I have all three (date key, time key
and date time field) in the fact table. Date key and time key are must to have for me and I am concerned about fact table size if I have date time field
also in the fact table.
Thank you all for any help you can give.
What you need to do (if I understand correctly) is to create two fields in your Fact table:kTime, kDate.
We would always suggest using the primary keys for DimTime and DimDate as having meaning (this being a special case normally Dim tables' promary keys dont have any meaning). So e.g. in DimDate, we would have kDate as primary key with values formatted as YYYYMMDD so that you can order by kDate and it puts them in date order. Then have DimTime table having kTime primary key in the form HHMM or HHMMSS (depending on the resolution you need.
It is best to keep the actual date time field on the fact table as well, as it allows SQL to use its inbuilt date/time functions to do subsetting, but if you extend your Dim tables with useful extra columns : DimDate (add DayOfWeek, IsHoliday, DayNumber, MonthNumber, YearNumber, etc) and DimTime (HourNumber, MinuteNumber, IsWorkingTime), then you can perform very interesting queries very simply.
So to answer your question, "how to create date and time keys in the fact table using existing date time field?" ... as you are loading the data into the fact table, use the inbuilt date/time functions to create separate Date fields and Time fields.
It depends very much on how many rows you expect in your Fact table wether this approach will produce a lot of data, but it is the easiest to work with from a data warehouse point of view.
best of luck!

DateTime as part of PK in FACT Table for warehouses

I know generally its not good idea to have a DateTime column as your PK, however for my situation I believe it makes more sense than having a surrogate key in the Fact Table.
The reasons are...
The data inserted into the fact table is always sequential. i.e. I would never insert date time value that is older than the last value already in Fact table.
The date time field is not the only column of the PK (composite PK), The PK is of course itself and the dimension FK's surrogate key.
The way i be querying the data is nearly always based on time.
A surrogate key on the Fact table would tell me nothing about the row. Each row is already unique and to find that particular fact I would always filter on the Date time first and the values in the dimensions.
There is no separate datetime dimension table. No requirement now or in the foreseeable future to have named points in time etc.
Side notes - time will be in UTC and using SQL 2008 R2.
What I'm asking is are giving the situation - what are the disadvantages to doing this? Will I come up against unforeseen issues?
Is this actaully a good thing to be doing when querying that data back later?
Would like to know peoples view points on a DateTime field as the first column of a composite PK.
It's almost an essential feature of any data warehouse that date/time is a component of a key in most tables. There's nothing "wrong" with that.
A surrogate key generally shouldn't be the only key of a table, so perhaps your question is really "Should I create a surrogate key on my table as well?". My suggestion is that if you don't have a reason to create a surrogate key then don't. The time to create a surrogate is when you find that you need it.
Most fact tables have composite keys and date-time or often DateKey, TimeKey are part of it. Actually, quite common.
The dimDate and dimTime are simply used to avoid having "funny" date-time functions in the WHERE clause of a query. For example
-- sales on
-- weekends for previous 28 weeks
--
select sum(f.SaleAmount)
from factSale as f
join dimDate as d on d.DateKey = f.DateKey
where d.IsWeekend = 'yes'
and d.WeeksAgo between 1 and 28 ;
So here I can have indexes on IsWeekend and WeeksAgo (DateKey too). If these were replaced by date-time functions, this would cause row-by-row processing.

How are these tasks done in SQL?

I have a table, and there is no column which stores a field of when the record/row was added. How can I get the latest entry into this table? There would be two cases in this:
Loop through entire table and get the largest ID, if a numeric ID is being used as the identifier. But this would be very inefficient for a large table.
If a random string is being used as the identifier (which is probably very, very bad practise), then this would require more thinking (I personally have no idea other than my first point above).
If I have one field in each row of my table which is numeric, and I want to add it up to get a total (so row 1 has a field which is 3, row 2 has a field which is 7, I want to add all these up and return the total), how would this be done?
Thanks
1) If the id is incremental, "select max(id) as latest from mytable". If a random string was used, there should still be an incremental numeric primary key in addition. Add it. There is no reason not to have one, and databases are optimized to use such a primary key for relations.
2) "select sum(mynumfield) as total from mytable"
for the last thing use a SUM()
SELECT SUM(OrderPrice) AS OrderTotal FROM Orders
assuming they are all in the same column.
Your first question is a bit unclear, but if you want to know when a row was inserted (or updated), then the only way is to record the time when the insert/update occurs. Typically, you use a DEFAULT constraint for inserts and a trigger for updates.
If you want to know the maximum value (which may not necessarily be the last inserted row) then use MAX, as others have said:
SELECT MAX(SomeColumn) FROM dbo.SomeTable
If the column is indexed, MSSQL does not need to read the whole table to answer this query.
For the second question, just do this:
SELECT SUM(SomeColumn) FROM dbo.SomeTable
You might want to look into some SQL books and tutorials to pick up the basic syntax.