Best way to store thousands of numbers in SQL?

I have about 250 products and they all come with 52 weeks of history + 52 weeks of forecasts. I need to store these numbers in SQL but can't figure out the best way of doing it. I've only used databases a couple of times before so my knowledge is pretty limited...
I thought about using plain text and read/write with separators. But it felt bad in so many ways and made the entire database kinda useless.
Then I thought of adding 52 columns to the table, but I read it was a bad idea.
So now I'm back to where I began and it's just a table with
[ID] [WEEK_NUMBERS] [HISTORY_NUMBERS] [FORECAST_NUMBERS]
Is this the best way of doing it?
The ~25000-30000 rows are not a problem?

You would use a table something like this:
create table ProductHistory (
    ProductHistoryId int identity primary key,
    ProductId int not null references Products(ProductId),
    Type varchar(255) not null,
    Week date not null, -- storing this as a date is a guess
    Number decimal(38, 10), -- should probably be decimal, but the scale and precision might be overkill
    constraint chk_ProductHistory_type check (Type in ('Forecast', 'Actual'))
);
This is an example. It is unclear:
Should each row have a column for Forecast and Actual, or should they be on separate rows?
How should the week be stored?
What is the right type for Number?
But the idea is the same . . . at least one row per product/week combination.
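To get a feel for the row count, here is a minimal sketch of the one-row-per-product/week layout. It uses Python's built-in sqlite3 rather than SQL Server, so the column types differ from the table above; the start date and the 0.0 placeholder values are made up for illustration.

```python
import sqlite3
from datetime import date, timedelta

# Sizes taken from the question: 250 products, 52 weeks back + 52 weeks ahead.
N_PRODUCTS = 250
N_WEEKS = 52

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE ProductHistory (
        ProductHistoryId INTEGER PRIMARY KEY,
        ProductId INTEGER NOT NULL,
        Type TEXT NOT NULL CHECK (Type IN ('Forecast', 'Actual')),
        Week TEXT NOT NULL,   -- ISO date of the week start (an assumption)
        Number REAL,
        UNIQUE (ProductId, Type, Week)
    )
""")

this_week = date(2021, 1, 4)  # arbitrary Monday standing in for "now"
rows = []
for product_id in range(1, N_PRODUCTS + 1):
    for offset in range(-N_WEEKS, N_WEEKS):
        week = this_week + timedelta(weeks=offset)
        kind = "Actual" if offset < 0 else "Forecast"
        rows.append((product_id, kind, week.isoformat(), 0.0))  # placeholder value

conn.executemany(
    "INSERT INTO ProductHistory (ProductId, Type, Week, Number) VALUES (?, ?, ?, ?)",
    rows,
)

total_rows = conn.execute("SELECT COUNT(*) FROM ProductHistory").fetchone()[0]
rows_for_one_product = conn.execute(
    "SELECT COUNT(*) FROM ProductHistory WHERE ProductId = ?", (1,)
).fetchone()[0]
print(total_rows, rows_for_one_product)
```

That is 104 rows per product and 26,000 rows in total, which is a small table for any SQL database.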

Related

How to calculate age from birthday when creating a table in SQL?

This is my code
CREATE TABLE patients (
PatientID int PRIMARY KEY NOT NULL,
FirstName varchar(40) NOT NULL,
LastName varchar(40) NOT NULL,
PatientBirthday date,
PatientAge int AS (year(CURRENT_TIMESTAMP) - year(PatientBirthday))
);
But whenever I run it I get a syntax error highlighting on AS
I don't think you can. The expressions that are available in a calculated field are limited. If you remove the line with the calculated field, create the table, then switch to design view and try to add the calculated field manually, the design interface will give you a list of the available functions. YEAR is listed, but NOW isn't, and I think NOW is the Access equivalent of the ANSI current_timestamp. If you attempt YEAR(Now()), the editor rejects it.
YEAR([PatientBirthday]) is accepted by the editor, so that's not the issue. Sorry, I think this is not possible, at least not the way you want to do it. I think you will have to calculate the age during the SELECT, or UPDATE the ages after insert, or you could hardcode the current year: 2021 - YEAR([PatientBirthday]). I know that's less than ideal. It would also be possible to store the current year in a separate table and update that with VBA at database startup, then use that field in the calculation.
Access can be very hacky at times. I really would suggest using a full featured ANSI compliant DBMS if you have the choice.
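As a side note on calculating at query time: subtracting the years alone overstates the age for anyone whose birthday has not come round yet this year. A minimal sketch of the full calculation follows, written in Python rather than Access SQL; the function name age_on is made up for illustration.

```python
from datetime import date


def age_on(birthday: date, today: date) -> int:
    """Full years elapsed between birthday and today."""
    years = today.year - birthday.year
    # Subtract one if this year's birthday has not happened yet.
    if (today.month, today.day) < (birthday.month, birthday.day):
        years -= 1
    return years


print(age_on(date(2000, 6, 15), date(2021, 6, 14)))
print(age_on(date(2000, 6, 15), date(2021, 6, 15)))
```

Because the result depends on today's date, it belongs in the query (or the application) rather than in a stored column.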

How to produce a reproducible column of random integers in SQL

I have a table of patient, with a unique patientID column. This patientID cannot be shared with study teams, so I need a randomised set of unique patient identifiers to be able to share. The struggle is that there will be several study teams, so every time a randomised identifier is produced, it needs to be different to the identifier produced for other studies. To make it even more complicated, we need to be able to reproduce the same set of random identifiers for a study at any point (if the study needs to re-run the data for example).
I have looked into the RAND() and NEWID() functions but not managed to figure out a solution. I think this may be possible using RAND() with a seed, and a while loop, but I haven't used these before.
Can anyone provide a solution that allows me to share several randomised sets of unique identifiers, that never have the same identifier for the same patient, and which can be re-run to produce the same list?
Thanks in advance to anyone that helps with this!
NEWID() should work as long as you use the correct data type.
Values of type UNIQUEIDENTIFIER should be unique across the entire database/server. See the full details at the link below:
sqlshack.com/understanding-the-guid-data-type-in-sql-server
DECLARE @UNI UNIQUEIDENTIFIER
SET @UNI = NEWID()
SELECT @UNI
Comments from link:
As mentioned earlier, GUID values are unique across tables, databases, and servers. GUIDs can be considered as global primary keys. Local primary keys are used to uniquely identify records within a table. On the other hand, GUIDs can be used to uniquely identify records across tables, databases, and servers.
One method is to use the patientid as a seed to rand():
select rand(checksum(patientid))
This returns a value between 0 and 1. You can multiply by a large number.
That said, I think you should keep a list of patients in each study -- so you don't have to reproduce the results. Reproducing results seems dangerous, especially for something like a "study" that could have an impact on health.
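Seeding on patientid alone gives every study the same value for a patient. One way to get a different but repeatable value per study is to mix a study-specific secret into the calculation. The sketch below swaps RAND() for a keyed hash (HMAC-SHA256) and is written in Python, not T-SQL; the function name and the two keys are hypothetical. Collisions are not strictly impossible, just extremely unlikely with a 256-bit digest.

```python
import hashlib
import hmac


def study_identifier(patient_id: int, study_key: bytes) -> str:
    """Derive a repeatable pseudonym for one patient within one study."""
    message = str(patient_id).encode("utf-8")
    return hmac.new(study_key, message, hashlib.sha256).hexdigest()


# Hypothetical per-study secrets; in practice they would have to be stored securely.
STUDY_A_KEY = b"study-a-secret"
STUDY_B_KEY = b"study-b-secret"

patient_ids = range(1, 1001)
ids_a = [study_identifier(p, STUDY_A_KEY) for p in patient_ids]
ids_b = [study_identifier(p, STUDY_B_KEY) for p in patient_ids]
```

Re-running with the same key reproduces the same list, and anyone without the key cannot recompute the mapping from a patientid.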
This is too much for a comment. It's not entirely clear from your description and comments what you are asking for, but it appears you want to associate a new random ID value with each existing patient's ID, presumably while being able to tie it back to the source ID, and to produce the same random ID repeatedly at a later date.
It sounds like you'll need an intermediary table to store the randomly produced IDs (otherwise, being random how do you guarantee to get the same value for the same PatientID?)
Could you therefore have a table something like
create table Synonyms (
Id int not null identity(1,1),
PatientId int not null,
RandomId uniqueidentifier not null default newid(),
Createdate datetime not null default getdate()
)
PatientId is the foreign key to the actual Id of the Patient.
Each time you need a new random PatientId, insert the PatientIDs into this table and then join to it when querying out the patient data, supplying the RandomId instead. That way, you can reproduce the same random Id each time it's needed.
You could have a view that always provides the most recent RandomId value for each PatientId, or some mechanism to track which "version" a report gets.
If you need a new Id for the patient, insert its Id again and you are guaranteed to get the same Id via whatever logic you need, i.e. you could have a ReportNo column as a sequence partitioned by PatientId, or any number of other ways.
If you prefer to avoid a GUID you could make it an int and use a function to generate it by checking it's not already used, possibly a computed column with an inline function that selects top 1 from a numbers table that doesn't already exist as a RandomId... or something like that!
I may have completely misunderstood, hopefully it might give you some ideas though.
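A minimal sketch of the lookup-table idea, using Python with sqlite3 and uuid4 in place of SQL Server and NEWID(). The StudyId column and the random_id_for helper are assumptions added for illustration; they are not part of the table above.

```python
import sqlite3
import uuid

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE Synonyms (
        Id INTEGER PRIMARY KEY,
        PatientId INTEGER NOT NULL,
        StudyId TEXT NOT NULL,           -- hypothetical column: one mapping per study
        RandomId TEXT NOT NULL UNIQUE,
        UNIQUE (PatientId, StudyId)
    )
""")


def random_id_for(patient_id: int, study_id: str) -> str:
    """Return the stored RandomId for this patient and study, creating it once."""
    row = conn.execute(
        "SELECT RandomId FROM Synonyms WHERE PatientId = ? AND StudyId = ?",
        (patient_id, study_id),
    ).fetchone()
    if row is not None:
        return row[0]
    new_id = str(uuid.uuid4())
    conn.execute(
        "INSERT INTO Synonyms (PatientId, StudyId, RandomId) VALUES (?, ?, ?)",
        (patient_id, study_id, new_id),
    )
    return new_id


first = random_id_for(42, "study-a")
again = random_id_for(42, "study-a")
other = random_id_for(42, "study-b")
```

The identifier is random when first issued and reproducible afterwards only because it is stored, which is the point of the intermediary table.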

How to create a custom primary key using strings and date

I have an order table in sql server and I need for the order number primary key to be like this
OR\20160202\01
OR is just a string
20160202 is the Date
01 is sequence number for that day
for second Order record the same day it would be
OR\20160202\02 and so on..
Backslashes should also be included.
What's the way to go about creating such a field in SQL Server (using version 2016)?
EDIT: to add more context on what the sequence number is: it's just a way for this field, composite or not, to be unique. Without a sequence number I would get duplicate records in the DB, because I could have many records on the same day, so the date would remain the same and it would be something like
OR\20160202 for all rows for that particular day, so it would be a duplicate. Adding a "sequence" number helps solve this.
The best way is to not create such a column in SQL. You're effectively combining multiple pieces of data into the same column, which shouldn't happen in a relational database for many reasons. A column should hold one piece of data.
Instead, create a composite primary key across all of the necessary columns.
composite pk
order varchar(20)
orDate DateTime
select *
, row_number() over (partition by cast(orDate as Date) order by orDate) as seq
from table
I will leave it to you how to concatenate the data.
That is a presentation thing; don't make it a problem for the PK.
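For what it's worth, the concatenation can live entirely in the presentation layer. Here is a rough sketch in Python that numbers orders per day in time order (mirroring the row_number() query) and builds the display string; the function name order_numbers and the sample timestamps are made up.

```python
from collections import defaultdict
from datetime import datetime


def order_numbers(order_times):
    """Return (timestamp, display number) pairs, numbered per day in time order."""
    per_day = defaultdict(int)
    result = []
    for ts in sorted(order_times):
        per_day[ts.date()] += 1
        # "OR", the date as yyyymmdd and a two-digit daily sequence, backslash-separated.
        label = "OR\\{:%Y%m%d}\\{:02d}".format(ts, per_day[ts.date()])
        result.append((ts, label))
    return result


orders = [
    datetime(2016, 2, 2, 9, 0),
    datetime(2016, 2, 2, 15, 30),
    datetime(2016, 2, 3, 8, 0),
]
labels = [label for _, label in order_numbers(orders)]
print(labels)
```

Note that a two-digit sequence only covers 99 orders per day, and that numbers derived this way shift if an earlier order from the same day is deleted.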
About "sequence number for that day" (department, year, country, ...).
Almost every time I discussed such a requirement with end users, it turned out to be just a misunderstanding of how a shared database works, a vague attempt to repeat old tricks (separate databases, Excel files or even paperwork) on a shared database.
So I second Tom H and others: first try not to do it.
If you nevertheless must do it, for legal or other non-negotiable reasons, then I hope you are on 2012+. Create a SEQUENCE for every day.
A formatted PK is not a good idea. A composite key is a better approach. The combination of the day as a date column and the order number as a bigint column should be used. This helps in improving query performance too.
You might want to explore 'Date Dimension' table. Date Dimension is commonly used table in data warehousing. It stores all the days of the calendar(based on your choice of years) and numeric generated keys for these days. Check this post on date dimension. It talks about creating one in SQL SERVER.
https://www.mssqltips.com/sqlservertip/4054/creating-a-date-dimension-or-calendar-table-in-sql-server/

MySQL. Working with Integer Data Interval

I've just started using SQL, so I have no idea how to work with non-standard data types.
I'm working with MySQL...
Say, there are 2 tables: Stats and Common. The Common table looks like this:
CREATE TABLE Common (
Mutation VARCHAR(10) NOT NULL,
Deletion VARCHAR(10) NOT NULL,
Stats_id ??????????????????????,
UNIQUE(Mutation, Deletion) );
Instead of the ? symbols there must be some type that references the Stats table (Stats.id).
The problem is, this type must make it possible to save data in a format like 1..30 (the interval between 1 and 30). My idea was to use such a type to reduce the number of rows in the Common table.
Is it possible to do this, are there any different ideas?
Assuming that Stats.id is an INTEGER (if not, change the below items as appropriate):
first_stats_id INTEGER NOT NULL REFERENCES Stats(id),
last_stats_id INTEGER NOT NULL REFERENCES Stats(id)
Given that your table contains two VARCHAR fields and a unique index over them, having an additional integer field is the least of your concerns as far as memory usage goes (seriously, one 4-byte integer field represents a mere 1GB of memory for 262 million rows).
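To show how the two columns stand in for an interval such as 1..30, here is a small sketch using Python's sqlite3 instead of MySQL. The sample rows and the rows_covering helper are invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Stats (id INTEGER PRIMARY KEY);
    CREATE TABLE Common (
        Mutation VARCHAR(10) NOT NULL,
        Deletion VARCHAR(10) NOT NULL,
        first_stats_id INTEGER NOT NULL REFERENCES Stats(id),
        last_stats_id INTEGER NOT NULL REFERENCES Stats(id),
        UNIQUE (Mutation, Deletion)
    );
""")
conn.executemany("INSERT INTO Stats (id) VALUES (?)", [(i,) for i in range(1, 41)])
# Made-up sample rows: the first covers Stats ids 1..30, the second 31..40.
conn.executemany(
    "INSERT INTO Common VALUES (?, ?, ?, ?)",
    [("m1", "d1", 1, 30), ("m2", "d2", 31, 40)],
)


def rows_covering(stats_id):
    """Common rows whose [first_stats_id, last_stats_id] interval contains stats_id."""
    return conn.execute(
        "SELECT Mutation, Deletion FROM Common "
        "WHERE ? BETWEEN first_stats_id AND last_stats_id",
        (stats_id,),
    ).fetchall()


print(rows_covering(15), rows_covering(31))
```

This only works if the Stats ids belonging to one Common row really are contiguous; otherwise a separate link table with one row per pair is the usual alternative.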

Generate unique ID to share with multiple tables SQL 2008

I have a couple of tables in a SQL 2008 server that I need to generate unique ID's for. I have looked at the "identity" column but the ID's really need to be unique and shared between all the tables.
So if I have say (5) five tables of the flavour "asset infrastructure" and I want to run with a unique ID between them as a combined group, I need some sort of generator that looks at all (5) five tables and issues the next ID which is not duplicated in any of those (5) five tables.
I know this could be done with some sort of stored procedure but I'm not sure how to go about it. Any ideas?
The simplest solution is to set your identity seeds and increment on each table so they never overlap.
Table 1: Seed 1, Increment 5
Table 2: Seed 2, Increment 5
Table 3: Seed 3, Increment 5
Table 4: Seed 4, Increment 5
Table 5: Seed 5, Increment 5
The identity column mod 5 will tell you which table the record is in. You will use up your identity space five times faster so make sure the datatype is big enough.
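The arithmetic can be checked with a short simulation. This is plain Python standing in for the five identity columns; the helper names are made up.

```python
from itertools import count, islice

N_TABLES = 5


def identity_values(seed, increment=N_TABLES):
    """Yield the values an identity(seed, increment) column would hand out."""
    return count(seed, increment)


def table_for(identity_value):
    """Map an ID back to its table number (1..5) using the remainder mod 5."""
    remainder = identity_value % N_TABLES
    return remainder if remainder != 0 else N_TABLES


# First three IDs issued by each of the five tables.
first_ids = {t: list(islice(identity_values(t), 3)) for t in range(1, N_TABLES + 1)}
print(first_ids)
```

Table 5 is the one whose IDs have remainder 0, so the mod-5 lookup needs that small adjustment.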
Why not use a GUID?
You could let them each have an identity that seeds from numbers far enough apart never to collide.
GUIDs would work but they're butt-ugly, and non-sequential if that's significant.
Another common technique is to have a single-column table with an identity that dispenses the next value each time you insert a record. If you need them pulling from a common sequence, it may well be useful to have a second column indicating which table the value was dispensed to.
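A rough sketch of the dispenser-table technique, using Python's sqlite3 with AUTOINCREMENT in place of a SQL Server identity column. The table names (IdDispenser, Pumps, Valves) and the insert_asset helper are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE IdDispenser (
        Id INTEGER PRIMARY KEY AUTOINCREMENT,
        IssuedTo TEXT NOT NULL   -- which table received this Id
    );
    CREATE TABLE Pumps  (Id INTEGER PRIMARY KEY, Name TEXT NOT NULL);
    CREATE TABLE Valves (Id INTEGER PRIMARY KEY, Name TEXT NOT NULL);
""")


def insert_asset(table, name):
    """Take the next shared Id from the dispenser, then insert the asset with it."""
    if table not in ("Pumps", "Valves"):
        raise ValueError("unknown table")
    with conn:  # one transaction: both inserts commit, or neither does
        new_id = conn.execute(
            "INSERT INTO IdDispenser (IssuedTo) VALUES (?)", (table,)
        ).lastrowid
        conn.execute(f"INSERT INTO {table} (Id, Name) VALUES (?, ?)", (new_id, name))
    return new_id


a = insert_asset("Pumps", "p1")
b = insert_asset("Valves", "v1")
c = insert_asset("Pumps", "p2")
print(a, b, c)
```

The IssuedTo column doubles as a lookup from any ID to the table that holds it.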
You realize there are logical design issues with this, right?
Reading into the design a bit, it sounds like what you really need is a single table called "Asset" with an identity column, and then either:
a) 5 additional tables for the subtypes of assets, each with a foreign key to the primary key on Asset; or
b) 5 views on Asset that each select a subset of the rows and then appear (to users) like the 5 original tables you have now.
If the columns on the tables are all the same, (b) is the better choice; if they're all different, (a) is the better choice. This is a classic DB spin on the supertype / subtype relationship.
Alternately, you could do what you're talking about and recreate the IDENTITY functionality yourself with a stored proc that wraps INSERT access on all 5 tables. Note that you'll have to put a TRANSACTION around it if you want guarantees of uniqueness, and if this is a popular table, that might make it a performance bottleneck. If that's not a concern, a proc like that might take the form:
CREATE PROCEDURE InsertAsset_Table1
AS
BEGIN
    BEGIN TRANSACTION;
    -- SELECT MIN INTEGER NOT ALREADY USED IN ANY OF THE FIVE TABLES
    -- INSERT INTO Table1 WITH THAT ID
    COMMIT TRANSACTION; -- or roll back on error, etc.
END
Again, SQL is highly optimized for helping you out if you choose the patterns I mention above, and NOT optimized for this kind of thing (there's overhead with creating the transaction AND you'll be issuing shared locks on all 5 tables while this process is going on). Compare that with using the PK / FK method above, where SQL Server knows exactly how to do it without locks, or the view method, where you're only inserting into 1 table.
I found this when searching on Google. I am facing a similar problem for the first time. I had the idea to have a dedicated ID table specifically to generate the IDs, but I was unsure if it was something that was considered OK design. So I just wanted to say THANKS for the confirmation. It looks like it is an adequate solution, although not ideal.
I have a very simple solution. It should be good for cases when the number of tables is small:
create table T1(ID int primary key identity(1,2), rownum varchar(64))
create table T2(ID int primary key identity(2,2), rownum varchar(64))
insert into T1(rownum) values('row 1')
insert into T1(rownum) values('row 2')
insert into T1(rownum) values('row 3')
insert into T2(rownum) values('row 1')
insert into T2(rownum) values('row 2')
insert into T2(rownum) values('row 3')
select * from T1
select * from T2
drop table T1
drop table T2
This is a common problem, for example when using a table of people (called PERSON, singular please) where each person is categorized, for example as Doctors, Patients, Employees, Nurses etc.
It makes a lot of sense to create a table for each of these categories that contains their specific information, like an employee's start date and salary or a nurse's qualifications and number.
A Patient, for example, may have many nurses and doctors that work on him, so a many-to-many table that links Patient to other people in the PERSON table facilitates this nicely. In this table there should be some description of the relationship between these people, which leads us back to the categories for people.
Since a Doctor and a Patient could create the same Primary Key ID in their own tables, it becomes very useful to have a globally unique ID or Object ID (OID).
A good way to do this, as suggested, is to have a table designated to auto-increment the primary key. Perform an insert on that table first to obtain the OID, then use it for the new PERSON.
I like to go a step further. When things get ugly (some new developer gets his hands on the database, or even worse, a really old developer), it's very useful to add more meaning to the OID.
Usually this is done programmatically, not with the database engine, but if you use a BIGINT for all the Primary Key IDs then you have lots of room to prefix a number with a visually identifiable sequence. For example, all Doctors' IDs could begin with 100, all Patients' with 110, all Nurses' with 120.
To that I would append, say, a Julian date or a Unix date+time, and finally append the auto-increment ID.
This would result in numbers like:
110,2455892,00000001
120,2455892,00000002
100,2455892,00000003
Since the Julian date 100 years from now is only 2492087, you can see that 7 digits will adequately store this value.
A BIGINT is a 64-bit (8-byte) signed integer with a range of -9.22x10^18 to 9.22x10^18 (-2^63 to 2^63 - 1). Notice the exponent is 18. That's 18 digits you have to work with.
Using this design, you are limited to 100 million OIDs, 999 categories of people and dates up to... well past the shelf life of your database, but I suspect that's good enough for most solutions.
The operations required to create an OID like this are all multiplication and division, which avoids all the gear grinding of text manipulation.
The disadvantage is that INSERTs require more than a simple T-SQL statement, but the advantage is that when you are tracking down errant data, or even being clever in your queries, your OID is visually telling you a lot more than a random number or, worse, an eyesore like a GUID.
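As a sketch of the multiplication and division involved, assuming the 3 + 7 + 8 digit layout described above (the function names make_oid and split_oid are made up):

```python
DATE_DIGITS = 7        # Julian day number
SEQ_DIGITS = 8         # auto-increment part, up to 99,999,999
BIGINT_MAX = 2**63 - 1


def make_oid(category, julian_day, sequence):
    """Pack category, Julian day and sequence into one integer using arithmetic only."""
    if not (0 < category <= 999
            and 0 <= julian_day < 10**DATE_DIGITS
            and 0 <= sequence < 10**SEQ_DIGITS):
        raise ValueError("component out of range")
    return (category * 10**DATE_DIGITS + julian_day) * 10**SEQ_DIGITS + sequence


def split_oid(oid):
    """Recover (category, julian_day, sequence) with integer division."""
    rest, sequence = divmod(oid, 10**SEQ_DIGITS)
    category, julian_day = divmod(rest, 10**DATE_DIGITS)
    return category, julian_day, sequence


oid = make_oid(110, 2455892, 1)
print(oid, split_oid(oid))
```

Even the largest value this layout allows (999, 9999999, 99999999) has 18 digits and stays below the BIGINT maximum.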