CREATE TABLE Persons(
ID int not null,
Name varchar(255) not null,
Description varchar(255));
INSERT INTO Persons values(15, "Alex", [["cool",1,19],["strong", 1, 20]]);
Is it possible to use a list of lists in this case or should I use another type?
Consider how you will query this data in the future. For example, will you need to search for a person with a specific trait in their description? How would you write that query if it's stored in a "list" as you call it? Using any kind of semi-structured data makes it easy to put data in, but it's not always clear how to search the data afterwards. You should think ahead with this in mind.
If you use the technique of structuring your database into Normal Forms, you will end up with a database that is the most flexible in terms of supporting a wide variety of queries.
A standard relational DBMS is not meant to store such data, as it violates normalisation principles.
While the following schema will suffice to create a table, it saves a little time now and creates a massive time sink later.
CREATE TABLE Persons
(
ID int not null,
Name varchar(255) not null,
MultiValueColumnViolates1NF varchar(255)
)
;
It violates 1NF because the column MultiValueColumnViolates1NF allows multiple data tuples in a single cell. Yes, it can hold a list (JSON or XML, depending on the RDBMS flavour). Or, as most DBAs call this: garbage in, garbage out. Or as I call it: Excel tables.
A better design stores such data in at least 2NF, which in this case could be:
CREATE TABLE People
(
Name varchar(255) not null,
SingleValueColumn varchar(255)
)
;
The INSERT statement will then allow inserting data like:
INSERT INTO People
VALUES
( 'Alex', '["cool",1,19]' ),
( 'Alex', '["strong", 1, 20]')
;
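Retrieving the data back is then a straightforward lookup (a sketch against the schema above):

```sql
-- Returns one row per stored "list" for the same person
SELECT Name, SingleValueColumn
FROM People
WHERE Name = 'Alex';
```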
One issue: no unique key is possible, so multiple rows come back when data is retrieved for 'Alex'.
Probably not what you want to achieve.
An RDBMS-performant way to store this data is in two separate tables.
CREATE TABLE People
(
ID int not null,
Name varchar(255) not null
)
;
CREATE TABLE People_Data
(
ID_People int NOT NULL,
Key varchar(100) NOT NULL,
Value varchar(200) NOT NULL
)
;
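Getting a person together with their key/value data back out is then a simple join (a sketch; note that Key and Value are reserved words in some RDBMSs and may need quoting or bracketing there):

```sql
-- One row per attribute stored for the person
SELECT p.Name, d.Key, d.Value
FROM People AS p
JOIN People_Data AS d ON d.ID_People = p.ID
WHERE p.Name = 'Alex';
```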
The downside to data normalisation is that it makes it harder to get the data back out (Murphy's Law: a database stores data and is unwilling to show it again once it has hold of it).
If this is just to store data that will be used entirely outside the database forever and a day, then go with the first table creation.
If not, please use normalisation to allow fast and efficient analysis of the data through database tools.
Related
I have a Products table as follows:
create table dbo.Product (
Id int not null,
Name nvarchar (80) not null,
Price decimal not null
)
I am creating Baskets (lists of products) as follows:
create table dbo.Baskets (
Id int not null,
Name nvarchar (80) not null
)
create table dbo.BasketProducts (
BasketId int not null,
ProductId int not null
)
A basket is created based on a Search Criteria using parameters:
MinimumPrice;
MaximumPrice;
Categories (can be zero to many);
MinimumWarrantyPeriod
I need to save these parameters so later I know how the basket was created.
In the future I will have more parameters so I see 2 options:
Add MinimumPrice, MaximumPrice and MinimumWarrantyPeriod as columns to Basket table and add a BasketCategories and Categories tables to relate a Basket to Categories.
Create a more flexible design using a Parameters table:
create table dbo.BasketParameters (
BasketId int not null,
ParameterTypeId int not null,
Value nvarchar (400) not null
)
create table dbo.ParameterType (
Id int not null,
Name nvarchar (80) not null
)
Parameter types are MinimumPrice, MaximumPrice, Categories, MinimumWarrantyPeriod, etc.
So for each Basket I have a list of BasketParameters, all different, each holding one value. Later, if I need more parameter types, I just add them to the ParameterType table ...
The application will be responsible for using each Basket's parameters to build the Basket ... I will have, for example, a Categories table, but it will be decoupled from the BasketParameters.
Does this make sense? Which approach would you use?
Your first option is superior (especially since you are using a relational data store, i.e. SQL Server), since it is properly referential. It will be much easier to maintain and query, as well as far more performant.
Your second solution is equivalent to an EAV (entity-attribute-value) table: https://en.wikipedia.org/wiki/Entity%E2%80%93attribute%E2%80%93value_model
EAV tables are usually a terrible idea (and if you need that type of flexibility, you should probably use a document database or other NoSQL solution instead). Their only benefit is if you need to add/remove attributes regularly or based on other criteria.
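For illustration, a minimal sketch of that first option, extending the tables from the question (the parameter column names are assumptions based on the parameters you listed):

```sql
-- Parameters become typed, nullable columns on the basket itself
alter table dbo.Baskets add
    MinimumPrice decimal null,
    MaximumPrice decimal null,
    MinimumWarrantyPeriod int null

-- Zero-to-many categories live in a proper join table
create table dbo.BasketCategories (
    BasketId int not null,
    CategoryId int not null
)
```

Each new scalar parameter is one schema change, but in exchange every parameter is typed, indexable, and queryable with plain SQL.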
I have a sample table like below where Course Completion Status of a Student is being stored:
Create Table StudentCourseCompletionStatus
(
CourseCompletionID int primary key identity(1,1),
StudentID int not null,
AlgorithmCourseStatus nvarchar(30),
DatabaseCourseStatus nvarchar(30),
NetworkingCourseStatus nvarchar(30),
MathematicsCourseStatus nvarchar(30),
ProgrammingCourseStatus nvarchar(30)
)
Insert into StudentCourseCompletionStatus Values (1, 'In Progress', 'In Progress', 'Not Started', 'Completed', 'Completed')
Insert into StudentCourseCompletionStatus Values (2, 'Not Started', 'In Progress', 'Not Started', 'Not Applicable', 'Completed')
Now, as part of normalizing the schema, I have created two other tables - CourseStatusType and Status - for storing the course status names and the statuses.
Create Table CourseStatusType
(
CourseStatusTypeID int primary key identity(1,1),
CourseStatusType nvarchar(100) not null
)
Insert into CourseStatusType Values ('AlgorithmCourseStatus')
Insert into CourseStatusType Values ('DatabaseCourseStatus')
Insert into CourseStatusType Values ('NetworkingCourseStatus')
Insert into CourseStatusType Values ('MathematicsCourseStatus')
Insert into CourseStatusType Values ('ProgrammingCourseStatus')
Insert into CourseStatusType Values ('OperatingSystemsCourseStatus')
Insert into CourseStatusType Values ('CompilerCourseStatus')
Create Table Status
(
StatusID int primary key identity(1,1),
StatusName nvarchar (100) not null
)
Insert into Status Values ('Completed')
Insert into Status Values ('Not Started')
Insert into Status Values ('In Progress')
Insert into Status Values ('Not Applicable')
The modified table is as below:
Create Table StudentCourseCompletionStatus1
(
CourseCompletionID int primary key identity(1,1),
StudentID int not null,
CourseStatusTypeID int not null CONSTRAINT [FK_StudentCourseCompletionStatus1_CourseStatusType] FOREIGN KEY (CourseStatusTypeID) REFERENCES dbo.CourseStatusType (CourseStatusTypeID),
StatusID int not null CONSTRAINT [FK_StudentCourseCompletionStatus1_Status] FOREIGN KEY (StatusID) REFERENCES Status (StatusID)
)
I have a few questions on this:
Is this the correct way to normalize it? The old table made it easy to get data - I could store a student's course status in a single row, but now 5 rows are required. Is there a better way to do it?
Moving the data from the old table to this new table does not seem to be an easy task. Can I achieve this with a query, or do I have to do it manually?
Any help is appreciated.
You could also consider storing the results in a flat table like this:
studentID,courseID,status
1,1,"completed"
1,2,"not started"
2,1,"not started"
2,3,"in progress"
You will also need an additional Courses table, like this:
courseId,courseName
1, math
2, programming
3, networking
and a Students table like this:
studentID,name
1 "john smith"
2 "perry clam"
3 "john deere"
etc. You could also optionally create a Status table to store the distinct status strings and refer to their PK instead of the strings:
studentID,courseID,status
1,1,1
1,2,2
2,1,2
2,3,3
... etc
and a Status table:
id,status
1,"completed"
2,"not started"
3,"in progress"
The beauty of this representation is that it is quite easy to filter and aggregate the data, i.e. it is easy to query which subjects a particular student has completed, how many subjects an average student has completed, etc. These things are much more difficult in a columnar design like the one you had. You can also easily add new subjects without having to adapt your tables or even your queries - they will just work.
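For instance, with hypothetical table names matching the sample rows above (Students and StudentCourseStatus are assumed names), "how many subjects has each student completed" is a single grouped query:

```sql
-- Count completed courses per student in the flat design
select s.name, count(*) as completed_courses
from StudentCourseStatus scs
join Students s on s.studentID = scs.studentID
where scs.status = 'completed'
group by s.name;
```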
You can also always use SQL's PIVOT query to get back to a familiar columnar presentation like
name,mathstatus,programmingstatus,networkingstatus,etc..
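A sketch of such a PIVOT in SQL Server syntax (the table name is an assumption; the bracketed values are the courseIDs from the sample data):

```sql
-- One column per course, one row per student
select studentID,
       [1] as mathstatus,
       [2] as programmingstatus,
       [3] as networkingstatus
from (select studentID, courseID, status from StudentCourseStatus) as src
pivot (max(status) for courseID in ([1], [2], [3])) as pvt;
```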
but now 5 rows are required
No, it's still just one row. That row simply contains identifiers for values stored in other tables.
There are pros and cons to this. One of the main reasons to normalize in this way is to protect the integrity of the data. If a column is just a string then anything can be stored there. But if there's a foreign key relationship to a table containing a finite set of values then only one of those options can be stored there. Additionally, if you ever want to change the text of an option or add/remove options, you do it in a centralized place.
Moving the data from the old table to this new table seems to be not an easy task.
No problem at all. Create your new numeric columns on the data table and populate them with the identifiers of the lookup table records associated with each data table record. If they're nullable, you can make them foreign keys right away. If they're not nullable then you need to populate them before you can make them foreign keys. Once you've verified that the data is correct, remove the old de-normalized columns. Done.
In StudentCourseCompletionStatus1 you still need two associations, to Status and CourseStatusType. So I think you should consider the following variant of the normalization: your StudentCourseCompletionStatus would hold only one CourseStatusID, and another table, CourseStatus, would hold the associations to CourseType and Status.
To move your data you can surely use a query.
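For example, one INSERT ... SELECT per old status column would do it (a sketch; repeat the pattern, or UNION ALL it, for the other four status columns):

```sql
-- Migrate the AlgorithmCourseStatus column into the normalized table;
-- rows whose old status is NULL are skipped by the join
INSERT INTO StudentCourseCompletionStatus1 (StudentID, CourseStatusTypeID, StatusID)
SELECT s.StudentID, ct.CourseStatusTypeID, st.StatusID
FROM StudentCourseCompletionStatus AS s
JOIN CourseStatusType AS ct ON ct.CourseStatusType = 'AlgorithmCourseStatus'
JOIN Status AS st ON st.StatusName = s.AlgorithmCourseStatus;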
When creating tables, I have generally created them with a couple extra columns that track change times and the corresponding user:
CREATE TABLE dbo.Object
(
ObjectId int NOT NULL IDENTITY (1, 1),
ObjectName varchar(50) NULL ,
CreateTime datetime NOT NULL,
CreateUserId int NOT NULL,
ModifyTime datetime NULL ,
ModifyUserId int NULL
) ON [PRIMARY]
GO
I have a new project now where if I continued with this structure I would have 6 additional columns on each table with this type of change tracking. A time column, user id column and a geography column. I'm now thinking that adding 6 columns to every table I want to do this on doesn't make sense. What I'm wondering is if the following structure would make more sense:
CREATE TABLE dbo.Object
(
ObjectId int NOT NULL IDENTITY (1, 1),
ObjectName varchar(50) NULL ,
CreateChangeId int NOT NULL,
ModifyChangeId int NULL
) ON [PRIMARY]
GO
-- foreign key relationships on CreateChangeId & ModifyChangeId
CREATE TABLE dbo.Change
(
ChangeId int NOT NULL IDENTITY (1, 1),
ChangeTime datetime NOT NULL,
ChangeUserId int NOT NULL,
ChangeCoordinates geography NULL
) ON [PRIMARY]
GO
Can anyone offer some insight into this minor database design problem, such as common practices and functional designs?
Where I work, we use the same construct as yours - every table has the following fields:
CreatedBy (int, not null, FK users table - user id)
CreationDate (datetime, not null)
ChangedBy (int, null, FK users table - user id)
ChangeDate (datetime, null)
Pro: easy to track and maintain; only one I/O operation (I'll come to that later)
Con: I can't think of any at the moment (well, OK, sometimes we don't use the change fields ;-)
IMO the approach with the extra table has the problem that, for every record, you somehow also have to reference the table it belongs to (unless you only need the one direction, Object to Tracking table). The approach also leads to more database I/O operations - for every insert or modification you will need to:
add entry to Table Object
add entry to Tracking Table and get the new Id
update Object Table entry with the Tracking Table Id
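One way to sketch those operations in SQL Server syntax (inserting the tracking row first avoids the third update step, since CreateChangeId is not nullable; the @-variables are assumed application parameters):

```sql
-- 1) add the tracking entry and capture its generated id
INSERT INTO dbo.Change (ChangeTime, ChangeUserId, ChangeCoordinates)
VALUES (GETDATE(), @UserId, @Coordinates);

DECLARE @ChangeId int = SCOPE_IDENTITY();

-- 2) add the object row referencing the tracking entry
INSERT INTO dbo.Object (ObjectName, CreateChangeId)
VALUES (@ObjectName, @ChangeId);
```

Even in this order it is still two statements plus a variable assignment per insert, versus a single statement in the inline-columns design.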
It would certainly make the application code that communicates with the DB a bit more complicated and error-prone.
I have some txt files that contain tables with a mix of different records in them, which have different types of values and definitions for the columns. I was thinking of importing them into a table and running a query to separate the different record types, since an identifier for this is listed in the first column. Is there a way to change the value type of a column in a query? It will be a pain to treat all of them as text. If you have any other suggestions on how to solve this, please let me know as well.
Here is an example of the tables for 2 record types, provided by the website where I got the data from:
create table dbo.PUBACC_A2
(
Record_Type char(2) null,
unique_system_identifier numeric(9,0) not null,
ULS_File_Number char(14) null,
EBF_Number varchar(30) null,
spectrum_manager_leasing char(1) null,
defacto_transfer_leasing char(1) null,
new_spectrum_leasing char(1) null,
spectrum_subleasing char(1) null,
xfer_control_lessee char(1) null,
revision_spectrum_lease char(1) null,
assignment_spectrum_lease char(1) null,
pfr_status char(1) null
)
go
create table dbo.PUBACC_AC
(
record_type char(2) null,
unique_system_identifier numeric(9,0) not null,
uls_file_number char(14) null,
ebf_number varchar(30) null,
call_sign char(10) null,
aircraft_count int null,
type_of_carrier char(1) null,
portable_indicator char(1) null,
fleet_indicator char(1) null,
n_number char(10) null
)
Yes, you can do what you want. In MS Access you can use any VBA function in a query, for example:
IIF(FirstColumn="value1", CDate(SecondColumn), NULL) as DateValue,
IIF(FirstColumn="value2", CDec(SecondColumn), NULL) as DecimalValue,
IIF(FirstColumn="value3", CStr(SecondColumn), NULL) as StringValue
You can use all/any of the above in your SELECT.
EDIT:
From your comments it seems that you want to split them into different tables - importing as text should not be a problem in that case.
a)
After you import the data into the initial table, create the proper table manually, setting the correct types, and then you can INSERT into the proper table.
b)
You could even do a make-table query, but it might be faster to create the table manually. If you do a make-table query, make sure you have cast the data to the proper types in your SELECT.
EDIT2:
As you updated the question showing the structure it becomes obvious that my suggestion above will not help directly.
If this is one time process you can follow HLGEM's solution. Here are some more details.
1) Import into a table with two columns - RecordType char(2), Rest memo
2) Now you can split the data (make two queries that select based on RecordType) and re-export it (to be able to use Access's import wizard)
3) Now you have two text files with proper structure which can be easily imported
I did this in my last job. You start with a staging table that has one column, or two columns if your identifier is always the same length.
Then, using the record identifier, you move the data to another set of staging tables, one for each type of record you have. These will have columns for the data and can use the correct data types. Then you do any data cleaning you need, and then you insert into the real production tables.
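A sketch of that routing step, assuming a two-column staging design Staging(RecordType, RawLine) and one per-type staging table per record type (all names here are assumptions):

```sql
-- Route each raw line to the staging table for its record type
INSERT INTO Staging_A2 (RawLine)
SELECT RawLine FROM Staging WHERE RecordType = 'A2';

INSERT INTO Staging_AC (RawLine)
SELECT RawLine FROM Staging WHERE RecordType = 'AC';
```

Parsing the raw line into the typed columns of PUBACC_A2 / PUBACC_AC then happens per staging table, where every row is known to have the same layout.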
If you have a column defined as text, because it has both alphas and numbers, you'll only be able to query it as if it were text. Once you've separated out the different "types" of data into their own tables, you should be able to change the schema definition. Please comment here if I'm misunderstanding what you're trying to do.
Sirs,
I have the following physical model below, resembling a class table inheritance pattern like the one from Fowler (http://martinfowler.com/eaaCatalog/classTableInheritance.html):
CREATE TABLE [dbo].[ProductItem] (
[IdProductItem] INT IDENTITY (1, 1) NOT NULL,
[IdPointOfSale] INT NOT NULL,
[IdDiscountRules] INT NOT NULL,
[IdProductPrice] INT NULL);
CREATE TABLE [dbo].[Cellphone] (
[IdCellphone] INT IDENTITY (1, 1) NOT NULL,
[IdModel] INT NOT NULL,
[IMEI] NVARCHAR (150) NOT NULL,
[IdProductItem] INT NULL
);
ProductItem is my base class. It handles all actions related to sales. Cellphone is a subclass of ProductItem. It adds the specific attributes and behavior I need when I sell a cellphone (IMEI number, activating the phone, etc.).
I need to track each item of the inventory individually. When I receive a batch of 10,000 cellphones, I need to load all this information into my system. I need to create the cellphones and the product items in my database.
If it were only one table, it would be easy to use bulk insert. But in my case I have a base class with different subclasses represented by tables. What is the best approach to handle this task?
Regards
Camilo
If you're OK with bulk inserts, it's still easiest to build a little script that loads the tables in an appropriate sequence for referential integrity - in your case probably the product first, then the instances of the product (the cellphones).
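A sketch of that sequence in T-SQL, assuming the batch was bulk-inserted into a staging table first (dbo.CellphoneStaging and its columns are assumptions). The MERGE trick is used because, unlike plain INSERT, its OUTPUT clause can see source columns, letting you map each generated IdProductItem back to its IMEI:

```sql
-- Holds the mapping from generated base-class id to the natural key
DECLARE @map TABLE (IdProductItem int, IMEI nvarchar(150));

-- 1) insert one ProductItem per staged cellphone, capturing the ids
MERGE dbo.ProductItem AS tgt
USING dbo.CellphoneStaging AS src
   ON 1 = 0  -- never matches, so every source row is inserted
WHEN NOT MATCHED BY TARGET THEN
    INSERT (IdPointOfSale, IdDiscountRules, IdProductPrice)
    VALUES (src.IdPointOfSale, src.IdDiscountRules, src.IdProductPrice)
OUTPUT inserted.IdProductItem, src.IMEI INTO @map (IdProductItem, IMEI);

-- 2) insert the subclass rows using the captured ids
INSERT INTO dbo.Cellphone (IdModel, IMEI, IdProductItem)
SELECT s.IdModel, s.IMEI, m.IdProductItem
FROM dbo.CellphoneStaging AS s
JOIN @map AS m ON m.IMEI = s.IMEI;
```

This keeps both inserts set-based, which matters at 10,000 rows per batch.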