Sirs,
I have the physical model below, resembling the class table inheritance pattern from Fowler (http://martinfowler.com/eaaCatalog/classTableInheritance.html):
CREATE TABLE [dbo].[ProductItem] (
[IdProductItem] INT IDENTITY (1, 1) NOT NULL,
[IdPointOfSale] INT NOT NULL,
[IdDiscountRules] INT NOT NULL,
[IdProductPrice] INT NULL);
CREATE TABLE [dbo].[Cellphone] (
[IdCellphone] INT IDENTITY (1, 1) NOT NULL,
[IdModel] INT NOT NULL,
[IMEI] NVARCHAR (150) NOT NULL,
[IdProductItem] INT NULL
);
ProductItem is my base class. It handles all actions related to sales. Cellphone is a subclass of ProductItem. It adds the specific attributes and behavior that I need when I sell a cellphone (IMEI number, activating the cell phone, etc.).
I need to track each item of the inventory individually. When I receive a batch of 10,000 cellphones, I need to load all of this information into my system, creating both the Cellphone and the ProductItem rows in my database.
If it were only one table, a bulk insert would be easy. But in my case I have a base class and several different subclasses, each represented by a table. What is the best approach to handle this task?
Regards
Camilo
If you're OK with bulk inserts, it's still easiest to build a little script that populates the tables in an appropriate sequence for referential integrity - in your case probably the product first, then the instances of the product (cellphones).
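As a minimal sketch (the staging table, temp table names and file path are assumptions for illustration, not part of your schema): bulk-load the batch into a staging table, insert the ProductItem rows, capture the new identity values, then insert the Cellphone rows. MERGE with an OUTPUT clause is one common way to carry the source IMEI along with the generated key, since a plain INSERT ... OUTPUT cannot reference source columns.
CREATE TABLE #CellphoneStaging (
    IdModel INT NOT NULL,
    IMEI NVARCHAR (150) NOT NULL,
    IdPointOfSale INT NOT NULL,
    IdDiscountRules INT NOT NULL
);

-- BULK INSERT #CellphoneStaging FROM '<path to the batch file>' WITH (FIELDTERMINATOR = ',');

-- 1) Create one ProductItem per staged cellphone and capture the new identity
--    values together with the IMEI, so the subclass rows can be linked afterwards.
CREATE TABLE #NewProductItems (IdProductItem INT, IMEI NVARCHAR (150));

MERGE dbo.ProductItem AS target
USING #CellphoneStaging AS src
    ON 1 = 0    -- never matches, so every staged row is inserted
WHEN NOT MATCHED THEN
    INSERT (IdPointOfSale, IdDiscountRules)
    VALUES (src.IdPointOfSale, src.IdDiscountRules)
OUTPUT inserted.IdProductItem, src.IMEI INTO #NewProductItems (IdProductItem, IMEI);

-- 2) Insert the Cellphone rows, joining back on the IMEI to pick up the new keys.
INSERT INTO dbo.Cellphone (IdModel, IMEI, IdProductItem)
SELECT s.IdModel, s.IMEI, n.IdProductItem
FROM #CellphoneStaging AS s
JOIN #NewProductItems AS n ON n.IMEI = s.IMEI;
The final join assumes the IMEI is unique within the batch; if it isn't, carry a separate staging key through instead.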
Related
CREATE TABLE Persons(
ID int not null,
Name varchar(255) not null,
Description varchar(255));
INSERT INTO Persons values(15, "Alex", [["cool",1,19],["strong", 1, 20]]);
Is it possible to use a list of lists in this case or should I use another type?
Consider how you will query this data in the future. For example, will you need to search for a person with a specific trait in their description? How would you write that query if it's stored in a "list" as you call it? Using any kind of semi-structured data makes it easy to put data in, but it's not always clear how to search the data afterwards. You should think ahead with this in mind.
If you use the technique of structuring your database into Normal Forms, you will end up with a database that is the most flexible in terms of supporting a wide variety of queries.
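For illustration, a minimal sketch of a normalised layout for the sample data might look like this (the PersonTraits table and its column names are guesses for illustration only - the meaning of the 1 and 19 in the example isn't stated):
CREATE TABLE Persons (
    ID int not null primary key,
    Name varchar(255) not null
);

CREATE TABLE PersonTraits (
    PersonID int not null references Persons (ID),
    Trait varchar(255) not null,
    SomeFlag int null,
    SomeAmount int null
);

INSERT INTO Persons VALUES (15, 'Alex');
INSERT INTO PersonTraits VALUES (15, 'cool', 1, 19), (15, 'strong', 1, 20);
Finding every person with a given trait is then an ordinary WHERE Trait = 'cool' query.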
Any standard relational DBMS is not supposed to store such data, as it violates normalisation principles.
While the following schema will suffice to create a table, it saves a little time now and creates a massive time sink later.
CREATE TABLE Persons
(
ID int not null,
Name varchar(255) not null,
MultiValueColumnViolates1NF varchar(255)
)
;
It violates 1NF because the column MultiValueColumnViolates1NF allows multiple data tuples in a single cell. Yes, it can hold a list (JSON or XML, depending on the RDBMS flavour). Or, as normal DBAs call this: garbage in, garbage out. Or, as I call it: Excel tables.
A better design stores such data in at least 2NF, which in this case could be:
CREATE TABLE People
(
Name varchar(255) not null,
SingleValueColumn varchar(255)
)
;
The INSERT statement will then allow inserting data like:
INSERT INTO People
VALUES
( 'Alex', '["cool",1,19]' ),
( 'Alex', '["strong", 1, 20]')
;
One issue: no unique key is possible, so multiple rows come back when data is retrieved for 'Alex'.
Probably not what you want to achieve.
A more performant way to store this data in an RDBMS is in two separate tables:
CREATE TABLE People
(
ID int not null,
Name varchar(255) not null
)
;
CREATE TABLE People_Data
(
ID_People int NOT NULL,
[Key] varchar(100) NOT NULL,  -- KEY is a reserved word in SQL Server, hence the brackets
Value varchar(200) NOT NULL
)
;
The downside to data normalisation is that it takes a bit more work to get the data back out (Murphy's Law: a database stores data and is unwilling to show it back once it has got hold of it).
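For example, retrieving everything stored for 'Alex' now takes a join (a minimal sketch; [Key] is bracketed because KEY is a reserved word in SQL Server):
SELECT p.Name, d.[Key], d.Value
FROM People AS p
JOIN People_Data AS d
    ON d.ID_People = p.ID
WHERE p.Name = 'Alex';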
If this is just to store data that will only ever be used outside the database (forever and a day), then go with the first table creation.
If not, please use normalisation to allow fast and efficient analysis of the data through database tools.
I have a Products table as follows:
create table dbo.Product (
Id int not null,
Name nvarchar (80) not null,
Price decimal not null
)
I am creating Baskets (lists of products) as follows:
create table dbo.Baskets (
Id int not null,
Name nvarchar (80) not null
)
create table dbo.BasketProducts (
BasketId int not null,
ProductId int not null
)
A basket is created based on search criteria using these parameters:
MinimumPrice;
MaximumPrice;
Categories (can be zero to many);
MinimumWarrantyPeriod
I need to save these parameters so later I know how the basket was created.
In the future I will have more parameters so I see 2 options:
Add MinimumPrice, MaximumPrice and MinimumWarrantyPeriod as columns to the Baskets table, and add BasketCategories and Categories tables to relate a Basket to Categories.
Create a more flexible design using a Parameters table:
create table dbo.BasketParameters (
BasketId int not null,
ParameterTypeId int not null,
Value nvarchar (400) not null
)
create table dbo.ParameterType (
Id int not null,
Name nvarchar (80) not null
)
Parameter types are MinimumPrice, MaximumPrice, Categories, MinimumWarrantyPeriod, etc.
So for each Basket I have a list of BasketParameters, all different, each holding one value. Later, if I need more parameter types, I add them to the ParameterType table ...
The application will be responsible for using each Basket's Parameters to build the Basket ... I will have, for example, a Categories table, but it will be decoupled from the BasketParameters.
Does this make sense? Which approach would you use?
Your first option is superior (especially since you are using a relational data store, i.e. SQL Server), since it is properly referential. This will be much easier to maintain and query, as well as far more performant.
Your second solution is equivalent to an EAV table: https://en.wikipedia.org/wiki/Entity%E2%80%93attribute%E2%80%93value_model
EAV tables are usually a terrible idea (and if you need that type of flexibility you should probably use a document database or another NoSQL solution instead). The only benefit is if you need to add/remove attributes regularly or based on other criteria.
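For illustration, a minimal sketch of what the first option could look like (it restates the Baskets table from the question with the parameter columns added; the Categories table itself is an assumption):
create table dbo.Baskets (
    Id int not null primary key,
    Name nvarchar (80) not null,
    MinimumPrice decimal null,
    MaximumPrice decimal null,
    MinimumWarrantyPeriod int null
)

create table dbo.Categories (
    Id int not null primary key,
    Name nvarchar (80) not null
)

create table dbo.BasketCategories (
    BasketId int not null references dbo.Baskets (Id),
    CategoryId int not null references dbo.Categories (Id),
    primary key (BasketId, CategoryId)
)
The nullable parameter columns cover "not specified", and any new parameter that applies to every basket simply becomes another column.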
When creating tables, I have generally created them with a couple extra columns that track change times and the corresponding user:
CREATE TABLE dbo.Object
(
ObjectId int NOT NULL IDENTITY (1, 1),
ObjectName varchar(50) NULL ,
CreateTime datetime NOT NULL,
CreateUserId int NOT NULL,
ModifyTime datetime NULL ,
ModifyUserId int NULL
) ON [PRIMARY]
GO
I have a new project now where if I continued with this structure I would have 6 additional columns on each table with this type of change tracking. A time column, user id column and a geography column. I'm now thinking that adding 6 columns to every table I want to do this on doesn't make sense. What I'm wondering is if the following structure would make more sense:
CREATE TABLE dbo.Object
(
ObjectId int NOT NULL IDENTITY (1, 1),
ObjectName varchar(50) NULL ,
CreateChangeId int NOT NULL,
ModifyChangeId int NULL
) ON [PRIMARY]
GO
-- foreign key relationships on CreateChangeId & ModifyChangeId
CREATE TABLE dbo.Change
(
ChangeId int NOT NULL IDENTITY (1, 1),
ChangeTime datetime NOT NULL,
ChangeUserId int NOT NULL,
ChangeCoordinates geography NULL
) ON [PRIMARY]
GO
Can anyone offer some insight into this minor database design problem, such as common practices and functional designs?
Where I work, we use the same construct as yours - every table has the following fields:
CreatedBy (int, not null, FK users table - user id)
CreationDate (datetime, not null)
ChangedBy (int, null, FK users table - user id)
ChangeDate (datetime, null)
Pro: easy to track and maintain; only one I/O operation (I'll come to that later)
Con: I can't think of any at the moment (well, OK, sometimes we don't use the change fields ;-)
IMO the approach with the extra table has the problem that you also have to somehow reference the owning table for every record (unless you only need the one direction, Object to tracking table). The approach also leads to more database I/O operations - for every insert or update you will need to:
add entry to Table Object
add entry to Tracking Table and get the new Id
update Object Table entry with the Tracking Table Id
It would certainly make the application code that communicates with the DB a bit more complicated and error-prone.
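To make the extra round trips concrete, here is a rough sketch (the variable values are placeholders; in practice the Change row has to be written first, because CreateChangeId is NOT NULL, but it is still at least two statements instead of one):
DECLARE @UserId int = 42, @ObjectName varchar(50) = 'example';  -- placeholder values
DECLARE @ChangeId int;

INSERT INTO dbo.Change (ChangeTime, ChangeUserId)
VALUES (GETDATE(), @UserId);

SET @ChangeId = SCOPE_IDENTITY();

INSERT INTO dbo.Object (ObjectName, CreateChangeId)
VALUES (@ObjectName, @ChangeId);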
I need to have a kind of 'versioning' for some critical tables, and tried to implement it in a rather simple way:
CREATE TABLE [dbo].[Address] (
[id] bigint IDENTITY(1, 1) NOT NULL,
[post_code] bigint NULL,
...
)
CREATE TABLE [dbo].[Address_History] (
[id] bigint NOT NULL,
[id_revision] bigint NOT NULL,
[post_code] bigint NULL,
...
CONSTRAINT [PK_Address_History] PRIMARY KEY CLUSTERED ([id], [id_revision]),
CONSTRAINT [FK_Address_History_Address]...
CONSTRAINT [FK_Address_History_Revision]...
)
CREATE TABLE [dbo].[Revision] (
[id] bigint IDENTITY(1, 1) NOT NULL,
[id_revision_operation] bigint NULL,
[id_document_info] bigint NULL,
[description] varchar(255) COLLATE Cyrillic_General_CI_AS NULL,
[date_revision] datetime NULL,
...
)
and a bunch of insert/update/delete triggers on each table that is intended to store its changes.
My application is based on PyQt + sqlalchemy, and when I try to insert an entity that is stored in a versioned table, sqlalchemy raises an error:
The target table 'Heritage' of the DML statement cannot have
any enabled triggers if the statement contains
an OUTPUT clause without INTO clause.
(334) (SQLExecDirectW); [42000]
[Microsoft][ODBC SQL Server Driver]
[SQL Server]Statement(s) could not be prepared. (8180)")
What should I do? I must use sqlalchemy.
If anyone can advise me on how to implement versioning without triggers, that would be great too.
You should set 'implicit_returning' to False to avoid OUTPUT usage in the queries generated by SQLAlchemy (this should resolve your issue):
class Company(sqla.Model):
__bind_key__ = 'dbnamere'
__tablename__ = 'tblnamehere'
__table_args__ = {'implicit_returning': False} # http://docs.sqlalchemy.org/en/latest/dialects/mssql.html#triggers
id = sqla.Column('ncompany_id', sqla.Integer, primary_key=True)
...
I can't seem to add a comment, so I'm adding another answer.
It's not that complicated, and I would suggest it's less fragile than putting half your business logic in your domain and the other half in a database trigger.
Personally I would write my own list object with a reference to the history list for the some_list_of_other_entities, and maintain the history records in the Add and Remove methods.
This way your objects are automatically up to date before you even save them into your ORM.
public class ListOfOtherEntities : System.Collections.IEnumerable
{
    // Add list stuff here...

    public void Add(MyEntity obj)
    {
        this.List.Add(obj);
        this.History.Add(new History("Added an object!"));
    }

    public void Remove(MyEntity obj)
    {
        this.List.Remove(obj);
        this.History.Add(new History("Removed an object!"));
    }
}
It also means another developer looking at the code can see what you have done quite easily.
This won't answer your question directly, but in my experience using triggers leads to endless pain, so avoid them at all costs. If you manage all of the data yourself, then the simple answer is to populate the version history tables yourself. It also means you have all of your business logic in one place, which is a bonus!
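As a rough illustration of what "populate the history tables yourself" amounts to (whether issued directly or generated through the ORM), using only the columns shown in the question - the elided columns would need to be included too, and the variable values are placeholders:
DECLARE @AddressId bigint = 1, @Description varchar(255) = 'manual edit';  -- placeholders
DECLARE @RevisionId bigint;

INSERT INTO dbo.Revision (description, date_revision)
VALUES (@Description, GETDATE());

SET @RevisionId = SCOPE_IDENTITY();

INSERT INTO dbo.Address_History (id, id_revision, post_code)
SELECT id, @RevisionId, post_code
FROM dbo.Address
WHERE id = @AddressId;

-- ...then apply the actual UPDATE to dbo.Address in the same transaction.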
I have some txt files that contain tables with a mix of different record types, which have different value types and column definitions. I was thinking of importing them into a table and running a query to separate the different record types, since an identifier for the record type is listed in the first column. Is there a way to change the value type of a column in a query? It will be a pain to treat all of them as text. If you have any other suggestions on how to solve this, please let me know as well.
Here is an example of the tables for 2 record types, as provided by the website where I got the data from:
create table dbo.PUBACC_A2
(
Record_Type char(2) null,
unique_system_identifier numeric(9,0) not null,
ULS_File_Number char(14) null,
EBF_Number varchar(30) null,
spectrum_manager_leasing char(1) null,
defacto_transfer_leasing char(1) null,
new_spectrum_leasing char(1) null,
spectrum_subleasing char(1) null,
xfer_control_lessee char(1) null,
revision_spectrum_lease char(1) null,
assignment_spectrum_lease char(1) null,
pfr_status char(1) null
)
go
create table dbo.PUBACC_AC
(
record_type char(2) null,
unique_system_identifier numeric(9,0) not null,
uls_file_number char(14) null,
ebf_number varchar(30) null,
call_sign char(10) null,
aircraft_count int null,
type_of_carrier char(1) null,
portable_indicator char(1) null,
fleet_indicator char(1) null,
n_number char(10) null
)
Yes, you can do what you want. In MS Access you can use any VBA function inside a query, and with some IIF()s you can cast each record type differently:
IIF(FirstColumn="value1", CDate(SecondColumn), NULL) as DateValue,
IIF(FirstColumn="value2", CDec(SecondColumn), NULL) as DecimalValue,
IIF(FirstColumn="value3", CStr(SecondColumn), NULL) as StringValue
You can use all/any of the above in your SELECT.
EDIT:
From your comments it seems that you want to split them into different tables - importing as text should not be a problem in that case.
a)
After you import and get the data into the initial table, create the proper tables manually with the correct data types; then you can INSERT into the proper table.
b)
You could even do a make-table query, but it might be faster to create the tables manually. If you do a make-table query, you have to be sure that you cast the data into the proper types in your SELECT, for example:
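(A sketch only: RawImport and PUBACC_A2_Typed are assumed names, the column names are taken from the question's A2 layout, and the real query would list every A2 column.)
SELECT CLng(unique_system_identifier) AS unique_system_id,
       ULS_File_Number,
       EBF_Number
INTO PUBACC_A2_Typed
FROM RawImport
WHERE Record_Type = "A2";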
EDIT2:
As you updated the question showing the structure it becomes obvious that my suggestion above will not help directly.
If this is a one-time process you can follow HLGEM's solution. Here are some more details.
1) Import into a table with two columns - RecordType char(2), Rest memo
2) Now you can split the data (make two queries that select based on RecordType) and re-export it (to be able to use Access's import wizard)
3) Now you have two text files with proper structure which can be easily imported
I did this in my last job. You start with a staging table that has one column, or two columns if your identifier is always the same length.
Then, using the record identifier, you move the data to another set of staging tables, one for each type of record you have. These have proper columns for the data and can have the correct data types. Then you do any data cleaning you need to do, and finally you insert into the real production tables.
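A minimal sketch of the first two steps, assuming the raw file is loaded into a two-column staging table (the table and column names here are illustrative):
CREATE TABLE dbo.PUBACC_Staging (
    RecordType char(2) null,
    Rest varchar(max) null
);

-- bulk load the raw rows here (BULK INSERT, bcp or the import wizard),
-- putting the first two characters into RecordType and the remainder into Rest

-- Route each record type into its own staging table for cleaning and typing.
SELECT Rest INTO dbo.Staging_A2 FROM dbo.PUBACC_Staging WHERE RecordType = 'A2';
SELECT Rest INTO dbo.Staging_AC FROM dbo.PUBACC_Staging WHERE RecordType = 'AC';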
If you have a column defined as text, because it has both alphas and numbers, you'll only be able to query it as if it were text. Once you've separated out the different "types" of data into their own tables, you should be able to change the schema definition. Please comment here if I'm misunderstanding what you're trying to do.