Database design for items with variable attributes? [closed] - sql

Closed. This question is opinion-based. It is not currently accepting answers.
Closed 9 years ago.
I'm creating an application that will need to store items and categories. The information added to the database will be user-submitted: users can add items and, in the future, categories for items.
E.g. A user can add a CD to the Music category or a DVD to the Movies category. I don't want to limit the categories that can be added or the items that can be added to those categories. The user can pretty much add any item.
Right now I'm using SQL Server and I have an Item table with ItemId (primary), Name, Description and CategoryId (foreign) linking to a Category table with similar columns.
My problem is that a CD and DVD have different attributes like 'running time', 'age rating', 'number of tracks' so I can't store them in the same table because they would have redundant columns. What if the user added a car with 'engine size', 'color'...etc?
So I've researched and I think I have the following options:
1) Dynamically create a new table for each category that is added and store all items within that category together in the same table.
Problem: I hear dynamically creating tables is a bad design decision. It is harder to manage and find what I need.
2) In the Item table, create an 'ItemAttributeData' string column where I can store a custom string, such as an XML document containing the attributes for that particular item.
Problem: These attributes aren't query-able from SQL and would have to be processed manually in the code.
3) Go with a NoSQL solution such as MongoDB or Azure Table Storage (it's an ASP.NET app) and create an items collection where each item can have a different set of columns.
Problem: I lose the relational mapping from categories to items and other tables like 'Users' (I think?)
4) Combine RDBMS and NoSQL so that the schema-less attributes are stored in a NoSQL itemAttributes collection and the shared item properties are stored in the relational database, then link them using the itemId.
What do you think is the best solution in terms of extensibility and performance?

If I were you, I would go for another option: an EAV (entity-attribute-value) model.
You can easily create some views for CDs and DVDs from that model that resemble flat tables for each category.
Once you get the hang of it, it is quite simple. And it performs well.
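A minimal sketch of what that could look like, using SQLite through Python's built-in sqlite3 module so it runs anywhere. The table names (category, item, attribute, item_value), the view name cd and the sample rows are hypothetical. The view pivots one-row-per-attribute data back into a flat "CD table" with MAX(CASE ...). Note that every value is stored as TEXT here, which is one of the usual EAV trade-offs.

```python
import sqlite3

# Hypothetical EAV schema: item rows hold shared columns, item_value holds the rest.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE category  (category_id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE item      (item_id INTEGER PRIMARY KEY, name TEXT NOT NULL,
                        category_id INTEGER NOT NULL REFERENCES category(category_id));
CREATE TABLE attribute (attribute_id INTEGER PRIMARY KEY, name TEXT NOT NULL UNIQUE);
-- One row per (item, attribute) pair; all values stored as text in this sketch.
CREATE TABLE item_value (
    item_id      INTEGER NOT NULL REFERENCES item(item_id),
    attribute_id INTEGER NOT NULL REFERENCES attribute(attribute_id),
    value        TEXT,
    PRIMARY KEY (item_id, attribute_id)
);
INSERT INTO category  VALUES (1, 'Music'), (2, 'Movies');
INSERT INTO item      VALUES (1, 'Some CD', 1), (2, 'Some DVD', 2);
INSERT INTO attribute VALUES (1, 'number of tracks'), (2, 'running time'), (3, 'age rating');
INSERT INTO item_value VALUES (1, 1, '12'), (1, 2, '48'), (2, 2, '120'), (2, 3, 'PG');

-- Pivot the EAV rows of the Music category back into a flat "CD" table.
CREATE VIEW cd AS
SELECT i.item_id, i.name,
       MAX(CASE WHEN a.name = 'number of tracks' THEN v.value END) AS number_of_tracks,
       MAX(CASE WHEN a.name = 'running time'     THEN v.value END) AS running_time
FROM item i
JOIN category c        ON c.category_id = i.category_id AND c.name = 'Music'
LEFT JOIN item_value v ON v.item_id = i.item_id
LEFT JOIN attribute a  ON a.attribute_id = v.attribute_id
GROUP BY i.item_id, i.name;
""")

print(conn.execute("SELECT * FROM cd").fetchall())  # [(1, 'Some CD', '12', '48')]
```

A "dvd" view would be the same query with the Movies category and its own attribute names.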

I have never used it myself, but an interesting solution could be PostgreSQL + hstore.
You keep SQL, and you are able to use semi-structured data inside a column. That column is a fairly limited key/value store (no nesting, for example), but it is indexable and searchable.
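hstore itself is a PostgreSQL extension, so the runnable sketch below swaps in a comparable technique: a JSON text column queried with SQLite's json_extract function (this is not hstore, just the same "key/value column inside SQL" idea). It assumes the SQLite library behind Python's sqlite3 has the JSON functions, which are built in by default since SQLite 3.38. The expression index is what makes lookups on one key indexable. Table, column and index names are hypothetical.

```python
import json
import sqlite3

# Swapped-in technique: a JSON column in SQLite, standing in for PostgreSQL hstore.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE item (item_id INTEGER PRIMARY KEY, name TEXT, attrs TEXT)")
# Expression index on one key, so filtering on color can use an index.
conn.execute("CREATE INDEX ix_item_color ON item (json_extract(attrs, '$.color'))")

rows = [
    (1, "Car", {"engine size": "2.0L", "color": "red"}),
    (2, "DVD", {"running time": 120, "age rating": "PG"}),
]
conn.executemany("INSERT INTO item VALUES (?, ?, ?)",
                 [(item_id, name, json.dumps(attrs)) for item_id, name, attrs in rows])

# Items without a color key simply yield NULL and are filtered out.
red = conn.execute(
    "SELECT name FROM item WHERE json_extract(attrs, '$.color') = ?", ("red",)
).fetchall()
print(red)  # [('Car',)]
```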

Related

Optimal way to store statuses in DB [closed]

Closed. This question is opinion-based. It is not currently accepting answers.
Closed 12 months ago.
I would like your opinion on the "best" way to manage the storage of different statuses in my DB.
Currently, when I have a new status type, e.g. "status registration file", "status refund request", "status transfers to another system", I create a new table for each of these types of status, usually with an ID and a label field, then I join the created table.
I was told that this was a no-no, an amateur way of working; that only one table should be used; that it multiplied the tables unnecessarily; and that, moreover, it was bad for performance: fewer tables = more performance.
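For comparison, the "only one table" alternative could be sketched as below (SQLite via Python's sqlite3; all names and rows are hypothetical). A status_type column tells the kinds of status apart, so every query has to filter on it. The composite foreign key is an assumption of this sketch, not something from the thread: it is one way to stop a refund request from pointing at, say, a transfer status in a shared table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled
conn.executescript("""
CREATE TABLE status (
    id          INTEGER PRIMARY KEY,
    status_type TEXT NOT NULL,            -- e.g. 'refund', 'transfer'
    label       TEXT NOT NULL,
    UNIQUE (status_type, id)              -- target for the composite FK below
);
CREATE TABLE refund_request (
    id          INTEGER PRIMARY KEY,
    status_type TEXT NOT NULL DEFAULT 'refund' CHECK (status_type = 'refund'),
    status_id   INTEGER NOT NULL,
    FOREIGN KEY (status_type, status_id) REFERENCES status (status_type, id)
);
INSERT INTO status (id, status_type, label) VALUES
    (1, 'refund', 'pending'), (2, 'refund', 'approved'), (3, 'transfer', 'sent');
INSERT INTO refund_request (id, status_id) VALUES (10, 1);
""")

def try_refund_status(request_id, status_id):
    """Return True if the row was accepted, False if a constraint rejected it."""
    try:
        conn.execute("INSERT INTO refund_request (id, status_id) VALUES (?, ?)",
                     (request_id, status_id))
        return True
    except sqlite3.IntegrityError:
        return False

print(try_refund_status(11, 2))  # True: status 2 is a refund status
print(try_refund_status(12, 3))  # False: status 3 is a transfer status
```

The per-type tables the asker uses avoid both the extra status_type column and the filter, at the cost of one table and one join per status type.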
From my point of view, the advantages I find in creating one table per status type:
Allows me to add information/columns as needed (active/inactive status, additional IDs with letters or strings, descriptions, translations...), in short, information that is not necessary for most statuses.
Facilitates queries in IDEs (no need to specify the ID of the status type to be taken into account in a query).
Eases data retrieval with Doctrine, for the same reason.
The negative point:
A table and a join have to be created for each new status type.
Depending on the project, I have two or three to a dozen such tables to manage.
What do you think about it?
Is it bad for SQL performance/cache to have many tables (more than 100)?
Thanks in advance for your answers.
When we think of statuses we tend to either think of a series of events like 'prepared' -> 'running' -> 'finished' or of mere booleans (married = yes/no, active = yes/no). If we need this in combination with dates, we can use status history tables that show when a status changed.
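A status history table of the kind mentioned above can be sketched like this (SQLite via Python's sqlite3; the table name, columns and sample rows are made up for illustration). Each change is a new row, and the current status is simply the most recent row per job.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE job_status_history (
    job_id     INTEGER NOT NULL,
    status     TEXT    NOT NULL CHECK (status IN ('prepared', 'running', 'finished')),
    changed_at TEXT    NOT NULL,          -- ISO-8601 text sorts chronologically
    PRIMARY KEY (job_id, changed_at)
);
INSERT INTO job_status_history VALUES
    (1, 'prepared', '2024-01-01T09:00'),
    (1, 'running',  '2024-01-01T09:05'),
    (1, 'finished', '2024-01-01T10:00'),
    (2, 'prepared', '2024-01-02T08:00'),
    (2, 'running',  '2024-01-02T08:30');
""")

# Current status = the latest history row for each job.
current = conn.execute("""
    SELECT h.job_id, h.status
    FROM job_status_history h
    WHERE h.changed_at = (SELECT MAX(changed_at) FROM job_status_history
                          WHERE job_id = h.job_id)
    ORDER BY h.job_id
""").fetchall()
print(current)  # [(1, 'finished'), (2, 'running')]
```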
But this is not what you have in mind. Your statuses come with data. When you talk about "status registration file", some registration file got involved and you want to store this with the product, order or whatever. And once you store this file (or the file's path) this implies a certain status.
Depending on what you have to store, you'll add a column or a table and maybe even a status (the registration file being unchecked, approved, dismissed).
If I have a table of employees, I may store a column driving_licence_photo. All employees who have a driving licence photo in the table are allowed to drive the company's cars. The status ("they have a driving licence") is implicit.
If I have a table of employees and they can have various certificates, I may create a table employee_certificate and this table may have a certificate type, a certificate number and maybe even a status "pending" / "achieved".
If I have a table of employees and want to know their working status ('active', 'pausing', 'retired', 'on sick leave', ...), I will probably create a table work_status and give the employee table a work_status_id.
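The three employee cases above could look roughly like this in SQLite (via Python's sqlite3). The names driving_licence_photo, employee_certificate, work_status and work_status_id come from the answer; the other columns and all sample rows are hypothetical. The licence status is implied by a nullable column, certificates get their own child table with a status, and the working status is a lookup table referenced by ID.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
-- Lookup table for a fixed set of working statuses.
CREATE TABLE work_status (
    work_status_id INTEGER PRIMARY KEY,
    label          TEXT NOT NULL UNIQUE
);
CREATE TABLE employee (
    employee_id           INTEGER PRIMARY KEY,
    name                  TEXT NOT NULL,
    driving_licence_photo BLOB,          -- implicit status: NULL means no licence on file
    work_status_id        INTEGER NOT NULL REFERENCES work_status(work_status_id)
);
-- One employee can hold many certificates, each with its own status.
CREATE TABLE employee_certificate (
    employee_id        INTEGER NOT NULL REFERENCES employee(employee_id),
    certificate_type   TEXT NOT NULL,
    certificate_number TEXT NOT NULL,
    status             TEXT NOT NULL CHECK (status IN ('pending', 'achieved')),
    PRIMARY KEY (employee_id, certificate_type)
);
INSERT INTO work_status VALUES (1, 'active'), (2, 'pausing'), (3, 'retired'), (4, 'on sick leave');
INSERT INTO employee VALUES (1, 'Ana', x'00', 1), (2, 'Ben', NULL, 4);
INSERT INTO employee_certificate VALUES (1, 'forklift', 'F-123', 'achieved'),
                                        (2, 'first aid', 'FA-9', 'pending');
""")

# Who may drive company cars? The status is implied by the photo column.
drivers = conn.execute(
    "SELECT name FROM employee WHERE driving_licence_photo IS NOT NULL").fetchall()
# Join the lookup table to show each employee's working status.
statuses = conn.execute("""
    SELECT e.name, s.label FROM employee e
    JOIN work_status s ON s.work_status_id = e.work_status_id
    ORDER BY e.employee_id""").fetchall()
print(drivers, statuses)
```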
So, the answer is: It depends.

More than 50 different categories that share 50% of attributes. How to create all these tables efficiently in sql database? [closed]

Closed. This question needs to be more focused. It is not currently accepting answers.
Closed 1 year ago.
In a large system, when analyzing the database, there are about 50 different categories in the requirements, which should be represented as tables.
Each category has many attributes (columns), and all of these categories share about 50% of their columns. For example, every category has (id, name, date, state, admin, dept), but each category also has its own 3 to 5 attributes that differ from the others.
Now, how should I represent them in the physical database? One table, or a table for each category? And what about redundancy?
Depends on what exactly you are trying to achieve.
If your primary concern is disk space, I would recommend considering SQL Server's sparse columns, with column sets as an option if necessary. In this scenario, you can put all these entities into a single physical table, with mandatory attributes as normal columns and category-specific attributes declared as sparse.
If you are thinking about a normalised model which would eliminate most data anomalies, a typical solution is a supertype-subtype hierarchy. The main table stores only the attributes that are mandatory for all entities, and the child tables contain only the main table's identifier and the attributes specific to that particular category. All the child tables reference the "supertype" table via foreign keys.
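A minimal supertype-subtype sketch in SQLite (via Python's sqlite3), using the shared columns listed in the question. The table names record, contract and permit, their specific columns and the sample rows are hypothetical stand-ins for two of the 50 categories. Each subtype table reuses the supertype's primary key as both its own primary key and a foreign key.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
-- Supertype: the columns every category shares.
CREATE TABLE record (
    id       INTEGER PRIMARY KEY,
    category TEXT NOT NULL,              -- discriminator: which subtype table to look in
    name     TEXT NOT NULL,
    date     TEXT,
    state    TEXT,
    admin    TEXT,
    dept     TEXT
);
-- Subtypes: same key as the supertype, plus the category-specific columns.
CREATE TABLE contract (
    id       INTEGER PRIMARY KEY REFERENCES record(id),
    supplier TEXT,
    end_date TEXT
);
CREATE TABLE permit (
    id         INTEGER PRIMARY KEY REFERENCES record(id),
    permit_no  TEXT,
    expires_on TEXT
);
INSERT INTO record VALUES (1, 'contract', 'Cleaning', '2024-01-01', 'open',   'amy', 'facilities'),
                          (2, 'permit',   'Parking',  '2024-02-01', 'closed', 'bob', 'security');
INSERT INTO contract VALUES (1, 'ACME', '2025-01-01');
INSERT INTO permit   VALUES (2, 'P-77', '2024-12-31');
""")

# A full "contract" row = supertype columns joined to the subtype's columns.
rows = conn.execute("""
    SELECT r.id, r.name, r.dept, c.supplier
    FROM record r JOIN contract c ON c.id = r.id
""").fetchall()
print(rows)  # [(1, 'Cleaning', 'facilities', 'ACME')]
```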
Sometimes, depending on subject area, a more complex model with additional "nesting" levels might be employed. You can think of this as a class inheritance hierarchy - the analogy is very close, actually.
Of course, both (and other) approaches have their strengths and weaknesses, so you might need to read up on the subjects and make a choice.
As some attributes apply only to some categories, you could also consider an Entity-Attribute-Value (EAV) model for storing the categories.
There are multiple ways of representing EAV models in a database. You can refer to the article below: https://inviqa.com/blog/understanding-eav-data-model-and-when-use-it
The EAV way of storing data comes with its own challenges when you query the database, so check whether it suits your needs before choosing it.

I have a single table format used by 2 independent groups of people, do I make 2 tables or 1? [closed]

Closed. This question is opinion-based. It is not currently accepting answers.
Closed 3 years ago.
I have two sets of data about products. Both have the same columns, but each is used by a different, independent group of users.
I'm not sure if I should store the 2 categories data together or separately.
If I put it in 1 table it'd look like
dbo.Products
ID | ..... | Category
If I use 2 tables:
Cat1.Products
ID | .....
Cat2.Products
ID | .....
Edit: The categories both use the same columns.
You have a couple of options, depending on your architecture and your (or your clients') needs.
If your clients use the same interface to access the data and the data is separate or can possibly be shared then, for simplicity, you should use the same tables.
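A minimal sketch of the shared-table option in SQLite (via Python's sqlite3), using the Products table and Category column from the question. The Name column, the index, the two views and the rows are hypothetical. The views give each group its own slice without repeating the filter; note that SQLite has no user permissions, so here a view is a convenience, not a security boundary, which is why the sensitive-data caveat below points to separate owners, instances or schemas instead.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Products (
    ID       INTEGER PRIMARY KEY,
    Name     TEXT NOT NULL,
    Category TEXT NOT NULL CHECK (Category IN ('Cat1', 'Cat2'))
);
-- Index the group column so each group's queries stay cheap as the table grows.
CREATE INDEX IX_Products_Category ON Products (Category);
-- One view per group, so the filter is written once.
CREATE VIEW Cat1Products AS SELECT ID, Name FROM Products WHERE Category = 'Cat1';
CREATE VIEW Cat2Products AS SELECT ID, Name FROM Products WHERE Category = 'Cat2';
INSERT INTO Products VALUES (1, 'Widget', 'Cat1'), (2, 'Gadget', 'Cat2'), (3, 'Gizmo', 'Cat1');
""")

print(conn.execute("SELECT * FROM Cat1Products ORDER BY ID").fetchall())  # [(1, 'Widget'), (3, 'Gizmo')]
```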
Caveat: If the data contains sensitive information (such as personally identifiable data: SSNs, Names, Addresses, etc.) that should only ever be accessible to the owner of that data, then you should seriously consider cloning the table structure to a new database owner, instance, or schema to reduce the chance of any data breaches.
If your clients use different interfaces, or different instances of the interface that could potentially diverge from the originating code-base, then you should create a different set of tables under a different database owner, instance, or schema.
There are, of course, other mitigating factors, but I think those are the main things to consider.
It depends on whether they also share the same data and are related to each other. To keep it simple, it's better to use only one table if the data and usage are the same.

Should I Create a Table or a Query? [closed]

Closed. This question needs details or clarity. It is not currently accepting answers.
Closed 7 years ago.
OK, I have an Access DB that has an Items table and a Students table containing the monthly subscription fee. These two are linked in a third table, "Payments", which gathers the data for a student (fee + items) and sums them. But that table only keeps the values, not the descriptions. Payments are irregular (the student doesn't need to pay everything on the same day), so the student's item debt needs to be reduced as he pays, and I need to keep track of that. So, should I create a new table that copies the data from the two other tables and make the changes in this new one, or just use a query to show the data and make the changes in the "main" table? I'm a bit lost and confused here, so sorry for the mess.
You need to read a beginner's text about database design before you go any further with this project, IMO. The first item found by googling "relational database tutorial" is
http://www3.ntu.edu.sg/home/ehchua/programming/sql/relational_database_design.html
see the section "Create Relationships among Tables". There are countless other tutorials online.
As a rule, you don't copy data from one table to another. A piece of information like an item's description or a user's name should be stored in only one place in a database. When you need it in relation to data in another table (e.g. to display the description of an entry in the Items table next to the cost amount in the Fees or Payments table), you look it up; you don't copy it.
The way to deal with a student having arbitrarily many items is to have a "link" table that mainly stores a unique identifier of the student and a unique identifier of the item. Usually these are numeric identifiers that are assigned as new students, items and other entities are added to the DB.
The point of having a link table is that there is no practical limit to the number of items that can be associated with a particular student.
You can add a column to the link table to relate the student and one or more instances of the same item to particular bills (or orders, or whatever it is that your DB is modelling).
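The link-table idea, applied to the partial-payment problem in the question, might look like this. It is sketched in SQLite via Python's sqlite3 rather than Access, and every table, column and row is hypothetical. Payments point at the student_item link row, and the remaining debt is computed by a query each time instead of being kept in a copied table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE student (student_id INTEGER PRIMARY KEY, name TEXT NOT NULL, monthly_fee REAL NOT NULL);
CREATE TABLE item    (item_id    INTEGER PRIMARY KEY, description TEXT NOT NULL, price REAL NOT NULL);
-- Link table: one row per item a student has taken; descriptions are not copied here.
CREATE TABLE student_item (
    student_item_id INTEGER PRIMARY KEY,
    student_id      INTEGER NOT NULL REFERENCES student(student_id),
    item_id         INTEGER NOT NULL REFERENCES item(item_id)
);
-- Partial payments against one student's item, made on any date.
CREATE TABLE payment (
    payment_id      INTEGER PRIMARY KEY,
    student_item_id INTEGER NOT NULL REFERENCES student_item(student_item_id),
    amount          REAL NOT NULL,
    paid_on         TEXT NOT NULL
);
INSERT INTO student VALUES (1, 'Maria', 50.0);
INSERT INTO item VALUES (1, 'Uniform', 80.0), (2, 'Book', 30.0);
INSERT INTO student_item VALUES (1, 1, 1), (2, 1, 2);
INSERT INTO payment VALUES (1, 1, 20.0, '2024-03-01'), (2, 1, 25.0, '2024-03-15');
""")

# The outstanding debt is looked up when needed, never stored twice.
balances = conn.execute("""
    SELECT s.name, i.description,
           i.price - COALESCE(SUM(p.amount), 0) AS remaining
    FROM student_item si
    JOIN student s ON s.student_id = si.student_id
    JOIN item    i ON i.item_id    = si.item_id
    LEFT JOIN payment p ON p.student_item_id = si.student_item_id
    GROUP BY si.student_item_id, s.name, i.description, i.price
    ORDER BY si.student_item_id
""").fetchall()
print(balances)  # [('Maria', 'Uniform', 35.0), ('Maria', 'Book', 30.0)]
```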

When to use separate SQL database tables for two slightly different types of information? [closed]

Closed. This question is opinion-based. It is not currently accepting answers.
Closed 4 years ago.
I need help with an SQL decision that has confused me for a while.
I'm trying to make a short story website where users can write their own stories and can browse each other's, etc. I've also got a collection of classic short stories written by great writers from the past. I'm confused as to whether I should store both types of story in the same database table.
I want to keep the two types of stories (classic authors/users) distinct to some degree, since you should be able to search the website and filter out user stories from the results. But I can't just have a single database column in the table to represent this, i.e. a boolean CLASSIC, since for classic short stories several of the other columns would be different too: there is no user, and the date would be just a year (e.g., 1869) instead of the full datetime when the user submitted it.
Yet I can't quite justify putting them in separate tables either. When most of the attributes are the same, should I really have two different database tables for short stories? At the moment I am filling in NULL in the user column for classic short stories, and my filtered search has an option to search only through classics, which selects from the database where user is NULL. This seems to hurt performance, though, when you're searching through a huge database of potentially millions of user stories just to find a few thousand classic stories.
Note that there are other tables too, like tags for the stories, linked to the short stories table.
So I'm basically asking you SQL experts - is there enough justification for separating the two types of information into different tables? I'm currently using SQLite in development but will switch to MySQL or PostgreSQL later.
I'd probably go with a "parent-child" table structure, where you have matching primary keys across tables, something like:
Stories: StoryId (PK), StoryType (U or C), StoryText, etc. (all of the shared stuff)
UserStories: StoryId (PK and FK), UserId, etc.
ClassicStories: StoryId (PK and FK), AuthorName, etc.
Then if you want, you can build two views around them:
V_UserStories: StoryId, StoryText, UserId, etc.
V_ClassicStories: StoryId, StoryText, AuthorName, etc.
With this setup, you're not wasting any columns, you're keeping shared stuff together, while still keeping the two types of stories easily logically separate if you need them.
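The structure above, sketched in SQLite via Python's sqlite3. Stories, UserStories, ClassicStories, StoryId, StoryType, StoryText, UserId, AuthorName and the two views come from the answer; Title, SubmittedAt, PublishedYear, the index and the sample rows are hypothetical. Indexing StoryType is one way, assumed here rather than taken from the thread, to let a "classics only" filter avoid scanning every user story.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE Stories (
    StoryId   INTEGER PRIMARY KEY,
    StoryType TEXT NOT NULL CHECK (StoryType IN ('U', 'C')),
    Title     TEXT NOT NULL,
    StoryText TEXT NOT NULL
);
-- Index the discriminator so filtering by story type does not scan the whole table.
CREATE INDEX IX_Stories_StoryType ON Stories (StoryType);
CREATE TABLE UserStories (
    StoryId     INTEGER PRIMARY KEY REFERENCES Stories(StoryId),
    UserId      INTEGER NOT NULL,
    SubmittedAt TEXT NOT NULL            -- full datetime
);
CREATE TABLE ClassicStories (
    StoryId       INTEGER PRIMARY KEY REFERENCES Stories(StoryId),
    AuthorName    TEXT NOT NULL,
    PublishedYear INTEGER                -- year only, e.g. 1869
);
CREATE VIEW V_UserStories AS
    SELECT s.StoryId, s.Title, s.StoryText, u.UserId, u.SubmittedAt
    FROM Stories s JOIN UserStories u ON u.StoryId = s.StoryId;
CREATE VIEW V_ClassicStories AS
    SELECT s.StoryId, s.Title, s.StoryText, c.AuthorName, c.PublishedYear
    FROM Stories s JOIN ClassicStories c ON c.StoryId = s.StoryId;

INSERT INTO Stories VALUES (1, 'C', 'An Old Tale', '...'), (2, 'U', 'My First Story', '...');
INSERT INTO ClassicStories VALUES (1, 'Some Author', 1869);
INSERT INTO UserStories VALUES (2, 42, '2024-05-01 12:00:00');
""")

print(conn.execute("SELECT StoryId, AuthorName, PublishedYear FROM V_ClassicStories").fetchall())
# [(1, 'Some Author', 1869)]
```

Tables such as the story tags can keep referencing Stories.StoryId, so they work for both kinds of story.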
To make such a decision, you have to consider whether the field you want to insert belongs to your table only and nothing else.
For example:
Story and type of story: if a story can have several types and/or a type can apply to several stories, then yes, you must create a separate table for the types; but if each story has only one type, then you insert the type information (name, description, etc.) directly into the stories table.