Basically, I'm kinda exhausted while thinking about the best possible way to structure my database for a recipe app I have planned and could need some help/ideas.
The goal is to make a recipe app including multiple languages(translated recipes) and which has features like a meal plan and shopping list, for example.
To achieve all that, I came up with this design of a database(Image below). It's important to have the ingredients as an own entity so that they add up in the shopping list. With the current design its really hard to filter by the total amount of calories, eg. "I want to see all recipes that have less than 500 calories".
The recipes only get added by me, meaning there will be probably never ever be more than 10 000 of them(In the absolute best case :D).
I would appreciate any help and also comments about other things like scalabilty/performance etc.
Related
I asked the following question on a database site:
I am trying to build an EER Model for a Autostore that has 5 locations
and offers a range of auto products. They offer car repairs and
roadworthy tests as a service also. I need to be able to make
fortnightly reports on unfinished service jobs, and fortnightly
reports on the sales. They have a wide customer database filled with
full addresses. There is a constant inflow of new stock items and
restocking of old ones. There should also be a way to know the cost of
each item in stock and where its being held.
I swear I've researched it enough to be able to understand it by now
but Im really struggling to map this out as I'm constantly running
into a wall when dealing with the products that are being restocked,
sold and stocked by particular stores in different locations.
-I'm a total rookie with this kind of thing but if anyone can help me it would be amazing.
but I am struggling to find an answer and was thinking that maybe if I asked someone here to build an SQL setup it would lead me in the direction of being able to make the model or if there was a way of building the relational model then it would be a simple step from there, unless someone has the original answer - or all of them haha, hope you can help!
Thanks,
Jacob
If you are struggling with creating an EER diagram, my guess is that you may not have captured detailed enough requirements for the application. A clear understanding of the functionality the application should provide should lay the groundwork for what you need to model in the database.
Ask yourself these questions.
Have I created user profiles for each type of user the application will be used by?
Have I outlined every action these users will be performing on the application and the details of the actions?
These are just two of many questions that you have hopefully fully addressed. If you have addressed these topics and everything else fully, perhaps you just need a different approach in organizing your requirements.
Break it up into segments of data. For example, you'll need to create a system of tables that manages inventory. Which will need to then be linked up to a system of tables that manages sales and service records. Which will need to be linked to a system of tables that manages customers data. The sales/service and inventory control will need to be linked up to a system of tables that governs employees and their roles and ability to do things (security, privileges, etc). I can go on and on speaking theoretically about this, but this should hopefully be enough to get you started.
Good luck.
I'm developing an e-commerce site and I'm facing a problem. The requirement I need to fulfill is: given a certain string like "lock", "tennis", "cinema" or "scaffold" entered by an user, I should display both profiles and product postings matching that query.
That leads me to these conclusions:
a) I should classify profiles (what they do): maybe some sort of "business category". I face a problem here, it's almost impossible to create a list that covers all the business activities in the world, so I've been reading some good questions here and liked the strategy of creating a 1 or 2 level hierarchy and then capture the user feedback automatically through reusable tags.
The idea here is to keep the keywords that are more likely to be thought of and so with best odds of being searched for.
By the way, is there a really good business category list out there that I'm missing?
b) I should classify products and services (what they offer): with a similar approach, I liked using some fixed categories and then let users use persistent tags. I ask you again if there is a category list out there that really rocks.
Questions:
1) Do you like what I've just described? Based on your experiences do you think it might work or will I drown in a tag sea?
2) Sometimes both product and business hierarchies are kind of redundant. There are some people that describe their businesses like "theater", "bank", "dentist" and that's not a product or service but rather what they actually are. But there are plenty of cases where people say "(sale/manufacturing) cars", "(manufacturing) pottery", and so on. Their business activity is chiefly determined by the product or service they're selling or producing. What's your take on this?
I did something like this is ror_ecommerce. I used a tree for product_types.
So you could sell:
Furniture => Livingroom => Chairs
Filtering by Livingroom would also give items that have a product_type pointing to Chairs.
To make it more complex I would first fight the business person.
Second I would create a model where product has many product_types. Thus you need a join table between product_type and product.
Answer to your question:
1) You will probably drown because categories are infinite. Limit the number of tags or make an admin interface that your business people can populate. If not your business folks will request you to do this work over and over again.
2) It sounds like Your tree would look like this:
sale/manufacturing => cars
sale/manufacturing => pottery
bank => personal
bank => business
thus if someone just wants a bank they get results for personal and business. but they can further refine the search to just personal banking.
It also sounds like the business and product hierarchies are completely redundant. I would really fight a business guy saying anything else because a future requirement will be to relate business and personal hierarchies. If they are the same it will be easy. You might want a type flag on the list for personal / business / both. it will be a lot easier to remove the flag if it's not needed.
Good Luck
I recently learned about normalisation in my informatics class and I'm developing a multiplayer game using SQLite as backend database at the moment.
Some information on it:
The simplified structure looks a bit like the following:
player_id | level | exp | money | inventory
---------------------------------------------------------
1 | 3 | 120 | 400 | {item a; item b; item c}
Okay. As you can see, I'm storing a table/array in string form in the column "inventory". This is against normalization.
But the thing is: Making an extra table for the inventory of players brings only disadvantages for me!
The only points where I access the database is:
When a player joins the game and his profile is loaded
When a player's profile is saved
When a player joins, I load his data from the DB and store it in memory. I only write to the DB like every five minutes when the player is saved. So there are actually very few SQL queries in my script.
If I used an extra table for the inventory I would have to, upon loading:
Perform an performance and probably more data-intensive query to fetch all items from the inventory table which belong to player X
Walk through the results and convert them into a table for storage in memory
And upon saving:
Delete all items from the inventory table which belong to player X (player might have dropped/sold some items?)
Walk through the table and perform a query for each item the player owns
If I kept all the player data in one table:
I'd only have one query for saving and loading
Everything would be in one place
I would only have to (de)serialize the tables upon loading and saving, in my script
What should I do now?
Do my arguments and situation justify working against normalisation?
Are you saying that you think parsing a string out of "inventory" doesn't take any time or effort? Because everything you need to do to store/retrieve inventory items from a sub table is something you'd need to do with this string, and with the string you don't have any database tools to help you do it.
Also, if you had a separate subtable for inventory items, you could add and remove items in real time, meaning that if the app crashes or the user disconnects, they don't lose anything.
There are a lot of possible answers, but the one that works for you is the one to choose. Keep in mind, your choice may need to change over time.
If the amount of data you need to persist is small (ie: fits into a single table row) and you only need to update that data infrequently, and you don't have any reason to care about subsets of that data, then your approach makes sense. As time goes on and your players gain more items and you add more personalization to the game, you may begin to push up against the limits of SQLite, and you'll need to evolve your design. If you discover that you need to be able to query the item list to determine which players have what items, you'll need to evolve your design.
It's generally considered a good idea to get your data architecture right early, but there's no point in sitting in meetings today trying to guess how you'll use your software in 5-10 years. Better to get a design that meets this year's needs, and then plan to re-evaluate the design again after a year.
What's going to happen when you have one hundred thousand items in your inventory and you only want to bring back two?
If this is something that you're throwing together for a one off class and that you won't ever use again, then yes, the quick and dirty route might be a quicker option for you.
However if this is something you're going to be working on for a few months, then you're going to run into long-term issues with that design decision.
No, your arguments aren't valid. They basically boil down to "I want to do all of this processing in my client code instead of in SQL and then just write it all to a single field" because you are still doing all of the exact same processing to generate the string. By doing this you are removing the ability to easily load a small portion of the list and losing relationships to the actual item table which could contain more information about the items (I assume you're hard coding it all based on names instead of using internal item IDs which is a really bad idea, imo).
Don't do it. Long term the approach you are wanting to take will generate a lot more work for you as your needs evolve.
Another case of premature optimization.
You are trying to optimize something that you don't have any performance metrics. What is the target platform? Even crappiest computers nowadays could run at least hundreds of your reading operation per second. Then you add better hardware for more users, then you can go to cloud and when you come into problem space that Google, Twitter and Facebook are dealing with, you can consider denormalizing. Even then, best solution is some sort of key-value database.
Maybe you should check Wikipedia article on Database Normalization to remind you why normalized database is a good thing.
You should also think about the items. Are the items unique for every user or does user1 could have item1 and user2 have item1 to. If you now want to change item1 you have to go through your whole table and check which user have this item. If you would normalize your table, this would be much more easy.
But it the end, I think the answer is: It depends
Do my arguments and situation justify
working against normalisation?
Not based on what I've seen so far.
Normalized database designs (appropriately indexed and with efficient usage of the database with UPSERTS, transactions, etc) in general-purpose engines will generally outperform code except where code is very tightly optimized. Typically in such code, some feature of the general purpose RDBMS engine is abandoned, such as one of the ACID properties or referntial integrity.
If you want to have very simple data access (you tout one table, one query as a benefit), perhaps you should look at a document centric database like mongodb or couchdb.
The reason that you use any technology is to leverage the technology's advantages. SQL has many advantages that you seem to not want to use, and that's fine, if you don't need them. In Neal Stephenson's Zodiac, the main character mentions that few things bought from a hardware store are used for their intended purpose. Software's like that, too. What counts is that it works, and it works nearly 100% of the time, and it works fast enough.
And yet, I can't help but think that someday you're going to have some overpowered item released into the wild, and you're going to want to deal with this problem at the database layer. Say you accidently gave out some superinstakillmegadeathsword inventory items that kill everything within 50 meters on use (wielder included), and you want to remove those things from play. As an apology to the people who lose their superinstakillmegadeathsword items, you want to give them 100 money for each superinstakillmegadeathsword you take away.
With a properly normalized database structure, that's a trivial task. With a denormalized structure, it's quite a bit harder and slower. A normalized database is also going to be easier to expand on the design in the future.
So are you sure you don't want to normalize your database?
I am looking to design a database for a website where users will be able to gain points (reputation) for performing certain activities and am struggling with the database design.
I am planning to keep records of the things a user does so they may have 25 points for an item they have submitted, 1 point each for 30 comments they have made and another 10 bonus points for being awesome!
Clearly all the data will be there, but it seems like a lot or querying to get the total score for each user which I would like to display next to their username (in the form of a level). For example, a query to the submitted items table to get the scores for each item from that user, a query to the comments table etc. If all this needs to be done for every user mentioned on a page.... LOTS of queries!
I had considered keeping a score in the user table, which would seem a lot quicker to look up, but I've had it drummed into me that storing data that can be calculated from other data is BAD!
I've seen a lot of sites that do similar things (even stack overflow does similar) so I figure there must be a "best practice" to follow. Can anyone suggest what it may be?
Any suggestions or comments would be great. Thanks!
I think that this is definitely a great question. I've had to build systems that have similar behavior to this--especially when the table with the scores in it is accessed pretty often (like in your scenario). Here's my suggestion to you:
First, create some tables like the following (I'm using SQL Server best practices, but name them however you see fit):
UserAccount UserAchievement
-Guid (PK) -Guid (PK)
-FirstName -UserAccountGuid (FK)
-LastName -Name
-EmailAddress -Score
Once you've done this, go ahead and create a view that looks something like the following (no, I haven't verified this SQL, but it should be a good start):
SELECT [UserAccount].[FirstName] AS FirstName,
[UserAccount].[LastName] AS LastName,
SUM([UserAchievement].[Score]) AS TotalPoints
FROM [UserAccount]
INNER JOIN [UserAchievement]
ON [UserAccount].[Guid] = [UserAchievement].[UserAccountGuid]
GROUP BY [UserAccount].[FirstName],
[UserAccount].[LastName]
ORDER BY [UserAccount].[LastName] ASC
I know you've mentioned some concern about performance and a lot of queries, but if you build out a view like this, you won't ever need more than one. I recommend not making this a materialized view; instead, just index your tables so that the lookups that you need (essentially, UserAccountGuid) will enable fast summation across the table.
I will add one more point--if your UserAccount table gets huge, you may consider a slightly more intelligent query that would incorporate the names of the accounts you need to get roll-ups for. This will make it possible not to return huge data sets to your web site when you're only showing, you know, 3-10 users' information on the page. I'd have to think a bit more about how to do this elegantly, but I'd suggest staying away from "IN" statements since this will invoke a linear search of the table.
For very high read/write ratios, denormalizing is a very valid option. You can use an indexed view and the data will be kept in sync declaratively (so you never have to worry about there being bad score data). The downside is that it IS kept in sync.. so the updates to the store total are a synchronous aspect of committing the score action. This would normally be quite fast, but it is a design decision. If you denormalize yourself, you can choose if you want to have some kind of delayed update system.
Personally I would go with an indexed view for starting, and then later you can replace it fairly seamlessly with a concrete table if your needs dictate.
In the past we've always used some sort of nightly or perodic cron job to calculate the current score and save it in the database - sort of like a persistent view of the SUM on the activities table. Like most "best practices" they are simply guidelines and it's often better and more practical to deviate from a specific hard nosed practice on very specific areas.
Plus it's not really all that much of a deviation if you use the cron job as it's better viewed as a cache stored in the database.
If you have a separate scores table, you could update it each time an item is submitted or a comment is posted by a user. You could do this using a trigger or within the sites code.
The user scores would be updated continuously, and could be quickly queried for display.
Option A
We are working on a small project that requires a pricing wizard for custom tables. (yes, actual custom tables- the kind you eat at. From here out I'll call them kitchen tables so we don't get confused) I came up with a model where each kitchen table part was a database table. So the database looked like this:
TableLineItem
-------------
ID
TableSizeID
TableEdgeWoodID
TableBaseID
Quantity
TableEdgeWoodID
---------------
ID
Name
MaterialUnitCost
LaborSetupHours
LaborWorkHours
Each part has to be able to calculate its price. Most of the calculations are very similar. I liked this structure because I can drag it right into the linq-to-sql designer, and have all of my classes generated. (Less code writing means less to maintain...) I then implement a calculate cost interface which just takes in the size of the table. I have written some tests and this functions pretty well. I added also added a table to filter parts in the UI based on previous selections. (You can't have a particular wood with a particular finish.) There some other one off exceptions in the model, and I have them hard coded. This model is very rigid, and changing requirements would change the datamodel. (For example, if all the tables suddenly need umbrellas.)
Option B:
After various meetings with my colleagues (which probably took more time than it should considering the size of this project), my colleagues decided they would prefer a more generic approach. Something like this:
Spec
----
SpecID
SpecTypeID
TableType_LookupID
Name
MaterialUnitCost
LaborSetupHours
LaborWorkHours
SpecType
--------
SpecTypeID
ParentSpecType_SpecTypeID
IsCustomerOption
IsRequiredCustomerOption
etc...
This is a much more generic approach that could be used to construct any product. (like, if they started selling chairs...) I think this would take longer time to implement, but would be more flexible in the future. (although I doubt we will revisit this.) Also you lose some referential integrity- you would need triggers to enforce that a table base cannot be set for a table wood.
Questions:
Which database structure do you prefer? Feel free to suggest your own.
What would be considered a best practice? If you have several similar database tables, do you create 1 database table with a type column, or several distinct tables? I suspect the answer begins with "It depends..."
What would an estimated time difference be in the two approaches (1 week, 1 day, 150% longer, etc)
Thanks in advance. Let me know if you have any questions so I can update this.
Having been caught out much more often than I should have by designing db structures that met my clients original specs but which turned out to be too rigid, I would always go for the more flexible approach, even though it takes more time to set up.
I don't have time for a complete answer right now, but I'll throw this out:
It's usually a bad idea to design a database based on the development tool that you're using to code against it.
You want to be generic to a point. Tables in a database should represent something and it is possible to make it too generic. For example, a table called "Things" is probably too generic.
It may be possible to make constraints that go beyond what you expect. Your example of a "table base" with a "table wood" didn't make sense to me, but if you can expand on a specific example someone might be able to help with that.
Finally, if this is a small application for a single store then your design is going to have much less impact on the project outcome than it would if you were designing for an application that would be heavily used and constantly changed. This goes back to the "too generic" comment above. It is possible to overdesign a system when its use will be minimal and well-defined. I hope that makes sense.
Given your comment below about the table bases and woods, you could set up a table called TableAttributes (or something similar) and each possible option would be of a particular table attribute type. You could then enforce that any given option is only used for the attribute to which it applies all through foreign keys.
There is a tendency to over-abstract with database schema design, because the cost of change can be high. Myself, I like table names that are fairly descriptive. I often equate schema design with OO design. E.g., you wouldn't normally create a class named Thing, you would probably call it Product, Furniture, Item, something that relates to your business.
In the schema you have provided there is a mix of the abstract (spec) and the specific (TableType_LookupID). I would tend to equalize the level of abstraction, so use entities like:
ProductGroup (for the case where you have a product that is a collection of other products)
Product
ProductType
ProductDetail
ProductDetailType
etc.
Here's what my experience would tell me:
Which database structure do you prefer? Without a doubt, I'd go for approach one. Go for the simplest setup that might work. If you add complexity, always ask yourself, what value will it have to the customer?
What would be considered a best practice? That does indeed depend, among others on the size of the project and the expected rate of change. As a general rule, generic tables are worth it when you expect the customer to be adding new types. For example, if your customer wants to be able to add a new "color" entity to the table, you'd need generic tables. You can't predict beforehand what they will add.
What would an estimated time difference be in the two approaches? Not knowing your business, skill, and environment, it's impossible to give a valid estimate. The approach that you are confident in coding will take the least time. Here, my guess would be approach #1 could be 5x-50x as fast. Generic tables are hard, both on the database and the client side.
Option B..
Generic is generally better than specific. Software already is doomed to fail or reach it's capacity by it's design for a certain set of tasks only. If you build something generic it will break less if abstracted with a realistic analysis of where it might head. As long as you stay away from over-abstraction and under-abstraction, it's probably the sweet spot.
In this case the adage "less code is more" would probably be drawn in that you wouldn't have to come back and re-write it again.