I'm going to start work on a medium sized application, and i'm planning it's db design.
One thing that I'm not sure about is this.
I will have many tables which will need internationalization, such as: "membership_options, gender_options, language_options etc"
Each of these tables will share common i18n fields, like:
"title, alternative_title, short_description, description"
In your opinion which is the best way to do it?
Have an i18n table with the same fields for each of the tables that will need them?
or do something like:
Membership table Gender table
---------------- --------------
id | created_at id | created_at
1 - 22.03.2001 1 - 14.08.2002
2 - 22.03.2001 2 - 14.08.2002
General translation table
-------------------------
record_id | table_name | string_name | alternative_title| .... |id_language
1 - membership regular null 1 (english)
1 - membership normale null 2 (italian)
1 - gender man null 1(english)
1 -gender uomo null 2(italian)
This would avoid me repeating something like:
membership_translation table
-----------------------------
membership_id | name | alternative_title | id_lang
1 regular null 1
1 normale null 2
gender_translation table
-----------------------------
gender_id | name | alternative_title | id_lang
1 man null 1
1 uomo null 2
and so on, so i would probably reduce the number of db tables, but i'm not sure about performance.I'm not much of a DB designer, so please let me know.
The most common way I've seen this done is with two tables, membership and membership_ml, with one storing the base values and the ml table storing the localized strings. This is similar to your second option. Most of the systems I see like this are made that way because they weren't designed with internationalization in mind from the get go, so the extra _ml tables were "tacked on" later.
What I think is a better option is similar to your first option, but a little bit different. You would have a central table for storing all the translations, but instead of putting the table name and field name in there, you would use tokens and a central "Content" table to store all the translations. That way you can enforce some kind of RI between the tokens in the base table and the translations in the Content table if you want as well.
I actually asked a question about this very thing a while back, so you can have a look at that for some more info (rather than repasting the schema examples here).
I also think the best solution is to keep translations on different table. This approach use Open Cart which is open source and you can take a look the way it deals with the problem. Another source of information is here "http://www.gsdesign.ro/blog/multilanguage-database-design-approach/" especially on the comments sections
Related
How would I go about joining results from multiple SQL queries so that they are side by side (but unrelated)?
The reason I am thinking of this is so that I can run 1 query in Google Big Query and it will return 1 single table which I can import into Excel and do some charts.
e.g. Query 1 looks at dataset TableA and returns:
**Metric:** Sales
**Value:** 3,402
And then Query 2 looks at dataset TableB and returns:
**Name:** John
**DOB:** 13 March
They would both use different tables and different filters, etc.
What would I do to make it look like:
---Sales----------John----
---3,402-------13 March----
Or alternatively:
-----Sales--------3,402-----
-----John-------13 March----
Or is there a totally different way to do this?
I can see the use case for the above, I've used something similar to create a single table from multiple tables with different metrics to query in Data Studio so that filters apply to all data in the dataset for example. However in that case, the data did share some dimensions that made it worthwhile doing.
If you are going to put those together with no relationship between the tables, I'd have 4 columns with TYPE describing the data in that row to make for easier filtering.
Type | Sales | Name | DOB
Use UNION ALL to put the rows together so you have something like
"Sales" | 3402 | null | null
"Customer Details" | null | John | 13 March
However, like the others said, make sure you have a good reason to do that otherwise you're just creating a bigger table to query for no reason.
I am sure this question has been asked before, but I'm so new to SQL, I can't even combine the correct search terms to find an answer! So, apologies if this is a repetition.
The db I'm creating has to be created at run-time, then the data is entered after creation. Some fields will have a varying number of entries, but the number is unknown at creation time.
I'm struggling to come up with a db design to handle this variation.
As an (anonymised) example, please see below:
| salad_name | salad_type | salad_ingredients | salad_cost |
| apple | fruity | apple | cheap |
| unlikely | meaty | sausages, chorizo | expensive |
| normal | standard | leaves, cucumber, tomatoes | mid |
As you can see, the contents of "salad_ingredients" varies.
My thoughts were:
just enter a single, comma-separated string and separate at run-time. Seems hacky, and couldn't search by salad_ingredients!
have another table, for each salad, such as "apple_ingredients", which could have a varying number of rows for each ingredient. However, I can't do this, because I don't know the salad_name at creation time! :(
Have a separate salad_ingredients table, where each row is a salad_name, and there is an arbitrary number of ingredients fields, say 10, so you could have up to 10 ingredients. Again, seems slightly hacky, as I don't like to unused fields, and what happens if a super-complicated salad comes along?
Is there a solution that I've missed?
Thanks,
Dan
based on my experience the best solution is based on a normalized set of tables
table salads
id
salad_name
salad_type
salad_cost
.
table ingredients
id
name
and
table salad_ingredients
id
id_salad
id_ingredients
where id_salad is the corresponding if from salads
and id_ingredients is the corresponding if from ingredients
using proper join you can get (select) and filter (where) all the values you need
While trying to build a data warehousing application using Talend, we are faced with the following scenario.
We have two tables tables that look like
Table master
ID | CUST_NAME | CUST_EMAIL
------------------------------------
1 | FOO | FOO_BAR#EXAMPLE.COM
Events Table
ID | CUST_ID | EVENT_NAME | EVENT_DATE
---------------------------------------
1 | 1 | ACC_APPLIED | 2014-01-01
2 | 1 | ACC_OPENED | 2014-01-02
3 | 1 | ACC_CLOSED | 2014-01-02
There is a one-to-many relationship between master and the events table.Since, given a limited number of event names I proposing that we denormalize this structure into something that looks like
ID | CUST_NAME | CUST_EMAIL | ACC_APP_DATE_ID | ACC_OPEN_DATE_ID |ACC_CLOSE_DATE_ID
-----------------------------------------------------------------------------------------
1 | FOO | FOO_BAR#EXAMPLE.COM | 20140101 | 20140102 | 20140103
THE DATE_ID columns refer to entries inside the time dimension table.
First question : Is this a good idea ? What are the other alternatives to this scheme ?
Second question : How do I implement this using Talend Open Studio ? I figured out a way in which I moved the data for each event name into it's own temporary table along with cust_id using the tMap component and later linked them together using another tMap. Is there another way to do this in talend ?
To do this in Talend you'll need to first sort your data so that it is reliably in the order of applied, opened and closed for each account and then denormalize it to a single row with a single delimited field for the dates using the tDenormalizeRows component.
After this you'll want to use tExtractDelimitedFields to split the single dates field.
Yeah, this is a good idea, this is called a cumulative snapshot fact. http://www.kimballgroup.com/2012/05/design-tip-145-time-stamping-accumulating-snapshot-fact-tables/
Not sure how to do this in Talend (dont know the tool) but it would be quite easy to implement in SQL using a Case or Pivot statement
Regarding only your first question, it's certainly a good idea -- unless there is any possibility of the same persons applying-opening-closing their account more than once AND you want to keep all this information in their history (so UPDATE wouldn't help).
Snowflaking is definitely not a good option if you are going to design a data warehouse. So, denormalizing will certainly be a good choice in this case. Following article almost fits perfectly to clear the air over such scenarios,
http://www.kimballgroup.com/2008/09/design-tip-105-snowflakes-outriggers-and-bridges/
I'm creating a simple directory listing page where you can specify what kind of thing you want to list in the directory e.g. a person or a company.
Each user has an UserTypeID and there is a dbo.UserType lookup table. The dbo.UserType lookup table is like this:
UserTypeID | UserTypeParentID | Name
1 NULL Person
2 NULL Company
3 2 IT
4 3 Accounting Software
In the dbo.Users table we have records like this:
UserID | UserTypeID | Name
1 1 Jenny Smith
2 1 Malcolm Brown
3 2 Wall Mart
4 3 Microsoft
5 4 Sage
My SQL (so far) is very simple: (excuse the pseudo-code style)
DECLARE #UserTypeID int
SELECT
*
FROM
dbo.Users u
INNER JOIN
dbo.UserType ut
WHERE
ut.UserTypeID = #UserTypeID
The problem is here is that when people want to search for companies they will enter in '2' as the UserTypeID. But both Microsoft and Sage won't show up because their UserTypeIDs are 3 and 4 respectively. But its the final UserTypeParentID which tells me that they're both Companies.
How could I rewrite the SQL to ask it to return to return records where the UserTypeID = #UserTypeID or where its final UserTypeParentID is also equal to #UserTypeID. Or am I going about this the wrong way?
Schema Change
I would suggest you to break it down this schema a little bit more, to make your queries and life simpler, with this current schema you will end up writing a recursive query every time you want to get simplest data from your Users table, and trust me you dont want to do this to yourself.
I would break down this schema of these tables as follow:
dbo.Users
UserID | UserName
1 | Jenny
2 | Microsoft
3 | Sage
dbo.UserTypes_Type
TypeID | TypeName
1 | Person
2 | IT
3 | Compnay
4 | Accounting Software
dbo.UserTypes
UserID | TypeID
1 | 1
2 | 2
2 | 3
3 | 2
3 | 3
3 | 4
You say that you are "creating" this - excellent because you have the opportunity to reconsider your whole approach.
Dealing with hierarchical data in a relational database is problematic because it is not designed for it - the model you choose to represent it will have a huge impact on the performance and ease of construction of your queries.
You have opted for an Adjacently List model which is great for inserts (and deletes) but a bugger for selects because the query has to effectively reconstruct the hierarchy path. By the way an Adjacency List is the model almost everyone goes for on their first attempt.
Everything is a trade off so you should decide what queries will be most common - selects (and updates) or inserts (and deletes). See this question for starters. Also, since SQL Server 2008, there is a native HeirachyID datatype (see this) which may be of assistance.
Of course, you could store your data in an XML file (in SQL Server or not) which is designed for hierarchical data.
I have 2 tables, a purchases table and a users table. Records in the purchases table looks like this:
purchase_id | product_ids | customer_id
---------------------------------------
1 | (99)(34)(2) | 3
2 | (45)(3)(74) | 75
Users table looks like this:
user_id | email | password
----------------------------------------
3 | joeShmoe#gmail.com | password
75 | nolaHue#aol.com | password
To get the purchase history of a user I use a query like this:
mysql_query(" SELECT * FROM purchases WHERE customer_id = '$users_id' ");
The problem is, what will happen when tens of thousands of records are inserted into the purchases table. I feel like this will take a performance toll.
So I was thinking about storing the purchases in an additional field directly in the user's row:
user_id | email | password | purchases
------------------------------------------------------
1 | joeShmoe#gmail.com | password | (99)(34)(2)
2 | nolaHue#aol.com | password | (45)(3)(74)
And when I query the user's table for things like username, etc. I can just as easily grab their purchase history using that one query.
Is this a good idea, will it help better performance or will the benefit be insignificant and not worth making the database look messier?
I really want to know what the pros do in these situations, for example how does amazon query it's database for user's purchase history since they have millions of customers. How come there queries don't take hours?
EDIT
Ok, so I guess keeping them separate is the way to go. Now the question is a design one:
Should I keep using the "purchases" table I illustrated earlier. In that design I am separating the product ids of each purchase using parenthesis and using this as the delimiter to tell the ids apart when extracting them via PHP.
Instead should I be storing each product id separately in the "purchases" table so it looks like this?:
purchase_id | product_ids | customer_id
---------------------------------------
1 | 99 | 3
1 | 34 | 3
1 | 2 | 3
2 | 45 | 75
2 | 3 | 75
2 | 74 | 75
Nope, this is a very, very, very bad idea.
You're breaking first normal form because you don't know how to page through a large data set.
Amazon and Yahoo! and Google bring back (potentially) millions of records - but they only display them to you in chunks of 10 or 25 or 50 at a time.
They're also smart about guessing or calculating which ones are most likely to be of interest to you - they show you those first.
Which purchases in my history am I most likely to be interested in? The most recent ones, of course.
You should consider building these into your design before you violate relational database fundamentals.
Your database already looks messy, since you are storing multiple product_ids in a single field, instead of creating an "association" table like this.
_____product_purchases____
purchase_id | product_id |
--------------------------
1 | 99 |
1 | 34 |
1 | 2 |
You can still fetch it in one query:
SELECT * FROM purchases p LEFT JOIN product_purchases pp USING (purchase_id)
WHERE purchases.customer_id = $user_id
But this also gives you more possibilities, like finding out how many product #99 were bought, getting a list of all customers that purchased product #34 etc.
And of course don't forget about indexes, that will make all of this much faster.
By doing this with your schema, you will break the entity-relationship of your database.
You might want to look into Memcached, NoSQL, and Redis.
These are all tools that will help you improve your query performances, mostly by storing data in the RAM.
For example - run the query once, store it in the Memcache, if the user refresh the page, you get the data from Memcache, not from MySQL, which avoids querying your database a second time.
Hope this helps.
First off, tens of thousands of records is nothing. Unless you're running on a teensy weensy machine with limited ram and harddrive space, a database won't even blink at 100,000 records.
As for storing purchase details in the users table... what happens if a user makes more than one purchase?
MySQL is hugely extensible, and don't let the fact that it's free convince you of otherwise. Keeping the two tables separate is probably best, not only because it keeps the db more normal, but having more indices will speed queries. A 10,000 record database is relatively small in deference to multi-hundred-million record health record databases.
As far as Amazon and Google, they hire hundreds of developers to write specialized query languages for their specific application needs... not something developers like us have the resources to fund.