What's the more efficient method of schema design in terms of scalability?
If a database has multiple users, and each user has objects (consisting of data in a longtext, a date, and a unique id), is it more efficient to
(1) create a mass table of objects that each have a user column, or
(2) create individual tables of objects for each user?
I've seen conflicting answers while researching: database normalization guidance seems to suggest individual tables for each user, while several posts mention that performance is better with a single mass table.
Edit: changed 'elements' to 'objects' for clarity.
In general, you want to create a single table for an entity and not break them into separate tables for users.
This makes the system more maintainable. Queries on the system are consistent across all applications. It also structures the data in the way that databases are optimized for accessing it.
There are a few specialized cases where you would split user data into separate tables or even separate databases. This might be a user requirement ("our data cannot mix with anyone else's"). It may be needed to support various backup and security policies. However, the general approach is to design the tables around entities and not around users.
Having a single table with a column to identify the user is proper relational design. Having to add tables whenever you add a user sounds very fishy, and may cause problems once you have a large number of users.
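As a minimal sketch of that single-table design, using SQLite (table and column names are illustrative):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE objects (
        id      INTEGER PRIMARY KEY,   -- unique id per object
        user_id INTEGER NOT NULL,      -- identifies the owning user
        body    TEXT,                  -- the longtext payload
        created DATE
    )
""")
# An index on user_id keeps per-user queries fast as the table grows.
conn.execute("CREATE INDEX idx_objects_user ON objects(user_id)")

conn.executemany(
    "INSERT INTO objects (user_id, body, created) VALUES (?, ?, ?)",
    [(1, "note A", "2024-01-01"),
     (1, "note B", "2024-01-02"),
     (2, "note C", "2024-01-03")],
)

# One consistent query works for any user; no per-user tables needed.
rows = conn.execute(
    "SELECT body FROM objects WHERE user_id = ? ORDER BY id", (1,)
).fetchall()
print(rows)  # [('note A',), ('note B',)]
```

Adding a new user is just new rows, not new schema.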
When a single table becomes too large, most database products support partitioning, which lets you split a single logical table into multiple physical tables on disk based on some criterion (to stay with your example, you could have three physical tables, with data for user ids 1 - 99999 in partition 1, 100000 - 199999 in partition 2, and 200000 - 299999 in partition 3).
Here's an overview for Oracle.
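As a rough sketch, Oracle-style range partitioning along those lines might look like the following DDL (table, column, and partition names are illustrative; each bound is an exclusive upper limit):

```sql
-- Illustrative Oracle-style range partitioning by user id.
-- Queries still see one logical table: SELECT ... FROM objects WHERE userid = :id
CREATE TABLE objects (
    id      NUMBER PRIMARY KEY,
    userid  NUMBER NOT NULL,
    body    CLOB,
    created DATE
)
PARTITION BY RANGE (userid) (
    PARTITION p1 VALUES LESS THAN (100000),
    PARTITION p2 VALUES LESS THAN (200000),
    PARTITION p3 VALUES LESS THAN (300000)
);
```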
Related
Suppose I have a User table, and other tables (e.g. UserSettings, UserStatistics) that have a one-to-one relationship with a user.
Since SQL databases don't store complex structs in table fields (some allow JSON columns with no fixed format), is it OK to just add such tables so I can store individual (complex) data for each user? Will joining more tables per query hurt performance?
And in the distributed-database case, will those (related) tables be stored randomly on different nodes, causing extra requests between nodes and reducing efficiency?
1:1 joins can definitely add overhead, especially in a distributed database. Using a JSON or other schema-less column is one way to avoid that, but there are others.
The simplest approach is a "wide table": instead of creating a new table UserSettings with columns a,b,c, add columns setting_a, setting_b, setting_c to your User table. You can still treat them as separate objects when using an ORM, it'll just need a little extra code.
Some databases (like CockroachDB which you've tagged in your question) let you subdivide a wide table into "column families". This tends to let you get the best of both worlds: the database knows to store rows for the same user on the same node, but also to let them be updated independently.
The main downside of using JSON columns is that they're harder to query efficiently. If you want all users with a certain setting, or just one setting for a single user, you'll take at least a minor performance hit: either the database has to parse a JSON column to figure that out, or you have to fetch the entire blob and parse it in your app. If JSON columns are more convenient for other reasons, though, you can work around this by adding inverted indexes on them, or expression indexes on the specific values you're interested in. Indexes can have a similar cost to 1:1 joins, but you can mitigate that in CockroachDB by using the STORING keyword to tell the database to write a copy of all the user columns to the index.
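Here's a small sketch of an expression index on one JSON setting. It uses SQLite (assuming a build with the JSON1 functions) rather than CockroachDB, but the idea carries over; names are illustrative:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# JSON stored as text in a single "settings" column.
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, settings TEXT)")

# Expression index on one specific setting, so lookups on it don't have
# to parse the JSON blob for every row.
conn.execute(
    "CREATE INDEX idx_users_theme ON users (json_extract(settings, '$.theme'))"
)

conn.executemany(
    "INSERT INTO users (settings) VALUES (?)",
    [('{"theme": "dark", "lang": "en"}',),
     ('{"theme": "light", "lang": "de"}',)],
)

# This predicate matches the indexed expression, so the index can serve it.
dark = conn.execute(
    "SELECT id FROM users WHERE json_extract(settings, '$.theme') = 'dark'"
).fetchall()
print(dark)  # [(1,)]
```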
I am developing an MS SQL database for stores that are in different cities. Is it better to have a table for each city, or to house them all in one table? I also don't want users from one city accessing data from cities that are not theirs.
SQL is designed to handle large tables, really big tables. It is not designed to handle a zillion little tables. The clear answer to your question is that all examples of a particular entity should go in a single table. There are lots of good reasons for this:
You want to be able to write a query that will return data about any city or all cities. This is easy if the data is in one table; hard if the data is in multiple tables.
You want to optimize your queries by choosing correct indexes and data types and collecting statistics and defragging indexes and so on. Why multiply the work by multiplying the number of tables?
Foreign key relationships should be properly declared. You cannot do that if the foreign key could be to multiple tables.
Lots of small tables result in lots of partially filled data pages, which just makes the database bigger and slows it down.
I could go on. But you probably get the idea by now that one table per entity is the right way to go (at least under most circumstances).
Your issue of limiting users to see data only in one city can be handled in a variety of ways. Probably the most common is simply to use views.
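A minimal sketch of the view approach, using SQLite for brevity (SQLite has no GRANT, but in SQL Server you would grant users SELECT on the view only, not on the base table; names are illustrative):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE stores (
        id   INTEGER PRIMARY KEY,
        city TEXT NOT NULL,
        name TEXT
    )
""")
conn.executemany(
    "INSERT INTO stores (city, name) VALUES (?, ?)",
    [("London", "Store A"), ("London", "Store B"), ("Paris", "Store C")],
)

# One view per city; in SQL Server, permissions go on the view.
conn.execute(
    "CREATE VIEW stores_london AS SELECT * FROM stores WHERE city = 'London'"
)

london = conn.execute("SELECT name FROM stores_london ORDER BY name").fetchall()
print(london)  # [('Store A',), ('Store B',)]
```

All the data stays in one table; the per-city restriction lives in the view definitions.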
In the relational model, why don't we keep all our data in a single table? Why do we need to create multiple tables?
It depends on what your purpose is. For many analytic purposes, a single table is the simplest method.
However, relational databases are really designed to maintain data integrity. One aspect of data integrity is that any given item of data is stored in only one place. For instance, a customer name is stored in the customer table, so it does not need to be repeated in the orders table.
This ensures that the customer name is always correct, because it is stored in one place.
In addition, cramming everything into a single table often means duplicating data, which makes the table much larger than needed and hence slows everything down.
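The customer/orders example above can be sketched like this in SQLite (names are illustrative):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT NOT NULL)")
conn.execute("""
    CREATE TABLE orders (
        id          INTEGER PRIMARY KEY,
        customer_id INTEGER NOT NULL REFERENCES customers(id),
        total       REAL
    )
""")
conn.execute("INSERT INTO customers (id, name) VALUES (1, 'Alice')")
conn.executemany(
    "INSERT INTO orders (customer_id, total) VALUES (?, ?)",
    [(1, 10.0), (1, 25.5)],
)

# The customer name lives in exactly one place; a join recovers it per order.
rows = conn.execute("""
    SELECT c.name, o.total
    FROM orders o JOIN customers c ON c.id = o.customer_id
    ORDER BY o.total
""").fetchall()
print(rows)  # [('Alice', 10.0), ('Alice', 25.5)]
```

Renaming the customer is now a single UPDATE on one row, and every order reflects it.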
I am not an expert, but I think a single table would consume a lot of time and resources if it holds a lot of data. If we separate the data into multiple tables, we can make operations a lot easier.
Taken from https://www.bbc.co.uk/bitesize/guides/zvq634j/revision/1
A single flat-file table is useful for recording a limited amount of
data. But a large flat-file database can be inefficient as it takes up
more space and memory than a relational database. It also requires new
data to be added every time you enter a new record, whereas a
relational database does not. Finally, data redundancy – where data is
partially duplicated across records – can occur in flat-file tables,
and can more easily be avoided in relational databases.
Therefore, if you have a large set of data about many different
entities, it is more efficient to create separate tables and connect
them with relationships.
Even after reading the SQLite limits page, I could not find the maximum number of tables a SQLite database file can hold. So, I'd like to know:
Is there a maximum number of tables a SQLite database can hold?
Is it an issue to have thousands of small tables in a SQLite database file?
Can many tables in a SQLite database file impact query performance?
The list of limits in SQLite is documented at this page. There is no maximum number of tables per database given, so there is likely no limit implemented by SQLite. There is a limit of 64 tables per JOIN.
4. Maximum Number Of Tables In A Join
SQLite does not support joins containing more than 64 tables. This
limit arises from the fact that the SQLite code generator uses bitmaps
with one bit per join-table in the query optimizer.
SQLite uses an efficient query planner algorithm and so even a large
join can be prepared quickly. Hence, there is no mechanism to raise or
lower the limit on the number of tables in a join.
15. Maximum Number Of Tables In A Schema
Each table and index requires at least one page in the database file.
An "index" in the previous sentence means an index created explicitly
using a CREATE INDEX statement or implicit indices created by UNIQUE
and PRIMARY KEY constraints. Since the maximum number of pages in a
database file is 2147483646 (a little over 2 billion) this is also
then an upper bound on the number of tables and indices in a schema.
Whenever a database is opened, the entire schema is scanned and parsed
and a parse tree for the schema is held in memory. That means that
database connection startup time and initial memory usage is
proportional to the size of the schema.
Are the tables identical in structure? If so, it's generally considered better practice to store the data in a single table with an identifying column.
I believe the number of tables is constrained only by the size of the database. There can be at most 2,147,483,646 pages in a single SQLite database. So I'd guess that would also be the maximum number of tables in a single SQLite database.
That's based on the assumption that database pages are used only for tables, which probably isn't a very useful assumption.
To answer your questions 2 and 3: although having multiple tables with similar structure goes against the principles of database normalisation, there are practical reasons why it might be preferred over a single table or virtual table. The biggest, of course, is that in SQLite it's much easier to drop tables than columns (ALTER TABLE ... DROP COLUMN was only added in SQLite 3.35.0). It also takes up less space than having "tableX" in every row of a single table, if you take the simple approach and don't build properly normalized relational tables.
In terms of performance, you won't necessarily see issues with hundreds of thousands of tables compared to a single table with hundreds of thousands of entries in an indexed "table" column. In fact, the index on that single normalized table could be far larger, and less efficient, than the table-indexing mechanisms SQLite uses.
Having said all of that, I cannot in good conscience end this post without saying that, much as using exec() to create dynamically named variables is a common beginner mistake in programming, creating multiple tables that should be a single normalized table (virtual or otherwise) is a common beginner mistake in database architecture. In both areas there are circumstances where exec or many tables is the correct option. If, for example, your data is all very similar but you are sure you will not be doing any joining whatsoever on it, then many tables is fine. Just make sure that you really do see the data as totally unrelated, despite its similar structure.
If my table has a huge number of columns (over 80) should I split it into several tables with a 1-to-1 relationship or just keep it as it is? Why? My main concern is performance.
PS - my table is already in 3rd normal form.
PS2 - I am using MS Sql Server 2008.
PS3 - I do not need to access all table data at once, but rather have 3 different categories of data within that table, which I access separately. It is something like: member preferences, member account, member profile.
80 columns really isn't that many...
I wouldn't worry about it from a performance standpoint. Having a single table (if you're typically using all of the data in your standard operations) will probably outperform multiple tables with 1-1 relationships, especially if you're indexing appropriately.
I would worry about this (potentially) from a maintenance standpoint, though. The more columns of data in a single table, the less understandable the role of that table in your grand scheme becomes. Also, if you're typically only using a small subset of the data, and all 80 columns are not always required, splitting into 2+ tables might help performance.
Re the performance question: it depends. The larger a row is, the fewer rows can be read from disk in one read. If you have a lot of rows and you want to be able to read the core information from the table very quickly, it may be worth splitting it into two tables: one with small rows containing only the core info that can be read quickly, and an extra table containing all the info you rarely use, which you can look up when needed.
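A minimal sketch of that split, using SQLite (table and column names are illustrative): a narrow "hot" table for the core info, plus a wide 1:1 extension table keyed by the same id.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
# Narrow "hot" table: small rows, scanned often.
conn.execute("CREATE TABLE members (id INTEGER PRIMARY KEY, name TEXT NOT NULL)")
# Wide "cold" 1:1 extension: fetched only when needed. The primary key is
# also the foreign key, so at most one extension row exists per member.
conn.execute("""
    CREATE TABLE member_profiles (
        member_id INTEGER PRIMARY KEY REFERENCES members(id),
        bio       TEXT,
        avatar    BLOB
    )
""")
conn.execute("INSERT INTO members (id, name) VALUES (1, 'Alice')")
conn.execute("INSERT INTO member_profiles (member_id, bio) VALUES (1, 'Hi!')")

# The fast path touches only the narrow table...
name = conn.execute("SELECT name FROM members WHERE id = 1").fetchone()
# ...and the rare path joins in the wide columns.
full = conn.execute("""
    SELECT m.name, p.bio FROM members m
    JOIN member_profiles p ON p.member_id = m.id
""").fetchone()
print(name, full)  # ('Alice',) ('Alice', 'Hi!')
```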
Taking another tack, from a maintenance and testing point of view: if, as you say, you have three distinct groups of data in the one table, albeit all keyed by the same unique id (e.g. member_id), it might make sense to split them out into separate tables.
If you need to add fields to, say, the profile-details section of the members-info table, do you really want to run the risk of having to re-test the preferences and account-details parts of your app as well, to ensure there are no knock-on impacts?
Separate tables also help for audit-trail purposes, if you want to track the last user ID/timestamp to change a member's data. If the admin app allows Preferences/Account Details/Profile Details to be updated separately, then having them in separate tables makes updates easier to track.
Not quite a SQL/performance answer, but maybe something to look at from a DB and app design point of view.
Depends on what those columns are. If you've got hard-coded duplicated fields like Colour1, Colour2, Colour3, then these are candidates for a child table. My general rule of thumb is that if there's more than one field of the same type (Colour), you might as well code for N of them, not a fixed number.
Rob.
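The Colour1/Colour2/Colour3 point above can be sketched as a child table in SQLite (names are illustrative):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.execute("CREATE TABLE items (id INTEGER PRIMARY KEY, name TEXT)")
# One row per colour instead of Colour1, Colour2, Colour3 columns:
# any number of colours per item, and no schema change for a fourth.
conn.execute("""
    CREATE TABLE item_colours (
        item_id INTEGER NOT NULL REFERENCES items(id),
        colour  TEXT NOT NULL,
        PRIMARY KEY (item_id, colour)
    )
""")
conn.execute("INSERT INTO items (id, name) VALUES (1, 'Scarf')")
conn.executemany(
    "INSERT INTO item_colours (item_id, colour) VALUES (1, ?)",
    [("red",), ("green",), ("blue",)],
)

colours = [c for (c,) in conn.execute(
    "SELECT colour FROM item_colours WHERE item_id = 1 ORDER BY colour"
)]
print(colours)  # ['blue', 'green', 'red']
```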
A 1-1 split may be easier if you have, say, Member_Info, Member_Pref, and Member_Profile. Having too many columns can cause problems if you want lots of varchar(255) fields, as you may go over the row-size limit, and it just makes the table confusing.
Just make sure you have the correct foreign key constraints and such, so that there's always exactly one row in each table with the same member_id.