DB Schema: Why not create new table for each 'entity'? [closed] - sql

Closed. This question is opinion-based. It is not currently accepting answers.
Closed 8 years ago.
Sorry about the vague title.
An example: I'm guessing SO has one large table that lists all answers, in a schema like:
[ Ques No, Ans No, Text , Points ]
[ 22, 0 , "Win", 3 ],
[ 22, 1 , "Tin", 4 ],
[ 23, 0 , "Pin", 2 ]
My question is would it be better if there were two tables: Table_Ques22 and Table_Ques23? Can someone please list the pros and cons?
What comes to my mind:
Cons of multiple tables: Overhead of meta storage.
Pros of multiple tables: Quickly answer queries like, find all answers to Ques 22. (I know there are indices, but they take time to build and space to maintain).

Databases are designed to handle large tables. Having multiple tables with the same structure introduces a lot of problems. These come to mind:
Queries that span multiple rows ("questions" in your example) become much more complicated and performance suffers.
Maintaining similar entities is cumbersome. Adding an index or partitioning a single table is one thing. Doing it to hundreds of tables is much harder.
Maintaining triggers is cumbersome.
When a new row appears (new question), you have to incur the overhead of creating a table rather than just adding to an existing table.
Altering a table, say to add a new column or rename an existing one, is very cumbersome.
Although putting all questions in one table does use a small additional amount of storage, you have to balance that against the overhead of having very small tables. A table with data has to occupy at least one data page, regardless of whether the data is 10 bytes or 10 Gbytes. If a data page is 16 kbytes, that is a lot of wasted space to support multiple tables for a single entity.
As for database limits, I'm not even sure a database could support a separate table for each question on Stack Overflow.
There is one case where having parallel table structures is useful. That is when security requirements require that the data be separated, perhaps for client confidentiality reasons. However, this is often an argument for separate databases, not just separate tables.
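As a minimal sketch of the single-table design, here is the example schema from the question loaded into an in-memory SQLite database through Python's `sqlite3` module. The table and column names (`answers`, `ques_no`, `ans_no`) are assumptions made for illustration, not anything Stack Overflow actually uses.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE answers (
        ques_no INTEGER NOT NULL,
        ans_no  INTEGER NOT NULL,
        text    TEXT    NOT NULL,
        points  INTEGER NOT NULL,
        PRIMARY KEY (ques_no, ans_no)
    )
    """
)
conn.executemany(
    "INSERT INTO answers VALUES (?, ?, ?, ?)",
    [(22, 0, "Win", 3), (22, 1, "Tin", 4), (23, 0, "Pin", 2)],
)

# All answers to question 22; the composite primary key starts with ques_no,
# so the database can use its index for this filter.
answers_22 = conn.execute(
    "SELECT ans_no, text, points FROM answers WHERE ques_no = ? ORDER BY ans_no",
    (22,),
).fetchall()

# A cross-question query: total points per question. With one table per
# question this would need a UNION over every such table.
totals = conn.execute(
    "SELECT ques_no, SUM(points) FROM answers GROUP BY ques_no ORDER BY ques_no"
).fetchall()

print(answers_22)
print(totals)
```

Adding a new question here is just another `INSERT`; no DDL is needed.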

What about: SQL servers are not made for people ignoring the basics of relational theory.
You will have a ton of problems with cross-question queries in your approach, which will totally kill all the gains. Typical beginner mistake - I suggest a good book about SQL basics.

Related

Storing wide-form dataframes in datajoint table [closed]

Closed. This question is opinion-based. It is not currently accepting answers.
Closed 7 months ago.
Say I have some analysis that spits out a wide-form pandas dataframe with a multiindex on the index and columns. Depending on the analysis parameters, the number of columns may change. What is the best design pattern to use to store the outputs in a datajoint table? The following come to mind, each with pros and cons
Reshape to long-form and store single entries with index x column levels as primary keys
Pros: Preserves the ability to query/constrain based on both index and columns
Cons: Each analysis would insert millions of rows to the table, and I may have to do hundreds of such analyses. Even adding this many rows seems to take several minutes per dataframe, and queries become slow
Keep as wide-form and store single rows as longblob with just index levels as primary keys
Pros: Retain ability to query based on index levels, results in tables with a more reasonable number of rows
Cons: Loses the ability to query based on column levels, the columns would then also have to be stored somewhere to be able to reconstruct the original dataframes. Since dataframes with different numbers of columns need to be stored in the same table, it is not feasible to explicitly encode all the columns in the table definition
Store the dataframe itself as e.g. an h5 and store it in the database simply as a filepath or as an attachment
Pros: Does not result in large databases, simple to implement
Cons: Does not really feel in the "spirit" of datajoint, lose the ability to perform constraints or queries
Are there any designs or pros/cons I haven't thought of?
Before providing a more specific answer, let's establish a few basics (also known as normal forms).
DataJoint implements the relational data model. Under the relational model, complex dataframes of the type you described require normalization into multiple tables that are related to each other through their primary keys and foreign keys.
Each table will represent a single entity class: Units and Trials will be represented in separate tables.
All entities in a given table will have the same attributes (columns). They will be uniquely identified by the same attribute(s) comprising the primary key.
In addition to the primary key, tables may have additional secondary indexes to accelerate queries.
If you already know about normalization, we can talk about how to normalize your design. If not, we can refer you to a quick tutorial.
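To make the normalization idea concrete, here is a rough sketch using plain SQLite through Python's `sqlite3` module rather than DataJoint itself. The `unit`, `trial` and `response` tables and the tiny `wide` dictionary are hypothetical stand-ins for the real analysis output: each entity class gets its own table, and each cell of the wide-form frame becomes one row keyed by both.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript(
    """
    CREATE TABLE unit  (unit_id  INTEGER PRIMARY KEY);
    CREATE TABLE trial (trial_id INTEGER PRIMARY KEY);
    CREATE TABLE response (
        unit_id  INTEGER NOT NULL REFERENCES unit(unit_id),
        trial_id INTEGER NOT NULL REFERENCES trial(trial_id),
        value    REAL    NOT NULL,
        PRIMARY KEY (unit_id, trial_id)
    );
    """
)

# Hypothetical wide-form result: one row per unit, one column per trial.
wide = {1: {10: 0.5, 11: 0.7}, 2: {10: 0.1, 11: 0.9}}

units = sorted(wide)
trials = sorted({t for row in wide.values() for t in row})
conn.executemany("INSERT INTO unit VALUES (?)", [(u,) for u in units])
conn.executemany("INSERT INTO trial VALUES (?)", [(t,) for t in trials])
conn.executemany(
    "INSERT INTO response VALUES (?, ?, ?)",
    [(u, t, v) for u, row in wide.items() for t, v in row.items()],
)

# A former column level (trial) can now be constrained just like an index level (unit).
trial_11 = conn.execute(
    "SELECT unit_id, value FROM response WHERE trial_id = ? ORDER BY unit_id",
    (11,),
).fetchall()
print(trial_11)
```

Because the number of trials lives in rows rather than in the table definition, analyses with different numbers of columns fit the same schema.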

Can converting a SQL query to PL/SQL improve performance in Oracle 12c? [closed]

Closed. This question needs to be more focused. It is not currently accepting answers.
Closed 5 years ago.
I have been given an 800-line SQL query which takes around 20 hours to fetch around 400 million records.
There are 13 tables which are partitioned by month.
The tables have records ranging from 10k to 400 million in each partition.
The tables are indexed on primary keys.
The query uses many inline views and outer joins and a few group by functions.
DBAs say we cannot add more indexes as it would slow down the performance since it is an OLTP system.
I have been asked to convert the query logic to PL/SQL and populate a table in chunks, then do a select * from that table.
My end result should be a query which can be fed to my application.
So even after I use PL/SQL to populate a table in chunks, ultimately I need to fetch the data from that table as a query.
My question is, since pl/sql would require select and insert both, are there any chances pl/sql can be faster than sql?
Are there any cases where pl/sql is faster for any result which is achievable by sql?
I will be happy to provide more information if the given info doesn't suffice.
Implementing it as a stored procedure could be faster because the SQL will already be parsed and compiled when the procedure is created. However, given the volume of data you are describing, it's unclear if this will make a significant difference. All you can do is try it and see.
I think you really need to identify where the performance problem is; where the time is being spent. For example (and I have seen this many times), the majority of the time might be spent fetching the 400M rows to whatever the "client" is. In that case, rewriting the query, whether in SQL or in PL/SQL, will make no difference.
Anyway, once you can enumerate the problem, you have a better chance of getting sound answers, rather than guesses...
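One rough, client-side way to see where the time goes is to separate "time until the first row arrives" from "time to pull the remaining rows across". The sketch below does this with Python's DB-API, using an in-memory SQLite table as a stand-in; the helper name `profile_query` and the table `t` are made up for illustration. On Oracle you would run the same pattern through an Oracle driver and combine it with the database's own plan and tracing output.

```python
import sqlite3
import time


def profile_query(conn, sql, params=()):
    """Return (seconds until first row, seconds to fetch the rest, row count)."""
    cur = conn.cursor()
    start = time.perf_counter()
    cur.execute(sql, params)
    first = cur.fetchone()
    first_row_at = time.perf_counter()
    rest = cur.fetchall()
    done_at = time.perf_counter()
    n_rows = (0 if first is None else 1) + len(rest)
    return first_row_at - start, done_at - first_row_at, n_rows


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (id INTEGER PRIMARY KEY, grp INTEGER, val REAL)")
conn.executemany(
    "INSERT INTO t (grp, val) VALUES (?, ?)",
    [(i % 100, i * 0.5) for i in range(50_000)],
)

to_first, to_rest, n = profile_query(
    conn, "SELECT id, grp, val FROM t ORDER BY val DESC"
)
print(f"first row after {to_first:.4f}s, remaining rows in {to_rest:.4f}s, {n} rows")
```

If most of the elapsed time falls in the second number, the bottleneck is moving rows to the client, and restructuring the query logic is unlikely to help.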

Database Schema SQL Rows vs Columns [closed]

Closed. This question needs to be more focused. It is not currently accepting answers.
Closed 9 years ago.
I have a lot of databases whose tables have a relatively large number of columns, ranging from 5 to 300. Each table has at least 50,000 rows in it.
What is the most effective way to store this data? Presently the data has just been dumped into an indexed sql database.
It was suggested to me to create 4 columns as follows:
Column Name, Column Category, Row ID, Row Data.
example data would be
Male, 25-40, 145897, 365
Would this be faster? Would this be slower? Are there better ways to store such large and bulky databases?
I will almost never be updating or changing data. It will simply be output to a 'datatables' dynamic table where it will be sorted, limited, etc. The category column will be used to break up the columns on the table.
Normalize your db!
I have struggled with this "theory" for a long time and experience has proven that if you can normalize data across multiple tables it is better and performance will not suffer.
Do not try to put all the data in one row with hundreds of columns. Not because of performance, but because of development ease.

SQL large table VS. multiple smaller tables [closed]

Closed. This question is off-topic. It is not currently accepting answers.
Closed 11 years ago.
I have the option to use a single table that will expand upwards of 1,000,000 records per year.
With that said, I could use a foreign key to break up this table into multiple smaller tables, which would reduce the growth of each smaller table to 100,000 records per year.
Let's say 50% of the time users will query all of the records, and the other 50% of the time users will query a segmented smaller data set (think all geographic areas vs. specific geographic areas).
Using a database managed by a shared hosting account (think site5, godaddy, etc.), is it faster to use a single larger table or several smaller segmented tables in this situation?
Where each dataset is accessed 10%/90%, 20%/80%, 30%/70%, etc., at what point would using a single table vs. multiple smaller tables be the most/least efficient?
In general do it so as to reduce the amount of duplicated information. If you are making smaller tables which have many redundant columns, then it seems like it'd be more efficient to have just one table. But otherwise, one table.
It also depends on what percent of the row is being used per query, and how your queries are structured. If you are adding lots of joins or subqueries, then it'll most likely be slower.

Which type of database structure design is better for performance? [closed]

Closed. This question is opinion-based. It is not currently accepting answers.
Closed 9 years ago.
MSSQL database. I need to create a new database from an old database's data. The old database consists of thousands of tables connected to each other by ID. In these tables, data is duplicated many times. The old tables have more than 50,000 rows (users). The structure looks like this:
Users (id, login, pass, register-date, update-date),
Users-detail (id, users_id, some data)
Users-some-data (id, users_id, some data)
and there are hundreds of tables like these.
The question is which database design to choose: one table with all of this data, or hundreds of tables separated by theme.
Which type of db structure would be with better performance?
Select id, login, pass from ONE_BIG_TABLE
or
Select * from SMALL_ONLY_LOGINS_TABLE.
Answer really depends on the use. No one can optimize your database for you if they don't know the usage statistics.
Correct DB design dictates that an entity is stored inside a single table, that is, the client with their details for example.
However, this rule can change when you only access or write some of the entity data multiple times, and/or if there is optional info you store about a client (e.g., some long texts, biography, history, extra addresses, etc.), in which case it would be optimal to store it in a child table.
If you find yourself with a bunch of columns holding all-null values, that means you should strongly consider a child table.
If you only need to try login credentials against the DB table, a stored procedure that returns a bool value depending on if the username/password are correct, will save you the round-trip of the data.
Without indexes the select on the smaller tables will be faster. But you can create the same covering index (id, login, pass) on both tables, so if you need only those 3 columns performance will probably be the same on both tables.
The general question which database structure is better can not be answered without knowing the usage of your database.
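The covering-index point can be tried out on a small scale. The following is a sketch only, using SQLite through Python's `sqlite3` module instead of MSSQL; the extra columns in `one_big_table`, the index names and the generated rows are made up for illustration. It creates the same (id, login, pass) index on both tables, runs the equivalent query on each, and prints whatever plan SQLite reports.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE one_big_table (
        id INTEGER PRIMARY KEY, login TEXT, pass TEXT,
        register_date TEXT, update_date TEXT, bio TEXT
    );
    CREATE TABLE small_only_logins_table (
        id INTEGER PRIMARY KEY, login TEXT, pass TEXT
    );
    CREATE INDEX idx_big_cover   ON one_big_table (id, login, pass);
    CREATE INDEX idx_small_cover ON small_only_logins_table (id, login, pass);
    """
)

# Hypothetical user rows; the big table carries extra columns per user.
users = [(i, f"user{i}", f"hash{i}") for i in range(1, 1001)]
conn.executemany(
    "INSERT INTO one_big_table VALUES (?, ?, ?, '2010-01-01', '2010-06-01', 'long text')",
    users,
)
conn.executemany("INSERT INTO small_only_logins_table VALUES (?, ?, ?)", users)

big_sql = "SELECT id, login, pass FROM one_big_table ORDER BY id"
small_sql = "SELECT id, login, pass FROM small_only_logins_table ORDER BY id"
big_rows = conn.execute(big_sql).fetchall()
small_rows = conn.execute(small_sql).fetchall()

# The last field of each EXPLAIN QUERY PLAN row is the human-readable detail.
for sql in (big_sql, small_sql):
    for row in conn.execute("EXPLAIN QUERY PLAN " + sql):
        print(row[-1])
```

Whether the optimizer actually picks the index depends on the engine and its statistics, so the plan output is worth checking on the real system rather than assumed.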