Is this a bad way to structure my Sql Server database?

Is this a bad way to structure my Sql Server database? - sql

I have a table that contains a few columns and then 2 final (nullable) columns which are varbinary (actually, they are SQL 2008 geography types, but I want to keep this post database agnostic).
I've hit around 500mb with around 200K rows. The varbinary is the problem - and I need the data.
So, I was wondering if it's bad if I do the following:-
Create a separate FILEGROUP: SpatialData.mdf
Create a new table, assigned to that new filegroup.
Move all the spatial data (read: last two fields) out of the original table and into the new table. The new table has a foreign key against the original table.
Create a view representing both tables.
Now, the view will be a left outer join because the relationship is: the new table has a zero or one row relationship to the original table.
EG.
Original Table
FooId INT PK NOT NULL IDENTITY
Blah VARCHAR(..) NOT NULL
Boo WHATEVER NOT NULL
New Table
FooID PK FK NOT NULL
Spatial_A VARBINARY(MAX)/GEOGRAPHY
Spatial_B VARBINARY(MAX)/GEOGRAPHY
The reason why I want to know if this is bad is because of the view and how the view is doing a join on the spatial table. I'll be using the view a lot. Currently, I'm just doing queries against the original table (because the new table doesn't exist just yet). By adding this join and the PK/FK relationship, will this impact performance?
Why split the data? I need to download the live DB to our dev servers now and then. We don't really care too much about those two spatial fields, so not having them is fine. Therefor, the size of the database to download will be much smaller.
Thoughts?

Instead of creating a second table, joining, and creating a view, a better solution that is possible with SQL Server 2005/2008 is to use table partitioning. To my recollection, you can vertically partition a table, and place some columns (i.e. your geospatial columns) in one file group, while putting the rest in another file group. SQL Server will handle the rest for you, you don't need to bother with a join, and you don't need a view.

The method that you've described is actually fairly common in my experience. Technically, if you were to normalize your database to the fullest extent you would have a lot of tables like that since one of the (usually not used) steps in normalization includes making sure that no columns have NULL values.
In practice it isn't usually carried out to that extent, but for a column (or columns) that is sparsely populated it's not a bad idea to separate it out. As long as the tables share the same primary key (which will of course be indexed), performance shouldn't be a problem.

Related

SQL - What is best to do when multiple tables have the same columns

I have different tables in my scheme with different columns, but I want to store data of when was the table modified or when was the data stored, so I added some columns to specify that.
I realized that I had to add the same "modification_date" and "modification_time" columns to all my tables, so I thought about making a new table called DATA_INFO so I won't need to do so, but every table has a different PRIMARY KEY and I don't know which one to add as FOREIGN KEY to the DATA_INFO table.
I don't know if I have to maybe add all of them or is there another way to do what I need.

It's better to have the same "modification_datetime" column in all tables, rather than trying to keep that data in a central table.
That's what we have done at every shop I've worked in.

I want to emphasize that a separate table is not reasonable for this purpose. The lack of an obvious foreign key is a hint.
Unlike Tab Allerman, tables that I create are much less likely to be updated, so I have three additional columns on most tables:
CreatedBy -- the user who created the row
CreatedAt -- when the row was creatd
CreatedOn -- the system where the table was created
The most important point is that this information can -- in many databases -- be implemented using default values rather than triggers. That is a big advantage of working within a single row. The fewer triggers, the better.

Is a generic ID column in a SQL table a bad idea?

In our database we have many tables with a 'Notes' column. This is important functionality, but for most rows the value of Notes is null. These tables have many columns and we would like to remove some columns for better legibility.
We could add one Notes table for every table that has a notes column. But this would create clutter of a different kind- too many small tables.
My idea is to create a generic Notes table and also a reference table. The Notes table would have a column for the notes text, a column for the id of the row being linked to, and a foreign key to the reference table. The reference table would have a text value for each table for which we need notes. Using these two tables we should be able to link the note back to whichever table and column it came from.
By using this solution, we remove any cases of null values from notes and also slim down some of our tables. All at the modest price of two additional tables. It feels very 'hacky' to me however. Is there a reason why using a 'generic' id column or a reference table of other tables is a bad idea from a DB management perspective?

Managing the references to disparate entities can be really challenging in SQL Server. Postgres, by contrast, supports inheritance which makes this much simpler.
So, my recommendation is to add a notes column to every entity where you want notes. You an add a view to bring all the notes together if you need a view of all the notes.
This has minimal impact on performance or data size. There is no additional overhead for a varchar column, other than the additional NULL bit -- and that is pretty minimal.

IMO, the other solution of managing two tables doesn't bring in much efficiency but adds complexity to the solution. You should probably stick with the the notes column in the original table with datatype as varchar.
Generic id column is not bad inherently but the use of it generally gives smell of bad/hacky design.

Additionaly for SQL Server you can use sparse for the note columns to reduce size.
But i used a similary approach myself. (Note column needed for many columns to write info / changerequest / lockcomment. But normally never used).
Works fine and can be programmed genericaly in source.
But if you need only one comment column per table i wood prefer sparse

A single table that represents multiple tables

I have a problem with finding a way to represent multiple tables hash tables into a single table.
Say I have 3 tables with the format:
Table1(Table1_PK1,Table1_PK2,Table1_PK3,Table1_Hash)
Table2(Table2_PK1,Table2_PK2,Table2_Hash)
Table3(Table3_Pk1,Table3_PK2,Table3_PK3,Table3_PK4,Table3_PK5,Table3_Hash)
Table1_PK1,Table1_PK2,Table1_PK3... are columns and they might have different datatypes (VARCHAR, INT or DATETIME ...).
My question is if there is a way to create a single table (fixed number of columns) that can represent all of these 3 tables (may be more in practical).
I am trying to do this for my database tool. Each table actual a table which contains primary keys and a hash data associating with them.

Since you're apparently building a database tool, not a database, it might make more sense to do this in application code rather than in a database table.
In a different answer, you commented
I am still looking for a dynamic way to do it without knowing how many primary keys a table can have.
A table can have only one primary key. That primary key can consist of more than one column, though. (You already knew this; you were just using the wrong words, which might confuse others.)
A table can also have an arbitrary number of other keys, which will be either declared (as NOT NULL UNIQUE) or "undeclared" (by creating an index that guarantees uniqueness over a set of columns).
You can look all that stuff up at run time in one or both of two ways. (Links go to documentation for PostgreSQL.)
System tables, sometimes called system catalogs
information_schema views
As far as I know, all modern SQL platforms implement at least one of these interfaces. The information_schema views are covered in the SQL standards, but there seems to be some room for interpretation. They don't look quite the same on all platforms.

Why combine the 3 tables into one? Would be really bad db design. But here's a way to do it:
The one table will have a column for each of the 3 tables' columns you want in the final table. I am making the assumption that TableX_Hash is the same type, so that remains as one unique column:
Table_All_in_One (
Table1_PK1,
Table1_PK2,
Table1_PK3,
# space just for clarity of grouping
Table2_PK1,
Table2_PK2,
Table3_PK1,
Table3_PK2,
Table3_PK3,
Table3_PK4,
Table3_PK5,
TableX_Hash # Assuming all the _Hash'es are the same type+length,
# otherwise, add Table1_Hash, Table2_Hash, Table3_Hash
# This can be your new primary key
)
The Primary Keys (PKx) are required to be non-NULL only in their own tables. For this table, they have to allow nulls. The idea is that each row of this new table will only hold the data for one of the tables. The other columns will be empty for that row. If you want to associate the row of one table with another, you can add that to the same row or add FK_Table1_Hash, FK_Table2_Hash and FK_Table3_Hash columns which will refer to the TableX_Hash value of a record.
PS: I wonder if what you are really looking for is a View and not this really bad all-in-one table.
Edit: Combining them into one "without knowing how many primary keys a table can have." as per your comment:
Store all the _PKs concatenated into one column:
Table_All_in_One (
New_PK,
TableX_Hash,
Table1_PKx, # Concatenated PKs of Table1
Table2_PKx, # Concatenated PKs of Table2, etc.
...,
# OR just one
TableX_PKs, # concatenate all the PK's into one VARCHAR field
# Add a pipe `|` between them optionally.
Table_Num # If using just one, then you'll need to store the table number
)
You will not be able to conveniently pick records based on part of their composite primary key. It will always have to be TableX_PKs = CONCAT_WS('|', Table1_PK1, Table1_PK2, ...). So your only dependency is the number of PKs in the original column.

In order to model a bunch of tables you will need 3 tables. An entity table that contains the table names of the tables you wish to set up this way called a factor or entity table. A Factor_detail table that contains all the columns and their associated properties of the tables. A table, factor_detail_value, for storing things like lookup values for lookup tables. I'm trying to learn more about this myself as well because we are using this technique at work as well. Genrate sql on the fly for any table so encoded, and store the data in a repository pertiinant to the data itself. This way if a table changes and you need to add a column or change a datatype, you can add a row to the factor detail table without affecting a database shut down in production. In most businesses a four hour shut down to make a sql data table change can cost thousands of dollars. If you are dealing with insurance for example, each additional state that you sell insurance in has different requirements for being able to seel it and that will result in table changes. We reduced our table count way down from over 700 tables in this manner also we can make changes without database shut down thus avoiding the loss in revenue.

Are relationship tables really needed?

Relationship tables mostly contain two columns: IDTABLE1, and IDTABLE2.
Only thing that seems to change between relationship tables is the names of those two columns, and table name.
Would it be better if we create one table Relationships and in this table we place 3 columns:
TABLE_NAME, IDTABLE1, IDTABLE2, and then use this table for all relationships?
Is this a good/acceptable solution in web/desktop application development? What would be downside of this?
Note:
Thank you all for feedback. I appreciate it.
But, I think you are taking it a bit too far... Every solution works until one point.
As data storage simple text file is good till certain point, than excel is better, than MS Access, than SQL Server, than...
To be honest, I haven't seen any argument that states why this solution is bad for small projects (with DB size of few GB).

It would be a monster of a table; it would also be cumbersome. Performance-wise, such a table would not be a great idea. Also, foreign keys are impossible to add to such a table. I really can't see a lot of advantages to such a solution.

Bad idea.
How would you enforce the foreign keys if IDTABLE1 could contain ids from any table at all?
To achieve acceptable performance on joins without a load of unnecessary IO to bring in completely unrelated rows you would need a composite index with leading column TABLE_NAME that basically ends up partitioning the table into sections anyway.
Obviously even with this pseudo partitioning going on you would still be wasting a lot of space in the table/indexes just repeating the table name for each row.

Isn't it a big IF that you're only going to store the 2 ID fields? If I have a StudentCourse (or better yet Enrollment) table that has StudentID & CourseID, but wouldn't EnrollmentDate go in this table as well since not all students enroll on the first day of class. Seems like a bad idea to add this column to an already bloated table where most records will be null.
The benefit of a single table could be a requirement that the application has the ability to allow user/admin to create these relationships with data (Similar to have a single lookup or reference list table) and avoid having to create a new table to address these User Created References. Needing dynamic querying may benefit as well. An application that requires such dynamic data structure requirements might be better suited for a schemaless or nosql database.

Difference between a db view and a lookuptable

When I create a view I can base it on multiple columns from different tables.
When I want to create a lookup table I need information from one table, for example the foreign key of an order table, to get customer details from another table. I can create a view having parameters to make sure it will get all data that I need. I could also - from what I have been reading - make a lookup table. What is the difference in this case and when should I choose for a lookup table?? I hope this ain't a bad question, I'm not very into db's yet ;).

Creating a view gives you a "live" representation of the data as it is at the time of querying. This comes at the cost of higher load on the server, because it has to determine the values for every query.
This can be expensive, depending on table sizes, database implementations and the complexity of the view definition.
A lookup table on the other hand is usually filled "manually", i. e. not every query against it will cause an expensive operation to fetch values from multiple tables. Instead your program has to take care of updating the lookup table should the underlying data change.
Usually lookup tables lend themselves to things that change seldomly, but are read often. Views on the other hand - while more expensive to execute - are more current.

I think your usage of "Lookup Table" is slightly awry. In normal parlance a lookup table is a code or reference data table. It might consist of a CODE and a DESCRIPTION or a code expansion. The purpose of such tables is to provide a lsit of permitted values for restricted columns, things like CUSTOMER_TYPE or PRIORITY_CODE. This category of table is often referred to as "standing data" because it changes very rarely if at all. The value of defining this data in Lookup tables is that they can be used in foreign keys and to populate Dropdowns and Lists Of Values.
What you are describing is a slightly different scenario:
I need information from one table, for
example the foreign key of an order
table, to get customer details from
another table
Both these tables are application data tables. Customer and Order records are dynamic. Now it is obviously valid to retrieve additional data from the Customer table to display along side the Order data, and in that sense Customer is a "lookup table". More pertinently it is the parent table of Order, because it has the primary key referenced by the foreign key on Order.
By all means build a view to capture the joining logic between Order and Customer. Such views can be quite helpful when building an application that uses the same joined tables in several places.

Here's an example of a lookup table. We have a system that tracks Jurors, one of the tables is JurorStatus. This table contains all the valid StatusCodes for Jurors:
Code: Value
WS : Will Serve
PP : Postponed
EM : Excuse Military
IF : Ineligible Felon
This is a lookup table for the valid codes.
A view is like a query.

Read this tutorial and you may find helpful info when a lookup table is needed:
SQL: Creating a Lookup Table

Just learn to write sql queries to get exactly what you need. No need to create a view! Views are not good to use in many instances, especially if you start to base them on other views, when they will kill performance. Do not use views just as a shorthand for query writing.

We Keep Coding

sql objective-c vba vb.net react-native apache vue.js tensorflow api pandas