Synchronize tables with mismatched columns

Synchronize tables with mismatched columns - sql

Consider the following situation:
In my database: Table "Items" => Several thousand records, 35 columns.
Linked server: View "ItemInfo" => Several thousand records, 45 columns.
1 column in both ("ItemID") provides the matching key between table and view (Unique, not NULL in both.)
30 columns (not counting ItemID) appear on both sides, but may be of different type. E.g, in view ItemInfo it is a date or number field, while in table Items has a string containing a date or number. This happens on about half of the columns.
Table and view content do not exactly match.
About 99% of the rows appear in both with matching ItemID (but the other columns may have different values). The other 1% only appears on one side.
I need a stored procedure (in the database containing table Items) to update the table Items, so that that values from the view ItemInfo are copied in the table. (For a matching row the values from the view are leading.)
Additionally: In case a rows doesn't exist in table Items, it must be copied from view ItemInfo. Extras in the table (not present in the view) can be left as is.
I need to synchronize the table with the view several times a day, but the number of mutations (each synchronization) will be very low. Possibly none. I have no way to tell if a entry in the view is actually "new" or "updated".
My current proposed solution is to do this in 2 steps:
for rows in view ItemInfo that have no match in table Items insert that row in table Items, using a sub-expression for each column (if needed) to do type-conversion.
run an update over table Items updating each column (with a sub-expression) with the corresponding value from the matching entry (if existing) from view ItemInfo.
Step 2 is obviously very in-efficient, because there is no need to update the vast majority of the rows.
Only a handful (if any) will actually have been changed since the last update.
The whole thing involves a bloody mess of sub-expressions, but I don't see any other approach.
Can someone think of a better idea to implement something like this?
To clarify:
I'm not looking for an complete solution to my problem, but more for a general approach to take on similar problems.
I'm certain I'm going to face this multiple times over the coming years...
P.S.
I have no control over the view. Just read-only access to read it. I do have full control of the database where table is located, but I can't change anything about the structure of table Items itself, because there is other software interfacing with this table over which I have no control. I'm free to add temporary tables if needed.

Related

Connecting one foreign key to multiple tables (primary keys)

I am developing an application for making quotations. First you make cost break down (or calculation) and upon that result you add item to quotation. The problem is that i have many product, so each category of a product will have its own cost break down form with different parameters to be filled in. If I will have only one table for cost breakdown, then it will be huge (a lot of fields in table). I have a feeling that this is not the right approach. So I came up with diagram below:
Is this solution even possible, or I must have "N" (if I have N-tables) different FK for each cost break down table? Do you have any better solutions?
I have another question if my linking table "Quotation_QtnDetail" is necessary?

It would be possible to store a reference to a particular value in one of these tables by having a CalculationType column indicating which table the record is in, along with a generic reference ID column (containing the ID of the relevant record). For example, if you were storing a CalcId of 123 and a CalculationType of 2, this would point to the record with ID 123 in the Calc2 table.
The downside to doing this is you're going to lose the ability to validate your data using FK constraints, and it will also make joins to your calculation tables a bit more complicated.
Regarding the Quotation_QtnDetail table, unless a QtnDetail record could ever be linked to multiple Quotation records, there is no need for this extra linking table. Instead, just link it directly by adding a QtnId column to the QtnDetail table. Similarly, you may also be able to remove the Calc_QtnItm table if an item is only ever linked to a single calculation record.

Having placeholder columns when creating new database table

Is it good practice to add some placeholder columns when creating a database table with millions of rows, in case the schema gets changed later? More efficient to rename a column than to insert a new one?

There are many problems with adding "placeholder" columns to a table.
These columns may take up useless space, and appear "sloppy".
You may create too many columns now, and have columns that will never be used.
You may not create enough columns now, and will have to end up creating more anyways.
You don't know what the column data types will be at this time.
Always remember that if a column needs added at a later date and will not be used for any of the current rows in the table, you can still keep the table normalized by creating a smaller table that holds this information, then link them by using the primary key.
Let me know if you have any questions about this. I hope this helps!

A single table that represents multiple tables

I have a problem with finding a way to represent multiple tables hash tables into a single table.
Say I have 3 tables with the format:
Table1(Table1_PK1,Table1_PK2,Table1_PK3,Table1_Hash)
Table2(Table2_PK1,Table2_PK2,Table2_Hash)
Table3(Table3_Pk1,Table3_PK2,Table3_PK3,Table3_PK4,Table3_PK5,Table3_Hash)
Table1_PK1,Table1_PK2,Table1_PK3... are columns and they might have different datatypes (VARCHAR, INT or DATETIME ...).
My question is if there is a way to create a single table (fixed number of columns) that can represent all of these 3 tables (may be more in practical).
I am trying to do this for my database tool. Each table actual a table which contains primary keys and a hash data associating with them.

Since you're apparently building a database tool, not a database, it might make more sense to do this in application code rather than in a database table.
In a different answer, you commented
I am still looking for a dynamic way to do it without knowing how many primary keys a table can have.
A table can have only one primary key. That primary key can consist of more than one column, though. (You already knew this; you were just using the wrong words, which might confuse others.)
A table can also have an arbitrary number of other keys, which will be either declared (as NOT NULL UNIQUE) or "undeclared" (by creating an index that guarantees uniqueness over a set of columns).
You can look all that stuff up at run time in one or both of two ways. (Links go to documentation for PostgreSQL.)
System tables, sometimes called system catalogs
information_schema views
As far as I know, all modern SQL platforms implement at least one of these interfaces. The information_schema views are covered in the SQL standards, but there seems to be some room for interpretation. They don't look quite the same on all platforms.

Why combine the 3 tables into one? Would be really bad db design. But here's a way to do it:
The one table will have a column for each of the 3 tables' columns you want in the final table. I am making the assumption that TableX_Hash is the same type, so that remains as one unique column:
Table_All_in_One (
Table1_PK1,
Table1_PK2,
Table1_PK3,
# space just for clarity of grouping
Table2_PK1,
Table2_PK2,
Table3_PK1,
Table3_PK2,
Table3_PK3,
Table3_PK4,
Table3_PK5,
TableX_Hash # Assuming all the _Hash'es are the same type+length,
# otherwise, add Table1_Hash, Table2_Hash, Table3_Hash
# This can be your new primary key
)
The Primary Keys (PKx) are required to be non-NULL only in their own tables. For this table, they have to allow nulls. The idea is that each row of this new table will only hold the data for one of the tables. The other columns will be empty for that row. If you want to associate the row of one table with another, you can add that to the same row or add FK_Table1_Hash, FK_Table2_Hash and FK_Table3_Hash columns which will refer to the TableX_Hash value of a record.
PS: I wonder if what you are really looking for is a View and not this really bad all-in-one table.
Edit: Combining them into one "without knowing how many primary keys a table can have." as per your comment:
Store all the _PKs concatenated into one column:
Table_All_in_One (
New_PK,
TableX_Hash,
Table1_PKx, # Concatenated PKs of Table1
Table2_PKx, # Concatenated PKs of Table2, etc.
...,
# OR just one
TableX_PKs, # concatenate all the PK's into one VARCHAR field
# Add a pipe `|` between them optionally.
Table_Num # If using just one, then you'll need to store the table number
)
You will not be able to conveniently pick records based on part of their composite primary key. It will always have to be TableX_PKs = CONCAT_WS('|', Table1_PK1, Table1_PK2, ...). So your only dependency is the number of PKs in the original column.

In order to model a bunch of tables you will need 3 tables. An entity table that contains the table names of the tables you wish to set up this way called a factor or entity table. A Factor_detail table that contains all the columns and their associated properties of the tables. A table, factor_detail_value, for storing things like lookup values for lookup tables. I'm trying to learn more about this myself as well because we are using this technique at work as well. Genrate sql on the fly for any table so encoded, and store the data in a repository pertiinant to the data itself. This way if a table changes and you need to add a column or change a datatype, you can add a row to the factor detail table without affecting a database shut down in production. In most businesses a four hour shut down to make a sql data table change can cost thousands of dollars. If you are dealing with insurance for example, each additional state that you sell insurance in has different requirements for being able to seel it and that will result in table changes. We reduced our table count way down from over 700 tables in this manner also we can make changes without database shut down thus avoiding the loss in revenue.

Difference between a db view and a lookuptable

When I create a view I can base it on multiple columns from different tables.
When I want to create a lookup table I need information from one table, for example the foreign key of an order table, to get customer details from another table. I can create a view having parameters to make sure it will get all data that I need. I could also - from what I have been reading - make a lookup table. What is the difference in this case and when should I choose for a lookup table?? I hope this ain't a bad question, I'm not very into db's yet ;).

Creating a view gives you a "live" representation of the data as it is at the time of querying. This comes at the cost of higher load on the server, because it has to determine the values for every query.
This can be expensive, depending on table sizes, database implementations and the complexity of the view definition.
A lookup table on the other hand is usually filled "manually", i. e. not every query against it will cause an expensive operation to fetch values from multiple tables. Instead your program has to take care of updating the lookup table should the underlying data change.
Usually lookup tables lend themselves to things that change seldomly, but are read often. Views on the other hand - while more expensive to execute - are more current.

I think your usage of "Lookup Table" is slightly awry. In normal parlance a lookup table is a code or reference data table. It might consist of a CODE and a DESCRIPTION or a code expansion. The purpose of such tables is to provide a lsit of permitted values for restricted columns, things like CUSTOMER_TYPE or PRIORITY_CODE. This category of table is often referred to as "standing data" because it changes very rarely if at all. The value of defining this data in Lookup tables is that they can be used in foreign keys and to populate Dropdowns and Lists Of Values.
What you are describing is a slightly different scenario:
I need information from one table, for
example the foreign key of an order
table, to get customer details from
another table
Both these tables are application data tables. Customer and Order records are dynamic. Now it is obviously valid to retrieve additional data from the Customer table to display along side the Order data, and in that sense Customer is a "lookup table". More pertinently it is the parent table of Order, because it has the primary key referenced by the foreign key on Order.
By all means build a view to capture the joining logic between Order and Customer. Such views can be quite helpful when building an application that uses the same joined tables in several places.

Here's an example of a lookup table. We have a system that tracks Jurors, one of the tables is JurorStatus. This table contains all the valid StatusCodes for Jurors:
Code: Value
WS : Will Serve
PP : Postponed
EM : Excuse Military
IF : Ineligible Felon
This is a lookup table for the valid codes.
A view is like a query.

Read this tutorial and you may find helpful info when a lookup table is needed:
SQL: Creating a Lookup Table

Just learn to write sql queries to get exactly what you need. No need to create a view! Views are not good to use in many instances, especially if you start to base them on other views, when they will kill performance. Do not use views just as a shorthand for query writing.

One mysql table with many fields or many (hundreds of) tables with fewer fields?

I am designing a system for a client, where he is able to create data forms for various products he sales him self.
The number of fields he will be using will not be more than 600-700 (worst case scenario). As it looks like he will probably be in the range of 400 - 500 (max).
I had 2 methods in mind for creating the database (using meta data):
a) Create a table for each product, which will hold only fields necessary for this product, which will result to hundreds of tables but with only the neccessary fields for each product
or
b) use one single table with all availabe form fields (any range from current 300 to max 700), resulting in one table that will have MANY fields, of which only about 10% will be used for each product entry (a product should usualy not use more than 50-80 fields)
Which solution is best? keeping in mind that table maintenance (creation, updates and changes) to the table(s) will be done using meta data, so I will not need to do changes to the table(s) manually.
Thank you!
/**** UPDATE *****/
Just an update, even after this long time (and allot of additional experience gathered) I needed to mention that not normalizing your database is a terrible idea. What is more, a not normalized database almost always (just always from my experience) indicates a flawed application design as well.

i would have 3 tables:
product
id
name
whatever else you need
field
id
field name
anything else you might need
product_field
id
product_id
field_id
field value

Your key deciding factor is whether normalization is required. Even though you are only adding data using an application, you'll still need to cater for anomalies, e.g. what happens if someone's phone number changes, and they insert multiple rows over the lifetime of the application? Which row contains the correct phone number?
As an example, you may find that you'll have repeating groups in your data, like one person with several phone numbers; rather than have three columns called "Phone1", "Phone2", "Phone3", you'd break that data into its own table.
There are other issues in normalisation, such as transitive or non-key dependencies. These concepts will hopefully lead you to a database table design without modification anomalies, as you should hope for!

Pulegiums solution is a good way to go.
You do not want to go with the one-table-for-each-product solution, because the structure of your database should not have to change when you insert or delete a product. Only the rows of one or many tables should be inserted or deleted, not the tables themselves.

While it's possible that it may be necessary, having that many fields for something as simple as a product list sounds to me like you probably have a flawed design.
You need to analyze your potential table structures to ensure that each field contains no more than one piece of information (e.g., "2 hammers, 500 nails" in a single field is bad) and that each piece of information has no more than one field where it belongs (e.g., having phone1, phone2, phone3 fields is bad). Either of these situations indicates that you should move that information out into a separate, related table with a foreign key connecting it back to the original table. As pulegium has demonstrated, this technique can quickly break things down to three tables with only about a dozen fields total.

We Keep Coding

sql objective-c vba vb.net react-native apache vue.js tensorflow api pandas