Opinions on planning and avoiding data redundancy

Opinions on planning and avoiding data redundancy - vb.net

I am currently going to be designing an app in vb.net to work with an access back-end database. I have been trying to think of ways to reduce down data redundancy
and I have an example scenario below:
Lets imagine, for an example purpose, I have a customers table and need to highlight all customers in WI and send them a letter. The customers table would
contain all the customers and properties associated with customers (Name, Address, Etc) so we would query for where the state is "WI" in the table. Then we would
take the results of that data, and append it into a table with a "completion" indicator (So from 'CUSTOMERS' to say 'WI_LETTERS' table).
Lets assume some processing needs to be done so when its completed, mark a field in that table as 'complete', then allow the letters to be printed with
a mail merge. (SELECT FROM 'WI_LETTERS' WHERE INDICATOR = COMPLETE).
That item is now completed and done. But lets say, that every odd year (2013) we also send a notice to everyone in the table with a state of "WI". We now query the
customers table when the year is odd and the customer's state is "WI". Then append that data into a table called 'notices' with a completion indicator
and it is marked complete.
This seems to keep the data "task-based" as the data is based solely around the task at hand. However, isn't this considered redundant data? This setup means there
can be one transaction type to many accounts (even multiple times to the same account year after year), but shouldn't it be one account to many transactions?
How is the design of this made better?

You certainly don't want to start creating new tables for each individual task you perform. You may want to create several different tables for different types of tasks if the information you need to track (and hence the columns in those tables) will be quite different between the different types of tasks, but those tables should be used for all tasks of that particular type. You can maintain a field in those tables to identify the individual task to which each record applies (e.g., [campaign_id] for Marketing campaign mailouts, or [mail_batch_id], or similar).
You definitely don't want to start creating new tables like [WI_letters] that are segregated by State (or any client attribute). You already have the customers' State in the [Customers] table so the only customer-related attribute you need in your [Letters] table is the [CustomerID]. If you frequently want to see a list of Letters for Customers in Wisconsin then you can always create a saved Query (often called a View in other database systems) named [WI_Letters] that looks like
SELECT * FROM Letters INNER JOIN Customers ON Customers.CustomerID=Letters.CustomerID
WHERE Customers.State="WI"

Related

SQL - MS Access Form design - add data of ISA relationships

I'm taking a DBMS course and I need to design and build my own DB. I have a database for a hospital where doctors,nurses,support staff etc are in a ISA relationship to an Employee entity with the rest of the data like the name, address , salary and the rest of the employee data.
Designing a form, I want to be able to add an employee with all of their data in one form.
Is there a way to do a "conditional table" of sorts where if i select "doctor" from a drop-down i get to add to the doctor table too, and same for the rest of the entities under the ISA relationship?
Thx!

As a general rule, when dealing with data, you do NOT flip or switch tables for a given form or relatonal database design.
So, for example. If I have a table of customers. Well, now if I want to mark some of the customers as plumbers, and others as doctors? I don't create two tables.
All I would do is add ONE column to that customers table and it would simply allow me to set the type of customer. The reason for this design is "many" but some significant reasons are:
For each new type of customer, you would not create a new table. Worse, all of the forms, the reports, the SQL, the code you write? Well, all of that code would have to be modified EACH time you create a new table. So, you SIMPLY cannot adopt a design in which the concept of changing a table is part of that process.
Forms are bound to ONE table. For related data, you in most cases will use a sub form.
So, think of even a accounting system. They can have huge numbers of customers, and as a result, you can "query" that table to give you all customers. Or you might ask how many accounting firms are in the customer list. Or make a report that summeries by customer type a "count" of each type of customer.
So, buidling forms, or reports? They cannot on the fly "change" the tables they are using.
So, in place of a tables called:
SalesJan
SalesFeb
SalesMar
etc.
Well, now you can't query sales from Jan to mar, because the data is in different tables.
So, what you do is have ONE table called "sales", and you add ONE column of the date. Now, at the start of each new month, you don't have to create a new table.
Now, of course in some cases it makes sense to create a separate table. For example, a table of customers, and a table of employees in a database is just fine. It makes sense in this case to use two tables, since the information about a customer and what they can do and the kind of information is VERY different then how you would deal with employees.
So, with above? Well, if I need to print mailing labels for all customers and all employees? That would require two different reports. And very likely the table structure for the two tables is different.
Bottom line:
If you working on design or form or report? And you needing to try and change the table that the form/report/code etc is going to operate on? This is a sign that your design approach has gone complete off the rails and is the wrong design.
So, in the case of doctors, nurses etc.? Well, they are all hospital staff, and MOST of the basic information about such employees will be common, much the same, and thus a SINGLE table of "employees" makes the most sense. You would only need a nice "employee type" combo box on that one form, and thus you can add/enter/edit/search any employee in that one table.
The fact that you "want to search" for a employee show that all these people "are" employees and thus belong in one table. And the basic information about all employees is going to be the same anyway. If you find you are attempting to create a new table but with near identical structures over and over, then just like a new table for each month sales, or a new table for each new kind of employee? Simply add the "one" column that allows you to make that distinguish, and not a whole new table.
Now one COULD even attempt to put patients in the same table, but then again, dealing with patents as opposed employees is a considerable different kind of "thing".
So employees are employees - even different kinds. (manager, cleaning staff etc.).
And patients are patients - even different kinds (long term care, emergency etc.).

Custom user defined database fields, what is the best solution?

To keep this as short as possible I'm going to use and example.
So let's say I have a simple database that has the following tables:
company - ( "idcompany", "name", "createdOn" )
user - ( "iduser", "idcompany", "name", "dob", "createdOn" )
event - ( "idevent", "idcompany", "name", "description", "date", "createdOn" )
Many users can be linked to a single company as well as multiple events and many events can be linked to a single company. All companies, users and events have columns as show above in common. However, what if I wanted to give my customers the ability to add custom fields to both their users and their events for any unique extra information they wish to store. These extra fields would be on a company wide basis, not on a per record basis ( so a company adding a custom field to their users would add it to all of their users not just one specific user ). The custom fields also need to be sesrchable and have the ability to be reported on, ideally automatically with some sort of report wizard. Considering the database is expected to have lots of traffic as well as lots of custom fields, what is the best solution for this?
My current research and findings in possible solutions:
To have generic placeholder columns such as "custom1", "custom2" etc.
** This is not viable as there will eventually be too many custom columns and there will be too many NULL values stored in the database
To have 3x tables per current table. eg: user, user-custom-field, user-custom-field-value. The user table being the same. The user-custom-field table containing the information about the new field such as name, data type etc. And the user-custom-field-value table containing the value for the custom field
** This one is more of a contender if it were not for its complexity and table size implications. I think it will be impossible to avoid a user-custom-field table if I want to automatically report on these fields as I will have to store the information on how to report on these fields here. However, In order to pull almost any data you would have to do a million joins on the user-custom-field-value table as well as the fact that your now storing column data as rows which in a database expected to have a lot of traffic as well as a lot of custom fields would soon cause a problem.
Create a new user and event table for each new company that is added to the system removing the company id from within those tables and instead using it in the table name ( eg user56, 56 being the company id ). Then allowing the user to trigger DB commands that add the new custom columns to the tables giving them the power to decide if it has a default value or auto increments etc.
** Everytime I have seen this solution it has always instantly been shut down by people saying it would be unmanageable as you would eventually get thousands of tables. However nobody really explains what they mean by unmanageable. Firstly as far as my understanding goes, more tables is actually more efficient and produces faster search times as the tables are much smaller. Secondly, yes I understand that making any common table changes would be difficult but all you would have to do is run a script that changes all your tables for each company. Finally I actually see benefits using this method as it would seperate company data making it impossible for one to accidentally access another's data via a potential bug, plus it would potentially give the ability to back up and restore company data individually. If someone could elaborate on why this is perceived as a bad idea It would be appreciated.
Convert fully or partially to a NoSQL database.
** Honestly I have no experience with schemaless databases and don't really know how dynamic user defined fields on a per record basis would work ( although I know it's possible ). If someone could explain the implications of the switch or differences in queries and potential benefits that would be appreciated.
Create a JSON column in each table that requires extra fields. Then add the extra fields into that JSON object.
** The issue I have with this solution is that it is nearly impossible to filter data via the custom columns. You would not be able to report on these columns and until you have received and processed them you don't really know what is in them.
Finally if anyone has a solution not mentioned above or any thoughts or disagreements on any of my notes please tell me as this is all I have been able to find or figure out for myself.

A typical solution is to have a JSON (or XML) column that contains the user-defined fields. This would be an additional column in each table.
This is the most flexible. It allows:
New fields to be created at any time.
No modification to the existing table to do so.
Supports any reasonable type of field, including types not readily available in SQL (i.e. array).
On the downside,
There is no validation of the fields.
Some databases support JSON but do not support indexes on them.
JSON is not "known" to the database for things like foreign key constraints and table definitions.

How do I make a subtable in Access in this situation?

I'm aware that someone will see this and say "I'm sure there's already an answer to "how do I make a subtable", so this question is repetitive." Well, no answers seem to be relevant to THIS situation. Here goes:
I own a car business. I want to make a rolling list of costs associated with each car, to eventually be summed into "totalCosts".
I have table
ASSETS
stock#,
make,
model,
purchase price,
totalCosts
and table
COSTS
description,
cost (in $)
so I want the COSTS table to contain several costs like:
purchase price $2,000
paint $50
new tires $200
I can figure out how to sum the costs, but mainly I need to know how do I make the COSTS table to exist for each car or stock#. So each stock# would have its list of COSTS, which I would sum and insert into total costs.
The only solution I see now is to make a costs table that contains every cost with a stock # on the same line, but I would like to have a separate costs table for each stock #

This is really the most basic related table you can make – quite much applies to near any database system.
You cost table will have
Id - pk id (all tables should have this PK)
Asset ID - standard long number column used to relate back to the assets table.
Item (description of the cost (paint, tires, seat repair, etc. etc. etc.)
Cost (the amount of the given item)
So you build a main form that is based on the assets, and then build a sub form that likely best setup of as a “repeating rows” or so called multiple items form (of which you drop into your main form, and thus the “costs” form becomes a sub-form of the assets.
So in effect, you attach each row of cost to a given single assets record. And access will “set” the Asset ID column for you automatic if you drop in the costs form as a sub form into the assets form.
The form will thus look something like this form:
In above, (a access application and a basic form I built in Access) I actually have “two” columns in which to select the “item” or “cost” type. I have sunglasses, but on the next row I could add tires, then paint, then whoever I want. Over time, you can add any kind of “new” cost item without having to create a new table or change the database structure.
If your design requires a “change” in the table structure for each new “thing” you enter, then you have the wrong design. Can you imagine an accounting package that you have to “stop” all the time and modify the software?
And then the real mess comes when you attempt to build a report. Once again, reports only work on pre-defined tables – you can “switch” tables on the fly for a report.
So any concept you have in regards to spreadsheets etc. must be tossed out of your mind. Computer information systems SIMPLY DO NOT WORK THAT WAY AND CANNOT WORK THAT WAY!
So keep in mind that Relational databases are NOT spreadsheets, and you don’t and CAN NOT adopt designs that require structure changes on the fly. Forms, reports, a query, and even computer code cannot function on tables that change for each record you enter – you MUST adopt a data model that reflects your current needs. That data model if done right can THEN be used for very complex accounting systems, ERP systems, or even a web store front that has 1000’s or even millions of different products. Such data models are designed first, and then the forms and user interface, the reports etc. are THEN created.
The same even goes for job costing. You might have liquids, labor costs, wall paper used in feet, paint used in gallons, and tiles used in meters. So just like an invoice system, job costing systems can handle “different” kinds of costs, and no table changes are required.
In above, we have an invoice like setup, and we can add as many things we want to a tour booking. (Tickets, jackets, books, skates, tires, paint, seats, and windows - whatever we want – we simply add a new row for whatever we need.
Think of any kind of “invoice” software you used – you can add as many items you want to that invoice.
A relational database does NOT support a whole new table for each new “cost” that you want to build – databases simple do not work this way. So you only (so far) really need a master table of the asset, and then the child table of “costs”.
You can’t create a whole new “cost” table for each asset as the built in tools in near any and all databases don’t support nor work with creating a new table for each new set of data you create.
And for ease of data entry, you could build a table of cost Items, so the user does not have to type in paint, tires etc., but choose that item from a drop down combo box.
So EVERY single example on the internet, every book, and every article that explains how to relate a master table to a child table is in fact 100% relevant and is how you HAVE to approach this problem.
Relational databases do not support creating of a whole new table for “one” new set of data, because such an approach cannot be used in reports nor does the query language support such a design.

SQL full text index on linked and parent child tables

I would like broad guidelines before hitting the details, and thus as brief as I can on two issues (this might be far too little info):
Supplier has more than one address. Address is made up of fields. Address line 1 and 2 is free text. The rest are keys to master data tables that has FK and name. Id to country. Id to province. ID to municipality. ID to city. ID to suburb. I would like to employ FTS on address line 1 and 2 and also all the master data table name columns so that user can find suppliers whose address match what they capture. This is thus across various master data tables. Note that a province or suburb etc is not only single word, e.g. meadow park.
Supplier provides many goods and services. These goods and services are a 4 level hierarchy (UNSPSC) of parent and child items. The goods or service is at the lowest level of the 4 level hierarchy, but hits on the higher levels should be returned as well. Supplier linked to lowest items of hierarchy. Would like to employ FTS to find supplier who provides goods and services across the 4 level hierarchy.
The idea is this to find matches, return suppliers, show rank, show where it hit. If I'm unable to show the hit, the result makes less sense, e.g. search for the word "car" will hit on care, cars, cardiovascular, cards etc. User can type in more than one word, e.g. "car service".
Should the FTS indexes be on only the text fields on the master data tables, and my select thus inner join on FTS fields? Should I create views and indexes on those? How do I show the hit words?

Should the FTS indexes be on only the text fields on the master data tables...?
When you have fields across multiple tables that need to be searched in a single query and then ranked, the best practice is to combine those fields into a single field through an ETL process. I recently posted an answer where I explained the benefits of this approach:
Why combine them into 1 table? This approach results in better ranking than if you were to apply full text indexes to each existing
table. The former solution produces a single rank whereas the latter
will produce a different rank for each table and there is no accurate
way to resolve multiple ranks (which are based on completely different
scales) into 1 rank. ...
How can you combine them into 1 table? You will need some sort of ETL process which either runs on a schedule (which may be easier to
implement but will result in lag time where your full text index is
out of sync with the master tables) or gets run on demand whenever
your master tables are modified (either via triggers or by hooking
into an event in your data layer).
How do I show the hit words?
Unfortunately SQL Server Full Text does not have a feature that extracts or highlights the words/phrases that were matched during the search. The answers to this question have tips on how to roll your own solution. There's also a 3rd party product called ThinkHighlight which is a CLR assembly that helps with highlighting (I've never used it so I can't vouch for it).
...search for the word "car" will hit on care, cars, cardiovascular, cards etc...
You didn't explicitly ask about this but you should be aware that by default "car" will not match "care", etc. What you're looking to do is a wildcard search. Your full text query will need to use an asterisk and should look something like this: SELECT * FROM CONTAINSTABLE(MyTable, *, '"car*"') Be aware that wildcards are only available when using CONTAINS/CONTAINSTABLE (boolean searches), not FREETEXT/FREETEXTTABLE (natural language searches). Based on how you describe your use case, it sounds like you will need to modify your user's search string to add the wildcards. You'll need to do this anyway if you use CONTAINS/CONTAINSTABLE in order to add the boolean operators and quotes (ex: User types car service. You change it to "car*" AND "service*".)

One mysql table with many fields or many (hundreds of) tables with fewer fields?

I am designing a system for a client, where he is able to create data forms for various products he sales him self.
The number of fields he will be using will not be more than 600-700 (worst case scenario). As it looks like he will probably be in the range of 400 - 500 (max).
I had 2 methods in mind for creating the database (using meta data):
a) Create a table for each product, which will hold only fields necessary for this product, which will result to hundreds of tables but with only the neccessary fields for each product
or
b) use one single table with all availabe form fields (any range from current 300 to max 700), resulting in one table that will have MANY fields, of which only about 10% will be used for each product entry (a product should usualy not use more than 50-80 fields)
Which solution is best? keeping in mind that table maintenance (creation, updates and changes) to the table(s) will be done using meta data, so I will not need to do changes to the table(s) manually.
Thank you!
/**** UPDATE *****/
Just an update, even after this long time (and allot of additional experience gathered) I needed to mention that not normalizing your database is a terrible idea. What is more, a not normalized database almost always (just always from my experience) indicates a flawed application design as well.

i would have 3 tables:
product
id
name
whatever else you need
field
id
field name
anything else you might need
product_field
id
product_id
field_id
field value

Your key deciding factor is whether normalization is required. Even though you are only adding data using an application, you'll still need to cater for anomalies, e.g. what happens if someone's phone number changes, and they insert multiple rows over the lifetime of the application? Which row contains the correct phone number?
As an example, you may find that you'll have repeating groups in your data, like one person with several phone numbers; rather than have three columns called "Phone1", "Phone2", "Phone3", you'd break that data into its own table.
There are other issues in normalisation, such as transitive or non-key dependencies. These concepts will hopefully lead you to a database table design without modification anomalies, as you should hope for!

Pulegiums solution is a good way to go.
You do not want to go with the one-table-for-each-product solution, because the structure of your database should not have to change when you insert or delete a product. Only the rows of one or many tables should be inserted or deleted, not the tables themselves.

While it's possible that it may be necessary, having that many fields for something as simple as a product list sounds to me like you probably have a flawed design.
You need to analyze your potential table structures to ensure that each field contains no more than one piece of information (e.g., "2 hammers, 500 nails" in a single field is bad) and that each piece of information has no more than one field where it belongs (e.g., having phone1, phone2, phone3 fields is bad). Either of these situations indicates that you should move that information out into a separate, related table with a foreign key connecting it back to the original table. As pulegium has demonstrated, this technique can quickly break things down to three tables with only about a dozen fields total.

We Keep Coding

sql objective-c vba vb.net react-native apache vue.js tensorflow api pandas