Best practice for many-to-many data mapping - SQL

I am looking for the best practice for many-to-many mapping in a database.
For example, I have two tables to be mapped, and I create a third table to store this mapping.
In the UI I have several A records, each of which can be mapped (or not) to multiple B records. I see two solutions for now:
1 - On every update, for every record from A, I delete all of its mapped data and insert the new mapping.
Advantage: I store only mapped data.
Disadvantage: I need to use delete and insert statements every time.
2 - I add a new bit column named isMapped to the AB table and store a row for every record from A against every record from B. On the save-mapping action I only use an update statement.
Advantage: No need to delete and insert every time.
Disadvantage: I need to store unnecessary records.
Can you offer me the best solution?
Thanks

Between the two options you have listed I would go with option no. 1. isMapped is not meaningful: if they are not mapped, the records should not exist in the first place.
You still have one more option, though (sketched below against a hypothetical staging table NewMap holding the desired A_id/B_id pairs):
    DELETE FROM AB
        WHERE NOT EXISTS (SELECT 1 FROM NewMap m WHERE m.A_id = AB.A_id AND m.B_id = AB.B_id);
    INSERT INTO AB (A_id, B_id)
        SELECT m.A_id, m.B_id FROM NewMap m WHERE NOT EXISTS (SELECT 1 FROM AB WHERE AB.A_id = m.A_id AND AB.B_id = m.B_id);
If there are a lot of mappings I would do this differential delete and insert from the new mapping; otherwise I would just delete everything and re-insert, as you are suggesting.

I'd say any time you see the second bullet point in your #2 scenario, "Need to store unnecessary records", that's your red flag not to use that scenario.
Your data is modeled correctly in scenario 1, i.e. mappings exist in the mapping table when there are mappings between records in A and B, and no rows exist when there is no mapping between those records.
Also, in many database engines the underlying mechanics of an update statement are a delete followed by an insert, so you are not really saving the database any work by issuing one over the other.
Lastly, speaking of saving the database work, don't try to do it at this stage. This is what databases are designed for. :)
Implementing your data model correctly, as you are in Scenario 1, is the best optimization you can make.
Once you have the basic normalized structure in place and have some test data, you can start testing performance and refactoring if necessary: adding indexes, changing data structures, etc.
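For reference, here is a minimal sketch of what the Scenario 1 junction table could look like (the table and column names A, B, AB, id, A_id, B_id are just placeholders): the composite primary key means a pair can only be mapped once, and "not mapped" is simply the absence of a row.

    CREATE TABLE AB (
        A_id INT NOT NULL REFERENCES A (id),
        B_id INT NOT NULL REFERENCES B (id),
        PRIMARY KEY (A_id, B_id)  -- a pair is either mapped (row exists) or not mapped (no row)
    );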

Related

EF: Inserting already present record in many to many relationship

From what I've searched, there are 2 ways to insert an already existing record into an ICollection list:
group.Users.Add(db.Users.FirstOrDefault(x => x.Id == 1));
var to_add = new User { Id = 1 }; db.Users.Attach(to_add); group.Users.Add(to_add);
The problem with both of the above approaches is that they make a db call every time we want to add a record, while we already know the user's Id and the group's Id, which is all that's needed to create the relationship.
Imagine a long list to be added; both of the above methods would make multiple calls to the db.
So you have Groups and Users. Every Group has zero or more Users; every User has zero or more Groups. A traditional many-to-many relationship.
Normally one would add a User to a Group, or a Group to a User. However, you don't have a Group or a User, only a GroupId and a UserId, and because of the large number of insertions you don't want to fetch the Users and Groups between which you want to create relations.
The problem is, if you could add the GroupId-UserId combination directly to your junction table, how would you know that you weren't adding a Group-User relation that already exists? If you didn't care, you'd end up with the relation twice. This would lead to problems: would you want them shown twice if you asked for the Users of a Group? Which one should be removed when the relation ends, or should they all be removed?
If you really want to allow the possibility of a double relation, then you'd need to implement a custom junction table as described here. The extra field would be the number of relations.
This would not help you with your large batch, because you would still need to fetch the field from the custom junction table to increment the NrOfRelations value.
On the other hand, if you don't want double relations, you'd have to check whether the value already exists, but you didn't want to fetch data before inserting.
Usually the number of additions to a database is far less than the number of queries. If you have a large batch of data to be inserted, then it is usually only during the initialization phase of the database. I wouldn't bother optimizing initialization too much.
Consider remembering already-fetched Groups and Users in a dictionary, preventing them from being fetched twice. However, if your list is really huge, this is not a practical solution.
If you really need this functionality for a prolonged period of time, consider creating a Stored Procedure that checks whether the GroupId / UserId combination already exists in the junction table, and if not, adds it.
See here for SQL code on how to do an add-or-update:
Entity Framework call stored procedure
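As a rough illustration of that stored-procedure suggestion, here is a minimal T-SQL sketch; the junction table GroupUser and its columns are assumed names for this example, not anything prescribed by EF.

    CREATE PROCEDURE AddUserToGroup
        @GroupId INT,
        @UserId  INT
    AS
    BEGIN
        -- only insert the pair if the junction row does not already exist
        IF NOT EXISTS (SELECT 1 FROM GroupUser
                       WHERE GroupId = @GroupId AND UserId = @UserId)
        BEGIN
            INSERT INTO GroupUser (GroupId, UserId)
            VALUES (@GroupId, @UserId);
        END
    END;

The application can then call this procedure once per GroupId/UserId pair (or wrap it in a batch) without materializing User or Group entities first.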

How to deal with lots of parameters in database?

Let's say you need to store 4 different document types somewhere (i.e. draft, sent for approval, approved, rejected).
You can create a new table for that and store them there. But it seems a bit excessive to create a table just for 4 entries.
Another approach is to create one big table for all parameters in general and store them there. So if you are ever in need of storing 4, 15 or 2000 new parameters, you can simply insert them into the Parameters table and keep them all together instead of creating a new table.
But what if the number of parameters gets bigger? Let's say 1,000,000, 5,000,000 or even more? What would be the best approach then?
I'm not talking about a particular database, but if it helps, it's either Oracle 12c or an OpenEdge (Progress) database.
But it seems a bit excessive to create a table just for 4 entries.
Work with well-designed databases for a while, and it won't seem excessive.
The main problem with one big "parameters" table is that foreign keys can reference any unique row, not just the rows you want. For example, if you had one big table like this . . .
draft
sent for approval
approved
rejected
USA
Great Britain
California
...
. . . then sooner or later you'll have a document whose type is "California".
But this stems from a fundamental misunderstanding of the relational model. In the relational model, a domain consists of all possible values for a particular attribute. There are only four possible values for the "document type" attribute. For reliable data, the dbms needs to know that. And usually the best way to tell the dbms there are only four possible values is to set a foreign key reference to a column that contains only those four values.
It's an anti-pattern. Search online for "one true lookup table" or "OTLT".
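By contrast, a minimal sketch of the dedicated lookup-table approach described above (names and types are placeholders; Oracle would typically use VARCHAR2):

    CREATE TABLE document_type (
        type_name VARCHAR(20) PRIMARY KEY
    );

    INSERT INTO document_type (type_name) VALUES ('draft');
    INSERT INTO document_type (type_name) VALUES ('sent for approval');
    INSERT INTO document_type (type_name) VALUES ('approved');
    INSERT INTO document_type (type_name) VALUES ('rejected');

    CREATE TABLE document (
        id        INT PRIMARY KEY,
        type_name VARCHAR(20) NOT NULL REFERENCES document_type (type_name)
        -- ... other document columns ...
    );

The foreign key guarantees a document's type can only ever be one of those four values, which a shared "parameters" table cannot do.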

Db design for data update approval

I'm working on a project where data entered or updated by some users needs to go through a pending status before being added to the 'live' data.
Whilst preparing the data the user can save incomplete records. Whilst the data is in the pending status we don't want it to affect rules imposed on users editing the live data, e.g. a user working on the live data should not run up against a unique constraint when entering the same data that already exists in the pending status.
I envisage that sets of data updates will be grouped into a 'data submission', and the data will be re-validated and corrected/rejected/approved when someone quality-controls the submission.
I've thought about two scenarios with regards to storing the data:
1) Keeping the pending-status data in the same table as the live data, but adding a flag to indicate its status. I can see issues here with having to remove constraints or make required fields nullable to support the 'incomplete' data. Then there is the issue of how to handle updating existing data: you would have to add a new row for an update and link it back to the existing 'live' row. This seems a bit messy to me.
2) Add new tables that mirror the live tables and store the data there until it has been approved. This would allow me to keep full control over the existing live tables, while the 'pending' tables can be abused with whatever the user feels like putting in there. The downside is that I will end up with a lot of extra tables/SPs in the db. Another issue I was thinking about is how a user might link between two records, where the record linked to might be a record in the live table or one in the pending table; but I suppose in this situation you could always take a copy of the linked record and treat it as an update?
Neither solution seems perfect, but the second one seems like the better option to me - is there a third solution?
Your option 2 very much sounds like the best idea. If you want to use referential integrity and all the nice things you get with a DBMS, you can't have the pending data in the same table. But there is no need for the data to be unstructured: pending data is still structured, and presumably you want the db to play its part in enforcing rules even on this data. Even if you didn't, pending data fits well into a standard table structure.
A separate set of tables sounds the right answer. You can bring the primary key of the row being changed into the pending table so you know what item is being edited, or what item is being linked to.
I don't know your situation exactly so this might not be appropriate, but an idea would be to have a separate table for storing the batch of edits being made, because then you can quality-control a batch, or submit a batch to live. Each pending table could have a batch key so you know which batch a row is part of. You'll have to find a way to control multiple pending edits to the same rows (if you want to), but that doesn't seem too tricky a problem to solve.
I'm not sure if this fits but it might be worth looking into 'Master Data Management' tools such as SQL Server's Master Data Services.
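To make the batch idea above concrete, here is a rough sketch (all table and column names are invented for illustration, using a hypothetical 'customer' entity): a submission table for the batch, plus a pending table that mirrors the live table but has relaxed constraints and a reference back to the live row being edited.

    CREATE TABLE submission (
        submission_id INT PRIMARY KEY,
        status        VARCHAR(20) NOT NULL,  -- e.g. 'draft', 'submitted', 'approved', 'rejected'
        submitted_by  INT NOT NULL
    );

    CREATE TABLE customer (                  -- live table: full constraints apply
        customer_id INT PRIMARY KEY,
        email       VARCHAR(100) NOT NULL UNIQUE
    );

    CREATE TABLE customer_pending (          -- mirror table: nullable columns, no unique constraint
        pending_id    INT PRIMARY KEY,
        submission_id INT NOT NULL REFERENCES submission (submission_id),
        customer_id   INT NULL REFERENCES customer (customer_id),  -- NULL means a brand-new record
        email         VARCHAR(100) NULL
    );

Approving a submission then becomes copying its pending rows into the live tables inside one transaction, where the live constraints are finally enforced.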
'Unit of work' is a good name for 'data submission'.
You could serialize it to a different place, like a (non-relational) document-oriented database, and only save it to the relational DB on approval.
It depends on how many of the live data constraints still need to apply to the unapproved data.
I think the second option is better. To manage this, you can use a view that contains both tables and work with the structure through the view.
Another good approach is to use an XML column in a separate table to store the necessary data (because of the unknown quantity/names of columns). You can create just one table with an XML column and a 'Type' column to determine which table the document relates to.
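For the view suggestion, a minimal sketch (reusing the hypothetical customer/customer_pending tables from the earlier example) that exposes live and pending rows as one structure:

    CREATE VIEW customer_combined AS
        SELECT customer_id, email, 'live' AS record_status FROM customer
        UNION ALL
        SELECT customer_id, email, 'pending' AS record_status FROM customer_pending;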
The first scenario seems good to me.
Add a Status column to the table. There is no need to remove the nullable constraint; just add a function that checks the required fields based on the flag, e.g. if the flag is 1 (incomplete), NULL is allowed, otherwise it is not.
Regarding the second doubt: do you want to append the data or update the whole data?

Designing an append-only data access layer with LINQ to SQL

I have an application in mind which dictates that database tables be append-only; that is, I can only insert data into the database but never update or delete it. I would like to use LINQ to SQL to build this.
Since tables are append-only but I still need to be able to "delete" data, my thought is that each table Foo needs to have a corresponding FooDeletion table. The FooDeletion table contains a foreign key which references a Foo that has been deleted. For example, the following tables describe the state "Foos 1, 2, and 3 exist, but Foo 2 and Foo 3 have been deleted".
Foo          FooDeletion
id           id    fooid
----         ----  -----
1            1     2
2            2     3
3
Although I could build an abstraction on top of the data access layer which (a) prevents direct access to LINQ to SQL entities and (b) manages deletions in this manner, one of my goals is to keep my data access layer as thin as possible, so I'd prefer to make the DataContext or entity classes do the work behind the scenes. So, I'd like to let callers use Table<Foo>.DeleteOnSubmit() like normal, and the DAL knows to add a row to FooDeletion instead of deleting a row from Foo.
I've read through "Implementing Business Logic" and "Customizing the Insert, Update, and Delete Behavior of Entity Classes", but I can't find a concrete way to implement what I want. I thought I could use the partial method DataContext.DeleteFoo() to instead call ExecuteDynamicInsert(FooDeletion), but according to this article, "If an inapplicable method is called (for example, ExecuteDynamicDelete for an object to be updated), the results are undefined".
Is this a fool's errand? Am I making this far harder on myself than I need to?
You have more than one option - you can either:
a) Override SubmitChanges, take the change set (GetChangeSet()) and translate updates and deletes into inserts.
b) Use instead-of triggers db-side to change the updates/delete behavior.
c) Add a new Delete extension method to Table that implements the behavior you want.
d) ...or combine a+b+c as needed...
If you want a big-boy enterprise-quality solution, you'd put it in the database: either b) from above or CRUD procedures <- my preference... triggers are evil.
If this is a small shop, without a lot of other developers or teams, or the data is of minimal value such that a second or third app trying to access it isn't likely, then stick with whatever floats your boat.
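As a sketch of option b), using the Foo/FooDeletion tables from the question (SQL Server syntax; this assumes FooDeletion.id is generated automatically): an INSTEAD OF DELETE trigger records the deletion rather than removing the row, so the row stays in Foo while its "deletion" is logged.

    CREATE TRIGGER trg_Foo_InsteadOfDelete ON Foo
    INSTEAD OF DELETE
    AS
    BEGIN
        -- record the "deletion" instead of actually removing the row from Foo
        INSERT INTO FooDeletion (fooid)
        SELECT id FROM deleted;
    END;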

One mysql table with many fields or many (hundreds of) tables with fewer fields?

I am designing a system for a client where he is able to create data forms for the various products he sells himself.
The number of fields he will be using will not be more than 600-700 (worst-case scenario); it looks like he will probably be in the range of 400-500 (max).
I had 2 methods in mind for creating the database (using metadata):
a) Create a table for each product, which will hold only the fields necessary for that product. This will result in hundreds of tables, but each with only the necessary fields for its product,
or
b) use one single table with all available form fields (anywhere from the current 300 to a max of 700), resulting in one table that will have MANY fields, of which only about 10% will be used for each product entry (a product should usually not use more than 50-80 fields).
Which solution is best, keeping in mind that table maintenance (creation, updates and changes) will be done using metadata, so I will not need to change the table(s) manually?
Thank you!
/**** UPDATE *****/
Just an update: even after this long time (and a lot of additional experience gathered), I need to mention that not normalizing your database is a terrible idea. What is more, a non-normalized database almost always (in my experience, just always) indicates a flawed application design as well.
I would have 3 tables:

product
    id
    name
    whatever else you need

field
    id
    field name
    anything else you might need

product_field
    id
    product_id
    field_id
    field value
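A minimal DDL sketch of that layout (the column types and the unique pair are just one possible choice, not a requirement of the design):

    CREATE TABLE product (
        id   INT PRIMARY KEY,
        name VARCHAR(100) NOT NULL
    );

    CREATE TABLE field (
        id         INT PRIMARY KEY,
        field_name VARCHAR(100) NOT NULL
    );

    CREATE TABLE product_field (
        id          INT PRIMARY KEY,
        product_id  INT NOT NULL,
        field_id    INT NOT NULL,
        field_value VARCHAR(255),
        UNIQUE (product_id, field_id),  -- at most one value per field per product
        FOREIGN KEY (product_id) REFERENCES product (id),
        FOREIGN KEY (field_id) REFERENCES field (id)
    );

Adding a new product or a new field is then just an insert into product or field; the table structure itself never changes.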
Your key deciding factor is whether normalization is required. Even though you are only adding data using an application, you'll still need to cater for anomalies, e.g. what happens if someone's phone number changes and they insert multiple rows over the lifetime of the application? Which row contains the correct phone number?
As an example, you may find that you have repeating groups in your data, like one person with several phone numbers; rather than have three columns called "Phone1", "Phone2", "Phone3", you'd break that data out into its own table.
There are other issues in normalisation, such as transitive or non-key dependencies. These concepts will hopefully lead you to a database table design without modification anomalies, as you should hope for!
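For instance, the repeating phone-number group above would be broken out roughly like this (person/person_phone are hypothetical names for illustration):

    CREATE TABLE person (
        id   INT PRIMARY KEY,
        name VARCHAR(100) NOT NULL
    );

    CREATE TABLE person_phone (
        person_id INT NOT NULL,
        phone     VARCHAR(30) NOT NULL,
        PRIMARY KEY (person_id, phone),  -- any number of phones per person, no duplicates
        FOREIGN KEY (person_id) REFERENCES person (id)
    );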
Pulegium's solution is a good way to go.
You do not want to go with the one-table-per-product solution, because the structure of your database should not have to change when you insert or delete a product. Only rows in one or more tables should be inserted or deleted, not the tables themselves.
While it's possible that it may be necessary, having that many fields for something as simple as a product list sounds to me like you probably have a flawed design.
You need to analyze your potential table structures to ensure that each field contains no more than one piece of information (e.g., "2 hammers, 500 nails" in a single field is bad) and that each piece of information has no more than one field where it belongs (e.g., having phone1, phone2, phone3 fields is bad). Either of these situations indicates that you should move that information out into a separate, related table with a foreign key connecting it back to the original table. As pulegium has demonstrated, this technique can quickly break things down to three tables with only about a dozen fields total.