DB Design: Same Column used for 2 different foreign keys - sql

I'm developing a method of joining 2 sources of data (e.g. queries).
I have a table named QueryField with the following structure:
QueryID
FieldID
FieldName
....
If I have the following records in QueryField (fields from 2 queries):
QueryID  FieldID  FieldName
-------  -------  -------------
1        1        CustomerID
1        2        CustAddress
2        3        CustNo
2        4        CustomerPhone
I want to have a new table, QueryFieldJoin, which defines which fields in the 2 queries to join on. My idea was to have the following structure:
LeftJoinFieldID (FK to FieldID of QueryField)
RightJoinFieldID (also FK to FieldID of QueryField)
JoinType (intersect, outer join)
The primary key is the combination of LeftJoinFieldID and RightJoinFieldID.
LeftJoinFieldID  RightJoinFieldID  JoinType
---------------  ----------------  ---------
1                3                 Intersect
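In SQL terms, the idea is roughly this (just a sketch - the data types and constraint names are placeholders, and it assumes FieldID is unique in QueryField):

CREATE TABLE QueryFieldJoin (
    LeftJoinFieldID  int NOT NULL,
    RightJoinFieldID int NOT NULL,
    JoinType         varchar(20) NOT NULL,
    CONSTRAINT PK_QueryFieldJoin PRIMARY KEY (LeftJoinFieldID, RightJoinFieldID),
    CONSTRAINT FK_QueryFieldJoin_Left  FOREIGN KEY (LeftJoinFieldID)  REFERENCES QueryField (FieldID),
    CONSTRAINT FK_QueryFieldJoin_Right FOREIGN KEY (RightJoinFieldID) REFERENCES QueryField (FieldID)
)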
This will work; however, I feel it isn't the best DB design to have two different columns in another table both acting as foreign keys to the same field. Can anybody suggest a better approach?

The DB design also depends on your needs:
1) Which queries do you need to answer?
2) How fast do you need to access that data?
From an expressiveness point of view your design can be correct, but it may not be the best solution depending on which queries you need to run.
For instance, you might consider having three different tables: one for the fields, one for the queries, and one for the operations.
Or even one big table with everything in it if you do not want to perform any joins.

Related

Modifying column in Access

I have 2 tables in MS Access, TableA and TableB. TableA has only 1 field, myFieldId, and TableB has only 1 field, myFieldName (in reality I have more fields, but these are the ones that matter for the sake of my problem).
Both tables have records that mean the same thing, but written in a different, though similar, way.
For instance, TableA has:
|TableA.myFieldId |
|-----------------|
|MM0001P |
|HR0003P |
|MH0567P |
So as you can see, all of the records are formatted this way (with a P at the end):
([A-Z][A-Z][0-9][0-9][0-9][0-9]P)
Then, TableB has:
|TableB.myFieldName |
|--------------------------------------------|
|MH-0567 Materials Handling important Role |
|MM-0001 Materials Management Minor Role |
|HR-0003 Human Resources Super Important Role|
So this one has the format (without 'P' at the end):
([A-Z][A-Z]-[0-9][0-9][0-9][0-9] ([A-Z]|[a-z]*))
First, I would like to make join queries with TableA and TableB on these fields, but as you can see, the results will be NULL every time since both fields contain completely different values.
So I would like to replace every name in TableA.myFieldId with its corresponding name in TableB.myFieldName.
The problem is that both tables have around 1 million records, the values are repeated multiple times in both tables, and I don't know how to do this (MS Access doesn't even let me use regular expressions).
I would make a table (or query, if it changes often enough) of all unique entries in the 2nd table and the corresponding key for the 1st table. Then use that table or query to help join the two tables.
Something like
Select myFieldName as FName,
       left(myFieldName,2) & mid(myFieldName,4,4) & "P" as FID
from TableB
group by myFieldName, left(myFieldName,2) & mid(myFieldName,4,4) & "P"
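Saved as a query (say qryMap - the name is just an example), it can then be used to bridge the two tables, along these lines:

Select a.myFieldId, m.FName
from TableA as a
inner join qryMap as m
   on a.myFieldId = m.FID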
Important note - are all IDs found in both tables, or do you have records in either table that are not in the other? If they don't always match, you may need additional logic or steps to build a master table from both TableA and TableB.

Junction table joining multiple tables

I have a single table with images which I need to link to 6 other tables. Let's say those tables are Users, Tables, Foods, Restaurants, Categories, and Ships.
Should I create 6 different junction tables, so each table has its own junction table - Images_Users, Images_Tables, Images_Restaurants, etc.?
Or is it better to create one table with a field to distinguish what it links to -
Images_Entity with the fields Id, Image_Id, Entity_Id, Entity_Type (I use this to distinguish whether it's the users table, foods, or whatever). I don't like this solution since I will lack FK constraints in this case, but I'm leaning towards it since the project will already have a large number of tables.
Perhaps there is a third approach? Create 6 image tables? Which solution is the best performance-wise?
EDIT:
The database will be used to display data; insert and update performance is not an issue, only select statements. I just figured out that no image can link to two entries (this makes the junction tables redundant).
Let me rephrase the question entirely: what is the best way to connect the Images table with exactly one of the 6 other tables using a one-to-many association?
So the Images table should contain a FK and be able to link to only one of the 6 tables, never two at the same time.
One possible approach is to add a UserId, RestaurantId, TableId, FoodId etc. to the Images table.
That way, you can add a proper FK to each of those columns.
Using a constraint or trigger (depending on the DBMS), you can enforce that exactly one of those fields is not null. The actual validation of that one filled-in id is handled by the FK constraints.
This way it is fairly easy to enforce all the rules you want to have.
With separate junction tables this is harder to manage. When you insert a junction row for a Table_Image, you must validate that there is no such record for any of the other entities in the other junction tables.
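A rough sketch of that shape (the column names, data types, and extra image columns here are assumptions based on the question, not a definitive design):

CREATE TABLE Images (
    Id           int NOT NULL PRIMARY KEY,
    -- ...whatever image columns you already have...
    UserId       int NULL REFERENCES Users (Id),
    TableId      int NULL REFERENCES Tables (Id),
    FoodId       int NULL REFERENCES Foods (Id),
    RestaurantId int NULL REFERENCES Restaurants (Id),
    CategoryId   int NULL REFERENCES Categories (Id),
    ShipId       int NULL REFERENCES Ships (Id)
    -- plus a CHECK or trigger enforcing that exactly one of the six ids is NOT NULL
)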
You could use exclusive FKs or inheritance as described in this post.
If you opt for the former, the CHECK employed for exclusive FKs would need to use slightly different syntax (from what was used in the link) to be practical:
CHECK (
    CASE WHEN UserID       IS NULL THEN 0 ELSE 1 END
  + CASE WHEN TableID      IS NULL THEN 0 ELSE 1 END
  + CASE WHEN FoodID       IS NULL THEN 0 ELSE 1 END
  + CASE WHEN RestaurantID IS NULL THEN 0 ELSE 1 END
  + CASE WHEN CategoryID   IS NULL THEN 0 ELSE 1 END
  + CASE WHEN ShipID       IS NULL THEN 0 ELSE 1 END
  = 1
)
The CHECK above ensures only one of the FKs is non-NULL at any given time.
BTW, your suspicions about "dynamic" FK (with the "type" field) were correct.

How to merge the data of two identical databases into one?

Two customers are going to merge. They are both using my application, each with their own database. In a few weeks they will merge (they become one organisation), so they want to have all the data in one database.
The two database structures are identical. The problem is with the data. For example, I have the tables Locations and Persons (these are just two tables out of 50):
Database 1:
Locations:
Id   Name        Adress   etc....
1    Location 1
2    Location 2
Persons:
Id   LocationId  Name     etc...
1    1           Alex
2    1           Peter
3    2           Lisa
Database 2:
Locations:
Id   Name        Adress   etc....
1    Location A
2    Location B
Persons:
Id   LocationId  Name     etc...
1    1           Mark
2    2           Ashley
3    1           Ben
We see that a person is related to a location (column LocationId). Note that I have more tables referring to the Locations table and the Persons table.
Each database contains its own locations and persons, but the Ids can be the same. So when I want to import everything into DB2, the locations of DB1 should be inserted into DB2 with the Ids 3 and 4, the persons from DB1 should get the new Ids 4, 5 and 6, and the LocationId values in those Persons rows have to be changed to the new location Ids 3 and 4 accordingly.
My solution for this problem is to write a query which handles everything, but I don't know where to begin.
What is the best way (in a query) to renumber the Id fields and cascade the change to the child tables? The databases do not contain referential integrity or foreign keys (foreign keys are NOT defined in the database), and creating FKs with cascading is not an option.
I'm using SQL Server 2005.
You say that both customers are using your application, so I assume that it's some kind of "shrink-wrap" software that is used by more customers than just these two, correct?
If yes, adding special columns to the tables or anything like that will probably cause pain in the future, because you would either have to maintain a special version for these two customers that can deal with the additional columns, or you would have to introduce these columns into your main codebase, which means that all your other customers would get them as well.
I can think of an easier way to do this without changing any of your tables or adding any columns.
In order for this to work, you need to find out the largest ID that exists in both databases together (no matter in which table or in which database it is).
This may require some copy & paste to get a lot of queries that look like this:
select max(id) as maxlocationid from locations
select max(id) as maxpersonid from persons
-- and so on... (one query for each table)
When you have found the largest ID after running these queries in both databases, take a number that's larger than that ID and add it to all IDs in all tables of the second database.
It's very important that this number is larger than the largest ID that already exists in either database!
It's a bit difficult to explain, so here's an example:
Let's say that the largest ID in any table in both databases is 8000.
Then you run some SQL that adds 10000 to every ID in every table in the second database:
update Locations set Id = Id + 10000
update Persons set Id = Id + 10000, LocationId = LocationId + 10000
-- and so on, for each table
The queries are relatively simple, but this is the most work because you have to build a query like this manually for each table in the database, with the correct names of all the ID columns.
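If there are a lot of tables, you can also let the database generate these statements for you instead of typing them by hand. A rough sketch for SQL Server, assuming every key column name ends with 'Id' (review the generated statements before running anything):

select 'update ' + TABLE_NAME
     + ' set ' + COLUMN_NAME + ' = ' + COLUMN_NAME + ' + 10000'
from INFORMATION_SCHEMA.COLUMNS
where COLUMN_NAME like '%Id'
order by TABLE_NAME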
After running the query on the second database, the example data from your question will look like this:
Database 1: (exactly like before)
Locations:
Id   Name        Adress   etc....
1    Location 1
2    Location 2
Persons:
Id   LocationId  Name     etc...
1    1           Alex
2    1           Peter
3    2           Lisa
Database 2:
Locations:
Id      Name        Adress   etc....
10001   Location A
10002   Location B
Persons:
Id      LocationId  Name     etc...
10001   10001       Mark
10002   10002       Ashley
10003   10001       Ben
And that's it! Now you can import the data from one database into the other, without getting any primary key violations at all.
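The import itself can then be a plain INSERT ... SELECT per table, for example copying the renumbered second database into the first (the database and schema names here are placeholders, and the IDENTITY_INSERT lines are only needed if the Id columns are identity columns):

set identity_insert Database1.dbo.Locations on
insert into Database1.dbo.Locations (Id, Name, Adress)
select Id, Name, Adress from Database2.dbo.Locations
set identity_insert Database1.dbo.Locations off
-- and so on, for each table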
If this were my problem, I would probably add some columns to the tables in the database I was going to keep. These would be used to store the pk values from the other db. Then I would insert records from the other tables. For the ones with foreign keys, I would use a known value. Then I would update as required and drop the columns I added.

What is the best way to copy data from related tables to other related tables?

What is the best way to copy data from related tables to other related tables with the same schema? The tables are connected with a one-to-many relationship.
Consider the following schema:
firm
id | name | city.id (FK)
employee
id | lastname | firm.id (FK)
firm2
id | name | city_id (FK)
employee2
id | lastname | firm2.id (FK)
What I want to do is copy the rows from firm with a specific city.id to firm2, and their employees associated with firm to the table employee2.
I use PostgreSQL 9.0, so I have to call SELECT nextval('seq_name') to get a new id for a table.
Right now I do this by simply iterating over all the rows in my Java backend server, but on a large amount of data (50,000 employees and 2,000 firms) it takes too much time (1-3 minutes).
I'm wondering whether there is a smarter way to do it, for example selecting the data into a temp table? Or perhaps using a stored procedure and iterating over the rows with a cursor to avoid the buffering on my backend server?
This is one problem caused by simply using a sequence or identity value as your sole primary key in a table.
If there is a real-life unique index/primary key, then you can join on that. The other option would be to create a mapping table as you fill in the new tables with sequence values; then you can populate the child tables' FKs by joining to the mapping table. It doesn't completely remove the need for looping, but at least some of the inserts are moved into a set-based approach.
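For example, something along these lines might work in PostgreSQL (only a sketch; the sequence names firm2_id_seq and employee2_id_seq, the column names city_id and firm_id, and the city value 42 are guesses based on the schema above):

-- map each old firm id to a freshly allocated firm2 id
create temp table firm_map as
select f.id as old_id, nextval('firm2_id_seq') as new_id
from firm f
where f.city_id = 42;

insert into firm2 (id, name, city_id)
select m.new_id, f.name, f.city_id
from firm f
join firm_map m on m.old_id = f.id;

insert into employee2 (id, lastname, firm2_id)
select nextval('employee2_id_seq'), e.lastname, m.new_id
from employee e
join firm_map m on m.old_id = e.firm_id;

This keeps everything inside the database and avoids the row-by-row round trips to the backend.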

Basic question: how to properly redesign this schema

I am hopping onto a project that sits on top of a SQL Server 2008 DB with what seems to me like an inefficient schema. However, I'm not an expert at anything SQL, so I am seeking guidance.
In general, the schema has tables like this:
ID | A | B
ID is a unique identifier
A contains text, such as animal names. There's very little variety; maybe 3-4 different values in thousands of rows. This could vary with time, but still a small set.
B is one of two options, but stored as text. The set is finite.
My questions are as follows:
Should I create another table for names contained in A, with an ID and a value, and set the ID as the primary key? Or should I just put an index on that column in my table? Right now, to get a list of A's, it does "select distinct(a) from table" which seems inefficient to me.
The table has a multitude of columns for properties of A. It could be like: Color, Age, Weight, etc. I would think that this is better suited in a separate table with: ID, AnimalID, Property, Value. Each property is unique to the animal, so I'm not sure how this schema could enforce this (the current schema implies this as it's a column, so you can only have one value for each property).
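(I suppose something like a unique constraint over (AnimalID, Property) could enforce that - a sketch with made-up table and constraint names below - but I'm not sure this design is the right call in the first place.)

ALTER TABLE AnimalProperties
    ADD CONSTRAINT UQ_AnimalProperties_Animal_Property UNIQUE (AnimalID, Property)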
Right now the DB is easily readable by a human, but its size is growing fast and I feel like the design is inefficient. There currently is no index at all anywhere. As I said, I'm not a pro, but I will read more on the subject. The goal is to have a fast system. Thanks for your advice!
This sounds like a database that might represent a veterinary clinic.
If the table you describe represents the various patients (animals) that come to the clinic, then having properties specific to them is probably best kept on the primary table. But since, as you say, column "A" contains a species name, it might be worthwhile to link that to a secondary table to save on the redundancy of storing those names:
For example:
Patients
--------
ID   Name   SpeciesID   Color         DOB          Weight
1    Spot   1           Black/White   2008-01-01   20
Species
-------
ID   Species
1    Cocker Spaniel
If your main table should be instead grouped by customer or owner, then you may want to add an Animals table and link it:
Customers
---------
ID   Name
1    John Q. Sample
Animals
-------
ID   CustomerID   SpeciesID   Name   Color         DOB          Weight
1    1            1           Spot   Black/White   2008-01-01   20
...
As for your original column B, consider converting it to a boolean (BIT) if you only need to store two states. Barring that, consider CHAR to store a fixed number of characters.
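In DDL form, the first (Patients/Species) layout might look roughly like this (just a sketch; the data types are assumptions, not requirements):

CREATE TABLE Species (
    ID      int IDENTITY PRIMARY KEY,
    Species varchar(100) NOT NULL
)

CREATE TABLE Patients (
    ID        int IDENTITY PRIMARY KEY,
    Name      varchar(100) NOT NULL,
    SpeciesID int NOT NULL REFERENCES Species (ID),
    Color     varchar(50) NULL,
    DOB       date NULL,
    Weight    decimal(6,2) NULL
)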
Like most things, it depends.
Having the animal names directly in the table makes your reporting queries more efficient by removing the need for many joins.
Going with something like third normal form (having an ID/Name table for the animals) makes your database smaller, but requires more joins for reporting.
Either way, make sure to add some indexes.
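For example, to speed up the "select distinct" and any filtering or joining on column A (the table and index names here are placeholders for whatever you end up with):

CREATE INDEX IX_YourTable_A ON YourTable (A)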