Performing a Multitable Query without Cartesian Product - multi-table

Just learning MS Access 2013 and feeling stuck with how to retrieve the data I want using a multitable query. I have two tables: one stores a list of organizations and the other stores a list of individual names. In a third table, I have a lookup field that I'd like to be populated by all of these organizations and individuals. When I use the Query Design tool to try to make this happen, the only thing I can seem to produce is a cartesian product. Any suggestions?

You definitely don't want the cartesian product.
While I'm not too familiar with MS Access (SQL is more my realm), MS Access is just a front-end to a database, and it actually turns what you design with the Query Design tool into a SQL dialect called Access SQL.
Here's what you want to do: link the Organizations table to the Individuals table, using the Organizations table's primary key and the Individuals table's foreign key as the join criteria. So if your tables look like this...
Individuals table:
Name | OrganizationId
____ | ______________
John | 1
Organizations table:
OrganizationId | OrganizationName
______________ | ______________
1 | StackOverflow
You want to design your query so that Individuals.OrganizationId (foreign key) is joined to Organizations.OrganizationId (primary key).
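As a rough illustration of the difference, here is a minimal sketch that uses Python's built-in sqlite3 module as a stand-in for Access (an assumption; Access SQL uses a very similar INNER JOIN ... ON syntax). The rows 'Jane' and 'Example Org' are made up for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Organizations (
        OrganizationId   INTEGER PRIMARY KEY,
        OrganizationName TEXT NOT NULL
    );
    CREATE TABLE Individuals (
        Name           TEXT NOT NULL,
        OrganizationId INTEGER REFERENCES Organizations (OrganizationId)
    );
    -- 'Example Org' and 'Jane' are hypothetical rows added for illustration.
    INSERT INTO Organizations VALUES (1, 'StackOverflow'), (2, 'Example Org');
    INSERT INTO Individuals VALUES ('John', 1), ('Jane', 2);
""")

# No join condition: every individual is paired with every organization.
cartesian = conn.execute(
    "SELECT i.Name, o.OrganizationName FROM Individuals i, Organizations o"
).fetchall()

# Join on the key columns: each individual is paired only with
# the organization whose primary key matches their foreign key.
joined = conn.execute("""
    SELECT i.Name, o.OrganizationName
    FROM Individuals AS i
    INNER JOIN Organizations AS o
        ON i.OrganizationId = o.OrganizationId
    ORDER BY i.Name
""").fetchall()

print(len(cartesian))  # 2 individuals x 2 organizations
print(joined)
```

In the Query Design tool, the equivalent step is drawing the join line between the two OrganizationId fields.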

Related

PHP Junction Table Relations (Many to Many), grasping concept

So I've tried searching and have yet to find out how to grasp this entirely.
I'm reorganising my database because I was storing user IDs as comma-separated values in a column within each row to control permissions. To me, that seemed like a better and faster (in hardware terms) way, but I'm moving towards the "proper" way now.
I understand that you need 3 tables. This is what I have.
Table 1. members -> ID | user_name
Table 2. teams -> ID | team_name
Table 3. team_members -> ID | team_fk | member_fk
I understand how to store data in another column and use sql data to display it. What I'm confused about is why I have to link(relation) the columns to the ID's of the other tables. I could get the data without using the relation. I'm confused by what it even does.
Furthermore, I would like to have multiple values that determine permissions for each team. Would I do:
Table 3. team_members -> ID | team_fk | member_fk | leader_fk | captain_fk
^setting 0 or 1(true or false) for the leader and captain.
Or would I create a table(like team_leaders, team_captains) for each permission?
Thanks for the help!
Ryan
It seems that "leader", "captain" and "regular member" are roles in your team. So you can create a team_roles table, or just store the roles as strings in your relation table, i.e.
team_members -> ID | team_fk | member_fk | role
The key thing is to keep your database normalised (https://en.wikipedia.org/wiki/Database_normalization). In most cases a normalised database is much easier to work with.
What I'm confused about is why I have to link(relation) the columns to the ID's of the other tables. I could get the data without using the relation.
You don't have to declare columns as foreign keys. It's just a good idea. It serves the following purposes:
It tells readers of the schema how the tables are related to each other. If you name the columns well, this is redundant -- team_fk is pretty obviously a reference to the teams table.
It enables automatic integrity checks by the database. If you try to create a team_members row that contains a team_fk or member_fk that isn't in the corresponding table, it will report an error. Note that in MySQL, this checking is only done by the InnoDB engine, not MyISAM.
In MySQL's InnoDB engine, indexes are automatically created for the foreign key columns if they don't already exist, which helps to optimize queries between the tables.
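The integrity check can be seen in a small sketch. It uses Python's built-in sqlite3 module rather than MySQL (an assumption made so the example runs anywhere); the sample rows are hypothetical, and SQLite only enforces foreign keys once the pragma below is switched on.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite enforces foreign keys only when this pragma is enabled.
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
    CREATE TABLE members (ID INTEGER PRIMARY KEY, user_name TEXT NOT NULL);
    CREATE TABLE teams   (ID INTEGER PRIMARY KEY, team_name TEXT NOT NULL);
    CREATE TABLE team_members (
        ID        INTEGER PRIMARY KEY,
        team_fk   INTEGER NOT NULL REFERENCES teams (ID),
        member_fk INTEGER NOT NULL REFERENCES members (ID)
    );
    -- Hypothetical sample rows.
    INSERT INTO members VALUES (1, 'ryan');
    INSERT INTO teams   VALUES (1, 'red');
""")

# Valid: both referenced rows exist.
conn.execute("INSERT INTO team_members (team_fk, member_fk) VALUES (1, 1)")

# Invalid: there is no team with ID 99, so the database refuses the row.
try:
    conn.execute("INSERT INTO team_members (team_fk, member_fk) VALUES (99, 1)")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True

row_count = conn.execute("SELECT COUNT(*) FROM team_members").fetchone()[0]
print(rejected, row_count)
```

Without the declared relation, the second insert would succeed silently and leave a membership row pointing at a team that does not exist.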
Table 3. team_members -> ID | team_fk | member_fk | leader_fk | captain_fk
If leader and captain are just true/false values, they aren't foreign keys. A foreign key column contains a reference to a key in another table. So I would call these is_leader and is_captain.
But you should only put these values in the team_members table if a team can have multiple captains and leaders. If there's just one of each, they should be in the teams table:
teams -> ID | team_name | leader_fk | captain_fk
where leader_fk and captain_fk are IDs from the members table. This will ensure that you can't inadvertently assign is_captain = 1 to multiple members from the same team.
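A minimal sketch of the one-leader, one-captain layout, again using Python's sqlite3 module as a stand-in for whatever database is actually in use (an assumption); the member and team rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
    CREATE TABLE members (ID INTEGER PRIMARY KEY, user_name TEXT NOT NULL);
    CREATE TABLE teams (
        ID         INTEGER PRIMARY KEY,
        team_name  TEXT NOT NULL,
        leader_fk  INTEGER REFERENCES members (ID),
        captain_fk INTEGER REFERENCES members (ID)
    );
    -- Hypothetical sample rows: ann leads the team, bob is captain.
    INSERT INTO members VALUES (1, 'ann'), (2, 'bob');
    INSERT INTO teams   VALUES (1, 'red', 1, 2);
""")

# Changing the captain overwrites the single captain_fk slot,
# so the team can never end up with two captains.
conn.execute("UPDATE teams SET captain_fk = 1 WHERE ID = 1")

captain = conn.execute("""
    SELECT m.user_name
    FROM teams AS t
    INNER JOIN members AS m ON m.ID = t.captain_fk
    WHERE t.ID = 1
""").fetchall()
print(captain)
```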

What should a relationships table look like - Need confirmation of my technique

Let's say I have 3 models:
User
Page
Comments
I asked a question about whether each model should keep track of its own relationships: SQL relationships and best practices
An example of this would be a "Pages" table that states who its author was... The problem seemed to be that if 2 users were the authors of one page, you'd have to add a new specific table called PageRelationshipsWithUsers that might have a reference to the PageID and the UserID that created it, and a separate row for the co-author.
Understandably this sounds a bit naff. I would end up with a heck load of relation tables and most likely, it could be replaced with just the one multi-purpose relationship table... So I decided to come up with a relationships table like the following:
Relationships Table New
RelationshipID | ItemID        | LinkID     | ItemType    | LinkType | Status
-----------------------------------------------------------------------------
1              | 23(PageID)    | 7(UserID)  | ("Page")    | ("User") | TRUE
2              | 22(CommentID) | 7(UserID)  | ("Comment") | ("User") | TRUE
3              | 22(CommentID) | 23(PageID) | ("Comment") | ("Page") | TRUE
However, I would very much appreciate some input on how good an idea it is to lay out my relationships table like this.
Any thoughts?
Answer was told to me by a work colleague:
Imagine the above relationships table for the model "Book"
A User can Rent a book, so the relation is User -> Book...
But what if he can buy a book too: User->Book....
Ooops, we need a new relationship... and considering this relationship table was supposed to be the 1 size fits all, we now have a requirement to add a new separate table... whoops.
So the answer is NO NO NO. don't, it's naughty. Keep your relationship tables separate and specific.
Your suggestion for a relationship table is not optimal for several reasons:
It's difficult to write queries that join tables through the relationship table, as you will need filters on the ItemType and LinkType columns, which is not intuitive when writing queries.
If a need arises in the future to add new entities that use different datatypes for their primary keys, you cannot easily store IDs of various datatypes in your ItemID and LinkID columns.
You cannot create explicit foreign keys in your database, to enforce referential integrity, which is possibly the best reason to avoid the design you suggest.
Query performance might suffer.
When normalizing a database, you should not be afraid to have many tables. Just make sure to use a naming convention that makes sense and is self-documenting. For example, you could name the relation table between authors and pages "PageAuthors", instead of "Pages".
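For comparison, a separate and specific relation table might look like the sketch below. It uses Python's built-in sqlite3 module (an assumption, since the question names no database), the Users and Pages column names are guesses, and the rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
    CREATE TABLE Users (UserID INTEGER PRIMARY KEY, Name  TEXT NOT NULL);
    CREATE TABLE Pages (PageID INTEGER PRIMARY KEY, Title TEXT NOT NULL);
    -- One narrow table per relationship, with real foreign keys.
    CREATE TABLE PageAuthors (
        PageID INTEGER NOT NULL REFERENCES Pages (PageID),
        UserID INTEGER NOT NULL REFERENCES Users (UserID),
        PRIMARY KEY (PageID, UserID)
    );
    -- Hypothetical sample rows: page 23 has two co-authors.
    INSERT INTO Users VALUES (7, 'alice'), (8, 'bob');
    INSERT INTO Pages VALUES (23, 'Home');
    INSERT INTO PageAuthors VALUES (23, 7), (23, 8);
""")

# No ItemType/LinkType filters are needed: the table name says what it links.
authors = [row[0] for row in conn.execute("""
    SELECT u.Name
    FROM PageAuthors AS pa
    INNER JOIN Users AS u ON u.UserID = pa.UserID
    WHERE pa.PageID = 23
    ORDER BY u.Name
""")]
print(authors)
```

A second relationship between the same two entities (renting versus buying a book, in the colleague's example) would simply get its own table with the same shape.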

How to bond N database table with one master-table?

Let's assume that I have N tables for N bookstores. I have to keep the data about books in a separate table for each bookstore, because each table has a different schema (the number and types of columns differ); however, there is a set of columns common to all the bookstore tables.
Now I want to create one "MasterTable" with only few columns.
| MasterTable |
| id | title | isbn |
| 1  | abc   | 123  |

| MasterToBookstores |
| m_id | tb_id | p_id |
| 1    | 1     | 2    |
| 1    | 2     | 1    |

| BookStore_Foo |
| p_id | title | isbn | date | size |
| 1    | xyz   | 456  | 1998 | 3KB  |
| 2    | abc   | 123  | 2003 | 4KB  |

| BookStore_Bar |
| p_id | title | isbn | publisher | Format |
| 1    | abc   | 123  | H&K       | PDF    |
| 2    | mnh   | 986  | Amazon    | MOBI   |
My question: is it right to keep data this way? What are the best practices for this and similar cases? Can I give each bookstore table a numeric alias, to help me manage the whole set of tables?
Is there a better way of doing such thing?
I think you are confusing the concepts of "store" and "book".
From your comments and the example data, it appears the problem is in having different sets of attributes for books, not stores. If so, you'll need a structure similar to this:
The symbol: denotes inheritance (1). The BOOK is the "base class" and BOOK1/BOOK2/BOOK3 are various "subclasses" (2). This is a common strategy when entities share a set of attributes or relationships (3). For a fuller explanation of this concept, please search for "Subtype Relationships" in the ERwin Methods Guide.
Unfortunately, inheritance is not directly supported by current relational databases, so you'll need to transform this hierarchy into plain tables. There are generally 3 strategies for doing so, as described in these posts:
Interpreting ER diagram
Parent and Child tables - ensuring children are complete
Supertype-subtype database design
NOTE: The structure above allows various book types to be mixed inside the same bookstore. Let me know if that's not desirable (i.e. you need exactly one type of books in any given bookstore)...
(1) Aka. category, subclassing, subtyping, generalization hierarchy etc.
(2) I.e. types of books, depending on which attributes they require.
(3) In this case, books of all types are in the many-to-many relationship with stores.
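One common way to flatten such a hierarchy into plain tables is a base table plus one table per subtype that shares the base table's primary key. The sketch below shows only that one strategy, using Python's built-in sqlite3 module (an assumption); the names book, book_dated and book_published are hypothetical stand-ins for BOOK and its subclasses, and the stores relationship is left out for brevity.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
    -- Base table: the columns every kind of book shares.
    CREATE TABLE book (
        book_id INTEGER PRIMARY KEY,
        title   TEXT NOT NULL,
        isbn    TEXT NOT NULL UNIQUE
    );
    -- Subtype tables: the primary key is also a foreign key to book.
    CREATE TABLE book_dated (
        book_id INTEGER PRIMARY KEY REFERENCES book (book_id),
        date    INTEGER,
        size    TEXT
    );
    CREATE TABLE book_published (
        book_id   INTEGER PRIMARY KEY REFERENCES book (book_id),
        publisher TEXT,
        format    TEXT
    );
    INSERT INTO book VALUES (1, 'abc', '123'), (2, 'xyz', '456'), (3, 'mnh', '986');
    INSERT INTO book_dated     VALUES (1, 2003, '4KB'), (2, 1998, '3KB');
    INSERT INTO book_published VALUES (3, 'Amazon', 'MOBI');
""")

# Shared attributes come from the base table, whatever the subtype.
titles = [r[0] for r in conn.execute("SELECT title FROM book ORDER BY book_id")]

# Subtype-specific attributes need one join to the matching subtype table.
published = conn.execute("""
    SELECT b.title, p.format
    FROM book AS b
    INNER JOIN book_published AS p ON p.book_id = b.book_id
""").fetchall()
print(titles, published)
```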
If there are at least two columns that all the other tables share, you could have a base table for all books and add more tables for the rest of the data, keyed by the ID from the base table.
UPDATE:
If you use Entity Framework to connect to your DB, I suggest you try this:
Create your entity model, something like this:
Then let Entity Framework generate the database (Update Database from Model) for you. Note that this uses inheritance (not in the database).
Let me know if you have questions.
Suggested data model:
1. Have a master database, which stores the master data.
2. Replicate the dimension tables in the master database to your distributed bookstore databases using transactional replication.
3. You can use updatable subscribers; merge replication is also a good choice.
4. Each distributed bookstore database still works independently; master data changes are merged back either by merge replication or by updatable subscribers.
5. If you want to guarantee master data integrity, you can use read-only subscribers and transactional replication to distribute master data to the distributed databases. In this design, however, you need stored procedures in the master database to register your dimension data. Make sure there is no double-hop issue.
I would suggest you to have two tables:
bookStores:
id name someMoreColumns
books:
id bookStore_id title isbn date publisher format size someMoreColumns
It's easy to see the relationship here: a bookStore has many books.
Pay attention that I'm putting all the columns you have in all of your BookStore tables in just one table, even if some row from some table does not have a value to some column.
Why I prefer this way:
1) Across all the data from the BookStore tables, only a few columns will never have a value in the books table (for example, size and format if you don't have an e-book version). The other columns can be filled in someday (you can set a date on your e-books, even though your BookStore_Bar table, which seems to hold the e-books, doesn't have this column). This way you can have much more detailed information about all your books if you someday want to update it.
2) If you have a bunch of BookStore tables, let's say 12, you will not be able to handle your data easily. That is, if you want to run a query against all your books (which means all your tables), you will have at least three options:
First: manually run the query against each of the 12 tables and then merge the data;
Second: write a query with 12 joins, or list 12 tables in your FROM clause, to query all your data;
Third: depend on some script, stored procedure or software to do the first or second option for you;
I like to be able to work with my data as easily as possible, without depending on some other script or software unless I really need it.
3) In MySQL (which I know best), you can partition your books table. Partitioning is an advanced data-management feature that lets you distribute a table's data across several files on disk instead of the single file a table usually occupies. It is very useful when handling a large amount of data in one table, and it speeds up queries that match your data distribution plan. Let's see an example:
Let's say you already have 12 distinct bookStores, but under my database model. Each row in your books table is associated with one of the 12 bookStores. If you partition your data by bookStore_id, it will be almost the same as having 12 tables, because you can create a partition for each bookStore_id, so each partition handles only the related data (the rows that match that bookStore_id).
Let's say you want to query the books table for bookStore_id in (1, 4, 9). If your query needs only these three partitions to produce the desired output, the others will not be read, and it will be as fast as querying each separate table.
You can drop a partition without affecting the others. You can add new partitions to handle new bookStores. You can subpartition a partition. You can merge two partitions. In a nutshell, you can turn your single books table into an easy-to-handle, multi-storage table.
Side Effects:
1) I don't know everything about table partitioning, so it's good to refer to the documentation to learn all the important points about creating and managing partitions.
2) Protect your data with regular backups (dumps), as your books table will probably be heavily populated.
I hope it helps you!
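The two-table layout suggested in this answer can be sketched roughly as follows. Python's built-in sqlite3 module is used here instead of MySQL (an assumption made so the example is self-contained), so partitioning itself is not shown; the store names 'Foo' and 'Bar' are taken from the question's table names.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE bookStores (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE books (
        id           INTEGER PRIMARY KEY,
        bookStore_id INTEGER NOT NULL REFERENCES bookStores (id),
        title        TEXT NOT NULL,
        isbn         TEXT NOT NULL,
        -- Columns that only some stores supply are simply left NULL.
        date         INTEGER,
        publisher    TEXT,
        format       TEXT,
        size         TEXT
    );
    INSERT INTO bookStores VALUES (1, 'Foo'), (2, 'Bar');
    INSERT INTO books (bookStore_id, title, isbn, date, size) VALUES
        (1, 'xyz', '456', 1998, '3KB'),
        (1, 'abc', '123', 2003, '4KB');
    INSERT INTO books (bookStore_id, title, isbn, publisher, format) VALUES
        (2, 'abc', '123', 'H&K', 'PDF'),
        (2, 'mnh', '986', 'Amazon', 'MOBI');
""")

# One query covers every store: which stores carry ISBN 123?
stores_with_123 = [r[0] for r in conn.execute("""
    SELECT s.name
    FROM books AS b
    INNER JOIN bookStores AS s ON s.id = b.bookStore_id
    WHERE b.isbn = '123'
    ORDER BY s.name
""")]

# Restricting to particular stores is just a filter on bookStore_id.
foo_count = conn.execute(
    "SELECT COUNT(*) FROM books WHERE bookStore_id IN (1)"
).fetchone()[0]
print(stores_with_123, foo_count)
```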

What is the best way to copy data from related tables to another related tables?

What is the best way to copy data from related tables to other related tables with the same schema? The tables are connected by a one-to-many relationship.
Consider the following schema
firm
id | name | city.id (FK)
employee
id | lastname | firm.id (FK)
firm2
id | name | city_id (FK)
employee2
id | lastname |firm2.id (FK)
What I want to do is copy the rows from firm with a specific city.id to firm2, and their associated employees to employee2.
I use PostgreSQL 9.0, so I have to call SELECT nextval('seq_name') to get a new id for each table.
Right now I do this by simply iterating over all the rows in my Java backend server, but on a large amount of data (50,000 employees and 2,000 firms) it takes too much time (1-3 minutes).
I'm wondering whether there is a smarter way to do it, for example selecting the data into a temp table? Or perhaps using a stored procedure and iterating over the rows with a cursor, to avoid buffering on my backend server?
This is one problem caused by simply using a sequence or identity value as your sole primary key in a table.
If there is a real-life unique index/primary key, then you can join on that. The other option would be to create a mapping table as you fill in the tables with sequences then you can insert into the children tables' FKs by joining to the mapping tables. It doesn't completely remove the need for looping, but at least some of the inserts get moved out into a set-based approach.
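A rough sketch of the mapping-table idea, written with Python's built-in sqlite3 module instead of PostgreSQL (an assumption made so it is runnable as-is). The firm_map table and copy_city function are hypothetical names, the sample rows are made up, and the AUTOINCREMENT column merely stands in for nextval('seq_name').

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE firm      (id INTEGER PRIMARY KEY, name TEXT, city_id INTEGER);
    CREATE TABLE employee  (id INTEGER PRIMARY KEY, lastname TEXT,
                            firm_id INTEGER REFERENCES firm (id));
    CREATE TABLE firm2     (id INTEGER PRIMARY KEY, name TEXT, city_id INTEGER);
    CREATE TABLE employee2 (id INTEGER PRIMARY KEY, lastname TEXT,
                            firm_id INTEGER REFERENCES firm2 (id));
    -- Mapping table: new_id is generated by the database, old_id is firm.id.
    CREATE TABLE firm_map  (new_id INTEGER PRIMARY KEY AUTOINCREMENT,
                            old_id INTEGER UNIQUE);
    -- Hypothetical sample data.
    INSERT INTO firm VALUES (10, 'Acme', 1), (11, 'Globex', 2), (12, 'Initech', 1);
    INSERT INTO employee VALUES
        (1, 'Smith', 10), (2, 'Jones', 11), (3, 'Brown', 12), (4, 'Lee', 10);
""")

def copy_city(conn, city_id):
    """Copy one city's firms and their employees with three set-based inserts."""
    with conn:  # run the whole copy in a single transaction
        # AUTOINCREMENT never reuses ids, so clearing the map is safe.
        conn.execute("DELETE FROM firm_map")
        conn.execute(
            "INSERT INTO firm_map (old_id) "
            "SELECT id FROM firm WHERE city_id = ? ORDER BY id",
            (city_id,),
        )
        conn.execute("""
            INSERT INTO firm2 (id, name, city_id)
            SELECT m.new_id, f.name, f.city_id
            FROM firm AS f
            INNER JOIN firm_map AS m ON m.old_id = f.id
        """)
        # Children pick up the new parent id by joining through the map.
        conn.execute("""
            INSERT INTO employee2 (lastname, firm_id)
            SELECT e.lastname, m.new_id
            FROM employee AS e
            INNER JOIN firm_map AS m ON m.old_id = e.firm_id
        """)

copy_city(conn, 1)
copied = conn.execute("""
    SELECT e.lastname, f.name
    FROM employee2 AS e
    INNER JOIN firm2 AS f ON f.id = e.firm_id
    ORDER BY e.lastname
""").fetchall()
print(copied)
```

In this sketch the per-row loop disappears entirely because the ids are generated inside the mapping insert; with a real sequence the same shape should apply, but that has not been tested here.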

SQL Server multiple many-to-many join query

I currently have one main table with multiple other tables associated with it via many-to-many joins (with join tables). The application using this database needs to have search functionality that will print out multiple rows matching specific criteria, including all the values in the join tables. The values from the join tables also need to be links that will enable a search for all other rows that match that value. I am trying to figure out how to do this without taxing the database.
Here is an example of the table structure
**Metrics (Main Table)**
MetricID (pk)
Metric
**Domains (ValueList Table)**
DomainID (pk)
Domain
**MetricsDomains (Join Table)**
MetricsDomainsID (pk)
MetricID (fk)
DomainID (fk)
**MetricTypes (ValueList Table)**
MetricTypeID (pk)
MetricType
**MetricsMetricTypes (Join Table)**
MetricMetricTypesID (pk)
MetricID (fk)
MetricTypeID (fk)
**Studies (ValueList Table)**
StudyID (pk)
Study
**MetricsStudies (Join Table)**
MetricsStudiesID (pk)
MetricID (fk)
StudyID (fk)
When someone searches for a Metric by various criteria, they should get output in table format that looks something like this:
Metric1 | Description | Study1, Study2, Study3 | MetricType1, MetricType2 | Domain1, Domain2
Metric2 | Description | Study5, Study2, Study4 | MetricType2, MetricType3 | Domain5, Domain9
The Metric will be a link to the full description of the Metric. However, in addition, the Studies (ie., Study 1, Study 2, Study 3, etc.) and MetricTypes (MetricType1, Metric2, etc.) and Domains (Domain1, Domain 2, etc.) should also be links, that when clicked on, will perform a new search for all other metrics that contain that study, or type, or domain. This leads me to believe I will also need the primary key of the study, type, or domain, in addition to the text, in order to place in the href.
At any rate, considering one search could possibly return 20+ metrics, what I need to figure out is a good way to write an optimized query to return the results of the multiple many-to-many joins. I know that joining all of these tables in one query will generally result in a Cartesian product of all the joins, but I am not sure if there is another way to go about it. I have also read about a way I could return the many-to-many results as a comma-separated list in a field using a method like this:
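SELECT m.MetricID, m.Description,
    -- Build a comma-separated list of the studies linked to this metric
    STUFF((
        SELECT ', ' + s.Study
        FROM Studies s
        INNER JOIN MetricsStudies ms ON s.StudyID = ms.StudyID
        WHERE ms.MetricID = m.MetricID
        ORDER BY s.Study
        FOR XML PATH('')
    ), 1, 2, '') AS Study  -- strip the leading ', ' (two characters)
FROM Metrics m
WHERE m.MetricID = 13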
However, I am not sure of the performance impact of this method, or whether it will really get me what I am looking for since I think I may need the primary keys of the Studies as well.
Any help would be appreciated.
Thanks!
I'd recommend doing it first with your multi-joins; only then will you know whether the performance is good enough. As always, beware of premature optimisation. Once you have your query running correctly against your normalised model, you can check the query plan, etc. This may show that you need to flatten a few of your joins; if so, you will probably have to store the data in two different formats, one for reporting and one for searching. But the first thing is to see whether performance is actually acceptable out of the box. Hopefully that helps.
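To make the "multi-joins first" suggestion concrete, here is a minimal sketch that runs the plain joined query and folds the repeated rows in application code, keeping each value's primary key for the links. It uses Python's built-in sqlite3 module rather than SQL Server (an assumption), covers only two of the three join tables, and the sample rows are hypothetical.

```python
import sqlite3
from collections import defaultdict

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Metrics (MetricID INTEGER PRIMARY KEY, Metric TEXT);
    CREATE TABLE Studies (StudyID  INTEGER PRIMARY KEY, Study  TEXT);
    CREATE TABLE Domains (DomainID INTEGER PRIMARY KEY, Domain TEXT);
    CREATE TABLE MetricsStudies (MetricID INTEGER, StudyID  INTEGER);
    CREATE TABLE MetricsDomains (MetricID INTEGER, DomainID INTEGER);
    -- Hypothetical sample data: one metric, three studies, two domains.
    INSERT INTO Metrics VALUES (13, 'Metric1');
    INSERT INTO Studies VALUES (1, 'Study1'), (2, 'Study2'), (3, 'Study3');
    INSERT INTO Domains VALUES (1, 'Domain1'), (2, 'Domain2');
    INSERT INTO MetricsStudies VALUES (13, 1), (13, 2), (13, 3);
    INSERT INTO MetricsDomains VALUES (13, 1), (13, 2);
""")

# Joining two many-to-many relations multiplies rows per metric (3 x 2 here).
rows = conn.execute("""
    SELECT m.MetricID, m.Metric, s.StudyID, s.Study, d.DomainID, d.Domain
    FROM Metrics AS m
    LEFT JOIN MetricsStudies AS ms ON ms.MetricID = m.MetricID
    LEFT JOIN Studies        AS s  ON s.StudyID   = ms.StudyID
    LEFT JOIN MetricsDomains AS md ON md.MetricID = m.MetricID
    LEFT JOIN Domains        AS d  ON d.DomainID  = md.DomainID
    WHERE m.MetricID = 13
""").fetchall()

# Fold the repeated rows into one entry per metric; sets drop the duplicates
# and the (id, text) pairs carry the primary keys needed for the hrefs.
metrics = defaultdict(lambda: {"studies": set(), "domains": set()})
for metric_id, metric, study_id, study, domain_id, domain in rows:
    entry = metrics[metric_id]
    entry["name"] = metric
    if study_id is not None:
        entry["studies"].add((study_id, study))
    if domain_id is not None:
        entry["domains"].add((domain_id, domain))

print(len(rows), sorted(metrics[13]["studies"]), sorted(metrics[13]["domains"]))
```

Whether the row multiplication matters for 20+ metrics is exactly what checking the query plan on real data would tell you.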