Unexpected results after joining another table


I use three tables to get to the final result. They are called project_board_members, users and project_team.
This is the query:
SELECT `project_board_members`.`member_id`,
`users`.`name`,
`users`.`surname`,
`users`.`country`,
`project_team`.`tasks_completed`
FROM `project_board_members`
JOIN `users`
ON (`users`.`id` = `project_board_members`.`member_id`)
JOIN `project_team`
ON (`project_team`.`user_id` = `project_board_members`.`member_id`)
WHERE `project_board_members`.`project_id` = '5'
You can ignore the last line because it just points to the project I'm using.
The table project_board_members holds three entries and has a structure like:
id,
member_id,
project_id,
created_at;
I need to get member_id from that table. Then I join to the users table to get name, surname and country. No problems. All works! :)
After that, I needed to get tasks_completed for each user. That is stored in the project_team table. The big surprise is that I got four entries returned, even though the project_board_members table holds only three entries.
Why is that so? Thanks in advance!

A SQL join creates a result set that contains one row for each combination of the left and right tables that matches the join conditions. Without seeing the data or a little more information it's hard to say what exactly is wrong from what you expect, but I'm guessing it's one of the following:
1) You have two entries in project_team with the same user_id.
2) Your entries in project_team store both user_id and project_id and you need to be joining on both of them rather than just user_id.
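Both guesses can be reproduced with a small in-memory database. The sketch below uses Python's sqlite3 module rather than MySQL, and both the sample rows and the project_id column on project_team are assumptions made for illustration; the real tables may differ.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE project_board_members (id INTEGER PRIMARY KEY, member_id INT, project_id INT);
    CREATE TABLE project_team (user_id INT, project_id INT, tasks_completed INT);

    -- Three board members on project 5 (hypothetical sample data).
    INSERT INTO project_board_members (member_id, project_id) VALUES (1, 5), (2, 5), (3, 5);

    -- User 1 is on two project teams, so user_id alone matches twice.
    INSERT INTO project_team VALUES (1, 5, 10), (1, 6, 4), (2, 5, 7), (3, 5, 2);
""")

# Joining on user_id only: one output row per matching pair of rows.
rows_user_only = conn.execute("""
    SELECT pbm.member_id, pt.tasks_completed
    FROM project_board_members pbm
    JOIN project_team pt ON pt.user_id = pbm.member_id
    WHERE pbm.project_id = 5
""").fetchall()

# Joining on user_id and project_id: one output row per board member.
rows_both_keys = conn.execute("""
    SELECT pbm.member_id, pt.tasks_completed
    FROM project_board_members pbm
    JOIN project_team pt
      ON pt.user_id = pbm.member_id
     AND pt.project_id = pbm.project_id
    WHERE pbm.project_id = 5
""").fetchall()

print(len(rows_user_only), len(rows_both_keys))
```

With this sample data the first query returns four rows for three board members, and the second returns three.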

The table project_board_members represents what is called in the Entity-Relationship modelling world an "associative entity". It exists to implement a many-to-many relationship (in this case, between the project and user entities). As such it is a dependent entity, which is to say that the existence of an instance of it is predicated on the existence of an instance of each of the entities to which it refers (a user and a project).
As a result, the columns comprising the foreign keys relating to those entities (member_id and project_id) must form part or all of the primary key.
Normally, instances of an associative entity are unique with respect to the entities to which it relates. In your case the relationship definitions would be:
Each user is seated on the board of 0-to-many projects;
Each project's board is composed of 0-to-many users
which is to say that a particular user may not be on the board of a particular project more than once. The only reason for adding other columns (such as your id column) to the primary key would be if the user:project relationship is non-unique.
To enforce this rule -- a user may sit on the board of a particular project just once -- the table schema should look like this:
create table project_board_member
(
member_id int not null foreign key references user ( user_id ) ,
project_id int not null foreign key references project ( project_id ) ,
created_at ...
...
primary key ( member_id , project_id )
)
The id column is superfluous.
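As a rough check of what the composite primary key buys you, here is a minimal sketch using Python's sqlite3 module. It is an assumption-laden stand-in for the schema above: the foreign keys and created_at are left out so the example stays self-contained, and the inserted values are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE project_board_member (
        member_id  INTEGER NOT NULL,
        project_id INTEGER NOT NULL,
        PRIMARY KEY (member_id, project_id)
    )
""")

conn.execute("INSERT INTO project_board_member VALUES (1, 5)")
conn.execute("INSERT INTO project_board_member VALUES (1, 6)")  # same user, other project: allowed

try:
    # Same user on the same project a second time violates the primary key.
    conn.execute("INSERT INTO project_board_member VALUES (1, 5)")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

row_count = conn.execute("SELECT COUNT(*) FROM project_board_member").fetchone()[0]
print(duplicate_rejected, row_count)
```

The database itself refuses the duplicate membership, so no surrogate id is needed to tell rows apart.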

For debugging purposes do
SELECT GROUP_CONCAT(pbm.member_id) AS member_ids,
GROUP_CONCAT(u.name) as names,
GROUP_CONCAT(u.surname) as surnames,
GROUP_CONCAT(u.country) as countries,
GROUP_CONCAT(pt.tasks_completed) as tasks
FROM project_board_members pbm
JOIN users u
ON (u.id = pbm.member_id)
JOIN project_team pt
ON (pt.user_id = pbm.member_id)
WHERE pbm.project_id = '5'
GROUP BY pbm.member_id
All the fields that list multiple entries in the result are messing up the rowcount in your resultset.
To fix that you can do:
SELECT pbm.member_id,
u.name,
u.surname,
u.country,
pt.tasks_completed
FROM (SELECT
p.project_id, p.member_id
FROM project_board_members p
WHERE p.project_id = '5'
LIMIT 1
) AS pbm
JOIN users u
ON (u.id = pbm.member_id)
JOIN project_team pt
ON (pt.user_id = pbm.member_id)

Related

select items where id is contained in another table field

I have the following database schema in sqlite3:
Basically, a member has multiple characters. A character plays in an activity (with a mode type) and has results for that activity (character_activity_stats)
I select all of the stats (activity / character_activity_stats) for a specific character and mode like so:
SELECT
*,
activity.mode as activity_mode,
character_activity_stats.id as character_activity_stats_index
FROM
character_activity_stats
INNER JOIN
activity ON character_activity_stats.activity = activity.id,
modes ON modes.activity = activity.id
WHERE
modes.mode = 5 AND
character_activity_stats.character = 1
This works great.
However, now I want to select the same set of data, but by member (basically combine results for all characters for a member).
However, I am not really sure how to even approach this.
Basically, I need to retrieve all character_activity_stats where character_activity_stats.character is a character of the specified member (by id). Any suggestions or pointers? (I am very new to sql).

Join those 3 tables on the right keys:
select *
from character_activity_stats
join character on character_activity_stats.character = character.id
join member on member.id = character.member
where member.id = ?
If you don't need any data from member other than to limit by id, then you can leave that join off and just do character.member = ? instead.
It's much easier if you use the same name for the primary and foreign keys (i.e. don't use id for the primary key). It also allows you to use natural joins so you don't even need to give the join conditions. For the primary key the convention is usually _id. You have id and _in in most of the tables, so I don't know what that is about.
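The shorter variant without the member join might look like the sketch below. Since the schema image is not available, the table layout (character.member, character_activity_stats.character) is inferred from the queries above, and the sample rows are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE member (id INTEGER PRIMARY KEY);
    CREATE TABLE character (id INTEGER PRIMARY KEY, member INTEGER REFERENCES member(id));
    CREATE TABLE character_activity_stats (
        id INTEGER PRIMARY KEY,
        character INTEGER REFERENCES character(id),
        activity INTEGER
    );

    -- Hypothetical data: member 1 owns characters 10 and 11, member 2 owns 20.
    INSERT INTO member (id) VALUES (1), (2);
    INSERT INTO character (id, member) VALUES (10, 1), (11, 1), (20, 2);
    INSERT INTO character_activity_stats (character, activity)
    VALUES (10, 100), (11, 101), (20, 102);
""")

def stats_for_member(member_id):
    """Return the stats rows of every character that belongs to member_id."""
    return conn.execute("""
        SELECT s.id, s.character, s.activity
        FROM character_activity_stats AS s
        JOIN character AS c ON s.character = c.id
        WHERE c.member = ?
        ORDER BY s.id
    """, (member_id,)).fetchall()

print(stats_for_member(1))
```

The activity and modes joins from the original query can be added to this in the same way as before.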

How can I automatically add a foreign key to a table on select which is stored in an associated other?

I have a database with some tables among which these 3:
organizations
organization_id PK
teams
team_id PK
team_organization_id FK
users
user_id PK
teams_users
tu_id PK
tu_team_id FK
tu_user_id FK
So:
Team belongs to Organization
Organization has many Teams
User has and belongs to many Teams
Team has and belongs to many Users
And, consequently, a User has to belong to a unique Organization. When I select some users, I always want to know which organization each one belongs to.
What is the best way to deal with this?
For now I think the perfect solution would be to get the organization_id, named for example user_organization_id, in all my select results on users.
Am I right on this point? How do I do that correctly?
My database runs on PostgreSQL (9.3).

When I select some users, I would want to know (always) to which
organization it depends.
Use joins:
SELECT u.*, t.team_organization_id AS organization_id
FROM users u
JOIN teams_users tu ON tu.tu_user_id = u.user_id
JOIN teams t ON t.team_id = tu.tu_team_id
WHERE tu.tu_user_id = $user_id;
To get that automatically, you could create a VIEW encapsulating the query:
CREATE VIEW usr_org AS
<query from above>
Then instead of SELECT * FROM users, use:
SELECT * FROM usr_org;
More about views in the manual.
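To try the view idea without a PostgreSQL server, here is a minimal sketch on SQLite through Python's sqlite3 module. Two things are assumptions of this example rather than part of the answer: the view drops the WHERE filter so that it can be applied per query, and DISTINCT is added so a user on several teams of the same organization appears once. The sample rows are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE organizations (organization_id INTEGER PRIMARY KEY);
    CREATE TABLE teams (team_id INTEGER PRIMARY KEY, team_organization_id INTEGER);
    CREATE TABLE users (user_id INTEGER PRIMARY KEY);
    CREATE TABLE teams_users (tu_id INTEGER PRIMARY KEY, tu_team_id INTEGER, tu_user_id INTEGER);

    INSERT INTO organizations VALUES (1), (2);
    INSERT INTO teams VALUES (10, 1), (11, 1), (20, 2);
    INSERT INTO users VALUES (100), (200);
    -- User 100 is on two teams of organization 1; user 200 is on one team of organization 2.
    INSERT INTO teams_users (tu_team_id, tu_user_id) VALUES (10, 100), (11, 100), (20, 200);

    CREATE VIEW usr_org AS
    SELECT DISTINCT u.*, t.team_organization_id AS user_organization_id
    FROM users u
    JOIN teams_users tu ON tu.tu_user_id = u.user_id
    JOIN teams t ON t.team_id = tu.tu_team_id;
""")

# Query the view like a table; filters go on the outer SELECT.
all_rows = conn.execute("SELECT * FROM usr_org ORDER BY user_id").fetchall()
one_user = conn.execute("SELECT * FROM usr_org WHERE user_id = ?", (200,)).fetchall()
print(all_rows, one_user)
```

Note that users without any team do not show up in this view, because the joins are inner joins.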

Inserting Into and Maintaining Many-to-Many Tables

SQLite3 user.
I have read thru numerous books on relational DBs and SQL and not one shows how to maintain the linking tables for many-to-many relationships. I just went through a book that went into the details of SELECT and JOINS with examples, but then glosses over the same when many-to-many relationships are covered. The author just showed some pseudo code for a table, no data, and then a pseudo code query--WTF? I am probably missing something, but it has become quite maddening.
Anyways, say I have a table like [People] with 3 columns: pID (primary), name and age. A table [Groups] with 3 columns: gID (primary), groupname and years. Since people can belong to multiple groups and groups can have multiple people, I set up a linking table called [peoplegroups] with two columns: pID, and gID both of which come from their respective tables.
So, how do I efficiently get data into the linking table when INSERTING on the others and how do I get data out using the linking table?
Example: I want to INSERT "Jane" into [people] and make her a member of group gID 2, "bowlers" and update the linking table [peoplegroups] at the same time.
Later I want to go back and pull out a list of all of the bowlers or all the groups a person is part of.

If you don't already use primary and foreign keys (which you should!), I think you may need to consider using triggers in your design as well. So if you have a specific set of rules (e.g. you want to create Jane with id = 1 and choose existing group 2), then after inserting Jane into people, a trigger can automatically create the entry pair personid=1, groupid=2 in the table peoplegroups. You can also create views with specific selects to see the data you want, for example if you want a query where you only show the people's names and group names you could create a view 'PeopleView':
CREATE VIEW PeopleView AS
SELECT P.PersonName, G.GroupName
FROM People P
INNER JOIN PeopleGroups PG ON P.PersonID = PG.PersonID
INNER JOIN Groups G ON G.GroupId = PG.GroupID
then you can query 'PeopleView' saying
SELECT * FROM PeopleView WHERE GroupName = 'bowlers'

When inserting new data into the tables mentioned, the "linking" table that you are referring to needs to contain both primary keys from the other tables as foreign keys. So basically pID from the [People] table and gID from the [Groups] table should both be foreign keys in the [PeopleGroups] table. In order to create a new "link" in [PeopleGroups], the record has to already exist in the [People] table as well as the [Groups] table BEFORE you try to create the link in the [PeopleGroups] table. I hope this helps.
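Putting that ordering into practice for the Jane example, a minimal sketch with Python's sqlite3 module could look like this. The helper name add_person_to_group, Jane's age, and the group rows are made up for illustration; the table and column names follow the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite does not enforce foreign keys by default
conn.executescript("""
    CREATE TABLE [People] (pID INTEGER PRIMARY KEY, name TEXT, age INTEGER);
    CREATE TABLE [Groups] (gID INTEGER PRIMARY KEY, groupname TEXT, years INTEGER);
    CREATE TABLE [peoplegroups] (
        pID INTEGER NOT NULL REFERENCES [People](pID),
        gID INTEGER NOT NULL REFERENCES [Groups](gID),
        PRIMARY KEY (pID, gID)
    );
    INSERT INTO [Groups] (gID, groupname, years) VALUES (1, 'golfers', 3), (2, 'bowlers', 5);
""")

def add_person_to_group(name, age, group_id):
    """Insert a person and link them to an existing group in one transaction."""
    with conn:  # commits on success, rolls back if either insert fails
        cur = conn.execute("INSERT INTO [People] (name, age) VALUES (?, ?)", (name, age))
        person_id = cur.lastrowid  # pID that SQLite generated for the new row
        conn.execute("INSERT INTO [peoplegroups] (pID, gID) VALUES (?, ?)",
                     (person_id, group_id))
    return person_id

jane_id = add_person_to_group("Jane", 30, 2)

# Everyone in the 'bowlers' group.
bowlers = conn.execute("""
    SELECT p.name
    FROM [People] p
    JOIN [peoplegroups] pg ON pg.pID = p.pID
    JOIN [Groups] g ON g.gID = pg.gID
    WHERE g.groupname = 'bowlers'
""").fetchall()

# Every group a given person belongs to.
janes_groups = conn.execute("""
    SELECT g.groupname
    FROM [Groups] g
    JOIN [peoplegroups] pg ON pg.gID = g.gID
    WHERE pg.pID = ?
""", (jane_id,)).fetchall()

print(bowlers, janes_groups)
```

The person row is inserted first, its generated pID is read back, and only then is the link row written, all inside one transaction.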

SQL Database Design Many to Many

I am creating a database based on a sporting game to store matches and each player involved in each match. I am having trouble resolving a many to many relationship. I currently have the following tables:
Player
id
name
Match
id
date
PlayerMatch
player_id
match_id
home_team
goals_for
goals_against
There will always be a minimum of two players in a match. Is this the best design for this approach?

I would recommend sticking with a many-to-many relationship. This allows you to change the specifications of how many players you can have in a game easily while not complicating the data model much.
Player
id
name
Match
id
date
PlayerMatch
player_id
match_id
is_home
goals_for
goals_against
Foreign key from PlayerMatch to Player
Foreign key from PlayerMatch to Match
--All the matches a player has played in.
SELECT m.*
FROM Player p
JOIN PlayerMatch pm
ON p.id = pm.player_id
JOIN Match m
ON m.id = pm.match_id
WHERE p.id = /*your player Id*/
--All the players in a match
SELECT p.*
FROM Match m
JOIN PlayerMatch pm
ON m.id = pm.match_id
JOIN Player p
ON p.id = pm.player_id
WHERE m.id = /*your match Id*/
--player information for a single match.
SELECT pm.*
FROM Player p
JOIN PlayerMatch pm
ON p.id = pm.player_id
JOIN Match m
ON m.id = pm.match_id
WHERE p.id = /*your player Id*/
AND m.id = /*your match Id*/

That is a valid option, though I would suggest a naming convention where you use the same column name in both tables (i.e. use match_id in both Match and PlayerMatch; same for player_id). This helps make your SQL a bit more clear and when doing joins in some databases (MySQL) you can then use the 'using (col1, col2, ...)' syntax for the joins.
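As an illustration of that naming convention, the sketch below renames the keys to player_id and match_id and joins with USING. It runs on SQLite through Python's sqlite3 module (SQLite also accepts USING), the table name "Match" is quoted defensively, and the sample players and scores are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Player (player_id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE "Match" (match_id INTEGER PRIMARY KEY, date TEXT);
    CREATE TABLE PlayerMatch (
        player_id INTEGER NOT NULL REFERENCES Player(player_id),
        match_id  INTEGER NOT NULL REFERENCES "Match"(match_id),
        is_home INTEGER,
        goals_for INTEGER,
        goals_against INTEGER,
        PRIMARY KEY (player_id, match_id)
    );

    -- Hypothetical data: one match with a home and an away player.
    INSERT INTO Player VALUES (1, 'Ann'), (2, 'Bob');
    INSERT INTO "Match" VALUES (7, '2020-01-01');
    INSERT INTO PlayerMatch VALUES (1, 7, 1, 3, 1), (2, 7, 0, 1, 3);
""")

# All the players in a match, joined via the shared column names.
players_in_match = conn.execute("""
    SELECT p.name, pm.goals_for
    FROM "Match" m
    JOIN PlayerMatch pm USING (match_id)
    JOIN Player p USING (player_id)
    WHERE m.match_id = ?
    ORDER BY p.player_id
""", (7,)).fetchall()

print(players_in_match)
```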

I wouldn't use the many-to-many relationship, I would do like this:
Player
id
name
Match
id
home_player_id
guest_player_id
date
goals_home_player
goals_guest_player

I think I'd try to model the match first & then see what happens with the table design :
Match
-------------
match_Id
player1_Id
player2_Id
player1_Goals
player2_Goals
Where player1_Id and Player2_Id are both foreign keys onto the Player table
Player
---------
Id
Name
By convention player1 would always be the home team
then you would query it like
Select p1.name as player1_Home, p2.name as player2_away,
m.match_Id,
m.player1_Goals as homeGoals, m.player2_Goals as awayGoals
from Match m
inner join Player p1 on p1.id = m.player1_Id
inner join Player p2 on p2.id = m.player2_Id

This sort of data relationship is not at all unnatural. To set it up, just ask yourself two questions:
Do players have more than one match?
Do matches have more than one player?
If the answer is yes to both, then you have a many-to-many relationship and these are not at all uncommon. Their implementation is only slightly more complicated. In a one-to-many relationship, you'd hold a foreign key to a list of records in some table. As it happens, this is still how it works in many-to-many relationships, except that both the Players and the Matches table will need a foreign key to some list of records.
This list is called the Bridge Table. So you'll need to use a total of three tables to describe the relationship:
Players
-------
player_id
<player attribute columns, eg last_name, first_name, goals_scored, etc.>
Player_Match
------------
player_id
match_id
Matches
-------
match_id
<a list of columns that are match attributes, eg. match date, etc.>
The table in the middle of the diagram above is called a bridge table, and it does nothing more than map players to matches, and it also maps matches to a list of players. Often, bridge tables have only 2 columns, each representing a foreign key to one of the bridged tables. There is no need for a primary key in a bridge table, and if there is not one, it means that a player can have more than one of the same match. If a player can have only one of one kind of match, then make the primary key for each row of the bridge table a composite key on both of the columns.
In database design, normalization is a highly desirable relational goal because it provides a database with the greatest possible flexibility and the lowest amount of redundancy. To normalize, ask yourself if the data you want to put in a table is -really- an actual attribute of the object described by the primary key. For example, is the home_team an actual attribute of the match. I would say no, it is not. In this case, you should replace home_team in your PlayerMatch table with a foreign key to a Teams table. In your Matches table, you ought to have two columns. One for a home team foreign key, and one for the away team key. The teams are not actual attributes of a match and so to normalize the Match table, you'd want to put those data in tables of their own.

Agree with M Hagopian; the OP's schema looks like a good start.

Design : multiple visits per patient

Above is my schema. What you can't see in tblPatientVisits is the foreign key from tblPatient, which is patientid.
tblPatient contains a distinct copy of each patient in the dataset as well as their gender. tblPatientVisits contains their demographic information, where they lived at time of admission and which hospital they went to. I chose to put that information into a separate table because it changes throughout the data (a person can move from one visit to the next and go to a different hospital).
I don't get any strange numbers with my queries until I add tblPatientVisits. There are just under one million claims in tblClaims, but when I add tblPatientVisits so I can check out where that person was from, it returns over a million. I think this is due to the fact that in tblPatientVisits the same patientID shows up more than once (because they had different admission/discharge dates).
For the life of me I can't see where this is incorrect design, nor do I know how to rectify it beyond doing one query with count(tblPatientVisits.patientID) = 1 and then a union with count(tblPatientVisits.patientID) > 1.
Any insight into this type of design, or how I might more elegantly find a way to get the claimType from tblClaims to give me the correct number of rows when I associate a claim ID with a patientID?
EDIT: The biggest problem I'm having is the fact that if I include the admissionDate, dischargeDate or patientState in the tblPatient table, I can't use the patientID as a primary key.
It should be noted that tblClaims are NOT necessarily related to tblPatientVisits.admissionDate, tblPatientVisits.dischargeDate.
EDIT: sample queries to show that when tblPatientVisits is added, more rows are returned than claims
SELECT tblclaims.id, tblClaims.claimType
FROM tblClaims INNER JOIN
tblPatientClaims ON tblClaims.id = tblPatientClaims.id INNER JOIN
tblPatient ON tblPatientClaims.patientid = tblPatient.patientID INNER JOIN
tblPatientVisits ON tblPatient.patientID = tblPatientVisits.patientID
more than one million query rows returned
SELECT tblClaims.id, tblPatient.patientID
FROM tblClaims INNER JOIN
tblPatientClaims ON tblClaims.id = tblPatientClaims.id INNER JOIN
tblPatient ON tblPatientClaims.patientid = tblPatient.patientID
less than one million query rows returned

I think this is crying for a better design. I really think that a visit should be associated with a claim, and that a claim can only be associated with a single patient, so I think the design should be (and eliminating the needless tbl prefix, which is just clutter):
CREATE TABLE dbo.Patients
(
PatientID INT PRIMARY KEY
-- , ... other columns ...
);
CREATE TABLE dbo.Claims
(
ClaimID INT PRIMARY KEY,
PatientID INT NOT NULL FOREIGN KEY
REFERENCES dbo.Patients(PatientID)
-- , ... other columns ...
);
CREATE TABLE dbo.PatientVisits
(
PatientID INT NOT NULL FOREIGN KEY
REFERENCES dbo.Patients(PatientID),
ClaimID INT NULL FOREIGN KEY
REFERENCES dbo.Claims(ClaimID),
VisitDate DATE
-- , ... other columns ...
, PRIMARY KEY (PatientID, ClaimID, VisitDate) -- not convinced on this one
);
There is some redundant information here, but it's not clear from your model whether a patient can have a visit that is not associated with a specific claim, or even whether you know that a visit belongs to a specific claim (this seems like crucial information given the type of query you're after).
In any case, given your current model, one query you might try is:
SELECT c.id, c.claimType
FROM dbo.tblClaims AS c
INNER JOIN dbo.tblPatientClaims AS pc
ON c.id = pc.id
INNER JOIN dbo.tblPatient AS p
ON pc.patientid = p.patientID
-- where exists tells SQL server you don't care how many
-- visits took place, as long as there was at least one:
WHERE EXISTS (SELECT 1 FROM dbo.tblPatientVisits AS pv
WHERE pv.patientID = p.patientID);
This will still return one row for every patient / claim combination, but no longer one row per visit, however many visits the patient has. Again, it really feels like the design isn't right here. You should also get in the habit of using table aliases - they make your query much easier to read, especially if you insist on the messy tbl prefix. You should also always use the dbo (or whatever schema you use) prefix when creating and referencing objects.
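The difference between joining to the visits table and testing it with EXISTS can be seen on a handful of rows. This is only a sketch on SQLite via Python's sqlite3 module, not SQL Server, so the dbo prefix is dropped; the column lists are trimmed to what the queries need and the data is invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE tblPatient (patientID INTEGER PRIMARY KEY);
    CREATE TABLE tblClaims (id INTEGER PRIMARY KEY, claimType TEXT);
    CREATE TABLE tblPatientClaims (id INTEGER, patientid INTEGER);
    CREATE TABLE tblPatientVisits (patientID INTEGER, admissionDate TEXT);

    INSERT INTO tblPatient VALUES (1), (2);
    INSERT INTO tblClaims VALUES (100, 'inpatient'), (101, 'outpatient'), (102, 'inpatient');
    INSERT INTO tblPatientClaims VALUES (100, 1), (101, 1), (102, 2);
    -- Hypothetical visits: patient 1 has two, patient 2 has one.
    INSERT INTO tblPatientVisits VALUES (1, '2012-01-05'), (1, '2012-03-09'), (2, '2012-02-11');
""")

# Inner join to visits: every claim is repeated once per visit of its patient.
joined = conn.execute("""
    SELECT c.id, c.claimType
    FROM tblClaims AS c
    JOIN tblPatientClaims AS pc ON c.id = pc.id
    JOIN tblPatient AS p ON pc.patientid = p.patientID
    JOIN tblPatientVisits AS pv ON p.patientID = pv.patientID
""").fetchall()

# EXISTS: each claim appears once, provided its patient has at least one visit.
with_exists = conn.execute("""
    SELECT c.id, c.claimType
    FROM tblClaims AS c
    JOIN tblPatientClaims AS pc ON c.id = pc.id
    JOIN tblPatient AS p ON pc.patientid = p.patientID
    WHERE EXISTS (SELECT 1 FROM tblPatientVisits AS pv
                  WHERE pv.patientID = p.patientID)
""").fetchall()

print(len(joined), len(with_exists))
```

Here three claims turn into five rows under the plain join, and stay at three rows with EXISTS.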

I'm not sure I understand the concept of a claim but I suspect you want to remove the link table between claims and patient and instead make the association between patient visit and a claim.
Would that work out better for you?
