Above is my schema. What you can't see in tblPatientVisits is the foreign key from tblPatient, which is patientid.
tblPatient contains a distinct copy of each patient in the dataset as well as their gender. tblPatientVisits contains their demographic information, where they lived at time of admission and which hospital they went to. I chose to put that information into a separate table because it changes throughout the data (a person can move from one visit to the next and go to a different hospital).
I don't get any strange numbers with my queries until I add tblPatientVisits. There are just under one million claims in tblClaims, but when I add tblPatientVisits so I can check out where that person was from, it returns over a million. I think this is due to the fact that in tblPatientVisits the same patientID shows up more than once (due to the fact that they had different admission/discharge dates).
For the life of me I can't see where this is incorrect design, nor do I know how to rectify it beyond doing one query with count(tblPatientVisits.patientID) = 1 and then a union with count(tblPatientVisits.patientID) > 1.
Any insight into this type of design, or how I might more elegantly find a way to get the claimType from tblClaims to give me the correct number of rows when I associate a claim ID with a patientID?
EDIT: The biggest problem I'm having is the fact that if I include the admissionDate, dischargeDate or the patientState in the tblPatient table I can't use the patientID as a primary key.
It should be noted that tblClaims are NOT necessarily related to tblPatientVisits.admissionDate, tblPatientVisits.dischargeDate.
EDIT: sample queries to show that when tblPatientVisits is added, more rows are returned than claims
SELECT tblclaims.id, tblClaims.claimType
FROM tblClaims INNER JOIN
tblPatientClaims ON tblClaims.id = tblPatientClaims.id INNER JOIN
tblPatient ON tblPatientClaims.patientid = tblPatient.patientID INNER JOIN
tblPatientVisits ON tblPatient.patientID = tblPatientVisits.patientID
more than one million query rows returned
SELECT tblClaims.id, tblPatient.patientID
FROM tblClaims INNER JOIN
tblPatientClaims ON tblClaims.id = tblPatientClaims.id INNER JOIN
tblPatient ON tblPatientClaims.patientid = tblPatient.patientID
less than one million query rows returned
I think this is crying for a better design. I really think that a visit should be associated with a claim, and that a claim can only be associated with a single patient, so I think the design should be (and eliminating the needless tbl prefix, which is just clutter):
CREATE TABLE dbo.Patients
(
PatientID INT PRIMARY KEY
-- , ... other columns ...
);
CREATE TABLE dbo.Claims
(
ClaimID INT PRIMARY KEY,
PatientID INT NOT NULL FOREIGN KEY
REFERENCES dbo.Patients(PatientID)
-- , ... other columns ...
);
CREATE TABLE dbo.PatientVisits
(
PatientID INT NOT NULL FOREIGN KEY
REFERENCES dbo.Patients(PatientID),
ClaimID INT NULL FOREIGN KEY
REFERENCES dbo.Claims(ClaimID),
VisitDate DATE
-- , ... other columns ...
, PRIMARY KEY (PatientID, ClaimID, VisitDate) -- not convinced on this one (a primary key column can't be NULL, so ClaimID would have to be NOT NULL)
);
There is some redundant information here, but it's not clear from your model whether a patient can have a visit that is not associated with a specific claim, or even whether you know that a visit belongs to a specific claim (this seems like crucial information given the type of query you're after).
In any case, given your current model, one query you might try is:
SELECT c.id, c.claimType
FROM dbo.tblClaims AS c
INNER JOIN dbo.tblPatientClaims AS pc
ON c.id = pc.id
INNER JOIN dbo.tblPatient AS p
ON pc.patientid = p.patientID
-- where exists tells SQL server you don't care how many
-- visits took place, as long as there was at least one:
WHERE EXISTS (SELECT 1 FROM dbo.tblPatientVisits AS pv
WHERE pv.patientID = p.patientID);
This will still return one row for every patient / claim combination, but it should no longer multiply those rows by the number of visits each patient has. Again, it really feels like the design isn't right here. You should also get in the habit of using table aliases - they make your query much easier to read, especially if you insist on the messy tbl prefix. You should also always use the dbo (or whatever schema you use) prefix when creating and referencing objects.
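To see the fan-out and the effect of EXISTS on a small scale, here is a minimal sketch. It uses Python's built-in sqlite3 module rather than SQL Server, so treat it as an illustration only. The table and column names mirror the ones in the question, but the rows are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE tblPatient       (patientID INTEGER PRIMARY KEY);
    CREATE TABLE tblClaims        (id INTEGER PRIMARY KEY, claimType TEXT);
    CREATE TABLE tblPatientClaims (id INTEGER, patientid INTEGER);
    CREATE TABLE tblPatientVisits (patientID INTEGER, admissionDate TEXT);

    -- Made-up data: patient 1 has two claims and three visits,
    -- patient 2 has one claim and one visit.
    INSERT INTO tblPatient VALUES (1), (2);
    INSERT INTO tblClaims VALUES (10, 'A'), (11, 'B'), (12, 'A');
    INSERT INTO tblPatientClaims VALUES (10, 1), (11, 1), (12, 2);
    INSERT INTO tblPatientVisits VALUES
        (1, '2012-01-01'), (1, '2012-03-01'), (1, '2012-06-01'),
        (2, '2012-02-01');
""")

# The three joins shared by both versions of the query.
base = """
    FROM tblClaims AS c
    INNER JOIN tblPatientClaims AS pc ON c.id = pc.id
    INNER JOIN tblPatient AS p ON pc.patientid = p.patientID
"""

# Joining visits repeats each claim once per visit of its patient.
joined_rows = conn.execute(
    "SELECT COUNT(*)" + base +
    "INNER JOIN tblPatientVisits AS pv ON p.patientID = pv.patientID"
).fetchone()[0]

# EXISTS only asks whether at least one visit exists, so no fan-out.
exists_rows = conn.execute(
    "SELECT COUNT(*)" + base +
    "WHERE EXISTS (SELECT 1 FROM tblPatientVisits AS pv"
    "              WHERE pv.patientID = p.patientID)"
).fetchone()[0]

print(joined_rows, exists_rows)  # prints: 7 3
```

With three claims in the table, the plain join returns seven rows because patient 1's two claims are each repeated for three visits, while the EXISTS version returns exactly one row per claim.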
I'm not sure I understand the concept of a claim but I suspect you want to remove the link table between claims and patient and instead make the association between patient visit and a claim.
Would that work out better for you?
I'm trying to determine for a given person how many people have a better score than they do, and group it by the different teams they belong to. So, in the tables below, I'm grabbing the list of team_id from the team_person table where the person_id matches the person I care about. That will get me all of the teams I belong to.
Then I need to know each person_id that is in any team I belong to so that I can find out what their maximum score is from the performances table.
Once I have that, I finally want to determine, for each team_id, how many people on that team have a better score than I do, where better is simply defined as having a larger value.
I've gotten way beyond my abilities with SQL at this point. What I have so far, which seems to get me the maximum score for all the people I care about, (basically everything but my final "by team" requirement) is this:
SELECT person_id, MAX(score) m
FROM performances
WHERE category_id = 7 AND person_id IN (
-- Find all the people on the teams I belong to
SELECT DISTINCT person_id
FROM team_person
WHERE team_id IN (
-- Find all the teams that I belong to
SELECT DISTINCT team_id
FROM team_person
WHERE person_id = 2
)
)
GROUP BY person_id
ORDER BY 2 DESC
My two relevant tables are defined like so, and I'm using psql 9.1.15
Table "public.team_person"
Column | Type | Modifiers
------------+--------------------------+-------------------------------------------------------------
ident | integer | not null default nextval('team_person_ident_seq'::regclass)
team_id | integer | not null
person_id | integer | not null
*chop extraneous columns*
Indexes:
"team_person_pkey" PRIMARY KEY, btree (ident)
"teamPersonUnique" UNIQUE CONSTRAINT, btree (team_id, person_id)
Foreign-key constraints:
"team_person_person_id_fkey" FOREIGN KEY (person_id) REFERENCES person(ident) ON DELETE CASCADE
"team_person_team_id_fkey" FOREIGN KEY (team_id) REFERENCES team(ident) ON DELETE CASCADE
Referenced by:
TABLE "roster" CONSTRAINT "roster_team_person_id_fkey" FOREIGN KEY (team_person_id) REFERENCES team_person(ident) ON DELETE SET NULL
Triggers:
update_team_person_modified BEFORE INSERT OR UPDATE ON team_person FOR EACH ROW EXECUTE PROCEDURE update_modified_column()
Table "public.performances"
Column | Type | Modifiers
-------------+--------------------------+--------------------------------------------------------------
ident | bigint | not null default nextval('performances_ident_seq'::regclass)
category_id | integer | not null
person_id | integer | not null
score | real | not null
*chop extraneous columns*
Indexes:
"performances_pkey" PRIMARY KEY, btree (ident)
Foreign-key constraints:
"performances_category_id_fkey" FOREIGN KEY (category_id) REFERENCES performance_categories(ident) ON DELETE CASCADE
"performances_person_id_fkey" FOREIGN KEY (person_id) REFERENCES person(ident) ON DELETE CASCADE
First, state just the problem, without assumptions about how to get to the solution. You've done that fairly well:
determine for a given person how many people have a better score than they do, and group it by the different teams they belong to.
but I'd rephrase a bit:
For each team a given person is a member of, how many people in that team have a better score than the subject person?
I don't know about you, but it suddenly seems simpler now. Take the team table, left outer join team_person and filter for teams we're a member of, left outer join performances to find games we played with that team, left outer join team_person again to get other people who're members of each team, left outer join performances, filter out teams the subject person isn't a member of, group and aggregate.
It's underspecified for some corner cases (like a team where you're the only member, or a team where you didn't play a game), but eh, whatever.
Problems:
There's no team table. Since you don't care about anything in the team table, you can omit it from the join and just use team_person as the join root.
Your team_person table does have a UNIQUE constraint on (team_id, person_id), which is what it needs. Better still, that pair could be the primary key in place of the surrogate ident column. It doesn't actually matter for this query because duplicate team memberships wouldn't change the result, but it matters for the data model: you can't be a member of a team more than once.
performances should also have a column identifying the particular game or whatever. Since you haven't shown one, I'm going to assume you mean that you're looking for people who, in any game, performed better than the subject person at least once, in that game or another game. If you actually want to find people who did better in a particular game then you need a suitable key on performances.
Fatal problem: performances is also missing a column linking the performance to the team. This makes it impossible to properly solve the problem because you can't get performances by a given person on a given team. I'm going to assume there is in fact a team_id on performances and you just left it out.
So, allowing for the above issues, I'd first acquire the data with a big join, then group and aggregate it. This join will give us, for each team we played in, for each of our performances, for each other player, for each of their other performances, one row with all the relevant information. You can then compare performances and aggregate.
The below is totally untested, since you didn't provide sample data and you chopped important parts out of your schema (or the schema is defective), but I'd try something like:
SELECT
my_performances.team_id,
-- Find how many distinct people scored better than us at least once,
-- no matter how many times or in which game.
COUNT(distinct other_team_person.person_id)
-- Start the join with our team memberships and how we scored in each.
-- If we didn't play any games for this team don't produce a result row
-- for it, so INNER JOIN.
FROM team_person my_team_person
INNER JOIN performances my_performances ON
(my_performances.person_id = my_team_person.person_id
AND my_performances.team_id = my_team_person.team_id)
-- Other members of teams we're also a member of, skipping
-- ourselves. An `INNER JOIN` is fine here because we know
-- a team with only ourselves as a member isn't interesting
-- and we might as well skip it.
INNER JOIN team_person other_team_person ON (
my_team_person.team_id = other_team_person.team_id
AND my_team_person.person_id <> other_team_person.person_id)
-- How each of those people performed in each team they're in
-- (because of previous filter, only considers teams we're in too).
-- INNER JOIN because if they never played they can't beat us.
INNER JOIN performances other_performances ON (
other_team_person.person_id = other_performances.person_id
AND other_team_person.team_id = other_performances.team_id)
-- Make sure `my_team_person` is only teams we're a member of
WHERE my_team_person.person_id = $1
-- Also discard rows where the other person didn't do better than us
AND my_performances.score < other_performances.score
-- Emit one row per team we're a member of
GROUP BY my_performances.team_id;
If you want to show teams where you never played and teams where you're the only player, you'll need to change some INNER JOINs to LEFT OUTER JOINs.
If you want to compare to find people who beat you only within a given game, you're going to need an extra column on performances, then an extra term in the join on other_performances to restrict it to only matching in the same game as my_performances.
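Since the query above is untested, here is a small sketch that runs the same join against made-up rows. It uses Python's built-in sqlite3 module rather than PostgreSQL, so the `$1` placeholder becomes `?`. The team_id column on performances is the assumed column discussed above, not something shown in the question's schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE team_person (team_id INTEGER, person_id INTEGER,
                              UNIQUE (team_id, person_id));
    -- team_id on performances is an assumption, as discussed above.
    CREATE TABLE performances (person_id INTEGER, team_id INTEGER, score REAL);

    INSERT INTO team_person VALUES
        (1, 2), (1, 3), (1, 4), (1, 6), (2, 2), (2, 5);
    INSERT INTO performances VALUES
        (2, 1, 10), (2, 1, 12),   -- subject person on team 1
        (3, 1, 15), (3, 1, 5),    -- beat the subject at least once
        (4, 1, 8),                -- never beat the subject
        (6, 1, 11),               -- beat the subject's score of 10
        (2, 2, 20),               -- subject person on team 2
        (5, 2, 25), (5, 2, 30);   -- beat the subject twice, counted once
""")

query = """
    SELECT my_performances.team_id,
           COUNT(DISTINCT other_team_person.person_id) AS better_people
    FROM team_person AS my_team_person
    INNER JOIN performances AS my_performances
        ON my_performances.person_id = my_team_person.person_id
       AND my_performances.team_id = my_team_person.team_id
    INNER JOIN team_person AS other_team_person
        ON my_team_person.team_id = other_team_person.team_id
       AND my_team_person.person_id <> other_team_person.person_id
    INNER JOIN performances AS other_performances
        ON other_team_person.person_id = other_performances.person_id
       AND other_team_person.team_id = other_performances.team_id
    WHERE my_team_person.person_id = ?
      AND my_performances.score < other_performances.score
    GROUP BY my_performances.team_id
    ORDER BY my_performances.team_id
"""

# Person 2 is the subject person in this made-up data set.
better_by_team = conn.execute(query, (2,)).fetchall()
print(better_by_team)  # prints: [(1, 2), (2, 1)]
```

COUNT(DISTINCT ...) is what keeps person 5 from being counted twice on team 2, even though two of their performances beat the subject's score.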
So, I've a Uni assignment and the lecturer has picked this week to be ill and unable to answer questions.
We've been given a baseball database made up of 4 tables to work with. Table structures are as follows:
TABLENAME:(column1, column2...etc) PK = Primary Key, FK = Foreign Key
PLAYER:(num PK, name, dob, team FK, position)
GAME:(num PK, gamedate, hometeam FK, awayteam FK, homescore,
awayscore)
GAMESTAT:(gamenum PK, playernum FK, homeruns, strikeout)
TEAM:(code PK, name, town, ground)
The aim of this particular question is to obtain the names of the stadiums (ground in the team table), the sum of the home runs scored on that ground, the sum of the strikeouts and then the sum of these two values within a specified date range.
My query and issue are below:
SELECT
t.ground AS GROUNDPLAYED,
SUM(gs.homeruns) as TOTALHOMERUNS,
SUM(gs.strikeouts) AS TOTALSTRIKEOUTS,
SUM(gs.homeruns + gs.strikeouts) AS COMBINEDTOTAL
FROM team t
LEFT OUTER JOIN game g ON g.hometeam = t.code
LEFT OUTER JOIN gamestat gs ON g.num = gs.gamenum
WHERE g.gamedate BETWEEN '7-AUG-2014' AND '13-AUG-2014'
GROUP BY t.ground;
My problem lies in the fact that I get the correct values for games played but, despite using the LEFT OUTER JOIN, I'm not getting all the stadiums to list. I'm convinced it has to do with the fact that I have had to join to the hometeam from the GAME table and it can only pick the home stadiums based on that.
Any help you may be able to offer would be much appreciated.
Move your WHERE clause to the ON clause for the join to gamestat.
Because the filter criteria are in the WHERE clause, they are applied after the join has been performed, which removes the stadiums with no activity. Once this predicate is moved to the appropriate ON clause it will filter the gamestat rows before the join instead of after.
You have experienced the good fortune to encounter this important quirk of SQL, that the positioning of predicates affects the result-set, early in your education.
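A minimal sketch of the difference, run with Python's built-in sqlite3 module rather than the asker's database. The rows are made up, the dates are written in ISO format because SQLite compares them as text, and only two of the sums are kept to keep it short.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE team (code TEXT PRIMARY KEY, ground TEXT);
    CREATE TABLE game (num INTEGER PRIMARY KEY, gamedate TEXT, hometeam TEXT);
    CREATE TABLE gamestat (gamenum INTEGER, playernum INTEGER,
                           homeruns INTEGER, strikeouts INTEGER);

    -- Made-up data: team A hosts a game inside the date range,
    -- team B hosts one outside it, team C hosts no games at all.
    INSERT INTO team VALUES
        ('A', 'North Park'), ('B', 'South Field'), ('C', 'East Oval');
    INSERT INTO game VALUES (1, '2014-08-08', 'A'), (2, '2014-09-01', 'B');
    INSERT INTO gamestat VALUES (1, 100, 2, 3), (1, 101, 1, 4), (2, 200, 5, 5);
""")

select = """
    SELECT t.ground, SUM(gs.homeruns), SUM(gs.strikeouts)
    FROM team t
    LEFT OUTER JOIN game g ON g.hometeam = t.code
"""
date_filter = "g.gamedate BETWEEN '2014-08-07' AND '2014-08-13'"
tail = " GROUP BY t.ground ORDER BY t.ground"

# Filter in WHERE: applied after the joins, so rows whose gamedate is
# NULL or out of range disappear, and their stadiums with them.
where_rows = conn.execute(
    select
    + "LEFT OUTER JOIN gamestat gs ON g.num = gs.gamenum "
    + "WHERE " + date_filter + tail
).fetchall()

# Filter in ON: applied while matching gamestat rows, so every
# stadium survives and unmatched ones simply get NULL sums.
on_rows = conn.execute(
    select
    + "LEFT OUTER JOIN gamestat gs ON g.num = gs.gamenum AND " + date_filter
    + tail
).fetchall()

print(where_rows)  # prints: [('North Park', 3, 7)]
print(on_rows)
```

The second query lists all three grounds. East Oval and South Field come back with NULL sums, which you could wrap in COALESCE if you would rather show zeros.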
I am pulling reports for my company and am needing to pull a specific report that I am having trouble with. We are using SQL Server 2012 and I am pulling the SQL reports.
What I need is to pull a simple report:
Group Name, List of Members in the group; Supervisor of the group.
However, the problem is that the supervisor as well as the members and the group name all come from one table in order to get the relevant information. Currently here is my SQL code below:
Use DATABASE
go
-- This is the select portion deciding the columns needed.
select
C.group_name
,C2.first_name
,C2.last_name
-- These are the tables that the query is pulling from.
FROM db..groups AS G
LEFT OUTER JOIN db..contact AS C
ON G.group_id=C.contact_id
INNER JOIN db..contact AS C2
ON G.member=C2.contact_id
go
This pulls the first portion:
The group name, then the first name of a member in that group, and then the last name of a member in that group.
However, I am having trouble getting the supervisor portion. This portion uses the table db.contact under the column supervisor_id as a foreign key. The supervisor_id uses the same unique id as the normal contact_id, but in the same table. Some contact_ids have supervisor_id's that are other contact_id's from the same table, hence the foreign key.
How can I make it so I can get the contact_id that is equal to the supervisor_id of the contact_id that is equal to the group_id?
Taking a quick stab at this while we wait for details.
You know you need groups and I'm assuming you don't care about Groups that have no members. Thus Groups INNER JOINed to Contact. This generates your direct group membership. To get the supervisor, you then need to factor in the Supervisor on the specific Contact row.
You might not have a boss, or your boss might be yourself. It's always interesting to see how various HR systems record this. In my example, I'm assuming the head reports to no one instead of themselves.
SELECT
G.group_name
, C.first_name
, C.last_name
-- this may produce nulls depending on outer vs inner join below
, CS.first_name AS supervisor_first_name
, CS.last_name AS supervisor_last_name
FROM
dbo.Groups AS G
INNER JOIN
dbo.Contact AS C
ON C.contact_id = G.member
LEFT OUTER JOIN
dbo.Contact AS CS
ON CS.contact_id = C.supervisor_id;
Depending on how exactly you wanted that data reported, there are various tricks we could use to report that data. In particular, GROUPING SETS might come in handy.
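For illustration, here is the same self-join run against a few made-up rows. It is a sketch using Python's built-in sqlite3 module rather than SQL Server, so the dbo schema prefix is dropped, and the table layout is assumed from the question (one Groups row per member).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Contact (contact_id INTEGER PRIMARY KEY, first_name TEXT,
                          last_name TEXT, supervisor_id INTEGER);
    -- Groups is quoted because GROUPS is a keyword in SQLite.
    CREATE TABLE "Groups" (group_name TEXT, member INTEGER);

    -- Made-up data: Ada heads the company and reports to no one.
    INSERT INTO Contact VALUES
        (1, 'Ada', 'Head', NULL),
        (2, 'Bob', 'Lead', 1),
        (3, 'Cy', 'Staff', 2);
    INSERT INTO "Groups" VALUES ('Ops', 1), ('Ops', 2), ('Ops', 3);
""")

rows = conn.execute("""
    SELECT G.group_name, C.first_name, CS.first_name AS supervisor_first_name
    FROM "Groups" AS G
    INNER JOIN Contact AS C ON C.contact_id = G.member
    -- Second copy of Contact, aliased CS, supplies the supervisor's row.
    LEFT OUTER JOIN Contact AS CS ON CS.contact_id = C.supervisor_id
    ORDER BY C.contact_id
""").fetchall()

print(rows)
# prints: [('Ops', 'Ada', None), ('Ops', 'Bob', 'Ada'), ('Ops', 'Cy', 'Bob')]
```

Because the join to CS is a LEFT OUTER JOIN, Ada still appears with a NULL supervisor. An INNER JOIN there would drop her row.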
I use three tables to get to the final result. They are called project_board_members, users and project_team.
This is the query:
SELECT `project_board_members`.`member_id`,
`users`.`name`,
`users`.`surname`,
`users`.`country`,
`project_team`.`tasks_completed`
FROM `project_board_members`
JOIN `users`
ON (`users`.`id` = `project_board_members`.`member_id`)
JOIN `project_team`
ON (`project_team`.`user_id` = `project_board_members`.`member_id`)
WHERE `project_board_members`.`project_id` = '5'
You can ignore the last line because it just points to the project I'm using.
Table project_board_members holds three entries and has a structure like:
id,
member_id,
project_id,
created_at;
I need to get member_id from that table. Then I join to users table to get name, surname and country. No problems. All works! :)
After that, I needed to get tasks_completed for each user. That is stored in project_team table. The big unexpected thing is that I got four entries returned and the big what-the-f*ck is that in the project_board_members table are only three entries.
Why is that so? Thanks in advance!
A SQL join creates a result set that contains one row for each combination of the left and right tables that matches the join conditions. Without seeing the data or a little more information it's hard to say what exactly is wrong from what you expect, but I'm guessing it's one of the following:
1) You have two entries in project_team with the same user_id.
2) Your entries in project_team store both user_id and project_id and you need to be joining on both of them rather than just user_id.
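Here is a small sketch of how either cause produces the extra row, using Python's built-in sqlite3 module instead of MySQL. The rows are made up, and the project_id column on project_team is an assumption taken from cause 2, since the question doesn't show that table's structure.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE project_board_members (member_id INTEGER, project_id INTEGER);
    -- project_id on project_team is an assumption (cause 2 above).
    CREATE TABLE project_team (user_id INTEGER, project_id INTEGER,
                               tasks_completed INTEGER);

    INSERT INTO project_board_members VALUES (1, 5), (2, 5), (3, 5);
    -- User 3 also works on project 6, so user_id alone matches twice.
    INSERT INTO project_team VALUES (1, 5, 4), (2, 5, 7), (3, 5, 2), (3, 6, 9);
""")

# Join on user_id only: user 3 picks up both project_team rows.
by_user = conn.execute("""
    SELECT pbm.member_id, pt.tasks_completed
    FROM project_board_members pbm
    JOIN project_team pt ON pt.user_id = pbm.member_id
    WHERE pbm.project_id = 5
""").fetchall()

# Join on user_id and project_id: one row per board member again.
by_user_and_project = conn.execute("""
    SELECT pbm.member_id, pt.tasks_completed
    FROM project_board_members pbm
    JOIN project_team pt ON pt.user_id = pbm.member_id
                        AND pt.project_id = pbm.project_id
    WHERE pbm.project_id = 5
    ORDER BY pbm.member_id
""").fetchall()

print(len(by_user), by_user_and_project)
# prints: 4 [(1, 4), (2, 7), (3, 2)]
```

Three board members turn into four result rows as soon as one of them has two matching project_team rows, which is the symptom described in the question.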
The table project_board_members represents what is called in the Entity-Relationship modelling world an "associative entity". It exists to implement a many-to-many relationship (in this case, between the project and user entities). As such it is a dependent entity, which is to say that the existence of an instance of it is predicated on the existence of an instance of each of the entities to which it refers (a user and a project).
As a result, the columns comprising the foreign keys relating to those entities (member_id and project_id) must form part or all of the primary key.
Normally, instances of an associative entity are unique WRT the entities to which it relates. In your case the relationship definitions would be:
Each user is seated on the board of 0-to-many projects;
Each project's board is comprised of 0-to-many users
which is to say that a particular user may not be on the board of a particular project more than once. The only reason for adding other columns (such as your id column) to the primary key would be if the user:project relationship is non-unique.
To enforce this rule -- a user may sit on the board of a particular project just once -- the table schema should look like this:
create table project_board_member
(
member_id int not null foreign key references user ( user_id ) ,
project_Id int not null foreign key references project ( project_id ) ,
created_at ...
...
primary key ( member_id , project_id )
)
The id column is superfluous.
For debugging purposes do
SELECT GROUP_CONCAT(pbm.member_id) AS member_ids,
GROUP_CONCAT(u.name) as names,
GROUP_CONCAT(u.surname) as surnames,
GROUP_CONCAT(u.country) as countries,
GROUP_CONCAT(pt.tasks_completed) as tasks
FROM project_board_members pbm
JOIN users u
ON (u.id = pbm.member_id)
JOIN project_team pt
ON (pt.user_id = pbm.member_id)
WHERE pbm.project_id = '5'
GROUP BY pbm.member_id
All the fields that list multiple entries in the result are messing up the rowcount in your resultset.
To fix that you can do:
SELECT pbm.member_id,
u.name,
u.surname,
u.country,
pt.tasks_completed
FROM (SELECT
p.project_id, p.member_id
FROM project_board_members p
WHERE p.project_id = '5'
LIMIT 1
) AS pbm
JOIN users u
ON (u.id = pbm.member_id)
JOIN project_team pt
ON (pt.user_id = pbm.member_id)
Scenario: A sampling survey needs to be performed on a membership of 20,000 individuals. The survey sample size is 3,500 of the total 20,000 members. All individual members are in table tblMember. The same survey was performed the previous year, and the members who were surveyed are in tblSurvey08. Membership data can change over the year (e.g. new email address, etc.) but the MemberID data stays the same.
How do I remove the MemberID/records contained in tblSurvey08 from tblMember to create a new table of potential members to be surveyed (let's call it tblPotentialSurvey09)? Again, the record for an individual member may not match across the different tables, but the MemberID field will remain constant.
I am fairly new at this stuff but I seem to be having a problem Googling a solution - I could use the EXCEPT function but the records for the individual members are not necessarily the same from one table to the next - just the MemberID may be the same.
Thanks
SELECT
* -- replace with column list
FROM
tblMember m
LEFT JOIN
tblSurvey08 s08
ON m.MemberID = s08.MemberID
WHERE
s08.MemberID IS NULL
will give you only members not in the 08 survey. This join is usually at least as efficient as a NOT IN construct, although the difference depends on the database and its optimizer.
A new table is not such a great idea, since you are duplicating data. A view with the above query would be a better choice.
I apologize in advance if I didn't understand your question but I think this is what you're asking for. You can use the insert into statement.
insert into tblPotentialSurvey09
select your_criteria from tblMember where tblMember.MemberId not in (
select MemberId from tblSurvey08
)
First of all, I wouldn't create a new table just for selecting potential members. Instead, I would create a new true/false (1/0) field telling if they are eligible.
However, if you'd still want to copy data to the new table, here's how you can do it:
INSERT INTO tblPotentialSurvey09 (MemberID)
SELECT MemberID
FROM tblMember m
WHERE NOT EXISTS (SELECT 1 FROM tblSurvey08 s WHERE s.MemberID = m.MemberID)
If you just want to create a new field as I suggested, a similar query would do the job.
An outer join should do:
select m_09.MemberID
from tblMember m_09 left outer join
tblSurvey08 m_08 on m_09.MemberID = m_08.MemberID
where
m_08.MemberID is null
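The three approaches suggested in this thread (outer join with an IS NULL check, NOT EXISTS, and NOT IN) can be compared side by side. This is a sketch using Python's built-in sqlite3 module with made-up member IDs, not the asker's actual database. It also shows one case where they differ: a NULL MemberID in tblSurvey08, which is a hypothetical situation not mentioned in the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE tblMember (MemberID INTEGER PRIMARY KEY);
    CREATE TABLE tblSurvey08 (MemberID INTEGER);
    -- Made-up data: five members, two of them surveyed last year.
    INSERT INTO tblMember VALUES (1), (2), (3), (4), (5);
    INSERT INTO tblSurvey08 VALUES (2), (4);
""")

queries = {
    "left_join": """SELECT m.MemberID FROM tblMember m
                    LEFT JOIN tblSurvey08 s ON m.MemberID = s.MemberID
                    WHERE s.MemberID IS NULL ORDER BY m.MemberID""",
    "not_exists": """SELECT m.MemberID FROM tblMember m
                     WHERE NOT EXISTS (SELECT 1 FROM tblSurvey08 s
                                       WHERE s.MemberID = m.MemberID)
                     ORDER BY m.MemberID""",
    "not_in": """SELECT m.MemberID FROM tblMember m
                 WHERE m.MemberID NOT IN (SELECT MemberID FROM tblSurvey08)
                 ORDER BY m.MemberID""",
}

def run_all():
    """Run every variant and return {name: [MemberID, ...]}."""
    return {name: [row[0] for row in conn.execute(sql)]
            for name, sql in queries.items()}

before = run_all()   # all three agree: members 1, 3 and 5

# Hypothetical bad row: a NULL MemberID in last year's survey table.
conn.execute("INSERT INTO tblSurvey08 VALUES (NULL)")
after = run_all()    # NOT IN now returns no rows at all

print(before)
print(after)
```

With clean data all three return members 1, 3 and 5. Once the subquery can return a NULL, NOT IN returns nothing, because a comparison with NULL is never true, so the outer join or NOT EXISTS forms are the safer habit if MemberID is nullable in the survey table.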