I am trying to write a query to de-identify one of my tables. To make distinct IDs for people, I used name, age and sex. However, the data in my main table has been collected for years, and the sex code changed from 1 meaning male and 2 meaning female to M meaning male and F meaning female. To make this uniform in my distinct individuals table, I used a crosswalk table to convert the sex code into the correct format before placing it into the distinct patients table.
I am now trying to write the query to match the distinct patient IDs to the correct rows from the main table. The issue is that the sex code for some rows has now been changed. I know I could use an UPDATE statement on my main table and change all of the 1 and 2 values to M and F. However, I was wondering if there is a way to match the old sex codes to the new ones so I would not have to make the update. I did not know if there was a way to join the main and distinct IDs tables in the query while using the sex code table to convert the sex codes again. Below are the example tables I am currently using.
This is my main table that I want to de-identify
----------------------------
| Name | age | sex | Toy |
----------------------------
| Stacy| 30 | 1 | Bat |
| Sue | 21 | 2 | Ball |
| Jim | 25 | 1 | Ball |
| Stacy| 30 | M | Ball |
| Sue | 21 | F | glove |
| Stacy| 18 | F | glove |
----------------------------
Sex code crosswalk table
-------------------
| SexOld | SexNew |
-------------------
| M | M |
| F | F |
| 1 | M |
| 2 | F |
-------------------
This is the table I used to populate IDs for people I found to be distinct in my main table
--------------------------
| ID | Name | age | sex |
--------------------------
| 1 | Stacy| 30 | M |
| 2 | Jim | 25 | M |
| 3 | Stacy| 18 | F |
| 4 | Sue | 21 | F |
--------------------------
This is what I want my de-identified table to look like
---------------
| ID | Toy |
---------------
| 1 | Bat |
| 4 | Ball |
| 2 | Ball |
| 1 | Ball |
| 4 | glove |
| 3 | glove |
---------------
select c.ID, a.Toy
from maintable a
left join sexcodecrosswalk b on b.sexold = a.sex
left join peopleids c on c.Name = a.Name and c.age = a.age and c.Sex = b.sexNew
Here's a demonstration that this works:
http://sqlfiddle.com/#!3/a2d26/1
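In case the fiddle link is unavailable, here is a minimal sketch that replays the same join against the sample data from the question. It uses Python's built-in sqlite3 module with an in-memory database, which is an assumption made only for illustration (the question does not say which database is in use); the table names follow the query above.

```python
import sqlite3

# In-memory SQLite stands in for the asker's database (an assumption).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE maintable (Name TEXT, age INTEGER, sex TEXT, Toy TEXT);
CREATE TABLE sexcodecrosswalk (SexOld TEXT, SexNew TEXT);
CREATE TABLE peopleids (ID INTEGER, Name TEXT, age INTEGER, sex TEXT);

INSERT INTO maintable VALUES
  ('Stacy', 30, '1', 'Bat'),  ('Sue', 21, '2', 'Ball'),
  ('Jim',   25, '1', 'Ball'), ('Stacy', 30, 'M', 'Ball'),
  ('Sue',   21, 'F', 'glove'), ('Stacy', 18, 'F', 'glove');
INSERT INTO sexcodecrosswalk VALUES
  ('M', 'M'), ('F', 'F'), ('1', 'M'), ('2', 'F');
INSERT INTO peopleids VALUES
  (1, 'Stacy', 30, 'M'), (2, 'Jim', 25, 'M'),
  (3, 'Stacy', 18, 'F'), (4, 'Sue', 21, 'F');
""")

# Translate the old sex code through the crosswalk, then match on
# name, age and the translated code. rowid keeps the original row order.
rows = conn.execute("""
SELECT c.ID, a.Toy
FROM maintable a
LEFT JOIN sexcodecrosswalk b ON b.SexOld = a.sex
LEFT JOIN peopleids c
  ON c.Name = a.Name AND c.age = a.age AND c.sex = b.SexNew
ORDER BY a.rowid
""").fetchall()

for row in rows:
    print(row)
```

Because both joins are LEFT JOINs, a main-table row whose sex code is missing from the crosswalk would still appear, with a NULL ID, which makes unmapped codes easy to spot.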
I'm trying to query data from the population table through the members table,
using only the first row that appears in the members table for each state id "sid",
without also counting the other members rows that share the same sid.
I want to get the male and female total for each state using sid, but when I run my query I get the total of all records
from the population table.
Example:
male_child(20) + female_child(70) for sid = 1
male_child(10) + female_child(12) for sid = 3
total = 112
Here is my sql query :
SELECT sum(p.number)as total FROM population p
JOIN members m ON p.mid = m.mid
states
+-----+------+
| sid | name |
+-----+------+
| 11  | A    |
| 23  | B    |
+-----+------+
members
+-----+-----+-----------+
| mId | sid | date      |
+-----+-----+-----------+
| 1   | 11  | 10-2-2021 |
| 2   | 11  | 15-2-2021 |
| 3   | 23  | 12-2-2021 |
| 4   | 23  | 16-2-2021 |
+-----+-----+-----------+
population
+-----+-----+--------+-------+--------+
| pid | mid | gender | type  | number |
+-----+-----+--------+-------+--------+
| 1   | 1   | male   | child | 20     |
| 2   | 1   | female | child | 50     |
| 3   | 2   | male   | child | 20     |
| 4   | 2   | female | child | 20     |
| 5   | 3   | male   | child | 10     |
| 6   | 3   | female | child | 12     |
| 7   | 4   | female | child | 30     |
| 8   | 4   | female | child | 25     |
+-----+-----+--------+-------+--------+
Expected result: take the first `members` row for each sid (sid 11 on row 1 and sid 23 on
row 3 of the members table), then sum their population.
That will be (20 + 50) + (10 + 12) = 92
You haven't actually shown what your desired results are, but I suspect you just need to group by the relevant columns
select
s.name,
p.gender,
Sum(p.number) as total
from population p
join members m on p.mid = m.mid
join states s on s.sid = m.sid
group by s.name, p.gender
Edit
To get the desired total, take the minimum member id (mid) for each state (sid) and sum the population rows for those member ids
select Sum(number) as Total
from population
where mid in (select Min(mid) from members group by sid)
Example DB Fiddle
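As a quick check of the query from the edit, here is a small sketch that loads the question's sample rows into an in-memory SQLite database through Python's sqlite3 module. SQLite is only a stand-in here (the question does not name the database), and the per-state breakdown in the second query is an extra illustration that is not part of the original answer.

```python
import sqlite3

# In-memory SQLite stands in for the asker's database (an assumption).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE members (mid INTEGER, sid INTEGER, date TEXT);
CREATE TABLE population (pid INTEGER, mid INTEGER, gender TEXT,
                         type TEXT, number INTEGER);
INSERT INTO members VALUES
  (1, 11, '10-2-2021'), (2, 11, '15-2-2021'),
  (3, 23, '12-2-2021'), (4, 23, '16-2-2021');
INSERT INTO population VALUES
  (1, 1, 'male', 'child', 20),   (2, 1, 'female', 'child', 50),
  (3, 2, 'male', 'child', 20),   (4, 2, 'female', 'child', 20),
  (5, 3, 'male', 'child', 10),   (6, 3, 'female', 'child', 12),
  (7, 4, 'female', 'child', 30), (8, 4, 'female', 'child', 25);
""")

# Grand total over the first member (lowest mid) of every state.
total = conn.execute("""
SELECT SUM(number)
FROM population
WHERE mid IN (SELECT MIN(mid) FROM members GROUP BY sid)
""").fetchone()[0]

# Same filter, broken down per state (illustrative extension).
per_state = conn.execute("""
SELECT m.sid, SUM(p.number)
FROM population p
JOIN members m ON m.mid = p.mid
WHERE m.mid IN (SELECT MIN(mid) FROM members GROUP BY sid)
GROUP BY m.sid
ORDER BY m.sid
""").fetchall()

print(total)      # 92
print(per_state)  # [(11, 70), (23, 22)]
```

Note that this treats "first" as the lowest mid per sid. If "first" should mean the earliest date instead, the subquery would need to pick the member by date rather than by MIN(mid).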
Say, I have a table like the following called "name":
| nid | name |
----------------
| 1 | john |
| 2 | mike |
| 3 | tom |
| 4 | jack |
| 5 | will |
| 6 | david | ...
and another table like the following called "relation_father_son":
| rid | fnid | snid |
---------------------
| 1 | 1 | 2 |
| 2 | 1 | 3 |
| 3 | 4 | 5 |
| 4 | 2 | 6 | ...
then I would like a result like the following:
| father | son |
------------------
| john | mike |
| john | tom |
| jack | will |
| mike | david | ...
What should the SQL query be?
The query would be:
SELECT
f.name AS father,
s.name AS son
FROM relation_father_son
INNER JOIN name AS f
ON (f.nid = fnid)
INNER JOIN name AS s
ON (s.nid = snid)
First of all, it is confusing that the first table is itself called name. Consider renaming it to something more descriptive, such as family_names. Read more at: Is name a reserved word in MySQL?
For the desired result, you can use the following query:
SELECT
(SELECT `name` FROM `family_names` WHERE nid=fnid) AS father,
(SELECT `name` FROM `family_names` WHERE nid=snid) AS son
FROM relation_father_son
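To see the double join in action, here is a minimal sketch using Python's sqlite3 module with the sample rows from the question. SQLite is an assumption made for the demo, and the table is called family_names as suggested above. The key point is that the names table is joined twice under two aliases, and every nid reference is qualified with its alias so it is not ambiguous.

```python
import sqlite3

# In-memory SQLite stands in for the asker's database (an assumption).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE family_names (nid INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE relation_father_son (rid INTEGER PRIMARY KEY,
                                  fnid INTEGER, snid INTEGER);
INSERT INTO family_names VALUES
  (1, 'john'), (2, 'mike'), (3, 'tom'),
  (4, 'jack'), (5, 'will'), (6, 'david');
INSERT INTO relation_father_son VALUES
  (1, 1, 2), (2, 1, 3), (3, 4, 5), (4, 2, 6);
""")

# Alias f resolves the father id, alias s resolves the son id.
pairs = conn.execute("""
SELECT f.name AS father, s.name AS son
FROM relation_father_son r
INNER JOIN family_names AS f ON f.nid = r.fnid
INNER JOIN family_names AS s ON s.nid = r.snid
ORDER BY r.rid
""").fetchall()

for father, son in pairs:
    print(father, son)
```

The join form and the correlated-subquery form return the same rows for this data. They differ when an id has no match: the inner joins drop that relation row, while the subquery version keeps it with a NULL name.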
I've joined up my tables such that every entry is unique and I want to get a COUNT() value for how many unique courses the teachers teach. I figured I would make a table of distinct courses then do a count based on the teacher's id, however this doesn't account for teachers who taught no courses and I wish to return a zero value in this case. How do I go about getting these zero values?
Table for reference:
id | name | course_id | sec_id | semester | year
-------+------------+-----------+--------+----------+------
33456 | A | | | |
10101 | B | CS-101 | 1 | Fall | 2009
76766 | C | BIO-301 | 1 | Summer | 2010
12121 | D | FIN-201 | 1 | Spring | 2010
10101 | B | CS-347 | 1 | Fall | 2009
76543 | E | | | |
83821 | F | CS-319 | 2 | Spring | 2010
83821 | F | CS-190 | 2 | Spring | 2009
98345 | G | EE-181 | 1 | Spring | 2009
10101 | B | CS-315 | 1 | Spring | 2010
22222 | H | PHY-101 | 1 | Fall | 2009
45565 | I | CS-101 | 1 | Spring | 2010
15151 | J | MU-199 | 1 | Spring | 2010
32343 | K | HIS-351 | 1 | Spring | 2010
83821 | F | CS-190 | 1 | Spring | 2009
45565 | I | CS-319 | 1 | Spring | 2010
76766 | C | BIO-101 | 1 | Summer | 2009
58583 | L | | | |
ps. I believe I am using PostgreSQL.
[EDIT] The expected result is a table of ids, names, and a number showing how many courses the teacher has taught, including 0 if they have not taught any course.
[EDIT 2] I only need a query on this table, all the other work is done. If there is no value for course_id, sec_id, semester, year then that teacher has not taught a course (in the case of teachers A, E, and L; who would have a count of 0). I only need a way to count these courses, nothing else.
let's assume the table name is t:
select distinct count(course_id) filter (where course_id is not null) over (partition by id,name),id, name
from t
order by name;
count | id | name
-------+-------+--------------
0 | 33456 | A
3 | 10101 | B
2 | 76766 | C
1 | 12121 | D
0 | 76543 | E
3 | 83821 | F
1 | 98345 | G
1 | 22222 | H
2 | 45565 | I
1 | 15151 | J
1 | 32343 | K
0 | 58583 | L
(12 rows)
https://www.postgresql.org/docs/current/static/sql-expressions.html
I believe this query should do the job:
SELECT COUNT(DISTINCT course_id), t.id, name FROM courses r Right Outer JOIN teachers t ON (r.id=t.id) group by t.id, name;
I assumed that not all teachers were in the table you provided, hence the need for the outer join. Tested on a similar database on Oracle.
Join against a subquery that computes each teacher's course count
-- your_table is a placeholder for the joined table shown in the question
select t.name, coalesce(c.course_count, 0) course_count
from your_table t left join (
select name, count(distinct course_id) course_count
from your_table
group by name
) c on c.name = t.name
group by t.name, c.course_count
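Since the joined table already contains a row with NULL course columns for every teacher without courses, a plain GROUP BY may be enough. The sketch below is one way to check that, assuming the table is called t (as in the first answer) and using Python's sqlite3 module as a stand-in for PostgreSQL. It relies on COUNT(DISTINCT course_id) skipping NULLs, which is standard SQL behaviour.

```python
import sqlite3

# In-memory SQLite stands in for PostgreSQL here (an assumption).
conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE t (id INTEGER, name TEXT, course_id TEXT,
                                sec_id INTEGER, semester TEXT, year INTEGER)""")
sample = [
    (33456, "A", None, None, None, None),
    (10101, "B", "CS-101", 1, "Fall", 2009),
    (76766, "C", "BIO-301", 1, "Summer", 2010),
    (12121, "D", "FIN-201", 1, "Spring", 2010),
    (10101, "B", "CS-347", 1, "Fall", 2009),
    (76543, "E", None, None, None, None),
    (83821, "F", "CS-319", 2, "Spring", 2010),
    (83821, "F", "CS-190", 2, "Spring", 2009),
    (98345, "G", "EE-181", 1, "Spring", 2009),
    (10101, "B", "CS-315", 1, "Spring", 2010),
    (22222, "H", "PHY-101", 1, "Fall", 2009),
    (45565, "I", "CS-101", 1, "Spring", 2010),
    (15151, "J", "MU-199", 1, "Spring", 2010),
    (32343, "K", "HIS-351", 1, "Spring", 2010),
    (83821, "F", "CS-190", 1, "Spring", 2009),
    (45565, "I", "CS-319", 1, "Spring", 2010),
    (76766, "C", "BIO-101", 1, "Summer", 2009),
    (58583, "L", None, None, None, None),
]
conn.executemany("INSERT INTO t VALUES (?, ?, ?, ?, ?, ?)", sample)

# COUNT(DISTINCT ...) ignores NULLs, so teachers with no course get 0.
counts = conn.execute("""
SELECT id, name, COUNT(DISTINCT course_id) AS courses
FROM t
GROUP BY id, name
ORDER BY name
""").fetchall()

for row in counts:
    print(row)
```

One difference from the window-function output shown earlier: teacher F comes out as 2 here rather than 3, because CS-190 appears in two sections and DISTINCT counts it once. Drop the DISTINCT if every section should count separately.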
The database I'm working on is DB2 and I have a problem similar to the following scenario:
Table Structure
-------------------------------
| Teacher Seating Arrangement |
-------------------------------
| PK | seat_argmt_id |
| | teacher_id |
-------------------------------
-----------------------------
| Seating Arrangement |
-----------------------------
|PK FK | seat_argmt_id |
|PK | Row_num |
|PK | seat_num |
|PK | child_name |
-----------------------------
Table Data
------------------------------
| Teacher Seating Arrangement|
------------------------------
| seat_argmt_id | teacher_id |
| 1 | 1 |
| 2 | 1 |
| 3 | 1 |
| 4 | 1 |
| 5 | 2 |
------------------------------
---------------------------------------------------
| Seating Arrangement |
---------------------------------------------------
| seat_argmt_id | row_num | seat_num | child_name |
| 1 | 1 | 1 | Abe |
| 1 | 1 | 2 | Bob |
| 1 | 1 | 3 | Cat |
| | | | |
| 2 | 1 | 1 | Abe |
| 2 | 1 | 2 | Bob |
| 2 | 1 | 3 | Cat |
| | | | |
| 3 | 1 | 1 | Abe |
| 3 | 1 | 2 | Cat |
| 3 | 1 | 3 | Bob |
| | | | |
| 4 | 1 | 1 | Abe |
| 4 | 1 | 2 | Bob |
| 4 | 1 | 3 | Cat |
| 4 | 2 | 2 | Dan |
---------------------------------------------------
I want to see where there are duplicate seating arrangements for a teacher. And by duplicates I mean where the row_num, seat_num, and child_name are the same among different seat_argmt_id for one teacher_id. So with the data provided above, only seat id 1 and 2 are what I would want to pull back, as they are duplicates on everything but the seat id. If all the children on the 2nd table are exact (sans the primary & foreign key, which is seat_argmt_id in this case), I want to see that.
My initial thought was to do a count(*) group by row#, seat#, and child. Everything with a count of > 1 would mean it's a dupe and = 1 would mean it's unique. That logic only works if you are comparing single rows though. I need to compare multiple rows. I cannot figure out a way to do it via SQL. The solution I have involves going outside of SQL and works (probably). I'm just wondering if there is a way to do it in DB2.
Does this do what you want?
select d.teacher_id, sa.row_num, sa.seat_num, sa.child_name
from seatingarrangement sa join
data d
on sa.seat_argmt_id = d.seat_argmt_id
group by d.teacher_id, sa.row_num, sa.seat_num, sa.child_name
having count(*) > 1;
EDIT:
If you want to find two arrangements that are the same:
select sa1.seat_argmt_id, sa2.seat_argmt_id
from seatingarrangement sa1 join
seatingarrangement sa2
on sa1.seat_argmt_id < sa2.seat_argmt_id and
sa1.row_num = sa2.row_num and
sa1.seat_num = sa2.seat_num and
sa1.child_name = sa2.child_name
group by sa1.seat_argmt_id, sa2.seat_argmt_id
having count(*) = (select count(*) from seatingarrangement sa where sa.seat_argmt_id = sa1.seat_argmt_id) and
count(*) = (select count(*) from seatingarrangement sa where sa.seat_argmt_id = sa2.seat_argmt_id);
This finds the matches between two arrangements and then verifies that the counts are correct.
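Here is a rough sketch of that second query run against the sample data, using Python's sqlite3 module in place of DB2 (an assumption, so DB2-specific behaviour is not covered). The table names seatingarrangement and data follow the queries above. The two extra joins on data, which restrict matches to arrangements of the same teacher, are my addition to fit the question and are not part of the query as posted.

```python
import sqlite3

# In-memory SQLite stands in for DB2 (an assumption).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE data (seat_argmt_id INTEGER PRIMARY KEY, teacher_id INTEGER);
CREATE TABLE seatingarrangement (seat_argmt_id INTEGER, row_num INTEGER,
                                 seat_num INTEGER, child_name TEXT);
INSERT INTO data VALUES (1, 1), (2, 1), (3, 1), (4, 1), (5, 2);
INSERT INTO seatingarrangement VALUES
  (1, 1, 1, 'Abe'), (1, 1, 2, 'Bob'), (1, 1, 3, 'Cat'),
  (2, 1, 1, 'Abe'), (2, 1, 2, 'Bob'), (2, 1, 3, 'Cat'),
  (3, 1, 1, 'Abe'), (3, 1, 2, 'Cat'), (3, 1, 3, 'Bob'),
  (4, 1, 1, 'Abe'), (4, 1, 2, 'Bob'), (4, 1, 3, 'Cat'), (4, 2, 2, 'Dan');
""")

# Pair up arrangements seat by seat, then keep only the pairs where the
# number of matching seats equals the full size of BOTH arrangements.
duplicates = conn.execute("""
SELECT sa1.seat_argmt_id, sa2.seat_argmt_id
FROM seatingarrangement sa1
JOIN seatingarrangement sa2
  ON sa1.seat_argmt_id < sa2.seat_argmt_id
 AND sa1.row_num = sa2.row_num
 AND sa1.seat_num = sa2.seat_num
 AND sa1.child_name = sa2.child_name
JOIN data d1 ON d1.seat_argmt_id = sa1.seat_argmt_id
JOIN data d2 ON d2.seat_argmt_id = sa2.seat_argmt_id
            AND d2.teacher_id = d1.teacher_id   -- same teacher only (added)
GROUP BY sa1.seat_argmt_id, sa2.seat_argmt_id
HAVING COUNT(*) = (SELECT COUNT(*) FROM seatingarrangement sa
                   WHERE sa.seat_argmt_id = sa1.seat_argmt_id)
   AND COUNT(*) = (SELECT COUNT(*) FROM seatingarrangement sa
                   WHERE sa.seat_argmt_id = sa2.seat_argmt_id)
""").fetchall()

print(duplicates)  # [(1, 2)]
```

Arrangement 4 shares all three seats with arrangements 1 and 2 but has a fourth row, so the second count check rejects it. Arrangement 3 has the same children in different seats, so only one seat matches and it is rejected as well.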
I have 3 tables of data: A, AC and C. Table AC simply connects A and C together with foreign keys. Currently every row in AC is a single entry that creates one of these connections. My question is whether it makes sense to convert all the single entries in AC into one row per A entry that holds an array.
The connections between A and C are one-to-many, so the array length is basically 1..x. The mean number of entries in table AC for each A entry is around 6, so switching to arrays would reduce the number of rows in AC significantly.
Or should I instead remove AC and simply add the FK field to the A table?
What are the pitfalls when using arrays in this use case? Will I have to use JSON-style entries in an array: ['blah','blah2','blah3']?
The example below explains the structure of the database, while values are bogus:
Table A
| id | name |
---------------------
| 1 | John |
| 2 | Jim |
| 3 | Joe |
Table AC
| id | id_a | id_c |
---------------------------------
| 1 | 1 | 1 |
| 2 | 1 | 4 |
| 3 | 2 | 2 |
| 4 | 3 | 3 |
| 5 | 3 | 1 |
Table C
| id | name |
---------------------
| 1 | Pie |
| 2 | Cake |
| 3 | Burger |
| 4 | Ice |