SQL populating table with calculation from other tables - sql

I have two tables:
grades_table = has 100 rows of students with grades i.e.
Student Mark
Joe 64
Mark 50
percentage_table = a table which has a percentage to a grade i.e.
Grade Mark
A 70+
B 60+
C 50+
Is it possible in SQL to populate a new table called Overall, with the grade calculation? For example,
Student Grade
Joe B
Mark C
I've looked at IN and BETWEEN, but I can't quite understand how to calculate the ranges. I'm new to SQL, and any help or a pointer in the right direction would be great.

Split the percentage table's mark column into two columns, markmin and markmax (give the top band an upper bound such as 100, because BETWEEN never matches a NULL bound). Then JOIN the tables:
select g.student, p.grade
from grades_table g
join percentage_table p ON g.mark between p.markmin and p.markmax
Or, following the later suggestion ("If I changed the scenario slightly, and only had one mark column in the percentage_table, so only 60+, 70+"):
Here I assume (A, 70) means 70 and above is grade A. Join each student to every threshold their mark reaches, then do a GROUP BY with MIN to find the "lowest" grade alphabetically, which is the best grade earned:
select g.student, min(p.grade)
from grades_table g
join percentage_table p ON g.mark >= p.mark
group by g.student
Alternatively, use a correlated sub-select:
select g.student, (select min(p.grade) from percentage_table p
where g.mark >= p.mark)
from grades_table g
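As a quick check of the single-threshold variant, here is a minimal sketch that runs it in an in-memory SQLite database through Python's sqlite3 module. The lower-case column names and the integer mark column (70 instead of the text "70+") are assumptions made for the example; the sample rows are the ones from the question.

```python
import sqlite3

def grade_students():
    """Assign each student the best grade whose threshold their mark reaches."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE grades_table (student TEXT, mark INTEGER);
        CREATE TABLE percentage_table (grade TEXT, mark INTEGER);  -- mark = minimum mark for the grade
        INSERT INTO grades_table VALUES ('Joe', 64), ('Mark', 50);
        INSERT INTO percentage_table VALUES ('A', 70), ('B', 60), ('C', 50);
    """)
    # Each student joins to every threshold at or below their mark;
    # MIN(grade) then picks the alphabetically first (best) of those grades.
    return conn.execute("""
        SELECT g.student, MIN(p.grade) AS grade
        FROM grades_table g
        JOIN percentage_table p ON g.mark >= p.mark
        GROUP BY g.student
        ORDER BY g.student
    """).fetchall()

print(grade_students())  # [('Joe', 'B'), ('Mark', 'C')]
```

Note that this relies on better grades sorting first alphabetically; with grade labels that do not sort that way, pick the row with the highest matching threshold instead.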

I would suggest having two separate columns in your percentage_table:
Grade | MarkMin | MarkMax
A | 70 | Null
B | 60 | 69
C | 50 | 59
Then you can just use a JOIN:
SELECT g.student, p.Grade
FROM
grades_table AS g LEFT JOIN percentage_table AS p
ON (g.Mark >= p.MarkMin OR p.MarkMin IS NULL)
AND (g.Mark<=p.MarkMax OR p.MarkMax IS NULL)
70+ which means >= 70 can then be inserted as (70, Null).
If you want to have a <=49 then you can add a mark as (Null, 49).
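The question also asks how to populate a new Overall table. A minimal sketch, assuming the MarkMin/MarkMax layout above and using INSERT ... SELECT in an in-memory SQLite database (the lower-case table and column names are assumptions for the example):

```python
import sqlite3

def build_overall():
    """Fill the overall table from a range join and return its rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE grades_table (student TEXT, mark INTEGER);
        CREATE TABLE percentage_table (grade TEXT, markmin INTEGER, markmax INTEGER);
        CREATE TABLE overall (student TEXT, grade TEXT);
        INSERT INTO grades_table VALUES ('Joe', 64), ('Mark', 50);
        INSERT INTO percentage_table VALUES ('A', 70, NULL), ('B', 60, 69), ('C', 50, 59);

        -- A NULL bound means "open-ended" on that side of the range.
        INSERT INTO overall (student, grade)
        SELECT g.student, p.grade
        FROM grades_table g
        JOIN percentage_table p
          ON (g.mark >= p.markmin OR p.markmin IS NULL)
         AND (g.mark <= p.markmax OR p.markmax IS NULL);
    """)
    return conn.execute(
        "SELECT student, grade FROM overall ORDER BY student"
    ).fetchall()

print(build_overall())  # [('Joe', 'B'), ('Mark', 'C')]
```

The sketch uses an inner JOIN, so a student whose mark falls in no band is simply not inserted; with the LEFT JOIN from the answer such a student would be inserted with a NULL grade.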

Related

R) Using join in R

Given database is down below,
> dbReadTable(jamesdb, "EMPLOYEE")
EMP_NO NI_NO NAME AGE DEPT_NO
1 E1 123 SMITH 21 D1
2 E2 159 SMITH 31 D1
3 E3 5432 BROWN 65 D2
4 E5 7654 GREEN 52 D3
> dbReadTable(jamesdb, "DEPARTMENT")
DEPT_NO NAME MANAGER
1 D1 Accounts E1
2 D2 Stores E3
3 D3 Sales E5
> dbReadTable(jamesdb, "PRODUCT")
PROD_NO NAME COLOR
1 p1 PANTS BLUE
2 p2 PANTS KHAKI
3 p3 SOCKS GREEN
4 p4 SOCKS WHITE
5 p5 SHIRTS WHITE
> dbReadTable(jamesdb, "STOCK_TOTAL")
PROD_NO QUANTITY
1 p1 2000
2 p2 1000
3 p3 1500
4 p4 200
5 p5 800
Below is what I have so far, but I think I have misunderstood how to use join.
How should I fix these queries?
Retrieve the employment number of the sales department manager.
dbGetQuery(jamesdb, 'SELECT EMPLOYEE.EMP_NO FROM DEPARTMENT JOIN EMPLOYEE
WHERE DEPARTMENT.NAME = "Sales"')
Who works in Department D2?
dbGetQuery(jamesdb, 'SELECT MANAGER FROM DEPARTMENT WHERE DEPT_NO = "D2"')
How many white-colored products are in stock?
dbGetQuery(jamesdb, 'SELECT SUM(QUANTITY) FROM PRODUCT JOIN STOCK_TOTAL
WHERE PRODUCT.COLOR = "WHITE"')
Joins are typically done matching one (or more) fields from one table with a corresponding field(s) of another table. For instance, I'm inferring that DEPARTMENT.MANAGER is actually a foreign key to EMPLOYEE.EMP_NO, so when you join, you should be very specific about that relationship:
SELECT e.EMP_NO
FROM DEPARTMENT d
LEFT JOIN EMPLOYEE e on d.MANAGER = e.EMP_NO
WHERE d.NAME = 'Sales'
Notes:
Many databases allow you to be sloppy, where they will infer field associations (foreign keys) based on common field names. First, I don't like allowing that inference; second, it doesn't work here.
I personally prefer to be explicit about the type of join, whether left join, inner join, etc. It's a matter of style; you may use plain join if you prefer.
I'm introducing table aliases here (d and e), a way to shorten long table names. However, they are stylistic, not required.
I personally dislike databases that lack a dictionary of foreign keys and have unintuitive names to associate them. For instance, I'm inferring from the contents of the tables that DEPARTMENT.MANAGER is linked to EMPLOYEE.EMP_NO. If I'm wrong on this inference, then answers below are likely skewed.
For your first question, though, I don't know why you need a join: since MANAGER is already the employee number, this should be simply
select d.MANAGER
from DEPARTMENT d
where d.NAME='Sales'
Similarly, your second question needs no join.
select e.*
from EMPLOYEE e
where e.DEPT_NO='D2'
The last one needs a join, and can be done in a number of ways. One such is:
select sum(case when st.Quantity > 0 then 1 else 0 end) as Count
from STOCK_TOTAL st
left join PRODUCT pr on st.PROD_NO=pr.PROD_NO
where pr.COLOR='WHITE'
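To try the three queries outside R, here is a minimal sketch that loads the question's sample rows into an in-memory SQLite database via Python's sqlite3 module. The column types are assumptions, and the last query returns both the number of white products and their summed quantity, since the question can be read either way.

```python
import sqlite3

def run_queries():
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE EMPLOYEE (EMP_NO TEXT, NI_NO INTEGER, NAME TEXT, AGE INTEGER, DEPT_NO TEXT);
        CREATE TABLE DEPARTMENT (DEPT_NO TEXT, NAME TEXT, MANAGER TEXT);
        CREATE TABLE PRODUCT (PROD_NO TEXT, NAME TEXT, COLOR TEXT);
        CREATE TABLE STOCK_TOTAL (PROD_NO TEXT, QUANTITY INTEGER);
        INSERT INTO EMPLOYEE VALUES ('E1', 123, 'SMITH', 21, 'D1'), ('E2', 159, 'SMITH', 31, 'D1'),
                                    ('E3', 5432, 'BROWN', 65, 'D2'), ('E5', 7654, 'GREEN', 52, 'D3');
        INSERT INTO DEPARTMENT VALUES ('D1', 'Accounts', 'E1'), ('D2', 'Stores', 'E3'), ('D3', 'Sales', 'E5');
        INSERT INTO PRODUCT VALUES ('p1', 'PANTS', 'BLUE'), ('p2', 'PANTS', 'KHAKI'), ('p3', 'SOCKS', 'GREEN'),
                                   ('p4', 'SOCKS', 'WHITE'), ('p5', 'SHIRTS', 'WHITE');
        INSERT INTO STOCK_TOTAL VALUES ('p1', 2000), ('p2', 1000), ('p3', 1500), ('p4', 200), ('p5', 800);
    """)
    # 1. Manager of Sales, with the join condition spelled out.
    manager = conn.execute("""
        SELECT e.EMP_NO
        FROM DEPARTMENT d
        JOIN EMPLOYEE e ON d.MANAGER = e.EMP_NO
        WHERE d.NAME = 'Sales'
    """).fetchall()
    # 2. Employees in department D2: no join needed.
    in_d2 = conn.execute("SELECT NAME FROM EMPLOYEE WHERE DEPT_NO = 'D2'").fetchall()
    # 3. White products in stock: how many products, and how many units in total.
    white = conn.execute("""
        SELECT COUNT(*), SUM(st.QUANTITY)
        FROM PRODUCT pr
        JOIN STOCK_TOTAL st ON st.PROD_NO = pr.PROD_NO
        WHERE pr.COLOR = 'WHITE'
    """).fetchone()
    return manager, in_d2, white

print(run_queries())  # ([('E5',)], [('BROWN',)], (2, 1000))
```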

SQL - Count rows based on matching columns and value range

Please see below query using T-SQL with SSMS. There are three tables: B, G and L.
B has a column Bname
G has 2 columns Gname, Gross
L has 2 columns Bname, Gname
The Gross column is an INT with values between 80 and 100.
Each row of table L pairs a Bname from table B with a Gname from table G. I would like to count each such pair as one item, but only if the Gross for the corresponding Gname in table G is between 80 and 100.
My current query reads:
SELECT l.bname, (SELECT COUNT(*) FROM g WHERE g.gross BETWEEN 80 AND 90) AS Good
FROM l
INNER JOIN b
ON b.bname=l.bname
INNER JOIN g
ON g.gname=l.gname
GROUP BY l.bname;
The result is nearly there, but it counts all Table G Gname rows in the range, ignoring whether the Bname and Gname appear on the same row of Table L.
Thanks in advance for looking.
I suspect that you want:
SELECT l.bname,
(SELECT COUNT(*)
FROM b INNER JOIN
g
ON g.gname = l.gname
WHERE b.bname = l.bname AND g.gross BETWEEN 80 AND 90
) AS Good
FROM l ;
The outer aggregation is not needed if l.bname is unique.
This would more commonly be calculated using conditional aggregation:
SELECT l.bname,
SUM(CASE WHEN g.gross BETWEEN 80 AND 90 THEN 1 ELSE 0 END) AS Good
FROM l INNER JOIN
b
ON b.bname = l.bname INNER JOIN
g
ON g.gname = l.gname
GROUP BY l.bname;
No subquery is needed.
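To see the difference between the original query and conditional aggregation, here is a minimal sketch in an in-memory SQLite database (Python's sqlite3 rather than T-SQL). The rows for B, G and L are hypothetical, invented only for this illustration; the 80 to 90 range is the one used in the queries above.

```python
import sqlite3

def count_good():
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE b (bname TEXT);
        CREATE TABLE g (gname TEXT, gross INTEGER);
        CREATE TABLE l (bname TEXT, gname TEXT);
        -- Hypothetical sample rows, not from the question.
        INSERT INTO b VALUES ('Ann'), ('Bob');
        INSERT INTO g VALUES ('G1', 85), ('G2', 95), ('G3', 82);
        INSERT INTO l VALUES ('Ann', 'G1'), ('Ann', 'G2'), ('Ann', 'G3'), ('Bob', 'G2');
    """)
    # Uncorrelated subquery: the same table-wide count is repeated for every bname.
    uncorrelated = conn.execute("""
        SELECT l.bname, (SELECT COUNT(*) FROM g WHERE g.gross BETWEEN 80 AND 90) AS good
        FROM l
        JOIN b ON b.bname = l.bname
        JOIN g ON g.gname = l.gname
        GROUP BY l.bname
        ORDER BY l.bname
    """).fetchall()
    # Conditional aggregation: only the joined rows of each bname are counted.
    conditional = conn.execute("""
        SELECT l.bname,
               SUM(CASE WHEN g.gross BETWEEN 80 AND 90 THEN 1 ELSE 0 END) AS good
        FROM l
        JOIN b ON b.bname = l.bname
        JOIN g ON g.gname = l.gname
        GROUP BY l.bname
        ORDER BY l.bname
    """).fetchall()
    return uncorrelated, conditional

print(count_good())
# ([('Ann', 2), ('Bob', 2)], [('Ann', 2), ('Bob', 0)])
```

Bob's only pairing (G2, gross 95) is outside the range, so the conditional count is 0 while the uncorrelated subquery still reports 2.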

Deleting records from one table if all records are equal to a specific value

Student Table
Student_ID | School        | Home State | Grade     | Age
85         | Washington St | Colorado   | Junior    | 22
90         | Washington St | Washington | Senior    | 23
81         | Oregon        | California | Junior    | 21
21         | Washington    | Washington | Sophomore | 21
Attendance Table
Student_ID | Active | Date
85         | N      | 9/22/20
85         | N      | 9/21/20
81         | Y      | 9/22/20
81         | N      | 9/21/20
In an Oracle DB, I want to clean up the Student table by checking who is still an active student. Grouping by Student_ID in the Attendance table, I want to find students for whom every row has active = 'N'. If all of a student's rows have active = 'N', I know they are no longer a student and I would like to delete their record from the Student table (student 85). However, if even one row for a student has active = 'Y', I won't delete anything for that student, as I know they're still active (student 81). What would be the best way to go about this? I have tried the ALL operator but have been unable to get the desired results. Below is the query I have been trying to use.
DELETE /*+ parallel (a) */ FROM STUDENT a
WHERE ( a.student_ID = ALL
(SELECT /*+ parallel (b) */ b.student_id, b.active FROM attendance b WHERE b.active = 'N'));
One option uses not exists:
delete from student s
where not exists (
select 1 from attendance a where a.student_id = s.student_id and a.active = 'Y'
)
This also deletes students that have no attendance at all. If that's not what you want, then you can compare an aggregate from a correlated subquery instead; since 'N' sorts before 'Y', max(active) = 'N' means the student has no 'Y' row:
delete from student s
where (
select max(active) from attendance a where a.student_id = s.student_id
) = 'N'
You can check for 'Y' rows using aggregation:
DELETE
FROM STUDENT
WHERE student_ID IN
(
SELECT student_id
FROM attendance
GROUP BY student_id
HAVING MAX(active) = 'N' -- no Y for this student
);
NOT EXISTS will be the best option here, as I'm guessing you have to handle some huge tables.
I second GMB's first query.
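The practical difference between the NOT EXISTS delete and the aggregation delete is how they treat students with no attendance rows at all. A minimal sketch on the question's sample data, run in an in-memory SQLite database rather than Oracle (the att_date column name and the trimmed-down student table are assumptions for the example):

```python
import sqlite3

def remaining_students(delete_sql):
    """Run one DELETE against a fresh copy of the sample data; return the ids left."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE student (student_id INTEGER, school TEXT);
        CREATE TABLE attendance (student_id INTEGER, active TEXT, att_date TEXT);
        INSERT INTO student VALUES (85, 'Washington St'), (90, 'Washington St'),
                                   (81, 'Oregon'), (21, 'Washington');
        INSERT INTO attendance VALUES (85, 'N', '9/22/20'), (85, 'N', '9/21/20'),
                                      (81, 'Y', '9/22/20'), (81, 'N', '9/21/20');
    """)
    conn.execute(delete_sql)
    return [row[0] for row in
            conn.execute("SELECT student_id FROM student ORDER BY student_id")]

# Deletes every student without a 'Y' row, including those with no rows at all.
NOT_EXISTS = """
    DELETE FROM student
    WHERE NOT EXISTS (SELECT 1 FROM attendance a
                      WHERE a.student_id = student.student_id AND a.active = 'Y')
"""
# Deletes only students that have attendance rows, all of them 'N'.
HAVING_MAX = """
    DELETE FROM student
    WHERE student_id IN (SELECT student_id FROM attendance
                         GROUP BY student_id
                         HAVING MAX(active) = 'N')
"""

print(remaining_students(NOT_EXISTS))  # [81]
print(remaining_students(HAVING_MAX))  # [21, 81, 90]
```

Students 90 and 21 have no attendance rows, so only the NOT EXISTS version removes them; which behaviour is right depends on what a missing attendance record means in your data.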

how to query the value that has changed +/- 10% of the value from first encounter of each patient in sql server?

I have this query that finds users with multiple hospital visits.
Table has about 593 columns, so I don't think I can show you the structure. But let's assume these are basic patients table with following columns.
id, sex, studyDate, referringPhysician, bmi, bsa, height, weight, bloodPressure, heartRate. These are also in the real table.
The patient visits the hospital and has some work done. What we would like to find is how much the patient's BMI has changed since the first encounter. For example,
Row | ID       | SEX | StudyDate  | Physician | BMI | BSA  | Ht | Wt  | BP     | HR
1   | PatientA | M   | 2017-09-11 | Dr. Hale  | 60  | 2.03 | 6  | 282 | 116/82 | 77
2   | PatientA | M   | 2017-12-11 | Dr. Hale  | 58  | 2.03 | 6  | 296 | 126/82 | 72
3   | PatientA | M   | 2018-03-17 | Dr. Hale  | 50  | 2.03 | 6  | 282 | 126/82 | 72
In the example above, row 1 was the first encounter and the BMI was 60. In row 2, the bmi decreased to 58, but it's not more than 10%. So, that shouldn't be displayed. However, row 3 has bmi 50 which is decreased by more than 10% of bmi in row 1. That should be displayed.
I'm sorry, I don't have the data that I can share.
with G as(
select * from Patients P
inner join (
select count(*) as counts, ID as oeID
from Patients
group by ID
Having count(*) > 2
) oe on P.ID = oe.oeID where P.BMI > 30
)
select * from G
order by StudyDate asc;
From this, what I'd like to do is find out patients whose BMI has changed by 10% from the first encounter.
How can I do this?
Can you also help me understand the concept of for-each users in SQL, and how it handles such queries?
Guessing at your data model here... I suspect you've got a heavily denormalized structure with everything crammed into one table. You would be far better off with a patient table separate from the table that stores their visits. The with G wrapper is also unnecessary, especially if you are just doing a select * from it afterwards. Heh, I'm trying to get into medical analytics, so I will give this a try.
I'll build this as I see your data model; you may have to change a step here and there to fit your column names. Let's start by getting the first and most recent (last) visit dates by id:
select id, min(StudyDate) as first_visit, max(studydate) as last_visit
from patients
group by id
having min(StudyDate) <> max(StudyDate)
A simple query at this point; by using the having clause we ensure that these are two separate visits. But we are lacking the BMI numbers for these visits, so we will have to join back to the patients table to grab them. We will include a where clause so that only changes of 10 or more BMI points in either direction are returned (for a relative 10% change, compare abs(b.bmi - c.bmi) against 0.10 * b.bmi instead):
select a.id, a.first_visit, a.last_visit, b.bmi as first_bmi, c.bmi as last_bmi, b.bmi - c.bmi as bmi_change
from
(select id, min(StudyDate) as first_visit, max(StudyDate) as last_visit
from patients
group by id
having min(StudyDate) <> max(StudyDate)) a
inner join patients b on b.id = a.id and b.StudyDate = a.first_visit
inner join patients c on c.id = a.id and c.StudyDate = a.last_visit
where b.bmi - c.bmi >= 10 or b.bmi - c.bmi <= -10
Hopefully that makes sense, you'll want to change the top select line to grab all the fields you actually want to return, I'm just returning the ones of interest to your question
Part 2:
Lets approach this from a similar angle:
select id, min(StudyDate) as first_visit
from patients
group by id
Now we've got the first visit date. Let's join back to patients and get the bmi here.
select a.id, first_visit, b.bmi
from
(select id, min(StudyDate) as first_visit
from patients
group by id) a
inner join patients b on a.first_visit = b.studydate and a.id = b.id
This will simply be a list of each patient by ID, giving us their first_visit date and their BMI on that first visit. Now we want to compare this bmi to all subsequent visits, so let's join all rows back to this query. Subquery a below is simply the query above in brackets:
select a.id, a.first_visit, b.studydate, a.bmi as first_bmi, b.bmi, a.bmi-b.bmi as bmi_change
from
(select a.id, first_visit, b.bmi
from
(select id, min(StudyDate) as first_visit
from patients
group by id) a
inner join patients b on a.first_visit = b.studydate and a.id = b.id) a
inner join patients b on a.id = b.id
where a.bmi - b.bmi >= 10 or a.bmi - b.bmi <= -10
Similar idea, instead of joining on the max_date to get most recent, we are joining to all records for that patient and running the math from there. In the commented example, this will give rows 3,5,6.
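The same "compare every visit to the first visit" join can also test a relative change, which is what the question literally asks for. A minimal sketch, assuming a threshold of 10% of the first-visit BMI and a cut-down patients table with only id, studydate and bmi; it runs in an in-memory SQLite database rather than SQL Server. PatientA's rows are from the question, and PatientB is a hypothetical patient added to show that each id is compared against its own first visit.

```python
import sqlite3

def visits_with_10pct_change():
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE patients (id TEXT, studydate TEXT, bmi REAL);
        INSERT INTO patients VALUES
            ('PatientA', '2017-09-11', 60.0),
            ('PatientA', '2017-12-11', 58.0),
            ('PatientA', '2018-03-17', 50.0),
            ('PatientB', '2018-01-05', 30.0),  -- hypothetical rows
            ('PatientB', '2018-06-01', 31.0);
    """)
    return conn.execute("""
        SELECT p.id, p.studydate, f.bmi AS first_bmi, p.bmi
        FROM patients p
        JOIN (SELECT id, MIN(studydate) AS first_visit
              FROM patients
              GROUP BY id) a ON a.id = p.id
        JOIN patients f ON f.id = a.id AND f.studydate = a.first_visit
        -- keep visits whose BMI differs from the first visit by at least 10% of it
        WHERE ABS(p.bmi - f.bmi) >= 0.10 * f.bmi
        ORDER BY p.id, p.studydate
    """).fetchall()

print(visits_with_10pct_change())  # [('PatientA', '2018-03-17', 60.0, 50.0)]
```

The GROUP BY id in the derived table is the "for each patient" part: it yields one first_visit per id, and the joins then line every visit of that patient up against it.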
Part 3
A little more complex... getting rows 3,4,5,6 when row 4 shows less than a 10 change in BMI means you are now trying to pick out the first date that the 10 change is seen and display all records from then on. Let's call the query in part 2 subquery a and go pseudo code for a moment:
Select id, min(studydate)
from (subquerya) a
group by id
(subquerya) simply stands for the entire query used at the end of part 2. This will grab the study date of the first time a bmi change of over 10 is detected for each patient id (in our comment example, it would be visit 3). Now we can join back to patients, this time getting all records that are equal to or more recent than the min(studydate) of the first time bmi changed more than 10 since the first visit
select a.id, b.studydate, b.bmi
from
(Select id, min(studydate) as min_studydate
from (subquerya) a
group by id) a
inner join patients b on a.id = b.id and a.min_studydate <= b.studydate
This will bring back the list of all study dates happening after the first time a bmi change more than 10 was detected (3,4,5,6 from our comment example). Of course we've now lost the first study date's bmi value, so lets add that back in and bring the query all together.
select a.id, b.studydate, b.bmi, c.bmi as start_bmi, c.bmi - b.bmi as bmi_change
from
(Select id, min(studydate) as min_studydate
from (select a.id, a.first_visit, b.studydate, a.bmi as first_bmi, b.bmi, a.bmi-b.bmi as bmi_change
from
(select a.id, first_visit, b.bmi
from
(select id, min(StudyDate) as first_visit
from patients
group by id) a
inner join patients b on a.first_visit = b.studydate and a.id = b.id) a
inner join patients b on a.id = b.id
where a.bmi - b.bmi >= 10 or a.bmi - b.bmi <= -10) a
group by id) a
inner join patients b on a.id = b.id and a.min_studydate <= b.studydate
inner join (select a.id, first_visit, b.bmi
from
(select id, min(StudyDate) as first_visit
from patients
group by id) a
inner join patients b on a.first_visit = b.studydate and a.id = b.id) c on c.id = a.id
If I have everything right, this should bring back rows 3,4,5,6 and the change in BMI across each visit. I've left a few more columns in there than need be and it could be cleaned a little, but all logic should be there. I don't have

Oracle, LEFT OUTER JOIN not returning all rows from left table, instead behaving like INNER JOIN

I'm doing a left outer join and only getting back matching rows like it was an inner join.
To simplify the data, my first table(ROW_SEG), or left table looks something like this:
ASN | DEPT NO
-----------------------
85 | 836
86 | null
87 | null
My second table(RF_MERCHANT_ORG) has DEPT_NAME, and some other things which i want to get when i have a dept number.
DEPT NO | DEPT_NAME
-----------------------
836 | some dept name 1
837 | some dept name 2
838 | some dept name 3
In this case after my join i'd only get 1 row, for ASN 85 that had a DEPT NO.
...omitting a bunch of SQL for simplicity
, ROW_SEG AS (
SELECT *
FROM VE_SI_EC_OI
WHERE ROW_NUM BETWEEN 1 AND 1000 -- screen pagination, hardcoding values
)
-- ROW_SEG has a count of 1000 now
, RFS_JOIN AS (
SELECT ROW_SEG.*
,MO.BYR_NO
,MO.BYR_NAME
,MO.DEPT_NAME
FROM ROW_SEG
LEFT OUTER JOIN RF_MERCHANT_ORG MO
ON ROW_SEG.DEPT_NO = MO.DEPT_NO
WHERE MO.ORG_NO = 100
)
SELECT * FROM RFS_JOIN; -- returns less than 1000
I only get back the number of rows equal to the number of rows that have dept nos. So in my little data example above i would only get 1 row for ASN 85, but i want all rows with BYR_NO, BYR_NAME, AND DEPT_NAME populated on rows where i had a DEPT_NO, and if not, then empty/null columns.
If ORG_NO is a column of RF_MERCHANT_ORG (using aliases consistently would help there), then behaving like an inner join is the correct result for the SQL as written: the WHERE clause is applied after the join and discards the rows where MO.ORG_NO is NULL, which are exactly the unmatched rows from the left table.
To make it act like a proper left join, move the condition into the join:
LEFT OUTER JOIN RF_MERCHANT_ORG MO ON ROW_SEG.DEPT_NO = MO.DEPT_NO AND MO.ORG_NO = 100
If ORG_NO is in RF_MERCHANT_ORG, then that is likely to be the cause: the where condition is limiting the result set.
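The effect is easy to reproduce on the question's small example. A minimal sketch in an in-memory SQLite database (the behaviour is the same in Oracle), assuming that ORG_NO lives in RF_MERCHANT_ORG and that all three departments belong to org 100:

```python
import sqlite3

def compare_filters():
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE row_seg (asn INTEGER, dept_no INTEGER);
        CREATE TABLE rf_merchant_org (dept_no INTEGER, dept_name TEXT, org_no INTEGER);
        INSERT INTO row_seg VALUES (85, 836), (86, NULL), (87, NULL);
        INSERT INTO rf_merchant_org VALUES (836, 'some dept name 1', 100),
                                           (837, 'some dept name 2', 100),
                                           (838, 'some dept name 3', 100);
    """)
    base = """
        SELECT rs.asn, mo.dept_name
        FROM row_seg rs
        LEFT OUTER JOIN rf_merchant_org mo ON rs.dept_no = mo.dept_no
    """
    # Filter in WHERE: unmatched rows have mo.org_no = NULL and are thrown away.
    in_where = conn.execute(base + " WHERE mo.org_no = 100 ORDER BY rs.asn").fetchall()
    # Filter in ON: it only restricts which right-hand rows may match.
    in_on = conn.execute(base + " AND mo.org_no = 100 ORDER BY rs.asn").fetchall()
    return in_where, in_on

in_where, in_on = compare_filters()
print(in_where)  # [(85, 'some dept name 1')]
print(in_on)     # [(85, 'some dept name 1'), (86, None), (87, None)]
```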