Loop through rows and match values in SQL

I'd appreciate any help with my problem! I have an org chart of all employees, with columns for each employee's supervisors. For each employee, I am trying to find the first supervisor up the org structure who has 3+ years' experience. So if supervisor 1 has only 1 year, I need to move to the next column, supervisor 2, and check whether they have more experience. At the end, I would like to return a column of supervisor ids [experienced_supervisor column].
Table: org_chart
id | experience | supervisor_id_1 | supervisor_id_2 | experienced_supervisor
A  | 2          | X               | C               | X
C  | 5          | V               | D               | D
V  | 1          | M               | X               | M
X  | 3          |                 |                 |
D  | 8          |                 |                 |
M  | 11         |                 |                 |
I am new to SQL and not even sure this is the best approach, but here is my thinking: I will use CASE to look through every row (employee) and compare their supervisors' experience.
SELECT CASE
-- note: experience here is the employee's own column, not the supervisor's
WHEN experience >= 3 THEN supervisor_id_1
ELSE
CASE WHEN experience >= 3 THEN supervisor_id_2
ELSE 'not found'
END
END AS experienced_supervisor
FROM org_chart
Questions:
Is this the best way to tackle the problem?
Can I look up the value [experience years] of supervisors by matching supervisor_id_1, supervisor_id_2 to id? Or do I need to create a new column supervisor_id_1_experience and fill the years of experience by doing the join?
I am using Redshift.

You only need one CASE expression, but a lot of joins or subqueries. Perhaps:
SELECT (CASE WHEN (SELECT oc2.experience FROM org_chart oc2 WHERE oc2.id = oc.supervisor_id_1) >= 3
THEN oc.supervisor_id_1
WHEN (SELECT oc2.experience FROM org_chart oc2 WHERE oc2.id = oc.supervisor_id_2) >= 3
THEN oc.supervisor_id_2
. . .
END) AS experienced_supervisor
FROM org_chart oc
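To check the subquery approach end to end, here is a small sketch that loads the sample org_chart rows into an in-memory SQLite database through Python's sqlite3 module. SQLite stands in for Redshift here (an assumption; the query only uses scalar subqueries and CASE), and the WHERE clause that skips employees without supervisors is an addition for readability.

```python
import sqlite3

# In-memory copy of the sample org_chart table from the question.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE org_chart (id TEXT PRIMARY KEY, experience INTEGER, "
    "supervisor_id_1 TEXT, supervisor_id_2 TEXT)"
)
conn.executemany(
    "INSERT INTO org_chart VALUES (?, ?, ?, ?)",
    [
        ("A", 2, "X", "C"),
        ("C", 5, "V", "D"),
        ("V", 1, "M", "X"),
        ("X", 3, None, None),
        ("D", 8, None, None),
        ("M", 11, None, None),
    ],
)

# One CASE expression: each WHEN looks up a supervisor's experience with a
# scalar subquery, and CASE returns the first supervisor column that matches.
query = """
SELECT oc.id,
       CASE
         WHEN (SELECT oc2.experience FROM org_chart oc2
               WHERE oc2.id = oc.supervisor_id_1) >= 3 THEN oc.supervisor_id_1
         WHEN (SELECT oc2.experience FROM org_chart oc2
               WHERE oc2.id = oc.supervisor_id_2) >= 3 THEN oc.supervisor_id_2
       END AS experienced_supervisor
FROM org_chart oc
WHERE oc.supervisor_id_1 IS NOT NULL  -- skip rows that have no supervisors
ORDER BY oc.id
"""
rows = conn.execute(query).fetchall()
print(rows)  # [('A', 'X'), ('C', 'D'), ('V', 'M')]
```

Each extra supervisor column needs one more WHEN with its own subquery, which is where the ". . ." in the answer comes in.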

After lots of trial and error, here is the query that worked for my problem. I am using Redshift in this case.
-- Use common table expression to find job level for each supervisor from reporting levels 8 to 2
WITH cte1 AS
(
SELECT B.id as employee
,B.experience as employee_experience
,B.supervisor_id_1 as manager_1
,A.experience as supervisor_1_experience
FROM org_chart A
INNER JOIN org_chart B ON B.supervisor_id_1 = A.id
),
cte2 AS
(
SELECT B.id as employee2
,B.experience as employee_experience
,B.supervisor_id_2 as manager_2
,A.experience as supervisor_2_experience
FROM org_chart A
INNER JOIN org_chart B ON B.supervisor_id_2 = A.id
),
-- ........ write as many CTEs like these as I have reporting-level columns
-- Join all tables above
cte3 AS
(
SELECT cte1.employee
,cte1.employee_experience
,manager_1
,supervisor_1_experience
,manager_2
,supervisor_2_experience
FROM cte1
-- LEFT JOIN so employees without a second supervisor are not dropped
LEFT JOIN cte2 ON cte2.employee2 = cte1.employee
-- ........ join one more CTE the same way for each reporting-level column
)
-- Go through every row and pick the first supervisor with at least 3 years of experience
SELECT *
,CASE
WHEN cte3.supervisor_1_experience >= 3 THEN cte3.manager_1
WHEN cte3.supervisor_1_experience < 3
AND cte3.supervisor_2_experience >=3
THEN cte3.manager_2
-- ........ one more WHEN for each reporting-level column
END AS experienced_supervisor
FROM cte3
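The CTE pipeline can be tried the same way. This is a minimal sketch for just two reporting levels, again using SQLite through Python's sqlite3 as a stand-in for Redshift (an assumption). The second level is joined with LEFT JOIN so an employee without a second supervisor would still appear.

```python
import sqlite3

# Sample org_chart rows from the question.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE org_chart (id TEXT PRIMARY KEY, experience INTEGER, "
    "supervisor_id_1 TEXT, supervisor_id_2 TEXT)"
)
conn.executemany(
    "INSERT INTO org_chart VALUES (?, ?, ?, ?)",
    [
        ("A", 2, "X", "C"),
        ("C", 5, "V", "D"),
        ("V", 1, "M", "X"),
        ("X", 3, None, None),
        ("D", 8, None, None),
        ("M", 11, None, None),
    ],
)

query = """
WITH cte1 AS (  -- level 1: each employee with supervisor 1's experience
    SELECT B.id AS employee,
           B.supervisor_id_1 AS manager_1,
           A.experience AS supervisor_1_experience
    FROM org_chart A
    INNER JOIN org_chart B ON B.supervisor_id_1 = A.id
),
cte2 AS (  -- level 2: the same lookup for supervisor 2
    SELECT B.id AS employee2,
           B.supervisor_id_2 AS manager_2,
           A.experience AS supervisor_2_experience
    FROM org_chart A
    INNER JOIN org_chart B ON B.supervisor_id_2 = A.id
),
cte3 AS (  -- one row per employee with both levels side by side
    SELECT cte1.employee AS employee,
           manager_1, supervisor_1_experience,
           manager_2, supervisor_2_experience
    FROM cte1
    LEFT JOIN cte2 ON cte2.employee2 = cte1.employee
)
SELECT employee,
       CASE  -- CASE stops at the first WHEN that is true
         WHEN supervisor_1_experience >= 3 THEN manager_1
         WHEN supervisor_2_experience >= 3 THEN manager_2
       END AS experienced_supervisor
FROM cte3
ORDER BY employee
"""
rows = conn.execute(query).fetchall()
print(rows)  # [('A', 'X'), ('C', 'D'), ('V', 'M')]
```

It returns the same answer as the subquery version; the difference is that each supervisor's experience is materialized once per level instead of being looked up inside the CASE.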

Related

How do you join a table with a different WHERE condition after you already used a join

Hi, I have two tables, employees and medical leaves, related through the employee ID. Basically, I want to build a result set with one column that filters by month and year, and another column that filters by year only.
EMPLOYEES
|employee|ID|
A 1
B 2
C 3
D 4
MEDICAL
|ID|DateOfLeave|
1 2019/1/3
1 2019/4/15
2 2019/5/16
select employees.employee,Employees.ID,count(medical.dateofleave) as
NumberofLeaves
from employees
left outer join Medical on employees.emp = MedBillInfo.emp
and month(medbillinfo.date) in(1) and year(medbillinfo.date) in (2019)
group by Employees.employee,employees.ID
RESULT SET
|Employee|ID|NumberOfLeaves|YearlyLeaves|--i want to join this column
A 1 1 2
B 2 0 1
C 3 0 0
D 4 0 0
But I have no idea how to extend the current SQL statement to add a yearly-leaves column to my current result set, which only has employee, ID and NumberOfLeaves.
I think you want conditional aggregation:
select e.employee, e.ID,
count(m.DateOfLeave) as num_leaves,
sum(case when month(m.DateOfLeave) = 1 then 1 else 0 end) as num_leaves_in_month_1
from employees e left join
Medical m
on e.ID = m.ID and
-- filter in ON rather than WHERE so employees with no 2019 leaves still appear with 0
m.DateOfLeave >= '2019-01-01' and m.DateOfLeave < '2020-01-01'
group by e.employee, e.ID;
Notes:
This drops the reference to MedBillInfo, which seems to be a non-existent table alias.
The date arithmetic uses direct comparisons rather than functions.
This introduces table aliases so the query is easier to write and to read.
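A runnable sketch of the conditional aggregation with the sample data from the question, in SQLite via Python's sqlite3. Assumptions: SQLite has no month() function, so strftime('%m', ...) replaces it, and the dates are rewritten as ISO strings (2019-01-03 rather than 2019/1/3) so they compare and parse correctly. The output column names follow the desired result set.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employees (employee TEXT, ID INTEGER)")
conn.execute("CREATE TABLE medical (ID INTEGER, DateOfLeave TEXT)")
conn.executemany(
    "INSERT INTO employees VALUES (?, ?)",
    [("A", 1), ("B", 2), ("C", 3), ("D", 4)],
)
# Dates from the question, rewritten in ISO format for SQLite.
conn.executemany(
    "INSERT INTO medical VALUES (?, ?)",
    [(1, "2019-01-03"), (1, "2019-04-15"), (2, "2019-05-16")],
)

query = """
SELECT e.employee, e.ID,
       -- month filter inside the aggregate: January leaves only
       SUM(CASE WHEN strftime('%m', m.DateOfLeave) = '01' THEN 1 ELSE 0 END)
           AS NumberOfLeaves,
       -- COUNT(column) skips the NULLs produced by unmatched LEFT JOIN rows
       COUNT(m.DateOfLeave) AS YearlyLeaves
FROM employees e
LEFT JOIN medical m
  ON e.ID = m.ID
 AND m.DateOfLeave >= '2019-01-01' AND m.DateOfLeave < '2020-01-01'
GROUP BY e.employee, e.ID
ORDER BY e.ID
"""
rows = conn.execute(query).fetchall()
print(rows)  # [('A', 1, 1, 2), ('B', 2, 0, 1), ('C', 3, 0, 0), ('D', 4, 0, 0)]
```

The year filter lives in the join condition and the month filter lives in the CASE, so one pass over the join yields both columns.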
Your question probably needs to be corrected, as the GROUP BY columns do not match the SELECT columns. But based on what you asked, I think you need a date function to group the leaves by year. In SQL Server, the YEAR(date) function returns the year of the given date; in your case the date would be MEDICAL.DateOfLeave.

Count different groups in the same query

Imagine I have a table like this:
# | A | B | MoreFieldsHere
1 1 1
2 1 3
3 1 5
4 2 6
5 2 7
6 3 9
B is associated with A in a 1:n relationship. The table could have been created with a join, for example.
I want to get both the total count and the count of different A.
I know I can use a query like this:
SELECT v1.cnt AS total, v2.cnt AS num_of_A
FROM
(
SELECT COUNT(*) AS cnt
FROM SomeComplicatedQuery
WHERE 1=1
-- AND SomeComplicatedCondition
) v1,
(
SELECT COUNT(*) AS cnt
FROM
(
SELECT A
FROM SomeComplicatedQuery
WHERE 1=1
-- AND SomeComplicatedCondition
GROUP BY A
) g
) v2
However, SomeComplicatedQuery would be a complicated and slow query, and SomeComplicatedCondition would be the same in both cases, so I want to avoid running it unnecessarily. Aside from that, if the query changes, you need to remember to change it in the other place too, which is error-prone and creates (probably unnecessary) work.
Is there a way to do this more efficiently?
Are you looking for this?
SELECT COUNT(*) AS total, COUNT(DISTINCT A) AS num_of_A
FROM (. . . ) q
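A quick check of COUNT(*) next to COUNT(DISTINCT A) on the sample rows, loaded into SQLite via Python's sqlite3. The table name t and the inner SELECT standing in for SomeComplicatedQuery are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (id INTEGER, A INTEGER, B INTEGER)")
conn.executemany(
    "INSERT INTO t VALUES (?, ?, ?)",
    [(1, 1, 1), (2, 1, 3), (3, 1, 5), (4, 2, 6), (5, 2, 7), (6, 3, 9)],
)

# The derived table q is evaluated once; both counts are computed over it.
query = """
SELECT COUNT(*) AS total, COUNT(DISTINCT A) AS num_of_A
FROM (SELECT A, B FROM t WHERE 1=1) q
"""
result = conn.execute(query).fetchone()
print(result)  # (6, 3)
```

Because the complicated query appears only once, the condition only has to be maintained in one place.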

How to list distinct column after a join and also count?

I have three tables:
team(ID, name)
goal(ID, team_ID, goalType_ID, date)
goalType(ID, name)
As you can see, team_ID is the ID from the team table, and goalType_ID is the ID from the goalType table.
For all teams, I want to list the number of different types of goals that ever happened, 0 should appear if none.
We don't need to care about the goalType table since we don't need the name of the type of goal, so I've gotten to the following code that only uses the first two tables:
SELECT team.ID, team.name, goal.goalType_ID
FROM team LEFT JOIN goal ON team.ID=goal.team_ID
This results in a three-column table of the information I want, but I would like to count the number of DISTINCT goal types, GROUP BY team.ID or team.name, keep it to three columns, and show 0 if the result is null (a team might not have scored any goals).
The resulting table looks something like this:
team.ID team.name goal.goalType_ID
1 Team_1 1
2 Team_2 2
2 Team_2 2
2 Team_2 2
3 Team_3 4
4 Team_4 null
5 Team_5 null
6 Team_6 1
6 Team_6 2
6 Team_6 4
6 Team_6 3
7 Team_7 5
7 Team_7 4
8 Team_8 null
I have tried a combination of GROUP BY, DISTINCT, and COUNT, but still can't get a result I want.
Maybe I'm going about this all wrong? Any help would be appreciated, Thanks.
EDIT:
Based on Gordon Linoff's answer, I tried doing:
SELECT DISTINCT team.name, COUNT(goal.goalType_ID)
FROM team LEFT JOIN goal ON team.ID=goal.team_ID
GROUP BY team.ID, team.name
and it will give me:
Name #0
Team_1 1
Team_2 3
Team_3 1
Team_4 0
Team_5 0
Team_6 4
Team_7 1
Team_8 0
If I try to use "DISTINCT team.ID, DISTINCT team.name", it will error out.
Is this what you want?
SELECT team.ID, team.name, count(distinct goal.goalType_ID) as NumGoalTypes
FROM team LEFT JOIN
goal
ON team.ID = goal.team_ID
GROUP BY team.ID, team.name;
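A sketch of this answer against the first four teams from the question, in SQLite via Python's sqlite3. The goal rows are hypothetical, chosen to reproduce the first lines of the joined result above, and the date column is omitted. COUNT(DISTINCT ...) ignores NULLs, which is why a team with no goals gets 0.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE team (ID INTEGER, name TEXT)")
conn.execute("CREATE TABLE goal (ID INTEGER, team_ID INTEGER, goalType_ID INTEGER)")
conn.executemany(
    "INSERT INTO team VALUES (?, ?)",
    [(1, "Team_1"), (2, "Team_2"), (3, "Team_3"), (4, "Team_4")],
)
# Hypothetical goals: Team_2 scores the same goal type three times,
# Team_4 scores nothing.
conn.executemany(
    "INSERT INTO goal VALUES (?, ?, ?)",
    [(1, 1, 1), (2, 2, 2), (3, 2, 2), (4, 2, 2), (5, 3, 4)],
)

query = """
SELECT team.ID, team.name, COUNT(DISTINCT goal.goalType_ID) AS NumGoalTypes
FROM team LEFT JOIN goal ON team.ID = goal.team_ID
GROUP BY team.ID, team.name
ORDER BY team.ID
"""
rows = conn.execute(query).fetchall()
print(rows)  # [(1, 'Team_1', 1), (2, 'Team_2', 1), (3, 'Team_3', 1), (4, 'Team_4', 0)]
```

Team_2 counts as 1 rather than 3 because DISTINCT collapses the repeated goal type, which is the difference from the asker's COUNT(goal.goalType_ID) attempt.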
Try this http://sqlfiddle.com/#!3/8ec680/13
;WITH cte
AS (SELECT Row_number() OVER(partition BY tname
ORDER BY goalid) AS row_n, * from temp)--temp= Your join statement
SELECT CASE
WHEN a.goalid IS NULL THEN 0
ELSE a.row_n
END [count],
a.tid,
a.tname,
a.goalid
FROM cte a
JOIN (SELECT Max(row_n) row_n,
tname
FROM cte
GROUP BY tname) b
ON a.row_n = b.row_n
AND a.tname = b.tname

Grouping by overall score based on a range of values and a score table in SQLite

Given a big table of data about when people begin and complete tasks, e.g:
Person | Task | Date started | Date ended
---------------------------------------------
A Cleaning 20-FEB-2012 22-FEB-2012
N Dishes 20-FEB-2012 24-FEB-2012
Z Cleaning 21-FEB-2012 23-FEB-2012
and a score table which assigns scores of 2,3,4 for each task based on how long it takes them to do it, e.g.:
| Task | Days taken | Score
---------------------------
Cleaning 2 2
Cleaning 1.5 3
Cleaning 1 4
Dishes 3 2
Dishes 2.5 3
Dishes 2 4
how might I produce a query which gives the overall score for each person for each task, e.g.:
Person | Task | Overall Score
---------------------------------------------
A Cleaning 3.1
A Dishes 2.7
N Cleaning 3.4
The solution's been subtly eluding me, some assistance would be appreciated! I'm using SQLite at present.
Your definitions are a bit vague. However, the following should help you:
select t.person, t.task, sum(s.score)
from tasks t left outer join
score s
on t.task = s.task and
s.days_taken = julianday(t.dateended) - julianday(t.datestarted)
group by t.person, t.task
Handling ranges is a bit more difficult. You need to get the two ends of each interval, and then do the join:
select t.person, t.task, sum(s.score)
from tasks t left outer join
(select s.*,
(select min(s2.days_taken) from score s2 where s2.task = s.task and s2.days_taken > s.days_taken
) as nextDays_Taken
from score s
) s
on t.task = s.task and
julianday(t.dateended) - julianday(t.datestarted) >= s.days_taken and
(s.nextDays_Taken is null or julianday(t.dateended) - julianday(t.datestarted) < s.nextDays_Taken)
group by t.person, t.task
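A runnable sketch of the range join in SQLite via Python's sqlite3. Assumptions: the table and column names (tasks, score, datestarted, dateended, days_taken) are guesses, since the question gives no schema, and the dates are converted to ISO format because julianday() cannot parse values like 20-FEB-2012. The largest days_taken per task has no upper bound, which the IS NULL check handles.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tasks (person TEXT, task TEXT, datestarted TEXT, dateended TEXT)")
conn.execute("CREATE TABLE score (task TEXT, days_taken REAL, score INTEGER)")
# Task rows from the question, with dates rewritten as ISO strings.
conn.executemany(
    "INSERT INTO tasks VALUES (?, ?, ?, ?)",
    [
        ("A", "Cleaning", "2012-02-20", "2012-02-22"),
        ("N", "Dishes", "2012-02-20", "2012-02-24"),
        ("Z", "Cleaning", "2012-02-21", "2012-02-23"),
    ],
)
conn.executemany(
    "INSERT INTO score VALUES (?, ?, ?)",
    [
        ("Cleaning", 2, 2), ("Cleaning", 1.5, 3), ("Cleaning", 1, 4),
        ("Dishes", 3, 2), ("Dishes", 2.5, 3), ("Dishes", 2, 4),
    ],
)

# Each score row becomes the interval [days_taken, next_days_taken).
query = """
SELECT t.person, t.task, SUM(s.score) AS overall_score
FROM tasks t
LEFT JOIN (
    SELECT s.*,
           (SELECT MIN(s2.days_taken) FROM score s2
            WHERE s2.task = s.task AND s2.days_taken > s.days_taken) AS next_days_taken
    FROM score s
) s
  ON t.task = s.task
 AND julianday(t.dateended) - julianday(t.datestarted) >= s.days_taken
 AND (s.next_days_taken IS NULL
      OR julianday(t.dateended) - julianday(t.datestarted) < s.next_days_taken)
GROUP BY t.person, t.task
ORDER BY t.person, t.task
"""
rows = conn.execute(query).fetchall()
print(rows)  # [('A', 'Cleaning', 2), ('N', 'Dishes', 2), ('Z', 'Cleaning', 2)]
```

With only three task rows every person ends up in the slowest bracket; the expected output in the question presumably comes from a larger data set.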

Select data from a table where only the first two columns are distinct

Background
I have a table which has six columns. The first three columns create the pk. I'm tasked with removing one of the pk columns.
I selected (using distinct) the data into a temp table (excluding the third column), and tried inserting all of that data back into the original table with the third column being '11' for every row as this is what I was instructed to do. (this column is going to be removed by a DBA after I do this)
However, when I went to insert this data back into the original table I get a pk constraint error. (shocking, I know)
The other three columns are just date columns, so the distinct select didn't create a unique pk for each record. What I'm trying to achieve is just calling a distinct on the first two columns, and then just arbitrarily selecting the three other columns as it doesn't matter which dates I choose (at least not on dev).
What I've tried
I found the following post which seems to achieve what I want:
How do I (or can I) SELECT DISTINCT on multiple columns?
I tried the answers from both Joel and Erwin.
Attempt 1:
However, with Joel's answer the set returned is too large; the inner join isn't doing what I thought it would do. Selecting distinct col1 and col2 returns 400 rows, but when I use his solution 600 rows are returned. I checked the data and there were in fact duplicate PKs. Here is my attempt at duplicating Joel's answer:
select a.emp_no,
a.eec_planning_unit_cde,
'11' as area, create_dte,
create_by_emp_no, modify_dte,
modify_by_emp_no
from tempdb.guest.temp_part_time_evaluator b
inner join
(
select emp_no, eec_planning_unit_cde
from tempdb.guest.temp_part_time_evaluator
group by emp_no, eec_planning_unit_cde
) a
ON b.emp_no = a.emp_no AND b.eec_planning_unit_cde = a.eec_planning_unit_cde
Now, if I execute just the inner select statement, 400 rows are returned. If I run the whole query, 600 rows are returned. Isn't an inner join supposed to only show the intersection of the two sets?
Attempt 2:
I also tried the answer from Erwin. This one has a syntax error, and I'm having trouble googling the spec for the where clause (specifically, the trick he is using with (emp_no, eec_planning_unit_cde)).
Here is the attempt:
select emp_no,
eec_planning_unit_cde,
'11' as area, create_dte,
create_by_emp_no,
modify_dte,
modify_by_emp_no
from tempdb.guest.temp_part_time_evaluator
where (emp_no, eec_planning_unit_cde) IN
(
select emp_no, eec_planning_unit_cde
from tempdb.guest.temp_part_time_evaluator
group by emp_no, eec_planning_unit_cde
)
Now, I realize that the post I referenced is for postgresql. Doesn't T-SQL have something similar? Trying to google parenthesis isn't working too well.
Overview of Questions:
Why doesn't inner join return an intersection of two sets? From googling this is what I thought it was supposed to do
Is there another way to achieve the same method that I was trying in attempt 2 in t-sql?
It doesn't matter to me which one of these I use, or if I use another solution... how should I go about this?
A SELECT DISTINCT is based on all columns, so it does not guarantee that the first two are distinct:
select pk1, pk2, '11', max(c1), max(c2), max(c3)
from table
group by pk1, pk2
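A minimal sketch of the GROUP BY/MAX approach in SQLite via Python's sqlite3, with a hypothetical part_time_evaluator table holding only some of the columns and made-up rows. Each MAX() is taken independently, so the dates in one output row may come from different source rows; that seems acceptable here because the asker says any dates will do.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE part_time_evaluator "
    "(emp_no INTEGER, eec_planning_unit_cde TEXT, create_dte TEXT, modify_dte TEXT)"
)
# Hypothetical rows: the (1, 'U1') pair appears twice with different dates.
conn.executemany(
    "INSERT INTO part_time_evaluator VALUES (?, ?, ?, ?)",
    [
        (1, "U1", "2020-01-01", "2020-02-01"),
        (1, "U1", "2020-01-05", "2020-02-03"),
        (2, "U1", "2020-03-01", "2020-03-02"),
    ],
)

# One output row per (emp_no, eec_planning_unit_cde); other columns aggregated.
query = """
SELECT emp_no, eec_planning_unit_cde, '11' AS area,
       MAX(create_dte), MAX(modify_dte)
FROM part_time_evaluator
GROUP BY emp_no, eec_planning_unit_cde
ORDER BY emp_no
"""
rows = conn.execute(query).fetchall()
print(rows)
# [(1, 'U1', '11', '2020-01-05', '2020-02-03'), (2, 'U1', '11', '2020-03-01', '2020-03-02')]
```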
You could TRY this:
SELECT a.emp_no,
a.eec_planning_unit_cde,
'11' as area,
b.create_dte,
b.create_by_emp_no,
b.modify_dte,
b.modify_by_emp_no
FROM
(
SELECT emp_no, eec_planning_unit_cde
FROM tempdb.guest.temp_part_time_evaluator
GROUP BY emp_no, eec_planning_unit_cde
) a
JOIN tempdb.guest.temp_part_time_evaluator b
ON a.emp_no = b.emp_no AND a.eec_planning_unit_cde = b.eec_planning_unit_cde
That would give you a distinct on those fields, but if there are differences in the data between columns you might have to try a more brute-force approach:
SELECT a.emp_no,
a.eec_planning_unit_cde,
a.area,
a.create_dte,
a.create_by_emp_no,
a.modify_dte,
a.modify_by_emp_no
FROM
(
-- PARTITION BY restarts the numbering for each (emp_no, eec_planning_unit_cde) pair
SELECT ROW_NUMBER() OVER(PARTITION BY emp_no, eec_planning_unit_cde ORDER BY emp_no, eec_planning_unit_cde) rownumber,
emp_no,
eec_planning_unit_cde,
'11' as area,
create_dte,
create_by_emp_no,
modify_dte,
modify_by_emp_no
FROM tempdb.guest.temp_part_time_evaluator
) a
WHERE rownumber = 1
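For comparison, a sketch of the ROW_NUMBER() approach on the same kind of hypothetical data, in SQLite via Python's sqlite3 (window functions need SQLite 3.25 or later). Unlike MAX(), it keeps every column from one actual source row; ordering by create_dte inside each partition is an assumption that just makes the pick deterministic.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE part_time_evaluator "
    "(emp_no INTEGER, eec_planning_unit_cde TEXT, create_dte TEXT, modify_dte TEXT)"
)
# Hypothetical rows: the (1, 'U1') pair appears twice.
conn.executemany(
    "INSERT INTO part_time_evaluator VALUES (?, ?, ?, ?)",
    [
        (1, "U1", "2020-01-01", "2020-02-01"),
        (1, "U1", "2020-01-05", "2020-02-03"),
        (2, "U1", "2020-03-01", "2020-03-02"),
    ],
)

# Number the rows within each key pair, then keep the first row of each pair.
query = """
SELECT emp_no, eec_planning_unit_cde, area, create_dte, modify_dte
FROM (
    SELECT ROW_NUMBER() OVER (PARTITION BY emp_no, eec_planning_unit_cde
                              ORDER BY create_dte) AS rownumber,
           emp_no, eec_planning_unit_cde, '11' AS area, create_dte, modify_dte
    FROM part_time_evaluator
) a
WHERE rownumber = 1
ORDER BY emp_no
"""
rows = conn.execute(query).fetchall()
print(rows)
# [(1, 'U1', '11', '2020-01-01', '2020-02-01'), (2, 'U1', '11', '2020-03-01', '2020-03-02')]
```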
I'll reply one by one:
Why doesn't inner join return an intersection of two sets? From googling this is what I thought it was supposed to do
An inner join doesn't compute an intersection. Let's suppose these tables:
T1
n | s
1 | A
2 | B
2 | C
3 | D
T2
n | s
2 | X
2 | Y
If you join both tables on the numeric column, you don't get the intersection (2 rows). You get:
select t1.*
from t1 inner join t2
on t1.n = t2.n;
| N | S |
---------
| 2 | B |
| 2 | B |
| 2 | C |
| 2 | C |
And with your second query's approach:
select *
from t1
where t1.n in (select n from t2);
| N | S |
---------
| 2 | B |
| 2 | C |
Is there another way to achieve the same method that I was trying in attempt 2 in t-sql?
Yes, with an EXISTS subquery:
select *
from t1
where exists (
select 1
from t2
where t2.n = t1.n
);
It doesn't matter to me which one of these I use, or if I use another solution... how should I go about this?
Use @JTC's second query.
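The T1/T2 example can be reproduced in SQLite via Python's sqlite3 to see the row multiplication of the join next to the IN and EXISTS filters. The sample data is the one from this answer; the ORDER BY clauses are added only so the output is stable.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t1 (n INTEGER, s TEXT)")
conn.execute("CREATE TABLE t2 (n INTEGER, s TEXT)")
conn.executemany("INSERT INTO t1 VALUES (?, ?)", [(1, "A"), (2, "B"), (2, "C"), (3, "D")])
conn.executemany("INSERT INTO t2 VALUES (?, ?)", [(2, "X"), (2, "Y")])

# Join: every t1 row with n = 2 pairs with both t2 rows, so rows multiply.
join_rows = conn.execute(
    "SELECT t1.n, t1.s FROM t1 INNER JOIN t2 ON t1.n = t2.n ORDER BY t1.s"
).fetchall()

# IN and EXISTS only filter t1, so each t1 row appears at most once.
in_rows = conn.execute(
    "SELECT n, s FROM t1 WHERE n IN (SELECT n FROM t2) ORDER BY s"
).fetchall()
exists_rows = conn.execute(
    "SELECT n, s FROM t1 WHERE EXISTS (SELECT 1 FROM t2 WHERE t2.n = t1.n) ORDER BY s"
).fetchall()

print(join_rows)    # [(2, 'B'), (2, 'B'), (2, 'C'), (2, 'C')]
print(in_rows)      # [(2, 'B'), (2, 'C')]
print(exists_rows)  # [(2, 'B'), (2, 'C')]
```

This mirrors the 400 versus 600 rows in the question: the join repeats a row once per matching row on the other side.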