List the name of division that all employees are working on some project(s) - sql

The full question is: list the name of each division in which ALL employees are working on some project(s); namely, there does not exist an employee who does not work on a project. I'm having trouble getting an actual answer for this one, and my professor is no help in telling me what I'm doing wrong. The code I have is:
select dname
from division d, employee e, workon w
where e.did = d.did
and w.empid = e.empid
and not exists
(select empid
from workon
group by empid
having count (empid) >= all(select e.empid
from employee ee
where e.did = ee.did
group by ee.empid))
group by dname
The tables I have are
Employee
| EMPID | NAME | SALARY | DID |
--------------------------------
| 1 | kevin | 32000 | 2 |
| 2 | joan | 46200 | 1 |
| 3 | brian | 37000 | 3 |
| 4 | larry | 82000 | 5 |
| 5 | harry | 92000 | 4 |
| 6 | peter | 45000 | 2 |
| 7 | peter | 68000 | 3 |
| 8 | smith | 39000 | 4 |
| 9 | chen | 71000 | 1 |
| 10 | kim | 46000 | 5 |
Division
| DID | DNAME | MANAGERID |
----------------------------------------------
| 1 | engineering | 2 |
| 2 | marketing | 1 |
| 3 | human resource | 3 |
| 4 | Research and development | 5 |
| 5 | accounting | 4 |
Workon
| PID | EMPID | HOURS |
-----------------------
| 3 | 1 | 30 |
| 2 | 3 | 40 |
| 5 | 4 | 30 |
| 6 | 6 | 60 |
| 4 | 3 | 70 |
| 2 | 4 | 45 |
| 5 | 3 | 90 |
| 3 | 3 | 100 |
| 6 | 8 | 30 |
| 4 | 4 | 30 |
| 5 | 8 | 30 |
| 6 | 7 | 30 |
| 6 | 9 | 40 |
| 5 | 9 | 50 |
| 4 | 6 | 45 |
| 2 | 7 | 30 |
| 2 | 8 | 30 |
| 2 | 9 | 30 |
| 1 | 9 | 30 |
| 1 | 8 | 30 |
| 1 | 7 | 30 |
| 1 | 5 | 30 |
| 1 | 6 | 30 |
| 2 | 6 | 30 |

You're very close. What you're trying to do is called a "correlated subquery". You're relating a key from a table you are querying to a key in a query that doesn't contribute to the candidate set, but does act as a filter in your where clause.
The key line in your code that demonstrates this is the line in the NOT EXISTS clause that says:
e.did = ee.did
Instead of trying to do this by comparing aggregate COUNT(...) results, do an outer join between the Employee and Workon tables to find out if there are any employees who aren't doing anything, then find your departments based on those employees not existing for a given department.
Here's an example query using the Oracle standard HR example tutorial tables representing the same join conditions as you have here. You probably have access to these tables wherever you're running the query, and so should anyone else here who might be interested in the answer, so they can run the query without building your tables to play around with the answer. It's a relatively trivial matter to convert the query to your tables, so I'll leave that exercise to you! :)
The final capitalized line in my query below is the join condition that makes this query a correlated subquery, like you tried to do in yours.
select
*
from
hr.departments d
where
not exists
(
select
ee.employee_id
,ee.first_name
,ee.last_name
,dd.department_id
,dd.department_name
,jj.job_id
from
hr.employees ee
,hr.departments dd
,hr.job_history jj
where
ee.department_id = dd.department_id
and ee.employee_id = jj.employee_id (+)
and jj.job_id is null
AND D.DEPARTMENT_ID = DD.DEPARTMENT_ID
)
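For anyone who wants to try the same idea against the tables from the question, here is a minimal sketch. It assumes SQLite (driven from Python's sqlite3 module) rather than Oracle, so it uses an ANSI LEFT JOIN in place of the (+) syntax; the table and column names come from the question, everything else is my own scaffolding.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE division (did INTEGER PRIMARY KEY, dname TEXT, managerid INTEGER);
CREATE TABLE employee (empid INTEGER PRIMARY KEY, name TEXT, salary INTEGER, did INTEGER);
CREATE TABLE workon (pid INTEGER, empid INTEGER, hours INTEGER);
""")
conn.executemany("INSERT INTO division VALUES (?, ?, ?)", [
    (1, "engineering", 2), (2, "marketing", 1), (3, "human resource", 3),
    (4, "Research and development", 5), (5, "accounting", 4)])
conn.executemany("INSERT INTO employee VALUES (?, ?, ?, ?)", [
    (1, "kevin", 32000, 2), (2, "joan", 46200, 1), (3, "brian", 37000, 3),
    (4, "larry", 82000, 5), (5, "harry", 92000, 4), (6, "peter", 45000, 2),
    (7, "peter", 68000, 3), (8, "smith", 39000, 4), (9, "chen", 71000, 1),
    (10, "kim", 46000, 5)])
conn.executemany("INSERT INTO workon VALUES (?, ?, ?)", [
    (3, 1, 30), (2, 3, 40), (5, 4, 30), (6, 6, 60), (4, 3, 70), (2, 4, 45),
    (5, 3, 90), (3, 3, 100), (6, 8, 30), (4, 4, 30), (5, 8, 30), (6, 7, 30),
    (6, 9, 40), (5, 9, 50), (4, 6, 45), (2, 7, 30), (2, 8, 30), (2, 9, 30),
    (1, 9, 30), (1, 8, 30), (1, 7, 30), (1, 5, 30), (1, 6, 30), (2, 6, 30)])

query = """
SELECT d.dname
FROM division d
WHERE NOT EXISTS (
    SELECT 1
    FROM employee e
    LEFT JOIN workon w ON w.empid = e.empid
    WHERE e.did = d.did        -- correlation with the outer query
      AND w.empid IS NULL      -- this employee has no workon row at all
)
ORDER BY d.did
"""
divisions = [row[0] for row in conn.execute(query)]
print(divisions)
```

With the sample data, employees 2 (joan) and 10 (kim) have no rows in Workon, so engineering and accounting drop out. Note that a division with no employees at all would also pass the NOT EXISTS test; add an EXISTS check on Employee if that matters for your assignment.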

Related

SQL JOIN each id in JSON object

I have a JSON column containing col_values for another table. I want to return rows from that other table for each item in the JSON object.
If this was an INT column, I would use JOIN, but I need to JOIN every entry in the JSON object.
Take:
writers :
| id | name | projects (JSON) |
|:-- |:-----|:------------------|
| 1 | Andy | ["1","2","3","4"] |
| 2 | Hank | ["3","4","5","6"] |
| 3 | Alex | ["1","7","8","9"] |
| 4 | Joe | ["1","5","6","7"] |
| 5 | Ken | ["2","4","5","6"] |
| 6 | Zach | ["2","7","8","9"] |
| 7 | Walt | ["2","5","6","7"] |
| 8 | Mike | ["2","3","4","5"] |
cities :
| id | name | project |
|:-- |:---------|:--------|
| 1 | Boston | 1 |
| 2 | Chicago | 2 |
| 3 | Cisco | 3 |
| 4 | Seattle | 4 |
| 5 | North | 5 |
| 6 | West | 6 |
| 7 | Miami | 7 |
| 8 | York | 8 |
| 9 | Tainan | 9 |
| 10 | Seoul | 1 |
| 11 | South | 2 |
| 12 | Tokyo | 3 |
| 13 | Carlisle | 4 |
| 14 | Fugging | 5 |
| 15 | Turkey | 6 |
| 16 | Paris | 7 |
| 17 | Midguard | 8 |
| 18 | Fugging | 9 |
| 19 | Madrid | 1 |
| 20 | Salvador | 2 |
| 21 | Everett | 3 |
I need every city ordered by name for Mike (id=8).
Desired results:
This is what I'm getting and what I need to get (ORDER BY name).
Output :
| id | name | project |
|:---|:---------|:--------|
| 13 | Carlisle | 4 |
| 2 | Chicago | 2 |
| 3 | Cisco | 3 |
| 21 | Everett | 3 |
| 14 | Fugging | 5 |
| 5 | North | 5 |
| 20 | Salvador | 2 |
| 4 | Seattle | 4 |
| 11 | South | 2 |
| 12 | Tokyo | 3 |
Current query, but this can't be the best way...
SQL >
SELECT c.*
FROM cities c
WHERE EXISTS (
SELECT 1
FROM writers w
WHERE JSON_CONTAINS(
w.projects, CONCAT('\"', c.project, '\"'))
AND w.id = '8'
)
ORDER BY c.name;
DB Fiddle with the above. Is there a better way to do this "properly"?
Background
If it matters, I need to keep using JSON as the datatype because my server-side software that uses this database normally reads that column best if presented as a JSON object.
I would normally just do several database calls and iterate through that JSON object in my server-side language, but that is way too expensive with so many database calls, notwithstanding that it is even more costly to do multiple database calls for pagination.
I need all the results in a single database call. So, I need to JOIN or otherwise loop through each item in the JSON object within SQL.
Start with JOIN
Per a comment from a user, there is a better way...
SQL >
SELECT c.*
FROM writers w
JOIN cities c ON JSON_CONTAINS(w.projects, CONCAT('\"', c.project, '\"'))
WHERE w.id = '8'
ORDER BY c.name;
Output is the same...
Output :
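| id | name | project |
|:---|:---------|:--------|
| 13 | Carlisle | 4 |
| 2 | Chicago | 2 |
| 3 | Cisco | 3 |
| 21 | Everett | 3 |
| 14 | Fugging | 5 |
| 5 | North | 5 |
| 20 | Salvador | 2 |
| 4 | Seattle | 4 |
| 11 | South | 2 |
| 12 | Tokyo | 3 |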
DB Fiddle
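Another way to look at "JOIN every entry in the JSON object" is to expand the array into rows and then do an ordinary equality join. The sketch below is an assumption on my part: it uses SQLite's json_each table-valued function through Python's sqlite3 module (this needs a SQLite build with the JSON functions enabled), not the MySQL server from the question. As far as I know, MySQL 8 offers JSON_TABLE for the same purpose, but I have not tested that here. Only Mike's row is loaded into writers to keep it short.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE writers (id INTEGER PRIMARY KEY, name TEXT, projects TEXT);
CREATE TABLE cities (id INTEGER PRIMARY KEY, name TEXT, project INTEGER);
""")
conn.execute("INSERT INTO writers VALUES (?, ?, ?)",
             (8, "Mike", '["2","3","4","5"]'))

city_names = ["Boston", "Chicago", "Cisco", "Seattle", "North", "West",
              "Miami", "York", "Tainan", "Seoul", "South", "Tokyo",
              "Carlisle", "Fugging", "Turkey", "Paris", "Midguard",
              "Fugging", "Madrid", "Salvador", "Everett"]
# In the question's data the project number cycles 1..9 as the id increases.
conn.executemany(
    "INSERT INTO cities VALUES (?, ?, ?)",
    [(i, name, (i - 1) % 9 + 1) for i, name in enumerate(city_names, start=1)])

query = """
SELECT c.id, c.name, c.project
FROM writers w
JOIN json_each(w.projects) AS j               -- one row per array element
JOIN cities c ON c.project = CAST(j.value AS INTEGER)
WHERE w.id = 8
ORDER BY c.name
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

The array elements are JSON strings ("2", "3", ...), so they are cast to integers before being compared with cities.project. The join on a plain column can then use an index on cities.project, which a JSON_CONTAINS condition generally cannot.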

Using one query's result into another

I have this query, which produces the result below. I want to take the min, max and avg of MarksObtained for each course and show them alongside the output of the second query (provided below).
select
CourseName, StdID, MarksObtained
from
stdmarks
inner join
course on course.courseid = stdmarks.examid
+--------------------------+-------+---------------+
| CourseName | StdID | MarksObtained |
+--------------------------+-------+---------------+
| Digital Logic | 1 | 20 |
| Visual Prog | 1 | 20 |
| Computer Arch and Design | 1 | 20 |
| Digital Logic | 2 | 20 |
| Visual Prog | 2 | 20 |
+--------------------------+-------+---------------+
This is the second query
select
distinct CourseName, TeacherName, SemName
from
teacher
inner join
stdcourseteacher on teacher.teacherid = stdcourseteacher.teacherid
inner join
course on course.courseid = stdcourseteacher.courseid
inner join
semester on stdcourseteacher.semid = semester.semid
+-------------------------+-------------+----------+
| CourseName | TeacherName | SemName |
+-------------------------+-------------+----------+
| Business Communications | Dr. Iman | Fall2021 |
| Calculus - 1 | Dr. Khalid | Fall2021 |
| Calculus - 2 | Dr. Khalid | Fall2020 |
+-------------------------+-------------+----------+
So it will basically show min, max and avg of each course achieved by the students.
What I want:
+-------------------------+-------------+----------+-----+-----+-----+
| CourseName | TeacherName | SemName | Min | Max | Avg |
+-------------------------+-------------+----------+-----+-----+-----+
| Business Communications | Dr. Iman | Fall2021 | 80 | 20 | 50 |
| Calculus - 1 | Dr. Khalid | Fall2021 | 70 | 15 | 45 |
| Calculus - 2 | Dr. Khalid | Fall2020 | 85 | 15 | 50 |
+-------------------------+-------------+----------+-----+-----+-----+
Sample data:
StdMarks table:
+-------+--------+---------------+
| StdID | ExamID | MarksObtained |
+-------+--------+---------------+
| 1 | 9 | 20 |
| 1 | 10 | 20 |
| 1 | 11 | 20 |
+-------+--------+---------------+
StdCourseTeacher Table:
+-------+----------+------------+-------+
| StdID | CourseID | TeacherID | SemID |
+-------+----------+------------+-------+
| 1 | 9 | 7 | 6 |
| 1 | 10 | 7 | 6 |
| 1 | 11 | 2 | 6 |
| 2 | 9 | 7 | 6 |
| 2 | 10 | 7 | 6 |
+-------+----------+------------+-------+
Exam Table:
+--------+--------+----------+----------+-------+----------+-----------+
| ExamID | EvalID | Topic | MaxMarks | SemID | CourseID | TeacherID |
+--------+--------+----------+----------+-------+----------+-----------+
| 1 | 3 | Mid-Term | 20 | 6 | 1 | 3 |
| 2 | 3 | Mid-Term | 20 | 6 | 2 | 4 |
| 3 | 3 | Mid-Term | 20 | 6 | 3 | 7 |
+--------+--------+----------+----------+-------+----------+-----------+
Course Table:
+----------+---------------------------+----------+
| CourseID | CourseName | Semester |
+----------+---------------------------+----------+
| 1 | Calculus - 1 | 1 |
| 2 | Business Communications | 1 |
| 3 | Introduction To Computing | 1 |
+----------+---------------------------+----------+
Semester Table:
+-------+------------+
| SemID | SemName |
+-------+------------+
| 1 | Spring2020 |
| 2 | Summer2020 |
+-------+------------+
Teacher Table:
+-----------+-------------+
| TeacherID | TeacherName |
+-----------+-------------+
| 2 | Dr. Ahmed |
| 3 | Dr. Khalid |
+-----------+-------------+
I think you want to use GROUP BY so that you can apply the aggregate functions, as follows:
select CourseName, TeacherName, SemName, min(MarksObtained), max(MarksObtained), avg(MarksObtained)
from teacher T
inner join stdcourseteacher CT on CT.teacherid = T.teacherid
inner join course C on C.courseid = CT.courseid
inner join semester S on S.semid = CT.semid
inner join stdmarks M on M.examid = C.courseid
group by CourseName, TeacherName, SemName
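If you would rather literally use one query's result inside another, you can aggregate the marks in a derived table and join that to the DISTINCT query. Here is a rough sketch, assuming SQLite via Python's sqlite3; the rows are made up for illustration (the sample data in the question does not line up across tables), and the aliases MinMarks, MaxMarks and AvgMarks are my own. It keeps the question's join of stdmarks.examid to course.courseid.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE course (courseid INTEGER, CourseName TEXT);
CREATE TABLE teacher (teacherid INTEGER, TeacherName TEXT);
CREATE TABLE semester (semid INTEGER, SemName TEXT);
CREATE TABLE stdcourseteacher (stdid INTEGER, courseid INTEGER,
                               teacherid INTEGER, semid INTEGER);
CREATE TABLE stdmarks (stdid INTEGER, examid INTEGER, MarksObtained INTEGER);

-- Hypothetical rows, only for illustration.
INSERT INTO course VALUES (9, 'Digital Logic'), (10, 'Visual Prog');
INSERT INTO teacher VALUES (7, 'Dr. Iman');
INSERT INTO semester VALUES (6, 'Fall2021');
INSERT INTO stdcourseteacher VALUES (1, 9, 7, 6), (2, 9, 7, 6),
                                    (1, 10, 7, 6), (2, 10, 7, 6);
INSERT INTO stdmarks VALUES (1, 9, 20), (2, 9, 10), (1, 10, 15), (2, 10, 5);
""")

query = """
SELECT ct.CourseName, ct.TeacherName, ct.SemName,
       m.MinMarks, m.MaxMarks, m.AvgMarks
FROM (
    -- the "second query", plus courseid so it can be joined
    SELECT DISTINCT c.courseid, c.CourseName, t.TeacherName, s.SemName
    FROM teacher t
    JOIN stdcourseteacher sct ON sct.teacherid = t.teacherid
    JOIN course c ON c.courseid = sct.courseid
    JOIN semester s ON s.semid = sct.semid
) ct
JOIN (
    -- the marks, aggregated once per course
    SELECT examid,
           MIN(MarksObtained) AS MinMarks,
           MAX(MarksObtained) AS MaxMarks,
           AVG(MarksObtained) AS AvgMarks
    FROM stdmarks
    GROUP BY examid
) m ON m.examid = ct.courseid
ORDER BY ct.CourseName
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

Aggregating before the join means the per-student rows in stdcourseteacher cannot multiply the marks rows, which matters as soon as you add something like COUNT or SUM.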

sql left join on same table

|-------------|------------|
| employee_id | team_id |
|:------------|:-----------|
| 1 | 8 |
| 2 | 8 |
| 3 | 8 |
| 4 | 7 |
| 5 | 9 |
| 6 | 9 |
|-------------|------------|
Write an SQL query to find the team size of each of the employees.
SELECT a.employee_id, COUNT(b.team_id) AS team_size
FROM Employee a LEFT JOIN
Employee b
ON a.team_id = b.team_id
GROUP BY a.employee_id
The answer above is correct; I am just confused as to why you would use a LEFT JOIN on two tables that are the same.
The best way to understand what is happening is to just view the intermediate table which results from the self join. It looks something like this:
+---------------+-----------+---------------+-----------+
| a.employee_id | a.team_id | b.employee_id | b.team_id |
+---------------+-----------+---------------+-----------+
| 1 | 8 | 1 | 8 | \
| 1 | 8 | 2 | 8 | 3 members
| 1 | 8 | 3 | 8 | /
| 2 | 8 | 1 | 8 | \
| 2 | 8 | 2 | 8 | 3 members
| 2 | 8 | 3 | 8 | /
| 3 | 8 | 1 | 8 | \
| 3 | 8 | 2 | 8 | 3 members
| 3 | 8 | 3 | 8 | /
| 4 | 7 | 4 | 7 | - 1 member only
| 5 | 9 | 5 | 9 | \
| 5 | 9 | 6 | 9 | / 2 members
| 6 | 9 | 5 | 9 | \
| 6 | 9 | 6 | 9 | / 2 members
+---------------+-----------+---------------+-----------+
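The self left join causes each employee record on the left side to be replicated once for every employee on the same team, including the employee itself. Then, aggregating by employee and counting the records gives the size of each employee's team. The intermediate table above shows this in progress. Because every row matches at least itself, an INNER JOIN would return the same result here; the LEFT JOIN only makes a difference for an employee whose team_id is NULL, who is kept with a team size of 0 instead of being dropped.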
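To check the counts yourself, here is a small sketch that runs the self join on the sample rows. It assumes SQLite through Python's sqlite3 (3.25 or later for window functions). The second query, using COUNT(*) OVER (PARTITION BY team_id), is an alternative I'm adding for comparison and is not part of the original answer.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Employee (employee_id INTEGER, team_id INTEGER)")
conn.executemany("INSERT INTO Employee VALUES (?, ?)",
                 [(1, 8), (2, 8), (3, 8), (4, 7), (5, 9), (6, 9)])

# The self join from the question: one joined row per teammate, then count.
self_join = conn.execute("""
    SELECT a.employee_id, COUNT(b.team_id) AS team_size
    FROM Employee a
    LEFT JOIN Employee b ON a.team_id = b.team_id
    GROUP BY a.employee_id
    ORDER BY a.employee_id
""").fetchall()

# Same answer without a join: count the rows in each team partition.
windowed = conn.execute("""
    SELECT employee_id, COUNT(*) OVER (PARTITION BY team_id) AS team_size
    FROM Employee
    ORDER BY employee_id
""").fetchall()

print(self_join)
print(windowed)
```

Both queries return a team size of 3 for employees 1 to 3, 1 for employee 4, and 2 for employees 5 and 6.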

Percentage to total in BigQuery Legacy SQL (Subqueries?)

I can't understand how to calculate percentage to total in BigQuery Legacy SQL.
So, I have a table:
ID | Name | Group | Mark
1 | John | A | 10
2 | Lucy | A | 5
3 | Jane | A | 7
4 | Lily | B | 9
5 | Steve | B | 14
6 | Rita | B | 11
I want to calculate percentage like this:
ID | Name | Group | Mark | Percent
1 | John | A | 10 | 10/(10+5+7)=45%
2 | Lucy | A | 5 | 5/(10+5+7)=23%
3 | Jane | A | 7 | 7/(10+5+7)=32%
4 | Lily | B | 9 | 9/(9+14+11)=26%
5 | Steve | B | 14 | 14/(9+14+11)=41%
6 | Rita | B | 11 | 11/(9+14+11)=32%
My table is quite long for me (3 million rows).
I thought that I could do it with subqueries, but in SELECT I can't use subqueries.
Does anyone know a way to do it?
SELECT
ID, Name, [Group], Mark,
RATIO_TO_REPORT(Mark) OVER(PARTITION BY [Group]) AS percent
FROM YourTable
Check more about RATIO_TO_REPORT
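If you want to see the arithmetic spelled out, the same ratio can be written as each Mark divided by a windowed SUM over its group. This is only a sketch under assumptions: it runs on SQLite via Python's sqlite3 (not BigQuery Legacy SQL), the table name marks and column name grp are my own, and I multiply by 100 and round to get percentages, whereas RATIO_TO_REPORT, as far as I know, returns a fraction between 0 and 1.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# "grp" stands in for the reserved word Group used in the question.
conn.execute("CREATE TABLE marks (ID INTEGER, Name TEXT, grp TEXT, Mark INTEGER)")
conn.executemany("INSERT INTO marks VALUES (?, ?, ?, ?)", [
    (1, "John", "A", 10), (2, "Lucy", "A", 5), (3, "Jane", "A", 7),
    (4, "Lily", "B", 9), (5, "Steve", "B", 14), (6, "Rita", "B", 11)])

rows = conn.execute("""
    SELECT ID, Name, grp, Mark,
           ROUND(100.0 * Mark / SUM(Mark) OVER (PARTITION BY grp), 1) AS percent
    FROM marks
    ORDER BY ID
""").fetchall()

percents = [row[4] for row in rows]
for row in rows:
    print(row)
```

Group A sums to 22 and group B to 34, so John comes out at about 45.5 and Steve at about 41.2.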

How do I select columns whenever they change?

I'm trying to create a slowly changing dimension (type 2 dimension) and am a bit lost on how to logically write it out. Say that we have a source table with a grain of Person | Country | Department | Login Time. I want to create this dimension table with Person | Country | Department | Eff Start time | Eff End Time.
Data could look like this:
Person | Country | Department | Login Time
------------------------------------------
Bob | CANADA | Marketing | 2009-01-01
Bob | CANADA | Marketing | 2009-02-01
Bob | USA | Marketing | 2009-03-01
Bob | USA | Sales | 2009-04-01
Bob | MEX | Product | 2009-05-01
Bob | MEX | Product | 2009-06-01
Bob | MEX | Product | 2009-07-01
Bob | CANADA | Marketing | 2009-08-01
What I want in the Type 2 dimension would look like this:
Person | Country | Department | Eff Start time | Eff End Time
------------------------------------------------------------------
Bob | CANADA | Marketing | 2009-01-01 | 2009-03-01
Bob | USA | Marketing | 2009-03-01 | 2009-04-01
Bob | USA | Sales | 2009-04-01 | 2009-05-01
Bob | MEX | Product | 2009-05-01 | 2009-08-01
Bob | CANADA | Marketing | 2009-08-01 | NULL
Assume that Bob's name, Country and Department haven't been updated since 2009-08-01, so the end time is left as NULL.
What function would work best here? This is on Netezza, which uses a flavor of Postgres.
Obviously GROUP BY would not work here, because the same grouping can appear again later on (I added Bob | CANADA | Marketing as the last row to show this).
EDIT
Including a hash column on Person, Country, and Department, would make sense, correct? Thinking of using logic of
SELECT PERSON, COUNTRY, DEPARTMENT
FROM table t1
where
person = person
AND t1.hash <> hash_function(person, country, department)
Answer
create table so (
person varchar(32)
,country varchar(32)
,department varchar(32)
,login_time date
) distribute on random;
insert into so values ('Bob','CANADA','Marketing','2009-01-01');
insert into so values ('Bob','CANADA','Marketing','2009-02-01');
insert into so values ('Bob','USA','Marketing','2009-03-01');
insert into so values ('Bob','USA','Sales','2009-04-01');
insert into so values ('Bob','MEX','Product','2009-05-01');
insert into so values ('Bob','MEX','Product','2009-06-01');
insert into so values ('Bob','MEX','Product','2009-07-01');
insert into so values ('Bob','CANADA','Marketing','2009-08-01');
/* ************************************************************************** */
with prm as ( --Create an ordinal primary key.
select
*
,row_number() over (
partition by person
order by login_time
) rwn
from
so
), chn as ( --Chain events to their previous and next event.
select
cur.rwn
,cur.person
,cur.country
,cur.department
,cur.login_time cur_login
,case
when
cur.country = prv.country
and cur.department = prv.department
then 1
else 0
end prv_equal
,case
when
(
cur.country = nxt.country
and cur.department = nxt.department
) or nxt.rwn is null --No next record should be equivalent to matching.
then 1
else 0
end nxt_equal
,case prv_equal
when 0 then cur_login
else null
end eff_login_start_sparse
,case
when eff_login_start_sparse is null
then max(eff_login_start_sparse) over (
partition by cur.person
order by rwn
rows unbounded preceding --The secret sauce.
)
else eff_login_start_sparse
end eff_login_start
,case nxt_equal
when 0 then cur_login
else null
end eff_login_end
from
prm cur
left outer join prm nxt on
cur.person = nxt.person
and cur.rwn + 1 = nxt.rwn
left outer join prm prv on
cur.person = prv.person
and cur.rwn - 1 = prv.rwn
), grp as ( --Group by login starts.
select
person
,country
,department
,eff_login_start
,max(eff_login_end) eff_login_end
from
chn
group by
person
,country
,department
,eff_login_start
), led as ( --Change the effective end to be the next start, if desired.
select
person
,country
,department
,eff_login_start
,case
when eff_login_end is null
then null
else
lead(eff_login_start) over (
partition by person
order by eff_login_start
)
end eff_login_end
from
grp
)
select * from led order by eff_login_start;
This code returns the following table.
PERSON | COUNTRY | DEPARTMENT | EFF_LOGIN_START | EFF_LOGIN_END
--------+---------+------------+-----------------+---------------
Bob | CANADA | Marketing | 2009-01-01 | 2009-03-01
Bob | USA | Marketing | 2009-03-01 | 2009-04-01
Bob | USA | Sales | 2009-04-01 | 2009-05-01
Bob | MEX | Product | 2009-05-01 | 2009-08-01
Bob | CANADA | Marketing | 2009-08-01 |
Explanation
I must have solved this four or five times in the past few years and keep neglecting to write it down formally. I'm glad to have the chance to do it, so this is a great question.
When attempting this, I like writing down the problem in matrix form. Here's the input, presuming that all values have the same key in the SCD.
Cv | Ce
----|----
A | 10
A | 11
B | 14
C | 16
D | 18
D | 25
D | 34
A | 40
Where Cv is the value that we'll need to compare against (again, presuming that the key value for the SCD is equal in this data; we'll be partitioning over the key value the entire time so it's irrelevant to the solution) and Ce is the event time.
First, we need an ordinal primary key. I've designated this Ck in the table. This will allow us to join the table to itself to get the previous and next events. I've called these columns Pk (previous key), Nk (next key), Pv, and Nv.
Cv | Ce | Ck | Pk | Pv | Nk | Nv |
----|----|----|----|----|----|----|
A | 10 | 1 | | | 2 | A |
A | 11 | 2 | 1 | A | 3 | B |
B | 14 | 3 | 2 | A | 4 | C |
C | 16 | 4 | 3 | B | 5 | D |
D | 18 | 5 | 4 | C | 6 | D |
D | 25 | 6 | 5 | D | 7 | D |
D | 34 | 7 | 6 | D | 8 | A |
A | 40 | 8 | 7 | D | | |
Now we need some columns to see if we're at the beginning or end of a contiguous event block. I'll call these Pc and Nc, for contiguous. Pc is defined as Pv = Cv => true. 1 represents true and 0 represents false. Nc is defined similarly, except that the null case defaults to true (we'll see why in a minute).
Cv | Ce | Ck | Pk | Pv | Nk | Nv | Pc | Nc |
----|----|----|----|----|----|----|----|----|
A | 10 | 1 | | | 2 | A | 0 | 1 |
A | 11 | 2 | 1 | A | 3 | B | 1 | 0 |
B | 14 | 3 | 2 | A | 4 | C | 0 | 0 |
C | 16 | 4 | 3 | B | 5 | D | 0 | 0 |
D | 18 | 5 | 4 | C | 6 | D | 0 | 1 |
D | 25 | 6 | 5 | D | 7 | D | 1 | 1 |
D | 34 | 7 | 6 | D | 8 | A | 1 | 0 |
A | 40 | 8 | 7 | D | | | 0 | 1 |
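If you want to reproduce the Pc and Nc columns without writing the self joins, here is a minimal sketch. It assumes SQLite via Python's sqlite3 (3.25 or later) and swaps the joins on Ck - 1 and Ck + 1 for the LAG and LEAD window functions; the table name events is my own.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (Cv TEXT, Ce INTEGER)")
conn.executemany("INSERT INTO events VALUES (?, ?)", [
    ("A", 10), ("A", 11), ("B", 14), ("C", 16),
    ("D", 18), ("D", 25), ("D", 34), ("A", 40)])

rows = conn.execute("""
    WITH keyed AS (
        SELECT Cv, Ce,
               ROW_NUMBER() OVER (ORDER BY Ce) AS Ck,  -- ordinal key
               LAG(Cv)  OVER (ORDER BY Ce) AS Pv,      -- previous value
               LEAD(Cv) OVER (ORDER BY Ce) AS Nv       -- next value
        FROM events
    )
    SELECT Cv, Ce, Ck, Pv, Nv,
           -- a NULL Pv makes the comparison NULL, so it falls to ELSE 0
           CASE WHEN Pv = Cv THEN 1 ELSE 0 END AS Pc,
           -- no next record counts as "contiguous"
           CASE WHEN Nv = Cv OR Nv IS NULL THEN 1 ELSE 0 END AS Nc
    FROM keyed
    ORDER BY Ck
""").fetchall()

pc = [row[5] for row in rows]
nc = [row[6] for row in rows]
print(pc)
print(nc)
```

The two lists should match the Pc and Nc columns in the table above, row for row.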
Now you can start to see how the 1,1 combination of Pc,Nc is a completely useless record. We know this intuitively, since Bob's Mex/Product combination on the 6th row is pretty much useless information when building an SCD.
So let's get rid of the useless information. I'll add two new columns here: an almost-complete effective start time called Sn and an actually-complete effective end time called Ee. Sn is populated with Ce when Pc is 0 and Ee is populated with Ce when Nc is 0.
Cv | Ce | Ck | Pk | Pv | Nk | Nv | Pc | Nc | Sn | Ee |
----|----|----|----|----|----|----|----|----|----|----|
A | 10 | 1 | | | 2 | A | 0 | 1 | 10 | |
A | 11 | 2 | 1 | A | 3 | B | 1 | 0 | | 11 |
B | 14 | 3 | 2 | A | 4 | C | 0 | 0 | 14 | 14 |
C | 16 | 4 | 3 | B | 5 | D | 0 | 0 | 16 | 16 |
D | 18 | 5 | 4 | C | 6 | D | 0 | 1 | 18 | |
D | 25 | 6 | 5 | D | 7 | D | 1 | 1 | | |
D | 34 | 7 | 6 | D | 8 | A | 1 | 0 | | 34 |
A | 40 | 8 | 7 | D | | | 0 | 1 | 40 | |
This looks really close, but we still have the problem that we can't group by Cv (person/country/department). What we need is for Sn to populate all those nulls with the previous value of Sn. You could join this table to itself on rwn < rwn and get the maximum, but I'm going to be lazy and use Netezza's analytic functions and the rows unbounded preceding clause. It's a shortcut to the method I just described. So we're going to create another column called Es, effective start, defined as follows.
case
when Sn is null
then max(Sn) over (
partition by k --key value of the SCD
order by Ck
rows unbounded preceding
)
else Sn
end Es
With that definition, we get this.
Cv | Ce | Ck | Pk | Pv | Nk | Nv | Pc | Nc | Sn | Ee | Es |
----|----|----|----|----|----|----|----|----|----|----|----|
A | 10 | 1 | | | 2 | A | 0 | 1 | 10 | | 10 |
A | 11 | 2 | 1 | A | 3 | B | 1 | 0 | | 11 | 10 |
B | 14 | 3 | 2 | A | 4 | C | 0 | 0 | 14 | 14 | 14 |
C | 16 | 4 | 3 | B | 5 | D | 0 | 0 | 16 | 16 | 16 |
D | 18 | 5 | 4 | C | 6 | D | 0 | 1 | 18 | | 18 |
D | 25 | 6 | 5 | D | 7 | D | 1 | 1 | | | 18 |
D | 34 | 7 | 6 | D | 8 | A | 1 | 0 | | 34 | 18 |
A | 40 | 8 | 7 | D | | | 0 | 1 | 40 | | 40 |
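Here is a small sketch of that fill-forward step on the same matrix, assuming SQLite via Python's sqlite3 rather than Netezza, with a table name (events) of my own. The running MAX only works as a "last non-null value" because Sn is an event time and therefore never decreases as Ck grows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (Cv TEXT, Ce INTEGER)")
conn.executemany("INSERT INTO events VALUES (?, ?)", [
    ("A", 10), ("A", 11), ("B", 14), ("C", 16),
    ("D", 18), ("D", 25), ("D", 34), ("A", 40)])

rows = conn.execute("""
    WITH flagged AS (
        SELECT Cv, Ce,
               ROW_NUMBER() OVER (ORDER BY Ce) AS Ck,
               CASE WHEN LAG(Cv) OVER (ORDER BY Ce) = Cv
                    THEN 1 ELSE 0 END AS Pc,
               CASE WHEN LEAD(Cv) OVER (ORDER BY Ce) = Cv
                      OR LEAD(Cv) OVER (ORDER BY Ce) IS NULL
                    THEN 1 ELSE 0 END AS Nc
        FROM events
    ), sparse AS (
        SELECT Cv, Ce, Ck,
               CASE WHEN Pc = 0 THEN Ce END AS Sn,  -- start of a block
               CASE WHEN Nc = 0 THEN Ce END AS Ee   -- end of a block
        FROM flagged
    )
    SELECT Cv, Ce, Sn, Ee,
           -- running max of Sn = most recent block start so far
           MAX(Sn) OVER (ORDER BY Ck
                         ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS Es
    FROM sparse
    ORDER BY Ck
""").fetchall()

es = [row[4] for row in rows]
print(es)
```

Since MAX ignores NULLs, the CASE wrapper from the definition above is not strictly needed in this form; the windowed MAX alone already returns Sn on the rows where Sn is populated.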
The rest is trivial. Group by Es and grab the max of Ee to obtain this table.
Cv | Es | Ee |
----|----|----|
A | 10 | 11 |
B | 14 | 14 |
C | 16 | 16 |
D | 18 | 34 |
A | 40 | |
If you want to populate the effective end time with the next start, join the table again to itself or use the lead() window function to grab it.
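When the effective end is always going to be the next start, the whole thing can be shortened: keep only the rows that begin a new block, then take lead() over those. The sketch below is a compact variant of the idea, not the Netezza query from this answer; it assumes SQLite via Python's sqlite3 (3.25 or later) and stores the dates as ISO text so they sort correctly.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE so (person TEXT, country TEXT,
                                 department TEXT, login_time TEXT)""")
conn.executemany("INSERT INTO so VALUES (?, ?, ?, ?)", [
    ("Bob", "CANADA", "Marketing", "2009-01-01"),
    ("Bob", "CANADA", "Marketing", "2009-02-01"),
    ("Bob", "USA", "Marketing", "2009-03-01"),
    ("Bob", "USA", "Sales", "2009-04-01"),
    ("Bob", "MEX", "Product", "2009-05-01"),
    ("Bob", "MEX", "Product", "2009-06-01"),
    ("Bob", "MEX", "Product", "2009-07-01"),
    ("Bob", "CANADA", "Marketing", "2009-08-01")])

rows = conn.execute("""
    WITH flagged AS (
        SELECT person, country, department, login_time,
               -- 0 only when both attributes equal the previous row's;
               -- the first row compares against NULL and falls to ELSE 1
               CASE WHEN LAG(country) OVER (PARTITION BY person
                                            ORDER BY login_time) = country
                     AND LAG(department) OVER (PARTITION BY person
                                               ORDER BY login_time) = department
                    THEN 0 ELSE 1 END AS is_start
        FROM so
    ), starts AS (
        SELECT person, country, department, login_time AS eff_start
        FROM flagged
        WHERE is_start = 1
    )
    SELECT person, country, department, eff_start,
           LEAD(eff_start) OVER (PARTITION BY person
                                 ORDER BY eff_start) AS eff_end
    FROM starts
    ORDER BY person, eff_start
""").fetchall()

for row in rows:
    print(row)
```

This drops the Nc and Ee bookkeeping entirely, so use the longer form if you need the last login time within each block rather than the start of the next one.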