Oracle SQL query regarding group by - sql

I have two tables, article and caretaker, with the following columns and structure:
SQL> desc caretaker;
Name Null? Type
----------------------------------------- -------- ----------------------------
CID NOT NULL NUMBER(5)
CNAME VARCHAR2(15)
ADDRESS VARCHAR2(20)
SALARY NUMBER(10,2)
SQL> desc article;
Name Null? Type
----------------------------------------- -------- ----------------------------
ART_NO NOT NULL NUMBER(5)
ART_TITLE VARCHAR2(15)
TYPE VARCHAR2(15)
A_DATE DATE
CID NUMBER(5)
MUSEUM_ID NUMBER(5)
and I need to write 2 queries:
1) Find the details of the articles cared for by a person whose salary is more than 20000 and who takes care of at least 2 articles.
2) Display the details of the caretaker taking care of the maximum number of articles.
For the 1st query I have made it this far:
select a.art_no,a.art_title,a.type,a.a_date from article a,caretaker c
where a.cid = c.cid and c.salary > 20000;
Now I am confused about how to extract only the articles cared for by a person who takes care of at least 2 articles.
2) For the second query:
select c.cid,c.cname,c.address,c.salary from caretaker c,article a
where c.cid=a.cid
and count( select a.cid from article a group by a.cid ) = MAX(a.cid)?????
I am confused, please correct me. Thank you.
(I'm not supposed to use JOIN commands.)

For the first query:
select a.art_no,a.art_title,a.type,a.a_date
from article a,caretaker c
where a.cid = c.cid and c.salary > 20000
and c.cid in (select cid from article group by cid having count(cid) > 1)
SQL Fiddle: http://sqlfiddle.com/#!4/bef24/16
For the second query:
select cname, a.art_no,a.art_title,a.type,a.a_date
from article a,caretaker c
where a.cid = c.cid and c.cid in (select cid
from(select cid, count(cid)
from article
group by cid
having count(cid) = (select max(count(cid))
from article group by cid)));
SQL Fiddle: http://sqlfiddle.com/#!4/bef24/18
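The second query above lists the articles of the busiest caretaker, while the question asks for the caretaker's own details. A minimal sketch of that variant follows, using Python's built-in sqlite3 module only so that it runs without an Oracle instance. The sample rows are made up for illustration; only the table and column names come from the question. SQLite does not accept the nested aggregate max(count(cid)), so the maximum is taken over a derived table instead.

```python
import sqlite3

# Hypothetical sample data; only the table and column names come from the question.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE caretaker (cid INTEGER PRIMARY KEY, cname TEXT, address TEXT, salary REAL);
CREATE TABLE article (art_no INTEGER PRIMARY KEY, art_title TEXT, type TEXT,
                      a_date TEXT, cid INTEGER, museum_id INTEGER);
INSERT INTO caretaker VALUES (1, 'Asha', 'Pune', 25000),
                             (2, 'Ravi', 'Delhi', 18000),
                             (3, 'Meena', 'Goa', 30000);
INSERT INTO article VALUES (10, 'Vase', 'Pottery', '2013-01-05', 1, 100),
                           (11, 'Mask', 'Wood', '2013-02-11', 1, 100),
                           (12, 'Coin', 'Metal', '2013-03-20', 2, 101),
                           (13, 'Scroll', 'Paper', '2013-04-02', 3, 101),
                           (14, 'Sword', 'Metal', '2013-05-17', 3, 102);
""")

# No JOIN keyword: the caretaker is filtered with IN, so ties are all returned.
busiest_caretakers = conn.execute("""
SELECT c.cid, c.cname, c.address, c.salary
FROM caretaker c
WHERE c.cid IN (SELECT cid
                FROM article
                GROUP BY cid
                HAVING COUNT(cid) = (SELECT MAX(n)
                                     FROM (SELECT COUNT(cid) AS n
                                           FROM article
                                           GROUP BY cid)))
ORDER BY c.cid
""").fetchall()

for row in busiest_caretakers:
    print(row)
```

With this sample data two caretakers tie on two articles each, which is the case where comparing with = against the subquery would fail in Oracle and IN is the safer choice.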

Related

Finding max of a column while doing inner join of two tables

I have two tables as follows:
Table A
=====================
student_id test_week
-------- ---------
s1 2018-12-01
s1 2018-12-08
Table B
======================
student_id last_updated remarks
-------- ------------ --------
s1 2018-12-06 Fail
s1 2018-12-10 Pass
From the two tables above, I want to fetch the following columns:
student_id, last(test_week) and remarks, such that
last_updated >= test_week + 1 and last_updated <= test_week + 15,
i.e. last_updated should fall within roughly two weeks after last(test_week), so the result for the above entries will be:
s1 2018-12-08 Pass
I have written the following:
select a.student_id, test_week, remarks
from A inner join B
on A.student_id = B.student_id
and DATEDIFF(last_updated, test_week)>=1
and DATEDIFF(last_updated, test_week)<=15;
But I can't work out how to handle last(test_week).
If you only need the record related to the last test_week, then you can do the following (if I understood this right). The two-argument DATEDIFF is MySQL syntax, so the row limit is written with LIMIT rather than TOP:
select a.student_id, test_week, remarks
from A inner join B
on A.student_id = B.student_id
and DATEDIFF(last_updated, test_week)>=1
and DATEDIFF(last_updated, test_week)<=15
order by test_week desc
limit 1;
You can try the window function row_number(). The following query returns the row with the latest test_week for every student_id.
select * from (
select student_id, test_week, remarks, row_number()
over (partition by student_id order by test_week desc) as rn
from (
select a.student_id, test_week, remarks
from A join B
on A.student_id = B.student_id
and last_updated - test_week >= 1
and last_updated - test_week <= 15) tb1
) tb2 where rn = 1;
Note: the above query is written for PostgreSQL; you may need to convert it into the equivalent MySQL query (for example, using DATEDIFF for the date difference).
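To check the row_number() approach against the sample rows from the question, here is a small sketch using Python's sqlite3 module. It assumes an SQLite build with window functions (3.25 or later) and uses julianday() for the date difference, which is an SQLite detail and not part of either answer.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE A (student_id TEXT, test_week TEXT);
CREATE TABLE B (student_id TEXT, last_updated TEXT, remarks TEXT);
INSERT INTO A VALUES ('s1', '2018-12-01'), ('s1', '2018-12-08');
INSERT INTO B VALUES ('s1', '2018-12-06', 'Fail'), ('s1', '2018-12-10', 'Pass');
""")

# Keep pairs where last_updated is 1 to 15 days after test_week,
# then keep the pair with the latest test_week per student.
latest = conn.execute("""
SELECT student_id, test_week, remarks
FROM (
    SELECT ta.student_id, ta.test_week, tb.remarks,
           ROW_NUMBER() OVER (PARTITION BY ta.student_id
                              ORDER BY ta.test_week DESC) AS rn
    FROM A AS ta
    JOIN B AS tb
      ON ta.student_id = tb.student_id
     AND julianday(tb.last_updated) - julianday(ta.test_week) BETWEEN 1 AND 15
) AS ranked
WHERE rn = 1
""").fetchall()

print(latest)
```

For the sample data this gives the single row s1, 2018-12-08, Pass that the question expects. Note that if several rows of B match the latest test_week, row_number() keeps only one of them.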

SQL query on similarity

I have to display all the customers who have been referred by a referrer with the same last name as the customer.
You can use a self-join as
select c1.customer#, c1.lastname, c1.city, c1.zip, c1.referred
from customers c1
join customers c2
on c1.customer# = c2.referred
and c1.lastname = c2.lastname;
customer# lastname city zip referred
--------- -------- ----------- ------ ---------
1003 SMITH TALLAHASSEE 32306 NULL
Rextester Demo
You can try this query
select cust.*, cust_ref.*
from customers cust,
referred cust_ref
where cust_ref.lastname = cust.lastname
Note: you can select whichever fields you need.
I hope this helps.
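One thing to watch in the self-join is which side of the join gets selected: selecting c1 where c1.customer# = c2.referred returns the referrer, not the referred customer. Below is a minimal sketch of the other direction, with made-up rows and a hypothetical customer_no column standing in for customer# (SQLite does not accept # in an unquoted identifier).

```python
import sqlite3

# Hypothetical rows; customer_no stands in for the customer# column.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customers (customer_no INTEGER PRIMARY KEY, lastname TEXT, referred INTEGER);
INSERT INTO customers VALUES (1001, 'MORALES', NULL),
                             (1003, 'SMITH', NULL),
                             (1007, 'GIANA', 1003),
                             (1016, 'SMITH', 1003);
""")

# c is the referred customer, r is the referrer that c.referred points to.
referred_by_namesake = conn.execute("""
SELECT c.customer_no, c.lastname, r.customer_no AS referrer_no
FROM customers c
JOIN customers r
  ON c.referred = r.customer_no
 AND c.lastname = r.lastname
""").fetchall()

print(referred_by_namesake)
```

Here customers 1007 and 1016 were both referred by 1003, but only 1016 shares the referrer's last name, so only that row comes back.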

Concerned with query size using non-unique join conditions

I have a situation at work. I work in housing. We raise orders to houses (so our contractors can go out and repair the houses).
Orders contain one or more jobs. A dwelling has zero, one or more orders raised against it.
This is a brief data definition. I've simplified the tables - but hopefully you get the idea. An order can contain many jobs, and a property can have many orders.
CREATE TABLE dwellings (
id VARCHAR2(10) PRIMARY KEY NOT NULL,
address VARCHAR2(100) NOT NULL
);
CREATE TABLE orders (
id VARCHAR2(10) PRIMARY KEY NOT NULL,
created_by VARCHAR2(10) NOT NULL,
created_on DATE NOT NULL,
dwelling_id VARCHAR2(10) NOT NULL REFERENCES dwellings(id)
);
CREATE TABLE jobs (
id VARCHAR2(10) PRIMARY KEY NOT NULL,
sor_id VARCHAR2(10) NOT NULL,
order_id VARCHAR2(10) NOT NULL REFERENCES orders(id)
);
And populated:
INSERT INTO dwellings VALUES ('00ABC', '2 The Mews House Little Boston London E1 1EE');
INSERT INTO dwellings VALUES ('5H88H', '3 Electric House Snodsbury S1 1IT');
INSERT INTO orders VALUES ('000001-A', 'CSMITH', DATE '2016-03-10', '00ABC');
INSERT INTO orders VALUES ('000002-A', 'CSMITH', DATE '2016-03-11', '00ABC');
INSERT INTO orders VALUES ('000003-A', 'AJONES', DATE '2016-03-16', '00ABC');
INSERT INTO orders VALUES ('000004-A', 'CSMITH', DATE '2016-03-16', '5H88H');
INSERT INTO jobs VALUES ('001', '000AA0', '000001-A');
INSERT INTO jobs VALUES ('002', '123BB0', '000001-A');
INSERT INTO jobs VALUES ('003', '000AA0', '000002-A');
INSERT INTO jobs VALUES ('004', '787XD7', '000003-A');
INSERT INTO jobs VALUES ('005', '000AA0', '000003-A');
INSERT INTO jobs VALUES ('006', '787XD7', '000004-A');
An analyst wants to know which agents are raising orders that are similar to previous orders. The thing under scrutiny is the SOR_ID, which denotes the type of job. Remember, there are one or more jobs associated with each order. So the task is: produce a report showing orders that contain one or more job types that also appear in previous orders at the property.
The report I'm building will have these column headings.
Agent Name
Order Id
Address
Previous Order Id
Duplicate Job Types
Here is the start of a query that gets there. I haven't executed it against the database because there are 50,000 properties and 100,000 orders and 200,000 jobs. I'm concerned about the size of the table because I'm joining on columns that are not unique.
select * from orders ord
join orders ord2 on ord.dwelling_id = ord2.dwelling_id --shaky
and ord.id <> ord2.id
and ord.created_on - ord2.created_on between 0 and 90
join jobs job on job.order_id = ord.id
join jobs job2 on job2.order_id = ord2.id
where job.sor_id = job2.sor_id
I'm looking for recommendations for how you might refactor this query into something more manageable (without PLSQL). Note that I haven't used LAG / LEAD and I haven't yet used LISTAGG to collapse the job type codes. That will come later. I'm concerned about how expensive the query is at the moment.
Query:
SELECT o.created_by AS agent_name,
d.address,
LISTAGG( o.id, ',' ) WITHIN GROUP ( ORDER BY o.created_on ) AS order_ids,
j.sor_id AS job_type
FROM dwellings d
INNER JOIN orders o
ON ( o.dwelling_id = d.id )
INNER JOIN jobs j
ON ( j.order_id = o.id )
GROUP BY o.created_by, d.address, j.sor_id
HAVING COUNT(1) > 1;
Output:
AGENT_NAME ADDRESS ORDER_IDS JOB_TYPE
---------- -------------------------------------------- ----------------- ----------
CSMITH 2 The Mews House Little Boston London E1 1EE 000001-A,000002-A 000AA0
Lists the jobs with the different order ids that were of the same type and placed by the same agent at the same address. The orders are listed in chronological order within the comma-separated list.
However, if you want it with your headings then you could do:
SELECT *
FROM (
SELECT o.created_by AS agent_name,
o.id,
d.address,
LAG( o.id ) OVER ( PARTITION BY o.created_by, d.address, j.sor_id
ORDER BY o.created_on
) AS previous_order_id,
j.sor_id AS job_type
FROM dwellings d
INNER JOIN orders o
ON ( o.dwelling_id = d.id )
INNER JOIN jobs j
ON ( j.order_id = o.id )
)
WHERE previous_order_id IS NOT NULL;
Which would output:
AGENT_NAME ID ADDRESS PREVIOUS_ORDER_ID JOB_TYPE
---------- ---------- -------------------------------------------- ----------------- ----------
CSMITH 000002-A 2 The Mews House Little Boston London E1 1EE 000001-A 000AA0
If you want to consider multiple agents then you can remove o.created_by from the GROUP BY or PARTITION BY clauses. For the top query you would then need to use LISTAGG to get all the agents, like this:
SELECT LISTAGG( o.created_by, ',' ) WITHIN GROUP ( ORDER BY o.created_on ) AS agent_name,
d.address,
LISTAGG( o.id, ',' ) WITHIN GROUP ( ORDER BY o.created_on ) AS order_ids,
j.sor_id AS job_type
FROM dwellings d
INNER JOIN orders o
ON ( o.dwelling_id = d.id )
INNER JOIN jobs j
ON ( j.order_id = o.id )
GROUP BY d.address, j.sor_id
HAVING COUNT(1) > 1;
Or, for the second query, like this:
SELECT *
FROM (
SELECT o.created_by AS agent_name,
o.id,
d.address,
LAG( o.id ) OVER ( PARTITION BY d.address, j.sor_id
ORDER BY o.created_on
) AS previous_order_id,
j.sor_id AS job_type
FROM dwellings d
INNER JOIN orders o
ON ( o.dwelling_id = d.id )
INNER JOIN jobs j
ON ( j.order_id = o.id )
)
WHERE previous_order_id IS NOT NULL;
Both the queries would then also output the order with id 000003-A placed by AJONES.
Changes I would try out:
ord.id <> ord2.id : ord2.id < ord.id (not sure if that's applicable for you)
ord.created_on - ord2.created_on between 0 and 90 : ord2.created_on <= ord.created_on and ord2.created_on >= ord.created_on - 90 (not sure if the RDBMS can do that optimization)
Move job.sor_id = job2.sor_id into the ON clause (But the RDBMS will probably do that for you)
select * from orders ord
join orders ord2
on ord2.dwelling_id = ord.dwelling_id
and ord2.id < ord.id
and ord2.created_on <= ord.created_on
and ord2.created_on >= ord.created_on - 90
join jobs job on job.order_id = ord.id
join jobs job2
on job2.order_id = ord2.id
and job2.sor_id = job.sor_id;
Indexes you will need:
orders(dwelling_id, created_on, id)
jobs(order_id, sor_id)
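As a rough check of the rewritten self-join on the sample rows from the question, here is a sketch using Python's sqlite3 module. SQLite stands in for Oracle here, so dates are stored as ISO text, GROUP_CONCAT plays the role of LISTAGG, and the index names are hypothetical. It says nothing about how the query performs at 100,000 orders; it only shows which rows the join conditions produce.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dwellings (id TEXT PRIMARY KEY, address TEXT NOT NULL);
CREATE TABLE orders (id TEXT PRIMARY KEY, created_by TEXT NOT NULL,
                     created_on TEXT NOT NULL,
                     dwelling_id TEXT NOT NULL REFERENCES dwellings(id));
CREATE TABLE jobs (id TEXT PRIMARY KEY, sor_id TEXT NOT NULL,
                   order_id TEXT NOT NULL REFERENCES orders(id));
-- The two suggested indexes (names are made up for this sketch).
CREATE INDEX orders_dwelling_idx ON orders (dwelling_id, created_on, id);
CREATE INDEX jobs_order_idx ON jobs (order_id, sor_id);
INSERT INTO dwellings VALUES ('00ABC', '2 The Mews House Little Boston London E1 1EE');
INSERT INTO dwellings VALUES ('5H88H', '3 Electric House Snodsbury S1 1IT');
INSERT INTO orders VALUES ('000001-A', 'CSMITH', '2016-03-10', '00ABC');
INSERT INTO orders VALUES ('000002-A', 'CSMITH', '2016-03-11', '00ABC');
INSERT INTO orders VALUES ('000003-A', 'AJONES', '2016-03-16', '00ABC');
INSERT INTO orders VALUES ('000004-A', 'CSMITH', '2016-03-16', '5H88H');
INSERT INTO jobs VALUES ('001', '000AA0', '000001-A');
INSERT INTO jobs VALUES ('002', '123BB0', '000001-A');
INSERT INTO jobs VALUES ('003', '000AA0', '000002-A');
INSERT INTO jobs VALUES ('004', '787XD7', '000003-A');
INSERT INTO jobs VALUES ('005', '000AA0', '000003-A');
INSERT INTO jobs VALUES ('006', '787XD7', '000004-A');
""")

# One row per (order, earlier order at the same dwelling within 90 days),
# with the shared job types collapsed into a comma-separated list.
report = conn.execute("""
SELECT ord.created_by, ord.id, d.address, ord2.id AS previous_order_id,
       GROUP_CONCAT(job.sor_id) AS duplicate_job_types
FROM orders ord
JOIN dwellings d ON d.id = ord.dwelling_id
JOIN orders ord2
  ON ord2.dwelling_id = ord.dwelling_id
 AND ord2.id < ord.id
 AND ord2.created_on <= ord.created_on
 AND ord2.created_on >= date(ord.created_on, '-90 days')
JOIN jobs job ON job.order_id = ord.id
JOIN jobs job2
  ON job2.order_id = ord2.id
 AND job2.sor_id = job.sor_id
GROUP BY ord.created_by, ord.id, d.address, ord2.id
ORDER BY ord.id, ord2.id
""").fetchall()

for row in report:
    print(row)
```

Unlike the LAG version, this pairs each order with every earlier matching order, so 000003-A appears twice (once against 000001-A and once against 000002-A).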

Complicated table join

I thought I had a good grasp on table joins but there is one problem here I can't figure out.
I am trying to track the progress of students on specifically required courses. Some students are required to complete an exact list of courses before further qualification.
Tables (simplified):
students
--------
id INT PRIMARY KEY
name VARCHAR(50)
student_courses
---------------
student_id INT PRIMARY KEY
course_id TINYINT PRIMARY KEY
course_status TINYINT (Not done, Started, Completed)
steps_done TINYINT
total_steps TINYINT
date_created DATETIME
date_modified DATETIME
courses
-------
id TINYINT PRIMARY KEY
name VARCHAR(50)
I want to insert a list of required courses, for example 5 different courses in the courses table and then select a specific student and get list of all the courses required, whether a row exists for that course in the student_courses table or not.
I guess I could insert all rows from the courses table in the student_courses table for each student, but I don't want that because not all students need to do these courses. And what if new courses are added later.
I just want a result which is something like this:
students table:
id name
--- ------------------
1 George Smith
2 Dana Jones
3 Maria Cobblestone
SELECT * FROM students (JOIN bla bla bla - this is the point where I'm lost...)
WHERE students.id = 1
Result:
id name course_id courses.name course_status steps_done
--- ------------------ --------- ------------ ------------- ----------
1 George Smith 1 Botany Not started 0
1 George Smith 2 Biology NULL NULL
1 George Smith 3 Physics NULL NULL
1 George Smith 4 Algebra Completed 34
1 George Smith 5 Sewing Started 2
If the course_status or steps_done is NULL it means that no row exists for this student for this course in the student_courses table.
The idea is then using this in MS Access (or some other system) and have the row automatically inserted in the student_courses table once you enter a value in the NULL field.
You can't do this with an outer join alone; you first need to build the list of all student/course combinations you're interested in, then use that list in a LEFT JOIN. This can be done in a CTE or subquery using CROSS JOIN:
;WITH cte AS (SELECT DISTINCT s.id Student_ID
,s.name
,c.id Course_ID
,c.name Class_Name
FROM Students s
CROSS JOIN Courses c)
SELECT cte.*,sc.course_status
FROM cte
LEFT JOIN student_courses sc
ON cte.course_id = sc.course_id
AND cte.student_id = sc.student_id
A subquery can be used instead if this needs to be done in Access (not 100% sure of the Access syntax):
SELECT sub.*,sc.course_status
FROM (SELECT DISTINCT s.id Student_ID
,s.name
,c.id Course_ID
,c.name Class_Name
FROM Students s
CROSS JOIN Courses c
) AS sub
LEFT JOIN student_courses sc
ON sub.course_id = sc.course_id
AND sub.student_id = sc.student_id
Demo: SQL Fiddle
You want a left outer join. The first table is from the courses table and is used for the required courses (defined in the where clause).
select s.id, s.name, c.id, c.name, sc.course_status, sc.steps_done
from (courses as c left join
student_courses as sc
on sc.course_id = c.id and
sc.student_id = 1
) left join
students as s
on sc.student_id = s.id
where c.id in (<list of required courses>)
order by s.id, c.id;
I think I have all the "Access"isms in there.
Actually, the above will be missing the student name when s/he is missing a course. The following is more correct:
select s.id, s.name, c.id, c.name, sc.course_status, sc.steps_done
from (courses as c left join
student_courses as sc
on sc.course_id = c.id and
sc.student_id = 1
) cross join
students as s
on s.id = 1
where c.id in (<list of required courses>)
order by s.id, c.id;
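To see the cross-join-then-left-join idea produce the result table from the question, here is a minimal sketch using Python's sqlite3 module rather than Access. The course_status column is stored as text here for readability (the question declares it as TINYINT), and the extra row for Dana Jones is a made-up addition to show why the join has to match on student_id as well as course_id.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE students (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE courses (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE student_courses (student_id INTEGER, course_id INTEGER,
                              course_status TEXT, steps_done INTEGER,
                              PRIMARY KEY (student_id, course_id));
INSERT INTO students VALUES (1, 'George Smith'), (2, 'Dana Jones'),
                            (3, 'Maria Cobblestone');
INSERT INTO courses VALUES (1, 'Botany'), (2, 'Biology'), (3, 'Physics'),
                           (4, 'Algebra'), (5, 'Sewing');
INSERT INTO student_courses VALUES (1, 1, 'Not started', 0),
                                   (1, 4, 'Completed', 34),
                                   (1, 5, 'Started', 2),
                                   (2, 2, 'Started', 3);  -- hypothetical row for another student
""")

# Every (student, course) pair first, then attach progress rows where they exist.
progress = conn.execute("""
SELECT s.id, s.name, c.id, c.name, sc.course_status, sc.steps_done
FROM students s
CROSS JOIN courses c
LEFT JOIN student_courses sc
  ON sc.student_id = s.id
 AND sc.course_id = c.id
WHERE s.id = ?
ORDER BY c.id
""", (1,)).fetchall()

for row in progress:
    print(row)
```

Courses without a student_courses row come back with None (NULL) in the status and steps columns, and Dana Jones's Biology row does not leak into George Smith's result because the join also matches on student_id.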

SQL Query - count - max

I can't manage to come up with a query for a problem. I have three tables:
CREATE TABLE institute (
iid INT PRIMARY KEY,
sign VARCHAR(127) UNIQUE,
city VARCHAR(127) NOT NULL,
area INT CHECK (area>0));
CREATE TABLE desease (
did INT PRIMARY KEY,
name VARCHAR(127) UNIQUE,
level INT CHECK (level>0));
CREATE TABLE studies (
did INT,
iid INT,
FOREIGN KEY (did) REFERENCES desease (did),
FOREIGN KEY (iid) REFERENCES institute (iid),
PRIMARY KEY (iid,did));
My question is: what are the names of the diseases studied by the largest number of institutes from Lisbon (Lisbon being the city of the institute)? This is what I came up with, but it doesn't give me the right answer.
SELECT DISTINCT D.name, MAX(I.iid)
FROM desease D, studies S
JOIN institute I ON (S.iid = I.iid)
WHERE I.city = 'Lisboa' AND D.did = S.did
GROUP BY D.nome
HAVING COUNT(I.iid) = MAX(I.city)
As an example: imagine 5 institutes, all with city = 'Lisbon' and with iid A, B, C, D, E respectively (just for demonstration purposes; I know the type is INT), and 5 diseases with name = Z, X, N, V, M respectively.
Now let's say diseases Z, X and M are each studied by institutes A, B and C (in any order), disease N is studied only by D and disease V is studied only by E. So the largest number of Lisbon institutes studying any one disease is 3 (Z, X and M are each studied by A, B and C), and the result would look like this:
Z - 3
X - 3
M - 3
Edit: I managed to find a way to do it. Here is the query that I came up with:
SELECT DISTINCT D.name, COUNT(*) AS C
FROM desease D, studies E, institute I
WHERE I.iid = E.iid AND D.did = E.did AND I.city = "Lisboa"
GROUP BY D.name
HAVING C >= ALL (
SELECT COUNT(*)
FROM desease D, studies E, institute I
WHERE I.iid = E.iid AND D.did = E.did AND I.city = "Lisboa"
GROUP BY D.name
);
I don't understand the structure/problem well enough, but I did see that you were mixing join styles and had a cross join, which would inflate the number of records.
SELECT DISTINCT D.name, MAX(I.iid)
FROM desease D
INNER JOIN studies S ON D.did = S.did
INNER JOIN institute I ON (S.iid = I.iid)
WHERE I.city = 'Lisboa'
GROUP BY D.name
HAVING COUNT(I.iid) = MAX(I.city)
This would return a list of disease names that have an institute in Lisbon starting with the one with the greatest number of institutes in Lisbon and going down:
SELECT D.name, COUNT(*) as numberOfInstitutes
FROM desease D
INNER JOIN studies S ON D.did=S.did
INNER JOIN institute I ON (S.iid = I.iid)
WHERE I.city = 'Lisbon'
GROUP BY D.did, D.name
ORDER BY COUNT(*) desc
If you need only the one that has the most institutes and you need the rest of the columns from the desease table, you can do this (in Sql Server):
SELECT TOP 1 D.*
FROM desease D
INNER JOIN
(
SELECT D.did, COUNT(*) as numberOfInstitutes
FROM desease D
INNER JOIN studies S ON D.did=S.did
INNER JOIN institute I ON (S.iid = I.iid)
WHERE I.city = 'Lisbon'
GROUP BY D.did
) as tblCount on tblCount.did = D.did
ORDER BY numberOfInstitutes desc
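TOP 1 keeps a single row even when several diseases tie for the highest count, which is exactly the situation in the question's example. Here is a sketch of a tie-preserving variant, run through Python's sqlite3 module on the example from the question. The integer ids assigned to institutes A to E and to the diseases are assumptions, and the CTE plus MAX comparison is a substitute for >= ALL, which SQLite does not support.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE institute (iid INTEGER PRIMARY KEY, sign TEXT UNIQUE, city TEXT NOT NULL);
CREATE TABLE desease (did INTEGER PRIMARY KEY, name TEXT UNIQUE);
CREATE TABLE studies (did INTEGER REFERENCES desease (did),
                      iid INTEGER REFERENCES institute (iid),
                      PRIMARY KEY (iid, did));
-- Institutes A..E get ids 1..5; diseases Z, X, N, V, M get ids 1..5 (assumed ids).
INSERT INTO institute VALUES (1, 'A', 'Lisboa'), (2, 'B', 'Lisboa'), (3, 'C', 'Lisboa'),
                             (4, 'D', 'Lisboa'), (5, 'E', 'Lisboa');
INSERT INTO desease VALUES (1, 'Z'), (2, 'X'), (3, 'N'), (4, 'V'), (5, 'M');
INSERT INTO studies (did, iid) VALUES (1, 1), (1, 2), (1, 3),
                                      (2, 1), (2, 2), (2, 3),
                                      (5, 1), (5, 2), (5, 3),
                                      (3, 4), (4, 5);
""")

# Count Lisbon institutes per disease once, then keep every disease at the maximum.
most_studied = conn.execute("""
WITH per_disease AS (
    SELECT D.name AS name, COUNT(*) AS n
    FROM desease D
    JOIN studies S ON S.did = D.did
    JOIN institute I ON I.iid = S.iid
    WHERE I.city = 'Lisboa'
    GROUP BY D.did, D.name
)
SELECT name, n
FROM per_disease
WHERE n = (SELECT MAX(n) FROM per_disease)
ORDER BY name
""").fetchall()

print(most_studied)
```

On this data all three tied diseases (M, X and Z, each with 3 institutes) are returned, where a TOP 1 query would keep just one of them.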
Just a rough guess what you need:
SELECT stu.iid, COUNT(*) AS nstudies
FROM studies stu, institute ins
WHERE stu.iid=ins.iid
AND ins.city='Lisboa'
GROUP BY stu.iid
ORDER BY nstudies DESC;
This should give you a list of institutes that are in Lisboa and the number of studies they did.
SELECT stu.did, COUNT(*) AS ninst
FROM studies stu, institute ins, desease dis
WHERE stu.iid=ins.iid
AND stu.did=dis.did
AND ins.city='Lisboa'
GROUP BY stu.did
ORDER BY ninst DESC;
This gives you a list of diseases and the number of Lisboa institutes that study each one.
Unfortunately your question leaves a lot of room for speculation as to what you need -- maybe you should add some example data and the expected result.