Concatenating multiple rows that differ in only one column - sql

I'm writing an app and I have the following tables in SQLite:
course:
_id | name | a | b
university:
_id | name | c | d
course_university:
course_id | university_id
Course_university links the courses with the universities that offer them. It's a many-to-many relationship. I need a request that would give me the following
course._id | course.name | course.a | university.name | university.c
The query I thought would work was
SELECT c._id, c.name, c.a, u.name, u.c
FROM course AS c, university AS u, course_university AS cu
WHERE c._id=cu.course_id AND u._id=cu.university_id
The problem is that if a course is offered by more than one university, the above query will show it once per university, the only difference being in the university column. Is there a way to concatenate the university names for one course, so instead of getting
20 | Calculus | 23 | Stanford | 5 |
20 | Calculus | 23 | Harvard | 5 |
I'd get
20 | Calculus | 23 | Stanford & Harvard | 5 |
In my case there might be more than 2 universities working together on one course, so if it accommodates for concatenating three rows then great. This is my first time dealing with SQL databases, so I'm not that aware of any more advanced methodology to solve this.

Here is an example of how you use group_concat():
SELECT c._id, c.name, c.a, group_concat(u.name, ' & ') AS universities, u.c
FROM course_university cu
JOIN course c ON c._id = cu.course_id
JOIN university u ON u._id = cu.university_id
GROUP BY c._id, c.name, c.a, u.c;
I also changed the query syntax to use explicit, ANSI standard join syntax.
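To try the group_concat() approach without touching the app's database, here is a minimal sketch using Python's built-in sqlite3 module and an in-memory database. The sample rows (including the second course and the MIT row) are made up for illustration. It also leaves u.c out of the SELECT and GROUP BY, on the assumption that you want exactly one row per course: grouping by u.c would split a course into several rows whenever its universities have different c values.

```python
import sqlite3

# In-memory database with the three tables from the question (sample data is hypothetical).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE course (_id INTEGER PRIMARY KEY, name TEXT, a INTEGER, b INTEGER);
    CREATE TABLE university (_id INTEGER PRIMARY KEY, name TEXT, c INTEGER, d INTEGER);
    CREATE TABLE course_university (course_id INTEGER, university_id INTEGER);
    INSERT INTO course VALUES (20, 'Calculus', 23, 0), (21, 'Algebra', 7, 0);
    INSERT INTO university VALUES (1, 'Stanford', 5, 0), (2, 'Harvard', 5, 0), (3, 'MIT', 5, 0);
    INSERT INTO course_university VALUES (20, 1), (20, 2), (21, 3);
""")

# One output row per course; university names are joined with ' & '.
rows = conn.execute("""
    SELECT c._id, c.name, c.a, group_concat(u.name, ' & ') AS universities
    FROM course_university cu
    JOIN course c ON c._id = cu.course_id
    JOIN university u ON u._id = cu.university_id
    GROUP BY c._id, c.name, c.a
    ORDER BY c._id
""").fetchall()

for row in rows:
    print(row)
```

Note that SQLite does not guarantee the order in which group_concat() joins the names, so 'Stanford & Harvard' and 'Harvard & Stanford' are both possible results.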

Related

Compare Two Relations in SQL

I just started studying SQL and this is a demo given by the teacher in an online course and it works fine. The statement is looking for "students such that number of other students with same GPA is equal to number of other students with same sizeHS":
select *
from Student S1
where (
select count(*)
from Student S2
where S2.sID <> S1.sID and S2.GPA = S1.GPA
) = (
select count(*)
from Student S2
where S2.sID <> S1.sID and S2.sizeHS = S1.sizeHS
);
It seems that in this where clause, we're comparing two relations (because the result of a subquery is a relation), but most of the time we are comparing attributes (as far as I've seen).
So I'm thinking about whether there are requirements for how many attributes, and how many tuples, the RELATION should contain when comparing two RELATIONS. If not, how do we compare two RELATIONS when there're multiple attributes or multiple tuples and what do we get for result?
Note:
Student relation has 4 attributes: sID, sName, GPA, sizeHS. And here's the data:
+-----+--------+-----+--------+
| sID | sName | GPA | sizeHS |
+-----+--------+-----+--------+
| 123 | Amy | 3.9 | 1000 |
| 234 | Bob | 3.6 | 1500 |
| 345 | Craig | 3.5 | 500 |
| 456 | Doris | 3.9 | 1000 |
| 567 | Edward | 2.9 | 2000 |
| 678 | Fay | 3.8 | 200 |
| 789 | Gary | 3.4 | 800 |
| 987 | Helen | 3.7 | 800 |
| 876 | Irene | 3.9 | 400 |
| 765 | Jay | 2.9 | 1500 |
| 654 | Amy | 3.9 | 1000 |
| 543 | Craig | 3.4 | 2000 |
+-----+--------+-----+--------+
and the result of this query is:
+-----+--------+-----+---------+
| sID | sName | GPA | sizeHS |
+-----+--------+-----+---------+
| 345 | Craig | 3.5 | 500 |
| 567 | Edward | 2.9 | 2000 |
| 678 | Fay | 3.8 | 200 |
| 789 | Gary | 3.4 | 800 |
| 765 | Jay | 2.9 | 1500 |
| 543 | Craig | 3.4 | 2000 |
+-----+--------+-----+---------+
because the result of a subquery is a relation
Relation is the scientific name for what we call a table in a database and I like the name "table" much better than "relation". A table is easy to imagine. We know them from our school time schedule for instance. Yes, we relate things here inside a table (day and time and the subject taught in school), but we can also relate tables to tables (pupils' timetables with the table of class rooms, the overall subject schedule, and the teacher's timetables). As such, tables in an RDBMS are also related to each other (hence the name relational database management system). I find the name relation for a table quite confusing (and many people use the word "relation" to describe the relations between tables instead).
So, yes, a query result itself is again a table ("relation"). And from tables we can of course select:
select * from (select * from b) as subq;
And then there are scalar queries that return exactly one row and one column. select count(*) from b is such a query. While this is still a table we can select from
select * from (select count(*) as cnt from b) as subq;
we can even use them where we usually have single values, e.g. in the select clause:
select a.*, (select count(*) from b) as cnt from a;
In your query you have two scalar subqueries in your where clause.
With subqueries there is another distinction to make: we have correlated and non-correlated subqueries. The last query I have just shown contains a non-correlated subquery. It selects the count of b rows for every single result row, no matter what that row contains elsewise. A correlated subquery on the other hand may look like this:
select a.*, (select count(*) from b where b.x = a.y) as cnt from a;
Here, the subquery is related to the main table. For every result row we look up the count of b rows matching the a row we are displaying via where b.x = a.y, so the count is different from row to row (but we'd get the same count for a rows sharing the same y value).
Your subqueries are also correlated. As with the select clause, the where clause deals with one row at a time (in order to keep or dismiss it). So we look at one student S1 at a time. For this student we count other students (S2, where S2.sID <> S1.sID) who have the same GPA (and S2.GPA = S1.GPA) and count other students who have the same sizeHS. We only keep students (S1) where there are exactly as many other students with the same GPA as there are with the same sizeHS.
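If you want to watch the two correlated scalar subqueries at work, here is a small sketch that loads the Student data from the question into an in-memory SQLite database via Python's sqlite3 module and runs the same query. SQLite is only my assumption for a convenient test bed; the query itself is unchanged.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Student (sID INTEGER PRIMARY KEY, sName TEXT, GPA REAL, sizeHS INTEGER)"
)
# The twelve rows listed in the question.
conn.executemany(
    "INSERT INTO Student VALUES (?, ?, ?, ?)",
    [
        (123, "Amy", 3.9, 1000), (234, "Bob", 3.6, 1500), (345, "Craig", 3.5, 500),
        (456, "Doris", 3.9, 1000), (567, "Edward", 2.9, 2000), (678, "Fay", 3.8, 200),
        (789, "Gary", 3.4, 800), (987, "Helen", 3.7, 800), (876, "Irene", 3.9, 400),
        (765, "Jay", 2.9, 1500), (654, "Amy", 3.9, 1000), (543, "Craig", 3.4, 2000),
    ],
)

# Each subquery is evaluated once per S1 row and returns a single number,
# so the WHERE clause compares two scalars, not two multi-row tables.
query = """
    SELECT *
    FROM Student S1
    WHERE (SELECT count(*) FROM Student S2
           WHERE S2.sID <> S1.sID AND S2.GPA = S1.GPA)
        = (SELECT count(*) FROM Student S2
           WHERE S2.sID <> S1.sID AND S2.sizeHS = S1.sizeHS)
"""
kept_ids = sorted(row[0] for row in conn.execute(query))
print(kept_ids)  # [345, 543, 567, 678, 765, 789]
```

Those are the same six students as in the result table of the question, just sorted by sID.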
UPDATE
As to comparing tuples (several columns at once), as in
select *
from Student S1
where (
select count(*), avg(grade)
from Student S2
where S2.sID <> S1.sID and S2.GPA = S1.GPA
) = (
select count(*), avg(grade)
from Student S2
where S2.sID <> S1.sID and S2.sizeHS = S1.sizeHS
);
this is possible in some DBMS, but not in SQL Server. SQL Server doesn't know tuples.
But there are other means to achieve the same. You could just add two subqueries:
select * from student s1
where (...) = (...) -- compare counts here
and (...) = (...) -- compare averages here
Or get the data in the FROM clause and then deal with it. E.g.:
select *
from Student S1
cross apply
(
select count(*) as cnt, avg(grade) as avg_grade
from Student S2
where S2.sID <> S1.sID and S2.GPA = S1.GPA
) sx
cross apply
(
select count(*) as cnt, avg(grade) as avg_grade
from Student S2
where S2.sID <> S1.sID and S2.sizeHS = S1.sizeHS
) sy
where sx.cnt = sy.cnt and sx.avg_grade = sy.avg_grade;
There are relational operations:
The intersection operator produces the set of tuples that two
relations share in common. Intersection is implemented in SQL in the
form of the INTERSECT operator.
The difference operator acts on two relations and produces the set of tuples from the first relation that do not exist in the second relation. Difference is implemented in SQL in the form of the EXCEPT or MINUS operator.
So, in the context of SQL Server, for example, you can do:
SELECT *
FROM R1
EXCEPT
SELECT *
FROM R2
to get rows in R1 not included in R2 and the reverse - to get all differences.
Of course, the attributes must be the same in both relations. If they are not, you need to list the attributes explicitly in each SELECT.
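As a quick illustration of these set operators, here is a sketch that runs EXCEPT and INTERSECT through Python's sqlite3 module. The tables R1 and R2 and their columns x and y are hypothetical, and SQLite is used only because it ships with Python; the same statements should carry over to SQL Server.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE R1 (x INTEGER, y TEXT);
    CREATE TABLE R2 (x INTEGER, y TEXT);
    INSERT INTO R1 VALUES (1, 'a'), (2, 'b'), (3, 'c');
    INSERT INTO R2 VALUES (2, 'b'), (3, 'c'), (4, 'd');
""")

# Rows in R1 that are not in R2, and the reverse.
only_r1 = conn.execute("SELECT x, y FROM R1 EXCEPT SELECT x, y FROM R2").fetchall()
only_r2 = conn.execute("SELECT x, y FROM R2 EXCEPT SELECT x, y FROM R1").fetchall()

# Rows the two relations share.
common = sorted(conn.execute("SELECT x, y FROM R1 INTERSECT SELECT x, y FROM R2"))

# All differences: wrap each EXCEPT in a subquery so the UNION applies to both results.
all_diffs = sorted(conn.execute("""
    SELECT * FROM (SELECT x, y FROM R1 EXCEPT SELECT x, y FROM R2)
    UNION
    SELECT * FROM (SELECT x, y FROM R2 EXCEPT SELECT x, y FROM R1)
"""))

print(only_r1, only_r2, common, all_diffs)
```

The columns are listed explicitly rather than with SELECT *, which keeps the comparison valid even if the two tables later gain different extra columns.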

sql find all instances where all are part of distinct group

I'm sure a similar question to this has already been asked and answered, but I haven't been able to find anything in search, please be gentle.
I would like to know all the names of faculty members in a database who teach in every room of a building. The tables are very bare, but they are:
class:
+--------+---------+------+------+
| cname | meetsat | room | fid |
+--------+---------+------+------+
| class | 8 | R128 | 5 |
| class2 | 9 | R129 | 6 |
| class3 | 9 | R128 | 5 |
+--------+---------+------+------+
faculty:
+-----+---------------+--------+
| fid | fname | deptid |
+-----+---------------+--------+
| 5 | i.teach | 999 |
| 6 | other guy | 998 |
| 8 | another woman | 997 |
+-----+---------------+--------+
Through discussion with other users so far, I have:
(SELECT f.fname
FROM faculty f, class c
WHERE f.fid = c.fid)
UNION
(select c.fid
from class c
group by c.fid
having count(distinct room) = (select count(distinct c2.room) from class
c2));
current output:
+-----------+
| fname |
+-----------+
| i.teach |
| other guy |
+-----------+
desired output should be:
+---------+
| fname |
+---------+
| i.teach |
+---------+
I think I only need to join correctly. The course materials I have are extremely bare-bones and don't offer much in concept instruction, so I don't know how to apply them in different situations.
Here's a query that may do what you require, implementing your algorithm of comparing counts. It is an alternative to the HAVING posted by Gordon
SELECT * FROM
(SELECT count(distinct room) as countAllRooms FROM class) ar
INNER JOIN
(SELECT c.fid, count(distinct c.room) as countRoomsPerTeacher FROM class c GROUP BY c.fid) rpt
ON
rpt.countRoomsPerTeacher = ar.countAllRooms
INNER JOIN
faculty f
ON
f.fid = rpt.fid
In relation to your query on Gordon's answer, the safest way to join the faculty table:
Select * from faculty inner join
(
select c.fid
from class C
group by c.fid
having count(distinct room) = (select count(distinct c2.room) from class c2)
) ff
on ff.fid = faculty.fid
I wouldn't normally format an sql like this but I've done this deliberately to show the bits I added and which bits were Gordon's
You should avoid trying to join the faculty table into the inner query that does the grouping, as it will force you to add more columns to your select list, which forces you to add more to your group by, which breaks your counting. It is better to consider Gordon's query a "faculty finder" that runs in isolation as a subquery and is joined later.
You are looking for having:
select c.fid
from class c
group by c.fid
having count(distinct c.room) = (select count(distinct c2.room) from class c2);
Getting the name is just a matter of joining in the faculty table.
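Here is a runnable sketch of that join, using Python's sqlite3 module with an in-memory database. One caveat about the data: with only the three class rows from the question, nobody teaches in both R128 and R129, so the query would return nothing. The fourth row (class4) is a hypothetical addition so that i.teach covers every room.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE faculty (fid INTEGER PRIMARY KEY, fname TEXT, deptid INTEGER);
    CREATE TABLE class (cname TEXT, meetsat INTEGER, room TEXT, fid INTEGER);
    INSERT INTO faculty VALUES (5, 'i.teach', 999), (6, 'other guy', 998),
                               (8, 'another woman', 997);
    INSERT INTO class VALUES ('class', 8, 'R128', 5), ('class2', 9, 'R129', 6),
                             ('class3', 9, 'R128', 5);
    -- Hypothetical extra row, not in the question: fid 5 now also teaches in R129.
    INSERT INTO class VALUES ('class4', 10, 'R129', 5);
""")

# The inner query finds the fids whose distinct-room count equals the overall
# distinct-room count; the outer join only looks up the names.
names = conn.execute("""
    SELECT f.fname
    FROM faculty f
    JOIN (
        SELECT c.fid
        FROM class c
        GROUP BY c.fid
        HAVING count(DISTINCT c.room) = (SELECT count(DISTINCT c2.room) FROM class c2)
    ) ff ON ff.fid = f.fid
""").fetchall()

print(names)  # [('i.teach',)]
```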
What about an inner join? Or perhaps I didn’t understand your question.
Select f.fname from faculty f inner join class c on f.fid=c.fid

How can I find all columns A whose subcategories B are all related to the same column C?

I'm trying to better understand relational algebra and am having trouble solving the following type of question:
Suppose there is a column A (Department), a column B (Employees) and a column C (Managers). How can I find all of the departments who only have one manager for all of their employees? An example is provided below:
Department | Employees | Managers
-------------+-------------+----------
A | John | Bob
A | Sue | Sam
B | Jim | Don
B | Alex | Don
C | Jason | Xie
C | Greg | Xie
In this table, the result I should get is all tuples containing departments B and C, because all of their employees are managed by the same person (Don and Xie respectively). Department A, however, would not be returned because its employees have multiple managers.
Any help or pointers would be appreciated.
Such problems usually call for a self-join.
Joining the relation onto itself on Department, then filtering out the tuples where the Managers are equal would yield us all the unwanted tuples, which we can just subtract from the original relations.
Here's how I'd do it:
First we make a copy of table T, and call it T2, then take a cross product of T and T2. From the result we select all the rows where T1.Manager /= T2.Manager but T1.Department=T2.Department, yielding us these tuples:
T1.Department | T1.Employees| T1.Managers | T2.Managers | T2.Employees | T2.Department
--------------+-------------+-------------+-------------+--------------+--------------
A | John | Bob | Sam | Sue | A
A | Sue | Sam | Bob | John | A
Departments B and C aren't present because their T1.Manager always equals T2.Manager.
Then we just project this result onto Department and subtract those departments from the original set to get the answer.
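The self-join-and-subtract idea can be sketched in SQL as well. Below is a small test run through Python's sqlite3 module; the table name T comes from the description above, while the use of SQLite and the final join back to T (to return whole tuples rather than just department names) are my own assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE T (Department TEXT, Employees TEXT, Managers TEXT)")
conn.executemany(
    "INSERT INTO T VALUES (?, ?, ?)",
    [
        ("A", "John", "Bob"), ("A", "Sue", "Sam"),
        ("B", "Jim", "Don"), ("B", "Alex", "Don"),
        ("C", "Jason", "Xie"), ("C", "Greg", "Xie"),
    ],
)

# Step 1 (right side of EXCEPT): self-join on Department and keep pairs with
# different managers; these are the unwanted departments.
# Step 2: subtract them from all departments.
# Step 3: join back to T to get the full tuples of the remaining departments.
result = conn.execute("""
    SELECT t.Department, t.Employees, t.Managers
    FROM T t
    JOIN (
        SELECT Department FROM T
        EXCEPT
        SELECT T1.Department
        FROM T T1
        JOIN T T2 ON T1.Department = T2.Department
        WHERE T1.Managers <> T2.Managers
    ) single_mgr ON single_mgr.Department = t.Department
    ORDER BY t.Department, t.Employees
""").fetchall()

for row in result:
    print(row)
```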
If your RDBMS supports common table expressions:
with C as (
select department, manager, count(*) as cnt
from A
group by department, manager
),
B as (
select department, count(*) as cnt
from A group by department
)
select A.*
from A
join C on A.department = C.department
join B on A.department = B.department
where B.cnt = C.cnt;

How to cross join in Big Query using intervals?

How can I join two tables using intervals in Google Big Query?
I have two table:
Table CarsGPS:
ID | Car | Latitude | Longitude
1 | 1 | -22.123 | -43.123
2 | 1 | -22.234 | -43.234
3 | 2 | -22.567 | -43.567
4 | 2 | -22.678 | -43.678
...
Table Areas:
ID | LatitudeMin | LatitudeMax | LongitudeMin | LongitudeMax
1 | -22.124 | -22.120 | -43.124 | -43.120
2 | -22.128 | -22.124 | -43.128 | -43.124
...
I'd like to cross join these tables to check in which areas each car has passed by using Google Big Query.
In a regular SQL server I would make:
SELECT A.ID, C.Car
FROM Cars C, Areas A
WHERE C.Latitude BETWEEN A.LatitudeMin AND A.LatitudeMax AND
C.Longitude BETWEEN A.LongitudeMin AND A.LongitudeMax
But Google Big Query only allows me to do joins (even JOIN EACH) using exact matches among joined tables. And the "FROM X, Y" means UNION, not JOINS.
So, this is not an option:
SELECT A.ID, C.Car
FROM Cars C
JOIN EACH
Areas A
ON C.Latitude BETWEEN A.LatitudeMin AND A.LatitudeMax AND
C.Longitude BETWEEN A.LongitudeMin AND A.LongitudeMax
Then, how can I run something similar to it to identify which cars passed inside each area?
BigQuery now supports CROSS JOIN. Your query would look like:
SELECT A.ID, C.Car
FROM Cars C
CROSS JOIN Areas A
WHERE C.Latitude BETWEEN A.LatitudeMin AND A.LatitudeMax AND
C.Longitude BETWEEN A.LongitudeMin AND A.LongitudeMax
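To sanity-check the logic of this range join on the sample rows from the question, here is a sketch that runs the same statement in SQLite through Python's sqlite3 module. SQLite is only a local stand-in here (an assumption on my part), so this verifies the query logic, not BigQuery behaviour or performance.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Cars (ID INTEGER, Car INTEGER, Latitude REAL, Longitude REAL);
    CREATE TABLE Areas (ID INTEGER, LatitudeMin REAL, LatitudeMax REAL,
                        LongitudeMin REAL, LongitudeMax REAL);
    INSERT INTO Cars VALUES (1, 1, -22.123, -43.123), (2, 1, -22.234, -43.234),
                            (3, 2, -22.567, -43.567), (4, 2, -22.678, -43.678);
    INSERT INTO Areas VALUES (1, -22.124, -22.120, -43.124, -43.120),
                             (2, -22.128, -22.124, -43.128, -43.124);
""")

# Every car position is paired with every area, then the WHERE clause keeps
# only the pairs where the position falls inside the area's bounding box.
hits = conn.execute("""
    SELECT A.ID, C.Car
    FROM Cars C
    CROSS JOIN Areas A
    WHERE C.Latitude BETWEEN A.LatitudeMin AND A.LatitudeMax
      AND C.Longitude BETWEEN A.LongitudeMin AND A.LongitudeMax
""").fetchall()

print(hits)  # [(1, 1)]: car 1 passed through area 1
```

Keep in mind that a cross join produces rows(Cars) x rows(Areas) pairs before filtering, so it can get expensive on large tables.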

SQL duration between dates for different persons

hopefully someone can help me with the following task:
I have got 2 tables, Treatment and Person. Treatment contains the dates when treatments for the different persons were started; Person contains personal information, e.g. the last name.
Now I have to find all persons where the duration between the first and last treatment is over 20 years.
The Tables look something like this:
Person
| PK_Person | First name | Name  |
|-----------|------------|-------|
| 1         | A_Test     | Karl  |
| 2         | B_Test     | Marie |
| 3         | C_Test     | Steve |
| 4         | D_Test     | Jack  |
Treatment
| PK_Treatment | Description | Starting time | PK_Person |
|--------------|-------------|---------------|-----------|
| 1            | A           | 01.01.1989    | 1         |
| 2            | B           | 02.11.2001    | 1         |
| 3            | A           | 05.01.2004    | 1         |
| 4            | C           | 01.09.2013    | 1         |
| 5            | B           | 01.01.1999    | 2         |
So in this example, the output should be person Karl, A_Test.
Hopefully it's understandable what the problem is and someone can help me.
Edit: There seems to be a problem with the formatting, the tables are not displayed correctly, I hope it's readable.
SELECT *
FROM person p
INNER JOIN Treatment t on t.PK_Person = p.PK_Person
WHERE DATEDIFF(year,[TREATMENT_DATE_1], [TREATMENT_DATE_2]) > 20
This should do it, it is however untested so will need tweaking to your schema
Your data looks a bit suspicious, because the first name doesn't look like a first name.
But, what you want to do is aggregate the Treatment table for each person and get the minimum and maximum starting times. When the difference is greater than 20 years, then keep the person, and join back to the person table to get the names.
select p.FirstName, p.LastName
from Person p join
(select pk_person, MIN(StartingTime) as minst, MAX(StartingTime) as maxst
from Treatment t
group by pk_person
having MAX(StartingTime) - MIN(StartingTime) > 20*365.25
) t
on p.pk_person = t.pk_person;
Note that date arithmetic does vary between databases. In most databases, taking the difference of two dates counts the number of days between them, so this is a pretty general approach (although not guaranteed to work on all databases).
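Since the date arithmetic is the database-specific part, here is one concrete variant as a sketch: SQLite, run through Python's sqlite3 module. It assumes the dates are stored as ISO 'YYYY-MM-DD' text (so MIN and MAX sort correctly) and uses SQLite's julianday() to get a difference in days. The column names FirstName, Name and StartingTime are my guesses at the real schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Person (PK_Person INTEGER PRIMARY KEY, FirstName TEXT, Name TEXT);
    CREATE TABLE Treatment (PK_Treatment INTEGER PRIMARY KEY, Description TEXT,
                            StartingTime TEXT, PK_Person INTEGER);
    INSERT INTO Person VALUES (1, 'A_Test', 'Karl'), (2, 'B_Test', 'Marie'),
                              (3, 'C_Test', 'Steve'), (4, 'D_Test', 'Jack');
    -- Dates from the question, rewritten as ISO strings (assumed storage format).
    INSERT INTO Treatment VALUES (1, 'A', '1989-01-01', 1), (2, 'B', '2001-11-02', 1),
                                 (3, 'A', '2004-01-05', 1), (4, 'C', '2013-09-01', 1),
                                 (5, 'B', '1999-01-01', 2);
""")

# Aggregate per person, keep those whose first and last treatment are more
# than 20 years (20 * 365.25 days) apart, then join back for the names.
long_term = conn.execute("""
    SELECT p.FirstName, p.Name
    FROM Person p
    JOIN (
        SELECT PK_Person
        FROM Treatment
        GROUP BY PK_Person
        HAVING julianday(MAX(StartingTime)) - julianday(MIN(StartingTime)) > 20 * 365.25
    ) t ON t.PK_Person = p.PK_Person
""").fetchall()

print(long_term)  # [('A_Test', 'Karl')]
```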
I've taken a slightly different approach and worked with SQL Fiddle to verify that the below statements work.
As mentioned previously, the data does seem a bit suspicious; nonetheless per your requirements, you would be able to do the following:
select P.PK_Person, p.FirstName, p.Name
from person P
inner join treatment T on T.pk_person = P.pk_person
where DATEDIFF((select x.startingtime from treatment x where x.pk_person = p.pk_person order by startingtime desc limit 1), T.StartingTime) > 7305
First, we need to inner join treatment, which will ignore any persons who are not in the treatment table. The where portion now just needs to select based on your criteria (in this case a difference of dates). The subquery returns the last date a person has been treated; we compare that to each of the person's records and filter by number of days (7305 = 20 years * 365.25).
Here is the working SQL Fiddle sample.