SQL Join repeatedly (M:N) - sql

How can I join one table with another table multiple times in SQL? Illustrative example:
Table "Couples":
+------+------+------+
| ID | ID_1 | ID_2 |
+------+------+------+
| 1 | 123 | 456 |
+------+------+------+
Table "Info":
+-----+-----------+-----------+--------+--------+
| ID | FirstName | LastName | Gender | Season |
+-----+-----------+-----------+--------+--------+
| 123 | Jon | Snow | Male | 6 |
| 456 | Daenerys | Targaryen | Female | 6 |
| 123 | Jon | Targaryen | Male | 7 |
+-----+-----------+-----------+--------+--------+
And now I need a combined result that is "up to date" (Info.Season must be the highest available, and obsolete rows must not be deleted):
Desired result:
+-------------+------------+----------+-------------+------------+----------+
| FirstName_1 | LastName_1 | Gender_1 | FirstName_2 | LastName_2 | Gender_2 |
+-------------+------------+----------+-------------+------------+----------+
| Jon | Targaryen | Male | Daenerys | Targaryen | Female |
+-------------+------------+----------+-------------+------------+----------+
I have no clue how to solve the problem that the IDs are not unique and that I need to join the Info table "multiple times".

You can select the most current state of each person by using a SQL window function to rank that person's rows by season, newest first. Then join this ranked table to the couples table once for each person in the couple.
actor_latest CTE
ID FIRSTNAME LASTNAME GENDER SEASON LAST_CHANGE
123 Jon Targaryen Male 7 1
123 Jon Snow Male 6 2
456 Daenerys Targaryen Female 6 1
Resulting SQL
with actor_latest (id, firstname,lastname,gender,season, last_change) as (
select
id
, firstname
, lastname
, gender
, season
, rank() over (partition by id order by season desc) as last_change
from info
)
select
left_partner.firstname as firstname_1
, left_partner.lastname as lastname_1
, left_partner.gender as gender_1
, left_partner.season as season_1
, right_partner.firstname as firstname_2
, right_partner.lastname as lastname_2
, right_partner.gender as gender_2
, right_partner.season as season_2
from
couples c
join actor_latest left_partner on c.id_1 = left_partner.id and left_partner.last_change=1
join actor_latest right_partner on c.id_2 = right_partner.id and right_partner.last_change=1
Results
FIRSTNAME_1 LASTNAME_1 GENDER_1 SEASON_1 FIRSTNAME_2 LASTNAME_2 GENDER_2 SEASON_2
Jon Targaryen Male 7 Daenerys Targaryen Female 6
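As a cross-check, the same "latest row per ID, joined twice" idea can be sketched without window functions by comparing each partner's season with a correlated MAX(season) subquery. This is a swapped-in alternative, not the query from the answer; it assumes Python's built-in sqlite3 module and lower-case copies of the question's tables.

```python
import sqlite3

# In-memory copies of the question's tables (SQLite is an assumption here).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE couples (id INTEGER, id_1 INTEGER, id_2 INTEGER);
CREATE TABLE info (id INTEGER, firstname TEXT, lastname TEXT,
                   gender TEXT, season INTEGER);
INSERT INTO couples VALUES (1, 123, 456);
INSERT INTO info VALUES
  (123, 'Jon', 'Snow', 'Male', 6),
  (456, 'Daenerys', 'Targaryen', 'Female', 6),
  (123, 'Jon', 'Targaryen', 'Male', 7);
""")

# Join info once per partner; keep only the row with the highest season
# for each partner's ID.
LATEST_COUPLES = """
SELECT p1.firstname, p1.lastname, p1.gender,
       p2.firstname, p2.lastname, p2.gender
FROM couples c
JOIN info p1 ON p1.id = c.id_1
JOIN info p2 ON p2.id = c.id_2
WHERE p1.season = (SELECT MAX(i.season) FROM info i WHERE i.id = c.id_1)
  AND p2.season = (SELECT MAX(i.season) FROM info i WHERE i.id = c.id_2)
"""

rows = conn.execute(LATEST_COUPLES).fetchall()
print(rows)
```

If two rows for the same ID shared the highest season, this sketch would return both, just as rank() would in the CTE version.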


How do I insert multiple rows from one table into a struct column of a single row of another table?

I have 2 source tables at the moment.
Table #1: sourceTableMain
|EmployeeNumber| DepartmentNumber | CostCenterNumber |
| -------------| ---------------- |------------------|
| 1 | 100 | 1001 |
| 2 | 200 | 1001 |
| 3 | 100 | 1002 |
Table #2: sourceTableEmployee
|EmployeeNumber| EmployeeFirstName | EmployeeLastName | EmployeeAddress |
| -------------| ---------------- |------------------|---------------- |
| 1 | Michael | Scott | 110 ABC Ln |
| 1 | Michael | Scott | 450 XYZ Ln |
| 2 | Dwight | Schrute | 321 PQR St |
| 3 | Jim | Halpert | 678 LMN Blvd |
I am trying to insert the combined rows into a 3rd table named targetTableCombined, which has the following schema:
| FieldName                | Type    | Mode     |
| ------------------------ | ------- | -------- |
| employeeNumber           | INTEGER | NULLABLE |
| employeeDetails (struct) | RECORD  | REPEATED |
| employeeFirstName        | STRING  | NULLABLE |
| employeeLastName         | STRING  | NULLABLE |
| employeeAddress          | STRING  | NULLABLE |
Within the target table (targetTableCombined), I am trying to make sure that for each employeeNumber, all of the First Names, Last Names and Addresses are repeated under a single struct array. For example, EmployeeNumber 1 should have only 1 row in the target table, with the first name, last name and different addresses as part of the second column (struct), each in a separate row.
I wrote an insert script to do this, but I am going wrong:
insert into `dev.try_sbx.targetTableCombined`
select
main.employeeNumber,
array(
select as struct
emp.employeeFirstName,
emp.employeeLastName,
emp.employeeAddress
)
from
`dev.try_sbx.sourceTableMain` as main
inner join `dev.try_sbx.sourceTableEmployee` as emp
on main.EmployeeNumber = emp.EmployeeNumber;
This is the result I am getting when running the query above:
| EmployeeNumber | EmployeeDetails |
| ------------- | ------------------------------ |
| 1 | [Michael, Scott, 110 ABC Ln] |
| 1 | [Michael, Scott, 450 XYZ Ln] |
| 2 | [Dwight, Schrute, 321 PQR St] |
| 3 | [Jim, Halpert, 678 LMN Blvd] |
(Sorry about not being able to share screenshots - I don't have enough rep. But to elaborate, I am expecting only 3 rows on the insert (employee 1 should have had a single array containing both addresses). I am instead, getting 4 rows after the insert.)
Where am I going wrong with my script?
It's because ARRAY() is not an aggregate function. You should use ARRAY_AGG() along with GROUP BY to group the details for each employee into an array.
SELECT EmployeeNumber,
ARRAY_AGG((SELECT AS STRUCT EmployeeFirstName, EmployeeLastName, EmployeeAddress)) AS employeeDetails
FROM `dev.try_sbx.sourceTableEmployee`
GROUP BY 1;
The preferred way is:
SELECT EmployeeNumber,
ARRAY_AGG(STRUCT(EmployeeFirstName, EmployeeLastName, EmployeeAddress)) AS employeeDetails
FROM `dev.try_sbx.sourceTableEmployee`
GROUP BY 1;
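To make the grouping concrete, here is a plain-Python model of what GROUP BY with ARRAY_AGG(STRUCT(...)) computes. It does not call BigQuery; the function name aggregate_details and the dict-based "struct" are illustrative assumptions, and the rows are the ones from sourceTableEmployee in the question.

```python
from collections import defaultdict

# Rows of sourceTableEmployee from the question:
# (EmployeeNumber, EmployeeFirstName, EmployeeLastName, EmployeeAddress)
source_table_employee = [
    (1, "Michael", "Scott", "110 ABC Ln"),
    (1, "Michael", "Scott", "450 XYZ Ln"),
    (2, "Dwight", "Schrute", "321 PQR St"),
    (3, "Jim", "Halpert", "678 LMN Blvd"),
]


def aggregate_details(rows):
    """Collect one list of detail records per employee number.

    This mirrors GROUP BY EmployeeNumber plus ARRAY_AGG(STRUCT(...)):
    one output entry per key, holding every matching input row.
    """
    grouped = defaultdict(list)
    for number, first, last, address in rows:
        grouped[number].append(
            {
                "employeeFirstName": first,
                "employeeLastName": last,
                "employeeAddress": address,
            }
        )
    return dict(grouped)


combined = aggregate_details(source_table_employee)
for number, details in combined.items():
    print(number, details)
```

Employee 1 ends up with a single entry whose list holds both addresses, which is the three-row result the question expects.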

How to print the students name in this query?

The concerned tables are as follows:
students(rollno, name, deptcode)
depts(deptcode, deptname)
course(crs_rollno, crs_name, marks)
The query is
Find the name and roll number of the students from each department who obtained
highest total marks in their own department.
Consider:
i) Courses of different department are different.
ii) All students of a particular department take same number and same courses.
Then only the query makes sense.
I wrote a successful query for displaying the maximum total marks by a student in each department.
select do.deptname, max(x.marks) from students so
inner join depts do
on do.deptcode=so.deptcode
inner join(
select s.name as name, d.deptname as deptname, sum(c.marks) as marks from students s
inner join crs_regd c
on s.rollno=c.crs_rollno
inner join depts d
on d.deptcode=s.deptcode
group by s.name,d.deptname) x
on x.name=so.name and x.deptname=do.deptname group by do.deptname;
But as mentioned I need to display the name as well. Accordingly if I include so.name in select list, I need to include it in group by clause and the output is as below:
Kendra Summers Computer Science 274
Stewart Robbins English 80
Cole Page Computer Science 250
Brian Steele English 83
expected output:
Kendra Summers Computer Science 274
Brian Steele English 83
Where is the problem?
I guess this can be easily achieved if you use a window function:
select name, deptname, marks
from (select s.name as name, d.deptname as deptname, sum(c.marks) as marks,
row_number() over(partition by d.deptname order by sum(c.marks) desc) rn
from students s
inner join crs_regd c on s.rollno=c.crs_rollno
inner join depts d on d.deptcode=s.deptcode
group by s.name,d.deptname) x
where rn = 1;
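The window-function approach can be tried out with Python's built-in sqlite3 module (this assumes an SQLite build of 3.25 or later, which is when window functions arrived). The per-course marks below are invented so that the totals match the figures quoted in the question; the course names are hypothetical, and the query is split into two CTEs for readability rather than copied from the answer.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE depts (deptcode INTEGER PRIMARY KEY, deptname TEXT);
CREATE TABLE students (rollno INTEGER PRIMARY KEY, name TEXT, deptcode INTEGER);
CREATE TABLE crs_regd (crs_rollno INTEGER, crs_name TEXT, marks INTEGER);
INSERT INTO depts VALUES (1, 'Computer Science'), (2, 'English');
INSERT INTO students VALUES
  (1, 'Kendra Summers', 1), (2, 'Cole Page', 1),
  (3, 'Stewart Robbins', 2), (4, 'Brian Steele', 2);
-- Invented marks; only the per-student totals follow the question.
INSERT INTO crs_regd VALUES
  (1, 'Algorithms', 140), (1, 'Databases', 134),
  (2, 'Algorithms', 120), (2, 'Databases', 130),
  (3, 'Poetry', 80), (4, 'Poetry', 83);
""")

# Step 1: total marks per student. Step 2: number the students inside each
# department from highest to lowest total. Step 3: keep number 1.
TOP_PER_DEPT = """
WITH totals AS (
  SELECT s.name AS name, d.deptname AS deptname, SUM(c.marks) AS marks
  FROM students s
  JOIN crs_regd c ON s.rollno = c.crs_rollno
  JOIN depts d ON d.deptcode = s.deptcode
  GROUP BY s.name, d.deptname
),
ranked AS (
  SELECT name, deptname, marks,
         ROW_NUMBER() OVER (PARTITION BY deptname ORDER BY marks DESC) AS rn
  FROM totals
)
SELECT name, deptname, marks FROM ranked WHERE rn = 1 ORDER BY deptname
"""

top_students = conn.execute(TOP_PER_DEPT).fetchall()
print(top_students)
```

ROW_NUMBER() keeps exactly one student per department even when totals tie; swapping in RANK() would keep every tied student.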
To solve the problem with a readable query I had to define a couple of views:
total_marks: For each student the sum of their marks
create view total_marks as select s.deptcode, s.name, s.rollno, sum(c.marks) as total from course c, students s where s.rollno = c.crs_rollno group by s.rollno;
dept_max: For each department the highest total score by a single student of that department
create view dept_max as select deptcode, max(total) max_total from total_marks group by deptcode;
So I can get the desired output with the query
select a.deptcode, a.rollno, a.name from total_marks a join dept_max b on a.deptcode = b.deptcode and a.total = b.max_total
If you don't want to use views you can replace their selects on the final query, which will result in this:
select a.deptcode, a.rollno, a.name
from
(select s.deptcode, s.name, s.rollno, sum(c.marks) as total from course c, students s where s.rollno = c.crs_rollno group by s.rollno) a
join (select deptcode, max(total) max_total from (select s.deptcode, s.name, s.rollno, sum(c.marks) as total from course c, students s where s.rollno = c.crs_rollno group by s.rollno) a_ group by deptcode) b
on a.deptcode = b.deptcode and a.total = b.max_total
I'm sure someone more skilled than me could easily improve its performance...
If you (and anybody else) want to try it the way I did, here is the schema:
create table depts ( deptcode int primary key auto_increment, deptname varchar(20) );
create table students ( rollno int primary key auto_increment, name varchar(20) not null, deptcode int, foreign key (deptcode) references depts(deptcode) );
create table course ( crs_rollno int, crs_name varchar(20), marks int, foreign key (crs_rollno) references students(rollno) );
And here all the entries I inserted:
insert into depts (deptname) values ("Computer Science"),("Biology"),("Fine Arts");
insert into students (name,deptcode) values ("Turing",1),("Jobs",1),("Tanenbaum",1),("Darwin",2),("Mendel",2),("Bernard",2),("Picasso",3),("Monet",3),("Van Gogh",3);
insert into course (crs_rollno,crs_name,marks) values
(1,"Algorithms",25),(1,"Database",28),(1,"Programming",29),(1,"Calculus",30),
(2,"Algorithms",24),(2,"Database",22),(2,"Programming",28),(2,"Calculus",19),
(3,"Algorithms",21),(3,"Database",27),(3,"Programming",23),(3,"Calculus",26),
(4,"Zoology",22),(4,"Botanics",28),(4,"Chemistry",30),(4,"Anatomy",25),(4,"Pharmacology",27),
(5,"Zoology",29),(5,"Botanics",27),(5,"Chemistry",26),(5,"Anatomy",25),(5,"Pharmacology",24),
(6,"Zoology",18),(6,"Botanics",19),(6,"Chemistry",22),(6,"Anatomy",23),(6,"Pharmacology",24),
(7,"Sculpture",26),(7,"History",25),(7,"Painting",30),
(8,"Sculpture",29),(8,"History",24),(8,"Painting",30),
(9,"Sculpture",21),(9,"History",19),(9,"Painting",25) ;
Those inserts will load these data:
select * from depts;
+----------+------------------+
| deptcode | deptname |
+----------+------------------+
| 1 | Computer Science |
| 2 | Biology |
| 3 | Fine Arts |
+----------+------------------+
select * from students;
+--------+-----------+----------+
| rollno | name | deptcode |
+--------+-----------+----------+
| 1 | Turing | 1 |
| 2 | Jobs | 1 |
| 3 | Tanenbaum | 1 |
| 4 | Darwin | 2 |
| 5 | Mendel | 2 |
| 6 | Bernard | 2 |
| 7 | Picasso | 3 |
| 8 | Monet | 3 |
| 9 | Van Gogh | 3 |
+--------+-----------+----------+
select * from course;
+------------+--------------+-------+
| crs_rollno | crs_name | marks |
+------------+--------------+-------+
| 1 | Algorithms | 25 |
| 1 | Database | 28 |
| 1 | Programming | 29 |
| 1 | Calculus | 30 |
| 2 | Algorithms | 24 |
| 2 | Database | 22 |
| 2 | Programming | 28 |
| 2 | Calculus | 19 |
| 3 | Algorithms | 21 |
| 3 | Database | 27 |
| 3 | Programming | 23 |
| 3 | Calculus | 26 |
| 4 | Zoology | 22 |
| 4 | Botanics | 28 |
| 4 | Chemistry | 30 |
| 4 | Anatomy | 25 |
| 4 | Pharmacology | 27 |
| 5 | Zoology | 29 |
| 5 | Botanics | 27 |
| 5 | Chemistry | 26 |
| 5 | Anatomy | 25 |
| 5 | Pharmacology | 24 |
| 6 | Zoology | 18 |
| 6 | Botanics | 19 |
| 6 | Chemistry | 22 |
| 6 | Anatomy | 23 |
| 6 | Pharmacology | 24 |
| 7 | Sculpture | 26 |
| 7 | History | 25 |
| 7 | Painting | 30 |
| 8 | Sculpture | 29 |
| 8 | History | 24 |
| 8 | Painting | 30 |
| 9 | Sculpture | 21 |
| 9 | History | 19 |
| 9 | Painting | 25 |
+------------+--------------+-------+
I'll take the chance to point out that this database is badly designed. This becomes evident with the course table, for these reasons:
The name is singular
This table does not represent courses, but rather exams or scores
crs_name should be a foreign key referencing the primary key of another table (one that would actually represent the courses)
There are no constraints to limit the marks to a range or to prevent a student from taking the same exam twice
I find it more logical to associate courses with departments, instead of students with departments (this would also make these queries easier)
I tell you this because I understood you are learning from a book, so unless the book at some point says "this database is poorly designed", do not take this exercise as an example for designing your own!
Anyway, if you manually resolve the query with my data you will come to these results:
+----------+--------+---------+
| deptcode | rollno | name |
+----------+--------+---------+
| 1 | 1 | Turing |
| 2 | 4 | Darwin |
| 3 | 8 | Monet |
+----------+--------+---------+
As a further reference, here are the contents of the views I needed to define:
select * from total_marks;
+----------+-----------+--------+-------+
| deptcode | name | rollno | total |
+----------+-----------+--------+-------+
| 1 | Turing | 1 | 112 |
| 1 | Jobs | 2 | 93 |
| 1 | Tanenbaum | 3 | 97 |
| 2 | Darwin | 4 | 132 |
| 2 | Mendel | 5 | 131 |
| 2 | Bernard | 6 | 106 |
| 3 | Picasso | 7 | 81 |
| 3 | Monet | 8 | 83 |
| 3 | Van Gogh | 9 | 65 |
+----------+-----------+--------+-------+
select * from dept_max;
+----------+-----------+
| deptcode | max_total |
+----------+-----------+
| 1 | 112 |
| 2 | 132 |
| 3 | 83 |
+----------+-----------+
Hope I helped!
Try the following query
select a.name, c.deptname, b.marks
from students a
, crs_regd b
, depts c
where a.rollno = b.crs_rollno
and a.deptcode = c.deptcode
and(c.deptname,b.marks) in (select do.deptname, max(x.marks)
from students so
inner join depts do
on do.deptcode=so.deptcode
inner join (select s.name as name
, d.deptname as deptname
, sum(c.marks) as marks
from students s
inner join crs_regd c
on s.rollno=c.crs_rollno
inner join depts d
on d.deptcode=s.deptcode
group by s.name,d.deptname) x
on x.name=so.name
and x.deptname=do.deptname
group by do.deptname
)
The inner subquery fetches the department name and the maximum marks, and the outer query gets the name of the corresponding student.
Try it and let me know whether you got the desired result.
The DENSE_RANK() function would be helpful in this scenario:
SELECT subquery.*
FROM (SELECT Student_Total_Marks.rollno,
Student_Total_Marks.name,
Student_Total_Marks.deptcode, dept.deptname,
dense_rank() over (partition by Student_Total_Marks.deptcode order by Student_Total_Marks.total_marks desc) Student_Rank
FROM (SELECT stud.rollno,
stud.name,
stud.deptcode,
sum(course.marks) total_marks
FROM students stud inner join course course on stud.rollno = course.crs_rollno
GROUP BY stud.rollno, stud.name, stud.deptcode) Student_Total_Marks,
depts dept
WHERE Student_Total_Marks.deptcode = dept.deptcode) subquery
WHERE subquery.Student_Rank = 1

Union two tables with a table_name column

I want to union two tables(Student1, Student2).
1 - Student1
| student_code | name |
--------------------------
| 1 | katia |
| 2 | roger |
| 3 | ken |
2 - Student2
| student_code | name |
--------------------------
| 3 | katia |
| 4 | roger |
| 5 | ken |
Then I want to get a result like this.
result
|table_name| student_code | name |
-------------------------------------
|Student1 | 1 | katia |
|Student1 | 2 | roger |
|Student1 | 3 | ken |
|Student2 | 3 | katia |
|Student2 | 4 | roger |
|Student2 | 5 | ken |
I want to use only ANSI sql.
You could use
SELECT 'Student1' AS table_name, student_code, name FROM Student1
UNION ALL
SELECT 'Student2' AS table_name, student_code, name FROM Student2
select 'Student1' AS table_name,student_code,name from student1
union
select 'Student2' AS table_name,student_code,name from student2
I assume you know the difference between UNION and UNION ALL: UNION returns only unique rows, like a union performed on sets, while UNION ALL keeps duplicate rows as well.
In your case, even UNION will keep rows that share the same student_code and name, because the first column (table_name) differentiates them.
Use the UNION ALL operator:
SELECT 'Student1' as table_name, student_code, name FROM Student1
UNION ALL
SELECT 'Student2' as table_name, student_code, name FROM Student2
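The claim that UNION and UNION ALL return the same rows here can be checked with a small sketch using Python's built-in sqlite3 module (SQLite stands in for the unnamed database; the ORDER BY is added only to make the output deterministic).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Student1 (student_code INTEGER, name TEXT);
CREATE TABLE Student2 (student_code INTEGER, name TEXT);
INSERT INTO Student1 VALUES (1, 'katia'), (2, 'roger'), (3, 'ken');
INSERT INTO Student2 VALUES (3, 'katia'), (4, 'roger'), (5, 'ken');
""")

# The literal first column tags every row with its source table.
TEMPLATE = """
SELECT 'Student1' AS table_name, student_code, name FROM Student1
{op}
SELECT 'Student2' AS table_name, student_code, name FROM Student2
ORDER BY table_name, student_code
"""

union_all_rows = conn.execute(TEMPLATE.format(op="UNION ALL")).fetchall()
union_rows = conn.execute(TEMPLATE.format(op="UNION")).fetchall()

for row in union_all_rows:
    print(row)
# The tag column makes every row distinct, so UNION removes nothing.
print(union_rows == union_all_rows)
```

Because UNION has no duplicates to remove here, UNION ALL is the natural choice: it skips the deduplication step entirely.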

Query to rank rows in groups

I'm using Apache Derby 10.10.
I have a list of participants and would like to calculate their rank in their country, like this:
| Country | Participant | Points | country_rank |
|----------------|---------------------|--------|--------------|
| Australia | Bridget Ciriac | 1 | 1 |
| Australia | Austin Bjorklun | 4 | 2 |
| Australia | Carrol Motto | 7 | 3 |
| Australia | Valeria Seligma | 8 | 4 |
| Australia | Desmond Miyamot | 27 | 5 |
| Australia | Maryjane Digma | 33 | 6 |
| Australia | Kena Elmendor | 38 | 7 |
| Australia | Emmie Hicke | 39 | 8 |
| Australia | Kaitlyn Mund | 50 | 9 |
| Australia | Alisia Vitaglian | 65 | 10 |
| Australia | Anika Bulo | 65 | 11 |
| UK | Angle Ifil | 2 | 1 |
| UK | Demetrius Buelo | 12 | 2 |
| UK | Ermelinda Mell | 12 | 3 |
| UK | Adeline Pee | 21 | 4 |
| UK | Alvera Cangelos | 23 | 5 |
| UK | Keshia Mccalliste | 23 | 6 |
| UK | Alayna Rashi | 24 | 7 |
| UK | Malinda Mcfarlan | 25 | 8 |
| United States | Gricelda Quirog | 3 | 1 |
| United States | Carmina Britto | 5 | 2 |
| United States | Noemi Blase | 6 | 3 |
| United States | Britta Swayn | 8 | 4 |
| United States | An Heidelber | 12 | 5 |
| United States | Maris Padill | 21 | 6 |
| United States | Rachele Italian | 21 | 7 |
| United States | Jacquiline Speake | 28 | 8 |
| United States | Hipolito Elami | 45 | 9 |
| United States | Earl Sayle | 65 | 10 |
| United States | Georgeann Ves | 66 | 11 |
| United States | Conchit Salli | 77 | 12 |
The schema looks like this (sqlfiddle):
create table Country(
id INTEGER NOT NULL GENERATED ALWAYS AS IDENTITY,
name varchar(255),
PRIMARY KEY (id)
);
create table Team(
id INTEGER NOT NULL GENERATED ALWAYS AS IDENTITY,
country_id int not null,
PRIMARY KEY (id),
FOREIGN KEY (country_id) REFERENCES Country(id)
);
create table Participant(
id INTEGER NOT NULL GENERATED ALWAYS AS IDENTITY,
team_id int not null,
name varchar(100),
points int,
PRIMARY KEY (id),
FOREIGN KEY (team_id) REFERENCES Team(id)
);
This is what I have tried:
select
Country.name,
Participant.name,
Participant.points,
ROW_NUMBER() OVER(order by Country.name, Participant.points) as country_rank
from Country
join Team
on Country.id = Team.country_id
join Participant
on Team.id = Participant.team_id;
But according to the Apache Derby documentation, the OVER() clause doesn't take any arguments.
Does anyone have a way to achieve the country rank?
SQL
SELECT c.name AS Country,
p.name AS Participant,
p.points AS Points,
(SELECT COUNT(*)
FROM Participant p2
JOIN Team t2 ON p2.team_id = t2.id
WHERE t2.country_id = t.country_id
AND (p2.points < p.points
OR p2.points = p.points AND p2.name <= p.name)) AS country_rank
FROM Country c
JOIN Team t ON c.id = t.country_id
JOIN Participant p ON t.id = p.team_id
ORDER BY c.name, p.points, p.name;
Online Demo
SQL Fiddle demo: http://sqlfiddle.com/#!5/f48f8/14
Explanation
A simple ANSI-SQL subselect can be used to do the same job, counting the number of records for participants in the same country with a lower score or with the same score and a name that is alphabetically no higher.
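The counting subselect can be tried on a handful of the question's participants with Python's built-in sqlite3 module. SQLite stands in for Derby here, the identity columns are simplified to INTEGER PRIMARY KEY, and the assignment of participants to teams is invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Country (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE Team (id INTEGER PRIMARY KEY, country_id INTEGER NOT NULL);
CREATE TABLE Participant (id INTEGER PRIMARY KEY, team_id INTEGER NOT NULL,
                          name TEXT, points INTEGER);
INSERT INTO Country VALUES (1, 'Australia'), (2, 'UK');
-- Team membership is made up for this sketch.
INSERT INTO Team VALUES (1, 1), (2, 1), (3, 2);
INSERT INTO Participant (team_id, name, points) VALUES
  (1, 'Bridget Ciriac', 1), (2, 'Austin Bjorklun', 4),
  (1, 'Alisia Vitaglian', 65), (2, 'Anika Bulo', 65),
  (3, 'Angle Ifil', 2), (3, 'Demetrius Buelo', 12), (3, 'Ermelinda Mell', 12);
""")

# Rank = number of same-country participants that sort at or before this
# one, ordering by points and breaking ties by name.
COUNTRY_RANK = """
SELECT c.name, p.name, p.points,
       (SELECT COUNT(*)
        FROM Participant p2
        JOIN Team t2 ON p2.team_id = t2.id
        WHERE t2.country_id = t.country_id
          AND (p2.points < p.points
               OR p2.points = p.points AND p2.name <= p.name)) AS country_rank
FROM Country c
JOIN Team t ON c.id = t.country_id
JOIN Participant p ON t.id = p.team_id
ORDER BY c.name, p.points, p.name
"""

ranked = conn.execute(COUNTRY_RANK).fetchall()
for row in ranked:
    print(row)
```

The two Australian participants tied on 65 points receive ranks 3 and 4 in alphabetical order, matching the tie-breaking rule described above.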
Consider a query without window functions that uses a correlated aggregate count subquery. Because the grouping column (Country.name) is not in the same table as the ranking criterion (Participant.points), we need to run the same joins in the subquery, with different table aliases, so that the inner and outer queries can be compared properly.
In a perfect world that would be it, but we must also account for tied points. Therefore a second, very similar subquery (the tie breaker) is added to the first. This second subquery matches the inner and outer queries on Country.name and Participant.points, and ranks by the alphabetical order of Participant.name.
SELECT
Country.name AS Country,
Participant.name AS Participant,
Participant.points,
(SELECT Count(*) + 1
FROM Country subC
INNER JOIN Team subT
ON subC.id = subT.country_id
INNER JOIN Participant subP
ON subT.id = subP.team_id
WHERE subC.name = Country.name
AND subP.points < Participant.points)
+
(SELECT Count(*)
FROM Country subC
INNER JOIN Team subT
ON subC.id = subT.country_id
INNER JOIN Participant subP
ON subT.id = subP.team_id
WHERE subC.name = Country.name
AND subP.points = Participant.points
AND subP.name < Participant.name) As country_rank
FROM Country
INNER JOIN Team
ON Country.id = Team.country_id
INNER JOIN Participant
ON Team.id = Participant.team_id
ORDER BY Country.name, Participant.points;
All you need to add is a PARTITION BY on the country, which should give you what you need on a database whose OVER() clause accepts arguments (per the question, Derby 10.10 does not).
SELECT
Country.name,
Participant.name,
Participant.points,
ROW_NUMBER() OVER(PARTITION BY Country.name order by Participant.points) as country_rank
from Country
join Team
on Country.id = Team.country_id
join Participant
on Team.id = Participant.team_id;

SQL select flag based on count and/or flag of joined table

I have a Customer table and an Address table.
The Address table has a flag which is either INVOICE, CORRESPONDENCE or DELIVERY.
A Customer can have 0 to many Address records.
I want to be able to query both tables and generate a flag for each customer based on the address data - no address records = NONE, 1 or more INVOICE records = HASINVOICE, no INVOICE but 1 or more others = HASOTHER
so, for the following data:
+------------+---------+
| CustomerID | Name |
+------------+---------+
| 1 | Peter |
| 2 | Ray |
| 3 | Egon |
| 4 | Winston |
| 5 | Dana |
+------------+---------+
+-----------+------------+----------------+
| AddressID | CustomerID | AddressType |
+-----------+------------+----------------+
| 1 | 1 | INVOICE |
| 2 | 1 | DELIVERY |
| 3 | 2 | DELIVERY |
| 4 | 2 | CORRESPONDENCE |
| 5 | 4 | INVOICE |
| 6 | 5 | CORRESPONDENCE |
+-----------+------------+----------------+
I would expect the following output:
+------------+---------+-------------+
| CustomerID | Name | AddressFlag |
+------------+---------+-------------+
| 1 | Peter | HASINVOICE |
| 2 | Ray | HASOTHER |
| 3 | Egon | NONE |
| 4 | Winston | HASINVOICE |
| 5 | Dana | HASOTHER |
+------------+---------+-------------+
Is this possible, for SQL 2000, using a single query and no cursors?
I don't have a 2000 instance handy (you really should upgrade, you're 4-5 releases behind), but I think that this should work:
-- Sample data held in table variables
declare @Customers table (CustomerID int,Name varchar(10))
insert into @Customers (CustomerID,Name)
select 1,'Peter' union all select 2,'Ray' union all
select 3,'Egon' union all select 4,'Winston' union all
select 5,'Dana'
declare @Addresses table (AddressID int, CustomerID int,
AddressType varchar(30))
insert into @Addresses (AddressID,CustomerID,AddressType)
select 1,1,'INVOICE' union all select 2,1,'DELIVERY' union all
select 3,2,'DELIVERY' union all select 4,2,'CORRESPONDENCE' union all
select 5,4,'INVOICE' union all select 6,5,'CORRESPONDENCE'
-- Score each address (2 = invoice, 1 = any other type, NULL = no address),
-- then map the highest score per customer to a flag
select
c.CustomerID,
c.Name,
CASE MAX(CASE
WHEN a.AddressType = 'INVOICE' THEN 2
WHEN a.AddressType IS NOT NULL THEN 1
END
) WHEN 2 THEN 'HASINVOICE'
WHEN 1 THEN 'HASOTHER'
ELSE 'NONE'
END as AddressFlag
from
@Customers c
left join
@Addresses a
on
c.CustomerID = a.CustomerID
group by
c.CustomerID,
c.Name
Produces:
CustomerID Name AddressFlag
----------- ---------- -----------
5 Dana HASOTHER
3 Egon NONE
1 Peter HASINVOICE
2 Ray HASOTHER
4 Winston HASINVOICE
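The same CASE MAX(CASE ...) pattern can be exercised outside SQL Server with Python's built-in sqlite3 module. This is only a sketch: SQLite stands in for SQL 2000, the table names Customer and Address are taken from the question's description, and an ORDER BY is added to make the row order predictable.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Customer (CustomerID INTEGER, Name TEXT);
CREATE TABLE Address (AddressID INTEGER, CustomerID INTEGER, AddressType TEXT);
INSERT INTO Customer VALUES
  (1, 'Peter'), (2, 'Ray'), (3, 'Egon'), (4, 'Winston'), (5, 'Dana');
INSERT INTO Address VALUES
  (1, 1, 'INVOICE'), (2, 1, 'DELIVERY'), (3, 2, 'DELIVERY'),
  (4, 2, 'CORRESPONDENCE'), (5, 4, 'INVOICE'), (6, 5, 'CORRESPONDENCE');
""")

# Inner CASE scores each joined address row (NULL when the LEFT JOIN found
# no address); MAX picks the best score; outer CASE turns it into a flag.
ADDRESS_FLAGS = """
SELECT c.CustomerID, c.Name,
       CASE MAX(CASE
                  WHEN a.AddressType = 'INVOICE' THEN 2
                  WHEN a.AddressType IS NOT NULL THEN 1
                END)
         WHEN 2 THEN 'HASINVOICE'
         WHEN 1 THEN 'HASOTHER'
         ELSE 'NONE'
       END AS AddressFlag
FROM Customer c
LEFT JOIN Address a ON c.CustomerID = a.CustomerID
GROUP BY c.CustomerID, c.Name
ORDER BY c.CustomerID
"""

flags = conn.execute(ADDRESS_FLAGS).fetchall()
for row in flags:
    print(row)
```

SQLite compares strings case-sensitively by default, so the literal is written as 'INVOICE' to match the stored data exactly.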