cross joining two tables - sql

I have a table that looks like this, lets call this table B.
id boardid schoolid subject cnt1 cnt2 cnt3 ....
=================================================================
1 20 21 f
2 20 21 r
3 20 21 w
4 20 21 m
5 20 30 r
6 20 30 w
7 20 30 m
Suppose the counts are just integers. Notice that there is no subject = f for schoolid = 30. Similarly, for most schools, some subject dosnt exist. You might have a schoolid that has just r, w or some that are just r, m, f..
So what I want to do is have 4 consistent rows for each school, and the row that dosnt exist I want dummy values. I thought about creating a secondary table
drop table #A
Select * into #A FROM
(
select [subject_s] = 'r', orderNo = 1
union all
select [subject_s] = 'w', orderNo = 2
union all
select [subject_s] = 'm', orderNo = 3
union all
select [subject_s] = 'f', orderNo = 4
) z
and doing some joins on them, but I've gotten NO where. I've tried inner join, left outer, cross join, everything. I've even tried to make cartesian product. I think my cartesian product messes up because I have orderno in there so it makes 16 rows per row in the main table. Actually typing this out, I realize if I remove the orderno, apply the cartesian product and then add orderno in later, it might work but I am interested to see what you guys can come up with. I am stumped.
End result
id boardid schoolid subject cnt1 cnt2 cnt3 ....
=================================================================
1 20 21 r
2 20 21 w
3 20 21 m
4 20 21 f
5 20 30 r
6 20 30 w
7 20 30 m
7 20 30 f

Try the following:
SELECT S.boardid, S.schoolid, A.[subject], B.cnt1, B.cnt2, B.cnt3
FROM (SELECT DISTINCT boardid, schoolid FROM YourTable) S
CROSS JOIN #A A
LEFT JOIN YourTable B
ON B.boardid = S.boardid AND B.schoolid = S.schoolid
AND A.[subject] = B.[subject]

Since I do not know which RDBMS you are using I tried the following with sqlite and a simpler table:
sqlite> create table schools (name varchar, subject varchar, teacher varchar);
sqlite> select * from schools;
School1|Maths|Mr Smith
School2|English|Jack
School3|English|Jimmy
School3|Maths|Jane
School4|Computer Science|Bob
sqlite> select
schoolnames.name,
subjects.subject,
ifnull(teachers.teacher, "Unknown")
from (select distinct name from schools) schoolnames
join (select distinct subject from schools) subjects
left join schools teachers
on schoolnames.name = teachers.name
and subjects.subject = teachers.subject;
School1|Maths|Mr Smith
School1|English|Unknown
School1|Computer Science|Unknown
School2|Maths|Unknown
School2|English|Jack
School2|Computer Science|Unknown
School3|Maths|Jane
School3|English|Jimmy
School3|Computer Science|Unknown
School4|Maths|Unknown
School4|English|Unknown
School4|Computer Science|Bob

I'd use:
SELECT
boardid, schoolid, dist_subject, id, cnt1, ...
FROM
(SELECT
boardid, schoolid, dist_subject
FROM
(SELECT
DISTINCT subject AS dist_subject
FROM b ) s full outer join
(SELECT
boardid, schoolid
FROM b
GROUP BY
boardid, schoolid ) g ) sg LEFT OUTER JOIN
b ON
sg.boardID = b.boardID AND
sg.schoolid = b.schoolID
sg.dist_subject = b.subject

Related

SQL read multiple tables and self join

Been trying to learn SQL and am stuck on a problem I want to understand:
Given the following tables:
TABLE - Customer.movie_id
ID MOVIE_ID
-------------
x Spiderman
y Batman
z Avengers
TABLE - Customer.game_id
ID GAME_ID
-------------
A COD
C HALO
D BATTLEFEILD
B MINECRAFT
TABLE - Customer.type_id
ID TYPE_ID
-------------
ii AGE
jj GENDER
kk INCOME
TABLE - Customer.Info
ID MOVIE_ID GAME_ID TYPE_ID DATA
--------------------------------------------
1 x A ii 20
2 x A jj F
3 x C kk 1000
4 y C ii 40
5 y D jj M
6 y C kk 5000
7 z B ii 60
8 z B jj F
9 z C kk 10000
Produce an output that will show rows of AGE only if MOVIE_ID and GAME_ID match the same values on the GENDER type rows.
TABLE - Customer.Info
ID MOVIE_ID GAME_ID TYPE_ID DATA
--------------------------------------------
1 x A ii 20
7 z B ii 60
I have been able to do queries individually in python and process it there, but I don't have any idea of how to combine all of this into one query.
Can anyone help?
This is probably the shortest. If any other rows exists that has the same ids and gender then return it:
select i1.*
from Customer.Info i1 inner join Customer.Info i2
on i2.MOVIE_ID = i1.MOVIE_ID and i2.GAME_ID = i3.GAME_ID
and i1.TYPE_ID = 'ii' and i2.TYPE_ID = 'jj'
where and exists (
select 1 from Customer.Info i3
where i3.ID <> i2.ID
and i3.MOVIE_ID = i2.MOVIE_ID and i3.GAME_ID = i2.GAME_ID
and i3.TYPE_ID = 'jj' and i3.DATA = i2.DATA
);
I don't know if you intend for groups of more than two to all match together without variation.
This data model is not normalized properly and everything is named poorly, but aside from that, you probably want some form of EXISTS clause, such as:
select
i.*
from
customer.info i
join
customer.type_id ti on ti.id = i.type_id
where
ti.type_id = 'AGE'
and
exists(
select 1
from customer.info i2
join customer.type_id ti2 on ti2.id = i2.type_id
where i2.movie_id = i.movie_id
and i2.game_id = i.game_id
and ti2.type_id = 'GENDER'
)
/
Or, if EXISTS is not available, you can do it joining two subqueries:
select
age_info.*
from
(
select i.*
from customer.info i
join customer.type_id ti on ti.id = i.type_id
where ti.type_id = 'AGE'
) age_info
join
(
select distinct i.movie_id, i.game_id
from customer.info i
join customer.type_id ti on ti.id = i.type_id
where ti.type_id = 'GENDER'
) gender_info
on gender_info.movie_id = age_info.movie_id
and gender_info.game_id = age_info.game_id
/

Values from a table that are not in another one

I have two tables, A and B.
A
ID age
1 24
2 25
45 22
B
ID school
34 school1
1 school2
I want to select IDs that are in B but not in A.
I wrote
Select distinct bb.school
From B as bb
Left outer join A as aa
On bb.ID=aa.ID
inner join C as cc
On bb.school=cc.school
This code returns exactly the same number of rows that I would have with an inner join instead of left outer join.
Am I doing something wrong?
Try using not in;
Select * From A Where ID Not In ( Select ID From B )

Attributed null values to each ID in Athena (Presto)

Here is what my initial dataset looks like
prof_id id title
1 5 A
1 5 B
1 5 C
1 5 D
2 5 C
2 5 D
2 5 E
NA 5 F
NA 5 G
Here is what the new table should look like:
prof_id id title
1 5 A
1 5 B
1 5 C
1 5 D
1 5 F
1 5 G
2 5 C
2 5 D
2 5 E
2 5 F
2 5 G
Any row with a null value for a prof_id should be attributed to all of the prof_id. I have provided an example where there are two '
prof_id but there are also instances where there are 1 or 0 prof_id.
For 1, all of the null should be attributed to that single prof_id
For 0, leave it as is
I'm new to SQL so I'm not sure how to start. Any guidance would be much appreciated.
Thanks
In this case, you will need to do cross join, where essentially it is going to multiply 2 tables together.
First to pick out all nulls:
select id, title from table where prof_id is null
Then pick out the prof_id you want to apply to all tables
select distinct prof_id from table where prof_is is not null
Do a cross join together, then union the rest of "good" data back
(select distinct prof_id from table where prof_is is not null)
CROSS JOIN
(select id, title from table where prof_id is null)
UNION ALL
(select prof_id, id, title from table where prof_id is not null)
You can generate all the rows using a cross join. Then use union all to combine this with the rest of the data.
The following syntax should work:
select p.prof_id, i.id, t.title
from (select distinct prof_id
from t
where prof_id <> 'NA' -- or do you mean is not null
) p cross join
(select distinct id from t) i cross join
(select distinct title
from t
where prof_id = 'NA' -- or is null
) t
union all
select prof_id, id, title
from t
where prof_id <> 'NA' -- or is not null

SQL - only select results that do not have a specific status value in a corresponding table

I have two tables, and I only want to get the Student IDs where they have perfect attendance for all months (they do not have a PerfectAttendance value of N for any month). These tables will have hundreds of millions of rows, so I was trying to come up with an approach that doesn't require a full separate subquery. If anyone has any recommendations, please let me know:
Table Student:
ID Name
------------
1 A
2 B
Table Attendance:
ID Month PerfectAttendance
---------------------------------
1 1 Y
1 2 Y
1 3 Y
1 4 Y
1 5 Y
1 6 Y
1 7 Y
1 8 Y
1 9 Y
1 10 Y
1 11 Y
1 12 Y
2 1 Y
2 2 Y
2 3 Y
2 4 Y
2 5 Y
2 6 Y
2 7 Y
2 8 Y
2 9 Y
2 10 Y
2 11 Y
2 12 N
SELECT *
FROM dbo.Student S
WHERE NOT EXISTS(SELECT 1 FROM dbo.Attendance
WHERE PerfectAttendance = 'N'
AND ID = S.ID);
My suggestion for this would be to query the table and get the number of months that each student has perfect attendance. Once you've done that, you can filter on the count being 12 (since there are twelve months).
Try this:
SELECT s.id, s.name, COUNT(*) AS numPerfectMonths
FROM student s JOIN attendence a ON s.id = a.id
WHERE a.perfectAttendance = 'Y'
GROUP BY s.id
HAVING COUNT(*) = 12;
Here is the SQL Fiddle for you.
EDIT
I made the assumption you will have 12 rows for each student. However, let's say you ran this in October and you want to see which students have a perfect attendance up to that point. You can use a subquery to pull for students without perfect attendance, and filter them out using NOT IN like so:
SELECT id
FROM student
WHERE id
NOT IN(SELECT s.id
FROM student s JOIN attendance a ON s.id = a.id
WHERE a.perfectAttendance = 'N'
GROUP BY s.id
HAVING COUNT(*) > 0);
Have an updated SQL Fiddle. To test this one, try deleting one of the rows for id number 1, and you'll still see that they are returned with perfect attendance.
Assuming you have 12 records per student in attendance table based on your data , you can do it with GROUP BY and HAVING clause.
SELECT S.ID, S.NAME
FROM Student S
JOIN Attendance A
on S.ID = A.ID
AND A.PerfectAttendance = 'Y'
GROUP BY S.ID, S.NAME
HAVING COUNT(*) = 12
I think Lamak's answer is probably the clearest and best-performing, but here is another variation on the GROUP BY method suggested by others, when you don't specifically look for a total of 12 months:
;WITH PerfectAttendance AS (
SELECT a.id
FROM Attendance a
GROUP BY a.id
HAVING MIN(a.PerfectAttendance) = 'Y'
)
SELECT s.id, s.Name
FROM PerfectAttendance p
JOIN Student s ON p.id = s.id;

Is it possible to left join two tables and have the right table supply each row no more than once?

Given this table structure:
Table A
ID AGE EDUCATION
1 23 3
2 25 6
3 22 5
Table B
ID AGE EDUCATION
1 26 4
2 24 6
3 21 3
I want to find all matches between the two tables where the age is within 2 and the education is within 2. However, I do not want to select any row from TableB more than once. Each row in B should be selected 0 or 1 times and each row in A should be selected one or more times (standard left join).
SELECT *
FROM TableA as A LEFT JOIN TableB as B ON
abs(A.age - B.age) <= 2 AND
abs(A.education - B.education) <= 2
A.ID A.AGE A.EDUCATION B.ID B.AGE B.EDUCATION
1 23 3 3 21 3
2 25 6 1 26 4
2 25 6 2 24 6
3 22 5 2 24 6
3 22 5 3 21 3
As you can see, the last two rows in the output have duplicated B.ID of 2 and 3 when compared to the entire result set. I'd like those rows to return as a single null match with A.ID = 3 since they were both matched to previous A values.
Desired output:
(note that for A.ID = 3, there is no match in B because all rows in B have already been joined to rows in A.)
A.ID A.AGE A.EDUCATION B.ID B.AGE B.EDUCATION
1 23 3 3 21 3
2 25 6 1 26 4
2 25 6 2 24 6
3 22 5 null null null
I can do this with a short program, but I'd like to solve the problem using a SQL query because it is not for me and I will not have the luxury of ever seeing the data or manipulating the environment.
Any ideas? Thanks
As #Joel Coehoorn said earlier, there has to be a mechanism that selects which pairs of (a,b) with the same (b) are filtered out from the output. SQL is not great on allowing you to select ONE row when multiple match, so a pivot query needs to be created, where you filter out the records you don't want. In this case, filtering can be done by reducing all of match IDs of B as a smallest (or largest, it doesn't really matter), using any function that will return one value from a set, it's just min() and max() are most convenient to use. Once you reduced the result to know which (a,b) pairs you care about, then you join against that result, to pull out the rest of the table data.
select a.id a_id, a.age a_age, a.education a_e,
b.id b_id, b.age b_age, b.education b_e
from a left join
(
SELECT
a.id a_id, min(b.id) b_id from a,b where
abs(A.age - B.age) <= 2 AND
abs(A.education - B.education) <= 2
group by a.id
) g on a.id = g.a_id
left join b on b.id = g.b_id;
To my knowledge something like this is not possible with a simple select statement and joins because you need to know what has already been selected in order to eliminate duplicates.
You can however try something a little more like this:
DECLARE #JoinResults TABLE
(A_ID INT, A_Age INT, A_Education INT, B_ID INT, B_Age INT, B_Education INT)
INSERT INTO #JoinResults (A_ID, A_Age, A_Education)
SELECT ID, AGE, EDUCATION
FROM TableA
DECLARE #i INT
SET #i = 1
--Assume that A_ID is incremental and no values missed
WHILE (#i < (SELECT Max(A_ID) FROM #JoinResults
BEGIN
UPDATE #JoinResult
SET B_ID = SQ.ID,
B_Age = SQ.AGE,
B_Education = SQ.Education
FROM (
SELECT ID, AGE, EDUCATION
FROM TableB b
WHERE (
abs((SELECT A_Age FROM #JoinResult WHERE A_Id = #i) - AGE) <=2
AND abs((SELECT A_Education FROM #JoinResult WHERE A_Id = #i) - EDUCATION) <=2
) AND (SELECT B_ID FROM #JoinResults WHERE B_ID = b.id) IS NULL
) AS SQ
SET #i = #i + 1
END
SELECT #JoinResults
NOTE: I do not currently have access to a database so this is untested and I am weary of 2 potential issues with it
I am not sure how the update will react if there are no results
I am unsure if the IS NULL check is correct to eliminate the duplicates.
If these issues do arise let me know and I can help troubleshoot.
In SQL-Server, you can use the CROSS APPLY syntax:
SELECT
a.id, a.age, a.education,
b.id AS b_id, b.age AS b_age, b.education AS b_education
FROM tableB AS b
CROSS APPLY
( SELECT TOP (1) a.*
FROM tableA AS a
WHERE ABS(a.age - b.age) <= 2
AND ABS(a.education - b.education) <= 2
ORDER BY a.id -- your choice here
) AS a ;
Depending on the order you choose in the subquery, different rows from tableA will be selected.
Edit (after your update): But the above query will not show rows from A that have no matching rows in B or even some that have but not been selected.
It could also be done with window functions but Access does not have them. Here is a query that I think will work in Access:
SELECT
a.id, a.age, a.education,
s.id AS s_id, s.age AS b_age, s.education AS b_education
FROM tableB AS a
LEFT JOIN
( SELECT
b.id, b.age, b.education, MIN(a.id) AS a_id
FROM tableB AS b
JOIN tableA AS a
ON ABS(a.age - b.age) <= 2
AND ABS(a.education - b.education) <= 2
GROUP BY b.id, b.age, b.education
) AS s
ON a.id = s.a_id ;
I'm not sure if Access allows such a subquery but if it doesn't, you can define it as a "Query" and then use it in another.
Use SELECT DISTINCT
SELECT DISTINCT A.id, A.age, A.education, B.age, B.education
FROM TableA as A LEFT JOIN TableB as B ON
abs(A.age - B.age) <= 2 AND
abs(A.education - B.education) <= 2