How to solve union de-duplication problem in SQL

I want to take a union of two tables (basketballplayers and footballplayers) so that students who play both sports are selected only once. The problem is that footballplayers has a helmets column, which basketballplayers does not have, because basketball players don't wear helmets. What should I do?

You don't have to query all the columns. Query just the ones you need (read: the columns common to both tables):
SELECT first_name, last_name
FROM footballplayers
UNION
SELECT first_name, last_name
FROM basketballplayers
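A minimal sketch of the query above, run against an in-memory SQLite database from Python. The sample rows and the helmet column values are made up for illustration; only the table and column names come from the answer.

```python
import sqlite3

# Hypothetical sample data; table and column names follow the answer above.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE footballplayers (first_name TEXT, last_name TEXT, helmet TEXT);
    CREATE TABLE basketballplayers (first_name TEXT, last_name TEXT);
    INSERT INTO footballplayers VALUES ('Ann', 'Lee', 'red'), ('Bob', 'Ray', 'blue');
    INSERT INTO basketballplayers VALUES ('Ann', 'Lee'), ('Cy', 'Fox');
""")

# UNION removes duplicate rows, so Ann Lee (who plays both sports) appears once.
union_rows = conn.execute("""
    SELECT first_name, last_name FROM footballplayers
    UNION
    SELECT first_name, last_name FROM basketballplayers
    ORDER BY first_name
""").fetchall()

# UNION ALL keeps duplicates, so the same two selects return four rows.
union_all_count = len(conn.execute("""
    SELECT first_name, last_name FROM footballplayers
    UNION ALL
    SELECT first_name, last_name FROM basketballplayers
""").fetchall())

print(union_rows)
print(union_all_count)
```

Note that the helmet column is simply left out of the select list, which is what makes the two sides of the UNION compatible.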

I think you said you only want players who play both sports:
select f.first_name, f.last_name
from footballplayers f
inner join basketballplayers b on f.first_name = b.first_name and f.last_name = b.last_name
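To see the difference from the UNION, here is a small sketch of the "both sports" join on an in-memory SQLite database. The sample rows are hypothetical, and the INTERSECT variant is added only as an equivalent alternative for comparison.

```python
import sqlite3

# Hypothetical sample data for illustration only.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE footballplayers (first_name TEXT, last_name TEXT, helmet TEXT);
    CREATE TABLE basketballplayers (first_name TEXT, last_name TEXT);
    INSERT INTO footballplayers VALUES ('Ann', 'Lee', 'red'), ('Bob', 'Ray', 'blue');
    INSERT INTO basketballplayers VALUES ('Ann', 'Lee'), ('Cy', 'Fox');
""")

# Inner join on the name columns: only students present in both tables survive.
both_join = conn.execute("""
    SELECT f.first_name, f.last_name
    FROM footballplayers f
    INNER JOIN basketballplayers b
        ON f.first_name = b.first_name AND f.last_name = b.last_name
""").fetchall()

# INTERSECT on the shared columns gives the same set of names here.
both_intersect = conn.execute("""
    SELECT first_name, last_name FROM footballplayers
    INTERSECT
    SELECT first_name, last_name FROM basketballplayers
""").fetchall()

print(both_join)
print(both_intersect)
```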

Related

sub-queries are running fast but joining them is taking forever

I have two tables (sample below) with some additional columns that I have not shown here. The only way to join the two tables is by using a combination of first name, last name, and address.
table A (~3000 rows):
First Name | Last Name | Address
Jane       | Doe       | 123 Main St
Jack       | Jones     | 100 Chestnut St
Tom        | Locke     | 50 Market St
table B (~ 9M rows):
First Name | Last Name | Address
Jane       | Doe       | 123 Main St
Jack       | Jones     | 100 Chestnut St
Jeremy     | Thomas    | 27 Spruce St
I have tried the following code -
select * from
(select first_name, last_name, address, concat(first_name, last_name, address) as con_A
from table_A) as A
join
(select first_name, last_name, address, concat(first_name, last_name, address) as con_B
from table_B) as B
on A.con_A=B.con_B
The above code is a generalization of what my code looks like. I have tried to only put the columns I need in the sub queries in my original code.
The two sub queries each run within seconds when I run them individually, but the query takes over an hour to execute when I join them.

I don't know that I'd use inline tables for this ... why not just a direct join?
select
A.first_name,
A.last_name,
A.address
from
table_A A
join table_B B on A.first_name = B.first_name AND
A.last_name = B.last_name AND
A.address = B.address
Now this is an inner join, so you'll only get exact matches for both. If you want to show records from one table whether they match or not, you'll need to use an outer join (left or right depending on the table you want to drive the results).
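A small sketch of the inner versus outer join behaviour described above, using the three sample rows per table from the question in an in-memory SQLite database. The lowercase table names and the matched column are assumptions made for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table_a (first_name TEXT, last_name TEXT, address TEXT);
    CREATE TABLE table_b (first_name TEXT, last_name TEXT, address TEXT);
    INSERT INTO table_a VALUES
        ('Jane', 'Doe', '123 Main St'),
        ('Jack', 'Jones', '100 Chestnut St'),
        ('Tom', 'Locke', '50 Market St');
    INSERT INTO table_b VALUES
        ('Jane', 'Doe', '123 Main St'),
        ('Jack', 'Jones', '100 Chestnut St'),
        ('Jeremy', 'Thomas', '27 Spruce St');
""")

join_condition = """
    ON a.first_name = b.first_name
   AND a.last_name = b.last_name
   AND a.address = b.address
"""

# Inner join: only rows that match on all three columns.
inner_rows = conn.execute(
    "SELECT a.first_name, a.last_name FROM table_a a JOIN table_b b "
    + join_condition + " ORDER BY a.first_name"
).fetchall()

# Left join: every row of table_a, with a flag telling whether table_b matched.
left_rows = conn.execute(
    "SELECT a.first_name, a.last_name, b.address IS NOT NULL AS matched "
    "FROM table_a a LEFT JOIN table_b b "
    + join_condition + " ORDER BY a.first_name"
).fetchall()

print(inner_rows)
print(left_rows)
```

Tom Locke has no counterpart in table_b, so he is dropped by the inner join and kept with matched = 0 by the left join.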

Instead of joining on a concatenated column, join directly on the individual columns, and select only the columns you need in each subquery:
select A.first_name, A.last_name, A.address from
(select first_name, last_name, address from table_A) as A
join
(select first_name, last_name, address from table_B) as B
on A.first_name=B.first_name and A.last_name=B.last_name and A.address=B.address
One more thing: pick the join type based on your need. A plain (inner) join returns only matching rows; use a left or right join if you also need the unmatched rows from one table. Either way, joining 3k rows against 9M rows can be a very costly operation, because the database may have to compare a large number of row combinations.

'ALL' concept in SQL queries

Relational Schema:
Students (**sid**, name, age, major)
Courses (**cid**, name)
Enrollment (**sid**, **cid**, year, term, grade)
Write a SQL query that returns the names of the students who took all courses. I'm not sure how to capture the concept of 'ALL' in a SQL query.
EDIT:
I want to be able to write it without aggregation, as I want to use the same logic for writing the query in relational algebra as well.
Thanks for the help!

One way of writing such queries is to count the total number of courses and the number of courses each student took, and compare them:
SELECT s.*
FROM students s
JOIN (SELECT sid, COUNT(DISTINCT cid) AS student_courses
FROM enrollment
GROUP BY sid) e ON s.sid = e.sid
JOIN (SELECT COUNT(*) AS cnt
FROM courses) c ON c.cnt = e.student_courses
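The counting approach can be tried out with a small sketch on an in-memory SQLite database. The schema is simplified (year, term, grade, age and major are omitted) and the rows are invented for the example.

```python
import sqlite3

# Simplified, hypothetical version of the Students/Courses/Enrollment schema.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE students (sid INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE courses (cid INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE enrollment (sid INTEGER, cid INTEGER);
    INSERT INTO students VALUES (1, 'Ana'), (2, 'Ben');
    INSERT INTO courses VALUES (10, 'Databases'), (20, 'Algebra');
    INSERT INTO enrollment VALUES (1, 10), (1, 20), (2, 10);
""")

# A student took all courses when their distinct course count equals
# the total number of courses.
all_courses = conn.execute("""
    SELECT s.name
    FROM students s
    JOIN (SELECT sid, COUNT(DISTINCT cid) AS student_courses
          FROM enrollment
          GROUP BY sid) e ON s.sid = e.sid
    JOIN (SELECT COUNT(*) AS cnt
          FROM courses) c ON c.cnt = e.student_courses
""").fetchall()

print(all_courses)
```

Ana is enrolled in both courses and is returned; Ben is enrolled in only one and is filtered out.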

This gives the student/course combinations that are possible but haven't been taken...
SELECT s.sid, c.cid FROM students s CROSS JOIN courses c
EXCEPT
SELECT sid, cid FROM enrollment
So, you can then subtract those students from the student list...
SELECT sid FROM students
EXCEPT
SELECT DISTINCT
sid
FROM
(
SELECT s.sid, c.cid FROM students s CROSS JOIN courses c
EXCEPT
SELECT sid, cid FROM enrollment
)
AS not_enrolled
I don't like it, but it avoids aggregation...
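Here is a rough sketch of the double EXCEPT idea on an in-memory SQLite database, with a simplified schema and invented rows. It mirrors relational division: all students minus the students who are missing at least one course.

```python
import sqlite3

# Simplified, hypothetical schema and data.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE students (sid INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE courses (cid INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE enrollment (sid INTEGER, cid INTEGER);
    INSERT INTO students VALUES (1, 'Ana'), (2, 'Ben');
    INSERT INTO courses VALUES (10, 'Databases'), (20, 'Algebra');
    INSERT INTO enrollment VALUES (1, 10), (1, 20), (2, 10);
""")

# Step 1: every possible (student, course) pair minus the pairs actually taken.
missing_pairs = conn.execute("""
    SELECT s.sid, c.cid FROM students s CROSS JOIN courses c
    EXCEPT
    SELECT sid, cid FROM enrollment
""").fetchall()

# Step 2: all students minus the students that appear in a missing pair.
took_all = conn.execute("""
    SELECT sid FROM students
    EXCEPT
    SELECT sid FROM (
        SELECT s.sid, c.cid FROM students s CROSS JOIN courses c
        EXCEPT
        SELECT sid, cid FROM enrollment
    ) AS not_enrolled
""").fetchall()

print(missing_pairs)
print(took_all)
```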

SELECT *
FROM Students
WHERE NOT EXISTS (
SELECT 1 FROM Courses
LEFT OUTER JOIN Enrollment ON Courses.cid = Enrollment.cid
AND Enrollment.sid = Students.sid
WHERE Enrollment.sid IS NULL
)
By the way, a common convention is to name tables in the singular form, not the plural.
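A closely related way to express "no course exists that this student has not taken" is a double NOT EXISTS, which is not the exact query above but follows the same logic without aggregation. The sketch below runs it on an in-memory SQLite database with a simplified schema and invented rows.

```python
import sqlite3

# Simplified, hypothetical schema and data; Cal has no enrollments at all.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE students (sid INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE courses (cid INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE enrollment (sid INTEGER, cid INTEGER);
    INSERT INTO students VALUES (1, 'Ana'), (2, 'Ben'), (3, 'Cal');
    INSERT INTO courses VALUES (10, 'Databases'), (20, 'Algebra');
    INSERT INTO enrollment VALUES (1, 10), (1, 20), (2, 10);
""")

# Keep a student only if there is no course lacking an enrollment row for them.
took_all_names = conn.execute("""
    SELECT s.name
    FROM students s
    WHERE NOT EXISTS (
        SELECT 1
        FROM courses c
        WHERE NOT EXISTS (
            SELECT 1
            FROM enrollment e
            WHERE e.sid = s.sid AND e.cid = c.cid
        )
    )
""").fetchall()

print(took_all_names)
```

One edge case worth knowing: if the courses table is empty, every student qualifies, because there is no course they failed to take.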

Which Join for SQL plus query

I have 4 tables. I would like to select one column from each table, but only if the department has both 'Mick' and 'Dave' working in it (it must have both names, not one or the other). But it does not seem to be working properly:
SELECT SCHOOL_NAME, TOWN, COUNTY
FROM STUDENTS
NATURAL JOIN SCHOOLS NATURAL JOIN TOWNS NATURAL JOIN
COUNTIES
WHERE FIRST_NAME IN ('Mick','Dave)
/
I'm going wrong somewhere (probably lots of places :( ). Any help would be great

Don't use NATURAL JOIN. It is an abomination, because it does not take properly declared foreign key relationships into account. It only looks at the names of columns. This can introduce really hard to find errors.
Second, what you want is aggregation:
select sc.SCHOOL_NAME, t.TOWN, c.COUNTY
from STUDENTS st join
SCHOOLS sc
on st.? = sc.? join
TOWNS t
on t.? = ? join
COUNTIES c
on c.? = t.?
where FIRST_NAME in ('Mick', 'Dave')
group by sc.SCHOOL_NAME, t.TOWN, c.COUNTY
having count(distinct st.first_name) = 2;
The ? are placeholders for table and column names. If you are learning SQL, it is all the more important that you understand how columns line up for joins in different tables.
A where clause can only check the values in a single row. There is a separate row for each student, so there is no way -- with just a where -- to find both students. That is where the aggregation comes in.
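As a sketch of the aggregation idea, the example below uses a hypothetical two-table schema (schools with an id, students with a school_id) on an in-memory SQLite database. The towns and counties joins are left out, and all column names and rows are assumptions.

```python
import sqlite3

# Hypothetical schema: the real join columns are unknown, so id/school_id are assumed.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE schools (id INTEGER PRIMARY KEY, school_name TEXT);
    CREATE TABLE students (first_name TEXT, school_id INTEGER);
    INSERT INTO schools VALUES (1, 'North High'), (2, 'South High'), (3, 'East High');
    INSERT INTO students VALUES
        ('Mick', 1), ('Dave', 1), ('Mick', 1),
        ('Mick', 2), ('Mick', 2),
        ('Dave', 3), ('Ann', 3);
""")

base_query = """
    SELECT sc.school_name
    FROM students st
    JOIN schools sc ON st.school_id = sc.id
    WHERE st.first_name IN ('Mick', 'Dave')
    GROUP BY sc.school_name
    HAVING {condition}
    ORDER BY sc.school_name
"""

# Counting distinct names requires both Mick and Dave to be present.
both_names = conn.execute(
    base_query.format(condition="COUNT(DISTINCT st.first_name) = 2")
).fetchall()

# Counting rows instead also lets through a school with two Micks and no Dave.
row_count_only = conn.execute(
    base_query.format(condition="COUNT(*) > 1")
).fetchall()

print(both_names)
print(row_count_only)
```

The second query is included only to show why the DISTINCT matters in the HAVING clause.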

You need at least three join conditions, and the string 'Dave' must be closed with a quote:
SELECT SCHOOL_NAME, TOWN, COUNTY
FROM SCHOOLS h
JOIN TOWNS t ON (t.id=h.town_id)
JOIN COUNTIES c ON (t.county_id=c.id)
WHERE EXISTS ( SELECT school_id
FROM STUDENTS s
WHERE s.first_name in ('Mick','Dave')
AND school_id = h.id
GROUP BY school_id
HAVING count(DISTINCT s.first_name)=2
);

You can use an analytic function in a sub-query to count the students who have the name Mick or Dave for each school_id (assuming that is your identifier for a school):
SELECT SCHOOL_NAME, TOWN, COUNTY
FROM ( SELECT *
FROM (
SELECT d.*,
COUNT(
DISTINCT
CASE WHEN FIRST_NAME IN ( 'Mick', 'Dave' ) THEN FIRST_NAME END
) OVER( PARTITION BY school_id )
AS num_matched
FROM STUDENTS d
)
WHERE num_matched = 2
)
NATURAL JOIN SCHOOLS
NATURAL JOIN TOWNS
NATURAL JOIN COUNTIES;
It would also be better to use an INNER JOIN and explicitly specify the join condition rather than relying on NATURAL JOIN.

SQL: Filter out entries that appear more than once

I'm new to SQL and hope to find some help here.
English is not my native language so if something seems unclear feel free to ask!
Like the topic name implies I want to filter out entries (Strings) from a table that exist more than once.
My code looks like this:
SELECT DISTINCT characterid, firstname, lastname, courseid
FROM Teaches
NATURAL JOIN Character
GROUP BY characterid, firstname, lastname, courseid
And it gives me this:
[image: result table]
The task is to filter out everyone who teaches more than 1 course. In this case it would be Snape and Quirrell.
I tried it with counting
HAVING count(characterid) > 1
But that didn't work. I would be very happy if someone could help me and maybe explain why that count didn't work. Thank you in advance!
EDIT: If I say "filter" then I mean I want it as a result table. So that in the end I get a table with 2 rows with
1) characterid Severus Snape
2) characterid Quirinus Quirrell
Sorry for being so unclear. Also I only included the courseid in the SELECT statement to see who teaches more than one course more clearly. The final table should only have the three columns "characterid", "firstname" and "lastname"
EDIT2: Here is the structure of the database. Maybe I'm completely wrong, so it could be helpful to you guys: [image: database structure]

Why didn't that count work? Because you group by courseid as well, so the count is computed for each (character, course) pair, not across all courses of each characterid.
So you should change it to
SELECT
characterid, firstname, lastname
FROM
Teaches
NATURAL JOIN
Character
GROUP BY
characterid, firstname, lastname
HAVING
COUNT(*) > 1;
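The effect of the GROUP BY columns on the count can be checked with a short sketch on an in-memory SQLite database. The rows are invented, the table is simplified to the columns used here, and an explicit INNER JOIN stands in for the NATURAL JOIN.

```python
import sqlite3

# Hypothetical, simplified data: who teaches which course.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE "Character" (characterid INTEGER PRIMARY KEY, firstname TEXT, lastname TEXT);
    CREATE TABLE Teaches (characterid INTEGER, courseid TEXT);
    INSERT INTO "Character" VALUES
        (1, 'Severus', 'Snape'),
        (2, 'Quirinus', 'Quirrell'),
        (3, 'Minerva', 'McGonagall');
    INSERT INTO Teaches VALUES
        (1, 'C1'), (1, 'C2'),
        (2, 'C2'), (2, 'C3'),
        (3, 'C4');
""")

# Grouping by courseid too: every group holds a single row, so nothing passes.
with_courseid = conn.execute("""
    SELECT c.characterid, c.firstname, c.lastname, t.courseid
    FROM Teaches t
    INNER JOIN "Character" c ON t.characterid = c.characterid
    GROUP BY c.characterid, c.firstname, c.lastname, t.courseid
    HAVING COUNT(*) > 1
""").fetchall()

# Grouping per character only: the count now spans all of that character's courses.
per_character = conn.execute("""
    SELECT c.characterid, c.firstname, c.lastname
    FROM Teaches t
    INNER JOIN "Character" c ON t.characterid = c.characterid
    GROUP BY c.characterid, c.firstname, c.lastname
    HAVING COUNT(*) > 1
    ORDER BY c.characterid
""").fetchall()

print(with_courseid)
print(per_character)
```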
I also suggest you use INNER JOIN instead of NATURAL JOIN. NATURAL JOIN is not supported by every database, and it hides the join columns from the reader (others can't tell which columns you meant to join on, so the query is harder to read).
According to your comment, I assume that you want to get every character that teaches more than 1 course, in any year and any position.
So for your case you should use the following (if you want to count across all schools, remove t.schoolid from the subquery):
SELECT
characterid, firstname, lastname
FROM
(
SELECT DISTINCT
c.characterid, c.firstname, c.lastname, t.courseid, t.schoolid
FROM
Teaches t
INNER JOIN
Character c
ON
t.characterid = c.characterid
) AS taught
GROUP BY
characterid, firstname, lastname
HAVING
COUNT(*) > 1;

How to join 2 tables based on a like in mysql

I have 2 tables Employee and Company_Employee.
Employee:
ID, FirstName
Company_Employee:
ID, Company_ID, Employee_ID
I want to do a search by First Name. I was thinking my query would look like:
select FirstName, ID from Employee where FirstName LIKE '%John%' and ID in (select id from Company_Employee)
This query returns no rows. Does anyone know how I can get the rows with a like by FirstName with these 2 tables?
Thanks!

Your query compares a company_employee.id with an employee.id. It should probably compare employee.id with company_employee.employee_id.
You can rewrite the query more clearly with a join:
select *
from employee e
join company_employee ce
on e.id = ce.Employee_ID
where e.FirstName like '%John%'
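The difference between comparing against company_employee.id and company_employee.employee_id can be seen in a small sketch on an in-memory SQLite database. The rows are invented; note that LIKE is case-insensitive for ASCII text in SQLite by default, which may differ from your MySQL collation.

```python
import sqlite3

# Hypothetical sample rows; table and column names come from the question.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Employee (ID INTEGER PRIMARY KEY, FirstName TEXT);
    CREATE TABLE Company_Employee (ID INTEGER PRIMARY KEY, Company_ID INTEGER, Employee_ID INTEGER);
    INSERT INTO Employee VALUES (1, 'John'), (2, 'Johnny'), (3, 'Mary');
    INSERT INTO Company_Employee VALUES (100, 7, 1), (101, 7, 3);
""")

# Original query: Employee.ID is compared with Company_Employee.ID (100, 101),
# which are unrelated keys, so nothing matches.
original = conn.execute("""
    SELECT FirstName, ID FROM Employee
    WHERE FirstName LIKE '%John%'
      AND ID IN (SELECT ID FROM Company_Employee)
""").fetchall()

# Corrected query: join Employee.ID to Company_Employee.Employee_ID.
fixed = conn.execute("""
    SELECT e.FirstName, e.ID
    FROM Employee e
    JOIN Company_Employee ce ON e.ID = ce.Employee_ID
    WHERE e.FirstName LIKE '%John%'
""").fetchall()

print(original)
print(fixed)
```

Johnny matches the LIKE pattern but has no row in Company_Employee, so only John is returned.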

Something like this:
SELECT
*
FROM
Employee e
INNER JOIN Company_Employee ce ON e.ID = ce.Employee_ID
WHERE
e.FirstName LIKE '%JOHN%'
