Hive query equivalent of sql - sql

Hi, I have a table students as follows:
student_id course_id
1111 100
2222 101
3333 101
4444 102
5555 103
And a courses table as follows:
course_id course_desc
100 Electronics
101 Computer
102 Mechanical
If I join these two tables, the students table contains a course_id (103) that is not listed in the courses table. So every time I do the join, I need to compare against the courses table to find out which new course_id values appear in the students table.
I believe in SQL we can use something like:
select DISTINCT course_id from students WHERE course_id NOT IN ( select course_id FROM courses);
How can this be done in Hive? Any help or suggestion is much appreciated.

This should work:
select distinct students.course_id from students LEFT OUTER JOIN courses ON (students.course_id = courses.course_id) where courses.course_id is null;
NOT IN with a subquery is not supported in older Hive versions (it was added in Hive 0.13), hence this workaround.
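To check that the LEFT OUTER JOIN ... IS NULL pattern really gives the same answer as NOT IN, here is a small sketch using the sample data from the question. It runs against an in-memory SQLite database through Python's sqlite3 module, which is only a stand-in for Hive (an assumption made so the example is runnable anywhere); the SQL itself is the part that carries over.

```python
import sqlite3

# In-memory SQLite database used as a stand-in engine; this is not Hive.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE students (student_id INTEGER, course_id INTEGER);
    CREATE TABLE courses  (course_id INTEGER, course_desc TEXT);
    INSERT INTO students VALUES (1111, 100), (2222, 101), (3333, 101), (4444, 102), (5555, 103);
    INSERT INTO courses  VALUES (100, 'Electronics'), (101, 'Computer'), (102, 'Mechanical');
""")

# Anti-join: keep the student rows that found no partner in courses.
anti_join = conn.execute("""
    SELECT DISTINCT s.course_id
    FROM students s
    LEFT OUTER JOIN courses c ON s.course_id = c.course_id
    WHERE c.course_id IS NULL
""").fetchall()

# The NOT IN form from the question, for comparison.
not_in = conn.execute("""
    SELECT DISTINCT course_id
    FROM students
    WHERE course_id NOT IN (SELECT course_id FROM courses)
""").fetchall()

print(anti_join)  # [(103,)]
print(not_in)     # [(103,)]
```

One difference worth knowing: if courses.course_id could contain NULL, the NOT IN form would return no rows at all, while the anti-join would still work.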

Related

How to inner join same table with different conditions?

I need to get contact information both for contacts whose employee id is null and for those whose employee id is not null. How do I join the same table with these different conditions? I need the information to populate a report with both the employee's information and the person who accompanied them to an event. Here is the query I have so far:
select events.id, (persons.firstname+' '+ persons.lastname) as employee
from events
inner join eventcontacts on events.id = eventcontacts.event_id
inner join contacts on eventcontacts.contact_id = contacts.id
inner join persons on contacts.person_id = persons.id
Eventcontacts table
Id ContactType_id contact_id event_id
1 1 1 300
2 2 3 300
Contact type is 1 for employees and 2 for non-employees
contacts table
Id person_id employee_id
1 100 200
2 101 201
3 102 NULL
4 103 202
5 104 203
Person table
Id firstname lastname
100 John Stewart
101 Greg Larry
102 Kim Hans
103 Gloria June
104 Dan Duke
Result table
ID employee accompany
300 John Stewart Kim Hans
Right now, I have the information for all the employees at the event. I also want the people who accompanied them to the event. Their employee id is null in the contacts table. How do I join the contacts table again here?
An inner join returns only the rows that have a match in both tables, whereas it seems you want all the rows from the contacts table, including those that don't match because they lack an employee id.
A full outer join returns the rows that match in contacts AND in events, like an inner join, but ALSO the rows that exist ONLY in events and the rows that exist ONLY in contacts.
In case I am explaining this poorly, I recommend reading this:
https://mode.com/sql-tutorial/sql-outer-joins/
If you can use an outer join successfully, you will get all the visitors in one result regardless of whether they have an employee id.
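As a concrete sketch of "joining the same table again", one option is to join eventcontacts, contacts and persons twice under different aliases: once for the employee and once (with LEFT JOIN) for the accompanying person. The example below uses the sample rows from the question in an in-memory SQLite database. Two assumptions on my side: the ContactType_id column is used to tell the two roles apart, and || replaces the + string concatenation of the original query because that is what SQLite uses.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE events        (id INTEGER);
    CREATE TABLE eventcontacts (id INTEGER, contacttype_id INTEGER, contact_id INTEGER, event_id INTEGER);
    CREATE TABLE contacts      (id INTEGER, person_id INTEGER, employee_id INTEGER);
    CREATE TABLE persons       (id INTEGER, firstname TEXT, lastname TEXT);
    INSERT INTO events VALUES (300);
    INSERT INTO eventcontacts VALUES (1, 1, 1, 300), (2, 2, 3, 300);
    INSERT INTO contacts VALUES (1, 100, 200), (2, 101, 201), (3, 102, NULL), (4, 103, 202), (5, 104, 203);
    INSERT INTO persons VALUES (100, 'John', 'Stewart'), (101, 'Greg', 'Larry'), (102, 'Kim', 'Hans'),
                               (103, 'Gloria', 'June'), (104, 'Dan', 'Duke');
""")

report = conn.execute("""
    SELECT e.id,
           ep.firstname || ' ' || ep.lastname AS employee,
           gp.firstname || ' ' || gp.lastname AS accompany
    FROM events e
    -- first pass over the tables: the employee (contact type 1)
    JOIN eventcontacts eec ON eec.event_id = e.id AND eec.contacttype_id = 1
    JOIN contacts ec       ON ec.id = eec.contact_id
    JOIN persons ep        ON ep.id = ec.person_id
    -- second pass, aliased differently: the guest (contact type 2);
    -- LEFT JOIN keeps employees who came alone, with accompany = NULL
    LEFT JOIN eventcontacts gec ON gec.event_id = e.id AND gec.contacttype_id = 2
    LEFT JOIN contacts gc       ON gc.id = gec.contact_id
    LEFT JOIN persons gp        ON gp.id = gc.person_id
""").fetchall()

print(report)  # [(300, 'John Stewart', 'Kim Hans')]
```

If you would rather identify guests by the missing employee id, as described in the question, you could add AND gc.employee_id IS NULL to the second contacts join.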

Query to find missing teacher-student combination

I have a single table with two columns, TEACHER_ID and STUDENT_ID, which holds the data of all teachers teaching the students. One teacher can teach multiple students and one student can be taught by many teachers.
TEACHER_ID STUDENT_ID
100 123
100 124
100 125
100 126
101 123
101 124
101 125
102 123
102 124
102 125
102 126
103 123
103 127
The need is to find which teacher is not teaching which student, i.e. it should show the output as the same 2 columns but the pair should be such that it doesn't exist in the table.
For example: student_id 127 is taught by teacher_id 103 alone, so the pairs (100, 127), (101, 127) and (102, 127) should appear in the output, and similarly for all other missing pairs.
We can use a cross join to get all possible combinations and then use the MINUS operator to discard the actual data, leaving us with the remaining pairs.
But is there a better and more efficient way to do this?
You need to create all the combinations first; there isn't another way to find the missing pairs.
With teachers as (
SELECT DISTINCT "TEACHER_ID"
FROM Table1
), students as (
SELECT DISTINCT "STUDENT_ID"
FROM Table1
)
SELECT teachers."TEACHER_ID" , students."STUDENT_ID"
FROM teachers
CROSS JOIN students
LEFT JOIN Table1 t
ON teachers."TEACHER_ID" = t."TEACHER_ID"
AND students."STUDENT_ID" = t."STUDENT_ID"
WHERE t."TEACHER_ID" IS NULL
ORDER BY 2, 1
You can do this by generating all combinations of teacher and student using cross join. Then filter out the ones that exist:
select t.teacher_id, s.student_id
from (select distinct teacher_id from t) t cross join
(select distinct student_id from t) s
minus
select teacher_id, student_id
from t;
For the filtering part, you can also use not in, not exists, or left join/where.
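Here is a runnable sketch of both filtering styles on the sample data from the question, using an in-memory SQLite database as a stand-in. The table name teaches is my own placeholder, and SQLite spells Oracle's MINUS as EXCEPT; otherwise the queries follow the answers above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE teaches (teacher_id INTEGER, student_id INTEGER)")
conn.executemany(
    "INSERT INTO teaches VALUES (?, ?)",
    [(100, 123), (100, 124), (100, 125), (100, 126),
     (101, 123), (101, 124), (101, 125),
     (102, 123), (102, 124), (102, 125), (102, 126),
     (103, 123), (103, 127)],
)

# All teacher/student combinations minus the pairs that actually exist.
missing_except = conn.execute("""
    SELECT t.teacher_id, s.student_id
    FROM (SELECT DISTINCT teacher_id FROM teaches) t
    CROSS JOIN (SELECT DISTINCT student_id FROM teaches) s
    EXCEPT
    SELECT teacher_id, student_id FROM teaches
    ORDER BY 1, 2
""").fetchall()

# Same result with NOT EXISTS as the filter.
missing_not_exists = conn.execute("""
    SELECT t.teacher_id, s.student_id
    FROM (SELECT DISTINCT teacher_id FROM teaches) t
    CROSS JOIN (SELECT DISTINCT student_id FROM teaches) s
    WHERE NOT EXISTS (
        SELECT 1 FROM teaches x
        WHERE x.teacher_id = t.teacher_id AND x.student_id = s.student_id
    )
    ORDER BY 1, 2
""").fetchall()

for pair in missing_except:
    print(pair)
```

With 4 teachers and 5 students there are 20 combinations, 13 of which exist, so 7 missing pairs come back.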

How to Join three tables with multiple values rows returned on the other two

Say for example.
I have three separate tables:
course Table which has CourseId, StudentIds etc
student which of course contains student data and StudentName
score table
I only want one column from each table and fuse them into one.
CourseId StudentName Scores
---------- ------------- ----------
1 Gashio 10
1 Gashio 20
1 Lee 35
1 Lee 40
1 Edith 5
2 Lana 3
2 Reisha 50
Every course has multiple students, and every student gets multiple scores from the course over a month.
I wanted something like this as a result:
CourseId StudentName Scores
--------- ------------- -------------
1 Gashio 10|20
1 Lee 35|40
1 Edith 5
2 Lana 3
2 Reisha 50
Since the scores return multiple values, I want them to become one column separated by a delimiter.
I'm not sure if this is where I should be using STRING_AGG?
You need STRING_AGG and GROUP BY
SELECT course.CourseId,
student.StudentName,
STRING_AGG(Scores, '|') AS Scores
FROM course INNER JOIN
student ON student.StudentId = course.StudentId INNER JOIN
score ON score.studentId = student.StudentId
GROUP BY course.CourseId,
student.StudentName
Use string_agg() with a delimiter:
select CourseId,StudentName,string_agg(Scores,'|') as scores
from tablename
group by CourseId,StudentName
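To see the aggregate-and-group idea run end to end, here is a sketch on the sample rows from the question. It uses SQLite through Python, where the equivalent of STRING_AGG is group_concat(value, separator); that substitution and the flattened table name course_scores are my assumptions, not part of the answers above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE course_scores (CourseId INTEGER, StudentName TEXT, Scores INTEGER)")
conn.executemany(
    "INSERT INTO course_scores VALUES (?, ?, ?)",
    [(1, "Gashio", 10), (1, "Gashio", 20), (1, "Lee", 35), (1, "Lee", 40),
     (1, "Edith", 5), (2, "Lana", 3), (2, "Reisha", 50)],
)

# One output row per (course, student); the scores collapse into one string.
# group_concat plays the role of STRING_AGG here.
rows = conn.execute("""
    SELECT CourseId, StudentName, group_concat(Scores, '|') AS Scores
    FROM course_scores
    GROUP BY CourseId, StudentName
    ORDER BY CourseId, StudentName
""").fetchall()

for row in rows:
    print(row)
```

Note that SQLite does not promise the order of the values inside each concatenated string. In SQL Server, STRING_AGG accepts WITHIN GROUP (ORDER BY ...) if the order matters.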

Querying 100k records to 5 records

I have a requirement to join two tables, one with more than 100k records and the other with just 5 records, as shown below:
Employee
id Name deptid
1 Jane 1
2 Jack 2
3 Dane 1
4 Drack 3
5 Drim NULL
6 Drum 5
7 Krack NULL
8 Kery 4
.
.
100k
Dept
deptid Name
1 Science
2 Maths
3 Biology
4 Social
5 Zoology
Result
Name deptid Name
Jane 1 Science
Dane 1 Science
Jack 2 Maths
Drack 3 Biology
Kery 4 Social
Drum 5 Zoology
Which join should be used so that the query performs well and returns the result shown?
I only want to join the employees that have a dept. I thought of the query below, but wanted to know whether there is a better way to do it.
Select A.Name, d.deptid, d.Name from
(Select deptid, Name from Employee where deptid IS NOT NULL) A,
dept d where A.deptid = d.deptid;
Firstly, I'm not sure why you are writing your query that way. It should be more like:
SELECT A.name, D.deptid,D.Name
FROM Employee A
INNER JOIN dept D
ON A.deptid = D.deptid
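There is no need for the IS NOT NULL filter: an inner join already drops employees whose deptid is NULL, because NULL never compares equal to any dept.deptid.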
If this is a ONE TIME or OCCASIONAL thing and performance is key (not a permanent query in your DB) you can leave out the join altogether and do it using CASE:
SELECT
A.name, A.deptid,
CASE
WHEN A.deptid = 1 THEN 'Science'
WHEN A.deptid = 2 THEN 'Maths'
...[etc for the other 3 departments]...
END as Name
FROM Employee A
If this is to be permanent and performance is key, simply try applying an INDEX on the foreign key deptid in the Employee table and use my first query above.
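The following sketch runs the plain INNER JOIN on the eight sample employees from the question (standing in for the 100k rows) and shows that the rows with a NULL deptid drop out without any explicit filter. It uses an in-memory SQLite database as a stand-in engine, and the index name idx_employee_deptid is a placeholder of mine.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Employee (id INTEGER, Name TEXT, deptid INTEGER);
    CREATE TABLE dept     (deptid INTEGER, Name TEXT);
    INSERT INTO Employee VALUES (1, 'Jane', 1), (2, 'Jack', 2), (3, 'Dane', 1), (4, 'Drack', 3),
                                (5, 'Drim', NULL), (6, 'Drum', 5), (7, 'Krack', NULL), (8, 'Kery', 4);
    INSERT INTO dept VALUES (1, 'Science'), (2, 'Maths'), (3, 'Biology'), (4, 'Social'), (5, 'Zoology');
    -- index on the foreign key, as suggested for the permanent query
    CREATE INDEX idx_employee_deptid ON Employee (deptid);
""")

# No IS NOT NULL needed: a NULL deptid never satisfies A.deptid = D.deptid.
result = conn.execute("""
    SELECT A.Name, D.deptid, D.Name
    FROM Employee A
    INNER JOIN dept D ON A.deptid = D.deptid
    ORDER BY D.deptid, A.id
""").fetchall()

for row in result:
    print(row)
```

Drim and Krack, who have no department, do not appear in the output, which matches the Result table in the question.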

Joining multiple tables with a single query

Student
student_id FirstName LastName
---------------------------------------------------
1 Joe Bloggs
2 Alan Day
3 David Johnson
Student_Course
course_id student_id courseName
---------------------------------------------------
1 1 Computer Science
2 1 David Beckham Studies
3 1 Geography
1 3 Computer Science
3 3 Geography
Student_clubs
club_id student_id club_name club_count
---------------------------------------------------
1 1 Football 10
2 1 Rugby 10
3 1 Syncronized Swimming 10
4 3 Tennis 15
In the above example, the student with id = 1 takes 3 courses and is part of 3 clubs.
If I want to find out which courses a student is involved in, or which clubs the student is part of, I can do it, but I need to run two queries. Is it possible to run a single query against the tables listed above so that the results come out like this:
Output
student_id FirstName Student_associated_courses Student_associated_clubs
---------------------------------------------------------------------------
1 Joe 1,2,3 Football, Rugby, Syncronized swimming
3 David 1,3 Tennis
Is it possible to get the above output with just one query? I am using JDBC to get the data, so I am trying to see if I can avoid multiple trips to get the necessary data.
Use GROUP_CONCAT with DISTINCT in MySQL:
SELECT a.student_ID, a.firstname,
GROUP_CONCAT(DISTINCT b.course_ID),
GROUP_CONCAT(DISTINCT c.club_name)
FROM student a
INNER JOIN student_Course b
ON a.student_id = b.student_ID
INNER JOIN student_clubs c
ON a.student_ID = c.student_ID
GROUP BY a.student_ID, a.firstname
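The DISTINCT matters because joining both child tables to student multiplies the rows: 3 courses times 3 clubs gives 9 joined rows for student 1. The sketch below shows that fan-out and the de-duplicated lists on the sample data. It runs on SQLite, which happens to support GROUP_CONCAT(DISTINCT ...) with the default comma separator; treating it as a stand-in for MySQL is my assumption.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE student        (student_id INTEGER, firstname TEXT, lastname TEXT);
    CREATE TABLE student_course (course_id INTEGER, student_id INTEGER, courseName TEXT);
    CREATE TABLE student_clubs  (club_id INTEGER, student_id INTEGER, club_name TEXT, club_count INTEGER);
    INSERT INTO student VALUES (1, 'Joe', 'Bloggs'), (2, 'Alan', 'Day'), (3, 'David', 'Johnson');
    INSERT INTO student_course VALUES (1, 1, 'Computer Science'), (2, 1, 'David Beckham Studies'),
                                      (3, 1, 'Geography'), (1, 3, 'Computer Science'), (3, 3, 'Geography');
    INSERT INTO student_clubs VALUES (1, 1, 'Football', 10), (2, 1, 'Rugby', 10),
                                     (3, 1, 'Syncronized Swimming', 10), (4, 3, 'Tennis', 15);
""")

joins = """
    FROM student a
    INNER JOIN student_course b ON a.student_id = b.student_id
    INNER JOIN student_clubs c  ON a.student_id = c.student_id
"""

# Joined row count per student: courses x clubs, hence the duplicates.
fan_out = conn.execute(
    "SELECT a.student_id, COUNT(*)" + joins + "GROUP BY a.student_id ORDER BY a.student_id"
).fetchall()

# DISTINCT removes the repeated course ids and club names inside each list.
rows = conn.execute(
    "SELECT a.student_id, a.firstname, "
    "GROUP_CONCAT(DISTINCT b.course_id), GROUP_CONCAT(DISTINCT c.club_name)"
    + joins +
    "GROUP BY a.student_id, a.firstname ORDER BY a.student_id"
).fetchall()

print(fan_out)  # [(1, 9), (3, 2)]
for row in rows:
    print(row)
```

Student 2 has no courses or clubs, so the inner joins leave that row out, which matches the desired output. The order of items inside each list is not guaranteed.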
Try it like this:
SELECT *
FROM Student s JOIN
(SELECT sc."student_id", listagg(sc."course_id", ',')within group(ORDER BY sc."course_id")
FROM Student_Course sc
GROUP BY sc."student_id") s_course ON s."student_id"=s_course."student_id"
JOIN (SELECT sl."student_id", listagg(sl."club_name", ',')within GROUP(ORDER BY sl."club_name")
FROM Student_clubs sl
GROUP BY sl."student_id") s_club ON s."student_id"=s_club."student_id"
The "catch" is that LISTAGG doesn't work with the DISTINCT keyword in Oracle versions before 19c, which is why each list is aggregated in its own subquery here.