Hive: Invalid Column Reference

In Hive, I've got four tables:
temp_basic_info (ID, MSISDN, GENDER, AGE, DAY, MONTH, YEAR, RELATIONSHIPSTATUS)
temp_education (ID, EDUCATION)
likes_and_music (ID, NAME, PAGE)
temp_output (ID, MSISDN, GENDER, AGE, DAY, MONTH, YEAR, RELATIONSHIPSTATUS, EDUCATION, LIKES_AND_PREFERENCES)
temp_output is empty.
Now I want to transfer the appropriate fields from the other three tables into temp_output. likes_and_music has multiple rows with the same ID, paired with varying NAMEs and PAGEs, so I'd have to put them in an array.
My projected output is something like the following:
0001 msisdn1 male 21 1 2 92 0 College [Jeep, soccer, PC games, etc...]
And here's my query so far:
Select a.ID, a.MSISDN, a.GENDER, a.AGE, a.DAY, a.MONTH, a.YEAR, a.RELATIONSHIPSTATUS, b.EDUCATION, COLLECT_SET(c.NAME) FROM temp_basic_info a JOIN temp_education b ON (a.ID = b.ID) JOIN likes_and_music c ON (c.ID = b.ID) GROUP BY a.ID, a.MSISDN, a.GENDER, a.AGE, a.DAY, a.MONTH, a.YEAR, a.RELATIONSHIPSTATUS, b.EDUCATION, c.name limit 10;
But it returns the following error:
FAILED: SemanticException [Error 10002]: Line 1:311 Invalid column reference 'EDUCATION'
What am I missing?

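temp_education (ID, NAME)
I don't see a column EDUCATION in table temp_education (aliased b), so Hive cannot resolve b.EDUCATION. That is what "Invalid column reference 'EDUCATION'" is telling you.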
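Apart from the missing column, the query also lists c.name in its GROUP BY, so every group holds a single NAME and COLLECT_SET(c.NAME) can only ever return one element per row. The sketch below illustrates this with Python's built-in sqlite3 module rather than Hive. The sample rows are hypothetical, and SQLite's GROUP_CONCAT(DISTINCT ...) is a stand-in for COLLECT_SET (it returns a comma-separated string, not an array).

```python
import sqlite3

# Hypothetical sample rows. SQLite stands in for Hive, and
# GROUP_CONCAT(DISTINCT ...) stands in for Hive's COLLECT_SET.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE temp_basic_info (ID TEXT, MSISDN TEXT, GENDER TEXT, AGE INTEGER);
CREATE TABLE temp_education (ID TEXT, EDUCATION TEXT);
CREATE TABLE likes_and_music (ID TEXT, NAME TEXT, PAGE TEXT);
INSERT INTO temp_basic_info VALUES ('0001', 'msisdn1', 'male', 21);
INSERT INTO temp_education VALUES ('0001', 'College');
INSERT INTO likes_and_music VALUES
    ('0001', 'Jeep', 'page1'), ('0001', 'soccer', 'page2'), ('0001', 'PC games', 'page3');
""")


def collect_likes(group_by_name):
    # When group_by_name is True, c.NAME is added to GROUP BY, as in the question.
    extra = ", c.NAME" if group_by_name else ""
    sql = f"""
        SELECT a.ID, b.EDUCATION, GROUP_CONCAT(DISTINCT c.NAME) AS likes
        FROM temp_basic_info a
        JOIN temp_education b ON a.ID = b.ID
        JOIN likes_and_music c ON c.ID = b.ID
        GROUP BY a.ID, a.MSISDN, a.GENDER, a.AGE, b.EDUCATION{extra}
    """
    return conn.execute(sql).fetchall()


with_name = collect_likes(True)      # one row per liked item, one name each
without_name = collect_likes(False)  # one row per person, all names collected
print(with_name)
print(without_name)
```

Dropping c.name from the GROUP BY is what lets the aggregate gather all of a person's likes into one row.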


Postgres Question: Aren't both a and b correct?

For the questions below, use the following schema definition.
restaurant(rid, name, phone, street, city, state, zip)
customer(cid, fname, lname, phone, street, city, state, zip)
carrier(crid, fname, lname, lp)
delivery(did, rid, cid, tim, size, weight)
pickup(did, tim, crid)
dropoff(did, tim, crid)
It's a schema for a food delivery business that employs food carriers (carrier table).
Customers (customer table) order food from restaurants (restaurant table).
The restaurants order a delivery (delivery table) to deliver food from the restaurant to the customer.
The pickup table records when a carrier picks up food at a restaurant.
The dropoff table records when a carrier drops off food at a customer.
1. Find customers who have fewer than 5 deliveries.
a. select cid,count(*)
from delivery
group by cid
having count(*) < 5;
b. select a.cid,count(*)
from customer a
inner join delivery b
using(cid)
group by a.cid
having count(*) < 5;
c. select a.cid,count(*)
from customer a
left outer join delivery b
on a.cid=b.cid
group by a.cid
having count(*) < 5;
d. select cid,sum(case when b.cid is not null then 1 else 0 end)
from customer a
left outer join delivery b
using (cid)
group by cid
having sum(case when b.cid is not null then 1 else 0 end) < 5;
e. (write your own answer)
No, they are not correct. They miss customers who have had no deliveries.
The last is the best of a bunch of not-so-good queries. A better version would be:
select c.cid, count(d.cid)
from customer c left outer join
delivery d
on c.cid = d.cid
group by c.cid
having count(d.cid) < 5;
The sum(case ...) is overkill. Postgres even offers a better option than that:
count(*) filter (where d.cid is not null)
But count(d.cid) is still more concise.
Also note the use of meaningful table aliases. Don't get into the habit of using arbitrary letters for tables. That just makes queries hard to understand.
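To see the difference between the options concretely, here is a small check that uses Python's sqlite3 module as a stand-in for Postgres, with hypothetical data in which customer 3 has no deliveries. Option a never sees customer 3. Option c reports it with a count of 1, because count(*) counts the NULL-extended row produced by the outer join. The count(d.cid) version reports 0.

```python
import sqlite3

# Hypothetical data: customer 1 has 2 deliveries, customer 2 has 6, customer 3 has none.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customer (cid INTEGER PRIMARY KEY);
CREATE TABLE delivery (did INTEGER PRIMARY KEY, cid INTEGER);
INSERT INTO customer VALUES (1), (2), (3);
""")
conn.executemany("INSERT INTO delivery (cid) VALUES (?)", [(1,)] * 2 + [(2,)] * 6)

# Option a: only reads delivery, so customers with no deliveries never appear.
option_a = conn.execute("""
    SELECT cid, count(*) FROM delivery
    GROUP BY cid HAVING count(*) < 5 ORDER BY cid
""").fetchall()

# Option c: count(*) counts the NULL-extended outer-join row as 1.
option_c = conn.execute("""
    SELECT a.cid, count(*) FROM customer a
    LEFT OUTER JOIN delivery b ON a.cid = b.cid
    GROUP BY a.cid HAVING count(*) < 5 ORDER BY a.cid
""").fetchall()

# Suggested version: count(d.cid) skips NULLs, so customer 3 gets 0.
better = conn.execute("""
    SELECT c.cid, count(d.cid) FROM customer c
    LEFT OUTER JOIN delivery d ON c.cid = d.cid
    GROUP BY c.cid HAVING count(d.cid) < 5 ORDER BY c.cid
""").fetchall()

print(option_a)  # [(1, 2)]
print(option_c)  # [(1, 2), (3, 1)]
print(better)    # [(1, 2), (3, 0)]
```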

'ALL' concept in SQL queries

Relational Schema:
Students (**sid**, name, age, major)
Courses (**cid**, name)
Enrollment (**sid**, **cid**, year, term, grade)
Write a SQL query that returns the names of the students who took all courses. I'm not sure how to capture the concept of 'ALL' in a SQL query.
EDIT:
I want to be able to write it without aggregation, as I want to use the same logic for writing the query in relational algebra as well.
Thanks for the help!
One way of writing such queries is to count the total number of courses and the number of distinct courses each student took, and compare them:
SELECT s.*
FROM students s
JOIN (SELECT sid, COUNT(DISTINCT cid) AS student_courses
FROM enrollment
GROUP BY sid) e ON s.sid = e.sid
JOIN (SELECT COUNT(*) AS cnt
FROM courses) c ON c.cnt = e.student_courses
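A quick sanity check of the counting approach, using Python's sqlite3 module and hypothetical rows. COUNT(DISTINCT cid) matters here: Ann is enrolled in DB twice, so a plain COUNT(cid) would give her 3 enrollments against 2 courses and she would be missed. The approach also assumes that every cid in enrollment exists in courses.

```python
import sqlite3

# Hypothetical data: Ann takes both courses (DB twice), Bob takes only DB.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE students (sid INTEGER PRIMARY KEY, name TEXT, age INTEGER, major TEXT);
CREATE TABLE courses (cid INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE enrollment (sid INTEGER, cid INTEGER, year INTEGER, term TEXT, grade TEXT);
INSERT INTO students VALUES (1, 'Ann', 20, 'CS'), (2, 'Bob', 21, 'Math');
INSERT INTO courses VALUES (10, 'DB'), (20, 'OS');
INSERT INTO enrollment VALUES
    (1, 10, 2020, 'F', 'B'), (1, 10, 2021, 'S', 'A'), (1, 20, 2021, 'F', 'A'),
    (2, 10, 2020, 'F', 'C');
""")

# Keep students whose number of distinct courses equals the total number of courses.
rows = conn.execute("""
SELECT s.name
FROM students s
JOIN (SELECT sid, COUNT(DISTINCT cid) AS student_courses
      FROM enrollment
      GROUP BY sid) e ON s.sid = e.sid
JOIN (SELECT COUNT(*) AS cnt
      FROM courses) c ON c.cnt = e.student_courses
""").fetchall()
print(rows)  # [('Ann',)]
```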
This gives the student/course combinations that are possible but haven't been taken...
SELECT s.sid, c.cid FROM students s CROSS JOIN courses c
EXCEPT
SELECT sid, cid FROM enrollment
So, you can then do the same with the student list...
SELECT sid FROM students
EXCEPT
(
-- students who are missing at least one course
SELECT DISTINCT
sid
FROM
(
SELECT s.sid, c.cid FROM students s CROSS JOIN courses c
EXCEPT
SELECT sid, cid FROM enrollment
)
AS not_enrolled
)
I don't like it, but it avoids aggregation...
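The set-difference version can be checked the same way. This sketch uses Python's sqlite3 module with hypothetical rows. SQLite does not accept parenthesized operands in a compound SELECT, so the outer EXCEPT is written without parentheses here, and the inner SELECT names its columns explicitly.

```python
import sqlite3

# Hypothetical data: student 1 takes both courses, student 2 only course 10.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE students (sid INTEGER PRIMARY KEY, name TEXT, age INTEGER, major TEXT);
CREATE TABLE courses (cid INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE enrollment (sid INTEGER, cid INTEGER, year INTEGER, term TEXT, grade TEXT);
INSERT INTO students VALUES (1, 'Ann', 20, 'CS'), (2, 'Bob', 21, 'Math');
INSERT INTO courses VALUES (10, 'DB'), (20, 'OS');
INSERT INTO enrollment VALUES
    (1, 10, 2020, 'F', 'B'), (1, 20, 2021, 'F', 'A'), (2, 10, 2020, 'F', 'C');
""")

# All students, minus those with at least one possible-but-untaken course.
rows = conn.execute("""
SELECT sid FROM students
EXCEPT
SELECT sid FROM (
    SELECT s.sid AS sid, c.cid AS cid FROM students s CROSS JOIN courses c
    EXCEPT
    SELECT sid, cid FROM enrollment
) AS not_enrolled
""").fetchall()
print(rows)  # [(1,)]
```

Because EXCEPT already removes duplicates, the inner DISTINCT is not strictly needed in this form.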
SELECT *
FROM Students
WHERE NOT EXISTS (
SELECT 1 FROM Courses
LEFT OUTER JOIN Enrollment ON Courses.cid = Enrollment.cid
AND Enrollment.sid = Students.sid
WHERE Enrollment.sid IS NULL
)
By the way, table names should be in singular form, not plural.
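The NOT EXISTS version can be tried against the same kind of hypothetical data in Python's sqlite3 module (a stand-in for whatever database the question targets). For each student, the subquery looks for a course with no matching enrollment row; the student is kept only if no such course exists.

```python
import sqlite3

# Hypothetical data: Ann takes both courses, Bob only DB.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Students (sid INTEGER PRIMARY KEY, name TEXT, age INTEGER, major TEXT);
CREATE TABLE Courses (cid INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE Enrollment (sid INTEGER, cid INTEGER, year INTEGER, term TEXT, grade TEXT);
INSERT INTO Students VALUES (1, 'Ann', 20, 'CS'), (2, 'Bob', 21, 'Math');
INSERT INTO Courses VALUES (10, 'DB'), (20, 'OS');
INSERT INTO Enrollment VALUES
    (1, 10, 2020, 'F', 'B'), (1, 20, 2021, 'F', 'A'), (2, 10, 2020, 'F', 'C');
""")

# A course left unmatched by the outer join is one this student never took.
rows = conn.execute("""
SELECT Students.name
FROM Students
WHERE NOT EXISTS (
    SELECT 1 FROM Courses
    LEFT OUTER JOIN Enrollment ON Courses.cid = Enrollment.cid
                              AND Enrollment.sid = Students.sid
    WHERE Enrollment.sid IS NULL
)
""").fetchall()
print(rows)  # [('Ann',)]
```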

Selecting cities that have 10 or more students and instructors combined in SQL

I need to show the city, state, number of student residents, number of instructor residents, and total student/instructor residents in that city. The information is contained in 3 tables: ZIPCODE, STUDENT, and INSTRUCTOR.
The ZIPCODE table has the columns ZIP, CITY, and STATE.
The STUDENT table has STUDENT_ID and ZIP.
The INSTRUCTOR table has INSTRUCTOR_ID and ZIP.
I've tried a couple of inner joins and intersects, but I keep getting a wide variety of errors. I'm still very new to SQL and am not sure how to make this work, so any help or advice would be greatly appreciated.
You probably want a mix of UNION and JOIN for this; I doubt you want INTERSECT. There are plenty of ways to do this. Here's one:
SELECT
z.city,
z.state,
SUM(case when d.typ = 's' then 1 ELSE 0 END) as count_students,
SUM(case when d.typ = 'i' then 1 ELSE 0 END) as count_instructors,
Count(*) as count_all
FROM
(SELECT 's' as typ, zip FROM student
UNION ALL
SELECT 'i' as typ, zip FROM Instructor
) d
INNER JOIN
zipcode z
ON d.zip = z.zip
GROUP BY
z.city, z.state
HAVING Count(*) >= 10
I pull all the rows out of the student and instructor tables and UNION ALL them into one big list, with a typ column that keeps track of which table each row came from. The CASE WHEN returns 1 when the type matches (for example 's' for students) and 0 otherwise, and SUM adds up those 1s, so it works as a count. After the join, each row carries a city/state/typ combination, so grouping on city and state and summing on typ gives the count per city.
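Here is the union-based query run end to end with Python's sqlite3 module, using hypothetical zip codes and residents and a HAVING COUNT(*) >= 10 filter taken from the question title. New York spans two zips and has 12 residents in total, so it is the only city returned.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE zipcode (zip TEXT PRIMARY KEY, city TEXT, state TEXT);
CREATE TABLE student (student_id INTEGER PRIMARY KEY, zip TEXT);
CREATE TABLE instructor (instructor_id INTEGER PRIMARY KEY, zip TEXT);
INSERT INTO zipcode VALUES
    ('10001', 'New York', 'NY'), ('10002', 'New York', 'NY'), ('02139', 'Cambridge', 'MA');
""")
# Hypothetical residents: New York has 9 students and 3 instructors, Cambridge 2 and 1.
conn.executemany("INSERT INTO student (zip) VALUES (?)",
                 [("10001",)] * 8 + [("10002",)] + [("02139",)] * 2)
conn.executemany("INSERT INTO instructor (zip) VALUES (?)",
                 [("10001",)] * 3 + [("02139",)])

# One list of tagged residents, joined to zip codes, then counted per city.
rows = conn.execute("""
SELECT z.city, z.state,
       SUM(CASE WHEN d.typ = 's' THEN 1 ELSE 0 END) AS count_students,
       SUM(CASE WHEN d.typ = 'i' THEN 1 ELSE 0 END) AS count_instructors,
       COUNT(*) AS count_all
FROM (SELECT 's' AS typ, zip FROM student
      UNION ALL
      SELECT 'i' AS typ, zip FROM instructor) d
INNER JOIN zipcode z ON d.zip = z.zip
GROUP BY z.city, z.state
HAVING COUNT(*) >= 10
""").fetchall()
print(rows)  # [('New York', 'NY', 9, 3, 12)]
```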
Here's another way to do this:
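SELECT
z.city,
z.state,
-- COALESCE: a city with no students or no instructors would otherwise give NULL sums
COALESCE(SUM(s.ct), 0) as count_students,
COALESCE(SUM(i.ct), 0) as count_instructors,
COALESCE(SUM(s.ct), 0) + COALESCE(SUM(i.ct), 0) as count_all
FROM
zipcode z
LEFT OUTER JOIN
(SELECT zip, count(*) as ct FROM student GROUP BY zip) s
ON s.zip = z.zip
LEFT OUTER JOIN
(SELECT zip, count(*) as ct FROM Instructor GROUP BY zip) i
ON i.zip = z.zip
GROUP BY z.city, z.state
HAVING COALESCE(SUM(s.ct), 0) + COALESCE(SUM(i.ct), 0) >= 10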
We group and count the students and the instructors in their own subqueries, producing a single count per zip, and left join these to all the zip codes. Grouping in a subquery ensures that there is only ever a 1:1 relationship between a zip code and s/i; if it were 1:many, the sums would become distorted. Because multiple zips can refer to one city, there is another round of grouping and summing to aggregate all the zips of one city.
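One caveat with the left-join version: if a city has no instructors, SUM(i.ct) is NULL, and NULL plus anything is NULL, so a plain SUM(s.ct) + SUM(i.ct) comes out NULL for that city. The sketch below (Python's sqlite3 module, hypothetical data in which Boston has no instructors) shows the raw sum next to a COALESCE-wrapped one.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE zipcode (zip TEXT PRIMARY KEY, city TEXT, state TEXT);
CREATE TABLE student (student_id INTEGER PRIMARY KEY, zip TEXT);
CREATE TABLE instructor (instructor_id INTEGER PRIMARY KEY, zip TEXT);
INSERT INTO zipcode VALUES
    ('10001', 'New York', 'NY'), ('10002', 'New York', 'NY'), ('02108', 'Boston', 'MA');
""")
# Hypothetical residents: New York has 3 students and 1 instructor, Boston 4 students only.
conn.executemany("INSERT INTO student (zip) VALUES (?)",
                 [("10001",)] * 2 + [("10002",)] + [("02108",)] * 4)
conn.executemany("INSERT INTO instructor (zip) VALUES (?)", [("10001",)])

# Per-zip counts are computed first, so each zip joins to at most one s row and one i row.
rows = conn.execute("""
SELECT z.city, z.state,
       COALESCE(SUM(s.ct), 0) AS count_students,
       COALESCE(SUM(i.ct), 0) AS count_instructors,
       COALESCE(SUM(s.ct), 0) + COALESCE(SUM(i.ct), 0) AS count_all,
       SUM(s.ct) + SUM(i.ct) AS raw_all
FROM zipcode z
LEFT OUTER JOIN (SELECT zip, COUNT(*) AS ct FROM student GROUP BY zip) s ON s.zip = z.zip
LEFT OUTER JOIN (SELECT zip, COUNT(*) AS ct FROM instructor GROUP BY zip) i ON i.zip = z.zip
GROUP BY z.city, z.state
ORDER BY z.city
""").fetchall()
print(rows)  # [('Boston', 'MA', 4, 0, 4, None), ('New York', 'NY', 3, 1, 4, 4)]
```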

Counting in three joined tables

I have three tables. The first one is PrivteOwner and has 5 columns (ownerno, fname, lname, address, telno), the second one is PropertyForRent that has 10 columns (propertyno, street, city, postcode, type, rooms, rent, ownerno, staffno, branchno) and the third one is Viewing with 4 columns (clientno, propertyno, viewdate, comment).
I want to find the owner who has the most properties without a viewing. My code is as below:
SELECT
CONCAT (A.fname, ' ', A.lname) AS OwnerName,
A.ownerno, B.propertyno, B.ownerno
FROM
PrivateOwner AS A
INNER JOIN
PropertyforRent AS B ON A.ownerno = B.ownerno
LEFT JOIN
viewing AS C
SELECT
ownerno, COUNT(ownerno), viewdate
FROM
Max_Property
GROUP BY ownerno
ORDER BY COUNT(ownerno) DESC
WHERE
ROWNUM = 1 and viewdate IS NULL;
Does this code work correctly? If so, how can we write it more efficiently?
It's very difficult to answer a question if you don't provide the table definitions, sample data, or your expected result.
Anyway, from your description I think this might get you the desired result (note that TOP 1 is SQL Server syntax, whereas your query used Oracle's ROWNUM):
SELECT
TOP 1
PrivteOwner.ownerno,
PrivteOwner.fname,
PrivteOwner.lname,
COUNT(ViewNumber) AS PropertyNumber
FROM
(
SELECT
PropertyForRent.propertyno AS propertyno
, COUNT(Viewing.propertyno) AS ViewNumber
FROM PropertyForRent
LEFT JOIN Viewing ON Viewing.propertyno = PropertyForRent.propertyno
GROUP BY PropertyForRent.propertyno
) AS NoView
JOIN PropertyForRent ON PropertyForRent.propertyno = NoView.propertyno
JOIN PrivteOwner ON PrivteOwner.ownerno = PropertyForRent.ownerno
WHERE ViewNumber = 0
GROUP BY PrivteOwner.ownerno,
PrivteOwner.fname,
PrivteOwner.lname
ORDER BY PropertyNumber DESC
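The same query can be tried in Python's sqlite3 module. SQLite has no TOP, so this sketch swaps TOP 1 for LIMIT 1 at the end; the sample owners and properties are hypothetical, and the table names are kept exactly as the answer spells them. If several owners tie for the most unviewed properties, TOP 1 or LIMIT 1 returns just one of them.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE PrivteOwner (ownerno TEXT, fname TEXT, lname TEXT, address TEXT, telno TEXT);
CREATE TABLE PropertyForRent (propertyno TEXT, street TEXT, city TEXT, postcode TEXT,
                              type TEXT, rooms INTEGER, rent INTEGER, ownerno TEXT,
                              staffno TEXT, branchno TEXT);
CREATE TABLE Viewing (clientno TEXT, propertyno TEXT, viewdate TEXT, comment TEXT);
-- Hypothetical data: CO1 owns two unviewed properties, CO2 owns one (P5).
INSERT INTO PrivteOwner (ownerno, fname, lname) VALUES ('CO1', 'Ann', 'Lee'), ('CO2', 'Bob', 'Ng');
INSERT INTO PropertyForRent (propertyno, ownerno) VALUES
    ('P1', 'CO1'), ('P2', 'CO1'), ('P3', 'CO2'), ('P4', 'CO2'), ('P5', 'CO2');
INSERT INTO Viewing (clientno, propertyno, viewdate) VALUES
    ('C1', 'P3', '2024-01-05'), ('C2', 'P3', '2024-01-09'), ('C3', 'P4', '2024-02-01');
""")

# Count viewings per property, keep those with none, then count them per owner.
rows = conn.execute("""
SELECT PrivteOwner.ownerno, PrivteOwner.fname, PrivteOwner.lname,
       COUNT(ViewNumber) AS PropertyNumber
FROM (
    SELECT PropertyForRent.propertyno AS propertyno,
           COUNT(Viewing.propertyno) AS ViewNumber
    FROM PropertyForRent
    LEFT JOIN Viewing ON Viewing.propertyno = PropertyForRent.propertyno
    GROUP BY PropertyForRent.propertyno
) AS NoView
JOIN PropertyForRent ON PropertyForRent.propertyno = NoView.propertyno
JOIN PrivteOwner ON PrivteOwner.ownerno = PropertyForRent.ownerno
WHERE ViewNumber = 0
GROUP BY PrivteOwner.ownerno, PrivteOwner.fname, PrivteOwner.lname
ORDER BY PropertyNumber DESC
LIMIT 1
""").fetchall()
print(rows)  # [('CO1', 'Ann', 'Lee', 2)]
```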

SQL database with duplicates

Say my table is a multiset (it contains duplicate rows). With the query below I get the duplicates:
select name, address from users group by
name, address having count(*) > 1
But my problem is this: say I have another field called credit. I want to compare the credits of the duplicate rows and take the second if its credit is higher than the first's (that is, the max).
select name, address, from users group by
name, address having count(*) > 1
Use:
select A.name, A.address, max ( A.credits ) mc from users A
where (A.name, A.address) in
(
select B.name, B.address from users B group by
B.name, B.address having count(*) > 1
)
group by A.name, A.address
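A small check with Python's sqlite3 module and hypothetical rows. The (name, address) IN (...) row-value comparison needs SQLite 3.15 or newer. Since the inner query and the outer GROUP BY use the same columns, grouping once with HAVING COUNT(*) > 1 gives the same result for non-NULL name and address values, which the sketch also shows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE users (name TEXT, address TEXT, credits INTEGER);
-- Hypothetical rows: Ann is duplicated with two credit values, Bob appears once.
INSERT INTO users VALUES
    ('Ann', '1 Main St', 10), ('Ann', '1 Main St', 25), ('Bob', '2 Oak Ave', 7);
""")

# The answer's query: restrict to duplicated (name, address) pairs, then take the max.
answer = conn.execute("""
SELECT A.name, A.address, MAX(A.credits) AS mc FROM users A
WHERE (A.name, A.address) IN (
    SELECT B.name, B.address FROM users B
    GROUP BY B.name, B.address HAVING COUNT(*) > 1)
GROUP BY A.name, A.address
""").fetchall()

# Single-pass equivalent: group once and filter the groups with HAVING.
simpler = conn.execute("""
SELECT name, address, MAX(credits) AS mc FROM users
GROUP BY name, address HAVING COUNT(*) > 1
""").fetchall()

print(answer)   # [('Ann', '1 Main St', 25)]
print(simpler)  # [('Ann', '1 Main St', 25)]
```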