Hive: get the count of males and females who opted for any course

I have two tables, student and training, shown below.
Student
ID name age sex salary
1213 lavanya 18 Female 8000
1208 reshma 19 Female 14000
1207 bhavya 20 Female 15000
1212 Arshad 28 Male 20000
1209 kranthi 22 Male 22000
1210 Satish 24 Male 25000
1211 Krishna 25 Male 26000
1203 khaleel 34 Male 30000
1204 prasant 30 Male 31000
1206 laxmi 25 Female 35000
1205 kiran 20 Male 40000
1201 gopal 45 Male 50000
1202 manisha 40 Female 51000
Training
1 1201 csharp
2 1205 c
3 1201 c
4 1202 java
5 1205 java
6 1203 shell
7 1204 hadoop
8 1201 hadoop
Now I want the count of males and females who have joined any course.
I tried the query below:
hive> select s.sex, count(*) from student s join training t on s.id=t.sid group by s.sex;
But this query gives Female 2, Male 4,
although the expected outcome is Female 1, Male 2.
Please note that this is a shortened sample of the data being used.

This looks like your query, but it returns the result you expected (1 female, 2 male). If possible, post your own SQL*Plus copy/paste session (like my example) so that we can see exactly what you did.
SQL> with student (id, name, sex) as
  (select 1, 'alex', 'm' from dual union
   select 2, 'rita', 'f' from dual union
   select 3, 'max', 'm' from dual union
   select 4, 'steve', 'm' from dual
  ),
training (id, sid, course) as
  (select 1, 2, 'java' from dual union
   select 2, 3, 'c' from dual union
   select 3, 1, 'java' from dual
  )
select s.sex, count(*)
  from student s join training t on t.sid = s.id
 group by s.sex;
S   COUNT(*)
- ----------
m          2
f          1

I tried this in MySQL and in Oracle, and the query works:
SELECT S.sex, count(*)
FROM student S
INNER JOIN training T ON S.id = T.sid
GROUP BY S.sex;
Result: female = 1, male = 2

If the only thing you want is a simple count by gender, why not use:
select sex, count(*)
from student
group by sex
order by sex

Use EXISTS:
select s.sex, count(*)
from student s
where exists (select 1 from training t where t.sid = s.id)
group by s.sex;
The problem with the join is that it counts each student once per training they are enrolled in.
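To make the duplication visible, here is a sketch that loads the question's sample rows into scratch tables and puts the plain join count next to COUNT(DISTINCT s.id). It assumes a database that accepts multi-row INSERT ... VALUES (for example SQLite or PostgreSQL) rather than Hive itself; the training columns other than sid are assumed names, and only the student columns needed here are kept.

```sql
-- Scratch tables mirroring the question's sample data (column names partly assumed).
CREATE TABLE student (id INTEGER, name VARCHAR(20), sex VARCHAR(6));
INSERT INTO student VALUES
  (1213, 'lavanya', 'Female'), (1208, 'reshma', 'Female'),
  (1207, 'bhavya', 'Female'),  (1212, 'Arshad', 'Male'),
  (1209, 'kranthi', 'Male'),   (1210, 'Satish', 'Male'),
  (1211, 'Krishna', 'Male'),   (1203, 'khaleel', 'Male'),
  (1204, 'prasant', 'Male'),   (1206, 'laxmi', 'Female'),
  (1205, 'kiran', 'Male'),     (1201, 'gopal', 'Male'),
  (1202, 'manisha', 'Female');

CREATE TABLE training (id INTEGER, sid INTEGER, course VARCHAR(20));
INSERT INTO training VALUES
  (1, 1201, 'csharp'), (2, 1205, 'c'),     (3, 1201, 'c'),
  (4, 1202, 'java'),   (5, 1205, 'java'),  (6, 1203, 'shell'),
  (7, 1204, 'hadoop'), (8, 1201, 'hadoop');

-- COUNT(*) counts one row per (student, course) pair produced by the join;
-- COUNT(DISTINCT s.id) counts each enrolled student once.
SELECT s.sex,
       COUNT(*)             AS join_rows,
       COUNT(DISTINCT s.id) AS students_enrolled
FROM student s
JOIN training t ON t.sid = s.id
GROUP BY s.sex
ORDER BY s.sex;
```

For this sample, the join yields 7 rows for males but only 4 distinct male students (1201, 1203, 1204, 1205); the single enrolled female (1202) is counted once either way. Hive also accepts COUNT(DISTINCT ...), so the last query should carry over with little change.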

Here is a query I wrote using your data:
SELECT final.ct_sex AS sex, COUNT(*) AS num
FROM (SELECT tb.sex AS ct_sex
      FROM newschema.mytable AS tb
      JOIN (SELECT tr.ID, GROUP_CONCAT(tr.skill) AS skills
            FROM newschema.train AS tr
            GROUP BY tr.ID) AS tp
        ON tb.ID = tp.ID) AS final
GROUP BY final.ct_sex

Not sure why the join fails here, but the subquery below gives the correct output:
select sex, count(*) from student where student.id in (select sid from training) group by student.sex;

Related

Show mapping based on student access

There is a table with the following data:
student subject code
student1 maths 312
student1 physics 785
student2 english 900
student3 geography 317
I am trying to restrict each student's access in the table so that they only see data for their own subjects. There is one exception: student2 must also see the maths data, so both student1 and student2 would be able to see it. This mapping has to be done without altering the master data, so student2 should be mapped to both english and maths only when the table is displayed.
Thanks for the help here!
One option is, as you said, to use the UNION set operator at query time. Something like this:
SQL> WITH
  test (student, subject, code)
  AS
  (SELECT 'student1', 'maths', 312 FROM DUAL
   UNION ALL
   SELECT 'student1', 'physics', 785 FROM DUAL
   UNION ALL
   SELECT 'student2', 'english', 900 FROM DUAL
   UNION ALL
   SELECT 'student3', 'geography', 317 FROM DUAL)
SELECT *
  FROM test
 WHERE student = '&&par_student'
-- add this to your query
UNION
SELECT 'student2', 'maths', NULL
  FROM DUAL
 WHERE '&&par_student' = 'student2';
Enter value for par_student: student1 --> student1 is OK, it has two subjects
STUDENT  SUBJECT         CODE
-------- --------- ----------
student1 maths            312
student1 physics          785
SQL> undefine par_student
SQL> /
Enter value for par_student: student2 --> for student2, UNION is used
STUDENT  SUBJECT         CODE
-------- --------- ----------
student2 english          900
student2 maths
SQL> undefine par_student
SQL> /
Enter value for par_student: student3 --> nothing new for student3
STUDENT  SUBJECT         CODE
-------- --------- ----------
student3 geography        317
SQL>
Depending on the tool you use, the parameter might look like this (e.g. in TOAD):
WHERE student = :par_student
or however parameters are written in your tool.
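Where the tool has no substitution or bind variables, the same UNION idea can be tried by feeding the student name through a one-row CTE. This is a sketch in portable SQL (SQLite/PostgreSQL style, no DUAL); the `params` CTE and its `par_student` column are hypothetical stand-ins for whatever parameter mechanism your tool provides.

```sql
CREATE TABLE test (student VARCHAR(20), subject VARCHAR(20), code INTEGER);
INSERT INTO test VALUES
  ('student1', 'maths', 312),
  ('student1', 'physics', 785),
  ('student2', 'english', 900),
  ('student3', 'geography', 317);

-- Hypothetical one-row parameter; replace with your tool's bind variable.
WITH params (par_student) AS (SELECT 'student2')
SELECT t.student, t.subject, t.code
FROM test t
JOIN params p ON t.student = p.par_student
UNION
-- Extra display-time mapping: student2 also sees maths (code left NULL).
SELECT 'student2', 'maths', NULL
FROM params p
WHERE p.par_student = 'student2'
ORDER BY 2;
```

With `par_student` set to student2 this should return the english row plus the mapped maths row; for student1 or student3 the second branch returns nothing, so only their own rows appear.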
Something like:
SELECT student, subject, code
FROM (
SELECT t.*,
COUNT(
CASE
WHEN student = :your_student
OR (:your_student, subject) IN (('student2', 'maths'))
THEN 1
END
) OVER (PARTITION BY subject) AS has_access
FROM table_name t
)
WHERE has_access > 0
Then, for the sample data:
CREATE TABLE table_name (student, subject, code) AS
SELECT 'student1', 'maths', 312 FROM DUAL UNION ALL
SELECT 'student1', 'physics', 785 FROM DUAL UNION ALL
SELECT 'student2', 'english', 900 FROM DUAL UNION ALL
SELECT 'student3', 'geography', 317 FROM DUAL;
If :your_student is student1 then the output is:
STUDENT  SUBJECT         CODE
-------- --------- ----------
student1 maths            312
student1 physics          785
and if :your_student is student2 then the output is:
STUDENT  SUBJECT         CODE
-------- --------- ----------
student2 english          900
student1 maths            312

Select query to return no records

I have Table 1, EMP, as below:
ID NAME CITY AMT
-------------------------------------------
1 sajani Bangalore 20
2 Prashanth Bangalore 10
3 Jayvin Bangalore 10
Table 2: EMP1
ID NAME1 CITY1 AMT1
---------------------------------------------
1 Sajani Bangalore 10
1 Sajani Bangalore 10
2 Prashanth Bangalore 10
3 Jayvin Bangalore 10
ID is the key and is common to both tables. I want a SELECT statement that checks whether table 1 equals table 2, so that when the query is executed it returns 0 records.
Use the MINUS operator:
SELECT ID, NAME, CITY, AMT
FROM EMP
MINUS
SELECT ID, NAME1, CITY1, SUM(AMT1)
FROM EMP1
GROUP BY ID, NAME1, CITY1
Thanks, I figured out the solution:
Select * from emp
where amt <>(
select sum(e1.amt1)
from emp1 e1
where e1.id =emp.id
)
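MINUS is Oracle's name for the standard EXCEPT operator. Below is a sketch in portable SQL (EXCEPT works in SQLite and PostgreSQL) that sums EMP1 per ID first and then compares only ID and amount. Comparing only these two columns is an assumption: the sample's names differ in case ('sajani' vs 'Sajani'), and a case-sensitive comparison of the name columns would make the difference non-empty.

```sql
CREATE TABLE emp (id INTEGER, name VARCHAR(20), city VARCHAR(20), amt INTEGER);
INSERT INTO emp VALUES
  (1, 'sajani', 'Bangalore', 20),
  (2, 'Prashanth', 'Bangalore', 10),
  (3, 'Jayvin', 'Bangalore', 10);

CREATE TABLE emp1 (id INTEGER, name1 VARCHAR(20), city1 VARCHAR(20), amt1 INTEGER);
INSERT INTO emp1 VALUES
  (1, 'Sajani', 'Bangalore', 10),
  (1, 'Sajani', 'Bangalore', 10),
  (2, 'Prashanth', 'Bangalore', 10),
  (3, 'Jayvin', 'Bangalore', 10);

-- Rows of EMP whose amount differs from the per-ID total in EMP1;
-- an empty result means the two tables agree.
SELECT id, amt FROM emp
EXCEPT
SELECT id, SUM(amt1) FROM emp1 GROUP BY id;
```

One difference from the correlated `amt <> (select sum(...))` version: an ID that exists in EMP but has no EMP1 rows makes that subquery return NULL, and `amt <> NULL` is never true, so the row is silently skipped; the EXCEPT form still reports it.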

Joining 2 tables in SQL that have no dependency on each other

I have two tables, as follows:
Table 1:
e_id e_name e_salary e_age e_gender e_dept
---------------------------------------------------
1 sam 95000 45 male operations
2 bob 80000 21 male support
3 ann 125000 25 female analyst
Table 2:
d_salary d_age d_gender e_dept
----------------------------------
34000 25 male Admin
56000 41 female Tech
77000 35 female HR
I want output like this:
e_id e_name e_salary e_age e_gender e_dept d_salary d_age d_gender e_dept
1 sam 95000 45 male operations 34000 25 male Admin
2 bob 80000 21 male support 56000 41 female Tech
3 ann 125000 25 female analyst 77000 35 female HR
There is no dependency between the tables. No common columns. No primary or foreign key.
I tried a cross join, but it returns duplicate rows because it produces M x N rows.
I am new to this SQL thing. Can someone help me, please? Thanks in advance
I don't see the reason behind your desired output, but you can get it with the query below:
select a.e_id, a.e_name, a.e_salary, a.e_age, a.e_gender, a.e_dept, b.d_salary, b.d_age, b.d_gender, b.e_dept
from
(select e_id, e_name, e_salary, e_age, e_gender, e_dept, row_number() over (order by e_id) rn
 from table1) a
inner join
(select d_salary, d_age, d_gender, e_dept, row_number() over (order by d_salary) rn
 from table2) b
on a.rn = b.rn
In general, you can number the rows of both tables with the row_number() window function and use that number as the join criterion. But this requires a defined order for both tables, which means you have to tell the query explicitly why the Admin record comes first and must be joined to the first record of table 1:
SELECT
*
FROM (
SELECT
*,
row_number() OVER (ORDER BY e_id) as row_count -- assuming e_id is your order criterion
FROM table1
) t1
JOIN (
SELECT
*,
row_number() OVER (ORDER BY /*whatever you expect to be ordered*/) as row_count
FROM table2
) t2
ON t1.row_count = t2.row_count
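A runnable sketch of the row-numbering join with the question's data, in portable SQL (window functions need SQLite 3.25+ or PostgreSQL). Ordering table2 by d_salary is an assumption that happens to reproduce the desired pairing; the alias d_dept is hypothetical and only avoids two output columns named e_dept.

```sql
CREATE TABLE table1 (e_id INTEGER, e_name VARCHAR(10), e_salary INTEGER,
                     e_age INTEGER, e_gender VARCHAR(6), e_dept VARCHAR(20));
INSERT INTO table1 VALUES
  (1, 'sam', 95000, 45, 'male', 'operations'),
  (2, 'bob', 80000, 21, 'male', 'support'),
  (3, 'ann', 125000, 25, 'female', 'analyst');

CREATE TABLE table2 (d_salary INTEGER, d_age INTEGER, d_gender VARCHAR(6), e_dept VARCHAR(20));
INSERT INTO table2 VALUES
  (34000, 25, 'male', 'Admin'),
  (56000, 41, 'female', 'Tech'),
  (77000, 35, 'female', 'HR');

-- Number each table's rows by its chosen order, then pair row n with row n.
SELECT t1.e_id, t1.e_name, t1.e_salary, t1.e_age, t1.e_gender, t1.e_dept,
       t2.d_salary, t2.d_age, t2.d_gender, t2.e_dept AS d_dept
FROM (SELECT t.*, ROW_NUMBER() OVER (ORDER BY e_id) AS rn FROM table1 t) t1
JOIN (SELECT t.*, ROW_NUMBER() OVER (ORDER BY d_salary) AS rn FROM table2 t) t2
  ON t1.rn = t2.rn
ORDER BY t1.rn;
```

If the tables can have different row counts, an inner join drops the surplus rows; a FULL OUTER JOIN on rn would keep them with NULLs on the shorter side.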

How to get a row of unique values from prior row followed by nulls till the next value?

I have a query with multiple joins and fields. One column (StateID) has a lot of duplicates. I need to show only the distinct values of this column while keeping the number of rows the same, because of the other joins.
I have tried GROUP BY and DISTINCT, but they eliminate other critical information in the query. I need the number of rows to stay the same.
Example (pseudocode):
SELECT
Name
,StateID
,Age
,Toy
,ManufactureName
From
peopleTable as people
LEFT JOIN toyTable on people.id = toytable.id
LEFT JOIN ManufactureTable on toyTable.toyId=ManufactureTable.ManId
WHERE
toytable.id >1000
Output:
Name StateID Age Toy Manufacture
Carlo 1 10 Woody Disney
Sid 1 10 Buzz Disney
Abby 1 10 Car RaceMan
Bobby 4 10 Doll Barbie
Sally 6 10 Book Barns&
Jim 6 10 Woody Disney
Expected output:
Name StateID Age Toy Manufacture NewField
Carlo 1 10 Woody Disney 1
Sid 1 10 Buzz Disney NULL
Abby 1 10 Car RaceMan NULL
Bobby 4 10 Doll Barbie 4
Sally 6 10 Book Barns& 6
Jim 6 10 Woody Disney NULL
Would something like this help?
Using the ROW_NUMBER analytic function, find the first row in each group of rows that share the same stateid. Note that I used order by null because I don't know which row is the first (the rows aren't ordered by name, age, toy or manufacture). If you don't care, leave it as is; if you know how to sort them, use that column.
SQL> with test (name, stateid, age, toy, manufacture) as
  (select 'Carlo', 1, 10, 'Woody', 'Disney' from dual union all
   select 'Sid'  , 1, 10, 'Buzz' , 'Disney' from dual union all
   select 'Abby' , 1, 10, 'Car'  , 'RaceMan' from dual union all
   select 'Bobby', 4, 10, 'Doll' , 'Barbie' from dual union all
   select 'Sally', 6, 10, 'Book' , 'Barns&' from dual union all
   select 'Jim'  , 6, 10, 'Woody', 'Disney' from dual
  )
select name, stateid, age, toy, manufacture,
       case when row_number() over (partition by stateid order by null) = 1 then stateid
            else null
       end new_field
from test;
NAME     STATEID        AGE TOY   MANUFAC  NEW_FIELD
----- ---------- ---------- ----- ------- ----------
Carlo          1         10 Woody Disney           1
Sid            1         10 Buzz  Disney
Abby           1         10 Car   RaceMan
Bobby          4         10 Doll  Barbie           4
Sally          6         10 Book  Barns&           6
Jim            6         10 Woody Disney
6 rows selected.
SQL>
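Because order by null leaves the "first" row up to the database, the result can change between runs. If the rows do have a display order, the title's "prior row followed by nulls" idea can be sketched with LAG instead. This assumes a hypothetical seq column holding that order (the question's tables have no such column) and uses portable SQL (SQLite 3.25+ or PostgreSQL).

```sql
-- Hypothetical table; seq stands in for whatever column defines row order.
CREATE TABLE toys (seq INTEGER, name VARCHAR(10), stateid INTEGER, age INTEGER,
                   toy VARCHAR(10), manufacture VARCHAR(10));
INSERT INTO toys VALUES
  (1, 'Carlo', 1, 10, 'Woody', 'Disney'),
  (2, 'Sid',   1, 10, 'Buzz',  'Disney'),
  (3, 'Abby',  1, 10, 'Car',   'RaceMan'),
  (4, 'Bobby', 4, 10, 'Doll',  'Barbie'),
  (5, 'Sally', 6, 10, 'Book',  'Barns&'),
  (6, 'Jim',   6, 10, 'Woody', 'Disney');

-- Show stateid only when it differs from the previous row's stateid.
-- For the first row LAG returns NULL, the comparison is unknown, so ELSE applies.
SELECT name, stateid, age, toy, manufacture,
       CASE WHEN LAG(stateid) OVER (ORDER BY seq) = stateid THEN NULL
            ELSE stateid
       END AS new_field
FROM toys
ORDER BY seq;
```

Unlike ROW_NUMBER partitioned by stateid, LAG compares each row only with its predecessor, so a stateid that reappears later after a different value is shown again.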

Select rows where every child row meets a condition

In my Oracle DB, I have two tables in a one-to-many relationship: Managers and Employees.
+------------+-------+------------+
| Manager_ID | Name | Department |
+------------+-------+------------+
| 1 | Steve | Sales |
| 2 | Ben | Sales |
| 3 | Molly | Accounts |
+------------+-------+------------+
+-------------+------------+--------+-----+
| Employee_ID | Manager_ID | Name | Age |
+-------------+------------+--------+-----+
| 1 | 1 | Kyle | 25 |
| 2 | 1 | Gary | 31 |
| 3 | 2 | Renee | 31 |
| 4 | 2 | Oliver | 32 |
+-------------+------------+--------+-----+
How do I select only those Managers where every one of his Employees is over the age of 30?
In my example data, the only Manager who meets this condition is Ben, because both of his employees are over 30.
I thought something like this would do it, but it's wrong:
SELECT m.manager_id
FROM managers m
WHERE m.manager_id IN (SELECT e.manager_id
FROM employees e
GROUP BY e.manager_id
HAVING e.age > 30)
Use NOT EXISTS:
select m.*
from managers m
where not exists (select 1
                  from Employees e
                  where e.Manager_ID = m.Manager_ID and e.Age <= 30
                 ) and
      exists (select 1 from Employees e where e.Manager_ID = m.Manager_ID)
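A runnable check of the NOT EXISTS query against the question's sample data, in portable SQL (scratch tables named after the question's Managers and Employees). The second EXISTS is what keeps Molly, who has no employees, out of the result.

```sql
CREATE TABLE managers (manager_id INTEGER, name VARCHAR(10), department VARCHAR(10));
INSERT INTO managers VALUES (1, 'Steve', 'Sales'), (2, 'Ben', 'Sales'), (3, 'Molly', 'Accounts');

CREATE TABLE employees (employee_id INTEGER, manager_id INTEGER, name VARCHAR(10), age INTEGER);
INSERT INTO employees VALUES
  (1, 1, 'Kyle', 25), (2, 1, 'Gary', 31), (3, 2, 'Renee', 31), (4, 2, 'Oliver', 32);

-- No employee aged 30 or under, and at least one employee.
SELECT m.*
FROM managers m
WHERE NOT EXISTS (SELECT 1 FROM employees e
                  WHERE e.manager_id = m.manager_id AND e.age <= 30)
  AND EXISTS (SELECT 1 FROM employees e
              WHERE e.manager_id = m.manager_id);
```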
The only thing I don't like about Yogesh's answer (which I upvoted, since it's probably the way I'd write it) is that you have to go to the employees table a second time, to make sure the manager actually has at least one employee.
On the plus side, the NOT EXISTS that Yogesh used will allow Oracle to stop looking at a manager's employees once it finds one that is too young. So, maybe it's a toss-up.
I'll offer this alternative. It is shorter than the NOT EXISTS and does not have to go to the employees table a second time.
-- CROSS APPLY requires Oracle 12c or later
SELECT m.*
FROM managers m
CROSS APPLY (
    SELECT min(age) min_age
    FROM employees e
    WHERE e.manager_id = m.manager_id ) ma
WHERE ma.min_age > 30;
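The same MIN(age) idea also repairs the query from the question. HAVING e.age > 30 is typically rejected (in Oracle, for example) because e.age is neither grouped nor aggregated; testing the smallest age per manager expresses "every employee is over 30". A sketch in portable SQL with the sample data:

```sql
CREATE TABLE managers (manager_id INTEGER, name VARCHAR(10), department VARCHAR(10));
INSERT INTO managers VALUES (1, 'Steve', 'Sales'), (2, 'Ben', 'Sales'), (3, 'Molly', 'Accounts');

CREATE TABLE employees (employee_id INTEGER, manager_id INTEGER, name VARCHAR(10), age INTEGER);
INSERT INTO employees VALUES
  (1, 1, 'Kyle', 25), (2, 1, 'Gary', 31), (3, 2, 'Renee', 31), (4, 2, 'Oliver', 32);

-- The question's query with the HAVING clause aggregated:
-- if the youngest employee is over 30, every employee is.
SELECT m.manager_id
FROM managers m
WHERE m.manager_id IN (SELECT e.manager_id
                       FROM employees e
                       GROUP BY e.manager_id
                       HAVING MIN(e.age) > 30);
```

Managers without employees never appear in the grouped subquery, so Molly is excluded without a second lookup.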
Using a subquery for counts:
SQL> WITH manager(Manager_ID, Name, Department) AS (
  SELECT 1, 'Steve', 'Sales' FROM dual UNION ALL
  SELECT 2, 'Ben', 'Sales' FROM dual UNION ALL
  SELECT 3, 'Molly', 'Accounts' FROM dual),
employee(Employee_ID, Manager_ID, Name, Age) AS (
  SELECT 1, 1, 'Kyle', 25 FROM dual UNION ALL
  SELECT 2, 1, 'Gary', 31 FROM dual UNION ALL
  SELECT 3, 2, 'Renee', 31 FROM dual UNION ALL
  SELECT 4, 2, 'Oliver', 32 FROM dual)
---------------------------
--- End of data preparation
---------------------------
SELECT m.name
  FROM manager m
  JOIN (SELECT manager_id,
               COUNT(1) total,
               COUNT(CASE WHEN age > 30 THEN 1 ELSE NULL END) age_30_above
          FROM employee
         GROUP BY manager_id) ee
    ON m.manager_id = ee.manager_id
 WHERE total = age_30_above;
Output
NAME
-----
Ben
With your table names, the query would be:
SELECT m.name
FROM managers m
JOIN (SELECT manager_id,
             COUNT(1) total,
             COUNT(CASE WHEN age > 30 THEN 1 ELSE NULL END) age_30_above
      FROM employees
      GROUP BY manager_id) ee
ON m.manager_id = ee.manager_id
WHERE total = age_30_above;
SELECT manager_id
FROM employees -- managers
minus
select manager_id
from employees
where age <= 30
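You can use the ALL comparison like this:
SELECT m.manager_id
FROM managers m
WHERE (30 < ALL (SELECT e.age FROM employees e WHERE e.manager_id = m.manager_id));
Note that ALL is true when the subquery returns no rows, so managers without any employees (Molly in the sample) are returned as well.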
You might want to reverse the condition: select all managers who don't have any employee aged 30 or under.
select * from managers
where manager_id not in (select manager_id
                         from employees
                         where age <= 30)
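One thing worth checking with the reversed condition: NOT IN also keeps managers who have no employees at all, because the subquery never lists them. With the sample data it returns both Ben and Molly. A sketch in portable SQL that adds an IN guard so only managers with at least one employee remain:

```sql
CREATE TABLE managers (manager_id INTEGER, name VARCHAR(10), department VARCHAR(10));
INSERT INTO managers VALUES (1, 'Steve', 'Sales'), (2, 'Ben', 'Sales'), (3, 'Molly', 'Accounts');

CREATE TABLE employees (employee_id INTEGER, manager_id INTEGER, name VARCHAR(10), age INTEGER);
INSERT INTO employees VALUES
  (1, 1, 'Kyle', 25), (2, 1, 'Gary', 31), (3, 2, 'Renee', 31), (4, 2, 'Oliver', 32);

SELECT *
FROM managers
WHERE manager_id NOT IN (SELECT manager_id FROM employees WHERE age <= 30)
  -- guard: without this line, Molly (no employees) is returned too
  AND manager_id IN (SELECT manager_id FROM employees);
```

NOT IN has a second known pitfall: if the subquery ever yields a NULL manager_id, the NOT IN test is never true and no rows come back, which is one reason NOT EXISTS is often preferred.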