Finding out Percentage Value using Hive - sql

I have some tables as:
Table_1:
+------------+--------------+
| Student_ID | Student_Name |
+------------+--------------+
| 000 | Jack |
| 001 | Ron |
| 002 | Nick |
+------------+--------------+
Table_2:
+-----+-------+-------+
| ID | Total | Score |
+-----+-------+-------+
| 000 | 100 | 80 |
| 001 | 100 | 80 |
| 002 | 100 | 80 |
+-----+-------+-------+
Table_3:
+-----+-------+-------+
| ID | Total | Score |
+-----+-------+-------+
| 000 | 100 | 60 |
| 001 | 100 | 80 |
| 002 | 100 | 70 |
+-----+-------+-------+
Expected_Output:
ID percent
000 70
001 80
002 75
I have created a Hive table before. Now I want to write a single HiveQL query that produces the expected output from the three tables above.
My plan for the query is to:
use the Left outer join using ID
find the sum of "Total" and "Score" for each ID
divide sum of "Score" by sum of "Total" to get percentage.
I came up with this:
INSERT OVERWRITE TABLE expected_output
SELECT t1.Student_ID AS ID, (100*t4.SUM1/t4.SUM2) AS percent
FROM Table_1 t1
LEFT OUTER JOIN(
SELECT (ISNULL(Total,0) + ISNULL(Total,0)) AS ‘SUM2’, (ISNULL(Score,0) + ISNULL(Score,0)) AS ‘SUM1’
FROM t4
)ON (t1.Student_ID=t2.ID) JOIN Table_3 t3 ON (t3.ID=t2.ID);
I am stuck at this point and not sure how to reach the result.
Any ideas, please?

This is a simple join. Assuming you have exactly one row per ID in each of Table_2 and Table_3, you can do:
SELECT t2.ID AS ID, 100.0*(t2.Score+t3.Score)/(t2.Total+t3.Total) AS percent
FROM Table_2 t2
JOIN Table_3 t3 ON t3.ID=t2.ID
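To check the arithmetic against the sample data, here is a minimal sketch that runs the same join in an in-memory SQLite database through Python's sqlite3 module. SQLite stands in for Hive here, which is an assumption made only so the example is runnable; the table and column names come from the question.

```python
import sqlite3

# In-memory SQLite database standing in for Hive (an assumption for this sketch).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Table_2 (ID TEXT, Total INTEGER, Score INTEGER);
    CREATE TABLE Table_3 (ID TEXT, Total INTEGER, Score INTEGER);
    INSERT INTO Table_2 VALUES ('000', 100, 80), ('001', 100, 80), ('002', 100, 80);
    INSERT INTO Table_3 VALUES ('000', 100, 60), ('001', 100, 80), ('002', 100, 70);
""")

# Sum the scores and totals across both tables, then convert to a percentage.
# Multiplying by 100.0 forces floating-point division.
percent_rows = conn.execute("""
    SELECT t2.ID, 100.0 * (t2.Score + t3.Score) / (t2.Total + t3.Total) AS percent
    FROM Table_2 t2
    JOIN Table_3 t3 ON t3.ID = t2.ID
    ORDER BY t2.ID
""").fetchall()

for student_id, percent in percent_rows:
    print(student_id, percent)
```

With the sample rows this gives 70, 80 and 75 for IDs 000, 001 and 002, matching the expected output.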

Trying to join a table of individuals to a table of couples, give a family ID and not time out the server

I have one table with fake individual tax records like so (one row per filer):
T1:
+-------+---------+---------+
| Person| Spouse | Income |
+-------+---------+---------+
| 1 | 2 | 34000 |
| 2 | 1 | 10000 |
| 3 | NULL | 97000 |
| 4 | 6 | 11000 |
| 5 | NULL | 25000 |
| 6 | 4 | 100000 |
+-------+---------+---------+
I have a second table which has tax 'families', a single individual or married couple (one line per tax 'family').
T1_Family:
+-----------+--------+--------+
| Family_id | Person | Spouse |
+-----------+--------+--------+
| 2 | 2 | 1 |
| 3 | 3 | NULL |
| 5 | 5 | NULL |
| 6 | 6 | 4 |
+-----------+--------+--------+
Family = max(Person) within a couple
The idea of joining the two is, for example, to sum the income of the two people in one tax family (aggregating to the family level).
So, I've tried the following:
select *
into family_table
from
(
(select * from T1_family)a
join
(select * from T1)b
on a.family = b.person **or a.spouse = b.person**
)
where family_id is not null and person is not null
What I should get (and I do get when I select 1 random couple) is one line per individual where I can then group by family_id and sum income, pension contributions, etc. BUT SQL times out before the tables can be joined. The part in bold is what's slowing down the process but I'm not sure what else to do.
Is there an easier way to group by family?
It is simpler to put the data on one row:
select a.*, p.income as person_income, s.income as spouse_income
into family_table
from t1_family a
left join t1 p
    on a.person = p.person
left join t1 s
    on a.spouse = s.person;
Of course, you can add them together as well.
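As a rough sketch of adding the two incomes together, the following runs the two left joins against the sample data in an in-memory SQLite database (SQLite is an assumption here; the question does not name the engine beyond `select ... into`). COALESCE is used so that a missing spouse contributes 0 instead of turning the sum into NULL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE T1 (Person INTEGER, Spouse INTEGER, Income INTEGER);
    CREATE TABLE T1_Family (Family_id INTEGER, Person INTEGER, Spouse INTEGER);
    INSERT INTO T1 VALUES
        (1, 2, 34000), (2, 1, 10000), (3, NULL, 97000),
        (4, 6, 11000), (5, NULL, 25000), (6, 4, 100000);
    INSERT INTO T1_Family VALUES (2, 2, 1), (3, 3, NULL), (5, 5, NULL), (6, 6, 4);
""")

# Join T1 twice: once for the person and once for the spouse.
# Each join condition is a plain equality, so no OR is needed.
family_income = conn.execute("""
    SELECT a.Family_id,
           COALESCE(p.Income, 0) + COALESCE(s.Income, 0) AS family_income
    FROM T1_Family a
    LEFT JOIN T1 p ON a.Person = p.Person
    LEFT JOIN T1 s ON a.Spouse = s.Person
    ORDER BY a.Family_id
""").fetchall()

for family_id, income in family_income:
    print(family_id, income)
```

Because each join uses a single equality instead of an OR, the engine can match rows with an ordinary equi-join, which is usually what avoids the timeout.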

How to join a number with the range of number from another table

I have the two following tables:
| ID | Count |
| --- | ----- |
| 1 | 45 |
| 2 | 5 |
| 3 | 120 |
| 4 | 87 |
| 5 | 60 |
| 6 | 200 |
| 7 | 31 |
| SizeName | LowerLimit | UpperLimit |
| -------- | ---------- | ---------- |
| Small | 0 | 49 |
| Medium | 50 | 99 |
| Large | 100 | 250 |
Basically, one table specifies an unknown number of range names and their associated integer ranges. So a count range of 0 to 49 from the person table gets a small designation. 50-99 gets 'medium' etc. I need it to be dynamic because I do not know the range names or integer values.
Can I do this in a single query or would I have to write a separate function to loop through the possibilities?
One way to do this is to join the tables on the range. Use a LEFT JOIN if you want to keep counts that fall outside every named range, or an INNER JOIN if you want to drop them.
SELECT A.id, A.Count, B.SizeName
FROM tableA A
LEFT JOIN tableB B ON A.Count >= B.LowerLimit AND A.Count <= B.UpperLimit
You can also use the BETWEEN operator, which is inclusive at both ends, in a JOIN like this:
SELECT a.id, a.Count, b.SizeName
FROM tableA a
JOIN tableB b ON a.Count BETWEEN b.LowerLimit AND b.UpperLimit
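Here is a small sketch of the range join on the sample data, again using an in-memory SQLite database as a stand-in for whatever engine you are on. The names tableA and tableB follow the answer, and the row with id 8 and a count of 300 is hypothetical, added only to show what the LEFT JOIN returns for a value outside every range.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE tableA (id INTEGER, "Count" INTEGER);
    CREATE TABLE tableB (SizeName TEXT, LowerLimit INTEGER, UpperLimit INTEGER);
    INSERT INTO tableA VALUES
        (1, 45), (2, 5), (3, 120), (4, 87), (5, 60), (6, 200), (7, 31),
        (8, 300);  -- hypothetical row that falls outside every range
    INSERT INTO tableB VALUES
        ('Small', 0, 49), ('Medium', 50, 99), ('Large', 100, 250);
""")

# The join compares the count, not the id, against the range limits.
# BETWEEN includes both limits, so 49 is Small and 50 is Medium.
sized_rows = conn.execute("""
    SELECT a.id, a."Count", b.SizeName
    FROM tableA a
    LEFT JOIN tableB b ON a."Count" BETWEEN b.LowerLimit AND b.UpperLimit
    ORDER BY a.id
""").fetchall()

for row in sized_rows:
    print(row)
```

The ranges live entirely in tableB, so adding or renaming a size only requires a new row there, not a change to the query.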

Sql view from multiple tables

I am stuck on a query, please help.
I want to create a view.
Table1
ID | Acode | Bcode | Ccode |
1 | 10 | 101 | 102 |
2 | 11 | 100 | 101 |
3 | 10 | 100 | 102 |
Table2
Acode | Adescription |
10 | English |
11 | Math |
Table3
Bcode | Bdescription |
100 | Grade A |
101 | Grade B |
Table4
Ccode | Cdescription |
100 | Level A |
101 | Level B |
102 | Level C |
I want to print all rows in Table1 with description from other tables based on code in table1.
Output should be:
data
NewView
ID | Acode |Adescription | Bcode | Bdescription | Ccode | Cdescription |
1 | 10 | English | 101 | Grade B | 102 | Level C |
2 | 11 | Math | 100 | Grade A | 101 | Level B |
3 | 10 | English | 100 | Grade A | 102 | Level C |
I created a left join, but it returns more rows than there are in Table1. I want only the records from Table1, with the descriptions from the other tables.
Please help
Below is an example. Since you didn't post your original query attempt, we can't really say why you were getting multiple rows. No need for a LEFT JOIN unless you are missing codes in the joined tables.
SELECT Table1.ID
, Table1.Acode
, Table2.Adescription
, Table1.Bcode
, Table3.Bdescription
, Table1.Ccode
, Table4.Cdescription
FROM dbo.Table1
JOIN dbo.Table2 ON Table2.Acode = Table1.Acode
JOIN dbo.Table3 ON Table3.Bcode = Table1.Bcode
JOIN dbo.Table4 ON Table4.Ccode = Table1.Ccode;
Thanks for the help.
The LEFT JOIN worked well. I narrowed down the tables one by one and found the one that was producing duplicate records. I had forgotten to add a unique key to that table, and one record (a description) had been entered twice, which duplicated rows and increased the total row count.
Thanks to everyone for helping me out, and to Dan Guzman for pointing me to the duplicate codes.
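For anyone hitting the same symptom, here is a minimal sketch of how a duplicated code in a lookup table inflates the join, and how to find it. It uses an in-memory SQLite database instead of SQL Server, and the duplicated 'English' row is an assumption made to reproduce the problem, since the thread does not say which table held the duplicate.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Table1 (ID INTEGER, Acode INTEGER, Bcode INTEGER, Ccode INTEGER);
    CREATE TABLE Table2 (Acode INTEGER, Adescription TEXT);
    INSERT INTO Table1 VALUES (1, 10, 101, 102), (2, 11, 100, 101), (3, 10, 100, 102);
    -- Code 10 is entered twice (hypothetical), as there is no unique key.
    INSERT INTO Table2 VALUES (10, 'English'), (10, 'English'), (11, 'Math');
""")

# Every Table1 row with Acode 10 now matches two lookup rows.
joined_rows = conn.execute("""
    SELECT t1.ID, t2.Adescription
    FROM Table1 t1
    JOIN Table2 t2 ON t2.Acode = t1.Acode
    ORDER BY t1.ID
""").fetchall()
print(len(joined_rows))

# List the codes that appear more than once in the lookup table.
duplicate_codes = conn.execute("""
    SELECT Acode, COUNT(*) AS occurrences
    FROM Table2
    GROUP BY Acode
    HAVING COUNT(*) > 1
""").fetchall()
print(duplicate_codes)
```

Running the GROUP BY ... HAVING check on each lookup table in turn is a quick way to find which one needs a unique key.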

Left Join COUNT on tables

I have 2 tables:
puid | personid | ptitle
----------------------------
1 | 200 | richard
2 | 201 | swiss
suid | personidref | stitle
----------------------------
1 | 200 | alf
2 | 201 | lando
3 | 200 | willis
4 | 201 | luke
5 | 201 | kojak
6 | 200 | r2-d2
7 | 201 | jabba
I am trying to left join with a count of table two. I have tried to figure out how to use generate_series or subselects, but I can't work out the syntax.
In English: show me each unique person in table one with a count of their entries in table two.
example output:
puid | personid | ptitle | count
---------------------------------
1 | 200 | richard | 3
2 | 201 | swiss | 4
Is this a simple subquery, or is generate_series the right tool for the job?
select t1.*, s.total
from t1
left join (
    -- count the rows in t2 once per person before joining
    select personidref, count(*) as total
    from t2
    group by personidref
) s on s.personidref = t1.personid
order by t1.puid
Note that aggregating before the join will probably perform better than aggregating after it.
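To illustrate the two orderings, this sketch runs both against the sample data in an in-memory SQLite database (standing in for PostgreSQL, which is an assumption for the sake of a runnable example). It only shows that they return the same rows. It does not measure performance.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE t1 (puid INTEGER, personid INTEGER, ptitle TEXT);
    CREATE TABLE t2 (suid INTEGER, personidref INTEGER, stitle TEXT);
    INSERT INTO t1 VALUES (1, 200, 'richard'), (2, 201, 'swiss');
    INSERT INTO t2 VALUES
        (1, 200, 'alf'), (2, 201, 'lando'), (3, 200, 'willis'), (4, 201, 'luke'),
        (5, 201, 'kojak'), (6, 200, 'r2-d2'), (7, 201, 'jabba');
""")

# Aggregate first: t2 is collapsed to one row per person, then joined.
aggregate_then_join = conn.execute("""
    SELECT t1.puid, t1.personid, t1.ptitle, COALESCE(s.total, 0) AS total
    FROM t1
    LEFT JOIN (
        SELECT personidref, COUNT(*) AS total
        FROM t2
        GROUP BY personidref
    ) s ON s.personidref = t1.personid
    ORDER BY t1.puid
""").fetchall()

# Join first: every t2 row is joined, then the result is grouped.
# COUNT(t2.suid) ignores NULLs, so a person with no rows in t2 gets 0.
join_then_aggregate = conn.execute("""
    SELECT t1.puid, t1.personid, t1.ptitle, COUNT(t2.suid) AS total
    FROM t1
    LEFT JOIN t2 ON t2.personidref = t1.personid
    GROUP BY t1.puid, t1.personid, t1.ptitle
    ORDER BY t1.puid
""").fetchall()

print(aggregate_then_join)
print(join_then_aggregate)
```

No generate_series is needed for this kind of count. A grouped subquery or a grouped join is enough.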

MySQL: How to select and display ALL rows from one table, and calculate the sum of a where clause on another table?

I'm trying to display all rows from one table and also SUM/AVG the results in one column, which is the result of a where clause. That probably doesn't make much sense, so let me explain.
I need to display a report of all employees...
SELECT Employees.Name, Employees.Extension
FROM Employees;
--------------
| Name | Ext |
--------------
| Joe | 123 |
| Jane | 124 |
| John | 125 |
--------------
...and join some information from the PhoneCalls table...
--------------------------------------------------------------
| PhoneCalls Table |
--------------------------------------------------------------
| Ext | StartTime | EndTime | Duration |
--------------------------------------------------------------
| 123 | 2010-09-05 10:54:22 | 2010-09-05 10:58:22 | 240 |
--------------------------------------------------------------
SELECT Employees.Name,
Employees.Extension,
Count(PhoneCalls.*) AS CallCount,
AVG(PhoneCalls.Duration) AS AverageCallTime,
SUM(PhoneCalls.Duration) AS TotalCallTime
FROM Employees
LEFT JOIN PhoneCalls ON Employees.Extension = PhoneCalls.Extension
GROUP BY Employees.Extension;
------------------------------------------------------------
| Name | Ext | CallCount | AverageCallTime | TotalCallTime |
------------------------------------------------------------
| Joe | 123 | 10 | 200 | 2000 |
| Jane | 124 | 20 | 250 | 5000 |
| John | 125 | 3 | 100 | 300 |
------------------------------------------------------------
Now I want to filter out some of the rows that are included in the SUM and AVG calculations...
WHERE PhoneCalls.StartTime BETWEEN "2010-09-12 09:30:00" AND NOW()
...which will ideally result in a table looking something like this:
------------------------------------------------------------
| Name | Ext | CallCount | AverageCallTime | TotalCallTime |
------------------------------------------------------------
| Joe | 123 | 5 | 200 | 1000 |
| Jane | 124 | 10 | 250 | 2500 |
| John | 125 | 0 | 0 | 0 |
------------------------------------------------------------
Note that John has not made any calls in this date range, so his total CallCount is zero, but he is still in the list of results. I can't seem to figure out how to keep records like John's in the list. When I add the WHERE clause, those records are filtered out.
How can I create a select statement that displays all of the Employees and only SUMs/AVGs the values returned from the WHERE clause?
Use:
SELECT e.Name,
       e.Extension,
       COUNT(pc.Extension) AS CallCount,
       -- COALESCE turns the NULL aggregates of employees with no calls into 0
       COALESCE(AVG(pc.Duration), 0) AS AverageCallTime,
       COALESCE(SUM(pc.Duration), 0) AS TotalCallTime
FROM Employees e
LEFT JOIN PhoneCalls pc ON pc.Extension = e.Extension
    AND pc.StartTime BETWEEN '2010-09-12 09:30:00' AND NOW()
GROUP BY e.Name, e.Extension
The issue is that, with an OUTER JOIN, criteria in the ON clause are applied while the join is being built, much like filtering a derived table or inline view first, so employees with no matching calls are still kept. The WHERE clause is applied after the OUTER JOIN. That is why putting the condition on the LEFT JOINed table in the WHERE clause filtered out the rows you still wanted to see: for those employees pc.StartTime is NULL, so the comparison fails.
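The difference can be seen in a small sketch that runs both variants in an in-memory SQLite database (used here in place of MySQL so the example is runnable). The call rows are hypothetical, and a fixed cutoff replaces the BETWEEN ... AND NOW() range so the result does not depend on the current time.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Employees (Name TEXT, Extension INTEGER);
    CREATE TABLE PhoneCalls (Extension INTEGER, StartTime TEXT, Duration INTEGER);
    INSERT INTO Employees VALUES ('Joe', 123), ('Jane', 124), ('John', 125);
    -- Hypothetical calls: John's only call is before the cutoff.
    INSERT INTO PhoneCalls VALUES
        (123, '2010-09-05 10:54:22', 240),
        (123, '2010-09-13 11:00:00', 200),
        (124, '2010-09-14 09:45:00', 250),
        (125, '2010-09-01 08:00:00', 100);
""")

cutoff = "2010-09-12 09:30:00"

# Date filter in the ON clause: unmatched employees survive with zero calls.
filter_in_on = conn.execute("""
    SELECT e.Name, COUNT(pc.Extension), COALESCE(SUM(pc.Duration), 0)
    FROM Employees e
    LEFT JOIN PhoneCalls pc ON pc.Extension = e.Extension
        AND pc.StartTime >= ?
    GROUP BY e.Name, e.Extension
    ORDER BY e.Extension
""", (cutoff,)).fetchall()

# Date filter in the WHERE clause: rows with no call in range are removed.
filter_in_where = conn.execute("""
    SELECT e.Name, COUNT(pc.Extension), COALESCE(SUM(pc.Duration), 0)
    FROM Employees e
    LEFT JOIN PhoneCalls pc ON pc.Extension = e.Extension
    WHERE pc.StartTime >= ?
    GROUP BY e.Name, e.Extension
    ORDER BY e.Extension
""", (cutoff,)).fetchall()

print(filter_in_on)
print(filter_in_where)
```

John appears with a count of 0 in the first result and is missing from the second, which is the behaviour described in the question.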