MAX value from the sum of columns in Hive


I'm new to Hive, and I'm stuck on a fairly simple problem. My data looks like:
Name    Day  Doctor Bill  Room Bill
----------------------------------
Rakesh  1    2500         1500
Raja    1    5000         2300
Raju    1    4500         2000
Rakesh  2    3750         2250
Rakesh  3    3550         1750
Raja    2    4500         4000
Raju    2    3450         4725
I want to find out who paid the highest total doctor bill.
My query:
hive> insert overwrite table maxdrbill select t.name,sum(t.drbill) as totaldrbill from patient t join (select name from patient group by name order by sum(drbill) desc LIMIT 1) t1 on t.name=t1.name GROUP by t.name;
When I run the above query in Hive, I get the following error:
FAILED: Error in semantic analysis: Line 1:149 Invalid table alias or column reference drbill

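Give the SUM an alias and order by that alias instead of repeating the aggregate in ORDER BY (adjust the table and column names to your schema).
Query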
select name,SUM(doctorbill) as s from bills GROUP BY name ORDER BY s DESC LIMIT 1;
Output
Rakesh 9800
Hope it helps!
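A quick way to sanity-check the alias-then-ORDER BY approach is to run it on the sample rows. This sketch uses Python's built-in sqlite3 module as a stand-in for Hive, so it checks the query logic, not Hive's semantic analyzer. The table name bills and column doctorbill follow the answer; the columns day and roombill are assumed names.

```python
import sqlite3

# Sample rows from the question: (name, day, doctor bill, room bill).
ROWS = [
    ("Rakesh", 1, 2500, 1500),
    ("Raja", 1, 5000, 2300),
    ("Raju", 1, 4500, 2000),
    ("Rakesh", 2, 3750, 2250),
    ("Rakesh", 3, 3550, 1750),
    ("Raja", 2, 4500, 4000),
    ("Raju", 2, 3450, 4725),
]


def top_doctor_bill(conn):
    """Return (name, total) for the name with the highest summed doctor bill."""
    # The aggregate gets an alias (s) and ORDER BY refers to that alias.
    return conn.execute(
        "SELECT name, SUM(doctorbill) AS s FROM bills "
        "GROUP BY name ORDER BY s DESC LIMIT 1"
    ).fetchone()


conn = sqlite3.connect(":memory:")
# Assumed column names day and roombill; the answer only names doctorbill.
conn.execute(
    "CREATE TABLE bills (name TEXT, day INTEGER, doctorbill INTEGER, roombill INTEGER)"
)
conn.executemany("INSERT INTO bills VALUES (?, ?, ?, ?)", ROWS)
print(top_doctor_bill(conn))  # ('Rakesh', 9800)
```

Rakesh's total is 2500 + 3750 + 3550 = 9800, which matches the output shown in the answer.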

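Performance-wise, I believe this avoids sorting the per-name totals just to get the maximum. Take the MAX after summing; since Name cannot be selected next to MAX() without a GROUP BY, join the totals back to that maximum to get the name:
SELECT t1.Name, t1.TotalDrBill FROM
(SELECT t.Name, SUM(t.drbill) AS TotalDrBill FROM Patient t GROUP BY t.Name) t1
JOIN (SELECT MAX(s.TotalDrBill) AS MaxDrBill FROM (SELECT SUM(drbill) AS TotalDrBill FROM Patient GROUP BY Name) s) m
ON t1.TotalDrBill = m.MaxDrBill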
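Taking MAX over the per-name totals yields only the amount, not whose amount it is. One way to keep the no-sort idea and still recover the name is to join the totals back to their maximum. The sketch below does that in SQLite (standing in for Hive) with the table patient and column drbill from the question; day and roombill are assumed column names.

```python
import sqlite3

# Sample rows from the question: (name, day, drbill, roombill).
ROWS = [
    ("Rakesh", 1, 2500, 1500),
    ("Raja", 1, 5000, 2300),
    ("Raju", 1, 4500, 2000),
    ("Rakesh", 2, 3750, 2250),
    ("Rakesh", 3, 3550, 1750),
    ("Raja", 2, 4500, 4000),
    ("Raju", 2, 3450, 4725),
]

# Sum per name, take the MAX of those sums, then join back to recover the name.
# No ORDER BY is involved; a tie would return every name sharing the maximum.
QUERY = """
SELECT t1.name, t1.total_dr_bill
FROM (SELECT name, SUM(drbill) AS total_dr_bill
      FROM patient GROUP BY name) t1
JOIN (SELECT MAX(s.total_dr_bill) AS max_bill
      FROM (SELECT SUM(drbill) AS total_dr_bill
            FROM patient GROUP BY name) s) m
  ON t1.total_dr_bill = m.max_bill
"""

conn = sqlite3.connect(":memory:")
# Assumed column names day and roombill; only drbill appears in the question.
conn.execute(
    "CREATE TABLE patient (name TEXT, day INTEGER, drbill INTEGER, roombill INTEGER)"
)
conn.executemany("INSERT INTO patient VALUES (?, ?, ?, ?)", ROWS)
top_rows = conn.execute(QUERY).fetchall()
print(top_rows)  # [('Rakesh', 9800)]
```

Whether this is actually faster than ORDER BY ... LIMIT 1 in Hive depends on the engine and data size; the grouped result being sorted has only one row per name.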

Related

SQL: finding the maximum average grouped by an ID

Consider following dataset:
id
name
mgr_id
salary
bonus
1
Paul
1
68000
10000
2
Lucas
2
29000
null
3
Max
1
50000
20000
4
Zack
2
30000
null
I now want to find the manager who pays his subordinates the highest average of salary plus bonus. A manager is someone whose id appears in any of the mgr_id cells. So in this example Paul and Lucas are managers: Paul's id appears in the mgr_id column for himself and Max, and Lucas's id appears for himself and Zack.
Basically I want MAX(AVG(salary + bonus)), grouped by mgr_id. How can I do that?
I am using SQLite.
My expected output in this example would simply be the employee name 'Paul', because he has 2 subordinates (himself and Max) and pays them more than the other manager, Lucas.

SELECT mgr_id
     , pay_avg
FROM
(
    SELECT mgr_id
         , AVG(salary + COALESCE(bonus, 0)) AS pay_avg
    FROM <table>
    GROUP BY mgr_id
) q
ORDER BY pay_avg DESC
LIMIT 1

In SQL Server syntax (TOP 1 and the #tbl_emps temp table; in SQLite use LIMIT 1 instead):
select top 1 t1.mgr_id, AVG(t1.salary + COALESCE(t1.bonus, 0)) as tot_sal
from #tbl_emps as t1
group by t1.mgr_id
order by tot_sal desc
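Since the asker uses SQLite, the grouped average can be checked directly with Python's sqlite3 module. This sketch also shows why COALESCE matters (salary + NULL is NULL) and joins mgr_id back to id so the result is a name, as the question asks. The table name employees is an assumption; the question does not give one.

```python
import sqlite3

# Rows from the question: (id, name, mgr_id, salary, bonus); None stands for null.
ROWS = [
    (1, "Paul", 1, 68000, 10000),
    (2, "Lucas", 2, 29000, None),
    (3, "Max", 1, 50000, 20000),
    (4, "Zack", 2, 30000, None),
]

conn = sqlite3.connect(":memory:")
# Hypothetical table name: employees.
conn.execute(
    "CREATE TABLE employees (id INTEGER, name TEXT, mgr_id INTEGER, "
    "salary INTEGER, bonus INTEGER)"
)
conn.executemany("INSERT INTO employees VALUES (?, ?, ?, ?, ?)", ROWS)

# Without COALESCE, salary + NULL is NULL, so Lucas's whole group averages to NULL.
raw_avgs = conn.execute(
    "SELECT mgr_id, AVG(salary + bonus) FROM employees GROUP BY mgr_id ORDER BY mgr_id"
).fetchall()

# Average per manager, then join back on id to turn mgr_id into a name.
best = conn.execute(
    """
    SELECT e.name, q.pay_avg
    FROM (SELECT mgr_id, AVG(salary + COALESCE(bonus, 0)) AS pay_avg
          FROM employees GROUP BY mgr_id) q
    JOIN employees e ON e.id = q.mgr_id
    ORDER BY q.pay_avg DESC
    LIMIT 1
    """
).fetchone()

print(raw_avgs)  # [(1, 74000.0), (2, None)]
print(best)      # ('Paul', 74000.0)
```

Paul's group averages (78000 + 70000) / 2 = 74000; Lucas's averages 29500 once the null bonuses count as 0.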

joining 2 tables in sql which has no dependency on each other

I have 2 tables like this:
Table 1:
e_id e_name e_salary e_age e_gender e_dept
---------------------------------------------------
1 sam 95000 45 male operations
2 bob 80000 21 male support
3 ann 125000 25 female analyst
Table 2:
d_salary d_age d_gender e_dept
----------------------------------
34000 25 male Admin
56000 41 female Tech
77000 35 female HR
I want the output something like this:
e_id e_name e_salary e_age e_gender e_dept d_salary d_age d_gender e_dept
1 sam 95000 45 male operations 34000 25 male Admin
2 bob 80000 21 male support 56000 41 female Tech
3 ann 125000 25 female analyst 77000 35 female HR
There is no dependency between the tables. No common columns. No primary or foreign key.
I tried a cross join, but that results in duplicate rows because it returns M x N rows.
I am new to SQL. Can someone help me, please? Thanks in advance.

Though I don't understand the reason behind your desired output, you can get it with the query below:
select a.e_id, a.e_name, a.e_salary, a.e_age, a.e_gender, a.e_dept, b.d_salary, b.d_age, b.d_gender, b.e_dept
from
(select e_id, e_name, e_salary, e_age, e_gender, e_dept, row_number() over (order by e_id) rn
from table1) a
inner join
(select d_salary, d_age, d_gender, e_dept, row_number() over (order by d_salary) rn
from table2) b
on a.rn = b.rn

Generally, you can create a row number with the row_number() window function on both tables and use it as the join criterion. But this requires a defined order for both tables, which means you have to tell the query explicitly why the Admin record is ordered first and must be joined to the first record of table 1:
SELECT
*
FROM (
SELECT
*,
row_number() OVER (ORDER BY e_id) as row_count -- assuming e_id is your order criterion
FROM table1
) t1
JOIN (
SELECT
*,
row_number() OVER (ORDER BY /*whatever you expect to be ordered*/) as row_count
FROM table2
) t2
ON t1.row_count = t2.row_count
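The row-number join can be tried on the sample data with Python's sqlite3 module (window functions need SQLite 3.25 or later). Ordering table2 by d_salary is an assumption: the question gives no ordering rule, and ascending salary happens to reproduce the desired pairing. Only a few columns are selected to keep the output short.

```python
import sqlite3

TABLE1 = [
    (1, "sam", 95000, 45, "male", "operations"),
    (2, "bob", 80000, 21, "male", "support"),
    (3, "ann", 125000, 25, "female", "analyst"),
]
TABLE2 = [
    (34000, 25, "male", "Admin"),
    (56000, 41, "female", "Tech"),
    (77000, 35, "female", "HR"),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE table1 (e_id INTEGER, e_name TEXT, e_salary INTEGER, "
    "e_age INTEGER, e_gender TEXT, e_dept TEXT)"
)
conn.execute(
    "CREATE TABLE table2 (d_salary INTEGER, d_age INTEGER, d_gender TEXT, e_dept TEXT)"
)
conn.executemany("INSERT INTO table1 VALUES (?, ?, ?, ?, ?, ?)", TABLE1)
conn.executemany("INSERT INTO table2 VALUES (?, ?, ?, ?)", TABLE2)

# A plain cross join pairs every row with every row: 3 x 3 = 9 rows.
cross_count = conn.execute("SELECT COUNT(*) FROM table1 CROSS JOIN table2").fetchone()[0]

# Number each table's rows, then join on that number (assumed order for table2).
paired = conn.execute(
    """
    SELECT a.e_id, a.e_name, a.e_dept, b.d_salary, b.e_dept
    FROM (SELECT *, ROW_NUMBER() OVER (ORDER BY e_id) AS rn FROM table1) a
    JOIN (SELECT *, ROW_NUMBER() OVER (ORDER BY d_salary) AS rn FROM table2) b
      ON a.rn = b.rn
    ORDER BY a.rn
    """
).fetchall()

print(cross_count)  # 9
for row in paired:
    print(row)
```

If the two tables have different row counts, the inner join silently drops the extra rows; a left join on rn would keep them with nulls instead.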

Oracle SQL: how to call created columns (alias) for pivot tables in a subquery

This is my first question in this community. It has helped me a lot before, so thank you all for being out there!
I have a problem with Oracle PL/SQL: I'm trying to create a pivot table that counts the number of people in each salary range. I want cities as rows and salary ranges as columns. My problem is selecting a column alias for the pivot table.
In table A I have the rows of all employees and their salaries, and in table B I have their city. Both are linked by a key column named id_dpto. First, I join both tables, selecting employee names, salaries, and cities. Second, I use CASE WHEN to create the salary ranges (less than 1000, and between 1000 and 2500 dollars) and give it the column alias SALARY_RANGE. Up to this point everything is OK and the code runs perfectly.
My problem is the third step. I use a subquery and the PIVOT clause to count by city and salary_range, but when I select the alias it doesn't work; the error message is "'F'.'SALARY_RANGE' INVALID IDENTIFIER". What is the proper way to select a created column (salary_range) in a pivot table? I've tried it both with and without the F alias after the FROM.
Initial data:
| Name | salary | city |
| ---- | ------ | ------ |
|john | 999 | NY |
|adam | 500 | NY |
|linda | 1500 | NY |
|Matt | 2000 | London |
|Joel | 1500 | London |
Desired result:
| city | salary less than 1000 | salary between 1000 and 2500 |
| ---- | --------------------- | ---------------------------- |
| NY | 2 | 1 |
| London | 0 | 2 |
My code:
SELECT F.SALARY_RANGE, F.CITY
FROM (SELECT A.NAMES,
A.SALARY,
C.CITY,
CASE
WHEN SALARY < 1000 THEN 'LESS THAN 1000'
WHEN SALARY < 2500 THEN 'BETWEEN 1000 AND 2500'
END AS SALARY_RANGE FROM EMPLOYEES A
LEFT JOIN XXX B ON A.ID_DPTO = B.ID_DPTO) F
PIVOT
(COUNT(SALARY_RANGE)
FOR SALARY_RANGE IN ('LESS THAN 1000', 'BETWEEN 1000 AND 2500')
)
Thanks for helping me!

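I think you should select * and drop SALARY and NAMES from the subquery. PIVOT implicitly groups by every source column that is not used in the PIVOT clause, so leaving NAMES and SALARY in would give one row per employee; and once SALARY_RANGE has been pivoted into columns it no longer exists as a column, which is why F.SALARY_RANGE is an invalid identifier: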
SELECT *
FROM (SELECT B.CITY,
CASE
WHEN SALARY < 1000 THEN
'LESS THAN 1000'
WHEN SALARY < 2500 THEN
'BETWEEN 1000 AND 2500'
END AS SALARY_RANGE
FROM EMPLOYEES A
LEFT JOIN XXX B
ON A.ID_DPTO = B.ID_DPTO) F
PIVOT(COUNT(SALARY_RANGE)
FOR SALARY_RANGE IN('LESS THAN 1000', 'BETWEEN 1000 AND 2500'))
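PIVOT is not available in every database. A portable alternative, which is a different technique from Oracle's PIVOT, is conditional aggregation: one SUM(CASE ...) per output column, grouped by city. This sketch runs it in SQLite through Python on the question's sample rows; the table name staff is hypothetical and stands in for the employee/city join.

```python
import sqlite3

# Sample rows from the question's initial data: (name, salary, city).
ROWS = [
    ("john", 999, "NY"),
    ("adam", 500, "NY"),
    ("linda", 1500, "NY"),
    ("Matt", 2000, "London"),
    ("Joel", 1500, "London"),
]

conn = sqlite3.connect(":memory:")
# Hypothetical table standing in for EMPLOYEES joined to the city table.
conn.execute("CREATE TABLE staff (name TEXT, salary INTEGER, city TEXT)")
conn.executemany("INSERT INTO staff VALUES (?, ?, ?)", ROWS)

# Conditional aggregation: each CASE counts 1 for rows in that range, 0 otherwise.
pivoted = conn.execute(
    """
    SELECT city,
           SUM(CASE WHEN salary < 1000 THEN 1 ELSE 0 END) AS less_than_1000,
           SUM(CASE WHEN salary >= 1000 AND salary < 2500 THEN 1 ELSE 0 END)
               AS between_1000_and_2500
    FROM staff
    GROUP BY city
    ORDER BY city DESC
    """
).fetchall()

print(pivoted)  # [('NY', 2, 1), ('London', 0, 2)]
```

Unlike PIVOT, the output column names here are ordinary identifiers, so they can be referenced later without quoting.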

How to Calculate amount value of employees repeated many times

I have an SQL file that contains more than 500 employees. Many employees are repeated several times, like:
Name amount Id
--------------------
Raj 500 1
Kumar 300 4
Karthi 400 3
Raj 300 1
Raj 800 1
Kumar 300 4
In the sample above, Raj is repeated several times. I want to add up all of Raj's amount values. How can I get the total amount for the employee Raj? Any ideas?

Simply use SUM with GROUP BY (add where Name = 'Raj' to get only Raj's total):
select Name, sum(amount) as total_amount from <table> group by Name
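Here is a minimal run of SUM with GROUP BY on the sample rows, using Python's sqlite3 module; the table name employees is an assumption, since the question does not name the table. It shows both the per-name totals and Raj's total alone.

```python
import sqlite3

# Sample rows from the question: (name, amount, id).
ROWS = [
    ("Raj", 500, 1),
    ("Kumar", 300, 4),
    ("Karthi", 400, 3),
    ("Raj", 300, 1),
    ("Raj", 800, 1),
    ("Kumar", 300, 4),
]

conn = sqlite3.connect(":memory:")
# Hypothetical table name: employees.
conn.execute("CREATE TABLE employees (name TEXT, amount INTEGER, id INTEGER)")
conn.executemany("INSERT INTO employees VALUES (?, ?, ?)", ROWS)

# One total per name.
totals = conn.execute(
    "SELECT name, SUM(amount) AS total_amount FROM employees GROUP BY name ORDER BY name"
).fetchall()

# Only Raj's total: filter the rows first, then aggregate.
raj_total = conn.execute(
    "SELECT SUM(amount) FROM employees WHERE name = ?", ("Raj",)
).fetchone()[0]

print(totals)     # [('Karthi', 400), ('Kumar', 600), ('Raj', 1600)]
print(raj_total)  # 1600
```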

How to Sum the 1st record of one column with the 2nd record of another column?

I am trying to sum the 2nd record of one column with the 1st record of another column and store the result in a new column.
Here is an example SQL Server table:
Emp_Code Emp_Name Month Opening_Balance
G101 Sam 1 1000
G102 James 2 -2500
G103 David 3 3000
G104 Paul 4 1800
G105 Tom 5 -1500
I am trying to get the output below in the new Reserve column:
Emp_Code Emp_Name Month Opening_Balance Reserve
G101 Sam 1 1000 1000
G102 James 2 -2500 -1500
G103 David 3 3000 1500
G104 Paul 4 1800 3300
G105 Tom 5 -1500 1800
The rule for calculating the Reserve column is:
For Month 1, it's the same as the Opening Balance.
For the remaining months, Reserve for month n = Reserve for month n-1 + Opening Balance for month n (e.g. Reserve for Month 2 = Reserve for Month 1 + Opening Balance for Month 2).

You seem to want a cumulative sum. In SQL Server 2012+, you would do:
select t.*,
sum(opening_balance) over (order by [Month]) as Reserve
from t;
In earlier versions, you would do this with a correlated subquery or apply:
select t.*,
(select sum(t2.opening_balance) from t t2 where t2.[Month] <= t.[Month]) as reserve
from t;
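Both running-total forms can be checked on the sample rows. This sketch uses SQLite through Python's sqlite3 module as a stand-in for SQL Server (the window form needs SQLite 3.25 or later); the table name t comes from the answer, while the lowercase column names are assumptions.

```python
import sqlite3

# Sample rows from the question: (emp_code, emp_name, month, opening_balance).
ROWS = [
    ("G101", "Sam", 1, 1000),
    ("G102", "James", 2, -2500),
    ("G103", "David", 3, 3000),
    ("G104", "Paul", 4, 1800),
    ("G105", "Tom", 5, -1500),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE t (emp_code TEXT, emp_name TEXT, month INTEGER, opening_balance INTEGER)"
)
conn.executemany("INSERT INTO t VALUES (?, ?, ?, ?)", ROWS)

# Running total with a window function.
windowed = [r[0] for r in conn.execute(
    "SELECT SUM(opening_balance) OVER (ORDER BY month) FROM t ORDER BY month"
)]

# Same result with a correlated subquery, for engines without window functions.
correlated = [r[0] for r in conn.execute(
    """
    SELECT (SELECT SUM(t2.opening_balance) FROM t t2 WHERE t2.month <= t.month)
    FROM t ORDER BY t.month
    """
)]

print(windowed)    # [1000, -1500, 1500, 3300, 1800]
print(correlated)  # [1000, -1500, 1500, 3300, 1800]
```

Both lists match the Reserve column in the question. If two rows could share a month, the window's default frame would give them the same running total.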

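You can also do this with a self join: join each row to every row with an earlier or equal month and sum those balances.
SELECT t.Emp_Code, t.Emp_Name, t.Month, t.Opening_Balance, SUM(n.Opening_Balance) AS Reserve
FROM Table1 t
JOIN Table1 n
ON n.Month <= t.Month
GROUP BY t.Emp_Code, t.Emp_Name, t.Month, t.Opening_Balance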
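A self join produces a running total when each row is joined to all rows with an earlier or equal month and the joined balances are summed. This sketch checks that idea in SQLite through Python's sqlite3 module; the table name t and the lowercase column names are assumptions.

```python
import sqlite3

# Sample rows from the question: (emp_code, emp_name, month, opening_balance).
ROWS = [
    ("G101", "Sam", 1, 1000),
    ("G102", "James", 2, -2500),
    ("G103", "David", 3, 3000),
    ("G104", "Paul", 4, 1800),
    ("G105", "Tom", 5, -1500),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE t (emp_code TEXT, emp_name TEXT, month INTEGER, opening_balance INTEGER)"
)
conn.executemany("INSERT INTO t VALUES (?, ?, ?, ?)", ROWS)

# Pair each row with every row up to and including its month, then sum per row.
self_joined = conn.execute(
    """
    SELECT t.emp_code, t.month, SUM(n.opening_balance) AS reserve
    FROM t
    JOIN t AS n ON n.month <= t.month
    GROUP BY t.emp_code, t.month
    ORDER BY t.month
    """
).fetchall()

for row in self_joined:
    print(row)
```

The join produces on the order of n²/2 intermediate rows, so on SQL Server 2012+ the window-function form is usually the better choice for large tables.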
