Average function on different columns in HIVE

I want to find the average of 3 columns using a Hive query.
Consider the below data:
I need to find the average marks scored by each student, and then the average of total marks in each school.
NULL values should be ignored.
My output should look like this:
Can you help me out here?

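Hive automatically ignores NULL values in aggregate functions such as avg().
For readability, I suggest using COALESCE instead of IF ... IS NULL expressions, for example: COALESCE(Math,0) as Math
Note that replacing NULL with 0 does not ignore the missing mark: if the sum is still divided by a fixed 3, the NULL counts as a zero and lowers the average.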

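Average of each student (dividing by the number of non-NULL marks, so that a NULL is ignored rather than counted as 0):
select school, SL_No, Name, Math, Phy, Chem,
       (coalesce(Math,0) + coalesce(Phy,0) + coalesce(Chem,0))
       / (if(Math is NULL,0,1) + if(Phy is NULL,0,1) + if(Chem is NULL,0,1)) as avg_marks
from my_table
Average marks of each school:
select school, avg(avg_marks) as school_avg
from (
  select school, SL_No, Name, Math, Phy, Chem,
         (coalesce(Math,0) + coalesce(Phy,0) + coalesce(Chem,0))
         / (if(Math is NULL,0,1) + if(Phy is NULL,0,1) + if(Chem is NULL,0,1)) as avg_marks
  from my_table
) temp
group by school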
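
To check the NULL handling, here is a minimal sketch that runs the same idea against an in-memory SQLite database from Python. SQLite only stands in for Hive here, and the sample rows are made up because the question's data was not preserved. SQLite lets you add `(Math IS NOT NULL)` directly as 0 or 1; in Hive you would more likely write `if(Math is NULL,0,1)`.

```python
import sqlite3

# Hypothetical sample data: the table from the question was not preserved.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE my_table "
    "(school TEXT, SL_No INTEGER, Name TEXT, Math REAL, Phy REAL, Chem REAL)"
)
conn.executemany(
    "INSERT INTO my_table VALUES (?, ?, ?, ?, ?, ?)",
    [
        ("A", 1, "Ann", 90, 80, None),    # one NULL: average over 2 marks
        ("A", 2, "Bob", 60, 70, 80),      # no NULL: average over 3 marks
        ("B", 3, "Cid", None, None, 50),  # two NULLs: average over 1 mark
    ],
)

# Sum the non-NULL marks and divide by how many marks are present.
# NULLIF(..., 0) yields NULL instead of a division by zero when all are NULL.
PER_STUDENT = """
SELECT school, Name,
       (COALESCE(Math, 0) + COALESCE(Phy, 0) + COALESCE(Chem, 0)) * 1.0
       / NULLIF((Math IS NOT NULL) + (Phy IS NOT NULL) + (Chem IS NOT NULL), 0)
       AS avg_marks
FROM my_table
"""

per_student = conn.execute(PER_STUDENT + " ORDER BY SL_No").fetchall()

# AVG() skips NULL rows by itself, so no extra handling is needed here.
per_school = conn.execute(
    f"SELECT school, AVG(avg_marks) FROM ({PER_STUDENT}) AS temp "
    "GROUP BY school ORDER BY school"
).fetchall()

print(per_student)
print(per_school)
```

With a fixed divisor of 3, Ann's average would come out lower because her missing Chem mark would count as a zero.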

Related

Generate columns from values returned by SELECT

I've got a query that returns data like so:
student | course | grade
a-student | ENG-W05 | 100
a-student | MAT-W05 | 85
a-student | ENG-W06 | 100
b-student | MAT-W05 | 90
b-student | SCI-W05 | 75
The data is grouped by student and course. Ideally, I'd like to have the above data transformed into the below:
student | ENG-W05 | MAT-W05 | ENG-W06 | SCI-W05
a-student | 100 | 85 | 100 | NULL
b-student | NULL | 90 | NULL | 75
So, after the transformation, each student only has one record, with all of their grades (and any missing courses graded as null).
Does anyone have any ideas? Obviously, this is fairly simple to do if I take the data out and transform it in a language (like Python), but I'd love to get the data in the desired format with an SQL query.
Also, would it be possible to have the columns order alphabetically (ascending)? So, the final output would be:
student | ENG-W05 | ENG-W06 | MAT-W05 | SCI-W05
a-student | 100 | 100 | 85 | NULL
b-student | NULL | NULL | 90 | 75
EDIT: To clarify, the values in course aren't known in advance. The ones I provided are just examples. So ideally, if more course values found their way into that first query result (the first table), they would still be mapped to columns in the final result (without needing to change the query). In reality, I actually have >1k distinct values for the course column, so I can't manually write out each one.

You can use conditional aggregation for that:
SELECT
    student,
    SUM(grade) FILTER (WHERE course = 'ENG-W05') as eng_w05,
    SUM(grade) FILTER (WHERE course = 'MAT-W05') as mat_w05,
    SUM(grade) FILTER (WHERE course = 'ENG-W06') as eng_w06,
    SUM(grade) FILTER (WHERE course = 'SCI-W05') as sci_w05
FROM mytable
GROUP BY student
The FILTER clause lets you aggregate only specific records, so each expression here aggregates only the records for one course.
Choosing the aggregate function depends on your real requirement. Here SUM() does the job because there's only one value per group; MAX() or MIN() would work as well. If there's really only one value per group, it doesn't matter which one you pick, you just need some aggregation.
Instead of the FILTER clause, which is part of the SQL standard but supported by few databases besides Postgres, you could use the more widely portable CASE expression:
SELECT
student,
SUM(
CASE
WHEN course = 'ENG-W05' THEN grade
END
) AS eng_w05,
...

You can use the conditional aggregation as follows:
select student,
max(case when course = 'ENG-W05' then grade end) as "ENG-W05",
max(case when course = 'MAT-W05' then grade end) as "MAT-W05",
max(case when course = 'ENG-W06' then grade end) as "ENG-W06",
max(case when course = 'SCI-W05' then grade end) as "SCI-W05"
from (your_query) t
group by student
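
Neither answer covers the EDIT in the question, where the course values are not known in advance. One common workaround, sketched here as an assumption rather than as something the answers proposed, is to build the conditional-aggregation query in two steps: first read the distinct course values, then generate one `MAX(CASE ...)` column per value. The sketch uses Python's built-in sqlite3 module as a stand-in database; `quote_ident` is a hypothetical helper, and the table name `mytable` follows the first answer.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mytable (student TEXT, course TEXT, grade INTEGER)")
conn.executemany(
    "INSERT INTO mytable VALUES (?, ?, ?)",
    [
        ("a-student", "ENG-W05", 100),
        ("a-student", "MAT-W05", 85),
        ("a-student", "ENG-W06", 100),
        ("b-student", "MAT-W05", 90),
        ("b-student", "SCI-W05", 75),
    ],
)

# Step 1: discover the course values, sorted so the columns come out
# in ascending alphabetical order.
courses = [
    row[0]
    for row in conn.execute("SELECT DISTINCT course FROM mytable ORDER BY course")
]


def quote_ident(name):
    """Hypothetical helper: double-quote an identifier, escaping inner quotes."""
    return '"' + name.replace('"', '""') + '"'


# Step 2: one MAX(CASE ...) column per course. The course value is bound as
# a parameter; only the quoted column alias is spliced into the SQL text.
select_list = ", ".join(
    f"MAX(CASE WHEN course = ? THEN grade END) AS {quote_ident(c)}"
    for c in courses
)
query = f"SELECT student, {select_list} FROM mytable GROUP BY student ORDER BY student"

cursor = conn.execute(query, courses)
columns = [d[0] for d in cursor.description]
rows = cursor.fetchall()

print(columns)
for row in rows:
    print(row)
```

With more than a thousand distinct courses, the generated query may run into the database's limits on columns or bound parameters per statement, so it is worth checking those limits for your engine before relying on this.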

Finding AVG of Rows and merging in to one in MS SQL

In the following screenshot, you can see the city rows repeating 3 times with different values, so I need to merge the 3 city rows into one, and its value should be the average of their values.
Thanks
sai kumar

Just do this:
SELECT [Metric_Name]
,AVG(THSLD_RANGE_TO_VAL)
FROM [mytable]
GROUP BY [Metric_Name];

You can try the following query
SELECT Metric_Name,AVG(THSLD_RANGE_TO_VAL) as Average
FROM MetricTable
WHERE Metric_Name='CITY'
GROUP BY Metric_Name
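
As a quick illustration of what both answers do, here is a small sketch using Python's sqlite3 module in place of MS SQL. The rows are invented, since the screenshot from the question is not available; the table and column names follow the second answer.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE MetricTable (Metric_Name TEXT, THSLD_RANGE_TO_VAL INTEGER)"
)
# Invented rows: three CITY rows that should collapse into one average.
conn.executemany(
    "INSERT INTO MetricTable VALUES (?, ?)",
    [("CITY", 10), ("CITY", 20), ("CITY", 30), ("STATE", 5)],
)

# GROUP BY merges the rows that share a Metric_Name; AVG gives their mean.
averages = conn.execute(
    "SELECT Metric_Name, AVG(THSLD_RANGE_TO_VAL) AS Average "
    "FROM MetricTable GROUP BY Metric_Name ORDER BY Metric_Name"
).fetchall()

print(averages)
```

One difference to keep in mind: in SQL Server, AVG over an integer column returns an integer, so you may want to cast the column to a decimal type first if fractional averages matter.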

sql count or sum?

I'm a newbie programmer and I want to sum the values in an employee's attendance records.
What should I choose, COUNT or SUM?
I tried to use the COUNT function like this...
SELECT COUNT(jlh_sakit) AS sakit FROM rekap_absen
It shows the value "1", for 1 record only.
And I tried to use the SUM function like this...
SELECT SUM(jlh_sakit) AS sakit FROM rekap_absen
It shows all the values changed to "1".
I want to display only 1 row per person, with each sum
(e.g.: John (2 sick, 2 permissions, 1 alpha))
Can you help me please?

If you select an aggregate function such as min/max/sum/count alongside a non-aggregated column, you need a GROUP BY on that column. Your question asks "what should I choose? COUNT or SUM?" COUNT counts the rows where the column is not NULL, while SUM adds up the values. Assuming you have the columns person_name and jlh_sakit (which records sick/permission/alpha in your case), you could use
select person_name, count(jlh_sakit) as attribute
from rekap_absen
group by person_name
This will give you output like:
person_name | attribute
John | 2
King | 5

To get one sum per value of a given column, use a GROUP BY clause.
SELECT SUM(sick),SUM(alpha),SUM(permissions),person FROM rekap_absen group by person
It will group your sums according to person.
You may name your sums like:
SELECT SUM(sick) as sick,SUM(alpha) as alpha,SUM(permissions) as permissions,person FROM rekap_absen group by person
Assuming that you have table rekap_absen with columns: person,sick,alpha,permissions
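
To make the COUNT versus SUM difference concrete, here is a minimal sketch with Python's sqlite3 module. It assumes the layout from this answer (columns person, sick, alpha, permissions) and additionally assumes one row per day with 0/1 flags; the rows are invented to match the "John (2 sick, 2 permissions, 1 alpha)" example from the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE rekap_absen "
    "(person TEXT, sick INTEGER, alpha INTEGER, permissions INTEGER)"
)
# Invented rows: one row per day, with a 0/1 flag per absence type.
conn.executemany(
    "INSERT INTO rekap_absen VALUES (?, ?, ?, ?)",
    [
        ("John", 1, 0, 0),
        ("John", 1, 0, 0),
        ("John", 0, 0, 1),
        ("John", 0, 0, 1),
        ("John", 0, 1, 0),
        ("King", 1, 0, 0),
    ],
)

# SUM adds the flag values; COUNT only counts rows where the column is not
# NULL, so it returns the number of rows regardless of the 0/1 values.
totals = conn.execute(
    "SELECT person, SUM(sick) AS sick, SUM(permissions) AS permissions, "
    "SUM(alpha) AS alpha, COUNT(sick) AS row_count "
    "FROM rekap_absen GROUP BY person ORDER BY person"
).fetchall()

print(totals)
```

John has five rows, so COUNT(sick) is 5 even though he was sick only twice; SUM(sick) gives the 2 that the question is after.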

Calculate sum of one row and divide by sum of another row. Oracle view/query

This is my first question on here so bear with me. I have two tables in my Oracle database as following:
modules with fields:
module_code eg. INF211
module_title eg. Information technology
credits eg.20
module_progress with fields:
student_id eg. STU1
module_code eg. INF211
module_year eg. 1
module_percent eg. 65
Each student takes 5 modules a year.
This is what I want to do, all in one query/view if possible:
Find sum of module percents for a particular student
Find sum of all credits for each module with regards to their module percents.
Divide sum of module percents by sum of credits and multiply by 100 to give me an average grade.
Can this be done?

SELECT student_id,
       SUM(credits * module_percent) / SUM(credits) * 100.0
FROM module_progress mp
JOIN modules m
  ON m.module_code = mp.module_code
GROUP BY
  student_id
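
The query above computes a credit-weighted average. Below is a small sketch of that calculation using Python's sqlite3 module rather than Oracle, with invented modules and marks. It assumes module_percent is stored on a 0 to 100 scale, as in the question's example of 65, and therefore leaves out the final `* 100.0`; if the column held fractions such as 0.65, the extra factor would be needed.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE modules (module_code TEXT, module_title TEXT, credits INTEGER)"
)
conn.execute(
    "CREATE TABLE module_progress "
    "(student_id TEXT, module_code TEXT, module_year INTEGER, module_percent INTEGER)"
)
# Invented data: INF212 and both marks are hypothetical.
conn.executemany(
    "INSERT INTO modules VALUES (?, ?, ?)",
    [("INF211", "Information technology", 20), ("INF212", "Databases", 10)],
)
conn.executemany(
    "INSERT INTO module_progress VALUES (?, ?, ?, ?)",
    [("STU1", "INF211", 1, 65), ("STU1", "INF212", 1, 50)],
)

# Weight each mark by its module's credits, then divide by total credits.
# The * 1.0 avoids integer division in SQLite.
weighted = conn.execute(
    "SELECT student_id, "
    "       SUM(credits * module_percent) * 1.0 / SUM(credits) AS weighted_avg "
    "FROM module_progress mp "
    "JOIN modules m ON m.module_code = mp.module_code "
    "GROUP BY student_id"
).fetchall()

print(weighted)
```

Here the 20-credit module counts twice as much as the 10-credit one: (20 * 65 + 10 * 50) / 30.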

Retrieve names by ratio of their occurrence

I'm somewhat new to SQL queries, and I'm struggling with this particular problem.
Let's say I have query that returns the following 3 records (kept to one column for simplicity):
Tom
Jack
Tom
And I want to have those results grouped by the name and also include the fraction (ratio) of the occurrence of that name out of the total records returned.
So, the desired result would be (as two columns):
Tom | 2/3
Jack | 1/3
How would I go about it? Determining the numerator is pretty easy (I can just use COUNT() and GROUP BY name), but I'm having trouble translating that into a ratio out of the total rows returned.

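-- The * 1.0 forces non-integer division on engines where integer / integer truncates to 0.
SELECT name, COUNT(name) * 1.0 / (SELECT COUNT(*) FROM names) AS ratio FROM names GROUP BY name;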

Since the denominator is fixed, the "ratio" is directly proportional to the numerator. Unless you really need to show the denominator, it'll be a lot easier to just use something like:
select name, count(*) from your_table_name
group by name
order by count(*) desc
and you'll get the right data in the right order, but the number that's shown will be the count instead of the ratio.
If you really want that denominator, you'd do a count(*) on a non-grouped version of the same select -- but depending on how long the select takes, that could be pretty slow.
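
Here is a minimal sketch of that last suggestion, using Python's sqlite3 module and the table name `names` from the first answer. It selects the per-name count and the overall count side by side, then formats the "2/3" style ratio in application code; the decimal ratio is computed as well for comparison.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE names (name TEXT)")
conn.executemany("INSERT INTO names VALUES (?)", [("Tom",), ("Jack",), ("Tom",)])

# The scalar subquery is the "count(*) on a non-grouped version of the same
# select": it yields the same total for every group.
rows = conn.execute(
    "SELECT name, COUNT(*) AS n, (SELECT COUNT(*) FROM names) AS total "
    "FROM names GROUP BY name ORDER BY n DESC, name"
).fetchall()

# Keep numerator and denominator separate so the ratio can be shown as a
# fraction, and also compute it as a decimal.
labels = [f"{name} | {n}/{total}" for name, n, total in rows]
decimals = {name: n / total for name, n, total in rows}

for label in labels:
    print(label)
```

Returning the two counts separately sidesteps integer division in the database and leaves the presentation of the fraction to the caller.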
