Calculate average with assigned values SQL - sql

Assume I have a table with the following columns:
assignment_id, employee_id, employee_salary and employee_performance
(assignment_id - int, employee_id - int, employee_salary - int, employee_performance - varchar).
The column employee_performance contains the following values: excellent, good, average, bad, null.
I want to assign integers to these values and calculate the average performance of each employee across different assignments. How can I do that?
For example, an employee has completed two assignments with the results excellent and good. I assign 10 to excellent and 9 to good and get an average of 9.5.
assignment_id  employee_id  employee_salary  employee_performance
1              1            100,000          excellent
2              1            100,000          excellent
3              1            100,000          good
4              4            50,000           good
5              3            75,000           null
Null means that an assignment is not yet completed.
I want to assign integers to employee_performance. For example, excellent - 10, good - 9, etc.
The result should be:
employee_id  average_performance
1            9.7
4            9
employee_id = 3 is not included, as he does not have any completed assignments.

You can use case. Something like this:
select employee_id,
       avg(case when employee_performance = 'excellent' then 4.0
                when employee_performance = 'good' then 3.0
                when employee_performance = 'average' then 2.0
                when employee_performance = 'bad' then 1.0
           end) as avg_performance
from table
group by employee_id;
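A minimal sketch of the CASE approach, run against the question's sample rows. It assumes SQLite through Python's sqlite3 module, a table named assignments (a made-up name), SQL NULL for unfinished assignments, and the scores 10 and 9 from the question; the scores for average and bad are invented for illustration. The WHERE clause is an addition that leaves out employees with no completed assignment.

```python
import sqlite3

# Sample rows from the question; None stands for an unfinished assignment.
ROWS = [
    (1, 1, 100000, "excellent"),
    (2, 1, 100000, "excellent"),
    (3, 1, 100000, "good"),
    (4, 4, 50000, "good"),
    (5, 3, 75000, None),
]

QUERY = """
SELECT employee_id,
       AVG(CASE employee_performance
               WHEN 'excellent' THEN 10.0
               WHEN 'good'      THEN 9.0
               WHEN 'average'   THEN 8.0  -- assumed score
               WHEN 'bad'       THEN 7.0  -- assumed score
           END) AS avg_performance
FROM assignments                          -- assumed table name
WHERE employee_performance IS NOT NULL    -- skip unfinished assignments
GROUP BY employee_id
ORDER BY employee_id
"""


def average_performance(rows):
    """Load the rows into an in-memory table and return (employee_id, average) pairs."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE assignments (assignment_id INTEGER, employee_id INTEGER, "
        "employee_salary INTEGER, employee_performance TEXT)"
    )
    conn.executemany("INSERT INTO assignments VALUES (?, ?, ?, ?)", rows)
    result = conn.execute(QUERY).fetchall()
    conn.close()
    return result
```

Calling average_performance(ROWS) returns one row for employee 1 and one for employee 4; employee 3 has only an unfinished assignment and is filtered out.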

You can create a two-column table that maps each performance (excellent, good, etc.) to its corresponding value.
performance  value
-----------  -----
excellent    10
good         9
Once you have that table, a simple join and group by gives the average value per employee. An inner join also leaves out assignments whose performance is still null.
select e.employee_id, avg(pv.value)
from employees e
join performance_values pv
  on pv.performance = e.employee_performance
group by e.employee_id
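The lookup-table idea can be tried out with a sketch like the following. It assumes SQLite via Python's sqlite3 module, a made-up assignments table holding the question's sample rows, and SQL NULL for unfinished assignments. It runs the join twice to show the difference between LEFT JOIN and an inner JOIN for employees without a completed assignment.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE assignments (assignment_id INTEGER, employee_id INTEGER,
                          employee_performance TEXT);   -- assumed table name
INSERT INTO assignments VALUES
    (1, 1, 'excellent'), (2, 1, 'excellent'), (3, 1, 'good'),
    (4, 4, 'good'), (5, 3, NULL);
CREATE TABLE performance_values (performance TEXT PRIMARY KEY, value INTEGER);
INSERT INTO performance_values VALUES ('excellent', 10), ('good', 9);
""")

TEMPLATE = """
SELECT a.employee_id, AVG(pv.value) AS avg_performance
FROM assignments a
{join} performance_values pv ON pv.performance = a.employee_performance
GROUP BY a.employee_id
ORDER BY a.employee_id
"""

# LEFT JOIN keeps employee 3, with a NULL average (None in Python).
left_rows = conn.execute(TEMPLATE.format(join="LEFT JOIN")).fetchall()
# An inner JOIN drops rows with no matching performance value.
inner_rows = conn.execute(TEMPLATE.format(join="JOIN")).fetchall()
conn.close()
```

Which join to pick depends on whether employees without a completed assignment should appear in the result at all.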

Related

Solving Logical Questions Using SQL

I am trying to solve a problem for a fun work exercise showing that SQL can be used to solve it. It is a puzzle that goes as follows:
Successfully navigating the waters during sea voyages is a challenging task. A captain’s most important decision is selecting the right crew for the voyage. A mix of different skill sets are required to sail the ship efficiently, navigate to the destination, and fish for food along the way.
Table 1 shows a list of crew members that are available for you to hire for the voyage. Each crew member demands a salary for the voyage and has different skill levels of Fishing, Sailing, and Navigation.
In order for your journey to be successful, you must have a cumulative skill of 15 or more in each of the three skill categories from all of your chosen crew members. You may choose as many crew members as you like.
Question: What is the minimum achievable cost for the voyage?
I would describe myself as an intermediate to advanced (depending on the situation) SQL user.
I am not asking for an answer per se, but I have thought about the best way to solve it, and my first idea was to use a WHILE loop in some way. I have created a table to hold the data and added a 'SALARY_RANK' column (below). Does anyone have tips or suggestions on routes to take? I would like to use something I have never used before, but I am also trying to get to the most efficient answer.
Here is the data (I added the last column):
NAME FISHING SAILING NAVIGATION SALARY SALARY_RANK
---------- ----------- ----------- ----------- ----------- -----------
Amy 3 5 1 46000 3
Bill 1 2 5 43000 2
Carl 3 4 2 47000 4
Dan 4 3 1 36000 1
Eva 4 2 2 43000 2
Fred 1 3 4 55000 5
Greg 3 1 5 68000 8
Henry 5 4 2 64000 7
Ida 3 3 3 60000 6
(9 rows affected)
This is a CTE version, where I first create test data, then run a recursive query, using a MaxID to prevent it doing all the permutations.
declare @t table(Id int, NAME varchar(10), FISHING int, SAILING int, NAVIGATION int, SALARY int)
insert @t values (1,'Amy',3,5,1,46000)
,(2,'Bill',1,2,5,43000)
,(3,'Carl',3,4,2,47000)
,(4,'Dan',4,3,1,36000)
,(5,'Eva',4,2,2,43000)
,(6,'Fred',1,3,4,55000)
,(7,'Greg',3,1,5,68000)
,(8,'Henry',5,4,2,64000)
,(9,'Ida',3,3,3,60000)
;with cte as (
    -- anchor: every crew member on their own
    select convert(varchar(1000),name) as crew, fishing, sailing, navigation, salary, ID as MaxID from @t
    union all
    -- recursion: only add members with a higher ID, so each combination is built once
    select convert(varchar(1000),cte.crew+', '+ t.name), cte.fishing+t.fishing, cte.sailing+t.sailing, cte.navigation+t.navigation, cte.salary+t.salary, t.ID
    from @t t
    join cte on t.ID>cte.MaxID
)
select top 1 crew,fishing,sailing,navigation,salary
from cte
where fishing>=15 and sailing>=15 and navigation>=15
order by salary
The result is:
crew                          fishing  sailing  navigation  salary
Amy, Bill, Carl, Greg, Henry  15       16       15          268000
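For anyone who wants to experiment without SQL Server, here is a rough port of the same recursive idea to SQLite, driven from Python's sqlite3 module. The table name crew_members and the function name cheapest_crew are made up for this sketch, and the T-SQL specifics (convert, top 1, string concatenation with +) are swapped for their SQLite equivalents.

```python
import sqlite3

# (id, name, fishing, sailing, navigation, salary) as listed in the question.
CREW = [
    (1, "Amy", 3, 5, 1, 46000),
    (2, "Bill", 1, 2, 5, 43000),
    (3, "Carl", 3, 4, 2, 47000),
    (4, "Dan", 4, 3, 1, 36000),
    (5, "Eva", 4, 2, 2, 43000),
    (6, "Fred", 1, 3, 4, 55000),
    (7, "Greg", 3, 1, 5, 68000),
    (8, "Henry", 5, 4, 2, 64000),
    (9, "Ida", 3, 3, 3, 60000),
]

QUERY = """
WITH RECURSIVE cte(crew, fishing, sailing, navigation, salary, max_id) AS (
    -- anchor: every crew member on their own
    SELECT name, fishing, sailing, navigation, salary, id FROM crew_members
    UNION ALL
    -- recursion: extend a crew only with members that have a higher id,
    -- so each combination is generated exactly once
    SELECT cte.crew || ', ' || t.name,
           cte.fishing + t.fishing,
           cte.sailing + t.sailing,
           cte.navigation + t.navigation,
           cte.salary + t.salary,
           t.id
    FROM crew_members AS t
    JOIN cte ON t.id > cte.max_id
)
SELECT crew, fishing, sailing, navigation, salary
FROM cte
WHERE fishing >= 15 AND sailing >= 15 AND navigation >= 15
ORDER BY salary
LIMIT 1
"""


def cheapest_crew(members):
    """Return (crew, fishing, sailing, navigation, salary) for the cheapest valid crew."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE crew_members (id INTEGER, name TEXT, fishing INTEGER, "
        "sailing INTEGER, navigation INTEGER, salary INTEGER)"
    )
    conn.executemany("INSERT INTO crew_members VALUES (?, ?, ?, ?, ?, ?)", members)
    row = conn.execute(QUERY).fetchone()
    conn.close()
    return row
```

With nine candidates the CTE enumerates 511 non-empty combinations, which is tiny; the number doubles with every extra candidate, so this brute-force style only suits small inputs.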

Querying 100k records to 5 records

I need to join two tables, one with more than 100k records and the other with just 5 records, as shown below.
Employee
id    Name   deptid
1     Jane   1
2     Jack   2
3     Dane   1
4     Drack  3
5     Drim
6     Drum   5
7     Krack
8     Kery   4
.
.
100k
Dept
deptid  Name
1       Science
2       Maths
3       Biology
4       Social
5       Zoology
Result
Name   deptid  Name
Jane   1       Science
Dane   1       Science
Jack   2       Maths
Drack  3       Biology
Kery   4       Social
Drum   5       Zoology
Which join should be used so that the query performs well and returns the result shown?
I only want to join the employees that have a dept to the other table. I came up with the query below, but wanted to know whether there is a better way to do it.
Select A.name, d.deptid, d.Name from
(Select name, deptid from Employee where deptid IS NOT NULL) A,
dept d where A.deptid = d.deptid;
Firstly, I'm not sure why you are writing your query that way. It should be more like
SELECT A.name, D.deptid, D.Name
FROM Employee A
INNER JOIN dept D
    ON A.deptid = D.deptid
There is no need for the IS NOT NULL condition: an inner join on deptid already excludes rows where deptid is NULL.
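A small check of that claim, as a sketch: it loads the question's eight sample employees and five departments into SQLite (via Python's sqlite3 module) and runs the plain inner join. The ORDER BY is added here only to make the output order predictable.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Employee (id INTEGER PRIMARY KEY, Name TEXT, deptid INTEGER);
CREATE TABLE Dept (deptid INTEGER PRIMARY KEY, Name TEXT);
INSERT INTO Employee VALUES
    (1, 'Jane', 1), (2, 'Jack', 2), (3, 'Dane', 1), (4, 'Drack', 3),
    (5, 'Drim', NULL), (6, 'Drum', 5), (7, 'Krack', NULL), (8, 'Kery', 4);
INSERT INTO Dept VALUES
    (1, 'Science'), (2, 'Maths'), (3, 'Biology'), (4, 'Social'), (5, 'Zoology');
""")

# No IS NOT NULL filter: NULL = anything is never true, so Drim and Krack
# find no matching department and drop out of the inner join on their own.
joined = conn.execute("""
SELECT A.Name, D.deptid, D.Name
FROM Employee A
INNER JOIN Dept D ON A.deptid = D.deptid
ORDER BY D.deptid, A.id
""").fetchall()
conn.close()
```

The six rows in joined match the Result table from the question, with Drim and Krack absent.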
If this is a one-time or occasional task and performance is key (not a permanent query in your DB), you can leave out the join altogether and do it using CASE:
SELECT
    A.name, A.deptid,
    CASE
        WHEN A.deptid = 1 THEN 'Science'
        WHEN A.deptid = 2 THEN 'Maths'
        ...[etc for the other 3 departments]...
    END as Name
FROM Employee A
If this is to be permanent and performance is key, simply try adding an INDEX on the foreign key deptid in the Employee table and use my first query above.

How do you determine the average total of a column in Postgresql?

Consider the following Postgresql database table:
id | book_id | author_id
---------------------------
1 | 1 | 1
2 | 2 | 1
3 | 3 | 2
4 | 4 | 2
5 | 5 | 2
6 | 6 | 3
7 | 7 | 2
In this example, Author 1 has written 2 books, Author 2 has written 4 books, and Author 3 has written 1 book. How would I determine the average number of books written by an author using SQL? In other words, I'm trying to get, "An author has written an average of 2.3 books".
Thus far, attempts with AVG and COUNT have failed me. Any thoughts?
select avg(totalbooks) from
(select count(1) totalbooks from books group by author_id) bookcount
I think your example data actually only has 3 books for author id 2, so this would not return 2.3
http://sqlfiddle.com/#!15/3e36e/1
With the 4th book:
http://sqlfiddle.com/#!15/67eac/1
You'll need a subquery. The inner query will count the books with GROUP BY author; the outer query will scan the results of the inner query and avg them.
You can use a subquery in the FROM clause for this, or you can use a CTE (WITH expression).
For an average number of books per author you can do simply:
SELECT 1.0*COUNT(DISTINCT book_id)/count(DISTINCT author_id) FROM tbl;
For number of books per author:
SELECT 1.0*COUNT(DISTINCT book_id)/count(DISTINCT author_id)
FROM tbl GROUP BY author_id;
We need the 1.0 factor to keep the result from being truncated to an integer.
You can remove DISTINCT depending on the result you want (it matters only if one book has many authors).
As Craig Ringer rightly pointed out, two DISTINCTs may be expensive. To test performance I generated 50 000 rows and got the following results:
My query with 2 DISTINCTS: ~70ms
My query with 1 DISTINCT: ~40ms
Martin Booth's approach: ~30ms
Then I added 1 million rows and tested again:
My query with 2 DISTINCTS: ~1520ms
My query with 1 DISTINCT: ~820ms
Martin Booth's approach: ~1060ms
Then I added another 9 million rows and tested again:
My query with 2 DISTINCTS: ~17s
My query with 1 DISTINCT: ~11s
Martin Booth's approach: ~19s
So there is no universal solution.
This should work:
SELECT AVG(cnt) FROM (
SELECT COUNT(*) cnt FROM t
GROUP BY author_id
) s
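To compare the two styles of answer on the question's seven sample rows, here is a short sketch using SQLite through Python's sqlite3 module. The table name books is taken from one of the answers; both queries should agree as long as each row is a distinct book with a single author.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE books (id INTEGER PRIMARY KEY, book_id INTEGER, author_id INTEGER);
INSERT INTO books VALUES
    (1, 1, 1), (2, 2, 1), (3, 3, 2), (4, 4, 2), (5, 5, 2), (6, 6, 3), (7, 7, 2);
""")

# Style 1: count books per author in a subquery, then average the counts.
avg_by_subquery = conn.execute("""
SELECT AVG(cnt)
FROM (SELECT COUNT(*) AS cnt FROM books GROUP BY author_id) AS s
""").fetchone()[0]

# Style 2: total distinct books divided by total distinct authors.
avg_by_ratio = conn.execute("""
SELECT 1.0 * COUNT(DISTINCT book_id) / COUNT(DISTINCT author_id) FROM books
""").fetchone()[0]
conn.close()
```

Both values come out to 7 books over 3 authors, which rounds to the 2.3 the question asks for.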

SQL: Select all from column A and add a value from column B if present

I have what is probably an easy SQL problem, but I can't word it properly (so I didn't find anything on Google, and my title is probably misleading).
The setup: I have a big table containing transaction information in the form (ID, EmployeeID, Date, Value) (and some more columns, but only those matter here) and a list of all EmployeeIDs. What I want is a result table showing all employee IDs with their aggregated transaction value in a given timespan.
The problem is: how do I get the employees that have no entry for the given time period into the result table?
e.g.
ID  EMPLID  DATE        VALUE
1   1       2013-01-01  1000
2   2       2013-02-02  2000
3   1       2013-01-03  3000
4   2       2013-04-01  2000
5   2       2013-03-01  2000
6   1       2013-02-01  4000
EMPLID  NAME
1       bob
2       alice
And now I want the aggregated value of all transactions on or after 2013-03-01, like this:
EMPLID  VALUE
1       0      <- how to get this based on the employee table?
2       4000
The database server in use is Firebird, and I connect to it through JDBC (if that matters).
SELECT a.EmpID, a.Name,
       COALESCE(SUM(b.Value), 0) TotalValue
FROM Employee a
LEFT JOIN Transactions b
    ON a.EmpID = b.EmpID AND
       b.Date >= '2013-03-01'
GROUP BY a.EmpID, a.Name
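The placement of the date condition matters here. The sketch below reproduces the sample data in SQLite (through Python's sqlite3 module, not Firebird) and uses the question's EMPLID column name; it runs the query once with the date test in the ON clause and once, for contrast, with the same test moved to WHERE.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Employee (EMPLID INTEGER PRIMARY KEY, NAME TEXT);
CREATE TABLE Transactions (ID INTEGER PRIMARY KEY, EMPLID INTEGER,
                           Date TEXT, Value INTEGER);
INSERT INTO Employee VALUES (1, 'bob'), (2, 'alice');
INSERT INTO Transactions VALUES
    (1, 1, '2013-01-01', 1000), (2, 2, '2013-02-02', 2000),
    (3, 1, '2013-01-03', 3000), (4, 2, '2013-04-01', 2000),
    (5, 2, '2013-03-01', 2000), (6, 1, '2013-02-01', 4000);
""")

# Date test inside ON: employees without matching rows survive with a NULL sum,
# which COALESCE turns into 0.
left_join_rows = conn.execute("""
SELECT a.EMPLID, a.NAME, COALESCE(SUM(b.Value), 0) AS TotalValue
FROM Employee a
LEFT JOIN Transactions b
    ON a.EMPLID = b.EMPLID AND b.Date >= '2013-03-01'
GROUP BY a.EMPLID, a.NAME
ORDER BY a.EMPLID
""").fetchall()

# Date test in WHERE: it runs after the join and removes all of bob's rows.
where_rows = conn.execute("""
SELECT a.EMPLID, a.NAME, COALESCE(SUM(b.Value), 0) AS TotalValue
FROM Employee a
LEFT JOIN Transactions b ON a.EMPLID = b.EMPLID
WHERE b.Date >= '2013-03-01'
GROUP BY a.EMPLID, a.NAME
ORDER BY a.EMPLID
""").fetchall()
conn.close()
```

Here left_join_rows contains bob with a total of 0, while where_rows contains only alice.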
To learn more about joins, see the link below:
Visual Representation of SQL Joins

selecting top N rows for each group in a table

I am facing a very common issue regarding "Selecting top N rows for each group in a table".
Consider a table with id, name, hair_colour, score columns.
I want a result set that gives me, for each hair colour, the names of the top 3 scorers.
To solve this I found exactly what I need in Rick Osborne's blog post "sql-getting-top-n-rows-for-a-grouped-query".
However, that solution doesn't work as expected when scores are equal.
In the example above, the result is as follows.
id  name     hair      score  ranknum
---------------------------------
12  Kit      Blonde    10     1
9   Becca    Blonde    9      2
8   Katie    Blonde    8      3
3   Sarah    Brunette  10     1
4   Deborah  Brunette  9      2       <- if this score were also 10
1   Kim      Brunette  8      3
Consider the row 4 Deborah Brunette 9 2. If Deborah also had the same score (10) as Sarah, then ranknum would be 2,2,3 for the "Brunette" hair type.
What's the solution to this?
If you're using SQL Server 2005 or newer, you can use the ranking functions and a CTE to achieve this:
;WITH HairColors AS
(SELECT id, name, hair, score,
        ROW_NUMBER() OVER(PARTITION BY hair ORDER BY score DESC) as 'RowNum'
 FROM girls  -- table name assumed; the question does not name the table
)
SELECT id, name, hair, score
FROM HairColors
WHERE RowNum <= 3
This CTE will "partition" your data by the value of the hair column; each partition is then ordered by score (descending) and gets a row number: the highest score in each partition is 1, then 2, etc.
So if you want the top 3 of each group, select only those rows from the CTE that have a RowNum of 3 or less (1, 2, 3) --> there you go!
The way the algorithm comes up with the rank is to count the number of rows in the cross-product whose score is equal to or greater than that of the girl in question. Hence in the problem case you're talking about, Sarah's grid would look like
a.name | a.score | b.name | b.score
-------+---------+---------+--------
Sarah | 9 | Sarah | 9
Sarah | 9 | Deborah | 9
and similarly for Deborah, which is why both girls get a rank of 2 here.
The problem is that when there's a tie, this count gives all tied girls the worst position in the tied range (2 here), when you'd want them to take the best position (1) instead. I think a simple change can fix this:
Instead of a greater-than-or-equal comparison, use a strict greater-than comparison to count the number of girls who are strictly better. Then add one to that and you have your rank (which deals with ties appropriately). So the inner select would be:
SELECT a.id, COUNT(b.id) + 1 AS ranknum
FROM girl AS a
-- LEFT JOIN keeps top scorers, who have no strictly better row to match
LEFT JOIN girl AS b ON (a.hair = b.hair) AND (a.score < b.score)
GROUP BY a.id
HAVING COUNT(b.id) < 3
Can anyone see any problems with this approach that have escaped my notice?
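One way to probe this is to run the idea on data with a tie. The sketch below uses SQLite through Python's sqlite3 module, the question's six rows with Deborah's score raised to 10, and one invented extra row (Lisa) to exercise the top-3 cutoff. It uses a LEFT JOIN with COUNT(b.id), because with a strict comparison and an INNER JOIN the top scorers match no row and would drop out of the result.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE girls (id INTEGER PRIMARY KEY, name TEXT, hair TEXT, score INTEGER);
INSERT INTO girls VALUES
    (12, 'Kit', 'Blonde', 10), (9, 'Becca', 'Blonde', 9), (8, 'Katie', 'Blonde', 8),
    (3, 'Sarah', 'Brunette', 10),
    (4, 'Deborah', 'Brunette', 10),  -- tied with Sarah
    (1, 'Kim', 'Brunette', 8),
    (5, 'Lisa', 'Brunette', 7);      -- hypothetical row, not in the question
""")

# rank = 1 + number of rows in the same hair group with a strictly higher score.
# COUNT(b.id) ignores the NULLs that LEFT JOIN produces for top scorers.
top3 = conn.execute("""
SELECT a.id, a.name, a.hair, a.score, COUNT(b.id) + 1 AS ranknum
FROM girls AS a
LEFT JOIN girls AS b ON a.hair = b.hair AND a.score < b.score
GROUP BY a.id, a.name, a.hair, a.score
HAVING COUNT(b.id) < 3
ORDER BY a.hair, ranknum, a.id
""").fetchall()
conn.close()
```

Sarah and Deborah both come out with rank 1, Kim keeps rank 3, and Lisa (three strictly better rows) is cut off.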
Use this compound select, which handles the OP's problem properly:
SELECT g.* FROM girls AS g
WHERE g.score > IFNULL( (SELECT g2.score FROM girls AS g2
    WHERE g.hair=g2.hair ORDER BY g2.score DESC LIMIT 3,1), 0)
Note that you need IFNULL here to handle the case where the girls table has fewer rows for some hair type than the number we want in the result (3 in the OP's case).