I am trying to write a query that takes a list of employee_id values, finds duplicate book purchases (book_id), and calculates the associated cost savings (list_price). If a duplicate exists, the query needs to sum the prices of the duplicate book_id rows.
So if someone has a book costing $10 associated with their employee_id and the book is offered to them again, they don't have to buy it and there is a savings of $10. If that happens again, there's a savings of $20.
I tried a having > 1 clause, but I can't seem to get the query to sum the savings accurately.
Any help is appreciated.
To start,
select employee_id, book_id, count(*)
from book_purchases
group by employee_id, book_id
having count(*) > 1
gets you the list you need.
If we don't have to worry about the price changing, then we just add a column or two more to get:
select employee_id, book_id,
count(*) as copies_purchased,
sum(list_price) as total_spent,
count(*) - 1 as copies_unnecessarily_purchased,
(count(*) - 1) * avg(list_price) as amount_overspent
from book_purchases
group by employee_id, book_id
having count(*) > 1
Of course, you can join to the employee and book tables to get names and titles to flesh out the results a bit.
To get the total amount overspent by each employee, you could wrap the above query thusly:
select a.employee_id, sum(a.amount_overspent) as total_amount_overspent
from (
select employee_id, book_id,
count(*) as copies_purchased,
sum(list_price) as total_spent,
count(*) - 1 as copies_unnecessarily_purchased,
(count(*) - 1) * avg(list_price) as amount_overspent
from book_purchases
group by employee_id, book_id
having count(*) > 1
) as a
group by a.employee_id
Lastly, I went ahead and joined to an employee table that I presumed you have while I was at it:
select a.employee_id, emp.employee_name, sum(a.amount_overspent) as total_amount_overspent
from (
select employee_id, book_id,
count(*) as copies_purchased,
sum(list_price) as total_spent,
count(*) - 1 as copies_unnecessarily_purchased,
(count(*) - 1) * avg(list_price) as amount_overspent
from book_purchases
group by employee_id, book_id
having count(*) > 1
) as a
inner join employee as emp on emp.employee_id = a.employee_id
group by a.employee_id, emp.employee_name
To be clear, these aren't four separate queries; they're just intermediate stages in building the single query you see at the end.
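If you want to sanity-check the arithmetic, here is a minimal sketch that runs the per-employee total against an in-memory SQLite database through Python's sqlite3 module. The sample rows are made up for illustration, and it assumes, as above, that the price of a book does not change between purchases.

```python
import sqlite3

# Hypothetical sample data; table and column names follow the question.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE book_purchases (employee_id INTEGER, book_id INTEGER, list_price REAL)"
)
conn.executemany(
    "INSERT INTO book_purchases VALUES (?, ?, ?)",
    [
        (1, 100, 10.0),  # first copy: a real purchase
        (1, 100, 10.0),  # duplicate: 10 saved
        (1, 100, 10.0),  # duplicate again: 20 saved in total
        (1, 200, 25.0),  # bought once, no savings
        (2, 100, 10.0),  # bought once, no savings
        (2, 300, 15.0),
        (2, 300, 15.0),  # duplicate: 15 saved
    ],
)

# Inner query: savings per (employee, book); outer query: total per employee.
savings = conn.execute(
    """
    SELECT a.employee_id, SUM(a.amount_overspent) AS total_amount_overspent
    FROM (
        SELECT employee_id, book_id,
               (COUNT(*) - 1) * AVG(list_price) AS amount_overspent
        FROM book_purchases
        GROUP BY employee_id, book_id
        HAVING COUNT(*) > 1
    ) AS a
    GROUP BY a.employee_id
    ORDER BY a.employee_id
    """
).fetchall()

print(savings)
```

With this data, employee 1 saves 20 (two repeat copies of a $10 book) and employee 2 saves 15, which matches the $10-then-$20 example in the question.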
I hope this helps.
Related
I have 1 table called day_shift, with columns: employee_id, shop_id and date.
I need a SELECT query that returns the shop_id for each employee_id, ordered by the highest number of date records. The main idea: an employee can work in any shop, and the program adds a day shift per shop_id and date, but the employee will be assigned to the department in which they appear most often.
My current query returns just the first record in the table for an employee_id:
SELECT TOP 1 shop_id FROM day_shift WHERE employee_id = ?1 ORDER BY date desc
How do I get the shop_id with the most frequent records for an employee?
[EDIT 1]: The table also contains an id column, but I don't use it in the query.
"to get shop_id for each employee_id ordered by the highest amount of date-records."
I think you are using SQL Server. In any case, building on your query syntax, you would use:
SELECT TOP (1) WITH TIES employee_id, shop_id, date, COUNT(*)
FROM day_shift
GROUP BY employee_id, shop_id, date
ORDER BY ROW_NUMBER() OVER (PARTITION BY employee_id ORDER BY COUNT(*) DESC);
If you want this just for one employee:
SELECT TOP (1) employee_id, shop_id, date, COUNT(*)
FROM day_shift
WHERE employee_id = ?
GROUP BY employee_id, shop_id, date
ORDER BY COUNT(*) DESC;
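For readers without SQL Server at hand, here is a rough sketch of the same idea in portable SQL, run through Python's sqlite3 module. It rests on an assumption that is mine, not the asker's: "most frequent" means the shop with the most day_shift rows for the employee, so the grouping is by employee_id and shop_id only. The sample rows are invented, and ROW_NUMBER() picks one shop arbitrarily if two shops tie.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE day_shift (employee_id INTEGER, shop_id INTEGER, date TEXT)")
conn.executemany(
    "INSERT INTO day_shift VALUES (?, ?, ?)",
    [
        (1, 10, "2020-01-01"),
        (1, 10, "2020-01-02"),
        (1, 20, "2020-01-03"),
        (2, 20, "2020-01-01"),
        (2, 30, "2020-01-02"),
        (2, 30, "2020-01-03"),
    ],
)

# Count shifts per (employee, shop), then keep the top-ranked shop per employee.
top_shops = conn.execute(
    """
    SELECT employee_id, shop_id, shift_count
    FROM (
        SELECT employee_id, shop_id, shift_count,
               ROW_NUMBER() OVER (
                   PARTITION BY employee_id ORDER BY shift_count DESC
               ) AS rn
        FROM (
            SELECT employee_id, shop_id, COUNT(*) AS shift_count
            FROM day_shift
            GROUP BY employee_id, shop_id
        ) AS counts
    ) AS ranked
    WHERE rn = 1
    ORDER BY employee_id
    """
).fetchall()

print(top_shops)
```

Swapping ROW_NUMBER() for RANK() would return every tied shop instead of a single one, which is closer to what WITH TIES does.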
Let's say I have a table orders with 20 columns. I'm only interested in the first 4 columns: id, department_id, region_id, datetime, where id is a customer id and datetime is the time the customer placed an order. The other columns are more specific to product details (e.g. product_id), so a given order may have multiple rows. I'm struggling to write a query that returns the earliest department and region for each customer, since the same customer can have multiple combinations of department_id and region_id.
SELECT a.*
FROM (
SELECT id,
department_id,
region_id,
min(DATETIME) AS ts
FROM orders
GROUP BY id,
department_id,
region_id
) a
INNER JOIN (
SELECT id,
min(DATETIME) AS ts
FROM orders
GROUP BY id
) b
ON a.id = b.id
AND a.ts = b.ts
This seems to work, but it seems inefficient and poorly written. Is there a better way to write this? The table itself is fairly large, so this query is slow.
I would just do:
SELECT id, department_id, region_id, datetime
FROM (SELECT o.*,
row_number() over (partition by id order by datetime) as seqnum
FROM orders o
) o
where seqnum = 1;
EDIT:
You can try this version to see if it works better:
select o.*
from orders o join
(select id, min(datetime) as min_datetime
from orders
group by id
) oo
on oo.id = o.id and oo.min_datetime = o.datetime;
In most databases, the row_number() version would probably have better performance. However, Hive can make arcane optimization decisions and this might be better.
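To compare the two shapes side by side, here is a small sketch using Python's sqlite3 module rather than Hive, so it says nothing about performance. The rows are invented, and the join variant adds DISTINCT (my addition) because an order with several product rows would otherwise come back once per product row.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (id INTEGER, department_id INTEGER, region_id INTEGER,"
    " datetime TEXT, product_id INTEGER)"
)
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?, ?, ?)",
    [
        (1, 5, 7, "2021-01-01 09:00", 100),  # earliest order, product row 1
        (1, 5, 7, "2021-01-01 09:00", 101),  # earliest order, product row 2
        (1, 6, 8, "2021-02-01 10:00", 102),
        (2, 6, 7, "2021-03-01 08:00", 103),
        (2, 5, 7, "2021-01-15 12:00", 104),
    ],
)

# Variant 1: number each customer's rows by time and keep the first one.
by_row_number = conn.execute(
    """
    SELECT id, department_id, region_id, datetime
    FROM (
        SELECT o.*,
               ROW_NUMBER() OVER (PARTITION BY id ORDER BY datetime) AS seqnum
        FROM orders o
    ) AS numbered
    WHERE seqnum = 1
    ORDER BY id
    """
).fetchall()

# Variant 2: join back to the per-customer minimum datetime.
by_join = conn.execute(
    """
    SELECT DISTINCT o.id, o.department_id, o.region_id, o.datetime
    FROM orders o
    JOIN (
        SELECT id, MIN(datetime) AS min_datetime
        FROM orders
        GROUP BY id
    ) oo ON oo.id = o.id AND oo.min_datetime = o.datetime
    ORDER BY o.id
    """
).fetchall()

print(by_row_number)
print(by_join)
```

Both variants agree on this data. They can differ when a customer has several department and region combinations at the same earliest datetime: row_number() keeps one of them arbitrarily, while the join keeps all of them.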
I think you maybe could use having like this:
SELECT id, department_id, region_id, min(datetime) AS ts
FROM orders
GROUP BY id, department_id, region_id
HAVING ts=min(datetime)
Use dense_rank() analytic function:
SELECT
id,
department_id,
region_id,
min(DATETIME) AS ts
FROM
(
SELECT id,
department_id,
region_id,
DATETIME,
dense_rank() over(partition by id order by DATETIME) AS rnk
FROM orders
)s
WHERE rnk=1 --records with minimal date by id
GROUP BY id,
department_id,
region_id;
This query does the same as yours, but the table will be scanned once, without join.
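A quick way to see how dense_rank() treats ties is to run the query on a few invented rows. The sketch below uses Python's sqlite3 module instead of Hive, and only the four columns of interest are modelled.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (id INTEGER, department_id INTEGER, region_id INTEGER, datetime TEXT)"
)
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?, ?)",
    [
        (1, 5, 7, "2021-01-01 09:00"),
        (1, 6, 7, "2021-01-01 09:00"),  # same earliest time, other department
        (1, 5, 8, "2021-02-01 10:00"),
        (2, 6, 7, "2021-01-15 12:00"),
        (2, 6, 7, "2021-01-15 12:00"),  # second product row of the same order
    ],
)

# Every row sharing the customer's earliest datetime gets rnk = 1;
# GROUP BY then collapses repeated (department, region) combinations.
earliest = conn.execute(
    """
    SELECT id, department_id, region_id, MIN(datetime) AS ts
    FROM (
        SELECT id, department_id, region_id, datetime,
               DENSE_RANK() OVER (PARTITION BY id ORDER BY datetime) AS rnk
        FROM orders
    ) s
    WHERE rnk = 1
    GROUP BY id, department_id, region_id
    ORDER BY id, department_id
    """
).fetchall()

print(earliest)
```

Customer 1 comes back with both departments that share the earliest datetime, and customer 2's repeated product rows collapse into one row, which is the same behaviour as the join in the question.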
Tables:
Department (dept_id,dept_name)
Students(student_id,student_name,dept_id)
I am using Oracle. I have to print the name of that department that has the minimum no. of students. Since I am new to SQL, I am stuck on this problem. So far, I have done this:
select d.department_id,d.department_name
from Department d
join Student s on s.department_id=d.department_id
where rownum between 1 and 3
group by d.department_id,d.department_name
order by count(s.student_id) asc;
The output is incorrect. It is coming as IT,SE,CSE whereas the output should be IT,CSE,SE! Is my query right? Or is there something missing in my query?
What am I doing wrong?
One of the possibilities:
select dept_id, dept_name
from (
select dept_id, dept_name,
rank() over (order by cnt nulls first) rn
from department
left join (select dept_id, count(1) cnt
from students
group by dept_id) using (dept_id) )
where rn = 1
First group the data from table students, then join table department, rank the counts, and take the first row(s).
A left join is used to guarantee that departments without students are also checked.
rank() is used in case two or more departments share the minimal number of students.
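Here is a rough sketch of the same steps on SQLite via Python's sqlite3 module, since not everyone has Oracle at hand. It is a variation, not the query above: instead of relying on NULLS FIRST, it counts s.student_id after the left join so that a department without students gets a count of 0. The department and student rows are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE department (dept_id INTEGER, dept_name TEXT)")
conn.execute(
    "CREATE TABLE students (student_id INTEGER, student_name TEXT, dept_id INTEGER)"
)
conn.executemany(
    "INSERT INTO department VALUES (?, ?)",
    [(1, "IT"), (2, "CSE"), (3, "SE"), (4, "EE")],
)
conn.executemany(
    "INSERT INTO students VALUES (?, ?, ?)",
    [(1, "Ann", 1), (2, "Bob", 1), (3, "Cid", 2)],  # SE and EE have no students
)

# COUNT(s.student_id) ignores the NULLs produced by the left join,
# so empty departments are counted as 0 and ranked first.
smallest = conn.execute(
    """
    SELECT dept_id, dept_name
    FROM (
        SELECT dept_id, dept_name, RANK() OVER (ORDER BY cnt) AS rn
        FROM (
            SELECT d.dept_id, d.dept_name, COUNT(s.student_id) AS cnt
            FROM department d
            LEFT JOIN students s ON s.dept_id = d.dept_id
            GROUP BY d.dept_id, d.dept_name
        ) AS counts
    ) AS ranked
    WHERE rn = 1
    ORDER BY dept_id
    """
).fetchall()

print(smallest)
```

Both empty departments are returned because rank() gives tied counts the same rank.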
To find the department(s) with the minimum number of students, you'll have to count per department ID and then take the ID(s) with the minimum count.
As of Oracle 12c this is simply:
select department_id
from student
group by department_id
order by count(*)
fetch first row with ties
You then select the departments with an ID in the found set.
select * from department where id in (<above query>);
In older versions you could use RANK instead to rank the departments by count:
select department_id, rank() over (order by count(*)) as rnk
from student
group by department_id
The rows with rnk = 1 would be the department IDs with the lowest count. So you could select the departments with:
select * from department where (id, 1) in (<above query>);
I have read answers to similar questions but I cannot find a solution to my particular problem.
I will use a simple example to demonstrate my question.
I have a table called 'Prizes' with two columns: Employees and Awards
The employee column lists the employee's ID and award shows a single award won by the employee. If an employee has won multiple awards their ID will be listed in multiple rows of the table along with each unique award.
The table would look as follows:
Employee AWARD
1 Best dressed
1 Most attractive
2 Biggest time waster
1 Most talkative
3 Hardest worker
4 Most shady
3 Most positive
3 Heaviest drinker
2 Most facebook friends
Using this table, how would I select the ID's of the employees who won the most awards?
The output should be:
Employee
1
3
This is because both of these employees won 3 awards.
Currently, the query below outputs the employee ID along with the number of awards they have won in descending order:
SELECT employee,COUNT(*) AS num_awards
FROM prizes
GROUP BY employee
ORDER BY num_awards DESC;
Would output:
employee num_awards
1 3
3 3
2 2
4 1
How could I change my query to select the employee(s) with the most awards?
A simple way to express this is using rank() or dense_rank():
SELECT p.*
FROM (SELECT employee, COUNT(*) AS num_awards,
RANK() OVER (ORDER BY COUNT(*) DESC) as seqnum
FROM prizes
GROUP BY employee
) p
WHERE seqnum = 1;
Being able to combine aggregation functions and analytic functions can make these queries much more concise.
You can use dense_rank to get all the rows with highest counts.
with cnts as (
SELECT employee, count(*) cnt
FROM prizes
GROUP BY employee)
, ranks as (select employee, cnt, dense_rank() over(order by cnt desc) rnk
from cnts)
select employee, cnt
from ranks where rnk = 1
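To check this against the table in the question, here is a small sketch that loads those nine rows into an in-memory SQLite database through Python's sqlite3 module and runs the CTE version. The ORDER BY at the end is only there to make the output order predictable.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE prizes (employee INTEGER, award TEXT)")
conn.executemany(
    "INSERT INTO prizes VALUES (?, ?)",
    [
        (1, "Best dressed"),
        (1, "Most attractive"),
        (2, "Biggest time waster"),
        (1, "Most talkative"),
        (3, "Hardest worker"),
        (4, "Most shady"),
        (3, "Most positive"),
        (3, "Heaviest drinker"),
        (2, "Most facebook friends"),
    ],
)

# Count awards per employee, rank the counts, keep everyone ranked first.
winners = conn.execute(
    """
    WITH cnts AS (
        SELECT employee, COUNT(*) AS cnt
        FROM prizes
        GROUP BY employee
    ),
    ranks AS (
        SELECT employee, cnt, DENSE_RANK() OVER (ORDER BY cnt DESC) AS rnk
        FROM cnts
    )
    SELECT employee, cnt
    FROM ranks
    WHERE rnk = 1
    ORDER BY employee
    """
).fetchall()

print(winners)
```

Employees 1 and 3 come back with 3 awards each, as the question expects.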
I want to find the largest sale for each of my employees (and display the name of the employee). In MySQL, it's pretty straightforward:
select *
from employee, sale
where employee.id = sale.employee_id
group by employee_id
order by sale.total desc
This does pretty much what one would expect: it returns a list of employees, with the largest sale record alongside each employee row.
But Oracle does not allow you to return columns that are not GROUP BY expressions when a GROUP BY clause is used. Does this make what I do in MySQL "impossible" in Oracle? Or is there some workaround? I suppose I could use some sort of subquery, but I'm not sure whether there is another way to do this that wouldn't be quite so complicated to construct.
Get rid of your select * and replace it with just the columns you need and then group by all the "non-processed" columns.
You'll end up with something like:
select employee.id, employee.name, max(sale.total)
from employee, sale
where employee.id = sale.employee_id
group by employee.id, employee.name
order by max(sale.total) desc
It's a pain - I've had to do this many times before - but just add all the related columns to your group by.
To get the largest sale you can use group by with the max function.
select e.name, max (s.total)
from employee e, sale s
where e.id = s.employee_id
group by e.name
order by max(s.total) desc
I have made an assumption that the employee name is in the name column of the employee table. I have also aliased the employee and sale tables.
If you would prefer to see the total sales for an employee, you can swap out max() and use sum() instead.
Congratulations, you've learned just enough to be dangerous!
What you really want is each employee's largest sale. Now it happens that sorting them by sales amount desc and then grouping them works in MySQL, even though that isn't legal according to ANSI SQL. (Basically, MySQL is arbitrarily grabbing the first row for each employee, and that "works" because of the sort.)
The right way to do this is not to rely on the side effect of the sort doing what you want; instead, you should explicitly ask for what you want: the largest sale for each employee. In SQL that's:
select employee.id, max( sale.total)
from employee, sale
where employee.id = sale.employee_id
group by employee.id
order by 2
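As an illustration of asking for the maximum explicitly, here is a minimal sketch run through Python's sqlite3 module. The employee and sale rows are invented, and the name column is an assumption carried over from the question. Every selected column is either in the GROUP BY or wrapped in an aggregate, which is the form Oracle requires.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employee (id INTEGER, name TEXT)")
conn.execute("CREATE TABLE sale (id INTEGER, employee_id INTEGER, total REAL)")
conn.executemany("INSERT INTO employee VALUES (?, ?)", [(1, "Ann"), (2, "Bob")])
conn.executemany(
    "INSERT INTO sale VALUES (?, ?, ?)",
    [(1, 1, 100.0), (2, 1, 250.0), (3, 2, 300.0), (4, 2, 50.0)],
)

# One row per employee, carrying that employee's largest sale total.
largest = conn.execute(
    """
    SELECT e.id, e.name, MAX(s.total) AS largest_sale
    FROM employee e
    JOIN sale s ON e.id = s.employee_id
    GROUP BY e.id, e.name
    ORDER BY largest_sale DESC
    """
).fetchall()

print(largest)
```

This returns only the maximum total, not the other columns of the sale row that produced it.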
If you want to select the employee with the highest sale, you don't need GROUP BY here at all.
All you need is to select the highest sale and join it back to the employees:
SELECT *
FROM (
SELECT sale.*, ROW_NUMBER() OVER (ORDER BY total DESC) AS rn
FROM sale
) s
JOIN employee e
ON e.id = s.employee_id
AND s.rn = 1
This will select the single row with the highest sale total.
If you want to select the highest sale per employee, just add a PARTITION BY clause to your query:
SELECT *
FROM (
SELECT sale.*, ROW_NUMBER() OVER (PARTITION BY employee_id ORDER BY total DESC) AS rn
FROM sale
) s
JOIN employee e
ON e.id = s.employee_id
AND s.rn = 1
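To see the per-employee version return whole sale rows, here is a small sketch using Python's sqlite3 module in place of Oracle. The tables and rows are invented for illustration, and the select list is narrowed to three columns to keep the output short.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employee (id INTEGER, name TEXT)")
conn.execute("CREATE TABLE sale (id INTEGER, employee_id INTEGER, total REAL)")
conn.executemany("INSERT INTO employee VALUES (?, ?)", [(1, "Ann"), (2, "Bob")])
conn.executemany(
    "INSERT INTO sale VALUES (?, ?, ?)",
    [(1, 1, 100.0), (2, 1, 250.0), (3, 2, 300.0), (4, 2, 50.0)],
)

# Number each employee's sales from largest to smallest, keep number 1,
# and join back to employee for the name.
top_sales = conn.execute(
    """
    SELECT e.name, s.id, s.total
    FROM (
        SELECT sale.*,
               ROW_NUMBER() OVER (
                   PARTITION BY employee_id ORDER BY total DESC
               ) AS rn
        FROM sale
    ) s
    JOIN employee e
      ON e.id = s.employee_id
     AND s.rn = 1
    ORDER BY e.id
    """
).fetchall()

print(top_sales)
```

Unlike a plain MAX(), this keeps the id of the winning sale, so any other column of that sale row is available too.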