I am working on an SQL task and I cannot figure out how to get the sum of two columns from the same table while displaying information from another table.
I have tried multiple things and have spent probably about two hours trying to figure this out.
I have two tables: Employees and Fuel. I displayed all of the employees' information. This is the first SQL statement I had to make:
SELECT firstname, lastname, title, registrationyear, make, model FROM Employees ORDER BY make;
My Employees table has the following columns: firstname, lastname, employeeid, make, model, registrationyear, title
My Fuel table has the following columns: currentprice, fueltype, fuelcost, mileage, mileagecount, fuelamount, employeeid, date
My instructions state: "A list that shows what cars the employees currently use (first SQL statement I made, so this one is DONE!)
Like the above report but also the total amount of kilometers that the employees have driven and the total fuel cost." (this is the task that I am trying to make a statement for)
I have tried using LIKE, UNION, UNION ALL, etc. and the best that I have been able to do is listing the employee information and the totals ON TOP of the information instead of in two separate columns of their own alongside the other data in the query.
I am really stuck here. Could anyone please help me?
This second task is much more complex than the first one.
First of all, combining in a single row the columns from two or more tables is what join is for, so you will have to join the two tables based on employeeid. This will return you a table like this
employeeid | other emp fields | fuel date | other fuel fields
1 | ... | 01/01/2017 | ...
1 | ... | 01/02/2017 | ...
2 | ... | 01/01/2017 | ...
2 | ... | 02/01/2017 | ...
2 | ... | 04/03/2017 | ...
From here, you want the data from each employee combined with the sum of the rows from fuel related to that employee, and that's what group by is for.
When using group by you define a set of columns that defines the grouping criteria; everything else in your select statement will have to be grouped somehow (in your case with a sum), so that the columns in the group by stay unique.
Your final query would look like this
select t1.firstname, t1.lastname, t1.title, t1.registrationyear, t1.make, t1.model,
sum(t2.mileage) as total_mileage,
sum(t2.fuelcost * t2.fuelamount) as total_fuel_cost
from Employees t1
join Fuel t2
on t1.employeeid = t2.employeeid
group by t1.firstname, t1.lastname, t1.title, t1.registrationyear, t1.make, t1.model
Note: I don't know the difference between mileage and mileagecount, so the part of my query involving those fields may need some tweaking.
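To see the join plus group by pattern end to end, here is a minimal sketch that runs it against an in-memory SQLite database through Python's sqlite3 module. The sample rows are invented, only a few of the columns are included, and it assumes fuelcost already holds the total cost of each fill-up, so it is summed directly; adjust this if your schema means something else.

```python
import sqlite3

# In-memory database with made-up sample data (not from the question).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Employees (employeeid INTEGER PRIMARY KEY, firstname TEXT, lastname TEXT,
                        title TEXT, registrationyear INTEGER, make TEXT, model TEXT);
CREATE TABLE Fuel (employeeid INTEGER, mileage REAL, fuelcost REAL);
INSERT INTO Employees VALUES (1, 'Ann', 'Lee', 'Driver', 2015, 'Volvo', 'V60'),
                             (2, 'Bob', 'Roy', 'Courier', 2012, 'Audi', 'A4');
INSERT INTO Fuel VALUES (1, 100, 20.0), (1, 150, 30.0), (2, 80, 15.5);
""")

# One output row per employee: the join pairs each employee with their
# fuel rows, and GROUP BY collapses those rows into sums.
totals = conn.execute("""
SELECT e.firstname, e.make,
       SUM(f.mileage)  AS total_mileage,
       SUM(f.fuelcost) AS total_fuel_cost   -- assumption: fuelcost is per fill-up
FROM Employees e
JOIN Fuel f ON f.employeeid = e.employeeid
GROUP BY e.employeeid, e.firstname, e.make
ORDER BY e.make
""").fetchall()

for row in totals:
    print(row)
```

Note that an inner join drops employees with no Fuel rows at all; a LEFT JOIN would keep them with NULL totals.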
You can use Inner join & Group By clause as mentioned below. Let me know if you mean something else.
SELECT A.firstname, A.lastname, A.title, A.registrationyear, A.make, A.model,
SUM(B.Column_Having_Kilometer_Driven_Value)
FROM
Employees A
INNER JOIN Fuel B ON A.EmployeeID = B.EmployeeID
Group By A.EmployeeID, A.firstname, A.lastname, A.title, A.registrationyear, A.make, A.model
Probably a newbie question, but most of my SQL Server experience is basic reporting, with all of my formatting and grouping being made somewhat manually in Excel. Now I am tackling a homework problem that I must solve everything within SQL...
I have a database with 2 tables:
Employees(id, job title, partnerID)
Bugs_Fixed(employeeID, bugs2010, bugs2011, bugs2012)
Each employee has one partner, who is on the same table (Like if an employee with ID 34 had partner ID 201, then ID 201 would have partnerID 34)
I need to essentially group those 2 together and calculate the combined total number of bugs they fixed (each year combined) without repeating the data for the inverse partner/employee relationship.
For example:
| Team | AMT |
| 34, 20 | 717 |
| 76, 16 | 576 |
| 102, 3 | 901 |
I've gotten the query to select based on id, then sum the # of bugs, but that is for each individual employee and it needs to be represented as a group.
SELECT employeeID, partnerID, SUM (bugs2010 + bugs2011 + bugs2012) as 'AMT'
FROM Bugs_Fixed
JOIN Employees on Employees.id = Bugs_Fixed.employeeID
GROUP BY employeeID, partnerID
It calculates the yearly bugfixes correctly, but obviously doesn't partner up the 2 ids and their combined total.
Edit: Clarified SQL Server
You might be able to address this by generating a concatenated key made of the partnerID and the employeeID. The trick is to order the IDs, like:
SELECT
CONCAT(GREATEST(employeeID, partnerID), ',', LEAST(employeeID, partnerID)) as team,
SUM(bugs2010 + bugs2011 + bugs2012) as 'AMT'
FROM Bugs_Fixed
JOIN Employees on Employees.id = Bugs_Fixed.employeeID
GROUP BY team
Notes (you did not tag the RDBMS that you are using):
LEAST() and GREATEST() are not supported by all RDBMS (notably, SQL Server did not support them before SQL Server 2022, while MySQL, Oracle and Postgres do).
Using a column alias in the GROUP BY clause (here, team) is not allowed in SQL Server, while MySQL, Postgres and SQLite do support it.
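Where GREATEST() and LEAST() are unavailable, the same "order the pair" idea can be written with CASE expressions, repeated in the GROUP BY instead of using an alias. A minimal sketch follows, run on SQLite via Python's sqlite3; the sample IDs and bug counts are made up for illustration.

```python
import sqlite3

# In-memory database with invented sample data: two mutual partner pairs.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Employees (id INTEGER PRIMARY KEY, partnerID INTEGER);
CREATE TABLE Bugs_Fixed (employeeID INTEGER, bugs2010 INTEGER,
                         bugs2011 INTEGER, bugs2012 INTEGER);
INSERT INTO Employees VALUES (34, 20), (20, 34), (76, 16), (16, 76);
INSERT INTO Bugs_Fixed VALUES (34, 1, 2, 3), (20, 4, 5, 6),
                              (76, 10, 0, 0), (16, 0, 20, 0);
""")

# Both members of a pair produce the same (hi_id, lo_id) key,
# so grouping on that key merges their totals into one row.
teams = conn.execute("""
SELECT CASE WHEN e.id > e.partnerID THEN e.id ELSE e.partnerID END AS hi_id,
       CASE WHEN e.id > e.partnerID THEN e.partnerID ELSE e.id END AS lo_id,
       SUM(b.bugs2010 + b.bugs2011 + b.bugs2012) AS amt
FROM Bugs_Fixed b
JOIN Employees e ON e.id = b.employeeID
GROUP BY CASE WHEN e.id > e.partnerID THEN e.id ELSE e.partnerID END,
         CASE WHEN e.id > e.partnerID THEN e.partnerID ELSE e.id END
ORDER BY hi_id
""").fetchall()

for team in teams:
    print(team)
```

The two ID columns can be concatenated into a single "team" string in whatever way the target database prefers.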
SELECT
ARRAY[e1.id, e2.id] AS team,
SUM(b.bugs2010) + SUM(b.bugs2011) + SUM(b.bugs2012) AS amt
FROM employees e1
INNER JOIN employees e2 ON e2.partnerID = e1.id AND e1.id < e2.id
INNER JOIN bugs_fixed b ON b.employeeId IN (e1.id, e2.id)
GROUP BY e1.id, e2.id
Hope this is the solution you are looking for:
select concat(str(e.id), ', ', str(e.partnerID)) as TEAM,
AMT = (select sum(bugs2010 + bugs2011 + bugs2012) from Bugs_Fixed
where employeeID in (e.id, e.partnerID))
from Employees e
where e.id < e.partnerID -- list each pair only once
I'm trying to get months of Employees' birthdays that are found in at least 2 rows
I've tried to unite the birthday information table with itself, supposing that I could iterate through the rows and get the months that appear multiple times
There's the question: how to get birthdays with months that repeat more than once?
SELECT DISTINCT e.EmployeeID, e.City, e.BirthDate
FROM Employees e
GROUP BY e.BirthDate, e.City, e.EmployeeID
HAVING COUNT(MONTH(b.BirthDate))=COUNT(MONTH(e.BirthDate))
UNION
SELECT DISTINCT b.EmployeeID, b.City, b.BirthDate
FROM Employees b
GROUP BY b.EmployeeID, b.BirthDate, b.City
HAVING ...
Given table:
| 1 | City1 | 1972-03-26|
| 2 | City2 | 1979-12-13|
| 3 | City3 | 1974-12-16|
| 4 | City3 | 1979-09-11|
Expected result :
| 2 | City2 |1979-12-13|
| 3 | City3 |1974-12-16|
Think of it in steps.
First, we'll find the months that have more than one birthday in them. That's the sub-query, below, which I'm aliasing as i for "inner query". (Substitute MONTH(i.Birthdate) into the SELECT list for the 1 if you want to see which months qualify.)
Then, in the outer query (o), you want all the fields, so I'm cheating and using SELECT *. Theoretically, a WHERE IN would work here, but IN can have unfortunate side effects if a NULL comes back, so I never use it. Instead, there's a correlated sub-query, which is to say we look for any results where the month from the outer query is equal to the months that make the cut in the inner (correlated sub-) query.
When using a correlated sub-query in the WHERE clause, the SELECT list doesn't matter. You could put 1/0 and it won't throw an error. But I always use SELECT 1 to show that the inner query isn't actually returning any results to the outer query. It's just there to look for, well, the correlation between the two data sets.
SELECT
*
FROM
#table AS o
WHERE
EXISTS
(
SELECT
1
FROM
#table AS i
WHERE
MONTH(i.Birthdate) = MONTH(o.Birthdate)
GROUP BY
MONTH(i.Birthdate)
HAVING
COUNT(*) > 1
);
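As a quick check of the EXISTS approach against the sample data from the question, here is a minimal sketch using Python's sqlite3. SQLite has no MONTH() function, so strftime('%m', ...) stands in for it here; that substitution, and storing the dates as ISO text, are assumptions of this sketch rather than part of the SQL Server query above.

```python
import sqlite3

# Sample rows copied from the question's "Given table".
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Employees (EmployeeID INTEGER, City TEXT, BirthDate TEXT);
INSERT INTO Employees VALUES (1, 'City1', '1972-03-26'),
                             (2, 'City2', '1979-12-13'),
                             (3, 'City3', '1974-12-16'),
                             (4, 'City3', '1979-09-11');
""")

# Keep an outer row only if its birth month occurs more than once overall.
rows = conn.execute("""
SELECT o.EmployeeID, o.City, o.BirthDate
FROM Employees AS o
WHERE EXISTS (
    SELECT 1
    FROM Employees AS i
    WHERE strftime('%m', i.BirthDate) = strftime('%m', o.BirthDate)
    GROUP BY strftime('%m', i.BirthDate)
    HAVING COUNT(*) > 1
)
ORDER BY o.EmployeeID
""").fetchall()

for row in rows:
    print(row)
```

Only the two December birthdays survive, which matches the expected result in the question.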
Seems to be an odd requirement.
This might help with some tweaks. Works in Oracle.
SELECT DATE FROM TABLE WHERE EXTRACT(MONTH FROM DATE)=EXTRACT(MONTH FROM SOMEDATE);
Give this a try and you may be able to dispense with your UNION:
SELECT
EmployeeId
, City
, BirthDate
FROM Employees
GROUP BY
EmployeeId
, City
, BirthDate
HAVING COUNT(Month(BirthDate)) > 2
Here is another approach using GROUP_CONCAT. It's not exactly what you're looking for but it might do the job. Eric's approach is better though. (Note: This is for MySQL)
SELECT GROUP_CONCAT(EmployeeID) EmployeeID, BirthDate, COUNT(*) DupeCount
FROM Employees
GROUP BY MONTH(BirthDate)
HAVING DupeCount> 1;
Background
I have a table which has six columns. The first three columns create the pk. I'm tasked with removing one of the pk columns.
I selected (using distinct) the data into a temp table (excluding the third column), and tried inserting all of that data back into the original table with the third column being '11' for every row as this is what I was instructed to do. (this column is going to be removed by a DBA after I do this)
However, when I went to insert this data back into the original table I get a pk constraint error. (shocking, I know)
The other three columns are just date columns, so the distinct select didn't create a unique pk for each record. What I'm trying to achieve is just calling a distinct on the first two columns, and then just arbitrarily selecting the three other columns as it doesn't matter which dates I choose (at least not on dev).
What I've tried
I found the following post which seems to achieve what I want:
How do I (or can I) SELECT DISTINCT on multiple columns?
I tried the answers from both Joel and Erwin.
Attempt 1:
However, with Joel's answer the set returned is too large; the inner join isn't doing what I thought it would do. Selecting distinct col1 and col2 returns 400 rows, but when I use his solution 600 rows are returned. I checked the data and there were in fact duplicate pk's. Here is my attempt at duplicating Joel's answer:
select a.emp_no,
a.eec_planning_unit_cde,
'11' as area, create_dte,
create_by_emp_no, modify_dte,
modify_by_emp_no
from tempdb.guest.temp_part_time_evaluator b
inner join
(
select emp_no, eec_planning_unit_cde
from tempdb.guest.temp_part_time_evaluator
group by emp_no, eec_planning_unit_cde
) a
ON b.emp_no = a.emp_no AND b.eec_planning_unit_cde = a.eec_planning_unit_cde
Now, if I execute just the inner select statement 400 rows are returned. If I select the whole query 600 rows are returned? Isn't inner join supposed to only show the intersection of the two sets?
Attempt 2:
I also tried the answer from Erwin. This one has a syntax error and I'm having trouble googling the spec on the where clause (specifically, the trick he is using with (emp_no, eec_planning_unit_cde))
Here is the attempt:
select emp_no,
eec_planning_unit_cde,
'11' as area, create_dte,
create_by_emp_no,
modify_dte,
modify_by_emp_no
where (emp_no, eec_planning_unit_cde) IN
(
select emp_no, eec_planning_unit_cde
from tempdb.guest.temp_part_time_evaluator
group by emp_no, eec_planning_unit_cde
)
Now, I realize that the post I referenced is for postgresql. Doesn't T-SQL have something similar? Trying to google parenthesis isn't working too well.
Overview of Questions:
Why doesn't inner join return an intersection of two sets? From googling this is what I thought it was supposed to do
Is there another way to achieve the same method that I was trying in attempt 2 in t-sql?
It doesn't matter to me which one of these I use, or if I use another solution... how should I go about this?
A select distinct will be based on all columns so it does not guarantee the first two to be distinct
select pk1, pk2, '11', max(c1), max(c2), max(c3)
from table
group by pk1, pk2
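A small sketch of why grouping on the two key columns guarantees one row per key, while the remaining columns are collapsed with an aggregate. It runs on SQLite through Python's sqlite3; the table name evaluator, the shortened column names and the sample rows are hypothetical stand-ins for the real table.

```python
import sqlite3

# Hypothetical, simplified version of the table: two rows share the key (1, 'A').
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE evaluator (emp_no INTEGER, unit_cde TEXT, area TEXT, create_dte TEXT);
INSERT INTO evaluator VALUES (1, 'A', '07', '2012-01-01'),
                             (1, 'A', '09', '2012-02-01'),
                             (2, 'B', '07', '2012-03-01');
""")

# GROUP BY on the two remaining key columns yields exactly one row per key;
# MAX() arbitrarily but deterministically picks one date per group.
rows = conn.execute("""
SELECT emp_no, unit_cde, '11' AS area, MAX(create_dte) AS create_dte
FROM evaluator
GROUP BY emp_no, unit_cde
ORDER BY emp_no
""").fetchall()

for row in rows:
    print(row)
```

Because every (emp_no, unit_cde) pair appears once in the result, inserting it back with a constant area cannot violate the primary key.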
You could TRY this:
SELECT a.emp_no,
a.eec_planning_unit_cde,
'11' as area,
b.create_dte,
b.create_by_emp_no,
b.modify_dte,
b.modify_by_emp_no
FROM
(
SELECT emp_no, eec_planning_unit_cde
FROM tempdb.guest.temp_part_time_evaluator
GROUP BY emp_no, eec_planning_unit_cde
) a
JOIN tempdb.guest.temp_part_time_evaluator b
ON a.emp_no = b.emp_no AND a.eec_planning_unit_cde = b.eec_planning_unit_cde
That joins the distinct key pairs back to the table, but if the data differs between rows with the same key you will still get more than one row per key, so you might have to try a more brute-force approach:
SELECT a.emp_no,
a.eec_planning_unit_cde,
'11' as area,
a.create_dte,
a.create_by_emp_no,
a.modify_dte,
a.modify_by_emp_no
FROM
(
-- number the rows within each key pair, so rownumber = 1 keeps one row per key
SELECT ROW_NUMBER() OVER(PARTITION BY emp_no, eec_planning_unit_cde ORDER BY create_dte) rownumber,
emp_no,
eec_planning_unit_cde,
create_dte,
create_by_emp_no,
modify_dte,
modify_by_emp_no
FROM tempdb.guest.temp_part_time_evaluator
) a
WHERE rownumber = 1
I'll reply one by one:
Why doesn't inner join return an intersection of two sets? From googling this is what I thought it was supposed to do
An inner join doesn't do an intersection. Let's suppose these tables:
T1        T2
n  s      n  s
1  A      2  X
2  B      2  Y
2  C
3  D
If you join both tables by numeric column you don't get the intersection (2 rows). You get:
select *
from t1 inner join t2
on t1.n = t2.n;
| N | S |
---------
| 2 | B |
| 2 | B |
| 2 | C |
| 2 | C |
And, your second query approach:
select *
from t1
where t1.n in (select n from t2);
| N | S |
---------
| 2 | B |
| 2 | C |
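The difference between the two queries above can be reproduced with a short sketch on SQLite via Python's sqlite3, using the same T1 and T2 rows. The join pairs every matching T1 row with every matching T2 row, while the IN filter only tests for existence.

```python
import sqlite3

# Same sample tables as above: T1 has four rows, T2 has two rows with n = 2.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE t1 (n INTEGER, s TEXT);
CREATE TABLE t2 (n INTEGER, s TEXT);
INSERT INTO t1 VALUES (1, 'A'), (2, 'B'), (2, 'C'), (3, 'D');
INSERT INTO t2 VALUES (2, 'X'), (2, 'Y');
""")

# Inner join: 2 matching rows in t1 times 2 matching rows in t2 = 4 rows.
joined = conn.execute("""
SELECT t1.n, t1.s, t2.s
FROM t1 INNER JOIN t2 ON t1.n = t2.n
ORDER BY t1.s, t2.s
""").fetchall()

# IN filter: each t1 row is returned at most once, however many t2 rows match.
filtered = conn.execute("""
SELECT n, s FROM t1
WHERE n IN (SELECT n FROM t2)
ORDER BY s
""").fetchall()

print(len(joined), len(filtered))
```

This is the same effect as the 400 versus 600 rows in the question: duplicates on the joined side multiply the output.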
Is there another way to achieve the same method that I was trying in attempt 2 in t-sql?
Yes, this subquery:
select *
from t1
where exists (
select 1
from t2
where t2.n = t1.n
);
It doesn't matter to me which one of these I use, or if I use another solution... how should I go about this?
Yes, using @JTC's second query.
I just learned about COALESCE and I'm wondering if it's possible to COALESCE an entire row of data between two tables? If not, what's the best approach to the following ramblings?
For instance, I have these two tables and assuming that all columns match:
tbl_Employees
Id Name Email Etc
-----------------------------------
1 Sue ... ...
2 Rick ... ...
tbl_Customers
Id Name Email Etc
-----------------------------------
1 Bob ... ...
2 Dan ... ...
3 Mary ... ...
And a table with id's:
tbl_PeopleInCompany
Id CompanyId
-----------------
1 1
2 1
3 1
And I want to query the data in a way that gets rows from the first table with matching id's, but gets from second table if no id is found.
So the resulting query would look like:
Id Name Email Etc
-----------------------------------
1 Sue ... ...
2 Rick ... ...
3 Mary ... ...
Where Sue and Rick was taken from the first table, and Mary from the second.
SELECT Id, Name, Email, Etc FROM tbl_Employees
WHERE Id IN (SELECT Id FROM tbl_PeopleInCompany)
UNION ALL
SELECT Id, Name, Email, Etc FROM tbl_Customers
WHERE Id IN (SELECT Id FROM tbl_PeopleInCompany) AND
Id NOT IN (SELECT Id FROM tbl_Employees)
Depending on the number of rows, there are several different ways to write these queries (with JOIN and EXISTS), but try this first.
This query first selects all the people from tbl_Employees that have an Id value in your target list (the table tbl_PeopleInCompany). It then adds to the "bottom" of this bunch of rows the results of the second query. The second query gets all tbl_Customers rows with Ids in your target list but excluding any with Ids that appear in tbl_Employees.
The total list contains the people you want: all Ids from tbl_PeopleInCompany, with preference given to Employees but missing records pulled from Customers.
You can also do this:
1) Outer Join the two tables on tbl_Employees.Id = tbl_Customers.Id. This will give you all the rows from tbl_Employees and leave the tbl_Customers columns null if there is no matching row.
2) Use CASE WHEN to select either the tbl_Employees column or tbl_Customers column, based on whether tbl_Customers.Id IS NULL, like this:
CASE WHEN tbl_Customers.Id IS NULL THEN tbl_Employees.Name ELSE tbl_Customers.Name END AS Name
(My syntax might not be perfect there, but the technique is sound).
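A variant of the outer-join idea that uses COALESCE directly, as the question asked: start from the id table, LEFT JOIN both source tables, and take the employee value when one exists, else the customer value. This is only a sketch on SQLite via Python's sqlite3, with the tables cut down to Id and Name and the sample rows taken from the question.

```python
import sqlite3

# Sample data from the question, reduced to the Id and Name columns.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE tbl_Employees (Id INTEGER, Name TEXT);
CREATE TABLE tbl_Customers (Id INTEGER, Name TEXT);
CREATE TABLE tbl_PeopleInCompany (Id INTEGER, CompanyId INTEGER);
INSERT INTO tbl_Employees VALUES (1, 'Sue'), (2, 'Rick');
INSERT INTO tbl_Customers VALUES (1, 'Bob'), (2, 'Dan'), (3, 'Mary');
INSERT INTO tbl_PeopleInCompany VALUES (1, 1), (2, 1), (3, 1);
""")

# COALESCE returns the employee name when the employee row exists,
# and falls back to the customer name otherwise.
people = conn.execute("""
SELECT p.Id, COALESCE(e.Name, c.Name) AS Name
FROM tbl_PeopleInCompany p
LEFT JOIN tbl_Employees e ON e.Id = p.Id
LEFT JOIN tbl_Customers c ON c.Id = p.Id
ORDER BY p.Id
""").fetchall()

for person in people:
    print(person)
```

One caveat: COALESCE works column by column, so if an employee row exists but has a NULL in some column, that column would be filled from the customer row. A CASE on e.Id IS NULL avoids mixing the two rows.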
This should be pretty performant. It uses a CTE to basically build a small table of Customers that have no matching Employee records, and then it simply UNIONs that result with the Employee records
;WITH FilteredCustomers (Id, Name, Email, Etc)
AS
(
SELECT C.Id, C.Name, C.Email, C.Etc
FROM tbl_Customers C
INNER JOIN tbl_PeopleInCompany PIC
ON C.Id = PIC.Id
LEFT JOIN tbl_Employees E
ON C.Id = E.Id
WHERE E.Id IS NULL
)
SELECT E.Id, E.Name, E.Email, E.Etc
FROM tbl_Employees E
INNER JOIN tbl_PeopleInCompany PIC
ON E.Id = PIC.Id
UNION
SELECT Id, Name, Email, Etc
FROM FilteredCustomers
Using the IN Operator can be rather taxing on large queries as it might have to evaluate the subquery for each record being processed.
I don't think the COALESCE function can be used for what you're thinking. COALESCE is similar to ISNULL, except it allows you to pass in multiple columns, and will return the first non-null value:
SELECT Name, Class, Color, ProductNumber,
COALESCE(Class, Color, ProductNumber) AS FirstNotNull
FROM Production.Product
This article should explain its application:
http://msdn.microsoft.com/en-us/library/ms190349.aspx
It sounds like Larry Lustig's answer is more along the lines of what you need though.
I have two different tables to track location of equipment. The "equipment" table tracks the current location and when it was installed there. If the equipment was previously at a different location, that information is kept in the "locationHistory" table. There is one row per equip_id in the equipment table. There can be 0 or more entries for each equip_id in the locationHistory table.
equipment
equip_id
current_location
install_date_at_location
locationHistory
equip_id
location
install_date
pickup_date
I want an SQL query that gets the date of the FIRST install_date for each piece of equipment...
Example:
equipment
=========
equip_id | current_location | install_date_at_location
123 location1 1/23/2011
locationHistory
===============
equip_id | location | install_date | pickup_date
123 location2 1/1/2011 1/5/2011
123 location3 1/7/2011 1/20/2011
Should return: 123, 1/1/2011
Thoughts?
You will want to union the queries that each look at one field, then use a MIN against it.
Or you can use the CASE and MIN for the same effect
select e.equip_id, MIN(CASE WHEN h.install_date < e.install_date_at_location
THEN h.install_date
ELSE e.install_date_at_location
END) as first_install_date
from equipment e
left join locationHistory h on h.equip_id = e.equip_id
group by e.equip_id
Well, the critical piece of information is whether install_date_at_location in equipment can ever be earlier than what I assume is the historical information in locationHistory. If that's not possible, you can do:
SELECT * FROM locationHistory L INNER JOIN
(SELECT equip_id, MIN(install_date) AS firstDate FROM locationHistory GROUP BY equip_id)
AS F
ON L.equip_id = F.equip_id AND L.install_date = F.firstDate
But if you have to worry about both tables, you need to create a view that normalizes the tables for you, and then apply the query against the view:
CREATE VIEW normalLocations (equip_id, location, install_date) AS
SELECT equip_id, current_location, install_date_at_location FROM equipment
UNION ALL
SELECT equip_id, location, install_date FROM locationHistory;
SELECT * FROM normalLocations L INNER JOIN
(SELECT equip_id, MIN(install_date) AS firstDate FROM normalLocations GROUP BY equip_id)
AS F
ON L.equip_id = F.equip_id AND L.install_date = F.firstDate
A simple way to do it is:
SELECT U.Equip_ID, MIN(U.Install_Date)
FROM (SELECT E.Equip_ID, E.Install_Date_At_Location AS Install_Date
FROM Equipment AS E
UNION
SELECT L.Equip_ID, L.Install_Date
FROM LocationHistory AS L
) AS U
GROUP BY U.Equip_ID
This could generate a lot of rows from the LocationHistory table, but it isn't clear that it is worth 'optimizing' it by trying to apply a GROUP BY and MIN to the second half of the UNION (because you'd immediately redo the grouping with the result from the information in the equipment table).
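The UNION plus MIN query can be tried out with a minimal sketch on SQLite via Python's sqlite3. The dates are stored as ISO strings (an assumption of this sketch, so that MIN compares them correctly as text), and equipment 456 is an invented extra row with no history, to show that such equipment still appears.

```python
import sqlite3

# Example data from the question, plus a made-up equipment 456 with no history.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE equipment (equip_id INTEGER, current_location TEXT,
                        install_date_at_location TEXT);
CREATE TABLE locationHistory (equip_id INTEGER, location TEXT,
                              install_date TEXT, pickup_date TEXT);
INSERT INTO equipment VALUES (123, 'location1', '2011-01-23'),
                             (456, 'location4', '2011-02-01');
INSERT INTO locationHistory VALUES (123, 'location2', '2011-01-01', '2011-01-05'),
                                   (123, 'location3', '2011-01-07', '2011-01-20');
""")

# Stack current and historical install dates into one column,
# then take the earliest date per piece of equipment.
first_installs = conn.execute("""
SELECT u.equip_id, MIN(u.install_date) AS first_install_date
FROM (SELECT equip_id, install_date_at_location AS install_date
        FROM equipment
      UNION
      SELECT equip_id, install_date
        FROM locationHistory) AS u
GROUP BY u.equip_id
ORDER BY u.equip_id
""").fetchall()

for row in first_installs:
    print(row)
```

For equipment 123 this returns the 2011-01-01 install from the history table, matching the expected result in the question.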