SQL SUM() function not working properly with multiple tables in query - sql

I am trying to pull the sum of hours worked and compare it to the sum of hours paid for each individual employee. These are stored in two different tables. When I query the tables separately, in two different queries, the results are correct. When I place them in the same query the results are way off.
Sample Data-PayrollTransactions:
PayrollTransactions
PayrollTime
This is the query that does not work:
SELECT Emp_No, Sum(Regular_Hours) AS PaidRegHours, Sum(Overtime_Hours) AS PaidOTHours, Sum(Reg_Hours) AS ClockedRegHours, Sum(OT_Hrs) AS ClockedOTHours
FROM PayrollTransactions, PayrollTime
WHERE Employee_No = Emp_No
GROUP BY Emp_No;
The result it pulls for 1 employee is 1000 PaidRegHours. When doing a query just from PayrollTransactions as such:
SELECT Employee_No, Sum(Regular_Hours) AS PaidRegHours, Sum(Overtime_Hours) AS PaidOTHours
FROM PayrollTransactions
GROUP BY Employee_No;
the result for that same employee is 200 PaidRegHours, which is correct. This same problem exists for all my computed fields. I am unsure how to fix this problem. Thanks for your help!
Desired Results:
DesiredOutput

This is the classic problem of JOIN multiples. Each employee has many rows in both tables, so joining the two tables on employee pairs every PayrollTransactions row with every PayrollTime row for that employee. Those duplicated pairings are then aggregated, which turns the actual 200 hours into 1,000 summed hours. Instead, consider joining one-to-one pairs, which you can get by aggregating each table first and then joining the aggregates.
The query below uses subqueries, but stored queries work as well. It also uses the explicit JOIN (the current ANSI SQL standard) rather than the implicit join in the WHERE clause that you currently have.
SELECT p.Employee_No, p.PaidRegHours, p.PaidOTHours, t.ClockedRegHours, t.ClockedOTHours
FROM
(SELECT Employee_No,
Sum(Regular_Hours) AS PaidRegHours,
Sum(Overtime_Hours) AS PaidOTHours
FROM PayrollTransactions
GROUP BY Employee_No) p
INNER JOIN
(SELECT Emp_No,
Sum(Reg_Hours) AS ClockedRegHours,
Sum(OT_Hrs) AS ClockedOTHours
FROM PayrollTime
GROUP BY Emp_No) t
ON p.Employee_No = t.Emp_No
Alternatively, save each aggregate as a stored query (qryPayrollTransactionsAgg and qryPayrollTimeAgg below), which can sometimes be more efficient with the Access engine:
SELECT p.Employee_No, p.PaidRegHours, p.PaidOTHours, t.ClockedRegHours, t.ClockedOTHours
FROM qryPayrollTransactionsAgg p
INNER JOIN qryPayrollTimeAgg t
ON p.Employee_No = t.Emp_No
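To see the multiplication in isolation, here is a minimal sketch using Python's built-in sqlite3 module. The table and column names come from the question; the sample rows are made up, and SQLite stands in for Access only so that the example is runnable.

```python
import sqlite3

# Hypothetical sample data: employee 1 has 5 pay rows and 5 clock rows of 40 hours each.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE PayrollTransactions (Employee_No INTEGER, Regular_Hours REAL, Overtime_Hours REAL);
    CREATE TABLE PayrollTime (Emp_No INTEGER, Reg_Hours REAL, OT_Hrs REAL);
    INSERT INTO PayrollTransactions VALUES (1,40,0),(1,40,0),(1,40,0),(1,40,0),(1,40,0);
    INSERT INTO PayrollTime VALUES (1,40,0),(1,40,0),(1,40,0),(1,40,0),(1,40,0);
""")

# Joining the raw tables pairs every pay row with every clock row (5 x 5 = 25 rows),
# so each 40-hour pay row is counted five times.
inflated = conn.execute("""
    SELECT Sum(Regular_Hours)
    FROM PayrollTransactions INNER JOIN PayrollTime ON Employee_No = Emp_No
    GROUP BY Emp_No
""").fetchone()[0]

# Aggregating each table first leaves one row per employee on both sides of the join.
fixed = conn.execute("""
    SELECT p.PaidRegHours, t.ClockedRegHours
    FROM (SELECT Employee_No, Sum(Regular_Hours) AS PaidRegHours
          FROM PayrollTransactions GROUP BY Employee_No) p
    INNER JOIN (SELECT Emp_No, Sum(Reg_Hours) AS ClockedRegHours
                FROM PayrollTime GROUP BY Emp_No) t
    ON p.Employee_No = t.Emp_No
""").fetchone()

print(inflated)  # 1000.0
print(fixed)     # (200.0, 200.0)
```

With these rows the raw join reproduces the 1,000 versus 200 discrepancy described in the question.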

Related

How do I display only one result (the highest) with SQL query? (Beginner)

I need help making the following query display only one result, the one with the MAX Procurement Rate.
Currently the query works, but it displays all results, not just the one returned by the MAX function.
SELECT SalesPeople.SalesPersonID, FirstName, LastName, Region, SalesRevenueYear1, ProcurementCost
FROM ProductRevenueAndCosts
INNER JOIN SalesPeople
ON ProductRevenueAndCosts.SalesPersonID = SalesPeople.SalesPersonID
WHERE SalesPeople.Region = 'Central' AND (
SELECT MAX (ProcurementCost)
FROM ProductRevenueAndCosts
WHERE SalesPeople.Region = 'Central'
)
If you add an ORDER BY column_name DESC clause, the results are sorted from highest to lowest on that column, and if you then add a LIMIT 1 clause at the end of your SQL, only the first record is shown. Using these two together is a quick way to get the max (or, with ascending order, the min) without having to worry about aggregate functions. Note that LIMIT is MySQL syntax; SQL Server and MS Access use SELECT TOP 1 instead.
https://www.w3schools.com/mysql/mysql_limit.asp
Otherwise, you can try aggregating the results with a max function:
https://www.w3schools.com/mysql/mysql_min_max.asp
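As mentioned, you need to compare a column of the outer query against the result of the subquery; a bare subquery in the WHERE clause is not a condition. Be sure to use aliases to distinguish same-named columns, and exercise good practice by qualifying all columns with table names or aliases, especially in JOIN queries. The subquery below is also restricted to the Central region, so that the maximum is not taken from another region:
SELECT sp.SalesPersonID, sp.FirstName, sp.LastName, sp.Region, sp.SalesRevenueYear1,
prc.ProcurementCost
FROM ProductRevenueAndCosts prc
INNER JOIN SalesPeople sp
ON prc.SalesPersonID = sp.SalesPersonID
WHERE sp.Region = 'Central'
AND prc.ProcurementCost = ( -- COMPARE OUTER QUERY COLUMN WITH SUBQUERY RESULT
SELECT MAX(p2.ProcurementCost)
FROM ProductRevenueAndCosts p2
INNER JOIN SalesPeople s2
ON p2.SalesPersonID = s2.SalesPersonID
WHERE s2.Region = 'Central'
)
Note: If running in MS Access, remove the comment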
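Both suggestions above (sorting with a row limit, and comparing against a MAX subquery) can be tried side by side. This is a rough sketch with Python's sqlite3 module, assuming a cut-down version of the two tables and invented rows; SQLite accepts LIMIT, whereas SQL Server and Access would need TOP 1.

```python
import sqlite3

# Hypothetical, simplified schema and rows for illustration only.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE SalesPeople (SalesPersonID INTEGER, LastName TEXT, Region TEXT);
    CREATE TABLE ProductRevenueAndCosts (SalesPersonID INTEGER, ProcurementCost REAL);
    INSERT INTO SalesPeople VALUES (1,'Adams','Central'),(2,'Baker','Central'),(3,'Clark','East');
    INSERT INTO ProductRevenueAndCosts VALUES (1,100),(2,250),(3,900);
""")

# Approach 1: compare the outer column with the maximum found within the same region.
by_subquery = conn.execute("""
    SELECT sp.LastName, prc.ProcurementCost
    FROM ProductRevenueAndCosts prc
    INNER JOIN SalesPeople sp ON prc.SalesPersonID = sp.SalesPersonID
    WHERE sp.Region = 'Central'
      AND prc.ProcurementCost = (
          SELECT MAX(p2.ProcurementCost)
          FROM ProductRevenueAndCosts p2
          INNER JOIN SalesPeople s2 ON p2.SalesPersonID = s2.SalesPersonID
          WHERE s2.Region = 'Central')
""").fetchall()

# Approach 2: sort from highest to lowest and keep only the first row.
by_limit = conn.execute("""
    SELECT sp.LastName, prc.ProcurementCost
    FROM ProductRevenueAndCosts prc
    INNER JOIN SalesPeople sp ON prc.SalesPersonID = sp.SalesPersonID
    WHERE sp.Region = 'Central'
    ORDER BY prc.ProcurementCost DESC
    LIMIT 1
""").fetchall()

print(by_subquery)  # [('Baker', 250.0)]
print(by_limit)     # [('Baker', 250.0)]
```

The two differ when several rows tie for the maximum: the subquery version returns all of them, while the sort-and-limit version returns exactly one.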

Specifying SELECT, then joining with another table

I just hit a wall with my SQL query fetching data from my MS SQL Server.
To simplify, say I have one table for sales and one table for customers. They each have a corresponding userId which I can use to join the tables.
I wish to first SELECT from the sales table where, say, price is equal to 10, and then join it on the userId in order to get access to the name, address, etc. from the customer table.
In which order should I structure the query? Do I need some sort of subquery, or what do I do?
I have tried something like this
SELECT *
FROM Sales
WHERE price = 10
INNER JOIN Customers
ON Sales.userId = Customers.userId;
Needless to say this is very simplified and not my database schema, yet it explains my problem simply.
Any suggestions ? I am at a loss here.
The components of a SELECT statement have a fixed order.
In its simple form this is:
What do I select: column list
From where: table name and joined tables
Are there filters: WHERE
How to sort: ORDER BY
So most likely it is enough to change your statement to:
SELECT *
FROM Sales
INNER JOIN Customers ON Sales.userId = Customers.userId
WHERE price = 10;
The WHERE clause must follow the joins:
SELECT * FROM Sales
INNER JOIN Customers
ON Sales.userId = Customers.userId
WHERE price = 10
This is simply the way SQL syntax works. You seem to be trying to put the clauses in the order that you think they should be applied, but SQL is a declarative language, not a procedural one: you are defining what you want to occur, not how it will be done.
You could also write the same thing like this:
SELECT * FROM (
SELECT * FROM Sales WHERE price = 10
) AS filteredSales
INNER JOIN Customers
ON filteredSales.userId = Customers.userId
This may seem like it indicates a different order for the operations to occur, but it is logically identical to the first query, and in either case, the database engine may determine to do the join and filtering operations in either order, as long as the result is identical.
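The equivalence of the two forms is easy to check on a small data set. Below is a minimal sketch using Python's sqlite3 module; the saleId and name columns and the sample rows are assumptions made for the illustration, and SQLite stands in for SQL Server.

```python
import sqlite3

# Hypothetical tables and rows; only userId and price come from the question.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Sales (saleId INTEGER, userId INTEGER, price REAL);
    CREATE TABLE Customers (userId INTEGER, name TEXT);
    INSERT INTO Sales VALUES (1,1,10),(2,1,25),(3,2,10);
    INSERT INTO Customers VALUES (1,'Ann'),(2,'Bob');
""")

# WHERE placed after the join, as the syntax requires.
join_then_filter = conn.execute("""
    SELECT Sales.saleId, Customers.name
    FROM Sales
    INNER JOIN Customers ON Sales.userId = Customers.userId
    WHERE Sales.price = 10
    ORDER BY Sales.saleId
""").fetchall()

# Filtering inside a derived table first, then joining.
filter_then_join = conn.execute("""
    SELECT filteredSales.saleId, Customers.name
    FROM (SELECT * FROM Sales WHERE price = 10) AS filteredSales
    INNER JOIN Customers ON filteredSales.userId = Customers.userId
    ORDER BY filteredSales.saleId
""").fetchall()

# The original attempt, with WHERE before INNER JOIN, is rejected as a syntax error.
try:
    conn.execute("SELECT * FROM Sales WHERE price = 10 "
                 "INNER JOIN Customers ON Sales.userId = Customers.userId")
    misplaced_where_accepted = True
except sqlite3.OperationalError:
    misplaced_where_accepted = False

print(join_then_filter)          # [(1, 'Ann'), (3, 'Bob')]
print(filter_then_join)          # [(1, 'Ann'), (3, 'Bob')]
print(misplaced_where_accepted)  # False
```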
Sounds fine to me, did you run the query and check?
SELECT s.*, c.*
FROM Sales s
INNER JOIN Customers c
ON s.userId = c.userId
WHERE s.price = 10;

SQL count - first time

I am learning SQL (bit by bit!) and trying to perform a query on our database, adding a count function to show the total orders that appear against a customer's id by counting in an inner join query.
Somehow the count function is pooling all the data together onto one customer, though.
Can someone please suggest where I am going wrong?
SELECT tbl_customers.*, tbl_stateprov.stprv_Name, tbl_custstate.CustSt_Destination, COUNT(order_id) as total
FROM tbl_stateprov
INNER JOIN (tbl_customers
INNER JOIN (tbl_custstate
INNER JOIN tbl_orders ON tbl_orders.order_CustomerID = tbl_custstate.CustSt_Cust_ID)
ON tbl_customers.cst_ID = tbl_custstate.CustSt_Cust_ID)
ON tbl_stateprov.stprv_ID = tbl_custstate.CustSt_StPrv_ID
WHERE tbl_custstate.CustSt_Destination='BillTo'
AND cst_LastName LIKE '#URL.Alpha#%'
You need a GROUP BY clause in this statement in order to get what you want. Without one, COUNT aggregates over the whole result set and collapses it into a single row. You need to figure out what level you want to group by in order to choose which fields to add to the GROUP BY clause. If you just want to see it on a per-customer basis, using the customer id field from your join (cst_ID), it would look like this (at the very end of your SQL):
GROUP BY tbl_customers.cst_ID
You can certainly group by more fields; it just depends on how you want to slice the results. Note that most database engines require every non-aggregated column in the SELECT list to appear in the GROUP BY clause.
In your select statement you are using format like tableName.ColumnName but not for COUNT(order_id)
It should be COUNT(tableOrAlias.order_id)
Hope that helps.
As you are new to SQL, it might also be worth considering the readability of your joins. The nested / bracketed joins in your query are quite hard to read, and I would also alias your tables and use those aliases throughout to make the query that bit more accessible:
SELECT
customer.cst_ID
,statep.stprv_Name
,state.CustSt_Destination
,COUNT(orders.order_id) AS total
FROM tbl_stateprov statep
INNER JOIN tbl_custstate state ON statep.stprv_ID = state.CustSt_StPrv_ID
INNER JOIN tbl_customers customer ON customer.cst_ID = state.CustSt_Cust_ID
INNER JOIN tbl_orders orders ON orders.order_CustomerID = state.CustSt_Cust_ID
WHERE state.CustSt_Destination='BillTo'
AND customer.cst_LastName LIKE '#URL.Alpha#%'
GROUP BY
customer.cst_ID
,statep.stprv_Name
,state.CustSt_Destination
--And any other columns you want to show alongside the count
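The "pooling onto one customer" effect and the GROUP BY fix can be reproduced on a tiny data set. This is a sketch with Python's sqlite3 module, assuming only two of the tables and invented rows. SQLite, like older MySQL configurations, accepts the ungrouped query and returns a single row; SQL Server would reject it with an error instead.

```python
import sqlite3

# Hypothetical, reduced schema: just customers and orders, with made-up rows.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE tbl_customers (cst_ID INTEGER, cst_LastName TEXT);
    CREATE TABLE tbl_orders (order_id INTEGER, order_CustomerID INTEGER);
    INSERT INTO tbl_customers VALUES (1,'Smith'),(2,'Stone');
    INSERT INTO tbl_orders VALUES (10,1),(11,1),(12,2);
""")

# No GROUP BY: COUNT runs over every joined row, so everything collapses into one row.
ungrouped = conn.execute("""
    SELECT c.cst_LastName, COUNT(o.order_id) AS total
    FROM tbl_customers c
    INNER JOIN tbl_orders o ON o.order_CustomerID = c.cst_ID
""").fetchall()

# GROUP BY on the customer columns: one row, and one count, per customer.
grouped = conn.execute("""
    SELECT c.cst_LastName, COUNT(o.order_id) AS total
    FROM tbl_customers c
    INNER JOIN tbl_orders o ON o.order_CustomerID = c.cst_ID
    GROUP BY c.cst_ID, c.cst_LastName
    ORDER BY c.cst_ID
""").fetchall()

print(len(ungrouped), ungrouped[0][1])  # 1 3
print(grouped)                          # [('Smith', 2), ('Stone', 1)]
```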

How to select all attributes in sql Join query

The following sql query below produces the specified result.
select product.product_no,product_type,salesteam.rep_name,salesteam.SUPERVISOR_NAME
from product
inner join salesteam
on product.product_rep=salesteam.rep_id
ORDER BY product.Product_No;
However, my intention is to produce a more detailed result which will include all the attributes in the PRODUCT table. My approach is to list all the attributes in the first line of the query.
select product.product_no,product.product_date,product.product_colour,product.product_style,
product.product_age,product_type,salesteam.rep_name,salesteam.SUPERVISOR_NAME
from product
inner join salesteam
on product.product_rep=salesteam.rep_id
ORDER BY product.Product_No;
Is there another way it can be done instead of listing all the attributes of the PRODUCT table one by one?
You can use * to select all columns from all tables, or you can use [table/alias].* to select all columns from the specified table. In your case, you can use product.*:
select product.*,salesteam.rep_name,salesteam.SUPERVISOR_NAME
from product
inner join salesteam
on product.product_rep=salesteam.rep_id
ORDER BY product.Product_No;
It is important to note that you should only do this if you are 100% sure you need every single column, and always will. There are performance implications associated with this; if you're selecting 100 columns from a table when you really only need 4 or 5 of them, you're adding a lot of overhead to the query. The DBMS has to work harder, and you're also sending more data across the wire (if your database is not on the same machine as your executing code).
If any columns are later added to the product table, those columns will also be returned by this query in the future.
select
product.*,
salesteam.rep_name,
salesteam.SUPERVISOR_NAME
from product inner join salesteam on
product.product_rep=salesteam.rep_id
ORDER BY
product.Product_No;
This should do.
You can write it like this:
select P.* --- all Product columns
,S.* --- all salesteam columns
from product P
inner join salesteam S
on P.product_rep=S.rep_id
ORDER BY P.Product_No;
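The caveat mentioned above, that product.* also returns any column added to the table later, can be observed directly. Here is a small sketch with Python's sqlite3 module; the reduced schema and the column_names helper are assumptions for the illustration.

```python
import sqlite3

# Hypothetical cut-down versions of the two tables (no rows are needed here).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE product (product_no INTEGER, product_type TEXT, product_rep INTEGER);
    CREATE TABLE salesteam (rep_id INTEGER, rep_name TEXT);
""")

query = """
    SELECT product.*, salesteam.rep_name
    FROM product
    INNER JOIN salesteam ON product.product_rep = salesteam.rep_id
"""

def column_names(sql):
    """Return the result-set column names of a query (illustrative helper)."""
    return [col[0] for col in conn.execute(sql).description]

before = column_names(query)

# A column added to product later shows up in the unchanged query's output.
conn.execute("ALTER TABLE product ADD COLUMN product_colour TEXT")
after = column_names(query)

print(before)  # ['product_no', 'product_type', 'product_rep', 'rep_name']
print(after)   # ['product_no', 'product_type', 'product_rep', 'product_colour', 'rep_name']
```

Code that reads the result by position rather than by name would silently shift after such a change, which is one more reason to list columns explicitly when the set is known.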

SQL inner join query returns two identical columns

Let's say I have the following SQL query:
SELECT *
FROM employee
INNER JOIN department ON employee.EmpID = department.EmpID
I wanted to ask, why I am getting two EmpID columns, and how can I get only one of those, preferably the first.
I'm using SQL server
SELECT employee.EmpID, employee.name, ...
FROM employee
INNER JOIN department ON employee.EmpID=department.EmpID
Be precise and specify which columns you need instead of using the asterisk to select all columns.
You get all columns from these two tables; that's why you have two EmpID columns. The JOIN types that remove the common column are NATURAL JOIN and JOIN ... USING, neither of which is implemented by SQL Server. On a DBMS that supports NATURAL JOIN, your query would look like this:
SELECT *
FROM employee
NATURAL JOIN department
This generates join predicates by comparing all columns with the same name in both tables. The resulting table contains only one column for each pair of equally named columns.
You're getting all columns from all tables involved in your query since you're asking for it: SELECT *
If you want only specific column - you need to specify which ones you want:
SELECT e.EmpID, e.Name as 'Employee Name', d.Name AS 'Department Name'
FROM employee e
INNER JOIN department d ON e.EmpID = d.EmpID
Don't use *. Specify the columns you want in the field list.
SELECT E.EmpID, E.EmpName -- etc
FROM employee as E
INNER JOIN department as D
ON E.EmpID=D.EmpID
As stated by others, don't use *
See this SO question for reasons why:
Which is faster/best? SELECT * or SELECT column1, colum2, column3, etc
Essentially, the answer to your question is that the output from a SQL SELECT query is not a relation, and therefore if you do not take care you may end up with duplicate attribute names (columns) and rows.
Standard SQL has some constructs to mitigate SQL's non-relational problems, e.g. NATURAL JOIN would ensure the result has only one EmpID attribute. Sadly, SQL Server does not support this syntax.
Therefore, you are forced to write out in long-hand the columns you want, using the table name to qualify which attribute you prefer e.g. employee.EmpID.
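The difference between the three forms discussed in this thread shows up in the result-set column names. Below is a minimal sketch with Python's sqlite3 module. SQLite is used because it supports NATURAL JOIN, unlike SQL Server; the EmpName and DeptName columns and the column_names helper are assumptions for the illustration.

```python
import sqlite3

# Hypothetical schema: the two tables share only the EmpID column name.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE employee (EmpID INTEGER, EmpName TEXT);
    CREATE TABLE department (EmpID INTEGER, DeptName TEXT);
    INSERT INTO employee VALUES (1, 'Ann');
    INSERT INTO department VALUES (1, 'Sales');
""")

def column_names(sql):
    """Return the result-set column names of a query (illustrative helper)."""
    return [col[0] for col in conn.execute(sql).description]

# SELECT * over an INNER JOIN returns every column of both tables, EmpID included twice.
star = column_names(
    "SELECT * FROM employee INNER JOIN department ON employee.EmpID = department.EmpID")

# NATURAL JOIN matches on the shared column name and returns it once.
natural = column_names("SELECT * FROM employee NATURAL JOIN department")

# The portable fix: list the wanted columns, qualified by table alias.
explicit = column_names("""
    SELECT e.EmpID, e.EmpName, d.DeptName
    FROM employee e
    INNER JOIN department d ON e.EmpID = d.EmpID
""")

print(len(star), star.count("EmpID"))  # 4 2
print(natural)                         # ['EmpID', 'EmpName', 'DeptName']
print(explicit)                        # ['EmpID', 'EmpName', 'DeptName']
```

Keep in mind that NATURAL JOIN matches on every identically named column, so it changes meaning if the two tables later gain another shared column name.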