SQL : how to calculate an average from a query - sql

I've been using SQL for about a week at my first full time job, and I'm trying to calculate some statistics from a query where I've combined columns from separate tables.
Specifically, I'm trying to calculate an average from a combined table, where I have applied filters (or constraints? I'm not clear on the SQL lingo).
From doing research on Google, I learned how to calculate an average:
SELECT AVG(column_name)
FROM table_name
The problem I'm having is that this only seems to work with existing tables in the database, not with new queries I have created.
A simplified version of my code is as follows:
SELECT
Animal_Facts.Animal_Name, Animal_Facts.Prev_Reg_Amount,
Names.Given_Name, Animal_Class.Class_Description
FROM
Names
INNER JOIN
Animal_Facts ON Names.Name_Key = Animal_Facts.Name_Key
INNER JOIN
Animal_Class ON Animal_Facts.Class_Key = Animal_Class.Class_Key
This query creates combines four columns from three tables, where Class_Description describes whether the animal is desexed, microchipped, owned by a pensioner etc, and Pre_Reg_Amount is the registration fee paid.
I want to find the average fee paid by pensioners, so I included the following line of code to filter the table:
AND Animal_Class.Class_Description LIKE ('%pensioner%')
And then to calculate the average I add:
SELECT AVG(Animal_Facts.Prev_Reg_Amount) from Animal_Facts
So my total code is:
SELECT
Animal_Facts.Animal_Name, Animal_Facts.Prev_Reg_Amount,
Names.Given_Name, Animal_Class.Class_Description
FROM
Names
INNER JOIN
Animal_Facts ON Names.Name_Key = Animal_Facts.Name_Key
INNER JOIN
Animal_Class ON Animal_Facts.Class_Key = Animal_Class.Class_Key
AND Animal_Class.Class_Description LIKE ('%pensioner%')
SELECT AVG(Animal_Facts.Prev_Reg_Amount)
FROM Animal_Facts
Now the problem is, after checking this calculation in Excel, I'm not actually getting the average of the pensioner data, but the average of all the data. Is there a way to calculate averages (and other statistics) directly from my created table in SQL?
Note: I am able to calculate all these statistics by exporting the data to Excel, but it is much more time consuming. I'd much rather learn how to do this within SQL.

SELECT AVG(af.Prev_Reg_Amount)
FROM
Animal_Facts af
INNER JOIN Animal_Class ac
ON af.Class_Key = ac.Class_Key
AND Class_Description LIKE ('%pensioner%')

Related

My Joins in query not pulling through correctly

Good evening. Could someone please help me with the following. I am trying to join two tables.The first id wbr_global.gl_ap_details. This stores historic GL information. The second table sandbox.utr_fixed_mapping is where account mapping is stored. For example, ana ccount number 60820 is mapped as Employee relation. The first table needs the mapping from the second table linked on the account number. The output I am getting is not right and way to bug. Any help would be appreciated!
Output
select sandbox.utr_fixed_mapping_na.new_mapping_1,sum(wbr_global.gl_ap_details.amount)
from wbr_global.gl_ap_details
LEFT JOIN sandbox.utr_fixed_mapping_na ON wbr_global.gl_ap_details.account_number = sandbox.utr_fixed_mapping_na.account_number
Where gl_ap_details.cost_center = '1172'
and gl_ap_details.period_name = 'JUL-21'
and gl_ap_details.ledger_name = 'Amazon.com, Inc.'
Group by 1;
I tried adding the cast function but after 5000 seconds of the query running I canceled it.
The query itself appears ok, but minor changes. Learn to use table "aliases". This way you don't have to keep typing long database.table.column all over. Additionally, SQL is easier to read doing it that way anyhow.
Notice the aliases "gl" and "fm" after the tables are declared, then these aliases are used to represent the columns.. Easier to read, would you agree.
Added GL Account number as described below the query.
select
gl.account_number,
fm.new_mapping_1,
sum(gl.amount)
from
wbr_global.gl_ap_details gl
LEFT JOIN sandbox.utr_fixed_mapping_na fm
ON gl.account_number = fm.account_number
Where
gl.cost_center = '1172'
and gl.period_name = 'JUL-21'
and gl.ledger_name = 'Amazon.com, Inc.'
Group by
gl.account_number,
fm.new_mapping_1
Now, as for your query and getting null. This just means that there are records within the gl_ap_details table with an account number that is not found in the utr_fixed_mapping_na table. So, to see WHAT gl account number does NOT exist, I have added it to the query. Its possible there are MULTIPLE records in the gl_ap_details that are not found in the mapping table. So, you may get
GLAccount Description SumOfAmount
glaccount1 null $someAmount
glaccount37 null $someAmount
glaccount49 null $someAmount
glaccount72 Depreciation $someAmount
glaccount87 Real Estate $someAmount
glaccount92 Building $someAmount
glaccount99 Salaries $someAmount
I obviously made-up glaccounts just to show the purpose. You may have multiple where the null's total amount is actually masking how many different gl account numbers were NOT found.
Once you find which are missing, you can check / confirm they SHOULD be in the mapping table.
FEEDBACK.
Since you do realize the missing numbers, lets consider a Cartesian result. If there are multiple entries in the mapping table for the same G/L account number, you will get a Cartesian result thus bloating your numbers. To clarify, lets say your mapping table has
Mapping file.
GL Descr1 NewMapping
1 test Salaries
1 testView Buildings
1 Another Depreciation
And your GL_AP_Details has
GL Amount
1 $100
Your total for the query would result in $300 because the query is trying to join the AP Details GL #1 to EACH of the entries in the mapping file thus bloating the amount. You could also add a COUNT(*) as NumberOfEntries to the query to see how many transactions it THINKS it is processing. Is there some "unique ID" in the GL_AP_Details table? If so, then you could also do a count of DISTINCT ID values. If they are different (distinct is lower than # of entries), I think THAT is your culprit.
select
fm.new_mapping_1,
sum(gl.amount),
count(*) as NumberOfEntries,
count( distinct gl.UniqueIdField ) as DistinctTransactions
from
wbr_global.gl_ap_details gl
LEFT JOIN sandbox.utr_fixed_mapping_na fm
ON gl.account_number = fm.account_number
Where
gl.cost_center = '1172'
and gl.period_name = 'JUL-21'
and gl.ledger_name = 'Amazon.com, Inc.'
Group by
fm.new_mapping_1
Might you also need to limit the mapping table for a specific prophecy or mec view?
If you "think" that the result of an aggregate is wrong, then the easiest way to verify this is to select the individual rows that correlate to 1 record in the aggregate output and inspect the records, looking for duplications.
For instance, pick 'Building Management':
SELECT fixed.new_mapping_1,details.amount,*
FROM wbr_global.gl_ap_details details
LEFT JOIN sandbox.utr_fixed_mapping_na fixed ON details.account_number = fixed.account_number
WHERE details.cost_center = '1172'
AND details.period_name = 'JUL-21'
AND details.ledger_name = 'Amazon.com, Inc.'
AND details.account_number = 'Building Management'
Notice that we tack on a ,* to the end of the projection, this will show you everything that the query has access to, you should look for repeating sections of data that you were not expecting, then depending on which table they originate from your might add additional criteria to the JOIN, or to the WHERE or you might need to group by additional columns.
This type of issue is really hard to comment on in a forum like this because it is highly specific to your schema, and the data contained within it, making solutions highly subjective to criteria you are not likely to publish online.
Generally if you think a calculation is wrong, you need to manually compute it to verify, this above advice helps you to inspect the data your query is using, you should either construct your own query or use other tools to build the data set that helps you to manually compute the correct values, then work them back into or replace your original query.
The speed issues are out of scope here, we can comment on the poor schema design but I suspect you don't have a choice. In the utr_fixed_mapping_na table you should make the account_number have the same column type as the source data, or add a new column that has the data in the original type, then you can setup indexes on the columns to improve the speed of the join.

MS Access - query that sums a sum through multiple joined tables

I have three tables Models, Buildups, and Components with many-to-many join tables in between them. Each model can have multiple buildups and buildups are composed of multiple components. The components table has a field called Retail.
I'm trying to create a query for a report where the user can see a model and know the total buildup retail amount which would be the sum of the Retail field of each component in the buildup and then a sum of each buildup in the model.
I need a way to reference the Sum of the Sum of components without the enter parameter box appearing when the query is run (strange enough, when the parameter box is left blank it calculates correctly but I don't want the box to pop up).
Is the solution a nested query? If so, how would I do that? Or is the solution to use DSum()? Once again, if so, how would I implement that?
I'm not sure what to reference to make the criteria portion of the DSum() formula work correctly.
Unless I've misunderstood your database structure or what you are looking to obtain, it seems like this would be sufficient:
select
mo.JandelModelID,
sum(co.retail) as Total_Retail
from
(
(
tblJandelModels mo inner join tblJandelModelBuildups mb on
mo.JandelModelID = mb.JandelModelID
)
inner join tblBuildupComponents bc on mb.BuildupID = bc.BuildupID
)
inner join tblComponents co on bc.ComponentID = co.ComponentID
group by
mo.JandelModelID

MS Access 2013, How to add totals row within SQL

I'm in need of some assistance. I have search and not found what I'm looking for. I have an assigment for school that requires me to use SQL. I have a query that pulls some colunms from two tables:
SELECT Course.CourseNo, Course.CrHrs, Sections.Yr, Sections.Term, Sections.Location
FROM Course
INNER JOIN Sections ON Course.CourseNo = Sections.CourseNo
WHERE Sections.Term="spring";
I need to add a Totals row at the bottom to count the CourseNo and Sum the CrHrs. It has to be done through SQL query design as I need to paste the code. I know it can be done with the datasheet view but she will not accept that. Any advice?
To accomplish this, you can union your query together with an aggregation query. Its not clear from your question which columns you are trying to get "Totals" from, but here's an example of what I mean using your query and getting counts of each (kind of useless example - but you should be able to apply to what you are doing):
SELECT
[Course].[CourseNo]
, [Course].[CrHrs]
, [Sections].[Yr]
, [Sections].[Term]
, [Sections].[Location]
FROM
[Course]
INNER JOIN [Sections] ON [Course].[CourseNo] = [Sections].[CourseNo]
WHERE [Sections].[Term] = [spring]
UNION ALL
SELECT
"TOTALS"
, SUM([Course].[CrHrs])
, count([Sections].[Yr])
, Count([Sections].[Term])
, Count([Sections].[Location])
FROM
[Course]
INNER JOIN [Sections] ON [Course].[CourseNo] = [Sections].[CourseNo]
WHERE [Sections].[Term] = “spring”
You can prepare your "total" query separately, and then output both query results together with "UNION".
It might look like:
SELECT Course.CourseNo, Course.CrHrs, Sections.Yr, Sections.Term, Sections.Location
FROM Course
INNER JOIN Sections ON Course.CourseNo = Sections.CourseNo
WHERE Sections.Term="spring"
UNION
SELECT "Total", SUM(Course.CrHrs), SUM(Sections.Yr), SUM(Sections.Term), SUM(Sections.Location)
FROM Course
INNER JOIN Sections ON Course.CourseNo = Sections.CourseNo
WHERE Sections.Term="spring";
Whilst you can certainly union the aggregated totals query to the end of your original query, in my opinion this would be really bad practice and would be undesirable for any real-world application.
Consider that the resulting query could no longer be used for any meaningful analysis of the data: if displayed in a datagrid, the user would not be able to sort the data without the totals row being interspersed amongst the rest of the data; the user could no longer use the built-in Totals option to perform their own aggregate operation, and the insertion of a row only identifiable by the term totals could even conflict with other data within the set.
Instead, I would suggest displaying the totals within an entirely separate form control, using a separate query such as the following (based on your own example):
SELECT Count(Course.CourseNo) as Courses, Sum(Course.CrHrs) as Hours
FROM Course INNER JOIN Sections ON Course.CourseNo = Sections.CourseNo
WHERE Sections.Term = "spring";
However, since CrHrs are fields within your Course table and not within your Sections table, the above may yield multiples of the desired result, with the number of hours multiplied by the number of corresponding records in the Sections table.
If this is the case, the following may be more suitable:
SELECT Count(Course.CourseNo) as Courses, Sum(Course.CrHrs) as Hours
FROM
Course INNER JOIN
(SELECT DISTINCT s.CourseNo FROM Sections s WHERE s.Term = "spring") q
ON Course.CourseNo = q.CourseNo

MS Access - Summing up a field to be used in another query is "duplicating" data

I am trying to sum up one field and use it in another query, but when I use the Totals button and then call that sum from the other query it considers that field as multiple instances but with the sum value in each one. How can I sum two fields in two different queries and then use those sums in another query? Note - I only separated them into 3 queries because I felt it would help me avoid "is not part of an aggregate function" errors.
Example Data
Inventory Query: This query groups by item and sums the qty_on_hand field
Item SumOfqty_on_hand
A 300
Job Material query: This query groups on the job's materials and sums up the qty_req field (quantity required to complete the job)
Item SumOfqty_req
A 500
When I make a third query to do the calculation [SumOfqty_req]-[SumOfqty_on_hand] the query does the calculation but for each record in the Job Material query.
Job Material Query
SELECT dbo_jobmatl.item,
IIf(([qty_released]-[qty_complete])<0,0,([qty_released]-[qty_complete]))*[matl_qty] AS qty_req
FROM new_BENInventory
INNER JOIN (dbo_jobmatl
INNER JOIN new_BENJobs
ON (new_BENJobs.suffix = dbo_jobmatl.suffix)
AND (dbo_jobmatl.job = new_BENJobs.job)
) ON new_BENInventory.item = dbo_jobmatl.item
GROUP BY dbo_jobmatl.item,
IIf(([qty_released]-[qty_complete])<0,0,([qty_released]-[qty_complete]))*[matl_qty];
Inventory Query
SELECT dbo_ISW_LPItem.item,
Sum(dbo_ISW_LPItem.qty_on_hand) AS SumOfqty_on_hand,
dbo_ISW_LP.whse,
dbo_ISW_LPItem.hold_flag
FROM (dbo_ISW_LP INNER JOIN dbo_ISW_LPItem
ON dbo_ISW_LP.lp_num = dbo_ISW_LPItem.lp_num)
INNER JOIN dbo_ISW_LPLot
ON (dbo_ISW_LPItem.lp_num = dbo_ISW_LPLot.lp_num)
AND (dbo_ISW_LPItem.item = dbo_ISW_LPLot.item)
AND (dbo_ISW_LPItem.qty_on_hand = dbo_ISW_LPLot.qty_on_hand)
GROUP BY dbo_ISW_LPItem.item,
dbo_ISW_LP.whse,
dbo_ISW_LPItem.hold_flag
HAVING (((Sum(dbo_ISW_LPItem.qty_on_hand))>0)
AND ((dbo_ISW_LP.whse) Like "BEN")
AND ((dbo_ISW_LPItem.hold_flag) Like 0));
Third Query
SELECT new_BENJobItems.item,
[qty_req]-[SumOfqty_on_hand] AS [Transfer QTY]
FROM new_BENInventory
INNER JOIN new_BENJobItems
ON new_BENInventory.item = new_BENJobItems.item;
Please note that anything that starts with dbo_ is a prefix for a table that sources the original data.
If any more clarification is needed I would be more than happy to provide it.
Looks like you need a GROUP BY new_BENJobItems.item on your final query along with a SUM() on the quantity. Or to remove the IIf(([qty_released]-[qty_complete])<0,0,([qty_released]-[qty_complete]))*[matl_qty] from your Job Material query. Or both. As written, the Job Material Query is going to return a record for every different key value in the joined input tables that has a distinct quantity, which doesn't seem like the granularity you want for that.

sql SUM value incorrect when using joins and group by

Im writing a query that sums order values broken down by product groups - problem is that when I add joins the aggregated SUM gets greatly inflated - I assume its because its adding in duplicate rows. Im kinda new to SQL, but I think its because I need to construct the query with sub selects or nested joins?
All data returns as expected, and my joins pull out the needed data, but the SUM(inv.item_total) AS Value returned is much higher that it should be - SQL below
SELECT so.Company_id, SUM(inv.item_total) AS Value, co.company_name,
agents.short_desc, stock_type.short_desc AS Type
FROM SORDER as so
JOIN company AS co ON co.company_id = so.company_id
JOIN invoice AS inv ON inv.Sorder_id = so.Sorder_id
JOIN sorder_item AS soitem ON soitem.sorder_id = so.Sorder_id
JOIN STOCK AS stock ON stock.stock_id = soitem.stock_id
JOIN stock_type AS stock_type ON stock_type.stype_id = stock.stype_id
JOIN AGENTS AS AGENTS ON agents.agent_id = co.agent_id
WHERE
co.last_ordered >'01-JAN-2012' and so.Sotype_id='1'
GROUP BY so.Company_id,co.company_name,agents.short_desc, stock_type.short_desc
Any guidence on how I should structure this query to pull out an "un-duplicated" SUM(inv.item_total) AS Value much appreciated.
To get an accurate sum, you want only the joins that are needed. So, this version should work:
SELECT so.Company_id, SUM(inv.item_total) AS Value, co.company_name
FROM SORDER so JOIN
company co
ON co.company_id = so.company_id JOIN
invoice inv
ON inv.Sorder_id = so.Sorder_id
group by so.Company_id, co.company_name
You can then add in one join at a time to see where the multiplication is taking place. I'm guessing it has to do with the agents.
It sounds like the joins are not accurate.
First suspect join
For example, would an agent be per company, or per invoice?
If it is per order, then should the join be something along the lines of
JOIN AGENTS AS AGENTS ON agents.agent_id = inv.agent_id
Second suspect join
Can one order have many items, and many invoices at the same time? That can cause problems as well. Say an order has 3 items and 3 invoices were sent out. According to your joins, the same item will show up 3 times means a total of 9 line items where there should be only 3. You may need to eliminate the invoices table
Possible way to solve this on your own:
I would remove all the grouping and sums, and see if you can filter by one invoice produce an unique set of rows for all the data.
Start with an invoice that has just one item and inspect your result set for accuracy. If that works, then add another invoice that has multiple and check the rows to see if you get your perfect dataset back. If not, then the columns that have repeating values (Company Name, Item Name, Agent Name, etc) are usually a good starting point for checking up on why the duplicates are showing up.