I am trying to write a query that will use the sum function to add up all values in 1 column then divide by the count of tuples in another table. For some reason when i run the sum query by itself i get the correct number back but when i use it in my query below the value is wrong.
this is what im trying to do but the numbers are coming out wrong.
select (sum(adonated) / count(p.pid)) as "Amount donated per Child"
from tsponsors s, player p;
I found out the issue is in the sum. below returns 650,000 when it should return 25000
select (sum(adonated)) as "Amount donated per Child"
from tsponsors s, player p;
if i remove the from player p it gets the correct amount. However i need the player table to get the number of players.
I have 3 tables that are related to this query.
player(pid, tid(fk))
team(tid)
tsponsors(tid(fk), adonated, sid(fk)) this is a joining table
what i want to get is the sum of all the amounts donated to each team sum(adonated) and divide this by the number of players in the database count(pid).
I guess your sponsors are giving amounts to teams. You then want to know the proportion of donations per child in the sponsored team.
You would then need something like this:
SELECT p.tid,(SUM(COALESCE(s.adonated,0)) / COUNT(p.pid)) AS "Amount donated per Child"
FROM player p
LEFT OUTER JOIN tsponsors s ON s.tid=p.tid
GROUP BY p.tid
I also used a LEFT OUTER JOIN in order to show 0$ if a team has no sponsors.
Try
select sum(s.adonated) / (SELECT count(p.pid) FROM player p)
as "Amount donated per Child"
from tsponsors s;
Your original query joins 2 tables without any condition, which results in cross join.
UPDATE
SELECT ts.tid, SUM(ts.adonated),num_plyr
FROM tsponsors ts
INNER JOIN
(
SELECT tid, COUNT(pid) as num_plyr
FROM player
GROUP BY tid
)a ON (a.tid = ts.tid)
GROUP BY ts.tid,num_plyr
Related
I am practicing a bit with SQL and I came across this exercise:
Consider the following database relating to albums, singers and sales:
Album (Code, Singer, Title)
Sales (Album, Year, CopiesSold)
with a constraint of referential integrity between the Sales Album attribute and the key of the
Album report.
Formulate the following query in SQL :
Find the code and title of the albums that have sold 10,000 copies
every year since they came out.
I had thought of solving it like this:
SELECT CODE, TITLE, COUNT (*)
FROM ALBUM JOIN SALES ON ALBUM.Code = SALES.Album
WHERE CopiesSold > 10000
HAVING COUNT(*) = /* Select difference from current year and came out year.*/
Can you help me with this? Thanks.
You can do this with an INNER JOIN, GROUP BY, and HAVING.
SELECT A.Code, A.Title
FROM ALBUM A
INNER JOIN SALES S ON S.Album = A.Code
GROUP BY A.Code, A.Title
HAVING MIN(S.CopiesSold) >= 10000
The HAVING clause will filter out albums whose minimum Copies Sold are < 10000.
EDIT
There was also a question about gaps in the Sales data, there are a number of ways to modify the above query to solve for this as well. One solution would be to use an embedded query to identify the correct number of years.
SELECT A.Code, A.Title
FROM ALBUM A
INNER JOIN SALES S ON S.Album = A.Code
GROUP BY A.Code, A.Title
HAVING MIN(S.CopiesSold) >= 10000 AND
COUNT(*) = (SELECT COUNT(DISTINCT Year) FROM SALES WHERE Year >= MIN(s.Year))
This solution assumes that at least one album by some artist was sold each year (a fairly safe bet). If you had a Years table there are simpler solutions. If the data is current there are also solutions that utilize DATEDIFF.
You can use correlated subqueries with EXISTS or NOT EXISTS respectively.
In one check if the maximum year minus the minimum year plus one is equal to the count of records with a defined year of an album. That way you make sure you don't get albums where there are figures missing for a year and you therefore cannot tell whether they sold 10000 or more or not. Also check that the maximum year is the current year not to miss gaps between the maximum year and the current year. (In the example code I will use the literal 2020 but there are means to get that dynamically. They depend on the DBMS however and you didn't state which one you're using.)
In the second one check that there's no record with undefined sales figures or sales figures lower than 10000 for the album. If no such record exists, all of the existing one have to have figures of 10000 or greater.
SELECT a1.code,
a1.title
FROM album a1
WHERE EXISTS (SELECT ''
FROM sales s1
WHERE s1.album = a1.code
HAVING max(s1.year) - min(s1.year) + 1 = count(s1.year)
AND max(s1.year) = 2020)
AND NOT EXISTS (SELECT *
FROM sales s2
WHERE s2.album = a1.code
AND s2.copiessold IS NULL
OR s2.copiessold < 10000);
I think the ALL keyword should work nicely here. Something like this:
SELECT * FROM Album
WHERE 10000 <= ALL (
SELECT CopiesSold FROM Sales
WHERE Sales.Album = Album.Code)
So I have two tables of sales, budget and actual.
"budget" has two columns: location and sales. For example,
location sales
24 $20000
36 $100300
40 $24700
Total $145000
"actual" has three columns: invoice_number, location, and sales. For example,
invoice location sales
10000 36 $5000
10001 40 $6000
10002 99 $7000
and so forth
Total $110000
In summary, "actual" records transactions at the invoice level, whereas "budget" is done at the location level only (no individual invoices).
I'm trying to create a summary table that lists actual and budget sales side by side, grouped by location. The total of the actual column should be $110000, and $145000 for budget. This is my attempt at it (on pgAdmin/ postgresql):
SELECT actual.location, SUM(actual.sales) AS actual_sales, SUM(budget.sales) AS budget_sales
FROM actual LEFT JOIN budget
ON actual.location = budget.location
GROUP BY actual.location;
I used LEFT JOIN because "actual" has locations that "budget" doesn't have (e.g. location 99).
I ended up with some gigantic numbers ($millions) on both the actual_sales and budget_sales columns, far exceeding the total actual ($110000) or budget sales ($145,000).
Is this because the way I wrote my query is basically asking SQL to join each invoice in "actual" to each line in "budget," therefore duplicating things many times over? If so how should I have written this?
Thanks in advance!
Based on your description, you seem to have duplicates in both tables. There are various ways to solve this problem. Here is one using union all and group by:
select Location,
sum(actual_sales) as actual_sales,
sum(budget_sales) as budget_sales
from ((select a.location, a.sales as actual_sales, null as budget_sales
from actual a
) union all
(select b.location, null, b.sales
from budget b
)
) ab
group by location;
This structure guarantees that each value is counted only once, regardless of the table.
The query looks fine to me. However, it is difficult to find out why the figures are wrong. My suggestion is that you do the sum by location separately for budget and actual into 2 temporary tables, and later put them together using LEFT JOIN.
Yes, you're joining the budget in once for each actual sales row. However, your Actual Sales sum shouldn't have been larger unless there were multiple budget rows for the same location. You should check for that, because it doesn't sound like there should be.
What you need to do in a case like this is sum the actual sales first in a CTE or subquery, then later join the result to the budget. That way you only have one row for each location. This does it for the actual sales. If you really do have more than one row for a location for budget as well, you might need to subquery the budget as well the same way.
Select Act.Location, Act.actual_sales, budget.sales as budget_sales
From
(
SELECT actual.location, SUM(actual.sales) AS actual_sales
FROM actual
GROUP BY actual.location
) Act
left join budget on Act.location = budget.location
Gordon's suggestion is good, an alternative using WITH statements is:
WITH aloc AS (
SELECT location, SUM(sales) FROM actual GROUP BY 1
), bloc AS (
SELECT location, SUM(sales) FROM budget GROUP BY 1
)
SELECT location, a.sum AS actual_sales, b.sum AS budget_sales
FROM aloc a LEFT JOIN bloc b USING (location)
This is equivalent to:
SELECT location, a.sum AS actual_sales, b.sum AS budget_sales
FROM (SELECT location, SUM(sales) FROM actual GROUP BY 1) a LEFT JOIN
(SELECT location, SUM(sales) FROM budget GROUP BY 1) b USING (location)
but I find WITH statements more readable.
The purpose of the subqueries is to get tables into a state where a row means something relevant, i.e. aloc contains a row per location, and hence cause the join to evaluate to what you want.
There are 2 tables, there is an expected result, the result is to have the total cost of each engagement calculated, there are multiple tests taken during each engagement, each test ranges in cost (all set values), the expected result must be in terms of EngagementId, EngagementCost
The 2 tables, with there respective fields
- EngagementTest (EngagementId, TestId)
- Test (TestId, TestCost)
How would one go calculating the cost of each engagement.
This is as far as i managed to get
SELECT EngagementId, COUNT(TESTId)
FROM EngagementTest
GROUP BY EngagementId;
Try a SUM of the TestCost column rather than a COUNT. COUNT just tells you the number of rows. SUM adds up the values within the rows and gives you a total. Also your existing query doesn't actually use the table that contains the cost data. You can INNER JOIN the two tables via TestId and then GROUP BY the EngagementId so you get the sum of each engagement.
Something like this:
SELECT
ET.EngagementId,
SUM(T.TestCost)
FROM
EngagementTest ET
INNER JOIN Test T
ON T.TestId = ET.TestId
GROUP BY
ET.EngagementId
It can be achieved using below query.
SELECT i.EngagementId, SUM(TestCost)
FROM EngagementTest i
INNER JOIN Test t
ON e.TestId = t.TestId
GROUP BY i.EngagementId
I have seen other questions like this one but feel mine is a bit different, or didn't quite understand the SQL in the other questions...so my apologies if this one is redundant or very easy..
Anyway, I have an accounting transaction DB that stores every transaction posting within our financial system on one line. What I am trying to do is net the sum of the debits and the credits for each GL account.
Here are the two basic queries I am executing to get the results that I would like to net.
Query 1 gives me the sum of all debit transactions posting to each gl account:
Select gl_debit, sum (amt) from FISC_YEAR2014 where fund = 'XXX'
group by gl_debit
Query 2 gives me the sum of all credit transactions posting to each gl account:
select gl_credit, sum (amt) from FISC_YEAR2014 where fund = 'XXX'
group by gl_credt
Now I would to subtract the credit amounts from the debit amounts to get net totals for each gl account. Make sense?
Thanks.
There are two ways to do this depending our your table definition. I think your situation is the first.
This is the normal way assuming credits and debits are in separate columns:
SELECT sum(gl_debit)-sum(gl_credit) as net_debit
FROM FISC_YEAR2014
WHERE fund = 'XXX'
This is the other way assuming direction is indicated by a separate column:
SELECT SUM(IF(is_debit=1,amount,-1*amount)) as net_debit
FROM FISC_YEAR2014
WHERE fund = 'XXX'
See also:
MySQL 'IF' in 'SELECT' statement
Can't calculate totals in general ledger report
What's a good way to store a financial ledger?
I believe this is what you need:
select
gl_account,
sum(amt)
from
(
select gl_debit gl_account,
sum(-amt) amt
from fisc_year2014
where fund = 'XXX'
group by gl_debit
union all
select gl_credit,
sum(amt)
from fisc_year2014
where fund = 'XXX'
group by gl_credit
)
group by
gl_account
There are two SELECTs: one to get the (negative) debits and another to get the credits. They are UNIONed to create a two-column result. The outer SELECT then aggregates the total sum by the gl_account code. If there is a mismatch (a gl_debit without a gl_credit, or vice-versa), then its amount would still be displayed.
SQLFiddle here (I added another row to show the effect of mismatched IDs)
To do this you should SUM the debits and credits separately in subqueries, then join those subqueries on gl_credit = gl_debit.
SELECT COALESCE(gl_credit, gl_debit) AS Id
,COALESCE(d.amt,0)-COALESCE(c.amt,0) AS Net
FROM (
SELECT gl_debit, SUM(amt) AS amt
FROM FISC_YEAR2014
GROUP BY gl_debit
) d
FULL OUTER JOIN (
SELECT gl_credit, SUM(amt) AS amt
FROM FISC_YEAR2014
GROUP BY gl_credit
) c ON d.gl_debit = c.gl_credit
ORDER BY COALESCE(gl_credit, gl_debit)
SQLFiddle
Outputs:
ID Net
-----------
101 -475
201 225
301 500
501 -250
If I were you rather than using a FULL OUTER JOIN, I'd select the ids from the accounts table or wherever you store them, then LEFT JOIN both of the subqueries to it, you haven't shown any other tables though so I can only speculate.
Im writing a query that sums order values broken down by product groups - problem is that when I add joins the aggregated SUM gets greatly inflated - I assume its because its adding in duplicate rows. Im kinda new to SQL, but I think its because I need to construct the query with sub selects or nested joins?
All data returns as expected, and my joins pull out the needed data, but the SUM(inv.item_total) AS Value returned is much higher that it should be - SQL below
SELECT so.Company_id, SUM(inv.item_total) AS Value, co.company_name,
agents.short_desc, stock_type.short_desc AS Type
FROM SORDER as so
JOIN company AS co ON co.company_id = so.company_id
JOIN invoice AS inv ON inv.Sorder_id = so.Sorder_id
JOIN sorder_item AS soitem ON soitem.sorder_id = so.Sorder_id
JOIN STOCK AS stock ON stock.stock_id = soitem.stock_id
JOIN stock_type AS stock_type ON stock_type.stype_id = stock.stype_id
JOIN AGENTS AS AGENTS ON agents.agent_id = co.agent_id
WHERE
co.last_ordered >'01-JAN-2012' and so.Sotype_id='1'
GROUP BY so.Company_id,co.company_name,agents.short_desc, stock_type.short_desc
Any guidence on how I should structure this query to pull out an "un-duplicated" SUM(inv.item_total) AS Value much appreciated.
To get an accurate sum, you want only the joins that are needed. So, this version should work:
SELECT so.Company_id, SUM(inv.item_total) AS Value, co.company_name
FROM SORDER so JOIN
company co
ON co.company_id = so.company_id JOIN
invoice inv
ON inv.Sorder_id = so.Sorder_id
group by so.Company_id, co.company_name
You can then add in one join at a time to see where the multiplication is taking place. I'm guessing it has to do with the agents.
It sounds like the joins are not accurate.
First suspect join
For example, would an agent be per company, or per invoice?
If it is per order, then should the join be something along the lines of
JOIN AGENTS AS AGENTS ON agents.agent_id = inv.agent_id
Second suspect join
Can one order have many items, and many invoices at the same time? That can cause problems as well. Say an order has 3 items and 3 invoices were sent out. According to your joins, the same item will show up 3 times means a total of 9 line items where there should be only 3. You may need to eliminate the invoices table
Possible way to solve this on your own:
I would remove all the grouping and sums, and see if you can filter by one invoice produce an unique set of rows for all the data.
Start with an invoice that has just one item and inspect your result set for accuracy. If that works, then add another invoice that has multiple and check the rows to see if you get your perfect dataset back. If not, then the columns that have repeating values (Company Name, Item Name, Agent Name, etc) are usually a good starting point for checking up on why the duplicates are showing up.