SQL joining two tables with different levels of detail

So I have two tables of sales, budget and actual.
"budget" has two columns: location and sales. For example,
location sales
24 $20000
36 $100300
40 $24700
Total $145000
"actual" has three columns: invoice_number, location, and sales. For example,
invoice location sales
10000 36 $5000
10001 40 $6000
10002 99 $7000
and so forth
Total $110000
In summary, "actual" records transactions at the invoice level, whereas "budget" is done at the location level only (no individual invoices).
I'm trying to create a summary table that lists actual and budget sales side by side, grouped by location. The total of the actual column should be $110000, and the total of the budget column should be $145000. This is my attempt at it (in pgAdmin / PostgreSQL):
SELECT actual.location, SUM(actual.sales) AS actual_sales, SUM(budget.sales) AS budget_sales
FROM actual LEFT JOIN budget
ON actual.location = budget.location
GROUP BY actual.location;
I used LEFT JOIN because "actual" has locations that "budget" doesn't have (e.g. location 99).
I ended up with gigantic numbers (in the millions) in both the actual_sales and budget_sales columns, far exceeding the actual total ($110000) and the budget total ($145000).
Is this because the way I wrote my query is basically asking SQL to join each invoice in "actual" to each line in "budget," therefore duplicating things many times over? If so, how should I have written this?
Thanks in advance!

Based on your description, you seem to have duplicates in both tables. There are various ways to solve this problem. Here is one using union all and group by:
select Location,
sum(actual_sales) as actual_sales,
sum(budget_sales) as budget_sales
from ((select a.location, a.sales as actual_sales, null as budget_sales
from actual a
) union all
(select b.location, null, b.sales
from budget b
)
) ab
group by location;
This structure guarantees that each value is counted only once, regardless of the table.
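A quick way to see the double counting, and why the UNION ALL shape avoids it, is to run both queries against a tiny in-memory SQLite database from Python. This is only a sketch: SQLite stands in for PostgreSQL, invoice 10003 is a made-up second invoice for location 36, and the inner SELECTs are not wrapped in parentheses because SQLite does not accept that inside a compound query.

```python
import sqlite3

SETUP = """
CREATE TABLE budget (location INTEGER, sales INTEGER);
CREATE TABLE actual (invoice INTEGER, location INTEGER, sales INTEGER);
INSERT INTO budget VALUES (24, 20000), (36, 100300), (40, 24700);
-- invoice 10003 is a made-up second invoice for location 36
INSERT INTO actual VALUES (10000, 36, 5000), (10001, 40, 6000),
                          (10002, 99, 7000), (10003, 36, 3000);
"""

# The question's query: one budget row is repeated for every matching invoice.
NAIVE = """
SELECT actual.location, SUM(actual.sales), SUM(budget.sales)
FROM actual LEFT JOIN budget ON actual.location = budget.location
GROUP BY actual.location
ORDER BY actual.location
"""

# Stack both tables instead of joining them, then aggregate once.
UNION_ALL = """
SELECT location, SUM(actual_sales) AS actual_sales, SUM(budget_sales) AS budget_sales
FROM (
    SELECT a.location, a.sales AS actual_sales, NULL AS budget_sales FROM actual a
    UNION ALL
    SELECT b.location, NULL, b.sales FROM budget b
) ab
GROUP BY location
ORDER BY location
"""

con = sqlite3.connect(":memory:")
con.executescript(SETUP)
naive_rows = con.execute(NAIVE).fetchall()
union_rows = con.execute(UNION_ALL).fetchall()

print(naive_rows)  # location 36 shows budget 200600 instead of 100300
print(union_rows)  # every source row counted once; budget-only location 24 appears too
```

Because nothing is joined, no row can be multiplied, and locations that exist on only one side still show up with a NULL on the other side.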

The query looks fine to me; however, it is difficult to tell why the figures are wrong. My suggestion is to sum by location separately for budget and actual into two temporary tables, and then put them together using a LEFT JOIN.
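A minimal sketch of the two-temporary-table idea, using SQLite from Python so it can be run as-is. The temporary table names actual_by_loc and budget_by_loc are made up, and the data is the sample from the question plus one invented invoice. Note that a LEFT JOIN driven from the actual side drops budget-only locations such as 24.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE budget (location INTEGER, sales INTEGER);
CREATE TABLE actual (invoice INTEGER, location INTEGER, sales INTEGER);
INSERT INTO budget VALUES (24, 20000), (36, 100300), (40, 24700);
INSERT INTO actual VALUES (10000, 36, 5000), (10001, 40, 6000),
                          (10002, 99, 7000), (10003, 36, 3000);

-- Step 1: collapse each table to one row per location.
CREATE TEMP TABLE actual_by_loc AS
    SELECT location, SUM(sales) AS actual_sales FROM actual GROUP BY location;
CREATE TEMP TABLE budget_by_loc AS
    SELECT location, SUM(sales) AS budget_sales FROM budget GROUP BY location;
""")

# Step 2: join the per-location summaries; each side has one row per key,
# so the join cannot multiply anything.
rows = con.execute("""
    SELECT a.location, a.actual_sales, b.budget_sales
    FROM actual_by_loc a
    LEFT JOIN budget_by_loc b ON a.location = b.location
    ORDER BY a.location
""").fetchall()
print(rows)
```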

Yes, you're joining the budget in once for each actual sales row. However, your Actual Sales sum shouldn't have been larger unless there were multiple budget rows for the same location. You should check for that, because it doesn't sound like there should be.
What you need to do in a case like this is sum the actual sales first in a CTE or subquery, and then join the result to the budget. That way you only have one row for each location. The query below does this for the actual sales. If you really do have more than one budget row for a location as well, you might need to aggregate the budget in a subquery the same way.
Select Act.Location, Act.actual_sales, budget.sales as budget_sales
From
(
SELECT actual.location, SUM(actual.sales) AS actual_sales
FROM actual
GROUP BY actual.location
) Act
left join budget on Act.location = budget.location
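The point that pre-aggregating only the actual side is enough, as long as budget has one row per location, can be checked with a small SQLite sketch from Python. The run helper and the extra (40, 300) budget row are invented for illustration: with that duplicate, location 40 comes back twice and its actual total repeats on both rows.

```python
import sqlite3

QUERY = """
SELECT Act.location, Act.actual_sales, budget.sales AS budget_sales
FROM (
    SELECT actual.location, SUM(actual.sales) AS actual_sales
    FROM actual
    GROUP BY actual.location
) Act
LEFT JOIN budget ON Act.location = budget.location
ORDER BY Act.location, budget_sales
"""

def run(budget_rows):
    """Load the sample invoices plus the given budget rows, then run QUERY."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE budget (location INTEGER, sales INTEGER)")
    con.execute("CREATE TABLE actual (invoice INTEGER, location INTEGER, sales INTEGER)")
    con.executemany("INSERT INTO budget VALUES (?, ?)", budget_rows)
    con.executemany("INSERT INTO actual VALUES (?, ?, ?)",
                    [(10000, 36, 5000), (10001, 40, 6000),
                     (10002, 99, 7000), (10003, 36, 3000)])
    return con.execute(QUERY).fetchall()

clean = run([(24, 20000), (36, 100300), (40, 24700)])
# Hypothetical second budget row for location 40.
dup = run([(24, 20000), (36, 100300), (40, 24700), (40, 300)])
print(clean)
print(dup)
```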

Gordon's suggestion is good; an alternative using WITH statements (common table expressions) is:
WITH aloc AS (
SELECT location, SUM(sales) FROM actual GROUP BY 1
), bloc AS (
SELECT location, SUM(sales) FROM budget GROUP BY 1
)
SELECT location, a.sum AS actual_sales, b.sum AS budget_sales
FROM aloc a LEFT JOIN bloc b USING (location)
This is equivalent to:
SELECT location, a.sum AS actual_sales, b.sum AS budget_sales
FROM (SELECT location, SUM(sales) FROM actual GROUP BY 1) a LEFT JOIN
(SELECT location, SUM(sales) FROM budget GROUP BY 1) b USING (location)
but I find WITH statements more readable.
The purpose of the subqueries is to get the tables into a state where a row means something relevant (i.e., aloc contains exactly one row per location), so that the join evaluates to what you want.
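The same two-CTE query runs on SQLite with one change: PostgreSQL names an unaliased SUM(sales) column "sum", which is what a.sum and b.sum rely on, whereas SQLite does not, so this sketch aliases the sums explicitly as total (a made-up name). The (40, 300) budget row is invented to show that duplicates on the budget side are now summed rather than multiplied.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE budget (location INTEGER, sales INTEGER);
CREATE TABLE actual (invoice INTEGER, location INTEGER, sales INTEGER);
INSERT INTO budget VALUES (24, 20000), (36, 100300), (40, 24700), (40, 300);
INSERT INTO actual VALUES (10000, 36, 5000), (10001, 40, 6000),
                          (10002, 99, 7000), (10003, 36, 3000);
""")

rows = con.execute("""
    WITH aloc AS (
        SELECT location, SUM(sales) AS total FROM actual GROUP BY location
    ), bloc AS (
        SELECT location, SUM(sales) AS total FROM budget GROUP BY location
    )
    -- each CTE has exactly one row per location, so the join cannot fan out
    SELECT location, a.total AS actual_sales, b.total AS budget_sales
    FROM aloc a LEFT JOIN bloc b USING (location)
    ORDER BY location
""").fetchall()
print(rows)
```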

Related

SQL query for calculated column

I have a table that looks like this -
Table screenshot link - https://i.stack.imgur.com/Pztpq.png
I want to add a new column 'Manufacturer_Updated', such that -
If any particular 'Product' has more than 1 (distinct) 'Manufacturer', then the Manufacturer having highest 'Sales' should be populated in the 'Manufacturer_Updated' column for all rows of that particular 'Product'.
For example, in the above screenshot, Product 'TOTAL HAIR CARE NA' has 2 different Manufacturers, so in the 'Manufacturer_Updated' column, 'SEXY HAIR CONCEPTS' should appear for both rows, as it has the higher sales.
Could someone please help with this query? Thanks in advance!
Something like this should work:
SELECT q2.Manufacturer, q1.Product, q1.Sales, q2.Manufacturer AS Manufacturer_Updated
FROM
(SELECT Product, max(Sales) as Sales, count(distinct Manufacturer) as amt_of_manufacturers
FROM your_table
GROUP BY Product) as q1
left join
(SELECT Manufacturer, Sales, Product
FROM your_table
) as q2
ON q1.Sales = q2.Sales
AND q1.Product = q2.Product
WHERE q1.amt_of_manufacturers > 1
In the first query (q1), you retrieve the maximum sales for each product along with the number of distinct manufacturers for that product (used later in the upper query). In the second one (q2), you just need to retrieve Manufacturer (to turn it into Manufacturer_Updated later), Sales and Product (as join keys). After that, you only need to filter out all products with a single manufacturer.
Alternatively, if you want to keep those, you can remove the WHERE amt_of_manufacturers > 1 filter and replace Manufacturer_Updated in the upper query with the following:
CASE WHEN
amt_of_manufacturers <=1 THEN null
ELSE Manufacturer
END AS Manufacturer_Updated
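Here is a runnable form of the q1/q2 join and of the CASE variant, executed in SQLite from Python, with the WHERE placed after the joins and the columns qualified by q1/q2 so they are not ambiguous. Apart from 'TOTAL HAIR CARE NA' and 'SEXY HAIR CONCEPTS', the table contents (including all sales figures) are invented. Note that both versions return only the top-selling row of each product, because q1 keeps one row per product.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE your_table (Manufacturer TEXT, Product TEXT, Sales INTEGER);
INSERT INTO your_table VALUES
    ('SEXY HAIR CONCEPTS', 'TOTAL HAIR CARE NA', 500),
    ('OTHER MFR',          'TOTAL HAIR CARE NA', 200),  -- made-up rival
    ('SOLO MFR',           'SHAMPOO X',          100);  -- single manufacturer
""")

BASE = """
FROM (SELECT Product, MAX(Sales) AS Sales,
             COUNT(DISTINCT Manufacturer) AS amt_of_manufacturers
      FROM your_table GROUP BY Product) AS q1
LEFT JOIN (SELECT Manufacturer, Sales, Product FROM your_table) AS q2
  ON q1.Sales = q2.Sales AND q1.Product = q2.Product
"""

# Filtered version: only products with more than one manufacturer.
filtered = con.execute(
    "SELECT q2.Manufacturer, q1.Product, q1.Sales, q2.Manufacturer AS Manufacturer_Updated"
    + BASE + "WHERE q1.amt_of_manufacturers > 1 ORDER BY q1.Product"
).fetchall()

# CASE version: keep every product, NULL out the single-manufacturer ones.
with_case = con.execute(
    "SELECT q2.Manufacturer, q1.Product, q1.Sales,"
    " CASE WHEN q1.amt_of_manufacturers <= 1 THEN NULL"
    " ELSE q2.Manufacturer END AS Manufacturer_Updated"
    + BASE + "ORDER BY q1.Product"
).fetchall()
print(filtered)
print(with_case)
```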

sql allocate values based on percentage

Say I have a database with products and revenue. I know that for the product 'Apple', we have many kinds of apples, and roughly 70% of sales are Granny Smith and 30% are Golden Delicious.
select
delivery_month_id,
sales_order_id,
product_id,
product_nm,
net_cost_distributed_amt
from dw.op_sales_order
where delivery_month_id >= 201601
What I have now is
I'm trying to get something like this
I'm assuming I need some CASE WHENs and subqueries, but I'm not entirely sure how to go about this.
You need a table (or similar row source, e.g., a WITH clause) with the product details. Call it, for example, DW.PRODUCT_DETAILS. It should have three columns: PRODUCT_DETAIL_ID, PRODUCT_NM, and ALLOCATION_PCT, where PRODUCT_NM is the name as it appears in your OP_SALES_ORDER table.
Then, you can left join this table into your query to get your desired results:
SELECT so.delivery_month_id,
so.sales_order_id,
so.product_id,
so.product_nm,
so.net_cost_distributed_amt,
so.net_cost_distributed_amt * NVL (pd.allocation_pct,1) rev_revised
FROM dw.op_sales_order so
LEFT JOIN dw.product_details pd ON pd.product_nm = so.product_nm
WHERE so.delivery_month_id >= 201601
With the left join, things like oranges and grapefruits, which do not have any details, will not need to be in the PRODUCT_DETAILS table.
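A small sketch of that join, run in SQLite from Python with a trimmed column list. SQLite has no dw schema and no NVL, so the tables are unqualified and COALESCE (which Oracle also supports) stands in for NVL. The Granny Smith / Golden Delicious split comes from the question; the order rows and amounts are invented.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE op_sales_order (delivery_month_id INTEGER, sales_order_id INTEGER,
                             product_id INTEGER, product_nm TEXT,
                             net_cost_distributed_amt REAL);
CREATE TABLE product_details (product_detail_id TEXT, product_nm TEXT,
                              allocation_pct REAL);
INSERT INTO op_sales_order VALUES (201601, 1, 10, 'Apple', 1000),
                                  (201602, 2, 20, 'Orange', 200);
-- Only products that need splitting get rows here.
INSERT INTO product_details VALUES ('Granny Smith', 'Apple', 0.7),
                                   ('Golden Delicious', 'Apple', 0.3);
""")

rows = con.execute("""
    SELECT so.sales_order_id, so.product_nm,
           so.net_cost_distributed_amt * COALESCE(pd.allocation_pct, 1) AS rev_revised
    FROM op_sales_order so
    LEFT JOIN product_details pd ON pd.product_nm = so.product_nm
    WHERE so.delivery_month_id >= 201601
    ORDER BY so.sales_order_id, rev_revised DESC
""").fetchall()
for r in rows:
    print(r)  # Apple splits into two rows; Orange passes through at 100%
```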

Newb help in designing query to subtract results of two queries in same table

I have seen other questions like this one but feel mine is a bit different, or I didn't quite understand the SQL in the other questions, so my apologies if this one is redundant or very easy.
Anyway, I have an accounting transaction DB that stores every transaction posting within our financial system on one line. What I am trying to do is net the sum of the debits and the credits for each GL account.
Here are the two basic queries I am executing to get the results that I would like to net.
Query 1 gives me the sum of all debit transactions posting to each gl account:
Select gl_debit, sum (amt) from FISC_YEAR2014 where fund = 'XXX'
group by gl_debit
Query 2 gives me the sum of all credit transactions posting to each gl account:
select gl_credit, sum (amt) from FISC_YEAR2014 where fund = 'XXX'
group by gl_credit
Now I would like to subtract the credit amounts from the debit amounts to get net totals for each GL account. Does that make sense?
Thanks.
There are two ways to do this, depending on your table definition. I think your situation is the first.
This is the normal way assuming credits and debits are in separate columns:
SELECT sum(gl_debit)-sum(gl_credit) as net_debit
FROM FISC_YEAR2014
WHERE fund = 'XXX'
This is the other way assuming direction is indicated by a separate column:
SELECT SUM(IF(is_debit=1,amount,-1*amount)) as net_debit
FROM FISC_YEAR2014
WHERE fund = 'XXX'
See also:
MySQL 'IF' in 'SELECT' statement
Can't calculate totals in general ledger report
What's a good way to store a financial ledger?
I believe this is what you need:
select
gl_account,
sum(amt)
from
(
select gl_debit gl_account,
sum(-amt) amt
from fisc_year2014
where fund = 'XXX'
group by gl_debit
union all
select gl_credit,
sum(amt)
from fisc_year2014
where fund = 'XXX'
group by gl_credit
) t
group by
gl_account
There are two SELECTs: one to get the (negative) debits and another to get the credits. They are UNIONed to create a two-column result. The outer SELECT then aggregates the total sum by the gl_account code. If there is a mismatch (a gl_debit without a gl_credit, or vice-versa), then its amount would still be displayed.
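Here is that UNION ALL query run against a few invented ledger rows in SQLite from Python (the account numbers, amounts, and the 'YYY' fund are made up). The derived table carries the alias t, which SQLite does not require but several other databases do.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE fisc_year2014 (gl_debit INTEGER, gl_credit INTEGER,
                            amt INTEGER, fund TEXT);
INSERT INTO fisc_year2014 VALUES
    (101, 201, 100, 'XXX'),
    (101, 301,  50, 'XXX'),
    (201, 101,  30, 'XXX'),
    (501, 101, 999, 'YYY');  -- other fund, filtered out
""")

rows = con.execute("""
    SELECT gl_account, SUM(amt)
    FROM (
        -- debits enter with a negative sign ...
        SELECT gl_debit gl_account, SUM(-amt) amt
        FROM fisc_year2014 WHERE fund = 'XXX' GROUP BY gl_debit
        UNION ALL
        -- ... credits with a positive sign
        SELECT gl_credit, SUM(amt)
        FROM fisc_year2014 WHERE fund = 'XXX' GROUP BY gl_credit
    ) t
    GROUP BY gl_account
    ORDER BY gl_account
""").fetchall()
print(rows)  # [(101, -120), (201, 70), (301, 50)]
```

Account 301 only ever appears as a credit, yet it is still listed, which is the mismatch behaviour described above.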
SQLFiddle here (I added another row to show the effect of mismatched IDs)
To do this you should SUM the debits and credits separately in subqueries, then join those subqueries on gl_credit = gl_debit.
SELECT COALESCE(gl_credit, gl_debit) AS Id
,COALESCE(d.amt,0)-COALESCE(c.amt,0) AS Net
FROM (
SELECT gl_debit, SUM(amt) AS amt
FROM FISC_YEAR2014
GROUP BY gl_debit
) d
FULL OUTER JOIN (
SELECT gl_credit, SUM(amt) AS amt
FROM FISC_YEAR2014
GROUP BY gl_credit
) c ON d.gl_debit = c.gl_credit
ORDER BY COALESCE(gl_credit, gl_debit)
SQLFiddle
Outputs:
ID Net
-----------
101 -475
201 225
301 500
501 -250
If I were you, rather than using a FULL OUTER JOIN, I'd select the IDs from the accounts table (or wherever you store them), then LEFT JOIN both of the subqueries to it. You haven't shown any other tables, though, so I can only speculate.
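A sketch of that suggestion, assuming a hypothetical accounts table with an id column (the question does not show one) and made-up ledger rows, run in SQLite from Python. Both per-account subqueries are LEFT JOINed to the list of accounts, so an account with no postings still shows up with a net of 0.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE accounts (id INTEGER);          -- hypothetical chart of accounts
CREATE TABLE FISC_YEAR2014 (gl_debit INTEGER, gl_credit INTEGER, amt INTEGER);
INSERT INTO accounts VALUES (101), (201), (301), (401);
INSERT INTO FISC_YEAR2014 VALUES (101, 201, 100), (101, 301, 50), (201, 101, 30);
""")

rows = con.execute("""
    SELECT acc.id,
           COALESCE(d.amt, 0) - COALESCE(c.amt, 0) AS net   -- debits minus credits
    FROM accounts acc
    LEFT JOIN (SELECT gl_debit, SUM(amt) AS amt
               FROM FISC_YEAR2014 GROUP BY gl_debit) d ON d.gl_debit = acc.id
    LEFT JOIN (SELECT gl_credit, SUM(amt) AS amt
               FROM FISC_YEAR2014 GROUP BY gl_credit) c ON c.gl_credit = acc.id
    ORDER BY acc.id
""").fetchall()
print(rows)
```

Driving the query from the accounts list also avoids needing FULL OUTER JOIN, which some databases (and older SQLite versions) do not support.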

How do I enforce, that the sum of a weight parameter equals 1, grouped by part of the key?

I have two tables in my setup: one with sales persons and their income. Each sales person only knows their total income. For this particular income period, they are asked to estimate how much of their income comes from private, small business, or large business customers. This information is entered in the second table.
Income
=================
SalesPerson
Income
Distribution
=============================
SalesPerson
CustomerType
Weight
Now, my query would look something like this:
SELECT
Income.SalesPerson,
Distribution.CustomerType,
Income.Income * Distribution.Weight as DistributedIncome
FROM
Income INNER JOIN Distribution ON
Income.SalesPerson = Distribution.SalesPerson
How would I enforce, that the SUM(Weight) = 1 for each SalesPerson in Distribution?
Normalize by the sum, according to the same criteria.
SELECT
Income.SalesPerson,
Distribution.CustomerType,
Income.Income * Distribution.Weight/(
select sum(d.weight)
from distribution d
where d.salesperson = Income.SalesPerson
)
as DistributedIncome
FROM
Income INNER JOIN Distribution ON
Income.SalesPerson = Distribution.SalesPerson
If you somehow want to select unmodified weights that sum to 1, then I believe you have a case of the subset sum problem, and you are probably not going to be able to solve it with an SQL query.
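The normalization can be tried out in SQLite from Python, with the subquery correlated to the current salesperson so that each weight is divided by that person's own total. The sales people, customer types, and weights are invented, and Ann's weights deliberately sum to 4 rather than 1; the distributed amounts still add back up to each person's income.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE Income (SalesPerson TEXT, Income INTEGER);
CREATE TABLE Distribution (SalesPerson TEXT, CustomerType TEXT, Weight REAL);
INSERT INTO Income VALUES ('Ann', 1000), ('Bob', 500);
-- Ann's weights sum to 4, Bob's to 1 (all made-up numbers)
INSERT INTO Distribution VALUES
    ('Ann', 'large', 2.0), ('Ann', 'private', 1.0), ('Ann', 'small', 1.0),
    ('Bob', 'large', 0.5), ('Bob', 'private', 0.5);
""")

rows = con.execute("""
    SELECT Income.SalesPerson, Distribution.CustomerType,
           Income.Income * Distribution.Weight / (
               SELECT SUM(d.Weight) FROM Distribution d
               WHERE d.SalesPerson = Income.SalesPerson  -- same person only
           ) AS DistributedIncome
    FROM Income INNER JOIN Distribution
         ON Income.SalesPerson = Distribution.SalesPerson
    ORDER BY Income.SalesPerson, Distribution.CustomerType
""").fetchall()
print(rows)
```

This rescales the weights at query time; it does not stop someone from entering weights that do not sum to 1.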

MySQL - Max() return wrong result

I tried this query on MySQL server (5.1.41)...
SELECT max(volume), dateofclose, symbol, volume, close, market FROM daily group by market
I got this result:
max(volume) dateofclose symbol volume close market
287031500 2010-07-20 AA.P 500 66.41 AMEX
242233000 2010-07-20 AACC 16200 3.98 NASDAQ
1073538000 2010-07-20 A 4361000 27.52 NYSE
2147483647 2010-07-20 AAAE.OB 400 0.01 OTCBB
437462400 2010-07-20 AAB.TO 31400 0.37 TSX
61106320 2010-07-20 AA.V 0 0.24 TSXV
As you can see, the maximum volume is VERY different from the 'real' value of the volume column?!?
The volume column is defined as int(11), and I have 2 million rows in this table, but that is very far from the MyISAM storage limits, so I can't believe this is the problem. What is also strange is that all the rows shown have the same date (dateofclose). If I force a specific date with a WHERE clause, the same symbol comes out with a different max(volume) result. This is pretty weird...
Need some help here!
UPDATE :
Here's my edited "working" request:
SELECT a.* FROM daily a
INNER JOIN (
SELECT market, MAX(volume) AS max_volume
FROM daily
WHERE dateofclose = '20101108'
GROUP BY market
) b ON
a.market = b.market AND
a.volume = b.max_volume AND
a.dateofclose = '20101108'
So this gives me, by market, the highest-volume stock (for Nov 8, 2010).
As you can see, the maximum volume is VERY different from the 'real' value of the volume column?!?
This is because MySQL rather bizarrely allows columns in the SELECT list that are neither aggregated nor listed in the GROUP BY (unless the ONLY_FULL_GROUP_BY SQL mode is enabled), and it doesn't group them in a sensible way.
Selecting MAX(column) will get you the maximum value for that column, but selecting other columns (or the column itself) will not necessarily select the entire row that the MAX() value was found in. You essentially get an arbitrary (and usually useless) row back.
Here's a thread with some workarounds using subqueries:
How can I SELECT rows with MAX(Column value), DISTINCT by another column in SQL?
This is a subset of the "greatest n per group" problem. (There is a tag with that name, but I am a new user, so I can't retag.)
This is usually best handled with an analytic function, but can also be written with a join to a sub-query using the same table. In the sub-query you identify the max value, then join to the original table on the keys to find the row that matches the max.
Assuming that {dateofclose, symbol, market} is the grain at which you want the maximum volume, try:
select
a.*, b.max_volume
from daily a
join
(
select
dateofclose, symbol, market, max(volume) as max_volume
from daily
group by
dateofclose, symbol, market
) b
on
a.dateofclose = b.dateofclose
and a.symbol = b.symbol
and a.market = b.market
and a.volume = b.max_volume
Also see this post for reference.
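A sketch of the find-the-max-in-a-sub-query-then-join-back pattern, shaped like the per-market query in the question's UPDATE and run in SQLite from Python with invented rows (dates are stored as 'YYYY-MM-DD' text here). The outer row is also filtered on the same date: without that, the made-up 2010-11-09 row for DDD, which happens to have the same volume as the NYSE maximum, would be returned too.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE daily (dateofclose TEXT, symbol TEXT, volume INTEGER,
                    close REAL, market TEXT);
INSERT INTO daily VALUES
    ('2010-11-08', 'AAA', 100, 1.5, 'NYSE'),
    ('2010-11-08', 'BBB', 300, 2.5, 'NYSE'),
    ('2010-11-08', 'CCC',  50, 3.5, 'AMEX'),
    ('2010-11-09', 'DDD', 300, 4.5, 'NYSE');  -- same volume, other day
""")

def top_by_market(filter_outer_date):
    """Rows whose volume equals their market's max volume on 2010-11-08."""
    outer = "AND a.dateofclose = '2010-11-08'" if filter_outer_date else ""
    return con.execute(f"""
        SELECT a.*
        FROM daily a
        INNER JOIN (
            SELECT market, MAX(volume) AS max_volume
            FROM daily
            WHERE dateofclose = '2010-11-08'
            GROUP BY market
        ) b ON a.market = b.market AND a.volume = b.max_volume {outer}
        ORDER BY a.market, a.symbol
    """).fetchall()

print(top_by_market(True))   # one winner per market
print(top_by_market(False))  # NYSE also picks up DDD from 2010-11-09
```

Ties within the same day would still return more than one row per market; that is inherent to joining on the max value.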
Did you try adjusting your query to include Symbol in the grouping?
SELECT max(volume), dateofclose, symbol,
volume, close, market FROM daily group by market, symbol