How to get a column with count 0 in sql - sql

I am trying to find the number of sales per petrol company. However I also want to include petrol companies that have made no sales but I cannot figure out how to do it. The stations table includes all the stations however sales only includes stations which actually had any sales.
This is how I am finding the number of sales per petrol station, but this doesnt include companies with 0 sales:
select stations.company,count(sales.sale)
from stations
join sales on stations.id=sales.stationid
group by stations.company;
My idea is to create a union with another query which just finds the companies with 0 sales but I don't know how to get a column with a 0 value in it. I tried to add having count(sales.sale) = 0 but since the stations with no sales just don't appear in the sales table that doesn't work.
I have looked at similar stack overflow questions but they all seem to reference using a different type of join however I have tried using left/right outer/inner joins with no luck.

You haven't provided a minimum reproducible example so I can't be entirely certain, but seems like a case of needing a LEFT JOIN. Try the query below :
SELECT stations.company, count(sales.sale) AS TotalSales
FROM stations
LEFT JOIN sales ON stations.id=sales.stationid
GROUP BY stations.company

Related

SQL joining two tables with different levels of details

So I have two tables of sales, budget and actual.
"budget" has two columns: location and sales. For example,
location sales
24 $20000
36 $100300
40 $24700
Total $145000
"actual" has three columns: invoice_number, location, and sales. For example,
invoice location sales
10000 36 $5000
10001 40 $6000
10002 99 $7000
and so forth
Total $110000
In summary, "actual" records transactions at the invoice level, whereas "budget" is done at the location level only (no individual invoices).
I'm trying to create a summary table that lists actual and budget sales side by side, grouped by location. The total of the actual column should be $110000, and $145000 for budget. This is my attempt at it (on pgAdmin/ postgresql):
SELECT actual.location, SUM(actual.sales) AS actual_sales, SUM(budget.sales) AS budget_sales
FROM actual LEFT JOIN budget
ON actual.location = budget.location
GROUP BY actual.location;
I used LEFT JOIN because "actual" has locations that "budget" doesn't have (e.g. location 99).
I ended up with some gigantic numbers ($millions) on both the actual_sales and budget_sales columns, far exceeding the total actual ($110000) or budget sales ($145,000).
Is this because the way I wrote my query is basically asking SQL to join each invoice in "actual" to each line in "budget," therefore duplicating things many times over? If so how should I have written this?
Thanks in advance!
Based on your description, you seem to have duplicates in both tables. There are various ways to solve this problem. Here is one using union all and group by:
select Location,
sum(actual_sales) as actual_sales,
sum(budget_sales) as budget_sales
from ((select a.location, a.sales as actual_sales, null as budget_sales
from actual a
) union all
(select b.location, null, b.sales
from budget b
)
) ab
group by location;
This structure guarantees that each value is counted only once, regardless of the table.
The query looks fine to me. However, it is difficult to find out why the figures are wrong. My suggestion is that you do the sum by location separately for budget and actual into 2 temporary tables, and later put them together using LEFT JOIN.
Yes, you're joining the budget in once for each actual sales row. However, your Actual Sales sum shouldn't have been larger unless there were multiple budget rows for the same location. You should check for that, because it doesn't sound like there should be.
What you need to do in a case like this is sum the actual sales first in a CTE or subquery, then later join the result to the budget. That way you only have one row for each location. This does it for the actual sales. If you really do have more than one row for a location for budget as well, you might need to subquery the budget as well the same way.
Select Act.Location, Act.actual_sales, budget.sales as budget_sales
From
(
SELECT actual.location, SUM(actual.sales) AS actual_sales
FROM actual
GROUP BY actual.location
) Act
left join budget on Act.location = budget.location
Gordon's suggestion is good, an alternative using WITH statements is:
WITH aloc AS (
SELECT location, SUM(sales) FROM actual GROUP BY 1
), bloc AS (
SELECT location, SUM(sales) FROM budget GROUP BY 1
)
SELECT location, a.sum AS actual_sales, b.sum AS budget_sales
FROM aloc a LEFT JOIN bloc b USING (location)
This is equivalent to:
SELECT location, a.sum AS actual_sales, b.sum AS budget_sales
FROM (SELECT location, SUM(sales) FROM actual GROUP BY 1) a LEFT JOIN
(SELECT location, SUM(sales) FROM budget GROUP BY 1) b USING (location)
but I find WITH statements more readable.
The purpose of the subqueries is to get tables into a state where a row means something relevant, i.e. aloc contains a row per location, and hence cause the join to evaluate to what you want.

Parallel Date Sales SQL View

I have a challenge which I can't seem to resolve on my own and now need help!
I have a requirement to show parallel year date sales via SQL and by that I mean if today (20/08/2015) Customer A has purchased products worth 500, I want to know how much Customer A spent on the same day last year (so 20/08/2014).
Here's a SQL fiddle where I've built everything (I reckoned that would be easiest for you guys). I have 3 dimensions (DimProduct, DimDate and DimCustomer), a fact table (FactSales) and a view (VW_ParallelSales) which I've built on top. I have also left a query on the right hand side with what I'm trying to achieve. If you run the query you will see that for Antonio, the SaleAmount on 20140820 was 3500 and if you look at the very bottom of the table, you can see there's one more record for Antonio in the fact table on 20150820 for 6500. So esentially, what I want is to have that 3500 which was sold on 20140820 (which is the parallel year date of 20150820) under the column ParallelSales (which at the moment is showing as NULL).
It all works like a charm if I don't include the ProductKey in the view and have just the CustomerKey (see this fiddle). However, as soon as I add the Product Key, because there is no exact match of CustomerKey-ProductKey that has happened in the past, I'm getting NULLS for ParallelSales (or at least that's what I think the reason is).
What I want to be able to do is then use the view and join on both DimCustomer and DimProduct and run queries both ways, i.e.:
Query 1: How much did Customer A spend today vs today last year?
Query 2: How much of Product A did we sell today vs today last year?
At the moment, as is, I need to have 2 views for that - one that joins the two sub-queries in the view on CustomerKey and the other one - on ProductKey (and obviously the dates).
I know it's a lot to ask but I do need to get this to work and would appreciate your help immensely! Thanks :)
For customer sales in diferent years.
SQL Fiddle Demo
SELECT DimCustomer.CustomerName,
VW_Current.Saledate,
VW_Current.ParallelDate,
VW_Current.CurrentSales,
VW_Previous.CurrentSales as ParallelSale
FROM DimCustomer
INNER JOIN VW_ParallelSales VW_Current
ON DimCustomer.CustomerKey = VW_Current.CustomerKey
LEFT JOIN VW_ParallelSales VW_Previous
ON VW_Current.ParallelDate = VW_Previous.Saledate
AND DimCustomer.CustomerKey = VW_Previous.CustomerKey
ORDER BY 1, 2
For productkey
SQL Fiddle Demo
With sales as (
SELECT
DimProduct.ProductKey,
DimProduct.ProductName,
VW_ParallelSales.Saledate,
VW_ParallelSales.ParallelDate,
VW_ParallelSales.CurrentSales,
VW_ParallelSales.ParallelSales
FROM DimProduct INNER JOIN VW_ParallelSales ON DimProduct.ProductKey =
VW_ParallelSales.ProductKey
)
SELECT
s_recent.ProductName,
s_recent.Saledate ThisYear,
s_old.Saledate PreviousYear,
s_recent.CurrentSales CurrentSales,
s_old.CurrentSales ParallelSales
FROM
SALES s_recent
left outer join SALES s_old
on s_recent.saledate = s_old.saledate + 10000
and s_recent.ProductKey = s_old.ProductKey

Create one query with sum and count with each value pulled from a different table

I am trying to create a query that pulls two different aggregated values from three different tables during a specific date range. I am working in Access 2003.
I have:
tblPO which has the high level purchase order description (company name, shop order #, date of order, etc)
tblPODescription which has the dollar values of the individual line items from customers the purchase order
tblCostSheets which as a breakdown of the individual pieces that we need to manufacture to satisfy the customers purchase order.
I am looking to create a query that will allow me, based on the Shop Order #, to get both the sum of the dollar values from tblPODescriptions and the count of the different type of pieces we need to make from tblCostSheets.
A quick caveat: the purchase order may have 5 line items for a sum of say $1560 but it might take us making 8 or 9 different parts to satisfy those 5 line items. I can easily create a query that pulls either the sum or the count by themselves, but when I created my query with both, I end up with numbers that are multipled versions of what I want. I believe it is multiplying my piece counts and dollar values.
SELECT DISTINCTROW tblPO.CompanyName, tblPO.ShopOrderNo, tbl.OrderDate, Sum(tblPODescriptions.ItemAmount) AS SumOfItemAmount, Count(tblCostSheets.Description) AS CountOfDescription
FROM (tblPO INNER JOIN tblPODescriptions ON (tblPO.CompanyName = tblPODescriptions.CompanyName) AND (tblPO.PurchaseOrderNo = tblPODescriptions.PurchaseOrderNo) AND (tblPO.PODate = tblPODescriptions.PODate)) INNER JOIN tblCostSheets ON tblPO.ShopOrderNo = tblCostSheets.ShopOrderNo
GROUP BY tblPO.CompanyName, tblPO.ShopOrderNo, tblPO.OrderDate
HAVING (((tblPO.OrderDate) Between [Enter Start Date:] And [Enter End Date:]));

Join two queries together with default values of 0 if empty in Access

I have seen some close answers and I have been trying to adapt them to Access 2013, but I can't seem to get it to work. I have two queries:
First query returns
original_staff_data
Month
Year
staff_uid
staff_abbrev
employee_name
staff_salary
It pulls this from tables staff, and salary_by_month and employee_name and number_of_days_at_spec_building (this records where they check in when they work)
transaction_data_by_staff.total
Month
Year
staff_uid
total_revenue
totat_profit
this also pulls information from staff, but sums up over multiple dates in a transaction table creating a cumulative value for each staff_uid so I can't combine the two queries directly.
My problem is I want to create a query that brings results from both. However, not all staff members in Q1 will be in Q2 every day/week/month (vacations, etc) and since I want to ultimately create a final results:
Final_Result
Month
Year
staff_uid
staff_abbrev
employee_name
staff_salary
total_revenue
total_profit
The SQL:
SELECT
original_staff_data.*
, transaction_data_by_staff.total_rev
, transaction_data_by_staff.total_profit
FROM transaction_data_by_staff
RIGHT JOIN original_staff_data
ON (
transaction_data_by_staff.year = original_staff_data.year
AND transaction_data_by_staff.month = original_staff_data.month
) WHERE transaction_data_by_staff.[staff_uid] = [original_staff_data].[staff_uid];
I would like it if there is no revenue or profit that month from that employee, it makes those values 0. I have tried join (specifically RIGHT join with Q1 as the RIGHT join) and it doesn't seem to work, I still only get the subset. There are originally in the original_staff_data query 750 entries so therefore there should be in the final query 750 entries, I am only getting 252, which is the total in transaction_data_by_staff. Any clue on how the ACCESS 2013 SQL should look?
Thanks
Jon
Move the link by stuff_uid to the ON clause, like this:
SELECT original_staff_data.*, transaction_data_by_staff.total_rev, transaction_data_by_staff.total_profit
FROM transaction_data_by_staff RIGHT JOIN original_staff_data ON (transaction_data_by_staff.year = original_staff_data.year) AND (transaction_data_by_staff.month = original_staff_data.month)
AND (((transaction_data_by_staff.[staff_uid])=[original_staff_data].[staff_uid]));

sql SUM value incorrect when using joins and group by

Im writing a query that sums order values broken down by product groups - problem is that when I add joins the aggregated SUM gets greatly inflated - I assume its because its adding in duplicate rows. Im kinda new to SQL, but I think its because I need to construct the query with sub selects or nested joins?
All data returns as expected, and my joins pull out the needed data, but the SUM(inv.item_total) AS Value returned is much higher that it should be - SQL below
SELECT so.Company_id, SUM(inv.item_total) AS Value, co.company_name,
agents.short_desc, stock_type.short_desc AS Type
FROM SORDER as so
JOIN company AS co ON co.company_id = so.company_id
JOIN invoice AS inv ON inv.Sorder_id = so.Sorder_id
JOIN sorder_item AS soitem ON soitem.sorder_id = so.Sorder_id
JOIN STOCK AS stock ON stock.stock_id = soitem.stock_id
JOIN stock_type AS stock_type ON stock_type.stype_id = stock.stype_id
JOIN AGENTS AS AGENTS ON agents.agent_id = co.agent_id
WHERE
co.last_ordered >'01-JAN-2012' and so.Sotype_id='1'
GROUP BY so.Company_id,co.company_name,agents.short_desc, stock_type.short_desc
Any guidence on how I should structure this query to pull out an "un-duplicated" SUM(inv.item_total) AS Value much appreciated.
To get an accurate sum, you want only the joins that are needed. So, this version should work:
SELECT so.Company_id, SUM(inv.item_total) AS Value, co.company_name
FROM SORDER so JOIN
company co
ON co.company_id = so.company_id JOIN
invoice inv
ON inv.Sorder_id = so.Sorder_id
group by so.Company_id, co.company_name
You can then add in one join at a time to see where the multiplication is taking place. I'm guessing it has to do with the agents.
It sounds like the joins are not accurate.
First suspect join
For example, would an agent be per company, or per invoice?
If it is per order, then should the join be something along the lines of
JOIN AGENTS AS AGENTS ON agents.agent_id = inv.agent_id
Second suspect join
Can one order have many items, and many invoices at the same time? That can cause problems as well. Say an order has 3 items and 3 invoices were sent out. According to your joins, the same item will show up 3 times means a total of 9 line items where there should be only 3. You may need to eliminate the invoices table
Possible way to solve this on your own:
I would remove all the grouping and sums, and see if you can filter by one invoice produce an unique set of rows for all the data.
Start with an invoice that has just one item and inspect your result set for accuracy. If that works, then add another invoice that has multiple and check the rows to see if you get your perfect dataset back. If not, then the columns that have repeating values (Company Name, Item Name, Agent Name, etc) are usually a good starting point for checking up on why the duplicates are showing up.