Joining 3 tables on 2 columns?

I've created 3 views with identical columns: Quantity, Year, and Variety. I want to join all three views on year and variety in order to do some calculations with the quantities.
The problem is that a particular year/variety combo does not occur in every view.
I've tried queries like:
SELECT
*
FROM
a
left outer join
b
on a.variety = b.variety
left outer join
c
on a.variety = c.variety or b.variety = c.variety
WHERE
a.year = '2015'
and b.year = '2015'
and a.year= '2015'
Obviously this isn't the right solution. Ideally I'd like to join on both year and variety and not use a where statement at all.
The desired output would put all quantities for a matching year and variety on the same row, even when one of the views has no value for that combination.
I really appreciate the help, thanks.

You want a full outer join, not a left join, like so:
Select coalesce(a.year, b.year, c.year) as Year
, coalesce(a.variety, b.variety, c.variety) as Variety
, a.Quantity, b.Quantity, c.Quantity
from tableA a
full outer join tableB b
on a.variety = b.variety
and a.year = b.year
full outer join tableC c
on isnull(a.variety, b.variety) = c.variety
and isnull(a.year, b.year) = c.year
where coalesce(a.year, b.year, c.year) = 2015
The left join you are using won't pick up values from b or c that aren't in a. Additionally, your where clause is dropping rows that don't have values in all three tables (because the year in those rows is null, which is not equal to 2015). The full outer join will grab rows from either table in the join, regardless of whether the other table contains a match.
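For engines without FULL OUTER JOIN (older SQLite builds, for example), roughly the same result can be had by first collecting every year/variety pair and then left joining each view to that list. This is a different technique from the query above, shown only as a sketch: the tables a, b, c and their rows are made-up sample data, the SQL is run through Python's sqlite3 module so it can be tried as-is, and it assumes each view holds at most one row per year/variety pair.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    -- Hypothetical sample data standing in for the three views.
    CREATE TABLE a (year INT, variety TEXT, quantity INT);
    CREATE TABLE b (year INT, variety TEXT, quantity INT);
    CREATE TABLE c (year INT, variety TEXT, quantity INT);
    INSERT INTO a VALUES (2015, 'Gala', 10), (2015, 'Fuji', 20);
    INSERT INTO b VALUES (2015, 'Gala', 1),  (2014, 'Fuji', 2);
    INSERT INTO c VALUES (2015, 'Honeycrisp', 7);
""")

rows = con.execute("""
    WITH keys AS (
        -- UNION (without ALL) removes duplicate (year, variety) pairs.
        SELECT year, variety FROM a
        UNION SELECT year, variety FROM b
        UNION SELECT year, variety FROM c
    )
    SELECT k.year, k.variety, a.quantity, b.quantity, c.quantity
    FROM keys k
    LEFT JOIN a ON a.year = k.year AND a.variety = k.variety
    LEFT JOIN b ON b.year = k.year AND b.variety = k.variety
    LEFT JOIN c ON c.year = k.year AND c.variety = k.variety
    WHERE k.year = 2015          -- filter on the key list, not on a, b or c
    ORDER BY k.variety
""").fetchall()

for row in rows:
    print(row)
# (2015, 'Fuji', 20, None, None)
# (2015, 'Gala', 10, 1, None)
# (2015, 'Honeycrisp', None, None, 7)
```

Because the year filter is applied to the key list, rows that are missing from one or two of the views survive with NULL quantities instead of being dropped.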

Related

Why is the SQL full outer join not presenting unmatched customers (avc_id)?

I appreciate your help in advance!
The right table avc_enr has 108K customers (b.avc_id) in it. In the 2nd table (alias a), we have about 97K customers (a.avc_id).
I tried to use right, left and full outer join but every time the count of customers shows 97K rather than 108K customers (under Total_users)... any idea why with full outer join the count function is not counting all customers even if no common match is found between two tables?
with avc_enr as
(
select
dt, avc_id, service_template_name
from
hive.thor_satellite.v_nms_inventory_nmsdb_avc_service
where
current_status = 'ACTIVE' and dt = 20220809
)
select
a.dt, a.metrics_date,
avg(a.vsat_fl_byte_count_kbps) as AUPU_Kbps,
count(b.avc_id) as Total_users
from
hive.thor_satellite.vda_satellite_nms_performance_smts_avc_pm_throughput a
full outer join
avc_enr b on a.avc_id = b.avc_id and a.dt = b.dt
where
a.dt = 20220809
group by
a.dt, a.metrics_date

SQL Get aggregate as 0 for non existing row using inner joins

I am using SQL Server to query these three tables that look like (there are some extra columns but not that relevant):
Customers -> Id, Name
Addresses -> Id, Street, StreetNo, CustomerId
Sales -> AddressId, Week, Total
And I would like to get the total sales per week and customer (showing at the same time the address details). I have come up with this query
SELECT a.Name, b.Street, b.StreetNo, c.Week, SUM (c.Total) as Total
FROM Customers a
INNER JOIN Addresses b ON a.Id = b.CustomerId
INNER JOIN Sales c ON b.Id = c.AddressId
GROUP BY a.Name, c.Week, b.Street, b.StreetNo
and even though my SQL skills are close to none, it looks like it's doing its job. But now I would like to show 0 whenever a customer doesn't have sales for a particular week (weeks are just integers). I wonder if I should somehow get the distinct values of the weeks in the Sales table and then loop through them (not sure how).
Any help?
Thanks
Use CROSS JOIN to generate the rows for all customers and weeks. Then use LEFT JOIN to bring in the data that is available:
SELECT c.Name, a.Street, a.StreetNo, w.Week,
COALESCE(SUM(s.Total), 0) as Total
FROM Customers c CROSS JOIN
(SELECT DISTINCT s.Week FROM Sales s) w LEFT JOIN
Addresses a
ON c.Id = a.CustomerId LEFT JOIN
Sales s
ON s.Week = w.Week AND s.AddressId = a.Id
GROUP BY c.Name, a.Street, a.StreetNo, w.Week;
Using table aliases is good, but the aliases should be abbreviations for the table names. So, a for Addresses not Customers.
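Here is a small runnable sketch of the CROSS JOIN plus LEFT JOIN pattern. The rows are invented sample data, the column names follow the schema in the question, and the SQL is run through Python's sqlite3 module rather than SQL Server, so treat it as an illustration only.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    -- Hypothetical sample rows; Bob has no sales in week 1.
    CREATE TABLE Customers (Id INT, Name TEXT);
    CREATE TABLE Addresses (Id INT, Street TEXT, StreetNo INT, CustomerId INT);
    CREATE TABLE Sales (AddressId INT, Week INT, Total INT);
    INSERT INTO Customers VALUES (1, 'Ann'), (2, 'Bob');
    INSERT INTO Addresses VALUES (10, 'Main St', 1, 1), (20, 'Oak Ave', 2, 2);
    INSERT INTO Sales VALUES (10, 1, 5), (10, 1, 7), (10, 2, 3), (20, 2, 4);
""")

rows = con.execute("""
    SELECT c.Name, w.Week, COALESCE(SUM(s.Total), 0) AS Total
    FROM Customers c
    CROSS JOIN (SELECT DISTINCT Week FROM Sales) w   -- every customer x every week
    LEFT JOIN Addresses a ON a.CustomerId = c.Id
    LEFT JOIN Sales s ON s.AddressId = a.Id AND s.Week = w.Week
    GROUP BY c.Name, w.Week
    ORDER BY c.Name, w.Week
""").fetchall()

for row in rows:
    print(row)
# ('Ann', 1, 12)
# ('Ann', 2, 3)
# ('Bob', 1, 0)
# ('Bob', 2, 4)
```

The week condition sits in the ON clause of the last join, not in a WHERE clause, which is what keeps the customer/week pairs that have no sales.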
You should generate the week numbers rather than using DISTINCT. This is better for performance, and more reliable because DISTINCT only returns weeks in which at least one sale exists. Then use a LEFT JOIN on the Sales table instead of an INNER JOIN:
SELECT a.Name
,b.Street
,b.StreetNo
,weeks.[Week]
,COALESCE(SUM(c.Total),0) as Total
FROM Customers a
INNER JOIN Addresses b ON a.Id = b.CustomerId
CROSS JOIN (
-- Generate a sequence of 52 integers (13 x 4)
SELECT ROW_NUMBER() OVER (ORDER BY a.x) AS [Week]
FROM (VALUES(1),(1),(1),(1),(1),(1),(1),(1),(1),(1),(1),(1),(1)) a(x)
CROSS JOIN (SELECT x FROM (VALUES(1),(1),(1),(1)) b(x)) b
) weeks
LEFT JOIN Sales c ON b.Id = c.AddressId AND c.[Week] = weeks.[Week]
GROUP BY a.Name
,b.Street
,b.StreetNo
,weeks.[Week]
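To see why generated week numbers behave differently from DISTINCT, here is a sketch that builds weeks 1 to 52 and left joins the sales onto them. It swaps the ROW_NUMBER() / VALUES trick for a recursive CTE, since the example runs on SQLite through Python's sqlite3 module; the Sales rows are made up.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    -- Hypothetical data: nobody sold anything in week 2.
    CREATE TABLE Sales (AddressId INT, Week INT, Total INT);
    INSERT INTO Sales VALUES (10, 1, 5), (10, 3, 7);
""")

rows = con.execute("""
    WITH RECURSIVE weeks(Week) AS (
        SELECT 1
        UNION ALL
        SELECT Week + 1 FROM weeks WHERE Week < 52   -- stops at week 52
    )
    SELECT w.Week, COALESCE(SUM(s.Total), 0) AS Total
    FROM weeks w
    LEFT JOIN Sales s ON s.Week = w.Week
    GROUP BY w.Week
    ORDER BY w.Week
""").fetchall()

print(len(rows))   # 52
print(rows[:3])    # [(1, 5), (2, 0), (3, 7)]
```

Week 2 shows up with a total of 0 even though it never appears in Sales; a week list built with SELECT DISTINCT Week FROM Sales would have skipped it.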
Please try the following...
SELECT Name,
Street,
StreetNo,
Week,
SUM( CASE
WHEN Total IS NULL THEN
0
ELSE
Total
END ) AS Total
FROM Customers a
JOIN Addresses b ON a.Id = b.CustomerId
RIGHT JOIN Sales c ON b.Id = c.AddressId
GROUP BY a.Name,
c.Week,
b.Street,
b.StreetNo;
I have modified your statement in three places. The first is I changed your join to Sales to a RIGHT JOIN. This will join as it would with an INNER JOIN, but it will also keep the records from the table on the right side of the JOIN that do not have a matching record or group of records on the left, placing NULL values in the resulting dataset's fields that would have come from the left of the JOIN. A LEFT JOIN works in the same way, but with any extra records in the table on the left being retained.
I have removed the word INNER from your surviving INNER JOIN. Where JOIN is not preceded by a join type, an INNER JOIN is performed. Both JOIN and INNER JOIN are considered correct, but the prevailing protocol seems to be to leave the INNER out, where the RDBMS allows it to be left out (which SQL-Server does). Which you go with is still entirely up to you - I have left it out here for illustrative purposes.
The third change is that I have added a CASE expression that tests whether the Total field is NULL, which it will be if there were no sales for that Customer for that Week. SUM() skips NULL values and returns NULL when every value in a group is NULL, so the CASE expression replaces each NULL with 0 before it is summed. Where Total is not NULL, its value is added to the sum for that grouping as usual.
Please note that I am assuming that Total will not have any NULL values other than from the RIGHT JOIN. Please advise me if this assumption is incorrect.
Please also note that I have assumed that either there will be no missing Weeks for a Customer in the Sales table or that you are not interested in listing them if there are. Again, please advise me if this assumption is incorrect.
If you have any questions or comments, then please feel free to post a Comment accordingly.
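For reference, this is how SUM() treats NULLs, and how the CASE form compares with wrapping the aggregate in COALESCE. A minimal sketch on a made-up table t, run on SQLite through Python's sqlite3 module; as far as I know SQL Server treats NULLs in SUM() the same way.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    -- Hypothetical table: one group with a NULL mixed in, one with only NULLs.
    CREATE TABLE t (grp TEXT, total INT);
    INSERT INTO t VALUES ('mixed', 5), ('mixed', NULL), ('all_null', NULL);
""")

rows = con.execute("""
    SELECT grp,
           SUM(total)                                          AS plain_sum,
           COALESCE(SUM(total), 0)                             AS coalesced,
           SUM(CASE WHEN total IS NULL THEN 0 ELSE total END)  AS case_sum
    FROM t
    GROUP BY grp
    ORDER BY grp
""").fetchall()

for row in rows:
    print(row)
# ('all_null', None, 0, 0)
# ('mixed', 5, 5, 5)
```

Both forms give 0 for the all-NULL group, so which one to use is mostly a matter of taste.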

Translating subquery to left join in sqlite

I have a query running against a SQLite database that uses a couple of subqueries. To accommodate some new requirements, I need to translate it to use joins instead. Below is the structure of the original query:
SELECT c.id AS category_id, b.budget_year,
(
SELECT sum(actual)
FROM lines l1
WHERE status = 'complete'
AND category_id = c.id
AND billing_year = b.budget_year
) AS actual,
(
SELECT sum(planned)
FROM lines l2
WHERE status IN ('forecasted', 'in-progress')
AND category_id = c.id
AND billing_year = b.budget_year
) AS rough_proposed
FROM categories AS c
LEFT OUTER JOIN budgets AS b ON (c.id = b.category_id)
GROUP BY c.id, b.budget_year;
The next query is my first attempt to convert it to use LEFT OUTER JOINs:
SELECT c.id AS category_id, b.budget_year, sum(l1.actual) AS actual, sum(l2.planned) AS planned
FROM categories AS c
LEFT OUTER JOIN budgets AS b ON (c.id = b.category_id)
LEFT OUTER JOIN lines AS l1 ON (l1.category_id = c.id
AND l1.billing_year = b.budget_year
AND l1.status = 'complete')
LEFT OUTER JOIN lines AS l2 ON (l2.category_id = c.id
AND l2.billing_year = b.budget_year
AND l2.status IN ('forecasted', 'in-progress'))
GROUP BY c.id, b.budget_year;
However, the actual and rough_proposed columns are much larger than expected. I am no SQL expert, and I am having a hard time understanding what is going on here. Is there a straightforward way to convert the subqueries to joins?
There is a problem with both your queries. However, the first query hides the problem, while the second query makes it visible.
Here is what's going on: you join lines twice, once as l1 and once more as l2. Before grouping, the result contains every combination of a matching l1 row with a matching l2 row, so when a category and year have both complete lines and forecasted / in-progress lines, each line appears multiple times. Each line is then counted multiple times, resulting in inflated values.
The first query hides this, because it does not apply aggregation to actual and rough_proposed columns. SQLite picks the first entry for each group, which has the correct value.
You can fix your query by joining to lines only once, and counting the amounts conditionally, like this:
SELECT
c.id AS category_id
, b.budget_year
, SUM(CASE WHEN l.status = 'complete' THEN l.actual END) AS actual
, SUM(CASE WHEN l.status IN ('forecasted', 'in-progress') THEN l.planned END) AS planned
FROM categories AS c
LEFT OUTER JOIN budgets AS b ON (c.id = b.category_id)
LEFT OUTER JOIN lines AS l ON (l.category_id = c.id AND l.billing_year = b.budget_year)
GROUP BY c.id, b.budget_year
In this new query each row from lines is brought in only once; the decision to count it in one of the actual/planned columns is made inside the conditional expression embedded in the SUM aggregating function.
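The inflation is easy to reproduce on a tiny data set. In the sketch below (invented rows, run on SQLite through Python's sqlite3 module) one category has two complete lines and two open ones, so the double join yields 2 x 2 = 4 rows before grouping and every amount is counted twice.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    -- Hypothetical sample data for a single category and budget year.
    CREATE TABLE categories (id INT);
    CREATE TABLE budgets (category_id INT, budget_year INT);
    CREATE TABLE lines (category_id INT, billing_year INT, status TEXT,
                        actual INT, planned INT);
    INSERT INTO categories VALUES (1);
    INSERT INTO budgets VALUES (1, 2020);
    INSERT INTO lines VALUES
        (1, 2020, 'complete',    100, 0),
        (1, 2020, 'complete',     50, 0),
        (1, 2020, 'forecasted',    0, 30),
        (1, 2020, 'in-progress',   0, 20);
""")

# Joining lines twice: each l1 row is paired with each l2 row.
double_join = con.execute("""
    SELECT c.id, b.budget_year, SUM(l1.actual), SUM(l2.planned)
    FROM categories c
    LEFT JOIN budgets b ON c.id = b.category_id
    LEFT JOIN lines l1 ON l1.category_id = c.id
                      AND l1.billing_year = b.budget_year
                      AND l1.status = 'complete'
    LEFT JOIN lines l2 ON l2.category_id = c.id
                      AND l2.billing_year = b.budget_year
                      AND l2.status IN ('forecasted', 'in-progress')
    GROUP BY c.id, b.budget_year
""").fetchone()

# Joining lines once and summing conditionally.
single_join = con.execute("""
    SELECT c.id, b.budget_year,
           SUM(CASE WHEN l.status = 'complete' THEN l.actual END),
           SUM(CASE WHEN l.status IN ('forecasted', 'in-progress') THEN l.planned END)
    FROM categories c
    LEFT JOIN budgets b ON c.id = b.category_id
    LEFT JOIN lines l ON l.category_id = c.id
                     AND l.billing_year = b.budget_year
    GROUP BY c.id, b.budget_year
""").fetchone()

print(double_join)   # (1, 2020, 300, 100)  -- doubled
print(single_join)   # (1, 2020, 150, 50)   -- expected totals
```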

join with date dimension but don't want NULL for the dates with values

I have a query:
SELECT
d.FiscalMonth,
d.FiscalMonthOfYear,
p.Name
FROM
DimDate d
LEFT JOIN FactSales f on f.SaleDate=d.PKDate
LEFT JOIN DimPerson p on p.PersonId=f.PersonId
WHERE d.FiscalYear='2014/7/1'
group by d.FiscalMonth, d.FiscalMonthOfYear, p.Name
ORDER BY d.FiscalMonthOfYear asc, p.PersonID asc
Which gives me these results:
Which is all fine, I want to include all months, even the ones that don't have data. (In this case FiscalMonth 2-12.)
The problem I have is with that one NULL value where I have data, IE. FiscalMonthOfYear 1. The red box.
How would I go about not returning that one "NULL" for the FiscalMonth=2014-07-01? I've tried some various where clauses but any time I remove the "NULL" values from the results, I also remove all the ones I want (IE. FiscalMonthOfYear 2-12)
Any help or guidance is greatly appreciated!
Thanks!
-Russ
Update:
DimDate table has primary key PKDate, which is one row for every date:
DimDate
PKDate ....
2014-07-01
2014-07-02
2014-07-03
etc.
FactSales table has one or many Sales transactions for a given day:
FactSales
SaleDate Amount
2014-07-01 34.99
2014-07-01 21.89
2014-07-02 24.77
2014-07-04 22.77
The problem is that FactSales may not have a sale on a particular day. So my query is finding that one (or many) days with no transactions, and because of the LEFT JOIN is returning it. How would I go about removing this result so it's not in my results?
SELECT
d.PKDate
,f.SaleDate
FROM
DimDate d
LEFT JOIN FactSales f on f.SaleDate=d.PKDate
LEFT JOIN DimPerson p on p.PersonId=f.PersonId
WHERE d.FiscalYear='2014/7/1'
ORDER BY d.PKDate
The problem stems from the fact that you are actually trying to do two things at once:
You want all the Names related to sales of fiscal months with at least one sale
You want an extra row for every fiscal month with no sales
As often goes in these cases... you should solve the two distinct problems and then put together the results (with a UNION in this specific case).
Something like this:
SELECT DISTINCT
d.FiscalMonth,
d.FiscalMonthOfYear,
p.Name
FROM DimDate d
JOIN FactSales f ON f.SaleDate=d.PKDate
JOIN DimPerson p ON p.PersonId=f.PersonId
WHERE d.FiscalYear='2014/7/1'
UNION
SELECT
d.FiscalMonth,
d.FiscalMonthOfYear,
NULL AS Name
FROM DimDate d
LEFT JOIN FactSales f ON f.SaleDate=d.PKDate
WHERE d.FiscalYear='2014/7/1'
GROUP BY d.FiscalMonth, d.FiscalMonthOfYear
HAVING COUNT(f.SaleDate)=0
ORDER BY FiscalMonthOfYear ASC, Name ASC
I haven't tested it, and there may be some better ways to solve the second part (SUBSELECT, EXISTS) but that depends a bit on the engine you are using.
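A cut-down version of the two-part UNION idea can be tried on SQLite through Python's sqlite3 module. This is only a sketch: the rows are invented, the FiscalYear filter is left out to keep it short, and ORDER BY 2 refers to the second result column (FiscalMonthOfYear).

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    -- Hypothetical data: month 1 has one day with a sale and one day without,
    -- month 2 has no sales at all.
    CREATE TABLE DimDate (PKDate TEXT, FiscalMonth TEXT, FiscalMonthOfYear INT);
    CREATE TABLE FactSales (SaleDate TEXT, PersonId INT);
    CREATE TABLE DimPerson (PersonId INT, Name TEXT);
    INSERT INTO DimDate VALUES
        ('2014-07-01', '2014-07-01', 1),
        ('2014-07-02', '2014-07-01', 1),
        ('2014-08-01', '2014-08-01', 2);
    INSERT INTO FactSales VALUES ('2014-07-01', 1);
    INSERT INTO DimPerson VALUES (1, 'Ann');
""")

rows = con.execute("""
    -- Part 1: months that have sales, with the names involved.
    SELECT DISTINCT d.FiscalMonth, d.FiscalMonthOfYear, p.Name
    FROM DimDate d
    JOIN FactSales f ON f.SaleDate = d.PKDate
    JOIN DimPerson p ON p.PersonId = f.PersonId
    UNION
    -- Part 2: one NULL row for each month with no sales at all.
    SELECT d.FiscalMonth, d.FiscalMonthOfYear, NULL
    FROM DimDate d
    LEFT JOIN FactSales f ON f.SaleDate = d.PKDate
    GROUP BY d.FiscalMonth, d.FiscalMonthOfYear
    HAVING COUNT(f.SaleDate) = 0
    ORDER BY 2
""").fetchall()

for row in rows:
    print(row)
# ('2014-07-01', 1, 'Ann')
# ('2014-08-01', 2, None)
```

The day without a sale in month 1 no longer produces a NULL row, because month 1 as a whole has COUNT(f.SaleDate) = 1 and is filtered out of the second part.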
You can do an inner join as follows:
SELECT
d.FiscalMonth,
d.FiscalMonthOfYear,
p.Name
FROM
DimDate d
INNER JOIN FactSales f on f.SaleDate=d.PKDate
LEFT JOIN DimPerson p on p.PersonId=f.PersonId
WHERE d.FiscalYear='2014/7/1'
group by d.FiscalMonth, d.FiscalMonthOfYear, p.Name
ORDER BY d.FiscalMonthOfYear asc, p.PersonID asc
The inner join keeps only the rows that have a match in both tables, without giving priority to the left table. For more on joins you can read this blog: Visual representation of sql joins
It states that an INNER JOIN returns only the records in the left table (table A) that have a matching record in the right table (table B), whereas a LEFT JOIN returns all of the records in the left table (table A) regardless of whether they have a match in the right table (table B).
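The difference is easy to see side by side. A minimal sketch with invented rows and whole-number amounts, run on SQLite through Python's sqlite3 module:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    -- Hypothetical data: no sale on 2014-07-03.
    CREATE TABLE DimDate (PKDate TEXT);
    CREATE TABLE FactSales (SaleDate TEXT, Amount INT);
    INSERT INTO DimDate VALUES ('2014-07-01'), ('2014-07-02'), ('2014-07-03');
    INSERT INTO FactSales VALUES ('2014-07-01', 35), ('2014-07-01', 22),
                                 ('2014-07-02', 25);
""")

query = """
    SELECT d.PKDate, f.Amount
    FROM DimDate d
    {join_type} JOIN FactSales f ON f.SaleDate = d.PKDate
    ORDER BY d.PKDate, f.Amount
"""
left_rows = con.execute(query.format(join_type="LEFT")).fetchall()
inner_rows = con.execute(query.format(join_type="INNER")).fetchall()

print(left_rows)
# [('2014-07-01', 22), ('2014-07-01', 35), ('2014-07-02', 25), ('2014-07-03', None)]
print(inner_rows)
# [('2014-07-01', 22), ('2014-07-01', 35), ('2014-07-02', 25)]
```

Note that the INNER JOIN drops every date without a sale, so it also removes any month that has no sales at all.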

Left and right joining in a query

A friend asked me for help on building a query that would show how many pieces of each model were sold on each day of the month, showing zeros when no pieces were sold for a particular model on a particular day, even if no items of any model are sold on that day. I came up with the query below, but it isn't working as expected. I'm only getting records for the models that have been sold, and I don't know why.
select days_of_months.`Date`,
m.NAME as "Model",
count(t.ID) as "Count"
from MODEL m
left join APPLIANCE_UNIT a on (m.ID = a.MODEL_FK and a.NUMBER_OF_UNITS > 0)
left join NEW_TICKET t on (a.NEW_TICKET_FK = t.ID and t.TYPE = 'SALES'
and t.SALES_ORDER_FK is not null)
right join (select date(concat(2009,'-',temp_months.id,'-',temp_days.id)) as "Date"
from temp_months
inner join temp_days on temp_days.id <= temp_months.last_day
where temp_months.id = 3 -- March
) days_of_months on date(t.CREATION_DATE_TIME) =
date(days_of_months.`Date`)
group by days_of_months.`Date`,
m.ID, m.NAME
I had created the temporary tables temp_months and temp_days in order to get all the days for any month. I am using MySQL 5.1, but I am trying to make the query ANSI-compliant.
You should CROSS JOIN your dates and models so that you have exactly one record for each day-model pair no matter what, and then LEFT JOIN other tables:
SELECT date, name, COUNT(t.id)
FROM (
SELECT ...
) AS days_of_months
CROSS JOIN
model m
LEFT JOIN
APPLIANCE_UNIT a
ON a.MODEL_FK = m.id
AND a.NUMBER_OF_UNITS > 0
LEFT JOIN
NEW_TICKET t
ON t.id = a.NEW_TICKET_FK
AND t.TYPE = 'SALES'
AND t.SALES_ORDER_FK IS NOT NULL
AND t.CREATION_DATE_TIME >= days_of_months.`Date`
AND t.CREATION_DATE_TIME < days_of_months.`Date` + INTERVAL 1 DAY
GROUP BY
date, name
The way you do it now, you get NULLs in the model columns for the days you have no sales, and those rows are grouped together.
Note the JOIN condition:
AND t.CREATION_DATE_TIME >= days_of_months.`Date`
AND t.CREATION_DATE_TIME < days_of_months.`Date` + INTERVAL 1 DAY
instead of
DATE(t.CREATION_DATE_TIME) = DATE(days_of_months.`Date`)
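This will help make your query sargable: the bare column is compared against a range, so an index on CREATION_DATE_TIME can be used, which is not the case when the column is wrapped in DATE().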
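Here is a stripped-down sketch of the day x model grid with the half-open range condition. It runs on SQLite through Python's sqlite3 module, so date(d, '+1 day') stands in for MySQL's + INTERVAL 1 DAY; the tables days, model and ticket are simplified, hypothetical stand-ins for the real schema (APPLIANCE_UNIT is left out).

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    -- Hypothetical, simplified tables: two days, two models, two tickets.
    CREATE TABLE days (d TEXT);
    CREATE TABLE model (id INT, name TEXT);
    CREATE TABLE ticket (id INT, model_id INT, created TEXT);
    INSERT INTO days VALUES ('2009-03-01'), ('2009-03-02');
    INSERT INTO model VALUES (1, 'X100'), (2, 'Z9');
    INSERT INTO ticket VALUES (1, 1, '2009-03-01 09:30:00'),
                              (2, 1, '2009-03-01 23:59:59');
""")

rows = con.execute("""
    SELECT days.d, m.name, COUNT(t.id)
    FROM days
    CROSS JOIN model m                       -- one row per day-model pair
    LEFT JOIN ticket t
           ON t.model_id = m.id
          AND t.created >= days.d            -- from midnight of the day...
          AND t.created <  date(days.d, '+1 day')   -- ...up to the next midnight
    GROUP BY days.d, m.name
    ORDER BY days.d, m.name
""").fetchall()

for row in rows:
    print(row)
# ('2009-03-01', 'X100', 2)
# ('2009-03-01', 'Z9', 0)
# ('2009-03-02', 'X100', 0)
# ('2009-03-02', 'Z9', 0)
```

COUNT(t.id) counts only non-NULL ticket ids, which is why the pairs without sales come out as 0 rather than 1.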
You need to use outer joins, as they do not require each record in the two joined tables to have a matching record.
http://dev.mysql.com/doc/refman/5.1/en/join.html
You're looking for an OUTER join. A left outer join creates a result set with a record from the left side of the join even if the right side does not have a record to be joined with. A right outer join does the same on the opposite direction, creates a record for the right side table even if the left side does not have a corresponding record. Any column projected from the table that does not have a record will have a NULL value in the join result.