Uses of unequal joins - sql

Of all the thousands of queries I've written, I can probably count on one hand the number of times I've used a non-equijoin. e.g.:
SELECT * FROM tbl1 INNER JOIN tbl2 ON tbl1.date > tbl2.date
And most of those instances were probably better solved using another method. Are there any good/clever real-world uses for non-equijoins that you've come across?

Bitmasks come to mind. In one of my jobs, we had permissions for a particular user or group on an "object" (usually corresponding to a form or class in the code) stored in the database. Rather than including a row or column for each particular permission (read, write, read others, write others, etc.), we would typically assign a bit value to each one. From there, we could then join using bitwise operators to get objects with a particular permission.

How about for checking for overlaps?
select ...
from employee_assignments ea1
, employee_assignments ea2
where ea1.emp_id = ea2.emp_id
and ea1.end_date >= ea2.start_date
and ea1.start_date <= ea1.start_date

Whole-day inetervals in date_time fields:
date_time_field >= begin_date and date_time_field < end_date_plus_1

Just found another interesting use of an unequal join on the MCTS 70-433 (SQL Server 2008 Database Development) Training Kit book. Verbatim below.
By combining derived tables with unequal joins, you can calculate a variety of cumulative aggregates. The following query returns a running aggregate of orders for each salesperson (my note - with reference to the ubiquitous AdventureWorks sample db):
select
SH3.SalesPersonID,
SH3.OrderDate,
SH3.DailyTotal,
SUM(SH4.DailyTotal) RunningTotal
from
(select SH1.SalesPersonID, SH1.OrderDate, SUM(SH1.TotalDue) DailyTotal
from Sales.SalesOrderHeader SH1
where SH1.SalesPersonID IS NOT NULL
group by SH1.SalesPersonID, SH1.OrderDate) SH3
join
(select SH1.SalesPersonID, SH1.OrderDate, SUM(SH1.TotalDue) DailyTotal
from Sales.SalesOrderHeader SH1
where SH1.SalesPersonID IS NOT NULL
group by SH1.SalesPersonID, SH1.OrderDate) SH4
on SH3.SalesPersonID = SH4.SalesPersonID AND SH3.OrderDate >= SH4.OrderDate
group by SH3.SalesPersonID, SH3.OrderDate, SH3.DailyTotal
order by SH3.SalesPersonID, SH3.OrderDate
The derived tables are used to combine all orders for salespeople who have more than one order on a single day. The join on SalesPersonID ensures that you are accumulating rows for only a single salesperson. The unequal join allows the aggregate to consider only the rows for a salesperson where the order date is earlier than the order date currently being considered within the result set.
In this particular example, the unequal join is creating a "sliding window" kind of sum on the daily total column in SH4.

Dublicates;
SELECT
*
FROM
table a, (
SELECT
id,
min(rowid)
FROM
table
GROUP BY
id
) b
WHERE
a.id = b.id
and a.rowid > b.rowid;

If you wanted to get all of the products to offer to a customer and don't want to offer them products that they already have:
SELECT
C.customer_id,
P.product_id
FROM
Customers C
INNER JOIN Products P ON
P.product_id NOT IN
(
SELECT
O.product_id
FROM
Orders O
WHERE
O.customer_id = C.customer_id
)
Most often though, when I use a non-equijoin it's because I'm doing some kind of manual fix to data. For example, the business tells me that a person in a user table should be given all access roles that they don't already have, etc.

If you want to do a dirty join of two not really related tables, you can join with a <>.
For example, you could have a Product table and a Customer table. Hypothetically, if you want to show a list of every product with every customer, you could do somthing like this:
SELECT *
FROM Product p
JOIN Customer c on p.SKU <> c.SSN
It can be useful. Be careful, though, because it can create ginormous result sets.

Related

SQL Server question - subqueries in column result with a join?

I have a distinct list of part numbers from one table. It is basically a table that contains a record of all the company's part numbers. I want to add columns that will pull data from different tables but only pertaining to the part number on that row of the distinct part list.
For example: if I have part A, B, C from the unique part list I want to add columns for Purchase quantity, repair quantity, loan quantity, etc... from three totally unique tables.
So it's almost like I need 3 subqueries that will sum of that data from the different tables for each part.
Can anybody steer me in the direction of how to do this? Please and thank you so much!
One method is correlated subqueries. Something like this:
select p.*,
(select count(*)
from purchases pu
where pu.part_id = p.part_id
) as num_purchases,
(select count(*)
from repairs r
where r.part_id = p.part_id
) as num_repairs,
(select count(*)
from loans l
where l.part_id = p.part_id
) as num_loans
from parts p;
Another option is joins with aggregation before the join. Or lateral joins (which are quite similar to correlated subqueries).

How do I use the . properly with sql?

I am new to sql and have a question about joining 2 tables. Why is there a . in between customers.custnum in this example. What is its significance and what does it do?
Ex.
Select
customers.custnum, state, qty
From
customers
Inner join
sales On customers.custnum = sales.custnum
The . is to specify a column of a table.
Let's use your customer table; we could do:
SELECT c.custnum, c.state, c.qty FROM customers as c INNER JOIN
sales as s ON c.custnum = s.custnum
You don't really need the . unless two tables have columns with the same name.
In the below query, there are two tables being referred. One is CUSTOMERS another is STATE. Since both has same column CUSTNUM, we need a way to tell the database which CUSTNUM are we referring to. Same as there may be many Bob's, if so their last name is used for disambiguation.
I would consider the below style as more clearer. That's opinionated.
Select
cust.custnum, cust.state, s.qty
From
customers cust -- use alias for meaningful referencing, you may be self-joining, during that time you can use cust1, cust2 as aliases.
Inner join
sales as s On cust.custnum = s.custnum
Think of it as a way to categorize the hierarchical nature of the database. Within a DB, there are tables, and within tables there are columns. It's just a way of keeping track, especially if you are working with multiple tables that may have the same column name.
For example, a table called Sales and a table called Customers might both have a column called Date. You may be writing a query where you only want the date from the Sales table, so you would specify that by writing:
Select *
From Sales
inner join Customers on Sales.ID = Customers.ID
where Sales.Date = '1/1/2019'

SQL: If there is No match on condition (row in table) return a default value and Rows which match from different tables

I have three tables: Clinic, Stock and StockLog.
I need to get all rows where Stock.stock < 5. I need to also show if an order has been placed and what amount it is; which is found in the table Stocklog.
The issue is that a user can set his stock level in the Stock table without placing an order which would go to Stocklog.
I need a query that can : return the rows in the Stock table and get the related order amounts in the Stocklog table. If no order has been placed in StockLog, then set amount to order amount to zero.
I have tried :
SELECT
Clinic.Name,
Stock.NameOfMedication, Stock.Stock,
StockLog.OrderAmount
FROM
Clinic
JOIN
Stock ON Stock.ClinicID = Clinic.ClinicID
JOIN
StockLog ON StockLog.StockID = Stock.StockID
WHERE
Stock.Stock <= 5
The issue with my query is that I lose rows which are not found in StockLog.
Any help on how to write this.
Thank you.
I am thinking the query should look like this:
SELECT c.Name, s.NameOfMedication, s.Stock,
COALESCE(sl.OrderAmount, 0) as OrderAmount
FROM Stock s LEFT JOIN
Clinic c
ON s.ClinicID = c.ClinicID LEFT JOIN
StockLog sl
ON sl.StockID = s.StockID
WHERE s.Stock <= 5 ;
You want to keep all rows in Stock (subject to the WHERE condition). So think: "make Stock the first table in the FROM and use LEFT JOIN for all the other tables."
If you want to keep all the rows that result from joining Clinic and Stock, then use a LEFT OUTER JOIN with StockLog. I don't know which SQL you're using (SQL Server, MySQL, PostgreSQL, Oracle), so I can't give you a precise example, but searching for "left outer join" in the relevant documentation should work.
See this Stack Overflow post for an explanation of the various kinds of joins.

SQL Counting and Joining

I'm taking a database course this semester, and we're learning SQL. I understand most simple queries, but I'm having some difficulty using the count aggregate function.
I'm supposed to relate an advertisement number to a property number to a branch number so that I can tally up the amount of advertisements by branch number and compute their cost. I set up what I think are two appropriate new views, but I'm clueless as to what to write for the select statement. Am I approaching this the correct way? I have a feeling I'm over complicating this bigtime...
with ad_prop(ad_no, property_no, overseen_by) as
(select a.ad_no, a.property_no, p.overseen_by
from advertisement as a, property as p
where a.property_no = p.property_no)
with prop_branch(property_no, overseen_by, allocated_to) as
(select p.property_no, p.overseen_by, s.allocated_to
from property as p, staff as s
where p.overseen_by = s.staff_no)
select distinct pb.allocated_to as branch_no, count( ??? ) * 100 as ad_cost
from prop_branch as pb, ad_prop as ap
where ap.property_no = pb.property_no
group by branch_no;
Any insight would be greatly appreciated!
You could simplify it like this:
advertisement
- ad_no
- property_no
property
- property_no
- overseen_by
staff
- staff_no
- allocated_to
SELECT s.allocated_to AS branch, COUNT(*) as num_ads, COUNT(*)*100 as ad_cost
FROM advertisement AS a
INNER JOIN property AS p ON a.property_no = p.property_no
INNER JOIN staff AS s ON p.overseen_by = s.staff_no
GROUP BY s.allocated_to;
Update: changed above to match your schema needs
You can condense your WITH clauses into a single statement. Then, the piece I think you are missing is that columns referenced in the column definition have to be aggregated if they aren't included in the GROUP BY clause. So you GROUP BY your distinct column then apply your aggregation and math in your column definitions.
SELECT
s.allocated_to AS branch_no
,COUNT(a.ad_no) AS ad_count
,(ad_count * 100) AS ad_cost
...
GROUP BY s.allocated_to
i can tell you that you are making it way too complicated. It should be a select statement with a couple of joins. You should re-read the chapter on joins or take a look at the following link
http://www.sql-tutorial.net/SQL-JOIN.asp
A join allows you to "combine" the data from two tables based on a common key between the two tables (you can chain more tables together with more joins). Once you have this "joined" table, you can pretend that it is really one table (aliases are used to indicate where that column came from). You understand how aggregates work on a single table right?
I'd prefer not to give you the answer so that you can actually learn :)

Select based on the number of appearances of an id in another table

I have a table B with cids and cities. I also have a table C that has these cids with extra information. I want to list all the cids in table C that are associated with ALL appearances of a given city in Table B.
My current solution relies on counting the number of times the given city appears in Table B and selecting only the cids that appear that many times. I don't know all the SQL syntax yet, but is there a way to select for this kind of pattern?
My current solution:
SELECT Agents.aid
FROM Agents, Customers, Orders
WHERE (Customers.city='Duluth')
AND (Agents.aid = Orders.aid)
AND (Customers.cid = Orders.cid)
GROUP BY Agents.aid
HAVING count(Agents.aid) > 1
It only works because I know right now with the HAVING statement.
Thanks for the help. I wasn't sure how to google this problem, since it's pretty specific.
EDIT: I'm pinpointing my problem a bit. I need to know how to determine if EVERY row in a table has a certain value for a field. Declaring a variable and counting the rows in a sub-selection and filtering out my results by IDs that appear that many times works, but It's really ugly.
There HAS to be a way to do this without explicitly count()ing rows. I hope.
Not an answer to your question, but a general improvement.
I'd recommend using JOIN syntax to join your tables together.
This would change your query to be:
SELECT Agents.aid
FROM Agents
INNER JOIN Orders
ON Agents.aid = Orders.aid
INNER JOIN Customers
ON Customers.cid = Orders.cid
WHERE Customers.city='Duluth'
GROUP BY Agents.aid
HAVING count(Agents.aid) > 1
What variant of SQL are you using?
To start with, you can (and should) use JOIN instead of doing it in the WHERE clause, e.g.,
select Agents.aid
from Agents
join Orders on Agents.aid = Orders.aid
join Customers on Customers.cid = Orders.cid
where Customers.city = 'Duluth'
group by Agents.aid
having count(Agents.aid) > 1
After that, I'm afraid I might be a little lost. Using the table names in your example query, what (in English, not pseudocode) are you trying to retrieve? For example, I think your sample query is retrieving the PK for all Agents that have been involved in at least 2 Orders involving Customers in Duluth.
Also, some table definitions for Agents, Orders, and Customers might help (then again, they might be irrelevant).
I'm not sure if I understood you problem, but I think the following query is what you want:
SELECT *
FROM customers b
INNER JOIN orders c USING (cid)
WHERE b.city = 'Duluth'
AND NOT EXISTS (SELECT 1
FROM customers b2
WHERE b2.city = b.city
AND b2.cid <> cid);
Probably you will need some indexes on these columns.