Where are Cartesian Joins used in real life? - sql

Where are Cartesian Joins used in real life?
Can someone please give examples of such a join in any SQL database?

Just a random example: you have a table of cities with columns Id, Lat, Lon, Name. You want to show the user a table of distances from each city to every other city. You would write something like
SELECT c1.Name, c2.Name, SQRT( (c1.Lat - c2.Lat) * (c1.Lat - c2.Lat) + (c1.Lon - c2.Lon)*(c1.Lon - c2.Lon))
FROM City c1, City c2
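A minimal runnable sketch of the same idea, using Python's built-in sqlite3 module with made-up sample cities. The coordinates are illustrative only, and SQRT is registered from Python because not every SQLite build ships the math functions.

```python
import math
import sqlite3

conn = sqlite3.connect(":memory:")
# Register SQRT explicitly: some SQLite builds lack the math functions.
conn.create_function("SQRT", 1, math.sqrt)
conn.execute(
    "CREATE TABLE City (Id INTEGER PRIMARY KEY, Lat REAL, Lon REAL, Name TEXT)"
)
conn.executemany(
    "INSERT INTO City (Lat, Lon, Name) VALUES (?, ?, ?)",
    [(0.0, 0.0, "A"), (3.0, 4.0, "B"), (6.0, 8.0, "C")],  # made-up sample rows
)

# The table is joined to itself with no join condition, so every city
# is paired with every city (including itself).
distance_rows = conn.execute(
    """
    SELECT c1.Name, c2.Name,
           SQRT((c1.Lat - c2.Lat) * (c1.Lat - c2.Lat)
              + (c1.Lon - c2.Lon) * (c1.Lon - c2.Lon)) AS Distance
    FROM City c1 CROSS JOIN City c2
    ORDER BY c1.Name, c2.Name
    """
).fetchall()
```

With three cities the result has 3 x 3 = 9 rows. Adding WHERE c1.Id < c2.Id would keep each unordered pair once if the self-pairs and mirrored pairs are not wanted.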

Here are two examples:
To create multiple copies of an invoice or other document you can populate a temporary table with names of the copies, then cartesian join that table to the actual invoice records. The result set will contain one record for each copy of the invoice, including the "name" of the copy to print in a bar at the top or bottom of the page or as a watermark. Using this technique the program can provide the user with checkboxes letting them choose what copies to print, or even allow them to print "special copies" in which the user inputs the copy name.
CREATE TEMP TABLE tDocCopies (CopyName TEXT(20))
INSERT INTO tDocCopies (CopyName) VALUES ('Customer Copy')
INSERT INTO tDocCopies (CopyName) VALUES ('Office Copy')
...
INSERT INTO tDocCopies (CopyName) VALUES ('File Copy')
SELECT * FROM InvoiceInfo, tDocCopies WHERE InvoiceDate = TODAY()
To create a calendar matrix, with one record per person per day, cartesian join the people table to another table containing all days in a week, month, or year.
SELECT People.PeopleID, People.Name, CalDates.CalDate
FROM People, CalDates
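The invoice-copies technique can be sketched with Python's sqlite3 module. The InvoiceNo and Customer columns and the sample rows are assumptions for illustration, and the date filter is left out to keep the example small.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    -- InvoiceNo and Customer are hypothetical columns for this sketch.
    CREATE TABLE InvoiceInfo (InvoiceNo INTEGER, Customer TEXT);
    CREATE TEMP TABLE tDocCopies (CopyName TEXT);
    INSERT INTO InvoiceInfo VALUES (1001, 'Acme'), (1002, 'Globex');
    INSERT INTO tDocCopies VALUES
        ('Customer Copy'), ('Office Copy'), ('File Copy');
    """
)

# No join condition: each invoice is repeated once per copy name.
copies = conn.execute(
    """
    SELECT i.InvoiceNo, c.CopyName
    FROM InvoiceInfo i CROSS JOIN tDocCopies c
    ORDER BY i.InvoiceNo, c.CopyName
    """
).fetchall()
```

Two invoices and three copy names give six rows, one per printed page. The calendar matrix works the same way with People and CalDates in place of the two tables here.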

I've noticed this being done to try to deliberately slow down the system either to perform a stress test or an excuse for missing development deliverables.

Usually, to generate a superset for the reports.
In PostgreSQL:
SELECT d.id, m.month, COALESCE(SUM(s.sales), 0)
FROM generate_series(1, 12) AS m(month)
CROSS JOIN
department d
LEFT JOIN
sales s
ON s.department = d.id
AND s.month = m.month
GROUP BY
d.id, m.month
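A runnable sketch of the same report using Python's sqlite3 module. SQLite does not provide generate_series by default, so a recursive CTE stands in for it here. The table contents are made-up sample data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE department (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE sales (department INTEGER, month INTEGER, sales REAL);
    INSERT INTO department VALUES (1, 'Toys'), (2, 'Books');
    INSERT INTO sales VALUES (1, 1, 100.0), (1, 1, 50.0), (1, 3, 20.0);
    """
)

report = conn.execute(
    """
    WITH RECURSIVE months(month) AS (
        -- stand-in for generate_series(1, 12)
        SELECT 1 UNION ALL SELECT month + 1 FROM months WHERE month < 12
    )
    SELECT d.id, m.month, COALESCE(SUM(s.sales), 0) AS total
    FROM months m
    CROSS JOIN department d          -- every (month, department) pair
    LEFT JOIN sales s
           ON s.department = d.id AND s.month = m.month
    GROUP BY d.id, m.month
    ORDER BY d.id, m.month
    """
).fetchall()
```

The cross join guarantees one output row per department per month, so months without sales appear with a total of 0 instead of being missing from the report.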

This is the only time in my life that I've found a legitimate use for a Cartesian product.
At the last company I worked at, there was a report that was requested on a quarterly basis to determine what FAQs were used at each geographic region for a national website we worked on.
Our database described geographic regions (markets) by a tuple (4, x), where 4 represented a level number in a hierarchy, and x represented a unique marketId.
Each FAQ is identified by an FaqId, and each association to an FAQ is defined by the composite key marketId tuple and FaqId. The associations are set through an admin application, but given that there are 1000 FAQs in the system and 120 markets, it was a hassle to set initial associations whenever a new FAQ was created. So, we created a default market selection, and overrode a marketId tuple of (-1,-1) to represent this.
Back to the report - the report needed to show every FAQ question/answer and the markets that displayed this FAQ in a 2D matrix (we used an Excel spreadsheet). I found that the easiest way to associate each FAQ to each market in the default market selection case was with this query, unioning the exploded result with all other direct FAQ-market associations.
The Faq2LevelDefault table holds all of the markets that are defined as being in the default selection (I believe it was just a list of marketIds).
SELECT FaqId, fld.LevelId, 1 [Exists]
FROM Faq2Levels fl
CROSS JOIN Faq2LevelDefault fld
WHERE fl.LevelId=-1 and fl.LevelNumber=-1 and fld.LevelNumber=4
UNION
SELECT Faqid, LevelId, 1 [Exists] from Faq2Levels WHERE LevelNumber=4

You might want to create a report using all of the possible combinations from two lookup tables, in order to create a report with a value for every possible result.
Consider bug tracking: you've got one table for severity and another for priority and you want to show the counts for each combination. You might end up with something like this:
select severity_name, priority_name, count(e.severity_id)
from (select severity_id, severity_name,
priority_id, priority_name
from severity, priority) sp
left outer join
errors e
on e.severity_id = sp.severity_id
and e.priority_id = sp.priority_id
group by severity_name, priority_name
In this case, the cartesian join between severity and priority provides a master list that you can create the later outer join against.
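A small runnable sketch of this report with Python's sqlite3 module. The error_id column and the sample rows are assumptions. The sketch counts a column from errors rather than rows, so combinations with no matching errors report 0.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE severity (severity_id INTEGER PRIMARY KEY, severity_name TEXT);
    CREATE TABLE priority (priority_id INTEGER PRIMARY KEY, priority_name TEXT);
    -- error_id is a hypothetical key column for this sketch
    CREATE TABLE errors (error_id INTEGER PRIMARY KEY,
                         severity_id INTEGER, priority_id INTEGER);
    INSERT INTO severity VALUES (1, 'minor'), (2, 'major');
    INSERT INTO priority VALUES (1, 'low'), (2, 'high');
    INSERT INTO errors (severity_id, priority_id) VALUES (1, 1), (1, 1), (2, 2);
    """
)

counts = conn.execute(
    """
    SELECT sp.severity_name, sp.priority_name, COUNT(e.error_id) AS n
    FROM (SELECT severity_id, severity_name, priority_id, priority_name
          FROM severity CROSS JOIN priority) sp   -- master list of pairs
    LEFT OUTER JOIN errors e
      ON e.severity_id = sp.severity_id
     AND e.priority_id = sp.priority_id
    GROUP BY sp.severity_name, sp.priority_name
    """
).fetchall()
counts_by_pair = {(sev, pri): n for sev, pri, n in counts}
```

COUNT(e.error_id) skips the NULLs produced by the outer join, whereas COUNT(*) would count the unmatched row itself and show 1 for an empty combination.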

When running a query for each date in a given range. For example, for a website, you might want to know for each day, how many users were active in the last N days. You could run a query for each day in a loop, but it's simplest to keep all the logic in the same query, and in some cases the DB can optimize the Cartesian join away.
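One possible shape for such a query, sketched with Python's sqlite3 module. The days and activity tables, their columns, and the sample rows are all assumptions made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    -- Hypothetical tables: one row per calendar day, one row per user visit.
    CREATE TABLE days (day TEXT PRIMARY KEY);
    CREATE TABLE activity (user_id INTEGER, day TEXT);
    INSERT INTO days VALUES
        ('2024-01-01'), ('2024-01-02'), ('2024-01-03'), ('2024-01-04');
    INSERT INTO activity VALUES
        (1, '2024-01-01'), (2, '2024-01-02'), (1, '2024-01-04');
    """
)

N = 2  # window length in days, counting the day itself

# Every day is paired with every activity row; the CASE keeps only the
# rows that fall inside that day's N-day window.
active = conn.execute(
    """
    SELECT d.day,
           COUNT(DISTINCT CASE
               WHEN a.day <= d.day AND a.day > date(d.day, ?)
               THEN a.user_id END) AS active_users
    FROM days d CROSS JOIN activity a
    GROUP BY d.day
    ORDER BY d.day
    """,
    (f"-{N} days",),
).fetchall()
```

Moving the window condition into a join predicate instead of a CASE gives the optimizer a chance to avoid materializing the full product, at the cost of needing an outer join to keep days with no activity.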

To create a list of related words in text mining, using similarity functions, e.g. Edit Distance
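As a rough sketch of the text-mining case, the snippet below registers a Python Levenshtein function with SQLite and cross joins a word list against itself. The words table, the sample words, and the threshold of 2 are assumptions chosen for illustration.

```python
import sqlite3


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance via the classic dynamic-programming recurrence."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,                 # deletion
                            curr[j - 1] + 1,             # insertion
                            prev[j - 1] + (ca != cb)))   # substitution
        prev = curr
    return prev[-1]


conn = sqlite3.connect(":memory:")
conn.create_function("edit_distance", 2, edit_distance)
conn.execute("CREATE TABLE words (word TEXT PRIMARY KEY)")
conn.executemany(
    "INSERT INTO words VALUES (?)",
    [("color",), ("colour",), ("collar",), ("banana",)],
)

related = conn.execute(
    """
    SELECT w1.word, w2.word, edit_distance(w1.word, w2.word) AS dist
    FROM words w1 CROSS JOIN words w2
    WHERE w1.word < w2.word                      -- each unordered pair once
      AND edit_distance(w1.word, w2.word) <= 2
    ORDER BY w1.word, w2.word
    """
).fetchall()
```

The similarity function has to be evaluated for every pair, which is why this kind of query is a genuine Cartesian product and grows quadratically with the vocabulary.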

Related

Populate SQL query with blank rows

(This is a general SQL question, but I am specifically using MSAccess 2010 so looking for how to do this with Access' flavor of SQL)
I have a table called offices which has id, office_name, num_desks.
Another table called employees which has id, employee_name.
And a final table called employee_offices which has id, office_id, employee_id.
I can assign employees to offices via employee_offices.
I am trying to generate a report which shows all offices and the employees assigned to them, but also includes blank lines for any empty desks in that office.
I realize a "simple" way to do this would be to create a desks table with id, office_id, delete the num_desks column from the offices table and change employee_offices to something like employee_desks. Then my report would be a simple LEFT OUTER JOIN and it would include all the unassigned desks. However for the sake of sanity (in this case, there is no contextual difference between desks), I am not going to do this. Plus if I start deleting desks I have referential constraints to deal with (which obviously exist for a good reason and would catch the fact that I am leaving employees without a desk), but I just want to be able to change the number of desks.
I can calculate the number of empty desks (or lack of desks) through the following command:
SELECT
office_id,
num_desks - num_employees AS desk_diff,
MAX(0, num_desks - num_employees) AS blank_rows_to_add
FROM offices LEFT OUTER JOIN (
SELECT office_id, COUNT(employee_id) AS num_employees
FROM employee_offices
GROUP BY office_Id
) AS num_employees_by_office ON offices.id = num_employees_by_office.office_id
Is there a way to take this number (blank_rows_to_add) and somehow utilize it to add that many blank rows (or at least rows that only have the office_id/office_name) to a report showing a list of employees by office? I know this can be done with VBA but I am specifically looking for an SQL method that also doesn't include a temp table if at all possible.
Thank you.

How to check if table relationship is one to many in SSMS

I'm writing some SQL code based on my tables but don't want to miss any edge cases. I'm wondering how you check whether there's a one-to-many relationship between two tables in SSMS.
SELECT *
FROM Houses
JOIN Addresses
on Houses.Id = Addresses.HouseId
Unfortunately the data when queried doesn't give me any insight.
What I tried to do:
Checked table dependencies but that didn't give me any insight. It shows addresses are dependencies but no relationship details.
May I ask, is it possible to determine if one to one via SSMS?
You can run a query to determine how many addresses each house has. In this case, we'll assume Id is the primary key of Houses and the values of HouseId are valid ids:
SELECT num_addresses, COUNT(*)
FROM (SELECT h.Id, COUNT(a.HouseId) as num_addresses
FROM Houses h LEFT JOIN
Addresses a
ON h.Id = a.HouseId
GROUP BY h.Id
) ha
GROUP BY num_addresses;
Then interpret the results:
If the only row returned has num_addresses of 1, then you have a 1-1 relationship.
If two rows are returned with values of 0 and 1, then you have a 1/0-1 relationship.
If multiple rows are returned and the minimum is 1 then you have a 1-n.
If multiple rows are returned and the minimum value is 0, then you have a 0-n.
You could extend this for more general relationships, but this answers the question you asked here.
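The query and the interpretation rules above can be tried out with Python's sqlite3 module. The sample rows are made up, and classify is a hypothetical helper that simply encodes the four rules as stated; anything outside those rules is reported as unclassified.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE Houses (Id INTEGER PRIMARY KEY);
    CREATE TABLE Addresses (Id INTEGER PRIMARY KEY, HouseId INTEGER);
    INSERT INTO Houses VALUES (1), (2), (3);
    INSERT INTO Addresses (HouseId) VALUES (1), (1), (2);
    """
)

histogram = conn.execute(
    """
    SELECT num_addresses, COUNT(*)
    FROM (SELECT h.Id, COUNT(a.HouseId) AS num_addresses
          FROM Houses h LEFT JOIN Addresses a ON h.Id = a.HouseId
          GROUP BY h.Id) ha
    GROUP BY num_addresses
    ORDER BY num_addresses
    """
).fetchall()


def classify(rows):
    """Apply the interpretation rules to (num_addresses, count) rows."""
    values = sorted(n for n, _ in rows)
    if values == [1]:
        return "1-1"
    if values == [0, 1]:
        return "1/0-1"
    if len(values) > 1 and values[0] == 1:
        return "1-n"
    if len(values) > 1 and values[0] == 0:
        return "0-n"
    return "unclassified"
```

Keep in mind that this only describes the data currently in the tables. It says nothing about what the schema would allow, which is what keys and constraints define.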
The relationships between tables can be viewed in a database diagram.
Here is a document about how to create a database diagram in SSMS:
How to create database diagram SSMS
Gordon Linoff's response is perfect for the helicopter view.
To see all records side by side, select all columns from both tables with a left join in each direction:
SELECT Houses.*, Addresses.*
FROM Houses
left JOIN Addresses on Addresses.HouseId = Houses.Id
union all
SELECT Houses.*, Addresses.*
FROM Addresses
left JOIN Houses on Houses.Id = Addresses.HouseId
Microsoft have implemented a reduced code method to do the same but I have not explored it so could not comment
https://learn.microsoft.com/en-us/sql/odbc/reference/develop-app/outer-joins?view=sql-server-ver15
What will be apparent is you will see Houses.ID and Addresses.HouseID side by side - An empty value in either means that the relevant table does not have a record for the relevant HouseID or ID
You can then cherry pick those edge case records as you see fit

Display correct results in many to many relationship tables

I currently have three tables.
master_tradesmen
trades
master_tradesmen_trades (joins the previous two together in a many-to-many relationship). The 'trade_id' and 'master_tradesman_id' are the foreign keys.
Here is what I need to happen. A user performs a search and types in a trade. I need a query that displays all of the information from the master_tradesmen table whose trade in the master_tradesmen_trades table matches the search. For example, if 'plumbing' is typed in the search bar (trade_id 1), all of the columns for Steve Albertsen (master_tradesman_id 6) and Brian Terry (master_tradesman_id 8) would be displayed from the master_tradesmen table. As a beginner to SQL, trying to grasp the logic of this is about to make my head explode. I'm hoping that someone with more advanced SQL knowledge can wrap their head around this much more easily than I can. Note: the 'trades' column in master_tradesmen is for display purposes only, not for querying. Thank you so much in advance!
You have a catalog for the tradesmen, & another catalog for the trades.
The trades should only appear once in the trades catalog in order to make your DB more consistent
Then you have your many-to-many table which connects the trades & master tradesmen tables.
If we want to get the tradesmen according to the given trade in the input, we should first
know the id of that trade which has to be unique, so in your DB you would have something
like the img. below :
Now we can make a query to select the id of the trade:
DECLARE @id_trade int = (SELECT trade_id FROM trades WHERE trade_name LIKE '%plumbing%')
Once we know the trade id, we can go to the 'master_tradesmen_trades' table to find the people who work in that trade:
SELECT * FROM master_tradesmen_trades WHERE trade_id = @id_trade
You will get the following result:
You may say, 'But there is still something wrong with it, as I am still not able to see the tradesmen'. This is the moment when we make an inner join:
SELECT * FROM master_tradesmen_trades trades_and_tradesmen
INNER JOIN master_tradesmen tradesmen
ON tradesmen.id = trades_and_tradesmen.master_tradesman_id
WHERE trade_id = @id_trade
If you need to see specific columns, you can do:
SELECT first_name, last_name, city, state FROM master_tradesmen_trades trades_and_tradesmen
INNER JOIN master_tradesmen tradesmen
ON tradesmen.id = trades_and_tradesmen.master_tradesman_id
WHERE trade_id = @id_trade
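The lookup and the join can also be folded into a single query that goes through all three tables. Below is a sketch using Python's sqlite3 module; the id and trade_name column names, the 'roofing' trade, and the tradesman "Pat Example" are assumptions added for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE master_tradesmen (
        id INTEGER PRIMARY KEY, first_name TEXT, last_name TEXT);
    CREATE TABLE trades (trade_id INTEGER PRIMARY KEY, trade_name TEXT);
    CREATE TABLE master_tradesmen_trades (
        master_tradesman_id INTEGER, trade_id INTEGER);
    INSERT INTO master_tradesmen VALUES
        (6, 'Steve', 'Albertsen'), (7, 'Pat', 'Example'), (8, 'Brian', 'Terry');
    INSERT INTO trades VALUES (1, 'plumbing'), (2, 'roofing');
    INSERT INTO master_tradesmen_trades VALUES (6, 1), (7, 2), (8, 1), (8, 2);
    """
)


def tradesmen_for(trade):
    """Return tradesmen whose trade name contains the search text."""
    return conn.execute(
        """
        SELECT m.id, m.first_name, m.last_name
        FROM trades t
        INNER JOIN master_tradesmen_trades mt ON mt.trade_id = t.trade_id
        INNER JOIN master_tradesmen m ON m.id = mt.master_tradesman_id
        WHERE t.trade_name LIKE ?
        ORDER BY m.id
        """,
        (f"%{trade}%",),  # bound parameter, so user input is not spliced into SQL
    ).fetchall()
```

Calling tradesmen_for("plumbing") on this sample data returns the rows for ids 6 and 8. Passing the search text as a bound parameter avoids building the SQL string from user input.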

Return query results where two fields are different (Access 2010)

I'm working in a large access database (Access 2010) and am trying to return records where two locations are different.
In my case, I have a large number of birds that have been observed on multiple dates and potentially on different islands. Each bird has a unique BirdID and also an actual physical identifier (unfortunately that may have changed over time). [I'm going to try addressing the changing physical identifier issue later]. I currently want to query individual birds where one or more of their observations is different than the "IslandAlpha" (the first island where they were observed). Something along the lines of a criteria for BirdID: WHERE IslandID [doesn't equal] IslandAlpha.
I then need a separate query to find where all observations DO equal where they were first observed. So where IslandID = IslandAlpha
I'm new to Access, so let me know if you need more information on how my tables/relationships are set up! Thanks in advance.
Assuming the following tables:
Birds table in which all individual birds have records with a unique BirdID and IslandAlpha.
Sightings table in which individual sightings are recorded, including IslandID.
Your first query would look something like this:
SELECT *
FROM Birds
INNER JOIN Sightings ON Birds.BirdID=Sightings.BirdID
WHERE Sightings.IslandID <> Birds.IslandAlpha
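Your second query would be the same but with = instead of <> in the WHERE clause. Note that this returns the individual sightings made on IslandAlpha, not only the birds that were never seen anywhere else.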
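If the second requirement is read per bird rather than per sighting (birds whose every sighting is on IslandAlpha), a NOT EXISTS subquery is one way to express it. The sketch below uses Python's sqlite3 module with the Birds and Sightings tables assumed above; SightingID and the sample rows are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE Birds (BirdID INTEGER PRIMARY KEY, IslandAlpha TEXT);
    CREATE TABLE Sightings (
        SightingID INTEGER PRIMARY KEY, BirdID INTEGER, IslandID TEXT);
    INSERT INTO Birds VALUES (1, 'X'), (2, 'Y');
    INSERT INTO Sightings (BirdID, IslandID) VALUES
        (1, 'X'), (1, 'Z'), (2, 'Y'), (2, 'Y');
    """
)

# Birds with at least one sighting away from their first island.
moved = conn.execute(
    """
    SELECT DISTINCT b.BirdID
    FROM Birds b
    INNER JOIN Sightings s ON b.BirdID = s.BirdID
    WHERE s.IslandID <> b.IslandAlpha
    ORDER BY b.BirdID
    """
).fetchall()

# Birds with no sighting anywhere other than their first island.
stayed = conn.execute(
    """
    SELECT b.BirdID
    FROM Birds b
    WHERE NOT EXISTS (SELECT 1 FROM Sightings s
                      WHERE s.BirdID = b.BirdID
                        AND s.IslandID <> b.IslandAlpha)
    ORDER BY b.BirdID
    """
).fetchall()
```

On the sample data bird 1 ends up in moved and bird 2 in stayed. A bird with no sightings at all would also land in stayed, which may or may not be what is wanted.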
Please provide us information about the tables and columns you are using.
I will presume you are asking this question because a simple join of tables and filtering where IslandAlpha <> ObsLoc is not possible because IslandAlpha is derived from first observation record for each bird. Pulling first observation record for each bird requires a nested query. Need a unique record identifier in Observations - autonumber should serve. Assuming there is an observation date/time field, consider:
SELECT * FROM Observations WHERE ObsID IN
(SELECT TOP 1 ObsID FROM Observations AS Dupe
WHERE Dupe.ObsBirdID = Observations.ObsBirdID ORDER BY Dupe.ObsDateTime);
Now use that query for subsequent queries.
SELECT * FROM Observations
INNER JOIN Query1 ON Observations.ObsBirdID = Query1.ObsBirdID
WHERE Observations.ObsLocID <> Query1.ObsLocID;

How can I compare two tables and delete on matching fields (not matching records)

Scenario: A sampling survey needs to be performed on a membership of 20,000 individuals. Survey sample size is 3500 of the total 20000 members. All membership individuals are in table tblMember. The same survey was performed the previous year and the members who were surveyed are in tblSurvey08. Membership data can change over the year (e.g. new email address, etc.) but the MemberID data stays the same.
How do I remove the MemberID/records contained in tblSurvey08 from tblMember to create a new table of potential members to be surveyed (let's call it tblPotentialSurvey09)? Again, the record for an individual member may not match between the different tables, but the MemberID field will remain constant.
I am fairly new at this stuff but I seem to be having a problem Googling a solution - I could use the EXCEPT function but the records for the individuals members are not necessarily the same from one table to next - just the MemberID may be the same.
Thanks
SELECT
m.*
FROM
tblMember m
LEFT JOIN
tblSurvey08 s08
ON m.MemberID = s08.MemberID
WHERE
s08.MemberID IS NULL
will give you only the members not in the 08 survey (replace m.* with the column list you need). This join is often more efficient than a NOT IN construct, though that depends on the database engine.
A new table is not such a great idea, since you are duplicating data. A view with the above query would be a better choice.
I apologize in advance if I didn't understand your question but I think this is what you're asking for. You can use the insert into statement.
insert into tblPotentialSurvey09
select your_criteria from tblMember where tblMember.MemberId not in (
select MemberId from tblSurvey08
)
First of all, I wouldn't create a new table just for selecting potential members. Instead, I would create a new true/false (1/0) field telling if they are eligible.
However, if you'd still want to copy data to the new table, here's how you can do it:
INSERT INTO tblPotentialSurvey09 (MemberID)
SELECT MemberID
FROM tblMember m
WHERE NOT EXISTS (SELECT 1 FROM tblSurvey08 s WHERE s.MemberID = m.MemberID)
If you just want to create a new field as I suggested, a similar query would do the job.
An outer join should do:
select m_09.MemberID
from tblMember m_09 left outer join
tblSurvey08 m_08 on m_09.MemberID = m_08.MemberID
where
m_08.MemberID is null
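The three approaches suggested in the answers (outer join with IS NULL, NOT IN, NOT EXISTS) can be compared side by side. This sketch uses Python's sqlite3 module; the Email column and the sample rows are assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE tblMember (MemberID INTEGER PRIMARY KEY, Email TEXT);
    CREATE TABLE tblSurvey08 (MemberID INTEGER, Email TEXT);
    INSERT INTO tblMember VALUES
        (1, 'a@new.example'), (2, 'b@example.com'), (3, 'c@example.com');
    -- Member 1 was surveyed last year under a different email address.
    INSERT INTO tblSurvey08 VALUES (1, 'a@old.example');
    """
)

left_join = conn.execute(
    """
    SELECT m.MemberID
    FROM tblMember m
    LEFT JOIN tblSurvey08 s ON m.MemberID = s.MemberID
    WHERE s.MemberID IS NULL
    ORDER BY m.MemberID
    """
).fetchall()

not_in = conn.execute(
    """
    SELECT MemberID FROM tblMember
    WHERE MemberID NOT IN (SELECT MemberID FROM tblSurvey08)
    ORDER BY MemberID
    """
).fetchall()

not_exists = conn.execute(
    """
    SELECT m.MemberID FROM tblMember m
    WHERE NOT EXISTS (SELECT 1 FROM tblSurvey08 s
                      WHERE s.MemberID = m.MemberID)
    ORDER BY m.MemberID
    """
).fetchall()
```

All three return members 2 and 3 here, because only MemberID is compared and the changed email is ignored. One difference worth knowing: if tblSurvey08.MemberID ever contains a NULL, the NOT IN form returns no rows at all, while the other two are unaffected.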