How can I implement SQL INTERSECT and MINUS operations in MS Access - sql

I have researched this and haven't found a way to run INTERSECT and MINUS operations in MS Access. Is there any way to do it?

INTERSECT is an inner join. MINUS is an outer join, where you choose only the records that don't exist in the other table.
INTERSECT
select distinct
a.*
from
a
inner join b on a.id = b.id
MINUS
select distinct
a.*
from
a
left outer join b on a.id = b.id
where
b.id is null
If you edit your original question and post some sample data then an example can be given.
EDIT: Forgot to add in the distinct to the queries.

INTERSECT is NOT an INNER JOIN. They're different. An INNER JOIN will give you duplicate rows in cases where INTERSECT will not. You can get equivalent results by:
SELECT DISTINCT a.*
FROM a
INNER JOIN b
on a.PK = b.PK
Note that PK must be the primary key column or columns. If there is no PK on the table (BAD!), you must write it like so:
SELECT DISTINCT a.*
FROM a
INNER JOIN b
ON a.Col1 = b.Col1
AND a.Col2 = b.Col2
AND a.Col3 = b.Col3 ...
With MINUS, you can do the same thing, but with a LEFT JOIN, and a WHERE condition checking for null on one of table b's non-nullable columns (preferably the primary key).
SELECT DISTINCT a.*
FROM a
LEFT JOIN b
on a.PK = b.PK
WHERE b.PK IS NULL
That should do it.
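A minimal sketch of the point about duplicates, run against SQLite through Python's sqlite3 module purely because SQLite has native INTERSECT and EXCEPT to compare against (Access itself is not involved here). The tables a and b and their sample rows are made up for illustration. Note that the join only compares id, while INTERSECT compares whole rows, so the results agree only because the sample rows with equal ids are identical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE a (id INTEGER, name TEXT);
    CREATE TABLE b (id INTEGER, name TEXT);
    -- id 2 appears twice in b, so a plain INNER JOIN repeats the matching row of a
    INSERT INTO a VALUES (1, 'x'), (2, 'y'), (3, 'z');
    INSERT INTO b VALUES (2, 'y'), (2, 'y'), (3, 'z'), (4, 'w');
""")

def rows(sql):
    """Run a query and return its rows in a stable order."""
    return sorted(conn.execute(sql).fetchall())

# Join-based emulations, as in the answers above
plain_join = rows("SELECT a.* FROM a INNER JOIN b ON a.id = b.id")
intersect_join = rows("SELECT DISTINCT a.* FROM a INNER JOIN b ON a.id = b.id")
minus_join = rows(
    "SELECT DISTINCT a.* FROM a LEFT JOIN b ON a.id = b.id WHERE b.id IS NULL"
)

# Reference results from SQLite's own set operators
intersect_native = rows("SELECT * FROM a INTERSECT SELECT * FROM b")
minus_native = rows("SELECT * FROM a EXCEPT SELECT * FROM b")

print(plain_join)      # the row with id 2 shows up twice
print(intersect_join)  # same rows as intersect_native
print(minus_join)      # same rows as minus_native
```

Without DISTINCT the join returns the id 2 row twice; with it, both emulations match the native operators on this data.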

They're done through JOINs. The old fashioned way :)
For INTERSECT, you can use an INNER JOIN. Pretty straightforward. You just need to use a GROUP BY or DISTINCT if you don't have a pure one-to-one relationship going on. Otherwise, as others have mentioned, you can get more results than you'd expect.
For MINUS, you can use a LEFT JOIN and use the WHERE to limit it so you're only getting back rows from your main table that don't have a match with the LEFT JOINed table.
Easy peasy.

Unfortunately MINUS is not supported in MS Access - one workaround would be to create three queries, one with the full dataset, one that pulls the rows you want to filter out, and a third that left joins the two tables and only pulls records that only exist in your full dataset.
Same thing goes for INTERSECT, except you would be doing it via an inner join and only returning records that exist in both.

No MINUS in Access, but you can use a subquery.
SELECT DISTINCT a.*
FROM a
WHERE a.PK NOT IN (SELECT DISTINCT b.pk FROM b)
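One caveat worth checking before relying on NOT IN: under standard SQL three-valued logic, a NULL in the subquery result makes NOT IN return no rows at all. A real primary key cannot be NULL, so this only matters when the compared column is nullable. The sketch below uses SQLite via Python's sqlite3 as a stand-in (I have not verified it in Access), with a deliberately nullable column named pk and made-up rows, and shows NOT EXISTS as an alternative that is not affected.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE a (pk INTEGER);
    CREATE TABLE b (pk INTEGER);   -- nullable on purpose, for illustration
    INSERT INTO a VALUES (1), (2), (3);
    INSERT INTO b VALUES (2), (NULL);
""")

# NOT IN: "1 NOT IN (2, NULL)" evaluates to NULL, so the row is filtered out
not_in = conn.execute("""
    SELECT DISTINCT a.pk
    FROM a
    WHERE a.pk NOT IN (SELECT b.pk FROM b)
    ORDER BY a.pk
""").fetchall()

# NOT EXISTS: only asks whether a matching row exists, so NULLs in b are harmless
not_exists = conn.execute("""
    SELECT DISTINCT a.pk
    FROM a
    WHERE NOT EXISTS (SELECT 1 FROM b WHERE b.pk = a.pk)
    ORDER BY a.pk
""").fetchall()

print(not_in)      # empty because of the NULL in b
print(not_exists)  # the rows of a with no match in b
```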

I believe this one does the MINUS
SELECT DISTINCT
a.CustomerID,
b.CustomerID
FROM
tblCustomers a
LEFT JOIN
[Copy Of tblCustomers] b
ON
a.CustomerID = b.CustomerID
WHERE
b.CustomerID IS NULL

Related

What is a good way to make multiple full outer join?

I'd like to know if anyone would know an elegant and scalable method to full outer join multiple tables, given that I might want to regularly add new tables to the join?
For now my method consists of full joining table A with table B, storing the result as a cte, then full joining the cte to table C, storing the result as cte2, full joining cte2 to table D... you get it.
Creating a new cte every time I want to add another table to the join is not very practical, but every other solution I have found so far has the same issue: there's always some kind of endless nesting, either of ctes or of selects (like SELECT blabla FROM (SELECT blabla2 FROM..)).
Is there any way that I don't know of to perform this multiple full join without falling into an endless nesting of ctes?
Thanks
EDIT: Sorry, it seems it wasn't clear enough.
When I perform a multiple full join in one query like:
SELECT
a.*, b.*, c.*
FROM
tableA a
FULL JOIN
tableB b
ON
a.id = b.id
FULL JOIN
tableC c
ON
a.id = c.id
If the id is present in tableB and tableC but not tableA, my result will contain two lines where there should be one, because I joined b to a and c to a but not b to c. That's why I need to full join the result of the full join of a and b to c.
So if I have, let's say, five tables instead of three, I need to full join the result of the full join of the result of the full join of the result of the full join... x)
This fiddle illustrates the problem.
If you want the rows from tables B and C to join, you need to accommodate the fact that the data may come from table B and not A. The easiest way is probably to use COALESCE.
Your join should therefore look like:
SELECT a.*, b.*, c.*
FROM tableA a
FULL JOIN tableB b ON a.id = b.id
FULL JOIN tableC c ON COALESCE(a.id, b.id) = c.id
-- FULL JOIN tableD d ON COALESCE(a.id, b.id, c.id) = d.id
-- FULL JOIN tableE e ON COALESCE(a.id, b.id, c.id, d.id) = e.id
Most databases that support FULL JOIN also support USING, which is the simplest way to do what you want:
SELECT *
FROM tableA a FULL JOIN
tableB b
USING (id) FULL JOIN
tableC c
USING (id);
The semantics of USING mean that only non-NULL values are used, if such a value is available.
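For engines without FULL JOIN or USING, a different technique (not one of the answers above) gives the same one-row-per-id shape: collect every id with UNION, then LEFT JOIN each table to that key list. Adding a table means adding one entry, with no extra nesting. This is only a sketch under assumptions: it runs on SQLite through Python's sqlite3, id is unique within each table, and the value column names (a_val, b_val, c_val) and the value_cols mapping are invented for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE tableA (id INTEGER PRIMARY KEY, a_val TEXT);
    CREATE TABLE tableB (id INTEGER PRIMARY KEY, b_val TEXT);
    CREATE TABLE tableC (id INTEGER PRIMARY KEY, c_val TEXT);
    -- id 2 exists in tableB and tableC but not in tableA (the problem case)
    INSERT INTO tableA VALUES (1, 'a1');
    INSERT INTO tableB VALUES (1, 'b1'), (2, 'b2');
    INSERT INTO tableC VALUES (2, 'c2'), (3, 'c3');
""")

# Hypothetical mapping: table name -> the value column to show from it.
# Supporting another table only requires another entry here.
value_cols = {"tableA": "a_val", "tableB": "b_val", "tableC": "c_val"}

# Every id that occurs in any table, each exactly once (UNION removes duplicates)
keys_sql = " UNION ".join(f"SELECT id FROM {t}" for t in value_cols)
select_sql = ", ".join(f"{t}.{c}" for t, c in value_cols.items())
joins_sql = " ".join(f"LEFT JOIN {t} ON {t}.id = k.id" for t in value_cols)

sql = f"SELECT k.id, {select_sql} FROM ({keys_sql}) AS k {joins_sql} ORDER BY k.id"
merged = conn.execute(sql).fetchall()

for row in merged:
    print(row)  # one row per id, None where a table has no such id
```

The id 2 row comes out once, with the tableA column empty, which is the result the chained COALESCE join is meant to produce.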

join or merge two table based on id merge

I have two tables:
I am looking for the results like mentioned in the last.
I tried union (only similar columns can be merged), left join and right join, but I am getting repeated fields in the NULL areas. What other options are there where I can get NULLs without the columns repeating?
A full join would get all results from both tables.
select
A.ID,
A.ColA,
A.ColB,
B.ColC,
B.ColD
from TableA A
full join TableB B on A.ID = B.ID
Here is a good post to understand joins
You can try distinct:
select distinct * from
tableA a,
tableB b
where a.id = b.id;
It will not give any duplicate tuples.

BigQuery Full outer join producing "left join" results

I have 2 tables, both of which contain distinct id values. Some of the id values might occur in both tables and some are unique to each table. Table1 has 10,910 rows and Table2 has 11,304 rows
When running an inner join query:
SELECT COUNT(DISTINCT a.id)
FROM table1 a
JOIN table2 b on a.id = b.id
I get a total of 10,896 rows or 10,896 ids shared across both tables.
However, when I run a FULL OUTER JOIN on the 2 tables like this:
SELECT COUNT(DISTINCT a.id)
FROM table1 a
FULL OUTER JOIN EACH table2 b on a.id = b.id
I get total of 10,896 rows, but I was expecting all 10,910 rows from table1.
I am wondering if there is an issue with my query syntax.
As you are using EACH, it looks like you are running your queries in Legacy SQL mode.
In BigQuery Legacy SQL, the COUNT(DISTINCT) function is probabilistic: it gives a statistical approximation and is not guaranteed to be exact.
You can use the EXACT_COUNT_DISTINCT() function instead. It gives you the exact number but is a little more expensive on the back end.
An even better option is to just use Standard SQL.
For your specific query you only need to remove the EACH keyword and it should work like a charm:
#standardSQL
SELECT COUNT(DISTINCT a.id)
FROM table1 a
JOIN table2 b on a.id = b.id
and
#standardSQL
SELECT COUNT(DISTINCT a.id)
FROM table1 a
FULL OUTER JOIN table2 b on a.id = b.id
I added the original query as a subquery and counted ids and produced the expected results. Still a little strange, but it works.
SELECT EXACT_COUNT_DISTINCT(a_id)
FROM
(SELECT a.id AS a_id,
b.id AS b_id
FROM table1 a FULL OUTER JOIN EACH table2 b on a.id = b.id)
That is because in both cases count(distinct a.id) counts only the non-null ids of table a.
Use count(*) and it should work.
You will have to add coalesce... BigQuery, unlike traditional SQL does not recognize fields unless used explicitly
SELECT COUNT(DISTINCT coalesce(a.id,b.id))
FROM table1 a
FULL OUTER JOIN EACH table2 b on a.id = b.id
This query will now take full effect of full outer join :)
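To see why the expression being counted matters after a full outer join, here is a small sketch. It runs on SQLite through Python's sqlite3, not on BigQuery, so it says nothing about the Legacy SQL approximation issue; it only shows the NULL behaviour. The full outer join is emulated with a LEFT JOIN plus a UNION ALL of the unmatched table2 rows so that it works on older SQLite builds, and the sample ids are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table1 (id INTEGER PRIMARY KEY);
    CREATE TABLE table2 (id INTEGER PRIMARY KEY);
    -- ids 2 and 3 are shared, 1 is only in table1, 4 and 5 only in table2
    INSERT INTO table1 VALUES (1), (2), (3);
    INSERT INTO table2 VALUES (2), (3), (4), (5);
""")

# Emulated FULL OUTER JOIN: all table1 rows, plus table2 rows with no partner
full_join = """
    SELECT a.id AS a_id, b.id AS b_id
    FROM table1 a LEFT JOIN table2 b ON a.id = b.id
    UNION ALL
    SELECT a.id, b.id
    FROM table2 b LEFT JOIN table1 a ON a.id = b.id
    WHERE a.id IS NULL
"""

def scalar(expr):
    """Evaluate one aggregate expression over the emulated full join."""
    return conn.execute(f"SELECT {expr} FROM ({full_join})").fetchone()[0]

count_a = scalar("COUNT(DISTINCT a_id)")                     # NULL a_id rows are skipped
count_rows = scalar("COUNT(*)")                              # every joined row
count_all = scalar("COUNT(DISTINCT COALESCE(a_id, b_id))")   # every id from either side

print(count_a, count_rows, count_all)
```

COUNT(DISTINCT a_id) can never exceed the number of ids in table1, while the COALESCE version counts ids from both tables.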

Restricting a LEFT JOIN

I have a table, let's call it "a" that is used in a left join in a view that involves a lot of tables. However, I only want to return rows of "a" if they also join with another table "b". So the existing code looks like
SELECT ....
FROM main ...
...
LEFT JOIN a ON (main.col2 = a.col2)
but it's returning too many rows, specifically ones where a doesn't have a match in b. I tried
SELECT ...
FROM main ...
...
LEFT JOIN (
SELECT a.col1, a.col2
FROM a
JOIN b ON (a.col3 = b.col3)) ON (a.col2 = main.col2)
which gives me the correct results but unfortunately "EXPLAIN PLAN" tells that doing it this way ends up forcing a full table scan of both a and b, which is making things quite slow. One of my co-workers suggested another LEFT JOIN on b, but that doesn't work because it gives me the b row when it's present, but doesn't stop returning the rows from a that don't have a match in b.
Is there any way to put the main.col2 condition in the sub-SELECT, which would get rid of the full table scans? Or some other way to do what I want?
SELECT ...
FROM ....
LEFT JOIN ( a INNER JOIN b ON .... ) ON ....
add a where (main.col2 = a.col2)
just do a join instead of a left join.
What if you created a view that gets you the "a" to "b" join, then do your left joins to that view?
Select ...
From Main
Left Join a on main.col2 = a.col2
where a.col3 in (select b.col3 from b) or a.col3 is null
you may also need to do some indexing on a.col3 and b.col3
First define your query between table "a" and "b" to make sure it is returning the rows you want:
Select
a.field1,
a.field2,
b.field3
from
table_a a
JOIN table_b b
on (b.someid = a.someid)
then put that in as a sub-query of your main query:
select
m.field1,
m.field2,
m.field3,
a.field1 as a_field1,
b.field1 as b_field1
from
Table_main m
LEFT OUTER JOIN
(
Select
a.field1,
a.field2,
b.field3
from
table_a a
JOIN table_b b
on (b.someid = a.someid)
) sq
on (sq.field1 = m.field1)
that should do it.
Ahh, missed the performance problem note - what I usually end up doing is putting the query from the view in a stored procedure, so I can generate the sub-queries to temp tables and put indexes on them. Surprisingly faster than you would expect. :-)
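The suggestions in this thread do not all return the same rows, so here is a small comparison. It is a sketch on SQLite through Python's sqlite3 with made-up rows for main, a and b, and it says nothing about the execution plan on the original database. It compares the extra LEFT JOIN on b, the join of a to b done first as a derived table (aliased ab here, an invented name), and the WHERE ... IN ... OR ... IS NULL filter.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE main (col1 INTEGER, col2 INTEGER);
    CREATE TABLE a (col1 TEXT, col2 INTEGER, col3 INTEGER);
    CREATE TABLE b (col3 INTEGER);
    INSERT INTO main VALUES (1, 10), (2, 20), (3, 30);
    -- the a row for col2 = 10 has a partner in b; the one for col2 = 20 does not
    INSERT INTO a VALUES ('a10', 10, 100), ('a20', 20, 200);
    INSERT INTO b VALUES (100);
""")

# Another LEFT JOIN on b: the a row without a match in b is still returned
extra_left_join = conn.execute("""
    SELECT main.col1, a.col1, b.col3
    FROM main
    LEFT JOIN a ON main.col2 = a.col2
    LEFT JOIN b ON a.col3 = b.col3
    ORDER BY main.col1
""").fetchall()

# Join a to b first, then LEFT JOIN the result: unmatched a rows become NULL
joined_first = conn.execute("""
    SELECT main.col1, ab.col1
    FROM main
    LEFT JOIN (SELECT a.col1, a.col2
               FROM a JOIN b ON a.col3 = b.col3) AS ab
           ON ab.col2 = main.col2
    ORDER BY main.col1
""").fetchall()

# WHERE filter: the main row whose a row has no match in b disappears entirely
where_filter = conn.execute("""
    SELECT main.col1, a.col1
    FROM main
    LEFT JOIN a ON main.col2 = a.col2
    WHERE a.col3 IN (SELECT b.col3 FROM b) OR a.col3 IS NULL
    ORDER BY main.col1
""").fetchall()

print(extra_left_join)
print(joined_first)
print(where_filter)
```

On this data only the derived-table form keeps all three main rows while hiding the a row that has no match in b.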

Need to create an expression in an outer join that only returns one row

I'm creating a really complex dynamic sql, it's got to return one row per user, but now I have to join against a one to many table. I do an outer join to make sure I get at least one row back (and can check for null to see if there's data in that table) but I have to make sure I only get one row back from this outer join part if there's multiple rows in this second table for this user.
So far I've come up with this: (sybase)
SELECT a.user_id
FROM table1 a
,table2 b
WHERE a.user_id = b.user_id
AND a.sub_id = (
SELECT min(c.sub_id)
FROM table2 c
WHERE b.sub_id = c.sub_id
)
The subquery finds the min value in the one to many table for that particular user.
This works but I fear nastiness from doing correlated subqueries when table 1 and 2 get very large.
Is there a better way? I'm trying to dream up a way to get joins to do it, but I'm not seeing it.
Also saying "where rowcount=1" or "top 1" doesn't help me, because I'm not trying to fix the above query, I'm ADDING the above to an already complex query.
In MySql you can ensure that any query returns at most X rows using
select *
from foo
where bar = 1
limit X;
Unfortunately, I'm fairly sure this is a MySQL-specific extension to SQL. However, a Google search for something like "mysql sybase limit" might turn up an equivalent for Sybase.
A few quick points:
You need to have definitive business rules. If the query returns more than one row then you need to think about why (beyond just "it's a 1:many relationship" - WHY is it a 1:many relationship?). You should come up with the business solution rather than just use "min" because it gives you 1 row. The business solution might simply be "take the first one", in which case min might be the answer, but you need to make sure that's a conscious decision.
You should really try to use the ANSI syntax for joins. Not just because it's standard, but because the syntax that you have isn't really doing what you think it's doing (it's not an outer join) and some things are simply impossible to do with the syntax that you have.
Assuming that you end up using the MIN solution, here's one possible solution without the subquery. You should test it with various other solutions to make sure that they are equivalent in outcome and to see which performs the best.
SELECT
a.user_id, b.*
FROM
dbo.Table_1 a
LEFT OUTER JOIN dbo.Table_2 b ON b.user_id = a.user_id AND b.sub_id = a.sub_id
LEFT OUTER JOIN dbo.Table_2 c ON c.user_id = a.user_id AND c.sub_id < b.sub_id
WHERE
c.user_id IS NULL
You'll need to test this to see if it's really giving what you want and you might need to tweak it, but the basic idea is to use the second LEFT OUTER JOIN to ensure that there are no rows that exist with a lower sub_id than the one found in the first LEFT OUTER JOIN (if any is found). You can adjust the criteria in the second LEFT OUTER JOIN depending on the final business rules.
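A runnable sketch of that idea, adapted rather than copied: it runs on SQLite through Python's sqlite3 instead of Sybase, uses the table1/table2 names from the question with made-up rows, and joins b on user_id only (the question never shows a sub_id column on table1, so that part is an assumption). The second LEFT JOIN looks for a row with a lower sub_id for the same user, and the WHERE keeps only the rows where none exists.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table1 (user_id INTEGER PRIMARY KEY);
    CREATE TABLE table2 (user_id INTEGER, sub_id INTEGER);
    -- user 1 has three rows, user 2 has one, user 3 has none
    INSERT INTO table1 VALUES (1), (2), (3);
    INSERT INTO table2 VALUES (1, 7), (1, 5), (1, 9), (2, 4);
""")

lowest_per_user = conn.execute("""
    SELECT a.user_id, b.sub_id
    FROM table1 a
    LEFT OUTER JOIN table2 b
           ON b.user_id = a.user_id
    LEFT OUTER JOIN table2 c
           ON c.user_id = b.user_id
          AND c.sub_id < b.sub_id      -- c is any row that beats b
    WHERE c.user_id IS NULL            -- keep b only if nothing beats it
    ORDER BY a.user_id
""").fetchall()

print(lowest_per_user)  # one row per user, None for the user without table2 rows
```

If two table2 rows for the same user share the lowest sub_id, this form returns both, so it relies on sub_id being unique per user.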
How about:
select a.user_id
from table1 a
where exists (select null from table2 b
where a.user_id = b.user_id
)
Maybe your example is too simplified, but I'd use a group by:
SELECT
a.user_id
FROM
table1 a
LEFT OUTER JOIN table2 b ON (a.user_id = b.user_id)
GROUP BY
a.user_id
I fear the only other way would be using nested queries:
The difference between this query and your example is a 'sub table' is only generated once, however in your example you generate a 'sub table' for each row in table1 (but may depend on the compiler, so you might want to use query analyser to check performance).
SELECT
a.user_id,
b.sub_id
FROM
table1 a
LEFT OUTER JOIN (
SELECT
user_id,
min(sub_id) as sub_id
FROM
table2
GROUP BY
user_id
) b ON (a.user_id = b.user_id)
Also, if your query is getting quite complex I'd use temporary tables to simplify the code, it might cost a little more in processing time, but will make your queries much easier to maintain.
A Temp Table example would be:
SELECT
user_id
INTO
#table1
FROM
table1
WHERE
.....
SELECT
a.user_id,
min(b.sub_id) as sub_id
INTO
#table2
FROM
#table1 a
INNER JOIN table2 b ON (a.user_id = b.user_id)
GROUP BY
a.user_id
SELECT
a.*,
b.sub_id
from
#table1 a
LEFT OUTER JOIN #table2 b ON (a.user_id = b.user_id)
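The derived-table and temp-table variants above both reduce table2 to one row per user before joining. A minimal sketch of the derived-table form, checked against a correlated-subquery version: it runs on SQLite through Python's sqlite3 rather than Sybase, so it shows that the results agree on made-up data and says nothing about relative performance.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table1 (user_id INTEGER PRIMARY KEY);
    CREATE TABLE table2 (user_id INTEGER, sub_id INTEGER);
    INSERT INTO table1 VALUES (1), (2), (3);
    INSERT INTO table2 VALUES (1, 7), (1, 5), (1, 9), (2, 4);
""")

# Aggregate table2 once, then outer join the one-row-per-user result
grouped = conn.execute("""
    SELECT a.user_id, b.sub_id
    FROM table1 a
    LEFT OUTER JOIN (
        SELECT user_id, MIN(sub_id) AS sub_id
        FROM table2
        GROUP BY user_id
    ) b ON a.user_id = b.user_id
    ORDER BY a.user_id
""").fetchall()

# Correlated version: the MIN lookup is written per joined row
correlated = conn.execute("""
    SELECT a.user_id, b.sub_id
    FROM table1 a
    LEFT OUTER JOIN table2 b
           ON b.user_id = a.user_id
          AND b.sub_id = (SELECT MIN(c.sub_id)
                          FROM table2 c
                          WHERE c.user_id = b.user_id)
    ORDER BY a.user_id
""").fetchall()

print(grouped)
print(correlated)
```

Both return one row per user here, including the user with no table2 rows.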
First of all, I believe the query you are trying to write as your example is:
select a.user_id
from table1 a, table2 b
where a.user_id = b.user_id
and b.sub_id = (select min(c.sub_id)
from table2 c
where b.user_id = c.user_id)
Except that you wanted an outer join (I think someone edited out the Oracle syntax):
select a.user_id
from table1 a
left outer join table2 b on a.user_id = b.user_id
where b.user_id is null
or b.sub_id = (select min(c.sub_id)
from table2 c
where b.user_id = c.user_id)
Well, you already have a query that works. If you are concerned about the speed you could:
Add a field to table2 which identifies which sub_id is the 'first one', or
Keep track of table2's primary key in table1, or in another table.