How to SQL JOIN on concatenated columns in MS Access? - sql

I have 2 tables where column X and Y are concatenated to represent a unique identifier. I want to find all rows in tableB that do not exist in tableA and add them into tableC.
-------tableA-------- // tableA is a master reference table with all names so far
|__X__|__Y__|_name__|
| 3 | 7 | Mary |
| 3 | 2 | Jaime |
-------tableB-------- // tableB is an input file with all daily names (some repeats already exist in tableA)
|__X__|__Y__|_name__|
| 2 | 5 | Smith |
| 3 | 7 | Mary |
-------tableC-------- // tableC is a temporary holding table for new names
|__X__|__Y__|_name__|
| | | |
DESIRED RESULT:
-------tableC-------- // tableB - tableA = tableC
|__X__|__Y__|_name__|
| 2 | 5 | Smith |
I want to match rows based on a concatenated X+Y value. My SQL query so far looks like this:
INSERT INTO tableC
SELECT * FROM tableA
LEFT JOIN tableB
ON tableA.X & table.B = tableB.X & tableB.Y
WHERE tableB.X & tableB.Y IS null
However, this does not give me the intended result. I cannot use EXISTS as my actual data set is very big. Could anyone give me suggestions?

I don't think the slowness is caused by EXISTS. Your query is probably slow because you're using concatenation to match multiple columns, which typically prevents the database from using an index on them. Compare the columns with AND instead and make sure you have a composite index on (x,y):
select distinct x,y,name
from tableB b
where not exists (
select 1 from tableA a
where a.x = b.x
and a.y = b.y
)
This will select all unique rows in tableB that don't have the same (x,y) value in tableA. Note that rows in tableB that share an x,y pair but have different names will all show up in the result (i.e. 2,5,Joe would appear alongside 2,5,Smith). If you don't want that, you have to group by x,y and decide which name to keep when an x,y pair has more than one name.
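The approach above can be sketched end to end. This is only an illustration, assuming SQLite (through Python's sqlite3 module) as a stand-in for Access; the index name idx_tableA_xy is made up for the example. It also shows why matching on concatenated values is risky: the pairs (1, 23) and (12, 3) both concatenate to "123".

```python
import sqlite3

# SQLite is used here as a stand-in for Access (an assumption of this sketch).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE tableA (X INTEGER, Y INTEGER, name TEXT);
    CREATE TABLE tableB (X INTEGER, Y INTEGER, name TEXT);
    CREATE TABLE tableC (X INTEGER, Y INTEGER, name TEXT);
    -- composite index (hypothetical name) so the NOT EXISTS probe can use it
    CREATE INDEX idx_tableA_xy ON tableA (X, Y);
    INSERT INTO tableA VALUES (3, 7, 'Mary'), (3, 2, 'Jaime');
    INSERT INTO tableB VALUES (2, 5, 'Smith'), (3, 7, 'Mary');
""")

# Copy into tableC every tableB row whose (X, Y) pair is missing from tableA.
conn.execute("""
    INSERT INTO tableC (X, Y, name)
    SELECT DISTINCT b.X, b.Y, b.name
    FROM tableB AS b
    WHERE NOT EXISTS (
        SELECT 1 FROM tableA AS a
        WHERE a.X = b.X AND a.Y = b.Y
    )
""")
new_rows = conn.execute("SELECT X, Y, name FROM tableC").fetchall()

# Concatenated keys can collide even when the pairs differ.
keys_collide = (str(1) + str(23)) == (str(12) + str(3))

print(new_rows, keys_collide)
```

Comparing the columns separately avoids the collision and leaves the index usable.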

Related

Comparing aggregated columns to non aggregated columns to remove matches

I have two separate tables from two different databases that are performing a matching check.
If the values match I want them out of the result set. The first table (A) has multiple entries that contain the same symbol matches for the matching columns in the second table (B).
The entries in table B, if added up will ideally equal the value of one of the matching rows of A.
The tables look like below when queried separately.
Underneath the tables is what my query currently looks like. I thought that if I grouped by symbol, I could compare the SUM of B's values against the value in A and drop the rows where they are equal. However, I think that because I am summing B and not A, A's value doesn't count as an aggregated column, so it must be included in the GROUP BY, which stops the sum from working the way I want.
How can I write this query so that the values in B are summed per symbol and any symbol whose sum matches the value of an entry in A is left out of the result set?
Table A
| Symbol | Value |
|--------|-------|
| A | 1000 |
| A | 1000 |
| B | 1440 |
| B | 1440 |
| C | 1235 |
Table B
| Symbol | Value |
|--------|-------|
| A | 750 |
| A | 250 |
| B | 24 |
| B | 1416 |
| C | 1874 |
SELECT DBA.A, DBB.B
FROM DatabaseA DBA
INNER JOIN DatabaseB DBB on DBA.Symbol = DBB.Symbol
and DBA.Value != DBB.Value
group by DBA.Symbol, DBB.Symbol, DBB.Value
having SUM(DBB.Value) != DBA.Value
order by Symbol, Value
Edited to add ideal results
Table C
| SymbolB| ValueB| SymbolA | ValueA |
|--------|-------|---------|--------|
| C | 1874 | C | 1235 |
Wherever B adds up to A, remove both. If they don't add up, keep the rows in the result set.
I would use a common table expression (CTE) to sum the values in Table B per symbol, then join Table A and Table B on symbol and keep only the rows of A whose value does not match a total.
WITH tDBB as (
SELECT DBB.Symbol, SUM(DBB.Value) as total
FROM tableB as DBB
GROUP BY DBB.Symbol
)
SELECT distinct DBB.Symbol as SymbolB, DBB.Value as ValueB, DBA.Symbol as SymbolA, DBA.Value as ValueA
FROM tableA as DBA
INNER JOIN tableB as DBB on DBA.Symbol = DBB.Symbol
WHERE DBA.Symbol in (Select Symbol from tDBB)
AND NOT DBA.Value in (Select total from tDBB)
Result:
|symbolB |valueB |SymbolA |ValueA |
|--------|-------|--------|-------|
| C | 1874 | C | 1235 |
with t3 as (
select symbol
,sum(value) as value
from t2
group by symbol
)
select *
from t3 join t on t.symbol = t3.symbol and t.value != t3.value
| symbol | value | Symbol | Value |
|--------|-------|--------|-------|
| C | 1874 | C | 1235 |
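For anyone who wants to try this locally, here is a small sketch of the "sum B per symbol, then compare with A" idea, assuming SQLite via Python's sqlite3 module and the sample data from the question. The CTE name totals is made up for the example. One difference from the first answer: its NOT IN (Select total from tDBB) test compares a value against the totals of every symbol, whereas this sketch compares only within the same symbol.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE tableA (Symbol TEXT, Value INTEGER);
    CREATE TABLE tableB (Symbol TEXT, Value INTEGER);
    INSERT INTO tableA VALUES
        ('A', 1000), ('A', 1000), ('B', 1440), ('B', 1440), ('C', 1235);
    INSERT INTO tableB VALUES
        ('A', 750), ('A', 250), ('B', 24), ('B', 1416), ('C', 1874);
""")

# Sum tableB per symbol, then keep only the symbols whose total
# differs from the value recorded in tableA for that same symbol.
mismatches = conn.execute("""
    WITH totals AS (
        SELECT Symbol, SUM(Value) AS total
        FROM tableB
        GROUP BY Symbol
    )
    SELECT DISTINCT t.Symbol, t.total, a.Symbol, a.Value
    FROM totals AS t
    JOIN tableA AS a
      ON a.Symbol = t.Symbol
     AND a.Value <> t.total
    ORDER BY t.Symbol
""").fetchall()

print(mismatches)
```

DISTINCT collapses the duplicate rows that tableA holds for the same symbol and value.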

Difference between "NOT IN table.col" and "NOT IN SELECT col FROM table"

A pretty basic question. But what is the difference between
SELECT t.col
FROM table t, other_table o
WHERE t.col NOT IN o.col
and
SELECT col
FROM table
WHERE col NOT IN (SELECT col FROM other_table)
Semantically this sounds pretty equal to me, but the first one creates duplicates. What am I understanding wrong?
The first one won't even run in most RDBMSs, but in Oracle it returns every combination of records except those where t.col = o.col. You'd see this if you added o.col to your SELECT.
The latter query returns records from table that don't share the col value with any records in other_table.
Best illustrated by example:
Table1
| ANIMAL |
|--------|
| dog |
| cat |
| horse |
Table2
| ANIMAL |
|--------|
| dog |
| fish |
Queries:
SELECT t."animal",o."animal"
FROM Table1 t, Table2 o
WHERE t."animal" NOT IN o."animal"
| ANIMAL | ANIMAL2 |
|--------|---------|
| cat | dog |
| horse | dog |
| dog | fish |
| cat | fish |
| horse | fish |
SELECT t."animal"
FROM Table1 t
WHERE t."animal" NOT IN (SELECT o."animal" FROM Table2 o)
| ANIMAL |
|--------|
| horse |
| cat |
Basically, the first query builds a Cartesian product, which would return every combination of records from the two tables, and your WHERE criteria filter out only the pairs whose values match. The second query has no JOIN, implicit or explicit; it just takes records from one table and filters them based on criteria that happen to draw from another table.
As far as I know, the query (slightly modified):
SELECT t.col
FROM table t, other_table o
WHERE t.col <> o.col
makes a Cartesian product, then filters it.
The example below might not be the exact process that takes place, but it gives an abstract overview of the situation.
Suppose the table named table has the following rows:
col
----
A
B
and in table other_table there would be:
col
---
B
E
The Cartesian product (FROM table t, other_table o) of the two tables would then be:
table.col other_table.col
---------------------------
A B
A E
B B
B E
Applying the WHERE t.col <> o.col clause then filters the above, giving:
table.col other_table.col
---------------------------
A B
A E
B E
Since only table.col is selected for output, the final result contains the value A twice:
table.col
---------
A
A
B
I hope it could help you some way.
# UPDATE
As for the query:
SELECT col
FROM table
WHERE col NOT IN (SELECT col FROM other_table)
Since there is no join, only the rows of table are considered when building the result.
As far as I understand, the condition WHERE col NOT IN (SELECT col FROM other_table) is evaluated against each row of table.
For each row, table.col is checked against the set of values returned by the subquery on other_table. If the value is not in that set, the condition is TRUE and the row is included in the result; otherwise it is excluded.
To sum up, I think the first query duplicates the table.col values only because of the preparation phase, in which the tables are joined together, whereas the second query takes only records from table into the result set and uses other_table purely for filtering. That follows from the query structure, if I'm right of course.
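The walkthrough above can be reproduced with a short script. This is a sketch assuming SQLite via Python's sqlite3 module; the first table is named t here because table is a reserved word.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE t (col TEXT);            -- stands in for "table"
    CREATE TABLE other_table (col TEXT);
    INSERT INTO t VALUES ('A'), ('B');
    INSERT INTO other_table VALUES ('B'), ('E');
""")

# Cartesian product of both tables, then drop the pairs with equal values.
cross_filtered = conn.execute("""
    SELECT t.col
    FROM t, other_table AS o
    WHERE t.col <> o.col
    ORDER BY t.col
""").fetchall()

# Rows of t only, filtered by membership in the subquery result.
not_in = conn.execute("""
    SELECT col
    FROM t
    WHERE col NOT IN (SELECT col FROM other_table)
""").fetchall()

print(cross_filtered)  # one row per surviving (t, o) pair
print(not_in)          # at most one row per row of t
```

The first query returns one row per surviving pair, which is where the duplicates come from; the second can never return more rows than t contains.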

Join tables with unknown number of rows without repeating column that is joined by

Here's my quandary:
I need to join all columns of two tables based on a primary key, but I don't want to repeat the primary key in the results.
The second table has the primary key and then unknown number and names of columns.
So essentially I want
SELECT * (except for b.PK) FROM
TableA a
JOIN TableB b ON a.PK = b.PK
The obvious solution would be to select all columns explicitly from table a except for a.PK, but let's say that I don't know the number or names of columns in table a either (except I know it has the PK).
So to sum:
How do I join two tables by their PKs, where I don't know the rest of their columns explicitly, and without repeating the PK in the results?
EDIT: (Using T-SQL with SQL Server)
Something like SELECT * except column foo FROM ... doesn't exist. But you can use a natural join, which eliminates redundant columns. You haven't mentioned your RDBMS, so here's an explanation from the MySQL manual. A natural join is standard SQL, though not every RDBMS implements it; SQL Server's T-SQL, for one, supports neither NATURAL JOIN nor USING.
The columns of a NATURAL join or a USING join may be different from
previously. Specifically, redundant output columns no longer appear,
and the order of columns for SELECT * expansion may be different from
before.
Consider this set of statements:
CREATE TABLE t1 (i INT, j INT);
CREATE TABLE t2 (k INT, j INT);
INSERT INTO t1 VALUES(1,1);
INSERT INTO t2 VALUES(1,1);
SELECT * FROM t1 NATURAL JOIN t2;
SELECT * FROM t1 JOIN t2 USING (j);
Previously, the statements produced this output:
+------+------+------+------+
| i | j | k | j |
+------+------+------+------+
| 1 | 1 | 1 | 1 |
+------+------+------+------+
+------+------+------+------+
| i | j | k | j |
+------+------+------+------+
| 1 | 1 | 1 | 1 |
+------+------+------+------+
In the first SELECT statement, column j appears in both tables and
thus becomes a join column, so, according to standard SQL, it should
appear only once in the output, not twice. Similarly, in the second
SELECT statement, column j is named in the USING clause and should
appear only once in the output, not twice. But in both cases, the
redundant column is not eliminated. Also, the order of the columns is
not correct according to standard SQL.
Now the statements produce this output:
+------+------+------+
| j | i | k |
+------+------+------+
| 1 | 1 | 1 |
+------+------+------+
+------+------+------+
| j | i | k |
+------+------+------+
| 1 | 1 | 1 |
+------+------+------+
The redundant column is eliminated and the column order is correct
according to standard SQL
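The same behaviour can be checked in any engine that implements NATURAL JOIN and USING. Below is a minimal sketch assuming SQLite via Python's sqlite3 module (not MySQL or SQL Server); the helper column_names is made up for the example and just reads the column names from the cursor description.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE t1 (i INT, j INT);
    CREATE TABLE t2 (k INT, j INT);
    INSERT INTO t1 VALUES (1, 1);
    INSERT INTO t2 VALUES (1, 1);
""")

def column_names(sql):
    """Return the output column names of a SELECT (hypothetical helper)."""
    cur = conn.execute(sql)
    return [d[0] for d in cur.description]

natural_cols = column_names("SELECT * FROM t1 NATURAL JOIN t2")
using_cols = column_names("SELECT * FROM t1 JOIN t2 USING (j)")
plain_cols = column_names("SELECT * FROM t1 JOIN t2 ON t1.j = t2.j")

print(natural_cols, using_cols, plain_cols)
```

With NATURAL JOIN or USING the join column j appears once in SELECT *, while the plain ON join returns it from both tables.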

SQL Join question

I'm a SQL newbie, so please forgive the ignorance :)
Basically, I'm wondering what would be a good way of 'joining' 2 tables A and B wherein I just want to check if certain cases in A are in B. The thing is, not all entries in A need to have matches in B, just a few. For example, Table A
merchant_id | tablet_id | address
33232 | 1 | 83 abs
94732 | 2 | 92 bcu
47373 | 3 | dkid
48238 | 3 | kdid
is joined with other tables in a query. In this same query, I want to add a condition that ignores the rows of A whose tablet_id matches one in B. Table B:
merchant | tablet_id | incentive?
33232 | 1 | Yes
67382 | 2 | No
Like I said, A and B only have a few cases in common. I tried a query with a JOIN between A & B and got nothing returned since a join might not be possible if there are no intersecting values between A & B. I'm just looking to implement an IF condition kind of thing.
Hopefully I was articulate. Any help would be appreciated!
SELECT * FROM `A` WHERE `tablet_id` NOT IN (SELECT `tablet_id` FROM `B`)
SELECT
*
FROM
A LEFT JOIN B
ON A.tablet_id = B.tablet_id
WHERE
B.tablet_id is null
You may be looking for the OUTER JOIN.
SELECT *
FROM TableA
LEFT OUTER JOIN TableB ON TableA.tablet_id = TableB.tablet_id
This will return all rows from table A, and join rows from Table B where they meet the criteria in the ON clause. If no row exists in Table B for a row in Table A, the Table B column values in the results will be NULL.
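To see how the first two answers behave on the sample data, here is a small sketch assuming SQLite via Python's sqlite3 module; the column incentive stands in for the question's "incentive?" column. Both forms keep only the rows of A whose tablet_id has no match in B.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE A (merchant_id INTEGER, tablet_id INTEGER, address TEXT);
    CREATE TABLE B (merchant INTEGER, tablet_id INTEGER, incentive TEXT);
    INSERT INTO A VALUES
        (33232, 1, '83 abs'), (94732, 2, '92 bcu'),
        (47373, 3, 'dkid'), (48238, 3, 'kdid');
    INSERT INTO B VALUES (33232, 1, 'Yes'), (67382, 2, 'No');
""")

# Variant 1: filter A with a NOT IN subquery.
via_not_in = conn.execute("""
    SELECT merchant_id FROM A
    WHERE tablet_id NOT IN (SELECT tablet_id FROM B)
    ORDER BY merchant_id
""").fetchall()

# Variant 2: LEFT JOIN, then keep the rows where no B row was found.
via_left_join = conn.execute("""
    SELECT A.merchant_id
    FROM A LEFT JOIN B ON A.tablet_id = B.tablet_id
    WHERE B.tablet_id IS NULL
    ORDER BY A.merchant_id
""").fetchall()

print(via_not_in, via_left_join)
```

Without the WHERE B.tablet_id IS NULL filter, the LEFT JOIN returns every row of A, with NULLs in the B columns where there is no match.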

SQL inner join two tables with the same column names

I have two tables with a variable number of columns. (I don't know how many columns there are or what their names will be.) For example, Table A and Table B.
TableA:
ID | B_ID | {variable}
TableB
ID | {variable}
Query:
SELECT TableA.*, TableB.* FROM TableA INNER JOIN TableB ON TableA.B_ID= TableB.id;
When TableA and TableB both have a column with the same name, I can't distinguish between the two columns. For example, if both tables have the column "Name", this query would result in:
ID | ID | B_ID | NAME | NAME |
1 | 35 | 35 | bob | jim |
What I am looking for is a way to differentiate between the two tables, preferably with a prefix for the column names, such as:
TableA_ID | TableB_ID | TableA_B_ID | TableA_NAME | TableB_NAME |
1 | 35 | 35 | bob | jim |
I know of the "AS" keyword, but the problem is that I don't know what the column names are going to be beforehand. (I don't know if TableA or TableB is going to have the column Name.)
So my question is:
How do you differentiate the columns between the two tables with an INNER JOIN when the tables may have the same column names?
I am using SQLite3.
Your result set (given your query) should have all of the TableA columns followed by all of the TableB columns, so when you get to the second ID column, you know you're into the TableB data.
That said, it would seem odd to me that you're querying all the data out of two tables about which you know functionally nothing...
This is admittedly a hack solution, but this:
SELECT TableA.*, '#' AS "#", TableB.*
FROM TableA INNER JOIN TableB ON TableA.B_ID = TableB.id;
would produce a list of results divided into two blocks, left and right of the # column.
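If the query is built from application code, another option is to read the column names from the schema and generate the aliases. The following is only a sketch, assuming Python's sqlite3 module and table names that are known and trusted (they are interpolated into the SQL text); the helper prefixed_columns is made up for the example.

```python
import sqlite3

def prefixed_columns(conn, table):
    """Build '"T"."c" AS "T_c", ...' for every column of a table.

    Hypothetical helper: the table name is interpolated into SQL,
    so it must come from a trusted source.
    """
    # PRAGMA table_info returns one row per column; index 1 is the name.
    cols = [row[1] for row in conn.execute(f'PRAGMA table_info("{table}")')]
    return ", ".join(f'"{table}"."{c}" AS "{table}_{c}"' for c in cols)

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE TableA (ID INTEGER, B_ID INTEGER, NAME TEXT);
    CREATE TABLE TableB (ID INTEGER, NAME TEXT);
    INSERT INTO TableA VALUES (1, 35, 'bob');
    INSERT INTO TableB VALUES (35, 'jim');
""")

sql = (
    f"SELECT {prefixed_columns(conn, 'TableA')}, "
    f"{prefixed_columns(conn, 'TableB')} "
    "FROM TableA INNER JOIN TableB ON TableA.B_ID = TableB.ID"
)
cur = conn.execute(sql)
names = [d[0] for d in cur.description]
row = cur.fetchone()

print(names)
print(row)
```

Because the aliases are generated from the schema at run time, this keeps working when columns are added to or removed from either table.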