Compare 2 Tables and write RowID from Table A to Table B - sql

I have Table A with RowID, Date, Vendor, Cost and Confirmed.
I have Table B with RowID, Date, Vendor, Cost and Confirmed.
Table A lists our purchases.
Table B lists the statement data from the credit card.
I would like to compare Date, Vendor and Cost in Table A with the same columns in Table B. If there is a match with those three columns, then I would like to take the RowID value from Table A and write it to the matching row in Table B under the Confirmation column.
I am very new to SQL and I am not even sure this is a reasonable expectation.
What do you think?
Is this enough detail to provide your opinion?
Thank you for any help you can give.
Currently I am using an Outer Right Join to get all the rows that do NOT have a match. What I really need is the opposite.

It might help to know what database engine you are using...
My answers are going to relate to MS SQL Server, but much SQL Syntax is the same...
To answer your first question, I would write something like:
Update TableB Set Confirmation = TableA.RowID From TableA
Where TableA.Vendor = TableB.Vendor
And TableA.Cost = TableB.Cost
And TableA.Date = TableB.Date
I would use aliases, but I left them out to hopefully make it easier to understand.
To answer your second question, you can specify an INNER JOIN, which is what you are looking for: unlike an OUTER JOIN, it returns only the rows that match and excludes the rest.

You can do this with this query
UPDATE
B
SET
Confirmation = A.RowID
FROM
TableA A
INNER JOIN TableB B
ON B.Vendor = A.Vendor
AND B.Cost = A.Cost
AND B.Date = A.Date
Basically, we do an inner join to keep the intersection of the two tables (the records that match), and update the matching records in table B with the RowID from table A.
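The UPDATE ... FROM syntax above is specific to some engines. As a rough sketch of the same idea in a more portable form, the following runs against an in-memory SQLite database from Python and uses a correlated subquery instead. The sample rows and the `Confirmation` column type are assumptions made up for illustration, not the asker's real data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE TableA (RowID INTEGER PRIMARY KEY, Date TEXT, Vendor TEXT, Cost REAL);
    CREATE TABLE TableB (RowID INTEGER PRIMARY KEY, Date TEXT, Vendor TEXT, Cost REAL,
                         Confirmation INTEGER);
    -- Hypothetical sample data: only the Acme purchase appears on the statement.
    INSERT INTO TableA VALUES (101, '2024-01-05', 'Acme', 19.99),
                              (102, '2024-01-06', 'Globex', 5.00);
    INSERT INTO TableB VALUES (1, '2024-01-05', 'Acme', 19.99, NULL),
                              (2, '2024-01-07', 'Initech', 42.00, NULL);
""")

# Correlated subquery fetches the matching RowID from TableA;
# the EXISTS guard leaves unmatched TableB rows untouched.
conn.execute("""
    UPDATE TableB
    SET Confirmation = (SELECT a.RowID FROM TableA a
                        WHERE a.Vendor = TableB.Vendor
                          AND a.Cost = TableB.Cost
                          AND a.Date = TableB.Date)
    WHERE EXISTS (SELECT 1 FROM TableA a
                  WHERE a.Vendor = TableB.Vendor
                    AND a.Cost = TableB.Cost
                    AND a.Date = TableB.Date)
""")

confirmed = conn.execute(
    "SELECT RowID, Confirmation FROM TableB ORDER BY RowID"
).fetchall()
print(confirmed)
```

If two purchases in Table A share the same date, vendor and cost, the subquery matches more than one row, so it is worth checking for such duplicates before relying on the result.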

Related

SQL Query returns more

I'm having a bit of a problem with a SQL Query that returns too many results. I'm fairly new to SQL so please bear with me.
Please see the following:
Table Structures
The Query that I use looks like:
SELECT TABLE_B.*
FROM
TABLE_A
JOIN
TABLE_B
ON
TABLE_A.COMMON_ID=TABLE_B.COMMON_ID
AND TABLE_A.SEQ_3C=TABLE_B.SEQ_3C
JOIN
TABLE_C
ON
TABLE_A.COMMON_ID=TABLE_C.EMPLID
WHERE
TABLE_B.ITEM_STATUS<>'C'
and TABLE_A.CHECKLIST_STATUS='I'
and TABLE_A.ADMIN_FUNCTION='ADMA'
and TABLE_A.CHECKLIST_CD='APPL'
and TABLE_A.COMMON_ID = '123456789'
and TABLE_C.ADMIT_TERM='2171'
and TABLE_C.INSTITUTION='SOMEWHERE'
I just want the results from Table_B and not what it's giving me.
Please explain this to me as I have spent 3 days on it non-stop.
What am I missing?
You want data from TABLE_B? Then select from it only and have the conditions on the other tables in your where clause.
The inner joins on the other tables serve as existence tests, I assume? Don't do that. You'd only multiply your records, just as you are doing now, only to have to dismiss duplicates later. That can cause bad performance on large tables and errors in more complicated queries. Use EXISTS or IN instead.
select *
from table_b
where item_status <> 'C'
and (common_id, seq_3c) in
(
select common_id, seq_3c
from table_a
where checklist_status = 'I'
and admin_function = 'ADMA'
and checklist_cd = 'APPL'
)
and common_id in
(
select EMPLID
from table_c
where admit_term = '2171'
and institution = 'SOMEWHERE'
);
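The answer above mentions EXISTS but only shows IN. Here is a minimal sketch of the EXISTS form next to the original join, run on an in-memory SQLite database from Python. The tables are cut down to a few columns and the rows are invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table_a (common_id TEXT, seq_3c INTEGER, checklist_status TEXT);
    CREATE TABLE table_b (common_id TEXT, seq_3c INTEGER, item_status TEXT);
    CREATE TABLE table_c (emplid TEXT, admit_term TEXT);
    INSERT INTO table_a VALUES ('123456789', 1, 'I');
    INSERT INTO table_b VALUES ('123456789', 1, 'I'), ('123456789', 1, 'C');
    -- Three matching table_c rows: a plain join repeats each table_b row three times.
    INSERT INTO table_c VALUES ('123456789', '2171'), ('123456789', '2171'),
                               ('123456789', '2171');
""")

joined = conn.execute("""
    SELECT b.* FROM table_a a
    JOIN table_b b ON a.common_id = b.common_id AND a.seq_3c = b.seq_3c
    JOIN table_c c ON a.common_id = c.emplid
    WHERE b.item_status <> 'C' AND a.checklist_status = 'I' AND c.admit_term = '2171'
""").fetchall()

# EXISTS only tests for a match, so each table_b row appears at most once.
with_exists = conn.execute("""
    SELECT b.* FROM table_b b
    WHERE b.item_status <> 'C'
      AND EXISTS (SELECT 1 FROM table_a a
                  WHERE a.common_id = b.common_id AND a.seq_3c = b.seq_3c
                    AND a.checklist_status = 'I')
      AND EXISTS (SELECT 1 FROM table_c c
                  WHERE c.emplid = b.common_id AND c.admit_term = '2171')
""").fetchall()

print(len(joined), len(with_exists))
```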
SELECT DISTINCT TABLE_B.*
FROM
TABLE_A
JOIN
TABLE_B
ON
TABLE_A.COMMON_ID=TABLE_B.COMMON_ID
AND TABLE_A.SEQ_3C=TABLE_B.SEQ_3C
JOIN
TABLE_C
ON
TABLE_A.COMMON_ID=TABLE_C.EMPLID
WHERE
TABLE_B.ITEM_STATUS<>'C'
and TABLE_A.CHECKLIST_STATUS='I'
and TABLE_A.ADMIN_FUNCTION='ADMA'
and TABLE_A.CHECKLIST_CD='APPL'
and TABLE_A.COMMON_ID = '123456789'
and TABLE_C.ADMIT_TERM='2171'
and TABLE_C.INSTITUTION='SOMEWHERE'
This should be easy to understand without looking at all your tables and output.
Suppose you join two tables, A and B, on a column id. You only want the columns from table B, and in table B the id column is a unique identifier.
Even so, if in table A an id (the same id) appears five times, the join will have five rows for that id. Then you just select the columns from table B, so it will look like you got the same row five different times.
Perhaps you don't really need a join? What is your underlying problem you are trying to solve?
It's hard to answer this question without more information about why you're executing these joins. I can explain why you're getting the results you're getting, and hopefully that will allow you to solve the problem yourself.
You start, in your FROM clause, with table A. You join this table with table B on matching COMMON_ID, which, based on the tables you provide, returns three matches for the one record you have in table A. This increases your result set size to three records. Next, you join these three records with table C, on matching ID. Because all IDs are, in fact, identical, this returns nine matches for every record in your current result set: you now have 9 x 3 = 27 records in your result set.
Finally, the WHERE clause comes into effect. This clause excludes 6 out of 9 records in table C, so you have 3 of those records left. Your final result set is therefore 1 (table A) x 3 (table B) x 3 (table C) = 9 records.
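The arithmetic above can be reproduced with a small sketch in Python and SQLite. The row counts (1, 3 and 9) follow the explanation, but the column values, including the second admit term '2161', are assumptions chosen only to make the counts work.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table_a (common_id TEXT);
    CREATE TABLE table_b (common_id TEXT, seq INTEGER);
    CREATE TABLE table_c (emplid TEXT, admit_term TEXT);
""")
conn.execute("INSERT INTO table_a VALUES ('123456789')")
conn.executemany("INSERT INTO table_b VALUES ('123456789', ?)",
                 [(i,) for i in range(3)])
# Nine table_c rows share the ID; only three have the wanted admit term.
conn.executemany("INSERT INTO table_c VALUES ('123456789', ?)",
                 [("2171",)] * 3 + [("2161",)] * 6)

join_sql = """
    SELECT COUNT(*) FROM table_a a
    JOIN table_b b ON a.common_id = b.common_id
    JOIN table_c c ON a.common_id = c.emplid
"""
before_where = conn.execute(join_sql).fetchone()[0]
after_where = conn.execute(join_sql + " WHERE c.admit_term = '2171'").fetchone()[0]

print(before_where, after_where)  # 1 x 3 x 9, then 1 x 3 x 3
```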

Having issues using a Right Join in MS Access 2010 with two tables that have the same fields

I have two tables, Table A and Table B. Each table have 4 fields, the name of the fields are the same for both. Both tables are extracted from other tables, and each record acts as a primary key.
I want to write a query in MS Access 2010 that gets the data unique to Table B and not shared with Table A. I am using the following image as a reference, and it looks like I need to do a Right Join.
Hello. There is something not right with my SQL, I've tested it and I am getting the incorrect result. Below is the closest I've gotten:
SELECT DISTINCT TableB.*
FROM TableB RIGHT JOIN TableA ON (TableB.Field1 = TableA.Field1) AND (TableB.Field2 = TableA.Field2) AND (TableB.Field3 = TableA.Field3) AND (TableB.Field4 = TableA.Field4)
WHERE (((TableA.Field1) Is Null));
I think it would be clearer for you to use not exists:
select tableb.*
from tableb
where not exists (select 1
from tablea
where (TableB.Field1 = TableA.Field1) AND (TableB.Field2 = TableA.Field2) AND (TableB.Field3 = TableA.Field3) AND (TableB.Field4 = TableA.Field4)
);
Your use of RIGHT JOIN is incorrect. As phrased, you want a LEFT JOIN. That is, you want to keep all rows in the first table (the "left" table in the JOIN) regardless of whether or not a match exists in the second table. However, the NOT EXISTS does the same thing and the logic is a bit clearer.
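As a quick check that the two formulations agree, here is a sketch that runs both the NOT EXISTS query and the LEFT JOIN ... IS NULL query on an in-memory SQLite database from Python. It is not MS Access, so treat it as an illustration of the logic only; the sample rows are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE TableA (Field1 TEXT, Field2 TEXT, Field3 TEXT, Field4 TEXT);
    CREATE TABLE TableB (Field1 TEXT, Field2 TEXT, Field3 TEXT, Field4 TEXT);
    INSERT INTO TableA VALUES ('a', 'b', 'c', 'd'), ('e', 'f', 'g', 'h');
    INSERT INTO TableB VALUES ('a', 'b', 'c', 'd'), ('x', 'y', 'z', 'w');
""")

match = """TableB.Field1 = TableA.Field1 AND TableB.Field2 = TableA.Field2
       AND TableB.Field3 = TableA.Field3 AND TableB.Field4 = TableA.Field4"""

# Rows of TableB with no counterpart in TableA, written two ways.
not_exists = conn.execute(f"""
    SELECT TableB.* FROM TableB
    WHERE NOT EXISTS (SELECT 1 FROM TableA WHERE {match})
""").fetchall()

left_join = conn.execute(f"""
    SELECT TableB.* FROM TableB
    LEFT JOIN TableA ON {match}
    WHERE TableA.Field1 IS NULL
""").fetchall()

print(not_exists, left_join)
```

The IS NULL test on TableA.Field1 only works as a "no match" marker if Field1 is never NULL in TableA itself.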
You would want a right join if TableA were in your select statement, but as you have
SELECT DISTINCT TableB.*
you probably want a left join instead. My suggestion would be to change your code from a right join to a left join.
TableB acts like table A from the Venn diagrams above.

Getting way more results than expected in SQL left join query

My code is such:
SELECT COUNT(*)
FROM earned_dollars a
LEFT JOIN product_reference b ON a.product_code = b.product_code
WHERE a.activity_year = '2015'
I'm trying to match two tables based on their product codes. I would expect the same number of results back from this as total records in table a (with a year of 2015). But for some reason I'm getting close to 3 million.
Table a has about 40,000,000 records and table b has 2000. When I run this statement without the join I get 2,500,000 results, so I would expect this even with the left join, but somehow I'm getting 300,000,000. Any ideas? I even referred to the diagram in this post.
It means either your left join is using only part of a foreign key, which causes row multiplication, or there are simply duplicate rows in the joined table.
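One way to test the duplicate-rows theory is to group the joined table by the join key and look for keys that occur more than once. The sketch below does this on an in-memory SQLite database from Python; the table and column names come from the question, while the rows and the `description` column are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE earned_dollars (product_code TEXT, activity_year TEXT);
    CREATE TABLE product_reference (product_code TEXT, description TEXT);
    INSERT INTO earned_dollars VALUES ('P1', '2015'), ('P1', '2015'),
                                      ('P2', '2015'), ('P3', '2014');
    -- 'P1' is listed twice in the reference table, so every P1 row joins twice.
    INSERT INTO product_reference VALUES ('P1', 'widget'), ('P1', 'widget, old entry'),
                                         ('P2', 'gadget');
""")

base_count = conn.execute(
    "SELECT COUNT(*) FROM earned_dollars a WHERE a.activity_year = '2015'"
).fetchone()[0]

joined_count = conn.execute("""
    SELECT COUNT(*) FROM earned_dollars a
    LEFT JOIN product_reference b ON a.product_code = b.product_code
    WHERE a.activity_year = '2015'
""").fetchone()[0]

# Join keys that are not unique in the reference table.
duplicates = conn.execute("""
    SELECT product_code, COUNT(*) FROM product_reference
    GROUP BY product_code HAVING COUNT(*) > 1
""").fetchall()

print(base_count, joined_count, duplicates)
```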
use COUNT(DISTINCT a.product_code)
What is the question you are trying to answer with the T-SQL?
Instead of select count(*), try select a.product_code, b.product_code. That will show you which records match and which don't.
You should also add a where b.product_code is not null. That should exclude the records that don't match.
b is the parent table and a is the child table? Try a right join instead.
Or use the table's unique identifier, i.e.
SELECT COUNT(a.earned_dollars_id)
Not sure what your data model looks like and how it is structured, but I'm guessing you only care about earned_dollars?
SELECT COUNT(*)
FROM earned_dollars a
WHERE a.activity_year = '2015'
and exists (select 1 from product_reference b where a.product_code = b.product_code)

SQL Performance: Subselects in joins or direct joins?

I have a question about the performance of queries against the tables below.
Table A -- Has only 5 customer IDs (5 rows, 1 column)
Table B -- Is the master base for all customers and their information (1 million rows and 500 columns)
Query 1:-
Select A.*,
B.Age
from A
left join B
on A.Customer_id = B.Customer_id;
Query 2:-
Select a.*,
C.Age
from A
left join
(select Customer_id,age from B) C
on A.Customer_id = C.Customer_id;
The main question of performance here is because of the presence of 500 columns in Table B.
I feel the 2nd Query is better as SQL won't have to create a temporary table during the join containing all columns from table B.
Please let me know if this is wrong.
"I feel the 2nd Query is better as SQL won't have to create a temporary table during the join containing all columns from table B."
You can tell whether Oracle does create a temporary table during the execution or not from the explain plan. You should also consider whether the Oracle kernel developers would not have got round such an obvious performance problem if it existed.
As it happens, there will be no temporary table, and there is nothing wrong with your first query. There is almost never a need to manipulate the query for performance reasons -- write queries that are the best encapsulation of the logic you require.
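The advice to check the plan rather than guess can be tried on any engine. The sketch below uses SQLite from Python rather than Oracle, so the plan output is only an illustration and will look different from an Oracle explain plan. The `city` column stands in for the many extra columns of table B and is an assumption.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE A (customer_id INTEGER PRIMARY KEY);
    CREATE TABLE B (customer_id INTEGER PRIMARY KEY, age INTEGER, city TEXT);
    INSERT INTO A VALUES (1), (2);
    INSERT INTO B VALUES (1, 34, 'Oslo'), (3, 51, 'Lima');
""")

direct_sql = """
    SELECT A.*, B.age FROM A
    LEFT JOIN B ON A.customer_id = B.customer_id
    ORDER BY A.customer_id
"""
subselect_sql = """
    SELECT A.*, C.age FROM A
    LEFT JOIN (SELECT customer_id, age FROM B) C ON A.customer_id = C.customer_id
    ORDER BY A.customer_id
"""
direct = conn.execute(direct_sql).fetchall()
subselect = conn.execute(subselect_sql).fetchall()

# Plan text differs between engines and versions, so it is printed, not interpreted.
for label, sql in (("direct", direct_sql), ("subselect", subselect_sql)):
    print(label, conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall())
```

Both forms return the same rows here, so the comparison that matters is the plan the engine actually chooses for each.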
CREATE INDEX index_name ON table_b (customer_id)
then use
Select a.*,
C.Age
from A
left join (select Customer_id,
age
from B) C
on A.Customer_id = C.Customer_id;
500 columns is rather extensive.
Maybe you can create an index like:
CREATE INDEX index_name
ON table_b (customer_id,
age
);
A scalar subquery in the select list can be faster than a join (whether a direct join or a join to a subselect), though this depends on the engine and the data:
select
a.*,
(select b.age
from b
where b.customer_id = a.customer_id)
from a
Note:
It behaves like an outer join (it returns an empty age field if a customer_id from a doesn't exist in b).
The subquery should return only one row from b per row from a.
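The outer-join-like behaviour in the first note can be seen in a small sketch, again using SQLite from Python with invented rows. Customer 2 has no row in b, so its age comes back as NULL (None in Python).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE a (customer_id INTEGER PRIMARY KEY);
    CREATE TABLE b (customer_id INTEGER PRIMARY KEY, age INTEGER);
    INSERT INTO a VALUES (1), (2);
    INSERT INTO b VALUES (1, 30);
""")

# Scalar subquery in the select list: one lookup in b per row of a.
rows = conn.execute("""
    SELECT a.customer_id,
           (SELECT b.age FROM b WHERE b.customer_id = a.customer_id) AS age
    FROM a
    ORDER BY a.customer_id
""").fetchall()

print(rows)
```

What happens when the subquery matches more than one row varies by engine, which is why the second note matters.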

How to efficiently retrieve data in one to many relationships

I am running into an issue where I have a need to run a Query which should get some rows from a main table, and have an indicator if the key of the main table exists in a subtable (relation one to many).
The query might be something like this:
select a.index, (select count(1) from second_table b where a.index = b.index)
from first_table a;
This way I would get the result I want (0 = no depending records in second_table, else there are), but I'm running a subquery for each record I get from the database. I need to get such an indicator for at least three similar tables, and the main query is already some inner join between at least two tables...
My question is whether there is some really efficient way to handle this. I have thought of keeping a record in a new column of "first_table", but the dbadmin doesn't allow triggers and keeping track of it in code is too risky.
What would be a nice approach to solve this?
The application of this query will be for two things:
Indicate that at least one row in second_table exists for a given row in first_table. It is to indicate it in a list. If no row in the second table exists, I won't turn on this indicator.
To search for all rows in first_table which have at least one row in second_table, or which don't have rows in the second table.
Another option I just found:
select a.index, b.index
from first_table a
left join (select distinct(index) as index from second_table) b on a.index = b.index
This way I will get null for b.index if it doesn't exist (the display can be adapted later; I'm concerned with query performance here).
The final objective of this question is to find a proper design approach for this kind of case. It happens often; a real application could be a POS system that shows all clients and has one icon in the list as an indicator of whether the client has open orders.
Try using EXISTS. I suppose that for such a case it might be better than joining tables. On my Oracle db it's giving a slightly better execution time than the sample query, but this may be db-specific.
SELECT first_table.ID,
CASE WHEN EXISTS (SELECT * FROM second_table WHERE first_table.ID = second_table.ID)
THEN 1 ELSE 0 END
FROM first_table
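A minimal sketch of this CASE WHEN EXISTS indicator, run on an in-memory SQLite database from Python. The `note` column, the `has_children` alias and the sample rows are assumptions added for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE first_table (ID INTEGER PRIMARY KEY);
    CREATE TABLE second_table (ID INTEGER, note TEXT);
    INSERT INTO first_table VALUES (1), (2), (3);
    -- ID 1 has two child rows, ID 2 has none, ID 3 has one.
    INSERT INTO second_table VALUES (1, 'x'), (1, 'y'), (3, 'z');
""")

# One output row per first_table row, however many children exist.
indicators = conn.execute("""
    SELECT first_table.ID,
           CASE WHEN EXISTS (SELECT * FROM second_table
                             WHERE first_table.ID = second_table.ID)
                THEN 1 ELSE 0 END AS has_children
    FROM first_table
    ORDER BY first_table.ID
""").fetchall()

print(indicators)
```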
why not try this one
select a.index,count(b.[table id])
from first_table a
left join second_table b
on a.index = b.index
group by a.index
Two ideas: one that doesn't involve changing your tables and one that does. First the one that uses your existing tables:
SELECT
a.index,
b.index IS NOT NULL,
c.index IS NOT NULL
FROM
a_table a
LEFT JOIN
b_table b ON b.index = a.index
LEFT JOIN
c_table c ON c.index = a.index
GROUP BY
a.index, b.index, c.index
Worth noting that this query (and likely any that resemble it) will be greatly helped if b_table.index and c_table.index are either primary keys or are otherwise indexed.
Now the other idea. If you can, instead of inserting a row into b_table or c_table to indicate something about the corresponding row in a_table, indicate it directly on the a_table row. Add exists_in_b_table and exists_in_c_table columns to a_table. Whenever you insert a row into b_table, set a_table.exists_in_b_table = true for the corresponding row in a_table. Deletes are more work since in order to update the a_table row you have to check if there are any rows in b_table other than the one you just deleted with the same index. If deletes are infrequent, though, this could be acceptable.
Or you can avoid join altogether.
WITH comb AS (
SELECT index
, 'N' as exist_ind
FROM first_table
UNION ALL
SELECT DISTINCT
index
, 'Y' as exist_ind
FROM second_table
)
SELECT index
, MAX(exist_ind) exist_ind
FROM comb
GROUP BY index
The application of this query will be for two things:
Indicate that at least one row in second_table exists for a given row in first_table. It is to indicate it in a list.
To search for all rows in first_table which have at least one row in second_table.
Here you go:
SELECT a.index, 1 as c_check -- 1: at least one row in second_table exists for a given row in first_table
FROM first_table a
WHERE EXISTS
(
SELECT 1
FROM second_table b
WHERE a.index = b.index
);
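The question also asks for the rows that have no match. Here is a sketch of both searches, using EXISTS and NOT EXISTS on an in-memory SQLite database from Python. The key column is named `idx` here because `index` is a reserved word in SQLite; the name and the rows are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE first_table (idx INTEGER PRIMARY KEY);
    CREATE TABLE second_table (idx INTEGER);
    INSERT INTO first_table VALUES (1), (2), (3);
    INSERT INTO second_table VALUES (1), (1), (3);
""")

template = """
    SELECT a.idx FROM first_table a
    WHERE {test} (SELECT 1 FROM second_table b WHERE a.idx = b.idx)
    ORDER BY a.idx
"""
# Rows with at least one child row, and rows with none.
with_rows = conn.execute(template.format(test="EXISTS")).fetchall()
without_rows = conn.execute(template.format(test="NOT EXISTS")).fetchall()

print(with_rows, without_rows)
```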
I am assuming that you can't change the table definitions, e.g. partitioning the columns.
Now, to get a good performance you need to take into account other tables which are getting joined to your main table.
It all depends on data demographics.
If the other joins will collapse the rows by a high factor, you should consider doing a join between your first table and second table. This will allow the optimizer to pick the best join order, i.e., first joining with the other tables and then joining the resulting rows with your second table, which improves performance.
Otherwise, you can take the subquery approach (I'd suggest using EXISTS, maybe Mikhail's solution).
Also, you may consider creating a temporary table, if you need such queries more than once in same session.
I am not an expert in using CASE, but I would recommend the join,
which works even if you are using three tables or more:
SELECT t1.ID,t2.name, t3.date
FROM Table1 t1
LEFT OUTER JOIN Table2 t2 ON t1.ID = t2.ID
LEFT OUTER JOIN Table3 t3 ON t2.ID = t3.ID
--WHERE t1.ID = #ProductID -- this is optional condition, if want specific ID details..
This will help you fetch the data from normalized (BCNF) tables, as they always split data into separate tables according to its nature.
I hope this will do...