I have two tables that I would like to query: tableA has ~53,000 rows, while tableB has ~530M rows.
SELECT
b.some_field AS field,
a.*
FROM tableA a -- 53_462
LEFT JOIN tableB b -- 527_795_032
ON a.user_id = b.user_id
AND a.numeric_field >= b.numeric_field
AND a.numeric_field <= b.other_numeric_field
;
This kills the query engine because the right-hand side is much bigger than the left, so I think that for every row on the left it has to search the right.
In such a case (the right-hand side being much bigger than the left), what is the best thing to do?
I am thinking about two possibilities:
1. switching the sides and using a right join
2. creating a potentially much smaller table by selecting only the rows of the right-hand side that have a match on the left, and joining against that table
I have ended up creating a temp table that has only the matching rows.
CREATE TABLE b_temp
....
IN (
SELECT
DISTINCT user_id
FROM
a
)
This lets me filter table b down to a reasonable size before the join.
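The pre-filtering idea can be sketched end to end as follows. This is only an illustration under assumptions: it uses Python's built-in sqlite3 as a stand-in engine, the toy rows are invented, and the `CREATE TEMP TABLE ... AS SELECT` form is one possible way to build the filtered table, not necessarily the statement elided above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE a (user_id INTEGER, numeric_field INTEGER);
    CREATE TABLE b (user_id INTEGER, numeric_field INTEGER,
                    other_numeric_field INTEGER, some_field TEXT);
    -- Invented sample rows, purely for illustration.
    INSERT INTO a VALUES (1, 5), (2, 50);
    INSERT INTO b VALUES (1, 0, 10, 'hit'),
                         (2, 0, 10, 'miss'),
                         (3, 0, 10, 'other');

    -- Keep only the rows of b whose user_id occurs in a.
    CREATE TEMP TABLE b_temp AS
    SELECT * FROM b
    WHERE user_id IN (SELECT DISTINCT user_id FROM a);
""")

# Same range join as in the question, but against the reduced table.
rows = conn.execute("""
    SELECT b.some_field AS field, a.user_id, a.numeric_field
    FROM a
    LEFT JOIN b_temp b
      ON a.user_id = b.user_id
     AND a.numeric_field >= b.numeric_field
     AND a.numeric_field <= b.other_numeric_field
    ORDER BY a.user_id
""").fetchall()

b_temp_size = conn.execute("SELECT COUNT(*) FROM b_temp").fetchone()[0]
print(b_temp_size, rows)
```

The user that exists only in b is dropped before the join, while every row of a is still returned, with NULL for `field` when no range matches.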
Related
I have two tables (A & B). My objective is to have some columns from A left joined with a few columns of B (both tables have a LOT of columns).
Is it faster to:
A) Select A -> left join -> subselect B:
(selecting only the desired columns BEFORE the join)
SELECT * FROM (
SELECT A.col_1,A.col_2,A.col3,A.col_b FROM A) A_temp
LEFT JOIN (
SELECT B.col_1,B.col_2,B.col_a FROM B) B_temp
ON A_temp.col_b = B_temp.col_a
B) Select A -> left join -> B:
(selecting only the desired columns AFTER the join)
SELECT A.col_1,A.col_2,A.col3,B.col_1,B.col_2 FROM A
LEFT JOIN B
ON A.col_b = B.col_a
My gut tells me that even though the second option is much more readable, it might be worse, since it first joins everything together and moves a lot of data around. My reasoning is:
If the left join returns many rows, the simple approach (option B) might have to carry all of the extra, unnecessary columns along.
Am I going the right way about optimizing this SQL query?
Unless your SQL software is old and moldy, its query planner will handle your two example queries the same way.
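One way to check this on your own system is to run both forms and compare the results and the plans. The sketch below does so with Python's built-in sqlite3, which is only a stand-in for whatever engine you use; the sample rows and the `unused_a` / `unused_b` padding columns are invented for illustration. It asserts nothing about the plans, it only prints them for inspection.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE A (col_1, col_2, col3, col_b, unused_a);
    CREATE TABLE B (col_1, col_2, col_a, unused_b);
    INSERT INTO A VALUES (1, 'x', 'p', 10, 'pad'), (2, 'y', 'q', 20, 'pad');
    INSERT INTO B VALUES (100, 'bx', 10, 'pad'), (101, 'by', 10, 'pad');
""")

# Option A: pick the columns in subselects, then join.
subselect_first = """
    SELECT A_temp.col_1, A_temp.col_2, A_temp.col3, B_temp.col_1, B_temp.col_2
    FROM (SELECT col_1, col_2, col3, col_b FROM A) A_temp
    LEFT JOIN (SELECT col_1, col_2, col_a FROM B) B_temp
      ON A_temp.col_b = B_temp.col_a
"""

# Option B: join the base tables, pick the columns afterwards.
join_first = """
    SELECT A.col_1, A.col_2, A.col3, B.col_1, B.col_2
    FROM A
    LEFT JOIN B ON A.col_b = B.col_a
"""

# Sort by repr so rows containing NULL (None) can be ordered safely.
rows_a = sorted(conn.execute(subselect_first).fetchall(), key=repr)
rows_b = sorted(conn.execute(join_first).fetchall(), key=repr)
print(rows_a == rows_b)

# Print the plans; whether they match depends on the engine and version.
for label, sql in (("A", subselect_first), ("B", join_first)):
    print(label, conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall())
```

On a real database, the equivalent step is to run your engine's own EXPLAIN on both queries over the real tables.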
I am incredibly new to SQL and am trying to create a view for a pizza store database. The ordered sides table and the side names table have to be separate, but I need a view that combines them.
This is the code I have entered:
CREATE VIEW ordered_sides_view
AS
SELECT
ordered_side_id, side.side_id, side_name, number_ordered,
SUM(number_ordered * price) AS 'total_cost'
FROM
ordered_side
FULL JOIN
side ON ordered_side.side_id = side.side_id
GROUP BY
ordered_side_id, side.side_id, side_name, number_ordered;
The problem is that this is the resulting table.
Screenshot of view table:
How do I get the names to match the ordered sides?
You seem to misunderstand what FULL JOIN and INNER JOIN do.
FULL JOIN returns every row from both tables: rows that satisfy the ON clause are paired up, and rows without a match on either side are still returned, padded with NULLs.
INNER JOIN returns only the row pairs that match on the ON clause.
A one-sided OUTER JOIN returns every matching row pair PLUS the unmatched rows from the side the join is named after (LEFT OUTER JOIN vs RIGHT OUTER JOIN).
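The three definitions can be seen side by side on a tiny example. This is a sketch with invented rows, run through Python's built-in sqlite3; because FULL JOIN is only available in newer SQLite releases, the full join is emulated here as a left join plus the unmatched rows of the other table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE side (side_id INTEGER, side_name TEXT);
    CREATE TABLE ordered_side (ordered_side_id INTEGER, side_id INTEGER);
    -- Invented rows: order 11 points at a side_id with no name row,
    -- and 'Wings' has never been ordered.
    INSERT INTO side VALUES (1, 'Garlic bread'), (2, 'Wings');
    INSERT INTO ordered_side VALUES (10, 1), (11, 3);
""")

def run(join_type):
    """Join ordered_side to side with the given join type."""
    sql = f"""
        SELECT o.ordered_side_id, s.side_name
        FROM ordered_side o
        {join_type} JOIN side s ON o.side_id = s.side_id
        ORDER BY o.ordered_side_id
    """
    return conn.execute(sql).fetchall()

inner_rows = run("INNER")       # matching pairs only
left_rows = run("LEFT OUTER")   # matches plus unmatched ordered_side rows

# Emulated FULL JOIN: the left join plus side rows that nothing ordered.
unmatched_sides = conn.execute("""
    SELECT NULL, s.side_name
    FROM side s
    WHERE NOT EXISTS (SELECT 1 FROM ordered_side o WHERE o.side_id = s.side_id)
""").fetchall()
full_rows = left_rows + unmatched_sides

print(inner_rows)
print(left_rows)
print(full_rows)
```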
In your picture, you can clearly see that there are no rows that match from the tables ordered_side and side...
That is why switching to an INNER JOIN returns zero rows...there are no matches on the COLUMNS YOU CHOSE TO USE.
Why does your SELECT list have this:
SELECT ordered_side_id, side.side_id, side_name, number_ordered,
while your ON clause has this:
side ON ordered_side.side_id = side.side_id
ordered_side_id != ordered_side.side_id
Investigate your columns and fix your JOIN clause so that it compares the correct columns.
P.S. I like how you structure your queries. Very nice, and what an expert does! It makes reading MUCH, MUCH easier. :)
One suggestion I might add is to put each column in the SELECT statement on its own row:
SELECT ordered_side_id
, side.side_id
, side_name
, number_ordered
, SUM(number_ordered * price) AS Total_Cost --or written [Total_Cost]/'Total_Cost'
FROM ordered_side
FULL JOIN side ON ordered_side.ordered_side_id = side.side_id
GROUP BY ordered_side_id
, side.side_id
, side_name
, number_ordered;
I joined a new company that uses SAS Enterprise Guide.
I have 2 tables: table A has 100 rows, and table B has over 30M rows (50-60 columns).
I tried to do a right join from A (100) to B (30M); it ran for over 2 hours and no result came back. Will it help if I do a left join instead? I used the GUI and created the following query.
30M Record <- 100 Record ?
or
100 Record -> 30M Record ?
PROC SQL;
CREATE TABLE WORK.QUERY_FOR_CASE_NUMBER AS
SELECT t2.EMPGRPCOM,
t2.SEQINVNUM,
t2.SBSID,
t2.SBSLASTNAME,
t2.SBSFIRSTNAME,
t2.PMTDUEDATE,
t2.PREMAMT,
t2.ITEMDESC,
t2.EFFDATE,
t2.PAYAMT,
t2.MCAIDRATECD,
t2.REBILLIND,
t2.BILLTYPE
FROM WORK.'CASE NUMBER'n t1
LEFT JOIN DW.BILLING t2 ON (t1.CaseNumber = t2.SBSID)
WHERE t2.LOB = 'MD' AND t2.PMTDUEDATE BETWEEN '1Jan2015:0:0:0'dt AND '31Dec2017:0:0:0'dt AND t2.SITEID = '0001';
QUIT;
Left join and right join, all other things aside, are equivalent, as long as you write them so that they mean the same thing, i.e.,
select a.*
from a
left join
b
on a.id=b.id
;
vs
select a.*
from b
right join
a
on b.id=a.id
;
Same exact query, no difference, same time used. SQL is a declarative language, meaning the SQL engine looks at what you send it and figures out the best way to do it, so it sees both queries and knows to do the same thing in both cases.
You can read about this in all sorts of articles, this one is a good starting point, or if that link ages just search for "right join vs left join".
Now, what you might want to consider is writing this in a different way, namely not using SQL; SQL should be good at this kind of query, but sometimes it isn't for some reason. I would write it as a hash table search: the smaller case_number dataset is loaded into memory, then a data step iterates over the larger table and checks whether each row's key is found in the smaller dataset. If so, great, return it.
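The hash lookup idea is language independent. The sketch below shows it in Python rather than in a SAS data step, so it is an analogue of the approach and not SAS code; the function name `hash_lookup` and the sample rows are invented for illustration.

```python
from typing import Dict, Iterable, Iterator


def hash_lookup(
    small_keys: Iterable[str],
    large_rows: Iterable[Dict[str, object]],
    key_field: str,
) -> Iterator[Dict[str, object]]:
    """Yield the rows of the large table whose key occurs in the small table."""
    wanted = set(small_keys)        # the small table fits in memory
    for row in large_rows:          # the large table is streamed exactly once
        if row[key_field] in wanted:
            yield row


# Hypothetical data standing in for the case-number list and the billing table.
case_numbers = ["S001", "S003"]
billing = [
    {"SBSID": "S001", "PAYAMT": 10.0},
    {"SBSID": "S002", "PAYAMT": 20.0},
    {"SBSID": "S003", "PAYAMT": 30.0},
    {"SBSID": "S001", "PAYAMT": 40.0},
]

matches = list(hash_lookup(case_numbers, billing, "SBSID"))
matched_ids = [row["SBSID"] for row in matches]
print(matched_ids)
```

The large table is read once and each membership test is a constant-time set lookup, which is why this tends to work well when one side is tiny.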
I'd also think about whether left/right join is what you want, vs. inner join. Seems to me that if you're returning solely t2 values, right/left join isn't correct (when t1 is the "primary"): you'll just get empty rows for the non-matches. Either return a t1 variable, or use inner join.
I have two tables, Table A and Table B. Each table has 4 fields, and the field names are the same in both. Both tables are extracted from other tables, and each record acts as a primary key.
I want to write a query in MS Access 2010 that gets the data unique to Table B and not shared with Table A. I am using the following image as a reference, and it looks like I need to do a Right Join.
Hello. There is something not right with my SQL, I've tested it and I am getting the incorrect result. Below is the closest I've gotten:
SELECT DISTINCT TableB.*
FROM TableB RIGHT JOIN TableA ON (TableB.Field1 = TableA.Field1) AND (TableB.Field2 = TableA.Field2) AND (TableB.Field3 = TableA.Field3) AND (TableB.Field4 = TableA.Field4)
WHERE (((TableA.Field1) Is Null));
I think it would be clearer for you to use not exists:
select tableb.*
from tableb
where not exists (select 1
from tablea
where (TableB.Field1 = TableA.Field1) AND (TableB.Field2 = TableA.Field2) AND (TableB.Field3 = TableA.Field3) AND (TableB.Field4 = TableA.Field4)
);
Your use of RIGHT JOIN is incorrect. As phrased, you want a LEFT JOIN. That is, you want to keep all rows in the first table (the "left" table in the JOIN) regardless of whether or not a match exists in the second table. However, the NOT EXISTS does the same thing and the logic is a bit clearer.
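Both formulations can be compared on a small example. This is a sketch using Python's built-in sqlite3 rather than MS Access, with invented rows, so treat it as a check of the logic and not of Access syntax.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE TableA (Field1, Field2, Field3, Field4);
    CREATE TABLE TableB (Field1, Field2, Field3, Field4);
    -- Invented rows: (2, 'b', 'y', 20) is shared, (3, 'c', 'z', 30) is unique to TableB.
    INSERT INTO TableA VALUES (1, 'a', 'x', 10), (2, 'b', 'y', 20);
    INSERT INTO TableB VALUES (2, 'b', 'y', 20), (3, 'c', 'z', 30);
""")

# The four-column match condition shared by both queries.
match = " AND ".join(f"TableB.Field{i} = TableA.Field{i}" for i in range(1, 5))

not_exists_rows = conn.execute(f"""
    SELECT TableB.*
    FROM TableB
    WHERE NOT EXISTS (SELECT 1 FROM TableA WHERE {match})
""").fetchall()

# LEFT JOIN keeps every TableB row; IS NULL keeps only those without a match.
left_join_rows = conn.execute(f"""
    SELECT TableB.*
    FROM TableB
    LEFT JOIN TableA ON {match}
    WHERE TableA.Field1 IS NULL
""").fetchall()

print(not_exists_rows, left_join_rows)
```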
You would want a right join if you were selecting the rows of TableA, but since you have
SELECT DISTINCT TableB.*
you want a left join instead. My suggestion would be to change your code from a right join to a left join.
TableB plays the role of table A in the Venn diagrams above.
Given the below scenario:
Table A has 1000 rows and Table B has 5000 rows.
Q1: Select * from Table_A Left Outer Join Table_B
ON condition
Q2: Select * from Table_B Left Outer Join Table_A
ON condition
Does this make any difference ? Would there be any performance difference in these situations?
Yes, it makes a big difference for a LEFT JOIN. The two statements are not the same, and the execution paths are likely to be different.
The first query keeps all rows in Table A, plus any matching values from Table B. So this version returns at least 1000 rows.
The second keeps all rows in Table B, plus any matching values from Table A. This is not the same thing. This version returns at least 5000 rows.
For an INNER JOIN (or FULL OUTER JOIN), the order of the tables in the FROM clause does not affect the result set. However, depending on the optimizer, it could affect how the joins are processed (I am thinking of long chains of joins where optimizers take short-cuts).
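The row-count difference between the two LEFT JOIN directions is easy to reproduce at small scale. The sketch below uses Python's built-in sqlite3 with 3 and 5 rows standing in for 1000 and 5000; the `id` column and the join condition on it are assumptions, since the original only says "ON condition".

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Table_A (id INTEGER);
    CREATE TABLE Table_B (id INTEGER);
    -- Invented rows: id 1 appears once in Table_A and twice in Table_B.
    INSERT INTO Table_A VALUES (1), (2), (3);
    INSERT INTO Table_B VALUES (1), (1), (4), (5), (6);
""")

q1 = "SELECT * FROM Table_A LEFT OUTER JOIN Table_B ON Table_A.id = Table_B.id"
q2 = "SELECT * FROM Table_B LEFT OUTER JOIN Table_A ON Table_B.id = Table_A.id"

q1_count = len(conn.execute(q1).fetchall())  # every Table_A row, at least once
q2_count = len(conn.execute(q2).fetchall())  # every Table_B row, at least once

print(q1_count, q2_count)
```

Q1 returns more rows than Table_A holds because one of its rows matches twice; Q2 returns exactly one row per Table_B row because no Table_B row matches more than once.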
Does this make any difference ?
Yes, it does. LEFT JOIN definition: returns all rows from the left table, plus the matching rows from the right table. A matching row is one that satisfies the ON condition.
So in your case, the number of rows returned can be very different.
Q1: Select * from Table_A Left Outer Join Table_B ON condition
In this case the number of rows returned will be at least 1000 (tableA has 1000 rows and is on the left side of the JOIN), plus one extra row for every additional match a tableA row has in tableB.
Q2: Select * from Table_B Left Outer Join Table_A ON condition
In this case the number of rows returned will be at least 5000 (tableB has 5000 rows and is on the left side of the JOIN), plus one extra row for every additional match a tableB row has in tableA.
See the visual representation of the same [Image taken from This CodeProject Post]:
The two queries will give different results.
See W3 Schools Left Join
and go to the Try It Yourself page. The SQL can be edited for a LEFT OUTER JOIN.