Optimising SQL query - sql

I have a SQL query that performs an INNER JOIN on two tables having >50M rows each. I wish to reduce the time it takes to search through the join by reducing the rows that are joined based on a column present on one of the tables.
Say I have table1 with columns A,B,C and table2 with columns A,D,E. I wish to join based on column A but only those rows that have value 'e' for column E of table 2.
My SQL query:
SELECT one.B, two.D
FROM table1 one
INNER JOIN table2 two WHERE two.E IN ('e')
ON one.A = two.A
WHERE one.B > 10
AND two.D IN ('...')
It gives the error:
ORA-00905: missing keyword
Where am I going wrong? How do I achieve the intended result?

SELECT one.B, two.D
FROM table1 one
INNER JOIN table2 two -- WHERE two.E IN ('e') --> shouldn't use where here
ON one.A = two.A and two.E = 'e'
WHERE one.B > 10
AND two.D IN ('...')
Comments included in the code.
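To check that moving the filter into the ON clause gives the same rows as leaving it in the WHERE clause (for an INNER JOIN), here is a minimal sketch using Python's built-in sqlite3 rather than Oracle. The sample rows are made up for illustration, and the elided two.D filter from the question is left out.

```python
import sqlite3

# Tiny in-memory stand-ins for table1 and table2 (sample data is hypothetical).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table1 (A INTEGER, B INTEGER, C TEXT);
    CREATE TABLE table2 (A INTEGER, D TEXT, E TEXT);
    INSERT INTO table1 VALUES (1, 20, 'x'), (2, 5, 'y'), (3, 30, 'z');
    INSERT INTO table2 VALUES (1, 'd1', 'e'), (2, 'd2', 'e'), (3, 'd3', 'f');
""")

# Filter on two.E placed in the ON clause.
filter_in_on = conn.execute("""
    SELECT one.B, two.D
    FROM table1 one
    INNER JOIN table2 two
        ON one.A = two.A AND two.E = 'e'
    WHERE one.B > 10
    ORDER BY one.B
""").fetchall()

# Same filter placed in the WHERE clause.
filter_in_where = conn.execute("""
    SELECT one.B, two.D
    FROM table1 one
    INNER JOIN table2 two
        ON one.A = two.A
    WHERE one.B > 10 AND two.E = 'e'
    ORDER BY one.B
""").fetchall()

print(filter_in_on)
print(filter_in_where)
```

Both queries return the same single row here. Whether one form is faster on 50M-row tables depends on the optimizer and available indexes, which this sketch does not measure.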

As vkp pointed out, the WHERE is misplaced. Alternatively, you could move that WHERE clause into a subquery, so that:
INNER JOIN table2 two WHERE two.E IN ('e')
becomes
INNER JOIN (select * from table2 WHERE E IN ('e')) two

You could also put the condition in the WHERE clause:
SELECT a.B, b.D
FROM table1 a
JOIN table2 b
ON b.A = a.A
WHERE a.B > 10
AND b.E = 'e'
AND b.D IN ('...')
EDITED to remove 2nd incorrect suggestion

Related

Join multiple Tables based on multiple criteria over one field

I need to find the most efficient way to join one table to three others, using the values of their [Id_Orig] fields as the criteria.
Consider Table1 as the one with our universe of data, having the fields below:
Select Id_Orig, F1, F2 From Table1
The field [Id_Orig] can have only three values: 'DO', 'CC' and 'DP'. I need to join three other tables to Table1 based on those values, as shown below:
Table1 left join Table_DO : only for those records where both have [Id_Orig] = 'DO'
Table1 left join Table_CC : only for those records where both have [Id_Orig] = 'CC'
Table1 left join Table_DP : only for those records where both have [Id_Orig] = 'DP'
Suppose Table1 has 1000 records; that row count must remain unchanged. The idea is only to add the fields from the respective linked tables, as shown below:
Table1.Id_Orig, Table1.F1, Table1.F2, Table_DO.*, Table_CC.*, Table_DP.*
Can anyone tell me, please, what the best way to achieve that is, and whether it can be done in the ON clause after the LEFT JOIN?
Thanks in advance.
Leopoldo Fernandes
Portugal
Try something like:
SELECT t1.*, tdo.*, tcc.*, tdp.*
FROM Table1 AS t1
LEFT OUTER JOIN Table_DO AS tdo ON t1.[Id_Orig] = tdo.[Id_Orig] AND t1.[Id_Orig] = 'DO'
LEFT OUTER JOIN Table_CC AS tcc ON t1.[Id_Orig] = tcc.[Id_Orig] AND t1.[Id_Orig] = 'CC'
LEFT OUTER JOIN Table_DP AS tdp ON t1.[Id_Orig] = tdp.[Id_Orig] AND t1.[Id_Orig] = 'DP'
For example, when you JOIN to Table_DO, you said you want the rows where both tables have [Id_Orig] = 'DO'. Since the JOIN condition already requires the two values to be equal, you only need to test one column against the literal (I chose Table1).
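A minimal sketch of that pattern, run against SQLite through Python's sqlite3 instead of the asker's database. The columns do_val, cc_val and dp_val and all sample rows are hypothetical, and each linked table holds a single row so that the join cannot multiply Table1's rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Table1   (Id_Orig TEXT, F1 INTEGER, F2 INTEGER);
    CREATE TABLE Table_DO (Id_Orig TEXT, do_val TEXT);  -- do_val is hypothetical
    CREATE TABLE Table_CC (Id_Orig TEXT, cc_val TEXT);  -- cc_val is hypothetical
    CREATE TABLE Table_DP (Id_Orig TEXT, dp_val TEXT);  -- dp_val is hypothetical
    INSERT INTO Table1 VALUES ('DO', 1, 10), ('CC', 2, 20), ('DP', 3, 30), ('DO', 4, 40);
    INSERT INTO Table_DO VALUES ('DO', 'do-data');
    INSERT INTO Table_CC VALUES ('CC', 'cc-data');
    INSERT INTO Table_DP VALUES ('DP', 'dp-data');
""")

# Each LEFT JOIN only matches when Table1's row carries the corresponding code,
# so every Table1 row survives and picks up columns from one linked table.
rows = conn.execute("""
    SELECT t1.Id_Orig, t1.F1, tdo.do_val, tcc.cc_val, tdp.dp_val
    FROM Table1 AS t1
    LEFT OUTER JOIN Table_DO AS tdo ON t1.Id_Orig = tdo.Id_Orig AND t1.Id_Orig = 'DO'
    LEFT OUTER JOIN Table_CC AS tcc ON t1.Id_Orig = tcc.Id_Orig AND t1.Id_Orig = 'CC'
    LEFT OUTER JOIN Table_DP AS tdp ON t1.Id_Orig = tdp.Id_Orig AND t1.Id_Orig = 'DP'
    ORDER BY t1.F1
""").fetchall()

for row in rows:
    print(row)
```

All four Table1 rows come back, with NULLs in the columns of the tables that do not apply. If a linked table held several rows with the same Id_Orig, each matching Table1 row would be repeated once per match, so a real schema would presumably also join on some additional key.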

BigQuery Full outer join producing "left join" results

I have 2 tables, both of which contain distinct id values. Some of the id values might occur in both tables and some are unique to each table. Table1 has 10,910 rows and Table2 has 11,304 rows
When running a plain JOIN (inner join) query:
SELECT COUNT(DISTINCT a.id)
FROM table1 a
JOIN table2 b on a.id = b.id
I get a total of 10,896 rows or 10,896 ids shared across both tables.
However, when I run a FULL OUTER JOIN on the 2 tables like this:
SELECT COUNT(DISTINCT a.id)
FROM table1 a
FULL OUTER JOIN EACH table2 b on a.id = b.id
I get total of 10,896 rows, but I was expecting all 10,910 rows from table1.
I am wondering if there is an issue with my query syntax.
Since you are using EACH, it looks like you are running your queries in Legacy SQL mode.
In BigQuery Legacy SQL, the COUNT(DISTINCT) function is probabilistic: it returns a statistical approximation and is not guaranteed to be exact.
You can use the EXACT_COUNT_DISTINCT() function instead; it gives the exact number but is a little more expensive on the back end.
An even better option is to just use Standard SQL.
For your specific query you only need to remove the EACH keyword and it should work like a charm:
#standardSQL
SELECT COUNT(DISTINCT a.id)
FROM table1 a
JOIN table2 b on a.id = b.id
and
#standardSQL
SELECT COUNT(DISTINCT a.id)
FROM table1 a
FULL OUTER JOIN table2 b on a.id = b.id
I wrapped the original query in a subquery and counted the ids, which produced the expected results. Still a little strange, but it works.
SELECT EXACT_COUNT_DISTINCT(a_id)
FROM
(SELECT a.id AS a_id,
b.id AS b_id
FROM table1 a FULL OUTER JOIN EACH table2 b ON a.id = b.id)
This is because in both cases COUNT(DISTINCT a.id) counts only the non-null values from table a.
Use COUNT(*) and it should work.
You will have to add COALESCE... BigQuery, unlike traditional SQL, does not recognize fields unless they are used explicitly.
SELECT COUNT(DISTINCT coalesce(a.id,b.id))
FROM table1 a
FULL OUTER JOIN EACH table2 b on a.id = b.id
This query will now reflect the full effect of the FULL OUTER JOIN :)
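To see why COALESCE changes the count, here is a small sketch in SQLite via Python's sqlite3, not BigQuery. Since older SQLite versions lack FULL OUTER JOIN, the full join is emulated with two LEFT JOINs and UNION ALL; that emulation and the sample ids are assumptions for illustration only.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table1 (id INTEGER);
    CREATE TABLE table2 (id INTEGER);
    INSERT INTO table1 VALUES (1), (2), (3);
    INSERT INTO table2 VALUES (2), (3), (4), (5);
""")

# Emulated FULL OUTER JOIN: every table1 row with its match (if any),
# plus the table2 rows that have no partner in table1.
full_join = """
    SELECT a.id AS a_id, b.id AS b_id
    FROM table1 a LEFT JOIN table2 b ON a.id = b.id
    UNION ALL
    SELECT a.id AS a_id, b.id AS b_id
    FROM table2 b LEFT JOIN table1 a ON a.id = b.id
    WHERE a.id IS NULL
"""

# COUNT(DISTINCT col) skips NULLs, so table2-only rows vanish from the first count.
count_a, count_either, count_rows = conn.execute(f"""
    SELECT COUNT(DISTINCT a_id),
           COUNT(DISTINCT COALESCE(a_id, b_id)),
           COUNT(*)
    FROM ({full_join})
""").fetchone()

print(count_a, count_either, count_rows)
```

Here COUNT(DISTINCT a_id) sees only the three ids from table1, while the COALESCE version and COUNT(*) both see all five joined rows.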

SQL Getting value in same row as another known value

Best way is to explain in pseudo code
How do I Get x,
When table1.activity == "some_string"
Then x = table1.line_number in that same row.
I'm doing an INNER JOIN and I'm doing checks on table 2. Basically I don't want to join that row if table1.activity == "some_string"
Well, table2 is not mentioned in your pseudo code, but you could filter values either in the ON clause of the INNER JOIN or in the WHERE clause of the query. It depends on what you want. For example, in the WHERE clause:
SELECT * FROM table1 AS pivot
INNER JOIN table2 USING(id)
WHERE pivot.activity <>'not_want_these_kind_of_Records';
Or in the ON clause:
SELECT * FROM table1 AS pivot
INNER JOIN table2 AS t2 ON t2.id=pivot.id
AND t2.activity <>'not_want_these_kind_of_Records';
The second one applies the filter as part of the join condition, before the rows are joined to the pivot table.
Regards

Hive : Checking if a string from table 1 is present in a list of strings from table 2 while joining two tables

I am trying to join on whether a string(a column from table 1) is present in list of strings(a column from table 2) in Hive QL. Can anyone please help me with the syntax.
SELECT
A.id
FROM tab1 A
inner join tab2 B
ON (
(array_contains(B.purchase_items, A.item_id) = true )
)
Above SQL does not work.
First, unless Hive QL is backwards, your query is wrong upfront:
SELECT A.ID FROM A tab1
will return nothing because you've declared table "A" as "tab1". Either reverse the Alias or correct the table alias reference: (I assume tab1 is the table name, so go with option 1)
SELECT A.ID from tab1 A
--OR
SELECT tab1.id from A tab1
Second, joins do not work based on conditional criteria, they ARE the conditional criteria. Sort of...
For example:
SELECT A.ID
FROM tab1 A
INNER JOIN tab2 B
ON A.item_id = B.purchase_item
is almost like doing a simple cross join with a WHERE condition:
SELECT A.ID
FROM tab1 A, tab2 B --better to use it straight as "FROM tab1 A cross join tab2 B"
WHERE a.item_id = b.purchase_item
You can use LEFT SEMI JOIN, which returns the rows from the left-hand table that have a match in the right-hand table.
SELECT A.id FROM tab1 A
LEFT SEMI JOIN tab2 B
ON A.col1 = B.col1 AND <any-other-join-cond>;
Note that the SELECT and WHERE clauses can’t reference columns from the right hand table.
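Hive itself is not used below. As a rough sketch of the semi-join semantics only, this uses Python's sqlite3 with a correlated EXISTS, which plays the same role as LEFT SEMI JOIN. The column col1 and the sample rows are hypothetical, and array_contains is not modelled.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE tab1 (id INTEGER, col1 TEXT);
    CREATE TABLE tab2 (col1 TEXT);
    INSERT INTO tab1 VALUES (1, 'a'), (2, 'b'), (3, 'c');
    INSERT INTO tab2 VALUES ('a'), ('a'), ('b');  -- 'a' appears twice on purpose
""")

# A regular INNER JOIN repeats a left row once per matching right row.
inner_ids = [r[0] for r in conn.execute("""
    SELECT A.id FROM tab1 A
    INNER JOIN tab2 B ON A.col1 = B.col1
    ORDER BY A.id
""")]

# Semi-join semantics: each left row appears at most once if any match exists.
semi_ids = [r[0] for r in conn.execute("""
    SELECT A.id FROM tab1 A
    WHERE EXISTS (SELECT 1 FROM tab2 B WHERE B.col1 = A.col1)
    ORDER BY A.id
""")]

print(inner_ids)
print(semi_ids)
```

The inner join lists id 1 twice because tab2 holds 'a' twice, while the semi-join form lists it once. As in Hive, only columns of the left-hand table are selected.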

Outer Join with Where returning Nulls

Hi, I have 2 tables. I want to list:
all records in table1 which are present in table2
all records in table2 which are not present in table1, with a where condition
Null rows will be returned by table1 in second condition but I am unable to get the query working correctly. It is only returning null rows
SELECT
A.CLMSRNO,A.CLMPLANO,A.GENCURRCODE,A.CLMNETLOSSAMT,
A.CLMLOSSAMT,A.CLMCLAIMPRCLLOSSSHARE
FROM
PAKRE.CLMCLMENTRY A
RIGHT OUTER JOIN (
SELECT
B.CLMSRNO,B.UWADVICETYPE,B.UWADVICENO,B.UWADVPREMCURRCODE,
B.GENSUBBUSICLASS,B.UWADVICENET,B.UWADVICEKIND,B.UWADVYEAR,
B.UWADVQTR,B.ISMANUAL,B.UWCLMNOREFNO
FROM
PAKRE.UWADVICE B
WHERE
B.ISMANUAL=1
) r
ON a.CLMSRNO=r.CLMSRNO
ORDER BY
A.CLMSRNO DESC;
Which OS are you using?
Table aliases are case sensitive on some platforms, which could be why your join condition ON a.CLMSRNO=r.CLMSRNO fails.
Try A.CLMSRNO=r.CLMSRNO and see if that works.
I'm not understanding your first attempt, but here's basically what you need, I think:
SELECT *
FROM TABLE1
INNER JOIN TABLE2
ON joincondition
UNION ALL
SELECT *
FROM TABLE2
LEFT JOIN TABLE1
ON joincondition
AND TABLE1.wherecondition
WHERE TABLE1.somejoincolumn IS NULL
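A runnable sketch of this UNION ALL shape, using Python's sqlite3 with made-up tables. The is_manual column mirrors ISMANUAL from the question. Putting the filter on table2 in the WHERE clause of the second branch is an assumption about the asker's intent rather than a copy of the template above, and explicit column lists are used so both branches line up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table1 (id INTEGER, amount INTEGER);
    CREATE TABLE table2 (id INTEGER, is_manual INTEGER);
    INSERT INTO table1 VALUES (1, 100), (2, 200);
    INSERT INTO table2 VALUES (2, 1), (3, 1), (4, 0);
""")

rows = conn.execute("""
    -- Branch 1: records present in both tables.
    SELECT t1.id AS t1_id, t1.amount, t2.id AS t2_id, t2.is_manual
    FROM table1 t1
    INNER JOIN table2 t2 ON t1.id = t2.id
    UNION ALL
    -- Branch 2: table2 records with no partner in table1 (anti-join),
    -- restricted to manual ones.
    SELECT t1.id, t1.amount, t2.id, t2.is_manual
    FROM table2 t2
    LEFT JOIN table1 t1 ON t1.id = t2.id
    WHERE t1.id IS NULL
      AND t2.is_manual = 1
    ORDER BY t2_id
""").fetchall()

for row in rows:
    print(row)
```

The first branch contributes the matched record with id 2. The second contributes the unmatched manual record with id 3, with NULLs in the table1 columns. The record with id 4 is dropped because it is not manual.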
I think you may want to remove the subquery and put its columns into the main query e.g.
SELECT A.CLMSRNO, A.CLMPLANO, A.GENCURRCODE, A.CLMNETLOSSAMT,
A.CLMLOSSAMT, A.CLMCLAIMPRCLLOSSSHARE,
B.CLMSRNO, B.UWADVICETYPE, B.UWADVICENO, B.UWADVPREMCURRCODE,
B.GENSUBBUSICLASS, B.UWADVICENET, B.UWADVICEKIND, B.UWADVYEAR,
B.UWADVQTR, B.ISMANUAL, B.UWCLMNOREFNO
FROM PAKRE.CLMCLMENTRY A
RIGHT OUTER JOIN PAKRE.UWADVICE B
ON A.CLMSRNO = B.CLMSRNO
WHERE B.ISMANUAL = 1
ORDER BY A.CLMSRNO DESC;